This question uses the combined cycle power plant data set from Lichman, M. (2013), UCI Machine Learning Repository (http://archive.ics.uci.edu/ml). You can download the data set from the course website as “power plants.csv”. Here we aim to use the “Exhaust vacuum” (V) predictor to predict the “Net hourly electrical energy output” (EP) across the 9568 observations in the data set. A description of the data set can be found at http://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant.

(a) Denote x as the Exhaust vacuum (V) and y as the Net hourly electrical energy output (EP). Use the poly() function to fit a cubic polynomial regression to predict y using x. Report the regression output, and plot the data together with the fitted polynomial.
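
A minimal sketch of how such a fit could be set up in R. To keep it runnable on its own, it uses simulated stand-in data rather than the real measurements; the column names `V` and `EP` in the commented-out read.csv() line are assumptions about the course file, and the names `fit3`, `grid`, and `pred` are purely illustrative.

```r
# Stand-in data so the sketch runs by itself. With the course file one might use:
#   dat <- read.csv("power plants.csv"); x <- dat$V; y <- dat$EP   # column names assumed
set.seed(1)
x <- runif(500, 25, 81)                                  # simulated exhaust vacuum
y <- 500 - 1.2 * x + 0.004 * x^2 + rnorm(500, sd = 8)    # simulated output, not real data

fit3 <- lm(y ~ poly(x, 3))       # cubic fit using an orthogonal polynomial basis
print(summary(fit3))             # regression output

# Predict on an evenly spaced grid so the fitted curve is drawn left to right
grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
pred <- predict(fit3, newdata = grid)

plot(x, y, col = "grey60", pch = 16, cex = 0.5,
     xlab = "Exhaust vacuum (V)", ylab = "Energy output (EP)")
lines(grid$x, pred, col = "blue", lwd = 2)
```

By default poly() returns orthogonal polynomials, so the individual coefficients are not the coefficients of x, x^2, and x^3; the fitted values are the same as with poly(x, 3, raw = TRUE).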

(b) Plot the polynomial fits for polynomials of degree 1, 3, 7, and 10, and report the associated residual sum of squares.
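
One possible way to organise the comparison, again on simulated stand-in data (not the real data set) and with illustrative variable names. Because the polynomial models are nested, the training RSS cannot increase as the degree grows, which is worth keeping in mind when interpreting the numbers.

```r
# Simulated stand-in data; replace x and y with the V and EP columns of the course file
set.seed(1)
x <- runif(500, 25, 81)
y <- 500 - 1.2 * x + 0.004 * x^2 + rnorm(500, sd = 8)

degrees <- c(1, 3, 7, 10)
fits <- lapply(degrees, function(d) lm(y ~ poly(x, d)))   # one fit per degree
rss  <- sapply(fits, function(f) sum(resid(f)^2))         # residual sum of squares
names(rss) <- paste0("degree_", degrees)
print(rss)

# Overlay all fitted curves on the scatter plot
grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
plot(x, y, col = "grey60", pch = 16, cex = 0.5,
     xlab = "Exhaust vacuum (V)", ylab = "Energy output (EP)")
for (i in seq_along(fits)) {
  lines(grid$x, predict(fits[[i]], newdata = grid), col = i + 1, lwd = 2)
}
legend("topright", legend = paste("degree", degrees),
       col = seq_along(fits) + 1, lwd = 2)
```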

(c) Perform cross-validation to select the optimal degree for the polynomial, and explain your results.
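
A hand-rolled 10-fold cross-validation sketch, offered as one reasonable setup rather than the required method. The choice of K = 10, the candidate degrees 1 to 10, and the simulated stand-in data are all assumptions; cv.glm() from the boot package is an alternative to the explicit loop.

```r
# Simulated stand-in data; replace with the V and EP columns of the course file
set.seed(1)
dat <- data.frame(x = runif(500, 25, 81))
dat$y <- 500 - 1.2 * dat$x + 0.004 * dat$x^2 + rnorm(500, sd = 8)

K <- 10                                             # number of folds (assumed)
fold <- sample(rep(1:K, length.out = nrow(dat)))    # random fold label per row
degrees <- 1:10                                     # candidate degrees (assumed)

cv_mse <- sapply(degrees, function(d) {
  fold_mse <- sapply(1:K, function(k) {
    train <- dat[fold != k, ]
    test  <- dat[fold == k, ]
    fit <- lm(y ~ poly(x, d), data = train)         # fit on the other K - 1 folds
    mean((test$y - predict(fit, newdata = test))^2) # error on the held-out fold
  })
  mean(fold_mse)                                    # average over the K folds
})

best_degree <- degrees[which.min(cv_mse)]
plot(degrees, cv_mse, type = "b",
     xlab = "Polynomial degree", ylab = "10-fold CV mean squared error")
```

Unlike the training RSS, the CV error need not keep decreasing with the degree, so its minimum (or the point where it flattens out) gives a basis for choosing the degree.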

(d) Use the bs(x, df = 4) function to fit a regression spline to predict EP using V. This will result in a spline with 5 degrees of freedom when we include the intercept. Report the output for the fit. How did you choose the knots? Plot the resulting fit.
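
A short sketch of the spline fit on simulated stand-in data (not the real measurements), with illustrative variable names. When only df is supplied, bs() places the interior knots at quantiles of x; for a cubic spline with df = 4 that is a single knot at the median, which the code inspects via the "knots" attribute.

```r
library(splines)

# Simulated stand-in data; replace x and y with the V and EP columns of the course file
set.seed(1)
x <- runif(500, 25, 81)
y <- 500 - 1.2 * x + 0.004 * x^2 + rnorm(500, sd = 8)

fit_bs <- lm(y ~ bs(x, df = 4))      # cubic B-spline basis with 4 columns plus intercept
print(summary(fit_bs))

# Interior knot(s) that bs() chose automatically from the data
knots <- as.numeric(attr(bs(x, df = 4), "knots"))
print(knots)

grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
pred_bs <- predict(fit_bs, newdata = grid)
plot(x, y, col = "grey60", pch = 16, cex = 0.5,
     xlab = "Exhaust vacuum (V)", ylab = "Energy output (EP)")
lines(grid$x, pred_bs, col = "red", lwd = 2)
abline(v = knots, lty = 2)           # mark the knot location
```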

(e) Now fit a regression spline for a range of degrees of freedom. Plot a few of the resulting fits, report the corresponding RSS, and describe the results obtained.

(f) Perform cross-validation in order to select the best degrees of freedom for a regression spline. Plot the CV estimate of error versus the degrees of freedom. Describe your results.
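
A possible cross-validation loop over the spline degrees of freedom, on simulated stand-in data. The range 3 to 12 for df, K = 10, and all variable names are assumptions. One design choice in this sketch: the boundary knots are fixed to the range of the full data, so that held-out points never fall outside the boundary knots of a spline fitted on a training fold.

```r
library(splines)

# Simulated stand-in data; replace with the V and EP columns of the course file
set.seed(1)
dat <- data.frame(x = runif(500, 25, 81))
dat$y <- 500 - 1.2 * dat$x + 0.004 * dat$x^2 + rnorm(500, sd = 8)

K <- 10                                             # number of folds (assumed)
fold <- sample(rep(1:K, length.out = nrow(dat)))    # random fold label per row
dfs <- 3:12                                         # candidate df values (assumed)
bk <- range(dat$x)                                  # shared boundary knots for every fold

cv_mse <- sapply(dfs, function(d) {
  mean(sapply(1:K, function(k) {
    train <- dat[fold != k, ]
    test  <- dat[fold == k, ]
    fit <- lm(y ~ bs(x, df = d, Boundary.knots = bk), data = train)
    mean((test$y - predict(fit, newdata = test))^2) # held-out mean squared error
  }))
})

best_df <- dfs[which.min(cv_mse)]
plot(dfs, cv_mse, type = "b",
     xlab = "Degrees of freedom", ylab = "10-fold CV mean squared error")
```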