Assignment 5 - 2013
Due date(s): 08 March 2013
<rst>
<rst-options: 'toc' = False/>
<rst-options: 'reset-figures' = False/>

Assignment objectives
=====================
.. note:: **Assignment objectives**

   * Build least squares models in R.
   * Extract useful information from the model outputs.
   * Investigate and understand multiple linear regression (MLR) models.

.. question::
   :grading: 12

*No need to use software. Question from the final exam, 2011.*
Some data were collected from tests where the compressive strength, :math:`x`, used to form concrete was measured, as well as the intrinsic permeability of the product, :math:`y`. There were 16 data points collected. The mean :math:`x`-value was :math:`\overline{x} = 3.1` and the variance of the :math:`x`-values was 1.52. The average :math:`y`-value was 40.9. The estimated covariance between :math:`x` and :math:`y` was :math:`-5.5`.
The least squares estimates of the intercept and slope gave the model :math:`y = 52.1 - 3.6 x`.
#. What is the expected permeability when the compressive strength is at 5.8 units?
#. Calculate the 95% confidence interval for the slope if the standard error from the model was 4.5 units. Is the slope coefficient statistically significant?
#. Provide a rough estimate of the 95% prediction interval when the compressive strength is at 5.8 units (same level as for part 1). What assumptions did you make to provide this estimate?
#. Now provide a more accurate, calculated 95% prediction interval for the previous part.
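
As a reminder, the interval questions above rest on the standard formulas for a single-variable least squares model. The notation here is an assumption, since the assignment does not define it: :math:`S_E` is the model's standard error, :math:`n` is the number of data points, :math:`b_1` is the slope estimate, and :math:`c_t` is the critical value from the :math:`t`-distribution with :math:`n-2` degrees of freedom. The confidence interval for the slope is:

.. math::

   b_1 \pm c_t \sqrt{\frac{S_E^2}{\sum_j \left(x_j - \overline{x}\right)^2}}

and the prediction interval for a new observation at :math:`x_\text{new}` is:

.. math::

   \hat{y}_\text{new} \pm c_t \sqrt{S_E^2 \left(1 + \frac{1}{n} + \frac{\left(x_\text{new} - \overline{x}\right)^2}{\sum_j \left(x_j - \overline{x}\right)^2}\right)}

When only the sample variance of the :math:`x`-values is given, the sum of squares follows from :math:`\sum_j \left(x_j - \overline{x}\right)^2 = (n-1)\,s_x^2`.
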
.. question::
   :grading: 10

Use the `gas furnace data <http://datasets.connectmv.com/info/gas-furnace>`_ from the website to answer these questions. The data represent the gas flow rate (centered) from a process and the corresponding CO\ :sub:`2` measurement.
#. Make a scatter plot of the data to visualize the relationship between the variables. How would you characterize the relationship?
#. Calculate the variance for both variables, the covariance between the two variables, and the correlation between them, :math:`r(x,y)`. Interpret the correlation value; i.e. do you consider this a strong correlation?
#. Now calculate a least squares model relating the gas flow rate as the :math:`x`-variable to the CO\ :sub:`2` measurement as the :math:`y`-variable. Report the intercept and slope from this model.
#. Report the :math:`R^2` from the regression model. Compare the squared value of :math:`r(x,y)` to :math:`R^2`. What do you notice? Now reinterpret what the correlation value means (i.e. compare this interpretation to your answer in part 2).
#. Switch :math:`x` and :math:`y` around and rebuild your least squares model. Compare the new :math:`R^2` to the previous model's :math:`R^2`. Is this result surprising? How do you interpret this?
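
The calculations in this question can be sketched in a few lines of code. The example below is a minimal illustration in plain Python, not the course's R workflow. The six data points are made up for demonstration and are not the gas furnace data, and the helper names ``mean``, ``covariance`` and ``least_squares`` are hypothetical. It computes the covariance, the correlation, and the least squares fit in both directions so that the values can be compared side by side.

```python
# Hypothetical data for illustration only; this is NOT the gas furnace data set.
x = [-0.5, 0.1, 0.9, 1.4, 2.0, 2.6]
y = [53.8, 53.1, 51.9, 51.0, 50.6, 49.2]


def mean(v):
    return sum(v) / len(v)


def covariance(u, v):
    """Sample covariance using the n - 1 denominator; covariance(u, u) is the variance."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (len(u) - 1)


def least_squares(u, v):
    """Fit v = b0 + b1*u and return (intercept, slope, R2)."""
    b1 = covariance(u, v) / covariance(u, u)
    b0 = mean(v) - b1 * mean(u)
    v_bar = mean(v)
    residual_ss = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(u, v))
    total_ss = sum((b - v_bar) ** 2 for b in v)
    return b0, b1, 1 - residual_ss / total_ss


# Correlation: covariance scaled by both standard deviations
r = covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5

b0, b1, r2_yx = least_squares(x, y)  # y regressed on x
a0, a1, r2_xy = least_squares(y, x)  # x regressed on y (variables switched)

print(f"r = {r:.4f}, r squared = {r * r:.4f}")
print(f"y on x: intercept = {b0:.3f}, slope = {b1:.3f}, R2 = {r2_yx:.4f}")
print(f"x on y: intercept = {a0:.3f}, slope = {a1:.3f}, R2 = {r2_xy:.4f}")
```

Running the same steps on the real data in R would typically use ``var``, ``cov``, ``cor`` and ``lm``.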
.. question::
   :grading: 15

In this question we consider the `bioreactor yield <http://datasets.connectmv.com/info/bioreactor-yields>`_ data set and fit a linear model using all :math:`x`-variables simultaneously to predict the yield.
#. Provide the interpretation for each coefficient in the model, and also comment on each one's confidence interval when interpreting it.
#. Compare the 3 slope coefficient values to the case when you regress yield onto each :math:`x`-variable on its own:

   - :math:`\hat{y} = 102.5 - 0.69T`, where :math:`T` is tank temperature
   - :math:`\hat{y} = -20.3 + 0.016S`, where :math:`S` is impeller speed
   - :math:`\hat{y} = 54.9 - 16.7B`, where :math:`B` is 1 if baffles are present and :math:`B=0` with no baffles

   Explain why your coefficients do not match.

#. Are the residuals from the multiple linear regression model normally distributed?
#. In this part we investigate the variance-covariance matrices used to calculate the linear model.

   #. First center the :math:`x`-variables and the :math:`y`-variable that you used in the model.

      *Note*: feel free to use MATLAB, or any other tool to answer this question. If you are using R, then you will benefit from the R tutorial on the course website. Also, read the help for the ``model.matrix(...)`` function to get the :math:`\mathbf{X}`-matrix. Then read the help for the ``sweep(...)`` function, or more simply use the ``scale(...)`` function to do the mean-centering.

   #. Show your calculated :math:`\mathbf{X}^T\mathbf{X}` and :math:`\mathbf{X}^T\mathbf{y}` variance-covariance matrices from the centered data.
   #. Explain why the interpretation of the covariances in :math:`\mathbf{X}^T\mathbf{y}` matches the results from the full MLR model you calculated in part 1 of this question.
   #. Calculate :math:`\mathbf{b} =\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}` and show that it agrees with the estimates that R calculated (even though R fits an intercept term, while your :math:`\mathbf{b}` does not).

#. What would be the predicted yield for an experiment run without baffles, at an impeller speed of 4000 rpm and a reactor temperature of 90 °C?
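
The centering and matrix steps in part 4 can be illustrated with a small sketch. The example below is written in plain Python under stated assumptions. It uses two made-up :math:`x`-variables rather than the bioreactor data, generates :math:`y` from a known linear relation so the result can be checked, and the helper names ``center`` and ``dot`` are hypothetical. It mean-centers the data, forms the entries of :math:`\mathbf{X}^T\mathbf{X}` and :math:`\mathbf{X}^T\mathbf{y}`, and solves for :math:`\mathbf{b}` with a hand-written 2-by-2 inverse.

```python
# Hypothetical data: two x-variables, with y generated from a known relation
# y = 3 + 2*x1 - 0.5*x2 so that the recovered coefficients can be checked.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 2.0]
y = [3.0 + 2.0 * a - 0.5 * b for a, b in zip(x1, x2)]


def center(values):
    """Return the mean-centered values and the mean that was removed."""
    m = sum(values) / len(values)
    return [v - m for v in values], m


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x1c, x1_mean = center(x1)
x2c, x2_mean = center(x2)
yc, y_mean = center(y)

# Entries of X^T X (symmetric 2x2) and X^T y (2x1) for the centered data
s11, s12, s22 = dot(x1c, x1c), dot(x1c, x2c), dot(x2c, x2c)
s1y, s2y = dot(x1c, yc), dot(x2c, yc)

# b = (X^T X)^{-1} X^T y, using the closed-form inverse of a 2x2 matrix
det = s11 * s22 - s12 * s12
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

# The intercept of the uncentered model is recovered from the means
b0 = y_mean - b1 * x1_mean - b2 * x2_mean

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, recovered intercept = {b0:.4f}")
```

With three :math:`x`-variables, as in this question, the same steps apply, but the inverse is better left to a linear algebra routine such as ``solve(...)`` in R.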
</rst>