Assignment 5 - 2013

From Statistics for Engineering
Revision as of 03:14, 1 March 2013 by Kevin Dunn
Due date(s): 08 March 2013
(PDF) Assignment questions

<rst>
<rst-options: 'toc' = False/>
<rst-options: 'reset-figures' = False/>

Assignment objectives
=====================

.. note:: **Assignment objectives**

* Build least squares models in R.
* Extract useful information about the model outputs.
* Investigate and understand multiple linear regression (MLR) models.

.. question:: :grading: 12

*No need to use software. Question from the final exam, 2011.*

Some data were collected from tests where the compressive strength, :math:`x`, used to form concrete was measured, as well as the intrinsic permeability of the product, :math:`y`. There were 16 data points collected. The mean :math:`x`-value was :math:`\overline{x} = 3.1` and the variance of the :math:`x`-values was 1.52. The average :math:`y`-value was 40.9. The estimated covariance between :math:`x` and :math:`y` was :math:`-5.5`.

The least squares estimates of the intercept and slope gave the model: :math:`\hat{y} = 52.1 - 3.6 x`.
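As a quick consistency check, the stated coefficients can be recovered from the summary statistics given above: for a single-variable least squares model the slope is the covariance divided by the variance of :math:`x`, and the fitted line passes through the point of means. A minimal sketch in R, where the variable names are illustrative and not part of the question:

```r
# Summary statistics given in the question
x_bar  <- 3.1    # mean of the x-values
y_bar  <- 40.9   # mean of the y-values
var_x  <- 1.52   # variance of the x-values
cov_xy <- -5.5   # covariance between x and y

# Slope: covariance divided by the variance of x
b1 <- cov_xy / var_x

# Intercept: the least squares line passes through (x_bar, y_bar)
b0 <- y_bar - b1 * x_bar

# Rounded to one decimal, for comparison with the stated model
round(c(intercept = b0, slope = b1), 1)
```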

#. What is the expected permeability when the compressive strength is at 5.8 units?

#. Calculate the 95% confidence interval for the slope if the standard error from the model was 4.5 units. Is the slope coefficient statistically significant?

#. Provide a rough estimate of the 95% prediction interval when the compressive strength is at 5.8 units (same level as for part 1). What assumptions did you make to provide this estimate?

#. Now provide a more accurate, calculated 95% prediction interval for the previous part.
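Although no software is needed here, the hand calculations in parts 1 to 4 can be checked with the usual single-variable least squares formulas. The sketch below rests on several assumptions that are not stated in the question: that the reported variance of :math:`x` uses the :math:`n-1` divisor, that the "standard error from the model" has :math:`n-2` degrees of freedom, and that :math:`\pm 2 S_E` is the rough approximation intended in part 3. The variable names are illustrative.

```r
n     <- 16     # number of data points
x_bar <- 3.1    # mean of the x-values
var_x <- 1.52   # variance of x, assumed to use the (n - 1) divisor
b0    <- 52.1   # intercept from the stated model
b1    <- -3.6   # slope from the stated model
S_E   <- 4.5    # standard error from the model
x_new <- 5.8    # compressive strength at which to predict

# Sum of squared deviations of x about its mean
ss_x <- (n - 1) * var_x

# Critical t-value for a 95% interval with n - 2 degrees of freedom
c_t <- qt(0.975, df = n - 2)

# Part 1: point prediction
y_hat <- b0 + b1 * x_new

# Part 2: confidence interval for the slope
se_b1    <- S_E / sqrt(ss_x)
slope_ci <- b1 + c(-1, 1) * c_t * se_b1

# Part 3: rough prediction interval, assuming +/- 2 standard errors
rough_pi <- y_hat + c(-1, 1) * 2 * S_E

# Part 4: calculated prediction interval, which widens as x_new
# moves away from x_bar
se_pred <- S_E * sqrt(1 + 1 / n + (x_new - x_bar)^2 / ss_x)
calc_pi <- y_hat + c(-1, 1) * c_t * se_pred

list(y_hat = y_hat, slope_ci = slope_ci,
     rough_pi = rough_pi, calc_pi = calc_pi)
```

The rough interval ignores the uncertainty in the estimated coefficients, so it should come out narrower than the calculated one.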

.. question:: :grading: 10

Use the `gas furnace data <http://datasets.connectmv.com/info/gas-furnace>`_ from the website to answer these questions. The data represent the gas flow rate (centered) from a process and the corresponding CO\ :sub:`2` measurement.

#. Make a scatter plot of the data to visualize the relationship between the variables. How would you characterize the relationship?

#. Calculate the variance for both variables, the covariance between the two variables, and the correlation between them, :math:`r(x,y)`. Interpret the correlation value; i.e. do you consider this a strong correlation?

#. Now calculate a least squares model relating the gas flow rate as the :math:`x` variable to the CO\ :sub:`2` measurement as the :math:`y`-variable. Report the intercept and slope from this model.

#. Report the :math:`R^2` from the regression model. Compare the squared value of :math:`r(x,y)` to :math:`R^2`. What do you notice? Now reinterpret what the correlation value means (i.e. compare this interpretation to your answer in part 2).

#. Switch :math:`x` and :math:`y` around and rebuild your least squares model. Compare the new :math:`R^2` to the previous model's :math:`R^2`. Is this result surprising? How do you interpret this?
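The workflow for parts 2 to 5 might look like the following sketch in R. It uses synthetic data as a stand-in, not the real gas furnace measurements, so the numbers it prints are not the answers; substitute the two columns from the downloaded data set for ``x`` and ``y``. A scatter plot for part 1 can be made with ``plot(x, y)``.

```r
set.seed(42)

# Synthetic stand-in data (NOT the gas furnace data set)
x <- rnorm(100)                        # plays the role of the gas flow rate
y <- 53 - 2 * x + rnorm(100, sd = 2)   # plays the role of the CO2 measurement

# Variances, covariance and correlation
c(var_x = var(x), var_y = var(y), cov_xy = cov(x, y))
r <- cor(x, y)

# Least squares model of y on x, then the reversed model of x on y
model_yx <- lm(y ~ x)
model_xy <- lm(x ~ y)
coef(model_yx)   # intercept and slope

# R^2 from each model
R2_yx <- summary(model_yx)$r.squared
R2_xy <- summary(model_xy)$r.squared

# Compare the squared correlation with both R^2 values
c(r_squared = r^2, R2_yx = R2_yx, R2_xy = R2_xy)
```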

.. question:: :grading: 15

In this question we consider the `bioreactor yield <http://datasets.connectmv.com/info/bioreactor-yields>`_ data set and fit a linear model using all :math:`x`-variables simultaneously to predict the yield.

#. Provide the interpretation for each coefficient in the model, and also comment on each one's confidence interval when interpreting it.

#. Compare the 3 slope coefficient values to the case when you regress yield onto each :math:`x`-variable on its own:

- :math:`\hat{y} = 102.5 - 0.69T`, where :math:`T` is tank temperature
- :math:`\hat{y} = -20.3 + 0.016S`, where :math:`S` is impeller speed
- :math:`\hat{y} = 54.9 - 16.7B`, where :math:`B` is 1 if baffles are present and :math:`B=0` with no baffles

Explain why your coefficients do not match.

#. Are the residuals from the multiple linear regression model normally distributed?

#. In this part we are investigating the variance-covariance matrices used to calculate the linear model.

#. First center the :math:`x`-variables and the :math:`y`-variable that you used in the model.

*Note*: feel free to use MATLAB, or any other tool to answer this question. If you are using R, then you will benefit from the R tutorial on the course website. Also, read the help for the ``model.matrix(...)`` function to get the :math:`\mathbf{X}`-matrix. Then read the help for the ``sweep(...)`` function, or more simply use the ``scale(...)`` function to do the mean-centering.

#. Show your calculated :math:`\mathbf{X}^T\mathbf{X}` and :math:`\mathbf{X}^T\mathbf{y}` variance-covariance matrices from the centered data.

#. Explain why the interpretation of the covariances in :math:`\mathbf{X}^T\mathbf{y}` matches the results from the full MLR model you calculated in part 1 of this question.

#. Calculate :math:`\mathbf{b} =\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}` and show that it agrees with the estimates that R calculated (even though R fits an intercept term, while your :math:`\mathbf{b}` does not).

#. What would be the predicted yield for an experiment run without baffles, at 4000 rpm impeller speed, run at a reactor temperature of 90 °C?
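The matrix steps in part 4 and the prediction in part 5 can be sketched in R as shown below. The data frame here is synthetic and its column names (``temperature``, ``speed``, ``baffles``, ``yield``) are assumptions made for illustration; check them against the actual bioreactor data set before reusing the code. The sketch uses ``model.matrix(...)`` to obtain the :math:`\mathbf{X}`-matrix and ``scale(...)`` to mean-center it, as suggested in the note above.

```r
set.seed(1)
n <- 20

# Synthetic stand-in data; column names are hypothetical
dat <- data.frame(
  temperature = runif(n, 70, 100),
  speed       = runif(n, 3000, 5000),
  baffles     = factor(rep(c("No", "Yes"), each = n / 2))
)
dat$yield <- 50 - 0.5 * dat$temperature + 0.01 * dat$speed +
  5 * (dat$baffles == "Yes") + rnorm(n, sd = 2)

# Full MLR model, with an intercept, as fitted by R
model <- lm(yield ~ temperature + speed + baffles, data = dat)

# X-matrix without the intercept column; the factor becomes a 0/1 column
X <- model.matrix(model)[, -1]

# Mean-center X and y (centering only, no scaling to unit variance)
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- dat$yield - mean(dat$yield)

XtX <- t(Xc) %*% Xc
Xty <- t(Xc) %*% yc

# Solve (X'X) b = X'y rather than forming the inverse explicitly
b <- solve(XtX, Xty)

# Slopes from the centered calculation next to those from lm()
cbind(manual = drop(b), lm = coef(model)[-1])

# Prediction for a new run: no baffles, 4000 rpm, 90 degrees C
new_run <- data.frame(
  temperature = 90,
  speed       = 4000,
  baffles     = factor("No", levels = levels(dat$baffles))
)
predict(model, newdata = new_run)
```

Centering removes the need for an intercept column because the fitted least squares model passes through the means of all the variables; the intercept can be recovered afterwards as ``mean(dat$yield) - colMeans(X) %*% b``.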


</rst>