Assignment 6 - 2013

From Statistics for Engineering
Revision as of 03:11, 12 March 2013 by Kevin Dunn
Due date(s): 15 March 2013
(PDF) Assignment questions

<rst>
<rst-options: 'toc' = False/>
<rst-options: 'reset-figures' = False/>

.. note:: **Assignment objectives**

* Using and interpreting an MLR model with integer variables.
* Using an MLR with integer variables that are at more than 2 levels.

.. question::
    :grading: 0

*This question is fully solved in the course textbook*, Process Improvement using Data, so it is worth no credit and will not be graded. However, you are strongly encouraged to complete the question without looking at the answers.

In this question we will use the `LDPE data <http://datasets.connectmv.com/info/ldpe>`_, which come from a high-fidelity simulation of a low-density polyethylene reactor. LDPE reactors are very long, thin tubes. In this particular case the tube is divided into 2 zones, since the feed enters at the start of the tube and again at some point further down the tube (the start of the second zone). There is a temperature profile along the tube, with a certain maximum temperature somewhere along the length. The maximum temperature in zone 1, ``Tmax1``, is reached some fraction ``z1`` along the length; similarly in zone 2 with the ``Tmax2`` and ``z2`` variables.

We will build a linear model to predict the ``SCB`` variable, the short chain branching (per 1000 carbon atoms), which is an important quality variable for this product. Note that the last 4 rows of data are known to be from abnormal process operation, when the process started to experience a problem. However, we will pretend we didn't know that when building the model, so keep them in for now.

#. Use only the following subset of :math:`x`-variables: ``Tmax1``, ``Tmax2``, ``z1`` and ``z2`` and the :math:`y` variable = ``SCB``. Show the relationship between these 5 variables in a scatter plot matrix.

Use this code to get you started (make sure you understand what it is doing)::

    LDPE <- read.csv('http://datasets.connectmv.com/file/ldpe.csv')
    subdata <- data.frame(cbind(LDPE$Tmax1, LDPE$Tmax2, LDPE$z1, LDPE$z2, LDPE$SCB))
    colnames(subdata) <- c("Tmax1", "Tmax2", "z1", "z2", "SCB")

Using bullet points, describe the nature of relationships between the 5 variables, and particularly the relationship to the :math:`y`-variable.

#. Let's start with a linear model between ``z2`` and ``SCB``. We will call this the ``z2`` model. Let's examine its residuals:

   #. Are the residuals normally distributed?
   #. What is the standard error of this model?
   #. Are there any time-based trends in the residuals (the rows in the data are already in time-order)?
   #. Use any other relevant plots of the predicted values, the residuals, the :math:`x`-variable, as described in class, and diagnose the problem with this linear model.
   #. What can be done to fix the problem?
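
The residual checks listed above can be sketched in R as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins generated with ``rnorm`` (they are not the LDPE data), and ``residual_plots`` is a hypothetical helper name, not a function from the course or from any package.

```r
# Synthetic stand-in data (NOT the LDPE data): y depends linearly on x, plus noise
set.seed(42)
x <- runif(50, min = 0, max = 1)
y <- 25 + 4 * x + rnorm(50, sd = 0.5)

model <- lm(y ~ x)
res <- resid(model)

# Standard error of the model: the standard deviation of the residuals,
# using n - 2 degrees of freedom for a model with one x-variable
std_error <- summary(model)$sigma

# Hypothetical helper that draws three common diagnostic plots
residual_plots <- function(model) {
  res <- resid(model)
  qqnorm(res)                                 # normality check
  qqline(res)
  plot(res, type = "b",                       # residuals in row (time) order
       xlab = "Row order", ylab = "Residual")
  abline(h = 0, lty = 2)
  plot(fitted(model), res,                    # residuals against predictions
       xlab = "Predicted value", ylab = "Residual")
  abline(h = 0, lty = 2)
}
```

Calling ``residual_plots(model)`` draws the three plots one after the other; for the assignment you would pass in your own ``z2`` model instead of the synthetic one.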


.. question::
    :grading: 6

Operators have noticed differences in the yield from our batch process [g/L] depending on the raw material supplier. You've collected data from the last 12 batches and coded each batch by its raw material's city and country of origin::

    # 1 = València, Spain
    # 2 = Luxembourg, Luxembourg
    # 3 = Utrecht, Netherlands

    country <- c(3, 2, 1, 3, 1, 1, 2, 2, 2, 1, 3, 3)
    yield <- c(72.9, 69.3, 70.8, 79.1, 66.3, 73.3, 65.1, 66.5, 54.9, 74.7, 80.8, 79.3)

Build a linear model that predicts the yield from the country of origin. Make sure you reassign the ``country`` variable as follows::

    country <- as.factor(country)

before you use it in the model (and understand what the ``as.factor(...)`` function does).

#. Interpret the ``Intercept`` term, the ``country2`` slope coefficient and the ``country3`` slope coefficient in your written answer. If you haven't yet discovered and used the ``model.matrix(...)`` command, you will need it here.

#. What have you learned from this model?

#. Is what you have learned still valid when you consider the 95% confidence intervals for the slope coefficients? Explain clearly in your answer.
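
As a starting point for this question, here is a minimal sketch using the data given above. It shows how ``as.factor(...)`` changes what ``lm(...)`` fits, what ``model.matrix(...)`` returns, and how the coefficients can be checked against the average yield for each country. The variable names ``X``, ``group_means``, ``coefs`` and ``ci`` are chosen for illustration only; the interpretation is still yours to write.

```r
country <- c(3, 2, 1, 3, 1, 1, 2, 2, 2, 1, 3, 3)
yield <- c(72.9, 69.3, 70.8, 79.1, 66.3, 73.3, 65.1, 66.5, 54.9, 74.7, 80.8, 79.3)

# Treat the codes as categories (levels "1", "2", "3"), not as numbers
country <- as.factor(country)

model <- lm(yield ~ country)

# The X matrix that lm() actually used: an intercept column plus
# one 0/1 indicator column for each non-baseline level
X <- model.matrix(model)

# Average yield per country, for comparison with the coefficients
group_means <- tapply(yield, country, mean)

coefs <- coef(model)
ci <- confint(model, level = 0.95)   # 95% confidence intervals

print(X)
print(group_means)
print(coefs)
print(ci)
```

Comparing ``coefs`` with ``group_means`` and the columns of ``X`` is one way to work out what each term in the model represents.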


.. question::
    :grading: 5

In a previous assignment you compared the ``TK104`` reactor to the ``TK105`` reactor using the `Brittleness Index dataset <http://datasets.connectmv.com/info/brittleness-index>`_.

#. Repeat the confidence interval calculation for the comparison between the ``TK104`` and ``TK105`` reactors, assuming the variances can be pooled. Report your answer as:

.. math::

    \text{LB} \leq \mu_{105} - \mu_{104} \leq \text{UB}

#. Now build a linear model that uses a single integer variable coded as ``1`` when running the batch in ``TK105``, and coded as ``0`` when running the batch in ``TK104``. The :math:`y`-variable is the brittleness index value.

Prove to yourself that the confidence interval for the integer variable's coefficient is the same as the confidence interval from the first part of the question. Make sure you can explain why this is the case.
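
The comparison between the two approaches can be sketched as below. Note the assumptions: the values in ``tk104`` and ``tk105`` are made-up random numbers, not the Brittleness Index data, and the names ``ci_ttest`` and ``ci_lm`` are illustrative. Replace the synthetic vectors with the real columns (after removing missing values) when you do the assignment.

```r
# Made-up values standing in for the two reactors (NOT the real dataset)
set.seed(1)
tk104 <- rnorm(20, mean = 400, sd = 60)
tk105 <- rnorm(25, mean = 450, sd = 60)

# Part 1: confidence interval for mu_105 - mu_104 with pooled variances
ci_ttest <- t.test(tk105, tk104, var.equal = TRUE, conf.level = 0.95)$conf.int

# Part 2: linear model with an integer variable, 0 = TK104 and 1 = TK105
brittle <- c(tk104, tk105)
d <- c(rep(0, length(tk104)), rep(1, length(tk105)))
model <- lm(brittle ~ d)
ci_lm <- confint(model, level = 0.95)["d", ]

print(ci_ttest)
print(ci_lm)
```

Both intervals are built from the same difference in group averages, the same pooled estimate of the variance and the same degrees of freedom, which is a useful hint when explaining the result.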


.. question::
    :grading: 6

#. Using the data from the previous question, code the integer variable in the linear model as ``0`` when running the batch in ``TK105``, and code it as ``1`` when running the batch in ``TK104``. The :math:`y`-variable is the brittleness index value. Report the slope coefficient and confidence interval. (*This question is mostly a repeat of the previous one*).

#. Now code the integer variable in the linear model as ``1`` when running the batch in ``TK105``, and code it as ``2`` when running the batch in ``TK104``. The :math:`y`-variable is the brittleness index value. Report the slope coefficient and confidence interval. How do the answers compare? Explain any differences or similarities you observe.

#. Now code the integer variable in the linear model as ``-1`` when running the batch in ``TK105``, and code it as ``+1`` when running the batch in ``TK104``. The :math:`y`-variable is the brittleness index value. Report the slope coefficient and confidence interval. How do the answers compare? Explain any differences or similarities you observe.
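
One way to explore the three codings side by side is sketched below. As before, the reactor values are made-up random numbers rather than the real dataset, and ``slope`` is a hypothetical helper written for this illustration. Keep in mind that a slope coefficient is the change in :math:`y` for a one-unit change in the coded variable.

```r
# Made-up values standing in for the two reactors (NOT the real dataset)
set.seed(1)
tk104 <- rnorm(20, mean = 400, sd = 60)
tk105 <- rnorm(25, mean = 450, sd = 60)
brittle <- c(tk104, tk105)
is104 <- c(rep(TRUE, length(tk104)), rep(FALSE, length(tk105)))

# The three codings from this question
d01 <- ifelse(is104, 1, 0)    # TK105 = 0,  TK104 = 1
d12 <- ifelse(is104, 2, 1)    # TK105 = 1,  TK104 = 2
dpm <- ifelse(is104, 1, -1)   # TK105 = -1, TK104 = +1

# Hypothetical helper: slope coefficient for a given coding
slope <- function(d) unname(coef(lm(brittle ~ d))[2])

slopes <- c(d01 = slope(d01), d12 = slope(d12), dpm = slope(dpm))
print(slopes)

# Confidence intervals for each coding
print(confint(lm(brittle ~ d01)))
print(confint(lm(brittle ~ d12)))
print(confint(lm(brittle ~ dpm)))
```

Look at how far apart the two levels are in each coding when you compare the slopes and the intercepts.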


</rst>