Assignment 6 - 2013
Due date(s): 15 March 2013
Assignment questions (PDF)
Assignment solutions (PDF)
Note
Assignment objectives
- Using and interpreting an MLR model with integer variables.
- Using an MLR with integer variables that are at more than 2 levels.
Question 1 [0]
This question is fully solved in the course textbook, Process Improvement using Data. It is therefore worth no credit and will not be graded. However, you are strongly encouraged to complete the question without looking at the answers.
In this question we will use the LDPE data, which come from a high-fidelity simulation of a low-density polyethylene reactor. LDPE reactors are very long, thin tubes. In this particular case the tube is divided into 2 zones, since the feed enters at the start of the tube and again at some point further down the tube (the start of the second zone). There is a temperature profile along the tube, with a certain maximum temperature somewhere along the length. The maximum temperature in zone 1, Tmax1, is reached some fraction z1 along the length; similarly in zone 2 with the Tmax2 and z2 variables.
We will build a linear model to predict the SCB variable, the short chain branching (per 1000 carbon atoms), which is an important quality variable for this product. Note that the last 4 rows of data are known to be from abnormal process operation, when the process started to experience a problem. However, we will pretend we didn't know that when building the model, so keep them in for now.
Use only the following subset of \(x\)-variables: Tmax1, Tmax2, z1 and z2, and the \(y\)-variable, SCB. Show the relationship between these 5 variables in a scatter plot matrix.
Use this code to get you started (make sure you understand what it is doing):
LDPE <- read.csv('http://openmv.net/file/ldpe.csv')
subdata <- data.frame(cbind(LDPE$Tmax1, LDPE$Tmax2, LDPE$z1, LDPE$z2, LDPE$SCB))
colnames(subdata) <- c("Tmax1", "Tmax2", "z1", "z2", "SCB")
Using bullet points, describe the nature of relationships between the 5 variables, and particularly the relationship to the \(y\)-variable.
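As a minimal sketch of the plotting step, the snippet below uses base R's pairs() and cor(). The data frame is a randomly generated stand-in with the same column names (an assumption, made only so the snippet runs offline); in the assignment, pass the subdata object built above instead.

```r
# Stand-in data with the same column names as `subdata` (random values,
# NOT the LDPE data); replace `standin` with the real `subdata`.
set.seed(1)
standin <- data.frame(Tmax1 = rnorm(50), Tmax2 = rnorm(50),
                      z1 = rnorm(50), z2 = rnorm(50), SCB = rnorm(50))

# Scatter plot matrix: one panel for every pair of variables.
pairs(standin)

# Pairwise correlations, as a numeric companion to the plot.
corr <- cor(standin)
print(round(corr, 2))
```

The bottom row (or last column) of the plot matrix shows each \(x\)-variable against SCB, which is the part most relevant to the model.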
Let's start with a linear model between z2 and SCB. We will call this the z2 model. Let's examine its residuals:
- Are the residuals normally distributed?
- What is the standard error of this model?
- Are there any time-based trends in the residuals (the rows in the data are already in time-order)?
- Use any other relevant plots of the predicted values, the residuals, the \(x\)-variable, as described in class, and diagnose the problem with this linear model.
- What can be done to fix the problem?
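The checks listed above could be sketched in base R roughly as follows. The data here are randomly generated stand-ins (an assumption, so the snippet runs without a download); the object names model.z2, res and se are illustrative only. With the real data, fit the model on subdata instead.

```r
# Stand-in data (random, NOT the LDPE data): replace `standin` with `subdata`.
set.seed(2)
standin <- data.frame(z2 = runif(50, 0.4, 1.0))
standin$SCB <- 20 + 5 * standin$z2 + rnorm(50, sd = 0.5)

model.z2 <- lm(SCB ~ z2, data = standin)
res <- resid(model.z2)

# Normality check: points near the reference line suggest normal residuals.
qqnorm(res)
qqline(res)

# Standard error of the model (the residual standard deviation).
se <- summary(model.z2)$sigma
print(se)

# Residuals in row order: look for drifts or runs (rows are in time order).
plot(res, type = "b", xlab = "Row (time order)", ylab = "Residual")
abline(h = 0)

# Residuals against the predicted values and against the x-variable:
# look for outliers, curvature, or changing spread.
plot(fitted(model.z2), res, xlab = "Predicted SCB", ylab = "Residual")
abline(h = 0)
plot(standin$z2, res, xlab = "z2", ylab = "Residual")
abline(h = 0)
```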
Question 2 [6]
Operators have noticed differences in the yield [g/L] from our batch process depending on the raw material supplier. You've collected data from the last 12 batches and coded each batch by the supplier's city and country of origin:
# 1 = València, Spain
# 2 = Luxembourg, Luxembourg
# 3 = Utrecht, Netherlands
country <- c(3, 2, 1, 3, 1, 1, 2, 2, 2, 1, 3, 3)
yield <- c(72.9, 69.3, 70.8, 79.1, 66.3, 73.3, 65.1, 66.5, 54.9, 74.7, 80.8, 79.3)
Build a linear model that predicts the yield from the country of origin. Make sure you reassign the country variable as follows:
country <- as.factor(country)
before you use it in the model (and understand what the as.factor(...) function does).
- Interpret the Intercept term, the country2 slope coefficient and the country3 slope coefficient in your written answer. If you haven't yet discovered and used the model.matrix(...) command, you will need it here.
- What have you learned from this model?
- Is what you have learned still valid when you consider the 95% confidence intervals for the slope coefficients? Explain clearly in your answer.
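One way to set up this model is sketched below, using the data given in the question and R's default treatment contrasts. The names X, ci and level.means are illustrative; the interpretation is left for your written answer.

```r
country <- c(3, 2, 1, 3, 1, 1, 2, 2, 2, 1, 3, 3)
yield <- c(72.9, 69.3, 70.8, 79.1, 66.3, 73.3, 65.1, 66.5, 54.9, 74.7, 80.8, 79.3)

# Treat the codes as category labels rather than as numbers.
country <- as.factor(country)

model <- lm(yield ~ country)

# The design matrix shows how the factor is expanded: an intercept
# column plus 0/1 indicator columns named country2 and country3.
X <- model.matrix(model)
print(X)

# Coefficients and their 95% confidence intervals.
print(coef(model))
ci <- confint(model, level = 0.95)
print(ci)

# Average yield per level, for comparison with the coefficients.
level.means <- tapply(yield, country, mean)
print(level.means)
```

Comparing level.means against the coefficients, row by row of the design matrix, is a useful check on what each term represents.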
Question 3 [5]
In a previous assignment you compared the TK104 reactor to the TK105 reactor using the Brittleness Index dataset.
Repeat the confidence interval calculation for the comparison between the TK104 and TK105 reactors, assuming the variances can be pooled. Report your answer as:
\[\text{LB} \leq \mu_{105} - \mu_{104} \leq \text{UB}\]
Now build a linear model that uses a single integer variable coded as 1 when running the batch in TK105, and coded as 0 when running the batch in TK104. The \(y\)-variable is the brittleness index value.
Prove to yourself that you get the same confidence interval for the integer variable as you do with the regular confidence interval in the first part of the question. Make sure you can explain why this is the case.
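The comparison can be sketched as follows. The brittleness values here are randomly generated placeholders (an assumption; they are NOT the course dataset), so only the mechanics carry over: a pooled-variance t.test(...) interval next to confint(...) on a model with a 0/1 variable.

```r
# Hypothetical brittleness values (random, NOT the course dataset).
set.seed(3)
tk104 <- rnorm(10, mean = 500, sd = 80)
tk105 <- rnorm(12, mean = 450, sd = 80)

# Pooled-variance 95% confidence interval for mu_105 - mu_104.
ci.ttest <- t.test(tk105, tk104, var.equal = TRUE)$conf.int

# The same comparison as a linear model on a 0/1 variable (1 = TK105).
y <- c(tk104, tk105)
d <- c(rep(0, length(tk104)), rep(1, length(tk105)))
model <- lm(y ~ d)
ci.lm <- confint(model)["d", ]

print(ci.ttest)
print(ci.lm)
```

With the real data, substitute the two reactor columns for tk104 and tk105 (dropping any missing values first).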
Question 4 [6]
- Using the data from the previous question, code the integer variable in the linear model as 0 when running the batch in TK105, and code it as 1 when running the batch in TK104. The \(y\)-variable is the brittleness index value. Report the slope coefficient and confidence interval. (This question is mostly a repeat of the previous one).
- Now code the integer variable in the linear model as 1 when running the batch in TK105, and code it as 2 when running the batch in TK104. The \(y\)-variable is the brittleness index value. Report the slope coefficient and confidence interval. How do the answers compare? Explain any differences or similarities you observe.
- Now code the integer variable in the linear model as -1 when running the batch in TK105, and code it as +1 when running the batch in TK104. The \(y\)-variable is the brittleness index value. Report the slope coefficient and confidence interval. How do the answers compare? Explain any differences or similarities you observe.
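A compact way to try the three codings side by side is sketched below. The data are again random placeholders (an assumption, NOT the course dataset), and slope_ci is a hypothetical helper name introduced only for this illustration.

```r
# Hypothetical brittleness values (random, NOT the course dataset).
set.seed(4)
tk104 <- rnorm(10, mean = 500, sd = 80)
tk105 <- rnorm(12, mean = 450, sd = 80)
y <- c(tk104, tk105)
is104 <- c(rep(TRUE, length(tk104)), rep(FALSE, length(tk105)))

# Hypothetical helper: fit y ~ d for a given coding of the two reactors
# and return the slope together with its 95% confidence interval.
slope_ci <- function(code104, code105) {
  d <- ifelse(is104, code104, code105)
  model <- lm(y ~ d)
  c(slope = unname(coef(model)["d"]), confint(model)["d", ])
}

zero.one   <- slope_ci(code104 = 1, code105 = 0)    # 0/1 coding
one.two    <- slope_ci(code104 = 2, code105 = 1)    # 1/2 coding
minus.plus <- slope_ci(code104 = 1, code105 = -1)   # -1/+1 coding

print(rbind(zero.one, one.two, minus.plus))
```

Comparing the rows shows how shifting the coding versus stretching it affects the slope and its interval.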