Assignment 6 - 2013

From Statistics for Engineering
Due date(s): 15 March 2013
(PDF) Assignment questions
(PDF) Assignment solutions

Assignment objectives

  • Using and interpreting an MLR model with integer variables.
  • Using an MLR with integer variables that are at more than 2 levels.

Question 1 [0]

This question is fully solved in the course textbook, Process Improvement using Data. It is therefore worth no credit and will not be graded. However, you are strongly encouraged to complete the question without looking at the answers.

In this question we will use the LDPE data, which comes from a high-fidelity simulation of a low-density polyethylene reactor. LDPE reactors are very long, thin tubes. In this particular case the tube is divided into 2 zones, since the feed enters both at the start of the tube and at a point further down the tube (the start of the second zone). There is a temperature profile along the tube, with a certain maximum temperature somewhere along the length. The maximum temperature in zone 1, Tmax1, is reached at some fraction z1 along the length; similarly in zone 2 with the Tmax2 and z2 variables.

We will build a linear model to predict the SCB variable, the short chain branching (per 1000 carbon atoms), which is an important quality variable for this product. Note that the last 4 rows of data are known to be from abnormal process operation, when the process started to experience a problem. However, we will pretend we didn't know that when building the model, so keep them in for now.

  1. Use only the following subset of \(x\)-variables: Tmax1, Tmax2, z1 and z2, with SCB as the \(y\)-variable. Show the relationships between these 5 variables in a scatter plot matrix.

    Use this code to get you started (make sure you understand what it is doing):

    LDPE <- read.csv('http://openmv.net/file/ldpe.csv')
    subdata <- data.frame(cbind(LDPE$Tmax1, LDPE$Tmax2, LDPE$z1, LDPE$z2, LDPE$SCB))
    colnames(subdata) <- c("Tmax1", "Tmax2", "z1", "z2", "SCB")
    

    Using bullet points, describe the nature of relationships between the 5 variables, and particularly the relationship to the \(y\)-variable.

  2. Let's start with a linear model between z2 and SCB. We will call this the z2 model. Let's examine its residuals:

    1. Are the residuals normally distributed?
    2. What is the standard error of this model?
    3. Are there any time-based trends in the residuals (the rows in the data are already in time-order)?
    4. Use any other relevant plots of the predicted values, the residuals, the \(x\)-variable, as described in class, and diagnose the problem with this linear model.
    5. What can be done to fix the problem?
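
The plots and checks asked for in this question can be sketched as follows. This is only a minimal illustration: the `demo` data frame holds synthetic placeholder values (an assumption, not the LDPE data), and the names `demo`, `model_z2`, `res` and `se` are hypothetical. With the real data, the `subdata` frame built earlier would take the place of `demo`.

```r
# Synthetic stand-in for the z2 and SCB columns (NOT the real LDPE data).
set.seed(42)
demo <- data.frame(z2 = seq(0.5, 1.0, length.out = 50))
demo$SCB <- 25 + 8 * demo$z2 + rnorm(50, sd = 0.3)

# Scatter plot matrix of every column against every other column.
pairs(demo)

# Least squares model with a single x-variable.
model_z2 <- lm(SCB ~ z2, data = demo)
res <- resid(model_z2)

# Standard error of the model (the residual standard error).
se <- summary(model_z2)$sigma
print(se)

# Normality check: q-q plot of the residuals with a reference line.
qqnorm(res)
qqline(res)

# Time-order check: residuals plotted in row order.
plot(res, type = "b", xlab = "Row (time order)", ylab = "Residual")
abline(h = 0, lty = 2)

# Residuals against the predicted values and against the x-variable.
plot(fitted(model_z2), res, xlab = "Predicted SCB", ylab = "Residual")
abline(h = 0, lty = 2)
plot(demo$z2, res, xlab = "z2", ylab = "Residual")
abline(h = 0, lty = 2)
```

The synthetic data are well behaved by construction, so these plots will show nothing unusual; the point is only which plots to draw and which quantity to read off.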

Question 2 [6]

Operators have noticed differences in the yield from our batch process [g/L] depending on the raw material supplier. You've collected data from the last 12 batches and coded each batch by the supplier's city and country of origin:

# 1 = València, Spain
# 2 = Luxembourg, Luxembourg
# 3 = Utrecht, Netherlands

country <- c(3, 2, 1, 3, 1, 1, 2, 2, 2, 1, 3, 3)
yield <- c(72.9, 69.3, 70.8, 79.1, 66.3, 73.3, 65.1, 66.5, 54.9, 74.7, 80.8, 79.3)

Build a linear model that predicts the yield from the country of origin. Make sure you reassign the country variable as follows:

country <- as.factor(country)

before you use it in the model (and understand what the as.factor(...) function does).

  1. Interpret the Intercept term, the country2 slope coefficient and the country3 slope coefficient in your written answer. If you haven't yet discovered and used the model.matrix(...) command, you will need it here.
  2. What have you learned from this model?
  3. Is what you have learned still valid when you consider the 95% confidence intervals for the slope coefficients? Explain clearly in your answer.
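
One possible way to set up this model is sketched below, using the data given above. It assumes R's default treatment contrasts for factors; the names `X`, `model` and `group_means` are introduced here for illustration only. Comparing the printed coefficients with the per-country averages is a useful starting point for the interpretation.

```r
country <- c(3, 2, 1, 3, 1, 1, 2, 2, 2, 1, 3, 3)
yield <- c(72.9, 69.3, 70.8, 79.1, 66.3, 73.3, 65.1, 66.5, 54.9, 74.7, 80.8, 79.3)

# Treat the codes as category labels rather than as numeric values.
country <- as.factor(country)

# Design matrix that lm() builds internally: an intercept column plus
# one 0/1 indicator column for each level except the first.
X <- model.matrix(~ country)
print(X)

# Fit the model; inspect the coefficients and their 95% confidence intervals.
model <- lm(yield ~ country)
print(coef(model))
print(confint(model, level = 0.95))

# Average yield per country, for comparison with the coefficients.
group_means <- tapply(yield, country, mean)
print(group_means)
```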

Question 3 [5]

In a previous assignment you compared the TK104 reactor to the TK105 reactor using the Brittleness Index dataset.

  1. Repeat the confidence interval calculation for the comparison between the TK104 and TK105 reactors, assuming the variances can be pooled. Report your answer as:

    \[\text{LB} \leq \mu_{105} - \mu_{104} \leq \text{UB}\]
  2. Now build a linear model that uses a single integer variable coded as 1 when running the batch in TK105, and coded as 0 when running the batch in TK104. The \(y\)-variable is the brittleness index value.

    Prove to yourself that the confidence interval for the integer variable's coefficient is the same as the regular confidence interval from the first part of the question. Make sure you can explain why this is the case.
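
The comparison between the two approaches can be sketched as below. The brittleness values are hypothetical placeholders (an assumption, not the course dataset), and the names `tk104`, `tk105`, `ci_t` and `ci_lm` are illustrative. The sketch uses `t.test(..., var.equal = TRUE)` for the pooled-variance interval and `confint(...)` for the interval on the integer variable's coefficient.

```r
# Hypothetical brittleness values (NOT taken from the course dataset).
tk104 <- c(410, 455, 390, 520, 470, 430)
tk105 <- c(380, 415, 360, 445, 400, 395, 420)

# Pooled-variance confidence interval for mu_105 - mu_104.
ci_t <- t.test(tk105, tk104, var.equal = TRUE, conf.level = 0.95)$conf.int

# Linear model with an integer variable: 0 for TK104, 1 for TK105.
y <- c(tk104, tk105)
x <- c(rep(0, length(tk104)), rep(1, length(tk105)))
model <- lm(y ~ x)
ci_lm <- confint(model, "x", level = 0.95)

print(ci_t)
print(ci_lm)
```

Both intervals are built from the same difference in means, the same pooled estimate of variance and the same degrees of freedom, which is worth checking by hand when explaining the result.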

Question 4 [6]

  1. Using the data from the previous question, code the integer variable in the linear model as 0 when running the batch in TK105, and code it as 1 when running the batch in TK104. The \(y\)-variable is the brittleness index value. Report the slope coefficient and confidence interval. (This question is mostly a repeat of the previous one.)
  2. Now code the integer variable in the linear model as 1 when running the batch in TK105, and code it as 2 when running the batch in TK104. The \(y\)-variable is the brittleness index value. Report the slope coefficient and confidence interval. How do the answers compare? Explain any differences or similarities you observe.
  3. Now code the integer variable in the linear model as -1 when running the batch in TK105, and code it as +1 when running the batch in TK104. The \(y\)-variable is the brittleness index value. Report the slope coefficient and confidence interval. How do the answers compare? Explain any differences or similarities you observe.
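
The three codings above can be compared side by side with a small helper, sketched here. The data are hypothetical placeholders (an assumption, not the course dataset), and `fit_coding` and `results` are illustrative names that do not come from the course material.

```r
# Hypothetical brittleness values (NOT taken from the course dataset).
tk104 <- c(410, 455, 390, 520, 470, 430)
tk105 <- c(380, 415, 360, 445, 400, 395, 420)
y <- c(tk104, tk105)

# Fit y ~ x for a given pair of codes; return the slope and its 95% CI.
fit_coding <- function(code104, code105) {
  x <- c(rep(code104, length(tk104)), rep(code105, length(tk105)))
  model <- lm(y ~ x)
  c(slope = unname(coef(model)["x"]), confint(model, "x", level = 0.95)[1, ])
}

# One row per coding scheme: slope, lower bound, upper bound.
results <- rbind(
  "TK104 = 1,  TK105 = 0"  = fit_coding(1, 0),
  "TK104 = 2,  TK105 = 1"  = fit_coding(2, 1),
  "TK104 = +1, TK105 = -1" = fit_coding(1, -1)
)
print(results)
```

A helpful way to read the table is to ask what a one-unit change in the integer variable means under each coding.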