Written midterm - 2012

Date: 16 February 2012

• Midterm questions (PDF)
• Midterm solutions (PDF)

Note

• You may bring any printed materials into the midterm: textbooks, papers, etc.
• You may use any calculator during the midterm.
• You may answer the questions in any order in the answer booklet.
• You may use any table of normal distributions and $$t$$-distributions during the midterm, or the copy available in the course notes.
• 400-level students: please answer all the questions except those marked as 600-level questions. You will, however, receive extra credit for answering the 600-level questions.
• 600-level students will be held to a higher level of technical accuracy than 400-level students.
• Total marks: 70 marks for 400-level students; 75 marks for 600-level students.
• Total time for all levels: 2.5 hours

Question 1 [11 = 2 + 2 + 2 + 2 + 3]

A food production facility fills bags with potato chips with an advertised bag weight of 35.0 grams.

1. The government's Weights and Measures Act requires that at most 2.5% of customers may receive a bag containing less than the advertised weight. At what value should you set the target fill weight to meet this requirement exactly? The check-weigher on the bagging system shows the long-term standard deviation for weight is about 1.5 grams.
2. Out of 100 customers, how many are lucky enough to get 40.0 grams or more of potato chips in their bags?
3. What is the current Cpk of this process?
4. What is your assessment regarding this process's capability? Is it capable?
5. If you wanted to change the Cpk to a value of 1.3, what could you change, and to what new value would you change it?
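As a check on the arithmetic for the parts above, the normal-distribution calculations can be sketched in Python with only the standard library. This is an illustrative sketch, not the official solution: it assumes the fill weights are normally distributed with the stated long-term standard deviation of 1.5 g, and that the advertised weight of 35.0 g acts as a lower specification limit.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

sigma = 1.5        # long-term standard deviation [g]
lsl = 35.0         # advertised weight, treated as the lower specification limit
z_0025 = 1.95996   # z-value leaving 2.5% in the lower tail

# Part 1: target fill weight so that exactly 2.5% of bags fall below 35.0 g
target = lsl + z_0025 * sigma
print(round(target, 2))         # ≈ 37.94 g

# Part 2: fraction of customers receiving 40.0 g or more
p_40 = 1 - norm_cdf((40.0 - target) / sigma)
print(round(100 * p_40, 1))     # ≈ 8.5 out of 100 customers

# Part 3: Cpk with only a lower specification limit
cpk = (target - lsl) / (3 * sigma)
print(round(cpk, 2))            # ≈ 0.65

# Part 5: one option for reaching Cpk = 1.3 is to reduce sigma at a fixed target
sigma_new = (target - lsl) / (3 * 1.3)
print(round(sigma_new, 2))      # ≈ 0.75 g
```

Reducing the standard deviation (part 5) keeps the give-away of free product constant; the alternative of raising the target weight would also work but costs more product.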

Question 2 [5 = 2 + 3] (600-level students only)

1. Explain why robust methods are desirable in automatic data analysis systems.
2. What is meant by the break-down point of a robust statistic? Give an example to explain your answer.
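To make the break-down point idea concrete, here is a small Python illustration with invented data: a single corrupted value drags the mean arbitrarily far (0% break-down point), while the median, with its 50% break-down point, is unaffected.

```python
import statistics

# Seven routine sensor readings, one of which gets corrupted by a spike.
clean = [4.9, 5.0, 5.1, 5.0, 4.8, 5.2, 5.0]
corrupted = clean[:-1] + [500.0]   # the last reading is replaced by a spike

print(statistics.mean(clean), statistics.median(clean))          # 5.0  5.0
print(statistics.mean(corrupted), statistics.median(corrupted))  # mean is ruined; median stays 5.0
```

This is why robust statistics suit automatic data analysis systems: no human is present to screen out the spike before the calculation runs.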

Question 3 [16 = 8 + 8]

A motor company is testing their new automated parallel parking assistant. The aim is to use the automatic system to reduce the time taken to safely parallel park in different lengths of parking spaces (short parking space, or a longer parking space), with different models of their cars.

The table below lists the cars and conditions under which the tests were performed. The experiments were performed in a completely randomized order, not the order listed below. The manual parking time is the median time of 12 representative drivers of different skill levels.

Car   Parking length   Automatic time [s]   Manual parking time [s]   Manual $$-$$ Automatic time [s]
-----------------------------------------------------------------------------------------------------
A     Short            35                   44                        9
B     Short            47                   49                        2
C     Short            19                   39                        20
D     Short            22                   41                        19
E     Short            33                   44                        11
A     Long             24                   35                        11
B     Long             35                   39                        4
C     Long             18                   26                        8
D     Long             22                   35                        13
E     Long             28                   39                        11

The following summary calculations have been made for your reference, though you might not need all this information:

• All automatic parkings: mean = 28 s, standard deviation = 9.1 s
• All manual parkings: mean = 39 s, standard deviation = 6.3 s
• All short space parkings: mean = 37 s, standard deviation = 10 s
• All long space parkings: mean = 30 s, standard deviation = 7.5 s
• Differences between manual and automatic times: mean = 11 s, standard deviation = 5.7 s

If required, you may assume that variances can be pooled. All calculations should be at the 95% confidence level. Always state the degrees of freedom you use when reading from statistical tables.

1. Does parking in a short parking space take significantly longer than parking in a long parking space? Show all calculations and list all assumptions. When giving any assumption, either explain why it is reasonable, or explain how you might test whether it is reasonable.
2. Does parking with the automatic system reduce the time when compared to regular, manual drivers? You must clearly justify your choice of statistical test and list all assumptions in your answer.
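The mechanics of the two tests can be sketched from the summary statistics quoted above. This is only a sketch of the calculations, not the full justification the question asks for: part 1 assumes the short and long groups are independent with equal variances (pooling, which the question allows), and part 2 treats each row of the table as a natural pair on the same car and space. The critical values would be read from a $$t$$-table.

```python
from math import sqrt

# Part 1 sketch: pooled two-sample t-test, short spaces vs long spaces.
n1 = n2 = 10
xbar1, s1 = 37.0, 10.0   # all short-space parkings (summary values above)
xbar2, s2 = 30.0, 7.5    # all long-space parkings
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
t_two_sample = (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(t_two_sample, 2))   # compare against a one-sided t-value with 18 df

# Part 2 sketch: paired t-test on the (manual - automatic) differences.
n_d = 10
d_bar, s_d = 11.0, 5.7
t_paired = d_bar / (s_d / sqrt(n_d))
print(round(t_paired, 2))       # compare against a one-sided t-value with 9 df
```

The paired test is the appropriate choice for part 2 because pairing removes the car-to-car and space-length variability that would otherwise inflate the variance of an unpaired comparison.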

Question 4 [11 = 2 + 2 + 3 + 2 + 2]

One criterion for a "clean room" is to maintain a positive differential pressure; i.e. the pressure in the room must exceed the pressure on the other side of the door. This is to prevent contaminants from entering the room via any gaps around the door. Differential pressure gauges are commercially available for this purpose and can be made to record their data automatically.

1. Which process monitoring chart would you choose to monitor whether the differential pressure remains at the target value of +30 Pascal? Explain your choice.
2. Give an example of a type II error in the context of this example.
3. In general, in the context of process monitoring, what can you do if too many type II errors occur on a particular monitoring chart?
4. Explain what is meant by a process being "in a state of statistical control".
5. Explain what a "special cause" is in the context of the above example.

Question 5 [10 = 2 + 2 + 2 + 2 + 2]

Our company reports the following confidence interval for the amount of sulphur dioxide, measured in parts per billion (ppb), that we send into the atmosphere.

$123.6\,\text{ppb} \leq \mu \leq 240.2\,\text{ppb}$

Only $$n=21$$ raw data points (one data point measured per day) were used to calculate that 90% confidence interval. A $$z$$-value would have been calculated as an intermediate step to get the final confidence interval, where $$z = \displaystyle \frac{\overline{x} - \mu}{s / \sqrt{n}}$$.

1. What assumptions were made about those 21 raw data points to compute the above confidence interval?
2. Which lower and upper critical values would have been used for $$z$$? That is, which critical values are used before unpacking the final confidence interval as shown above.
3. What is the standard deviation, $$s$$, of the raw data?
4. Today's sulphur dioxide reading is 460 ppb and your manager wants to know what's going on; you can quickly calculate the probability of seeing a value of 460 ppb, or greater, to help judge the severity of the pollution. How many days in a 365 calendar-day year are expected to show a sulphur dioxide value of 460 ppb or higher?
5. Explain clearly why a wide confidence interval is not desirable, from an environmental perspective.
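The numerical parts above amount to unpacking the reported interval. The sketch below assumes the interval is symmetric about the sample mean and, as the question states, was built with a $$z$$ critical value (here ±1.6449 for 90% confidence); it also reuses the interval's centre and the computed $$s$$ as plug-in estimates for part 4, which is an approximation rather than an exact tail probability.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 21
lo, hi = 123.6, 240.2
z_crit = 1.6449                 # critical z-value for a 90% interval

x_bar = (lo + hi) / 2           # centre of the interval = sample mean, 181.9 ppb
half_width = (hi - lo) / 2      # 58.3 ppb

# Part 3: unpack s from half_width = z_crit * s / sqrt(n)
s = half_width * sqrt(n) / z_crit
print(round(s, 1))              # ≈ 162.4 ppb

# Part 4: expected days per year at 460 ppb or higher, assuming normality
p_high = 1 - norm_cdf((460 - x_bar) / s)
print(round(365 * p_high, 1))   # ≈ 16 days
```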

Question 6 [15 = 2 + 2 + 2 + 2 + 2 + 3 + 2]

The mass of steam required to heat a building can be related to the average ambient temperature. Being able to predict the mass of steam required, $$s$$, when given the ambient temperature, $$T$$, can help in energy planning, and ultimately lead to energy reduction.

The table below lists the mass of steam produced [tons] in the past, together with the corresponding average outside temperature, recorded in Kelvin over a 2-hour period.

Temperature = $$T$$ [Kelvin]    267  268  272  273  278  281  283  288  289  293  296
Steam produced = $$s$$ [tons]   220  251  211  210  155  152  122  157  100   64   58

The following calculations have already been performed for you:

• Number of samples, $$n = 11$$
• Temperature: mean = $$\overline{T} = 281$$ K and standard deviation is 10.0 K
• Average steam produced, $$\overline{s} = 155$$ tons, and standard deviation is 64.4 tons.

The modified output from a certain statistical package is:

Coefficients:
                  Value   Standard Error
----------------------------------------
(Intercept)   1871.6936         183.2183
T               -6.1168           0.6523

Residual standard error: ____ on ____ degrees of freedom
Multiple R-squared: ____


A portion of the analysis of variance table is given below:

      Analysis of Variance
--------------------------------
Source            Sum of Squares
--------------------------------
Due to the model           37572
Due to error                3845
--------------------------------
Total                      41417

1. What is the interpretation of the intercept? Is it a useful piece of knowledge derived from the model?
2. What is the interpretation of the slope coefficient? Is it a useful piece of knowledge derived from the model?
3. What is the multiple $$R^2$$ value that would have been calculated for this model?
4. What is the standard error, $$S_E$$, value that would have been calculated for this model?
5. How would you interpret your calculated standard error value, and what assumptions are required to match your interpretation?
6. Give a confidence interval for the slope coefficient and interpret what it means.
7. Which other input variable might be added to the linear model to help improve the model's prediction ability?
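The blanked-out entries in the software output can be recovered from the ANOVA table and the coefficient summary. The sketch below shows one way to do so; the $$t$$ critical value for the slope's confidence interval is hard-coded from a $$t$$-table, and the interpretation and assumptions asked for in parts 5 and 6 still have to be written out in words.

```python
from math import sqrt

n = 11
ss_model, ss_error = 37572.0, 3845.0
ss_total = ss_model + ss_error       # 41417

# Part 3: multiple R-squared = fraction of total variation explained
r_squared = ss_model / ss_total
print(round(r_squared, 3))           # ≈ 0.907

# Part 4: residual standard error on n - 2 = 9 degrees of freedom
se = sqrt(ss_error / (n - 2))
print(round(se, 1))                  # ≈ 20.7 tons

# Part 6: 95% confidence interval for the slope coefficient
b1, se_b1 = -6.1168, 0.6523
t_crit = 2.262                       # t-value at 95% confidence, 9 df
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
print(tuple(round(v, 2) for v in ci))   # ≈ (-7.59, -4.64)
```

Since the interval excludes zero, the slope is significantly different from zero: each 1 K rise in ambient temperature is associated with roughly 4.6 to 7.6 fewer tons of steam.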

Question 7 [7 = 2 + 3 + 2]

In the course notes, in the section on comparing differences between two groups, we used, without proof, the fact that:

$\mathcal{V}\left\{\bar{x}_B - \bar{x}_A\right\} = \mathcal{V}\left\{\bar{x}_B\right\} + \mathcal{V}\left\{\bar{x}_A\right\}$

Using the fact that $$\mathcal{V}\{cx\} = c^2\mathcal{V}\{x\}$$ with $$c = -1$$, you can show that the difference has the same variance as the sum:

$\mathcal{V}\left\{\bar{x}_B - \bar{x}_A\right\} = \mathcal{V}\left\{\bar{x}_B + \bar{x}_A\right\} = \mathcal{V}\left\{\bar{x}_B\right\} + \mathcal{V}\left\{\bar{x}_A\right\}$
1. The first equation is only correct when an important assumption is true; what is that assumption?

2. Based on an actual industrial problem: A filling machine doses a drug to a canister. The patient will inhale the drug (imagine an asthma pump). The weight of the drug in the canister must be added as precisely and accurately as possible, to avoid patient over- or under-dosing.

The weight filled fluctuates with the temperature in the building, and this is theoretically calculated to contribute a standard deviation of 32 mg under typical temperature variations. The filling line has 6 machines that fill the canisters, and the machine-to-machine variability is 40 mg. The operators calibrate the machines at the start of each shift, and their calibration accuracy is estimated at 15 mg. Wear and tear on the machine parts over the year is estimated to add only an extra 10 mg of variation.

What is the expected long-term standard deviation of fill weights recorded from this process? What assumption(s) do you have to make to calculate this?

3. Continuing the above question: assume the long-term standard deviation of fill weights was 40 mg (not the correct answer to the previous question). What is the process capability ratio if the filler operates midway between the upper and lower specification limits, where USL = 1200 mg and LSL = 800 mg?
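The calculations for parts 2 and 3 follow directly from the variance identity this question opens with. The sketch below assumes the four variability sources are independent, so their variances (not their standard deviations) add; that independence assumption is exactly what part 2 asks you to state.

```python
from math import sqrt

# Part 2 sketch: independent variance sources add as variances.
sources_mg = [32, 40, 15, 10]   # temperature, machine-to-machine, calibration, wear
sigma_total = sqrt(sum(s**2 for s in sources_mg))
print(round(sigma_total, 1))    # ≈ 54.3 mg, assuming the four sources are independent

# Part 3: process capability ratio for a process centred between the limits.
usl, lsl, sigma = 1200, 800, 40
cp = (usl - lsl) / (6 * sigma)
print(round(cp, 2))             # ≈ 1.67
```

Note the common pitfall the sketch guards against: simply summing 32 + 40 + 15 + 10 = 97 mg would badly overstate the long-term standard deviation.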