# Practice questions

The questions below are from previous McMaster University exams, when the course was taught by Dr. John MacGregor and other instructors. We have covered much of the same material, but a few topics were not covered, so don't expect to be able to answer all questions.

# Univariate statistics

## April 1997 [12 out of 100; 3 hour exam]

Two different analytical tests can be used to determine the impurity level in steel alloys. Eight specimens are tested using both procedures, and the results are shown in the following tabulation. Is there sufficient evidence to conclude that the tests give different mean impurity levels?

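| Specimen | Test 1 | Test 2 |
|----------|--------|--------|
| 1 | 1.2 | 1.4 |
| 2 | 1.3 | 1.7 |
| 3 | 1.5 | 1.5 |
| 4 | 1.4 | 1.3 |
| 5 | 1.7 | 2.0 |
| 6 | 1.8 | 2.1 |
| 7 | 1.4 | 1.7 |
| 8 | 1.3 | 1.6 |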
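Because every specimen is measured by both tests, a paired comparison is the natural analysis: specimen-to-specimen variation cancels in the differences. The following is a minimal sketch using only the Python standard library. It assumes the differences are independent and approximately normal, and the critical value 2.365 is hard-coded from a t-table rather than computed.

```python
import math
import statistics

# Impurity levels from the table above, one pair per specimen
test1 = [1.2, 1.3, 1.5, 1.4, 1.7, 1.8, 1.4, 1.3]
test2 = [1.4, 1.7, 1.5, 1.3, 2.0, 2.1, 1.7, 1.6]

def paired_t(a, b):
    """Return the t statistic for the mean of (b - a) and its degrees of freedom."""
    d = [bi - ai for ai, bi in zip(a, b)]
    n = len(d)
    se = statistics.stdev(d) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(d) / se, n - 1

t_stat, dof = paired_t(test1, test2)
T_CRIT = 2.365  # two-sided 5% point of t with 7 degrees of freedom (from tables)
print(f"t = {t_stat:.2f} with {dof} df; reject equal means: {abs(t_stat) > T_CRIT}")
```

With these data the statistic is about 3.5, which exceeds 2.365. Under the stated assumptions, the two tests appear to give different mean impurity levels.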

## April 1998 [12 out of 100; 3 hour exam]

Ten individuals have participated in a diet-modification program designed to stimulate weight loss. Their weight both before and after participation in the program is shown in the following list. Is there evidence to support the claim that this particular diet-modification program is effective in reducing weight?

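| Individual | Before | After |
|------------|--------|-------|
| 1 | 195 | 187 |
| 2 | 213 | 195 |
| 3 | 247 | 221 |
| 4 | 201 | 190 |
| 5 | 187 | 175 |
| 6 | 210 | 197 |
| 7 | 215 | 199 |
| 8 | 246 | 221 |
| 9 | 294 | 278 |
| 10 | 310 | 285 |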
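The claim is directional ("effective in reducing weight"), so a one-sided paired test on the weight losses is one reasonable analysis. The sketch below assumes the losses are independent and approximately normal, and it hard-codes the one-sided 5% critical value from a t-table.

```python
import math
import statistics

before = [195, 213, 247, 201, 187, 210, 215, 246, 294, 310]
after = [187, 195, 221, 190, 175, 197, 199, 221, 278, 285]

# Weight loss per individual: positive values mean the person lost weight
loss = [b - a for b, a in zip(before, after)]
n = len(loss)
mean_loss = statistics.mean(loss)
se = statistics.stdev(loss) / math.sqrt(n)
t_stat = mean_loss / se

# H0: mean loss <= 0 versus H1: mean loss > 0
T_CRIT_ONE_SIDED = 1.833  # upper 5% point of t with 9 degrees of freedom (from tables)
print(f"mean loss = {mean_loss}, t = {t_stat:.2f}, effective: {t_stat > T_CRIT_ONE_SIDED}")
```

The mean loss is 17 and the t statistic is about 8.4. A passive before/after comparison like this cannot, on its own, rule out other causes of the weight change.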

# Process monitoring

## April 1996 [8 out of 100; 3 hour exam]

1. Explain what is meant by common cause variation in process monitoring.
2. If a Shewhart chart on individual observations had control limits at 6.7 and 7.9, and the specification limits were 6.5 and 8.5, what is the process capability as measured by Cpk?
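For question 2, assume the chart is on individual values with limits at $$\bar{x} \pm 3\sigma$$. The limits then give both the process mean and $$\sigma$$, and $$C_{pk} = \min(\text{USL} - \bar{x}, \bar{x} - \text{LSL}) / 3\sigma$$. The sketch below simply encodes that arithmetic.

```python
def cpk_from_control_limits(lcl, ucl, lsl, usl):
    """Cpk, assuming the Shewhart limits sit at the process mean +/- 3 sigma."""
    mean = (lcl + ucl) / 2          # center line of the chart
    sigma = (ucl - lcl) / 6         # limits are 6 sigma apart
    return min(usl - mean, mean - lsl) / (3 * sigma)

cpk = cpk_from_control_limits(lcl=6.7, ucl=7.9, lsl=6.5, usl=8.5)
print(f"Cpk = {cpk:.2f}")
```

Under this assumption the mean is 7.3, $$\sigma = 0.2$$, and the lower specification limit is the closer one, so $$C_{pk} = 0.8/0.6 \approx 1.33$$.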

## April 1995 [5 out of 100; 3 hour exam]

Discuss how to set up a Shewhart chart, and the basic concepts behind how it should be used for process monitoring.
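A common way to set up a chart on individual values is as follows. Collect phase 1 data from a period of stable operation, estimate the center line and $$\sigma$$, and place limits at $$\pm 3\sigma$$. New observations are then plotted against those fixed limits. The sketch below assumes subgroups of size 1 and estimates $$\sigma$$ from the average moving range divided by $$d_2 = 1.128$$. It is one standard choice, not necessarily the procedure expected in the exam, and the function names are illustrative.

```python
import statistics

D2_N2 = 1.128  # bias-correction constant d2 for moving ranges of 2 consecutive points

def individuals_chart_limits(phase1):
    """Return (LCL, center line, UCL) for a Shewhart chart on individual values.

    Sigma comes from the average moving range, which is less inflated by slow
    drifts in the phase 1 data than the overall standard deviation is.
    """
    moving_ranges = [abs(b - a) for a, b in zip(phase1, phase1[1:])]
    sigma_hat = statistics.mean(moving_ranges) / D2_N2
    center = statistics.mean(phase1)
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat

def out_of_control(points, limits):
    """Indices of new observations that fall outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, x in enumerate(points) if x < lcl or x > ucl]
```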

# Least squares modelling

## April 1995 [5+5+6 out of 100; 3 hour exam]

1. During a study, two responses, $$y_1$$ and $$y_2$$, are measured during each experimental run. There is no information about the variance-covariance matrix of the measurement errors. How would you go about estimating the parameters in the two models?

$\begin{split}y_1 &= f_1(\beta_1, \beta_2) + e_1 \\ y_2 &= f_2(\beta_1, \beta_2) + e_2\end{split}$

Discuss any analysis or checks that one should do prior to the final estimation stage.

2. What are some of the potential benefits that can sometimes be derived from transforming the data before applying standard statistical techniques?

3. Why is ordinary least squares sometimes a poor estimation method when the $$x$$-variables are highly correlated with one another? Suggest an alternative approach to obtain improved estimates of the parameters. Explain why your approach may give better estimates.
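For question 1, one standard approach when the error covariance matrix is unknown is the Box-Draper determinant criterion. It chooses $$\beta_1, \beta_2$$ to minimize the determinant of the matrix of residual sums of squares and cross-products. A useful check beforehand is to look for linear dependencies among the responses: exactly dependent responses make this determinant zero for every $$\beta$$. The sketch below computes the objective for two responses. Minimizing it would be handed to a general-purpose optimizer, which is not shown, and the function name is illustrative.

```python
def box_draper_objective(e1, e2):
    """Determinant of the 2x2 matrix of residual sums of squares and cross-products.

    e1, e2: residuals y1 - f1(beta) and y2 - f2(beta) from the same runs.
    Minimizing this over (beta1, beta2) is the determinant criterion.
    """
    s11 = sum(a * a for a in e1)
    s22 = sum(b * b for b in e2)
    s12 = sum(a * b for a, b in zip(e1, e2))
    return s11 * s22 - s12 ** 2
```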

## April 1995 [15 out of 100; 3 hour exam]

Six experiments were performed to assess the effects of the density ($$x_1$$) and melt index ($$x_2$$) of polyethylene chips on the quality ($$y$$) of an extruded part. It is hypothesized that, over the ranges considered, the effects are linear. The data are mean centered, so a model of the form $$y = b_1x_1 + b_2x_2 + e$$ should hold. The sums of squares and cross products of the data are given below.

$\begin{split}\sum{x_1^2} &= 7.0 \\ \sum{x_2^2} &= 3.0 \\ \sum{x_1x_2} &= 4.0 \\ \sum{x_1y} &= 38.0 \\ \sum{x_2y} &= 30.0\end{split}$

1. Obtain the least squares estimates of the parameters.
2. What is the correlation between the two parameter estimates?
3. Given that the residual sum of squares is equal to 1.40, do these data show that product quality ($$y$$) depends upon the density ($$x_1$$)? State any assumptions you are making.
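Parts 1 and 2 reduce to the $$2 \times 2$$ normal equations $$(X^TX)\,b = X^Ty$$, using only the sums given. The sketch below assumes no intercept term because the data are mean-centered. It leaves out the t-test in part 3, since that depends on how many residual degrees of freedom you assign after mean-centering.

```python
import math

# Sums of squares and cross-products from the question (mean-centered data)
sxx11, sxx22, sxx12 = 7.0, 3.0, 4.0
sx1y, sx2y = 38.0, 30.0

# Explicit inverse of the 2x2 matrix X^T X
det = sxx11 * sxx22 - sxx12 ** 2
inv = [[sxx22 / det, -sxx12 / det],
       [-sxx12 / det, sxx11 / det]]

# Least squares estimates b = (X^T X)^{-1} X^T y
b1 = inv[0][0] * sx1y + inv[0][1] * sx2y
b2 = inv[1][0] * sx1y + inv[1][1] * sx2y

# Var(b) = sigma^2 (X^T X)^{-1}, so sigma^2 cancels in the correlation
corr_b1_b2 = inv[0][1] / math.sqrt(inv[0][0] * inv[1][1])
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}, corr(b1, b2) = {corr_b1_b2:.3f}")
```

This gives $$b_1 = -1.2$$, $$b_2 = 11.6$$ and a correlation of $$-4/\sqrt{21} \approx -0.87$$ between the estimates.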

## April 1997 [15 out of 100; 3 hour exam]

Fit the model $$y = b_1x + b_2/x$$ to the following data by least squares:

| $$x$$ | 1 | 2 | 3 | 4 | 5 |
|-------|---|---|---|---|---|
| $$y$$ | 2.5 | 4.2 | 6.2 | 8.1 | 10.1 |

Is there evidence that the $$b_2/x$$ term is necessary?
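A sketch of the least-squares fit follows. The model has no intercept, so the two regressors are $$x$$ and $$1/x$$. The t-test on $$b_2$$ assumes independent, normally distributed errors with constant variance, and the critical value is hard-coded from a t-table.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2.5, 4.2, 6.2, 8.1, 10.1]

# Regressors for the no-intercept model y = b1*x + b2/x
z1 = x
z2 = [1 / xi for xi in x]

# Elements of X^T X and X^T y
a11 = sum(u * u for u in z1)
a12 = sum(u * v for u, v in zip(z1, z2))
a22 = sum(v * v for v in z2)
c1 = sum(u * yi for u, yi in zip(z1, y))
c2 = sum(v * yi for v, yi in zip(z2, y))

# Solve the 2x2 normal equations
det = a11 * a22 - a12 ** 2
b1 = (a22 * c1 - a12 * c2) / det
b2 = (a11 * c2 - a12 * c1) / det

# Residual variance with n - p = 5 - 2 = 3 degrees of freedom
residuals = [yi - (b1 * u + b2 * v) for yi, u, v in zip(y, z1, z2)]
s2 = sum(e * e for e in residuals) / (len(y) - 2)

# Standard error of b2 from the (2, 2) element of (X^T X)^{-1}, which is a11 / det
se_b2 = math.sqrt(s2 * a11 / det)
t_b2 = b2 / se_b2
T_CRIT = 3.182  # two-sided 5% point of t with 3 degrees of freedom (from tables)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, t(b2) = {t_b2:.1f}, b2/x needed: {abs(t_b2) > T_CRIT}")
```

The estimates come out near $$b_1 \approx 2.00$$ and $$b_2 \approx 0.49$$, and the t statistic for $$b_2$$ is far above 3.182. A residual plot is still worth checking before trusting the test with only 3 degrees of freedom.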

# Design and Analysis of Experiments

Note: any questions from chapters 5 and 6 of the 2nd edition of Box, Hunter, and Hunter, or from chapters 10, 11, 12 and 13 of the 1st edition, are excellent practice questions for solving real DOE problems.

## April 1996 [10 out of 100; 3 hour exam]

What blocking scheme would you recommend if it were necessary to run a $$2^4$$ design in four blocks of four runs each?
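One possible scheme confounds the two three-factor interactions 123 and 234 with blocks. Other generator pairs work equally well, so treat this as a sketch. Their product, 14, is then also confounded with blocks. That is unavoidable: four blocks confound three effects, and if two of them are three-factor interactions their product is a two-factor interaction. The code labels each run of the full $$2^4$$ by the signs of the two generators.

```python
from itertools import product

# Full 2^4 factorial in coded units; each run is (x1, x2, x3, x4)
runs = list(product([-1, 1], repeat=4))

def block_of(run):
    """Block label from the signs of the assumed block generators 123 and 234.

    Their product, 14, is confounded with blocks as well.
    """
    x1, x2, x3, x4 = run
    return (x1 * x2 * x3, x2 * x3 * x4)

blocks = {}
for run in runs:
    blocks.setdefault(block_of(run), []).append(run)

for label, members in sorted(blocks.items()):
    print(label, members)
```

Within each block every main effect is balanced (two runs at each level), so main effects remain clear of the block differences.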

## April 1996 [14 out of 100; 3 hour exam]

In a chemical plant there are two "identical" reactors that can be run in parallel. Raw materials from a common source are fed continuously to the reactors, but the composition of the raw materials is known to vary or drift with time. The standard operating temperature of these reactors is 120 °C, but there is a proposal to use 140 °C. It is claimed that this increased temperature should improve the yield of the process.

1. Set up an experimental design over a six-day period to test the hypothesis that the increased temperature will improve the yield. Be specific in your instructions to the plant operator. She will run the experiments exactly as instructed. Both reactors can be operated each day but each can only be operated at one temperature on a given day.
2. Show how you would analyze the results from your experimental program.
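One possible design treats each day as a block, giving a paired comparison, and randomizes which reactor runs at 140 °C. It adds the restriction that each reactor runs hot on three of the six days, so any difference between reactors is balanced. The reactor labels A and B and the seeded random generator are illustrative assumptions. The six within-day differences (140 °C minus 120 °C) could then be analyzed with a one-sided paired t-test.

```python
import random

def randomized_plan(days=6, seed=None):
    """Return one (day, {reactor: temperature}) entry per day.

    Each day is a block containing both temperatures, so drift in the raw
    material cancels in the within-day yield difference. Each reactor runs at
    140 °C on half of the days, in a random order.
    """
    if days % 2:
        raise ValueError("use an even number of days to balance the reactors")
    rng = random.Random(seed)
    hot = ["A", "B"] * (days // 2)  # reactor assigned to 140 °C on each day
    rng.shuffle(hot)
    plan = []
    for day, reactor in enumerate(hot, start=1):
        other = "B" if reactor == "A" else "A"
        plan.append((day, {reactor: 140, other: 120}))
    return plan

for day, settings in randomized_plan(seed=1):
    print(f"Day {day}: reactor A at {settings['A']} °C, reactor B at {settings['B']} °C")
```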

## April 1996 [4 out of 100; 3 hour exam]

A young engineer has passively observed a reactor unit for several months and noticed that whenever the feedrate was low the material produced was of rather low quality. He, therefore, decided to improve the product quality by operating the reactor at an increased feedrate only to find that the quality was not improved. Provide a possible explanation for this.

## April 1996 [20 out of 100; 3 hour exam]

1. Construct a $$2^{8-4}$$ fractional factorial design with as high a resolution as possible.
2. What are the generators of your design?
3. What is confounded with the main effect of variable 7 in your design?
4. What is confounded with the two-factor interaction 12 in your design?
5. Suppose that the runs will have to be performed on two different days, and you feel that a day-to-day effect may be important. Make a recommendation on how to proceed.
6. How would you change your design if all the factors could not be run together at their high levels?

[Note: in an exam from another year the question was the same, except you were asked to construct a $$2^{7-3}$$ fractional factorial].
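The sketch below builds a 16-run $$2^{8-4}$$ design from an assumed, commonly used set of generators: 5 = 234, 6 = 134, 7 = 123, 8 = 124. It then checks that no main effect shares a column with a two-factor interaction, which means the resolution is at least IV. A second helper lists which two-factor interactions share a column with a given one.

```python
from itertools import combinations, product

# Base 2^4 full factorial in standard order: factor 1 changes fastest
base = [(x1, x2, x3, x4) for x4, x3, x2, x1 in product([-1, 1], repeat=4)]

# Assumed generators: 5 = 234, 6 = 134, 7 = 123, 8 = 124
design = [(x1, x2, x3, x4, x2 * x3 * x4, x1 * x3 * x4, x1 * x2 * x3, x1 * x2 * x4)
          for x1, x2, x3, x4 in base]

def column(j):
    """Column of signs for factor j (1-based) across the 16 runs."""
    return tuple(run[j - 1] for run in design)

def interaction(i, j):
    """Column of signs for the two-factor interaction ij."""
    return tuple(a * b for a, b in zip(column(i), column(j)))

def main_effects_clear_of_2fi():
    """True if no main effect is aliased with any two-factor interaction."""
    mains = [column(j) for j in range(1, 9)]
    negs = [tuple(-v for v in m) for m in mains]
    for i, j in combinations(range(1, 9), 2):
        ij = interaction(i, j)
        if ij in mains or ij in negs:
            return False
    return True

def two_factor_aliases(i, j):
    """Other two-factor interactions with the same column as ij (sign ignored)."""
    target = interaction(i, j)
    neg = tuple(-v for v in target)
    return [(k, l) for k, l in combinations(range(1, 9), 2)
            if (k, l) != (i, j) and interaction(k, l) in (target, neg)]
```

With these generators, `two_factor_aliases(1, 2)` returns `[(3, 7), (4, 8), (5, 6)]`, that is, 12 = 37 = 48 = 56.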

## April 1996 [5 out of 100; 3 hour exam]

Discuss the differences between "one at a time" experimental design and a response surface methodology approach.

## April 1996 [5+5+5 out of 100; 3 hour exam]

One objective an engineer has is to produce a product with a minimum quantity of impurities. There are two variables of concern, temperature and concentration. An initial $$2^2$$ factorial design was run, giving the following results:

| Temperature | Concentration | $$y$$ |
|-------------|---------------|-------|
| 130 | 20 | 52 |
| 160 | 20 | 71 |
| 130 | 40 | 42 |
| 160 | 40 | 65 |

1. Which effects are significant, using an estimate of $$\sigma^2$$ from the replicated center-point measurements 59, 59, 61, 62, 62?
2. Suggest several new experiments to be run in order to achieve the stated objectives. Give temperature and concentration settings.
3. What is a second order design and when should it be used? Give some settings of temperature and concentration to convert the above design into a second order design.
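For question 1, the sketch below computes the three effects and uses the center points as the pure-error estimate of $$\sigma^2$$ (4 degrees of freedom). It assumes the center-point variance applies to the factorial runs too, and hard-codes the t critical value from tables.

```python
import math
import statistics

# Runs from the table: (temperature, concentration, impurity y)
runs = [(130, 20, 52), (160, 20, 71), (130, 40, 42), (160, 40, 65)]
center = [59, 59, 61, 62, 62]

def coded(value, low, high):
    """Map a natural setting to -1/+1 coded units."""
    return (2 * value - (low + high)) / (high - low)

xT = [coded(t, 130, 160) for t, _, _ in runs]
xC = [coded(c, 20, 40) for _, c, _ in runs]
y = [yi for _, _, yi in runs]

def effect(signs):
    """Average y at the + level minus average y at the - level."""
    plus = [yi for s, yi in zip(signs, y) if s > 0]
    minus = [yi for s, yi in zip(signs, y) if s < 0]
    return statistics.mean(plus) - statistics.mean(minus)

effects = {
    "T": effect(xT),
    "C": effect(xC),
    "TC": effect([a * b for a, b in zip(xT, xC)]),
}

# Pure-error variance from the 5 center points (4 degrees of freedom)
s2 = statistics.variance(center)
# Each effect is a difference of two means of 2 runs: Var = sigma^2/2 + sigma^2/2
se_effect = math.sqrt(s2)
T_CRIT = 2.776  # two-sided 5% point of t with 4 degrees of freedom (from tables)
for name, value in effects.items():
    print(f"{name}: {value:+.1f}, significant: {abs(value) > T_CRIT * se_effect}")
```

Temperature (+21) and concentration (-8) exceed the $$\pm 4.2$$ interval, while the interaction (+2) does not. Comparing the center-point average with the factorial average would be a natural next check for curvature.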

## April 1996 [6 out of 100; 3 hour exam]

You intended to perform a full $$2^3$$ factorial design, but two values were lost as follows.

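| Experiment | $$x_1$$ | $$x_2$$ | $$x_3$$ | $$y$$ |
|------------|---------|---------|---------|-------|
| 1 | $$-$$ | $$-$$ | $$-$$ | 3 |
| 2 | $$+$$ | $$-$$ | $$-$$ | 5 |
| 3 | $$-$$ | $$+$$ | $$-$$ | 4 |
| 4 | $$+$$ | $$+$$ | $$-$$ | 6 |
| 5 | $$-$$ | $$-$$ | $$+$$ | 3 |
| 6 | $$+$$ | $$-$$ | $$+$$ | NA |
| 7 | $$-$$ | $$+$$ | $$+$$ | NA |
| 8 | $$+$$ | $$+$$ | $$+$$ | 6 |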

What effects can you estimate from this data? Do you have to make any assumptions to interpret the results?

## April 1997 [6 out of 100; 3 hour exam]

Discuss the role of randomization in the design of experiments.

## April 1997 [6 out of 100; 3 hour exam]

Discuss the difference between correlation among variables and causal relationships. How can one distinguish between these?

## April 1998 [23 out of 100; 3 hour exam]

1. Design an eight-run fractional factorial design for an experimenter with the following five variables: temperature, concentration, pH, agitation rate and catalyst type (A or B). He tells you he is particularly concerned about the two-factor interactions between temperature and concentration, and between catalyst type and temperature. He would like a design, if it is possible to construct one, such that these two two-factor interactions are unconfounded with one another. Suggest a design.
2. Suppose that, after running your design over suitable ranges of the variables, the experimenter obtained the response measurements [5, 9, 10, 8, 3, 10, 14, 7] for the 8 runs in standard order. Estimate whatever effects you can.
3. Assuming that an estimate of the error variance of the response, $$s^2 = 1.1$$ with 10 degrees of freedom, is available from past records, are any of the effects in the second part of this question significantly different from zero at the 5% level of significance?
4. If an estimate of $$\sigma^2$$ were not available, how would you assess the statistical significance of the effects?
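For an eight-run design in standard order, the seven contrasts do not depend on which generators you chose; only their interpretation (the alias strings) does. The sketch below computes the contrasts from the base $$2^3$$ columns A, B and C and labels them by those base columns. Mapping them onto temperature, concentration, pH, agitation and catalyst, and their aliases, needs the generators of your own design, which the code does not assume.

```python
import math
from itertools import product

y = [5, 9, 10, 8, 3, 10, 14, 7]  # responses in standard order

# Base 2^3 columns A, B, C in standard order (A changes fastest)
runs = [(a, b, c) for c, b, a in product([-1, 1], repeat=3)]
columns = {
    "A": [a for a, _, _ in runs],
    "B": [b for _, b, _ in runs],
    "C": [c for _, _, c in runs],
}
columns["AB"] = [a * b for a, b in zip(columns["A"], columns["B"])]
columns["AC"] = [a * c for a, c in zip(columns["A"], columns["C"])]
columns["BC"] = [b * c for b, c in zip(columns["B"], columns["C"])]
columns["ABC"] = [ab * c for ab, c in zip(columns["AB"], columns["C"])]

# Effect = (sum at + level - sum at - level) / (n / 2)
n = len(y)
effects = {name: sum(s * yi for s, yi in zip(col, y)) / (n / 2)
           for name, col in columns.items()}

# Part 3: s^2 = 1.1 from past records with 10 df; Var(effect) = 4 sigma^2 / n
se = math.sqrt(4 * 1.1 / n)
T_CRIT = 2.228  # two-sided 5% point of t with 10 degrees of freedom (from tables)
for name, value in effects.items():
    flag = "significant" if abs(value) > T_CRIT * se else ""
    print(f"{name:>3}: {value:+.2f} {flag}")
```

With these data the contrasts labeled B (+3.0), AB (-5.0) and ABC (-2.0) fall outside the $$\pm 1.65$$ interval. Their meaning depends on the aliasing in your design.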

## April 1998 [8 out of 100; 3 hour exam]

Suppose that from a $$2^2$$ factorial experiment the estimated effects of temperature (T) and pressure (P) on the process yield, $$y$$, are $$\beta_T = -1.0$$, $$\beta_P = -2.0$$, and $$\beta_{TP} = -0.3$$ (with an estimated 95% confidence interval of $$\pm 0.7$$), where $$x_T = \frac{T-50}{10}$$ and $$x_P = \frac{P-20}{5}$$.

If this were the first experiment in a response surface study aimed at maximizing the yield, where would you suggest performing the next experiment(s)? Be specific, and give recommended settings of $$T$$ and $$P$$. If $$\beta_{TP}$$ were equal to -3.0 instead of -0.3, how would you proceed? (A qualitative answer is sufficient in this latter case.)
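With $$\beta_{TP} = -0.3$$ inside the $$\pm 0.7$$ interval, a first-order model looks adequate, and the path of steepest ascent is a natural next step. The direction is proportional to $$(\beta_T, \beta_P)$$ in coded units. The sketch below converts points on that path back to natural units. The step size (one coded unit in $$P$$ per step) and the number of steps are illustrative choices; in practice you follow the path until the yield stops improving.

```python
def steepest_ascent_path(b_T, b_P, steps=3):
    """Points (T, P) along the path of steepest ascent, in natural units.

    Coded units: x_T = (T - 50) / 10 and x_P = (P - 20) / 5. Each step moves the
    factor with the largest coefficient by one coded unit.
    """
    scale = max(abs(b_T), abs(b_P))
    dx_T, dx_P = b_T / scale, b_P / scale
    path = []
    for k in range(1, steps + 1):
        x_T, x_P = k * dx_T, k * dx_P
        path.append((50 + 10 * x_T, 20 + 5 * x_P))
    return path

for T, P in steepest_ascent_path(b_T=-1.0, b_P=-2.0):
    print(f"T = {T:.1f}, P = {P:.1f}")
```

The first points are $$(T, P) = (45, 15)$$ and $$(40, 10)$$: both factors decrease, with $$P$$ moving twice as far as $$T$$ in coded units.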

## April 1998 [4 out of 100; 3 hour exam]

Suppose you have performed a $$2^3$$ factorial experiment, but the desired settings for the last run $$(x_1 = +1, x_2 = +1, x_3=+1)$$ could not be attained. Instead the experimental settings were $$x_1 = +0.7, x_2 = +1, x_3=+0.8$$. How would you analyze the data, and how would this affect your interpretation of the effects?

## April 1998 [4 out of 100; 3 hour exam]

It is desired to run a $$2^4$$ factorial design in 16 runs. However, the raw material comes in lots sufficient for only 4 runs, and there may be differences among the various lots. Set up a design in 4 blocks of 4 runs each.

## April 1998 [10 out of 100; 3 hour exam]

An experimenter has performed $$n$$ runs in the region $$0 \leq x_1 \leq 2$$ and $$2 \leq x_2 \leq 4$$ and has fitted these observations to the model: $$y = b_1x_1 + b_2x_2$$ to give $$\hat{b}_1 = 0.75$$ and $$\hat{b}_2 = 0.45$$, and $$X^TX = \begin{bmatrix} 11 & 20 \\ 20 & 40 \end{bmatrix}$$.

Determine the settings (integer values) for one additional experiment using the D-optimality criterion.
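For a model that is linear in the parameters, the D-optimality criterion does not depend on the estimates $$\hat{b}_1, \hat{b}_2$$; only the current $$X^TX$$ matters. The sketch below evaluates $$|X^TX + \mathbf{x}\mathbf{x}^T|$$ at every integer point in the stated region and keeps the largest. It assumes the new run must stay inside that region.

```python
from itertools import product

XtX = [[11, 20], [20, 40]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det_after_adding(x1, x2):
    """|X^T X + x x^T| for a new run x = (x1, x2) in the no-intercept model."""
    return det2([[XtX[0][0] + x1 * x1, XtX[0][1] + x1 * x2],
                 [XtX[1][0] + x2 * x1, XtX[1][1] + x2 * x2]])

# Integer candidates inside the region 0 <= x1 <= 2, 2 <= x2 <= 4
candidates = list(product(range(0, 3), range(2, 5)))
best = max(candidates, key=lambda c: det_after_adding(*c))
print(f"next run at x1 = {best[0]}, x2 = {best[1]}, new |X'X| = {det_after_adding(*best)}")
```

The best candidate is $$(x_1, x_2) = (0, 4)$$, which raises the determinant from 40 to 216.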

## April 2007 [20 out of 100; 3 hour exam]

A manufacturer of fabrics is interested in comparing the effects of 2 different chemicals on the surface finish of a fabric. The chemicals are to be used as part of a permanent press finishing process. Five fabric samples are to be used for the evaluation. Propose a designed experiment for this study, giving specific instructions on how to perform it. Explain how you would analyze the results.

Tell how you would modify the design and the analysis of the results if 3 chemicals were to be compared.

## April 2007 [18 out of 100; 3 hour exam]

1. In performing a response surface study, a $$2^4$$ factorial with center points has been performed and only the following effects were found to be statistically significant in terms of the scaled variables: $$b_1 = -3.0$$, $$b_3 = +4.0$$ and $$b_4 = +2.0$$. If the objective is to maximize $$y$$, suggest some specific runs to perform in terms of the scaled variables.
2. What are D-optimal designs? What is the justification for these designs? List some situations where these designs are very useful.

## April 2007 [30 out of 100; 3 hour exam]

Construct a $$2^{7-3}$$ fractional factorial design with the highest resolution possible.

1. Write down the design matrix showing the conditions for the 7 variables for each experiment.
2. What effects are confounded with the main effect of variable F, and with the two-factor interaction between variables B and F?
3. If the observed results from this experiment for the response $$(y)$$, given in standard order, were as given below, estimate the effect of the BF interaction and any other effects confounded with it. $$y = [5.0, 4.1, 4.5, 6.0, 5.5, 4.3, 5.0, 5.5, 6.2, 3.5, 5.8, 4.0, 3.9, 5.5, 6.0, 4.0]$$.
4. Suppose that you used a fold-over design as a second fraction. What would the design matrix look like for this second fraction? If the two fractions are combined, what combination of effects can be estimated (ignore 3 factor interactions and higher)?
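The answer to part 3 depends on which generators you choose, so what follows is only a sketch. It assumes the generators E = ABC, F = BCD, G = ACD (a common resolution IV choice) and the usual standard order for the base factors A to D. With these generators, BF shares its column with CD and AG, so the contrast estimates BF + CD + AG, ignoring interactions of three or more factors.

```python
from itertools import product

y = [5.0, 4.1, 4.5, 6.0, 5.5, 4.3, 5.0, 5.5,
     6.2, 3.5, 5.8, 4.0, 3.9, 5.5, 6.0, 4.0]

# Base 2^4 in standard order (A changes fastest), then the assumed generators
# E = ABC, F = BCD, G = ACD for a resolution IV 2^(7-3) design
design = []
for d, c, b, a in product([-1, 1], repeat=4):
    design.append({"A": a, "B": b, "C": c, "D": d,
                   "E": a * b * c, "F": b * c * d, "G": a * c * d})

def contrast(*factors):
    """Effect estimate for the product of the named factor columns."""
    signs = []
    for run in design:
        s = 1
        for f in factors:
            s *= run[f]
        signs.append(s)
    return sum(s * yi for s, yi in zip(signs, y)) / (len(y) / 2)

bf = contrast("B", "F")
print(f"BF (+ CD + AG with these generators) = {bf:.2f}")
```

Under these assumptions the combined estimate is -0.1.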

## April 2007 [10 out of 100; 3 hour exam]

Suppose that, when you ran the first fractional factorial of a $$2^{7-3}$$ design, the $$y$$ response for the first experiment was not actually available because the sample collected from that experimental run was damaged prior to measurement. If the experiment could not be repeated, explain how you would analyze the results from the remaining 15 runs.

## Tutorial question, March 2006

Set up an 8-run fractional factorial design in 5 variables (a 1/4 fraction of a $$2^5$$ design).

1. Work out the confounding pattern of the effects.
2. For the runs in standard order, assume that the measured responses are: 2.0, 4.1, 3.5, 3.8, 4.5, 5.0, 2.9, 4.3. Estimate all the effects you can.
3. Suggest a second fraction to run. Write down the 8 design conditions for this. Work out the confounding pattern for the effects.
4. Work out the confounding pattern for the 16 effects of the combined design.

## April 1995 [6 out of 100; 3 hour exam]

Maximizing the determinant $$|\mathbf{X}'\mathbf{X}|$$ has been used to design the experimental conditions for the next run or runs. What justification exists for this design criterion? Suggest an alternative criterion for optimal designs.

## Unknown exam

Suggest an experimental design with a small number of runs for investigating 3 variables, $$x_1, x_2, x_3$$, where it is desired to estimate the main effects and two-factor interactions for all three variables, as well as the quadratic and cubic effects of $$x_3$$.

# Latent variable methods

## April 2007 [16 out of 100; 3 hour exam]

Describe briefly the concepts and methodology behind using PCA or PLS for setting up SPC control charts for the monitoring of processes. (I do not need an explanation of what PCA or PLS is, just its use for control charting.) When might you choose to use PLS over PCA? What advantages do these multivariate charts have over the use of Shewhart charts on individual variables?

## April 1996 [8 out of 100; 3 hour exam]

Discuss the use of PCA and/or PLS in multivariate process monitoring. What problems do these methods overcome, and what control charts and diagnostic procedures do they lead to?

## Unknown exam

Briefly discuss how Principal Component Analysis (PCA) is useful for the analysis of industrial databases.