5.15. Exercises
Question 1
These readings are to illustrate the profound effect that designed experiments have had in some areas.
Application of Statistical Design of Experiments Methods in Drug Discovery and using DOE for high-throughput screening to locate new drug compounds.
High traffic websites offer a unique opportunity to perform testing and optimization. This is because each visitor to the site is independent of the others (randomized), and these tests can be run in parallel. Read more in this brief writeup on how Google uses testing tools to optimize YouTube, one of their web properties. Unfortunately they use the term “multivariate” incorrectly - a better term is “multi-variable”; nevertheless, the number of factors and combinations to be tested is large. It’s well known that fractional factorial methods are used to analyze these data.
See three chemical engineering examples of factorial designs in Box, Hunter, and Hunter: Chapter 11 (1st edition), or page 173 to 183 in the second edition.
Question 2
Your family runs a small business selling low dollar value products over the web. They want to improve sales. There is a known effect from the day of the week, so to avoid that effect they have run the following designed experiment every Tuesday for the past eight weeks. The first factor of interest is whether to provide free shipping over $30 or over $50. The second factor is whether or not the purchaser must first create a profile (user name, password, address, etc.) before completing the transaction. The purchaser can still complete their transaction without creating a profile.
These are the data collected:
Date | Free shipping over … | Profile required before transaction | Total sales made |
---|---|---|---|
05 January 2010 | $30 | Yes | $3275 |
12 January 2010 | $50 | No | $3594 |
19 January 2010 | $50 | No | $3626 |
26 January 2010 | $30 | No | $3438 |
02 February 2010 | $50 | Yes | $2439 |
09 February 2010 | $30 | No | $3562 |
16 February 2010 | $30 | Yes | $2965 |
23 February 2010 | $50 | Yes | $2571 |
Calculate the average response from the replicate experiments to obtain the 4 corner points.
Calculate and interpret the main effects in the design.
Show the interaction plot for the 2 factors.
We will show in the next class how to calculate confidence intervals for each effect, but would you say there is an interaction effect here? How would you interpret the interaction (whether there is one or not)?
What is the recommendation to increase sales?
Calculate the main effects and interactions by hand using a least squares model. You may confirm your result using software, but your answer should not just be the computer software output.
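The software check mentioned in the last part could look like the following minimal sketch. The coding is an assumption, since the question does not fix it: A is the free shipping threshold (-1 for $30, +1 for $50) and B is the profile requirement (-1 for No, +1 for Yes). The variable names are illustrative only.

```r
# Assumed coding (not given in the question):
#   A = free shipping threshold: -1 = over $30, +1 = over $50
#   B = profile required:        -1 = No,       +1 = Yes
A <- c(-1, +1, +1, -1, +1, -1, -1, +1)
B <- c(+1, -1, -1, -1, +1, -1, +1, +1)
y <- c(3275, 3594, 3626, 3438, 2439, 3562, 2965, 2571)  # total sales, in date order

# Average the two replicates at each of the 4 corner points
corners <- tapply(y, list(A = A, B = B), mean)
print(corners)

# Least squares via the normal equations, b = (X'X)^{-1} X'y
X <- cbind(intercept = 1, A = A, B = B, AB = A * B)
b <- solve(t(X) %*% X, t(X) %*% y)
print(b)

# Cross-check with R's linear model; each effect is twice its coefficient
model <- lm(y ~ A * B)
effect_sizes <- 2 * coef(model)[-1]
print(effect_sizes)
```

Because the design is balanced, fitting all 8 runs gives the same coefficients as fitting the 4 corner averages.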
Question 3
More readings:
It is worth reading this paper by Bisgaard to see how the same tools shown in these notes were used to solve a real industrial problem: designed experiments, autocorrelation plots, data visualization, and quality control charts. He also describes how the very real pressure from managers, time constraints, and interactions with team members affected the work.
“The Quality Detective: A Case Study” (and discussion), Philosophical Transactions of the Royal Society A, 327, 499-511, 1989.
George Box, The R. A. Fisher Memorial Lecture, 1988, “Quality Improvement - An Expanding Domain for the Application of Scientific Method”, Philosophical Transactions of the Royal Society - A, 327: pages 617-630, 1989.
Question 4
Note
This is a tutorial-type question: all the sub-questions build on each other. All questions deal with a hypothetical bioreactor system, and we are investigating four factors:
A = feed rate: slow or medium
B = initial inoculant size (300 g or 700 g)
C = feed substrate concentration (40 g/L or 60 g/L)
D = dissolved oxygen set-point (4 mg/L or 6 mg/L)
The 16 experiments from a full factorial, \(2^4\), were randomly run, and the yields from the bioreactor, \(y\), are reported here in standard order: y = [60, 59, 63, 61, 69, 61, 94, 93, 56, 63, 70, 65, 44, 45, 78, 77].
Calculate the 15 main effects and interactions and the intercept, using computer software.
Use a Pareto-plot to identify the significant effects. What would be your advice to your colleagues to improve the yield?
Refit the model using only the significant terms identified in the second question.
Explain why you don’t actually have to recalculate the least squares model parameters.
Compute the standard error and confirm that the effects are indeed significant at the 95% level.
Write down the exact settings for A, B, C, and D you would provide to the graduate student running a half-fraction in 8 runs for this system.
Before the half-fraction experiments are even run you can calculate which variables will be confounded (aliased) with each other. Report the confounding pattern for these main effects and for these two-factor interactions. Your answer should be in this format:
Generator =
Defining relationship =
Confounding pattern:
\(\widehat{\beta}_\mathbf{A} \rightarrow\)
\(\widehat{\beta}_\mathbf{B} \rightarrow\)
\(\widehat{\beta}_\mathbf{C} \rightarrow\)
\(\widehat{\beta}_\mathbf{D} \rightarrow\)
\(\widehat{\beta}_\mathbf{AB} \rightarrow\)
\(\widehat{\beta}_\mathbf{AC} \rightarrow\)
\(\widehat{\beta}_\mathbf{AD} \rightarrow\)
\(\widehat{\beta}_\mathbf{BC} \rightarrow\)
\(\widehat{\beta}_\mathbf{BD} \rightarrow\)
\(\widehat{\beta}_\mathbf{CD} \rightarrow\)
Now use the 8 yield values corresponding to your half fraction, and calculate as many parameters (intercept, main effects, interactions) as you can.
Report their numeric values.
Compare your parameters from this half-fraction (8 runs) to those from the full factorial (16 runs). Was much lost by running the half fraction?
What was the resolution of the half-fraction?
What is the projectivity of this half-fraction? And what does this mean in light of the fact that factor A was shown to be unimportant?
Factor C was found to be an important variable from the half-fraction; it had a significant coefficient in the linear model, but it was aliased with ABD. Obviously in this problem, the foldover set of experiments to run would be the other half-fraction. But we showed a way to de-alias a main effect. Use that method to show that the other 8 experiments to de-alias factor C would just be the other 8 experiments not included in your first half-fraction.
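As a starting point for the software parts of this question, here is a minimal sketch in R. It relies on `expand.grid` producing standard order (A varies fastest, D slowest). The generator D = ABC used for the half-fraction is an assumption made for illustration, and other choices are possible.

```r
# Full 2^4 factorial in standard order: A varies fastest, D slowest
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))
design$y <- c(60, 59, 63, 61, 69, 61, 94, 93, 56, 63, 70, 65, 44, 45, 78, 77)

# Saturated model: intercept plus 15 main effects and interactions
full <- lm(y ~ A * B * C * D, data = design)
b_full <- coef(full)

# Pareto-style ranking: largest absolute coefficients first (intercept excluded)
ranked <- sort(abs(b_full[-1]), decreasing = TRUE)
print(round(ranked, 3))
# barplot(rev(ranked), horiz = TRUE, las = 1)  # optional Pareto plot

# One possible half-fraction, ASSUMING the generator D = ABC
half <- subset(design, D == A * B * C)
print(half)

# Only 8 parameters can be estimated from 8 runs; each is a sum of aliased terms
b_half <- coef(lm(y ~ A * B * C, data = half))
print(round(b_half, 3))
```

Comparing `b_half` with the matching entries of `b_full` shows how much each estimate is shifted by the term it is aliased with.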
Question 5
Your group is developing a new product, but has been struggling to get the product’s stability, measured in days, to the level required. You are aiming for a stability value of 50 days or more. Four factors have been considered:
A = monomer concentration: 30% or 50%
B = acid concentration: low or high
C = catalyst level: 2% or 3%
D = temperature: 393 K or 423 K
These eight experiments have been run so far:
Experiment | Order | A | B | C | D | Stability |
---|---|---|---|---|---|---|
1 | 5 | \(-\) | \(-\) | \(-\) | \(-\) | 40 |
2 | 6 | \(+\) | \(-\) | \(-\) | \(+\) | 27 |
3 | 1 | \(-\) | \(+\) | \(-\) | \(+\) | 35 |
4 | 4 | \(+\) | \(+\) | \(-\) | \(-\) | 21 |
5 | 2 | \(-\) | \(-\) | \(+\) | \(+\) | 39 |
6 | 7 | \(+\) | \(-\) | \(+\) | \(-\) | 27 |
7 | 3 | \(-\) | \(+\) | \(+\) | \(-\) | 27 |
8 | 8 | \(+\) | \(+\) | \(+\) | \(+\) | 20 |
Where would you run the next experiment to try to get the stability to 50 days or more?
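One way to begin is to fit a main-effects model to the 8 runs and look at the direction of steepest ascent. The sketch below assumes that a main-effects-only model is adequate. In this half-fraction each main effect is aliased with a three-factor interaction, so that assumption should be checked. The names `stability`, `fit` and `direction` are illustrative.

```r
# The 8 runs from the table, in experiment (standard) order
stability <- data.frame(
  A = c(-1,  1, -1,  1, -1,  1, -1,  1),
  B = c(-1, -1,  1,  1, -1, -1,  1,  1),
  C = c(-1, -1, -1, -1,  1,  1,  1,  1),
  D = c(-1,  1,  1, -1,  1, -1, -1,  1),
  y = c(40, 27, 35, 21, 39, 27, 27, 20)
)

# Check which generator reproduces the D column (here D = ABC is tested)
is_half_fraction <- with(stability, all(D == A * B * C))
print(is_half_fraction)

# Main-effects model (an assumption: interactions are ignored)
fit <- lm(y ~ A + B + C + D, data = stability)
b <- coef(fit)
print(b)

# Steepest ascent: step each factor in proportion to its coefficient,
# scaled here so the largest step is 1 coded unit
direction <- b[-1] / max(abs(b[-1]))
print(round(direction, 3))
```

Any proposed step still has to be converted to real-world units and checked for feasibility. Factor B, for example, is only described as low or high.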
Question 6
The following diagram shows data from a central composite design. The factors were run at their standard levels, and there were 4 runs at the center point.
Calculate the parameters for a suitable quadratic model in these factors. Show your matrices for \(\mathbf{X}\) and \(\mathbf{y}\).
Draw a response surface plot of A vs B over a suitably wide range beyond the experimental region.
Where would you move A and B if your objective is to increase the response value?
Report your answer in coded units.
Report your answer in real-world units, if the full factorial portion of the experiments was run at:
A = stirrer speed, 200 rpm and 340 rpm
B = stirring time, 30 minutes and 40 minutes
You might feel more comfortable setting up the problem in MATLAB. You can use the contour plot functions in MATLAB to visualize the results.
If you are using R, you can use the rbind(...) or cbind(...) functions to build up your \(\mathbf{X}\) matrix row-by-row or column-by-column. The equivalent of meshgrid in R is the expand.grid(...) function. See the R code on the course website that shows how to generate surface plots in R.
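The `cbind(...)` and `expand.grid(...)` steps can be sketched as follows. The layout is an assumption: 4 factorial points, 4 axial points at a coded distance of \(\sqrt{2}\), and 4 center points, listed in an arbitrary order. Read the actual points and responses from the diagram before using it.

```r
# ASSUMED coded layout of a two-factor central composite design
alpha <- sqrt(2)
A <- c(-1,  1, -1, 1, -alpha, alpha,      0,     0, 0, 0, 0, 0)
B <- c(-1, -1,  1, 1,      0,     0, -alpha, alpha, 0, 0, 0, 0)

# Build X column-by-column for the quadratic model
#   y = b0 + bA*A + bB*B + bAB*A*B + bAA*A^2 + bBB*B^2
X <- cbind(intercept = 1, A = A, B = B, AB = A * B, AA = A^2, BB = B^2)
print(X)

# Grid of coded settings, wider than the experimental region, for a contour plot
grid <- expand.grid(A = seq(-3, 3, length.out = 50),
                    B = seq(-3, 3, length.out = 50))

# Once y has been read off the diagram (in the same row order as X):
# b <- solve(t(X) %*% X, t(X) %*% y)
# grid$y_hat <- cbind(1, grid$A, grid$B, grid$A * grid$B, grid$A^2, grid$B^2) %*% b
# contour(unique(grid$A), unique(grid$B), matrix(grid$y_hat, nrow = 50))
```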
Question 7
A full \(2^3\) factorial was run as shown:
Experiment | A | B | C |
---|---|---|---|
1 | 30% | 232 | Larry |
2 | 50% | 232 | Larry |
3 | 30% | 412 | Larry |
4 | 50% | 412 | Larry |
5 | 30% | 232 | Terry |
6 | 50% | 232 | Terry |
7 | 30% | 412 | Terry |
8 | 50% | 412 | Terry |
What would be the D-optimal objective function value for the usual full \(2^3\) factorial model?
If instead experiment 2 was run at (A, B, C) = (45%, 200, Larry), and experiment 3 at (A, B, C) = (35%, 400, Larry), what would be the D-optimal objective function value?
What is the ratio between the two objective function values?
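A minimal sketch of the calculation, taking the D-optimal objective to be \(\det(\mathbf{X}^T\mathbf{X})\). The assumptions are that the "usual" model includes all interactions (8 parameters), that A and B are coded from the original levels (centers 40% and 322, half-ranges 10% and 90), and that Larry is coded as -1. The helper `d_value` is a hypothetical name.

```r
# Coded full 2^3 factorial in standard order (Larry = -1, Terry = +1, assumed)
base <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Hypothetical helper: det(X'X) for the full model with all interactions
d_value <- function(d) {
  X <- model.matrix(~ A * B * C, data = d)
  det(crossprod(X))
}
d_full <- d_value(base)

# Modified design: recode the new settings of experiments 2 and 3
modified <- base
modified$A[2] <- (45 - 40) / 10
modified$B[2] <- (200 - 322) / 90
modified$A[3] <- (35 - 40) / 10
modified$B[3] <- (400 - 322) / 90
d_modified <- d_value(modified)

ratio <- d_modified / d_full
print(c(full = d_full, modified = d_modified, ratio = ratio))
```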
Question 8
In your start-up company you are investigating treatment options for reducing the contamination level of soil that has been soaked with hydrocarbon products. You have two different heaps of contaminated soil from two different sites. You expect your treatment method to work on any soil type though.
Your limited line of credit allows only 9 experiments, even though you have identified at least 6 factors which you expect to have an effect on the treatment.
Write out the set of experiments that you believe will allow you to learn the most relevant information, given your limited budget. Explain your thinking, and present your answer with 7 columns: 6 columns showing the settings for the 6 factors and one column for the heap from which the test sample should be taken. There should be 9 rows in your table.
What is the projectivity and resolution of your design?
Question 9
A factorial experiment was run to investigate the settings that minimize the production of an unwanted side product. The two factors being investigated are called A and B for simplicity, but are:
A = reaction temperature: low level was 420 K, and high level was 440 K
B = amount of surfactant: low level was 10 kg, high level was 12 kg
A full factorial experiment was run, randomly, on the same batch of raw materials, in the same reactor. The system was run on two different days though, and the operator on day 2 was a different person. The recorded amount, in grams, of the side product was:
Experiment | Run order | Day | A | B | Side product formed |
---|---|---|---|---|---|
1 | 2 | 1 | 420 K | 10 kg | 89 g |
2 | 4 | 2 | 440 K | 10 kg | 268 g |
3 | 5 | 2 | 420 K | 12 kg | 179 g |
4 | 3 | 1 | 440 K | 12 kg | 448 g |
5 | 1 | 1 | 430 K | 11 kg | 196 g |
6 | 6 | 2 | 430 K | 11 kg | 215 g |
What might have been the reason(s) for including experiments 5 and 6?
Was the blocking for a potential day-to-day effect implemented correctly in the design? Please show your calculations.
Write out a model that will predict the amount of side product formed. The model should use coded values of A and B. Also write out the \(\mathbf{X}\) matrix and \(\mathbf{y}\) vector that can be used to estimate the model coefficients using the equation \(\mathbf{b} = \left(\mathbf{X'X}\right)^{-1}\mathbf{X'y}\).
Solve for the coefficients of your linear model, either by using \(\mathbf{b} = \left(\mathbf{X'X}\right)^{-1}\mathbf{X'y}\) directly, or by some other method.
Assuming the blocking for the day-to-day effect was implemented correctly, does your model show whether this was an important effect on the response or not? Explain your answer.
You have permission to run two further experiments to find an operating point that reduces the unwanted side product. Where would you place your next two runs? Show how you select these values. Please give your answer in the original units of A and B.
As you move along the response surface, performing new experiments to approach the optimum, how do you know when you are reaching an optimum? How does your experimental strategy change? Please give specific details, and any model equations that might help illustrate your answer.
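The matrix calculation in this question can be sketched in R as below. The coding is assumed to be centered on the midpoint of each factor's range (430 K and 11 kg), and the model is assumed to contain an intercept, two main effects, and the AB interaction. Variable names are illustrative.

```r
# Coded settings: A = (T - 430 K) / 10 K, B = (mass - 11 kg) / 1 kg (assumed coding)
A   <- c(-1,  1, -1, 1, 0, 0)
B   <- c(-1, -1,  1, 1, 0, 0)
day <- c( 1,  2,  2, 1, 1, 2)
y   <- c(89, 268, 179, 448, 196, 215)   # side product formed, in grams

# X matrix for y = b0 + bA*xA + bB*xB + bAB*xA*xB
X <- cbind(intercept = 1, A = A, B = B, AB = A * B)
b <- solve(t(X) %*% X, t(X) %*% y)
print(b)

# Blocking check: compare the day with the AB column for the 4 factorial runs
print(data.frame(experiment = 1:4, AB = (A * B)[1:4], day = day[1:4]))
```

If the day column lines up with the AB column, any day-to-day difference is confounded with the AB interaction rather than with a main effect.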
Question 10
Adapted from Box, Hunter and Hunter
A liquid polymer formulation is being made that is applied as a polish to wood surfaces. The group responsible for the product has identified 3 elements of the formulation that have an effect on the liquid polish’s final quality attributes (FQAs: this acronym is becoming a standard in most companies these days).
A: amount of reactive monomer in the recipe (10% at the low level and 30% at the high level)
B: the amount of chain length regulator (1% at the low level and 4% at the high level)
C: the type of chain length regulator (regulator P at the \(-\) level or regulator Q at the \(+\) level)
In class we have focused on the case where our \(y\)-variable is continuous, but it could also be descriptive. In this question we also see what happens when we have more than one \(y\)-variable.
\(y_1\) = Milky appearance: either Yes or No
\(y_2\) = Viscous: either Yes or No
\(y_3\) = Yellow colour: either No or Slightly
The following table captures the 8 experiments in standard order, although the experiments were run in a randomized order.
Experiment | A | B | C | \(y_1\) | \(y_2\) | \(y_3\) |
---|---|---|---|---|---|---|
1 | \(-\) | \(-\) | P | Yes | Yes | No |
2 | \(+\) | \(-\) | P | No | Yes | No |
3 | \(-\) | \(+\) | P | Yes | No | No |
4 | \(+\) | \(+\) | P | No | No | No |
5 | \(-\) | \(-\) | Q | Yes | Yes | No |
6 | \(+\) | \(-\) | Q | No | Yes | Slightly |
7 | \(-\) | \(+\) | Q | Yes | No | No |
8 | \(+\) | \(+\) | Q | No | No | Slightly |
What is the cause of a milky appearance?
What causes a more viscous product?
What is the cause of a slight yellow appearance?
Which conditions would you use to create a product that was not milky, was of low viscosity, and had no yellowness?
Which conditions would you use to create a product that was not milky, was of low viscosity, and had some yellowness?
Question 11
Using a \(2^3\) factorial design in 3 variables (A = temperature, B = pH and C = agitation rate), the conversion, \(y\), from a chemical reaction was recorded.
Experiment | A | B | C | \(y\) |
---|---|---|---|---|
1 | \(-\) | \(-\) | \(-\) | 72 |
2 | \(+\) | \(-\) | \(-\) | 73 |
3 | \(-\) | \(+\) | \(-\) | 66 |
4 | \(+\) | \(+\) | \(-\) | 87 |
5 | \(-\) | \(-\) | \(+\) | 70 |
6 | \(+\) | \(-\) | \(+\) | 73 |
7 | \(-\) | \(+\) | \(+\) | 67 |
8 | \(+\) | \(+\) | \(+\) | 87 |
A = \(\displaystyle \frac{\text{temperature} - 150\text{°C}}{10\text{°C}}\)
B = \(\displaystyle \frac{\text{pH} - 7.5}{0.5}\)
C = \(\displaystyle \frac{\text{agitation rate} - 50 \text{rpm}}{5 \text{rpm}}\)
Show a cube plot for the recorded data.
Estimate the main effects and interactions by hand.
Interpret any results from part 2.
Show that a least squares model for the full factorial agrees with the effects and interactions calculated by hand.
Approximately, at what conditions (given in real-world units) would you run the next experiment to improve conversion? Give your settings in coded units, then unscale and uncenter them to get real-world units.
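To check the hand calculations against least squares, and to convert coded settings back to real-world units, a sketch along these lines may help. The helper `to_real_units` is a hypothetical name. It simply inverts the three coding formulas given above.

```r
# 2^3 factorial in standard order with the recorded conversions
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
design$y <- c(72, 73, 66, 87, 70, 73, 67, 87)

# Saturated least squares model; each effect is twice its coefficient
fit <- lm(y ~ A * B * C, data = design)
b <- coef(fit)
effect_sizes <- 2 * b[-1]
print(round(effect_sizes, 3))

# Hypothetical helper: unscale and uncenter coded values
to_real_units <- function(A, B, C) {
  c(temperature_C = 150 + 10 * A,
    pH            = 7.5 + 0.5 * B,
    agitation_rpm = 50 + 5 * C)
}

# Example call with arbitrary coded settings (not a recommended answer)
print(to_real_units(A = 1, B = 1, C = 0))
```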
Question 12
Why do we block groups of experiments?
Write a \(2^3\) factorial design in two blocks of 4 runs, so that no main effect or 2 factor interaction is confounded with block differences.
Question 13
Factors related to the shrinkage of plastic film, produced in an injection molding device, are being investigated. The following factors have been identified by the engineer responsible:
A = mold temperature
B = moisture content
C = holding pressure
D = cavity thickness
E = booster pressure
F = cycle time
G = gate size
Experiment | A | B | C | D | E | F | G | \(y\) |
---|---|---|---|---|---|---|---|---|
1 | \(-\) | \(-\) | \(-\) | \(+\) | \(+\) | \(+\) | \(-\) | 14.0 |
2 | \(+\) | \(-\) | \(-\) | \(-\) | \(-\) | \(+\) | \(+\) | 16.8 |
3 | \(-\) | \(+\) | \(-\) | \(-\) | \(+\) | \(-\) | \(+\) | 15.0 |
4 | \(+\) | \(+\) | \(-\) | \(+\) | \(-\) | \(-\) | \(-\) | 15.4 |
5 | \(-\) | \(-\) | \(+\) | \(+\) | \(-\) | \(-\) | \(+\) | 27.6 |
6 | \(+\) | \(-\) | \(+\) | \(-\) | \(+\) | \(-\) | \(-\) | 24.0 |
7 | \(-\) | \(+\) | \(+\) | \(-\) | \(-\) | \(+\) | \(-\) | 27.4 |
8 | \(+\) | \(+\) | \(+\) | \(+\) | \(+\) | \(+\) | \(+\) | 22.6 |
You can obtain a copy of this data set if you install the BsMD package in R. Then use the following commands:
library(BsMD)
data(BM93.e3.data)
# Use only a subset of the original experiments
X <- BM93.e3.data[1:8, 2:10]
How many experiments would have been required for a full factorial experiment?
What type of fractional factorial is this (i.e. is it a half fraction, quarter fraction …)?
Identify all the generators used to create this design. A table, such as on page 272 in Box, Hunter and Hunter, 2nd edition will help.
Write out the complete defining relationship.
What is the resolution of this design?
Use a least squares approach to calculate a model that fits these 8 experiments.
What effects would you judge to be significant in this system? The engineer will accept your advice and disregard the other factors, and spend the rest of the experimental budget only on the factors deemed significant.
What are these effects aliased with? (Use your defining relationship to find this.)
Why is it necessary to know the confounding pattern for a fractional factorial design?
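If you prefer not to install the BsMD package, the 8 runs can be typed in directly from the table. The sketch below fits the saturated main-effects model by least squares and shows how a candidate generator can be tested against the design columns. The candidate shown (D = AB) is just one example to try, and the data frame name `molding` is illustrative.

```r
# The 8 runs from the table, typed in directly
molding <- data.frame(
  A = c(-1,  1, -1,  1, -1,  1, -1,  1),
  B = c(-1, -1,  1,  1, -1, -1,  1,  1),
  C = c(-1, -1, -1, -1,  1,  1,  1,  1),
  D = c( 1, -1, -1,  1,  1, -1, -1,  1),
  E = c( 1, -1,  1, -1, -1,  1, -1,  1),
  F = c( 1,  1, -1, -1, -1, -1,  1,  1),
  G = c(-1,  1,  1, -1,  1, -1, -1,  1),
  y = c(14.0, 16.8, 15.0, 15.4, 27.6, 24.0, 27.4, 22.6)
)

# Test a candidate generator by comparing columns, e.g. D = AB
print(with(molding, all(D == A * B)))

# Saturated least squares model: intercept plus 7 main effects from 8 runs
fit <- lm(y ~ ., data = molding)
b <- coef(fit)

# Rank the coefficients by absolute size to judge which ones stand out
print(round(sort(abs(b[-1]), decreasing = TRUE), 3))
```

Each coefficient here estimates a main effect plus everything it is aliased with, which is why the defining relationship matters when interpreting the ranking.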
Question 14
One of the experiment projects investigated by a previous student of this course was understanding effects related to the preparation of uncooked, breaded chicken strips.
The student investigated these 3 factors in a full factorial design \(^\ast\):
D = duration: low level at 15 minutes; and high level = 22 minutes.
R = position of oven rack: low level = use middle rack; high level = use low oven rack (this coding, though unusual, was used because the lower rack applies more heat to the food).
P = preheated oven or not: low level = short preheat (30 seconds); high level = complete preheating.
\(^\ast\) The student actually investigated 4 factors, but found the effect of oven temperature to be negligible!
The response variable was \(y\) = taste, the average of several tasters, with higher values being more desirable.
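Experiment | D | R | P | Taste |
---|---|---|---|---|
1 | \(-\) | \(-\) | \(-\) | 3 |
2 | \(+\) | \(-\) | \(-\) | 9 |
3 | \(-\) | \(+\) | \(-\) | 3 |
4 | \(+\) | \(+\) | \(-\) | 7 |
5 | \(-\) | \(-\) | \(+\) | 3 |
6 | \(+\) | \(-\) | \(+\) | 10 |
7 | \(-\) | \(+\) | \(+\) | 4 |
8 | \(+\) | \(+\) | \(+\) | 7 |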
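A full factorial model, using the usual coding, was calculated from these 8 experiments:
\(y = 5.75 + 2.5 x_\text{D} - 0.5 x_\text{R} + 0.25 x_\text{P} - 0.75 x_\text{D} x_\text{R} - 0.25 x_\text{D} x_\text{R} x_\text{P}\)
The \(x_\text{D} x_\text{P}\) and \(x_\text{R} x_\text{P}\) coefficients are zero for these data.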
What is the physical interpretation of the \(+2.5 x_\text{D}\) term in the model?
From the above table, at what real-world conditions should you run the system to get the highest taste level?
Does your previous answer match the above model equation? Explain, in particular, how the non-zero two factor interaction term affects taste, and whether the interaction term reinforces the taste response variable, or counteracts it, when the settings you identified in part 2 are used.
If you decided to investigate this system, but only had time to run 4 experiments, write out the fractional factorial table that would use factors D and R as your main effects and confound factor P on the DR interaction.
Now add to your table the response column for taste, extracting the relevant experiments from the above table.
Next, write out the model equation and estimate the 4 model parameters from your reduced set of experiments. Compare and comment on your model coefficients, relative to the full model equation from all 8 experiments.
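A minimal sketch of the comparison between the 8-run and 4-run models, using the generator P = DR stated in the question. The variable names are illustrative.

```r
# Full 2^3 factorial in standard order (D varies fastest)
chicken <- expand.grid(D = c(-1, 1), R = c(-1, 1), P = c(-1, 1))
chicken$taste <- c(3, 9, 3, 7, 3, 10, 4, 7)

# Full model from all 8 experiments
b_full <- coef(lm(taste ~ D * R * P, data = chicken))
print(round(b_full, 3))

# Half-fraction: keep only the runs where P equals the DR interaction
half <- subset(chicken, P == D * R)
print(half)

# Four parameters from four runs; each is confounded with another term
b_half <- coef(lm(taste ~ D + R + P, data = half))
print(round(b_half, 3))
```

In the half-fraction the P coefficient actually estimates P plus DR, and the intercept estimates the intercept plus DRP, which accounts for the differences between `b_half` and `b_full`.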
Question 15
Your company is developing a microgel-hydrogel composite, used for controlled drug delivery with a magnetic field. A previous employee did the experimental work but she has since left the company. You have been asked to analyze the existing experimental data.
Response variable: \(y\) = sodium fluorescein (SF) released [mg], per gram of gel
The data collected, in the original units:
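Experiment | Order | M = microgel weight [%] | H = hydrogel weight [%] | \(y\) |
---|---|---|---|---|
1 | 4 | 4 | 10 | 119 |
2 | 1 | 8 | 10 | 93 |
3 | 6 | 4 | 16 | 154 |
4 | 3 | 8 | 16 | 89 |
5 | 2 | 6 | 13 | 85 |
6 | 5 | 6 | 13 | 88 |
7 | 9 | 3.2 | 13 | 125 |
8 | 7 | 8.8 | 13 | 111 |
9 | 10 | 6 | 17.2 | 136 |
10 | 8 | 6 | 8.8 | 98 |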
What was likely the reason the experimenter added experiments 5 and 6?
Why might the experimenter have added experiments 7, 8, 9 and 10 after the first six? Provide a rough sketch of the design, and all necessary calculations to justify your answer.
What is the name of the type of experimental design chosen by the employee for all 10 experiments in the table?
Using these data, you wish to estimate a nonlinear approximation of the response surface using a model with quadratic terms. Write out the equation of such a model that can be calculated from these 10 experiments (also read the next question).
Write out the \(\mathbf{X}\) matrix, the corresponding symbolic entries in \(\mathbf{b}\), and the \(\mathbf{y}\) vector that you would use to solve the equation \(\mathbf{b} = \left(\mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{y}\) to obtain the parameter estimates of the model you proposed in the previous part. You must use data from all 10 experiments.
How many degrees of freedom will be available to estimate the standard error and confidence intervals?
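The coding step and the \(\mathbf{X}\) matrix can be checked with a short sketch like the one below. It assumes the center point (M = 6, H = 13) and the factorial half-ranges (2 and 3) as the coding basis, and a six-parameter quadratic model. A model with an extra block term for the later experiments would be another reasonable choice and would change the degrees of freedom.

```r
# Data in experiment order, original units
M <- c(4, 8, 4, 8, 6, 6, 3.2, 8.8, 6, 6)
H <- c(10, 10, 16, 16, 13, 13, 13, 13, 17.2, 8.8)
y <- c(119, 93, 154, 89, 85, 88, 125, 111, 136, 98)

# Coded units (assumed coding: center at M = 6, H = 13; half-ranges 2 and 3)
xM <- (M - 6) / 2
xH <- (H - 13) / 3
print(cbind(xM, xH))

# X matrix for y = b0 + bM*xM + bH*xH + bMH*xM*xH + bMM*xM^2 + bHH*xH^2
X <- cbind(intercept = 1, M = xM, H = xH, MH = xM * xH, MM = xM^2, HH = xH^2)
b <- solve(t(X) %*% X, t(X) %*% y)
print(round(b, 2))

# Degrees of freedom left for the standard error under this model
df_resid <- nrow(X) - ncol(X)
print(df_resid)
```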
Question 16
Biological drugs are rapidly growing in importance in the treatment of certain diseases, such as cancers and arthritis, since they are designed to target very specific sites in the human body. This can result in treating diseases with minimal side effects. Such drugs differ from traditional drugs in the way they are manufactured – they are produced during the complex reactions that take place in live cell culture. The cells are grown in lab-scale bioreactors, harvested, purified and packaged.
These processes are plagued by low yields which makes these treatments very costly. Your group has run an initial set of experiments to learn more about the system and find better operating conditions to boost the yield. The following factors were chosen in the usual factorial manner:
G = glucose substrate choice: a binary factor, either Gm at the low level code or Gp at the high level.
A = agitation level: low level = 10 rpm and high level = 20 rpm, but can only be set at integer values.
T = growth temperature: 30°C at the low level, or 36°C at the high level, and can only be set at integer values in the future, with a maximum value of 40°C.
C = starting culture concentration: low level = 1100 and high level = 1400, and can only be adjusted in multiples of 50 units and within a range of 1000 to 2000 units.
A fractional factorial in 8 runs at the above settings, created by aliasing C = GAT, gave the following model in coded units:
The aim is to find the next experiment that will improve the yield, measured in milligrams, the most.
What settings might have been used for the baseline conditions for this factorial experiment?
What is the resolution of this design?
Using the method of steepest ascent, state all reasonable assumptions you need to find the experimental conditions for all 4 factors for the next experiment. Give these 4 conditions in both the real-world units, as well as in the usual coded units of the experiment. Note however that your manager has seen that temperature has a strong effect on yield, and he has requested the next experiment be run at 40°C.
Report the expected yield at these proposed experimental conditions.