Difference between revisions of "Design and analysis of experiments"
Kevin Dunn (talk | contribs) |
Kevin Dunn (talk | contribs) |
||
Line 286: | Line 286: | ||
summary(mod) | summary(mod) | ||
</syntaxhighlight> | </syntaxhighlight> | ||
=== Detailed analysis of the 3 factor example === | |||
<rst> | |||
<rst-options: 'toc' = False/> | |||
The data are from a plastics molding factory which must treat its waste before discharge. The :math:`y` variable represents the average amount of pollutant discharged [lb per day], while the 3 factors that were varied were: | |||
- :math:`C`: the chemical compound added [A or B] | |||
- :math:`T`: the treatment temperature [72°F or 100°F] | |||
- :math:`S`: the stirring speed [200 rpm or 400 rpm] | |||
- :math:`y`: the amount of pollutant discharged [lb per day] | |||
.. tabularcolumns:: |l|l||c|c|c||c| | |||
+-----------+-------+---------------+-----------------+-----------------+-----------------+ | |||
| Experiment| Order | :math:`C` | :math:`T` [°F] | :math:`S` [rpm] | :math:`y` [lb] | | |||
+===========+=======+===============+=================+=================+=================+ | |||
| 1 | 5 | A | 72 | 200 | 5 | | |||
+-----------+-------+---------------+-----------------+-----------------+-----------------+ | |||
| 2 | 6 | B | 72 | 200 | 30 | | |||
+-----------+-------+---------------+-----------------+-----------------+-----------------+ | |||
| 3 | 1 | A | 100 | 200 | 6 | | |||
+-----------+-------+---------------+-----------------+-----------------+-----------------+ | |||
| 4 | 4 | B | 100 | 200 | 33 | | |||
+-----------+-------+---------------+-----------------+-----------------+-----------------+ | |||
| 5 | 2 | A | 72 | 400 | 4 | | |||
+-----------+-------+---------------+-----------------+-----------------+-----------------+ | |||
| 6 | 7 | B | 72 | 400 | 3 | | |||
+-----------+-------+---------------+-----------------+-----------------+-----------------+ | |||
| 7 | 3 | A | 100 | 400 | 5 | | |||
+-----------+-------+---------------+-----------------+-----------------+-----------------+ | |||
| 8 | 8 | B | 100 | 400 | 4 | | |||
+-----------+-------+---------------+-----------------+-----------------+-----------------+ | |||
We showed the cube plot for this system in the class on 10 March. From the cube plot we could already see the main factors, and even the **CS** interaction was noticeable. | |||
* **C effect**: There are 4 estimates of :math:`C = \displaystyle \frac{(+25) + (+27) + (-1) + (-1)}{4} = \frac{50}{4} = \bf{12.5}` | |||
* **T effect**: There are 4 estimates of :math:`T = \displaystyle \frac{(+1) + (+3) + (+1) + (+1)}{4} = \frac{6}{4} = \bf{1.5}` | |||
* **S effect**: There are 4 estimates of :math:`S = \displaystyle \frac{(-27) + (-1) + (-29) + (-1)}{4} = \frac{-58}{4} = \bf{-14.5}` | |||
* **CT interaction**: There are 2 estimates of :math:`CT`. Recall that interactions are calculated as the half difference going from high to low. Consider the change in :math:`C` when | |||
- :math:`T_\text{high}` (at :math:`S` high) = 4 - 5 = -1 | |||
- :math:`T_\text{low}` (at :math:`S` high) = 3 - 4 = -1 | |||
- First estimate = [(-1) - (-1)]/2 = 0 | |||
- :math:`T_\text{high}` (at :math:`S` low) = 33 - 6 = +27 | |||
- :math:`T_\text{low}` (at :math:`S` low) = 30 - 5 = +25 | |||
- Second estimate = [(+27) - (+25)]/2 = +1 | |||
- Average **CT** interaction = (0 + 1)/2 = **0.5** | |||
- You can interchange :math:`C` and :math:`T` and still get the same result. | |||
* **CS interaction**: There are 2 estimates of :math:`CS`. Consider the change in :math:`C` when | |||
- :math:`S_\text{high}` (at :math:`T` high) = 4 - 5 = -1 | |||
- :math:`S_\text{low}` (at :math:`T` high) = 33 - 6 = +27 | |||
- First estimate = [(-1) - (+27)]/2 = -14 | |||
- :math:`S_\text{high}` (at :math:`T` low) = 3 - 4 = -1 | |||
- :math:`S_\text{low}` (at :math:`T` low) = 30 - 5 = +25 | |||
- Second estimate = [(-1) - (+25)]/2 = -13 | |||
- Average **CS** interaction = (-13 - 14)/2 = **-13.5** | |||
- You can interchange :math:`C` and :math:`S` and still get the same result. | |||
* **ST interaction**: There are 2 estimates of :math:`ST`: (-1 + 0)/2 = **-0.5**, calculate in the same way as above. | |||
* **CTS interaction**: There is only a single estimate of :math:`CTS`: | |||
- :math:`CT` effect at high :math:`S` = 0 | |||
- :math:`CT` effect at low :math:`S` = +1 | |||
- :math:`CTS` interaction = [(0) - (+1)] / 2 = **-0.5** | |||
- You can calculate this also by considering the :math:`CS` effect at the two levels of :math:`T` | |||
- Or, you can calculate this by considering the :math:`ST` effect at the two levels of :math:`C`. | |||
- All 3 approaches give the same result. | |||
Next, use computer software (see below) to verify that | |||
.. math:: | |||
y = 11.25 + 6.25x_C + 0.75x_T -7.25x_S + 0.25 x_C x_T -6.75 x_C x_S -0.25 x_T x_S - 0.25 x_C x_T x_S | |||
The :math:`\mathbf{X}` matrix and :math:`\mathbf{y}` vector used to calculate the least squares model: | |||
.. math:: | |||
\begin{bmatrix} 5\\30\\6\\33\\4\\3\\5\\4 \end{bmatrix} &= \begin{bmatrix} +1 & -1 & -1 & -1 & +1 & +1 & +1 & -1\\ +1 & +1 & -1 & -1 & -1 & -1 & +1 & +1\\ +1 & -1 & +1 & -1 & -1 & +1 & -1 & +1\\ +1 & +1 & +1 & -1 & +1 & -1 & -1 & -1\\ +1 & -1 & -1 & +1 & +1 & -1 & -1 & +1\\ +1 & +1 & -1 & +1 & -1 & +1 & -1 & -1\\ +1 & -1 & +1 & +1 & -1 & -1 & +1 & -1\\ +1 & +1 & +1 & +1 & +1 & +1 & +1 & +1\\ \end{bmatrix} \begin{bmatrix} b_0 \\ b_C \\ b_T \\ b_{S} \\ b_{CT} \\ b_{CS} \\ b_{TS} \\ b_{CTS} \end{bmatrix} \\ \mathbf{y} &= \mathbf{X} \mathbf{b} | |||
</rst> | |||
=== The above 3 factor example, but using computer software === | |||
[http://www.r-fiddle.org/#/fiddle?id=EFnCAWkr Try this code in a web-browser] | |||
<syntaxhighlight lang="rsplus"> | |||
# Create the design matrix in a quick way in R | |||
x <- c(-1, +1) | |||
design <- expand.grid(C=x, T=x, S=x) | |||
design | |||
C <- design$C | |||
T <- design$T | |||
S <- design$S | |||
y <- c(5, 30, 6, 33, 4, 3, 5, 4) | |||
# Full factorial model (you could make errors typing this in) | |||
mod.full <- lm(y ~ C + T + S + C*T + C*S + S*T + C*T*S) | |||
# This powerful notation will expand all terms up to the 3rd order interactions | |||
mod.full <- lm( y ~ C*T*S ) | |||
summary(mod.full) | |||
</syntaxhighlight> | |||
Compare the result from the last line with the manual calculations above. |
Revision as of 19:14, 4 January 2016
This is a work in progress. It will be completed by the end of the day on 04 January 2016.
Learning outcomes
- Learn the basic terminology of experiments: responses, factors, outcomes, real-world units vs coded units, confounding
- Analyze and interpret data from an experiment with 2, 3 or more factors by hand
- Use R to do the analysis, and interpret the various plots, such as the Pareto plot
- Analyze and interpret data from an experiment with 3 or more factors
- Recognize when to use a fractional factorial, to go mostly the same results as from a full factorial
- Understand when to use screening experiments
- Use the concepts of response surface methods to systematically reach an optimum
- What you can do when you make a mistake, or hit against constraints.
Extended readings/practice
- Don't accept what is often believed to be true: always experiment.
- In case you've been wondering how the trade-off table was constructed.
- You must cover steps 16, 21 and 22 of the detailed software tutorial to master this section of the course.
- Some extra videos about experimentation, as applied in this course:
- Interview 01: Experiments in a non-engineering context, a Hamilton baker: Made for you by Madeleine (05:23)
- Interview 02: Dr. Soo Chan Carusone talks about experiments in a medical context (07:16)
- Interview 03: David Latulippe and his student Jeff, talk about water treatment experiments (05:46)
- Interview 04: An interview with Dr. Joe Kim at McMaster University (10:54)
Resources
- Class notes 2015 This is a large file, at 219 Mb. You might prefer the smaller set of notes from 2014, but they are less comprehensive.
- Class notes 2014
- Textbook, chapter 5
- What experiments are you going to do for your course project? What are the factors, what are the levels? How did you chose your levels? What is your response? How will you measure your response?
- Quizzes (with solutions): attempt these after you have watched the videos
Tasks to do first Quiz Solution Watch videos 1A, 1B, 1C and 1D Quiz Solution Watch videos 2A, 2B, 2C, 3A and 3B (note!) Quiz Solution Watch videos 2D, 3C and 3D and 4A (note!) Quiz Solution Watch videos 4B, 4C, 4D, and 4E Quiz Solution Watch videos 4F, 4G, 4H, 5A and 5B Quiz Solution Watch videos 5C, 5D, 5E and 5F Quiz Solution
Class videos from prior years
Videos from 2015
- 00 - Introduction video for the Coursera online course [01:56]
- 1A - Why experiments are so important [07:48]
- 1B - Some basic terminology [06:37]
- 1C - Analysis of your first experiment [09:00]
- 1D - How NOT to run an experiment [03:07]
- 2A - Analysis of experiments in two factors by hand [13:37]
- 2B - Numeric predictions from two-factor experiments [07:25]
- 2C - Two-factor experiments with interactions [15:15]
- 2D - In-depth case study: analyzing a system with 3 factors by hand [17:28]
- 3A - Setting up the least squares model for a 2 factor experiment [05:46]
- 3B - Solving the mathematical model for a 2 factor experiment using software [08:46]
- 3C - Using computer software for a 3 factor experiment [08:37]
- 3D - Case study: a 4-factor system using computer software [09:03]
- 4A - The trade-offs when doing half-fraction factorials [13:20]
- 4B - The technical details behind half-fractions [09:38]
- 4C - A case study with aliasing in a fractional factorial [06:38]
- 4D - All about disturbances, why we randomize, and what covariates are [11:00]
- 4E - All about blocking [09:21]
- 4F - Fractional factorials: introducing aliasing notation [12:00]
- 4G - Fractional factorials: using aliasing notation to plan experiments [10:45]
- 4H - An example of an analyzing an experiment with aliasing [09:50]
- 5A - Response surface methods - an introduction [06:13]
- 5B - Response surface methods (RSM) in one variable [18:40]
- 5C - Why changing one factor at a time (OFAT) will mislead you [05:33]
- 5D - The concept of contour plots and which objectives should we maximize [03:40]
- 5E - RSM in 2 factors: introducing the case study [19:20]
- 5F - RSM case study continues: constraints and mistakes [13:45]
- 5G - RSM case study continues: approaching the optimum [17:05]
- 06 - Wrap-up: the course in review, multiple objectives, and references for the future [08:10]
01:56 | Download video | Download captions | Script |
07:24 | Download video | Download captions | Script |
05:48 | Download video | Download captions | Script |
08:57 | Download video | Download captions | Script |
03:23 | Download video | Download captions | Script |
13:37 | Download video | Download captions | Script |
07:25 | Download video | Download captions | Script |
15:15 | Download video | Download captions | Script |
17:28 | Download video | Download captions | Script |
05:46 | Download video | Download captions | Script |
08:46 | Download video | Download captions | Script |
08:37 | Download video | Download captions | Script |
09:03 | Download video | Download captions | Script |
13:20 | Download video | Download captions | Script |
09:38 | Download video | Download captions | Script |
06:38 | Download video | Download captions | Script |
11:00 | Download video | Download captions | Script |
09:21 | Download video | Download captions | Script |
12:00 | Download video | Download captions | Script |
10:45 | Download video | Download captions | Script |
09:50 | Download video | Download captions | Script |
06:13 | Download video | Download captions | Script |
18:40 | Download video | Download captions | Script |
05:33 | Download video | Download captions | Script |
03:40 | Download video | Download captions | Script |
19:20 | Download video | Download captions | Script |
13:45 | Download video | Download captions | Script |
17:05 | Download video | Download captions | Script |
08:10 | Download video | Download captions | Script |
Videos from 2014
Videos from 2013
Software codes for this section
Code to build a model for a 2-factor system
Try this code in a web-browser
T <- c(-1, +1, -1, +1) # centered and scaled temperature
S <- c(-1, -1, +1, +1) # centered and scaled substrate concentration
y <- c(69, 60, 64, 53) # conversion is the response, y
mod <- lm(y ~ T + S + T * S) # this works, but is more typing
mod <- lm(y ~ T*S) # preferred method
summary(mod)
Detailed analysis of the 3 factor example
<rst> <rst-options: 'toc' = False/> The data are from a plastics molding factory which must treat its waste before discharge. The :math:`y` variable represents the average amount of pollutant discharged [lb per day], while the 3 factors that were varied were:
- :math:`C`: the chemical compound added [A or B]
- :math:`T`: the treatment temperature [72°F or 100°F] - :math:`S`: the stirring speed [200 rpm or 400 rpm] - :math:`y`: the amount of pollutant discharged [lb per day]
.. tabularcolumns:: |l|l||c|c|c||c|
+-----------+-------+---------------+-----------------+-----------------+-----------------+ | Experiment| Order | :math:`C` | :math:`T` [°F] | :math:`S` [rpm] | :math:`y` [lb] | +===========+=======+===============+=================+=================+=================+ | 1 | 5 | A | 72 | 200 | 5 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 2 | 6 | B | 72 | 200 | 30 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 3 | 1 | A | 100 | 200 | 6 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 4 | 4 | B | 100 | 200 | 33 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 5 | 2 | A | 72 | 400 | 4 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 6 | 7 | B | 72 | 400 | 3 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 7 | 3 | A | 100 | 400 | 5 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 8 | 8 | B | 100 | 400 | 4 | +-----------+-------+---------------+-----------------+-----------------+-----------------+
We showed the cube plot for this system in the class on 10 March. From the cube plot we could already see the main factors, and even the **CS** interaction was noticeable.
- **C effect**: There are 4 estimates of :math:`C = \displaystyle \frac{(+25) + (+27) + (-1) + (-1)}{4} = \frac{50}{4} = \bf{12.5}`
- **T effect**: There are 4 estimates of :math:`T = \displaystyle \frac{(+1) + (+3) + (+1) + (+1)}{4} = \frac{6}{4} = \bf{1.5}`
- **S effect**: There are 4 estimates of :math:`S = \displaystyle \frac{(-27) + (-1) + (-29) + (-1)}{4} = \frac{-58}{4} = \bf{-14.5}`
- **CT interaction**: There are 2 estimates of :math:`CT`. Recall that interactions are calculated as the half difference going from high to low. Consider the change in :math:`C` when
- :math:`T_\text{high}` (at :math:`S` high) = 4 - 5 = -1 - :math:`T_\text{low}` (at :math:`S` high) = 3 - 4 = -1 - First estimate = [(-1) - (-1)]/2 = 0 - :math:`T_\text{high}` (at :math:`S` low) = 33 - 6 = +27 - :math:`T_\text{low}` (at :math:`S` low) = 30 - 5 = +25 - Second estimate = [(+27) - (+25)]/2 = +1
- Average **CT** interaction = (0 + 1)/2 = **0.5** - You can interchange :math:`C` and :math:`T` and still get the same result.
- **CS interaction**: There are 2 estimates of :math:`CS`. Consider the change in :math:`C` when
- :math:`S_\text{high}` (at :math:`T` high) = 4 - 5 = -1 - :math:`S_\text{low}` (at :math:`T` high) = 33 - 6 = +27 - First estimate = [(-1) - (+27)]/2 = -14 - :math:`S_\text{high}` (at :math:`T` low) = 3 - 4 = -1 - :math:`S_\text{low}` (at :math:`T` low) = 30 - 5 = +25 - Second estimate = [(-1) - (+25)]/2 = -13
- Average **CS** interaction = (-13 - 14)/2 = **-13.5** - You can interchange :math:`C` and :math:`S` and still get the same result.
- **ST interaction**: There are 2 estimates of :math:`ST`: (-1 + 0)/2 = **-0.5**, calculate in the same way as above.
- **CTS interaction**: There is only a single estimate of :math:`CTS`:
- :math:`CT` effect at high :math:`S` = 0 - :math:`CT` effect at low :math:`S` = +1 - :math:`CTS` interaction = [(0) - (+1)] / 2 = **-0.5**
- You can calculate this also by considering the :math:`CS` effect at the two levels of :math:`T` - Or, you can calculate this by considering the :math:`ST` effect at the two levels of :math:`C`. - All 3 approaches give the same result.
Next, use computer software (see below) to verify that
.. math:: y = 11.25 + 6.25x_C + 0.75x_T -7.25x_S + 0.25 x_C x_T -6.75 x_C x_S -0.25 x_T x_S - 0.25 x_C x_T x_S
The :math:`\mathbf{X}` matrix and :math:`\mathbf{y}` vector used to calculate the least squares model:
.. math:: \begin{bmatrix} 5\\30\\6\\33\\4\\3\\5\\4 \end{bmatrix} &= \begin{bmatrix} +1 & -1 & -1 & -1 & +1 & +1 & +1 & -1\\ +1 & +1 & -1 & -1 & -1 & -1 & +1 & +1\\ +1 & -1 & +1 & -1 & -1 & +1 & -1 & +1\\ +1 & +1 & +1 & -1 & +1 & -1 & -1 & -1\\ +1 & -1 & -1 & +1 & +1 & -1 & -1 & +1\\ +1 & +1 & -1 & +1 & -1 & +1 & -1 & -1\\ +1 & -1 & +1 & +1 & -1 & -1 & +1 & -1\\ +1 & +1 & +1 & +1 & +1 & +1 & +1 & +1\\ \end{bmatrix} \begin{bmatrix} b_0 \\ b_C \\ b_T \\ b_{S} \\ b_{CT} \\ b_{CS} \\ b_{TS} \\ b_{CTS} \end{bmatrix} \\ \mathbf{y} &= \mathbf{X} \mathbf{b} </rst>
The above 3 factor example, but using computer software
Try this code in a web-browser
# Create the design matrix in a quick way in R
x <- c(-1, +1)
design <- expand.grid(C=x, T=x, S=x)
design
C <- design$C
T <- design$T
S <- design$S
y <- c(5, 30, 6, 33, 4, 3, 5, 4)
# Full factorial model (you could make errors typing this in)
mod.full <- lm(y ~ C + T + S + C*T + C*S + S*T + C*T*S)
# This powerful notation will expand all terms up to the 3rd order interactions
mod.full <- lm( y ~ C*T*S )
summary(mod.full)
Compare the result from the last line with the manual calculations above.