Difference between revisions of "Design and analysis of experiments"

From Statistics for Engineering
Jump to navigation Jump to search
 
(34 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div class="noautonum">__TOC__</div>  
<div class="noautonum">__TOC__</div>  
<span style="padding:3px; margin:auto; display:table; border:2px solid red; max-width: 500px;">
 
This is a work in progress. It will be completed by the end of the day on 04 January 2016.
</span>
== Learning outcomes ==
== Learning outcomes ==
* Learn the basic terminology of experiments: responses, factors, outcomes, real-world units vs coded units, confounding
* Learn the basic terminology of experiments: responses, factors, outcomes, real-world units vs coded units, confounding
Line 12: Line 10:
* Use the concepts of response surface methods to systematically reach an optimum
* Use the concepts of response surface methods to systematically reach an optimum
* What you can do when you make a mistake, or hit against constraints.
* What you can do when you make a mistake, or hit against constraints.
== Extended readings/practice ==
* Don't accept what is often believed to be true: [http://arstechnica.com/science/2015/01/orchid-mantis-astonishing-camouflage-isnt-especially-orchid-like/ always experiment].
* In case you've been wondering how the [http://www.jstor.org/discover/10.2307/1267548 trade-off table was constructed].
* You must cover steps 16, 21 and 22 of the detailed [[Software_tutorial| software tutorial]] to master this section of the course.
* Some extra videos about experimentation, as applied in this course:
*# Interview 01: [https://youtu.be/VD3hsIm05Dc Experiments in a non-engineering context, a Hamilton baker: Made for you by Madeleine] (05:23)
*# Interview 02: [https://youtu.be/98ffTL5YJnc Dr. Soo Chan Carusone talks about experiments in a medical context] (07:16)
*# Interview 03: [https://youtu.be/gN0xqXqk6h0 David Latulippe and his student Jeff, talk about water treatment experiments] (05:46)
*# Interview 04: [https://youtu.be/lGs8-d3uybo An interview with Dr. Joe Kim at McMaster University] (10:54)


== Resources ==
== Resources ==


* [[Image:Nuvola_mimetypes_pdf.png|20px|link=Media:2015-4C3-6C3-Design-and-analysis-of-Experiments.pdf]]  [[Media:2015-4C3-6C3-Design-and-analysis-of-Experiments.pdf| Class notes 2015]] ''This is a large file, at 260 Mb''. You might prefer the smaller set of notes from 2014, but these are less comprehensive.
* [[Image:Nuvola_mimetypes_pdf.png|20px|link=Media:2015-4C3-6C3-Design-and-analysis-of-Experiments.pdf]]  [[Media:2015-4C3-6C3-Design-and-analysis-of-Experiments.pdf| Class notes 2015]] ''This is a large file, at 219 Mb''. You might prefer the smaller set of notes from 2014, but they are less comprehensive.
* [[Image:Nuvola_mimetypes_pdf.png|20px|link=Media:2014-4C3-6C3-Design-and-analysis-of-Experiments.pdf]] [[Media:2014-4C3-6C3-Design-and-analysis-of-Experiments.pdf | Class notes 2014]]
* [[Image:Nuvola_mimetypes_pdf.png|20px|link=Media:2014-4C3-6C3-Design-and-analysis-of-Experiments.pdf]] [[Media:2014-4C3-6C3-Design-and-analysis-of-Experiments.pdf | Class notes 2014]]
* [http://learnche.org/pid/design-analysis-experiments/index Textbook, chapter 5]
* [http://learnche.org/pid/design-analysis-experiments/index Textbook, chapter 5]
* You must have completed steps 13, and 16 to 22  of the [[Software tutorial]] to successfully use R in this section.
* Take the [http://yint.org/experiments Coursera course] on this topic. That course was partially developed based on the Statistics for Engineers course.
* What experiments are you [[Designed_experiments_projects_from_prior_years | going to do for your course project]]? What are the factors, what are the levels? How did you chose your levels? What is your response? How will you measure your response?
* What experiments are you [[Designed_experiments_projects_from_prior_years | going to do for your course project]]? What are the factors, what are the levels? How did you chose your levels? What is your response? How will you measure your response?
* Quizzes (with solutions): attempt these after you have watched the videos
* Quizzes (with solutions): attempt these after you have watched the videos
Line 40: Line 30:
| [https://docs.google.com/document/d/1HiKEkJrCchKxLG_NjXr2Y5EtaEiKJvP0uH3jA0-441g Solution]
| [https://docs.google.com/document/d/1HiKEkJrCchKxLG_NjXr2Y5EtaEiKJvP0uH3jA0-441g Solution]
|-
|-
| Watch videos BBB
| Watch videos 2A, 2B, 2C, 3A and 3B (note!)
| [https://docs.google.com/document/d/1yopqdqe6Ts6LTGSY_qCMFd7F4JydIrxg_wVA4R8LZrs Quiz]
| [https://docs.google.com/document/d/1yopqdqe6Ts6LTGSY_qCMFd7F4JydIrxg_wVA4R8LZrs Quiz]
| [https://docs.google.com/document/d/1Od5E1mQZSfm9O-vRaox20VvOFEqo_RmzvuFuYvo5T6I Solution]
| [https://docs.google.com/document/d/1Od5E1mQZSfm9O-vRaox20VvOFEqo_RmzvuFuYvo5T6I Solution]
|-
|-
| Watch videos CCC
| Watch videos 2D, 3C and 3D and 4A (note!)
| [https://docs.google.com/document/d/1db1OgwCROvo7JEOMZ7xlJEMW6nQAARDTz5sAva1zY4w_ Quiz]
| [https://docs.google.com/document/d/1db1OgwCROvo7JEOMZ7xlJEMW6nQAARDTz5sAva1zY4w Quiz]
| [https://docs.google.com/document/d/1T4_MSW3PX2TXqFJZvCSnMnW_suSq7-Ujx3HsxJTEqBc Solution]
| [https://docs.google.com/document/d/1T4_MSW3PX2TXqFJZvCSnMnW_suSq7-Ujx3HsxJTEqBc Solution]
|-
|-
| Watch videos DDD
| Watch videos 4B, 4C, 4D, and 4E
| [https://docs.google.com/document/d/1wXAbu7cDOJ8fj7KIKU89lWmmXYNqh-qm4K-DSRAJf-Q Quiz]
| [https://docs.google.com/document/d/1wXAbu7cDOJ8fj7KIKU89lWmmXYNqh-qm4K-DSRAJf-Q Quiz]
| [https://docs.google.com/document/d/11-YfEbhn9b6q20ey9p7tbB5_U44mwEqZpl7a0hO6lXY Solution]
| [https://docs.google.com/document/d/11-YfEbhn9b6q20ey9p7tbB5_U44mwEqZpl7a0hO6lXY Solution]
|-
|-
| Watch videos EEE
| Watch videos 4F, 4G, 4H, 5A and 5B
| [https://docs.google.com/document/d/1inrBO1R0Tu-TZO-NylYheA69Me67Z4XirrNIH1OkuMs Quiz]
| [https://docs.google.com/document/d/1inrBO1R0Tu-TZO-NylYheA69Me67Z4XirrNIH1OkuMs Quiz]
| [https://docs.google.com/document/d/14ICoIx0N9j0YDimxr8SqkmGgnkhgplzDuEZqBlgzvrA Solution]
| [https://docs.google.com/document/d/14ICoIx0N9j0YDimxr8SqkmGgnkhgplzDuEZqBlgzvrA Solution]
|-
|-
| Watch videos FFF
| Watch videos 5C, 5D, 5E and 5F
| [https://docs.google.com/document/d/1msOcyj61D8O8uxTtEgLm3O4YSC4dHYIXpbg-G5w27jM Quiz]
| [https://docs.google.com/document/d/1msOcyj61D8O8uxTtEgLm3O4YSC4dHYIXpbg-G5w27jM Quiz]
| [https://docs.google.com/document/d/1cgmbIRNUsf3t4-cA7XE3Qg8Y_ivyJvAuNzV7AAf9LXA Solution]
| [https://docs.google.com/document/d/1cgmbIRNUsf3t4-cA7XE3Qg8Y_ivyJvAuNzV7AAf9LXA Solution]
|}
|}
== Extended readings/practice ==
* Don't accept what is often believed to be true: [http://arstechnica.com/science/2015/01/orchid-mantis-astonishing-camouflage-isnt-especially-orchid-like/ always experiment].
* In case you've been wondering how the [http://www.jstor.org/discover/10.2307/1267548 trade-off table was constructed].
* You must cover steps 16, 21 and 22 of the detailed [[Software_tutorial| software tutorial]] to master this section of the course.
* Some extra videos about experimentation, as applied in this course:
*# Interview 01: [https://youtu.be/VD3hsIm05Dc Experiments in a non-engineering context, a Hamilton baker: Made for you by Madeleine] (05:23)
*# Interview 02: [https://youtu.be/98ffTL5YJnc Dr. Soo Chan Carusone talks about experiments in a medical context] (07:16)
*# Interview 03: [https://youtu.be/gN0xqXqk6h0 David Latulippe and his student Jeff, talk about water treatment experiments] (05:46)
*# Interview 04: [https://youtu.be/lGs8-d3uybo An interview with Dr. Joe Kim at McMaster University] (10:54)
* '''Something new to try''' (09 March 2016): test your knowledge about Design of Experiments by optimizing these systems: [http://yint.org/rsmopt-m http://rsmopt.com]


== Class videos from prior years ==
== Class videos from prior years ==
===Videos from 2015===
===Videos from 2015===
Watch all these videos in [https://www.youtube.com/watch?v=9Sljs2064u4&list=PLHUnYbefLmeOPRuT1sukKmRyOVd4WSxJE&index=31 this YouTube playlist]


* 00 - Introduction video for the Coursera online course [01:56]
* 00 - Introduction video for the Coursera online course [01:56]
Line 243: Line 246:
[[Design_and_analysis_of_experiments_(2014)|See the webpage from 2014]]
[[Design_and_analysis_of_experiments_(2014)|See the webpage from 2014]]


{{#widget:Vimeo|id=87882235}}
<!--{{#widget:Vimeo|id=87882235}}
{{#widget:Vimeo|id=88125382}}
{{#widget:Vimeo|id=88125382}}
{{#widget:Vimeo|id=88302027}}
{{#widget:Vimeo|id=88302027}}
Line 255: Line 258:
{{#widget:Vimeo|id=90230249}}
{{#widget:Vimeo|id=90230249}}
{{#widget:Vimeo|id=90274891}}
{{#widget:Vimeo|id=90274891}}
{{#widget:Vimeo|id=90659606}}
{{#widget:Vimeo|id=90659606}}-->


===Videos from 2013===
===Videos from 2013===
[[Design_and_analysis_of_experiments_(2013)|See the webpage from 2013]]
[[Design_and_analysis_of_experiments_(2013)|See the webpage from 2013]]


<!--
{{#widget:Vimeo|id=61377497}}
{{#widget:Vimeo|id=61377497}}
{{#widget:Vimeo|id=61649809}}
{{#widget:Vimeo|id=61649809}}
Line 271: Line 275:
{{#widget:Vimeo|id=63267610}}
{{#widget:Vimeo|id=63267610}}
{{#widget:Vimeo|id=63425141}}
{{#widget:Vimeo|id=63425141}}
{{#widget:Vimeo|id=63679543}}
{{#widget:Vimeo|id=63679543}}-->


== Software codes for this section ==
== Software codes for this section ==


=== Code to show how to build and plot a least squares model in R ===
=== Code to build a model for a 2-factor system ===
[http://www.r-fiddle.org/#/fiddle?id=tOdY4LEG Try this code in a web-browser]
* [http://www.r-fiddle.org/#/fiddle?id=RJyNzQ1B Run this code in a web-browser]
 
<html><div data-datacamp-exercise data-lang="r" data-height="400px">
    <code data-type="sample-code">
 
# Centered and scaled temperature
T <- c(-1, +1, -1, +1)     
 
# Centered and scaled substrate concentration
S <- c(-1, -1, +1, +1)     
 
# Conversion is the response, y
y <- c(69, 60, 64, 53)     
 
# this works, but is more typing
mod <- lm(y ~ T + S + T * S)
 
# preferred method
mod <- lm(y ~ T * S)     
   
summary(mod)
    </code>
</div></html>
 
=== Detailed analysis of the 3 factor example ===
<rst>
<rst-options: 'toc' = False/>
The data are from a plastics molding factory which must treat its waste before discharge.  The :math:`y` variable represents the average amount of pollutant discharged [lb per day], while the 3 factors that were varied were:
 
- :math:`C`: the chemical compound added [A or B]
- :math:`T`: the treatment temperature [72°F or 100°F]
- :math:`S`: the stirring speed [200 rpm or 400 rpm]
- :math:`y`: the amount of pollutant discharged [lb per day]
 
.. tabularcolumns:: |l|l||c|c|c||c|
 
+-----------+-------+---------------+-----------------+-----------------+-----------------+
| Experiment| Order | :math:`C`    | :math:`T` [°F]  | :math:`S` [rpm] | :math:`y` [lb]  |
+===========+=======+===============+=================+=================+=================+
| 1        | 5    | A            | 72              | 200            | 5              |
+-----------+-------+---------------+-----------------+-----------------+-----------------+
| 2        | 6    | B            | 72              | 200            | 30              |
+-----------+-------+---------------+-----------------+-----------------+-----------------+
| 3        | 1    | A            | 100            | 200            | 6              |
+-----------+-------+---------------+-----------------+-----------------+-----------------+
| 4        | 4    | B            | 100            | 200            | 33              |
+-----------+-------+---------------+-----------------+-----------------+-----------------+
| 5        | 2    | A            | 72              | 400            | 4              |
+-----------+-------+---------------+-----------------+-----------------+-----------------+
| 6        | 7    | B            | 72              | 400            | 3              |
+-----------+-------+---------------+-----------------+-----------------+-----------------+
| 7        | 3    | A            | 100            | 400            | 5              |
+-----------+-------+---------------+-----------------+-----------------+-----------------+
| 8        | 8    | B            | 100            | 400            | 4              |
+-----------+-------+---------------+-----------------+-----------------+-----------------+
 
We showed the cube plot for this system in the class on 10 March.  From the cube plot we could already see the main factors, and even the **CS** interaction was noticeable.
 
* **C effect**: There are 4 estimates of :math:`C = \displaystyle \frac{(+25) + (+27) + (-1) + (-1)}{4} = \frac{50}{4} = \bf{12.5}`
* **T effect**: There are 4 estimates of :math:`T = \displaystyle \frac{(+1) + (+3) + (+1) + (+1)}{4} = \frac{6}{4} = \bf{1.5}`
* **S effect**: There are 4 estimates of :math:`S = \displaystyle \frac{(-27) + (-1) + (-29) + (-1)}{4} = \frac{-58}{4} = \bf{-14.5}`
* **CT interaction**: There are 2 estimates of :math:`CT`.  Recall that interactions are calculated as the half difference going from high to low.  Consider the change in :math:`C` when
 
- :math:`T_\text{high}` (at :math:`S` high) = 4 - 5 = -1
- :math:`T_\text{low}` (at :math:`S` high) = 3 - 4 = -1
- First estimate = [(-1) - (-1)]/2 = 0
- :math:`T_\text{high}` (at :math:`S` low) = 33 - 6 = +27
- :math:`T_\text{low}` (at :math:`S` low) = 30 - 5 = +25
- Second estimate = [(+27) - (+25)]/2 = +1
- Average **CT** interaction = (0 + 1)/2 = **0.5**
- You can interchange :math:`C` and :math:`T` and still get the same result.
 
* **CS interaction**: There are 2 estimates of :math:`CS`.  Consider the change in :math:`C` when
 
- :math:`S_\text{high}` (at :math:`T` high) = 4 - 5 = -1
- :math:`S_\text{low}` (at :math:`T` high) = 33 - 6 = +27
- First estimate = [(-1) - (+27)]/2 = -14
- :math:`S_\text{high}` (at :math:`T` low) = 3 - 4 = -1
- :math:`S_\text{low}` (at :math:`T` low) = 30 - 5 = +25
- Second estimate = [(-1) - (+25)]/2 = -13
 
- Average **CS** interaction = (-13 - 14)/2 = **-13.5**
- You can interchange :math:`C` and :math:`S` and still get the same result.
* **ST interaction**: There are 2 estimates of :math:`ST`: (-1 + 0)/2 = **-0.5**, calculate in the same way as above.
 
* **CTS interaction**: There is only a single estimate of :math:`CTS`:
 
- :math:`CT` effect at high :math:`S` = 0
- :math:`CT` effect at low :math:`S` = +1
- :math:`CTS` interaction = [(0) - (+1)] / 2 = **-0.5**
- You can calculate this also by considering the :math:`CS` effect at the two levels of :math:`T`
- Or, you can calculate this by considering the :math:`ST` effect at the two levels of :math:`C`.
- All 3 approaches give the same result.
Next, use computer software (see below) to verify that
 
.. math::
y = 11.25 + 6.25x_C + 0.75x_T -7.25x_S + 0.25 x_C x_T -6.75 x_C x_S -0.25 x_T x_S - 0.25 x_C x_T x_S
 
The :math:`\mathbf{X}` matrix and :math:`\mathbf{y}` vector used to calculate the least squares model:
 
.. math::
\begin{bmatrix} 5\\30\\6\\33\\4\\3\\5\\4 \end{bmatrix} &= \begin{bmatrix} +1 & -1 & -1 & -1 & +1 & +1 & +1 & -1\\                 +1 & +1 & -1 & -1 & -1 & -1 & +1 & +1\\                 +1 & -1 & +1 & -1 & -1 & +1 & -1 & +1\\                 +1 & +1 & +1 & -1 & +1 & -1 & -1 & -1\\                 +1 & -1 & -1 & +1 & +1 & -1 & -1 & +1\\                 +1 & +1 & -1 & +1 & -1 & +1 & -1 & -1\\                 +1 & -1 & +1 & +1 & -1 & -1 & +1 & -1\\                 +1 & +1 & +1 & +1 & +1 & +1 & +1 & +1\\ \end{bmatrix} \begin{bmatrix} b_0 \\ b_C \\ b_T \\ b_{S} \\ b_{CT} \\ b_{CS} \\ b_{TS} \\ b_{CTS}  \end{bmatrix} \\ \mathbf{y} &= \mathbf{X} \mathbf{b}
</rst>
=== The above 3 factor example, but using computer software ===
 
* [http://www.r-fiddle.org/#/fiddle?id=EFnCAWkr Run this code in a web-browser]
 
<html><div data-datacamp-exercise data-lang="r" data-height="auto">
    <code data-type="sample-code">
 
# Create the design matrix in a quick way in R
x <- c(-1, +1)
design <- expand.grid(C=x, T=x, S=x)
design
C <- design$C
T <- design$T
S <- design$S
y <- c(5, 30, 6, 33, 4, 3, 5, 4)
# Full factorial model (you could make errors typing this in)
mod.full <- lm(y ~ C + T + S + C*T + C*S + S*T + C*T*S)
# This powerful notation will expand all terms up to the 3rd order interactions
mod.full <- lm( y ~ C * T * S )
summary(mod.full)
 
    </code>
</div></html>
 
Compare the result from the last line with the manual calculations above.
 
=== Quickly calculate, with software, the names of all the interactions terms ===
* Calculating all combinations of the interactions can be tedious by-hand in larger models.
* Let the computer do the work for you.
* [http://www.r-fiddle.org/#/fiddle?id=BsGCk7wB Run this code in a web-browser]
 
<html><div data-datacamp-exercise data-lang="r" data-height="auto">
    <code data-type="sample-code">
 
print('Two-factor interactions in a 4-factor model')
attr(terms.formula( y ~ (A+B+C+D)^2 ), "term.labels")
 
print('Two- and three-factor interactions in a 4-factor model')
attr(terms.formula( y ~ (A+B+C+D)^3 ), "term.labels")
 
print('Two-, 3-, and 4-factor interactions in a 4-factor model')
attr(terms.formula( y ~ (A+B+C+D)^4 ), "term.labels") 
    </code>
</div></html>
 
=== Plot a Pareto plot of all coefficients in a least-squares model ===
 
* [http://www.r-fiddle.org/#/fiddle?id=xD45WYG8 Run this code in a web-browser]
 
<html><div data-datacamp-exercise data-lang="r" data-height="auto">
    <code data-type="sample-code">
 
# First create a model to demonstrate.
x <- c(-1, +1)
design <- expand.grid(C=x, T=x, S=x)
C <- design$C
T <- design$T
S <- design$S
y <- c(5, 30, 6, 33, 4, 3, 5, 4)
my.model <- lm( y ~ C * T * S )
 
# The quick way to plot the coefficients: you can supply
# the "paretoPlot" function with any least-squares model
#library(pid)
 
# The "PID" library is not part of this web-based version
# of R. So load it from this server:
source('https://yint.org/paretoPlot.R')
 
# And now we can generate the plot:
p <- paretoPlot(my.model) 
    </code>
</div></html>
 
=== Fractional-factorial example (saturated design) in R, including how to eliminate factors ===
 
* The code shows how the Pareto plot is used to find variables with low significance.
* These variables can be eliminated and the model rebuilt without them.
* [http://www.r-fiddle.org/#/fiddle?id=2IG7mBbA  Run this code in a web-browser]
<html><div data-datacamp-exercise data-lang="r" data-height="600px">
    <code data-type="sample-code">
A <- B <- C <- c(-1, +1)
design <- expand.grid(A=A, B=B, C=C)
A <- design$A
B <- design$B
C <- design$C
# Specify the other factors 4: up to 7 factors plus an
# intercept can be estimated from the 8 experiments.
D <- A*B
E <- A*C
F <- B*C
G <- A*B*C
 
# The results from the 8 experiments
y <- c(77.1, 68.9, 75.5, 72.5, 67.9, 68.5, 71.5, 63.7)
mod <- lm(y ~ A + B + C + D + E + F + G)
summary(mod)
 
# Visualize the Pareto plot using the PID library
#library(pid)
# The "PID" library is not part of this web-based version
# of R. So load it from this server:
source('https://yint.org/paretoPlot.R')
paretoPlot(mod)
# The factors B, F and D are not important.
# Remove and rebuild the model without them.
mod.update.1 <- lm(y ~ A + C + E + G)
confint(mod.update.1)
paretoPlot(mod.update.1)
 
# Notice that the bars are the same length as before.
# Can you explain why?
# The confidence interval for factor E spans zero. It
# might be of practical relevance, but it is not of
# statistical relevance.
mod.update.2 <- lm(y ~ A + C + G)
confint(mod.update.2)
 
    </code>
</div></html>
 
=== Plot a contour plot of a least-squares DOE model: a system with 2 factors ===
 
* [http://www.r-fiddle.org/#/fiddle?id=PPFHsIzX Run this code in a web-browser]
<html><div data-datacamp-exercise data-lang="r" data-height="700px">
    <code data-type="sample-code">
# First create a model to demonstrate.
# Note that this model has a center-point at (0,0)
T <- c(-1,    +1,  -1,  +1,    0)
S <- c(-1,    -1,  +1,  +1,    0)
y <- c(193,  310,  468,  571,  407)
# Visualize the surface
my.model.round1 <- lm(y ~ T*S)
 
# Note that the interaction is small;
# we should observe that in the contour plot
summary(my.model.round1)     
 
# The PID library can draw contour plots for
# any least squares model where there are two factors:
# library(pid)
# The "PID" library is not part of this web-based version
# of R. So load it from this server:
source('https://yint.org/contourPlot.R')
 
# Confirms the weak interaction
contourPlot(my.model.round1) 
 
# Now make the interaction more prominent
T <- c(-1,    +1,  -1,  +1,    0)
S <- c(-1,    -1,  +1,  +1,    0)
y <- c(193,  310,  281,  571,  407)
my.model.round2 <- lm(y ~ T*S)
 
# Note the stronger interaction
summary(my.model.round2)
 
# Confirms the strong interaction
contourPlot(my.model.round2)
    </code>
</div></html>
 
=== Plot a contour plot of a least-squares DOE model: a system with 3 factors ===
 
* When you have 3 (or more) factors in a DOE system, you can still use the <tt>contourPlot()</tt> function in R.
* Except, you need to tell the software which variables you want to see on the \(x\) and \(y\) axes.
* [http://www.r-fiddle.org/#/fiddle?id=JGkhreRt Run this code in a web-browser]
 
<html><div data-datacamp-exercise data-lang="r" data-height="600px">
    <code data-type="sample-code">
# First create a model to demonstrate.
x <- c(-1, +1)
design <- expand.grid(C=x, T=x, S=x)
C <- design$C
T <- design$T
S <- design$S
y <- c(5, 30, 6, 33, 4, 3, 5, 4)
mod.full <- lm( y ~ C*T*S )
 
# The summary shows a high C:S interaction, but
# the other interactions are weak. We should confirm
# this in the visualizations that follow next.
summary(mod.full)
 
# Visualize the surface using the PID library in R
# library(pid)
# The "PID" library is not part of this web-based version
# of R. So load it from this server:
source('https://yint.org/contourPlot.R')
 
# Any least squares model can be supplied as
# the first input into the contourPlot function.
 
# Start by visualizing the C and T factors
# (small interaction only)
contourPlot(mod.full, "C", "T", main='Weak interaction')
 
# Try the T and S factors
# (some interaction)
contourPlot(mod.full, "T", "S", main='Weak interaction')
 
# Try the C and S factor combination
# (high interaction)
contourPlot(mod.full, "C", "S", main='Strong interaction')
 
# Change the colour scheme and the plot resolution
contourPlot(mod.full, "C", "S", N=50,
            colour.function=rainbow,
            main='Different colour scheme')
 
# In which direction would you go to minimize
# the response surface?
</code>
</div></html>


<syntaxhighlight lang="rsplus">
=== Bringing it all together: a comprehensive response surface method (RSM) example) ===


</syntaxhighlight>
The code is provided here for you to try: http://yint.org/rsm-code

Latest revision as of 20:48, 14 January 2019

Learning outcomes

  • Learn the basic terminology of experiments: responses, factors, outcomes, real-world units vs coded units, confounding
  • Analyze and interpret data from an experiment with 2, 3 or more factors by hand
  • Use R to do the analysis, and interpret the various plots, such as the Pareto plot
  • Analyze and interpret data from an experiment with 3 or more factors
  • Recognize when to use a fractional factorial, to go mostly the same results as from a full factorial
  • Understand when to use screening experiments
  • Use the concepts of response surface methods to systematically reach an optimum
  • What you can do when you make a mistake, or hit against constraints.

Resources

  • Nuvola mimetypes pdf.png Class notes 2015 This is a large file, at 219 Mb. You might prefer the smaller set of notes from 2014, but they are less comprehensive.
  • Nuvola mimetypes pdf.png Class notes 2014
  • Textbook, chapter 5
  • You must have completed steps 13, and 16 to 22 of the Software tutorial to successfully use R in this section.
  • Take the Coursera course on this topic. That course was partially developed based on the Statistics for Engineers course.
  • What experiments are you going to do for your course project? What are the factors, what are the levels? How did you chose your levels? What is your response? How will you measure your response?
  • Quizzes (with solutions): attempt these after you have watched the videos
Tasks to do first Quiz Solution
Watch videos 1A, 1B, 1C and 1D Quiz Solution
Watch videos 2A, 2B, 2C, 3A and 3B (note!) Quiz Solution
Watch videos 2D, 3C and 3D and 4A (note!) Quiz Solution
Watch videos 4B, 4C, 4D, and 4E Quiz Solution
Watch videos 4F, 4G, 4H, 5A and 5B Quiz Solution
Watch videos 5C, 5D, 5E and 5F Quiz Solution

Extended readings/practice

Class videos from prior years

Videos from 2015

Watch all these videos in this YouTube playlist

  • 00 - Introduction video for the Coursera online course [01:56]
  • 1A - Why experiments are so important [07:48]
  • 1B - Some basic terminology [06:37]
  • 1C - Analysis of your first experiment [09:00]
  • 1D - How NOT to run an experiment [03:07]
  • 2A - Analysis of experiments in two factors by hand [13:37]
  • 2B - Numeric predictions from two-factor experiments [07:25]
  • 2C - Two-factor experiments with interactions [15:15]
  • 2D - In-depth case study: analyzing a system with 3 factors by hand [17:28]
  • 3A - Setting up the least squares model for a 2 factor experiment [05:46]
  • 3B - Solving the mathematical model for a 2 factor experiment using software [08:46]
  • 3C - Using computer software for a 3 factor experiment [08:37]
  • 3D - Case study: a 4-factor system using computer software [09:03]
  • 4A - The trade-offs when doing half-fraction factorials [13:20]
  • 4B - The technical details behind half-fractions [09:38]
  • 4C - A case study with aliasing in a fractional factorial [06:38]
  • 4D - All about disturbances, why we randomize, and what covariates are [11:00]
  • 4E - All about blocking [09:21]
  • 4F - Fractional factorials: introducing aliasing notation [12:00]
  • 4G - Fractional factorials: using aliasing notation to plan experiments [10:45]
  • 4H - An example of an analyzing an experiment with aliasing [09:50]
  • 5A - Response surface methods - an introduction [06:13]
  • 5B - Response surface methods (RSM) in one variable [18:40]
  • 5C - Why changing one factor at a time (OFAT) will mislead you [05:33]
  • 5D - The concept of contour plots and which objectives should we maximize [03:40]
  • 5E - RSM in 2 factors: introducing the case study [19:20]
  • 5F - RSM case study continues: constraints and mistakes [13:45]
  • 5G - RSM case study continues: approaching the optimum [17:05]
  • 06 - Wrap-up: the course in review, multiple objectives, and references for the future [08:10]
01:56 | Download video | Download captions | Script
07:24 | Download video | Download captions | Script
05:48 | Download video | Download captions | Script
08:57 | Download video | Download captions | Script
03:23 | Download video | Download captions | Script
13:37 | Download video | Download captions | Script
07:25 | Download video | Download captions | Script
15:15 | Download video | Download captions | Script
17:28 | Download video | Download captions | Script
05:46 | Download video | Download captions | Script
08:46 | Download video | Download captions | Script
08:37 | Download video | Download captions | Script
09:03 | Download video | Download captions | Script
13:20 | Download video | Download captions | Script
09:38 | Download video | Download captions | Script
06:38 | Download video | Download captions | Script
11:00 | Download video | Download captions | Script
09:21 | Download video | Download captions | Script
12:00 | Download video | Download captions | Script
10:45 | Download video | Download captions | Script
09:50 | Download video | Download captions | Script
06:13 | Download video | Download captions | Script
18:40 | Download video | Download captions | Script
05:33 | Download video | Download captions | Script
03:40 | Download video | Download captions | Script
19:20 | Download video | Download captions | Script
13:45 | Download video | Download captions | Script
17:05 | Download video | Download captions | Script
08:10 | Download video | Download captions | Script

Videos from 2014

See the webpage from 2014


Videos from 2013

See the webpage from 2013


Software codes for this section

Code to build a model for a 2-factor system

# Centered and scaled temperature T <- c(-1, +1, -1, +1) # Centered and scaled substrate concentration S <- c(-1, -1, +1, +1) # Conversion is the response, y y <- c(69, 60, 64, 53) # this works, but is more typing mod <- lm(y ~ T + S + T * S) # preferred method mod <- lm(y ~ T * S) summary(mod)

Detailed analysis of the 3 factor example

<rst> <rst-options: 'toc' = False/> The data are from a plastics molding factory which must treat its waste before discharge. The :math:`y` variable represents the average amount of pollutant discharged [lb per day], while the 3 factors that were varied were:

	-	:math:`C`: the chemical compound added [A or B]

- :math:`T`: the treatment temperature [72°F or 100°F] - :math:`S`: the stirring speed [200 rpm or 400 rpm] - :math:`y`: the amount of pollutant discharged [lb per day]

.. tabularcolumns:: |l|l||c|c|c||c|

+-----------+-------+---------------+-----------------+-----------------+-----------------+ | Experiment| Order | :math:`C` | :math:`T` [°F] | :math:`S` [rpm] | :math:`y` [lb] | +===========+=======+===============+=================+=================+=================+ | 1 | 5 | A | 72 | 200 | 5 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 2 | 6 | B | 72 | 200 | 30 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 3 | 1 | A | 100 | 200 | 6 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 4 | 4 | B | 100 | 200 | 33 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 5 | 2 | A | 72 | 400 | 4 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 6 | 7 | B | 72 | 400 | 3 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 7 | 3 | A | 100 | 400 | 5 | +-----------+-------+---------------+-----------------+-----------------+-----------------+ | 8 | 8 | B | 100 | 400 | 4 | +-----------+-------+---------------+-----------------+-----------------+-----------------+

We showed the cube plot for this system in the class on 10 March. From the cube plot we could already see the main factors, and even the **CS** interaction was noticeable.

  • **C effect**: There are 4 estimates of :math:`C = \displaystyle \frac{(+25) + (+27) + (-1) + (-1)}{4} = \frac{50}{4} = \bf{12.5}`
  • **T effect**: There are 4 estimates of :math:`T = \displaystyle \frac{(+1) + (+3) + (+1) + (+1)}{4} = \frac{6}{4} = \bf{1.5}`
  • **S effect**: There are 4 estimates of :math:`S = \displaystyle \frac{(-27) + (-1) + (-29) + (-1)}{4} = \frac{-58}{4} = \bf{-14.5}`
  • **CT interaction**: There are 2 estimates of :math:`CT`. Recall that interactions are calculated as the half difference going from high to low. Consider the change in :math:`C` when

- :math:`T_\text{high}` (at :math:`S` high) = 4 - 5 = -1 - :math:`T_\text{low}` (at :math:`S` high) = 3 - 4 = -1 - First estimate = [(-1) - (-1)]/2 = 0 - :math:`T_\text{high}` (at :math:`S` low) = 33 - 6 = +27 - :math:`T_\text{low}` (at :math:`S` low) = 30 - 5 = +25 - Second estimate = [(+27) - (+25)]/2 = +1

- Average **CT** interaction = (0 + 1)/2 = **0.5** - You can interchange :math:`C` and :math:`T` and still get the same result.

  • **CS interaction**: There are 2 estimates of :math:`CS`. Consider the change in :math:`C` when

- :math:`S_\text{high}` (at :math:`T` high) = 4 - 5 = -1 - :math:`S_\text{low}` (at :math:`T` high) = 33 - 6 = +27 - First estimate = [(-1) - (+27)]/2 = -14 - :math:`S_\text{high}` (at :math:`T` low) = 3 - 4 = -1 - :math:`S_\text{low}` (at :math:`T` low) = 30 - 5 = +25 - Second estimate = [(-1) - (+25)]/2 = -13

- Average **CS** interaction = (-13 - 14)/2 = **-13.5** - You can interchange :math:`C` and :math:`S` and still get the same result.

  • **ST interaction**: There are 2 estimates of :math:`ST`: (-1 + 0)/2 = **-0.5**, calculate in the same way as above.
  • **CTS interaction**: There is only a single estimate of :math:`CTS`:

- :math:`CT` effect at high :math:`S` = 0 - :math:`CT` effect at low :math:`S` = +1 - :math:`CTS` interaction = [(0) - (+1)] / 2 = **-0.5**

- You can calculate this also by considering the :math:`CS` effect at the two levels of :math:`T` - Or, you can calculate this by considering the :math:`ST` effect at the two levels of :math:`C`. - All 3 approaches give the same result.

Next, use computer software (see below) to verify that

.. math:: y = 11.25 + 6.25x_C + 0.75x_T -7.25x_S + 0.25 x_C x_T -6.75 x_C x_S -0.25 x_T x_S - 0.25 x_C x_T x_S

The :math:`\mathbf{X}` matrix and :math:`\mathbf{y}` vector used to calculate the least squares model:

.. math:: \begin{bmatrix} 5\\30\\6\\33\\4\\3\\5\\4 \end{bmatrix} &= \begin{bmatrix} +1 & -1 & -1 & -1 & +1 & +1 & +1 & -1\\ +1 & +1 & -1 & -1 & -1 & -1 & +1 & +1\\ +1 & -1 & +1 & -1 & -1 & +1 & -1 & +1\\ +1 & +1 & +1 & -1 & +1 & -1 & -1 & -1\\ +1 & -1 & -1 & +1 & +1 & -1 & -1 & +1\\ +1 & +1 & -1 & +1 & -1 & +1 & -1 & -1\\ +1 & -1 & +1 & +1 & -1 & -1 & +1 & -1\\ +1 & +1 & +1 & +1 & +1 & +1 & +1 & +1\\ \end{bmatrix} \begin{bmatrix} b_0 \\ b_C \\ b_T \\ b_{S} \\ b_{CT} \\ b_{CS} \\ b_{TS} \\ b_{CTS} \end{bmatrix} \\ \mathbf{y} &= \mathbf{X} \mathbf{b} </rst>

The above 3 factor example, but using computer software

# Create the design matrix in a quick way in R x <- c(-1, +1) design <- expand.grid(C=x, T=x, S=x) design C <- design$C T <- design$T S <- design$S y <- c(5, 30, 6, 33, 4, 3, 5, 4) # Full factorial model (you could make errors typing this in) mod.full <- lm(y ~ C + T + S + C*T + C*S + S*T + C*T*S) # This powerful notation will expand all terms up to the 3rd order interactions mod.full <- lm( y ~ C * T * S ) summary(mod.full)

Compare the result from the last line with the manual calculations above.

Quickly calculate, with software, the names of all the interactions terms

  • Calculating all combinations of the interactions can be tedious by-hand in larger models.
  • Let the computer do the work for you.
  • Run this code in a web-browser

print('Two-factor interactions in a 4-factor model') attr(terms.formula( y ~ (A+B+C+D)^2 ), "term.labels") print('Two- and three-factor interactions in a 4-factor model') attr(terms.formula( y ~ (A+B+C+D)^3 ), "term.labels") print('Two-, 3-, and 4-factor interactions in a 4-factor model') attr(terms.formula( y ~ (A+B+C+D)^4 ), "term.labels")

Plot a Pareto plot of all coefficients in a least-squares model

# First create a model to demonstrate. x <- c(-1, +1) design <- expand.grid(C=x, T=x, S=x) C <- design$C T <- design$T S <- design$S y <- c(5, 30, 6, 33, 4, 3, 5, 4) my.model <- lm( y ~ C * T * S ) # The quick way to plot the coefficients: you can supply # the "paretoPlot" function with any least-squares model #library(pid) # The "PID" library is not part of this web-based version # of R. So load it from this server: source('https://yint.org/paretoPlot.R') # And now we can generate the plot: p <- paretoPlot(my.model)

Fractional-factorial example (saturated design) in R, including how to eliminate factors

  • The code shows how the Pareto plot is used to find variables with low significance.
  • These variables can be eliminated and the model rebuilt without them.
  • Run this code in a web-browser

A <- B <- C <- c(-1, +1) design <- expand.grid(A=A, B=B, C=C) A <- design$A B <- design$B C <- design$C # Specify the other factors 4: up to 7 factors plus an # intercept can be estimated from the 8 experiments. D <- A*B E <- A*C F <- B*C G <- A*B*C # The results from the 8 experiments y <- c(77.1, 68.9, 75.5, 72.5, 67.9, 68.5, 71.5, 63.7) mod <- lm(y ~ A + B + C + D + E + F + G) summary(mod) # Visualize the Pareto plot using the PID library #library(pid) # The "PID" library is not part of this web-based version # of R. So load it from this server: source('https://yint.org/paretoPlot.R') paretoPlot(mod) # The factors B, F and D are not important. # Remove and rebuild the model without them. mod.update.1 <- lm(y ~ A + C + E + G) confint(mod.update.1) paretoPlot(mod.update.1) # Notice that the bars are the same length as before. # Can you explain why? # The confidence interval for factor E spans zero. It # might be of practical relevance, but it is not of # statistical relevance. mod.update.2 <- lm(y ~ A + C + G) confint(mod.update.2)

Plot a contour plot of a least-squares DOE model: a system with 2 factors

# First create a model to demonstrate. # Note that this model has a center-point at (0,0) T <- c(-1, +1, -1, +1, 0) S <- c(-1, -1, +1, +1, 0) y <- c(193, 310, 468, 571, 407) # Visualize the surface my.model.round1 <- lm(y ~ T*S) # Note that the interaction is small; # we should observe that in the contour plot summary(my.model.round1) # The PID library can draw contour plots for # any least squares model where there are two factors: # library(pid) # The "PID" library is not part of this web-based version # of R. So load it from this server: source('https://yint.org/contourPlot.R') # Confirms the weak interaction contourPlot(my.model.round1) # Now make the interaction more prominent T <- c(-1, +1, -1, +1, 0) S <- c(-1, -1, +1, +1, 0) y <- c(193, 310, 281, 571, 407) my.model.round2 <- lm(y ~ T*S) # Note the stronger interaction summary(my.model.round2) # Confirms the strong interaction contourPlot(my.model.round2)

Plot a contour plot of a least-squares DOE model: a system with 3 factors

  • When you have 3 (or more) factors in a DOE system, you can still use the contourPlot() function in R.
  • Except, you need to tell the software which variables you want to see on the \(x\) and \(y\) axes.
  • Run this code in a web-browser

# First create a model to demonstrate. x <- c(-1, +1) design <- expand.grid(C=x, T=x, S=x) C <- design$C T <- design$T S <- design$S y <- c(5, 30, 6, 33, 4, 3, 5, 4) mod.full <- lm( y ~ C*T*S ) # The summary shows a high C:S interaction, but # the other interactions are weak. We should confirm # this in the visualizations that follow next. summary(mod.full) # Visualize the surface using the PID library in R # library(pid) # The "PID" library is not part of this web-based version # of R. So load it from this server: source('https://yint.org/contourPlot.R') # Any least squares model can be supplied as # the first input into the contourPlot function. # Start by visualizing the C and T factors # (small interaction only) contourPlot(mod.full, "C", "T", main='Weak interaction') # Try the T and S factors # (some interaction) contourPlot(mod.full, "T", "S", main='Weak interaction') # Try the C and S factor combination # (high interaction) contourPlot(mod.full, "C", "S", main='Strong interaction') # Change the colour scheme and the plot resolution contourPlot(mod.full, "C", "S", N=50, colour.function=rainbow, main='Different colour scheme') # In which direction would you go to minimize # the response surface?

Bringing it all together: a comprehensive response surface method (RSM) example)

The code is provided here for you to try: http://yint.org/rsm-code