Assignment 4 - 2011 - Solution
Due date(s): 07 February 2011
Assignment objectives
=====================

- Be comfortable using distributions in R.
- Use unpaired and paired tests via confidence intervals.
- Construct and use Shewhart process monitoring charts.
Question 1 [1.5]
================
In the previous assignment you collected the snowfall and temperature data for the ``HAMILTON A`` weather station. Here are the data again:
* 1990 to 1999 snowfall: :math:`[131.2, 128.0, 130.7, 190.6, 263.4, 138.0, 207.3, 161.5, 78.8, 166.5]`
* 2000 to 2008 snowfall: :math:`[170.9, 94.1, 138.0, 166.2, 175.8, 218.4, 56.6, 182.4, 243.2]`
* 1990 to 1999 temperature: :math:`[8.6, 8.6, 6.9, 7.1, 7.1, 7.7, 6.9, 7.3, 9.8, 8.8]`
* 2000 to 2008 temperature: :math:`[7.6, 8.8, 8.8, 7.3, 7.7, 8.2, 9.1, 8.2, 7.7]`
1. Use these data to construct a :math:`z`-value *and* confidence interval on the assumption that the snowfall in the earlier decade (case A) is statistically the same as in 2000 to 2008 (case B).

2. Repeat this analysis for the average temperature values.

3. Do these, admittedly limited, data support the conclusion that many people keep repeating: the amount of snow we receive is less than before and that temperatures have gone up?

4. In the above analysis you had to pool the variances. There is a formal statistical test, described in the course notes, to verify whether the variances could have come from the same population:
.. math::
   :nowrap:

   \begin{alignat*}{4}
   F_{\alpha/2, \nu_1, \nu_2}\dfrac{s_2^2}{s_1^2} &\qquad<\qquad& \dfrac{\sigma_2^2}{\sigma_1^2} &\qquad<\qquad& F_{1-\alpha/2, \nu_1, \nu_2}\dfrac{s_2^2}{s_1^2}
   \end{alignat*}
where we use :math:`F_{\alpha/2, \nu_1, \nu_2}` to mean the point along the cumulative :math:`F`-distribution which has area of :math:`\alpha/2`, using :math:`\nu_1` degrees of freedom for estimating :math:`s_1` and :math:`\nu_2` degrees of freedom for estimating :math:`s_2`. For example, in R, the value of :math:`F_{0.05/2, 10, 20}` can be found from ``qf(0.025, 10, 20)`` as 0.2925. The point along the cumulative :math:`F`-distribution which has area of :math:`1-\alpha/2` is denoted :math:`F_{1-\alpha/2, \nu_1, \nu_2}`, and :math:`\alpha` is the significance level, usually :math:`\alpha = 0.05`, corresponding to a 95% confidence level.
Confirm that you can pool the variances in both the snowfall and temperature case by verifying the confidence interval contains a value of 1.0.
.. note:: The equations in the printed course notes you downloaded, or bought from Titles, are wrong; the above version is correct. Please update your printouts.
Solution (Thanks to Ryan and Stuart)
1 and 2.
Unpaired :math:`t`-tests were performed on the snowfall and temperature datasets from 1990-1999 and 2000-2008 to test the assumption that the amount of snowfall and the average temperature have not changed significantly over the last two decades. In order to perform these tests, the following assumptions were made:

* The variances of both samples are comparable
* Independence within each sample and between the samples
* Both samples are normally distributed
The :math:`z`-value for this test was constructed as follows:
.. math::

   z = \frac{(\overline{x}_B-\overline{x}_A)-(\mu_B-\mu_A)}{\sqrt{\sigma^2\left(\dfrac{1}{n_A}+ \dfrac{1}{n_B}\right)}} = \frac{(\overline{x}_B-\overline{x}_A)}{\sqrt{\sigma^2\left(\dfrac{1}{n_A}+ \dfrac{1}{n_B}\right)}}
Since an external estimate of variance was not available, the estimated variances from each data set were pooled to create an internal estimate using the following formula:
.. math::

   s_P^2 = \frac{(n_A-1)s_A^2+(n_B-1)s_B^2}{n_A-1+n_B-1}
Using this internal estimator:
.. math::

   z = \frac{(\overline{x}_B-\overline{x}_A)-(\mu_B-\mu_A)}{\sqrt{s_P^2\left(\dfrac{1}{n_A}+ \dfrac{1}{n_B}\right)}} = \frac{(\overline{x}_B-\overline{x}_A)}{\sqrt{s_P^2\left(\dfrac{1}{n_A}+ \dfrac{1}{n_B}\right)}}
which follows the :math:`t`-distribution with :math:`(n_A+n_B-2)=17` degrees of freedom.
Unpacking this :math:`z`-value, confidence intervals were constructed at the 95% confidence level as follows:
.. math::

   \begin{array}{rcccl}
   c_{t,0.025,17} &\leq& z &\leq& c_{t,0.975,17}\\
   (\overline{x}_B-\overline{x}_A)-2.109816\sqrt{s_P^2\left(\dfrac{1}{n_A}+ \dfrac{1}{n_B}\right)} &\leq& \mu_B-\mu_A &\leq& (\overline{x}_B-\overline{x}_A)+2.109816\sqrt{s_P^2\left(\dfrac{1}{n_A}+ \dfrac{1}{n_B}\right)}
   \end{array}
The results for the **snowfall** data set:
* :math:`s_P^2 = 2968`
* :math:`z`-value = 0.0408
* CI: :math:`-51.8 \leq \mu_B-\mu_A \leq 53.8`, in units of centimetres of snow
and for the **temperature** data set:
* :math:`s_P^2 = 0.7211`
* :math:`z`-value = 0.706
* CI: :math:`-0.55 \leq \mu_B-\mu_A \leq 1.1`, in units of degrees Celsius
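These values can be reproduced with a few lines of R; a minimal sketch for the snowfall case is shown below (the temperature case is identical in structure, and the full script is included at the end of this question).

.. code-block:: s

   snow.A <- c(131.2, 128.0, 130.7, 190.6, 263.4, 138.0, 207.3, 161.5, 78.8, 166.5)
   snow.B <- c(170.9, 94.1, 138.0, 166.2, 175.8, 218.4, 56.6, 182.4, 243.2)
   n.A <- length(snow.A)
   n.B <- length(snow.B)

   # Pooled variance from the two sample variances
   s.P2 <- ((n.A-1)*var(snow.A) + (n.B-1)*var(snow.B)) / (n.A + n.B - 2)

   # z-value, on the assumption that mu.B - mu.A = 0
   se <- sqrt(s.P2 * (1/n.A + 1/n.B))
   z  <- (mean(snow.B) - mean(snow.A)) / se

   # 95% confidence interval for mu.B - mu.A, using the t-distribution with 17 df
   c.t <- qt(0.975, df = n.A + n.B - 2)
   c(z, mean(snow.B) - mean(snow.A) - c.t*se, mean(snow.B) - mean(snow.A) + c.t*se)
   # approximately 0.041, -51.8 and 53.8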
3. As the :math:`z`-values for both :math:`t`-tests fall within :math:`\pm 2.11` and the 95% confidence intervals contain zero, the data provided do not support the notion that we receive less snow or that the temperature has risen over the last two decades; i.e. the change between the two decades is not statistically significant at the 95% confidence level. In fact, the very small :math:`z`-value for the total snowfall indicates that the decade means are essentially indistinguishable at this confidence level.
4. An F-test was performed on the weather data to test the assumption that the variances for both samples are comparable. The confidence interval for the F-test was constructed on the ratio of sample variances as follows:
.. math::

   F_{\alpha/2,\nu_1,\nu_2}\frac{s_2^2}{s_1^2}\leq \frac{\sigma_2^2}{\sigma_1^2} \leq F_{1-\alpha/2,\nu_1,\nu_2}\frac{s_2^2}{s_1^2}
where :math:`\nu_1` and :math:`\nu_2` are the degrees of freedom used to compute :math:`s_1` and :math:`s_2`, respectively, i.e. :math:`\nu_1=n_A-1` and :math:`\nu_2=n_B-1`.
Evaluating this expression at the 95% confidence level for both datasets:
**Snowfall:** :math:`0.3097 \leq \dfrac{ \sigma_2^2}{ \sigma_1^2} \leq 5.535`
**Average temperature**: :math:`0.0962 \leq \dfrac{ \sigma_2^2}{ \sigma_1^2} \leq 1.719`
Assuming that the variances for both samples are from the same population entails that the ratio of population variances is 1. Therefore, as the 95% confidence intervals from the :math:`F`-test contain a value of 1 for both datasets, the assumption that the variances are comparable is statistically valid. Hence, the :math:`F`-test supports the pooling of variances for use in the unpaired :math:`t`-test for both datasets (i.e. it was not incorrect to do so in parts 1 and 2).
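A minimal R sketch of this :math:`F`-test interval for the snowfall variances, using the same ``qf`` function mentioned in the question (the temperature case is identical in structure):

.. code-block:: s

   snow.A <- c(131.2, 128.0, 130.7, 190.6, 263.4, 138.0, 207.3, 161.5, 78.8, 166.5)
   snow.B <- c(170.9, 94.1, 138.0, 166.2, 175.8, 218.4, 56.6, 182.4, 243.2)
   nu.1  <- length(snow.A) - 1        # degrees of freedom for s1 (1990 to 1999)
   nu.2  <- length(snow.B) - 1        # degrees of freedom for s2 (2000 to 2008)
   ratio <- var(snow.B) / var(snow.A)

   c(qf(0.025, nu.1, nu.2) * ratio, qf(0.975, nu.1, nu.2) * ratio)
   # approximately 0.31 and 5.5: the interval contains 1.0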
.. literalinclude:: ../che4c3/Assignments/Assignment-4/code/snow-and-temp-significance-tests.R
   :language: s
Question 2 [1.5]
================
The percentage yield from a batch reactor, and the purity of the feedstock are available as the `Batch yield and purity <http://openmv.net/info/batch-yield-and-purity>`_ data set. Assume these data are from phase I operation and calculate the Shewhart chart upper and lower control limits that you would use during phase II. Use a subgroup size of :math:`n=3`.
1. What is phase I?
2. What is phase II?
3. Show your calculations for the upper and lower control limits for the Shewhart chart on the *yield value*.
4. Show a plot of the Shewhart chart on these phase I data.
Solution (Thanks to Ryan, Stuart and Mudassir)
1. Phase I is the period from which historical data are taken, data that are known to be from in-control operation. From these data, upper and lower control limits are established for the monitored variable such that they contain a specified percentage of all in-control data.

2. Phase II is the period during which new, unseen data are collected by process monitoring in real time. These data are compared against the limits calculated from the in-control (phase I) data.

3. Assuming the dataset was derived from phase I operation, the batch yield data were grouped into subgroups of size 3. However, since the total number of data points (N=241) is not a multiple of three, the dataset was truncated to the closest multiple of 3, i.e. :math:`N_{new} = 240`, by removing the last data point. Subsequently, the mean and standard deviation were calculated for each of the 80 subgroups. From these, the lower and upper control limits were calculated as follows:
.. math::

   \overline{\overline{x}} &= \frac{1}{80}\sum\limits_{k=1}^{80}\overline{x}_k = \bf{75.3}\\
   \overline{S} &= \frac{1}{80}\sum\limits_{k=1}^{80}s_k = \bf{5.32}\\
   \text{LCL} &= \overline{\overline{x}} - 3\cdot\frac{\overline{S}}{a_n\sqrt{n}} = \bf{64.9}\\
   \text{UCL} &= \overline{\overline{x}} + 3\cdot\frac{\overline{S}}{a_n\sqrt{n}} = \bf{85.7}\\
   \text{using}\,\, a_n &= 0.886 \qquad \text{for a subgroup size of } n=3
Noticing that the mean for subgroup 42, :math:`\overline{x}_{42}=63.3`, falls below this LCL, the control limits were recalculated excluding this subgroup from phase I data (see R-code). Following this adjustment, the new control limits were calculated to be:
* LCL = 65.0
* UCL = 85.8
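A minimal sketch of the first-round limit calculation is shown below; the CSV location and the column name ``Yield`` are assumptions about how the dataset is laid out, and the full recursive script used for the answer is included at the end of this question.

.. code-block:: s

   # Sketch of the first-round Shewhart limit calculation (subgroups of n = 3).
   # The CSV location and the column name "Yield" are assumptions.
   data <- read.csv('http://openmv.net/file/batch-yield-and-purity.csv')
   y <- data$Yield[1:240]                 # truncate 241 points down to 240
   subgroups <- matrix(y, nrow = 3)       # each column is one subgroup of n = 3

   x.bar <- apply(subgroups, 2, mean)     # 80 subgroup means
   s     <- apply(subgroups, 2, sd)       # 80 subgroup standard deviations

   x.double.bar <- mean(x.bar)
   S.bar <- mean(s)
   a.n <- 0.886                           # for a subgroup size of n = 3
   c(x.double.bar - 3*S.bar/(a.n*sqrt(3)), x.double.bar + 3*S.bar/(a.n*sqrt(3)))
   # approximately 64.9 and 85.7; excluding subgroup 42 and repeating gives 65.0 and 85.8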
4. Shewhart charts for both rounds of the yield data (before and after removing the outlier):
.. figure:: ../che4c3/Assignments/Assignment-4/images/batch-phaseI-round-1-Yield.png
   :alt: images/math
   :width: 750px
   :align: center
.. figure:: ../che4c3/Assignments/Assignment-4/images/batch-phaseI-round-2-Yield.png
   :alt: images/math
   :width: 750px
   :align: center
.. literalinclude:: ../che4c3/Assignments/Assignment-4/code/monitor-batch-yield-and-purity-recursive.R
   :language: s
Question 3 [2]
==============
You want to evaluate a new raw material (B), but the final product's brittleness, the main quality variable, must be the same as achieved with the current raw material. Manpower and physical constraints prevent you from running a randomized test, and you don't have a suitable database of historical reference data either.
One idea you come up with is to use to your advantage the fact that your production line has three parallel reactors, TK104, TK105, and TK107. They were installed at the same time, they have the same geometry, the same instrumentation, *etc*; you have pretty much thought about every factor that might vary between them, and are confident the 3 reactors are identical.
This means that when you do your testing on the new material next week you can run test A using one reactor and test B in another reactor, if you can find the two reactors that have *no statistical difference* in operation.
Normal production splits the same raw material between the 3 reactors. Data `on the website <http://openmv.net/info/brittleness-index>`_ contain the brittleness values from the three reactors for the past few runs using the current raw material (A).
Using a series of paired tests, calculate which two reactors you would pick to run your comparative trial on. Be *very specific and clearly substantiate why* you have chosen your 2 reactors.
Solution
Pairing assumes that each reactor was run with the same material, except that the material was split into thirds: one third for each reactor. As described in the section on paired tests we rely on calculating the difference in brittleness, then calculating the :math:`z`-value of the average difference. Contrast this to the unpaired tests, where we calculated the difference of the averages.
The code below shows how the paired differences are evaluated for each of the 3 combinations. The paired test highlights the similarity between TK105 and TK107 (the same result is found if you use an unpaired test - you should verify that). However, the paired test shows much more clearly how different TK104 is from TK105, and especially from TK107.
In the case of TK104 and TK105 the difference might seem surprising - take a look back at the box plots (type ``boxplot(brittle)`` into R) and see how much they overlap. However, a paired test cannot be judged from a box plot, because the paired test looks at the case-by-case differences, not the overall between-group difference. A better plot with which to confirm the really large :math:`z`-value for the TK104 and TK105 difference is a plot of those differences.
.. literalinclude:: ../che4c3/Assignments/Assignment-4/code/brittleness-paired-comparison-assignment3-2010.R
   :language: s
   :lines: 1-36
From the above code we get the 95% confidence intervals as:
.. math::

   \begin{array}{rcccl}
   9.81 &\leq& \mu_{105 - 104} &\leq& 88.4 \\
   48.3 &\leq& \mu_{107 - 104} &\leq& 68.7 \\
   -46.1 &\leq& \mu_{107 - 105} &\leq& 33.5
   \end{array}
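As a cross-check, the same intervals can be obtained with R's built-in paired :math:`t`-test; a sketch, assuming the CSV columns are named ``TK104``, ``TK105`` and ``TK107``:

.. code-block:: s

   # Cross-check of the paired confidence intervals with t.test().
   # The CSV location and column names are assumptions about the dataset.
   brittle <- read.csv('http://openmv.net/file/brittleness-index.csv')

   t.test(brittle$TK105, brittle$TK104, paired = TRUE)$conf.int
   t.test(brittle$TK107, brittle$TK104, paired = TRUE)$conf.int
   t.test(brittle$TK107, brittle$TK105, paired = TRUE)$conf.int
   # these should agree with the three intervals shown above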
Now onto the most important part of any statistical analysis: interpreting the results and making a decision.
We can clearly see that TK104 is very different from TK105 and TK107 at the 95% confidence level, because these confidence intervals *do not* include zero (their :math:`z`-values are far from zero).
The most similar reactors are TK105 and TK107, because this confidence interval *for the difference* spans zero, and it does so nearly symmetrically, from -46 up to +33, so the risk that this CI spans zero due to only a subset of the data is minimal. In fact, a plot of the differences shows several large and several small differences.
So you would naturally conclude that the trial should be conducted in reactors TK105 and TK107. However, a contrarian point of view holds that you should conduct the trial in TK104 and TK107. This is the confidence interval with the smallest span, i.e. it is the "tightest confidence interval". Its interpretation says: I recognize there is a difference between these reactors, but I can reliably predict that difference to within an interval only 20 units wide. So I can do my tests in TK104 and TK107, then simply subtract off that known bias. Any tests done in TK105 and TK107, on the other hand, should show no *statistically significant* difference, but that confidence interval spans 33+46=79 units, 4 times greater.
My recommendation would be to use TK104 and TK107; however if you answered TK105 and TK107, I will accept that as an answer. But this question should make you realize that most statistical analyses are not clear cut, and you always need to ask what is the engineering significance and implication of your results.
Question 4 [2]
==============
A tank uses small air bubbles to keep solid particles in suspension. If too much air is blown into the tank, then excessive foaming and loss of valuable solid product occurs; if too little air is blown into the tank the particles sink and drop out of suspension.
.. figure:: ../che4c3/Assignments/Assignment-4/images/tank-suspension.png
   :align: center
   :width: 300px
1. Which monitoring chart would you use to ensure the airflow is always near target?
2. Use the `aeration rate dataset <http://openmv.net/info/aeration-rate>`_ from the website and plot the raw data (total litres of air added in a 1 minute period). Are you able to detect any problems?
3. Construct the chart you described in part 1, and show its performance on all the data. Make any necessary assumptions to construct the chart.
4. At what point in time are you able to detect the problem, using this chart?
5. Construct a Shewhart chart, choosing appropriate data for phase I, and calculate the Shewhart limits. Then use the entire dataset as if it were phase II data.

   * Show this phase II Shewhart chart.
   * Compare the Shewhart chart's performance to the chart in part 3 of this question.
Solution (thanks to Ryan and Stuart)
1. A CUSUM chart would be the most appropriate monitoring chart to ensure the airflow is always near the intended target. An EWMA chart could also be used for the same purpose, but the value of :math:`\lambda` would have to be set fairly low (i.e. a long memory) so that the EWMA approximates the CUSUM.

2. The aeration rate dataset is depicted below:
.. figure:: ../che4c3/Assignments/Assignment-4/images/aeration-rate-raw-data-assign4.png
   :alt: images/airflow-monitoring.R
   :width: 750px
   :align: center
It is very difficult to assess problems from the raw data plot. There might be a slight upward shift around 300 and 500 minutes.
3. Assumptions for the CUSUM chart:

   * We will plot the CUSUM chart on the raw data, though you could use subgroups if you wanted to.
   * The target value can be the mean (24.17) of all the data, or more robustly, the median (24.1), especially if we expect problems with the raw data (true of almost every real data set).
The CUSUM chart, using the median as target value showed a problem starting to occur around :math:`t=300`. So we recalculated the median, using only data from 0 to :math:`t=200`, to avoid biasing the target value. Using this median instead, 23.95, we get the following CUSUM chart:
.. figure:: ../che4c3/Assignments/Assignment-4/images/aeration-CUSUM-assign4.png
   :alt: images/airflow-monitoring.R
   :width: 750px
   :align: center
4. The revised CUSUM chart suggests that the error occurs around 275 min, as evidenced by the steep positive slope thereafter. It should be noted that the CUSUM chart begins to bear a positive slope around 200 min, but this initial increase in the cumulative error would likely not be diagnosable (i.e. using a V-mask).
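A minimal sketch of this CUSUM calculation is shown below; the CSV location and the column name ``Aeration`` are assumptions about the dataset's layout, and the full monitoring script follows.

.. code-block:: s

   # Sketch of the CUSUM chart on the raw aeration data.
   # The CSV location and the column name "Aeration" are assumptions.
   data <- read.csv('http://openmv.net/file/aeration-rate.csv')
   rate <- data$Aeration

   target <- median(rate[1:200])       # target from the first 200 minutes only
   cusum  <- cumsum(rate - target)     # cumulative sum of deviations from target
   plot(cusum, type = 'l', xlab = "Time [min]", ylab = "CUSUM")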
.. literalinclude:: ../che4c3/Assignments/Assignment-4/code/airflow-monitoring.R
   :language: s
5. Using the iterative Shewhart code from the previous question, we used:

   * phase I data taken far enough away from the suspected error: 0 to 200 minutes
   * a subgroup size of :math:`n=5`

   which gives:

   * :math:`\bar{\bar{x}} = 23.9`
   * :math:`\bar{S} = 1.28`
   * :math:`a_n = 0.940`
   * LCL = :math:`23.9 - 3\cdot\frac{1.28}{0.940\sqrt{5}}= 22.1`
   * UCL = :math:`23.9 + 3\cdot\frac{1.28}{0.940\sqrt{5}}= 25.8`
The Shewhart chart applied to the entire dataset is shown below. In contrast to the CUSUM chart, the Shewhart chart is unable to detect the problem in the aeration rate. Unlike the CUSUM chart, which has infinite memory, the Shewhart chart has no memory and cannot adequately assess the location of the monitored variable in relation to its specified target. Instead, the Shewhart chart merely monitors aeration rate with respect to the control limits for the process. Since the aeration rate does not exceed the control limits for the process (i.e. process remains in control), the Shewhart chart does not detect any abnormalities.
.. figure:: ../che4c3/Assignments/Assignment-4/images/aeration-Shewhart-chart-assign4.png
   :width: 750px
   :align: center
If you used the Western Electric rules, in addition to the Shewhart chart limits, you would have picked up a consecutive sequence of 8 points on one side of the target around :math:`t=350`.
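A small sketch of how that rule can be checked, reusing the dataset assumptions from the CUSUM sketch above:

.. code-block:: s

   # Sketch: find runs of 8 or more successive subgroup means on the same side
   # of the target (one of the Western Electric rules).  The CSV location,
   # column name and target value are the same assumptions as in the CUSUM sketch.
   data <- read.csv('http://openmv.net/file/aeration-rate.csv')
   rate <- data$Aeration
   target <- median(rate[1:200])

   n <- 5
   N <- floor(length(rate)/n) * n
   x.bar <- apply(matrix(rate[1:N], nrow = n), 2, mean)   # subgroup means

   runs <- rle(sign(x.bar - target))        # run lengths on each side of the target
   any(runs$lengths >= 8)                   # TRUE if the rule is violated
   cumsum(runs$lengths)[runs$lengths >= 8]  # subgroup index where each such run ends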
Question 5 [1.5]
================
.. note:: For 600-level students
The carbon dioxide measurement is available from a `gas-fired furnace <http://openmv.net/info/gas-furnace>`_. These data are from phase I operation.
1. Calculate the Shewhart chart upper and lower control limits that you would use during phase II with a subgroup size of :math:`n=6`.
2. Is this a useful monitoring chart? What is going on in these data?
3. How can you fix the problem?
Solution (thanks to Ryan and Stuart)
First a plot of the raw data will be useful:
.. figure:: ../che4c3/Assignments/Assignment-4/images/CO2-raw-data-assign4.png
   :alt: code/co2-chart.R
   :width: 750px
   :align: center
1. Assuming that the CO\ :sub:`2` dataset is from phase I operation, the control limits were calculated as follows:

   * Assume the subgroups are independent
   * :math:`\bar{\bar{x}} =\frac{1}{K}\sum\limits_{k=1}^K\bar{x}_k= 53.5`
   * :math:`\bar{S} =\frac{1}{K}\sum\limits_{k=1}^K s_k= 1.10`
   * :math:`a_n = 0.952`
   * LCL = :math:`53.5 -3 \cdot\frac{1.10}{0.952\sqrt{6}} = 52.1`
   * UCL = :math:`53.5 +3 \cdot\frac{1.10}{0.952\sqrt{6}} = 54.9`
2. The Shewhart chart, with a subgroup of 6, is not a useful monitoring chart. There are too many false alarms, which will cause the operators to just ignore the chart.
.. figure:: ../che4c3/Assignments/Assignment-4/images/CO2-phaseI-first-round-assign4.png
   :alt: code/co2-chart.R
   :width: 750px
   :align: center
The problem is that the first assumption, of independence, is not correct. As the raw data plot above shows, consecutive measurements are related, so the subgroups will be related as well: with :math:`n=6`, any 6 points side-by-side still have a relationship between them.
3. One approach to fixing the problem is to subsample the data, i.e. only use every :math:`k^\text{th}` data point as the raw data, e.g. :math:`k=10`, and then form subgroups from that sampled data.
Another is to use a larger subgroup size. We will introduce a method later on that can be used to verify the degree of relationship: the `autocorrelation function <http://en.wikipedia.org/wiki/Autocorrelation>`_, and the corresponding ``acf(...)`` function in R. Using this function we can see the raw data are unrelated only after roughly the 17th lag, so we could use subgroups of that size (see the sketch below). However, even then the Shewhart chart shows frequent violations, though fewer than before.
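A short sketch of this autocorrelation check and the subsampling idea; the CSV location and the column name ``CO2`` are assumptions about the dataset's layout.

.. code-block:: s

   # Sketch of the autocorrelation check and subsampling.
   # The CSV location and the column name "CO2" are assumptions.
   data <- read.csv('http://openmv.net/file/gas-furnace.csv')
   co2 <- data$CO2

   acf(co2)                                      # correlation dies away only after many lags
   co2.sub <- co2[seq(1, length(co2), by = 10)]  # keep every 10th measurement
   acf(co2.sub)                                  # much weaker relationship between points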
Yet another alternative is to use an EWMA chart, which takes the autocorrelation into account. However, the EWMA chart limits are found from the assumption that the subgroup means (or raw data, if the subgroup size is 1) are independent.

So we are finally left with the conclusion that perhaps these data really are not from in-control operation, or, if they are, we must manually adjust the limits to be wider.
.. literalinclude:: ../che4c3/Assignments/Assignment-4/code/co2-chart.R
   :language: s
Something to think about
========================
Being RRSP season, it is tempting to start buying and selling stocks, mutual funds and exchange-traded funds (ETFs). One issue faced by any investor is deciding when is a good time to buy or to sell.

Using the tools of process monitoring, think about how you can use control limits to decide when to sell a poorly performing stock (going below the LCL?) and when to buy a weak stock that is strengthening (going up, over the UCL). One issue with stock prices is, of course, the lack of independence between the daily prices (form subgroups!).

But using the concepts of process monitoring you can devise a trading strategy that prevents trading too frequently (rapid buying and selling), as well as selling/buying when there are just "common cause" fluctuations in the data. The control limits are set based on your personal level of risk.

You should always verify your trading strategies with historical data, and Yahoo Finance provides CSV data dumps for free. There are also *many* R packages that automatically get the data for you from Yahoo. Calculate your control limits and simulate your buying/selling strategy using data, for example from 2004 to 2006, then test your strategy on phase II data, from 2007 onwards. You could conceivably write it as a nonlinear optimization problem that calculates limits to maximize your profit. It is expected that different stock sectors will have different limits (e.g. compare slower-moving financial stocks to high-tech stocks), because their variability is different.
If this sort of concept seems interesting, take a further look at `technical analysis <http://en.wikipedia.org/wiki/Technical_analysis>`_.
Question (not for credit)
=========================
.. note:: This question should take you some time to complete and is open-ended.
A common unit operation in the pharmaceutical area is to uniformly blend powders for tablets. In this question we consider blending an excipient (an inactive magnesium stearate base), a binder, and the active ingredient. The mixing process is tracked using a wireless near-infrared (NIR) probe embedded in a V-blender. The mixer is stopped when the NIR spectra stabilize. A new supplier of magnesium stearate is being considered that will save $294,000 per year.
.. figure:: ../che4c3/Assignments/Assignment-4/images/V-Blender.jpg
   :width: 500px
   :align: center

   Illustration from `Wikipedia <http://en.wikipedia.org/wiki/Industrial_mixer>`_
The 15 most recent runs with the current magnesium stearate supplier had an average mixing time of 2715 seconds, and a standard deviation of 390 seconds. So far you have run 6 batches from the new supplier, and the average mixing time of these runs is 3115 seconds with a standard deviation of 452 seconds. Your manager is not happy with these results so far - this extra mixing time will actually cost you more money via lost production.
The manager wants to revert back to the original supplier, but is leaving the decision up to you; what would be your advice? Show all calculations and describe any additional assumptions, if required.
Solution
This question, like most real statistical problems, is open-ended. The problem considers whether a significant difference has occurred. In many cases, even when there is a significant difference, you have to weigh up whether there is a *practical* difference as well, together with the potential for saving money (increased profit).
You should always state any assumptions you make, compute a confidence interval for the difference and interpret it.
The decision is one of whether the new material leads to a significant difference in the mixing time. It is desirable, from a production point of view, that the new mixing time is shorter, or at least the same. Some notation:
.. math::

   \begin{array}{rclrcl}
   \hat{\mu}_\text{Before} = \overline{x}_B &=& 2715 &\qquad\qquad \hat{\mu}_\text{After} = \overline{x}_A &=& 3115\\
   \hat{\sigma}_\text{Before} = s_B &=& 390 &\qquad\qquad \hat{\sigma}_\text{After} = s_A &=& 452\\
   n_B &=& 15 &\qquad\qquad n_A &=& 6
   \end{array}
Assumptions required to compare the two groups:
* The individual samples within each group were taken independently, so that we can invoke the central limit theorem and assume the sample means and standard deviations are normally distributed.
* Assume the individual samples within each group are from a normal distribution as well.
* Assume that we can pool the variances, i.e. :math:`\sigma_\text{Before}` and :math:`\sigma_\text{After}` are from comparable distributions.
* Using the pooled variance implies that the :math:`z`-value follows the :math:`t`-distribution.
* The mean of each group (before and after) is independent of the other (very likely true).
* No other factors were changed, other than the raw material (we can only hope; in practice this is often not true, and a paired test would eliminate any such differences).
Calculating the pooled variance:
.. math::

   s_P^2 &= \dfrac{(n_A -1) s_A^2 + (n_B-1)s_B^2}{n_A - 1 + n_B - 1} \\
         &= \dfrac{(6-1) 452^2 + (15-1)390^2}{6 - 1 + 15 - 1} \\
         &= 165837
Computing the :math:`z`-value for this difference:
.. math::

   z &= \dfrac{(\overline{x}_B - \overline{x}_A) - (\mu_B - \mu_A)}{\sqrt{s_P^2 \left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}\\
   z &= \dfrac{(2715 - 3115) - (\mu_B - \mu_A)}{\sqrt{165837 \left(\frac{1}{6} + \frac{1}{15}\right)}} \\
   z &= \dfrac{-400 - (\mu_B - \mu_A)}{196.7} = -2.03 \qquad \text{on the hypothesis that}\qquad \mu_B = \mu_A
The probability of obtaining this value of :math:`z` can be found using the :math:`t`-distribution at 6 + 15 - 2 = 19 degrees of freedom (because the standard deviation is an estimate, not a population value). Using tables, a value of 0.025, or 2.5% is found (in R, it would be ``pt(-2.03, df=19) = 0.0283``, or 2.83%). At this point one can argue either way that the new excipient leads to longer times, though I would be inclined to say that this probability is too small to be due to chance alone. Therefore there is a significant difference, and we should revert back to the previous excipient. Factors such as operators, and other process conditions could have affected the 6 new runs.
Alternatively, and this is the way I prefer to look at these sort of questions, is to create a confidence interval. At the 95% level, the value of :math:`c_t` in the equation below, using 19 degrees of freedom is ``qt(0.975, df=19) = 2.09`` (any value close to this from the tables is acceptable):
.. math::

   \begin{array}{rcccl}
   -c_t &\leq& z &\leq & +c_t \\
   (\overline{x}_B - \overline{x}_A) - c_t \sqrt{s_P^2 \left(\frac{1}{n_A} + \frac{1}{n_B}\right)} &\leq& \mu_B - \mu_A &\leq & (\overline{x}_B - \overline{x}_A) + c_t \sqrt{s_P^2 \left(\frac{1}{n_A} + \frac{1}{n_B}\right)}\\
   -400 - 2.09 \sqrt{165837 \left(\frac{1}{6} + \frac{1}{15}\right)} &\leq& \mu_B - \mu_A &\leq& -400 + 2.09 \sqrt{165837 \left(\frac{1}{6} + \frac{1}{15}\right)} \\
   -400 - 412 &\leq& \mu_B - \mu_A &\leq& -400 + 412 \\
   -812 &\leq& \mu_B - \mu_A &\leq& 12
   \end{array}
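For reference, a short R sketch of this arithmetic, using only the summary statistics given in the question:

.. code-block:: s

   # Sketch of the z-value and confidence interval calculation above.
   x.B <- 2715; s.B <- 390; n.B <- 15    # current supplier (before)
   x.A <- 3115; s.A <- 452; n.A <- 6     # new supplier (after)

   s.P2 <- ((n.A-1)*s.A^2 + (n.B-1)*s.B^2) / (n.A + n.B - 2)
   se   <- sqrt(s.P2 * (1/n.A + 1/n.B))
   z    <- (x.B - x.A) / se              # about -2.03
   pt(z, df = n.A + n.B - 2)             # about 0.028

   c.t <- qt(0.975, df = n.A + n.B - 2)  # about 2.09
   c((x.B - x.A) - c.t*se, (x.B - x.A) + c.t*se)   # about -812 and 12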
The interpretation of this confidence interval is that there is no difference between the current and new magnesium stearate excipient. The immediate response to your manager could be "*keep using the new excipient*".
However, the confidence interval's asymmetry should give you pause, certainly from a practical point of view (this is why I prefer the confidence interval: you get a better interpretation of the result). The 12 seconds by which it overlaps zero is very short when compared with average mixing times of around 3000 seconds and standard deviations of around 400 seconds. The practical recommendation is that the new excipient leads to longer mixing times, so "*revert to using the previous excipient*".
One other aspect of this problem that might bother you is the low number of runs (batches) used. Let's take a look at how sensitive the confidence interval is to that. Assume that we perform one extra run with the new excipient (:math:`n_A = 7` now), and assume the pooled variance, :math:`s_p^2 = 165837` remains the same with this new run. The new confidence interval is:
.. math::

   \begin{array}{rcccl}
   (\overline{x}_B - \overline{x}_A) - c_t \sqrt{s_P^2 \left(\frac{1}{n_A} + \frac{1}{n_B}\right)} &\leq& \mu_B - \mu_A &\leq & (\overline{x}_B - \overline{x}_A) + c_t \sqrt{s_P^2 \left(\frac{1}{n_A} + \frac{1}{n_B}\right)}\\
   (\overline{x}_B - \overline{x}_A)- 2.09 \sqrt{165837 \left(\frac{1}{7} + \frac{1}{15}\right)} &\leq& \mu_B - \mu_A &\leq& (\overline{x}_B - \overline{x}_A) + 2.09 \sqrt{165837 \left(\frac{1}{7} + \frac{1}{15}\right)} \\
   (\overline{x}_B - \overline{x}_A) - 390 &\leq& \mu_B - \mu_A &\leq& (\overline{x}_B - \overline{x}_A) + 390
   \end{array}
Comparing this :math:`\pm 390` with 7 runs to the :math:`\pm 412` with 6 runs shows that the confidence interval shrinks quite a bit, by more than the 12-second overlap of zero. Of course we don't know what the new :math:`\overline{x}_B - \overline{x}_A` will be with 7 runs, so my recommendation would be to perform at least one more run with the new excipient; I suspect that the new run would show there to be a significant difference, and statistically confirm that we should "*revert to using the previous excipient*".
.. raw:: latex

   \vspace{0.5cm}
   \hrule
   \begin{center}END\end{center}