2.13. Paired tests

A paired test is one in which two measurements are taken on the same object or batch of materials, once under each of two conditions. The literature often refers to the two conditions as “two treatments”. For example:

  • A drug trial could be run in two parts: each person randomly receives a placebo or the drug, then 3 weeks later they receive the opposite, for another 3 weeks. Tests are run at 3 weeks and 6 weeks and the difference in the test result is recorded.

  • We are testing two different additives, A and B, where the additive is applied to a base mixture of raw materials. Several raw material lots are received from various suppliers, supposedly uniform. Split each lot into 2 parts, then run additive A on one half and additive B on the other. Measure the outcome variable, e.g. conversion, viscosity, or whatever the case might be, and record the difference.

  • We are testing a new coating to repel moisture. The coating is applied to randomly selected sheets in a pattern [A|B] or [B|A] (the pattern choice is made randomly). We measure the repellent property value and record the difference.

In each case we have a table of \(n\) samples recording the difference values. The question now is whether the difference is significant, or is it essentially zero?

The advantage of the paired test is that any systematic error in our measurement system, whatever it might be, is removed as long as that error is consistent. Say, for example, we are measuring blood pressure, and the automated blood pressure device has a bias of -5 mmHg. This systematic error will cancel out when we subtract the 2 test readings. In the example of the raw materials and additives: any variation in the raw materials, and its (unintended) effect on the outcome variable of interest, will be cancelled.
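The cancellation of a constant bias can be checked numerically. The short R sketch below uses made-up blood pressure values (the numbers and variable names are assumptions for illustration only) together with the -5 mmHg bias mentioned above.

```r
# Hypothetical true blood pressures (mmHg) for 5 people; illustrative values only
true_before <- c(120, 135, 128, 142, 131)
true_after  <- c(115, 130, 126, 135, 127)

# Constant systematic error of the measuring device, as in the text
bias <- -5

# Readings the biased device would report
measured_before <- true_before + bias
measured_after  <- true_after + bias

# Differences from the true values and from the biased readings
diff_true     <- true_after - true_before
diff_measured <- measured_after - measured_before

print(diff_true)
print(diff_measured)

# The two sets of differences agree: the constant bias has cancelled
print(isTRUE(all.equal(diff_true, diff_measured)))
```

Note that this only holds for an error that is the same in both readings. An error that changes between the two measurements would not cancel.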

The disadvantage of the paired test is that we lose degrees of freedom. Let’s see how:

  1. Calculate the \(n\) differences: \(w_1 = x_{B,1} - x_{A,1}; w_2 = x_{B,2} - x_{A,2}, \ldots\) to create the sample of values \(\mathbf{w} = [w_1, w_2, \ldots, w_n]\)

  2. Assume these values, \(w_i\), are independent, because they are taken on independent objects (people, base packages, sheets of paper, etc.)

  3. Calculate the mean, \(\overline{w}\) and the standard deviation, \(s_w\), of these \(n\) difference values.

  4. What do we need to assume about the population from which \(w\) comes? Nothing. We are not interested in the \(w\) values, we are interested in \(\overline{w}\). OK, so what distribution would values of \(\overline{w}\) come from? By the central limit theorem, the \(\overline{w}\) values should be normally distributed as \(\overline{w} \sim \mathcal{N}\left(\mu_w, \sigma_w^2/n \right)\), where \(\mu_w = \mu_B - \mu_A\), consistent with the differences \(w_i = x_{B,i} - x_{A,i}\) defined in step 1.

  5. Now calculate the \(z\)-value, but use the sample standard deviation, instead of the population standard deviation.

    \[z = \frac{\overline{w} - \mu_w}{s_w / \sqrt{n}}\]

  6. Because we have used the sample standard deviation, \(s_w\), we have to use the \(t\)-distribution with \(n-1\) degrees of freedom to calculate the critical values.

  7. We can calculate a confidence interval, below, and if this interval includes zero, then we cannot conclude that the change from treatment A to treatment B had a significant effect.

    \[\overline{w} - c_t \frac{s_w}{\sqrt{n}} < \mu_w < \overline{w} + c_t \frac{s_w}{\sqrt{n}}\]

    The value of \(c_t\) is taken from the \(t\)-distribution with \(n-1\) degrees of freedom at the level of confidence required: use the qt(...) function in R to obtain the values of \(c_t\).
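The steps above can be sketched in R as follows. The data are hypothetical outcome values for \(n = 8\) raw material lots, invented purely for illustration, and the 95% confidence level is an assumed choice. The built-in t.test(...) function with paired = TRUE is used only as a cross-check on the manual calculation.

```r
# Hypothetical outcome values for n = 8 lots (made-up data, not from the text)
x_A <- c(71.2, 68.5, 74.1, 69.8, 72.6, 70.3, 73.4, 67.9)
x_B <- c(72.9, 69.1, 75.8, 71.5, 73.0, 72.2, 74.9, 69.6)

# Step 1: the n differences, w_i = x_B,i - x_A,i
w <- x_B - x_A
n <- length(w)

# Step 3: mean and standard deviation of the differences
w_bar <- mean(w)
s_w   <- sd(w)

# Steps 6 and 7: critical value from the t-distribution with n - 1 degrees
# of freedom, for a two-sided 95% interval (assumed confidence level)
c_t <- qt(0.975, df = n - 1)
half_width <- c_t * s_w / sqrt(n)
ci <- c(lower = w_bar - half_width, upper = w_bar + half_width)
print(ci)

# Cross-check against R's built-in paired t-test (computes x_B - x_A)
builtin <- t.test(x_B, x_A, paired = TRUE, conf.level = 0.95)
print(builtin$conf.int)
```

If the printed interval excludes zero, the data give evidence of a difference between treatments A and B at the chosen confidence level.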

The loss of degrees of freedom can be seen when we use exactly the same data and treat the problem as one where we have \(n_A\) and \(n_B\) samples in groups A and B and want to test for a difference between \(\mu_A\) and \(\mu_B\). You are encouraged to try this out. There are more degrees of freedom, \(n_A + n_B - 2 = 2n - 2\) in fact, when we use the \(t\)-distribution with the pooled variance. Compare this to the paired case just described, where there are only \(n - 1\) degrees of freedom.
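One way to try this out is sketched below in R, using hypothetical data for 8 lots (invented for illustration). The same values are analysed twice with t.test(...): once as paired data, and once as two independent groups with a pooled variance (var.equal = TRUE).

```r
# Hypothetical outcome values for n = 8 lots (made-up data, not from the text)
x_A <- c(71.2, 68.5, 74.1, 69.8, 72.6, 70.3, 73.4, 67.9)
x_B <- c(72.9, 69.1, 75.8, 71.5, 73.0, 72.2, 74.9, 69.6)

# Paired analysis: works on the n differences
paired <- t.test(x_B, x_A, paired = TRUE)

# Unpaired analysis: two independent groups, pooled variance
unpaired <- t.test(x_B, x_A, paired = FALSE, var.equal = TRUE)

# Degrees of freedom used by each test
df_paired   <- unname(paired$parameter)    # n - 1
df_unpaired <- unname(unpaired$parameter)  # n_A + n_B - 2
print(c(paired = df_paired, unpaired = df_unpaired))

# Confidence intervals for the difference under each analysis
print(paired$conf.int)
print(unpaired$conf.int)
```

Comparing the two printed intervals is instructive. The paired analysis has fewer degrees of freedom, but when lot-to-lot variation is large relative to the treatment effect, as in these made-up numbers, removing that variation can more than compensate.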