6.5.12. Hotelling’s T²

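The final quantity from a PCA model that we need to consider is called Hotelling’s \(T^2\) value. Some PCA models will have many components, \(A\), so an initial screening of these components using score scatterplots will require reviewing \(A(A-1)/2\) scatterplots (15 scatterplots when \(A=6\)). The \(T^2\) value condenses all \(A\) score values of an observation into a single number, so all the components can be screened at once. The \(T^2\) value for the \(i^\text{th}\) observation is defined as: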

\[T^2 = \sum_{a=1}^{a=A}{\left(\dfrac{t_{i,a}}{s_a}\right)^2}\]

where the \(s_a^2\) values are constants: \(s_a^2\) is the variance of the scores for the \(a^\text{th}\) component. The easiest interpretation is that \(T^2\) is a single scalar number that summarizes all \(A\) score values for that observation. Some other properties of \(T^2\):

  • It is never negative: as a sum of squared terms, \(T^2 \geq 0\).

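  • It is a (squared) distance from the center of the (hyper)plane to the projection of the observation onto the (hyper)plane, in which the distance along each component direction is scaled by that component's standard deviation, \(s_a\).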

  • An observation that projects onto the model’s center (usually the observation where every variable is at its mean value) has \(T^2 = 0\).

  • Assuming the scores are multivariate normal, a scaled version of the \(T^2\) statistic follows the \(F\)-distribution, and the corresponding limits are calculated by the multivariate software package being used. For example, we can calculate the 95% confidence limit for \(T^2\), below which we expect, under normal conditions, to locate 95% of the observations.

  • It is useful to consider the case when \(A=2\) and fix the \(T^2\) value at its 95% limit; call that value \(T^2_{A=2, \alpha=0.95}\). Using the definition for \(T^2\):

    \[T^2_{A=2, \alpha=0.95} = \dfrac{t^2_{1}}{s^2_1} + \dfrac{t^2_{2}}{s^2_2}\]

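    On a scatterplot of \(t_1\) vs \(t_2\) for all observations, this is the equation of an ellipse centered at the origin, with semi-axes of length \(s_1\sqrt{T^2_{A=2, \alpha=0.95}}\) along the \(t_1\) direction and \(s_2\sqrt{T^2_{A=2, \alpha=0.95}}\) along the \(t_2\) direction. You will often see this ellipse shown on \(t_i\) vs \(t_j\) scatterplots of the scores. Points inside this elliptical region are within the 95% confidence limit for \(T^2\).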

  • The same principle holds for \(A>2\), except that the ellipse becomes a hyper-ellipse (think of a rugby-ball-shaped object for \(A=3\)). The general interpretation is that a point inside this hyper-ellipse would also lie below the \(T^2\) limit on a line plot of the \(T^2\) values.
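The short sketch below ties these properties together for \(A=2\): it computes \(T^2\) for each observation from its scores, a 95% limit, and points on the limit ellipse. It is plain Python written under assumptions that are not from the text: the score values are hypothetical (in practice they come from a fitted PCA model), \(s_a^2\) is taken as the sample variance of each score column, and the limit uses one common \(F\)-based formula for a new observation, \(T^2_{A,\alpha} = \frac{A(N-1)(N+1)}{N(N-A)}F_\alpha(A, N-A)\); software packages may scale the limit differently. For \(A=2\) the \(F\) quantile has the closed form \(F_\alpha(2,\nu) = \frac{\nu}{2}\left[(1-\alpha)^{-2/\nu}-1\right]\), so no statistics library is needed. All function names are hypothetical.

```python
import math

# Hypothetical score values (t1, t2) for N = 8 observations from a 2-component
# PCA model. Real scores would come from the fitted model; these are made up
# and, like PCA scores, each column has a mean of zero.
scores = [(2.0, 0.5), (-1.5, 0.3), (0.5, -0.4), (-0.5, 0.2),
          (1.0, -0.6), (-2.0, 0.1), (3.0, -0.3), (-2.5, 0.2)]


def score_variances(scores):
    """Sample variance s_a^2 (N - 1 denominator) of each score column t_a."""
    n = len(scores)
    variances = []
    for a in range(len(scores[0])):
        column = [row[a] for row in scores]
        mean = sum(column) / n
        variances.append(sum((t - mean) ** 2 for t in column) / (n - 1))
    return variances


def hotelling_t2(score_row, variances):
    """T^2 for one observation: sum over components of (t_a / s_a)^2."""
    return sum(t * t / s2 for t, s2 in zip(score_row, variances))


def t2_limit_two_components(n_obs, alpha=0.95):
    """T^2 limit for A = 2 (assumed formula for a new observation):
    A (N - 1)(N + 1) / (N (N - A)) * F_alpha(A, N - A).
    For A = 2, F_alpha(2, nu) = (nu / 2) * ((1 - alpha)**(-2 / nu) - 1)."""
    a = 2
    nu = n_obs - a
    f_quantile = (nu / 2) * ((1 - alpha) ** (-2 / nu) - 1)
    return a * (n_obs - 1) * (n_obs + 1) / (n_obs * (n_obs - a)) * f_quantile


def ellipse_boundary(variances, t2_lim, n_points=100):
    """Points (t1, t2) satisfying t1^2/s1^2 + t2^2/s2^2 = t2_lim.
    The semi-axes are s1*sqrt(t2_lim) along t1 and s2*sqrt(t2_lim) along t2."""
    r1 = math.sqrt(variances[0] * t2_lim)
    r2 = math.sqrt(variances[1] * t2_lim)
    angles = [2 * math.pi * k / n_points for k in range(n_points)]
    return [(r1 * math.cos(th), r2 * math.sin(th)) for th in angles]


s2 = score_variances(scores)
limit = t2_limit_two_components(len(scores))
t2_values = [hotelling_t2(row, s2) for row in scores]
outline = ellipse_boundary(s2, limit)  # e.g. to draw on a t1 vs t2 scatterplot

# A new (hypothetical) observation is flagged if its T^2 exceeds the limit,
# which is the same as it falling outside the ellipse.
new_obs = (2.0, 1.5)
print(hotelling_t2(new_obs, s2) > limit)
```

Because every boundary point satisfies \(t_1^2/s_1^2 + t_2^2/s_2^2 = T^2_{A=2,\alpha=0.95}\), testing whether a point lies inside the ellipse and testing whether its \(T^2\) is below the limit are the same check. Here the new observation has a modest \(t_1\) score but a large \(t_2\) score relative to \(s_2\), so it falls outside the ellipse even though neither score looks extreme on its own scale.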