6.5.3. Mathematical derivation for PCA
Geometrically, when finding the best-fit line for the swarm of points, our objective was to minimize the error, i.e. to make the residual distances from each point to the best-fit line as small as possible. This is mathematically equivalent to maximizing the variance of the scores, \(\mathbf{t}_a\).
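The equivalence can be checked numerically. The following is a minimal sketch, assuming mean-centred data with only two variables so that every candidate direction is described by a single angle. The data values and the helper name `score_and_residual_ss` are made up for illustration, and the brute-force sweep over angles is not how PCA is computed in practice. For each unit direction it adds up the squared scores and the squared residual distances; by Pythagoras' theorem the two always sum to the same total, so the direction with the largest score variance is also the one with the smallest residuals.

```python
import math

# Hypothetical data (made-up numbers): 6 observations, 2 variables.
X = [[2.0, 1.5], [-1.0, -0.5], [0.5, 1.0],
     [-1.5, -2.0], [3.0, 2.5], [-3.0, -2.5]]

# Mean-centre each column so that candidate lines pass through the origin.
N = len(X)
means = [sum(row[k] for row in X) / N for k in range(2)]
Xc = [[row[k] - means[k] for k in range(2)] for row in X]

# Total sum of squares of the centred data.
total_ss = sum(v * v for row in Xc for v in row)


def score_and_residual_ss(angle):
    """Return (sum of squared scores, sum of squared residual distances)
    for the unit direction vector at the given angle in radians."""
    p = (math.cos(angle), math.sin(angle))          # unit length by construction
    scores = [row[0] * p[0] + row[1] * p[1] for row in Xc]
    score_ss = sum(t * t for t in scores)
    residual_ss = sum(
        (row[0] - t * p[0]) ** 2 + (row[1] - t * p[1]) ** 2
        for row, t in zip(Xc, scores)
    )
    return score_ss, residual_ss


# Try directions at 0, 1, ..., 179 degrees.
results = [score_and_residual_ss(math.radians(d)) for d in range(180)]

# The scores have zero mean, so their variance is score_ss / (N - 1):
# maximising score_ss is the same as maximising the variance.
best_by_variance = max(range(180), key=lambda d: results[d][0])
best_by_residual = min(range(180), key=lambda d: results[d][1])

print("direction with largest score variance (degrees):", best_by_variance)
print("direction with smallest residuals (degrees):   ", best_by_residual)
```

Both searches select the same angle, because the two sums of squares always add up to `total_ss`.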
We briefly review here how a score value is defined. Let \(\mathbf{x}'_i\) be a row from our data, so \(\mathbf{x}'_i\) is a \(1 \times K\) vector. We defined the score value for this observation as the distance from the origin, along the direction vector, \(\mathbf{p}_1\), to the point where we find the perpendicular projection onto \(\mathbf{p}_1\). This is illustrated below, where the score for observation \(\mathbf{x}_i\) has a value of \(t_{i,1}\).

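Recall from geometry that the cosine of an angle in a right-angled triangle is the ratio of the adjacent side to the hypotenuse. The cosine of an angle is also used in linear algebra to define the dot product. Mathematically:
\[
\begin{aligned}
\cos\theta &= \frac{\text{adjacent length}}{\text{hypotenuse}} = \frac{t_{i,1}}{\| \mathbf{x}_i \|} \\
\cos\theta &= \frac{\mathbf{x}'_i \mathbf{p}_1}{\| \mathbf{x}_i \| \, \| \mathbf{p}_1 \|} \\
\frac{t_{i,1}}{\| \mathbf{x}_i \|} &= \frac{\mathbf{x}'_i \mathbf{p}_1}{\| \mathbf{x}_i \| \, \| \mathbf{p}_1 \|} \\
t_{i,1} &= \mathbf{x}'_i \mathbf{p}_1
\end{aligned}
\]
where \(\theta\) is the angle between the observation, \(\mathbf{x}_i\), and the direction vector, \(\mathbf{p}_1\); \(\| \cdot \|\) indicates the length of the enclosed vector; and the length of the direction vector, \(\mathbf{p}_1\), is 1.0 by definition.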
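Note that \(t_{i,1} = \mathbf{x}'_i \mathbf{p}_1\) represents a linear combination:
\[
t_{i,1} = x_{i,1} \, p_{1,1} + x_{i,2} \, p_{2,1} + \ldots + x_{i,k} \, p_{k,1} + \ldots + x_{i,K} \, p_{K,1}
\]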
So \(t_{i,1}\) is the score value for the \(i^\text{th}\) observation along the first component, and is a linear combination of the \(K\) entries in the \(i^\text{th}\) row of data, \(\mathbf{x}_i\), weighted by the entries of the direction vector \(\mathbf{p}_1\). Notice that there are \(K\) terms in the linear combination: each of the \(K\) variables contributes to the overall score.
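To make the score calculation concrete, here is a minimal sketch in plain Python. The observation `x_i` and the direction `raw` are made-up numbers, assumed only for illustration, with \(K = 3\). The sketch computes the score as a sum of \(K\) products and then checks the geometric picture: the leftover (residual) vector is perpendicular to \(\mathbf{p}_1\), and the score, the residual, and the observation form a right-angled triangle.

```python
import math

# Hypothetical centred observation with K = 3 variables (made-up numbers).
x_i = [2.0, -1.0, 0.5]

# Hypothetical direction vector, rescaled so that its length is exactly 1.
raw = [1.0, 2.0, 2.0]
raw_length = math.sqrt(sum(v * v for v in raw))
p_1 = [v / raw_length for v in raw]

# The score is a linear combination: one term per variable, K terms in total.
terms = [x * p for x, p in zip(x_i, p_1)]
t_i1 = sum(terms)

# The projected point lies a distance t_i1 from the origin along p_1.
projection = [t_i1 * p for p in p_1]
residual = [x - proj for x, proj in zip(x_i, projection)]

# Perpendicular projection: the residual has no component along p_1.
residual_dot_p = sum(r * p for r, p in zip(residual, p_1))

# Pythagoras: |x_i|^2 = t_i1^2 + |residual|^2.
x_length_sq = sum(v * v for v in x_i)
residual_length_sq = sum(r * r for r in residual)

print("score t_i1:", t_i1)
print("residual . p_1 (should be ~0):", residual_dot_p)
print("|x_i|^2 vs t_i1^2 + |residual|^2:",
      x_length_sq, t_i1 ** 2 + residual_length_sq)
```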
We can calculate the second score value for the \(i^\text{th}\) observation in a similar way:
\[
t_{i,2} = \mathbf{x}'_i \mathbf{p}_2
\]
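And so on, for the third and subsequent components. Collecting the direction vectors as the columns of a \(K \times A\) matrix, \(\mathbf{P}\), we can write compactly in matrix form for the \(i^\text{th}\) observation that:
\[
\begin{aligned}
\mathbf{t}'_i &= \mathbf{x}'_i \mathbf{P} \\
(1 \times A) &= (1 \times K)(K \times A)
\end{aligned}
\]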
which calculates all \(A\) score values for that observation in one go. This is exactly what we derived earlier in the example with the 4 thermometers in the room.
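Finally, for an entire matrix of data, \(\mathbf{X}\), with one row per observation, we can calculate all scores, for all \(N\) observations:
\[
\begin{aligned}
\mathbf{T} &= \mathbf{X} \mathbf{P} \\
(N \times A) &= (N \times K)(K \times A)
\end{aligned}
\]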
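The whole-matrix calculation is an ordinary matrix product of the data with the direction vectors. The sketch below uses plain Python lists and assumes made-up values: a centred data matrix `X` with 4 observations and 3 variables, and a matrix `P` whose \(A = 2\) columns are orthonormal direction vectors chosen by hand, not fitted to `X`. The helper `matmul` is a hypothetical name introduced here. The sketch checks that each entry of the scores matrix equals the dot product of one data row with one direction vector.

```python
# Hypothetical mean-centred data: N = 4 observations (rows), K = 3 variables.
X = [
    [2.0, -1.0, 0.5],
    [-1.0, 0.5, 1.5],
    [0.0, 1.0, -2.0],
    [-1.0, -0.5, 0.0],
]

# Hypothetical K x A matrix of direction vectors (A = 2), one per column.
# The columns are unit length and mutually perpendicular, but they are
# illustrative values, NOT the actual PCA directions for X.
P = [
    [1.0 / 3.0, 2.0 / 3.0],
    [2.0 / 3.0, 1.0 / 3.0],
    [2.0 / 3.0, -2.0 / 3.0],
]


def matmul(A_mat, B_mat):
    """Multiply two matrices stored as lists of rows."""
    columns_of_B = list(zip(*B_mat))
    return [
        [sum(a * b for a, b in zip(row, col)) for col in columns_of_B]
        for row in A_mat
    ]


# All scores for all observations in one step: T has N rows and A columns.
T = matmul(X, P)

# The same numbers, one score at a time: score of observation i on
# component a is the dot product of row i of X with column a of P.
p_columns = list(zip(*P))
T_one_by_one = [
    [sum(x * p for x, p in zip(X[i], p_columns[a])) for a in range(len(p_columns))]
    for i in range(len(X))
]

for i, scores in enumerate(T):
    print("observation", i, "scores:", scores)
```

Row `i` of `T` holds all \(A\) score values for observation `i`, so the single-observation calculation is one row of the full product.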