6.5.4. More about the direction vectors (loadings)

The direction vectors \(\mathbf{p}_1\), \(\mathbf{p}_2\) and so on, are each \(K \times 1\) unit vectors. These are vectors in the original coordinate space (the \(K\)-dimensional real world) where the observations are recorded.

But these direction vectors are also our link to the latent-variable coordinate system. Together they define a (hyper)plane that is embedded inside the \(K\)-dimensional space of the \(K\) original variables. You will also see the terminology of loadings; this is just another name for these direction vectors:

\[\text{Loadings, a $K \times A$ matrix:}\qquad\qquad \mathbf{P} = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \ldots & \mathbf{p}_A \end{bmatrix}\]

Once this hyperplane is mapped out, we then consider how each of the observations lies on it. We become more and more interested in this reduced-dimensional plane, because it is an \(A\)-dimensional plane, where \(A\) is often much smaller than \(K\). Returning to the case of the thermometers in a room: we had 4 thermometers (\(K=4\)), but only one latent variable, \(A=1\). Rather than concern ourselves with the original 4 measurements, we focus only on the single column of score values, since this single variable is the best possible summary of the 4 original variables.

How do we get the score value(s)? We use the equation from the prior section (repeated here). It is the multiplication of the pre-processed data by the loading vectors:

\[\begin{split}\mathbf{T} &= \mathbf{X} \mathbf{P} \\ (N \times A) &= (N \times K)(K \times A)\end{split}\]

and it shows how the loadings are our link from the \(K\)-dimensional, real-world, coordinate system to the \(A\)-dimensional, latent variable-world, coordinates.
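As a minimal sketch of this score calculation, here is the matrix multiplication \(\mathbf{T} = \mathbf{X}\mathbf{P}\) in NumPy, using hypothetical pre-processed data (not the book's data) for the thermometer example with \(K = 4\), \(A = 1\):

```python
import numpy as np

# Hypothetical pre-processed (centered) readings from K = 4 thermometers
# at N = 5 time points. These numbers are illustrative only.
X = np.array([[ 0.4,  0.5,  0.45,  0.4],
              [-0.2, -0.1, -0.15, -0.2],
              [ 0.1,  0.0,  0.05,  0.1],
              [-0.3, -0.4, -0.35, -0.3],
              [ 0.0,  0.0,  0.0,   0.0]])

# Loadings P are K x A; here A = 1 and each entry is 1/sqrt(K) = 0.5.
P = np.full((4, 1), 0.5)

# Scores T = X P: an N x A matrix, one summary value per observation.
T = X @ P
print(T.shape)  # (5, 1)
```

Each row of \(\mathbf{T}\) is that observation's position along the latent variable, confirming the dimensions \((N \times A) = (N \times K)(K \times A)\).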

Let’s return to the example of the 4 temperatures. We derived there that a plausible summary of the 4 temperatures could be found from:

\[\begin{split}t_1 &= \begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix}\begin{bmatrix} p_{1,1} \\ p_{2,1} \\ p_{3,1} \\ p_{4,1} \end{bmatrix} = \begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix}\begin{bmatrix} 0.25 \\ 0.25 \\ 0.25 \\ 0.25 \end{bmatrix} = \mathbf{x}_i \mathbf{p}_1\end{split}\]

So the loading vector for this example points in the direction \(\mathbf{p}'_1 = [0.25, 0.25, 0.25, 0.25]\). This isn't a unit vector though; we can make it one:

  • Current magnitude of vector = \(\sqrt{0.25^2 + 0.25^2 + 0.25^2 + 0.25^2} = 0.50\)

  • Divide the vector by current magnitude: \(\mathbf{p}_1 = \dfrac{1}{0.5} \cdot [0.25, 0.25, 0.25, 0.25]\)

  • New, unit vector = \(\mathbf{p}_1 = [0.5, 0.5, 0.5, 0.5]\)

  • Check new magnitude = \(\sqrt{0.5^2 + 0.5^2 + 0.5^2 + 0.5^2} = 1.0\)
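The normalization steps above can be verified numerically; this small NumPy check mirrors each bullet:

```python
import numpy as np

# The averaging direction before normalization.
p_raw = np.array([0.25, 0.25, 0.25, 0.25])

# Current magnitude: sqrt(4 * 0.25^2) = 0.5
magnitude = np.linalg.norm(p_raw)

# Divide by the magnitude to get the unit vector [0.5, 0.5, 0.5, 0.5].
p1 = p_raw / magnitude

# Check the new magnitude is 1.0.
print(magnitude, p1, np.linalg.norm(p1))
```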

What would be the entries in the \(\mathbf{p}_1\) loading vector if we had 6 thermometers? (Answer: each entry would be \(1/\sqrt{6} \approx 0.41\); in general, for \(K\) thermometers, each entry is \(1/\sqrt{K}\).)
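A quick numerical check of the general \(1/\sqrt{K}\) pattern, for a few values of \(K\):

```python
import numpy as np

# For K thermometers, the unit-length loading vector that weights all
# thermometers equally has every entry equal to 1/sqrt(K).
for K in (4, 6, 10):
    p = np.ones(K) / np.sqrt(K)
    # Each vector has unit length regardless of K.
    print(K, round(p[0], 2), np.linalg.norm(p))
# For K = 6, each entry is 1/sqrt(6), which rounds to 0.41
```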

This is very useful, because now instead of dealing with \(K\) thermometers we can reduce the columns of data down to just a single, average temperature. This isn't a particularly interesting case though; as an engineer facing this problem, you would likely have averaged the readings anyway. But the next food texture example will illustrate a more realistic case.