6.5. Principal Component Analysis (PCA)

Principal component analysis (PCA) builds a model for a matrix of data.

A model is always an approximation of the system from which the data came. The objectives for which we use that model can vary.

In this section we will start by visualizing the data and considering a simplified, geometric view of what a PCA model looks like. A mathematical analysis is also needed for a deeper understanding of PCA, so we go into some detail on that point; however, it can be skipped on first reading.

The first part of this section emphasizes the general interpretation of a PCA model, since this is a required step that any modeller will have to perform. The second half of this section covers the important details of how to preprocess the raw data, how to actually calculate the PCA model, and how to validate and test it. This “reverse” order may be unsatisfying for some, but it is helpful to see how to use the model first, before going into the details of its calculation.