6.5. Principal Component Analysis (PCA)
Principal component analysis (PCA) builds a model for a matrix of data.
A model is always an approximation of the system from which the data came, and such a model can be used for a variety of objectives.
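To make the idea of a model as an approximation concrete, here is a minimal sketch that builds a PCA model of a small data matrix using NumPy's singular value decomposition. It assumes the common notation X ≈ T Pᵀ + E, with T the scores, P the loadings (direction vectors) and E the residuals, and applies only mean-centering as preprocessing. The function name `pca_approximation` and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np

def pca_approximation(X, n_components):
    """Hypothetical helper: return scores T, loadings P and residuals E
    such that the mean-centered X equals T @ P.T + E."""
    # Center each column so the model describes variation about the mean
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: Xc = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T   # K x A loadings: one direction vector per column
    T = Xc @ P                # N x A scores: projections onto those directions
    E = Xc - T @ P.T          # N x K residuals: what the model does not explain
    return T, P, E

# Simulated (hypothetical) data: 50 observations of 4 variables
# that are driven by 2 hidden factors plus a little noise
rng = np.random.default_rng(0)
factors = rng.normal(size=(50, 2))
X = factors @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(50, 4))

for A in range(1, 5):
    T, P, E = pca_approximation(X, A)
    print(f"{A} component(s): residual sum of squares = {np.sum(E**2):.4f}")
```

The residual sum of squares shrinks as components are added and drops to zero (up to rounding) once the number of components equals the number of variables, at which point the model simply reproduces the data. A useful model keeps fewer components than variables, accepting a small residual in exchange for a simpler description of the system.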
In this section we start by visualizing the data and then consider a simplified, geometric view of what a PCA model looks like. A deeper understanding of PCA requires a mathematical analysis, so we go into some detail on that point; however, that part can be skipped on a first reading.
The first part of this section emphasizes the general interpretation of a PCA model, since interpretation is a step that every modeller must perform. We leave to the second half of this section the important details of how to preprocess the raw data, how to actually calculate the PCA model, and how to validate and test it. This “reverse” order may be unsatisfying for some, but it helps to see how the model is used before going into the details of how it is calculated.
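The preprocessing and calculation steps mentioned above can be previewed with a short sketch. It assumes autoscaling (centering each column to zero mean and scaling it to unit variance), which is one common preprocessing choice rather than the only one, and it uses the squared singular values of the preprocessed matrix to report the fraction of total variation each component explains. The helper names `autoscale` and `r2_per_component` and the simulated data are hypothetical.

```python
import numpy as np

def autoscale(X):
    """Hypothetical helper: center each column and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std

def r2_per_component(X_pre):
    """Hypothetical helper: fraction of the total sum of squares of an
    already preprocessed matrix that is explained by each component."""
    # Each squared singular value is the sum of squares captured by one
    # component; together they add up to the total sum of squares of X_pre
    S = np.linalg.svd(X_pre, compute_uv=False)
    return S**2 / np.sum(X_pre**2)

# Simulated raw data: three variables recorded in very different units
rng = np.random.default_rng(1)
raw = rng.normal(size=(30, 3)) * np.array([1.0, 100.0, 0.01])

print("autoscaled:   ", np.round(r2_per_component(autoscale(raw)), 3))
print("centered only:", np.round(r2_per_component(raw - raw.mean(axis=0)), 3))
```

With autoscaling, every variable starts with the same variance before the model is calculated. With centering alone, the variable recorded in the largest units dominates the first component purely because of its units, which is one reason preprocessing deserves careful attention.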
- 6.5.1. Visualizing multivariate data
- 6.5.2. Geometric explanation of PCA
- 6.5.3. Mathematical derivation for PCA
- 6.5.4. More about the direction vectors (loadings)
- 6.5.5. PCA example: Food texture analysis
- 6.5.6. Interpreting score plots
- 6.5.7. Interpreting loading plots
- 6.5.8. Interpreting loadings and scores together
- 6.5.9. Predicted values for each observation
- 6.5.10. Interpreting the residuals
- 6.5.11. PCA example: analysis of spectral data
- 6.5.12. Hotelling’s T²
- 6.5.13. Preprocessing the data before building a model
- 6.5.14. Algorithms to calculate (build) PCA models
- 6.5.15. Testing the PCA model
- 6.5.16. Determining the number of components to use in the model with cross-validation
- 6.5.17. Some properties of PCA models
- 6.5.18. Latent variable contribution plots
- 6.5.19. Using indicator variables in a latent variable model
- 6.5.20. Visualizing latent variable models with linking and brushing
- 6.5.21. PCA Exercises