# Assignment 3

1. Build a PCA model using the first 100 rows of the data.
2. Plot the scores. What do you notice?
3. Investigate the outliers with the contribution tool.
4. Verify that the outliers exist in the raw data.
5. Exclude any unusual observations and refit the model.
6. Did you remove all the outliers? Check the scores and SPE again, and repeat until all outliers are removed.
7. Plot a loadings plot for the first component. What is your interpretation of $$p_1$$?
8. Given the $$R^2$$ and $$Q^2$$ values for the first component, how do you interpret the variability in this process? (Remember that the goal of PCA is to explain variability.)
9. What is the interpretation of $$p_2$$? From a quality control perspective, if you could remove the variability due to $$p_2$$, how much of the variability would you be removing from the process?
10. Plot the corresponding time series for $$t_1$$. What do you notice in the sequence of score values?
11. Repeat the above question for the second component.
12. Use all the data as testing data (184 observations, of which the first $$\approx 100$$ were used to build the model).
13. Do the outliers that you excluded earlier still show up as outliers? Do the contribution plots for these outliers give the same diagnosis that you got before?
14. Are there any new outliers in points 101 to 184? If so, what is their diagnosis?
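
The fit, diagnose, exclude, refit, and test loop described in the steps above can be sketched in Python with NumPy alone. This is a minimal sketch under stated assumptions, not the course's tool: the data is synthetic (two latent factors, six variables, one injected outlier), the helper names `fit_pca` and `apply_pca` are hypothetical, the data is assumed to be centered and scaled to unit variance, and the SPE contributions are taken as the squared residuals per variable. Cross-validated $$Q^2$$ is not computed here.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA by SVD after centering and unit-variance scaling (assumed preprocessing)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                        # loadings p_1, p_2, ... as columns
    r2 = S[:n_components] ** 2 / np.sum(S ** 2)    # fraction of variance per component
    return {"mean": mean, "std": std, "P": P, "r2": r2}

def apply_pca(model, X):
    """Project rows of X onto the model; return scores, residuals, and SPE."""
    Z = (X - model["mean"]) / model["std"]
    T = Z @ model["P"]                             # scores t_1, t_2, ... as columns
    E = Z - T @ model["P"].T                       # residuals after A components
    spe = np.sum(E ** 2, axis=1)                   # squared prediction error per row
    return T, E, spe

# Synthetic stand-in for the assignment data (an assumption, not the real dataset).
rng = np.random.default_rng(0)
N, K = 184, 6
latent = rng.normal(size=(N, 2))
mixing = np.array([[1, 1, 1, 1, 1, 1],
                   [1, -1, 1, -1, 1, -1]], dtype=float)
X = latent @ mixing + 0.1 * rng.normal(size=(N, K))
X[40, 3] += 8.0                                    # injected outlier: row 40, variable 3

# Steps 1 to 3: build the model on the first 100 rows and look for the worst SPE.
train = X[:100]
model = fit_pca(train, n_components=2)
T, E, spe = apply_pca(model, train)
worst = int(np.argmax(spe))
contrib = E[worst] ** 2                            # per-variable contribution to SPE
worst_var = int(np.argmax(contrib))
print("largest SPE at row", worst, "driven by variable", worst_var)

# Steps 5 to 8: exclude the flagged row and refit.
keep = np.ones(len(train), dtype=bool)
keep[worst] = False
model2 = fit_pca(train[keep], n_components=2)
print("R2 per component after refit:", model2["r2"])

# Steps 12 to 14: use all 184 rows as testing data against the refitted model.
T_all, E_all, spe_all = apply_pca(model2, X)
print("largest SPE in the full data at row", int(np.argmax(spe_all)))
```

In practice the exclusion step is repeated until neither the score plot nor the SPE shows unusual rows, and `T_all[:, 0]` and `T_all[:, 1]` plotted against row order give the time series of $$t_1$$ and $$t_2$$.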