Principal Component Analysis

From Latent Variable Methods in Engineering

Video material
Download video: Link (plays in Google Chrome) [290.1Mb]

Video timing

  • 00:00 to 21:37 Recap and overview of this class
  • 21:38 to 42:01 Preprocessing: centering and scaling
  • 42:02 to 57:07 Geometric view of PCA

Class notes

lvm-class-2.pdf (class of 16 September 2011) [1.65 Mb]

Class preparation

Class 2 (16 September)

  • Reading for class 2
  • Linear algebra topics you should be familiar with before class 2:
    • matrix multiplication
    • that multiplying a vector by a matrix is a transformation from one coordinate system to another (we will review this in class)
    • linear combinations (read the first section of that website: we will review this in class)
    • the dot product of two vectors, and how it relates to the cosine of the angle between them (see the geometric interpretation section); a short numeric example follows this list
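To make the last point concrete, here is a minimal NumPy sketch (illustrative only; the vectors are made up and NumPy is not required for the course) of the relation \( \mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta \):

    # Minimal sketch (not part of the course notes): the dot product of two
    # vectors and the cosine of the angle between them.
    import numpy as np

    a = np.array([3.0, 4.0])
    b = np.array([4.0, 3.0])

    dot = a @ b                                                  # a . b = 24.0
    cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))    # 24 / (5 * 5) = 0.96
    angle_deg = np.degrees(np.arccos(cos_theta))                 # about 16.3 degrees

    print(dot, cos_theta, angle_deg)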

Class 3 (23 September)

  • Least squares:
    • what is the objective function of least squares
    • how to calculate the two regression coefficients \(b_0\) and \(b_1\) for \(y = b_0 + b_1x + e\)
    • understand that the residuals in least squares are orthogonal to \(x\) (a short numeric check follows this list)
  • Some optimization theory:
    • how an optimization problem is written with equality constraints
    • the Lagrange multiplier principle for solving simple, equality constrained optimization problems
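As a concrete check on the least-squares items above, here is a small NumPy sketch (the data values are invented for illustration, and this is not the course's own code): it computes \(b_0\) and \(b_1\) for \(y = b_0 + b_1x + e\) and then verifies that the residuals are orthogonal to \(x\).

    # Ordinary least squares for y = b0 + b1*x + e (illustrative data).
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)   # slope
    b0 = y_bar - b1 * x_bar                                             # intercept

    e = y - (b0 + b1 * x)        # residuals

    print(b0, b1)                # fitted coefficients
    print(np.dot(e, x))          # ~0: residuals are orthogonal to x
    print(e.sum())               # ~0: residuals also sum to zero

For the Lagrange multiplier item, a short worked example (the function and constraint are chosen purely for illustration): maximize \(f(x,y) = xy\) subject to \(x + y = 4\). Form the Lagrangian and set its partial derivatives to zero:

\[ \mathcal{L}(x,y,\lambda) = xy - \lambda(x + y - 4), \qquad \frac{\partial \mathcal{L}}{\partial x} = y - \lambda = 0, \quad \frac{\partial \mathcal{L}}{\partial y} = x - \lambda = 0, \quad \frac{\partial \mathcal{L}}{\partial \lambda} = -(x + y - 4) = 0 \]

The first two equations give \(x = y = \lambda\), and the constraint then forces \(x = y = 2\), so the constrained maximum is \(f(2,2) = 4\). The same recipe, with a unit-length equality constraint on the direction vector, is the standard route to deriving the PCA components.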

Update

This illustration should help better explain what I was trying to get across in class 2B.

[Figure: geometric interpretation of PCA, showing \(\hat{\mathbf{x}}_i\) and the residual vector (Geometric-interpretation-of-PCA-xhat-residuals.png)]
  • \(p_1\) and \(p_2\) are the unit vectors for components 1 and 2.
  • \( \mathbf{x}_i \) is a row of data from the matrix \( \mathbf{X}\).
  • \(\hat{\mathbf{x}}_{i,1} = t_{i,1}p_1\) is the best prediction of \( \mathbf{x}_i \) using only the first component.
  • \(\hat{\mathbf{x}}_{i,2} = t_{i,2}p_2\) is the improvement we add after the first component to better predict \( \mathbf{x}_i \).
  • \(\hat{\mathbf{x}}_{i} = \hat{\mathbf{x}}_{i,1} + \hat{\mathbf{x}}_{i,2} \) is the total prediction of \( \mathbf{x}_i \) using 2 components; it is the open blue point lying on the plane defined by \(p_1\) and \(p_2\). Notice that this is just the vector summation of \( \hat{\mathbf{x}}_{i,1}\) and \( \hat{\mathbf{x}}_{i,2}\).
  • \(\mathbf{e}_{i} \) is the prediction error vector, because the prediction \(\hat{\mathbf{x}}_{i} \) is not exact: the data point \( \mathbf{x}_i \) lies above the plane defined by \(p_1\) and \(p_2\). The length of \(\mathbf{e}_{i} \) is the residual distance left after using 2 components.
  • \( \mathbf{x}_i = \hat{\mathbf{x}}_{i} + \mathbf{e}_{i} \): also a vector summation (a short numeric sketch of this decomposition follows this list).
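Here is a small numeric sketch of that decomposition (the loading vectors and the data row are invented for illustration, and NumPy is used only for convenience):

    # Geometric decomposition of one observation onto a 2-component PCA model.
    import numpy as np

    p1 = np.array([0.8, 0.6, 0.0])      # unit vector for component 1
    p2 = np.array([-0.6, 0.8, 0.0])     # unit vector for component 2, orthogonal to p1
    x_i = np.array([2.0, -1.0, 0.5])    # one (centered) row of X

    t_i1 = x_i @ p1                     # score on component 1
    t_i2 = x_i @ p2                     # score on component 2

    xhat_i1 = t_i1 * p1                 # best prediction from component 1 alone
    xhat_i2 = t_i2 * p2                 # improvement added by component 2
    xhat_i = xhat_i1 + xhat_i2          # projection of x_i onto the p1-p2 plane
    e_i = x_i - xhat_i                  # residual vector, perpendicular to that plane

    print(xhat_i)                            # approximately [ 2. -1.  0.]
    print(e_i)                               # approximately [ 0.  0.  0.5]
    print(np.allclose(x_i, xhat_i + e_i))    # True: x_i = xhat_i + e_i

The residual \(\mathbf{e}_i\) has no component along \(p_1\) or \(p_2\), which is exactly the picture above: the open blue point is the foot of the perpendicular from \(\mathbf{x}_i\) to the plane.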