PCA Tutorial
Eigenvalue decomposition
Consider a data matrix $\mathbf{X}$ with $m$ features and $n$ observations, so its dimension is $m \times n$. Assuming each row (feature) has had its mean subtracted, the covariance matrix can be estimated as:
$$ \mathbf{S}_{\mathbf{X}} \equiv \frac{1}{n-1} \mathbf{X X}^{T} $$
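As a quick numerical sanity check, here is a minimal NumPy sketch of this estimate (the 5×200 Gaussian toy data and all variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 200                      # m features, n observations
X = rng.normal(size=(m, n))        # toy data: one feature per row

Xc = X - X.mean(axis=1, keepdims=True)   # subtract each row (feature) mean
S_X = Xc @ Xc.T / (n - 1)                # covariance estimate, shape (m, m)

# np.cov treats rows as variables and uses the same 1/(n-1) normalization
assert np.allclose(S_X, np.cov(X))
```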
Our goal is to transform $\mathbf{X}$ so that the redundancy among features in the resulting covariance matrix $\mathbf{S_Y}$ is reduced. More specifically, we apply a linear transformation to $\mathbf{X}$ by changing the basis:
Goal: Find some orthonormal matrix $\mathbf{P}$ with $\mathbf{Y}=\mathbf{P} \mathbf{X}$ such that $\mathbf{S}_{\mathbf{Y}} \equiv \frac{1}{n-1} \mathbf{Y} \mathbf{Y}^{T}$ is diagonal. The rows of $\mathbf{P}$ turn out to be the eigenvectors of $\mathbf{S}_{\mathbf{X}}$.
Through a short derivation (with $\mathbf{P}$ still to be determined), we find the following:
$$ \begin{aligned} \mathbf{S}_{\mathbf{Y}} &=\frac{1}{n-1} \mathbf{Y} \mathbf{Y}^{T} \\ &=\frac{1}{n-1}(\mathbf{P} \mathbf{X})(\mathbf{P} \mathbf{X})^{T} \\ &=\frac{1}{n-1} \mathbf{P} \mathbf{X} \mathbf{X}^{T} \mathbf{P}^{T} \\ &=\frac{1}{n-1} \mathbf{P}\left(\mathbf{X X}^{T}\right) \mathbf{P}^{T} \\ \mathbf{S}_{\mathbf{Y}} &=\frac{1}{n-1} \mathbf{P} \mathbf{A} \mathbf{P}^{T} \end{aligned} $$

The last step defines $\mathbf{A} \equiv \mathbf{X X}^{T}$. Since $\mathbf{A}$ is symmetric, it has an eigenvalue decomposition $\mathbf{A} = \mathbf{E D E}^{T}$, where $\mathbf{E}$ is an orthogonal matrix whose columns are the eigenvectors of $\mathbf{A}$, and $\mathbf{D}$ is a diagonal matrix with the corresponding eigenvalues on its diagonal.
Thus, $\mathbf{P} = \mathbf{E}^T$ satisfies the goal, making $\mathbf{P A P}^T = \mathbf{D}$. Note that $\mathbf{E}$ is orthogonal (guaranteed by the eigenvalue decomposition of a symmetric matrix), so $\mathbf{E}^{T} = \mathbf{E}^{-1}$ and hence $\mathbf{P}^{T} = \mathbf{P}^{-1}$:
$$ \begin{aligned} \mathbf{S}_{\mathbf{Y}} &=\frac{1}{n-1} \mathbf{P} \mathbf{A} \mathbf{P}^{T} \\ &=\frac{1}{n-1} \mathbf{P}\left(\mathbf{P}^{T} \mathbf{D} \mathbf{P}\right) \mathbf{P}^{T} \\ &=\frac{1}{n-1}\left(\mathbf{P} \mathbf{P}^{T}\right) \mathbf{D}\left(\mathbf{P} \mathbf{P}^{T}\right) \\ &=\frac{1}{n-1}\left(\mathbf{P} \mathbf{P}^{-1}\right) \mathbf{D}\left(\mathbf{P} \mathbf{P}^{-1}\right) \\ \mathbf{S}_{\mathbf{Y}} &=\frac{1}{n-1} \mathbf{D} \end{aligned} $$
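This construction is easy to verify numerically. Below is a minimal, self-contained NumPy sketch (again with made-up Gaussian toy data): `np.linalg.eigh` decomposes the symmetric $\mathbf{A}$, and setting $\mathbf{P} = \mathbf{E}^T$ indeed yields a diagonal $\mathbf{S_Y}$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 200                           # m features, n observations
X = rng.normal(size=(m, n))
X = X - X.mean(axis=1, keepdims=True)   # center each feature (row)

A = X @ X.T                             # A = X X^T, symmetric
evals, E = np.linalg.eigh(A)            # A = E D E^T; eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]         # reorder so eigenvalues are descending
evals, E = evals[order], E[:, order]

P = E.T                                 # rows of P are the eigenvectors of A
Y = P @ X                               # change of basis
S_Y = Y @ Y.T / (n - 1)

# S_Y is diagonal, with the eigenvalues of A divided by (n - 1) on its diagonal
assert np.allclose(S_Y, np.diag(evals) / (n - 1))
```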
Additional Notes
- Each eigenvector is called a principal axis.
- The projections of the data onto the principal axes are called principal components.
- Suppose the eigenvalues are sorted in descending order. The first row vector of $\mathbf{P}$, denoted $\mathbf{P}_1$, is the first principal axis. The first principal component is $\mathbf{P}_1 \mathbf{X}$, the projection of $\mathbf{X}$ onto the first principal axis.
- The first eigenvalue of $\mathbf{S_X}$ is then the variance of the first principal component: $\operatorname{Var}(\mathbf{P}_1 \mathbf{X}) = \frac{1}{n-1} \mathbf{P}_1 \mathbf{X} \mathbf{X}^T \mathbf{P}_1^T = \frac{1}{n-1} \mathbf{P}_1 \mathbf{A} \mathbf{P}_1^T$.
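As a numeric check of the last two notes, continuing the sketch above: projecting onto the first row of `P` gives the first principal component, and its sample variance matches the largest eigenvalue of $\mathbf{S_X}$.

```python
P1 = P[0]                    # first principal axis (first row of P)
pc1 = P1 @ X                 # first principal component, a length-n vector

# Its sample variance equals the largest eigenvalue of S_X = A / (n - 1)
assert np.allclose(pc1.var(ddof=1), evals[0] / (n - 1))
```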