Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a perennial favorite of asset allocators in the financial industry. It is a statistical tool used to confirm portfolio diversification (or the lack thereof) in terms of hidden drivers of return. I will not explain its theory or intuition here, as many others on the internet have already done so very well. This tutorial will instead show you how to perform this quantitative witchcraft in Mathematica.

For this example, let’s suppose that we would like to discover the underlying structure among the monthly returns of three indices:

  • Dow Jones Industrial Average
  • S&P 500
  • NASDAQ Composite

Intuition tells us that we should expect very similar drivers of returns for those three indices. Let’s use PCA to see if it is indeed the case.

Prepare Raw Data

If you are the Sultan of Brunei and therefore have a Bloomberg terminal installed on your home computer, then you don’t need me to hold your hand through this step. Press “help help” and let your minions do your bidding. The rest of us proletarians without access to premium market data, however, need to be resourceful. I use Yahoo Finance myself for popular data.

Visit Yahoo Finance and obtain monthly historical data for DJI (Dow Jones), GSPC (S&P 500), and IXIC (NASDAQ). For this example, I used monthly data between Feb 1987 and Dec 2015.

Data from Yahoo Finance come in the form of index values. You will need to convert them to period-over-period changes in index values (i.e., returns) in Excel. Remember to remove any column or row headers. The cleaned data should look something like this:


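If you would rather script the level-to-return conversion than do it in Excel, here is a minimal pandas sketch. The file name and column name in the comment are hypothetical; Yahoo Finance CSV layouts vary, so check your download.

```python
import pandas as pd

# Hypothetical toy index levels; in practice you would load a Yahoo Finance
# CSV, e.g. prices = pd.read_csv("DJI.csv", index_col="Date")["Close"]
prices = pd.Series([100.0, 110.0, 99.0])

# Period-over-period percentage changes; drop the leading NaN.
returns = prices.pct_change().dropna()
print(returns.round(3).tolist())  # prints [0.1, -0.1]
```

Saving `returns` for all three indices side by side, without headers, gives you the same cleaned layout described above.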
Now, open Mathematica and import the data into variable rawdata.

rawdata=Import["full file path including file extension"][[1]];

It’s always a good idea to check the dimension of imported data:

Dimensions[rawdata]
What you should see is the number of data points and the number of columns grouped in a list in curly brackets. For me, it showed {353,3}. This provides a sanity check against Mathematica’s occasionally finicky Import function.

The Long Way

The long way breaks down the individual steps in producing the analysis. If you are not interested in seeing the “work” so to speak, skip to the next section.

First, find the covariance matrix of rawdata and assign it to the variable cov:

cov = Covariance[rawdata];
Then we need to find the eigenvectors of cov and assign them to the variable eigenvector:

eigenvector = Eigenvectors[cov];
Now, we will need to subtract the column mean from each data point in rawdata. We will use a loop for this.

First, assign rawdata to the variable diff. This preserves rawdata while we transform diff:

diff = rawdata;
Now we perform the loop:

Do[diff[[All, i]] = diff[[All, i]] - Mean[diff[[All, i]]], {i, 1, Dimensions[rawdata][[2]]}];

To find the principal components, we now matrix-multiply diff by the transposed eigenvectors:

pc = diff.Transpose[eigenvector];
That’s it! Variable pc now has the principal components.
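For readers without Mathematica, the long way above can be sanity-checked with a NumPy translation. The random matrix below is just a stand-in for the 353×3 return data; note that `numpy.linalg.eigh` returns eigenvectors as columns, so no transpose is needed before multiplying.

```python
import numpy as np

rng = np.random.default_rng(0)
rawdata = rng.normal(size=(353, 3))  # stand-in for the 353x3 monthly returns

cov = np.cov(rawdata, rowvar=False)              # covariance matrix of the columns
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvectors come back as columns
diff = rawdata - rawdata.mean(axis=0)            # subtract each column's mean
pc = diff @ eigenvectors                         # project onto the eigenvectors

# Each principal component's variance equals the corresponding eigenvalue.
print(np.allclose(pc.var(axis=0, ddof=1), eigenvalues))  # True
```

This also demonstrates the fact noted below: the component variances are exactly the eigenvalues of the covariance matrix.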

To see the variance of each column:

Variance[pc]
Incidentally, this is also equal to the eigenvalues of the variable cov. Try it out for yourself:

Eigenvalues[cov]
To check how much explanatory power each component has, in descending order:

Sort[Variance[pc]/Total[Variance[pc]], Greater]
To plot the new returns in 3-dimensional Euclidean space with the principal components as the new axes:

ListPointPlot3D[pc]

To apply dimension reduction and project the points onto a 2D space, keep only the first two components:

ListPlot[pc[[All, 1 ;; 2]]]
The Fast and Furious Way

Mathematica has a built-in function that performs all of the steps above in a single call:

pc = PrincipalComponents[rawdata];

And boom, you are done.
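If you want a one-call version outside Mathematica as well, here is a sketch of what such a built-in routine does under the hood, via the SVD of the demeaned data. The function name `principal_components` is mine, for illustration only.

```python
import numpy as np

def principal_components(x):
    """Return the PCA scores of x, ordered by decreasing variance (SVD-based)."""
    diff = x - x.mean(axis=0)                         # demean each column
    _, _, vt = np.linalg.svd(diff, full_matrices=False)
    return diff @ vt.T                                # project onto right singular vectors

rng = np.random.default_rng(1)
x = rng.normal(size=(353, 3))                         # stand-in for the return data
pc = principal_components(x)

var = pc.var(axis=0, ddof=1)
print(bool(var[0] >= var[1] >= var[2]))  # True: components sorted by variance
```

Because singular values come back in descending order, the component variances are automatically sorted, which is also the convention of eigenvalue-ordered PCA in the long way above.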
