Principal component analysis (PCA) is a useful technique for dimensionality reduction and is optimal in the sense that it represents the variability of the training data with as few dimensions as possible. The resulting projection matrix can be seen as a change of coordinates to a coordinate system where the coordinates are in descending order of importance.
To do PCA on image data, the images need to be converted to a one-dimensional vector representation, e.g. using flatten() in NumPy. A tiny 100*100 pixel grayscale image then becomes a point in a 10,000 dimensional space, and a megapixel image lives in a space with millions of dimensions, so dimensionality reduction comes in handy in many applications.
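As a quick sketch of what this means (the file name here is just a placeholder), flattening turns the pixel grid into one long vector:

from PIL import Image
from numpy import *

im = array(Image.open('face.jpg').convert('L'))  # a grayscale image, e.g. 100*100 pixels
print(im.shape)  # (100, 100)
x = im.flatten()  # one long row vector
print(x.shape)  # (10000,) - a single point in a 10,000 dimensional space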
The flattened images are collected in a single matrix by stacking them, one row for each image. The rows are then centered around the mean image before computing the dominant directions. To find the principal components, singular value decomposition (SVD) is usually used, but when the dimensionality is high the SVD computation becomes very slow, so there is a useful trick that can be used instead. Here is what it looks like in Python code.
from PIL import Image
from numpy import *

def pca(X):
    """ Principal Component Analysis
        input: X, matrix with training data as flattened arrays in rows
        return: projection matrix (with important dimensions first),
        variance and mean. """

    # get dimensions
    num_data,dim = X.shape

    # center data
    mean_X = X.mean(axis=0)
    X = X - mean_X

    if dim > num_data:
        print('PCA - compact trick used')
        M = dot(X,X.T)  # covariance matrix
        e,EV = linalg.eigh(M)  # eigenvalues and eigenvectors
        tmp = dot(X.T,EV).T  # this is the compact trick
        V = tmp[::-1]  # reverse since last eigenvectors are the ones we want
        S = sqrt(e)[::-1]  # reverse since eigenvalues are in increasing order
    else:
        print('PCA - SVD used')
        U,S,V = linalg.svd(X)
        V = V[:num_data]  # only makes sense to return the first num_data

    # return the projection matrix, the variance and the mean
    return V,S,mean_X
The threshold for when to switch from the SVD to the trick of computing eigenvectors of the (smaller) matrix X*X^T is a bit arbitrary in this case, but that doesn't really matter. The rows of the matrix V are orthogonal and contain the coordinate directions in order of descending variance of the training data.
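To make the change of coordinates concrete, here is a small sketch (assuming the pca() function above and a data matrix X with the flattened training images in its rows; the choice k = 5 is arbitrary) that projects one training image onto the first few principal components and checks that the rows of V are orthogonal:

V,S,mean_X = pca(X)

# coordinates of the first training image in the new coordinate system,
# keeping only the k most important directions
k = 5
y = dot(V[:k], X[0] - mean_X)
print(y)

# the rows of V are orthogonal, so the off-diagonal elements should be close to zero
print(dot(V[:k], V[:k].T))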
Let's try this on an example of face images downloaded from the "Labeled Faces in the Wild" data set available at http://vis-www.cs.umass.edu/lfw/index.html.
The file gwb_cropped.zip contains 20 selected images of George W Bush, cropped slightly tighter to the face than the originals. Assuming that the filenames of these cropped face images are stored in a list, imlist, the principal components can be computed and shown like this.
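(If imlist is not already defined, one simple way to build it is with the glob module; the folder name and file extension below are just assumptions about how the zip was unpacked.)

import glob

imlist = glob.glob('gwb_cropped/*.jpg')
imlist.sort()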
from PIL import Image
import numpy
import matplotlib.pyplot as plt

im = numpy.array(Image.open(imlist[0]).convert('L'))  # open one image to get the size
m,n = im.shape[0:2]  # get the size of the images
imnbr = len(imlist)  # get the number of images

# create matrix to store all flattened (grayscale) images
immatrix = numpy.array([numpy.array(Image.open(imlist[i]).convert('L')).flatten() for i in range(imnbr)],'f')

V,S,immean = pca(immatrix)

# mean image and first mode of variation
immean = immean.reshape(m,n)
mode = V[0].reshape(m,n)

# show the images
plt.figure(); plt.imshow(immean, cmap='gray')
plt.figure(); plt.imshow(mode, cmap='gray')
plt.show()
Note that the images need to be converted back from the one-dimensional representation using reshape(). Running the example should give two plots like these.
The mean image of the 20 sample images and the first mode, i.e. the direction with most variation.
This example is of course a bit silly since, to have something useful, the images should be properly aligned and cropped or masked to remove the influence of unimportant pixels.
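For the masking part, a minimal sketch could look like the following, where a crude central rectangle stands in for a real face mask (the mask shape is just an assumption) and the background pixels are zeroed out before doing PCA:

# binary mask with ones over the face region and zeros elsewhere
mask = numpy.zeros((m,n))
mask[m//4:3*m//4, n//4:3*n//4] = 1
immatrix_masked = immatrix * mask.flatten()
V,S,immean = pca(immatrix_masked)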
Anyway, hope this is useful to someone. I'll write an update with some better examples (I have some ideas for cool data sets) when there is time.