Tuesday, December 20, 2011

Installing the OpenCV Python Interface on Lion

On some machines I ran into "Segmentation fault 11" when using the OpenCV Python interface on Mac OS X 10.7 Lion. These problem machines had all been upgraded from Snow Leopard, which had Python 2.6 (Lion has 2.7). Anyway, the simplest solution for me was to do a clean Lion install and then install OpenCV. Here's the whole process, as a big fat "note to self" and hopefully a help for others.

1. Make a clean install of Lion by copying the install .dmg to a DVD or USB stick. See process here.

2. Install Xcode through the Mac App Store.

3. Get the Scipy/Matplotlib/iPython/... stack from the Scipy Superpack.

4. Install CMake. Download the .dmg from here and run it.

5. Download OpenCV; the latest version is 2.3.1.

6. Unpack, open Terminal and go to the folder OpenCV-2.3.1. Run the following commands:

mkdir build
cd build
cmake -G "Unix Makefiles" ..
make -j8
sudo make install

7. Add the build/lib folder that contains cv2.so to your PYTHONPATH.

I have tried this on a number of computers and it works great for me. If you are having OpenCV-Python problems and want to use MacPorts or other variants, the clean Lion install might help as well.
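
A quick way to check the PYTHONPATH step is to look for the compiled module before trying to import it. This is only a sketch of how you might do that: the helper name find_cv2_module and the example path are made up for illustration, and the import itself is left commented out since it only works on a machine where the build succeeded.

```python
import glob
import os
import sys

def find_cv2_module(lib_dir):
    """Return the paths of files named cv2*.so in lib_dir (hypothetical helper)."""
    return sorted(glob.glob(os.path.join(lib_dir, 'cv2*.so')))

# example path only; point this at wherever you unpacked OpenCV
build_lib = os.path.expanduser('~/OpenCV-2.3.1/build/lib')

if find_cv2_module(build_lib):
    # same effect as adding the folder to PYTHONPATH, but for this session only
    sys.path.append(build_lib)
    # import cv2  # uncomment on a machine with a working build
```

If the list comes back empty, the build most likely put cv2.so somewhere else, and that is the folder to add instead.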

Good luck.

Tuesday, August 23, 2011

Book draft

The other night I put up the first early draft of my upcoming book "Programming Computer Vision with Python".

The book is meant as an entry point to hands-on computer vision for students, researchers and enthusiasts using Python. I'm thinking of introductory courses in image analysis and computer vision. OpenCV Python users who need to do something beyond what's in OpenCV will also find the book useful. There is a growing Python community around computer vision, with projects like pythonvision and SimpleCV. The book can serve as a computer vision introduction for these users as well. And let's not forget all the hackers and enthusiasts out there.

It is not meant as a textbook; it is a manual for getting started with Python and/or computer vision. I'm putting everything online in order to get early feedback and help with hammering out errors and improving the text and code.

Draft versions together with code and data are available from the book webpage.

I'll take all the feedback and comments I can get! Preferably via email.

UPDATE (Sept 25): Thanks so much for all the feedback and comments! I'm now on the third iteration of the book PDF since putting it online a month ago. Check the book page for updates and keep sending me your thoughts. /JES

Wednesday, June 8, 2011

Another Python Interface for SIFT

I previously wrote about using David Lowe's SIFT code from Python here. There is a good open source alternative for those on Windows or Mac OS (Lowe's binaries are Linux only) called VLFeat. The library is written in C but has a command line interface that we can use in much the same way.

To install VLFeat, download and unpack the latest binary package from the download page (currently version 0.9.9). Add the paths to your environment or copy the binaries to a directory in your path. The binaries are in the bin/ directory; just pick the sub-directory for your platform. The use of the VLFeat command line binaries is described in the src/ sub-directory. Alternatively, you can find the documentation online.

I wrote a Python interface similar to the one I had for Lowe's code. You can download it here (vlfeat.py).

Try it out like this:

from PIL import Image
from pylab import *
from vlfeat import * # the functions below come from vlfeat.py (assumed to be on your Python path)

process_image('box.pgm','tmp.sift')
l,d = read_features_from_file('tmp.sift')

im = array(Image.open('box.pgm'))
figure()
plot_features(im,l,True)

process_image('scene.pgm','tmp2.sift')
l2,d2 = read_features_from_file('tmp2.sift')
im2 = array(Image.open('scene.pgm'))

m = match_twosided(d,d2)
figure()
plot_matches(im,im2,l,l2,m)

show()

The result looks like this.




For those who want more control, there is also the option of using the Python wrapper I wrote about before.
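
To give a rough idea of what a command line wrapper like this has to do after the binary has run, here is a minimal sketch of reading a text feature file back into NumPy. It assumes one feature per row with x, y, scale and orientation first and the descriptor values after that; the function name read_sift_file is made up for this example and is not the code in vlfeat.py.

```python
import os
import tempfile

import numpy as np

def read_sift_file(filename):
    """Split a text feature file into locations and descriptors.
    Assumes each row is x, y, scale, orientation followed by
    the descriptor values (hypothetical helper)."""
    f = np.loadtxt(filename, ndmin=2)
    return f[:, :4], f[:, 4:]

# write a tiny fake feature file (two features, 128-value descriptors)
rows = np.hstack((np.array([[10., 20., 2., 0.5],
                            [30., 40., 3., 1.0]]),
                  np.ones((2, 128))))
path = os.path.join(tempfile.mkdtemp(), 'tmp.sift')
np.savetxt(path, rows)

l, d = read_sift_file(path)
print(l.shape, d.shape)
```

The ndmin=2 argument keeps the result two-dimensional even if the file holds a single feature.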

Tuesday, May 31, 2011

Read/Write World

Last week at ARE2011, Blaise Aguera y Arcas and Avi Bar-Zeev gave two talks that got my attention, on the Read/Write World initiative from Bing. This is interesting from a number of angles:

RML - Reality Markup Language

RML is a markup language for describing the shape, location, and content of the world (and images of it). Not all the details are posted yet, but it supposedly supports transforms between views, images, video, panoramas, geo etc. Worth checking in to see where this goes in the near future.

Open Source Viewers

The demo apps show some interesting ways of viewing content. The project will contain HTML5 based viewers available under open source licenses. Again, worth checking in on later.

Services

Hosted by Bing, services will include indexing, matching between images (e.g. homographies), 20GB of storage and possibly other image and video services.

All in all it looks ambitious and very interesting. Not much to see right now, but definitely a project to keep an eye on and check out in a few months' time.

Friday, April 8, 2011

Adjacency matrix for image pixel graphs

One of my favorite NumPy functions is roll(). Here's an example of using this function to get neighborhood indices to create adjacency matrices for images.

An adjacency matrix is a way of representing a graph and shows which nodes of the graph are adjacent to which other nodes. For n nodes it is an n*n matrix A where a_ij is the number of edges from vertex i to vertex j. The number of edges can also be replaced with edge weights depending on the application.
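
Before getting to images, here is a small sketch of how roll() produces neighbor indices and how they end up as entries a_ij. The 2*3 index array is just an illustration. Rolling the array of pixel indices by one step puts each pixel's neighbor index at that pixel's position, wrapping around at the border.

```python
import numpy as np

ndx = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

left = np.roll(ndx, 1, axis=1)  # index of the pixel to the left (wraps around)
# [[2 0 1]
#  [5 3 4]]

up = np.roll(ndx, 1, axis=0)    # index of the pixel above (wraps around)
# [[3 4 5]
#  [0 1 2]]

# pair each pixel index with its left neighbor and fill in a_ij = a_ji = 1
pairs = list(zip(ndx.flatten().tolist(), left.flatten().tolist()))
A = np.zeros((6, 6))
for i, j in pairs:
    A[i, j] = A[j, i] = 1
```

Because of the wrap-around, pixel 0 is paired with pixel 2 at the other end of its row.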

For images, a pixel neighborhood defines a graph which is sparse since each pixel is only connected to its neighbors (usually 4 or 8). A sparse representation, for example using dictionaries, is preferable if only the edges are needed. For clustering using spectral graph theory, a full matrix is needed. The following function creates an adjacency matrix in sparse or full matrix form given an image:

from numpy import *

def adjacency_matrix(im,connectivity=4,sparse=False):
    """ Create a pixel connectivity matrix with
        4 or 8 neighborhood. If sparse then a dict is
        returned, otherwise a full array. """

    m,n = im.shape[:2]

    # center pixel index
    c = arange(m*n)
    ndx = c.reshape(m,n)

    if sparse:
        A = {}
    else:
        A = zeros((m*n,m*n))

    # note: roll() wraps around, so border pixels link to the opposite border
    if connectivity==4: # 4 neighborhood
        hor = roll(ndx,1,axis=1).flatten() # horizontal
        ver = roll(ndx,1,axis=0).flatten() # vertical
        for i in range(m*n):
            A[c[i],hor[i]] = A[hor[i],c[i]] = 1
            A[c[i],ver[i]] = A[ver[i],c[i]] = 1

    elif connectivity==8: # 8 neighborhood
        hor = roll(ndx,1,axis=1).flatten() # horizontal
        ver = roll(ndx,1,axis=0).flatten() # vertical
        diag1 = roll(roll(ndx,1,axis=0),1,axis=1).flatten() # diagonal
        diag2 = roll(roll(ndx,1,axis=0),-1,axis=1).flatten() # anti-diagonal
        for i in range(m*n):
            A[c[i],hor[i]] = A[hor[i],c[i]] = 1
            A[c[i],ver[i]] = A[ver[i],c[i]] = 1
            A[c[i],diag1[i]] = A[diag1[i],c[i]] = 1
            A[c[i],diag2[i]] = A[diag2[i],c[i]] = 1

    return A

Use it like this:

from pylab import *

im = random.random((10,10))
A = adjacency_matrix(im,8,False)

figure()
imshow(1-A)
gray()
show()

Which should give you an image like the one below.
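
For larger images the Python loop over all pixels gets slow. As a possible variation (a sketch, not a drop-in replacement: the function name and the shape argument are made up here, and only the full matrix form is handled), the same entries can be set with NumPy index arrays instead of a loop. Like any roll() based version it wraps around at the image border.

```python
import numpy as np

def adjacency_matrix_vectorized(shape, connectivity=4):
    """Full adjacency matrix for an image of the given (rows, cols) shape,
    built with index arrays instead of a per-pixel loop (hypothetical helper)."""
    m, n = shape
    ndx = np.arange(m * n).reshape(m, n)
    c = ndx.flatten()

    # (row shift, column shift) for each neighbor direction
    shifts = [(0, 1), (1, 0)]
    if connectivity == 8:
        shifts += [(1, 1), (1, -1)]

    A = np.zeros((m * n, m * n))
    for dy, dx in shifts:
        nb = np.roll(np.roll(ndx, dy, axis=0), dx, axis=1).flatten()
        A[c, nb] = 1  # sets A[c[i], nb[i]] for all i at once
        A[nb, c] = 1  # keep the matrix symmetric
    return A

A = adjacency_matrix_vectorized((4, 4), 8)
print(A.sum(axis=1))  # number of neighbors per pixel
```

A simple sanity check on the result is that the matrix is symmetric and every row sums to the connectivity (for images at least 3 pixels wide and high).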

Tuesday, March 22, 2011

RQ Factorization of Camera Matrices

A common operation on camera matrices is to use RQ factorization to obtain the camera calibration matrix given a 3*4 projection matrix. The simple example below shows how to do this using the scipy.linalg module. Assuming the camera is modeled as P = K [R | t], the goal is to recover K and R by factoring the first 3*3 part of P.

The scipy.linalg module actually contains RQ factorization although this is not always clear from the documentation (here is a page that shows it though). To use this version, import rq like this:

from scipy.linalg import rq

Alternatively, you can use the more common QR factorization and with some modifications write your own RQ function.

from numpy import flipud
from scipy.linalg import qr

def rq(A):
    """ RQ factorization using QR on the flipped, transposed matrix. """
    Q,R = qr(flipud(A).T)
    R = flipud(R.T)
    Q = Q.T
    return R[:,::-1],Q[::-1,:]
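
If you want to convince yourself that the QR trick really gives an RQ factorization, a small check like the following can help. It is a self-contained sketch: the function is repeated under the made-up name rq_via_qr, and the test matrix is arbitrary.

```python
import numpy as np
from scipy.linalg import qr

def rq_via_qr(A):
    """RQ factorization of a square matrix through QR (same idea as above)."""
    Q, R = qr(np.flipud(A).T)
    R = np.flipud(R.T)
    Q = Q.T
    return R[:, ::-1], Q[::-1, :]

A = np.array([[2., 1., 3.],
              [0., 4., 1.],
              [5., 2., 6.]])
R, Q = rq_via_qr(A)

print(np.allclose(np.dot(R, Q), A))            # the product reproduces A
print(np.allclose(R, np.triu(R)))              # R is upper triangular
print(np.allclose(np.dot(Q, Q.T), np.eye(3)))  # Q is orthogonal
```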

RQ factorization is not unique. The sign of the diagonal elements can vary. In computer vision we need them to be positive to correspond to focal length and other positive parameters. To get a consistent result with positive diagonal you can apply a transform that changes the sign. Try this on a camera matrix like this:

from numpy import diag, sign, dot

# factor first 3*3 part of P (P is a 3*4 array)
K,R = rq(P[:,:3])

# make diagonal of K positive
T = diag(sign(diag(K)))

K = dot(K,T)
R = dot(T,R) # T is its own inverse
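
To try the whole thing end to end, here is a sketch that builds a camera matrix from known parts and recovers them again. The calibration values, rotation angle and translation are invented for the example, and scipy.linalg.rq is used for the factorization. The translation is recovered from the last column of P, which equals K t under the model P = K [R | t].

```python
import numpy as np
from scipy.linalg import rq

# made-up ground truth: calibration, rotation about the z axis, translation
K_true = np.array([[1000., 0., 500.],
                   [0., 1000., 300.],
                   [0., 0., 1.]])
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.],
                   [np.sin(a), np.cos(a), 0.],
                   [0., 0., 1.]])
t_true = np.array([50., 40., 30.])

# P = K [R | t]
P = np.dot(K_true, np.hstack((R_true, t_true.reshape(3, 1))))

# factor the first 3*3 part and fix the signs
K, R = rq(P[:, :3])
T = np.diag(np.sign(np.diag(K)))
K = np.dot(K, T)
R = np.dot(T, R)

# last column of P is K t, so solve for t
t = np.linalg.solve(K, P[:, 3])

print(np.allclose(K, K_true), np.allclose(R, R_true), np.allclose(t, t_true))
```

With the diagonal of K forced positive the factorization of an invertible 3*3 matrix is unique, which is why the original K and R come back.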

Wednesday, March 9, 2011

Multidimensional meshgrid with Python

Looking for an easy way to plot cubes in 3D, I found NumPy's mgrid, which can do meshgrid in any number of dimensions. The syntax is a bit confusing but once you understand the trick, it works.

Here's a simple example that creates points for a unit cube and plots the points.

from pylab import *
from numpy import *
from mpl_toolkits.mplot3d import axes3d

# create 3D points
x,y,z = mgrid[0:2,0:2,0:2]
xx = x.flatten()
yy = y.flatten()
zz = z.flatten()

# plot 3D points
fig = figure()
ax = fig.add_subplot(111, projection='3d') # fig.gca(projection='3d') no longer works in recent Matplotlib
ax.plot(xx,yy,zz,'o')

show()
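
The "trick" with mgrid is that you index it with slices instead of calling it. A short sketch of what comes back (the variable names are just for illustration): each slice gives one axis, an integer step works like arange, and a complex step such as 3j means "this many points, end point included".

```python
import numpy as np

# one array per dimension, each with shape (2, 2, 2)
x, y, z = np.mgrid[0:2, 0:2, 0:2]

# stack the flattened coordinates into an array with one 3D point per row
points = np.vstack((x.flatten(), y.flatten(), z.flatten())).T
print(points.shape)  # (8, 3), the corners of the unit cube

# complex step: 3 points from 0 to 1 along each axis, end point included
u, v = np.mgrid[0:1:3j, 0:1:3j]
print(u[:, 0])  # values along the first axis
print(v[0, :])  # values along the second axis
```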