Shape and Motion from Image Streams using Factorization Method

Erika Chuang and Ulises Robles-Mellin


 

Review of Factorization Method


After tracking $P$ feature points over $F$ frames, we form a measurement matrix $W$ of size $2F \times P$. It is composed of two submatrices: $U$ ($F \times P$), which contains the horizontal feature coordinates, and $V$ ($F \times P$), which contains the vertical feature coordinates, i.e.,

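$$
W = \begin{bmatrix} U \\ V \end{bmatrix} =
\begin{bmatrix}
u_{11} & u_{12} & u_{13} & \cdots & u_{1P} \\
u_{21} & u_{22} & u_{23} & \cdots & u_{2P} \\
\vdots &        &        &        & \vdots \\
u_{F1} & u_{F2} & u_{F3} & \cdots & u_{FP} \\
v_{11} & v_{12} & v_{13} & \cdots & v_{1P} \\
\vdots &        &        &        & \vdots \\
v_{F1} & v_{F2} & v_{F3} & \cdots & v_{FP}
\end{bmatrix}
$$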



where $(u_{ij}, v_{ij})$ are the x and y coordinates of the j-th feature point in the i-th frame. Under the orthographic assumption, the measurement matrix has rank at most 3 once the centroid $T$ (the mean of each row of $W$, i.e., the centroid of the features in each frame) is subtracted from it, giving $W' = W - T$ [1]. Using singular value decomposition (SVD), this normalized matrix can be factored as:

$$W' = R \, D \, S^{T} \tag{1}$$
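The steps up to equation (1) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the synthetic tracks, the random orthographic cameras, and the helper names `measurement_matrix` and `register` are all assumptions made for the example.

```python
import numpy as np

def measurement_matrix(u, v):
    """Stack the F x P horizontal and vertical coordinates into the 2F x P matrix W."""
    return np.vstack([u, v])

def register(W):
    """Subtract the mean of each row, i.e. the per-frame centroid of the features."""
    T = W.mean(axis=1, keepdims=True)
    return W - T, T

# Synthetic, noise-free tracks (an assumption for illustration only):
# P points on a rigid object, seen by F orthographic cameras.
rng = np.random.default_rng(0)
F, P = 6, 10
points = rng.normal(size=(3, P))
u = np.empty((F, P))
v = np.empty((F, P))
for f in range(F):
    # QR of a random matrix gives an orthogonal matrix; its first two rows
    # serve as the camera's i and j axes in frame f.
    axes, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    shift = rng.normal(size=2)            # arbitrary image-plane translation
    u[f] = axes[0] @ points + shift[0]
    v[f] = axes[1] @ points + shift[1]

W = measurement_matrix(u, v)
W_reg, T = register(W)

# Equation (1): W' = R D S^T (NumPy returns S^T directly).
R, d, St = np.linalg.svd(W_reg, full_matrices=False)

# Count singular values that are not numerically zero.
rank = int(np.sum(d > 1e-9 * d[0]))
print("shape of W:", W.shape, "numerical rank of W':", rank)
```

With real, noisy tracks the trailing singular values are small rather than exactly zero, so the relative threshold used here would have to be chosen with the noise level in mind.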


 


Because of noise in the measurements, $W'$ is rarely exactly rank 3, so the matrix $D$ normally has more than 3 nonzero diagonal entries. We therefore keep only the 3 largest and approximate the matrix as:
 


$$W' \approx R' \, D' \, {S'}^{T} = M A \tag{2}$$


 


where $R'$ consists of the first 3 columns of $R$, $D'$ is the diagonal matrix holding the 3 largest singular values, and $S'$ consists of the first 3 columns of $S$. We now have $W' = M A$. $M$ is a $2F \times 3$ matrix that represents the camera motion. Its first F rows are $[m_1 \; m_2 \; \cdots \; m_F]^{T}$, where the vector $m_f$ is the orientation of the camera's i axis in the f-th frame; its next F rows are $[n_1 \; n_2 \; \cdots \; n_F]^{T}$, where the vector $n_f$ is the orientation of the camera's j axis in the f-th frame. $A$ is a $3 \times P$ matrix that represents the object shape; its p-th column is the location of the p-th feature point in object coordinates. This factorization is not unique. For any invertible $3 \times 3$ matrix $Q$:

$$W' = M A = (R' Q)\,(Q^{-1} D' {S'}^{T})$$


 


where $M$ is the true rotation matrix of the camera motion, and $A$ is the object shape. Let $i_f$ and $j_f$ be the rows of $R'$ for the i axis and j axis in the f-th frame before the transformation, and let $m_f$ and $n_f$ be the corresponding rows of $M$. Since $m_f$ and $n_f$ represent the camera axes, they must be of unit length and orthogonal to each other:

$$|m_f|^2 = i_f^{T} \, Q \, Q^{T} \, i_f = 1 \tag{3}$$

$$|n_f|^2 = j_f^{T} \, Q \, Q^{T} \, j_f = 1 \tag{4}$$

$$m_f \cdot n_f = i_f^{T} \, Q \, Q^{T} \, j_f = 0 \tag{5}$$



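Since $G = Q Q^{T}$ is symmetric, it has only 6 unknowns, and equations (3)-(5) are linear in them. The resulting 3F equations can be solved efficiently for $G$, for example by linear least squares, and $Q$ can then be recovered from $G$ provided $G$ is positive definite [1][2][4].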
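The whole procedure, from the rank-3 truncation in (2) to the constraints (3)-(5), can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: the function names `sym_row` and `factorize` are hypothetical, the test data are synthetic and noise-free, $D'$ is folded entirely into the shape factor, and $Q$ is taken to be the Cholesky factor of $G$, which is one convenient choice among several.

```python
import numpy as np

def sym_row(a, b):
    """Coefficients of a^T G b in the unknowns (g11, g12, g13, g22, g23, g33)."""
    return np.array([
        a[0] * b[0],
        a[0] * b[1] + a[1] * b[0],
        a[0] * b[2] + a[2] * b[0],
        a[1] * b[1],
        a[1] * b[2] + a[2] * b[1],
        a[2] * b[2],
    ])

def factorize(W_reg):
    """Factor a registered 2F x P measurement matrix into motion M and shape A."""
    F = W_reg.shape[0] // 2
    R, d, St = np.linalg.svd(W_reg, full_matrices=False)
    R3, D3, S3t = R[:, :3], np.diag(d[:3]), St[:3]   # rank-3 truncation, eq. (2)

    I, J = R3[:F], R3[F:]                            # rows i_f and j_f of R'
    rows = [sym_row(I[f], I[f]) for f in range(F)]   # eq. (3)
    rows += [sym_row(J[f], J[f]) for f in range(F)]  # eq. (4)
    rows += [sym_row(I[f], J[f]) for f in range(F)]  # eq. (5)
    rhs = np.concatenate([np.ones(2 * F), np.zeros(F)])

    # Linear least squares for the 6 unknown entries of the symmetric G.
    g, *_ = np.linalg.lstsq(np.array(rows), rhs, rcond=None)
    G = np.array([[g[0], g[1], g[2]],
                  [g[1], g[3], g[4]],
                  [g[2], g[4], g[5]]])

    # G = Q Q^T; cholesky raises LinAlgError if G is not positive definite.
    Q = np.linalg.cholesky(G)
    M = R3 @ Q                                       # motion, rows m_f and n_f
    A = np.linalg.solve(Q, D3 @ S3t)                 # shape, Q^{-1} D' S'^T
    return M, A

# Synthetic, noise-free example (an assumption for illustration only).
rng = np.random.default_rng(1)
F, P = 8, 12
shape = rng.normal(size=(3, P))
shape -= shape.mean(axis=1, keepdims=True)           # centroid at the origin
cams = [np.linalg.qr(rng.normal(size=(3, 3)))[0] for _ in range(F)]
W_reg = np.vstack([np.array([c[0] for c in cams]) @ shape,
                   np.array([c[1] for c in cams]) @ shape])

M, A = factorize(W_reg)
print("max reconstruction error:", np.abs(M @ A - W_reg).max())
```

The recovered $M$ and $A$ agree with the true motion and shape only up to a common 3 x 3 orthogonal transformation, because any orthogonal matrix can be inserted between $Q$ and $Q^{-1}$ without changing $G$. With noisy tracks $G$ may fail to be positive definite, in which case the Cholesky step in this sketch raises an error.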
 




Erika Chuang and Ulises Robles-Mellin
Last modified: Tue. Mar 14, 2000