Shape and Motion from Image Streams using Factorization Method

Erika Chuang and Ulises Robles-Mellin

Review of the Factorization Method

After tracking P feature points over F frames, we build a measurement matrix W of size 2F x P. It is composed of two submatrices: U_{F x P}, which contains the horizontal feature coordinates, and V_{F x P}, which contains the vertical feature coordinates, i.e.,

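\[
W = \begin{bmatrix}
u_{11} & u_{12} & u_{13} & \cdots & u_{1P} \\
u_{21} & u_{22} & u_{23} & \cdots & u_{2P} \\
\vdots & \vdots & \vdots &        & \vdots \\
u_{F1} & u_{F2} & u_{F3} & \cdots & u_{FP} \\
v_{11} & v_{12} & v_{13} & \cdots & v_{1P} \\
\vdots & \vdots & \vdots &        & \vdots \\
v_{F1} & v_{F2} & v_{F3} & \cdots & v_{FP}
\end{bmatrix}
\]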

where (u_ij, v_ij) are the x and y coordinates of the j-th feature point in the i-th frame. Under the orthographic projection assumption, this measurement matrix has rank at most 3 once the centroid T of the feature points in each frame has been subtracted (i.e., W' = W - T) [1]. Using singular value decomposition (SVD), this registered matrix can be factored as:

\[ W' = R \, D \, S^{T} \tag{1} \]
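The construction of W, the centroid subtraction, and the SVD in equation (1) can be sketched with NumPy as follows. This is only an illustrative sketch: the synthetic scene, the random orthographic cameras, and the helper names `measurement_matrix` and `register` are assumptions made for this example and are not part of the original project code.

```python
import numpy as np

def measurement_matrix(u, v):
    """Stack the F x P horizontal and vertical coordinates into the 2F x P matrix W."""
    return np.vstack([u, v])

def register(W):
    """Subtract each row's mean (the per-frame centroid T) to obtain W' = W - T."""
    T = W.mean(axis=1, keepdims=True)
    return W - T, T

# Synthetic, noise-free data (an assumption for illustration): P random 3-D
# points observed by F orthographic cameras with random orientations and
# random 2-D image offsets.
rng = np.random.default_rng(0)
F, P = 6, 20
points = rng.normal(size=(3, P))
u = np.empty((F, P))
v = np.empty((F, P))
for f in range(F):
    # QR of a random matrix yields orthonormal columns; the first two
    # columns play the role of the camera's i and j axes in frame f.
    Qf, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    offset = rng.normal(size=2)
    u[f] = Qf[:, 0] @ points + offset[0]
    v[f] = Qf[:, 1] @ points + offset[1]

W = measurement_matrix(u, v)
W_reg, T = register(W)

# Equation (1): W' = R D S^T. NumPy returns the singular values as a vector d
# and returns S^T (not S) as the third output.
R, d, St = np.linalg.svd(W_reg, full_matrices=False)
print(np.round(d, 6))
```

With noise-free orthographic data, only the first three printed singular values should be noticeably different from zero, which is the rank-3 property the text relies on.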

Because of measurement noise, W' is rarely exactly rank 3, so the D matrix normally has more than 3 nonzero diagonal entries. We then approximate the matrix as:

\[ W' = R' \, D' \, {S'}^{T} = M A \tag{2} \]

where R' consists of the first 3 columns of R, D' is the 3 x 3 diagonal matrix holding the 3 largest singular values, and S' consists of the first 3 columns of S. We now have W' = M * A. M is a 2F x 3 matrix that represents the camera motion. Its first F rows are of the form [i₁ i₂ i₃ ... i_F]^T, where the vector i_f is the orientation of the camera's i axis in the f-th frame; the next F rows are of the form [j₁ j₂ ... j_F]^T, where the vector j_f is the orientation of the camera's j axis in the f-th frame. A is a 3 x P matrix that represents the object shape; its p-th column is the location of the p-th feature point in object coordinates. This factorization is not unique. For any invertible 3 x 3 matrix Q:

\[ W' = M A = (R' Q) \, (Q^{-1} D' {S'}^{T}) \]

where M = R' Q is the true rotation matrix of the camera motion, and A is the object shape. Let i_f and j_f be the rows of matrix R' for the i axis and j axis in the f-th frame before the transformation, and let m_f and n_f be the corresponding rows of M. Since m_f and n_f represent the camera axes, they must be orthonormal:

\[ |m_f|^2 = i_f \, Q Q^{T} \, i_f^{T} = 1 \tag{3} \]

\[ |n_f|^2 = j_f \, Q Q^{T} \, j_f^{T} = 1 \tag{4} \]

\[ m_f \cdot n_f = i_f \, Q Q^{T} \, j_f^{T} = 0 \tag{5} \]

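Since G = Q * Q^T is symmetric, it has only 6 unknowns. Equations (3)-(5) are linear in these unknowns and give 3F constraints over the F frames, so G can be solved for efficiently; Q can then be recovered from G provided G is positive definite [1][2][4].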
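One possible way to set up and solve constraints (3)-(5) is sketched below. It assumes a linear least-squares solve for the 6 entries of G followed by a Cholesky factorization to obtain Q; the source does not say which solver or factorization was used, so both choices are assumptions, as are the helper names `sym_row` and `metric_upgrade` and the synthetic test data.

```python
import numpy as np

def sym_row(a, b):
    """Coefficients of a G b^T in the unknowns (g11, g12, g13, g22, g23, g33)."""
    return np.array([
        a[0] * b[0],
        a[0] * b[1] + a[1] * b[0],
        a[0] * b[2] + a[2] * b[0],
        a[1] * b[1],
        a[1] * b[2] + a[2] * b[1],
        a[2] * b[2],
    ])

def metric_upgrade(R3):
    """Solve (3)-(5) for G = Q Q^T by least squares, then factor G.

    R3 is the 2F x 3 matrix R': rows 0..F-1 are the i_f, rows F..2F-1 the j_f.
    """
    F = R3.shape[0] // 2
    I, J = R3[:F], R3[F:]
    rows, rhs = [], []
    for f in range(F):
        rows.append(sym_row(I[f], I[f]))  # equation (3)
        rhs.append(1.0)
        rows.append(sym_row(J[f], J[f]))  # equation (4)
        rhs.append(1.0)
        rows.append(sym_row(I[f], J[f]))  # equation (5)
        rhs.append(0.0)
    g, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    G = np.array([[g[0], g[1], g[2]],
                  [g[1], g[3], g[4]],
                  [g[2], g[4], g[5]]])
    # Cholesky is an assumed choice: it returns a lower-triangular Q with
    # Q Q^T = G and raises LinAlgError if G is not positive definite.
    Q = np.linalg.cholesky(G)
    return G, Q

# Synthetic, noise-free registered measurements (an assumption for illustration).
rng = np.random.default_rng(1)
F, P = 6, 20
M_true = np.empty((2 * F, 3))
for f in range(F):
    Qf, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    M_true[f], M_true[F + f] = Qf[:, 0], Qf[:, 1]
W_reg = M_true @ rng.normal(size=(3, P))

# Rank-3 truncation as in equation (2).
R, d, St = np.linalg.svd(W_reg, full_matrices=False)
R3, D3, S3t = R[:, :3], np.diag(d[:3]), St[:3]

G, Q = metric_upgrade(R3)
M = R3 @ Q                        # motion: rows m_f and n_f
A = np.linalg.solve(Q, D3 @ S3t)  # shape: Q^{-1} D' S'^T
```

Because G determines Q only up to a rotation, the recovered M and A agree with the true motion and shape up to a common rotation of the object coordinate frame. With noisy measurements the least-squares G may fail to be positive definite, in which case the Cholesky step in this sketch raises an error.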


Last modified: Tue. Mar 14, 2000