After tracking P feature points over F frames, we form a measurement matrix W of size 2F x P. It is composed of two submatrices: U (F x P), which contains the horizontal feature coordinates, and V (F x P), which contains the vertical feature coordinates, i.e.,
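$$
W = \begin{bmatrix} U \\ V \end{bmatrix} =
\begin{bmatrix}
u_{11} & u_{12} & u_{13} & \cdots & u_{1P} \\
u_{21} & u_{22} & u_{23} & \cdots & u_{2P} \\
\vdots & \vdots & \vdots &        & \vdots \\
u_{F1} & u_{F2} & u_{F3} & \cdots & u_{FP} \\
v_{11} & v_{12} & v_{13} & \cdots & v_{1P} \\
\vdots & \vdots & \vdots &        & \vdots \\
v_{F1} & v_{F2} & v_{F3} & \cdots & v_{FP}
\end{bmatrix}
$$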
where (uij, vij) are the x and y coordinates of the j-th feature point in the i-th frame. Under the orthographic projection assumption, this measurement matrix has rank at most 3 once the centroid T has been subtracted from it (i.e., W' = W - T) [1]. Using singular value decomposition (SVD), this registered matrix can be factored as:
$$ W' = R \, D \, S^T \qquad (1) $$
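The steps up to equation (1) can be sketched in NumPy. This is only an illustrative sketch: the function names `build_measurement_matrix` and `register` are hypothetical, the data is synthetic and noise-free, and T is assumed to hold the mean of each row of W (the per-frame centroid of the tracked features), as in the usual formulation of the factorization method.

```python
import numpy as np

def build_measurement_matrix(u, v):
    """Stack F x P horizontal (u) and vertical (v) coordinates into a 2F x P matrix."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if u.ndim != 2 or u.shape != v.shape:
        raise ValueError("u and v must both be F x P arrays")
    return np.vstack([u, v])

def register(W):
    """Subtract from every row its mean over the P points (assumed centroid term T)."""
    return W - W.mean(axis=1, keepdims=True)

# Synthetic, noise-free orthographic data (all values here are made up).
rng = np.random.default_rng(0)
F, P = 5, 12
points = rng.standard_normal((3, P))                    # 3D feature points
axes_i, axes_j = [], []
for _ in range(F):
    rot, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
    axes_i.append(rot[0])                               # camera i axis in this frame
    axes_j.append(rot[1])                               # camera j axis in this frame
# Project the points and add an arbitrary image-plane translation per frame.
u = np.array(axes_i) @ points + rng.standard_normal((F, 1))
v = np.array(axes_j) @ points + rng.standard_normal((F, 1))

W = build_measurement_matrix(u, v)                      # 2F x P
W_reg = register(W)                                     # W' = W - T
R, d, St = np.linalg.svd(W_reg, full_matrices=False)    # W' = R @ diag(d) @ St

print(W.shape)           # (10, 12)
print(np.round(d, 6))    # singular values, largest first
```

With noise-free input, only the first three singular values in `d` should be non-negligible, which is the rank-3 property stated above.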
Because of noise in the measurements, W' is rarely exactly of rank 3, so the D matrix normally has more than 3 nonzero diagonal entries. We therefore approximate the matrix as:
$$ W' \approx R' \, D' \, {S'}^T = M A \qquad (2) $$
where R' consists of the first 3 columns of R, D' is a diagonal matrix holding the 3 largest singular values, and S' consists of the first 3 columns of S.

M is a 2F x 3 matrix that represents the camera motion. Its first F rows are of the form [i1 i2 i3 ... iF]t, where the vector if is the orientation of the i axis of the camera in the f-th frame; the next F rows are of the form [j1 j2 ... jF]t, where the vector jf is the orientation of the j axis of the camera in the f-th frame. A is a 3 x P matrix that represents the object shape; its p-th column is the location of the p-th feature point in the object coordinate system.

This factorization is not unique. For any invertible 3 x 3 matrix Q:
$$ W' \approx M A = (R' \, Q)\,(Q^{-1} D' \, {S'}^T) $$
where, for the correct choice of Q, M = R' Q is the true rotation matrix of the camera motion and A is the object shape. Let if and jf be the rows of R' for the i axis and j axis in the f-th frame before the transformation, and let mf = if Q and nf = jf Q be the corresponding rows of M. Since mf and nf represent the camera axes, they must be orthonormal, i.e., they must satisfy the following:
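$$
\begin{aligned}
|m_f|^2 &= i_f \, Q \, Q^T \, i_f^T = 1 \qquad (3) \\
|n_f|^2 &= j_f \, Q \, Q^T \, j_f^T = 1 \qquad (4) \\
m_f \cdot n_f &= i_f \, Q \, Q^T \, j_f^T = 0 \qquad (5)
\end{aligned}
$$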
Since G = Q * Q^T is symmetric, it has only 6 unknown entries, and equations (3)-(5) are linear in them. This system can be solved efficiently, and Q can be recovered from G provided that G is positive definite [1][2][4].
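One possible way to solve this system is sketched below in NumPy. It is an assumption on our part rather than the procedure of [1][2][4]: the 3F constraints (3)-(5) are stacked as linear equations in the 6 entries of G and solved by least squares, and Q is then taken from the eigendecomposition of G. The names `_coeffs` and `solve_metric_transform`, the test matrix `Q0`, and the synthetic camera axes are all hypothetical.

```python
import numpy as np

def _coeffs(a, b):
    """Coefficients of [G11, G12, G13, G22, G23, G33] in the product a @ G @ b."""
    return np.array([
        a[0] * b[0],
        a[0] * b[1] + a[1] * b[0],
        a[0] * b[2] + a[2] * b[0],
        a[1] * b[1],
        a[1] * b[2] + a[2] * b[1],
        a[2] * b[2],
    ])

def solve_metric_transform(R3):
    """Estimate Q from the 2F x 3 matrix R' using constraints (3)-(5)."""
    F = R3.shape[0] // 2
    rows, rhs = [], []
    for i_f, j_f in zip(R3[:F], R3[F:]):
        rows.append(_coeffs(i_f, i_f)); rhs.append(1.0)   # eq. (3)
        rows.append(_coeffs(j_f, j_f)); rhs.append(1.0)   # eq. (4)
        rows.append(_coeffs(i_f, j_f)); rhs.append(0.0)   # eq. (5)
    g, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    G = np.array([[g[0], g[1], g[2]],
                  [g[1], g[3], g[4]],
                  [g[2], g[4], g[5]]])
    w, V = np.linalg.eigh(G)                 # G = V @ diag(w) @ V.T
    if w.min() <= 0:
        raise ValueError("G is not positive definite; no real Q exists")
    return V * np.sqrt(w)                    # Q with Q @ Q.T == G

# Synthetic check: distort orthonormal camera axes by a known Q0, then undo it.
rng = np.random.default_rng(0)
F = 6
axes_i, axes_j = [], []
for _ in range(F):
    rot, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal matrix
    axes_i.append(rot[0])
    axes_j.append(rot[1])
M_true = np.array(axes_i + axes_j)           # 2F x 3, rows m_f then n_f
Q0 = np.array([[2.0, 0.3, 0.0],
               [0.1, 1.0, 0.4],
               [0.0, 0.2, 0.5]])             # made-up invertible distortion
R3 = M_true @ np.linalg.inv(Q0)              # plays the role of R'

Q = solve_metric_transform(R3)
M = R3 @ Q                                   # rows now satisfy (3)-(5)
print(np.round(np.sum(M * M, axis=1), 6))    # squared row norms
```

Note that G fixes Q only up to a rotation (Q and Q times any rotation give the same G), so the recovered M matches the true motion only up to a global rotation of the coordinate frame.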