A fundamental question in computer vision is "Given that two cameras, A and B, are looking at the same scene, what is the orientation of B relative to A?" The essential matrix gives us a way to determine this relative orientation, given enough corresponding points in each camera. This is due to the following relation between points y in camera A and the corresponding points y' in camera B: \begin{equation} (y')^TEy=0 \end{equation}
E is the essential matrix. It can be decomposed into a rotation R and a translation t, which together give us our relative orientation, but the translation is written in an odd way:
$$E = R[t]_x$$ where \([t]_x\) is the matrix form of the cross product with t, so that \([t]_x v = t \times v\) for any vector v. So weird.
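To make the cross-product matrix less mysterious, here is a minimal sketch in plain Python (no third-party libraries). The helper names `skew`, `cross`, and `matvec` and the example vectors are illustrative assumptions, not part of the original article. It builds the skew-symmetric matrix for t and checks that multiplying by it gives the same result as the ordinary cross product.

```python
def skew(t):
    """Return the 3x3 matrix [t]_x such that [t]_x v equals t x v."""
    tx, ty, tz = t
    return [
        [0.0, -tz, ty],
        [tz, 0.0, -tx],
        [-ty, tx, 0.0],
    ]


def cross(a, b):
    """Ordinary cross product a x b of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def matvec(m, v):
    """Multiply a 3x3 matrix (given as a list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


t = [1.0, 2.0, 3.0]  # arbitrary example translation (assumption)
v = [4.0, 5.0, 6.0]  # arbitrary example vector (assumption)

via_matrix = matvec(skew(t), v)  # [t]_x v
via_cross = cross(t, v)          # t x v

print(via_matrix)
print(via_cross)
```

Both printed vectors should agree. Writing the cross product as a matrix is what lets the translation sit next to R as an ordinary matrix factor.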
It turns out this equation has a simple geometric origin. Consider a single point in space P seen by two cameras, A and B. The lines AP, BP, and AB form a plane as seen in the figure below.
Taking the cross product of any two of the vectors \(\overline{AP}\), \(\overline{BP}\), and \(\overline{AB}\) results in a vector perpendicular to the plane. Taking the inner product (dot product) of this resultant vector \(\overline{V}\) with any vector which lies in the plane gives 0, for example: $$\overline{AP}^T (\overline{AB} \times \overline{BP}) = 0$$ This constraint is the basis of equation (1).
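The coplanarity argument can be checked numerically. The sketch below uses plain Python with made-up positions for A, B, and P (all values are assumptions chosen for illustration). It forms a normal to the plane from the cross product of AB and BP, then takes its dot product with each of the three in-plane vectors.

```python
def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return [a[i] - b[i] for i in range(3)]


def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(a[i] * b[i] for i in range(3))


# Hypothetical camera centres and scene point
A = [0.0, 0.0, 0.0]
B = [2.0, 0.0, 0.5]
P = [1.0, 3.0, 7.0]

AP = sub(P, A)
BP = sub(P, B)
AB = sub(B, A)

# A vector perpendicular to the plane through A, B, and P
normal = cross(AB, BP)

# Each in-plane vector should be perpendicular to the normal
residuals = [dot(AP, normal), dot(BP, normal), dot(AB, normal)]
print(residuals)
```

All three residuals should be zero up to floating-point rounding, whichever positions are chosen for A, B, and P, as long as the three points are not collinear (in which case the normal itself is the zero vector).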
Deriving the essential matrix
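Since we are seeking the pose of camera B relative to camera A, we can choose our origin to be aligned with camera A, so that A sits at (0,0,0) and looks down the positive z-axis. Write \(P_1\) for the point P expressed in these coordinates. Then \(\overline{AP} = P_1 - A = P_1 - 0 = P_1\) and \(\overline{BP} = P_1 - B\). Let's also write \(t = \overline{AB} = B - A = B\) for the translation from A to B. The coplanarity constraint \(\overline{AP}^T(\overline{AB}\times\overline{BP})=0\) then reads: $$P_1^T(t\times(P_1-B))=0$$
To transform the point from camera A's coordinates \(P_1\) to camera B's coordinates \(P_2\), we shift by t and then rotate by R: $$P_2=R(P_1-t)$$ Solving for \(P_1\): $$P_1=R^{-1}P_2+t$$ Plugging this into the \(P_1\) inside the cross product: $$P_1^T(t\times(R^{-1}P_2+t-B))=0$$ Since \(t = B\), the term \(t-B\) vanishes: $$P_1^T(t\times(R^{-1}P_2))=0$$
Now is a useful time for the matrix form of the cross product, \(t \times v = [t]_x v\): $$P_1^T[t]_xR^{-1}P_2=0$$ Transposing the left-hand side gives \(P_2^T(R^{-1})^T[t]_x^TP_1=0\). Since R is a rotation, \((R^{-1})^T=R\), and since \([t]_x\) is skew-symmetric, \([t]_x^T=-[t]_x\). The overall minus sign can be dropped because the right-hand side is 0: $${P_2}^TR[t]_xP_1 = 0$$ Plugging back in our definition of E: $${P_2}^TEP_1 = 0$$
The only difference between this equation and equation (1) is that equation (1) uses normalized image coordinates, \(y'=P_2/z_2\) and \(y=P_1/z_1\), where \(z_1\) and \(z_2\) are the depths of the point in each camera. Dividing through by the nonzero scalar \(z_1z_2\) gives: $$ {y'}^TEy = 0 $$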
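The derivation can be sanity-checked numerically. The sketch below, in plain Python, assumes a hypothetical pose (a 30 degree rotation about the y-axis and an arbitrary translation) and an arbitrary scene point. None of these values come from the article. It builds \(E = R[t]_x\), maps the point into camera B with \(P_2 = R(P_1 - t)\), and evaluates the constraint in both 3D and normalized image coordinates.

```python
import math


def skew(t):
    """Matrix form [t]_x of the cross product with t."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(a[i] * b[i] for i in range(3))


# Hypothetical relative pose of camera B (assumed values)
theta = math.radians(30.0)
c, s = math.cos(theta), math.sin(theta)
R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]  # rotation about the y-axis
t = [1.0, 0.2, -0.3]                               # position of B in A's frame

E = matmul(R, skew(t))  # E = R [t]_x

# Hypothetical scene point in camera A's coordinates, then in camera B's
P1 = [0.5, -0.4, 5.0]
P2 = matvec(R, [P1[i] - t[i] for i in range(3)])  # P2 = R (P1 - t)

# Constraint on the 3D points: P2^T E P1
residual_3d = dot(P2, matvec(E, P1))

# Constraint on normalized image coordinates: y'^T E y
y = [p / P1[2] for p in P1]
y_prime = [p / P2[2] for p in P2]
residual_normalized = dot(y_prime, matvec(E, y))

print(residual_3d, residual_normalized)
```

Both residuals should be zero up to floating-point rounding. Replacing `y_prime` with the projection of a different scene point will generally give a nonzero value, which is what makes the constraint useful for recovering E from point correspondences.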
We can now come up with a constraint which is the basis of equation (1). If we take the cross product of any two of the vectors AP, BP, and AB, we get a vector perpendicular to the plane. Taking the dot product of this perpendicular vector with any of the three vectors gives 0.
If we follow this through we will arrive at the desired constraint on the essential matrix.
This scene is composed of two cameras. Select corresponding points in the two cameras by clicking or pressing on the scene, and then click calculate to determine the relative camera orientation.