In this article, I explain, step by step, how to perform camera calibration with MATLAB code. I wrote this code while taking a Computer Vision course, to better understand the lessons on the intrinsic and extrinsic calibration parameters of a camera. With this code, one can even calibrate a mobile camera.
Book
I followed the book ‘Computer Vision: A Modern Approach (1st Edition)’ by Forsyth & Ponce to learn about the various intrinsic and extrinsic camera parameters. You can visit my article ‘Computer Vision Resources for a Beginner‘ to find the best resources and books for learning Computer Vision.
Perspective Projection Matrix
It is a matrix that contains all the intrinsic and extrinsic parameters, and (up to scale) it has the following form:

$$\mathcal{M} = \begin{pmatrix} \alpha\,\mathbf{r}_1^T - \alpha\cot\theta\,\mathbf{r}_2^T + u_0\,\mathbf{r}_3^T & \alpha\,t_x - \alpha\cot\theta\,t_y + u_0\,t_z \\ \dfrac{\beta}{\sin\theta}\,\mathbf{r}_2^T + v_0\,\mathbf{r}_3^T & \dfrac{\beta}{\sin\theta}\,t_y + v_0\,t_z \\ \mathbf{r}_3^T & t_z \end{pmatrix}$$

where $(\alpha,\beta)$ are the magnifications in pixel units, $\theta$ is the skew angle between the image axes, $(u_0, v_0)$ is the position (in pixel units) of the principal point, $(\mathbf{r}_1^T, \mathbf{r}_2^T, \mathbf{r}_3^T)$ are the three rows of the rotation matrix, and $(t_x, t_y, t_z)$ are the coordinates of the translation vector. If $\mathbf{P}$ is the homogeneous coordinate vector of a point in the world coordinate system, then the corresponding homogeneous image point is
$\mathbf{p} = \mathcal{M} \mathbf{P}.$
We have to convert the homogeneous coordinates to non-homogeneous form to obtain the image point in pixels. If $\mathbf{m}_1^T$, $\mathbf{m}_2^T$ and $\mathbf{m}_3^T$ are the rows of the matrix $\mathcal{M}$, then the image point is
\begin{align*}
u &= \frac{\mathbf{m}_1^T \cdot \mathbf{P}}{\mathbf{m}_3^T \cdot \mathbf{P}}, \\
v &= \frac{\mathbf{m}_2^T \cdot \mathbf{P}}{\mathbf{m}_3^T \cdot \mathbf{P}}.
\end{align*}
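As a quick sketch of this projection step, the pixel coordinates can be computed in MATLAB as follows (the matrix M and the world point P here are placeholders for illustration, not values from the article):

```matlab
% hypothetical 3x4 projection matrix and homogeneous world point
M = [1 0 0 0; 0 1 0 0; 0 0 1 0]; % placeholder projection matrix
P = [10; 20; 5; 1];              % homogeneous world point (X, Y, Z, 1)
p = M*P;                         % homogeneous image point
u = p(1)/p(3);                   % pixel coordinates after perspective divide
v = p(2)/p(3);
```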
Step – 1: Points for Correspondence
Since there are 5 intrinsic parameters and 6 extrinsic parameters (11 unknowns in total), and each correspondence yields 2 equations, we require a minimum of 6 point correspondences to evaluate the perspective projection matrix. We can’t choose all of these points to lie on a single line or plane, as such configurations do not determine the projection matrix uniquely. In the absence of a calibration rig, I used a corner of a room and took a photo of it using my mobile camera. Note that the encircled points in the photo lie on three different planes, with no more than 3 points on any single plane.

I then established point correspondences between world coordinates and image coordinates. The unit of the world coordinates does not matter as long as it is used consistently.
N = 6; % number of points for correspondence
% each column is one coordinate
WldPts = zeros(3,N);
Img1Pts = zeros(2,N);
% points as per my camera calibration
% matrix for world coordinates. Each point will be a column vector
WldPts(1,:) = [58.6, 58.6, 0, 0, 0, 58.6]; % all x coordinates are mentioned
WldPts(2,:) = [0, 0, 0, 58.6, 58.6, 58.6]; % all y coordinates are mentioned
WldPts(3,:) = [0, 10, 10, 10, 0, 0];       % all z coordinates are mentioned
% corresponding points in image 1
Img1Pts(1,:) = [212, 126, 1618, 2979, 2899, 1265];   % in pixels: all x coordinates are mentioned
Img1Pts(2,:) = [2177, 1913, 1372, 2071, 2344, 3518]; % in pixels: all y coordinates are mentioned
Step – 2: Evaluating Perspective Projection Matrix
The second step is to convert the projection equations into a system of homogeneous linear equations in the entries of the projection matrix. If $n$ points are chosen for correspondence, we get the following system (refer to Eq. 5.3.1 in the book):

$$\mathcal{P}\,\mathbf{m} = \mathbf{0}, \qquad \mathcal{P} = \begin{pmatrix} \mathbf{P}_1^T & \mathbf{0}^T & -u_1\,\mathbf{P}_1^T \\ \mathbf{0}^T & \mathbf{P}_1^T & -v_1\,\mathbf{P}_1^T \\ \vdots & \vdots & \vdots \\ \mathbf{P}_n^T & \mathbf{0}^T & -u_n\,\mathbf{P}_n^T \\ \mathbf{0}^T & \mathbf{P}_n^T & -v_n\,\mathbf{P}_n^T \end{pmatrix},$$

where each $\mathbf{P}_i$ is the homogeneous 4-vector of world coordinates of point $i$, and $\mathbf{m} = (\mathbf{m}_1^T, \mathbf{m}_2^T, \mathbf{m}_3^T)^T$ stacks the 12 entries of $\mathcal{M}$.
Next we perform a Singular Value Decomposition (SVD) of the matrix $\mathcal{P}^T\mathcal{P}$ and take the eigenvector corresponding to the smallest eigenvalue as the solution for $\mathbf{m}$ (equivalently, the right singular vector of $\mathcal{P}$ with the smallest singular value).
% build the matrix P of the homogeneous linear system
P = zeros(2*N,12);
for i = 1:N
    P(2*i-1,:) = [WldPts(:,i)', 1, 0, 0, 0, 0, -Img1Pts(1,i)*WldPts(:,i)', -Img1Pts(1,i)];
    P(2*i,:)   = [0, 0, 0, 0, WldPts(:,i)', 1, -Img1Pts(2,i)*WldPts(:,i)', -Img1Pts(2,i)];
end
[U,S,V] = svd(P'*P);
indx = 12;
eigval_min = S(indx,indx); % smallest eigenvalue of P'*P
m_vect = V(:,indx);        % corresponding eigenvector
% projection matrix M
M = zeros(3,4);
M(1,1:4) = m_vect(1:4);
M(2,1:4) = m_vect(5:8);
M(3,1:4) = m_vect(9:12);
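A quick sanity check on the estimated matrix is to reproject the world points and compare against the measured pixel coordinates. A minimal sketch, reusing N, WldPts, Img1Pts and M from above (the element-wise division relies on MATLAB implicit expansion, R2016b or later):

```matlab
% reproject the world points with the estimated M and measure pixel error
PtsH = [WldPts; ones(1,N)];             % homogeneous world points (4xN)
proj = M*PtsH;                          % homogeneous image points (3xN)
uvs  = proj(1:2,:) ./ proj(3,:);        % perspective divide by third row
err  = sqrt(sum((uvs - Img1Pts).^2,1)); % reprojection error per point (pixels)
disp(err);
```

With only 6 points the system is barely overdetermined, so these errors should be very small; larger errors usually indicate a mislabeled correspondence.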
Step – 3: Recovering Parameters
Next step is to recover intrinsic and extrinsic parameters from the perspective projection matrix. For that I used Eqs. (5.3.5) – (5.3.8) from the book.
a1 = M(1,1:3)';
a2 = M(2,1:3)';
a3 = M(3,1:3)';
% epsilon = +-1. With -1, a different set of values of rotation matrix
% and translation vector will be obtained.
epsilon = 1;
% value of epsilon = +-1 comes from the fact that if a^2 = 1 then
% a = +-1. A row of rotation matrix has unit magnitude.
prow = epsilon/ norm(a3);
r3 = prow*a3;
u0 = prow^2*(a1'*a3);
v0 = prow^2*(a2'*a3);
tmp1 = cross(a1,a3);
tmp2 = cross(a2,a3);
tmp3 = (tmp1'*tmp2)/(norm(tmp1)*norm(tmp2));
% as theta is between 0 to 90, cos(theta) is always positive
tmp3 = abs(tmp3);
theta = acosd(tmp3);
% sign of alpha & beta depends on sign of focal length of lens. Assuming
% positive focal length
alpha = prow^2*norm(tmp1)*sind(theta);
beta = prow^2*norm(tmp2)*sind(theta);
r1 = tmp2/norm(tmp2);
r2 = cross(r3,r1);
% intrinsic parameters matrix
K = [alpha, -alpha*cotd(theta),u0; ...
0, beta/sind(theta), v0; ...
0, 0, 1];
% rotation matrix
R = [r1';r2';r3'];
% translation vector
tz = M(3,4)*prow;
ty = (M(2,4)*prow - v0*tz)*sind(theta)/beta;
tx = (M(1,4)*prow-u0*tz+alpha*cotd(theta)*ty)/alpha;
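As a final consistency check on the recovery, R should be close to a proper rotation, and the decomposition should reassemble the projection matrix. A sketch using the quantities just computed (the book's relation is $\rho\,\mathcal{M} = \mathcal{K}\,(\mathcal{R}\ \mathbf{t})$, with $\rho$ = prow here):

```matlab
t = [tx; ty; tz];
% R should be approximately orthonormal with determinant +1
orth_err = norm(R*R' - eye(3)); % expect close to 0
det_R    = det(R);              % expect close to +1
% rebuilding the (scaled) projection matrix from K, R, t
M_rebuilt   = K*[R, t];
rebuild_err = norm(M_rebuilt - prow*M); % expect close to 0
```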
Verifying the Results
I obtained the following result for the image shown above:-
In order to cross-check the results, one has to observe the following:-
- The skew angle $\theta$ should be close to $90^{\circ}$ (but slightly less than it).
- The aspect ratio $\alpha/\beta$ should be close to one, though this also depends on the camera.
- The position of the principal point $(u_0, v_0)$ should be close to half the image dimensions.
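These checks can be read off directly from the recovered parameters. A short sketch (the image size imgW × imgH below is a placeholder, not a value from the article — substitute your photo's actual pixel dimensions):

```matlab
imgW = 3024; imgH = 4032;    % hypothetical image size in pixels
skew_angle   = theta;        % expect close to 90 degrees
aspect_ratio = alpha/beta;   % expect close to 1
pp_offset = [u0 - imgW/2, v0 - imgH/2]; % expect small relative to image size
```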
That is all I had to write about camera calibration. I hope this article has been of use to you. If so, you can donate to support this website!