1 Introduction

Uncertainty occurs in almost every element of a computer vision system. For example, sensor measurements contain noise, their calibration is affected by errors, etc. In order to achieve high accuracy, the uncertainty of each element should be taken into account. Only if the uncertainties are propagated correctly through the whole system can the central filter (e.g. a Kalman filter) of the system make use of them to calculate the optimal result within a statistical model.

In this paper, we focus on the uncertainty of the image (i.e. image noise) and the uncertainty of sub-pixel template feature matching. We propose an image noise model which is suitable for real-time processing. Furthermore, this noise model can be combined with our proposed sub-pixel matching algorithm to calculate the uncertainty of template matching without significant computational overhead. Concrete uncertainty calculation methods are presented for the SAD and NCC matchers; however, the approach can easily be ported to other template feature matching algorithms.

Feature extraction and feature matching are elementary parts of many computer vision applications, such as optical navigation systems (e.g. SLAM [4, 20, 24], IPS [10, 11]). In these applications, features are extracted from one image by a feature extractor (e.g. FAST, AGAST [23, 26, 32]). A feature matcher (e.g. SAD, NCC, KLT [12, 21, 28]) matches the features to another image. Image noise affects the matching result and therefore influences the performance of the whole system. Many studies focus on the problem of feature uncertainties [15, 16, 27, 31].

In [31] the authors present a framework to calculate the uncertainty of scale invariant features (SIFT) [19] and speeded up robust features (SURF) [3]. In that paper, the uncertainty of a feature depends on its scale and neighborhood. The experimental results show that the proposed method improves the performance of bundle adjustment. However, due to the complexity of calculating SIFT features, this method may not be usable for real-time applications, especially on mobile platforms.

In [16] the authors propose a method for incorporating the uncertainty of features into homography and fundamental matrix calculations. The paper shows that in most cases the results can be improved by considering the covariances of the features. However, lacking a noise model, the method can only obtain rough feature uncertainties.

In our method, we assume that the covariance of the feature extraction step is zero, because the template feature matcher produces pixel-by-pixel matching results. Therefore the uncertainties of feature extraction can be omitted, and the noise in both images impacts only the feature matching step. Combined with our proposed image noise model, the template feature matcher provides matching results together with covariance matrices whose values are propagated from image noise. This approach simplifies the calculation and enables real-time processing without losing accuracy. As shown in Sect. 4, the uncertainty can be used to identify and eliminate features with high uncertainties, which usually indicate mismatched features. The most important benefit is that the uncertainty of the matching can be incorporated in further calculations, e.g. triangulation, ego-motion calculation of the system, etc. The propagation of the uncertainties through the whole calculation chain to the central filter significantly increases the stability and the quality of the results.

This paper is organized as follows: In Sect. 2 an introduction to our image noise model is given. Such a model can be used to calculate the uncertainties of sub-pixel matching results as well as for further processing. In Sect. 3 a sub-pixel template matching algorithm and a method to propagate uncertainties from image noise to sub-pixel matching results are described. Experimental results are presented in Sect. 4, and Sect. 5 concludes the paper.

2 Image Noise Model

Even though it varies between cameras and scenes, image noise is always present in images taken with digital cameras. There are two major sources of noise. Firstly, fixed pattern noise is caused by different light sensitivities (photo response non-uniformity, PRNU [2]) and signal offsets (dark signal non-uniformity, DSNU [8]) of the pixels of an image sensor. This noise does not change over short time spans and is usually corrected by the camera itself. Secondly, dynamic noise changes from image to image even without a change of the input signal. It is mainly caused by the read-out electronics (read-out noise [25]), but also by the stochastic nature of the incoming photons (photon noise). This is just a simple model; more accurate ones can be applied if needed.

To control the impact of image noise on image processing, it can be taken into account as the uncertainty of an image. Such uncertainties can be handled via propagation of uncertainties, which allows them to be propagated through the entire computer vision system. In order to achieve this goal, the mathematical model of the image noise must be known. Many noise models have been proposed in the computer vision community, e.g. [5]. However, they either do not quantify the noise (e.g. salt-and-pepper noise [14]) or are built as generic probability models (e.g. Gaussian noise [9]), which do not take into account the variation between different camera systems.

In this paper, we propose a method to build an image noise model which is suitable for real-time processing. We assume that the parameters of the noise model differ for each camera, so that the model building can be done as part of the camera calibration step. During calibration, a batch of M frames (e.g. 100) of a fixed scene is taken in a short time (less than 1 min), yielding M almost identical images. The mean value and standard deviation of each pixel are calculated over all values measured during the M frames. Hence:

$$\begin{aligned} \text {Mean}(i,j) = \frac{1}{M}\sum \limits _{m=1}^M g(i,j)_m \end{aligned}$$
(1)

where \(g(i,j)_m\) is the local gray value at coordinate (i, j) in frame m. The standard deviation of this input vector can be calculated just as easily. The relation between all mean values and standard deviations can be displayed in a graph, which is done in Fig. 1 for a set of sample images. The standard deviations can be seen as a function of the corresponding mean values. It is obvious that the standard deviation grows with the mean gray value. This reveals the relation between pixel noise and the gray value of the pixel. On the other hand, because of electronic noise, the standard deviation is greater than zero even for dark pixels. Based on this, we propose a noise model for each pixel.
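For illustration, the calibration statistics of Eq. (1) amount to a per-pixel mean and standard deviation over the frame stack. The following sketch assumes the M calibration frames are available as an (M, H, W) array of gray values; the function name is our own.

```python
import numpy as np

def per_pixel_statistics(frames):
    """Per-pixel mean and standard deviation over a stack of M
    calibration frames of a fixed scene (Eq. 1).
    `frames` is an (M, H, W) array of gray values."""
    frames = np.asarray(frames, dtype=np.float64)
    mean = frames.mean(axis=0)        # Mean(i, j) over the M frames
    std = frames.std(axis=0, ddof=1)  # sample standard deviation
    return mean, std
```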

$$\begin{aligned} \text {Noise} = \sqrt{N_E^2+I/G} \end{aligned}$$
(2)
Fig. 1.

Left: pixel means and the corresponding standard deviations are shown as blue points. The red curve is the fitted model based on our proposed noise equation, with \(G = 58.1224\) and \(N_E = 0.3166\). Right: compared with the left side, the input signal of the camera is amplified by 10 dB; from the decibel equation \(G_{dB}=20\log _{10}(V/V_0)\), the expected ratio between the two G values is 3.162. The newly fitted noise model has \(G = 18.1069\) and \(N_E = 0.6453\); the measured ratio between the two G values is 3.21, which is very close to the theoretical value. The camera is a Prosilica GC1380H. (Color figure online)

\(N_E^2\) is the variance of the electronic noise of the camera in gray-scale values; the second term stems from shot noise (\(\text {Shot noise} = \sqrt{signal}\) [13]), where I is the gray value of the pixel and G is a gain parameter. Next, a Gauss–Newton algorithm is used to estimate the model from the mean values and standard deviations of the calibration images, fitting the curve (2) to the data. Equation (3) shows the Gauss–Newton iteration.

$$\begin{aligned} \beta ^{(s+1)}=\beta ^{(s)}-(J_r^\mathsf {T}J_r)^{-1}J_r^\mathsf {T}r(\beta ^{(s)}) \end{aligned}$$
(3)

The Gauss–Newton algorithm calculates the result iteratively. In Eq. (3), \(\beta \) is the vector of variables to estimate (in our case \(N_E\) and G), and the superscript s indicates the value at the s-th iteration. \(J_r\) is the Jacobian matrix of the residual function \(r(\beta )\), where

$$\begin{aligned} \begin{aligned} r(\beta )=y-f(I,\beta ) \\ J_r=\frac{\partial r(\beta )}{\partial \beta } \end{aligned} \end{aligned}$$
(4)

In our case y is the calculated standard deviation of the gray values of the image set, and \(f(I,\beta )\) is the proposed noise model (Eq. (2)). Starting with an initial \(\beta ^{(0)}\), the Gauss–Newton algorithm converges to a \(\beta \) (Fig. 1) after several iterations. More details can be found in [22]. Knowing the parameters G and \(N_E\) makes the noise model available. This noise information can be used in further processing, e.g. to model the uncertainty of template matching as introduced in Sect. 3.2.
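The fitting step of Eqs. (2)–(4) can be sketched as follows. The choice of the starting point \(\beta ^{(0)}\) is our own: since \(\sigma ^2 = N_E^2 + I/G\) is linear in \(N_E^2\) and 1/G, a linear fit of the variances gives a safe initial guess, which the Gauss–Newton iteration of Eq. (3) then refines in the standard-deviation domain.

```python
import numpy as np

def fit_noise_model(I, sigma, iters=10):
    """Gauss-Newton fit (Eq. 3) of the noise model
    sigma = sqrt(N_E^2 + I/G) (Eq. 2).  I and sigma are the per-pixel
    mean gray values and standard deviations from the calibration
    images.  The initial guess from a linear fit of sigma^2 over I is
    our own choice (not part of the paper) and keeps the iteration
    away from negative gains."""
    I, sigma = np.asarray(I, float), np.asarray(sigma, float)
    # sigma^2 = N_E^2 + (1/G) * I  is linear in I
    b, a = np.polyfit(I, sigma**2, 1)          # slope, intercept
    NE, G = np.sqrt(max(a, 1e-12)), 1.0 / b
    for _ in range(iters):
        f = np.sqrt(NE**2 + I / G)             # model prediction
        r = sigma - f                          # residual r(beta)
        # Jacobian of r w.r.t. beta = (N_E, G)
        Jr = np.column_stack((-NE / f, I / (2.0 * G**2 * f)))
        step = np.linalg.solve(Jr.T @ Jr, Jr.T @ r)   # Eq. (3)
        NE, G = NE - step[0], G - step[1]
    return NE, G
```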

3 Error Model for Template Sub-pixel Matching Algorithm

Once the image noise model is known, it can be incorporated into the data processing chain to obtain uncertainty information. In this section, we focus on the uncertainty of the feature matching result based on the proposed noise model. The calculated uncertainty of the matching result can easily be propagated to further calculations.

The SAD and NCC template feature matchers [1, 6] are extensively used in stereo feature matching, feature tracking, etc. Therefore these two feature matchers are used in this paper. Moreover, we extend them by Taylor's theorem to obtain sub-pixel level matching results. Sub-pixel matching is a well-researched topic in the computer vision community, and many sub-pixel matching algorithms have been proposed [17, 21, 30]. Besides, descriptor-based feature extraction algorithms [3, 18, 19] can provide sub-pixel matching results as well. All of these methods achieve good sub-pixel matching results. However, they require iterative calculation, image pyramids, difference of Gaussians, or brute-force search, respectively, which are too “heavy” for platforms with low computational resources. Our method is derived from polynomial interpolation, which is faster, easy to implement, and achieves acceptable results; in addition, the uncertainty of the matching result can be calculated in combination with our noise model.

3.1 Template Sub-pixel Matching Algorithm

The sub-pixel matching algorithm consists of three steps. First, the common SAD/NCC template feature matcher is used to match the features as follows. For each extracted feature in the first image, a \(5\times 5\) pixel region around it is taken as a template. Then the template is shifted over the second image pixel by pixel. For each position, the SAD/NCC value between the template and the covered area in the second image is calculated. The search area in the second image is limited by some conditions (e.g. the epipolar line). The position in the second image with the lowest SAD value or the highest NCC value is the matched feature. This step outputs the matching results with pixel-level coordinates.
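This pixel-level matching step can be sketched in a few lines. The function name, the (x, y) tuple convention, and the explicit candidate list are our own illustrative choices; border checks and the NCC variant are omitted.

```python
import numpy as np

def match_template_sad(left, right, feat, search):
    """Pixel-level SAD matching as described above: a 5x5 template
    around feature (x, y) in the left image is compared against
    candidate positions `search` (a list of (x, y) tuples, e.g. along
    the epipolar line) in the right image; the position with the
    lowest SAD value wins.  Illustrative sketch; image borders are
    assumed not to be hit."""
    x, y = feat
    T = left[y - 2:y + 3, x - 2:x + 3].astype(float)

    def sad(p):
        u, v = p
        return np.abs(right[v - 2:v + 3, u - 2:u + 3] - T).sum()

    return min(search, key=sad)
```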

Secondly, in case the search area of the first step does not cover all \(3\times 3\) neighboring positions around the matched feature (e.g. in the epipolar line case), the SAD/NCC values for these positions are calculated with the same template. Then a \(3\times 3\) SAD/NCC matrix \(\varvec{A}\) can be assembled, as shown in Fig. 2.

Fig. 2.

In (a), a \(5 \times 5\) template is taken from the first image; the green cross represents the feature position. In (b), part of the search area of the second image is shown; the blue cross indicates the matched feature position, and the SAD/NCC value between the template and the covered area (red part) is stored in the central entry of matrix \(\varvec{A}\) in figure (c). By sliding the template center around the \(3 \times 3\) neighborhood of the matched feature in (b) (as indicated by different colors), all values of the matrix \(\varvec{A}\) can be calculated. The template and all \(7 \times 7\) pixels in figure (b) are needed for the propagation of uncertainties step described in Sect. 3.2. (Color figure online)

In the third step, the values of matrix \(\varvec{A}\) can be seen as a surface in 3D space with the central element located in a valley (SAD) or on a hill (NCC). A two-dimensional interpolation of the values allows a more precise position of the surface's extremum to be found: the sub-pixel position of the matched feature. The two-dimensional interpolation follows the second-order Taylor series of Eq. (5).

$$\begin{aligned}&f(x+\delta x,y+ \delta y) \approx \nonumber \\&\quad f(x,y) {+} \left[ \frac{\partial f}{\partial x}\delta x {+} \frac{\partial f}{\partial y}\delta y\right] {+} \frac{1}{2}\left[ \frac{\partial ^2 f}{\partial x^2}(\delta x)^2 {+} 2\frac{\partial ^2 f}{\partial x\partial y}\delta x \delta y {+} \frac{\partial ^2 f}{\partial y^2}(\delta y)^2\right] \end{aligned}$$
(5)

In Eq. (5), f(x, y) is given by the matrix \(\varvec{A}\), where x and y are the feature coordinates. If the feature coordinate is expressed in vector form \(\varvec{x} = \begin{bmatrix} x&y\end{bmatrix}^\mathsf {T}\), the equation can be written as:

$$\begin{aligned} f(\varvec{x}+\delta \varvec{x}) \approx f(\varvec{x}) + \left( \frac{ df}{d\varvec{x}}\right) ^\mathsf {T}\delta \varvec{x}+ \frac{1}{2}\delta \varvec{x}^\mathsf {T}\frac{d^2f}{d \varvec{x}^2}\delta \varvec{x} \end{aligned}$$
(6)

The local extremum is calculated by Eq. (7)

$$\begin{aligned} \frac{\partial f(\varvec{x})}{\partial \varvec{x}} = 0 \end{aligned}$$
(7)

this leads to:

$$\begin{aligned} \delta \varvec{\hat{x}}=-\left( \frac{d^2f}{d \varvec{x}^2}\right) ^{-1}\frac{df}{d\varvec{x}} \end{aligned}$$
(8)

Equation (8) outputs two real numbers, taken as offsets for the feature coordinates x and y. Adding these two values to the pixel-level coordinate of the feature yields its sub-pixel coordinate.
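The third step can be written down compactly using the finite-difference expressions of Eqs. (14)–(15) from Sect. 3.2; the following sketch evaluates Eq. (8) on a given \(3\times 3\) matrix \(\varvec{A}\).

```python
import numpy as np

def subpixel_offset(A):
    """Sub-pixel refinement from the 3x3 SAD/NCC matrix A using the
    second-order Taylor expansion (Eqs. 5-8): gradient and Hessian
    are taken from central differences (Eq. 15, 0-based indexing).
    Returns the (dx, dy) offset added to the pixel-level match."""
    A = np.asarray(A, float)
    dx  = (A[1, 2] - A[1, 0]) / 2.0
    dy  = (A[2, 1] - A[0, 1]) / 2.0
    dxx = A[1, 2] + A[1, 0] - 2.0 * A[1, 1]
    dyy = A[2, 1] + A[0, 1] - 2.0 * A[1, 1]
    dxy = (A[2, 2] - A[2, 0] - A[0, 2] + A[0, 0]) / 4.0
    H = np.array([[dxx, dxy], [dxy, dyy]])   # Hessian  d^2 f / d x^2
    g = np.array([dx, dy])                   # gradient d f / d x
    return -np.linalg.solve(H, g)            # Eq. (8)
```

For a quadratic surface sampled on the \(3\times 3\) grid, this recovers the extremum location exactly.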

3.2 Propagation of Uncertainty of the SAD Feature Matcher

In this subsection, the proposed image noise model is applied to the SAD sub-pixel matching algorithm. The aim is to have an uncertainty model of the matching procedure. The equation of SAD is shown as follows.

$$\begin{aligned} \text {SAD}(u,v)=\sum \limits _{i,j} \left| {{\varvec{S}}(u+i,v+j)-{\varvec{T}}(i,j)}\right| \end{aligned}$$
(9)

\(\varvec{S}(u + i, v + j)\) is the search area in the second image, \(\varvec{T}\) is the template from the first image. For each matched feature pair, the SAD propagation of uncertainties algorithm includes two parts. The first part propagates the image noise to the uncertainty of the \(3\times 3\) matrix \(\varvec{A}\). The second part handles the uncertainty of \(\varvec{A}\) and finally gets the uncertainty of the sub-pixel matching result.

Part 1: From linear propagation of uncertainties theory [7] it is known that, in order to propagate the uncertainty, a matrix \(\varvec{F}\) which linearizes the SAD calculation must be known.

$$\begin{aligned} \varvec{a} = \varvec{F}\varvec{v} \end{aligned}$$
(10)

Referring to Fig. 2, the \(9 \times 1\) vector \(\varvec{a}\) of Eq. (10) is the reformatted \(3 \times 3\) matrix \(\varvec{A}\). The vector \(\varvec{v}\) includes all the pixel values from the template and the \(7 \times 7\) search area on the second image. The method for specifying \(\varvec{F}\) is described in the following.

The template from the first image is reformatted, from a \(5\times 5\) matrix \(\varvec{V}_f\) into a \(25\times 1\) vector \(\varvec{v}_f\). The corresponding \(7\times 7\) area \(\varvec{V}_s\) (see Fig. 2) in the second image is reformatted into a \(49\times 1\) vector \(\varvec{v}_s\).

By concatenating these two vectors, a \(74\times 1\) vector \(\varvec{v}= [\varvec{v}_s \ \ \varvec{v}_f]^\mathsf {T}\) (see Fig. 3) is defined and used for further calculations. Furthermore, a \(9\times 74\) matrix \(\varvec{F}\) (see Fig. 3) is built. Initially, \(\varvec{F}\) is set to be a zero matrix (i.e. all of its entries equal zero). Then, some entries of \(\varvec{F}\) are calculated as follows:

$$\begin{aligned}&\varvec{F}_s[(i+m-1+7(j+n-2)),(m+3(n-1))] \nonumber \\&\qquad \qquad \qquad =\mathrm {sgn}[\varvec{V}_s(i+m-1,j+n-1),\varvec{V}_f(i,j)] \nonumber \\&\varvec{F}_f[(i+5(j-1)),(m+3(n-1))]\\&\qquad \qquad \qquad =-\mathrm {sgn}[\varvec{V}_s(i+m-1,j+n-1),\varvec{V}_f(i,j)] \nonumber \\&\text{ with } \;\; 1 \le i,j \le 5, \quad 1 \le m,n \le 3 \nonumber \end{aligned}$$
(11)

where \(\varvec{V}_f(i,j)\) denotes the entry located in the i-th column and j-th row of matrix \(\varvec{V}_f\).

The \(9\times 49\) matrix \(\varvec{F}_s\) is the left part of \(\varvec{F}\) and the \(9\times 25\) matrix \(\varvec{F}_f\) is the right part, as shown in Fig. 3. Parameters m and n are shift coordinates of the template \(\varvec{V}_f\) over the search area \(\varvec{V}_s\) in horizontal direction and vertical direction, respectively.

For example, the blue area in Fig. 2(b) indicates the shift coordinate (1, 1), and the red area indicates (2, 2). The sign function \(\mathrm {sgn}\) is defined as follows:

$$\begin{aligned} \mathrm {sgn}(x,y) := {\left\{ \begin{array}{ll} -1 \quad \text {if}\ x < y \\ 1 \quad \ \ \, \text {if}\ x > y \end{array}\right. } \end{aligned}$$
(12)

The template \(\varvec{V}_f\) only covers a part of the search area \(\varvec{V}_s\) for each (m, n), so some entries in \(\varvec{F}_s\) remain unchanged at 0. Therefore, the values in \(\varvec{F}\) are from the set \(\{-1, 0, +1\}\). Finally, we obtain a dynamically changing matrix \(\varvec{F}\), as the values of \(\varvec{F}\) differ from feature to feature. These steps guarantee that the vector \(\varvec{a}\) in Eq. (10) is always identical to the result of the standard absolute-difference calculation.

Fig. 3.

The first-image and second-image areas are reformatted first, forming a \(74 \times 1\) vector \(\varvec{v}\). Next, the values in \(\varvec{F}\) are calculated by Eqs. (11) and (12). Finally, the SAD vector \(\varvec{a}\) equals \(\varvec{F} \varvec{v}\). The \(3\times 3\) matrix \(\varvec{A}\) is obtained by reformatting \(\varvec{a}\); this matrix is identical to the result of the original SAD algorithm. The red part of the second-image area shows the area considered for the calculation of the first SAD value. The red part in \(\varvec{F}\) reflects the relationship between the first-image area and the red part of the second-image area. (Color figure online)

Once the matrix \(\varvec{F}\) is known, the SAD algorithm becomes a linear calculation. From our proposed noise model, the noise vector \(\varvec{v}_{n}\) of all pixels in \(\varvec{v}\) can be calculated. Assuming that the noise of individual pixels is uncorrelated, a \(74\times 74\) covariance matrix \(\varvec{\varSigma _v}\) is defined. Its diagonal elements are \(\varvec{v}_{n}^2\), and all other elements are 0. The covariance of \(\varvec{a}\) can then be calculated as:

$$\begin{aligned} \varvec{\varSigma _a} = \varvec{F}\varvec{\varSigma _v}\varvec{F}^\mathsf {T} \end{aligned}$$
(13)

Hence, \(\varvec{\varSigma _a}\) is the \(9\times 9\) covariance matrix of \(\varvec{a}\).
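Part 1 can be sketched as follows. We use 0-based indices and row-major flattening of \(\varvec{V}_s\) and \(\varvec{V}_f\) (the paper's Eq. (11) is written with 1-based indices), and `np.sign`, which returns 0 for equal pixels, whose absolute difference is 0 anyway; function names are our own.

```python
import numpy as np

def sad_linearization(Vs, Vf):
    """Builds the 9x74 matrix F (Eqs. 10-12) for a 5x5 template Vf and
    the 7x7 search area Vs, so that a = F @ v reproduces the 3x3 SAD
    matrix A.  v stacks the 49 search-area pixels first, then the 25
    template pixels (row-major, 0-based)."""
    F = np.zeros((9, 74))
    for n in range(3):            # vertical shift of the template
        for m in range(3):        # horizontal shift of the template
            row = m + 3 * n       # entry of A being formed
            for j in range(5):
                for i in range(5):
                    s = np.sign(Vs[n + j, m + i] - Vf[j, i])
                    F[row, (n + j) * 7 + (m + i)] = s   # F_s part
                    F[row, 49 + j * 5 + i] = -s         # F_f part
    return F

def sad_covariance(F, v_noise):
    """Eq. (13): Sigma_a = F Sigma_v F^T, with Sigma_v the diagonal
    covariance of the uncorrelated per-pixel noise vector v_noise."""
    Sigma_v = np.diag(np.asarray(v_noise, float) ** 2)
    return F @ Sigma_v @ F.T
```

By construction, `F @ v` reproduces the nine SAD values of the \(3\times 3\) neighborhood exactly.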

Part 2: the task of the second part is the propagation of uncertainties from \(\varvec{\varSigma _a}\) to the final sub-pixel result. As Eqs. (6) and (8) show, the sub-pixel calculation step is a non-linear function, so the Jacobian matrix of Eq. (8) is required. First of all, Eq. (8) can be written as:

$$\begin{aligned} \delta \varvec{\hat{x}}=\begin{bmatrix} -(\partial yy \partial x - \partial xy \partial y)/(\partial xx \partial yy - \partial xy\partial xy )\\ -(\partial xx \partial y - \partial xy \partial x)/(\partial xx \partial yy - \partial xy \partial xy) \end{bmatrix} \end{aligned}$$
(14)

where

$$\begin{aligned} \begin{aligned} \partial x = \frac{a_{23}-a_{21}}{2} \qquad \partial xx = a_{23}+a_{21}-2 a_{22} \\ \partial y = \frac{a_{32}-a_{12}}{2} \qquad \partial yy = a_{32}+a_{12}-2 a_{22} \\ \partial xy = \frac{a_{33}-a_{31}-a_{13}+a_{11}}{4} \end{aligned} \end{aligned}$$
(15)

\(a_{ij}\) is the element of matrix \(\varvec{A}\) located in row i and column j. The Jacobian matrix is calculated as:

$$\begin{aligned} \varvec{J}=\begin{bmatrix}\frac{\partial \varvec{\hat{x}}}{\partial a_{11}}&\frac{\partial \varvec{\hat{x}}}{\partial a_{12}}&\dots&\frac{\partial \varvec{\hat{x}}}{\partial a_{33}} \end{bmatrix} \end{aligned}$$
(16)

\(\varvec{J}\) is a \(2\times 9\) matrix, so the last step of calculation is:

$$\begin{aligned} \varvec{\varSigma }_{\delta \varvec{\hat{x}}} = \varvec{J}\varvec{\varSigma _a}\varvec{J}^\mathsf {T} \end{aligned}$$
(17)

\(\varvec{\varSigma }_{\delta \varvec{\hat{x}}}\) is a \(2\times 2\) covariance matrix: the diagonal elements are the variances of the matched sub-pixel coordinates, and the off-diagonal entries are the covariances between them. This matrix finally contains the uncertainty information propagated from the image noise to the SAD sub-pixel matching result and can be used for further processing.
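Part 2 can be sketched as follows. Instead of the closed-form derivatives of Eq. (14), this sketch builds the \(2\times 9\) Jacobian of Eq. (16) by central differences, which agrees with the analytic Jacobian up to the step size; the numerical variant is our own simplification for illustration.

```python
import numpy as np

def subpixel_covariance(A, Sigma_a, eps=1e-6):
    """Propagates the 9x9 covariance Sigma_a of the SAD matrix A to
    the 2x2 covariance of the sub-pixel offset (Eqs. 16-17).  The
    2x9 Jacobian is taken numerically (central differences) instead
    of the closed-form derivatives of Eq. (14)."""
    def offset(a):
        A = a.reshape(3, 3)
        # gradient and Hessian from Eq. (15), 0-based indexing
        dx  = (A[1, 2] - A[1, 0]) / 2.0
        dy  = (A[2, 1] - A[0, 1]) / 2.0
        dxx = A[1, 2] + A[1, 0] - 2.0 * A[1, 1]
        dyy = A[2, 1] + A[0, 1] - 2.0 * A[1, 1]
        dxy = (A[2, 2] - A[2, 0] - A[0, 2] + A[0, 0]) / 4.0
        H = np.array([[dxx, dxy], [dxy, dyy]])
        return -np.linalg.solve(H, np.array([dx, dy]))  # Eq. (8)

    a0 = np.asarray(A, float).ravel()
    J = np.empty((2, 9))                 # Jacobian of Eq. (16)
    for k in range(9):
        ap, am = a0.copy(), a0.copy()
        ap[k] += eps
        am[k] -= eps
        J[:, k] = (offset(ap) - offset(am)) / (2.0 * eps)
    return J @ Sigma_a @ J.T             # Eq. (17)
```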

3.3 Propagation of Uncertainty of the NCC Feature Matcher

The normalized cross correlation (NCC) template matching algorithm is very similar to the SAD algorithm. NCC uses Eq. (18) to calculate the normalized cross correlation between the templates from both images.

$$\begin{aligned} \text {NCC}(u,v)=\dfrac{1}{n}\sum \limits _{i,j}\dfrac{(\varvec{S}(u+i,v+j)-\varvec{\bar{S}})(\varvec{T}(i,j)-\varvec{\bar{T}})}{\sigma _{\varvec{S}}\sigma _{\varvec{T}}} \end{aligned}$$
(18)

In the equation, \(\varvec{S}(u + i, v + j)\) is the search area in the second image and \(\varvec{T}\) is the template from the first image, where \(n = i\cdot j\) is the number of template pixels and \(\varvec{\bar{S}}\), \(\varvec{\bar{T}}\) are the means of the covered search area and the template, respectively. The standard deviations of \(\varvec{S}\) and \(\varvec{T}\) are denoted by \(\sigma _{\varvec{S}}\) and \(\sigma _{\varvec{T}}\). Because of the non-linear nature of NCC, the error propagation for the \(3 \times 3\) NCC matrix \(\varvec{A}\) needs to be done with a Jacobian matrix. As an example, consider a template of the same size as in Fig. 3. Then the Jacobian matrix is calculated as follows:

$$\begin{aligned} \varvec{J}_{ncc} =\dfrac{\partial f}{\partial p} = \begin{bmatrix} \dfrac{\partial f_1}{\partial p_{r1}}&\dfrac{\partial f_1}{\partial p_{r2}}&\dots&\dfrac{\partial f_1}{\partial p_{l25}} \\ \vdots&\vdots&\ddots&\vdots \\ \dfrac{\partial f_9}{\partial p_{r1}}&\dfrac{\partial f_9}{\partial p_{r2}}&\dots&\dfrac{\partial f_9}{\partial p_{l25}} \end{bmatrix} \end{aligned}$$
(19)

f is the common NCC equation given in Eq. (18), \( p_{ln} \) are the pixel values from the first-image template and \(p_{rn}\) are the pixel values in the \(7 \times 7\) search area of the second image. \( f_1 \) maps the template and the search area to the first entry of the \( 3\times 3 \) NCC matrix, and so on. The \( 9\times 74 \) matrix \( \varvec{J}_{ncc} \) is used in the same way as \(\varvec{F}\) in Sect. 3.2. After the calculation of the covariance matrix \(\varvec{\varSigma _a}\), the remaining steps are identical to part 2 of Sect. 3.2. To avoid errors in the derivation, we recommend using Matlab's symbolic calculation to obtain the final form of \( \varvec{J}_{ncc} \).
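As an alternative to the symbolic derivation, \( \varvec{J}_{ncc} \) of Eq. (19) can also be approximated numerically; the following sketch uses central differences and our own 0-based, row-major pixel ordering (search-area pixels first, template pixels second, as in Fig. 3).

```python
import numpy as np

def ncc_jacobian(Vs, Vf, eps=1e-5):
    """Numerical 9x74 Jacobian of the nine NCC values (Eq. 18) with
    respect to the 49 search-area pixels and 25 template pixels
    (Eq. 19), via central differences.  An alternative to deriving
    J_ncc symbolically."""
    def ncc_values(v):
        S = v[:49].reshape(7, 7)
        T = v[49:].reshape(5, 5)
        out = np.empty(9)
        for n in range(3):
            for m in range(3):
                W = S[n:n + 5, m:m + 5]        # covered search area
                num = ((W - W.mean()) * (T - T.mean())).sum()
                out[m + 3 * n] = num / (25.0 * W.std() * T.std())
        return out

    v0 = np.concatenate([Vs.ravel(), Vf.ravel()]).astype(float)
    J = np.empty((9, 74))
    for k in range(74):
        vp, vm = v0.copy(), v0.copy()
        vp[k] += eps
        vm[k] -= eps
        J[:, k] = (ncc_values(vp) - ncc_values(vm)) / (2.0 * eps)
    return J
```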

4 Experiment Results

To assess the quality of the proposed method, we designed three experiments. The first experiment quantitatively verifies the proposed algorithm. The second one is a general feature matching test, and the third experiment evaluates the proposed method on the optical navigation system IPS [11], which was developed at the German Aerospace Center (DLR).

The first test is designed to verify the propagation of uncertainties. The testing method is similar to a Monte Carlo test. The difference is that we do not generate artificial noise and add it to an image. Instead, we take a set of images which include the real noise of the camera system. The details of the test are described below.

The first step is similar to the noise model calculation step: the stereo camera system takes 100 image pairs of a fixed scene in a short time. The image contents are almost the same, but affected by noise from the camera system. Next, a feature extractor detects features in the first left image, and a sub-pixel template matcher matches the features to the first right image (without the uncertainty propagation step). This step is repeated for all image pairs, but the feature extraction step is skipped; instead, the feature coordinates from the first left image are reused. As the features are located at the same coordinates in all 100 frames, 100 different stereo matching results are obtained; the standard deviation of the resulting sub-pixel offsets is calculated and drawn as a curve in Fig. 4. These empirical results can now be compared with the propagated uncertainty calculated in the next step.

In the second step, only the first image pair is needed. We apply the sub-pixel template matcher together with the propagation of uncertainties step to the first image pair. This way we obtain the uncertainties of the matched features as propagated from image noise. Figure 4 indicates that the noise model and the propagation of uncertainties algorithm accurately reflect the real uncertainties of the matching results.

Fig. 4.

The left graph shows the sub-pixel uncertainty of the x coordinate: the blue curve is the uncertainty of the features' sub-pixel coordinates calculated from 100 images, and the red curve shows the propagated uncertainty based on our proposed method. The right graph shows the sub-pixel uncertainty of the y coordinate. These two graphs show the strong correlation between the real and the propagated uncertainties. (Color figure online)

The second experiment checks the performance of the proposed algorithm on the stereo feature matching problem. First, the sub-pixel matching algorithm works on a stereo camera system whose noise model has already been calculated with our method. Figure 5 shows the images from the left and right cameras. We use the AGAST [23] feature extractor to get features from the left image, and a SAD template matcher to match the features to the right image under the epipolar line constraint. The green and orange crosses on the left image are the features successfully matched to the right image by a common SAD template matcher; the yellow and cyan crosses symbolize mismatches. The orange crosses are the features filtered out by our proposed sub-pixel matching algorithm because of their high uncertainty (higher than 0.4 pixel). In fact, the orange features in Fig. 5 (near the cupboard and the white computer monitor) cannot be seen from the right camera's perspective, yet the common template matcher wrongly matches them to the right image (not drawn). Finally, the crosses on the left and right images are the features successfully matched after the sub-pixel matching step. This test shows that the propagated uncertainty makes it possible to filter out mismatched features, proving the correctness of the proposed algorithm from another perspective. The algorithm also improves the robustness of the system by filtering out the mismatched features.

Fig. 5.

Image pair from a stereo camera system; the crosses are the features. The orange and green crosses on the left image symbolize the features successfully matched by a normal template matcher; the yellow and cyan crosses are features for which matching failed. The orange crosses are filtered out by the proposed SAD sub-pixel matching algorithm with error propagation. (Color figure online)

The last test is based on an optical navigation project. The test platform is IPS, which was developed for real-time vision-aided inertial navigation [10, 11], especially for environments where GNSS is not available. IPS is a Kalman filter based optical navigation system. In the previous version, the normal NCC and SAD feature matchers were used for stereo matching and tracking, respectively; the matching results are integer pixel coordinates. Moreover, lacking a noise model, no uncertainties of the matching results could be obtained. However, in order to use a Kalman filter, uncertainty information must be provided; therefore, in the previous version, only a rough uncertainty of the matching result (e.g. the quantization error \(\frac{1}{12}\) [29]) was given. These problems can be solved by our proposed method. This experiment compares the trajectory measured by the previous IPS version with that of the IPS combined with our noise model and sub-pixel algorithm.

For test purposes, a dataset is recorded by walking with the IPS through a realistic scene over a length of about 410 m. Such a physical run is called a session; we recorded eight sessions in total. Because ground truth is lacking, the start and end positions of the loop are chosen to be exactly the same. As the system only considers the motion information extracted from two consecutive image pairs and does not recognize that it has been in a place before, its performance can be measured by the error between the known start position and the calculated end position. As a RANSAC algorithm is used for optical navigation, the calculated positions have a random component. This is why each session is processed (offline) 50 times to calculate the root-mean-square (RMS) of the trajectory errors as the final result. More details about the test procedure can be found in [32].

Table 1. Trajectory errors of the 8 sessions. The RMS is the root-mean-square of the errors of 50 runs, and STD is the standard deviation. The results show that the sub-pixel matching algorithm decreases not only the trajectory errors but also the standard deviation. “New IPS” is the IPS combined with the proposed methods.

The IPS achieves state-of-the-art results: the trajectory error is about \(0.1\%\) of the traveled distance. At this level, it is usually hard to obtain further improvement. However, as Table 1 shows, the accuracy of the measurement is increased by about 12% by our method. Moreover, the new algorithm also leads to a better standard deviation, with an improvement of about 44%, which means an improvement of the robustness of the system.

5 Conclusion

In this paper, we propose a method to model image noise; this noise model can be obtained during the normal camera calibration step. Based on the noise model, uncertainty propagation for sub-pixel matching algorithms is described. The proposed image noise model and the method to obtain the uncertainty of sub-pixel matching results can be widely used in many computer vision applications. The performance of the proposed methods is evaluated by a full system test. The experimental results show that the noise model is actually able to reflect the uncertainty of sub-pixel matching results. An additional test shows that the uncertainty calculation can even be utilized as a mismatch filter without any computational overhead. The last test concentrates on the performance of the new algorithm combined with an optical navigation system. The result proves that the proposed method decreases the trajectory errors and the standard deviation of the errors simultaneously. The tests show that our method achieves significantly better results without much effort. In future work, we will implement the uncertainty propagation method for other sub-pixel matching algorithms.