Stereo matching algorithm based on illumination normal similarity and adaptive support weight

Abstract. To represent the features of a gray image, the illumination normal of pixels in a two-dimensional gray image plane is proposed, which reflects the high-frequency information of the gray image. In order to obtain an accurate dense disparity map based on the adaptive support weight (ASW) approach in RGB vector space, a matching algorithm is proposed that combines the illumination normal similarity, gradient similarity, color similarity, and Euclidean distance similarity to compute the corresponding support weights and dissimilarity measurements. Evaluation on the Middlebury stereo benchmark shows that the proposed algorithm produces more accurate disparity maps than many state-of-the-art stereo matching algorithms.


Introduction
Dense stereo matching is one of the most challenging problems in the field of computer vision. It is an important requirement for many applications, such as three-dimensional (3-D) reconstruction and virtual view synthesis. Generally, the purpose of stereo matching is to find the corresponding pixels between the stereo images captured by two or more cameras of the same scene and to obtain the disparity map composed of the coordinate differences of corresponding pixels in the stereo image pair.
There are many algorithms available for the dense stereo problem, and they can be classified as either global or local. The typical global algorithms, such as graph cuts,1 belief propagation,2,3 and dynamic programming,4,5 can generate a dense disparity map precisely based on a global energy function and suitable constraints. However, graph cuts and belief propagation usually consume a great deal of time and memory, and dynamic programming needs specific constraints in different situations. Local matching algorithms are known for their simplicity and efficiency, and they can also achieve accurate disparities. The basic idea of local matching is to estimate the disparity of a pixel in the target image by correlating a support window around the pixel with a similar support window in the reference image. One of the typical local matching algorithms is the adaptive support weight (ASW) method proposed by Yoon.6 The method in Ref. 6 adopts fixed-size square windows and allocates a support weight to each pixel in the window according to color and position similarities. The disparity maps generated by Ref. 6 achieve quality comparable to that of global algorithms. Gradient information can indicate the variation between neighboring pixels and the structure of the image content,7 and it also helps reduce the noise present in the disparity map. A method that uses gradient similarity and local ASW to compute the disparity was proposed in Ref. 8. Considering that information is lost when converting the stereo images from RGB vector space to the CIELab color space, the ASW approach in RGB vector space was proposed,9 where the gradient similarity is used to compute the support weight. However, the main difficulties in stereo matching still lie at object boundaries and in finely textured areas, which are reflected by the high-frequency information. In this paper, we propose to utilize the illumination normal similarity of the two-dimensional (2-D) gray image to compute the support weight based on the ASW approach in RGB vector space. The experimental results show that the proposed method improves the accuracy of the disparity map.
This paper is organized as follows. Section 2 gives the definition of the illumination normal in image space. Section 3 provides a detailed explanation of the proposed method. In Sec. 4, experimental results of the proposed method are compared with those of other methods. Conclusions and future work are given in Sec. 5.

Illumination Normal of Pixels in a 2-D Image Plane
A normal vector exists for almost every point of an object in 3-D space. Given a 2-D gray image, the gray value of every pixel reflects the illumination information of the object. In order to obtain the illumination normal vector of pixels in a 2-D image, each pixel of the image is regarded as a point in 3-D space. This can be expressed as P = [x, y, p(x, y)], where x and y are the horizontal and vertical coordinates, respectively, and p(x, y) is the pixel value at position (x, y).
The current point and the points located below and to the right of it are used to compute its normal vector. Figure 1 illustrates how the illumination normal vector is calculated. Point A is the current point; B and C are the neighboring points used to compute the normal vector of point A. The 3-D vectors from A to C and from A to B are computed as

\mathrm{vec1} = C - A = [x_C - x_A, \; y_C - y_A, \; p(x_C, y_C) - p(x_A, y_A)], \quad (1)

\mathrm{vec2} = B - A = [x_B - x_A, \; y_B - y_A, \; p(x_B, y_B) - p(x_A, y_A)]. \quad (2)

The illumination normal vector of point A is obtained by the cross product of vec1 and vec2:

\mathrm{vecN}(A) = \mathrm{vec1} \times \mathrm{vec2} = [\mathrm{vecN}_x(A), \; \mathrm{vecN}_y(A), \; \mathrm{vecN}_z(A)]. \quad (3)

The illumination normal vector of point A is then normalized:

\mathrm{normal}(A) = [n_x(A), \; n_y(A), \; n_z(A)], \quad n_i(A) = \frac{\mathrm{vecN}_i(A)}{\lVert \mathrm{vecN}(A) \rVert}, \; i \in \{x, y, z\}. \quad (4)

The modulus images of the illumination normal vectors of the image pair, which are used to analyze the illumination normal similarity of the pair, are shown in Fig. 2. The features visible in Figs. 2(b) and 2(d) reflect the high-frequency information of the gray image pair. This high-frequency information captures small-scale details of the image, which is useful for finding matching pixels in the stereo pair. In this paper, this property is exploited, and the illumination normal similarity of the gray image is incorporated into the ASW method to compute the weights in the support window.
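To make the construction above concrete, the following Python/NumPy sketch computes the illumination normal and the modulus image for every pixel by treating the image as the surface P = [x, y, p(x, y)] and crossing the vectors toward the right and lower neighbors. The function name and the assignment of B and C to the lower and right neighbors are our own illustrative choices (Fig. 1 is not reproduced here); swapping the two vectors only flips the sign of the normal.

import numpy as np

def illumination_normals(gray):
    """Per-pixel illumination normals of a 2-D gray image (Sec. 2).

    Each pixel (x, y) is lifted to the 3-D point P = [x, y, p(x, y)];
    vec1 points toward the right neighbour and vec2 toward the lower
    neighbour, and the normal is their cross product (Eqs. (1)-(3)),
    normalized to unit length (Eq. (4)).
    """
    gray = np.asarray(gray, dtype=np.float64)

    # For unit pixel steps, vec1 = [1, 0, p(x+1, y) - p(x, y)] and
    # vec2 = [0, 1, p(x, y+1) - p(x, y)]; the last row/column reuse a
    # zero difference so the output keeps the image shape.
    dpx = np.zeros_like(gray)
    dpy = np.zeros_like(gray)
    dpx[:, :-1] = gray[:, 1:] - gray[:, :-1]
    dpy[:-1, :] = gray[1:, :] - gray[:-1, :]

    vec1 = np.stack([np.ones_like(gray), np.zeros_like(gray), dpx], axis=-1)
    vec2 = np.stack([np.zeros_like(gray), np.ones_like(gray), dpy], axis=-1)

    # Cross product gives the (unnormalized) illumination normal vecN(A).
    vec_n = np.cross(vec1, vec2)

    # Modulus image (as visualized in Fig. 2) and unit normals normal(A).
    modulus = np.linalg.norm(vec_n, axis=-1)
    normals = vec_n / np.maximum(modulus[..., None], 1e-12)
    return normals, modulus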

Proposed Algorithm
To assign the support weight more accurately to each pixel in the support window, several similarity measurements are considered. Geng et al.9 add the gradient similarity in RGB vector space to the gestalt grouping proposed by Yoon.6 Here, we propose to compute the support weight from multiple similarity measurements, including color similarity, Euclidean distance similarity, gradient similarity, and illumination normal similarity. The support weight of a pixel in a support window can be expressed as

w(p, q) = \exp\left[ -\left( \frac{\Delta c_{pq}}{\tau_c} + \frac{\Delta dis_{pq}}{\tau_d} + \frac{\Delta grad_{pq}}{\tau_g} + \frac{\Delta n_{pq}}{\tau_n} \right) \right], \quad (5)

where p and q are pixels in the reference image with RGB components, q being a pixel in the support window centered at p; \Delta c_{pq}, \Delta dis_{pq}, and \Delta grad_{pq} are the color difference, spatial Euclidean distance, and gradient difference between the pixels p(x, y) = \{p_R, p_G, p_B\} and q(x, y) = \{q_R, q_G, q_B\}, respectively; \Delta n_{pq} is the illumination normal difference between p_{Gray} and q_{Gray}, the gray values of p and q, computed from the unit normals of Eq. (4); and \tau_c, \tau_d, \tau_g, and \tau_n are constants.

The weights calculated for the pixels in the windows of the reference image and the target image are combined in the aggregation step. The dissimilarity E can be expressed as

E(p, p_d) = \frac{ \sum_{q \in N_p, q_d \in N_{p_d}} w(p, q)\, w(p_d, q_d)\, e_{matching}(q, q_d) }{ \sum_{q \in N_p, q_d \in N_{p_d}} w(p, q)\, w(p_d, q_d) }, \quad (6)

where p_d and q_d are the pixels in the target image corresponding to p and q at disparity d; N_p and N_{p_d} are the support windows centered at p and p_d, respectively; and e_{matching}(q, q_d) is the pixel-based matching cost between q and q_d, obtained as

e_{matching}(q, q_d) = e_c(q, q_d) + e_{dis}(q, q_d) + e_n(q, q_d). \quad (7)

Here, e_c(q, q_d) is the color difference term, e_{dis}(q, q_d) is the gradient difference term, and e_n(q, q_d) is the illumination normal difference term; they are computed from the differences \Delta c_{q,q_d}, \Delta gradx_{q,q_d}, \Delta grady_{q,q_d}, and \Delta n_{q,q_d} together with the constants \lambda_c, \lambda_{gradx}, \lambda_{grady}, and \lambda_n. The best disparity of pixel p is found by minimizing the dissimilarity E(p, p_d) over the search range:

d_p = \arg\min_{d \in D} E(p, p_d), \quad (8)

where D = \{d_{min}, \ldots, d_{max}\} is the range of possible disparities, which varies between image pairs.
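As a minimal illustration of Eqs. (5), (6), and (8), the following Python/NumPy sketch computes the aggregated dissimilarity for a single reference pixel and selects the winner-takes-all disparity. The concrete difference terms used here (sums of absolute differences for the color, gradient, and normal components, and the Euclidean spatial distance) and the single truncated-SAD pixel cost used in place of the separate lambda-weighted terms of Eq. (7) are simplifying assumptions for illustration, not the paper's exact definitions.

import numpy as np

def asw_disparity_at(left, right, left_grad, right_grad, left_nrm, right_nrm,
                     x, y, d_range, win=35,
                     tau_c=30.0, tau_d=10.0, tau_g=30.0, tau_n=40.0):
    """Winner-takes-all disparity for one pixel (x, y) of the left (reference) image.

    left, right          : H x W x 3 float RGB images
    left_grad, right_grad: H x W gradient-magnitude images (assumed form)
    left_nrm, right_nrm  : H x W x 3 unit illumination normals (Sec. 2)
    d_range              : iterable of candidate disparities D = {d_min, ..., d_max}
    """
    r = win // 2
    h, w, _ = left.shape
    if x - r < 0 or x + r >= w or y - r < 0 or y + r >= h:
        raise ValueError("support window must lie inside the reference image")

    def weights(img, grad, nrm, cx, cy):
        # Support weights of Eq. (5) for the window centred at (cx, cy).
        ys = np.arange(cy - r, cy + r + 1)
        xs = np.arange(cx - r, cx + r + 1)
        yy, xx = np.meshgrid(ys, xs, indexing="ij")
        p_c, p_g, p_n = img[cy, cx], grad[cy, cx], nrm[cy, cx]
        dc = np.abs(img[yy, xx] - p_c).sum(-1)         # assumed colour difference (SAD over RGB)
        dd = np.sqrt((xx - cx) ** 2 + (yy - cy) ** 2)  # Euclidean spatial distance
        dg = np.abs(grad[yy, xx] - p_g)                # assumed gradient difference
        dn = np.abs(nrm[yy, xx] - p_n).sum(-1)         # assumed illumination normal difference
        return np.exp(-(dc / tau_c + dd / tau_d + dg / tau_g + dn / tau_n))

    w_ref = weights(left, left_grad, left_nrm, x, y)
    ref_win = left[y - r:y + r + 1, x - r:x + r + 1]
    best_d, best_e = None, np.inf
    for d in d_range:
        if x - d - r < 0 or x - d + r >= w:
            continue                                   # candidate window falls outside the target image
        w_tgt = weights(right, right_grad, right_nrm, x - d, y)
        tgt_win = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
        # Assumed per-pixel cost: truncated SAD over RGB as a stand-in for Eq. (7).
        e = np.minimum(np.abs(ref_win - tgt_win).sum(-1), 40.0)
        agg = (w_ref * w_tgt * e).sum() / (w_ref * w_tgt).sum()   # aggregation, Eq. (6)
        if agg < best_e:
            best_e, best_d = agg, d                    # winner-takes-all, Eq. (8)
    return best_d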
In order to refine the disparity, a left-right consistency check is used to detect matching errors; a disparity is accepted only if

d_L(x, y) = d_R(x - d_L(x, y), y), \quad (9)

where d_L(x, y) is the disparity obtained with the left image as the reference image and d_R(x, y) is the disparity obtained with the right image as the reference image; the two maps are computed separately. Pixels that fail the consistency check are classified as bad. For each bad pixel, the support weight of every neighboring pixel in the fixed-size support window centered on it is recomputed using the proposed method, and the disparity of the neighbor with the largest recomputed support weight is assigned to the bad pixel.
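The refinement step can be sketched in the same illustrative style: the left-right check of Eq. (9) marks bad pixels, which are then refilled with the disparity of the window neighbor that receives the largest recomputed support weight. The one-level tolerance in the check, the exclusion of other bad pixels from the window, and the weight_fn callback (which can be the weight computation from the previous sketch) are assumptions for illustration.

import numpy as np

def lr_check(d_left, d_right, thresh=1):
    """Left-right consistency check of Eq. (9); returns a 'bad pixel' mask.

    The tolerance of one disparity level is an assumption for illustration.
    """
    h, w = d_left.shape
    xs = np.tile(np.arange(w), (h, 1))
    xr = np.clip(xs - d_left.astype(int), 0, w - 1)
    d_r = d_right[np.arange(h)[:, None], xr]
    return np.abs(d_left - d_r) > thresh

def fill_bad_pixels(d_left, bad, weight_fn, win=35):
    """Refill bad pixels with the disparity of the neighbour that receives
    the largest recomputed support weight in the fixed-size window.

    weight_fn(x, y) must return the win x win support-weight window of the
    reference image centred at (x, y), e.g. Eq. (5) from the sketch above.
    Excluding other bad pixels (including the centre) is our assumption.
    """
    r = win // 2
    h, w = d_left.shape
    filled = d_left.copy()
    for y, x in zip(*np.nonzero(bad)):
        if x - r < 0 or x + r >= w or y - r < 0 or y + r >= h:
            continue                                   # skip window positions outside the image
        wts = np.array(weight_fn(x, y), dtype=np.float64)
        wts[bad[y - r:y + r + 1, x - r:x + r + 1]] = -np.inf
        if not np.isfinite(wts.max()):
            continue                                   # no valid neighbour in the window
        iy, ix = np.unravel_index(np.argmax(wts), wts.shape)
        filled[y, x] = d_left[y - r + iy, x - r + ix]
    return filled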
Experimental Results

Performance Comparison
The stereo image pairs "tsukuba," "venus," "teddy," and "cones," provided by the Middlebury stereo benchmark, were used in our experiments. The size of the support window was fixed at 35 × 35 pixels, and the constants τ_c = 30, λ_c = 40, τ_d = 10, λ_gradx = 20, λ_grady = 10, τ_g = 30 (Ref. 9), τ_n = 40, and λ_n = 1 were fixed for all the test stereo image pairs. To evaluate the proposed algorithm, we obtained the ground truth provided by Scharstein and Szeliski10 and the disparity maps of the ASW method of Yoon6 from the Middlebury stereo benchmark. The subjective quality comparison of the disparity maps is shown in Fig. 3. Figures 3(a) and 3(b) show the color image and the ground truth, respectively, while Figs. 3(c), 3(e), and 3(g) and Figs. 3(d), 3(f), and 3(h) show the disparity maps and the bad-pixel images produced by our algorithm, ASW,6 and ASW-RGB,9 respectively. The error threshold Th in this experiment was 0.5; the smaller the gray and black areas in a bad-pixel image, the more accurate the disparity map. Figures 3(d), 3(f), and 3(h) show that the disparity map of our algorithm is more accurate than those of ASW6 and ASW-RGB.9 In order to measure the objective quality of the disparity map, the Middlebury stereo benchmark provides quality metrics over three regions: all pixels ("all"), nonoccluded regions ("nonocc"), and pixels near depth discontinuities ("disc"). A generated disparity value is considered correct when its absolute difference from the ground truth is less than Th. Tables 1 and 2 show the two cases Th = 1 and Th = 0.5, where the proposed algorithm is also compared with other reported methods.12,13 The comparison in Tables 1 and 2 shows that the proposed algorithm (ASW-MS) improves the matching accuracy to different degrees.
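For completeness, the bad-pixel statistic used in Tables 1 and 2 can be sketched as follows; the region-mask argument standing in for the "all," "nonocc," and "disc" maps is a hypothetical input of this sketch rather than part of the benchmark's published code.

import numpy as np

def bad_pixel_rate(disp, gt, mask=None, th=0.5):
    """Percentage of pixels whose disparity error exceeds the threshold Th.

    disp, gt : H x W disparity maps (estimate and ground truth)
    mask     : optional boolean H x W region mask, e.g. the Middlebury
               'all', 'nonocc', or 'disc' regions (hypothetical input here)
    th       : error threshold (0.5 or 1.0 in Tables 1 and 2)
    """
    if mask is None:
        mask = np.ones_like(gt, dtype=bool)
    err = np.abs(disp.astype(np.float64) - gt.astype(np.float64))
    return 100.0 * np.count_nonzero((err > th) & mask) / np.count_nonzero(mask)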

Influence of the Illumination Normal
In order to analyze the influence of the illumination normal in the algorithm, the proposed method was evaluated both with the illumination normal similarity (ASW-MS) and without it.

Fig. 1 Illustration of the calculation of the illumination normal vector.

Fig. 2 Analysis of the illumination normal vectors of the image pair. (a) Left image (gray); (b) modulus image of the illumination normal vector of the left image; (c) right image (gray); (d) modulus image of the illumination normal vector of the right image.

Fig. 3 Comparison of results. (a) Left image. (b) Ground truth. (c) Disparity map generated by our algorithm. (d) Bad-pixel image of our algorithm. (e) Disparity map generated by the ASW approach.6 (f) Bad-pixel image of the ASW approach.6 (g) Disparity map generated by the ASW-RGB approach.9 (h) Bad-pixel image of the ASW-RGB approach.9

Table 1 Performance comparison of the proposed method on the Middlebury stereo benchmark (error threshold: 1.0).


Conclusions and Future Work
In this paper, based on multi-similarity measurements, we have presented a new ASW matching algorithm that combines color similarity, Euclidean distance similarity, gradient similarity, and illumination normal similarity. The experimental results show that the proposed algorithm improves the matching precision compared with other local ASW matching algorithms. In future research, we plan to investigate other similarity measures to further improve our method.