Aggregation functions to combine RGB color channels in stereo matching

: In this paper we present a comparison study between different aggregation functions for the combination of RGB color channels in stereo matching problem. We introduce color information from images to the stereo matching algorithm by aggregating the similarities of the RGB channels which are calculated independently. We compare the accuracy of different stereo matching algorithms and aggregation functions. We show experimentally that the best function depends on the stereo matching algorithm considered, but the dual of the geometric mean excels as the most robust aggregation


Introduction
The stereo matching problem consists in obtaining the three-dimensional information from two bi-dimensional images of the same scene taken from different viewpoints.When an image is taken, the depth of each point in the scene is lost.Therefore, the objective of a stereo matching algorithm is to retrieve this information.
The basis of stereo vision is that a single physical point in the scene is uniquely projected to a pair of image locations.Hence, to obtain the depth from both images, first we have to estimate the correspondence between the pixels in each image.This step consists in identifying the same physical point in both projections to determine the difference between the position in each image.This difference is called disparity.The disparity, together with the parameters of the camera allows us to obtain the depth.
Thereby, the main problem of the stereo matching is the difficulty to find the correspondence correctly.The images are taken from different cameras with different viewing angles.These facts sometimes produce occlusions, perspective distortion, different lighting intensities, reflections, shadows, repetitive patterns, sensory noise, etc.All these facts convert a simple correspondence task in a very difficult one.
For this reason, an ideal configuration of the cameras is usually supposed.That is, they are only horizontally displaced and the focus lines are parallel.This ideal configuration and epipolar geometry allow to restrict the search of one point in the first image to the epipolar line in the second image, reducing the search space and hence, greatly decreasing the computational cost.
An exhaustive overview on stereo matching can be found in [1], while a complete introduction to stereo vision can be found in [2].Stereo matching algorithms can be classified into local and global methods.The local approaches compare the intensity levels of a finite window to determine the disparity for each pixel.These methods use different metrics or similarities to compare intensity levels such as SAD [3], SSD [4] or NCC [5], which are widely applied despite its simplicity due to their low computational complexity [6,7].Global approaches apply some global assumptions about smoothness and try to determine all the disparities at the same time by using different optimization techniques such as, graph cuts [8][9][10], belief propagation [11], etc.These methods usually start from a local disparity estimation.
The utilization of color information, specifically the use of RGB color space, improves the stereo matching results achieved with gray scale images [12][13][14].The extra information provided by color channels removes some ambiguities produced when the information is reduced to gray scale.Therefore, matching can be improved avoiding false correspondence matches, but we will show that this improvement is directly related with the aggregation function considered to add color similarities.Having more information might lead to worse results if it is not handled properly.
In general, there exist several techniques to calculate the disparity map using color informa-tion, but there is no agreement about which is the best color space to work with.In this work RGB representation is used following previous approaches [12,13].
Local search matching algorithms assign to every pixel of the right image a correspondence with another pixel of the left image.Using a local search algorithm with color images we will obtain for each pixel in the right image three correspondences (one per channel) in the left image.Usually these degrees will not match, so we will have different matching scores.A simple and efficient solution is to aggregate the similarity information of channels and then choose as corresponding pixel the one having the largest aggregated similarity value.
In this work we study the performance and influence of different aggregation operators, such as the arithmetic mean, the median, the minimum, etc.To do so, we use several test images from [1] which have been taken using the ideal configuration of the cameras.Our aim is to study their different behaviors among different measures used in stereo matching and image comparison [6,14].We empirically show that using the proper aggregation functions can produce significant better results, whereas using inappropriate ones could decrease the performance.Our objective is to find an aggregation for color similarities that works well whichever method (similarity measure) is used.In this sense, we want to study which aggregation is more robust.While some aggregations excels in some methods or images, our interest reside in finding an appropriate set of aggregations that can be safely used among different metrics and images.
This work is organized as follows: In Section 2 we remain the classical stereo matching algorithm and we present the metrics that we have considered.In Section 3 we present the aggregation operators that we are going to compare in Section 4, where the experimental study is carried out.Section 5 concludes this work.

Stereo matching for color images
In this section, we recall the typical steps of the classical stereo matching algorithm.Afterward, we present the different metrics and similarity measures used in the comparison.

Stereo matching algorithm
Minor changes have to be applied to transform the original stereo matching algorithm for gray scale images to color ones.In the first case, the algorithm computes the similarity of the window surrounding each pixel in the right image with several windows in the left image (considering the epipolar constraint and the maximum disparity).The pixel which surrounding window reaches the largest similarity degree is chosen and used to compute the disparity.
Regarding color images, we simply compute the correspondence between color channels independently, and then we aggregate these correspondence scores (similarity degrees).We can summarize the algorithm for color images as follows (in Fig. 1 we depict an overall view of the method): There are many different versions of the classical stereo matching algorithm.However, most of them use the scheme we have presented.The metric or the similarity measure used is usually the biggest difference between algorithms.Another key-factor of the algorithm is the aggregation function used to aggregate the color similarities.In the following subsection we present seven common similarity measures for stereo matching problem.Then, in Section 3 we present the aggregation functions considered for the empirical study.

Correspondence and similarity measures between windows
There exist several methods to compute the similarity between windows.The results (given by the obtained disparity maps) directly depends on these measures.In this paper, we study several metrics to show the behavior of different aggregations within each method.

Sum of Square Differences (SSD)
SSD [4] computes the matching score as the sum of the square differences between all pixels intensities from left window with respect to right window.Then, the disparity is computed with the one with the lowest value (largest correspondence), which indicates the most similar window.SSD can be expressed as follows: 2  (1) being x, y the position of the pixel, k the displacement of the left window respect to the right window, W the window (size n × m) considered and I r , I l right and left images respectively.

Sum of Absolute Differences (SAD)
SAD [3], computes the disparity in the same way as SSD, but using the absolute differences between pixel intensities instead of the square differences: NCC [5] is expressed by the following formula: The disparity is obtained from the k reaching the maximum value.

Fuzzy similarity (SM FS )
The fuzzy similarity introduce the fuzzy set theory to compute the correspondence between two windows.It is computed with the following expression [12]: where the parameter α = 16 is used generally [12].Disparity is computed with the pixel which similarity measure attains its maximum value.Distance-based similarities are widely used in image processing for image comparison techniques [15].Hence, they are appropriate to compare the correspondence between windows.The smaller the distance is, the greater similarity is obtained.In our experiments we consider two different cases.The first one is based on Minkowski distance d r with r = 1, that is equivalent to Manhattan distance, but in this case the measure is normalized by the sum of the intensities within windows (note that different w.r.t.Eq. ( 2)).We denote this measure as SM M [16,17]: The second one is based on the Kullback distance [18] between fuzzy sets.We denote this similarity as SM K : In both cases, we normalize the intensities of the pixels to the unit interval in such way that we can apply these similarities and then compute the disparity from the largest one.The concept of similarity from union and intersection operations also comes from fuzzy set theory [19], where the similarity between two fuzzy sets can be computed as the division of intersection's cardinality and the union's cardinality.In the same way as in distance-based methods, we normalize the intensities before applying this similarity.Then, the disparity is obtained from the largest output.The expression is as follows:

Aggregation functions
In our experiments we use RGB representation following previous works in this field [12,13].
Similarly to [12] we treat each color channel separately until we aggregate their similarity values.By using color information in the stereo matching algorithm we can avoid some false correspondence produced by color ambiguities.
In [12] they propose to use the minimum as aggregation function for this task, but some inconsistencies can be produced.For example, a pixel with low similarity values in all channels will have a larger matching score than another pixel with a great value of similarity in two channels and the other value near (but under) the similarities of the first pixel.Hence, the minimum would cause some undesirable matches (mismatches).Therefore, it is necessary to study of several aggregation functions to find which one is the most suitable (robust) to aggregate color similarities in the stereo matching problem.Next we present different aggregation functions that we will analyze in the experimental study presented in Section 4.

Note:
We denote a vector of n elements with x = {x 1 , x 2 , . . ., x n }.
• Product • Arithmetic Mean (A-Mean) x i • w i (11) where w = {w 1 , w 2 ,...,w n } is the weight vector that satisfies ∑ n i=1 w i = 1.In our comparison we consider different weight vectors to compute the final similarity: For example if w R = 0.1, w G = 0.8 and w B = 0.1, we obtain If w R = 0, 299, w G = 0, 5870 and w B = 0, 1140, we obtain The weights values of Eq. ( 14) belong to the computation of the luminance of a RGB image [20].The expression of luminance is used to transform RGB color images into gray scale.The purpose of luminance is to represent the brightness of colors just as human perceive them.In this manner, it represents that humans consider the color green brighter than the color blue.
• Harmonic Mean (H-Mean) • Median where x (k) is the k-th largest (or smallest) component of x.

Mode
The mode is the most frequent value in x.
We also consider in our experiments the dual aggregation function of the Geometric and Harmonic means constructed as:

Experimental results
In our experimental study, we compare the behavior of different aggregation functions to aggregate color similarities.We have three main objectives: • To check whether using color information by aggregating the similarities improves the results of using gray scale images.• To study which aggregation fits each correlation method or similarity measure in order to carry out a global analysis.• To study which is the best (most robust) aggregation function in order to combine color similarities among all methods.
In order to evaluate the performance we use the Middlebury test bed proposed by Scharstein and Szeliski [1] (http://cat.middlebury.edu/stereo),which is established as the common benchmark for stereo matching methods and allows one to easily reproduce the results obtained.In this test, the disparity maps obtained by each algorithm are compared with the ideal disparity maps.The test images are shown in Fig. 2 with their corresponding ideal disparity maps.We refer to each image pair by the name given in [1]: "Tsukuba", "Teddy" and "Cones".We have to recall that stereo matching algorithms do not use any type of preprocessing or post processing steps (such as, optimization techniques, occlusion detection or image filtering).As a particular case, we start presenting the quantitative results for the stereo matching algorithm using the fuzzy similarity measure (SM FS , Eq. ( 4)) and different aggregation functions to merge color channels similarities in Table 1.The leftmost column of Table 2 indicates the aggregation function used.Then, we present three columns for each image pair.These columns represent the percentage of absolute disparity error greater than one for three different regions in the image: • no-oc.: only non-occluded pixels are considered.• all: whole image is considered.
• disc.: only pixels near discontinuities are considered.
The rightmost column is the overall performance of the algorithm computed by the arithmetic mean of all other columns.The results are listed following the total error in descending order, and the row corresponding to the stereo algorithm applied to gray scale images is shaded to ease the comparison with respect to the performance using color images.Following Table 1, we can conclude that using color information is beneficial.Notice that the aggregations using color information performing worse than the usage of gray scale images mainly consider one of the coefficients in the aggregation (in the case of the weighted means, the weights give most of the importance to a unique channel).However, these statements are based on a unique similarity measure; hence, we should study whether these conclusions are maintained across all metrics.
Table 2 summarizes the total error obtained for the considered metrics and aggregations.The last two columns present the mean error for each aggregation and the average rank (and rank position) respectively.The average rank is obtained by assigning a position to each aggregation depending on its performance on each metric.The aggregation which achieves the best accuracy in a specific method will have the first ranking (value 1); then, the aggregation with the second best accuracy is assigned rank 2, and so forth.This task is carried out for all metrics and finally an average ranking is computed as the mean value of all rankings.It provides a quick view of the overall behavior of each aggregation with respect to the others.Bold numbers indicate the best aggregation (rank 1) among the method.In addition, Fig. 3 shows the comparison of the best disparity maps obtained using color aggregation (those stressed in bold-face in Table 2) and the ones obtained with gray scale images.
This experiment shows that the optimal aggregation function to be used depends on the stereo matching algorithm considered.However, the previously stated facts are confirmed, the usage of color information can improve the matching as long as all color channels are taken into account.The usage of color information may need a greater computational effort in order to compute the disparity map, but it can be really low since the process can be easily carried out in parallel, whereas the obtained improvements can make a difference.It is remarkable that the total error obtained using the weighted arithmetic mean based on the luminance formula is always better than using gray scale images.Gray scale images are obtained by the RGB luminance formula, and hence, it is advisable to use the three color channels to compute the matching independently, aggregating the matching scores instead of aggregating the information first and then applying the algorithm for a unique color channel.obtain gray scale images and then compute the similarities.That is, color information is useful for the matching process and must not be overlooked.
The experiment has shown that despite the optimum aggregation function depends on the metric used, there are robust aggregations such as the dual of the geometric and harmonic mean, the weighted arithmetic mean based on the luminance formula, the geometric mean or the product which performs properly in all metrics among all images and hence, whose usage can be recommended.

Fig. 1 .
Fig. 1.Stereo matching algorithm scheme using color information from RGB channels.

Table 1 .
Quantitative evaluation results for different aggregation functions to add color similarities using fuzzy similarity measure where *Weighted Mean 262 means that w R = 0.2, w G = 0.6 and w B = 0.2