REDUCTION OF THE FRONTO-PARALLEL BIAS FOR WIDE-BASELINE SEMI-GLOBAL MATCHING

Semi-Global Matching (SGM) is a widely used technique for dense image matching that is popular because of its accuracy and speed. While it works well for textured scenes, it can fail on slanted surfaces, particularly in wide-baseline configurations, due to the so-called fronto-parallel bias. In this paper, we propose an extension of SGM that utilizes image warping to reduce the fronto-parallel bias in the data term, based on estimating dominant slanted planes. The latter are also used as surface priors improving the smoothness term. Our proposed method calculates disparity maps for each dominant slanted plane and fuses them to obtain the final disparity map. In a quantitative evaluation on synthetic data, our approach outperforms SGM and SGM-P, and qualitative results on real data demonstrate its potential. In this way, we underscore the need to tackle the fronto-parallel bias, in particular for wide-baseline configurations, in both the data term and the smoothness term of SGM.


INTRODUCTION
Structure from Motion (SfM) and Multi-View Stereo (MVS) are fundamental tasks in Computer Vision and Photogrammetry. In order to obtain dense 3D information about a scene from a set of 2D images, SfM first simultaneously estimates sparse 3D geometry (structure) and camera poses (motion). From this, MVS then reconstructs a dense 3D point cloud. Both steps are based on the establishment of correspondences between images (image matching). While there are approaches that can cope with wide baselines for sparse image matching (Mishkin et al., 2015; Roth et al., 2017), dense image matching with wide baselines remains a challenging problem.
Dense image matching aims at computing the apparent motion between as many individual pixels of two images as possible. In the case of rectified images of a rigid scene, this motion is called disparity. One of the most widely used techniques for dense image matching is Semi-Global Matching (SGM) proposed by Hirschmüller (2005). It is popular because of its accuracy and speed and is, therefore, employed in a broad spectrum of applications ranging from 3D mapping (Hirschmüller, 2008; Rothermel et al., 2012; Kuhn et al., 2017) and the navigation of robots and unmanned aerial vehicles (UAVs) (Schmid et al., 2012) to autonomous driving (Franke et al., 2013). SGM has been implemented on different hardware architectures such as GPUs (Graphics Processing Units) (Banz et al., 2011) and FPGAs (Field-Programmable Gate Arrays) (Gehrig et al., 2009). While it works well for aerial images and terrestrial images with small baselines and sufficiently textured scenes mainly consisting of fronto-parallel surfaces, its performance drops significantly for wide-baseline images, in particular at higher resolutions, and with slanted, weakly textured surfaces.
The reason for this is that SGM, just like all other local or window-based methods for dense image matching, implicitly assumes that the disparity within the window considered for the calculation of the matching cost is constant (fronto-parallel bias). However, this assumption is only fulfilled if image plane and object plane are parallel (fronto-parallel). This is, for instance, approximately true for the wall shown in the bottom row of Figure 1. Comparing the two windows marked by the green squares, it is clear that the disparity at this position can be reliably determined: The image patches are almost identical, i.e., all pixels have the same disparity as the center pixel. On the other hand, for the ground shown in the middle row of Figure 1, image plane and object plane are not parallel. If one compares the two windows marked by the orange squares, it is obvious that the disparity at this position cannot be reliably determined: The image patches are hardly similar, as most of the pixels have a disparity different from that of the center pixel.
The aspect above refers to the data term of SGM, which describes the cost of matching a pixel at a certain disparity. However, similar to global methods, SGM also incorporates a smoothness term that penalizes changes in neighboring pixels' disparity. Since the fronto-parallel bias also occurs in the smoothness term, our main motivation is to reduce the fronto-parallel bias in both the data term and the smoothness term of SGM.
In this paper, we propose an extension of SGM that utilizes image warping to reduce the fronto-parallel bias in the data term, such that the calculation of the matching cost is no longer affected by it. For this purpose, we generate hypotheses for dominant slanted planes, using them for warping the images and as surface priors improving the smoothness term of SGM. We calculate disparity maps for each dominant slanted plane and fuse them to obtain the final disparity map.

RELATED WORK
In this section, we review related work, focusing on dense image matching methods that address the problem of the fronto-parallel bias in general and on SGM-based methods in particular.
Dense image matching methods are usually classified (Scharstein and Szeliski, 2002) into local and global methods. The former, also termed window-based methods, make implicit smoothness assumptions by aggregating the matching cost over a local window. Global methods, on the other hand, make explicit smoothness assumptions and then solve an optimization problem over all pixels.
Among the local methods, several approaches have been proposed that reduce the effect of the fronto-parallel bias through targeted warping of the input images. Burt et al. (1995) recommend warping the right image of a stereo image pair to align it with a reference plane, such as the ground, before performing dense image matching. They report improved performance at lower computational cost due to the reduced disparity range. Einecke and Eggert (2013) as well as Ranft and Strauss (2014) adopt the idea of warping one of the input images, parameterized by horizontal shear and shift. While the former set the parameters manually, the latter propose a procedure that dynamically generates hypotheses based on the scene structure. Disparity maps from both the differently warped image pairs and the original image pair are fused so that the final disparity map does not deteriorate in image regions that do not belong to one of the planes.
Other local methods that aim at reducing the effect of the fronto-parallel bias use oriented matching windows that adapt to the scene structure. PatchMatch stereo (Bleyer et al., 2011) initializes each pixel with a random disparity as well as a randomly slanted plane and iteratively propagates these parameters to neighboring pixels. Sinha et al. (2014) perform local slanted plane sweeps around disparity planes that are estimated from sparse feature correspondences. For the final disparity map, each pixel is assigned to one of the local plane hypotheses by an efficient optimization technique based on SGM. Among all the plane-sweeping approaches that succeeded Collins (1996), the one of Gallup et al. (2007) was the first to explicitly handle slanted planes. In (Bulatov et al., 2011), triangular meshes from a sparse point cloud are used to compensate for the fronto-parallel bias.
SGM-based methods address the problem of the fronto-parallel bias either by replacing the unweighted sum over the aggregated cost of each direction in SGM with a weighted sum or by manipulating the penalties of SGM's smoothness term. Michael et al. (2013) introduce both path-dependent weights and penalties, resulting in 20 parameters that are optimized by an evolutionary algorithm. Spangenberg et al. (2013) propose to weight the aggregated cost of each direction according to its compliance with the scene structure. While the above two approaches use global weights for each path, Poggi and Mattoccia (2016) predict per-pixel weights for each path, using random forests based on several disparity-based features. Random forests are also employed in (Schönberger et al., 2018), where disparity proposals estimated using features based on the aggregated cost of each direction are fused directly. SGM-Net (Seki and Pollefeys, 2017) is a CNN-based (Convolutional Neural Network) method that predicts the penalties of SGM's smoothness term. SGM-P (Scharstein et al., 2017) instead utilizes surface orientation priors to modify the penalties to favor surfaces coinciding with the expected scene structure.
Just like Burt et al. (1995), Einecke and Eggert (2013) and Ranft and Strauss (2014), our approach uses image warping to reduce the fronto-parallel bias. Nevertheless, this is novel in the context of SGM. Arguing that it is necessary to tackle the effect of the fronto-parallel bias in both the data term and the smoothness term, we adopt the approach of Scharstein et al. (2017) and incorporate it into our proposed extension. As we correct the effect of the fronto-parallel bias beforehand, there is no need to introduce a weighted sum in SGM's sum-based aggregation over the paths, as is done, e.g., in (Spangenberg et al., 2013). Finally, we note that we fuse disparity maps similarly to Sinha et al. (2014).

ALGORITHM
Before describing our proposed extension, we first give a review of SGM and SGM-P.

SGM and SGM-P
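SGM is an efficient algorithm for approximate energy minimization of a 2D Markov Random Field (MRF). It defines the energy function
$$E(D) = \sum_{p} C_p(d_p) + \sum_{(p,q) \in \mathcal{N}} V(d_p, d_q), \tag{1}$$
where $C_p(d)$ is a unary data term representing the cost of matching pixel $p$ at disparity $d \in \mathcal{D} = \{d_{\min}, \ldots, d_{\max}\}$ and $V(d, d')$ is a pairwise smoothness term penalizing changes in neighboring pixels' disparity:
$$V(d, d') = \begin{cases} 0 & \text{if } d = d', \\ P_1 & \text{if } |d - d'| = 1, \\ P_2 & \text{if } |d - d'| \geq 2. \end{cases} \tag{2}$$
It adds a constant penalty $P_1$ for small changes in disparity and a larger constant penalty $P_2$ for all larger disparity changes. This allows an adaptation to slanted surfaces, while preserving discontinuities at the same time. Unfortunately, this also introduces a fronto-parallel bias in the smoothness term, as only transitions with constant disparity remain unpenalized.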
As minimizing $E(D)$ from Eq. (1) is NP-hard, SGM divides the grid-shaped problem into multiple one-dimensional problems that can be efficiently solved via dynamic programming by defining an aggregated cost $L_r(p, d)$ along a path in the direction $r$:
$$L_r(p, d) = C_p(d) + \min_{d' \in \mathcal{D}} \bigl( L_r(p - r, d') + V(d, d') \bigr). \tag{3}$$
The aggregated cost $L_r(p, d)$ is recursively computed from the image boundaries for eight cardinal directions $r$ and summed up at each pixel, resulting in the aggregated cost volume
$$S(p, d) = \sum_{r} L_r(p, d). \tag{4}$$
The final disparity at each pixel is chosen by a winner-takes-all strategy:
$$d_p = \arg\min_{d} S(p, d). \tag{5}$$
The sum of the minima of the aggregated cost $L_r(p, d)$ of these eight paths represents a lower bound for the minimum of the aggregated cost volume $S(p, d)$ for each pixel $p$. The difference between these two quantities defines an uncertainty measure $U_p$ (Drory et al., 2014):
$$U_p = \min_{d} \sum_{r} L_r(p, d) - \sum_{r} \min_{d} L_r(p, d). \tag{6}$$
If the minima of the aggregated cost of all eight directions agree (they all occur at the same disparity), then $U_p$ equals zero. This is often the case in image regions with textured, fronto-parallel surfaces, where wrong disparities would lead to high matching costs. In image regions with weakly textured, slanted surfaces, in contrast, different disparities can cause similarly high matching costs. Therefore, the minima of the aggregated costs probably occur at different disparities, causing $U_p$ to be greater than zero. We use this uncertainty measure to fuse different disparity maps.
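The path aggregation, winner-takes-all selection and uncertainty measure described above can be sketched in a few lines of NumPy. This is a minimal, unoptimized illustration and not the implementation used in our experiments: the function names `aggregate_path` and `sgm` are hypothetical, the cost volume is assumed to be indexed as (row, column, disparity index), and the recursion is written without the usual normalization of the aggregated cost.

```python
import numpy as np

# Eight path directions r = (dy, dx).
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]


def aggregate_path(cost, dy, dx, p1=8.0, p2=32.0):
    """Aggregate a cost volume of shape (H, W, D) along one direction r = (dy, dx)."""
    h, w, _ = cost.shape
    agg = np.empty(cost.shape, dtype=np.float64)
    # Visit pixels so that the predecessor p - r has always been processed.
    ys = range(h) if dy >= 0 else range(h - 1, -1, -1)
    xs = range(w) if dx >= 0 else range(w - 1, -1, -1)
    for y in ys:
        for x in xs:
            py, px = y - dy, x - dx
            if not (0 <= py < h and 0 <= px < w):
                agg[y, x] = cost[y, x]  # the path starts at the image boundary
                continue
            prev = agg[py, px]
            minus = np.concatenate(([np.inf], prev[:-1])) + p1  # d' = d - 1
            plus = np.concatenate((prev[1:], [np.inf])) + p1    # d' = d + 1
            jump = prev.min() + p2                               # larger changes
            best = np.minimum(np.minimum(prev, minus), np.minimum(plus, jump))
            agg[y, x] = cost[y, x] + best
    return agg


def sgm(cost, p1=8.0, p2=32.0):
    """Return the winner-takes-all disparity index and the uncertainty per pixel."""
    paths = [aggregate_path(cost, dy, dx, p1, p2) for dy, dx in DIRECTIONS]
    total = np.sum(paths, axis=0)                    # aggregated cost volume S(p, d)
    disparity = total.argmin(axis=2)                 # winner-takes-all
    lower_bound = sum(p.min(axis=2) for p in paths)  # sum of the per-path minima
    uncertainty = total.min(axis=2) - lower_bound    # zero if all paths agree
    return disparity, uncertainty
```

For a cost volume with one unambiguous minimum per pixel, all eight paths agree and the uncertainty is zero; for ambiguous costs it becomes positive.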
In SGM-P, surface priors are utilized to modify the penalties in SGM's smoothness term to favor these surfaces. This is done by first rasterizing a real-valued disparity surface prior $S$, as SGM uses discrete (integer) disparities:
$$\hat{S}(p) = \operatorname{round}\bigl(S(p)\bigr), \tag{7}$$
with steps (or jumps)
$$j_r(p) = \hat{S}(p) - \hat{S}(p - r) \tag{8}$$
for the discretized disparities $\hat{S}$. The original smoothness term $V$ is then replaced with
$$V_S(d, d') = V\bigl(d, d' + j_r(p)\bigr). \tag{9}$$
By this means, the zero-cost transitions coincide with the disparity jumps. As we want to tackle the fronto-parallel bias in both the data term and the smoothness term, we incorporate SGM-P into our proposed extension and feed it with the same hypotheses for dominant slanted planes that we use for warping the images.

Generation of Hypotheses for Dominant Slanted Planes
Since our approach (see Algorithm 1) is to be used in the classic SfM/MVS pipeline, we assume that a sparse point cloud is available from SfM. From this sparse point cloud, we generate hypotheses for dominant slanted planes $\Pi$ and use them for warping the images. It is not our goal to improve the disparity map over the entire image by finding as many planes as possible. By only considering dominant slanted planes, we aim at improving the disparity map in image regions that could only be poorly reconstructed or are partially or even completely missing due to the fronto-parallel bias. In urban environments, these are often the ground, facade or roof planes. We use RANSAC (Random Sample Consensus) (Fischler and Bolles, 1981; Schnabel et al., 2007) to find these dominant slanted planes in the sparse SfM point cloud.
If no sparse point cloud is available, the disparity map $D_0$ calculated with original SGM in the first step of our algorithm (cf. Algorithm 1) is used to search for dominant slanted planes. In this case, the search is performed in disparity space rather than in 3D space. Since we only consider dominant slanted planes, calculating the disparity map $D_0$ is always necessary to obtain a complete disparity map $D$ at the end. Almost fronto-parallel planes, for which the angle between the normal and the cameras' orientation is smaller than $\alpha_{\mathrm{fp}}$ (we used an empirically determined angle $\alpha_{\mathrm{fp}} = 60^\circ$ in our experiments), are discarded and not further considered. For these planes, reliable estimates should already be provided by the disparity map $D_0$.
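The plane search and the rejection of almost fronto-parallel planes can be illustrated with a basic single-plane RANSAC. The following sketch makes several assumptions that are not taken from the text above: the names `fit_plane`, `ransac_plane` and `is_slanted` are hypothetical, the cameras' orientation is taken to be the z-axis, the sign of the normal is ignored when computing the angle, and a plain three-point RANSAC stands in for the cited variants.

```python
import math

import numpy as np


def fit_plane(points):
    """Least-squares plane through points (N, 3): unit normal n and offset a with n.x + a = 0."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]  # direction of smallest variance
    return normal, -float(normal @ centroid)


def ransac_plane(points, dist_thresh=2.0, confidence=0.99, max_iters=1000, seed=0):
    """Find the dominant plane; returns (normal, offset, inlier mask)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(points), dtype=bool)
    needed, i = float(max_iters), 0
    while i < needed:
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal, offset = fit_plane(sample)
        inliers = np.abs(points @ normal + offset) < dist_thresh
        if inliers.sum() > best.sum():
            best = inliers
            p_good = float(inliers.mean()) ** 3  # chance of an all-inlier sample
            if p_good >= 1.0:
                needed = 0.0
            else:
                # Adaptive number of iterations for the requested confidence.
                needed = min(max_iters, math.log(1.0 - confidence) / math.log1p(-p_good))
        i += 1
    normal, offset = fit_plane(points[best])  # refit on all inliers
    return normal, offset, best


def is_slanted(normal, view_dir=(0.0, 0.0, 1.0), alpha_fp_deg=60.0):
    """True if the angle between plane normal and viewing direction is at least alpha_fp."""
    normal, view_dir = np.asarray(normal, float), np.asarray(view_dir, float)
    cos_angle = abs(normal @ view_dir) / (np.linalg.norm(normal) * np.linalg.norm(view_dir))
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    return angle >= alpha_fp_deg
```

To find several dominant planes, one would typically remove the inliers of each detected plane and repeat the search; planes for which `is_slanted` returns false are discarded.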

Improving SGM's Data Term and Smoothness Term
Based on the generated hypotheses for dominant slanted planes $\Pi$, we utilize image warping to improve the data term of SGM. Specifically, we warp the right image so that, with respect to the considered plane, the window used for the calculation of the matching cost coincides with the corresponding window in the left image. We use a plane-induced homography (Hartley and Zisserman, 2004). With the left camera placed at the origin and the camera projection matrices $P_1 = K_1 [I \mid 0]$ and $P_2 = K_2 [R \mid t]$ for the left camera and the right camera, respectively, the plane-induced homography from the left image to the right image is given by
$$H_\pi = K_2 \left( R - \frac{t\, n^\top}{a} \right) K_1^{-1} \tag{10}$$
for a plane $\pi = (n^\top, a)^\top$ with normal $n$ and distance $a$ to the origin. As we map from the right image to the left image, we use the inverse of the matrix $H_\pi$ from Eq. (10).
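As a small numerical check of the plane-induced homography, the following sketch builds the matrix for a plane satisfying n·X + a = 0 in the left camera frame and verifies that a 3D point on the plane is mapped consistently between the two images. The function names and all numeric values (intrinsics, baseline, ground plane) are assumptions chosen for illustration only.

```python
import numpy as np


def plane_homography(K1, K2, R, t, n, a):
    """Homography mapping left-image points to right-image points for the plane n.X + a = 0."""
    return K2 @ (R - np.outer(t, n) / a) @ np.linalg.inv(K1)


def apply_homography(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of pixel coordinates."""
    homogeneous = np.column_stack([pts, np.ones(len(pts))])
    mapped = homogeneous @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Illustrative rectified setup: identical intrinsics, pure horizontal translation.
K = np.array([[1000.0, 0.0, 480.0], [0.0, 1000.0, 270.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([-0.5, 0.0, 0.0])  # baseline of 0.5 units
n, a = np.array([0.0, 1.0, 0.0]), -1.5  # ground plane y = 1.5

H = plane_homography(K, K, R, t, n, a)
H_inv = np.linalg.inv(H)  # maps right-image points back to the left image
```

Warping the right image with the inverse homography aligns the image content of the considered plane with the left image, so that windows on this plane again have approximately constant (residual) disparity.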
The generated hypotheses for slanted planes $\Pi$ are also used to calculate the surface priors that manipulate the smoothness term in SGM to favor these surfaces. The surface prior $S_\pi$ for a plane $\pi$ with the plane equation $n_x x + n_y y + n_z z + a = 0$ can be calculated in the following way. For a perspective camera with focal length $f$, we have $x = (u - u_0)\, z / f$ and $y = (v - v_0)\, z / f$ for image coordinates $(u, v)$ with $(u_0, v_0)$ being the camera's principal point. Substituting these quantities into the plane equation results in
$$\bigl( n_x (u - u_0) + n_y (v - v_0) + n_z f \bigr) \frac{z}{f} + a = 0. \tag{11}$$
For a rectified stereo pair, we also have $z = b f / d$, where $b$ and $d$ are the baseline between the cameras and the disparity, respectively. The disparity $d$ to be expected for an image point $(u, v)$ lying on plane $\pi$ is then given by
$$d = S_\pi(u, v) = -\frac{b}{a} \bigl( n_x (u - u_0) + n_y (v - v_0) + n_z f \bigr). \tag{12}$$
Besides calculating the surface prior $S_\pi$ for a plane $\pi$ to modify the penalties in the smoothness term, our proposed method uses Eq. (12) to limit the disparity search space.
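The expected disparity of a plane can be evaluated for every pixel at once. The sketch below computes this surface prior from the relations above (plane equation, perspective projection and z = bf/d) and, as one possible reading of the rasterization step of SGM-P, derives the integer disparity jumps along a path direction. The function names are hypothetical, and rounding to the nearest integer is an assumption.

```python
import numpy as np


def surface_prior(n, a, f, u0, v0, baseline, width, height):
    """Expected disparity at every pixel (u, v) for the plane n.X + a = 0."""
    v, u = np.mgrid[0:height, 0:width]
    return -baseline * (n[0] * (u - u0) + n[1] * (v - v0) + n[2] * f) / a


def path_jumps(prior, dy, dx):
    """Integer disparity jumps between each pixel p and its predecessor p - r, r = (dy, dx)."""
    s_hat = np.rint(prior).astype(np.int64)  # assumed rasterization: nearest integer
    h, w = s_hat.shape
    jumps = np.zeros_like(s_hat)  # pixels without predecessor keep a jump of zero
    cur_y = slice(max(dy, 0), h + min(dy, 0))
    cur_x = slice(max(dx, 0), w + min(dx, 0))
    prev_y = slice(max(-dy, 0), h + min(-dy, 0))
    prev_x = slice(max(-dx, 0), w + min(-dx, 0))
    jumps[cur_y, cur_x] = s_hat[cur_y, cur_x] - s_hat[prev_y, prev_x]
    return jumps
```

For a ground plane y = 1.5 with f = 1000, v0 = 270 and b = 0.5, the prior grows linearly with the image row (one disparity level every three rows), so the jumps along a vertical path are 0 or 1, while they vanish along horizontal paths.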
For each dominant slanted plane $\pi_i \in \Pi = \{\pi_1, \ldots, \pi_n\}$, our proposed extension of SGM calculates a disparity map $D_{\pi_i}$ as well as an uncertainty map $U_{\pi_i}$ using Eq. (6). Thus, the fronto-parallel bias is reduced in both the data term and the smoothness term.
In order to limit the calculation of the disparity map $D_{\pi_i}$ to the corresponding image regions, we estimate the approximate extent of the considered dominant slanted plane $\pi_i$ in the images. For this, we use image segmentation, particularly GrabCut (Rother et al., 2004), applied to the down-scaled images, with the foreground pixels being initialized with regions around the 3D points belonging to the considered plane projected into the images.

Fusion of Disparity Maps
In the last step of our algorithm (see Algorithm 1), the disparity maps $D_0, D_{\pi_1}, \ldots, D_{\pi_n}$ are fused to form the final disparity map $D$ based on the uncertainty maps $U_0, U_{\pi_1}, \ldots, U_{\pi_n}$. We follow the idea of Sinha et al. (2014) and formulate this fusion as a pixel labeling problem, where each pixel $p$ has to be assigned a label $l$. In our case, these labels $l$ are equivalent to the disparity maps $D_0, D_{\pi_1}, \ldots, D_{\pi_n}$. The optimal assignment $L$ is computed by minimizing the energy function
$$E(L) = \sum_{p} U_{l_p}(p) + \sum_{(p,q) \in \mathcal{N}} V(l_p, l_q), \tag{13}$$
where the uncertainty maps $U_0, U_{\pi_1}, \ldots, U_{\pi_n}$ are used as unary data term. As there is no order among the labels $l$, the pairwise smoothness term differs from Eq. (2) in equally penalizing varying labels between neighboring pixels with a constant penalty $P$. We also use SGM to efficiently obtain an approximate solution for this optimization problem and, thus, the final disparity map $D$.
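The fusion step can be sketched as an SGM-style labeling with a constant penalty for any label change. The code below is a simplified illustration with hypothetical function names: it treats the stacked uncertainty maps as a cost volume over labels, aggregates along eight directions, and selects for each pixel the disparity of the winning label.

```python
import numpy as np

DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]


def aggregate_labels(unary, dy, dx, penalty):
    """Aggregate label costs (H, W, n_labels) along r = (dy, dx) with a constant penalty."""
    h, w, _ = unary.shape
    agg = np.empty(unary.shape, dtype=np.float64)
    ys = range(h) if dy >= 0 else range(h - 1, -1, -1)
    xs = range(w) if dx >= 0 else range(w - 1, -1, -1)
    for y in ys:
        for x in xs:
            py, px = y - dy, x - dx
            if 0 <= py < h and 0 <= px < w:
                prev = agg[py, px]
                # Keep the label at no cost or switch to any other label for `penalty`.
                agg[y, x] = unary[y, x] + np.minimum(prev, prev.min() + penalty)
            else:
                agg[y, x] = unary[y, x]
    return agg


def fuse_disparity_maps(disparities, uncertainties, penalty=32.0):
    """Fuse disparity maps (list of (H, W) arrays) based on their uncertainty maps."""
    unary = np.stack(uncertainties, axis=2).astype(np.float64)
    total = sum(aggregate_labels(unary, dy, dx, penalty) for dy, dx in DIRECTIONS)
    labels = total.argmin(axis=2)  # index of the selected disparity map per pixel
    stack = np.stack(disparities, axis=2)
    fused = np.take_along_axis(stack, labels[..., None], axis=2)[..., 0]
    return fused, labels
```

Compared with picking the least uncertain map independently at every pixel, the penalty suppresses isolated label changes: a single pixel whose uncertainty slightly favors another map keeps the label of its neighborhood.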

EXPERIMENTS
We report a quantitative evaluation on synthetic data as well as qualitative results on real data demonstrating the potential of our proposed method. In this way, we underscore the need to tackle the fronto-parallel bias in both the data term and the smoothness term of SGM, in particular for wide-baseline configurations.

Implementation Details
We compare our approach against SGM and SGM-P, emphasizing the individual contributions of reducing the fronto-parallel bias in the data term and the smoothness term. In order to ensure an unbiased evaluation, we build on the same implementation of SGM for all experiments. We use the OpenCV implementation (Bradski, 2000), extended to allow the Census transform (Zabih and Woodfill, 1994) to be used as a cost function and to calculate uncertainty maps. For all experiments, the employed parameters of SGM's smoothness term are $P_1 = 8$ and $P_2 = 32$. For the data term, we rely on the Census transform calculated on patches of 7 × 7 pixels. Since we also use SGM for the fusion of disparity maps, we note that, in this case, the constant penalty in the smoothness term is $P = 32$.
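For illustration, a Census-based matching cost on 7 × 7 patches can be written as follows. This is a plain NumPy sketch and not the extended OpenCV code used in our experiments; the function names, the edge-replicating border handling and the maximum cost assigned to pixels without a valid match are assumptions.

```python
import numpy as np


def census_transform(image, window=7):
    """Binary descriptor per pixel: which neighbors in the window are darker than the center."""
    r = window // 2
    h, w = image.shape
    padded = np.pad(image, r, mode="edge")
    bits = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue  # the center pixel is not compared with itself
            neighbor = padded[r + dy : r + dy + h, r + dx : r + dx + w]
            bits.append(neighbor < image)
    return np.stack(bits, axis=2)  # shape (H, W, window * window - 1)


def census_cost_volume(left, right, max_disp, window=7):
    """Hamming distance between Census descriptors for disparities 0..max_disp."""
    census_left = census_transform(left, window)
    census_right = census_transform(right, window)
    h, w, n_bits = census_left.shape
    cost = np.full((h, w, max_disp + 1), n_bits, dtype=np.int32)
    for d in range(max_disp + 1):
        # Left pixel (y, x) is compared with right pixel (y, x - d).
        differing = census_left[:, d:] != census_right[:, : w - d]
        cost[:, d:, d] = np.count_nonzero(differing, axis=2)
    return cost
```

The resulting cost volume has values between 0 and 48 for a 7 × 7 window and can serve directly as the data term of SGM.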
When comparing our proposed method against SGM-P, we use identical hypotheses for dominant slanted planes for both. In contrast to (Scharstein et al., 2017), where a single surface prior is derived, we create a surface prior for each dominant slanted plane and fuse the corresponding disparity maps, just like for our approach, to allow a fair comparison. By this means, differences in the final disparity maps from SGM-P and our proposed method express the effect of additionally reducing the fronto-parallel bias in the data term. On the other hand, the effect of reducing the fronto-parallel bias in both the data term and the smoothness term can be seen when comparing the final disparity maps from original SGM and our approach.
If the dominant slanted planes are estimated in the (isotropic) disparity space, the RANSAC parameters are fixed to 0.99 for the confidence threshold and to 2.0 for the distance threshold. For sparse (anisotropic) SfM point clouds, the RANSAC parameters have to be adapted. We discard almost fronto-parallel planes, for which the angle between the normal and the cameras' orientation is smaller than $\alpha_{\mathrm{fp}} = 60^\circ$ (empirically determined). These parameters are the same for SGM-P as for our proposed method.

Quantitative Evaluation
We start by evaluating our approach on the Driving dataset of Mayer et al. (2016). This synthetic dataset, inspired by the KITTI dataset (Geiger et al., 2012), provides between 300 and 800 stereo image pairs with a resolution of 960 × 540 pixels for each setup.
Besides the virtual focal length (15 mm or 35 mm), the setups differ in the "speed" at which they were recorded (fast or slow), causing more or less motion blur and defocus blur. We consider the following four setups, using only the backwards scenes: (35 mm, slow), (35 mm, fast), (15 mm, slow) and (15 mm, fast). As ground-truth disparity maps are available for all setups, we use the Driving dataset to quantitatively evaluate our proposed method. For this purpose, we consider the disparity error, i.e., the percentage of pixels in the image whose disparity differs by more than 2.0 pixels from the ground truth. In Figures 3 and 4, this disparity error is plotted against the image number for the 35 mm and the 15 mm setups, respectively, comparing our approach with SGM and SGM-P. For all four setups, our approach's curve is below those of SGM and SGM-P, indicating that it clearly outperforms the other two. Our proposed method virtually never performs worse than SGM or SGM-P. It performs worst if no dominant slanted planes are found. In this case, our approach just returns the SGM disparity map. This happens several times for image pairs around 150 in the slow setups. Since the fast and slow setups come from the same trajectories, but with different distances, the shape of the curves is similar (cf., e.g., Figures 3a and 3b). Table 1 shows the mean disparity error reduction, ranging from 23 to 35% over SGM and from 8 to 24% over SGM-P. It is evident that the improvement decreases from the 15 mm (wide-baseline) to the 35 mm (small-baseline) setups. This is in particular true for SGM-P. The results confirm what intuition tells us: Our approach is particularly suitable for wide-baseline image pairs, whereas for small-baseline image pairs, the improvement over SGM-P is significantly smaller. For this reason, we refrain from a quantitative evaluation on well-known stereo benchmarks with small-baseline image pairs such as KITTI or ETH3D (Schöps et al., 2017). Instead, we focus on qualitative results demonstrating the potential of our proposed method on meaningful examples.
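The disparity error used above can be computed as in the following sketch. The function name is hypothetical, and two details are assumptions not specified in the text: pixels without a finite ground-truth value are excluded, and pixels without a finite estimated disparity are counted as errors.

```python
import numpy as np


def disparity_error(disparity, ground_truth, threshold=2.0):
    """Percentage of pixels whose disparity differs by more than `threshold` from the ground truth."""
    valid = np.isfinite(ground_truth)      # assumption: ignore pixels without ground truth
    missing = ~np.isfinite(disparity)      # assumption: missing estimates count as errors
    bad = missing | (np.abs(disparity - ground_truth) > threshold)
    return 100.0 * np.count_nonzero(bad & valid) / np.count_nonzero(valid)


ground_truth = np.full((2, 5), 10.0)
estimate = ground_truth.copy()
estimate[0, 0] = 13.0   # off by 3.0 pixels: counted as an error
estimate[0, 1] = 11.9   # off by 1.9 pixels: within the threshold
estimate[1, 4] = 7.5    # off by 2.5 pixels: counted as an error
error = disparity_error(estimate, ground_truth)  # 2 of 10 pixels, i.e. 20 %
```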


Qualitative Results
Besides examples from the synthetic Driving dataset, we demonstrate the potential of our approach on image pairs from the Middlebury dataset (Scharstein et al., 2014), from the multi-view dataset of Strecha et al. (2008), and from our own images. While for the Driving and the Middlebury dataset, dominant slanted planes are estimated in the disparity space, for the others these are estimated in the sparse point cloud obtained using the wide-baseline SfM technique of Mayer et al. (2012) as well as Michelini and Mayer (2016). The resolution is about six megapixels across all image pairs. Two examples from the Driving dataset are shown in Figure 5. Due to the characteristics of the dataset, usually the ground and, more rarely, facade planes are found as dominant slanted planes.
In particular, the strongly distorted image regions in the foreground are completely reconstructed by our proposed method, in contrast to SGM. As the CrusadeP image pair from the Middlebury dataset and image pair 1 in Figure 6 show, our approach is not limited to synthetic KITTI-like data. Nevertheless, our proposed method strongly relies on the scene structure, presuming dominant slanted planes. We aim at reconstructing these image regions, as they are potentially missing in the disparity map due to the fronto-parallel bias. For most of the image pairs from the Middlebury dataset, we did not succeed in finding dominant slanted planes. Please note that, in this case, our approach still returns the SGM disparity map. As our main use case is scenes in urban environments, Figure 7 shows two examples from the multi-view dataset of Strecha et al. (2008), more precisely the fountain-P11 (7,4) image pair and the Herz-Jesu-P25 (15,16) image pair, along with two more examples acquired by us. Our approach significantly improves the disparity maps in image regions belonging to the ground for all four image pairs. In addition, the roof, which is largely missing in the disparity map of SGM, is reconstructed for image pair 3.

CONCLUSION
We have proposed an extension of SGM that tackles the fronto-parallel bias in both the data term and the smoothness term. It utilizes image warping to reduce the fronto-parallel bias in the data term. Hypotheses for dominant slanted planes are generated either from the sparse SfM point cloud or from the SGM disparity map, being used as surface priors to improve the smoothness term. Our approach calculates disparity maps for each dominant slanted plane and fuses them to obtain the final disparity map.
Our proposed method has been quantitatively evaluated on synthetic data, where it outperforms SGM and SGM-P, underscoring the need to tackle the fronto-parallel bias in both the data term and the smoothness term of SGM, in particular for wide-baseline configurations. Qualitative results on real data demonstrate its potential.
As our approach strongly relies on the robust detection of dominant slanted planes, future work includes assisting their detection by semantic image analysis.
Figure 1. Illustration of the fronto-parallel bias: In the bottom row, the image patches marked by the green squares are almost identical, because the disparity inside the window is constant. The disparity can be reliably determined. The image patches marked by the orange squares (middle row) are hardly similar, since image plane and object plane are not parallel. The disparity cannot be reliably determined.

Figure 2 shows, by way of example, the individual steps of our approach for the image pair from Figure 1.

Figure 2. Individual steps of our approach, shown by way of example for the image pair from Figure 1.

Figure 6. Qualitative results for two image pairs. CrusadeP is from the Middlebury dataset; image pair 1 was acquired by us. Top: Left image. Middle: Disparity map (SGM). Bottom: Disparity map (our approach).
Input: rectified stereo image pair, sparse SfM point cloud (optional)
Output: disparity map $D$
Variables: SGM parameters, RANSAC parameters, $\alpha_{\mathrm{fp}}$
Calculate disparity map $D_0$ and uncertainty map $U_0$ with original SGM
Generate hypotheses for dominant slanted planes $\Pi$ with RANSAC from sparse SfM point cloud (or disparity map), discarding almost fronto-parallel planes
for each dominant slanted plane $\pi_i \in \Pi = \{\pi_1, \ldots, \pi_n\}$ do
    Estimate approximate image extent of $\pi_i$ with GrabCut
    Calculate disparity map $D_{\pi_i}$ and uncertainty map $U_{\pi_i}$ with our proposed extension of SGM, improving the data term by image warping with $H_{\pi_i}$ and improving the smoothness term by manipulating penalties according to $S_{\pi_i}$
end
Fuse disparity maps $D_0, D_{\pi_1}, \ldots, D_{\pi_n}$ to final disparity map $D$ based on uncertainty maps $U_0, U_{\pi_1}, \ldots, U_{\pi_n}$ with SGM
Algorithm 1. Our proposed method.
Hypotheses for Dominant Slanted PlanesSince our approach (see Algorithm 1) is to be used in the classic SfM/MVS pipeline, we assume that a sparse point cloud is available from SfM.From this sparse point cloud, we generate hypotheses for dominant slanted planes Π and use them for warping

Table 1. Mean disparity error reduction over SGM and SGM-P on the Driving dataset.