A Framework for SAR-Optical Stereogrammetry over Urban Areas

Currently, numerous remote sensing satellites provide a huge volume of diverse earth observation data. As these data show different features regarding resolution, accuracy, coverage, and spectral imaging ability, fusion techniques are required to integrate the different properties of each sensor and produce useful information. For example, synthetic aperture radar (SAR) data can be fused with optical imagery to produce 3D information using stereogrammetric methods. The main focus of this study is to investigate the possibility of applying a stereogrammetry pipeline to very-high-resolution (VHR) SAR-optical image pairs. For this purpose, the applicability of semi-global matching is investigated in this unconventional multi-sensor setting. To support the image matching by reducing the search space and accelerating the identification of correct, reliable matches, the possibility of establishing an epipolarity constraint for VHR SAR-optical image pairs is investigated as well. In addition, it is shown that the absolute geolocation accuracy of VHR optical imagery with respect to VHR SAR imagery such as provided by TerraSAR-X can be improved by a multi-sensor block adjustment formulation based on rational polynomial coefficients. Finally, the feasibility of generating point clouds with a median accuracy of about 2m is demonstrated and confirms the potential of 3D reconstruction from SAR-optical image pairs over urban areas.


Introduction
Three-dimensional reconstruction from remote sensing data has a range of applications across different fields, such as urban 3D modeling and management, environmental studies, and geographic information systems.Manifold high-resolution sensors in space provide the possibility of reconstructing natural and man-made landscapes over large-scale areas.Conventionally, 3D reconstruction in remote sensing is either based on exploiting phase information provided by interferometric SAR, or on space intersection in the frame of photogrammetry with optical images or radargrammetry with SAR image pairs.In all these stereogrammetric approaches, at least two overlapping images are required to extract 3D spatial information.Both photogrammetry and radargrammetry, however, suffer from several drawbacks.Photogrammetry using high-resolution optical imagery is limited by relatively poor absolute localization accuracy and cloud effects, whereas radargrammetry suffers from the difficulty of image matching for severely different oblique viewing angles.
On the other hand, the huge archives of high-resolution SAR images provided by satellites such as TerraSAR-X and the regular availability of new data alongside archives of high-resolution optical imagery provided by sensors such as WorldView provide a great opportunity to investigate data fusion pipelines for producing 3D spatial information [1].As relatively few studies have dealt with 3D reconstruction from SAR-optical image pairs [2,3,4], there has been no investigation into the feasibility of a dense multi-sensor stereo pipeline as known from photogrammetric computer vision yet.This paper investigates the possibility of implementing such a pipeline, and describes all processing steps required for 3D reconstruction from very-high-resolution (VHR) SAR-optical image pairs.In detail, this paper discusses both an epipolarity constraint and a bundle adjustment formulation for SAR-optical multi-sensor stereogrammetry first.Regarding the complicated radiometric relationship between SAR and optical imagery, the epipolarity constraint accelerates the matching process and helps to identify reliable and correct conjugate points [5,6].For this objective, we first demonstrate the existence of an epipolarity constraint for SAR-optical imagery by reconstructing the rigorous geometry models of SAR and optical sensors using both collinearity and range-Doppler relationships.We prove that a SAR-optical epipolarity constraint can be rigorously modeled using the sensor geometries.Subsequently, rational polynomial coefficients (RPCs) are fitted to the SAR sensor geometry to ease further processing steps.Consequently, epipolar curves can be established using projection and back-projection from SAR imagery to terrain and then from terrain to optical imagery using RPCs.In addition, the RPCs ease the formulation of multi-sensor block adjustment for SAR-optical imagery.
The block adjustment is used to align the optical imagery with respect to the SAR data.Generally, the absolute geolocalization accuracy of optical satellite imagery is lower than that of modern SAR sensors.Evaluations show that the absolute accuracy of geopositioning using TerraSAR-X imagery is within a single resolution cell in both the azimuth and range directions, and can even go down to the cm-level [7].In contrast, the absolute accuracy of geolocalization using basic WorldView-2 products is generally no better than 3m [8].Consequently, the block adjustment propagates the high geometrical accuracy of SAR data into the final 3D product, thus avoiding the need for external control points.
The main stage of SAR-optical stereogrammetry, however, is a dense matching algorithm for 3D reconstruction [9].In this study, we use the semi-global matching (SGM) method, which incorporates both mutual information and census, as well as their weighted sum as cost functions, in its core.
The remainder of this paper is organized as follows.First, the modeling of SAR sensor geometries with RPCs is explained in Section 2.1.After briefly introducing the epipolarity constraint and its benefits, a mathematical proof of this constraint for SAR-optical image pairs is presented in Section 2.2.In Section 2.3, the application of multi-sensor block adjustment using RPCs for SAR-optical image pairs is introduced.The principle of the SGM algorithm is recapitulated in Section 2.4.Section 3 summarizes experiments and results of our implementation of the SAR-optical stereogrammetry workflow for TerraSAR-X/WorldView-2 image pairs over two urban study areas.Based on these results, the feasibility of stereogrammetric 3D reconstruction from SAR-optical image pairs over urban areas, as well as its advantages and limitations, are discussed in Section 4. Finally, Section 5 presents the conclusions to this study.

SAR-Optical Stereogrammetry
Figure 1 shows the general framework of SAR-optical stereogrammetric 3D reconstruction.Similar to optical stereogrammetry, one grayscale optical image and one amplitude SAR image form a stereo image pair that can be processed by suitable matching methods to find all possible conjugate pixels.However, some important pre-processing steps are required before the matching and 3D reconstruction.Currently, most VHR optical images are delivered using RPCs.Thus, the primary step in the SAR-optical stereogrammetry framework is to estimate the RPCs for SAR imagery as well.This process homogenizes the geometry models of both sensors and simplifies the subsequent processes of SAR-optical block adjustment and establishing an epipolarity constraint.The next phase is to carry out multi-sensor block adjustment to align the optical image to the SAR image.This rectifies the RPCs of the optical imagery with respect to the SAR imagery, thus improving the absolute geolocalization of the optical imagery and correcting the positions of the epipolar curves on the optical imagery.A disparity map is then produced in the frame of the reference image via a dense image matching algorithm such as SGM.From this map, the 3D positions of the points can be determined by reconstructing the geometry of the SAR and optical imagery for a particular exposure.However, the success of the aforementioned framework relies on the possibility of establishing an epipolarity constraint for SAR-optical image pairs.Thus, the existence of the epipolarity constraint for SAR-optical image pairs must be investigated.In the following, the details of each step of the SAR-optical stereogrammetry framework are explained and the potential of using an epipolarity constraint for SAR-optical image pairs is investigated.RPCs are a well-established substitute for the rigorously derived optical imaging model.They are widely used for different purposes such as epipolar curve reconstruction [10], block adjustment [11], space resection-intersection and 3D reconstruction [12,13,14,15,16] or image rectification [17].The relation between the image space and the geographic reference system is created by the rational functions [18] and where r, c are normalized image coordinates, i.e. normalized rows and columns of points in the scene and φ, λ, and h denote the normalized latitude, longitude, and height of the respective ground point.The relationship between normalized and un-normalized coordinates is given by [ where X is the normalized coordinate, X u is the un-normalized value of the coordinate, and X o , S x are the offset and scale factors, respectively.In equations ( 1) and (2), P i (i = 1, ..., 4) are n-order polynomial functions that are used to model the relationship between the image space and the reference system.They can be written as where a i,n (n = 0, 1, ..., 19) are the polynomial coefficients.
For projection from the image space to terrain, the inverse form of the rational function models is used: and For this task, another set of RPCs for inverse projection as well as the terrain height h is needed.
The main reason for using RPCs is to facilitate the computational process of the subsequent processing tasks.Instead of describing the stereogrammetric intersection with a combination of the range-Doppler model for the SAR image and a push-broom model for the optical image, from a mathematical point of view the RPC formulation homogenizes everything to a comparably simple joint model.However, fitting RPCs to a sensor model is challenging in its own way and demands sufficient, well-distributed control points.RPCs are usually calculated with either a terrain-independent or terrain-dependent approach [19].In the terrain-dependent approach, accurate Ground Control Points (GCPs) are used to estimate the RPCs.Thus, the final accuracy of the RPCs depends on the number, accuracy, and distribution of GCPs.
While the terrain-dependent approach is an expensive way of estimating RPCs (and GCPs may not be available for every study area), the terrain-independent method allows RPCs to be estimated without any GCPs [19].Instead, a set of virtual GCPs (VGCPs), which are related to the image through the rigorous imaging model of the respective sensor, are used to approximate the RPCs.The VGCPs are arranged in a grid-shape format on planes located at different heights over the study area, such as depicted in Fig 2 .The resulting cube of points is then projected to the image space, and their corresponding image coordinates are determined by reconstructing the rigorous model.The RPCs can subsequently be estimated using a least-squares calculation.Note that, when using higher-order RPCs, the least-squares estimation suffers from an ill-posed configuration that causes the results to deviate from their optimal values.In these circumstances, a regularization approach based on Tikhonovs method can be employed to obtain acceptable solutions [20].
The RPCs for optical sensors are usually delivered by vendors alongside the image files.For SAR sensors -with the exception of the Chinese satellite GaoFen-3 -however usually only ephemerids and orbital parameters are attached to the data.Therefore, Zhang et al. investigated the generation of RPCs for various SAR sensors based on the terrain-independent approach [21].Their results show that RPCs can be used as substitutes for the range-Doppler equations with acceptable accuracy.In this study, we use this terrain-independent approach for SAR RPC generation.

Epipolarity Constraint for SAR-Optical Image Matching
In most stereogrammetric 3D reconstruction scenarios, the epipolarity constraint facilitates the procedure of image matching by reducing the search space from 2D to 1D [5].The epipolarity constraint always exists for optical stereo images captured by frame-type cameras that follow a perspective projection [22].This phenomenon is illustrated in Fig. 3.For a point p in the left-hand image, the conjugate point in the corresponding right-hand image is located on the so-called epipolar line.This epipolar line lies on the plane passing through both image projection centers (O, O ) and the image point p.It can also be obtained by changing the depth or height of p in the reference coordinate system.While it is known that epipolar lines exist for images captured from frame-type cameras, straightness cannot be ensured for other sensor types [22].We thus refer to epipolar curves instead of epipolar lines to express generality in the remainder of this paper.
With respect to remote sensing, several studies have demonstrated that the epipolar curves for scenes acquired by linear array push-broom sensors are not straight [23,24].For example, Kim [24] used the model developed by Orun and Natarajan [25] to prove that the epipolar curves in SPOT scenes looks like hyperbolas.Orun and Natarajan's model assumes that the rotational roll and pitch parameters are constant during the flight, while the yaw can be modeled by quadratic time-dependent polynomials.Morgan et al. [26] demonstrated that the epipolar curves would not be straight even with uniform motion.
For a SAR sensor, the imaging geometry is completely different from that of optical sensors, as data are collected in a side-looking manner based on the range-Doppler geometry [27].However, the possibility of establishing the epipolarity constraint in stereo SAR image pairs has been investigated by Gutjahr et al. [28] and Li and Zhang [29] for radargrammetric 3D reconstruction.Gutjahr et al. experimentally showed that epipolar curves in SAR image pairs are also not perfectly straight, but can be approximately assumed to be straight for radargrammetric 3D reconstruction tasks through dense matching [28].
In this research, we investigate the epipolarity constraint mathematically and experimentally for the unconventional multi-sensor situation of SAR-optical image pairs.In general, epipolar curves in image pairs captured by frame-type cameras (as shown in Fig. 3)) can be described as [30] l r = F T p (7) where l r refers to the epipolar curve in the right-hand image associated with the image point p on the left-hand image.F is the fundamental matrix, which includes interior and exterior orientation parameters for projecting coordinates between the two images.Similarly, an epipolar curve in the left-hand image can be written as l l = Fp .For push-broom satellite image pairs, the epipolarity constraint can be verified in a similar way, but linear arrays are substituted for a frame image.Furthermore, the fundamental matrix for push-broom sensors is more complex than that for frametype sensors.In the following, inspired by the mathematical proof of the epipolarity constraint for stereo optical imagery given in [5] and using the epipolar curve equation presented in (7), a rigorous epipolar model for SAR-optical image pairs acquired by space-borne platforms will be constructed.For this task, the optical image is considered as the left-hand image and the SAR image is the righthand image.Figure 4 shows the configuration of the SAR-optical stereo case.The points o and s mark the positions of the optical linear array push-broom sensor and the SAR sensor, respectively.Using a collinearity condition, a rigorous model for reconstructing the imaging geometry of linear array push-broom sensors can be expressed as [31] x l = 0 where (x l , y l ) are the coordinates of point p in the linear array coordinate system, f is the focal length, (X o (t), Y o (t), Z o (t)) represents the satellite position at time t in the reference coordinate system, (X, Y, Z) are the ground coordinates of the target point T , λ is the scale factor, and R ω(t)φ(t)κ(t) is the 3D rotational matrix computed from rotations ω(t), φ(t), κ(t) along the three dimensions at time t.Note that the aforementioned rotational and translational components are estimated by time-dependent polynomials.
A rigorous model based on the range-Doppler geometry [32] (displayed in Fig. 4 as well) can also be applied to the SAR imagery.In this model, the slant-range equation is first used to describe the range sphere as [27]: where R is the slant-range and R CT , R CS are the target point and SAR sensor position vectors in the reference coordinate system.C refers to the center of the reference coordinate system.For a given pixel y r in the slant-range SAR scene, equation ( 9) can be reformulated as where R is the slant-range of the target point, c is the velocity of light, t 0 , t are the one-way signal transmission times for the first range pixel and range pixel coordinate y r , respectively, and f r is the range sampling rate.R 0 gives the slant-range for the first range pixel and Γ = c 2f r .
The second equation describes the geometry of the Doppler cone: where f D is the Doppler frequency, λ r is the SAR signal wavelength, V is the velocity vector and • denotes the inner product operator.
For the epipolarity constraint, we assume that the target point T is imaged at time τ by the pushbroom sensor.If we fix the time variable t with τ and consider the corresponding image coordinate in the linear array coordinate system as (0, y l , f ), we can use equation ( 8) to back-project from the linear array coordinate system to the terrestrial reference system as follows: where M ωφκ = R T ωφκ .For imaging time t, all time-dependent parameters are estimated under the constraint t = τ .Thus, for this specific instance, the variable index t is eliminated from equation (8).By expanding equation ( 12) and removing the scale factor effect, we have: where m ij are elements of matrix M ωφκ .
If the velocity vector of the SAR sensor is computed in the zero-Doppler frequency transition, we can reformulate equation (11) as: where (V x (t), V y (t), V z (t)) are the components of velocity vector V. Hence: From equations ( 13) and ( 14), we can derive.
Multiplying both sides of the equations ( 17) and ( 18) by −V x (t) and −V y (t), respectively, and then combining them with equation ( 16), Z can be calculated as follows: Changing the position of target point T in the Z direction is equivalent to changing the corresponding image coordinates on the epipolar curve.Consequently, the determination of image coordinates in the SAR scene can be realized by tracking the sensor positions in the respective instances.In fact, in spite of fixing the position of the target point for the optical sensor at time τ , the location components (X s (t), Y s (t), Z s (t)) and the velocity components of the SAR sensor (V x (t), V y (t), V z (t)) are time-dependent and can be estimated for each instant using time-dependent polynomials.For the sake of simplicity, the SAR sensor trajectory can be approximated by a linear motion model.Thus, the velocity components (V x (t), V y (t), V z (t)) will remain constant with time at (X s (t), Y s (t), Z s (t)) i.e., the acceleration is 0 and the location components (X s (t), Y s (t), Z s (t)) can be calculated using linear time-dependent functions: where (X s 0 , Y s 0 , Z s 0 ) is the position of the SAR sensor at initial time t 0 .The time t a is the azimuthal time, which can be expressed according to the line coordinate x r in the SAR scene as where P RF is the pulse repetition frequency in Hz.
Substituting the parameters expressed in equations ( 20) and ( 21) into (19), we obtain: Equation ( 22) can be simplified to: and substituting ( 23) into ( 17) and (18) gives: Equations ( 24) and ( 25) can be written in the form: The slant-range represented by equation ( 10) can be reformulated as: Substituting ( 26), ( 27), ( 23) and ( 20) into (28), we have: For simplicity, setting which can be expanded to yield: this can be rewritten as: Equation ( 32) is a general rigorous model representing the epipolarity constraint for SAR-optical image pairs based on their imaging parameters contained in F 0 , F 1 , F 2 , Γ and R 0 .This shows that an epipolarity-like constraint can be established for SAR-optical image pairs.However, the non-linear relation between y r and x r in equation (32) shows that SAR-optical epipolar curves are not straight, even under the assumption of linear motion for the SAR system.In Section 3.3, this epipolarity constraint will be experimentally investigated for an RPC-based imaging model.

SAR-Optical Multi-Sensor Block Adjustment
As illustrated in Fig. 1, the main step before implementing dense image matching is to align the optical image to the SAR image.This process is performed using a multi-sensor block adjustment which is based on RPCs instead of rigorous sensor models as proposed in [11].The block adjustment process improves the relative orientation between both images fixed to the more accurate SAR image orientation parameters.Through the block adjustment, the bias components induced by attitude, ephemeris, and drift errors in the optical image are compensated [33].
The main bias compensation for the RPCs of the optical image involves translating the locations of the epipolar curves to accurate positions using the SAR geopositioning accuracy.Generally, designing an appropriate function for modeling the existing bias in the RPCs given by the optical image depends on the sensor properties [34], but for most sensors an affine model can be applied [35].Even for the current generation of VHR linear push-broom array sensors such as WorldView-2, employing only the shift parameters will be sufficient.The affine model for RPC bias compensation can be formulated as where x o ,y o represent column and row of tie points in the optical images and m i and n i (i = 0, 1, 2) are unknown affine parameters to be estimated through the block adjustment procedure.Note that tie points are the common points between the SAR and optical images, and can be obtained by manual or automatic sparse matching between two images.Since the automation of the tie point generation process is not the focus of this study, we refer the reader to possible solutions described in [36,37,38].The geographic coordinates of the tie points in the SAR image are calculated by the inverse rational functions computed for the SAR imagery as described in Section 2.1: and where λ i and φ i are the normalized longitude and latitude of tie point i with normalized image coordinates x i s and y i s (index s denotes the SAR scene), and here, H is a constant, e.g., the mean height of the study area.f s and g s are inverse rational functions computed for the SAR sensor to project the tie points from the SAR image to the reference system.The output is a collection of GCPs that can be applied for the RPC rectification of the optical imagery.The resulting GCPs are then projected by the rational function associated with the optical images to give the image coordinates of the GCPs: and where c i o , r i o are the normalized image coordinates of tie point i computed by the forward rational functions of the optical sensor, f o and g o .
From equations (33), (36), and (37), the primary equations for SAR-optical block adjustment are formed as: and where, x i o and y i o denote the column and row of tie point i in the optical scene, and c i ou and r i ou are the un-normalized coordinates of the tie point after projection and back-projection using the RPCs.The block adjustment equations can then be written as: and Finally, through an iterative least-squares adjustment [11], the unknown parameters m i and n i are estimated and the affine model can be formed.This affine model is added to the rational functions of the optical image to improve the geolocation accuracy to that of the SAR image.

SGM for Dense Multi-Sensor Image Matching
The core step in a stereogrammetric 3D reconstruction workflow is the dense image matching algorithm to obtain the disparity map, which can then be transformed into the desired 3D point cloud.Generally, there are two different dense matching rationales that can be used according to whether local or global optimization is more important [39].For the case of global optimization, an energy functional consisting of two terms is established to find the optimal disparity map [6]: where E data (d) is a fidelity term that makes the computed disparity map consistent with the input image pairs, E smooth (d) considers the smoothness condition for the disparity map, and λ is a regularization parameter that balances the fidelity and smoothness terms.For a given image pair, the disparity map is calculated by minimizing the energy functional in (42).The main advantage of global dense matching over local matching methods is greater robustness against noise [39], although most existing algorithms for global dense image matching have a greater computational cost [40].
For the experiments presented in this paper, we use the well-known SGM method [40], which offers acceptable computational cost and high efficiency, and performs very similarly to global dense image matching.

Cost Functions used in SGM
In this study, the ability of performing dense image matching for SAR-optical image pairs using SGM is investigated.For this purpose, two different cost functions, namely Mutual Information (MI) and Census, as well as their weighted sum, are examined for the dense matching of high-resolution SAR and optical imagery.Typically, the similarity measures employed in the cost function are either signal-based or feature-based metrics [41].Classically, signal-based similarity measures such as Normalized Cross Correlation (NCC) and MI are preferable to feature-based similarity measures when used in dense image matching algorithms because of faster calculation.
Among the signal-based matching measures, MI was recommended for SGM as it is known to perform well for images with complicated illumination relationship, such as SAR-optical image pairs [36].
Another similarity measure used in the SGM cost function is Census, which actually acts as a nonparametric transformation.The weighted sum of MI and Census is beneficial for 3D reconstruction in urban areas, especially for reconstructing the footprints of buildings to produce sharper and clearer images [42].The weighted similarity measure can be defined as where α changes from 0 to 1 to weigh the effect of Census cost in relation to MI.

SGM Settings for Efficient SAR-Optical Dense Matching
To increase the efficiency of the SGM performance, some important settings for the dense matching of SAR-optical image pairs must be considered.The basic principle of 3D reconstruction by dense matching is to use the epipolarity constraint to limit the search space.Usually, before dense matching, normal images are created by resampling the original images according to epipolar geometry [5,10].In this study, we use the RPC model to realize the SAR-optical epipolar geometry implicitly without the need to generate normal images.This is done by implementing projections and back-projections from the reference image to the ground and back to the corresponding image, respectively, for a specified height range using rational functions.Then, the search for computing disparities can be performed along the thus-created epipolar curves.
In addition, the minimum and maximum disparity values should be selected to restrict the length of the search space along the epipolar curves.In general, there is more flexibility regarding the selection of this disparity interval for optical image pairs than for SAR-optical image pairs, and using unsuitable values will result in more outliers.The minimum and maximum disparity values can be determined using external data such as the SRTM digital elevation model, which is available for most land surfaces around the world [43].For sake of exploiting the simplicity offered by comparably flat study scenes, we just add and subtract 20-m height differences to the mean terrain heights of the study scenes to obtain the disparity thresholds.
The next setting is to switch off the minimum region size option in the SGM algorithm, which is usually used to decrease the noise level in the stereogrammetric 3D reconstruction of optical image pairs by eliminating isolated patches from the disparity map based on their small size.Experimental results show that, for SAR-optical image pairs, the complex illumination relationship between the images and the different imaging effects (especially for urban areas) make the minimum region size criterion useless, as connectivity cannot be ensured in the disparity map.
Similar to other dense matching cases, we use the LR (Left-Right) check to investigate binocular half-occlusions [44].This strategy changes the reference images from left to right, and consequently produces two disparity maps that can be checked against each other.To reach sub-pixel accuracy, the disparity in each point is estimated by a quadratic interpolation of neighboring disparities.
In this study, SGM is implemented at four hierarchy levels and the aggregated cost is calculated along 16 directions around each point.

Study Areas and Datasets
We selected two study areas, one in Berlin and one in Munich (both located in Germany), to investigate the potential for 3D reconstruction from high-resolution SAR-optical image pairs over urban areas.The locations of TerraSAR-X and WorldView-2 images are displayed in Figs. 5 and  6.The properties of the image pairs for each study area are presented in Table 1.In order to enhance the general image similarity and facilitate the matching process, all images were resampled to 1 m × 1 m pixel spacing and the SAR images were filtered with a non-local speckle filter.After implementing bundle adjustment for both datasets, two sub-scenes (with a size of 1000 × 1500 pixels each) from overlapped parts of the study areas were cropped.These sub-scenes are displayed in Figs.7 and 8.

Validation of RPCs to Model SAR Sensor Geometry
As described in Section 2.1, RPCs can be used as a substitute for the rigorous range-Doppler model, similar to the standard RPCs delivered with optical imagery.This step is performed to simplify the multi-sensor block adjustment and epipolarity constraint construction.The accuracy of the RPCs can be estimated using independent virtual checkpoints that are produced in a similar way to VGCPs using the range-Doppler model.The word independent implies that the virtual checkpoints are never used in the RPC fitting, i.e., they are located in different positions respective   to the VGCPs.The accuracy of the fitted RPCs for the TerraSAR-X data in each study area is listed in Table 2. Analysis was performed based on the residuals of the rows and columns, given by the differences row RP C −row DR and col RP C −col DR , i.e., the differences between image coordinates computed by RPCs and range-Doppler.The analysis results confirm that the RPCs can model the range-Doppler geometry for TerraSAR-X data to within a millimeter, and can thus well be used in the 3D reconstruction process.

Validity of the Epipolarity Constraint
A general model that proves the epipolarity constraint for SAR-optical image pairs was described in Section 2.2.It was also concluded that epipolar curves are usually not straight.Experimentally, the epipolarity constraint for SAR-optical image pairs can also be modeled based on RPCs.In this paper, we evaluate the epipolarity constraint for TerraSAR-X and WorldView-2 image pairs acquired over the two study areas.

Existence of Epipolar Curves
We analyzed the validity of the derived SAR-optical epipolarity constraint for an exemplary point located at the corner of the Munich central train station building (p).This point was projected to the terrain space by changing the heights in specific steps, e.g., 10 m, starting from the lowest possible height and proceeding to the highest possible height in the scene (for this experiment we used the interval [0 m, 1200 m]).The output will be an ensemble of points with different heights, such as depicted in Fig. 9(c).All these points were then back-projected to the WorldView-2 image space using RPCs.The corresponding epipolar curve for all possible heights in the study area is constructed by connecting the image points obtained in this way, as shown in Fig. 9(c).Although the epipolar curve appears to be straight, more analysis is required to determine whether this is the case.By expanding the image, it can be seen in Fig. 9(b) that the epipolar curve nearly passes through the conjugate point of p in the WorldView-2 image.Similarly, another experiment was carried out using the Berlin dataset.The epipolar curve was constructed for an exemplary point located on the corner of a building and for all possible heights in the scene.Figure 10(d) displays the position of the selected point on the SAR image as well as the corresponding epipolar curve in the WorldView-2 imagery (Figs.10(e) and 10(f)).

Straightness of Epipolar Curves
To clarify the straightness of the epipolar curve constructed for point p, linear and quadratic polynomials were fitted to the image points of the epipolar curve.Figures 11(a      vary between -0.15 m and 0.25 m.Both analyses illustrate that the constructed epipolar curves are not straight.However, their curvatures do not exceed more than one pixel over the whole possible range of heights in these scenes.

Conjugacy of Epipolar Curves
To investigate the conjugacy of the SAR-optical epipolar curves, two distinct points (q 1 ,q 2 ) were selected from the epipolar curve in the WorldView-2 image.From each of these points, the corresponding epipolar curves were constructed in the TerraSAR-X image for all possible heights as in the experiments before.Figure 12 displays the corresponding epipolar curves in TerraSAR-X given by q 1 and q 2 located in the WorldView-2 image.The epipolar curve appears to pass through point p located in the SAR image.Further analysis clarifies that the differences in the column direction between the two epipolar curves passing through point p are less than one pixel, allowing the matching of the two epipolar curves (Fig. 13(a)).In addition, Fig. 14(a) shows that the gradients of the two epipolar curves change at each point, as illustrated by the column index, whereas the maximum difference is less than 0.001, i.e., 0.1%.This indicates that the epipolar curves can be assumed to be parallel.The gradient changes at each point also confirm that the epipolar curves in TerraSAR-X are not perfectly straight.
In a similar manner, the conjugacy of the epipolar curves was evaluated for the Berlin dataset.Figure 13(b) shows that the difference between the epipolar curves is less than one pixel, so these lines can be paired.Similarly, the maximum difference between the slopes of the epipolar curves is less than 0.2%, which confirms the possibility of epipolar curve conjugacy (Fig. 14(b)).
From the above investigations and discussions, it is clear that the epipolarity constraint can be established for SAR-optical image pairs such as those from TerraSAR-X and WorldView-2 data.As expected, the epipolar curves are not perfectly straight and there are tiny differences between the epipolar curve in one image produced from points on the epipolar curve in the other image.However, analyses show that the epipolar curves can be approximated as straight lines without sacrificing too much, and that they can be paired together well.This means that the epipolarity constraint can be used to ease the subsequent stereogrammetric matching process.

Use of Block Adjustment
As discussed in Section 3.3 and mathematically proved in Section 2.2, the epipolarity constraint can be established for a SAR-optical image pair.However, the positions of epipolar curves in the optical image can be placed in a more accurate position by exploiting the high geolocalization accuracy of the SAR image through a multi-sensor block adjustment.The experiments described in Section 3.3 demonstrate that, for the case of WorldView-2 imagery, the curvature of the epipolar curves does not exceed one pixel, and using only two bias terms as shifts in the column and row directions suffice to modify the position of the epipolar curves.
Implementing block adjustment using RPCs requires some conjugate points to be assigned as common tie points between the target (WorldView-2) and the reference (TerraSAR-X) images.Theoretically, just one tie point would be sufficient to estimate the bias in the least-squares adjustment based on equation (33) (two unknowns and two equations), but using more redundancy and incorporating more tie points allows for more accurate estimations of the bias parameter.For  this experiment, eight and six tie points were selected to match the WorldView-2 images to the TerraSAR-X images in the Munich and Berlin study areas, respectively.The block adjustment equations were then established as described in Section 2.3.During the iterative least-squares adjustment, tie points with residuals exceeding a threshold were removed from the full adjustment process.Figures 15(a 3 presents the bias of the row and column components resulting from the block adjustment of WorldView-2 and TerraSAR-X image pairs for both study areas.As the SAR image was selected as the reference to which the optical imagery was aligned, the bias components for the reference SAR imagery are zero.The quality of block adjustment has been evaluated by calculating the positional errors of tie points under projection and reprojection from the SAR image to terrain and from terrain to the optical image, and vice versa.Statistical metrics such as the standard deviation (STD), median absolute deviation (MAD), and minimum (Min)/maximum (Max) errors were calculated.The results illustrate that the major bias component is in the row direction in both study areas and that the epipolar curves will mostly shift in this direction after block adjustment.Figures 16(a) and 16(b) display the locations of the epipolar curves before and after adjustment.By enlarging the images, it is possible to confirm that the displacement of the epipolar curves is minimal, yet noticeable.To verify the success of SAR-optical block adjustment, we require highly accurate GCPs, which are not available for the study areas.However, we evaluated the accuracy of block adjustment in sub-scenes of the two study areas with the assistance of available LiDAR point clouds.First, we  manually found some matched points that had been measured in the SAR and optical sub-scenes, similar to the tie point selection step.Next, the measured points located in the optical imagery were projected to the terrain using the corresponding reverse rational polynomial functions (f o (c, r, h) and g o (c, r, h)).To ensure exact back-projection, the height h of each point was extracted from the available high-resolution LiDAR point clouds of the target sub-scenes.To overcome the noise in the LiDAR data, we considered neighboring points around the selected measured point, and the final height of the target point was selected based on the mode of the heights in the considered neighborhood.The resulting ground points (f o (c, r, h), g o (c, r, h), h) were then back-projected to the SAR scene using the forward RPCs fitted to the SAR imagery.Finally, comparing the image coordinates of the measured points on the SAR imagery (from manual matching) with their coordinates derived by projection from the optical to the SAR imagery using RPCs and LiDAR data provides an evaluation of the SAR-optical block adjustment performance.The residuals can be calculated as: where (dc, dr) is the column and row difference between the measured point (c m s , r m s ) located on the SAR imagery and the corresponding coordinates given by the projection from the optical to the SAR imagery using RPCs.
Table 4 presents some statistical analysis on residuals calculated according to equation (44) for two states: using the original WorldView-2 RPCs for the projections and using the WorldView-2 RPCs modified with respect to the TerraSAR-X SAR imagery.The results demonstrate the successful implementation of RPC-based multi-sensor block adjustment for SAR-optical image pairs.This means that the existing bias in the RPCs of optical imagery such as WorldView-2 can be modified according to high-resolution SAR imagery such as TerraSAR-X to improve the absolute geolocalization accuracy of the optical imagery, and consequently the modification of epipolar curves in stereo cases.

Dense Matching Results
The output of dense matching by SGM is a disparity map that is calculated in the frame of the reference sensor geometry.This disparity map should be transferred from the reference sensor geometry to a terrestrial reference coordinate system such as UTM.The difference of SAR and optical observation geometries and the lack of jointly visible scene parts means that stereogrammetric 3D reconstruction leads to sparse rather than dense point clouds over urban areas.Figures 17(a The accuracy of these sparse point clouds was compared to that of reference LiDAR point clouds with densities of 6 and 6.5 points per square meter acquired by airborne sensors over Munich and Berlin, respectively. Different approaches can be used to assess the accuracy of point clouds.The simplest way is to calculate the Euclidean distance to the nearest-neighbor of each target point in the reference point cloud [45].This strategy, however, should only be used when both point clouds are very dense.We therefore used another approach, which is based on fitting a plane to the k (here is 6) nearest neighbors of each target point in the reference point cloud [46].The perpendicular distance from the target point to this plane is then the measured reconstruction error.To speed up the process of point cloud evaluation, an octree data structure is used for the binary partitioning of both reconstructed and reference point clouds [47].The measured distances between both point clouds are decomposed into three components that represent the accuracy of the reconstructed points along the X, Y , and Z directions.Table 5 summarizes the mean, STD, and Root Mean Square Error (RMSE) of the distances along the different axes.In addition, histograms of the Euclidean distances between reconstructed points and k-nearest neighbors-based reference planes are depicted in Figs.18 and 19, while the corresponding metrics are summarized in Tab. 6.In order to also provide an outlier-free accuracy assessment, we additionally show results corresponding to point clouds that were cleaned by removing points deviating from the SRTM model by more than 5m.
Finally, Figs.20 and 21 display high-accuracy points (i.e.those with a Euclidean distance of less than 1m to the reference) achieved by SGM dense matching for TerraSAR-X-WorldView-2 image pairs in Munich and Berlin, respectively.

Feasibility of SAR-Optical Stereogrammetry Workflow
The results described in the previous section demonstrate the potential of the proposed SARoptical stereogrammetry framework.Our analyses show that all primary steps involved in SARoptical stereogrammetry, such as RPC fitting, epipolar-curve generation, and multi-sensor block adjustment, can be successfully implemented for VHR SAR-optical image pairs.In addition to the mathematical proof of the existence of an epipolarity constraint for arbitrary SAR-optical image pairs in Section 2.2, the experimental results have illustrated the validity of establishing an epipolarity constraint by showing that SAR-optical epipolar curves are approximately straight.Using RPCs for both sensor types paves the way for the implementation of stereogrammetry.As a result, estimating the RPCs for SAR imagery is a prerequisite for SAR-optical stereogrammetry.The RPCs delivered with optical imagery must be improved with respect to the SAR sensor geometry using RPC-based multi-sensor block adjustment.The block adjustment aligns pairs of SAR and optical images and improves the absolute geopositioning accuracy of the optical imagery.This ensures that the epipolar curves pass through the correct positions of conjugate points.Applying a dense matching algorithm such as SGM then produces a disparity map.

Potential and Limitations of SAR-Optical Stereogrammetry
As discussed in Section 3.5, the dense matching of TerraSAR-X/WorldView-2 imagery produces a sparse point cloud over each of the urban study areas.However, the resulting point clouds are affected by a significant amount of noise because of the difficult radiometric and geometric relationships between the SAR and the optical images.Hence, the SGM algorithm struggles to find the exact conjugate points.On the one hand, this is related to the similarity measures employed in this prototypical study.The influence of similarity measures on the height accuracy of the Munich point cloud is shown in Fig. 22.
The RMSE of the estimated heights decreases when Census and MI are combined and used as a weighted sum cost function, although the number of outliers increases.Identifying the optimum weighting to balance the percentage of outliers against the height accuracy is impractical, because the output disparity maps are rather sparse; a visualization would not be helpful for this task.In stereogrammetric 3D reconstruction using optical image pairs, visualizing the disparity map enables the weight value to be tuned so as to preserve the edges and sharpness of building footprints, whereas in the SAR-optical case, there is no perfectly dense disparity map.Using Census alone produces points with higher accuracy than the MI-only results, but a higher percentage of outliers.In general, a similarity measure specifically designed for SAR-optical matching is required.
On the other hand, the reconstruction suffers from the fact that the SGM search strategy is designed for relatively simple isotropic geometric distortions and was not adapted to the peculiarities of SAR-optical matching yet.Therefore, the differences in the imaging geometries of SAR and optical sensors in terms of their off-nadir and horizontal viewing angles can further decrease the matching accuracy.For example it is known from previous research that the optimal geometrical condition for SAR-optical stereogrammetry would be an image pair acquired with similar viewing geometries [48].This, however, would make the geometrically induced dissimilarities in the images even larger and render the matching more complicated.If both sensors were at the same position, and thus would share the same viewing angle, the intersection geometry would be perfect [48].However, due to the different imaging geometries, elevated objects would appear to collapse away from the sensor in the optical image, while they would appear to collapse towards the sensor in the SAR image.Thus, the choice of a good stereo geometry will always need to be a trade-of between image similarity and favorable intersection angle in the SAR-optical case.
Last, but not least, many points cannot be sensed by a nadir-looking optical sensor but are well observed by a side-looking SAR sensor, such as points located on building facades.As has been shown before, the joint visibility between SAR and optical VHR images of urban scenes can be as low as about 50% [49].In the present study, the situation was most complicated in the Berlin case, because of differences in both the horizontal viewing directions and the off-nadir angles.The horizontal viewing direction of the WorldView-2 sensor was approximately north-south, whereas that of TerraSAR-X was east-west.This affected the visibility of common points between the two images during 3D reconstruction negatively.Consequently, most of the reconstructed points are located on the flat areas or outlines of buildings that are observed by both sensors (see Figs. 20 and 21).
Finally, some differences in the image pairs may be caused by the interval between the acquisition times of the WorldView-2 and TerraSAR-X data (5 and 3 years for Munich and Berlin, respectively).This can cause the matching process to fail in problematic areas, thus affecting the quality and density of the disparity maps.
In spite of the differences in sensor geometries, acquisition times, and illumination conditions between the two SAR-optical image pairs, the quantitative analysis demonstrated in Section 3.5 shows that 25% of all points are reconstructed with clear sub-pixel accuracy, while the median accuracy lies at about 1.5 to 2m.The experiments also show that the results can be further improved by filtering outliers from the reconstructed point clouds.In this study, we employed the globally available SRTM DEM as prior knowledge for outlier removal.As Tab. 6 shows, discarding points with a height difference to SRTM greater than 5m improves the results significantly.
Of course, this simple filtering strategy will probably also remove some accurate points that just deviate a lot from the SRTM DEM (e.g.newly built skyscrapers).In conclusion, a more sophisticated algorithm should be developed for removing noise and outliers from derived point clouds in the future.Nevertheless it can be confirmed that the SAR-optical stereo results have the potential to provide both higher accuracy and higher point density than the SRTM data, making SAR-optical stereogrammetry another possible means for 3D reconstruction in remote sensing.

Conclusion
In this study, we investigated the possibility of stereogrammetric 3D reconstruction from VHR SAR-optical image pairs by developing a full 3D reconstruction framework based on the classic photogrammetric workflow.First, we analyzed all prerequisites for this task.The main requirement for SAR-optical stereogrammetry is to establish an epipolarity constraint to reduce the search space of the matching process.We mathematically proved that the epipolarity constraint can be established for SAR-optical image pairs.Furthermore, experimental analysis demonstrated that the epipolarity constraint can be employed for SAR-optical image pairs such as those from TerraSAR-X/WorldView-2, and showed that the epipolar curves are sufficiently straight.Because of the limited accuracy of the RPCs delivered with optical data, the relative orientation between both images can be improved with respect to the more accurate SAR orientation parameters using multi-sensor block adjustment.This shifts the epipolar curves toward their correct positions.An SGM-based dense matching algorithm was implemented using the MI and Census similarity measures, as well as their weighted sum.The outputs were sparse point clouds with a median accuracy of about 1.5 to 2m and the 25%-quantile of best points well in the sub-pixel accuracy domain.Finally, SRTM data were used to remove outliers from the point clouds.This improved the accuracy of the point clouds further.Overall, this study has demonstrated that a 3D reconstruction framework can be designed and implemented for SAR-optical image pairs over urban areas.Future work will have to focus on the development of similarity metrics specific to the multi-sensor matching problem, and on an adaption of the semi-global search strategy that accounts for the anisotropic geometric distortions between SAR and optical images.

Figure 1 :
Figure 1: Framework for stereogrammetric 3D reconstruction from SAR-optical image pairs

Figure 2 :
Figure 2: Procedure of estimating RPCs by terrain-independent approach

Figure 4 :
Figure 4: Imaging geometry for configuration of SAR-optical imagery

Figure 5 :
Figure 5: Display the location of SAR-optical image pairs of the Munich study area.The red and blue rectangles identify areas covered by the WorldView-2 and TerraSAR-X images, respectively, and the black rectangle displays the study subset selected for stereogrammetrix 3D reconstruction over Munich.

Figure 6 :
Figure 6: Display the location of SAR-optical image pairs of the Berlin study area.The red and blue rectangles identify areas covered by the WorldView-2 and TerraSAR-X images, respectively, and the black rectangle displays the study subset selected for stereogrammetrix 3D reconstruction over Berlin.

Figure 7 :Figure 8 :
Figure 7: Display of SAR-optical sub-scenes extracted from Munich study areas (The left-hand image is from WorldView-2, right-hand image is from TerraSAR-X) ) and 11(b) represent the least-squares residuals with respect to the point heights for the epipolar curves created in both study subsets.The residuals of the linear fit for the epipolar curve established for the Munich WorldView-2 image range from -0.25 to 0.1 pixels (i.e.meters), whereas the residuals of the quadratic fit are close to zero.Similar results were given by fitting linear and quadratic polynomials to the image points of the epipolar curve established in the WorldView-2 image of Berlin.Figure11(b) clearly shows that the residuals of the epipolar points fitted to the quadratic model are zero, whereas those of the linear fit

Figure 9 :
Figure 9: Epipolar curves for the WorldView-2 image (of Munich) given by changing the heights of point p (located at the corner of the Munich central train station in the TerraSAR-X scene) for all possible height values in the image scene.The epipolar curves look like straight, but are not actually straight.

Figure 10 :
Figure 10: Epipolar curves for the WorldView-2 image (of Berlin) given by changing the heights of point p (a distinct corner of a building in the Berlin TerraSAR-X scene) for all possible height values in the image scene.The epipolar curves look like straight, but are not actually straight.

Figure 11 :
Figure 11: Linear and quadratic polynomials fitted on the epipolar curves in the WorldView-2 images

Figure 12 :
Figure 12: Corresponding epipolar curves in the Munich TerraSAR-X image (left) derived from two points, q1 and q2 on the epipolar curve of the Munich WorldView-2 image (right).

Figure 13 :
Figure 13: Difference of two corresponding epipolar curves over the column direction.The maximum difference between the two epipolar curves is less than one pixel.

Figure 14 :
Figure 14: Gradients of two epipolar curves constructed from q1 and q2 in TerraSAR-X.
) and15(b)  show the residuals of the full multi-sensor block adjustment for each tie point.The results demonstrate that the residuals of most points are less than one pixel in both experiments, which indicates a successful implementation of SAR-optical block adjustment.Table

Figure 15 :
Figure 15: Residuals of tie points after full multi-sensor block adjustment in the TerraSAR-X image space.

Figure 16 :
Figure 16: Displacement of epipolar curves after block adjustment by RPCs.Left images show the epipolar curve positions before and after the bundle adjustment and right images display the selected patch (identified by dashed yellow rectangles) in an enlarged image.
) and 17(b) display the reconstructed point clouds from SAR-optical sub-scenes of Munich and Berlin.

Figure 20 :
Figure 20: Position of points with subpixel accuracy (1 m) achieved by dense matching over Munich city

Figure 21 :
Figure 21: Position of points with subpixel accuracy (1 m) achieved by dense matching over Berlin city

Figure 22 :
Figure 22: Performance of MI, Census, and weighted sum of both measures as cost functions in SGM.

Table 1 :
Specifications of the TerraSAR-X and WorldView-2 images used for dense matching

Table 2 :
Accuracy (standard deviation: STD) of RPCs fitted on SAR sensor model (units: m)

Table 4 :
Residuals (unit: m) of the control points for block adjustment validation.Original indicates the residuals before the adjustment, modified those after adjustment.

Table 5 :
Accuracy assessment of reconstructed point clouds with respect to LiDAR reference

Table 6 :
Accuracy assessment of point clouds after SRTM-based outlier removal