A Shadow-Overlapping Algorithm for Estimating Building Heights From VHR Satellite Images

Building height is a key geometric attribute for generating 3-D building models. We propose a novel four-stage approach for automated estimation of building heights from their shadows in very high resolution (VHR) multispectral images. First, a building’s actual shadow regions are detected by applying ratio-band algorithm to the VHR image. Second, 2-D building footprint geometries are identified using graph theory and morphological fuzzy processing techniques. Third, artificial shadow regions are simulated using the identified building footprint and solar information in the image metadata at predefined height increments. Finally, the difference between the actual and simulated shadow regions at every height increment is computed using Jaccard similarity coefficient. The estimated building height corresponds to the height of the simulated shadow region that resulted in the maximum value for Jaccard index. The algorithm is tested on seven urban sites in Cardiff, U.K. with various levels of morphological complexity. Our method outperforms the past attempts, and the mean error is reduced by at least 21%.


I. INTRODUCTION
G EOMETRY identification of buildings and subsequent three-dimensional (3D) modelling play an important role in a range of urban applications-from urban energy and environmental analysis [1] and the estimation of renewable energy potential [2] to data-centric operation and management of smart and resilient cities [3].Building height (H B ) is one of the key geometric parameters that is used to transform the twodimensional (2D) footprint area into a 3D model.Manually obtaining H B of a large number of buildings for urban-scale 3D modelling and analysis is resource intensive.The difficulty and cost involved in H B estimation also creates a barrier to the use and deployment of advanced modelling, analysis, and management of the built environment for most, if not all cities and countries.Finding an efficient and cost-effective way to estimate H B is, therefore, of paramount importance.
A common factor in the extraction of H B approaches based on remotely sensed elevation data is that they require sophisticated data calibration and processing to obtain a reliable digital surface model (DSM).Although studies have shown the utility and usefulness of elevation data for extracting H B , their implementation typically requires the use of additional data and often multiple images from different angles to obtain Both authors are affiliated with the School of Engineering, Cardiff University, Cardiff, CF24 3AA, UK.N. Kadhim is also affiliated with the Department of Civil Engineering, University of Diyala, Diyala, Iraq.E-mail: Mo-hammedSalihNM@cardiff.ac.uk (N.Kadhim) and MourshedM@cardiff.ac.uk (M.Mourshed).
a satisfactory view of building size and shape [4].Another feature of elevation data based H B extraction is the need for data pre-processing because of point cloud sparsity and data misalignment [5].As an alternative to costly data acquisition and processing, several studies have developed methods for obtaining H B from one data source such as satellite images utilising the shadows cast by buildings.This letter presents an original approach, termed the shadow-overlapping algorithm, A SO , for the automated estimation of H B from monocular VHR multispectral pan-sharpened satellite images.The contributions of this work are three-fold: (a) the generation of artificial shadows, S Ar from a simulation of the actual shadows, S Ac of the buildings in the image space; (b) the solution to the issue of overlapping shadows of the multiple buildings; and (c) the development of an algorithm by combining (a) and (b) for the automated estimation of H B by identifying the optimal height value for the given building.
The rest of this letter is structured as follows.Previous work on H B estimation are reviewed in Section II.Our approach and the simulation process are described in Section III.Experimental results are discussed in Section IV, while Section V provides concluding remarks and directions of future work.

II. PREVIOUS WORK
The first task in shadow-based H B estimation is the extraction of shadow regions from VHR satellite images.In this respect, [6] reported two widely-used techniques: ratioband (RB) and Graph-Cut partitioning via kernel mapping (KM), with overall accuracies of 85% and 79%, based on two performance metrics, F 1 -score (harmonic mean of precision and recall) and probabilistic Rand index (PRI), respectively.A semi-automatic approach was proposed by [7] to estimate H B from a single satellite image by manually adjusting the height of a simulated building and then matching the projected shadow with the actual.In contrast, [8] used volumetric shadow analysis (VSA) to automatically extract H B , which is primarily designed for buildings with full scenes of their bases and rooftops, including the sides of the building.[9] also matched shadow regions but estimated H B using simple triangulation.Estimation accuracy in this approach is dependent on the quality of the segmented rooftop polygons.
Shao et al. [10] used a classification approach to detect shadows and characterised the relationship between buildings and their shadows to estimate H B , with overall classification accuracy of 87% evaluated by a confusion matrix.The method overestimates shadow lengths of buildings, resulting in large errors in calculated H B .In [11], the number of a building's storeys as well as its length and shape were inferred based on the identification of shadow areas.In the absence of detailed validation, it appears that the empirical nature of the rule-based classification may only be practical for the presented cases.In [12], H B was estimated from the extracted shadow regions based on the sun-satellite geometry relationship using three different approaches: ratio, example-based and rule-based with mean error in estimated H B of 0.67 m, 1.51 m and 0.96 m, respectively.The method does not consider overlapping shadow regions caused by other buildings and vegetation, limiting its applicability in dense urban areas.[13] applied a semiautomated triangular analysis of shadow geometry to estimate H B , which is computationally expensive and may restrict its use if there are many buildings in the scene.
More recently, [14] estimated H B from Google Earth1 images by first calculating the ratio of H B to the shadow length of known buildings, and thereafter utilising the identified shadowlength ratio to obtain heights of other buildings with unknown heights.The approach sits somewhere between direct and and indirect approaches as some field measurements are required.[15] developed a custom filter for enhancing shadows and reducing the spectral heterogeneity of the regions of interest (ROI) to form an optimized contour model for estimating H B using shadow length and solar elevation angle.However, the presented approach is not tailored to detect the ROIs of objects with spectral dissimilarity.F 1 -score evaluation results illustrate a large aggregate height variance (4.13 m) due to the underestimation of building shadow lengths.In the studies reviewed here, the lowest and highest root-mean-square error (RMSE) were found to be 0.98 m [14] and 22.66 m [15] respectively.Error estimates of all reviewed literature are provided in Table II for comparison against our work.

III. PROPOSED METHOD
The framework of the shadow-overlapping algorithm, A SO , is illustrated in Figure 1 and the steps are discussed as follows.

A. Identification of the building shadow mask
To reliably extract the actual shadows of buildings, S Ac , we applied a refined version of our previously developed shadow detection algorithm, A S [6], as proposed by [16].Candidate shadow regions are obtained by applying a non-linear mapping function to the extracted dark pixels from both the visible, V , and near infrared, NIR, bands using Equation (1).
where, α, β and γ are parameters to control the sigmoid function.Each candidate dark pixel in V is multiplied with its counterpart in NIR to obtain the candidate shadow image D. In cases where V and NIR pixel values are dark, the calculation of image ratio (T = V /NIR) can affect shadow detection.Both D and T are in the range [0, 1].The final shadow candidate image I S is calculated using Equation (2).2 We then applied image thresholding to I S to obtain the initial shadow mask, M S,I , where the influence of noise is reduced by computing the histogram3 of I S .Although noise reduction from the M S,I was conducted to obtain an accurate binary image of the initial shadow mask, small dark areas of non-shadow pixels remained.We, therefore, proposed a morphological filter to remove all small objects; i.e. the nonshadow pixels.To keep only the areas of building shadows, the process of elimination of small objects is controlled by removing the areas <100 pixels [6].In this letter, the A S is improved further to detect and eliminate the vegetation cover by applying Normalised Difference Vegetation Index (NDVI).A binary vegetation mask M V is extracted using automatic histogram thresholding from [18].Thereafter, we subtracted M V from M S,I to obtain the final building shadow mask M S .To eliminate shadows from vegetation canopies near buildings, we applied a probabilistic fuzzy landscape approach from [19] that applies low and high thresholds to the membership values of the generated fuzzy landscapes.

B. Generation of the artificial shadow
A binary image of the building footprint or region is required to simulate its shadows for estimating H B .To do so, we automatically detected the building footprints using the graph theory based approach proposed by [19].We used the first level of partitioning in which building footprints are determined by iterative GraphCut performed in two-label partitioning comprising the background and foreground.As this study is one of the first use of the images from new satellite sensors, WorldView-3, for object detection, we conducted further refinements to the approach, in particular on the adjustment of the thresholding values, fuzzy landscape parameters, and the geometry of the extracted building regions.The mask of buildings M B was then obtained where the building footprints are accurately separated from their background.Thereafter, based on a flat terrain assumption 4 , we utilised solar information 5 in the image metadata to generate a flat linear morphological opening, the line structuring element, which is symmetric with respect to the neighbourhood centre.The direction of the sun illumination is maintained using the solar azimuth angle (λ , as per Equation ( 3)) to determine the length of the smallest connected single edge segment L, using Equation (4).where, H max T is the maximum height threshold for the buildings that cast shadows, Az is azimuth, φ is solar elevation angle El, and R img is image resolution.After creating the line structuring element, we investigated the connected components of buildings in M B and their shadows in M S to label each component with an eight-connected neighbourhood.The connected components are then extracted with a unique label for creating a mask of their perimeter pixels M Bp , as in Equation (5).
where, M Bc is the complement of the building object (Equation ( 6)), and O 3×3 is a matrix of ones.⊗ and ∩ denote morphological dilation and pixel value intersection respectively.
Once the M Bp is identified, A SO then simulates the S Ac in the opposite direction of the solar illumination to generate new regions of building shadows S max sim , as shown in Equation (7).
where, N se is the neighbourhood associated with the structuring element (se).The S Ar for each building region was then identified using Equation (8).
Because of shadows cast by adjacent and connected buildings, some shadow areas in S Ac may not have corresponding shadows in S Ar .Matching between both shadows is, therefore, evaluated using F 1 -score as A SO keeps tracing the trail of each S Ac in the shadow mask M S to fit each S Ar with the corresponding S Ac in the M S .

C. Estimation of building heights
To estimate the heights of the buildings from a single VHR satellite image, we develop the shadow-fit function f sf based on Jaccard Index (JI) which yields A SO to compute the fitting connected components over the pixels between S Ac and S Ar regions.The estimated values of the H B are extracted depending on the optimal height of a specific building using a set of H B , solar angles Az and El, number of buildings in the M B , R img , M B and M S into the f sf .We measure how fit and similar the two shadow regions are by Jaccard similarity coefficient using Equation (9).
JI computes the size of the intersection (S Ac and S Ar ) divided by the size of the union of the two regions.The computation of the overlap by JI between S Ac and S Ar is iterated until A SO finds the maximal index of fitting the two shadow regions, which approaches to 1.The algorithm then extracts the highest index which will represent the value of the optimal height H opt 6 of a given building.The final shadow region is simulated within the image space using the H opt , Az, El, R img , and a given building footprint in the M B into another developed function f ssim to visualise and combine all outputs of the S Ar regions into a single image.A SO performance is evaluated using the RMSE measure, as in Equation (10), where y i and ŷi are actual and predicted values respectively.

A. Input dataset
The test images 7 used in this study contain seven orthorectified and pansharpened images, obtained from one of the latest satellites, VHR WorldView-3 (WV3) with a ground sampling distance (GSD) of 40 cm, as shown in the first column of Figure 2. The images include four multispectral bands (B, G, R, and NIR) with a radiometric resolution of 16 bits per band.The WV3 images were taken with solar elevation (El) and azimuth (Az) angles of 16.3°and 173.2°respectively, and with a maximum off-nadir angle of 27.6°in which the cast shadow lengths were visible within the selected image.All test images are chosen to evaluate the performance of the new approach over diverse urban landscape scenes which include different geometry in buildings in challenging environmental conditions.Moreover, we used VHR WV3 images to evaluate the usage of new satellite images in urban applications, as well as to compare the performance of the proposed approach to the existing methods for estimating building heights.The performance of our approach is evaluated using the reference building height data from Ordnance Survey 8 (OS) MasterMap Topography layer.Building heights of the corresponding urban areas of the test images in the OS reference data are used to compare results with the actual.The experiments were performed on an Intel i7 PC at 3.40 GHz and 16 GB RAM.

B. Results and discussion
Results are given in Figure 2 and validation outputs in Table I.Algorithmic accuracy was measured using precision, recall and F 1 score, which are presented in Figure 3.The overall accuracy of our approach is compared against previous works on image-based building height estimation in Table II.
The results in Figure 2 are promising considering the complex building characteristics of the test images; i.e. the variations in geometry, roof colour, orientation, and challenging illumination conditions.As expected, the algorithm performs very well for standalone buildings of regular shape.Zero error was found for test image #1, which is a detached commercial building.The algorithm underestimated the heights for mixedgeometry and mixed-size buildings in #6 and #7 respectively, 8 The UK government agency responsible for the official, definitive topographic survey and mapping of Great Britain.Building heights in OS MasterMap are automatially derived from digital terrain and surface models.Further details on their creation can be found here: https://goo.gl/ohws6cby a small margin.Mean absolute error for all images was 0.65 m, demonstrating the robustness of the algorithm.
One of the reasons for the success of our approach in challenging urban conditions is that in addition to applying a thresholding scheme to filter out the building from the background, our algorithm is also able to mitigate the issue of the overlapping shadow of two buildings.When the simulated shadow of a building overlaps the shadow of another, JI is set to zero to avoid erroneous estimation of the building height.The performance appears to be affected by the presence of vegetation within the shadow region, as can be seen in image #4.Two buildings are separated by a narrow gap.The shadow of the smaller building gets blocked by the larger building on the north.The estimation error for the larger building was 0.1 m (≈1%) while the error for the smaller building was 2.55 m (≈47%).The situation was exacerbated by the presence of tall vegetation on the north-eastern side of the smaller building, preventing the accurate filtering of the shadow.
The Graph-Cut segmentation method [19] used for detecting building footprint worked well, except for image #6, in which the side of the oil storage silo and the roof were of the same colour.The shape of the building footprint therefore was larger than the actual (Figure 2, C2).However, the shadows were unaffected and the overestimated footprint did not have a noticeable effect on the estimation of the building height.On the other hand, in cases where there are many shallow buildings such as images #3 and #7, the presence of darker pavements and roads can affect the performance of the  algorithm.One possible reason is that the spectral reflectance of some non-shadow dark objects such as roads and building roofs are nearly identical to each other or to their background, resulting in dark objects being identified as shadow regions.
In addition, some adjacent buildings occlude the shadow cast by other buildings.As a result, the length of the shadow region appears longer than its actual length if the shadow regions are combined with other shadow regions cast by other objects or buildings.In contrast, they will appear shorter if some parts of the shadow region are obstructed because of the adjacent buildings.Our approach mitigated this issue with the help of morphological post-processing and thresholding in the simulation process.In addition, the approach removed all shadow pixels of non-building objects such as walls, cars and utility pole that are independent from the building shadow.
Although satellite test images present diverse building attributes from different areas, the results demonstrate the effectiveness of our approach.The comparison of accuracy between the new and past approaches in Table II indicates that our approach gives more accurate estimation of H B using shadow information in an automated manner.The average processing times for one building, one scene, and all seven test images were 0.10 s, 1.50 min and 4 min, respectively.Execution time depends on the size of the image and the complexity of the scene, as shown in Table I.Smaller images (e.g.#1 and #6) with one or two rectilinear buildings require less processing time and has the lowest estimation error.

V. CONCLUSION
In this letter, we presented a novel shadow-overlapping algorithm, A SO , for estimating building heights from a single VHR multispectral image.The new approach is based on the automated identification of building shadow regions using the solar information in the image metadata, morphological operations and Jaccard Index (JI).The algorithm is tested on different urban scenarios with varying building and neighbourhood attributes.Results are encouraging and outperforms past approaches with a 21% reduction in mean error and an overall accuracy of ∼ 80%.The increased accuracy is attributed to the consideration of overlapped shadow regions, and the removal of landscape features (e.g.shadows of vegetation canopies).
The core benefit of our approach is the cost-effective extraction of building height and subsequent 3D construction of urban areas.Applications can range from 3D urban change monitoring to the high-resolution assessment of potential and forecasting of renewable energy such as solar photovoltaics and wind.The speed and frequency (e.g.daily) of VHR acquisition compared to the more expensive methods such as LiDAR opens up significant possibilities for emergency response such as the assessment of damage to buildings and infrastructure immediately after a disaster event.Future work can expand on our methodology to enhance accuracy by differentiating between the terrain and building shadows, as well as by integrating multiple methods and data sources such as digital surface models.

)Figure 1 .
Figure 1.The framework of the shadow-overlapping algorithm A SO for building height estimation.

Figure 3 .
Figure 3. Performance of the proposed pixel based approach for estimating shadow regions: S Ac and S Ar , considering all of seven test images.