A comparative study on extraction of buildings from Quickbird-2 satellite imagery with & without fusion

Abstract: Extraction of buildings from very high resolution satellite imagery is a challenging task. Many automatic algorithms have been proposed to extract buildings from remote sensing imagery, but most of them are effective only for rectangular buildings (i.e. buildings with the same size and shape). In this paper, an attempt is made to extract buildings of different shape, size, color and pattern from Quickbird-2 imagery. In the automatic method, an adaptive K-means clustering algorithm first classifies the pixels into a number of classes, and morphological operators are then applied to extract the buildings. A manual method is also implemented to extract the building features. Both the automatic and the manual method are applied to the original Multispectral (MS) image and to the fused image obtained by fusing the Quickbird-2 Panchromatic (Pan) image with the MS image using the Fuze Go method. The performance of both methods is evaluated using qualitative and metric analysis. The experimental results show that both methods perform reasonably well. However, improving the spatial resolution of the original MS image by fusion helps to determine the building information more precisely, both spatially and spectrally.


PUBLIC INTEREST STATEMENT
Remote sensing satellite sensors capture images of the earth's surface. These images contain information on both man-made and natural features and help the viewer to visually interpret changes in those features over time. Remote sensing satellite images can be used for various applications, including building extraction. Extraction of buildings has significant applications in urban mapping and urban planning, and it also helps to assess the destruction caused by natural disasters such as floods and earthquakes. Therefore, persistent attention is paid to remote sensing satellite imagery for the extraction of buildings. Many algorithms have been developed for extracting buildings from satellite imagery, and the majority of them extract buildings accurately and effectively.

Introduction
Extraction of buildings from remote sensing satellite imagery is a challenging problem. Recent improvements in the spatial and spectral resolution of remote sensing satellite imagery have driven researchers to develop different algorithms (automatic and semi-automatic) for the extraction of buildings from very high resolution satellite imagery. Detection of buildings from satellite imagery has significant applications in urban mapping, urban planning, urban change detection analysis, target detection and geographic information systems (GIS) (Shorter & Kasparis, 2009). Further, detection of buildings is important for assessing the extent of destruction caused by natural disasters such as floods and earthquakes, and by military operations.
Even though new sensors provide satellite imagery with improved resolution, many factors in the imagery make the extraction of buildings complex. Factors such as scene complexity, building variability and sensor resolution (Mayer, 1999) affect the overall accuracy of building detection. Among man-made features, buildings are one of the most significant, and they are costly and time consuming to extract because of their variability, complexity and abundance in urban areas (Chaudhuri, Kushwaha, Samal, & Agarwal, 2016).
Generally, very high resolution satellite imagery is necessary to extract detailed spatial and spectral information about buildings. The satellite captures images of the earth with Pan and MS sensors. The Pan sensor offers an image with high spatial resolution. The MS sensors offer an image with high spectral resolution but low spatial resolution compared to the Pan image (Zhang & Mishra, 2013). Generally, the Pan image covers a wide spectral range, whereas each MS band covers a narrow range of wavelengths. Thus, there is a trade-off between the sensors in terms of spatial resolution, spectral resolution, swath width, etc., which is caused by technical and budget limitations. In reality, acquiring an MS image with the spatial resolution of the Pan image is expensive. Pan-sharpening or image fusion methods have therefore been developed to obtain an image with both high spatial and high spectral resolution. Various methods have been proposed to improve the spatial resolution of the multispectral (MS) image, such as principal component analysis (PCA) (Chavez & Kwarteng, 1989), hyperspherical color space (HCS) (Padwick, Deskevich, Pacifici, & Smallwood, 2010), high pass filter (HPF) (Chavez, Sides, & Anderson, 1991), Gram-Schmidt (GS) (Laben & Brower, 2000), Ehlers (Ehlers, Kolouch, Lohman, & Dennert-Möller, 1984) and subtractive resolution merge (SRM) (Ashraf, Brabyn, & Hicks, 2012). Some hybrid Pan-sharpening methods are also widely used nowadays, such as wavelet-principal component analysis (W-PCA) and wavelet-intensity hue saturation (W-IHS). These hybrid methods work on the principle of wavelet decomposition (King & Wang, 2001), and more details can be found in (González-Audícana, Otazu, Fors, & Seco, 2005; Ranchin & Wald, 1993). Numerous algorithms have been proposed by various researchers to extract buildings from satellite imagery, as summarized below.
Attarzadeh and Momeni (2012) proposed an object-based algorithm in which stable and variable features, obtained from inherent qualities and threshold analysis, were utilised jointly. The visual analysis indicates that the algorithm can detect the major rectangular buildings in Quickbird imagery. Wang, Yuan, and Pan (2013) detected rectangular buildings using mean shift segmentation, scale invariant feature transform (corner detection) and an adaptive windowed Hough transform. Wang, Qin, et al. (2013) adopted a bilateral filter, a line segment detector and a perceptual grouping approach. All of the algorithms mentioned above detect only rectangular buildings from the RGB image. Ghaffarian and Ghaffarian (2014b) used a double threshold method, parallelepiped supervised classification and morphological operators for building detection from Google Earth images. Their method detects buildings regardless of their geometric characteristics and also provides the training data sample to the supervised classification automatically. However, it classifies non-building features as buildings when they have equivalent spectral values. Chaudhuri et al. (2016) detected buildings from high resolution Panchromatic (Pan) imagery of Quickbird and Ikonos using morphological operators, a multiseed-based clustering technique and adaptive threshold based segmentation. Their approach relies on the shadows of the buildings for accurate extraction; images of urban areas with low-rise buildings do not contain sufficient shadows, and in that situation the buildings are not detected accurately. Liasis and Stavrou (2016) developed a new active contour model to extract buildings from the RGB image; the method detects buildings with arbitrary shapes and sizes. A limitation of the method was that some non-building objects, such as bridges and roads, were classified as buildings. Further, buildings that are close to each other were classified as a single building.
In previous studies, buildings were extracted from Google Earth images, Panchromatic images, and multispectral images with the R, G and B band combination. The majority of algorithms work efficiently for detecting buildings with the same shape (i.e. rectangular, square), colour and size. Only a few algorithms are available for detecting buildings with arbitrary shapes and sizes. In the present study, buildings of different shapes, colours and sizes are extracted from the multispectral image with the R, G, B and NIR band combination.
In the present work, the first objective was to extract buildings of different shape, size and colour from the original multispectral (MS) imagery of Quickbird-2. Two approaches were adopted to extract the buildings, (1) automatic and (2) manual, and the results of both were compared qualitatively. In the automatic approach, the vegetation portion was first removed from the input image. Secondly, an adaptive K-means clustering algorithm was adopted to cluster the pixels into different classes. Finally, the morphological fill and open operators were applied to extract the buildings. In the manual approach, an area of interest (AOI) was created on the input image, and the generated AOI was then used to subset the features of interest (i.e. buildings) from the image. The second objective was to assess how effectively improving the spatial resolution of the original MS image, by fusing the Pan and MS imagery of Quickbird-2, supports the extraction of buildings using the automatic and manual methods. The results of both methods were compared qualitatively and discussed.

Data used
The high-resolution imaging satellite Quickbird-2 was launched on 18 October 2001. Quickbird-2 acquires five bands: Panchromatic, blue, green, red and near-infrared (NIR). The spectral response of Quickbird-2 imagery is shown in Figure 1. The satellite sensor captures the Pan image with a high spatial resolution of 0.60 m and the MS image with high spectral resolution but a lower spatial resolution of 2.4 m. The data, provided by DigitalGlobe, cover the Opera House in Sydney, Australia (33° 51′ 25′′ S, 151° 12′ 55′′ E). The wavelength range of the four bands (blue, green, red and NIR) matches that of the Pan band, and all four bands are layer stacked to obtain the MS image. The Quickbird-2 imagery covers features such as commercial buildings, urban area, roads, vehicles, water, roofs, trees and grass. In the MS image, the shapes of vehicles and building roofs are not easily identifiable; on the other hand, they are easily recognizable in the Pan image. Therefore, enhancing the spatial resolution of the Quickbird-2 MS image will help to extract the buildings with high spatial and spectral information.

Methodology
The methodology for extracting the buildings using automatic and manual methods from the Quickbird-2 satellite imagery is shown in Figures 2 and 3.

Methodology for the extraction of buildings automatically
The methodology for extracting the buildings automatically is explained in detail below. The algorithm runs automatically without pre-classification or training sets; however, some initial algorithm parameters must be set by the user.

Removal of vegetation portion.
The multispectral image with the blue, green, red and near-infrared band combination was used. Visual inspection of the input image shows that vegetation is quite dominant compared to the other features; therefore, the vegetation portion is removed based on intensity values. The threshold values red > 120, green < 100 and blue < 100 were used for the removal of vegetation. These values need to be adjusted manually by the user. In our experiments, the above values gave satisfactory results.
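The thresholding step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the three thresholds are applied jointly (logical AND) to the displayed red, green and blue values and that removed pixels are set to zero, neither of which is stated explicitly above. The function name `remove_vegetation` and the toy image are hypothetical.

```python
def remove_vegetation(rgb, red_min=120, green_max=100, blue_max=100):
    """Zero out pixels that satisfy all three vegetation thresholds.

    rgb: list of rows, each row a list of (r, g, b) tuples in 0..255.
    Assumption: the thresholds are combined with a logical AND.
    """
    out = []
    for row in rgb:
        new_row = []
        for (r, g, b) in row:
            is_vegetation = r > red_min and g < green_max and b < blue_max
            new_row.append((0, 0, 0) if is_vegetation else (r, g, b))
        out.append(new_row)
    return out


# Toy 2 x 2 image: two pixels meet all three thresholds, two do not.
image = [[(200, 50, 40), (90, 90, 90)],
         [(130, 120, 80), (121, 99, 99)]]
masked = remove_vegetation(image)
```

Because the thresholds are user-adjustable, they are exposed as keyword arguments rather than hard-coded.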

Adaptive K-means clustering.
The adaptive K-means clustering algorithm functions by automatically choosing the appropriate K elements from the input image (Bhatia, 2004); further information on adaptive clustering can be found in (Chen, Luo, & Parker, 1998; Pappas, 1992). The algorithm automatically determines the K elements and generates the group of clusters (i.e. features with the same intensity value are grouped together). Generally, the algorithm assigns each pixel to a cluster based on its intensity values. Firstly, the algorithm computes the distance between the selected element and each of the clusters. The same distance function also gives the distance between two elements. Before computing the distance, it is important to normalize the properties, so that no single property dominates the distance and no property is effectively omitted from it. The Euclidean distance is adequate for determining the distance between two elements. If the input data are n-dimensional, the distance between two elements $A_1 = (A_{11}, A_{12}, \ldots, A_{1n})$ and $A_2 = (A_{21}, A_{22}, \ldots, A_{2n})$ is given by:

$$d(A_1, A_2) = \sqrt{\sum_{j=1}^{n} (A_{1j} - A_{2j})^2}$$

Using this distance function, the algorithm proceeds as follows. The distance of each cluster from every other cluster is computed and stored in a two-dimensional array as a triangular matrix. The minimum distance $d_{min}$ between any two clusters is then found, which also identifies the two nearest clusters ($C_{m1}$ and $C_{m2}$). For any un-clustered element $E_i$, the distance of $E_i$ from every cluster is computed. The element $E_i$ is then assigned to the appropriate cluster according to one of the following three cases.
(i) If the distance of the element $E_i$ from a cluster is zero, allocate $E_i$ to that cluster and move on to the next un-clustered element.
(ii) If the distance of the element $E_i$ from its nearest cluster is less than the distance $d_{min}$, allocate $E_i$ to that nearest cluster. Allocating $E_i$ may change the centroid of the cluster; therefore, the centroid is recalculated over all the elements in that cluster. Further, the distances of the affected cluster from the other clusters, the minimum distance between any two clusters and the two nearest clusters are recomputed.
(iii) If the distance $d_{min}$ is less than the distance of the element $E_i$ from its nearest cluster, select the two closest clusters ($C_{m1}$ and $C_{m2}$) and merge $C_{m2}$ into $C_{m1}$. The elements of $C_{m2}$ are removed from it and its representation is deleted. The new element $E_i$ is then added to the emptied cluster, the distances between all the clusters are re-determined and the two nearest clusters are identified again.
The above mentioned steps are iterated for all the elements to be clustered.
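The three cases can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: the first k elements seed the clusters, k is passed in by the caller, the elements are assumed to be already normalized, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two n-dimensional elements."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_pair(centroids):
    """Return (d_min, m1, m2) for the two nearest cluster centroids."""
    best = (float("inf"), -1, -1)
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            d = euclidean(centroids[i], centroids[j])
            if d < best[0]:
                best = (d, i, j)
    return best


def mean(points):
    """Component-wise mean (centroid) of a list of elements."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def adaptive_kmeans(elements, k):
    # Assumption: the first k elements seed the k clusters.
    clusters = [[e] for e in elements[:k]]
    centroids = [tuple(e) for e in elements[:k]]
    for e in elements[k:]:
        d_min, m1, m2 = closest_pair(centroids)
        dists = [euclidean(e, c) for c in centroids]
        nearest = min(range(len(centroids)), key=lambda i: dists[i])
        if dists[nearest] == 0 or dists[nearest] < d_min:
            # Cases (i) and (ii): join the nearest cluster, update its centroid.
            clusters[nearest].append(e)
            centroids[nearest] = mean(clusters[nearest])
        else:
            # Case (iii): merge the two closest clusters, reuse the freed slot.
            clusters[m1].extend(clusters[m2])
            centroids[m1] = mean(clusters[m1])
            clusters[m2] = [e]
            centroids[m2] = tuple(e)
    return clusters, centroids


# One-dimensional intensities: 10 and 12 are closer to each other than
# 100 is to any cluster, so they are merged and 100 starts a new cluster.
clusters, centroids = adaptive_kmeans([(10,), (12,), (200,), (100,)], k=3)
```

Recomputing the closest pair for every element keeps the sketch short; a triangular distance matrix, as described above, would avoid the repeated work on real images.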

Morphological fill and open operation

Fill operation.
The fill operation is used to fill the holes in the image I. A hole is defined as an area of dark pixels surrounded by lighter pixels.
The MATLAB hole-filling function (imfill) is applied to the image, where I is the binary input image. The fill operation fills every area of dark pixels bounded by light pixels and produces another binary image I2.
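For readers without MATLAB, the idea behind hole filling can be sketched in plain Python. This is an illustrative re-implementation, not the MATLAB routine: it assumes a binary image stored as nested lists of 0 and 1, treats the background as 4-connected, and uses the hypothetical name `fill_holes`.

```python
from collections import deque


def fill_holes(binary):
    """Set to 1 every background pixel that cannot be reached from the border."""
    rows, cols = len(binary), len(binary[0])
    reached = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with all background pixels on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and binary[r][c] == 0:
                reached[r][c] = True
                queue.append((r, c))
    # Grow the reachable background through 4-connected neighbours.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not reached[nr][nc] and binary[nr][nc] == 0):
                reached[nr][nc] = True
                queue.append((nr, nc))
    # Foreground stays 1; unreachable background (a hole) becomes 1.
    return [[1 if binary[r][c] == 1 or not reached[r][c] else 0
             for c in range(cols)] for r in range(rows)]


# A roof outline with a one-pixel void in the middle.
roof = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
filled = fill_holes(roof)
```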

Open operation.
The morphological area open operator is normally applied to a binary image. It removes the features that are smaller than p pixels and retains the large structures in the image.
The MATLAB area opening function (bwareaopen) is used to extract the objects from the input image. In our experiments we used a threshold value of 600 pixels (p = 600), which was found to be appropriate for the extraction of the majority of buildings.
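The area opening step amounts to connected-component labelling followed by a size filter. The sketch below is a plain Python illustration under assumptions: objects are 8-connected, the image is a nested list of 0 and 1, and the name `area_open` is hypothetical. It is not the MATLAB routine itself.

```python
from collections import deque


def area_open(binary, p):
    """Keep only the 8-connected objects that contain at least p pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if binary[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Collect one connected component with a breadth-first search.
            component = []
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            # Objects smaller than p pixels are dropped.
            if len(component) >= p:
                for r, c in component:
                    out[r][c] = 1
    return out


# A 4-pixel block survives a threshold of 3; the isolated pixel does not.
mask = [[1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
opened = area_open(mask, 3)
```

On the real imagery the threshold would be p = 600 rather than the toy value used here.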
To evaluate the performance of the automatic algorithm, the following two metrics proposed in (Lin & Nevatia, 1998) were used. Here, the performance of the algorithm was compared with ground truth data derived manually.

$$\text{Detection percentage (DP)} = \frac{100 \times TP}{TP + TN}$$

$$\text{Branch factor (BF)} = \frac{100 \times FP}{TP + FP}$$

where true positive (TP) denotes a building detected by both the automatic algorithm and the manual approach. False positive (FP) denotes a building detected by the algorithm but not manually. True negative (TN) denotes a building extracted by the manual approach but not by the automatic algorithm. A building is counted as detected if at least a small portion of it is detected by the automatic algorithm (Chaudhuri et al., 2016). The two metrics are computed by comparing the buildings detected by the manual approach with those detected by the automatic algorithm. The metric DP indicates how many of the buildings in the image are extracted by the automatic algorithm, and BF indicates how many buildings are found erroneously.
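As a worked example, the two metrics can be computed as DP = 100 × TP / (TP + TN) and BF = 100 × FP / (TP + FP), following the Lin and Nevatia (1998) definitions in which TN counts buildings found manually but missed by the algorithm. The counts below are hypothetical and are not the values of Table 1; the function names are likewise illustrative.

```python
def detection_percentage(tp, tn):
    """DP = 100 * TP / (TP + TN); TN = found manually but missed by the algorithm."""
    return 100.0 * tp / (tp + tn)


def branch_factor(tp, fp):
    """BF = 100 * FP / (TP + FP); FP = found by the algorithm but not manually."""
    return 100.0 * fp / (tp + fp)


# Hypothetical counts for a scene with 12 reference buildings.
tp, tn, fp = 9, 3, 1
dp = detection_percentage(tp, tn)  # 75.0
bf = branch_factor(tp, fp)         # 10.0
```

A higher DP and a lower BF both indicate better performance, so the two values should be read together.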
The above procedure is used to extract buildings from the original MS image of Quickbird-2, and the same procedure is then applied to the Pan-sharpened image (i.e. the fused image generated with the Fuze Go method). The results of both are compared using metrics and qualitative analysis.

Pan-sharpening
Pan-sharpening is the process of transferring the spatial resolution of the Pan image to the MS image to obtain a single fused (or Pan-sharpened) image with both high spatial and high spectral resolution (Zhang, 2010). The two most important quality aspects of a fused image are the enhancement of spatial resolution and the preservation of spectral information. In other words, an effective Pan-sharpening algorithm (here, Fuze Go) should not distort the spectral information of the MS image while enhancing its spatial resolution.

Fuze Go.
The Fuze Go method achieves a Pan-sharpened MS image by implementing the following process: The MS bands having a spectral range equal to that of the Pan band are selected. Standard deviation, mean, and covariance are calculated for both the selected MS bands and the Pan band. Then, histogram standardization is implemented on both bands. By implementing mean and standard deviation, all of the selected sets of MS and Pan images are standardized. The coefficient values are computed by applying the selected MS and Pan bands. Band weights calculated from the covariance matrix are applied for simulating a synthetic Pan band. Subsequently, a synthetic Pan band is created by applying the selected MS bands and set weights. The product-synthetic ratio is determined by applying the standardized Pan band, standardized MS bands and synthetic Pan image to obtain the fused image. Further details can be found in Zhang (2004). A common flowchart for the Fuze Go method is shown in Figure 4.
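The final ratio step of this kind of fusion can be illustrated per pixel. The sketch below is a generic synthetic-ratio example under strong assumptions, not the Fuze Go implementation: the band weights are taken as given (hypothetical equal weights here, rather than weights derived from the covariance statistics), the histogram standardization is omitted, and the MS values are assumed to be already resampled to the Pan grid. The function name `fuse_pixel` is hypothetical.

```python
def fuse_pixel(ms_values, pan_value, weights):
    """Scale each MS band by the ratio of the Pan value to a synthetic Pan value.

    ms_values: MS band values at one pixel (already resampled to the Pan grid).
    weights:   band weights used to simulate the synthetic Pan value (assumed given).
    """
    synthetic_pan = sum(w * v for w, v in zip(weights, ms_values))
    if synthetic_pan == 0:
        # Avoid division by zero on empty pixels; leave them unchanged.
        return list(ms_values)
    ratio = pan_value / synthetic_pan
    return [v * ratio for v in ms_values]


# Hypothetical blue, green, red and NIR values with equal weights.
ms = [40.0, 60.0, 80.0, 100.0]
fused = fuse_pixel(ms, pan_value=84.0, weights=[0.25, 0.25, 0.25, 0.25])
```

Because every band is multiplied by the same ratio, the relative proportions between the bands at a pixel are unchanged, which is the intuition behind preserving spectral information while injecting spatial detail.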

Quality analysis
The main aim of the Fuze Go algorithm is to preserve the relevant information present in the input images and to reduce the spatial and spectral distortion in the fused image. Therefore, the performance of the Fuze Go method is evaluated with the Quality with No Reference (QNR) index (Alparone et al., 2008).

QNR.
QNR combines the spectral distortion $D_\lambda$ and the spatial distortion $D_s$ of the fused image without requiring a reference image. The ideal value of QNR is one, which indicates the best spatial and spectral performance of the fused image. The QNR index is defined as:

$$QNR = (1 - D_\lambda)^{\alpha} (1 - D_s)^{\beta}$$

where α and β are set to one, so that equal weight is given to spatial and spectral quality.
For determining the spectral distortion, the Q index is calculated between each pair of MS bands (inter-band) at both the low and the fused resolution. For determining the spatial distortion, the Q index is calculated between each MS band and the Pan image (intra-band) at both the low and the high resolution.
The inter-band calculation at the two scales helps to determine the difference in spectral content between MS bands across scale, which indicates spectral distortion. The intra-band calculation at the two scales helps to determine the difference in spatial information between the MS bands and the Pan image across scale, which indicates spatial distortion. Ideally, the Q index calculated among images should remain constant across scale.
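The two building blocks of the index can be sketched as follows, assuming QNR = (1 − D_λ)^α (1 − D_s)^β and taking Q to be the universal image quality index computed over flattened pixel lists. The distortion values in the example are hypothetical and are not those behind the reported result; the aggregation of Q values into D_λ and D_s is left out.

```python
def q_index(x, y):
    """Universal image quality index Q between two equally sized pixel lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Undefined (division by zero) when both inputs are constant or zero-mean.
    return 4.0 * cov * mean_x * mean_y / ((var_x + var_y) * (mean_x ** 2 + mean_y ** 2))


def qnr(d_lambda, d_s, alpha=1.0, beta=1.0):
    """QNR from the spectral (d_lambda) and spatial (d_s) distortion indices."""
    return (1.0 - d_lambda) ** alpha * (1.0 - d_s) ** beta


same = q_index([1, 2, 3, 4], [1, 2, 3, 4])    # identical signals give Q = 1
scaled = q_index([1, 2, 3, 4], [2, 4, 6, 8])  # a gain change lowers Q
score = qnr(d_lambda=0.05, d_s=0.04)          # hypothetical distortion values
```

With α = β = 1 the index is simply the product of the two complements, so either distortion alone is enough to pull QNR below one.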

Methodology for extracting the building manually
The methodology for extracting the buildings using manual methods from the Quickbird-2 satellite imagery is shown in Figure 3.
Firstly, an area of interest (AOI) is generated by drawing a polygon over each building of interest on the input image. Secondly, the created AOI is used to subset the buildings from the input image.
The same procedure is applied to both input images (i.e. the original MS image and the image obtained after improving the spatial resolution of the original MS image), and the results are compared qualitatively.
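Conceptually, subsetting by an AOI keeps the pixels whose centres fall inside the digitized polygon. The sketch below illustrates this with a standard ray-casting test; it is not how the software used in this study implements AOI subsetting, and the function names, the pixel-centre convention and the toy polygon are assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def subset_by_aoi(image, polygon, fill=0):
    """Keep pixels whose centre (col + 0.5, row + 0.5) is inside the AOI polygon."""
    return [[px if point_in_polygon(c + 0.5, r + 0.5, polygon) else fill
             for c, px in enumerate(row)]
            for r, row in enumerate(image)]


# A square AOI covering the top-left 2 x 2 block of a 3 x 3 image.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
aoi = [(0, 0), (2, 0), (2, 2), (0, 2)]
subset = subset_by_aoi(image, aoi)
```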

Results and discussion
The original MS and Pan images are shown in Figures 5a and 5b. The original MS image, with the blue, green, red and near-infrared band combination, contains different features such as buildings, roads, vehicles and vegetation. It is important to note that the pattern, shape, size and spectral reflectance of the buildings vary from one building to another. It is also apparent that the color reflectance of roads and that of buildings are similar. Therefore, an attempt is made to extract buildings with different size, shape, color and pattern.
The automatic approach for extracting the buildings from the original MS image is shown in Figures 6a-6f. Firstly, the vegetation portion of the image is removed (Figure 6a). Secondly, the adaptive K-means clustering algorithm automatically classifies the pixels, based on their intensity values, into five different classes (Figure 6b). The majority of buildings were observed in class two. Only the roof tops, which appear comparatively brighter, fall under classes four and five. Since most of the building area falls under class two, the majority of buildings can still be identified even if the pixels of classes four and five are removed. From the classified image, it is notable that buildings with the same intensity value are clustered into one class. It is also noted that the intensity values of some buildings are close to those of roads, and hence they are assigned to the same class. Therefore, the binary image is created only for class two (Figure 6c), and the small portions of buildings present in the other classes are ignored. Figure 6c clearly indicates that some portion of the road is identified as building due to the similarity in intensity value and spectral reflectance. The same behaviour is reported in the literature (Ghaffarian & Ghaffarian, 2014a; Liasis & Stavrou, 2016). However, the morphological fill operation helps to restore some of the pixels that were lost in the above process. Since the roof tops of some buildings contain voids, the voids were filled using the morphological fill operator in order to reduce the potential error (i.e. the voids present in the buildings after the classification process are identified with reference to the original Pan image); the result is shown in Figure 6d.
To extract the buildings from the image, the morphological area open operator is adopted, and the result is shown in Figure 6e. The extracted buildings in the RGB color mode are shown in Figure 6f.
The spatial resolution of the MS image is enhanced by Pan-sharpening with the Fuze Go method, and a single image with both high spatial and high spectral information is obtained, which is shown in Figure 7. During the transfer of spatial information from the Pan image to the MS image, the method may generate spatial and spectral distortion. In order to evaluate the quality of the Pan-sharpened image, the statistical index QNR is adopted. The ideal value of QNR is one, which indicates the best Pan-sharpened image with high spatial and spectral information. The QNR value of 0.9173 indicates that the Fuze Go method generates a Pan-sharpened image with high spatial and spectral information.
The same automatic approach, applied to extract buildings from the Pan-sharpened image, is shown in Figures 8a-8f. The removal of the vegetation portion is shown in Figure 8a. The classification of different features using adaptive K-means clustering is shown in Figure 8b. The conversion of the classified image into the binary image, followed by the morphological fill and open operators, is shown in Figures 8c-8e. The extracted buildings in the RGB color mode are shown in Figure 8f.
To evaluate the performance of the automatic algorithm, both metrics and visual analysis were used. The total number of buildings present in the input image is twelve, as shown in Figure 9. Table 1 shows the performance of the automatic algorithm using the two metrics DP and BF. Here, the building detection percentage of the automatic algorithm, both before and after fusion, is reasonable for such a challenging MS image. The branch factor, in turn, indicates the percentage of buildings found erroneously. It is notable that some portion of the road is identified as buildings due to the similarity in intensity value and spectral reflectance. However, the loss of information is higher in the Pan-sharpened image than in the original image. The effect of resolution on building detection is discussed in (Segl & Kaufmann, 2001). The common challenges in detecting buildings at pixel resolutions of 1 m or finer are a low signal-to-noise ratio and weak object signals.
Generally, it is well understood that some loss of information occurs when buildings are extracted with any automatic algorithm. The loss may be higher or lower, depending on the scene complexity, the building variability and the abundance of buildings in the urban area. If there is a significant difference in feature size, pattern and shape, loss of information ensues. In our case, the binary image was created only for class two, so the small portions of buildings present in the other classes were ignored, and the morphological fill operation restored only some of the lost pixels. If the threshold value for the morphological open operation is too large or too small, it may lead to over-segmentation or under-segmentation, respectively.
The visual comparison of the spatial and spectral information of the buildings extracted by the automatic approach (before and after fusion) is shown in Figures 10 and 11. Here, the red circles mark the sample locations used to compare the extracted buildings in terms of spatial and spectral information.
The circles A, B and C mark sample extracted buildings that differ in pattern, color, size and shape. In circle A of Figure 11, a small white portion on top of the building is clearly visible both spatially and spectrally, whereas the same object is not clearly visible in Figure 10. In circle B of Figure 11, the rooftop of the building can be clearly interpreted both spatially and spectrally compared to circle B in Figure 10. Moreover, circle C in Figure 11 shows that the building roof top with tiny structures is clearly visible in size, shape and color, whereas the same tiny structures are not identifiable in Figure 10.
Therefore, it is evident that the improvement in spatial and spectral information helps to determine the information about buildings more effectively. However, loss of information is visible in both images (i.e. Figures 10 and 11) due to various factors: the pattern, size, shape and color of the buildings differ from one another, and some of the buildings have the same color reflectance as the road. Moreover, the spatial and spectral information about buildings is more detailed in the Pan-sharpened image than in the original MS image.
In order to extract the buildings without any loss of information, the manual extraction method is adopted using the ERDAS Imagine 2014 software. At first, the area of interest (AOI) is created around the features of interest (i.e. buildings) in the input image. Secondly, the same AOI is used to subset the features of interest from the image. The results of applying this methodology to the original MS image and to the Pan-sharpened image are shown in Figures 12 and 13. Here, the manual method extracts all the buildings without any loss of information, and the Pan-sharpened image, with its high spatial and spectral information, helps to extract the building information very effectively. Nevertheless, manual extraction of buildings is time consuming and depends on the user's experience in digitizing the building boundaries.
By comparing the results of the automatic and manual methods, it can be seen that the automatic algorithm works efficiently when the features of interest in the image have the same pattern and size. Generally, some loss of information in the output image is common regardless of the input image (i.e. whether the features have the same or different size, shape and pattern). Here, the loss of information in the automatic algorithm is attributed to the different size, shape, pattern and color of the buildings. With the manual method, the loss of information in extracting buildings is smaller, but the method requires more time to complete.

Conclusions
In this paper, an automatic and a manual method for the extraction of buildings are presented and compared for Quickbird-2 imagery. In the first phase, both methods are applied to the original MS image. In the second phase, buildings are extracted after improving the spatial resolution of the original MS image by fusion, and the final results of both methods (i.e. from the original MS image and from the Pan-sharpened image) are compared. The effect of improving the spatial resolution of the original MS image on the extraction of buildings is assessed qualitatively and quantitatively.
The results of the automatic method show that the major buildings are detected correctly in both the original MS image and the Pan-sharpened image. However, loss of information is evident in both images. The results of the manual method indicate that the extraction of buildings is achieved with minimal loss of information in comparison with the automatic method.
The results of both the automatic and the manual method on the Pan-sharpened image indicate that the spatial and spectral information of buildings is clearly identifiable. Therefore, improving the spatial resolution of the original MS image increases the spatial and spectral information available about buildings.
For any input image in which the features of interest differ from each other in shape, size and color, the manual method is recommended in order to reduce the loss of information. However, the effectiveness of the method depends on the user's experience, and it is a time consuming process.
It is to be noted that automatic algorithms are very effective when all buildings are rectangular. In our case, the building shapes differ from one another; nevertheless, the performance of the automatic approach presented in this paper for the extraction of buildings with different shape, size and color is reasonable.