Building Extraction from High–Resolution Remote Sensing Images by Adaptive Morphological Attribute Profile under Object Boundary Constraint

A novel adaptive morphological attribute profile under object boundary constraint (AMAP–OBC) method is proposed in this study for automatic building extraction from high-resolution remote sensing (HRRS) images. By investigating the associated attributes in morphological attribute profiles (MAPs), the proposed method establishes corresponding relationships between AMAP–OBC and building characteristics in HRRS images. In the preprocessing step, the candidate object set is extracted by a group of rules for screening of non-building objects. Second, based on the proposed adaptive scale parameter extraction and object boundary constraint strategies, AMAP–OBC is conducted to obtain the initial building set. Finally, a further identification strategy with adaptive threshold combination is proposed to obtain the final building extraction results. Through experiments of multiple groups of HRRS images from different sensors, the proposed method shows outstanding performance in terms of automatic building extraction from diverse geographic objects in urban scenes.


Introduction
With the continuous improvement of satellite and sensor technology, high-resolution remote sensing (HRRS) images have been widely used in many fields, such as updating geographic databases and creating urban thematic maps. As buildings are among the most representative types of artificial targets in urban scenes, extraction of buildings from HRRS images is important in these applications [1][2][3]. Compared with traditional medium- and low-resolution remote sensing images, HRRS images contain a great amount of semantic, textural, and spatial information on land covers. Hence, HRRS images are appropriate data sources for building feature extraction. However, the increasing resolution of remote sensing images leads to prominent high intraclass variance and low interclass variance, which reduce the ability to distinguish buildings from other geographic objects [4].
To address this challenge, much effort has been made to import spatial information as a supplement to spectral and textural features [5]. It has been proven that such information is highly effective in improving the ability to identify buildings in HRRS images [6,7]. In current works, machine learning-based methods are the main strategy for building feature extraction [8][9][10][11]. However, such methods rely heavily on a huge number of samples and on the effective selection of training samples. This means that in building feature extraction applications, such methods may not be applicable or may not obtain reliable results due to the lack of samples in HRRS images [4].

Analysis of Building Characteristics in HRRS Images
The geometric relationship between the sensor, the ground, and buildings in remote sensing images is shown in Figure 1.
Roof, ground, and shadow, respectively, represent the roof of a building, the adjacent ground, and shadow caused by the building occluding sunlight. In general, different building roofs have different spectra and reflectivity due to material differences, so there may be significant differences in spectral and textural characteristics. However, since the pixels belonging to the roof of the same building have strong spectral and textural consistency, they are manifested as a homogeneous connected area constrained by the boundary of the building. In terms of geometric features, buildings usually behave as various rectangles or other regular shapes, and morphological attributes such as area, etc., are significantly different from other geographic objects such as roads and vehicles. The shadow of a building shows a significant dark color and a shape-regular connected area, and is distributed adjacent to the building, so it frequently produces confusion in the building extraction.

MAP Theory
MAP theory is developed from set theory: adjacent pixels are first selected through spectral similarity and spatial connectivity to construct connected areas; different operators are then designed with different scale parameters and attributes according to the characteristics of the geographic objects; finally, specific objects are extracted through differential processing [22]. Let $M$ denote a grayscale image, $i$ a pixel of the image, and $k$ an arbitrary gray level. Then a binary image $Th_k^i(M)$ can be obtained. Traversing all pixels in the image yields a series $Th_k(M)$, and the maximum gray level that satisfies the attribute constraint is taken as the result $\Gamma^i(M)$ of the attribute opening operation at point $i$. By the symmetry of the attribute transformation, the attribute closing transformation $\Phi^i(M)$ of point $i$ can be obtained, where $\Phi^i(Th_k(M)) = (\Gamma^i((Th_k(M))^c))^c$ denotes the attribute closing transformation of $Th_k(M)$, and $(Th_k(M))^c$ denotes the complement of $Th_k(M)$. All pixels are traversed to obtain the attribute opening transformation $\Gamma(M)$ and the attribute closing transformation $\Phi(M)$ of $M$.
On this basis, let $T = \{T_0, T_1, \ldots, T_W\}$ denote the scale parameter set of MAPs and $T_w \in T$ denote the $w$th scale parameter. The differences between adjacent scales of the attribute opening results and of the attribute closing results are taken separately, and these differences constitute the differential attribute profile (DAP) transformation $\Delta\Psi(M)$ of $M$, where $\Gamma_{T_w}(M)$ and $\Phi_{T_w}(M)$ denote the attribute opening and closing transformation results obtained at scale $T_w$, respectively. Because attributes differ, objects respond most strongly at different scale parameters, so a set of pixels that conforms to the attribute range of buildings can be extracted according to this principle.
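The threshold decomposition described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the attribute is fixed to area, 4-connectivity is assumed, gray levels are assumed to be non-negative integers, and the function names (`components`, `area_opening`, `differential_opening_profile`) and the sign convention of the differences are choices made for this example only.

```python
from collections import deque

def components(mask):
    """Return the 4-connected components of a binary mask (list of lists)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                comp, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps

def area_opening(img, t):
    """Per pixel, keep the largest level k at which the pixel lies in a
    component of the thresholded image {img >= k} whose area is >= t."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    # Visiting levels in ascending order means the last write is the maximum.
    for k in sorted({v for row in img for v in row}):
        mask = [[1 if v >= k else 0 for v in row] for row in img]
        for comp in components(mask):
            if len(comp) >= t:
                for y, x in comp:
                    out[y][x] = k
    return out

def differential_opening_profile(img, scales):
    """Differences between openings at adjacent scales (assumed sign: finer minus coarser)."""
    opened = [area_opening(img, t) for t in scales]
    return [[[a - b for a, b in zip(ra, rb)] for ra, rb in zip(fine, coarse)]
            for fine, coarse in zip(opened, opened[1:])]

img = [[0, 0, 0, 0, 0],
       [0, 5, 0, 3, 3],
       [0, 0, 0, 3, 3],
       [0, 0, 0, 0, 0]]
dap = differential_opening_profile(img, [1, 2, 5])
```

In this toy image the isolated bright pixel responds between scales 1 and 2, while the 2 × 2 block responds between scales 2 and 5, which is the behavior the differential step relies on. The closing profile could be obtained in the same way by applying `area_opening` to the inverted image and inverting the result.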

Constitution of Building Attribute Set
The constitution of the building attribute set is determined based on prior knowledge and the semantic characteristics contained in different attributes. According to the characteristics of buildings analyzed in Section 2, this study constructed a building attribute set with four attributes: area, diagonal, standard deviation, and normalized moment of inertia (NMI).
Among them, area reflects the size of the building; diagonal describes the diagonal length of the minimum bounding rectangle, thus reflecting the aspect ratio of the building; standard deviation describes the degree of gray variation inside the building; and NMI reflects the shape and center of gravity of the building.
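As an illustration, the four attributes could be computed for one connected area as in the sketch below. Several choices here are assumptions rather than definitions taken from this study: the bounding rectangle is axis-aligned, the standard deviation is the population form, and NMI uses the common form sqrt(J)/m, where J is the gray-weighted moment of inertia about the gray-weighted centroid and m is the sum of gray values (which must be positive). The function name `building_attributes` is hypothetical.

```python
import math

def building_attributes(pixels):
    """pixels: dict mapping (row, col) -> gray value for one connected area."""
    area = len(pixels)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    # Diagonal of the axis-aligned bounding rectangle (assumption).
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    diagonal = math.hypot(height, width)
    # Population standard deviation of the gray values.
    grays = list(pixels.values())
    mean = sum(grays) / area
    std = math.sqrt(sum((g - mean) ** 2 for g in grays) / area)
    # NMI, assumed form: sqrt(inertia about the gray centroid) / gray mass.
    mass = sum(grays)
    cy = sum(r * g for (r, _), g in pixels.items()) / mass
    cx = sum(c * g for (_, c), g in pixels.items()) / mass
    inertia = sum(((r - cy) ** 2 + (c - cx) ** 2) * g for (r, c), g in pixels.items())
    nmi = math.sqrt(inertia) / mass
    return {"area": area, "diagonal": diagonal, "std": std, "nmi": nmi}

# A uniform 2 x 2 block with gray value 10.
block = {(0, 0): 10, (0, 1): 10, (1, 0): 10, (1, 1): 10}
attrs = building_attributes(block)
```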

Method
The implementation of the proposed method mainly included image segmentation and non-building object screening, initial building set extraction by AMAP-OBC, and further identification of indefinite objects. A specific description of the implementation process is shown in Figure 2. As shown in Figure 2, the discrete pixels in an HRRS image are first classified into geographic objects with semantic information through image segmentation, thus providing basic analysis units for building extraction [23]. The quality of segmentation has a strong influence on the practical value of the building extraction results [24]. Therefore, wavelet-JSEG (WJSEG), an effective high-resolution remote sensing image segmentation method, was adopted in this study [25].
Compared with the well-known eCognition commercial software, WJSEG locates object boundaries more accurately in the complex background of a city, and helps to increase the transparency of the proposed method [26]. As an advanced multiscale segmentation method, WJSEG mainly includes four steps: multiband image fusion, seed region conduction and secondary extraction, interscale constraint segmentation, and region merging. The specific implementation steps can be found in [25].

Non-Building Object Screening
On the basis of the segmentation results, objects whose morphological characteristics differed significantly from those of buildings were removed, along with objects detected as shadow or vegetation. For each extracted object, the specific screening rules were as follows:
Rule 1: In order to reduce false positives caused by shadow, a pixel-level shadow detection method based on the Gaussian distribution background model theory was adopted. The specific implementation steps can be found in [27]. If the proportion of shadow pixels in an object was greater than 80%, the object was considered to be seriously affected by shadow and was removed.
Rule 2: In order to reduce false positives caused by vegetation such as lawn and tree canopy, a vegetation index based on the red-green-blue (RGB) model was adopted to extract vegetation pixels. The specific implementation steps can be found in [18]. If the proportion of vegetation pixels in an object was greater than 80%, the object was removed.
Rule 3: If there were fewer than 10 pixels in an object, the object was considered to be a dim or small target, such as a vehicle or noise, and it should be removed.
Rule 4: If the rectangular degree of an object was less than 0.8 and the length-width ratio of its minimum bounding rectangle was greater than 5, the object was considered to be a narrow target, such as a road or waterway, and it should be removed [28].
After the discrimination of all objects in segmentation results with the above group of rules, the remaining objects constituted a candidate object set as the input for subsequent building extraction.
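The four screening rules can be summarized in a short sketch. It assumes that the per-object quantities (pixel count, shadow and vegetation pixel proportions, rectangular degree, and length-width ratio of the minimum bounding rectangle) have already been computed by the methods cited above; the `SegmentedObject` container and the `is_candidate` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

@dataclass
class SegmentedObject:
    n_pixels: int            # number of pixels in the object
    shadow_ratio: float      # proportion of shadow pixels, in [0, 1]
    vegetation_ratio: float  # proportion of vegetation pixels, in [0, 1]
    rectangularity: float    # rectangular degree, in [0, 1]
    aspect_ratio: float      # length-width ratio of the minimum bounding rectangle

def is_candidate(obj):
    """Return True if the object survives Rules 1-4."""
    if obj.shadow_ratio > 0.8:        # Rule 1: seriously affected by shadow
        return False
    if obj.vegetation_ratio > 0.8:    # Rule 2: mostly vegetation
        return False
    if obj.n_pixels < 10:             # Rule 3: dim or small target
        return False
    if obj.rectangularity < 0.8 and obj.aspect_ratio > 5:  # Rule 4: narrow target
        return False
    return True

objects = [
    SegmentedObject(400, 0.05, 0.10, 0.92, 1.6),  # plausible roof
    SegmentedObject(300, 0.90, 0.00, 0.85, 2.0),  # shadow
    SegmentedObject(6, 0.00, 0.00, 0.90, 1.0),    # vehicle or noise
    SegmentedObject(900, 0.10, 0.05, 0.40, 9.0),  # road-like strip
]
candidates = [o for o in objects if is_candidate(o)]
```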

Producing Attribute Profile Under the Object Boundary Constraint
During the process of calculating the attributes, the connected area for each pixel is produced by the similarity between adjacent pixels in traditional MAPs, as shown in Figure 3.
As shown in Figure 3, i represents a general pixel that belongs to an object in the candidate object set. The corresponding connected area extracted in a traditional MAP is expressed as the area with a black mesh pattern. This area has invaded adjacent objects, so the inherent attributes of the current object cannot be accurately extracted. Therefore, this study retained only the pixels inside the object to produce the connected area for pixel i, as shown by the area with red lines. That is, the connected area was constrained by the inherent boundary of the object to which pixel i belongs, thus providing more accurate attributes for subsequent building extraction.
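The effect of the boundary constraint can be sketched with a small region-growing example. The similarity criterion used here (absolute gray difference between 4-connected neighbors within a tolerance `tol`) is an assumption made for illustration, as are the function name `connected_area` and the use of a segmentation label map to represent object boundaries.

```python
from collections import deque

def connected_area(gray, seed, tol, labels=None):
    """Grow a connected area from `seed` over similar adjacent pixels.
    If `labels` is given, growth is confined to the seed's object."""
    h, w = len(gray), len(gray[0])
    sy, sx = seed
    area = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if not (0 <= ny < h and 0 <= nx < w) or (ny, nx) in area:
                continue
            if abs(gray[ny][nx] - gray[y][x]) > tol:
                continue  # not similar enough to the neighboring pixel
            if labels is not None and labels[ny][nx] != labels[sy][sx]:
                continue  # object boundary constraint
            area.add((ny, nx))
            queue.append((ny, nx))
    return area

gray = [[10, 10, 11, 12],
        [10, 10, 11, 12]]
labels = [[1, 1, 2, 2],
          [1, 1, 2, 2]]
unconstrained = connected_area(gray, (0, 0), tol=1)
constrained = connected_area(gray, (0, 0), tol=1, labels=labels)
```

Without the label map the area leaks across the gradual gray transition into the neighboring object (8 pixels); with it, the area stops at the object boundary (4 pixels).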

Adaptive Scale Parameter Extraction
Based on the connected areas, the MAPs of different attributes were constructed according to Equations (1)-(4) in Section 3.1. In this process, a reasonable selection of the scale parameter set was the key factor affecting building extraction. In urban scenes, building clusters in the same local area (such as a residential or industrial area) usually have a class of typical morphological attributes different from those of other features. Therefore, in the multiscale MAP of each attribute, it should be ensured that building clusters with typical attributes in the scene can be extracted through subsequent differential processing, while other objects are removed. Based on this principle, this study proposed an adaptive extraction strategy for scale parameters, with the following steps:
Step 1: Set the range and subintervals of each attribute interval in which the optimal scale parameters are adaptively searched. According to suggestions regarding the fluctuation range of building attributes in [29][30][31], the area interval was set as [500, 28000], the diagonal interval as [10, 100], the standard deviation interval as [10, 70], and the NMI interval as [0.2, 0.5], and each interval was divided equally into 50 subintervals.
Step 2: For each attribute, let $SI_x$ denote the $x$th subinterval. Under the object boundary constraint, the number of connected areas that met the attribute range corresponding to $SI_x$ was calculated, denoted by $Q_x$.
Step 3: Let $\mu$ denote an index of the degree of change. If $Q_x$ is significantly higher than $Q_{x-1}$ with respect to $\mu$, the initial value of $SI_{x-1}$ and the final value of $SI_x$ are included as optimal scale parameters. If $Q_{x+1}$ is significantly lower than $Q_x$ with respect to $\mu$, the initial value of $SI_x$ and the final value of $SI_{x+1}$ are included as optimal scale parameters; otherwise, discrimination continues in the next interval. Based on the results of multiple experiments, $\mu$ was set to 0.4 in this study.
The proposed adaptive scale parameter extraction strategy was based on the following relationship between morphological attributes and the characteristics of buildings in HRRS images: if the number of connected areas satisfying the attribute range corresponding to $SI_x$ was significantly higher than that of $SI_{x-1}$, or the number of connected areas satisfying the attribute range corresponding to $SI_{x+1}$ was significantly lower than that of $SI_x$, then $SI_x$ matched the typical morphological attributes of building clusters that might exist in the scene. Therefore, $SI_x$ had to be considered a typical interval, and the corresponding scale parameters had to be retained to ensure that the connected areas corresponding to $SI_x$ could be effectively extracted during the differential processing.
Step 4: Traverse all intervals and use all extracted optimal scale parameters to form the final scale parameter set $T_{opt} = \{T_0, T_1, \ldots, T_W\}$. Then, the proposed AMAP-OBC could be produced based on $T_{opt}$ under the object boundary constraint.
Step 5: Conduct DAP by the steps introduced in Section 3.1. On this basis, the pixels in each DAP that conformed to the attribute range of buildings constituted a union set, and the pixels belonging to shadow and vegetation were removed. Finally, combined with the obtained set of candidate objects, all objects containing building pixels were retained to form the initial building set.
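Steps 1 to 4 can be sketched as follows. The exact change conditions involving µ are not reproduced in this text, so the sketch assumes a relative-change test, (Q_x − Q_{x−1}) / Q_x > µ for a rise and (Q_x − Q_{x+1}) / Q_x > µ for a fall; this test, the function names `subintervals` and `select_scales`, and the toy counts are assumptions for illustration and not the authors' formulas.

```python
def subintervals(lo, hi, n=50):
    """Step 1: split [lo, hi] equally into n (start, end) subintervals."""
    step = (hi - lo) / n
    return [(lo + i * step, lo + (i + 1) * step) for i in range(n)]

def select_scales(intervals, counts, mu=0.4):
    """Steps 3-4: collect scale parameters around typical intervals.
    counts[x] is Q_x, the number of connected areas falling in SI_x.
    The relative-change test below is an assumed form of the mu condition."""
    scales = set()
    for x, q in enumerate(counts):
        if q <= 0:
            continue
        # Q_x significantly higher than Q_{x-1}
        if x > 0 and (q - counts[x - 1]) / q > mu:
            scales.update((intervals[x - 1][0], intervals[x][1]))
        # Q_{x+1} significantly lower than Q_x
        if x + 1 < len(counts) and (q - counts[x + 1]) / q > mu:
            scales.update((intervals[x][0], intervals[x + 1][1]))
    return sorted(scales)

# Toy example with 5 subintervals instead of 50.
toy_intervals = subintervals(0, 50, n=5)
toy_counts = [2, 2, 10, 9, 1]   # Q_x per subinterval (Step 2, assumed given)
t_opt = select_scales(toy_intervals, toy_counts, mu=0.4)
```

In the toy data the count jumps at the third subinterval and drops after the fourth, so the selected scale parameters bracket the range in which most connected areas fall.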

Further Identification of Indefinite Objects
The extraction results of the initial building set are not reliable, because an object only needs to contain building pixels from AMAP-OBC to be included. For this reason, this study partitioned the initial building set into a definite building set, an indefinite object set, and a definite non-building set, and further identified the indefinite objects. The specific steps were as follows:
Step 1: In the initial building set, let $g$ denote the building pixel proportion in an object, $g_{max}$ denote the maximum of $g$, and $g_{mid} = 0.5 \times g_{max}$. As shown in Figure 4, $p(g)$ represents the number of objects with $g$ in the initial building set, and the fluctuation intervals of the dynamic thresholds $\delta_1$ and $\delta_2$ are $(0, g_{mid})$ and $(g_{mid}, g_{max})$, respectively.
Step 2: Calculate the Jeffries-Matusita (J-M) distance between any two objects that satisfy $g \in (0, \delta_1)$ and $g \in (\delta_1, g_{mid})$ to obtain the sum of these distances, $JM_{\delta_1}$. Similarly, $JM_{\delta_2}$ can be calculated based on the objects that satisfy $g \in (g_{mid}, \delta_2)$ and $g \in (\delta_2, g_{max})$. Let $JM_{\delta_{1,2}} = JM_{\delta_1} + JM_{\delta_2}$; by traversing all possible combinations of $\delta_1$ and $\delta_2$, the optimal combination can be adaptively extracted when the minimum value of $JM_{\delta_{1,2}}$ is obtained, as shown by $\delta_{opt1}$ and $\delta_{opt2}$ in Figure 4. On this basis, the definite building set, indefinite object set, and definite non-building set are extracted.
Step 3: For each object $R_{indefinite}$ in the indefinite object set, further identification was made. Let the sum of J-M distances between $R_{indefinite}$ and all objects in the definite building set be $JM_{true}$, and the sum of J-M distances between $R_{indefinite}$ and all objects in the definite non-building set be $JM_{false}$. If $JM_{true} < JM_{false}$, put $R_{indefinite}$ in the definite building set; otherwise, put $R_{indefinite}$ in the definite non-building set.
Step 4: Traverse all objects in the indefinite object set to obtain the final building extraction results.
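The decision rule in Step 3 can be sketched as below. The text does not state which features the J-M distance is computed on, so this sketch assumes that each object is summarized by the mean and standard deviation of its gray values and modeled as a univariate Gaussian, with the common convention JM = 2(1 − e^(−B)), where B is the Bhattacharyya distance. The function names `jm_distance` and `is_building` and the sample values are hypothetical.

```python
import math

def jm_distance(m1, s1, m2, s2):
    """J-M distance between two univariate Gaussians (mean, std > 0).
    Uses JM = 2 * (1 - exp(-B)), with B the Bhattacharyya distance."""
    v1, v2 = s1 * s1, s2 * s2
    b = (m1 - m2) ** 2 / (4 * (v1 + v2)) + 0.5 * math.log((v1 + v2) / (2 * s1 * s2))
    return 2 * (1 - math.exp(-b))

def is_building(indefinite, buildings, non_buildings):
    """Step 3: compare summed J-M distances to the two definite sets."""
    jm_true = sum(jm_distance(*indefinite, *b) for b in buildings)
    jm_false = sum(jm_distance(*indefinite, *n) for n in non_buildings)
    return jm_true < jm_false

# Each object is a (mean gray, std gray) pair; values are illustrative only.
definite_buildings = [(102.0, 5.0), (98.0, 6.0)]
definite_non_buildings = [(40.0, 5.0), (30.0, 8.0)]
roof_like = (100.0, 5.0)
ground_like = (35.0, 6.0)
```

With these toy statistics, `is_building(roof_like, ...)` assigns the object to the definite building set and `is_building(ground_like, ...)` assigns it to the definite non-building set.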

Experiments and Discussion
In the experiments, three datasets of HRRS images were used. Combining statistical accuracy and visual inspection, the performance of the method in this study was verified by comparison with a variety of advanced building extraction methods.

Dataset Description
Dataset 1 was a pan-sharpened WorldView image with red, green, and blue bands of Chongqing, China; the acquisition date was August 2011, the spatial resolution was 0.5 m, and the image size was 1370 pixels × 1370 pixels, as shown in Figure 5a. Dataset 2 was an aerial remote sensing image with red, green, and blue bands of Nanjing, China; the acquisition date was October 2011, the spatial resolution was 2 m, and the image size was 300 pixels × 500 pixels, as shown in Figure 5b. Dataset 3 was a pan-sharpened WorldView image with red, green, and blue bands of Nanjing, China; the acquisition date was December 2012, the spatial resolution was 0.5 m, and the image size was 1400 pixels × 1400 pixels, as shown in Figure 5c. In addition, the ground truth maps were manually delineated by field investigation and visual interpretation, in which white objects represent buildings and black objects represent non-buildings. Some representative areas marked in red boxes (patches I1, I3, and I5) and blue boxes (patches I2, I4, and I6) in Figure 5 were chosen for detailed comparison and analysis.
The reasons for selecting these three datasets for the experiments were as follows:
(1) Airborne and satellite-borne sensors are currently the two principal forms of HRRS image acquisition. Using these datasets helped to analyze the applicability of the proposed method to different data sources.
(2) These datasets were typical urban scenes, mainly composed of land covers such as buildings, roads, vegetation, wasteland, and shadows, which helped to verify the stability and reliability of the proposed method.
(3) The acquisition seasons of these datasets were different, which helped to analyze the influence of vegetation factors on the extraction of buildings.
(4) As an aerial remote sensing image, dataset 2 had a large oblique imaging angle. Comparison with the other two datasets helped to analyze the influence of building inclination, especially for high-rise buildings, on the proposed method.

Experimental Setup
In order to analyze the performance of the proposed method comprehensively and objectively, this study used four advanced building extraction methods for comparative experiments: the traditional MAP method (method 1) [5], the MBI-based method (method 2) [16], the top-hat filter and k-means classification based method (method 3) [7], and the gray-level co-occurrence matrix (GLCM) and support vector machine (SVM) based method (method 4) [20]. Comparison with method 1 helped to analyze the validity of the proposed boundary constraint strategy. Methods 2 and 3 were automatic building extraction methods: a building index method and a rooftop detector method, respectively. Method 4 was a machine learning method. These three types of advanced methods were adopted to evaluate the overall performance of the proposed method. Methods 1 and 2 were pixel-based, so it was difficult to compare their building extraction results directly with those of an object-based method. Therefore, starting from the building pixels extracted by methods 1 and 2, the subsequent implementation steps were the same as in the proposed method. At the same time, in order to ensure consistency of the basic units, the segmentation in methods 3 and 4 was replaced with WJSEG, and the other implementation steps and parameter settings were consistent with the original references. The parameter settings of the proposed method and the corresponding basis were given in Section 4. On this basis, the adaptively extracted scale parameters and the parameter combinations of $\delta_{opt1}$ and $\delta_{opt2}$ are shown in Tables 1-4.

General Results and Analysis of Datasets
The building extraction results of the three datasets are given in Figures 6-8. The quantitative results of the different methods are reported in Tables 5-7. Based on the statistical accuracy and visual inspection in the three groups of experiments, the overall accuracy (OA) of the proposed method reached more than 90%, with a fluctuation range of less than 2%, which was significantly higher than that of the other four comparison methods. Therefore, under the challenges brought by the different data sources, the proposed method had high accuracy, high stability, and high reliability. Moreover, the seasonal differences in the collection of the three datasets and the differences in building inclination did not significantly affect the extraction accuracy of the proposed method. Compared with method 1, the FPs of the proposed method in the three groups of experiments were significantly reduced, and there was no significant difference in FNs between the two methods. This shows that MAPs had the advantage of being very sensitive to potential buildings in the image. On the other hand, it also shows that the traditional MAP strategy of constructing a connected area based only on similarities between adjacent pixels had difficulty accurately describing the inherent attributes of the object, which led to an increase in FPs and a significant decrease in OA. Therefore, the object boundary constraint strategy proposed in this study was feasible, effective, and necessary.
Except for the OA of method 3 in dataset 1 (82.4%), the OA of methods 2 and 3 in the three groups of experiments was lower than 80%. This was mainly due to the fixed-shape structural elements adopted by these methods in constructing the descriptors. Such descriptors were only sensitive to pixels belonging to buildings whose morphological characteristics were similar to the structural elements, while ignoring the diversity of building shapes and sizes in urban scenes, so it was difficult to obtain ideal results. In addition, since shadows were not considered in method 3, a certain number of shadow objects were falsely retained in the final building extraction results.
Since method 4 was a classification method based on machine learning, it had higher requirements for an abundance of samples. However, there were only 833, 462, and 212 samples after WJSEG segmentation in datasets 1, 2, and 3, respectively, so it was difficult to reflect the real accuracy that method 4 could reach. Therefore, although the OA of method 4 fluctuated slightly and exceeded 80% in the three groups of experiments, FPs and FNs show large fluctuations. In addition, as the OA in dataset 1 with more samples (83.2%) was higher than that in dataset 2 (80.1%) and dataset 3 (80.7%), we believed that with increased samples, the OA of method 4 would be significantly improved.
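For reference, the accuracy figures discussed in this subsection can be related to a confusion count between an extraction result and the ground truth map. The sketch below assumes a pixel-level comparison of two binary masks (1 for building, 0 for non-building) and the standard definition OA = (TP + TN) / N; whether this study counts at the pixel or object level, and how it normalizes FPs and FNs, is not stated in this text. The function names are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN between two binary masks of equal size."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1   # extracted as building, actually non-building
            elif t:
                fn += 1   # building missed by the extraction
            else:
                tn += 1
    return tp, fp, fn, tn

def overall_accuracy(pred, truth):
    """OA = (TP + TN) / total number of pixels."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return (tp + tn) / (tp + fp + fn + tn)

pred = [[1, 1], [0, 0]]
truth = [[1, 0], [0, 1]]
oa = overall_accuracy(pred, truth)
```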

Visual Comparison of Representative Patches
The results of the representative patches in each dataset are reported in Figure 9 (patches I1 and I2), Figure 10 (patches I3 and I4), and Figure 11 (patches I5 and I6). The results obtained by the proposed method were the most complete and precise in most scenes. The results for each representative patch are discussed as follows.
As the most common types of buildings in urban HRRS images, residential and industrial buildings are always regions of interest (ROIs) in related applications. Therefore, the following analysis and discussion focused on the extraction effects for these two types of buildings. First of all, for small residential buildings (e.g., those in the yellow rectangle of I1) and large industrial buildings (e.g., those in the yellow rectangle of I6), the analysis shows that the adopted WJSEG could accurately extract their complete contours with different shapes, thus providing effective analysis units for subsequent building extraction.
In terms of residential building extraction, the proposed method accurately extracted the vast majority of buildings, as shown in the yellow rectangles of I1, I3, and I5, which was significantly better than the other comparison methods. At the same time, mixed shadows, vegetation, roads, and other artificial targets (e.g., the green rectangle in I3) were effectively filtered out. Among the four comparison methods, the extraction effect of method 4 was better than that of the other three. Especially in I5, due to the irregular shapes of the buildings, methods 1, 2, and 3 all had serious FPs and FNs.
In the building extraction of industrial areas, for example, in the yellow rectangle of I2, only methods 3 and 4 and the proposed method completely extracted three buildings, but at the same time methods 3 and 4 erroneously detected the wasteland in the green rectangle of I2 as a building. As for common stacking areas of production materials in industrial areas (e.g., the green rectangle in I6) and wasteland around factory buildings (e.g., the purple rectangle in I6), all five methods could extract them correctly. In addition, geographic objects with morphological features similar to those of buildings, such as a playground (e.g., the purple rectangle in I1) and a pool (e.g., the purple rectangle in I2), which were located around the two types of buildings, were also effectively screened out by the proposed method. To sum up, these representative patches show that the proposed method was significantly better than the other four comparison methods.
On this basis, we further discussed the influence of shadow, vegetation, and building inclination on the extraction effect of the proposed method. (1) In terms of shadow, the shadow detection strategy introduced in the proposed method already filtered out most shadow objects. However, there were a few ground surfaces (e.g., the green rectangle in I5) with similar textures and morphological features of buildings between adjacent shadows that were erroneously detected as buildings. (2) In terms of vegetation, although the collection seasons of the three datasets were summer, autumn, and winter, the vegetation index basically filtered out vegetation objects, such as canopies and lawns in the yellow and purple rectangles of I2. Obvious FNs only existed in areas where buildings and low canopies with weak edges were densely distributed (e.g., the green and brown rectangles in I1). (3) Since the building inclination effect was more prominent in high-rise buildings in aerial remote sensing images, we chose I4, belonging to dataset 2, for detailed discussion. Through analysis, we found that the building side elevation generated by the building inclination effect would result in two situations after segmentation: (1) When the side elevation and the roof were divided into the same object, such as yellow and green rectangles, these objects were correctly extracted. After visual inspection of all the datasets, it was also rare to find any FPs or FNs caused by this situation. (2) When the side elevation was regarded as an individual object in the segmentation results, FNs (e.g., the purple rectangle) or filtering out as shadow (e.g., the brown rectangle) might occur. In spite of this, we found that the roofs corresponding to these side elevations were accurately extracted, so it still had certain reference value in practical application. same object, such as yellow and green rectangles, these objects were correctly extracted. 
After visual inspection of all the datasets, it was also rare to find any FPs or FNs caused by this situation. (2) When the side elevation was regarded as an individual object in the segmentation results, FNs (e.g., the purple rectangle) or filtering out as shadow (e.g., the brown rectangle) might occur. In spite of this, we found that the roofs corresponding to these side elevations were accurately extracted, so it still had certain reference value in practical application.

Analysis of the Impact on the Overall Accuracy with Different µ
During the adaptive scale parameter extraction process proposed in this study, the change degree index µ in Equations (5) and (6) was used to determine the degree of difference between the extracted typical interval and its adjacent intervals. To justify the setting of µ, its impact on OA was analyzed. As shown in Figure 12, the horizontal axis is µ, sampled at intervals of 0.05, the vertical axis is OA, and the experimental results of the three datasets are represented by curves in different styles.

As shown in Figure 12, in the three dataset experiments, OA followed a similar trend as µ increased: it rose gradually at first and then dropped rapidly after reaching the peak. The peaks of the OA curves, 92.3%, 90.2%, and 90.8%, were reached at µ = 0.45, µ = 0.4, and µ = 0.5 in the experiments of datasets 1, 2, and 3, respectively. The detailed µ-OA values in the three groups of experiments are shown in Table 8. We found that when µ was set to 0.4, OA reached 92.1% and 90.5% in datasets 1 and 3, which was only 0.2 and 0.3 percentage points lower than the corresponding highest OA, while µ = 0.4 was itself the peak for dataset 2. This means that near-optimal results could be obtained in all three dataset experiments by setting µ to 0.4. Therefore, considering the requirements of automation and reliability, we suggest setting µ directly to 0.4 in practical applications.
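The cost of fixing µ at 0.4 can be checked from the OA values quoted in the text alone. The following is a minimal sketch, not the procedure used in the experiments: the dictionary holds only the OA values stated above (the full curves are in Table 8 and are not reproduced here), and the names `reported_oa` and `gap_to_peak` are hypothetical.

```python
# OA values (%) quoted in the text, keyed by mu. Only the quoted points are
# included; the remaining points of each curve (Table 8) are deliberately omitted.
reported_oa = {
    "dataset 1": {0.40: 92.1, 0.45: 92.3},
    "dataset 2": {0.40: 90.2},
    "dataset 3": {0.40: 90.5, 0.50: 90.8},
}


def gap_to_peak(oa_by_mu, mu):
    """Return (peak_mu, gap): the mu with the highest OA, and the OA lost
    (in percentage points) by using `mu` instead of that peak."""
    peak_mu = max(oa_by_mu, key=oa_by_mu.get)
    gap = round(oa_by_mu[peak_mu] - oa_by_mu[mu], 1)  # round away float noise
    return peak_mu, gap


# Gap between the OA at the suggested mu = 0.4 and each dataset's peak OA.
gaps = {name: gap_to_peak(curve, 0.40) for name, curve in reported_oa.items()}
worst_gap = max(gap for _, gap in gaps.values())

for name, (peak_mu, gap) in gaps.items():
    print(f"{name}: peak at mu = {peak_mu}, loss at mu = 0.4 is {gap} points")
print(f"worst-case loss across datasets: {worst_gap} points")
```

With these values, the loss at µ = 0.4 is 0.2, 0.0, and 0.3 percentage points for datasets 1, 2, and 3, so a single fixed µ stays within 0.3 points of the best OA on every dataset.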

Conclusions
To address the limitations of MAPs in automatic building extraction, a novel adaptive morphological attribute profile under object boundary constraint (AMAP-OBC) was proposed in this study. By establishing the corresponding relationships between AMAP-OBC and the characteristics of buildings in HRRS images, a set of scale parameters could be obtained adaptively, while the connected area extraction was restricted by the inherent boundaries of geographic objects. On this basis, the final building extraction results were obtained by a further identification strategy with an adaptive threshold combination. In experiments with urban HRRS images, the proposed method outperformed four comparison methods in both statistical accuracy and visual inspection: OA exceeded 90%, while FPs and FNs were lower than 7% and 6%, respectively. Therefore, the proposed method showed outstanding performance in building extraction from diverse objects in urban districts.