Monitoring household upgrading in unplanned settlements with unmanned aerial vehicles

In-situ slum upgrading projects include infrastructural improvements such as new roads, which are perceived to improve the quality of life for the residents and encourage structural improvements at a household level. Although these physical changes are easily visible in satellite imagery, it is more difficult to track incremental improvements undertaken by the residents, which are perhaps more closely linked to the socio-economic development of the households themselves. The improved detail provided by imagery obtained from Unmanned Aerial Vehicles (UAVs) has the potential to monitor these more subtle changes in a settlement. This paper provides a framework which takes advantage of high-resolution imagery and a detailed elevation model from UAVs to detect changes in informal settlements. The proposed framework leverages expert knowledge to provide training labels for deep learning and thus avoids the cost of manual labelling. The semantic classification is then used to interpret a change mask and identify: new buildings, the creation of open spaces, and incremental roof upgrading in an informal settlement. The methodology is demonstrated on UAV imagery of an informal settlement in Kigali, Rwanda, successfully identifying changes between 2015 and 2017 with an Overall Accuracy of 95 % and correctly interpreting changes with an Overall Accuracy of 91 %. Results reveal that almost half the buildings in the settlement show visible changes in the roofing material, and 61 % of these changed less than 1 m². This demonstrates the incremental nature of housing improvements in the settlement.


Introduction
An estimated 61.7 % of the urban population in Africa lives in deprived areas known as slums (The World Bank, 2017). The urgency of addressing slums is emphasized by Sustainable Development Goal target 11.1, which aims to "ensure access for all to adequate, safe and affordable housing and basic services and upgrade slums" (UN, 2019). According to the UN definition (UN-Habitat and Earthscan, 2003), these areas are characterized by one or more of the following: low quality housing, lack of access to water, lack of access to sanitation, overcrowding, and/or lack of tenure. In-situ slum upgrading often focuses on infrastructural changes: introducing utilities and services into the area or improving access by providing new roads. Indeed, improving streets is considered to be a basis for commercial development and is expected to spur residents to invest in housing improvements (UN-Habitat, 2012). Given the significant investments in slum upgrading projects, it is important to monitor changes in the area, both to track the execution of the infrastructure upgrading project itself and to monitor the impact of the project. Changes could include large infrastructural changes such as new buildings or roads, vertical extensions through adding floors to buildings, and roof upgrading. Due to the economic limitations of the residents, upgrading at a household level is often incremental (Abbott, 2003). For example, corrugated iron sheets are commonly used as a roof material. These can be improved incrementally, replacing one sheet at a time, if the funds to improve the entire roof at once are not available.
Remote sensing is a valuable tool for giving an overview of such physical changes over time. Satellite or aerial imagery has typically been used for change detection in urban areas (Weng et al., 2018) and for the analysis of slums' temporal dynamics (Liu et al., 2019). Imagery from UAVs is an effective alternative for informal settlement upgrading projects, as it can capture spatial details not visible in satellite imagery and provide crucial height information (Gevaert et al., 2016). A wide range of change detection methods have been developed for remotely sensed imagery; overviews can be found in (Bovolo and Bruzzone, 2015; Karantzalos, 2015; Lu et al., 2014; Tewkesbury et al., 2015). In general, methods can be divided according to comparison method and unit of analysis (Tewkesbury et al., 2015). Unsupervised change detection methods are data-driven and quick to implement; however, the changes they detect may not be relevant for the application at hand. Supervised change detection methods, on the other hand, can target specific types of changes. The two main strategies are: 1) to label images of both time stamps t1 (before change) and t2 (after change) independently and identify locations with a label change, and 2) to first identify changes at pixel/segment level and then classify the identified changes. Current developments in change-detection algorithms include improving the level of automation (Lv et al., 2017), simultaneous image registration and change detection (Vakalopoulou et al., 2015), and combining various forms of spatial data (Karantzalos, 2015). As in many other fields of image analysis, deep learning algorithms are being successfully applied to change detection problems (Zhan et al., 2017). For example, fully-convolutional Siamese networks efficiently detect changes from pairs of images (Caye Daudt et al., 2018).
However, these supervised methods generally require large amounts of training data. Image benchmarks are not available for UAV imagery of informal settlements, let alone for detecting changes in them. The use of supervised deep learning would therefore entail lengthy and impractical manual digitization efforts. Fortunately, studies using deep learning with Digital Surface Models indicate that manual labelling can sometimes be bypassed (Gevaert et al., 2018). Instead, a simple rule-based classification based on expert user knowledge was used to automatically label parts of the dataset, at a cost of 1-3 % accuracy compared to training on manual labels in that case. Rule-based labels can thus be used to train a fully convolutional network which classifies the entire image.
The objective of this paper is, therefore, to create a framework for automatically detecting different types of changes in an informal settlement. A key contribution is the application itself: the use of UAVs to detect changes in informal settlements. The types of changes targeted by the proposed methodology are specifically selected for informal settlement upgrading projects: (incremental) building construction and demolition, changes to roofing material, and the creation of open spaces. Note that pixels are the units of analysis for the change detection. This means that the class 'new buildings' may also refer to small extensions of existing houses; in the same way, partial demolitions for road widening can be detected. Focusing on UAVs as a platform, we demonstrate the importance of both the optical (2D) and elevation (3D) information. Another key contribution is the emphasis on a fully automatic workflow which combines the accuracy of deep learning methods with expert user knowledge, both to provide the training labels and to interpret the type of change.

Methodology
The methodology combines simple rules and deep learning to create an automatic workflow for recognizing different types of changes in informal settlements. The basic steps are as follows: (1) apply knowledge-based rules to a DSM and orthophoto derived from UAV imagery to obtain training samples, (2) train a deep learning algorithm to classify the entire study area, (3) use an unsupervised technique to identify changes between the scenes, and (4) label the changes semantically.

Datasets
The study area consists of an informal settlement in Kigali, Rwanda. This area was targeted by the government for urban upgrading, including the construction of a new asphalt road. The first set of UAV imagery was collected in 2015 (before the road was built) with a DJI Phantom 2 Vision+ quadcopter. The flights were conducted at a height of 70 m with 90 % forward- and 70 % side-overlap. Images were taken with a 14 megapixel RGB camera (FOV = 110°). The same area was flown in 2017 (after the road was built). At least 20 well-distributed Ground Control Points were collected with an RTK-GNSS at each time stamp individually. This resulted in highly accurate georeferencing, with an RMSE of less than 10 cm for both datasets. The UAV data from both t1 and t2 were processed with Pix4D to obtain an RGB orthomosaic and corresponding DSM with a spatial resolution of 3 cm. The overlapping area covered in 2015 and 2017 had a dimension of 13,200 × 9800 pixels, or 396 by 294 m, and contained around 650 buildings.
Buildings, vegetation and terrain were labeled manually in the two scenes to allow the accuracy of the final classified image to be assessed. Ambiguous pixels, and those representing temporary objects such as cars and people, were not labelled. The semantic change reference data was created by spatially overlaying the reference data from both time stamps. The new building and new open space classes were identified by comparing the reference class labels from t1 and t2. The roof material changes were identified manually. This labelled data was used to calculate the Overall Accuracy (OA), Producer's Accuracy (PA) and User's Accuracy (UA) for the class labels and semantic changes.
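These accuracy measures follow from a standard confusion matrix; the sketch below shows the convention used here (the function name and the use of -1 for unlabeled pixels are our own, not the paper's):

```python
import numpy as np

def accuracy_measures(reference, predicted, n_classes):
    """Overall, Producer's and User's Accuracy from a confusion matrix.

    Rows of the confusion matrix are reference labels, columns are
    predicted labels; unlabeled pixels (label < 0) are ignored."""
    valid = reference >= 0
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (reference[valid], predicted[valid]), 1)
    oa = np.trace(cm) / cm.sum()
    pa = np.diag(cm) / cm.sum(axis=1)   # Producer's Accuracy (recall)
    ua = np.diag(cm) / cm.sum(axis=0)   # User's Accuracy (precision)
    return oa, pa, ua
```

Note that, as in the paper, unlabeled reference pixels simply do not enter the matrix, so the accuracies refer only to the labelled part of the scene.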

Rule-based training labels
Previous work has shown how to use simple rules and the Digital Surface Model (DSM) to select training samples for an image-based classification problem (Gevaert et al., 2018). Here, we use a vegetation index and relative height in a similar way to select training pixels for four semantic classes: 1) buildings, 2) vegetation, 3) miscellaneous objects, and 4) terrain. Note that this initial step aims to obtain a set of training samples which represents the variability of the four classes; it does not require the rules to assign a label to every pixel in the image.
The rule-based labelling is applied to all images in the change-detection series, thus including the images from both epochs in the training set. This constructs a more complete training set and accounts for small (e.g. radiometric) differences between the scenes.
Due to the lack of a NIR band in the UAV imagery, we use the normalized Excess Green (ExG) vegetation index (Woebbecke et al., 1995) to identify vegetation. The ExG is calculated as:

ExG = 2g − r − b, with r = R/(R + G + B), g = G/(R + G + B), and b = B/(R + G + B),

where R, G, and B represent the three spectral bands of the UAV orthomosaic. Studies have shown that the ExG obtained from UAV imagery can be related to vegetation cover (Rasmussen et al., 2016; Torres-Sánchez et al., 2014). As a proxy for the height of objects, we apply a top-hat filter with a structuring element of diameter ω to the DSM. This essentially returns the height of the central point above the lowest point in the neighborhood. The diameter should be larger than the largest expected roof in the settlement; in this case we use 15 m (500 pixels).
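A minimal sketch of these two features, using scipy for the morphological filtering (the function names and the disk-shaped footprint construction are illustrative, not the paper's implementation):

```python
import numpy as np
from scipy import ndimage

def excess_green(rgb):
    """Normalized Excess Green index (Woebbecke et al., 1995):
    ExG = 2g - r - b, with chromatic coordinates r = R/(R+G+B), etc."""
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=-1)
    total[total == 0] = 1.0                     # avoid division by zero
    r, g, b = (rgb[..., i] / total for i in range(3))
    return 2 * g - r - b

def height_above_ground(dsm, diameter_px):
    """White top-hat of the DSM: approximates the height of each pixel
    above the surrounding terrain, provided the disk-shaped footprint
    is wider than the largest roof."""
    radius = diameter_px // 2
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    footprint = (yy**2 + xx**2) <= radius**2
    return ndimage.white_tophat(dsm, footprint=footprint)
```

For the 3 cm data described above, a 15 m diameter would correspond to a 500-pixel footprint; the sketch leaves the size as a parameter.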
Using these two features (the vegetation index and the top-hat filter of the DSM), we identify pixels representing the four classes according to the rules described in Table 1. The miscellaneous class is needed to group temporary objects, narrow structures such as walls around plots, and shadowed elevated objects. An erosion filter with a radius of 27 cm (9 pixels) is used to enhance the reliability of the terrain class by removing pixels along class borders. Similarly, labels are removed for pixels along the borders of the image because of the edge-effects of the top-hat filter, and a final morphological erosion filter with a radius of 27 cm (9 pixels) is applied to account for misalignments between the DSMs and orthophotos. Note that this rule-set is intended to collect training data, not to classify every pixel; some pixels will therefore remain unlabeled. In the current experiments, the rule-based stage labelled 46.67 % (t1) and 45.55 % (t2) of the pixels. All pixels are labelled after the supervised classification step described below.
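The general shape of such a rule set can be sketched as follows; the thresholds (EXG_VEG, H_ELEV) and the minimum component size are hypothetical placeholders standing in for the expert-tuned values of Table 1, not the paper's actual parameters:

```python
import numpy as np
from scipy import ndimage

# Illustrative thresholds; the actual values would be set by the expert user.
EXG_VEG = 0.10      # vegetation if ExG exceeds this
H_ELEV = 1.8        # "elevated" if top-hat height (m) exceeds this

def rule_based_labels(exg, height, erode_px=9, min_px=200):
    """Sparse training labels: 0=building, 1=vegetation, 2=misc,
    3=terrain, -1=unlabeled. A sketch of the expert rule set."""
    labels = np.full(exg.shape, -1, dtype=np.int8)
    veg = exg > EXG_VEG
    elevated = height > H_ELEV
    labels[veg] = 1                              # vegetation
    tall = elevated & ~veg
    comp, n = ndimage.label(tall)
    sizes = ndimage.sum(tall, comp, index=np.arange(1, n + 1))
    small_ids = np.flatnonzero(sizes < min_px) + 1
    small = np.isin(comp, small_ids)
    labels[tall & ~small] = 0                    # buildings: large elevated objects
    labels[small] = 2                            # misc: narrow/small elevated objects
    ground = ~elevated & ~veg
    kept = ndimage.binary_erosion(ground, iterations=erode_px)
    labels[kept] = 3                             # terrain, away from object borders
    return labels
```

Pixels that match no rule (or are eroded away along borders) keep the label -1 and are excluded from the training set, mirroring the partial labelling described above.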

Train a Fully Convolutional Network and label the entire study area
The rule-based training set is used to train a Fully Convolutional Network (FCN). We use the same shallow FCN architecture developed for DTM extraction from UAV imagery (Gevaert et al., 2018). The input data consisted of the RGB UAV imagery and the rule-based labels generated in the previous step. Five hundred training patches of 169 × 169 pixels (approx. 5 × 5 m, equal to the receptive field of the FCN) were extracted randomly from the images of both time stamps. As the study area is relatively small and cardinal orientation is not important in urban environments, data augmentation was performed on 70 % of these samples by randomly flipping and rotating at 90° intervals. The network was trained with a batch size of 64; a learning rate of 0.0001 was used for the first 300 epochs and 0.00001 for an additional 50 epochs.
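The patch extraction and augmentation described above can be sketched as follows (the function names are our own and the FCN training itself is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_patches(image, labels, n_patches, size=169):
    """Randomly extract square training patches (image + rule-based
    labels) from one epoch; the patch size matches the FCN's
    receptive field."""
    h, w = labels.shape
    patches = []
    for _ in range(n_patches):
        r = rng.integers(0, h - size + 1)
        c = rng.integers(0, w - size + 1)
        patches.append((image[r:r + size, c:c + size],
                        labels[r:r + size, c:c + size]))
    return patches

def augment(patch_img, patch_lab):
    """Random flip and 90-degree rotation, applied identically to the
    image and its label patch (orientation is arbitrary in this scene)."""
    k = rng.integers(0, 4)                       # number of 90° rotations
    patch_img = np.rot90(patch_img, k, axes=(0, 1))
    patch_lab = np.rot90(patch_lab, k)
    if rng.random() < 0.5:
        patch_img = patch_img[:, ::-1]
        patch_lab = patch_lab[:, ::-1]
    return patch_img, patch_lab
```

Because flips and right-angle rotations permute pixels without interpolation, the rule-based labels remain valid for the augmented patches.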

Unsupervised change detection
For the unsupervised change detection step, we apply the IR-MAD algorithm (Nielsen, 2007). This method isolates unchanged pixels in the image to perform an automatic radiometric normalization, and is therefore more robust to spectral differences due to image acquisition conditions. It is still commonly used as an unsupervised change detection method in remote sensing, e.g. (Falco et al., 2016; Wang et al., 2017). Note that changed pixels are obtained by comparing the images rather than the class labels. This enables changes within a class (e.g. roof upgrading) to be detected as well.
Two different experimental set-ups are used for the unsupervised change detection step. The first utilizes only the imagery for the change detection and the second makes use of both the imagery and the DSM as inputs for IR-MAD.
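For reference, a compact, illustrative implementation of the IR-MAD idea is sketched below: canonical correlation analysis between the two epochs yields the MAD variates, a chi-square statistic flags change, and pixels are iteratively reweighted by their no-change probability. This is a simplified reading of the algorithm, not the authors' code:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.stats import chi2

def irmad(img1, img2, n_iter=10):
    """Iteratively Reweighted MAD (compact sketch). img1/img2 are
    co-registered (H, W, B) images; returns a per-pixel chi-square
    statistic where large values indicate change."""
    H, W, B = img1.shape
    X = img1.reshape(-1, B).astype(np.float64)
    Y = img2.reshape(-1, B).astype(np.float64)
    w = np.ones(len(X))                          # no-change weights
    for _ in range(n_iter):
        ws = w.sum()
        Xc = X - (w[:, None] * X).sum(0) / ws    # weighted centering
        Yc = Y - (w[:, None] * Y).sum(0) / ws
        Sxx = (w[:, None] * Xc).T @ Xc / ws
        Syy = (w[:, None] * Yc).T @ Yc / ws
        Sxy = (w[:, None] * Xc).T @ Yc / ws
        # canonical correlation analysis via a generalized eigenproblem
        evals, A = eigh(Sxy @ np.linalg.solve(Syy, Sxy.T), Sxx)
        rho = np.sqrt(np.clip(evals, 0.0, 1.0))  # canonical correlations
        Bv = np.linalg.solve(Syy, Sxy.T @ A)
        Bv /= np.sqrt(np.sum(Bv * (Syy @ Bv), axis=0))   # unit variance
        Bv *= np.sign(np.sum(A * (Sxy @ Bv), axis=0))    # positive corr.
        mad = Xc @ A - Yc @ Bv                   # MAD variates
        sigma2 = np.maximum(2.0 * (1.0 - rho), 1e-10)    # variate variances
        z = np.sum(mad**2 / sigma2, axis=1)      # chi-square statistic
        w = 1.0 - chi2.cdf(z, df=B)              # weight = P(no change)
    return z.reshape(H, W)
```

Because the weights concentrate the statistics on unchanged pixels, the transform absorbs global radiometric (affine) differences between the acquisitions, which is exactly the property exploited in this workflow.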

Rule-based semantic interpretation of changes
A framework of rules is developed to identify three distinct types of changes between dates t1 and t2 (Table 2): new buildings, roof upgrading, and new open spaces. This framework combines the change mask with the input height information and the FCN classification results to categorize the type of change. New buildings are pixels in the change mask which were not classified as buildings at t1 but were classified as buildings at t2 and have a height increase of at least 2 m. Roof upgrading consists of pixels which are labelled as buildings by the FCN classification results at both t1 and t2, but which are flagged as changed in the IR-MAD change mask. New open spaces consist of areas in the change mask which were classified as buildings at t1 but not at t2. In this case, the "new open spaces" generally consist of buildings removed for the new road.
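These three rules translate directly into boolean operations on the change mask, the two class maps, and the two DSMs; the class codes and function name below are illustrative, not the paper's:

```python
import numpy as np

BUILDING = 0  # code of the FCN 'building' class (illustrative)

def interpret_changes(change_mask, class_t1, class_t2, dsm_t1, dsm_t2,
                      min_rise=2.0):
    """Assign a semantic label to each changed pixel:
    1 = new building, 2 = roof upgrading, 3 = new open space, 0 = other."""
    out = np.zeros(change_mask.shape, dtype=np.int8)
    b1 = class_t1 == BUILDING
    b2 = class_t2 == BUILDING
    rise = (dsm_t2 - dsm_t1) >= min_rise         # height increase test
    out[change_mask & ~b1 & b2 & rise] = 1       # new building
    out[change_mask & b1 & b2] = 2               # roof material change
    out[change_mask & b1 & ~b2] = 3              # new open space
    return out
```

The three conditions are mutually exclusive (they differ in the t1/t2 building labels), so the assignment order does not matter.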

Rule-based training labels and classification
The results of the rule-based labels and FCN classification are presented in Fig. 1. Quantitative accuracy measures are presented in Tables 3 and 4. The rule-based labels obtain an OA above 98 %, with a PA and UA of at least 95 % for each of the classes. The FCN obtains an OA of around 95 % for both time stamps. The PA and UA are above 90 % in all cases except the UA for terrain in the first time stamp. Note that the accuracies obtained by the rule-based labels and the FCN classification are not fully comparable, as not all pixels are labeled in the initial rule-based step, whereas all pixels are assigned a label in the subsequent FCN classification. Visual interpretation indicates that terrain is sometimes misclassified as buildings in the first time stamp, where more of the terrain features consist of narrow footpaths. Additionally, some green roofs are misclassified as vegetation.
Another observation was the importance of the miscellaneous class. In previous experiments which did not include this class, shadowed tree borders, walls along plot boundaries, and traffic on roads were often misclassified as buildings.

Semantic change detection
The labeled change detection maps for (i) the image-based and (ii) the image and DSM-based experiments are given in Fig. 2. The IR-MAD change mask has an OA of 95.20 % and an F1-score of 0.9732 when using only RGB information. It obtains an OA of 93.17 % and an F1-score of 0.9611 when including the DSM. The lower accuracy when including the DSM in the IR-MAD can be attributed to misalignments between the DSM and orthomosaics, which cause an over-prediction of the roof material change class. Previous research indicated that smoothing functions used to generate DSMs from photogrammetric point clouds may cause discrepancies between the DSM and corresponding orthomosaic, which cause errors when using both datasets for classification (Gevaert et al., 2017). Furthermore, some changes in vegetation height are identified by the IR-MAD change mask including the DSM but are not indicated as change in the reference data. In both cases, changes in roof upgrading were relatively easy to identify (> 97 %), whereas new buildings were the most difficult. This is likely due to the complexity of the topography, small buildings partially underneath tree canopies which cause difficulties in the DSM generation, and the propagation of uncertainties from the classification step. For example, in the bottom center of the study area there is one structure which was a building foundation at t1 (see the white circle in Fig. 2b). At t1, this building was partially classified as building and partially as terrain (Fig. 1c). It is understandable that an incomplete building is classified in this way. The pixels which were classified as terrain at t1 are labelled as "new buildings" in the change detection results in Fig. 2b and d, whereas the pixels which were identified as buildings at t1 are labeled as "roof material change".

Changes in roofing material
The proposed methodology identifies changes in roof material, though it does not specify whether these are improvements or damages. Visual inspection showed that most changes in this particular study area were consistent with upgrading. The results also provide two main observations regarding roof material. Firstly, a large number of the buildings changed roofs between 2015 and 2017. Roof changes were detected (using the RGB method and assuming a minimal area of 0.5 m² to be a change) in 244 out of the 536 buildings in 2017, or 45.5 %. This is slightly higher than the 209 buildings with roof changes in the reference data. The overprediction of roofs which have changed may be due to small changes such as rocks and other objects on roofs; these are detected as changes by the proposed workflow but are not labeled as roof changes in the reference data. Reducing the spatial resolution of the images or additional post-classification smoothing could reduce this effect. Secondly, we observe that changes in roofing material often occur incrementally rather than through replacing entire roofs. Changes in roofing material were restricted to an area of 1 m² for 62 % of the buildings where changes were detected (compared to 61 % in the reference data). This is quite limited, given that the average building in the study area has a surface of 56.7 m². This could reflect the low-income status of residents, who repair patches of roofs rather than replacing an entire roof; low-income residents typically build and renovate from savings. A larger number of full roof replacements could indicate rising incomes or an influx of wealthier families through gentrification (Raman, 2014). Further research could target specific types of roof material change to automate the detection of upgrading versus deterioration of roofing materials.

C.M. Gevaert, et al., Int J Appl Earth Obs Geoinformation 90 (2020) 102117
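Per-building statistics such as those above can be derived by intersecting a labelled building-footprint map with the roof-change mask; a minimal sketch (the area thresholds follow the text, the function name and inputs are our own):

```python
import numpy as np
from scipy import ndimage

PX_AREA = 0.03 * 0.03          # m² per pixel at 3 cm resolution

def roof_change_stats(buildings, roof_change, min_area=0.5, small=1.0):
    """Per-building changed-roof area. `buildings` is a labelled
    footprint image (0 = background), `roof_change` a boolean change
    mask. Returns (number of buildings with a detected roof change,
    share of those with at most `small` m² of change)."""
    ids = np.arange(1, buildings.max() + 1)
    changed_px = ndimage.sum(roof_change, labels=buildings, index=ids)
    areas = changed_px * PX_AREA
    changed = areas >= min_area                  # ignore tiny detections
    if changed.sum() == 0:
        return 0, 0.0
    return int(changed.sum()), float((areas[changed] <= small).mean())
```

With the 0.5 m² minimum and 1 m² "incremental" threshold used in the text, such a routine yields the counts and shares reported above.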
Other changes that can be observed in Fig. 2 include the extension of buildings, though this was much less frequent than roof upgrading.
Several new open spaces are clearly visible in blue along the newly constructed road, where houses were demolished to provide the necessary space for the road construction. Whether and how urban upgrading programs focusing on physical interventions improve the quality of life in deprived areas is still unclear (Turley et al., 2013). The methodology presented in the current manuscript can support the identification of physical changes in a settlement. Further research is needed to interpret the effects of these physical changes on the quality of life of the residents and to determine the extent to which such changes reflect the influx of wealthier households through gentrification.

Fig. 2. Results of the semantic change detection between the UAV imagery at t1 (a) and t2, using only the RGB (b) and the RGB + DSM (d) to determine the change mask. The white circle in (b) indicates a building foundation partially classified as terrain and partially as building in the preceding FCN classification step; these errors propagate into the subsequent semantic change detection.

Conclusions
This work integrates expert knowledge and deep learning to develop an automatic change detection workflow. The workflow is specifically designed for use with UAV datasets, though similar methodologies could be applied wherever orthoimagery and digital surface models are available for both time stamps. It was also specifically designed to identify changes in informal settlements: both large infrastructural changes such as those implemented by development organizations and subtle roof upgrading implemented at a household level. The results demonstrate the utility of integrating 3D and 2D image-based information for the rule-based labelling. The manual generation of training labels for the deep learning algorithm is thus bypassed by using expert knowledge. Extending the use of deep learning to the change detection step, towards a more end-to-end workflow, was not considered here due to the limited size of the study area, but would be a suitable topic for further research.
From an urban management perspective, this work illustrates how many changes in informal settlements are subtle. Almost half of the buildings in the reference data showed visible changes in roofing material, and 61 % of these changed less than 1 m² of roofing material. Of course, change detection methods are susceptible to irrelevant changes in the scene; for example, the prevalence of roof upgrading was slightly overestimated due to the storage of temporary objects on roofs. The ability to capture the small changes which are frequent in informal settlements attests to the fitness of UAVs in this context. Change detection methods such as the one presented here can identify where incremental changes are likely to have taken place, as a guide for field verification and/or enforcement. Moreover, they could serve as useful proxies for monitoring gentrification and the displacement of low-income families during and after settlement upgrading processes.