Identifying degrees of deprivation from space using deep learning and morphological spatial analysis of deprived urban areas



Introduction
African cities are in a period of rapid urbanization. The African urban population grows on average by 3.55% per year. Between 1999 and 2009, the population of Nairobi, Kenya, increased from 2 million to 3.1 million people. The latest census (KNBS, 2019) indicates that the Nairobi urban population has reached 4.3 million people, implying an annual growth rate of 3.8%. These new urban residents need housing, and if low-cost formal housing is underprovided, many end up living in unplanned areas that grow organically (Davis, 2006; Oberay, 1993). Already, 56% of the urban population in Kenya is living in poor and unplanned areas (World Bank, n.d.), referred to as slums by international institutions such as the World Bank (WB) and the United Nations (UN). Specifically, in the city of Nairobi, where recently being applied successfully to the processing of RS data for generating building maps (Camps-Valls et al., 2021; Wu et al., 2018), offering better performance for object detection and image segmentation compared to traditional algorithms (Li, Wang, Wang, & Lu, 2018). Specifically, convolutional neural network (CNN) and deep neural network (DNN) algorithms considerably improve the performance of semantic segmentation (Krizhevsky, Sutskever, Hinton, & G., 2015; Simonyan & Zisserman, 2015). Recent literature on the best performing deep learning architecture for building footprint extraction from VHR imagery suggests that U-Net is one of the best options (Ayala, Aranda, & Galar, 2021; Rastogi, Bodani, & Sharma, 2020; Peng, Zhang, & Guan, 2019; Li, Wang, Zhang, & Zhang, 2019; Xing et al., 2019). Furthermore, U-Net is used by companies such as Microsoft and Google for mapping building footprints (Sirko et al., 2021; Yang, 2018). However, as the models behind both data sets were not trained with enough buildings from Deprived Urban Areas (DUAs), they perform poorly and inconsistently within DUAs.
Several studies have detected DUAs mainly at the city scale, focusing on the boundary of the settlements (Kohli, Sliuzas, & Stein, 2016; Kuffer & Barros, 2011). An automatic global process for this remains a challenge due to the variability of DUAs in types and definitions between cities (Duque, Patino, Ruiz, & Pardo-Pascual, 2015; Kuffer, Barros, & Sliuzas, 2014; Sliuzas & Kuffer, 2008a, 2008b) and due to their often-rapid development processes (Liu, Kuffer, & Persello, 2019). Semantic segmentation of buildings in DUAs with RS-based methods is still challenging due to the urban complexity (i.e., variety of physiognomy and materiality of the roofs of buildings) (Fig. 1). CNN building segmentation in a DUA was seen for the first time in Guangzhou City, Southern China (Pan, Xu, Guo, Hu, & Wang, 2020) and in Ahmedabad city in Gujarat, India. The U-Net architecture was adopted for building segmentation, showing robust performance. However, the spatial urban characteristics found in Nairobi City are different from those in Guangzhou and in Ahmedabad, where the roofings are more regular, with similar materials, and the space between buildings is larger.
Cities are complex systems where diverse domains interact in a static physical structure (White, Engelen, & Uljee, 2015). The urban form facilitates the social, economic, and cultural life of the city, and when a design is inadequate, processes are hindered (e.g., mobility). Therefore, characterizing and mapping the urban form is essential to understand the flow of different domains, relating spatiality to urban behaviour, and guiding evidence-based planning accordingly (Simon, 1962). The physical structure of cities can be further understood as being characterised by quantifiable elements, also known as morphological features. This urban morphology can be defined by its fundamental elements, i.e., buildings and open spaces, plots, and streets (Mumford, 1961), and it can be quantified by a set of geometric measures (indicators) characterizing the different urban spaces. The morphological characterisation helps in capturing the high degree of variation observed across urban forms. This variation is not limited to the differences between non-deprived and deprived urban areas (DUAs); as Davis (2006) points out, there exist intrinsic differences between and within DUAs themselves. This morphological characterisation can be an expression of inequality and socio-economic disparities visible from both facets: from the street and from space (Naik, Raskar, & Hidalgo, 2016; Sliuzas & Kuffer, 2008a) (Fig. 2).
Although DUAs have existed in cities from their beginning and some governments acknowledge their existence (Davis, 2006), deprivation research has greatly increased following the third United Nations Conference on Human Settlements (Habitat III) in 2015, calling for an SDG assessment. Since then, the scientific literature has become increasingly interested in studying deprived urban characterisation (in response to SDG 11). Urban spatial deprivation has often been simplified by mapping socio-economic data aggregated to administrative units, as poverty has predominantly been characterised as household-level deprivation based on census data (Schirmer, van Eggermond, & Axhausen, 2014). Meanwhile, the area-level characterisation in DUAs is mostly ignored (Baud, Kuffer, Pfeffer, Sliuzas, & Karuppannan, 2010; Kuffer et al., 2014; Taubenböck et al., 2018; Thomson et al., 2020). Spatial characteristics of deprivation are still unknown and there is even less global agreement on how to formulate deprivation through morphological variables.
Few studies have acknowledged the diversity within DUAs (e.g., Graesser et al., 2012; Krishna, Sriram, & Prakash, 2014; Kuffer, Pfeffer, Sliuzas, Baud, & van Maarseveen, 2017), and the categorization has mostly been based on statistical methodologies (e.g., PCA). Nonetheless, local expertise (i.e., link with local urban policies) and ground knowledge (i.e., DUA inhabitants' insights) have been important in interpreting and categorising deprivation (Joshi, Sen, & Hobson, 2002). The physical characterisation used in quantifying deprivation has mostly been based on household-level characteristics, such as building conditions, which are defined by building size or building material and primarily extracted from census or survey data (Anurogo, Lubis, Pamungkas, Hartono, & Ibrahim, 2017). Deprivation conceptualisation through urban physical characteristics is still largely under-researched, without an agreement on either the conceptualisation of the indicators or the methods of measurement. On the other hand, the physical conditions of the area that are analysed by RS-based methods (Kohli, Sliuzas, Kerle and Stein, 2012; Taubenböck et al., 2018) are mainly focused on isolated features such as buildings and open spaces within a settlement. Both the relative location of the area within the larger urban spatial configuration and the subtler aspects of the building orientation are missing from the literature. Therefore, there is a need for an integrated approach that situates these isolated features within a larger spatial configuration and considers the orientation within the settlement pattern.
This paper aims to investigate the following research question: Can deep learning be used to characterise degrees of deprivation based on the morphology of DUAs in LMICs? To this end, the following specific objectives are pursued:
1. To generate a reference dataset of building footprints in DUAs in LMICs through participative community-based crowdsourcing.
2. To employ deep learning for the automatic generation of building footprints.
3. To perform a morphological analysis based on the building footprints from deep learning (predicted buildings) and on the participative crowdsourcing reference dataset (digitised buildings) and link its output to deprivation levels.
We claim that morphological characteristics can be captured by RS and be linked with deprivation. The paper is organised as follows: Section 2 covers the Study Area and Data used, Section 3 the Methodology, Section 4 the Results, Section 5 the Discussion and Section 6 the Conclusion.

Fig. 1. WorldView-3 imagery (resolution: 0.3 m). A dense urban slum in Nairobi. Variable building sizes and compact urban form. Right: Ground photo taken by the author. The roofs overlap at different levels and the street below cannot be detected from the satellite image.

Fig. 2. Top row: RS imagery (Google Earth) of Nairobi city. Bottom row: 3 images from ground level; an example of a deprived area image taken by the author (left: red frame), a low-cost housing area (centre: blue frame) and a middle-class housing area (right: yellow frame), the latter two from Google Street View. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Study area
This study is conducted in Nairobi City, Kenya. Nairobi's first settlement dates to the year 1899 and was set up as a colonial railway settlement (Morgan, 1967). The political responses to the basic right to housing have been inadequate since the colonial era, and the continued existence and growth of DUAs constitute a real challenge to urban life in Nairobi (Van Zwanenberg, 1972). Nowadays, more than half of its population lives in deprived urban areas, which cover less than 6% of the total city area (APHRC, 2012).
The administrative area of Nairobi city covers 695 km², while the total area of the urban deprived regions within the city is 7.75 km² (APHRC, 2012). The administrative boundary remains the same as in the year 1963, just after Kenyan independence. It is now outdated, as the city has expanded outwards, and it is in this periphery that new DUAs are emerging today. Based on data availability and contacts with community-based groups, the DUAs analysed in this study are limited to those within the Nairobi city administrative area. Nairobi city is divided into sub-counties (11), divisions (29) and locations (72), and the lowest disaggregated unit is the sublocation (147) (KNBS, 2019). Despite these fine-scale subdivisions (e.g., the smallest sublocation covers 0.07 km²), not all DUAs are delimited within an administrative boundary, and they therefore cannot be adequately analysed through a census analysis.
From its earliest times, spatial patterns in Nairobi reflected divisions in terms of social class, mostly related to race due to colonial planning. This segregation was between the Central Business District (CBD) and European, Asian, and African residential areas (Morgan, 1967). Today, this structure is reflected not so much in terms of race, but instead in terms of spatial and economic deprivation (K'Akumu & Olima, 2007). The wealthiest people live in Nairobi West, which is characterised by its greener areas and lower built-up density. On the other hand, low- and middle-income groups dominate in the eastern locations (Fig. 3). However, within the areas populated by the low-income groups themselves, one observes differences in terms of economic deprivation and the corresponding physical characteristics.

Satellite imagery
VHR images were acquired by the WorldView-3 (WV-3) satellite in 2019. They cover Nairobi City County almost entirely (excluding the Nairobi National Park) while extending to part of the peripheral urbanised areas that fall outside of the Nairobi administrative boundary (blue polygon in Fig. 3). For this study, only the area covering the DUAs was used, with a total coverage of 7 km² (more details in Table 2). The WV-3 data consist of eight multispectral (MS) bands (1.20-m resolution) and a panchromatic band (0.30-m resolution). The WV-3 MS bands contain information across the visible and near-infrared spectrum (coastal, blue, green, yellow, red, red edge, NIR 1, and NIR 2 bands). Table 1 shows the satellite imagery specifications (DigitalGlobe, 2014).

Reference data
The reference data consist of manually delineated building footprints (digitised buildings). Three entire settlements were selected, namely Mathare, Korogocho and Mukuru, as well as parts of other settlements (Kariobangi, Waruku, Pumwani, Soweto, Kibera, Mukuru Kwa Njenga). Different characteristics guided their selection, i.e., diversity in size and shape of buildings (as small-sized buildings are commonly found in DUAs), diversity in roofing materials, diversity in urban patterns, diversity in open space forms, and diversity of street widths.
First, we obtained a building footprint layer generated by the community of Mathare through participatory mapping coordinated by the Nairobi-based Spatial Collective (SC) company. A total of 4410 buildings in Mathare North (locally known as Mlango Kubwa) were digitised using a Google Satellite image of 2019 as background. The original dataset can be found in the SC GitHub repository (SCollective, 2020). Then, we inspected the building footprints and corrected the digitising errors. Subsequently, we extended this dataset to the other DUAs through photo interpretation (over the WV-3 reference image from 2019) and digitised 25,000 additional buildings (Zenodo, 2022). During this process, community members carried out targeted ground visits to validate a number of digitised footprints for which photo interpretation was uncertain.

Methodology
The methodology section is structured in two steps (Fig. 4). The first step introduces the extraction of the building footprints (Section 3.1), and the second step details the computation of the morphological metrics and the analysis of these metrics in link with deprivation levels (Section 3.2 and Section 3.3).

Fig. 4. Overview of the main steps of the methodology.

Building footprints extraction with DNN
The extraction of the building footprints was performed with a DNN and is structured as follows: (1) Data Pre-processing; (2) Adaptation of the Network Architecture; (3) Network training; (4) Validation; (5) Application. The complete procedure is shown in Fig. 4 (step 1).

Data pre-processing
To train the DNN model, the reference dataset needs to be split into training (data used to fit the model), validation (parameter optimization of the model) and testing (unbiased evaluation of the final model) datasets (Kuhn & Johnson, 2013). As shown in Table 1, data from Mathare, Korogocho and Mukuru was used for training and validation of the deep learning model, whereas data from Mathare Central, Kariobangi, Waruku, Pumwani, Soweto, Kibera and Mukuru Kwa Njenga provided unseen data for testing. This choice was motivated by the need to test the generalisation ability of the deep learning model across datasets exhibiting spatial diversity.
The Coastal band was discarded due to atmospheric noise. The WV-3 MS bands were pansharpened with Gram-Schmidt Average Neighbourhood method based on the nearest neighbourhood diffusion pan sharpening algorithm (NNDiffuse) (Sun, Chen, & Messinger, 2014). Next, the images were clipped in a grid of 512 × 512 px and split into the datasets (Table 2).
The reference data were manually delineated by community members and by the authors in the QGIS software, with WGS 84 / UTM zone 37S as the coordinate reference system. The reference data were rasterised with the Rasterize algorithm of QGIS 3.12, using 0.30 m as pixel resolution. Afterwards, the raster was clipped in a grid of 512 × 512 px and split in the same way as the satellite images.

Adaptation of the deep learning architecture
The U-Net architecture proposed in Ronneberger, Fischer, and Brox (2015) is adapted in this work. The network has a characteristic encoding-decoding structure. As shown in Fig. 5, feature maps from one encoding layer are concatenated to the corresponding decoding layer using skip connections. In the contracting branches, downsampling is achieved using maxpooling with a size of 2 × 2. On the other hand, upsampling is achieved through bilinear upsampling with a factor of 2 in the expanding branches.

Network model training
In total, 276 patches comprising seven bands and dimensions of 512 × 512 pixels were generated with an overlap of 64 pixels (px), so that buildings truncated at a patch edge are fully covered by a neighbouring patch. The dataset was randomly partitioned into 75% training, i.e., 192 patches, and 25% validation, i.e., 64 patches. To expand the training dataset, data augmentation was applied to the training patches. Geometric augmentation was done by rotating (45°, 90°, 135°, 180°, 270°) and flipping (horizontal and vertical), allowing the network to learn invariance to such changes. This approach is useful not only because it provides more training data, but also because DUAs do not follow any spatial pattern, so all possible orientations can be simulated. During training, the categorical loss function is minimised (Bishop, 2006), and the optimisation is done using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0001. Training is stopped if the validation accuracy fails to improve after 19 epochs.
The U-Net was implemented based on the TensorFlow (TensorFlow 2.0) and Keras (Keras 2.2.4) deep learning frameworks with the Python (Python 3.7.3) programming language, on a computer running Linux 5.10, with an Intel i7, 32 GB RAM and an Nvidia GeForce RTX 2070 (8 GB). For further details, the code is available at the Zenodo scientific repository (Zenodo, 2022).
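The tiling with overlap and the random 75/25 partition described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `tile_origins` and `split_patches`, the edge-alignment rule for the last patch, and the random seed are assumptions, and the example uses 256 hypothetical patch identifiers (the sum of the 192 training and 64 validation patches reported above).

```python
import random


def tile_origins(size, patch=512, overlap=64):
    """Top-left offsets of patches along one image axis.

    Consecutive patches share `overlap` pixels. A final patch is aligned
    to the image edge so that no pixels are left uncovered (assumption).
    """
    if size <= patch:
        return [0]  # assumption: smaller images are padded to one patch
    stride = patch - overlap
    origins = list(range(0, size - patch + 1, stride))
    if origins[-1] + patch < size:
        origins.append(size - patch)
    return origins


def split_patches(patch_ids, train_fraction=0.75, seed=0):
    """Randomly partition patch identifiers into training and validation."""
    ids = list(patch_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


origins = tile_origins(1024)            # offsets along a 1024 px axis
train_ids, val_ids = split_patches(range(256))
print(origins, len(train_ids), len(val_ids))
```

With a stride of 448 px (512 minus the 64 px overlap), a building cut by one patch boundary falls entirely inside the neighbouring patch as long as it is narrower than the overlap.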

Accuracy assessment
The performance of the U-Net model for extracting building footprints in DUAs was evaluated using the confusion matrix (Table 3).
TP denotes True Positive, FP False Positive, TN True Negative and FN False Negative. Recall and Precision scores were extracted from the confusion matrix, which compares the ground truth with the predicted values.
The F1-score is derived from the confusion matrix as the harmonic mean of Precision and Recall. It can be interpreted as a weighted average of the precision and recall and varies from 1 (best) to 0 (worst). The relative contributions of precision and recall to the F1-score are equal.
The Jaccard Index, also known as Intersection-over-Union (IoU), was calculated as an additional accuracy metric. IoU is the area of overlap between the prediction (A) and the reference (B) divided by the area of union between the prediction (A) and the reference (B). IoU tends to yield lower values than the F1-score, as it penalises misclassification more heavily.
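The four scores can be written out from the confusion matrix counts using their standard definitions: Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 = 2 · Precision · Recall / (Precision + Recall) and IoU = TP / (TP + FP + FN). The sketch below is illustrative only; the function names and the two small binary masks are hypothetical and not taken from the study.

```python
def confusion_counts(reference, predicted):
    """Count TP, FP, FN, TN over two equally sized binary masks (1 = building)."""
    tp = fp = fn = tn = 0
    for ref_row, pred_row in zip(reference, predicted):
        for ref, pred in zip(ref_row, pred_row):
            if pred and ref:
                tp += 1
            elif pred and not ref:
                fp += 1
            elif ref and not pred:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def scores(tp, fp, fn):
    """Precision, recall, F1-score and IoU from confusion matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Hypothetical 3 x 4 masks, for illustration only.
reference = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
predicted = [[1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
tp, fp, fn, tn = confusion_counts(reference, predicted)
result = scores(tp, fp, fn)
print(result)
```

When both scores are computed from the same pixel counts, IoU = F1 / (2 − F1), which is why IoU is never higher than the F1-score; for instance, an F1-score of 0.76 corresponds to an IoU of about 0.61.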

Computing morphological metrics
Physical differences in urban patterns from a fine-scale morphological perspective were computed. Mathare Central, locally known as Huruma, was selected since it comprises heterogeneous morphologies (Fig. 6).
Urban form is assessed through morphological indicators related to the building form and the open space form (Mumford, 1961). A subset of metrics is derived from each indicator. Morphological metrics are usually based on clearly demarcated building outlines and are categorised according to 1) building form (size and internal irregularity) and 2) open space form (proximity and directionality). However, since U-Net building predictions can result in building clumps rather than individual buildings, especially where the urban area consists of densely clustered buildings constructed using a variety of hard-to-distinguish materials, we propose a series of equivalent metrics to be computed at a grid level for the predicted buildings, as shown in Table 4.
Different levels (scales) were used in the morphological analysis, as shown in Fig. 7. The digitised buildings metrics were computed at the building level and aggregated at the grid level. The predicted buildings metrics were computed at the grid level. At the grid level, both building datasets were characterised and deprivation clustering was performed (Section 3.3). There is no building footprint coverage in DUAs, so there are no settlement level metrics for the digitised buildings. The settlement level was used to show the resulting degrees of deprivation within the studied DUAs in Nairobi City.

Computation at the building level - Digitised buildings
The building level defines the attributes related to the physical characteristics of the building form as well as its proximities (e.g., size and arrangement). Several aspects were considered while performing the morphological analysis, reflected in four main indicators: size, inner irregularity, proximity, and directionality. Thirty-five metrics, each a statistic of one of these indicators, were extracted (Table 3).
Related to the building form, only the shape characteristics were chosen, expressed as building size and inner irregularity. The condition of the buildings (e.g., building materials) was not considered, since only the roofs are visible on monoscopic satellite images. Techniques such as 3D building models (Taubenböck & Kraff, 2014) were not implemented, as Google Street View is not available for most DUAs in the city of Nairobi. Seven metrics were extracted relating to the building size indicator. Among the variables related to the inner irregularity of the building, the internal angle of each corner was measured, and diverse statistics were calculated (formula in Fig. 8). The more irregular the internal building angles are, the more irregular the shape of the building itself and the more informal the building is understood to be. Another morphological feature that characterizes the physical structure is the open space form. To define it, proximity and directionality indicators were extracted. The proximity between buildings represents the open space width. Proximity was approximated by measuring the distance between the building of study and the four nearest buildings. In contrast to other studies, which used ten neighbouring buildings (Taubenböck & Kraff, 2014), our metrics were calculated with the four nearest buildings, as the open space surrounding a building captures the width of the street. The four buildings selected were those with the closest centroid in each quadrant of the Cartesian axes (see Fig. 8). Different metrics were computed to find the best approximation: the distance between centroids (Taubenböck et al., 2018), named dcc; the distance between the nearest vertices, named dvv; and the minimum orthogonal distance between a vertex of the study building and the nearest facade of the neighbouring building, named dvl. The directionality of the open spaces is defined by the building orientation index, oi (formula 6 in Fig. 8), as a proxy for measuring the alignment between buildings (Venerandi, Quattrone, & Capra, 2018). To capture the index, the four nearest buildings were used instead of pairs of buildings. Different statistics were computed. DUAs are characterised by being unplanned, which is reflected by their irregular physical structure.

Fig. 5. U-Net architecture (Ronneberger et al., 2015). Each black rectangle represents a multi-channel feature map. The number of channels is denoted at the top of each. The left label denotes the size of the patch. Coloured arrows represent the different operations.
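Two of the building-level measurements described above can be sketched in plain Python: the internal angle at each corner of a footprint, and the centroid distance (dcc) to the nearest building in each quadrant. The exact formulas are given in Fig. 8, which is not reproduced in the text, so this is a hedged illustration: the function names, the handling of points lying exactly on a quadrant axis, and the example coordinates are assumptions.

```python
import math


def interior_angles(polygon):
    """Interior angle (degrees) at each vertex of a simple polygon.

    `polygon` is a list of (x, y) vertices without a repeated closing
    vertex. Angles are returned in counter-clockwise vertex order.
    """
    n = len(polygon)
    # Twice the signed area (shoelace formula); negative means clockwise.
    area2 = sum(polygon[i][0] * polygon[(i + 1) % n][1]
                - polygon[(i + 1) % n][0] * polygon[i][1] for i in range(n))
    pts = polygon if area2 > 0 else polygon[::-1]
    angles = []
    for i in range(n):
        cx, cy = pts[i]
        px, py = pts[i - 1]
        nx, ny = pts[(i + 1) % n]
        ax, ay = nx - cx, ny - cy  # edge towards the next vertex
        bx, by = px - cx, py - cy  # edge towards the previous vertex
        angle = math.degrees(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
        angles.append(angle % 360.0)  # reflex corners exceed 180 degrees
    return angles


def quadrant_nearest_distances(target, others):
    """Distance from `target` centroid to the nearest centroid per quadrant."""
    nearest = {}
    for x, y in others:
        dx, dy = x - target[0], y - target[1]
        if dx == 0 and dy == 0:
            continue  # skip the building itself
        # Assumption: points on an axis are assigned to the N / E side.
        quadrant = ("N" if dy >= 0 else "S") + ("E" if dx >= 0 else "W")
        distance = math.hypot(dx, dy)
        if distance < nearest.get(quadrant, math.inf):
            nearest[quadrant] = distance
    return nearest


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
square_angles = interior_angles(square)
l_angles = interior_angles(l_shape)
dcc = quadrant_nearest_distances((0, 0),
                                 [(3, 4), (6, 8), (-1, 1), (-2, -2), (5, -12)])
print(square_angles, l_angles, dcc)
```

A rectangular footprint yields four angles of 90°, so the spread of the angles (for example their range or standard deviation) is one way to express inner irregularity.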

Computation at the grid level - Digitised and predicted buildings
The grid cell level defines the attributes related to the physical characteristics of the urban pattern as the aggregation of each individual building and the open space disposition. The grid level was selected, rather than blocks, as there are often no administrative boundaries or recognised roads available from which blocks can be extracted in DUAs (Grippa et al., 2018). The selected grid unit size is 75 m × 75 m, as it permits an optimal granularity that represents a high variety of urban form while protecting privacy. For example, the total number of grid cells within Mathare was 52. At this level, the metrics can be computed from both datasets (Table 3), i.e., from the manually delineated set (digitised building footprints), by aggregating the building level metrics to the grid level; and from the U-Net output set (predicted building footprints), by computing metrics and deriving statistics within the grid.
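Aggregating building-level values to 75 m × 75 m grid cells can be sketched as below. The text does not state how a building that straddles two cells is assigned, so assigning each building by its centroid is an assumption here, as are the function names, the grid origin and the example buildings. The keys `num_build` and `sum_area` follow the variable names used in the text.

```python
import math
from collections import defaultdict

CELL_SIZE = 75.0  # grid cell side in metres, as stated in the text


def cell_index(x, y, origin=(0.0, 0.0), cell_size=CELL_SIZE):
    """Column and row of the grid cell containing the point (x, y)."""
    col = math.floor((x - origin[0]) / cell_size)
    row = math.floor((y - origin[1]) / cell_size)
    return col, row


def aggregate_by_cell(buildings, origin=(0.0, 0.0)):
    """Group buildings by the cell of their centroid and summarise areas."""
    areas = defaultdict(list)
    for building in buildings:
        cell = cell_index(building["x"], building["y"], origin)
        areas[cell].append(building["area"])
    return {
        cell: {
            "num_build": len(values),
            "sum_area": sum(values),
            "mean_area": sum(values) / len(values),
        }
        for cell, values in areas.items()
    }


# Hypothetical centroids (projected metres) and footprint areas (m2).
buildings = [
    {"x": 10.0, "y": 20.0, "area": 30.0},
    {"x": 70.0, "y": 74.0, "area": 50.0},
    {"x": 80.0, "y": 10.0, "area": 40.0},
]
grid = aggregate_by_cell(buildings)
print(grid)
```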
To characterize the grid level from the predicted footprints, certain metrics were calculated directly from the building footprints, while others were calculated from Delaunay Triangulation (DT) in the open space, using the building footprint corners as vertices. DT was produced in QGIS 3.12 (Fig. 9). Triangles from DT are formed by two long sides (with a similar length) and a short side. In most triangles, the short side links two vertices of the same building footprint, whereas the two long sides link vertices of distinct footprints. In the rare case where a footprint is isolated between others, the triangles do not always follow this arrangement (Fig. 9). Metrics referring to the building form were extracted for the building size indicator (Table 3). Inner irregularity could not be simulated because the pixel-based output did not allow the angles of the predicted buildings to be measured. Related to the building size, the "area" and the "perimeter" of the predicted buildings were extracted, as well as the number of predicted buildings ("num_clumps") and the total number of predicted building corners ("num_vertices"). Related to the open space form, proximity and directionality indicators were defined by DT measures. Related to the proximity indicator, the lengths of both DT long sides were measured ("max_long_sides"; "mean_long_sides"). Intuitively, the distance between buildings is related to the width of the streets (Fig. 9).

Table 4. Metrics at the grid level. Thirty-five digitised building metrics. Twenty-six predicted building metrics.
Columns: Urban form; Indicators; Metrics - digitised buildings; Metrics - predicted buildings.
Metrics - digitised buildings: Orientation index mean ("oi_mean"); Orientation index range ("oi_range"); Orientation index median ("oi_med").
Metrics - predicted buildings: Orientation of DT short side mean ("mean_oi_short"), range ("range_oi_short"), median ("med_oi_short") and sd ("sd_oi_short"); Angle between DT long sides minimum ("min_ang_long_sides"), maximum ("max_ang_long_sides"), mean ("mean_ang_long_sides"), range ("range_ang_long_sides"), median ("med_ang_long_sides") and sd ("sd_ang_long_sides").
Related to the directionality indicator, the orientation of the short side of the DT was measured ("oi_short"), with $\mathrm{oi\_short} = 1 - |\mathrm{angle} - 45| / 45$, in the expectation that it shows differences in the orientation pattern, since the direction of the street can approximate the direction of the buildings. If angle = 0° or 90°, then oi_short = 0; if angle = 45°, then oi_short = 1. The angles of the short side ("short_ang") were extracted as they simulate the orientation of the urban pattern. The angles between DT long sides were extracted as we observed that they vary depending on the width and direction of the open space ("mean_ang_long_sides").
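The orientation index of a DT short side follows directly from the formula above. In the sketch below, folding the angle into the range from 0° to 90° before applying the formula is an assumption (the text only gives values for 0°, 45° and 90°), and the use of the population standard deviation in the summary is likewise an illustrative choice; the example angles are hypothetical.

```python
import statistics


def oi_short(angle_deg):
    """Orientation index of a DT short side: 1 - |angle - 45| / 45.

    Assumption: the angle is first folded into [0, 90] degrees, so that
    for example 135 degrees is treated like 45 degrees.
    """
    folded = abs(angle_deg) % 180.0
    if folded > 90.0:
        folded = 180.0 - folded
    return 1.0 - abs(folded - 45.0) / 45.0


def summarise(values):
    """Grid-level statistics of a metric (mean, range, median, sd)."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "med": statistics.median(values),
        "sd": statistics.pstdev(values),  # population sd (assumption)
    }


short_side_angles = [0.0, 22.5, 45.0, 90.0]  # hypothetical angles in degrees
indices = [oi_short(a) for a in short_side_angles]
summary = summarise(indices)
print(indices, summary)
```

The index is 0 for short sides aligned with the coordinate axes and 1 for diagonal ones, so a grid cell whose short sides share one direction shows a small range and standard deviation.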
The Pearson correlation coefficient between each pair of variables was calculated to identify the independent variables that best capture the variability in the morphological characterisation. Thirty-five metrics were computed, and highly correlated metrics ($|r| > 0.5$, i.e., $r < -0.5$ or $r > 0.5$) were discarded. Weakly correlated metrics were aggregated to the grid cell level. Principal Component Analysis (PCA) was performed with the final weakly correlated metrics to reduce dimensionality, not by creating new variables, as the metrics would lose urban interpretability, but to check whether all selected metrics had a high loading in the PCA test and, if not, to remove those variables and reduce the set of metrics.
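The correlation filter can be illustrated with a small sketch. The text does not say which metric of a highly correlated pair is discarded, so the greedy rule used here (keep a metric only if it is weakly correlated with every metric already kept) is an assumption, as are the function names and the example values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # assumes neither sequence is constant


def drop_correlated(table, threshold=0.5):
    """Keep a metric only if |r| <= threshold with all metrics kept so far."""
    kept = []
    for name in table:
        if all(abs(pearson(table[name], table[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical metric values for five buildings.
table = {
    "area": [10, 20, 30, 40, 50],
    "perimeter": [13, 18, 25, 27, 34],
    "oi_mean": [0.5, 0.1, 0.9, 0.1, 0.5],
}
selected = drop_correlated(table)
print(selected)
```

Because the result depends on the order in which metrics are visited, a real analysis would need an explicit rule for choosing which metric of each pair to retain.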
Furthermore, to capture spatial patterns, the metrics were studied not only within each grid cell: for each grid cell, the difference between its value (j) and the mean of its eight neighbours' values (i) was also calculated. The Neighbour Index (Ineig) is based on the heterogeneity index from Taubenböck and Kraff (2014), capturing the similarity of the neighbours in those metrics.
where Ineig = Neighbourhood index; N = total number of neighbours studied; Vi = metric value of the neighbour grid; Vj = metric value of the studied grid.
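A minimal sketch of this neighbour comparison is given below. The published formula is not reproduced in the text, so the form used here (the value of the studied cell minus the mean of its available neighbours), the sign convention, and the treatment of edge cells with fewer than eight neighbours are all assumptions; the function name and the example grid are hypothetical.

```python
def neighbour_index(grid, row, col):
    """Cell value minus the mean value of its (up to eight) neighbours.

    `grid` is a list of rows; cells without data are stored as None.
    Returns None when the cell has no neighbour with data.
    """
    values = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # skip the studied cell itself
            r, c = row + d_row, col + d_col
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and grid[r][c] is not None:
                values.append(grid[r][c])
    if not values:
        return None
    return grid[row][col] - sum(values) / len(values)


# Hypothetical 3 x 3 grid of one metric.
metric_grid = [
    [1, 2, 3],
    [4, 10, 6],
    [7, 8, 9],
]
centre = neighbour_index(metric_grid, 1, 1)
corner = neighbour_index(metric_grid, 0, 0)
print(centre, corner)
```

Values close to zero indicate a cell that resembles its surroundings, while large absolute values flag a cell that differs from its neighbours.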

Assessing degrees of deprivation at settlement level
Urban spatial deprivation was categorised based on the morphological variables extracted from grid level metrics (Fig. 10). Transferability, scalability, and replicability were the aims, so we wanted the model output to be an interpretable degree of deprivation (based on predicted building metrics). Thus, we wanted to interpret the clusters and their morphological indicators directly from the model output metrics, avoiding the process of manually digitising building footprints.
Firstly, a multivariate analysis was performed between both datasets to select among the metrics those that are highly representative in both sets. Because we have multiple responses, i.e., diverse predicted footprint metrics related to each indicator, tests for separate regression parameters are insufficient. It is important to establish whether our independent variables ($\nu$) affect all the variables in the index (dependent variables, $y$) or only a few, i.e., whether all the digitised footprint metrics are explained by those in the model (predicted footprint metrics). For this purpose, the Multivariate Multiple Regression Analysis (MMRA) method was selected to model the relationship between the metrics derived from the predicted footprints (independent variables) and the metrics derived from the digitised footprints (dependent variables):
$y = \alpha_0 + \alpha_1 \nu_1 + \alpha_2 \nu_2 + \dots + \alpha_n \nu_n$
To achieve accurate results, linearity, no multicollinearity, no outliers, similar spread across the range and normality of residuals were assumed for both datasets. MMRA was computed in the R statistical environment (R-3.6.1). We decided to keep the predicted footprint metrics with a high level of significance (p < 0.05) and the digitised footprint metrics with an adjusted R-squared value higher than 0.4, a threshold that captures approximately 23% of the standard deviation (Everit & Skrondal, 2010). The resulting MMRA metrics were used for clustering.
Secondly, the clustering process was carried out. We chose an unsupervised method for both sets of metrics to propose a more objective and data-driven approach that can be interpreted in a more rigorous manner. For instance, creating references or labels (supervised methods) for such complex and largely unknown concepts as urban deprivation would bring a significant degree of uncertainty to our research.

Table 4. MMRA between metrics of both datasets. In blue digitised footprint metrics. In pink predicted footprint metrics. N = 52 grid cells.
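The link between the 0.4 threshold and the figure of approximately 23% can be checked with a short calculation. One common reading, assumed here, is that a model with coefficient of determination R² reduces the residual standard deviation by a fraction 1 − √(1 − R²) relative to the standard deviation of the response. The sketch also includes the usual adjusted R-squared formula; the example of 52 observations with 5 predictors is hypothetical (52 is the number of grid cells, the number of predictors is chosen only for illustration).

```python
import math


def sd_fraction_explained(r_squared):
    """Fractional reduction of the standard deviation: 1 - sqrt(1 - R^2)."""
    return 1.0 - math.sqrt(1.0 - r_squared)


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 for a linear model with an intercept."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


threshold_fraction = sd_fraction_explained(0.4)
example_adjusted = adjusted_r_squared(0.5, n_obs=52, n_predictors=5)
print(round(threshold_fraction, 3), round(example_adjusted, 3))
```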
Before performing clustering, the optimal number of clusters was determined. Several methods were tested to find the optimal number of clusters, using measures such as connectivity, the Dunn index, and the Silhouette index (Milligan & Cooper, 1985), using the Clvalid R package (Brock, Susmita, Pihur, & Datta, 2008). We proved varying the number of clusters from 2 to 6. Different unsupervised clustering methods (k-means, k-medoids, hierarchical) were performed to guide the definition of degrees of deprivation. A hybrid method, hierarchical-k-means, was also performed as the post application of k-means optimizes and improves the initial partitioning generated by the hierarchical clustering method. All variables were scaled, to avoid the influence over the clustering of variables with greater value ranges.
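The idea behind the hybrid hierarchical-k-means step, in which k-means refines an initial partition, can be sketched in plain Python. This is an illustration under stated assumptions and not the R workflow used in the study: the initial labels are supplied by hand (standing in for a cut of a hierarchical clustering tree), z-score scaling with the population standard deviation is assumed for "all variables were scaled", and the function names and example data are hypothetical.

```python
import math


def zscore_columns(rows):
    """Scale each column to zero mean and unit (population) sd."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def refine_with_kmeans(rows, initial_labels, k, max_iter=100):
    """Lloyd iterations started from an existing partition."""
    labels = list(initial_labels)
    centroids = None
    for _ in range(max_iter):
        new_centroids = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            elif centroids is None:
                raise ValueError("initial partition has an empty cluster")
            else:
                new_centroids.append(centroids[j])  # keep previous centroid
        centroids = new_centroids
        new_labels = [min(range(k), key=lambda j: math.dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Hypothetical grid-cell metrics and a deliberately imperfect initial cut.
raw = [[0, 0], [0, 1], [10, 10], [10, 11]]
scaled = zscore_columns(raw)
refined = refine_with_kmeans(scaled, initial_labels=[0, 0, 0, 1], k=2)
print(refined)
```

In this example the third cell starts in the wrong group and is reassigned after one iteration, which is the kind of correction the k-means step is meant to provide.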
The clustering method was selected based on the performance with the digitised footprint metrics, as it is the "reference clustering" and needs to be interpreted and validated. The same clustering method was performed with both sets of metrics. Clustering based on the digitised footprints metrics was used as a "reference clustering" for validating the clustering based on predicted footprints. The method selected was guided by the correspondence of each class and the levels of deprivation. Validation was carried out by consulting the current Kenya's building code legal document and through visual evaluation by local-based urban experts (i.e., urban planners and architects working in Nairobi).
The Building Code of Kenya (BCK) (Local Government, 1968) is the most up-to-date Kenyan legal code that delimits urban morphology. The BCK was analysed to interpret the morphological indicators and to relate them to degrees of deprivation. Concerning building form, only the building size indicator was assessed with the BCK, as no law is stipulated regarding the geometry of the building (i.e., inner irregularity). According to Act 72 (section 3 from the BCK) a "small house" is defined as a house with an area of less than 68 square metres (20,000 cu. feet). The higher the number of buildings per grid (num_build) in relation to the density (sum_area), the smaller the estimated area of each building, and the higher the degree of deprivation. As far as the open space form is concerned, only the Proximity indicator was taken as a reference for the BCK, as no law is stipulated regarding the Directionality of the urban pattern. To define proximity, part 2 of the BCK was used, specifically Act 18, which stipulates 2.4 m (8 ft) as the minimum spacing between buildings. Act 17 mandates that each building must have a space in front of it of at least 6 m (20 ft). The distance between buildings was measured through the variables mean_dvl_mean and sd_dvl_mean. They capture, respectively, the mean and the standard deviation per grid of the average distance between each individual building and its four nearest buildings. It is interpreted that the greater the distance between buildings, the less deprived the area is.
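As a rough illustration of how mean_dvl_mean and sd_dvl_mean can be derived for one grid cell, the sketch below starts from a building-to-building distance matrix that is assumed to be already computed. The function name `proximity_metrics`, the use of the sample standard deviation and the toy layout are assumptions, and the sketch does not itself compute vertex-to-facade distances.

```python
import numpy as np

MIN_SPACING_M = 2.4   # minimum spacing between buildings cited from Act 18 (BCK)

def proximity_metrics(dist, k=4):
    """Mean and standard deviation, per grid cell, of each building's mean
    distance to its k nearest buildings.

    dist: symmetric (n, n) matrix of building-to-building distances in metres
          (assumed precomputed; n must be greater than k).
    """
    d = np.asarray(dist, dtype=float).copy()
    np.fill_diagonal(d, np.inf)              # a building is not its own neighbour
    nearest = np.sort(d, axis=1)[:, :k]      # k smallest distances per building
    per_building = nearest.mean(axis=1)      # mean over the k nearest neighbours
    return per_building.mean(), per_building.std(ddof=1)

# Toy layout: six buildings in a row, 2 m apart (hypothetical values).
pos = np.arange(0.0, 12.0, 2.0)
dist = np.abs(pos[:, None] - pos[None, :])
mean_dvl_mean, sd_dvl_mean = proximity_metrics(dist)
below_code_minimum = mean_dvl_mean < MIN_SPACING_M
```

Comparing the grid-level mean with the 2.4 m spacing gives a simple flag in the spirit of the BCK-based interpretation above.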

Building footprint extraction with DNN
During the training process, the DNN model for extracting building footprints was evaluated in terms of accuracy and loss, and the following values were obtained: Training Loss: 0.18; Training Accuracy: 0.92; Validation Loss: 0.24; Validation Accuracy: 0.92.
The accuracy of the model was also assessed using the validation dataset (64 patches), at pixel level with the F1-score, and at area level with the Jaccard Index (IoU). The result from the validation set was F1-score = 0.76 and IoU = 0.61. Besides the quantitative accuracy assessment, the results were also visually analysed, by comparing them to the reference data. The performance of the model on the validation dataset is shown in Fig. 11.
Subsequently, the testing dataset, composed of 80 patches (256 by 256 pixels) from nine different settlements, had an F1-score = 0.84 and IoU = 0.73. The performance of the model on the testing dataset is shown in Fig. 12.
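For reference, the two accuracy measures reported above can be computed from a predicted and a reference binary mask as in the sketch below. It is a generic illustration: the function name `f1_and_iou` and the toy masks are assumptions, and how the study aggregated scores over patches is not specified here.

```python
import numpy as np

def f1_and_iou(pred, ref):
    """F1-score and Jaccard index (IoU) of the building class for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = np.logical_and(pred, ref).sum()     # building in both masks
    fp = np.logical_and(pred, ~ref).sum()    # predicted building, reference background
    fn = np.logical_and(~pred, ref).sum()    # missed building pixels
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou

# Toy 4 x 4 patch: the prediction covers the reference roof plus one extra column.
ref = np.zeros((4, 4), dtype=bool)
ref[0:2, 0:2] = True
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, 0:3] = True
f1, iou = f1_and_iou(pred, ref)
```

When both scores come from the same pixel counts they are linked by IoU = F1 / (2 − F1), so IoU is always the lower of the two for an imperfect prediction.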
It was found that the predicted areas visually cover all the roofs. In certain areas, because of the high density of buildings and the barely existing space between them, the model had difficulties distinguishing the buildings individually. The WV3 pixel size in some areas is larger than the space between buildings. As illustrated in Fig. 1, overlapping of roofs is common in these areas. Therefore, in some cases it leads to a merging of buildings, showing one uniform polygon (clump) where several individual buildings should be observed.

Computing morphological metrics
Deep learning predictions face challenges in dense and closely packed urban areas where individual roofs are difficult to extract. In this type of urban setting, the objects that are extracted often represent building clumps (Fig. 13). For this reason, performing a morphological analysis at the building level would lead to inaccurate conclusions. Consequently, the morphological analysis needs to be conducted at an aggregated level, e.g., at the grid level.
Eight weakly correlated metrics from the digitised footprints extracted at the building level were selected (see Appendix). The selected eight metrics cover all four indicators. The Size Indicator was characterised by the area of each individual building ("area"), and the "perimeter" metric was discarded. The short side of the façade ("short") was chosen over "long" because some buildings share the same roof, and the short façade represents each individual house. The total number of corners ("num_vertices") was also selected. Eccentricity and elongation metrics ("ecc", "elong") were discarded due to their high correlation with the selected variables. The Inner Irregularity Indicator was represented by the index of internal irregularity of the building angles ("inn_irr_angle"), calculated as shown in Fig. 10. Other variables within the indicator are correlated with it. For the Proximity Indicator, the minimum and mean distances between buildings were selected ("dvl_min", "dvl_mean"). The minimum orthogonal distance between a vertex of the building and the closest façade of the neighbour (dvl) was selected over the distance between centroids (dcc) and the distance between vertices (dvv), as it better captures the real distance between buildings (i.e., street width). The minimum distance alludes to Act 18, and the mean to Act 17 of the BCK. For the Directionality Indicator, the metrics "oi_mean" and "oi_sd" were chosen.
Most grid cells show urban similarities with their surroundings i.e., neighbouring grid cells, which is reflected in the high correlation the metrics have with the neighbour (metric_neig). The more deprived the neighbours are, the more deprived the studied grid is. To capture spatial patterns, even if there were some metrics correlated with their neighbours, both metrics were kept to proceed with the deprivation clustering. Twenty-six total metrics were used for the deprivation clustering.
The predicted building metrics were extracted directly from each grid cell. In total, twenty-six metrics were aggregated to each grid cell. After computing the Pearson Correlation Coefficient, eight metrics related to three main indicators were selected. The final metrics were chosen as those most similar to the digitised footprint metrics, i.e., the most interpretable. From the Size Indicator, the metrics "area", "num_clumps" and "num_vert" were kept. Proximity was defined by the "max_mean_long_sides" and "mean_mean_long_side" metrics. Both metrics were chosen, even though they are highly correlated (0.75), because some grids show higher variability in their open space (e.g., very dense areas at the DUA edges are close to private, inaccessible open spaces) and "max_mean_long_sides" can detect this variability. The metric "mean_mean_long_side" also shows a high correlation (−0.95) with "area". As the built-up area increases, the proportion of open space is reduced, and what remains is laid out in very narrow streets. Both metrics were kept as they define different deprivation indicators. The Directionality Indicator was defined by the "mean_ang_long_sides", "mean_oi_short" and "sd_oi_short" metrics. The interpretability of these metrics in relation to the reference metrics is uncertain, so it was decided to retain them even though they are highly correlated. After performing PCA, all metrics were kept, since each obtained the largest absolute loading in at least one of the principal components. Finally, the neighbour index was applied to each variable and aggregated to the grid. The Pearson Correlation Coefficient was again calculated for the sixteen metrics (Fig. 15).

Table 4 shows the MMRA results of the digitised footprint metrics using the predicted footprint metrics as explanatory variables. For further analysis, we retained only statistically significant values (p < 0.05) with an adjusted R-squared value higher than 0.4. These values operate as a threshold that captures approximately 23% of the standard deviation explained by the digitised footprint metrics. Low values of the adjusted R-squared (as in num_build) indicate that the model does not capture the complexity of the relationships.
As can be observed, not all the digitised footprint metrics can be explained by the predicted footprint metrics. The building size metrics ("mean_area", "mean_short", "mean_num_vert") and the inner irregularity metric ("mean_inn_irr_ang") cannot be captured by metrics from the predicted footprints. This stems from the difficulty the U-Net model has in extracting individual building footprints consistently.
The optimal number of clusters was determined using the Dunn index. The Dunn index was selected as it showed a robust performance for metrics from both datasets. The optimal number of clusters was three. A hybrid method was chosen, hierarchical-k-means unsupervised clustering, as it produced meaningful clusters for differentiating degrees of deprivation. The three classes of deprivation are colour-coded: class number one (red colour), is named "High deprivation"; class number two (green colour) represents the "Medium deprivation" areas; and class number three (blue colour), refers to "Low deprivation" areas (Figs. 17 and 18).
In Highly deprived areas the Size Indicator reflects the high density per block (labelled in Fig. 18 as "sum_area"). The density has a high and positive association with the number of buildings, and we can assume that, according to Act 72 (BCK), the higher the density, the smaller the estimated area of each building, and the higher the degree of deprivation. With respect to the Inner Irregularity Indicator, the High deprivation class exhibits the highest values of inner irregularity of the buildings ("max_inn_irr_angle_max"). The BCK does not refer to this metric, but local urban experts and community members agree that building materials influence the shape of the building (i.e., brick or concrete are used in less deprived areas, and their construction techniques require more orthogonal walls, and hence less irregularity in the internal angles of the building) (Fig. 16). As such, the higher the Inner Irregularity Indicator, the higher the degree of deprivation.
The Proximity Indicator within the Mathare DUA, represented by the "mean_dvl_mean" metric, shows that streets in Mathare are narrow (Table 5). Act 18 (BCK) indicates a minimum spacing between buildings of 2.4 m. The High Deprivation class shows the lowest distances between buildings, below the minimum BCK measure (i.e., an average distance of 2.02 m). Consequently, the smaller the distance between buildings, the higher the degree of deprivation of the area. The values of the Directionality Indicator ("mean_oi_mean") do not follow the initial hypothesis that the higher the metric, the higher the deprivation; rather, they suggest the opposite. This can be explained by the fact that directionality is only defined by the four neighbouring buildings. In highly deprived areas, due to the high density and proximity, buildings have little chance to settle freely, so the urban pattern is not as organic as in parts with greater open spaces. However, it should be noted that this indicator has often been used to differentiate formal areas from deprived areas, as the fabric is much more organic in the latter (Gomes, 2015).
The clustering obtained from the predicted footprints was assessed against the "reference clustering", extracted from the digitised footprints. The overall accuracy was 0.71 (proportion of grids correctly classified). The overall F1-score was 0.47, and the F1-score by class was 0.82 for High Deprivation, 0 for Medium Deprivation, and 0.58 for Low Deprivation, which indicates that the model is able to capture high and low deprivation but fails to capture medium deprivation (Fig. 17).
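The agreement measures above can be reproduced from two lists of class labels, as in the sketch below. The reported overall F1-score of 0.47 is consistent with an unweighted (macro) average of the three class scores, (0.82 + 0 + 0.58) / 3 ≈ 0.47; this reading is an inference from the numbers, not something the text states. The function names and the four-cell toy labels are assumptions.

```python
def overall_accuracy(reference, predicted):
    """Proportion of grid cells assigned to the same class in both clusterings."""
    hits = sum(r == p for r, p in zip(reference, predicted))
    return hits / len(reference)

def per_class_f1(reference, predicted, classes):
    """F1-score of each class, with the reference clustering as ground truth."""
    scores = {}
    for c in classes:
        tp = sum(r == c and p == c for r, p in zip(reference, predicted))
        fp = sum(r != c and p == c for r, p in zip(reference, predicted))
        fn = sum(r == c and p != c for r, p in zip(reference, predicted))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

classes = ["High", "Medium", "Low"]

# Toy labels for four grid cells (hypothetical, not the study data).
reference = ["High", "High", "Medium", "Low"]
predicted = ["High", "High", "Low", "Low"]
toy_accuracy = overall_accuracy(reference, predicted)
toy_f1 = per_class_f1(reference, predicted, classes)

# Macro average of the class scores reported in the text.
reported = {"High": 0.82, "Medium": 0.0, "Low": 0.58}
macro_f1 = sum(reported.values()) / len(reported)
```

A class that is never predicted, as with Medium in the toy labels, receives an F1-score of 0 and pulls the macro average down even when overall accuracy stays high.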
The metric values (scaled variables) from both datasets were plotted as box plots to illustrate the behaviour of each metric across the deprivation cluster classes (Fig. 18).
Top: metrics from the manually delineated set. Bottom: metrics from the U-net output set.
It can be observed that, in the predicted clustering, the density metric ("area") and the difference in density with the neighbours ("area_neig") define High deprivation very precisely, even better than the digitised building metric "sum_area". The number of clumps ("num_clumps"), compared to the number of digitised footprints (correlated with "area"), shows that the model is unable to extract single buildings when they are very close and their roofs overlap. The number of clumps metric ("num_clumps") and the neighbours' metric ("num_clumps_neig") are the metrics characterising the medium deprivation class, which is not well captured by the predicted cluster. Thus, neither metric improves the clustering of degrees of deprivation. The distance between buildings ("max_mean_long_sides") also relates to deprivation degrees, along with the difference in distance between neighbours ("max_mean_long_sides-neig"). These metrics do not perform as well as the "mean_dvl_mean" and "mean_dvl_mean-neig" metrics, but in both cases the classes are well defined. The metric "mean_ang_long_sides" and its difference with the neighbours ("mean_ang_long_sides-neig") do not capture the orientation index variables of the digitised model.

Table 5
Descriptive statistics of the proximity between buildings (m) measured by "mean_dvl_mean" (i.e., the mean per grid of the mean of the four closest neighbours per building). Each Deprivation class is represented.

Discussion
In LMIC cities, most inhabitants live in DUAs, but we know very little about them, or at least have very little quantitative information, due to the absence of data (Kuffer et al., 2017). The availability of open spatial datasets, such as OSM, offers opportunities for the study of urban patterns; however, data coverage of DUAs remains insufficient. New tools that bring building footprints up to date are urgently needed.
Manual digitizing has been the traditional building mapping method, but it is time-consuming and does not allow for frequent updates. This research develops a model based on deep learning techniques (U-Net architecture) designed to cover the gap in DUAs. U-Net has proved successful for extracting building footprints, achieving an F1-score of 0.84 and, on visual examination, good coverage of the urban pattern. However, individual segmentation of overlapping or closely spaced buildings (less than 60 cm apart) is still challenging, as the building edges are not well defined (i.e., iron sheet roofs exhibit irregularities because they are made up of several sheets) and the materials of all roofs are similar, making it difficult to assign pixels (30 cm) to the appropriate classes. The model needs to introduce separation borders between overlapping buildings so that buildings can be extracted individually, even for the smallest buildings and in the densest areas. Such a semi-automatic building footprint extraction could also be a first step in producing a building map, with the building clumps split manually where required.
The resulting outputs show robust results when the model is transferred to nine other DUAs of the city. A global application (transferability to other cities) has not been tested, as more WV-3 images would be required. Further research will aim to provide a trained model by adding DUAs from other cities, as urban patterns differ between cities (Brown, 1997). This will help to create a global tool to analyse the morphology of DUAs. The ability to map building footprints in deprived areas at large scale will bridge one of the urgent data gaps for global models (e.g., population models), but it is also of significant relevance for local applications, such as providing base data for urban renewal.
Furthermore, easily reproducible urban analytical tools based on morphological metrics are needed. We develop a deprivation classification approach based on morphological analysis, reflecting area-level deprivation. Building materials (McCartney & Krishnamurthy, 2018), building height (3D analysis) or other physical features are recommended for further analysis. Understanding the morphology of urban deprivation is a necessary step towards characterizing urban poverty. This has not previously been well covered, due to the lack of data. Global morphological welfare standards are also lacking, and outdated codes, such as the Building Code of Kenya, are not sufficiently developed documents to categorize deprivation. In addition, further work is needed on the predicted metrics (metrics from the clumps) to derive more useful metrics from them.
Modelling deprivation based on the clustering of morphological features is complex and may also be subjective when identifying deprivation patterns. The morphological metrics only capture the physical characteristics and lack other relevant aspects of deprivation (e.g., socio-economic characteristics). Even if some of the datasets used in this study were generated by local community members, the results must be complemented by robust local knowledge based on community engagement. To fully verify the validity of the morphological characteristics that determine deprivation, the analysis could be linked to other socio-economic variables to understand deprivation in a more multidimensional way. Therefore, further research will analyse the transferability of morphological deprivation clustering together with socio-economic indicators in the studied areas, as well as in other areas across the globe. Also, the incorporation of some image-derived features, such as textures and image edges, partially linked in the literature to some facets of deprivation, could add robustness to our degrees of deprivation measured from satellite imagery (Engstrom et al., 2019). Open data is needed, not only to continue developing tools to characterize deprivation, but also to provide municipalities, NGOs, and communities with missing information. Information could empower stakeholders to advocate on behalf of deprived areas. Mapping poverty should also be discussed from a geo-ethics point of view. In this respect, aggregating metrics at the grid level could be preferable to using disaggregated building metrics. Our aim is that this tool can be merged with other platforms that provide deprivation data at a grid level (Chi, Fang, Chatterjee, & Blumenstock, 2021; Stewart & Oke, 2012; WorldPop, 2020).
The implementation of a more common grid size (100 by 100 m grid cells) would make it possible to combine results with other global gridded datasets, e.g., WorldPop (WorldPop, 2020), to derive new spatial indicators of deprivation.

Conclusion
This research has generated a method for the reliable identification and delineation of urban deprivation patterns by the characterisation of urban morphology using satellite imagery. The above results, in line with other RS studies, such as DUA location (Wang, Kuffer, Roy, & Pfeffer, 2019), characterisation (Taubenböck & Kraff, 2014; Georganos et al., 2021) and categorization, contribute to filling existing data gaps on the most deprived urban areas (Taubenböck, Kraff, & Wurm, 2018b). Our research highlights the scope and limitations of the urban morphology extracted from satellite imagery, and the extent to which degrees of urban deprivation can be defined. Deep learning (U-Net architecture) can extract building footprints in DUAs using VHR imagery (WV3 imagery), showing good performance with an F1-score of 0.84 and a Jaccard index of 0.73 in the testing set. Thus, our results show the ability of U-Net to extract building footprints in diverse DUAs of the city of Nairobi. The extraction of individual buildings remains a challenge, due to the complex urban environments of DUAs, e.g., the variety of physiognomies and the limited space between buildings. Moreover, it has been shown that morphological characteristics are linked with degrees of deprivation at a grid level. The deep learning model output combined with a morphological analysis has the capability of characterizing urban deprivation. The availability of building footprint data could act as a game-changer. Commonly, city and local-area planning are done without data, and hence planners do not realize the scale of the consequences of infrastructure developments for communities, such as large-scale evictions. Such evictions are very common in Nairobi, as well as in other sub-Saharan African cities. Overlaying detailed building footprints on planned infrastructure development plans would enable developers to understand, at least partially, the scale of the consequences for communities.
Moreover, categorizing physical deprivation could allow for the most critical areas to be detected, and for prioritizing upgrading programs accordingly.

Data and software availability
All the collected data, the manually delineated dataset, the deep learning code, and the morphological analysis codes are available from the SLUMAP project website (https://slumap.ulb.be).

Funding
The research pertaining to these results received financial aid from the Belgian Federal Science Policy (BELSPO) according to the agreement of subsidy no. SR/11/380 (SLUMAP), from the NWO grant number VI.Veni.194.025 and from the GCRF Digital Innovation for Development in Africa panel (EPSRC Reference: EP/T029900/1). Stefanos Georganos is supported by a Digital Futures postdoctoral fellowship grant.

Declaration of Competing Interest
The authors declare no conflicts of interest.

Data availability
Data will be available on the project webpage.