Human dependence on natural resources in rapidly urbanising South African regions

Enhancing the governance of social-ecological systems for more equitable and sustainable development is hindered by inadequate knowledge about how different social groups and communities rely on natural resources. We used openly accessible national survey data to develop a metric of overall dependence on natural resources. These data contain information about households’ sources of water, energy, building materials and food. We used these data in combination with Bayesian learning to model observed patterns of dependence using demographic variables that included: gender of household head, household size, income, house ownership, formality status of settlement, population density, and in-migration rate to the area. We show that a small number of factors, in particular population density and informality of settlements, can explain a significant amount of the observed variation in the use of natural resources. Subsequently, we test the validity of these predictions using alternative, open access data in the eThekwini and Cape Town metropolitan areas of South Africa. We discuss the advantages of using a selection of predictors that could be supplied through remotely sensed and open access data, in terms of opportunities and challenges to produce meaningful results in data-poor areas. With data availability being a common limiting factor in modelling and monitoring exercises, access to inexpensive, up-to-date and free-to-use data can significantly improve how we monitor progress towards sustainability targets. A small selection of openly accessible demographic variables can predict households’ dependence on local natural resources.


Introduction
As countries develop and many societies transition to urbanisation, they tend to reduce their dependence on the local natural environment to meet basic needs (Anderson 1987, Cumming et al 2014, Sanderson et al 2018). This decoupling is often described by the transition from a strong reliance on agriculture as the main source of national income to other, less directly coupled sectors such as industrial and service sectors (Daunton 1995, Mellor 1995, Soubbotina and Sheram 2000). Beyond agriculture, this transition entails a move from 'green loops', i.e. social-ecological systems characterised by high dependence on local provisioning ecosystem services (ES), to 'red loops', characterised by intensive imports or mediated (see figure 1) provisioning ES from local or other systems (Cumming et al 2014). A key driver of this transition is urbanisation (Grimm et al 2018). Urban areas will continue to experience rapid increases in population and consumption, as well as changes in diet, among other factors. How cities balance rapid urbanisation in relation to resource consumption may determine whether or not many of the sustainable development goals (SDGs) are achieved (UN 2015).
In today's globalized world, most of the urban consumption of natural resources is mediated by biophysical features (e.g. cross-border river systems), formal and informal markets, international governance mechanisms (e.g. protection of threatened species) and infrastructures (e.g. dams) (Syrbe and Walz 2012, Reyers et al 2013, Palomo et al 2016). As a result, most people in urban environments consume natural resources produced or extracted somewhere else (Nel et al 2017). However, cities are not homogeneous entities, and seeing them as such can mask their unique development trajectories and the differentiated dependences of various groups and sectors on local ecosystems. For example, urbanisation in the Global South is partly characterised by informality in land use, resource access and governance processes (Brouwer et al 2009, McFarlane 2012, Güneralp et al 2018). In-migration from rural areas usually occurs more quickly than infrastructure can be developed to meet the growing demand (Foster and Briceno-Garmendia 2010, Brueckner and Lall 2015). Additionally, there is incomplete and unequal access to the main infrastructure grid and other formally provided services such as water and energy, which often implies that people have to rely on local ES as alternatives.
Urbanisation in cities in the Global South also involves a combination of people relocating permanently to cities, people moving to cities but maintaining strong connections to rural areas, and 'circular migration', which refers to repeated temporary migration between two or more areas (Collinson et al 2007, UNDP and Newland 2009, Masterson et al 2017). Urban dwellers in these cities continue to rely on local natural resources to satisfy basic needs (McHale et al 2013). This means that natural resources within the city boundaries play a broader role beyond providing recreational and cultural services, which are often the most commonly cited and studied urban ES (Bertram and Rehdanz 2015, La Rosa et al 2016). The direct extraction of natural resources within the city boundaries has implications for how natural resources in urban areas are managed and sustained.
Tools and approaches that can account for multiple modalities of reliance on the environment can help understand and manage natural resources and manmade infrastructure and support targeting of policies towards the SDGs. In contrast to the common framing of natural resource dependence as a dichotomous category, we propose a conceptualization of dependence as a gradient (figure 1). This builds on Cumming et al (2014) and Hamann et al (2015)'s conceptualisations of 'green and red loops', which show dependence on natural resources as being both direct and indirect, or local and nonlocal. Similar to Raudsepp-Hearne et al (2010) and Hamann et al (2015), we use national survey data, which they used successfully to explore dependence on and use of ES at national and subnational scales. First, we develop a metric of households' overall dependence on natural resources based on Statistics South Africa's census data. Second, we use households' demographic characteristics to model the same patterns observed in the census data using machine learning. Third, we explore whether similar patterns can be observed using other publicly available data. Our context of analysis is the eThekwini Metropolitan Municipality, comprising the city of eThekwini (Durban) in South Africa. We show that our simulation accurately models the empirical data and that some of the main variables driving the model can be sourced from alternative data sources in the absence of national statistics. Lastly, we discuss the implication of these findings for other data-scarce African metropolitan areas, and more generally for similar contexts in the Global South.
Figure 1. (a) Dichotomous view of dependence suggests that people depend either directly or indirectly on natural resources. Direct dependence refers to locally sourced or unmediated use, whereas indirect dependence means nonlocal or mediated use. Mediating factors include physical infrastructure (e.g. dams or taps), markets or even institutions. The dichotomous view also suggests that often rural areas are more directly dependent whereas urban areas tend to be more indirectly dependent. (b) Alternative framing of dependence as a gradient. This view suggests that people can depend on natural resources both directly and indirectly at the same time, along a gradient. The gradient could represent physical space (e.g. rural-urban) or socioeconomic factors such as income levels (e.g. rich to poor), which affect how households depend on natural resources. Moreover, how households depend on natural resources is also affected by a range of factors such as seasonality (e.g. harvest season), income fluctuations, market fluctuations, climatic conditions and other factors, which means households can move along the gradient towards either direct or indirect dependence depending on circumstances.

Study area
The eThekwini Municipality is located on the east coast of South Africa in the province of KwaZulu Natal (figure 2). It spans an area of 2297 km² characterised by coastal plains and a steep, dissected topography. The municipality is a middle-income metropolitan region of 3.6 million people residing under highly unequal social, economic and environmental conditions. The urban landscape consists of commercial buildings, formal and informal settlements and periurban agricultural land. Approximately 68% of the municipality is considered rural, with pockets of dense settlements, and about 25% of the inhabitants live in informal and rural settlements. The region therefore combines urban and rural landscapes with a wide range of settlement types, where both rural and urban areas experience significant population growth. The unit of analysis for the study was the municipal ward, the smallest administrative unit in South Africa; there were 103 wards in the 2011 census, and 113 in 2017 due to changes in administrative borders. Wards are also the scale at which socio-demographic data are collected in the national census of South Africa (StatsSA 2011).

Model design
Our approach is articulated in four steps. First, we take advantage of the existing and openly accessible 2011 Census data from Statistics South Africa (http://statssa.gov.za/) to develop a metric of overall direct dependence on natural resources (equation (1)). This metric of ES dependence is derived from the product of the proportions of households in a ward that do not directly depend on each of the five natural resources considered (table 1); the complement of this product is the share of households that depend on at least one resource. In this formulation, 100% means that all households in a particular ward depend on at least one provisioning ES, and 0% means that none of the households depend on any of the provisioning ES. The five ES were chosen for two main reasons: first, because they are provided by the census data, and second, because they can be disaggregated to the smallest administrative unit in South Africa. Other data, e.g. water quality determined by households treating water before using it, are only available at the provincial scale, which is not a useful scale for our analysis.
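The metric described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: equation (1) is not reproduced in this text, so the form used here (one minus the product of the non-dependent proportions, which treats the five resources as independent) is an assumption inferred from the description, and the ward values are hypothetical.

```python
from math import prod


def dependence_metric(dependent_shares):
    """Share of households depending on at least one resource.

    dependent_shares: fractions in [0, 1], one per resource, giving the
    proportion of households in a ward that depend directly on it.
    Assumed form: 1 - product of the non-dependent proportions.
    """
    shares = list(dependent_shares)
    if any(not 0.0 <= s <= 1.0 for s in shares):
        raise ValueError("each share must lie between 0 and 1")
    return 1.0 - prod(1.0 - s for s in shares)


# Hypothetical ward: values are illustrative, not census figures.
example_ward = {
    "water": 0.10,
    "own_grown_food": 0.20,
    "heating_fuel": 0.05,
    "cooking_fuel": 0.05,
    "building_materials": 0.00,
}
example_score = dependence_metric(example_ward.values())  # about 0.35
```

A ward in which every household uses one of the resources directly scores 1 (100%), and a ward in which no household uses any of them scores 0, matching the interpretation given in the text.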
Secondly, we use seven socio-demographic variables available at the ward level in the South African Census data and one variable representing the coverage of open spaces across the municipality (described in table 2), to explain the output from the metric. Some of these variables were also used in previous literature (Hamann et al 2015). The socio-demographic factors we used in this study have been identified elsewhere in the literature as important determinants of how communities, households or individuals interact with the natural environment and ultimately how they benefit from it. We investigate the explanatory power of the input variables (E1-E8) by applying machine learning (Willcock et al 2018). Our learning model uses the WEKA (Waikato Environment for Knowledge Analysis) machine learning software (http://cs.waikato.ac.nz/ml/weka/) as integrated into k.LAB, the software powering the ARIES (Artificial Intelligence for Ecosystem Services; http://aries.integratedmodelling.org/) technology adopted for the study. We apply a Bayesian classifier (BayesNet) that learns a naive Bayesian network from the data using the K2 algorithm with all input nodes connected to the output node (Witten et al 2018). The training process involved 10 cross-validation iterations, each using 90% of the data to train the model and 10% to validate it. The machine learning algorithm used quantitative variables, discretised into 10 equal intervals, for both inputs and outputs.
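The training procedure described above (discretisation into 10 equal intervals, a naive Bayesian classifier, and 10 cross-validation iterations with a 90/10 split) can be sketched in plain Python. This is a stand-in for illustration only and does not use WEKA or k.LAB; the add-one smoothing, the function names and the toy data are all assumptions that do not come from the study.

```python
import random
from collections import Counter, defaultdict
from math import log

N_BINS = 10  # the text discretises inputs and output into 10 equal intervals


def discretise(value, n_bins=N_BINS):
    """Map a value in [0, 1] to the index (0..n_bins-1) of its interval."""
    return min(int(value * n_bins), n_bins - 1)


def train(rows, labels):
    """Count class frequencies and per-class bin frequencies per input."""
    class_counts = Counter(labels)
    bin_counts = defaultdict(Counter)  # (input index, class) -> bin counts
    for row, label in zip(rows, labels):
        for i, b in enumerate(row):
            bin_counts[(i, label)][b] += 1
    return class_counts, bin_counts


def predict(model, row):
    """Return the most probable class (add-one smoothing is an assumption)."""
    class_counts, bin_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = log(n_c / total)
        for i, b in enumerate(row):
            score += log((bin_counts[(i, c)][b] + 1) / (n_c + N_BINS))
        scores[c] = score
    return max(scores, key=scores.get)


def cross_validate(rows, labels, k=10, seed=0):
    """k-fold cross-validation: each fold is held out once for validation."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        model = train([rows[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(predict(model, rows[i]) == labels[i] for i in fold)
    return correct / len(rows)


# Toy data (hypothetical): 100 units whose single input bin equals the output bin.
labels = [i % N_BINS for i in range(100)]
rows = [(label,) for label in labels]
model = train(rows, labels)
accuracy = cross_validate(rows, labels)
```

With real ward data, each row would hold the discretised values of E1-E8 and each label the discretised dependence metric.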
To estimate dependence in the future, we use projected population growth data for the 25 years between 2011 and 2036 as input for the trained model. Population growth was based on the annual growth rate of population during 2001-2011, which averaged 1.09%. As a last step, we reapply the model at the pixel level, rather than at the ward level, using only two of the eight explanatory variables (table 2: E5, E7), simulating a scenario in which only global and remotely sensed data are available. We discuss the predictions of this 'data-poor' model scenario both in the eThekwini Municipality and in the Cape Town metropolitan region.
Figure 4 shows our metric of dependence, the percentage of households dependent on at least one of the five ES described in table 1, at the ward level. Around eThekwini, only a small proportion of households are directly dependent on one or more of the above listed natural resources, as one would expect in an urban setting. These proportions increase with distance from the city, as would also be expected. In the most marginal areas, 100% of households are directly accessing at least one natural resource among water, food (own grown), solid fuel (for heating and for cooking) and building materials.

Modelling dependence with socio-demographic variables
The machine learned Bayesian model, using the eight input variables in table 2, is able to reproduce the empirical output as reported in table 3: 92% of the instances are correctly classified using the training dataset and 60% using only the validation dataset. The spatial results are illustrated in figure 5(A). However, not all of the variables contribute equally. For example, a similar network using only the predictors that describe population (E2, E5, E7, E8) correctly classifies 80% of the instances (using the training set). The influence analysis of the network shows that three out of eight predictors (population density, proportion of large households, and proportion of households living in informal settlements) are driving the results (see table 4). The strength of influence is calculated from the conditional probability tables and expresses the difference between the probability distributions of two nodes by looking at the posterior probability distribution of a node, for each possible state of the parent or child node. To summarize this difference, we report normalized Euclidean distance, although other types of distances (e.g. Hellinger) are also used (Anderson 2006).
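The distance measures mentioned above can be sketched as follows. The exact computation used by the authors' software is not given in the text, so two choices here are assumptions: the Euclidean distance is normalised by dividing by the square root of 2 (its maximum between two probability distributions), and the per-state distributions are summarised by averaging over all pairs of states. The example table is hypothetical.

```python
from itertools import combinations
from math import sqrt


def normalised_euclidean(p, q):
    """Euclidean distance between two discrete distributions, scaled to [0, 1]."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) / sqrt(2)


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))) / sqrt(2)


def influence(conditional_rows, distance=normalised_euclidean):
    """Average pairwise distance between a node's distributions, one row
    per state of the connected node (averaging is an assumption)."""
    pairs = list(combinations(conditional_rows, 2))
    return sum(distance(p, q) for p, q in pairs) / len(pairs)


# Hypothetical conditional table: three states of one node, two of the other.
example_rows = [[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]]
example_influence = influence(example_rows)  # about 0.4
```

A value near 0 means the distribution barely changes across states of the connected node (weak influence), while a value near 1 means it changes almost completely.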
The 25-year projected analysis shown in figure 5(B) suggests a slow and mixed urbanisation transition in the metropolitan region, with most wards only slightly reducing their dependence on local natural resources and a few moving in the opposite direction. The first outcome is more evident along the coast and in the closest outskirts of the city, while the second is seen in peri-urban areas and in the interior, confirming a divide between more and less urbanised areas. The reduction in dependence can be attributed to densification in formal areas.
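The population input behind this projection can be illustrated with a short sketch. It assumes that the 1.09% average annual rate is compounded over the 25 years from 2011 to 2036; the text does not state whether growth was compounded, and the function name is hypothetical.

```python
def growth_factor(annual_rate=0.0109, years=25):
    """Multiplicative change in population after compounding an annual rate."""
    return (1.0 + annual_rate) ** years


# Factor applied to a 2011 population (or density) to reach 2036.
factor_2036 = growth_factor()  # roughly 1.31, i.e. about 31% growth
```

Under this assumption, a ward's population, and its density if the ward area is unchanged, grows by roughly 31% over the projection period.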

Application of the model in data-scarce situations
As a preliminary test, we applied the machine learned model described previously using only two out of eight input variables: presence of informal settlements (E5) and population density (E7). These variables, producing the strongest influence on the model results, can easily be provided through alternative data sources, including remote sensing and open data (e.g. for informal settlements see Busgeeth et al 2008, Chakraborty et al 2015). The Bayesian model was run with vector files containing polygons for the informal settlements (provided by the regional authority for eThekwini and downloaded from an open data project on informal settlements, http://ismaps.org.za, for Cape Town). Population density data came from the global grid dataset provided by NASA (Gridded Population of the World: GPW, v4), referring to the year 2015. The model was run in two spatial contexts, eThekwini and Cape Town metropolitan regions (figure 6), at 100 m resolution, although population density is currently offered at a resolution of 30 arc-seconds (approximately 1 km at the equator). The latter is thus the main limiting factor in achieving finer-scale results. In each 100 m cell, the Bayesian network model simulates expected output using data-based evidence for the two mentioned variables and the prior probabilities learned from the previous steps for the remaining variables. Figure 6 highlights how the distinction among wards is lost in favour of a more evenly distributed outcome. The gradient of dependence is still visible, particularly for the eThekwini context where the data about informal settlements is more comprehensive. Unlike eThekwini, the metropolitan area of Cape Town is regarded as 99% urban.
Figure 3. Structure of the naive Bayesian network with all explanatory variables as input nodes connected to the output node. All nodes are discretised into 10 classes. Variable names and arrow thickness specified in table 4.
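The data-poor run described in this section supplies evidence for only two inputs. In a naive Bayesian network, one common way to handle this is to multiply the class prior by the likelihoods of the observed inputs only, so that unobserved inputs drop out of the calculation. The sketch below shows that reading; it is an assumption about how the partial evidence is used, and all class names, states and probabilities are hypothetical.

```python
def posterior(prior, likelihoods, evidence):
    """Posterior over output classes given evidence on a subset of inputs.

    prior:       {class: P(class)}
    likelihoods: {input: {class: {state: P(state | class)}}}
    evidence:    {input: observed state}; inputs not listed are ignored.
    """
    scores = {}
    for cls, p in prior.items():
        for name, state in evidence.items():
            p *= likelihoods[name][cls][state]
        scores[cls] = p
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


# Hypothetical two-class example with three inputs, one of them unobserved.
prior = {"low": 0.7, "high": 0.3}
likelihoods = {
    "informal_settlement": {"low": {"no": 0.9, "yes": 0.1},
                            "high": {"no": 0.4, "yes": 0.6}},
    "population_density": {"low": {"dense": 0.8, "sparse": 0.2},
                           "high": {"dense": 0.3, "sparse": 0.7}},
    "large_households": {"low": {"no": 0.6, "yes": 0.4},
                         "high": {"no": 0.3, "yes": 0.7}},
}
cell_evidence = {"informal_settlement": "yes", "population_density": "sparse"}
cell_posterior = posterior(prior, likelihoods, cell_evidence)
```

In this toy case the posterior probability of the "high" class for the cell is 0.9, and with no evidence at all the function simply returns the prior.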

Discussion
Given our results, we hypothesize that our data-driven model could be applied in other areas of South Africa or even in parts of the African continent, with fewer input variables, and still retain the ability to capture the dependence of households in metropolitan areas on natural resources. However, although the model provides reasonable indications in the eThekwini and Cape Town cases, its exportability to other cases requires further investigation. At the same time, the combination of machine learning techniques and heterogeneous typologies of data can be customized to local needs and provide useful indications in data-scarce situations. Nevertheless, we showed that a simple model using two variables is able to reasonably capture the spatial distribution of the gradient's extremes (respectively urban areas and natural areas) and some key hotspots of transition coinciding with the presence of informal settlements. It is important to note that how households in a particular geographic location depend directly on natural resources also depends on resource availability, climatic conditions and other contextual factors. While South Africa provides detailed demographic data at the ward level, this is not the norm in other parts of the continent (Chandy 2013, Randall and Coast 2015). A methodology capable of substituting or complementing incomplete or missing national survey data with freely available or less expensive information can improve the coverage of studies of natural resource dependence, a valuable outcome as most regions of the world lack such data (Karp et al 2015). Availability of such data is increasingly important for the governance of social-ecological systems, as countries and regions look for ways to achieve equitable and sustainable development.
a All input nodes and the output node of the network are discretised in 10 equal intervals.
b There were four incorrectly classified instances: a 20% class was classified as 10%, a 50% class as 40%, a 100% class as 80% and finally a 60% class as 100%.
Our study exemplifies the use and integration of multiple data sources to respond to multidimensional problems (Tallis et al 2012). More generally, this work points at a possible future where the role of census data changes from being the main source of information to supporting learning and validation of models that can be run with proxy inputs. The use of census data as a complement to other data has long been done for Small Area Estimation (… Yu 1994, Pfeffermann 2002). Other combinations of data include use of … it is becoming increasingly possible to assess changes over time at significantly higher frequencies. Integration technologies such as k.LAB (http://integratedmodelling.org) can reduce data preparation time and enable large-scale replication without additional costs by enabling rapid data and model access. Such advantages increase when the number of explanatory variables is small, as our study highlighted. With rapid assessment tools and easy web access to data and models, it becomes possible to envision monitoring services such as early warning systems, where specific structural transitions can be predicted based on continuously updated, remotely sensed data. In addition, this methodology could also be used by national statistics departments to track progress towards SDGs.
Our model has four main demographic driving variables (E2, E5, E7, E8), which overshadow the role of income. We expected income to be important, following studies (Shackleton et al 2008, Yang et al 2013) that highlight poverty as a significant determinant of dependence on natural resources. One explanation for this discrepancy is that South African households are shielded from much of the direct dependence because the government provides some free basic services, such as water, sanitation, waste removal and electricity, to low-income households who cannot afford them (Muller 2008, Bhorat et al 2012). Another explanation might involve the intricate connections between poverty and other variables (Unterhalter 2012, Djoudi et al 2016) such as gender, geographic positionality, ethnicity and immigration status. Additionally, poverty is also linked to and influenced by external drivers, path dependencies and cross-scale interactions (Haider et al 2018). As a result, its effects may be masked or embedded in other processes, especially those associated with power and with how systems of oppression and marginalisation play out in informal settlements (Rao et al 2017).
Our analysis shows how investigating dependence with a fine-grained approach (e.g. ward level versus fine-resolution grids) unveils variations not captured by census data aggregated by administrative unit. For example, both the eThekwini and the Cape Town metropolitan regions were previously classified as uniformly 'red loop', representing low direct dependence on local natural resources (Hamann et al 2015); our analysis shows variations in dependence, comprising low, high and transition loops. Within these areas we are now able to identify potential transition hotspots, crucial to understanding key social-ecological dynamics, such as inequality of access to natural resources and public infrastructure. Such areas are shared by people relying on direct harvest of natural resources, municipal services, or on a combination of both, with a mixture of green and red loops across households. This suggests that dependence on natural resources cannot be characterised in discrete groups, but rather along a gradient with underlying feedbacks as to when and how long natural resources are used. Our model is unable to determine which resources are preferred in space or time, a question that still warrants deeper analysis; nevertheless, identifying this gradient improves our ability to spot source areas of direct resource use. We argue that spatially explicit analysis based on verified criteria and finer-grained spatial data is an improvement over the use of data aggregated per administrative unit.
The lack of finer-scale temporal analysis is certainly an area in need of improvement (Karp et al 2015). Our attempt to deal with this limitation through projection of population growth rates does not suggest radical changes in natural resource dependence, which remains largely mixed (figure 5). It is possible that the level of dependence will persist for the foreseeable future (McHale et al 2013), as the importance of informality in the urban and peri-urban system is becoming a common characteristic of rapidly growing cities on the African continent, and generally in the Global South (Roy 2009, McFarlane 2012). Thus a key priority for the local authorities should be to maintain adequate local ecosystem health (e.g. by further preserving and/or expanding conservation areas like the D'MOSS) around the identified transition hotspots, so that the supply of ES can be maintained over time. We regard this as a robust policy recommendation even considering the assumption of constant population growth over time, which might be underplaying more complex dynamics to come. For example, dependence on local ecosystems is determined by more than population growth and may be mediated or substituted by municipal provision of services. Additionally, the urbanisation process is not linear and may even yield surprising and potentially positive outcomes (Sanderson et al 2018), as people move between cities and rural areas in a circular migration pattern (Collinson et al 2007, McHale et al 2013). Yet this finding does highlight the need for adequate holistic management processes that are mindful of both human and environmental needs, and suggests that natural resources within these systems need to be managed accordingly along the gradient.

Conclusion
Survey-driven social data such as those made available by national statistical offices are expensive to gather and are updated infrequently (usually in 10-year cycles). For example, much of the demographic data required to monitor key proxy indicators, such as poverty, remain difficult to access or too expensive to develop for many areas of the globe. Such information is likely to be patchy or inconsistent at a given spatial scale, or available for very few points in time over long intervals. Thus, efforts to complement, harmonize and validate inconsistent information and improve the spatial and temporal resolution of such data products by means of more easily obtained data are of paramount importance. In this study we show that machine learning techniques can contribute to filling knowledge gaps concerning the use of natural resources by human societies. For our context of study, we show that it is possible to model human dependence on local natural resources with few demographic variables (population density, intensity of informal settlements, immigration rate and proportion of large households) that can be produced at higher frequency and finer spatial resolution at minimum cost, by combining remote sensing with crowdsourced information. The methodology is particularly suited for monitoring programs in data-scarce areas of the Global South, as a means to include estimates of ES use in addition to well-developed methods to understand ES supply, and can be generalized to meet the needs of different sustainability-related issues. Fully understanding the spatial and temporal dynamics of human dependence on local natural resources is fundamental in order to target existing policies, or create new ones that can enable the maintenance of adequate local ecosystem health, even more so in rapidly urbanising contexts.