Modelling built-settlements between remotely-sensed observations

Mapping settlement extents at the annual time step has a wide variety of applications in demography, public health, sustainable development, and many other fields. Recently, while more multitemporal urban feature or human settlement datasets have become available, issues still exist in remotely-sensed imagery due to coverage, adverse atmospheric conditions, and expenses involved in producing such feature sets. These challenges make it difficult to increase temporal coverage while maintaining high fidelity in the spatial resolution. Here we demonstrate an interpolative and flexible modeling framework for producing annual builtsettlement extents. We use a combined technique of random forest and spatio-temporal dasymetric modeling with open source subnational data to produce annual 100m x 100m resolution binary settlement maps in four test countries of varying environmental and developmental contexts for test periods of five-year gaps. We find that in the majority of years, across all study areas, the model correctly identified between 85-99% of pixels that transition to built-settlement. Additionally, with few exceptions, the model substantially out performed a model that gave every pixel equal chance of transitioning to the category “built” in each year. This modelling framework shows strong promise for filling gaps in crosssectional urban feature datasets derived from remotely-sensed imagery, provide a base upon which to create future built/settlement extent projections, and further explore the relationships between built area and population dynamics.


INTRODUCTION
Having time series of regular and consistent observations of built settlement extents is important given that forecasted growth of populations within dense urban areas are expected to continue through 2050, with much of that increase will occur within Africa and Asia (1,2).
Further, rapidly changing magnitudes and distributions of both settlements and populations have significant implications for sustainability (3), climate change (4,5), and public health (6,7), amongst others.At local and regional levels, the availability (or non-availability) and accuracy of built-settlement extent data affect measured population distributions, densities, and built type (e.g.urban, peri-urban, and rural) used to inform and shape policies.The 2030 Agenda for Sustainable Development, which have a focus on accounting for and including "all people everywhere", reinforced the need for readily and globally available baseline data to guide efforts and measure progress toward its Sustainable Development Goals (SDGs) (8).
Urban has been defined in many ways across many fields with different definitions existing even within the same field depending upon the specific application.Many countries define urban as a function of some population magnitude/density threshold or based upon administrative jurisdictions and functional economic areas and activities (9,10).While not conducive to applications requiring global consistency in definitions (11), none of these definitions of the concept of urban are objectively wrong.Urban, whose formal yet vague language definition is "of, relating to, characteristic of, or constituting a city" ( 12) is a complex amalgamation of the physical environment, population, economics, movements, and connectivity (3,(13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30). Figure 1, Part A gives a generalized diagrammatic view of the factors contributing to the concept of urban.
Figure 1.Conceptual relations and definition of "built-settlement" (Part B) within the generalized concept of "urban" (Part A) and the broad, non-exhaustive contributing factors that make these concepts.
As a result, many studies have turned to a definition based upon the remotely-sensed physical features of urban areas, i.e. the built-environment.However, even reducing the definitional scope of urban to its physical dimension, the form of built-environment can widely vary across space and time due to materials used, differences in urban morphology, and the surrounding environmental context (18,31,32).Initially, remotely sensed urban definitions were optically-based thematic classifications of land cover, typically captured the "built-environment," including buildings, roads, runways, and, sometimes erroneously, bare soil (18,(33)(34)(35).Later improvements using supporting information about the surrounding environment and vegetation during post-processing helped discern the true built-environment from the surrounding land covers (18).Other notable advances include the use of high resolution orthographic imagery to detect subtle short-term built-environment change in China (36) and the use of Landsat imagery to create multi-temporal thematic representations of the built environment across the globe (37).
Coinciding with advances in imagery, statistical methods, and computational resource availability, high-resolution datasets with global extent have been created either through combining multi-source optical imagery with contrast detection methods (38,39) or utilizing Synthetic Aperture Radar (SAR) data with object-based image analysis to refine the capture of urban features, with a focus on vertical human-made structures, while attempting exclude other anthropogenic land covers (40).However, it remains a challenge to produce a consistent global urban feature product while maintaining high temporal and spatial fidelity, meaning most of the global multi-temporal urban feature data sets refer to few time points across a larger time period.Further, the resource cost of producing these data remains relatively high (41) and there can be pre-existing gaps in the input data, due to selected sensor/platform characteristics or problems and adverse atmospheric conditions, prior to the other fidelity considerations.While there is now a global abundance of high-resolution imagery, with various instruments and revisit times on various platforms, not every image is suitable for extracting urban features due to the aforementioned reasons and/or because the processing cost is either not economical for the global production of high frequency urban feature extractions or the funding for such endeavours is not available.
One way to address these issues is to leverage years where remotely-sensed urban feature extractions with high spatial fidelity are available and interpolate for missing time points and areas of interest by modelling between the coverage of remotely-sensed urban feature data.Overall, urban feature/built-environment growth models have disproportionately focused on high-income countries, which have different dynamics than low-and middleincome countries (1,2,21,42), and most have been limited to city or regionally specific models (42)(43)(44)(45)(46)(47)(48)(49)(50)(51).Previous methods of modelling urban feature/built-environment growth across space and time at the continental and global scales include land cover/land use transition models (52,53) and cellular automata models (50,54,55), with features or thematic classes extracted from remotely-sensed imagery being the primary source of cross-sectional input for these models (18,(38)(39)(40)56).Readers are referred to Li & Gong (57) and Sante et al. (50) for comprehensive reviews of the wide field of cellular automata models as applied to urban feature/built-environment growth modelling.
Of the few models predicting urban feature/built-environment growth across the globe within a standardized framework, almost none have explicit spatial prediction finer than country level summaries (20,21).Global models that did provide discrete spatial predictions, did not allow local sub-national variation to drive the modelled changes or had not been assessed against comparable existing datasets (20,42,58,59).Because the focus of this study is on modelling urban feature/built-environment extents that better represent where people may be located, we adopted the Global Human Settlement Layer (GHSL) concept of "builtsettlement" (BS), which is defined as, "…enclosed constructions above ground which are intended for the shelter of humans, animals, things or for the production of economic goods and that refer to any structure constructed or erected on its site."(38).We generalized the definition of built-up structure to include other urban feature datasets that attempt to refine their thematic or feature set to represent buildings associated with human activities while attempting to exclude more general impervious surfaces, such as roads, parking lots, and runways.With the adopted definition of BS, the analogue to the process of "urbanization", is taken within a remote sensing context to be the physical transition from a non-BS area to a BS area.
With these definitions in mind, we leveraged recent advances in multi-temporal global BS feature datasets, global environmental datasets, subnational census-based population data, and computational methods to develop a globally applicable and flexible modelling framework.Our specific objectives were to i) determine if random forests can reasonably predict non-BS to BS transitions, ii) use the random forest to create an automated globally applicable framework that can produce spatially explicit estimates of BS extent at annual steps using sub-nationally driven geospatial covariates and population counts, iii) validate the model performance and compare the model predictions to the "observed" data.

Study Areas and Data
We selected four countries (Table 1) from across the globe to capture a variety of BS morphologies, contexts, and evolutions as well as to demonstrate the flexibility of the model for differing spatial resolutions of input census-based population data, as measured by the average spatial resolution (60).The countries selected here were Panama, Switzerland, Uganda, and Vietnam, which exist in rather contrasting environmental/urban biomes (18) and topographies and represent quite different cultural and developmental contexts (from low-, middle-, to high-income.We also selected a number of covariates based upon previous urban feature/builtenvironment models (42,55,61) to give immediate environmental/land cover context and connectivity of areas to urban/built agglomerations to the model.Ultimately, the model is not dependent on any specific covariates, retaining a level of flexibility for use in a wide variety of applications.For example, we have also tested the model using a minimal set of globally available predictive covariates to provide input to other modelling efforts and avoid potential covariate conflict for end users.In the case presented here, covariates available for each year were either directly used in the model (i.e.esa_cls190 and DMSP night-time lights) or aggregated temporally (i.e., VIIRS night-time lights), while covariates available for just one point in time were used only if reasonably assumed to be time invariant (Table 2).All covariates were pre-processed, appropriately resampled to a common spatial grid having a resolution of 3 arc seconds; with the latter chosen as a compromise between the higher resolutions of some covariates (Table 2) and the coarser covariates such as the ESA land cover datasets.All data used to restrict the area of modelling and inform the redistribution of transitions are also detailed in Table 2. Further details on pre-processing of specific covariates are provided in the Supplemental Material.

Overview
Here we interpolated for every year between a set of timepoints, T = {t0, t1, t2, …, t1} where t0 is the initial observed timepoint, t1 is the final observed timepoint, and all other times t are points lying between t0 and t1 for which we had observed BS extents.The time between any two observed time points t is referred to as a period, p. Within this study, t0 is interpolated.We generalize the process to determine the timing and number of transitions for each time step independently for each subnational unit, hereafter unit, as follows: 1. Create a population map for all tobserved time points in T.
2. At all tobserved, for each unit, extract the time-specific population count within the time specific BS extents and derive the corresponding average BS population density.
3. On a unit-by-unit basis, interpolate the extracted BS population count between each tobserved using piecewise-fit logistic growth curves and BS population density by fitting natural cubic splines across all tobserved.
4. Estimate expected unit, time-step-specific, total BS extent area, in number of pixels, and the expected number of transitions for that time-step based upon predicted unitspecific total BS population and BS population density.
5. For every unit, adjust the expected transitions by the sum of all expected transitions across the given period, e.g.2000-2005, and divide by the total observed changes, e.g.

BS extents minus 2000 BS extents.
Repeat for all periods.
6.For each unit, use the adjusted predicted transitions from step five as relative weights within a given unit to dasymetrically redistribute observed transitions from the larger source period to smaller individual time-steps, i.e. years, while maintaining the original number of transitions when all time-steps are summed.Repeat for all periods.
To spatially disaggregate the predicted transitions, we first train a Random Forest (RF) model ( 76) to produce a continuous surface representing the probability of a given grid pixel transitioning from non-BS to BS between t0 and t1.For every time-step, processing each subnational unit independently, we utilized unit normalized lagged lights-at-night (LAN) data to annually adjust the base RF-derived transition probabilities.The assumption behind this being that areas that underwent the largest increase in brightness, relative to the rest of the unit in a single time-step, have a higher probability of transitioning and vice versa.
From the pixels that were known to have transitioned, as indicated in the input BS data, we selected pixels with the n th highest probabilities for transition, where n was equal to the number of pixels predicted to transition for that time-step.We then converted those pixels to BS, recorded the new BS extents, and used those extents as the basis for the next time-step of transitions.This resulted in a series of regularly spaced time-specific binary spatial predictions of the BS extents in raster format.A full process diagram is shown in Figure 1.
All modelling and analyses were carried out using R 3.4.2(77) on the IRIDIS 4 highperformance computing cluster (see Supplemental Material for code).

Random Forest (RF) Estimation of the Probability of Transition
Given that the binary dataset of transition/non-transition constitutes an intrinsic "imbalanced set" (78), i.e. there are many more non-transitions than transitions; we adopted a stratified random over/under-sampling method (78), similar to (42), as follows: (i) randomly sample 80 percent of the pixels that transitioned, up to 50,000 and, (ii) randomly sample an equal number of pixels that did not undergo transition.The choice for equal sampling of each stratum was determined by testing different relative proportions and samples sizes until finding the most consistent and best model results, balancing performance and efficiency.
To create a probability of transition surface for the complex and non-linear phenomenon of BS transition, we utilized a RF model to accurately and efficiently model across an entire country at the pixel level (i.e. 3 arc second spatial resolution).A RF model was selected for its robustness to noise, its automatability and efficiency, and its ability to capture non-linear and complex interactions (76).Further, in at least one study, RFs have been shown to perform equally, if not better, than other methods, such as support vector machines and logistic regression, when predicting the probability of transitioning from nonbuilt-environment to built-environment (79).
We trained the classification RF on whether a pixel had transitioned between time t0 and t1 (1) or not transitioned (0) against the corresponding values of covariates at time t0.We used the constructed RF to predict for the entire given country.Rather than accepting the default output of the RF classifier, which outputs a single predicted class as indicated by the majority of the predictions of its individual constituent trees, we wanted a continuous, 0.00 to 1.00, probability of transitioning in to discriminate between high and low probabilities.Given that we trained the RF as a binary classifier, we took the mean of the individual tree predictions for each pixel.This class probability has a value between 0.00 and 1.00 represents the posterior probability of a pixel being classified by the RF as transitioning between t0 and t1 (76).

Population Mapping of Endpoints
To interpolate, we first needed a spatially-explicit best estimate of the subnational unit specific BS population at all observed timepoints in the modelling period.To get this we created a population surface using the available time-specific and, assumed, time-invariant covariates (see Supplemental Material, Table A2) using the WorldPop RF method, described in Stevens et al. (80), to dasymetrically redistribute the time-specific population totals from the subnational unit level to 3 arc second grid pixels (80)(81)(82).
For any given time point in the population modelling, we included the distance to nearest BS edge for the t0 timepoint, i.e. 2000, as population relates to older parts of a BS agglomeration differently from newer ones (83).For example, if we were to model the population map of 2010 we would include the distance to nearest observed BS edge for 2010 as one of the predictive covariates as well as the distance to nearest BS edge corresponding to the observed 2000 BS extents.This was done to avoid centres of agglomerations being assigned artificially low population densities relative to the preceding modelled time point (83).For each subnational unit, we then extracted and summed the total populations that were spatially coincident with the BS extents and derived the corresponding BS population density for use in the BSGM predictive phases.

Transition Magnitude Estimation
To estimate the number of transitions for each time-step within the study period, we used the predicted BS population changes and the predicted changes in BS population density for every subnational unit.We first interpolated the BS population count of each unit i for every year, BSPOP(t), by fitting logistic growth curves (84), in a piecewise manner, i.e. for each time-period between observed points, using the year-specific total population, Ki(t), as the varying carrying capacity (85) as shown in Equation 1Preprints Sibly and Hone (90) note, "While environmental stressors have negative effects on population growth rate, the same is true of population density, the case of negative linear effects corresponding to the well-known logistic equation."as put forth first by Verhult (91) in 1838.Furthermore, Ledent (28) showed that urbanization, the process of population becoming urban, across time can be adequately summarized by "S-shaped curves", specifically the functional logistic form.We followed this same underlying conceptual logic using the above logistic curve with a dynamic limiting factor (Equation 1), i.e. the total population of an area is the theoretical limit of the temporally coincident BS population count.
Then, for each subnational unit, we interpolated its average BS population density across all observed time points in the study period using natural cubic splines (92) and the observed points as the knots, i.e. for ESA 2000, 2005, 2010, and 2015.Because there is a lack of literature and data on the actual or theoretical limits of human population density, we selected cubic splines, which have a long history in demographic interpolation, including interpolation of rates associated with urbanization processes (28,92).We are assuming that the trend of population density change of a given subnational unit is smooth and continuous across time, because short of some drastic (and unlikely and largely unaccountable "population shocks" such as wars or natural catastrophes) event, at the annual time scale we would not expect to see "cliffs" of population density change.With a piecewise linear interpolation, we would see such sharp turns in rates of change.With our priority being that the fit curve would match values of our observations, the rationale behind non-parametric smoothing is to employ un-parametrized functions (i.e.splines) that adapt to the data, rather than adapting the data to a given distribution, (i.e. a parametric approach).We did not choose polynomial interpolation or a higher degree spline to avoid Runge's Phenomena where, as data is added, the derivatives at each data point increases, resulting in large oscillations of rates of change between data points.
Using the interpolated subnational unit BS population and the interpolated subnational unit-average BS population density, we simply relate population and population density to get the expected number of transitions in Equation 3  () =   ()   () where BSDi(t) is the unit, i, average BS population density at time t.See Supplemental Materials for how predicted "negative growth" resulting from Equations 1-3 was handled.

2.2.5Dasymetric Redistribution of Transitions Across Time
The number of predicted transitions derived from Equations 1-3 are not inherently constrained by the observed transitions between any two observed time points.To match the total number of observed transitions for the modelled period, we reweighted the transitions of each time-step on a unit-by-unit basis.This is essentially a temporal dasymetric redistribution of transitions from the larger source period, e.g.2000-2005, to the smaller target periods of t, e.g.2001,2002, etc., based upon temporal information contained in the time-specific unit population totals.To calculate the weight for each time-step, wt, we write the calculation in Equation 4 as: where t is again relative to the given period from 1 to the last year k and all wt for a given unit i sum to one.To obtain the temporally weighted transitions,   ̂, we multiplied the weight of each year by the observed number of transitions in Equation 5, rounding to the nearest whole number for each year (see Supplemental Materials, section A4 for obtaining agreement with rounding differences).

𝐵𝑆𝐶𝑁𝑇 𝑡𝑖 ̂= 𝑟𝑜𝑢𝑛𝑑(𝑤 𝑡𝑖 * ∆𝐵𝑆𝐶𝑁𝑇 𝑖 ) [Eq. 5]
Where ∆  is the number of observed transitions, in pixels, from non-BS to BS for a given unit i.This allows the model to maintain agreement of transitions between the points that we interpolated.

Spatially Disaggregating Transitions Using Annual Unit-specific LAN-weights
For each period, we then processed the tabular predicted transitions into time-specific BS extent maps, i.e. spatially allocated the transitions within each subnational unit for each time-step.We spatially assigned the transitions within each unit using the RF-derived transition probability surface adjusted by time-specific weights in the form of subnational unit normalized LAN brightness differences, i.e. one time-step lags (see Supplemental Material).Given that the non-BS to BS transition process is iterative in nature, we began by taking the extents of the previous time-step, or the previous observed extents if t was equal to one.We limited the location(s) where transitions could be allocated within the subnational unit to pixels where transitions were observed to have occurred, as defined by the input BS data.For every one of those locations, j, assuming they weren't transitioned in previous steps, we retrieved the transition probability as calculated in the transition probability surface.We took this base probability of transition for every pixel j and adjusted it by the spatially coincident lagged and weighted LAN data, for time-step t, using Equation 6: where ()  is the RF-derived transition probability where transition was observed,   is the corresponding adjusted LAN difference observed for the time-step t, and   ()  is the adjusted probability of transition for the time-step t in a given pixel j within the administrative unit i.Similar to previous models (42,53), we assumed pixels with a higher probability of transition are more likely to transition before pixels with lower probabilities.We selected the n th highest probabilities from the subset of potential transition pixels, where n was equal to   ̂, changed the value of those selected pixels to represent a transition to BS, and output the union of the new transitions and previous BS extents as the predicted BS extents for that time-step.We repeated this procedure using the newly produced extents for the preceding time-step as the base BS extent for the next timestep's transition procedure, until all time-steps for the period were processed and then the entire procedure was repeated until all periods had been processed.

Validation and Comparison Metrics
While the RF produces its own validation estimates (76), we tested the accuracy of the RF classifier by randomly sampling 100,000 pixels, not utilized in the training of the RF, for validation.We selected this sample size as we were able to obtain sample prevalence rates equal to the known true prevalence rates of each country while still maintaining efficiency.
Based on this sample, we plotted receiver operator curves (ROCs) and, given the imbalanced data (93,94), precision recall curves (PRCs) with simulated perfect and random classifier curves for comparison.
Here, for the ESA models, we are comparing the predicted extents to all withheld extents between 2000 and 2015.For every year of prediction, we determined whether a pixel was True Positive, False Positive, False Negative, or True Negative, TP, FP, FN, TN, respectively.Pixels used for validation of the modeled outcomes were limited only to pixels having been observed transitioning from non-BS to BS between the modeled periods for two related reasons: to no worse than the input data, although temporal uncertainty and the spatial configuration of transitions for any specific year between those periods remained.
Essentially the same limiting of uncertainty seen with "volume preserving" spatial only dasymetric mapping, but extended to the temporal component as well as indicated in Mennis and Hultgren (82).
2) Given that we masked our predictions to only pixels we knew transitioned, if we were to have included pixels that we knew not to have transitioned, we would have grossly and erroneously inflated the error metrics as they would have had no chance of disagreeing.
Using the validation data for comparison, we calculated contingency table-based metrics to evaluate classification accuracy based primarily on the F1 score (Table 3) which is the harmonic mean of recall and precision, the quantity disagreement (95), and the allocation disagreement (96).We aggregated the pixel level results, to the unit level and calculated the same metrics since precision, and by extension F1, is sensitive to the corresponding prevalence and is subject to the modifiable areal unit problem (MAUP) (97).The MAUP not only reduces variance in value distributions the more the data are aggregated from their original resolution ( 97), but will result in different prevalences with different subnational area, i.e. zonal, configurations.The equations of the metrics calculated are listed in Table 3.As suggested by Pontius,Shusas,and McEachern (99), to assess whether the model is adding any additional predictive ability , we constructed a naive (basic) model that randomly assigns the transitions to a year within the given period, with every year having an equal likelihood, and carried out predictions for each year within pixels that were known to have transitioned for comparability with our model.Again, we determined for each pixel whether it was a TP, FP, FN, or TN and calculated metrics to compare the BSGM and the naive model at the across each country at the pixel level, and at the unit level.The naive model was bootstrapped 500 times based upon resource limits and prediction stability, for each year and was specific to each country.

RESULTS
Across all study areas, two-thirds of the modelled years correctly predicted between 85-99 percent of transition pixels.For all years, again at the pixel level, the BSGM displayed low quantity and allocation disagreement in both absolute and relative terms.Similarly, the pixel level F1 score, with few exceptions, was higher than the naive model, yet had more variance in absolute terms of performance.Comparable results were found at the unit level, with particularly good results in the middle and later years of the study period.

RF Performance
The ROC plots in Figure 2 show that the RFs approach the performance of the theoretical perfect model.However, given the imbalanced data, the PRC plots (Figure 2) show a more nuanced picture of performance where a maximum level of precision is quickly achieved, remains steady up to a certain value of recall that varies by study area, and then precision quickly decreases with increasing recall.Of the covariates informing the RFs of the models, we consistently saw that the most important predictors of a pixel transitioning from non-BS to BS were covariates related to distance ("esa_cls190_dst_2000") and local density ("esa_cls190_prp_5_2000", "esa_cls190_prp_10_2000", and "esa_cls190_prp_15_2000") of BS established at the beginning of the overall study period, i.e. 2000, and connectivity of BS extents at the beginning ("tt50k_2000") or end ("urbanaccessibility_2015" and "osmroa_dst") of the study period (Figure 3).  2 for covariate names.

Pixel Level Results
Examining the proportion of pixels known to transition that were predicted correctly in Table 4, we show that out of 48 model years predicted, 39 of those years had correctly predicted proportions between 0.80 and 0.99 (green) with 25 of them having proportions over 0.90.The ESA based modelled years ranged from 0.57 to 0.99 of pixels predicted correctly (Table 4).Note that one minus the proportion correct is equal to the total disagreement of the predicted pixels, i.e. the sum of the quantity and allocation disagreement (96).Examining the year-specific study area F1 scores in Figure 4  Examining the source of the disagreement further, we display the observed quantity and allocation disagreement as well as the corresponding disagreements under the naive model in Figure 5.We show that for all modelled years using the ESA data, the total disagreement is substantially less than that of the naive model and predominantly, the disagreement produced by the BSGM model is predominantly due to allocation error (Figure 5).However, there does appear to be a pattern of increasing disagreement due to quantity error after 2010.Full annual contingency data and metrics in supplementary material

Subnational Unit Level Results
Overall, at the subnational unit level, we found results similar to the pixel-level results, including poor performance in absolute terms between 2001 to 2003, but some units were obviously performing worse than others as compared to the naive model.Plotting the ESA-informed model distributions of unit-level F1 scores by study area and year against the corresponding naive model performance, we show that the BSGM generally performs better in the majority of subnational units from which the transitions were disaggregated from (Figure 6).At worst, e.g.Vietnam 2002, approximately half of the units were still performing better than the naive model (Figure 6).For quantity disagreement (Figure 7) and allocation disagreement (Figure 8), results similar to pixel level results were found.Plotting the unit-level metrics for all models as choropleth maps (see Supplementary Material for select maps and shape files containing contingency data), shows that years of generally good performance, the units of lesser performance are those that correspond to areas of less densely settled areas and the peripheries of established urban areas.Other years, such as Uganda 2001, performed poorly across many units with no apparent pattern.Identical analyses for the WSF Evolution data are given in the supplementary materials as well.
Additionally, for each country the GeoTiffs showing the BS extents at 100m resolution for every predicted year are provided in the Supplementary Materials.

DISCUSSION
The 2030 Agenda for Sustainable Development and its SDGs, have reinforced the importance of data to being able to account for "all people everywhere (8).Differences in the dynamic spatial distributions of hazards (100,101), the spatial variation of the effects of climate change (5,102,103), spatially allocating services to ensure sufficient coverage (104,105) However, the BSGM is neither without error nor a total replacement for urban feature extractions.Given that the BSGM is interpolative, its predictions are limited by the accuracy, the spatial quality, and the temporal quality of its input urban feature dataset, the input time-specific subnational population data, and the input population surfaces.For example, the poorer model performance seen from 2001 through 2003 (Figures 4-8) are likely due to the fact the ESA data did not delineate changes, detected at 30 arc sec resolution, at 10 arc second resolution due to the MERIS and PROBA V imagery not being available (75).
With regards to total disagreement of the model (Figure 5), the relatively high contribution of allocation disagreement prior to circa 2010 and corresponding decrease in contribution post-2010 is likely due to the switch from using coarser DMSP-based LAN data to VIIRS-based LAN data at the 2012 time point.
Further, models are limited by their conceptual and mathematical assumptions.In this framework, we are assuming a certain relationship between relative population and population density changes and drive demand for temporally coincident settlement growth.This is not to say the assumed relationship is correct; here we assume BS population grows logistically with a time varying capacity that is temporally coincident, i.e. not lagged, and we assume BS population density follows a natural cubic spline across all observed points.This is further predicated upon the assumption that the BS growth is strongly correlated by changes in population and or population density and the resulting demand is instantaneously filled as opposed to being delayed temporally.While there is support for population change being an empirical and theoretical driver of urban/BS growth (20)(21)(22)42,58), there is also evidence for the use of other covariates, not used here because of their unavailability at subnational levels globally through time, such as Gross Domestic Product and arable land per capita (20,21).Furthermore, there are other "intangibles" such as local, regional, and national land use or development policies, which almost certainly shape BS growth, but are typically not in an accessible format, if available at all.The value of using population data to predict growth of settlement, shown here in a semi-independent model framework, and the value of using urban/settlement feature data sets to predict population distribution (108), raises the question of whether within a strictly predictive modelling framework (109,110), i.e. no inference on causality and measuring effect, if it is worthwhile or proper to try to fully separate population and settlement given their reciprocal causal links via economics, i.e.
economics drives population distribution and built-settlement which can then feed back into economics.
The modelling assumptions here are preceded by the assumption of exponential interpolation of annual population totals by the GPWv4-based data (66) and that the RFinformed dasymetric redistribution of those population totals are correctly locating the population in a manner which leads to correct BS population estimates for the BSGM to utilize.We already know that the RF population model tends to underestimate populations in BS and overestimate populations outside of urbanized areas (80).Since the BSGM allocates transitions based upon relative changes in BS population, this last point should not affect prediction timings, assuming the RF-informed population modelling, or any spatial population modelling approach, biases are consistent between the two times.Alternatively, any spatially explicit population datasets can be used as inputs for the BSGM, even the base GPWv4, removing the need to use a modelled population input.As with any "model built upon models," producers and users of such data must be cautious of accumulated data error, however, given the results here, error is relatively low for most predicted years, although some undoubtedly originates due to accumulated errors.
With any area-based metric the Modifiable Areal Unit Problem (97) must be considered, as the total number of pixels in each unit is typically larger in the less settled areas resulting in less variation of aggregated metric values in those areas.With dasymetric redistribution methods, the size and configuration of the source units, spatially or temporally, can also affect the quality of the disaggregation with the larger relative differences between source unit and target unit sizes introducing more uncertainty to the output (81,82).As the concept of urban growth, regardless of the chosen representation of urban, can be thought of as an incremental process, with future outcome dependent upon previous growth, the gridded outputs of the BSGM can be aggregated across years to decrease uncertainty of the interpolated extents should annual datasets not be needed.
Unfortunately, for both the use of logistic curves to interpolate between estimates of built-settlement population data and cubic splines to interpolate between estimates of builtsettlement population density, independent data does not exist to evaluate the error or uncertainty of these interpolated values largely because of absolute data availability.Further, even if such urban population and urban population density data existed, because of the differing definitions of urban (2,111,112), most probably we would have had a definitional disagreement with our adopted BS definition (unless these data came with corresponding spatial extents).
This aside, we also cannot calculate uncertainty of these curves because they are nonparametric growth curves or simple fitted splines that are not conducting any statistical inferences.However, we are not actually using the interpolated values of BS population and BS population density as the predicted outcome of interest, but rather to derive estimated counts of non-BS to BS pixel transitions that are then used as relative weights for the spatial disaggregation of the actual observed transition counts across time.This dasymetric disaggregation by weights precludes the propagation of any uncertainties calculated before the disaggregation step, a well-known characteristic of dasymetric methods, limiting us to only measuring absolute error of the final transitions as we have done here.

CONCLUSIONS
Here we introduced and validated a flexible modelling framework for globally modelling BS between observed time points, with 39 of 48 validated years having over 80 percent agreement and 25 of those years having over 90 percent agreement (Table 4).As global urban feature dataset producers such as ESA, MAUPP, GHSL, GUF and others continue to improve and release datasets with higher temporal resolution, models such as the BSGM will likely still have utility due to imagery/extraction issues and a need to smooth or fill-in time periods where difficulty was experienced (39,40,62,113).By the time annual urban feature extractions from currently available imagery have become an economical means of filling gaps and the current demand for annual global urban extent datasets, eventually becoming the standard, there will have grown a demand for quarterly extents , monthly extents, and so on.Until such a time, globally consistent feature extractions and an interpolative model can attempt to fill the need, data permitting.This is not to say that interpolative models and imagery-extracted features are oppositional, but rather are complementary.Should the time come where high-resolution global annual feature datasets become the norm, this would offer a wealth of information from which to improve the model assumptions the BSGM currently makes.Further, the predictions of the BSGM could serve as a comparative check in the production of future urban feature extractions and a platform to explore population and BS dynamics.
As informative as global urban feature extraction datasets are, imagery will never see into the future and we plan on extending this framework to allow for short-term projection of BS growth, both in a predictive manner as well as allowing scenarios to be input.We found that the primary predictors of growth BS extents were related to connectivity, i.e. road networks, and local, i.e. ~0.5-1.5km,settlement density (Figure 3) both giving support to work in attempting to define "urban" base on contiguity, connectivity, and spatial density (114)(115)(116) and implying that investment in detecting/simulating new road network data would be beneficial to better predicting urban feature extents.Still mostly unknown is how the BSGM would perform for smaller settlements, not captured by coarser datasets such as the ESA land cover, and we are looking to test this with forthcoming feature data sets with resolutions below 3 arc seconds.Lastly, we plan to validate the utility of these dataset in an applied manner by comparing the effects of including the BSGM derived extents in annual population modelling.
Here we have presented an open source framework for interpolating a binary BS or urban feature dataset using a limited covariate dataset that can be used to further population mapping by filling a current gap in annual global urban feature datasets.This framework is scalable globally, but also allows for sub-study area variation in transition probability, population changes, and lights-at-night changes to drive the overall study area model.

2. 1 . 1 .
Built-Settlement DataWe chose to use the "Urban" thematic class, class 190, from the annual ESA CCI land cover dataset (hereafter ESA) for our study.It was selected for its annual coverage from 1992 to 2015, allowing for the exclusion of years in the modelling process for validation of outputs.For our period of interest, 2000 to 2015, the ESA data creates annual 10 arc sec resolution (~300m at Equator) datasets by looking for thematic class changes from a baseline land cover map, produced using MERIS imagery, using 30 arc second (~1 km at the Equator) SPOT VGT imagery (1999-2013) and PROBA-V imagery (2014-2015)(75).Prior to 2004, detected changes are delineated at 30 arc second resolution.Starting in 2004, if there are changes detected, then the individual pixels of change detected at 30 arc second are further delineated using 10 arc second MERIS or PROBA-V imagery(75).To reduce false detections, changes must be observed over two years or more(75).Furthermore, the Global Human Settlement Layer (GHSL)(38,39) and Global Urban Footprint (GUF)(40) datasets are utilized in defining the extents of the ESA Urban class(75), which thus incorporate elements of two BS datasets within the larger built-environment context.While still undergoing full validation, initial validation efforts estimate the 2015 Urban class user and producer accuracies between 86-88 percent and 51-60 percent, respectively(75).We also test and validate a single year from an alpha version of forthcoming multi-temporal World Settlement Footprint (WSF) dataset, known as WSF Evolution(41), and present the results in the Supplementary Material.

2. 1 . 2
Population Data Annual population estimates from 2000 to 2020 for subnational areas were provided by the Center for International Earth Science Information Network (CIESIN) in tabular format with unique IDs corresponding to a consistent grid containing the corresponding unique subnational unit IDs (66).Populations and the areas of subnational units are based upon the Gridded Population of the World, version 4 (GPWv4) and as such follow the methods detailed in Doxsey-Whitfield et al. (66) for the interpolation of population between years of official counts or estimates.

2000, t1 is
2015, and we also have observed time points at 2005 and 2010, however the framework can handle any regularly spaced intra-period time-step if the input data corresponds.Therefore, in this study, for the ESA informed models we are carrying out modelling on three time periods, 2000-2005, 2005-2010, 2010-2015, for a total of 12 years

Figure 2 .
Figure 2. Overview of the generalized modelling process for a case of only two observed timepoints, t0 and t1, with references to utilized equations.

1 )
Being an interpolative model, we constrained the areas of possible transition predictions to only the areas of observed transition.This limited the spatial uncertainty of the model between 2000 to 2005, 2005 to 2010, and 2010 to 2015

Figure 2 .
Figure 2. Receiver Operator Curve (left plots) and Precision Recall Curves (right plots) with the RF model performance, blue lines, against a random model, red lines, and a perfect model, green lines, for each modelled country and input dataset.

Figure 3 .
Figure 3. Random forest covariate importance as measured by the average log decrease in the Gini impurity when the covariate is used as the splitting criteria at nodes; higher values indicate better performance of covariate.Model for Swizerland (CHE) ESA, Panama (PAN) ESA, Uganda (UGA) ESA, and Vietnam (VNM) ESA, are shown.Refer toTable 2 for covariate names.
, we show that the BSGM had low performance between 2001 and 2003, in absolute terms, and relatively near naive model performance, across all ESA study areas.The F1 score for all modelled years after 2003 increases to quite good performance, with values approaching 1.0 in some cases, and the difference between the BSGM and naive model performance increases even more dramatically (Figure 4).

Figure 4 .
Figure 4. Pixel-level F1 score by year for Switzerland (CHE), Panama (PAN), Uganda (UGA), and Vietnam (VNM) as compared to a naive model.Full annual contingency data and metrics in supplementary material

Figure 5 .
Figure 5. Pixel-level quantity and allocation disagreement of BSGM and naive models for Switzerland (CHE), Panama (PAN), Uganda (UGA), and Vietnam (VNM) as compared to a naive model, given in red.Full annual contingency data and metrics in supplementary material

Figure 6 .
Figure 6.Unit level F1 score box plots, by dasymetric period, of Switzerland (CHE), Panama (PAN), Uganda(UGA), and Vietnam (VNM) ESA informed models as compared to a naive model, given by a red "x".Number of units exhibiting any transitions for each period and a defined metric value is given above the x-axis.

Figure 7 .
Figure 7. Unit level quantity disagreement box plots, by dasymetric period, of Switzerland (CHE), Panama (PAN), Uganda (UGA), and Vietnam (VNM) ESA informed models as compared to a naive model, given by a red "x".Number of units exhibiting any transitions for each period and a defined metric value is given above the x-axis.

Figure 8 .
Figure 8. Unit level allocation disagreement box plots, by dasymetric period, of Switzerland (CHE), Panama (PAN), Uganda (UGA), and Vietnam (VNM) ESA informed models as compared to a naive model, given by a red "x".Number of units exhibiting any transitions for each period and a defined metric value is given above the x-axis.

Table 1 .
(60)ary Average spatial resolution is the square root of the average subnational area, in km 2 , and can be thought of as analogous to pixel resolution with smaller values indicating finer areal data and vice versa(60) of built-settlement transition data by country and period.Areal units here are pixels (~100m) as that is the unit handled by the model which looks at relative areal changes as opposed to absolute areal changes.a

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 8 April 2019 doi:10.20944/preprints201812.0250.v2
Covariates involved in Demand Quantification were used to determine the demand for non-BS to BS transitions at the subnational unit level for every given year.Covariates involved in Spatial Allocation were either used as predictive covariates in the random forest calculated probabilities of transition or as a post-random forest year specific weight on those probabilities and the spatial allocation of transitions within each given unit area.Cov ariates used as restrictive masks prevented transitions from being allocated to these areas.c The binary BS data utilized 2000, 2005, 2010, and 2015 as observed points in the dasymetric modelling process, but only derived covariates for 2000 were utilized in the random forest as predictive covariates

Table 3 .
Classification agreement metrics utilized and a brief description of each.The F1-score is interpreted as the harmonic mean of precision and recall.TP is "True Positive", FP is "False Positive", FN is "False

Table 4 .
Proportion of transition pixels predicted correctly by the BSGM by year.Note that 1the proportion correct is equal to the overall disagreement, i.e. the sum of the quantity and allocation disagreement.Modelled years with proportions greater than or equal to 0.80 are highlighted in green.
(8)6,107)geting interventions and planning(106,107)based upon local context with limited resources requires more accurate mapping of BS and mapping of populations, both large and small(8).Here we have shown a flexible modelling framework constructed from open-source methods and covariates to produce a framework that can be scaled to global extent across a variety of study areas and input data.This model approximates patterns of BS growth through time with good agreement for most years at the pixel and the units used in disaggregation (Table4, Figures4 and 6).Here we have shown the BSGM framework is capable of filling gaps of imagery-derived urban feature datasets by estimating the extents in between observations based upon relative changes in BS population and BS population density combined with environmental covariates.This emphasizes the strength of using an interpolative model, such as BSGM, as opposed to more imagery dependent annual feature extraction methods that may encounter adverse atmospheric conditions, limited sensor revisits, or the need for more resource intensive imagery-based interpolation or extraction Further, the model can be adapted to run at other scales, both spatially and temporally, either by modifying the provided code (See Supplementary Material) or in many cases, simply by modifying the input data.Annual global interpolated datasets from 2000 to 2014 based on GHSL/ESA/GUF input datasets, produced with an early version of this model and a reduced set of covariates, is freely available on the WorldPop website (worldpop.org)with the up to date model code used here provided in the supplementary material.