Geospatial Analysis of Displacement in Afghanistan

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


Introduction
An estimated 5.5 million people were internally displaced in Afghanistan in the summer of 2021, according to the International Organization for Migration (IOM, 2021). The 2020 Global Report on Internal Displacement ranks Afghanistan among the five countries with the highest levels of internal displacement due to conflict and violence, and first for the number of Internally Displaced Persons (IDPs) due to natural disasters (IDMC, 2020a, 11-12). Displacement is geographically uneven, with provinces such as Herat, Helmand, Badghis, and Kandahar hosting many IDPs (Figure 1). Since 2017, between 350,000 and 480,000 people have been displaced each year by ongoing conflict and violence (UNOCHA, 2020; UNOCHA, 2021). A developing drought that threatens millions with food insecurity compounds an already precarious political context. At the same time, fragile contexts like Afghanistan have relatively scarce on-the-ground data because of constraints on data collection. This paper explores how remote sensing data, particularly geospatial data, can be leveraged to monitor displacement flows in Afghanistan. It seeks to draw lessons from the 2018 drought, which up to that point was considered one of the worst in decades. Between mid-2017 and mid-2018, Afghanistan recorded a 70 percent precipitation deficit and insufficient snowfall, an important source of irrigation water (Oxfam, 2018). The 2018 drought was dubbed the worst in two decades and affected 22 of the country's 34 provinces. About 13.5 million Afghans faced heightened food insecurity in 2018, with lost livelihoods, livestock deaths, and forced displacement into urban areas (FAO, 2019). An estimated 371,000 people left their homes, including 120,000 in Badghis province, about a quarter of its population. They congregated in the provincial capital, Qala-e-Naw, to receive emergency humanitarian assistance.
To identify patterns of displacement, the team combined displacement data from the IOM Displacement Tracking Matrix (DTM) with Nighttime Lights (NTL) data obtained from NOAA's VIIRS sensor. This exploratory study focuses on capturing displacement in and around Badghis. The results suggest that cumulative displacement over 2018-2020 can be proxied by trends in NTL imagery. Settlements with higher net inflows of displaced persons between 2018 and 2020 have larger NTL growth than others, and the results hold under additional robustness checks. Allowing for non-linearity suggests diminishing returns to displacement flows: settlements with the largest NTL expansion are those with the lowest displacement inflows.
The model uses NTL data to predict whether a settlement was a net receiver of displacement flows in 2018-2020, with promising results. The paper uses a fixed effects model to predict the total net inflow of displaced persons and to estimate whether a settlement had a positive or negative net inflow, that is, whether it was an inbound area or an outbound area that people fled. We find a significant and concave relationship between average NTL and the cumulative net inflow of displaced population over 2018-2020. This reverse model correctly classifies 63.2 percent of settlements, a promising baseline that could be improved with more training data.
This study provides a proof of concept to test whether population displacement can be proxied by variations in geospatial data. It contributes to the literature on geospatial data and displacement by combining remote sensing data and IOM administrative data in Afghanistan, a country that has experienced large-scale displacement and where security and accessibility constraints make ground-truth data sparse. Geospatial data offer a unique opportunity to track the evolution of indicators in hard-to-reach areas through time series of satellite images. With this work, we aim to complement the literature on displacement by showing how to proxy for displacement flows where data are inconsistent and hard to collect. The methodology can be scaled up and tailored to the country level to monitor the effects of drought on displacement.
The rest of the paper is organized as follows. Section 2 summarizes the literature on geospatial analysis to track displacement. Section 3 outlines the datasets leveraged for the analysis. Section 4 presents the analytical results and robustness checks. Section 5 illustrates how these results can be used to predict displacement levels based on geospatial data alone. Section 6 concludes.

Literature Review
A growing strand of literature seeks to track displacement using satellite imagery. Our work on displacement flows and NTL echoes the seminal work of Giada et al. (2003), which used satellite images of refugee settlements to estimate refugee populations. Improved remote sensing has also enabled the identification of new refugee settlements (UNITAR, 2011) and can provide detailed maps of settlements (Wang et al., 2015; Pelizari et al., 2018). Reviews of research on remote sensing in conflict and human rights work include Marx & Goward (2013), Witmer (2015), and Quinn et al. (2018).
Our study directly complements existing humanitarian work using nighttime lights (NTL) imagery to monitor displacement crises. Li and Li (2014) explored the spatial and temporal patterns of nighttime lights in the Syrian Arab Republic and found a moderate correlation between NTL loss and the number of IDPs in each province. Witmer and O'Loughlin (2011) analyzed fluctuations in the NTL levels of cities in the Caucasus region of the Russian Federation and Georgia between 1992 and 2009 to detect conflict-related events, including large population flows. Their findings confirmed that such satellite data can detect large displacement movements. Quinn et al. (2018) showed that machine learning algorithms can return precise estimates of forcibly displaced people and of settlement structures. They demonstrated how machine learning applied to satellite imagery can support humanitarian operations by providing objective assessments of the situation in the aftermath of a natural disaster.
Humanitarian agencies have also explored alternative indicators, notably the destruction of houses, to estimate the size and duration of migration. The Internal Displacement Monitoring Centre (IDMC) and Human Rights Watch (HRW) contributed to global research on displacement by using indicators of housing destruction and flooding gathered through satellite and aerial imagery, demonstrating the viability of this approach in Türkiye and the Arab Republic of Egypt. The methodology was especially effective in urban settings, where the availability of images over time allows time-series analysis of the construction and destruction of buildings. In a 2018 paper, HRW documented how Rohingya villages were bulldozed in Myanmar (HRW, 2018); the high frequency of satellite images made it possible to track the evolution of the destruction over time.
In terms of real-time monitoring, this study contributes to work within the humanitarian-development nexus to predict displacement flows using variations in satellite-derived data. For example, the UNHCR Winter Cell aimed to forecast migration flows with weather data, as harsh winters tend to impede movement across borders and within countries. The team collaborated with national migration agencies and meteorological offices and monitored social media to produce daily weather reports and provide information on possible effects on camps, border controls, and transportation.
Limitations in terms of measurement error are nevertheless important to keep in mind and further make the case for better data on displacement in Afghanistan. Because the IOM data do not define geospatial boundaries for settlements, the possibility that settlement buffers overlap cannot be excluded. In denser areas, some buffers may overlap, so the population summarized in them can be duplicated. This could bias the results if specific outcomes were artificially overweighted by appearing in several overlapping buffers for what should be a single settlement, and displacement numbers might likewise be overestimated by counting the same population more than once. This is, however, not a major issue for our analysis, which focuses on the evolution of displacement flows in settlements rather than on absolute values. In addition, the survey relies on self-reported estimates, which can carry substantial measurement error. Here again, assuming that the bias remains constant over time, this issue is unlikely to affect the analysis dramatically.
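The overlap concern can be screened for directly: two circular buffers of equal radius r intersect whenever their centers are less than 2r apart. The sketch below flags such pairs from GPS coordinates using the haversine great-circle distance. It is an illustration under stated assumptions, not part of the paper's pipeline: the settlement IDs, coordinates, and tuple format are hypothetical, and the spherical-Earth distance is an approximation.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def overlapping_pairs(settlements, radius_m=1000.0):
    """Return ID pairs whose circular buffers of radius_m intersect.

    settlements: list of (settlement_id, lat, lon) tuples (hypothetical format).
    Two equal-radius circles overlap when their centers are closer than 2 * radius_m.
    """
    pairs = []
    for (id_a, lat_a, lon_a), (id_b, lat_b, lon_b) in combinations(settlements, 2):
        if haversine_m(lat_a, lon_a, lat_b, lon_b) < 2 * radius_m:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical GPS points: A and B are about 1.1 km apart, C is roughly 35 km away.
points = [("A", 35.000, 63.000), ("B", 35.010, 63.000), ("C", 35.200, 63.300)]
print(overlapping_pairs(points))                # [('A', 'B')] with 1 km buffers
print(overlapping_pairs(points, radius_m=500))  # [] with 500 m buffers
```

Running the same check with the 2km and 5km radii used in the robustness checks would show how quickly overlaps multiply as buffers grow.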
The IOM Displacement Tracking Matrix (DTM, December 2020) allowed us to identify 294 settlements across the seven districts of Badghis for which IOM had collected data on inflows and outflows of persons as of December 2020 (Figure 2). 2 Key indicators are IDP arrivals (Afghans who fled other settlements and presently reside in the assessed location), returning IDPs (previous IDPs who returned to their location of origin), returnees from abroad (Afghans who returned to Afghanistan after having spent at least six months abroad), and out-migrants (Afghans who migrated from the assessed settlement to another country). Summary statistics are provided in Table 1. Among these 294 settlements, 247 (84%) were surveyed four times during 2018-2020, 7 (2%) three times, and 20 (7%) twice; the remaining 20 settlements (7%) emerged recently at the northern edge of the province and were surveyed by IOM only once, between June and December 2020 (round 11 of data collection; see Figure 3). This yields 1,069 observations, with multiple data points per settlement.

Note: NTL stands for Nighttime Light. There are 294 settlements, of which 84% were surveyed four times over the 2018-2020 period, 2% three times, 7% twice, and 7% once (in 2020). Hence, the total number of observations is 1,069. Note that the Net Inflow variable may take a negative value: the net inflow of persons is negative if the population migrating out of the settlement (internationally or within Afghanistan) exceeds the number of displaced persons settling in. Source: WB staff computation using IOM DTM 2020.
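The settlement and observation counts can be cross-checked with a few lines of Python. The dictionary below only restates the survey-frequency figures given in the text.

```python
# Survey frequency reported for the 294 IOM settlements in Badghis:
# number of times surveyed over 2018-2020 -> number of settlements.
times_surveyed = {4: 247, 3: 7, 2: 20, 1: 20}

n_settlements = sum(times_surveyed.values())
n_observations = sum(times * count for times, count in times_surveyed.items())
shares_pct = {times: round(100 * count / n_settlements) for times, count in times_surveyed.items()}

print(n_settlements, n_observations)  # 294 1069
print(shares_pct)                     # {4: 84, 3: 2, 2: 7, 1: 7}
```

The weighted sum 247 × 4 + 7 × 3 + 20 × 2 + 20 × 1 reproduces the 1,069 observations exactly, and the shares match the percentages above.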
Second, each settlement was associated with its average monthly NTL value within a 1km buffer, from January 2018 to December 2020. The imagery is obtained from the Day/Night Band (DNB) of NOAA's Visible Infrared Imaging Radiometer Suite (VIIRS), known as Nighttime Light (NTL). This sensor has a ground resolution of approximately 750 by 750 meters. NTL from VIIRS and its predecessor, the Defense Meteorological Satellite Program (DMSP), capture low-light emissions from Earth: "These include sources that indicate aspects of human activity, like city lights, gas flares, fishing boats, and agricultural fires, while also capturing other nighttime lights phenomena such as auroras." (WB, 2020). 3 These data are available for every month of the study period (2018-2020) except June, because of seasonal issues with reflectance and the DNB sensor. 4 The team created a 1km-radius buffer around each IOM settlement in Badghis and extracted the average monthly NTL value within these areas to capture the monthly evolution of nightlights from 2018 to 2020. An overview of the average NTL value across IOM settlements for the entire year 2020 is displayed in Figure 5.
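The buffer extraction for the paper was done in Google Earth Engine (see the replication notes at the end). For readers working outside Earth Engine, a minimal NumPy sketch of the same idea follows. It assumes the monthly composites have already been exported as arrays on a projected grid in meters. Deciding which pixels fall inside the buffer by their pixel centers is our simplification, not necessarily how Earth Engine weights partially covered pixels. All names and values are hypothetical.

```python
import numpy as np


def buffer_mean(raster, origin_xy, pixel_size_m, center_xy, radius_m=1000.0):
    """Mean of raster pixels whose centers lie within radius_m of center_xy.

    raster: 2-D array of radiance values, row 0 at the top.
    origin_xy: (x, y) map coordinates, in meters, of the raster's top-left corner.
    """
    n_rows, n_cols = raster.shape
    x0, y0 = origin_xy
    # Map coordinates of pixel centers; y decreases as the row index increases.
    xs = x0 + (np.arange(n_cols) + 0.5) * pixel_size_m
    ys = y0 - (np.arange(n_rows) + 0.5) * pixel_size_m
    grid_x, grid_y = np.meshgrid(xs, ys)
    cx, cy = center_xy
    inside = (grid_x - cx) ** 2 + (grid_y - cy) ** 2 <= radius_m ** 2
    return float(raster[inside].mean())


def monthly_buffer_means(stack, origin_xy, pixel_size_m, center_xy, radius_m=1000.0):
    """Apply buffer_mean to each month of a (months, rows, cols) stack."""
    return [buffer_mean(month, origin_xy, pixel_size_m, center_xy, radius_m) for month in stack]


# Hypothetical 5x5 grid of 750 m pixels with a single lit pixel in the middle.
demo = np.zeros((5, 5))
demo[2, 2] = 5.0
# With a 1 km radius, the buffer holds the lit pixel and its 4 direct neighbors.
print(buffer_mean(demo, (0.0, 3750.0), 750.0, (1875.0, 1875.0)))
```

At a resolution of roughly 750 meters, a 1km-radius buffer covers only a handful of pixels, so a settlement's average is sensitive to which edge pixels are counted. This is one reason the 2km and 5km robustness checks are useful.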
Summary statistics are available in Table 1. The IOM data do not contain information on the settlements' actual sizes; hence we focus on average NTL levels within a 1km buffer centered on the GPS point collected by IOM. This relies on the assumption that IOM staff recorded a central GPS point, around which NTL activity is likely representative of the urbanization dynamic. As a reference, it takes roughly 10 minutes for a person to walk 1km. While the absence of solid data on settlement boundaries is a caveat, robustness checks are run using 2km and 5km radii.

Note: WB staff computation. The mean NTL across all settlements in Badghis is 0.309, the standard deviation is 0.14, the minimum is 0.03, and the maximum is 2.215, for a total of 9,702 cells. There were no negative NTL values over the period studied.
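The 9,702 figure is consistent with a balanced monthly panel for all 294 settlements if, as an assumption on our part, the panel runs from January 2018 to December 2020 (the period given in the conclusion) and June is missing in each of the three years, as noted above:

$$
\underbrace{3 \times 12}_{\text{Jan 2018 to Dec 2020}} - \underbrace{3}_{\text{three Junes}} = 33 \text{ months}, \qquad 294 \times 33 = 9{,}702
$$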

Main regressor
To identify settlements receiving many displaced persons (IDPs, returnees, and migrants), we look at the cumulative net inflow of persons in each settlement between December 2018 and December 2020 in the IOM data. 5 Net Inflow is the total net inflow of persons in a given settlement over that period, defined as total inflows minus total outflows. Figure 6 shows the density of the Net Inflow variable across settlements in Badghis, and summary statistics are reported in Table 1. 6 The underlying displacement data cover only the settlements sampled by IOM and do not form a comprehensive sample. Even so, net displacement for many of these settlements is at or near zero, which suggests that many had a stable population that can serve as a baseline (see Figure 6).
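The Net Inflow definition (total inflows minus total outflows, accumulated over survey rounds) can be sketched as a simple aggregation. The record format and the keys `settlement`, `inflow`, and `outflow` are hypothetical. The text does not spell out which DTM indicators are summed into each side, so the grouping given in the docstring is an assumption.

```python
from collections import defaultdict


def cumulative_net_inflow(records):
    """Cumulative Net Inflow per settlement: total inflows minus total outflows over all rounds.

    records: iterable of dicts with hypothetical keys 'settlement', 'inflow', 'outflow',
    one dict per settlement and survey round. Which DTM indicators feed 'inflow'
    (e.g., IDP arrivals, returnees) and 'outflow' (e.g., out-migrants) is an assumption.
    """
    totals = defaultdict(int)
    for row in records:
        totals[row["settlement"]] += row["inflow"] - row["outflow"]
    return dict(totals)


# Hypothetical survey rounds for two settlements.
rounds = [
    {"settlement": "S1", "inflow": 120, "outflow": 20},
    {"settlement": "S1", "inflow": 30, "outflow": 10},
    {"settlement": "S2", "inflow": 5, "outflow": 60},
]
print(cumulative_net_inflow(rounds))  # {'S1': 120, 'S2': -55}
```

A negative total, as for S2, is the outbound case described in the note to Table 1.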

Controls
The correlation between displacement patterns and NTL growth could be affected by several factors related to economic growth, such as the initial urbanization level of the settlement. Displacement flows to hosting areas are unlikely to be random, as IDPs may prefer to settle in urbanized areas where access to services is easier. Large flows of displaced persons may therefore converge on large settlements with specific NTL growth patterns, skewing our interpretation of the data. There is evidence that displaced Afghans move to cities, where security and the availability of services are perceived to be greater (IDMC, 2020b; EASO, 2020). IDPs affected by conflict tend to flee rural areas for their regional centers (Samuel Hall/NRC/IDMC, 2018). 7 As the findings of that study confirm, IDPs assume that urban areas are safer and that they will be better able to cope where employment opportunities, services, and humanitarian aid are more readily accessible. IDPs also seem to prefer to stay close to their places of origin. Social ties may also be at play, as migrants tend to congregate where other IDPs have already settled to benefit from network effects. It is therefore likely that large displacement flows converge on already large settlements with specific NTL growth patterns. To avoid overweighting existing urban centers with high baseline NTL values, we control for population estimates in 2017 (pre-drought levels), using 2017 population data from the Government of Afghanistan (originally from WorldPop; see Section 4.2).
In addition, we attempt to proxy for the type of economic activity in the settlement, as the correlation between displacement flows and NTL growth might depend on the share of agricultural labor and the level of urbanization. IOM started collecting data on the share of settlement income derived from each sector (e.g., agriculture) in June 2020. Although these data do not exist for previous rounds, this caveat can be mitigated by controlling for the relative importance of agriculture in the settlement in late 2020, which provides information on the settlement's general level of urbanization over the 2018-2020 period. We create the variable Agric2020, which associates each settlement with the proportion of its income obtained through agriculture (farming, crop production, etc.) and livestock (cattle, sheep, fish farming, etc.). Summary statistics are presented in Table 1. On average, settlements derived 71% of their income from agricultural activities, with a minimum of 0% and a maximum of 100%. We also tried to include additional measures of economic activity, such as the average income level in the settlement and the share of employment in the service sector (both available only for 2020). None of these yielded significant results, so they were not included in the regressions.

… receivers of displacement inflows over the 2018-2020 period had consistent inflows during the whole period. As there would not be much heterogeneity to exploit by looking at the dynamics of the 247 settlements, we focus the analysis on cumulative displacement flows, which allows us to include a larger set of settlements.

7 According to the study by Samuel Hall/NRC/IDMC conducted in 2017 in the provinces of Kabul, Herat, Kandahar, Kunduz, and Nangarhar, 92% of the respondents in the southwest of the country had moved to Kandahar city, 91% in the west to Herat city, and 76% in the east to Jalalabad (Samuel Hall/NRC/IDMC, 2018, 20).

Source: WB staff computations.
However, data limitations reduce our ability to fully disentangle the interactions between displacement flows, economic growth, and NTL evolution. One should therefore refrain from drawing strong causal inferences, as this paper mainly captures complex correlations.

Regression Analysis
In this analysis, we regress NTL growth on the total level of displacement in the settlement, measured by the cumulative net inflow of displaced persons over the 2018-2020 period. The linear regression indicates whether, and how, the cumulative net inflow of displaced persons correlates with higher NTL growth.

Benchmark NTL growth
This section explores the NTL growth rate observed in Badghis, that is, the evolution of NTL over time before accounting for potential deviations due to displacement patterns. Increases in NTL can reflect improved electrification or more light from other ambient sources, such as fires, in already electrified areas. They can also reflect new or expanding electrification in areas with little or no historical light emission. Li et al. (2020) illustrate these patterns: using global NTL time-series data (1992-2018), they show that NTL both increased in intensity and expanded spatially, with high-luminance pixels spreading in urban centers as well as fringe locations. The NTL growth rate of a given area will therefore mainly reflect either higher average NTL in an already electrified area or new electrification in areas with improved infrastructure or newly inhabited areas. While NTL often grows and expands over time, baseline NTL levels remain relatively constant because of existing infrastructure and outdoor lighting, with fluctuations over time reflecting shifts in population or economic activity.
To control for population size and the level of economic activity, the analysis includes a benchmark population estimate (2017) and the share of agricultural income at the settlement level. Since the cumulative net inflow is based on 2018-2020 flows of persons, 2017 population levels allow us to proxy for initial urbanization and establish the relevant benchmark for each settlement.
The first column of Table 2 displays the linear regression of NTL levels on their past values at the settlement level, controlling for district fixed effects. As seen previously, there are 294 settlements spread across seven districts. Let $NTL_{i,j,t}$ denote the average NTL in the 1km buffer around settlement i in district j at date t (each month from January 2018 to December 2020). The following regression looks at how NTL levels naturally grow over time in settlements within Badghis province. It accounts for the initial population in 2017 ($\mathrm{Pop2017}_i$) and the share of community income derived from agricultural and livestock activities in late 2020 ($\mathrm{Agric2020}_i$). It also includes seasonality ($\mu_t$, the time fixed effect) and $\delta_j$, the fixed effect for district j:

$$NTL_{i,j,t} = \beta_0 + \beta_1 NTL_{i,j,t-1} + \gamma_1 \mathrm{Pop2017}_i + \gamma_2 \mathrm{Agric2020}_i + \mu_t + \delta_j + \varepsilon_{i,j,t} \qquad (1)$$

Table 2 (column 1) shows that the benchmark growth of NTL levels is positive on average: a 1-unit increase in the past NTL value of an IOM settlement always raises its current NTL value.
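The benchmark regression described above regresses current NTL on lagged NTL, 2017 population, and the 2020 agricultural income share, with month and district fixed effects. It can be estimated by ordinary least squares with dummy variables. The sketch below does this with NumPy on simulated data. All data and coefficient values are hypothetical, the lag is drawn independently for simplicity, and the paper does not state which software it used.

```python
import numpy as np


def ols_with_fixed_effects(y, X, *group_ids):
    """OLS of y on the columns of X, an intercept, and dummies for each grouping.

    The first level of each grouping is dropped to avoid collinearity with the intercept.
    Returns (coefficients on X, intercept).
    """
    n = len(y)
    blocks = [X, np.ones((n, 1))]
    for groups in group_ids:
        levels = np.unique(groups)
        blocks.append((groups[:, None] == levels[None, 1:]).astype(float))
    design = np.hstack(blocks)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    k = X.shape[1]
    return coef[:k], coef[k]


# Simulated panel (all values hypothetical): 294 settlements x 33 months, 7 districts.
rng = np.random.default_rng(0)
n_settle, n_months, n_districts = 294, 33, 7
settle = np.repeat(np.arange(n_settle), n_months)
month = np.tile(np.arange(n_months), n_settle)
district = settle % n_districts
pop2017 = rng.uniform(0.1, 5.0, n_settle)[settle]    # population, thousands
agric2020 = rng.uniform(0.0, 1.0, n_settle)[settle]  # agricultural income share
ntl_lag = rng.gamma(4.0, 0.08, settle.size)          # lagged NTL radiance
ntl = (0.1 + 0.6 * ntl_lag + 0.02 * pop2017 - 0.05 * agric2020
       + rng.normal(0, 0.02, n_months)[month]        # month (seasonality) effects
       + rng.normal(0, 0.05, n_districts)[district]  # district effects
       + rng.normal(0, 0.02, settle.size))           # idiosyncratic noise

X = np.column_stack([ntl_lag, pop2017, agric2020])
beta, intercept = ols_with_fixed_effects(ntl, X, month, district)
print(np.round(beta, 3))  # close to the simulated values 0.6, 0.02, -0.05
```

Dropping one level per grouping keeps the design matrix full rank alongside the intercept. With seven districts and 33 months, explicit dummies stay cheap, so a within-transformation is not needed here.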
Quadratic robustness. Appendix C extends the analysis with a quadratic model. NTL growth in Badghis settlements is positive and concave. Settlements with higher NTL in period t-1 experience larger NTL in period t, but the quadratic regression suggests that growth follows a concave shape, so the settlements with the largest expansion are those with the lowest initial NTL levels. On average, the natural growth of NTL is such that a 1-unit increase in the past NTL value of an IOM settlement raises its current NTL as long as past NTL values are smaller than 2.2, which is always true in the Badghis sample. 8
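The 2.2 cutoff follows from the quadratic benchmark, in which current NTL depends on $\beta_1 NTL_{i,j,t-1} + \beta_2 NTL_{i,j,t-1}^2$. With a concave fit ($\beta_2 < 0$), the marginal effect of past NTL is positive below a single turning point:

$$
\frac{\partial NTL_{i,j,t}}{\partial NTL_{i,j,t-1}} = \beta_1 + 2\beta_2\, NTL_{i,j,t-1} > 0
\iff NTL_{i,j,t-1} < -\frac{\beta_1}{2\beta_2} \qquad (\beta_2 < 0)
$$

The reported cutoff of 2.2 therefore corresponds to $-\hat\beta_1/(2\hat\beta_2)$ evaluated at the estimated coefficients.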

Impact of displacement flows on NTL growth
New population flows settling in a location are likely to cause deviations from the benchmark NTL growth, for example through the expansion of settlements, growing service needs, or changes in economic activity. High inflows of displaced people are associated with a growing need for basic electricity services such as indoor and street lighting. They could also directly affect a settlement's economic activity, for example by changing the distribution of the labor force across sectors or through a direct increase in the settlement's population. As NTL levels are traditionally used to monitor human activity (see, e.g., Li et al., 2018), large influxes of people are expected to affect the evolution of NTL levels. This paper therefore does not assume a fixed NTL growth rate. Instead, it compares growth rates across settlements and offers insights into why some settlements grow faster than others, reflecting additional inflows of migrants but also increased economic activity.
A linear regression at the settlement level allows us to check for a correlation between NTL growth and overall displacement flows between 2018 and 2020. 9 The following regression looks at how NTL growth over time is affected by Net Inflow, the cumulative net inflow of persons into the settlement over the 2018-2020 period. The regression controls for the initial population in 2017 ($\mathrm{Pop2017}_i$) and the share of community income derived from agricultural and livestock activities in late 2020 ($\mathrm{Agric2020}_i$). It also includes seasonality ($\mu_t$, the time fixed effect) and $\delta_j$, the fixed effect for district j. For readability, let $X_i := \text{Net Inflow}_i$; the regression is then:

$$NTL_{i,j,t} = \beta_0 + \beta_1 NTL_{i,j,t-1} + \beta_2 X_i + \beta_3 X_i \, NTL_{i,j,t-1} + \gamma_1 \mathrm{Pop2017}_i + \gamma_2 \mathrm{Agric2020}_i + \mu_t + \delta_j + \varepsilon_{i,j,t}$$

The NTL growth therefore follows:

$$\frac{\partial NTL_{i,j,t}}{\partial NTL_{i,j,t-1}} = \beta_1 + \beta_3 X_i \qquad (4)$$

$$\frac{\partial NTL_{i,j,t}}{\partial X_i} = \beta_2 + \beta_3 NTL_{i,j,t-1} \qquad (5)$$

The linear regression shows that settlements with a higher net inflow of displaced population experience larger NTL growth (Table 2, column 2). Since $\beta_3$ is positive, equation (4) implies that $\partial NTL_{i,j,t}/\partial NTL_{i,j,t-1}$ increases with Net Inflow. In other words, an increase in the net inflow of displaced population (for inbound areas, with positive Net Inflow) or a reduction in net outflows (for outbound areas, with negative Net Inflow) is associated with larger NTL growth, while a marginal increase in outflows lowers NTL levels. Second, a marginal increase in the net inflow of displaced population raises NTL levels, provided the settlement already recorded some level of human activity. Equation (5) is positive if $NTL_{i,j,t-1} > -\beta_2/\beta_3 = 0.28$. As a reference, all settlements studied recorded an NTL value above 0.28 at least once during the 2018-2020 period. 10 In other words, settlements receiving more displaced persons experience higher NTL growth, provided their nighttime light level is not too low to start with.
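The two marginal effects can be illustrated numerically. The sketch below simulates data from an interaction model with lagged NTL, net inflow X, and their product, where $\beta_2$ is the coefficient on X and $\beta_3$ the coefficient on the interaction. It fits the model by least squares with NumPy and recovers the threshold $-\beta_2/\beta_3$ above which additional net inflow raises NTL. Controls and fixed effects are omitted for brevity. The coefficient values are hypothetical and are chosen only so that the threshold equals 0.28.

```python
import numpy as np

# Hypothetical coefficients (not the paper's estimates), chosen so -B2 / B3 = 0.28.
B0, B1, B2, B3 = 0.1, 0.6, -0.014, 0.05

rng = np.random.default_rng(1)
n = 5000
ntl_lag = rng.gamma(4.0, 0.08, n)  # hypothetical lagged NTL
x = rng.normal(0.0, 1.0, n)        # hypothetical Net Inflow, thousands of persons
ntl = B0 + B1 * ntl_lag + B2 * x + B3 * x * ntl_lag + rng.normal(0, 0.01, n)

# Least-squares fit of NTL_t on [1, NTL_{t-1}, X, X * NTL_{t-1}].
design = np.column_stack([np.ones(n), ntl_lag, x, x * ntl_lag])
b0, b1, b2, b3 = np.linalg.lstsq(design, ntl, rcond=None)[0]


def effect_of_lag(net_inflow):
    """dNTL_t / dNTL_{t-1} = b1 + b3 * X, increasing in X when b3 > 0."""
    return b1 + b3 * net_inflow


def effect_of_inflow(ntl_prev):
    """dNTL_t / dX = b2 + b3 * NTL_{t-1}, positive once NTL_{t-1} > -b2 / b3 (if b3 > 0)."""
    return b2 + b3 * ntl_prev


threshold = -b2 / b3
print(round(threshold, 2))  # near the simulated 0.28
```

Because the threshold is a ratio of two estimated coefficients, its sampling uncertainty reflects both. In applied work a delta-method or bootstrap interval for $-\hat\beta_2/\hat\beta_3$ would be a natural complement.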

Can We Use NTL Growth to Predict Displacement?
Having established a positive correlation between NTL growth and displacement flows, we aim to test whether overall migration patterns can be predicted using NTL data.The model described below yields promising results, as it correctly classifies 63.2 percent of settlements as serving either as a net receiver or net sender of displaced persons over 2018-2020.
First, we create the binary variable Inbound, which equals 1 if the settlement was a net receiver of displaced persons (Net Inflow ≥ 0) over the 2018-2020 period and 0 if it was an outbound area (Net Inflow < 0). Of the 294 settlements, 149 were net receivers (50.7%) and 145 were outbound areas from which people mostly left (49.3%).
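The Inbound rule can be written as a one-line mapping. The settlement IDs and net inflows below are hypothetical; only the 149/145 split restates figures from the text.

```python
def inbound_flags(net_inflow):
    """Inbound = 1 if Net Inflow >= 0 (net receiver), 0 otherwise (outbound area)."""
    return {settlement: int(value >= 0) for settlement, value in net_inflow.items()}


# Hypothetical net inflows; a settlement with exactly zero net inflow counts as inbound.
flags = inbound_flags({"S1": 120, "S2": -55, "S3": 0})
print(flags)  # {'S1': 1, 'S2': 0, 'S3': 1}

# Shares reported in the text for the 294 Badghis settlements.
print(round(100 * 149 / 294, 1), round(100 * 145 / 294, 1))  # 50.7 49.3
```

Treating zero as inbound follows the ≥ 0 convention above. Since many settlements have net displacement at or near zero, this boundary choice can matter for the class balance.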
Second, for settlement i in district j, we use a reverse analysis. We regress the net inflow of displaced population (outcome variable Y) on the average NTL level, the consecutive increases in NTL, the 2017 initial population level, the share of income derived from agriculture (in 2020), and the district fixed effect $\delta_j$. We define the regressors $\Delta NTL_{i,j,t-1} = NTL_{i,j,t} - NTL_{i,j,t-1}$, with $\mathrm{mean}(NTL_{i,j})$ the average of $NTL_{i,j,t}$ across all periods between January 2018 and December 2020. The regression can be written as:

Table 3 shows that average NTL is a good predictor of the total net inflow of displaced population in a settlement over 2018-2020. A one-unit increase in average NTL is associated with an increase of 129,797 persons in the cumulative net inflow over the period. This relationship appears to be concave, at the 95 percent confidence level. This methodology allows us to predict whether a settlement is an Inbound area from the coefficients in Table 3. We construct the variable Predicted Inbound, which equals 1 if the predicted Net Inflow is positive and 0 if it is negative, and compare the actual Inbound classification with the predicted outcome. Taking the null hypothesis to be H0: outbound settlement, the model has a false positive rate of 40 percent and a false negative rate of 34 percent. Overall, 63.2 percent of settlements are correctly classified. Given limited accessibility, this level of accuracy can inform preliminary assessments and the advance mobilization of resources, which can then be confirmed by on-the-ground validation when feasible. In terms of policy preference, there is always a tradeoff between false positives and false negatives. With limited resources, a model that minimizes false positives would allow resources to be concentrated where they are most needed.

Source: Authors' computation.
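The thresholding and error-rate calculations can be made concrete with a small sketch. It takes predicted net inflows from any fitted model, converts them to Predicted Inbound, and computes the rates with "inbound" as the positive class, consistent with H0: outbound. The eight settlements are hypothetical and do not reproduce the paper's confusion matrix.

```python
def predicted_inbound(predicted_net_inflow):
    """Predicted Inbound = 1 if the predicted Net Inflow is positive, else 0.

    The text does not say how an exact zero is treated; here it counts as outbound.
    """
    return [int(y > 0) for y in predicted_net_inflow]


def classification_rates(actual, predicted):
    """Error rates with H0 = outbound, so 'inbound' (1) is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)  # outbound predicted as inbound
    fn = sum(a == 1 and p == 0 for a, p in pairs)  # inbound predicted as outbound
    return {
        "accuracy": (tp + tn) / len(pairs),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "true_positive_rate": tp / (tp + fn),
    }


# Hypothetical toy example with 8 settlements.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = predicted_inbound([250.0, -40.0, 10.0, 30.0, -5.0, -120.0, 75.0, -2.0])
print(classification_rates(actual, predicted))
```

The true positive rate equals one minus the false negative rate. This is how the 34 percent false negative rate reported above maps to the 66 percent true positive rate cited in the conclusion.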
This reverse method therefore yields promising results, but the methodology needs to be improved, which underscores the need to collect high-frequency data on displacement and socioeconomic outcomes at a disaggregated level. Including other NTL-related regressors did not improve the model's performance, nor did additional controls for economic activity (average settlement income and the share of employment in services, in 2020). 11 To date, alternative statistical methods have not yielded better predictive power. For example, machine learning algorithms such as the least absolute shrinkage and selection operator (Lasso) or random forests do not outperform the current analysis, owing to the small sample size (294 settlements). This paper serves as a proof of concept and calls for extending the analysis to all of Afghanistan to increase both statistical power and representativeness.

Conclusion
This work investigated whether geospatial data, specifically NTL data, could be used to proxy for displacement flows and settlement expansion. By focusing on Badghis province, it provides a relevant proof of concept for a province that was particularly affected by recent climatic shocks, including the 2018 drought. We find that large displacement movements can indeed be observed in NTL satellite imagery. Settlements with a high net inflow of displaced persons over 2018-2020 (high-inflow receivers) have larger NTL growth than others. Results remain robust when looking at the net inflow of persons as a share of the existing population, and accounting for 2017 population as a proxy for initial urbanization does not change our findings. These insights can be used to predict which settlements are experiencing large inflows of displaced people, with a true positive rate of 66 percent.
This paper provides an initial basis for better understanding the drivers and spatial characteristics of settlement growth. Future extensions of the methodology could draw on additional sources of geospatial data (e.g., shifts in vegetation density due to drought). Evidence from the ground suggests regional differences in displacement patterns based on geography, administration, and other factors. Future work could investigate patterns in the NTL growth of significant locations country-wide, in the context of recent IOM work on the determinants of return. To explore drought-driven displacement, we restricted the analysis to Badghis Province in northwestern Afghanistan, but the same approach can easily be replicated in other regions or country-wide. The input data were the UN IOM DTM dataset from 2020 and monthly Nighttime Lights composites from January 2018 to December 2020. 2

Abbreviations and
To replicate the methodology, one may open the Google Earth Engine script at the universal link 3 or download the script directly from GitHub. 4 The GitHub repository contains a folder titled "NTL," which holds the script that extracts monthly NTL values to the buffers, as well as a script that extracts a monthly average of NTL data for the entire area of interest. The user may then update the boundaries of the analysis by uploading a new dataset and preserving the object name 'table', so that the script applies the same processes to the new area of interest.
The user should also update the export location so that the file is saved to their personal Google Drive account (or alter the code to save it as an Earth Engine 'asset', or dataset). The methodology outputs a panel dataset in CSV form (unless otherwise specified in the code), which can be exported from Drive to one's local machine for further statistical analysis.
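Once the CSV is on a local machine, the settlement-level NTL regressors used in the prediction exercise can be computed with the Python standard library. The sketch assumes a long format with one row per settlement and month. The column names `settlement_id`, `month`, and `mean` are hypothetical placeholders, not taken from the replication script. Averaging the month-to-month changes is shown as one possible summary of NTL increases.

```python
import csv
import io
from collections import defaultdict


def load_panel(csv_text, id_col="settlement_id", month_col="month", value_col="mean"):
    """Parse a long CSV (one row per settlement-month) into {settlement: {month: ntl}}.

    Column names are hypothetical defaults; adjust them to match the exported file.
    """
    panel = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        panel[row[id_col]][row[month_col]] = float(row[value_col])
    return dict(panel)


def settlement_summaries(panel):
    """Average NTL and average change between consecutive available months, per settlement.

    Months are sorted as 'YYYY-MM' strings; a missing month (e.g., June) is simply skipped,
    so the change across the gap is treated as consecutive.
    """
    summaries = {}
    for sid, series in panel.items():
        values = [series[m] for m in sorted(series)]
        diffs = [later - earlier for earlier, later in zip(values, values[1:])]
        summaries[sid] = {
            "mean_ntl": sum(values) / len(values),
            "mean_delta_ntl": sum(diffs) / len(diffs) if diffs else float("nan"),
        }
    return summaries


# Hypothetical excerpt of an exported file.
sample = """settlement_id,month,mean
S1,2018-01,0.30
S1,2018-02,0.34
S1,2018-03,0.32
S2,2018-01,0.10
S2,2018-02,0.12
"""
print(settlement_summaries(load_panel(sample)))
```

Whether to bridge the missing June months or to drop the May-to-July change is a modeling choice the replication user should make explicitly.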
Appendix C: Quadratic model

C.1. Benchmark NTL growth (quadratic)

Table 5 displays the linear and quadratic regressions of NTL levels on their past values at the settlement level, controlling for district fixed effects. As seen previously, there are 294 settlements spread across seven districts. Let $NTL_{i,j,t}$ denote the average NTL in the 1km buffer around settlement i in district j at date t (each month from January 2018 to December 2020). The following regressions look at how NTL levels naturally grow over time in settlements within Badghis province. They account for the initial population in 2017 ($\mathrm{Pop2017}_i$) and the share of community income derived from agricultural and livestock activities in late 2020 ($\mathrm{Agric2020}_i$). They also include seasonality ($\mu_t$, the time fixed effect) and $\delta_j$, the fixed effect for district j. Equation (1) in the main text assumes a linear evolution of NTL, while equation (2) is quadratic:

$$NTL_{i,j,t} = \beta_0 + \beta_1 NTL_{i,j,t-1} + \beta_2 NTL_{i,j,t-1}^2 + \gamma_1 \mathrm{Pop2017}_i + \gamma_2 \mathrm{Agric2020}_i + \mu_t + \delta_j + \varepsilon_{i,j,t} \qquad (2)$$

Under equation (2), a marginal increase in past NTL changes current NTL by $\hat\beta_1 + 2\hat\beta_2 NTL_{i,j,t-1}$.
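To illustrate how the turning point of the quadratic benchmark is obtained from estimates, the sketch below fits a quadratic model by least squares on simulated data and computes $-\hat\beta_1/(2\hat\beta_2)$. Controls and fixed effects are omitted for brevity. The data and coefficients are hypothetical and are chosen so that the true turning point is 2.2.

```python
import numpy as np

# Hypothetical concave benchmark: NTL_t = 0.05 + 0.88 * NTL_{t-1} - 0.2 * NTL_{t-1}^2 + noise.
rng = np.random.default_rng(2)
n = 5000
ntl_lag = rng.uniform(0.0, 2.5, n)
ntl = 0.05 + 0.88 * ntl_lag - 0.2 * ntl_lag**2 + rng.normal(0, 0.01, n)

# Least-squares fit of NTL_t on [1, NTL_{t-1}, NTL_{t-1}^2].
design = np.column_stack([np.ones(n), ntl_lag, ntl_lag**2])
_, b1_hat, b2_hat = np.linalg.lstsq(design, ntl, rcond=None)[0]


def marginal_effect(ntl_prev):
    """Change in current NTL per unit of past NTL: b1 + 2 * b2 * NTL_{t-1}."""
    return b1_hat + 2 * b2_hat * ntl_prev


turning_point = -b1_hat / (2 * b2_hat)  # past NTL beyond which the marginal effect turns negative
print(round(turning_point, 2))  # near the simulated 2.2
```

If few observations lie near the turning point, its location is extrapolated from the curvature of the fit, so it is less precisely estimated than the marginal effect at typical NTL values.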
The estimated relationship between current and past NTL in Badghis settlements is positive and concave: settlements with higher NTL in period t-1 have larger NTL in period t. The quadratic regression suggests that NTL growth follows a concave shape, that is, the settlements with the largest expansion are those with the lowest NTL levels to start with. On average, the natural growth of NTL levels is such that a 1-unit increase in past NTL in an IOM settlement always raises current NTL, as long as past NTL values are smaller than 2.2, which is always true in the Badghis sample.⁵
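The 2.2 threshold follows from setting the marginal effect $\hat\beta_1 + 2\hat\beta_2 NTL_{i,j,t-1}$ to zero, which gives the turning point $-\hat\beta_1/(2\hat\beta_2)$ when $\hat\beta_2 < 0$. The sketch below shows that computation. The coefficient values are hypothetical placeholders chosen only so that the turning point equals 2.2. They are not the estimates reported in Table 5.

```python
def marginal_effect(beta1, beta2, past_ntl):
    """d NTL_t / d NTL_{t-1} in the quadratic model: beta1 + 2 * beta2 * NTL_{t-1}."""
    return beta1 + 2 * beta2 * past_ntl


def turning_point(beta1, beta2):
    """Past NTL level at which the marginal effect changes sign (concave case, beta2 < 0)."""
    if beta2 >= 0:
        raise ValueError("a finite turning point requires a concave fit (beta2 < 0)")
    return -beta1 / (2 * beta2)


# Hypothetical coefficients (NOT the Table 5 estimates).
BETA1, BETA2 = 0.88, -0.2
threshold = turning_point(BETA1, BETA2)
# The maximum initial NTL in the Badghis sample (1.058) lies below this threshold,
# so the marginal effect there is still positive.
effect_at_max = marginal_effect(BETA1, BETA2, 1.058)
```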

C.2. Quadratic impact of displacement flows on NTL growth
Expanding the previous analysis with a quadratic regression returns interesting and significant results. While linear regressions capture general trends, quadratic regressions account for non-linear dynamics. Hence, the coefficients $\beta_3$ and $\beta_5$ of the quadratic regression are of particular interest, as they capture the correlation between high inflows of persons and NTL growth. Defining $X_i$ as the net inflow of displaced persons in settlement i (Net Inflow), the regression takes the quadratic form of equation (6), from which the marginal effect of $X_i$ on current NTL is derived.
Note: The settlements are spread across seven districts. Regressions control for population in settlement in 2017, the share of settlement income derived from agricultural activities in 2020, and a constant.
The quadratic regression displays a concave relationship between displacement flows and NTL growth, which suggests diminishing marginal returns of displacement flows on NTL. When comparing settlements of equal population size in 2017, the settlements showing the largest impact of displacement flows on NTL expansion are those that received the lowest cumulated displacement inflows. There exist several situations in which a marginal increase in the net inflow of displaced persons within a settlement would increase NTL levels. Figure 7 displays the set of combinations of Net Inflow and past NTL values such that a marginal increase in the net total inflow of displaced persons hosted over 2018-2020 increases the current NTL level. As can be seen in the green shaded area, a marginal increase in Net Inflow would raise NTL as long as the net inflow is smaller than $f(NTL)$, with $f(\cdot)$ a quadratic function defined below the graph.⁶
Inbound areas (green shaded area). In settlements that have a positive net inflow of displaced population, a marginal increase in net inflow would raise the NTL level whenever the inflow of displaced persons is low enough, as long as there is some minimum level of human activity.⁷ It will, however, always lower the NTL level if the settlement serves as an important inbound area with a large enough net inflow of displaced population.⁸ In outbound areas, a marginal increase in the net inflow of displaced persons would raise NTL levels whenever the net inflow is negative enough, since we then have $Net\,Inflow_i < f(NTL_{i,t-1}) < 0$.⁹
Outbound areas. In settlements that serve as outbound areas for a small number of displaced persons, a marginal decrease in outflows (i.e., Net Inflow becoming less negative) would lower the NTL level if the past NTL level is small enough. Indeed, equation (7) is negative whenever $f(NTL_{i,t-1}) < Net\,Inflow_i < 0$.
In other words, in outbound settlements with small outflows, there exists a range of NTL (smaller than 0.28) such that a reduction in the total outflows (i.e., marginal increase in net inflow) would reduce the NTL levels.
Rearranging equation-8, it is positive if $0 < X_i < f(NTL_{i,t-1})$, where $f(NTL_{i,t-1}) \equiv 37310.9 - 17647.1/NTL_{i,t-1}^2 + 53445.4/NTL_{i,t-1}$. However, written over the common denominator $NTL_{i,t-1}^2$, the numerator of f is positive only as long as $NTL_{i,t-1} > 0.28$, so equation-8 is negative in inbound settlements if $NTL_{i,t-1} \le 0.28$.
⁹ As can be seen in the lower-left part of Figure 7, $\lim_{NTL_{i,t-1} \to 0} f(NTL_{i,t-1}) = -\infty$. Hence, whenever Net Inflow is sufficiently negative, equation-8 is positive for all $NTL_{i,t-1} > 0$.
a 1km radius around the settlement GPS point, any increase in the reported NTL value could represent either a variation in the intensity of nighttime light or a spatial extension (still within the 1km radius). Our data do not allow us to identify directly whether NTL growth originates from a spatial expansion or from a variation in intensity. This question could be further investigated through an analysis of the relationship between NTL growth and urban expansion.
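The condition shown in Figure 7 can be checked numerically. The sketch below codes the function $f$ with the coefficients reported above (1km-radius NTL, $X_i$ = Net Inflow). It also applies the rule stated in the text: a marginal increase in Net Inflow raises NTL when the net inflow is smaller than $f(NTL_{i,t-1})$. Finally, it recovers the 0.28 cutoff as the positive root of the numerator $37310.9\,NTL^2 + 53445.4\,NTL - 17647.1$. The function names are illustrative only.

```python
import math

# Coefficients of f as reported in the text (1km-radius NTL, X_i = Net Inflow).
A, B, C = 37310.9, 17647.1, 53445.4


def f(past_ntl):
    """f(NTL) = A - B / NTL**2 + C / NTL, defined for NTL > 0."""
    if past_ntl <= 0:
        raise ValueError("f is only defined for positive past NTL")
    return A - B / past_ntl**2 + C / past_ntl


def numerator_root():
    """Positive root of A*NTL**2 + C*NTL - B, i.e. of f written over NTL**2."""
    return (-C + math.sqrt(C**2 + 4 * A * B)) / (2 * A)


def marginal_inflow_raises_ntl(net_inflow, past_ntl):
    """Green area of Figure 7: a marginal rise in Net Inflow raises NTL if Net Inflow < f(NTL)."""
    return net_inflow < f(past_ntl)


cutoff = numerator_root()  # roughly 0.277, i.e. the 0.28 reported in the text
```

For example, with past NTL of 0.2, $f$ is negative (about -136,640). A settlement with a net inflow of 1,000 therefore falls outside the green area. A strongly outbound settlement with a net inflow of -200,000 falls inside it.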
Appendix D: Differences based on initial NTL levels
Using a Jenks decomposition to identify natural breaks in the distribution of settlements' initial NTL levels as of January 1, 2018, settlements are classified into two groups (higher or lower initial NTL) to capture their urbanization starting point. On average, settlements had an initial NTL level of 0.217, with a minimum of 0.175 and a maximum of 1.058. The Jenks method creates two groups such that the variance within each group is minimized while the variance between groups is maximized. The resulting categorization assigns 289 settlements to the lower NTL group, with a minimum of 0.175 and a maximum of 0.414, and 5 settlements to the higher initial NTL group, with a minimum of 0.474 and a maximum of 1.058.
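For two classes, the Jenks criterion can be solved exactly by trying every split point of the sorted values. The criterion is to minimize within-class variance, which for two classes is equivalent to maximizing between-class variance. The sketch below illustrates this on made-up values. It is not the implementation used for the paper, and dedicated libraries handle the general k-class case.

```python
def jenks_two_classes(values):
    """Two-class Jenks natural breaks by exhaustive search over split points.

    For two classes, minimizing the total within-class sum of squared
    deviations is equivalent to maximizing the between-class variance,
    because the total sum of squares is fixed.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")

    def ssd(group):
        mean = sum(group) / len(group)
        return sum((v - mean) ** 2 for v in group)

    best_k, best_cost = 1, float("inf")
    for k in range(1, len(data)):  # data[:k] is the lower class
        cost = ssd(data[:k]) + ssd(data[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return data[:best_k], data[best_k:]


# Made-up initial NTL values: most settlements dim, a few bright.
low, high = jenks_two_classes([0.18, 0.2, 0.22, 0.21, 0.19, 0.9, 1.05])
```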

Source: Authors' computations
The impact that displacement flows may have on NTL growth depends on the initial level of NTL, as urbanized areas do not seem to be affected by inflows of displaced persons. We run linear and quadratic regressions following equation (3) and equation (6) on two different samples: the 145 observations with high initial NTL levels (5 settlements across 2018-2020), and the 8,381 observations with lower initial NTL levels (289 settlements). The linear and quadratic regressions show that larger displacement inflows are always correlated with larger NTL growth in settlements with higher initial NTL levels, at the 95% confidence level (see columns 3 and 4 of Table C.2).
The linear and quadratic regression analyses of settlements starting with lower initial NTL levels return mixed results. Column (1) of Table C.2 shows that equation (4) is positive, i.e., NTL growth increases with Total Inflow, as the interaction term is positive. The linear regression suggests that a marginal increase in net inflows of persons would increase NTL in places with medium-high levels of human activity, i.e., with NTL above 0.30. The quadratic regression displays the following patterns:
- In outbound settlements, a reduction in the net outflows of displaced persons raises the NTL level in areas with substantial human activity. A sufficient condition for a marginal decrease in outflows (i.e., an increase in Total Inflow) to be associated with an increase in NTL levels is that past NTL be larger than 0.9.¹⁰
- In inbound areas, a marginal increase in net inflows of persons would increase NTL in places with low levels of human activity, i.e., with NTL between 0 and 0.29.
Robustness checks are run by modifying the monthly NTL values associated with each settlement, through changes in the size of the catchment area around settlements' GPS points (from a 1km radius to 2km and 5km radii). As discussed earlier, we do not have information on the actual settlements' borders and must rely on averaging NTL values around the given GPS location. While the study focused on a 1km-by-1km cell, this section extracts the average NTL values observed within a 2km and a 5km radius around the IOM settlements' GPS points. As a reference, a 2km distance takes roughly 20 minutes to walk, while 5km takes about 1 hour. Summary statistics are available in Table 5.0.1. Linear results are robust to constructing the NTL average over a 2km or 5km radius instead of 1km, and the 5km radius results suggest an even stronger positive correlation between Net Inflow and NTL growth. The 2km radius analysis suggests that a marginal increase in Net Inflow would increase NTL levels in settlements whose past NTL level was larger than 0.64. The 5km radius analysis shows that a marginal rise in Net Inflow would always be associated with an increase in NTL levels.
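The catchment-area robustness check amounts to averaging NTL pixels whose centres fall within a given distance of each settlement's GPS point. The sketch below uses great-circle (haversine) distances and made-up pixel values. It is a simplified, unweighted stand-in. The Earth Engine workflow described earlier aggregates pixels with its own region reducers and may weight partially covered pixels differently.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_ntl_within(point, pixels, radius_km):
    """Unweighted mean of pixel values whose centres lie within radius_km of point.

    pixels is a list of (lat, lon, ntl_value) tuples; returns None if no pixel qualifies.
    """
    lat0, lon0 = point
    inside = [v for lat, lon, v in pixels if haversine_km(lat0, lon0, lat, lon) <= radius_km]
    return sum(inside) / len(inside) if inside else None


# Made-up settlement point and pixel centres about 0 km, 1.5 km and 4 km to the north.
POINT = (35.0, 63.0)
PIXELS = [(35.0, 63.0, 1.0), (35.0135, 63.0, 0.0), (35.036, 63.0, 0.5)]
averages = {r: mean_ntl_within(POINT, PIXELS, r) for r in (1.0, 2.0, 5.0)}
```

Widening the radius pulls in more distant pixels, so the settlement-level average can move in either direction depending on the light intensity around the core.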
Quadratic results become more complex, and some of them differ, when the NTL average is based on a 2km or 5km radius instead of 1km.

In inbound areas:
- Using the 2km radius, a marginal increase in Net Inflow would increase the NTL level when the initial level of human activity is low enough (NTL below 0.22), while the 5km radius analysis suggests that this holds for NTL below 0.31, provided that the settlement is an important receiver of net inflows, i.e., the Net Inflow is high enough.
  - Using the 2km radius average for NTL values, equation (7) can be decomposed into two functions of past NTL, f(NTL) and h(NTL). One finds that f(NTL) is positive whenever NTL < 0.22 or NTL > 0.56, and that h(NTL) is positive when NTL lies in [0.25, 0.79].
  - Using the 5km radius average for NTL values, equation (7) can likewise be decomposed into the functions f and h. One finds that $f(NTL_{i,t-1}) > 0$ whenever $NTL_{i,t-1} < 0.31$, and that the function h is always negative. Hence, equation (7) is positive whenever past NTL levels are above 0.31 and the Net Inflow is large enough.
- Using the 2km radius, a marginal increase in Net Inflow would increase the NTL level in areas that already have some human activity (NTL > 0.56) as soon as the inflow of displaced persons is high enough, while the 5km radius analysis contradicts this finding. Using the 5km NTL average, there is no level of Net Inflow high enough for an increase in net inflow to raise NTL levels, even in areas with some human activity.
In outbound areas, a marginal increase in the net inflow of displaced persons would raise NTL levels whenever the net inflow is negative enough, in areas with high enough past NTL levels (above 0.56 for the 2km analysis, and above 0.31 for the 5km analysis).
This section further studies how displacement flows may affect NTL growth differently depending on the initial level of human activity and urbanization, e.g., depending on whether IDPs have to set up a camp or can settle in an existing urban center. While the initial regressions attempted to control for this by including the initial population level (in 2017), this section goes further by allowing the relationship between NTL and displacement to depend on initial NTL levels (as of January 1, 2018), or by replacing the main independent variable with displacement inflows as a share of the existing population.
There is, however, no clear evidence of a correlation between displacement flows and initial NTL levels. No correlation was observed between total displacement outflows and initial NTL levels, and the dynamic observed in inbound settlements seems to be mostly driven by outliers (Figure 5.0).¹¹ The impact of displacement flows on NTL growth might depend on the relative size of the settlement. We therefore test the robustness of our results by changing the main independent variable to the net inflow of displaced population as a share of total settlement population. Displaced households settling into a new settlement might have a very different effect on NTL growth (through changes in urbanization and human activity), depending on whether they have to build a new settlement or settle in an already urbanized area. In particular, the speed of infrastructure construction and access to additional services are likely to differ, resulting in different NTL growth.

Variables
For the sake of robustness, the analysis is therefore completed by replacing the independent variable with two alternative measures of the net inflow of persons as a proportion of settlement population. First, the variable Average Inflow Share measures the average share of displaced population hosted in a given period (i.e., the average net inflow of displaced persons hosted in a settlement as a percentage of the total population in a given period). Figure 5.1 shows that the density of the Average Inflow Share varies across settlements in Badghis, while summary statistics can be found in Table 5.1. Second, the variable Net Inflow Share is constructed as the sum of the net inflow of displaced persons divided by the cumulated population in the settlement (both summed over each observation available for 2018-2020). Figure 5.2 shows that the density of the Net Inflow Share varies across settlements in Badghis, while summary statistics can be found in Table 5.2.
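The two measures differ in how periods are weighted. The Average Inflow Share gives each period the same weight. The Net Inflow Share (total net inflow over cumulated population) implicitly weights periods by their population. The sketch below uses made-up numbers and assumes per-period net inflow and population counts are available for each settlement. The function names are hypothetical, and both functions return fractions, so multiply by 100 for percentages.

```python
def _check(net_inflows, populations):
    if len(net_inflows) != len(populations) or not net_inflows:
        raise ValueError("need matching, non-empty period series")


def average_inflow_share(net_inflows, populations):
    """Average Inflow Share: mean over periods of net inflow / population in that period."""
    _check(net_inflows, populations)
    shares = [x / p for x, p in zip(net_inflows, populations)]
    return sum(shares) / len(shares)


def net_inflow_share(net_inflows, populations):
    """Net Inflow Share: total net inflow divided by cumulated population."""
    _check(net_inflows, populations)
    return sum(net_inflows) / sum(populations)


# Made-up observations for one settlement (three survey periods).
inflows = [100, -50, 300]
pops = [1000, 1000, 2000]
avg_share = average_inflow_share(inflows, pops)  # (0.10 - 0.05 + 0.15) / 3
total_share = net_inflow_share(inflows, pops)    # 350 / 4000
```

Here the large inflow in the most populous period counts more in the Net Inflow Share (0.0875) than in the Average Inflow Share (about 0.067).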

Figure 5.1 Distribution of average net inflow of displaced persons in settlements (as share of population)
¹¹ The dynamic remains unchanged when including the top 1% of outflows, i.e., settlements with a net total inflow below -5,000 (more than 5,000 individuals having left the settlement, in net terms).
Note: The net inflow of persons can be negative if the population that migrates out of the settlement (either internationally or within Afghanistan) is larger than the number of displaced persons settling in.

Results
NTL growth may depend on the size of the displaced population relative to the total settlement population. We therefore test the robustness of our results by first accounting for the average net influx of displaced persons as a share of settlement population. That is, we first replace our main independent variable, Net Inflow, with the variable Average Inflow Share, representing the average net inflow of persons as a share of settlement population (see Section 4.1 for descriptive statistics). The results of the linear regression following equation (3) are displayed in column 1 of Table 5.3, and those of the quadratic regression following equation (6) in column 2. Results are then replicated using the Net Inflow Share (rather than the average) in columns 3 and 4.
The linear regression is robust to the inclusion of the alternative variables: a marginal increase in the net inflow of displaced population relative to settlement population does increase NTL, as long as the past NTL level is not too small. Assuming a linear relationship, equation (5) is positive as long as past NTL levels are larger than 0.24 (see column (1) of Table 5.3). In other words, an increase in the average net inflow of displaced persons (as a proportion of population) raises the NTL level as long as the past NTL level is not too small. Similarly, an increase in the total net inflow of displaced persons (as a proportion of cumulated population) raises the NTL level (equation (5) is positive) as long as past NTL levels are larger than 0.22. As a reference, all IOM settlements in Badghis recorded an NTL value above both thresholds at least once during 2018-2020.¹² The robustness analysis using quadratic regressions on both NTL and displacement measures yields mixed results (columns 2 and 4 of Table 5.3).
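The interaction-term discussion above suggests that the linear specification includes the displacement measure $X_i$ and its interaction with past NTL. Under that assumption, the marginal effect of $X_i$ is $\hat\beta_X + \hat\beta_{X \times NTL}\, NTL_{i,t-1}$, which is positive above the cutoff $-\hat\beta_X/\hat\beta_{X \times NTL}$. The sketch below computes such a cutoff and the share of settlements whose NTL ever exceeds it. The coefficients and NTL series are hypothetical, not the Table 5.3 estimates.

```python
def linear_cutoff(beta_x, beta_interaction):
    """Past NTL above which beta_x + beta_interaction * NTL_{t-1} > 0."""
    if beta_interaction <= 0:
        raise ValueError("the cutoff is a lower bound only when the interaction term is positive")
    return -beta_x / beta_interaction


def share_ever_above(ntl_by_settlement, cutoff):
    """Fraction of settlements whose NTL series exceeds the cutoff at least once."""
    hits = sum(1 for series in ntl_by_settlement.values() if any(v > cutoff for v in series))
    return hits / len(ntl_by_settlement)


# Hypothetical coefficients giving a 0.24 cutoff, and made-up monthly NTL series.
cutoff = linear_cutoff(beta_x=-0.012, beta_interaction=0.05)
series = {"A": [0.20, 0.26, 0.22], "B": [0.19, 0.21, 0.23]}
share = share_ever_above(series, cutoff)
```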
There are several cases in which a marginal increase in the Average Inflow Share (relative to cumulated population) increases NTL levels. Examples include inbound settlements with a large average net inflow and past NTL levels in a medium-low range, and important outbound settlements with either low or high past NTL levels.¹³
- In inbound settlements, a sufficient condition for equation (7) to be positive is that the past NTL level lies between 0.35 and 0.50. As a reference, this corresponds to around 20% of observations in Badghis. Another sufficient condition is that past NTL lies in [0.22, 0.50] and that the Average Inflow Share is sufficiently large.
- An increase in the Average Inflow Share (average net inflow of displaced persons relative to settlement population) will reduce NTL as soon as past NTL and the Average Inflow Share are high enough. A sufficient condition for equation (7) to be negative is that the Average Inflow Share ($X_i$) is high enough and the past NTL level is higher than 0.5. Around one quarter of Badghis settlements reached that NTL threshold at least once during the 2018-2019 period.
- In settlements that serve as important outbound areas, a reduction in the average outflow share (an increase in the average inflow share) would increase NTL in areas with either very low or medium-high levels of human activity. In outbound areas, a sufficient condition for a marginal increase in the Average Inflow Share to increase NTL is that the average outflow is large enough, in areas with low past NTL levels (below 0.22) or high past NTL levels (above 0.5).
There are several cases whereby a marginal increase in the total net inflow (relative to cumulated population) increases the NTL levels, e.g., in already urbanized outbound areas with high NTL, and in important inbound areas with medium human activity levels.
- In inbound areas, a sufficient condition for equation (7) to be positive is that past NTL lies in [0.26, 0.56]. It is negative if past NTL levels are larger than 0.55 and inflows are large enough. That is, a marginal increase in the total inflow of displaced persons as a share of cumulated population would decrease NTL in important inbound areas that already have a medium-high level of human activity.¹⁴
- As before, low past NTL values (below 0.26) or medium-high values (above 0.56) associated with a large negative Net Inflow Share (important outbound areas) also result in equation (7) being positive. That is, in urbanized settlements from which people flee massively, a marginal decrease in the number of people leaving the settlement (i.e., a net increase in the Net Inflow Share) is associated with an increase in NTL levels.
¹³ When $X_i$ = Average Inflow Share, equation-8 becomes
In 2020, 44,566 IDPs arrived in the Badghis province for various reasons. Table A3 shows the breakdown of the numbers of IDPs that arrived due to conflict, natural disasters, and their combinations.¹⁷ It additionally specifies the numbers of IDPs due to conflict (9,504) and due to conflict and conflict-natural disasters (30,571). OCHA reports conflict-induced displacements for districts and provinces of Afghanistan, and its dataset additionally includes dates of displacement. According to OCHA, in 2020 there were 66 incidents (dates) of displacement in the Badghis province (Table A4). Table A4 presents summary statistics for conflict-induced IDPs based on displacement incidents (not comparable to IOM data, because IOM's summary statistics are for settlements). As reported by OCHA, in 2020, conflict-induced IDPs settled in two districts of the Badghis province: Qala-e-Naw (7,882 people) and Ab Kamari (20 people) (Table A5). According to IOM, settlements in this district reported 9,350 IDPs that arrived due to conflict and conflict-natural disasters (but 0 IDPs when looking only at settlements that reported the arrival of IDPs due to conflict alone). This suggests that OCHA data may also include displacement due to natural disasters along with displacement due to conflict, even though the former is not specified. A correlation analysis of OCHA and IOM data on internal displacement was conducted on a limited number of comparable observations (districts) (Table A7). Pearson's correlation coefficient is 0.8162 (p<0.01), which, however, cannot be meaningfully interpreted given the small number of observations. In 2020, the number of IDPs (protracted IDPs) in the Badghis province decreased by 24,327 compared to 2018 (Table A8). This decline was driven by IDPs leaving the Qala-e-Naw (-67,888) and Ab Kamari (-693) districts. In other districts, the numbers of IDPs increased compared with 2018. These numbers should be interpreted with caution, as the government of Afghanistan disputes IOM's numbers of protracted IDPs.
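The sketch below shows why a correlation computed on a handful of districts is fragile. It computes Pearson's r and its usual t-statistic, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n-2$ degrees of freedom. With very few observations there are very few degrees of freedom, so even a large r carries little information. With only two observations, r is always ±1 (if both variables vary) and the test is undefined. The inputs are illustrative, not the Table A7 data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t-statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("the test needs at least three observations")
    if abs(r) >= 1:
        raise ValueError("undefined for a perfect correlation")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Two observations always give a perfect correlation, whatever the (made-up) values.
r_two = pearson_r([10, 20], [3, 5])
```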

Figure 1. IDPs located in Afghanistan (as of December 2020)

Figure 4. Visualization of average NTL in January 2020 in Badghis

Figure 7. Conditions for an increase in net displacement inflow to raise the NTL level.
NTL levels on Jan 1st, 2018

Figure 5.0. Scatter plot of total inflow, by initial NTL level
Source: Authors' computation based on IOM DTM 2020.

Table 1. Summary statistics of key variables in Badghis settlements across 2018-2020

Table 2. Regression of NTL on net inflows of persons in settlement (cumulated over 2018-2020)
Quadratic robustness. Expanding the previous analysis with a quadratic regression returns interesting and significant results. The results suggest a concave relationship between displacement flows and NTL growth, i.e., diminishing marginal returns of displacement flows on NTL. When comparing settlements of equal population size in 2017, the settlements showing the largest impact of displacement flows on NTL expansion received the lowest cumulated displacement inflows. Quadratic results are, however, more complex to interpret and depend on initial parameters; they are displayed in Appendix C. Yet, several situations exist in which a marginal increase in the net inflow of displaced persons within a settlement would increase NTL levels.

Table 3. Regression of total inflow on average NTL levels and growth

Table 4. Inference: predicted inbound using NTL variations

Table 5. Linear and quadratic regression of NTL on its lagged value and district fixed effects
Note: Regressions control for population in settlement in 2017, the share of settlement income derived from agricultural activities in 2020, and a constant.

Table 6. Regression of NTL on net inflows of persons in settlement (cumulated over 2018-2020)

Table C.1. Summary statistics of NTL levels as of January 1, 2018

Table 5.0.1. Summary statistics of NTL levels, averaged over 2km and 5km radii around GPS points

Table 5.0.2. Regression of NTL on net inflows of persons in settlement, 2km and 5km radius

Table 5.1. Summary statistics for the cumulative net inflow in Badghis settlements

Table 5.2. Summary statistics for the cumulative net inflow as a share of cumulated population

Table 5.3. Regression of NTL on the average and total share of displaced persons in population

Table A4. Summary statistics for conflict-induced IDPs in 2020

Table A5. Conflict-induced IDPs settled in Badghis province in 2020
CBNA reports arriving IDPs for settlements (in 2020, 1,069 settlements in the Badghis province), which are then aggregated into districts (7 districts in the region). OCHA's data only contain information for districts; in 2020, conflict-induced IDPs arrived in two districts of the Badghis province. Table A6 compares displacements reported by OCHA to those recorded by IOM for the two districts included in OCHA's dataset. For example, OCHA reported the arrival of 7,862 IDPs due to conflict in Qala-e-Naw district.

Table A6. Comparison of OCHA and IOM data on displacements in Badghis province in 2020

Table A8. Changes in the numbers of IDPs in 2020 vs. 2018 according to IOM data