Downscaling satellite night-time lights imagery to support within-city applications using a

16 For mapping and monitoring socioeconomic activities in cities, night-time lights (NTL) satellite 17 sensor images are used widely, measuring the light intensity during the night. However, the main 18 challenge to mapping human activities in cities using such NTL satellite sensor images is their 19 coarse spatial resolution. To address this drawback, spatial downscaling of satellite nocturnal 20 images is a plausible solution. However, common approaches for spatial downscaling employ 21 spatially stationary models that may not be optimal where the data are spatially heterogeneous. In 22 this research, a geostatistical model termed Random Forest area-to-point regression Kriging 23 (RFATPK) was employed to disaggregate coarse spatial scale VIIRS NTL images (450 m) to a 24 fine spatial scale (100 m). The RF predicts at a coarse resolution from fine spatial resolution 25 variables, such as a Population raster. ATPK then downscales the coarse residuals from the RF 26 prediction. In numerical experiments, RFATPK was compared with three benchmark techniques, 27 including the simple Allocation of pixel values from the coarse resolution NTL data, Machine 28

For mapping and monitoring socioeconomic activities in cities, night-time lights (NTL) satellite 17 sensor images are used widely, measuring the light intensity during the night. However, the main 18 challenge to mapping human activities in cities using such NTL satellite sensor images is their 19 coarse spatial resolution. To address this drawback, spatial downscaling of satellite nocturnal 20 images is a plausible solution. However, common approaches for spatial downscaling employ 21 spatially stationary models that may not be optimal where the data are spatially heterogeneous. In 22 this research, a geostatistical model termed Random Forest area-to-point regression Kriging 23 (RFATPK) was employed to disaggregate coarse spatial scale VIIRS NTL images (450 m) to a 24 fine spatial scale (100 m). The RF predicts at a coarse resolution from fine spatial resolution 25 variables, such as a Population raster. ATPK then downscales the coarse residuals from the RF 26 prediction. In numerical experiments, RFATPK was compared with three benchmark techniques, 27 including the simple Allocation of pixel values from the coarse resolution NTL data, Machine 28 Learning with Splines and Geographically Weighted Regression. The downscaled results were 29 validated using fine resolution LuoJia 1-01 satellite sensor imagery. RFATPK produced more 30 accurate disaggregated images than the three benchmark approaches, with mean root mean square 31 errors (RMSE) for the year 2018 of 13.89 and 6.74 nWcm -2 sr -1 , for Mumbai and New Delhi,32 respectively. Also, the property of perfect coherence, measured by the Correlation Coefficient, 33 was preserved consistently when applying RFATPK and was almost 1 for all years. The 34

Introduction 52
Human development is a crucial factor to consider when assessing a nation's degree of 53 development since it gives inhabitants equal chances and fair choices, extends their lives, and 54 improves their living conditions, health care, and education (Wang et al., 2021). In September 55 2016, the world committed to implementing the 2030 Agenda for Sustainable Development. The 56 Sustainable Development Goals (SDGs), according to Reid et al. (2017), strike a balance between 57 the economic, social, and environmental dimensions of sustainable development. Despite decades 58 of tremendous progress in eradicating poverty and fostering wealth, a sizable segment of the 59 world's poorest population still encounters difficulties to maintain an acceptable standard of living 60 in emerging nations, particularly Asia, Africa, and Latin America and the Caribbean. It appears 61 that regional and national differences have led to the unequal reduction of severe poverty in these 62 areas (Georgeson et al., 2016; Omar and Inaba, 2020). To achieve the goals of SDGs, we need 63 better ways to collect and interpret information about many aspects of human development in a 64 timely, accurate and appropriate manner. 65 The traditional approach to examining human growth and well-being is based mostly on survey 66 data, which includes information on income, consumption, health, education, and housing. These 67 surveys are usually carried out every three to five years, but collecting survey data is expensive 68 and tedious process. Between surveys, detailed socioeconomic data are still needed (Watmough et  69 al., 2013). Moreover, countries in war or extreme poverty may even lack these survey data for 70 years (Zhao et al., 2019). In addition, fewer than two census surveys in many developing nations, 71 such as African nations, were carried out in the decade leading up to 2000, limiting the construction 72 of nationally representative human development metrics (Jean et al., 2016). Additionally, several 73 nations, like India, have suspended measures like unemployment (Dasgupta, 2022). Another 74 limitation of these censuses is that population sizes between censuses are projected, frequently 75 with linear yearly growth rates, despite the fact that censuses are expensive and may only be 76 undertaken at sporadic intervals when resources are scarce. Despite the high levels of uncertainty 77 in the estimates, they are utilized to evaluate, for example, the dangers to public health and need 78 for health services. Additionally, censuses are unable to reflect accurately intra-annual changes in 79 a nation's socioeconomic conditions since they are not designed to do so (Bharti and Tatem, 2018). 80 Using new passively gathered data sources, such as information from satellite sensors, provides an 81 alternate method of monitoring socioeconomic processes. Such data can help address the challenge 82 of scaling up (i.e., increasing the temporal resolution of) traditional data collection efforts which 83 are generally very limited in frequency due to financial cost (Jean et al., 2016). Early studies used 84 satellite "night-lights" data to demonstrate that areas with more economic output tended to emit 85 more artificial light (Head et al., 2017). Nocturnal images, such as the Day-Night Band (DNB), 86 from the Visible Infrared Imaging Radiometer Suite (VIIRS) is a valuable source of satellite 87 imagery. The VIIRS is onboard the Suomi National Polar-orbiting Partnership (SNPP) satellite. 88 The ability for researchers to track socioeconomic activity is made possible by the worldwide 89 coverage and coarse spatial resolution of these data, which have pixels that are less than one square 90 kilometer in size. Additionally, nighttime lighting is consistently assessed across nations with 91 extremely diverse institutional capacity and is not prone to manipulation for political reasons 92 (Zhang and Gibson, 2022 However, the spatial pattern of NTL is diverse. For example, the light intensity differs depending 160 on the land use (Ye et al., 2021), and the spatial pattern of NTL intensity varies from geographic 161 region-to-region, even within the same area (e.g., city). 162 163 Earth-observed variables also may exhibit spatial heterogeneity in addition to spatial 164 autocorrelation (Jin et al., 2018). For such spatially diverse variables, the global model used in 165 ATPRK may be unable to adequately capture local characteristics in the multivariate data. In 166 essence, the global ATPRK model assumes that the process under inquiry is constant across space. 167 Where the data exhibit spatial heterogeneity a more flexible model is needed; one that permits 168 spatial non-stationarity in some model parameters. between the three approaches and RFATPK was performed. 203 2. The spatial downscaled NTL data were further applied to estimate the GNI as well as to 204 measure light inequality at the within-city scale by comparing them with equivalents using 205 the coarse spatial resolution NTL. 206 207 The remainder of this research paper is organized as follows. The research areas and the data used 208 are described in Section 2. The suggested downscaling technique is described in Section 3. We 209 give the results in Section 4. We expand on the suggested downscaling approach in Section 5 210 before presenting our conclusions in Part 6. 211 Mumbai Metropolitan Region, including Mumbai and its surrounding suburban area, is known as 220

Study areas and data
India's economic engine as it accounts for over 6.16% of India's GDP, providing 10% of industrial 221 jobs. More than 20 million people live in this territory today and this amount is 2) Yearly population count data were derived from the WorldPop website 234 (https://www.worldpop.org/) (accessed 10/01/2023) for the calculation of NLDI, the 235 estimation of GNI and to assist the spatial downscaling. The NLDI and GNI were used to 236 highlight the applicability and superiority of the downscaled NTL product compared to the 237 original coarse resolution NTL. 238 3) Landsat 8 data were used to obtain land surface temperature (LST). We selected Landsat 8 239 OLI/TIRS yearly median cloud-free imagery for the same years as for NTL. The nominal 240 resolution of the initial images was 100 m. 241 4) The global human settlement (GHS) layer is a human settlement map product covering the 242 entire world (Pesaresi and Politis, 2022). We used the GHS layers of 2015 and 2020 as 243 well as the average global building height (AGBH) product of 2018 (accessed 10/01/2023). 244 5) The results of the downscaling were validated using Loujia1-01 imagery. Loujia1-01 data 245 has a pixel size of, approximately, 120 m and wider spectral range compared to the VIIRS 246 NTL data (Liu et al., 2020). 247

Methodology 248
The Methodology is organized as follows: (1) Firstly, a brief introduction of the ATPRK is given. 249 (2) A detailed explanation of the proposed RFATPK and its parts (i.e., RF regression and ATPK) 250 follows. Additionally, a description of the benchmark methods is given and lastly the two 251 socioeconomic applications. Figure 2 summarizes the methodology as a series of successive steps 252 designed to meet the research objectives. The first part includes selection of the Inputs (the target 253 variable and the covariates), namely the NTL data, WorldPop product, the LST band from Landsat 254 8 and the GHS and AGBH layers, respectively. Then, the data were regressed utilizing RF 255 regression and the predictions were separated from the residuals. In the third part, the residuals 256 from the RF model were downscaled using ATPK. Finally, the prediction was added to the 257 downscaled residuals and the NTL raster layer at 100 m spatial resolution was produced. 258 259 Figure 2: Flowchart of RFATPK. The first part includes the Inputs which are the target variable 260 and the covariate. In the second part, the input data are regressed using RF regression. The third 261 part involves ATPK-based downscaling of the residuals. Finally, the prediction is added to the 262 downscaled residuals and the NTL raster layer at 100 m spatial resolution is produced. 263

264
ATPRK is a spatial downscaling method that applies a regression model to coarse spatial 265 resolution data and subsequently applies ATPK to enhance the spatial resolution of the residuals 266 (Wang et al., 2016a) . The regression component alone is insufficient for disaggregation because it 267 does not utilize fully the spectral characteristics in the observed low-resolution data. As an addition 268 to the regression step, ATPK-based residual downscaling is utilized to account for the spectral 269 characteristics of the coarse data. The regression model in ATPRK has two parts, the prediction and the residuals. The residuals can 278 be extracted as follows: 279 where λjl represents the weights for the prediction at fine scale that honor the sum-to-one constraint 285 ∑ =1 = 1. The weights can be calculated by lessening the error variance of the prediction. The 286 analogous Kriging matrix is depicted in Equation 6: 287 where is the block-to-block (i.e., area-to-area) variogram among coarse pixels Si and Sj, is 288 the point-to-point covariance between fine spatial resolution pixels sj and sj, is the ATP 289 variogram between high resolution pixel sj and coarse resolution pixel Sl and the μj are Lagrange 290 multipliers. The covariance can be produced from the variogram. 291 The error variance δ of the ATPK prediction for the sj at fine-resolution can be calculated as 292 follows: 293 is the area-to-point covariance between coarse spatial resolution pixels Sl and fine spatial 294 resolution pixels sj.

295
The generation of the point support variogram is considered the most crucial step in area-to-point 296 Kriging method, for which Wang et al., (2016b) provides the necessary details, including an 297 explanation of how to employ a deconvolution process. The target fine pixel and the original coarse 298 pixel can be used as point and area supports, respectively, in the ATPRK prediction, which can be 299 described as follows: 300

Random Forest area-to-point Kriging 301
In the presence of spatial variability in a region, the global regression approach in the original 302 ATPRK implementation is unsuitable for characterizing this variability. A non-stationary model 303 is more appropriate when the association between the target and variable and the covariates varies 304 geographically (Jin et al., 2018a(Jin et al., , 2018b enforcing ATPK requires inversion of a large matrix, which is computationally expensive. For the 320 downscaling process the R software and the package atakrig were utilized (Hu and Huang, 2020). 321 To generate the downscaled 100 m NTL, the RFATPK disaggregating approach of combining the 322 RF (Breiman, 2001) and ATPK (Kyriakidis, 2004) methods was developed. The spatial non-323 stationarity of the regression's residuals was taken into account by the RFATPK, as well as the 324 nonlinear association between NTL and the covariates. ATPK parts, the RFATPK forecast is: 333

Random Forest Regression modelling for the trend prediction 334
The RF is a non-parametric machine learning (ML) method for regression tasks (Breiman, 2001), 335 which has been applied to fields such as, population mapping and properties relating to the soil 336 (Cheng et al., 2022;Takoutsing & Heuvelink, 2022). Based on bagging method of the training 337 data, the RF constructs an ensemble or forest of individual and non-correlated trees, saves the best 338 randomly chosen variable combination for each node of each tree, and then uses an average of the 339 individual trees' predictions to make the final prediction (Cheng et al., 2022). 340 Since they offer more useful higher spatial resolution and richer textural information than the 341 response low resolution variable, the covariate(s) in RFATPK (e.g., the Population raster) are 342 utilized to detrend the ( ) and are crucial in sharpening. The regression stage aims to fully use 343 the fine spatial resolution textural and geographic information in the given data by characterizing 344 the correlation between each coarse response image and the fine predictors. assumption. The NTL spatial trend can then be produced at a downscaled 100 m spatial resolution. 355 Due to the availability of the predictors at the fine spatial scale, the RF regression prediction at a 356 location x at the fine spatial scale, that is, ̂1 ( 0 ), is calculated as: 357 It is crucial, when using RF, to fine-tune the model parameters (Takoutsing and Heuvelink, 2022). 359 First, we used R's caret package to conduct RF regression using all the covariates and the default 362 model settings. 500 trees, a node size value of 5 and a third of the total number of covariates (mtry) 363

Random Forest regression parameter fine-tuning
were included in the default model parameters. The entire study region was considered in this step. 364

Model calibration and fitting 365
The study region was divided initially into two sets, the training and a test set. The splitting of the 366 two sets was conducted based on a stratified random sampling. This is an efficient sampling 367 method because it captures the variability of multiple inputs of auxiliary information in the feature 368 space (Getis and Ord, 1992 tuned arguments for the ntree and mtry. 380

Spatial prediction at the fine spatial scale 381
The average of all measurements embedded in one of the end nodes of the tree serves as the 382 forecast of a single decision tree of RF for a new site x0. By branching through the tree depending 383 on the covariate values at x0, the end node may be located.

384
The RF prediction can be calculated by taking the mean of all tree forecasts. Because it is a 385 weighted linear combination of the measurements, it can be represented as: 386 where ̂1 ( 0 ) stands for the prediction, n, and are the number of measurements, the weights 387 and the NTL measurements, respectively. Note that the weights are obtained from the variables at 388 the observed and predicted location, even though this isn't stated explicitly in Equation 1. 389 (Takoutsing and Heuvelink, 2022). 390

Benchmark methods 391
In this research, the proposed approach was compared to three benchmark methods, namely GWR, 392 Machine Learning with Splines and the Allocation of raster values. The benchmark methods are 393 described below. 394 Prior to GWR, simple linear regression models were, thereafter, fitted to reveal the model's where, ̂0 (·) and ̂( ·) represents the estimated GWR coefficients with spatial locations centered 398 at fine pixel sj and coarse pixel Sj, respectively.

399
For GWR's kernel a Gaussian function was selected and the width of the kernel was determent 400 using an adaptive spatial kernel function (Chen, 2015). The Gaussian function describes the 401 relationship between the weight Wij and distance from center dij and is a continuous monotonically 402 decreasing function. The Gaussian function is used widely: where b and are the kernel bandwidth and the distance between two locations i and j, With the allocation-based method, a new fine spatial resolution raster (i.e., 100 m pixel size) is 416 created with null cell values, but with the same spatial reference system as the coarse resolution 417 raster and then the two rasters are properly overlaid. Then, the pixels of the newly created empty 418 raster are given a value corresponding to the pixel value of the overlaid coarse spatial resolution 419 raster. This approach, thus, represents the "do nothing" or "null" baseline and all other methods of 420 allocation should improve on this baseline if they add any useful information. 421 3.4 Two socioeconomic criteria 422 The use of NTL as a proxy to various socioeconomic indexes is a major application. Therefore, 423 the application of downscaled imagery to proxy the Gross National Income per capita and the 424 Night Light Development Index (NLDI) is meaningful to illustrate the necessity of downscaling. 425 Payments go toward a country's Gross National Income (GNI), which is comprised of the GDP 426 plus net revenues from employee compensation and foreign property income. The money that 427 foreign migrants send to their home nations is known as remittances (Ghosh et al., 2009). To 428 measure the association between the GNI and the NTL at the city scale, we sum all the lit pixels 429 of the NTL, where "lit pixel" means a radiance value equal to or greater than 1 nWcm -2 sr -1 . Then 430 we computed two linear regression models, one using the coarse resolution NTL as explanatory 431 variable and one linear model using the disaggregated NTL and compared their R 2 values (Gibson 432 and Boe-Gibson, 2021). The dependent variable in both cases was the GNI and it was measured 433 in 1000 US dollars. 434 The NLDI varies from 0 to 1, representing perfect equality and inequality, respectively. The two 435 geo-referenced gridded layer inputs to the NLDI were the population count raster and the NTL 436 image. 437 The brightness (NTL's pixel value) and population count were associated in tables created using 438 crosstabulation. In order to compute the NLDI, the two rasters were stacked and the joint 439 distribution of brightness and population count in cell was calculated. To measure equality in the 440 geographic distribution of lights, the Gini index was computed based on the statistical distribution 441 (i.e., the table containing the pixel values of NTL and Population, sorted by the NTL) according 442 to the formula: 443 where R and n represents the NLDI and the number of raster images, respectively, = Splines, over-fitting issues can be used to explain this outcome. Since the raw NTL coarse 470 reference data are known perfectly in the experiment, preservation of the original patterns is the 471 desired target. 472  RFATPK is clearly more precise than the three benchmark methods in terms of all three indices. 490 This is due to the fact that the sceneries under study are highly developed metropolitan 491 environments with a variety of impervious surfaces (such as buildings, roads, and vegetation), 492 which are better suited to being well described by a spatially non-stationary model. Machine 493 Learning with Splines yielded greater accuracy compared to GWR and Allocation-based 494 downscaling. The least accuracy resulted for GWR-based downscaling, in terms of all three 495 indices. 496

Downscaling prediction (450 m to 100 m) 497
To facilitate visual comparison, three zoomed sub-areas selected randomly and their corresponding 498 results are shown in Figure 6  subtraction. Compared to the other approaches, the Allocation-based downscaling method's 519 variogram exhibited the highest semivariance (Figure 7). The GWR-based downscaling approach 520 in Figure 9 provided the highest semivariance, while the Machine Learning with Splines-based 521 downscaling method produced the lowest semivariance. In comparison to GWR and Machine Learning with Splines, the suggested RFATPK generated the 531 best visual outcome among the three downscaling techniques and also had the attribute of perfect 532 coherence (Table 3). Allocation-based downscaling also preserves the property of perfect 533 coherence, but no new information is added and there is, consequently, no spatial variability in the 534 NTL intensity within the fine resolution pixels. 535  and New Niludipine Delhi. It can be seen that the index for Mumbai shows an upward trend which 582 means that light inequality is increasing. On the contrary, the NLDI index for New Niludipine 583 Delhi is decreasing with the exception for 2015. The data are consistent with the HDI index from 584 Global Data Lab which reveals an increase in the index through time (Table 6). 585 586

Gross National Income per capita 592
The GNI was measured only for New Niludipine Delhi for the years 2013 to 2019. It can be seen 593 from Table 7   Monitoring socioeconomic indicators at the city-scale is of great importance for governments and 655 policy makers. As such unbiased data at fine spatial resolution are a critical input to support policy 656 development and decision-making. To highlight the applicability of the downscaling method, two 657 socioeconomic applications were considered at the city-scale. 658

Night Light Development Index 659
The index is an estimation for economic and human development in a region. The strong 660 association between the NLDI and the HDI suggests the former index measures human 661 development, which is consistent with Elvidge et al. (2012). The results using the fused NTL are 662 encouraging and we suggest the downscaled data are suitable for measuring human development 663 at the city-scale. 664

Gross National Income per capita 665
The reference NTL data were less accurate at predicting yearly GNI than the downscaled NTL at 666 the city scale. The application of studies that demonstrate the efficiency of estimating such 667 socioeconomic indicators at the city level is called into question by the poor association between 668 coarse resolution nocturnal data and GNI and makes it difficult to understand how such data may 669 serve as a reliable indicator of changes in city-scale economic activity. The results provided here, 670 on the other hand, point to the downscaled NTL as a far more accurate way to quantify GNI and a 671 viable substitute for the index. 672

Future research 673
The point spread function (PSF) exists in every satellite sensor imagery. It has a significant impact 674 on image quality and sets a strict cap on how much information is included in satellite sensor 675 images . It is clear that the PSF can affect the downscaling process because 676 disaggregating methods aim to increase the pixel size by creating more (sub-) pixels than the 677 original image and thus, better describing the spatial content of a region. A variety of PSFs will be 678 evaluated in a future effort to reduce the uncertainty in the downscaling procedure. Another 679 important limitation is that NTL values cannot be determined from a single covariate, as shown by 680 the global model. This means that, more ancillary variables are more suitable for charactering NTL 681 intensity and may lead to more accurate prediction of the trend (Ye et al., 2021). Future research 682 will focus on incorporating more ancillary variables, mainly from the so called 'social pixels', for 683 example, geo-tagged Twitter data or geo-located POIs. Thus, fusion with social data at a fine 684 resolution should be tested in future. Lastly, as mentioned in the Discussion, Section 5.1, this 685 research did not take into account the scale effect. Therefore, studies in the future should need to 686 be designed for and check if the by accounting for the scale effect will improve the downscaling 687 predictions. 688

Conclusion 689
Spatial downscaling is widely used to transform remotely sensed images from coarse resolution to 690 fine resolution in order to track human activity. For the first time, a strategy for spatially 691 downscaling nocturnal pictures was presented in this study using RF and ATPK. The RFATPK 692 approach has the advantage of taking into consideration both the spatial correlation between the 693 response variable and the predictors as well as local spatial variation. To show the effectiveness of 694 this approach, it was used on yearly coarse NTL products in two separate Indian megacities. 695 The geostatistical RFATPK solution was compared against three benchmark algorithms in 696 experiments conducted on one experimental case in the two mega-cities. The results are 697 summarized as follows: 1) The three benchmark methods were outperformed by RFATPK, 698 demonstrating the utility of this technique for spatial sharpening; 2) RFATPK, consistently, 699 assures total coherence with the original coarse data, in contrast to two of the benchmarks, and 3) 700 due to its spatially non-stationary nature, RFATPK was able to lower the residual variance in 701 comparison to a single, global regression model. The encouraging results suggest that RFATPK 702 can produce images that are suitable for socioeconomic analysis at the city-scale, as illustrated 703 when comparing a human development index using coarse-resolution NTL data against fine-704 resolution nocturnal lights. Indeed, the GNI index was better approximated using the downscaled 705 NTL data. Another application suggesting that the disaggregated NTL are more suitable for fine 706 scale (social) applications was the measurement of wellbeing by means of light inequality. The 707 results implied that using the proposed solution, the nocturnal satellite sensor data are closer to the 708 values of the official statistics (i.e., HDI). According to the results, our method can be generalized 709 worldwide (i.e., to other cities) and for a variety of social science applications. 710