Combining Luojia1-01 Nighttime Light and Points-of-Interest Data for Fine Mapping of Population Spatialization Based on the Zonal Classification Method

Fine-scale population spatial distribution plays an important role in urban microcosmic research, influencing infrastructure placement, emergency evacuation management, business decisions, and urban planning. In the past, nighttime light (NTL) data were generally used to map the spatial distribution of the population at a large scale because of their low spatial resolution. The new generation of Luojia1-01 NTL data can be used for fine-scale social and economic analysis with its high spatial resolution and quantitative range. However, due to the geometry and background noise of the data themselves, the accuracy of the original NTL data is still low. Points-of-interest (POI) also can be used to map the population spatialization, but the indicative relationship between the POI and population is not clear, especially in rural and urban areas with different landscape structures. To solve the above-mentioned problems, this study proposes an improved nighttime light (INTL) index to better use the Luojia1-01 NTL data. Meanwhile, a zonal classification model based on INTL and impervious surface area is proposed to distinguish urban and rural areas. Compared with previous research and existing datasets, our result had the highest accuracy (R² = 0.86). This study explains that the INTL index is applicable to population spatialization research with the emergence of high-resolution and multispectral NTL satellite data. Moreover, the zonal classification model in this research can significantly improve the accuracy of population spatialization in rural areas. This study provides a possible way to use NTL and POI data in other social and economic spatialization research.


I. INTRODUCTION
P OPULATION data are the basic data sources reflecting human activities and are widely used in ecological and environmental protection, disaster risk assessment, business decision making, and regional planning and development [1], [2], [3], [4]. The census, based on the administrative area as the unit of statistics, is the major way of obtaining population information in most countries. It is authoritative, systematic, and normative [5], [6]. However, the boundaries of the administrative areas are inconsistent with those of natural units (watersheds, natural hazards affected area, etc.) in actual research, and the differences make it difficult to integrate and comprehensive analyze multisource data at a fine scale [7]. Gridded population distribution data can solve the above-mentioned problems [8], [9], [10].
Many methods have been proposed to disaggregate population data from administrative to grid units [11], [12]. One of the most common methods is dasymetric mapping, whose main idea is to establish models between auxiliary data and population data to generate weight layers and disaggregate statistical population data according to the weight layers [13], [14]. Meanwhile, remote sensing data are becoming important auxiliary data because of their large coverage, rapid updating, and public availability [15], [16], [17]. Nighttime light (NTL) data as useful remote sensing data with spatial morphology information and intensity information have been proved to be significantly correlated with large areas of population distribution and have been widely used in generating gridded population distribution [18], [19]. However, the Defense Meteorological Satellite Program and Operational Linescan System (DMSP-OLS) NTL images have some inherent drawbacks and quality problems, such as blooming, saturation, and coarse resolution (∼1 km) [20], [21], [22], [23]. The Suomi National Polar-orbiting Partnership (Suomi-NPP) was launched in October 2011, which acquired Visible Infrared Imaging Radiometer Suite Day/Night Band (VIIRS/DNB) NTL data is considered to be the successor product to DMSP/OLS. There is no saturation of VIIRS/DNB NTL data and its spatial resolution (∼750 m) has been significantly improved. Compared with DMSP/OLS, VIIRS/DNB NTL data has become much more effective in reflecting social and economic activities. Nevertheless, the moderate spatial resolution This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ still limits the ability of VIIRS/DNB NTL data to capture urban detail, which restricts the mapping of population spatial distribution at fine scale [24], [25]. Launched in 2018, the professional NTL satellite Luojia1-01 can provide remote sensing images with a spatial resolution of 130 m, which can convey more detailed structural information within cities [26], [27]. Furthermore, it does not have a saturation effect, and the "blooming" effect is reduced due to the increase in spatial resolution. Therefore, it has a more accurate description of the extent and intensity of human activity and has greater advantages in social and economic research. Previous studies have mostly focused on the direct use of Luojia1-01 NTL images [23], [28]. However, several papers pointed out that when high quantification level NTL images are used, the contrast of the low and high areas in the image is enhanced [18], [29], [30]. The views suggest that direct use of Luojia1-01 NTL images would result in overestimation of the population in high light areas and underestimation of the population in low light areas. Therefore, the accuracy of the population spatialization is not high when directly using Luojia1-01 NTL data, and the data quality needs to be improved. In this context, one of the contributions of this study is to propose an improved nighttime light (INTL) index to improve the accuracy of the population spatialization.
On the other hand, the geospatial Big Data with specific geographical coordinates and semantic information can be a useful supplement to remote sensing data [4], [31], [32]. As a representative of geospatial Big Data, points-of-interest (POI) data have the characteristics of large volume, wide variety, and high positioning accuracy [33], [34], [35]. Yang et al. [36] conducted experiments in Zhejiang and concluded that the combination of POI and multisource remote sensing data significantly improved the accuracy of the population spatialization. However, the relationship between the POI and population may vary from one region to another because different regions or environmental conditions have different spatial structures and population aggregation patterns [19]. For example, obvious differences exist in architectural patterns and population distribution between urban and rural areas. In this research, we proposed a threshold based on the INTL and the impervious surface area (ISA) index to effectively identify areas with large structural differences. We performed zonal classification based on the above-mentioned index and established different models.
In summary, the main objective of this study was to improve the accuracy of the population estimation. We established INTL index to make better use of the Luojia1-01 NTL images. At the same time, we used a zonal classification model to ensure the POI had a better population interpretation effect. In this research, we constructed a geographically weighted regression (GWR) to obtain the population from street scale to grid level based on the above-mentioned model. This result not only provided a fine scale population distribution but could also benefit other social and economic spatialization research.

A. Study Area
Beijing is a world-famous ancient capital and a modern international city [37]. By 2020, the city is expected to cover an area of 16410.54 km 2 (Fig. 1). The resident population of Beijing is approximately 21893100 (approximately 1344 persons/km 2 ). The level of economic development and social functions of each region in Beijing are different, resulting in varying population densities. For example, the Xicheng district has the highest population density of 21 818 persons/km 2 , while the Yanqing district has the lowest density of 173 persons/km 2 . The population distribution of Beijing has spatial heterogeneity and complex characteristics. Therefore, selecting Beijing as the study area has reference significance for other large cities with a very large population, strong spatial heterogeneity, and rapid development in terms of population spatialization.

B. Datasets
The data in this article are divided into remote sensing data, geospatial Big Data, and validation data. Table I lists all the data used in this study, including the remote sensing image data (Luojia1-01 NTL, Landsat 8), the geospatial Big Data (POIs, OpenStreetMap data), the census data, and the validation data (WorldPop and LandScan). The Luojia1-01 NTL satellite provided images with a spatial resolution of 130 m, which can more accurately represent the spatial information inside cities and has irreplaceable advantages in spatial scale analysis. The Landsat 8 satellite has a 16-day revisit period. The Operational Land Imager (OLI) includes nine bands with a spatial resolution of 30 m, including a panchromatic band with a spatial resolution of 15 m [38]. Two cloud-free images of Landsat 8 from January 2020 were selected in this study to extract ISA information. The POIs for this study were obtained from the Amap service, which is the most used and largest online mapping service in China. We obtained more than 3 00 000 records in the study area, including catering services, public facilities, companies, transportation facilities, commercial institutions, commercial spaces, scientific, educational and cultural services, medical care services, accommodation services, etc. OpenStreetMap (OSM) is a type of volunteered geographic information. After years of accumulation, OSM now has detailed geographic information for most regions of the world. This study used OSM data to eliminate the nonresidential areas, such as roads and scenic spot information.

III. METHOD
The target of the study was to generate an accurate population distribution map with a spatial resolution of 100 m. Fig. 2 illustrates the flowchart of the entire study. The following four steps were taken: 1) Data preprocessing, which included extracting the ISA, preprocessing the Luojia1-01 NTL images, and processing the POI data. 2) Based on the NTL and ISA, we obtained the results of the zonal classification. 3) We constructed regression models using GWR with the INTL and POI as independent variables and the census data as the dependent variable. The population was disaggregated into grids, according to the regression coefficients. 4) The accuracy of our model was assessed.

A. Extracting the Residential ISA From the Landsat 8 Images
The ISA represents major sites of human activities. There are already many methods to accurately extract the ISA using Landsat [39], [40], [41], [42]. Fig. 3 shows the framework for extracting the residential ISA.
In this study, we used the linear spectral mixture analysis method to analyze the images, and three fraction images were generated, representing different endmembers [39]. The three endmembers were high-albedo objects, low-albedo objects, and vegetation, respectively. We generated high-albedo and lowalbedo images according to the selected endmembers. The ISA exists in both high-albedo and low-albedo images. At the same time, shallow water, wet soil, and impurities were also assigned to the two images. Therefore, subsequent removal was necessary. We used the normalized difference water index to remove the water [43]. The NTL was used to remove the remaining non-ISA, such as landslides, poor vegetation coverage, and water, which can easily be misclassified as ISA [44]. The roads, squares, parking lots, airports, etc., were classified as ISA, but they are not places where resident populations live. Therefore, in order to estimate the population more accurately, these areas in the ISA needed to be removed via the OSM data [8]. Finally, we used Google Earth images to evaluate the accuracy of the ISA. The verification samples were selected using the random sampling method. A total of 600 samples were selected, including 400 non-ISA samples and 200 ISA samples. The accuracy was evaluated using an error matrix [45], [46].

B. Luojia1-01 Preprocessing and the Development of an INTL Index
The Luojia1-01 satellite, launched in June 2018, provided images with a spatial resolution of 130 m. The data are freely available from the Hubei Provincial Data and Application Centre (http://59.175.109.173:8888). We selected the cloud-free image on 6 September 2018 and the image was a single-phase image. According to the information provided by the open download website, we converted the digital numbers (DNs) to radiance values after absolute radiation correction [26], [27]. Compared with the DMSP-OLS NTL data (6 bit), Luojia1-01 NTL data (15 bit) have a higher quantification, which creates a large difference between the maximum and minimum values. The Luojia1-01 NTL data were the system's geometric correction product, and the positioning accuracy was relatively low [47]. Moreover, there was still residual background noise in areas such as forests and water bodies. These phenomena will have a negative impact on data processing and are not suitable for direct population spatial research. Therefore, we processed the Luojia1-01 images as follows: 1) Reprojection and resampling: we reprojected the extracted images to the WGS 1984 UTM zone 50N. Next, the nearest neighbor method was used for resampling the images to a resolution of 100 m. 2) Geometric correction: taking the Landsat 8 image as the reference image, by selecting several evenly distributed road intersections on the image, geometric correction processing was carried out to keep the image offsetting within one cell size. 3) After absolute radiation correction using (1), we converted the DNs to radiance values.
where DN is the brightness value of the pixel, and L is the radiant value, the unit of which is W/(m 2 · sr · μm).

4)
Denoise: in order to remove background noise, we used (2) to unify the NTL data. Subsequently, there were still some extremely high values. In order to reduce their influence, we used the threshold method to remove the background noise and the extremely high values. Here, we set the maximum threshold to 900, and the minimum threshold was set to 0.92 [27] where ω is the bandwidth; the range is 460∼980 nm; ω = 0.52 μm; and L is the radiance after the unified dimension. It is worth noting that in some human residential areas, such as residential quarters, shanty towns, and especially rural resident areas, the light sources mainly come from indoor lighting. Compared with areas that directly emit light, the radiance value in the residential area is low; hence, the intensity of human activities in these areas could be underestimated, resulting in the inaccuracy of the population distribution [48]. Therefore, we proposed an INTL index to enrich the spatial variation information of these areas, which can reflect the population activity level more effectively. In this study, we chose logarithms and square roots to smooth some of the extremely high brightness values. We added a value of 1 in the formula to keep the value positive.
The INTL can be demonstrated using the following formula: The NTL is the radiance value of the Luojia1-01 NTL data after preprocessing.

C. Kernel Density Processing for the POI Data
The POIs in our study were obtained from the Amap service (https://www.amap.com), which is the most used and largest online mapping service in China. In 2020, there were more than 3 00 000 records in the study area, covering 23 categories. The POI has the characteristic of reflecting human activities at the micro scale, which makes it possible to describe more fine-grained land use types.
First, data cleaning was carried out to remove incomplete and repeated data from the POIs, and 9 03 544 effective POI records were retained. Based on previous studies and the actual condition of the study area [3], [49], we reclassified these POIs into four categories, as shown in Table II. These four categories were based on convenient transportation, complete living facilities, and complete services. POIs are point data with discrete and discontinuous characteristics; the kernel density method was used to transform the discrete POI into smooth and continuous density raster data. After kernel density processing, the POI reflected the spatial aggregation state quantitatively, which is convenient for statistical analysis. This study selected the bandwidth of 1.5 km to generate the POI density layers with a 100 m spatial resolution.
The Luojia1-01 NTL, the residential ISA, and the four categories of POI density layers were converted to the WGS-84 coordinate system and projected into the Universal Transverse Mercator coordinate system. Then, the nearest neighbor method was used to resample the images to a resolution of 100 m. In the process of population spatial distribution, the principle is that

D. Threshold Based on the INTL and ISA Index
Due to the spatial heterogeneity between urban and rural areas, the performance of modeling factors shows discrepancy; the same value is different in terms of the population distribution and human activity intensity, which leads to the low accuracy of the model. Therefore, we proposed a threshold based on the INTL and ISA index (TBNII), representing the intensity of the INTL in subdistricts, to classify each subdistrict into urban and rural areas. The formula for the TBNII is as follows: where T NL i represents the total value of INTL in the i th subdistrict, and ISA i represents the total area of the ISA in the ith subdistrict. The higher the TBNII, the greater the activity intensity of the subdistrict.

E. Constructing the Population Model
According to previous studies, the distribution of the population with spatial nonstationarity leads to poor accuracy of those models. GWR is an effective way to examine spatial nonstationarity [50], [51]. Compared with global linear regression, the GWR model takes into account the spatial location of data in the regression process [52], [53]. Only data within a certain distance of the regression point are considered, and the closer the distance, the greater the effect on the regression point. Therefore, there are different regression coefficients at each regression point, which means that each regression point has an independent regression equation. 1, 2, . . . , m, j = 12, . . . , n) where i illustrates the ith subdistrict; m is the number of subdistricts participating in the regression; j illustrates the jth independent variable; n is the number of independent variables; (u i , v i ) indicates the spatial geographic center coordinate of the ith subdistrict; β 0 (u i , v i ) represents the constant term; x ij illustrates the jth variable in the ith subdistrict; ε i illustrates the error term of the ith subdistrict, ε i ∼ N (0, σ 2 ); and β j (u i , v i ) represents the local regression coefficients. The regression coefficients are estimated by the following equation: X illustrates a vector of independent variables; Y illustrates a vector of the dependent variables; and W (u i , v i ) is a spatial weight matrix, whose diagonal elements are w lc [54]. In this study, the matrix was calculated by Gauss's function, represented by the following equation: where d lc illustrates the Euclidian distance of the subdistrict geographic center between l and c. With the change in d lc , w lc varies. That is, the closer the distance, the higher the weight. b illustrates the bandwidth, which determines the maximum distance from the ith region center. Different bandwidths have different weight decay rates. With the increase in distance, the weight with a large bandwidth decays slowly, while the weight with a small bandwidth decays quickly. In this study, the adaptive bandwidth was adopted to determine the number of nearby subdistricts. Cross validation was used to optimize the bandwidth [55]. The population GWR model can be expressed as follows: where P OP i illustrates the ith subdistrict; RE i is the value of residences in the ith subdistrict; IN i is the value of institutions in the ith subdistrict; SE i is the value of services in the ith subdistrict; and T R i is the value of transportation in the ith subdistrict. Therefore, we obtained the regression coefficients for independent variables on each street, which are β 1 , β 2 , β 3 , β 4 , β 5 in (8).
Disaggregating population from subdistrict-level to grid-level is based on the following assumption: the modeled relationship between population and independent variables derived from district-level data retains at grid-level.
The trained model and regression coefficients were used to redistribute the total population. We calculate the population in each grid with 100 m resolution by multiplying coefficients with the values of the corresponding independent variables at each grid.

F. Accuracy Assessment
The coefficient of determination (R 2 ), mean absolute error (MAE), and root mean square error (RMSE) were used to assess where n is the total number of subdistrict units; P OP estimated represents the estimated data of the subdistrict; P OP census denotes the census data of the subdistrict; and P OP i denotes the average value of the estimated data. Meanwhile, to highlight that our new population map holds a competitive accuracy, we introduced WorldPop and LandScan for validation.

A. Residential ISA Distribution Analysis and Accuracy Evaluation
The ISA is a key factor of land use and land cover, which is representative of the building footprint and the human settlement. However, the ISA information contains other areas such as roads and parking lots [2], [8]. To obtain more accurate data, we trimmed the refined ISA to obtain the residential ISA by excluding roads, parking lots, and other nonresidential data, which were obtained from OSM. It is important to extract the ISA accurately. As shown in Table III, the accuracy of the ISA was good with an overall accuracy of 92.17%. The results of the residential ISA are shown in Fig. 4.
The fractional ISA has a value from 0 to 1, representing the percentage of land occupied, and was also used as a variable in this study. The ISA was abundant in the relatively flat central, southern, and eastern parts of Beijing. In the mountainous areas of the north and west, the distribution of ISAs was sparse.   large difference between the low regions and the high regions (the highest was close to 400). Because of the large difference in the Luojia1-01 NTL data values, the urban and the rural/suburb population distribution was not well represented. Fig. 5(b) shows that the INTL effectively smoothed the extremely high values and improved the low values. It also better distinguished the low-value area from the nonvalue area and reduced the large difference between the high and low values. Thus, the INTL significantly improved the accuracy of the population distribution.

C. Results of the Zonal Classification Based on TBNII
We backward the optimal value of TBNII from the accuracy of population spatialization. If R² is higher, it means that the model is less confusing, which represents the best differentiation effect of population aggregation pattern. That is, the discrimination  between rural and urban works is better. In this study, 0.1 was used as the interval unit to traverse different thresholds; the R 2 fluctuated significantly with the adjustment of the threshold. As shown in Fig. 6, the fluctuating R 2 first increased and then decreased. When the TBNII value was 1.2, the model had the highest accuracy with an R 2 at 0.858. Thus, we used 1.2 as the threshold to classify the urban (TBNII>1.2) and rural (TBNII <1.2) areas. The results of the regional classification based on the TBNII are shown in Fig. 7, the central urban area was transformed into one class and the surrounding areas into the other class, which was consistent with the development of Beijing.

D. Model Comparison and Accuracy Assessment
M1 and M2 are two commonly used products of population spatial distribution. M3 represents the model combined original Luojia1-01 NTL data with POI data. M4 represents the improved model, which performs logarithm and square root processing according to the characteristics of Luojia1-01 NTL data. M5 represents the model proposed in this study. Table IV shows the accuracy comparison of population mapping models. Overall, M5 had the highest adjusted R-squared and the lowest MAE and RMSE. The results showed that the INTL index improved the model accuracy, and the zoning strategy also significantly improved the model accuracy. Meanwhile, the above-mentioned three models were all better than WorldPop and LandScan in terms of the population spatialization. Fig. 8 shows the scatterplots between the estimated population and the census data at the subdistrict level of Beijing. M5 had the minimum deviation between the estimated population and the census data in urban areas (i.e., subdistricts with a population greater than 20×10 4 ). In the rural areas with a smaller population, M5 also performed best.
There was a mismatch between Luojia1-01 NTL data and other data of the time-scale, because we had to use the night light data from 2018. Nevertheless, in the population spatialization model proposed by the study, the Luojia1-01 NTL data was not a single independent variable and was only used as a weighting layer for redistribute the total population, which reduced the influence of the temporal difference to a certain extent.

E. Mapping Results of the Population Spatial Distribution
The spatial distribution result of the gridded population in Beijing in 2020 is shown in Fig. 9. The grid-level population data better expressed the spatial heterogeneity of the population distribution. The population was mainly concentrated in the central urban areas. The spatial distribution of the population in Beijing was uneven. The population density gradually decreased from the central city to the surrounding areas. The central, eastern, and southern regions were flat places with dense populations, while the western and northern mountainous areas were sparsely populated, determined by the characteristics of Beijing's natural environment and socioeconomic development. Compared with the central, southern, and eastern plains, the mountainous areas in the west and north are not suitable for large populations to inhabit.

A. Efficacy and Potential of the INTL
The Luojia1-01 NTL performs better when reflecting social and economic activities because of its high spatial resolution, no saturation, and lower blooming effect. At present, many studies have used Luojia1-01 NTL images to analyze socioeconomic parameters such as the gross domestic product (GDP) [56], urban activities [47], poverty estimation [57], and urban transformation [58]. However, there are still some limitations in Luojia1-01 NTL, such as the existence of noise background, which creates a large difference between the extremely high and low values. Therefore, we needed to process the extremely high values and background noise to enable the Luojia1-01 NTL image to perform better.
To accomplish this goal, we used the square root and logarithm transformation to suppress the high values and enhance the low values in the Luojia1-01 NTL, which we called the INTL index. This index better represented the spatial distribution of the population. As indicated in Table V, the model based on the INTL had a higher accuracy than the model based on the NTL. The INTL has two main purposes. The first purpose is to enhance the low nightlight information, especially in rural areas. INTL can provide richer spatial variations in population distribution between rural and urban areas. The second purpose is to suppress the extremely high values, especially those values in urban commercial centers. If the original Luojia1-01 NTL data are directly used to estimate the spatial distribution of population, it tends to distribute the population excessively into the area with extremely high NTL values. Therefore, INTL can alleviate the overestimating problem in some regions.
We used skewness to analyze the validity of the INTL, and a positive bias means that most values are to the right of the average. The skewness of the NTL and INTL were 10.99 and 0.52, respectively. The INTL greatly alleviated the deviation of the NTL, indicating that the INTL enlarged the brightness variation degree of the NTL and enriched the spatial variation information. To verify the superiority of the INTL, we analyzed the correlation of the INTL and NTL with the population. As shown in Table V, the correlation between the INTL and the population was higher than that of the NTL in all regions, and the increments in urban and rural areas were 0.16 and 0.08, respectively.
The excellent performance of the INTL was verified in this study. It can be extended to other social and economic research, including the extraction of built-up areas and GDP assessment. It certainly provides a new data processing method, which can be used as a reference for other NTL satellite data, such as Jilin1-07 [49], EROS-B [18], and SDGSAT-1 [59].

B. Importance of the Zone Strategy
At present, the POI is the most widely used geospatial Big Data for population spatialization [32]. Previous studies have pointed out that POI contributes significantly to fine-scale population distribution [35]. However, few studies have noted that the regression relationship between the population and the POI may be different due to the variance in urban structures in different regions, although this phenomenon has been put forward in the study of population spatialization using NTL data [19], [51], [60], [61]. In rural and urban areas with different spatial structures, the same NTL data represent different populations. The TBNII used in our study reflected regional differences according to the average NTL data of the residential ISA. It is assumed that the higher the TBNII, the greater intensity of social and economic activities. Table IV shows that the accuracy of the zonal classification model was significantly improved, indicating that adopting a zone strategy was necessary. Considering the strong spatial heterogeneity in urban and rural areas, building the relationship between independent variables and dependent variables in one regression model is not suitable. Therefore, when the NTL data and POI are used as representative variables, the areas with different structures should be modeled separately.  As shown in Table VI, before the implementation of the zonal classification strategy, 41.38% of the subdistricts had an R 2 higher than 0.75. After the zone strategy was implemented, 71% of the subdistricts had an R 2 higher than 0.75. As shown in Fig. 10, the accuracy of all areas was greatly improved, especially in rural areas. In general, the implementation of the zonal classification strategy significantly improved the regional population spatialization accuracy, which is also meaningful for other spatialization research using POI and NTL data.
We can also see from Fig. 10 that there are differences in the spatial distribution of Local R-squared. Local R-squared varies from region to region and had higher values in the surrounding area of the city center. In the future study, we will introduce GW framework to explore and analyze the influence of each variable on population to explore the spatial heterogeneity [62].

C. Limitations and Recommendations for Future Work
We developed an INTL index and used the zonal classification method to build a model between the POIs, INTL, and subdistrict population, which greatly improved the accuracy of the population spatial distribution. However, our research had certain limitations. The grid-level error may be higher, as it was difficult to obtain the grid-unit census data for Beijing; hence, we were unable to conduct more in-depth accuracy verification. Although the global population dataset can provide grid-level population data, it is still difficult to use as an absolute reference value. There are many derived properties of POIs, such as the number of types and density. The selection of the POI category also needs to be considered. When we used the kernel density method to process the POIs, the search radius for all types of POIs was set as 1.5 km. In fact, the impact radius of POIs varies greatly depending on the type, for example, large hospitals and small clinics. At the same time, some errors and omissions were observed in the ISA extracted from the satellite data, which also led to inevitable errors. The Luojia1-01 satellite stopped operating in 2019, which limits the use of this data in the long term.
The spatial distribution of the population is a complex process influenced by the natural environment, economic conditions, and social development [63]. Therefore, more variables should be considered when constructing the population estimation model. In future research, we need to consider further geospatial Big Data, such as mobile communication positioning data, POIs, and check-in data, to distinguish commercial, industrial, and residential areas and villages. Building height information is also important in population estimation but was not used in this study [3]. High-resolution stereo images, which can be used to obtain large-scale and high-precision building height, would have more potential to estimate the population density. Furthermore, the spatial structures in different cities cause differing results for population distribution. Thus, more studies using a combination of remote sensing and geospatial Big Data should be conducted in the future.

VI. CONCLUSION
In this study, the grid level population spatial distribution of Beijing in 2020 was mapped based on the POIs, Luojia1-01 NTL data, and census data at the subdistrict level. The fine-scale population spatialization dataset provides rich information for urban planning, resource allocation, and environmental protection. According to the results of this study, the main conclusions can be summarized as follows: 1) The INTL suppressed high values and enhanced low values in the Luojia1-01 NTL image through using square roots and logarithm transformation. The transformed image better represented the spatial distribution of the population. 2) After zoning the study area into rural and urban areas based on the TBNII, different GWR models were established, which decreased the confusion between the dependent variables and the independent variable. The zonal classification model reduced the overestimation of the population in rural areas and the underestimation in urban areas. 3) Compared with the WorldPop, LandScan, and previous models based on the NTL and POI, the zonal classification model had the highest R 2 and the lowest MAE and RMSE, and the accuracy of the model was significantly improved.
In future research, we will consider high spatial and multispectral satellite imagery in the model as auxiliary data. Meanwhile, we will extend the population spatialization to a larger scale such as for the whole of China.