Correlation Analysis between NPP-VIIRS Nighttime Light Data and POIs Data: a Comparison Study in Different Districts and Counties of Nanchang

The study of urban spatial structure is important for urban planning and for understanding the relationship between people and urban infrastructure. This paper used the 2016-2019 NPP-VIIRS nighttime light (NTL) data and points-of-interest (POIs) data of Nanchang City to conduct a pixel-level correlation analysis for the districts and counties of Nanchang. Firstly, the NPP-VIIRS and POIs datasets were preprocessed to produce an NPP-VIIRS annual average dataset and a POIs kernel density dataset, respectively. Secondly, both datasets were logarithmically transformed. Finally, they were divided by administrative district for linear regression analysis. The experimental results showed that the correlation coefficients between NPP-VIIRS NTL data and POIs data gradually decreased from urban to rural districts.


Introduction
Urban spatial structure is an active research field in urban geography. A good urban spatial structure can promote harmonious development between people and urban land. However, the rapid development of cities causes many problems: unplanned urban expansion hinders the optimization of urban functions and industrial layout. Nighttime light (NTL) remote sensing can capture weak light signals at night. It indirectly reflects the intensity of human activities on the surface and provides a special perspective for the study of urban structures. NTL data has been widely used to study regional economic development imbalance [1], rural power consumption [2], natural disaster assessment [3], the identification of urban centers [4], and so on.
Points-of-interest (POIs) contain rich attribute information, have high precision and are easy to obtain. Kai et al. used multi-source datasets including POIs to evaluate poverty [5]; Tu et al. used POIs, community media check-ins and mobile phone records to develop urban vitality indicators [6]; Chen et al. used POIs data to map the urban spatial structure [7]. At present, a large number of studies use NTL data and POIs data together to analyze the urban spatial structure. Ge et al. used NTL data and POIs data to explore the city center [8]; Liu et al. used NTL data and POIs data to map the urban night leisure space [9]; Sun et al. used multi-source data such as NTL data and POIs data to classify built-up areas [10].
This paper takes Nanchang as the study area and divides it into districts and counties according to its administrative divisions. The correlation between the NTL data and the POIs data of each district and county is then quantitatively estimated by the Pearson correlation coefficient under linear regression analysis. Finally, the reasons for the inconsistency between NTL data and POIs data are discussed, which helps to quantitatively analyze the internal spatial structure of the city.

Study Area
Nanchang is the capital of Jiangxi Province. It is located in East China, in the north of Jiangxi Province. According to the website of the Nanchang Municipal People's Government, Nanchang City currently governs three counties (Nanchang, Jinxian and Anyi) and six districts (Donghu, Xihu, Qingyunpu, Qingshanhu, Xinjian and Honggutan), as shown in Figure 1.

Datasets
The datasets used in this study included NPP-VIIRS NTL data, POIs data and administrative boundary data. The NPP-VIIRS NTL datasets (2016-2019) were provided by NOAA/NGDC (https://eogdata.mines.edu/). The 2016 NPP-VIIRS NTL data was an annual average product, while the 2017-2019 NPP-VIIRS NTL datasets were monthly composites. A total of 100,478 POIs of 14 types in Nanchang City were collected in 2019 from Gaode Map (https://lbs.amap.com/) using Python. The administrative division boundary data of Nanchang City was downloaded from the National Catalogue Service for Geographic Information (https://www.webmap.cn/).

Synthesis of NPP-VIIRS annual average data
The 2016 annual average data had already been cleaned of unstable light sources and background noise, whereas the 2017-2019 monthly data had not been standardized and still contained noise, so the monthly data had to be processed. Zhou et al. developed a method of synthesizing annual nighttime light data from NPP-VIIRS monthly NTL data [11]. Firstly, the images of June 2017 and February 2019 were removed from the dataset because the study area was heavily covered by clouds. Then all remaining images were cropped by the vector boundary of the administrative divisions of Nanchang City. Radiance values less than 0 were set to 0, which denoted non-light areas. All NTL images were reprojected to an Albers Conic Equal Area projection with a spatial resolution of 500 m.
8th Annual International Conference on Geo-Spatial Knowledge and Intelligence. IOP Conf. Series: Earth and Environmental Science 693 (2021) 012103. IOP Publishing. doi:10.1088/1755-1315/693/1/012103
The following steps were taken to eliminate unstable regions, under the assumption that a place that had been bright (NTL larger than 0) for two consecutive years was a stable light area. Radiance values greater than 0 were set to 1 and all others to 0, which gave binary images for 2016-2019. The stable light area of 2017 was obtained by multiplying the binary image of 2016 by that of 2017. The NTL image of 2017 was then multiplied by the stable light area of 2017 to eliminate unstable regions. The NTL images of 2018-2019 were treated in the same way.
Extremely high radiance values were considered false. Shanghai is the most economically developed area in China, so its radiance was assumed to be the highest in China. The highest stable radiance value in Shanghai in 2016, 192.684, was therefore chosen as the threshold for detecting false extremely high values in Nanchang City [11]. Mean filtering was then used to calculate the average value of the pixels around each extremely high value, and the extremely high value was replaced by that average. Finally, all NTL values were rounded off. The annual average NTL data of 2019 is shown in Figure 2.
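The synthesis steps described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the monthly images are assumed to be averaged into an annual image before masking, the mean filter is assumed to use a 3x3 window that ignores other extreme pixels, and rounding is assumed to be to the nearest integer. Cropping and reprojection are assumed to have been done beforehand.

```python
import numpy as np

SHANGHAI_MAX_2016 = 192.684  # threshold quoted in the text [11]


def stable_light_mask(prev_year, this_year):
    """Return 1.0 where both years are lit (radiance > 0), else 0.0."""
    return ((prev_year > 0) & (this_year > 0)).astype(float)


def replace_extremes(img, threshold=SHANGHAI_MAX_2016):
    """Replace pixels above `threshold` with the mean of their valid neighbours.

    Assumption: a 3x3 window; pixels above the threshold (including the
    centre pixel itself) are excluded from the average.
    """
    out = img.copy()
    rows, cols = img.shape
    for r, c in zip(*np.where(img > threshold)):
        window = img[max(r - 1, 0):min(r + 2, rows),
                     max(c - 1, 0):min(c + 2, cols)]
        valid = window[window <= threshold]
        out[r, c] = valid.mean() if valid.size else threshold
    return out


def annual_composite(monthly, prev_year_annual):
    """Build one annual image from a list of monthly 2-D radiance arrays.

    `monthly` should already exclude cloud-contaminated months.
    """
    stack = np.clip(np.stack(monthly), 0, None)  # negative radiance -> 0
    annual = stack.mean(axis=0)                  # assumed: simple monthly mean
    annual = annual * stable_light_mask(prev_year_annual, annual)
    annual = replace_extremes(annual)
    return np.rint(annual)                       # assumed: round to integer
```

For 2017, `prev_year_annual` would be the 2016 annual product; for later years the previous year's annual image would be passed in the same way.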

The Kernel Density Estimation
Kernel density estimation (KDE) calculates the density of point or line features within a specified neighborhood, and can directly reflect the distribution of discrete values over a continuous area. KDE assigns weights according to the distance between a data point and the center point: the smaller the distance, the greater the weight. The formula is as follows:
$$P_i = \frac{1}{n \pi R^2} \sum_{j=1}^{n} K_j \left(1 - \frac{D_{ij}^2}{R^2}\right)^2$$
where $P_i$ is the kernel density of point $i$, $K_j$ is the weight of data point $j$, $D_{ij}$ is the distance between point $i$ and point $j$, $R$ is the distance threshold between the two points, that is, the bandwidth ($D_{ij} < R$), and $n$ is the number of data points within the bandwidth.
A large number of studies have shown that the choice of spatial weight function has little effect on KDE results, whereas the choice of bandwidth $R$ has a key influence. Hinneburg et al. studied the relationship between the bandwidth and the number of density centers and found that there are bandwidth intervals over which the density centers remain stable [12]. It is reasonable to select the bandwidth within such an interval.
Figure 3. The result of kernel density estimation (R = 500 m-3000 m).
As shown in Figure 3, the density center was relatively stable for bandwidths of 800 m-1000 m, so 1000 m was selected as the KDE bandwidth and the pixel size was set to 500 m.
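To make the role of the bandwidth concrete, the following is a minimal sketch of a distance-weighted KDE on a regular grid. It assumes a quartic kernel of the form $P_i = \frac{1}{n\pi R^2}\sum_j K_j (1 - D_{ij}^2/R^2)^2$, which is one common choice that matches the symbols defined above; the function name and arguments are hypothetical, and a GIS kernel density tool may differ in detail.

```python
import numpy as np


def quartic_kde(points, weights, grid_x, grid_y, bandwidth):
    """Evaluate a quartic-kernel density surface on a regular grid.

    points    : (N, 2) array of projected x, y coordinates in metres
    weights   : (N,) array of point weights K_j
    grid_x/y  : 1-D arrays of cell-centre coordinates
    bandwidth : search radius R in metres
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y)  # shape (len(grid_y), len(grid_x))
    density = np.zeros(gx.shape)
    r2 = bandwidth ** 2
    for r in range(gx.shape[0]):
        for c in range(gx.shape[1]):
            d2 = (points[:, 0] - gx[r, c]) ** 2 + (points[:, 1] - gy[r, c]) ** 2
            inside = d2 < r2              # only points with D_ij < R contribute
            n = inside.sum()
            if n:
                kernel = (1.0 - d2[inside] / r2) ** 2
                density[r, c] = (weights[inside] * kernel).sum() / (n * np.pi * r2)
    return density
```

Calling this function with several bandwidths (for example 500 m to 3000 m) on the same grid and comparing the location of the density maximum is one way to reproduce the kind of stability check described above.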

Logarithmic transformation
Logarithmic transformation is a common gray-scale transformation method in image processing. For the NTL data, the formula is as follows:
$$f_{i,j} = \log\left(NTL_{i,j}\right)$$
where $NTL_{i,j}$ is the original NPP-VIIRS NTL intensity of the pixel at row $i$ and column $j$, and $f_{i,j}$ is the value of that pixel after logarithmic transformation. For the POIs data, the formula is as follows:
$$g_{m,n} = \log\left(k_{m,n}\right)$$
where $k_{m,n}$ is the original kernel density of the pixel at row $m$ and column $n$, and $g_{m,n}$ is the value of that pixel after logarithmic transformation. Because the logarithm is defined only for values larger than 0, pixels with a value of 0 in either the NTL or the KDE data were removed.
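The transformation and the removal of zero-valued pixels can be sketched as follows. This is an illustrative helper with a hypothetical name; it assumes the two rasters are aligned on the same 500 m grid and uses the natural logarithm, since the base is not stated in the text.

```python
import numpy as np


def log_transform_pair(ntl, kde):
    """Log-transform aligned NTL and KDE rasters, dropping non-positive pixels.

    A pixel is kept only if BOTH its NTL and its KDE value are greater than 0,
    so the two returned 1-D arrays stay paired element by element.
    """
    ntl = np.asarray(ntl, dtype=float)
    kde = np.asarray(kde, dtype=float)
    keep = (ntl > 0) & (kde > 0)
    # Assumption: natural logarithm; another base only rescales the values
    # and does not change the Pearson correlation.
    return np.log(ntl[keep]), np.log(kde[keep])
```

Removing a pixel from both rasters at once keeps the samples aligned for the regression step.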

Pearson correlation coefficient
The Pearson correlation coefficient is one of the common measures of linear correlation between two numerical variables. Its mathematical definition is:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $n$ is the number of samples, $x_i$ and $y_i$ are the values of the two variables, and $\bar{x}$ and $\bar{y}$ are their average values. The Pearson correlation coefficient between the 2019 annual average NTL data and the POIs KDE was calculated for every administrative division.
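A per-division calculation of the coefficient can be sketched as below. The function names and the `district_ids` label array are hypothetical; the sketch assumes the log-transformed NTL and KDE pixels have been flattened into paired 1-D arrays and that each pixel carries the label of the district or county it falls in.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


def pearson_by_district(f, g, district_ids):
    """Return {district label: r} for pixels grouped by administrative division."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    district_ids = np.asarray(district_ids)
    return {
        d.item(): pearson_r(f[district_ids == d], g[district_ids == d])
        for d in np.unique(district_ids)
    }
```

The same value can be cross-checked with `np.corrcoef(x, y)[0, 1]`.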

Results and Discussion
As shown in Table 1 and Figure 4, the NTL intensity values were clearly positively correlated with the kernel density values of POIs in all districts and counties of Nanchang. This indicated that NTL intensity was positively correlated with the degree of urban building aggregation. Specifically, the correlation coefficients between NTL data and the KDE of POIs data, from high to low, were those of Donghu, Honggutan, Qingshanhu, Qingyunpu, Xihu, Jinxian, Nanchang (county), Anyi and Xinjian. Donghu district, the oldest urban district of Nanchang City, had the highest correlation coefficient of all. The correlation coefficients decreased from the city center to the rural areas, and urban districts had remarkably higher correlation coefficients than suburban counties. Xinjian district was an exception: it was rural before 2015 and has been an urban district only since 2016, and there had been no substantial changes in such a short period. In the county centers of Anyi and Jinxian, the kernel density was relatively high but the NTL intensity was relatively low. This may be because buildings are concentrated in the county center while human activity in the county at night is infrequent.
The NTL intensity indirectly expressed the intensity of human social and economic activities. The kernel density of POIs showed the concentration of places where those activities happened. High consistency between the NTL intensity and the kernel density of POIs implied harmony in urban development planning and a mature urban structure. In Nanchang City, urban districts had a higher concentration of social and economic activities, as well as more intense public and commercial lighting, than rural counties. Urban districts also showed higher consistency between the NTL intensity and the kernel density of POIs than suburban counties, which implies that urban districts had better urban planning and a more mature urban structure. Xinjian district, as an exception, had the lowest correlation coefficient between NTL data and the KDE of POIs data, which implied that a change of administrative status could not change the urban planning and urban structure in the short term.

Conclusions
To understand the spatial structure of a city, it is necessary to obtain accurate pixel-level information on the spatial distribution of urban construction. In this study, linear regression analysis of the NTL intensity and the POIs KDE quantitatively described the consistency between the intensity of human social and economic activities and urban construction. The specific conclusions are as follows:
The correlation between NTL intensity and the kernel density of POIs in Nanchang's urban districts was clearly higher than in Nanchang's suburban counties, especially in Donghu district, the oldest urban district. This showed that the NTL intensity in the urban area was strongly consistent with the degree of building agglomeration and could efficiently reflect the concentration of human social and economic activities.
The correlation between NTL intensity and the kernel density of POIs in county-level suburban administrative areas was relatively weak. The POIs kernel density of the county centers was high, but the NTL intensity was low. This inconsistency in rural areas may imply that nighttime light remote sensing is less accurate in characterizing the built-up spatial structure of rural areas.
The consistency between the NTL intensity and the kernel density of POIs could serve as an efficient indicator of harmony in urban development planning. High consistency could imply a mature urban structure, whereas low consistency may imply that there are problems in urban development planning.