Modelling urban expansion with cellular automata supported by urban growth intensity over time

ABSTRACT The simulation of urban expansion has become an important means to assist urban development planning and ecological sustainable development. However, the spatial and temporal heterogeneities of urban expansion has been a major challenge for modelling urban expansion. This study designed three features from the perspective of spatiotemporal heterogeneity to improve the accuracy of CA model. The new features cover the trends effects of long time-series data on urban expansion, urban spatial growth intensity based on urban growth kernel estimation and allocation probability of the newly generated urban cells from global neighbourhood effects. Finally, urban expansion in Huizhou, China, was simulated and predicted. The experimental results show that the new features can effectively reduce the prediction error for the total amount of urban growth with a deviation of about 2%, and the overall accuracy of urban expansion is as high as 0.93. The features designed in this paper are shown to be effective and can be applied to urban simulations and scenario prediction with various models.


Introduction
The cellular automata (CA) has become a main method in urban spatial modelling (Li and Yeh 2002;Liu and Feng 2012;Ku 2016;Aburas et al. 2016;Cao et al. 2016;Feng, Liu, and Batty 2016;Feng et al. 2018;He et al. 2018;Mirbagheri and Alimohammadi 2018;Mustafa et al. 2018;Tong and Feng 2020;Xu et al. 2020). However, due to the many factors affecting urban expansionin particular, spatial and temporal heterogeneity, how to excavate the rule of urban expansion to improve the simulation accuracy of urban CA has long been the focus of its simulation application (Said et al. 2021) Methods to deal with urban spatial heterogeneity are mainly divided into two categories. The first category uses the neighbourhood characteristics of each cell to reflect the spatial heterogeneity of the urban distribution in space. This is usually qualified by the number of urban cells or the proportion of urban cells in the neighbourhood (White and Engelen 1993;Poelmans and Van Rompaey 2009;Liu and Feng 2012;Chen et al. 2014;Liao et al. 2016). Apart from the method of directly counting the number of neighbourhood urban cells, the geographically weighted regression model and spatial autocorrelation are also used to measure the effect of the neighbourhood (Vasanen 2012;Nasri and Zhang 2018;Shafizadeh-Moghadam and Helbich 2015;Ku 2016). This type of method follows the basic principle of the interaction between neighbouring cells, and adheres to the law of spatial autocorrelation (Phipps 1989). Although neighbourhood features express some local characteristics, they are not sufficient to show spatial heterogeneity. Zhai et al. (2021) further integrated the annual growth rate to reflect spatially heterogeneous neighbourhoods and generated two spatial heterogeneity layers to improve the transition rules of cellular automata. However, the relationship between neighbourhood features and the allocation probability for new urban cells remains unclear.
The second category of urban modelling methods partitions the whole urban area into different regions, so as to eliminate the influence of spatial heterogeneity. According to the division method of space unit, various methods are adopted. The first commonly used method is directly using the urban land patches classified from remote sensing imageries. This method considers that the scale of urban expansion is proportional to the size of the land patch, then the newly generated urban cells are calculated and allocated based on the patch characteristics. For example, Chen et al. (2014) took the patch as a spatial unit, using the patch growth index proposed by Fragkias and Seto (2009). First, they calculated the area of the patch that may grow, then determined whether the newly growing cells were adjacent to or separated from the original patch randomly, so as to construct patch-based cellular automata for urban expansion simulation. The second method divides the urban land into different homogeneous units according to a series of designed features. Taking the feature of population density as an example, the land cells within different homogeneous units are then applied with different transition rules for urban expansion modelling. Ke, Qi, and Zeng (2016) and Li et al. (2017) first used the results of cluster analysis to the urban expansion driving factors to divide the urban area into different spatial units, then constructed a cellular automata model for urban spatial evolution. Yang et al. (2020) further combined the quantitative composition of the conversion from non-urban land type to urban land type and its dynamic feedback with the spatial form of urban growth so as to improve the simulation effect of cellular automata. There are other urban spatial division methods, such as using the border of administrative units or dividing urban areas into a series of rings or sectors (Firozjaei et al. 2019;Wu et al. 2021). For example, Firozjaei et al. (2019) divided the whole city into eight sectors and used cellular automata based on the Markov chain to simulate the land use transformation of each sector. The simulation results are more consistent with the actual situation than the method without sector division. Xu and Gao (2019) applied the same idea to divide the whole city into 24 fan-shaped areas with due east as the starting point. For each fanshaped region, the characteristics of the newly generated patches were counted as the input conditions of the CA model for urban evolution within the fan area. The simulation results based on the separate spatial unit are more consistent with the overall shape of the city, but the distribution of newly generated urban cells in a spatial unit is still random, and there is a lack of local differences within the same unit. To overcome this problem, Zhang et al. (2020) further designed the global and local spatial heterogeneity features of urban expansion and integrated them into the CA model, so as to improve the modelling performance. Their study results show that the simulation accuracy is greatly improved. Although existing research has engaged in much discussion with regard to eliminating the impact of urban heterogeneity on simulation, the differences in the evolution of local land units have yet to be effectively resolved.
Existing methods have explored the spatial and temporal heterogeneities of urban growth from various perspectives, such as neighbourhood, land patches and partitioned areas. However, in-depth research regarding the urban growth intensity in different regions over time is still lacking. In different locations, there may be multiple such cells, which have identical neighbourhood calculation results, while their evolution results may be completely different. Similarly, some regions with large spatial heterogeneity may have the same evolutionary scale. The reason for this phenomenon is that different areas of the same city exhibit different evolutionary intensities over time. Some studies have mentioned that the time effect is also a key factor affecting the spatial heterogeneity (Song, Shi, and Wang 2020;Shafizadeh-Moghadam and Helbich 2015). However, the dynamics of characteristics over time in the application of urban simulations is relatively lacking . In most cases, the evolution number of urban cells in the previous period is used to estimate the evolution number of urban cells in the following period, and then it is used as the termination condition for the iterative operation of cellular automata. For example, Li and Yeh (2000) proposed that the total amount of urban land growth in different time periods can be set to control the scale of evolution. He et al. (2015) designed an adaptive CA model combined with an intelligent immune model to automatically mine the dynamic rules of CA evolution over time. Shu et al. (2017) considered the change of influencing factors with time by adding variable weights to influencing factors with time through the genetic algorithm and integrated it into the cellular automata model. Although these research methods consider the influence of time to a certain extent, the trend effects from long-time series data, especially the urban growth intensity over time, have yet to be thoroughly discussed. The evolution intensity reflects the quantity of non-urban land evolving into urban land over a certain period of time. Accurately calculating the intensity of urban growth in different regions can reflect not only the trend of urban growth but also the spatial differences of urban growth.
Based on the above analysis, the present paper aims to study the urban growth intensity during different time periods based on the long time series of urban expansion data and to explore the allocation method of the newly generated urban cells in different regions. The results of this study can help to optimize and improve the effect of spatial and temporal heterogeneities in urban simulations. The following section introduces the study area and data materials. The detailed method is described in Section 3, followed by the results and discussion. The final section is the conclusion of this paper.

Study area and data materials
Huizhou, located between 22°24′−23°57′ N and 113°51′ −115°28′ E, a city in Guangdong Province, China, was selected as the case study area. The urban land in Huizhou has taken great places since the performance of opening up and reforming policies in 1978 with a continuous quick growing up. Huizhou's resident population is also growing year by year and has now exceeded 6 million. In 2018, Huizhou planned the strategic orientation of urban development around the construction of a first-class city in China and the planning of the Guangdong, Hong Kong, and Macao Bay Area. At present, it is undergoing rapid urbanization. Taking Huizhou as the research area is in line with the needs of Huizhou's urban development. It is of great significance to explore the evolution rules of Huizhou's urban expansion and simulate its possible development in the future.
The urban land was extracted from the data of human settlement changes published in 2020 which was constructed by Gong's team with a resolution of 30 × 30 square metres per pixel ). This set of data includes urban land data for each year from 1986 to 2018. The data of two time periods ranging from 2001 to 2006 and 2006 to 2011, respectively, are extracted for simulation applications. The slope was calculated using QGIS software with DEM data provided by the Geospatial Data Cloud site, 1 Computer Network Information Center, Chinese Academy of Sciences. The DEM data were originally published by NASA and Japan's Ministry of Economy in 2009 and created by processing and stereo-correlating the 1.3 million-scene ASTER archive of optical images. The vertical accuracy of the DEM data is 20 metres and the horizontal accuracy is 30 metres. Basic geographic data containing transportation, rivers, green parks, and so on are downloaded from the Tianditu map services provided by the National Catalogue Service for Geographic Services. 2 The scale of the original vector data is 1:250000, which was produced in 2017. In order to facilitate the processing, the vector data are converted into raster images with the same resolution of 30 × 30 square metres per pixel.

Basic idea
Urban growth is unbalanced over time. In some periods, urban growth is fast, and while at other times it is gradual. If we do not study the characteristics of urban growth over time, the size of urban growth cannot be accurately estimated. In previous studies, the prediction of urban size is usually determined by the scale of urban growth in the previous time period, with a length of 5 or 10 years. Such a time interval is likely to exceed the cycle of urban development. The basic idea of this study is to use an urban growth data series with a high temporal resolution for mining urban growth rules, which can be used for predicting the urban growth area in the following years. The new growth cells are then allocated to different regions with the assistance of the urban spatial growth intensity and allocation probability of neighbourhood effects proposed in this paper. Finally, simulations are performed using cellular automata.

Estimation of urban growth area with high temporal resolution data series
First, suppose X is the time-series data of urban evolution size changing with years: Considering that the development of a city is periodical, it is reasonable to fit the time-series data by dividing the data into different segments. Suppose the X data series is divided into K segments, then the X can be expressed as follows: where F i represents the fitting function for the data of the i th segment; α={α 1 , . . . , α k , . . . α T } is the split point of X data series; and e i (t) represents the zero-mean noise of the time series. Figure 1 shows the increment of urban pixels in Huizhou from 1986 to 2018, where it can be seen that the urban growth is very unbalanced. Overall, Huizhou's urban expansion has gone through five stages. From 1985 to 1990, the urban growth scale and speed showed a slight downward trend. The first period of rapid expansion began from 1991 to 1993, after which it began to decline again. The rate of urban expansion then declined every year until 1997. From 1998 to 2011, it underwent a rapid growth phase of the increase year by year, then from 2012 it entered a phase of decline. By observing the characteristics of Huizhou's annual growth, the previous simulation and prediction methods with intervals of 5 or 10 years exhibit very large instabilities, and are likely to cause prediction deviations in the model. It is reasonable to simulate the urban growth based on the piecewise fitting method, and the most important is to detect the turning point automatically. In most cases, the turning point can be determined by visual inspection, and the fitting functions can be regressed manually segment by segment. In order to segment the urban growth data series automatically and efficiently, we propose a moving window scheme to detect the turning point and carry out regression fitting ( Figure 2).
According to the visual inspection for the growth point in Figure 1, it can be determined that there is a nonlinear relationship between the number of urban growth pixels and years. After various fitting analyses, including polynomial fitting, exponential fitting and linear fitting, the r square values are all lower than 0.5, which means the fitted equation is not significant. Considering the interannual volatility of urban growth, the moving average method is selected for prediction of urban growth over time. For Huizhou, a moving window with 3 years is tested, and is suitable for detection of turning point in the timeseries data. First, the slope of each point is computed by itself and the following two points, after which the positive and negative changes of the slope at each point can be determined. In our programme, if the slope sign of the detected point changes and two identical slope signs appear consecutively, then the point is determined to be a turning point. Next, the turning points are used to segment the data series for regression and prediction. With the high temporal resolution data series being supported, the urban growth data are then divided into different segments representing different development trends. For the estimation of urban growth, this piecewise regression method is obviously more accurate.

Urban spatial growth intensity based on urban growth kernel density analysis
The imbalance of urban spatial growth is first manifested in the inconsistency of growth intensity. That is, some regions grow quickly, while others grow gradually. Identifying the intensity of urban growth in different regions helps to simulate the actual urban form. Different from the method of dividing the city into concentric circles or sectors, we quantify the urban spatial growth intensity according to the urban growth kernel density estimation. Kernel density estimation is a method for hot area analysis and detection. It is calculated based on the number of event points in a location, with larger number of clustered points, thus resulting in larger values. A smooth density surface of point events for the whole city can be generated by density estimation. For urban growth, the state of a cell changing from non-urban to urban may be considered as a point event. Taking Huizhou, for example, the urban changing point events can be generated by converting the raster pixels to vector points. Figure 3 demonstrates the urban growth event points from 2001 to 2006.
According to the equation given by Xie and Yan, the kernel density estimation equation can be described as Equation 2:  where λ (s) stands for the density at location s; r is the search radius of the kernel density estimation; n is the number of event points; and k is the weight of a point i at distance d is to location s. k is usually modelled as a kernel function of the ratio between d is and r. A kernel with a Gaussian function given by Equation 3 is usually used.
To quantify the urban spatial growth intensity, a kernel density analysis tool in QGIS software was used, and the size of the search radius is the key parameter to be set. According to the experiments performed by Liao et al. in 2016, the best model performance is obtained when the neighbourhood is 1.36 km. To simplify the calculation, in our study, we set the value of r to 1,000 m. Figure 4 shows the urban growth intensity from 2001 to 2006, from which it can be clearly seen that the urban growth is obviously nonstationary. The urban growth of most places is not strong, with only a few regions being hot areas of urban expansion from 2001 to 2006. The intensity of urban growth well reflects the urban evolution trend in 2-D space. If the city continues to undergo this growth trend, the urban growth intensity will be a good auxiliary variable for the urban CA model.

Allocation probability of new urban cells from neighbourhood effects
Based on the method described above, the urban growth size can be estimated using a piecewise regression method. However, this predicted urban size represents the total growth of a city. In addition to the growth of the total number of pixels, the accuracy of the spatial allocation of new urban pixels is an important metric by which to measure the simulation results of urban expansion.
Many factors, especially the neighbourhood numbers of urban pixels, have been used to estimate the probability of urban cell state changing. Previously, either the 3 × 3 or 5 × 5 neighbourhood was used to compute the neighbourhood effect on the generation of new urban cells (Chen et al. 2014;Ku 2016;Mustafa et al. 2018). The probability from neighbourhood effect was always calculated by an equation, such as Equation (3): where P stands for the transition probability of the central grid cell from undeveloped to developed land; S i is the state value of i th cell in the range of neighbourhood, and the state value was usually 1 representing developed cell state and 0 for undeveloped cell; and m is the radius of the neighbourhood. Although there are some variations to calculate the neighbourhood effect, the calculation method is almost the same.   (2) for the calculation of probability from the neighbourhood, Figure 5(a) has the minimum evolution probability for the central grey cell, since there is only one urban cell in the neighbourhood area. While the central grid in Figure 5(c) will have the highest probability to be evolved into an urban cell; Figure 5(b) will have a moderate probability for half the number of urban cells in the neighbourhood range. This probability calculation method used by many previous studies seems reasonable, yet it ignores the normal status of urban spatial distribution. Taking Figure 5(c) for example, the spatial distribution of urban pixels is very dense, and the non-evolved cells may be used as the space of urban activities, such as a park or green space. In this case, the central grey cell in Figure 5(c) will never be evolved into an urban cell. In other words, high-density urban pixels in the neighbourhood do not necessarily give the central pixel a higher evolution probability.
In order to discover the relationship between urban cell density and the probability of changing non-urban cells to urban cells in a neighbourhood, a neighbourhood with the dimensions of 5 × 5 was used. The urban cells in Huizhou from 2007 to 2012 were, respectively, used to count neighbourhood urban densities and number of new urban cells. To simplify the calculation, we used the number of urban cells to stand for the density in a neighbourhood, because all the data had the same neighbourhood size. The probability for a certain density could be calculated as follows: where P den stands for the probability of emerging new urban cells at the density of den; N den is the total number of neighbourhoods with existing urban cells equal to den in the starting year of the growth phase, for example, 2007 of phase from 2007 to 2012; and U new stands  for the number of new emerged urban cells in the n th neighbourhood. The equation can also be explained as the average number of newly emerged urban cells per neighbourhood with a certain density. Figure 6 demonstrates the relationship between the density in a neighbourhood and the probability calculated by Equation 4 for the 5 × 5 and 10 × 10 neighbourhood, respectively. It can be noticed that a higher number of urban cells, which signifies a higher urban density for a neighbourhood, cannot generate a higher probability of emergence of new urban cells. If the number of urban cells in a 5 × 5 neighbourhood is in the range of 5 to 10, then comparatively higher probabilities can be acquired. Meanwhile, within the 10 × 10 neighbourhood, higher probabilities are found in the range of 20 to 40. However, the two different sizes of neighbourhoods demonstrate the same trend pattern. For both neighbourhood sizes, the corresponding equations can be fitted, and based on the fitted equation, the probability of evolving one cell from a non-urban state to an urban state can be easily acquired by using the number of urban cells in a neighbourhood area.

Transition rules with support of high temporal resolution data series
The urban CA transition rule consists of two parts: the termination condition of evolution and the evolution probability of the grid cells. The termination condition of evolution is usually determined by the estimated number of urban cells. Once the number of evolved urban cells reaches an estimated amount, then the simulation iteration will be ceased. As described in Section 3.2, the total number of grid cells that would evolve into an urban state can be estimated by a piecewise regression method. The evolution probability of grid cells is affected by many factors; aside from the traditional factors, we integrated two new ones, namely urban spatial growth intensity and spatial allocation probability of new urban cells from neighbourhood effects.
Transition rule is the key component of the CA model. It has been the mainstream to use the machine learning method to calculate the transition rules of the CA model (Li et al. 2012). All the features related to a cell can be used as input variables, and the final state of the cell can be remarked as the output of the CA model. In most cases, we define the output, with 1 representing a new emerging urban cell and 0 no changed cell. The urban growth modelling based on machine learning can then be expressed as follows: where M stands for the selected machine learning model used for predicting the cell state at time t + 1; X t stands for the input features of a cell at time t; S t+1 is the cell state predicted by model M with input of X t ; S t stands for the state of a grid cell at time t; and R stands for the effect of stochastic disturbance to urban growth, which can be computed by Equation (12): where R stands for the disturbance value; r is a random number between 0 and 1; a is a control parameter between 0 and 10; the threshold is determined according to the debugging results of the model; P den stands for spatial allocation probability of new urban cells from neighbourhood effects calculated from Equation (5); and P k is the normalization value of urban spatial growth intensity based on the kernel density estimation in Equation (1).
As is well known, urban growth is affected by many factors, including urban population growth, Figure 6. Relationship between urban density and probability of emerging new urban cell for 5×5 and 10×10 neighborhoods. transportation, industrial development and government policies, and thus the future spatial pattern presents obvious uncertainty. This uncertainty can be presented to a certain extent by introducing the variable R (Mirbagheri and Alimohammadi 2018). Some areas, such as protected agricultural land and parks, are forbidden for development, and cells located in such areas are set as constrained cells that will remain in their original state, as existing urban cells. Therefore, the key task is to predict the state of the cells that do not belong to the existing urban and constrained areas. The objective of cellular automata is to evolve every non-urban cell according to the above rules until the total number of urban cells reaches the expected number.

Feature selection
Referring to previous studies, the features integrated in the CA model are usually selected according to the data available for acquisition. In the present study, Table 1 shows the features used in the prediction model, and about 12 variables were prepared. These features reflect the effects of the neighbourhood, natural conditions, spatial proximity, and kernel density estimation. Most of these features have been widely applied in previous CA models.

Model implementation
The entire CA model was implemented under the QGIS 3.14 and Python 3.7 environment. QGIS is an open-source software for geographical data processing that is integrated with many useful plug-ins. First, layers of transportation, land use, DEM, and administrative borders were created as the basic data. For the vector data such as transportation and administrative borders, rasterization operations were performed for the data format consistency. Second, the features listed in Table 1 were calculated respectively by generating new characteristic layers. Finally, the characteristics data of all the predictive layers were exported and saved in a CSV file. These data were used as the data of the input variables for the CA model and to generate the output results.
The CA model was implemented in a Python 3.7 environment with scikit-learn, matplotlib, and numpy supported. Scikit-learn is a robust library with a range of supervised and unsupervised learning algorithms that can be used to simplify machine learning applications in a production system. With the support of the scikit-learn library, SVM, neural network, and logistic regression models were all tried to predict the urban cell state.

Model evaluation
Because the grid cells were expected to evolve into either urban or non-urban cells, it can be considered as a binary classification problem, and the confusion matrix is a common method for evaluating the model. A confusion matrix shows the number of correct and incorrect predictions made by the model compared to the actual value in the data. If a record with an urban state is correctly classified by the model as an urban cell, then it is True Positive (TP), and if it is incorrectly classified as a non-urban cell, then it is False Negative (FN). Similarly, if a record in which its real value is non-urban is incorrectly classified as urban by the model, then it is False Positive (FP), and if it is correctly classified as non-urban, then it is True Negative (TN). After the creation of confusion matrices, the model accuracy, precision, recall, specificity and f1-score are calculated by the following equations (Suthaharan 2016): Among the above indexes, the F1-score is an aggregative indicator considering both the precision and recall, and is usually used as the final evaluation index.

Forecast of urban growth area
According to the piecewise regression method introduced in Section 3.2, first the urban growth area was estimated based on the urban growth data series from Distance to provincial roads Pixels 7 Distance to county roads Pixels 8 Distance to water bodies Pixels 9 Distance to governmental agencies Pixels 10 Distance to green parks Pixels 11 Distance to urban centre Pixels 12 Urban growth density Kernel density estimation 1985 to 2018. In Figure 7, the blue dots represent the newly increased urban cells for each year from 1985 to 2018. The red dots are the turning points automatically detected by the programme. The solid lines are the fitting lines of piecewise regression. It can be seen that there is a steady increase from 2001 to 2011, which means that the urban expansion in this period is suitable for simulation. In most cases, the period is divided into two stages for urban modelling. The data in the first period were used for model training and testing, while the data in the second period were used for prediction validation. Table 2 shows the urban growth in the two periods of 2001 to 2006 and 2006 to 2011. It can be seen that the cell numbers of urban growths in two periods are quite different. If the urban growth number in the first period was directly considered as the urban growth number of the second period, then the error rate of the result would reach 57%. The strategy of directly applying the calibration parameters of the previous period to simulate urban expansion in the following period has often been used in past studies (Siddiqui et al. 2018;Mirbagheri and Alimohammadi 2018), yet this is obviously inappropriate. Instead, we use the piecewise fitting method (Figure 7). The linear regression equation with r square value of 0.89 between 2001 and 2011 can be expressed as follows: According to the regression equation, the number of increased urban cells in each year can be estimated easily, and the number of predicted newly urban cells in 2011 would reach as many as 1,764,551. The error rate is only about 2%.

Training data and parameters
As discussed in the paper, the selection of modelling periods is a key factor affecting the model accuracy. As demonstrated in Figure 7, there is a continued increase in the number of cells from 2001 to 2011. It is reasonable to train a model with the data from 2001 to 2006 and then use the model to predict the urban expansion in the following period of 2006 to 2011, since the two periods have almost the same trend. Table 3  During the training process, the cells that changed from non-urban to urban state were used as the positive samples, while the non-urban cells without a change in state were considered as negative samples. According to past studies, a balance sampling method has been proven to be the best sampling strategy for CA modelling (Karimi et al. 2019). Therefore, the balance sampling method was used in the present study. Due to the large number of cells, 30% of the new urban cells with an equal number of non-urban cells were randomly collected to train the machine learning models. SVM, LR, and ANN, as algorithms often used in cellular automata, were selected for comparison. The difference   between model accuracies with different sample proportions is very small. Table 4 shows the training accuracies of three models using a fivefold cross-validation method. It can be found that the ANN algorithm achieved the highest accuracy and was thus integrated into the CA model.

Urban simulation results
Based During the simulation, the initial urban cells in 2001 remained in the same state. Therefore, it is reasonable to evaluate the model using only the new evolution cells, and this can most accurately reflect the model's performance. Table 5 shows the performance results of the CA model for two periods. It can be seen that the F1-score was 0.42 when only considering new evolved urban cells, which is an acceptable result. Furthermore, when the same transitional rules were applied to predict urban expansion in 2011, then the precision, recall, and F1score were all greatly improved, which reveals that the first and second stages are highly consistent. The overall accuracy was computed for all the cells, including both urban and non-urban ones. Because non-urban cells accounted for the vast majority, the overall accuracy was very high, with a value of at least 0.93.

Estimation of urban growth area
In previous studies, some researchers have made scenario predictions by calculating the area of growth according to different hypotheses (Bharath et al. 2018;Siddiqui et al. 2018;Liang et al. 2018). However, this artificial method does not have sufficient statistical basis; thus, the guiding value for urban planning is limited. In this paper, we propose a method to estimate the urban growth area with a high temporal resolution data series. This method considers the imbalance of urban development over time and designs an automatic programme for judging the trend turning point of urban development, thereby providing an analysis tool for urban space simulation and prediction. Taking the periods of 2001 to 2006 and 2006 to 2011 as examples, the number of newly generated urban cells for the first period is 43,160, yet the growth number in the second period is 68,094, the former being much lower than the latter. If we only follow the same trend for urban CA simulation, there will be a large deviation compared with the actual situation. Consequently, the prediction with piecewise regression formula is obviously consistent with the actual growth.

Distribution of new urban cells affected by neighbourhood
The neighbourhood that is used in almost every CA model bears a major impact on the evolution of the central cell. However, this does not mean that the more urban cells there are in the neighbourhood (i.e. a density of urban cells) will lead to a greater probability of the central cell evolving into an urban cell. Figure 9 shows the proportion of central cells, with different urban cell densities evolving into urban cells. In Figure 9, we demonstrate three conditions under different numbers of urban growth pixels, namely about 10,000, 40, 000 and 10, 000. The different urban growth numbers are, respectively, calculated by three periods, including 2001 to 2003, 2001 to 2006 and 2001 to 2011. It can be seen that the probability of new urban cells is in accordance with normal distribution. Taking the 5 × 5 neighbourhood as an example, when the number of existing urban cells in the neighbourhood is between 8 and 10, then the central cell has the highest conversion rate. On the contrary, when there are too many or too few urban cells in the neighbourhood, then the probability of emerging new urban cells is substantially lower. Compared with the traditional calculation of neighbourhood effect, this method can effectively avoid the generation of cells being too dense in the process of evolution iteration.

Decision of threshold and number of iterations
According to the features related to the central grid cell, a transition probability is calculated. The threshold directly affects whether the central cell evolves into an urban state, and also affects the number of iterations. In order to avoid the threshold set being either too large or too small, we calculate the evolution probability of all cells by first simulating it once, and counting the average value and standard deviation of all cell evolution probabilities. The appropriate threshold is then set according to the number of evolution cells. Figure 10 shows the difference of simulation results with only one iteration and with seven iterations. In Figure 10(a), the threshold is set as 0.0031, which is a very small value, so that sufficient non-urban cells could be changed into urban cells with only a single iteration. The distribution of new urban cells is very highly concentrated. In contrast, the threshold value of Figure 10(b) is 0.038, and it must be iterated seven times to reach the expected number of evolved urban cells. It can be noticed that the more iterations there are, the more dispersed the distribution of evolved urban cells will be. According to the actual urban spatial distribution in 2011, the result of a single iteration is more closely in line with the actual situation, thus this study only uses one iteration.

Scenario prediction in the future
As mentioned above, the growth rate of Huizhou's urban expansion differed in various periods, including the rapid rise period from 2002 to 2006 and declining period from 2013 to 2018. From the previous urban growth trend, we can infer that there will be several different possibilities for future urban development. According to the study performed by Chen et al. (2020), global urban land will continue to expand rapidly before 2040s. In this study, we assume that there are three scenarios for the development of the next 10 years beginning from 2018: the first is stable development; the second is a slow decline in the growth rate; and the third is accelerating growth. Based on the assumed growth rate under three scenarios, the respective total number of cells in the urban expansion will be 21,085, 38060, and 104,715. Figure 11 shows the urban growth trends with three different development scenarios for low, moderate, and high speed of urban growth.
Next, according to the method of urban spatial growth intensity based on urban growth kernel density analysis proposed in Section 3.3, the spatial growth intensity is calculated ( Figure 12). Based on the distribution of urban cells in 2018, we forecast urban expansion under three scenarios. Figures 13 and 14 respectively show the predicted urban expansion in 2028 with declining and stable growth rate. It can be seen that the expansion of the city is mostly concentrated around the main urban area in both cases. For the third case in Figure 15, due to the accelerated expansion, the number of cells that evolve into cities greatly increases. In addition to the surrounding areas, the outlying suburbs are also developed.
The scenario prediction is first used for the estimation of urban growth areas with high temporal resolution data series, followed by an intensity distribution map based on kernel density analysis. The simulation results are basically consistent with the trend of spatial growth intensity. This also implies that the intensity distribution map based on kernel density analysis bears a certain effect on the prediction of urban development.

Conclusion
Modelling urban expansion is an important method for urban planning and sustainable development. Various features can greatly impact the simulation results. In order to extract more effective features, this study analysed the characteristics used for improving the CA model from three aspects. First, we proposed to estimate urban growth areas with high temporal resolution data series, and the piecewise regression method was used to predict the total amount of urban growth. Second, we quantified the intensity of urban growth in different regions using a kernel density estimation method. The intensity of urban growth designed in this paper helped to simulate the actual urban form. Third, we further discussed the allocation probability of new urban cells based on neighbourhood effects.
By adding the features designed in this paper, a new CA model was implemented and proven to be effective in the simulation and prediction of urban expansion in Huizhou, China. The experimental results show that the predicted urban growth area using the piecewise regression method is most consistent with the actual growth. The urban spatial growth intensity can effectively improve the spatial distribution of urban growth and can increase the accuracy of the urban CA model. As for the neighbourhood effects, we observed that more urban cells in the neighbourhood, does not necessarily mean a greater the probability of the central cell evolving. On the contrary, too many or too few urban cells in the neighbourhood will lead to a low probability of new urban cells emerging. For the 5 × 5 neighbourhood, the highest conversion rate appeared when the number of urban cells in the neighbourhood was between 8 and 10. Notes 1. https://www.gscloud.cn/sources/accessdata/421?pid=1. 2. https://www.webmap.cn/.

Disclosure statement
No potential conflict of interest was reported by the authors.