Urban growth simulation and scenario projection for the arid regions using heuristic cellular automata

Arid regions tend to form compact urban patterns, which have significant implications for urban growth and future urban form. Spatial simulation and projection using cellular automata (CA)-based models are important for achieving sustainable urban development in arid regions. In response to this need, we developed a new CA model (GSA-CA) that uses the gravitational search algorithm (GSA) to capture and project urban growth patterns in arid regions. We calibrated the GSA-CA model for the arid city of Urumqi in Northwest China using data from 2000 to 2010, validated it for 2010 to 2020, and then applied it to project urban growth in 2040. The model performed best when the fraction of the population with the best fitness (kbest) was 0.5. GSA-CA achieved an overall accuracy of 98.42% and a figure of merit (FOM) of 43.03% for 2010, and an overall accuracy of 98.52% with an FOM of 37.64% for 2020. The results can help inform urban planning and development policies. The model has the potential to be used for simulating urban growth and future scenarios in arid regions globally, including Northwest China and Africa.

simulation and projection of urban growth in these regions by incorporating climatic variables such as drought, to ensure that the results are accurate and reflect the distinctive environment and development dynamics of these regions.
Researchers have developed and validated different CA models for simulating and projecting urban growth [18][19][20]. However, these models were usually designed for fast-growing coastal areas and are not universally applicable to slower-growing arid regions [21]. They therefore need to be modified when applied to the arid regions of Northwest China, which have complex topography and comparatively slow growth rates. Two characteristics are particularly relevant: (1) the undulating terrain restricts urban growth, and most of the land is desert, which makes land suitable for construction scarce, so terrain must be considered in the CA model; and (2) low rainfall and high evapotranspiration lead to sparse vegetation, which is an important consideration for urban growth modeling. Developing CA models suited to arid regions is therefore a pressing need for exploring the spatio-temporal patterns of urban growth in Northwest China. Such models could substantially improve the optimization of urban layouts and the assessment of carrying capacity in these regions [22,23].
In light of these considerations, CA model development and projection research for arid areas in Northwest China, such as Yinchuan, Xining, and Urumqi, urgently needs to be strengthened. This study addresses three key research questions for urban simulation in arid regions: (1) Can heuristic algorithms effectively simulate urban growth in typical arid regions? (2) Can the parameters of the heuristic algorithm be tuned to simulate urban growth in arid regions more accurately? (3) Can the resulting models accurately project future urban growth scenarios for arid cities in Northwest China, such as Urumqi? Answering these questions will strengthen the CA modeling framework and deepen our understanding of urban growth in arid regions, especially in Northwest China.
The gravitational search algorithm (GSA) draws on Newton's law of gravity and second law of motion. By dynamically adjusting the gravitational force and acceleration, it avoids becoming trapped in local optima, enhances global search capability, and converges rapidly. Because of the gravitational force, agents with larger mass are given priority, so the solution set converges towards better solutions, which improves overall search accuracy. Moreover, compared with other complex optimization algorithms, GSA has a simple structure, does not require extensive parameter tuning, and is easy to implement [17]. We therefore constructed a GSA-based CA model (named GSA-CA) within the UrbanCA framework [24]. This framework incorporates a time-increasing parameter and a locally adjusted parameter for generating the probability of occurrence map. We used the GSA-CA model to investigate the dynamics of urban growth and to project future urban scenarios for Urumqi in Northwest China. We calibrated the model with data from 2000 to 2010, validated it for 2010 to 2020, and projected urban scenarios for 2040. Through this study, we aim to develop a new approach to simulating urban growth in arid regions that can be applied to other arid regions.

The workflow
Figure 1 shows how we optimized CA parameters using the heuristic GSA to construct a dynamic simulation model for projecting urban growth in an arid area. We selected nine driving factors spanning topography, human impact, socio-economics, and climate. We used systematic sampling to extract training samples from the input land-use maps (2000 and 2010) and the nine factor layers. We then constructed an objective function representing the difference between the GSA-CA model and the actual urban growth. This maps the space of candidate CA parameters onto the GSA search space. Based on this objective function, we initialized the agent population, calculated the fitness value and mass of each agent, and updated the optimal position. GSA then identified the optimal CA parameters, and GSA-CA generated probability of occurrence maps for different fractions of the population with the best fitness. Finally, the GSA-CA model was used to project two urban growth scenarios for Urumqi in 2040. Modeling and implementation were performed in the UrbanCA software, which is available to users worldwide [24].
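Systematic sampling of training data can be read as taking every k-th eligible cell of the rasters. The sketch below illustrates that reading and rests on several assumptions. The function name `systematic_sample` is hypothetical. The rasters are flattened lists with 1 = urban, 0 = non-urban and `None` = excluded. Only cells that were non-urban at the start year are treated as eligible, because only they can convert. This is not the UrbanCA implementation.

```python
def systematic_sample(land_t0, land_t1, factors, step, start=0):
    """Draw training pairs from every `step`-th eligible cell (systematic sampling).

    land_t0, land_t1: flattened rasters with 1 = urban, 0 = non-urban, None = excluded.
    factors: per-cell lists of normalised driving-factor values, same cell order.
    Returns (factor_vector, grew) pairs; grew = 1 if the cell changed from
    non-urban at t0 to urban at t1, else 0.
    """
    # Assumption: only cells that are non-urban at t0 (and not excluded) can convert.
    eligible = [
        i for i, state in enumerate(land_t0)
        if state == 0 and land_t1[i] is not None
    ]
    samples = []
    for i in eligible[start::step]:          # every step-th eligible cell
        grew = 1 if land_t1[i] == 1 else 0
        samples.append((factors[i], grew))
    return samples


# Toy example with 8 cells and one hypothetical driving factor per cell.
land_2000 = [0, 0, 1, None, 0, 0, 1, 0]
land_2010 = [1, 0, 1, None, 0, 1, 1, 0]
factor_layer = [[i / 10] for i in range(8)]
training = systematic_sample(land_2000, land_2010, factor_layer, step=2)
```

In practice the step would be chosen to yield the desired sample size from the full raster.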

The basic urban CA model
CA can be regarded as a state-transition model consisting of a cell space and a transformation function that defines the likelihood of urban growth. In the CA framework, the transformation rules combine five factors: cell state (CST), urban growth factor ($P_{UGR}$), neighborhood effect (NEF), spatial constraints (SCO), and stochastic factors (SFA) [25]. The CA transformation rules can be expressed as [26]:
$$CST_i^{t+1} = TFU\left(CST_i^{t}, P_{UGR}, NEF, SCO, SFA\right)$$
where $CST_i^{t+1}$ and $CST_i^{t}$ denote the state of cell i at times t + 1 and t, respectively, and TFU denotes the transformation function.
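To make the transformation rule concrete, here is a minimal sketch of one synchronous CA step. It combines a per-cell probability, a neighborhood effect, a spatial constraint and a stochastic factor. Several choices are assumptions common in the CA literature rather than the GSA-CA/UrbanCA formulas: the multiplicative combination, the 3x3 Moore neighborhood, the stochastic term 1 + (-ln γ)^α and the fixed conversion threshold. The function names are hypothetical.

```python
import math
import random


def neighborhood_effect(grid, r, c):
    """Fraction of urban cells (value 1) in the 3x3 Moore neighbourhood, centre excluded."""
    rows, cols = len(grid), len(grid[0])
    urban = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                total += 1
                urban += 1 if grid[rr][cc] == 1 else 0
    return urban / total if total else 0.0


def ca_step(grid, p_gpo, threshold, alpha=1.0, rng=random):
    """One synchronous transition CST(t+1) = TFU(CST(t), P, NEF, SCO, SFA).

    grid: 1 = urban, 0 = non-urban, None = excluded (spatial constraint blocks change).
    p_gpo: per-cell probability of occurrence in [0, 1].
    """
    new_grid = [row[:] for row in grid]       # neighbourhoods are read from the old grid
    for r, row in enumerate(grid):
        for c, state in enumerate(row):
            if state != 0:                    # urban cells persist; excluded cells never convert
                continue
            nef = neighborhood_effect(grid, r, c)
            gamma = 1.0 - rng.random()        # uniform in (0, 1], avoids log(0)
            sfa = 1.0 + (-math.log(gamma)) ** alpha
            if p_gpo[r][c] * nef * sfa > threshold:
                new_grid[r][c] = 1
    return new_grid
```

Many CA models allocate a fixed number of conversions (a land demand) instead of using a fixed threshold. A threshold is used here only to keep the sketch short.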
The global probability of occurrence ($P_{GPO}$) is given by [24,27]:
$$P_{GPO} = \left(P_{UGR} \times (1 + S_{TIP})\right) \cdots$$
where $S_{TIP}$ denotes a time-increment parameter that counteracts the decay of the local probability of occurrence; $S_{LAP}$ denotes a local adjustment parameter that dampens the amplification of neighborhood effects; $a_0$ denotes a constant; $x_j$ denotes the j-th driving factor of urban growth; $a_j$ denotes the weight of factor $x_j$; and δ denotes the modeling residual.
www.nature.com/scientificreports/
Smaller modeling residuals lead to better CA parameters and thus higher simulation accuracy. To obtain parameters that minimize the modeling residuals, an objective function is defined [24], where $F(u)$ denotes the objective function; $u = (u_0, \ldots, u_d)$ denotes a feasible solution of CA parameters; $P_0$ denotes the observed urban growth; and s denotes the number of samples.
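The following sketch shows how an objective of this kind scores a candidate parameter vector u. It rests on two explicit assumptions. First, $P_{UGR}$ takes the common logistic form in $a_0$ and $a_j x_j$. Second, $F(u)$ is a sum of squared residuals against the observed growth $P_0$ over the s samples. $S_{TIP}$ and $S_{LAP}$ are omitted because their exact role in $P_{GPO}$ is not recoverable from the text. The function names are hypothetical.

```python
import math


def p_ugr(u, x):
    """Assumed logistic urban-growth probability; u[0] = a_0, u[1:] = weights a_j."""
    z = u[0] + sum(a_j * x_j for a_j, x_j in zip(u[1:], x))
    if z >= 0:                          # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def objective(u, samples):
    """Assumed F(u): sum of squared residuals over the s training samples.

    samples: list of (factor_vector, p0) pairs, p0 = 1 for observed growth, else 0.
    """
    return sum((p_ugr(u, x) - p0) ** 2 for x, p0 in samples)


# Two hypothetical samples: a grown cell and an unchanged cell.
samples = [([1.0, 2.0], 1), ([0.0, 0.0], 0)]
baseline = objective([0.0, 0.0, 0.0], samples)   # every prediction is 0.5
better = objective([-5.0, 10.0, 0.0], samples)   # separates the two samples
```

A lower value of `objective` means a better candidate. This is the quantity a heuristic search such as GSA would minimize.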

The GSA-CA model
To minimize the objective function, we employ the GSA method originally proposed in [17], which searches iteratively for the optimal solution. Assume there are n agents (masses) in a d-dimensional search space. Each agent represents a unique set of feasible CA parameters, and each dimension is associated with a factor that affects urban growth. The position $A_k^{t+1}$ of the k-th agent (i.e., the k-th candidate set of CA parameters) at iteration t + 1 is a d-dimensional vector. Agents with higher mass represent better solutions; they move more slowly and exert a stronger attraction on other agents. The mass $M_k^t$ is updated as follows:
$$m_k^t = \frac{F(u)_k^t - \max_{j \in \{1,\dots,n\}} F(u)_j^t}{\min_{j \in \{1,\dots,n\}} F(u)_j^t - \max_{j \in \{1,\dots,n\}} F(u)_j^t}, \qquad M_k^t = \frac{m_k^t}{\sum_{j=1}^{n} m_j^t}$$
where $m_k^t$ represents the inertial mass of agent k at time t; $F(u)_k^t$ represents the objective function value of the k-th agent at time t; $\max_{j} F(u)_j^t$ and $\min_{j} F(u)_j^t$ are the maximum and minimum objective function values among the agents, respectively; and n represents the number of agents.
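The mass update can be sketched in a few lines for a minimization problem. The best agent (smallest F) receives raw mass 1, the worst receives 0, and the masses are then normalized to sum to 1. The function name is hypothetical. Returning equal masses when all agents tie is an assumed guard against division by zero.

```python
def gsa_masses(fitness):
    """Normalised GSA masses M_k for minimisation.

    fitness: objective values F(u)_k of the n agents (smaller is better).
    """
    best, worst = min(fitness), max(fitness)
    if best == worst:                               # all agents equally good
        return [1.0 / len(fitness)] * len(fitness)
    raw = [(f - worst) / (best - worst) for f in fitness]   # m_k in [0, 1]
    total = sum(raw)
    return [m / total for m in raw]


masses = gsa_masses([1.0, 2.0, 3.0])   # best agent first, worst agent last
```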
According to the law of motion, the acceleration of agent k at time t ($a_k^t$) is calculated as follows:
$$F_{kl}^t = G(t)\,\frac{M_{pk}^t \times M_{al}^t}{R_{kl}^t + \varepsilon}\left(A_l^t - A_k^t\right), \qquad F_k^t = \sum_{l \in K_{best},\, l \neq k} rand_l\, F_{kl}^t, \qquad a_k^t = \frac{F_k^t}{M_k^t}$$
where $F_k^t$ and $F_{kl}^t$ represent the total force acting on agent k and the force exerted on agent k by agent l, respectively. $K_{best}$ is the set of agents with the best fitness, whose size is governed by the fraction of the population (kbest). $rand_l$ is a random number in the interval [0, 1], and G(t) is the gravitational constant at time t. $M_{pk}^t$ is the passive gravitational mass of agent k, and $M_{al}^t$ is the active gravitational mass of agent l. ε is a small constant, and $R_{kl}^t$ is the Euclidean distance between agents k and l. $A_k^t$ and $A_l^t$ denote the positions of the k-th and l-th agents at time t, respectively.
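The force and acceleration step can be sketched as follows. Only agents in $K_{best}$ (the heaviest agents) attract the others. Because active, passive and inertial masses are taken as equal, the passive mass of agent k cancels its inertial mass. The decaying gravitational constant $G(t) = G_0 e^{-\alpha t / T}$, with $G_0 = 100$ and $\alpha = 20$, is an assumption: it is the schedule commonly used with GSA, and the text does not specify G(t). The function names and the `kbest_count` argument are hypothetical.

```python
import math
import random


def gravitational_constant(t, max_iter, g0=100.0, alpha=20.0):
    """Assumed decaying schedule G(t) = g0 * exp(-alpha * t / max_iter)."""
    return g0 * math.exp(-alpha * t / max_iter)


def gsa_accelerations(positions, masses, G, kbest_count, eps=1e-12, rng=random):
    """Acceleration of each agent under the GSA force law.

    positions: list of d-dimensional position lists A_k.
    masses: normalised masses M_k (active = passive = inertial mass).
    kbest_count: size of K_best, e.g. round(kbest * n) for a fraction kbest.
    """
    n, d = len(positions), len(positions[0])
    # K_best: indices of the heaviest (best) agents, the only ones exerting force.
    k_best = sorted(range(n), key=lambda l: masses[l], reverse=True)[:kbest_count]
    accelerations = []
    for k in range(n):
        acc = [0.0] * d
        for l in k_best:
            if l == k:
                continue
            r_kl = math.dist(positions[k], positions[l])   # Euclidean distance R_kl
            # F_kl / M_k: the passive mass of k cancels its inertial mass, leaving
            # rand_l * G * M_l / (R_kl + eps) * (A_l - A_k).
            scale = rng.random() * G * masses[l] / (r_kl + eps)
            for dim in range(d):
                acc[dim] += scale * (positions[l][dim] - positions[k][dim])
        accelerations.append(acc)
    return accelerations
```

A smaller `kbest_count` concentrates attraction on a few elite agents. A larger one spreads it across the population, which is why the kbest fraction changes the search behavior.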
The new velocity of an agent is the sum of a random fraction of its current velocity and its acceleration, and its position is then updated as follows:
$$v_k^{t+1} = rand_k \times v_k^t + a_k^t, \qquad A_k^{t+1} = A_k^t + v_k^{t+1}$$
where $v_k^{t+1}$ and $v_k^t$ denote the velocity of the k-th agent at times t + 1 and t, respectively; $rand_k$ is a random number in the interval [0, 1]; and $A_k^{t+1}$ denotes the position of the k-th agent at time t + 1.
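A minimal sketch of the movement step follows, drawing one random number per agent. Clipping positions to the parameter bounds is an assumed boundary-handling choice; GSA variants also re-initialize out-of-bound agents instead. The function name is hypothetical.

```python
import random


def gsa_move(positions, velocities, accelerations, lower, upper, rng=random):
    """One GSA movement step: v(t+1) = rand * v(t) + a(t); A(t+1) = A(t) + v(t+1).

    Positions are clipped to [lower, upper] per dimension (assumed boundary handling).
    """
    new_positions, new_velocities = [], []
    for pos, vel, acc in zip(positions, velocities, accelerations):
        rand_k = rng.random()                          # one random number per agent
        v_next = [rand_k * v + a for v, a in zip(vel, acc)]
        pos_next = [
            min(max(x + v, lo), hi)
            for x, v, lo, hi in zip(pos, v_next, lower, upper)
        ]
        new_positions.append(pos_next)
        new_velocities.append(v_next)
    return new_positions, new_velocities
```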

The model evaluation methods
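To evaluate the accuracy of the simulation results, we computed an error matrix for a cell-by-cell comparison of the observed urban patterns with those simulated by the GSA-CA model [28]. The matrix yields two key metrics: overall accuracy (OA) and figure of merit (FOM). OA is the proportion of cells whose simulated state matches the observed state. FOM is the ratio of correctly simulated change (hits) to the union of observed and simulated change (hits + misses + false alarms).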
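For binary urban/non-urban maps, both metrics can be computed in one pass, as sketched below. The maps are assumed to be flattened lists over the modeled (non-excluded) cells. FOM uses the common hits / (hits + misses + false alarms) definition on urban gain. The function name is hypothetical.

```python
def accuracy_metrics(initial, observed, simulated):
    """Cell-by-cell OA and FOM for binary maps (1 = urban, 0 = non-urban).

    initial: urban map at the start year; observed/simulated: maps at the end year.
    """
    correct = hits = misses = false_alarms = 0
    for i0, obs, sim in zip(initial, observed, simulated):
        correct += obs == sim
        obs_gain = i0 == 0 and obs == 1        # observed non-urban -> urban
        sim_gain = i0 == 0 and sim == 1        # simulated non-urban -> urban
        if obs_gain and sim_gain:
            hits += 1
        elif obs_gain:
            misses += 1
        elif sim_gain:
            false_alarms += 1
    oa = correct / len(observed)
    changed = hits + misses + false_alarms
    fom = hits / changed if changed else 0.0
    return oa, fom


oa, fom = accuracy_metrics([1, 0, 0, 0, 0], [1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Because most cells in a city-scale raster never change, OA tends to be very high. FOM, which ignores persistence, is the more discriminating of the two.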

Study area
Urumqi is a typical arid city in Northwest China, and studying its urban growth can provide a reference for urban layout planning in similar areas. The city is the capital of the Xinjiang Uygur Autonomous Region and has a temperate continental climate. Its average annual precipitation is 236 mm, and evapotranspiration far exceeds precipitation. This results in sparse vegetation and makes drought a major constraint on urban growth. Urumqi had a population of 4.08 million in 2023, more than 90% of whom were urban residents. The city is surrounded by mountains on three sides and comprises seven districts and one county with a total area of 13,800 square kilometers (urumqi.gov.cn). The Xinshi, Tianshan, and Shayibake Districts, along with their surrounding areas, are the center of urban growth in Urumqi (Fig. 2).

Datasets and the preprocessing
We used the GLC_FCS30 datasets at 30 m resolution as land-use data for 2000, 2010, and 2020 [29]. The official overall accuracy of GLC_FCS30 as a public land-cover product is 82.5%, with a kappa coefficient of 0.784 [29]. The product also performs well in capturing fine urban detail in arid regions [30]. Because our study focuses on urban growth in arid regions, we reclassified the GLC_FCS30 datasets into three types: urban, non-urban, and excluded (water bodies). Studies have shown that urban growth is influenced by topography, human activities, climate change, and socio-economics [31,32]. Among these, we selected nine driving factors to characterize the arid region and express their relationships with urban growth. We used the Shuttle Radar Topography Mission Digital Elevation Model (SRTM DEM) to obtain surface elevation and assess the impact of topography on urban growth. We extracted proximity to city centers, town centers, railways, and roads to represent human influences on urban growth. Because the study area lies in an arid region, we chose drought intensity and land surface temperature (LST) as climatic factors. Drought intensity [33] was calculated with the LNPS-EWM model. This model uses the entropy weight method to combine four factors: land surface temperature, normalized difference vegetation index, potential evapotranspiration, and soil moisture. Pre-processing of these four factors included downscaling land surface temperature, deriving soil moisture, and validation. Factor weights determined by the entropy weight method objectively reflect the information in the original data and thus avoid bias introduced by subjective judgment. The four factors were first normalized to be dimensionless. The information entropy of each factor was then calculated to determine its weight in the LNPS-EWM model and generate the drought intensity map. Socioeconomic factors include gross domestic product (GDP) from the NOAA website and population per pixel (PPP) from the WorldPop website (Table 1).
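The entropy weight step can be sketched as follows. Each factor is min-max normalized, converted to proportions p, and scored by its entropy e. Its weight is proportional to the divergence 1 - e, so a factor with more contrast across pixels receives more weight. The direction flag, which reverses normalization for factors where larger values mean less drought (for example NDVI or soil moisture), is an assumption. The text does not say how LNPS-EWM handles direction. The function name is hypothetical, not the LNPS-EWM code.

```python
import math


def entropy_weights(matrix, positive):
    """Entropy weight method: one weight per column (factor) of `matrix`.

    matrix: rows = pixels (at least two), columns = factors.
    positive[j]: True if larger values of factor j indicate more severe drought.
    """
    n, m = len(matrix), len(matrix[0])
    if n < 2:
        raise ValueError("need at least two pixels")
    divergences = []
    for j in range(m):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0
        # Min-max normalisation, reversed for factors that reduce drought.
        norm = [(v - lo) / span if positive[j] else (hi - v) / span for v in col]
        total = sum(norm)
        if total == 0:                       # constant factor carries no information
            divergences.append(0.0)
            continue
        p = [v / total for v in norm]
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(n)
        divergences.append(1.0 - e)
    s = sum(divergences)
    return [dj / s for dj in divergences] if s else [1.0 / m] * m
```

A drought index could then be formed, for example, as the weighted sum of the normalized factors. That combination is likewise an assumption, not a description of LNPS-EWM.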
To reduce the computational cost and speed up the simulation, we normalized all spatial variables. The normalized proximity variables and remotely sensed images were then mapped as input layers for the CA model using ArcGIS 10.8 (Fig. 3).

Observed urban growth pattern
To analyze urban growth in Urumqi, we used the GLC_FCS30 land-cover maps for 2000, 2010, and 2020 to generate urban patterns for the city (Fig. 4). We treated impervious surfaces as urban and water bodies as a spatial constraint on urban growth, excluding them from further modeling. We then merged the remaining land-cover categories into non-urban. Between 2000 and 2010, urban growth in Urumqi occurred mainly in low-lying areas close to the existing built-up areas and wrapped around them (Figs. 4a, b). Between 2010 and 2020, urban growth occurred mainly to the northeast and southwest of the original built-up area. It was concentrated in the Xinshi, Tianshan, and Shayibake Districts and the surrounding low-lying areas, showing a pronounced agglomeration pattern (Fig. 4c).
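The reclassification into urban, non-urban and excluded cells can be sketched as a simple lookup. The class codes below are assumptions that should be checked against the official GLC_FCS30 legend before use: 190 for impervious surfaces and 210 for water bodies. The function and constant names are hypothetical.

```python
URBAN, NON_URBAN, EXCLUDED = 1, 0, None

# Assumed GLC_FCS30 legend codes; verify against the product documentation.
IMPERVIOUS_CODES = {190}
WATER_CODES = {210}


def reclassify(codes):
    """Collapse land-cover class codes into urban / non-urban / excluded."""
    out = []
    for code in codes:
        if code in IMPERVIOUS_CODES:
            out.append(URBAN)
        elif code in WATER_CODES:
            out.append(EXCLUDED)       # spatial constraint: never converts
        else:
            out.append(NON_URBAN)      # all remaining categories merged
    return out
```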

The CA parameters and probability of occurrence map
The optimal solution found by a heuristic algorithm is sensitive to its control parameters. For heuristic algorithms such as GSA, small populations (numPopulation) may lead to locally optimal solutions, whereas large populations may impose heavy computational loads. In this study, numPopulation was set to 20 times the number of variables, following an earlier publication [24]. Changing the fraction of the population with the best fitness (kbest) leads to different simulation results. We therefore determined the optimal kbest by running simulations with kbest set to 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9. For parameters with positive logistic regression estimates, we set the lower bound to zero and the upper bound to twice the estimate. For parameters with negative estimates, we set the upper bound to zero and the lower bound to twice the estimate [24].
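The bound rule and the kbest grid described above can be written down directly, as sketched below. Two details are assumptions: that the sign of each logistic-regression estimate decides which bound is zero, and that the "number of variables" is the length of the parameter vector. A zero estimate gives the degenerate interval [0, 0]. The function names are hypothetical.

```python
def gsa_bounds(logit_params):
    """Search bounds from logistic-regression estimates.

    Positive estimate b -> [0, 2b]; negative estimate b -> [2b, 0].
    """
    lower, upper = [], []
    for b in logit_params:
        if b >= 0:
            lower.append(0.0)
            upper.append(2.0 * b)
        else:
            lower.append(2.0 * b)
            upper.append(0.0)
    return lower, upper


def population_size(logit_params, per_variable=20):
    """numPopulation as 20 times the number of variables (assumed = parameter count)."""
    return per_variable * len(logit_params)


KBEST_VALUES = [round(0.1 * i, 1) for i in range(1, 10)]   # 0.1, 0.2, ..., 0.9
```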
We fixed the maximum number of iterations (maxIter) at 5000 and the tolerance threshold at 1e−6 (Table 2). We used GSA to obtain the CA parameters for each kbest setting (Table 3). Except for GDP and PPP, a positive parameter value indicates an inhibitory effect on urban growth, and a negative value indicates a facilitating effect. The absolute value of each parameter reflects the magnitude of its impact on urban growth. Figure 5 shows the probability of occurrence maps generated by the GSA-CA model for different kbest values, with probabilities ranging from 0 to 1. Visual inspection shows higher probabilities around the city center, suggesting that land close to the existing city center is more suitable for development. Notably, the probabilities for kbest = 0.5 were slightly higher than those for other kbest values, suggesting that 0.5 may be the most effective setting. The probability of occurrence maps were then used in the CA model to simulate the urban pattern in 2020 and to project future urban growth in 2040.

The simulated results
We used the GSA-CA model to simulate the urban patterns for 2010 (Figs. 6a-i) and 2020 (Figs. 6j-r). The simulations show that urban growth from 2000 to 2020 was mainly concentrated in low-lying areas near the existing built-up areas, indicating a tendency towards outward expansion. Urumqi's urban growth occurred primarily in the Xinshi, Tianshan, and Shayibake Districts and the surrounding low-lying areas, exhibiting a clear agglomeration pattern.

The GSA-CA model assessment
We compared the simulated and observed urban patterns pixel by pixel for 2010 and 2020. Figure 7 shows that simulation accuracy first increased and then decreased as kbest increased. The overall accuracy of the urban pattern simulations exceeded 98.40% in both the calibration and validation phases, with FOM exceeding 36.50%. In the calibration phase, the highest overall accuracy (98.42%) and FOM (43.03%) were obtained at kbest = 0.5. The validation phase showed a similar pattern, with overall accuracy (98.52%) and FOM (37.64%) also peaking at kbest = 0.5. These results indicate that the GSA-CA model performed well. Its FOM was higher in the calibration phase (43.03%) than in the validation phase (37.64%), whereas its overall accuracy was marginally higher in validation.
Figure 8 shows assessment maps of hits, misses, and false alarms for the most accurate calibration (2010) and validation (2020) simulations, with kbest set to 0.5. Each cell of change is classified as a correct projection (hit), an omission (miss), or an incorrect prediction (false alarm). The zoomed-in panels show that hits typically occur near the original urban area, while false alarms and misses both tend to lie close to correctly simulated cells.
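The visual observation that false alarms and misses cluster next to hits could be quantified, as in the sketch below. First, label each cell of a 2-D map as a hit, miss or false alarm. Then measure the share of cells of a given type that have at least one hit among their eight neighbors. This is an illustrative analysis, not one reported in the study. The function names are hypothetical.

```python
def label_cells(initial, observed, simulated):
    """2-D map of 'hit', 'miss', 'false_alarm', or None (no gain observed or simulated)."""
    labels = []
    for row0, row_o, row_s in zip(initial, observed, simulated):
        row = []
        for i0, obs, sim in zip(row0, row_o, row_s):
            obs_gain = i0 == 0 and obs == 1
            sim_gain = i0 == 0 and sim == 1
            if obs_gain and sim_gain:
                row.append("hit")
            elif obs_gain:
                row.append("miss")
            elif sim_gain:
                row.append("false_alarm")
            else:
                row.append(None)
        labels.append(row)
    return labels


def share_adjacent_to_hits(labels, kind):
    """Fraction of cells labelled `kind` with at least one hit among their 8 neighbours."""
    rows, cols = len(labels), len(labels[0])
    total = near = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != kind:
                continue
            total += 1
            if any(
                labels[rr][cc] == "hit"
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ):
                near += 1
    return near / total if total else 0.0
```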

Future scenario projections
Using the GSA-CA model calibrated with the best-performing setting identified above (kbest = 0.5), we projected urban growth patterns in 2040 under two scenarios (Fig. 9):
(a) Scenario I (BAU scenario), a business-as-usual (BAU) scenario based on extrapolating the urban growth rate from 2010 to 2020; and
(b) Scenario II (RUG scenario), a rapid urban growth (RUG) scenario in which urban growth around the major urban centers accelerates.
Scenario I assumes that Urumqi continues to grow at the same rate as in the previous period, without accounting for potential changes in policy, infrastructure, or economic conditions. In this scenario, urban growth is mainly dispersed around the original urban area, particularly in the Xinshi and Midong Districts (Fig. 9a). Scenario II assumes accelerated growth, with the built-up area increasing 1.5-fold over the following 20 years. This scenario produces a triangular urban pattern concentrated in the peripheral areas of the city (Fig. 9b). Together, the two scenarios describe contrasting urban futures that provide useful insights for urban planning and policy making in Urumqi.
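The two scenarios differ mainly in how much built-up land the CA must allocate by 2040. The sketch below shows one way to derive those land demands. Two readings are assumptions. BAU is taken as a linear extension of the 2010-2020 annual increment over 20 years. RUG is read literally as the 2020 built-up area multiplied by 1.5. The cell counts in the example are hypothetical.

```python
def bau_demand(urban_2010, urban_2020, years_ahead=20):
    """Business-as-usual: extend the 2010-2020 annual increment linearly."""
    per_year = (urban_2020 - urban_2010) / 10
    return urban_2020 + per_year * years_ahead


def rug_demand(urban_2020, factor=1.5):
    """Rapid urban growth, read here as the 2020 built-up area times `factor`."""
    return urban_2020 * factor


# Hypothetical urban cell counts for 2010 and 2020.
bau_2040 = bau_demand(100_000, 130_000)
rug_2040 = rug_demand(130_000)
cells_to_convert_bau = bau_2040 - 130_000
```

The difference between each demand and the 2020 built-up area gives the number of cells the CA would convert under that scenario.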

Discussion
The complex topography and particular climate of arid regions strongly influence urban growth and, ultimately, urban patterns [32,34]. It is therefore imperative to simulate dynamic urban growth accurately in arid regions such as Urumqi by considering topographic and climatic features [35]. The effectiveness of the modeling method is also crucial for reflecting current urban conditions and projecting future urban scenarios [12]. Consequently, careful selection of drivers typical of the study area and optimization of the modeling approach are essential for generating reliable projections of future land use and urban growth in arid regions. Such projections can in turn help formulate sound urban growth policies.

The model performance and impact of heuristic parameterization on modeling
Many publications have applied different cellular automata-based approaches to simulate urban growth and project future scenarios for various kinds of areas. However, different modeling approaches perform differently in arid regions, and the same model can perform differently across arid regions [36]. Models such as SLEUTH, CA-Markov, and FLUS are widely recognized for their ability to simulate the urban growth process dynamically while considering influencing factors comprehensively [35,37]. The SLEUTH-Density model was developed to simulate built-up density in Ajmer, India, and performed well [38]. CA-Markov has been used to investigate land use and land cover change in a desert region of Pakistan with an overall accuracy above 87% [39]. The overall accuracy of our model exceeds 98%, which is higher than that reported for simulations of many other cities [28]. Because OA is dominated by cells that remain unchanged, FOM (43.03% in calibration and 37.64% in validation) provides a stricter measure of performance. These results suggest that our method offers a credible alternative for urban growth simulation and projection in arid regions. Parameter settings for CA models can vary significantly across study areas [12,28]. Moreover, GSA is a stochastic optimization algorithm that can produce different results when run multiple times with the same training data [40,41]. Calibrating nine GSA-CA models with kbest values from 0.1 to 0.9, we found that kbest = 0.5 performed best. Modeling accuracy improved slightly as kbest increased up to 0.5, whereas further increases did not improve GSA performance. We therefore suggest kbest = 0.5 as an effective setting for regions similar to Urumqi.

Leading factors of Urumqi's urban expansion and recommendations
The CA parameters automatically identified by GSA can quantify the contribution of each factor to urban growth in arid regions. In this study, DEM is the most important factor inhibiting urban growth. This may be because Urumqi is surrounded by mountains on three sides and more than half of its area is mountainous, leaving very limited land for urban development [35]. Proximity to city centers and town centers is the most important factor promoting urban growth, followed by proximity to roads and railways. This suggests that location and transportation strongly shape urban growth in arid cities, and that unplanned urban growth may further exacerbate drought.
Based on these findings, we offer several recommendations for urban management and planning in Urumqi. First, we advocate in-depth investigation of urban growth dynamics in arid regions to provide reliable data support for the sustainable development of cities in these regions [1,42]. Second, local governments should formulate rational urban planning policies to improve land-use efficiency [43]. Third, given the constraints of terrain and climate, urban infrastructure should be strengthened to enhance the radial links between cities [44,45].

Limitations and future directions
This study simulated and projected urban growth in an arid region using a heuristic algorithm. Although the driving factors are time-dependent and would vary under different scenarios, we did not consider the effect of their changes on the model; incorporating these changes may improve modeling accuracy. In addition, the model requires large amounts of land-use and driving-factor data as inputs, and the quality and reliability of these remotely sensed data affect model performance. The uncertainty in modeling results caused by input data accuracy could be reduced if higher-resolution data become available.

Conclusions
Modeling and projecting urban growth in arid regions is essential for sustainable development. Recognizing the distinctive characteristics of cities in arid regions, we developed a new CA model (GSA-CA) that uses GSA to simulate and project urban growth in these regions. We calibrated the GSA-CA model for Urumqi using data from 2000 to 2010, validated it using data from 2010 to 2020, and then projected urban growth scenarios for 2040. Urumqi's urban growth over the past 20 years occurred mainly in the Xinshi, Tianshan, and Shayibake Districts and the surrounding low-lying areas, showing a clear agglomeration pattern. The overall accuracy of the model was 98.42% with an FOM of 43.03% in 2010, and 98.52% with an FOM of 37.64% in 2020, demonstrating its effectiveness. Finally, we projected two potential scenarios for Urumqi in 2040, the BAU scenario and the RUG scenario, to help improve urban planning and development strategies.
The model has the potential to simulate urban growth and project future scenarios in other arid regions worldwide. Future research should focus on identifying and integrating key meteorological and ecological factors to improve the ability of CA models to project urban growth in arid regions.

Fig. 1. Workflow of the GSA-CA model for the simulation and projection of dynamic urban growth.

Fig. 2. The administrative profile of the study area with slope. The map was created with ArcMap, version 10.8 (http://www.esri.com).

Fig. 6. The simulated urban patterns using the GSA-CA model with different fractions of the population for 2010 and 2020. The map was created with ArcMap, version 10.8 (http://www.esri.com).

Fig. 7. Overall accuracy and FOM during model calibration and validation under different fractions of the population (kbest) for 2010 and 2020.

Fig. 8. The FOM assessment of the simulated results produced by GSA-CA with the fraction of the population (kbest) set to 0.5. The map was created with ArcMap, version 10.8 (http://www.esri.com).

Table 1. Influence factors used to examine the spatiotemporal pattern of urban growth.

Table 2. The GSA control parameters for retrieving the CA transformation rules.

Table 3. CA parameters generated by GSA-CA for different fractions of the population with the best fitness (kbest).