Driving forces and prediction of urban land use change based on the geodetector and CA-Markov model: a case study of Zhengzhou, China

ABSTRACT Exploring urban land use change is a classical problem in urban geography. Taking Zhengzhou as an example, this paper analyzes the spatial and temporal characteristics and driving factors of urban land use change, and simulates the future spatial pattern of urban land use. The results show that land use in Zhengzhou was dominated by farmland and construction land, that the areas of forestland, grassland, water, and unused land were smaller, and that the main land use change was the conversion of farmland into construction land. An accuracy check of the simulated land use data for 2020 gave a kappa coefficient of 0.9445, which met the accuracy requirement. According to the predicted land use data for 2025, the areas of grassland, construction land, and water may decrease, and the areas of farmland, forestland and unused land may increase. Based on the driving force analysis of land use changes, the prediction results can provide an important reference for the formulation and planning of policies related to urban construction.


Introduction
The global urbanization rate has now exceeded 50%, and it is estimated that by 2050 over 67.2% of the world's population will be living in cities (Song, Pijanowski, and Tayyebi 2015). Urban land use change is a direct manifestation of urbanization (Deng et al. 2009; Dadashpoor, Azizi, and Moghadasi 2019; Qu and Long 2018), and as the urbanization rate continues to climb, the spatial pattern of urban land use gradually tends to become irrational (Wang et al. 2018; Liu, Li, and Yang 2018). Urban land use change is a complex process (Alqurashi, Kumar, and Al-Ghamdi 2016; Luo et al. 2016; Qiu et al. 2019), and research on it mainly concerns the impact of the combined action of human activities and natural development processes on the earth's surface and the ecological environment within the region (Rockström et al. 2009). Land use change is caused by humans and in turn has an impact on human life (Luo and Zhang 2022; Webber et al. 2021; He et al. 2021; Kalnay and Cai 2003; Ouyang et al. 2021), so it is necessary to study the development patterns of land use change. Since the 1990s, almost 30 years of continuous development have increased the depth and breadth of the research tools and research topics related to land use evolution (Herold, Couclelis, and Clarke 2005; Mosammam et al. 2017; Zhai et al. 2020; Yin et al. 2021). The main directions of research on the spatio-temporal development of land use are the detection of driving mechanisms, the portrayal of processes, and simulation and prediction (Liu and Deng 2010).
The issue of driver analysis began to receive widespread attention in the 1990s (Committee et al. 1999; Turner, Moss, and Skole 1993), and initial studies usually involved extensive empirical analysis of specific regions as a way to study changes in a single land use type (Sneath 1998; Lambin et al. 2001). As research progressed, empirical statistical methods were gradually used to quantify the influence of drivers, with specific methods such as principal component analysis (PCA) (Guo, Ye, and Hu 2022), random forest methods (Zhou et al. 2020; Wu et al. 2021; Lv et al. 2021), the analytic hierarchy process (AHP) (Mohamed and Worku 2020; Fitawok et al. 2020; Thapa and Murayama 2010), geographically weighted regression (GWR) (Wu, Li, and Wang 2021; Punzo, Castellano, and Bruno 2022; Chen et al. 2020; Zhang et al. 2020), and logistic regression models (Shu et al. 2014; Peng et al. 2017; Cao et al. 2020; Mahmoud and Divigalpitiya 2019). In addition to the methods mentioned above, the more recent Geodetector, which is easy to operate and highly applicable, has also been widely applied in explaining drivers (Zhu, Meng, and Zhu 2020; Ouyang et al. 2021; Liu et al. 2021). Proposed by Wang Jinfeng, the Geodetector reveals the size of the influence of the drivers of geographical phenomena by detecting their spatial heterogeneity, i.e. if an independent variable has an important influence on a dependent variable, the spatial distributions of the independent and dependent variables should be similar, and the higher the degree of similarity, the greater the influence of the independent variable on the dependent variable (Long et al. 2022; Wang, Zhang, and Chen 2021b).
The simulation and prediction of future land use changes can provide a reference for the planning and construction of cities, help optimize the urban environment and improve the efficiency of urban land use (Koroso, Lengoiboni, and Zevenbergen 2021; Khamchiangta and Dhakal 2021; Wang et al. 2021a). Solving this problem first requires studying urban expansion patterns (Zheng et al. 2022; Cengiz, Görmüş, and Oğuz 2022), and then simulating and predicting future urban land use patterns using appropriate models (Shukla and Jain 2020). For the simulation and prediction of land use, quite a few models have been applied in urban geography, such as the CA model (Lauf et al. 2012; van Vliet et al. 2013), CLUE-S model (Huang, Huang, and Liu 2019; Peng et al. 2020), GEOMOD model (Rashmi and Lele 2010), MLP-NN model (Singh et al. 2020; Darvishi, Yousefi, and Marull 2020) and Markov model (Bose and Chowdhury 2020). Of these models, the cellular automata (CA) model excels in simulation by analyzing transformations of spatio-temporal locations (Tobler 1970), but suffers from certain deficiencies in linear analysis. The Markov model is very effective in quantitatively modeling trends over long time series, but is deficient in spatial modeling (Zhang et al. 2021). Combining the advantages of both, the CA-Markov model not only has the ability of CA to effectively simulate complex spatial evolution, but also retains the ability of Markov models to make long-term predictions (Tariq and Shu 2020; Matlhodi et al. 2021; da Cunha et al. 2021). The CA-Markov model is well-established and has been applied in a variety of studies (Aburas et al. 2017; Fu, Wang, and Yang 2018; Firozjaei et al. 2019; Zhang et al. 2021; Fu et al. 2022). The method has relative advantages over the CLUE-S model in simulating and predicting urban land use, and can simulate a larger area (Wang et al. 2010; Zheng and Hu 2018; Chotchaiwong and Wijitkosum 2019); compared to the GEOMOD model, it can directly simulate and predict land use for multiple images (Rashmi and Lele 2010; Matlhodi et al. 2021; Hamad, Balzter, and Kolo 2018); compared to the MLP-NN model, it is more convenient and faster (Singh et al. 2020; Darvishi, Yousefi, and Marull 2020).
The drivers of land use type change come from a number of sources and are not only constrained by the natural environment, but are also influenced by land planning and urban planning, so the search for drivers affecting urban land use evolution should be as comprehensive as possible. Planning for the future should be based on sound forecasting, and the detection and analysis of historical land use change drivers, the identification of transfer matrices and the allocation of appropriate weights to each driver are the focus and difficulty of forecasting future urban land use.
This paper addresses two main questions, how to study the drivers of land use change and how to predict that change, using the Geodetector and the CA-Markov model. The principle of the Geodetector makes it more suitable for processing geographical data than the previously used methods of determining driving factors. The CA-Markov model combines quantitative and spatial modeling and has a high degree of accuracy. The research and application of Geodetector models and CA-Markov models are becoming increasingly mature in China, but studies that combine the two methods and use the drivers from the Geodetector analysis as CA-Markov model inputs have not been published. Therefore, this paper attempts to combine the two models and obtain simulation results with high accuracy, which provides new ideas and solutions for the study of urban land use evolution drivers and for simulation and prediction.
Zhengzhou is one of China's 'national central cities' and a mega-city. Mu et al. (2016) showed that the urbanization level of Zhengzhou increased rapidly after 2000, and the urbanization rate increased from 63.6% to 78.4% during 2010-2020. Rapid economic growth and urbanization have led to frequent land use changes in the region, which have seriously affected the living environment of the residents and the ecological environment of the region (Zhou and Chen 2018; Ye et al. 2018). How to determine the development pattern of urban land use change and model the future has become an issue that cannot be ignored in the sustainable development of Zhengzhou. In this paper, Zhengzhou is used as the study area. Based on an analysis of the characteristics of urban land use type conversion in previous years, the explanatory power of different driving factors is determined using the Geodetector, followed by spatio-temporal prediction with the CA-Markov model. The results have a certain reference value for future urban planning and related work in Zhengzhou, so that the allocation of land resources can be gradually rationalized and improved, contributing to the sound economic, social and environmental development of Zhengzhou (Li, Sato, and Zhu 2003; Jenerette and Wu 2001; Deal and Schunk 2004; Araya and Cabral 2010).

Study area
Zhengzhou city (112°42′-114°14′ E, 34°16′-34°58′ N) is the capital of Henan Province, China, and a national historical and cultural city. It is located in the lower reaches of the Yellow River, in the transition zone between the second and third terrain steps in China, with terrain that is high in the southwest and low in the northeast, descending in a stepped pattern (Figure 1), and a total area of 7532.56 km². The study area is the only 'double cross' center in China's conventional and high-speed railway networks. The land use types in the study area are diverse, with farmland accounting for the highest proportion of the area but unevenly distributed in space; the degree of land development is high, construction land is expanding quickly, and the conflict between farmland protection and rapid development is prominent. In 2012, the State Council approved the Plan for the Central Plains Economic Zone (2012-2020), requiring Zhengzhou city to play a core and leading role in the development of the Central Plains Economic Zone. The following year, Zhengzhou city was selected as an important node city of the Belt and Road Initiative. In 2016, Zhengzhou city was designated as a 'national central city', requiring the acceleration of relevant work to comprehensively improve the level of economic development while taking into account the protection of agricultural land. In the future, the development of Zhengzhou city will continue to be oriented to the west and south, with 'strong east, dynamic south, beautiful west, quiet north, medium and excellent, outreach' as the blueprint of its territorial space. By the end of 2020, the permanent resident population of Zhengzhou city was 12.6 million, the urbanization rate of the population had reached 78.4%, and the GDP was 12.03 billion yuan, an increase of 3.0% over the previous year.

Data collection and processing
The land use data used in this study are from the Resources and Environmental Science and Data Center, Chinese Academy of Sciences (https://www.resdc.cn/). According to the national land use/cover classification system of remote sensing monitoring, combined with the actual situation of the study area, the land use types were divided into six categories: farmland, forestland, grassland, water, construction land and unused land. The data is based on Landsat TM images and is generated by manual visual interpretation. The data of 2010, 2015 and 2020 are used in this paper, and the format is vector data.
The driving factors used in this paper are: ① DEM data, taken from the ASTER GDEM digital elevation data with 30 m resolution on the Geospatial Data Cloud (http://www.gscloud.cn/search); ② slope data, extracted from the DEM data in ArcGIS; ③ population data for 2010, 2015 and 2020, obtained from WorldPop (https://www.worldpop.org/) with a resolution of 1 km; and ④ road data and 17 other vector datasets of Zhengzhou city (Table 1); these data are current (real-time), with the exception of the city center.
The CA-Markov model can be run at different image resolutions, and different resolutions lead to different results (Pan et al. 2010; Wu et al. 2019). The original data were first pre-processed using ArcMap 10.7 software. The original land use vector data were projected into CGCS2000_3_Degree_GK_CM_114E and then converted into raster images. The other vector driving force data were also projected into CGCS2000 and then converted into raster images with the 'Euclidean Distance' tool. The DEM, slope, and population data were aligned with the projection of the other images using 'Project Raster'. Finally, considering the experimental accuracy and research efficiency, the resolution of all raster images was set to 200 m × 200 m (Cui 2014). This paper considered the influence of different factors on urban land use change in terms of natural, road, and social conditions (Table 2).

Drivers of urban land use change
• Natural: Elevation and slope are the main environmental factors affecting urban construction, and they largely determine the spatial pattern of urban development; water bodies also influence the type and change of surrounding sites. These factors reveal how natural environmental factors and land use types interact with and influence each other.
• Road: The economic activity that accompanies transportation routes influences or changes land development within their area of influence and drives changes in the spatial pattern of land use. In this paper, driving factors such as national roads and provincial roads are selected to study the driving effect of road traffic factors on land use types.
• Socioeconomic condition: Distance from medical services and other institutions and population distribution represent the impact of different social factors on land use. Social factors reflect not only regional differences in space but also the spatial distribution of the economic level.

Geodetector
The Geodetector is a statistical method that reveals the driving forces of different processes by detecting their spatial heterogeneity, and it has been widely used in several fields of the natural and social sciences. The basic idea is that the study area is divided into several subregions; spatial differentiation exists if the sum of the variances of the subregions is smaller than the total variance of the region, and if the spatial distributions of two variables are approximately the same, the two variables are statistically associated. Geographical detectors have two main advantages over other driving force analysis methods: i) they can detect not only numerical variables but also qualitative variables; ii) they can detect the interaction of two independent variable factors on the dependent variable.
Geodetector is good at analyzing qualitative variables; ordinal, ratio and other numerical data can also be analyzed with Geodetector as long as they are appropriately discretized. The 'GD' R package, which allows for optimal discretization and spatially stratified heterogeneity analysis, was developed by Yongze Song et al. and is freely available from The Comprehensive R Archive Network (CRAN) (https://cran.r-project.org/web/packages/GD/index.html). Five commonly used unsupervised discretization methods are used to discretize the continuous variables, namely equal interval, natural breaks, quantile, geometric interval and standard deviation breakpoints, and all continuous variables are divided into 3-7 intervals. For all combinations of discretization methods and numbers of intervals, the relative importance (Q value) is calculated using the factor detector model, and the largest Q value determines the best combination of parameters for discretizing each continuous variable.
The factor detector: The Q-statistic measures the spatial heterogeneity of the dependent variable and the extent to which each driver explains the dependent variable. The computational expression is:
$$Q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2} = 1 - \frac{SSW}{SST}$$
where $h = 1, 2, \ldots, L$ is the stratification of the dependent variable or driver; $N_h$ and $N$ are the number of units in stratum $h$ and in the whole area, respectively; and $\sigma_h^2$ and $\sigma^2$ are the variances of the Y values in stratum $h$ and in the whole area, respectively. $SSW = \sum_{h=1}^{L} N_h \sigma_h^2$ and $SST = N \sigma^2$ are the sum of the variances within the strata and the total variance of the whole area, respectively. Q takes values in [0, 1], and a larger Q indicates that the driver explains more of the spatial variation of Y.
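To make the factor detector concrete, the following is a minimal Python sketch of the Q-statistic as defined above. It is not the 'GD' package implementation; the function name `q_statistic` and the toy data (a binary change flag and two hypothetical stratifications) are assumptions made only for illustration.

```python
from collections import defaultdict


def q_statistic(y, strata):
    """Factor-detector Q = 1 - SSW/SST for values y grouped by stratum labels."""
    if len(y) != len(strata) or not y:
        raise ValueError("y and strata must be non-empty and of equal length")
    n = len(y)
    mean_all = sum(y) / n
    # SST = N * sigma^2, i.e. the total sum of squared deviations of Y
    sst = sum((v - mean_all) ** 2 for v in y)
    if sst == 0:
        raise ValueError("Y has zero variance; Q is undefined")
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    # SSW = sum over strata of N_h * sigma_h^2
    ssw = 0.0
    for values in groups.values():
        mean_h = sum(values) / len(values)
        ssw += sum((v - mean_h) ** 2 for v in values)
    return 1.0 - ssw / sst


# Hypothetical toy data: 1 = land use changed, 0 = unchanged
changed = [1, 1, 1, 0, 0, 0, 1, 0]
near_subway = ["near", "near", "near", "far", "far", "far", "near", "far"]
mixed_zone = ["a", "b", "a", "b", "a", "b", "a", "b"]

q_subway = q_statistic(changed, near_subway)  # strata separate Y perfectly
q_mixed = q_statistic(changed, mixed_zone)    # strata explain little of Y
print(q_subway, q_mixed)
```

In this toy case the first stratification separates changed from unchanged cells completely, so its Q is 1, while the second explains only a quarter of the variance.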
The interaction detector: To determine whether each pair of drivers has an interactive effect on the dependent variable, the relevant layers are overlaid and the resulting explanatory powers are compared. That is, the explanatory powers $Q(X_m)$ and $Q(X_n)$ of $X_m$ and $X_n$ on Y are first calculated separately, then the explanatory power $Q(X_m \cap X_n)$ when the two act simultaneously is calculated, and the three values are compared.
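The comparison step can be sketched as follows. The five category names follow the scheme commonly cited for the Geodetector; they are not spelled out in the text above, so they should be read as an assumption, as should the helper names `overlay_strata` and `classify_interaction` and the tie-handling at the boundaries.

```python
def overlay_strata(strata_m, strata_n):
    """Combine two stratifications into the overlay strata X_m ∩ X_n."""
    if len(strata_m) != len(strata_n):
        raise ValueError("stratifications must have equal length")
    return list(zip(strata_m, strata_n))


def classify_interaction(q_m, q_n, q_mn, tol=1e-9):
    """Compare Q(X_m ∩ X_n) with Q(X_m) and Q(X_n) and name the interaction."""
    lo, hi = min(q_m, q_n), max(q_m, q_n)
    total = q_m + q_n
    if abs(q_mn - total) <= tol:
        return "independent"
    if q_mn > total:
        return "nonlinear enhancement"
    if q_mn > hi:
        return "bivariate enhancement"
    if q_mn >= lo:
        # Ties at the boundaries are assigned to this class by convention here.
        return "univariate weakening"
    return "nonlinear weakening"


# Hypothetical Q values for two drivers and their overlay
print(classify_interaction(0.2, 0.3, 0.6))  # larger than the sum of both
print(classify_interaction(0.2, 0.3, 0.4))  # larger than either one alone
```

The overlay strata returned by `overlay_strata` can be passed to any Q-statistic routine in place of a single driver's strata to obtain $Q(X_m \cap X_n)$.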
The data entered into the 'GD' package can only be in tabular form (Wang, Zhang, and Fu 2016), so the dependent variable (land use data) and independent variables (driving force data) must be pre-processed first: ① the land use data of 2010 and 2020 are intersected in ArcMap, and patches that changed are assigned 1 while patches that did not change are assigned 0; ② a fishnet (sampling grid) is created over the study area; ③ the binary image from ① and the raster image of each driving factor are sampled through the fishnet to obtain tabular data; ④ NULL values are filtered out of the table to obtain the initial data input into the 'GD' package. The results of the factor detector and interaction detector are then obtained by running the model.
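Steps ① and ④ above can be illustrated with a small Python sketch. It assumes the fishnet sampling has already produced one value per sample point for each layer; the function name `build_gd_table`, the land use codes and the driver column names are hypothetical and only stand in for the ArcMap outputs.

```python
def build_gd_table(lu_2010, lu_2020, drivers):
    """Return rows {'changed': 0/1, driver values...} for points without NULLs.

    lu_2010, lu_2020: land use codes at each fishnet sample point.
    drivers: dict mapping driver name -> list of sampled values (None = NULL).
    """
    n = len(lu_2010)
    if len(lu_2020) != n or any(len(col) != n for col in drivers.values()):
        raise ValueError("all inputs must have one value per sample point")
    rows = []
    for i in range(n):
        values = {name: col[i] for name, col in drivers.items()}
        # Step 4: drop points where any layer is NULL
        if lu_2010[i] is None or lu_2020[i] is None or None in values.values():
            continue
        # Step 1: 1 if the land use type changed between the two years, else 0
        row = {"changed": int(lu_2010[i] != lu_2020[i])}
        row.update(values)
        rows.append(row)
    return rows


# Hypothetical samples: codes 1 = farmland, 2 = forestland, 5 = construction land
lu_2010 = [1, 1, 5, 2, None]
lu_2020 = [5, 1, 5, 1, 3]
drivers = {
    "x17_dist_subway": [120.0, 3400.0, None, 800.0, 50.0],
    "x3_dem": [98, 110, 105, 320, 90],
}
table = build_gd_table(lu_2010, lu_2020, drivers)
for row in table:
    print(row)
```

Two of the five sample points are dropped because one of their layers is NULL, leaving a three-row table with a binary dependent variable.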

CA-Markov model
The CA-Markov model combines two parts, CA and Markov, to simulate future land use type changes by using the transfer probability matrix as a rule. The Markov model originated in the study of stochastic processes and is now widely used to describe the evolution of land patterns (Luo and Zhang 2014). It quantifies the transfer of land use types between periods, from which the area transfer matrix and probability transfer matrix of the different land use states can be obtained (Ibarra-Bonilla et al. 2021). The calculation equation is as follows (Mansour, Al-Belushi, and Al-Awadhi 2020):
$$S_{t+1} = P_{ij} \times S_t, \qquad P_{ij} = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1n} \\ P_{21} & P_{22} & \cdots & P_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ P_{n1} & P_{n2} & \cdots & P_{nn} \end{bmatrix}, \qquad \sum_{j=1}^{n} P_{ij} = 1 \quad (i, j = 1, 2, \cdots, n) \tag{2}$$
where $S_t$ and $S_{t+1}$ are the land use statuses in periods $t$ and $t+1$, respectively, $P_{ij}$ is the transfer probability matrix, and $n$ is the number of land use types. The CA is a network dynamics model with discrete space, time and state, and local spatio-temporal causality (Li et al. 2016; Wang et al. 2018). It is composed of cells, a lattice, a neighborhood and transformation rules, and is based on the principle that a cell transforms or remains unchanged according to the transition rules, considering both its own state and the states of its neighbors (Hou, Chang, and Yu 2004). The model is formulated as follows (Rahman and Ferdous 2021):
$$S_{(t,\,t+1)} = f(S_t, N)$$
where $S$ is the set of cellular states, $f$ is the rule of transformation, $N$ is the neighborhood filter, $t$ and $t+1$ represent the early year and the later year, respectively, and the calculation of the model is based on the extended Moore neighborhood (Zhao et al. 2019; Matlhodi et al. 2021; Ghosh, Chatterjee, and Dinda 2021). The combination of the two models overcomes the inability of the Markov model to predict spatial data and the inability of the CA model to consider the actual influencing factors. It makes full use of the powerful spatial computing power of the CA and the ability of the Markov model to make long-term predictions.
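The Markov part of the model can be illustrated with a short Python sketch that counts cell transitions between two classified rasters, normalizes each row into transfer probabilities, and projects the land use totals one period ahead. This is a sketch under stated assumptions, not the IDRISI implementation: the function names, the integer class codes and the ten-cell toy rasters are hypothetical, and the projection uses the row convention $S_{t+1,j} = \sum_i S_{t,i} P_{ij}$, which matches the constraint that each row of $P_{ij}$ sums to 1.

```python
def transition_matrices(codes_t0, codes_t1, n_types):
    """Count cell transitions and row-normalise them into probabilities P_ij."""
    if len(codes_t0) != len(codes_t1):
        raise ValueError("both rasters must have the same number of cells")
    counts = [[0] * n_types for _ in range(n_types)]
    for i, j in zip(codes_t0, codes_t1):
        counts[i][j] += 1
    probs = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Type absent at t0: no transitions observed, so assume it persists.
            probs.append([1.0 if j == i else 0.0 for j in range(n_types)])
        else:
            probs.append([c / total for c in row])
    return counts, probs


def project_state(state_t, probs):
    """Project land use totals one period ahead: S_{t+1,j} = sum_i S_{t,i} P_ij."""
    n = len(probs)
    return [sum(state_t[i] * probs[i][j] for i in range(n)) for j in range(n)]


# Hypothetical flattened rasters: 0 = farmland, 1 = construction land, 2 = water
cells_2010 = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
cells_2015 = [0, 0, 0, 0, 1, 1, 1, 1, 2, 0]
counts, probs = transition_matrices(cells_2010, cells_2015, n_types=3)
state_2015 = [cells_2015.count(k) for k in range(3)]
state_2020 = project_state(state_2015, probs)
print(counts)
print(state_2020)
```

Because the matrix is estimated over a five-year interval, applying it once to the 2015 totals gives the quantities expected five years later; the CA step then decides where those quantities are allocated in space.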
In this study, kappa coefficients were used to evaluate the accuracy of simulations and predictions (Sankarrao, Ghose, and Rathinsamy 2021), corresponding to the CROSSTAB module in IDRISI. The principle equation is as follows:
$$Kappa = \frac{P_o - P_c}{1 - P_c}, \qquad P_o = \frac{s}{n}, \qquad P_c = \frac{a_1 b_1 + a_2 b_2}{n^2}$$
where $n$ is the total number of raster pixels; $a_1$ is the number of pixels of actual construction land; $a_2$ is the number of pixels of actual non-construction land; $b_1$ and $b_2$ are the number of pixels of simulated construction land and non-construction land, respectively; and $s$ is the number of pixels with equal values in the real raster and simulated raster. The simulation and prediction of land use were mainly carried out in IDRISI software. The weighted linear combination of the MCE module was used to input the influence of the driving factors calculated by the Geodetector and generate a suitability image for each land use type, and these images were then packaged into a suitability atlas. The CA_MARKOV module was then run: the forecast base year was set to 2015, and the transfer area matrix calculated from the 2010 and 2015 land use data and the suitability image set obtained above were used as inputs. Matching the five-year prediction interval, the number of iterations was set to 5, and a 5 × 5 filter was selected as the cellular automata filter to obtain the land use simulation results for 2020. The CROSSTAB module was used to assess the simulation accuracy, and if the accuracy test was passed, the simulation continued to predict land use in 2025. The research framework is shown in Figure 2.
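The weighted linear combination step can be sketched in Python as below. How exactly the Geodetector results were turned into MCE weights is not stated, so this sketch assumes that the Q values are simply normalized to sum to 1; the function names, the driver names, the Q values and the factor scores (standardized to [0, 1]) are all hypothetical.

```python
def weights_from_q(q_values):
    """Normalise Q values so the weights sum to 1 (an assumed weighting rule)."""
    total = sum(q_values.values())
    if total <= 0:
        raise ValueError("at least one Q value must be positive")
    return {name: q / total for name, q in q_values.items()}


def weighted_linear_combination(factor_scores, weights):
    """Suitability per cell = sum over factors of weight * standardised score."""
    names = list(weights)
    n_cells = len(factor_scores[names[0]])
    if any(len(factor_scores[name]) != n_cells for name in names):
        raise ValueError("all factor layers must have the same number of cells")
    return [
        sum(weights[name] * factor_scores[name][c] for name in names)
        for c in range(n_cells)
    ]


# Hypothetical Q values and standardised scores (1 = most suitable) for 3 cells
q_values = {"x17_subway": 0.30, "x2_city_center": 0.20, "x3_dem": 0.10}
scores = {
    "x17_subway": [1.0, 0.5, 0.0],
    "x2_city_center": [1.0, 0.0, 0.5],
    "x3_dem": [1.0, 1.0, 0.0],
}
weights = weights_from_q(q_values)
suitability = weighted_linear_combination(scores, weights)
print(weights)
print(suitability)
```

One such suitability layer would be produced for each land use type, and the set of layers plays the role of the suitability atlas passed to the CA step.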

Characteristics of land use change in historical periods
During the period from 2010 to 2020, the land use types in the study area were dominated by farmland and construction land, which together accounted for approximately 86% of the study area, with farmland covering the largest area, exceeding 60% of the total area in each year (Figure 3).
As seen from Table 3, during the period 2010-2015, farmland had the largest transfer-out area (262.72 km²) and the largest net loss (256.49 km²) of any land use type in the study area. The land use type with the largest area expansion was construction land (232.05 km²), followed by water (18.39 km²), and the main source of transfer for both was farmland. Grassland (5.78 km²), forestland (0.21 km²) and unused land (0.06 km²) all had a small increase in area. Unused land was the land use type with the least overall area change, with no transfer out and only small transfers in from farmland (0.05 km²) and forestland (0.02 km²), which may be due to abandonment of farmland and deforestation. Although the area of farmland showed a decreasing trend, there was still a small amount of transfer to it, mainly from forestland (2.30 km²) and water (3.85 km²), reflecting that deforestation and land reclamation around water still existed. From 2010 to 2015, the change in land use types mainly occurred from farmland to construction land (230.85 km²), which indicated that with the continuous improvement of social and economic levels, the urbanization process was expanding; the overall trend of land type change was that the area of farmland decreased and the area of other land use types increased, which indicated that, to meet the needs of urban construction and the ecological environment, the implementation of decisions such as returning farmland to grassland, forestland and water was effective.
As shown in Table 4, during the period 2015-2020, the trend of land use change was approximately the same as in 2010-2015: construction land remained the land use type with the largest net gain in area (347.91 km²), and the main source of transfer was farmland (343.07 km²), indicating the increasing demand for construction land against a background of dramatic population growth and rising economic levels. The area of farmland also continued to shrink, with the decrease rising from 4.87% to 7.34%, mostly shifting to construction land (347.91 km²), followed by water (22.71 km²). Unlike in previous years, the area of forestland shrank (7.68 km²); it was transferred to different land types, mainly farmland (4.67 km²), which indicated that deforestation and reclamation were occurring to compensate for the large reduction in farmland and to support the steady progress of urbanization.
Looking at the overall period of 2010-2020, i.e. at the 10-year time scale (Table 5), the land use pattern in the study area changed significantly, and the land use types with the greatest changes were construction land and farmland. The area of construction land increased by 43.99%, a total increase of 579.96 km², reflecting the continuous improvement in the economic level. The area of farmland decreased by 11.85%, a total decrease of 624.43 km², indicating that the problem of farmland occupation needs to be solved. Grassland and water continued to increase, and the main source of transfer was farmland, indicating that the policy of returning farmland to forestland and grassland had achieved certain results. Forestland first increased and then decreased, showing an overall trend of shrinkage. The area of unused land showed only slight fluctuations. The map shows that the unused land is mainly concentrated near the tops of mountainous areas with strongly undulating terrain, and the small change in its area may be due to the difficulty of development.

Factor detector
This paper explored the magnitude of the influence of driving factors such as population density, elevation and slope on land use type change using the factor detector of the Geodetector, and the numbering of each driver is shown in Table 6. The results (Figure 4) show that the distance from the subway (x17) had the greatest correlation with land type change, which may be because the accessibility of urban subway transportation has a large impact on land value and promotes the economic vitality of the surrounding areas to a great extent. Second, the distance from the city center (x2) also had a very high impact on land use type change; generally speaking, the closer to the city center, the better the political and economic service system, and the smaller the probability of land use change. The correlation of DEM (x3) and slope (x16) with land use type change was also relatively strong because the topography and geomorphology of the area directly affect the change in land use and spatial structure, influence the function and structure within the city, and determine the future development direction of the city and the spatial pattern of land use. In addition, the explanatory power of the distance from train stations (x19) and of road traffic data, such as the distance from high-speed rail (x5) and secondary roads (x15), on land use change was also relatively high because the planning and construction of traffic may guide the development direction of the city and may be conducive to promoting development along the route. In terms of development cost, the closer the distance from the traffic road, the smaller the logistics and work cost required to change the type of land use.

Interaction detector
Interaction detection is an advantage of Geodetector, which can detect the interaction effect of two driving factors based on factor detection. In this paper, the results of two-by-two interaction detection of the driving factors with the help of a Geodetector are shown in Figure 5.
It can be seen that every pairwise interaction of drivers produced an enhanced Q value, i.e. these drivers do not act independently on land use change but in conjunction with other factors. Among them, the interaction between the distance from the city center and the distance from the high speed toll stations had the greatest influence on land use type change, which may be because the high speed toll stations are among the main entrances into the city and the city center is one of the most attractive areas for human traffic, which inevitably affects the land use type change around the connecting channel between the two, as the image of the city is becoming increasingly important today; this is followed by the interaction between the DEM and the distance from the subway. The interaction between DEM and distance from the subway shows that, under the limitation of the natural topography, the subway, as the main component of urban rail transportation, had an important influence on urban land use change. In addition, the interaction between distance from universities and distance from the subway was also large, which may be because university students are a relatively active group and the fast, convenient subway has become their preferred travel mode, thus promoting land use type change around the subway lines. Overall, the correlations with distance to the subway were strong, and the explanatory power of distance to the subway, the main explanatory factor for land use change, increased significantly in interaction with other factors.
Table 6 (excerpt). Numbering of driving factors: x3, DEM; x4, Government offices; x5, High-speed rail; x8, Tertiary roads; x9, National roads; x10, Population; x13, Scenic area; x14, Distribution of universities; x15, Secondary roads; x17, Subway lines; x18, High speed toll stations; x19, Train stations; x20, Water.

Simulation and prediction of land use pattern change
The explanatory power of each driving factor on land use change, obtained from the factor detector of the Geodetector, was combined with the CA-Markov model to simulate and predict land use type changes in the study area. The simulated land use type distribution for 2020 was obtained by combining land use data for 2010 and 2015 with the land use transfer area matrix and the suitability image set. The kappa coefficient was then calculated by comparing the simulated map with the actual land use in 2020; a kappa coefficient greater than 0.75 is usually considered to indicate high reliability (Foody 2002). The final kappa coefficient was 0.9445, which indicates that the simulation performed well and that the explanatory power of the driving factors obtained by the Geodetector is reasonable. The simulated area of each land type was further compared with the real area in 2020 (Table 7); the closer the ratio of simulated to real area is to 1, the higher the simulation accuracy. Forestland had the highest simulation accuracy, with a simulated area slightly higher than the actual value. The simulated area of farmland was also higher than the actual value, while the simulated areas of the remaining land types were lower than the actual values. Simulation accuracy was high for all land types except construction land, probably because the policies affecting construction land were not fully considered. The simulated land use distribution map for 2020 was then compared with the real distribution map for 2020, as shown in Figure 6. The spatial distributions of the two maps show that the simulated map was consistent with the real map in the distribution of the various land types.
Among them, forestland and grassland were simulated most closely to their real distribution; the simulated water area in the northern part of the region was smaller than the real one, although the spatial pattern was similar; the land types simulated least accurately were construction land and farmland, with the real map showing a larger area of construction land and a smaller area of farmland than the simulation. Overall, the real land use distribution was more scattered, with many small, fragmented patches, while the simulated distribution was more concentrated by land type.
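The two accuracy checks described above can be sketched as follows: a standard Cohen's kappa computed from the cell-by-cell confusion matrix, and the ratio of simulated to real area for each land type. This is an illustrative sketch only; the validation software used in the study may report a kappa variant, and the function names and toy class codes are assumptions.

```python
import numpy as np

def cohen_kappa(simulated, reference):
    """Standard Cohen's kappa between two categorical maps of equal shape."""
    sim = np.asarray(simulated).ravel().tolist()
    ref = np.asarray(reference).ravel().tolist()
    classes = sorted(set(sim) | set(ref))
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)))
    for s, r in zip(sim, ref):
        cm[index[r], index[s]] += 1  # rows: reference, columns: simulated
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    if p_expected == 1.0:
        return 1.0  # both maps contain the same single class
    return (p_observed - p_expected) / (1.0 - p_expected)

def area_ratio(simulated, reference):
    """Simulated / real cell count per class; values near 1 mean similar areas."""
    sim = np.asarray(simulated).ravel()
    ref = np.asarray(reference).ravel()
    return {int(c): float((sim == c).sum() / (ref == c).sum())
            for c in np.unique(ref)}

# Toy maps (hypothetical class codes 1-3), flattened to ten cells.
real_2020 = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
simulated_2020 = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3]
kappa = cohen_kappa(simulated_2020, real_2020)
ratios = area_ratio(simulated_2020, real_2020)
```

Note that the area ratio only compares totals, so a ratio of 1 does not by itself imply that the cells are in the right place; kappa captures the cell-level agreement.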
After the accuracy test, land use types for 2025 were projected using data from 2015 and 2020, producing a land use distribution map for 2025 (Figure 7), a table of area changes by category (Table 8), and a land use transfer matrix for 2020-2025 (Table 9).
The results showed that in 2020-2025 Zhengzhou would still be dominated by farmland and construction land, with smaller areas of forestland, grassland, water and unused land. Compared with 2020, the overall distribution of land use types did not change much; the change in farmland was the most prominent, with an increase of 1.48%, followed by grassland, with a decrease of 0.62%. Compared with the land use data of Zhengzhou city from 2010 to 2020, the predicted evolution of each land use type for 2020-2025 differed slightly from the historical trend, with farmland and forestland increasing and grassland, water and construction land shrinking. Farmland (112.42 km²) was the land use type with the largest area growth, and the main source of transfer was construction land; construction land (65.26 km²) was the land use type with the largest area decrease, followed by grassland (47.03 km²), both of which were mainly transferred to farmland; forestland (2.15 km²) showed a trend of area expansion, while water (2.35 km²) showed an area shrinkage. Both of these changes were mainly associated with conversions involving farmland; unused land (0.07 km²) changed only slightly in area.
Figure 6. Simulation forecast land use distribution map in 2020.
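The Markov part of such a projection can be illustrated with a short sketch: a transfer (transition) matrix is counted from two maps, converted to row-wise probabilities, and applied to the latest area vector. This covers only the quantity projection, not the cellular automaton step that allocates change in space, and it assumes integer class codes 0 to n-1; the function names and toy maps are hypothetical.

```python
import numpy as np

def transition_counts(earlier, later, n_classes):
    """Cell counts moving from class i (rows) to class j (columns)."""
    e = np.asarray(earlier, dtype=int).ravel()
    l = np.asarray(later, dtype=int).ravel()
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (e, l), 1)  # accumulate repeated (i, j) pairs
    return counts

def transition_probabilities(counts):
    """Row-normalize the counts; classes absent earlier are left unchanged."""
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    empty = np.flatnonzero(row_sums.ravel() == 0)
    probs[empty, empty] = 1.0
    return probs

def project_areas(current_areas, probs):
    """One Markov step: expected area of each class after one period."""
    return np.asarray(current_areas, dtype=float) @ probs

# Toy maps (hypothetical): class 0 = farmland, class 1 = construction land.
map_t0 = [0, 0, 0, 0, 1, 1]
map_t1 = [0, 0, 1, 1, 1, 1]
probs = transition_probabilities(transition_counts(map_t0, map_t1, 2))
areas_t1 = np.bincount(map_t1, minlength=2)
areas_t2 = project_areas(areas_t1, probs)
```

Because each row of the probability matrix sums to 1, the projected areas always sum to the current total area; multiplying cell counts by the cell size converts them to km².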
The strategic positioning of Zhengzhou city as a 'central city' determines its overall development goal, which requires raising the level of economic development and urbanization. This corresponds to the continuous growth of construction land. However, with the continuous development of the urban economy, agricultural land has been seriously damaged, and policy is now more inclined to protect agricultural land, which corresponds to the simulated growth of farmland area in 2025. In general, the prediction results of this study are broadly consistent with the direction of planning policy. In addition, the land use pattern projected for 2025 suggests that the utilization rate of unused land should be improved so as to optimize the structural layout of the city.

Discussion
Using Zhengzhou city as the study area, this paper is a first attempt to combine the Geodetector with the CA-Markov model: the explanatory power of each driving factor on land use change, obtained by the Geodetector, was used as the weight of that factor in a multicriteria CA-Markov model, which was then used to simulate and predict urban land use types.
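One simple way to turn explanatory power into weights can be sketched as follows, assuming that the q values are normalized to sum to 1 and that suitability is a weighted linear combination of factor layers already scaled to [0, 1]. Both the normalization and the combination rule are assumptions made for illustration, and the factor names and q values are hypothetical rather than results of this study.

```python
import numpy as np

def weights_from_q(q_values):
    """Normalize non-negative q values so that the weights sum to 1."""
    names = list(q_values)
    q = np.array([q_values[name] for name in names], dtype=float)
    if (q < 0).any() or q.sum() == 0:
        raise ValueError("q values must be non-negative and not all zero")
    return dict(zip(names, (q / q.sum()).tolist()))

def weighted_suitability(layers, weights):
    """Weighted linear combination of factor layers scaled to [0, 1]."""
    return sum(weights[name] * np.asarray(layers[name], dtype=float)
               for name in weights)

# Hypothetical q values and 2 x 2 standardized factor layers.
q_values = {"dist_subway": 0.3, "dist_center": 0.2, "elevation": 0.1}
layers = {
    "dist_subway": [[1.0, 0.5], [0.0, 0.5]],
    "dist_center": [[1.0, 0.0], [0.0, 1.0]],
    "elevation": [[1.0, 1.0], [0.0, 0.0]],
}
weights = weights_from_q(q_values)
suitability = weighted_suitability(layers, weights)
```

With weights that sum to 1 and layers in [0, 1], the resulting suitability also stays in [0, 1], which keeps the maps for different land types comparable.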
In previous studies, many scholars have applied driving force weights obtained from logistic regression and other models to CA-Markov prediction, and many research results have been achieved. Although the logistic regression model is quick to train and its results are highly interpretable, its accuracy is not very high and it does not consider the influence of the spatial distribution of the drivers. GWR is also widely used in various disciplines. However, because GWR only allows each response variable to be modeled individually, it does not provide sufficient information about the data when multiple response variables of interest are analyzed (Naikoo et al. 2022; Chen, Yang, and Jian 2022; Xue et al. 2022). In addition, the GWR model is a linear regression model whose calculation involves only linear interpolation (Luo and Zhang 2014), which has some limitations (Liu, Wu, and Wang 2019). By contrast, the Geodetector, an emerging research model, is a statistical method designed specifically to detect the drivers of spatial heterogeneity. For comparison, an experiment was conducted with the LOGISTICS module in IDRISI using data of the same resolution for the same study area. The kappa coefficient obtained for the 2020 land use data was 0.8729, whereas the Geodetector-based CA-Markov simulation reached 0.9445, an improvement of 0.0716. This indicates that, for this study area, the present method works better than the logistic regression model.
The main types of land transfer found in this paper are the same as those reported in existing studies (Lu 2020); the results obtained by the Geodetector are essentially the same as those of published studies on land use drivers in the study area (Liu 2021; Wang 2017); and the projected land use for 2025 is similar to the results of existing studies showing that the city in the study area is developing toward the east (Dong et al. 2021).
Both models have general limitations: the Geodetector accepts only tabular input, whereas the raw data to be processed are usually image data, and the CA-Markov model requires an optimal scale to be found. In addition, there are some limitations specific to this paper. The number of driving factors used in this study is higher than that used in previous studies, covering natural, road, and socioeconomic factors. The large variety of data inevitably increases the difficulty of data collection: the dates of the datasets do not match closely enough, and vector data for earlier years, especially road data, are extremely difficult to obtain. Therefore, only roads of grade 3 and above were selected in this paper, because the higher the road grade, the less likely the road is to change and the longer the period it represents. In addition, the poor simulation of construction land noted earlier in this paper probably arises because the relevant land use policies were not considered, and construction land is the land type most affected by policies. The change in the spatial distribution of land use is a complex process, and its development is influenced not only by the regional geographic environment and ecological conditions but also by the relevant land use policies and other human activities. The influence of relevant policies on the change in land use patterns can be explored in the future to improve the accuracy of prediction.
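The tabular-input limitation mentioned above is usually handled by flattening the rasters into one row per valid cell and discretizing continuous factors into strata. The sketch below assumes co-registered arrays of equal shape, a shared nodata value of -9999 and quantile-based strata; these choices, the function names and the toy values are illustrative assumptions, not the preprocessing used in this study.

```python
import numpy as np

def discretize_quantiles(values, n_strata=5):
    """Assign each value to a quantile stratum numbered 0 .. n_strata - 1."""
    inner = np.linspace(0.0, 1.0, n_strata + 1)[1:-1]
    edges = np.quantile(values, inner)
    return np.digitize(values, edges)

def rasters_to_table(target, factors, nodata=-9999, n_strata=5):
    """Flatten a target raster and factor rasters into columns of valid cells."""
    target = np.asarray(target)
    valid = target != nodata
    for layer in factors.values():
        valid &= np.asarray(layer) != nodata  # drop cells missing in any layer
    table = {"y": target[valid]}
    for name, layer in factors.items():
        values = np.asarray(layer, dtype=float)[valid]
        table[name] = discretize_quantiles(values, n_strata)
    return table

# Toy rasters (hypothetical): land use codes and distance to subway in metres.
land_use = [[1, 2, -9999], [1, 2, 2]]
dist_subway = [[10, 20, 30], [40, -9999, 60]]
table = rasters_to_table(land_use, {"dist_subway": dist_subway}, n_strata=2)
```

Each column of the resulting table has one entry per valid cell, so it can be written to a spreadsheet or CSV file and passed to a Geodetector implementation; the number of strata and the discretization method influence the resulting q values and should be chosen with care.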

Conclusions
Based on the land use raster data of Zhengzhou city, this paper used Geodetector to explore the driving mechanism of urban land use changes based on the analysis of historical land use change characteristics and then combined it with the CA-Markov model to simulate and predict the future urban land use changes in Zhengzhou. The following conclusions were obtained:
(1) The historical land use types in Zhengzhou were mainly farmland and construction land and the main land use change was the transformation of farmland into construction land, which indicates that Zhengzhou city is urbanizing rapidly and that its economic level has been significantly improved.
(2) Using the factor detector of the Geodetector, we found that the distance from the subway had the most significant impact on the change of urban land use type because the construction and operation of the subway greatly promote the vitality of the development of the surrounding areas.
(3) Using the interaction detector of the Geodetector, we found that the interaction between any two driving factors was enhanced, which indicates that the effect of each individual factor on land use change is smaller than its joint effect with other factors. Among them, the interaction between the distance from the city center and the distance from the high-speed toll stations had the greatest effect on land use type change.
(4) Projections for 2025 showed that the land use types in Zhengzhou city in 2025 would continue to be dominated by farmland and construction land. In the absence of changes in the regional natural environment and current policies from 2020 to 2025, the area of farmland and forestland would expand; the area of grassland, construction land and water would shrink, with all three mainly transformed into farmland; and the area of unused land would fluctuate only slightly. According to the forecast results, the implementation of farmland protection, ecological protection and optimization of the green environment is in line with the future development trend of urban land use. Future urbanization will not simply be an increase in area but an improvement in the spatial pattern of urban land use and an increase in urban agglomeration.
The results show that combining the Geodetector with the CA-Markov model can effectively improve the accuracy of simulating the spatial pattern of urban land use change. The approach provides a reference for optimizing the spatial distribution of cities and supports the formulation of relevant policies.

Data availability statement
The land use data that support the findings of this study are available from the Resources and Environmental Science and Data Center, Chinese Academy of Sciences. Restrictions apply to the availability of these data, which were used under license for this study. Land use data are available at https://www.resdc.cn/ with the permission of the Resources and Environmental Science and Data Center, Chinese Academy of Sciences. The driver data that support the findings of this study are available on reasonable request from the corresponding author, K.Z.