An efficient built-up land expansion model using a modified U-Net

ABSTRACT This paper introduces an improved convolutional neural network based on the conventional U-Net for simulating built-up land expansion. The proposed method employs a pixel-wise semantic segmentation approach in which the spatial drivers of urbanization are treated as data cubes. Independent variables including altitude, slope, and distance from barren, crop, greenery, roads, and urban areas for 1998, 2008, and 2018 were considered as covariates for the simulation of built-up land expansion in the Tehran and Karaj regions of Iran. The proposed method was compared with the random forest (RF) algorithm as the baseline model. Evaluation using the area under the curve of the total operating characteristic indicated the superiority of our modified U-Net (0.87) over the RF algorithm (0.82). Furthermore, evaluation using the percent correct metric indicated that the proposed model learns neighborhood effects effectively and therefore simulates built-up land expansion accurately without applying a cellular automata (CA) model. Therefore, the modified U-Net, which accounts for neighborhood effects without requiring a CA, is recommended for the precise simulation of built-up land expansion.


Introduction
Managing land use and land cover change, preventing unbalanced built-up land expansion, and providing the facilities for balanced development of cities all require information about the future amount and pattern of built-up land expansion. Land use and land cover change (LULCC) arises mostly from human effects, such as agricultural activities on the earth (Lambin, Rounsevell, and Geist 2000; Turner, Lambin, and Reenberg 2007; Sumari et al. 2017). Even changes occurring locally can affect the environment on a global scale (Lambin and Geist 2008), including global changes in climate (Kalnay and Cai 2003), carbon cycling (Offerman et al. 1995; Li et al. 2019), and ecosystem services (Trinder and Liu 2020; Shao et al. 2021). Built-up land expansion is one of the most common types of LULCC, which has led to the destruction of natural resources and threatened the health of humans and other organisms (Weng 2001; Yulianto, Maulana, and Khomarudin 2019). In addition to environmental goals, a wide range of urban decisions require a view of future built-up development. Built-up land expansion researchers have used a wide range of LULCC models to simulate and predict built-up land expansion, but achieving high accuracy is still a challenging issue (He et al. 2019).
LULCC models can be categorized into statistical models, machine learning (ML) models, and tree-based models (Pontius et al. 2008; Shafizadeh-Moghadam et al. 2017). A variety of studies have used and compared the LULCC modeling performance of different methods (Kamusoko and Gamba 2015; Shafizadeh-Moghadam et al. 2017; Goetz et al. 2015; Zare Naghadehi et al. 2021). Shafizadeh-Moghadam et al. (2017) compared two widely used methods from each category: logistic regression (LR) and multivariate adaptive regression splines (MARS) from the statistical models, artificial neural networks (ANNs) and support vector machine (SVM) from the ML models, and classification and regression trees (CART) and random forest (RF) from the tree-based models. Their results showed that the performance of ANNs surpassed that of the five other models (Shafizadeh-Moghadam et al. 2017). Similar results have also been reported by several other studies (Pijanowski, Hyndman, and Shellito 2001; Islam, Rahman, and Jashimuddin 2018). Although ANNs achieve the best results in built-up land expansion simulation, their limited ability to account for the neighborhood effect has led researchers to combine them with cellular automata (CA).
Research on ANNs led to the genesis of deep neural networks, which are the basis of deep learning methods (Schmidhuber 2015). Advancements in deep learning methods and successful results reported by various studies on satellite image classification and segmentation have opened the way to their use in land use change detection (He et al. 2019; Huang and Wang 2020; Boulila et al. 2021; Liu et al. 2021). Zhou et al. (2017) simulated built-up land expansion from 2000 to 2015 using multi-temporal Landsat images of Jiaxing city by integrating the heuristic bat algorithm (BA) and deep belief network (DBN) with CA. The researchers compared the proposed models with the ANN-CA model and reported some enhancements in the stability and accuracy of built-up land expansion simulation (Zhou et al. 2017). Among the deep learning methods, convolutional neural networks (CNNs) use convolution operators to extract the spatial features from an image by considering the relationship between each pixel and its neighbors (Albawi, Mohammed, and Al-Zawi 2017). Having achieved remarkable results in a wide range of image processing studies, CNNs have gradually entered the remote sensing field and have been used in numerous studies for processing, classifying, and segmenting satellite images as well as for LULCC detection (Kampffmeyer, Salberg, and Jenssen 2016; Lee and Kwon 2017; Sumbul and Demir 2019; Hu et al. 2015; Castelluccio et al. 2015).
A variety of empirical analyses of neighborhood characteristics have demonstrated the strong relationship between land uses near each other, meaning that each land use is affected by its neighboring land uses. The critical role of the neighborhood effect in built-up land expansion simulation has been illustrated by several studies, and this has led to the use of CA for allocating the developed pixels, with the neighborhood as one of its parameters (Kocabas and Dragicevic 2006; Li, Peng Gong, and Hu 2017; Van Vliet, White, and Dragicevic 2009; Batty, Couclelis, and Eichen 1997; Couclelis 1985). A CNN inherently considers different neighborhood regions based on the network design, as the receptive field of each convolutional filter grows through the sequence of convolutional layers. He et al. (2018) designed a convolutional neural network integrated with CA and Markov Chain for simulating built-up land expansion in the Pearl River Delta of China during 2000-2030. They compared the CNN-CA model with three ML-based CA models (LR, ANN, and RF). The reported figure of merit (FoM), which was used for evaluating the simulation results, was 0.268 for the 2005 simulation and 0.346 for the 2010 simulation, which indicated that CNN-CA outperforms the ML-based methods RF-CA, ANN-CA, and LR-CA, with FoM values of 0.240, 0.246, and 0.231, respectively (He et al. 2018). Moreover, Zhai et al. (2020) proposed a CNN-VCA model, combining a convolutional neural network with vector-based cellular automata (VCA).
The CNN-VCA method was applied to simulate urban land use changes in Shenzhen, China using cadastral land use data for 2009, 2012 (Zhai et al. 2020). The CNN extracts high-level neighborhood features to provide a transition suitability map. The VCA then simulates land use change from a combination of the transition suitability, the neighborhood land use condition, a constraint coefficient, and a stochastic factor. Evaluating the land use simulation results using the FoM indicated the higher performance of CNN-VCA in comparison with three ML methods combined with VCA. CNN-VCA obtained an FoM value of 0.361, while LR-VCA, ANN-VCA, and RF-VCA resulted in FoM values of 0.023, 0.078, and 0.323, respectively (Zhai et al. 2020). These studies used a CNN, but only in combination with CA, to simulate built-up land expansion effectively: the CNN extracts neighborhood features, and the CA allocates built-up pixels considering these features.
Because the output of a typical convolutional network is a single class label assigned to each input image, these networks do not address localization (Ronneberger, Fischer, and Brox 2015). Here, localization refers to assigning a class label to each pixel, which is an important requirement for built-up land expansion simulation. As previous studies have ignored this issue, the current study introduces a modified U-Net model for simulating built-up land expansion pixel-wise. U-Net was first introduced by Ronneberger, Fischer, and Brox (2015), who aimed to perform pixel-wise semantic segmentation on biomedical images. It consists of an encoder-decoder structure, the former for extracting contextual information from input images for segmentation purposes, and the latter for accurately assigning each pixel to a class.
The current study proposes a novel approach for built-up land expansion simulation that applies a modified U-Net convolutional deep learning method to perform pixel-wise segmentation of satellite images into developed and undeveloped classes with high accuracy, considering the localization concept. When the changing area is small compared to the unchanged area, deep learning methods suffer from the lack of an appropriate training dataset. To cope with this problem, the conventional U-Net architecture was modified herein to obtain a balance between the accuracy of the model and the required amount of training data. Therefore, the second novelty of this research is that the proposed model can be trained and used in small areas where training data is limited. In addition, the variables affecting built-up land expansion are considered as a data cube inspired by multi-band images in remote sensing. This input data structure allows the proposed U-Net to classify each pixel according to all variables affecting it as well as the spatial pattern of the neighborhoods around that pixel. The comparison of the results of the proposed U-Net and the random forest showed an improvement in the area under the curve (AUC) of the total operating characteristic (TOC) as an accuracy assessment metric. Furthermore, this study adopted the primary assumption that a CNN model accounts for the neighborhood effect, which is critical in built-up land expansion simulation, through its convolutional filters. Under this assumption, CA is unnecessary for allocating the developed pixels, and the researcher can use a top-down allocation method instead.
The rest of the paper is organized as follows: Section 2 describes the dataset, study area, and the proposed methodology; the experiments and a detailed analysis are described in Section 3; and finally, conclusions derived from this research are presented in Section 4.

Data and methods
The dataset utilized in the current study was adopted from . It comprises altitude, slope, and distances from specific land uses (namely, barren lands, crop lands, greenery, roads, and urban areas), which were calculated separately for 1998 and 2008. The seven independent variables of 1998 and the built-up land expansion map for the period 1998-2008 were used to train and validate the proposed U-Net. Therefore, the parameters of the proposed U-Net are determined by the training set and evaluated by the validation set. The trained model was then tested with unseen data, i.e. the independent variables of 2008 and the built-up land expansion map for the period 2008-2018.

Data pre-processing
In the first step, the labels and the input layers are prepared. The seven independent variables are considered as inputs of the proposed model. These variables are stacked on top of each other to form a data cube. The advantage of using such a data structure in simulating built-up land expansion is that the convolutional filters can simultaneously consider all the variables affecting the built-up land expansion process at each location together with its neighboring pixels. Table 1 shows the statistical description of the variables. As can be seen, the independent variables in this research have a variety of value ranges. To make them more consistent, they were rescaled to the range between 0 and 1.
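The stacking and rescaling step can be illustrated with a short sketch. It assumes min-max rescaling (the text states only that the values were brought between 0 and 1), a channels-last layout, and toy 4 × 5 rasters in place of the real layers; the function names and the layer order are hypothetical.

```python
import numpy as np

# Hypothetical layer order; names follow the seven drivers listed in the text.
DRIVERS = ["altitude", "slope", "dist_barren", "dist_crop",
           "dist_greenery", "dist_roads", "dist_urban"]


def min_max_scale(layer):
    """Rescale a 2-D raster to [0, 1]; a constant raster maps to zeros."""
    layer = np.asarray(layer, dtype=np.float64)
    lo, hi = layer.min(), layer.max()
    if hi == lo:
        return np.zeros_like(layer)
    return (layer - lo) / (hi - lo)


def build_data_cube(layers):
    """Stack the scaled rasters along the last axis: (rows, cols, n_drivers)."""
    return np.stack([min_max_scale(layers[name]) for name in DRIVERS], axis=-1)


# Toy rasters standing in for the real 3094 x 5014 layers.
rng = np.random.default_rng(0)
toy_layers = {name: rng.uniform(0, 3000, size=(4, 5)) for name in DRIVERS}
cube = build_data_cube(toy_layers)
print(cube.shape)  # (4, 5, 7)
```

With this layout, a single convolutional filter sees all seven drivers of a pixel and of its neighbors at once, which is the property the data cube is meant to provide.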
The labels assign each pixel to one of two classes, developed or undeveloped, based on the real built-up land expansion maps for the periods 1998-2008 and 2008-2018, which were derived by comparing the classified land cover maps. Some regions, named exclusion zones, could not be developed into urban areas (Pijanowski et al. 2002). In this research, exclusion zones include water bodies and greenery at the beginning of each time interval. The input variables and labels measured 3094×5014 pixels and needed to be divided into smaller patches. Because enlarging the dataset directly improves the performance of a CNN, these patches were created with overlap. Therefore, an experiment was designed in four stages to find the optimum dimensions for the patches and the most appropriate amount of overlap between adjacent patches. The dimensions of the patches in the four stages were 256×256, 128×128, 64×64, and 32×32, respectively. Three values, zero percent (no overlap), 50 percent, and 75 percent, were considered for the amount of overlap between adjacent patches. These were examined on the three datasets of training, validation, and testing.
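To give a feel for how patch size and overlap change the number of samples, the following sketch counts the patches that fit in a 3094 × 5014 raster for each tested configuration. It assumes that patches extending past the raster edge are dropped (no padding) and that the stride equals the patch size times one minus the overlap; the counts are before any filtering of patches, and the function names are hypothetical.

```python
def patch_origins(length, patch, overlap):
    """Start offsets of patches along one axis for a given fractional overlap."""
    stride = int(patch * (1 - overlap))
    if stride <= 0:
        raise ValueError("overlap must be smaller than 1")
    return list(range(0, length - patch + 1, stride))


def count_patches(rows, cols, patch, overlap):
    """Number of full patches that fit in a rows x cols raster."""
    return (len(patch_origins(rows, patch, overlap))
            * len(patch_origins(cols, patch, overlap)))


ROWS, COLS = 3094, 5014  # raster size reported in the text
for patch in (256, 128, 64, 32):
    for overlap in (0.0, 0.5, 0.75):
        print(patch, overlap, count_patches(ROWS, COLS, patch, overlap))
```

Under these assumptions, 256 × 256 patches yield 228 samples without overlap and 3375 samples with 75% overlap, which illustrates how overlap enlarges the dataset.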
The dataset related to 1998-2008 was considered as the training and validation set, while the dataset for 2008-2018 was used to test the model. The division of the 1998-2008 dataset into training and validation partitions was examined in two modes. In the first mode, namely partitioning after patch creation, the main data was divided into smaller patches, and then ten percent of the patches were randomly selected for validation. The remaining patches were used for training. In the second mode, namely partitioning before patch creation, ten percent of the original data was set aside as a validation dataset before the patches were built, and the patches built from the remaining data were used to train the model. This experiment helped identify the best way of partitioning data into training and validation sets. To examine the effect of the training/validation split ratio on the model performance, four tests using 40%, 30%, 20%, and 10% of data as validation and the remainder as training were performed.
Figure 2 is a flowchart of the proposed method for built-up land expansion modeling. The data cubes of the seven variables affecting built-up land expansion are entered into the proposed model as input data. The output of the model is a map with dimensions equal to those of the input data, in which each pixel is assigned to one of the two classes, developed or undeveloped. In this study, the conventional U-Net was modified to make it compatible with built-up land expansion modeling with limited training datasets; several experiments were needed to find the best modifications of the U-Net. A schematic illustration of the proposed model is shown in Figure 3. The network architecture follows the original U-Net, which consists of a contracting or encoder path and an expansive or decoder path (Ronneberger, Fischer, and Brox 2015). The contracting path includes five convolution blocks.
In the proposed U-Net, each contracting block consists of the repeated application of two 3 × 3 convolutions with same padding to prevent dimension reduction, each followed by a batch normalization layer to standardize the inputs to the activation function, and an Exponential Linear Unit (ELU) as the activation function, which is similar to the ReLU but is less prone to the vanishing gradient problem. These are followed by a spatial dropout layer, which randomly sets entire feature maps to 0 rather than individual pixels to prevent overfitting effectively. At the end of each convolution block, there is a 2×2 max pooling operation for down-sampling. The number of convolution filters begins with 32 at the first block in the contracting path and is doubled at each down-sampling step, so that the last block has 512 feature maps.
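The way the contracting path widens the neighborhood seen by each output pixel can be estimated with a small receptive-field calculation. The sketch below is only an illustration: it assumes that pooling follows the first four blocks but not the fifth (bottleneck) block, which is consistent with an expansive path of four up-sampling steps, and the helper name is hypothetical.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) after a stack of (kernel, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) * jump
        jump *= stride              # pooling doubles the spacing between samples
    return rf


CONV = (3, 1)  # 3 x 3 convolution, stride 1
POOL = (2, 2)  # 2 x 2 max pooling, stride 2

layers, fields = [], []
for block in range(5):
    layers += [CONV, CONV]
    fields.append(receptive_field(layers))
    if block < 4:  # assumption: no pooling after the bottleneck block
        layers.append(POOL)

print(fields)  # [5, 14, 32, 68, 140]
```

Under these assumptions, a feature at the bottleneck depends on a window of about 140 × 140 input pixels, whereas a feature in the first block depends on only 5 × 5 pixels.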

Modified U-Net
The expansive path includes four convolution blocks. Each block in this path consists of an upsampling of the feature map performed by a transposed convolution layer that halves the number of feature channels, followed by a concatenation with the correspondingly cropped feature map from the contracting path and two 3×3 padded convolutions, each followed by a batch normalization layer, an ELU activation function, and a spatial dropout. At the final layer, a 1×1 convolution followed by a sigmoid activation function maps each 32-component feature vector to a probability in the range (0, 1) that the pixel belongs to the developed class.
The proposed modified U-Net model has 8,637,537 trainable parameters. The loss function for this model was binary cross-entropy, and the adaptive moment estimation (Adam) was adopted as the optimizer. Because a small dataset was used in this study, data augmentation techniques of flipping and patch overlapping were utilized.
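The reported parameter count can be cross-checked with a simple tally of the layers described above. The sketch below is a consistency check rather than the authors' implementation: the 3 × 3 kernel of the transposed convolutions, the presence of bias terms in every convolution, and the counting of only the scale and shift of each batch normalization layer are assumptions that are not stated in the text. Under these assumptions, and with seven input channels, the tally matches the reported figure.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus biases of a 2-D convolution or transposed convolution."""
    return kernel * kernel * c_in * c_out + c_out


def bn_params(channels):
    """Trainable scale and shift of a batch normalization layer."""
    return 2 * channels


def double_conv(c_in, c_out):
    """Two 3 x 3 convolutions, each followed by batch normalization."""
    return (conv_params(c_in, c_out, 3) + bn_params(c_out)
            + conv_params(c_out, c_out, 3) + bn_params(c_out))


def unet_trainable_params(in_channels=7, base_filters=32, depth=5, up_kernel=3):
    filters = [base_filters * 2 ** i for i in range(depth)]
    total, c_in = 0, in_channels
    for f in filters:                              # contracting path
        total += double_conv(c_in, f)
        c_in = f
    for f in reversed(filters[:-1]):               # expansive path
        total += conv_params(c_in, f, up_kernel)   # transposed convolution
        total += double_conv(2 * f, f)             # after skip concatenation
        c_in = f
    return total + conv_params(c_in, 1, 1)         # final 1 x 1 convolution


print(unet_trainable_params())  # 8637537
```

With a 2 × 2 transposed-convolution kernel, the same tally gives 7,767,137 parameters, so the reported figure is consistent with the 3 × 3 assumption.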

Random forest
Random forest (RF) was considered as the baseline model for evaluating the built-up land expansion predicted by the proposed U-Net. RF is a supervised machine learning algorithm that is an ensemble of classification and regression trees (Cutler, Cutler, and Stevens 2012). It combines the output of multiple trees to reach a more precise result. To overcome the complexity of the RF model, Shafizadeh-Moghadam et al. (2021) used a combination of the Forward Feature Selection algorithm with Random Forest (FFS-RF), which runs an RF multiple times while considering a different subset of independent variables in each run (Meyer et al. 2018). Herein, the results of the FFS-RF method were compared with those of the proposed U-Net.
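The idea of running a model repeatedly on growing subsets of variables can be sketched as a generic greedy forward selection. This is not the exact FFS procedure of Meyer et al. (2018); the `score` callable is a hypothetical stand-in for training an RF on a subset of variables and returning its validation skill, and the toy scores below are invented purely for illustration.

```python
def forward_feature_selection(features, score):
    """Greedy forward selection: repeatedly add the feature that most improves score."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda pair: pair[0])
        if top_score <= best:  # stop when no candidate improves the score
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Toy scorer standing in for "train an RF on this subset and return its skill".
USEFUL = {"dist_urban": 0.30, "dist_roads": 0.20, "slope": 0.05}


def toy_score(subset):
    return sum(USEFUL.get(f, -0.01) for f in subset)


chosen, skill = forward_feature_selection(
    ["altitude", "slope", "dist_roads", "dist_urban"], toy_score)
print(chosen)
```

In this toy run the variable that lowers the score is left out, which mirrors how forward selection reduces model complexity.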

Evaluation metrics
Relative operating characteristic (ROC) is a popular statistical method that compares built-up land expansion predictions with a real built-up land expansion map. The presence or absence of built-up land expansion is diagnosed by choosing a specific threshold and determining whether the value of the prediction map is equal to or greater than that threshold (Pontius and Si 2014; Green and Swets 1966; Egan and Egan 1975; Fawcett 2006). Each threshold leads to constructing a contingency table that includes four entries: hits, misses, false alarms, and correct rejections. Hits is the number of pixels that are developed in both the real and predicted built-up land expansion maps. Misses refers to the number of pixels that are developed in the real built-up land expansion map but undeveloped in the predicted map. False alarms indicate the number of pixels that are undeveloped in the real built-up land expansion map but developed in the predicted map. Finally, correct rejections are the number of pixels that are undeveloped in both the real and predicted built-up land expansion maps (Pontius and Si 2014).
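The four entries of the contingency table can be counted as in the following minimal sketch, which works on flattened lists of pixels. The function name and the toy values are hypothetical.

```python
def contingency_table(observed, probability, threshold):
    """Count hits, misses, false alarms and correct rejections at one threshold.

    observed    -- iterable of 0/1 flags (1 = developed in the real change map)
    probability -- iterable of predicted transition probabilities
    """
    table = {"hits": 0, "misses": 0, "false_alarms": 0, "correct_rejections": 0}
    for obs, prob in zip(observed, probability):
        predicted = prob >= threshold  # developed if at or above the threshold
        if obs and predicted:
            table["hits"] += 1
        elif obs:
            table["misses"] += 1
        elif predicted:
            table["false_alarms"] += 1
        else:
            table["correct_rejections"] += 1
    return table


observed = [1, 1, 0, 0, 1, 0]
probability = [0.9, 0.4, 0.6, 0.1, 0.7, 0.2]
print(contingency_table(observed, probability, threshold=0.5))
```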
ROC is a diagram that displays two ratios of false alarms / (false alarms + correct rejections) and hits / (hits + misses) for each threshold on the horizontal and vertical axes, respectively. ROC is limited by the impossibility of attaining any information about the size of each entry for each contingency table related to each threshold. This issue limits the useful measurements a scientist could have for each threshold. To address this limitation, Pontius and Si (2014) proposed the TOC, which reveals all the information of a contingency table for each threshold; thus, it is more complete than ROC.
TOC, an improved version of ROC, begins with the construction of contingency tables for each threshold as mentioned for ROC with the same entries. The horizontal axis of TOC represents the total number of pixels of the developed class in the prediction map corresponding to the value of (hits + false alarms) at a certain threshold in a numerical range from zero to the total area of the study area (hits + misses + false alarms + correct rejections). The numerical range of the vertical axis of TOC, which represents the value of hits, is also between zero and the total number of pixels corresponding to the developed class in the real change map (hits + misses) (Shafizadeh-Moghadam et al. 2021). Therefore, TOC reveals the total number of pixels of the study area as well as the number of the developed pixels in the real change map. This enables the full retrieval of the contingency table corresponding to each threshold using the TOC diagram.
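A short sketch may help relate the two diagrams. It builds the TOC points, (hits + false alarms, hits), by lowering the threshold through the ranked probabilities, and converts a TOC point into ROC coordinates using the axis definitions given above. Pixels are treated as a flattened list, and the function names are hypothetical.

```python
def toc_curve(observed, probability):
    """Return TOC points (hits + false alarms, hits), one per distinct threshold."""
    ranked = sorted(zip(probability, observed), key=lambda pair: -pair[0])
    points = [(0, 0)]
    hits = flagged = 0
    for i, (prob, obs) in enumerate(ranked):
        flagged += 1
        hits += obs
        # emit a point only when the next pixel has a different probability
        if i + 1 == len(ranked) or ranked[i + 1][0] != prob:
            points.append((flagged, hits))
    return points


def roc_point(toc_point, n_total, n_developed):
    """Convert a TOC point to ROC coordinates (false alarm rate, hit rate)."""
    flagged, hits = toc_point
    false_alarms = flagged - hits
    return false_alarms / (n_total - n_developed), hits / n_developed


observed = [1, 1, 0, 0, 1, 0]
probability = [0.9, 0.4, 0.6, 0.1, 0.7, 0.2]
points = toc_curve(observed, probability)
print(points)
```

The last TOC point is (total number of pixels, hits + misses), so the size of the study area and the number of developed pixels can be read directly from the diagram, whereas the ROC coordinates retain only ratios.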
The output of the modified U-Net is a probability map in which every pixel has a value between 0 and 1, indicating its probability of developing into an urban area. Herein, a top-down cell allocation method was used to allocate developed pixels. With this method, proposed by Pijanowski et al. (2002), the pixels with the highest transition probability are considered as developed pixels. The exact number of developed pixels was extracted from the real change map of 2008-2018. After generating the predicted change map for 2008-2018, the error maps were produced by comparing the real and predicted change maps for 2008-2018.
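The top-down allocation can be sketched as ranking the pixels by transition probability and flagging the required number of them. The sketch works on a flattened list of pixels; the optional `excluded` mask for exclusion zones is an assumption about how such pixels would be handled at allocation time, and the function name is hypothetical.

```python
def top_down_allocation(probability, n_developed, excluded=None):
    """Flag the n_developed candidate pixels with the highest transition probability."""
    excluded = excluded or [False] * len(probability)
    candidates = [i for i, skip in enumerate(excluded) if not skip]
    ranked = sorted(candidates, key=lambda i: probability[i], reverse=True)
    chosen = set(ranked[:n_developed])
    return [1 if i in chosen else 0 for i in range(len(probability))]


probability = [0.9, 0.4, 0.6, 0.1, 0.7, 0.2]
print(top_down_allocation(probability, n_developed=3))  # [1, 0, 1, 0, 1, 0]
```

Because the number of allocated pixels is fixed in advance, every miss in the resulting map is matched by exactly one false alarm.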

Results and discussion
This section presents the results of the experiments mentioned in section 2.2, the determination of the optimal number of parameters for the proposed U-Net model, and the effects of using batch normalization and dropout layers.

Adjusting the parameters used in data pre-processing
The results of the experiment of finding the optimal dimensions for the patches and their appropriate overlap are illustrated in Table 2. The results showed that the conversion of the dataset into patches of 256 × 256 and an overlap of 75% produced the best AUC for training, validation, and test datasets. The reason for this is the increase in data, which is in line with the need of deep learning networks for an extensive dataset. It should be noted that to avoid an imbalance in the number of pixels of the developed class compared to the undeveloped, only those patches containing at least ten percent developed pixels were considered.
The results of the experiment of dividing the 1998-2008 dataset into training and validation sets in the two modes, partitioning after patch creation and partitioning before patch creation, are presented in Table 3. The test dataset was the same in both cases; only the training and validation sets differed. In the first case, because of the high overlap between the patches, similar patterns are very likely to be repeated in the training and validation data, which prevents the model from being validated on a genuinely different dataset. In this case, the AUC for both training and validation is 0.99, while the AUC of 0.61 obtained on the test dataset indicates poor performance on different data. This is a case of overfitting: the model has fitted the training and validation data so closely that it has lost its predictive power for the unseen test set. This situation did not occur in the second case, in which the 1998-2008 dataset was divided into training and validation partitions before the overlapped patches were created. Therefore, there is no overlap between the training and validation sets. This helps the model generalize, as the validation set acts as a completely different dataset. Although this leads to a decrease in the AUC for training and validation, it prevents overfitting on the 1998-2008 dataset. This is shown by the increase in the AUC of the test set from 0.61 to 0.79. The performance of the model on all three datasets (training, validation, and test) in the second mode was acceptable. The AUC of 0.91 for both training and validation and the AUC of 0.74 on the test dataset indicate that the second mode gives lower training and validation accuracy than the first but yields a more flexible model that performs better on the test dataset.
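The leakage between training and validation data in the first mode can be demonstrated on a toy one-dimensional strip of rows. The sketch below makes several simplifying assumptions: every tenth patch is held out instead of a random ten percent (to keep the example reproducible), and the validation region in the second mode is a contiguous strip at the end of the data, which the text does not specify. All names are hypothetical.

```python
def patch_spans(start, stop, patch, stride):
    """Half-open (begin, end) row spans of patches fully inside [start, stop)."""
    return [(b, b + patch) for b in range(start, stop - patch + 1, stride)]


def shared_rows(spans_a, spans_b):
    """Number of rows covered by both groups of patches."""
    rows_a = {r for b, e in spans_a for r in range(b, e)}
    rows_b = {r for b, e in spans_b for r in range(b, e)}
    return len(rows_a & rows_b)


ROWS, PATCH, STRIDE = 4096, 256, 64  # toy strip; stride 64 means 75% overlap

# Mode 1: create patches first, then hold out every tenth patch for validation.
all_spans = patch_spans(0, ROWS, PATCH, STRIDE)
val_after = all_spans[::10]
train_after = [s for i, s in enumerate(all_spans) if i % 10]

# Mode 2: hold out the last 10% of rows first, then patch each part separately.
cut = int(ROWS * 0.9)
train_before = patch_spans(0, cut, PATCH, STRIDE)
val_before = patch_spans(cut, ROWS, PATCH, STRIDE)

print(shared_rows(train_after, val_after))    # positive: validation rows also seen in training
print(shared_rows(train_before, val_before))  # 0: the two sets are disjoint
```

In the first mode, most validation rows also appear in training patches, which is why a validation AUC close to the training AUC says little about performance on unseen data.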
Table 4 illustrates the results of the experiment of the dataset split ratio for training and validation sets. According to the results shown in Table 4, the best performance of the U-Net model occurred in the fourth case, in which the dataset of 1998-2008 was divided into training and validation sets with ratios of 90% and 10%, respectively.

Results of modifying the U-Net
To determine the optimal number of U-Net parameters, different numbers of convolutional filters (NCF) in each convolutional block were tested in six steps, as shown in Table 5. In each step, the number of the convolutional filters of the first block of the encoding path was considered as 8, 12, 16, 32, 64, and 128, respectively. By increasing the number of convolutional filters, the number of trainable parameters of the U-Net increases, as mentioned in Table 5.
To evaluate the impact of the number of trainable parameters in the proposed U-Net on its performance, each model with a different number of trainable parameters was trained for 100 epochs. The resulting AUC values for each of the six models are reported in Table 6. The six trained models were also applied to the test dataset.
According to Table 6, increasing the number of trainable parameters of the proposed U-Net model leads to higher performance. This holds up to the fourth model; beyond it, further increases make the model so complex that the training and validation datasets are insufficient to determine its parameters. The best AUC among the six models on the training and validation data shown in Table 6 is obtained by the fourth model, in which the structure of the U-Net model begins with 32 filters and has a total of 8,637,537 trainable parameters. This model obtained AUC values of 0.97 and 0.96 for the training and validation datasets, respectively.
Table 5. The number of convolutional filters (NCF) in each convolutional block and the total number of trainable parameters of the U-Net.
The impact of adding batch normalization to the U-Net was also investigated with the aim of improving the performance of the U-Net for simulating and predicting built-up land expansion. The performances of the two U-Net models (with and without batch normalization) are summarized in Table 7. After training, each model was applied to the test data for forecasting, the results of which are shown in Table 7 and Figure 4. According to Table 7, the AUC values of the U-Net model without batch normalization for the training, validation, and testing datasets are 0.86, 0.86, and 0.81, respectively. When batch normalization was added to the U-Net, these values increased to 0.97, 0.96, and 0.86, respectively. These results indicate the significant impact of batch normalization on the convergence of the U-Net model, especially in the current research, which had a small dataset.
The effect of the dropout method on the generalizability of the U-Net model was investigated by comparing three variants: the U-Net model without dropout, the U-Net with regular dropout, and the U-Net with spatial dropout. The spatial dropout method is a small modification of the regular dropout method. In regular dropout, individual neurons are randomly dropped regardless of the structure of the input data, while in spatial dropout, entire feature maps are dropped, so that the spatial structure within each retained feature map is preserved. The TOC and the AUC for the training, validation, and test sets are given in Figure 5 and Table 8. According to the TOC diagrams and AUC values reported in Figure 5 and Table 8, using the spatial dropout method in the structure of the U-Net model effectively improved the accuracy of modeling and predicting built-up land expansion (Figure 6).

Predictive power comparison for U-Net and FFS-RF as a base line model
The TOC was used in this study to evaluate the transition probability maps produced by the proposed U-Net and the FFS-RF method proposed by Shafizadeh-Moghadam et al. (2021). The AUC for each TOC of the modified U-Net and the FFS-RF baseline model is reported in Table 9.
Based on the results mentioned above, the modified U-Net proposed in the current research reached a TOC AUC of 0.87 for predicting built-up land expansion during 2008-2018, while the AUC for the FFS-RF method was 0.82. Figure 7 shows the transition probability map indicating the potential of built-up land expansion for 2008-2018.
To place our results in the context of related work, the FoM was also calculated and compared with the values reported by He et al. (2018) and Zhai et al. (2020). The proposed modified U-Net achieved an FoM value of 0.371, which is higher than those of the UMCNN-CA model proposed by He et al. (2018) and the CNN-VCA model proposed by Zhai et al. (2020), with FoM values of 0.346 and 0.361, respectively.
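For reference, the FoM can be computed from the real and predicted change maps as in the sketch below. The text does not restate the definition, so the sketch assumes the common two-class form, hits divided by the sum of hits, misses, and false alarms; the function name and the toy maps are hypothetical.

```python
def figure_of_merit(observed, predicted):
    """Two-class figure of merit: hits / (hits + misses + false alarms)."""
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            hits += 1
        elif obs:
            misses += 1
        elif pred:
            false_alarms += 1
    denominator = hits + misses + false_alarms
    return hits / denominator if denominator else 0.0


observed = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0]
print(figure_of_merit(observed, predicted))  # 0.5
```

Correct rejections do not enter the formula, so the FoM is not inflated by the large share of persistently undeveloped pixels.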

Built-up land expansion allocation
The number of developed pixels is 224,599 based on the real change map of 2008-2018. For allocating these pixels, a top-down method and a CA model were used. The percent correct metric (PCM), which divides the number of true positive pixels by the total number of developed pixels, was used to evaluate and compare their performance. The results, presented in Table 10, indicate nearly equal PCMs for the top-down and CA methods. This supports the primary assumption that a CNN accounts for the neighborhood effect well enough to achieve the required simulation and allocation precision, which eliminates the need for combining the proposed model with a CA. Therefore, the developed pixels were allocated using the top-down method to produce the predicted change map for 2008-2018, which is presented in Figure 8. An error map created by comparing the real and predicted change maps for 2008-2018 is shown in Figure 9.
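The PCM described above can be sketched as follows. Expressing the result as a percentage is an assumption, and the function name and toy maps are hypothetical.

```python
def percent_correct(observed, predicted):
    """Share (in percent) of observed developed pixels that were also predicted developed."""
    developed = sum(1 for obs in observed if obs)
    hits = sum(1 for obs, pred in zip(observed, predicted) if obs and pred)
    return 100.0 * hits / developed if developed else 0.0


observed = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0]
print(round(percent_correct(observed, predicted), 1))  # 66.7
```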

Conclusion
This research proposed a novel approach that uses a modified U-Net, a deep convolutional neural network for pixel-wise semantic segmentation, for built-up land expansion modeling during the period 1998-2018 in the Tehran and Karaj regions of Iran. One benefit of this approach is the consideration of localization using deep learning methods, which is an important issue in land transformation modeling. Because of the complexity of the built-up land expansion process and the impact of neighborhoods in each region on the possibility of future built-up development in that region, the independent variables entered the network in the form of data cubes, which enables the model to consider all factors affecting built-up land expansion as well as the neighborhood effect for each patch of the study area. Different modifications of the U-Net were examined in this study to obtain a model capable of simulating and predicting built-up land expansion accurately even if the study area is limited, which is an important issue when using a deep learning model. A predictive power comparison based on the TOC, in which the proposed modified U-Net obtained an AUC value of 0.87 and the RF baseline model an AUC value of 0.82, showed that the proposed model is capable of simulating and predicting built-up land expansion more accurately. Furthermore, our results indicate that the proposed modified U-Net considers the neighborhood effects effectively, which eliminates the need to combine our method with a CA. Applying a modified U-Net, a deep convolutional neural network that considers localization, to built-up land expansion simulation is therefore proposed to overcome the limitation of other methods in considering the neighborhood effect, which forces them to be combined with CA.
To support this claim, the results of the proposed modified U-Net with a simple top-down allocation were compared with the findings of two other studies that combined conventional CNN models with CA. The modified U-Net achieved an FoM value of 0.371, which is higher than those of the UMCNN-CA model and the CNN-VCA model, with FoM values of 0.346 and 0.361, respectively. However, because these studies used different datasets, our suggestion for future work is to compare the models on a common dataset.
The results showed that this approach yields a precise simulation by considering the neighborhood effects, entering independent variables in the form of data cubes, learning the complex transition rules through the deep convolutional layers of the CNN, and allocating effectively with a simple top-down method. Furthermore, the modified U-Net can be applied to a limited study area, which gives policymakers a view of the future of an individual city to help in their decision-making.

Data availability statement
The data that support the findings of this study are openly available in 'figshare' at https://doi.org/10.6084/m9.figshare.15101853.v1.