Artificial Intelligence and Urban Green Space Facilities Optimization Using the LSTM Model: Evidence from China

Abstract: Urban road green belts, an essential component of Urban Green Space (UGS) planning, are vital for improving the urban environment and protecting public health. This work adopts Long Short-Term Memory (LSTM) to optimize UGS planning and design methods for urban road green belts. The proposed sensitivity-based self-organizing LSTM achieves a Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) of 1.75, 1.12, and 6.06, respectively, outperforming LSTM, XGBoost, and SVR. Furthermore, we configure three typical plant community models using the improved LSTM model and find that different plant community configurations have distinct effects on reducing PM 2.5 concentrations: multi-layered green space with high canopy density reduces PM 2.5 more effectively than single-layer green space with low canopy density. We also assess the reduction function of green road spaces on PM 2.5 and find that under zero or slight pollution (PM 2.5 < 100 µg·m⁻³), green space significantly reduces PM 2.5. In UGS planning, the proposed model can help reveal the UGS spatial morphology indicators that significantly affect PM 2.5 reduction, thereby facilitating the formulation of appropriate green space planning strategies. The findings provide basic data for selecting plant configurations for urban road green space.


Introduction
With continuous socioeconomic development and improved living standards, urban modernization is accelerating, and private car ownership is growing substantially. However, this has also given rise to various urban issues, such as environmental pollution, of which hazy weather affects urban residents' lives most directly [1,2]. In particular, Particulate Matter 2.5 (PM 2.5), that is, particles in the ambient air with an equivalent aerodynamic diameter of 2.5 microns or less, is the main component of smog. These particles can adsorb harmful substances and bypass the human body's immune system. Eventually, they adhere to and invade the respiratory tract and pulmonary lobes, reducing the defense function of the human respiratory system. Haze can also weaken near-ground ultraviolet rays, enhancing the vitality of airborne infectious bacteria and increasing contagious diseases. In haze conditions, individuals who breathe more frequently and deeply are at a higher risk of inhaling larger amounts of harmful substances, which can induce or exacerbate various diseases. The direct symptoms include poor breathing, eye irritation, chest tightness, chest pain, headache, dry and itchy throat, cough, and vomiting. Elderly patients with a history of chronic bronchitis, asthma, or other chronic respiratory diseases can easily suffer acute upper respiratory tract infections, rhinitis, acute bronchitis, and attacks of chronic bronchitis, asthma, and pneumonia. Therefore, hazy weather can damage human cells and prevent them from functioning normally, harming physical and mental health. PM pollution can also cause severe losses to tourism, agriculture, social order, and ecosystems [3]. To this end, many experts have devised ways to monitor and predict air quality. Currently, air quality prediction models employ either numerical prediction based on atmospheric kinematics or statistical approaches based on Machine Learning algorithms.
The numerical prediction model builds on the internal physical laws of the atmosphere, such as atmospheric dynamics and thermodynamics, to establish corresponding mathematical and physical models, relying on mathematical methods and the computing power of large-scale computers. Numerical prediction models factor in the dynamic distribution, transportation, and diffusion of air pollutant concentrations. Machine Learning models use statistics, probability theory, and complex algorithms to build models, mine relationships from known data, and achieve refined predictions. Ecologists initially proposed the concept of Sustainable Development (SD) as a development plan that can meet contemporary needs without adversely affecting future generations. The aim is to balance the development of natural ecological resources. Subsequently, experts in various fields began to define SD in terms of social, economic, and technological attributes. Ultimately, achieving SD hinges on coordinating and integrating social, economic, population, resource, and environmental development efforts. So far, research on SD mainly draws on attempts in different academic fields, with much of the information scattered or heterogeneous. A sustainable economy has emerged as the engine for continued growth in light of the global economy's quick expansion [4]. Continuous social development will likely drive the interleaving and fusion of much of this information. Regarding the big data of scientific research, applying SD to the scientific field can improve the information utilization rate and promote sustainable scientific and technological development.
Researchers have done much work in the corresponding fields. For example, Yin (2020) [5,6] studied the characteristics of Urban Green Space (UGS) under shared transportation and proposed strategies for urban spatial planning in China, such as activating spatial inventory, road sharing, and establishing a credit system. Zhang et al. (2020) [7,8] pointed out that UGS, a nature-based solution for promoting public health, provides cities with a wide range of ecosystem services. They believed UGS was conducive to promoting the SD of cities and improving residents' quality of life. The authors proposed a theoretical framework and summarized the potential ways for UGS to provide health benefits from four perspectives: encouraging physical activity, alleviating psychological stress, providing environmental regulation and support services, and promoting social cohesion. Then, combined with theoretical review research, a preliminary health-based UGS planning and design strategy was proposed. Shan et al. (2021) [9,10] contended that UGS, a key component of urban infrastructure, should be upgraded alongside Smart City construction. Intelligent UGS planning and sustainable landscaping schemes would fuse UGS's ecological performance and similar functions. Intelligent UGS planning was both a technological approach and a human-oriented smart application. They proposed an ensemble UGS segmentation model based on five types of UGS from sample stations in Baqiao District, the City of Xi'an, Shaanxi Province. They conducted a survey to assess the socioeconomic attributes of respondents, as well as differences in their access frequency and demand for the services under investigation. Finally, the UGS under study was improved based on public preferences and the station's current conditions.
Refs. [11,12] suggested that by drawing upon the experience of green development in the existing pilot cities, the government could broaden the pilot's scope and reinforce its policy orientation towards green development, thus facilitating sustainable development in the cities. Zou and Wang (2021) [13,14] argued that a morphological perspective was a novel way to carry out UGS planning practices, protect and restore urban natural habitat functions, and maintain an excellent spatial pattern of the ecological environment. By investigating and analyzing the scientific literature on UGS morphology, their research discussed the regional and temporal background of UGS morphology research, as well as the knowledge framework of related research. It also revealed existing problems in UGS morphology research and, on this basis, proposed future research directions and objectives. Notably, the extant scholarship predominantly focuses on the urban scale, with some quantitative research results, and still lacks an in-depth, detailed quantitative analysis of green patches.
The significance of this work lies in the widespread attention that the harmful effects of PM 2.5 have recently attracted. This work quantifies the impact of UGS on air quality from the perspective of PM 2.5 reduction, enriching the theories and methods of UGS planning and design. It aims to deepen research on urban infrastructure in the ecological direction. By studying how UGS planning and design can regulate PM 2.5 pollution, it also enriches the research system of PM 2.5. This work begins with a quantitative study of the impact of different morphological characteristics of UGS on the internal PM 2.5 concentration and of the degree to which different types of UGS affect the surrounding PM 2.5. It thereby supplements the current, largely descriptive accounts of the phenomenon with a systematic quantitative study. The PM 2.5 concentration in xx city was chosen as the research object, and the Long Short-Term Memory (LSTM) model was introduced to forecast the PM 2.5 concentration. Then, to solve the network structure problem, a sensitivity-based self-organizing LSTM is proposed for PM 2.5 concentration prediction. Finally, UGS planning is conducted based on the predicted PM 2.5 concentration in different regions of xx city.

RNN and LSTM Algorithm
Recurrent Neural Networks (RNNs) differ from typical Feed Forward Neural Networks (FNNs). While an FNN's neurons only transfer information between layers, an RNN also allows information to be transferred between neurons of the same layer by introducing a circular structure [15][16][17]. In this way, an RNN can memorize the last input as a "storage" and provide a reference for the following step [18][19][20][21][22]. Whereas an FNN maps only the current input through the hidden layer to the output, an RNN can map the accumulated input history to an individual output neuron, which tends to give better predictions on sequential data. An RNN model includes three layers: input, hidden, and output. The mathematical model of the RNN is given in Equations (1)-(5). In Equations (1)-(5), t is the time point, $x(t)$ is the input to the RNN's input layer, $y(t)$ is the network output, and $h(t)$ and $h(t-1)$ represent the hidden layer outputs of the RNN at times t and t − 1, respectively. The function $\phi$ activates the RNN's hidden layer. $o(t)$ and $\sigma$ denote the input quantity and activation function of the RNN's output layer, respectively. U denotes the weight from the input layer to the hidden layer. W and b are the weight and bias between hidden layers, respectively. Finally, V and c are the hidden layer-to-output layer weight and bias, respectively.
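The symbols defined above can be tied together in a short numerical sketch. The code below is a minimal illustration that assumes the common Elman-style recurrence, with tanh standing in for the hidden activation and an identity output activation; these choices, the helper name `rnn_step`, and the layer sizes are assumptions for illustration, not the paper's exact Equations (1)-(5).

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, b, V, c):
    """One forward step of a simple RNN (assumed Elman-style form)."""
    # Hidden state: the input enters through U, the previous state through W.
    # tanh plays the role of the hidden activation phi (an assumption).
    h_t = np.tanh(U @ x_t + W @ h_prev + b)
    # Output-layer input o(t); sigma is taken as the identity (regression).
    o_t = V @ h_t + c
    y_t = o_t
    return h_t, y_t

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 4, 1          # hypothetical layer sizes
U = rng.normal(size=(n_hidden, n_in))
W = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
V = rng.normal(size=(n_out, n_hidden))
c = np.zeros(n_out)

h = np.zeros(n_hidden)                   # initial hidden state
for x in rng.normal(size=(5, n_in)):     # five time steps of synthetic input
    h, y = rnn_step(x, h, U, W, b, V, c)
```

The hidden state `h` is the only thing carried from one step to the next, which is the "storage" role described above.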
LSTM is a sub-category of RNN [23][24][25][26][27] and can effectively handle time series-dependent events. An LSTM unit contains an input gate, an output gate, and a forgetting gate [28][29][30][31][32][33]. The input gate controls the model's input data. The output gate controls which calculation results the model outputs. The forgetting gate determines what the memory module should forget (abandon) from the previous moment. The model structure is shown in Figure 1 and is calculated by Equation (6).
In Equation (6), $f_t$ indicates the forgetting gate, which controls what information to "forget," while the input gate controls which new data are written into long-term memory.
In Equations (7)-(9), $i_t$ represents the input gate. $f_t$ and $i_t$ use the Sigmoid activation function, whose output lies between 0 and 1. $C_{t-1}$ and $C_t$ represent the neuron's states at moments t − 1 and t, respectively.
In Equation (11), $o_t$ is the output level of the output gate control signal, and $w_o$ denotes the sequence's output at the t-th step. Figure 2 shows an improved LSTM.
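For orientation, the gate descriptions above match the widely used LSTM cell formulation written out below. This is the standard textbook form, offered as an assumption about what Equations (6)-(11) express; the weight names $W_C$, $b_C$, the candidate state $\tilde{C}_t$, and the concatenation $[h_{t-1}, x_t]$ are notation introduced here and may not match the paper's numbering or symbols.

```latex
\begin{align}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forgetting gate} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{candidate state} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{unit output}
\end{align}
```

Because $f_t$ and $i_t$ lie in (0, 1), the cell state update is a gated blend of the old state $C_{t-1}$ and the new candidate.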

Prediction Algorithm Based on Self-Organizing LSTM
This section introduces a sensitivity-based algorithm for self-organizing training of LSTM. The decomposition of the LSTM network is shown in Figure 3. Here, the red arrow represents the self-feedback portion of the hidden layer neurons [34,35], which is the output of the hidden layer neurons at time t − 1, expressed as $z_1(t) = (H_1(t-1), H_2(t-1), \cdots, H_N(t-1))$, where N represents the number of neurons in the hidden layer. The blue arrow indicates the output portion from the hidden layer to the output layer, that is, the output of the hidden layer neurons at time t, expressed as $z_2(t) = (H_1(t), H_2(t), \cdots, H_N(t))$. The sensitivity of the self-organizing LSTM is defined by Equation (12). In Equation (12), $Z_h$ represents the h-th input factor, Y is the output of the model's output layer, and $E(Y \mid Z_h)$ is the expected value of Y conditional on a fixed $Z_h$. Var(·) denotes the variance.
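To make the sensitivity definition concrete, the sketch below assumes it takes the common variance-based (first-order) form Var(E[Y | Z_h]) / Var(Y), which is consistent with the symbols described but is an assumption about the exact equation. The conditional expectation is approximated by binning Z_h into quantile bins; the function name, the binning scheme, and the synthetic data are all illustrative.

```python
import numpy as np

def first_order_sensitivity(z, y, n_bins=20):
    """Estimate Var(E[Y | Z]) / Var(Y) by binning Z into quantile bins."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    total_var = y.var()
    if total_var == 0.0:
        return 0.0
    edges = np.quantile(z, np.linspace(0.0, 1.0, n_bins + 1))
    # Bin index in [0, n_bins - 1] for every sample.
    idx = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, n_bins - 1)
    overall_mean = y.mean()
    between = 0.0
    for k in range(n_bins):
        mask = idx == k
        if mask.any():
            # Weight of the bin times squared deviation of its conditional mean.
            between += mask.mean() * (y[mask].mean() - overall_mean) ** 2
    return between / total_var

# Synthetic check: y depends strongly on z1 and only weakly on z2.
rng = np.random.default_rng(1)
z1 = rng.uniform(size=5000)
z2 = rng.uniform(size=5000)
y = 3.0 * z1 + 0.1 * z2
s1 = first_order_sensitivity(z1, y)
s2 = first_order_sensitivity(z2, y)
```

In this toy setting `s1` comes out close to 1 and `s2` close to 0, which is the kind of ranking the self-organizing procedure relies on when it compares hidden layer neurons.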
In this work, the proposed LSTM model divides the sensitivity analysis into two parts: indirect sensitivity and direct sensitivity. The red self-feedback section in Figure 3 is used to calculate indirect sensitivity, while the blue output section is used to calculate direct sensitivity. According to Equation (12), the indirect sensitivity is calculated by Equation (13). In Equation (13), $H_h(t-1)$ represents the output of the h-th hidden layer neuron at time t − 1, namely the self-feedback of the h-th hidden layer neuron at time t, and $y(t)$ denotes the output of the network output layer. The mathematical expression of $H_h(t-1)$ is given in Equation (14). In Equation (14), $W_x$, $W_i$, $W_f$, and $W_o$ represent the input weight of the LSTM unit and the weights of the input gate, forgetting gate, and output gate control signals, respectively. $b_x$, $b_i$, $b_f$, and $b_o$ are the input bias and the biases of the input gate, forgetting gate, and output gate control signals, respectively.
The output of the output layer is given in Equation (15). In Equation (15), $W_j(t)$ represents the weight connecting the j-th hidden layer neuron to the output layer at time t.
The direct sensitivity is calculated by Equation (16). Direct and indirect sensitivity differ in one respect: for direct sensitivity, the conditioning variable of the expected output is replaced by the output of the h-th hidden layer neuron at time t, as shown in Equation (17). After the indirect and direct sensitivities are obtained, the overall sensitivity is calculated by Equation (18).
Based on this sensitivity analysis, this section proposes a self-organizing LSTM. The algorithm flow is given in Algorithm 1. Before training, the number of hidden layer neurons and the network parameters are initialized randomly [36][37][38][39]. The network then trains iteratively until either the maximum number of iterations is reached or the output of the loss function falls to the set threshold $\zeta(t) = t^{-0.65}$. The loss function of the proposed LSTM model is defined in Equation (19). In Equation (19), $y_d(p)$ and $y(p)$ represent the expected and actual outputs at time p. In each training iteration, four values are first calculated: the loss function output $E(t)$, the indirect sensitivity $S^1_h$, the direct sensitivity $S^2_h$, and the total sensitivity $S_h$ of the hidden layer neurons.
If $E(t) > \zeta(t)$, the network has not yet achieved the desired performance, and an (N + 1)-th LSTM unit is added. The weight initialization of the new LSTM unit is shown in Equations (20)-(25). In these equations, the output-layer weight is the weight connecting the output layer with the given unit, and n is the most sensitive hidden layer neuron (the one with the highest sensitivity).
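The growth rule can be sketched as follows. The threshold and the comparison E(t) > ζ(t) come from the text; everything else is an assumption: the function names are hypothetical, and the sketch only identifies the most sensitive neuron n as the reference for the new unit without reproducing the weight initialization of Equations (20)-(25).

```python
import numpy as np

def threshold(t):
    """Decaying loss threshold zeta(t) = t ** -0.65 for iteration t >= 1."""
    return t ** -0.65

def maybe_grow(loss, t, sensitivities, n_units):
    """Apply the growth rule at iteration t.

    Returns (new_unit_count, reference_index). reference_index is the most
    sensitive existing hidden neuron, or None when no unit is added.
    """
    if loss > threshold(t):
        reference = int(np.argmax(sensitivities))
        return n_units + 1, reference
    return n_units, None

# Hypothetical total sensitivities S_h for three hidden neurons.
sens = [0.1, 0.5, 0.2]
grown = maybe_grow(loss=0.9, t=10, sensitivities=sens, n_units=3)
kept = maybe_grow(loss=0.05, t=10, sensitivities=sens, n_units=3)
```

Because ζ(t) shrinks as t grows, a loss that is acceptable early in training can trigger growth later, so the network is held to a progressively stricter target.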

Selection of Test Points and Setup of Experimental Hardware Equipment
According to the main urban road green space types and the distribution characteristics of urban road patterns in xx city, three different green space configuration modes are selected as test points. The experiment is based on xx North Road (A), xx South Road (B), and a third test point (C).
This section uses Python to preprocess data and to build, train, and test models under the TensorFlow deep learning framework. The Compute Unified Device Architecture (CUDA) computing platform is installed to create a Graphics Processing Unit (GPU) accelerated environment. The experimental hardware is an Intel(R) Core(TM) i7-6900K CPU with an NVIDIA GeForce GTX 1080 GPU (8 GB of memory) to accelerate the model.

Comparative Analysis of the Fitting Effects of Self-Organizing LSTM Based on Sensitivity and Other Models
This section compares the proposed sensitivity-based self-organizing LSTM with LSTM, Extreme Gradient Boosting (XGBoost), and Support Vector Regression (SVR). Their performance is analyzed in terms of accuracy, precision, recall, and F1-value. The results are shown in Figure 4.
In Figure 4, the proposed sensitivity-based self-organizing LSTM is compared with several classical models in terms of accuracy, precision, recall, and F1. The experimental outcomes show that the accuracy of the proposed model is 89.24%, outperforming the other classical algorithms. Accuracy is improved by at least 3.6% and is 20% higher than that of the SVR model. At the same time, the proposed model has the highest precision, recall, and F1 values, over 2% higher than the control models. Compared with the other classical algorithms, the sensitivity-based self-organizing LSTM algorithm shows higher accuracy and lower error. Figure 5 compares the prediction speed of the different algorithms.
As per Figure 5, the response time of the proposed sensitivity-based self-organizing LSTM at eight time points lies between 5.1 s and 5.9 s. The result suggests that the proposed model offers high real-time performance.
According to Table 2, the proposed sensitivity-based self-organizing LSTM PM 2.5 concentration estimation model has a daily RMSE of 120 µg·m⁻³ at Station A, which is 20.13%, 18.81%, and 29.51% lower than that of LSTM, XGBoost, and SVR, respectively. With a daily MAE of 115 µg·m⁻³, the proposed model's MAE is 5.46%, 12.39%, and 7.09% lower than that of LSTM, XGBoost, and SVR, respectively. The results show that the proposed model can accurately predict PM 2.5 concentration in different regions. However, for stations with low PM 2.5 concentration, its prediction error is relatively large. The sensitivity-based self-organizing LSTM has an RMSE, MAE, and MAPE of 1.75, 1.12, and 6.06, respectively, superior to LSTM, XGBoost, and SVR.
According to the daily measurement curve in Figure 6, before UGS planning near Stations A, B, and C, the PM 2.5 concentration is low in the daytime but high at the beginning and the end of the day. Specifically, the PM 2.5 concentration of the green belts near Stations A, B, and C increases from 8:00 a.m. to 10:00 a.m. It then begins to decrease, reaching a minimum around 12:00-14:00. After that, the PM 2.5 concentration rises until it reaches its maximum at 7:00 p.m. The atmospheric PM 2.5 concentration difference is closely related to urban traffic flow.
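For readers who want to reproduce the error metrics reported above, the following sketch computes RMSE, MAE, and MAPE with their standard definitions. The concentration values are hypothetical and serve only to exercise the functions; they are not measurements from this study.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent; y_true must not contain zeros."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Hypothetical observed and predicted PM 2.5 concentrations.
observed = [50.0, 80.0, 100.0, 40.0]
predicted = [45.0, 85.0, 110.0, 38.0]

scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "MAPE": mape(observed, predicted),
}
```

MAPE divides by the observed value, so the same absolute error weighs more at low concentrations, which fits the observation that relative error is larger at stations with low PM 2.5.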

Comparison of Daily Changes in PM 2.5 Concentration
After the UGS planning near Stations A, B, and C, the PM 2.5 concentration is still high in the morning and evening and low in the daytime, but the variation range is relatively small. The PM 2.5 concentration in the different green belts near Stations A, B, and C increases slowly from 9:00 a.m. The concentration begins to decline after 10:00 a.m., reaching a minimum around 14:00. After that, the PM 2.5 concentration slowly increases with the increase in traffic flow.
Table 3 lists the ability of green road space to reduce PM 2.5 under zero or light pollution, moderate pollution, and severe pollution conditions. Under zero-pollution or slightly polluted weather conditions throughout the year, the reduction function of green road space on PM 2.5 varies with green belt width and distance. At each measurement station, there is a significant dust reduction function at green belt widths of 6 m, 16 m, 26 m, and 36 m. The order of the green belts' dust reduction rates is C > A > B. Among the three green belts, the one near Station C has a higher reduction function on PM 2.5 than the other two, with an average reduction rate of 9.70%. The highest reduction rate is at 36 m, reaching 12.22%. The order of reduction rates at different green belt widths is 36 m > 26 m > 16 m > 6 m. The difference in dust reduction may be related to the configuration structure of the green belts and the plant species at the various points. The green space community near Station C consists mostly of large trees, with a high canopy density reaching 80%. Under moderately polluted weather conditions, the reduction function of green space on PM 2.5 is insignificant at the different locations and green belt widths. Except for Station A's green belt, the reduction rates of the other two green belts on PM 2.5 are primarily negative.
The results indicate that certain weather pollution conditions restrict the effect of green space on PM 2.5 reduction. Furthermore, under the same weather conditions, different plant configuration modes significantly affect PM 2.5 pollution in the air. Under severely polluted weather, or when PM 2.5 pollution exceeds the severe level, the reduction function of green road space on PM 2.5 is insignificant at all locations and green belt widths, and the reduction rate is negative. The findings corroborate that the UGS's dust reduction and retention functions are limited and can become negligible, especially in highly polluted environments.
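The reduction rates discussed above, including the negative ones, can be read through a simple relative-difference calculation. The text does not state the exact formula, so the sketch below assumes the rate is the percentage drop in PM 2.5 from a reference point (for example, the roadside) to a point inside the green belt; the function name and the sample concentrations are hypothetical.

```python
def reduction_rate(c_reference, c_green):
    """Percent reduction of PM 2.5 inside the green belt relative to a reference point.

    Assumed definition: (c_reference - c_green) / c_reference * 100.
    A negative value means the concentration inside the green belt is higher.
    """
    if c_reference <= 0:
        raise ValueError("reference concentration must be positive")
    return (c_reference - c_green) / c_reference * 100.0

# Hypothetical concentrations in µg·m⁻³, not values from this study.
light_pollution = reduction_rate(80.0, 72.0)     # green belt is cleaner
heavy_pollution = reduction_rate(200.0, 210.0)   # green belt is dirtier
```

Under this definition a negative rate simply means that particles accumulate inside the belt, which matches the behavior reported under moderate and severe pollution.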

Conclusions
PM 2.5 is tiny inhalable particulate matter in ambient air with an aerodynamic diameter of 2.5 µm or less. With its small particle size and long transportation distance, PM 2.5 often carries complex harmful and even toxic substances. It does not settle easily, degrading air quality and endangering public health. This work mainly studies prediction schemes based on LSTM and proposes an LSTM model based on a self-organizing algorithm. To this end, sensitivity is chosen as the indicator for adding or deleting hidden layer neurons, so that the number of hidden layer neurons is determined during network training. The experimental results show that this number falls within the available range. The approach thereby ensures high prediction accuracy and alleviates the difficulty of choosing the number of hidden layer neurons. Further, the proposed sensitivity-based self-organizing LSTM algorithm is compared with LSTM, XGBoost, and SVR. The results show that it achieves an RMSE, MAE, and MAPE of 1.75, 1.12, and 6.06, respectively, superior to LSTM, XGBoost, and SVR. Some shortcomings remain: the prediction accuracy of the proposed model can still be improved. Therefore, future work will mainly seek to use or develop better deep learning models to improve the proposed model.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.