Effect of geographical parameters on PM 10 pollution in European landscapes: a machine learning algorithm-based analysis

Background: PM 10, comprising particles with diameters of 10 µm or less, has been identified as a significant environmental pollutant associated with adverse health outcomes in European cities. Understanding the temporal variation of the relationship between PM 10 and geographical parameters is crucial for sustainable land use planning and air quality management in European landscapes. This study uses Conditional Inference Forest modeling and partial correlation to examine the impact of geographical factors on monthly average concentrations of PM 10 in European suburban and urban landscapes during heating and cooling periods. The investigation focuses on two circular buffer zones (radii of 1000 m and 3000 m) surrounding 1216 European air quality monitoring stations.
Results: The results reveal the importance of, and significant correlations between, various geographical variables (soil texture, land use, transportation network, and meteorological variables) and PM 10 concentrations on a continental scale. In suburban landscapes, soil texture, temperature, roads, and rail density play pivotal roles, while meteorological variables, particularly monthly average temperature and wind speed, dominate in urban landscapes. Urban sites exhibit higher R-squared values during both cooling (0.41) and heating periods (0.61) compared to suburban sites (cooling period R-squared: 0.39; heating period R-squared: 0.51), indicating better predictive performance, likely attributable to the less heterogeneous land use patterns surrounding urban PM 10 monitoring sites.
Conclusion: The study underscores the importance of investigating the spatial and temporal dynamics of geographical factors for accurate PM 10 air quality prediction models in European urban and suburban landscapes. These findings provide valuable insights for policymakers, urban planners, and environmental scientists, guiding efforts toward sustainable and healthier urban environments.


Introduction
Air quality is a critical concern with direct implications for public health, particularly in urban and suburban areas where various anthropogenic activities contribute to the dispersion of particulate matter (PM) [82]. Particulate matter is considered one of the most serious hazards for human health, the environment, and the climate on a global scale [28]. Despite considerable improvements in recent decades in many industrialized countries, PM pollution still causes thousands of premature deaths and increases in various pathologies each year in Europe [36].
PM 10, comprising particles with diameters of 10 µm or less, has been identified as a significant environmental pollutant associated with adverse health outcomes [46,47,74]. In Europe, air pollution significantly impacts human health (e.g., reducing life expectancy) and the economy (e.g., increasing medical costs and reducing productivity).
Our previous study [83] examined the connection between different land use categories from the Urban Atlas Land Cover/Land Use 2018 [26] and the monthly average PM 10 concentrations reported by the European Environment Agency (EEA), finding significant seasonal variations between heating (cold seasons) and cooling (warm seasons) periods [49]. We hypothesized that the relationship between PM 10 and geographical parameters varies significantly between these periods. Understanding the temporal variations in PM 10 concentrations during both heating and cooling periods is critical for developing effective air quality management strategies, particularly in Europe's dynamic urban and suburban landscapes [51,67,83]. Previous studies have primarily investigated correlations between influential factors, such as land use structures, climatological variables, or soil properties, and either PM 10 or PM 2.5 individually, typically on a local or regional scale [37,39,44,51,62,73,80,86,92,98,100].
This study applied a conditional inference random forest (CRF) model, an advanced variant of the random forest, to estimate monthly average PM 10 concentrations across European countries. Our objectives are to (1) estimate monthly average PM 10 concentrations based on geographical parameters, (2) determine the importance of each independent geographical variable in predicting PM 10 concentrations during the cooling and heating periods, and (3) examine the scale dependence of these variables' effects on PM 10 levels using two buffer zone radii (1000 m and 3000 m) around air quality stations across European cities. This research is essential to understand the effects of geographical factors on the level of PM 10 pollution. It opens possibilities for predicting PM 10 levels and offers a foundation for assessing particulate matter exposure, particularly in developing countries and regions with limited PM 10 monitoring stations. The findings will be crucial for future modeling efforts, enabling more accurate PM 10 predictions and informing air quality management strategies that directly impact the health of billions of inhabitants in urban and suburban areas.

PM10 concentrations
The monthly average PM 10 concentrations (in µg/m 3 ) for 2018 were obtained from the European AQ Portal (2020) and cover 1216 European AQ monitoring stations (1039 stations in urban areas and 177 stations in suburban areas) (Fig. 1). This AQ e-Reporting system was established by the European Commission and is run by the European Environment Agency [29]. The stations are classified according to the old Exchange of Information Decision (EoI) 97/101/EC and the current Implementing Provisions on Reporting (IPR) 2011/850/EC, regarding the type of area (urban, suburban or rural) and the influence of the immediate surroundings (traffic, industrial or background). This classification is based mainly on spatial considerations such as the degree of built-up areas, population distribution, or the influence of nearby sources [7,33].

Land use/cover data
Urban areas in the Urban Atlas are typically characterized by higher-density built-up areas, such as residential, commercial, and industrial zones, along with infrastructure like roads and railways. These areas are often marked by a high degree of impervious surfaces and minimal green spaces. Suburban areas, on the other hand, are characterized by lower-density residential zones, mixed land use with more open spaces, and often include suburban agricultural lands. These areas have a mixture of built-up surfaces and more natural or semi-natural spaces, providing a transition between urban cores and rural areas.
The composition of the landscape structure (area of different land use categories) around the PM 10 monitoring stations was calculated using the European Union Copernicus Land Monitoring Service's LULC 2018 Urban Atlas database. The minimum mapping unit of this polygon-based LULC dataset is 0.25 hectares. It was published by the EEA within the framework of the Copernicus program on April 16, 2020 and revised on July 16, 2021 [26].
The aggregated Urban Atlas land use categories were employed in this study based on the findings of previous investigations [83] (Table 1).

Transportation network and soil data
The GEOFABRIK (OpenStreetMap database) road network dataset was utilized in our study. Additionally, the rail network dataset for European territories was downloaded from the Euro-Global Map, published on July 27, 2018, and updated on January 14, 2019 ("Geofabrik", 2022 [35]). The United States Department of Agriculture (USDA) soil textural class dataset, based on LUCAS topsoil data, was also utilized as qualitative data for European topsoil physical properties. This dataset includes soil texture categories such as clay, silty clay, silty clay-loam, sandy clay, sandy clay-loam, clay-loam, silt, silt-loam, loam, sand, loamy sand, and sandy loam [72], and was published by the European Soil Data Centre (ESDAC), European Commission and Joint Research Centre in 2015 (Fig. 2).

Meteorological data
To analyze meteorological conditions, monthly data for 2018 were obtained in NetCDF format from the Climate Data Store (CDS), which serves as the foundational infrastructure of the Copernicus Climate Change Service (C3S). The variables were monthly averaged 10 m wind speed (m s −1 ), mean sea level pressure (Pa), average temperature, and total precipitation (m) (Table 2). According to the ERA5 data documentation, the dataset was evaluated for usability and reliability by the Evaluation and Quality Control (EQC) function of C3S [40]. Figure 3 illustrates all the European-scale digital datasets utilized in this study.

GIS analysis
Based on our previous studies [83,84], we selected the following as effective factors: the area of each aggregated Urban Atlas land cover category (km 2 ) that has shown a significant correlation with the monthly average concentration of PM 10; road and railway density (km/km 2 ); distance from the nearest road and railway network (m); soil texture; and the monthly averages of meteorological variables, namely wind speed (m s −1 ), mean sea level pressure (Pa), total precipitation (mm), and mean temperature (°C). The selected independent variables were calculated or extracted within the 1000 m and 3000 m buffer zones, based on previous results on scale sensitivity [83]. ArcMap 10.6.1, QGIS 3.22, and ArcGIS Pro 2.8 were used to perform spatial analysis and mapping.
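As an illustration of the density variables, the following Python sketch computes the area of a circular buffer zone and the network density (km/km 2 ) inside it. It is a minimal sketch, not the GIS workflow used in the study: the function names and the 12.5 km road length are hypothetical, and it assumes that the total network length inside the buffer has already been measured.

```python
import math


def buffer_area_km2(radius_m: float) -> float:
    """Area of a circular buffer zone in km^2 for a radius given in metres."""
    radius_km = radius_m / 1000.0
    return math.pi * radius_km ** 2


def network_density(length_km: float, radius_m: float) -> float:
    """Network length per unit buffer area (km/km^2)."""
    return length_km / buffer_area_km2(radius_m)


# Hypothetical example: 12.5 km of road measured inside each buffer size.
for radius_m in (1000, 3000):
    area = buffer_area_km2(radius_m)
    density = network_density(12.5, radius_m)
    print(f"radius={radius_m} m  area={area:.2f} km^2  density={density:.2f} km/km^2")
```

Because the 3000 m buffer covers nine times the area of the 1000 m buffer, the same network length yields a density nine times lower, which is one reason the two scales can rank variables differently.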

Data preparation
As part of the data-cleaning procedures, missing values, nodata values, and outliers were identified and removed from all dependent and independent variables. Additionally, LULC classes with fewer than 25 occurrences within the specified buffer zones were excluded [12]. As demonstrated in our previous study [83], the relationship between PM 10 concentrations and land use structures varies significantly between cooling and heating periods. In the context of heating degree days (HDDs) and cooling degree days (CDDs), the "base temperature" is a threshold outside temperature used to determine when a building requires heating or cooling. Base temperatures may be defined for a particular building as a function of the temperature that the building is heated to, or they may be defined for a country or region. Heating threshold temperatures range from 10 to 19 °C among European countries [56]. To obtain a consistent dataset, we excluded AQ stations in countries with markedly different temperature thresholds (e.g., extremely cold climates), such as Denmark, Norway, Finland, Switzerland, and Sweden [85]. In total, this left 148 AQ stations in suburban landscapes and 970 AQ stations in urban landscapes.
Studies from China [48,95,103] report that the indoor temperature set point (ST) is often 26 °C in summer and 18 °C in winter according to local building design codes and indoor thermal comfort standards. According to [68], the base indoor temperature in Europe is also 18 °C. Furthermore, the World Health Organization (WHO) suggests that 18 °C is a "safe and well-balanced indoor temperature to protect the health of general populations during cold seasons" [94]. The United States system sets the base temperature at 18.3 °C [2,8,69]. Therefore, we selected 18.3 °C as the European monthly average temperature threshold and divided the datasets into the heating period (monthly mean temperature below 18.3 °C) and the cooling period (monthly mean temperature above 18.3 °C). As the last step, we split the final dataset according to the type of landscape (urban and suburban) to better understand and explain the results in different landscape structures.
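The period and landscape split can be sketched in a few lines of Python. This is an illustrative sketch only: the record fields and station values are hypothetical, and the handling of a month whose mean temperature equals exactly 18.3 °C (assigned here to the cooling period) is an assumption, since the text does not specify it.

```python
BASE_TEMPERATURE_C = 18.3  # monthly mean temperature threshold stated in the text


def classify_period(monthly_mean_temp_c: float) -> str:
    """Label a station-month as belonging to the heating or cooling period."""
    # Assumption: a value exactly at the threshold counts as "cooling".
    return "heating" if monthly_mean_temp_c < BASE_TEMPERATURE_C else "cooling"


def split_by_landscape_and_period(records):
    """Group station-month records by (landscape type, period)."""
    groups = {}
    for rec in records:
        key = (rec["landscape"], classify_period(rec["temp_c"]))
        groups.setdefault(key, []).append(rec)
    return groups


# Hypothetical station-month records (not taken from the study's dataset).
records = [
    {"station": "S1", "landscape": "urban", "month": 1, "temp_c": 2.4, "pm10": 31.0},
    {"station": "S1", "landscape": "urban", "month": 7, "temp_c": 21.9, "pm10": 17.5},
    {"station": "S2", "landscape": "suburban", "month": 1, "temp_c": 4.1, "pm10": 26.2},
    {"station": "S2", "landscape": "suburban", "month": 7, "temp_c": 19.0, "pm10": 15.8},
]

groups = split_by_landscape_and_period(records)
for key in sorted(groups):
    print(key, len(groups[key]))
```

Each of the four resulting groups corresponds to one modeling scenario (urban or suburban, heating or cooling), which is then analyzed separately for the 1000 m and 3000 m buffer zones.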

Statistical modeling
This study used a conditional inference regression random forest model (CRF), a robust ensemble learning technique recognized for its predictive capabilities and resistance to overfitting. The CRF model is suitable for analyzing the impact of both quantitative independent variables and qualitative variables, such as soil texture categories, on monthly average PM 10 values. CRFs are effective when predictors are highly intercorrelated or when models have numeric response variables, and they can handle relationships between response variables and predictors at any measurement scale [61].
Cforest is an implementation of the random forest and bagging ensemble algorithms utilizing conditional inference trees as base learners. While measuring variable importance in random forests is common for variable selection, it becomes unreliable when predictor variables differ in scale or category number. Cforest offers options not available in traditional random forests, such as fitting forests to censored, multivariate, and ordered responses [42]. Additionally, when predictors (geographical factors) vary in measurement scale or category number, variable selection and importance computation in random forests are biased towards variables with many potential cut points. In contrast, cforest uses unbiased trees and an appropriate resampling scheme by default [43,87].
Both random forest and cforest use random subsets of input data for recursive partitioning to develop multiple classification or regression trees, but they differ in how they determine predictor variable importance. Random forest does not accurately account for correlation among predictors, which leads to overestimating the significance of highly correlated predictors. Cforest addresses this issue with a conditional permutation-importance measure. Moreover, random forest is reported to require normally distributed continuous dependent variables [58], which was not the case in this study. Our dependent variable, monthly average PM 10 concentration, was continuous but not normally distributed, and we had soil texture as a categorical predictor.
Conditional inference trees (CITs) and conditional random forests (CRFs) allow researchers to model relationships between numeric or categorical response variables (PM 10 ) and various predictors (geographical factors), especially when parametric methods are problematic due to complex interactions, non-linearity, and correlated predictors [61]. Given these considerations, we chose the cforest algorithm within the "party" package in R to determine the most important predictor variables [23].
Model evaluation was conducted using a combination of train-test splitting and cross-validation through the following steps: (1) Split the dataset into 70% training and 30% testing data. (2) Perform a grid search on the training set using cross-validation to select the optimal 'ntrees' and 'mtry' parameters. (3) Train the model using k-fold cross-validation and identify the best model. (4) Retrain the final model on the entire training dataset using the best hyperparameters [78]. (5) Use the final model to make predictions on the test dataset. (6) Evaluate the model's performance on the test set using the RMSE, MAE, and R 2 metrics.
The root mean square error (RMSE) measures the average magnitude of errors between predicted and observed values [16] and is commonly used in regression-based machine learning models to evaluate predictive performance [52]. The mean absolute error (MAE) shows the average distance of the predicted values from the observed values [77]. The coefficient of determination (R-squared or R 2 ) gives the proportion of variance in the dependent variable that can be explained by the independent variables [14].
The study aims to determine the parameters related to PM 10 air pollution. As a final step, we therefore analyzed each variable's importance as a percentage, indicating its contribution to the model's predictive accuracy. Conditional inference forests provided variable importance scores, where the sign (positive/negative) indicates whether a variable's presence improves or degrades model efficiency [23].
Partial correlation analysis was performed in RStudio using the Spearman method to assess nonparametric correlations. This analysis explored relationships between PM 10 and variables with an importance of 5% or higher in PM 10 prediction accuracy. The "pcor" function calculates the pairwise partial correlations of each variable pair, providing p-values and statistics [54]. The flowchart of our research process is shown in Fig. 4.
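The 70/30 split and the three test-set metrics can be illustrated with a short Python sketch. It is not the R code used in the study: the function names, the random seed, and the observed and predicted values are hypothetical, and R 2 is computed here as 1 − SS_res/SS_tot, whereas the software used in the study may define it differently (for example, as the squared correlation between observed and predicted values).

```python
import math
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(observed, predicted):
    """Root mean square error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (an assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly PM10 values (µg/m3) for a small test set.
observed = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 37.0]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))
print(round(rmse(observed, predicted), 3),
      round(mae(observed, predicted), 3),
      round(r_squared(observed, predicted), 3))
```

RMSE penalizes large errors more strongly than MAE, so reporting both, as the study does, indicates whether a few large misses or many small ones drive the error.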

Impact of geographical variables on PM 10 concentration in suburban landscapes
The variable importance analysis for the cooling period in suburban landscapes, covering the 1000 m and 3000 m buffer zones, expresses the importance of each variable as a percentage, indicating its contribution to the model's predictive accuracy (Fig. 5). In suburban areas within a 1000 m radius during the cooling period, soil texture emerges as the most influential variable, with an importance of 24.79%. Temperature follows at 14.45%, succeeded by road land use area at 14.1% and road density at 11.36%. Within a 3000 m radius during the cooling period, soil texture is again the predominant variable (35.62%). Railway density (in m/km 2 ) ranks second (14.5%), and road land use area (9.23%) ranks third. Atmospheric conditions, represented by the monthly average wind speed (8.58%) and temperature (8.23%), are the fourth and fifth most influential variables, respectively. Urban parks have the least impact on PM 10 predictions, contributing only 1.9% (Fig. 7).
Table 3 presents the performance statistics of the Conditional Inference Forest (CRF) models for the monthly average concentration of PM 10 at 148 AQ stations in suburban European landscapes. For suburban 1000 m, the RMSE is 4.83, the R-squared is 0.36, and the MAE is 3.48, reflecting a reasonable but not exceptionally accurate predictive performance. In comparison, suburban 3000 m exhibits slightly higher RMSE (5.01) and R-squared (0.39) values and a slightly increased MAE of 3.63.
The variable importance analysis conducted for suburban landscapes during the heating period within the 1000 m and 3000 m buffer zones is presented in Fig. 5. Within a 1000 m radius, temperature emerges as the most influential factor, with an importance of 17.06%. This is followed by total precipitation at 12.24%. The area of built-up land use is third in importance at 10.57%. Conversely, water cover holds the least significance, at 0.79%, in affecting PM 10 levels during the heating period (Fig. 7).
Our analysis of geographical variables affecting monthly average PM 10 concentrations during the heating period in suburban landscapes within a 3000 m radius reveals the key influencing factors (Fig. 8). Temperature is the most significant variable, contributing 15.83% to PM 10 levels. Total precipitation follows at 13.09%, and soil texture is the third most important factor at 10.56%. In contrast, railway density (0.99%) and the industrial unit land use category (0.96%) have relatively minor impacts on PM 10 levels during the heating period in suburban areas within a 3000 m radius.
The model's performance evaluation for suburban landscapes within 1000 m and 3000 m during the heating period is summarized in Table 4. The R 2 values indicate a moderately good fit of the models to the observed data, with an R 2 of 0.52 for suburban 1000 m and an R 2 of 0.51 for suburban 3000 m.
During the cooling period in suburban landscapes, the 1000 m buffer zones show a significant positive relationship between PM 10 concentration and both temperature and built-up areas, suggesting that higher temperatures and more developed areas are associated with higher PM 10 levels. In the 3000 m buffer zones, a significant positive relationship is found between PM 10 concentration and rail density, roads, and temperature (Table 5). In contrast, a significant negative correlation between PM 10 concentration and wind speed indicates that higher wind speeds correspond to lower PM 10 levels.
The partial correlation (PC) analysis of the 1000 m buffer zones in suburban landscapes during the heating period shows a significant negative correlation between PM 10 concentration and both temperature and total precipitation, suggesting that higher temperatures and greater precipitation are associated with lower PM 10 concentrations. In contrast, a positive correlation exists between PM 10 concentration and the extent of built-up areas, wind speed, mining, dumping and construction sites, and rail density. Similar results are observed in the 3000 m buffer zones, with notable differences: a significant negative correlation is found between PM 10 concentration and both the urban park area and the distance to railways within the 3000 m radius surrounding the PM 10 monitoring stations (Table 6).
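To make the partial correlation results easier to interpret, the following Python sketch computes a Spearman partial correlation between two variables while controlling for a single third variable. It is a simplified illustration under stated assumptions: the study used the "pcor" function in R, which controls for all remaining variables at once, whereas this first-order formula handles only one control variable, and the data values and function names are hypothetical.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """First-order partial Spearman correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical monthly values: PM10 (µg/m3), temperature (°C), wind speed (m/s).
pm10 = [30, 28, 25, 22, 18, 20]
temp = [1, 3, 8, 6, 12, 15]
wind = [2.0, 3.5, 2.5, 4.0, 3.0, 4.5]

print(round(spearman(pm10, temp), 3))
print(round(partial_spearman(pm10, temp, wind), 3))
```

In this toy example the PM 10 and temperature correlation stays strongly negative after wind speed is controlled for, which is the kind of pattern the heating-period results describe.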

Impact of geographical variables on PM 10 concentration in urban landscapes
Our analysis of variable importance during the cooling period in urban areas within a 1000 m radius reveals that soil texture is the most influential variable, accounting for 20.75% of the importance. Roads are the second most significant at 11.77%, followed closely by temperature at 10.26%. When extending the analysis to a 3000 m radius, forests emerge as the most influential variable, with a substantial importance of 15.44%, followed by soil texture at 12.83%. Other notable variables include vacant land (8.25%), 10 m wind speed (7.49%), and temperature (7.04%), highlighting the impact of open spaces and meteorological conditions on air quality. The least impactful variables are mining, dumping, and construction sites (3.9%) and road density (3.02%) (Fig. 9).
For the cooling period in the 1000 m urban buffer zones, the model validation results show an RMSE of 4.84, an R-squared value of 0.36, and an MAE of 3.58, indicating average to low accuracy in predicting PM 10 concentrations. In urban landscapes with 3000 m buffer zones, a slightly lower RMSE of 4.63 suggests improved predictive accuracy in this larger buffer zone. The R-squared value of 0.41 indicates a moderate fit of the model to the observed data, while a reduced MAE of 3.35 signifies a decrease in the average absolute errors compared to the 1000 m buffer zones (Table 7).
During the heating period in urban areas within a 1000 m radius, temperature is the most influential factor, with an importance of 26.33%. Wind speed follows closely, contributing 20.43%, and total precipitation accounts for 12.76%. The least influential variables are water cover (0.89%) and mining, dumping, and construction sites (0.86%), indicating their limited impact on PM 10 concentrations during the heating period in these urban landscapes (Fig. 6).
During the heating period in urban areas within a 3000 m buffer zone, temperature is the most significant factor (24.02%), followed by wind speed (21.28%) and total precipitation (12.79%). In contrast, surface water, mining, dumping and construction sites, and railway density are less influential, with each contributing less than 1% (Fig. 10).
The model validation outcomes for the heating period within the 1000 m buffer zones indicate an RMSE of 6.83, an R-squared value of 0.57, and an MAE of 4.92. For urban 3000 m, the model shows improved performance with an RMSE of 6.64, an R-squared value of 0.61, and an MAE of 4.7 (Table 8).
The partial correlation analysis for the 1000 m buffer zones during the cooling period shows significant positive correlations between PM 10 concentration and both road land use area and temperature, while forest areas and total precipitation are significantly negatively correlated with PM 10 (Table 9). In the 3000 m buffer zones, PM 10 concentration is significantly negatively correlated with forests, wind speed, and total precipitation, but positively correlated with temperature and vacant lands.
During the heating period in urban landscapes, PM 10 concentration in the 1000 m buffer zones is significantly negatively correlated with temperature, wind speed, and total precipitation. Similar patterns are observed in the 3000 m buffer zones, where PM 10 concentration also shows significant negative correlations with temperature, wind speed, and total precipitation (Table 10). Additionally, there is a significant positive correlation between PM 10 concentration and arable land.

Discussion
Temperature and total precipitation are the most important factors during the heating period in the suburban buffer zones, reflecting the crucial role of weather conditions in determining PM 10 concentrations. Our results for the heating period confirm those of [11]: declining temperature increases PM 10 in the air surrounding built-up urban areas because heating intensity rises. Furthermore, colder temperatures and stable air conditions exacerbate particle pollution, as shown by [30,70,96]. The dust-fixing capacity of precipitation is also well supported by previous studies [65,76].
The negative correlation between PM 10 concentration and total precipitation suggests that precipitation may help to clean the atmosphere of PM 10 particles by scavenging them from the air [76]. Higher temperatures, stronger winds, and increased precipitation are associated with lower PM 10 concentrations, probably due to improved dispersion and removal of pollutants. In the winter months, by contrast, pollutants from natural and anthropogenic sources are trapped in the boundary layer by frequent temperature inversions [59]. Winter atmospheric conditions differ: lower average wind speeds, lower temperatures, and a lack of precipitation reduce surface vertical mixing, resulting in limited dilution and dispersion [11,22]. However, according to [91], the negative correlations between PM 10 and wind speed and precipitation become weaker during warm seasons, probably due to secondary aerosol formation and enhanced soil dust resuspension. In summer, PM 10 production is mainly related to secondary inorganic production [15]. In contrast, [10] found that temperature had no statistically significant effect on PM 10 concentrations, while wind speed was negatively correlated with PM 10, similar to the studies [11,25,65]. However, this result contradicts the findings of [76], who found that annual PM 10 emissions decreased from 1958 to 2018 and that this trend was significantly associated with the decrease in wind speed.
Based on our results, the relationship between temperature and PM 10 concentration follows the same significant pattern in both urban and suburban areas. During the warmer months (cooling period), the correlation is positive: an increase in temperature leads to increased PM 10 pollution. During the colder months (heating period), the correlation is negative, with decreasing temperatures causing higher PM 10 levels. These seasonal changes are observed because, in the warm months, higher temperatures can enhance PM 10 formation, while in the cold months, increased heating activities in both urban and suburban areas lead to more PM 10 emissions.
The variable importance results show that roads are the main source of PM 10 pollution in summer (cooling period), while built-up areas are the main cause of accelerated PM 10 pollution, particularly in winter (heating period) [83]. However, there is a negative correlation between road density and monthly average PM 10 concentration in suburban landscapes during both cooling and heating periods, consistent with our previous study [84]. We did not find a significant correlation between road density and PM 10 values in urban landscapes. This might be due to several reasons. For instance, in urban areas where PM 10 sources are well known, such as industrial zones or built-up areas near air quality (AQ) immission measurement points, secondary winds generated by vehicles can help reduce PM 10 concentrations. This is supported by the cleaning effect of wind speed observed in our current study. Furthermore, in locations with significant industrial pollutants, the influence of the density of the road network on PM 10 concentration could be overshadowed. PM 10 levels could be more affected by the types and sizes of industrial estates or built-up areas within a 3000 m radius around the AQ point than by the density of the road network [9,84].
On the other hand, suburban landscapes typically have a lower road network density than urban areas. This difference in transport infrastructure can lead to lower PM 10 concentrations from traffic sources in suburban areas, especially in European public transport systems. In addition, urban landscapes experience the heat island effect, where higher temperatures due to dense buildings and less vegetation can enhance PM 10 formation [34,67]. Because of the urban heat island, road networks often act as ventilation corridors that channel cool winds from rural areas into the city center.
The negative correlation found between PM 10 concentration and forests underscores the potential role of green spaces in mitigating air pollution by acting as filters, absorbing and trapping particulate matter. Therefore, designing a proper configuration of planted spaces that facilitates ventilation may be a more effective approach to achieving the PM purification function of vegetation [19,20]. The high cleaning effect of forest land use, particularly during the vegetation period (approximately the cooling season), has been shown by other studies [19,41,99]. In contrast, the positive correlation between PM 10 concentration and vacant land suggests that areas with more vacant land could exhibit higher PM 10 concentrations [76,83].
The positive correlation between PM 10 concentration and arable land during the heating period can also be attributed to agricultural activities, such as emissions from neighboring agricultural land (e.g., burning crop straw in fields after harvest), tilling, and harvesting, which generate PM 10 emissions and leave the soil without vegetation cover in winter [75,79]. Therefore, farming activities must be sustainable for the landscape and environmentally friendly [53]. Additionally, the proportions of sand, silt and clay that define soil texture are important factors that affect soil erosion and dust emission [27,71], particularly during the warm seasons. This aligns with our findings, which identified soil texture as the most important factor in predicting PM 10 levels during the cooling period in suburban areas and in the 1000 m urban buffer zones, and as the second most important in the 3000 m urban buffer zones. A rise in temperature causes an increase in PM 10 concentrations, as the topsoil dries out during the drier summer and the wind moves soil grains more easily because soil moisture no longer binds them [76].
A limitation of our study is the omission of certain AQ stations not covered by the Urban Atlas 2018 land cover map, along with the lack of data on traffic density. Additionally, the moderate precision reflected by R-squared values between 0.36 and 0.61 is comparable to some prior studies related to ours [64,90,101]. Nonetheless, these values highlight the need for improvement in future research. In future work, we will analyze landscape metrics around AQ stations within target buffer zones and evaluate the impact of landscape patterns on PM 10 quality. Following this, we will develop a highly accurate PM 10 prediction model incorporating only the geographical variables from our current study and the landscape metrics that demonstrate a significant correlation with PM 10 concentrations.

Conclusions
According to our research findings, geographical variables (soil, climatic, etc.) affect PM 10 concentrations during and outside the heating season with fundamentally different weights and often with different signs (e.g., monthly average temperature and road density). Based on our results, it is possible to predict PM 10 concentrations after separating the heating and non-heating periods and using different input geographical variables within each period. Our research makes an essential contribution to the estimation of PM 10 pollution in areas where soil, climatic and land use data are available but the network of PM 10 immission monitoring stations is very sparse. This research provides valuable information to policymakers, urban planners, and environmental scientists for formulating targeted strategies to predict and mitigate the impact of PM 10 pollution.

Appendix
See Tables 8, 9, 10 and Figs. 7, 8, 9, 10.

Total precipitation (mm): The accumulated liquid and frozen water, comprising rain and snow, that falls to the Earth's surface. It is the sum of large-scale precipitation and convective precipitation. Large-scale precipitation is generated by the cloud scheme in the ECMWF Integrated Forecasting System (IFS). This parameter does not include fog, dew, or precipitation that evaporates in the atmosphere before it lands on the surface of the Earth. The units of this parameter are depth in millimeters of water equivalent.

Fig. 1 Distribution of AQ monitoring stations in the study area (urban, suburban)

Fig. 3 Flowchart of the datasets used for analysis. MA monthly average

Fig. 4 Flowchart of the research process

Fig. 6 Importance of variables based on the conditional inference regression random forest model in urban landscapes. Cooling period (1000 m and 3000 m buffer zones) and heating period (1000 m and 3000 m buffer zones)

Fig. 7 Conditional inference forest results for suburban landscapes during the cooling period: A 1000 m distance from AQ station points and B 3000 m distance from AQ station points. Grey-shaded variable with importance < 5%

Fig. 8 Fig. 9
Fig. 8 Conditional inference forest results for suburban landscapes during the heating period: A 1000 m distance from AQ station points and B 3000 m distance from AQ station points. Grey-shaded variable with importance < 5%

Fig. 10 Conditional Inference Forest results for urban landscapes during the heating period: A 1000 m distance from AQ station points and B 3000 m distance from AQ station points. Grey-shaded variable with importance < 5%

Table 1
(Aggregated) land cover categories based on the 2018 Urban Atlas LULC map [26], which are significantly correlated with the monthly average concentrations of PM 10 [83]

Table 2
Partial correlation (PC) results between PM 10 concentration and selected variables using Conditional Inference Forest analysis in suburban landscapes during the cooling period; A) 1000 m buffer zones, B) 3000 m buffer zones. Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001 (each marks a significant positive or negative relationship); p > 0.05, no effect was observed.

Table 3
Partial correlation results between PM 10 concentration and selected variables using Conditional Inference Forest analysis in suburban landscapes during the heating period; C) 1000 m buffer zones, D) 3000 m buffer zones

Table 4
Conditional Inference Forest performance statistics for PM 10 AQ monitoring sites in European urban landscapes during the cooling period

Table 5
Conditional Inference Forest performance statistics for PM 10 AQ monitoring sites in European urban landscapes during the heating period

Table 6
Partial correlation results between PM 10 concentration and selected variables using Conditional Inference Forest analysis in urban landscapes during the cooling period

Table 7
Partial correlation results between PM 10 concentration and selected variables using Conditional Inference Forest analysis in urban landscapes during the heating period

Table 8
The main meteorological variables used in this study, downloaded from the Climate Data Store (published on April 18, 2019)
10 m wind speed (m/s): The horizontal speed of the wind, or movement of air, at a height of 10 m above the surface of the Earth. The units of this parameter are meters per second.
2 m temperature (°C): The temperature of air 2 m above the surface of land, sea or inland waters. The 2 m temperature is calculated by interpolating between the lowest model level and the Earth's surface, taking into account the atmospheric conditions. This parameter has units of kelvin (K). The temperature measured in kelvin can be converted to degrees Celsius (°C) by subtracting 273.15.
Mean sea level pressure (Pa): Pressure (force per unit area) of the atmosphere on the surface of the Earth, adjusted to the height of mean sea level. It is a measure of the weight that all of the air in a column vertically above a point on the Earth's surface would have if the point were located at mean sea level. It is calculated over all surfaces: land, sea, and inland water. The contours of the mean sea level pressure also indicate the strength of the wind; tightly packed contours show stronger winds. The units of this parameter are pascals (Pa).
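The kelvin-to-Celsius conversion described for the 2 m temperature can be written as a short helper. This is a minimal sketch; the function name `kelvin_to_celsius` is hypothetical and not part of any Climate Data Store tooling.

```python
KELVIN_OFFSET = 273.15  # offset between the kelvin and Celsius scales


def kelvin_to_celsius(temp_k):
    """Convert a temperature from kelvin to degrees Celsius."""
    if temp_k < 0:
        raise ValueError("temperature in kelvin cannot be negative")
    return temp_k - KELVIN_OFFSET
```

For example, a monthly mean 2 m temperature of 293.15 K corresponds to approximately 20 °C.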

Table 9
Conditional inference forest performance statistics for PM 10 AQ monitoring sites in European suburban landscapes during the cooling period

Table 10
Conditional inference forest performance statistics for PM 10 AQ monitoring sites in European suburban landscapes during the heating period