High-Resolution Digital Soil Maps of Forest Soil Nitrogen across South Korea Using Three Machine Learning Algorithms

: Reliable estimation of the forest soil nitrogen spatial distribution is necessary for effective forest ecosystem management. This study aimed to develop high-resolution digital soil maps of forest soil nitrogen across South Korea using three powerful machine learning methods to better understand the spatial variations of forest soil nitrogen and its environmental drivers. To achieve this, the study used national-level forest soil nitrogen data and environmental data to construct various geographic and environmental variables including geological, topographic, and vegetation factors for digital soil mapping. The results show that of the machine learning methods, the random forest model had the best performance at predicting total soil nitrogen in the A and B horizons, closely followed by the extreme gradient-boosting model. The most critical predictors were found to be geographic variables, quantitatively conﬁrming the signiﬁcant role of spatial autocorrelation in predicting soil nitrogen. The digital soil maps revealed that areas with high elevation, concave slopes, and deciduous forests had high nitrogen contents. This ﬁnding highlights the potential usefulness of digital soil maps in supporting forest management decision-making and identifying the environmental drivers of forest soil nitrogen distribution.


Introduction
Nitrogen is one of the most important nutrients for the primary productivity of forest ecosystems [1]. During plant growth, nitrogen is required in large amounts, and when nitrogen availability becomes limited, it can affect the growth and productivity of plants, leading to reduced forest biomass and changes in species composition. This, in turn, can have cascading effects on forest-dependent organisms. Thus, nitrogen limitation can affect forest productivity, biodiversity, and community composition [2,3]. Soils in forest ecosystems contain large quantities of nitrogen [4], in a shallow layer supporting uncountable flora. Therefore, understanding the dynamics of soil nitrogen in forest ecosystems is crucial in maintaining forest health and productivity.
Reliable estimation of the forest soil nitrogen spatial distribution and site characteristics such as soil nitrogen availability is essential for effective forest ecosystem management [5]. Most forests in South Korea are located in mountainous areas, which present a heterogeneous environment [6], where the spatial variability of the forest soil is relatively high. Information on the spatial distribution of forest soils can be obtained from existing forest soil maps. However, polygon-based soil maps cannot describe the continuous local variations of forest soil properties. Additionally, updating forest soil maps based on past field works is expensive, and time-consuming, requiring many soil mappers.
A convenient and low-cost "predictive soil mapping" or "digital soil mapping (DSM)" methodology, i.e., the development of a numerical or statistical model of the relationship between soil properties and environmental predictors, is desirable. A key advantage of DSM is that the characteristics of soils at "unknown" locations can be predicted quantitatively onto a digital soil map, albeit with uncertainty, using the relationship between soil and environmental predictors at "known" locations under similar environmental conditions. Digital soil mapping has been introduced to estimate the distribution of soil properties across local, regional, and large-scale areas. This advanced tool could allow forest ecologists to understand spatial forest soil variations with greater insight.
Natural ecological processes influence the spatial distribution of forest soil nitrogen. The relationship between the nitrogen content of forest soils and the surrounding environmental variables is complex and dynamic, with multiple factors influencing its availability and its cycle. One important environmental variable that can affect soil nitrogen content is climate. A study by Giardina et al. [7] found that nitrogen availability in tropical forest soils was strongly influenced by temperature and precipitation, with warmer and wetter conditions leading to higher rates of nitrogen mineralization and uptake. Vegetation also affects forest soil nitrogen contents. Nave et al. [8] studied the changes to nitrogen content that occur during forest succession. In addition to climate and vegetation, other environmental variables, such as land use history, topography, and plant community composition, can also impact soil nitrogen contents. Zhu et al. [9] found that land use changes from forest to agricultural or forest to urban areas resulted in significant decreases in soil nitrogen contents. Organic matter, in particular, is directly related to forest soil nitrogen content.
In DSM, a number of environmental variables have been employed to map soil properties in forest ecosystems. Vegetation indices derived from satellite data have been used to assess spatial patterns of forest soil nitrogen. The spatial distribution of nitrogen in forest soils is also related to the topographical position, where soil nitrogen content tends to increase downslope [10]. Indeed, a lot of research has tried to explain the spatial variability of soil nitrogen using topographical variables. Taken together, it is clear that in order to understand the distribution of nitrogen in forest soils, it is crucial to consider a variety of environmental factors, such as geology, topography, and vegetation. Examining the importance of these factors can give deeper insights into the intricate workings of the nitrogen cycle in the forest ecosystem.
A variety of DSM techniques have been used to predict soil nitrogen distribution. Machine learning methods, such as support vector regression [11], random forest [12], and boosted regression trees [13], have shown especially good performance in forest areas with complex environments. Despite this, we still lack the knowledge to select the best models for predicting forest soil nutrients. In particular, there have been few studies using deep learning techniques for predicting soil nutrients in forest ecosystems. Therefore, evaluating the potential of deep learning and comparing it with other machine learning techniques would be worthwhile.
Many studies have used regression kriging as a DSM tool to account for soil spatial autocorrelation [14]. However, a novel approach has emerged that integrates the spatial autocorrelation of soil into the prediction model. This method utilizes geographic explanatory variables, producing results similar to those obtained through the regression kriging [15,16]. Geographic variables indicate the proximity or connectivity between the soil samples being predicted and can be employed as explanatory variables. With the incorporation of these geographic variables, spatial autocorrelation between soil samples can be taken into account, even when non-spatial prediction techniques such as machine learning are used, possibly enhancing the accuracy of soil nitrogen prediction models.
Access to comprehensive soil information at a national scale could be highly beneficial in supporting decision-making regarding forest management. Despite the need for such information, there have been few national digital soil maps created specifically for forest soil nitrogen [11]. Previous studies on forest soil nitrogen utilizing DSM methods were conducted mainly on the local and regional scales, with spatial ranges of approximately 2-3000 km 2 [5,[17][18][19]. To address this gap, this study aims to develop high-resolution digital soil maps of forest soil nitrogen at a national level in South Korea with a view to revealing environmental drivers that contribute to the distribution of forest soil nitrogen. The primary objectives were to (1) predict forest soil nitrogen contents in the A and B horizons using three powerful machine learning methods, including Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Deep Neural Networks (DNNs); (2) explore the significant role played by environmental and geographic variables in determining variations in total soil nitrogen; and (3) describe and interpret spatial variations of forest soil nitrogen in relation to important environmental variables.

Environmental Information of Research Area
South Korea, located on the southern half of the Korean Peninsula, experiences a climate influenced by both the Asian monsoon and continental climate, with cooler temperatures in the northern inland areas and milder conditions in the southern and coastal areas. Precipitation is typically above 1000 mm annually in most regions and rarely less than 750 mm. The Korean Peninsula is characterized by mountains and hills, with the Baekdudaegan mountain range traditionally regarded as the backbone of the peninsula ( Figure 1). This range starts from Mt. Baekdu and ends at Mt. Jiri and is an important ecological axis and a key forest area on the Korean Peninsula. South Korea has complex and diverse geology, with metamorphic rocks concentrated in the northwest region, sedimentary rocks mainly found in the southeast region, and limestone distributed between these two regions. Igneous rocks, particularly granite, are scattered across the country, while volcanic rocks are found in Jeju-do and northwest Gangwon-do. The distribution of limestone and basalt geology is significant for soil formation in South Korea. Forests cover 64.2% of the Korean Peninsula, with the forest types mainly composed of coniferous forests (36.9%, 2020), broad-leaved forests (31.8%), and mixed forests (26.45%). Over time, coniferous forests have decreased while broad-leaved forests have increased, and mixed forests have remained stable. Access to comprehensive soil information at a national scale could be highly beneficial in supporting decision-making regarding forest management. Despite the need for such information, there have been few national digital soil maps created specifically for forest soil nitrogen [11]. Previous studies on forest soil nitrogen utilizing DSM methods were conducted mainly on the local and regional scales, with spatial ranges of approximately 2-3000 km 2 [5,[17][18][19]. To address this gap, this study aims to develop high-resolution digital soil maps of forest soil nitrogen at a national level in South Korea with a view to revealing environmental drivers that contribute to the distribution of forest soil nitrogen. The primary objectives were to (1) predict forest soil nitrogen contents in the A and B horizons using three powerful machine learning methods, including Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Deep Neural Networks (DNNs); (2) explore the significant role played by environmental and geographic variables in determining variations in total soil nitrogen; and (3) describe and interpret spatial variations of forest soil nitrogen in relation to important environmental variables.

Environmental Information of Research Area
South Korea, located on the southern half of the Korean Peninsula, experiences a climate influenced by both the Asian monsoon and continental climate, with cooler temperatures in the northern inland areas and milder conditions in the southern and coastal areas. Precipitation is typically above 1000 mm annually in most regions and rarely less than 750 mm. The Korean Peninsula is characterized by mountains and hills, with the Baekdudaegan mountain range traditionally regarded as the backbone of the peninsula ( Figure 1). This range starts from Mt. Baekdu and ends at Mt. Jiri and is an important ecological axis and a key forest area on the Korean Peninsula. South Korea has complex and diverse geology, with metamorphic rocks concentrated in the northwest region, sedimentary rocks mainly found in the southeast region, and limestone distributed between these two regions. Igneous rocks, particularly granite, are scattered across the country, while volcanic rocks are found in Jeju-do and northwest Gangwon-do. The distribution of limestone and basalt geology is significant for soil formation in South Korea. Forests cover 64.2% of the Korean Peninsula, with the forest types mainly composed of coniferous forests (36.9%, 2020), broad-leaved forests (31.8%), and mixed forests (26.45%). Over time, coniferous forests have decreased while broad-leaved forests have increased, and mixed forests have remained stable.  South Korea's forest soil classification system is unique and incompatible with international standards. The most common types of forest soils in South Korea are brown forest soils, dark red forest soils, and volcanic ash forest soils. Brown forest soils are formed in the general geology of South Korea, such as gneiss and granite. These soils can be classified into Cambisols (mountain tops or ridges), Umbrisols (backslopes), and Acrisols (low-altitude forests) in the FAO/WRB system [20]. Dark red forest soils are formed in limestone geological environments, typically classified as Luvisol [20]. Volcanic ash forest soils are formed in volcanic rocks such as basalt, which is similar to Andosol in the international soil class [20]. However, many of these soils could be classified as immature soil, such as Inceptisols or Entisols in the USDA soil taxonomy, due to South Korea's rapid afforestation [21]. Consequently, converting South Korea's forest soil classification system to the international one is challenging, requiring further research to clarify the compatibility of South Korea's forest soil classification. Based on the data obtained from the forest soil survey (refer to Section 2.2.1), the mean depth of the A horizon in South Korea's forest soil is 8.31 cm, with a corresponding standard deviation of 7.48 cm. The mean depth and standard deviation of the B horizon in the forest soil are 44.79 cm and 18.45 cm, respectively.

Materials for Digital Soil Mapping
In order to conduct this study's DSM, spatial data on forest soil nitrogen was obtained by analyzing soil samples and taking soil environmental and geographic variables into account.

Geospatial Data on Forest Soil Nitrogen
The total nitrogen contents in forest soil data were generated by combining the soil sample analysis data from the 1:5000 national forest soil map and the 7th National Forest Inventory and Forest Health Monitoring (NFI7·FHM) survey.
The national forest soil map, which provides information on the total nitrogen content in the A and B horizons, includes data from field surveys and experiments conducted from 2009 to 2021. The soil sample analysis data used in this study were composed of 5457 samples from the A horizon and 6072 samples from the B horizon ( Figure 2). South Korea's forest soil classification system is unique and incompatible with international standards. The most common types of forest soils in South Korea are brown forest soils, dark red forest soils, and volcanic ash forest soils. Brown forest soils are formed in the general geology of South Korea, such as gneiss and granite. These soils can be classified into Cambisols (mountain tops or ridges), Umbrisols (backslopes), and Acrisols (lowaltitude forests) in the FAO/WRB system [20]. Dark red forest soils are formed in limestone geological environments, typically classified as Luvisol [20]. Volcanic ash forest soils are formed in volcanic rocks such as basalt, which is similar to Andosol in the international soil class [20]. However, many of these soils could be classified as immature soil, such as Inceptisols or Entisols in the USDA soil taxonomy, due to South Korea's rapid afforestation [21]. Consequently, converting South Korea's forest soil classification system to the international one is challenging, requiring further research to clarify the compatibility of South Korea's forest soil classification. Based on the data obtained from the forest soil survey (refer to Section 2.2.1), the mean depth of the A horizon in South Korea's forest soil is 8.31 cm, with a corresponding standard deviation of 7.48 cm. The mean depth and standard deviation of the B horizon in the forest soil are 44.79 cm and 18.45 cm, respectively.

Materials for Digital Soil Mapping
In order to conduct this study's DSM, spatial data on forest soil nitrogen was obtained by analyzing soil samples and taking soil environmental and geographic variables into account.

Geospatial Data on Forest Soil Nitrogen
The total nitrogen contents in forest soil data were generated by combining the soil sample analysis data from the 1:5000 national forest soil map and the 7th National Forest Inventory and Forest Health Monitoring (NFI7‧FHM) survey.
The national forest soil map, which provides information on the total nitrogen content in the A and B horizons, includes data from field surveys and experiments conducted from 2009 to 2021. The soil sample analysis data used in this study were composed of 5457 samples from the A horizon and 6072 samples from the B horizon ( Figure 2).  The NFI7·FHM data are a survey conducted on a nationwide grid of sample points, evenly distributed throughout the country, to assess changes in forest resources in the health of forest ecosystems. Details of this data can be found in the work of Lee et al. [22]. The NFI7·FHM soil properties data from 2016 to 2020 include soil texture (sand, silt, and clay content), organic carbon, acidity (pH), total nitrogen, and available phosphorus. To determine the soil nitrogen content, soil samples were collected and analyzed at 10 cm depth intervals between 0 and 30 cm. In this study, the nitrogen content from the 0-10 cm depth was used as the A horizon data. A total of 968 samples from the NFI7·FHM data were used. Figure 2 illustrates the spatial distribution of total nitrogen contents in the forest soil for both the A and B horizons. The soil data in both horizons were somewhat spatially biased, but the A horizon data were relatively more uniform as it integrated NFI data collected from standard, evenly distributed points. It was observed that the areas with higher elevations and those located in the northeastern regions of South Korea tended to have higher soil nitrogen contents in both horizons.

Environmental and Geographic Variables
In DSM, it is crucial to identify and utilize the environmental correlations with soil properties. To accomplish this, the environmental and geographic variables suitable for South Korean forests were identified and constructed. The soil varies spatially and temporally depending on soil formation factors, such as climate, parent material, relief (topography), organisms, and time, so a variety of environmental variables are necessary.

• Parent material variables
Parent material data were obtained from the digital geological map (1:50,000) provided by the Korea Institute of Geoscience and Mineral Resources and converted into variables. The rock types were classified into 16 lithological units (Table 1). Since parent material is a categorical variable, it was transformed into dummy variables to be incorporated into the model. • Topographic variables The topographic variables used in this study were derived from a digital elevation model (DEM) produced by the National Geographic Information Institute, which originally had a resolution of 10 m. However, the model was resampled to a resolution of 30 m to reduce errors that may have been introduced during the DEM construction process, as recommended by previous studies [23]. From this, the topographic variables (Table 2) were constructed using SAGA GIS [24]. The hillslope position variables were obtained using a hillslope position classification technique based on the concept of catena proposed by Shim and Park [25]. This method classifies the hillslope position based on the upslope  as recommended by previous studies [23]. From this, the topographic variables (Table 2) were constructed using SAGA GIS [24]. The hillslope position variables were obtained using a hillslope position classification technique based on the concept of catena proposed by Shim and Park [25]. This method classifies the hillslope position based on the upslope contributing area, which reflects the soil formation process and the movement of influencing substances (Figure 3).  The vegetation variables used in this study were obtained from the Korean Forest Service's forest map (1:5000). The forest map includes information on forest type, age class, diameter class, and crown density. In this study, these data were transformed into dummy variables for use in the analysis (Table 3). •

Geographic variables
To account for spatial autocorrelation in the forest soil information, geographic variables were used as per Behrens et al. [15]. The coordinates (X, Y) of the forest soil information and the Euclidean distances from five reference points-center, upper left, lower

• Vegetation variables
The vegetation variables used in this study were obtained from the Korean Forest Service's forest map (1:5000). The forest map includes information on forest type, age class, diameter class, and crown density. In this study, these data were transformed into dummy variables for use in the analysis (Table 3).

• Geographic variables
To account for spatial autocorrelation in the forest soil information, geographic variables were used as per Behrens et al. [15]. The coordinates (X, Y) of the forest soil information and the Euclidean distances from five reference points-center, upper left, lower left, upper right, and lower right-were calculated and incorporated into the predictive model (Table 3).

Digital Soil Mapping Methods
The advancement of artificial intelligence has led to the widespread use of machine learning techniques such as support vector regression and back propagation neural network to predict soil properties. In this study, RF, XGBoost, and DNN techniques were utilized to predict the distribution of forest soil nitrogen.

Random Forest
Tree-based algorithms are commonly used in the DSM literature because they consist of nodes and leaves that partition the training dataset to maximize within-node homogeneity and between-node heterogeneity. The RF algorithm is based on decision trees proposed by Breiman [26] that combines multiple decision trees (learners) with different structures and performance to create an ensemble model. The RF method can represent the complex relationship between environmental factors and soil properties well and is widely used in the field of DSM [27].
The "caret" and "ranger" packages in R were used for RF model tuning. For RF modeling, there are three parameters, including the number of regression trees (ntree), minimum size of terminal nodes (min_node_size), and the number of randomly selected predictors (mtry). A grid search was used to evaluate the combinations of ntree (500), min_node_size (5), and mtry (21, 33, and 64) through the "caret" package.

Extreme Gradient Boosting
The XGBoost method is based on a tree-based algorithm used for regression and classification predictions. It utilizes a gradient-boosting framework to create a strong learner from multiple weak learners and employs many decision trees to make predictions. What sets XGBoost apart is the way it constructs new decision trees based on the prediction errors of the previous tree model, with the aim of minimizing the prediction error of the final prediction. This results in a final prediction that is an ensemble of multiple decision trees. The XGBoost algorithm is highly effective and widely used due to its ability to work with a weak set of predictions as an ensemble model [28,29].
The XGBoost method allows for model utilization tailored to the characteristics of the data and the purpose of the model through parameter tuning. The parameters can be broadly divided into three categories: General parameters can set the overall functionality of the model, including the type of booster and the number of threads used for parallel processing. Booster parameters control the settings of the tree boosters used in the model, allowing for control of the learning rate, tree depth, number of nodes per tree, and other aspects of the tree's structure. Learning task parameters can be set to optimize the model's performance, including the objective function and various settings depending on the data structure and the desired output variable to be extracted.

Deep Neural Networks
Deep Neural Networks were developed based on shallow neural networks, such as the Multilayer Perceptron [30]. Shallow neural networks typically only have one hidden layer. In contrast, DNN models often utilize multiple hidden layers between the input and output layers, giving them the ability to learn more complex non-linear relationships. This makes DNNs highly effective in handling complicated data and patterns. Some DSM studies have shown more accurate prediction results using DNNs than when using the RF method, a representative machine learning model, and DNNs show high potential for soil prediction [31][32][33].
This study constructed 18 models by training with a DNN architecture consisting of 3 hidden layers between the input and output layers. Among the model parameters, the number of units in the first and third hidden layers (3 options-32, 64, and 96) and the number of units in the second hidden layer (3 options-96, 128, and 256) were varied along with dropout (2 options-0.1 and 0.3) during training. The best model was selected from these 18 models. The rectified linear unit (ReLU) was used as the activation function to consider non-linearity. The batch size, which is the small sample size in which the entire data are divided to create small pieces for processing in the DNN model, was set to 128, and the number of iterations performed by the batch size, called the epoch, was set to 300. The RMSProp algorithm was used as the optimizer for model optimization, and the learning rate was set to the default value of 0.001. The "keras" and "tftuns" packages in R were used for DNN model analysis and training.

Model Validation and Uncertainty
To evaluate and train the models, 70% of the soil samples was used as training data and 30% as validation data. In the training data, three repetitions and 2-fold cross-validation were performed for the model tuning. Independent (external) validation was conducted using the 30% validation data. The RMSE and R 2 statistics were compared to assess model performance. To quantify the spatial prediction uncertainty, the soil training data were randomly split into two datasets using the resampling method. The first dataset, which accounted for 70% of the training data, was used to calibrate the model parameters by 3 repeated 2-fold cross-validation. This process was repeated 10 times, generating 10 calibrated models. The standard deviation of the cell-wise predictions from the 10 models can be used to estimate the spatial uncertainty.

Description of Soil Nitrogen Data
The majority of the soil nitrogen data in the A and B horizons exhibited values within a range of 0%-1% (Figure 4), with only a few instances where they extended up to 4.8% (A horizon) or 2.6% (B horizon). The mean total nitrogen content of the A horizon was 0.22%, while that of the B horizon was 0.09%. The standard deviation was either similar to the mean for the A horizon or greater than the mean for the B horizon. Consequently, the coefficient of variation of the A horizon was 80.21%, while that of the B horizon was 125.88%. This indicated that the total nitrogen content values in mountainous soils on the Korean Peninsula exhibit high variance. Moreover, the results of the t-test indicate that there was a statistically significant difference in the distribution and variability of soil nitrogen between the A horizon and the B horizon ( Figure 5).

Description of Soil Nitrogen Data
The majority of the soil nitrogen data in the A and B horizons exhibited values within a range of 0%-1% (Figure 4), with only a few instances where they extended up to 4.8% (A horizon) or 2.6% (B horizon). The mean total nitrogen content of the A horizon was 0.22%, while that of the B horizon was 0.09%. The standard deviation was either similar to the mean for the A horizon or greater than the mean for the B horizon. Consequently, the coefficient of variation of the A horizon was 80.21%, while that of the B horizon was 125.88%. This indicated that the total nitrogen content values in mountainous soils on the Korean Peninsula exhibit high variance. Moreover, the results of the t-test indicate that there was a statistically significant difference in the distribution and variability of soil nitrogen between the A horizon and the B horizon ( Figure 5).    Table 4 displays the results of the accuracy evaluation of three machine learning methodologies used to predict the forest soil nitrogen distributions. The RMSE value for the A horizon ranged from 0.11 to 0.12, and the R 2 value was between 0.13 and 0.27. The RF algorithm demonstrated the best performance, followed by XGBoost, while DNN produced relatively poor results. It is worth noting that the RF and XGBoost methods produced similar results. Concerning the B horizon, the RMSE was between 0.054 and 0.057, and the R 2 value ranged from 0.37 to 0.45. The RF algorithm outperformed the others, with results comparable to those of the A horizon. Thus, there was little discernible difference between the performance of the RF and XGBoost methods, but the DNN model produced lower accuracy for both A and B horizon data.  Table 4 displays the results of the accuracy evaluation of three machine learning methodologies used to predict the forest soil nitrogen distributions. The RMSE value for the A horizon ranged from 0.11 to 0.12, and the R 2 value was between 0.13 and 0.27. The RF algorithm demonstrated the best performance, followed by XGBoost, while DNN produced relatively poor results. It is worth noting that the RF and XGBoost methods produced similar results. Concerning the B horizon, the RMSE was between 0.054 and 0.057, and the R 2 value ranged from 0.37 to 0.45. The RF algorithm outperformed the others, with results comparable to those of the A horizon. Thus, there was little discernible difference between the performance of the RF and XGBoost methods, but the DNN model produced lower accuracy for both A and B horizon data. As the RF method demonstrated superior performance in predicting the soil nitrogen contents of both horizons, a variable importance analysis was conducted for each horizon to identify the key variables used in the RF algorithm (Figures 6 and 7). For the A horizon, the most important predictors were found to be geographic variables, such as coords.x2, DISTC4, and DISTC1. Elevation (ELEV) showed relatively high variable importance, followed by terrain openness (POPEN) and curvature (TPI05). Furthermore, the nitrogen content in the A horizon was affected by factors such as andesite as the parent material (GTYPE4), the broad-leaved forest type (FTYPE2), and the toeslope topographical position (CATENA5). As the RF method demonstrated superior performance in predicting the soil nitrogen contents of both horizons, a variable importance analysis was conducted for each horizon to identify the key variables used in the RF algorithm (Figures 6 and 7). For the A horizon, the most important predictors were found to be geographic variables, such as coords.x2, DISTC4, and DISTC1. Elevation (ELEV) showed relatively high variable importance, followed by terrain openness (POPEN) and curvature (TPI05). Furthermore, the nitrogen content in the A horizon was affected by factors such as andesite as the parent material (GTYPE4), the broad-leaved forest type (FTYPE2), and the toeslope topographical position (CATENA5).

Selecting the Optimal Prediction Machine Learning Methods
Similarly, the variable importance analysis revealed that geographic variables, such as coords.x2, coords.x1, and DISTC2, played a crucial role in predicting the soil nitrogen content in the B horizon ( Figure 6). In addition, topographic variables such as terrain openness (POPEN), elevation (ELEV), and slope (SLOPE) were found to be critical predictors. Like the A horizon, the nitrogen content in the B horizon was associated with the broadleaved forest type (FTYPE2), but the coniferous forest type (FTYPE1) was also an important predictor. However, unlike the A horizon, the presence or absence of granite (GTYPE1) as the parent material was a key predictor of the nitrogen content in the B horizon. Figure 6. Importance of the top 20 variables in predicting the total soil nitrogen content of the A horizon using the random forest algorithm. Similarly, the variable importance analysis revealed that geographic variables, such as coords.x2, coords.x1, and DISTC2, played a crucial role in predicting the soil nitrogen content in the B horizon ( Figure 6). In addition, topographic variables such as terrain openness (POPEN), elevation (ELEV), and slope (SLOPE) were found to be critical predictors. Like the A horizon, the nitrogen content in the B horizon was associated with the broadleaved forest type (FTYPE2), but the coniferous forest type (FTYPE1) was also an important predictor. However, unlike the A horizon, the presence or absence of granite (GTYPE1) as the parent material was a key predictor of the nitrogen content in the B horizon.  Figure 8 displays the digital maps of total forest soil nitrogen for the A and B horizons, respectively, for South Korea. The nitrogen content in the A horizon exhibited high levels in mountainous areas, including the Baekdu Mountain Range and Yeongnam Alps in the eastern part of the Korean peninsula, the Jirisan Mountains in the southern part, and the Hallasan Mountains on Jeju Island. Additionally, areas, such as Inje-gun, Hongcheon-gun, Pyeongchang-gun, Yeongwol-gun, Taebaek-si, Bonghwa-gun, Cheongdo-gun, Gyeongju-si, MilYang-si, Yangsan-si, Ulju-gun, and Jeju Island, had high nitrogen contents in the A horizon. Elevation and convex slopes were also correlated with high nitrogen contents, with some exceptions such as in the Yeongnam Alps, Jirisan Mountains, and Jeju Island, where high nitrogen content was observed throughout the slope. The distribution of soil nitrogen content in the A horizon was generally greater in areas with andesite and high-elevation deciduous forests.

Digital Soil Maps for Total Nitrogen in Forest Soils
Similarly, the distribution of soil nitrogen content in the B horizon showed high nitrogen contents in mountainous areas but also was associated with areas, where granite parent material is present. Notably, there were high soil nitrogen contents in the southern part of Gangwon Province and the northern parts of Chungcheongbuk-do and Gyeongsangbuk-do, particularly along their boundary. This area included Pyeongchang-gun, Jeongseon-gun, Yeongwol-gun, Danyang-gun, Jecheon-si, Chungju-si, Goseong-gun, and Mungyeong-si. Jeju Island also had high nitrogen content in its B horizon soils. The variable importance analysis suggested that deciduous and coniferous forests have contributed significantly to the distribution of soil nitrogen contents in the B horizon, reflecting microspatial differences.
In Figure S3, the standard deviation of nitrogen contents in the A and B horizons is presented. It was observed that nitrogen content showed a higher standard deviation along the high ridges, the Baekdudaegan mountain range. The higher uncertainty for nitrogen was found in areas with higher nitrogen content values due to the soil data Figure 7. Importance of the top 20 variables in predicting the total soil nitrogen content of the B horizon using the random forest algorithm. Figure 8 displays the digital maps of total forest soil nitrogen for the A and B horizons, respectively, for South Korea. The nitrogen content in the A horizon exhibited high levels in mountainous areas, including the Baekdu Mountain Range and Yeongnam Alps in the eastern part of the Korean peninsula, the Jirisan Mountains in the southern part, and the Hallasan Mountains on Jeju Island. Additionally, areas, such as Inje-gun, Hongcheon-gun, Pyeongchang-gun, Yeongwol-gun, Taebaek-si, Bonghwa-gun, Cheongdo-gun, Gyeongju-si, MilYang-si, Yangsan-si, Ulju-gun, and Jeju Island, had high nitrogen contents in the A horizon. Elevation and convex slopes were also correlated with high nitrogen contents, with some exceptions such as in the Yeongnam Alps, Jirisan Mountains, and Jeju Island, where high nitrogen content was observed throughout the slope. The distribution of soil nitrogen content in the A horizon was generally greater in areas with andesite and high-elevation deciduous forests.

Digital Soil Maps for Total Nitrogen in Forest Soils
Similarly, the distribution of soil nitrogen content in the B horizon showed high nitrogen contents in mountainous areas but also was associated with areas, where granite parent material is present. Notably, there were high soil nitrogen contents in the southern part of Gangwon Province and the northern parts of Chungcheongbuk-do and Gyeongsangbuk-do, particularly along their boundary. This area included Pyeongchang-gun, Jeongseon-gun, Yeongwol-gun, Danyang-gun, Jecheon-si, Chungju-si, Goseong-gun, and Mungyeong-si. Jeju Island also had high nitrogen content in its B horizon soils. The variable importance analysis suggested that deciduous and coniferous forests have contributed significantly to the distribution of soil nitrogen contents in the B horizon, reflecting microspatial differences.
In Figure S3, the standard deviation of nitrogen contents in the A and B horizons is presented. It was observed that nitrogen content showed a higher standard deviation along the high ridges, the Baekdudaegan mountain range. The higher uncertainty for nitrogen was found in areas with higher nitrogen content values due to the soil data structure. When the soil data include a small proportion of extreme values, it becomes skewed, resulting in data sparsity and increased spatial uncertainty. Uncertainty can arise from data deficiencies, such as missing covariates and small or biased samples [34]. structure. When the soil data include a small proportion of extreme values, it becomes skewed, resulting in data sparsity and increased spatial uncertainty. Uncertainty can arise from data deficiencies, such as missing covariates and small or biased samples [34].  Table 5 presents the total nitrogen content (%) in soil data from 2009 to 2021, along with the corresponding data on the nitrogen content of South Korean forest soils from the 1984 to 1990 period. By comparing the current research data with the previously gathered data, we can validate the reliability of the research findings and observe changes in the nitrogen content of forest soil on the Korean Peninsula over the past two decades. For the A horizon, the average total nitrogen content of the research data was slightly higher than that of previous data, and the variability was also marginally greater. On the other hand, for the B horizon, the average nitrogen content of the research data was the same as the previous data, but the variability was relatively higher. This result demonstrated the consistency between past surveys and current research data. Moreover, the findings suggested that the total nitrogen content of the South Korean forest soils' surface (A) horizon has slightly increased over the past 20 years, indicating an improvement in ecological forest health. However, due to the high variance, this trend represents a relatively significant difference from period to period, necessitating more precise verification. The intensification of agricultural practices and the growth of large industrial operations have resulted in a significant increase in atmospheric nitrogen inputs in South Korea. Specifically, between 2005 and 2010, the annual average wet input of nitrogen ranged from 12.9 to 24.9 kg/ha-year, which is substantially higher than pre-industrial levels [36]. While this trend is decreasing recently [37], it is possible that the input of atmospheric nitrogen enrichment during this period has led to an increase in soil nitrogen in forest ecosystems. Further research is necessary to obtain a better understanding of the changes in soil nitrogen.   Table 5 presents the total nitrogen content (%) in soil data from 2009 to 2021, along with the corresponding data on the nitrogen content of South Korean forest soils from the 1984 to 1990 period. By comparing the current research data with the previously gathered data, we can validate the reliability of the research findings and observe changes in the nitrogen content of forest soil on the Korean Peninsula over the past two decades. For the A horizon, the average total nitrogen content of the research data was slightly higher than that of previous data, and the variability was also marginally greater. On the other hand, for the B horizon, the average nitrogen content of the research data was the same as the previous data, but the variability was relatively higher. This result demonstrated the consistency between past surveys and current research data. Moreover, the findings suggested that the total nitrogen content of the South Korean forest soils' surface (A) horizon has slightly increased over the past 20 years, indicating an improvement in ecological forest health. However, due to the high variance, this trend represents a relatively significant difference from period to period, necessitating more precise verification. The intensification of agricultural practices and the growth of large industrial operations have resulted in a significant increase in atmospheric nitrogen inputs in South Korea. Specifically, between 2005 and 2010, the annual average wet input of nitrogen ranged from 12.9 to 24.9 kg/hayear, which is substantially higher than pre-industrial levels [36]. While this trend is decreasing recently [37], it is possible that the input of atmospheric nitrogen enrichment during this period has led to an increase in soil nitrogen in forest ecosystems. Further research is necessary to obtain a better understanding of the changes in soil nitrogen.

Discussion
The RF method showed the best performance considering its RMSE and R 2 value, while the DNN method showed rather worse results, based on its RMSE. The RF and XGBoost methods were better models than the DNN in our results. Joharestani et al. [38] employed RF, XGBoost, and DNN algorithms to predict PM2.5 concentration, with XGBoost exhibiting the best performance and a reasonable training time. However, they found that all three machine learning techniques had similar performances. A study conducted by Kampichler et al. [39] compared five machine learning techniques based on modeling effort, modeling performance, method intricacy, and classifier comprehensibility. The authors suggested using RF due to its strong modeling performance and the simplicity of constructing and interpreting models. Our findings also supported the use of RF as a reliable method for predicting forest soil nitrogen, as it produced accurate model results. In terms of ease of use, the RF model can outperform the XGBoost due to a need for fewer parameter adjustments.
The results assigned high importance to several variables for predicting forest soil nitrogen in the A horizon (in descending order): geographic variables (coords.x2, DISTC4, and DISTC1), elevation (ELEV), topographic openness (POPEN), and curvature (TPI05). Thus, the importance of spatial autocorrelation in forest soil nitrogen prediction was quantitatively confirmed by including the geographic variables in our models. The universal model of soil variation proposes that complex spatial variation is composed of three primary components: (i) the deterministic component (a function of environmental variables), (ii) the spatially correlated component (a function of geographic variables), and (iii) pure noise (measurement error and short-range spatial variation) [40]. However, most machine learning algorithms take a non-spatial approach, ignoring the spatial locations of observations and thus failing to account for any spatially correlated components not captured by the environmental variables. This could result in biased predictions, even leading to systematic over-or under-prediction, particularly when the target variable exhibits a high spatial autocorrelation [16]. This study confirms that incorporating geographic variables into forest soil prediction models can potentially enhance their accuracy. Geographic variables could also be convenient for users because they can be directly used in machine learning models [41].
Higher values of soil nitrogen content were found at higher altitudes in this study. Jeong et al. [6] also found that forest soil nitrogen in the A horizon was correlated with elevation. Additionally, positive relationships between soil nitrogen and elevation were reported by Kunkel et al. [18], Peng et al. [19], and Wang et al. [42]. Elevation might be an indicator of soil temperatures and decomposition rates that result in nitrogen accumulations [43]. The high-altitude areas of South Korea are mostly forest lands partly designated as natural reserves, national parks, or military zones. In these areas, high vegetation density might indicate low levels of disturbance.
This study found a strong spatial autocorrelation in the spatial distribution of soil nitrogen contents in both horizons, and terrain openness and curvature variables were important predictors in both horizons too. Specifically, concave areas were predicted to have high soil nitrogen contents. Slope configuration affects the lateral movement of water and nutrients, which results in high spatial heterogeneity of soil [44,45]. In the research areas composed of deciduous forests, the nitrogen contents were generally higher than in other areas. A number of studies have found significant relationships between soil nitrogen and the normalized difference vegetation index, which are measures of aboveground biomass not used in this study. Areas dominated by forests with higher vegetation density and deciduous trees might have higher soil productivity and higher nitrogen fixation [46].

Conclusions
Forest ecosystems rely heavily on the availability of nitrogen in the soil, and having reliable estimates of its spatial distribution can be crucial for effective forest ecosystem management. Digital soil mapping technology provides a convenient and cost-effective method for predicting forest soil nutrients. However, there have been few instances of national digital soil maps developed specifically for forest soil nitrogen. Therefore, this study aimed to develop high-resolution digital soil maps of forest soil nitrogen across South Korea to better comprehend the spatial variation of forest soil nitrogen and its environmental drivers. To achieve this, we constructed more than 20 independent variables by examining the effects of environmental and geographic factors that determine changes in the total soil nitrogen content. We then used three machine learning models (RF, XGBoost, and DNN) to predict the nitrogen content in horizons A and B.
The total soil nitrogen content varied greatly in different regions, with higher levels and lower variability in the A horizon compared to the B horizon. The RF method performed the best in predicting nitrogen content in both horizons, followed by the XGBoost. The prediction of soil nitrogen content in the A horizon was strongly associated with various geographic variables. Our results indicated a strong spatial autocorrelation in the forest soil nitrogen data, and elevation, topographic openness, and curvature (TPI05) were significant predictors. For the B horizon, the key variables were slightly different but followed a similar overall trend. Geographic variables, along with topographic variables, were the most important predictors. Environmental variables related to deciduous and coniferous forest types and granite parent material also played a critical role in the B horizon prediction model.
The study produced digital soil maps of forest soil nitrogen for the A and B horizons, revealing a high soil nitrogen content in the northeastern mountainous areas with high elevations and concave slopes. Moreover, areas with deciduous forests showed high levels of soil nitrogen content in both horizons. The results of this study suggest that soil information based on digital soil maps would be useful in supporting forest management decision-making and could also provide insights into the environmental factors controlling forest soil nitrogen distribution.

Data Availability Statement:
The data presented in this study are available from the corresponding author upon request.

Conflicts of Interest:
We have no conflict of interest to disclose.