Development of the spectral forest index in the Khangai region, Mongolia using Sentinel-2 imagery

Abstract
Mongolian forests have low productivity and growth and are vulnerable to disturbances. Additionally, it is difficult to control and evaluate the forested areas. Therefore, satellite data and surveillance methods are needed to study mountain forests. This study aimed to determine the changes in the main forest cover classes of Khangal soum using remote sensing and geographical information system datasets. A spectral forest index (SFI) using Sentinel-2 imagery was developed for forest cover estimations and applied to the study area during 2015–2020. The SFI was based on the forest index (FI) and the concept of Dark Objects. Each SFI was compared to existing vegetation indices (ratio vegetation index, normalized difference vegetation index, leaf area index, and forest index) for forest data analysis. The highest correlation was with SFI2. The SFI2 data agreed with the national forest inventory (NFI) 2018 data. The SFI2 of the forest area was set at 1.2, which was confirmed with 90.4% confidence. Overall, SFI2 is suitable for land cover/land use changes and forest classification, monitoring, and management in Mongolia and could be crucial for estimating the boundary of forested areas depending on the forest cover and species in the region.


Introduction
Mongolia has relatively low forest cover, with just over 8% of the country covered by closed forests, and the area of exploitable forests is between 5 and 8 million ha (Bank and Alliance 2002). Forests are an essential natural resource that maintains the ecological balance by regulating water regimes and promoting drought conservation (Dulamsuren et al. 2009). Mongolia covers 1,564,116 km² (Erdenejav 2020), of which 11.3 million ha is forest (Altrell 2019). The health of the stocked boreal forests is similar to that of the other natural boreal forests, and 71% of the stocked boreal forests are very healthy, with only 6.8% of the forest being damaged (Altrell 2019). The forest-steppe zone toward the south consists of a mosaic of forest and grassland patches. Due to topographical differences, the vegetation differs between the forests on the north-facing slopes and at higher altitudes and the steppes on the south-facing slopes and in the valleys. The Siberian larch (Larix sibirica Ledeb.) is a dominant tree species in Mongolia (Jamsran 2004), covering 80% of the forested areas (Gunin et al. 1999). Larch forests are indispensable for energy exchange and climate regulation between the land surface and atmosphere (Kelliher et al. 1997).
Furthermore, larches in the northern hemisphere serve as large global carbon stocks (Euskirchen et al. 2006) and are critical natural resources for human welfare (Burton et al. 2003).
Mixed tree forests form the basis for plant research. However, forestry investigation fieldwork demands intensive physical labor, which is expensive and time-consuming, especially for surveys in remote mountainous regions (Tseveen et al. 2020). Therefore, remote sensing (RS) techniques are effective for measuring and monitoring forests on a regional or global scale and in the research and development of environmental policies and plans (Enkhjargal et al. 2014). Satellite-based RS imagery has become an indispensable tool for forest monitoring and numerous other areas of scientific research. In addition to providing information to satisfy the needs of forest managers, RS technology must also be cost-effective and easily understandable (Altangerel and Udval 2019). Currently, numerous researchers are using active and passive RS to monitor forest cover, forest type, forest degradation, and forest fires (Mitchell et al. 2017).
Many methods have been developed or used for detecting land cover and vegetation changes by RS technology. Furthermore, over recent decades, various methods have been proposed for the extraction of forest cover from remotely sensed imagery. Some of these methods are based on supervised classification techniques for generating forest/non-forest classification maps, such as maximum likelihood (Bayarsaikhan et al. 2009), object-based classification, and the spectral index (Weih and Riggan 2010). Vegetation indices (VIs) effectively distinguish vegetation from non-vegetation land covers (such as water and impervious surfaces) but fail to differentiate between forest and non-forest vegetation (Ye et al. 2014). In addition, existing VIs based on Sentinel-2 can easily detect vegetation, but they could not detect forested areas (Nyamjargal et al. 2020).
The forest index (FI) is derived from the green, red, and near-infrared (NIR) bands. An FI image can classify a map as forest or non-forest based on a threshold value. The scatter plot of the spectral space of the normalized difference vegetation index (NDVI) was found to be similar to that of the spectral space between the land surface temperature (LST) and NDVI (Yao et al. 2008). When compared to the latest Landsat Operational Land Imager (OLI)/Thermal Infrared Sensor, Sentinel-2 has a better spatial and spectral resolution in the NIR region, but it does not offer thermal data (Kaplan and Avdan 2017). The LST was obtained from the Landsat-8 OLI to better define the forest coverage. Landsat-derived LST is also used to monitor forest area changes, such as the correlation between LST and tree loss or the detection of changes in forest cover (Parastatidis et al. 2017). The FI threshold can be defined from the LST change.
Spectral indices have significant advantages over the aforementioned methods for land-cover mapping owing to their simplicity for practical applications (Ye et al. 2014). The FI is used in combination with forest taxation inventory data; thus, it can be used to estimate the forest area. In a recent Mongolian forest study, Nyamjargal et al. (2020) began using object-based classification in 2019. They used Sentinel satellite data to classify the Bogd Khan Uul forest species with object-based classification and conducted high-precision thematic studies. In addition, Munkh-Erdene et al. (2018) used object-based segmentation with optical, radar, and hyperspectral data to compare different indices, and they used demarcation values to create new rules to differentiate forest species. The spectral variability vegetation index (SVVI) was developed to provide useful and unique information for discriminating and classifying patches of natural vegetation types of interest in the savanna (Coulter et al. 2016). Therefore, long-term and detailed mapping using satellite data is required.
The advantage of this study is that it determines the forested area accurately using the spectral bands of Sentinel-2 imagery. We developed a new index for automatically delineating the training samples that are required for forest cover change analysis. It is based on the FI and uses the concept of Dark Objects (Huang et al. 2008). This study aimed to map the changes in the forest area and type using satellite images and field measurement data, to classify forest tree species using satellite image data, and to improve the classification accuracy when using FI information. Overall, this study indicated that modern RS techniques and technologies are reliable forest monitoring and management tools.

Materials and methods
Mongolia spans the transitional zone between the deserts of Central Asia and the Siberian boreal taiga forest (Batkhuu et al. 2011). The boreal forests in the northern part of the steppe and taiga zones occupy most of the forested areas in Mongolia. Figure 1 illustrates the framework of this study. We used SNAP and ArcGIS software (Zhang et al. 2011; Marcal 2019) to assess the processed data layers, including the forest taxation data of the Forest Research Development Center (FRDC) and Bing Maps (Qu et al. 2011). We estimated the spectral forest index (SFI) from Sentinel satellite data to define the forest cover of the study area.

Study area
The study site is in the Siberian taiga and forms a transition zone. The central Asian steppe zone is predominantly dry and windy, with a short growing season. The growth rate of the Mongolian forests is slow (Batsukh 2004). The study area is in the montane forest-steppe regions and includes the Khangai mountain forest eco-region (Dorjsuren et al. 2021). The north of the region is characterized by alpine forests, which gradually blend into the arid steppe plains of the central Mongolian highland (Figure 2). The grasslands of the Mongolian Plateau play a vital role in preventing soil erosion, regulating the water regime, and providing suitable conditions for wildlife and biodiversity conservation (Tsogtbaatar 2000). The Mongolian forests generally have a relatively low restorative capacity and are sensitive to a harsh climate, forest fires, and degradation by human influence (Dulamsuren et al. 2016).
According to the Holdridge life zone system of bioclimatic classification, the Khangal soum is situated in the boreal dry scrub biome and consists of 60.61% larch, 21.22% pine, 17.94% birch, and 0.22% elm (Forest Management Agency 2010). According to our survey, the forests in the Khangal and Khandgait valleys span an area of 78,582 ha at an elevation of 1260-1570 m (Norovsuren et al. 2019). This area has a subarctic climate with an average annual temperature of −1.3 °C and average annual precipitation of 278.4 mm.
The boreal forest is mainly dominated by larch (L. sibirica) and five other tree species: birch (Betula platyphylla Sukaczev), Siberian pine (Pinus sibirica Du Tour), Siberian elm (Ulmus pumila L.), aspen (Populus tremula L.), and spruce (Picea obovata Ledeb.). The Mongolian forests are mainly coniferous and are mixed with some broadleaved trees that grow on the mountain slopes 800-2500 m above sea level (Government of Mongolia 2018). The study site was a boreal forest comprising deciduous and coniferous trees that grew in forest-steppe, boreal forest, and mountain zones.

Data collection
Remote sensing data: Sentinel-2, launched in 2015, provides free satellite images with a spatial resolution of 10-60 m. Sentinel-2 imagery was acquired for the years 2015-2020 for optical multispectral imagery. The multispectral bands were visible green (560 nm), visible red (665 nm), NIR (842 nm), and near-infrared narrow (865 nm). The bands that were used in this study are presented in Table 1.
Ancillary data: The national forest inventory (NFI) statistics in Mongolia are based on many field plots and are the most important data sources for forest research (FDRE 2018). Field data from a previous survey and the FRDC were also used in this study (Yangiv 2017). An inventory map that was produced by forest classification was represented as units of interest in ArcGIS in shapefile format (the parcels corresponding to each tree species). Figure 3 shows the Google Earth map, the Sentinel-2 surface reflectance images in false color, and the forest tree species reference map developed for the study area.
Field data: During the field data collection, the surveyed plots were located in areas with significant disturbances, such as areas with forest fires and deforestation. All the ground data were collected from 2017 to 2020. A total of 269 samples were obtained from the study area. Additionally, the following attributes were measured in each Siberian larch-dominated plot: the height, diameter at breast height, and type of field plots by size, shape, and number (Yrttimaa et al. 2020). In this study, the following three datasets were used: land survey measurements, tree growth from field surveys, and satellite data (Table 2). Figure 4 shows the study field in Khangal soum, Bulgan aimag.

Data processing and methodology
The forest taxation data were collected from the FRDC, Google Earth Pro, and Bing Maps (Wiseman 2019) for the assessment. The FI from the Sentinel satellite data was used to define the forest cover of the study area. The Sentinel-2 data were pre-processed using SNAP v. 4.0.0 (Sentinel Application Platform, European Space Agency). This software includes the plugin "Sen2Cor" (Belgiu and Csillik 2018), which performs atmospheric correction of the top-of-atmosphere (TOA) Level 1C input data. Sen2Cor also creates bottom-of-atmosphere (BOA), optional terrain, cirrus-corrected reflectance images, aerosol optical thickness, water vapor, scene classification maps, and quality indicators for cloud and snow probabilities (Clerici et al. 2017). The TOA reflectance data at Sentinel-2 Level-1C were then processed to Level-2A using the European Space Agency (ESA) Sen2Cor algorithm to obtain the BOA reflectance images (Main-Knorn et al. 2017). The separability (the ability to distinguish between the foreground and background classes; Radoux et al. 2016) was relatively good with this sensor. Then, statistical analysis was performed to evaluate the separability of the forest cover classes based on the spectral information from the samples (Carrao et al. 2007). The separability input data enabled the identification of different classes in a mixed pixel. The classifications that were achieved based on these Sentinel-2 data are more accurate and of higher quality than those that were based on the Landsat data (Topaloğlu et al. 2016). Therefore, the selection of the proper resampling technique for the registration of multidate Sentinel imagery can be critical for digital classification accuracy in mountainous forest areas (Mitchell et al. 2017).

Development of forest cover algorithm
In recent years, vegetation indices that are dependent on the plant growth environment, surface temperature, soil moisture, precipitation, and atmospheric environment have been developed (Table 3). For instance, Cohen (1991) suggested that the first actual VI was the simple ratio of red-reflected radiant flux to NIR radiant flux as described by Birth and McVey (1968). The ratio vegetation index (RVI) was derived from the visible red and NIR spectral bands (Equation 1). The vegetation biomass was successfully estimated based on the NDVI (Equation 2), which was derived from the visible red (RED) and NIR channels of Sentinel-2. The SNAP-derived leaf area index (LAI) was used to obtain the Sentinel-2 TOA and BOA images at two spatial resolutions (10 and 20 m). Moreover, LAI uses a built-in biophysical processor, also called the Sentinel-2 Land bio-physical processor, within the SNAP Toolbox (Equation 3). The VIs were created to define the vegetation and non-vegetation areas and not the forest and non-forest areas. Thus, the FI was selected to differentiate between the forest and non-forest vegetation areas in our study (Equation 4). The FI was applied and confirmed using pre-processed Landsat data from the boreal forest region (Ye et al. 2014). The SVVI was developed and utilized for classifying natural vegetation types and agricultural fields in South Africa (Coulter et al. 2016; Equation 5). The largest plant biomass was studied based on the spectral characteristics of the Sentinel-2 satellite, which include the distinguishing properties of the above indices, and on the SNAP tools that process Sentinel-2 satellite data to analyze the forest cover. The model equations based on the plant and forest indices are presented in Table 4.
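As a minimal illustration of the two band-ratio indices described above, the following Python sketch computes the RVI and NDVI from red and NIR surface reflectance. The NDVI uses its standard normalized-difference form. The RVI is written here as NIR/RED, which is a commonly used form and an assumption on our part, since Equation 1 is not reproduced in this text. The function names, the small `eps` guard against division by zero, and the sample reflectance values are hypothetical.

```python
import numpy as np


def rvi(nir, red, eps=1e-12):
    """Ratio vegetation index, assumed here to be NIR / RED."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    # eps only guards against division by zero for dark pixels
    return nir / (red + eps)


def ndvi(nir, red, eps=1e-12):
    """Normalized difference vegetation index: (NIR - RED) / (NIR + RED)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)


# Hypothetical surface reflectance values for three pixels
# (dense vegetation, sparse vegetation, dark surface)
nir = np.array([0.45, 0.30, 0.05])
red = np.array([0.05, 0.10, 0.04])

ndvi_values = ndvi(nir, red)
rvi_values = rvi(nir, red)
```

Both functions accept scalars or whole band arrays, so the same calls could be applied to 10 m Sentinel-2 rasters once they are loaded as NumPy arrays.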
A natural logarithm-type function was used in these equations. It is a particular type of logarithm that is commonly used to solve time and growth problems. We developed the SFI for Sentinel-2 (SFI2) satellite images of boreal forests. Unlike the VIs, which were created to define the vegetation and non-vegetation areas, the SFI was created to define the forest and non-forest areas. The proposed SFIs were correlated with the RVI, NDVI, LAI, SVVI, and FI. The correlation between the SVVI and the SFIs was the lowest (R² < 0.02); therefore, the SVVI results are not included in Table 4. The correlations between SFI2 and the RVI, NDVI, FI, and LAI were 0.97 (p < 0.0001), 0.95 (p < 0.0001), 0.89 (p < 0.001), and 0.78 (p < 0.001), respectively. We then selected the SFI2 with the highest correlation (Equation 7).
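The comparison between a candidate index and an existing index can be sketched as a Pearson correlation over the sample points. The snippet below is only an illustration under stated assumptions: the sample values are invented, the helper name `pearson_r` is hypothetical, and the text does not state whether the reported values are correlation coefficients or coefficients of determination, so both are computed.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical index values extracted at the same sample locations
sfi2_samples = np.array([0.4, 0.9, 1.3, 1.8, 2.4, 2.7])
ndvi_samples = np.array([0.15, 0.30, 0.42, 0.55, 0.71, 0.80])

r = pearson_r(sfi2_samples, ndvi_samples)
r_squared = r ** 2  # coefficient of determination for a simple linear fit
```

Repeating the call for each pair of indices would give a table of pairwise correlations from which the best-correlated candidate could be selected.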
In the spectral forest index equation, parameter L is a very small value (L = 0.01) whose introduction can effectively lower the NDVI of water while having little impact otherwise. The natural logarithm Ln was empirically set to 2.7. ρ_NIR, ρ_NIRn, ρ_red, and ρ_green are the surface reflectances. Thus, we differentiated between the forest and non-forest areas in the data that were calculated using the newly modeled SFI2. The forested area was selected using a threshold value of 1.2. The value 1.2 is the threshold value for SFI2, and it is the arithmetic value that was identified from the samples that were collected between 2013 and 2020. Specifically, a pixel with an SFI2 value ≥1.2 was classified as forest, while one with a value <1.2 was classified as non-forest. We completed a three-level validation of the forest cover map from the SFI2. In addition, the base error matrix method for each validation plot was manually determined using a sample size of 10 m, and the forest/non-forest areas were assessed manually. Our results were validated using the Forest Research Center's Forest Census Forest Measurement Point to compare the 2015-2020 forested areas with the forested and unforested regions.
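The threshold rule described above (SFI2 ≥ 1.2 is forest, otherwise non-forest) can be sketched in a few lines of Python. This is a minimal illustration, not the processing chain used in the study: the SFI2 grid is invented, the function names are hypothetical, and the area calculation assumes that every pixel lies on the 10 m Sentinel-2 grid.

```python
import numpy as np

SFI2_FOREST_THRESHOLD = 1.2  # threshold value reported in the text
PIXEL_SIZE_M = 10.0          # assumed 10 m Sentinel-2 pixel size


def classify_forest(sfi2, threshold=SFI2_FOREST_THRESHOLD):
    """Return a boolean mask: True where SFI2 >= threshold (forest)."""
    return np.asarray(sfi2, dtype=float) >= threshold


def forest_area_ha(mask, pixel_size_m=PIXEL_SIZE_M):
    """Forest area in hectares from a boolean forest mask."""
    pixel_area_m2 = pixel_size_m ** 2
    return float(np.count_nonzero(mask)) * pixel_area_m2 / 10_000.0


# Hypothetical SFI2 values for a 2 x 3 pixel window
sfi2_grid = np.array([[0.30, 1.20, 2.60],
                      [1.19, 2.70, -0.50]])

forest_mask = classify_forest(sfi2_grid)
area = forest_area_ha(forest_mask)
```

Applying the same two functions to the SFI2 raster of each year would give a per-year forest area that could be compared across 2015-2020.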

Forest cover assessment
The SFI2 values between −10 and +10 were identified, and the forested areas were characterized as <10. The correlation between SFI2 and the RVI and NDVI was 97% and 95%, respectively, which shows that SFI2 can be used for forest cover analysis. Figure 5 shows the relationship between the FI and the newly modeled SFI2 for 269 randomly distributed samples from the 2015-2020 Sentinel-2 satellite data. The mean R² between the FI and SFI2 from 2015 to 2018 was 0.81 (Figure 5). The SFI appears to be a better explanatory index than the FI.
In the SFI2 equation, parameter L is a minimal value (L = 0.01) whose introduction can effectively lower the NDVI of water while having little impact otherwise, the natural logarithm Ln was empirically set to 2.5, and ρ_NIR, ρ_NIRn, ρ_red, and ρ_green are the surface reflectances.
The FI was also less dependent on SFI2 than on the other plant indices, such as the RVI, NDVI, and LAI.
According to the error matrix, the overall accuracy of SFI2 against the national forest inventory was 90.4%. Figure 6 shows the SFI2 map overlaid with the field samples, which includes the forest cover analysis, and Figure 7 shows the SFI forest cover map in 2018 overlaid on Google Maps (Table 5).
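For readers unfamiliar with the error matrix, the sketch below shows how an overall accuracy figure of this kind is typically derived from paired reference and predicted labels. The labels are invented for illustration and do not reproduce the validation data of this study; the function names and the encoding (0 = non-forest, 1 = forest) are assumptions.

```python
import numpy as np


def error_matrix(reference, predicted):
    """2 x 2 error matrix: rows = reference class, columns = predicted class.

    Classes are encoded as 0 (non-forest) and 1 (forest).
    """
    reference = np.asarray(reference, dtype=int)
    predicted = np.asarray(predicted, dtype=int)
    matrix = np.zeros((2, 2), dtype=int)
    # Count each (reference, predicted) pair, including repeated pairs
    np.add.at(matrix, (reference, predicted), 1)
    return matrix


def overall_accuracy(matrix):
    """Share of samples on the diagonal (correctly classified)."""
    return float(np.trace(matrix)) / float(matrix.sum())


# Hypothetical validation samples (inventory label vs. SFI2-based label)
reference = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

matrix = error_matrix(reference, predicted)
accuracy = overall_accuracy(matrix)
```

The off-diagonal cells separate the two kinds of disagreement: forest in the reference data that the index missed, and non-forest that the index labeled as forest.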

Forest type determination based on spectral Forest index 2
Spectral vegetation indices are often used in land-cover classification to help distinguish between vegetation types. We applied SFI2 to determine the forest type based on the SFI2 indices. Birch and larch dominated the study area. The output map of the forest-type SFI2 values is shown as boxplots (Figure 8).
The SFI of 178 measurements in the forest-steppe had statistically different (p < 0.001) threshold values for elm (2.49), birch (2.61), larch (2.71), and pine (2.73), as well as for all of the species together for each forest type, but the differentiation was better for the coniferous forest type. Therefore, the forest type was classified using the average threshold value for each plot and its comparison with the forest inventory map data. Figure 9 shows that the plot-1 and plot-4 areas were covered with larch and birch forests, and most of them were dominated by larch forests. The plots were classified according to the SFI2, which distinguished the birch and larch species. The plot-1 forest area had 43% larch and 31% birch trees, while plot-4 had 41% larch and 23% birch trees. In plot-2, the evergreen pine forest differed from the deciduous forest type, with 38% pines and 26% elms. Additionally, plot-3 had 35% pine, 20% larch, and 26% birch forest types, which indicates a mixed forest type.
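One simple way to turn the species values above into a plot-level label is to assign each plot to the species whose reported value is closest to the plot's mean SFI. The sketch below illustrates that idea only. The nearest-value rule is an assumption made for this example and is not confirmed as the decision rule used in the study; the function name and the plot means are hypothetical.

```python
import numpy as np

# Species values reported in the text
SPECIES_SFI_VALUES = {"elm": 2.49, "birch": 2.61, "larch": 2.71, "pine": 2.73}


def nearest_species(plot_mean_sfi):
    """Return the species whose reported SFI value is closest to the plot mean.

    Assumed rule for illustration: smallest absolute difference wins.
    """
    names = list(SPECIES_SFI_VALUES)
    values = np.array([SPECIES_SFI_VALUES[name] for name in names])
    index = int(np.argmin(np.abs(values - plot_mean_sfi)))
    return names[index]


# Hypothetical plot-level mean SFI values
plot_means = {"plot-A": 2.50, "plot-B": 2.60, "plot-C": 2.70, "plot-D": 2.80}
plot_labels = {plot: nearest_species(mean) for plot, mean in plot_means.items()}
```

Because the reported larch and pine values differ by only 0.02, a rule of this kind would separate those two species weakly, which is one reason to compare the result with the forest inventory map as described above.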
In some areas of the forest classes that were represented in the optical images, the fuzzy boundary between the grasslands and young forests could not be distinguished because they have similar spectral characteristics. The larch trees on the dunes are very old, isolated, and have a low stand density (Klinge et al. 2020). However, these two classes could be identified in synthetic aperture radar images because they have different structures that can cause different backscatter returns (Altangerel and Udval 2019). In general, the classification of the forest tree species was acceptable and can be used in forest management.

Discussion
In this study, we developed a spectral forest index based on Sentinel-2 imagery. A forest zoning process was conducted in Khangal soum, Bulgan aimag, to determine the boundaries of the forested areas (Figure 7). In a previous study, six vegetation indices (EVI, GVMI, MTCI, NDII, NDVI, and RENDVI) were tested as input to an algorithm for detecting grassland mowing frequency (Andreatta et al. 2022). In our study, the RVI, NDVI, LAI, and FI were used to validate the SFIs, which correlated strongly with the RVI and NDVI. Strong correlations have been found between specific regions of the electromagnetic spectrum and species-specific physiological characteristics that are useful in estimating forest cover, especially when using VIs based on infrared wavelengths, such as the normalized difference vegetation index (Marzialetti et al. 2019; Spadoni et al. 2020). Based on the developed SFI2, we determined the changes in the forest area using Sentinel-2 images and field measurement data, classified forest tree species, and improved the classification accuracy. The freely available Sentinel imagery and the forest classification methods applied here may be used to further monitor forest management in the region as deforestation continues to expand.
In a previous study, De Luca et al. (2022) explored the potential of the combined use of synthetic aperture radar (SAR) Sentinel-1 and optical Sentinel-2 band and index time series, integrating the InSAR coherence measure and the optical biophysical variables to classify forest cover and discriminate it from the surrounding non-forest land covers. Median filtering removed some small shadow-affected regions, which were noted earlier, and further improvements were made by truncating the predicted values based on Rees et al. (2021). Further studies will focus on the combination of SAR and optical imagery for deforestation and forest degradation analysis.
It is essential to determine the boundary of the forested area depending on the forest cover and species in the region. The boundaries of deciduous and coniferous forests were more clearly distinguished by classifying the forest type using processed data. In pixel-based classification, the spectral information contained in individual image pixels is analyzed (Nyamjargal et al. 2020) after rules are applied that use different categories and threshold values. Furthermore, it demonstrated that the deforestation processes, resulting from factors such as negative climate trends, will continue in the following years (Sukhbaatar et al. 2021).

Conclusion
A spectral forest index (SFI) was developed in this study to determine the forested area and classify forest species. The surveyed forest area was compared with the 6-year results and the SFI2 boundary using a Google Maps background. Overall, the SFI2 forest and non-forest classification method was acceptable and can be used to classify tree species in other regions.
The Sentinel-2 imagery was processed, and the SFI2 was calculated to determine the forest cover and compare it with other vegetation and forest indices. The determination coefficient R² was 0.81. The SFI2 of the forest area was set at 1.2, which was confirmed with 90.4% confidence. However, some forest areas were detected outside the national forest inventory area, suggesting that the boreal forest in the study area could extend beyond the defined premises of the national forest inventory.
In the future, classification methods based on machine learning algorithms can be used to distinguish and classify forest types and identify forest species at different growth stages. Furthermore, this would enable us to compare the impact of the Sentinel-2 channel bands with the FI, particularly in forest-dominated areas. From the study between 2015 and 2020, we found that forest cover is declining and forest species patterns are changing. The data collection had the following limitations: (1) there are no continuous field measurements; for example, the forest inventory map is produced at intervals of more than 10 years; (2) cost-effectiveness is limited and human resources are lacking; and (3) field samples were taken far away from the samples located in closed or open forested and deforested areas.
The SFI2 threshold value should also be determined in a different region, such as another natural zone. Our study shows that the newly developed index is important for monitoring forests in northern Mongolia, which are sparsely populated and inaccessible to rangers and researchers. SFI2 was found to accurately differentiate forests and forest species, and it is a potentially cost-effective method for monitoring and managing inaccessible forests. This study can be used by decision makers at the regional level and by the forest communities of other provinces.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This study was supported by the RFBR-Mongolia "Comparative assessment of the dynamics and origin of desertification in the border area of Russia and Mongolia" under [grant project No. 18-55-91047].