Satellite-based analysis of classification algorithms applied to the riparian zone of the Malaya Kokshaga river

The paper comparatively analyses the accuracy of land cover classification in the riparian zone of the Malaya Kokshaga river in the Mari El Republic of Russia using Sentinel-2A satellite images and three supervised classification algorithms, Maximum Likelihood (ML), Decision Tree (DT) and Neural Net (NN), implemented in the ENVI 5.2 software package. Six main land cover classes were identified based on field studies: coniferous, mixed (deciduous), shrublands, herbaceous, bare land, and water. The assessment of the area and structure of the land cover showed that forest and shrub vegetation (the coniferous, mixed (deciduous) and shrubland classes) covers 76% of the entire riparian zone of the Malaya Kokshaga river. The analysis of the thematic mapping results shows an overall classification accuracy of 96.09% for the ML algorithm, 94.51% for NN, and 86.54% for DT. For most classes, both the producer's accuracy and the user's accuracy are highest with the ML algorithm. With the NN algorithm, the highest producer's accuracy is observed for the mixed (deciduous) class, and with the DT algorithm, for the coniferous class. All three algorithms confused the water and bare land classes, which calls for more detailed work when assessing riparian forest ecosystems.


Introduction
Water protection forests of river basins are complex ecosystems that serve water-regulating, sanitary-hygienic, protective-accumulative, anti-erosion, and recreational purposes. They protect river banks from destruction, accumulate sandy alluvium in floodplains, and protect steep valley slopes from erosion and landslides, thereby shielding rivers from erosion products and siltation. Riparian forest ecosystems also convert surface runoff from higher-lying barren areas into subsurface flow [1,2]. In recent years, such ecosystems in different regions of the world have been under high anthropogenic pressure, which leads to disturbances and a consequent deterioration of their water conservation properties [3,4].
To assess ongoing disturbances and to monitor riparian forests, remote sensing data have become a priority source of information about their condition and dynamics [5,6,7]. Traditionally, researchers use Landsat satellite images to monitor vegetation cover because of their availability, adequate spatial resolution, and large archive of time series data [8]. High-resolution Sentinel satellite data provide new opportunities for improving the accuracy of thematic mapping [9]. The data from these satellites of the European Copernicus programme have high spatial and temporal resolution and wide territorial coverage [10]. In forestry, Sentinel products are used for various purposes [11,12], such as determining forest types, delineating forest stand boundaries, analysing vegetation indices, and assessing damage to forest cover [13].
Various algorithms based on traditional supervised classification methods (Parallelepiped, Minimum Distance, Mahalanobis Distance, Spectral Angle Mapper, Binary Encoding) are successfully used for monitoring and analysing land cover [14]. However, the characteristics of the study area and the quality of the selected remote sensing data affect the accuracy of thematic mapping. Researchers address this problem by combining various classification methods [15,16], by using algorithms based on decision trees [17,18] and machine learning (Neural Net, Random Forest, Support Vector Machine) [19,20], and by comparing classification algorithms [21,22,23].
It should be noted that satellite data are still underused for assessing and monitoring the state of water protection forests in Russia. The accuracy of classifying riparian forests from medium- and high-resolution satellite data is therefore of practical interest. The purpose of this study is to compare the accuracy of the Maximum Likelihood (ML), Decision Tree (DT) and Neural Net (NN) classification algorithms, using as an example the land cover of the riparian zone of the Malaya Kokshaga river in the Mari El Republic of the Russian Federation and Sentinel-2A satellite data. These algorithms were selected based on a literature review and their widespread use in vegetation classification [24,25,26].
To achieve this goal, the following objectives were addressed:
- the dominant land cover types in the study area were identified from high-resolution images and forest inventory data;
- field studies were carried out, with sample plots outlined in the water protection forests;
- thematic maps of the riparian zone of the Malaya Kokshaga river were produced from the Sentinel-2A image using three classification methods (ML, NN and DT) and spectral indices;
- the thematic mapping accuracy of all three classification methods was assessed and analysed.

Study location
The focus of our study is the land cover of the riparian zone of the Malaya Kokshaga river, including the 200 m wide water conservation zone stipulated by the Water Code of the Russian Federation (figure 1). Within the boundaries of this zone lies a riparian protective strip where economic and other activities are restricted.
The Malaya Kokshaga is the longest river of the Mari El Republic (part of its basin lies in the Kirov region) and a left tributary of the Volga. The river originates near the village of Maly Kuglanur in the Orshanka district, flows southwest along the eastern edge of the Russian Plain and empties into the Volga. The river is 194 km long, and its total basin area is 5,160 km² (4,760 km² within the republic). The main tributaries are the Oshla and Bolshaya Oshla (right tributaries) and the Maly Kundysh (left tributary). According to the State Water Register of Russia, it belongs to the Upper Volga Basin.
The climate of the basin is temperate continental. The average annual precipitation is about 540 mm, and solid precipitation prevails from November to March. Snow cover lasts for about 155 days. The river is fed mainly by snowmelt (maximum runoff in May). The Malaya Kokshaga has an Eastern European type of water regime with a high spring flood, stable winter low water and intermittent summer-autumn low water. The annual range of water level fluctuations is 5.8 m. The vegetation cover in the study area is represented by pine shrub-green moss and lichen forests, as well as floodplain forests. The dominant tree species are pine (Pinus sylvestris) and birch (Betula pendula). Non-dominant tree species are spruce (Picea abies), linden (Tilia cordata), oak (Quercus robur) and aspen (Populus tremula). Willow (Salix spp.) and alder (Alnus glutinosa) are also common along the rivers [27]. The part of the Malaya Kokshaga basin covered by the Sentinel-2A image includes the Kokshaisk, Kuyar, Experimental, Suburban, and Orshanka forest enterprises of the Mari El Republic.

Research methodology
The flowchart in figure 2 presents the step-by-step procedure for classifying the riparian zone of the Malaya Kokshaga river from satellite data with three supervised classification methods and evaluating their mapping accuracy. The Sentinel-2A scene (RT_T38VPH_20201029T081051_4328TIF) acquired on October 29, 2020 was selected for classification [28]. Only the second (blue, 490 nm), third (green, 560 nm), fourth (red, 665 nm) and eighth (near-infrared, 842 nm) spectral bands, which have a spatial resolution of 10 m per pixel, were used in the study. The image underwent atmospheric, radiometric and geometric corrections in the ESA SNAP software package [29].
A 200 m wide water conservation zone of the Malaya Kokshaga river was delineated on both sides of the river channel in the ArcMap 10.3 software package according to the Rosreestr public cadastral map. To do this, the PKK6_Zones.lyr file from the Rosreestr website [30] was loaded into ArcMap 10.3 and converted into several raster images (bitmaps), which were then converted into shapefiles (*.shp) and merged into a single file. The study area also comprises the Malaya Kokshaga river itself and urbanized areas, including roads along its course. In 2020-2021, fieldwork was carried out to establish and survey sample plots, which were laid out as circular relascope plots. At the preparatory stage of sample plot selection, we carefully analysed the existing maps of forest stands, forest inventory data, and forest plans of the Mari El Republic. High-resolution satellite images (Yandex, Google Earth, SAS Planet) were used as supplementary sources. Based on a detailed analysis of the available data, we identified six dominant thematic classes of land cover (table 1) and selected the most representative areas for these classes. The main criteria for selecting sample plots were their representativeness for all land cover classes and an even distribution over the study area.
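The study took the 200 m zone boundary directly from the Rosreestr cadastral layer. Where only a river line is available, the same zone can be approximated with a distance buffer. The Python sketch below illustrates that alternative under stated assumptions and is not the authors' workflow: the river is a hypothetical polyline in a projected coordinate system with metre units (for example a UTM zone), and a 10 m pixel counts as inside the zone when its centre lies within 200 m of the line. Because the Water Code measures the zone from the shoreline, a wide channel would need its two bank lines buffered rather than a centreline.

```python
import math

ZONE_WIDTH_M = 200.0  # width of the water conservation zone under the Water Code


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b (metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p, line):
    """Shortest distance from p to a polyline given as a list of (x, y) vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def in_zone(p, line, width=ZONE_WIDTH_M):
    return distance_to_line(p, line) <= width


def zone_mask(line, x0, y0, ncols, nrows, pixel=10.0, width=ZONE_WIDTH_M):
    """Boolean mask for a north-up grid whose upper-left corner is (x0, y0);
    a pixel is inside the zone when its centre is within `width` of the line."""
    return [[in_zone((x0 + (c + 0.5) * pixel, y0 - (r + 0.5) * pixel), line, width)
             for c in range(ncols)]
            for r in range(nrows)]
```

For a straight hypothetical reach `[(0, 0), (1000, 0)]`, a point 150 m from the line falls inside the zone and one 250 m away does not; the resulting mask can then be used to clip the classified image to the study area.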
The geographical coordinates of each sample plot were recorded on site with a GARMIN eTrex 20 GPS receiver so that the plots could be located on the satellite images. Data on the area, species composition and the main dendrometric indicators of the dominant species of the stand (average height, diameter, age) were recorded and linked to the corresponding plots. In total, 50 sample plots were established and studied: 30 were used for classification and 20 for accuracy assessment. After a region of interest (ROI) was established for each land cover type in ENVI 5.2, the spectral separability of the ROIs on the Sentinel-2A image was evaluated with the Jeffries-Matusita (JM) distance (a measure of inter-class distance), which is calculated for each pair of ROIs [31,32]. JM separability values range from 0 to 2, and values above 1.4 indicate adequate spectral separability of the land cover types [33]. The results of the spectral separability estimation are presented in table 2. The pairwise JM analysis shows high separability for all considered ROIs, with values above 1.4. To carry out the ML and NN classifications, training samples were defined as ROIs for each land cover type (table 1). Classification parameters typical for these methods were used in the study.
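Both the JM separability check and the ML decision rule rest on the same per-class Gaussian statistics: the mean vector and covariance matrix of the ROI pixels. The Python sketch below assumes multivariate normal classes and uses the common form JM = 2(1 - exp(-B)), where B is the Bhattacharyya distance, which yields the 0 to 2 range quoted above; the ML rule assumes equal prior probabilities for all classes. ENVI 5.2 may differ in details such as thresholds or priors, and the function names are hypothetical.

```python
import math


def mean_cov(pixels):
    """Mean vector and sample covariance matrix (n - 1 denominator) of ROI pixels."""
    n, d = len(pixels), len(pixels[0])
    mean = [sum(p[j] for p in pixels) / n for j in range(d)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def inverse_and_logdet(a):
    """Gauss-Jordan inversion with partial pivoting; also returns ln|det(a)|."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    logdet = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("singular covariance matrix: too few or degenerate ROI pixels")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        logdet += math.log(abs(p))  # |det| is the product of the pivots
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m], logdet


def quad(d, inv):
    """Quadratic form d^T inv d (squared Mahalanobis distance when inv = S^-1)."""
    return sum(d[i] * inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


def jeffries_matusita(pix_a, pix_b):
    """JM = 2 * (1 - exp(-B)) with B the Bhattacharyya distance of two Gaussian classes."""
    (m1, s1), (m2, s2) = mean_cov(pix_a), mean_cov(pix_b)
    s = [[(x + y) / 2 for x, y in zip(r1, r2)] for r1, r2 in zip(s1, s2)]
    inv_s, logdet_s = inverse_and_logdet(s)
    _, logdet1 = inverse_and_logdet(s1)
    _, logdet2 = inverse_and_logdet(s2)
    diff = [a - b for a, b in zip(m1, m2)]
    bhatt = quad(diff, inv_s) / 8 + 0.5 * (logdet_s - 0.5 * (logdet1 + logdet2))
    return 2 * (1 - math.exp(-bhatt))


def train_ml(rois):
    """rois: {class: [pixel vectors]} -> {class: (mean, inverse covariance, ln det)}."""
    model = {}
    for name, pixels in rois.items():
        mean, cov = mean_cov(pixels)
        inv, logdet = inverse_and_logdet(cov)
        model[name] = (mean, inv, logdet)
    return model


def classify_ml(x, model):
    """Maximum likelihood rule, equal priors: maximise -0.5 ln|S| - 0.5 (x-m)^T S^-1 (x-m)."""
    def score(name):
        mean, inv, logdet = model[name]
        d = [xi - mi for xi, mi in zip(x, mean)]
        return -0.5 * logdet - 0.5 * quad(d, inv)
    return max(model, key=score)
```

With the four bands used here (B2, B3, B4, B8), each pixel is simply a four-element list. A singular covariance matrix indicates that a class has too few, or too uniform, training pixels for the Gaussian model.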

The NN classifier in ENVI 5.2 uses a standard error backpropagation algorithm (an iterative gradient method that minimizes the root-mean-square difference between the actual output of the multilayer perceptron and the desired output). Neurons are organized into layers, i.e. groups that receive a common input signal. All elements of the external input signal are fed to each neuron of the first (hidden) layer, and all outputs of the neurons of the nth layer are fed to each neuron of the (n+1)th layer. Each neuron computes a weighted sum of its inputs plus a bias term and applies a nonlinear activation function to the result; the value of this function is the output of the neuron. For the NN classification, we used the ENVI 5.2 default parameters: Activation = Logistic, Training Threshold Contribution = 0.9, Training Rate = 0.2, Training Momentum = 0.9, Training RMS Exit Criteria = 0.1, Number of Hidden Layers = 1, Number of Training Iterations = 1000.
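To make the backpropagation description concrete, the Python sketch below trains a one-hidden-layer perceptron with logistic activations, a learning rate, momentum and an RMS exit criterion, taking the values listed above (0.2, 0.9, 0.1, 1000 iterations) as defaults. It is a generic textbook implementation, not ENVI's code: the hidden-layer size, the per-sample (online) weight updates and the random weight initialisation are assumptions, and ENVI's Training Threshold Contribution is not modelled.

```python
import math
import random


def logistic(z):
    # Clamp to avoid math.exp overflow for very large |z|.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class OneHiddenLayerNet:
    """Fully connected perceptron with one hidden layer and logistic units."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row ends with a bias weight (its input is fixed at 1.0).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        # Previous weight changes, reused by the momentum term.
        self.dw1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [logistic(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [logistic(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train(self, xs, targets, rate=0.2, momentum=0.9, rms_exit=0.1, max_iter=1000):
        """Online backpropagation; stops early once the epoch RMS error < rms_exit."""
        rms = float("inf")
        for it in range(1, max_iter + 1):
            sq_err = 0.0
            for x, t in zip(xs, targets):
                xb, hb, y = self.forward(x)
                sq_err += sum((tk - yk) ** 2 for tk, yk in zip(t, y))
                # Output deltas: error times the logistic derivative y * (1 - y).
                d_out = [(tk - yk) * yk * (1.0 - yk) for tk, yk in zip(t, y)]
                # Hidden deltas use the output weights before they are updated.
                d_hid = [hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                         for j in range(len(hb) - 1)]
                for k, dk in enumerate(d_out):
                    for j, hj in enumerate(hb):
                        self.dw2[k][j] = rate * dk * hj + momentum * self.dw2[k][j]
                        self.w2[k][j] += self.dw2[k][j]
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(xb):
                        self.dw1[j][i] = rate * dj * xi + momentum * self.dw1[j][i]
                        self.w1[j][i] += self.dw1[j][i]
            rms = math.sqrt(sq_err / (len(xs) * len(targets[0])))
            if rms < rms_exit:
                return it, rms
        return max_iter, rms

    def predict(self, x):
        y = self.forward(x)[2]
        return y.index(max(y))
```

Class targets are one-hot vectors, e.g. `[1, 0]` and `[0, 1]` for two classes, and band values are best scaled to roughly 0 to 1 so that the logistic units do not saturate at the start of training.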
The accuracy of the resulting thematic maps was evaluated in the Accuracy Estimation utility module of the ENVI 5.2 software package. Sets of pixels (ROIs) representing the main legend classes served as reference samples; they were delineated using forest management data of the Ministry of Forestry of the Mari El Republic and field data. The resulting thematic maps are shown in figure 4. The assessment produced a confusion matrix with the main statistics used to evaluate the accuracy of thematic maps [34]: overall classification accuracy, Kappa coefficient, producer's accuracy, user's accuracy, commission error, and omission error.
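All of the accuracy statistics listed above can be derived from the confusion matrix. The Python sketch below assumes rows are map (classified) classes and columns are reference classes: producer's accuracy is the diagonal cell divided by the column total (omission error is its complement), user's accuracy is the diagonal cell divided by the row total (commission error is its complement), and Cohen's kappa compares the observed agreement with the agreement expected by chance. The matrix in the example is invented for illustration and is not data from this study.

```python
def accuracy_report(matrix, labels):
    """Accuracy statistics from a square confusion matrix.

    matrix[i][j] = number of pixels mapped as class i whose reference class is j
    (rows: classified map, columns: reference data).
    """
    k = len(labels)
    n = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(k)]
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    overall = sum(diag) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (overall - expected) / (1 - expected)

    per_class = {}
    for i, name in enumerate(labels):
        producer = diag[i] / col_tot[i] if col_tot[i] else float("nan")
        user = diag[i] / row_tot[i] if row_tot[i] else float("nan")
        per_class[name] = {
            "producer": producer,
            "user": user,
            "omission": 1 - producer,
            "commission": 1 - user,
        }
    return {"overall": overall, "kappa": kappa, "classes": per_class}


# Invented two-class example: water vs bare land.
report = accuracy_report([[45, 5], [10, 40]], ["water", "bare land"])
print(round(report["overall"], 2), round(report["kappa"], 2))  # 0.85 0.7
```

In this toy matrix, 10 reference bare land pixels are mapped as water, which lowers the user's accuracy of water and the producer's accuracy of bare land; this is the kind of water/bare land confusion discussed in the results.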

Results and discussions
The data for assessing the accuracy of the thematic maps obtained by the three classification methods from the Sentinel-2A image are presented in table 3. The overall classification accuracy was 96.09% for ML, 94.51% for NN, and 86.54% for DT. With all three methods, the shrublands class was confused with the mixed (deciduous), herbaceous, and bare land classes, and the water class was confused with bare land. A likely explanation is that the channel of the Malaya Kokshaga frequently narrows, so water pixels take on spectral characteristics similar to those of bare land pixels, possibly because of shadowing. Various approaches can help avoid these errors when classifying water protection forests, for example high spatial resolution images, shadow masks, or a raster mask of the water class. Overall, the ML algorithm showed the highest accuracy, and it was therefore used for the subsequent classification of the land cover of the water protection zone of the Malaya Kokshaga river. The area and structure of the land cover after classification are presented in table 4. The largest shares of the study area are covered by the mixed (deciduous) (41.24%) and shrublands (25.44%) classes, which is typical for the vegetation cover of a riparian zone. The smallest class in the water protection zone is herbaceous (4.81%).
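Class areas and shares such as those in table 4 follow from pixel counts: with 10 m Sentinel-2 pixels, each pixel covers 100 m², i.e. 0.01 ha. The Python sketch below tallies a classified raster held as a hypothetical nested list of class labels; in practice the labels would come from the ML thematic map, and the convention that `None` marks pixels outside the water protection zone is an assumption of this sketch.

```python
from collections import Counter

PIXEL_SIZE_M = 10.0   # resolution of Sentinel-2 bands B2, B3, B4 and B8
HECTARE_M2 = 10_000.0


def class_areas(raster, pixel_size=PIXEL_SIZE_M):
    """Return {class: (area in ha, share in %)} for a 2-D grid of class labels.

    Pixels labelled None (outside the study area) are ignored.
    """
    counts = Counter(label for row in raster for label in row if label is not None)
    total = sum(counts.values())
    pixel_ha = pixel_size * pixel_size / HECTARE_M2
    return {label: (n * pixel_ha, 100.0 * n / total) for label, n in counts.items()}


toy = [["water", "mixed", "mixed", None],
       ["herbaceous", "mixed", "water", None]]
for label, (ha, pct) in sorted(class_areas(toy).items()):
    print(f"{label}: {ha:.2f} ha, {pct:.1f}%")
```

Excluding pixels outside the zone before computing shares matters: otherwise the percentages would refer to the whole image rather than to the riparian zone.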

Conclusion
As a result of this research, water protection forests along the Malaya Kokshaga river were classified using three algorithms: ML, NN and DT. Comparison of the thematic mapping results for the riparian land cover from Sentinel-2A data showed that the ML classification method was the most accurate (96.09%). Although the DT method achieved lower accuracy (86.54%) than the ML and NN methods in this study, this algorithm also has great potential for mapping water protection forests. Using more suitable input variables (vegetation indices, soil indicators, topographic data, etc.) can improve the accuracy of DT classification. The NN method also produces fairly accurate thematic maps, and adjusting its standard classification parameters (Activation, Training Threshold Contribution, Training Rate, Training Momentum, Training RMS Exit Criteria, Number of Hidden Layers, Number of Training Iterations, Min Output Activation Threshold) can improve its accuracy. The vegetation cover classified as coniferous, mixed (deciduous) and shrubland accounts for 76% of the entire riparian zone of the Malaya Kokshaga river. This distribution of classes indicates the current sustainability of the ecosystem under study. Further studies of the riparian zone of water protection forests using satellite images could aim to identify the dynamics and assess the disturbance of its cover, as well as to determine the main factors driving these changes (illegal logging, fires, pests).