Estimating the Soil Erosion Cover-Management Factor at the European Part of Russia

: Evaluation of the vegetation and agricultural-management factor (C-factor) is an important task, the solution of which affects the correct assessment of the intensity of soil erosion. For the vast area of the European part of Russia (EPR), this task is particularly relevant since no products allow taking into account the C-factor. An approach based on automated interpretation of the main crop groups based on MODIS satellite imaging data from Terra and Aqua satellites with the LSTM machine-learning method was used to achieve this goal. The accuracy of crop group recognition compared to the open data of the Federal State Statistics Service of Russia was 94%. The resulting crop maps were used to calculate the C-factor for each month of a particular year from 2014 to 2019. After that, summaries were made at the regional and landscape levels. The average C-factor value for the EPR was 0.401, for the forest landscape zone 0.262, for the forest-steppe zone 0.362, and for the steppe zone 0.454. The obtained results are in good correlation with the results of previous field studies and provide up-to-date (based on 2014–2019 data) estimates of C-factor for rainfall erosion (monthly, annual) with high spatial detail (250 m).


Introduction
In recent decades, due to the changes in the climate in the study area, the contribution of snowmelt runoff to soil erosion has significantly decreased. Therefore, an important task is the assessment of the soil losses due to rainfall runoff [1]. The vegetation and agricultural-management factor (C-factor) is a key parameter in soil erosion models. The methods of vegetation factor (C-factor) estimation used in the erosion equations have been considered in many Russian and foreign publications concerning soil erosion models. A complete review of existing methods for assessing vegetation cover and management factors for various erosion models was carried out in the works of G.A. Larionov [2], L.F. Litvin [3], F.N. Lisetsky et al. [4], monographs "Catchment erosion-fluvial systems" ed. R.S. Chalov et al. [5] and the article by Q. Feng et al. [6]. In addition to the USLE [7] and RUSLE [8], the C-factor is also applied in other erosion models, such as ANSWERS (Areal Nonpoint SourceWatershed Environment Response Simulation) [9], SWAT (Soil and Water Assessment Tool) [10], and SEMMED (Soil Erosion Model for Mediterranean Regions) [11]. Soil erosion assessments are conducted at various scales. They may cover particular slopes [12], small [13] and large watersheds [14], regions [15], countries, and continents [16,17]. C-factor estimation methods that are appropriate for one scale are usually not appropriate for other scales.
Two standard empirical methods exist to obtain C-factor values in the USLE/RUSLE models. The first one-the C-factor, is defined as the soil loss ratio from the land under specified vegetation cover to the corresponding loss from clean-tilled, continuous fallow. The second one-the C-factor, is calculated as a product of five sub-factors described in the RUSLE handbook [8]. Researchers in different countries have adopted the calculation methods to local conditions, but the method, in general, has not changed. The US Soil Conservation Service uses the values of agricultural management factors, summarized within particular regions of the United States [4]. A quantitative assessment of the soilprotective properties of crops for the USLE equation for the territory of the USSR was carried out at Moscow State University [18]. In the absence of a sufficient number of unified measurements of soil washout, the soil-protective properties of crops were determined using a projective coverage monthly for the forest and steppe zones [18,19]. In the work of Morgun et al. [20], the values and dynamics of C-factor (calculated based on generalized field data of various authors) for the main crops (winter, spring, maize and sunflower, beets and potatoes, perennial herbs) for the forest-steppe zone of Ukraine are presented. The monograph "Catchment erosion-fluvial systems" [5] contains particular erosion indices of crops combined into four groups: 1. Thick cover (winter and spring crops, leguminous crops, annual grasses) 2. Tilled high-stem (corn, sunflower) 3. Tilled low-stem (potatoes, sugar beet) 4. Perennial grasses (second-third years of vegetation) The data are presented for six periods of the year's warm season, determined by the main stages of cultivation and phases of crop vegetation.
Annual soil loss by erosion was determined for the European territory of Russia. In these estimations, very generalized regional values of soil-protective properties of vegetation were taken into account. The average soil erosion in the study area amounts to 4.04 t ha −1 year −1 , considering the soil-protective coefficients of crops [21].
Estimation of the C-factor using empirical methods based on field experiments is very laborious and expensive to perform. Therefore, alternative methods to assess the Cfactor at the regional and national levels are used. Currently, there are three main groups of such methods.
(1) C-factor values from literature or field data are assigned to corresponding vegetation types [13,[22][23][24][25][26][27][28][29]. The enhanced method is used on the territory of the European Union, where European land cover databases (for example, CORINE Land Cover), biophysical parameters obtained by remote sensing data, and statistical data on crops and practices are used as source data. The C-factor was estimated using crop statistics (% of land per crop) and data on management practices such as conservation tillage, plant residues, and winter crop cover in arable lands. The C-factor in non-arable lands was estimated using the vegetation cover dataset FCover [16,17]. In some works, the vegetation factor's lack of a seasonal component is noted as a disadvantage of this approach. Even in Europe in neighbor countries, the seasons with the most intense rainfall belong to different periods of the year, when soil vegetation (especially crops) is completely different [30].
(2) Calculation of linear and non-linear regressions between the values of vegetation indices and the values of C-factor obtained in the field to consider spatial and seasonal variability [31][32][33][34][35][36]. The most commonly used index is NDVI (Normalized Difference Vegetation Index) and EVI (Enhanced Vegetation Index). The main problems associated with NDVI are soil reflectance (especially in the case of sparse vegetation) and the variability of vegetation period. Several soil-adjusted indices, such as TSAVI (Transformed Soil-Adjusted Vegetation Index) [37] and PVI (Perpendicular Vegetation Index) [38], have been developed to consider soil reflectance. The main effects of vegetation phases occur during earlier growth stages when the NDVI often overestimates thin vegetation cover due to intense chlorophyll activity and vegetation senescence when vegetation indices typically decrease even when the cover remains the same [39]. Along with this, many studies using vegetation indices analyze the seasonal effect when estimating C-factor, including both monthly and annual periods [40][41][42].
(3) Calculation of regressions between C-factor and projective coverage. Projective coverage, usually used in such models, cannot reflect contributions from different vege-tation layers to soil conservation. The effects of vegetation structures on soil erosion control are ignored. For example, the mixed forest is more effective than pure forest at reducing soil erosion, although both types of forests exhibit the same level of cover. The leaf area index (LAI) has been used as a vegetation structure index to replace projective coverage [6].
In general, none of the methods discussed above is ideal because they do not allow to carry out a comprehensive assessment of the effect of vegetation on reducing erosion. There are two main directions of research in this area: maximum consideration of all the soil-protective properties of vegetation, including biophysical properties, projective coverage, vegetation structure, seasonal vegetation features, etc., and extrapolation of values over large areas, taking into account spatial variability. So, there is no unified methodology for assessing this factor, and this topic is still actual and open to discussion. Also, the relevance is largely determined by the variety of landscape conditions and features of land-use systems in different countries.
For the first time for this vast region of the Earth, the results of estimating the vegetation factor for calculating soil erosion are presented. They are relevant both in time, spatial detail, and used methods, in contrast to previous small-scale assessments provided on the study area. The huge area of Russia and the inaccessibility of many regions for field research makes it necessary to use remote sensing data. In addition, the assessment for such large areas was previously carried out only for the European Union territory [16]. Using the actual climatic data, Copernicus FCover data (vegetation projective cover), and VNP22Q2 data from the VIIRS satellite (data allow determining the beginning and end of the growing season), the phenological phases of vegetation and the seasonality of precipitation were taken into account. This allowed to obtain not fixed values of the C-factor for specific types of vegetation, but also to obtain monthly estimates taking into account the latitudinal landscape zoning. Thus, the proposed study contributes to solving the problem of assessing C-factor for the eastern part of Europe. Also, it should be noted that the study is largely methodological in fact.
The main aim of this study is to assess C-factor for the European Russia territory. To achieve this aim, several tasks were solved:

•
The seasonality of precipitation has been determined, i.e., areas with rainfalls in a particular month are identified; • Due to the lack of government statistical and spatial data on the structure of cultivated areas and crop rotation, crop recognition was carried out using data from peer territories (Canada); • Using free products of biophysical parameters calculated on the basis of remote sensing data, the phenological phases of vegetation were determined; • The annual and monthly mean values of C-factor for the study area were calculated.

Study Area
The European part of Russia (EPR) covers approximately 4 million Km 2 and is situated over several landscape zones, from tundra to temperate zone deserts. According to the Land Cover Map obtained in 2014 from satellite data from TerraNorte RLC v.3 [43], which was designed by the Space Research Institute of the Russian Academy of Science (RAS), the total area of arable lands in the EPR comprises approximately 600,000 Km 2 . Approximately 95 million people live in this region, which is the majority of Russia's population. The environmental conditions that accompany soil erosion in this vast territory are diverse. We created and published a dedicated online geoportal called 'The River Basins of the European Territory of Russia' (http://bassepr.kpfu.ru/ Access on: 15.September.2021) that displays the environmental conditions of the territory (terrain, climate, hydrology, soils, land use, human-induced impact) [44].

Input Data
The assessment of the C-factor for the territory of European Russia was carried out using data from remote sensing of the Earth from space. The extensive spatial coverage caused the choice of the source data. The spatial and temporal resolution of the data was considered. We used products obtained from satellite data and available on the USGS/NASA (USA) and Copernicus websites (a joint program of the European Commission and the European Space Agency): • Products MOD13Q1, MYD13Q1 obtained from the MODIS satellite imagery data from Terra and Aqua satellites-16-day composites of NDVI and EVI vegetation indices with a spatial resolution of 250 m; • VNP22Q2 products obtained from VIIRS satellite imagery data from the Suomi NPP satellite-annual indicators of the phenology of the land surface with a spatial resolution of 500 m. The product contains six phenological transition dates: onset of greenness increase, the onset of greenness maximum, the onset of greenness decrease, the onset of greenness minimum, dates of mid-green, and senescence phases; • FCover product obtained from PROBA-V satellite imagery data. 10-day composites of the biophysical parameter FCover-the fraction of the surface covered by any green vegetation type, with a spatial resolution of 300 m.
The Russian product was also used-the annual map of terrestrial ecosystems of Russia TerraNorte RLC created by the Space Research Institute of the Russian Academy of Sciences [43,45] based on MODIS satellite data. It represents the spatial distribution of the main types of land cover with a spatial resolution of 230 m.
The total volume of satellite data obtained from open archives is about 370 GB.

C-factor Assessment Method
The contemporary assessment of the soil-protective role of vegetation (including agrocenoses) was performed based on the above mentioned RS data for the 2014-2019 observation period. The value of the C-factor in rainfall runoff was assessed. The general scheme of the process of C factor assessment is shown in Figure 1. The C-factor was estimated monthly for "warm" months. "Warm months" -are months when liquid forms of precipitation fall. For such a large area as the EPR, "warm" months were determined in each pixel. In order to do this the average monthly temperatures at 204 meteorological stations of EPR were calculated based on daily air temperature observations at Roshydromet. Spatial interpolation was done for the entire study region using the Multilevel B-spline Approximation method [46], and "indicator" rasters were built for each month, I(i), i = 1,...,12, where 1 or 0 in pixel indicated i-th month as "warm" or "cold", respectively ( Figure 2). Assessment of soil-protective properties of vegetation in the i-th month of the j-th year was carried out separately (by different methods) on arable lands and on all other types of lands (non-arable).

C-Factor Assessment on Arable Land
Arable land (pixels) was identified using the TerraNorte RLC annual map. In each pixel, the C-factor changes from month to month depending on the agrotechnical management and growing season phases of crops. It also changes from year to year due to crop rotation and differences in soil-protective properties of crops. The C (i, j) (C-factor in the i-th month of the j-th year) on arable land were calculated.
The developed approach to assess C (i, j) on arable land consisted in the recognition of crops by multi-temporal remote sensing data of the j-th year and the assessment of the C-factor in the i-th month of this year, and taking into account phenological metrics, FCover, and values of soil protection coefficients of crops in 6 periods of the warm part of the year.
For automated crops recognition, MODIS composites (MOD13Q1, MYD13Q1) were used, including time series of NDVI and EVI for the growing season of each year. Data in the peer territories were used as training samples.
As peer territories, several Canadian regions that are located in the temperate climate zone of humid plains and their landscape and climatic conditions are close to the conditions of the ETR were considered. For these territories, data from the Annual Crop Inventory project of the Ministry of Agriculture and Food of Canada for 2011-2018 are publicly available [47]. These are land use/land cover rasters with a spatial resolution of 30 m, including the classification of various crops.
The list of recognizable crops and groups of crops included crops represented both in the territories of analogs and in the studied territory-legumes, corn, spring cereals, winter cereals, sunflower, sugar beet, potatoes, buckwheat, fallow lands, perennial grasses, as well as nurseries and gardens. Pixels of MODIS data, covered more than 80% by one crop according to the Annual Crop Inventory data were chosen for training. The created training samples are subsets of MODIS rasters on the territory of Canada for the vegetative seasons of 2015-2017. They contain the values of NDVI, EVI, and the QA (quality assesment) layer in sample pixels with a temporal step of 8 days. A total of 30 NDVI, EVI, and QA rasters were used for each year. The index value was considered acceptable if the pixel value on the reliability raster equaled 0 ("Good data") or 1 ("Marginal data"). The preprocessing included filling missing data using Cubic Spline interpolation (in individual pixels for particular dates), plotting, generalizing, analyzing graphs of the seasonal variation of indices for various crops, and studying their variability, etc. The NDVI and EVI series of the current year (30-dimensional vectors in each pixel) were used as input data to the recognition algorithm for training and classification.
On the stage of development of the automated recognition technique, numerous experiments were carried out to test different classification methods: feedforward neural networks with deep learning (Deep Learning MultiLayer Perceptron, MLP) [48], Random Forest (RF) [49], recurrent neural networks (Long short-term memory, LSTM) [50]. The learning and classification algorithms that implement these methods were developed in Python 3.7 using the Keras v.2.3.1 + Tensorflow v.2.2.0 framework, with a GPU (computation on a video card). For data preprocessing, post-processing, and analysis, codes in R v.3.4.4 were developed. During the tests, the accuracy of recognition was assessed by different methods (producers's accuracy, user accuracy, commission and omission errors), including cross-validation. MLP method showed the worst results; The RF method recognized well the widely represented in the territories classes of crops (the per cent of correctly recognized pixels was about 90%). Still, it poorly recognized the underrepresented crops, despite attempts to balance the sample. The LSTM method showed the best recognition quality (percent of correct recognition 94%). Its advantage can be explained by the fact that it's aimed at recognizing sequences, which is important in our case. Table 1 shows the recognition accuracy of the classes of crops by the above methods using the data in the analogous territories. The trained recognition algorithm using the LSTM method was applied for arable land in the European Russia territory for 2014-2019. The input data to the algorithm was a set of NDVI, EVI rasters, and MODIS quality layers for the study area for the vegetative season of each year. The result was six raster layers, where codes of crops (or groups of crops) in a particular year were assigned in the pixels of the arable land.
At the next stage, the quality of crop recognition was assessed. For this aim, we used open data from the Federal State Statistics Service (Rosstat), namely the Database of indicators of municipalities (https://rosstat.gov.ru/dbscripts/munst/ Access on: 15.September.2021). Information about the sown areas of crops in the municipal districts of the European part of Russia was obtained by queries to this Database. This data was geocoded using the vector layer of the district's boundaries. Next, for the 1120 districts where arable land is present, the areas covered by a particular crop were estimated according to the recognition results, after which a comparison with Rosstat data for 2014-2019. The Table  2 shows the generalized results of comparisons by groups of main crops presented in government statistics. As we can observe, the correlation between the obtained results and the Rosstat data is quite high. Mean values and medians of area differences are about zero. The frequency histograms of the difference values demonstrate the degree of underestimation or overestimation of certain categories of crops in more detail. The recognition quality was assessed as quite acceptable for our purposes.
The next stage consisted of converting the raster of crops recognized for the j-th year to the values of the C-factor in the i-th month of this year in pixels of arable land. For this, the following procedure was performed. The basis was the soil protection coefficients (erosion indices) of crops for six periods of the warm part of the year determined using literature [5,16,17] (Table 3). It was decided to realize the "pessimistic" scenario. Because there is no information about the yield and agricultural techniques, the values of the erosion indices of crops were assigned in the suggestion of using moldboard ploughing. Also, using the sigmoid function a/(1 + exp ((b-X)/c)), a mathematical approximation of the dependence of the erosion indices on the projective cover was performed for four periods of the warm part of the year-from the beginning of the growing season to harvesting. The results are shown in the Table 3 (values of the sigmoid parameters) and, in more detail, in the Table 4. For the j-th year, in each pixel of arable land, according to the phenological metrics of the VNP22Q2 product, the dates of the beginning and end of the vegetation season were determined. They were taken as the beginning of the 2nd and the end of the 5th periods, respectively. For the i-th "warm" month, the fractions (per cent of days) of three-time intervals were calculated in the pixel: (1) before the beginning of the vegetation season, (2) during the vegetation season, and (3) after its end. For example, in a pixel in the territory of the Republic of Tatarstan for January, the first fraction is 100%, the second and third-0%, and for May, the first fraction can be 20%, the second 80%, the third 0%. In the range from the beginning of the vegetation season to harvesting (periods 2-5), the FCover data was used as a parameter of the projective cover (step of 10 days). FCover values on these days of the month were recalculated into C values using a sigmoid, the parameters of which depended on the crop type in the pixel. Then average C values were calculated. It was more difficult to calculate C in the 1st and 6th periods due to the lack of complete information on the timing of the main tillage. In the "pessimistic" scenario, we chose the main tillage option for all crops shortly after harvest. So, the 6th period (stubble) was ignored (the properties of the 1st period were assigned).
Further, the C-factor in the i-th "warm" month in the pixel was estimated as a weighted average of the obtained C values in three ranges of the month: (1) for days before the start of the growing season, (2) for days during the vegetation season and (3) for days after its end. The weights are the fraction of the corresponding ranges in the i-th month. A slightly different approach has been taken for perennial grasses where there is no such problem. Here, the assessment was carried out in the same way as for meadow vegetation of non-arable lands.

C-Factor Assessment on Non-Arable Land
To estimate C (i, j) (C-factor in the i-th month of the j-th year) on non-arable lands, the approach described in work by Panagos et al. [43] was applied. Following the most cited literature data, ranges of C-factor values were assigned for the land cover classes of non-arable lands in the annual map of TerraNorte RLC (Table 5).
Urban areas, wetlands, water bodies, permanent snow and ice, bare lands, and rocks were not included in the assessment. The effect of vegetation density was quantified using the FCover biophysical parameter, normalized in the 0-1 range. In each pixel of non-arable land, the C-factor in the i-th "warm" month of the j-th year was estimated as (1): where FCover (i, j) is the FCover value averaged over all days of the i-th month of the j-th year; Cmin, Cmax-minimum and maximum values of the C-factor for the land cover class in a pixel (Table 5). C-factor reaches its maximum value when FCover is 0 and minimum value when FCover is 1. At the final step, the rasters C (i, j) of arable land and C (i, j) of non-arable land were combined into single rasters (mosaics) C (i, j), representing the spatial distribution of the C-factor for the "warm" months of 2014-2019 in the European territory of Russia.

Results and Discussion
According to the developed approach, the C-factor was calculated for the agrocenoses of European Russia for each "warm" month of 2014-2019. The Figure 3 shows fragments of the results for June, August, and October 2016. Using the obtained estimates of C (i, j) for arable and non-arable lands, various generalizations were made. Raster layers representing the spatial distribution of generalized estimates were calculated: 1) C-factor of the j-th year (warm period) -annual estimates of the C-factor for rainfall runoff for 2014-2019. The values were averaged in the "warm" months in a pixel, taking into account the erosion potential of rainfall (fractions of the annual) (2): C(j) = (R(1,j)·C(1,j)·I(1) + R(2,j)·C(2,j)·I(2) + … + R(12,j) ·C(12,j) ·I(12))/R(j), where I (i)-indicator function "warm"/"cold" in the i-th month, and R (i, j) · I (i) -erosion potential of precipitation in the i-th month of the j-th year; 2) C-factor in the i-th month, the long-term average (3): where N = 6 years; 3) C-factor average long-term annual (in a warm period, with rainfall) (4): where N = 6 years. The Figure 4 shows thematic maps of the average long-term C-factor for each month (on a small scale) and the long-term annual C-factor for the study area. In Figure 5 individual fragments of these maps are shown in more detail.
For the territory of European Russia, the obtained values of the average long-term annual C-factor vary in the range from 0.0006 to 0.808, the mean value is 0.1264, and the standard deviation is 0.1796. For agrocenoses, these values are 0.0199, 0.808, 0.401, 0.1686, respectively. The results obtained earlier for the European Union [16]: mean C-factor 0.1043, standard deviation 0.1046, range 0.0001 to 0.526. At the same time, on arable land, the mean is 0.233, and without considering soil protection management 0.287.  To assess the reliability of the results, their agreement with the results of the 2010 studies [5] was checked. The data on the soil-protective capacity of agrocenoses (erosion indices), weighted over crop areas and generalized for the main natural agricultural zones, are presented here. In addition, the results were compared with the data obtained for the landscape zones of European Russia in the studies of 2012-2014 [51]. These data are obtained from field observations. The Table 6 presented C-factor values on the agrocenoses of European Russia in previous studies and the results of our study. Table 6. C-factor on arable land according to literature data [5,51]  We can see that for the entire European part of Russia, the average value of the annual C-factor is absolutely the same. For landscape zones, the difference in the values of the Cfactor on agrocenoses compared with the 2010 data ranges from −10% to 20%, and with 2012-2014 from −5 to 15%. Such values demonstrate a high degree of agreement, which indicates the adequacy of our approach and the reliability of the obtained results. It makes no sense to discuss the difference or dynamics of the C-factor estimates considering the difference in the used methods and their accuracy.
Thus, for the European part of Russia, as a result of the study, modern (2014-2019) estimates of the C-factor for rain erosion (monthly, annual) with high spatial detail (250 m) were obtained.

Conclusions
Approaches to studying the dependence of the C-factor from the vegetation indices obtained from the remote sensing data and modeling such dependencies are widely used now [32,36,[52][53][54]. But in most cases, such models are obtained for small regions and, if applied without calibration for other territories, they give wrong results. The developed method uses climatic, biophysical, and phenological parameters. It allowed obtaining adequate results for such a large study area, considering latitudinal zonality.
The developed model needs to consider all the features of crop rotations on arable land, land cultivation methods, etc. However, the methodology provides results for different scenarios by adjusting the values and the timing of the main tillage in pixels of arable land. The most feasible is correcting the C-factor values based on crop rotations according to the data of long-term recognition of crop types. As a rule, for agricultural land specializing in cereals and leguminous crops, 5-6 field crop rotations are used. Therefore, accounting for these crop rotations types is a solvable task. However, the problem is assessing land plowing methods, which is the most significant adjusting to the C-factor value. To consider the plowing techniques, analyzing the historical methods of tillage in a particular area or recommended plowing for a particular crop could be carried out.
The model gives results that are most similar to the data obtained from field observations of previous years. We should note that the results were obtained in one scenario. The most reliable results can be obtained considering data about the structure of sown areas and the methods of soil cultivation, which, unfortunately, are currently unavailable in Russia. As noted earlier, it is possible to take into account these parameters using indirect data. This approach will be tested in further studies.
Nevertheless, we can say that the data obtained as a result of the study already take into account the maximum possible set of open resources currently available for the terri-tory of Russia. The obtained data could significantly improve the assessment and modelling of modern erosion-accumulative processes. The developed methodology makes it possible to assess the C-factor in the European territory of Russia. The results also have practical importance. Now, within the framework of the project, they are used to calculate soil erosion losses in the study area, along with estimates of the factors R, K, etc. In the future, the applicability of the developed methodology to the eastern part of Russia will be assessed, which has regional conditions that differ from the European part and analogous territories.