Mapping croplands with a long history of crop cultivation using time series of MODIS vegetation indices

B. Mapping croplands with a long history of crop cultivation using time series of MODIS vegetation indices. Abstract In the study, a method of automated detection of croplands and natural grasslands using multi-year time series of the vegetation indices was developed and implemented. The methodology is based on a new recognition feature enabling identification of all lands cultivated over a long period. The new recognition feature was calculated from time series of the MODIS vegetation indices. It takes into account both intraseasonal and interannual characteristics of the vegetation period for arable lands with spring and winter crops, as well as for natural grasslands. The methodology determines: the procedure of obtaining time series of the NDVI and EVI vegetation indices for the period from March to November in the given years; generalization of the time series by land cover categories and agro-climatic zones; dates for calculating the recognition feature in each zone; calculation of the recognition feature; adaptation of the decision rules to the regional environmental conditions (threshold values for the agro-climatic zones); procedure of classification into croplands and grasslands. Croplands within the territory of the European part of Russia were mapped for the period from 2000 to 2016. The assessment of recognition accuracy and comparison of the results with the existing products confirm that the developed recognition feature can be successfully used for cropland detection and can improve the accuracy of data interpretation when applied along with other features.


Introduction
During the recent years, monitoring of agricultural land using remote sensing data has become an increasingly urgent issue, from both practical and research perspective. Of major interest are automated mapping of cultivated and abandoned croplands [1,2], crop recognition [3,4], and cropland-use intensity assessment [5]. On a regional scale, this requires data with large spatial coverage, high temporal resolution, sufficient spatial resolution, and minimal cost. Data collected by the MODIS instrument installed either on the TERRA and AQUA satellites appear to meet all these criteria. Most standard techniques for cropland mapping are based on the time-series analysis of vegetation indices using parametric (maximum likelihood classifier method [6][7][8]) and non-parametric (decision tree classifier [8][9][10], random forest [11][12][13], support vector machine [1,14], artificial neural network [5,8,15], etc.) classifiers. The most complete list of methods for cropland mapping has been compiled by Chen et al. [16].
This study proposes a new recognition feature for cropland mapping based on the analysis of long-term time series of the MODIS vegetation indices. A method for automated detection of crop-and grasslands using the proposed feature was developed and realized for the European part of Russia.

Source data.
The analysis was carried out on 16-day composites of the NDVI and EVI vegetation indices based on the MODIS data with a spatial resolution of 250 m (MOD13Q1 and MYD13Q1 products), which were obtained for the observation period of 2000-2016 over the entire territory of European Russia covered by seven MODIS granules (h19v02, h19v03, h20v02, h20v03, h20v04, h21v03, and h21v04). We also used spectral data (MOD/MYD09Q1 products) and the MODIS land cover model (MCD12Q1), as well as suitable (cloudless) data from the Landsat satellites for the observation period of 2000-2015 (path 172-174, row 15-29) and the Landsat TM image mosaics 5 by 6 degrees size (Tri-Decadal TM Mosaics) from the US Geological Survey (USGS) [17].

Preparation of training sites.
At the initial stage of the study, we prepared training sites for the land cover of three types: fields planted with spring and winter crops, as well as areas with natural grass vegetation. The Landsat data were used as the reference geodata to differentiate between two land cover types: cropland and land with meadow grass vegetation. Then, we analyzed the multi-season images (the MOD/MYD09Q1 data synthesized in the band1-band2-band1 combination), which made it possible to assign the fields to either spring or winter crops of certain years. In the spring images, spring crops were pink or purple and most often had a uniform texture. In June-July, depending on the region and the farming system, a peak in the vegetation of spring crops was observed; the fields turned green (of various shades, depending on the crop type). After the harvesting period (August, September), the fields with open soil began to take on shades from pink to brown again. Winter crops were clearly visible in the spring images, as well as in the images taken in late autumn; their color was bright green and they had a homogeneous structure. Due to the spatial resolution (250 m) of the MODIS data, the size of the training sites was at least 1 per 1 km, i.e., 4 by 4 pixels. When choosing trainings, the maximum possible regularity of their spatial distribution was ensured. As a result, 4020 training sites were selected over the territory of European Russia for different years: 618 for 2000, 988 for 2008, 782 for 2011, 734 for 2012, and 898 for 2015.
Since the time of meadow grass vegetation and the schedule of crop sowing and harvesting differ between the regions, time series of the vegetation indices must be analyzed separately for different agro-climatic zones. For this reason, each agroclimatic zone was assigned an individual code based on the corresponding map [22]. Moreover, each MODIS granule was processed separately, because their combined mosaic is too large.
As a result, an attributive database was created for the selected trainings. The database includes the following information: Yearthe year of the image, in which the training was allocated; Typethe land cover type, to which the training site belongs (1spring crops, 2winter crops, 3natural grass vegetation); Zonebelonging to particular agro-climatic zones digitized from the map of agro-climatic resources of the USSR [22]; Scenethe nomenclature of the granule.

Analysis of time series of the vegetation indices.
For each training, using 16-day composites of the MODIS vegetation indices, the NDVI and EVI values were assigned to all selected dates of the corresponding year. The index value for a certain date was considered acceptable if the pixel value on reliability raster equaled 0 (good data) or 1 (marginal data). Otherwise, the value of the vegetation index attribute field was considered 0 and this record was excluded from further analysis.
With the help of the obtained database, the NDVI and EVI time series were analyzed for each land cover type in all agro-climatic zones and during all years under consideration. The agro-climatic zones with fewer than 20 trainings of each type were excluded from the analysis. These are the focal agriculture zone and the northward zones.
The next step was to analyze of the seasonal dynamics of vegetation indices in training pixels. For each training, according to MOD/MYD13Q1, the NDVI and EVI time series were plotted for the period from March to November of the corresponding year (taking into account the quality band values). A total 8040 time series were plotted (2 indices for 4020 trainings). The time series were generalized for each land cover type and agro-climatic zone using the average, median, and smoothing splines included in the method of generalized additive models [19]. The seasonal dynamics of the vegetation indices in 2008 for the training sites of all three land cover types in one of the studied zones is shown in Fig. 1. Here, the dynamics of the average, median, and model approximation of each index can also be seen. The following recognition feature, which can separate croplands from grasslands was proposed based on the analysis of the dynamics of the indices. On the plots (Fig. 2, b), we can see that the absolute value of difference between the indices for spring and winter crops has two maximums: the first one occurs in May, while the other takes place in July. Due to the rotation of spring and winter crops in the arable area, the difference between the vegetation indices calculated on these dates will have greater variability from year to year than in the area with natural grass vegetation.
As a measure of variability, a variance or standard deviation of the difference between the values of the vegetation index on these dates as estimated over 17 years (from 2000 to 2016) can be assessed. The dispersion for croplands should be greater than for grasslands.

Calculation of the recognition feature and classification.
To calculate the recognition feature, we determined two dates for each agro-climatic zone of European Russia that are the minimum and maximum values of the difference between the indices of spring and winter crops (based on the analysis of the NDVI and EVI generalized time series in training pixels), respectively. Dates are based on the MOD/MYD13Q1 time resolution (Table 1).
The calculation of the recognition featurethe standard deviation (SD) of the difference between the values of the vegetation index on these dates was carried out both for NDVI and EVI. At the same time, the study area was initially masked to exclude lands a b  that obviously did not belong to the category of croplands or grasslands using the MODIS land cover model (MCD12Q1). For each pixel of the unmasked territory, the standard deviations sd NDVI and sd EVI were calculated, for which: 1) a particular agroclimatic zone and two corresponding dates were defined for each pixel; 2) in each year from 2000 to 2016, differences between the index values on these two dates were calculated; 3) the standard deviation of the difference was calculated. Some fragments of the result for sd EVI are shown in Fig. 3. In the image, typical regular spatial structures with high sd EVI values (brown) corresponding to cultivated croplands are clearly visible. Low values of sd EVI , which are displayed in green, correspond to grasslands and are mostly allocated along the river valleys.
The assignment of pixels to croplands and grasslands was carried out using the threshold classification. For a more accurate adaptation of the decision rules to the regional environments of European Russia, a detailed zoning of the territory was carried out. The accumulated temperatures above 10 °C were used for zoning. To calculate this indicator, open data from the RosHydromet weather stations (long-term daily observations of the air temperature and precipitation at weather stations in Russia and the former USSR) were processed [23]. Spatial interpolation to the study area was performed using the multilevel B-spline approximation method [18,24]. Following that, zones with a step of 250 °C were allocated. ; top rowestimate of the entire sample of training pixels; middle rowestimate by subsample within the zone with the accumulated temperatures above 0 °С from 2500 °С to 2750 °С; bottom rowestimate by subsample within the zone with the accumulated temperatures above 0 °С from 3000 °С to 3250 °С The sample of pixels belonging to the training sites allowed us to explore the distribution of values of the sd NDVI and sd EVI recognition features for the classes of croplands and grasslands, to assess the degree of their separability, and to determine threshold values.

MAPPING CROPLANDS WITH A LONG HISTORY OF CROP CULTIVATION… 307
In Fig. 4, the estimates are given for the conditional probability density functions ( sd ) which were obtained both for the entire sample and for subsamples within different zones. In the parts of this figure, the red histograms correspond to the sample of training pixels of croplands, while the green histograms indicate the sample of training pixels of grasslands.
The Hellinger distance was calculated as a separation measure of distributions [25]: Pu are conditional probabilities of the classification feature.
The H value can vary from 0 (complete coincidence of histograms) to 2 (complete separability of histograms). In our case, the Hellinger distance for both features (sd NDVI and sd EVI ) varies from 0.7 to 1.3 depending on the zone. The separability of classes is satisfying (the lower threshold of the Hellinger distance is normally 0.4). Then, an independent classification was carried out for each of the classification features (sd NDVI and sd EVI ). The results of the independent classifications were integrated: only the pixels assigned to the corresponding class by both recognition features were classified as croplands and grasslands.

Results and Accuracy Assessment
As a result of the classification, a raster layer of croplands and grasslands for the European territory of Russia was obtained (Fig. 5).

Fig. 5. Cropland recognition results
To assess the accuracy of mapping, the results of manual digitizing of croplands from the multi-seasonal Landsat 8 images for 2014-2016 at eight control sites were used: 1a part of the Izh River basin (Udmurt Republic); 2the Mesha River basin (Republic of Tatarstan); 3a part of the Sviyaga River basin (Ulyanovsk region); 4the Ulema River basin (Republic of Tatarstan); 5the Veduga River basin (Voronezh region); 6a part of the Medveditsa River basin (Saratov region); 7a part of the Samara River basin (Orenburg region); 8a part of the Kuma River basin (Stavropol territory) [26].
To assess the accuracy of cropland mapping, the boundaries obtained by automated classification were compared with the boundaries of croplands obtained by the manual vectorization of Landsat 8. The sites of agreement and disagreement were identified. Their areas were calculated. An error matrix was created. The accuracy and omission and commission errors were calculated (Table 2). The mapping accuracy in almost all cases is higher than 75%. Thus, the proposed recognition feature that separates croplands from grasslands can be used among a number of other features.

MAPPING CROPLANDS WITH A LONG HISTORY OF CROP CULTIVATION… 309
In addition, the mapping results were compared with the boundaries of croplands obtained from the TerraNorte RLC model (the land cover map of Russia) [27]. These boundaries were also compared with the data obtained by the Landsat 8 vectorization. An error matrix was created, and the accuracy and errors were estimated ( Table 3).
As evident from Tables 2 and 3, for all control sites, the user's accuracy for the obtained classification results is lower than the same accuracy for the data from the TerraNorte RLC map, while the producer's accuracy is higher (except for the Sviyaga River basin). This means that the results of the classification carried out according to the proposed methodology have fewer omission errors, i.e., fewer pixels belonging to croplands were skipped. Moreover, a greater number of pixels, which do not belong to croplands according to the control data, were classified as croplands.
The differences in accuracy can be explained by the different duration of time series of the used input and verification data. Since the MODIS data obtained in 2000-2016 were used in this study for classification, the result is the boundaries of all croplands cultivated during this period. The borders of the cultivated croplands on the TerraNorte RLC model are relevant for 2014, so its area here is obviously smaller. Furthermore, verification data were obtained from the 2014-2016 images, i.e., they almost coincide in date with the TerraNorte model. Taken together, these two points led to higher values of commission errors in the results obtained in this work.