Mapping crop types in complex farming areas using SAR imagery with dynamic time warping

Crop type information is essential for many practical applications, yet its mapping is often constrained by inherent characteristics of most farming areas, such as fragmentation and small farm plots, changes in crop morphology across the season, and cloud cover. This study investigated whether these limitations could be overcome by using time-series of Synthetic Aperture Radar (SAR) and crop phenological information combined with different Dynamic Time Warping implementation strategies. Focusing on a fragmented landscape with small farm plots in the Netherlands, we used Sentinel-1 dual polarimetry (VV + VH) and TerraSAR-X single polarimetry (HH) images and the Dutch Basic Registration of Crop Plots (BRP) dataset for training and validation. Image pre-processing was followed by the generation of radar vegetation indices and polarimetric decomposition. Crop-specific responses to incident radar signal were analyzed, as well as the accuracy of the crop classification by Time-Weighted Dynamic Time Warping (twDTW) using either backscatter bands only or backscatter bands in combination with derived indices and polarimetric decomposition features. In addition, two further modified Dynamic Time Warping strategies, namely Variable Time Weight Dynamic Time Warping (vtwDTW) and Angular Metric for Shape Similarity (AMSS), were tested for their performance. Furthermore, we investigated the accuracy performance of a decision level fusion of TerraSAR-X and Sentinel-1 classification outputs. Results show that even in a fragmented landscape with relatively small plots (around 0.08 ha), crop types can be successfully mapped by using decision level fusion of the twDTW results of both sensors, reaching an accuracy of 77.1%. When using twDTW on Sentinel-1 (VV + VH) only, including Ratio, MRVI, and DPSVI indices, overall accuracy reached 69.5%; without those indices, accuracy was slightly lower (67.5%). Merging six different grain crops with similar leaf geometry into winter grains and summer grains improved classification accuracy to 80.6%. Our findings demonstrate that twDTW on SAR imagery allows to map crop types in fragmented landscapes with relatively small farm plots, offering potential for crop type mapping in areas with smallholder


Introduction
Accurate information on crop production is essential for many applications. For example, annual information on crop types and areas grown is needed for decision making in food production and food security (Samberg et al., 2016), characterization of cropping intensity (Jain et al., 2013), soil and water resources research, especially erosion modelling (Panagos et al., 2015), generation of cost-effective information about agricultural production (Carfagna and Gallego, 2005) and estimation of crop damage for insurance payout (ESA, 2017).
Mapping this information based on single-date satellite imagery can be unsatisfactory, especially in complex farming areas (e.g. Delrue et al., 2013). One of the persisting bottlenecks is crop-specific seasonality, requiring time series observation of satellite imagery to characterize the different phenological phases of crop development. In the last few years, the domain of optical remote sensing has made good progress towards time-series analysis, including the formulation of automatic information retrieval procedures in smallholder farming areas (Stratoulias et al., 2017), parcel boundary delineation in complex farming systems (Persello et al., 2019), extraction of cropland extent (Oliphant et al., 2019;Useya et al., 2019;Mohammed et al., 2020), crop types mapping (Belgiu and Csillik, 2018) and parcel level classification of crops (Aguilar et al., 2018;Sonobe et al., 2018).
Despite the progress made, using optical satellites to account for crop development across the growing season is still constrained by the persistent presence of clouds. For example, in the Netherlands, the probability of getting cloud-free images from an optical satellite with daily observation frequency is only 20% of the days in a year (van der Wal et al., 2013). In contrast, Synthetic Aperture Radar (SAR) satellites such as Sentinel-1 and TerraSAR-X are not hindered by cloud cover and can supply regular observations for monitoring of crop development (Khabbazan et al., 2019). Hence, some studies have tried to classify crop types from TerraSAR-X dual polarimetry (Hütt and Waldhoff, 2018;Sonobe, 2019) and Sentinel-1 dual polarimetry imagery (Danilla et al., 2017). However, a great deal of work is still needed to properly include crop phenological development into a classification scheme. While the use of SAR sensors allows regular, all-weather observation for crop monitoring, other challenges remain which emanate from the inherent characteristics of complex farming areas. These include the presence of small farm plots, irregular planting dates, and spatial complexity resulting from fragmentation, intercropping, and complex terrain. As a result, at a given time, similar crops can exhibit different backscatter while different crops may exhibit similar backscatter. This inter-crop similarity and intra-crop heterogeneity can only be resolved in the temporal domain by using time-series imagery to account for the evolvement of scattering mechanisms. Many studies have attempted to address this issue using time series of SAR data with classifiers such as Random Forest (RF) (Hütt and Waldhoff, 2018), Support Vector Machine (SVM) (Kang et al., 2018), and Convolutional Neural Networks (CNN) (Danilla et al., 2017;Sonobe, 2019). However, although these classifiers can incorporate multi-temporal observations across the growing season as a stack of time-series images, their implementation strategy does not allow to leverage phenological information into a classification scheme, i.e. shuffling the temporal order of the images in a stack does not have any implications for crop classification performance.
To solve this problem, Dynamic Time Warping (DTW) is a more suitable classifier (Sakoe and Chiba, 1978), as it allows to account for phase shifts in backscatter and reflectance characteristics as a function of crop development. Using DTW, Petitjean et al. (2012) were able to classify land cover features from irregularly observed Satellite Image Time Series (SITS). Following the work of Jeong et al. (2011), which improved the original version of DTW by including time weights, Maus et al. (2016) classified land cover features from SITS using timeweighted Dynamic Time Warping (twDTW). Belgiu and Csillik (2018) and Csillik et al. (2019) applied twDTW to map crop types from optical imagery using objects as the smallest unit of analysis. Li and Bijker (2019) classified short-cycle vegetables with a high degree of accuracy from SAR imagery using twDTW combined with SPRING strategy for subsequence searching.
Building on the significant progress in DTW-based crop mapping made by these previous studies, we identified four additional issues that need further investigation. Firstly, while past studies mainly focus on homogenous farming landscapes with large plots (Belgiu et al., 2020;Csillik et al., 2019), the performance of DTW classifiers using SAR imagery of different spatial resolution and polarization (Sentinel-1 versus TerraSAR-X) has not yet been investigated for a fragmented area with smaller farm plots. Secondly, the few studies that did focus on fragmented areas with smallholder farming (Aguilar et al., 2018;Stratoulias et al., 2017) did not give due emphasis to incorporation of phenology into the classification scheme, while, in a given landscape, crop phenology can be complex due to varying planting and harvesting dates within the same crop. To incorporate this phenological information into the classification procedure requires an investigation of crop-specific time lags in the computation of linear or logistic time weight (vtwDTW), replacing the fixed weight used in twDTW. Thirdly, while previous studies mainly focused on similarity measures based on Euclidean distance, an angular distance measure could be useful for time series of radar imagery, as this measure is considered robust for temporal variation of observations (Teke and Yardımcı, 2019). And finally, although some studies have already been done on fusing optical images of different resolution (Zhao et al., 2019) and fusing optical images with SAR images (Waske and van der Linden, 2008), there is a lack of studies on fusing Sentinel-1 with TerraSAR-X images to benefit from the different spatial and temporal resolution and polarimetry of these allweather sensing systems.
To address these issues, the current study mapped crop types in a complex farming area with small plots sizes, using time-series of Sentinel-1 and TerraSAR-X imagery, phenological information, and time-weighted DTW with both Euclidean and angular distance measures. In addition, we tested the use of a variable time lag in the computation of time weight in DTW to incorporate variations in crop phenology in the classification scheme. Finally, we devised a novel strategy for fusing medium resolution Sentinel-1 and high resolution TerraSAR-X data to increase mapping accuracy by using the potential of both sensing systems.
Although the Netherlands is well-known for its well-planned agricultural landscape resulting from past land consolidation (van den Noort, 1987;van den Brink and Molema, 2008), the selected study site was characterized by irregular, relatively small plots (~0.08 ha) and fragmented holdings where crops dominate the landscape in a mixed pattern. In the Netherlands, main crop production takes place from the middle of February to the end of September (Fig. 2). The average total annual precipitation is 832 mm, while the coldest month is January with a mean monthly temperature of 2.3 • C and the hottest month is July with 17.7 • C (KNMI, 2019).

Data
For this study we used time-series of dual polarimetry Sentinel-1 SAR (VV and VH) data, obtained from the European Space Agency (ESA), and single polarimetry TerraSAR-X (HH) data, obtained from AIRBUS Defence and Space (see Table 1 for metadata). Both products were in Slant Range Single Look Complex (SLC) data format, which includes both the phase and amplitude information needed for polarimetric decomposition. Detailed descriptions of the Sentinel-1 and TerraSAR-X missions are provided by Torres et al. (2012) and Roth (2003), respectively.
For training and validation, we used the Netherlands' Basic Registration of Crop Plots (BRP) for the year 2018. This registration system records the primary crops cultivated in a given year on all parcels registered in the Agricultural Area Netherlands (AAN). AAN was initially built on RGB air photos of 2008, with a resolution of 25 cm, and is regularly updated by users. The BRP is updated yearly, based on information provided by farmers, with a sounding date of May 15. AAN and BRP are made available via the 'Public Data on the Map' initiative of the Dutch Ministry of Economic Affairs and Climate via (https://www.pdok.nl/geo-services/-/article/basisregistratie -gewaspercelen-brp-#9464039d91ac261a857ee92a9f215250).
For terrain correction of SAR images, we used a 1-arc-second Surface Radar Topographic Mission (SRTM) Digital Elevation Model (DEM) from the United States Geological Survey (USGS) (https://urs.earthdata.nasa. gov).

Data preprocessing
First, orbit correction was applied to the Sentinel-1 data. As these images were sensed with Terrain Observation and Progressive Scan (TOPS) in successive sub-swaths and bursts within each sub-swath (De Zan and Guarnieri, 2006), burst lines were removed. As our study site was situated within two successive sub-swaths, these were separated and processed individually. After TOPS split operation, radiometric calibration (Miranda and Meadows, 2015) was done to reduce radiometric irregularities. Next, individually processed sub-swaths were merged to create a wider sub-scene. To reduce impacts of local terrain on backscatter, Range Doppler Terrain Correction was done using SRTM DEM. Given the fact that our study site was very small and covered only a small number of bursts in a sub-swath, we could assume the impact of incidence angle variation on backscatter to be negligible and not needing the incidence angle correction conducted for larger geographic regions such as in Arias et al. (2020). After terrain correction, an improved Lee Sigma filter with a window size of 7 by 7 pixels and a sigma of 0.9 (Lee et al., 2009) was used for speckle reduction. Next, images were converted to a decibel scale for both polarizations (VV and VH) as described in Miranda and Meadows (2015). Three secondary features were computed: the ratio between VH and VV bands, the Modified Radar Vegetation Index (MRVI) (Czuchlewski et al., 2003), and the Dual Polarization SAR Vegetation Index (DPSVI) (Periasamy, Fig. 2. Agricultural calendar for dominant crops in the Netherlands, compiled from Darwinkel and Zwanepol (1997) and Osman et al. (2005).

Table 1
Selected metadata of SAR images used in this study.  (Nasirzadehdizaji et al., 2019). As for pre-processing the TerraSAR-X HH images, the obtained SLC data were radiometrically corrected and multilooked. Similar to the Sentinel-1 SAR images, speckle was filtered from the TerraSAR-X images using a refined Lee sigma filter with similar parameters and terrain correction was applied using SRTM DEM. Finally, we tailored the BRP dataset to be used for training and validation, by removing grassland and horticultural crops from the dataset before starting the sampling procedure. The horticultural crops had too few fields per crop for adequate sampling.

Sampling strategy and creation of temporal backscatter profiles
Training samples were selected from the BRP dataset reference polygons in a two-stage sampling strategy. In the first stage, by using crop types as strata, 20 polygons were picked randomly from each stratum. For a crop with fewer than 20 polygons, all parcels were included in the training sample. The specified sample size for this study was mainly decided by considering the uniformity of existing terrain and climatic conditions in the study site. Evidently, random sampling cannot guarantee spatial heterogeneity and is vulnerable to spatial autocorrelation (Dobbie et al., 2008). Therefore, to control spatial autocorrelation during random selection, a neighbourhood constraint was imposed for each polygon. This was done by computing the minimum distance to the nearest neighbour polygon and then, within each stratum, selecting the first 20 polygons with the largest minimum distance. It should be noted that within a single polygon, there may also be significant variations in the backscatter characteristics resulting from differences in surface and subsurface soil moisture and crop development in response to variations in soil fertility. To account for this variation, we picked random pixel samples from each selected sample polygon, in proportion to the polygon areal extent, to serve as final training pixels. To validate classification output, we selected the centroid points of all polygons from the whole parcel data set. This yielded a dense number of points ( Table 2) that were fairly well distributed across the test site.
After the selection of training samples from reference polygons, values for all observables (VV, VH, HH, derived indices, and polarimetric decomposed features) were extracted from the processed SITS per crop. These values were then aggregated into a single time series per crop per observable using the median value. The median statistic was selected because of its robustness regarding outliers, especially for skewed distribution in a series. To suppress noise in the temporal domain and create a smooth temporal profile representative of crop development, a Savitzky-Golay filter (Savitzky and Golay, 1964) was applied. This filter effectively suppresses noise in the temporal domain of time-series satellite observations (Chen et al. 2004) while preserving the time series structure, especially the positions of local minima and maxima, and smoothing unnecessary fluctuations. The generated temporal profiles provided clues on the shape-based separability of the different crops from the SAR signal (Section 3.1) and hence the classification accuracy (Section 2.5 and 3.2).

Classification using twDTW
The first procedure in twDTW based classification (Maus et al., 2016) is the computation of the distance matrix between a data sequence at a given pixel location in SITS and a temporal profile of a specific crop (query sequence). Accordingly, for a data sequence with length M and specific crop temporal profile with length N, we computed a distance matrix D with a size of M by N using Euclidean distance as provided in Petitjean et al. (2012): where D [i, j] is the distance at row i and column j of the distance matrix D, [ SITS r,c ] i is a value at row r and columnc in SITS at time i, TPF j is the temporal profile of a specific crop at a time j, and tw i,j is a logistic weight that accounts for the absolute differences between times i and j, which was computed as: where i is the time of observation in SITS r,c at the i th position in a time series, j is the time of observation in the temporal profile at the j th po-sition, β is the maximum time lag for the warping window to search for a match, which is constant for all crops, and α is a user-defined constant to control the steepness of the slope. While previous studies fine-tuned α and β values by using a set of user-defined values (Maus et al., 2016;Belgiu and Csillik, 2018), we implemented a grid search strategy where the classifier picks an optimal combination of values after recursively computing overall accuracy on the fly from a series of provided α and β values. The warping path (P) of two sequences was then considered as a set of data points that define the mapping between the two signals SITS r,c and TPF j as: The k th element of the warping path can be defined as p k = (i, j) k where iand jare an index of the data points in [ SITS r,c ] and TPF j , respectively, that are part of the warping path. This warping path is subject to constraints of boundary condition, continuity, and monotonicity (Berndt and Clifford, 1994). As there are potentially many paths in a distance matrix D which satisfy these constraints, the optimum alignment is generated by following a path that minimizes the sum of distances:   As presented by Sakoe and Chiba (1978) and Berndt and Clifford (1994), keeping the minimal values (valley points) and fulfilling search constraints can be achieved by using a dynamic programming approach. Therefore, we generated a cumulative distance cost matrix C with size M by N where values at row i and column jwere recursively computed by summing a distance matrix D i,j and the minimum of surrounding cumulative distance cost values as: which was further subjected to the following constraints: In a cumulative distance cost matrix, the optimal similarity or dissimilarity between two different time series observations is a value at the position C [M, N]. To label a pixel with a specific crop type, warping distances were computed for each crop recursively. Following a 1-Nearest Neighbour (1-NN) classification approach (Hastie et al., 2009), each pixel was assigned the crop type label that had the lowest twDTW distance, compared to other crops.
The classifier was tested using Sentinel-1 (VV and VH), Sentinel-1 backscatter, and TerraSAR-X (HH) backscatter coefficients. It was also rerun by incorporating derived secondary products and indices (Ratio, MRVI, DPSVI) and decomposed polarimetric features (H, A, α) to test the effects of changing input observables on accuracy performance.

Classification using vtwDTW and AMSS
To account for variations, within the same crop type, in planting date and duration in the field, twDTW was tested with a crop-specific variable time lag in what hereafter will be named variable time weight Dynamic Time Warping (vtwDTW). The implementation of vtwDTW is almost similar to twDTW except for the computation of the logistic weight, changing Eq. (2) into: where β k is the maximum time lag for crop k in the temporal profile, where crop-specific time lags were provided as independent input. We generated these time-lags based on the agricultural calendar for dominant crops (Fig. 2), where short-duration crops were given smaller time lags while longer-duration crops were given larger time lags. Based on values investigated by Jeong et al. (2011), the α value was set to 0.1.
Both twDTW and vtwDTW were implemented by using Euclidean distance as a distance measure. As Euclidean distance is sensitive to outliers (Shirkhorshidi et al., 2015), we also tested Angular Metric for Shape Similarity (AMSS) using an angular distance measure, in this case, a cosine distance. Overall implementation of the classifier followed Nakamura et al. (2013) who used it for classification of various time series datasets. Pseudocodes for the sampling procedure, temporal profile creation, and algorithmic implementations of twDTW, vtwDTW, and AMSS are provided in Supplementary Material 1.

Decision level fusion of Sentinel-1 and TerraSAR-X outputs
To benefit from both Sentinel-1 (dual-polarized medium spatial resolution) and TerraSAR-X images (single polarization high spatial resolution), we implemented a decision level fusion (Zhang et al., 2009) on the individual classification outputs from Sentinel-1 and TerraSAR-X. These individual outputs showed different user and producer accuracies for the various crop types. To prioritize a crop type which is better separated in either TerraSAR-X or Sentinel-1, user and producer accuracies from each classification output for each crop were converted into a single accuracy metric as: where A c,s and A c,t are normalized accuracy for a specific crop (c) for classifications from Sentinel-1 (s) and TerraSAR-X (t) images respectively, UA and PA are user and producer accuracies respectively, and w is a weight for each accuracy, to which 0.5 was assigned to give equal weight to UA and PA. Then, the classification output from Sentinel-1 was resampled to TerraSAR-X pixel spacing using the Nearest Neighbour (NN) approach. We chose the NN resampling approach because of its suitability for resampling categorical and nominal data such as land cover classes, which cannot be interpolated. It is evident that NN resampling approach has limitations with regard to the positional uncertainty of linear features such as roads and rivers, creating a jagged pattern (Verbyla, 2002). Since crop polygons do not have a linear shape and individual polygons are covered by one single crop type, we assumed the positional error after resampling is negligible. For Table 2 Training and validation points. Training polygons  Validation points   1  Maize  20  1072  2  Potato  20  51  3  Rye  20  41  4  Summer barley  20  35  5  Summer wheat  13  13  6  Triticale  20  35  7  Winter barley  7  7  8  Winter wheat  16  16  Total  136  1270   Table 3 Accuracy of crop type classification using twDTW with different inputs.  verification we followed Porwal and Katiyar (2014) by doing a visual inspection of the fused classified image overlaid with the reference polygons of the field boundaries (see Fig. 6). To preserve the spatial detail in TerraSAR-X, we opted for resampling the Sentinel-1 output.

S.N. Crop
Resampling the coarser-resolution Sentinel output provided the opportunity to align this image to the finer resolution TerraSAR-X image, making the combined images suitable for the pixel-wise overlay needed for decision level fusion. Based on the computed single accuracy metric and co-registered images, the following decision rule was established to create a new crop type map: where E[i, j] c is the crop type label for crop c in a map classified by decision level fusion, S[i, j] c is the crop type label for crop c in Sentinel-1 and T[i, j] c is its TerraSAR-X counterpart, at the i th row and j th column of the pixel location. At each specific pixel location, the classifier picked the crop type with the highest normalized accuracy. Crops with similar leaf dimensions that grow in a similar season can cause confusion in the classification (Bargiel and Herrmann, 2011;Hütt and Waldhoff, 2018). To test the accuracy implications of grouping such similar crops together, the models were re-run after merging the various grain crops into summer grains and winter grains respectively. Finally, we calculated the user, producer, and overall accuracies using validation points, following procedures presented by Congalton (1991).

Crop-specific temporal backscatter profiles
The main crops grown in the study site showed different backscatter patterns, which also varied between Sentinel-1 and TerraSAR-X output (Fig. 3). For the co-polarized Sentinel-1 VV polarization band (Fig. 3A), the backscatter coefficients of all crops decreased up to the early weeks of May. After that, the backscatter coefficients for maize and potato abruptly increased until the end of June, stabilized during July and then started to drop around the beginning of August. Backscatter for summer wheat, winter wheat, summer barley and winter barley kept decreasing until the end of May and then stabilized until July. From July, backscatter continued to drop for winter barley, while it increased until October for summer wheat, winter wheat and summer barley.
On cross-polarized Sentinel-1 VH images (Fig. 3B), all crops showed decreasing backscatter up to the end of April. From the beginning of May, potato and maize showed an increasing backscatter, which dropped again in August. Although their shape was similar, the backscatter peak of potato was higher than that of maize. From the beginning of May, backscatter of summer wheat and winter wheat continued to decrease until July and then increased again, while backscatter for winter barley showed a smaller increase in August. Backscatter of summer barley increased from May until October. Except for minor differences, triticale and rye followed a similar pattern of decreasing backscatter until August, followed by stabilization or a slight increase.
In the TerraSAR-X co-polarized HH band (Fig. 3C), all crops showed a similar backscatter pattern until the end of April. From May, we observed three clusters of crops with similar patterns: 1) maize and potato, 2) summer wheat, summer barley, triticale and rye and 3) winter wheat and winter barley.

Crop type mapping using twDTW
The accuracy of crop type mapping using twDTW varied with input datasets (Table 3). Overall accuracy was highest (69.5%) when using Sentinel-1 input combining backscatter and generated indices (Ratio, MRVI, and DPSVI) but this was only slightly higher than when using Sentinel-1 backscatter alone (67.5%). Adding polarimetric decomposition features (H-A-α) decreased the overall accuracy to 36.6%, and also significantly reduced the user and producer accuracies for potato. The lowest overall accuracy (29%) was achieved with the Terra-SAR-X single polarimetry (HH) dataset. Large differences were observed between user and producer accuracies for the same crop (see Supplementary material 2 for confusion matrices).
These different accuracy levels were reflected in the crop classification maps (Fig. 4). Maps produced with Sentinel-1 backscatter coefficients only (Fig. 4A) or with added indices (Fig. 4C) resulted in outputs comparable to the reference crop polygons (Fig. 4E). However, the schemes produced with TerraSAR-X single polarimetry (Fig. 4B) and Sentinel-1 with decomposed features (Fig. 4D) yielded a poorer result, in which potato was overpredicted while maize was underpredicted. Confusion between maize and potato was very high when using TerraSAR-X HH (Fig. 4B)

Crop type mapping using vtwDTW and AMSS
Compared to twDTW (Section 3.2), the alternative classifiers vtwDTW and AMSS yielded lower accuracies. Using Sentinel-1 data, vtwDTW yielded an overall classification accuracy of 56.9%, while AMSS yielded only 35.8% (Table 4). For the TerraSAR-X data, accuracies achieved with vtwDTW and AMSS were even lower (28.6% and 10.6%, respectively). Fig. 5 shows a subset of the maps resulting from vtwDTW and AMSS of Sentinel-1 and TerraSAR-X data. The low accuracies achieved with TerraSAR-X data ( Fig. 5B and 5D) were mainly due to maize crops being misclassified as potato crops. As indicated on the classification maps, the best result was achieved with vtwDTW on Sentinel-1 data (Fig. 5A), producing a crop type map comparable to reference crop polygons. AMSS provided a more spatially confused crop type map, with larger within-field variation. In particular, accuracy results were significantly reduced for winter grains and summer grains (Fig. 5C-D).

Crop type mapping using decision level fusion of Sentinel-1 and TerraSAR-X
Integration of Sentinel-1 and TerraSAR-X achieved better classification results than using either of them alone (Table 5). Decision level fusion of outputs from twDTW yielded an overall classification accuracy of 77.1%, while fusion of outputs from vtwDTW and AMSS achieved accuracies of 69.7% and 51.8%, respectively. The user and producer accuracies also improved. For example, for maize, twDTW on the separate outputs of Sentinel-1 and TerraSAR-X yielded producer accuracies of 73.6% and 27.1% , respectively (Table 3), while using the fused outputs increased this accuracy to 84.5% (Table 5). Overall, decision level fusion twDTW provided more homogeneous polygons, especially for maize, which better resembled the reference polygons (Fig. 6).

Crop type mapping using merged classes
Based on the observed backscatter profiles (Fig. 3) and confusion matrices (Supplementary material 2), we merged all grains grown in winter into the class 'winter grains' and those grown in summer into the class 'summer grains', while leaving potato and maize as individual classes. Next, we re-ran all classifiers (twDTW, vtwDTW, and AMSS) using the backscatter dataset of Sentinel-1. The merging of grain crops with similar leaf geometry, morphology, and seasonal growth pattern improved crop classification accuracy ( Table 6). The highest overall accuracy of 80.6% was achieved using twDTW, while the lowest accuracy, using AMSS, was still as high as 54.9%. Merging the grain classes also reduced within-polygon mixing of crops, especially the confusion between potato and maize (Fig. 7). Compared to the results achieved for individual winter grains (Tables 3-5), the user and producer accuracies for the merged winter grains were significantly higher, reaching 59.6% and 62.6%, respectively.

Discussion
The central aim of this study was to assess the accuracy performance of different DTW classifiers in a fragmented farming area with relatively small plot sizes. Specifically, we investigated whether crop classification in areas of such complexity could be improved by 1) using twDTW on time series of radar images with different spatial resolution and polarization (Sentinel-1 versus TerraSAR-X), including decision level fusion of the two radar outputs; 2) using crop-specific logistic weights (vtwDTW) to account for variable planting and harvesting dates within the same crop type (time lags) ; 3) exploring angular distance (AMSS) as an alternative similarity measure to Euclidean distance; and 4) merging crops with similar leaf morphology and seasonality to reduce confusion between classes.
In general, the classification accuracy of the DTW classifiers tested in this study hinges on the separability of the temporal backscatter profiles of the various crops. The profiles measured in our study (Fig. 3) show that, in all observables (VV, VH, and especially HH), maize and potato showed a similar profile shape which only differed in amplitude. Before these crops emerge, backscatter decreases due to changes in surface roughness as a result of tillage (Mc Nairn and Brisco, 2004); after sprouting, backscatter increases, levels, and then drops, reflecting the scattering responses synchronized with changing Leaf Area Index (LAI) (Macelloni et al., 2001), crop fraction of vegetation cover (Liao et al., 2018), crop height (Nasirzadehdizaji et al., 2019), and finally, harvesting (Harfenmeister et al., 2019;Shang et al., 2020). The backscatter peak from potato was higher than from maize, which is likely due to the randomly oriented and dense potato leaves (Khabbazan et al. 2019) and the greater effect of surface soil moisture in the less dense maize canopy (El Hajj et al., 2019). In contrast to maize and potato, the backscatter profiles of the various summer grains showed a continuous decrease throughout most of the season. This finding agrees with Macelloni et al. (2001), who found that the backscatter from crops with narrow dimension and vertically oriented leaves, such as wheat and barley, starts to drop as soon as their Leaf Area Index (LAI) increases. At the end of the growing season, backscatter from both summer and winter grains is influenced by the presence of crop residues and surface moisture after harvest (Mc Nairn et al., 2001), subsequent tillage operations (Mc Nairn and Brisco, 2004), weed regrowth, and planting of secondary crops. These secondary crops are not registered in the dataset we used for training and validation, but they may have influenced the temporal profiles of the primary crops and hence the accuracy of the classification.
The shape similarity of these temporal profiles was particularly strong in the profiles generated with TerraSAR-X single polarimetry (HH), despite the higher spatial resolution of this sensor (Fig. 3C). As a result, confusion between maize and potato and between the various grain crops was high, which negatively affected the accuracy performance of all classifiers (twDTW, vtwDTW, AMSS) when using TerraSAR-X data. We expect that the utilization of dual polarimetry TerraSAR-X (HH + VV) would have achieved better accuracy for crop type mapping (Sonobe, 2019), but this option was not available for our study area.
Of the three classifiers tested, the highest classification accuracy was achieved with twDTW on Sentinel-1 (VV and VH) data including Ratio, MRVI, and DPSVI indices (69.5%); without those indices, accuracy was slightly lower (67.5%) ( Table 3). Including polarimetric decomposition features (H-A-α) did not improve classification with twDTW, which confirmed findings by Li and Bijker (2019). Since Sentinel-1 is not fully polarimetric, it cannot effectively characterize surface and volumetric scattering from the crop canopy (Ji and Wu, 2015). Furthermore, crops that have the geometry and leaf morphology similar to wheat, barley, and triticale in one cluster and maize and potato in another cluster, respond similarly to the incident SAR signal. As crop height, LAI and vegetation cover reach their peak, crops exhibit similar scattering characteristics for an extended period. It can be argued that polarimetric decomposition features (from fully polarimetric sensors) would be more relevant to identify crop emergence or to distinguish cropland from other covers. Kumar et al. (2020) noted that compact polarimetric SAR data from Radarsat-2 with iS − Ω decomposition could be used to characterize growth stages of rice, cotton, and sugarcane crops.
Although we found the crop classification accuracy achieved with twDTW on Sentinel-1 (VV + VH) promising, it was lower than the accuracies found by other studies using time series of optical images (Belgiu et al., 2020;Belgiu and Csillik, 2018;Csillik et al., 2019). However, the test sites of these studies were less fragmented and had larger plot sizes than our test area. As noted in recent studies (Arias et al., 2020;Orynbaikyzy et al., 2020), the existence of small and fragmented plots has a significant impact on classification accuracy for crop type mapping, as fragmentation causes an increase in the number of mixed pixels with different crops in one pixel. Among studies using SAR instead of optical imagery, Li and Bijker (2019) found that twDTW on Sentinel-1 data performed well for the classification of short-cycle vegetables, reaching an accuracy of 80% in a relatively complex farming area in Indonesia. In a less complex farming area in the Netherlands, Danilla et al. (2017), using SAR data with a machine and deep learning models followed by post-classification smoothing, reported accuracies comparable to our study (60.28%, 59.33%, 64.12%, and 70.85% for SVM, RF, CNN, and CNN + MRF respectively). Like in our study, these authors also noted a strong confusion between crops with similar physical structure. In our case, we did not implement a post classification smoothing procedure, to avoid removal of small farm plots.
Contrary to our expectations, introducing a variable time lag for logistic weight calculation (vtwDTW) to account for crop-specific variation in planting date resulted in slightly lower accuracies than when using a fixed time lag (twDTW) (compare Tables 3 and 4). Most likely this was due to the less precise crop calendar used for this study, which was generated from the literature and not from local field observations. In addition, the high confusion between crops with similar temporal profiles may have played a role. Furthermore, our test site had distinct seasons and mechanized agriculture, which likely resulted in crops being planted in a relatively short time span at the start of the season, decreasing the time lag between early and late sowing dates of the same Table 6 Accuracy of crop type classification after merging crops with similar leaf structure and growing season, using Sentinel-1 and TerraSAR-X data. crop. It would be worthwhile to test vtwDTW in an area with larger time lags and less pronounced seasons. Like for vtwDTW, accuracies achieved with AMSS were lower than for twDTW, whether used on Sentinel-1 or TerraSAR-X data (compare Tables 3 and 4). Previous studies on AMSS for time series classification have reported mixed results. For example, Choi and Kim (2018) concluded that AMSS was more robust than other DTW classifiers for gesture recognition, while Nakamura et al. (2013) reported that the performance of AMSS depended on the dataset. The latter finding calls for investigation of the SAR or optical datasets to establish a concrete knowledge base on the predictive potential of the AMSS model for crop classification. In the remote sensing domain, Teke and Yardimci (2019) implemented AMSS for crop type mapping from harmonized Landsat 8 and Sentinel-2 time-series and reported higher accuracies than our study. The difference between their findings and ours is most likely due to variations in input images and training samples, as well as differences in plot sizes and complexity of the study area. For example, Teke and Yardımcı (2019) studied a less complex area with only two crops: maize and cotton, while our study included more crops and small farm plots. Part of the low performance, of AMSS lies in the strong confusion between maize and potato. We assume this could be attributed to a strong scattering similarity of the two crops as they are both broadleaved crops. Moreover, both crops contain different varieties with minor differences in sowing date and growing period. This may all constrain the model's learning existing temporal patterns using angular metrics for shape similarity.
For all classifiers tested (twDTW, vtwDTW, and AMSS) decision level fusion of Sentinel-1 and TerraSAR-X outputs yielded better results than when using the classifiers on the individual datasets (compare Tables 3,  4 and 5). This has two main reasons. First, for a given pixel, decision level fusion assigns the crop type with better separability based on the standardized accuracy metric from the confusion matrix. Second, the classifiers benefit from the synergy of Sentinel-1 dual polarimetry and TerraSAR-X high spatial resolution. While studies on fusing SAR data of different spatial resolution and polarimetry are lacking, studies on fusing SAR with optical data have found decision level fusion effective to increase accuracy performance for crop type mapping (e.g., Waske and van der Linden, 2008). Our decision level fusion of SAR data achieved higher accuracy (77.1%) than what Aguilar et al. (2018) found for complex farming areas with the voting of ensemble classifiers using time series of optical Worldview-2 imagery (75.9%). Results of Orynbaikyzy et al. (2020), using decision level fusion of optical and SAR images and merging of similar crops, also reported comparable results to ours. In addition, as indicated in Fig. 6, results of decision level fusion of NNresampled output of Sentinel-1 and output of TerraSAR-X have smooth boundaries, in agreement with the reference polygons, which confirms good positional accuracy. Provided each polygon contains a single crop and polygons are not linear features, distortions in positional accuracy of pixels as a result of NN-resampling were negligible.
Finally, using the Sentinel-1 backscatter dataset, we found that merging the six different grain crops into two groups, i.e. summer grains and winter grains, significantly improved the accuracy of crop classification (Table 6). The merging of grains reduced within-polygon confusion of crops, not only for the grains but also for maize and potato (Fig. 7), which increased the overall accuracy of all classifiers. The highest overall accuracy of 80.6% was achieved with twDTW, while the lowest accuracy, using AMSS, was still as high as 54.9%. Hütt and Waldhoff (2018) achieved similar results from dual polarimetric TerraSAR-X data with an RF classifier. Following a hierarchical approach, Bargiel and Herrmann (2011) also reported significant accuracy improvements in classification when grouping crops with similar leaf geometry. In a study that fused Sentinel-1 and Sentinel-2 datasets for crop mapping, the grouping of cereals and legume classes also significantly increased the classification accuracy (Orynbaikyzy et al., 2020).

Conclusions
Our study shows that, even in a fragmented landscape with relatively small plots (around 0.08 ha), crop types can be effectively mapped using twDTW on Sentinel-1 (VV and VH) data and on the fused outputs of Sentinel-1 (VV and VH) and TerraSAR-X (HH), achieving accuracies of up to 77.1%. Although our results for twDTW on the fused SAR datasets show lower accuracies than those reported by previous studies using twDTW on optical data (note: in areas with larger plots sizes), we consider twDTW on SAR output a viable alternative for areas with persistent cloud cover. In terms of selecting the most suitable classifier, our study shows that twDTW outperforms vtwDTW and AMSS, similar to findings by Dong et al. (2020), and also reaches higher accuracies than those reported for the original DTW and time weighted Derivative Dynamic Time Warping (twDDTW) (Belgiu et al., 2020). In areas where crops have highly similar leaf geometry and growing season, the accuracy of twDTW can be further improved by merging crops into larger classes; in our case, we found that merging grain crops into summer and winter grains increased twDTW accuracy from 67.5% to 80.6%, including higher accuracies for maize and potato. Overall, our findings demonstrate that twDTW on SAR data allows to map crop types in a fragmented landscape with relatively small plot sizes, which offers potential for areas with smallholder farming.
For further research, we recommend: • Training and testing the twDTW classifier with more precise observations from the field, including secondary crops; • Using a site-specific crop calendar based on field observations, rather than a general crop calendar from the literature, to better study the effect of variable time lag in the vtwDTW classifier; • Testing vtwDTW in an area with less pronounced seasonality and a larger variation in planting dates to better test differences in time lags; • Using dual or quad polarimetry data instead of single polarization TerraSAR-X HH, either from TerraSAR-X or other SAR sensor with high spatial resolution, to improve crop classification accuracy; • Conducting a transferability test to other geographic regions with complex farming systems to extend the knowledge base on the performance of DTW-classifiers with SAR imagery, particularly in smallholder farming areas.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.