Vegetable mapping using fuzzy classification of Dynamic Time Warping distances from time series of Sentinel-1A images

Vegetable production is important for food security, diet improvement and socio-economic value. Mapping the location and extent of vegetable fields is therefore important for agricultural policy, food security and farmer support. Dynamic Time Warping (DTW) is a common way to map crops from time series of satellite images. However, like all hard classifications, it does not show the spatial distribution of uncertainty in the classification. In fuzzy classification, where memberships to multiple classes are assigned to each pixel, the difference in membership between the best class and the runner-up class can be used to assess classification uncertainty at the pixel level. This research formulates a fuzzy classifier based upon Time-Weighted Dynamic Time Warping (TWDTW) distances to map vegetable types from time series of Sentinel-1A SAR images. For each pixel, the TWDTW distances to the classes were normalised by dividing them by the sum of the TWDTW distances to all classes for that pixel. The normalised distances were then used to compute fuzzy memberships of each pixel to each class, using the Gaussian membership function. Based on these memberships, fuzzy measures such as the Confusion Index (CI), Ambiguity Index (AI), fuzziness and fuzzy membership were calculated, and different thresholds were applied to each measure during subsequent defuzzification. The overall accuracy and kappa coefficient of the defuzzified output were 0.86 and 0.83, respectively, an improvement over the crisp Time-Weighted Dynamic Time Warping with SPRING strategy for subsequence searching (TWDTWS) algorithm, which achieved 0.73 and 0.68 for overall accuracy and kappa, respectively. This study concludes that this new approach improves classification accuracy by excluding pixels with high uncertainty, which is especially relevant when only a limited number of classes are sampled and mapped.


Introduction
Vegetable production plays an essential role at the local and national levels in terms of socio-economics and food security for people in urban and rural areas of developing countries (Joosten et al., 2015). In the global food economy, the most dominant vegetables are tomatoes, cucurbits (pumpkins, squashes, cucumbers and gherkins), alliums (onions, shallots, garlic) and chilies (Schreinemachers et al., 2018). The increase in population and income has created demand from consumers seeking to diversify their diets with vegetables (Schreinemachers et al., 2018). This demand has provided significant potential for economic growth among smallholder farmers (Schreinemachers et al., 2018). For instance, tomatoes were the fourth most economically valuable food crop produced in low- and middle-income countries, with a trade value of US$63 billion per year in 2012-2013, after rice, sugarcane and wheat (Schreinemachers et al., 2018). Hence, vegetable maps play an essential role in making projections for future agricultural land when per capita vegetable consumption demands maximization of local food production. These maps can be used by different stakeholders interested in knowing the spatial distribution of these vegetables. For this specific area, the vegetables chili, cucumber and tomato were selected by the stakeholders of a local project (G4AW SMARTSeeds) as the most relevant for mapping.
Agricultural land is strongly affected by spatial and temporal dynamics within and between vegetation seasons, and for this application, time series images with a short revisit time are ideal for classification (Bargiel, 2017). Multitemporal images with a short revisit time have advantages over single or bi-temporal images in crop mapping. Single or bi-temporal images cannot capture the changing spectral characteristics of crops over time, which aid feature detection and identification of agricultural crops, as multitemporal images do (Odenweller and Johnson, 1984). Furthermore, some crop types can only be distinguished at specific times of the year and appear similar throughout the rest of the growing season (Hütt and Waldhoff, 2017). Another advantage of multitemporal images is that classes that overlap spectrally can be separated on the basis of their different temporal behaviour (Chandola and Vatsavai, 2010). Different methods of classifying agricultural crops from time series data exist. For instance, Random Forest (RF) has successfully been applied to crop type classification using different time series images, such as Landsat (Tatsumi et al., 2015) or Sentinel-2 (Belgiu and Csillik, 2018). RF is an ensemble classification algorithm that uses decision trees and bootstrapping with replacement, allocating each pixel to the class with the maximum number of votes from a collection of trees (Breiman, 2001). Despite its reported efficiency for crop mapping, this classifier has difficulty handling changes in the phenological cycles of the target crops caused by climate variations or changes in agricultural practices (Cheng and Wang, 2019). Other studies have used Artificial Neural Network (ANN) algorithms to generate crop maps from time series images (Kussul et al., 2018).
ANN are non-parametric classifiers that are able to detect specific patterns in time series data after learning those patterns from the training data (Thakur and Maheshwari, 2017). The disadvantage of ANN is that they are data driven, meaning they need a large number of training samples to learn the crop patterns, which is time consuming and expensive to collect (Thakur and Maheshwari, 2017). Delineation of smallholder agricultural fields has been performed successfully using a fully convolutional network on single WorldView-2 and -3 images rather than multitemporal data (Persello et al., 2019). The challenge with fully convolutional networks is that they have a complex architecture and a time-intensive training phase that can take weeks to complete (Khan et al., 2020). Support vector machines (SVM) have also been used to classify major crop lands based on Sentinel-2 time series NDVI data, effectively distinguishing corn, soybean and alfalfa (Kang et al., 2018). Though SVM is able to classify the major crops, it has an expensive computational cost when computing the kernel function (Gudmundsson et al., 2008).
The challenges associated with changes in the phenological characteristics of crops and the large number of training samples required by supervised classifiers can be addressed by the use of Dynamic Time Warping (DTW). DTW compares the temporal signature of a training sample to the unlabeled pixels by first creating a cost matrix and then finding the optimal path based on a distance similarity measure (Berndt and Clifford, 1994). In remote sensing, DTW is capable of eliminating year-to-year phenological differences by realigning multitemporal images to a common phenology (Baumann et al., 2017). The advantage of DTW is that it uses a simple calculation process with little learning effort (Choi and Kim, 2018) and requires a low number of training samples compared to other supervised classifiers. Originally, DTW algorithms were developed for speech recognition (Sakoe and Chiba, 1978) as dynamic programming algorithms for pattern matching of spoken words with a nonlinear time normalization effect. DTW aims at finding an optimal match between two sequences by allowing nonlinear mapping of one sequence to another while minimizing the distance between the two sequences (Jeong et al., 2011). Upon completion of the distance computation, the two sequences are warped in a nonlinear manner to determine their similarity independent of any variations in the time dimension (Ibrahim and Valli, 2015). In this way, DTW can be used for measuring the similarity of time series events even when there is distortion, through the matching of a reference sequence and a test sequence (Ibrahim and Valli, 2015). In remote sensing, DTW has been used to assess the similarity between two temporal sequences: one representing the training or query sequence and another representing the sequence of the pixel to be classified.
A temporal sequence consists of the temporal profile of radiometric values corresponding to a pixel in a satellite image time series (Petitjean et al., 2012). Temporal sequences capture the temporal variations, i.e. phenology, of the target crops. DTW is capable of dealing with temporal distortions caused by irregular sampling (e.g. because of the presence of clouds) (Viana et al., 2019) and shifts in the phenological cycles caused by variation in agricultural practices and meteorological conditions (Baumann et al., 2017). If the two sequences are not equal in length, DTW is capable of finding a subsequence within the longer sequence that optimally fits the shorter sequence (Müller, 2007).
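The dynamic-programming computation of the DTW distance described above can be sketched as follows. This is a minimal illustration on plain Python lists of backscatter values, not the TWDTW implementation used in this study:

```python
def dtw_distance(query, sequence):
    """Classic DTW: fill a cumulative cost matrix and return the
    cost of the optimal (minimum-cost) warping path between two
    temporal sequences of possibly different lengths."""
    n, m = len(query), len(sequence)
    INF = float("inf")
    # cost[i][j]: minimal accumulated cost to align query[:i] with sequence[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - sequence[j - 1])  # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

Because the warping path may stretch or compress the time axis, a sequence aligned against a time-dilated copy of itself still yields a distance of zero, which is what makes DTW robust to phenological shifts.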
DTW distances have the advantage of being easy to integrate with many classification techniques as a replacement for the Euclidean distance, thereby improving the classification accuracy (Guan et al., 2016). For this reason, new versions of DTW have been developed to enhance the performance of time series matching, such as Time-Weighted Dynamic Time Warping (TWDTW) (Maus et al., 2016). TWDTW introduced a time weighting constraint to the original DTW to overcome phase shifts resulting from seasonal changes of natural and cultivated vegetation (Maus et al., 2016). TWDTW was used to capture seasonality changes in natural vegetation and cultivated crops growing over different seasons and subsequently map them accurately from time series images. According to Maus et al. (2016), TWDTW is suitable for the classification of time series in remote sensing.
Further extension of DTW was achieved by incorporating SPRING into TWDTW, referred to as Time-Weighted Dynamic Time Warping with SPRING strategy (TWDTWS) (Li and Bijker, 2019). SPRING is a modified version of DTW that has been used to process streaming data under the DTW measure (Sakurai et al., 2007). The advantage of SPRING is that it efficiently and correctly detects qualifying subsequences with high similarity in a data stream (Sakurai et al., 2007).
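The core idea behind SPRING, matching a query against any subsequence of a longer sequence, can be illustrated with an open-begin/open-end variant of DTW. This is a simplified batch sketch of subsequence matching, not the incremental streaming algorithm of Sakurai et al. (2007):

```python
def subsequence_dtw(query, stream):
    """Open-begin/open-end DTW: return the cost of the best-matching
    subsequence of `stream` for `query`. The first matrix row is zero
    (the match may start anywhere) and the result is the minimum of the
    last row (the match may end anywhere)."""
    n, m = len(query), len(stream)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    # Free start: the query may begin at any stream position at zero cost.
    for j in range(m + 1):
        cost[0][j] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - stream[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Free end: the best match may end at any stream position.
    return min(cost[n][1:])
```

With a crop-cycle query embedded in a longer annual sequence, the distance is driven only by the best-fitting subsequence, while full-sequence DTW would be penalised by the surrounding observations.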
DTW and its derivatives, including TWDTWS, are usually implemented as hard classifiers, which do not provide information on how close the runner-up class is to the best class for a pixel. Hard classifiers use class probability estimation to decide on the classification boundary in the feature space, assigning each pixel to the class to which it has the highest probability (Liu et al., 2011). The limitation of hard classifiers is that they assume pixels are pure and categorize each into one and only one class (Thakur and Maheshwari, 2017), without regard to the similarity between the best class and the runner-up class for that pixel. In contrast, soft or fuzzy classifiers assign to each pixel a degree of similarity (membership) to every class of the classification scheme (Choodarathnakara et al., 2012). A fuzzy membership to a class is a continuous value between 0 and 1, where 0 indicates that the membership condition is not fulfilled, 1 means it is completely fulfilled, and a value between 0 and 1 means it is partially fulfilled (Hofmann, 2016). In fuzzy classification, fuzzy sets are used to reduce information loss by allowing a gradual change from membership to non-membership (Yang et al., 2016). The gradual change in membership between the best class and the runner-up classes is essential in remote sensing for similarity and certainty measures between these classes (Hofmann, 2016).
In remote sensing, DTW algorithms can be applied singly or in combination to classify agricultural crops using optical and/or SAR images. Optical images can be preferable to SAR images because optical data provide more robust and interpretable images for delineating land use and land cover classes (Joshi et al., 2016). However, optical images are of limited use under cloudy conditions, which affect the quality of detail in the images (Stendardi et al., 2019), while Synthetic Aperture Radar (SAR) images are less affected by the atmosphere and clouds, as they operate in the microwave range and pass through with little attenuation (Stendardi et al., 2019). Hence, in areas with cloud cover during most of the growing season, SAR sensors are useful for collecting and monitoring vital information on crop growth to achieve sound agricultural management and policy-making. Li and Bijker (2019) applied TWDTWS to Sentinel-1A time series images to map the vegetables Chili, Tomato, and Cucumber in Indonesia. Their findings revealed that TWDTWS has the potential of providing a means of distinguishing vegetable types from Sentinel-1A time series images.
This current work is based on the TWDTWS (Li and Bijker, 2019), and the main focus is to use fuzzy classification of the TWDTW distances to determine the uncertainty in the classification for the classes of interest: Chili, Tomato and Cucumber.

Study area
The study area for this research is Lampung on the island of Sumatra, Indonesia, bounded by the coordinates 5°2′42″S, 104°42′0″E and 5°24′43.2″S, 105°24′0″E. This area was chosen because the G4AW SMARTSeeds project was working closely with farmers there to improve their livelihoods through vegetable growing (SMARTSeeds, 2019). The region has a tropical climate with a relative humidity of 60-80%, temperatures of 23-37 °C, and annual precipitation ranging from 2257 to 2454 mm/year (Banuwa et al., 2019). The area experiences two seasons throughout the year, dry and wet, with higher precipitation occurring from December to April (Dewi et al., 2019).

SAR Images
Sentinel-1A time series satellite images from 1 April to 28 October 2018 were obtained from the European Space Agency (ESA) as the provider of the Sentinel-1A product. We selected this time series because the focus of the study was to classify vegetables grown in the dry season from April to October (Dewi et al., 2019). Sentinel-1A images in descending mode were acquired. Interferometric Wide Swath (IW) mode with VH and VV polarisations was used to capture features on land (https://sentinel.esa.int/web/sentinel/user-guides/sentinel-1-sar/acquisition-modes/interferometric-wide-swath). The product type of the images is Level-1 Ground Range Detected, with multi-looked intensity of 5 range looks and 1 azimuth look. The images were acquired with a slant range resolution of 10 m and an azimuth resolution of 10 m.

Field data
Field survey data used in this study included the spatial locations of the agricultural fields and the type of crop grown. These data were collected in July 2018 as part of the G4AW SMARTSeeds project. A total of 29 training (5 Chili, 5 Tomato, 5 Cucumber, 5 Rice, 1 Maize, 5 Trees and 3 Others) and 37 validation (8 Chili, 6 Tomato, 7 Cucumber, 6 Rice, 1 Maize, 5 Trees and 4 Others) field samples were used. The number of training samples is lower than the number of validation samples because DTW requires few training samples (Belgiu and Csillik, 2018); we therefore decided to use more samples for validation than for training. Their locations are shown in Fig. 1. These crops were selected in accordance with the G4AW SMARTSeeds project as the most relevant crops for this area (SMARTSeeds, 2019).

Methodology
The main focus of this study was to map the vegetables chili, tomato and cucumber using fuzzy classification of TWDTWS distances. The proposed methods are shown in Fig. 2. The first steps are similar to Li and Bijker (2019) and include image processing, assessing the growth patterns of the crops, assessing crop separability and calculating the TWDTW distances. However, the calculation of distances is not followed by assigning each pixel to the class with the shortest distance, but by fuzzification and defuzzification, leading to a vegetable map and an uncertainty map.

Image processing
Sentinel-1A time series images were processed using the European Space Agency (ESA) toolbox, the Sentinel Application Platform (SNAP) version 7.0 (SNAP, 2019). SNAP was used to extract the VV and VH bands from the dual-polarised backscatter time series (Jiang et al., 2019). The automated image processing workflow described by Li and Bijker (2019) was applied in this research. The processing steps included downloading the images, applying the orbit file, radiometric calibration, speckle filtering and terrain correction. For polarimetric speckle filtering, the SNAP-integrated Refined Lee filter with a window size of 7 × 7 was used to reduce noise while preserving the spatial resolution, polarimetric scattering properties and statistical characteristics of the backscatter signal (Lee et al., 2015).

Growth pattern
In this study, the land cover classes used in the classification are Chili, Tomato, Cucumber, Rice, Maize, Trees and Others, mapped from Sentinel-1A images. A total of 476 pixels (80 Chili, 80 Tomato, 80 Cucumber, 80 Rice, 28 Maize, 80 Trees and 48 Others) were collected from the training samples to create class temporal sequences from the VH, VV and VH minus VV polarisations, to derive growth patterns and to assess class separability from the generated profiles. The mean of the samples for each class was obtained for each date. The Day of Year (DOY) is a number from 1 to 365 within a year; DOYs corresponding to the acquisition dates of the images were used to generate the growth patterns (see Section 5.1). The mean backscatter signal of each class was plotted against the DOY (Li and Bijker, 2019). A Savitzky-Golay filter was then applied to smooth the data points for the polarisation features by fitting them to a polynomial using least squares (Savitzky and Golay, 1964). The growth patterns were then used to discover the temporal changes in the backscatter signals of the pixels for the classes under study (Li and Bijker, 2019).
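The Savitzky-Golay smoothing step can be sketched as a local least-squares polynomial fit evaluated at the centre of a sliding window. The window size and polynomial order below are illustrative assumptions, not the settings used in the study:

```python
import numpy as np

def savgol_smooth(y, window=5, order=2):
    """Savitzky-Golay smoothing: fit a least-squares polynomial of the
    given order inside a sliding window and take the value of the fit
    at the window centre. `window` must be odd and larger than `order`."""
    y = np.asarray(y, dtype=float)
    half = window // 2
    x = np.arange(-half, half + 1)
    # Reflect the series at both ends so every point keeps a full window.
    padded = np.concatenate([y[half:0:-1], y, y[-2:-half - 2:-1]])
    out = np.empty_like(y)
    for i in range(len(y)):
        seg = padded[i:i + window]           # window centred on y[i]
        coeffs = np.polyfit(x, seg, order)   # local least-squares fit
        out[i] = np.polyval(coeffs, 0)       # fitted value at the centre
    return out
```

On data that already follow a polynomial of the chosen order, the filter reproduces the interior points exactly, which is why it smooths noise while preserving the shape of the phenological curve better than a moving average.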

Class separability
A measure of class separability using the VH, VV and VH minus VV polarisations was conducted using the Transformed Divergence (TD). This was done to determine in which feature the classes are separable based on the training data for the TWDTWS algorithm. TD is computed with equations (1) and (2) (Swain and Davis, 1978):

D_ij = ½ tr[(V_i − V_j)(V_j^−1 − V_i^−1)] + ½ tr[(V_i^−1 + V_j^−1)(M_i − M_j)(M_i − M_j)^T] (1)

TD_ij = 2000 [1 − exp(−D_ij / 8)] (2)

where D_ij is the divergence between the compared signature classes i and j, V_i − V_j is the difference between the covariance matrices of signatures i and j, M_i − M_j is the difference between the mean vectors of signatures i and j, tr is the trace function, T denotes transposition and TD is the Transformed Divergence.
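As an illustration, equations (1) and (2) can be computed from per-class mean vectors and covariance matrices with NumPy; the inputs are hypothetical signature statistics, not the study's training data:

```python
import numpy as np

def transformed_divergence(mean_i, cov_i, mean_j, cov_j):
    """Transformed Divergence between two class signatures
    (Swain and Davis, 1978), scaled to the conventional 0-2000 range."""
    Vi = np.asarray(cov_i, dtype=float)
    Vj = np.asarray(cov_j, dtype=float)
    dM = (np.asarray(mean_i, float) - np.asarray(mean_j, float)).reshape(-1, 1)
    Vi_inv = np.linalg.inv(Vi)
    Vj_inv = np.linalg.inv(Vj)
    # Divergence D_ij, equation (1)
    d = 0.5 * np.trace((Vi - Vj) @ (Vj_inv - Vi_inv)) \
        + 0.5 * np.trace((Vi_inv + Vj_inv) @ dM @ dM.T)
    # Transformed Divergence, equation (2), saturating at 2000
    return 2000.0 * (1.0 - np.exp(-d / 8.0))
```

Identical signatures give a TD of 0, while well-separated signatures saturate towards 2000, matching the interpretation of the TD values reported in Tables 1 to 3.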

TWDTW distances
The TWDTWS algorithm developed by Li and Bijker (2019) and implemented in Python by Sitanggang et al. (2019) was used to generate TWDTW distances to the query subsequence classes Chili, Tomato, Cucumber, Rice, Maize, Trees, and Others. The TWDTWS algorithm uses a logistic function with parameters for steepness, midpoint and weight (Li and Bijker, 2019). The parameter settings adopted in this study were 0.2, 5 and 0.5 for the steepness, midpoint and weight parameters, respectively; these values were selected because they were reported by Li and Bijker (2019) as giving the best classification accuracy. Then the normalized TWDTW distance of each pixel to each class query was computed using the total sum of the TWDTW distances over all classes, following equation (3). As a result, the sum of all normalized TWDTW distances of a pixel is equal to 1, as suggested by Bezdek et al. (1984).

x = TWDTW / ∑TWDTW (3)

where x is the normalized TWDTW distance of a pixel to a class, TWDTW is the TWDTW distance of the pixel to that class and ∑TWDTW is the sum of the TWDTW distances of the pixel to all classes.
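The normalisation of equation (3) amounts to dividing each class distance of a pixel by the pixel's total; a minimal sketch, with hypothetical class names and distance values:

```python
def normalize_distances(twdtw_distances):
    """Normalize a pixel's TWDTW distances so they sum to 1 (equation 3):
    each class distance is divided by the pixel's total distance."""
    total = sum(twdtw_distances.values())
    return {cls: d / total for cls, d in twdtw_distances.items()}

# Hypothetical per-pixel TWDTW distances to four class queries.
pixel = {"Chili": 1.0, "Tomato": 3.0, "Cucumber": 2.0, "Rice": 2.0}
normalized = normalize_distances(pixel)
```

The relative ordering of the classes is preserved; only the scale changes, so the class with the shortest raw distance also has the smallest normalized distance.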

Fuzzification
Uncertainty in classification is caused by the close similarity of classes, which is not directly visible in hard classification. This is because even when two classes have almost equal (low) probability, still only the best class is selected. The use of fuzzy sets to assign class memberships can help in assessing uncertainties in the classification at the pixel level by using the difference in memberships of a pixel to several classes. In this way, the spatial distribution of uncertainty can be mapped, and the pattern of the areas with high uncertainty may reveal underlying causes of misclassification. Areas with high uncertainty can be excluded from classification or receive further investigation. In cases where mapping and sampling are limited to a number of classes of interest, excluding areas with high uncertainty can prevent pixels of classes that were not sampled from being wrongly labelled as belonging to one of the classes of interest.
A Gaussian membership function was used to compute fuzzy memberships so that the memberships change gradually, following the normal distribution of the Gaussian function (Siler and Buckley, 2004). The Gaussian membership function is a continuously differentiable curve with smooth transitions, unlike triangular or linear membership functions (Hameed, 2011). Furthermore, the Gaussian membership function assumes that the classes are normally distributed with an infinite support base (Siler and Buckley, 2004). This allows a pixel, especially along class boundaries, to have partial memberships to multiple classes of the classification scheme. This means that the classification of a pixel is composed of the best class membership and the runner-up class memberships (Hofmann, 2016). In fuzzy classification, the fuzzy memberships of each pixel should sum to unity (1) because the memberships inside a pixel are taken to match the land-cover proportions (Bezdek et al., 1984). However, the memberships of a pixel do not always have to sum to 1, as in the case of fuzzy memberships derived from an artificial neural network (Foody, 1996).
The Gaussian membership function is defined in equation (4) (Hameed, 2011):

μ = exp(−(x − x̄)² / (2σ²)) (4)

where μ is the membership value (or membership grade) obtained from the Gaussian membership function, x is the normalized TWDTW distance to a class query sequence, and x̄ and σ are the mean and standard deviation of the normalized TWDTW distances to the class query sequences, respectively. Fuzzy memberships were normalized so that the sum in each pixel equals 1, using equation (5):

μ′ = μ / ∑μ (5)

where μ′ is the normalized fuzzy membership, μ is the original fuzzy membership of a pixel to a class and ∑μ is the sum of all original fuzzy memberships of the pixel.
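The fuzzification step can be sketched as follows. The interpretation that the Gaussian centre and width are the mean and standard deviation of the pixel's own normalized distances follows the stated definition of equation (4) and is an assumption here; the class names are hypothetical:

```python
import math

def gaussian_memberships(norm_distances):
    """Fuzzify one pixel: Gaussian membership (equation 4) computed from
    the pixel's normalized TWDTW distances, then renormalized so the
    memberships sum to 1 (equation 5)."""
    values = list(norm_distances.values())
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    sigma = math.sqrt(var) or 1.0  # guard against zero spread
    # Equation (4): Gaussian membership grade per class.
    mu = {cls: math.exp(-((d - mean) ** 2) / (2.0 * sigma ** 2))
          for cls, d in norm_distances.items()}
    # Equation (5): renormalize so the pixel's memberships sum to 1.
    total = sum(mu.values())
    return {cls: m / total for cls, m in mu.items()}
```

Because of the renormalization, the resulting memberships always lie in (0, 1) and sum to unity, which is the property the uncertainty measures in the next section rely on.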

Measures of uncertainty
The indices used to measure uncertainty included the Confusion Index (CI), Ambiguity Index (AI) and fuzziness.

Confusion index
The CI is a similarity measure that quantifies how the fuzzy membership of the best class differs from that of the first runner-up class; CI ranges from 0 to 1. CI values close to 1 mean that the first runner-up class has a membership grade similar to the best class, i.e., high uncertainty. CI values close to 0 indicate that the runner-up class and the best class are not similar, hence the uncertainty is low (Hofmann, 2016). The formula for CI is shown in equation (6), adopted from Burrough et al. (1997):

CI = 1 − (μ_0 − μ_k) (6)

where μ_0 is the membership of the best class and μ_k is the membership of the first runner-up class.

Ambiguity Index
The AI is a measure of the uncertainty between the best possible membership grade and the best-achieved membership of a pixel. In this case, the best possible membership for a pixel is 1. The range of AI is from 0 to 1; AI = 0 indicates that there is no uncertainty in the fuzzy classification, while AI = 1 means that the uncertainty is high (Hofmann, 2016). The formula for AI is shown in equation (7), adopted from Burrough (1996):

AI = 1 − μ_0 (7)

Fuzziness
Fuzziness is defined as the measure of the extent to which a set is fuzzy rather than crisp (Siler and Buckley, 2004). Fuzziness is higher when more memberships in a pixel are close to 0.5, making the classification fuzzier. For example, if a pixel has two equal memberships of 0.5 for the best class and the runner-up class, its fuzziness is 2 (Siler and Buckley, 2004). Conversely, fuzziness decreases when more class memberships are close to 0 or 1 (Hofmann, 2016). Fuzziness can be computed using equation (8):

Fuzz_1 = ∑ (1 − |2μ_i − 1|) (8)

where Fuzz_1 is the fuzziness and μ_i is the membership of the pixel to class i, summed over all classes.
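The three uncertainty measures can be computed from a pixel's normalized memberships as follows. The CI form used here is the Burrough et al. (1997) expression 1 − (μ_0 − μ_k), and the fuzziness sums 1 − |2μ − 1| over the classes, consistent with the example of two 0.5 memberships giving a fuzziness of 2:

```python
def uncertainty_measures(memberships):
    """Pixel-level uncertainty from a list of normalized fuzzy
    memberships: Confusion Index (CI), Ambiguity Index (AI) and
    fuzziness (Fuzz_1)."""
    ranked = sorted(memberships, reverse=True)
    mu0, muk = ranked[0], ranked[1]          # best and first runner-up
    ci = 1.0 - (mu0 - muk)                   # equation (6)
    ai = 1.0 - mu0                           # equation (7)
    fuzz = sum(1.0 - abs(2.0 * m - 1.0)      # equation (8)
               for m in memberships)
    return ci, ai, fuzz
```

A pixel with one membership of 1 and the rest 0 yields CI = AI = Fuzz_1 = 0 (no uncertainty), while two tied memberships of 0.5 yield CI = 1, AI = 0.5 and Fuzz_1 = 2 (maximal confusion between the top two classes).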

Defuzzification
Defuzzification is a process by which elements in a fuzzy set are converted to a crisp value deemed to be the best representative of the fuzzy set (Klir and Yuan, 1995). The defuzzification process is performed to achieve a better understanding of the output of the fuzzy classification (Onashoga et al., 2018). Matsakis et al. (2000) mentioned that during defuzzification, pixels have to be assigned to a class to which they have the highest membership.

Decision rules for defuzzification
Defuzzification involves setting decision rules based on a user-defined threshold value on AI, CI and fuzziness. Pixels with values below the threshold were defuzzified and assigned to the class to which they have the highest membership (Islam and Metternicht, 2005). Pixels with values above the selected threshold remain unclassified (Hofmann, 2016). Threshold values of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9 were applied for defuzzification of AI and CI. For fuzziness, threshold values of 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6 and 1.8 were applied during defuzzification. The threshold ranges were chosen based on the range of the respective measures (AI, CI and fuzziness). A defuzzification threshold on fuzzy membership was also applied, with threshold values of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9. In the case of defuzzification based on fuzzy membership, pixels with membership below the threshold remain unclassified, while those above the threshold are assigned to the class with the best membership. The defuzzification threshold on fuzzy membership was applied to assess the certainty of the classification of pixels, which can give better overall accuracy results.
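The decision rule can be sketched as follows. The default thresholds are the combination reported in this study as giving the highest overall accuracy; combining all four measures in one rule is an illustrative choice (in the study each threshold was applied separately), and the class names are hypothetical:

```python
def defuzzify(memberships, ci, ai, fuzz,
              ci_max=0.6, ai_max=0.3, fuzz_max=1.2, mu_min=0.7):
    """Decision-rule sketch: assign the pixel to the class with the
    highest normalized membership only if the uncertainty measures
    pass their thresholds; otherwise leave it unclassified (None)."""
    best_class = max(memberships, key=memberships.get)
    mu0 = memberships[best_class]
    if ci <= ci_max and ai <= ai_max and fuzz <= fuzz_max and mu0 >= mu_min:
        return best_class
    return None  # pixel remains unclassified

# Hypothetical pixel: a clear winner with low uncertainty.
pixel = {"Chili": 0.9, "Tomato": 0.05, "Others": 0.05}
label = defuzzify(pixel, ci=0.15, ai=0.1, fuzz=0.4)
```

Returning `None` for high-uncertainty pixels is what produces the unclassified areas in the final vegetable map, trading map completeness for higher accuracy on the pixels that are labelled.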

Accuracy assessment
The accuracy assessment metrics used to evaluate the classification results for the defuzzification output and TWDTWS were User's accuracy, Producer's accuracy and overall accuracy (Congalton, 1991). The kappa coefficient was also assessed on the classification products (Cohen, 1960). A total of 633 reference points from the validation samples were used for the accuracy assessment, validating what had been classified against what is on the ground.

Results

Growth patterns
Fig. 3 shows the growth patterns of Chili, Tomato, Cucumber, Rice, Maize, Trees, and Others using the VH, VV, and VH minus VV signals, respectively, plotted against the Day of Year (DOY). The curves show that, to some extent, the classes Chili, Tomato, Cucumber, Rice, Maize, and Others can be separated from each other better using the VH signal than the VV and VH minus VV signals. However, other measures such as TD can be applied to assess the class separability of the crops.

Class separability
Tables 1 to 3 show the results for class separability using TD from the VH, VV and VH minus VV polarisations, respectively. The results indicate that the class pairs Chili-Tomato, Chili-Cucumber, Chili-Rice, Tomato-Cucumber, Tomato-Rice, Cucumber-Rice, Tomato-Trees, Cucumber-Trees and Rice-Trees obtain the highest value of 2000 for VH. The lowest TD of 1713 was recorded for the Maize-Others class pair in the VH polarisation. The TD values for the class pairs using VV are lower than those using VH (Table 2). The lowest TD values are obtained using VH minus VV, as observed in Table 3.

Measures of uncertainty
The results of the measures of uncertainty based on the CI, AI and fuzziness account for pixel uncertainties in the study area. For example, the fuzzy membership to the best class is higher in agricultural fields and lower in areas that are mostly tree-covered or mountainous, as illustrated in Fig. 4. The spatial distributions of uncertainty for CI, AI and fuzziness show a similar pattern in terms of where uncertainty is higher and where it is lower, as can be noted from Figs. 5, 6 and 7, respectively. The CI values obtained range from 0.02 to 0.83: a pixel with a CI value of 0.02 has low confusion, while a pixel with a value of 0.83 has high confusion in the assignment of a class. For AI, the values range from 0.01 to 0.42: a pixel with an AI value of 0.01 has low uncertainty, while a pixel with an AI value of 0.42 has higher uncertainty. For fuzziness, the values obtained range from 0.04 to 1.67: a pixel with a fuzziness value of 0.04 has low uncertainty, while a pixel with a value of 1.67 has higher uncertainty.
The measures of uncertainty using CI, AI and fuzziness show a similar spatial distribution of the extent of the uncertainties. Though their spatial distributions are similar, CI is the only measure that focuses on the difference between the fuzzy memberships of the best class and the runner-up class. AI focuses on how the best class differs from the best possible membership, i.e., 1. Fuzziness, on the other hand, measures how fuzzy the memberships are, i.e., how far the memberships of the best and runner-up classes are from crisp values (0 or 1). Hence, these measures have shown their significance in highlighting pixels with high uncertainty.

Table 4 shows the percentage of pixels classified using different thresholds for CI, AI, fuzziness, and the normalized fuzzy membership to the best class. The results show that the thresholds AI<=0.3, CI<=0.6, Fuzz_1<=1.2 and μ>=0.7 each classify 51.74% of the pixels. On the other hand, CI<=0.7 and Fuzz_1<=1.4 classify 95.64% of the pixels. The classified pixels are the ones that satisfy the uncertainty threshold conditions; it has to be noted that some of these pixels might still be wrongly classified (see Section 4.5). The number of classified pixels increases with an increasing threshold value for AI, CI and fuzziness. In contrast, for the normalized fuzzy membership, the number of classified pixels decreases with an increasing threshold value. Table 5 presents the overall accuracy and kappa coefficient. The highest overall accuracies were obtained with AI<=0.3, CI<=0.6, Fuzz_1<=1.2 and μ>=0.7, with an overall accuracy of 0.86 and a kappa coefficient of 0.83. For TWDTWS, the overall accuracy and kappa coefficient were 0.72 and 0.68, respectively.

Accuracy assessment
The results for User's and Producer's accuracies are presented in Figs. 8 and 9, respectively. The User's accuracy is highest for the defuzzification thresholds AI<=0.3, CI<=0.6, Fuzz_1<=1.2 and μ>=0.7, followed by the defuzzification thresholds CI<=0.7 and Fuzz_1<=1.4, with TWDTWS lowest. It has to be noted that the User's accuracy shows that all classes are well classified. This means the results are reliable, as the map generated represents what is really on the ground (Story and Congalton, 1986). However, the Producer's accuracy for the Maize and Trees classes is lower for all the classification criteria applied in this study.
Figs. 10 and 11 show the vegetable map from the defuzzification result with the highest overall accuracy and the vegetable map from the TWDTWS classification, respectively. The map generated by our method shows that most pixels remain unclassified. A close examination of the validation samples shows, however, that fewer pixels of the classes Chili, Tomato and Cucumber remain unclassified compared to the other classes, as illustrated in Fig. 12. Similarly, not all classified pixels are necessarily classified correctly, as can be observed from the User's and Producer's accuracies.
The defuzzification thresholds AI<=0.3, CI<=0.6, Fuzz 1 <=1.2 and μ >= 0.7 classify pixels at the same spatial locations (Fig. 13). Similarly, the unclassified pixels share the same spatial locations across these defuzzification thresholds.

Class separability and growth patterns
The TD results show that all classes are more separable using VH polarisation compared to VV and VH minus VV. A TD value of 2000 means a class pair is excellently separable, while TD values below 1700 indicate poor separability (Jensen, 2015). In this study, Chili, Tomato and Cucumber are excellently separable using VH. The class pairs Tomato-Trees, Cucumber-Trees and Rice-Trees also have high TD values of 2000. However, the class pair Maize-Others shows lower separability, with a TD value of 1713. As for the VV polarisation, most of the classes are still separable, but less well than with VH. All class pairs with Maize, as well as Tomato-Trees and Rice-Trees, are not separable with VV. In the case of VH minus VV, the only class pairs that are separable are Chili-Rice and Tomato-Rice. Hence, VH is better suited for classification and also for estimating the cropping cycles of the crops, as all classes are separable.
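Transformed Divergence for a class pair can be computed from the class means and covariance matrices using the standard formulation (as in Jensen, 2015), which scales divergence to a saturating 0-2000 range. A sketch:

```python
import numpy as np

def transformed_divergence(m1, c1, m2, c2):
    """Transformed Divergence between two classes with means m1, m2 and
    covariance matrices c1, c2, scaled to the range [0, 2000]."""
    c1i, c2i = np.linalg.inv(c1), np.linalg.inv(c2)
    dm = (m1 - m2).reshape(-1, 1)
    # Divergence: covariance-shape term + mean-separation term
    div = (0.5 * np.trace((c1 - c2) @ (c2i - c1i))
           + 0.5 * np.trace((c1i + c2i) @ dm @ dm.T))
    return 2000.0 * (1.0 - np.exp(-div / 8.0))
```

Identical class statistics give TD = 0, while classes with well-separated means saturate towards 2000, matching the "excellently separable" interpretation used above.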
Growth patterns of the crops are of use in understanding the time-varying characteristics of crop growth, because different crops have specific time-varying features (Gao et al., 2020). There are overlaps in the growth patterns of the crops due to the growth of stems and leaves at some growth stages, which affects the scattering mechanism of the backscatter signal (Gao et al., 2020). The growth pattern curves with VH show more distinctive trends than those with VV and VH minus VV, so estimation of the crop cycle is better with VH. For instance, the approximate lengths of the cropping cycles are: Chili 96 days (DOY 145 to 241), Tomato 60 days (DOY 229 to 289), Cucumber 48 days (DOY 241 to 289), Rice 120 days (DOY 121 to 241) and Maize 72 days (DOY 145 to 217). These approximate crop cycles derived from the growth pattern curves closely match those reported by Li and Bijker (2019).
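The cycle lengths quoted above follow directly from the start and end day-of-year (DOY) values:

```python
# Approximate cropping cycles as (start DOY, end DOY), taken from the text
cycles = {"Chili": (145, 241), "Tomato": (229, 289), "Cucumber": (241, 289),
          "Rice": (121, 241), "Maize": (145, 217)}

# Cycle length in days = end DOY - start DOY
lengths = {crop: end - start for crop, (start, end) in cycles.items()}
```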

Spatial pattern of CI, AI and fuzziness
Lower CI values mean that the classification has lower uncertainty, while higher CI values indicate higher uncertainty, i.e., confusion between the best class and the runner-up class. Similarly, lower values of AI mean lower uncertainty, while higher values of AI mean a larger gap between the achieved membership of the best class and the best possible membership. Lower fuzziness values likewise mean that the classification in those pixels has lower uncertainty, while higher values mean higher uncertainty. In short, the classification is better in pixels where these measures, and hence the uncertainty, are lower.
CI, AI and fuzziness show similar patterns, with lower values in agricultural fields and higher values in areas covered by trees. Hence, according to these measures of uncertainty, the classification performed better in classifying vegetables than in classifying trees. The lower values of CI, AI and fuzziness in the agricultural fields can be attributed to the homogeneity of the crops in the pixels. This means that a pixel in a crop field has a much higher fuzzy membership to the best class than to the runner-up class (Burrough et al., 1997), leading to lower CI and fuzziness values, as well as a high fuzzy membership of the best class, leading to a lower AI. In contrast, pixels covered by Trees may include a mixture of species and some gaps with different vegetation. Such a mixed pixel directly affects the area proportion which each class in the pixel covers (Chhikara, 1984). This becomes obvious when assigning fuzzy memberships, and results in closely similar fuzzy memberships for the best class and the runner-up class (Hofmann, 2016).
Higher CI, AI and fuzziness in some pixels belonging to agricultural fields could be due to crops being grown which differ from the sampled crops: in the study area, more types of vegetables were grown, which were neither sampled nor included in the class Others. Other reasons could be variation in management practices, amounts of weeds and soil nutrients influencing the growth of the crops. Since the radar signal is sensitive to soil moisture and to the water content in the vegetation canopy, farm management practices such as irrigation can influence the radar backscatter at the time of image acquisition (Huang et al., 2015). When a field is not flat, this can introduce heterogeneity in moisture distribution and crop growth rate, which in turn affects the TWDTWS classification during the generation of the TWDTW distances.

Defuzzification results
The validation has shown that the defuzzification results have a better overall accuracy and kappa coefficient than those achieved by the TWDTWS algorithm (Table 5). This is because during defuzzification most pixels that were wrongly classified by the TWDTWS classifier remain unclassified: after fuzzy classification of the TWDTW distances, the pixels with high uncertainty are removed. The results obtained in this study demonstrate that fuzzy classification of TWDTW distances can be used to improve classification accuracy, as reported by Wang (1990).
However, a trade-off has to be made between the accuracy and the number of pixels left unclassified (Hofmann, 2016). In our case, it is not unreasonable to allow a substantial part of the pixels to remain unclassified, because not all classes present in the area were sampled, as they were not all of interest to the project. The unclassified pixels can also represent classes characterized by high intra-class variance. The Tree class, for example, exhibited great variation in species composition and we therefore expected that parts of the land covered by trees would remain unclassified after defuzzification. In cases where the goal is not to map all classes over an entire area, but only specific classes of interest, sampling can be limited to those classes, and fuzzy classification of TWDTW distances can be used to improve accuracy and to exclude pixels that do not belong to the classes of interest, without the need to sample those other classes as well. Furthermore, the pixels which remain unclassified either belong to classes which were not sampled, or to classes with high uncertainty, such as Trees. The study area has a relatively high proportion of its land cover occupied by trees, which is the class with the highest uncertainty and, as a result, a higher number of unclassified pixels. For the Chili, Tomato and Cucumber classes, a smaller number of pixels were left unclassified, which means that our method was able to correctly classify most of the vegetable pixels. Finally, the approach applied in our study can be useful in assessing the quality of samples collected in the field.
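The trade-off between accuracy and the number of unclassified pixels can be explored by sweeping a defuzzification threshold. A hypothetical sketch using only the CI measure (the function and its inputs are our own illustration, not the paper's procedure):

```python
import numpy as np

def threshold_tradeoff(mu, truth, ci_thresholds):
    """For each CI threshold, report the fraction of pixels classified and
    the accuracy among those classified pixels.

    mu: (n_pixels, n_classes) fuzzy memberships; truth: (n_pixels,) labels.
    """
    mu_sorted = np.sort(mu, axis=1)[:, ::-1]
    ci = mu_sorted[:, 1] / mu_sorted[:, 0]   # runner-up / best membership
    pred = mu.argmax(axis=1)
    rows = []
    for t in ci_thresholds:
        keep = ci <= t                        # pixels passing the threshold
        frac = keep.mean()
        acc = (pred[keep] == truth[keep]).mean() if keep.any() else float("nan")
        rows.append((t, frac, acc))
    return rows
```

A stricter threshold leaves more pixels unclassified but tends to raise the accuracy among the pixels that are classified, mirroring the trade-off discussed above.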

Conclusion
This study has shown that TWDTW distances can be used to assess the uncertainties of SAR image classification using CI, AI and fuzziness. It has to be noted, however, that each of the measures CI, AI and fuzziness has a specific emphasis in terms of how uncertainty is determined. CI puts emphasis on the closeness between the best class and the runner-up class; AI emphasizes the difference between the best possible fuzzy membership and the achieved membership of the best class; fuzziness looks at memberships close to 0.5 for both the best and the runner-up class. The findings of this study show that individual defuzzification rules for CI, AI and fuzziness with an equal number of classified pixels produce similar results in terms of spatial pattern and accuracy.
Furthermore, the fuzzy approach used in this study achieved a better overall accuracy of 0.86 and a kappa coefficient of 0.83, compared to the TWDTWS classification with an overall accuracy of 0.73 and a kappa coefficient of 0.68. The higher accuracy of this fuzzy approach comes at the expense of leaving pixels with high uncertainty unclassified, which is suitable when only a limited number of classes of interest are sampled and mapped, as in our case.
The approach used in this study has shown that fuzzy classification of TWDTW distances can be applied in further work to assess the representativeness of training and validation samples. In this way, locations with high uncertainty can be excluded, or receive additional investigation or sampling, depending on the purpose of the work.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.