Mapping saffron fields and their ages with Sentinel-2 time series in north-east Iran

Saffron (Crocus sativus L.) is the most expensive spice worldwide and is predominantly produced in the Khorasan Province situated in north-east Iran. Climatic shifts and lowering groundwater tables negatively affect saffron yields in this region, which are determined by environmental factors, agronomical practices, and crop age. Nonetheless, spatially explicit information on changes in saffron cultivation is scarce, underlining a need for better monitoring tools. This study aims to evaluate the utility of Sentinel-2 (S2) time series in accurately mapping saffron fields and their ages (i.e., how many years saffron was cultivated in a field), based on its unique phenology. To separate saffron from other land covers, we first derived 252 spectral-temporal features by calculating 21 spectral features (10 individual bands plus 11 vegetation indices) for each of the 12 months. A Random Forest (RF) algorithm was then used in combination with field data to retain only features of high importance for saffron classification. These features comprised vegetation indices that incorporated spectral information from red, and nearand shortwave infrared bands during the phenological phases of the rapid greenup (February to March) and the dormant period (August to October). The RF classifier resulted in a saffron map for the year 2019 with a high classification accuracy based on these features. Compared against an independent in-situ saffron field dataset, 87.6% of the existing fields were correctly classified as saffron. To assess saffron field ages, we analysed the spectral separability of different age groups using the NDVI time series. We found that NDVI levels between December and May allowed for effectively separating 1st, 2nd, 3rd, 4th-6th, and 7th-8th year saffron fields. An RF-based classification of field ages resulted in an overall accuracy of 86.8%. This study demonstrated that S2 time series data allow for accurately mapping saffron fields and their age groups. Our findings provide a solid basis for mapping saffron across larger areas and for monitoring changes in saffron distribution. Such information is crucial for understanding how anthropogenic and climate change impacts will affect the future of saffron cultivation.

Iran is the dominant producer of saffron products worldwide, accounting for more than 90% of the world's saffron exports and 60% of the global saffron cultivation area (Ghorbani, 2007). Saffron cultivation has stimulated the local economy and created vast job opportunities for about 400,000 people (Esmaeilpour & Kardavani, 2011). The saffron crop adapts well to the hot arid climatic conditions in Iran. The saffron daughter corms can survive inside the soil during the hot summer due to their heat-tolerant characteristics and require a period of cold (<8 • C) temperatures for effective daughter corm development. Its cultivation has low fertilizer and water requirements, and is therefore suited for low-input organic cultivation practices.
Saffron yields are variable in space and time, with field age being a major yield-determining factor. Usually after initial planting, fields remain productive between five and eight years without the need to replant the corms. The dried stigma yield can vary from 1.5 to 15 kg/ha; it is relatively low in the first year and increases to the maximum level in the third or fourth year (Kumar et al., 2009). After that, the production declines because of the reduced size and reproduction capability of corms, as well as the increasing competition between corms for water and nutrients (Kumar et al., 2009). Besides field age, agronomic management and environmental factors also influence flower productivity and yield. Research showed that larger corm size and earlier sowing time positively affect stigma production (De Mastro & Ruta, 1993;Gresta et al., 2008). Irrigation shortly before saffron flowering influences the anthesis duration, flower density, and stigma quality. Due to the short height and narrow leaves of saffron vegetation, frequent weeding is needed to reduce competition for nutrients, sunlight, and water (Esmaeilpour & Kardavani, 2011;Ghorbani & Koocheki, 2017). Hosseini et al. (2008) found that stresses from environmental changes threaten the sustainability of saffron production and may demand geographic shifts to locations where the climate condition and groundwater availability become more suitable for saffron growth. Therefore, monitoring the saffron field distribution and field age information is an important first step to understanding how environmental changes affect saffron cultivation and yields.
Optical satellite imagery has been widely used to map and monitor cropland for large areas (Wardlow et al., 2007;Xie et al., 2019). Three studies, published in Persian, explored how mono-temporal optical imagery could help map the distribution of saffron at the county level (Rahimzadegan & Pourgholam, 2017;Dehghani Bidgoli et al., 2018;Farzadmehr & Bajestani, 2018). These studies suggested that satellite imagery has potential for mapping saffron, but also highlight limitations. First, Dehghani Bidgoli et al. (2018) pointed out that the classification accuracy of small fields is not satisfactory (overall accuracy of 62% for fields under 0.2 ha, 72% for fields between 0.2 and 0.5 ha), which is undesirable given that these sizes are common for many fields in the area (Behdani et al., 2008). Second, Rahimzadegan & Pourgholam (2017) found that saffron was often confused with winter wheat due to their similar fractional vegetation cover in the early greenness stage. Rather than focussing on a single-date image for classification, previous research demonstrated that spectral time series data allow for more accurate crop classification (Chang et al., 2007;Zhong et al., 2014;Arvor et al., 2011). Multi-temporal information can capture the seasonal dynamics of crops, which often differ between crop types (Foerster et al., 2012;Pan et al., 2012).
Due to its high spatial resolution (10-60 m) and five-day revisit interval globally, multi-temporal imagery from the Multi Spectral Instrument (MSI) onboard the two Sentinel-2 (S2) satellites has been increasingly used in cropland classification and phenology studies in recent years (Gómez et al., 2016;Belgiu & Csillik, 2018;Vrieling et al., 2018;Cheng et al., 2020). In the main saffron-growing region of Iran, the limited cloud cover and relatively dry conditions result in frequent cloud-free S2 observations throughout the year. Therefore, the S2 time series would allow for identifying the spectral-temporal differences between saffron and other crops, leading to improved classification accuracy compared to the previous mono-temporal classifications.
Besides identifying where saffron fields are, saffron field age information is important due to the strong link between crop age and saffron production (Kumar et al., 2009). Given that spectral properties may change with plant age, optical remote sensing data provide opportunities to retrieve landscape-level age information. For example, Knox et al. (2013) demonstrated the possibility to classify grass age based on vegetation indices that were derived from spectra l measurements collected in a greenhouse setting. Similarly, Carreiras et al. (2017) found a linear relationship between spectral reflectance and age of a secondary forest. However, age classification for perennial crops with relatively short life spans (less than ten years), such as saffron, has not yet been performed. Nonetheless, given that saffron field age determines the expected amount of biomass, we hypothesized that multi-temporal S2 spectra could discern different field age groups.
This study aims to evaluate if S2 time series allow for a) accurately mapping saffron fields, and b) classifying saffron plant age at the field level in Torbat Heydarieh County in north-east Iran.

Study area and saffron phenology
The study area is located in the eastern region of Torbat Heydarieh County, Razavi Khorasan Province in north-east Iran (Fig. 1). It covers an area of 506 km 2 situated between two mountain chains that run in an east-west direction. We purposively selected this area to cover previously-acquired in-situ data (see Section 2.2). The area has a cold semi-arid steppe climate (BSk) according to the Köppen-Geiger classification (Peel et al., 2007). The mean annual temperature and precipitation are 26 • C and 236 mm, respectively. Fig. 2 shows the average seasonal variability of precipitation and temperature, indicating a combination of a relatively mild winter, rainy spring, and hot dry summer. The county has the largest saffron cultivation area in Iran, which covers more than 30% of the county's total cultivated area. Other main crops include barley, winter wheat, spring wheat, pistachio, and alfalfa. Fig. 3 presents the crop calendar, which shows the average period of the growth stages for the major crops in the study area. We defined the crop calendar through consultation with experts and literature research (USDA-FAS, 1999;Chao et al., 2003;Mahfoozi et al., 2006;Lopez Corcoles et al., 2015).
In the first year of saffron establishment on a 'new' field, saffron corms are planted during real dormancy (from May to the end of June) or pseudo-dormancy (from early July up to early September) stages. After planting, corms stay in the same field for five to eight years, during which saffron remains productive (Gresta et al., 2008). Flowering starts between two and three weeks after irrigation when daytime air temperatures range from 12 to 18 • C and night-time temperatures from 4 to 8 • C, respectively, between mid-October and early November (Alizadeh et al., 2009). When most flowers bloom and have red stigmas, farmers harvest the flowers manually, and the most precious parts (the stigmas) are subsequently separated and dried. The vegetative stage starts immediately after flowering from the end of November and lasts until late May (Koocheki & Khajeh-Hosseini, 2019). During this stage, the leaves reach maturity and provide necessary supplies for corm development through photosynthesis (Khazaei et al., 2013). In June, the leaves start to senesce, and the daughter corms become dormant to prepare for the next growing season (Behdani & Fallahi, 2015). One problem of this perennial crop is that gradually soil fertility reduces and corm density becomes too high over time, which can lead to soil overexploitation and dramatic yield reduction (Gresta et al., 2016). To solve this problem, farmers usually leave the field fallow or grow other crops for multiple years after the five to eight year saffron cycle.

Field data
The main reference data to train our models, containing locations of saffron fields and other crops, was collected during the early vegetative stage of saffron (between 9 and 23 December 2019). The number of sample fields for each crop was determined according to their approximate proportion in this region, as obtained from local agricultural statistics from the Ministry of Agriculture of Iran. Despite that field-data collection was done in December 2019, our saffron classification focussed on the July 2018 -July 2019 period (see Section 3.1). For crops other than saffron, the inspection of crop residues in the field permitted obtaining crop types for that period, while for saffron, it was generally clear if the field age was greater than one year. When in doubt, the farmer of the field was interviewed, resulting in an accurate training data set referring to crops present in the field during the July 2018 -July 2019 period. Geographic coordinates were recorded for the corners that delimit each field's boundary using a Garmin ETrex 30x GPS device set to record in the WGS 84/ UTM 40 N coordinate system. This dataset 'A' comprised 119 fields, including 30 saffron fields, 19 winter wheat fields, 7 spring wheat fields, 11 pistachio fields, 4 barley fields, 4 alfalfa fields and 44 bare soil areas. The area of these field samples varied between 0.7 and 22 ha. Besides this dataset, an additional dataset containing 213 locations of saffron fields (named dataset B) were available for this study based on a survey performed between July 2018 and February 2019. Dataset A was used for training and testing for feature selection and classification (Section 3.1). Dataset B was used to assess the accuracy of the constructed saffron map (Section 3.2). The distribution of these  sample fields is shown in Fig. 1.
To perform age-based classification (Section 3.3), we used planting year information of all saffron fields in dataset A; this information was obtained during the interviews with local farmers. In addition, during the field survey, nadir photographs were taken from a fixed height (about 1.3 m) at multiple positions within each saffron field to reflect the vegetation density of different saffron crop age. Fig. 4 presents examples of these photographs, which illustrates a higher vegetation density in older saffron fields.

Sentinel-2 data
The S2 mission comprises two polar-orbiting satellites, S2A and S2B, which share the same orbit but phase at 180 • (Fletcher, 2012). They were launched by the European Space Agency (ESA) on 23 June 2015 and 7 March 2017 as part of the Copernicus programme. The combination of the two satellites provides a five-day revisit interval for the same location on the earth surface. Areas covered by overlapping orbits have an even shorter revisit time, but this is not the case of Torbat Heydarieh. Each S2 satellite carries the MSI sensor with 13 spectral bands at 10-60 m spatial resolution.
A set of 259 Level-1C S2 images for tile 40SGE between 20 August 2015 and 30 June 2019 were downloaded from the Copernicus Open Access Hub and the USGS Earth Explorer website. Level-2A Bottom-Of-Atmosphere (BOA) products processed by ESA were only available from 15 December 2018 onwards for our study site, and consequently did not match the full timeframe of this study (i.e., July 2018 -June 2019 for saffron mapping, August 2015 -June 2019 for age classification, see Sections 3.1 and 3.3). Therefore, we used Level-1C Top-Of-Atmosphere data for the full time series and applied the Sen2Cor processor (version 2.8) for atmospheric correction of each scene. One of the Sen2Cor outputs is the 20 m resolution Scene Classification (SCL) layer, which was used to mask out any observation classified as no data (0), saturated or defective (1), dark area (2), cloud shadow (3), cloud medium probability (8), cloud high probability (9), and snow (11). We further applied a 20 m buffer to this mask to reduce possible edge effects. Spectral information from ten bands (Table 1) with 10 m and 20 m resolution were utilized in this study. The masked images of 20 m resolution bands were resampled  to 10 m, and then all these masked and resampled images were clipped to the study area extent. For each of the 10 bands, we produced an image stack that contains 259 layers, each layer representing a single date of observation.

Methods
Fig. 5 presents the general workflow adopted in this study to examine the utility of S2 time series in accurately mapping saffron fields and classifying saffron plant age. The following subsections describe in detail the methods used in this study.

Spectral-temporal feature selection for saffron identification
To identify which spectral-temporal features are typical for saffron fields, we applied a feature selection framework. First, 252 spectraltemporal features were extracted from the S2 time series. Second, the Random Forest (RF) feature importance score was used to select a subset of features that achieved optimal accuracies in separating saffron fields from other crops or land cover.
Feature extraction is a fundamental step of image classification as it aims to define useful information from the satellite images in order to achieve a specific purpose, such as crop classification. S2 MSI provides spectral information in the visible and near-infrared (VNIR), and shortwave infrared (SWIR) regions that are sensitive to crop characteristics such as leaf pigments, leaf area index, water content and nonphotosynthetic components (Peña-Barragán et al., 2011). Moreover, VIs combine information from spectral bands to enhance the vegetation signal and as such, are useful in crop classification. They can characterise the crop behaviour during different phenological phases in relation to vegetation status, residue cover, and canopy structure (Bégué et al., 2011;Huete et al., 2002).
This study used all S2 images acquired between 1 July 2018 and 30 June 2019, which cover the saffron growth cycle from planting to dormancy. Multiple satellite acquisitions in a single month were summarized at pixel level to average monthly spectra or VI (i.e., 12 temporal features per year) to reduce the feature dataset. The spectral features consist of the spectral reflectance of 10 individual bands (Table 1) and 11 VIs (Table 2) related to the biochemical and biophysical properties of specific crops. The VIs were divided into six groups according to the spectral regions used.
The combination of 21 spectral features for 12 months resulted in a high-dimensional dataset of 252 spectral-temporal features. Such a large feature set is not desirable due to multicollinearity between features and because it increases the computation time of classification algorithms, often with no or limited improvement in accuracy (Xie et al., 2019;Hao et al., 2015;Löw et al., 2013;Hu et al., 2019). To remove redundant and irrelevant variables prior to image classification, various feature selection (variable elimination) methods have been used in remote sensing studies (Chandrashekar & Sahin, 2014;Loosvelt et al., 2012;Xie et al., 2021). We performed RF feature importance score ranking to select a subset of features relevant for saffron identification. This method can outperform other algorithms due to its ability in handling high- dimensional input variables and its robustness to over-fitting (Pal & Foody, 2010). The final importance of each variable is determined by the permutation importance, which is calculated based on the increase of misclassification rate after permuting a certain feature (Goldstein et al., 2011). Training data was generated from the pixels inside the 119 field samples (dataset A). To reduce the influence of heterogeneous pixels at field boundaries, a 10 m internal buffer was created for each field parcel. Each training sample was a combination of a) the spectral-temporal information (i.e., 252 features) for the pixel, and b) the corresponding crop type or land cover. We set the number of trees at 500 for convergence of the out-of-bag error (OOB error) statistics. To select the number of features to split nodes, we calculated the square root of the total number of input features as commonly recommended (Gislason et al., 2006;Belgiu & Drȃgu, 2016). The classifier was repeated 50 times. Each time 70% of the 3,797 pixels within the 119 field samples were randomly selected as training data for each class, and the remaining 30% (1139 pixels) were used as test data for cross-validation. Eventually, the feature importance was calculated as an average of the 50 runs. The feature importance was assessed using the Scikit-Learn package in Python 3.7. According to the ranking order of importance scores, the performance of different numbers of input features for saffron classification was assessed. The retained number of features for the classification was then selected as the lowest number for which the overall accuracy (OA) curve satuated (i.e., only showed negligible increase).

Saffron field mapping and accuracy assessment
RF is one of the commonly used classification algorithms that has shown good performance for cropland classification (Tatsumi et al., 2015;Sonobe et al., 2014;Ok et al., 2012;Pal, 2005). The RF classifier consists of an ensemble of decision trees whereby each tree is constructed on a random subset from the full set of input variables. The final classification results are obtained by taking the majority voted class in the forest. In this study, the selected spectral-temporal features in Section 3.1 were used as input variables for the RF classifier. The RF parameter settings were kept the same as for the training model described in Section 3.1.
Samples for training and testing the RF classifier were extracted from dataset A as described in Section 3.1. Instead of all 252 spectraltemporal features, only the selected features were used as input variables. The classifier was initially validated using the 10-fold crossvalidation method. Each time a different 10% of the samples (one group) were used as testing data and the remaining 90% for training; this procedure was then repeated for all the ten groups. Finally, the cross-validated accuracy was calculated as the mean value of all evaluation scores. The performance of the RF classification model was evaluated in terms of common statistical measures derived from the confusion matrix, which include the overall accuracy (OA), user's accuracy (UA), producer's accuracy (PA), and kappa coefficient (Foody, 2002).
Further, the independent dataset B (Section 2.2) was used for an independent assessment of the saffron map accuracy. Pixels located inside the 213 polygons of dataset B excluding a 10 m internal buffer were used as validation data. This resulted in a total of 3,019 pixels with the label "saffron". The accuracy of the saffron map was assessed using the sensitivity (i.e. true positive rate) index, which describes what percentage of the 3,019 pixels were correctly classified as saffron (Altman & Bland, 1994). Based on dataset B, we could not calculate specificity (i.e., true negative rate) because no field samples of non-saffron fields were collected.

Saffron field age classification
According to the field survey (sample nadir photographs are shown in Fig. 4) and interviews with local experts on saffron cultivation, the vegetation density during the vegetative stage (from December to May) varies with saffron field age. Fig. 6 illustrates this using a four-year time series of field-level NDVI observations from S2. For the first field saffron was planted in 2015 (Fig. 6a), and for the second field planting took place in 2011 (Fig. 6b). Despite the smaller number of S2 observations before 2018, Fig. 6 (a and b) shows that there is a gradual increase of green vegetation cover in the first four years of cultivation, whereas, after this, it remains more stable.
Among the VIs used for feature selection (Table 2), NDVI is the most commonly used vegetation index, and correlates with vegetation canopy density and greenness (Myneni et al., 1995). For that reason, we applied it to analyse different field ages, focussing on the vegetative stage. We note however that EVI2 resulted in very similar temporal patterns. To investigate if saffron field age can be classified based on NDVI time series, the temporal variability of NDVI within fields of the same age (intra-class) and the separability between fields with different ages (inter-class) during the vegetation stage (December to May) was quantified using the Jeffries-Matusita (JM) distance. The JM distance ranges from 0 to 2, with larger values indicating that two compared classes are more distinct; in other words, their interclass variability is larger than their within-class variability (Jensen, 2005).
To analyse whether fields of different ages resulted in different NDVI characteristics, we first aggregated NDVI at the field level for the 30 sampled saffron fields of dataset A, for which we had age information. We then used the monthly NDVI between December and May (i.e., six bands) for 104 sample fields (i.e., 30 fields with age information in four growing seasons; not all fields have age information for all four years) as input to derive JM separability between different age groups. The separability between each pair of ages was calculated for single monthly NDVI to assess which months were most relevant for distinguishing fields with different ages. In addition, we calculated the JM separability for the combination of six-monthly NDVI values.
Based on the separability analysis, we identified which age groups could be discriminated, and which should be combined. The 104 saffron field samples were used as the training and validation dataset for RF classification. Each sample is a combination of a) the six-monthly NDVI values for December to May, and b) its field age information. Subsequently, the RF classifier (as in Section 3.1) was applied using 500 trees and 50 repetitions. This classification was performed only for those pixels that were identified as saffron in the previous step (Section 3.2). Here we show the classification for the 2018-2019 season, i.e., the most recent season with complete phenology at the time of our analysis. The average monthly NDVI between December and May in four growth seasons (2015-2019) for each of the 30 sample fields were combined and used as the training dataset. Cross-validation and accuracy assessment follow the same methods as described in Section 3.2. Fig. 7 presents the temporal profiles of 21 spectral features (10 spectral bands and 11 VIs) for saffron (black curves) and other crops (coloured curves) from July 2018 to June 2019. These temporal profiles were calculated by per-class averaging the spectral features of pixels within the 119 sample fields. Fig. 7 shows that the spectral differences between saffron and other classes vary over time. For example, the NDVI has low values between July and October for most crops apart from alfalfa. NDVI of saffron increases in November when its vegetative stage commences and reaches the highest value (>0.6) around March. The NDVI of other crops such as barley and winter wheat remain low until February. During summer (from July to early-September), most saffron fields have a higher reflectance in SWIR bands than most crops. During that time, saffron fields are dry and bare while other crop fields such as alfalfa and pistachio have green vegetation. After this summer period, reflectance of saffron fields declines along with the first irrigation in September, which increases the soil water content. Hence, visual examination of these plots indicates the potential to separate saffron from other crops based on its unique spectral-temporal features. Fig. 8 displays the RF feature importance score of each spectraltemporal feature for separating saffron from the other classes, averaged over the 50 classification runs. Towards yellow, the importance score increases, meaning that the feature contributes more to discriminating saffron from other classes. The highest scores are found in February for NDVI, EVI2, and NDII. With the average ranking result of each feature, the influence of the number of features on classification accuracy was assessed and shown in Fig. 9. The cross-validated OA of the classification increased with the number of input features until a saturation point was reached at 19 features (95.5%). These 19 features are comprised of 13 VIs and six spectral bands (Fig. 10a). Among the six VI groupings (Table 2), group A accounts for six, and group C accounts for four of the 13 VI features. Time wise, most features are concentrated in spring (from January to March), and a few features have a high importance value in August and September.

Spectral-temporal features for discriminating saffron from other crops
To assess potential multicollinearity between the features, we calculated the Pearson correlation coefficient between the 19 features (Fig. 10b). This revealed that NDVI, EVI2, NDII, NDWI, and NDTI from January to March were significantly correlated. The correlation between features from the same VI group was stronger than between groups, because VIs from the same group are calculated using the same combination of spectral regions resulting in similar temporal behaviour. The features band4_Feb, band12_Feb, and band12_Mar were negatively correlated with VIs in the same periods. Both VI and spectral band   features show a stronger correlation when they are obtained for the same month, as compared to features from different months. Considering the high correlation of features within the same groups and same time windows, we retained only six features (i.e., NDVI_Feb, NDRI_Oct, Band12_Feb, Band12_Mar, Band11_Aug, and Band11_Sep) as the optimal set of input features for saffron classification. Fig. 11 displays the full annual time series for the selected spectral features.

Accuracy assessment of the saffron cultivation map
The application of the RF classifier to the retained spectral-temporal features resulted in a classification of saffron fields across the study area (Fig. 12). Based on dataset A, the classification model has an OA of 95.5% and a kappa coefficient of 0.94 (Table 3). UA and PA were 96.3% and 93.6% for saffron and 97.9% and 98.8% for non-saffron classes, respectively. Omission errors (i.e., 18 saffron pixels classified as nonsaffron) were higher than commission errors. These omission errors related predominantly to saffron fields that were cultivated for the first year. Due to the low density of green vegetation, first-year saffron is hard to distinguish from winter crops such as barley and winter wheat. This difficulty is reinforced by the emergence of weeds in March within saffron fields, which is also the moment when winter crops start to grow. Fig. 12 shows the classified saffron cultivation area for the entire study area. When compared against field dataset B, the accuracy (here: sensitivity) of the saffron map is 87.6% (i.e., 2,645 pixels were correctly classified as saffron among the 3,019 saffron pixel samples derived from dataset B). The green polygons in Fig. 12 highlight saffron fields that were not (fully) classified as such. The underestimation of saffron was mostly caused by pixels near parcel edges, which have a combined spectral response of the saffron fields and their surroundings. Consequently, various small saffron fields (<0.2 ha) were incorrectly classified. Further separating the sensitivity results into field sizes resulted in a sensitivity of 91.7% for fields larger than 0.5 ha and 89.5% for fields between 0.2 and 0.5 ha whereas for small fields (<0.2 ha) this was only   Fig. 13 shows the average NDVI time series for saffron field ages and their associated standard deviation. Similar to Fig. 6, the NDVI of the green-up stage increases during the first four years, while it remains more stable between the 4 th and the 6 th year of saffron cultivation. The 7 th and 8 th year saffron fields show the highest NDVI during the vegetative stage compared with other age fields. The green vegetation density depends on corm size, corm number per clone, sprouted axillary buds number per corm, and environmental factors (Koocheki & Khajeh-Hosseini, 2019). During the first year after planting, there is only one mother corm that will produce a clone in the following years. As the age of saffron increases, the number of daughter corms will increase and consequently, vegetation density increases. However, the growth of replacement corms mainly depends on saffron leaf photosynthesis. From the 4 th year, the leaf area ratio reaches a high level (as seen in Fig. 4), resulting in less space for further development of vegetation and slowing down of daughter corm propagation. This in turn inhibits the increase of vegetation density. In the 7 th and 8 th year, the fields contain the highest density of bulbs, resulting in greater abundance of green vegetation. This greater abundance is reinforced by the fact that less weeding takes place during these final years. These results are consistent with the green vegetation density observations during the fieldwork (Fig. 4).

Feasibility evaluation for age-based saffron field classification
Fig. 14 presents the pairwise separability metrics (i.e., JM distances) that indicate how different the NDVI time series for a specific saffron field age is as compared to the NDVI series of the other seven field ages. For this analysis, NDVI data from December to May were used from the 30 saffron field samples with age information in four growing seasons (2015-2019). The yellow colours indicate a high separability between the NDVI values of the two classes in the corresponding month. The overall separability for the combination of six-monthly NDVI values between each pair of classes (ages) is shown in Fig. 15. The separability between saffron fields of different ages increases when age differences become larger. Some counterintuitive results are present in the figure;  for example, 1 st versus 3 rd year has higher separability than 1 st versus 4 th year saffron, and 5 th versus 8 th year has a higher separability than 4 th versus 8 th year. This could be an artefact of the relatively small sample size for each age group. Between saffron fields of 4 to 6 years old, the overall separability is relatively lower (JM < 0.6) than for other pairs, indicating the difficulty of distinguishing these classes from NDVI time series. Therefore, the saffron fields of these ages were merged as one group for classification. Besides, because 8 th year saffron was only available for one field, ages 7 and 8 were also combined into a single group.

Accuracy assessment of saffron age classification
The saffron age classification was implemented using the RF algorithm taking the six-monthly values of NDVI from December to May as input variables and the 104 plots as training data. Table 4 shows the accuracy of the classification result. The OA is 86.8%, and the kappa coefficient is 0.81 when combining ages of 4 to 6 years and 7 to 8 years.
The 1st year saffron fields had the highest accuracy (UA of 93.1% and PA of 90.0%), indicating that their NDVI trajectory differs most from other age groups. Third year saffron fields had considerably higher errors of commission, i.e., 21 pixels of 4 th to 6 th year saffron were incorrectly classified as belonging to 3 rd year fields. This is not surprising, given that the JM distance between the two groups is low (0.72), even if above the threshold of 0.6 used previously for grouping field ages of 4 to 6 years. Fig. 16 shows the saffron age classification for the 2018-2019 growing season using the grouped age classes. For Fig. 16ab, pixel-level classification was used, given that automated field delimitation by image segmentation was outside the scope of this study and possibly difficult due to the irregular and small fields. However, within a single field, multiple age groups were retrieved (Fig. 16b), likely caused by a heterogeneous planting density, which tends to be lower towards field  edges. To illustrate how an object-based approach could improve this, we visually delineated the 1,333 parcels inside the subset of Fig. 16b using our saffron classification and high-resolution Google Earth imagery. For each parcel we assigned the majority class. Fig. 16c shows the result of this procedure.

Classification of saffron fields and their age
This study demonstrated that S2 time series allow for accurate mapping of saffron fields, and assessing saffron age. The RF algorithm helped to select relevant spectral-temporal features that describe key phenological characteristics of saffron, and consequently was used for classification. NDVI during the vegetation stage could effectively discriminate saffron field ages, given that age affects the density of the green cover.
The RF importance ranking selected spectral features across VNIR and SWIR spectral regions. Spectral information in these regions are affected by crop biochemical and biophysical properties of vegetation, such as photosynthetic pigment absorption, structural photon scattering, and water content, and have proven to be efficient in crop classification (Peña-Barragán et al., 2011;Hao et al., 2015). Two phenological phases were identified as important for separating saffron from other crops; the rapid green-up (February to March) and the dormant period (August to October). During the rapid green-up other winter crops are in their initial vegetative phase whereas summer crops were harvested and fields left fallow. The dormant period corresponds to the dry months between August and October. At this time, saffron corms are dormant underground and saffron fields are bare and dry, resulting in a high spectral reflectance in SWIR bands until the first irrigation is carried out in late September. This contrasts with spring crops that have already been harvested and covered by residues during these months.
The accuracy of the saffron classification based on the multitemporal S2 imagery improved by 13.5% (from 82.0% to 95.5%) with respect to a previous study using single-date Landsat-8 imagery to map saffron fields (Dehghani Bidgoli et al., 2018). Various studies have found that multi-temporal images effectively reflect the phenological differences between crops (Wardlow et al., 2007;Zhong et al., 2014;Arvor et al., 2011). The higher temporal resolution of S2 provides more cloudfree data compared to Landsat-8 imagery, facilitating the description of the distinct temporal behaviour of saffron. In addition, the higher spatial resolution (10 m for S2 versus 30 m for Landsat-8) contribute to the improved accuracy, given that for fields smaller than 0.2 ha and those between 0.2 and 0.5 ha Dehghani Bidgoli et al. (2018) reported an overall accuracy of 62.0% and 72.0%, respectively against 73.1% and 89.5% for our classification.
The classification of saffron age is the first attempt to classify the age of perennial crops from multi-spectral time series to the best of our knowledge. We demonstrated that with increasing age of saffron fields, NDVI levels during the vegetation stage (December to June) increased, allowing to separate five age groups. Particularly, first-year saffron could be separated well from older fields due to its limited vegetation cover. Simultaneously however, Section 3.2 revealed that first-year fields therefore have more confusion with other crops. To improve further on the accuracy of saffron mapping, first-year saffron could therefore be treated as a separate class. The difficulties in distinguishing saffron fields with ages from four to six years are likely due to their similar vegetation cover, which is consistent with the performance of green vegetation density observed during the fieldwork (Fig. 4). This performance could be explained by the increasing competition between high-density corms for water and nutrients, which reduces the reproduction capability of corms (Kumar et al., 2009).

Potential limitations and outlook
Although the RF classifier was effectively used in this and other studies on cropland classification (e.g., Tatsumi et al., 2015;Sonobe et al., 2014;Ok et al., 2012;Pal, 2005), it requires a large training dataset, particularly if strong intra-class variability exists (Rodriguez-Galiano et al., 2012). Saffron shows strong variability in spectraltemporal features due to field age; if following our previous suggestion of classifying first-year saffron as a separate class (Section 5.1), more training samples for first-year saffron fields would be required. Alternative classifiers could be tested to improve the classification accuracy despite high spectral variability and limited training samples. For example, time-weighted dynamic time warping (TWDTW) may provide  an efficient solution due to its advantages of balancing between time series shape matching and temporal alignment (Maus et al., 2016), and low sensitivity to the intra-class variability and size of the training dataset (Belgiu & Csillik, 2018).
We increased the training sample size for field age classification by combining age data for four growing seasons (from July 2015 to June 2019). However, in this way, we did not account for possible interannual variation in spectral-temporal characteristics, which could emerge due to different weather conditions and timing of agricultural practices between years, possibly resulting in inaccuracies in the age maps. To further increase the field age classification accuracy, we could benefit from repeating the classification for each growing season and obtain age maps for any particular year. In this way, corrections could be made to maps of previous (or subsequent) years; for example, if in 2019 a field was correctly identified as second-year saffron and in 2020 as third-year saffron, we can conclude that in 2018 saffron was cultivated for the first year on that field. Given the difficulty in separating first-year saffron from winter crops (Section 4.2), the repeated classification of saffron for different years could accordingly also improve the accuracy of the saffron classification; following our previous example, if a field was classified as second-year saffron in 2019 but was not identified as saffron in 2018, we can then correct this "non-saffron" field as "saffron" in the saffron map of 2018.
Despite the regular clear-sky conditions in the study area, optical imagery is susceptible to clouds, which may result in missing and irregular data on land surface reflectance. This is most disturbing during the rainy season (from January to April), which is a crucial period of saffron vegetative growth in north-east Iran. Long periods with no or limited cloud-free observations could influence the quality of generated monthly data and reduce the classification accuracy. Compared with optical imagery, microwave radiation can penetrate clouds with negligible attenuation (Richards, 2005), which can ensure continuous measurements over the growing stage of saffron. Synergistic use of radar and optical information can help in better describing crop phenology, even though the temporal behaviour of radar and optical vegetation indices may differ (Meroni et al., 2021). Previous studies have successfully improved crop classification accuracy (Skakun et al., 2016;Dong et al., 2013;Van Tricht et al., 2018) and phenology monitoring (De Bernardis et al., 2016) using a combination of optical and Synthetic Aperture Radar (SAR) images. Different from optical imagery which reflects vegetation biophysical characteristics, SAR imagery provides insights into vegetation structure, surface roughness, and soil moisture (Veloso et al., 2017). While S2 alone gave good performance, further study could assess the benefit of including SAR (e.g., the 6-day interval 10-m resolution Sentinel-1 series) for further improving saffron classification.
The high accuracy attained in this study for saffron mapping suggests that it is possible to scale our classification approach to a larger region. Given that Iran accounts for more than 90% of the global saffron production (Ghorbani, 2007), mapping saffron fields in the key cultivation region (i.e., the north-east Khorasan region in Iran) can provide geographically explicit information and contribute to the sustainable management of saffron cultivation. When expanding to larger areas, it is possible that also other crops are cultivated with similar phenological behaviour as saffron. Moreover, if saffron would be cultivated under different management practices, this could influence the classification model's transferability (Jin et al., 2018;Juel et al., 2015). Therefore, further testing of our approach may be required before expanding to areas with different crop compositions and climatic conditions. In addition, considering the high memory requirements and time cost for processing a large amount of S2 images, this process can be executed using a cloud-based distributed processing platform. One example of this is Google Earth Engine (GEE, Gorelick, et al., 2017) that allows for access and processing of large earth observation datasets. Previous studies showed that GEE was an effective tool for large-scale crop mapping and monitoring applications (e.g., Lemoine & Leo, 2015;Tian et al., 2019).
The spatially-explicit information on saffron distribution could assist in a regionally transparent and effective management of saffron cultivation. For example, irrigation of saffron fields before emergence is important for its productivity. Knowing the saffron distribution could help local government secure and prioritize the use of water resources during critical times for the crops. Because saffron age strongly correlates to saffron productivity (Kumar et al., 2009), such information helps in estimating and predicting regional production levels, including for future years. Besides, considering that counterfeit saffron stigma has become a serious problem in global saffron markets, identifying the authenticity of saffron farms and estimated production levels stated by businesses is of importance for consumers, and could be valuable input for certification mechanisms. Satellite-based saffron field identification and age classification provides an opportunity for better supervision and management of the saffron market.

Conclusions
This study demonstrated that S2 time series allowed for accurately mapping saffron cultivation areas and field age. Using the RF classifier, we first selected an optimal set of spectral-temporal features for discriminating saffron from other crops. Based on the correlation between features, we retained six spectral-temporal features for classification, including NDVI for February (vegetative phase saffron), SWIR (band 12) for February and March, NDRI for October, and SWIR (band 11) for August and September (when saffron fields are dry and bare). Based on these features and the available field data, the RF classifier resulted in accurate saffron discrimination from other crops. The overall accuracy was 95.5% based on the training data, and based on an independent dataset of saffron field locations, 87.6% existing saffron fields were correctly classified as such. In addition, using monthly NDVI series from December to May we achieved an overall accuracy of 86.8% for classifying different saffron age groups. Our output maps and methodology provide a solid basis for monitoring saffron distribution and forecasting regional production levels.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.