RICA: A rice crop calendar for Asia based on MODIS multi year data

Information on when and where rice is planted and harvested is important for crop management under a changing climate and for monitoring crop production for early warning and market information systems. The diversity of plant genetic, crop management, and environmental conditions leads to a wide variation in the number of rice crops per year and the dates of crop establishment and harvesting across Asia. Asia-wide rice crop calendars exist (e.g., RiceAtlas) but are based on heterogeneous data sources with varying levels of detail and are challenging to update. Earth observations can contribute to consistent and replicable crop calendars. Here we demonstrate and validate a method for generating a rice crop calendar across Asia. Our analysis at administrative unit-level is based on pixel-level analysis with the PhenoRice algorithm using MODIS imagery (2003–16) to estimate start of season (SoS) and end of season (EoS) dates. PhenoRice outputs were post-processed to generate representative statistics on the number of rice crop seasons per year and their SoS/EoS dates per administrative unit across Asia, called RICA (a RIce crop Calendar for Asia). RICA SoS and EoS dates across all seasons correlated strongly with RiceAtlas crop establishment and harvesting dates (R2 of 0.88 and 0.82 respectively, n = 1,186). The mean absolute errors were around 26 and 33 days for SoS and EoS, respectively. A detailed assessment in the Philippines where data in RiceAtlas are particularly accurate had even better results (R2 of 0.93 and 0.85 respectively, n = 131). Comparisons to other published rice calendars also suggested that RICA captured rice cropping season dates well. Our study results in a unique and validated method to estimate rice crop calendar information on continental scale from remote sensing data.


Introduction
Crop calendars report the timing of key agricultural activities and crop development. At their most detailed, crop calendars can support the planning of many farm activities, from land preparation, planting, crop, soil, and water management through the growing season, harvesting, storage and fallow (IRRI, 2020). Most crop calendars, however, contain information only on the timing of crop planting and harvesting for major crops, and report these dates as planting and harvesting windows for large geographic areas (Dimou et al., 2018;FAO, 2010). This is particularly true for crop calendars that go beyond national scale (from multiple countries up to continental and global coverage). Crop calendars have been characterised as essential or critical components of agricultural monitoring systems in a review and survey of such systems by Fritz et al. (2019). The same review outlines their importance for specific applications such as monitoring crop condition, estimating crop area and harvested production, agricultural policy, emergency planning in the event of disasters. Crop calendars are also important layers of information for the United Nations Sustainable Development Goal (SDG) target 12.3 -to halve per capita global food waste at the retail and consumer levels and reduce food losses along production and supply chains, including post-harvest losses by 2030 (FAO, 2020a).
Crop calendars are constructed using one or more of the following approaches: (i) census-based calendars, compiled from multiple sources such as ground observations, expert opinion and/or survey data Portmann et al., 2010;Sacks et al., 2010); or (ii) model-based calendars determined by climate data, crop characteristics and land surface phenology from remote sensing data (Kotsuki and Tanaka, 2015;Portmann et al., 2010;Sacks et al., 2010;Whitcraft et al., 2015a). Census-based crop calendars often rely on heterogeneous detected within the timeseries. PhenoRice generates high spatial resolution maps (250 m in the case where MODIS data are used as input) of rice SoS, PoS and EoS. PhenoRice has been applied and tested in different rice growing environments, producing robust results Busetto et al., 2019;Liu et al., 2020;Wang et al., 2020). This robust performance across diverse sites suggests that PhenoRice can be applied to larger areas. The global, and relatively spatially detailed information in RiceAtlas coupled with the potential to apply PhenoRice at scale provides a unique opportunity to develop, test and validate an EObased approach to map rice SoS and EoS consistently over large geographic areas. The unique characteristics of rice systemshigh cropping intensity with up to three crops per year on the same plot of land, wide diversity in crop management practices, and its widespread cultivation in the cloudy monsoon season, suggest that this is one of the most challenging staple crops for such an approach.
Here we demonstrate an EO-based approach to provide crop calendar information that is comparable to the best available rice crop calendar (RiceAtlas). By using an EO-based approach, we ensure that the method is automated, replicable, and repeatable. Such approaches are particularly valuable in the context of tracking changes in cropping cycles over time, due to climate change or due to policies that affect diversification and land consolidation. We use East, South, and Southeast Asia for this comparison since these regions account for 85% of the global rice area and 90% of global rice production (FAOSTAT, 2019). We use the PhenoRice algorithm and MODIS data for 2003-2016 to detect the number of rice seasons per calendar year and estimate SoS and EoS per pixel for each detected season. We further process the PhenoRice pixel-level outputs to estimate the number of seasons and the peak detections of SoS and EoS dates for each season per RiceAtlas administrative unit to generate RICA: a RIce Calendar for Asia.
Our aim is to test whether an automated EO-based method can derive a rice crop calendar for Asia that is comparable to RiceAtlas, but which is repeatable, replicable, and consistent. We assess the accuracy of the EObased RICA dataset by comparing our SoS and EoS estimates to the peak crop establishment and harvesting dates for rice seasons reported in RiceAtlas. In addition to a continent-wide accuracy assessment, we explore the case of the Philippines as an example country where RiceAtlas is known to have particularly accurate and detailed rice season information. Where available, we also compare RICA to EO-based national and subnational level rice crop calendars from the literature to further demonstrate that our continental dataset performs well in relation to more locally tuned approaches. We discuss the steps required to handle the timeseries of EO data, the challenges of spatially and temporally aggregating pixel level date information (especially due to the variation in cropping intensity and seasonality across and within administrative units), and the potential to report the same information on a regular spatial framework whilst still retaining statistically robust SoS and EoS estimates. The significance of this study is that it is the first attempt to derive an EO-based continental rice crop calendar for Asia based on multi-annual EO data and an automated procedure that does not depend on locally tuned parameters or masks.

Satellite imagery and products
Timeseries of MODIS 250 m vegetation products (MOD13Q1/ MYD13Q1) were used as the basis of detecting rice growing areas and deriving rice crop phenology. Terra (MOD13Q1) and Aqua (MYD12Q1) data are available every 16 days at 250 m spatial resolution with an eight-day shift (e.g., the Terra 16-day period starts on day one, while the Aqua 16-day period starts on day nine of a 365-day cycle). As a result, the global MODIS vegetation product is available every eight days, consisting of NDVI (Normalized Difference Vegetation Index) (Tucker, 1979), EVI (Enhanced Vegetation Index), (Huete et al., 2002), rank and quality indices, and four reflectance bands (red, near-infrared, blue, and mid-infrared). The MODIS datasets consisting of 38 tiles covering the major rice growing area of Asia ( Fig. 1) from 2003 to 2016 were downloaded and analyzed. In total, we considered 24,472 MODIS data granules in this study. The data were downloaded through NASA's Land Processes Distributed Active Archive Center (LP DAAC) located at the USGS Earth Resources Observation and Science (EROS) Center using the MODIStsp R Package (Busetto and Ranghetti, 2016).
MODIS global land surface temperature (LST) data from the MOD11A2 1 km eight-day product (Wan, 2008) were also used. LST data were downloaded for the same time-period and resampled to 250 m through nearest neighbour method to match the spatial resolution of MOD13Q1/MYD13Q1. LST was used to provide an overall "temperature condition for rice crop establishment", hence any risk of uncertainty from mixtures of land surface targets by resampling 1 km LST data to 250 m, is secondary for the specific scope in which LST is used here and is unlikely to have a significant impact on our overall results.

Reference data
RiceAtlas  is the most comprehensive and most recently published spatial dataset of rice crop calendar information. It is a unique source of global rice calendar and production information, reported for 2,725 administrative units. It includes (i) the number of growing seasons, (ii) the cultivated area and production and (iii) crop calendar dates related to establishment and harvesting per administrative unit. Crop establishment and harvesting activities are represented by "start", "peak" and "end" dates per season to represent the planting and harvesting window per administrative unit, together with rice area and production per season.
A total of 1,076 administrative units cover the rice growing areas of Asia in RiceAtlas. The spatial resolution is different in each country, due to variation in the size of the sub-national administrative units per country and the availability of rice calendar information. RiceAtlas has first level subnational administrative units ("REGIONS") for: China, Indonesia, Malaysia, Myanmar, Cambodia, Laos, Pakistan, South Korea, Japan, and Sri Lanka. Several other major rice growing countries have second level subnational administrative units ("SUB_REGIONS"): Bangladesh, India, Nepal, Philippines, Thailand, and Vietnam. Smaller countries have only national level units: East Timor, North Korea, Bhutan, and Taiwan. We did not include non-rice growing countries and territories like Singapore, Hong Kong, and Macau in this study.

Methodology
The flow diagram of the rice calendar estimation is presented in Fig. 2.
Step 1: Extraction of spatiotemporal information PhenoRice  is a rule-based algorithm to detect the intensity, seasonality, and phenology of rice from timeseries of EO data. Phe-noRice estimates rice establishment date (whether by transplanting or direct seeding) which we term Start of Season (SoS), flowering dates, termed Peak of Season (PoS), and harvesting dates, termed End of Season (EoS). PhenoRice can be used with default or user modified thresholds for its detection criteria and the user can set the maximum number of rice seasons to be considered per year as well as the temporal window in which to search for rice seasons. Our implementation of PhenoRice uses timeseries of EVI and a normalized difference flooding index (NDFI) computed from the red (b1) and SWIR (b7) bands in the MOD13Q1/MYD13Q1 products  to detect rice. PhenoRice also uses the exact Day of Year (DoY) of the composite MODIS data, the usefulness index, and the reliability indicators for each pixel in a weighted Savitzky Golay temporal smoothing procedure to reduce noise in the temporal EVI signature due to low quality acquisitions. Blue band reflectance (b3) is used to check for potential atmospheric contamination as a further quality check. LST is used to check the condition of the minimum physiological temperature, 15⁰ C, necessary to establish a rice crop; an important step for automated rice detection in areas where the temporal vegetation signature for other irrigated field crops like wheat may be confused with rice.
The algorithm performs a temporal analysis of Spectral Indices to identify local minima and maxima along the timeseries that can be related to field/crop management activities and crop development. NDFI and EVI are used to identify agronomic flooding at the time of crop establishment and crop phenological stages, respectively . Two main steps are involved: the first checks if the pixel belongs to a rice-cultivated area, based on expected temporal patterns of EVI and NDFI (subject to pixel quality and LST values), and the second performs a timeseries analysis to extract SoS, PoS and EoS dates from the EVI timeseries. See Boschetti et al, (2017) for more details.
Given its robust performance in previous studies, PhenoRice was run with one set of parameters across all MODIS tiles (Table 1). For a given year, PhenoRice was run on the MODIS data for the period July 2002 to March 2017, providing sufficient EO timeseries information before and after our 2003-2016 study period. We identified up to four potential periods per calendar year when a rice season could be detected. Our use of four search windows per calendar year was a necessary step to detect all possible rice seasons over such a large geographic region where rice seasonality varies enormously. After reviewing the reported rice seasons in RiceAtlas we set these as season 1: December -February, season 2: March-May, season 3: June -August, and season 4: September -November. The allocation of a rice detection to a season is based on the occurrence of the flowering stage (PoS) within that season. Rice varieties can be grouped into short, medium, and long duration varieties, each  Step 2: Mask out pixels with excessive cloud contamination and allocate the population of rice SoS and EoS records to their respective administrative units.
Step 3: Per administrative unit, fit models to the distribution of rice SoS and EoS records to determine the number of seasons per administrative unit and the SoS and EoS dates per season, i.e., an administrative unit-level crop calendar. taking approximately 90, 120 and 150 days to mature, respectively, though some very long duration varieties can take up to 180 days to mature. Flowering occurs at the end of reproductive stage, approximately 30 days before maturity, so SoS occurs 60, 90 and 120 days before PoS for short, medium, and long duration rice varieties, respectively. PhenoRice returns the DoY of the detection of SoS, PoS and EoS in each pixel.
Step 2: Cloud screening and pooling of detections per administrative unit During processing and with some spot checks against high resolution imagery, we recognised a tendency of PhenoRice to overestimate rice areas in regions with persistent cloud cover during the crop establishment period. This led to high NDFI values that were related to cloud contamination rather than to agronomic flooding and consequently misclassified non-rice areas as rice because of a false detection of flooding followed by an increase in EVI. To mitigate this, we used a cloud mask generated using the 1st and 2nd bit of the M*D09Q1 MODIS product, Surface Reflectance 8-Day Global 250 m QA bands to remove any rice detections where there was cloud contamination on the SoS date.
We then pooled all detected SoS and EoS pixel records for 2003-2016 into one pseudo year and assigned them to a unique administrative unit based on the centre coordinate of each pixel. This resulted in a large population of records per administrative unit. The third and final step analysed this population distribution and estimated the number of rice seasons within the distribution.
Step 3: Rice calendar estimation per administrative unit The pooled detections were analysed with the mixtools "R" package (Benaglia et al., 2009) for mixture models. We used the normalmixEM function to identify the positions of the main modes of the frequency distribution, which we assumed to correspond to the timings of the main rice seasons occurring within each administrative unit. Since the mixtools multimodel tool works with the expectation-maximization of the normal distribution, we assume a Gaussian distribution of SoS and EoS over time. This analysis was performed for each administrative unit.
The identification of the main modes followed an iterative procedure to confirm their reliability. As a starting point, we assumed that there could be a maximum of four rice seasons in a year, which corresponds to the PhenoRice outcome, therefore we started by fitting four Gaussian models per administrative unit. The resulting models were first checked for agronomic feasibility. An 80-day interval between the peak of two models was set as the minimum period between two consecutive independent rice crop seasons, based on the time required for a rice crop to reach maturity and how quickly a subsequent crop could be established. This criterion is acceptable for most administrative units but may be questionable for very large units where variation in cropping calendars, from say north to south, within the unit could mean that we discard temporally but not spatially overlapping calendars. Since we are using the spatial units of RiceAtlas, we however retained this criterion even for large spatial units. In addition, we excluded spurious model values by assuming that a valid rice season should be based on at least 2% of all of detections (SoS or EoS occurrence dates) within that administrative unit. We opted for 2% after testing 1, 2, 3, and 5% thresholds and determined that 2% was the optimum level for this study (results not shown). In summary, each administrative unit was analysed as follows, with one analysis for SoS data and another for EoS data: 1) Fit up to four models to the distribution of the pixel-level SoS (EoS) dataset. 2) Check model validity: a. Two models are considered overlapping if the difference of the mean of two models is less than 80 days. b. If the models are not overlapping, then test that the population per model represents at least 2% of the total population in that unit. 3) If a model fails one or more of the checks in step 2, then it is not valid.
In this case, reduce the number of models by one and fit again (step 1). 4) Repeat steps 2-3 until all models have enough gap between their means and each represents at least 2% of the overall population. 5) Check the number of retained models for SoS and EoS. If they do not match (i.e., if the number of models for SoS is different from that of EoS), then use the case with the lowest number of valid models and refit the other case with that same number of models. 6) Assign the means of the retained models to the spatial unit, and label them as the average SoS and EoS dates of typical rice seasons within that spatial unit. The standard deviation is also included in the eventual dataset as a reference for the range of dates for each mean SoS and EoS date.
The result is the RICA database where each administrative unit is assigned SoS and EoS dates for each detected season in the pseudo year representing the period 2003-2016. Fig. 3 shows an example of the above procedure for two different cases. The left panels (a and c) show situations that start with four seasons and result in three (b) and two (d) seasons after iterating through the above steps until all criteria are met.

Performance evaluation
We compared the SoS and EoS dates from RICA to the corresponding peak SoS and EoS dates of the same administrative units reported in RiceAtlas. Seasons were matched based on their similarity in SoS across the two datasets. To do this, for each RiceAtlas season we identified the closest RICA season (i.e., the one with the lowest absolute difference between the RiceAtlas peak SoS date and the RICA SoS date), within a limit of 90 days (if two RICA seasons were associated with the same RiceAtlas one, only the closest match was considered). If no RICA season was found, either because RICA identified a lower number of seasons with respect to RiceAtlas or because the 90 days limit was not respected, the RiceAtlas season was not associated with any RICA season (i.e., we assumed that RICA "missed" the season that was reported in RiceAtlas).
The opposite is applied in case RICA identified a higher number of seasons than reported in RiceAtlas (i.e., we assumed RICA recognized a season not reported in RiceAtlas). As an example, if three seasons were detected in RICA and there were only two rice seasons in RiceAtlas, we matched the seasons that had the most similar SoS and EoS across the two datasets and discarded the third RICA season that did not match to any reported season in RiceAtlas. Discarding of records occurred in only 26 RiceAtlas units out of 1,076. This discarding of seasons for evaluation purposes does not imply that the discarded RICA detection was "wrong" since it is also possible that RiceAtlas has incomplete season information or census data represents only the most frequent condition in the considered area. These seasons remain in the RICA dataset, but do not form part of the comparison. At the end of this automated matching process, we evaluated the performance of RICA by i) comparing the number of seasons for which a match between RICA and RiceAtlas was found with the number of seasons reported in RiceAtlas and analysing the matched/non-matched (nhit/nmiss) with respect to RiceAtlas seasons (Main, Second, Third), and ii) comparing the SoS and EoS dates for the seasons for which a match was identified, and computing the coefficient of determination (R 2 ), the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) between RICA and RiceAtlas SoS and EoS.

Results and discussion
To aid interpretation, Fig. 4 shows four examples of the output of the mixture model analysis for representative administrative units of RiceAtlas. In each case, the number of detected seasons matches the reported seasons in RiceAtlas and the mean SoS for each season detected by our model is also in good agreement with the peak SoS from RiceAtlas. While not all administrative units result in such good agreement, the figure serves to illustrate the model output and the comparison against RiceAtlas which forms the bulk of the results section. We focus first on comparing the number of seasons between RICA and RiceAtlas. Fig. 5 shows a comparison of the number of rice seasons in RICA and RiceAtlas. RICA generally reports a lower number of seasons with respect to RiceAtlas, indicating that our workflow detects fewer seasons. Out of the 1,076 RiceAtlas spatial units, RICA recognised three seasons in 13 cases, two seasons in 239 cases, one season in 669 cases and no seasons in 155 cases (mostly located in Nepal, other mountainous regions, areas with subpixel rice areas, and areas with persistent cloud coverage during the monsoon season which coincides with the sowing/ transplanting time). The bottom right panel of Fig. 5 shows also that the detection rate of RICA varies by season, it is 66% in the main season of RiceAtlas, 55% in the second season and 41% in the third season.

Comparison of the number of seasons
Detection in East Asia is largely consistent between the two datasets, with some under reporting of seasons in Southeast China. RiceAtlas reports three seasons in some provinces, but there are two seasons 'early' and 'late' with the third season representing areas that plant only one, longer duration rice crop instead of two (early and late) rice crops. So, our interpretation here is that RICA detects one season across China whereas RiceAtlas sometimes reports a maximum of two. Japan and South Korea are well represented, but there was no detection of rice in North Korea, where the areas are small and fragmented.
In Southeast Asia, RICA, captures the larger double rice systems in Vietnam, Indonesia (Java), Thailand, Malaysia, Myanmar, and the Philippines, but does not often capture the intensive triple rice systems in Vietnam and parts of the Philippines. Some areas with relatively small or fragmented rice areas in Malaysia, central Vietnam and parts of Indonesia are also not detected. RiceAtlas reports two rice seasons in Laos and Cambodia (with only one spatial unit per country), but these second, irrigated seasons represent very small and sometimes fragmented areas which PhenoRice may struggle to detect in sufficient quantities to be included in our workflow. Some of the smaller second rice season areas of Myanmar and Thailand are not well represented, although the main central and delta areas where there are larger areas of rice in both the wet and dry seasons are well represented in RICA.
In South Asia, RICA tends to report one less season than RiceAtlas: one season where RiceAtlas has two, two seasons where RiceAtlas has three, and in a few cases, no season at all whereas RiceAtlas reports one. Again, we note that the third rice season often represents a very small area within an administrative unit and PhenoRice may fail to detect sufficient pixels in this season to be included in our workflow. We observed that cloud contamination was more severe in South Asia than in other regions and this likely impacted our results here. Although these results may not seem completely satisfactory, the detection rate is partially explained by a large part of the study area suffering from persistent cloud coverage during the main, wet (monsoon) season, hampering the rice detection. Also, many of the non-detected seasons are in areas with small, fragmented rice area or are in areas with mountainous terrain (e.g., Nepal) or belong to second and third rice seasons in countries where RiceAtlas is reporting at sub-region level (e. g., Bangladesh, India).
The apparent poor performance of RICA in some areas is due to several reasons. One is the different concepts of seasonality between the RICA and RiceAtlas. For example, in Yunan Province (China) the RiceAtlas peak SoS of the 1st season is DoY 105, the 2nd season is DoY 74, and the 3rd is DoY 201, but RICA requires at least 80 days between two detected SoSs to assign them as two different seasons so the first two seasons in Yunan will not be considered as two seasons by RICA. China reports early, middle (single) and late seasons. Early and late refer to double-cropped rice areas; and the middle season for single-cropped rice areas and does not mean all three seasons occur in the same locations. Using small spatial units for China would resolve this issue. Another may be related to the challenge of reporting the number of seasons per year in regions where rice is cropped almost continuously with no defined start and end of the season. This happens in parts of Vietnam, Southern India, the Philippines, and Indonesia due to abundant water supply all year round. Finally cloud contamination and the use of low spatial resolution images in fragmented cultivation areas in the heterogeneous topography will also affect the detection of rice resulting in missing seasons in some cases.

Comparison of SoS and EoS dates
Here we assess the RICA SoS and EoS dates in relation to those reported in RiceAtlas, and we also compare our results to the literature, with the caveat that the temporal and spatial scale of our analysis is much larger than many of the pixel-level assessments that we compare against. Fig. 6 compares mean SoS and EoS detections in RICA against the peak SoS and EoS dates in RiceAtlas for Asia (top row). We also report separate results for Japan (middle row), and for the Philippines (bottom row), two countries for which the SoS and EoS dates in RiceAtlas are known to be of good quality based on the underlying data that was collected and the expert analyses conducted when compiling those data for RiceAtlas (data for Japan came from the Ministry of Agriculture, Forestry and Fisheries, whereas data for the Philippines came from multiple household surveys conducted by IRRI and PhilRice). Japan has a temperate climate with one rice crop per year and the Philippines has a tropical climate where two rice crops per year is prevalent. Separate results for each country are available in the Supplementary File (Figures S1 to S6). There is a vertical spread of points in the scatterplots with the same RiceAtlas data but differing RICA dates. This is because the Peak of the Season of SoS (EoS) in RiceAtlas is often the same in many neighbouring units. This occurs in several countries.
Considering all of Asia, the total number of combinations of rice seasons and spatial units for which a positive match was found came to 1,186 out of the 1,988 reported in RiceAtlas. There was a very good linear fit between RICA SoS and RiceAtlas SoS (R 2 = 0.88, MAE = 26.4 days and RMSE = 34.2 days), as well as for RICA EoS and RiceAtlas EoS (R 2 = 0.82, MAE = 33.2 days and RMSE 46.5 days) across all Asia. These results suggest that our method can depict the spatial-temporal variability in rice seasonality across Asia.
In Japan, results were found to be very good, with R 2 = 0.86, MAE = 6.1 days and RMSE = 9.0 days for SoS and R 2 = 0.84, MAE = 6.7 days and RMSE = 10.0 for EoS. This noticeable improvement in MAE and RMSE is due to the good quality of SoS and EoS data for Japan in RiceAtlas, and that the temperate climate in Japan means that there is only one rice season per year, with limited cloud contamination. In many ways, this is the easiest rice system for our method to deal with. These results compare well with Sakamoto et al. (2005) who reported RMSEs of 12.1 days for SoS and 10.6 days for EoS based on pixel-level MODIS derived rice crop phenology dates compared against validation data for 30 paddy fields across Japan for 2002.
Results for the Philippines show that our method works well in more challenging conditions. In this case, the linear fit was still good, with a better linear fit and smaller reported errors than those for the Asia-wide results (SoS: R 2 = 0.93, MAE = 25.3 days and RMSE = 33.7 days, and EoS: R 2 = 0.85, MAE = 30.4 and RMSE = 39.4). This is an encouraging result for three reasons. The first is that the Philippines data in RiceAtlas is deemed to be some of the most accurate in the entire database for tropical zones due to regular monitoring and survey activities by IRRI and PhilRice, so the higher R 2 and smaller errors here mean that RICA is performing very well against reliable and detailed data. Secondly, the Philippines has diverse rice growing environments (irrigated lowland, rainfed lowland and rainfed upland) with up to three crops of rice per year in the most intensively cropped irrigated areas, and calendars with a wide range of SoS and EoS dates across the archipelago (stretching across 13 degrees of latitude), and RICA captures this variability well. Thirdly, cloud contamination due to the tropical climate in the Philippines can strongly affect remote sensing with optical data, reducing overall capacity of PhenoRice to detect rice cultivated area and infer rice seasonality . Reliable results under these conditions suggest that our method can robustly detect SoS and EoS in complex and varied rice cropping systems in Asia hence represent a way for periodic update of existing rice crop calendars. These results compare well against pixel-level SoS dates derived from multitemporal SAR imagery (2016-2018) that was compared against 482 ground observations across the Philippines in Gutierrez et al. (2019) with an R 2 of 0.71.
Results obtained for the other countries varied and we report several of these here, covering both major and minor producers as well as a range of environmental and climatic conditions. China ( Figures S1 and S4, SoS: R 2 = 0.60, MAE = 16.2 days and RMSE = 20.3 days, and EoS: R 2 = 0.36, MAE = 25.37 and RMSE = 30.97) has a small number of very large spatial units in RiceAtlas, and several spatial units report the same or very similar SoS/EoS values, which may not reflect the actual variation in practices across such large geographic areas with diverse environments and climates. As with many countries, the errors are smaller for SoS than EoS since the SoS detection is based on a clear detection of agronomic flooding as part of crop establishment, whereas EoS is based on a relative decrease in the vegetation index curve without any clear detection of a harvesting event. The low R 2 is attributed to the close clustering of dates in the small number of data points. We did not find a comparable EO-based analysis of disaggregated rice crop calendar information for China.
In contrast, India (Figures S1 and S4, SoS: R 2 = 0.85, MAE = 27.88 days and RMSE = 35.24 days, and EoS: R 2 = 0.84, MAE = 35.00 and RMSE = 42.56) has a large number of relatively small spatial units, though again many units report the same EoS/SoS pairs suggesting that the RiceAtlas data has been derived from high level aggregations. While the R 2 is generally good, RICA has a much wider set of SoS estimates for the typically irrigated (rabi/dry) season (SoS dates between Nov-Mar compared to Dec-Jan in RiceAtlas) drive up the RMSE and MAE, whereas there is better correspondence between the two in the main wet (kharif/monsoon) season. This could be attributed to more divergence in water management practices in the dry season, where variation in the control of water release in the irrigation systems would to a greater range of SoS dates being observed in RICA than are reported in RiceAtlas. Manjunath and Panigrahy (2009)  where photoperiod sensitive rice varieties are grown in the main wet season, and regardless of when the rice crop is planted these photoperiod sensitive varieties will mature at around the same time. Another compounding factor in Thailand is the coarse level of spatial detail in RiceAtlas where multiple second-level administrative units all share the same reported SoS and EoS dates (see supplemental information for the Thailand SoS scatterplot), which seems unrealistic, whereas RICA reports a much wider spread of SoS and EoS dates, inflating the RMSE and MAE considerably. In the absence of suitable studies in Thailand to compare against we cannot say more about the accuracy of RICA here, but this case highlights the hidden variation in data quality in censusbased compilations like RiceAtlas We found low RMSE/MAE in South Korea ( Figures S2 and S5, SoS: R 2 = 0.48, MAE = 5.7 days and RMSE = 6.7 days, and EoS: R 2 = 0.42, MAE = 2.3 and RMSE = 2.9) which has similar climate and seasonality to Japan. The R 2 is low because of the small number of spatial units, and almost all spatial units in RiceAtlas report the same date, whereas RICA has varying dates per unit, even though they have small MAE/RMSEs. We did not find comparable rice calendar data for South Korea, but we note the temporal analysis by Dong et al., (2016) where SoS dates in South Korea were mostly in the 100-120 (DoY) range, which seems early for this latitude given the germination temperature requirements for rice lies between 20 and 35 • C and this air temperature is not reached until June, whereas RICA and RiceAtlas both report SoS in the 140-150 (DoY) range, as does the Global Information and Early Warning System (FAO, 2020b).
Similarly small countries like Cambodia, Laos and Sri Lanka only have one spatial unit in RiceAtlas precluding a statistical analysis against RICA. More et al., (2016) also report SoS and EoS DoY at country level for smaller states like Cambodia SoS/EoS: 190-200/330, Laos:  and 123/219 (dry) based on SPOT-VGT data for 2009-2010. For RICA the corresponding SoS/EoS DoY are in the same range for Cambodia (main season) 200/340, Laos 130/273, and for Sri Lanka 305/67 (wet) and 124/ 235 (dry), so there is very good correspondence between these two independent EO-based methods at national level.
Our aim here has been to discuss the differences we observe between RICA and RiceAtlas and to highlight some of the characteristics of RiceAtlas that may influence the goodness of fit statistics we reportin terms of the number and relative size of the spatial units per country, as well as the "quality" of the reported dates and whether this attribute information is at the same level of detail as the spatial units or is derived from a higher level of aggregation. At the same time, we have made ad hoc comparisons of RICA to existing rice crop calendars, which are diverse in terms of spatial and temporal data as well as methods. The overall picture is that RICA compares well with existing EO-based information and is unique in that it covers such a large spatial extent and temporal period with a commonly applied EO-based method.  Figures S7 and S8). Areas for which a positive match was not found are shown in grey in RICA maps. The overall patterns across Figure 7, S7 and S8 show good consistency between RiceAtlas and RICA and we make the same brief regional assessment as in section 3.1.

Visual assessment of the spatial and temporal variability in SoS and EoS across Asia
East Asia in general shows good agreement in SoS and EoS dates in the main season. In Southeast China, RICA captures only one of the two seasons in RiceAtlas which is allocated to the 2nd and 3rd seasons ( Figure S7 and S8) with good agreement. Regarding China, we already mentioned some issue related to the reporting of the third rice season that is more a temporal shift of the main season (rice season vs wide variability of establishing date in the same season). In Southeast Asia, RICA tends to have slightly more spatial variability in SoS and EoS than RiceAtlas, which is particularly visible in Indonesia, though the overall patterns in dates is similar across both datasets. Indonesia has a more complex rice crop calendar than many countries in Southeast Asia which may be better represented by RICA than RiceAtlas. In mainland Southeast Asia there is much more spatial variability in RICA than in RiceAtlas and there are also larger differences in terms of months when SoS and EoS are detected. In the case of Myanmar, the RICA SoS dates seem more intuitive with a north-south gradient of later SoS/EoS in the delta regions and earlier SoS/EoS in the north where rainfed rice areas are likely to be planted earlier.
In South Asia, there is a general trend of later SoS in RICA than in RiceAtlas and earlier EoS, leading to a shorter main rice season based on RICA dates though still consistent with the maturity duration of popularly grown varieties in the area. For example, in Tamil Nadu state, medium-to long-duration rice cultivars with a maturity of 125-135 days are often planted during thaladi (the monsoon season between October and February), while early maturing varieties of 105-110 days with high yield potential are regularly planted during kuruvai (the dry season between June and September) (Bordey et al, 2015). Overall, there is good correspondence between the two datasets.  Figures S7 and S8.

Visual assessment of the spatial and temporal variability in SoS and EoS in the Philippines
In most of the provinces in the Philippines, there were two crops of rice detected in both RICA and RiceAtlas (Fig. 8). In some cases, however, those provinces with two or three rice seasons in RiceAtlas have only one or two in RICA. Some of those provinces are mountainous or grow rice in small, fragmented areas. Rice areas that are cropped three times a year are typically located in well irrigated areas and comprise a small portion of rice areas in a few provinces. In RiceAtlas, all calendars occurring in a spatial unit whether comprising the majority or a few areas only are recorded for the entire spatial unit. The northwesternmost province (Ilocos Norte), for example, shows three rice crops in RiceAtlas but this triple cropping system occurs only in parts of two or three municipalities in the province. This perhaps explains why RICA was not able to detect most of these triple cropping systems. In the case of differences in provinces in the Visayas island group, most rice areas in those provinces are rainfed and this could have affected the detection of rice in RICA.
RiceAtlas reports the start, peak, and end dates of SoS and EoS. The crop calendar for the Philippines is based on various sources including primary (farm surveys) and secondary sources (government website). The planting window varies considerably with some provinces having one month and others up to six months. In some provinces where peak dates are not available, these were estimated at the midpoint between the start and end dates of SoS and EoS .

PhenoRice as a basis for crop calendar information
Our method here relies on the pixel-based detection of SoS and EoS from the Phe-noRice algorithm applied to MODIS data for 2003-2016. Because of the lack of a sufficient quantity of detailed validation data on rice SoS and EoS data across Asia, we instead perform a population-based approach and compared against crop calendar information at a much higher level of spatial aggregation. Starting from a population of pixel-based estimations within a spatial unit, the modelling approach identifies representative estimates of the average and spread of SoS and EoS. It is fully automatic, and with the subsequent checking and validity criteria, estimates the number of rice seasons along with the SoS and EoS for the given spatial unit. These estimates are available as the RICA database. We used a constant set of threshold values for the PhenoRice rules across a vast geographic area representing diverse varieties (or genotype, G), environments (E), and rice crop management practices (M). Our method and results are a demonstration of how remote sensing-based methods can capture the diversity of G × E × M interactions to better understand crop production. Our only reference to external information was to look at the distribution of SoS and EoS dates across Asia in RiceAtlas to set the four temporal search windows to ensure a complete enumeration of seasons. The same search windows were used for all years and MODIS tiles. RICA data compared well against more locally tuned rice crop calendar analyses/datasets from diverse studies across Asia, suggesting that our "no tuning" approach works well and is a relevant approach even at continental scale.
Despite the recent successful application of PhenoRice in different agro-ecosystems  we know that local tuning of the algorithm parameters (Table 1; see Busetto et al. 2019) as well as adoption of different indicators from EO data (Liu et al., 2020) could improve the results. However, we decided to use a single configuration across all Asia with no local adaptation of temporal windows or threshold values because automated methods that perform robustly are essential for developing more detailed and timely information about crop calendars over large extents. There were two reasons for this choice. Firstly, we wanted to test how robust the standard settings for PhenoRice were across a broad range of rice growing environments. This would serve as prior information for future work to fine tune such parameters. Secondly, we lacked sufficient information for adapting thresholds across different regions, such as detailed information on rice management practices across Asia. This lack of calibration and validation data has been noted by others (Fritz et al., 2019) as a major impediment to wider use of EO-based approaches. Whilst we could have used RiceAtlas to tell us something about typical rice crop maturities and management practices, we would risk some circularity in then comparing our results against RiceAtlas, and we would also be introducing more variation in information accuracy from RiceAtlas into our parameterisation. Our comparison would then become more challenging.
The challenge of spatially and temporally aggregating DoY information from EO sources Generating a crop calendar for rice is nontrivial given the unique practice of intensive cropping of up to three crops per year on the same plot of land, and sometimes seven seasons across two years. Mapping this across Asia brings additional challenges due to the diversity in season dates, management practices and the duration of different rice varieties. The vast range of climates from temperate to tropical and from arid to very humid means that rice is cultivated in diverse ways across Asia. Temperate and sub-tropical climatic zones have a clear pattern of one rice crop per year where the growing season is limited by temperature. Tropical areas, mostly in Indonesia, Malaysia, South India -Tamil Nadu and Karnataka, some parts of Thailand, the Philippines, and the Mekong Delta of Vietnam, have conditions where rice can be cultivated throughout the year, and where planting and harvesting dates can be varied to suit local conditions rather than being driven by the weather. It is in these more tropical areas that generating a rice crop calendar by administrative unit may run the risk of missing a season, combining two distinct practices into one season, or indicating a supplementary season for late/early crop establishment dates. Using smaller spatial units can mitigate this issue.
Even with small administrative boundaries, they are still arbitrary with respect to the spatial and temporal configuration of what we want to observe; they do not correspond with zones of common rice crop management practices and / or environmental constraints. Since we are comparing our results to a reference dataset, then we chose the same spatial units as the reference dataset. Developing more uniform and quality-controlled representations of crop calendar information should be a priority for crop monitoring applications.
Rice paddies tend to be small, (a small farm of 0.5 to 2.0 ha can be comprised of several smaller plots/paddies). This small field size poses problems for MODIS resolution data with a pixel size of approximately 6 ha for 250 m MODIS data. This is somewhat mitigated (but not entirely) by the spatial clustering of rice paddies due to intensive monoculture practices within irrigation systems, or along low lying delta and river valley areas. These large systems are well detected by MODIS, especially where the irrigation water release schedule is tightly controlled leading to spatial and temporally consistent agronomic flooding and crop calendars. On the other hand, smaller rainfed or upland rice systems will be much harder or even impossible to detect.
Dealing with a large variation in SoS and EoS dates within units and across units was also a challenge. We introduced several steps in our method to accommodate this. We assessed the quality of the RICA database by comparing it against the best available rice crop calendar information from RiceAtlas. RiceAtlas provides 1,076 spatial units across Asia with a total of 1,998 data points once the number of rice seasons per spatial unit is accounted for. The results showed that our methodology could detect the number of rice seasons and the SoS and EoS dates with reasonable to very good accuracy. The result is a unique standardised database of statistically robust SoS and EoS dates with comparable means of estimation (because they are produced with the same approach) and a new automated method for deriving them. We believe that most, if not all other fields crops will have simpler crop calendars than rice and that this general method will remain applicable.
The challenge of comparing an EO-based crop calendar against survey/expert opinion data Whilst RICA is based on consistent input data, RiceAtlas is dependent on information from multiple sources, and by its nature, contains data of varying quality and reliability, both spatially and temporally. There is variation in spatial detail as different countries had calendar information at national, first, or second administrative levels. There is variation in how the calendar data were collected, sometimes from surveys, secondary sources such as reports and websites or expert opinion. These result in a standardised compilation of "best available" data which means that comparison between RiceAtlas and EO-based data requires careful interpretation. We made efforts to compare RICA and RiceAtlas in countries where we have high confidence in the quality of the RiceAtlas data, Japan, and the Philippines, which also gave us a transect of temperate and tropical conditions. The good performance of RICA compared to RiceAtlas in those two countries gave us high confidence that our method was performing well. In countries where the comparison resulted in lower correlations between the two, it is challenging, without additional independent information, to say if the differences are due to lower quality RiceAtlas data or poorer performance of PhenoRice in certain rice systems. Performing further country specific analyses with additional crop calendar data, or with validation from country experts could help address this.

Future work for EO-based crop calendar generation
Crop calendars and the information needed to produce them Crop calendar datasets represent the typical seasons or key dates within a season and they remain valid for as long as there is no change in crop management practices, i.e., the same crops or even varieties are planted at the same time year after year and are managed in the same way. Changes in climate, socioeconomic condition, policy, and technology all drive changes in crop production and crop management, and crop calendar datasets need to be updated regularly to represent current conditions. Calendars can be updated based on farmer surveys and similar reporting instruments, but they are limited in their spatial detail by the number and representativeness of the surveys and the cost of doing this on a regular basis. On the other hand, making use of increasingly available remote sensing information can fill this gap by developing automatic methods able to fill or update existing census-based crop calendars. Remote sensing-based approaches also permit, by observing real crop development, annual reporting of crop stages (i.e., vegetative, reproductive, and ripening), which provides more relevant information for crop monitoring systems. Remote sensing-based approaches also provide repeat analysis every year and retrospective investigation by exploiting the long-term archive of imagery, for the detection of anomaliessuch as delays in planting or crop failure due to drought, and trends in seasonality and cropping intensity with respect to the crop calendar.
Future work Here we reported crop calendar information by administrative unit to match the spatial detail in RiceAtlas. With EO data, there is no need to restrict the spatial representation of crop calendars to units which do not necessarily represent the spatial and temporal variability in cropping practices. At the same time, we need to ensure that the reported calendar dates per unit are based on a sufficiently large sample of SoS and EoS detections to enable robust representations of rice seasonality. This suggests that some spatial and temporal aggregation of detections is still needed, for example using regular 10 × 10 km units for multi-annual, MODIS-based calendars. This would require alternative sources of comparison or validation data, such as high resolution EO-based SoS and EoS maps and associated ground truth data, which are likely to be available in selected sites only. It was this lack of sufficient high resolution validation data across diverse rice cropping systems in Asia that precluded a pixel-based assessment in this study.
The global availability of MODIS data permits expansion to other rice growing regions in Europe, North and South America, Sub Saharan Africa and the Middle East and North Africa. PhenoRice has been demonstrated in Europe , Sub Saharan Africa (Busetto et al., 2019) and in US ) though we are not aware of applications yet in other regions outside Asia. The main challenge is that these rice areas are small and distributed over large areas, so that many MODIS tiles are required to retrieve a relatively small number of rice pixels, but this is a processing constraint that can be overcome with sufficient computational power. Since PhenoRice relies on only a few spectral indices and bands, it can be easily replicated in Sentinel-3/Sentinel-2 for continuity of observation given their suitability for agricultural monitoring (Whitcraft et al., 2015b). Cloud contamination however remains a concern, especially in tropical Asia where cloud cover is pervasive, especially during the monsoon season where rice is by far the dominant crop (Whitcraft et al., 2015c). The incorporation of SAR indices from Sentinel-1, for the detection of flooded or saturated soil conditions and subsequent vegetative growth, would improve the detection skill. Combining Sentinel-1 and 2 could lead to further improvements in the spatial accuracy of PhenoRice leading to a multi-resolution approach that combines the strengths of hypertemporal Sentinel-3 information with the higher spatial resolution of Sentinel-2 and the cloud penetrating capabilities of Sentinel-1.

Conclusions
In this study we present a method to automatically generate a spatial rice crop calendar covering the main rice producing countries of Asia based on a long duration timeseries of remote sensing data. The Phe-noRice algorithm was used to generate pixel-level SoS and EoS estimates for rice for 2003-2016 which were then pooled and analysed to determine the number, SoS and EoS of rice seasons per administrative unit to create the RICA rice crop calendar for Asia. We compared RICA to the state-of-the-art RiceAtlas crop calendar across Asia and made country comparisons for Japan and the Philippines where RiceAtlas data are known to be of particularly high quality. Ad hoc comparisons against more locally tuned EO-based rice crop calendars suggested that the RICA performed comparatively well. This comparison of EO-based and survey-based calendars showed that the EO-based RICA could estimate the number, SoS and EoS of rice seasons sufficiently well and could accommodate the wide range of SoS/EoS dates, cropping intensities, climates, and rice crop management practices across Asia. The RICA dataset consists of SoS and EoS dates per rice season per administrative unit and is a first crop-specific example of using EO information to generate a crop calendar across such a large geographic area using multi year data.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.