A statistical approach towards defining national-scale meteorological droughts in India using crop data

In recent years, several drought indices have been developed and used to monitor local to regional scale droughts on various temporal scales. However, to our knowledge, these indices do not possess generalized criteria to define a threshold at which to declare a national-scale drought. We present a statistical methodology to identify national-scale meteorological drought years in India. We implement a Superposed Epoch Analysis and bootstrap analysis to estimate annual cereal crop production losses as a result of widespread meteorological drought events. For this purpose, the meteorological definition of drought based on the Standardized Precipitation Index (SPI) and Standardized Precipitation Evapotranspiration Index (SPEI), in combination with the country’s cropland area and cereal crop production, is used. The results demonstrate that a national-scale meteorological drought can be defined when approximately 19% or more of India’s cropland is affected by meteorological drought (SPI3 and SPEI3 equal to or less than −1.00) throughout the monsoon season (June–September). According to this analysis, depending on the indicator data used, a total of 18 to 20 national-scale meteorological droughts were identified in India during 1964–2015, causing a 3.61% to 3.93% composite decrease in cereal crop production. The years commonly identified as national-scale meteorological droughts over cropland by the different approaches are 1965, 1972, 1987, 2002, and 2009. A similar statistical approach can also be used to define drought thresholds at various spatial scales using the drought indices most applicable to the purpose and scale of study.


Introduction
Drought is an extreme climatological hazard severely affecting agricultural production and threatening local and global food security [1]. Climate variability (rainfall and temperature) is a prominent driver of agricultural production [1][2][3]. Globally, climate variability is estimated to account for 32%-39% of the observed variability in crop yield [4]. Furthermore, extreme climate hazards (droughts and extreme heat) have been estimated to reduce national cereal crop yield by 9%-10% [1]. India is one of the largest producers and exporters of cereal crops. A decrease in cereal crop production in India, driven by major drought events, could affect the food security of the country. The drought impacts could also propagate to the countries that depend on imports from India. Therefore, for better planning and mitigation efforts, it is imperative to identify the major droughts in the country whose impacts could be felt in national-level production of cereal crops. To date, several drought indices have been developed and used to monitor the severity of local to regional droughts on various temporal scales [5][6][7], and studies have used these indices to assess the impact of drought events on crop production [1,2,8-11]. However, current indices do not have generalized criteria to define a threshold at which to declare a national-scale drought. A drought event is considered a national-scale drought when it affects multiple regions or a large proportion of the country [12].
The application of meteorological drought indices, such as the Standardized Precipitation Index (SPI) [13] and Standardized Precipitation Evapotranspiration Index (SPEI) [14], uses monthly indicator values to determine whether a geographical area of interest experiences mild (−0.99 to 0), moderate (−1.49 to −1.00), severe (−1.99 to −1.50), or extreme (−2.00 or less) drought [13]. However, the severity of drought events is based on climatology alone, and the relative impacts of such events on agricultural or related socio-economic activities are not considered when deciding whether a drought is of subnational or national scale. For example, Wallander and Ifft [15] assumed a national-scale drought in the United States if more than 50% of agricultural land is exposed to drought of moderate or greater severity on the Palmer Modified Drought Index (−2.00 or less). Similarly, the India Meteorological Department (IMD) [16] identifies a meteorological drought if the seasonal rainfall in a particular area is less than 75% of its long-term average. Droughts are further classified as moderate and severe if the seasonal rainfall deficit is between 26% and 50% and >50% of the long-term average, respectively. The IMD defines an All India Drought Year (Deficient Year) when 20% of the country's geographical area is exposed to drought [16,17]. However, these area thresholds do not consider the measured impacts of individual drought events on a national scale.
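The severity classes quoted above can be written as a small lookup. The following is a minimal sketch, assuming the class boundaries are applied to a continuous index value (so that, for instance, −0.995 falls in the mild class); the function name `classify_drought` is illustrative and does not come from the cited sources.

```python
def classify_drought(index_value: float) -> str:
    """Map an SPI or SPEI value to the severity classes quoted in the text.

    Assumption: boundaries are treated as continuous cut-offs, and any
    value above zero is reported as "none".
    """
    if index_value <= -2.00:
        return "extreme"
    if index_value <= -1.50:
        return "severe"
    if index_value <= -1.00:
        return "moderate"
    if index_value <= 0.0:
        return "mild"
    return "none"


if __name__ == "__main__":
    for value in (0.4, -0.99, -1.00, -1.75, -2.30):
        print(value, classify_drought(value))
```

The two thresholds analyzed later in the paper (−1.00 and −1.50) correspond to "moderate or worse" and "severe or worse" in this classification.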
Understanding the historical drought impacts on national-scale crop production is important for future drought monitoring, declaration, planning, and mitigation efforts. Spinoni et al [18] have developed a global database of meteorological drought events from 1951 to 2016. The droughts listed in this database are based on the SPI and SPEI at different accumulation scales (from 3 to 72 months). The Emergency Events Database (EM-DAT) [19] is a global list of natural and technological disasters (including droughts) from the 1900s to the present day. To be considered a disaster, one of the following criteria must be met: 10 or more people dead, 100 or more people affected, the declaration of a state of emergency, or a call for international assistance. This data is gathered from various organizations, including the United Nations institutes and national governments. However, the database does not cover all disasters, and political limitations can affect the accurate recording of data. Although the list is substantial, the criteria for recording disasters are neither objective nor quantitative and can, therefore, cause uncertainties if the data is used to further investigate disaster impact, including the impact of droughts. Drought disaster data from EM-DAT was used to assess the impact of extreme weather disasters (droughts and extreme heat) on national cereal production [1]. However, due to the limitations mentioned, the drought impact on crop yield may have been over- or underestimated in that study.
Against this background, the study aims to develop a generalized approach to define national-scale droughts (mainly meteorological droughts) in India. This study provides (i) a list of national-scale meteorological drought years from 1964 to 2015, (ii) the composite percentage impact of the identified national-scale droughts on national cereal production, and (iii) the threshold of cropland area exposed to drought, which can be used as an indicator for future national-scale drought events in India. It should be noted that the present study does not propose a new drought index but a generalized approach to define area thresholds for identifying national-scale droughts using commonly used drought indicators.

Data and methods
The data and sources used in this study are given in table 1. Monthly gridded precipitation data available for the globe for the period 1901-2018 at 0.50-degree resolution was downloaded from the Climate Research Unit (CRU) website [20]. The CRU dataset is based on the analysis of over 4000 individual weather station records. The MATLAB code used to calculate the SPI, with precipitation as the input parameter, was downloaded from MATLAB Central File Exchange [21]. The code fits a gamma distribution to the precipitation. The SPEI based on CRU TS 4.03 precipitation and potential evapotranspiration data from 1901 to 2018 was downloaded from [14,22]. Global cropland data (annual) for the period 1901-2018 at 0.50-degree resolution was obtained from the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) [23] as used in Aleman et al [24]. The land area at 0.50-degree resolution at different longitudes and latitudes was obtained from the ORNL DAAC [25]. The code used to fit the percentage of drought-affected cropland to parametric probability distributions was downloaded from MATLAB Central File Exchange [26]. The world's countries and regions were used as per the United Nations Statistics Division [27], and their shapefiles were downloaded from the Global Administrative and Thematic Mapping and World Borders Dataset [28,29]. Data on cereal crop harvested area, production, and yield from 1961 to 2018 was obtained from the Food and Agriculture Organization Corporate Statistical Database (FAOSTAT) [30].
We used the superposed epoch analysis (SEA) method applied in Lesk et al [1] to estimate cereal production loss in response to national drought events from 1964 to 2015. We hypothesize that a national-scale drought causes a significant decrease in national cereal production. A study by Nath et al [31] supports the hypothesis that a positive relationship exists between the area exposed to drought and the decrease in cereal production. For the SEA, we used two indicators: (i) cropland area exposed to drought and (ii) cereal production. In this study, we have used the meteorological definition of drought, i.e. precipitation deficit [32]. Cropland area exposed to drought is based on SPI and SPEI and is expressed as the percentage of a country's cropland area exposed to meteorological drought (SPI3 and SPEI3 equal to or less than −1.00 or −1.50) during the rainy season (June-September). SPI3 performs better than the 6-, 9-, and 12-month SPI time scales, giving higher (negative) correlations between cropland exposed to drought and percent decrease in crop production. Also, the three-month scale of the drought indicator captures the length of the cropping season, i.e. three to four months. According to Wang et al [33] and Wang et al [34], SPI3 is a commonly used indicator in agricultural drought studies and can represent gains and losses of moisture on a short time scale. The World Meteorological Organization (WMO) [35] describes SPI as a widely applied meteorological drought indicator [36]. It is operational in more than 70 countries due to its versatility (it can be calculated for different time scales, and comparison across different locations is possible). It is quantified based on the probability of precipitation over any time scale. Further details on the SPI methodology are provided by the WMO [35].
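The exposure indicator described above can be illustrated with a short sketch. It assumes that each grid cell already carries a cropland area (for example, cropland fraction multiplied by cell land area) and a monthly SPI3 or SPEI3 value, and that cells with a missing index are excluded from the total; the function names and the toy numbers are hypothetical and serve only to show the arithmetic.

```python
def percent_cropland_exposed(index_grid, cropland_area, threshold=-1.00):
    """Percentage of cropland area in cells whose index is <= threshold.

    index_grid and cropland_area are flat sequences with one entry per cell.
    Assumption: cells with a missing index (None) or no cropland are skipped.
    """
    total = 0.0
    exposed = 0.0
    for value, area in zip(index_grid, cropland_area):
        if value is None or area <= 0:
            continue
        total += area
        if value <= threshold:
            exposed += area
    return 100.0 * exposed / total if total > 0 else 0.0


def monsoon_mean_exposure(monthly_grids, cropland_area, threshold=-1.00):
    """Average the monthly exposure percentages over the monsoon months."""
    monthly = [
        percent_cropland_exposed(grid, cropland_area, threshold)
        for grid in monthly_grids
    ]
    return sum(monthly) / len(monthly)


if __name__ == "__main__":
    # Toy example: four cells, four months (June to September); values are made up.
    area = [10.0, 30.0, 40.0, 20.0]
    grids = [
        [-1.2, -0.5, -1.0, 0.3],   # June
        [-1.6, -1.1, 0.2, 0.1],    # July
        [0.0, 0.0, 0.0, 0.0],      # August
        [-2.0, -1.0, -1.0, -1.0],  # September
    ]
    print(monsoon_mean_exposure(grids, area))
```

Passing `threshold=-1.50` gives the second, stricter exposure series used in the analysis.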
The SPEI3 meteorological index, which builds on SPI3 but accounts for the water balance (defined as the difference between precipitation and potential evapotranspiration), is used as the second meteorological drought indicator. The advantages of SPI and SPEI are their simple calculation and their multitemporal nature (calculation for different time scales), which allow droughts to be monitored with respect to severity, duration, onset, extent, and end. According to McKee et al [13], a drought begins when the index first falls below zero and ends with the first positive value of the index following a value of −1.0 or less. A mild drought, which is also called a near-normal drought [37], may not have a visible impact on crop production. Therefore, only two thresholds of the drought indicators are analyzed in this study: (1) SPI3 and SPEI3 of −1.00 and below, which includes the moderate, severe, and extreme drought classes, and (2) SPI3 and SPEI3 of −1.50 and below, which includes the severe and extreme drought classes [13]. The percentage of cropland area exposed to meteorological drought (using SPI3 and SPEI3) is averaged over the monsoon season (June to September) and fitted to probability distribution functions. The exceedance probability (EP) derived from the fitted gamma distribution function is used to extract the potential candidates for national-scale drought years.
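The step from seasonal exposure to exceedance probability can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the gamma parameters are estimated here by the method of moments (the downloaded MATLAB fitting code may well use a different estimator), the gamma cumulative distribution is evaluated with the standard series expansion of the regularized lower incomplete gamma function, and all function names and exposure values are hypothetical.

```python
import math
import statistics


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the series expansion of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-15 * total and n < 10_000:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


def fit_gamma_moments(values):
    """Method-of-moments estimates (assumption: not necessarily the study's estimator)."""
    mean = statistics.fmean(values)
    var = statistics.variance(values)
    return mean * mean / var, var / mean  # shape, scale


def exceedance_probabilities(exposure_by_year):
    """EP of each year's exposure under the fitted gamma distribution."""
    shape, scale = fit_gamma_moments(list(exposure_by_year.values()))
    return {
        year: 1.0 - gamma_cdf(value, shape, scale)
        for year, value in exposure_by_year.items()
    }


def candidate_years(exposure_by_year, ep_percent):
    """Years whose EP (in percent) does not exceed the given EP category."""
    ep = exceedance_probabilities(exposure_by_year)
    return sorted(year for year, p in ep.items() if 100.0 * p <= ep_percent)


if __name__ == "__main__":
    # Made-up seasonal exposure percentages, for illustration only.
    exposure = {2000: 5.0, 2001: 12.0, 2002: 45.0, 2003: 8.0, 2004: 20.0}
    print(exceedance_probabilities(exposure))
    print(candidate_years(exposure, 20))
```

Years with the largest exposed area receive the smallest EP, so raising the EP category from 1% towards 100% admits progressively more candidate years.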
A seven-year window is used for the SEA, whereby cereal production in the event year is compared with the production in the three years before and the three years after the drought event. The percentage change in production is then calculated for the event year. For example, if only one year (e.g. the event year 1965) is identified in the 1% EP category (100-year return period), then cereal production during the event is compared with the average cereal production of the three years preceding (i.e. pre-3 = 1962, pre-2 = 1963, pre-1 = 1964) and the three years following the event year (i.e. post-1 = 1966, post-2 = 1967, and post-3 = 1968) to avoid the effect of the increasing trend in cereal crop production. It is further standardized with respect to the pre- and post-year averages to obtain the percentage change in cereal production during the event year. If two or more event years fall into the same EP category (e.g. 1965 and 2002), then one SEA window is defined per event to obtain composite means of the seven years surrounding the events (pre-3, pre-2, pre-1, event year, post-1, post-2, and post-3 years). The average of the composite means of the pre- and post-event years is used to standardize the composite mean of the event year.
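The window arithmetic described above can be written compactly. The sketch below follows the description in this section (composite means first, then standardization by the average of the pre- and post-event composites); the function name and the production figures in the example are hypothetical.

```python
def sea_percent_change(production, event_years, half_window=3):
    """Composite percentage change in production for a set of event years.

    production maps year -> production; it must cover every event year
    plus and minus half_window years.
    """
    width = 2 * half_window + 1
    sums = [0.0] * width
    for year in event_years:
        for k in range(width):
            sums[k] += production[year - half_window + k]
    # Composite mean for each position in the window (pre-3 ... post-3).
    composite = [s / len(event_years) for s in sums]
    pre = composite[:half_window]
    post = composite[half_window + 1:]
    baseline = (sum(pre) + sum(post)) / (2 * half_window)
    return 100.0 * (composite[half_window] - baseline) / baseline


if __name__ == "__main__":
    # Made-up production values around a single event year (1965).
    production = {1962: 100.0, 1963: 102.0, 1964: 104.0, 1965: 90.0,
                  1966: 106.0, 1967: 108.0, 1968: 110.0}
    print(sea_percent_change(production, [1965]))
```

Because the baseline is centred on the event year, a purely linear production trend gives a change of zero, which is the reason for using both pre- and post-event years.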
We applied a bootstrap method with replacement [38] to determine the significance of the calculated percentage change in cereal production during the extracted years for each EP category. The bootstrap with replacement, i.e. 10^5 SEA repetitions on randomly selected years from 1964-2015, is used to obtain fictitious composite mean percentage changes in cereal production. The number of randomly selected years for the bootstrap is kept equal to the number of observed years in the corresponding EP category. The bootstrap-derived composite mean percentage change at the 0.50% significance level (or 99% confidence interval) is used to determine the change point. The change point is defined as the EP category after which the percentage change ceases to be significant. We, therefore, define national-scale drought years as the years that fall within the EP category at which the change point is detected. Once the national-scale drought years were identified, we determined the percentage change in composite mean cereal production during the national-scale drought events. Based on this explicit statistical analysis, we recommend the minimum cropland area exposed to drought, extracted from the list of defined national-scale droughts, as a threshold for determining the likelihood of future national-scale droughts in India. The national-scale drought years identified in this study are then compared with the country-declared drought years (IMD) as well as the drought disaster events identified in the EM-DAT database. The results of the present study should not be compared directly with the drought years from IMD and EM-DAT, as all three approaches are different; the IMD and EM-DAT lists are used only as a reference for the discussion.
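A minimal sketch of the bootstrap test is given below. It assumes that years are drawn with replacement within each repetition as well as across repetitions, and that the lower 0.50% bound is taken as the corresponding order statistic of the sorted bootstrap values; the function names, the seed handling, and the synthetic production series are illustrative assumptions, not the study's code.

```python
import math
import random


def sea_percent_change(production, event_years, half_window=3):
    """Composite SEA percentage change (pre/post-standardized event-year mean)."""
    width = 2 * half_window + 1
    sums = [0.0] * width
    for year in event_years:
        for k in range(width):
            sums[k] += production[year - half_window + k]
    composite = [s / len(event_years) for s in sums]
    others = composite[:half_window] + composite[half_window + 1:]
    baseline = sum(others) / len(others)
    return 100.0 * (composite[half_window] - baseline) / baseline


def bootstrap_lower_bound(production, n_events, first_year, last_year,
                          n_rep=100_000, level=0.005, seed=0):
    """Lower bound of the bootstrap distribution of composite changes.

    Each repetition draws n_events random years (with replacement) from
    first_year..last_year and repeats the SEA on them.
    """
    rng = random.Random(seed)
    years = list(range(first_year, last_year + 1))
    changes = sorted(
        sea_percent_change(production, rng.choices(years, k=n_events))
        for _ in range(n_rep)
    )
    return changes[min(int(level * n_rep), n_rep - 1)]


def is_significant(observed_change, lower_bound):
    """A decrease is significant if it falls below the bootstrap lower bound."""
    return observed_change < lower_bound


if __name__ == "__main__":
    # Synthetic series (trend plus oscillation), for illustration only.
    production = {y: 100.0 + 2.0 * (y - 1961) + 5.0 * math.sin(y)
                  for y in range(1961, 2019)}
    bound = bootstrap_lower_bound(production, 3, 1964, 2015, n_rep=2000)
    print(bound, is_significant(-8.0, bound))
```

Production data spanning 1961-2018 is what allows a seven-year window around any year in 1964-2015, which matches the analysis period stated in the text.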

Results and discussion
The cropland data for India from 1901 to 2018, obtained from the Harmonized Global Land Use database, is shown in figure 1(a). Figure 2 shows the time series of India's cropland exposed to drought during the monsoon season: a primary agricultural season in India (known as the Kharif crop season). We defined drought-affected cropland as areas having SPI3 equal to or less than −1.  2002, 1992, and 1987, more than 40% of the country's cropland was exposed to meteorological drought (SPI3 and SPEI3 equal to or less than −1.00). Pearson's r between the SPI3-based and SPEI3-based cropland area exposed to meteorological drought is 0.98 for the −1.00 threshold, 0.96 for the −1.50 threshold, and 0.96 for the −2.00 threshold (figures 2(a)-(c)).
The temporal evolution of India's cereal crop area, production, and yield from 1964 to 2015 is shown in figure 1(c). We observe an increasing trend in total cereal production over the past five decades. Despite the increase in production, the harvested cropland area remained fairly stable. We calculate yield as the ratio of total production to harvested area. The data on harvested area and yield were omitted from the SEA, as harvested area data mostly corresponds to the planted area and is thus an unreliable indicator for drought impact assessment. We established a relationship between annual national-level cereal production and seasonal (monsoon) cropland area exposed to drought, as described in section 2. During 1964-2015, the percentage change in annual cereal production using a single SEA window and the cropland area exposed to meteorological drought were correlated (Pearson's r) at −0.56 and −0.60, based on SPI3 and SPEI3 (equal to or less than −1.00), respectively. Our results are in agreement with previous studies demonstrating the dependence of Indian agriculture on the monsoon rainfall [39,40]. The largest decrease (−15.34%) in cereal production is observed in 1966, when the cropland area exposed to meteorological drought was 19.54% and 23.61% using SPI3 and SPEI3 (equal to or less than −1.00), respectively (figures 2(d) and (e)). Figure 3 shows the percentage change in composite means for the sets of years defined by the EP categories (1% to 100%). The red line shows the observed percentage change in cereal production obtained by SEA, and the blue line is the lower significance level at 0.50%, obtained by bootstrap resampling with replacement. Observed decreases that fall below this lower level (0.50%, corresponding to a 99% confidence interval) are unlikely to have arisen by chance and are attributed to meteorological droughts.
The open circles in figure 3 show the change points at which observed decreases in composite cereal production become non-significant as EP increases (and as the number of candidate years for national-scale drought increases). The years within the EP categories below this change point are considered significant, whilst the years within the EP categories exceeding this change point are considered non-significant. The bootstrap analysis identifies the change point at 35% EP based on SPI3 (equal to or less than −1.00) and 36% EP based on SPEI3 (equal to or less than −1.00) (figures 3(a) and (b)). It identifies the change point at 14% EP based on SPI3 (equal to or less than −1.50) and 22% EP based on SPEI3 (equal to or less than −1.50) (figures 3(c) and (d)), resulting in a smaller number of drought years. We also observe non-significant changes at EP values lower than the change point EP; however, we do not treat these as change points, as we attribute this insignificance to instability associated with the small number of candidate years in the lower EP range. For example, we only identify 1972 as the single candidate for national-scale drought in the 1% EP category (figure 3(a)), and the decrease in cereal production is 5.23%, which is considered less intense. Generally, a good correlation is observed between cereal production and drought-affected cropland, though this correlation is not as strong as when focusing only on specific years.
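One possible reading of the change-point rule is sketched below: the change point is taken as the highest EP category at which the observed composite change still lies below the bootstrap lower bound, so that isolated non-significant categories at very low EP are ignored. This interpretation, the function name, and the numbers in the example are assumptions made for illustration; they are not values from figure 3.

```python
def find_change_point(ep_categories, observed, lower_bound):
    """Highest EP category whose observed change is below the bootstrap bound.

    ep_categories must be in increasing order. Returns None if no category
    is significant (i.e. no national-scale signal is detected).
    """
    change_point = None
    for ep, obs, bound in zip(ep_categories, observed, lower_bound):
        if obs < bound:
            change_point = ep
    return change_point


if __name__ == "__main__":
    # Made-up values: percentage changes and bounds per EP category.
    eps = [1, 5, 10, 20, 35, 50, 100]
    observed = [-5.2, -7.0, -6.1, -4.5, -3.6, -2.0, 0.0]
    bounds = [-9.0, -5.0, -4.0, -3.0, -2.5, -2.2, -1.0]
    print(find_change_point(eps, observed, bounds))
```

In this toy series the 1% category is not significant on its own, yet the change point is still placed at 35%, mirroring the treatment of the low-EP range described above.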
In table 2, we compare our list of national-scale drought years (through the approach presented in this study) with national drought years identified by IMD and EM-DAT as a reference for discussion. We observe a difference in the drought years identified using different drought indicators and severities (SPI3 and SPEI3 equal to or less than −1.00 and equal to or less than −1.50) in this study. Both IMD and EM-DAT use subjective criteria to define drought years, whereas the approach presented in this study is statistical. However, the significance level used to determine the change point can bring subjectivity into the present approach. The years which were commonly identified by all approaches (1965, 1972, 1987, 2002, and 2009) are highlighted in gray in table 2.
Figure 2. Scatter plot of India's cropland area exposed to (a) SPI3 and SPEI3 equal to or less than −1.00, (b) SPI3 and SPEI3 equal to or less than −1.50, (c) SPI3 and SPEI3 equal to or less than −2.00; and comparison of cropland area exposed to drought (average of Jun-Sept) (d) SPI3 and SPEI3 equal to or less than −1.00 (e) SPI3 and SPEI3 equal to or less than −1.50 with percentage change in annual cereal production compared with pre- and post-three-year averages. The dashed red line in scatter plots shows the 1:1 line.
In the present study, cropland exposed to SPI3 and SPEI3 equal to or less than −1.00 yields a total of 18 and 20 national-scale meteorological droughts in India during 1964-2015 (figures 4(a) and (b)), causing a 3.61% and 3.93% composite decrease in cereal crop production (figures 3(a) and (b)), respectively. The cropland exposed to SPI3 and SPEI3 equal to or less than −1.50 yields a total of six and eleven national-scale meteorological droughts in India during 1964-2015 (figures 4(c) and (d)), causing a 6.81% and 5.10% composite decrease in cereal crop production (figures 3(c) and (d)), respectively. A study by Nath et al [31] in northern India found that at least 50% of the loss of cereal production is caused by increased area exposed to droughts. In comparison, the production losses for the drought years reported by IMD (ten drought events) during 1964-2011 and EM-DAT (12 drought events) during 1964-2015 were estimated at 9.45% and 5.33%, respectively. We conclude that, depending on the indicator and severity thresholds used, the area thresholds for national-scale droughts will be different, resulting in different levels of composite decrease in production.
Figure 3. SEA on cropland exposed to drought using (a) SPI3 equal to or less than −1.00, (b) SPEI3 equal to or less than −1.00, (c) SPI3 equal to or less than −1.50, and (d) SPEI3 equal to or less than −1.50. The red line shows observed percentage change in cereal production obtained by SEA, and the blue line is the lower significance level at 0.50%, obtained by bootstrap resampling with replacement (10^5 repetitions). The black circle identifies the change point at which % change in composite mean (of a set of years) ceases to be significant. The years within the EP categories below this change point are considered significant, whilst the years within the EP categories exceeding this change point are considered non-significant.
According to SPI3 and SPEI3 with different severities, some of the years identified as national-scale drought years may not show a large decrease in individual-year production (years 1968, 2003, 2005, 2012, and 2014) (figure 4(a)). However, the approach considers the % change in the composite mean of cereal crop production for the set of drought years, which is significant at the 0.50% significance level (figure 3(a)). Conversely, the years 1976, 1986, and 2015 had comparatively less area exposed to meteorological droughts, but the decrease in individual-year cereal crop production was comparatively high (figure 4(a)). The results using SPEI3 equal to or less than −1.50 show good agreement with the drought events reported by IMD and EM-DAT but still miss 1974 and 1982 as drought years having a higher decrease in cereal crop production. This highlights the potential limitations of using meteorological drought indicators to assess the impact on agriculture. The coarse spatial resolution of the rainfall, temperature, and cropland data, which fails to capture spatial variation, could have weakened the relationship between cropland area exposed and decrease in crop production. Meteorological drought expressed using SPI3 and SPEI3 in high-rainfall areas might have resulted in less severe impacts on crop production. Irrigation, mainly from groundwater, could reduce the severity of meteorological drought impacts on crop production. Also, increasing adaptive capacity to drought could reduce the severity of impacts. These aspects remain out of the scope of the present study.
Table 2. Comparison of identified drought years from this study with the secondary sources. There should be no direct comparison between the results of the present study and the drought years from IMD and EM-DAT. The gray rows highlight the years which were commonly identified by this study and the other sources. Columns: Our approach SPI3 −1 or less; Our approach SPEI3 −1 or less; Our approach SPI3 −1.5 or less; Our approach SPEI3 −1.5 or less.
Floods, extreme temperatures, and other externalities may also cause a decrease in crop production. However, as the approach presented in this study is based on the cropland area exposed to meteorological drought, the results indicate that the majority of the composite decrease in crop production resulted from the droughts. The SEA bootstrap significance levels (obtained for candidate drought years) treat the decrease in crop production from other externalities as random. A similar analysis using floods or extreme temperatures can be used to identify the events showing impact at the national scale. If no change point is obtained, then the impact of the externalities is not seen at the national scale and might be limited to sub-national scales.
National-scale drought years identified in this study for the period 1964-2015 are grouped to the left of the vertical dashed line in figure 4. Most of these years are associated with a prominent decrease in crop production based on the SPI3 and SPEI3 meteorological drought indices. The percentage area threshold of drought-affected cropland, above which a year can be defined as a national-scale drought, is 19.04% and 19.24% according to SPI3 and SPEI3 (equal to or less than −1.00), respectively. With increased severity of SPI3 and SPEI3, i.e. severity equal to or less than −1.50, the percentage area threshold of drought-affected cropland, above which a year can be defined as a national-scale drought, is 15.44% and 10.39%, respectively.
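The area threshold can be read directly from the list of identified years, as the smallest seasonal exposure among them. The short sketch below illustrates this reading; the function names and the exposure values are hypothetical, and only the 19.04% figure in the usage example is taken from the text.

```python
def area_threshold(exposure_by_year, drought_years):
    """Smallest seasonal exposure (%) among the identified drought years."""
    return min(exposure_by_year[year] for year in drought_years)


def is_national_scale(exposure_percent, threshold):
    """Flag a season whose exposed cropland reaches the threshold."""
    return exposure_percent >= threshold


if __name__ == "__main__":
    # Made-up exposure values for three hypothetical drought years.
    exposure = {2001: 42.0, 2002: 19.5, 2003: 27.3, 2004: 8.1}
    print(area_threshold(exposure, [2001, 2002, 2003]))
    # Using the SPI3 (-1.00) threshold reported in the text:
    print(is_national_scale(21.0, 19.04))
```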
To our knowledge, the approaches that exist to date are based on assumptions (50% or 20% of the geographical area exposed to drought conditions is assumed as a threshold for drought). This study was needed to address these issues and to provide an explicit approach to identify national-scale droughts. However, there is no unique drought database against which to compare our results. Depending on the type of drought, drought indicators, and data sources, the area thresholds, and hence the drought years and their impacts on crop production, will vary. Although the present study uses SPI3 and SPEI3 with two severity levels for identifying the national-scale droughts in India, stakeholders are free to use the indicators of their own choice as per data availability. The study finds a cropland area threshold for monitoring national-scale droughts of 19%, which is approximately equal to the threshold used by IMD to define an All India Drought Year (Deficient Year), with 20% of the geographical area exposed to drought, as explained in the introduction section. The major difference is that the IMD uses a threshold on geographical area exposed to drought to declare an All India Drought Year, whereas our analysis (supported by an explicit statistical approach) uses cropland area exposed to drought to identify national-scale droughts. This research stimulates discussion and thinking on drought declarations over different geographical areas, with potential scope for improvement.
The present approach has a few limitations to address in future research. The approach presented in this study does not consider the role of groundwater irrigation, which acts as a buffer against meteorological droughts. The impact of other factors, such as floods, extreme temperatures, crop diseases, and locusts, is also not considered in this analysis. To improve our results, a multivariate drought index could be developed, combining both hydrological and agricultural variables with appropriate severity thresholds (moderate, severe, extreme drought severities). The application of finer-scale cropland data is also necessary to improve the correlation between areas exposed to drought and crop production. However, this study is a first attempt to identify national-scale drought years using an explicit statistical approach as well as considering crop production impact. The proposed approach is expected to stimulate discussions towards defining national-scale droughts and to identify the area thresholds as well as impact indicators for disaster (drought) declaration.

Conclusions
We propose a new methodology with statistical evidence to define national-scale droughts, which takes into consideration the impact of drought on crop production. We apply a SEA and bootstrap analysis to estimate annual cereal production losses in response to widespread drought events in India. We identify a national-scale drought year if more than 19% of India's cropland area is affected by meteorological drought (using SPI3 or SPEI3 equal to or less than −1) throughout the monsoon season (June-September). Our statistical analysis identifies the occurrence of 18 and 20 meteorological drought years (using SPI3 and SPEI3 equal to or less than −1, respectively) during 1964-2015, resulting in a composite 3.61% and 3.93% decrease in cereal production, respectively. Our drought-identification approach can be used on various spatial scales using the drought index of interest to policymakers. We demonstrate that our drought-identification analysis will result in different area thresholds depending on the data sources used, such as cropland, geographical area, and drought index (time scale, severity, and frequency). To improve upon our method, we suggest using a combination of (1) cropland area (of administrative units or watersheds) and (2) different types of drought indices, as per their availability and at varying intensities, using spatial and temporal SEA windows. This statistical approach can also be used to assess the impact of other national-scale hazards, such as floods and extreme temperatures, on crop production. In particular, the method can be used to define area and EP thresholds to monitor the likelihood of future hazards and their potential impacts on crop production. However, the current model only considers the relationships between rainfall-derived drought indices, cropland area, and cereal crop production, and does not consider the role of supplemental irrigation or increased adaptive capacity, which can reduce the damage caused by drought events.
The accuracy of our results should improve with higher resolution rainfall, cropland, and crop production data. The national-scale drought years identified in this approach may show sensitivity to the bootstrap significance level, which was used to differentiate whether the drought impact is significant or not.