Atmospheric moisture as a proxy for the ISMR variability and associated extreme weather events

This study explores the potential of atmospheric moisture content, its transport and its divergence over the ocean and land as proxies for the variability of Indian summer monsoon rainfall (ISMR) for the period 1950–2019. Multiple linear regression analyses reveal that the interannual and intraseasonal variability of ISMR and the mean ISMR are largely controlled by the Arabian Sea moisture flux and the Ganga river basin moisture content, and these parameters exhibit statistically significant, high correlations with rainfall in most regions. The regression model and its parameters are statistically significant, and the model explains about 12%–50% of the rainfall variability in various regions. The model shows a false alarm rate (FAR) of 0.25–0.45 and a probability of detection (POD) of 0.43–0.50 for wet years in West Central, North West and North Central India. For dry years in those regions, the FAR and POD are about 0.06–0.32 and 0.60–0.70, respectively. The model reproduces about 32%–50% of the flood years and 55%–70% of the drought years in those regions. The moisture indices also clearly identify the majority of wet and dry years that occurred during the period. The ISMR variability associated with the moisture indices is unaffected by the El Niño Southern Oscillation. Hence, this study demonstrates the significance of atmospheric moisture for regional rainfall distribution and suggests that these parameters can be used in both statistical and dynamical models to better predict monsoon and global precipitation.


Introduction
The hydrological cycle is an important natural process in which water reaches the atmosphere from water bodies through evaporation and plants through evapotranspiration, which eventually returns to the ground as precipitation. Generally, the Hadley cell in the austral winter supplies the water vapor required for the boreal summer. Therefore, precipitation in the northern hemisphere during boreal summer is made available by moisture in the austral winter, particularly in tropical regions (Peixóto and Oort 1983). In India, there are two main rainy seasons: the southwest (June through September-JJAS) and northeast (October through December-OND) monsoons. The southwest monsoon is the major rainy season bringing moist air from the oceans to the Indian subcontinent, known as the Indian Summer Monsoon (ISM) (Pant and Kumar 1997). During this season, moisture transport is regulated by southwest winds followed by the cross-equatorial flow that normally decides the strength of moisture transport and thus the nature of the ISM to some extent (Ramesh Kumar et al 1999).
It is well known that the relationship between ISM rainfall (ISMR) and El Niño Southern Oscillation (ENSO) was strong in previous decades (Annamalai and Liu 2005, Mishra et al 2012, Ashok et al 2001), but has been weak in recent decades (Krishna Kumar et al 1999, Ashok et al 2001, Pai 2004, Hrudya et al 2020, Seetha et al 2020), while ISMR correlates well with the Indian Ocean Dipole events (Wang et al 2015, Gadgil and Francis 2016, Yun and Timmerman 2018). Several proxies that affect ISMR, such as the Atlantic multidecadal oscillation, Atlantic zonal mode, El Niño modoki and extratropical sea surface temperature, have also been introduced recently (Goswami et al 2006, Zhang et al 2006, Kucharski et al 2008, Pottapinjara et al 2014, Chattopadhyay et al 2015, Feifei et al 2011, Garfinkel et al 2013), but these have not been very successful in predicting ISMR accurately. Therefore, it is important to investigate other factors that can be used for predicting ISMR together with the existing climate forcings. Several studies (e.g. Gautam and Pandey 1995, Fasullo and Webster 2002) reported the role of moisture transport in deciding the onset and withdrawal of ISMR and wet or dry monsoon years in India. Moisture transport also modulates the frequency of monsoon depressions over the Bay of Bengal, which is a major factor in deciding the strength of ISMR (Vishnu et al 2016, Vishnu et al 2018). A study by Luis and Pandey (2004) noted surface atmospheric moisture convergence as a predictor for ISMR. Similarly, Ramakrishna et al (2017) showed the influence of moisture divergence on the low rainfall over India in June 2014. Apart from these, the total moisture content, measured as the precipitable water content (PWC), is also considered a precursor of the onset and withdrawal of the Indian monsoon in a study by Puviarasan et al (2015).
Similarly, the key moisture source regions and their contributions to ISMR have already been identified in previous works (Shukla and Misra 1977, Mei et al 2015, Pathak et al 2017); these are the western, central and upper Indian Ocean, the Ganga river basin and the Red Sea. Atmospheric moisture is a key driver of extreme weather events, and the hydrological cycle is essential for life on Earth; therefore, changes in both have to be monitored to predict extreme rainfall in the context of global warming. The abovementioned studies emphasize the importance of atmospheric moisture content, its transport and divergence, and their connections with ISMR. However, the application of moisture-related factors as proxies of rainfall changes in India has not been examined thoroughly. Therefore, this study presents new indices based on the atmospheric moisture parameters of PWC, vertically integrated moisture flux (VIMF) and vertically integrated moisture flux divergence (VIMFD) over various source regions, and analyzes their influence on the regional rainfall changes in India using a multiple linear regression (MLR) model. The interannual and intraseasonal variations of ISMR and the changes in mean ISMR associated with these moisture indices are investigated. Then, the potential of these new indices in explaining extreme weather events is also examined. The influence of ENSO on the moisture parameters is analyzed, and their combined effect in improving the model and interpreting the regional rainfall variability is assessed.

Rainfall data
The India Meteorological Department (IMD) gridded daily rainfall measurements made from rain gauges installed at different places in India are used for the period 1950-2019. The data, available at a 0.25° × 0.25° latitude × longitude horizontal resolution, are area-averaged over different geographic regions: Peninsular India (PI), West Central India (WCI), North West India (NWI), North Central India (NCI) and North East India (NEI). The exact locations of these regions are given in figure S1. PI includes Andhra Pradesh, Tamil  The anomaly of the monthly accumulated rainfall time series (in mm/month) for the months from June to September (JJAS) is computed by subtracting the monthly climatology (1950-2019) from the accumulated data for the corresponding month over the study regions. Similarly, the anomaly of JJAS seasonal rainfall, both accumulated (in mm/season) and averaged (in mm/day), is calculated by subtracting the corresponding seasonal climatology from the respective data over the mentioned regions. The percentage deviation (in %) of JJAS rainfall is evaluated by dividing the accumulated seasonal rainfall anomaly time series by the accumulated seasonal climatology. Thus, four sets of rainfall anomaly time series (monthly accumulated, accumulated JJAS rainfall, mean JJAS rainfall and percentage deviation of JJAS rainfall) are made for MLR analysis with moisture indices.
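The anomaly and percentage-deviation steps described above can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the function name `seasonal_anomaly` and the five sample totals are hypothetical, and the climatology is assumed to be the plain mean over the supplied record.

```python
from statistics import mean


def seasonal_anomaly(seasonal_totals):
    """Return (anomaly, percentage deviation) for JJAS totals in mm/season."""
    climatology = mean(seasonal_totals)  # long-term seasonal mean of the record
    anomaly = [x - climatology for x in seasonal_totals]
    # Percentage deviation: anomaly divided by the seasonal climatology.
    percent = [100.0 * a / climatology for a in anomaly]
    return anomaly, percent


# Hypothetical JJAS totals (mm/season), for illustration only.
totals = [800.0, 900.0, 1000.0, 1100.0, 1200.0]
anomaly, percent = seasonal_anomaly(totals)
```

The same subtraction applies to the monthly series, with one climatological value per calendar month instead of a single seasonal value.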

Regression indices
The zonal and meridional wind components and specific humidity data at pressure levels of 1000, 925, 850, 700, 600, 500, 400 and 300 hPa on a 2.5° × 2.5° spatial resolution for 6 h intervals taken from the National Center for Environmental Prediction and National Center for Atmospheric Research (NCEP/NCAR) reanalysis (Kalnay et al 1996) are used for the moisture flux analyses. The multivariate ENSO index (MEI) data taken from https://www.esrl.noaa.gov/psd/data/climateindices/list/ for the period 1950-2018 and updated from https://www.esrl.noaa.gov/psd/enso/mei/data/meiv2.data for 2019 are used as ENSO indices. The moisture indices are created from the atmospheric moisture parameters of PWC, VIMF and VIMFD over the moisture source regions as mentioned below. These moisture parameters are directly connected to atmospheric dynamics and regional climate. Therefore, the indices made from these factors would also represent changes in climate and atmospheric dynamics.
The moisture source regions are selected from the VIMFD values at 1000-850 and 1000-300 hPa levels as illustrated in figure S2 and are Arabian Sea (5°S-15°N; 48°-78°E), Bay of Bengal The locations of these source regions are demarcated in blue in figure 1. The moisture parameters inside the rectangular boxes are considered for creating indices with respect to the Arabian Sea and Bay of Bengal. The regression indices are computed by averaging moisture parameters over the marked regions for each month, and then monthly climatology is subtracted from it and normalized with the standard deviation over 1950-2019. Thus, 12 new moisture indices are made from VIMF, VIMFD and PWC over the Arabian Sea (denoted as VIMF_A, VIMFD_A and PWC_A, respectively), Bay of Bengal (VIMF_B, VIMFD_B and PWC_B), central Indian Ocean (VIMF_C, VIMFD_C and PWC_C) and Ganga river basin (VIMF_G, VIMFD_G and PWC_G). The temporal evolution of the moisture indices at 1000-300 hPa level is provided in figure 2 and that at 1000-850 hPa level is given in figure S3.
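As a rough sketch of the index construction described above, the following takes an already area-averaged monthly series, removes the monthly climatology and normalizes by the standard deviation. The function name and sample values are hypothetical, and the choice of a single standard deviation over the whole anomaly record (rather than one per calendar month) is an assumption, since the text does not specify it.

```python
from statistics import mean, pstdev


def standardized_monthly_index(values, months):
    """values: area-averaged parameter per time step; months: month numbers (1-12)."""
    # Monthly climatology: mean of all values sharing a calendar month.
    clim = {m: mean(v for v, mm in zip(values, months) if mm == m)
            for m in set(months)}
    anomalies = [v - clim[m] for v, m in zip(values, months)]
    # Assumption: one standard deviation computed over the full anomaly record.
    sd = pstdev(anomalies)
    return [a / sd for a in anomalies]


# Hypothetical June/July values for three years, for illustration only.
idx = standardized_monthly_index([10, 20, 12, 22, 14, 24], [6, 7, 6, 7, 6, 7])
```

By construction the resulting index has zero mean and unit standard deviation, which is what makes thresholds such as ±0.5 comparable across parameters and regions.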

Moisture flux calculation
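The PWC (W) between the surface (P_s) and the tth level (P_t) of atmospheric pressure (P) is calculated using the acceleration due to gravity (g) and atmospheric specific humidity (q) as given in Ullah and Gao (2012):
$$W = \frac{1}{g}\int_{P_t}^{P_s} q\,dP \qquad (1)$$
The VIMF or the total instantaneous moisture flux transport (Q) is calculated as:
$$\vec{Q} = \frac{1}{g}\int_{P_t}^{P_s} q\,\vec{V}\,dP \qquad (2)$$
where $\vec{V}$ is the horizontal wind vector formed by the zonal and meridional wind components. The vertical integration is performed over the lower troposphere (1000-850 hPa) and vertical column (1000-300 hPa) using the trapezoidal rule. This VIMF is decomposed into divergent ($\vec{Q}_{\chi}$) and rotational ($\vec{Q}_{\psi}$) components (Chakraborty et al 2006) as:
$$\vec{Q} = \vec{Q}_{\chi} + \vec{Q}_{\psi} \qquad (3)$$
The divergence of VIMF ($\nabla\cdot\vec{Q}$) is termed VIMFD.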
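The trapezoidal vertical integration mentioned above can be illustrated with a short sketch for a single grid point. It assumes the usual definitions, W = (1/g)∫q dP and Q = (1/g)∫qV dP, with pressure converted to Pa; the function names, the value of g and the constant test profile are assumptions made for this example and are not taken from the study.

```python
G = 9.80665  # standard gravity in m s^-2 (assumed value)

# Pressure levels of the reanalysis data used in the text, surface first.
LEVELS_HPA = [1000, 925, 850, 700, 600, 500, 400, 300]


def trapz(y, x):
    """Trapezoidal rule for samples y at increasing coordinates x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def column_integrals(q, u, v, levels_hpa=LEVELS_HPA):
    """q in kg/kg, u and v in m/s, one value per level (surface first).

    Returns (PWC in kg m^-2, zonal flux, meridional flux in kg m^-1 s^-1).
    """
    p_pa = [100.0 * p for p in levels_hpa]
    # Reverse so that pressure increases along the integration path.
    p, qr, ur, vr = p_pa[::-1], q[::-1], u[::-1], v[::-1]
    pwc = trapz(qr, p) / G
    qu = trapz([a * b for a, b in zip(qr, ur)], p) / G
    qv = trapz([a * b for a, b in zip(qr, vr)], p) / G
    return pwc, qu, qv


# Hypothetical uniform profile: q = 10 g/kg, u = 10 m/s, v = 0 at every level.
pwc, qu, qv = column_integrals([0.01] * 8, [10.0] * 8, [0.0] * 8)
```

Restricting `levels_hpa` and the profiles to the first three levels would give the 1000-850 hPa lower-tropospheric integral instead of the full 1000-300 hPa column.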

Formulation of the multiple linear regression model
The influence of atmospheric moisture parameters and moisture sources on ISMR is diagnosed using the MLR technique developed by Nair et al (2018). As a first step, the correlation between the proxies and accumulated JJAS rainfall is computed for the different regions, and then the correlation among the proxies is calculated to check for multicollinearity. Proxies that correlate well with other indices are excluded from the analysis. A stepwise regression procedure is used to choose the best model from significant, mutually independent predictors (Draper and Smith 2015). The parameter that is most highly correlated with rainfall and satisfies the limits of statistical significance (as mentioned below) is included first, and then the next most highly correlated variable is added. The statistical significance of the model and parameters is tested along with the improvement of the model. All parameters are thus re-examined at each stage of adding a new proxy, and variables that are not significant or do not improve the model performance are removed (https://statisticsbyjim.com/regression/model-specification-variable-selection/). Several statistical methods are adopted to verify the significance of the parameters and the model. The statistical significance of parameters is assessed using Student's t-test and the overall significance of the model is tested using the F-statistic. The overall fit of the model is examined using R² and adjusted R², and its overall accuracy is analyzed using root-mean-square error (RMSE) and bias. The adjusted R² and F-statistic indicate the improvement of the model, as their values increase only if the added proxy improves the model. Multicollinearity is analyzed using tolerance and the variance inflation factor (VIF). The tolerance should be greater than 0.1 or 0.2 (Lin 2008) and the VIF should be less than 2.5 (Senaviratna and Cooray 2019) for the model to be free from multicollinearity.
The autocorrelation in the residuals is tested using the Durbin-Watson (DW) statistic (Durbin and Watson 1950, Durbin and Watson 1951), and its acceptable range is between 1.5 and 2.5. Based on these criteria, appropriate regression models are constructed from the significant parameters that explain maximum variance and are given in equations (4)-(8). The indices corresponding to the Bay of Bengal are not used in the MLR model as these correlate with other indices and are statistically insignificant.
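The multicollinearity and autocorrelation checks described above can be sketched with NumPy as below. This is an illustrative implementation of the textbook definitions (VIF = 1/(1 − R²) from regressing each predictor on the others, tolerance = 1/VIF, and the Durbin-Watson ratio), not the code used in the study; the function names and the small orthogonal test matrix are hypothetical.

```python
import numpy as np


def vif_and_tolerance(X):
    """X: (n_samples, n_predictors). Returns (vif, tolerance), one value per predictor."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vif = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress predictor j on an intercept plus all remaining predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        vif[j] = 1.0 / (1.0 - r2)
    return vif, 1.0 / vif


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


# Hypothetical orthogonal predictors: no multicollinearity, so VIF = tolerance = 1.
vif, tol = vif_and_tolerance([[1, 1], [-1, 1], [1, -1], [-1, -1]])
```

With the limits quoted in the text, a model would pass when every tolerance exceeds 0.1, every VIF is below 2.5 and the DW statistic of its residuals lies between 1.5 and 2.5.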
where the left-hand side of the equation is rainfall data at various regions and is treated as the predictand, t is the year, Z is a constant level term (taken as value 1) and C_Z is the intercept that helps to adjust the fit not passing through the origin when a new proxy is added to the equation and is used to adjust the shift of the mean predicted value with that of the mean observed value (Krzywinski and Altman 2015). C_{VIMF_A}, C_{VIMF_C} and C_{VIMF_G} are regression coefficients of VIMF_A, VIMF_C and VIMF_G, respectively. Similarly, C_{VIMFD_A} and C_{VIMFD_C} are regression coefficients of VIMFD_A and VIMFD_C, respectively. C_{PWC_A}, C_{PWC_C} and C_{PWC_G} are in turn regression coefficients of PWC_A, PWC_C and PWC_G, and ε is the residual. The terms on the right-hand side, except the intercept and residual, are considered as predictors. Note that the moisture parameters at 1000-300 hPa are used as regression indices in the PI, WCI, NWI and NCI regions whereas those at 1000-850 hPa are used in NEI (details are provided in section 3.1) for MLR analysis.
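In generic form, the description above corresponds to a linear model of the following shape. The symbols R and X_i are notation assumed for this sketch only; which indices enter each regional equation is determined by the stepwise selection and is not implied here.

```latex
R(t) = C_Z\,Z + \sum_{i} C_{X_i}\,X_i(t) + \varepsilon(t),
\qquad
X_i \in \{\mathrm{VIMF\_A},\ \mathrm{VIMF\_C},\ \mathrm{VIMF\_G},\
          \mathrm{VIMFD\_A},\ \mathrm{VIMFD\_C},\
          \mathrm{PWC\_A},\ \mathrm{PWC\_C},\ \mathrm{PWC\_G}\},
\qquad Z = 1
```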

Methodology
The rainfall and proxy data are detrended to remove long-term trends. The indices are scaled to unity amplitude before performing the regression analysis so that regression coefficients are obtained in the same unit as the input data (Gopalapillai 2012). The parameters are solved using the least squares method by minimizing the error of predictors (Press et al 1989). The regression analysis is done on four sets of rainfall data: (1) the accumulated JJAS rainfall anomaly time series, (2) the accumulated monthly rainfall anomaly time series for the months from June to September, (3) mean JJAS rainfall and (4) percentage deviation of JJAS rainfall. Similarly, regression analysis is performed in three ways for the accumulated JJAS rainfall anomaly with (1) the moisture indices, to see the influence of moisture indices on ISMR, (2) moisture indices and ENSO index, to check the combined impact of ENSO and moisture indices, and (3) ENSO index alone, to see the individual impact of ENSO on ISMR.
The statistically significant correlation of JJAS rainfall with VIMF and PWC is positive while that with VIMFD is negative in PI, WCI, NWI and NCI (except VIMF_G at 1000-850 hPa level in NWI). The VIMF over the Arabian Sea, VIMFD over the Ganga river basin (except over NCI) and PWC over the Arabian Sea and Ganga river basin show statistically significant and strong correlations with rainfall in those regions. In addition, a significant correlation is shown by the central Indian Ocean VIMF and PWC at 1000-300 hPa in NWI, and the central Indian Ocean VIMFD at 1000-850 hPa in PI. The Bay of Bengal PWC at 1000-300 hPa also shows a significant correlation in PI. The correlation coefficients of the moisture indices with rainfall at 1000-300 hPa level are greater than those computed at 1000-850 hPa, and thus the former are used for MLR analysis in PI, WCI, NWI and NCI. The rainfall shows the highest correlation with the Ganga river basin VIMFD in PI (−0.54) and NWI (−0.64), with the Arabian Sea VIMF in WCI (0.65) and with the Ganga river basin PWC in NCI (0.56) at 1000-300 hPa. These correlations are higher than those found between rainfall and MEI.
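The detrending, scaling and least-squares steps of the methodology can be sketched as follows. This is a minimal illustration under stated assumptions: a linear trend is removed, "unity amplitude" is interpreted as dividing each index by its maximum absolute value, and the function names and synthetic series are hypothetical rather than taken from the study.

```python
import numpy as np


def detrend(y):
    """Remove the least-squares linear trend from a 1-D series."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)


def fit_mlr(rain, proxies):
    """rain: (n,), proxies: (n, k). Returns (coefficients, residuals).

    The first coefficient is the intercept (constant term Z = 1).
    """
    y = detrend(rain)
    X = np.column_stack([detrend(col) for col in np.asarray(proxies, dtype=float).T])
    # Assumed meaning of "unity amplitude": scale each index to max |value| = 1.
    X = X / np.max(np.abs(X), axis=0)
    A = np.column_stack([np.ones(y.size), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef


# Synthetic example: rainfall driven by one proxy plus a linear trend.
t = np.arange(20)
x = np.where(t % 2 == 0, 1.0, -1.0)
rain = 5.0 * x + 2.0 * t + 7.0
coef, resid = fit_mlr(rain, x.reshape(-1, 1))
```

Because the indices are scaled to unit amplitude, each fitted coefficient carries the unit of the rainfall series, which is what allows the contributions of different proxies to be compared directly.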

Correlation analysis
In NEI, rainfall shows a statistically significant correlation with the Bay of Bengal VIMF (−0.25) at 1000-300 hPa, the Bay of Bengal VIMF (−0.25), Arabian Sea VIMFD (0.29) and Ganga river basin PWC (0.25) at 1000-850 hPa. The moisture parameters integrated over 1000-850 hPa level are used as proxies as these provide better results in comparison to those at 1000-300 hPa level. Note that NEI is an exception here as the features of monsoon rainfall are opposite to those of other regions in India (Goswami et al 2010, Nair et al 2018). Table 2 illustrates the correlation coefficients between the moisture indices themselves at 1000-300 hPa level and also with MEI. In general, correlations between the proxies are small and insignificant, although statistically significant and strong correlations exist between VIMFD and PWC over the Arabian Sea (−0.64) and the Bay of Bengal (−0.83). Similarly, the correlation between the Arabian Sea VIMFD and the Bay of Bengal VIMFD is about 0.5 and that between the Ganga river basin VIMFD and the Arabian Sea PWC is about −0.56. The MEI anticorrelates with VIMF and PWC and correlates with VIMFD. It shows statistically significant correlations with the Arabian Sea VIMF (−0.42), VIMFD (+0.27) and PWC (−0.44), Ganga river basin VIMF (−0.25) and PWC (−0.38), and central Indian Ocean VIMFD (0.30). Similarly, correlations between the moisture indices at 1000-850 hPa level along with MEI are given in table S1 (available online at stacks.iop.org/ERL/16/014045/mmedia). The combination of highly correlated proxies (correlation greater than 0.4) is avoided for MLR analysis. However, the influence of MEI on ISMR has been tested keeping the Arabian Sea VIMF in the model, even though these exhibit a correlation of −0.42, as the Arabian Sea VIMF contributes significantly to the ISMR and therefore cannot be excluded (discussed in more detail below).
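The screening of proxy pairs against the 0.4 correlation limit can be sketched as below. The function name, the generic proxy labels and the three short series are hypothetical, and the limit is applied to the absolute value of the correlation, which is an assumption consistent with the negative values quoted above.

```python
import numpy as np


def correlated_pairs(indices, names, threshold=0.4):
    """indices: (n_samples, k) array. Returns [(name_i, name_j, r)] with |r| > threshold."""
    r = np.corrcoef(np.asarray(indices, dtype=float), rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                flagged.append((names[i], names[j], float(r[i, j])))
    return flagged


# Hypothetical proxies: "a" and "b" are perfectly anticorrelated, "c" is unrelated.
data = np.column_stack([
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
    [1, -1, 0, -1, 1],
])
pairs = correlated_pairs(data, ["a", "b", "c"])
```

Pairs returned by such a check would be candidates for keeping only one member in the regression, unless there is a physical reason to retain both, as argued for MEI and the Arabian Sea VIMF.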

Performance of the regression model
The performance of the regression model and the parameters is verified using the statistical tests mentioned earlier. The t-statistics and probabilities evaluated for the coefficients of moisture parameters regressed with the accumulated JJAS rainfall are given in table 3(a). The probability is less than 0.01 here, suggesting that parameters used in the model are all significant at a 99% confidence level. Additionally, regression coefficients and their uncertainty (2× standard deviation) are shown in table S2. The regression coefficients that are statistically significant at a 95% confidence level are given in bold numbers. The Arabian Sea VIMF and Ganga river basin PWC are significant at a 95% confidence level in PI, WCI and NWI. In NCI, Ganga river basin PWC is significant at a 95% confidence level whereas Arabian Sea VIMF is significant at an 85% confidence level. The Arabian Sea VIMFD and central Indian Ocean VIMF are significant at a 90% confidence level in WCI and NWI, respectively. In NEI, Ganga river basin VIMF and central Indian Ocean PWC are significant at a 90% confidence level. Table 4(a) provides the performance of the regression models as evaluated from various statistical measures such as R², adjusted R², RMSE, bias, F-statistic, probability, tolerance, VIF and DW statistic. The R² values suggest that the regression model could explain rainfall variability of about 32%, 50%, 41%, 48% and 12% in PI, WCI, NWI, NCI and NEI, respectively. The adjusted R² values are comparable to the R² values, implying that the employed proxies enhance the performance of the model. The RMSE and bias of the model show small values of about 82-154 and −6 × 10⁻¹⁵ to +0.7 × 10⁻¹⁵ mm/season, respectively, indicating the good performance of the model. The computed F values are high, greater than the critical value from the F-table at a level of 0.05, and the probabilities are less than 0.01. This indicates that the model results are highly significant (at a 99% confidence level) and explain noticeable variance in the PI, WCI, NWI and NCI regions. The model is significant at a 95% confidence level (probability ∼ 0.03) in NEI. Note that the significance of the model results increases as the F value increases. The tolerance is greater than 0.1 and VIF is less than 2.5 in all regions, indicating that the model is free from multicollinearity problems. The DW statistic is also between 1.5 and 2.5 everywhere. These statistical tests corroborate that the model is robust for evaluating regional variability of monsoon rainfall and hence could be a helpful tool for forecasting ISMR. A study by Pandey et al (2020) also stated the importance of global warming mode along with ENSO in improving the skill of ISMR prediction models.
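The fit statistics listed above follow from standard formulas, sketched here for a model with an intercept and k predictors. The function name and the six sample values are hypothetical; bias is taken as the mean of predicted minus observed, which is an assumption about the sign convention.

```python
import numpy as np


def fit_statistics(observed, predicted, n_predictors):
    """Return R2, adjusted R2, F-statistic, RMSE and bias for a fitted model."""
    y = np.asarray(observed, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    n, k = y.size, n_predictors
    resid = y - yhat
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {
        "r2": r2,
        # Adjusted R2 penalizes additional predictors.
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1),
        # Overall F-statistic of the regression (model with intercept).
        "f_stat": (r2 / k) / ((1.0 - r2) / (n - k - 1)),
        "rmse": float(np.sqrt(ss_res / n)),
        "bias": float(np.mean(yhat - y)),
    }


# Hypothetical observed and predicted values, for illustration only.
stats = fit_statistics([1, 2, 3, 4, 5, 6], [1.5, 1.5, 3.5, 3.5, 5.5, 5.5], 1)
```

For an ordinary least-squares fit that includes an intercept, the residuals average to zero up to rounding error, so a bias of order 10⁻¹⁵ is what floating-point arithmetic would be expected to give.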

Intraseasonal variability of ISMR
The intraseasonal variability of ISMR has a major role in the interannual variability of ISMR (Goswami et al 2006, Maharana and Dimri 2016). Therefore, the accumulated monthly rainfall anomaly time series is regressed using moisture parameters to explore their influence on the intraseasonal variability of rainfall. Figure 5 shows the impact of moisture parameters on the intraseasonal variability of monsoon rainfall in different regions. In PI, monthly rainfall variability associated with the Arabian Sea VIMF increases from June to July, decreases in August and peaks in September. The impact of Ganga river basin PWC on intraseasonal rainfall variability remains the same throughout the season with an anticorrelation in September. The central Indian Ocean VIMFD affects rainfall variability only in September, when it is anticorrelated with it. In WCI, the Ganga river basin PWC related rainfall variability is high and it peaks in September. The rainfall variability associated with the Arabian Sea VIMF increases from June to August and decreases in September. Similarly, the Arabian Sea VIMFD shows a positive correlation throughout the season and its influence is highest in June and August. In NWI, all parameters show a small influence on rainfall variability in June, and peak variability is exhibited by the Ganga river basin PWC in July and September. The influence of the Arabian Sea VIMF is highest in August. The central Indian Ocean VIMF shows a positive correlation in June and July, and a negative correlation in August and September. In NCI, the Ganga river basin PWC shows large variability, which decreases from June to September. The influence of the Arabian Sea VIMF is small, with an anticorrelation in June and July but a positive correlation in August and September. In NEI, the influence of the Ganga river basin VIMF is large and similar in June and July (about 16-18 mm/month) and also in August and September (about 12-13 mm/month). The Arabian Sea and central Indian Ocean PWC show intraseasonal variability with their maxima in June and July, respectively, in NEI. In general, large intraseasonal variability is shown by PWC compared to VIMF and VIMFD, and by the central Indian Ocean among the source regions. Also, parameters tend to show anticorrelation mainly in September.
Table 3. The parameter estimates such as t-statistics and probability of the coefficients of regression indices used in the multiple linear regression model. The regression is performed for (a) the accumulated and (b) mean JJAS rainfall data, with moisture parameters, (c) accumulated JJAS rainfall with moisture parameters and ENSO and (d) accumulated JJAS rainfall with ENSO in Peninsular India (PI), West Central India (WCI), North West India (NWI), North Central India (NCI) and North East India (NEI) for the period 1950-2019. Panel (a): accumulated JJAS rainfall with moisture parameters; predictand columns: PI, WCI, NWI, NCI and NEI.
Table 4. The summary of the regression model fit such as the coefficient of multiple determination (R² in %), adjusted R² (A-R²) in %, root-mean-square error (RMSE), bias (Bias), F-statistic (F-stat), probability (Prob), tolerance (TOL), variance inflation factor (VIF) and Durbin-Watson statistic (DW-stat) estimated by regressing the (a) accumulated JJAS rainfall data with moisture parameters, (b) mean JJAS rainfall data with moisture parameters, (c) accumulated JJAS rainfall with moisture parameters and ENSO and (d) accumulated JJAS rainfall with ENSO for the period 1950-2019. The unit of RMSE and bias for the accumulated JJAS rainfall is mm/season and that for the mean JJAS rainfall is mm/day. The F-statistic, probability, tolerance, VIF and DW statistic are unitless quantities.

MLR analysis on mean ISMR
The mean ISMR is very important in forecasting rainfall and has a strong impact on agriculture. It has also been demonstrated that the interannual variability is decisive for predicting the mean ISMR (Goswami et al 2006).
Here, mean JJAS rainfall is regressed with moisture indices at different regions for the period 1950-2019. The variability of moisture indices associated with the mean JJAS rainfall is shown in figure S4. The temporal evolution of the mean JJAS rainfall and regressed rainfall anomaly is shown in figure S5. The pattern of the observed and regressed anomaly of mean JJAS rainfall is similar to that of the accumulated JJAS rainfall as shown in figure 3. The statistical significance of the parameters and MLR model for the mean JJAS rainfall is given in tables 3(b) and 4(b), respectively. The difference is that only the dominant parameters like Arabian Sea VIMF and Ganga river basin PWC are significant at a 95% confidence level in PI, WCI and NWI. In NCI, only Ganga river basin PWC is significant, while in NEI, Ganga river basin VIMF and central Indian Ocean PWC are significant. Despite this, the model is significant at a 99% confidence level in PI, WCI, NWI and NCI as the probability is less than 0.001, and it is significant at a 95% confidence level in NEI. The computed RMSE and bias (in mm/day) are also very small. Note that as long as the pattern of input and proxy data does not change, the model output remains the same. This attests that the model is highly stable and explains the regional variability of rainfall well.

Percentage deviation of ISMR: detection of wet and dry years
Global warming enhances the moisture holding capacity of the atmosphere, which in turn increases the frequency of extreme weather events (Mukherjee et al 2018). Therefore, the percentage departure of JJAS rainfall anomaly is regressed with the moisture indices to test the applicability of the model in reproducing extreme events. The temporal evolution of regressed data along with observed rainfall anomaly for the period 1950-2019 is illustrated in figure 6. The dotted lines represent ±10% deviation from the long-term mean, a condition for determining wet and dry years. If a year meets the condition of anomaly greater (less) than 10 (−10)%, that year is considered a wet (dry) year (Kumar et al 2013). The analysis on regional average unveils a number of extreme events with total wet years (>10% anomaly) of about 19, 20, 31, 14 and 24, and dry years (< −10% anomaly) of about 20, 23, 24, 20 and 12 in PI, WCI, NWI, NCI and NEI, respectively. Out of these, the regression model reproduces about 6, 10, 15, 7 and 0 wet years and 11, 16, 15, 12 and 3 dry years in the respective regions. The model could explain about 32%, 50%, 48%, 50% and 4% of wet and 55%, 70%, 63%, 60% and 25% of dry years in PI, WCI, NWI, NCI and NEI, respectively. The performance of the model in reproducing extreme rainfall events is calculated using the number of hits (H), false alarms (F) and misses (M), the false alarm rate (FAR) and the probability of detection (POD). If both measurements and model satisfy the condition of the extreme rainfall event, it is treated as a hit. If the model does not capture the observed extreme event, it is considered a miss. A false alarm occurs when the model shows an extreme event that is not observed (Ashrit et al 2015, Sofiati and Nurlatifah 2019, McBride and Ebert 2000). The FAR is the ratio of the number of false alarms to the number of forecasts (H+F) and POD is the ratio of the number of hits to the number of observed events (H+M).
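The hit, miss and false-alarm bookkeeping defined above can be sketched as follows, using the ±10% criterion to flag events and FAR = F/(H+F), POD = H/(H+M). The function names and the six sample percentage deviations are hypothetical and serve only to illustrate the definitions.

```python
def flag_events(percent_deviation, kind="wet", threshold=10.0):
    """Flag wet (> +threshold %) or dry (< -threshold %) years."""
    if kind == "wet":
        return [d > threshold for d in percent_deviation]
    return [d < -threshold for d in percent_deviation]


def far_pod(observed, modeled):
    """observed, modeled: lists of booleans, one per year.

    Returns (hits, false_alarms, misses, FAR, POD).
    """
    hits = sum(1 for o, m in zip(observed, modeled) if o and m)
    misses = sum(1 for o, m in zip(observed, modeled) if o and not m)
    false_alarms = sum(1 for o, m in zip(observed, modeled) if m and not o)
    # FAR and POD are undefined when their denominators are zero.
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    pod = hits / (hits + misses) if hits + misses else float("nan")
    return hits, false_alarms, misses, far, pod


# Hypothetical observed and regressed percentage deviations for six years.
obs_pct = [12.0, -15.0, 3.0, 11.0, -2.0, 14.0]
mod_pct = [11.0, -5.0, 12.0, 4.0, 1.0, 16.0]
h, f, m, far, pod = far_pod(flag_events(obs_pct), flag_events(mod_pct))
```

In this toy series there are two hits, one false alarm and one miss for wet years, giving FAR = 1/3 and POD = 2/3; passing `kind="dry"` to `flag_events` gives the corresponding dry-year scores.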
The FAR should be low. Table 5(a) shows H, F, M, FAR and POD computed for the wet and dry years. The estimated FAR is about 0.25-0.45 and POD is about 0.43-0.50 for wet years in WCI, NWI and NCI. In PI, the number of hits is lower than that of misses and so FAR is somewhat high (0.58) and POD is low (0.26) for wet years. In general, the number of hits is greater than the number of misses for dry years in PI, WCI, NWI and NCI, indicating that the model could reproduce a good number of dry years there. The FAR ranges from 0.06 to 0.32 and POD varies from 0.55 to 0.70 in these regions. On the other hand, the model could not explain wet and dry years well in NEI. Therefore, the developed model is good at explaining extreme events in West Central, North West and North Central India. Peninsular and North East India demand more proxies to better interpret the rainfall variability. A study by Rajeevan et al (2007) showed zero false alarms and a POD of 77%-100% for the models computed for the period 1981-2004. Similarly, Sharma et al (2017) presented FAR of about 0.5-0.8 and POD of 0.3 for the period 2007-2015. Another study by Pandey et al (2015) presented FAR of about 0.14-0.63 and POD of about 0.33-0.60 over 1982-2013. Importantly, the model could clearly reproduce the severe droughts of 2002, 2004 and 2014 everywhere except 2002 in PI, 2014 in WCI and 2004 in NCI. Therefore, an attempt is made to find a condition for determining wet and dry years from the moisture indices used in the model. It is found that the indices stay within ±0.5 in normal years, but exceed this value during extreme events, particularly the indices of VIMF over the Arabian Sea and PWC over the Ganga river basin. A condition is hence drawn for finding extreme events from the moisture parameters such that if the index based on Ganga river basin PWC or Arabian Sea VIMF is greater (less) than +0.5 (−0.5), then it will be a wet (dry) year.
These indices satisfy this condition well for wet/dry years and even exceed +1 (−1) during extreme flood (drought) years, demonstrating the effectiveness and relevance of the new indices. This attests to the applicability and potential of our statistical model based on moisture-related parameters. The observed wet and dry years that are reproduced by the model and the corresponding moisture indices are shown in table S3. A study by Wang et al (2015) reported that the IMD could not forecast extreme events in 1994, 2002, 2004 and 2009 using statistical models for 1989-2012 and confirmed that this failure is due to the lack of proxies regarding global warming in the model.
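The index-based condition for wet and dry years can be written as a small rule, sketched below. The function name and sample values are hypothetical. The text does not say how to treat a year in which the two indices point in opposite directions, so the "mixed" label is an assumption introduced here.

```python
def classify_year(pwc_g, vimf_a, threshold=0.5):
    """Classify a year from the Ganga river basin PWC and Arabian Sea VIMF indices."""
    wet = pwc_g > threshold or vimf_a > threshold
    dry = pwc_g < -threshold or vimf_a < -threshold
    if wet and dry:
        # Assumption: conflicting signals are reported rather than resolved.
        return "mixed"
    if wet:
        return "wet"
    if dry:
        return "dry"
    return "normal"


# Hypothetical index values, for illustration only.
labels = [
    classify_year(0.8, 0.2),    # one index above +0.5
    classify_year(-0.1, -1.2),  # one index below -0.5
    classify_year(0.3, -0.4),   # both within the normal band
    classify_year(0.9, -0.9),   # indices disagree
]
```

The same function with `threshold=1.0` would isolate the years described in the text as extreme floods or droughts.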

Influence of ENSO on ISMR
ENSO is considered a major driver of the interannual variability of ISMR even though their relationship has weakened since the 1980s. Therefore, the influence of ENSO on moisture parameters and thus on the ISMR is analyzed by regressing the accumulated JJAS rainfall anomaly time series with MEI and moisture indices. The statistical significance of parameters estimated using the t-test is shown in table 3(c) and statistical results of the model are provided in table 4(c) for various regions. The statistical significance of moisture parameters remains intact with the addition of MEI. MEI is significant only in PI and NEI whereas the model is significant in all regions. In climate data analysis, a parameter that is not statistically significant can still be used if it improves the model (https://statisticsbyjim.com/regression/model-specification-variable-selection/). The multicollinearity statistics (tolerance and VIF) are also inside the favorable limits. The MEI improved the model with an increase of R² and adjusted R² by ∼2% in WCI, whereas slightly lower adjusted R² and F-statistic values are estimated in other regions. The rainfall variability associated with MEI and moisture indices is exhibited in figure 4(b). As indicated by the statistical analysis results, MEI does not affect the rainfall variability imposed by the moisture indices although slight changes are observed. Note that the behavior of the ENSO index is reversed in PI and NEI, where it correlates with rainfall, and its influence is smallest in NWI. The individual influence of ENSO on accumulated JJAS rainfall is also analyzed. The statistical significance of MEI and the corresponding model is provided in tables 3(d) and 4(d), respectively. MEI is significant only in NEI while the model is significant in WCI, NWI and NCI. The R² is very low, about 2.98%, 15.6%, 9.5%, 12.3% and 0.01% in PI, WCI, NWI, NCI and NEI, respectively.
It categorically points out that the ISMR variability cannot be explained only with ENSO.
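The model diagnostics discussed above (R², adjusted R², the F-statistic, tolerance and VIF) can be sketched as follows. This is a minimal illustration using NumPy least squares on synthetic data, not the study's implementation: the function names, the number of predictors, the regression coefficients and all data values are assumptions made for the example.

```python
import numpy as np


def ols_diagnostics(X, y):
    """Fit y = b0 + X @ b by least squares; return (R2, adjusted R2, F)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])            # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)                   # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return r2, adj_r2, f_stat


def vif(X):
    """VIF of each column of X (needs >= 2 columns); tolerance is 1 / VIF."""
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2_j, _, _ = ols_diagnostics(others, X[:, j])
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)


# Synthetic stand-ins for 70 seasons (1950-2019); none of this is study data.
rng = np.random.default_rng(0)
n_years = 70
moisture = rng.standard_normal((n_years, 2))        # two hypothetical moisture indices
mei = rng.standard_normal(n_years)                  # hypothetical ENSO index
rain = 0.6 * moisture[:, 0] + 0.4 * moisture[:, 1] + rng.standard_normal(n_years)

r2_a, adj_a, f_a = ols_diagnostics(moisture, rain)  # moisture indices only
X_with_mei = np.column_stack([moisture, mei])
r2_b, adj_b, f_b = ols_diagnostics(X_with_mei, rain)  # moisture indices + MEI
vifs = vif(X_with_mei)
```

In ordinary least squares, R² cannot decrease when a predictor is added, whereas adjusted R² and the F-statistic can. This is why a slightly lower adjusted R² after adding MEI indicates that the extra predictor contributes little in that region.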
Furthermore, the combined impact of ENSO and the moisture parameters on the intraseasonal variability of ISMR is analyzed by regressing the accumulated monthly rainfall anomaly for each month from June to September, and is shown in figure 5(b). Here, the behavior of the moisture indices remains intact, albeit with small changes in absolute values. In general, MEI is anticorrelated with rainfall throughout the season; however, a positive correlation is found mainly in PI (June and September), NWI (June) and NEI (June and August), suggesting large regional variability. The influence of MEI is similar throughout the season in PI. It is high in July and August in WCI. Similarly, the influence of MEI peaks in July and then decreases in NWI, while it increases from July to September in NCI. In NEI, the influence of MEI is high in September. Although the interannual variability is not influenced by ENSO, the intraseasonal variability of rainfall is strongly influenced by it in NWI. In short, ENSO influences rainfall in July and August in Peninsular, West Central and North West India, and in September in North Central and North East India.
The wet and dry years are also examined using the model developed from the moisture parameters and ENSO, and the resulting FAR, POD and numbers of hits (H), false alarms (F) and misses (M) are shown in table 5(b). For wet years, the numbers of hits and misses, and hence the POD, are similar to those of the model based on moisture parameters alone. However, the number of false alarms and the FAR are lower in PI, WCI and NWI, and slightly higher in NCI, than for the moisture-only model. For dry years, F, H, M, FAR and POD are similar to those of the moisture-only model in WCI, NWI, NCI and NEI. In PI, the number of hits is lower by one while the numbers of false alarms and misses are each higher by one; therefore, FAR is slightly higher and POD is lower for the model including ENSO.
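As a worked illustration of how FAR and POD follow from the contingency counts, the sketch below assumes the standard forecast-verification definitions FAR = F / (H + F) and POD = H / (H + M). The exact definitions used in this study are not restated here, so these formulas are an assumption. The hit and observed counts are the WCI dry-year values quoted in the Conclusion, and the false-alarm count is hypothetical.

```python
def far_pod(hits, false_alarms, misses):
    """Return (FAR, POD) from contingency counts.

    Assumed definitions: FAR = F / (H + F) and POD = H / (H + M).
    """
    forecast_yes = hits + false_alarms      # years the model flags as events
    observed_yes = hits + misses            # years observed as events
    far = false_alarms / forecast_yes if forecast_yes else float("nan")
    pod = hits / observed_yes if observed_yes else float("nan")
    return far, pod


# WCI dry years: 16 of the 23 observed dry years are reproduced by the model.
hits, observed = 16, 23
misses = observed - hits
false_alarms = 1                            # hypothetical value for illustration
far, pod = far_pod(hits, false_alarms, misses)
```

With these counts the POD is 16/23, about 0.70, which matches the upper end of the dry-year POD range quoted in the Conclusion.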

Conclusion
The MLR analysis demonstrates the applicability of atmospheric moisture parameters, namely moisture content, moisture transport and its divergence over the moisture source regions of the Arabian Sea, central Indian Ocean and Ganga river basin, as proxies for ISMR. The regression model is built from statistically significant moisture parameters that improve model performance; indices related to the Bay of Bengal are therefore not used, as they are statistically insignificant. The regression is carried out for the seasonal, monthly, mean and percentage deviation of southwest monsoon rainfall.
The moisture indices provide statistically significant and strong correlations (greater than 0.4) with ISMR. The regression model could explain about 12%–50% of the regional rainfall variability for the period 1950–2019. The contributions of the moisture indices to JJAS rainfall vary among regions. Among the moisture source regions, the Arabian Sea and Ganga river basin are the largest contributors to the regional distribution of rainfall. Similarly, moisture content and its transport mainly determine the amount of rainfall in all regions. Moisture transport prevails in PI and moisture content dominates in NCI, while both contribute equally in West Central, North West and North East India. The robustness of the developed model is checked using a number of statistical tests, and the model fulfills the conditions for a good model. The F-values are highly significant (at the 99% confidence level) in four regions, implying that the model explains regional rainfall variability very well. The model could also explain the intraseasonal rainfall variability and the variability of mean JJAS rainfall.
The regression model could reproduce about 6 (11), 10 (16), 15 (15), 7 (12) and 0 (3) wet (dry) years out of the 19 (20), 20 (23), 31 (24), 14 (20) and 24 (12) observed wet (dry) years in PI, WCI, NWI, NCI and NEI, respectively. The model thus reproduces about 55%–70% of drought years, especially 2002, 2004, 2014 and 2015, and about 32%–50% of flood years in all regions except NEI. In addition, the atmospheric moisture indices based on Arabian Sea VIMF and Ganga river basin PWC can be used for detecting wet and dry years: the index is greater than 0.5 for a wet year and less than −0.5 for a dry year, and exceeds +1 (−1) during extreme flood (drought) years. The model shows a FAR of 0.25–0.58 (0.06–0.32) and a POD of 0.26–0.50 (0.55–0.70) for wet (dry) years in all regions except North East India, which demands careful evaluation considering other relevant proxies in the model.
ENSO shows a good correlation with ISMR, though it is smaller than the correlation between ISMR and the moisture indices. The regression analysis reveals that the ISMR variability associated with the moisture indices is unchanged in the presence of the ENSO index, while ENSO shows a positive correlation in Peninsular and North East India and a negative correlation in the other regions. The ENSO index improved the model by 2% in WCI. ENSO strongly influences the intraseasonal variations in North West India, although it does not affect the seasonal variability of rainfall there. It mainly affects the July and August rainfall in Peninsular, West Central and North West India, and the September rainfall in North Central and North East India. The extreme events evaluated from the model based on moisture parameters and ENSO are almost the same as those deduced from the model based on moisture parameters alone.
Hence, this study demonstrates the significance of moisture content, its transport and divergence for the regional rainfall distribution. It suggests that these parameters can be used in both statistical and dynamical models to better predict ISMR. It also attests to the importance of local factors in explaining ISMR, as the local factors are affected by changes in global factors. The new indices derived from the atmospheric moisture parameters can be employed as proxies in climate change predictions and can be used together with the commonly used ENSO parameter to improve the prediction of ISMR.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.