Climate drivers of the variations of vegetation productivity in India

Variations in the uptake of atmospheric carbon by vegetation over India, the second-highest contributor to global greening, have enormous implications for climate change mitigation. Global studies conclude that temperature and total water storage (TWS) cause interannual variations of carbon uptake based on the correlation coefficient, which is not a causality measure. Here, we apply a statistically rigorous causality approach, Peter Clark momentary conditional independence, to the monthly observed satellite and station-based gridded dataset of India’s climate and carbon uptake variables. We find no existence of causal connections from TWS to gross primary production (GPP) or net photosynthesis (PSN). Causal relationships exist from precipitation to GPP and PSN. Since shallow-rooted croplands dominate India’s green cover, impacts of precipitation on carbon capture of the the land ecosystem are immediate and not via TWS. Our results identify the key climate drivers of GPP/PSN variability and highlight interactions between water and the carbon cycle in India. Our results also highlight the need for formal causal analysis using climate and earth sciences observations rather than the conventional practices of inferring causality from correlations.


Introduction
The terrestrial biosphere absorbs 30% of anthropogenic CO 2 emissions, thus mitigating climate change (Schlesinger 1993, Heimann andReichstein 2008). The response of the terrestrial biosphere to future climate change is highly uncertain as a result of human perturbations (Newbold et al 2020). Over the last few decades India has experienced substantial population growth, and its present population is 1.36 billion (Coale and Hoover 2015). India has also experienced steep economic growth (Maparu and Mazumder 2017). Thus, there exists an increasing trend in CO 2 emissions in India (Sun et al 2017, Peters et al 2020, Shahbaz et al 2021. Despite a rising trend in CO 2 emission, India is the second-highest contributor to global greening, as revealed by global satellite datasets (Chen et al 2019). The conflicting contributions from growing human emissions and increasing vegetation to the terrestrial carbon budget make the system complex.
The changes in terrestrial carbon storage are primarily driven by carbon uptake through photosynthesis and respiration (autotrophic and heterotrophic) (Prentice et al 2001). These two processes are controlled by radiation, water and nutrient availability (Schlesinger and Bernhardt 2020). On a global scale, the role of temperature and total water storage (TWS) in driving the interannual variation in the carbon growth rate is well documented (Jung et al 2017, Humphrey et al 2018. However, these relationships are based on an association, which does not infer causality. The association between variables may either be causal or driven by a common driver (Reichenbach and Morrison 1956). Moreover, the number of regional studies in this direction is limited (Frank et al 2015). For the Indian subcontinent, such an analysis is yet to be performed. India has a strong seasonality in climate, driven by the South Asian Monsoon (Reuter et al 2013). The region is also characterized by very high spatial variations in climate, with annual rainfall ranging from 300 mm (desert in the west) to 3000 mm (northeast region and west coast). Greening in India is contributed by agricultural croplands, which have region-specific seasonalities and properties (Chen et al 2019). IPCC SR1.5 assessed (Masson-Delmotte 2018) that agriculture and agricultural economies will be worst affected by global warming and increased climate extremes such as heatwaves, droughts and floods. Hence, the regional climate over India will have a strong impact on its carbon cycle.
The climate impacts on the carbon cycle over India have not been well explored in the literature, which is partially due to the non-availability of data and methods. The complex biosphere-hydrosphereatmosphere interactions in the Indian sub-continent make such an analysis challenging. Conventional correlation-based studies used to understand such processes at a global scale may not be applicable at the local scale due to multiple indirect interactions and confounding factors. Here, for the first time, we apply a well-grounded causality approach to understand the impacts of climate variables on the productivity of Indian vegetation. The science questions addressed here are: (a) are the globally identified factors such as TWS affecting the variability of vegetation productivity in agriculture-dominated India and (b) what is the efficacy of the causality approaches in understanding complex causal processes such as regional water-carbon cycle interactions over South Asia? The methodology implemented for addressing the science questions is elaborated in the next section.

Data
We analyze causal factors of the land biosphere carbon exchange by vegetation over India (supplementary figure 1). Inter-annual variability of land biosphere carbon exchange is governed by climate drivers (Jung et al 2017, Humphrey et al 2018, mainly through photosynthesis. Humphrey et al (2018) suggested that TWS perturbs carbon uptake by plants and, subsequently, CO 2 growth rate (CGR). Humphrey et al (2021) found that the interannual variability in modeled carbon uptake is governed by vapor pressure deficit and temperature, which are in turn controlled by soil moisture. Photosynthetic carbon exchange can be measured as gross primary productivity (GPP), net photosynthesis (PSN) or net primary productivity (NPP). Here, we use GPP and PSN from a satellite product, the Moderate Resolution Imaging Spectroradiometer (MODIS) (Running et al 2015). GPP (kgC day -1 ) is the daily total of photosynthesis which is estimated by the multiplication of absorbed photosynthetically active radiation and the radiation use efficiency coefficient (ε). While GPP is calculated daily, MODIS provides 8-day summations of GPP at 500 m spatial resolution. We also use maintenance respiration (leaf + fine root, R m ) (kgC day -1 ) which is estimated based on the leaf or fine root mass, base maintenance respiration per unit leaf/fine root carbon per day at 2 • C and daily average temperature. It is calculated daily. PSN (kgC day -1 ) is estimated by subtracting leaf and fine root respiration from GPP over a 24 h day.
We use the MOD17A2H (Running et al 2015, Running and Zhao 2019) version 6 GPP product which is a cumulative 8-day composite of values with 500 m pixel size. The data product includes information about GPP and PSN. R m is calculated by subtracting PSN from GPP. We applied quality control filters available in the product and converted the data from an 8-day to a monthly scale by using a weighted average. The MODIS product of PSN does not consider growth respiration (R g ), which is a major limitation of the present study. MODIS also provides the NPP, but at an annual scale, which is not useful for the present analysis at a monthly scale. Based on availability, we used the monthly MODIS GPP and PSN for our analysis. We also used the leaf and fine root maintenance respiration (R m , equation (1)) for generating a separate causal network, as obtained by subtracting PSN from GPP (1) To ensure the robustness of our results we also used net ecosystem exchange (NEE) and GPP from FLUXCOM (Tramontana et al 2016, Jung et al 2020. NEE is defined as GPP minus total ecosystem respiration (autotrophic and heterotrophic). FLUXCOM provides NEE by merging FLUXNET eddy covariance tower fluxes using machine learning. It uses two experimental setups: the remote sensing (RS) setup which is based on satellite data and the RS + METEO setup, which also exploits meteorological data with the mean seasonal cycle of satellite data. We use NEE for the RS + METEO setup. The spatial resolution of the data is 0.5 • . It should be noted that previous studies (Jung et al 2017, Humphrey et al 2018 used CGR, which is the difference in CO 2 concentrations between two successive months. It involves the terrestrial biosphere CO 2 exchange, ocean CO 2 exchange, wildfire and fossil fuel emissions. We could not use CGR in our present regional analysis because (a) there are very few well-calibrated CO 2 monitoring sites in south Asia and (b) the CO 2 emissions from neighboring countries may impact the CGR in India. Here, we examine the role of climate variables on MODIS PSN and FLUXCOM NEE, considered as proxies for carbon uptake.
To understand the sensitivity of carbon exchange to regional monsoon climate, we use the precipitation and temperature data from the India Meteorological Department (IMD) (Srivastava et al 2009, Pai et al 2014 at a spatial resolution of 0.25 • and 1 • , respectively. The IMD provides temperature data by interpolating 395 quality-controlled stations using Shepard's angular distance weighting algorithm with an error of less than 0.5 • C. For precipitation data, it uses a dense network of 6995 rain gauge stations. The data were originally available on a daily scale, which was converted to a monthly scale. For the TWS, we use the satellite data provided by the Gravity Recovery and Climate Experiment (GRACE) (Save et al 2016). The spatial resolution of data is 1 • . GRACE TWS represents the net changes in water storage, which includes groundwater and soil moisture, surface water, snow, etc. The TWS time series has very few missing values; we used linear interpolation to fill this gap. We also use leaf area index (LAI) as an additional variable as this may directly affect photosynthesis and carbon absorption. LAI is obtained from MODIS. It is available at a spatial resolution of 500 m and temporal resolution of 8 days (Myneni et al 2015). We performed our analysis for a continuous period of 201 months (April 2002-December 2018).

Method
We aggregate the data spatially for India and exclude the contribution of the Himalayan Region because of inadequate station coverage (supplementary figure 1). We perform the Mann-Kendall test on time series data to find out whether a significant trend (p-value = 0.05) is present or not. We find that LAI and R m have an increasing trend while TWS has a decreasing trend. Therefore, we detrended these three time series. We further remove the seasonality from our time series data by computing the monthly anomaly with respect to the monthly average. For the classification of forests and croplands, we use the International Geosphere-Biosphere Programme classification of MODIS land cover type (Friedl and Sulla-Menashe 2019) (MCD12Q1) data product.
Global analyses (Jung et al 2017, Humphrey et al 2018 on understanding the causal factors affecting vegetation carbon uptake are largely based on correlation coefficients or partial correlation coefficients. Correlation is also not suitable for capturing the complex nonlinear association present in the earth system where the variables may have high autocorrelation (Runge et al 2019a). It gives information on pairwise linear association between variables. Correlation between X and Y does not imply causation (Altman andKrzywinski 2015, Pearl andMackenzie 2018) primarily because there is no direction associated with it (X → Y or Y → X) and there may be some influence of a third variable (Z) (Runge et al 2019b). This third variable may be a common driver (Reichenbach and Morrison 1956) Metrics such as Pearson's correlation or advanced nonlinear metrics such as mutual information do not confirm any causality. Granger causality (Granger 1969), nonlinear state-space methods (Sugihara et al 2012) and statistical independence-based methods are some approaches used to estimate causal relationships in a system. Granger causality is based on autoregressive modeling and is suitable for a linear stochastic system. Nonlinear state space methods (suitable for a deterministic system) take a dynamical systems approach and are based on attractor reconstruction. Statistical independence-based methods identify causality in nonlinear stochastic systems and are often implemented in graphical model frameworks.

Peter Clark momentary conditional independence
The Peter Clark momentary conditional independence (PCMCI) (Runge et al 2019b) algorithm is based on statistical independence testing and evaluates whether Y t is independent of X t−τ given some conditioning set Z. It takes care of the limitations of correlation. The underlying assumptions (Runge 2018) of PCMCI are as follows: (a) Causal sufficiency, which states that all available information in the universe about X and Y should be considered; although most of the time this assumption is partially fulfilled by assuming that there is no unobserved confounder of X and Y. (b) Causal stationarity, according to which conditional independence should remain stationary; however, the strength may fluctuate over time. (c) The causal Markov and faithfulness assumptions. According to the causal Markov assumption, if X and Y are d-separated in the graph then they will be independent in distribution. Intuitively this ensures that if a physical process does not connect two variables then they should be conditionally independent conditional on their parents. The faithfulness assumption is the converse of Markov's assumption. A brief overview of the PCMCI approach is provided in the supplementary text.

Land use/land cover and vegetation productivity
The land use/land cover classification of India is shown in figure 1(a). It shows that croplands dominate greening in India, with more than 58% area coverage. Forests mainly cover northeastern, western and northern regions. GPP ( figure 1(b)) shows higher values in forest regions and relatively lower values in croplands. Leaf and fine root maintenance respiration (R m ) show similar patterns ( figure 1(c)). Figure 1(d) shows PSN, the difference between GPP and R m . Table 1 shows the PSN values for different vegetation types in India. Spatial mean PSN suggests that forests contribute more to PSN than croplands per  unit area. However, Indian green cover is dominated by croplands, and hence the total contribution of croplands to the national PSN is more than that of forest land. The time series of the spatial average of GPP, R m and PSN over India are plotted in figures 2(a)-(c). The plot shows that the GPP, R m and PSN over India have a strong seasonality, associated with climate. Figure 2(d) shows the mean seasonal cycles of GPP, R m and PSN. GPP has a trough in May, and then it attains a peak in September. During the monsoon season of June to September, when 80% of annual rainfall in India occurs, the vegetation becomes active with an increase in photosynthesis and a subsequent increase in CO 2 uptake. R m has a slightly flatter curve and then a peak in September. GPP is higher than R m throughout the year, resulting in a positive PSN. Unlike GPP, PSN does not attain a peak in September. During the monsoon, vegetation is fully grown and has high productivity. After the monsoon season GPP and R m both start to decrease but R m decreases faster than GPP, leading to higher PSN in the post-monsoon period, in the months of November and December. We further remove the seasonality in GPP, R m and PSN by computing the monthly anomaly with respect to the monthly average. The PSN anomalies are plotted in figure 2(e). Figure 2(e) shows that there is quite a significant variability in PSN anomalies even after the removal of seasonality. No studies are available for understanding the driving factors of such variations. It is noteworthy that the non-seasonal variability (max-min) is of the order of 0.01 kgC m −2 day −1 , which is around 50% of the total variability (maxmin), approximately 0.02 kgC m −2 day −1 , as seen in figure 2(c). Hence, understanding the non-seasonal variations is vital for a better understanding of the carbon cycle in India. It should be noted that the climate variables do have a strong seasonality due to the monsoon climate in India (supplementary figure 2), and here we aim to analyze the causal connections between climate and carbon flux not arising due to seasonality.

Causality analysis
We first use conventional correlation analysis to understand the association between the climate variables and PSN. We obtain the correlation coefficients between PSN and the climate variables precipitation (P), temperature (T), TWS and LAI at a time lag of up to 6 months. Supplementary figures 3(a)-(d) present the correlation coefficients of PSN with P, T, LAI and TWS, respectively. These plots are prepared with the detrended datasets before removing the seasonality. The seasonal signature is quite prominent in all the plots with a cyclic pattern. The seasonal pattern may also dominate over the real associations among variables. We have found that P, T, LAI and TWS have a very strong seasonality, as expected (supplementary figure 2). Hence, we remove the seasonality from all the datasets for further analysis. We present the correlations between the monthly anomalies of the variables from their respective monthly mean seasonal cycles in supplementary figures 3(e)-(h), in the same order as supplementary figures 3(a)-(d). The majority of the correlation values have become statistically insignificant after removing the seasonality. The association between PSN and TWS in India is insignificant, suggesting that TWS is not associated with vegetation productivity and carbon absorption, contrary to the findings of global studies. To understand the reason behind this discrepancy, we chose to perform a statistical independence-based causal discovery analysis.
We use PCMCI (Runge et al 2019b), which shows better performance than Granger causality in nonlinear high-dimensional stochastic systems (Runge 2018). First, we analyzed local climate control on GPP and R m , which are two basic components of the terrestrial carbon cycle, and then analyzed the impacts of . Causal graphs for India show consistent links of two conditional independence-based approaches, namely partial correlation and Gaussian process-distance correlation: (a) gross primary productivity, (b) leaf + fine root maintenance respiration, (c) net photosynthesis. The variables here are spatially averaged over the Indian region excluding the northwest Himalayan region (supplementary figure 1). Before performing the causality analysis, the variables which showed a significant trend (Mann-Kendall test; p < 0.05) were detrended, and thereafter all the variables were deseasonalized. climate on PSN. Figures 3(a)-(c) show only the consistent links in the graphs obtained using partial correlation (ParCorr) and Gaussian process distance correlation. Figure 3(a) shows the causal links for GPP. We find that temperature and LAI are associated in real time with GPP. Precipitation has a causal link with PSN at lag 1. It is noteworthy that there are no causal links from TWS to PSN for India. Causal links for R m are similar (figure 3(b)) to those obtained in figure 3(a), except that R m does not receive a causal link from precipitation. R m has a real-time association with temperature and LAI. A causal link from TWS to R m does not exist. The causal graph for PSN, shown in figure 3(c), summarizes the above-mentioned causal graphs for GPP and R m . Temperature is associated with GPP, R m and PSN in real time, while precipitation sends a causal link to GPP and, hence, to PSN. It is noteworthy that precipitation has a causal link with TWS in all three causal graphs, as expected. No causal link originates from TWS to GPP, R m or PSN. Hence, we may conclude that TWS does not have any causal association with vegetation productivity in India. Instead, they are more directly affected by precipitation and temperature. These results also suggest that the high association between TWS and the carbon growth rate found in global studies may be due to the effect of precipitation on TWS, which is a common driver of TWS and vegetation productivity in India.
To further ascertain that specific data products do not influence our results, we performed the same analysis with FLUXCOM GPP and NEE. Supplementary figures 4 and 5 summarize the results for GPP and NEE, respectively. We do not find any causal connection between TWS and GPP/NEE, similar to figure 3. The influence of TWS is not visible because croplands dominate greening in India (figure 1). In the majority of croplands precipitation does not take long to reach the shallow root zone. Hence, shallow-rooted vegetation responds directly to precipitation and the intermediate role of TWS is not visible. For FLUX-COM GPP, the causal connections are very similar to those obtained from the MODIS GPP products. For FLUXCOM NEE, we find real-time connections with temperature, precipitation and LAI. The only difference from figure 3(c) is the lack of a causal connection between precipitation and FLUXCOM NEE. FLUXCOM NEE considers heterotrophic respiration. Hence, the causal structure from NEE is not directly comparable to that obtained using PSN from MODIS. However, similar causal variables identified in the analysis of the two different datasets show the consistency of results, to a certain extent.

Role of land use/land cover
To understand the role of land use/land cover, we perform a causal analysis for forests and croplands separately. For forests, shown in figure 4(a), the causal  1). Before performing the causality analysis, the variables which showed a significant trend (Mann-Kendall test; p < 0.05) were detrended, and thereafter all the variables were deseasonalized. link to PSN is generated from temperature, while precipitation is associated in real time with PSN. For croplands (figure 4(b)), precipitation causes changes in PSN, and there is a real-time connection between temperature and PSN. As croplands dominate India, the connections are very similar to those obtained for the entire country ( figure 3(c)). For forests, the vegetation is deep-rooted: an immediate causal effect of precipitation on PSN is not visible and temperature impacts are dominant. Precipitation affects the shallow-rooted plants in croplands, and the causal link is visible at a lag of 1. It is important to note that TWS does not cause variability in PSN for cropland or forests. This is counter-intuitive for forests, and the reason could be the limitations of the TWS data. Forests are scarce in India in the northeast, the west coast and several other small regions. At such a fine resolution, the GRACE TWS data are not reliable (Vishwakarma et al 2021). Hence, the causal link involving TWS in scarce forests does not reflect the causal processes. It should also be noted that a large portion of forests lie in hilly regions where the data quality may not be good.

Discussion
We attribute the dependence of vegetation productivity in India to precipitation rather than TWS due to the dominance of croplands. Croplands typically have shallow-rooted plants. For shallow-rooted vegetation, the impact of precipitation on plant photosynthesis is immediate (supplementary figure 6(a)), unlike in deep-rooted forests. For most croplands water needs very little time to reach the soil within the shallow root zone, and hence the intermediate role of TWS is not visible in Indian regions. However, things may be different for irrigated regions, where the depletion of TWS is associated with high irrigation and higher primary production (supplementary figure 6(b)). Higher coverage of croplands with a different irrigation practice in India makes the interactions between the carbon and water cycle region-specific, unlike global traits. This demands regional land-atmosphere modeling of South Asia with the inclusion of dynamic vegetation and the carbon cycle. To the best of our knowledge, such studies are limited. Newer scientific insights about regional water and carbon cycles over the South Asian region through regional observations and modeling will provide important information for climate change projection studies. Our analysis also calls for statistically rigorous causality analysis for attribution studies instead of over-simplified, widely used correlation or partial correlation.

Data availability statement
Data used are from sources referenced in the manuscript.
No new data were created or analyzed in this study.
the funding required at IIT Bombay to perform the analysis.