Towards interpreting machine learning models for predicting soil moisture droughts

Determination of the dominant factors which affect soil moisture (SM) predictions for drought analysis is an essential step to assess the reliability of the prediction results. However, artificial intelligence (AI) based drought modelling only provides prediction results without the physical interpretation of the models. Here, we propose an explainable AI (XAI) framework to reveal the modelling of SM drought events. Random forest based site-specific SM prediction models were developed using the data from 30 sites, covering 8 vegetation types. The unity of multiply XAI tools was applied to interpret the site-models both globally (generally) and locally. Globally, the models were interpreted using two methods: permutation importance and accumulated local effect (ALE). On the other hand, for each drought event, the models were interpreted locally via Shapley additive explanations (SHAP), local interpretable model-agnostic explanation (LIME) and individual conditional expectation (ICE) methods. Globally, the dominant features for SM predictions were identified as soil temperature, atmospheric aridity, time variables and latent heat flux. But through local interpretations of the drought events, SM showed a greater reliance on soil temperature, atmospheric aridity and latent heat flux at grass sites, with higher correlation to the time-dependent parameters at the sites located in forests. The temporal variation of the feature which effects the drought events was also demonstrated. The interpretation could shed light on how predictions are made and could promote the application of AI techniques in drought prediction, which may be useful for irrigation and water resource management.


Introduction
Due to global warming, the frequency of extreme drought events is rapidly increasing, causing substantial agricultural and social-economic devastation (Lesk et al 2016, Meza et al 2020. There is an urgent need to accurately predict and interpret such extreme events to take countermeasures in advance. Soil moisture (SM) drought often affects plant production and crop yield. It usually refers to a period of decreasing SM and consequent crop failure usually (Van Loon 2016). Artificial intelligence (AI) techniques have been successfully applied to droughts prediction. For examples, random forest (RF) has been used to predict SM-based droughts (Park et al 2019), while long-short term memory (Dikshit and Pradhan 2021) and convolutional neural network (Mokhtar et al 2021) have been applied to predict meteorological droughts.
Conventional AI models can produce reliable prediction results, but their lumped and black-box nature hinders the transparency and physical interpretability. To address this issue, explainable artificial intelligence (XAI) has recently been introduced to understand the underlying physical routes involved in the AI-based modelling of real-world processes (e.g. McGovern et al 2019, Reichstein et al 2019, Irrgang et al 2021. XAI helps to peek into the black box and explain the modelling behaviour (Fleming et al 2021, Gevaert et al 2022. XAI methods provide global and local interpretations for AI modelling (see details in supplementary A). In this study, we focused on XAI related to post-hoc explanations, seeking answers to 'What does the model tell us?' and 'Is the model reliable according to the knowledge from physics?' .
Several efforts have been already made to demonstrate the capability of XAI in drought modelling. Global interpretation methods aim to reveal the drivers of already trained models and their effects on predictions. Permutation importance (PI) is the most practical global interpretation method in drought analysis due to its low computational cost for identifying the order of features importance (Hobeichi et al 2022). Pilz et al (2019) declared that PI is feasible for offering insights into black-box models for droughts prediction in a semiarid region at northeastern Brazil. Rhee et al (2020) used PI to uncover the critical predictors in predicting hydrological droughts in ungauged watersheds and suggested that a priori physical knowledge is required to use PI. Gibson et al (2021) utilized PI to sort out the importance of climate drivers during meteorological droughts. Partial dependence plot (PDP) and accumulated local effect (ALE) have also been used to interpret the drought events. PDP shows the relationship between a predictor and the response feature. In drought prediction, it can visualize the non-linear relationships between the predictors and predicted droughts (Zhang et al 2021). As a pivotal study, Schwartz et al (2019) applied PDP to droughts analysis in USA. ALE is commonly considered as an upgraded method to overcome the limitations of PDP, which assumes that predictors are uncorrelated. ALE has been successfully applied for agricultural drought modelling (Ellahi et al 2021).
On the other hand, local interpretation methods are generally used to explain drought events. Shapley additive explanations (SHAP) has been applied to drought modelling, employing game theory concepts to identify feature contributions in a model-agnostic manner (Štrumbelj and Kononenko 2013). Wen et al (2021) used SHAP to confirm the critical role of precipitation and SM parameters in hydrological droughts at catchment scale. Dikshit and Pradhan (2021) applied SHAP to discover the effect of predictors on different drought conditions at different Australian sites. Chakraborty et al (2021aChakraborty et al ( , 2021b used local interpretable model-agnostic explanation (LIME) and SHAP to explore the reasons for severe droughts in San Antonio. Additionally, Althoff et al (2021) combined individual conditional expectation (ICE) and LIME to interpret hydrological models.
However, there are some gaps in this field that should be addressed as: (i) XAI is rarely applied to SM droughts and previous studies usually just focused on a specific region (Huang et al 2023) or several sites without attention to global coverage; (ii) Previous studies used only one or two XAI methods, but using multiple XAI methods together can enhance interpretation and avoid misunderstandings of XAI (Başagaoglu et al 2022); (iii) The temporal variation of the feature effect in a drought event has not been investigated yet; such an investigation can help to understand the evolution of a drought in detail; (iv) XAI methods have not been well utilized to verify the physical consistency for SM droughts and to find possible flaws in the developed AI models.
The main objective of this study was to interpret AI models for predicting SM droughts and to investigate different aspects of the models. To cope with the aforementioned four problems, the procedures were implemented through: (i) SM droughts were modelled using SWDI (soil water deficit index) over FLUXNET sites; (ii) to reflect the main drivers in predicting SM, XAI methods including global (PI and ALE) and local (SHAP, LIME and ICE) methods were jointly used to analyse the feature effects in modelling. They were mutually verified for the same objects to address mismatches of interpretations from different XAI methods; (iii) the occurrence of each drought event was studied by SHAP and LIME. This allowed the contribution of feature dynamics to the drought to be investigated; (iv) considering prior physical knowledge, explanations were discussed. Such interpretable AI analysis could provide useful insights into underlying processes, improve the ability to predict drought occurrence, and facilitate water resource management for agriculture.

FLUXNET sites and data
Raw data for the modelling were obtained from FLUXNET2015 (https://fluxnet.fluxdata.org/). Thirty sites were chosen to obtain a global spread, covering 8 major vegetation types. The target variable was the surface SM at 5 cm soil depth. The correlated features could be a source of misunderstanding in the interpretation of the model. Thus, on the premise of predictive performance, the correlated features were removed as much as possible. See supplementary B for details of site selection rules, feature description, data pre-processing and feature selection.

Random forest
In this study, random forest (RF, Breiman 2001) which is a robust AI-based model that has shown acceptable performance in SM modelling

Definition of SM drought events
SWDI (Mishra and Singh 2010) is a well-known measure for SM drought events. In this study, a drought event was defined when daily SWDI levels are continuously severe or extreme for more than 15 consecutive days (see supplementary C).

Applied XAI methods and experimental design
XAI framework was used to interpret trained sitemodels for SM predictions as shown in figure 1 (see supplementary D), based on 30 RF models. We also identified SM drought events (supplementary C). We applied XAI methods (see supplementary A and ExplainAI package at https://doi.org/10.5281/ zenodo.6580427) to explain the predictions both at the site and event levels.
Global interpretation aimed to search for essential features by PI and their effects reflected by ALE when the entire site-model is treated as the target. PI values were normalized into [0,1]. Particularly, the ALE can mutually authenticate PI. In local interpretation, on one hand, the main drivers of droughts were detected by SHAP and LIME, jointly. Daily SHAP and LIME feature importance was averaged for an individual drought event. On the other hand, to explore which features and how they could affect SM in a particular event temporally, SHAP and LIME were combined to explain the temporal feature contribution as the drought events are developed. Additionally, to present more details about the models, ICE was used to interpret feature effects in each event. Noted that interpretation of SHAP and ICE were compared to prevent discrepancy between XAI methods. Finally, according to the physical knowledge, the validation of the predictions could be efficiently justified.

Performances of predictive models
To investigate the performance of the site models the statistical metrics (R 2 , RMSE and MAE) based on the testing data set have been presented in figure 2. According to figure 2, R 2 values for all site-models were generally higher than 0.5 with low RMSE and MAE values, which is almost similar to the results reported by Pan et al (2019). It indicates that the sitemodels could lead to acceptable performance, and it was proper to interpret the site-models with XAI as they performed quite well.

Identification of SM droughts
Based on SWDI, 504 drought events were detected through 204 site-year data (see figure 3). As shown in figure 3, the grassland sites (OSH, SAV, GRA and WSA) suffered more drought events each year, with averaged event numbers of 3.92, 2.56, 2.56 and 2.77, respectively (figure 3(a)). On the other hand, drought events that occurred at forest sites (EBF, ENF, DBF and WET) were counted less with averaged event numbers of 1.85, 2, 2 and 1.12, respectively. Briefly, SM droughts at grassland sites were severer than those at forest sites.

Interpretation of black-box RF models
3.3.1. What is the feature significance in predicting SM globally? Figure 4 shows that the most important features (Standardized PI > 0.2) were TS (9 sites), DOY (8 sites), VPD (5 sites), LE (5 sites) and DAY (4 sites). Soil temperature (TS) plays an important role at higher latitudes with sufficient precipitation (table S1). It detects heat exchange and transfer in soil water and heat transformation in land-atmosphere coupling process (Liu et al 2005, Sungmin andOrth 2021). According to physical knowledge about landatmosphere interaction, these dominant features are linked to SM. VPD represents atmospheric aridity, and affects the efficiency of plant respiration both directly and indirectly leading to SM fluctuations (Hao et al 2018, Liu et al 2020. In this study, VPD was a dominant factor where there are savannah or grasslands with lower mean SM values (see table S1). LE is a critical surface energy balance term as it incorporates response of vegetation to environmental conditions via SM, where there are grass sites (Ishola et al 2020). VPD and LE often have a coupled impact on SM in arid areas. Time indicators (DAY and DOY) were also found to be important features. As a component of the hydrologic cycle, SM is characterized by time lagging represented by DAY (Carranza et al 2018, Song et al 2019. Forest sites showed higher dependency on DAY. Similarly, as a representative of the annual cycle and seasonality, DOY also could affect SM predictions. In figures 5(a)-(c), negative correlations between TS and SM were found at the sites. It accords with a statement that high temperature would enhance evapotranspiration leading to a decrease in SM (Nouri et al 2016). Meanwhile, in figures 5(m)-(o), effects of VPD and TS were similar and coincided well with previous studies (Hao et al 2018). Therefore, DOY and DAY's effects were more essential in wet regions (i.e. CN-Qia). At CN-Qia, the surrounding area is well vegetated, and its soil is upland dark brown forest soil. The retention of water in soil may not be transferred easily (Wu et al 2007), and process memory tends to be strong. The relationship between P and SM was almost linear for P values below a certain threshold. Once P exceeds the threshold, ALE shows a horsetail trend. The

Which are the drivers of SM prediction in drought events?
According to figure 6, demonstrated agreement between the SHAP and LIME interpretations indicates the reliability of XAI tools. However, there are a few cases with some differences between LIME and SHAP results (like GPP in DBF). Important features of grasslands and forests were partly different. For grasslands, important features were determined as VPD, TS and LE. During periods with little rain, SM is likely affected by atmospheric aridity and temperature and therefore, high temperature may likely lead to agricultural drought (Bouabdelli et al 2022). In contrast, for sites covered by forest, DAY and DOY were found to be more dominant drivers of droughts. Most features had divergent SHAP values, reflecting the influence of land cover and feature values on their effects (figure 7). For instance, SHAP was strongly correlated with TS values (figure 7(d)). Below 0 • C, SHAP was extremely negative at all sites. Above 0 • C, SHAP was positive within a certain TS range, which was broader for grasslands than for forests.

How do features contribute to predicting SM in droughts?
For the drought event at AU-DaS site in figure 8(d), VPD dominated SM variation. At the beginning, P  was over 7 mm. ICE presents that if VPD was 30 hPa, SM would decrease by about 4%. In mid-term, when VPD was declined, the effect of VPD stayed horizontally. It means that VPD decelerated its effect and decline of SM was retarded. Until the end, VPD and SM were stayed in an extremely low state. This was in accordance with our understanding of atmospheric aridity and the SM drought.
For US-Var drought ( figure 8(b)), RECO was an important factor in predicting SM. During the first  50 d, RECO and its SHAP values decreased significantly, which was relevant to SM decrease. During the mid-term, the low level of RECO corresponded to lasting extreme drought. In ICE ( figure 8(h)), increasing RECO could impose a positive effect on detecting drought at this site. It should be noted that there was precipitation on the 187th day of drought, and it created a peak value for RECO. When precipitation falls in the late-term, the drought is finally terminated.
CN-Qia is covered by forest and relatively wet where SM has great lagging and seasonal characteristics. TS and its SHAP values followed a slight rising trend. Meanwhile, the SM was declined gradually. The ICE shows that higher TS would accompany an increase in SM during the whole period, which obviously contradicts physical knowledge. This indicated that the model failed to reflect this relationship, maybe due to positive correlation between observed SM and TS. That is, SM was very high during the first few days of drought when TS was high, while it was quite low afterward when TS was low, too ( figure 8(i)).

Discussion
The main purpose of this study was to offer insights into RF models for SM drought prediction using XAI method. We chose the RF model because it outperformed the neural network architecture in our test (see supplementary E and figures S1 and S2). XAI methods can provide valuable information including the significance of features in predicting SM, determining drivers of drought events, and the evolution of the relationships between features and their effects during a drought event. We found that (i) for SM prediction, the overall dominant features were VPD, TS, DAY, DOY and LE. Time indicators were more evident in forest and wetland regions. Others were more significant in grasslands with approximately linear relations. (ii) for SM droughts, drivers of SM prediction showed similar interpretations. Most of events that occurred in grasslands showed clear dependency on VPD and TS, due to poor water retention capability of soil. SM in forest was highly related to DOY and DAY, corresponding to higher water retention during dry periods. (iii) the evolution of relationship between features and their effects on SM estimated by SHAP and ICE showed that sitemodels could correctly reflect physical relationship between features and SM in general, though there might be some flaws in some site-models such as CN-Qia. Thus, it made it possible for decision-makers and hydro-meteorologists to understand the behaviours of AI models.
The correlation (i.e. collinearity) between features could affect the interpretation of models, especially for local interpretation. We compared site-models with all features (full set models) and the features with less correlation (reduced set models). The result shows that the presence of correlated features could weaken the effects of important features on SM modelling (figure S4) but had little effect on the ALE effects ( figure S5). Due to the instability of the linear assumption of LIME, the LIME interpretation was vulnerable (figure S6). However, SHAP was relatively valid due to its tree-based structure (figure S7). Thus, we suggest that SHAP is more suitable for interpretation compared to LIME.
Physical consistency is the basis for determining whether a prediction by an AI model is reliable or not. For instance, TS, VPD and LE physically have strong linkages with SM. Via PI interpretation, it was found that they were major contributors to predicting SM (figure 4). The inverse relationships of VPD and TS with SM were demonstrated by ALE (figure 5), which was in agreement with some previous findings (Liu et al 2020). LIME and SHAP (figure 6) also revealed that TS, VPD and LE were dominant for most of the droughts, happened in grasslands. Furthermore, the poor ability of grassroots to store water (Köchy and Wilson 1997) could be matched to SHAP results (Stagge et al 2015). TS is important for water and heat exchange and also important for SM prediction (figure 4) (Sungmin and Orth 2021). TS generally had a negative impact on SM at most of the sites (i.e. AU-DaS, US-Var and CN-Qia, figure 5), except for sites in frozen soil area (i.e. CN-Ha2 and CN-HaM, see figure  S8). For drought events, when TS was under 0 • C with increase of TS, a negative effect of TS was observed at both grass and forest sites (figure 7(d)). Site-models could grasp characteristic of TS revealed by SHAP and LIME. For TS over 0 • C, TS imposed different impacts on SM in grass and forest. This discrepancy can be verified by the field survey and investigation. Thus, site-models could be potentially considered reliable tools for reflecting an explicit relationship including a turning point between TS and SM. In summary, based on physical consistency, developed sitemodels were trustworthy. However, some flaws of site-models were also highlighted through the study, such as incorrect representation of the relationship between TS and SM at CN-Qia (section 3.3.3).
Though physical consistency is evaluated by qualitative description of physical processes or observations, a quantitative evaluation benchmark is gravely lacking here. This might cause inaccurate assessment of physical consistency of already trained models. Such a benchmark is hard to achieve because of the following reasons: (i) the benchmark should be created based on practical aspects of soil physical process. Various descriptions involved in soil process at different land surface models make it more difficult to establish such a quantitate benchmark. For example, correlations between variables (i.e. TS and VPD) are still unclear. (ii) Quantified 'interpretability' of black-box models should be implemented (Toms et al 2020). ALE or ICE might reflect relationships between features and SM. But the relationships might not fulfil criteria for benchmark. Currently, although it is difficult to solve the above problems, it is urgent to establish a feasible benchmark to quantify interpretability of the AI models.
Generally, XAI provides valuable knowledge about the reasons and causality behind SM droughts. That is, the general laws of SM variation and droughts may be learnt by XAI interpretation statistics. For example, SM relies more on VPD in grass and savannah land. With an increase in VPD, the negative effect on SM would be enlarged leading to a rapid decrease in SM. At wetland and forest sites, SM shows more regular temporal variations with more lagging and seasonal characteristics (figure 6). Noticeably, previous XAI studies about SM droughts mostly focused on a local scope such as a basin or several adjacent sites and therefore obtained regularities of variables and SM were less general and representative for more regions. This limitation would be relaxed to some extent in our study as 30 sites covering major vegetation types across the world were considered and analysed.
The unity of different XAI methods applied in this work could effectively prevent weakness of interpretation when only one method was applied. Agreement between the obtained results via different XAI methods could describe different aspects of the relationship between SM and features and assure the credibility of the internal logics of the RF model. To interpret the same object (i.e. a global site-model or a drought event), interpretation from different methods could offer mutual verification. For instance, for global interpretation of AU-Das, US-Var and CN-Qia site-models, PI (figure 4) and ALE (figure 5) were used for mutual verification. For droughts, SHAP and LIME presented almost similar interpretation except for a few cases ( figure 6). However, to interpret a single instance, if features were correlated, LIME would be unstable and the SHAP was seemingly appropriate (figures S4 and S5). Furthermore, evolution of SM with features and their effects across a drought event was investigated. This is an essential step to understand how drought persists when rains are rare. Figures 8(a), (d) and (g) show that VPD's effect is varied with SM. We used ICE and SHAP to examine how SM values were changed with respect to the features. This can be helpful to understand the drought process and its causes. Ultimately, the proposed XAI framework was proved valid and feasible, which could explain AI-based site-model, globally and locally.

Conclusions
An XAI framework was presented in this study to gain physical insights into the SM drought prediction. The framework was reviewed and evaluated using 30 RF site-models with 504 drought events worldwide. From the results and discussion, the following conclusions could be drawn: (i) XAI framework presented in this paper allows for the extraction and visualization of different aspects of the SM prediction site-model; (ii) via PI and ALE, dominant features for SM prediction were identified as TS, VPD, time variables and LE; (iii) via LIME and SHAP interpreting 504 droughts, it seems that SM relies more on VPD, TS and LE in grassland and savannah while it is more related to DAY and DOY in forest and wetland; (iv) through detecting the dynamic contributions of features to SM, it is easy to know how a drought develops owing to different factors; (v) the unity of XAI tools applied on the same object (a site-model or a drought event) can practically prevent misunderstanding of interpretation from merely one method; (vi) there was physical consistent for most investigated cases for the AI model through XAI's interpretation, though some flaws were also identified.
Overall, XAI methods can help to better understand the behaviour of the applied AI model and thus provide guidance to make measures to mitigate the impacts of droughts, especially on crop production. Future works may attempt to develop quantified interpretability benchmark based on physical laws and/or observations, which can cater to the requirement of AI development and application in the landatmosphere field. It is also suggested that the distribution of observations and multiple coupled features should be taken into account when developing XAI tools.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https:// fluxnet.org/data/fluxnet2015-dataset/.