Predicting Swarm Equatorial Plasma Bubbles via Machine Learning and Shapley Values

In this study we present AI Prediction of Equatorial Plasma Bubbles (APE), a machine learning model that can accurately predict the Ionospheric Bubble Index (IBI) on the Swarm spacecraft. IBI is based on the squared correlation ($R^2$) between perturbations in plasma density and the magnetic field, whose source can be Equatorial Plasma Bubbles (EPBs). EPBs have been studied for many years, but their day-to-day variability has made predicting them a considerable challenge. We build an ensemble machine learning model to predict IBI. We use data from 2014-22 at a resolution of 1 s, and transform it from a time series into a 6-dimensional space with a corresponding EPB $R^2$ (0-1) acting as the label. APE performs well across all metrics, exhibiting skill, association, and root mean squared error scores of 0.96, 0.98, and 0.08 respectively. The model performs best post-sunset, in the American/Atlantic sector, around the equinoxes, and when solar activity is high. This is promising because EPBs are most likely to occur during these periods. Shapley values reveal that F10.7 is the most important feature in driving the predictions, whereas latitude is the least. The analysis also examines the relationships between the features, which reveal new insights into EPB climatology. Finally, the choice of features means that APE could be extended to forecasting EPBs following additional investigations into their onset.


Introduction
In the post-sunset F region of the ionosphere, plumes of low-density plasma, known as Equatorial Plasma Bubbles (EPBs), are prone to form. These bubbles were first observed in ionosonde traces, and have subsequently been captured by radar, airglow images, and in-situ detectors [Woodman and La Hoz, 1976, Argo and Kelley, 1986, Retterer and Roddy, 2014]. EPBs can cause fluctuations in the amplitude and phase of radio waves that traverse them [Kintner et al., 2007]. These scintillations adversely affect Global Navigation Satellite System (GNSS) and other communication systems that rely on quiet ionospheric conditions. The morphology, onset, and development of EPBs are complex and have been the subject of numerous studies over the years.
In the sunlit hemisphere, the neutral wind generally blows eastward towards the day-night terminator [Heelis et al., 2012], driving the plasma vertically upward under the action of the Lorentz force. Once on the nightside, ionization ceases and recombination dominates. This leads to a large density gradient between the E and F regions. When the interface between these layers is perturbed, the rarefied E / lower F region plasma is forced vertically upward into the higher-density plasma above, which itself is being pulled down under the action of gravity [Kelley, 2009]. This mechanism is known as the generalized Rayleigh-Taylor instability (RTI), and its growth rate, γ, was formulated by Sultan [1996] as

$$\gamma = \frac{\Sigma_P^F}{\Sigma_P^E + \Sigma_P^F}\left(V_P - U_L^P + \frac{g_e}{\nu_{eff}}\right)K^F - R \qquad (1)$$

where Σ_P^E and Σ_P^F are the flux-tube-integrated Pedersen conductivities of the E and F regions, V_P is the vertical plasma drift, U_L^P is the Pedersen-conductivity-weighted neutral meridional wind, g_e is the altitude-corrected acceleration due to gravity, ν_eff is the ion-neutral collision frequency, K^F is the height gradient of the F region flux-tube electron content, and R is the ion-electron recombination rate [Sultan, 1996]. Because of the conductivity ratio Σ_P^F/(Σ_P^E + Σ_P^F), the onset of an EPB can only occur at night, when the E region conductivity is very weak. High values of Σ_P^E, U_L^P, and R act to suppress an EPB, whereas high values of V_P, K^F, and g_e/ν_eff destabilize the plasma and enhance the likelihood of an EPB [Sultan, 1996, Carter et al., 2020].
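To make the roles of the terms concrete, the Sultan [1996] growth rate γ = Σ_P^F/(Σ_P^E + Σ_P^F) · (V_P − U_L^P + g_e/ν_eff) · K^F − R can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, it assumes all inputs are already flux-tube-integrated quantities in consistent SI units, and it is not code used in this study.

```python
def rt_growth_rate(sigma_p_e, sigma_p_f, v_p, u_l_p, g_e, nu_eff, k_f, r):
    """Generalized Rayleigh-Taylor growth rate (Sultan, 1996), in 1/s.

    Hypothetical helper. Inputs in consistent SI units:
    conductivities (S), v_p and u_l_p (m/s), g_e (m/s^2),
    nu_eff (1/s), k_f (1/m), r (1/s).
    """
    # Conductivity ratio tends to 1 at night, when the E region conductivity is weak.
    conductivity_ratio = sigma_p_f / (sigma_p_e + sigma_p_f)
    # Destabilizing drivers (m/s): upward drift and gravity term, minus the meridional wind.
    drivers = v_p - u_l_p + g_e / nu_eff
    # Recombination acts as a direct damping term.
    return conductivity_ratio * drivers * k_f - r
```

A strong E region (large Σ_P^E) shrinks the ratio and hence γ, which is why the instability is a night-time phenomenon.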
The spatiotemporal prediction of EPB occurrence has remained an ongoing challenge for many years. Whilst the growth rate is described by equation (1), the terms themselves are influenced by local time, geolocation, season, and solar and geomagnetic activity [Burke et al., 2004, Carter et al., 2014a, Kumar et al., 2016, Smith and Heelis, 2017, Aa et al., 2020, Carter et al., 2020]. To complicate matters, these climatological markers can contradict one another, and the relationships between them are nuanced. Geomagnetic activity can both enhance and suppress the onset of an EPB via modified equatorial electrodynamics due to different perturbation electric fields [Aa et al., 2019, Abdu, 2012, Carter et al., 2016, Kumar et al., 2016]. The under-shielding prompt penetration electric field (PPEF) tends to dominate during the storm main phase owing to suddenly varying magnetospheric convection; it has an eastward polarity on the dayside through local dusk but a westward polarity at night. This typically enhances the equatorial upward plasma drift in the dusk sector and thus facilitates the development of post-sunset EPBs, but may disrupt post-midnight EPBs via downward plasma drift. On the other hand, the disturbance dynamo electric field (DDEF), caused by changes in global thermospheric circulation, usually dominates during the storm recovery phase; it has the opposite polarity to the PPEF and so tends to suppress post-sunset EPBs but enhance post-midnight EPBs. In addition, the over-shielding penetration electric field due to substorm activity has the opposite polarity to the PPEF, thereby suppressing post-sunset EPBs but enhancing post-midnight EPBs. The combination and interaction of these perturbation electric fields leads to complicated occurrence patterns and spatiotemporal variations of EPBs.
Interest in machine learning (ML) within the heliophysics community has grown enormously in recent years [Camporeale, 2019], but its direct application to EPBs remains more limited. A random forest regressor has been employed to predict the vertical plasma drift, V_P in equation (1) [Shidler and Rodrigues, 2020], which is a significant term in the overall onset of an EPB [Tsunoda et al., 2018]. Others have used all-sky imager data to train a convolutional neural network to detect EPBs, although the results appear more preliminary [Srisamoodkham et al., 2022]. EPBs are one manifestation of Spread F, a broader class of irregularities or wave-like structures within the ionosphere [Lan et al., 2018]. Ensemble and deep learning methods have been employed to classify and automatically detect Spread F in ionograms [Lan et al., 2018, Luwanga et al., 2022]. EPBs are a known cause of radio wave scintillation [Kintner et al., 2007], and ML has been used to predict when and where scintillation may occur [Jiao et al., 2017, Linty et al., 2018, McGranaghan et al., 2018]. Lastly, deep learning has also been applied to predict storm-driven irregularities within the ionosphere [Liu et al., 2021].
In this study we present AI Prediction of EPBs (APE), an ML model that predicts the Ionospheric Bubble Index (IBI) on Swarm. First, we introduce Swarm and the IBI product. Then, we analyze the distribution of the IBI R² values and the plasma bubbles they capture. Next, we describe the ML models and their performance. Finally, we use Shapley values to interpret and explain the complex interactions within APE, highlighting the scientific benefits of such an approach.

Instrumentation, Data and Observations
Swarm is a three-spacecraft Earth exploration constellation that launched on 22 November 2013. Two spacecraft, Alpha and Charlie, were at an initial altitude of roughly 470 km, whereas Bravo was at 520 km [Friis-Christensen et al., 2008]. Alpha and Charlie fly side by side, separated by about 1.4° in longitude. All three are in circular, near-polar orbits with an inclination of about 87°. Swarm automatically detects EPBs via its Ionospheric Bubble Index (IBI) product, which we use to train our machine learning models. EPBs can be characterised by prolonged and simultaneous changes in B and Ne [Stolle et al., 2006]. Swarm carries an on-board magnetometer and a Langmuir probe to measure these quantities, respectively. IBI correlates ∆Ne and ∆B (when residual B fluctuations in the 0.04-0.5 Hz range exceed 0.2 nT) using the Pearson correlation coefficient (R). An R² > 0.5 is tagged as a 'confirmed bubble' and an R² < 0.5 as an 'unconfirmed bubble'. In addition to a strong R² score, bubbles are only confirmed if they are detected at night and at latitude < 45°, with no gaps in the data and no non-physical measurements from the Langmuir probe or magnetometer. This reduces the risk of contamination from non-EPB events, but it does not stop some plasma blobs from being erroneously labelled as EPBs [Park et al., 2013]. Such misclassifications are more pronounced during solar minimum [Choi et al., 2012].
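The confirmation logic described above can be sketched as follows. This is a simplified illustration, not the operational IBI processor of Park et al. [2013]: it assumes ∆Ne and ∆B have already been band-pass filtered and cut into a window, omits the non-physical-measurement check, and the function name and arguments are hypothetical.

```python
import numpy as np


def ibi_like_confirmation(d_ne, d_b, is_night, latitude_deg, has_gaps,
                          r2_threshold=0.5, min_db_nt=0.2, max_lat_deg=45.0):
    """Return (R^2, confirmed) for one window of pre-filtered residuals.

    Hypothetical simplification of the IBI logic; not the Swarm L2 processor.
    The check for non-physical instrument values is omitted.
    """
    d_ne = np.asarray(d_ne, dtype=float)
    d_b = np.asarray(d_b, dtype=float)
    # No magnetic fluctuation above the threshold: nothing to correlate.
    if np.max(np.abs(d_b)) <= min_db_nt:
        return 0.0, False
    # Pearson R between density and magnetic residuals; squaring discards the sign.
    r = np.corrcoef(d_ne, d_b)[0, 1]
    r2 = float(r ** 2)
    # A 'confirmed bubble' also needs night-time, low latitude, and gap-free data.
    confirmed = (r2 > r2_threshold and is_night
                 and abs(latitude_deg) < max_lat_deg and not has_gaps)
    return r2, confirmed
```

Because R² ignores the sign of R, an anti-correlated density depletion and magnetic perturbation still yields a high score.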
An example IBI EPB is shown inside the grey box of Figure 1a. Here a ∆B occurs simultaneously with a ∆Ne between 01:40 and 01:47, which in turn triggers an IBI R² of 0.97. This value equates to a very high chance of EPB detection. A quiet, bubble-free ionosphere is shown in Figure 1b. IBI data were accessed via ESA's virtual research environment for Swarm (https://vires.services) and the Python package viresclient [Smith et al., 2022]. We also use viresclient to map F10.7 and Kp values to the IBI dataset. We use data from 2014-2022 at a resolution of 1 s across all three spacecraft where R² > 0. The date range covers the declining phase of solar cycle 24 and the start of solar cycle 25. We transform the data from a time series into a 6-dimensional space consisting of MLT, latitude, longitude, day of the year, Kp, and F10.7, with each sample having a corresponding R² value (0-1) provided by IBI. This allows us to predict IBI based on the climatology of EPBs, which depends on time, geolocation, season, and geomagnetic and solar activity [Burke et al., 2004, Carter et al., 2016, 2020, Aa et al., 2020]. It also ensures that the model can be extended to forecasting, as Kp and F10.7 are readily available via NOAA (https://www.swpc.noaa.gov/products). After re-binning and cleaning, we have ∼42k samples for the machine learning models. Figure 2 shows the distribution of R² across the 9-year period. Most values cluster around R² = 0 and R² = 0.9. We are mainly interested in R² > 0.7 [Park et al., 2013].
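The time-series-to-feature-space transformation can be sketched with pandas. The study does not specify its re-binning, so the bin widths, column names, and the choice of averaging R² within each bin below are all assumptions made for illustration; Kp is left unbinned because it is already quantised in thirds.

```python
import pandas as pd

FEATURES = ["MLT", "Latitude", "Longitude", "DayOfYear", "Kp", "F107"]

# Hypothetical bin widths; the study does not state how it re-bins the data.
DEFAULT_BIN_WIDTHS = {"MLT": 0.5, "Latitude": 1.0, "Longitude": 5.0, "F107": 5.0}


def to_feature_space(ts, bin_widths=DEFAULT_BIN_WIDTHS):
    """Collapse a 1 s time series into rows of (6 features -> mean R2).

    `ts` needs a DatetimeIndex and columns MLT, Latitude, Longitude, Kp, F107, R2.
    """
    df = ts[ts["R2"] > 0].copy()           # keep samples where IBI returned R2 > 0
    df["DayOfYear"] = df.index.dayofyear   # seasonal feature from the timestamp
    for col, width in bin_widths.items():  # snap to the nearest multiple of the bin width
        df[col] = (df[col] / width).round() * width
    # One labelled sample per occupied bin: the mean R2 of the points it contains.
    return df.groupby(FEATURES, as_index=False)["R2"].mean()
```

Collapsing the time axis this way turns each occupied bin into one (features, label) pair, which is the form the regression models consume.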
Next, we examine the distribution of the 42k samples across the 6 features. Figure 3 shows that 'confirmed' and 'unconfirmed' bubbles are not uniformly distributed across the climate markers. Most confirmed bubbles are in the post-sunset time frame (19-24 MLT), with a small increase at 4 MLT (Fig. 3a). The distribution of confirmed bubbles is centered around the geographic equator, with only a few instances beyond 25° geographic latitude (Fig. 3b). Next, we see that most bubbles occur in the American/Atlantic sector, but that instances exist at all longitudes (Fig. 3c). The majority of EPBs occur around the equinox months and the December solstice, with little activity in July and August (Fig. 3d). Fig. 3e shows that the number of confirmed EPBs declines with Kp, and no bubbles are detected at Kp > 7. Lastly, we see that EPB activity peaks around F10.7 = 125, but an additional population exists at F10.7 = 220 (Fig. 3f). This panel also reveals that EPBs are generally less likely to occur at F10.7 < 90. Overall, these results align with the existing literature on EPB climatology [e.g., Burke et al., 2004, Abdu, 2012, Park et al., 2013, Carter et al., 2016, Aa et al., 2020]. Figure 3 also provides some insight into magnetic-only fluctuations (R² < 0.5) in the ionosphere, with F10.7 and Kp showing some interesting distributions (Fig. 3e-f).

Machine Learning
We use supervised machine learning (ML) algorithms to predict the IBI value provided by Swarm. Supervised methods require labels, y_i, which we assign to R². We use regression-specific architectures because the labels are continuous. ML is well suited to identifying complex relationships in data that contain rare events, and it can also handle heterogeneity in space-time and large amounts of noise [Karpatne et al., 2018, Camporeale, 2019].
Because of this, we believe it is well suited to the task of predicting IBI and the EPBs contained within it.
Our main algorithm is the eXtreme Gradient Boosting (XGBoost) method, which is a tree-based ensemble learner. XGBoost has good control over bias and variance, whilst remaining computationally inexpensive to train and enabling explainability [Chen and Guestrin, 2016, Lundberg et al., 2020]. The model's prediction is expressed as

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \qquad (2)$$

where ŷ_i is the predicted value, K is the number of trees, x_i is the input data, f_k is a function in the functional space F, and F is the set of all possible regression trees [Chen and Guestrin, 2016]. To train the model we minimize a regularized objective function [Géron, 2019]

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t} \Omega(f_k) \qquad (3)$$

where y_i is the target value (R²), ŷ_i^(t) is the prediction of the i-th instance at the t-th iteration, l is the loss function, and Ω is the complexity of the model [Chen and Guestrin, 2016]. The first term is the loss and the second is the regularization term. Regularization controls the magnitude of the parameters, and thus reduces the model's complexity [Géron, 2019]. We use the XGBoost package for Python (xgboost.readthedocs.io) and scikit-learn (scikit-learn.org) to perform the modelling and analysis. GridSearchCV was used to identify the optimal hyperparameters, which are as follows: estimators = 300, alpha = 0.1, subsample = 0.5, and eta = 0.2. The last three parameters are used to prevent overfitting. We divide the samples into training and test datasets with an 80%-20% split. The split is randomised once and then fixed, so that the same test samples are held out in every training run and no data leak across runs.
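The regularized objective and the fixed split can be sketched in NumPy. This is an illustration, not the study's code: the loss uses XGBoost's squared-error convention (½(y − ŷ)²), the per-tree complexity is Chen and Guestrin's Ω(f) = γT + ½λ‖w‖² plus the L1 term that XGBoost's alpha adds, γ = 0 and λ = 1 are assumed to be XGBoost's untuned defaults, and the function names and seed are hypothetical.

```python
import numpy as np

# XGBoost scikit-learn-API names for the hyperparameters quoted in the text
# (estimators = 300, alpha = 0.1, subsample = 0.5, eta = 0.2).
APE_PARAMS = {"n_estimators": 300, "reg_alpha": 0.1,
              "subsample": 0.5, "learning_rate": 0.2}


def regularized_objective(y, y_hat, leaf_weights, gamma=0.0, lam=1.0, alpha=0.1):
    """Loss term plus the sum of per-tree complexities Omega(f_k).

    Loss is 1/2 * squared error (XGBoost's reg:squarederror convention).
    Omega(f) = gamma*T + 0.5*lam*||w||^2 + alpha*||w||_1 for a tree with T leaves.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    loss = 0.5 * np.sum((y - y_hat) ** 2)
    omega = 0.0
    for w in leaf_weights:  # one array of leaf weights per tree f_k
        w = np.asarray(w, dtype=float)
        omega += gamma * w.size + 0.5 * lam * np.sum(w ** 2) + alpha * np.sum(np.abs(w))
    return float(loss + omega)


def fixed_train_test_split(n_samples, test_fraction=0.2, seed=42):
    """Randomise once with a fixed seed so every run holds out the same test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return order[n_test:], order[:n_test]  # (train indices, test indices)
```

With the xgboost package installed, the same settings would typically be passed as `xgboost.XGBRegressor(**APE_PARAMS)`; in XGBoost, `eta` and `alpha` are aliases of `learning_rate` and `reg_alpha`.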
We also tested a Random Forest (RF) method [Breiman, 2001] and a standard linear regression approach as part of our study. These serve as baselines for overall performance comparison, but are not subject to extensive analysis. The model's input features and the linear correlations between them are shown in Figure 4. There is no strong linear correlation between any pair of features, which provides further justification for using an ML approach.
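The check summarised by Figure 4 amounts to finding the largest off-diagonal entry of the feature correlation matrix. A minimal sketch, assuming the features are stacked as columns of an array (the function name is hypothetical); note that Pearson R only detects linear dependence, so a low value does not rule out nonlinear relationships.

```python
import numpy as np


def max_abs_feature_correlation(X):
    """Largest |Pearson r| between any two distinct feature columns of X."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    # Mask out the diagonal, where every feature correlates perfectly with itself.
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.max(np.abs(off_diagonal)))
```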

Assessment Metrics
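Several metrics are used to assess the performance, skill, and association of the model. Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are typical performance tests for regression problems [Chai and Draxler, 2014]:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|\hat{y}_i - y_i\right| \qquad (4)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2} \qquad (5)$$

Because RMSE penalizes large errors more heavily than MAE, comparing the two tells us whether the predictions are close to the target in some parts of the data range but not in others. The association between predictions and targets is measured with the Pearson correlation coefficient

$$R = \frac{\sum_{i=1}^{n} (\hat{y}_i - \langle \hat{y} \rangle)(y_i - \langle y \rangle)}{\sqrt{\sum_{i=1}^{n} (\hat{y}_i - \langle \hat{y} \rangle)^2}\sqrt{\sum_{i=1}^{n} (y_i - \langle y \rangle)^2}} \qquad (6)$$

An ideal value is R = 1. Finally, we examine the skill of the model through its Prediction Efficiency, which is based on its mean square error [Murphy, 1988]

$$PE = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \langle y \rangle)^2} \qquad (7)$$

A model with perfect skill has PE = 1, while PE < 0 shows that the model is no better at making predictions than the average of the target values ⟨y⟩.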
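The four metrics are short enough to write out directly. The sketch below uses the standard definitions (MAE, RMSE, Pearson R, and PE = 1 − Σ(ŷ − y)²/Σ(y − ⟨y⟩)²); the function names are hypothetical and are not taken from the study's code.

```python
import numpy as np


def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_hat, float) - np.asarray(y, float))))


def rmse(y, y_hat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_hat, float) - np.asarray(y, float)) ** 2)))


def pearson_r(y, y_hat):
    """Pearson correlation (association) between targets and predictions."""
    return float(np.corrcoef(np.asarray(y, float), np.asarray(y_hat, float))[0, 1])


def prediction_efficiency(y, y_hat):
    """Skill: 1 - sum of squared errors / total variance of the targets."""
    y = np.asarray(y, float)
    y_hat = np.asarray(y_hat, float)
    return float(1.0 - np.sum((y_hat - y) ** 2) / np.sum((y - y.mean()) ** 2))
```

Predicting the mean target for every sample gives PE = 0 exactly, which is the reference point that PE < 0 is measured against.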

Results
The following section presents the performance of the machine learning models in terms of error, association, and skill.It goes on to interpret the behavior of the XGBoost model via Shapley values, determining the importance of the features and the relationships between them.
Figure 5a shows the association (Eq. 6) and skill (Eq. 7) of the three modelling techniques. The machine learning techniques outperform the standard linear model, particularly with respect to prediction efficiency (0.45 vs. 0.96), which justifies their use. The same trend continues with RMSE (Eq. 5) and MAE (Eq. 4), with the RF and XGBoost architectures outperforming the linear regression method on both metrics. The ensemble learners offer a considerable improvement across all four metrics, and XGBoost comfortably outperforms the RF in every area. It achieves a PE, R, MAE, and RMSE of 0.96, 0.98, 0.05, and 0.08 respectively, all of which are excellent scores. XGBoost also trains 3.8x faster than the RF, because it sub-samples the data and approximates the split points among the trees [Chen and Guestrin, 2016]. We therefore select the XGBoost model for further analysis and name it AI Prediction of EPBs, or APE. Generally speaking, APE performs very well across the entire feature space (Fig. 6). It performs less well at 18 and 3 MLT (Fig. 6a), outside the equatorial region (Fig. 6b), and during low F10.7 (Fig. 6f). EPB activity is expected to be lower during these periods, so this is of less concern. The performance also tracks the availability of the data (Fig. 3): when there are more confirmed EPB events to learn from, model performance increases.
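Figure 6 reports the absolute error per feature bin with bootstrapped uncertainties [Efron and Tibshirani, 1994]. A minimal sketch of that procedure, assuming user-chosen bin edges and a 2.5-97.5 percentile interval (both assumptions; the study does not state its bins or interval, and the function name is hypothetical):

```python
import numpy as np


def binned_abs_error(feature, y, y_hat, bin_edges, n_boot=1000, seed=0):
    """Mean |y_hat - y| per feature bin with a bootstrap 95% interval.

    Returns (means, lower, upper); empty bins are NaN. A value equal to the
    last edge falls outside every bin, following np.digitize conventions.
    """
    rng = np.random.default_rng(seed)
    err = np.abs(np.asarray(y_hat, float) - np.asarray(y, float))
    bin_idx = np.digitize(np.asarray(feature, float), bin_edges) - 1
    means, lower, upper = [], [], []
    for b in range(len(bin_edges) - 1):
        e = err[bin_idx == b]
        if e.size == 0:
            means.append(np.nan); lower.append(np.nan); upper.append(np.nan)
            continue
        # Resample the bin's errors with replacement and record each resample's mean.
        boot_means = rng.choice(e, size=(n_boot, e.size), replace=True).mean(axis=1)
        means.append(e.mean())
        lower.append(np.percentile(boot_means, 2.5))
        upper.append(np.percentile(boot_means, 97.5))
    return np.array(means), np.array(lower), np.array(upper)
```

Bins with few samples produce wide intervals, which is one way to see the link between data availability and model performance noted above.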

Explainability
A key tenet of the study is to understand the factors that influence the predictions, as well as the connections between them. To do this we use Shapley values, which allow us to approximate feature contributions via cooperative game theory [Shapley, 1953]. The SHapley Additive exPlanations (SHAP) package for Python (shap-lrjball.readthedocs.io/) treats the features as players and the prediction of R² as the pay-off [Lundberg et al., 2020]. The predictions and SHAP contributions are related by

$$f(x) = E[f(x)] + \sum_{n=1}^{6} \phi_n \qquad (8)$$

where E[f(x)] is the expected value, which is ≈ ⟨R²⟩ and is equal to 0.66, and ϕ_n is the SHAP value of feature n. Each ϕ_n represents a feature's contribution to the pay-off, weighted and summed over all possible combinations of the other features. Shapley values have the properties of efficiency, symmetry, and additivity, which ensure that the pay-off is shared fairly [Shapley, 1953, Lundberg et al., 2020]. E[f(x)] can be thought of as the climatology of R², and each feature value can contribute to this in a positive (ϕ > 0) or negative (ϕ < 0) way. Shapley values are emerging as the de facto method for explaining the output of ML models [Merrick and Taly, 2020], but their interpretation requires caution and expertise [Kumar et al., 2020].
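To show what the additive relation f(x) = E[f(x)] + Σϕ_n means in practice, the sketch below computes Shapley values by brute force: every coalition of features is enumerated, and "absent" features take values from a background dataset (an interventional value function). This is not the SHAP package's algorithm, which uses much faster tree-specific methods and may condition differently; the function name is hypothetical, and the cost grows as 2^n, which is still tractable for six features.

```python
import itertools
import math

import numpy as np


def exact_shapley(model, x, background):
    """Brute-force interventional Shapley values for one sample.

    model: callable mapping an (m, n) array to m predictions.
    x: the sample to explain, shape (n,).
    background: reference samples, shape (m, n); their mean prediction is E[f(x)].
    Returns (E[f(x)], phi) with E[f(x)] + phi.sum() == f(x).
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    n = x.size

    def coalition_value(subset):
        # Features in `subset` take the sample's values; the rest keep background values.
        data = background.copy()
        if subset:
            cols = list(subset)
            data[:, cols] = x[cols]
        return float(np.mean(model(data)))

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (coalition_value(subset + (i,)) - coalition_value(subset))
    return coalition_value(()), phi
```

For a linear model this reduces to ϕ_n = w_n(x_n − ⟨x_n⟩), and for any model the efficiency property makes the baseline plus the contributions reproduce the prediction exactly.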
Figure 7 shows the mean absolute SHAP value of each of the six features. F10.7 is the most influential feature: on average, its value shifts the prediction by 0.1, which can be large enough to determine whether a prediction is considered a 'confirmed bubble' [Park et al., 2013]. Latitude is the least influential feature, with a mean absolute contribution of ϕ = 0.04.
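The global importance in Figure 7 is the mean of |ϕ| over all explained samples for each feature. A minimal sketch, assuming a matrix of SHAP values with one row per sample and one column per feature; the function name and the numbers in the usage check are hypothetical, not values from the study.

```python
import numpy as np


def mean_abs_shap(phi, feature_names):
    """Rank features by mean |phi| over samples (rows = samples, cols = features)."""
    importance = np.mean(np.abs(np.asarray(phi, dtype=float)), axis=0)
    order = np.argsort(importance)[::-1]  # largest mean |phi| first
    return [(feature_names[i], float(importance[i])) for i in order]
```

Taking the absolute value first prevents positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.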
We now turn our attention to the feature inputs and corresponding SHAP values. Figure 8 shows that ϕ can be positive or negative, but Eq. (8) means that we can only interpret the contribution to R² when we take the sum of all the SHAP values. ϕ > 0 increases EPB likelihood, whereas ϕ < 0 decreases it.

Figure 8: SHAP ϕ contributions across the feature space. ϕ > 0 increases the predicted value of R², whereas ϕ < 0 decreases it. Predictions of R² > 0.7 are considered to be EPBs, so large values of ϕ > 0 are more likely to be associated with plasma bubbles. Generally the SHAP values follow the climatology outlined in Fig. 3.
In the F10.7 panel (Fig. 8a), we see that low solar activity corresponds to strongly negative SHAP values. This suggests that, at low solar activity, IBI primarily detects magnetic-only fluctuations and that EPBs require F10.7 > 90. Secondly, post-sunset values of MLT produce the highest values of ϕ, with the contribution peaking at 21 MLT (Fig. 8b). The panel also shows a largely negative contribution after midnight, meaning that most EPBs occur between sunset and midnight. Longitude generally follows the known pattern of increased EPB formation over the American/Atlantic sector (Fig. 8c), but there are positive contributions across the longitudinal space. Unlike the previous features, Day of the Year values contribute both positively and negatively across the entire feature space (Fig. 8d). We see high values of ϕ > 0 around the equinoxes and the December solstice, which is expected as EPB formation is generally highest during these periods. That said, we also see a cluster of high positive values around the Earth-Sun perihelion, with the highest value of ϕ on Day of the Year = 19. Kp provides perhaps the most intriguing insight into EPB climatology (Fig. 8e). It clearly shows that increasing Kp produces negative SHAP values, which reduce the likelihood of an EPB. Beyond Kp > 6 we only see ϕ < 0, which instead favours a magnetic-only fluctuation. Lastly, positive SHAP values are mainly centered around Latitude = 0° to 20°, which is expected given the known formation region of EPBs and our use of geodetic coordinates (Fig. 8f).
Next we examine some of the ϕ > 0 values at Kp = 4-5 and at Day of the Year between 360 and 21 (i.e. across the new year). These are intriguing because the former are the only positive Kp contributions to EPB prediction during a moderate storm, and the latter include the highest ϕ > 0 contribution for that feature. Figure 9 illustrates the values for Kp and Day of the Year, as well as the other features that contribute to R². In all cases the IBI value is > 0.9, and is therefore almost certainly an EPB [Park et al., 2013]. It is also evident that Day of the Year is the dominant 'player', with contributions as high as ϕ = +0.27 (Fig. 9a). More importantly, Figures 9c-d show the only examples of high Kp producing positive SHAP values, which also coincide with the Earth-Sun perihelion. Taken together, Fig. 9 shows that a combination of the December solstice / Earth-Sun perihelion, Kp > 2, and low F10.7 corresponds to a high chance of detecting an EPB. By SHAP's efficiency property, the baseline plus the sum of the contributions equals the predicted R² in each case.
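A waterfall plot such as Figure 9 is a running sum from the baseline E[f(x)] to the prediction f(x). The sketch below reproduces that bookkeeping, ordering features by |ϕ| as waterfall plots typically do; the function name and the contribution values in the usage check are hypothetical, not read from Figure 9 (only the 0.66 baseline comes from the text).

```python
def waterfall_steps(base_value, phi, feature_names):
    """Running total from E[f(x)] to f(x), largest |phi| first.

    Returns a list of (feature, phi, running_total); the last running total
    equals base_value + sum(phi), i.e. the model prediction f(x).
    """
    order = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
    running = base_value
    steps = []
    for i in order:
        running += phi[i]
        steps.append((feature_names[i], phi[i], running))
    return steps
```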

Discussion
APE can reliably predict the IBI R² index on Swarm. If it predicts R² > 0.7, the event can be considered an EPB. The model has a low error (RMSE = 0.08) and exhibits excellent skill and association. SHAP values reveal the most important features, how the features contribute to predictions, and the interrelations between them. We now expand on the IBI observations and SHAP values with respect to geomagnetic activity and seasonal effects. Generally speaking, the IBI climate feature observations (Fig. 3) and SHAP values (Figs. 8 and 9) align with the existing literature: EPBs mainly occur post-sunset, in the American/Atlantic sector, around the equinox months, and when solar activity is high [Burke et al., 2004, Abdu, 2012, Park et al., 2013, Aa et al., 2020]. They also show that magnetic-only fluctuations (R² < 0.5) are more likely post-midnight, during low F10.7, and during high Kp.
The above suggests that geomagnetic activity suppresses EPB onset. This is supported by the test-set histograms in Fig. 8e, which show that low Kp values still yield more positive SHAP contributions even where fewer samples are available. However, geomagnetic activity can both suppress and enhance EPB formation, via the DDEF and/or over- and under-shielding electric fields [Aa et al., 2019, Abdu, 2012]. Unfortunately, this cannot be fully captured by the concurrent Kp value, owing to the time delay before the DDEF affects the equator. Indices such as Dst or AE may be better suited to capturing this, but neither is currently available as a forecast product, so both were excluded from the feature space. Bespoke indices may be required to fully capture the influence of geomagnetic activity on EPBs. For now, this exceeds the remit of this study, especially given that the model already achieves RMSE = 0.08. Kp has been shown to capture the day-to-day variability of EPBs during the 'EPB season' [Carter et al., 2014b], but additional work is required to capture them during the 'off-season'. If we take F10.7 < 90 to indicate the off-season, then Figure 9c-d shows that Kp could be useful at all times, particularly around the Earth-Sun perihelion. That said, Fig. 9c-d also shows that identical values of F10.7 can make different contributions to different predictions, which shows that interpreting Shapley values requires caution [Kumar et al., 2020]. Another interesting feature is that Kp = 5 also coincides with the only ϕ > 0 at 3 MLT (Figs. 8b and 9c-d). EPBs are suppressed after sunset but enhanced after midnight during a large ∆DDEF [Abdu, 2012], so these points could be direct evidence of over-shielding effects. That said, the MLT values contribute to the pay-off in a negative way (ϕ < 0), and others have reported that over-shielding is more impactful than under-shielding on vertical drifts [Hui and Vichare, 2019], so more evidence is required to support this.
We now turn to the cluster of positive SHAP values around the Earth-Sun perihelion (Fig. 8d). We would not expect these ϕ values to be higher than those at the vernal and autumnal equinoxes or the December solstice, when EPB onset is most probable [Burke et al., 2004]. One possible explanation is an increase in F region density around the Gregorian new year, potentially arising from the Earth-Sun perihelion [Rishbeth and Müller-Wodarg, 2006]. The exact cause of this annual asymmetry remains unknown, but an increased Σ_P^F in Eq. (1) would increase the growth rate of an EPB [Sultan, 1996, Carter et al., 2020]. This F region asymmetry has also been linked to increased atmospheric gravity wave activity, and gravity waves are a known seeding mechanism for EPBs [Singh et al., 1997, Abdu et al., 2009]. That said, the asymmetry occurs every year, yet we do not see a large number of points around this period. Although further investigation is required into both the seasonal and geomagnetic influences on EPB formation, this discussion highlights the potential of Shapley values to improve our understanding of bubble climatology and predictability.

Conclusions
In this paper we have shown that machine learning can successfully predict the Ionospheric Bubble Index (IBI) on board the Swarm spacecraft. IBI detects equatorial plasma bubbles in the ionosphere by assessing changes in the plasma density and magnetic field. AI Prediction of EPBs (APE) is able to accurately predict IBI across a range of spatiotemporal conditions. The main findings of our study are summarized below:
1. APE fully captures the climatology of EPBs detected by Swarm. This is made possible by the size and resolution of the dataset (9 years at 1 s), the feature selection, and the regression-specific model architecture. APE could also be extended to forecasting, as Kp and F10.7 are currently available via NOAA.
2. The XGBoost approach outperforms the other methods (linear regression and random forest) across all metrics. It performs extremely well, achieving skill, association, and root mean square error scores of 0.96, 0.98, and 0.08 respectively.
3. APE performs well across the entire feature space, especially post-sunset, in the American/Atlantic sector, around the equinoxes, and when solar activity is high. This is encouraging, as most EPBs occur during these periods and in these locations. Extra care may be required when using APE around 3 MLT.
4. SHAP values reveal that F10.7 is the most influential feature, whereas latitude is the least. SHAP values generally align with the existing climatology of IBI EPBs, which supports these results.
5. Additional indices may be required to fully capture the effects of geomagnetic activity on EPB predictions, but this may compromise APE's ability to forecast them. There is some evidence of high Kp generally suppressing EPB activity, but further investigation into under- and over-shielding is required.
6. The Shapley analysis also reveals that a combination of low solar activity, active geomagnetic conditions, and the Earth-Sun perihelion contributes to increased EPB likelihood. To the best of our knowledge, this is the first time this exact combination of features has been linked to bubble detection. Although its underlying mechanism needs additional investigation, it showcases the ability of Shapley values to enable new insights into EPB climatology and predictability.


Figure 1: Two examples of Swarm passing over the equator. a) Swarm detects an EPB, as indicated by the grey box. b) Quiet-time conditions with no bubbles present.

Figure 4: A correlation plot showing the relationship between the features.No strong linear correlation exists between any of the features.

Figure 5: The skill (PE), association (R), and performance (MAE, RMSE) of three learning models on the 20% test set.XGBoost outperforms the random forest and linear method across all four metrics.

Figure 6: The absolute error |ŷ_i − y_i| of APE across the 6 climate features; 0 is an ideal score. The uncertainties are calculated with the bootstrapping method [Efron and Tibshirani, 1994], and are represented by vertical bars (a, e) and shaded areas (b-d, f).

Figure 7: The mean absolute SHAP value across the six features. F10.7 contributes an absolute average of 0.1 to the 0.66 baseline and is considered the most important feature. Latitude contributes 0.04 to E[f(x)] and is considered the least important.

Figure 9: A 'waterfall' plot showing 4 predictions around the Earth-Sun perihelion with Kp > 2. The final prediction value is denoted by f(x), and the values represent the contributions to this from the baseline E[f(x)] ≈ ⟨R²⟩ = 0.66. By SHAP's efficiency property, the baseline plus the sum of the contributions equals the prediction f(x).