IDP-PGFE: an interpretable disruption predictor based on physics-guided feature extraction

Disruption prediction has made rapid progress in recent years, especially with machine learning (ML)-based methods. If a disruption prediction model can be interpreted, it can tell why certain samples are classified as disruption precursors. This allows us to identify the type of an incoming disruption for disruption avoidance and gives us insight into the mechanism of disruption. This paper presents a disruption predictor called the interpretable disruption predictor based on physics-guided feature extraction (IDP-PGFE) and its results on J-TEXT experiment data. The prediction performance of IDP-PGFE with physics-guided features is effectively improved (true positive rate = 97.27%, false positive rate = 5.45%, area under the ROC curve = 0.98) compared to the models with raw signal input. The validity of the interpretation results is ensured by the high performance of the model. The interpretability study using an attribution technique provides an understanding of J-TEXT disruption and conforms to our prior comprehension of disruption. Furthermore, IDP-PGFE gives a possible means of inferring the underlying cause of the disruption and how interventions affect the disruption process in J-TEXT. The interpretation results and the experimental phenomena show a high degree of conformity. The interpretation results also suggest a possible direction for experimental analysis: the resonant magnetic perturbations delay the density limit disruption by affecting both the MHD instabilities and the radiation profile. PGFE could also reduce the data requirement of IDP-PGFE to 10% of the training data required to train a model on raw signals. This makes it possible to transfer the predictor to next-generation tokamaks, which cannot provide large amounts of data. Therefore, IDP-PGFE is an effective approach to exploring disruption mechanisms and transferring disruption prediction models to future tokamaks.


Introduction
The safe and stable operation of next-generation tokamaks requires effectively avoiding and mitigating disruptions 1,2,3 , which relies on predictions and an in-depth understanding of plasma instabilities and disruption. The complexity of the physical mechanism of disruption makes it difficult to use first-principle models to predict plasma instabilities and disruptions 4 . Physics-driven and data-driven approaches are the two main ways to study disruption prediction. On the one hand, physics-driven disruption prediction studies [5][6][7] put effort into identifying disruption or instability event chains so that operators can avoid disruptions. For example, the disruption event characterization and forecasting (DECAF) 8 suite contains first-principle, physics-based modules for instability identification in tokamaks. On the other hand, most recently developed disruption predictors are based on data-driven methods because of their better performance [9][10][11][12][13] .
However, most of these data-driven models do not necessarily reflect the dynamics behind the phenomenon. Their physics fidelity and the interpretability of their results require in-depth investigation. Interpretable disruption prediction research has made preliminary progress in recent years. The physics-driven models try to combine the advantages of both paradigms by adopting surrogate machine learning (ML) models 14 , which could also improve the interpretability of these ML models. In data-driven studies, an interpretable model has been obtained by applying symbolic regression methods [15][16][17] to the support vector machine (SVM) at JET. A counterfactual-based approach to interpreting the 1.5D convolutional neural network (CNN) model has been developed on HL-2A 18 . A Class Activation Mapping method has also been used to interpret deep learning model predictions 19 . The hazard-based event prediction method has been introduced on DIII-D 20 . Its full deployment requires additional assumptions (models) about the future evolution of the relevant covariates. Physics-based indicators of disruption precursors based on peaking factors were developed and adopted at JET [21][22][23] . They have also been implemented to interpret machine learning-based disruption predictors across tokamaks on DIII-D and JET 24 . Moreover, automatic feature extraction from plasma profiles based on deep learning was also developed 25 .
Interpretable disruption prediction research is still in its infancy. In comparison, interpretability research has advanced further in the machine learning community. A prototypical part network (ProtoPNet) 26 dissects the image by finding prototypical parts to construct a deep interpretable network architecture. The perspective that guided this interpretable model design is that we should stop "interpreting" decisions made by black-box models after the fact and instead build models that are interpretable by construction 27 . This perspective is also suitable for disruption prediction and suggests that we should design a disruption predictor that is inherently interpretable. In mathematics, machine learning-guided intuition helps researchers to prove theorems via attribution techniques 28 . SHapley Additive exPlanations (SHAP) is a unified attribution-based interpretable approach [29][30][31] , which has already been successfully applied in explaining the prediction of hypoxemia during surgery 32 .
The perspectives and techniques of machine learning research could give us more inspiration for interpretable disruption prediction research. The focus of most disruption predictors is accuracy or cross-machine capability. In this work, we would like to design a disruption predictor that aims at both accuracy and interpretability. In our previous work, SHAP was applied to the LightGBM 33 based disruption predictors for J-TEXT and HL-2A 34 . The interpretability study provided a strategy for selecting diagnostics and shot data for developing cross-machine predictors. However, the interpretation of the model is not yet in-depth enough due to the limitations of the input signals, which are the raw diagnostic signals.
The work from DIII-D and JET teams 21,23,24 found that physics-based disruption markers in data-driven algorithms are a promising path toward realizing a uniform framework to predict and interpret disruptive scenarios across different tokamaks.An interpretable disruption predictor could also be designed based on physics-based disruption markers.Therefore, in this work, we would like to design an interpretable disruption predictor based on physics-based disruption markers by using SHAP as the interpretable approach.
The way to obtain the physics-based disruption markers in our work is called Physics-Guided Feature Extraction (PGFE). The aim of PGFE is not only to enable model interpretability but also to improve the predictor's accuracy. Deep learning's powerful feature extraction capabilities typically make it outperform traditional machine learning when a large enough amount of raw signals is available. However, when there is not enough data, inductive bias can be introduced to better embed known knowledge of disruption in the model. Inductive bias limits the search domain of the machine learning algorithm, so it can get a better result when there is not enough data. Using feature engineering with known knowledge of disruption to reduce the feature space dimension limits the search domain of the training algorithm, so it can be considered as adding inductive bias. Adding domain knowledge to machine learning is thought necessary to reach good performance in low data regimes 35 , and it is the basis for meaningful physical interpretation. PGFE could even reduce the data requirement of the predictor. Due to the high cost of disruptive data from ITER and next-generation tokamaks, disruption prediction researchers need to find a way to achieve cross-machine prediction or reduce the requirement of disruptive data. Cross-machine prediction has been studied through deep learning [36][37][38] and adaptive learning 39 . The inductive bias in PGFE is specific to disruption prediction tasks. Therefore, training the model with less data becomes feasible. With PGFE, it is possible to obtain a disruption prediction model for a new device directly from a small amount of discharge data.
In this paper, an Interpretable Disruption Predictor based On PGFE (IDP-PGFE) is introduced. The purpose of this work and the advantages of PGFE have been described in this section. The following section describes the structure of IDP-PGFE, which consists of a feature extractor, a disruption classifier, and an explainer. Section 3 introduces the dataset on J-TEXT. The predictive performance of IDP-PGFE on J-TEXT follows in section 4, which shows the two advantages of PGFE. Section 5 examines the interpretability of the predictor, including its application to the J-TEXT physics experiment. Section 6 briefly discusses the potential and limitations of IDP-PGFE. The summary is in section 7.

The three components of IDP-PGFE: feature extractor, disruption classifier and explainer
IDP-PGFE consists of three components: a feature extractor, a disruption classifier and an explainer. This section describes these three components.

Feature extractor: PGFE
The diagnostics signals in the tokamak experiment are heterogeneous, meaning not all dimensions share the same physical meaning, and the relation among them is very complex.It is tough to get physically meaningful patterns from the interpretation results.PGFE is a "white-box" which extracts features with physical meanings for in-depth study after the explainer.
Multiple physical phenomena that may be part of a chain of disruption 5 can be observed in tokamaks. In this work, the physics-guided features have been extracted based on MHD instabilities, radiation, density related disruption and basic plasma control system (PCS) signals. The descriptions of the features and the diagnostics used to extract them can be found in table 1. The number following the name of each diagnostic in the third column is the number of channels used for feature extraction. MHD instability is a significant precursor to disruption, especially the locked mode 5 . Moreover, 2/1 magnetic island growth 40 and multi-magnetic island overlap 41,42 are also possible precursors of disruption. This work uses Mirnov probes and locked mode detectors to extract this type of feature, as listed in the first row of table 1. Temperature hollowing and edge cooling 43 are risk indicators for MHD instabilities, and they could also be treated as disruption precursors. Two radiation arrays, the soft x-ray (SXR) 44 and Absolute eXtended Ultra Violet (AXUV) 45 arrays, are used to extract this type of feature because temperature profiles are difficult to access for all the discharges in the dataset. Theoretically, density limit disruption could be predicted through features extracted with reference to the Greenwald density limit. As a result, 92 channels of diagnostics have been extracted into 28 features. The extraction of the MHD instability, radiation and density related features is described in detail below.

MHD instabilities related features
In J-TEXT, the growth of the m/n = 2/1 tearing mode (TM) and multi-magnetic island overlap are the possible precursors of disruption. Mirnov probes and locked mode detectors are the fundamental diagnostics for MHD instability measurement in J-TEXT 48 , providing a simple, robust measurement of static and fluctuating magnetic properties. A deep learning approach to processing the Mirnov coil signal 19 and a manual approach to the locked mode signal 49 have also been developed at JET. "Mir_Vpp" is the peak-to-peak value of the Mirnov probe signal after a low-pass filter with a cut-off frequency of 50 kHz. The typical frequency of the 2/1 tearing mode in J-TEXT is approximately 3 kHz. A sliding window of 5 ms (longer than 5 periods of TM rotation) is selected as the time window of the Fast Fourier Transform (FFT) on the Mirnov probe. X_i(f) represents the Fourier transform of the i-th time window (slice). "Mir_abs" and "Mir_fre" are the intensity and frequency of X_i(f). If multiple frequencies exist in the Mirnov probe frequency spectrum, the one with the highest spectral intensity is selected. Here, we did not use the integrated Mirnov signals, so that the zero-drift uncertainty between different discharges does not contribute to our prediction. X_{1,i}(f) and X_{2,i}(f) represent the Fourier transforms of two Mirnov probe signals for each slice. Their cross spectral density (CSD) can be expressed as

$$P_{12,i}(f) = X_{1,i}(f)\,\overline{X_{2,i}(f)} = A(f)\,e^{\mathrm{i}\,\delta_{12,i}(f)} \tag{2-1}$$

where P_{12,i}(f) is the cross spectral density between two Mirnov probes, A(f) is the absolute value of P_{12,i}(f), and δ_{12,i}(f) is the phase of P_{12,i}(f). If the two Mirnov probes are in the poloidal array, the mode number m can be calculated as

$$m = \frac{\delta_{12,i}(f)}{\theta} \tag{2-2}$$

where θ is the poloidal separation between the two Mirnov probes. In J-TEXT, m = 3 and m = 2 modes often lead to a disruption caused by multiple magnetic islands. Hence, the "mode_number_m" (MNM) uses a weighted average mode number indication, which is described as

$$\mathrm{MNM} = \frac{\sum_{k} A(f_k)\,\delta_{12,i}(f_k)/\theta}{\sum_{k} A(f_k)} \tag{2-3}$$

where k indexes the different frequencies of the CSD. The "mode_number_n" (MNN) is calculated in the same way from two probes in the toroidal array. To increase the accuracy of MNM and MNN, only the frequency components with a coherence larger than 0.95 are considered.
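The spectral features above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `mirnov_features`, the Hann window, the 100 kHz sampling rate and the amplitude weighting of the averaged mode number are assumptions made for the example, and the coherence filter (which needs averaging over several sub-windows) is omitted.

```python
import numpy as np

def mirnov_features(sig1, sig2, fs, theta):
    """Return (Mir_abs, Mir_fre, MNM) for one slice of two poloidal Mirnov probes.

    sig1, sig2 : probe signals within one time window (hypothetical inputs)
    fs         : sampling rate in Hz (assumed)
    theta      : poloidal separation of the two probes in radians
    """
    n = len(sig1)
    window = np.hanning(n)                       # taper to limit spectral leakage
    x1 = np.fft.rfft(sig1 * window)
    x2 = np.fft.rfft(sig2 * window)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # Strongest non-DC component of the first probe gives Mir_abs and Mir_fre.
    k_peak = np.argmax(np.abs(x1[1:])) + 1
    mir_abs = np.abs(x1[k_peak])
    mir_fre = freqs[k_peak]

    # Cross spectral density (DC bin skipped); phase / separation = mode number.
    p12 = x1[1:] * np.conj(x2[1:])
    amp = np.abs(p12)
    mnm = np.sum(amp * np.angle(p12) / theta) / np.sum(amp)
    return mir_abs, mir_fre, mnm

# Synthetic m = 2 mode rotating at 3 kHz, seen by two probes 0.3 rad apart.
fs = 100_000
t = np.arange(500) / fs                          # one 5 ms window
theta = 0.3
s1 = np.cos(2 * np.pi * 3000 * t)
s2 = np.cos(2 * np.pi * 3000 * t - 2 * theta)
mir_abs, mir_fre, mnm = mirnov_features(s1, s2, fs, theta)
```

On this synthetic slice the peak frequency is recovered at 3 kHz and the weighted mode number is close to 2. On real data the phase wraps at ±π, so the probe separation limits the largest mode number that can be resolved.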
A typical discharge with TM of J-TEXT is shown in Figure 1.The evolution of MNM, MNN, Mir_fre, and Mir_abs with the raw Mirnov signals are shown.Because of the weighted average mode number indication, MNM and MNN are continuous.The slowdown of mode frequency and growth of mode amplitude induced disruption after the short duration locked mode (less than 1ms).The m/n of this mode is 2/1 and the Mir_fre is approximately 3 kHz before mode locking.The ramp down of Mir_fre represents the mode locking.
After mode locking, the n = 1 amplitude is also an MHD-related feature. The magnetic field measured by the locked mode (LM) detector can be expressed as

$$B_r(\theta,\varphi) = \sum_{m,n} b_r^{m,n}\cos\left(m\theta + n\varphi + \xi_{m,n}\right) \tag{2-4}$$

where θ and φ are the poloidal and toroidal location of the LM detector, m and n are the poloidal and toroidal mode numbers, and ξ is the spiral phase. If only the main components in J-TEXT, which are n = 0, 1 and 2, are considered, then for a detector on the midplane, θ = 0 (everything that follows is based on this situation), equation (2-4) can be expressed as

$$B_r(\varphi) = b_r^{n=0} + b_r^{n=1}\cos\left(\varphi + \xi_{n=1}\right) + b_r^{n=2}\cos\left(2\varphi + \xi_{n=2}\right) \tag{2-5}$$

The n = 1 amplitude, b_r^{n=1}, can be calculated through two LM detectors with Δφ = π, as shown in equation (2-6):

$$B_r(\varphi) - B_r(\varphi + \pi) = 2\,b_r^{n=1}\cos\left(\varphi + \xi_{n=1}\right) \tag{2-6}$$

Then b_r^{n=1} and ξ_{n=1} can be calculated by fitting two pairs of these LM detectors. The application of the n = 1 amplitude in the resonant magnetic perturbation (RMP) experiments has been introduced in our review paper on magnetic diagnostics 48 .
In the limiter configuration with a circular cross-section of J-TEXT plasma, the features "Mir_abs", "Mir_fre" and "Mir_Vpp" are extracted from one Mirnov probe; "mode_number_m" (MNM) and "mode_number_n" (MNN) are extracted from two Mirnov probes in the poloidal and toroidal arrays, respectively; and "n = 1 amplitude" is extracted from two pairs of LM detectors, each pair consisting of two locked mode detectors separated by a toroidal angle of 180°.
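The fit of the n = 1 amplitude from two detector pairs can be sketched as a small linear solve. This is only an illustration under stated assumptions: the midplane field is taken as b0 + b1·cos(φ + ξ1) + b2·cos(2φ + ξ2), the sign convention of the phase is assumed, and the function name `n1_amplitude` and the detector angles are hypothetical. Subtracting the two detectors of a pair cancels the even-n terms and leaves 2·b1·cos(φ + ξ1).

```python
import numpy as np

def n1_amplitude(pairs, phis):
    """Estimate (b1, xi1) from two pairs of LM detectors.

    pairs : [(B(phi_a), B(phi_a + pi)), (B(phi_b), B(phi_b + pi))]
    phis  : toroidal angles [phi_a, phi_b] of the first detector of each pair
    """
    # Half-difference within each pair isolates b1 * cos(phi + xi1).
    d = np.array([(b_here - b_opposite) / 2.0 for b_here, b_opposite in pairs])
    phis = np.asarray(phis, dtype=float)
    # b1*cos(phi + xi1) = C*cos(phi) - S*sin(phi), with C = b1*cos(xi1), S = b1*sin(xi1)
    design = np.column_stack([np.cos(phis), -np.sin(phis)])
    c, s = np.linalg.solve(design, d)
    return np.hypot(c, s), np.arctan2(s, c)

# Synthetic field with n = 0, 1 and 2 components (illustrative numbers only).
def field(phi):
    return 0.3 + 1.5 * np.cos(phi + 0.7) + 0.4 * np.cos(2 * phi + 0.2)

phis = [0.0, np.pi / 2]
pairs = [(field(p), field(p + np.pi)) for p in phis]
b1, xi1 = n1_amplitude(pairs, phis)
```

The two pairs must not sit at the same or opposite toroidal angles, otherwise the 2×2 system becomes singular.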

Radiation and density related features
Temperature hollowing and edge cooling are phenomena that appear in profiles, which are 1D signals. In order to include the radial profiles of 1D signals in data-driven analyses, 0D peaking factor metrics have been synthesized on JET and DIII-D 21,24 . Profile indicators 22,23,39 have also been applied in the transfer of adaptive predictors between JET and ASDEX Upgrade (AUG). Both methods calculate first-order statistics, i.e. the mean of the arrays. Deep learning models on JET and DIII-D can extract features from 1D signals through convolution 25,36 , which includes higher dimensional information. In this work, we calculated the higher-order statistics (HOS) of the 1D signals from the SXR, AXUV and FIR arrays for the radiation and density related features.
Two HOSs have been selected to reduce the 1D signals to 0D features: skewness (skew) and kurtosis (kurt). Variance (var) is a low-order statistic. However, as the variance (second order) is of higher order than the arithmetic mean (first order), in this work the HOS also include the variance. Variance is a measure of dispersion, meaning it is a measure of how far the signals of the lines of sight in the array are spread out from their average value. The feature var is defined by:

$$\mathrm{var} = \frac{1}{n}\sum_{k=1}^{n}\left(x_k - \mu\right)^2 \tag{2-7}$$

where n is the number of lines of sight in the array, x_k is the signal at the k-th line of sight in the array, and μ is the mean of all signals in the array. The plasma profiles are parabolic in the limiter configuration with a circular cross-section of J-TEXT plasma. A higher var represents a greater span of data in the profile, which indicates a more peaked profile. Skewness is a measure of the asymmetry of the distribution of the array signals about their mean. The feature skew is defined by:

$$\mathrm{skew} = \frac{\frac{1}{n}\sum_{k=1}^{n}\left(x_k - \mu\right)^3}{\left[\frac{1}{n}\sum_{k=1}^{n}\left(x_k - \mu\right)^2\right]^{3/2}} \tag{2-8}$$

If skew > 0, the distribution is called positively skewed or right skewed, which represents mean > median > mode (in the statistical sense) of the array signals. If skew < 0, the opposite holds, and if skew = 0, the three values above are the same.
Kurtosis is a measure of the combined weight of a distribution's tails relative to the centre of the distribution. The feature kurt is defined by:

$$\mathrm{kurt} = \frac{\frac{1}{n}\sum_{k=1}^{n}\left(x_k - \mu\right)^4}{\left[\frac{1}{n}\sum_{k=1}^{n}\left(x_k - \mu\right)^2\right]^{2}} - 3 \tag{2-9}$$

The "minus 3" gives a kurtosis of 0 to the normal distribution. This kind of kurtosis is also called excess kurtosis, simplified to kurtosis in this paper. Higher kurt indicates fewer larger or smaller outliers and a more concentrated data distribution in the profile. Therefore, higher kurt represents a fatter profile, while lower kurt represents a thinner profile.
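As a minimal sketch of how one array snapshot is reduced to the three 0D features, the following NumPy function computes var, skew and excess kurt with population (1/n) normalisation. The normalisation choice and the function name `hos_features` are assumptions of this example, not a statement of the authors' code.

```python
import numpy as np

def hos_features(profile):
    """Return (var, skew, kurt) of the channel signals of one array at one time slice."""
    x = np.asarray(profile, dtype=float)
    mu = x.mean()
    centred = x - mu
    var = np.mean(centred ** 2)              # second central moment
    skew = np.mean(centred ** 3) / var ** 1.5
    kurt = np.mean(centred ** 4) / var ** 2 - 3.0   # excess kurtosis
    return var, skew, kurt

# Five hypothetical chord signals forming a symmetric, linearly varying profile.
var, skew, kurt = hos_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

A perfectly flat profile has var = 0, for which skew and kurt are undefined, so a real pipeline would need to guard against that case.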
The HOS features in one discharge are shown in figure 2. The blue line represents the 0D signal at the middle channel of each array, and the green, yellow, and orange lines represent kurt, skew, and var, respectively. In this discharge, the increase of plasma radiation leads to disruption. The increase in the var of the SXR array and AXUV array indicates that the radiation profile is gradually peaking. The slight reduction in the kurt of the FIR array indicates that the gradient of the density profile increased. The anomalous fluctuations in var, skew and kurt before 0.15ms result from the plasma's horizontal displacement not yet being fully stabilised.
The detailed HOS feature of this discharge are shown in figure 3, ranging from 0.315s to 0.355s.Before 0.341s, the HOS features changed little.The var of each array increased, and the skew and kurt of each array decreased a little.After 0.341s, the HOS features show great variations and oscillations with the disappearance of the sawtooth.The var of FIR and SXR array and skew of FIR array decreased significantly.While the kurt of FIR array increased significantly and the var of AXUV array increased then decreased.Other HOS features show oscillations with slight variations.At about 0.349s, the disruption occurred.
Radiation related features also included the radiated power P_rad and 0D signals, such as impurities radiation CIII (carbon wall for J-TEXT) and soft x-ray radiation SXR_core at the middle channel.Density related features also included sum_ne and 0D signal core line integral density ne0, line average density of the middle channel.The feature sum_ne is calculated by summing up all the signals in FIR array to express the electron density of the poloidal profile approximately.

Disruption classifier: Dropouts meet Multiple Additive Regression Trees (DART)
Tree-based models can be more accurate than neural networks and linear models in many applications, such as tabular-style datasets, where features are individually meaningful 50 . Features after PGFE are also individually meaningful for the disruption prediction task. The gradient boosting decision tree (GBDT) 51 is a tree-based algorithm that gives a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Disruption prediction using LightGBM, which is based on GBDT, was introduced in our previous work 33,34 . However, LightGBM and other GBDT-based algorithms like Multiple Additive Regression Trees (MART) can be over-sensitive to the contributions of the few initially added trees. This is undesirable in interpretability research, as it may lead to the model learning wrong knowledge. Dropouts meet Multiple Additive Regression Trees (DART) borrows the dropout technique from deep neural networks 52 to drop existing decision trees randomly. By iteratively optimizing the boosted trees from the remaining set of decision trees, DART could alleviate this over-sensitivity problem. Therefore, DART, a tree-based model, is selected as the disruption classifier of IDP-PGFE. If no tree is dropped, DART is, in fact, the traditional GBDT. If all trees are dropped, DART is equivalent to the random forest (RF), because the regression tree has to be regenerated each time and there is no previous regression tree to build on.

Explainer: SHAP
SHAP is based on the game-theoretically optimal Shapley values 53 , a method from coalitional game theory for fairly distributing the "pay-out" (prediction) among the "players" (features). Note that the Shapley value is not the difference in prediction obtained by removing feature(s) from the model. The Shapley value is the contribution of a feature value to the difference between the actual prediction and the mean prediction, given the current set of feature values. SHAP is also an additive feature attribution method, which uses a simpler explanation model, defined as any interpretable approximation of the original model. Explanation models use simplified inputs x' that map to the original inputs through a mapping function x = h_x(x'). An explanation model g can be expressed as:

$$g(z') = \phi_0 + \sum_{j=1}^{M}\phi_j z'_j \tag{2-10}$$

where z' ∈ {0,1}^M; z'_j = 1 means that the corresponding feature value is "present" and z'_j = 0 that it is "absent". M is the number of simplified input features, and φ_j ∈ ℝ is the feature attribution for feature j, the Shapley value. The Shapley values φ_j(f, x) are expressed as:

$$\phi_j(f,x) = \sum_{z' \subseteq x',\; z'_j = 1} \frac{\left(|z'|-1\right)!\,\left(M-|z'|\right)!}{M!}\left[f_x(z') - f_x(z' \setminus j)\right] \tag{2-11}$$

where |z'| is the number of non-zero entries in z' and z' \ j denotes setting z'_j = 0. The Shapley values φ_j(f, x), explaining a prediction f(x), are an allocation of credit among the features in x (the features extracted through PGFE) and are the only allocation satisfying three desirable properties. The first one is local accuracy, which ensures that the explanation model g matches the predictor f for the input being explained. The second one is missingness, which ensures that features missing in the original input have no impact: if z'_j = 0, the attributed importance is also 0. The third one is consistency (also called monotonicity in game theory research), which states that if a feature is more important in one model than in another, the importance attributed to that feature should also be higher. SHAP is a unified approach to interpreting model predictions, with the advantages of a solid theoretical foundation in game theory, a unique allocation of credit among the features, and consistency between global and local interpretations. However, it is possible to create intentionally misleading interpretations with SHAP, which can hide biases 54 , so care is needed when explaining the predictor.
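To make the Shapley allocation concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature subsets for a toy model. It is illustrative only: "absent" features are replaced by a fixed baseline value, which is an assumption of this example, and the names `shapley_values`, `toy_model` and the inputs are hypothetical. Practical tree models such as DART would rely on a dedicated tree explainer rather than enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take their baseline value."""
    m = len(x)

    def f_x(present):
        # Evaluate the model with only the features in `present` switched on.
        z = [x[i] if i in present else baseline[i] for i in range(m)]
        return f(z)

    phi = []
    for j in range(m):
        others = [i for i in range(m) if i != j]
        total = 0.0
        for size in range(m):
            # Weight of a coalition of `size` features that does not contain j.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (f_x(s | {j}) - f_x(s))
        phi.append(total)
    return phi

# Toy model with an interaction between the first two features.
def toy_model(z):
    return z[0] * z[1] + z[2]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The interaction term is split equally between the two interacting features, and the attributions sum to the difference between the prediction at x and at the baseline, which is the local accuracy property.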

Dataset description
This section describes the dataset selected to train, validate and test IDP-PGFE. The electron cyclotron resonance heating (ECRH) system was successfully commissioned at the beginning of the 2019 campaign 55,56 48 . The Mirnov probes in the poloidal array in the 2017-2018 campaigns are from poloidal probe array 1, while those in the 2021-2022 campaigns are from poloidal probe array 2. All discharges are split into slices of 0.1 ms from the flat-top of the plasma current to the current quench (CQ) time, matching the sampling rate of the plasma current in J-TEXT. An automatic criterion has been applied to detect sudden large drops in IP, marking the beginning of the drop as the CQ time with 1 ms resolution. The results are then visually checked and corrected by human experts. The phase between a time threshold and the CQ time is the unstable phase of each disruptive discharge. The unstable phase and time threshold can be determined manually 22 or by a statistical analysis, either equal for all discharges 10 or individually for each discharge 23,57 . Here, an automatic approach has been used: the time threshold is scanned from 5 ms to 50 ms in J-TEXT and the value giving the best model performance is chosen. A time threshold of 25 ms before the CQ time achieved the best performance. The "unstable" slices in disruptive discharges are labelled as "disruptive", and all the slices in non-disruptive discharges are labelled as "non-disruptive". The disruptive slices come from only a part of each disruptive discharge, while the non-disruptive slices come from the whole of every non-disruptive discharge, which leads to a significant imbalance of the dataset. Therefore, we increased the weights of disruptive slices and randomly dropped a portion of non-disruptive slices to balance the two kinds of slices.
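The labelling rule can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: the function name `label_slices` and the example times are hypothetical, times are given in milliseconds, and the handling of the stable part of disruptive discharges (labelled 0 here) is an assumption of this sketch.

```python
import numpy as np

def label_slices(times_ms, t_cq_ms=None, threshold_ms=25.0):
    """Label slices: 1 = "disruptive", 0 = "non-disruptive".

    times_ms     : slice times of one discharge in ms
    t_cq_ms      : current quench time in ms, or None for a non-disruptive discharge
    threshold_ms : length of the unstable phase before the CQ time
    """
    times_ms = np.asarray(times_ms, dtype=float)
    if t_cq_ms is None:
        return np.zeros(times_ms.shape, dtype=int)
    # Slices inside the unstable phase [t_cq - threshold, t_cq] are disruptive.
    return (times_ms >= t_cq_ms - threshold_ms).astype(int)

disruptive_labels = label_slices([300.0, 320.0, 330.0, 349.9], t_cq_ms=350.0)
safe_labels = label_slices([300.0, 320.0, 330.0, 349.9])
```

Scanning `threshold_ms` over 5 to 50 ms and retraining for each value would reproduce the kind of threshold search described above.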

Predictive performances of IDP-PGFE
This section shows the high performance and the low data requirement of IDP-PGFE, both of which stem from PGFE. Disruption prediction is a binary classification task, for which a confusion matrix is often used to evaluate performance. True positive (TP) in disruption prediction refers to a successfully predicted disruptive discharge. False positive (FP) refers to a non-disruptive discharge predicted as disruptive, also called a false alarm. True negative (TN) refers to a non-disruptive discharge predicted as non-disruptive. False negative (FN) refers to a disruptive discharge not predicted as disruptive. Missed alarms and tardy alarms both belong to FN. A short warning time should be reckoned as a tardy alarm due to the requirement of the disruption mitigation system (DMS). In IDP-PGFE, any predicted disruption with a warning time of less than 10 ms is considered FN. The evaluation indicators of a disruption predictor are derived from the receiver operating characteristic (ROC) curve and include the true positive rate (TPR), the false positive rate (FPR) and the area under the ROC curve (AUC). TPR and FPR are calculated as follows:

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}$$

The DART gives a result between "0" ("non-disruptive") and "1" ("disruptive"), which can be turned into a binary classification by manually setting a model threshold. Each model threshold corresponds to one pair of TPR and FPR. The ROC curve is created by plotting TPR against FPR at various threshold settings.
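The discharge-level counting with the 10 ms warning-time rule can be sketched as follows. The alarm logic (the first slice whose score reaches the model threshold raises the alarm) is an assumption of this example, and the function name `evaluate` and the toy shots are hypothetical.

```python
import numpy as np

def evaluate(shots, model_threshold=0.5, min_warning_ms=10.0):
    """Return (TPR, FPR) over discharges.

    shots : list of (times_ms, scores, t_cq_ms); t_cq_ms is None if non-disruptive
    """
    tp = fp = tn = fn = 0
    for times_ms, scores, t_cq_ms in shots:
        times_ms = np.asarray(times_ms, dtype=float)
        alarms = np.asarray(scores, dtype=float) >= model_threshold
        alarm_time = times_ms[alarms][0] if alarms.any() else None
        if t_cq_ms is None:                       # non-disruptive discharge
            if alarm_time is None:
                tn += 1
            else:
                fp += 1                           # false alarm
        elif alarm_time is not None and t_cq_ms - alarm_time >= min_warning_ms:
            tp += 1
        else:
            fn += 1                               # missed or tardy alarm
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return tpr, fpr

times = [0.0, 10.0, 20.0, 30.0, 45.0]
shots = [
    (times, [0.1, 0.2, 0.9, 0.9, 0.9], 50.0),   # alarm 30 ms before CQ: TP
    (times, [0.1, 0.1, 0.1, 0.1, 0.8], 50.0),   # alarm 5 ms before CQ: tardy, FN
    (times, [0.1, 0.1, 0.1, 0.1, 0.1], 50.0),   # no alarm: FN
    (times, [0.1, 0.7, 0.1, 0.1, 0.1], None),   # false alarm: FP
    (times, [0.1, 0.1, 0.1, 0.1, 0.1], None),   # quiet discharge: TN
]
tpr, fpr = evaluate(shots)
```

Repeating the call for a range of `model_threshold` values and plotting the resulting (FPR, TPR) pairs traces out the ROC curve.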
To make a fair comparison, two benchmark disruption predictors using raw signals (DPRS) are also shown in this section. The training, validation and test sets of DPRS are the same as those of IDP-PGFE. One version of DPRS, called DPRS-base, used all the channels of diagnostic signals (92 dimensions) shown in table 1. Another version, called DPRS-light, used selected channels to keep the same input dimension (28 dimensions) as IDP-PGFE. The channels selected for DPRS-light are shown in table 3. For the AXUV and SXR arrays, the channels with chord distances of about 0, ± 0.5a and ± a were selected. For the FIR array, the channels with chord distances of about 0 and ± a were selected.
Table 3 Channels of diagnostics selected for DPRS-light.

Channels of diagnostics
Mirnov probes in poloidal array (2)
Mirnov probes in toroidal array (2)
Locked mode detectors (4)
AXUV array (5)
CIII
SXR array (5)
FIR array (3)
Bt, Ip
Ihf, Ivf
dr, dz

In this section, the performances of IDP-PGFE are reported in detail for J-TEXT. Section 4.1 describes the results of IDP-PGFE. As described in the introduction, PGFE enables learning from little data. The cases of low data learning, in which IDP-PGFE is trained with fewer data, are addressed in section 4.2. A hyperparameter search determines the hyperparameters of each best-performing model in this section.

IDP-PGFE performances on J-TEXT
In this part, IDP-PGFE was trained with the full training set. The performances of IDP-PGFE, DPRS-base and DPRS-light are shown in figure 4. The orange, navy-blue and light-blue lines represent the ROC curves of IDP-PGFE, DPRS-base and DPRS-light, respectively. Although the AUC value of DPRS-base is 0.935, better than that of DPRS-light, the performance of IDP-PGFE is clearly better than both. The AUC value of DPRS-light is only a little smaller than that of DPRS-base, which indicates that merely increasing the amount of input data (input dimension) is not effective in improving the performance of the disruption predictor. PGFE could significantly improve the performance of the disruption predictor by extracting disruption-related features. This leads to the conclusion that the PGFE-based IDP-PGFE is superior in predictive performance to a disruption predictor using the raw signals.
Figure 5 shows the accumulated percentage of disruptions predicted versus warning time with the model threshold fixed at 0.5. Because J-TEXT is a small-sized tokamak with a relatively short time scale for disruption to take place, the warning time is shorter than that for JET, EAST, DIII-D, or other large and medium sized tokamaks. Therefore, the warning time could be selected as 10 ms. The electromagnetic particle injector (EPI) could react when triggered 10 ms in advance 58 . At a warning time of 30 ms, a considerable accumulated percentage (TPR > 90%) of disruptions should also be predicted so that other mitigation methods can react. The final performance of IDP-PGFE is TPR = 97.27%, FPR = 5.45% with a tolerance of 10 ms.

Interpretability study and the application on J-TEXT experiment
This section describes the interpretability study of IDP-PGFE and its application to J-TEXT disruption experiments. Because PGFE extracts meaningful features, IDP-PGFE is more interpretable than a predictor using the raw diagnostic signals. The interpretability application presented here does not aim at real-time disruption prediction, but gives a possible means of inferring the underlying cause of the disruption and how interventions affect the disruption process. IDP-PGFE can also be used in real time as a guide for disruption avoidance, which is beyond the scope of this article. Section 5.1 shows the global interpretability of IDP-PGFE and section 5.2 describes the RMP experiment applications.

Global interpretability study of IDP-PGFE
The outstanding performance of IDP-PGFE can somewhat reflect that IDP-PGFE has learned enough correct knowledge related to J-TEXT disruption.Otherwise, it would not be possible to have such high accuracy.Moreover, checking the model interpretation results with widely accepted disruption precursors can also act as a test to verify that the model has learned physically meaningful patterns instead of some leaked information in the data.Nevertheless, understanding how IDP-PGFE recognizes "disruptive" and "non-disruptive" is more important for inferring the underlying cause of the disruption.SHAP is also an attribution technique that could not only provide the contributions of features but also how feature value impacts model output.We will still study the interpretability of IDP-PGFE from the four categories of the PGFE, MHD instabilities, radiation, density related features and basic PCS signals.Figure 8 shows the SHAP value of different features and their relations with feature value.
Figure 8 The SHAP value of different features and their relations with feature value.The width of bar in the SHAP result represent the number of the samples.The larger the width of the bar, the larger the number of samples.The order of the features represents the contributions of features.The colormap represents the feature value of each feature, red means high and blue means low.A positive SHAP value represents a "disruptive" impact on the model, while a negative SHAP value represents a "non-disruptive" impact on the model.

MHD instabilities related features contributions:
Mir_fre, the frequency of the Mirnov probe signal, which reflects the frequency of MHD instabilities, is the feature with the largest contribution. A lower value of Mir_fre may contribute to both "disruptive" and "non-disruptive". If the frequency of the Mirnov probe slows down, the contribution is "disruptive", which reflects the mode locking process. Otherwise, if the frequency of the Mirnov probe is always about 0, the contribution is "non-disruptive", reflecting no MHD instabilities. A higher value of Mir_fre contributes more to "non-disruptive", which means that a high-speed rotating mode is not the direct reason for disruption. The two low-frequency cases can be distinguished by the 3rd ranked feature, the intensity of the Mirnov probe signal Mir_abs. A lower value of Mir_abs contributes more to "non-disruptive", which represents the case without MHD instabilities. J-TEXT is a small-sized tokamak with a relatively short time scale for LM-caused disruption. Therefore, the n = 1 amplitude is not an effective time-sensitive disruption precursor. The typical discharge shown in section 2.1.1 also shows that the time between the LM and the disruption is short. The n = 1 amplitude is the 24th ranked feature, and a higher value contributes more to "disruptive". MNM is the 6th ranked feature and MNN is the 7th ranked feature. A higher value of MNM (m ≥ 2) contributes to both "disruptive" and "non-disruptive". The case that contributes more to "disruptive" may relate to the growth of the m/n = 2/1 TM and multi-magnetic island overlap. In contrast, the other case may relate to an edge magnetic island, which could even improve the plasma confinement 59 .
Radiation related features contributions:
The 2nd ranked feature is SXRskew, and a lower value of SXRskew contributes more to "disruptive". A lower value of SXRskew does not by itself mean that the soft x-ray radiation is high. Figure 9 shows the SHAP value of SXRskew with respect to the values of SXRskew and SXRcore. It shows that only a lower SXRcore together with a lower SXRskew contributes more to "disruptive", reflecting a general decrease of soft x-ray radiation. The 15th ranked feature is SXRvar; a higher value of SXRvar represents a more peaked profile, which contributes more to "disruptive". This is usually a precursor of the density limit disruption. The 9th ranked feature is AXUVkurt; a lower value of AXUVkurt represents a thinner profile, which contributes more to "disruptive". A thinner profile usually means steeper gradients of the profile. The 20th ranked feature is P_rad; a higher value of P_rad represents a larger loss of plasma energy through radiation, which contributes more to "disruptive".
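The profile features discussed above (var, skew and kurt of an array) can be illustrated with a minimal sketch that computes the moments over the channels of one time slice. It assumes population moments and non-excess kurtosis; the exact definitions and normalisation used by PGFE are not restated here, and the channel values are made up.

```python
def profile_moments(channels):
    """Population variance, skewness and kurtosis of one array time slice."""
    n = len(channels)
    mean = sum(channels) / n
    m2 = sum((c - mean) ** 2 for c in channels) / n
    m3 = sum((c - mean) ** 3 for c in channels) / n
    m4 = sum((c - mean) ** 4 for c in channels) / n
    if m2 == 0.0:
        # Flat profile: the shape moments are undefined, report zeros.
        return 0.0, 0.0, 0.0
    return m2, m3 / m2 ** 1.5, m4 / m2 ** 2


broad = [1.0, 1.1, 1.2, 1.1, 1.0]    # hypothetical broad profile
peaked = [0.2, 0.5, 3.0, 0.5, 0.2]   # hypothetical peaked profile
var_broad, skew_broad, kurt_broad = profile_moments(broad)
var_peaked, skew_peaked, kurt_peaked = profile_moments(peaked)
```

With these made-up channel values the peaked profile has the larger variance, in line with the reading that a higher var corresponds to a more peaked profile.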

Density related features contributions:
The density related features are intuitive to understand. The most relevant of them is ne0, the 8th ranked feature. A higher value of ne0 represents the density limit disruption. No matter how FIRvar, FIRskew and FIRkurt contribute, J-TEXT cannot escape the density limit disruption in the end. A higher value of FIRvar and a lower value of FIRskew contribute more to "disruptive". They could provide more detailed information on the density limit disruption, namely a more peaked density profile and a more general increase of plasma density.

Basic PCS signals contributions:
Notably, IP and Bt are the 4th and 5th ranked features, respectively. Even though IP and Bt do not change much within one discharge, their contribution is still significant because they are basic plasma parameters. The values of IP and Bt alone have no impact on "disruptive" or "non-disruptive". Both dZ and IHF reflect the vertical displacement. In J-TEXT, IHF should be larger than 0 (based on the value of TF) when the plasma is stable (no vertical displacement). A higher IHF raises the plasma and vice versa. Therefore, a higher or lower IHF indicates that the plasma tends to move down or up, respectively, and that IHF is acting against this motion. According to the SHAP result, a lower IHF contributes more to "disruptive", reflecting that an upward vertical displacement usually leads to disruption in J-TEXT. Figure 10 supports this: the plasma usually tends to drift up when approaching disruption, so IHF has to pull the plasma back. We have yet to find the reason, which may be the electromagnetic force on J-TEXT or the position of the gas puffing fuelling. This requires further investigation and analysis. Both dr and IVF reflect the horizontal displacement. A higher value of dr contributes more to both "disruptive" and "non-disruptive", which may indicate that in J-TEXT a horizontal displacement towards the LFS limiter is salvageable.

Interpretability application on RMP experiment
The analysis above is only a preliminary study of the interpretability of IDP-PGFE. Much of it is speculative rather than conclusive. Even so, it fits the experimental phenomena well and helps experimenters find possible directions for further analysis in J-TEXT experiments.
The experiments investigating the influence of RMPs on disruptions caused by a continuous density ramp towards the density limit were carried out on J-TEXT in spring 2022. In the RMP experiments (Ip = 120 kA, Bt = 2.1 T, Ic = 0-4 kA), the application of RMPs was found to raise the density limit at disruption. Two typical discharges from this experiment were included in the test set. An overview of the experiment is shown in figure 11. The discharge without RMPs reached a density limit of about 0.79nG. The discharges with 2 kA RMPs reached the highest density limit, about 0.91nG, which is a very high density limit for J-TEXT. The #1080564 discharge is a typical high qa density limit disruption discharge in J-TEXT. In this kind of discharge, multifaceted asymmetric radiation from the edge (MARFE) and detached plasma are reflected in the evolution of the ratio of the line-integrated density at the HFS to that at the LFS [61,62]. As the density continued to rise, the ratio began to increase with the onset of MARFE and then decreased when the plasma detached. The application of RMPs delays the decrease of the ratio, thereby raising the density limit and delaying the disruption in the #1080550 discharge. The interpretability study of IDP-PGFE could help physicists confirm whether the application of RMPs raised the density limit and find the possible reason. Figures 12 and 13 show the predicted result, SHAP values and four typical features in the #1080564 (w/o RMPs) and #1080550 (with RMPs) discharges, respectively. With RMPs applied (#1080550), the disruption occurs much later than the time at which the core density starts to drive the prediction. This indicates that the application of RMPs delays the density limit disruption and raises the density limit.
The #1080564 discharge (w/o RMPs) is a high qa density limit disruption discharge, different from #1074378. The predicted result increased because of the increase in the SHAP values of CIII, AXUVkurt and MNM. The SHAP value of ne0 increased only close to the disruption. The SHAP value of CIII increased at about 0.39 s (during MARFE), bringing the predicted result to about 0.4. At about 0.42 s, the plasma detached. The SHAP value of AXUVkurt increased as AXUVkurt decreased, which raised the predicted result further. At about 0.44 s, the SHAP value of MNM increased and pushed the predicted result above 0.5. The interpretability study of this high qa density limit disruption discharge suggests that detached plasma and MHD instabilities could be the cause of the density limit disruption [61].
In the #1080550 discharge (with RMPs), the SHAP value of CIII still increased at about 0.39 s (during MARFE). Because of the applied RMPs, CIII continued to increase until about 0.56 s with no detached plasma. Therefore, the value of AXUVkurt did not decrease, and the SHAP value of AXUVkurt did not increase. The most significant contribution to the predicted result is that of ne0 because, in the view of IDP-PGFE, this discharge had reached the density limit. After the RMPs were cut off at about 0.56 s, the value of AXUVkurt decreased and MNM increased, so the detached plasma and MHD instabilities then appeared and caused the disruption. Therefore, applying RMPs can raise the density limit by delaying the detached plasma and MHD instabilities that precede the density limit disruption. The application of RMPs is found to affect not only the MHD instabilities, as investigated before [63], but also the radiation profile (AXUVkurt). For the underlying cause of the density limit disruption, the contributions of AXUVkurt and MNM are more important than the contribution of ne0. This brings new insight into the RMP experiment on the density limit disruption, which could guide physicists in working out how RMPs affect the density limit disruption process. The RMP experiment supports a strong connection between the radiation profile, MHD instabilities and the density limit disruption. In a density limit disruption, the contributions of the radiation profile and MHD instabilities also rise before the disruption. With RMPs, these contributions rise only after the density has passed the original density limit (~0.79nG), so the density limit is raised (~0.91nG).
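The densities above are quoted as fractions of the Greenwald density nG. As a rough worked example, the standard Greenwald scaling nG = Ip/(πa²) (nG in 10^20 m^-3, Ip in MA, a in m) can be evaluated for the experimental plasma current. The minor radius below is an illustrative assumption that is not given in this section, so only the relative gain, which does not depend on it, should be read as meaningful.

```python
from math import pi


def greenwald_density(ip_ma, minor_radius_m):
    """Greenwald density limit in units of 1e20 m^-3 (Ip in MA, a in m)."""
    return ip_ma / (pi * minor_radius_m ** 2)


IP_MA = 0.120            # plasma current of the RMP experiment (120 kA)
MINOR_RADIUS_M = 0.255   # ASSUMPTION: illustrative value, not stated in the text

n_g = greenwald_density(IP_MA, MINOR_RADIUS_M)
density_without_rmp = 0.79 * n_g   # density limit without RMPs
density_with_rmp = 0.91 * n_g      # density limit with 2 kA RMPs
relative_gain = density_with_rmp / density_without_rmp - 1.0
```

Whatever minor radius is used, going from 0.79nG to 0.91nG corresponds to a relative increase of about 15% in the density reached at disruption.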

Discussion
This section discusses the potential and limitations of IDP-PGFE in three respects: interpretability, transferability and real-time feasibility.
Interpretability: In IDP-PGFE, PGFE acts as the feature extractor that extracts disruption-related features. DART, the disruption classifier, is the machine learning tool that discovers potential patterns and relations between features. SHAP, the attribution technique, acts as the explainer for understanding those patterns and relations. Because PGFE extracts physically meaningful features, IDP-PGFE is more interpretable than a model using raw diagnostic signals. This means that IDP-PGFE has the potential to provide a possible means of inferring the underlying cause of a disruption and to propose new conjectures. One example is that the application of RMPs affects not only the MHD instabilities but also the radiation profile, which delays the density limit disruption. Establishing more conjectures and physical mechanisms of disruption requires in-depth collaboration with physicists in the future.
However, IDP-PGFE has only been applied to J-TEXT, so it could only learn about disruptions on J-TEXT. In the future, IDP-PGFE could become a tool for exploring disruption event chains and disruption mechanisms by applying it to datasets from different devices. Moreover, the feature extraction techniques of PGFE are not final. Inverted plasma profiles, more physics features and other inductive biases of disruption could make PGFE more interpretable.
Transferability: IDP-PGFE has an advantage in low-data learning, so it could be trained directly on the target dataset. However, even the 10% training set still included about 20 disruptive and 120 non-disruptive discharges. Applying domain adaptation to the features would further boost the transferability of IDP-PGFE. PGFE relies on expert experience summarized from the dataset and from disruption-related physics. The experience from disruption-related physics could transfer to future devices. Moreover, the dataset used to summarize the expert experience does not have to be the training data: for ITER or other future devices, it could be summarized from low-performance scenarios or from existing devices. However, PGFE cannot handle new physics and device-specific features of future devices.
Feasibility of real-time: IDP-PGFE is based on tree models, which is an advantage for real-time disruption prediction. Meanwhile, PGFE converts the diagnostic signals into diagnostic-independent features, which means that a missing diagnostic signal can be replaced by an alternative one. Real-time tree-based disruption prediction has been tested on J-TEXT, and the calculation speed meets the requirement of the DMS. IDP-PGFE also has the potential to support disruption avoidance and prevention via real-time SHAP values. The advantage of SHAP values is that they tell us not only the contribution of each feature but also how a change in the feature affects that contribution. To avoid disruption, the PCS only needs to send different commands to auxiliary systems (such as RMPs or ECRH) for different disruption causes, as identified by the SHAP values.
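The idea of identifying a disruption cause from SHAP values can be sketched as follows. The grouping of features into cause families, the rule of summing only positive contributions, and the numerical values are all hypothetical choices made for illustration; the text does not specify how a cause would be selected or which command the PCS would issue.

```python
# Hypothetical grouping of feature names from the text into cause families.
FEATURE_GROUPS = {
    "MHD": ["Mir_fre", "Mir_abs", "MNM", "MNN"],
    "radiation": ["SXRskew", "SXRvar", "AXUVkurt", "P_rad", "CIII"],
    "density": ["ne0", "FIRvar", "FIRskew", "FIRkurt"],
}


def dominant_cause(shap_by_feature):
    """Return (group, total) for the group pushing hardest towards "disruptive".

    Only positive ("disruptive") SHAP values are summed. Returns (None, 0.0)
    when no group has a positive total.
    """
    best_group, best_total = None, 0.0
    for group, names in FEATURE_GROUPS.items():
        total = sum(max(shap_by_feature.get(name, 0.0), 0.0) for name in names)
        if total > best_total:
            best_group, best_total = group, total
    return best_group, best_total


# Made-up SHAP values for a single time slice, for illustration only.
example = {"ne0": 0.05, "AXUVkurt": 0.20, "CIII": 0.15,
           "MNM": 0.10, "Mir_fre": -0.30}
cause, total = dominant_cause(example)
```

For the made-up values above the dominant family is "radiation". A real-time system would still need a separately validated mapping from each cause to an actuator command.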

Summary
This paper introduced an interpretable disruption predictor based on physics-guided feature extraction (PGFE), called IDP-PGFE. IDP-PGFE is a general method applicable to all tokamaks. It comprises a feature extractor, a disruption classifier and an explainer, based on PGFE, DART and SHAP, respectively. PGFE extracts features with inductive biases related to MHD instabilities, radiation and density. The feature extraction technique for PGFE might differ between tokamaks, and the J-TEXT feature extractor may not work directly on other devices. However, with knowledge of machine operation and diagnostic design, it can be transferred to other machines easily.
As a first step, IDP-PGFE has been applied to J-TEXT. Benefiting from PGFE, IDP-PGFE reaches the best performance in the J-TEXT disruption prediction task, with TPR = 97.27% and FPR = 5.45% at a warning time of 10 ms. PGFE also reduces the data requirement of IDP-PGFE: with only 10% of the training dataset, IDP-PGFE performs similarly to the disruption predictor using raw signals with the full training dataset. The analysis of the feature contributions given by SHAP is not only generally consistent with existing understanding but also helps to understand some of the J-TEXT disruption processes. The interpretability application provides an analysis of, and new insight into, existing experimental phenomena. It gives a possible means of inferring the underlying cause of a disruption and how interventions affect the disruption process. The interpretation results and the experimental phenomena show a high degree of conformity. The interpretability study of IDP-PGFE also suggests a possible direction for experimental analysis: RMPs affect not only the MHD instabilities but also the radiation profile, which delays the density limit disruption.
Last but not least, we briefly discussed the potential and limitations of IDP-PGFE. As discussed in section 6, IDP-PGFE has three potential uses. First, IDP-PGFE could propose new conjectures about disruption, particularly after being applied to other large and advanced tokamaks. Second, IDP-PGFE could be transferred to other tokamaks with low-data learning and domain adaptation. Third, IDP-PGFE could enable real-time disruption avoidance by identifying the different disruption causes.
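As a quick consistency check, the reported rates can be related to the test set of 110 disruptive and 110 non-disruptive discharges. The confusion-matrix counts below are an assumption inferred from the percentages (they are not stated in the text); the sketch only shows that such counts reproduce the quoted TPR and FPR.

```python
def rates(true_positives, false_negatives, false_positives, true_negatives):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = true_positives / (true_positives + false_negatives)
    fpr = false_positives / (false_positives + true_negatives)
    return tpr, fpr


# ASSUMPTION: counts inferred from the reported percentages and the
# 110 disruptive / 110 non-disruptive test set; not stated in the text.
tpr, fpr = rates(true_positives=107, false_negatives=3,
                 false_positives=6, true_negatives=104)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")  # prints: TPR = 97.27%, FPR = 5.45%
```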

Figure 1 A typical disruptive discharge with a tearing mode in J-TEXT. a) shows the plasma current of the discharge; b) and c) show the raw signal (left) and mode number (right) of the poloidal and toroidal Mirnov signals; d) shows Mir_abs (left) and Mir_fre (right) of one Mirnov signal (the raw signal in b)).

Figure 2 A disruptive discharge showing the HOS features in J-TEXT. The navy-blue line represents the 0D signal at the middle channel of each array; the green, yellow and orange lines represent kurt, skew and var, respectively. a) shows the plasma current of the discharge; b) shows the FIR array, which provides density signals; c) shows the SXR array and d) shows the AXUV array, which provide radiation signals.

Figure 3 The HOS features of the disruptive discharge #1057831 in J-TEXT from 0.315 s to 0.355 s. The orange, yellow and green lines represent var, skew and kurt, respectively. a) shows the plasma current of the discharge; b), c) and d) show the FIR array, which provides density signals; e), f) and g) show the SXR array and h), i) and j) show the AXUV array, which provide radiation signals.

Figure 5 The accumulated percentage of disruptions predicted versus warning time. The model threshold is fixed at 0.5. The red dashed line marks an accumulated percentage of 90%. The light-blue dashed lines mark warning times of 10 ms, 30 ms and 300 ms.
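A curve of this kind can be sketched as below, assuming that each point counts the disruptive discharges whose alarm precedes the disruption by at least the given warning time. The warning times are made up, and encoding a missed discharge as `None` is a choice made for this sketch only.

```python
def accumulated_fraction(warning_times_ms, threshold_ms):
    """Fraction of disruptive discharges flagged at least threshold_ms early.

    A discharge that was never flagged is encoded as None and counts as missed.
    """
    total = len(warning_times_ms)
    hits = sum(1 for w in warning_times_ms
               if w is not None and w >= threshold_ms)
    return hits / total


# Made-up warning times (ms) for eight hypothetical disruptive discharges.
warnings_ms = [5.0, 12.0, 25.0, 40.0, 80.0, 150.0, 320.0, None]
curve = {t: accumulated_fraction(warnings_ms, t) for t in (10, 30, 300)}
```

Evaluating the function on a dense grid of thresholds gives the full curve; the three thresholds used here correspond to the warning times marked in the figure.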

Figure 6 The ROC curves of IDP-PGFE trained with different data sizes, and of DPRS-base and DPRS-light trained with the full data size. The FPR axis runs from 0% to 20%. The TPR axis runs from 20% to 100%. Five coloured lines represent five data sizes (100%: orange, 40%: yellow, 10%: deep red, 5%: green). The navy-blue and light-blue lines represent the DPRS-base and DPRS-light ROC curves.

The AUC value of IDP-PGFE with the 10% data size is the same as that of DPRS-light and slightly smaller than that of DPRS-base. The AUC value of IDP-PGFE with the 5% data size is the smallest. The low-data learning ability comes not from the disruption classifier DART but from the feature extractor PGFE. DPRS-base and DPRS-light were also trained with less data as benchmarks. The evolution of the AUC values with the training data size is shown in figure 7. The AUC values of DPRS-base and DPRS-light are similar to each other at data sizes of 80%, 60%, 40% and 20%. When the training data size is smaller than 20%, the AUC value of DPRS-light degrades faster than that of DPRS-base. As the training data size decreases, the performance of both DPRS-base and DPRS-light degrades faster than that of IDP-PGFE. The performance of IDP-PGFE and of the DPRSs starts to drop significantly below the 10% and 20% data sizes, respectively. It is worth noting that the 10% data size includes only about 20 disruptive and 120 non-disruptive discharges, which is quite a small data requirement for the disruption prediction task. As a feature extractor specific to tokamak disruption, PGFE could clearly reduce the data requirement of the disruption predictor.

Figure 7 The AUC value versus the data size of the training set for the two kinds of models. The orange, navy-blue and light-blue lines represent IDP-PGFE, DPRS-base and DPRS-light, respectively. The x-axis decreases from 100% to 5%. The AUC values of DPRS-base and DPRS-light decrease faster than that of IDP-PGFE as the data size decreases. The AUC value of IDP-PGFE drops from 0.939 to 0.845 as the data size goes from 10% to 5%, while the AUC value of DPRS-base drops from 0.839 to 0.763 and that of DPRS-light from 0.831 to 0.7 as the data size goes from 20% to 5%.
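For reference, the AUC compared across data sizes can be computed directly from model outputs as the probability that a disruptive sample scores higher than a non-disruptive one. The sketch below uses this pairwise (Mann-Whitney) form with made-up scores; whether the reported AUC is evaluated per sample or per discharge is not restated here.

```python
def roc_auc(scores_disruptive, scores_non_disruptive):
    """AUC as the chance that a disruptive sample outscores a non-disruptive one.

    Ties count as one half (Mann-Whitney U statistic over the pair count).
    """
    wins = 0.0
    for d in scores_disruptive:
        for n in scores_non_disruptive:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(scores_disruptive) * len(scores_non_disruptive))


# Made-up model outputs, for illustration only.
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; a rank-based implementation would be preferable for large test sets.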

Figure 9 The SHAP value of SXRskew with respect to the values of SXRskew and SXRcore. The x-axis represents the value of SXRskew; the colormap represents the value of SXRcore; the y-axis represents the SHAP value of SXRskew.

Figure 10 The SHAP value of IHF with respect to the values of IHF and dz. The x-axis represents the value of IHF; the colormap represents the value of dz; the y-axis represents the SHAP value of IHF.

Figure 11 Overview of the two typical discharges in the J-TEXT disruption experiment with RMPs. The blue line represents the discharge without RMPs, with a density limit of about 0.79nG. The orange line represents the discharge with 2 kA RMPs, with a density limit of about 0.91nG. a) shows the plasma current of the discharges; b) shows the line-averaged density of the middle channel; c) shows the ratio of the line-integrated density at the HFS to that at the LFS; d) shows the RMP current. The disruption occurred after the RMPs were cut off.

Figure 12 The predicted result, SHAP values and four typical features in the #1080564 discharge without RMPs. a) shows the plasma current and the predicted result; b) shows the SHAP values of all the features, with the coloured lines representing the four typical features; c) shows two typical features, the line-averaged density of the middle channel ne0 (light blue) and the impurity radiation CIII (orange); d) shows two typical features, the kurtosis of the AXUV array AXUVkurt (green) and the poloidal mode number MNM (yellow).

Figure 13 The predicted result, SHAP values and four typical features in the #1080550 discharge with RMPs. a), b), c) and d) show the same quantities as in figure 12.

Table 1
Descriptions and symbols of all the features.

The J-TEXT dataset contains 1791 discharges from the 2017-2018 and 2021-2022 campaigns, selected for the accessibility and consistency of the diagnostic channels and including discharges with ECRH and RMPs. All types of disruptions are included except intentional ones triggered by massive gas injection (MGI) or shattered pellet injection (SPI) and those from engineering tests. Both the training and validation sets are selected randomly from the 2017-2018 campaigns. The test set is selected randomly from the 2017-2018 and 2021-2022 campaigns. The split of the datasets is shown in table 2: 212 disruptive and 1199 non-disruptive discharges form the training set, 80 disruptive and 80 non-disruptive discharges form the validation set, and 110 disruptive and 110 non-disruptive discharges form the test set. The locked mode detectors are ex-vessel saddle loops in the 2017-2018 campaigns and Br HFS Mirnov probes in the 2021-2022 campaigns.

Table 2
Split of datasets of the predictor