Deep neural network Grad–Shafranov solver constrained with measured magnetic signals

Semin Joung; Jaewook Kim; Sehyun Kwak; J.G. Bak; S.G. Lee; H.S. Han; H.S. Kim; Geunho Lee; Daeho Kwon; Y.-C. Ghim

doi:10.1088/1741-4326/ab555f

1. Introduction

Magnetic equilibrium is one of the most important factors in understanding the basic behavior of plasmas in magnetically confined plasmas, and the off-line EFIT [1] code has been extensively used to reconstruct such equilibria in tokamaks. Its fundamentals are basically finding a solution to an ideal magnetohydrodynamic equilibrium with toroidal axisymmetry, known as the Grad–Shafranov (GS) equation [2]:

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \newcommand{\rp}{\right)} \newcommand{\lp}{\left(} \displaystyle \label{eq:gseq} \Delta^*\psi &\equiv \lp R\frac{\partial}{\partial R} \frac{1}{R} \frac{\partial}{\partial R} + \frac{\partial^2}{\partial Z^2} \rp \psi \nonumber \\ & = -\mu_{0}R j_\phi \nonumber \\ & = -\mu_0 R^2 \frac{{\rm d} p(\psi)}{{\rm d} \psi} - F(\psi)\frac{{\rm d} F(\psi)}{{\rm d} \psi},\nonumber \end{align} \tag{ 1 }$

where $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \psi=\psi\lp R, Z\rp$ is the poloidal flux function, $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} j_\phi=j_\phi\lp R, Z\rp$ the toroidal current density function, and $p(\psi)$ the plasma pressure. $F(\psi)$ is related to the net poloidal current. Here, R, $\phi$ and Z denote the usual cylindrical coordinate system. As $\Delta^*$ is a two-dimensional non-linear partial differential operator, the off-line EFIT [1] finds a solution with many numerical iterations and has been implemented in many tokamaks such as DIII-D [3], JET [4], NSTX [5], EAST [6] and KSTAR [7] to name a few examples.

With the aim of real-time control of tokamak plasmas, a real-time EFIT (rt-EFIT) [8] code is developed to provide a magnetic equilibrium fast enough, whose results are different from the off-line EFIT results. As pulse lengths of tokamak discharges become longer [9–15], the demand for more elaborate plasma control is ever increased. Furthermore, some of the ITER-relevant issues such as edge localized mode suppression with resonant magnetic perturbation coils [16] and the detached plasma scenarios [17, 18] require sophisticated plasma controls, meaning that the more accurate magnetic equilibria we have in real time, the better performance we can achieve.

There has been an attempt to satisfy such a requirement of acquiring a more accurate, i.e. closer to the off-line EFIT results compared to the rt-EFIT results, magnetic equilibrium in real time using graphics processing units (GPUs) [19] by parallelizing equilibrium reconstruction algorithms. The GPU-based EFIT (P-EFIT) [19] enabled the calculation of a well-converged equilibrium in much less time; however, the benchmark test showed similar results to the rt-EFIT rather than the off-line results [20].

Thus, we propose a reconstruction algorithm based on a neural network that satisfies the GS equation as well as the measured magnetic signals to obtain accurate magnetic equilibrium in real time. We note that usage of neural networks in the fusion community is increasing rapidly, and examples are radiated power estimation [21], identifying instabilities [22], estimating neutral beam effects [23], classifying confinement regimes [24], determination of scaling laws [25, 26], disruption prediction [27–29], turbulent transport modeling [30–33], plasma tomography with the bolometer system [34, 35], coil current prediction with the heat load pattern in W7-X [36], filament detection on MAST-U [37], electron temperature profile estimation via SXR with Thomson scattering [38] and equilibrium reconstruction [39–44] together with an equilibrium solver [45]. Most of the previous works on equilibrium reconstruction with neural networks have paid attention to finding the poloidal beta, the plasma elongation, positions of the X-points and plasma boundaries, i.e. last closed flux surface, and gaps between plasmas and plasma-facing components, rather than reconstructing the whole internal magnetic structures we present in this work.

The inputs to our developed neural networks consist of plasma current measured by a Rogowski coil, normal and tangential components of magnetic fields by magnetic pick-up coils, poloidal magnetic fluxes by flux loops and a position in the $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \lp R, Z\rp$ coordinate system, where R is the major radius and Z is the height as shown in figure 1. The output of the neural networks is a value of poloidal flux $\psi$ at the specified $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \lp R, Z\rp$ position. To train and validate the neural networks, we have collected a total of 1118 KSTAR discharges from two consecutive campaigns, i.e. the 2017 and 2018 campaigns. We, in fact, generate three separate neural networks which are NN₂₀₁₇, NN₂₀₁₈ and NN_{2017, 2018} where the subscripts indicate the year(s) of the KSTAR campaign(s) that the training data sets are obtained from. An additional 163 KSTAR discharges (from the same two campaigns) are collected to test the performance of the developed neural networks.

**Figure 1.** A poloidal cross-section of KSTAR with the first wall (blue dotted line). Green dotted line indicates a Rogowski coil measuring the plasma current ( $I_{\rm p}$ ). Green open circles and crosses depict locations of the magnetic pick-up coils measuring 32 normal ( $B_{\rm n}$ ) and 36 tangential ( $B_{\rm t}$ ) magnetic fields, respectively, whereas green triangles represent 22 flux loops measuring poloidal magnetic fluxes ( $\Psi_{\rm FL}$ ). Black asterisks ( $22\times 13$ spatial positions) show locations where we obtain the values of $\psi$ from the off-line EFIT results.
Download figure:
Standard image High-resolution image

**Figure 1.** A poloidal cross-section of KSTAR with the first wall (blue dotted line). Green dotted line indicates a Rogowski coil measuring the plasma current ( $I_{\rm p}$ ). Green open circles and crosses depict locations of the magnetic pick-up coils measuring 32 normal ( $B_{\rm n}$ ) and 36 tangential ( $B_{\rm t}$ ) magnetic fields, respectively, whereas green triangles represent 22 flux loops measuring poloidal magnetic fluxes ( $\Psi_{\rm FL}$ ). Black asterisks ( $22\times 13$ spatial positions) show locations where we obtain the values of $\psi$ from the off-line EFIT results.
Download figure:
Standard image High-resolution image

We train the neural networks with the KSTAR off-line EFIT results, and let them be accurate magnetic equilibria. Note that the dispute on whether the off-line EFIT results we use to train the networks are accurate or not is beyond the scope of this work. If we find more accurate EFIT results, e.g. motional Stark effect-constrained EFIT or more sophisticated equilibrium reconstruction algorithms that can cope with current-hole configurations (current reversal in the core) [46–48], then we can always re-train the networks with new sets of data as long as the networks follow the trained EFIT data with greater similarity than the rt-EFIT results do. This is because supervised neural networks are limited to follow the training data. Hence, as a part of the training sets we use the KSTAR off-line EFIT results as possible examples of accurate magnetic equilibria to corroborate our developed neural networks.

To calculate the output data a typical neural network requires the same set of input data as it has been trained with. Therefore, even a single missing input (out of the input data set) can result in a flawed output [49]. Such a case can be circumvented by training the network with possible combinations of missing inputs. As a part of the input data, we have 32 normal and 36 tangential magnetic fields measured by the magnetic pick-up coils. If we wish to cover a case with one missing item of input data, then we will need to repeat the whole training procedure with 68 (32 + 36) different cases. If we wish to cover a case with two or three missing items of input data, then we will need an additional 2278 and $50\,116$ different cases to be trained, respectively. This number becomes large rapidly, and it becomes formidable, if not impossible, to train the networks with reasonable computational resources. Since the magnetic pick-up coils are susceptible to damage, we have developed our networks to be capable of inferring a few missing signals of the magnetic pick-up coils in real time by invoking an imputation scheme [50] based on Bayesian probability [51] and Gaussian processes [52].

In addition to reconstructing accurate magnetic equilibria in real time, the expected improvements with our neural networks compared to those of previous studies are at least fourfold: (1) the network is capable of providing the whole internal magnetic topology, and is not limited to boundaries and locations of X-points and/or magnetic axes; (2) the spatial resolution of the reconstructed equilibria is arbitrarily adjustable within the first wall of KSTAR since the $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \lp R, Z\rp$ position is a part of the input data; (3) the required training time and computational resources for the networks are reduced by generating coarse grid points also owing to the $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \lp R, Z\rp$ position being an input; and (4) the networks can handle a few missing signals of the magnetic pick-up coils using the imputation method.

First, we present how the data are collected to train the neural networks and briefly discuss real-time preprocessing of the measured magnetic signals in section 2. For the readers who are interested in a thorough description of the real-time preprocessing, appendix A provides the details. Then, we explain the structure of our neural networks and how we train them in section 3. In section 4, we present the results of the developed neural network EFIT (nn-EFIT) in four aspects. First, we discuss how well the NN_{2017, 2018} network reproduces the off-line EFIT results. Then, we make comparisons among the three networks, NN₂₀₁₇, NN₂₀₁₈ and NN_{2017, 2018}, by examining in-campaign and cross-campaign performance. Once the absolute performance qualities of the networks are established, we compare relative performance qualities between nn-EFIT and rt-EFIT. Finally, we show how the imputation method supports the networks when there is missing input. Our conclusions are presented in section 5.

2. Collection and real-time preprocessing of data

Figure 1 shows locations where we obtain the input and the output data with the first wall (blue dotted line) on a poloidal cross-section of KSTAR. The green dotted line indicates a Rogowski coil measuring the plasma current ( $I_{\rm p}$ ). The green open circles and crosses show locations of the magnetic pick-up coils measuring 32 normal ( $B_{\rm n}$ ) and 36 tangential ( $B_{\rm t}$ ) components of magnetic fields, respectively, whereas the green triangles show 22 flux loops measuring the poloidal magnetic fluxes ( $\Psi_{\rm FL}$ ). These magnetic signals are selectively chosen out of all the magnetic sensors in KSTAR [53] whose performance has been demonstrated for many years, i.e. those which are less susceptible to damage.

Although KSTAR calibrates the magnetic sensors (magnetic pick-up coils and flux loops) regularly during a campaign to remove drifts in the magnetic signals, it does not guarantee to fully eliminate such drifts. Thus, we preprocess the signals to adjust the drifts. Figure 2 shows examples of before (blue) and after (red) the drift adjustment for (a) normal and (b) tangential components of magnetic fields measured by the magnetic pick-up coils and (c) poloidal magnetic flux measured by one of the flux loops. Here, a KSTAR discharge is sustained until about 20 s, and all the external magnetic coils (except the toroidal field coils) are turned off at about 30 s. Therefore, we expect all the magnetic signals to return to zeros at around 30 s. If not, we envisage that there has been residual drift. This means that we need to be able to preprocess the magnetic signals in real time so that the input signal characteristics for predictions are similar to the trained ones. Appendix A describes in detail how we preprocess the magnetic signals in real time.

**Figure 2.** Before (blue) and after (red) the magnetic signal adjustments for (a) normal and (b) tangential components of magnetic fields measured by the magnetic pick-up coils, and (c) poloidal magnetic flux measured by one of the flux loops. The signals return closer to zeros after the adjustment when all the external magnetic coils (except the toroidal field coils) are turned off at around 30 s in this KSTAR discharge. See appendix A for detailed description.
Download figure:
Standard image High-resolution image

The black asterisks in figure 1 show the $22\times 13$ grid points where we obtain the values of $\psi$ from the off-line EFIT results as outputs of the networks. We note that the original off-line EFIT provides the values of $\psi$ with $65\times 65$ grid points. The $22\times 13$ grid points are selected such that the distances between the neighboring channels in the R and Z directions are as similar as possible while covering the whole region within the first wall. By generating such coarse grid points we can decrease the number of samples to train the network, thus consuming less computational resources. Nevertheless, we do not lose the spatial resolution since the $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \lp R, Z\rp$ position is an input, i.e. the network can obtain the value of $\psi$ at any position within the first wall (see section 4).

With an additional input for the spatial positions R and Z, each data sample contains 93 inputs (and yet another input for bias) and one output which is a value of $\psi$ at the specified $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \lp R, Z\rp$ location. We randomly collect a total of 1118 KSTAR discharges from the 2017 and 2018 campaigns. Since each discharge can be further broken into many time slices, i.e. every 50 ms following the temporal resolution of the off-line EFIT, we obtain $217\,820$ time slices. With a total value of 286 of $\psi$ from $22\times 13$ spatial points, we have a total of $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} 62\,296\,520\lp =217\,820\times286\rp$ samples to train and validate the networks. 90% of the samples are used to train the networks, while the other 10% are used to validate the networks to avoid overfitting problems. Note that an overfitting problem can occur if a network is overly well trained to the training data, learning very detailed features of them such as noise features. This inhibits generalization of the trained network to predict unseen data, and such a problem can be minimized with the validation data set. All the inputs except R and Z are normalized such that the maximum and minimum values within the whole samples become 1 and −1, respectively. We use the actual values of R and Z in the unit of meters.

Table 1 summarizes the training and validation samples discussed in this section. Additionally, we also have randomly collected another 163 KSTAR discharges in the same way discussed here which are different from the 1118 KSTAR discharges to test the performance of the networks.

Table 1. Summary of the data samples to train and validate the networks.

Parameter	Definition	Data size	No. of samples
$I_{\rm p}$	Plasma current	1
	(Rogowski coil)

$B_{\rm n}$	Normal magnetic field	32
	(Magnetic pick-up coils)
			217 820
$B_{\rm t}$	Tangential magnetic field	36	(time slices)
	(magnetic pick-up coils)
$\Psi_{\rm FL}$	Poloidal magnetic flux	22
	(flux loops)

R	Position in major radius	1	286
			( $22\times13$ grids)
Z	Position in height	1

Network input size		93 (+1 for bias)
Total No. of samples			62 296 520

3. Neural network model and training

3.1. Neural network model

We develop neural networks that not only output a value of $\psi$ but also satisfy equation (1), the GS equation. With a total of 94 input nodes (91 for the plasma current and magnetic signals, two for the R and Z position, and one for the bias) and one output node for a value of $\psi$ , each network has three fully connected hidden layers with an additional bias node at each hidden layer. Each layer contains 61 nodes including the bias node. The structure of our networks is selected by examining several different structures by trial and error.

Denoting the value of $\psi$ calculated by the networks as $\psi^{\rm NN}$ , we have

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle \label{eq:nn_structure} \nonumber &\psi^{\rm NN} = s_{0} + \sum_{l=1}^{60} s_{l} \nonumber \\ &\times f \left(u_{l0} + \sum_{k=1}^{60} u_{lk} f \left(v_{k0} + \sum_{j=1}^{60} v_{kj} f \left(w_{j0} + \sum_{i=1}^{93} w_{ji} x_{i} \right) \right) \right), \nonumber \end{align} \tag{ 2 }$

where x_i is the ith input value with $i=1,\dots,93$ , i.e. 91 measured values with various magnetic diagnostics and two for the R and Z positions. w_ji is an element in a $61\times 94$ matrix, whereas $v_{kj}$ and u_lk are elements in $61\times 61$ matrices. s_l connects the $l^{\rm th}$ node of the third (last) hidden layer to the output node. $w, v, u$ and s are the weighting factors that need to be trained to achieve our goal of obtaining accurate $\psi$ . $w_{j0}, v_{k0}, u_{l0}$ and s₀ are the weighting factors connecting the biases, where values of all the biases are fixed to unity. We use a hyperbolic tangent function as the activation function f giving the network non-linearity [54]:

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle \label{eq:NN-actfcn} f \left(t \right) = \tanh(t) = \frac{2}{1 + {\rm e}^{-2t}} - 1. \nonumber \end{align} \tag{ 3 }$

The weighting factors are initialized as described in [55] so that a good training can be achieved. They are randomly selected from a normal distribution whose mean is zero with the variance set to be an inverse of the total number of connecting nodes. For instance, our weighting factor w connects the input layer (94 nodes with bias) and the first hidden layer (61 nodes with bias), therefore the variance is set to be $1/(94+61)$ . Likewise, the variances for $v$ , u and s are $1/(61+61)$ , $1/(61+61)$ and $1/(61+1)$ , respectively.

3.2. Training

With the aforementioned network structure, training (or optimizing) the weighting factors to predict the correct value of $\psi$ highly depends on the choice of cost function. A typical choice of such cost function would be:

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle \label{eq:costfcn} \epsilon = \frac{1}{N} \sum_{i=1}^{N} \left(\psi_{i}^{\rm NN} - \psi_{i}^{\rm Target} \right)^{2}, \nonumber \end{align} \tag{ 4 }$

where $\psi^{\rm Target}$ is the target value, i.e. the value of $\psi$ from the off-line EFIT results in our case, and N the number of data sets.

As will be shown shortly, minimizing the cost function $\newcommand{\e}{{\rm e}} \epsilon$ does not guarantee to satisfy the GS equation (equation (1)) even if $\psi^{\rm NN}$ and $\psi^{\rm Target}$ match well, i.e. the network is well trained with the given optimization rule. Since $\Delta^*\psi$ provides information on the toroidal current density directly, it is important that $\Delta^*\psi^{\rm NN}$ matches $\Delta^*\psi^{\rm Target}$ as well. We have an analytic form representing $\psi^{\rm NN}$ as in equation (2); therefore, we can analytically differentiate $\psi^{\rm NN}$ with respect to R and Z, meaning that we can calculate $\Delta^*\psi^{\rm NN}$ during the training stage. Thus, we introduce another cost function:

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle \label{eq:costfcn_plusdiff} \nonumber \epsilon^{\rm new}& = \frac{1}{N} \sum_{i=1}^{N} \left(\psi_{i}^{\rm NN} - \psi_{i}^{\rm Target} \right)^{2} \nonumber \\ &\quad+ \frac{1}{N} \sum_{i=1}^{N} \left(\Delta^*\psi_{i}^{\rm NN} - \Delta^*\psi_{i}^{\rm Target} \right)^{2}, \nonumber \end{align} \tag{ 5 }$

where we obtain the value of $\Delta^*\psi^{\rm Target}$ from the off-line EFIT results as well.

To acknowledge the difference between the two cost functions $\newcommand{\e}{{\rm e}} \epsilon$ and $\newcommand{\e}{{\rm e}} \epsilon^{\rm new}$ , we first discuss the results. Figure 3 shows the outputs of the two trained networks with the cost function (a) $\newcommand{\e}{{\rm e}} \epsilon$ and (b) $\newcommand{\e}{{\rm e}} \epsilon^{\rm new}$ . It is evident that in both cases the network output $\psi^{\rm NN}$ (red dashed line) reproduces the off-line EFIT $\psi^{\rm Target}$ (black line). However, only the network trained with the cost function $\newcommand{\e}{{\rm e}} \epsilon^{\rm new}$ reproduces the off-line EFIT $\Delta^*\psi^{\rm Target}$ . Both networks are trained well, but the network with the cost function $\newcommand{\e}{{\rm e}} \epsilon$ does not achieve our goal of correctly predicting $\psi^{\rm Target}$ and $\Delta^*\psi^{\rm Target}$ .

**Figure 3.** An example of the two networks' results trained with the cost function (a) $\newcommand{\e}{{\rm e}} \epsilon$ and (b) $\newcommand{\e}{{\rm e}} \epsilon^{\rm new}$ for KSTAR shot# 17939 at 0.950 s. Both networks (red dashed line) reproduce the $\psi^{\rm Target}$ (black line) well (left panels), but only the network trained with $\newcommand{\e}{{\rm e}} \epsilon^{\rm new}$ reproduces $\Delta^*\psi^{\rm Target}$ (right panels).
Download figure:
Standard image High-resolution image

Since our goal is to develop a neural network that solves the GS equation, we choose the cost function to be $\newcommand{\e}{{\rm e}} \epsilon^{\rm new}$ to train the networks. We determine the weighting factors found by minimizing the cost function $\newcommand{\e}{{\rm e}} \epsilon^{\rm new}$ with Adam optimizer [56] which is one of the gradient-based optimization algorithms. With 90% and 10% of the total data samples for training and validation of the networks, respectively, we stop training the networks with a fixed number of iterations that is large enough but not too large such that the validation errors do not increase, i.e. to avoid overfitting problems. The whole workflow is carried out with Python and Tensorflow [57].

With the selected cost function we create three different networks that differ only by the training data sets. NN₂₀₁₇, NN₂₀₁₈ and NN_{2017, 2018} refer to the three networks trained with the data sets from only 2017 (744 discharges), from only 2018 (374 discharges) and from both 2017 and 2018 (744 + 374 discharges) campaigns, respectively.

The descending feature of the cost function $\newcommand{\e}{{\rm e}} \epsilon^{\rm new}$ as a function of the training iteration for the NN_2017,2018 network is shown in figure 4. Both the training errors (blue line) and validation errors (red dashed line) decrease together with similar values which means that the network is well generalized. Furthermore, since the validation errors do not increase, the network does not have an overfitting problem. Note that fluctuations in the errors, i.e. standard deviation of the errors, are represented as shaded areas.

Small undulations repeated over the iterations in figure 4 are due to mini-batch learning. Contrary to batch learning, i.e. optimizing the network with the entire training set in one iteration, mini-batch learning divides the training set into a number of small subsets (1000 subsets in our case) to optimize the networks sequentially. One cycle that goes through all the subsets once is called an epoch. Mini-batch learning helps to escape from local minima in the weighting factor space [58] via the stochastic gradient descent scheme [59].

4. Performance of the developed neural networks: benchmark tests

In this section, we present how well the developed networks perform. The main figures of merit we use are the peak signal-to-noise ratio (PSNR) and mean structural similarity (MSSIM) as have been used previously [34] in addition to the usual statistical quantity R², the coefficient of determination. We note that obtaining full-flux surface information $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \psi\lp R, Z\rp$ on $22\times 13$ or $65\times 65$ spatial grids with our networks takes less than 1 msec on a typical personal computer.

First, we discuss the benchmark results of the NN_2017,2018 network. Then, we compare the performance of the NN₂₀₁₇, NN₂₀₁₈ and NN_2017,2018 networks. Here, we also investigate cross-year performance, for instance, applying the NN₂₀₁₇ network to predict the discharges obtained from the 2018 campaign and vice versa. Then, we evaluate the performance of the networks against the rt-EFIT results to examine the possibility of supplementing or even replacing the rt-EFIT with the networks. Finally, we show how the imputation scheme supports the networks' performance. Here, all the tests are performed with the unseen (to all three networks, i.e. NN₂₀₁₇, NN₂₀₁₈ and NN_2017,2018) KSTAR discharges which are 88 and 75 KSTAR discharges from the 2017 and 2018 campaigns, respectively.

4.1. Benchmark results of the NN_2017,2018 network

Figure 5 show the benchmark results of the NN_2017,2018 network, i.e. network trained with the data sets from both 2017 and 2018 campaigns. (a) and (b) show the results with the test discharges from the 2017 campaign, while (c) and (d) present the results with the test discharges from the 2018 campaign. Histograms of (a)(c) $\psi^{\rm NN}$ versus $\psi^{\rm Target}$ and (b)(d) $\Delta^*\psi^{\rm NN}$ versus $\Delta^*\psi^{\rm Target}$ are shown with colors representing the number of counts. For instance, there is a yellow colored point in figure 5(a) around $(-0.1, -0.1)\pm\varepsilon$ , where $\varepsilon$ is a bin size for the histogram. Since yellow represents about $2\times 10^5$ counts, there are approximately $2\times 10^5$ data whose neural network values and EFIT values are $-0.1\pm\varepsilon$ simultaneously within our test data set. Note that each KSTAR discharge contains numerous time slices whose number depends on the actual pulse length of a discharge, and each time slice generates a total of $22\times 13=286$ data points. The values of $\psi^{\rm Target}$ and $\Delta^*\psi^{\rm Target}$ are obtained from the off-line EFIT results. It is clear that the network predicts the target values well.

**Figure 5.** Performance tests of the NN_2017,2018 network on the unseen KSTAR discharges from (a)(b) 2017 campaign and (c)(d) 2018 campaign. The values of R² and histograms of (a)(c) $\psi^{\rm NN}$ versus $\psi^{\rm Target}$ and (b)(d) $\Delta^*\psi^{\rm NN}$ versus $\Delta^*\psi^{\rm Target}$ with colors representing number of counts demonstrate the strength of the NN_2017,2018 network. Red dashed line is the y = x line.
Download figure:
Standard image High-resolution image

As a figure of merit, we introduce the R² metric (coefficient of determination) defined as

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle {\rm R}^2 = 1 - \frac{\sum\nolimits_{i=1}^{L} \left(y_i^{\rm Target} - y_i^{\rm NN} \right)^2}{\sum\nolimits_{i=1}^{L} \left(y_i^{\rm Target} - \frac{1}{L}\sum\nolimits_{j=1}^{L} y_{j}^{\rm Target} \right)^2}, \nonumber \end{align} \tag{ 6 }$

where y takes either $\psi$ or $\Delta^*\psi$ , and L is the number of test data sets. The calculated values are written in figure 5, and they are indeed close to unity, implying the existence of very strong linear correlations between the predicted (from the network) and target (from the off-line EFIT) values. Note that R² = 1 means the perfect prediction. The red dashed lines in the figures are the y = x lines.

Figure 6 is an example of reconstructed magnetic equilibria using KSTAR shot #18057 from the 2017 campaign. (a) shows the evolution of the plasma current. The vertical dashed lines indicate the time points where we show and compare the equilibria obtained from the network (red) and the off-line EFIT (black) which is our target. (b) and (c) are taken during the ramp-up phase, (d) and (e) during the flat-top phase, and (f ) and (g) during the ramp-down phase. In each sub-figure from (b) to (g), the left panels compare $\psi$ , and the right panels are for $\Delta^*\psi$ . We mention that the equilibria in figure 6 are reconstructed with $65 \times 65$ grid points even though the network is trained with $22 \times 13$ grid points, demonstrating how the spatial resolution is flexible in our networks.

For a quantitative assessment of the network, we use an image-relevant figure of merit that is the PSNR [60] (see appendix B), originally developed to estimate the degree of artifacts due to image compression compared to an original image. The typical PSNR range for a JPEG image, which preserves the original quality with a reasonable degree, is generally 30–50 dB [34, 61]. For our case, the network errors relative to the off-line EFIT results can be treated as artifacts. As listed in figures 6(b)–(g), the PSNR for $\psi$ is very good, while we achieve acceptable values for $\Delta^*\psi$ .

4.2. The NN₂₀₁₇, NN₂₀₁₈ and NN_2017,2018 networks

Similar to shown in figure 5, we show the benchmark results of NN₂₀₁₇ and NN₂₀₁₈ in figures 7 and 8, respectively. The R² metric is also provided in the figures. Again, the overall performance of the networks is good.

**Figure 7.** Same as figure 5 for the NN₂₀₁₇ network, i.e. trained with the data sets from 2017 campaign.
Download figure:
Standard image High-resolution image

**Figure 8.** Same as figure 5 for the NN₂₀₁₈ network, i.e. trained with the data sets from 2018 campaign.
Download figure:
Standard image High-resolution image

The NN₂₀₁₇ and NN₂₀₁₈ networks are trained with only in-campaign data sets, e.g. NN₂₀₁₈ with data sets from only the 2018 campaign, and we find slightly poorer results, but still good, in predicting cross-campaign magnetic equilibria, e.g. NN₂₀₁₈ predicting equilibria for the 2017 campaign. Notice that NN₂₀₁₇ seems to predict cross-campaign equilibria better than the in-campaign ones by comparing figures 7(a) and (c), which contradicts our intuition. Although the histogram in figure 7(c) seems tightly aligned with the y = x line (red dashed line), close inspection reveals that the NN₂₀₁₇ network, in general, underestimates the off-line EFIT results from the 2018 campaign marginally. This will be evident when we compare image quality.

MSSIM [62] (see appendix B) is another image relevant figure of merit used to estimate perceptual similarity (or perceived differences) between the true and reproduced images based on the inter-dependence of adjacent spatial pixels in the images. The MSSIM ranges from zero to one, where the closer to unity the better the reproduced image is.

Together with the PSNR, figure 9 shows the MSSIM for (a) NN₂₀₁₇, (b) NN₂₀₁₈ and (c) NN_2017,2018 where the off-line EFIT results are used as a reference. Notice that counts in all the histograms of MSSIM and PSNR in this work correspond to the number of reconstructed magnetic equilibria (or a number of time slices) since we obtain a single value of MSSIM and PSNR from one equilibrium, whereas counts in figures 5, 7 and 8 are much bigger since $286(=22\times 13)$ data points are generated from each time slice. The red (green) line indicates the test results for the data sets from the 2017 (2018) campaign. In general, whether the test data sets are in-campaign or cross-campaign, the image reproducibility of all three networks, i.e. predicting the off-line EFIT results, is good as attested by the fact that the MSSIM is quite close to unity and the PSNR for $\psi$ ( $\Delta^*\psi$ ) ranges approximately from 40 to 60 (20 to 40). It is easily discernible that the in-campaign results are better for both NN₂₀₁₇ and NN₂₀₁₈ unlike what we noted in figures 7(a) and (c). Not necessarily guaranteed, we find that the NN_2017,2018 network works equally well for both campaigns as shown in figure 9(c).

**Figure 9.** Histograms of MSSIM (left panel) and PSNR (right panel) for (a) NN₂₀₁₇, (b) NN₂₀₁₈ and (c) NN_2017,2018. Red (green) line indicates the test results for the data sets from 2017 (2018) campaign. In each sub-figure, top (bottom) panel shows the results for $\psi$ ( $\Delta^*\psi$ ). The off-line EFIT results are used as reference.
Download figure:
Standard image High-resolution image

4.3. Comparisons among nn-EFIT, rt-EFIT and off-line EFIT

It is widely recognized that rt-EFIT results and off-line results are different from each other. If we allow the off-line EFIT results used to train the networks to be accurate ones, then the reconstruction of equilibria with the neural networks (nn-EFIT) must satisfy the following criterion: nn-EFIT results must be more similar to the off-line EFIT results than the rt-EFIT results are to the off-line EFIT, as mentioned in section 1. Once this criterion is satisfied, then we can always improve the nn-EFIT as genuinely more accurate EFIT results are collected. For this reason, we make comparisons among the nn-EFIT, rt-EFIT and off-line EFIT results.

Figure 10 shows an example of the reconstructed magnetic equilibria for (a) rt-EFIT versus off-line EFIT and (b) nn-EFIT (the NN_2017,2018 network) versus the off-line EFIT for KSTAR shot #17975 at 0.7 s with $\psi$ (left panel) and $\Delta^*\psi$ (right panel). The green, red and black lines indicate rt-EFIT, nn-EFIT and off-line EFIT results, respectively. This simple example shows that the nn-EFIT is more similar to the off-line EFIT than the rt-EFIT is to the off-line EFIT, satisfying the aforementioned criterion.

$ \newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \psi\lp R, Z\rp$ — **Figure 10.** An example of reconstructed $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \psi\lp R, Z\rp$ (left panel) and $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \Delta^*\psi\lp R, Z\rp$ (right panel) for KSTAR shot #17975 at 0.7 s comparing (a) rt-EFIT (green) and off-line EFIT (black) and (b) nn-EFIT (NN_2017,2018) (red) and off-line EFIT (black).
Download figure:
Standard image High-resolution image

To validate the criterion statistically, we generate histograms of the MSSIM and PSNR for the nn-EFIT and the rt-EFIT with reference to the off-line EFIT. This is shown in figure 11 as histograms, where the MSSIM (left panel) and PSNR (right panel) of $\psi$ (top) and $\Delta^*\psi$ (bottom) are compared between the nn-EFIT (black) and the rt-EFIT (green). Here, the nn-EFIT results are obtained with the NN_2017,2018 network on the test data sets. We confirm that the criterion is satisfied with the NN_2017,2018 network as the histograms in figure 11 are in favor of the nn-EFIT, i.e. larger MSSIM and PSNR are obtained by the nn-EFIT. This is more conspicuous for $\Delta^*\psi$ than $\psi$ .

**Figure 11.** Histograms of MSSIM (left panel) and PSNR (right panel) of $\psi$ (top) and $\Delta^*\psi$ (bottom) calculated by the nn-EFIT (black) and the rt-EFIT (green), where the nn-EFIT is NN_2017,2018. For both nn-EFIT and rt-EFIT, the off-line EFIT is treated as a reference.
Download figure:
Standard image High-resolution image

**Figure 11.** Histograms of MSSIM (left panel) and PSNR (right panel) of $\psi$ (top) and $\Delta^*\psi$ (bottom) calculated by the nn-EFIT (black) and the rt-EFIT (green), where the nn-EFIT is NN_2017,2018. For both nn-EFIT and rt-EFIT, the off-line EFIT is treated as a reference.
Download figure:
Standard image High-resolution image

We perform similar statistical analyses for the other two networks, NN₂₀₁₇ and NN₂₀₁₈, which are shown in figures 12 and 13. Since these two networks are trained with the data sets from only one campaign, we show the results where the test data sets are prepared from the (a) 2017 campaign and (b) 2018 campaign so that in-campaign and cross-campaign effects can be assessed separately. We find that the criterion is fulfilled for both $\psi$ and $\Delta^*\psi$ with both the in- or cross-campaign data.

**Figure 12.** Same as figure 11 with the NN₂₀₁₇ as the nn-EFIT where the test data sets are obtained from (a) 2017 campaign and (b) 2018 campaign.
Download figure:
Standard image High-resolution image

**Figure 13.** Same as figure 11 with the NN₂₀₁₈ as the nn-EFIT where the test data sets are obtained from (a) 2017 campaign and (b) 2018 campaign.
Download figure:
Standard image High-resolution image

4.4. The NN_2017,2018 network with the imputation scheme

If one or more magnetic pick-up coils forming part of the input to the nn-EFIT are impaired, then we will have to re-train the network without the damaged ones, or hope that the network will reconstruct equilibria correctly by padding a fixed value, e.g. zero-padding, to the broken ones. Of course, one can anticipate training the network by considering possible combinations of impaired magnetic pick-up coils. With a total number of 68 signals from the magnetic pick-up coils being input to the network in our case, we immediately find that the number of possible combinations increases too quickly to consider it as a solution.

Since inferring the missing values is better than the null replacement [49], we resolve the issue by using the recently proposed imputation method [50] based on Gaussian processes [52] and Bayesian inference [51], where the likelihood is constructed based on Maxwell's equations. The imputation method infers the missing values fast enough, i.e. less than 1 msec to infer up to nine missing values on a typical personal computer; thus, we can apply the method during plasma discharge by replacing the missing values with the real-time inferred values.

We have applied the imputation method to KSTAR shot #20341 at 2.1 s for the normal ( $B_{\rm n}$ ) and tangential ( $B_{\rm t}$ ) components of the magnetic pick-up coils as an example. We have randomly chosen nine signals from the 32 $B_{\rm n}$ measurements and another nine from the 36 $B_{\rm t}$ measurements and pretended that all of them (9 + 9) are missing simultaneously. Figure 14 shows the measured (blue open circles) and the inferred (red crosses with their uncertainties) values for (a) $B_{\rm n}$ and (b) $B_{\rm t}$ . Probe # on the horizontal axis is used as an identification index of the magnetic pick-up coils. Table 2 provides the actual values of the measured and inferred ones for better comparisons. We find that the imputation method infers the correct (measured) values very well except probe #37 of $B_{\rm n}$ . The inferred (missing) probes are probes #3, 4, 6, 14, 18, 24, 30, 35, 37 for $B_{\rm n}$ and probes #4, 6, 8, 11, 17, 29, 30, 32, 35 for $B_{\rm t}$ . Here, we provide all the probe #'s used for the neural network: $B_{\rm n}$ probe #[2, ..., 6, 8, 9, 11, ..., 15, 17, ..., 20, 23, ..., 26, 28, ..., 32, 34, 35, 37, ..., 41] (a total of 32); and $B_{\rm t}$ probe #[2, ..., 6, 8, 9, 11, ..., 32, 34, 35, 37, ..., 41] (a total of 36).

Table 2. The imputation results shown in figure 14 with KSTAR shot #20341 at 2.1 s.

B_n (T) $\times 10^{-2}$			B_t (T) $\times 10^{-2}$
No.	Measured	Inferred	No.	Measured	Inferred
3	−1.45	−1.88 $\pm$ 0.22	4	−14.69	−13.97 $\pm$ 0.47
4	−1.72	−2.31 $\pm$ 0.24	6	−12.38	−11.42 $\pm$ 0.97
6	4.62	4.45 $\pm$ 0.65	8	−7.82	−7.88 $\pm$ 0.67
14	6.13	6.36 $\pm$ 0.27	11	−3.15	−3.22 $\pm$ 0.65
18	−8.27	−8.11 $\pm$ 0.48	17	0.10	0.30 $\pm$ 0.52
24	1.86	1.65 $\pm$ 0.30	29	3.84	2.65 $\pm$ 0.64
30	−7.52	−7.19 $\pm$ 0.18	30	1.15	0.49 $\pm$ 0.61
35	−7.93	−7.08 $\pm$ 0.65	32	−2.65	−2.11 $\pm$ 0.62
37	−4.27	−1.41 $\pm$ 0.93	35	−8.07	−8.87 $\pm$ 0.55

Comparisons between the nn-EFIT without any missing values, which we treat as reference values, and the nn-EFIT with the imputation method or with the zero-padding method are made. Here, the nn-EFIT results are obtained using the NN_2017,2018 network. The top panel of figure 15 shows $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \psi\lp R, Z\rp$ obtained from the nn-EFIT without any missing values (black line) and from the nn-EFIT with the two missing values replaced with the inferred values (green line), i.e. the imputation method, or with zeros (pink dashed line), i.e. the zero-padding method for (a) $B_{\rm n}$ (left panel) and (b) $B_{\rm t}$ (right panel) at 2.1 s of KSTAR shot #20341. Probes #14 and 30 for $B_{\rm n}$ and probes #4 and 8 for $B_{\rm t}$ are treated as the missing ones. The bottom panels compare histograms of MSSIM and PSNR using the imputation method (green) and the zero-padding method (pink) for all the equilibria obtained from KSTAR shot #20341.

**Figure 15.** Top panel: nn-EFIT (NN_2017,2018 network) reconstructed equilibria without any missing values (black line), and with two missing values replaced with the inferred values using the imputation method (green line) or with the zeros using the zero-padding method (pink dashed line), where the missing values are (a) $B_{\rm n}$ probe #14 and 30 (left panel) and (b) $B_{\rm t}$ probe #4 and 8 (right panel). Bottom panels: histograms of MSSIM and PSNR using the imputation method (green) and the zero-padding method (pink) for all the equilibria obtained from KSTAR shot #20341, where the reference values are those obtained using nn-EFIT without any missing values. Note that there are many more counts less than 0.9 for MSSIM with the zero-padding method.
Download figure:
Standard image High-resolution image

It is clear that the nn-EFIT with the imputation method (green line) is not only much better than that with the zero-padding method (pink dashed line) but it also reconstructs the equilibrium close to the reference (black). In fact, the zero-padding method is too far off from the reference (black line) to be relied on for plasma controls.

Motivated by such a successful result of the nn-EFIT with the imputation method on the two missing values, we have increased the number of missing values as shown in figures 16 and 17 for the same KSTAR discharge, i.e. KSTAR shot #20341. Let us first discuss figure 16 which shows (a) the eight (without probe #6) and (b) nine (all) missing values of $B_{\rm t}$ . The color codes are the same as in figure 15, i.e. the reference is black, and the nn-EFIT with the imputation method is green or with the zero-padding method is pink. It is evident that the nn-EFIT with the imputation method performs well at least up to nine missing values. Such a result is, in fact, expected since the imputation method has inferred the missing values well as shown in figure 14(b) in addition to the fact that a well-trained neural network typically has a reasonable degree of resistance on noise. Again, the nn-EFIT with the zero-padding method is not reliable.

**Figure 16.** Same color code as in figure 15. Missing values are (a) eight $B_{\rm t}$ (without probe #6), and (b) all nine $B_{\rm t}$ .
Download figure:
Standard image High-resolution image

**Figure 16.** Same color code as in figure 15. Missing values are (a) eight $B_{\rm t}$ (without probe #6), and (b) all nine $B_{\rm t}$ .
Download figure:
Standard image High-resolution image

**Figure 17.** Same color code as in figure 15. Missing values are (a) eight $B_{\rm n}$ (without probe # $37)$ , (b) all nine $B_{\rm n}$ .
Download figure:
Standard image High-resolution image

Figures 17(a) and (b) are the results with the eight (without probe #37) and nine (all) missing values of $B_{\rm n}$ , respectively. The color codes are the same as in figure 15. We find that the nn-EFIT with the eight missing values reconstructs the equilibrium similar to the reference one, while the reconstruction quality becomes notably worse for the nine missing values. This is caused mostly due to poor inference of probe #37 by the imputation method (see figure 14(a)). Nevertheless, the result is still better than that with the zero-padding method. Figure 18 shows the reconstruction results with the same color codes as in figure 15 when we have (a) 4 + 4 and (b) 9 + 9 combinations of $B_{\rm n}$ and $B_{\rm t}$ missing values simultaneously.

**Figure 18.** Same color code as in figure 15. Combinations of missing $B_{\rm n}$ and $B_{\rm t}$ are examined: (a) four missing $B_{\rm n}$ and four missing $B_{\rm t}$ case, and (b) nine missing $B_{\rm n}$ and nine missing $B_{\rm t}$ case.
Download figure:
Standard image High-resolution image

All these results suggest that the nn-EFIT with the imputation method reconstructs equilibria reasonably well except when the imputation infers the true value poorly, e.g. $B_{\rm n}$ probe #37 in figure 14(a) and table 2. In fact, the suggested imputation method [50] infers the missing values based on the neighboring intact values (using Gaussian processes) while satisfying Maxwell's equations (using Bayesian probability theory). Consequently, such a method becomes less accurate if (1) the neighboring channels are also missing AND (2) the true values change fast from the neighboring values. In fact, $B_{\rm n}$ probe #37 happens to satisfy these two conditions, i.e. probe #35 is also missing, and the true values of probe #35, #37 and #38 are changing fast as one can discern from figure 14(a).

5. Conclusions

We have developed and presented a neural network-based GS solver constrained with measured magnetic signals. The networks take the plasma current from a Rogowski coil, 32 normal and 36 tangential components of the magnetic fields from the magnetic pick-up coils, 22 poloidal fluxes from the flux loops, and $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \lp R, Z\rp$ position of interest as inputs. With three fully connected hidden layers consisting of 61 nodes in each layer, the network outputs a value of poloidal flux $\psi$ . We set the cost function used to train the networks to be a function of not only the poloidal flux $\psi$ but also the GS equation $\Delta^*\psi$ itself. The networks are trained and validated with 1118 KSTAR discharges from the 2017 and 2018 campaigns.

Treating the off-line EFIT results as accurate magnetic equilibria to train the networks, our networks fully reconstruct magnetic equilibria, not limited to obtaining selected information such as positions of the magnetic axis, X-points or plasma boundaries, more similar to the off-line EFIT results than the rt-EFIT is to the off-line EFIT. Owing to the fact that the $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \lp R, Z\rp$ position is a part of the input, our networks have adjustable spatial resolution within the first wall. The imputation method supports the networks to obtain the nn-EFIT results even if there exist a few missing inputs.

As the necessary computation time is only approximately 1 ms, the networks have potential to be used for real-time plasma control. In addition, the networks can be used to provide a large number of automated EFIT results fast for many other data analyses requiring magnetic equilibria.

Acknowledgment

This research is supported by National R&D Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (Grant Nos. NRF-2017M1A7A1A01015892 and NRF-2017R1C1B2006248) and the KAI-NEET, KAIST, Korea.

Appendix A. Real-time preprocess on magnetic signals

As shown in figure 2 and discussed in section 2, normal ( $B_{\rm n}$ ) and tangential ( $B_{\rm t}$ ) components of magnetic fields measured by the magnetic pick-up coils and poloidal magnetic fluxes ( $\Psi_{\rm FL}$ ) measured by the flux loops tend to have residual drifts after calibrating the magnetic diagnostics. We train the neural networks with preprocessed, i.e. drift-adjusted, magnetic signals. Therefore, we must be able to preprocess the signals in real time as well. Here, we introduce how we preprocess the magnetic signals in detail. The same preprocess is applied to all the training, validation and test data sets. Note that we do not claim that how we adjust the magnetic signals corrects the drifts completely.

A.1. Real-time drift adjustment with information obtained during the initial magnetization stage

To adjust the signal drifts, we deem a priori that the signals drift linearly in time [63–65]. Of course, non-linear drift may well exist in the signals. However, we need to come up with a very simple and fast solution to adjust the drifts in real time with the limited amount of information. One can consider such linearization in time as taking up to the first order of Taylor-expanded drifting signals. Therefore, we take the drifting components of the signals ( $y_i^m$ ) from various types (the magnetic pick-up coils or the flux loops) of magnetic diagnostics to follow:

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle \label{eq:lineardrift} y_i^m = a_i^m t + b_i^m, \nonumber \end{align} \tag{ A.1 }$

where t is the time. $a_i^m$ and $b_i^m$ are the slope and the offset, respectively, of a drift signal for the ith magnetic sensor of type m (magnetic pick-up coils or flux loops). Then, our goal simply becomes finding $a_i^m$ and $b_i^m$ for all i and m of interest before a plasma starts or the blip time (t = 0) so that $y_i^m$ can be subtracted from the measured magnetic signals in real time, i.e. preprocessing the magnetic signals for the neural networks.

We use two different time intervals during the initial magnetization stage, i.e. before the blip time, for every plasma discharge to find $a_i^m$ and $b_i^m$ , sequentially. Figure A1 shows an example of temporal evolutions of currents in the poloidal field (PF) coils, $B_{\rm n}$ and $B_{\rm t}$ and poloidal magnetic flux up to the blip time (t = 0) of a typical KSTAR discharge.

**Figure A1.** An example of temporal evolutions of (a) currents in the PF coils, (b) normal and (c) tangential components of magnetic fields measured by the magnetic pick-up coils, respectively, and (d) poloidal flux measured by one of the flux loops during the initial magnetization stage, i.e. t < 0, for a typical KSTAR discharge. Information from the time interval d1 (d2) is used to estimate $a_i^m$ ( $b_i^m$ ).
Download figure:
Standard image High-resolution image

During the time interval d1 in figure A1, all the magnetic signals must be constant in time because there are no changes in currents of any of the PF coils; in addition, there are no plasmas yet that can change the magnetic signals. Therefore, any temporal changes in a magnetic signal during d1 can be regarded as due to a non-zero $a_i^m$ . With the knowledge of $a_i^m$ from the d1 time interval, we obtain the value of $b_i^m$ using the fact that all the magnetic signals must be zeros during the time interval d2 because there are no sources of magnetic fields, i.e. all the currents in the PF coils are zeros.

Summarizing our procedure, (1) we first obtain the slopes $a_i^m$ based on the fact that all the magnetic signals must be constant in time during d1 time interval, and then (2) find the offsets $b_i^m$ based on the fact that all the magnetic signals, after the linear drifts in time are removed based on the knowledge of $a_i^m$ , must be zeros during d2 time interval.

A.2. Bayesian inference

Bayesian probability theory [51] has a general form of

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \newcommand{\rp}{\right)} \newcommand{\lp}{\left(} \displaystyle p\lp\mathcal{W}|\mathcal{D}\rp=\frac{p\lp\mathcal{D}|\mathcal{W}\rp p\lp\mathcal{W}\rp}{p\lp\mathcal{D}\rp}, \label{eq:Bayes} \nonumber \end{align} \tag{ A.2 }$

where $\mathcal{W}$ is a (set of) parameter(s) we wish to infer, i.e. $a_i^m$ and $b_i^m$ for our case, and $\mathcal{D}$ is the measured data, i.e. measured magnetic signals during the time intervals of d1 and d2 in figure A1. The posterior $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} p\lp\mathcal{W}|\mathcal{D}\rp$ provides us the probability of having a certain value for $\mathcal{W}$ given the measured data $\mathcal{D}$ which is proportional to a product of likelihood $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} p\lp\mathcal{D}|\mathcal{W}\rp$ and prior $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} p\lp\mathcal{W}\rp$ . Then, we use the maximum a posterior (MAP) to select the value of $\mathcal{W}$ . The evidence $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} p\lp\mathcal{D}\rp$ (or marginalized likelihood) is typically used for a model selection and is irrelevant here as we are only interested in estimating the parameter $\mathcal{W}$ , i.e. $a_i^m$ and $b_i^m$ .

We estimate values of the slope $a_i^m$ and the offset $b_i^m$ based on equation (A.2) in two steps as described above:

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle \label{eq:baye-slope} {{\rm Step~(1)}}\:: \: p(a_i^{m}|\mathcal{\vec {D}}_{i, d1}^m) \propto p(\vec {D}_{i, d1}^m|a_i^{m})\,p(a_i^{m}), \nonumber \end{align} \tag{ A.3 }$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle \label{eq:baye-off} {{\rm Step~(2)}}\:: \: p(b_i^{m}|\mathcal{\vec {D}}_{i, d2}^m, a_i^{m*}) \propto p(\vec {D}_{i, d2}^m|b_i^{m}, a_i^{m*})\,p(b_i^{m}), \nonumber \end{align} \tag{ A.4 }$

where $\mathcal{\vec {D}}_{i, d1}^m$ ( $\mathcal{\vec {D}}_{i, d2}^m$ ) are the time series data from the ith magnetic sensor of type m (magnetic pick-up coils or flux loops) during the time intervals of d1 (d2) as shown in figure A1. $a_i^{m*}$ is the MAP, i.e. the value of $a_i^m$ maximizing the posterior $p(a_i^{m}|\mathcal{\vec {D}}_{i, d1}^m)$ . Since we have no prior knowledge of $a_i^m$ and $b_i^m$ , we take priors, $p(a_i^{m})$ and $p(b_i^{m})$ , to be uniform allowing all the real numbers. Note that a correct $p(a_i^{m})$ would be equal to $\newcommand{\lp}{\left(} \newcommand{\rp}{\right)} \newcommand{\lsb}{\left[} \newcommand{\rsb}{\right]} 1/\lsb \pi\lp1+\lp a_i^{m} \rp^2 \rp \rsb$ [66], but we sacrifice rigor to obtain a fast solution. Furthermore, the posterior for $b_i^m$ should, rigorously speaking, be obtained by marginalizing over all possible $a_i^m$ , i.e. $p(b_i^{m}|\mathcal{\vec {D}}_{i, d2}^m)=\int p(b_i^{m}|\mathcal{\vec {D}}_{i, d2}^m, a_i^m)\,p(a_i^{m}|\mathcal{\vec {D}}_{i, d1}^m) da_i^m$ . Again, as we are interested in real-time application, such a step is simplified just to use $a_i^{m*}$ .

With equation (A.1), we model likelihoods, $p(\vec {D}_{i, d1}^m|a_i^{m})$ and $p(\vec {D}_{i, d2}^m|b_i^{m}, a_i^{m*})$ , as Gaussian:

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle & p(\vec {D}_{i, d1}^{m}|a_i^m) = \frac{1}{\sqrt{(2\pi)^L} |\sigma_{i, d1}^{m}|} \nonumber \\ & \times \exp \left(- \frac{\sum\limits_{t_l \in d1}^L \left[a_i^m(t_{l} - t_0) - \left(D_{i, d1}^m(t_l) - \left< D_{i, d1}^m(t_0)\right> \right)\right]^2}{2(\sigma_{i, d1}^m)^2}) \right), \label{eq:like-slope} \nonumber \end{align} \tag{ A.5 }$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle p(\vec {D}_{i, d2}^m | b_i^m, & a_i^{m*}) = \frac{1}{\sqrt{(2\pi)^K} |\sigma_{i, d2}^m|} \nonumber \\ & \times \exp \left(- \frac{\sum\limits_{t_k\in d2}^K \left[b_i^m - \left(D_{i, d2}^m(t_k) -a_i^{m*} t_k\right)\right]^2}{2(\sigma_{i, d2}^m)^2} \right), \label{eq:like-offset} \nonumber \end{align} \tag{ A.6 }$

which simply state that noises in the measured signals follow Gaussian distributions. Here, $\sigma_{i,d1}^m$ and $\sigma_{i,d2}^m$ are the experimentally obtained noise levels for the ith magnetic sensor of type m (magnetic pick-up coils and flux loops) during the time intervals of d1 and d2 in figure A1, respectively. t_l and t_k define the actual time intervals of d1 and d2, i.e. $t_l \in [-6, -1]$ s and $t_k \in [-14, -13]$ s with L and K being the numbers of the data points in each time interval, respectively. t₀ can be any value within the d1 time interval, and we set t₀ = −2 s in this work. $\left< D_{i, d1}^m(t_0)\right>$ , removing the offset effect to obtain only the slope, is the time-averaged value of $D_{i, d1}^m(t)$ for $t\in [t_0-0.5, t_0+0.5]$ s. We use the time-averaged value to minimize the effect of the noise in $D_{i, d1}^m(t)$ at t = t₀.

With our choice of uniform distributions for priors in equations (A.3) and (A.4), MAPs for $a_i^m$ and $b_i^m$ , which we denote as $a_i^{m*}$ and $b_i^{m*}$ , coincide with the maximum likelihoods which can be analytically obtained by maximizing equations (A.5) and (A.6) with respect to $a_i^m$ and $b_i^m$ , respectively:

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle \label{eq:direct-slope} a_i^{m*} = \frac{\sum\limits_{t_l \in d1}^L \left[\left(D_{i, d1}^m(t_l) - \left< D_{i, d1}^m(t_0)\right>\right)\left(t_l - t_0 \right) \right]}{\sum\limits_{t_l \in d1}^L \left[t_l - t_0 \right]^2}, \nonumber \end{align} \tag{ A.7 }$

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle \label{eq:direct-off} b_i^{m*} = \frac{1}{K} \sum\limits_{t_k \in d2}^K \left[D_{i, d2}^m(t_k) - a_i^{m*} t_k \right]. \nonumber \end{align} \tag{ A.8 }$

Now, we have attained simple algebraic equations based on Bayesian probability theory which can provide us values of the slope $a_i^m$ and the offset $b_i^m$ before the blip time, i.e. before t = 0.

Since the required information ( $a_i^m$ and $b_i^m$ ) to adjust drifts in the magnetic signals is obtained before every discharge starts, we can preprocess the magnetic signals in real time. This is how we have adjusted the drift signals shown in figure 2.

Appendix B. Image-relevant figures of merit—PSNR and MSSIM

In section 4, we used two image-relevant figures of merit, namely PSNR [60, 61] and MSSIM [62], to examine the performance of the developed neural networks. Although these figures of merit are widely used and well known, we present short descriptions of the PSNR and MSSIM for the readers' convenience. Notice that we treat a reconstructed magnetic equilibrium as an image whose dimension (number of pixels) is set by the spatial grid points.

B.1. PSNR

PSNR is calculated as

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle \label{eq:psnr} {\rm PSNR}=10\times\log_{10}\left[\frac{\max\left(y^{\rm Target}\right)^2}{\frac{1}{M}\sum\nolimits_{i=1}^M\left(y_i^{\rm Target} - y_i^\star \right)^2}\right], \nonumber \end{align} \tag{ B.1 }$

where y _i is the value of either $\psi$ or $\Delta^*\psi$ at the ith position of the spatial grid (analogous to a pixel value of an image), and M for the total number of grid points, i.e. either $286 (=22\times 13)$ or $4225 (=65\times 65)$ depending on our choice for reconstructing equilibrium. $\max(\cdot)$ operator selects the maximum value of an argument, and $y^{\rm Target}$ is an array containing 'pixel' values of a reference EFIT 'image', that is a reconstructed magnetic equilibrium. $y^\star$ is also an array, and depending on whether we wish to compare the off-line EFIT result with either the rt-EFIT result or the nn-EFIT result, we select the corresponding values.

B.2. MSSIM

MSSIM is calculated as

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\lab}{\left\langle} \displaystyle \label{eq:mssim} {\rm MSSIM}=\frac{\left(2\mu_{y^{\rm Target}} \mu_{y^\star} + C_1 \right)\left(2\sigma_{y^{\rm Target} y^\star}+C_2 \right)}{\left(\mu^2_{y^{\rm Target}} + \mu^2_{y^\star}+ C_1 \right)\left(\sigma^2_{y^{\rm Target}} + \sigma^2_{y^\star}+ C_2 \right)}, \nonumber \end{align} \tag{ B.2 }$

where $\mu_{y^{\rm Target}}$ and $\mu_{y^\star}$ are the mean values of $y^{\rm Target}$ and $y^\star$ , respectively. Here, $y^{\rm Target}$ and $y^\star$ mean the same as in section B.1. $\sigma^2_{y^{\rm Target}}$ and $\sigma^2_{y^\star}$ are the variances of $y^{\rm Target}$ and $y^\star$ , respectively, while $\sigma_{y^{\rm Target} y^\star}$ is the covariance between $y^{\rm Target}$ and $y^\star$ . C₁ and C₂ are used to prevent a possible numerical instability, i.e. the denominator being zero, and are set to be small numbers. Following [62], we have $C_1=10^{-4}$ and $C_2=9\times 10^{-4}$ .