Single Gaussian Process Method for Arbitrary Tokamak Regimes with a Statistical Analysis

Gaussian Process Regression (GPR) is a Bayesian method for inferring profiles based on input data. The technique is increasing in popularity in the fusion community due to its many advantages over traditional fitting techniques, including intrinsic uncertainty quantification and robustness to over-fitting. This work investigates the use of a new method, the change-point method, for handling the varying length scales found in different tokamak regimes. The use of the Student's t-distribution for the Bayesian likelihood probability is also investigated and shown to be advantageous in providing good fits for profiles with many outliers. To compare different methods, synthetic data generated from analytic profiles are used to create a database enabling a quantitative statistical comparison of which methods perform best. Using a full Bayesian approach with the change-point method, a Matérn kernel for the prior probability, and a Student's t-distribution for the likelihood is shown to give the best results.


I. INTRODUCTION
Achieving fusion energy using magnetic confinement requires understanding plasmas that are hotter than the sun and their relationship to the magnetic field. This is not an easy task, because diagnosing this harsh environment requires understanding the physics of the diagnostic and of the plasma itself. This is known as an inverse problem: understanding the plasma requires understanding the diagnostic, which requires understanding the plasma.
For magnetic confinement, the small Larmor radius expansion gives, to first order, the force balance: J × B = ∇p.
However, while traditional equilibrium reconstructions using only magnetic data can be robust and routine, using additional data such as Motional Stark Effect (MSE) [6], Thomson scattering [27,28], or soft X-ray [29] measurements is more problematic, due both to the additional complexity of modeling internal diagnostics and to diagnostic limitations.
The best equilibrium reconstructions typically require considerable manual intervention. For burning plasmas, with their harsh requirements and increased need for plasma control, more robust methods are needed.
Using a Bayesian approach to these problems offers many advantages [30,31]. It more naturally allows one to account for the uncertainty in the data and for prior knowledge to be incorporated into the problem. The specific technique used in this paper is Gaussian Process Regression, a technique that is becoming widely used in uncertainty quantification and machine learning [32]. Its introduction into the fusion community was by Svensson [33] for soft X-ray tomography, but it has since grown to include other diagnostics [33][34][35][36][37]. Its use within the fusion community has broadened to verification and validation [38][39][40][41], and to equilibrium reconstruction as well [42,43]. The latter is our ultimate goal.
One of the biggest drawbacks in using GPR is shared with traditional fitting methods: how to account for very different profiles in the fitting. Most notable for tokamak fitting is the difference between the L-mode (low-confinement mode) and H-mode (high-confinement mode) regimes. The H-mode regime is characterized by an edge pedestal whose transport barrier gives short radial scale lengths. In the terminology of GPR (discussed more below), the hyperparameters are non-stationary. That is, the short radial scale lengths imply that the radial correlation of the observations changes across the profile. There are a few ways of handling this. The most popular method within the larger ML community is known as warping, which is equivalent to a coordinate transformation in applied mathematics, or to non-uniform grids in numerical analysis. In the fusion community, a tanh warping function used to produce non-stationary hyperparameters for H-mode profiles was introduced by Chilenski [38]. This has since been used by many authors [34,36,37,39,40,44].
The disadvantage of the warping approach to radially varying hyperparameters is the same problem that confronts the traditional fitting approach: one needs to know the type of fitting to perform ahead of time. This problem has been addressed in multiple ways.
One method is to determine whether the profile is L-mode or H-mode ahead of time, and then perform the appropriate traditional fitting [45] or hyperparameter fitting [41]. In this paper, we explore a different approach using a change-point kernel, with the goal of using a single method regardless of the tokamak parameter regime. Our methodology for these investigations will also be different: we will focus strictly on synthetically generated data from analytic profiles. This approach allows us to explore the effects of the noise distribution and the influence of outliers in the data in a quantitative way.
Our paper proceeds as follows. We first briefly review the basics of Gaussian Process Regression to introduce the basic concepts and terminology. Although this has been discussed in the fusion literature cited above, as well as in other fields such as machine learning [32], we will be introducing new methods and need to clarify their context.
We next apply our new methods to the synthetic data characteristic of tokamak regimes and contrast with prior methods in the fusion community. Finally, we draw conclusions and point to future work.

II. BACKGROUND
The traditional method of fitting a profile f(ψ) typically involves choosing a function that depends on a set of parameters and minimizing the error. For example, one can write f(ψ) = Σ_i f_i α_i(ψ), where the f_i are the parameters and the α_i are the fitting functions, and then minimize the error relative to the data. This is the traditional method of EFIT, which uses spline functions as the fitting functions for L-mode profiles, and splines plus a tanh function for H-mode profiles. Advanced techniques for fitting using OMFIT [46] are discussed in Refs. [47,48]. In the Bayesian view, the process is fundamentally different: one first calculates the probability of a function being correct, and the most probable function is used as the fit.
Mathematically, this is written as p(f(ψ) | d): the probability of f(ψ) given the data d. This is a conceptual shift compared to traditional fitting. It acknowledges that there is uncertainty in the data, and uncertainty in f(ψ) itself. Finding this probability directly is not straightforward. Instead, we use Bayes' theorem:

p(f(ψ) | d) = p(d | f(ψ)) p(f(ψ)) / p(d),    (1)

where p(d | f(ψ)) is the likelihood, p(f(ψ)) is the prior, and p(d) is the marginal likelihood. The advantage of Bayes' theorem in practical terms is that it is easier to compute the likelihood and prior probabilities than the posterior directly. The marginal likelihood is a normalizing constant that ensures the total probability is one.
The first step that we wish to consider is how to encode the space of all functions into the prior probability. Above, we discussed how parameters were used to parameterize a function.
A similar method is used to explore the distribution space of the probability functions here.
In PDEs, exploring a functional space is done using discretizations (e.g., finite differences).
The problem with this for probability functions is that although we might wish to assume that f(ψ) lies within a functional space (e.g., a Hilbert space), the data, which are subject to noise, do not. Using traditional PDE discretizations would also subject one to systematic bias in determining the distributions. Hence, randomized sampling of the distribution space is generally preferred. Two methods for exploring this space are polynomial chaos expansions and stochastic processes. In both methods, the continuum form of the probabilities above is converted to a discrete form. For distributions, hyperparameters, denoted θ_hp, are used to explore the probability space for the prior and likelihood functions.
In this paper, we will consider a specific form of a stochastic process known as the Gaussian process. Intuitively, a Gaussian process is a stochastic process whose generated random values are normally distributed. Mathematically, we write the prior sampling as

f(ψ) ~ GP(µ(ψ), K(ψ, ψ′)),    (2)

where µ is the mean function of the process and K is the covariance function, or kernel.
A common Gaussian process uses a zero mean and the squared exponential kernel (a.k.a. radial basis function):

K(ψ, ψ′) = θ_v² exp(−(ψ − ψ′)² / (2 θ_l²)).    (3)

The Gaussian process is a distribution over all smooth functions of ψ. Each function in the Gaussian process occurs with a certain probability that depends on the two hyperparameters θ_v and θ_l. The hyperparameter θ_v characterizes how rapidly the probability decreases as the function departs from the mean function (0 in this case), and θ_l characterizes the spatial correlation of the function at two points ψ and ψ′. A kernel is stationary if it depends only on the distance between two points, K(ψ, ψ′) = K(|ψ − ψ′|); otherwise the kernel is non-stationary.
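As an illustration, the following minimal NumPy sketch draws sample functions from this prior; the hyperparameter values are illustrative:

```python
import numpy as np

# Draw sample functions from the zero-mean GP prior with the squared
# exponential kernel of Eq. (3). Hyperparameter values are illustrative.
theta_v, theta_l = 1.0, 0.2
psi = np.linspace(0.0, 1.1, 100)

# Covariance matrix K(psi_i, psi_j); a small jitter keeps it positive definite.
K = theta_v**2 * np.exp(-((psi[:, None] - psi[None, :]) ** 2) / (2 * theta_l**2))
K += 1e-10 * np.eye(psi.size)

rng = np.random.default_rng(0)
prior_draws = rng.multivariate_normal(np.zeros(psi.size), K, size=5)  # 5 sample functions
```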
In the context of fitting profiles of fusion data, the discussion of whether a Gaussian process is stationary or not relates to the characteristic length scale of the profile, or data, to be inferred. For H-mode profiles, the short length scale of the pedestal region motivates the use of non-stationary kernels, e.g., as used by Chilenski [38]. Similarly, inferring magnetic data can be challenging because rapid changes in the data require non-stationary kernels [35]. One approach is to choose the hyperparameters a priori. A second approach is Empirical Bayes, where one uses the data to determine the optimal hyperparameters. There is a family of methods in this approach. One method chooses the likelihood to also be a Gaussian function. The combination of a Gaussian likelihood and a squared exponential kernel enables an analytically tractable method for maximizing the marginal likelihood. This method was used in the first introduction of GPR to the fusion community [33]. When the likelihood is not Gaussian, a closed-form analytic solution is in general not possible, and one must use numerical methods.
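For concreteness, a minimal Empirical Bayes sketch using GPflow (the library adopted later in this paper); the toy data and default hyperparameter initializations are illustrative assumptions:

```python
import numpy as np
import gpflow

# Toy data standing in for a measured profile (placeholders, not paper data).
X = np.linspace(0.0, 1.1, 40).reshape(-1, 1)
Y = np.exp(-2.0 * X) + 0.05 * np.random.default_rng(1).normal(size=X.shape)

# Gaussian likelihood + squared exponential kernel: the log marginal
# likelihood is available in closed form, so the hyperparameters
# (theta_v, theta_l, noise variance) can simply be optimized.
model = gpflow.models.GPR((X, Y), kernel=gpflow.kernels.SquaredExponential())
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

mean, var = model.predict_f(X)  # posterior mean and variance of the fit
```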
A different philosophical approach is the Full Bayesian approach, in which the prior distribution is fixed before any data are observed. In this approach, the prior and likelihood are still described in terms of hyperparameters as described above, but the full hyperparameter space is explored numerically by integrating out the hyperparameters. Specifically, the likelihood and prior probability functions in Eq. (1) are parameterized in terms of hyperparameters, and then the hyperparameters are numerically integrated out in a process known as marginalization. There are a wide variety of approaches to marginalization, and to using partial approximations to accelerate this integration. In this paper, we will use Markov chain Monte Carlo (MCMC) methods to systematically perform the Bayesian integration. In the fusion community, Chilenski [38] is a notable paper discussing the Full Bayesian technique as well as non-stationary hyperparameters. Empirical Bayes remains popular, however, because of its speed compared to Full Bayesian techniques. This is discussed in more detail later.
In this paper, we systematically explore different approaches to Bayesian fitting of profiles with varying length scales, using both Empirical Bayes and Full Bayes with MCMC. The use of Full Bayes will be required as we explore the use of different likelihood functions in fitting noisy data. We turn next to how we determine the best method.

III. METHODOLOGY

A. Generation of synthetic data for tokamak parameter regimes
Our goal is to understand generic issues related to using the GPR method to infer the density (n) and temperature (T) profiles of tokamaks from localized measurements such as Thomson scattering. However, the profiles characteristic of the L-mode, H-mode, and ITB regimes can have varying length scales, which challenges the GPR method. Experimentally, tokamaks account for this by having TS systems with a spatial resolution that varies across the domain in order to capture the sharp features associated with a pedestal. For example, the DIII-D tokamak core TS system has a coarser spatial resolution, which is adequate for resolving the slowly varying core n and T profiles, and a finer resolution in the pedestal to resolve the sharp profile gradients in this region [27].
Here, rather than use raw experimental data as in prior work [33][34][35][36][37], we create synthetic data to enable a more accurate understanding of the errors associated with different GPR methods. Specifically, for L-mode profiles we use

f(ψ) = (f_0 − f_edge)(1 − ψ^{α_1})^{α_2} + f_edge,

where the input parameters are f_0 (value at the center), f_edge (value at the edge), α_1 (controls the gradient near the center), and α_2 (controls the gradient near the edge). The H-mode profile is given by

f(ψ) = f_L(ψ) + (f_ped/2)[1 − tanh((ψ − ψ_ped)/w_ped)],

where f_L is the L-mode-like base profile, ψ_ped is the pedestal center, and f_ped and w_ped control the pedestal height and width. The synthetic data are created by taking 40 points uniformly across the domain, and then another 40 points uniformly across the pedestal region to account for the way in which experimental measurements are often localized in the pedestal. The data at the discrete profile points are then perturbed using random numbers to generate noise, assuming a Gaussian distribution with a specified width (standard deviation). In this paper, the width varies from 10% to 33% of the data value itself. In addition, the noise can be shifted uniformly, i.e., given a non-zero mean in the Gaussian, to represent systematic errors. Outliers are synthetically created by choosing random points and using random values that have a Gaussian width of 2-3 times the data value. If a negative outlier is generated, it is made positive.
As seen below, this can create an additional systematic error. The synthetic ITB profiles are similar to the H-mode profiles except that the tanh profile is moved to the center of the plasma.
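A sketch of this synthetic-data generation is given below; the profile forms follow the equations above, while the specific parameter values and the number of outliers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def l_mode(psi, f0=3.0, f_edge=0.3, alpha1=2.0, alpha2=1.5):
    # Smooth core profile; f0/f_edge set the center/edge values,
    # alpha1/alpha2 control the gradients near the center/edge.
    return (f0 - f_edge) * (1.0 - np.clip(psi, 0.0, 1.0) ** alpha1) ** alpha2 + f_edge

def h_mode(psi, f_ped=1.0, w_ped=0.02, psi_ped=0.97, **lkw):
    # L-mode-like base plus a tanh pedestal of height f_ped and width w_ped.
    return l_mode(psi, **lkw) + 0.5 * f_ped * (1.0 - np.tanh((psi - psi_ped) / w_ped))

# 40 points across the domain plus 40 concentrated in the pedestal region.
psi = np.concatenate([np.linspace(0.0, 1.1, 40), np.linspace(0.9, 1.0, 40)])
truth = h_mode(psi)

# Gaussian noise with a width of 10%-33% of the local value; a non-zero
# `offset` would model a systematic error.
noise_frac, offset = 0.15, 0.0
data = truth + rng.normal(offset, noise_frac * np.abs(truth))

# Outliers: random points redrawn with a width 2-3x the local value,
# made positive if negative.
idx = rng.choice(psi.size, size=5, replace=False)
data[idx] = np.abs(rng.normal(truth[idx], 2.5 * np.abs(truth[idx])))
```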

B. Choosing a kernel
As discussed in Section II, the standard kernel for GPR is the squared exponential kernel given by Eq. (3). The hyperparameters in these kernels do not have to be chosen a priori, but can instead be inferred from the data themselves. Typically, a stationary kernel is used, but for H-mode profiles, Chilenski [38], using a non-stationary kernel derived by Gibbs [49], used a tanh function for the hyperparameters. This addition can make the computation of the fit expensive, and it must additionally be constrained or convergence becomes difficult. We desire a robust fitting routine, so we opted for a second option that is available in GPflow [50], the software that we use, called the change-point kernel. This is effectively a piecewise kernel in which multiple individual kernels can be defined, along with the locations at which one kernel changes to another. The derivatives of the total kernel are kept continuous by the use of a transfer function between the piecewise parts of the kernel, which smoothly transitions between the sub-kernels.
The H-mode pedestal occurs in the edge region, and so we allow one kernel to describe ψ ∈ [0, 0.9] and ψ ∈ [1.0, 1.1], while a separate kernel defines ψ ∈ [0.9, 1.0]. Each of these kernels has independent hyperparameters and can therefore represent a profile whose correlation length varies across the domain. Because we use the data to infer the hyperparameter values, this also works well to describe an L-mode profile, which has a single effective correlation length; in that case, both kernels end up with similar values for the hyperparameters. Figure 1 demonstrates the way this kernel has been constructed, as well as data that illustrate the need for the change-point kernel.
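A minimal sketch of this construction using GPflow's ChangePoints kernel follows; the change locations come from the text, while the steepness value is an illustrative assumption:

```python
import gpflow

# Kernel A covers psi in [0, 0.9] and [1.0, 1.1]; kernel B covers the
# pedestal region [0.9, 1.0]. Passing kernel_A twice shares its
# hyperparameters across both outer regions.
kernel_A = gpflow.kernels.Matern52()
kernel_B = gpflow.kernels.Matern52()

# Sigmoid transfer functions at the change locations keep the total
# kernel smooth; the steepness value here is illustrative.
change_point_kernel = gpflow.kernels.ChangePoints(
    kernels=[kernel_A, kernel_B, kernel_A],
    locations=[0.9, 1.0],
    steepness=50.0,
)
```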
The individual kernels chosen for kernels A and B (as indicated in Figure 1) are the well-established Matérn covariance functions. These give the correlation between points separated by a distance d = |ψ − ψ′| via

K(d) = θ_v² [2^{1−ν}/Γ(ν)] (√(2ν) d/θ_l)^ν K_ν(√(2ν) d/θ_l),

where K_ν is the modified Bessel function of the second kind and ν controls the smoothness of the resulting functions. In this formula, θ_l is indicative of the width of the covariance function (like the standard deviation in a Gaussian), i.e., the correlation length, and θ_v is a normalization parameter that sets the overall variance. These two values are hyperparameters of the kernels that can be independently optimized for each kernel (A & B). In regions of large gradients/fluctuations we expect θ_l to be small, while in regions of small gradients we expect it to be larger.
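For concreteness, a short sketch evaluating the ν = 5/2 member of this family (the closed form used by GPflow's Matern52 kernel; the choice of ν here is an assumption, as the text does not fix it):

```python
import numpy as np

def matern52(d, theta_v=1.0, theta_l=0.1):
    # Closed form of the Matern covariance for nu = 5/2, as a function of
    # the distance d = |psi - psi'|.
    r = np.sqrt(5.0) * np.asarray(d) / theta_l
    return theta_v**2 * (1.0 + r + r**2 / 3.0) * np.exp(-r)
```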

C. Handling outliers
Outlying data can negatively affect the fit and must be handled with care. One obvious but labor-intensive approach is to manually go through the data to identify outliers or "bad" data and remove them. This is, however, not ideal for multiple reasons. Firstly, it takes time and effort, whereas we would like the fitting to be an automated process. Secondly, data can sometimes look bad but be accurate, and it is not always possible to determine which is the case. Finally, we would rather incorporate all the data and keep track of the errors than eliminate data because they look bad. Therefore, we conclude that we want to keep all the data, but we do not want outliers to negatively affect the fit; instead we want them to affect the error of the fit. In the Bayesian framework, this is handled by the likelihood probability.
In general, when letting the data inform the choice of hyperparameters, we want to maximize the log likelihood of the fit given the data. For data with normally distributed noise, a Gaussian likelihood is the appropriate choice, and it comes with the added benefit that the problem can be reduced analytically to an easily tractable form. However, if there are outliers in the data, and/or the noise is non-Gaussian, this is no longer an appropriate choice for the likelihood function. A heavy-tailed distribution can capture the error on the outliers without affecting the quality of the fit for the data points with lower error.
We have tested multiple heavy-tailed distributions, including the Student's t-, logistic, and Laplace distributions. All of these function adequately, as we will show, but the Student's t-distribution is the one that performs best. It is given by

p(x) = [Γ((ν+1)/2) / (√(νπ) Γ(ν/2))] (1 + x²/ν)^{−(ν+1)/2},

where ν is the degrees of freedom and Γ is the gamma function. This is a particularly nice function because it approaches a Gaussian as ν → ∞. We can allow the degrees-of-freedom hyperparameter to be optimized during the GPR so that if the errors on the data are genuinely Gaussian, the value of ν will end up being large and will capture the errors properly. However, if there are outliers that would skew the fit, the ν parameter will automatically adjust to be smaller, increasing the weight of the likelihood function tails and allowing the quality of fit to remain high while minimizing over-fitting. This does not come without a cost, however, since we can no longer make the same analytic simplifications to optimize hyperparameters that are possible with a pure Gaussian likelihood. Instead, we resort to Markov chain Monte Carlo (MCMC) to sample the hyperparameter space of both the kernel and the likelihood simultaneously. Instead of a single optimum, then, we obtain a distribution for each of the hyperparameters, and we sample from them to obtain possible fits. The final fit we choose is the mean of all the sampled fits.
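A minimal sketch of this sampling step, following GPflow's documented MCMC workflow with TensorFlow Probability; the toy data, priors, and chain settings are illustrative assumptions:

```python
import numpy as np
import gpflow
import tensorflow_probability as tfp

# Toy noisy profile data (placeholders, not paper data).
X = np.linspace(0.0, 1.1, 80).reshape(-1, 1)
Y = np.exp(-2.0 * X) + 0.1 * np.random.default_rng(2).normal(size=X.shape)

# GPMC handles non-Gaussian likelihoods, here the Student's t likelihood.
# (Sampling nu itself would additionally require making the `df` attribute
# a trainable gpflow.Parameter; omitted here for brevity.)
model = gpflow.models.GPMC(
    (X, Y),
    kernel=gpflow.kernels.Matern52(),
    likelihood=gpflow.likelihoods.StudentT(),
)

# Priors are required on the hyperparameters being sampled (values illustrative).
f64 = np.float64
model.kernel.lengthscales.prior = tfp.distributions.Gamma(f64(1.0), f64(1.0))
model.kernel.variance.prior = tfp.distributions.Gamma(f64(1.0), f64(1.0))
model.likelihood.scale.prior = tfp.distributions.Gamma(f64(1.0), f64(1.0))

helper = gpflow.optimizers.SamplingHelper(
    model.log_posterior_density, model.trainable_parameters
)
hmc = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=helper.target_log_prob_fn,
    num_leapfrog_steps=10,
    step_size=0.01,
)
samples, _ = tfp.mcmc.sample_chain(
    num_results=500,
    num_burnin_steps=300,
    current_state=helper.current_state,
    kernel=hmc,
)
# Each hyperparameter sample yields a candidate fit; the reported fit is
# the mean over fits drawn from these samples.
```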

IV. RESULTS
We will discuss two regimes that we fit explicitly: L-mode and H-mode. We do this because these two cases are illustrative of the reasoning behind our choice of kernel and likelihood functions. Then we will go on to discuss the statistics of many fits.

A. Fitting in various regimes
In all of the regimes we discuss below, the kernel and likelihood function are identical.
The automated tuning of the hyperparameters is all that differs, and so we will discuss those results and show the distributions for all the hyperparameters. In this way, all of these regimes are fit without any human intervention or regime detection algorithms. The data themselves inform the choice of hyperparameters for the GPR.

L-mode
In the L-mode, as discussed above, the profiles are smooth. We can hypothesize the effect of such a profile on the hyperparameter distributions. Firstly, it is expected that the hyperparameters of both kernels are similar, since the correlation length is roughly constant across the profile. Additionally, we expect the degrees-of-freedom parameter to roughly correlate with the number and size of the outlying data. The inferred degrees of freedom corresponds to a likelihood closer to a Cauchy distribution than to a Gaussian, which confirms the need for such a heavy-tailed distribution to adjust the fit to account for the outliers.

H-mode
The change in length scale across the pedestal region is the motivation for using the change-point kernel discussed in Section III B. Figure 4 shows the pedestal around ψ = 1, and one can see that the remaining profile is simply the same as the L-mode profile with an offset. The change-point kernel allows this profile to be fit with the same kernel used to fit the L-mode profile in Section IV A 1.

H-mode with ITB
The last case we look at explicitly is an H-mode profile that contains an internal transport barrier (ITB). This is another region of steep gradient, but in the core of the plasma instead of at the edge, where the pedestal exists. In our synthetic data, this occurs where the data are sparser than in the pedestal region, similar to real experiments.
This case is not explicitly fit by the kernel we introduce in Section III B, but the kernel could be extended to do so by adding a third sub-kernel covering a region around ψ = 0.5. This would increase the number of hyperparameters, and therefore the dimensionality of the parameter space through which the MCMC must sample. It is still doable, but is saved for future work. Since this is not done here, the purpose of this section is to show that this method performs well even without these additions. Figure 6 shows an example of a fit of this type of profile. Though the gradient of the ITB is not perfectly fit by GPR, an inflection in the fit is seen as the fit attempts to capture the inflection in the underlying profile. This fit better captures an ITB than a parameterized method for H-mode would, because there is no over-fitting, and yet requires no knowledge that such an ITB might exist. The same applies to any non-standard profile: GPR is able to smoothly fit unexpected features in the underlying profile without making assumptions about tokamak regimes.

B. Statistical analysis
In this section we perform a statistical analysis of various GPR fitting methods on a large set of synthetic data that spans a broad parameter space in the profile shape and noise. The four methods we test here cover two kernels, two likelihoods, and full vs empirical Bayes.
As described in Section II, empirical Bayes involves an analytic simplification assuming a Gaussian likelihood, allowing for simple hyperparameter optimization. Full Bayesian methods, on the other hand, do not make this simplification, so a method such as Markov chain Monte Carlo must be used to sample the hyperparameter space (the likelihood function is not restricted).
The Chilenski kernel [38] is used in the fusion community for GPR fitting of H-mode profiles for equilibrium reconstruction. It has been implemented for use in OMFIT [46] with OMFIT profiles [47]. It is a standard Gibbs kernel [49] with a tanh length scale function.

Building database of fits
In order to analyze the quality of fit provided by the methods described in the previous sections, we have performed a large number of fits on a range of synthetic profiles. There are two parts to doing a study like this. First, a figure of merit must be defined to determine and compare the quality of the fits. Secondly, a parameter space must be defined to generate the range of synthetic profiles.
For the first, there are many figures of merit that can be used to describe the quality of fit. We desire one that does not depend on the number of points in the resulting fit, and it must therefore include a mean. The root mean square error (RMSE) is a reasonable choice, as it represents the standard deviation of the residuals between the fit and the true, underlying profile. It is important to note that the Chilenski method also aims to obtain high-quality fit derivatives for transport model calculations; in this paper the RMSE only compares the quality of the data fit, not including any of the extra features involving derivatives that Chilenski developed. We define it as

RMSE = √[(1/N) Σ_i (y_i − p_i)²],

where y_i is the value of the fit and p_i is the true profile value. Because these profiles are synthetically generated with noise added, we know the exact underlying profile, making this figure of merit straightforward to compute. Between these profiles, methods, and the figure of merit, we were able to build up a database of over 5000 sets of synthetic data that enable us to perform a quantitative comparison of the different methods.
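A minimal sketch of this figure of merit (the function name is ours):

```python
import numpy as np

def rmse(fit_values, true_values):
    # Root mean square error between the fit and the known true profile.
    y, p = np.asarray(fit_values), np.asarray(true_values)
    return np.sqrt(np.mean((y - p) ** 2))
```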

Worst fits for each method
In order to put the RMSE values into context, we first want to show what constitutes a "bad" fit using these methods. An advantage of GPR is that when hyperparameter optimization is successful, even the bad fits are smooth and generally still acceptable. This is not always the case for parameterized fitting methods, which can fail spectacularly due to over-fitting. To examine these bad cases, we have found the worst fit (highest RMSE) for each method and then plotted that fit along with the fits for the same data using all the other methods.
These plots are shown in Figure 7. The worst fit using the empirical Bayesian method with the Chilenski kernel (top left plot) significantly overfit the data in the core and smoothed over the pedestal. This is due to outliers that are close to each other making the hyperparameter likelihood function multi-modal: we are most likely seeing a local maximum instead of the global one. This indicates a key problem that our outlier solution is able to address. One could also imagine making the assumption that the core correlation length must always be larger than the pedestal/edge correlation length, thus constraining the Chilenski method so that this type of fit could not occur. However, we invoked the Chilenski method using the same safeguards as in OMFIT [46], which would need to be modified for robustness. The data in this fit are not particularly difficult to fit, as the other methods do a relatively good job in fitting them, though their means (excluding the Student's t-distribution likelihood method) are pulled above the true profile by the outliers in the core region. Therefore the strength of the Student's t-distribution likelihood is that it addresses multiple problems that outliers can cause in data fitting: shifting the mean and confusing the length-scale hyperparameter optimization.
The worst fits using the other three methods show outlying cases where the data happen to have all the outlying points above the underlying profile. This is an edge case, since in the synthetic data the outliers have equal chances of lying above and below the true profile. In all these cases, the mean fit is pulled away from the underlying profile by the outliers. Additionally, these synthetic data profiles also sit in the part of parameter space where the noise offset is non-zero. This is to be expected, since none of these methods are able to intrinsically detect systematic errors.
Importantly, all of the proposed (non-Chilenski) methods, even in the worst case, capture the pedestal gradient well and are therefore able to identify the pedestal location and width. These are key parameters that EFIT uses to continue forward with the equilibrium calculation. Additionally, even these worst cases result in smooth fits that would be unlikely to cause calculation errors as part of an equilibrium reconstruction workflow.

Effect of outlier mitigation
We introduced in Section III C a method for minimizing the impact of outliers on the fit by using a Student's t-distribution likelihood. This heavy-tailed distribution accounts for outliers without shifting the mean. Figure 8 shows the benefits of such an approach compared to the rest of the analyzed methods.
It is clear from this plot that the full Bayesian approach provides much better fits on average than the empirical Bayesian approach. Additionally, each of these lines trends upward in error as more and more outliers are introduced, except for the method that includes outlier mitigation via the Student's t-distribution likelihood. This method is roughly invariant to the number of outliers, at least over the range shown here, which reaches a maximum of over 11% outliers in the data.

Fitting L-mode versus H-mode
One of the objectives behind developing the fitting method described in this paper (change-point kernel with Student's t likelihood) is to fit H-mode and L-mode equally well without having to know a priori anything about the operational mode of the tokamak from which the data come. This is accomplished through GPR with variable length scales in the kernel and hyperparameter optimization/sampling. However, we want to test how well these methods perform at fitting L-mode and H-mode to make sure that the methods are as effective as they should be.

Figure 9. These plots show the distribution of RMSE for the various methods for H-mode and L-mode. It is important to note that the RMSE axes differ in scale between the plots; the full Bayesian approach consistently produces fits with lower RMSE than the empirical Bayesian approach.

Figure 9 shows the distribution of RMSE for both L-mode and H-mode separately, and for each fitting method separately. There are a few key points worth emphasizing from these plots. Firstly, it is clear that the empirical Bayesian approach does a worse job, regardless of kernel, at fitting H-mode than at fitting L-mode. This is likely due to the large parameter space involved and the likelihood maximization ending up in a local maximum instead of the global one.
Also, the fit for H-mode is more difficult as the gradients are steeper, so the parameter space is likely also steeper and multi-modal. Secondly, the Full Bayesian approach sees minimal difference between fitting L-mode and H-mode, and the overall RMSE is lower for these than for the empirical Bayesian fits. The multi-modal behavior seen in this figure is due to the large parameter space encapsulated within the data. The peaks at higher RMSE are due to noise offsets and large outliers, while the peaks at smaller RMSE correspond to profiles that are easier to fit.

V. DISCUSSION
This paper used synthetic data, with the ability to control the amount of noise, the number of outliers, and the degree of systematic bias, to perform a statistical analysis of various GPR methods. These methods include the non-stationary empirical Bayes method introduced by Chilenski [38], as well as an alternative to non-stationary hyperparameters: the change-point kernel (a linear combination of two stationary kernels). The effect is similar, however, in that it allows multiple length scales to be chosen for GPR and applied to different locations on the ψ axis. The change-point kernel we introduced is, however, more flexible than a Gibbs kernel with a tanh length scale function, as it allows any number of length scales that can change at arbitrary positions in ψ.
The most important new method introduced here is the Student's t-distribution for the likelihood function. This allows the fitting to handle outliers reliably and without any prior knowledge as to which points are outliers and how many there may be. The use of the Student's t-distribution rules out the Empirical Bayes method used here, but is easily implemented in our Full Bayesian method using GPflow. We compared Full Bayesian with the standard Gaussian likelihood to the Student's t-distribution and found that both were more accurate than the Empirical Bayesian techniques.
To be truly quantitative in our judgements, we performed a statistical analysis using the various methods described above to compare their effectiveness across a range of profiles including L-mode, H-mode, and H-mode with ITB. This also included a variety of settings for the noise, including amplitude, mean, and outlier number, allowing an analysis of over 5,000 sets of generated data. Fundamentally, however, the synthetic data are generated with Gaussian noise, such that they should match well to the commonly used squared exponential kernel. The noise was varied to allow a shifted Gaussian, representing systematic errors in a diagnostic, as well as outliers, representing bad channels in a diagnostic.
The most important results of this statistical study can be seen in Figures 8 and 9. H-mode profiles are consistently more difficult to fit with the empirical Bayes method, with an RMS error that is approximately 10% higher than for L-mode. There are also differences between L-mode and H-mode profiles in the Full Bayesian methods, but the differences are slight. The change-point kernel is slightly better than the non-stationary tanh hyperparameter profiles of Chilenski, but the difference is slight. Full Bayesian methods are an order of magnitude more accurate than Empirical Bayesian methods, but this comes at a computational cost that is roughly an order of magnitude higher. Though we did not test it, we would expect the Chilenski kernel to perform similarly to our own using the full Bayesian method, since it also performed similarly to our own using empirical Bayes. The Student's t-distribution for the likelihood is roughly twice as accurate as the Gaussian likelihood when there are a large number of outliers. The insensitivity of the Student's t-distribution to outliers can be seen visually in Figure 7. It is important to note that even the worst fits provided by our proposed methods result in smooth, reasonable fits to the data and are robust.
In future work, we will continue to refine these methods as we apply them to experimental data in a similar statistical manner. When working with experimental data, however, we do not have the luxury of knowing the true profile. Fortunately, the study we have done here gives us confidence that we are able to capture the underlying profile with a fit, barring the existence of a systematic offset in the noise.