Building a turbulence-transport workflow incorporating uncertainty quantification for predicting core profiles in a tokamak plasma

The impact of micro-scale turbulence on the macro-scale plasma profiles in a tokamak is a multi-scale problem (in both space and time) that is treated in this paper by coupling turbulence simulations of multiple flux-tubes to a core transport code, together with an equilibrium code. Work on quantifying the uncertainty in the predicted profiles, together with a comparison to experiment, is also presented.


Introduction
Magnetically confined plasmas hold out the hope of a nearly inexhaustible supply of energy. Amongst the many challenges on the way to a fusion power plant is the understanding of energy transport in the core plasma. A very successful way of treating the core plasma numerically is as a set of one-dimensional time-dependent partial differential equations describing the density, temperature and toroidal rotation as functions of a radial coordinate and time. The basic form of these is a conservation equation balancing the time rate of change of the quantity of interest (QOI), the radial divergence of the flux of the QOI and the sources of the QOI. The electron temperature, $T_e(\rho, t)$, is expressed in the form of an electron energy equation

$$\frac{\partial}{\partial t}\bigl(n_e(\rho,t)\,T_e(\rho,t)\bigr) = \nabla_\rho\bigl(n_e(\rho,t)\,\chi_e(\rho,t)\,\nabla_\rho T_e(\rho,t)\bigr) + S_e(\rho,t), \qquad (1)$$

where $n_e(\rho, t)$ is the electron density; $\chi_e(\rho, t)$ the thermal conductivity; and $S_e(\rho, t)$ the source of electron energy. Here the energy flux has been represented with a diffusive ansatz but, in general, a convective piece might also be needed. Much of the challenge in treating this problem lies in the determination of the transport coefficient, $\chi_e(\rho, t)$. A variety of models are available; this work determines the transport coefficients by solving a turbulence problem using (in this case) the gyro-fluid flux-tube turbulence code GEM [1], running at eight positions across the plasma.
To address the disparity of time- and space-scales, a multi-scale approach has been implemented in a workflow, as depicted in figure 1, which couples a transport code (in this case, the ETS [2]), a fixed-boundary equilibrium code (here, CHEASE [3]), a turbulence code (GEM [1]) and a utility code that converts fluxes to transport coefficients.
The genesis of this approach is described in Falchetto-2014 [4], where the workflow was implemented in the Kepler workflow engine [5], resulting in a workflow that allowed for the tight coupling of modular components, with the unit of exchange between the codes being the appropriate consistent physical object (CPO) [6]. Since the turbulence component required significant computing resources, UNICORE was used to schedule this component on a remote supercomputer (see [7, 8] for more information). At that time the supercomputer used was based in Juelich, which supported UNICORE. Shortly after the work described in Falchetto-2014, the supercomputer in use switched to HELIOS in Japan, which did not support UNICORE. Since porting the complete Kepler environment was very difficult at that time, a reduced version of the workflow was moved to the MUSCLE2 coupling framework [9], which supported HPC use better, at the expense of requiring the entire workflow to run on the HPC. Development of the workflow is described in [10-13].
Starting from some initial profiles for the temperatures and densities, the equilibrium code calculates new metric coefficients (stored in the equilibrium CPO, green in figure 1). These, together with the temperature and density profiles (stored in the core profiles CPO, red in figure 1), are used by the turbulence code to advance the state of the turbulence simulation over a time window Δt_turbulence that allows for some evolution of the fluxes. A running exponential mean is used to calculate flux-tube-averaged fluxes (stored in the core transport CPO, light blue in figure 1), which are then converted to transport coefficients (also stored in the core transport CPO, blue in figure 1). These transport coefficients are used by the transport code to evolve the profiles over a time window Δt_transport. The cycle then repeats until a fixed number of iterations has been performed, with the number of iterations determined by the HPC queue time limits. Typically a number (≈5) of such jobs need to be performed, with the final state of one job being the initial state of the next. Luk and co-authors [11, 14] discuss some of the issues that arise in this coupling, including limiting the time-step used by the transport code to ensure that the temperatures and temperature gradients do not change too much during the transport time-step (large changes can lead to a large jump in the fluxes predicted by the turbulence code when it sees the new profiles); this approach was used in the simulations described in this paper. The alternative is simply to cap the time-step allowed for the transport code, but this can slow down the whole simulation.
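The running exponential mean used to average the noisy flux-tube fluxes can be sketched as follows. This is a generic illustration: the smoothing factor `alpha` is an assumed value, not a parameter taken from the workflow.

```python
import numpy as np

def exponential_running_mean(fluxes, alpha=0.01):
    """Exponentially weighted running mean of a noisy flux signal.

    alpha is the smoothing factor (an assumed value, not from the workflow):
    smaller alpha averages over a longer window of the turbulent signal.
    """
    mean = np.empty(len(fluxes), dtype=float)
    mean[0] = fluxes[0]
    for i in range(1, len(fluxes)):
        # blend the previous mean with the newest flux sample
        mean[i] = (1.0 - alpha) * mean[i - 1] + alpha * fluxes[i]
    return mean

# Example: a synthetic 'turbulent' flux fluctuating around 1.0
rng = np.random.default_rng(42)
flux = 1.0 + 0.3 * rng.standard_normal(5000)
smoothed = exponential_running_mean(flux, alpha=0.01)
```

The averaged signal fluctuates far less than the raw one, which is what makes the converted transport coefficients usable by the transport solver.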
Another issue addressed in these papers is deciding whether the simulation has converged to a quasi-steady-state: the inherent nature of turbulence means that a true steady state cannot be achieved; what is sought instead is a state that, were it simulated further for any length of time, would remain bounded and not diverge from the quasi-steady-state found so far. The approach described in [11] bins the results based on time steps, calculates a mean and standard deviation for each bin, and then compares each bin with previous bins, looking for a match based on the means and standard deviations; [14] also looks at binning based on time windows and the maxima and minima in these bins. While these convergence detection methods work, more effort is perhaps needed: early detection of convergence could save a significant amount of computer time, but premature detection could result in a 'wrong' solution.
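The bin-based check can be illustrated schematically. The bin count and tolerance below are illustrative assumptions, not the values used in the cited work:

```python
import numpy as np

def binned_convergence(signal, nbins=10, tol=1.0):
    """Schematic convergence test in the spirit of the binning approach:
    split the time trace into bins, compute a mean and standard deviation
    per bin, and declare convergence when the last bin's mean lies within
    tol standard deviations of every earlier bin's mean.
    nbins and tol are illustrative choices, not the published values.
    """
    bins = np.array_split(np.asarray(signal, dtype=float), nbins)
    means = np.array([b.mean() for b in bins])
    stds = np.array([b.std() for b in bins])
    scale = np.maximum(stds[:-1], 1e-12)  # guard against zero-variance bins
    return bool(np.all(np.abs(means[:-1] - means[-1]) <= tol * scale))

rng = np.random.default_rng(0)
# a stationary (converged-looking) trace and a steadily drifting one
stationary = 1.0 + 0.1 * rng.standard_normal(2000)
drifting = np.linspace(0.0, 2.0, 2000) + 0.1 * rng.standard_normal(2000)
```

On the stationary trace the test passes; on the drifting trace the last bin's mean sits far from the early bins and the test fails, which is the behaviour one wants from such a detector.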
To investigate the uncertainty in the predicted profiles, the EasyVVUQ library [15], developed as part of the VECMA project, is used to incorporate the calculation of uncertainty intervals into the workflow. As identified by the VECMA project, different approaches to implementing uncertainty quantification (UQ) are possible. The easiest to implement is to treat the application as a black-box and then apply one of the standard UQ techniques to the inputs and outputs of the black-box. In this case the black-box is the entire workflow run to (quasi) steady-state.
Since the full workflow is expensive to run (of order 100 000 core-hours, taking approximately 5 days), with some remaining uncertainty about whether a true quasi-steady-state has been achieved, two alternatives have been prepared. The first uses the same workflow but replaces the turbulence code by a simple proxy (GEM0) that accepts the same inputs and returns the same outputs. The second, built as a development tool and for tutorial purposes, is an even simpler code that solves equation (1) directly for the temperature, with a prescribed density profile, in a cylindrical approximation, as described in [16] (see also the VECMA EasyVVUQ tutorial website).
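The kind of problem the tutorial code solves, the steady-state limit of equation (1) in cylindrical geometry, can be illustrated with a short standalone calculation. This is an independent sketch assuming a constant product D = n_e χ_e and it is not the FiPy implementation of [16]:

```python
import numpy as np

def steady_state_cyl_temperature(r, source, D, T_edge):
    """Steady-state T(r) of (1/r) d/dr ( r D dT/dr ) + S(r) = 0
    with dT/dr = 0 at r = 0 and T = T_edge at the outer radius.
    D = n_e * chi_e is taken as constant (a simplifying assumption).
    """
    # I(r) = integral_0^r r' S(r') dr'  (cumulative trapezoid rule)
    integrand = r * source
    I = np.concatenate(
        ([0.0], np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)))
    )
    # dT/dr = -I(r) / (r D); finite at r = 0 since I ~ r^2 there
    dTdr = np.zeros_like(r)
    dTdr[1:] = -I[1:] / (r[1:] * D)
    # integrate inward from the edge: T(r) = T_edge - integral_r^a dT/dr' dr'
    steps = 0.5 * (dTdr[1:] + dTdr[:-1]) * np.diff(r)
    return T_edge - (np.sum(steps) - np.concatenate(([0.0], np.cumsum(steps))))

# check against the textbook constant-source solution
# T(r) = T_edge + S (a^2 - r^2) / (4 D)
r = np.linspace(0.0, 1.0, 2001)
T = steady_state_cyl_temperature(r, np.full_like(r, 2.0), D=0.5, T_edge=0.1)
```

With S = 2, D = 0.5 and a = 1, the analytic solution gives a core temperature of T_edge + 1.0, which the numerical profile reproduces.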
We thus have three options for exploring UQ: GEM-WF The full workflow using the GEM turbulence code. This provides a relatively complete set of physics but is too expensive for exploring and developing UQ techniques; it is, however, the ultimate target to which we want to apply those techniques.
GEM0-WF The full workflow with the turbulence code replaced by a cheap proxy that converges to a steady-state in about 30 seconds of CPU time (on one core). It uses the same infrastructure as GEM-WF, so techniques developed with GEM0-WF can later be applied directly to GEM-WF.
Fusion-tutorial A tool that solves a similar problem in about 20 ms, does not depend on the MUSCLE2 infrastructure required by GEM-WF and GEM0-WF, and can be used to explore UQ techniques.

Results
UQ results are shown for the simple fusion-tutorial example, as well as from the transport, turbulence and equilibrium workflow using the GEM0 code in place of the GEM turbulence code. Then physics results from the full transport, turbulence and equilibrium workflow are shown, together with a comparison with experimental data. Finally, results comparing the usual positive triangularity with an artificially created negative-triangularity AUG discharge are shown.

Uncertainty quantification for the simple fusion tutorial example
The advantage of the simple fusion-tutorial implementation is that the source code is open and it is fast. It solves the steady-state heat conduction equation in cylindrical geometry using the 'finite volume PDE solver using Python', FiPy [17], in about 20 ms, allowing for fast investigation of UQ options. Figure 2 shows a polynomial chaos expansion (PCE) ([18] and references therein) investigation using 10 varying parameters (as specified in table 1) with a polynomial of order 2 (resulting in (2 + 1)^10 = 59 049 samples). (More details of the model can be found in section 3 of [16] or on the VECMA EasyVVUQ tutorial website.) The Sobol index results show that most of the variance in the temperature profile at the centre of the plasma comes from the heating profile width; at mid-radius the most important source of variance is the transport coefficient; and at the edge most of the variance comes from the boundary condition. Note that the impact of the edge temperature boundary condition dies out away from the edge.
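In the PCE approach the Sobol indices are obtained from the expansion coefficients. Purely to illustrate what a first-order Sobol index measures, the following standalone sketch estimates the indices by Monte Carlo (a Saltelli-type estimator) for a toy three-parameter model; the model, distributions and sample count are illustrative assumptions unrelated to the tutorial case:

```python
import numpy as np

def first_order_sobol(model, dim, n=100_000, seed=None):
    """Monte Carlo (Saltelli-type) estimate of first-order Sobol indices
    for a model with `dim` independent standard-normal inputs."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, dim))
    B = rng.standard_normal((n, dim))
    fA, fB = model(A), model(B)
    var = np.var(np.concatenate([fA, fB]))
    S = np.empty(dim)
    for i in range(dim):
        ABi = A.copy()
        ABi[:, i] = B[:, i]  # replace column i of A with that of B
        S[i] = np.mean(fB * (model(ABi) - fA)) / var
    return S

# toy linear model: the output variance splits as 16 : 4 : 1
# across the three inputs, so S should approach [16, 4, 1] / 21
model = lambda x: 4.0 * x[:, 0] + 2.0 * x[:, 1] + x[:, 2]
S = first_order_sobol(model, dim=3, n=200_000, seed=1)
```

For this additive model the first-order indices sum to one; interactions between parameters would show up as a shortfall in that sum.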
Some caution is required in drawing physics conclusions from these results: the variances chosen for the input parameters were somewhat arbitrary and the model is much simpler than the real tokamak case.
But the fusion-tutorial application has allowed an examination of the scaling of the UQ analysis. Table 2 shows how long the analysis phase for PCE and stochastic collocation (SC) took for a case with 5 varying quantities, showing that the SC scaling is better than that of PCE. Table 3 shows SC timings for 10 varying parameters. For order 3, 40 days were required to do the SC analysis. An analysis using a dimension-adaptive method with sparse grids (as used in [19]) achieved similar accuracy to the order-3 case, but in 0.25 days and requiring only 1245 samples.
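The gain from the dimension-adaptive approach follows directly from the sample counts: a full tensor-product grid requires (p + 1)^d model evaluations for polynomial order p in d dimensions (assuming p + 1 points per dimension). A quick check of the numbers quoted above:

```python
def tensor_grid_samples(order, dim):
    """Number of model evaluations for a full tensor-product grid:
    (order + 1) points per dimension, raised to the number of dimensions."""
    return (order + 1) ** dim

print(tensor_grid_samples(2, 10))  # order 2, 10 parameters: 59 049 samples
print(tensor_grid_samples(3, 10))  # order 3, 10 parameters: over a million
```

Against the roughly 10^6 samples of the full order-3 grid, the 1245 samples of the adaptive sparse grid represent a reduction of nearly three orders of magnitude.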

Profiles and uncertainties from the transport, turbulence and equilibrium workflow with a simplified transport model
In order to do exploratory studies closer to the ultimate goal, the GEM0-WF is used, in which the turbulence code GEM is replaced by the simpler model GEM0. This gives output fluxes at the same positions as GEM does, but based on a simple analytic model. The model is not meant to give quantitatively similar results to GEM, but is designed to be plug-and-play compatible in the workflow. This means that the runs no longer require 1024 cores, and run in a fraction of a minute rather than days; runs usually converge after about 10 cycles around the workflow, taking about 30 seconds.
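The GEM0 model itself is part of the workflow code and is not reproduced here. Purely to illustrate what a cheap analytic flux proxy can look like, here is a hypothetical critical-gradient-style sketch; the functional form, coefficients and parameter names are invented for illustration and are not the GEM0 model:

```python
import numpy as np

def proxy_heat_flux(T, gradT, R=1.65, chi0=1.0, crit=4.0):
    """Hypothetical critical-gradient flux proxy (NOT the actual GEM0 model):
    the flux switches on once the normalized gradient R*|grad T|/T exceeds
    a critical value, mimicking stiff turbulent transport.
    All coefficients here are invented for illustration."""
    T = np.asarray(T, dtype=float)
    gradT = np.asarray(gradT, dtype=float)
    drive = R * np.abs(gradT) / np.maximum(T, 1e-12)
    return chi0 * np.maximum(drive - crit, 0.0) * np.abs(gradT)

# below the critical gradient the proxy returns zero flux
q_sub = proxy_heat_flux(T=2.0, gradT=-2.0)     # drive = 1.65 < 4
q_super = proxy_heat_flux(T=2.0, gradT=-10.0)  # drive = 8.25 > 4
```

A proxy of this shape is trivially cheap to evaluate at the eight flux-tube positions, which is what makes black-box UQ over the whole workflow affordable.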
As described in [20], using this variant the workflow has been run with 4 uncertain parameters: 3 characterizing the width, position and magnitude of the electron heating profile, and one characterizing the boundary electron temperature. Here a normal distribution with a ±20% variation is used. Results are shown in figure 3. Note that, similarly to the earlier fusion-tutorial case, the impact of the variance in the edge temperature boundary condition dies away as one moves away from the edge. The relative sizes of the contributions to the Sobol indices away from the edge are affected by choices made in the ranges of the varying parameters (which differ from those used in the fusion-tutorial case).
The main purpose of this model is to test whether lessons learned from the fusion-tutorial UQ analysis can be applied to the workflows. If the GEM0-WF UQ analysis can be done efficiently enough, the ultimate goal would be to apply this to the full GEM-WF case. Since the GEM0 variant requires roughly a factor of 10 million less resources than the GEM variant, it is well suited to such exploratory studies.

Results from the full transport, turbulence and equilibrium workflow and comparison with experiment

Multiple runs (≈5) are needed (with a new run starting from the output of the previous run) before a quasi-steady-state is achieved. Figure 4 shows a comparison between the simulations and a few ASDEX Upgrade 'standard H-mode' shots. The initial simulation was set up for an earlier standard H-mode, but not all of the diagnostic signals are available for that shot; similar discharges ('standard H-modes') performed later were used as the basis of comparison. These shots were all 1 MA, −2.45 T, with a line-averaged density of 8.5 × 10^19 m^−3 and heating between 8.2 and 9.5 MW. The heating sources were treated as Gaussian profiles so that sensitivity analysis could be done on the integral, position and width. The density profile was taken from the experiment and the plasma was assumed to be pure deuterium (i.e. no impurities). The experimental uncertainties are given by the integrated data analysis used to combine the different diagnostics. The uncertainty intervals indicated for GEM are those resulting from the turbulence (not those arising from parameter uncertainties) and are the mean and standard deviation calculated over a time window at the end of the simulation. The standard deviations are small because of the limits on changes in temperatures and their gradients (discussed earlier, with limits of 10% and 5% for the values and gradients, respectively).
One of the challenges with these complicated workflows is ensuring that the different components agree on radial coordinates, as well as sharing the same definitions of fluxes and transport coefficients. As an aspect of verification, the fluxes can be compared to the integral of the sources inside the position at which the flux is calculated. Figure 5 shows heavily (time-)smoothed fluxes from the simulation for the eight flux-tubes, together with a measure of the time-variation, compared to the integrated energy source inside the flux surface of each flux-tube. One measure of convergence is that the time-average of the calculated fluxes should approach the source integrals, a process that seems to happen relatively quickly (for comparison, the energy confinement time is 0.068 s). These simulation results were taken from a long run that was used to calibrate various convergence criteria (see [11]) and are almost certainly converged.
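This power-balance check (the time-averaged flux through a surface approaching the integrated source inside it) can be sketched generically in a cylindrical approximation. The grid and source profile below are synthetic; in the workflow these quantities come from the CPOs:

```python
import numpy as np

def source_integral(r, source_density, r_eval):
    """Integrate a cylindrical (per-unit-length) source density out to
    radius r_eval: P(r_eval) = integral_0^r_eval S(r) 2*pi*r dr."""
    mask = r <= r_eval
    return float(np.trapz(2.0 * np.pi * r[mask] * source_density[mask], r[mask]))

# synthetic check: in steady state the flux through radius r_eval equals
# the integrated source inside it divided by the surface length 2*pi*r_eval
r = np.linspace(0.0, 1.0, 4001)
S = np.where(r < 0.3, 5.0, 0.0)         # flat source deposited inside r = 0.3
P = source_integral(r, S, r_eval=0.5)   # all of the source lies inside r = 0.5
flux_at_half = P / (2.0 * np.pi * 0.5)
```

A surface outside the deposition region must carry the full integrated power; the simulated time-averaged fluxes are compared against exactly this kind of source integral.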
More details on the computational aspects of this work can be found in [14] and references therein, which also looks at applying quantitative measures for the validation of the computational results against the experimental results.

Initial comparison of positive and negative triangularity AUG plasma
In order to investigate the effect of negative triangularity on plasma confinement, the outer boundary of the plasma used in section 2.3 was mirrored about the geometric axis of the outermost flux surface, changing the sign of the upper and lower triangularity. Both cases have been started and, while they have not yet converged to a steady-state, initial observations are possible. Figure 6 shows the evolution of the electron and ion temperatures with time. Despite the same number of cycles around the workflow having been performed, the elapsed physics time for the positive-triangularity case is less than that for the negative-triangularity case because of the time-step limiting in the transport code, which prevents too-large changes in profiles and gradients per time-step. Taking data from the end of the simulations and comparing the profiles, figure 7 shows very similar profiles for the two cases. It is to be noted, though, that the boundary conditions (applied at rho_norm = 1) are the same for the two cases, so any impact of triangularity on the boundary temperatures would not be seen in these simulations.

Discussion
A hierarchy of models has been developed: the slowest but most accurate model takes about 100 000 core-hours for a single (quasi) steady-state solution; the intermediate model uses the same infrastructure as the expensive model, but is about 10 000 times faster and requires one core rather than 1024; the fastest model is another 1500 times faster and is much more portable, requiring neither the proprietary physics code nor the coupling framework. This hierarchy allows one to look at different aspects of the problem at the same time: the accuracy of the full physics model as well as the scaling of the UQ methods to a large number of cases (1 000 000 for the most extreme case looked at so far).
The full workflow using the turbulence code (GEM-WF) shows a satisfactory match to the experiment, suggesting that it would be interesting to explore the impact of varying the heating profile and boundary conditions; however, it is still too expensive to do this (except as a heroic one-off).
The full workflow using a simple proxy for the turbulence code (GEM0-WF) has shown that it is possible to do UQ on the workflow, treating it as a black-box, but that a significant number of runs of the code would be needed (hundreds to thousands).
The tutorial fusion application (fusion-tutorial) has allowed UQ aspects to be explored (PCE versus SC, the use of dimension-adaptive sparse grid methods, etc), which will inform the application of UQ to the two workflow cases.
The investigation of positive and negative triangularity did not show a large difference. If the differences are due to turbulence other than the ion-temperature-gradient mode (such as the trapped-electron mode), then they would not be captured by the gyro-fluid turbulence code; if the differences are due to edge pedestal changes, then these too would be missed in these simulations, since the boundary conditions were the same.

Outlook
The future challenge will be to bring the pieces of UQ together with the stochastic nature of the turbulence code to examine the feasibility of producing profile predictions, incorporating a turbulence code, together with uncertainty intervals. The work on adaptive sparse grid UQ has indicated that the number of samples required can be substantially reduced, but it would still be too expensive to apply this directly to the GEM-WF. Work has started on exploring the application of UQ to the components of the workflow ([20] has done this for the GEM component, calculating the Sobol indices for the electron and ion heat fluxes as functions of the temperatures and their gradients), and the next step would be to explore propagating distribution functions around the workflow, rather than just the current profiles. Another approach under investigation is the replacement of the expensive model by a faster surrogate based on results from the expensive model.
If this proves feasible with the gyro-fluid turbulence code used here, the extension to gyro-kinetic turbulence codes will be a conceptually easy subsequent step. However, to do this, access to large compute resources will be needed.