Development and simulation of multi-diagnostic Bayesian analysis for 2D inference of divertor plasma characteristics

We present results of the design, implementation and testing of a Bayesian multi-diagnostic inference system which combines various divertor diagnostics to infer the 2D fields of electron temperature Te, density ne and deuterium neutral density n0 in the divertor. The system was tested using synthetic diagnostic measurements derived from SOLPS-ITER fluid code predictions of the MAST-U Super-X divertor which include appropriate added noise. Two SOLPS-ITER simulations in different states of detachment, taken from a scan of the nitrogen seeding rate, were used as test-cases. Taken across both test-cases, the median absolute fractional errors in the inferred electron temperature and density estimates were 10.3% and 10.1% respectively. Differences between the inferred fields and the test-cases were well explained by solution uncertainty estimates derived from posterior sampling. This work represents a step toward a larger goal of obtaining a quantitative, 2D description of the divertor plasma state directly from experimental data, which could be used to gain better understanding of divertor physics phenomena.


Limitations of conventional divertor diagnostic analysis
The divertor of a magnetic confinement fusion device is a complex system involving transport, atomic, molecular and impurity processes in the plasma as well as at the divertor surfaces, all giving rise to energy, momentum and particle sources and sinks [1]. These processes are influenced by other aspects of the divertor such as the divertor geometry (e.g. what fraction of recycled neutrals escape the divertor) and magnetic topology. All of the above make it difficult to separate out the effects of individual processes to verify whether our physics understanding, embodied in 2D models of the divertor such as SOLPS-ITER [2], are correct.
Despite the variety of diagnostic systems available in the divertor, they each have limitations such that any single instrument cannot directly determine the 2D fields of plasma characteristics (e.g. electron/ion temperature and density) or the properties of the neutrals (e.g. atomic and molecular densities). For example, Langmuir probes and Thomson scattering systems can directly measure some of these fields, but do so only at a series of isolated points. Filtered camera imaging systems can collect information from a large fraction of the divertor crosssection, but provide line-integrated measurements of spectral line emissivities, which are a complicated function of the underlying plasma fields [3].
Due to these limitations, studying the physics of divertor plasmas has often relied on matching the predictions of codes like SOLPS-ITER to diagnostic measurements, to find a set of plasma fields which are consistent with the available data. This matching process can be extremely time intensive, and typically requires 'by-hand' tuning of input parameters (e.g. recycling coefficients, boundary conditions such as upstream density, transport coefficients etc) over many iterations, and a period of weeks to months. Conceptually, in this approach assumptions are made regarding how divertor physics processes and boundary conditions determine the divertor plasma state, which then implies a corresponding set of expected diagnostic measurements.
We propose to instead take the inverse approach, where starting from the diagnostic measurements the plasma state is inferred, and from the inferred plasma state the underlying physics processes can be determined. Here we demonstrate that the first part of this approach, direct inference of the divertor plasma state, is possible using an 'integrated' approach to divertor analysis in which data from multiple diagnostic systems are combined.
Such an integrated approach, if successful, would not serve as a replacement for 2D divertor modelling codes. Rather it is an alternative path to studying the role of various divertor processes and how they vary during and across discharges. It may provide an independent test of the validity of the physics we believe is responsible for determining the divertor plasma state, and is implemented within 2D divertor modelling codes.
For the purposes of this study we use the geometry and planned diagnostics of the MAST-U spherical tokamak as a test-case to investigate integrated divertor diagnostic analysis [4]. The MAST-U divertor will be well diagnosed, possessing a multi-wavelength imaging (MWI) system based on the MANTIS system at TCV [5], which can simultaneously image the divertor for each of up to 10 atomic lines, spectrometers, bolometers, Langmuir probes and a dedicated divertor Thomson scattering system. A cross-section of MAST-U is shown in figure 1, and the coverage of diagnostics relevant to our analysis is illustrated in figure 2.

An integrated, Bayesian approach to divertor analysis
We will make use of the Bayesian approach to data analysis, in which probability is used as a means of quantifying the information content of experimental data with respect to model parameters. By formalising the information content in this way, we are able to combine data from multiple diagnostics in order to strengthen our knowledge of the plasma fields. This is highly desirable, but comes at the cost of an increase in the complexity and computational expense of the data analysis, as typically all data must be analysed simultaneously. Multi-diagnostic Bayesian analysis has been successfully applied within tokamak plasma studies for profile diagnostic analysis [6,7] and equilibrium reconstruction [8,9], but not yet to inference of the 2D divertor plasma state.
Here we discuss the design, implementation and testing of a Bayesian multi-diagnostic inference system for the MAST-U Super-X divertor which aims to infer the fields of plasma electron temperature and density, and hydrogen neutral density, throughout the divertor, including associated uncertainties. In section 2 we discuss the parameterisation of the problem and design requirements of the system. In sections 3 and 4 we show how information regarding the plasma fields in both measurement data and prior knowledge may be expressed as probability distributions. In section 5 the construction of synthetic test-cases using SOLPS simulations is discussed. In section 6 we discuss the numerical strategies used to characterise the posterior distribution for the plasma fields. The results of analysing the synthetic data are presented in section 7. A discussion of potential improvements and further work is given in section 8, followed by conclusions in section 9.

Parametric representation of plasma fields
We choose to represent each of the 2D plasma fields via linear interpolation on a triangular mesh, shown in figure 2, which covers the relevant areas of the divertor cross-section. Specifically, this means that by defining the value of a field at each vertex of the mesh, that field is defined continuously inside each triangle of the mesh as the plane that connects the three points which define that triangle.
Using this approach a field, for example the electron temperature field T e (R, z), is defined as where ( ) T e k is the electron temperature at vertex k, ( ) f R z , k is the linear interpolation basis function for vertex k, and V is the number of mesh vertices. This model for the plasma fields has the advantage that the model parameters themselves are the values of each field at each mesh vertex, allowing physics constraints to be easily applied. For example, to ensure that the electron temperature field is greater than zero everywhere, we need only ensure that the parameters which set the temperature at each vertex are greater than zero, i.e. that ( ) > " The mesh shown in figure 2 was used to produce all results presented in this paper. It was generated by first creating a mesh of equilateral triangles of side length 35 mm which aligns with a toroidally-symmetric approximation of tile 5, where the outer strike-point will typically be located. In select regions of the mesh covering the expected position of the divertor leg and strike point, triangles were partitioned to produce a higher-resolution area with side-lengths of 17.5 mm, yielding a refined mesh with V=586 vertices.

Design requirements
To guide the direction of the system design a set of requirements were chosen. Firstly, we want to be able to choose easily which diagnostics are included in the analysis. This means that diagnostic systems should be able to be added or removed from the analysis without making direct alterations to the system code. Instead, there should be a 'higher-level' interface for specifying the choice of diagnostics.
A key part of the system are the diagnostic forward models. Also sometimes referred to as 'synthetic diagnostic' models, forward models simulate the experimental data we would expect to measure using a particular instrument under a given set of plasma conditions (in this case, the 2D fields of electron temperature T e , electron density n e and hydrogen neutral density n 0 defined by the mesh). There may be many possible forward models for a given diagnostic, which vary based on the physics assumptions they make, their level of complexity and their computational cost. As before, we want an interface which allows us to specify which model is used for a given diagnostic system without making changes to the code.

Choice of diagnostics
In order to infer T e , n e and n 0 without including additional fields, we must choose diagnostics whose measurements can be predicted using these fields only. The divertor Thomson scattering system is able to make direct measurements of T e and n e making it an obvious choice [10]. We also include the target Langmuir probes on tile 5 (shown in figure 2) as they are located where the primary heat and particle fluxes are incident on a divertor surface. The probe measurements, under certain assumptions, can also be modelled using only T e and n e [11].
The emissivity of line radiation has a dependence on both T e and n e , so any filtered camera data will carry information about the T e , n e fields. However, the emissivity of a given line also depends on the density field of the corresponding emitting species, which is charge-state and metastable-state resolved. For the initial development and testing of this technique, we consider only hydrogenic line emission, which can be modelled using only T e , n e and n 0 . Consequently we include 4 filtered cameras which view the hydrogen Balmer α, β, γ and δ lines, which correspond to the n=3, 4, 5, 6 to n=2 transitions respectively. This initial version of the system represents a first step toward integrating as many divertor diagnostics as possible; other diagnostics which may be included in future are discussed in section 8.

Bayesian multi-diagnostic analysis
In Bayesian analysis, our knowledge regarding a set of model parameters is expressed as a probability distribution over those parameters, which can be thought of as the distribution of possible 'causes' that could have produced the measured data.
In this case the set of model parameters, which we will call θ, are the values of T e , n e and n 0 at each vertex of the mesh (which are used by the model in (1) to specify the plasma fields) such that Our goal is to learn about the distribution of θ constrained by the set of diagnostic data, which is commonly referred to as D. This distribution is expressed mathematically as the probability of θ given D, i.e.
. This is called the 'posterior distribution', and is given by Bayes' theorem as Constructing the posterior distribution and learning about its properties is absolutely central to Bayesian analysis, so it is worthwhile to discuss the terms on the right-hand side of (3) individually. P(θ) is the prior distribution, and represents any information we have regarding the model parameters before we include information from the diagnostic data. For example, this information may be a physics constraint such as nonnegativity of the plasma fields. This information could be encoded into the prior distribution by having the prior probability fall to zero if any field values are negative. Typically the prior distribution must be chosen rather than derived-this choice will be discussed in section 4. D ( ) P is usually referred to as the model evidence, and is important in model selection problems, however we may ignore it in this analysis as the posterior need only be determined up to a constant of proportionality in order for it to be characterised.
is the likelihood, and is the probability that we would observe a dataset D assuming the plasma were in a state described by a given θ. The use of D serves as a useful shorthand to represent distributions over many individual data values. For example, suppose that d (i) represents a single data value from our full dataset-the likelihood is actually the joint distribution over every individual data value given the model parameters, i.e. ( |) , , n 1 2 we may write the likelihood more concisely as D ( | ) q P . If some set of random variables, in this case D, are mutually conditionally independent (i.e. the uncertainties of all data values are independent) then the joint distribution of all the variables can be written as the product over the distributions for each variable such that This assumption of independence may not always be valid and depends on the instruments in question, but it is strongly simplifying so should be made where possible.

Individual diagnostic likelihoods
In multi-diagnostic inference, it is often practical to separate out the overall likelihood for all data into a product of the likelihoods for each diagnostic system. Let the dataset for the Thomson scattering system and Langmuir probes be labelled D ts and D lp respectively. We will separate out the data for each filtered camera, such that data for the i'th camera is represented by D i fc, . Again making the assumption of mutual independence between the datasets, the likelihood for all data can be now written as It is common practice to work in log-probabilities, not only for the conceptual simplification that large products of probabilities become sums of log-probabilities, but also for improved numerical stability. Here we use L to indicate a log-probability density function, such that L( | ) . Now combining (3) and (5) we can express the log-posterior distribution All terms in (6) (except L D ( ), which is in practice discarded) can be evaluated independently. From a programming perspective, this allows each term to be implemented as a separate, selfcontained object, encapsulating all experimental data and forward models required to evaluate that term. This approach was used when designing the system, and the resulting structure of the code is illustrated in figure 3. This allows any of the terms to be easily included or excluded from the log-posterior, fulfilling one of the design requirements.

Thomson scattering and langmuir probe likelihoods
A single Langmuir probe or spatial channel of the divertor Thomson scattering system accumulate their signal over a volume which can be thought of as a spatial instrument function. However, if the extent of this instrument function is small compared to the scale lengths over which the relevant plasma fields vary, we may approximate them to be point measurements. Making this approximation de-couples the analysis of the raw Thomson and Langmuir data from the problem of inferring the fields. For example, the posterior distribution for electron temperature and density for a single Thomson channel can be computed in advance and stored, and then referred to when assessing the likelihood of that spatial channel with respect to a set of proposed fields.
This approximation while convenient is not strictly necessary, and in future when the system is applied to real experimental data we may forgo this assumption and forward-model from the proposed fields directly to the raw Thomson scattering and Langmuir probe data. Presently however, we seek only to demonstrate that the multi-diagnostic inference approach has value, so we are free to prescribe a sensible likelihood for the data of a point measurement given T e and n e . For this purpose, we use an uncorrelated bivariate normal distribution such that is also of the form given in (7).

Filtered camera system likelihood
The emissivity at the j'th mesh vertex for a given hydrogen spectral line E j is approximated as a sum of excitation and recombination emission such that where PEC , PEC ex rec are the photon emissivity coefficients for excitation and recombination respectively, whose values are taken from the ADAS database [3]. This model assumes that only atomic emission channels contribute meaningfully to the hydrogenic spectral emission and that = Z 1 eff . The experimental data are camera images, each of which are analysed as vector of pixel-brightness values b. The brightness at the ith pixel b i is modelled as the integral along that pixel's line-of-sight through the emissivity field defined by the values in the emissivity vector E. As the fields are defined through Barycentric interpolation, which is linear, this lineintegral can be represented exactly by a weighted sum of the emissivities at each mesh vertex. Given a particular mesh, and a set of lines-of-sight for the pixels, these weights can be precalculated and stored as a 'geometry matrix' G such that the product of this matrix with the emissivities E G yields a prediction of the pixel brightness values. We represent the filtered camera likelihood as multivariate normal such that Experimental calibration of filtered camera systems typically finds the variance of the pixel brightnesses (assuming the pixel is not near saturation) to be linear [12] such that where α, β are constants determined as part of the calibration. For our synthetic camera model, we re-parametrise (10) so that the coefficients are more easily interpreted. First suppose that the error at zero brightness can be expressed as some fraction f 0 of the maximum brightness b max . Second, we fix the fractional error at the maximum brightness to be a constant f max . Under these assumptions the variance may be expressed as

Prior constraints
We are always forced to choose a prior distribution-even omitting the prior is equivalent to using a uniform prior (i.e. one which deems all possible sets of θ to be equally likely), which is itself a choice. Our goal here is to construct a prior which excludes unrealistic plasma conditions. In order to do this, we require information about the space of realistic plasma Flow chart illustrating the code structure of the system. The posterior distribution is encapsulated as a single object which takes model parameters (which define the plasma fields) as inputs and returns the posterior log-probability. The prior distribution and the likelihood for each diagnostic system are separated into selfcontained objects, which can independently request the specific information they require about the plasma fields from the plasma state object. The diagnostic and prior objects each return a logprobability value, which are summed to produce the posterior logprobability. The 'MCMC sampler or global optimiser' is responsible for choosing the next set of parameters to pass to the posterior object, with the objective of either sampling from the posterior or finding its maximum.
conditions that exist within the divertor. To gain insight into this, we examined a collection of 25 MAST-U SOLPS simulations which were carried out in support of a study on enhancements to the plasma exhaust operational space of MAST-U [13]. These simulations cover a range of plasma densities at the core grid boundary (´´-3.6 10 1.5 10 m 18 20 3 ), and heating powers (  1.7 2.5 MW). In each case, the fields of T e , n e and n 0 in the lower divertor were extracted, and the values of each field across all simulations were gathered into a single dataset. By plotting the gathered field values against one another we are able to derive simple but useful constraints on plasma conditions which dictate whether a given triple of (T e , n e , n 0 ) is considered realistic-these results are summarised in figure 4.
Our chosen prior is made up of three components: a constraint on the static electron pressure, a constraint on the neutral fraction and a constraint on the spatial 'smoothness' of the plasma fields. This can be expressed mathematically by writing the log-prior L( ) q as a sum of three terms, one for each constraint such that We will now discuss each of these terms individually.

Static electron pressure prior
The prior on the static electron pressure for each vertex is uniform if the pressure is less than the chosen limit P e max , and Gaussian for values above the limit. The resulting static electron pressure log-prior is The value of σ prs can be thought of as a 'fractional tolerance' of the limit P e max , i.e. by what fraction the limit may be violated before the prior probability drops significantly. Based on the SOLPS data we set =´-

Neutral fraction prior
The upper limit on the neutral fraction at each vertex f max i is set as a function of the temperature at each vertex such that where c=0.04 and l=5 eV. In figure 4 this limit is shown to be greater than 99.5% of neutral fractions in the SOLPS dataset. The neutral fraction prior has the same form as that used for the static electron pressure in (13)  where σ frc =0.1.

Spatial smoothness prior
We want our prior to favour solutions where the plasma fields are spatially smooth. This requires us to choose a metric for the overall 'roughness' of the fields, so that solutions which are too rough can be penalised by assigning them a lower prior probability. Suppose v is a vector of field values at each mesh vertex, and define the 'umbrella' matrix operator U such that The product Uv is then a vector of differences between the field value at each vertex and the average field value of all vertices to which it is connected. For a purely equilateral mesh, if the value of a vertex and all its neighbours lie in a plane, then this difference will be exactly zero. In this sense the umbrella operator measures how much the field deviates from a plane in the local region of each vertex. We therefore take the sum of the squares of the umbrella differences, | | Uv 2 , as our metric for the total 'roughness' of a field.
It is helpful to consider whether the fields can be transformed such that the expected solutions for the transformed fields better satisfy the assumption of smoothness. Enforcing smoothness on these transformed fields means that real features of the fields, which we want to preserve, are less likely to be penalised by the smoothing prior.
For this reason we enforce spatial smoothness on the natural log of the plasma fields, rather than the fields themselves. Let˜˜T n n , , e e 0 represent the vectors of log-temperature, log-density and log-neutral density at each vertex of the mesh, and define =  S U U. The roughness of one of the log-fields, for example the log-temperature, can now be written as |˜|˜= Unlike the priors on the static electron pressure and neutral fraction, which effectively set upper limits on those quantities, the smoothness prior has a strong impact on the entire posterior distribution. Consequently, additional work is required to select an appropriate value for s smth -this is discussed further in section 7.

Bounds on field values
Upper and lower bounds are placed on the electron temperature, density and neutral density at every vertex. These bounds, chosen based on the SOLPS data, are given in table 1. The bounds could be imposed by including an additional term in the definition of the log-prior in (12), but in practice it is easier to allow the bounds to be enforced by the optimisation or sampling algorithm which is being used to characterise the posterior distribution.

SOLPS test-cases
In order to test the system we require synthetic data for each instrument, and that this data is as representative as possible of the real experimental data which will be measured during MAST-U operation. For this purpose we use results from SOLPS simulations of the MAST-U edge and divertor to prescribe the fields of electron temperature, density and neutral density from which the synthetic data will be derived.
Here we consider two SOLPS cases taken from a scan of the nitrogen seeding rate to detachment. Both cases have the same magnetic equilibrium, 2.5 MW of heating power and a deuterium fuelling rate of´-2 10 s 21 1 . The two cases, which we will from now refer to as the low-and high-seeding cases, have nitrogen seeding rates into the divertor of 2×10 20 s −1 and 5×10 20 s −1 respectively. These two cases are not part of the set used to inform the prior constraints discussed in section 4, however their field values lie well inside the limits set by the chosen prior.
Note that although in the SOLPS data itself the electron density and hydrogen ion density fields maybe be different due to the presence of the seeded nitrogen, we set them to be equal when producing synthetic data, as this equality is assumed in the emission model in (8).
The field values on the SOLPS grid are interpolated on to the triangular mesh prior to producing the synthetic data, such that the resulting mesh representation of the fields becomes a test-case which we will attempt to reconstruct. The meshrepresentations of the plasma fields for each of the two testcases are shown in figure 5.

Addition of simulated noise to synthetic data
After synthetic measurements for each instrument are generated using their respective forward-models, simulated noise is added to the data. For the filtered camera images, the variance of the noise added to each pixel is set according to (11)

Characterising the posterior distribution
Now that all terms in (6) which have a dependence on θ have been defined, the posterior log-probability can be evaluated for any chosen set of plasma fields. The posterior must now be characterised in a way which allows us to extract useful information about the plasma fields.

Maximum a posteriori estimation
The first stage of characterising the posterior is to find the set of model parameters which maximises its value, referred to as 'maximum a posteriori' (MAP) estimation [14]. To locate this maximum, we employ a 'hybrid' approach which combines both stochastic and gradient-based optimisation-a more detailed description of this approach is given in the appendix.

Hamiltonian Monte-Carlo sampling
Although MAP estimation yields a useful single-value estimate of the model parameters, it does not provide any information regarding the uncertainties associated with that estimate. To characterise these uncertainties we employ 'Hamiltonian Monte-Carlo' (HMC), a gradient-based sampling algorithm which is The value of the smoothing prior uncertainty σ smth , which appears in (17), can have a strong impact on the posterior distribution and therefore the MAP estimate. To assess this impact we evaluated the mean absolute difference between the MAP estimate and the low-seeding case at all vertices for a range of values of σ smth -the results of this scan are shown in figure 6. The minima in the error for each field are fairly broad, but do not all occur at the same value of σ smth . The results presented here used a value of σ smth =0.2 which provides a good balance between low error in electron temperature and electron density. In an applied case using experimental data we cannot select the smoothing uncertainty in this fashion as the true values of the fields are unknown. As such, testing selection criteria for the smoothing uncertainty which are applicable to experimental data will be the subject of further work.

Comparison of inferred fields and test-cases
The maximum a posteriori (MAP) and marginal expectation (MEX) estimates were evaluated as described in section 6 for both test-cases. The inferred field values from the MEX estimate are compared with those from the corresponding testcase in figure 7. The mean-absolute-difference between the inferred and test-case fields is used to quantify the accuracy of the estimates. The mean-absolute-difference for the electron temperature is given by ,test are the inferred and test-case electron temperature at vertex k respectively. The mean-absolute-differences for each test-case are given in table 2 for both the MAP and MEX estimates. The marginal expectation appears to outperform the MAP estimate for these cases, but with the exception of the high-seeding neutral density estimation, the differences in the mean absolute error values are less than 10%.
We note that the estimate of the electron temperature becomes less reliable above ∼10 eV-this may occur because at these higher temperatures the emission is almost purely due to excitation, which is very insensitive to electron temperature above 10 eV for the Balmer series.
Conversely, in regions where the temperature is very low, the emission becomes dominated by recombination, which has no dependence on the neutral density. We suspect this is the cause of the large errors in the neutral density estimation for the high-seeding test-case, which is more strongly detached than the low-seeding case, and therefore has a large region of recombination-dominated emission.
The absolute fractional error | | ( ) ,test is another useful metric for gauging the overall accuracy of the inferred fields. Averaged over both the low and high-seeding testcases, the median absolute fractional errors in the electron temperature and density estimates were 10.3% and 10.1% respectively. We use the median rather than the mean in this case as it is more robust against large outliers that can occur when field values get very small.
The inferred fields for the low-seeding case, along with the differences between the inferred fields and the test-case are shown in figure 8. These difference plots highlight spatial structure in the estimation errors, such as the under-estimation of the temperature along the separatrix. The peak in the electron temperature at the separatrix is a very sharp feature which will be penalised by the spatial smoothing prior. This, combined with the relatively weak temperature dependence of the emission in that region, is likely the reason for the underestimate of the separatrix temperature. This highlights a common difficulty of regularising solutions which possess a wide range of spatial scale-lengths-any level of smoothing which suppresses non-physical fluctuations in regions with a long scale-length will also over-smooth in regions with shortscale lengths.
Tests were also carried out wherein only measurements from a single filtered camera were used to constrain the plasma fields in order to verify the effects of a multi-diagnostic approach. The MAP estimates of the fields obtained in these tests were completely erroneous, and posterior sampling predicted very large uncertainties in the solution. Although imaging data for a single emission line does provide some information about T e , n e and n 0 , it is not a strong enough constraint to estimate them with useful accuracy. This is because the set of plasma fields which reproduces the measured image to within experimental error is too large and too varied. Inclusion of additional filtered images of different Balmer lines improves the estimate significantly because the size of this set of potential solutions is greatly reduced, as now a valid solution must reproduce all of the images simultaneously rather than just one.

Uncertainty estimation
Uncertainties in the inferred fields for both test-cases were estimated by sampling from the posterior distribution using Hamiltonian Monte-Carlo. Figure 9 shows a comparison of the test-cases and inferred fields along a scrape-off layer flux-surface (shown in figure 2), and shows the 95% highest-density interval derived from the sample. We see that the differences between the testcase values and the inferred fields are well explained by the estimated uncertainties almost everywhere. One notable exception is that the uncertainty in the electron temperature and density appears to be under-estimated close to the target.
For an inverse problem of this type the posterior is typically highly multi-modal. It is possible that the Markovchains used to generate the sample were trapped near the maxima corresponding to the MAP estimate, and were unable to explore other maxima which may feature more varied configurations of the fields near the target. To investigate this we plan to test extensions to standard Markov-chain Monte-Carlo which are designed specifically to allow exploration of multi-modal distributions such as parallel tempering [16].

Inference of physical processes
The long-term goal of developing this analysis is to help advance our understanding of divertor physics by providing direct information about the 2D divertor plasma state. We are therefore interested not only in plasma fields like T e and n e , but also the behaviour of physical processes like ionisation and recombination which are partially responsible for determining the plasma state.
If these processes can be modelled using only the inferred plasma fields, then their values and uncertainties can also be inferred from the posterior sample. An example of this is shown in figure 10, where the deuterium ionisation rate along a scrape-off layer flux-surface was calculated from the posterior sample, and is compared with the corresponding ionisation calculated from the test-cases.

Potential improvements to instrument modelling
All synthetic diagnostic models are 'idealised' to some extent, as they cannot reasonably capture every subtlety of the experimental set-up perfectly. Our goal however should be to make these models more realistic where possible, and this will be the focus of further work on the system before it is applied to real experimental data.
For example, uncertainty in the absolute brightness calibration of filtered cameras is a potentially important effect for which we do not currently account. This can be achieved by including the calibrations as so-called 'nuisance parameters'. This process involves allowing the calibration values   (18) for T e , n e and n 0 in each test-case, and for both the MAP and MEX estimates.
Low-seeding MAP themselves to be free parameters in the system, with a prior distribution determined by the measurement of those calibration values. By allowing the calibration values to vary in this way, effects of the uncertainty in their value are reflected in the inferred plasma fields. Some of the light collected by the MWI system will have reached the camera after being reflected by a material surface in the divertor. The algorithm used here to calculate the Geometry matrix G, which appears in (9), only accounts for light which has travelled directly from the plasma to the camera. It is however possible to account for reflections from material surfaces by using a more sophisticated approach to calculating the geometry matrix [17], with an associated increase in the computational cost of the filtered camera forward-model.
It may be the case that in practice, the Langmuir probes are unable to measure the electron temperature with an uncertainty comparable to that which we assume when generating synthetic data when the temperature drops below 5 eV. In such cases, the probes may only provide an 'upper limit' measurement on the temperature. Accounting for this will require forward-modelling to produce synthetic probe data, which can be analysed to calculate joint-distributions of T e , n e to be used in place of the assumed Gaussian errors.
In this work the SOLPS test-cases were treated as if no impurities were present. However, in real experiments with strong impurity seeding, the presence of impurities can affect measured diagnostic signals. Consequently, modifications to the diagnostic forward-models may be required to properly account for these effects. For example, the emission model in (8) assumes that the hydrogen ion and electron densities are equal, which may no longer be reasonable with strong seeding. Although the effect is expected to be small, impurities can also impact Langmuir probe measurements [18].

Inclusion of additional diagnostic systems
The MAST-U divertor spectroscopy system could be a useful additional source of information for inferring the plasma fields. The system will observe a large number of spectral lines, including many from various impurities, so modelling all data produced by the spectrometers is not feasible. However, if we restrict the analysis to spectral lines which are already being viewed by the MWI system, then this data can be modelled without greatly increasing the number of model parameters. The brightness of these lines as measured by the spectrometers would provide a cross-check on the brightnesses measured by the MWI system, and it may also be possible to constrain the electron density along the spectrometer line-of-sight using information encoded in the spectral line-shape due to Stark-broadening [19].

Choice of imaged spectral lines
The emissivity model in (8) does not account for emission resulting from the production of excited deuterium atoms due to plasma-molecule interactions. This emission may be a nonnegligible component of low-n Balmer series emissivities, particularly for deuterium Balmer-α, in strongly detached conditions [19,20]. Deuterium-α through δ were chosen as a starting point from which to develop and test the system, but there are many possible choices of atomic lines, including higher-n Balmer lines and impurity emission lines. Determining the optimal group of lines for inferring the plasma fields is complex-one needs to consider not only the information content of the lines with respect to the plasma fields, but also how well those lines can be measured (considering their brightness, wavelength and contamination from neighbouring spectral lines), how accurately their emissivity can be modelled, and the total number of plasma fields required in order to model them. Testing alternative groups of atomic lines which best meet these criteria will be an important part of the ongoing development of the system.

Imposing physics constraints using equilibrium information
The spatial structure of the T e and n e fields in the divertor is closely tied to that of the magnetic field and the resulting fluxsurfaces. If the mesh used to parametrise the plasma fields were constructed such that every vertex of the mesh lay on one of a chosen set of flux-surfaces, this would allow additional physics constraints to be imposed.
For example, we could include an additional term in the prior distribution which requires that the electron temperature decrease monotonically along each flux surface when approaching the strike-point. It would also allow for a more powerful constraint on spatial smoothness of the fields, as we could require that the fields vary much more smoothly along flux-surfaces than perpendicular to them.
There are some disadvantages to this approach however. The equilibrium reconstruction required to build such a grid is itself an inverse problem whose solution is uncertain. Given equilibrium reconstructions which include uncertainty estimates (which are typically unavailable), propagating this uncertainty through the system so that it is reflected in the uncertainties on the plasma fields would be difficult, as the grid remains fixed throughout the analysis.
Imposing physics constraints in this way, and the effect this has on the accuracy of the inferred 2D plasma fields will be explored as part of further work.

Toroidal asymmetry effects
The major radius of tile 5 in the MAST-U divertor (visible in figure 2) varies as a function of toroidal angle, with a periodicity that matches the toroidal-field ripple of the device in order to distribute power more evenly across the tile surface.
Cases in which there is strong emission close to the surface of tile 5 may introduce 3D effects into filtered camera measurements which cannot be reproduced by the 2D model we use here for the plasma fields. The significance of this effect could be explored theoretically using 3D simulations of the MAST-U divertor which account for the toroidal asymmetry of tile 5.

Summary and conclusions
We have presented details of the first design, implementation and testing of a Bayesian multi-diagnostic inference system, which can infer the 2D fields of electron temperature, density and neutral density over the divertor cross-section. The system has been designed to be modular and flexible, so that the diagnostics utilised by the system, and the underlying fields that are to be obtained by the analysis, can be changed easily.
For this initial test of the system we restricted the inferred plasma fields to only electron temperature, electron density and deuterium neutral density. These fields were inferred using simulated experimental data, which included appropriate added noise, derived from two SOLPS-ITER simulations taken from a scan of the nitrogen seeding rate. The synthetic diagnostic models used to generate the simulated data included four filtered cameras as part of the MWI system viewing the first four Balmer lines, as well as divertor Thomson-scattering system and target Langmuir probes.
These system tests have demonstrated that for the given synthetic data, the 2D plasma fields can be inferred with enough accuracy to give powerful insight into the physics of plasma behaviour in the divertor. It was also demonstrated that uncertainties in the inferred plasma fields can be reliably estimated using Hamiltonian Monte-Carlo sampling, which would allow conclusions to be drawn from the results with greater confidence.
This first effort at Integrated data analysis for the divertor has thus been successful in demonstrating that the use of a Bayesian, multi-diagnostic approach to infer the plasma solution merits further investigation. Future work will focus on the inclusion of additional diagnostic systems, and the application of this analysis to real diagnostic data from tokamak experiments. 2019-20 under grant agreement no 633 053. The views and opinions expressed herein do not necessarily reflect those of the European Commission. This work has also received funding from the EPSRC under the grant EP/N023846/1 and the research by B Lipschultz was funded in part by the Wolfson Foundation and UK Royal Society through a Royal Society Wolfson Research Merit Award as well as by the RCUK Energy Programme (EPSRC grant number EP/I501045).

Appendix. Optimisation methodology
The line-integrated nature of filtered camera measurements, and the nonlinear relationship between line emissivities and the plasma fields introduces strong nonlinear correlations between the model parameters. The presence of correlations and a large number of free parameters (around 1800 in this case) usually necessitates the use of gradient-based optimisation and sampling techniques.
Such techniques require the derivative of the log-posterior with respect to the model parameters. However approximating this derivative via finite-difference is prohibitively expensive as the number of model parameters is large. The system was therefore designed such that the gradient of the log-posterior can be calculated analytically. As a result, evaluating the gradient takes around 3 times longer than evaluating the log-posterior itself-this is approximately 600 times faster than evaluating the gradient using finite-difference for the current number of model parameters.
However, we also found the posterior distribution to be highly multi-modal, which causes issues for gradient-based optimisation algorithms which tend to converge to local rather than global maxima.
To address this we employ a 'hybrid' approach which combines a genetic algorithm with the L-BFGS algorithm [21]. In this approach a set of candidate solutions is created (initially by random sampling), and then each candidate is used as a starting-guess for the L-BFGS algorithm, which convergences to a (typically local) maximum in the posterior log-probability density. Based on the resulting set of local maxima, a new set of candidate solutions is generated using the genetic algorithm, and this process is repeated until the highest observed log-probability converges.
Evaluating the L-BFGS algorithm for each candidate solution is an independent computation, allowing them to be efficiently distributed across multiple CPUs. The results presented here used a population of 20 candidate solutions distributed over 20 threads of a Intel Xeon E5-2695 v3. The maximum log-probability had converged sufficiently after 80 generations, taking around 80 min in total.