Model-based electron density profile estimation and control, applied to ITER

In contemporary magnetic confinement devices, the density distribution is sensed with interferometers and actuated with feedback controlled gas injection and open-loop pellet injection. This is at variance with the density control for ITER and DEMO, that will depend mainly on pellet injection as an actuator in feed-back control. This paper presents recent developments in state estimation and control of the electron density profile for ITER using relevant sensors and actuators. As a first step, Thomson scattering is included in an existing dynamic state observer. Second, model predictive control is developed as a strategy to regulate the density profile while avoiding limits associated with the total density (Greenwald limit) or gradients in the density distribution (e.g. neo-classical impurity transport). Simulations show that high quality density profile estimation can be achieved with Thomson Scattering and that the controller is capable of regulating the distribution as desired.


Introduction
Future reactors, such as ITER and DEMO, aim to have a net energy gain [1]. This requires these reactors to operate close to physical limits and to optimize quantities such as temperature, density, and confinement.
In practice, shot-to-shot differences in reactor conditions will always be present and the optimal control action will vary subject to the plasma scenario and plasma state. Hence, active plasma feedback control is essential in achieving these requirements [2].
In a tokamak, the produced fusion power directly correlates to the density in the hot core of the reactor [3], particle transport and non-inductive current drive depend on the gradient of the spatial particle distribution [4,5], a.k.a the (particle) density profile.
Furthermore, turbulence and impurity transport depend on the logarithmic gradient of the density profile [6][7][8] and the density is subject to limits that can lead to detrimental plasma instabilities when violated [9][10][11]. Consequently, for optimal and reliable high-performance operation of fusion reactors, particle density profile estimation and control are mandatory [12][13][14][15].
To synthesize such controllers, model-based control design is required as experimental time for tuning and validation on reactors such as ITER will be scarce and expensive [16,17]. Model-based design has already successfully been applied to synthesize controllers for plasma shape [18] and profiles in a tokamak [19][20][21][22]. Especially, in [23], a controller for the current density profile is presented that can handle plasma limits and actuator constraints simultaneously. The design procedure requires a control-oriented model of the inputoutput dynamics, i.e., a model that captures the main dynamic relationships between inputs (actuators) and outputs (plasma quantities) to be controlled.
Recently, the control-oriented model RAPDENS was developed for the particle transport dynamics in a tokamak [24]. This model has been used to perform density profile estimation in TCV and AUG with interferometry and spectroscopy [24,25], to design a controller for the volume-averaged density during the ITER ramp-up with pellets and gas [26], to estimate the core density in density control experiments with pellets in AUG [27], and to derive a controller for the volume-average density in TCV using gas injection [25].
However, a controller for the particle density distribution,: a) that is capable of handling shot-toshot differences; b) can handle state physical limits; c) and uses multiple actuators is one key research and development aspect still to be solved for a fusion reactor.
In this work, a two-step model-based approach is proposed to develop such a density profile controller for ITER. The first is the estimation (or reconstruction) of the particle density profile with an observer and relevant sensors.
The second is the design and validation of a model-predictive controller for the density profile using two pellet injectors.
For the first step, RAPDENS [24] is applied to ITER and extended with a Thomson scattering (TS) output model. Subsequently, the model and synthetic TS measurements are used together in an extended version of the dynamic state observer (DSO) proposed in [24] to reconstruct the density profile. Simulations of the observer are used to determine the quality of the achieved profile estimation for an H-mode, 15 MA, 5.3 T, DT ITER baseline discharge.
For the second step, model-predictive control (MPC) is applied to synthesize a controller for the density distribution in ITER using pellet injection as main actuator. For the first time, constraints on the density profile due to impurity transport are taken into consideration in the control problem. Simulations using the lightweight control-oriented plasma simulator RAPDENS are used to assess the effectiveness of the controller.
The remainder of this paper is structured as follows. Section 2 briefly summarizes the controloriented model used for design and simulation. The forward TS model and the DSO are presented in Section 3. The density reconstruction results are presented in Section 4.
The controller design methodology is described in Section 5 and the results of control simulations are shown in Section 6. This paper is concluded by a discussion of the work in Section 7.

Control-oriented plasma simulator
In this work, we have used and extended the RAP-DENS model (originally proposed in [24]) for the DSO and the synthesis of the MPC controller. A summary of the model is given here.
The RAPDENS model is a lightweight, controloriented model of the electron particle transport in a tokamak [24]. It is a multi-inventory model in which the particles inside the vessel are attributed to one of three inventories: the plasma, the wall, or the vacuum. Note that the vacuum denotes the region surrounding the plasma that is filled with neutrals. However, the name "vacuum" is kept for continuity and conherence with previous papers on RAPDENS. The model consists of a 1D-PDE for the evolution of the flux-surface averaged electron density and two OD-ODE's for the evolution of the wall and neutral vacuum inventories. Radial particle transport is modeled with a drift-diffusion model with ad-hoc chosen transport coefficients. The processes that drive the particle exchange between these inventories are modeled in a heuristic fashion. An overview is given in figure  1. Additionally, it contains representations of the fueling with neutral beam, pellet and gas injection. A summary of the model equations is given is Appendix A. The full details and derivations of the model equations are outside the scope of this paper and can be found in [24].  Figure 1. Overview of the modeled processes in the RAPDENS model. The particles inside the tokamak are attributed either to the plasma, the wall or the vacuum. Green arrows represent the modeled particle fluxes between the inventories. Orange arrows represent particle fluxes crossing the system boundaries. Adapted from [28] . For the applications reported in this paper, RAPDENS is used to formulate a system of nonlinear discrete-time ODEs that describes the evolution of the plasma density due to relevant actuator inputs. The internal variables of the model (states), the external parameters, the considered actuators inputs, and the system of ODEs are briefly discussed next.
States: In RAPDENS, the flux-surface averaged electron density distribution n e (ρ, t) is discretized in space using finite elements (see e.g. [29]), i.e, the electron density is approximated by where ρ is the normalized toroidal flux. The basis function Λ α : [0, ρ e ] → [0, 1], α = 1, 2, . . . , m are defined as cubic B-splines with finite support [30] (ρ e > 1 being the artificially prolonged flux label to encompass the scrape-off layer. This is discussed in more details in [24]). The variables b α (t) ∈ R m are time dependent spline coefficients. These coefficients together with the time dependent wall and vacuum inventories, respectively denoted N w (t) and N v (t), comprise the state vector x(t) ∈ R nx with n x = m + 2, i.e., the state vector is given by External parameters: RAPDENS requires external parameters to be provided in an external parameter vector p(t) ∈ P defined as where c D indicates limited (c D = 0) or diverted (c D = 1) plasma, c H implies low confinement (c H = 0) or high confinement (c H = 1) regime, T e,b is the electron temperature at the last-closed flux surface (LCFS), I p is the plasma current, V is the plasma volume, V = dV /dt, and κ is the plasma elongation, ψ axis and ψ LCF S are the equilibrium ψ(R, Z) evaluated at the magnetic axis and LCFS, G 1 = (∇ρ) 2 ) and G 0 = |∇ρ| with |∇ρ| = |∇ψ| ( ∂ψ ∂ρ ) −1 (where · denote the flux-surface average).
Inputs: The particle fueling rate by pellet injection Γ i pellet (t) ∀i = 1, . . . , n u , where n u is the number of used pellet injectors, comprise the inputs u(t) ∈ R nu to the model ‡ . The input vector is defined as Nonlinear system of ODEs: The spatial discretization of the density profile (1) is inserted in (A.1)-(A.5) and discretized in time using an equidistant temporal discretization t k = t 0 + kT s to form the nonlinear set of ODEs where f : P ×R nx ×R nu → R nx is a nonlinear function § of the state that represents the state evolution due to physical phenomena and the influence of the actuators. This formulation is the foundation of the work presented in the coming sections. In sections 3 and 4, it is used to perform state estimation and in sections 5 and 6, it is used to derive and test a density profile controller.

Dynamic state observer for the density profile
A requirement to perform profile control is the estimation of the profile from available measurements. For this purpose, a dynamic state observer (DSO) was proposed in [24,25]. In this section, we summarize the DSO and we present a forward diagnostic model for TS that will allow us to perform profile estimations using TS measurements.
Notation 1: Let a generic variable e(t) be a physical quantity of a real system. The estimate of this quantity made by a DSO at time t = t k1 with diagnostic signals from time t = t k2 is denoted asê k1|k2 .

Forward Thomson scattering model
The DSO is comprised of a prediction model, a diagnostic model, and an observer gain (see figure  4). The original observer, developed in [24,25], included diagnostic models for interferometry and bremmstrahlung but not for TS. Here, we discuss the inclusion of a TS model that will allow us to perform density estimation with TS measurements.
TS systems are used in tokamaks to measure the radially resolved distributions of electron temperature and density [31]. In combination with knowledge of the 2D equilibrium, the spatial measurements n e (R, Z, t) can be transformed into equilibrium measurements of the density n e (ρ, t) at known equilibrium locationsρ i with i = 1, . . . , n T S . n T S denotes the total number of radial measurement locations. The spatial resolution of the ITER core TS system is 67 mm [32], which, in a typical ITER plasma with a radius of r p ≈ 2 m, provides ≈ 29 measurement locations distributed over the high and low field side. In this work, to be conservative, n y = n T S = 11 is chosen.
The TS output vector y ∈ R n T S should contain the plasma density evaluated at the corresponding flux § Note that for the modeling is this paper, the function f is linear with respect to u. Hence, (5) can be rewritten as labelsρ i , i.e., y(t) = n e (ρ, t) ρ1 · · · n e (ρ, t) The relation between (2) and (6) is obtained by evaluating the basis functions used for spatial discretization in (1) at ρ =ρ i ∀ i = 1, . . . , n T S . Inserting the spatial discretization (1) in (6) results in the numerical expression of the output vector as function of the b-spline coefficients: where Λ(ρ)|ρi are the basis function evaluated at the corresponding radial locations. By defining Λ TS ∈ R n T S ×m the matrix containing the basis function evaluated at all the n T S radial locations, we can formulate an output mapping C(p(t)) = Λ T S 0 n T S ×2 such that the forward outputs are given by The output mapping C changes as function of the equilibrium, hence, it depends on the parameter vector p(t) introduced in a previous paragraph. For the simulations presented in this work however, it is assumed for simplicity that the measured locations are constant.

Working principles of the observer
A dynamic state observer (DSO) is an algorithm that uses measurements to constrain prediction made by a dynamic model to iteratively estimate the internal state of a system. A brief overview of the working principles of the observer is given here, more details can be found in Appendix B.
In the case of the density distribution, we are dealing with a nonlinear system. Hence, we are using the extended Kalman filter (EKF) framework [33] (see Appendix B.2) to estimate the density. The algorithm works as follows. At each time step, the latest measurementsỹ k are compared with the predicted measurementsŷ k|k−1 to form the residual z k (B.2). The residual is multiplied by the Kalman gain (B.4) and used to updated the predicted statex k|k−1 to obtain the state estimatex k|k . Finally, the predicted statex k+1|k and measurementŷ k+1|k at the next time step are computed using RAPDENS (5) and the forward diagnostic model (8).
The performance, stability, and robustness of the observer depends on the choice of the measurement covariance R k , the process covariance Q x k , and the disturbance covariance Q ζ k . This is detailed in section 4.2.
In this section, we included a forward TS model in a DSO. With this addition, TS measurements can be used to perform estimations of the density profile. In the next section, the performances of the observer are assessed for ITER.

Density profile estimation in ITER baseline simulations using Thomson scattering measurements
In this section, density estimation simulations are discussed where we demonstrate the effectiveness of using TS measurements to update the state predictions in the DSO. First, we introduce the simulated plasma scenario and argue the choice of the covariance matrices, i.e. the tuning knobs of an EKF. Subsequently, we discuss the simulation set-up and present the observer's performance in a simulation with plant-observer model mismatch and realistic noise levels.

Plasma scenario
The observer is used to estimate the density profile with TS measurements for an ITER H-mode, 15 MA, -5.3 T, baseline discharge. ITER shot 134173 is chosen as reference scenario as it is up to date the most complete prediction of an ITER baseline discharge available.
The scenario is obtained with an integrated model (IM) simulation [34]. The simulation was performed as an open-loop coupling within IMAS [35] of the free boundary equilibrium code DINA [36][37][38] with the JINTRAC [39] suite of codes. The baseline scenario was assessed from the formation of the X-point to the transition from X-point to limiter. The time evolution of relevant plasma parameters are given in figure 2.(a)-(c).
In the reference scenario, the ramp-up is modeled to last for 75 s with the L-H transition being triggered by the increase in auxiliary power. The flat-top is modeled to last from t = 75 s to t = 730 s. During that time the particle source is pellet injection with a pellet composition of 50% deuterium and 50% tritium atoms. It is represented as a continuous source of 2.3×10 22 atoms/s deposited at ρ = 0.85 with Gaussian deposition profile with normalized deposition width of 0.15.

Design of covariance matrices
The accuracy, convergence speed, and robustness of the state estimation depends on the choice of the  covariance matrices R k , Q x k , and Q ζ k [33]. The choice of these matrices is discussed here.
The measurement covariance matrix R k is chosen as a diagonal matrix with the square root of the modeled noise covariance on the diagonal (the modeled noise is discussed in section 4.3). Note that this assumes that each TS measurement location is an uncorrelated individual output.
The process covariance matrix Q x k is constructed as a symmetric Toeplitz matrix with descending first row. The values on the first row of the Toeplitz matrices determine the spatial correlation between the estimates and is chosen as exponentially decaying. The disturbance covariance matrix Q ζ k is constructed as a diagonal matrix.
The magnitude of the entries of Q x k and Q ζ k reflect the amount of trust in the model predictions and influence the Kalman gain (B.4) [33]. A low observer gain reflects small trust in measurements and high A Toeplitz matrix is a matrix of which the values on the diagonals are constant. trust in the model predictions. Hence, the EKF will react slowly to the differences between measurements and model predictions. A low gain avoids fitting the noise but relies more heavily on the model predictions, making the reconstruction more sensitive to modeling errors. On the contrary, a high Kalman gain reflects high trust in the measurements and large model uncertainty. The DSO will react fast to discrepancies between model and measurements. The reconstruction will be more robust to modeling error but sensitive to diagnostic noise and errors.
The magnitude of the covariance matrices' entries are chosen based on a preliminary analysis of RAPDENS for ITER. The heuristic parameters of RAPDENS are tuned to approximate the flat-top plasma behavior of the reference discharge scenario described in section 4.1.
From the analysis, it is concluded that the model had good predictive capabilities in flat-top but that modeling of the L-mode and L-H transition is not fully accurate. Hence, a high Kalman gain is chosen in the ramp-up and ramp-down phases and a low Kalman gain is chosen during flattop. Furthermore, it was observed that the density prediction mismatched at the edge of the plasma. Hence, the diagonal entries of Q x k corresponding to the edge of the plasma were increased (see figure 3).

Simulation set-up
Before proceeding to the estimation results, we briefly describe the set-up of the simulation used to evaluate the observer's performances. A visual depiction of the simulation set-up is given in figure 4.
The EKF is simulated for the entire discharge presented in section 4.1. The external parameters required to run RAPDENS (see sec 2), the particle inputs, the initial conditions, and the 'real' (to be estimated) density profile are taken from the database of the IM simulation.
The heuristic parameters of RAPDENS that were . Block diagram of the DSO and the set-up used for the density estimation simulations for ITER. The external parameters, the particle fueling rates, and the initial conditions are taken from the ITER database and used to simulate RAPDENS. The 'real' density distribution is used to synthesize synthetic TS measurements.
tuned to approximate the flat-top plasma behavior in the preliminary analysis are used in the estimation simulation. However, to represent inaccurate knowledge of the transport and investigate the performance of the observer with model uncertainty, systematic plantobserver model mismatch is manually introduced by changing the tuned transport coefficients and the pellet spatial deposition function. Furthermore, measurement inaccuracies and noise are included in the simulation with synthetic TS measurementsỹ k . They created by adding artificial noise to the density profile computed by the IM. While the signal-to-noise ratio (SNR) for TS depends on Poisson statistics [40] and thus scales with 1/ √ n e , we choose to model the noise as a Gaussian with expected value E[∆ T S ] = 0 and covariance σ 2 k = 0.05 * max(n e ). This way, we approximate the required accuracy of the TS system [32] and account for sources of noise such as shot noise, thermal noise in the detection circuit, noise due to plasma light, and noise due to neutron damage on optics installation [41,42]. It is important to note that Thomson scattering can also be affected by systematic errors, such as misalignment or due to inaccuracies in the equilibrium reconstruction. These are not accounted for in this work. When using "real" Thomson scattering measurements, detection and correction for these errors can be performed by integrating the Thomson scattering data with other diagnostics in the observer. This is elaborated further in Section 7.

Density profile estimation results
The simulation results are shown in figure 5. The estimation error, depicted in figure 5.b, is expressed as the normalized 2-norm of the error vector: ||ŷ k|k − n e || 2 /||n e || 2 , whereŷ k|k = C(p k )x k|k with C(p k ) defined in (8) and n e the density profile of the IM simulation. Only the part of the profile measured by the core TS system [32], i.e., 0 ≤ ρ ≤ 0.9, are used in the error computation.
Overall, good density reconstruction performances can be seen for the entire profile, i.e., the estimated densities are shown to correctly track the true densities (see figure 5.a). The normalized estimation error averages around ≈ 10% during the ramp-up and ≈ 2% for the flat-top (see figure 5.b).
During the ramp-up and the L-H transition, from t = 0 s to t = 75 s the effect of the high Kalman gain can clearly be distinguished. During that time, the artificial measurement noise is visible in the estimation, as can be seen in figure 5.b. Decreasing the Kalman gain during this phase would reduce the influence of noise but increase the overall estimation error as the model is not accurate for the early stages of the discharge.
An increase in the estimation error can be seen between t = 75 s and t = 90 s. At t = 75 s, the plasma enters H-mode. As was discussed in section 4.2, the observer gain is changed from a high to a low Kalman gain at that time. This means that the observer relies more on the model predictions. The modeled density increase in early H-mode is steeper in RAPDENS, hence, the density profile is overestimated (see figure  5.d). This could be solved by slowly transitioning from a high to a low observer gain once the plasma enters H-mode. After t = 90 s, the estimation error decreases rapidly. A good agreement can be seen between the estimated and real density profile for the flat-top (see figure 5.e).
Note that the transport dynamics change after the sudden decrease in the density at t = 634 s. This change in dynamics is not modeled in RAPDENS and results in an increase of the estimation error to ≈ 5%.
This section shows that using TS measurements to update the predictions made by the RAPDENS model in an EKF results in accurate density profile estimations. Furthermore, the quality of the estimation is deemed sufficient to be used for profile control, especially during the flat-top.

Design of an MPC controller
In this section, the design of an MPC controller for the density distribution is presented. We first introduce the concepts of MPC and motivate the use of this strategy for the control of the density distribution. Subsequently, we discuss in detail each component of the controller.

Model predictive control
Model-predictive control (MPC) is widely adopted as an effective control strategy to deal with multivariate constrained control problems [43,44]. A block diagram of an MPC controller is given in figure 6.a.
In such a controller, an explicit model of the to be controlled system is used to predict future states/outputs over a prediction horizon (denoted N ∈ N). These predictions are used to formulate an optimal control problem where the future tracking error, i.e. the difference between the desired and predicted future outputs is minimized. Limits on the output, states, and inputs can be taken into account as constraints in the optimal control problem. This is illustrated for a discrete-time, single-state single-input system in figure  6.b.
The design of an MPC controller consists in: choosing a prediction model, formulating a cost function that encompasses the control objectives, formulating constraint functions that translate physical limits of the plasma and actuators as state and input constraints, and implementing and solving the optimal control problem in a numerical optimization solver. These steps are discussed in the coming sub-sections.

Control objectives and motivation for MPC
We aim to synthesize a controller that satisfies the following control objectives: (i) Track a high-performance reference density profile in a region of interest (ROI).
(ii) Guaranteeing that the Greenwald density limit is not being violated.
(iii) Optimizing impurity transport in the ROI.
Note that these objectives are generic, in section 6.1 the objectives are refined for the specific control simulations.
To control the density profile, the controller will have access to multiple actuators, e.g., multiple pellet injectors [45], and it will have access to the reconstructed density profile. The control of the density profile is thus a multivariate control problem. Additionally, the Greenwald density and the optimization of impurity transport can be formulated as state and inputs constraints. Hence, the control problem at hand can be classified as multivariate and subject to state and actutator constraints, which is exactly the type of problem for which MPC is suited.

Prediction model
The first component of a MPC controller is an explicit model of the systems dynamics. In this work, we use a linear disturbance-augmented model derived from RAPDENS for this purpose.
The prediction model is derived as follows. First, the nonlinear state-space (5) is linearized around the desired flat-top conditions {x f t , p f t } to obtain a linear approximation of the flat-top transport dynamics. Next, as the MPC controller runs in discrete-time, the continuous time model is discretized temporally by exact discretization using an equidistant temporal discretization t k = t 0 + kT s , with T s = 3 ms. This results in a discrete-time linear state-space given by Next, to incorporate integral action and enable the controller to compensate for steady-state offsets induced by model mismatch and slow changing disturbances, a disturbance model as proposed in [46,47] is used. To do so, (9) is augmented to such that it includes the disturbance d(t) ∈ R n d =ny with disturbance model matrices B d and C d . These matrices are chosen to guarantee the observability of the augmented system (10) [47]. The first condition for the augmented system to be observable is that the non-augmented system is observable. The local linear model (9) (with state vector (2)) contains two unobservable states. These are the particle inventories of the wall N w and vacuum N v . Hence, for the controller we use the minimal realization of input-output dynamics. Next, we choose B d = I nx,n d and C d = I n d ,n d , such that guaranteeing that the augmented prediction model is observable [47].
Notation 2: For a system with state variable x and input variable u, we distinguish between the system's state at time t = t k denoted x(t k ) and the predicted state at time t = t k + i denoted x i,k . Analogously, u(t k ) denotes the input to the system at time t = t k and u i,k denotes the input that would be applied at time t = t k + i.
Using notation 2, the discrete-time linear disturbance augmented state-space prediction model is then given by where x i,k , d i,k , y i,k , and u i,k denoted the predicted state, disturbance, output and control input at i ∈ N time steps ahead of the starting time t = t k . By introducing, for a given prediction horizon N , the stacked notations the compact formulation of the entire prediction sequence is given by the stacked prediction matrices A x , B u , and B d are derived in Appendix C.1.
The prediction model allows us to relate an unknown input sequence U k to the current state x(t k ) and current disturbance d(t k ). In the coming sections, an optimal control problem is formulated using this model that can be used to compute an optimal input sequence.

Disturbance estimation
The disturbance d(t k ) introduced in (10) is estimated at each time instance using a Kalman filter (KF).
The KF equations can be found in Appendix B. The performance and stability of the observer (and partially that of the controller) are determined by the choice of the measurements covariance R and the disturbance covariance Q d . They have been tuned manually and chosen as diagonal matrices with R = 5 × 10 −3 I ny and Q d = 1 × 10 −6 I n d With this design, the estimate of the disturbances converges in ≈ 2 s. After convergence, the prediction model is capable of dealing with systematic and slow-varying plant-model mismatches.

State and actuator constraints
Next, the construction of the constraints functions is explained.
The constraint functions are used to constraint the future states and inputs in the optimization problem.
Three different type of constraints are imposed on the system in this work: linear input constraints to represent the minimal and maximal fueling rates of the pellet injectors, linear state constraints to represent the Greenwald density limit, and nonlinear state and input constraints to optimize impurity transport.
First, linear constraints are defined to represent actuator limits and plasma density limit. As shown in Appendix C.2, these constraints can be formulated as linear inequality constraints on the input sequence U k as Second, a nonlinear constraint is imposed to the states and inputs that translate the desire to optimize impurity transport. A condition for outward neo-classical impurity transport can be derived as a favorable ratio between the logarithmic ion temperature gradient L Ti and the logarithmic density gradient L ne , i.e., L Ti /L ne η ic ≈ 1 [6]. Assuming typical peaked temperature and density profiles in ITER flat-top, the ratio can be used to formulate an inequality constraint on the logarithmic gradient of the density profile: Note that a soft constraint formulation is used to avoid infeasibility of the optimization problem [48]. In such a formulation, violation of the constraint is allowed but penalized via the parameter and the cost matrix W in the cost function (see (18)). In Appendix C.3, the constraint is formulated as a nonlinear function of the state, control input, disturbance estimate, and soft constraint parameter. The nonlinear constraint function g 1 (x k , u k , d k , ) is defined as (time dependence denoted by subscript k for readability): The linear and nonlinear constraints are used to constrain the optimization problem. With their inclusion the controller accounts for physical limits of the system (actuator and plasma) and the density profile -inward impurity transport interaction.

Cost function
The objective of the controller is to track the highperformance reference r k while avoiding aggressive control and violating the constraints. This is expressed in the cost function J k as: where for a given matrix T and vector x of appropriate dimensions, we define x 2 T x T x.x i andū i are the desired steady-state states and control inputs, i.e., the states and inputs that are to be reached for the reference to be tracked. They are computed using where (A, B, B d , C, C d ) are the matrices of (10) and H ∈ {0, 1} nz,nx is the controlled variable matrix. In (18), the stage state cost matrix W x is used to penalize deviations between the desired steadystate state and the future predicted states, up to but excluding the terminal state (x N,k ). The stage input cost matrix W u is used to weigh the inputs and avoid aggressive control actions. The terminal state cost matrix P is used to penalize the deviation of the terminal state. Finally, soft constraint violation cost matrix W is used to penalize the violation of the nonlinear state constraint (see section 5.5).
An iterative procedure is used to choose the weights. The matrix W x is chosen as a block-diagonal matrix of two matrices: one being a weighted unity, the other being a zero matrix. The dimensions of these matrices ensure that only the states that parametrize the ROI are penalized. The matrix W u is chosen as a weighted unity matrix, i.e., W u = 5·10 −3 I, and matrix W is chosen as positive definite diagonal matrix with W Q. Finally, P is chosen as the solution of the discrete algebraic Riccati equation By defining these weights, the cost function J is convex and quadratic in U k [49].

Optimization problem
The future inputs U k are computed by minimizing the cost function (18) subject to the constraints (15) and (17). Using the prediction model, the nonlinear online optimization is defined as: The matrices W u,u , W u,x , W u,d , and W u,r are defined in Appendix C.4. The optimization problem is nonlinear due to the presence of the impurity transport constraint (17).
Therefore, sequential quadratic programming [50] is used to solve the problem. Due to the nonlinear constraint, no guarantee on convexity (and global optimality) can be given.
Once an "optimal" solution U * k of (21) is obtained, the first n u entries of the optimal sequence are applied as control inputs to the system. In the next section, the results of control simulations are shown to test the performance of the controller.

Simulation results of closed-loop density distribution control
In this section, we present control simulations that were used to test the MPC controller. Note that it is not the objective of these simulations to provide quantitative estimates of the ITER performance or controllability of a particular scenario, but to illustrate the potential of MPC control for the density profile and help identify challenges that remain to be solved.
First, we discuss the simulation set-up and refine the control requirements.
Next, we demonstrate tracking of a high-performance density profile while minimizing inward impurity transport. This is done in simulations with plant-controller model mismatch and a continuous pellet representation.

Simulation set-up and control requirements
The considered actuators to perform control are the pellet injection systems that have their injection location situated at the high field side (HFS). During ITER baseline operation, two HFS PIS are planned with the possibility to expand to four [45]. The control inputs (signals changed by the controller) u(t k ) are chosen as the fueling rates of the PIS and defined as  The deposition function are modeled as parabolic functions with normalized radius ρ p,1 = 0.7 and ρ p,2 = 0.85 and normalized widths of w p,1 = 0.3 and w p,2 = 0.15. These represent obtainable fueling profiles for 3 and 5 mm pellets in ITER [51].
The controlled variables z(t) are the density profile evaluated on an equidistant equilibrium grid for 0 ≤ ρ ≤ 0.9. Similarly, the control reference r(t) is the desired density profile evaluated on the same equilibrium grid. The controller is synthesized using this nominal configuration.
To assess the performance of the controller in simulation, the control objectives discussed in section 5.2 are named and specified as follows: (i) Error (Err) requirement: The controller can maintain the average tracking error below 5%. The tracking error is expressed in the normalized 2-norm of the error vector: can guarantee that the line-average density will not exceed the Greenwald density n GW [9] for any time t ≥ t 0 .
(iii) Transport (Trsp) requirement: The controller can optimize neoclassical impurity transport by maintaining a favourable ratio between the logarithmic gradients of the ion temperature and density profiles for the domain The MPC controller reduces the tracking error significantly while keeping a favorable ratio between the logarithmic gradient profiles hence minimizing the inward impurity transport.

Tracking of a reference in flat-top with plant-controller model mismatch
The goal of the controller is to compensate for mismatches between the real system and the model in the controller. Therefore, mismatch is introduced in the plant with respect to the controller model. The results of two simulations are presented here. In the first (Figure 7), transport mismatch is introduced by decreasing the diffusion coefficient and increasing the drift coefficient. Furthermore, a mismatch is introduced by changing the width of the pellet deposition profile. These changes increase inward transport significantly resulting in higher densities at nominal particle inputs. In the second (Figure 8), transport mismatch is introduced by increasing the diffusion coefficient and decreasing the drift coefficient in the plant. Furthermore, a mismatch is introduced by changing the deposition radii of the pellet deposition profile, respectively to [0.8,0.9].
These changes decrease inward transport significantly resulting in lower densities at nominal particle inputs.
The open-loop control inputs are computed based on the nominal configuration of the system. For both simulations, the uncertainty in transport results in a large tracking error (figure 7.c & 8.c). In the first simulation, the uncertainty causes a violation of the Greenwald density limit (figure 7.d) and of the logarithmic gradient constraint (figure 7.e-f). It is shown in figure 7.g-i & 8.g-i that the open-loop density profiles (blue) do not match the reference profile (black).
The MPC controller is used to control the density profile in closed-loop. The control inputs differ from the open-loop ( figure 7.a-b & 8.a-b). It can be seen that in the presented simulations, the controller achieves the control objectives: the tracking error is below 5% (figure 7.c & 8.c); the Greenwald density limit is respected (figure 7.d); and the favorable ratio between ion-temperature and density logarithmic gradients is maintained ( figure 7.e-f). Furthermore, a good agreement between the controlled (red) and reference (black) profiles can be seen (figure 7.g-i & 8.g-i).
These results show that with continuous actuators, the MPC controller is capable of tracking a highperformance reference density profile in the presence of plant-controller model mismatch while simultaneously accounting for inward impurity transport.
Note that the relation between the density and the logarithmic gradient profile depend on a lot of factors, e.g. ratio between transport coefficients, pellet deposition location and deposition profiles, and the fueling rates (control inputs). For the controller to account for inward impurity transport, the control inputs must give enough freedom to find a feasible input sequence that minimizes the control error while respecting the constraint. It is however possible that the control reference and logarithmic gradient constraint become unfeasible (e.g. if the transport coefficients profiles changes drastically), i.e., there does not exist an input sequence for which the logarithmic gradient constraint is respected (this depends greatly on the degrees of freedom of the controller). In this case, the controller cannot optimize the impurity transport. However, it can give a warning to the overarching supervisory controller that the constraint is violated. Subsequently, fast update methods of the transport coefficients [52] can be used to update the transport model in the controller and determine a better suited control reference.

Discussion and outlook
In this work, we extend a dynamic state observer to estimate the density distribution in ITER using Thomson scattering (TS) measurements. For this, we combine the control-oriented particle transport model RAPDENS with a forward TS model and synthetic TS measurements in an extended Kalman filter (EKF) framework. We have shown that reliable, high-quality density profile estimates can be obtained in simulation with realistic measurement noise levels. However, we use a simple model for the synthetic TS measurements (Gaussian distribution) and thus do not account for systematic errors (e.g., due to equilibrium reconstruction errors or misalignment) or density dependent errors. These error can be present when using real TS measurements. Their presence will reduce the performance of the observer and might require a specific error handling procedure to ensure reliable reconstruction of the density profile. As a next step, we propose to test the extended observer on an experimental reactor with real diagnostic data, preferably AUG or TCV as it was already implemented there [24].
On these reactors, the TS measurements will be used together with other diagnostics (interferometers on TCV; interferometers and radiation measurements on AUG) to reconstruct the density profile. It can then be investigated if the expected density estimation performances are achieved and the increase in reconstruction performance by including TS can be quantified. Testing the observer with experimental data will help identify if additional handling is required to deal with systematic errors and design it if need be.
Furthermore in this work, we have synthesized a model-predictive controller for the density distribution in a tokamak. The controller uses the pellet fueling rate as control input to track a high-performance reference profile, avoid density limits, and optimize for impurity transport. We show that for continuous actuators, the controller is capable of achieving these control requirements in simulations with plantcontroller model mismatch.
Even so, we use the heuristic control-oriented model RAPDENS to synthesize and validate the controller. For further validation, it will be relevant as a next step to couple the controller to more complex first-principles transport codes, e.g., to JETTO as part of JINTRAC [39] to perform closed-loop control simulations, and investigate the achieved control performances. Additionally, pellets are inherently discrete events. However since this works presents a for the first time a controller for the density profile, this discrete nature is not included. Knowing that in a real reactor, the pellet size cannot be changed for each control action and is subject to constraints linked to for example plasma penetration, it will be relevant to investigate the performance of the controller when the discrete pellet nature is introduced in the control loop and when only pellets of a fixed size can be used.
Furthermore, we would like to address the possible extensions of the choice of control inputs. In this work and in literature, e.g. [26,27], the pellet fueling rate is considered as control input. This is a straightforward choice as it enables the use of linear controllers. In practice, however, the actuation with pellet injection is a complex function of pellet size, pellet velocity, injection frequency, and plasma temperature. We think it is relevant to study the possibility of using a more complete input space, e.g., by also taking into account pellet velocity, and analyze the effect on controllability and control performances. equations In this appendix, a summary of the RAPDENS equations is given. Mass conservation and radial transport [53] are used to model the evolution of the radial plasma density n e (ρ, t) as where ρ is the normalized toroidal magnetic flux, V is the plasma volume, Γ e (ρ, t) is the radial transport flux modeled as with χ(ρ, c HL ) the diffusivity and ν(ρ, c LH , I p ) the pinch (or drift) velocity where c LH indicates the regime of the plasma (L or H-mode). In A.2, S e (ρ, t) is the net electron source modeled as S e = σv iz n n n e − σv rec n e n i − n e | ρ > 1 τ SOL +S pellet +S NBI (A.3) with σv iz (T e ) and σv rec the temperature dependent cross-sections for ionization and recombination [3], τ SOL the time constant for particle loss in the scrapeoff layer, S pellet the net particle source due to pellet injection, and S N BI the net particle source due to neutral beam injection. The evolution of the particle inventories N w (t) in the wall and neutral vacuum density n n (t) are modeled by the 0D ODE's (A.5) with τ release the particle release rate, N sat the wall saturation inventory, τ pump the pumping time scale, c w a dimensionless constant that determines the steadystate balance between the wall and vacuum inventory, and V v,0 the nominal vacuum volume.
Appendix B. Dynamic state observer: theory and application to density profile estimation Appendix B.1. Kalman filter equations Given a system G with state vector x, output vector y, and input vector u, the Kalman filter (KF) assumes that the dynamics of G evolve in discrete-time following a linear state-space as where the state transition dynamics are captured in matrix F and the input dynamics in B. The output model and influence of the inputs on the outputs are captured in matrices H and D. The stochastic behavior of measurement and process noises are modeled in v and w as zero-mean Gaussian white noises with respective covariance matrices R and Q such that v(t k ) ∼ N (0, R(t k )) and w(t k ) ∼ N (0, Q(t k )). The tuning knobs of the KF are the covariance matrices R and Q.
The algorithm is initialized with an initial state estimatex 1|0 (with corresponding output estimateŷ 1|0 ) and an initial a priori estimate covariance Σ 1|0 . For each discrete time instance t = t k , the estimate of the states are obtained as follows (for readability the time dependence is denoted by the subscript k).

Update step:
Using the latest available measurementsỹ k , the residual z k and innovation covariance Ω k are computed by where H(t k ) is the output matrix and Σ k|k−1 the a priori process covariance. Next, the optimal Kalman gain L k is computed with and is used to obtain the updated state estimatex k|k and the a posteriori state estimate covariance Σ k|k aŝ Prediction step: Based on the most likely statex k|k , the actuator inputs at the present time u k , and the process covariance Q, the one step ahead predicted statex k+1|k and outputŝ y k+1|k and the a priori covariance at time step t = t k+1 are given byx

. Extended Kalman filter equations
In case a system cannot be represented accurately by a linear model, the EKF equations are used. The observer assumes that the system's dynamics evolve following where f : R nx × R nu → R nx and h : R nx × R nu → R ny are non-linear functions of the state and input. The stochastic variables v and w are similar as defined in Appendix B.1. There are two main differences between the KF and EKF equations: (i) The matrices F and H (used in the computation the measurement covariance (B.3), the Kalman gain (B.4), and the state covariances (B.9) and (B.6)) are the jacobians of f and h, i.e., ii) The prediction step is performed using the nonlinear model (B.10), i.e., (B.7) and (B.8) becomex It is important to note that the EKF is a linearized version of the Kalman filter for nonlinear dynamical systems, hence, no guarantee about stability or estimation accuracy can be given [33]. These properties are to be checked experimentally for the specific implementations. Appendix B.3. Application of the in the DSO for the density profile estimation For the density reconstruction, the RAPDENS plasma simulator is used as the one step ahead prediction model. The physical state vector x(t k ) (2) is augmented with additive unknown disturbances ζ(t k ) ∈ R nx . These disturbances are co-estimated by the observer and give a measure of the modeling errors, unmodeled processes, unaccounted particles sources, and errors in the diagnostics models. The stochastic behavior of the state and disturbances is modeled by additive zero-mean white noises w x k and w ζ k with respective covariance matrices Q x k and Q ζ k . The augmented nonlinear state-space model is given by An augmented state is defined as The stochastic measurement noise is represented by an additive zero-mean white noise v k with associated covariance matrix R k . The outputs are assumed to evolve as: The final step consists in defining the matrices The augmented state ξ(t k ) is then estimated by using (B.14) and (B.17) in (B.2)-(B.9).

Appendix C. Controller: definitions and derivations
In this appendix we provide the definitions and derivations of the controller matrices described in section 5.
Appendix C.1. Prediciton model matrices  (10), the predicted future states x i,k for i = 0 : N can be related to the current state x k , current disturbance estimatê d k , and input sequence u k , . . . , u k+N −1 by: We define matrices A x , B u , and B d as Using (C.2), (C.3), and (C.4), we rewrite (C.1) for the stacked predicted states and input sequence introduced in notation 2: The linear constraints are given by: x min ≤ x i,k ∀i = 0, 1, · · · , N, Zx i,k ≤ n gw ∀i = 1, 2, · · · , N, (C.8) By defining we rewrite (C.8) as (C.10) Using the stacked notation (13) and defining M 0 , M i , E i and b as the constraints in (C.10) are grouped as Finally, the matrices of (15) are obtained by inserting (C.5) in (C.15) and are defined as: Here we formulate the nonlinear constraint function g 1 introduced in section 5.5. Recalling that y(t k+1 ) = n e (ρ, t k+1 ), withρ defined in section 3.1, the augmented model (10) can be used to relate the density profile at time t = t k+1 can be related to the state and inputs at time t = t k . The relation is given by: Under the assumption of slow varying disturbances, such that d(t k+1 ) ≈ d(t k ), we can write n e (ρ, t k+1 ) = C(Ax(t k ) + Bu(t k ) + B d d(t k )) + C d d(t k ).
(C.19) Finally, we define C as ∂ρ , (C.20) such that C x(t k ) = ∂ne(ρ,t k ) ∂ρ and we construct an output weight matrix W y to weight the importance of the outputs in the nonlinear constraint. In this work we chose this matrix such that the weight of the outputs for which ρ < 0.9 are unity and the outputs for which ρ > 0.9 are weighted 0.
Inserting (C. 19), (C.20), and W y in (16) and rewriting the equation in negative null form, we can formulate the nonlinear constraint function g 1 as (time dependence denoted by subscript k for readability): g 1 (x k , u k , d k , ) ≡ |W y C (Ax k + Bu k + B d d k )|− |W y C(Ax k +Bu k +B d d k )+C d d k (0.95W y L Ti + )| ≤ 0. (C.21)

Appendix C.4. Cost function matrices
Here we derive the matrices of the numerical implementation of the cost function (18) as presented in (21).
First, similar as (13), we define the stacked steadystate states and inputs as . . .
Subsequently, the cost matrices W x , W u , and W N first introduced in (18)  In red the value of the disturbance after t = 0.25s is used. In blue, the disturbance estimate after t = 1.25s is used and in pink the disturbance estimate at t = 2.5s. Frame a-d): The "real" density profile computed by the nonlinear RAPDENS model (black) is compared with the predicted density profile by the LDAM for different values of the prediction horizon N. Frame(e): The prediction error is given as a function of the prediction horizon for different initialization times.
matrices W x and W u as (C. 25) Inserting (C.5) in (C.25) we obtain: Finally, by regrouping the quadratic terms in U k , the terms that depend on {U k , x(k)}, {U k , d(k)}, and {U k ,X k ,Ū k } we can define the matrices W u,u , W u,x , W u,d , and W u,r from (21) as Appendix C.5. Choice of prediction horizon The predictive capacities of the LDAM depend greatly on the estimated disturbanced k . The choice of the prediction horizon N is made to trade off control performances and prediction accuracy. In figure C1(ad) the predicted density profile is shown for different prediction horizons and initialization times. The red lines are obtained with the disturbance estimated k at time k = 0.25 s. The blue lines are obtained with the disturbance estimated k at time k = 1.25 s. Finally, the pink lines are obtained with the disturbance estimatê d k at time k = 2.5 s. In (e), the two-norm of the prediction error is shown as a function of the prediction horizon for the three initialization times. It can be seen in figure C1(e) that the prediction error increases with the prediction horizon but that is increase is very small when the disturbance estimate has reached the steady-state values (see pink line in C1(e)) and in that case large prediction horizons can be used accurately. In this work, we have chosen to work with N = 20.