Robust control of resistive wall modes using pseudospectra

A novel methodology to design robust feedback controllers for the stabilization of resistive wall modes (RWMs) in tokamaks is presented. A linear state space model is used that describes the system composed of the plasma, the resistive wall, the active coils and the magnetic field sensors. The full three-dimensional (3D) geometry of the wall and the coils is taken into account. The control system is represented by a parametrized matrix whose robust stability properties are optimized under variations of the parameters. To make the optimization process feasible, a reduced state space model is constructed by means of a novel technique involving an orthogonal projection. The orthogonality is essential to the robust stability concept used. The latter is based upon the idea of matrix pseudospectra and therefore accounts for the sensitivity of eigenvalues. Furthermore, the transient growth of perturbations is investigated. Namely, in some theoretically stable linear systems, initial perturbations can grow by large factors before they eventually decay, rendering them practically unstable. A detailed analysis of a simple, ITER-like test case provides the following general, conceptual insights into the RWM stabilization problem: (i) it is important to consider not only the eigenvalues themselves, but also their sensitivity, (ii) transient amplification might be an issue requiring consideration and (iii) the transient peak can be substantially reduced by a careful choice of the optimization objective and the sensor configuration.


Introduction
An important figure of merit for plasma confinement in nuclear fusion devices is the parameter β = 2µ_0⟨p⟩/⟨B²⟩, where ⟨·⟩ denotes the volume average. Since the fusion power output increases with β, a future reactor must reach a sufficiently high β in order to be efficient and economically attractive. However, ideal magnetohydrodynamic (MHD) instabilities impose hard limits on the achievable β. In tokamaks, if internal instabilities (i.e. those which occur within the plasma core) are avoided by a suitable choice of the plasma current profile, the β limit comes about through the onset of external kink modes. These are long-wavelength instabilities driven by the radial gradients of the toroidal current and the pressure. They cause a deformation of the plasma boundary, grow on the Alfvénic timescale of the order of 10^-6 s and can terminate the plasma discharge abruptly. A superconducting wall sufficiently close to the plasma boundary would stabilize these modes and therefore substantially increase the β limit. But in the realistic case that such a wall has non-vanishing resistivity, the plasma configuration becomes unstable again, the stability limit being virtually the same as in the case without a wall. However, the modes grow much more slowly, namely on the resistive timescale of the wall, which is typically of the order of 10^-2 s. These decelerated external kink modes are denoted resistive wall modes (RWMs). Therefore, in the presence of a resistive wall, the active feedback stabilization of RWMs by means of magnetic field sensors and a system of additional correction coils becomes technologically feasible, and successful experiments, e.g. in the DIII-D tokamak, have already been conducted [1]-[3]. Given a set of sensors and control coils, the coils have to be connected to the sensors by means of some feedback logics or controller.
Finding an appropriate controller is always a difficult task, but at least it is strongly facilitated by numerical modelling of RWMs and their feedback stabilization. In this paper, a novel approach to the computational design of robust RWM controllers is presented.
The numerical treatment of RWM control has already been the subject of numerous studies, and several codes have been developed and extensively used to compute the linear stability of RWMs in the presence of a feedback system (in particular for the upcoming fusion experiment ITER): VALEN [4,5], DCON coupled to VACUUM [6], MARS-F [7]-[9], CarMa [10] and STARWALL [11,12]. STARWALL is also used in the present work. All these codes differ from each other in the complexity of representation of the plasma, the conducting wall and the feedback coils. In the VALEN code, the plasma state is represented by a single unstable eigenmode only, whereas in the other codes an expansion in terms of a finite, but large set of basis functions is used. Only MARS-F and CarMa take toroidal rotation of the plasma into account. The conducting structures have a full three-dimensional (3D) representation in VALEN, CarMa and STARWALL, whereas the other codes are restricted to axisymmetric configurations. In VALEN and STARWALL, however, a thin-wall approximation is used. Finally, STARWALL is the only code that is able to account for more than one toroidal Fourier index n simultaneously in the expansion of the plasma state, thus resolving the coupling between different n's in non-axisymmetric set-ups.
All the above-mentioned studies, except those related to STARWALL, tackle the control problem in the same way as is quite common in control engineering: by the use of transfer functions. The underlying methodology shall be briefly reviewed here although a different approach is taken in this study. The transfer function concept is based upon regarding the system to be controlled as a 'signal processor', namely, a black box into which an input signal v (scalar or vector-valued) can be fed and which produces an output signal y (also scalar or vector-valued). Inside the black box there is a linear dynamical system described by a matrix A_0, whose internal dynamics is subjected to an additional external forcing if a nonzero input signal is applied. This means the control system is described by

  ẋ = A_0 x + B v,   y = C x,   (1.1)

so that the input matrix B describes the influence of the input v on the system state vector x, and the output signal y is a measurement taken from x by means of the output matrix C. Without any feedback control, the input-output characteristics are completely determined by the transfer function, which relates the Laplace transforms ṽ(z) and ỹ(z) of the input and output signals:

  ỹ(z) = G(z) ṽ(z),   (1.2)

  G(z) = C (zI - A_0)^-1 B.   (1.3)

When restricting to the imaginary axis, z = iω, the transfer function describes the frequency response of the system. An important aspect of control theory is the robustness of stability. A stabilizing controller is not very useful if it fails to stabilize a slightly different system. A system is called robustly stable if it remains stable under any 'moderately large' perturbation of the system properties (not of the system state!). Clearly, robustness is important for several reasons. First of all, the 'background parameters' of the physical system under consideration (e.g. the plasma equilibrium current profile or pressure profile in the case of a tokamak) are uncertain to some extent. Furthermore, 'real' sensor and actuator signals are always noisy. Finally, the computer model introduces uncertainties due to numerical approximations as well as due to physical simplifications.
At first sight, it seems hard to find a general concept of robustness that accounts for all these entirely different kinds of uncertainties at a time. What is needed is a measure of the 'size' of systems and system perturbations, that is, a norm in the space of all possible systems, in order to decide whether a perturbation is 'large' or not. In the context of using transfer functions, the so-called H_∞-norm has become a widely accepted concept to quantify robustness, and a comprehensive theory of H_∞-optimal control has been developed [13]. The H_∞-norm of a stable system is defined in terms of its transfer function G(z) via

  ||G||_∞ = sup_ω s_max(G(iω)),   (1.4)

where s_max denotes the largest singular value. In the case of scalar input and output, ||G||_∞ is equal to the frequency gain maximized over all input frequencies ω, including infinity. The details of H_∞ control theory are complicated. Its fundamental idea consists of considering a stable system robust if it remains stable under all perturbations whose H_∞-norm does not exceed some sufficiently large bound. The H_∞ theory has already been applied to the RWM stabilization problem using the MARS-F code [14,15].
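As a numerical illustration of the definition (1.4) (the system matrices below are invented, not taken from the paper), the H_∞-norm of a small stable system can be estimated by sampling the largest singular value of G(iω) over a frequency grid:

```python
import numpy as np

# Hypothetical 2-state stable system: x' = A0 x + B v, y = C x.
A0 = np.array([[-1.0, 10.0], [0.0, -2.0]])   # stable, strongly non-normal
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

def transfer(z):
    """G(z) = C (zI - A0)^{-1} B, cf. (1.2)-(1.3)."""
    n = A0.shape[0]
    return C @ np.linalg.solve(z * np.eye(n) - A0, B)

# ||G||_inf = sup_w s_max(G(iw)): crude estimate by sampling frequencies.
omegas = np.linspace(0.0, 50.0, 20001)
gains = [np.linalg.svd(transfer(1j * w), compute_uv=False)[0] for w in omegas]
h_inf = max(gains)
```

For this triangular example G(z) = 10/((z+1)(z+2)), so the maximum gain is attained at ω = 0 and equals 5; a production implementation would of course use a dedicated H_∞-norm algorithm instead of a frequency sweep.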
Concerning system norms and robustness of stability, a different philosophy is followed in this study. Here, the focus of interest is what happens inside the black box. This means the internal dynamics of the system is primarily investigated rather than its input-output behaviour. The methodology to be derived relies on the following basic assumption: Among the various possible vector norms in state space, there is one particular norm || · || which is of special interest to the investigator. It provides an accepted measure of the 'size' of system states and, hence, of differences between states.
Typically, such a norm can be established if the state space variables of the dynamical system are quantities that are physically measurable (in principle), or functions of such quantities. Then, any norm of interest will be based on physical grounds, since it is a function of physically measurable quantities. In the case of the numerical treatment of RWM stabilization as presented in this paper, the above assumption is assumed to hold because a physically based state space norm will be derived in section 3. A norm that should be used to measure the size of systems and system perturbations can be derived immediately from the above assumption: because the linear system is represented by a matrix, the system norm should be the matrix norm that is vector-bound to the vector norm || · || provided by the assumption. In the following, the system matrix A will be assumed to include the effects of the feedback controller (in the case without feedback, A = A_0 except for a similarity transformation to be explained in section 3), so that the system dynamics is governed by ẋ = Ax. The system norm will then be defined by

  ||A|| = max_{||x||=1} ||Ax||.   (1.5)

Consequently, a stable system A (meaning that all eigenvalues of A lie in the open left complex half-plane) is considered robustly stable if A + E is stable for all perturbations E satisfying ||E|| < ε with ε being 'sufficiently' large. For any stable A, the stability radius ρ(A) is defined to be equal to the supremum of all values of ε for which the above condition holds, and it serves as a measure of robustness. These definitions of system norms and robustness are meaningful because ||E|| is equal to the maximum possible norm of the E-induced change in the tendency ẋ provided that ||x|| = 1. Finally, following the methodology introduced in [16], optimally robust control is achieved by maximizing ρ(A) over the set of all feasible feedback logics, i.e. over all the corresponding modifications of A.
As explained in [16], this approach is closely related to the concept of matrix pseudospectra [17], which is also the focus of the present study. The ε-pseudospectrum of A is the union of all the eigenvalue spectra occurring for all perturbed matrices A + E with ||E|| < ε.
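For the Euclidean norm, the ε-pseudospectrum of A is the set of points z with s_min(zI - A) < ε, and the stability radius of a stable A equals the minimum of s_min(iωI - A) over all real ω. A minimal sketch with an invented matrix (not from the paper) shows why eigenvalue sensitivity matters:

```python
import numpy as np

A = np.array([[-1.0, 5.0], [0.0, -1.0]])   # stable but highly non-normal

def smin(z):
    """Smallest singular value of zI - A; the eps-pseudospectrum is
    {z : smin(z) < eps}."""
    n = A.shape[0]
    return np.linalg.svd(z * np.eye(n) - A, compute_uv=False)[-1]

# Approximate stability radius: minimize smin along the imaginary axis.
omegas = np.linspace(-20.0, 20.0, 4001)
rho = min(smin(1j * w) for w in omegas)

# Both eigenvalues sit at -1, a distance 1 from the imaginary axis, yet a
# perturbation of norm ~rho << 1 already suffices to destabilize A.
```

For this matrix the minimum lies at ω = 0 and ρ(A) ≈ 0.19, far smaller than the spectral distance to the imaginary axis, which is precisely the situation the pseudospectrum-based robustness measure is designed to expose.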
It is important to note that the usage of transfer functions and the application of the H_∞ theory are incompatible with the concept of robustness proposed here. The input-output characteristics have to be independent of the internal state space coordinate system used to represent x and A_0. Indeed, using (1.1), the transfer functions (1.2) and (1.3) can be shown to be invariant under any similarity transformation x ← T^-1 x, A_0 ← T^-1 A_0 T. Hence, the system's H_∞-norm (1.4) is invariant under oblique coordinate transformations, but its matrix norm (1.5) is not. The latter is invariant only under orthogonal (or unitary) transformations, because only these transformations preserve the given state space norm and the induced metrics. It follows that, for systems where a certain state space norm is of particular interest, the information contained in the H_∞-norm is independent of any properties of that state space norm and is therefore insufficient to assess robustness in the sense explained above. The transfer function is an incomplete characterization of such a kind of system.

The paper is structured as follows. In section 2, the fundamentals of the numerical model of the linearized RWM dynamics used here (the STARWALL code) are briefly explained, and it is shown in detail how the implementation of the feedback logics casts the RWM control problem into the structure of a parametrized eigenvalue problem. These parameters need to be optimized in order to achieve robust stability. The optimization task is feasible only if the state space dimension of the model is substantially reduced. To this end, a novel model reduction procedure has been developed that is described in section 3. The reduction is based upon projection onto an appropriate subspace. This projection has to be orthogonal in order to preserve the state space norm of interest, which is introduced in the same section.
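The invariance argument can be checked numerically with invented matrices: an oblique similarity transformation leaves the transfer function unchanged but alters the spectral norm of the system matrix.

```python
import numpy as np

A0 = np.array([[-1.0, 4.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
T = np.array([[1.0, 1.0], [0.0, 1.0]])   # oblique (non-orthogonal) transform

A0t = np.linalg.solve(T, A0 @ T)         # T^{-1} A0 T
Bt = np.linalg.solve(T, B)               # T^{-1} B
Ct = C @ T                               # C T

def G(A, B_, C_, z):
    """Transfer function G(z) = C (zI - A)^{-1} B, cf. (1.2)-(1.3)."""
    return C_ @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B_)

z = 0.7 + 1.3j
tf_invariant = np.allclose(G(A0, B, C, z), G(A0t, Bt, Ct, z))
norm_before = np.linalg.norm(A0, 2)
norm_after = np.linalg.norm(A0t, 2)      # differs: the norm is not invariant
```

Here the transfer function values agree to machine precision while the spectral norms differ by almost 20 percent, illustrating why an H_∞-based robustness statement carries no information about a particular state space norm.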
In section 4, several concepts and objectives of controller optimization are formulated, with an emphasis on pseudospectra and related ideas. Their numerical implementation in the newly developed OPTIM code package is briefly discussed. Because the proposed objective functions have some peculiar properties, their optimization as performed by OPTIM requires special explanation, which is given in section 5.

The methodology is applied to a computational example related to ITER, but slightly simplified, as described in section 6. Finally, the work is summarized and discussed in section 7.

Structure of the feedback control problem
In the following three subsections, the mathematical structure of the RWM feedback control problem is explained. The result is an eigenvalue problem of a matrix depending on several parameters, which require optimization in order to robustly stabilize the system. Section 2.1 contains a brief description of the physics underlying the STARWALL code that is used to compute the stability of RWMs in the presence of a conducting wall and a system of active feedback coils. The structure of the corresponding eigenvalue equation will be presented. The feedback logics relating the actions of the coils to the signals measured by a set of magnetic field sensors is described in section 2.2. These feedback logics will be encoded by a small number of free design parameters. In section 2.3, inequality constraints for the parameters will be derived, which can be used to account for technological limits of the feedback system. Finally, in section 2.4 it is demonstrated how the eigenvalue equation has to be modified in order to account for the effect of a time delay between the sensor measurements and the actions taken by the coils. The feedback control problem is presented in its final form before the model reduction step.

RWM stability calculations including a feedback system
STARWALL is a 3D ideal magnetohydrodynamics (MHD) stability code specialized for RWMs [11,12,18]. It forms a self-consistent MHD-electrodynamical model of the system composed of the plasma, the conducting wall and the active coils. The plasma dynamics is linearized about a prescribed ideal MHD equilibrium. The basic equations of the code stem from the extended energy principle of ideal MHD [19,20], from the boundary conditions at the plasma-vacuum interface and the wall, and from circuit equations for the coils. The extended energy principle is modified by neglecting the kinetic energy of the plasma. This is admissible for plasma motions on typical RWM timescales. Under this approximation, the plasma state can be eliminated from the equations, and the system state is uniquely determined by the current density distribution on the conducting wall and by the coil currents.
The coils are modelled as thin, ribbon-like conductors with only one winding. The wall is also assumed to be thin, and the currents flowing in it are divergence-free. The surface current density j is determined by a current potential φ via

  j = n × ∇φ,   (2.1)

where n is a surface normal pointing outward. The wall is discretized into a triangular mesh. The current density is assumed to be constant on each triangle and can be derived from the φ values at the corresponding three vertices. Thus, together with the coil currents, the current potentials at the nodes of the wall mesh are the state variables of the discretized system. The wall mesh can have holes and may consist of more than one connected component. The current potential takes the same value on all nodes attached to a hole, and this value forms a single state variable for each hole, respectively. On one particular node per connected component, the φ value is set to zero and removed from the state vector. This corresponds to fixing a gauge constant for φ on that connected component and prevents singularity of some of the matrices introduced below.
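A minimal sketch (illustrative geometry, not STARWALL's actual implementation) of how a constant surface current density follows from the three vertex values of φ on one flat wall triangle:

```python
import numpy as np

# One flat wall triangle with a linear current potential phi; the surface
# current is j = n x grad(phi) and is constant on the triangle.
p = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])          # vertex positions
phi = np.array([0.0, 2.0, 3.0])          # current potential at the vertices

e1, e2 = p[1] - p[0], p[2] - p[0]
n = np.cross(e1, e2)
n = n / np.linalg.norm(n)                # unit normal (assumed outward)

# In-plane gradient of the linear function from its vertex values: solve the
# 2x2 Gram system in the triangle's tangent plane.
M = np.array([[e1 @ e1, e1 @ e2], [e2 @ e1, e2 @ e2]])
rhs = np.array([phi[1] - phi[0], phi[2] - phi[0]])
a, b = np.linalg.solve(M, rhs)
grad_phi = a * e1 + b * e2
j = np.cross(n, grad_phi)                # constant surface current density
```

By construction j is tangential and flows along the contour lines of φ, which is why the representation is automatically divergence-free.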

In addition, the 3D geometry of the wall and coils and the positions and orientations of the sensors, which measure deviations from the equilibrium magnetic field, are required as input to STARWALL. Furthermore, the shape of the plasma-vacuum interface is needed as well as the MHD force operator matrix. These data are provided as output of the 3D ideal MHD stability code CAS3D [21]. In CAS3D, the plasma displacement vector is expanded in terms of Fourier modes (characterized by poloidal and toroidal Fourier indices m and n) on each flux surface accounted for in the radial discretization, giving rise to the matrix representation of the force operator. Recently, the possibility of refining the radial discretization near flux surfaces where the safety factor becomes rational has been added to CAS3D. This grid refinement avoids unphysical spikes in the displacement vector field near the rational surfaces and improves the accuracy of the computations. In the case that non-axisymmetric geometries are considered in the STARWALL computation, it is advisable to include in the CAS3D Fourier expansion at least all those n's simultaneously for which the equilibrium is unstable with respect to external kinks in the absence of the wall. The coupling between different n's induced by the breaking of the axisymmetry can be significant [12].
The linear dynamics of the plasma-wall-coils system is governed by

  L ẋ = -(R_0 - U) x,   (2.2)

where x ∈ R^N is the state vector of the system. The first N_c components of x contain the coil currents (N_c is the number of coils), and the remaining components represent the current potential values at the nodes of the wall mesh. The symmetric matrix L is the sum of an inductance matrix and a matrix describing the effects due to the presence of the plasma, and the matrix

  R = R_0 - U   (2.3)

consists of the symmetric, positive definite resistance matrix R_0 and the matrix U mapping the system state x onto external voltages applied to the feedback coils. U has nonzero entries in the first N_c rows only and will be specified in the next subsection. Substituting a time dependence x ∼ e^{γt} and inverting L, one obtains the eigenvalue problem

  A_F x = γ x,   (2.4)

where

  A_F = A_0 + F   (2.5)

is the system matrix composed of the open-loop part

  A_0 = -L^-1 R_0   (2.6)

and the feedback part

  F = L^-1 U.   (2.7)

The system is stable if and only if all eigenvalues of A_F have negative real parts. Without feedback (U = 0), all eigenvalues are real, and positive ones belong to unstable RWMs. The feedback control problem consists in choosing U properly so that A_F becomes robustly stable.
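A toy sketch of this closed-loop structure, assuming A_F = A_0 + F with A_0 = -L^-1 R_0 and F = L^-1 U (all 2x2 numbers invented; in the real problem N is of the order of 10^4 and U acts only on the coil rows). An indefinite L plays the role of the destabilizing plasma contribution:

```python
import numpy as np

L = np.array([[1.0, 0.2],
              [0.2, -0.5]])            # symmetric, indefinite ("plasma" effect)
R0 = np.diag([1.0, 0.2])               # symmetric positive definite resistances
U = np.zeros((2, 2))
U[1, 1] = 0.6                          # feedback voltage gain on "coil" 1

A0 = -np.linalg.solve(L, R0)           # open-loop part
F = np.linalg.solve(L, U)              # feedback part
A_F = A0 + F                           # closed-loop system matrix

open_loop_stable = bool(np.linalg.eigvals(A0).real.max() < 0)
closed_loop_stable = bool(np.linalg.eigvals(A_F).real.max() < 0)
```

Because L is indefinite while R_0 is positive definite, the open loop has one real positive eigenvalue (the "RWM"), and the chosen gain renders the closed loop stable.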

The feedback controller model
As already adumbrated, a voltage control feedback model is adopted in this study. That means, depending on the system state, or, more precisely, on sensor measurements, voltages are applied to the feedback coils. This dependence is described by a matrix of gain factors that maps the vector of sensor measurements onto the vector of coil voltages. The elements of this gain matrix will be optimized within a physically reasonable matrix subspace. However, for such a kind of proportional gain voltage controller, the promptness of the feedback system response is limited by the L/R time constants of the coils. To circumvent this problem, the coil currents are taken as additional measured quantities, which are fed back onto the coil voltages by additional gain factors. These act exactly like additional artificial resistances serially connected with the coils and therefore effectively reduce the coil time constants. Larger total dc resistances ('natural' plus artificial resistances) have to be compensated for by larger voltage gains to some extent. The artificial resistances are not fixed a priori, but are used as additional degrees of freedom for optimization. Namely, it is not clear a priori if the fastest available response is also the most favourable one, or if the integrating character of a slower response can also have desirable effects.
To construct the gain matrix, all coils and sensors, respectively, are subdivided into one or more toroidal arrays, depending on the configuration. All coils within an array are assumed to be identical in construction and to have the same poloidal and radial position, and the same orientation. The same assumption should hold for all sensors in an array. Each coil array k (k = 1, . . . , K) is linked to each sensor array l (l = 1, . . . , L) via a gain sub-matrix G^kl. In addition, the current flowing in each coil in the kth array is fed back onto that coil's voltage by an additional gain factor -R̄_k, which is equal for all members of the array. It can be shown that this procedure, in principle, is equivalent to current control by means of cascade control³. Summarizing, for k = 1, . . . , K, the voltage vector

  u_k = -R̄_k i_k + Σ_{l=1}^{L} G^kl s_l   (2.8)

is applied to the coils in the kth toroidal array, where i_k is the vector of currents already flowing in these coils, and s_l is the signal vector measured by the lth sensor array. If the system is in state x, the sensor matrix S_l delivers s_l:

  s_l = S_l x.   (2.9)

To reduce the number of design parameters in a physically reasonable manner, the gain sub-matrix G^kl is constructed for each k = 1, . . . , K, l = 1, . . . , L in such a way that coil array k produces a field with toroidal Fourier index n in response to magnetic field perturbations with the same n, measured by sensor array l. This response is a linear combination of a response having the same toroidal phase as the perturbation and another one which is toroidally phase-shifted by 90/n degrees. The in-phase response is described by a basic gain matrix G^kl_{n,α}, the phase-shifted response by G^kl_{n,β}. G^kl is a linear combination of these basic matrices:

  G^kl = Σ_n (α^kl_n G^kl_{n,α} + β^kl_n G^kl_{n,β}),   (2.10)

where the sum runs over all n's to be controlled. The elements of G^kl_{n,α} and G^kl_{n,β} are given by

  (G^kl_{n,α})_ij = cos(n φ^kl_ij)   (2.11)

and

  (G^kl_{n,β})_ij = sin(n φ^kl_ij),   (2.12)

where φ^kl_ij is the toroidal angle between coil i of coil array k and sensor j of sensor array l. The free parameters α^kl_n (gains for in-phase responses) and β^kl_n (gains for phase-shifted responses) are left for optimization, together with R̄_k.

³ Application of a controller voltage U to a serial connection of a coil with resistance R and an additional resistance R̄ gives a current I(t) and a voltage drop U(t) = (R + R̄) I_∞ - R̄ I(t) at the coil, where I_∞ = U/(R + R̄) is the saturation current. This voltage controller is equivalent to a current controller subdivided into an 'exterior' and an 'interior' controller, where R̄ is merely a numerical constant determining these controllers' properties. In response to a sensor signal, the exterior controller 'demands' a current I_∞, where the current gain equals the voltage gain divided by R + R̄. The interior controller measures the actual coil current I(t), computes the auxiliary quantity Ĩ(t) = (R̄/(R + R̄)) I(t) and applies the voltage U(t) = -(R + R̄)(Ĩ(t) - I_∞), being equal to the voltage drop mentioned above, to the coil, so that I(t) approaches I_∞.
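The construction (2.10)-(2.12) can be sketched for a single toroidal index with an invented array geometry. The key property is that an n = 1 sensor pattern produces a purely n = 1 voltage pattern on the coils, with the in-phase and phase-shifted parts set by α and β:

```python
import numpy as np

n = 1
coil_angles = np.deg2rad([0.0, 90.0, 180.0, 270.0])      # 4 coils in the array
sensor_angles = np.deg2rad([45.0, 135.0, 225.0, 315.0])  # 4 sensors

# Toroidal angles phi^{kl}_{ij} between coil i and sensor j.
phi = coil_angles[:, None] - sensor_angles[None, :]
G_alpha = np.cos(n * phi)                                # (2.11), in-phase
G_beta = np.sin(n * phi)                                 # (2.12), phase-shifted

alpha, beta = 2.0, -0.5                                  # free gain parameters
G = alpha * G_alpha + beta * G_beta                      # (2.10), single n

# Response to an n = 1 perturbation pattern seen by the sensors:
s = np.cos(n * sensor_angles)                            # sensor signals
u = G @ s                                                # coil voltages
```

Summing the products of cosines over the equally spaced sensors shows that u_i = 2α cos(nθ_i) + 2β sin(nθ_i), i.e. a clean n = 1 voltage distribution whose toroidal phase is controlled by the ratio β/α.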
As will be shown in the following, this feedback controller model can be compactly written down by deriving an expression for the voltage matrix U introduced in equation (2.3). With proper numbering of the coil current components in the state vector x (the first N_c components) and considering (2.8)-(2.10), one finds

  U = Σ_n Σ_{k=1}^{K} Σ_{l=1}^{L} (α^kl_n U^kl_{n,α} + β^kl_n U^kl_{n,β}) + Σ_{k=1}^{K} R̄_k I^k,   (2.13)

which means U_k = Σ_n Σ_{l=1}^{L} (α^kl_n G^kl_{n,α} + β^kl_n G^kl_{n,β}) S_l for k = 1, . . . , K, so that U is a linear combination of the basic voltage matrices U^kl_{n,α} and U^kl_{n,β} (2.14), whose rows belonging to coil array k contain G^kl_{n,α} S_l and G^kl_{n,β} S_l, respectively, and the basic current matrices I^k (2.15), which represent the feedback of the coil currents in array k onto the voltages of the same coils. Consequently, the feedback part of the system matrix is the corresponding linear combination

  F = Σ_n Σ_{k=1}^{K} Σ_{l=1}^{L} (α^kl_n F^kl_{n,α} + β^kl_n F^kl_{n,β}) + Σ_{k=1}^{K} R̄_k F^k_I   (2.16)

with the basic feedback matrices F^kl_{n,α} = L^-1 U^kl_{n,α}, F^kl_{n,β} = L^-1 U^kl_{n,β} and F^k_I = L^-1 I^k. Thus, the eigenvalue problem (2.4) has become a parametrized one.

Constraints
In most cases, the parameter optimization should be carried out subject to constraints. Such constraints are useful for ensuring the technological feasibility of the parameters. Furthermore, termination of the optimization procedure can only be guaranteed if the allowable parameter values form a bounded set in parameter space. In principle, it is possible to limit the coils' time constant reciprocals, voltages and currents.
It makes sense to prevent the time constants from becoming arbitrarily small by imposing a lower bound T^k_min for each coil array k:

  L_k / (R_k + R̄_k) ≥ T^k_min,   (2.17)

where L_k is the self-inductance of each coil in array k, and R_k is its 'natural' dc resistance.
Constraining the voltage and current is more difficult. Because the control problem is a linear one and the voltages applied in response to a measured sensor signal are proportional to that signal, some estimate concerning the maximum signal magnitude S max has to be given in order to formulate the voltage constraint. Here, the following simplifying assumptions are made, similar to those described in [9]: S max is equal to an RWM detection limit, or noise level. As long as the sensor signal is below S max , the feedback system does not react at all. As soon as the threshold S max is exceeded, the feedback is switched on and counteracts the origin of the sensor signal. Soon, the latter drops below S max again, so that S max is never significantly exceeded.
Once a value for S_max is fixed, the voltage constraint (2.18) can be written for each coil array k = 1, . . . , K, where N^l_s is the number of sensors in array l, and U^k_max is some, for example, technologically caused voltage limit for the coils in array k. The expression on its left-hand side is approximately equal to the root mean square voltage applied to coil array k (mean taken over all coils in the array) in response to a measured magnetic field perturbation with amplitude S_max; it has been derived from (2.8), (2.10), (2.11) and (2.12). It makes sense to consider the root mean square voltage and current because the voltage and current can be assumed to be alternating. Stabilized RWMs, and therefore also the distribution of voltages and currents among the coils, typically rotate in the toroidal direction, with frequencies comparable to the RWM growth rates without feedback, but typically much lower than the coil time constant reciprocals. From the latter fact it follows that the instantaneous current is always approximately equal to the saturation current, which corresponds to the instantaneous voltage divided by the total dc resistance. The current distribution among the coils has virtually the same toroidal phase as the voltage distribution. Consequently, a current constraint (2.19) can be derived from (2.18) by dividing by the total dc resistance, with I^k_max being the current limit to be set for the coils in array k.
The above assumptions concerning the evolution of the sensor signal magnitude after it exceeds S_max have to be taken with a pinch of salt, since the actual evolution strongly depends on the initial condition. If the initial state (the state at the moment when the sensors start to detect a signal, as it exceeds S_max) is largely an RWM, the assumptions should be reasonably justified, since the active coils indeed counteract the origin of the detected signal, namely, the RWM. This is no longer true for different initial states, where significant transient amplification of the sensor signal might actually occur after exceeding the detection limit. First of all, an initial sensor signal might be due to small-scale eddy currents, not related to an RWM, in the wall parts close to the sensors. In such a case, the action of the feedback system will erroneously and rapidly excite an RWM, which, however, will be controlled later on if the system is stable (an example will be discussed in section 6). This process will probably be accompanied by a transient increase of the sensor signal magnitude. Secondly, one has to take into account the fact that the sensors 'see' only a low-dimensional subspace of the state space, which means there is a high-dimensional subspace that is invisible to the sensors. To any initial state vector, an arbitrarily large portion from the invisible subspace can be added without changing the instantaneous sensor measurement. Since the invisible subspace cannot be expected to be invariant under the system dynamics, the large, initially invisible portion will increasingly project onto the visible subspace as it dynamically evolves. By this means, initial states with arbitrarily large transient amplification of the sensor signal can be constructed theoretically. However, such initial state vectors have to be large themselves.
Fortunately, initial states that produce large sensor signal amplifications lie in the stable subspace of the open-loop system, because they correspond to current distributions in the wall that are unrelated to RWMs. Their spontaneous excitation can be caused only by physical processes not included in the model used here (e.g. edge localized modes (ELMs)). Estimating the amplitude and relevance of such perturbations would require nonlinear simulations of more complete, forced-dissipative MHD models, but is not possible here. Therefore, the possibly overoptimistic constraints (2.18) and (2.19) are relied upon, noting that if some amount of transient sensor signal amplification is expected, it could be incorporated as a factor into the estimate of S max .
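The quasi-static argument behind the current constraint, namely that the coil current tracks the saturation current whenever the RWM rotation is much slower than the effective coil time constant reciprocal (R + R̄)/L, can be checked with a single-coil circuit model (all numbers hypothetical):

```python
import numpy as np

Lc, R, Rbar = 1e-3, 0.05, 0.45           # H, Ohm, Ohm -> tau = Lc/(R+Rbar) = 2 ms
omega = 2 * np.pi * 5.0                  # 5 Hz rotation, so 1/omega >> tau
U0 = 100.0                               # volts, illustrative drive amplitude

dt = 1e-5
t = np.arange(0.0, 1.0, dt)
U = U0 * np.cos(omega * t)               # rotating-voltage waveform at one coil
I = np.zeros_like(t)
for k in range(len(t) - 1):              # explicit Euler for Lc I' = U - (R+Rbar) I
    I[k + 1] = I[k] + dt * (U[k] - (R + Rbar) * I[k]) / Lc

I_sat = U / (R + Rbar)                   # instantaneous saturation current
# relative deviation after the initial transient has decayed
err = np.max(np.abs(I[len(t) // 2:] - I_sat[len(t) // 2:])) / (U0 / (R + Rbar))
```

With ωτ ≈ 0.06, the current follows the saturation value to within a few per cent, which is why the current constraint (2.19) can be obtained from the voltage constraint by dividing by the total dc resistance.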

Effect of time delay between sensors and actuators
In the following, a technically caused time delay between an 'event' detected by the sensors and the coil voltage application in response to that event is accounted for. This delay can be characterized by an impulse response r(t) satisfying

  r(t) = 0 for t < 0   and   ∫_0^∞ r(t) dt = 1.   (2.20)

For simplicity, the impulse response is here assumed to be the same for all magnetic field sensors and all ammeters at the coils. But the following explanations can readily be generalized. The delay causes that, at time t, the sensors do not 'see' the present system state, but a combination of states from the past. It follows from (2.5) that the system dynamics is then governed by

  ẋ(t) = A_0 x(t) + F ∫_0^∞ r(t') x(t - t') dt'.   (2.21)

Now, a new matrix Â_F is sought that describes the system dynamics so that ẋ(t) = Â_F x(t) and therefore x(t - t') = e^{-Â_F t'} x(t), leading to

  Â_F = A_0 + F ∫_0^∞ r(t') e^{-Â_F t'} dt'.   (2.22)

Depending on the particular form of r(t), an exact solution for Â_F might be difficult. But, under the assumption that r(t) is essentially zero for all t > q and some q, which is small compared to all timescales occurring in the system dynamics, the exponential in (2.22) can be approximated to first order in t'. With (2.20) and the 'average delay time'

  τ = ∫_0^∞ t' r(t') dt',   (2.23)

one finally arrives at the solution

  Â_F = (I + τ F)^-1 (A_0 + F).   (2.24)

Under the approximation used, r(t) has exactly the same effect as a pure time delay r_d(t) = δ(t - τ). The final form of the unreduced, parametrized eigenvalue problem reads

  Â_F x = γ x   (2.25)

with Â_F given by (2.24) and the parameter dependence residing in F as given by (2.16).
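Under the first-order expansion e^{-Â_F t'} ≈ I - Â_F t', the implicit relation for Â_F becomes the linear equation Â_F = A_0 + F(I - τ Â_F), i.e. (I + τF) Â_F = A_0 + F. A sketch with invented toy matrices (this closed form is a consequence of the first-order approximation, not an exact treatment of the delay):

```python
import numpy as np

A0 = np.array([[-1.0, 0.0], [0.0, 0.5]])   # toy open-loop part, one unstable mode
F = np.array([[0.0, 0.0], [0.0, -2.0]])    # toy feedback part
tau = 0.05                                 # average delay time

n = A0.shape[0]
# Solve (I + tau F) Ahat = A0 + F for the delay-corrected system matrix.
Ahat = np.linalg.solve(np.eye(n) + tau * F, A0 + F)

# Verify the first-order fixed-point relation Ahat = A0 + F (I - tau Ahat).
residual = np.linalg.norm(Ahat - (A0 + F @ (np.eye(n) - tau * Ahat)))
```

In the limit τ → 0 the expression reduces to the delay-free matrix A_0 + F, as it must.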

Model reduction
For the full-sized matrix Â_F, the parameter optimization is computationally not feasible since the state space dimension N is typically of the order of 10^4. A model reduction procedure, which dramatically reduces the system dimension but retains as much information on the original system as possible, is absolutely necessary. More precisely, this kind of reduction should be arranged in such a way that the neglected part of the system dynamics is virtually unaffected by the presence of a feedback system, whereas the retained part responds to the feedback essentially in the same way as the original system does. A typical approach, which is also adopted here, consists of two steps: firstly, the unstable subspace of the system without feedback is split off. Secondly, the stable subspace is projected onto a further, 'dominant' subspace by means of a similarity transform and truncation (deletion of 'non-dominant' rows and columns). Finally, the unstable subspace is given back to the reduced system. The various truncation-based model reduction methods that can be found in the literature ([22, 23] and references therein) differ from each other in the choice of the similarity transform or, in other words, in the measure of dominance. For reasons already explained in the introduction, a novel reduction method will be introduced here, which is based on an orthogonal similarity transform and thus preserves the metrics in the state space, in contrast to the methods available in the literature. This procedure shall be denoted isometric truncation.
As the first step, the separation of the unstable subspace is accomplished by computing all eigenvalues and eigenvectors of the open-loop system. The unstable subspace is spanned by the eigenvectors belonging to positive eigenvalues. But the solution of the eigenvalue problem also serves another purpose: a physically based state space metric is established after the system is subjected to a preliminary similarity transform defined by the eigenvector matrix, which is orthogonal not in the strict, but in a more general sense, as explained in section 3.1. The second step consists of the isometric truncation and is described in section 3.2.

The open-loop eigenvalue problem and the state space norm
The open-loop eigenvalue problem A₀x_i = γ_i x_i, i = 1, …, N, is most efficiently solved by first computing the solution of the generalized eigenvalue problem L x_i = μ_i R₀ x_i, where L and R₀ are symmetric and R₀ is positive definite, and then substituting γ_i = −μ_i⁻¹, cf (2.2)-(2.4). The ordering γ_i ≥ γ_{i+1} is assumed in the following. The eigenvectors x_i are normalized so that they satisfy the generalized orthonormality condition x_i^T R₀ x_j = δ_ij. For any system state x, x^T R₀ x corresponds to the ohmic loss produced by the current distribution represented by x, since R₀ is the resistance matrix. Hence, all eigenvectors are normalized to generate unit ohmic loss. The norm is defined in terms of a physically measurable quantity and is considered the norm of interest. This motivates a transformation of the system states to eigenvector coefficients, x̃ = X⁻¹x, where X = (x₁ x₂ … x_N). In the new coordinate system, the square root of the ohmic loss can be calculated in terms of the canonical scalar product: ||x̃|| = √(x̃^T x̃) = ||x||_{R₀}. All system matrices are transformed as M̃ = X⁻¹MX, where M stands for A₀, F^{kl}_{n,α}, F^{kl}_{n,β} and F^k_I, respectively, cf (2.16). Clearly, Ã₀ = diag(γ₁, γ₂, …, γ_N).
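As a small illustration of this normalization convention, the sketch below solves a generalized symmetric eigenproblem with synthetic stand-ins for L and R₀ (random toy matrices, not the actual inductance and resistance matrices); scipy's solver happens to normalize the eigenvectors exactly in the 'ohmic loss' metric:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
N = 6                                    # toy dimension (the real system has N ~ 1e4)

# Synthetic symmetric L and symmetric positive definite R0.
L = rng.standard_normal((N, N)); L = 0.5 * (L + L.T)
M = rng.standard_normal((N, N)); R0 = M @ M.T + N * np.eye(N)

# Generalized eigenvalue problem L x = mu R0 x; eigh normalizes the
# eigenvector matrix so that X^T R0 X = I, i.e. every eigenvector
# produces unit ohmic loss x^T R0 x = 1.
mu, X = eigh(L, R0)
gamma = -1.0 / mu                        # open-loop growth/decay rates

# Generalized orthonormality in the 'ohmic loss' metric:
print(np.allclose(X.T @ R0 @ X, np.eye(N)))
```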

Isometric truncation
The stable part of the open-loop system in the new eigenvector coordinates is described by the matrix Ā₀ ∈ R^{(N−u)×(N−u)}, where u is the number of non-negative, i.e. unstable, eigenvalues. Ā₀ is generated from Ã₀ by deleting the first u rows and columns: Ā₀ = diag(γ_{u+1}, γ_{u+2}, …, γ_N). The stable part of the transformed state vector x̃ is denoted x̄ and contains the last N − u components of x̃. To derive the isometric truncation procedure, the stable part of the control system is described by the standard 'black box' formalism of control theory already described in the introduction. The system is assumed to have an actuator input vector v ∈ R^{N_i} and a sensor output vector y ∈ R^{N_o}, resulting in the description

ẋ̄ = Ā₀x̄ + B̄v,  y = C̄x̄, (3.4)

with input matrix B̄ and output matrix C̄. In the following, it is shown how these matrices can be derived from their counterparts B and C of the full system (including the unstable part) in the original coordinate system, where the system description is given by (1.1). There, the input matrix B ∈ R^{N×N_i} (N_i = 2K N_t, where N_t is the number of different toroidal Fourier indices n to be controlled) describes the excitation of cosine and sine current patterns in the different coil arrays and with different toroidal 'wave numbers' n. It has a block structure: for each k = 1, …, K, the block B_k ∈ R^{N_c^k × 2N_t} describes the action of the kth coil array, where N_c^k is the number of coils in array k. The block is built from the toroidal harmonics with n₁, n₂, …, n_{N_t} being the different n's to be controlled and θ_i^k being the toroidal position (angle) of the barycentre of the ith coil in array k. Similarly, the output matrix C ∈ R^{N_o×N} (N_o = 2L N_t) projects the output vectors of the sensor arrays onto cosine and sine patterns. For each l = 1, …, L, a block C_l provides the outputs of the lth sensor array, C_l = P_l S_l, where S_l is the lth sensor matrix, cf (2.9).
P_l is a projector onto the cosine and sine components under consideration. Here, N_s^l is the number of sensors in array l, and ϑ_j^l is the toroidal position of the jth sensor in that array; these positions enter the explicit form of P_l. The corresponding matrices B̄ and C̄ in the stable subspace are generated by first computing the transformed matrices B̃ = X⁻¹B and C̃ = CX and then deleting the first u rows in B̃ to obtain B̄ and the first u columns in C̃ to get C̄.
Before returning to the model reduction problem, it should be noted that the above definitions of the matrices B and C are consistent with the derivation of the controller model in section 2.2. This can be seen by noting that, in the case that the artificial coil resistances R̃_k are zero for all k = 1, …, K, the controller model is equivalent to a particular static output feedback applied to (1.1). (This case is assumed throughout the model reduction procedure; it yields reduced models that work equally well for nonzero R̃_k unless the natural dc resistances R_k are extremely small, as for superconducting coils.) That means the feedback matrix F derived in section 2.2 can be reconstructed by connecting the input v with the output y via a suitably chosen controller matrix K, so that v = Ky = KCx and therefore F = BKC, cf (1.1) and (2.5).
Using the definition ϕ^{kl}_{ij} = θ_i^k − ϑ_j^l, see (2.11) and (2.12), it can be shown that BKC is equal to (2.16) for R̃_k = 0, k = 1, …, K, if the controller matrix K is chosen suitably.

Almost all truncation-based model reduction methods currently available in the literature rely on oblique transformations; among the most popular are balanced truncation and related procedures [22]-[30]. The isometric truncation presented here is based upon an orthogonal transformation, but the basic idea used to define this transformation is exactly the same as in Moore's original work on balanced truncation [24], namely, the analysis of the response to two different kinds of test signals injected into the control system (3.4). The first kind corresponds to the initial condition X̄(0) = B̄, i.e. impulses applied at the actuator inputs, so that the state responses form the columns of X̄(t) = exp(tĀ₀)B̄. It can be expected that, for all t > 0, the columns of X̄(t) project most strongly onto those state space directions that can most easily be controlled by the actuators, and the aim is to extract these directions. Similarly, the second kind of test signals is defined by adding a term e_i δ(t), i = 1, …, N − u, with e_i now being the ith canonical unit vector in the stable state space, to the right-hand side of the equation for ẋ̄ in (3.4). The corresponding output vector responses y_i(t) are collected into Y(t) = (y₁(t) y₂(t) … y_{N−u}(t)) = C̄ exp(tĀ₀). For all t > 0, the columns of Y^T(t) should project most strongly onto those state space directions that can most easily be observed by the sensors and, again, one aims at finding these directions. In the model reduction procedure, the directions that are both almost (or exactly) uncontrollable and unobservable are discarded. The problem of finding low-dimensional subspaces onto which the columns of X̄(t) and Y^T(t) primarily project is solved by principal component analysis (PCA) [31], a statistical data analysis method widely used in disciplines ranging from meteorology to psychology.
Given a matrix Z(t) ∈ R^{P×Q}, t ∈ [t₁, t₂], whose columns contain Q vector time series, PCA yields a set of basis vectors w_i ∈ R^P and time-dependent coefficient vectors c_i(t) ∈ R^Q, i = 1, …, P, so that the columns of Z(t) are expanded in terms of the w_i's: Z(t) = Σ_{i=1}^P w_i c_i^T(t). The w_i's are chosen in such a way that, for all k < P, the error introduced by projecting the columns of Z(t) onto the k-dimensional subspace spanned by w₁, …, w_k becomes minimal in a least squares sense, i.e.

∫_{t₁}^{t₂} ||Z(t) − Σ_{i=1}^k w_i c_i^T(t)||_F² dt = min,

where ||·||_F is the Frobenius norm. The solution of this optimization problem is to choose the w_i's as eigenvectors of the Gramian, or covariance matrix, W = ∫_{t₁}^{t₂} Z(t) Z^T(t) dt. Since W is symmetric and positive semidefinite, its eigenvectors w₁, …, w_P are orthogonal, and its eigenvalues w₁ ≥ w₂ ≥ … ≥ w_P are non-negative. For all i = 1, …, P, the ratio w_i / Σ_{j=1}^P w_j describes the fraction at which the variance of the matrix time series w_i c_i^T(t) contributes to the total variance of Z(t). Therefore, w_i is an 'importance' measure for w_i.
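The least-squares optimality of the Gramian eigenvectors is easy to verify numerically. A minimal sketch with a synthetic decaying matrix time series (all sizes and matrices are illustrative, not taken from the RWM model):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
P, Q = 8, 3                              # toy sizes

# Synthetic decaying matrix time series Z(t) = exp(tA) B on a grid;
# its columns play the role of the Q vector time series.
tgrid = np.linspace(0.0, 5.0, 401)
dt = tgrid[1] - tgrid[0]
A = -np.diag(np.linspace(0.5, 4.0, P))
B = rng.standard_normal((P, Q))
Z = [expm(t * A) @ B for t in tgrid]

# Gramian (covariance) matrix W = integral Z Z^T dt by quadrature.
W = sum(Zt @ Zt.T for Zt in Z) * dt
w, V = np.linalg.eigh(W)
w, V = w[::-1], V[:, ::-1]               # eigenvalues in descending order

# The squared projection error onto the leading k eigenvectors equals
# the sum of the discarded eigenvalues (least-squares optimality).
k = 3
Pk = V[:, :k] @ V[:, :k].T
err2 = sum(np.linalg.norm(Zt - Pk @ Zt, 'fro') ** 2 for Zt in Z) * dt
print(err2, w[k:].sum())
```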
It follows that, when looking for the system states that are most easily controllable, one should carry out a PCA of X̄(t) = exp(tĀ₀)B̄ and consider the leading eigenvectors of the controllability Gramian W_c = ∫₀^∞ exp(tĀ₀) B̄B̄^T exp(tĀ₀^T) dt. Analogously, a PCA of Ȳ^T(t) = exp(tĀ₀^T)C̄^T can be performed to find the most easily observable states as the leading eigenvectors of the observability Gramian W_o = ∫₀^∞ exp(tĀ₀^T) C̄^TC̄ exp(tĀ₀) dt. In many PCA applications, the time integration has to be carried out numerically by sampling Z(t), but in the case of W_c and W_o there exists a convenient alternative. Namely, they are the solutions of the Lyapunov equations

Ā₀W_c + W_cĀ₀^T + B̄B̄^T = 0,  Ā₀^T W_o + W_oĀ₀ + C̄^TC̄ = 0 (3.17)

[22]. Normally, solving a Lyapunov equation involves the computation of the Schur form of Ā₀ [22], but in the case at hand the necessary work has already been done: since Ā₀ = diag(γ_{u+1}, …, γ_N) is diagonal, (3.17) can be solved at virtually no extra cost, (W_c)_ij = −(B̄B̄^T)_ij / (γ_{u+i} + γ_{u+j}), and analogously for W_o. To proceed with the model reduction, all the system states that are essentially both uncontrollable and unobservable have to be excluded. Hence, W_c and W_o need to be combined in order to account for controllability and observability simultaneously. In the case of balanced truncation and related methods, oblique transformations are defined that are associated with the product W_cW_o. Here, a different, somewhat simpler approach is taken in order to obtain an orthogonal transformation: instead of analysing X̄(t) and Y^T(t) separately, both matrix time series are analysed at once. To this end, the joint controllability and observability Gramian

W = W_c / (2 tr W_c) + W_o / (2 tr W_o)

is defined here, corresponding to a PCA of the combined time series (2 tr W_c)^{−1/2} X̄(t) and (2 tr W_o)^{−1/2} Y^T(t). In this combination, the matrix that has fewer columns than the other is augmented with zeros to fit the size of the larger matrix. The trace prefactors ensure that controllability and observability are given equal weight in the sense that both sets of time series are normalized to have the same total signal energy of 0.5.
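For a diagonal, stable Ā₀, the closed-form Lyapunov solution can be verified directly. A sketch with illustrative decay rates and input matrix; scipy's general Lyapunov solver is used only as a cross-check:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(2)
gammas = np.array([-0.2, -1.0, -3.5, -7.0])   # stable rates (illustrative)
A0 = np.diag(gammas)
B = rng.standard_normal((4, 2))

# Closed-form controllability Gramian for diagonal stable A0:
# (Wc)_ij = -(B B^T)_ij / (gamma_i + gamma_j)
G = B @ B.T
Wc = -G / (gammas[:, None] + gammas[None, :])

# Verify the Lyapunov equation A0 Wc + Wc A0^T + B B^T = 0
res = A0 @ Wc + Wc @ A0.T + G
print(np.abs(res).max())

# Cross-check against a general-purpose solver
Wc_ref = solve_continuous_lyapunov(A0, -G)
print(np.allclose(Wc, Wc_ref))
```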
To perform the isometric truncation, the eigenvalue problem

Wθ_i = λ_i θ_i (3.20)

is solved, with the eigenvalues λ_i sorted in descending order. For all i, λ_i is equal to the variance fraction explained by the projection of all the time series onto θ_i, since Σ_{i=1}^{N−u} λ_i = 1. The orthogonal eigenvectors θ_i are normalized to unity and define the transformation matrix T̄ = (θ₁ θ₂ … θ_{N−u}) (3.21) acting in the stable subspace. In the full space represented by the open-loop eigenvector coordinates, the orthogonal transformation matrix T is obtained by extending T̄ with the identity acting on the unstable subspace; it transforms Ã₀ into Ǎ₀ and the matrices F̃^{kl}_{n,α}, F̃^{kl}_{n,β} and F̃^k_I into F̌^{kl}_{n,α}, F̌^{kl}_{n,β} and F̌^k_I. Together with the free parameters α^{kl}_n, β^{kl}_n and R̃_k, these transformed matrices define the transformed system matrix A according to (3.23), cf (2.24), where F̌ is constructed from F̌^{kl}_{n,α}, F̌^{kl}_{n,β} and F̌^k_I analogously to (2.16). Finally, in the truncation step the lowermost rows and rightmost columns in Ǎ₀ and F̌ are deleted, retaining only an upper left N_red × N_red block in each. The truncated version of A is constructed according to (3.23), but substituting the truncated versions of Ǎ₀ and F̌. If Σ_{i=N_red+1−u}^{N−u} λ_i ≪ 1, it can be expected that the truncated system mimics the full system quite accurately in responding to the feedback. Using the truncated model, an optimally stabilizing parameter set is searched for by means of the methodology described in sections 4 and 5. Afterwards, the parameters found are used to construct the full-sized matrix A. For quality control, the properties (eigenvalues, pseudospectra, ||e^{tA}|| curves, cf section 4) of the full model and the reduced one have to be compared. If there are unacceptable deviations, the optimization step has to be repeated using a less severely truncated model, i.e. a higher value of N_red.
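The whole reduction step can be sketched in a few lines for a toy system. All matrices below are synthetic, and for simplicity the input and output matrices are truncated directly rather than the feedback matrices of the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
n, ni, no, nred = 8, 2, 2, 4             # toy sizes; nred = retained states

gam = -np.linspace(0.5, 8.0, n)          # stable open-loop eigenvalues
A0 = np.diag(gam)
B = rng.standard_normal((n, ni))
C = rng.standard_normal((no, n))

# Gramians in closed form (A0 diagonal), then the joint Gramian with
# trace normalization so that controllability and observability get
# equal weight (total signal energy 0.5 each).
S = gam[:, None] + gam[None, :]
Wc = -(B @ B.T) / S
Wo = -(C.T @ C) / S
W = Wc / (2 * np.trace(Wc)) + Wo / (2 * np.trace(Wo))

lam, T = np.linalg.eigh(W)
lam, T = lam[::-1], T[:, ::-1]           # descending 'variance fractions'

# Orthogonal (isometric) truncation: rotate, keep the leading block.
Ared = (T.T @ A0 @ T)[:nred, :nred]
Bred = (T.T @ B)[:nred, :]
Cred = (C @ T)[:, :nred]
print(np.isclose(lam.sum(), 1.0), np.allclose(T.T @ T, np.eye(n)))
```

Note that the transformation is orthogonal by construction (eigenvectors of a symmetric matrix), so the state space norm is preserved, in contrast to balanced truncation.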
Since the system states correspond to current distributions on the conducting wall and in the coils, the state space directions onto which the system is projected during the isometric truncation procedure can be interpreted as current patterns. Represented by state vectors in the original, untransformed coordinate system, these current patterns are given by the columns of the 'pattern matrix'

Π = XT, (3.24)

which comprises the preliminary transform to open-loop eigenvector coordinates and the subsequent orthogonal transform. The column vectors of Π = (π₁ π₂ … π_N) can be used for visualization of the patterns and are orthonormal with respect to the 'ohmic loss' scalar product: π_i^T R₀ π_j = δ_ij. The leading patterns can be interpreted as representing the most important physical processes involved in the control system.
In the following, A will denote both the full-sized, transformed system matrix and its truncated counterpart. The distinction between the two will be explicitly noted wherever necessary.

Concepts for controller optimization
The question remains, what is 'optimal stability'? When attempting to stabilize a parametrized matrix, two basic objectives are obvious: on the one hand, it is desirable to obtain good asymptotic stability. That means all eigenvalues lie in the left complex half-plane and are comfortably bounded away from the imaginary axis, so that initial perturbations decay reasonably fast as t → ∞. On the other hand, the stability should be robust with respect to uncertainties in the parameters or, more generally, to imperfections in the mathematical model. Assuming that such uncertainties are reflected by uncertainties in the matrix elements, robust stability means that, if the matrix A is stable, then (A + E) should be stable for any matrix perturbation E of 'moderate strength'. Unfortunately, achievement of the first objective does not necessarily imply achievement of the second. A large distance of the eigenvalues from the imaginary axis does not ensure robustness if the eigenvalues are very sensitive to perturbations of A.
Eigenvalues of normal matrices, i.e. matrices with an orthogonal eigensystem, are most insensitive. If the matrix is non-normal, which is the case for the RWM problem with feedback, the sensitivity can increase dramatically. Furthermore, there is a second unfavourable feature: if A is stable, but non-normal, then ||e tA || may be much greater than unity for some t > 0 (this holds for any matrix norm, but the matrix 2-norm is used throughout this study). This means that the initial state at t = 0 can be chosen in such a way that it exhibits strong transient growth before it eventually decays [17] (of course, not all possible initial states suffer from transient growth, in general). Systems that are theoretically stable might be practically unstable if the transient amplification is too strong. This fact gives rise to a third objective in matrix stabilization: the transient peak, i.e. the supremum of ||e tA || for t > 0, should be kept at a moderate level. In the following, these three objectives are put into quantitative terms. In section 4.1, measures to assess the asymptotic stability of a matrix are introduced. Following [17], measures of robust stability are derived in section 4.2, which are based on the central concept of matrix pseudospectra. In section 4.3, it is explained how to tackle the problem of transient peaks, which turn out to be related to pseudospectra, as well. Finally, the numerical realization of these concepts in the OPTIM code is briefly explained in section 4.4.
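The transient-growth phenomenon is easy to reproduce numerically. The following sketch uses a standard textbook 2×2 example, unrelated to the actual RWM matrices: a stable but strongly non-normal matrix whose ||e^{tA}|| rises far above unity before decaying.

```python
import numpy as np
from scipy.linalg import expm

# Stable but strongly non-normal matrix (illustrative): both
# eigenvalues are negative, yet ||exp(tA)|| transiently grows
# far above 1 before it eventually decays.
A = np.array([[-1.0, 100.0],
              [0.0, -2.0]])

ts = np.linspace(0.0, 8.0, 401)
norms = [np.linalg.norm(expm(t * A), 2) for t in ts]
peak = max(norms)
print(peak, norms[-1])
```

For this matrix, the transient peak is of the order of 25 even though both eigenvalues decay, illustrating why a stable spectrum alone is not a sufficient quality criterion.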

Measures of asymptotic stability
The traditional measure of asymptotic stability is the spectral abscissa σ(A), that is, the largest real part of all eigenvalues of A:

σ(A) = max_i Re γ_i. (4.1)

Minimizing σ(A) under variations of the parameters upon which A depends gives the best available asymptotic decay rate of initial disturbances. However, this function 'sees' only the leading eigenvalue(s) (those with the largest real parts), and in many cases several eigenvalues occupy the leading position simultaneously when the optimal solution has been found, giving the 'least stable subspace' a rather high dimension. In such cases, it may be possible to push some of these eigenvalues much further to the left in the complex plane by additional parameter variations without affecting the other eigenvalues very much. Therefore, a second function η(A), given by (4.2), which accounts for all the eigenvalues but gives more weight to the leading ones, has been implemented in OPTIM in addition to σ(A); it is given the name exponential spectral function here. Minimizing η(A) establishes a compromise in attempting to shift all eigenvalues as far as possible to the left. Neither σ(A) nor η(A) guarantees a high level of robustness of the stability after optimization, because eigenvalue sensitivity is not taken into account. However, under the assumption that the sensitivity of the optimal solution is similar in both cases, optimization of η(A) is likely to produce somewhat more robust results, because most of the eigenvalues should then have a greater distance to the imaginary axis than after σ(A) optimization. But this assumption need not hold.

Eigenvalue sensitivity and robust stability
To assess the robustness of stability it is necessary to consider the sensitivity of eigenvalues. An elegant way to capture the sensitivity of the entire spectrum is given by the definition of the ε-pseudospectrum γ_ε(A):

γ_ε(A) = {z ∈ C: z is an eigenvalue of A + E for some E ∈ C^{N×N} with ||E|| < ε}. (4.3)

While the eigenvalue spectrum of A is a discrete set of points in C, γ_ε(A) is an open set in C containing the spectrum. Since (4.3) involves a matrix norm (in this study, the 2-norm), the ε-pseudospectra depend on the state space coordinate system. This is exactly the reason why the similarity transform underlying the model reduction was required to be orthogonal.
One might immediately ask why the perturbations E are allowed to be complex, since in physical systems perturbations are generally real. Indeed, the real structured pseudospectrum, obtained by replacing E ∈ C^{N×N} with E ∈ R^{N×N} in (4.3), attracts increasing interest in the control theory community. But considering complex pseudospectra instead of real ones has certain advantages. The former are easier to deal with than the latter, both theoretically and practically. More importantly, complex pseudospectra provide insight into the transient behaviour of the system, as will be discussed in section 4.3, whereas real pseudospectra do not. Finally, a system proved to be robust under complex perturbations will certainly be robust under real ones.
Based upon the concept of complex pseudospectra, the robustness of a stable system A can be measured by the complex stability radius ρ(A), the norm of the smallest destabilizing perturbation:

ρ(A) = min{||E||: σ(A + E) ≥ 0}. (4.4)

In other words, ρ(A) is equal to that particular ε for which the closure of γ_ε(A) touches the imaginary axis. Another measure of robustness is the spectral abscissa of the system A subjected to a worst-case perturbation E smaller than a given ε. This measure is given the name ε-pseudospectral abscissa σ_ε(A) and corresponds to the largest real part of all points on the closure of γ_ε(A):

σ_ε(A) = max{Re z: z ∈ closure of γ_ε(A)}. (4.5)

Both functions, ρ(A) and σ_ε(A), are implemented in OPTIM and can be chosen as objectives for optimization. The consequences of the choice of ε when optimizing σ_ε(A) are briefly discussed in the following subsection.
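For small matrices, the complex stability radius can be approximated by brute force as the minimum of s_min(A − iω1) along the imaginary axis. A sketch with an illustrative non-normal 2×2 matrix (not the RWM system):

```python
import numpy as np

# Complex stability radius of a stable, non-normal toy matrix:
# rho(A) = min over real omega of s_min(A - i*omega*1),
# i.e. the reciprocal of the H-infinity norm of (A - z 1)^(-1).
A = np.array([[-1.0, 100.0],
              [0.0, -2.0]])

omegas = np.linspace(-20.0, 20.0, 4001)
smin = [np.linalg.svd(A - 1j * w * np.eye(2), compute_uv=False)[-1]
        for w in omegas]
rho = min(smin)
print(rho)
```

Although both eigenvalues are a distance of at least 1 from the imaginary axis, ρ(A) turns out to be of the order of 0.02: a tiny perturbation suffices to destabilize this non-normal system, which is exactly the gap between asymptotic and robust stability discussed above.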

Transient growth
There are various theorems relating the transient behaviour to pseudospectra. An important one is the following lower bound for the transient peak [17]:

sup_{t≥0} ||e^{tA}|| ≥ sup_{ε>0} σ_ε(A)/ε. (4.6)

That means that if the ε-pseudospectrum extends significantly into the right complex half-plane for some ε, so that σ_ε(A) > ε, then there is transient growth. Furthermore, it is shown in [17] that, for large values of σ_ε(A)/ε, ||e^{tA}|| grows exponentially at a rate at least close to σ_ε(A) for small t. The larger the value of σ_ε(A)/ε, the longer this behaviour continues.
It follows from these results that it is necessary, albeit not sufficient, to keep σ_ε(A)/ε at a minimum for all ε in order to diminish transient growth. This can be attempted by minimizing σ_ε(A) for an ε significantly larger than ρ(A), at the expense of robustness. The choice of ε requires some experimentation: if ε is chosen too large, the optimal solution may even be unstable. Generally, optimizing σ_ε(A) for ε > ρ(A) corresponds to a trade-off between robustness and favourable transient behaviour, whereas the optimization for 0 < ε < ρ(A) is a compromise between robustness and asymptotic stability.
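The bound relating the transient peak to σ_ε(A)/ε can be checked numerically. A sketch with an illustrative non-normal 2×2 matrix and a single value of ε; both sides of the inequality are under-approximated on grids, so the check is conservative:

```python
import numpy as np
from scipy.linalg import expm

# Check sup_t ||exp(tA)|| >= sigma_eps(A)/eps for a toy matrix.
A = np.array([[-1.0, 100.0],
              [0.0, -2.0]])
eps = 0.05

# sigma_eps(A): largest Re z with s_min(A - z 1) <= eps, via grid scan.
xs = np.linspace(-3.0, 5.0, 81)
ys = np.linspace(-6.0, 6.0, 121)
sigma_eps = -np.inf
for x in xs:
    for y in ys:
        z = x + 1j * y
        if np.linalg.svd(A - z * np.eye(2), compute_uv=False)[-1] <= eps:
            sigma_eps = max(sigma_eps, x)

# Transient peak of ||exp(tA)|| on a time grid.
peak = max(np.linalg.norm(expm(t * A), 2) for t in np.linspace(0.0, 8.0, 401))
print(sigma_eps, sigma_eps / eps, peak)
```

Here the ε-pseudospectrum reaches well into the right half-plane (σ_ε > 0 although A is stable), and the transient peak indeed exceeds the lower bound σ_ε/ε.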

Functionality of OPTIM
The OPTIM code package offers four different features to be applied to parametrized matrices A: (i) optimization of stability under parameter variations by means of the algorithm described in section 5, (ii) computation of pseudospectra for a given parameter set, (iii) computation of ||e^{tA}|| on a given time interval 0 ≤ t ≤ T and for a given parameter set, and (iv) simple evaluation of a chosen objective function for a given parameter set. In principle, all these functionalities can be applied to the reduced as well as to the full-sized model. The code is fully parallelized based upon the PBLAS and ScaLAPACK libraries for use on a distributed-memory architecture. It has to be noted, however, that the order of the reduced model used in this study is so small that parallelization is not efficient, so that the optimizations are carried out on a single processor only. But the computations made for the full system (pseudospectra, ||e^{tA}|| calculations, evaluations of objective functions, see section 6) take full advantage of the parallel execution on 32 processors. Due to its modular structure, the code is not restricted to the RWM stabilization problem presented here, but can readily be applied to any user-implemented parametrized matrix and therefore offers a wide area of application in control theory and related fields.
Let p = (p₁ p₂ … p_P) be the vector of parameters that A depends on, A = A(p). When optimizing the stability of A, one of four objective functions F₁, …, F₄, based on σ(A), η(A), ρ(A) and σ_ε(A), respectively, can be chosen for minimization, cf (4.1), (4.2), (4.4), (4.5) and their definitions in (4.7). By a case differentiation, F₃ becomes a function that is continuous everywhere in the parameter space. To evaluate F₁ and F₂, the eigenvalue problem of A has to be solved. The algorithm to calculate F₃ for stable A is closely adapted from the two-step H∞-norm computation method described in [32]. Namely, it can be shown, by means of the alternative definition (4.8) of γ_ε(A) introduced below, that ρ(A) is equal to the reciprocal of the H∞-norm (1.4) of the 'transfer function' G(z) = (A − z1)⁻¹. F₄ is computed with a criss-cross algorithm [33], which is quite similar to the two-step method.
In the computation of pseudospectra, an alternative definition of γ_ε(A) is used, which can be shown to be equivalent to the definition (4.3) as long as the matrix 2-norm is used there [17]:

γ_ε(A) = {z ∈ C: s_min(A − z1) < ε}, (4.8)

with s_min(A − z1) being the smallest singular value of (A − z1). By calculating s_min(A − z1) for various values of z on a grid in C and visualizing the result in a contour plot, one obtains the boundaries of different ε-pseudospectra as the corresponding ε-contours. An efficient method for this computation is used, adapted from [17,34], but with the inverse iteration step replaced by an explicitly restarted Lanczos procedure as described in [35] to improve convergence in the case that the smallest singular values are densely spaced.
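The equivalence between the perturbation-based and singular-value-based characterizations can be spot-checked by sampling random perturbations (a synthetic 5×5 matrix, illustrative only): every eigenvalue z of A + E with ||E||₂ < ε must satisfy s_min(A − z1) < ε.

```python
import numpy as np

# Sampling check of the pseudospectrum equivalence: eigenvalues of
# A + E with ||E||_2 < eps always lie where s_min(A - z 1) < eps.
rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
eps = 0.3

ok = True
for _ in range(200):
    E = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
    E *= 0.99 * eps / np.linalg.norm(E, 2)      # enforce ||E||_2 < eps
    for z in np.linalg.eigvals(A + E):
        smin = np.linalg.svd(A - z * np.eye(5), compute_uv=False)[-1]
        ok = ok and (smin < eps)
print(ok)
```

(The sampling only demonstrates the easy inclusion; the converse, that every such z is attained by some perturbation, is the nontrivial half of the equivalence proved in [17].)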
In the computation of ||e tA ||, the matrix exponential is evaluated by means of the scaling and squaring method as described in [36], which involves a Padé approximation of the exponential.

Optimization method
In general, all the objective functions in (4.7) have the unfavourable property of not being smooth everywhere. Even though these functions are continuous, their gradients can be discontinuous or may even diverge on sub-manifolds in parameter space. Conventional nonlinear optimization methods are likely to get stuck near such manifolds, and a special algorithm is required to successfully minimize functions like (4.7). The gradient bundle method [37,38] implemented in OPTIM fits this purpose very well and is briefly described below. Like other optimization methods, it requires the gradient of the respective objective function, and it is therefore highly beneficial to derive analytic expressions for the gradients of (4.7), valid everywhere except on sub-manifolds such as those mentioned above.
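A central building block of these gradients is first-order eigenvalue perturbation theory. The sketch below (synthetic matrices; the standard textbook formula dγ/dp = z^T(∂A/∂p)x / (z^T x) with left and right eigenvectors z, x is assumed) checks the resulting derivative of the spectral abscissa against a finite difference:

```python
import numpy as np

# Toy parametrized matrix A(p) = A0 + p*F with synthetic A0, F
# (illustrative stand-ins for the transformed system and basic
# feedback matrices).
rng = np.random.default_rng(5)
A0 = rng.standard_normal((5, 5))
F = rng.standard_normal((5, 5))
p, h = 0.1, 1e-6

A = A0 + p * F
vals, X = np.linalg.eig(A)
i = np.argmax(vals.real)                 # leading eigenvalue gamma_m
x = X[:, i]                              # right eigenvector
vals_t, Z = np.linalg.eig(A.T)
z = Z[:, np.argmin(np.abs(vals_t - vals[i]))]   # left eigenvector

# Perturbation-theory derivative; here dA/dp = F.
dgamma = (z @ F @ x) / (z @ x)

# Finite-difference check on the spectral abscissa sigma = max Re gamma.
sigma = lambda M: np.linalg.eigvals(M).real.max()
dsigma_fd = (sigma(A0 + (p + h) * F) - sigma(A0 + (p - h) * F)) / (2 * h)
print(dgamma.real, dsigma_fd)
```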
The functions F₁(p) and F₂(p) are related to the eigenvalues of A(p). It can be inferred from (4.1) and (4.2) that the components of their gradients are given in terms of the derivatives of the eigenvalues, with γ_m denoting an eigenvalue of A with maximum real part in the case of F₁. For the derivatives of the eigenvalues with respect to the parameters, first-order perturbation theory yields

∂γ_i/∂p_j = z_i^T (∂A/∂p_j) x_i / (z_i^T x_i),

where x_i is the right eigenvector and z_i the left eigenvector of A associated with the eigenvalue γ_i, which has to be non-degenerate [39]. Finally, the matrix derivative ∂A/∂p_j needs to be determined. The parameter dependence of the transformed feedback matrix F̌, which contributes to A according to (3.23), can be compactly written as F̌ = Σ_{i=1}^P p_i F̌_i, where the F̌_i's are the transformed basic feedback matrices; a short calculation then gives ∂A/∂p_j. The components of the gradient of F₃ are expressed in terms of υ and ω, the left and right singular vectors belonging to the smallest singular value of (A − z1), where z ∈ C is a point at which the pseudospectrum γ_{ρ(A)}(A) touches the imaginary axis [16,40]; z is computed automatically as a by-product of the algorithm that evaluates ρ(A). The resulting expression contains the product (∂A/∂p_j) · Re(υω^H), a 'matrix scalar product' defined as an ordinary scalar product with the matrices interpreted as vectors with N² components. Similarly, the gradient components of F₄ are computed from υ and ω, the left and right singular vectors corresponding to the smallest singular value of (A − z1), where z is now a point on the closure of γ_ε(A) with maximum real part [16,40], determined automatically during the computation of σ_ε(A).

In principle, the gradient bundle method is similar to the simple steepest descent algorithm. In both methods, each iteration consists of determining a search direction in parameter space, followed by a line search along this direction in order to find a significantly lower objective function value.
In the case of steepest descent, the search direction is simply the negative gradient evaluated at the current point in parameter space. The gradient bundle method additionally accounts for gradients evaluated at some other points in the neighbourhood. The search direction is defined to be the negative of the shortest vector on the convex hull of all gradient vectors. If the objective function is smooth in the neighbourhood of the current parameter point, all gradient vectors will be much the same, and the search direction will resemble the steepest descent. But if the current parameter point is close to a manifold where the gradient is discontinuous, gradients will be sampled from both sides of that manifold. The resulting search direction vector will then be aligned mostly parallel to the manifold, but still directed towards decreasing function values. This prevents the procedure from being forced to make tiny steps or getting stuck at the manifold. A well-developed theory guarantees convergence to a local minimum of the objective function [37,38]. In OPTIM, the required convex hull computations are handled by calling the external program LRS [41]. The shortest vector on the respective convex hull is determined by a suitable quadratic programming subroutine [42].
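The shortest-vector computation at the heart of each iteration can be posed as a small quadratic program: minimize ||Gw||² over convex weights w. A minimal sketch with illustrative gradients; scipy's SLSQP is used here in place of the dedicated QP subroutine of the paper:

```python
import numpy as np
from scipy.optimize import minimize

# Columns of G are sampled gradient vectors (illustrative values,
# deliberately pointing to both sides of a 'kink' manifold).
G = np.array([[1.0, -1.0, 0.5],
              [1.0, 1.0, 1.2]])
m = G.shape[1]

# QP: minimize ||G w||^2 subject to w >= 0, sum(w) = 1.
res = minimize(lambda w: np.sum((G @ w) ** 2), np.full(m, 1.0 / m),
               jac=lambda w: 2.0 * G.T @ (G @ w),
               bounds=[(0.0, None)] * m,
               constraints=[{'type': 'eq', 'fun': lambda w: w.sum() - 1.0}],
               method='SLSQP')
d = G @ res.x                            # shortest vector on the hull
print(d)                                 # search direction is -d
```

For these gradients the shortest hull vector is (0, 1): the conflicting x-components cancel, and the resulting search direction −d runs 'along' the kink towards decreasing function values, exactly the behaviour described above.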
The algorithm version implemented in OPTIM differs from the original one [37,38] in two small but noteworthy details. First of all, in the original procedure the additional gradients are evaluated at 2P points randomly chosen within some sampling diameter around the current point in parameter space. In OPTIM, the gradients are computed at the vertices of a simplex with given edge length, positioned in such a way that the current parameter point coincides with the barycentre of the simplex. This involves only P + 1 instead of 2P additional gradient computations and therefore gives a considerable speed-up. The convex hull problem is simplified by this means as well. Despite the smaller number of points, the probability that information from both sides of a critical manifold is accounted for is still very high. To further stabilize the procedure, the simplex is rotated in a different, but deterministic, manner in each iteration. Secondly, the selection of the 'neighbourhood size' (simplex size or sampling diameter) and the termination criteria in OPTIM differ somewhat from the original. As in the original, the simplex size is decreased by a prescribed factor if the convex hull happens to be very close to the origin or contains it, or if the line search step becomes too small. What is new is that the simplex is also slightly enlarged from iteration to iteration unless a prescribed maximum size has already been reached. Thus, the simplex size is dynamically adapted to the local scales on which the objective function varies significantly. The program terminates when the simplex size is already at a prescribed minimum and a further shrinking is requested.
Finally, the implementation of constraints, such as those discussed in section 2.3, should be mentioned. In OPTIM, constraints are realized in terms of user-implemented penalty functions to be added to the objective function, and their respective gradients. These penalty functions should be identically zero in the allowed parameter domain and become increasingly positive the more the constraints are violated. The gradients must exist almost everywhere, but may be discontinuous on sub-manifolds.

ITER-like example
In the following, the derived methodology is applied to an ITER-related example. Because the principal aim of this paper is to demonstrate the capabilities of the method and of the code package and to gain general, conceptual insights rather than to produce precise results directly relevant to the ongoing ITER design, some simplifications are made in order to obtain a test case allowing for good visualization and simple analysis, and requiring only moderate numerical effort.
The chosen plasma equilibrium is the steady-state scenario 4 from the ITER design, with β set to 2.29%. Without a conducting wall, this equilibrium is unstable to external kinks with n = 1 only. Toroidal Fourier indices higher than n = 1 will not be included in the expansion of the plasma state, and the feedback control will also act on n = 1 only. Strictly speaking, the 3D conducting structures break the axisymmetry and the different n's become dynamically coupled, so that the feedback system might drive intrinsically stable RWMs with n > 1 unstable. This possibility is not investigated here, as the entire analysis is restricted to n = 1. While a nested double wall is scheduled for ITER, the conducting wall model used here accounts only for the interior wall and also neglects some other geometric details. The effects of the exterior wall on the RWMs are minor, and a single wall greatly facilitates the visualization of current patterns. Figure 1 shows the wall model, including the ports that penetrate the ITER wall. The actual wall will be made of steel with conductivity σ = 1.212 × 10⁶ Ω⁻¹ m⁻¹ and thickness d = 0.06 m. Since the thin-wall approximation is used in the STARWALL code, it is the surface conductivity σd that is accounted for. The coils are assumed to be made of copper (σ = 6 × 10⁷ Ω⁻¹ m⁻¹) with a square cross section of edge length 0.1 m. In the computations, the coil resistances are set accordingly. The coils are located inside some of the equatorial ports, equidistantly spaced by 40° in the toroidal direction, and grouped into a single toroidal array, i.e. K = 1. Two neighbouring coils are missing in the array due to collision with neutral beam injection devices, so that only seven coils remain. This feedback coil set is one of several sets that have been under discussion in the ITER design process.
The recent final decision gave preference to a different coil set, which will be mounted inside the wall, but the set used here is most suitable for the visualization of coil currents. The 18 sensors also form a single array (L = 1) and are positioned at equidistant toroidal angles, centred between the equatorial ports and mounted inside, but very close to, the wall (see figure 1). The sensors are oriented in the vertical direction and therefore measure the poloidal component of the magnetic field perturbation. Poloidal sensors were shown to be superior to radial sensors in several studies [4, 7, 15, 43]. The indices k, l and n are omitted in the following because, in this simple test case, they take the value 1 only. The solution of the open-loop eigenvalue problem A0 xi = γi xi, i = 1, …, N, where N = 5190, yields two unstable RWMs x1, x2 with growth rates γ1 = 21.9 s⁻¹ and γ2 = 21.7 s⁻¹, whereas all other modes x3, …, xN are stable. The wall current patterns corresponding to x1 and x2 are shown in figure 3. The two patterns have a helical n = 1 structure and are very similar, but toroidally phase shifted against each other by 90°. The eigenvalues γ1 and γ2 are not perfectly degenerate because the axisymmetry is broken by the presence of the holes and, more importantly, by the two missing coils (the coils act as short-circuited passive conductors in the open-loop computation).
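The mode classification above can be sketched numerically: the unstable RWMs are the eigenvectors of the open-loop system matrix whose eigenvalues have positive real part. A minimal sketch with a toy diagonal stand-in for A0 (the real model is dense, with N = 5190):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                                      # toy dimension (the real model has N = 5190)
A0 = -np.diag(rng.uniform(1.0, 100.0, N))   # stable background spectrum
A0[0, 0], A0[1, 1] = 21.9, 21.7             # two unstable growth rates in s^-1, as in the text

gam, X = np.linalg.eig(A0)                  # open-loop problem A0 x_i = gamma_i x_i
unstable = np.where(gam.real > 0)[0]        # indices of the unstable RWMs
x1, x2 = X[:, unstable[0]], X[:, unstable[1]]

assert len(unstable) == 2                   # exactly two unstable modes
```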
The model reduction procedure yields a collection of current patterns π3, …, πN spanning the stable subspace of A0, which constitute all except the first two columns of the pattern matrix Π as defined by (3.24). Note that the first two columns of Π correspond to the unstable RWMs, i.e. π1 = x1 and π2 = x2. Each pattern πi, i = 3, …, N, is associated with a variance fraction λi−2, or 'importance measure', as computed during the solution of the eigenvalue problem (3.20). The 12 leading stable patterns π3, …, π14 and their corresponding variance fractions λ1, …, λ12 are displayed in figures 4 and 5. As explained in the following, several different physical processes can be readily identified in these patterns, and it appears plausible that these processes are the most important ones involved in the RWM feedback control problem. Concerning their variance fractions, the first four patterns π3, …, π6 are well separated from the others. Altogether, they comprise four processes, namely, (i) an n = 1 pattern of coil currents, (ii) the corresponding counterpart toroidally rotated by roughly 90°, (iii) a pattern of small-scale, dipole-like eddy currents localized in the neighbourhood of the sensors and modulated sinusoidally with 'wave number' n = 1 in the toroidal direction, and (iv) the corresponding pattern toroidally rotated by 90°. While π3 is an almost pure coil current pattern, π4, π5 and π6 are linear combinations of the remaining coil and eddy current processes. The importance of the coil current patterns is clear since these are just the actuator patterns generated by the gain matrices (2.11) and (2.12). But the small eddy currents are also very important because they produce a strong n = 1 signal detected by the sensors. It can be expected that such current patterns provoke an unwanted, fierce response of the feedback system, and it will be shown later in this section that this is indeed the case.
The patterns π 7 and π 8 represent the small-scale part of the image currents associated with the excitation of the coil currents. Another small-scale eddy structure near the sensors can be found in π 9 and π 10 , whereas π 11 and π 12 mainly represent the large-scale part of the image currents excited by the coils. Finally, π 13 and π 14 are almost purely large-scale n = 1 patterns, probably strongly interacting with the plasma.
As already implied by the values given in figures 4 and 5, the variance fractions λi fall off rapidly as i increases, which indicates that the model reduction is very efficient. The truncation limit N_red (the reduced model dimension) is set to the value for which λ_{N_red−2} > 10⁻¹² > λ_{N_red−1} holds. This results in N_red = 56, allowing for both very fast optimization runs and excellent reproduction of the control system properties, as shown below.
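The truncation rule can be sketched as follows, using a toy exponentially decaying sequence in place of the actual variance fractions (the real computation yields N_red = 56):

```python
import numpy as np

# Toy variance fractions decaying exponentially; the real lambda_i come from
# the eigenvalue problem (3.20).
lam = 0.34 * np.exp(-0.5 * np.arange(1, 200))

n_kept = int(np.sum(lam > 1e-12))   # stable patterns with variance fraction > 1e-12
N_red = n_kept + 2                  # plus the two unstable RWMs x1, x2

# The criterion lambda_{N_red-2} > 1e-12 > lambda_{N_red-1} holds by construction:
assert lam[n_kept - 1] > 1e-12 > lam[n_kept]
```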
In the following feedback computations, the time delay (2.23) is set to τ = 0.1 ms. The feedback optimization is conducted subject to the constraint that the effective current in each coil must not exceed the prescribed value I_max = 74 kA to be substituted into the inequality (2.19). By this choice and by the estimate S_max = 1.5 mT [9] as an RWM detection limit, the saturation current gain magnitude corresponding to the left-hand side of (2.19) is limited to twice the minimum gain needed to obtain stabilization. The chosen value of I_max is not inconsistent with the fact that the coils in the set were designated to carry a total current of about 300 kA, where, however, a large portion of this current was intended to be used for ELM suppression. As an additional constraint, a lower bound of 1 ms is imposed on the coil time constant T. Each of the objective functions F1 through F4 given by (4.7) is minimized in subsequent OPTIM runs. Optimization of F4 = σ_ε(A) is performed setting ε = 2ρ_opt after having obtained an optimal complex stability radius ρ_opt from the minimization of F3.

Figure 4. Stable current patterns π3, …, π8 represented by current potential isolines in the same way as the unstable RWMs in figure 3. These patterns explain variance fractions λ1 = 33.9%, λ2 = 22.8%, λ3 = 18.7%, λ4 = 17.6%, λ5 = 2.68% and λ6 = 1.75% as indicated. Coil currents are visualized by colouring of the coils: the intensity of the colour corresponds to the strength of the current, and the choice of blue or red indicates the direction of the coil current in the same way as the rotational direction of wall currents is represented by the colouring of the equipotential lines. As in the case of the RWMs, toroidal rotation of these patterns by 180° results mainly in a sign reversal (not shown).

Figure 5. Same as figure 4, but showing the stable current patterns π9, …, π14 and their variance fractions λ7 = 0.892%, λ8 = 0.887%, λ9 = 0.239%, λ10 = 0.171%, λ11 = 0.108% and λ12 = 0.0970%.

Table 1. Optimal parameter values α, β and R̂ obtained after optimization of the objective functions F1 through F4. Derived from these values, the following quantities are also listed: the coil time constants T = L/(R + R̂) corresponding to R̂, where L = 4.65 µH and R = 10.9 µΩ; the saturation current gain magnitude I as defined by (6.1); and the toroidal response phase ϕ = arctan(β/α). Note the constraints T ≥ 1 ms and I ≤ 49.3 kA mT⁻¹.
Since the minima found in these nonlinear optimization problems generally cannot be guaranteed to be global, the computations are repeated for each objective using several different starting parameter sets. Optimizations are started with the initial value of R̂ chosen such that T = 100, 30, 10, 3 and 1 ms, and with the voltage gains α, β set to zero. For each objective, the run with the lowest objective value is considered the best available one and is retained for further analysis; all other runs are discarded. Each of the four optimal solutions is further examined in terms of the values of all four objective functions, its eigenvalues, its pseudospectra, and its ||e^{tA}|| curve for t between 0 and 0.1 s. All these quantities are computed for both the full and the reduced model in order to verify sufficient agreement, ensuring that the chosen reduced model size is large enough. Based on all these analyses, the four objective functions will be judged with respect to their capability to produce controllers achieving good asymptotic stability, robust stability or moderate transient behaviour.
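The multi-start procedure can be sketched as below; `objective` is a hypothetical stand-in for the OPTIM objective functions, and the map from the initial T to an initial R̂ is invented purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def objective(p):
    # Toy non-convex stand-in for the OPTIM objectives in (alpha, beta, R_hat).
    a, b, r = p
    return np.sin(3 * a) + (b - 1) ** 2 + 0.1 * (r - 2) ** 2

T_starts_ms = [100.0, 30.0, 10.0, 3.0, 1.0]   # initial coil time constants, as in the text
results = []
for T in T_starts_ms:
    R_hat0 = 1.0 / T                          # hypothetical map from T to an initial R_hat
    results.append(minimize(objective, x0=[0.0, 0.0, R_hat0], method="Nelder-Mead"))

best = min(results, key=lambda r: r.fun)      # retain the lowest objective value, discard the rest
```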
The sets of parameter values α, β and R̂ resulting from the optimizations are listed in table 1, together with derived quantities that are somewhat easier to interpret, namely the coil time constant T, the saturation current gain I as defined by (6.1), and the toroidal phase angle ϕ between the measured n = 1 perturbation and the coils' n = 1 response pattern. The four solutions differ strongly from each other in their response phase, and also considerably in their current gain magnitude. Furthermore, the F1-optimal solution is characterized by a slow response (large T), whereas the response is fast in the other three cases.
The values of all functions F1 through F4 evaluated using the full model for all four parameter sets can be found in table 2. For the reduced model, the values coincide with those computed with the full model up to a relative error of about 10⁻³ in the worst case (the F1 value after optimization of F1), but to 10⁻⁴ or less in all the other cases (not shown). The F2 values are the only exception: they are considerably higher in the case of the full model because many more eigenvalues contribute to the sum (4.2); these additional eigenvalues are, however, largely invariant under parameter variations. Of course, the F1-optimal solution exhibits the best asymptotic stability, but the latter is not much worse in the other cases. Robustness, as expressed by the F3 value, varies strongly among the different cases. As expected, the F3- and F4-optimal solutions are considerably more robust than the other two. Robustness is by far the weakest after F2 optimization. Not surprisingly, optimizing F4 leads to a slightly less robust result than optimizing F3, but this is compensated to some extent by improved transient behaviour, as will be shown below. Figures 6-9 show spectra and pseudospectra for all four parameter sets substituted into the reduced as well as the full model. These plots reveal the sensitivity of individual eigenvalues as well as the sensitivity of the system's stability as a whole. For example, the F3 values given in table 2 can also be crudely estimated by inspection of figure 6. Furthermore, as already discussed in section 4.3 and also examined below, the parts of the pseudospectra in the right complex half-plane are related to the transient behaviour.

Table 2. Values of the functions F1 through F4 as indicated by the column label, evaluated for the respective parameter set optimizing the function corresponding to the row label, substituted into the full model. When evaluating F2, the eigenvalues have been scaled to internal, dimensionless units.
Comparing, for each parameter set, the pseudospectra for the two models, one finds virtually perfect agreement in the right half-plane and in the neighbourhood of eigenvalues with large imaginary parts (corresponding to rotating, damped modes), which turn out to be the most sensitive ones. Noticeable deviations are found only in regions close to the negative real axis, far away from the imaginary axis, where many eigenvalues of the full model are missing from the reduced model. But these deviations are unimportant, since these eigenvalues have small imaginary parts, indicating that the corresponding eigenmodes are likely to be largely unaffected by the feedback system; furthermore, they decay rapidly. The exclusion of these eigenmodes from the reduced model shows that the reduction method works as intended. The reduction is well suited to the pseudospectral approach, since the reduced model accurately reproduces the locations and sensitivities of the 'most dangerous' eigenvalues. Since the optimization runs finish within minutes in the simplified test case presented here, it is to be expected that the method can readily be applied to realistic ITER geometry with a high level of detail. It is noteworthy that the truncation criterion used here (include all patterns whose variance fraction exceeds 10⁻¹²) is extremely conservative and can be considerably relaxed whenever the reduced models are found to be too large.
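The pseudospectra compared here can be computed by the standard grid method (not necessarily the one used by the authors' code): the ε-pseudospectrum is the region of the complex plane where the smallest singular value of zI − A drops below ε, so its boundary is a contour line of that singular value. A minimal sketch with a toy non-normal matrix:

```python
import numpy as np

def sigma_min_grid(A, re, im):
    """Smallest singular value of (z*I - A) on a grid of complex points z."""
    n = A.shape[0]
    S = np.empty((len(im), len(re)))
    for i, y in enumerate(im):
        for j, x in enumerate(re):
            M = (x + 1j * y) * np.eye(n) - A
            S[i, j] = np.linalg.svd(M, compute_uv=False)[-1]
    return S

A = np.array([[-1.0, 100.0], [0.0, -2.0]])   # toy non-normal (hence sensitive) matrix
re = np.linspace(-5.0, 5.0, 21)
im = np.linspace(-5.0, 5.0, 21)
S = sigma_min_grid(A, re, im)                # the contour S = eps bounds the eps-pseudospectrum
```

Contour plotting (e.g. with matplotlib) of S at the desired ε levels would then reproduce plots of the kind shown in figures 6-9.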
In the case of the F1-optimal parameter set, the ε-pseudospectra cover a strikingly small domain compared to the other parameter sets, even for large ε values. The eigenvalues are not extremely sensitive. Estimating the ratio on the right-hand side of (4.6) by inspection of the contours for some of the larger ε-values given in figure 6, one finds values exceeding unity, but not by a large amount. It follows that the transient peak does not necessarily need to be immoderate. However, the eigenvalues close to the imaginary axis are not insensitive enough to yield very robust stability. The solution after optimizing F2 is characterized by two extremely sensitive complex eigenvalue pairs with large imaginary part magnitudes (see figure 7), which deteriorate both the robustness and the transient behaviour. For example, for ε = 1 s⁻¹, the pseudospectrum Λ_ε(A) overlaps a large region in the right half-plane due to the strong influence of the dangerous eigenvalues. By estimating the pseudospectral abscissa σ_ε(A) for ε = 1 s⁻¹ from figure 7 and considering (4.6), one finds that some initial states must exist that are transiently amplified at least by a factor of about 80.
Optimization of F3 and F4, respectively, also results in a complex, sensitive eigenvalue pair with a large imaginary part (figures 8 and 9). These eigenvalues, however, have been pushed far enough to the left that the robustness is not affected by them. But in the case of the F3-optimal parameter set, the sensitive eigenvalue pair causes a significant protrusion of Λ_ε(A) into the right half-plane for ε > ρ(A) and thus determines σ_ε(A) for these ε values; a large transient peak can be expected. For the F4-optimal solution, the sensitive pair is located so far to the left that it does not influence σ_ε(A) for any ε value considered in figure 9, and the transient amplification could be more moderate.
The ||e^{tA}|| curves obtained for each optimal parameter set substituted into the full model are displayed in figure 10. The corresponding curves for the reduced model (not shown) are indistinguishable from those in the figure. Together with the excellent agreement concerning the objective function values and pseudospectra, this shows that the reduced model size has definitely been chosen sufficiently large. The ||e^{tA}|| curves confirm the predictions made above. The transient amplification is quite modest for F1; on the other hand, ||e^{tA}|| decays much more slowly than in all the other cases. For F2, the transient behaviour is disastrous: there are amplifications by a factor of more than 170 and, in addition, the ||e^{tA}|| curve is comparatively broad. In the F3-optimal case, the peak is sharper, but the amplification factor still exceeds 80. Finally, the peak value is alleviated to about 30 for F4, which, however, is still somewhat unsatisfactory.
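Curves of this kind can in principle be reproduced by tabulating the spectral norm of the matrix exponential; the following sketch uses a toy stable but non-normal matrix, for which ||e^{tA}|| transiently rises well above 1 before decaying:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 50.0], [0.0, -2.0]])    # stable eigenvalues (-1, -2), strongly non-normal
ts = np.linspace(0.0, 5.0, 200)
norms = [np.linalg.norm(expm(t * A), 2) for t in ts]   # ||e^{tA}|| curve

peak = max(norms)    # transient amplification factor before the eventual decay
```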
In order to understand which physical processes are responsible for transient growth, the initial state that suffers the strongest transient amplification in the F3-optimal case is presented in figure 11 as a current pattern. This state vector is given by the leading eigenvector of (e^{tA})^T e^{tA}, with t equal to the time at which ||e^{tA}|| attains its maximum; the vector has been normalized to unity. The main feature of the current pattern is a structure composed of small-scale eddy currents, toroidally modulated with n = 1, which closely resembles the corresponding structures contributing to the stable current patterns π4 through π6 (figure 4). For the other parameter sets, the corresponding equivalents to the state shown in figure 11 are very similar (not shown), except that a significant RWM contribution is admixed in the F1 case. The mechanism of transient amplification is therefore much the same in all four cases: the small-scale eddies generate a strong n = 1 sensor signal resulting in a vigorous response of the coils and an erroneous, rapid excitation of a high-amplitude RWM, which has to be controlled afterwards. This behaviour can be observed in a video sequence available from stacks.iop.org/NJP/11/053015/mmedia (evolution smalleddy.mpg, 2.8 MB) showing the time evolution starting from the initial state displayed in figure 11 and using the F3-optimal feedback controller. Note that the normalization of the coil colouring in the video is the same as in figure 4 in order to visualize the coil current distribution also in the later stage of the evolution. In the early stage, the colour range is 'overdriven', i.e. the coil currents are much stronger than indicated by the colours.
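The construction of the worst-amplified initial state can be sketched as follows: the leading eigenvector of (e^{tA})^T e^{tA} is the leading right singular vector of e^{tA}, obtainable directly from an SVD (toy stable, non-normal matrix for illustration):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 50.0], [0.0, -2.0]])    # toy stable, non-normal matrix
ts = np.linspace(0.0, 5.0, 200)
norms = [np.linalg.norm(expm(t * A), 2) for t in ts]
t_star = ts[int(np.argmax(norms))]           # time at which ||e^{tA}|| peaks

U, s, Vt = np.linalg.svd(expm(t_star * A))
x0 = Vt[0]                                   # worst-case initial state, ||x0|| = 1
amp = s[0]                                   # its amplification factor at t_star
```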
For comparison, another video (evolution rwm.mpg, 2.8 MB) shows the evolution for the same controller, but with a pure RWM (the normalized state x1, see the left panel of figure 3) as the initial condition. Significant transient amplification (an increase of the overall current intensity) is also present here, but it is much less severe than in the case of the small-eddy initial state. Interestingly, in both time evolutions a rapidly rotating, strongly damped mode is superposed on the damped, slowly rotating RWM in the initial phase of the evolution. This mode has a frequency of about 180 Hz and corresponds to the eigenvalue pair with large imaginary part magnitudes (the eigenvalue with a large positive imaginary part is visible in figure 8).
Since the physical mechanism of transient amplification has now been revealed, new strategies can be developed to avoid transient amplification while retaining much of the robustness, or even improving the latter. These strategies can be used as an alternative, or in addition, to optimizing F4 for large ε. Four different strategies, denoted A, B, C and D in the following, shall be tested here: (A) use the constraint T ≥ 30 ms, (B) shift the sensors from the radial position R = 8.9 to 8.6 m, (C) keep the sensors at their original position (R, Z) = (8.9 m, 0.45 m) and add an additional array of 18 sensors at the same toroidal and radial positions, but at Z = 0.0 m, and (D) shift both sensor arrays to R = 8.6 m. The motivation for keeping the feedback response slow, as done in case A, is to prevent the response from growing large before small eddies like those shown in figure 11 decay. Shifting the sensors away from the conducting wall decreases the strength of the eddy-generated magnetic field at the sensors. Finally, the main intention of adding the second sensor array is to improve robustness rather than to decrease transient amplification: observability is enhanced by the fact that not only the poloidal field is measured, but also its poloidal derivative. Therefore, two sensor arrays should do better at distinguishing RWMs from intrinsically stable plasma modes than a single array. But it is also hoped that a double array is somewhat harder for small-scale wall currents to mislead than a single array, so that the transient behaviour should also benefit from the second array.
The different approaches are tested by optimizing F3 for all cases A through D, subject to the same current constraint I_max = 74 kA as before, and subject to T ≥ 1 ms for cases B to D. For C and D, the additional sensor array increases the number of free parameters from 3 to 5. The optimal F3 values are −2.27 s⁻¹ (A), −4.28 s⁻¹ (B), −3.99 s⁻¹ (C) and −5.31 s⁻¹ (D). The corresponding ||e^{tA}|| plots are displayed in figure 12. Using the slow-response constraint (case A) dramatically reduces the transient peak, but at the cost of robustness and of the decay rate of the ||e^{tA}|| curve. Compared to the F3-optimal standard case shown in figure 10, the peak value is reduced substantially in case B, but only slightly in case C. In case D, it is considerably larger than in case B, but D is more robust. Compared to the standard case, both strategies B and D lead to a considerable improvement of both robustness and transient behaviour, whereas strategy C is less efficient. The transient behaviour might still be considered not entirely satisfactory even for case B, but it can be further improved, at the expense of robustness, by combining strategy B with, e.g., a slowing down of the feedback response (as in strategy A), or by optimizing F4 for ε larger than the optimal complex stability radius.

Summary and discussion
A novel approach to the numerical treatment of stabilizing RWMs in tokamaks by means of magnetic field sensors and active feedback coils has been presented. Instead of characterizing the control system in terms of its input-output behaviour, as usual in control theory, the analysis has been focused on the internal dynamics of the system. This approach is led by the insight that the state space representation of the system, which emerges from physical modelling and therefore exists for its own sake, contains much more information than needed to generate a particular input-output behaviour (many different systems can have the same input-output characteristics). This information surplus is exploited to analyse quantitatively, by means of a physically based measure, the control system's sensitivity to uncertainties of the model, or, in other words, the robustness of the system's stability with respect to changes of the system.
In this approach, the choice of the coordinate system used to define the system's state space is of particular significance. While the coordinate system is unimportant to the input-output characteristics, it defines a metric in the state space, i.e. a measure of the difference between system states. The basic assumption of this study is the special interest in a particular state vector norm, typically based on physically measurable quantities. This norm is considered distinguished compared to all other possible choices of norms. The induced metric enables the investigator to decide whether the difference between system states is small or large. Therefore, it can also be decided whether a change of the system (represented by changes of the system's matrix elements) can produce a large change in the tendency of the system state or not. The corresponding measure is the matrix norm which is vector-bound to the state vector norm of interest.
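The notion of a vector-bound matrix norm can be illustrated numerically: the induced norm is the supremum of ||Ax||/||x|| over all nonzero states x, which for the Euclidean vector norm equals the largest singular value (numpy's spectral 2-norm). A small sketch with an arbitrary toy matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))              # arbitrary toy system matrix

# Sample the ratio ||Ax|| / ||x|| over many random directions x ...
xs = rng.standard_normal((5, 10000))
ratios = np.linalg.norm(A @ xs, axis=0) / np.linalg.norm(xs, axis=0)

# ... the supremum of this ratio is the vector-bound (induced) 2-norm,
# i.e. the largest singular value of A.
induced = np.linalg.norm(A, 2)
```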
The control system model is formulated in terms of a parametrized matrix. The parameter values describe how the active coils react to the sensor signals. These parameters can be varied in order to stabilize the system and to optimize the system's stability with respect to certain criteria. To make such a parameter optimization computationally feasible, it is necessary to reduce the dimension of the control system model while retaining the characteristics of its response to the feedback. Powerful standard methods serving this purpose are available in the literature, but they are based on oblique transformations: they do not preserve the state space metric and are therefore not suitable for use in combination with the methodology of this study. For this reason, a novel, highly efficient orthogonal model reduction procedure named 'isometric truncation' has been designed and utilized in this work.
For the optimization of the feedback parameters, the OPTIM code has been developed. Four different objective functions, given by (4.7), are available for optimization. Two of them, F1 and F2, can be considered measures of asymptotic stability and take only the eigenvalues of the parametrized matrix into account. The other two functions, F3 and F4, are related not only to the eigenvalues but also to their sensitivity; they therefore act as measures of the robustness of the stability. They involve the matrix norm mentioned above and are closely related to matrix pseudospectra. The ε-pseudospectrum of a matrix is the union of the eigenvalue spectra of all perturbed matrices whose perturbation is smaller than ε. Optimizing F4 for a suitably chosen ε value may help to attenuate possible transient amplification of initial states.
The objective functions are not smooth everywhere, in general. Non-standard techniques are required for their optimization. The method implemented in the OPTIM code is based on sampling gradients from the neighbourhood of the current point in parameter space.
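A single step of such a gradient-sampling scheme (in the spirit of Burke, Lewis and Overton, not the authors' actual implementation) can be sketched on a toy nonsmooth objective; the mean of the sampled gradients is used here as a crude surrogate for the minimum-norm element of their convex hull:

```python
import numpy as np

f = lambda x: abs(x)            # toy nonsmooth objective with a kink at x = 0
grad = lambda x: np.sign(x)     # gradient wherever it exists

rng = np.random.default_rng(2)

def gradient_sampling_step(x, radius=0.1, n_samples=20, lr=0.05):
    # Sample gradients at randomly perturbed points near the current iterate,
    # then step along (a surrogate of) the combined descent direction.
    pts = x + radius * rng.uniform(-1.0, 1.0, n_samples)
    g = np.mean([grad(p) for p in pts])
    return x - lr * g

x = 1.0
for _ in range(30):
    x = gradient_sampling_step(x)

assert f(x) < 0.2   # converged to a neighbourhood of the nonsmooth minimizer
```

Unlike plain gradient descent, sampling near the iterate keeps the method informative at kinks, where the pointwise gradient alone is misleading.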
For a simple, ITER-like test case, the feedback parameters have been optimized with respect to F1, F2, F3 and F4, respectively. The four resulting feedback-controlled systems have been compared with respect to asymptotic stability, eigenvalue sensitivity, robust stability and transient behaviour by inspecting the values of the different objective functions, the eigenvalues, the pseudospectra and the ||e^{tA}|| plots. The solutions optimized with respect to F1 and F2 are characterized by inferior robustness; the F2-optimal solution has extremely sensitive eigenvalues and a catastrophic transient peak. Optimizing F3 produces a far more robust result. Finally, the F4-optimal system is slightly less robust but exhibits less severe transient amplification than the F3-optimal case. The asymptotic stability of the latter two cases is not substantially worse than that of the former two.
As a physical mechanism of the transient amplification, small-scale eddy currents in the conducting wall parts in the neighbourhood of the sensors have been identified. Based on this finding, additional strategies to mitigate the transient amplification have been developed and tested, namely, slowing down the response of the feedback system, moving the sensors away from the wall, and using an additional sensor array. All these strategies are successful to some extent and can be combined with each other. Taking these actions can either degrade or improve robustness. Testing such strategies in the case of a realistic ITER geometry is left for a future study.
It has been convincingly shown that it can be extremely dangerous to be satisfied with merely having stabilized the RWM control system. Inspecting the system's eigenvalues alone and finding that all of them are located well within the left complex half-plane tells nothing about the sensitivity of the eigenvalues, the robustness of the stability or possible transient amplification. Pseudospectra plots are a useful tool for getting an impression of the eigenvalue sensitivity. Computation of the complex stability radius (i.e. F3) provides a quantitative measure of robustness. Plotting ||e^{tA}|| reveals possibly dangerous transient amplification. It appears advisable to use these tools routinely when designing RWM feedback controllers, and to use objective functions like F3 or F4 for optimization.
To the authors' knowledge, initial conditions that are not RWMs have not yet been considered in any other RWM stabilization study, and the possibility of severe transient amplification of these states has not yet been recognized by the fusion research community.
On the other hand, it cannot be inferred from this study to what extent the transient growth predicted by ||e^{tA}|| is actually relevant. The question remains whether states that are significantly transiently amplified, like the one shown in figure 11, are excited at any time in a 'real world' situation. Since such states project mainly onto the stable subspace of the open-loop system, they can be expected to be driven only by physical processes not included in the model (see also the discussion concerning transient amplification of the sensor signal in section 2.3). Again, nonlinear, forced-dissipative MHD simulations would be required to estimate the significance of such 'external' processes. In any case, if transient amplification is found to be an issue, strategies like those developed in this study could be useful to mitigate the problem to some extent.