Model reduction for stochastic systems with nonlinear drift

In this paper, we study dimension reduction techniques for large-scale controlled stochastic differential equations (SDEs). The drift of the considered SDEs contains a polynomial term satisfying a one-sided growth condition. Such nonlinearities in high-dimensional settings occur, e.g., when stochastic reaction diffusion equations are discretized in space. We provide a brief discussion of existence, uniqueness and stability of solutions. (Almost) stability is then the basis for the new concepts of Gramians that we introduce and study in this work. With the help of these Gramians, dominant subspaces are identified, leading to a balancing related, highly accurate reduced order SDE. We provide an algebraic error criterion and an error analysis of the proposed model reduction schemes. The paper is concluded by applying our method to spatially discretized reaction diffusion equations.


Introduction
Model order reduction (MOR) aims to find low-order approximations for high- or infinite-dimensional systems of differential equations, reducing the complexity of the original problem. Many MOR schemes are based on projections (of Galerkin or Petrov-Galerkin type). In this context, the first goal is to identify solution manifolds and approximate them by low-dimensional linear subspaces. A reduced state variable, taking values in this subspace, is subsequently constructed in order to ensure an accurate estimation of the original dynamics. There is a rich selection of different MOR strategies. Proper orthogonal decomposition (POD) [20] is an approach where solution spaces are learned from data. Methods like the iterative rational Krylov algorithm (IRKA) [13] rely on interpolation or on the minimization of certain error measures between systems. Moreover, there are Gramian based techniques like balanced truncation (BT) [25], where dominant subspaces of the original dynamics are associated with eigenspaces of these (algebraic) Gramians. Recently, there has been an enormous interest in dimension reduction for large-scale nonlinear systems. Data-driven [12,18,28] or interpolation/optimization based methods [3,6] were applied to such equations in a deterministic framework. Generalizing BT to nonlinear systems was first addressed in [32]. Alternatives, for which the reduced order model can be computed more easily, can be found in [5,19].

MOR in probabilistic settings is even more essential than in the deterministic context discussed above. This is due to the enormous number of system evaluations required, e.g., for conducting Monte Carlo simulations. On the other hand, it is also about the feasibility of certain algorithms. For example, a stochastic differential equation (SDE) in dimension n is in some sense equivalent to a partial differential equation (PDE) with n spatial variables via the Feynman-Kac formula. Knowing how hard it is to solve high-dimensional PDEs in general, it becomes clear how vital MOR for SDEs is. A POD approach for SDEs is studied in [34]. Balancing related or optimization based MOR techniques are, for instance, investigated in [2,4,7,31] for the linear case. The advantage of the latter schemes is the possibility of a detailed error and stability analysis. However, an extension to nonlinear stochastic systems seems very challenging. A first approach for stochastic bilinear equations is presented in [29], but it might not work for more complex nonlinearities.
The goal of this paper is to extend BT to stochastic systems, e.g., with certain polynomial nonlinearities. In the deterministic case, a wide focus is on quadratic systems, see for instance [5,19]. This is because many nonlinear terms in a differential equation can be transformed to a quadratic expression using additional dummy variables. This approach is called lifting in the literature. It has the advantage that a large set of nonlinear systems can be covered if we know how to handle quadratic ones. However, this is also the drawback of this ansatz, since differential equations involving quadratic terms range from globally stable systems to systems with finite-time explosion, i.e., the existence of a global solution is not guaranteed. This large variety of properties makes it seem infeasible to develop a general theory, like for example an error analysis with sharp bounds. For that reason, we do not intend to apply the technique of lifting the dynamics to a quadratic system in this paper, because one might lose track of essential properties that are usually not visible anymore in a transformed SDE. Instead, we exploit the structure of our locally Lipschitz nonlinearity that we assume to be of one-sided linear growth. This also covers interesting polynomials that play a role in reaction diffusion equations. This type of growth will be reflected linearly in the associated Lyapunov operator that defines the Gramians that we propose in our MOR procedure.
The paper is structured as follows. Section 2 deals with the setting and the first details concerning the goals of this work. In Section 3, we recall facts about existence and uniqueness of solutions to the considered nonlinear SDE. We further investigate global asymptotic stability as the basis of the Gramians that we introduce in Section 4. There, it is explained and motivated how Gramians need to be chosen in order to find a good dominant subspace characterization and hence an accurate reduced system. We also discuss properties of Gramians that need to be fulfilled to ensure the classical error bound for BT known from deterministic linear systems [10,11]. Having computed the desired Gramians based on the strategy that we provide, we explain how to compute the reduced system in Section 5. Finally, Section 6 delivers an error bound analysis for the balancing related MOR scheme, also involving a discussion of criteria for a high approximation quality. Section 7 illustrates the performance of the MOR technique by applying it to spatially discretized stochastic reaction diffusion equations.

Setting, notation and goal
Let (Ω, F, (F_t)_{t∈[0,T]}, P) be a filtered probability space on which every stochastic process appearing in this paper is defined. Given an R^d-valued and square integrable Lévy process M = [M_1 . . . M_d]⊤ with mean zero, we assume that it is (F_t)_{t∈[0,T]}-adapted and its increments M(t + h) − M(t) are independent of F_t for t, h ≥ 0 and t + h ≤ T. Exploiting the independent and stationary increments, there exists a positive semidefinite matrix K = (k_{ij})_{i,j=1,...,d}, so that

E[M(t)M(t)⊤] = K t,   t ∈ [0, T],

see [27, Theorem 4.44] for a proof. We call K the covariance matrix of M. Now, we consider the following large-scale nonlinear stochastic dynamics driven by M:

dx(t) = [Ax(t) + Bu(t) + f(x(t))] dt + N(x(t)) dM(t),   x(0) = x_0,   (1a)
y(t) = Cx(t),   t ∈ [0, T],   (1b)

where A ∈ R^{n×n}, B ∈ R^{n×m}, C ∈ R^{p×n} and N: R^n → R^{n×d} is linear with N(x) = [N_1 x . . . N_d x] for matrices N_i ∈ R^{n×n} (i = 1, . . . , d). The state vector x(t) ∈ R^n is assumed to be high-dimensional, whereas the quantity of interest y(t) ∈ R^p usually is a vector with a low number of entries. The nonlinear function f: R^n → R^n shall satisfy the following local Lipschitz condition

‖f(x) − f(z)‖₂ ≤ c_R ‖x − z‖₂   (2)

for ‖x‖₂, ‖z‖₂ ≤ R, c_R > 0 and any R > 0, where ⟨·, ·⟩₂ denotes the Euclidean inner product with corresponding norm ‖·‖₂. Further, we assume the special type of monotonicity condition

⟨x, f(x)⟩₂ ≤ c_f ‖x‖₂²   (3)

for all x ∈ R^n and a constant c_f. In the literature, (3) is called a one-sided growth condition as well. In fact, c_f can be negative. In this case, (3) is also known as a dissipativity condition. Below, x(t, x_0, B), t ∈ [0, T], represents the solution to (1a) with initial condition x_0 ∈ R^n and matrix B determining the inhomogeneous part of the state equation. The associated control process u is assumed to be an (F_t)_{t∈[0,T]}-adapted process with

‖u‖²_{L²_T} := E ∫₀ᵀ ‖u(s)‖₂² ds < ∞,

i.e., u ∈ L²_T. Moreover, suppose that f(0) = 0 to ensure that the uncontrolled state equation (1a) (B = 0) has an equilibrium at zero. If f(0) ≠ 0, we can replace f by f − f(0) as well as B and u by [B f(0)] and [u⊤ 1]⊤, respectively. The above setting covers interesting polynomial nonlinearities. This is illustrated in the next example.
Example 2.1. The local Lipschitz condition (2) is fulfilled by all functions f with continuous partial derivatives. This particularly applies to polynomials. We assume f = f^{(i)}, i ∈ {1, 2, 3}, to be a special third-order polynomial for which the monotonicity condition (3) holds. The products/powers involving "∘" have to be understood in the Hadamard (component-wise) sense and 1_n is the vector of ones having length n. Now, (3) can be verified by component-wise calculations for all x_i ∈ R.
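The monotonicity condition (3) can also be probed numerically. The sketch below uses the Chafee-Infante-type cubic f(x) = x − x∘x∘x as an assumed representative of the component-wise third-order polynomials discussed here; the exact definitions of the f^{(i)} may differ.

```python
import numpy as np

# Assumed representative cubic nonlinearity (Chafee-Infante type):
# f(x) = x - x∘x∘x, where "∘" is the Hadamard (component-wise) product.
def f(x):
    return x - x**3

rng = np.random.default_rng(0)
n = 50
for _ in range(100):
    x = 5 * rng.standard_normal(n)
    # <x, f(x)>_2 = ||x||_2^2 - sum_i x_i^4 <= c_f ||x||_2^2 with c_f = 1
    gap = x @ f(x) - 1.0 * (x @ x)
    assert gap <= 1e-12
print("monotonicity condition (3) holds with c_f = 1 for the cubic")
```

The inequality holds exactly here, since the gap equals −Σ_i x_i⁴ ≤ 0; the random sampling merely illustrates this.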
Our setting is not restricted to the functions of Example 2.1. However, we will frequently refer to these interesting cases. Let us point out that the component-wise functions f^{(1)} and f^{(2)} occur if the nonlinear part of certain (stochastic) reaction diffusion equations is evaluated on a spatial grid. To be more precise, a finite difference discretization of Zeldovich-Frank-Kamenetskii (or FitzHugh-Nagumo) and Chafee-Infante equations would lead to such a setting. This paper does not intend to discuss finite difference schemes for stochastic partial differential equations in detail. However, the interested reader may find more information regarding these methods in [14,15,16,33]. We also refer to, e.g., [8,21,24,27] for a theoretical treatment of stochastic reaction diffusion equations.
The goal of this paper is to drastically reduce the dimension of the high-dimensional system (1) in order to lower the computational complexity when solving this system of stochastic differential equations. Therefore, the solution manifold of (1a) shall be approximated by an r-dimensional subspace im[V] of R^n (V ∈ R^{n×r} is a full-rank matrix), so that we find a process x_r yielding V x_r(t) ≈ x(t). Inserting this estimate into (1) leads to

d(V x_r(t)) = [A V x_r(t) + B u(t) + f(V x_r(t))] dt + N(V x_r(t)) dM(t) + e(t),   (4)

with y(t) ≈ y_r(t) := C V x_r(t), where e(t) is the state equation error. Now, we enforce the residual e(t) to be orthogonal to a second subspace im[W] (W ∈ R^{n×r} has full rank). We further assume that our choice of W provides W⊤V = I. Multiplying (4) with W⊤, we obtain

dx_r(t) = [A_r x_r(t) + B_r u(t) + f_r(x_r(t))] dt + N_r(x_r(t)) dM(t),   y_r(t) = C_r x_r(t),   (5)

with x_r(0) = W⊤x_0 ∈ R^r, r ≪ n and y ≈ y_r. Generally, we have that x_r(t) ∈ R^r, A_r ∈ R^{r×r}, B_r ∈ R^{r×m}, C_r ∈ R^{p×r}, N_r: R^r → R^{r×d} defined by N_r(x_r) = [N_{r,1} x_r . . . N_{r,d} x_r] for x_r ∈ R^r, where N_{r,i} ∈ R^{r×r} (i = 1, . . . , d) and f_r: R^r → R^r. In particular, the reduced coefficients are of the following form:

A_r = W⊤AV,  B_r = W⊤B,  C_r = CV,  N_{r,i} = W⊤N_iV,  f_r(x_r) = W⊤f(V x_r).   (6)

The goal of this paper is to provide a reduced order method for which we can compute the projection matrices V and W and for which we find an accurate approximation of (1). Here, the main focus will be on the control dynamics and not on the initial state. Therefore, we study reduced order modelling when x_0 = 0. Moreover, we aim to investigate Gramian based schemes which often heavily rely on stability of the state equation. Therefore, we discuss global asymptotic stability in the next section. Before doing so, we briefly point out that there is a unique solution to (1a) by referring to the existing literature.
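The Petrov-Galerkin reduction of the coefficients in (6) can be sketched in a few lines. All matrices, the cubic f and the Galerkin choice W = V (which automatically yields W⊤V = I for orthonormal V) are illustrative assumptions, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p, d, r = 30, 2, 3, 2, 5

# Synthetic full-order coefficients of (1) and an assumed cubic nonlinearity
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
Ns = [0.1 * rng.standard_normal((n, n)) for _ in range(d)]
f = lambda x: x - x**3

# Galerkin choice: orthonormal V and W = V, hence W^T V = I_r
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
W = V

# Reduced coefficients as in (6)
A_r = W.T @ A @ V
B_r = W.T @ B
C_r = C @ V
N_r = [W.T @ Ni @ V for Ni in Ns]
f_r = lambda x_r: W.T @ f(V @ x_r)

assert np.allclose(W.T @ V, np.eye(r))
assert A_r.shape == (r, r) and B_r.shape == (r, m) and C_r.shape == (p, r)
assert f_r(rng.standard_normal(r)).shape == (r,)
```

Note that evaluating f_r still requires lifting x_r to R^n; efficient treatment of the nonlinear term is a separate issue not addressed by this sketch.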
Existence and uniqueness as well as global asymptotic stability

Existence and uniqueness for (1a)
We briefly discuss that our setting is well-posed. We define the drift function F(t, x) := Ax + Bu(t) + f(x) of (1a). Using (3) and exploiting that the remaining parts in the drift and diffusion are either linear in x or solely time dependent, we can find a constant c_{F,N}, so that

⟨x, F(t, x)⟩₂ + ½ ‖N(x) K^{1/2}‖²_F ≤ c_{F,N} (1 + ‖x‖₂²),   (7)

given that the control u is bounded by a constant independent of t ∈ [0, T] and ω ∈ Ω. Here, ‖·‖_F denotes the Frobenius norm. Moreover, the drift F is locally Lipschitz continuous (uniformly in (t, ω)) in the sense of (2), since the same is true for f. As N is linear, it is particularly globally Lipschitz with respect to ‖· K^{1/2}‖_F. The monotonicity condition (7) and local Lipschitz continuity of drift and diffusion yield existence and uniqueness of a solution to (1a) by [23, Theorem 3.5] if M is a Brownian motion. On the other hand, the arguments of Mao [23] can immediately be transferred to our more general setting because the Itô integral w.r.t. M has essentially the same properties as the one in the Brownian case. The first property is the Itô isometry

E ‖∫₀ᵀ Ψ(s) dM(s)‖₂² = E ∫₀ᵀ ‖Ψ(s) K^{1/2}‖²_F ds =: ‖Ψ‖²

for predictable processes Ψ with ‖Ψ‖ < ∞, which relies on the linear covariance function of M, see [27]. Secondly, the equation for the expected value of a quadratic form of the state variable has the same structure, see Lemma A.1. It is also worth mentioning that existence and uniqueness have been established in a setting more general than in [23], also covering ours, see [1]. There, the result was proved assuming a monotonicity condition, a local Lipschitz condition in the drift and the Brownian diffusion part as well as global Lipschitz continuity in the jump diffusion.

A note on global asymptotic stability
Stability concepts are essential in order to define computationally accessible Gramians, which are vital for identifying less relevant information in a system like (1). We recall known facts for the linear part of (1) based on the results in [17].

Proposition 3.1. The following statements are equivalent:

(a) The state in (1a) with B = 0 and f ≡ 0 is exponentially mean square stable, i.e., there are k, β > 0, so that E‖x(t, x_0, 0)‖₂² ≤ k e^{−βt} ‖x_0‖₂² for all t ∈ [0, T].

(b) It holds that λ(I ⊗ A + A ⊗ I + Σ_{i,j=1}^d k_{ij} (N_j ⊗ N_i)) ⊂ C_−, where λ(·) denotes the spectrum of a matrix.

(c) There exists a matrix X > 0 with A⊤X + XA + Σ_{i,j=1}^d k_{ij} N_i⊤ X N_j < 0.

Proof. A proof of these statements can be found in [9,30].
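Criterion (b) can be checked numerically by assembling the Kronecker representation of the mean square Lyapunov operator and inspecting its spectrum. The data (A, N_1, K) below is synthetic and purely illustrative.

```python
import numpy as np

# Mean square stability test for the linear part of (1a), assuming the
# Kronecker form of the Lyapunov (closure) operator stated in (b).
n, d = 4, 1
A = -2.0 * np.eye(n)                            # hypothetical stable drift
Ns = [0.2 * np.diag(np.ones(n - 1), k=1)]       # hypothetical noise matrix
K = np.array([[1.0]])                           # covariance matrix of M

I = np.eye(n)
L = np.kron(I, A) + np.kron(A, I)
for i in range(d):
    for j in range(d):
        L += K[i, j] * np.kron(Ns[j], Ns[i])

ms_stable = np.linalg.eigvals(L).real.max() < 0
print("mean square stable:", ms_stable)  # prints True for this data
```

For large n, forming the n² × n² matrix explicitly is infeasible; then (c) is checked instead by solving a matrix inequality with an LMI solver.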
Throughout the rest of the paper, we assume that

λ(I ⊗ (A + c_1 I) + (A + c_1 I) ⊗ I + Σ_{i,j=1}^d k_{ij} (N_j ⊗ N_i)) ⊂ C_−   (8)

for some constant c_1. According to Proposition 3.1, this means that (1a) with the shifted linear drift coefficient A + c_1 I is mean square asymptotically stable for B = 0 and f ≡ 0. The associated state variable is of the form e^{c_1 t} x(t), so that the original state x(t) (B = 0 and f ≡ 0) needs to have a decay rate β > c_1, see Proposition 3.1 (a), given that c_1 is positive. We desire, but do not assume, that we can choose c_1 ≥ c_f, i.e., the decay rate of the linear part shall outperform the one-sided linear growth constant in (3). This requires a sufficiently stable linear part if c_f > 0, e.g., for the nonlinearities in Example 2.1. Since c_f can also be negative, the linear part of (1a) can even be exponentially increasing in some cases. Using classical arguments of [17,23] based on quadratic Lyapunov-type functions, we provide the following criterion for global mean square stability of the uncontrolled state equation (1a). This criterion is required for the discussion of the Gramians introduced later.
Theorem 3.2. Suppose that B = 0 in (1a) and that constants c_1, c_2 ∈ R and a matrix X > 0 are given such that

A⊤X + XA + Σ_{i,j=1}^d k_{ij} N_i⊤ X N_j + 2c_1 X < 0,   (9)
⟨x, X(f(x) − c_2 x)⟩₂ ≤ 0   (10)

for all x ∈ R^n. Then, there exist constants k, β > 0, so that

E‖x(t, x_0, 0)‖₂² ≤ k e^{(2(c_2 − c_1) − β) t} ‖x_0‖₂²,   t ∈ [0, T].

Proof. A proof is stated in Appendix B.
Remark 3.3. If c 1 ≥ c 2 in Theorem 3.2, we obtain global mean square asymptotic stability for our nonlinear system. In particular, by assumption (3), (10) holds for X = I and c 2 = c f . If (9) is now true for X = I and c 1 = c f , mean square asymptotic stability follows.
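The criterion of Remark 3.3 can be tested for concrete data. Below, A is a discretized 1D Dirichlet Laplacian, N_1 a small noise matrix and f the assumed cubic f(x) = x − x∘x∘x with c_f = 1; we verify (9) and (10) for X = I and c_1 = c_2 = c_f.

```python
import numpy as np

# Check (9) with X = I, c_1 = 1 for a discrete Laplacian drift and small
# noise, and (10) with X = I, c_2 = 1 for the assumed cubic nonlinearity.
n = 10
h = 1.0 / (n + 1)
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2       # discrete 1D Laplacian
N1 = 0.05 * np.eye(n)                            # hypothetical noise matrix
K = np.array([[1.0]])
c1 = 1.0

lhs = A.T + A + K[0, 0] * (N1.T @ N1) + 2 * c1 * np.eye(n)
assert np.linalg.eigvalsh(lhs).max() < 0         # (9) holds with X = I

rng = np.random.default_rng(2)
f = lambda x: x - x**3
for _ in range(50):
    x = 3 * rng.standard_normal(n)
    assert x @ (f(x) - 1.0 * x) <= 1e-12         # (10) holds with X = I
```

With c_1 = c_2, Remark 3.3 then yields mean square asymptotic stability of this nonlinear example.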

Gramians and dominant subspace characterization
In this section, algebraic objects, called Gramians, are introduced. We aim to construct them, so that their eigenspaces corresponding to small eigenvalues coincide with the information in (1) that can be neglected. It is not trivial to find the right notion for general nonlinearities f . However, the monotonicity condition in (3) will become essential for our concept. In particular, positive (semi)definite Gramian candidates X have to preserve (3) in a certain sense when ·, · 2 is replaced by ·, X· 2 . We begin with a global Gramian concept to illustrate what we require. Subsequently, we immediately weaken it for practical reasons.

Monotonicity Gramians
First, a pair of Gramians is defined that characterizes dominant subspaces of (1) for all u ∈ L²_T.

Definition 4.1. Let c_1 and c_2 be constants. Then, a pair of matrices (P, Q) with P, Q > 0 is called global monotonicity Gramians if they satisfy

A⊤P^{−1} + P^{−1}A + Σ_{i,j=1}^d k_{ij} N_i⊤ P^{−1} N_j + 2c_1 P^{−1} + P^{−1} B B⊤ P^{−1} ≤ 0,   (11)
A⊤Q + QA + Σ_{i,j=1}^d k_{ij} N_i⊤ Q N_j + 2c_1 Q + C⊤C ≤ 0,   (12)

and if it further holds that

⟨x, P^{−1}(f(x) − c_2 x)⟩₂ ≤ 0   and   ⟨x, Q(f(x) − c_2 x)⟩₂ ≤ 0   (13)

for all x ∈ R^n.
Notice that assumption (8) ensures the existence of solutions to (11) and (12), see [4,30]. In the following, we state a sufficient criterion for the existence of Gramians also satisfying (13).
Proposition 4.2. Suppose that (9) and (10) hold with some constants c 1 and c 2 . Then, global monotonicity Gramians P and Q exist with the same constants.
Proof. We denote the left-hand side of (9) by −Y and multiply it by γ > 0. Hence, we have

A⊤(γX) + (γX)A + Σ_{i,j=1}^d k_{ij} N_i⊤ (γX) N_j + 2c_1 (γX) = −γY.

Since Y > 0, we can ensure that −γY ≤ −(γX)BB⊤(γX) if γ is sufficiently small. Therefore, P = (γX)^{−1} solves (11) for a potentially small γ. On the other hand, this P gives us ⟨x, P^{−1}(f(x) − c_2 x)⟩₂ = γ ⟨x, X(f(x) − c_2 x)⟩₂ ≤ 0 for all x ∈ R^n by (10). Analogously, Q = γ̃X with a sufficiently large γ̃ > 0 solves (12) and inherits the second inequality in (13) from (10). This concludes the proof.
Certainly, the existence of global monotonicity Gramians is not sufficient for our considerations. As we will see later, it is important to find candidates P and Q that have a large number of small eigenvalues. Consequently, one might have to solve a problem of minimizing tr(P) and tr(Q) subject to (11), (12) and (13). Moreover, we allow c_1 < c_2 in Definition 4.1 to have an additional degree of freedom. However, this comes at a price. We will observe that c_2 − c_1 is supposed to be small. In fact, we desire to choose c_1 = c_2 if such a c_1 ensures (8).
• If f(x) = x, then (3) holds with c_f = 1 and ⟨x, X(f(x) − c_f x)⟩₂ = 0 for any X ≥ 0 and all x ∈ R^n. Therefore, any solutions of (11) and (12) with c_1 = c_2 = c_f = 1 are global monotonicity Gramians. In particular, we can choose the solution to the equality in (12) and the candidate with minimal trace in (11).
• If f is globally Lipschitz in some norm, then there exists a Lipschitz constant c_L, so that

⟨x, X f(x)⟩₂ ≤ c_L ⟨x, X x⟩₂   (14)

for all x ∈ R^n, given that X = P^{−1} or X = Q with P, Q > 0, meaning that every positive solution to (11) and (12) can be picked. However, c_L depends on X, which shows that c_1 and c_2 influence each other. On the other hand, this c_L might not be the optimal candidate for the one-sided Lipschitz constant c_2, which can even be negative, i.e., it is also challenging to identify optimal constants.
We emphasize further that, generally, we cannot derive P and Q independent of (13). For instance, fixing c_1 = c_2 ≥ c_f, we can easily find a solution Q of (12) and a vector x ∈ R^n, so that ⟨x, Q(f^{(1)}(x) − c_2 x)⟩₂ > 0, where f^{(1)} is the function defined in Example 2.1. Having in mind that we aim to fix c_1 and c_2 close to each other with associated Gramians P and Q having a large number of small eigenvalues, the concept of global Gramians might generally be too restrictive. Therefore, it is more reasonable to seek solutions of (11) and (12) that satisfy (13) on average instead of point-wise. This means that we allow for positive values of the monotonicity gaps

G_{P^{−1}}(x) := ⟨x, P^{−1}(f(x) − c_2 x)⟩₂   and   G_Q(x) := ⟨x, Q(f(x) − c_2 x)⟩₂   (15)

as long as G_{P^{−1}} and G_Q are mainly non-positive on the essential parts of R^n. We specify the above arguments in the following definition. In this context, we introduce the set U of controls u ∈ L²_T for which we desire to evaluate system (1). The following pair of Gramians (P, Q) identifies less important directions for controls in U. Therefore, it is meaningful to pick Gramian candidates that ensure a large set U.
Definition 4.5. Let c_1, c_2 be constants and U ⊆ L²_T be the set of controls we are interested in. Then, a pair of matrices (P, Q) with P, Q > 0 is called average monotonicity Gramians for U if (11) and (12) are satisfied, respectively, and if instead of (13) it holds that

E ∫₀ᵗ G_{P^{−1}}(x(s)) ds ≤ 0   (16)   and   E ∫₀ᵗ G_Q(x(s)) ds ≤ 0   (17)

for all t ∈ [0, T] and all state variables x(s) = x(s, 0, u) with u ∈ U.

Certainly, a global monotonicity Gramian is also an average monotonicity Gramian with U = L²_T. Suppose that there are areas where one of the functions in (15) is positive. Then, controls u concentrating the state variable x in such areas for a long time will violate (16) or (17).
Remark 4.6. In Definitions 4.1 and 4.5, Gramians are constructed as solutions to (shifted) linear matrix inequalities in order to allow a practical computation. This is possible due to the monotonicity condition for f in (3), which shall be preserved in some sense under the inner products defined by the Gramians P and Q. A more general version of global monotonicity Gramians is obtained by adding twice the estimates in (13) to (11) and (12), resulting in

⟨x, (A⊤P^{−1} + P^{−1}A + Σ_{i,j=1}^d k_{ij} N_i⊤ P^{−1} N_j + 2c_1 P^{−1} + P^{−1}BB⊤P^{−1}) x⟩₂ + 2 G_{P^{−1}}(x) ≤ c ⟨x, P^{−1} x⟩₂,   (18)
⟨x, (A⊤Q + QA + Σ_{i,j=1}^d k_{ij} N_i⊤ Q N_j + 2c_1 Q + C⊤C) x⟩₂ + 2 G_Q(x) ≤ c ⟨x, Q x⟩₂   (19)

for all x ∈ R^n, where c ≥ 0 is some "small" constant. In the same way, average monotonicity Gramians can be generalized by setting x = x(s) in (18) and (19), taking the expected value and integrating both sides of these inequalities over each subinterval [0, t] with 0 < t ≤ T. However, we will not discuss this generalization in detail below.

Relevance of monotonicity Gramians
In the following, we state in which sense the Gramians of Definition 4.5 help to identify the dominant subspaces of (1). This then motivates a truncation procedure resulting in a special type of reduced system (5). Below, let us assume that x_0 = 0, i.e., x(t) = x(t, 0, u). By definition, Gramians are positive (semi)definite matrices. Consequently, we can find an orthonormal basis (p_k) of R^n consisting of eigenvectors of P with corresponding eigenvalues (λ_{P,k}). The same is true for Q, where the basis is denoted by (q_k) with associated eigenvalues (λ_{Q,k}). Hence, the state variable can be represented as

x(t) = Σ_{k=1}^n ⟨x(t), p_k⟩₂ p_k = Σ_{k=1}^n ⟨x(t), q_k⟩₂ q_k.   (20)

Based on this representation, we aim to answer which directions p_k are less relevant in (1a) and which directions q_k can be neglected in (1b).
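The role of the eigenvalues λ_{P,k} can be illustrated with synthetic data: if a state is distributed according to P (i.e., x = P^{1/2} z with standard normal z), its mean energy in direction p_k is exactly λ_{P,k}. This mirrors the message of Theorem 4.7 in a purely heuristic Monte Carlo experiment; P below is an arbitrary SPD matrix with decaying spectrum.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
lam = np.array([10.0, 5.0, 1.0, 0.1, 1e-3, 1e-5])   # decaying eigenvalues
Umat, _ = np.linalg.qr(rng.standard_normal((n, n)))
P = Umat @ np.diag(lam) @ Umat.T                    # synthetic Gramian

lam_P, p = np.linalg.eigh(P)          # orthonormal eigenvectors p[:, k]
x = rng.standard_normal(n)
x_rec = sum((x @ p[:, k]) * p[:, k] for k in range(n))
assert np.allclose(x, x_rec)          # eigenbasis representation as in (20)

# Monte Carlo: E <x, p_k>^2 ~= lambda_{P,k} for x = P^{1/2} z
sqrtP = Umat @ np.diag(np.sqrt(lam)) @ Umat.T
Z = rng.standard_normal((n, 20000))
X = sqrtP @ Z
energy = ((p.T @ X) ** 2).mean(axis=1)
assert np.allclose(np.sort(energy), np.sort(lam), rtol=0.1)
```

Directions with tiny λ_{P,k} carry essentially no energy, which is exactly why the corresponding eigenspaces are candidates for truncation.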
Theorem 4.7. Let P and Q be average monotonicity Gramians for the set of controls U ⊆ L²_T and constants c_1, c_2 according to Definition 4.5, and set c := max{0, 2(c_2 − c_1)}. Moreover, let (p_k, λ_{P,k}) and (q_k, λ_{Q,k}) be associated pairs of eigenvectors and eigenvalues giving us (20). Then, given a zero initial state and u ∈ U, the mean energy of x(t) in the direction p_k satisfies

E ⟨x(t), p_k⟩₂² ≤ λ_{P,k} e^{ct} ‖u‖²_{L²_t},   (21)

and an analogous bound (22), with λ_{P,k} replaced by λ_{Q,k}, quantifies the contribution of the direction q_k to the output energy on [0, t].

Proof. We find inequalities for E ⟨x(t), X x(t)⟩₂, where X ∈ {P^{−1}, Q}. To do so, we apply Lemma A.1 to X^{1/2} x(t) and obtain, exploiting (16), (17) as well as (11) and (12),

E ⟨x(t), X x(t)⟩₂ ≤ ∫₀ᵗ (E ‖u(s)‖₂² + c E ⟨x(s), X x(s)⟩₂) ds.   (23)

Therefore, by the Gronwall type estimate (50), we have

E ⟨x(t), X x(t)⟩₂ ≤ ∫₀ᵗ E ‖u(s)‖₂² e^{c(t−s)} ds ≤ e^{ct} ‖u‖²_{L²_t}.

Inserting the representation for x(s) in (20) yields (22). With X = P^{−1} in (23), it holds that

E ⟨x(t), P^{−1} x(t)⟩₂ = Σ_{k=1}^n λ_{P,k}^{−1} E ⟨x(t), p_k⟩₂².

We further observe that λ_{P,k}^{−1} E ⟨x(t), p_k⟩₂² ≤ E ⟨x(t), P^{−1} x(t)⟩₂, so that (21) follows. This concludes the proof.
Estimate (21) tells us that the state variable is small in the direction of p_k if λ_{P,k} is small and in case cT is not too large (recall that c_2 − c_1 is supposed to be small). Consequently, these eigenspaces of P can be neglected in our considerations. The eigenspaces spanned by vectors q_k that are associated with small eigenvalues of Q are also of minor relevance due to (22). This inequality shows that such q_k barely contribute to the energy of the output y on each subinterval [0, t].
• Following basically the same steps, the result of Theorem 4.7 also holds true if the more general notion of Gramians in Remark 4.6 is used.
• Theorem 4.7 is formulated for u ∈ U since it is based on (16) and (17). This does not mean that a reduced order model based on neglecting eigenspaces of P and Q associated with small eigenvalues leads to a bad approximation for u ∈ L²_T \ U. This is because (16) and (17) might still almost hold in those cases, since suitable Gramians lead to G_Q and G_{P^{−1}} in (15) being small whenever they are positive. Then, the estimates in Theorem 4.7 will approximately hold.

Computation of monotonicity Gramians
We aim to compute average monotonicity Gramians P and Q for a large set U of controls. We choose them as solutions to (11) and (12), so that G_{P^{−1}} and G_Q in (15) have a local maximum in the origin or a saddle point with very few increasing directions. Otherwise, we might have several cases in which the monotonicity condition is immediately violated. This would not allow (16) and (17) to hold for a large U. On the other hand, it is essential that the area where the monotonicity condition is fulfilled clearly dominates the one where it does not hold. A possible and acceptable scenario in dimension n = 2 is illustrated in Figure 1. Here, the monotonicity gap G_Q is depicted for f = f^{(2)}, c_2 = c_f = 1 and Q = [0.49426 0.58159; 0.58159 0.68542], a matrix with a large and a small eigenvalue. The blue color stands for small absolute values and red for large ones. G_Q is non-positive except for the black areas, where the monotonicity condition is slightly violated. In the following proposition, a simple criterion for local optimality of G_{P^{−1}} and G_Q is given.

Proposition 4.9. Define the function g(x) = ⟨x, X(f(x) − c_2 x)⟩₂ with a constant c_2, f being twice differentiable and X > 0. We assume that

(∂f/∂x_j)(0) = c̃_2 e_j   (24)

for all j ∈ {1, . . . , n} and that c_2 − c̃_2 > 0, where e_j is the j-th unit vector in R^n. Then, g has a local maximum in x = 0.
Proof. It is easy to check that x = 0 is a critical point of g, since f(0) = 0. Moreover, by (24), the Hessian of g at zero is (∂²g/∂x_i∂x_j (0))_{i,j=1,...,n} = −2(c_2 − c̃_2) X < 0, which concludes the proof.
Condition (24) is, e.g., satisfied if polynomials as in Example 2.1 are considered. We can therefore observe that G_{P^{−1}} and G_Q have a local maximum for the choices of f given in Example 2.1 in case c_2 is sufficiently large. In particular, we fix c_2 ≥ c_f, since this means that G_{P^{−1}} and G_Q are non-positive along the bases of eigenvectors used in (20). This is a consequence of assumption (3). Theorem 4.7 motivates to choose c_1 so that c_2 − c_1 is a small positive number. If possible, we even set c_1 = c_2, providing c = 0. If c_1 > 0, the possibility of this choice also depends on whether (8) is satisfied. We then compute the solution to (11) having a minimal trace and the solution to the equality in (12). This provides that G_{P^{−1}} and G_Q are non-positive on large parts of R^n for the particular functions introduced in Example 2.1 and only small positive values are taken on the remaining area. This leads to (16) and (17) for a large U. This is what we observe from numerical experiments, where A is a discrete Laplacian. Let us now briefly sketch how such a minimal trace monotonicity Gramian P is computed. We reformulate (11) by multiplying it with P from the left and from the right, leading to

AP + PA⊤ + Σ_{i,j=1}^d k_{ij} P N_i⊤ P^{−1} N_j P + 2c_1 P + BB⊤ ≤ 0.   (25)

Setting L(P) := [(N_1 P)⊤ . . . (N_d P)⊤]⊤, we obtain the following equivalent representation for (25) based on Schur complement conditions for the definiteness of a matrix:

[ AP + PA⊤ + 2c_1 P + BB⊤   L(P)⊤ ;  L(P)   −K^{−1} ⊗ P ] ≤ 0.   (26)

Here, we need to further assume that K is invertible. Now, we can use a linear matrix inequality solver to find a solution to the minimization of tr(P) subject to (26) and P > 0. In this paper, we use YALMIP and MOSEK [26,22] for an efficient computation of P. In general, a good choice for P and Q guaranteeing (16) and (17) for many different controls always depends on the particular nonlinearity f. Therefore, no universal recommendation can be given here.
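As a simplified sketch of this Gramian computation (not a replacement for the trace-minimal LMI solution via YALMIP/MOSEK), one can solve the equality versions of (11) and (12) in the special case N_i = 0, where they reduce to shifted Lyapunov equations; below they are solved via a dense Kronecker linear system in plain NumPy. All data (Laplacian A, B = C = I, c_1 = 1) is an illustrative assumption.

```python
import numpy as np

# Shifted Lyapunov equations, the equality versions of (11)/(12) for N_i = 0:
#   (A + c1 I) P + P (A + c1 I)^T + B B^T = 0,
#   (A + c1 I)^T Q + Q (A + c1 I) + C^T C = 0.
n = 10
h = 1.0 / (n + 1)
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2        # discrete Laplacian
B = np.eye(n)                                     # illustrative input matrix
C = np.eye(n)                                     # illustrative output matrix
c1 = 1.0
As = A + c1 * np.eye(n)                           # shifted drift, still stable

I = np.eye(n)
# Row-major vec convention: vec(As P + P As^T) = (kron(As, I) + kron(I, As)) vec(P)
P = np.linalg.solve(np.kron(As, I) + np.kron(I, As),
                    (-B @ B.T).reshape(-1)).reshape(n, n)
Q = np.linalg.solve(np.kron(As.T, I) + np.kron(I, As.T),
                    (-C.T @ C).reshape(-1)).reshape(n, n)

assert np.allclose(As @ P + P @ As.T + B @ B.T, 0, atol=1e-8)
assert np.allclose(As.T @ Q + Q @ As + C.T @ C, 0, atol=1e-8)
assert np.linalg.eigvalsh(P).min() > 0 and np.linalg.eigvalsh(Q).min() > 0
```

For the full problem with multiplicative noise and inequality constraints, a semidefinite programming solver operating on (26) is required instead of this dense linear solve.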

Extension under one-sided Lipschitz continuity
Many functions f satisfying (3) are also one-sided Lipschitz continuous. However, we require an extended version of this continuity concept in the context of the error analysis in Section 6. In detail, the following inequalities are supposed to hold:

⟨x − z, f(x) − f(z)⟩₂ ≤ c_f ‖x − z‖₂²   and   ⟨x + z, f(x) + f(z)⟩₂ ≤ c_f ‖x + z‖₂²   (27)

for all x, z ∈ R^n and a constant c_f. Condition (27) will later inspire the extended definition of Gramians. Notice that one-sided Lipschitz continuity is defined with a minus in (27), but we additionally ask for this property when replacing each minus by a plus. In this context, let us look at the functions of Example 2.1 again. We begin with f^{(2)} and f^{(3)} and show that (27) is satisfied.
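Both variants of (27) can be probed numerically for the odd cubic f(x) = x − x∘x∘x (an assumed representative; consistent with Example 4.13, since −f(x) = f(−x) here). Componentwise, (x ∓ z)(x³ ∓ z³) ≥ 0, so both inequalities hold with c_f = 1.

```python
import numpy as np

# Numerical probe of (27) for the assumed odd cubic f(x) = x - x∘x∘x:
# the minus and the plus variant both hold with c_f = 1.
rng = np.random.default_rng(4)
f = lambda x: x - x**3
n = 20
for _ in range(200):
    x, z = 4 * rng.standard_normal(n), 4 * rng.standard_normal(n)
    assert (x - z) @ (f(x) - f(z)) <= (x - z) @ (x - z) + 1e-10
    assert (x + z) @ (f(x) + f(z)) <= (x + z) @ (x + z) + 1e-10
print("both variants of (27) hold with c_f = 1 for the odd cubic")
```

For non-odd nonlinearities such as f^{(1)}, the plus variant can fail, as Example 4.11 below shows.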
As we will see below, f (1) is also one-sided Lipschitz but (27) is not fulfilled if a plus is considered.
Example 4.11. Using the definition of f^{(1)}, a component-wise estimate for ⟨x − z, f^{(1)}(x) − f^{(1)}(z)⟩₂ can be established for all i ∈ {1, . . . , n}. Therefore, f^{(1)} satisfies the minus version of (27). We observe that the one-sided Lipschitz constant is different from the monotonicity constant in Example 2.1. Moreover, we show that (27) does not hold with a plus. Let n = 1 and c_f be an arbitrary constant. We fix x = 1 and z = −1 + ε with ε > 0. Then, ⟨x + z, f^{(1)}(x) + f^{(1)}(z)⟩₂ > c_f (x + z)² if ε is sufficiently small and a > −1.

Motivated by the one-sided Lipschitz continuity (27), a Gramian based inner product shall preserve this property, leading to the following extension of Definition 4.1.
Definition 4.12. Let c_1 and c_2 be constants. Then, a pair of matrices (P, Q) with P, Q > 0 is called global one-sided Lipschitz Gramians if they satisfy (11), (12) and

⟨x ∓ z, X(f(x) ∓ f(z))⟩₂ ≤ c_2 ⟨x ∓ z, X(x ∓ z)⟩₂,   X ∈ {P^{−1}, Q},   (28)

for all x, z ∈ R^n.

Example 4.13. Let P, Q > 0 be solutions to (11), (12) and f be globally Lipschitz with −f(x) = f(−x). Then, we can always construct global one-sided Lipschitz Gramians, since for X ∈ {P^{−1}, Q} satisfying (11) and (12), we have that

⟨x ∓ z, X(f(x) ∓ f(z))⟩₂ ≤ c_2 ⟨x ∓ z, X(x ∓ z)⟩₂

for some suitable constant c_2.
If (28) is satisfied for z = 0, P and Q are global monotonicity Gramians. We will see later that a reduced order model based on the Gramians introduced in Definition 4.12 will lead to error estimates for all controls u ∈ L²_T. However, as in the global monotonicity Gramian case, it might be inefficient to choose a Gramian allowing to derive estimates for all u. The error analysis will show that it is actually enough to have (28) for large/essential sets of pairs (x, z) ∈ R^n × R^n in order to find a reasonable error criterion for a large number of different controls, i.e., the one-sided Lipschitz gaps

G⁺_{P^{−1}}(x, z) := ⟨x + z, P^{−1}(f(x) + f(z) − c_2(x + z))⟩₂   and   G⁻_Q(x, z) := ⟨x − z, Q(f(x) − f(z) − c_2(x − z))⟩₂   (29)

associated with (28) are mainly negative, but also small positive values will be allowed. We postpone the discussion of a weaker version of Definition 4.12 to Section 6.
Remark 4.14. One-sided Lipschitz Gramians are again special solutions of linear matrix inequalities for reasons of accessibility. Analogously to Remark 4.6, this concept can be formulated more generally. Adding twice (28) to the respective inequality in (11) and (12) leads to

⟨x ∓ z, (A⊤P^{−1} + P^{−1}A + Σ_{i,j=1}^d k_{ij} N_i⊤ P^{−1} N_j + 2c_1 P^{−1} + P^{−1}BB⊤P^{−1})(x ∓ z)⟩₂ + 2 ⟨x ∓ z, P^{−1}(f(x) ∓ f(z) − c_2(x ∓ z))⟩₂ ≤ c ⟨x ∓ z, P^{−1}(x ∓ z)⟩₂,   (30)
⟨x ∓ z, (A⊤Q + QA + Σ_{i,j=1}^d k_{ij} N_i⊤ Q N_j + 2c_1 Q + C⊤C)(x ∓ z)⟩₂ + 2 ⟨x ∓ z, Q(f(x) ∓ f(z) − c_2(x ∓ z))⟩₂ ≤ c ⟨x ∓ z, Q(x ∓ z)⟩₂   (31)

for all x, z ∈ R^n with c ≥ 0. We will see that this structure is what one requires to achieve a suitable global error bound for all u ∈ L²_T. Notice that z = 0 leads to (18) and (19), respectively. We will not discuss a definition of Gramians P and Q via (30) and (31) in further detail but will refer to them within the error analysis. Now, let us briefly discuss the existence of global one-sided Lipschitz Gramians.
Proposition 4.15. Let a matrix X > 0 be given satisfying (9) for some constant c_1 and

⟨x ∓ z, X(f(x) ∓ f(z))⟩₂ ≤ c_2 ⟨x ∓ z, X(x ∓ z)⟩₂

for all x, z ∈ R^n and a constant c_2. Then, global one-sided Lipschitz Gramians exist with these constants.
Proof. The proof uses the same argument as in Proposition 4.2 and is therefore omitted.

Particular reduced order model
We select a nonsingular S ∈ R^{n×n} that we use to simultaneously diagonalize the Gramians P and Q. This means that the bases of eigenvectors (p_k) and (q_k) in (20) will become the canonical basis of R^n. Consequently, by Theorem 4.7, unimportant directions can be identified with components in the transformed state variable that are associated with small diagonal entries of the diagonalized Gramians. In particular, the transformation matrix defines the new state by x_n = Sx. Inserting this into (1) leads to an equivalent stochastic system with coefficients

(A_n, B_n, C_n, N_{n,i}, f_n) := (SAS^{−1}, SB, CS^{−1}, SN_iS^{−1}, Sf(S^{−1}·))   (32)

instead of the original ones (A, B, C, N_i, f), i.e.,

dx_n(t) = [A_n x_n(t) + B_n u(t) + f_n(x_n(t))] dt + N_n(x_n(t)) dM(t),   y(t) = C_n x_n(t),   (33)

with t ∈ [0, T] and x_n(0) = 0. The new system (33) has the same input u and output y. Moreover, properties like asymptotic stability are not affected. However, the Gramians are different. These are given in the following proposition, where the precise diagonalizing transformation is stated.
Proposition 5.1. Suppose that S is an invertible matrix and that P and Q are global/average monotonicity or one-sided Lipschitz Gramians of (1) according to Definition 4.1, 4.5 or 4.12. Then, P_n = SPS⊤ and Q_n = S^{−⊤}QS^{−1} are the respective Gramians in the transformed setting (33).
Given that P, Q > 0, we find that P_n = Q_n = Σ_n = diag(σ_1, . . . , σ_n) using the balancing transformation

S = Σ_n^{1/2} U⊤ L_P^{−1},   S^{−1} = L_P U Σ_n^{−1/2},   (34)

where P = L_P L_P⊤ and L_P⊤ Q L_P = U Σ_n² U⊤ is a spectral factorization with an orthogonal U.
Proof. We multiply (11) and (12) with S^{−⊤} from the left and with S^{−1} from the right-hand side. Consequently, we see that SPS⊤ and S^{−⊤}QS^{−1} satisfy these inequalities under the coefficients in (32). Moreover, (13) is preserved under this transformation, since

⟨x_n, P_n^{−1}(f_n(x_n) − c_2 x_n)⟩₂ = ⟨x, P^{−1}(f(x) − c_2 x)⟩₂   and   ⟨x_n, Q_n(f_n(x_n) − c_2 x_n)⟩₂ = ⟨x, Q(f(x) − c_2 x)⟩₂

for x_n = Sx. Analogously, we can prove that the one-sided Lipschitz conditions (28) hold under the transformation. With x_n(s) = x_n(s, 0, u) given u ∈ U, we now find

E ∫₀ᵗ ⟨x_n(s), P_n^{−1}(f_n(x_n(s)) − c_2 x_n(s))⟩₂ ds = E ∫₀ᵗ G_{P^{−1}}(x(s)) ds ≤ 0

as well as

E ∫₀ᵗ ⟨x_n(s), Q_n(f_n(x_n(s)) − c_2 x_n(s))⟩₂ ds = E ∫₀ᵗ G_Q(x(s)) ds ≤ 0,

so that the average monotonicity conditions (16) and (17) still hold for the same set U. Finally, we use (34) and obtain P_n = Q_n = Σ_n.

We observe that the diagonal entries of the balanced Gramians are σ_i = λ_i(PQ)^{1/2}. We call them Hankel singular values (HSVs) from now on. Now, we partition the balanced state x_n = [x_{n,1}⊤ x_{n,2}⊤]⊤ and Σ_n = diag(Σ_r, Σ_{2,n−r}), where Σ_r = diag(σ_1, . . . , σ_r) contains the large and Σ_{2,n−r} = diag(σ_{r+1}, . . . , σ_n), r < n, the small HSVs. The same is done for (32), yielding the partitioned coefficients

A_n = [A_{11} A_{12}; A_{21} A_{22}],  B_n = [B_1; B_2],  C_n = [C_1 C_2],  N_{n,i} = [N_{i,11} N_{i,12}; N_{i,21} N_{i,22}],  f_n = [f_1; f_2].   (35)

Since x_{n,2} is associated with small values in Σ_{2,n−r}, we truncate the equation for these variables and remove them from the dynamics of x_{n,1} and y. This results in a reduced system (5) with coefficients given by (35). Setting V = V_r and W = W_r, where

V_r := S^{−1} [I_r; 0]   and   W_r := S⊤ [I_r; 0],

we see that our reduced system's structure is of the form as in (6). Here, S is given by (34).
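The balancing transformation (34) and the HSVs require only a few lines of linear algebra. The Gramians below are synthetic SPD matrices; the assertions verify S P S⊤ = Σ_n = S^{−⊤} Q S^{−1} and σ_i = λ_i(PQ)^{1/2}.

```python
import numpy as np

# Square-root balancing as in (34): P = L_P L_P^T (Cholesky) and
# L_P^T Q L_P = U diag(s2) U^T, then S = Sigma^{1/2} U^T L_P^{-1}.
rng = np.random.default_rng(5)
n = 5
Mp = rng.standard_normal((n, n)); P = Mp @ Mp.T + 0.1 * np.eye(n)
Mq = rng.standard_normal((n, n)); Q = Mq @ Mq.T + 0.1 * np.eye(n)

L_P = np.linalg.cholesky(P)                       # P = L_P L_P^T
U, s2, _ = np.linalg.svd(L_P.T @ Q @ L_P)         # SPD: = U diag(s2) U^T
Sigma = np.diag(np.sqrt(s2))                      # HSVs on the diagonal
S = np.diag(s2 ** 0.25) @ U.T @ np.linalg.inv(L_P)
S_inv = L_P @ U @ np.diag(s2 ** -0.25)

assert np.allclose(S @ S_inv, np.eye(n))
assert np.allclose(S @ P @ S.T, Sigma)            # balanced P
assert np.allclose(S_inv.T @ Q @ S_inv, Sigma)    # balanced Q
# sigma_i = lambda_i(PQ)^{1/2}
assert np.allclose(np.sort(np.sqrt(s2)),
                   np.sort(np.linalg.eigvals(P @ Q).real) ** 0.5)
```

Truncation then simply keeps the first r columns of S_inv (as V_r) and of S⊤ (as W_r).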

Error analysis of the Gramian-based reduced system
We consider the reduced system (5) with state dimension r and coefficients as in (35). As an intermediate step, let us introduce the same type of reduced model with dimension k = r, r + 1, ..., n, which we write as (36). Setting y_n := y, we then observe that ‖y − y_r‖ ≤ Σ_{k=r+1}^n ‖y_k − y_{k−1}‖, (37) where ‖·‖ is some function space norm. This means that we have to investigate the error y_k − y_{k−1} of removing a single HSV. We can derive the reduced system of order k − 1 from (36) by setting the last entry of x_k equal to zero. Doing so, we obtain (38), where the first k − 1 rows in the state equation of (38) represent the reduced order model of dimension k − 1 and v_0, ..., v_d are (non-specified) scalar processes that are introduced to ensure the equality in the last line, which can be read as d0 = 0 dt + Σ_{i=1}^d 0 dM_i(t).
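Writing y_n := y, the telescoping argument behind this decomposition is a standard triangle-inequality step, reproduced here for clarity:

```latex
y - y_r = \sum_{k=r+1}^{n} \big( y_k - y_{k-1} \big)
\qquad\Longrightarrow\qquad
\| y - y_r \| \;\le\; \sum_{k=r+1}^{n} \| y_k - y_{k-1} \| .
```

Hence the total reduction error is controlled by the accumulated cost of removing one HSV at a time.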
Theorem 6.1. Let y be the output of (1) with x(0) = 0 and let y_r be the output of the r-dimensional reduced system (5) with coefficients as in (35) and x_r(0) = 0. Assume that this reduced system is based on Gramians P and Q satisfying (11) and (12) for a constant c_1. Then, for all u ∈ L²_T, we have the bound (39), where c = max{0, 2(c_2 − c_1)} is defined through another constant c_2 (e.g., the parameter of Definitions 4.1, 4.5 or 4.12) and G^+_{P^{-1}}, G^-_Q are the associated one-sided Lipschitz gaps in (29). Moreover, x_k is the reduced state variable of order k = r, r + 1, ..., n and V_k is the associated projection matrix consisting of the first k columns of the inverse S^{-1} of the balancing transformation defined by (34).

Corollary 6.2. Given the assumptions of Theorem 6.1, let P and Q be global one-sided Lipschitz Gramians according to Definition 4.12. Then, the bound (39) holds for all u ∈ L²_T; for c = 0, it reduces to the classical estimate ‖y − y_r‖_{L²_T} ≤ 2 Σ_{k=r+1}^n σ_k ‖u‖_{L²_T}. The same bound is established if the Gramians are defined by (30) and (31).
Proof. The functions G^+_{P^{-1}} and G^-_Q are nonpositive by construction of the global one-sided Lipschitz Gramians. Consequently, the result immediately follows from Theorem 6.1. It is not an immediate consequence of Theorem 6.1 that (30) and (31) lead to the same result. However, the proof uses exactly the same ideas and is therefore omitted.

Remark 6.3.
• We recovered the classical bound for reduced order systems based on balanced truncation in Corollary 6.2 up to the exponential terms in (39), see [10,11] for the deterministic and [4] for the stochastic linear case. As mentioned before, choices of Gramians are only acceptable if c is sufficiently small, i.e., the exponentials do not dominate. On the other hand, global one-sided Lipschitz Gramians might not be optimal in terms of their spectrum, so that a weaker concept is more reasonable.
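The classical part of the criterion is trivial to evaluate once the HSVs are available. The following sketch is a simplification: the exponential factor and its exact placement in (39) are assumptions here, so only the default c = 0 reproduces the classical 2·(sum of truncated HSVs) criterion exactly.

```python
import numpy as np

def bt_error_criterion(sigma, r, c=0.0, T=1.0):
    """Classical balanced-truncation criterion: 2 * sum of truncated HSVs.

    sigma: Hankel singular values in descending order; r: reduced dimension.
    The factor exp(c * T) is only a rough stand-in for the exponential
    weights in the paper's bound; it is exact for c = 0.
    """
    return 2.0 * np.exp(c * T) * float(np.sum(sigma[r:]))
```

For a rapidly decaying HSV sequence this quantity drops quickly in r, which is the practical indicator used in the experiments below.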
• As mentioned in Section 4.4, we can allow for small positive one-sided Lipschitz gaps G^-_Q and G^+_{P^{-1}}, see (29), in certain (small) regions. If we pick P and Q accordingly, Theorem 6.1 then tells us that the averages will be nonpositive for a large number of controls u ∈ L²_T and only slightly positive in many of the other scenarios. This means that (39) will (approximately) hold for many controls.
• In case we have a priori information concerning the solution space of the system, we can say even more. This is given if P and Q are monotonicity Gramians according to Definitions 4.1 or 4.5, because of (21) in Theorem 4.7. This estimate ensures that we obtain a small state approximation error, i.e., x(t) ≈ V_k x_k(t) for k ∈ {r, ..., n − 1}, if the truncated HSVs σ_{k+1}, ..., σ_n are of low order. In particular, we have V_{k+1} x_{k+1}(t) ≈ V_k x_k(t), since this is the error of just removing σ_{k+1}. Therefore, we can conclude that we need G^-_Q and G^+_{P^{-1}} to be mainly negative solely on sets of pairs (x, z) ∈ R^n × R^n with x ≈ z. In general, monotonicity Gramians do not ensure (39), but, due to the continuity of f, the relevant gap contributions can be expected to remain small in this regime. Moreover, the monotonicity gap G_{P^{-1}} defined in (15) is nonpositive on average for u ∈ U by construction of the average monotonicity Gramian P. This ensures that the bound of Corollary 6.2 might still deliver a reasonable error criterion although it does not strictly hold.
Proof of Theorem 6.1. We introduce x_−(t) := x_k(t) − (x_{k−1}(t)^T, 0)^T and x_+(t) := x_k(t) + (x_{k−1}(t)^T, 0)^T, whose dynamics are obtained by subtracting/adding (36) and (38). Recalling that Σ_k = diag(σ_1, ..., σ_k) denotes the diagonal matrix of the k largest HSVs of the original system, we know, by Proposition 5.1, that Σ_n satisfies (11) and (12) with the balanced realization (32). We evaluate the left upper k × k block of the equations associated to Σ_n. Taking (40) into account, Lemma A.1 is applied to the process Σ_k^{-1/2} x_−(t), and the resulting identity is integrated. Let x_{k,2} be the last entry of x_k and hence also of x_−; moreover, let n_{k,i} denote the last line of N_{k,i}. Based on (43) combined with the definitions of the outputs in (36) and (38), we can estimate the output error and apply (50). Now, exploiting Lemma A.1 for the process Σ_k^{-1/2} x_+(t) together with (41), an elementary estimate and (50) provides

∫_0^t α̇_k(s) e^{c(t−s)} ds ≤ σ_k ∫_0^t ( Ṫ_{k,+}(s) + 4 E‖u(s)‖_2^2 ) e^{c(t−s)} ds.
Combining this with (44) leads to the single-HSV error bound. The last step is to find different representations for T_{k,−} and T_{k,+} by inserting the definitions of x_+ and x_−. We recall that f_k(x_k) := f̃_k((x_k^T, 0_{n−k}^T)^T) for x_k ∈ R^k and 0_{n−k} ∈ R^{n−k} by (35). Since f̃_k collects the first k entries of the balanced nonlinearity f_n, the desired representations follow according to the definition of the one-sided Lipschitz gaps in (29). This concludes the proof using (37) and setting t = T.

Numerical experiments
Below, let L > 0 define a "step size" parameter h := L/(n + 1). Based on this, we introduce a grid by ζ_j = jh for j = 0, 1, ..., n + 1. Now, we mainly focus on an example for (1) that is given by the spatial discretization (45) of the reaction–diffusion equation (46) with controlled boundaries and the intuition that x_j(t) ≈ v_t(ζ_j). Let us specify the other parameters and the noise profile. Below, M is a Wiener process in dimension d = 2 with covariance matrix K = (1, −0.5; −0.5, 1) and n = 100. We study the nonlinearities f = f^{(1)} or f = f^{(2)} introduced in Example 2.1. The particular noise scaling functions are g_1(ζ) = 4 sin(ζ) and g_2(ζ) = 4 cos(ζ). Moreover, the terminal time is T = 1 and the quantity of interest is the average (47). For illustration, we show two typical paths of (47) for f = f^{(1)}, f^{(2)} and two different inputs in Figures 2 and 3.

For f = f^{(1)}, we know that (10) holds with X = I and c_2 ≥ c_f = 1. Further, we observe that (9) is true for X = I and c_1 = c_f = 1. Therefore, the system is globally mean square asymptotically stable according to Theorem 3.2 and the concept of monotonicity Gramians with c_1 = c_2 = 1 is well-defined by Proposition 4.2. We can even guarantee the existence of a one-sided Lipschitz Gramian by Proposition 4.15, since the one-sided Lipschitz condition (27) holds with c_f = 1 using Example 4.10. The choice of f = f^{(2)} also yields a mean square asymptotically stable system, since (9) particularly holds for X = I if c_1 = c_f = (a − 1)²/4 = 0.2025 is used and since we know, by Example 2.1, that (10) is true setting X = I and c_2 ≥ c_f. Therefore, monotonicity Gramians also exist here for c_1 = c_2 = 0.2025. On the other hand, a one-sided Lipschitz Gramian Q exists with c_1 = c_2 = (a² − a + 1)/3 = 0.303 due to Proposition 4.15 (X = I) exploiting Example 4.11. The same example, however, indicates that P might not be available as a one-sided Lipschitz Gramian.
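To illustrate how a system of the form (1) arises from this setup, the following is a sketch of the standard central-difference discretization of the diffusion part on (0, L) with Dirichlet boundary controls u_1(t), u_2(t). The scaling of A and B follows the usual finite-difference convention and is an assumption here; the reaction term f and the noise matrices N_i from (45) are omitted.

```python
import numpy as np

def discretize_heat(n, L):
    """Central-difference Laplacian on (0, L) with Dirichlet boundary inputs.

    Returns the drift matrix A (n x n) and input matrix B (n x 2), where the
    two inputs are the left and right boundary values. A sketch only, not the
    paper's exact coefficients.
    """
    h = L / (n + 1)
    A = (np.diag(-2.0 * np.ones(n))
         + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) / h**2
    B = np.zeros((n, 2))
    B[0, 0] = 1.0 / h**2    # left boundary value enters the first equation
    B[-1, 1] = 1.0 / h**2   # right boundary value enters the last equation
    return A, B
```

With n = 100 and the grid ζ_j = jh as above, the state component x_j(t) then plays the role of the approximation of v_t(ζ_j).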
The goal of this section is to construct average monotonicity Gramians P and Q according to Definition 4.5 for a large set of controls U. In detail, we choose the monotonicity/one-sided Lipschitz constants to define c_1 = c_2 = 1 for f = f^{(1)} and we set c_1 = c_2 = 0.303 for f = f^{(2)}, which is a number dominating the monotonicity constant 0.2025. Consequently, Theorems 4.7 and 6.1 hold with c = 0. We choose Q to be the solution to the equality in (12) and P the candidate with minimal trace satisfying (11); we refer to Section 4.3 for the particular computation strategy. We observe that these P and Q do not satisfy (13) for all x ∈ R^n but for the essential ones. In fact, we run experiments for a large variety of controls involving increasing, decreasing and (highly) oscillating u as well as a combination of all of them. In all cases, conditions (16) and (17) were fulfilled, indicating that these P and Q are average monotonicity Gramians for a large set of controls U ⊂ L²_T. We present the experiments solely for two representatives ũ, û ∈ U, which are given in (48). These are chosen since they also steer the state x(t) to regions of R^n where the monotonicity conditions in (13) are violated. The constructed monotonicity Gramians have the advantage that the HSVs provide a reliable criterion for the reduction error according to Theorem 4.7; here, we have c = 0. We depict these algebraic values for f = f^{(1)} in Figure 4 and observe a strong decay, telling us that we can expect a low approximation error for small r. The HSVs for f = f^{(2)} behave very similarly and are therefore omitted. As discussed in Remark 6.3, we cannot expect
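The Gramian computation itself (Section 4.3) is not restated here, but its linear baseline is easy to sketch: loosely speaking, without the drift nonlinearity and the noise terms, inequalities (11) and (12) collapse (as equalities) to standard Lyapunov equations. The following fragment shows only this baseline building block, not the minimal-trace construction used above.

```python
import numpy as np
from scipy import linalg

def lyapunov_gramians(A, B, C):
    """Linear-case baseline: solve A P + P A^T + B B^T = 0 and
    A^T Q + Q A + C^T C = 0 for the reachability/observability Gramians.

    The paper's monotonicity/one-sided Lipschitz Gramians add further terms
    to these (in)equalities; this is only the underlying linear computation.
    """
    P = linalg.solve_continuous_lyapunov(A, -B @ B.T)
    Q = linalg.solve_continuous_lyapunov(A.T, -C.T @ C)
    return P, Q
```

For a stable A, both solutions are symmetric positive semidefinite, which is the starting point for the balancing of Section 5.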
the bound in Corollary 6.2 (with c = 0) to hold if average monotonicity Gramians are used. However, we expect the error to not be far from this bound, since the one-sided Lipschitz gaps G^+_{P^{-1}} and G^-_Q in Theorem 6.1 are expected to be small whenever they are positive. We compute the output y_r of the reduced order model (5) introduced in Section 5 for different reduced dimensions r = 3, 6, 10, 20. The relative output error for f = f^{(1)} can be found in Table 1 for the controls ũ and û. We observe a decreasing behaviour for growing r, yielding a very high accuracy for r ≥ 6.

Table 1: Relative error ‖y − y_r‖_{L²_T} / ‖y‖_{L²_T} for f = f^{(1)}.

r  | u = ũ      | u = û
3  | 4.4077e−02 | 3.8041e−02
6  | 4.0903e−03 | 3.7334e−03
10 | 3.1233e−04 | 2.5745e−04
20 | 2.7327e−07 | 3.5013e−07
Table 4: Relative error criterion of Corollary 6.2 with c = 0 and f = f^{(2)}.

r  | u = ũ      | u = û
3  | 1.0494e−01 | 1.6369e−01
6  | 7.2186e−03 | 1.3624e−02
10 | 4.7378e−04 | 7.3326e−04
20 | 1.3493e−07 | 2.1019e−07

Table 2 shows the bound of Corollary 6.2, which generally is no upper bound for the error calculated in Table 1; see the case of r = 20. This is because the one-sided Lipschitz gaps are not always nonpositive. However, 2 Σ_{k=r+1}^n σ_k is close to the actual error. The same observation was made in additional simulations that are not presented here. The intuition for 2 Σ_{k=r+1}^n σ_k being an upper bound for dimensions r = 3, 6, 10 but not for r = 20 might be the low order of a positive one-sided Lipschitz gap: for that reason, it becomes visible only when 2 Σ_{k=r+1}^n σ_k is very small. We repeat the error calculations for f = f^{(2)} and obtain basically the same results, see Tables 3 and 4. This is due to a similar path behaviour of y for both nonlinearities f^{(1)} and f^{(2)}. Let us finally mention that we conducted the same experiments also when the right Dirichlet boundary condition in (46) is replaced by the Neumann condition ∂/∂ζ v_t(ζ)|_{ζ=L} = u_2(t), leading to a drift of the form −x_n(t) + x_{n−1}(t) in the last line of (45). Here, analogous results are observed using the same kind of parameters.

A Supporting lemmas
This Section contains several useful auxiliary results.
We introduce two classical versions of Gronwall's lemma below.
The corresponding integral version follows next.
where α̇ denotes the derivative of α, which exists Lebesgue almost everywhere.
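The statements themselves are classical; standard formulations (which may differ from the paper's exact normalization) read as follows:

```latex
% Differential version (as in Lemma A.2): if \alpha is absolutely continuous
% with \dot{\alpha}(t) \le c\,\alpha(t) for almost every t \in [0,T], then
\alpha(t) \le \alpha(0)\, e^{c t}, \qquad t \in [0,T].
% Integral version: if \alpha(t) \le \beta(t) + c \int_0^t \alpha(s)\,\mathrm{d}s
% on [0,T] with \beta absolutely continuous, then
\alpha(t) \le \beta(0)\, e^{c t} + \int_0^t \dot{\beta}(s)\, e^{c(t-s)}\,\mathrm{d}s .
```

The second form is obtained from the usual integral Gronwall estimate by integration by parts, which is exactly the manipulation used in the proof below.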
Proof. The first part is a very classical result and is not proved here. Given that α is absolutely continuous, we can apply integration by parts to obtain the second statement.

We apply Lemma A.1 to the uncontrolled process X^{1/2} x(t), exploit inequality (10) and insert (51). We define k_min and k_max to be the smallest and the largest eigenvalue of X, respectively, yielding k_min I ≤ X ≤ k_max I. With the smallest eigenvalue k_Y of Y giving −Y ≤ −k_Y I, we obtain −E[x(t)^T Y x(t)] ≤ −k_Y E[x(t)^T x(t)] ≤ −(k_Y / k_max) E[x(t)^T X x(t)]. Setting β := k_Y / k_max, we hence find d/dt E[x(t)^T X x(t)] ≤ (2(c_2 − c_1) − β) E[x(t)^T X x(t)].
By the differential version of Gronwall's inequality in Lemma A.2, we have E[x(t)^T X x(t)] ≤ e^{(2(c_2 − c_1) − β) t} E[x(0)^T X x(0)], concluding the proof.