Full state approximation by Galerkin projection reduced order models for stochastic and bilinear systems

In this paper, the problem of full state approximation by model reduction is studied for stochastic and bilinear systems. Our proposed approach relies on identifying the dominant subspaces based on the reachability Gramian of a system. Once the desired subspace is computed, the reduced order model is obtained by a Galerkin projection. We prove that, in the stochastic case, this approach either preserves mean square asymptotic stability or leads to reduced models whose minimal realization is mean square asymptotically stable. This stability preservation guarantees the existence of the reduced system's reachability Gramian, which is the basis for the full state error bounds that we derive. These error bounds depend on the neglected eigenvalues of the reachability Gramian and hence show that these eigenvalues are a good indicator for the expected error in the dimension reduction procedure. Subsequently, we establish the stability preservation result and the error bound for a full state approximation to bilinear systems in a similar manner. These latter results are based on a recently proved link between stochastic and bilinear systems. We conclude the paper with numerical experiments using a benchmark problem. We compare this approach with balanced truncation and show that it performs well in reproducing the full state of the system.


Introduction
Galerkin approximation is an important methodology for obtaining surrogate models of high fidelity systems. It relies on the fact that, in many applications, the state of the system is well approximated in a lower-dimensional subspace. In other words, for x(t) ∈ R n with n ≫ 1, there exists a matrix V ∈ R n×r , with r ≪ n, and a reduced state x̂(t) ∈ R r such that x(t) ≈ V x̂(t). The choice of the right basis V plays a crucial role in the approximation quality. Several approaches to construct the dominant subspaces have been proposed for deterministic linear systems, see, e.g., [5]. Among them, a commonly used approach is the proper orthogonal decomposition (POD) [9,17], which identifies dominant subspaces empirically by extracting them from snapshot matrices. These empirical methods have been successfully used in many applications. However, they are input-dependent in the setting of control systems, i.e., the quality of the reduced order model (ROM) depends on the choice of inputs used to generate the snapshots. In the setting of stochastic systems considered in this paper, a POD approach would require the simulation of an enormous number of samples, which is numerically costly in practice. Moreover, a deeper theoretical analysis of such empirical methods is often not feasible.
Given stability in the original model, it is of interest to preserve this property in the reduced system. Galerkin methods have been shown to preserve stability for deterministic linear dissipative systems, see, e.g., [28]. Additionally, the authors in [21] propose a POD scheme combined with linear matrix inequalities to construct ROMs that are locally stable in a nonlinear deterministic setting. In general, Krylov based methods for deterministic linear [14] and bilinear systems [10] do not guarantee stability in the ROM. However, in [22], the authors have proposed equivalent dissipative realizations to arbitrary Galerkin projected linear systems that are stable. To the authors' knowledge, stability preservation using Galerkin projections has not been studied in the literature of stochastic systems.
In this work, we focus on model order reduction of linear stochastic and bilinear systems aiming for a full state approximation. Therefore, we study a Galerkin approach based on the dominant reachability subspaces, which leads to input-independent projections, i.e., the corresponding ROMs are (pathwise) accurate for a large set of inputs. This approach relies on the computation of the reachability Gramian of the underlying dynamical system. These Gramians are encoded by generalized Lyapunov equations and, hence, can be computed in a numerically efficient way in large-scale settings (see, e.g., the review papers [7,29] for low-rank methods). Once the Lyapunov equation is solved, the dominant subspaces are then identified by the span of eigenvectors of the Gramian associated with the large eigenvalues. Hence, the ROM is obtained by projecting the dynamical system onto the identified subspace. We show that this procedure either preserves the underlying stability or at least ensures the existence of the reduced order Gramian, which is vital for the error analysis. As a consequence, the minimal realization of the reduced model is stable. Subsequently, we propose error bounds for the approximation, which show how the reduction error is related to the neglected eigenvalues of the Gramian. It is worth noticing that this approach has already been successfully applied in the literature of deterministic linear time-invariant systems, e.g., in the context of structured systems [30], and port-Hamiltonian systems [20]. However, to the authors' knowledge, the stability analysis and error bounds for stochastic and bilinear systems considered in this paper have not even been established for deterministic linear systems so far.
It is worth mentioning that one way to address the problem of full state approximation is to use balanced truncation, see [18] for deterministic linear systems and [1,2,6,23] for stochastic and bilinear systems. This method is generally suitable when one wants to approximate a quantity of interest y(t) = Cx(t), with C ∈ R p×n and p ≪ n. For this method, one needs to compute the observability Gramian in addition to the reachability Gramian. Subsequently, a ROM is obtained by a Petrov-Galerkin projection based on these two Gramians. One advantage of balanced truncation is that it is, under some mild conditions, stability preserving [3,19] and it guarantees error bounds [6,13,25]. However, whenever p ≈ n, it suffers from the issue that the computation of the observability Gramian is not feasible in practice, since low-rank methods are no longer applicable in this context. This is precisely the scenario if the full state shall be approximated, since C = I in this case. For the deterministic linear case, the authors in [8] propose a scheme enabling the computation of the Petrov-Galerkin projection without the explicit computation of the observability Gramian. The approach is numerically feasible but costly since it relies on a quadrature scheme using the low-rank factors of the reachability Gramian pre-multiplied by shifted systems. Additionally, those results are no longer applicable for stochastic and bilinear systems. The paper is organized as follows. In Section 2, we present the main setup for linear stochastic systems and the concept of mean square asymptotic stability along with some literature results. Then, in Section 3, we describe the proposed Galerkin projection based procedure using the reachability Gramian. Additionally, an interpretation of the dominant subspaces is derived therein. Section 4 is dedicated to showing the properties of the ROM.
First, we prove that this procedure either constructs a reduced model which is mean square asymptotically stable or a ROM which has a realization satisfying the desired stability property. It is worth noticing that modified versions of those results are also valid for the class of deterministic bilinear systems, which we also establish in this paper. In Subsection 4.2, we derive bounds for the approximation error and their relation to the neglected singular values of the reachability Gramian. In Section 5, similar results for the class of bilinear systems, including stability preservation and error bounds, are derived. In Section 6, some numerical experiments are conducted to illustrate the performance of the proposed approach and in order to compare it with balanced truncation.
2 Linear stochastic systems and mean square stability

2.1 Stochastic problem setup
We consider linear stochastic systems of the form

dx(t) = [Ax(t) + Bu(t)] dt + Σ_{i=1}^q N_i x(t) dW_i(t), (1)

where we assume that A, N_i ∈ R n×n and B ∈ R n×m are constant matrices and the initial condition is x(0) = 0. The vectors x and u are called state and control input, respectively.
Moreover, let W = (W_1, . . . , W_q)^⊤ be an R q -valued standard Wiener process; we restrict ourselves to this case for simplicity of notation. The results can be extended to square integrable Lévy processes with mean zero and general covariance matrix (see, e.g., [23]). All stochastic processes appearing in this paper are defined on a filtered probability space (Ω, F, (F_t)_{t≥0}, P). In addition, W is (F_t)_{t≥0}-adapted and its increments W(t + h) − W(t) are independent of F_t for t, h ≥ 0. Throughout this paper, we assume that u is an (F_t)_{t≥0}-adapted control that is square integrable, meaning that E ∫_0^T ‖u(s)‖_2^2 ds < ∞ for all T > 0, where ‖·‖_2 denotes the Euclidean norm. Moreover, ‖·‖_F will denote the Frobenius norm, whereas ‖·‖ represents an arbitrary matrix/vector norm. The aim is to identify a low-dimensional subspace V of R n that approximates the manifold of the state x. Choosing a matrix V ∈ R n×r of orthonormal basis vectors of V, an approximation of the form x(t) ≈ V x̂(t) can be constructed. Inserting this approximation into the original system (1), we enforce a Galerkin condition by multiplying the residual with V^⊤, leading to the ROM

dx̂(t) = [Â x̂(t) + B̂ u(t)] dt + Σ_{i=1}^q N̂_i x̂(t) dW_i(t), x̂(0) = 0, (2)

with Â = V^⊤AV, B̂ = V^⊤B and N̂_i = V^⊤N_iV. Our main goal is to construct the matrix V such that the approximation error between x and V x̂ is small for every input u considered.
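To make the construction concrete, the following minimal Python sketch forms the Galerkin-projected matrices Â = V^⊤AV, B̂ = V^⊤B and N̂_i = V^⊤N_iV for a given orthonormal basis V. The matrices and the choice of V below are illustrative only and are not taken from the paper.

```python
import numpy as np

def galerkin_project(A, B, Ns, V):
    """Galerkin projection of (A, B, N_1, ..., N_q) onto span(V).

    V is assumed to have orthonormal columns, so test and trial spaces
    coincide and the reduced matrices are
    A_hat = V^T A V,  B_hat = V^T B,  N_hat_i = V^T N_i V.
    """
    A_hat = V.T @ A @ V
    B_hat = V.T @ B
    N_hats = [V.T @ N @ V for N in Ns]
    return A_hat, B_hat, N_hats

# Illustrative data: n = 4, m = 1, q = 1, reduced order r = 2.
rng = np.random.default_rng(0)
A = -2.0 * np.eye(4) + 0.1 * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 1))
N1 = 0.1 * rng.standard_normal((4, 4))
V, _ = np.linalg.qr(rng.standard_normal((4, 2)))  # any orthonormal basis

A_hat, B_hat, N_hats = galerkin_project(A, B, [N1], V)
print(A_hat.shape, B_hat.shape, N_hats[0].shape)  # (2, 2) (2, 1) (2, 2)
```

The quality of this ROM depends entirely on the choice of V, which is the subject of Section 3.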

2.2 Mean square asymptotic stability and generalized Lyapunov operators
We introduce the fundamental solution Φ of (1). It is defined as the R n×n -valued solution to

dΦ(t, s) = AΦ(t, s) dt + Σ_{i=1}^q N_i Φ(t, s) dW_i(t), Φ(s, s) = I, (3)

i.e., it is the operator that maps an initial condition x_0 to the solution of the homogeneous state equation (u ≡ 0) with initial time s ≥ 0. We additionally define Φ(t) := Φ(t, 0). Moreover, notice that we have Φ(t, s) = Φ(t)Φ^{-1}(s).
Throughout this paper, we assume that the uncontrolled state equation (1) is mean square asymptotically stable, i.e., E‖Φ(t)‖^2 ≤ c̃ e^{−ct} for some constants c̃, c > 0. With λ(·) denoting the spectrum of a matrix/operator, this is equivalent to

λ(K) ⊂ C_−, (4)

where K := I ⊗ A + A ⊗ I + Σ_{i=1}^q N_i ⊗ N_i, C_− denotes the open left half-plane and · ⊗ · is the Kronecker product of two matrices, see for instance [12,16]. Moreover, the system is called mean square stable if ℜ(µ) ≤ 0 for all µ ∈ λ(K). Notice that λ(K) = λ(L_A + Π_N), where the generalized Lyapunov operator is given by L_A(X) := AX + XA^⊤ and Π_N(X) := Σ_{i=1}^q N_i X N_i^⊤. In Section 3, we will introduce a reduced system which does not necessarily preserve (4) but is always mean square stable, i.e., K can additionally have eigenvalues on the imaginary axis. Therefore, we need the following sufficient condition for mean square stability.
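For small dimensions, condition (4) can be checked directly by assembling the Kronecker matrix K and inspecting its spectrum. The following Python sketch does this; the matrices are illustrative and not from the paper.

```python
import numpy as np

def kronecker_matrix(A, Ns):
    """Assemble K = I (x) A + A (x) I + sum_i N_i (x) N_i."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(I, A) + np.kron(A, I)
    for N in Ns:
        K += np.kron(N, N)
    return K

def is_mean_square_asymptotically_stable(A, Ns):
    """True if all eigenvalues of K have negative real part, i.e. (4) holds."""
    K = kronecker_matrix(A, Ns)
    return bool(np.max(np.linalg.eigvals(K).real) < 0.0)

A = np.array([[-2.0, 0.5], [0.0, -3.0]])
N1 = np.array([[0.1, 0.0], [0.0, 0.2]])
print(is_mean_square_asymptotically_stable(A, [N1]))  # True for this choice
```

Since K has dimension n^2, this direct check is only feasible for small n; for large-scale systems the spectral condition is handled implicitly through Lyapunov equations.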
Lemma 2.1. Given a matrix Y ≥ 0, let us assume that there exists X > 0 such that

AX + XA^⊤ + Σ_{i=1}^q N_i X N_i^⊤ = −Y.

Then, system (1) is mean square stable.
Proof. An algebraic proof can be found in [3, Corollary 3.2]. We refer to [23, Lemma 6.12] for a probabilistic approach.
Let α(L_A + Π_N) := max{ℜ(µ) : µ ∈ λ(L_A + Π_N)} be the spectral abscissa of the operator L_A + Π_N, with ℜ(·) being the real part of a complex number. Since the stability of a stochastic system is related to the eigenvalues of L_A + Π_N, we formulate the following result.
Lemma 2.2. The spectral abscissa α(L_A + Π_N) is an eigenvalue of L_A + Π_N, and there exists an associated eigenvector X ≥ 0, i.e., L_A(X) + Π_N(X) = α(L_A + Π_N) X.
Proof. A proof in a more general framework can be found in [12, Section 3.2]. We also refer to [3, Theorem 3.1] and the references therein.
Finally, spectral properties of the Kronecker matrix involving both the reduced and the original model matrices are required.
Lemma 2.3. Suppose that the full model is mean square asymptotically stable, whereas the reduced system is just mean square stable, i.e., we have λ(K) ⊂ C_− and ℜ(µ) ≤ 0 for all µ ∈ λ(K̂), where K̂ := I ⊗ Â + Â ⊗ I + Σ_{i=1}^q N̂_i ⊗ N̂_i. Then, it holds that

λ(I ⊗ A + Â ⊗ I + Σ_{i=1}^q N̂_i ⊗ N_i) ⊂ C_−.

Proof. A probabilistic version can be found in [23, Lemma 6.12] and an algebraic approach is given in [3, Proposition 3.4].
3 Dominant subspaces and reduced order model

3.1 Dominant subspaces of (1)
We identify the redundant information in the system by using the reachability Gramian

P := E ∫_0^∞ Φ(s)BB^⊤Φ^⊤(s) ds.

Notice that P exists due to the exponential decay of the fundamental solution Φ. Practically, one can compute P by solving a generalized Lyapunov equation. Using Lemma A.1 with s = 0, Â = A, B̂ = B, N̂_i = N_i and t → ∞, we obtain that the reachability Gramian is a solution of the following generalized Lyapunov equation

AP + PA^⊤ + Σ_{i=1}^q N_i P N_i^⊤ + BB^⊤ = 0. (5)

Let x(t, x_0, u), t ≥ 0, denote the solution of (1) with initial value x_0 and control u. Then, for z ∈ R n , we have

E |⟨x(t, 0, u), z⟩_2|^2 ≤ ⟨Pz, z⟩_2 E ∫_0^t ‖u(s)‖_2^2 ds (6)

using the results in [24]. Let (p_k)_{k=1,...,n} be an orthonormal basis of R n consisting of eigenvectors of P. Then, the state variable can be written as x(t, 0, u) = Σ_{k=1}^n ⟨x(t, 0, u), p_k⟩_2 p_k. Setting z = p_k in (6), we obtain

E |⟨x(t, 0, u), p_k⟩_2|^2 ≤ λ_k E ∫_0^t ‖u(s)‖_2^2 ds, (7)

where λ_k is the corresponding eigenvalue. Consequently, we see that the direction p_k is completely irrelevant if λ_k = 0. On the other hand, if λ_k is not zero but small, then a large component in the direction of p_k requires a large amount of energy by (7). Therefore, the eigenspaces of P belonging to the small eigenvalues can also be neglected.
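For small dimensions, the generalized Lyapunov equation AP + PA^⊤ + Σ_i N_iPN_i^⊤ + BB^⊤ = 0 that determines the reachability Gramian can be solved by a simple fixed-point iteration, where each step solves a standard Lyapunov equation by Kronecker vectorization. This is a minimal Python sketch with illustrative matrices, not the low-rank solvers of [7,29] that one would use in large-scale settings; convergence is assumed via mean square asymptotic stability.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^T + Q = 0 via vec(AXB) = (B^T kron A) vec(X)."""
    n = A.shape[0]
    L = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
    x = np.linalg.solve(L, -Q.flatten(order="F"))
    return x.reshape((n, n), order="F")

def reachability_gramian(A, B, Ns, tol=1e-12, maxit=500):
    """Fixed-point iteration for the generalized Lyapunov equation
    A P + P A^T + sum_i N_i P N_i^T + B B^T = 0."""
    P = np.zeros_like(A)
    for _ in range(maxit):
        rhs = B @ B.T + sum(N @ P @ N.T for N in Ns)
        P_new = lyap(A, rhs)
        if np.linalg.norm(P_new - P) < tol:
            return P_new
        P = P_new
    return P

# Illustrative stable data (not from the paper).
A = np.array([[-2.0, 1.0], [0.0, -3.0]])
B = np.array([[1.0], [1.0]])
N1 = np.array([[0.2, 0.0], [0.1, 0.1]])
P = reachability_gramian(A, B, [N1])
residual = A @ P + P @ A.T + N1 @ P @ N1.T + B @ B.T
print(np.linalg.norm(residual))  # near machine precision
```

The returned P is symmetric positive semidefinite, matching its integral representation.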
A ROM can now be obtained by removing the unimportant subspaces from (1). This is done by first diagonalizing P . If P is diagonal, we have that p k is the kth unit vector and the diagonal entries of P indicate the relevance of the respective unit vector. A reduced system can then be easily derived by truncating the components of x associated to the small/zero entries λ k of a Gramian of the form P = diag(λ 1 , . . . , λ n ).

3.2 Reduced order model by Galerkin projection
We introduce the eigenvalue decomposition of the reachability Gramian as

P = S^⊤ΛS,

where S^{−1} = S^⊤ and Λ = [Λ_1 0; 0 Λ_2] = diag(λ_1, . . . , λ_n) is the matrix of eigenvalues of P. For simplicity, let us assume that the spectrum of P is ordered, i.e., λ_1 ≥ . . . ≥ λ_n ≥ 0, so that Λ_2 contains the small eigenvalues. Let us perform a state space transformation using the matrix S. The transformed state variable then is x_b = Sx. Plugging this into (1), we find

dx_b(t) = [A_b x_b(t) + B_b u(t)] dt + Σ_{i=1}^q N_{i,b} x_b(t) dW_i(t), (8)

where the balanced matrices are given by

A_b = SAS^⊤ = [A_{11} A_{12}; A_{21} A_{22}], B_b = SB = [B_1; B_2], N_{i,b} = SN_iS^⊤ = [N_{i,11} N_{i,12}; N_{i,21} N_{i,22}], (9)

with left upper blocks of dimension r × r (respectively B_1 ∈ R r×m ). We refer to (9) as the balanced realization of the linear stochastic system. The fundamental solution of the balanced realization is Φ_b = SΦS^⊤, which can be seen by multiplying (3) with S from the left and with S^⊤ from the right. Therefore, the reachability Gramian of (8) is

P_b = SPS^⊤ = Λ. (10)

We partition x_b = [x_1; x_2], where x_1 and x_2 are associated to Λ_1 and Λ_2, respectively. Now, exploiting the insights of Section 3.1, x_2 barely contributes to the system dynamics. We obtain the reduced system by truncating the equation related to x_2 in (8) and setting the remaining x_2 components equal to zero. This yields a reduced system (2) with matrices

Â = A_{11} = V^⊤AV, B̂ = B_1 = V^⊤B, N̂_i = N_{i,11} = V^⊤N_iV, (11)

where V consists of the first r columns of S^⊤. In large-scale settings, the reachability Gramian can be computed using low-rank methods (see [7,29]), i.e., we find a matrix Z_P ∈ R n×l , with l ≪ n, such that P ≈ Z_P Z_P^⊤. Consequently, in this setup, the Galerkin projection can be identified using the singular value decomposition of Z_P.
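In code, extracting the dominant subspace from a given Gramian reduces to a symmetric eigendecomposition. The sketch below uses a synthetic Gramian with a prescribed decaying spectrum (not from the paper) and verifies that the trace of the truncation residue equals the sum of the neglected eigenvalues.

```python
import numpy as np

def dominant_projection(P, r):
    """Return V, the eigenvectors of the Gramian P belonging to the r
    largest eigenvalues, together with all eigenvalues in decreasing order."""
    lam, S = np.linalg.eigh(P)           # eigh returns ascending order
    idx = np.argsort(lam)[::-1]          # reorder to decreasing
    lam, S = lam[idx], S[:, idx]
    return S[:, :r], lam

# Synthetic Gramian with rapidly decaying spectrum (illustrative only).
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
lam_true = np.array([5.0, 1.0, 0.1, 1e-3, 1e-5, 1e-7])
P = Q @ np.diag(lam_true) @ Q.T

r = 3
V, lam = dominant_projection(P, r)

# The trace of the truncation residue equals the sum of the neglected
# eigenvalues, since V has orthonormal columns.
err = np.abs(np.trace(P - V @ np.diag(lam[:r]) @ V.T))
print(err, lam[r:].sum())  # the two numbers agree
```

The same V would then be used to form the reduced matrices of (11); with a low-rank factor Z_P, the eigendecomposition is replaced by an SVD of Z_P.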

4 Properties of the reduced system

4.1 Mean square stability and reduced order Gramian
In this section, we study stability preservation and the existence of the Gramian for the reduced system in (11). The next result guarantees mean square stability.
Proposition 4.1. Suppose that Λ_1 = diag(λ_1, . . . , λ_r) > 0. Then, the reduced order system (2) with Â = A_{11}, B̂ = B_1 and N̂_i = N_{i,11}, introduced in (9), is mean square stable, i.e., ℜ(µ) ≤ 0 for all µ ∈ λ(I ⊗ A_{11} + A_{11} ⊗ I + Σ_{i=1}^q N_{i,11} ⊗ N_{i,11}).
Proof. According to (10), the balanced reachability Gramian is the diagonal matrix Λ of eigenvalues of P. Using the partition of the balanced matrices in (9), we therefore have

A_b Λ + Λ A_b^⊤ + Σ_{i=1}^q N_{i,b} Λ N_{i,b}^⊤ + B_b B_b^⊤ = 0.

The left upper block of the above equation is

A_{11}Λ_1 + Λ_1A_{11}^⊤ + Σ_{i=1}^q (N_{i,11}Λ_1N_{i,11}^⊤ + N_{i,12}Λ_2N_{i,12}^⊤) + B_1B_1^⊤ = 0.

Since Λ_1 > 0 by assumption, Lemma 2.1 yields the claim.
Using Proposition 4.1 and Lemma 2.2, the reduced order system is mean square asymptotically stable if and only if 0 ∉ λ(I ⊗ A_{11} + A_{11} ⊗ I + Σ_{i=1}^q N_{i,11} ⊗ N_{i,11}). The following example shows that the zero eigenvalue can indeed occur.
Example 4.2. Consider a system (1) that is already balanced, since its reachability Gramian is diagonal. Moreover, we have rank([B AB]) = 2, which means that the system is reachable (or locally reachable if the N_i were nonzero). According to Section 3.2, the reduced matrices are A_{11} = 0, B_1 = 0 and N_{i,11} = 0. Consequently, the reduced system is not asymptotically stable, and the reachability of the system is also lost since the reduced system is uncontrolled.
Example 4.2 shows a difference to balanced truncation for stochastic systems, where mean square asymptotic stability is preserved under relatively general conditions [3,4]. Mean square asymptotic stability ensures the existence of the reachability Gramian

P̂ := E ∫_0^∞ Φ̂(s)B̂B̂^⊤Φ̂^⊤(s) ds,

where Φ̂ represents the fundamental solution of the reduced system. However, asymptotic stability is only a sufficient condition for P̂ to exist. The next example illustrates such a scenario.
Example 4.3. Clearly, Φ does not decay exponentially to zero, but ΦB does. Therefore, the reachability Gramian exists and is P = [0.25 0.25; 0.25 0.25], whereas the solution of (5) is not unique.
Example 4.3 emphasizes that it is important to distinguish between a Gramian (given by an integral representation) and a solution of a Lyapunov equation. The next theorem proves that, even if the reduced system is not mean square asymptotically stable, the existence of the reduced order reachability Gramian can be guaranteed. This is one of the main results of the paper, which is also vital for later considerations, where we prove an error bound in which P̂ is involved.
Theorem 4.4. Let the reduced system (2) be given by Â = A_{11}, B̂ = B_1 and N̂_i = N_{i,11} defined in (9) and suppose that Λ_1 = diag(λ_1, . . . , λ_r) > 0. Moreover, let Φ̂ denote the fundamental solution to this reduced order system. Then, there are constants c̃, c > 0 such that E‖Φ̂(t)B_1‖_F^2 ≤ c̃ e^{−ct}, so that

P̂ := E ∫_0^∞ Φ̂(s)B_1B_1^⊤Φ̂^⊤(s) ds

exists and satisfies

A_{11}P̂ + P̂A_{11}^⊤ + Σ_{i=1}^q N_{i,11}P̂N_{i,11}^⊤ + B_1B_1^⊤ = 0. (12)

Proof. We set K̂ := I ⊗ A_{11} + A_{11} ⊗ I + Σ_{i=1}^q N_{i,11} ⊗ N_{i,11} and first consider the case that the reduced system is mean square asymptotically stable, i.e., 0 ∉ λ(K̂). Given this condition, the infinite integral P̂ exists. Moreover, using Lemma A.1 with Â = A_{11}, B̂ = B_1, N̂_i = N_{i,11} and exploiting that the left hand side of (49) tends to zero as t → ∞, we see that P̂ solves (12).
Let us now consider the case of 0 ∈ λ(K̂) = λ(K̂^⊤). If B_1 = 0, the result of this theorem is trivially true. Therefore, we additionally assume that B_1 ≠ 0. Then, by Lemma 2.2, there exists V̂ ≥ 0, V̂ ≠ 0, such that

A_{11}^⊤V̂ + V̂A_{11} + Σ_{i=1}^q N_{i,11}^⊤V̂N_{i,11} = 0. (13)

Moreover, according to the proof of Proposition 4.1, we have

A_{11}Λ_1 + Λ_1A_{11}^⊤ + Σ_{i=1}^q N_{i,11}Λ_1N_{i,11}^⊤ + B_1B_1^⊤ ≤ 0. (14)

We observe that multiplying (14) with V̂ and using the properties of the trace together with (13) yields

tr(V̂B_1B_1^⊤) ≤ − tr((A_{11}^⊤V̂ + V̂A_{11} + Σ_{i=1}^q N_{i,11}^⊤V̂N_{i,11})Λ_1) = 0.

This implies that

V̂B_1 = 0. (15)

The case V̂ > 0 is excluded since then it would hold that B_1 = 0. Therefore, we consider the scenario in which V̂ does not have full rank. We then assume that V̂ is an eigenvector with maximal rank, i.e., for any other eigenvector Ṽ ≥ 0 corresponding to the zero eigenvalue, we have rank(Ṽ) ≤ rank(V̂).
Introducing the eigenvalue decomposition

V̂ = [V̂_1 V̂_2] diag(D, 0) [V̂_1 V̂_2]^⊤ with D > 0, (16)

we find a basis of the kernel in the columns of V̂_2, i.e., ker(V̂) = im(V̂_2). Inserting (16) into (15) yields

V̂_1^⊤B_1 = 0. (17)

We use a state space transformation based on Ŝ := [V̂_1 V̂_2]^⊤ involving the following matrices:

Â_{kl} := V̂_k^⊤A_{11}V̂_l, N̂_{i,kl} := V̂_k^⊤N_{i,11}V̂_l, B̂_k := V̂_k^⊤B_1 for k, l ∈ {1, 2}, and ŜΛ_1Ŝ^⊤ =: [P̂_{11} P̂_{12}; P̂_{12}^⊤ P̂_{22}], (18)

where (17) was exploited to see that B̂_1 = 0. We multiply (14) with Ŝ from the left and with Ŝ^⊤ from the right and obtain

(ŜA_{11}Ŝ^⊤)(ŜΛ_1Ŝ^⊤) + (ŜΛ_1Ŝ^⊤)(ŜA_{11}Ŝ^⊤)^⊤ + Σ_{i=1}^q (ŜN_{i,11}Ŝ^⊤)(ŜΛ_1Ŝ^⊤)(ŜN_{i,11}Ŝ^⊤)^⊤ + ŜB_1B_1^⊤Ŝ^⊤ ≤ 0. (19)

Before we evaluate the blocks of (19), we show that

Â_{12} = 0 and N̂_{i,12} = 0. (20)

To do so, we show that the kernel of V̂ is invariant under multiplication with A_{11} and N_{i,11}. Let z ∈ ker(V̂). Then, multiplying (13) with z^⊤ from the left and z from the right, we obtain Σ_{i=1}^q (N_{i,11}z)^⊤V̂(N_{i,11}z) = 0, implying that V̂N_{i,11}z = 0. Using this fact and multiplying (13) with z from the right provides V̂A_{11}z = 0. Hence, we have A_{11} ker(V̂), N_{i,11} ker(V̂) ⊂ ker(V̂). Since the columns of V̂_2 span ker(V̂) and due to the invariance, there exist suitable matrices Ã_{11} and Ñ_{i,11} such that A_{11}V̂_2 = V̂_2Ã_{11} and N_{i,11}V̂_2 = V̂_2Ñ_{i,11}. Exploiting that V̂_1^⊤V̂_2 = 0 gives us (20). Moreover, we see that Â_{22} = Ã_{11} and N̂_{i,22} = Ñ_{i,11}. Evaluating the right upper and the right lower block of (19), inserting the former into the latter, using (20), and adding and subtracting a suitable positive semidefinite term, we arrive at

Â_{22}P̊ + P̊Â_{22}^⊤ + Σ_{i=1}^q N̂_{i,22}P̊N̂_{i,22}^⊤ + B̂_2B̂_2^⊤ ≤ 0, where P̊ := P̂_{22} − P̂_{12}^⊤P̂_{11}^{-1}P̂_{12} > 0. (25)

We write Φ̂(t)B_1 = Ŝ^⊤(ŜΦ̂(t)Ŝ^⊤)ŜB_1 (26) and find the associated equation for ŜΦ̂Ŝ^⊤ by multiplying the one for Φ̂ with Ŝ from the left and Ŝ^⊤ from the right, resulting in (27). Evaluating the right upper block and subsequently the right lower block of (27), we see that Φ̂_{12} = 0 and Φ̂_{22} = Φ̂_2, the fundamental solution associated to (Â_{22}, N̂_{i,22}). Therefore, (26) becomes

Φ̂(t)B_1 = V̂_2Φ̂_2(t)B̂_2. (28)

In addition, we obtain a rank relation (29) for the reachability matrix of the triple (Â_{22}, B̂_2, N̂_{i,22}), where r_2 denotes the number of rows/columns of Â_{22}. If there is no zero eigenvalue of the Kronecker matrix associated to (Â_{22}, N̂_{i,22}), then Φ̂_2 decays exponentially and the claim of this theorem follows by (28).
If the projected system still has a zero eigenvalue, then by Lemma 2.2, there is V̂_{22} ≥ 0, V̂_{22} ≠ 0, such that

Â_{22}^⊤V̂_{22} + V̂_{22}Â_{22} + Σ_{i=1}^q N̂_{i,22}^⊤V̂_{22}N̂_{i,22} = 0.

Now, one can further project down the reduced system with matrices (Â_{22}, B̂_2, N̂_{i,22}) by the same type of state space transformation as in (18), based on the factor of the eigenvalue decomposition of V̂_{22} instead of Ŝ and based on (25) instead of (14). Notice that V̂_{22} cannot have full rank, since otherwise we would have B̂_2 = V̂_2^⊤B_1 = 0 which, together with (17), implies B_1 = 0, contradicting B_1 ≠ 0. One proceeds with this procedure until a mean square asymptotically stable subsystem is achieved. Such a subsystem exists since, if one reaches a system of dimension r_0, then it holds that r_2 = r_0 in (29). This local reachability condition combined with (25) is equivalent to mean square asymptotic stability, see [12, Theorem 3.6.1]. Since (28) is then also obtained with the mean square asymptotically stable subsystem, the result follows, which completes the proof.
Remark 1. The only structure of the reduced system that was used in the proof of Theorem 4.4 is the existence of an inequality of the form (14). Therefore, this theorem can be extended to any reduced system for which there exists a matrix X̂ > 0 such that

ÂX̂ + X̂Â^⊤ + Σ_{i=1}^q N̂_iX̂N̂_i^⊤ + B̂B̂^⊤ ≤ 0.

The following implication of Theorem 4.4 shows that the mean square asymptotic stability of the ROM (11) is preserved in an extended sense.
Corollary 4.5. Suppose that the reduced system (2) with matrices Â = A_{11}, B̂ = B_1 ≠ 0 and N̂_i = N_{i,11} defined in (9), and associated to the fundamental solution Φ̂, is not mean square asymptotically stable. If we further have that Λ_1 > 0, then there exists a matrix V_0 with orthonormal columns such that the system with matrices V_0^⊤A_{11}V_0, V_0^⊤B_1 and V_0^⊤N_{i,11}V_0 is associated to a mean square asymptotically stable fundamental solution Φ̂_0. Moreover, it holds that

Φ̂(t)B_1 = V_0Φ̂_0(t)V_0^⊤B_1.

Proof. As in (26), we can write Φ̂(t)B_1 = Ŝ^⊤(ŜΦ̂(t)Ŝ^⊤)ŜB_1 with the orthogonal matrix Ŝ^⊤ = [V̂_1 V̂_2]. Following the steps of the proof of Theorem 4.4, we see that Φ̂(t)B_1 = V̂_2Φ̂_2(t)B̂_2, where B̂_2 = V̂_2^⊤B_1 and where Φ̂_2 is the fundamental solution of the system with matrices V̂_2^⊤A_{11}V̂_2 and V̂_2^⊤N_{i,11}V̂_2. If Φ̂_2 is mean square asymptotically stable, we can choose V_0 = V̂_2. Else, by the proof of Theorem 4.4, the projection procedure can be repeated until a mean square asymptotically stable subsystem is achieved. In this case, V_0 is the product of matrices of the form V̂_2.
Corollary 4.5 shows that the obtained ROM always has a mean square asymptotically stable realization. In other words, the procedure described in Section 3 produces a ROM that is either mean square asymptotically stable or that can be further reduced to a mean square asymptotically stable system without an additional approximation error, given that x_0 = 0.
The following corollary will be useful for interpreting the error bounds for the approximation error in Section 4.2.
Corollary 4.6. Suppose that Λ_1 > 0 and let P̂ be the reachability Gramian of the reduced system from Theorem 4.4. Then, it holds that tr(P̂) ≤ tr(Λ_1).
Proof. As in the proof of Theorem 4.4, three cases need to be considered. Let us first assume that the reduced system is mean square asymptotically stable, i.e., 0 ∉ λ(K̂). Subtracting (12) from (14), we see that Λ_1 − P̂ satisfies

A_{11}(Λ_1 − P̂) + (Λ_1 − P̂)A_{11}^⊤ + Σ_{i=1}^q N_{i,11}(Λ_1 − P̂)N_{i,11}^⊤ = −R_2 (30)

for some matrix R_2 ≥ 0. Now, equation (30) is uniquely solvable. According to Section 3.1, this solution is represented by E ∫_0^∞ Φ̂(s)R_2Φ̂^⊤(s) ds ≥ 0. Therefore, we have that Λ_1 ≥ P̂, implying the claim of this corollary. Now, let us study the case of 0 ∈ λ(K̂). If B_1 = 0, then P̂ = 0, leading to Λ_1 ≥ P̂. It remains to consider the case of an unstable reduced system with B_1 ≠ 0. We use the arguments of the proof of Theorem 4.4 and assume w.l.o.g. that the projected reduced system with matrices (Â_{22}, B̂_2, N̂_{i,22}) and fundamental solution Φ̂_2 is already mean square asymptotically stable. Else, we could project down the reduced system further and the same arguments as below would apply. Integrating both sides of (28) over [0, ∞) and using the definition of the Frobenius norm, we obtain tr(P̂) = tr(P̂_2) with P̂_2 := E ∫_0^∞ Φ̂_2(s)B̂_2B̂_2^⊤Φ̂_2^⊤(s) ds. Due to the mean square asymptotic stability, we know that P̂_2 is the unique solution to

Â_{22}P̂_2 + P̂_2Â_{22}^⊤ + Σ_{i=1}^q N̂_{i,22}P̂_2N̂_{i,22}^⊤ + B̂_2B̂_2^⊤ = 0.

Comparing this equation with (25), we find that P̂_2 ≤ P̂_{22} − P̂_{12}^⊤P̂_{11}^{-1}P̂_{12} ≤ P̂_{22}. We exploit (18), leading to tr(P̂) ≤ tr(P̂_{22}) ≤ tr(P̂_{22}) + tr(P̂_{11}) = tr(ŜΛ_1Ŝ^⊤) = tr(Λ_1).

4.2 Error bounds
In this subsection, we derive error bounds for the model reduction procedure proposed in Section 3. We begin with an error bound that is general in the sense that it only requires the existence of the Gramians P and P̂ and does not exploit any further structure of the reduced system. Once this general bound is established, an error estimate for the choice in (11) is given, allowing us to identify the scenarios in which this ROM leads to a good approximation. The next result characterizes the error in a full state approximation. Notice that we use similar techniques as in [6,27], where output errors were considered. However, we state the following proposition under milder assumptions.
Proposition 4.7. Let x be the solution to (1) and x̂ the solution to (2), both with zero initial states. Suppose that the Gramians P := E ∫_0^∞ Φ(s)BB^⊤Φ^⊤(s) ds, P̂ := E ∫_0^∞ Φ̂(s)B̂B̂^⊤Φ̂^⊤(s) ds and P_2 := E ∫_0^∞ Φ(s)BB̂^⊤Φ̂^⊤(s) ds exist. Then, for every T > 0,

sup_{t∈[0,T]} E‖x(t) − V x̂(t)‖_2 ≤ ‖u‖_{L^2_T} ( tr(P) + tr(P̂) − 2 tr(V^⊤P_2) )^{1/2},

where ‖u‖_{L^2_T}^2 := E ∫_0^T ‖u(s)‖_2^2 ds. Moreover, P, P̂ and P_2 satisfy

AP + PA^⊤ + Σ_{i=1}^q N_iPN_i^⊤ + BB^⊤ = 0, (31a)
ÂP̂ + P̂Â^⊤ + Σ_{i=1}^q N̂_iP̂N̂_i^⊤ + B̂B̂^⊤ = 0, (31b)
AP_2 + P_2Â^⊤ + Σ_{i=1}^q N_iP_2N̂_i^⊤ + BB̂^⊤ = 0. (31c)
Proof. It can be shown that the solution of (1) is given by

x(t) = ∫_0^t Φ(t, s)Bu(s) ds,

see, e.g., [12]. Setting the initial states in (1) and (2) equal to zero and using the solution representations for both systems, we obtain a first estimate by the triangle inequality. We then apply the inequality of Cauchy-Schwarz. The definition of the Frobenius norm and properties of the trace operator allow us to express the resulting terms through traces. Using Corollary A.2, Φ(t, s) and Φ̂(t, s) can be replaced by Φ(t − s) and Φ̂(t − s) above.
Writing the resulting trace expressions via the Frobenius norm again, the infinite integrals exist due to the exponential decay of ΦB and Φ̂B̂. Taking the supremum over [0, T], inserting the definition of the Frobenius norm and exploiting that V^⊤V = I, we obtain the claimed bound. The infinite integrals P, P̂ and P_2 satisfy (31) due to Lemma A.1, using the exponential decay of ΦB and Φ̂B̂. Based on the result in Proposition 4.7, we now find an error bound for the reduced system introduced in Section 3.2. Output error bounds for balanced truncation in the same norm, based on different choices of Gramians, are proved in [6,26]. The error analysis for the scheme in Section 3.2 is more challenging, since less structure than in the case of balanced truncation can be exploited, the latter being a method where the reachability and observability Gramians are both diagonal and equal (after a balancing transformation). Moreover, in contrast to balanced truncation, we need to discuss the case in which mean square asymptotic stability is not preserved.
Theorem 4.8. Let x be the solution to the mean square asymptotically stable system (1) and x̂ the solution to (2) with zero initial states and with Â = A_{11}, B̂ = B_1, N̂_i = N_{i,11} being submatrices of the balanced partition in (9). Let Λ = diag(Λ_1, Λ_2) be the matrix of ordered eigenvalues of the reachability Gramian P with Λ_1 = diag(λ_1, . . . , λ_r) > 0, and let S^⊤ denote the factor of the associated eigenvalue decomposition P = S^⊤ΛS. Then, it holds that

sup_{t∈[0,T]} E‖x(t) − V x̂(t)‖_2 ≤ ‖u‖_{L^2_T} ( tr(Λ_2) + tr(P̂ − Λ_1) + 2 tr(Λ_1 − V^⊤P_2) )^{1/2}, (32)

where P̂ is the reduced reachability Gramian of Theorem 4.4 and P_2 is defined as the unique solution to (31c). If it moreover holds that 0 ∉ λ(I ⊗ A_{11} + A_{11} ⊗ I + Σ_{i=1}^q N_{i,11} ⊗ N_{i,11}), then Q̂ can be introduced as the positive semidefinite solution to

A_{11}^⊤Q̂ + Q̂A_{11} + Σ_{i=1}^q N_{i,11}^⊤Q̂N_{i,11} + I = 0. (34)

Hence, the term tr(P̂ − Λ_1) in the error bound can be expressed through the neglected eigenvalues, since tr(Λ_1 − P̂) = tr(Λ_2 Σ_{i=1}^q N_{i,12}^⊤Q̂N_{i,12}), i.e., Λ_2 enters this term with the weight Σ_{i=1}^q N_{i,12}^⊤Q̂N_{i,12}.
Proof. Since the original model is mean square asymptotically stable and due to Theorem 4.4, the assumptions of Proposition 4.7 are met, so that the bound of Proposition 4.7 holds. Notice that P uniquely solves (31a). Since the ROM is mean square stable by Proposition 4.1 and due to Lemma 2.3, P_2 is also the unique solution to (31c). However, there can still be infinitely many other solutions to (31b) besides P̂. Using the balanced realization in (9) and tr(P) = tr(Λ_1) + tr(Λ_2), the bound of Proposition 4.7 can be rewritten via

tr(P) + tr(P̂) − 2 tr(V^⊤P_2) = tr(Λ_2) + tr(P̂ − Λ_1) + 2 tr(Λ_1 − V^⊤P_2),

which proves (32). Now, let us consider the case where 0 ∉ λ(I ⊗ A_{11} + A_{11} ⊗ I + Σ_{i=1}^q N_{i,11} ⊗ N_{i,11}), i.e., the reduced system is mean square asymptotically stable by Proposition 4.1 and Lemma 2.2. Therefore, (34) has a unique positive semidefinite solution Q̂. Subtracting (12) from the left upper r × r block of the Lyapunov equation for Λ, we find

A_{11}(Λ_1 − P̂) + (Λ_1 − P̂)A_{11}^⊤ + Σ_{i=1}^q N_{i,11}(Λ_1 − P̂)N_{i,11}^⊤ + Σ_{i=1}^q N_{i,12}Λ_2N_{i,12}^⊤ = 0.

Multiplying this identity with Q̂, taking the trace and inserting (34) yields tr(Λ_1 − P̂) = tr(Λ_2 Σ_{i=1}^q N_{i,12}^⊤Q̂N_{i,12}), which concludes the proof of this theorem.
Theorem 4.8 is vital since it shows the relation between the truncated eigenvalues contained in Λ_2 and the error of the model reduction procedure. By Corollary 4.6, we know that tr(P̂ − Λ_1) ≤ 0 and therefore (32) shows that the error between x and V x̂ is small if Λ_2 has small diagonal entries. Consequently, the reduced system is accurate if only the small eigenvalues of P are neglected. Moreover, this tells us that the reduced order dimension r can be chosen based on the eigenvalues of P, since their magnitude is a good indicator for the error. Certainly, the error bound representation in Proposition 4.7 is more suitable for practical computations than the one in Theorem 4.8. This is because one only needs to solve for P̂ and P_2 satisfying (31b) and (31c) in addition to the Gramian P, which is already computed within the model reduction procedure.
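Since the neglected eigenvalues drive the error bound, a practical heuristic for choosing r (our illustration, not a rule prescribed by the paper) is to pick the smallest order whose tail sum of eigenvalues falls below a tolerance:

```python
import numpy as np

def choose_reduced_order(eigvals, tol):
    """Smallest r such that the sum of the neglected eigenvalues
    lambda_{r+1} + ... + lambda_n is at most tol.
    eigvals must be sorted in decreasing order."""
    tail = np.cumsum(eigvals[::-1])[::-1]  # tail[k] = sum of eigvals[k:]
    for r in range(len(eigvals)):
        if tail[r] <= tol:
            return r
    return len(eigvals)

# Illustrative spectrum with a clear gap after the second eigenvalue.
lams = np.array([10.0, 1.0, 1e-2, 1e-3, 1e-5])
print(choose_reduced_order(lams, tol=0.1))  # 2: only small eigenvalues are dropped
```

Other selection rules (e.g., relative tail sums) fit the same template; what matters by Theorem 4.8 is that the truncated eigenvalues in Λ_2 remain small.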

5 Full state approximation for bilinear systems
In this section, we discuss the extension of the proposed results to the class of bilinear systems. We consider the Galerkin projection based model reduction scheme that was studied in Section 3.2 for deterministic bilinear dynamical systems governed by

ż(t) = Az(t) + Σ_{i=1}^m N_i z(t)u_i(t) + Bu(t), z(0) = z_0. (38)

Roughly speaking, (38) is obtained by replacing the white noise processes dW_i/dt in (1) (q = m) by the ith component u_i of the control vector u ∈ L^2_T, which we assume henceforth to be deterministic. Transferring the results from the linear stochastic to the deterministic bilinear case is not trivial, since from the theoretical point of view (38) and (1) are very different, white noise not being a function. However, due to the recently shown relation between stochastic and bilinear systems in [25], we are able to establish the results of the previous sections for (38) in a similar manner. Let us assume that the matrix A is Hurwitz, i.e., λ(A) ⊂ C_−. Writing the solution z = z(·, z_0, B) to (38) dependent on the initial state z_0 and the input matrix B, λ(A) ⊂ C_− implies ∫_0^∞ ‖z(s, z_0, 0)‖_2^2 ds < ∞, i.e., the homogeneous equation is asymptotically stable with exponential decay, see [25]. If all N_i, i = 1, . . . , m, are sufficiently small, A being Hurwitz implies mean square asymptotic stability in the sense of (4). This can, e.g., be seen by the sufficient condition for (4) in [12, Corollary 3.6.3], see also [31]. We can now control the influence of the matrices N_i by rescaling (38) with γ > 0, resulting in

ż(t) = Az(t) + Σ_{i=1}^m (1/γ)N_i z(t) γu_i(t) + (1/γ)B γu(t), (39)

compare also with [2,11], where this technique has also been used. If γ is sufficiently large, (4) can be guaranteed for the pair (A, (1/γ)N_i), which provides the existence of a unique solution P_γ to

AP_γ + P_γA^⊤ + (1/γ^2) Σ_{i=1}^m N_iP_γN_i^⊤ + (1/γ^2) BB^⊤ = 0. (40)

According to Section 3.1, P_γ is the reachability Gramian of the stochastic system (1) with coefficients (A, (1/γ)B, (1/γ)N_i). Choosing P_γ for γ = 1 as a reachability Gramian in the context of model reduction for bilinear systems was first proposed in [1] and, e.g., further investigated in [2].
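A sketch of computing P_γ numerically, under the assumption that γ is large enough for the fixed-point iteration below to converge; the matrices are illustrative and not from the paper.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^T + Q = 0 via vec(AXB) = (B^T kron A) vec(X)."""
    n = A.shape[0]
    L = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
    x = np.linalg.solve(L, -Q.flatten(order="F"))
    return x.reshape((n, n), order="F")

def bilinear_gramian(A, B, Ns, gamma, tol=1e-12, maxit=500):
    """Fixed-point iteration for
    A P + P A^T + (1/gamma^2) sum_i N_i P N_i^T + (1/gamma^2) B B^T = 0,
    i.e. the reachability Gramian of the stochastic system with
    coefficients (A, B/gamma, N_i/gamma)."""
    Bg = B / gamma
    Ng = [N / gamma for N in Ns]
    P = np.zeros_like(A)
    for _ in range(maxit):
        rhs = Bg @ Bg.T + sum(N @ P @ N.T for N in Ng)
        P_new = lyap(A, rhs)
        if np.linalg.norm(P_new - P) < tol:
            return P_new
        P = P_new
    return P

# Illustrative data; A is Hurwitz and gamma damps the bilinear terms.
A = np.array([[-2.0, 1.0], [0.0, -3.0]])
B = np.array([[1.0], [1.0]])
N1 = np.array([[0.5, 0.0], [0.2, 0.3]])
gamma = 2.0
Pg = bilinear_gramian(A, B, [N1], gamma)
residual = A @ Pg + Pg @ A.T + (N1 @ Pg @ N1.T) / gamma**2 + (B @ B.T) / gamma**2
print(np.linalg.norm(residual))  # near machine precision
```

Once P_γ is available, the eigendecomposition and projection steps are identical to the stochastic case of Section 3.2.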
By [25], we know that P_γ takes a similar role as in the stochastic case (compare with (7)), i.e., it characterizes redundant information in (38) via an estimate of the form (41) for the components ⟨z(t, 0, B), p_{γ,k}⟩_2, where (p_{γ,k}) is an orthonormal basis of eigenvectors of P_γ with associated eigenvalues (λ_{γ,k}) and u_0 is the vector of controls entering the bilinear part of the equation. Therefore, the eigenspaces of P_γ corresponding to the zero eigenvalues are irrelevant for the system dynamics. Moreover, assuming that the control energy is sufficiently small, (41) tells us that z(·, 0, B) is small in the direction of p_{γ,k} if λ_{γ,k} is small. Therefore, these eigenspaces can also be seen as less relevant in (38) and can hence be removed, leading to ROMs. A somewhat different way of characterizing unimportant states in a bilinear equation was discussed in [2,15], where local estimates for the reachability energy based on P_γ, γ = 1, have been shown.
Remark 3. So far, we have observed some essential differences between stochastic and bilinear systems. System (38) only requires A to be Hurwitz instead of (4). On the other hand, in the bilinear case we consider a family of Gramians depending on γ rather than a fixed Gramian. Although the characterization of irrelevant states is similar in both cases, the exponential in (41) indicates that we need a certain smallness assumption on u₀ and γ in order to make our arguments valid.
The above considerations motivate conducting the same reduced order modeling procedure as explained in Section 3.2. We introduce the eigenvalue decomposition

P_γ = S_γ^⊤ Λ_γ S_γ,  Λ_γ = diag(Λ_{γ,1}, Λ_{γ,2}),

where Λ_{γ,1} > 0 contains the large and Λ_{γ,2} the small ordered eigenvalues of P_γ. Using the partition S_γ^⊤ = [V_γ  W_γ], the eigenvectors associated with the small eigenvalues of P_γ are then truncated, resulting in the reduced model

(d/dt) ẑ_γ(t) = V_γ^⊤AV_γ ẑ_γ(t) + Σ_{i=1}^m V_γ^⊤N_iV_γ ẑ_γ(t)u_i(t) + V_γ^⊤Bu(t). (44)

The properties of (44) can now be immediately transferred from the considerations in the stochastic case. By Proposition 4.1, we have

λ(V_γ^⊤AV_γ) ⊂ C⁻ ∪ iR. (45)

Example 4.2 shows that eigenvalues on the imaginary axis can occur in (45), but they can be excluded by Lemma 2.
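The truncation step can be sketched in a few lines; this is a minimal illustration with our own variable names, where the reduced order r would be chosen from the eigenvalue decay of P_γ:

```python
import numpy as np

def galerkin_rom(P, A, B, N_list, r):
    # Eigenvalue decomposition of the Gramian; keep the r dominant
    # eigenvectors and project the system matrices (Galerkin step).
    lam, S = np.linalg.eigh(P)      # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1]     # reorder: largest eigenvalue first
    V = S[:, idx[:r]]               # dominant eigenspace basis
    A_r = V.T @ A @ V
    B_r = V.T @ B
    N_r = [V.T @ Ni @ V for Ni in N_list]
    return V, A_r, B_r, N_r
```

Since V has orthonormal columns, V^⊤V = I, and the reduced model is exactly the Galerkin projection of the full dynamics onto the dominant eigenspace of P_γ.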
If a zero eigenvalue occurs in (45), the reduced realization is not minimal, and, as in the stochastic case, one can pass to a minimal realization with coefficients of the form (A₁₁, N_{i,11}). However, even though there is a zero eigenvalue, the existence of the reduced order Gramian can be guaranteed using the arguments of Theorem 4.4.
In order to keep the discussion of the error bound for bilinear systems short, we do not treat the scenario of a zero eigenvalue in (45) in detail; let us therefore exclude this case below. We can now transfer the result of Proposition 4.7 to the bilinear case using the results of [25].
Proposition 5.1. Let z be the solution to (38) with λ(A) ⊂ C⁻ and let ẑ_γ denote the solution to (44). Moreover, let γ > 0 be such that (4) holds for the pair (A, γ⁻¹N_i) and such that the ROM coefficients satisfy the analogous stability condition. Given zero initial states for both equations and V_γ ∈ R^{n×r} being the first r columns of the factor S_γ^⊤ of the eigenvalue decomposition of P_γ (the unique solution to (40)), the full state error bound of Proposition 4.7 carries over, where P_{γ,2} and P̂_γ are the unique solutions to the associated mixed and reduced generalized Lyapunov equations.
Proof. Under the given assumptions, P_γ, P_{γ,2} and P̂_γ exist. The result is then a direct consequence of Corollary 4.3 in [25].
Theorem 5.2. The error bound of Theorem 4.8 carries over to the bilinear setting with an exponential weight (46), where Y_γ = [Y_{γ,1} Y_{γ,2}] and Q̂_γ are defined as the unique solutions to the associated Lyapunov equations.
Proof. The result follows directly from the proof of Theorem 4.8, in which B and N_i need to be replaced by γ⁻¹B and γ⁻¹N_i.
As in the stochastic framework, we conclude that truncating only small eigenvalues of P_γ, i.e., keeping the diagonal entries of Λ_{γ,2} small, leads to a small error in the dimension reduction according to Theorem 5.2, provided that the exponential in (46) is not too dominant. Therefore, the eigenvalues of P_γ can be used as a criterion to determine a suitable reduced order dimension r.
Numerical experiments

As a benchmark, we consider a controlled heat transfer problem, see [2]. In this system, there are two source terms, namely u₁ and u₂, which are applied at the boundaries Γ₁ and Γ₂, respectively. A semi-discretization in space using finite differences with k = 20 grid points per spatial direction results in a control system of dimension n = 400 of the form

ẋ(t) = Ax(t) + Nx(t)u₁(t) + Bu₂(t). (47)

We refer to [2] for more details on the matrices in (47).
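For readers who want to experiment, a semi-discretization of matching dimensions can be set up as follows. Note that the boundary operators N and B below are purely illustrative stand-ins; the actual benchmark matrices from [2] differ:

```python
import numpy as np

k = 20                       # grid points per spatial direction
n = k * k                    # state dimension n = 400
h = 1.0 / (k + 1)            # mesh width

# 1D second-difference matrix (homogeneous Dirichlet) and 2D Laplacian
T = (np.diag(-2.0 * np.ones(k)) + np.diag(np.ones(k - 1), 1)
     + np.diag(np.ones(k - 1), -1)) / h ** 2
A = np.kron(np.eye(k), T) + np.kron(T, np.eye(k))

# Illustrative boundary operators: u1 acts bilinearly on the grid line
# next to Gamma_1, u2 enters linearly at the opposite boundary Gamma_2.
N = np.zeros((n, n))
N[:k, :k] = np.eye(k) / h
B = np.zeros((n, 1))
B[-k:, 0] = 1.0 / h ** 2
```

The resulting A is symmetric negative definite, in particular Hurwitz, as required by the theory above.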

Stochastic example
First, we consider the case where the boundary Γ₁ is perturbed by noise, i.e., u₁ = dW/dt with W being a standard Wiener process. Hence, the resulting stochastic dynamical system is of the form

dx(t) = Ax(t)dt + Nx(t)dW(t) + Bu₂(t)dt.

In order to apply BT, we additionally need to compute the observability Gramian Q, which satisfies the Lyapunov equation

A^⊤Q + QA + N^⊤QN = −I (48)

(with q = 1 and N₁ = N). This method was studied in detail in [6]. However, solving (48) leads to a much higher computational cost, especially due to the full-rank right-hand side, which prevents the use of low-rank solvers. Figure 1 depicts the decay of the eigenvalues/singular values of P as well as the decay of the square roots of the eigenvalues of PQ (Hankel singular values). As shown in Theorem 4.8, the eigenvalues of P play an important role in the error bound for OS and provide an intuition for the expected error. Similarly, the Hankel singular values are associated with the error bound for BT, see [6]. The decay of both curves in Figure 1 indicates that a small reduction error can already be achieved for small r. For this example, we compute reduced systems of order r = 25 for both OS and BT. As a next step, we compare the quality of the reduced-order systems by simulating their responses for the input u₂(t) = u(t) = e^(−t/2) sin(10t). To determine the transient response, we apply a semi-implicit Euler–Maruyama scheme with step size h = 1/256 and simulate the original system and the reduced-order models on the time interval [0, 1]. These simulations are based on 10⁵ samples. The mean errors between the original and the reduced models are depicted in Figure 2, together with the error bounds from Proposition 4.7. Table 1 presents the numerical values of the error bounds and the maximal mean error for both methods. We notice that both reduced models are able to follow the behavior of the original system.
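The semi-implicit Euler–Maruyama scheme mentioned above treats the stiff drift implicitly and the noise and control explicitly. A minimal single-sample sketch (our own implementation, for the single-noise case q = 1) could look as follows:

```python
import numpy as np

def semi_implicit_em(A, N, B, u, x0, T=1.0, h=1.0 / 256, rng=None):
    # One sample path of  dx = Ax dt + Nx dW + B u(t) dt  by a
    # semi-implicit Euler-Maruyama step: the drift Ax is implicit,
    # noise and control are explicit.
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(T / h))
    M = np.linalg.inv(np.eye(len(x0)) - h * A)     # (I - hA)^{-1}
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for j in range(n_steps):
        dW = np.sqrt(h) * rng.standard_normal()    # Brownian increment
        x = M @ (x + N @ x * dW + B.flatten() * (u(j * h) * h))
        path.append(x.copy())
    return np.array(path)
```

Averaging such sample paths over many realizations (10⁵ in the experiment) and doing the same for the reduced model yields the mean errors shown in Figure 2.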
Furthermore, this figure shows that the two methods, BT and OS, provide reduced models of very similar quality in terms of the magnitude of the error, an observation we also made with other test examples. However, we note that BT is a numerically more expensive method, since one additionally needs to solve for Q.

Method | Error bound  | Max mean error
OS     | 5.46 · 10⁻³  | 3.56 · 10⁻⁴
BT     | 5.63 · 10⁻³  | 2.05 · 10⁻⁴

Table 1: Stochastic example: error bounds and the maximal value of the mean error for OS and BT for the simulation presented in Figure 2.

Additionally, for reduced orders varying in the range r = 1, . . . , 25, the input-independent part of the error bound given in Proposition 4.7 is computed in Figure 3, i.e., for each reduced order r we plot the value

E(r) = tr(P) + tr(P̂(r)V(r)^⊤V(r)) − 2 tr(P₂(r)V(r)^⊤),

where V(r) is the reduced basis of order r, and P̂(r), P₂(r) are the solutions of (31b) and (31c). Notice that we added the factor V(r)^⊤V(r) in the second summand of the error bound since V(r)^⊤V(r) ≠ I for BT. As expected, for both OS and BT, the bound decays as the reduced order is increased.
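Given the Gramians, evaluating the input-independent part of the bound is a one-liner. The helper below is our own sketch; P̂(r) and P₂(r) must be obtained beforehand from (31b) and (31c):

```python
import numpy as np

def error_indicator(P, P_hat, P2, V):
    # Input-independent part of the error bound:
    #   E(r) = tr(P) + tr(P_hat V^T V) - 2 tr(P2 V^T),
    # with P the full Gramian, P_hat the reduced Gramian, P2 the mixed
    # Gramian, and V the (possibly non-orthogonal) reduced basis.
    return (np.trace(P) + np.trace(P_hat @ V.T @ V)
            - 2.0 * np.trace(P2 @ V.T))
```

As a sanity check, for the trivial "reduction" r = n with V = I one has P̂ = P₂ = P and the indicator vanishes.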

Bilinear example
As our second numerical example, we consider the heat transfer system in (47) with u₂ = u₁ = u. As a consequence, this leads to a bilinear system having only one input.

Method | Error bound  | L∞ error
OS     | 5.46 · 10⁻³  | 3.56 · 10⁻⁴
BT     | 5.63 · 10⁻³  | 2.05 · 10⁻⁴

Table 2: Bilinear example: error bounds and the L∞ error for OS and BT for the simulation presented in Figure 4.
For this example, we need to solve the Lyapunov equation in (40). To this aim, we set γ = 1 leading to the same reachability and observability Gramians as for the stochastic example. Hence, Figure 1 also gives the decay of singular values for OS and BT. Similarly, Figure 3 shows the decay of the input-independent part of the error bound from Proposition 5.1.
As in the previous example, we obtain reduced systems of order r = 25 using OS and BT and compare their quality by simulating their responses for the input u(t) = e^(−t/2) sin(10t). To determine the transient response, we use the MATLAB® solver ode45 to simulate the original system and the reduced-order models on the time interval [0, 10]. The results are depicted in Figure 4. Table 2 presents the numerical values of the error bounds and the maximal error for both methods. Similar to the stochastic example, we notice that the two methods, BT and OS, provide reduced models of very similar quality in terms of the magnitude of the error. Once again, we note that BT is a computationally more expensive method, since one needs the solution of the additional Lyapunov equation in (48). Finally, Figure 5 shows the simulation of the error for reduced models obtained by OS with different orders. As expected, the error decays as the order is increased.
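For reference, the same transient simulation can be reproduced with SciPy's RK45 integrator, the analogue of the ode45 solver used above. This is a minimal sketch with our own function names:

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate_bilinear(A, N, B, u, z0, T=10.0):
    # Integrate  z' = Az + Nz u(t) + B u(t)  (single input) with an
    # adaptive Runge-Kutta 4(5) scheme, SciPy's analogue of ode45.
    def rhs(t, z):
        ut = u(t)
        return A @ z + (N @ z) * ut + B.flatten() * ut
    sol = solve_ivp(rhs, (0.0, T), z0, method="RK45",
                    rtol=1e-8, atol=1e-10)
    return sol.t, sol.y

# Input signal used in the experiments
u = lambda t: np.exp(-0.5 * t) * np.sin(10.0 * t)
```

Running this for both the full system (47) and a reduced model (44), and taking the maximal difference over the time grid, gives the L∞ errors reported in Table 2.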