A new type of singular perturbation approximation for stochastic bilinear systems

Model order reduction (MOR) techniques are often used to reduce the order of spatially discretized (stochastic) partial differential equations and hence reduce computational complexity. A particular class of MOR techniques is balancing related methods which rely on simultaneously diagonalizing the system Gramians. This has been extensively studied for deterministic linear systems. The balancing procedure has already been extended to bilinear equations, an important subclass of nonlinear systems. The choice of Gramians in Al-Baiyat and Bettayeb (In: Proceedings of the 32nd IEEE conference on decision and control, 1993) is the most frequently used approach. A balancing related MOR scheme for bilinear systems called singular perturbation approximation (SPA) has been described that relies on this choice of Gramians. However, no error bound for this method could be proved. In this paper, we extend SPA to stochastic systems with bilinear drift and linear diffusion term. However, we propose a slightly modified reduced order model in comparison to previous work and choose a different reachability Gramian. Based on this new approach, an L2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L^2$$\end{document}-error bound is proved for SPA which is the main result of this paper. This bound is new even for deterministic bilinear systems.


Introduction
Many phenomena in real life can be described by partial differential equations (PDEs). For an accurate mathematical modeling of these real-world applications, it is often required to take random effects into account. Uncertainties in a PDE model can, for example, be represented by an additional noise term leading to stochastic PDEs (SPDEs) [11,15,28,29].
It is often necessary to numerically approximate time-dependent SPDEs since analytic solutions do not exist in general. Discretizing in space can be considered as a first step. This can, for example, be done by spectral Galerkin [17,19,20] or finite element methods [2,21,22]. This usually leads to large-scale SDEs. Solving such complex SDE systems causes large computational cost. In this context, model order reduction (MOR) is used to save computational time by replacing high-dimensional systems by systems of low order in which the main information of the original system should be captured.

Literature review
Balancing related MOR schemes were developed for deterministic linear systems first. Famous representatives of this class of methods are balanced truncation (BT) [3,26,27] and singular perturbation approximation (SPA) [14,23].
BT was extended in [5,8] and SPA was generalized in [33] to stochastic linear systems. With this first extension, however, no L 2 -error bound can be achieved [6,12]. Therefore, an alternative approach based on a different reachability Gramian was studied for stochastic linear systems leading to an L 2 -error bound for BT [12] and for SPA [32].
BT [1,5] and SPA [18] were also generalized to bilinear systems, which we refer to as the standard approach for these systems. Although bilinear terms are very weak nonlinearities, they can be seen as a bridge between linear and nonlinear systems. This is because many nonlinear systems can be represented by bilinear systems using a socalled Carleman linearization. Applications of these equations can be found in various fields [10,25,34]. A drawback of the standard approach for bilinear systems is that no L 2 -error bound could be shown so far. A first error bound for the standard ansatz was recently proved in [4], where an output error bound in L ∞ was formulated for infinite dimensional bilinear systems. Based on the alternative choice of Gramians in [12], a new type of BT for bilinear systems was considered [31] providing an L 2 -error bound under the assumption of a possibly small bound on the controls.
A more general setting extending both the stochastic linear and the deterministic bilinear case is investigated in [30]. There, BT was studied and an L 2 -error bound was proved overcoming the restriction of bounded controls in [31]. In this paper, we consider SPA for the same setting as in [30] in order to generalize the work in [18]. Moreover, we modify the reduced order model (ROM) in comparison to [18] and show an L 2 -error bound which closes the gap in the theory in this context.
For further extensions of balancing related MOR techniques to other nonlinear systems, we refer to [7,35].

Outline of the paper
This work on SPA for stochastic bilinear systems, see (1), can be interpreted as a generalization of the deterministic bilinear case [18]. This extension builds a bridge between stochastic linear systems and stochastic nonlinear systems such that SPA can possibly be used for many more stochastic equations and applications.
In Sect. 2, the procedure of SPA is described and the ROM is stated. With this, we provide an alternative to [30], where BT was studied for the same kind of systems. We also extend the work of [18] combined with a modification of the ROM and the choice of a new Gramian, compare with (3). Based on this, we obtain an error bound in Sect. 3 that was not available even for the deterministic bilinear case. Its proof requires new techniques that cannot be found in the literature so far and this is the main result of this paper. The efficiency of the error bound is shown in Sect. 5. There, the proposed version of SPA is compared with the one in [18] and with BT [30].

Setting and ROM
Let every stochastic process appearing in this paper be defined on a filtered probability We consider a large-scale stochastic control system with bilinear drift that can be interpreted as a spatially discretized SPDE. We investigate the system We assume that A, N k , H i ∈ R n×n (k ∈ {1, . . . , m} and i ∈ {1, . . . , v}), B ∈ R n×m and C ∈ R p×n . Moreover, we define x(t−) := lim s↑t x(s). The control u = (u 1 , . . . , u m ) T is assumed to be deterministic and square integrable, i.e., In this paper, we study SPA to obtain a ROM. SPA is a balancing related method and relies on defining a reachability Gramian P and an observability Gramian Q. These two matrices are selected such that P characterizes the states that barely contribute to the dynamics in (1a) and Q identifies the less important states in (1b), see [30] for estimates on the reachability and observability energy functionals. The estimates in [30] are global, whereas the standard choice of Gramians leads to results being valid in a small neighborhood of zero only [5,16].
In order to ensure the existence of these Gramians, throughout the paper it is assumed that Here, λ (·) denotes the spectrum of a matrix. The reachability Gramian P and the observability Gramian Q are, according to [30], defined as the solutions to where the existence of a positive definite solution to (3) goes back to [12,32] and is ensured if (2) holds. We approximate the large scale system (1) by a system which has a much smaller state dimension r n. This ROM is supposed be chosen, such that the corresponding output y r is close to the original one, i.e., y r ≈ y in some metric. In order to be able to remove both the unimportant states in (1a) and (1b) simultaneously, the first step of SPA is a state space transformation The ingredients of the balancing transformation are computed by the Cholesky factorizations P = L P L T P , Q = L Q L T Q , and the singular value decomposition X ΣY T = L T Q L P . This transformation does not change the output y of the system, but it guarantees that the new Gramians are diagonal and equal, i.e., S P S T = S −T QS −1 = Σ = diag(σ 1 , . . . , σ n ) with σ 1 ≥ . . . ≥ σ n being the Hankel singular values (HSVs) of the system.
We partition the balanced coefficients of (1) as follows: where A 11 , N k, 11 , H i,11 ∈ R r ×r (k ∈ {1, . . . , m} and i ∈ {1, . . . , v}), B 1 ∈ R r ×m and C 1 ∈ R p×r etc. Furthermore, we partition the state variablex of the balanced system and the diagonal matrix of HSVs where x 1 takes values in R r (x 2 accordingly), Σ 1 is the diagonal matrix of large HSVs and Σ 2 contains the small ones.

Remark 1
The balancing procedure requires the computation of the Gramians from (3) and (4). Practically, one always computes the solution of the equation in (4). The reason why an inequality is considered is that the proof of the error bound in Theorem 3 does not need an equality in (4). However, it is essential that we consider an inequality in (3). In contrast to the equation that may not have a solution, the inequality always has a solution under the given assumptions, but some regularization is needed to enforce uniqueness. In particular, one solves an optimization problem like, e.g., minimize tr(P) subject to (3). The reason why the trace is minimized is that one wants to achieve small HSVs, because this ensures a small error according to Theorem 3. We refer to Sect. 5 for more details on the computation of P.
Based on the balanced full model (1) with matrices as in (5), the ROM is obtained by neglecting the state variables x 2 corresponding to the small HSVs. The ROM using SPA is obtained by setting dx 2 (t) = 0 and furthermore neglecting the diffusion and bilinear term in the equation related to x 2 . Note that the condition dx 2 (t) = 0 is almost surely false. However, we enforce it since it leads to a ROM with remarkable properties. After setting dx 2 (t) = 0 it is no simplification to not take the diffusion into account since it would follow automatically that it is zero within the resulting algebraic constraint due to the consideration in [33,Section 2]. Assuming that the bilinear term is equal to zero in the equation is needed, so that the matrices of the ROM below do not depend on the control u. The dependence on u is something that is not desired.
With this simplification, one can solve for x 2 in the algebraic constraint. This leads to . Inserting this expression into the equation for x 1 and into the output equation, the reduced system is with matrices defined bȳ wherex(0) = 0 and the time dependence in (7a) is omitted to shorten the notation. This straightforward ansatz is based on observations from the deterministic case (N k = H i = 0), where x 2 represents the fast variables, i.e.,ẋ 2 (t) ≈ 0 after a short time, see [23]. This ansatz for stochastic systems might, however, be false, no matter how small the HSVs corresponding to x 2 are. Despite the fact that, for the motivation, a possibly less convincing argument is used, this leads to a viable MOR method for which an error bound can be proved. An averaging principle would be a mathematically well-founded alternative to this naive approach. Averaging principles for stochastic systems have for example been investigated in [36,37]. A further strategy to derive a ROM in this context can be found in [9].
Moreover, notice that system (7) is not a bilinear system anymore due to the quadratic term in the control u. This is an essential difference to the ROM proposed in [18]. One can think about a structure preserving version by setting B 2 = 0 in (7). This would lead to a generalized variant of the ROMs considered in [18,33]. The reason why this simplified method is not studied is, because the error bound in Theorem 3 could not be achieved. We refer to a further discussion below Theorem 3 and to Sect. 5 where (7) is compared numerically with the version obtained by choosing B 2 = 0.
for l = 1, 2 due to considerations in [6]. This implies λ (A ll ) ⊂ C − and hence guarantees the existence of A −1 ll for l = 1, 2.

L 2 -error bound for SPA
The proof of the main result (Theorem 3) is divided into two parts. We first investigate the error that we encounter by removing the smallest HSV from the system in Sect. 3.1.
In this reduction step, the structure from the full model (1) to the ROM (7) changes. Therefore, when removing the other HSVs from the system, another case needs to be studied in Sect. 3.2. There, an error bound between two ROM is achieved which are neighboring, i.e., the larger ROM has exactly one HSV more than the smaller one.
The results of Sects. 3.1 and 3.2 are then combined in Sect. 3.3 in order to prove the general error bound. For simplicity, let us from now on assume that system (1) is already balanced and has a zero initial condition (x 0 = 0). Thus, (3) and (4) become i.e., P = Q = Σ = diag(σ 1 , . . . , σ n ) > 0.

Error bound of removing the smallest HSV
We consider the balanced system (1) with partitions as in (5) and (6). As mentioned before, the reduced state equation is obtained by approximating x 2 throughx 2 := −A −1 22 (A 21x + B 2 u) in the differential equation for x 1 and within the output Eq. (1b). The ROM (7) can hence be rewritten as where we define We want to be able to subtract the ROM (10) from (1). Therefore, we add the following zero line to (10a) x 2 (t) for i = 1, . . . , v. Subtracting (10) from (1) together with (11) yields the following error system where we introduce the variables Moreover, adding (1a) and (10a) combined with (11) leads to Before we prove an error bound based on (12a) and (13), we need to introduce the vector u 0 of control components with a nonzero N k . This is given by The proof of the error bound when only one HSV is removed can be reduced to the task of finding suitable estimates for and subsequently the lemma of Gronwall. Then, a first estimate for (12b) is obtained, i.e., where f depends on the control u, the state x, the compensation terms c 0 , . . . , c v and Σ −1 2 assuming Σ 2 = σ I . Of course, the dependence of the above estimate on the state and the compensation terms is not desired. That is why another inequality is derived by using Ito's and Gronwall's lemma for Combining both inequalities, the result of the next theorem is obtained. A similar idea was also used to determine an error bound for BT [30]. However, the proof for SPA requires different techniques to find the estimates sketched above.
Theorem 1 Let y be the output of the full model (1) with x(0) = 0,ȳ be the output of the ROM (7) withx(0) = 0 and Σ 2 = σ I , σ > 0, in (6). Then, the following holds: Proof In order to improve the readability of this paper, the proof is given in Sect. 4.1 We proceed with the study of an error bound between two ROM that are neighboring.

Error bound for neighboring ROMs
In this section, we investigate the output error between two ROMs, in which the larger ROM has exactly one more HSV than the smaller one. This concept of neighboring ROMs was first introduced in [32] but in the much simpler stochastic linear setting. The reader might wonder why a second case is considered besides the one in Sect. 3.1 since one might just start with a full model that has the same structure as the ROM (7). The reason is that it is not clear how the Gramians need to be chosen for (7). In order to investigate the error between two ROMs by SPA, a finer partition than the one in (5) is required. We partition the matrices of the balanced full system (1) as follows: The partitioned balanced solution to (1a) and the Gramians are then of the form We introduce the ROM of truncating Σ 3 first. According to the procedure described in Sect. 2, the reduced system is obtained by setting dx 3 equal to zero, neglecting the bilinear and the diffusion term in this equation. The solutionx 3 of the resulting algebraic constraint is an approximation for x 3 . One can solve for this approximating variable and obtainsx 3 Inserting this result for x 3 in the equations for x 1 , x 2 and into the output Eq. (1b) leads to We aim to determine the error between this ROM and the reduced system of neglecting Σ 2 and Σ 3 . This is wherex r (0) = 0, and we define In order to find a bound for the error between (17b) and (18b), state variables analogous to x − and x + in Sect. 3.1 are constructed in the following and corresponding equations are derived. For simplicity, we use a similar notation again and definê Now, we find the differential equations forx − andx + . Using (19), we find that Applying the first line of (20), we obtain the following equation We add the zero line (21) to the state Eq. (18a) and subtract the resulting system from (17). Hence, we obtain One can see that the output of (22) is the output error that we aim to analyze. The sum of (17a) and (18a) together with (21) leads to . We now state the output error between the systems (17) and (18) for the case that the ROM are neighboring, i.e., the larger model has exactly one HSV more than the smaller one. Similarly as in Theorem 1, the proof relies on finding suitable estimates for E x T − (t)Σx − (t) and E x T (16).
Theorem 2 Letȳ be the output of the ROM (17),ȳ r be the output of the ROM (18) and (16). Then, it holds that Proof In order to improve the readability of this paper, the proof is presented later, see Sect. 4.2.

Main result
In the following, the main result of this paper is formulated. It is a consequence of Theorems 1 and 2. Proof We apply the results in Theorems 1 and 2. We remove the HSVs step by step and exploit the triangle inequality in order to bound the error between the outputs y andȳ. We introduceȳ as the output of ROM (7) with state space dimension = r , r + 1, . . . , n − 1. Notice thatȳ r coincides withȳ. Moreover, we setȳ n := y. We then have In the reduction step from y toȳ n−1 only the smallest HSV σ n is removed from the system. Hence, by Theorem 1, we have The ROMs of the outputsȳ andȳ −1 are neighboring according to Sect. 3.2, i.e., only the HSV σ is removed in the reduction step. By Theorem 2, we obtain This provides the claimed result.
The result in Theorem 3 tells us that the ROM (7) yields a very good approximation if the truncated HSVs (diagonal entries of Σ 2 ) are small and the vector u 0 of control components with a nonzero N k is not too large. The exponential in the error bound can be an indicator that SPA performs badly if u 0 is very large.
We conclude this section by a discussion on the result in Theorem 3. It is important to notice that the dependence of the error bound on the system matrices and the covariance matrix of M is hidden in the HSVs. Those are given by σ = √ λ (P Q). We can see from (3) and (4) that the Gramians depend on (A, B, C, H i , N k , K ) and hence σ = σ (A, B, C, H i , N k , K ). Consequently, changing the system matrices or the covariance matrix will change the HSVs too. Now, if K is large, the terms related to H i in (3) and (4) become more dominant which results in larger HSVs. According to Theorem 3 a worse approximation can then be expected. We also observe that the exponential term in Theorem 3 is related to the bilinearity in the drift. Setting N k = 0 for all k = 1, . . . , m the exponential becomes a one. This results in the bound that is known from the stochastic linear case [32]. Choosing H i = 0 for all i = 1, . . . , v yields a bound for the deterministic bilinear case. Notice that in this case, the state variables of the full and reduced system are not random anymore such that the expected value is redundant and can hence be omitted. Finally, considering H i = N k = 0 leads to the bound obtained for the deterministic linear case [23].
Since the ROM (7) has a different structure than (1), it is an important question why we do not use a structure preserving generalization of SPA considered in [18,33]. This variant of SPA is obtained by setting B 2 = 0 in (7). Conducting the proof of the error bound for this simplified method, one would obtain an additional term in the error bound that depends on Σ 1 , a possibly large matrix. This indicates that SPA with B 2 = 0 probability performs worse than the version stated in (7). We refer to Sect. 5 where both versions of SPA are compared numerically.

Proofs of Theorems 1 and 2
In this section, we present the pending proofs of Theorems 1 and 2.

Proof of Theorem 1
We derive a suitable upper bound for E[x T − (t)Σ x − (t)] first applying Ito's formula. Hence, Lemma 1 ("Appendix") and Eq. (12a) yield We find an estimate for the terms related to N k , that is where u 0 is defined in (14). Moreover, adding a zero, we rewrite With (25) and (26), (24) becomes Taking the partitions of x − and Σ into account, we see that x T − Σ 0 c 0 = x T 2 Σ 2 c 0 . Furthermore, the partitions of x − and H i yield because (9) and (12b) into inequality (27) and exploit the relations in (28) and (29). Hence, Since Σ is positive definite, we obtain an upper bound for the output error by Defining the term α x + B 2 u)ds and exploiting the assumption that Σ 2 = σ I , leads to The remaining step is to find a bound for the right side of (30) that does not depend on α + anymore. For that reason, a bound for the expression E[x T + (t)Σ −1 x + (t)] is derived next using Ito's lemma again. From (13) and Lemma 1 ('Appendix'), we obtain Analogously to (25), it holds that Additionally, we rearrange the term related to A as follows Moreover, we have We plug in the above results into (31) which gives us From inequality (8) and the Schur complement condition on definiteness, it follows that We multiply (33) with x + 2u T from the left and with x + 2u from the right. Hence, Applying this result to (32) yields We first of all see that x T + Σ −1 0 c 0 = x T 2 Σ −1 2 c 0 using the partitions of x + and Σ. With the partition of H i , we moreover have In addition, it holds that Plugging the above relations into (35) leads to We add 2E t 0 v i, j=1 c T i Σ −1 2 c j k i j ds to the right side of (36) and preserve the inequality since this term is nonnegative due to Lemma 2 ('Appendix'). This results in ds.
Gronwall's inequality in Lemma 3 ("Appendix") yields We find an estimate for the following expression: Combining (37) with (38), we obtain Comparing this result with (30) implies

Proof of Theorem 2
We make use of Eqs. (22a) and (23) in order to prove this bound. We setΣ = Σ 1 Σ 2 as a submatrix of Σ in (16). Lemma 1 ('Appendix') now yields We see that the right side of (41) contains the submatricesÂ,B,Ĥ ,N andΣ. In order to be able to refer to the full matrix inequality (9), we find upper bounds for certain terms in the following involving the full matrices A, B, H , N and Σ. With the same estimate as in (25) and the control vector u 0 defined in (14), we have Adding the term m k=1 Moreover, it holds that x 1 x 2 x 3 = −B 3 u by the definition ofx 3 . Moreover, it can be seen from the second line of (20) that [ A 31 A 32 A 33 ]x − = 0. Hence, It remains to find a suitable upper bound related to the expression depending onĤ i . We first of all see that v i, j=1 Applying (42), (43) and (44) to (41), results in Using It can be seen further that taking the first line of (20) into account. Inserting (46) and (47) into (45) and using the fact that 2x T −Σ 0 (9) and (22b), we obtain Applying Lemma 3 ('Appendix') to this inequality yields Since the above left side of the inequality is positive, we obtain We exploit that Σ 2 = σ I . Hence, we have where we setα In order to find a suitable bound for the right side of (49), Ito's lemma is applied to E[x T + (t)Σ −1x + (t)]. Due to (23) and Lemma 1 ('Appendix'), we obtain Analogously to (42), it holds that Furthermore, we see that x r h 1 h 2 = −B 3 u by the definition ofx 3 and the second line of (20), we obtain [ A 31 A 32 A 33 ]x + = −2B 3 u. Thus, Finally, we see that applying Lemma 2 ('Appendix'). With (51), (52) and (53) inequality (50) becomes Similarly to (34), we obtain This leads to In the following (55) is expressed by terms depending on Σ 2 . We obtainx T +Σ −1 0 2ĉ 0 exploiting the partitions ofx + andΣ. The terms depending onĤ i become 2ĉ j k i j which is positive due to Lemma 2 ('Appendix'). Furthermore, using the first line of (20), it holds that We insert (56) and (57) into (55) and obtain With Lemma 3 ('Appendix'), analogously to (39), we find The relations (49) and (58) yield the claim.

Numerical experiments
We conduct a numerical experiment in order to compare several MOR techniques and to check the performance of the error bound in Theorem 3. We determine three different ROMs. One is by SPA stated in (7). The corresponding output is denoted bȳ y S P A . Moreover, we study a structure preserving version of SPA that is obtained by setting B 2 = 0 in (7), i.e., (B,D,Ē k ,F i ) = (B 1 , 0, 0, 0). This technique is denoted by SPA2 and its output is written asȳ S P A2 . Notice that this method is a generalization of the one in [18]. Finally, we deal with BT [30], another structure preserving scheme. The respective output isȳ BT .
In particular, we apply the different MOR variants to a heat transfer problem that was proposed in [30]. We consider a heat equation on [0, 1] 2 : with Dirichlet and noisy Robin boundary conditions where u ∈ L 2 T is a scalar deterministic input and w denotes a scalar standard Wiener process. We discretize the heat equation with a finite difference scheme on an equidistantñ ×ñ-mesh. This leads to an n =ñ 2 -dimensional stochastic bilinear system where C = 1 n [1 1 . . . 1], i.e., the average temperature is considered. We refer to [5] or [12] for more details on the matrices A, B and N . There, a similar example was investigated for deterministic bilinear systems and linear stochastic systems, respectively.
According to (3) and (4), the associated Gramians are the solutions to A T P −1 + P −1 A + N T P −1 N ≤ −P −1 B B T P −1 , We multiply (60) with P from the left and the right. Applying the Schur complement condition on definiteness, (60) can then be equivalently written as the following linear matrix inequality: see also [12,Remark III.2]. The matrix inequality (61) now is solved using the LMIsolver YALMIP [24] minimizing tr(P). LMI-solver are generally not suitable in a large-scale setting. Therefore, we chooseñ = 10 implying n = 100. As in [30], set T = 2 and choose two different controls u(t) =ũ(t),û(t), whereũ(t) = cos(π t) andû(t) = 1 √ 2 , t ∈ [0, T ]. We derive the ROMs using SPA, a modified structure preserving version SPA2 and BT based on Q and P. We determine the reduced systems for r = 3, 6, 9. We have an error bound for    (7) for different reduced order dimensions r , T = 2 and outputs u =ũ,û . Notice that u 0 ≡ u in this example.
We compute E y −ȳ l 2 L 2 T for l = S P A, BT , S P A2 in Tables 1, 2 and 3. We can see by looking at Tables 1 and 3 that SPA performs clearly better than the structure preserving variant SPA2. This tells us that it is worth to allow a structure change since this can lead to better approximations. We can also see that the error bound for SPA is relatively tight. It is tighter for BT, compare with Table 2. However, this also means that BT performs worse than SPA. Consequently, SPA is the best choice for the example considered here.

Conclusions
In this paper, we investigated a large-scale stochastic bilinear system. In order to reduce the state space dimension, a model order reduction technique called singular perturbation approximation was extended to this setting. This method is based on Gramians proposed in [30] that characterize how much a state contributes to the system dynamics or to the output of a system. This choice of Gramians as well as the structure of the reduced system is different than in [18]. With this modification, we provided a new L 2 -error bound that can be used to point out the cases in which the reduced order model by singular perturbation approximation delivers a good approximation to the original model. This error bound is new even for deterministic bilinear systems. Its quality was tested in a numerical experiment.
is also positive semidefinite.
Proof The proof can be found in [32,Proposition 5.3].