Behavioral Theory for Stochastic Systems? A Data-driven Journey from Willems to Wiener and Back Again

The fundamental lemma by Jan C. Willems and co-workers, which is deeply rooted in behavioral systems theory, has become one of the supporting pillars of the recent progress on data-driven control and system analysis. This tutorial-style paper combines recent insights into stochastic and descriptor-system formulations of the lemma to further extend and broaden the formal basis for behavioral theory of stochastic linear systems. We show that series expansions -- in particular Polynomial Chaos Expansions (PCE) of $L^2$-random variables, which date back to Norbert Wiener's seminal work -- enable equivalent behavioral characterizations of linear stochastic systems. Specifically, we prove that under mild assumptions the behavior of the dynamics of the $L^2$-random variables is equivalent to the behavior of the dynamics of the series expansion coefficients and that it entails the behavior composed of sampled realization trajectories. We also illustrate the shortcomings of the behavior associated with the time evolution of the statistical moments. The paper culminates in the formulation of the stochastic fundamental lemma for linear (descriptor) systems, which in turn enables numerically tractable formulations of data-driven stochastic optimal control combining Hankel matrices of realization data (i.e. of measurements) with PCE concepts.


Introduction: Spring has arrived
In popular culture, "winter is coming" is an established meme. In the context of the past and infamous AI winter (AIw, 2005) and its effect on data-driven methods, the recent rapid growth of interest and the manifold results evidence the opposite. That is, we are currently experiencing an ongoing, extremely vibrant growth period of research activities on data-driven methods for systems and control.
From the control perspective, data-driven methods at large can be understood as an attempt to overcome the traditional two-step procedure of, first, modelling (either first principles combined with parameter estimation, or system identification based on measurement data) and, second, designing a controller for the obtained system/process model, see Figure 1.
Building on the success and impact of subspace identification methods (Verhaegen, 2015; Markovsky et al., 2006), a recent stream of papers -- comprising, i.a., De Persis & Tesi (2019); Favoreel et al. (1999); Fiedler & Lucia (2021); van Waarde et al. (2022) -- has analyzed the so-called direct approach to control design, see Figure 1.

Figure 1: Direct and indirect approaches to data-driven controller design (indirect: the established two-step approach; direct: the data-driven approach).

This approach aims at obtaining a non-parametric system description directly from data, i.e.
without obtaining system matrices, and to design feedback laws based on this description.¹ In turn, subspace identification techniques are closely linked to behavioral concepts in systems and control, which were conceived by Jan C. Willems; see Willems (1986a,b, 1987) for original references, Polderman & Willems (1997); Markovsky et al. (2006) for textbook expositions, and Willems (2007) for an introduction. It stands to reason that behavioral systems theory is indeed foundational for a wide range of recent results on data-driven control (Markovsky, 2021). The pivotal result in this context, which shows for controllable finite-dimensional Linear Time-Invariant (LTI) systems that all input-output-state trajectories of finite length lie in the column space of suitable Hankel matrices constructed directly from data, appeared already in the seminal paper by Willems et al. (2005); hence it is commonly referenced as the fundamental lemma of Willems et al. Given the recent and seemingly exponential growth of publications
• extending/tailoring the fundamental lemma to specific system classes and
• using it for data-driven control,
we postpone a more detailed literature review with respect to the former to Section 2.4. With respect to the latter, we remark that the two most frequent usages of the fundamental lemma for control design are output-feedback predictive control and direct feedback design. The major conceptual advantage of output-feedback predictive control is that it alleviates the need to design state estimators. The earliest conception of Model Predictive Control (MPC) based on Hankel matrices, which did not receive widespread attention, appears to be Yang & Li (2015). This line of research bifurcated to exponential growth with Coulson et al. (2019a) and manifold follow-ups. With respect to data-driven feedback design, we refer to De Persis & Tesi (2019) and the references therein.
Both lines of research are linked by the concept of optimizing closed-loop behaviors, cf. Dörfler et al. (2022). So far, applications of data-driven control based on behavioral concepts include power systems (Schmitz et al., 2022a; Huang et al., 2019; Carlet et al., 2020), autonomous driving (Wang et al., 2022), and building control (O'Dwyer et al., 2021; Bilgic et al., 2022). Interestingly, the common scope of behavioral systems theory and its major success in data-driven control in discrete-time settings go along with the orthogonal observation of limited progress with respect to behavioral approaches for stochastic and infinite-dimensional systems. Indeed, the final journal paper of Jan C. Willems discusses behavioral ideas for open stochastic systems (Willems, 2012). Baggio & Sepulchre (2017) extend this to a canonical representation of LTI stochastic processes. Yet, both approaches do not cover data-driven representations of stochastic LTI systems. Moreover, Pillai & Willems (1999) and Pillai & Shankar (1999) discuss behavioral representations of systems with distributed parameters. Arguably, data-driven control of infinite-dimensional systems would require measurement data in appropriate infinite-dimensional spaces. Likewise, data-driven approaches to stochastic systems appear limited in applicability, as stochastic processes are typically described via cumulative distribution functions, probability densities, or random variables -- all of which are, in general, infinite-dimensional objects.
Statistical moments (expectation, covariance, skewness, etc.) provide an alternative description of random variables, which is finite-dimensional only under rather specific assumptions on the underlying distribution.

Figure 2: Overview of the contributions of this tutorial (behavioral viewpoint; fundamental lemma (Willems et al., 2005); descriptor systems (Schmitz et al., 2022a); stochastic systems (Pan et al., 2021); stochastic descriptor systems).

One may claim that the frequently used modelling in terms of the first two moments (expectation and covariance) does not stretch far beyond i.i.d. Gaussian uncertainties, as moment closures for nonlinear systems are difficult and as manifold applications require modelling non-Gaussian uncertainties. In turn, non-Gaussian random variables induce the need for higher-order moments (Kuehn, 2016; Singh & Hespanha, 2010). Even in cases where expectation and covariance capture sufficient information about the uncertainty, the fact that they parametrize random variables in a nonlinear fashion -- e.g., any scalar Gaussian is given as its mean plus the square root of its variance times a standard normal random variable -- often complicates their use.
Remarkably, already in the 1930s, the most-cited journal publication of Norbert Wiener proposed an alternative avenue (Wiener, 1938). Therein, Wiener suggested representing random variables via series expansions expressed in suitable polynomial bases of the underlying probability space. The required underlying structure is a Hilbert space of L^2 random variables, i.e., random variables of finite variance. This approach is commonly denoted as Wiener chaos expansion, as polynomial chaos, or as generalized polynomial chaos. For the sake of brevity, we gloss here over the subtle distinctions of these methods. The obtained series are denoted as Polynomial Chaos Expansions (PCE). Importantly, PCEs parameterize a large class of non-Gaussian random variables in a linear structure. Without any detailed elaboration in this introduction, we remark that polynomial chaos approaches are established for uncertainty quantification and uncertainty propagation (O'Hagan, 2013; Sullivan, 2015). Very often one leverages the fact that the finiteness of the PCE is closely related to the appropriate choice of the basis, i.e., a good choice admits finite PCEs of non-Gaussian random variables (Xiu & Karniadakis, 2002). PCEs have also seen widespread and continued use in systems and control, e.g., for system analysis (Nagy & Braatz, 2007; Kim et al., 2013; Ahbe et al., 2020), for stochastic MPC (Mesbah & Streif, 2015; Mesbah, 2016; Fagiano & Khammash, 2012), and for optimization in power systems (Bienstock et al., 2014; Mühlpfordt et al., 2017, 2018a). In light of the growing success of behavioral systems theory in data-driven control, this tutorial makes the first move toward behavioral and data-driven concepts for stochastic descriptor systems, cf. Figure 2. One pillar consists of recent insights of the authors into the fundamental lemma for LTI descriptor systems, i.e., linear time-invariant systems subject to algebraic constraints (Schmitz et al., 2022b).
The other pillar consists of our results on a fundamental lemma for stochastic LTI systems (Pan et al., 2021) using polynomial chaos descriptions of Gaussian and non-Gaussian, potentially non-i.i.d., process disturbances. Combining both approaches, we propose a behavioral approach to stochastic descriptor systems, paving the road for data-driven control of this system class and naturally including stochastic LTI systems as a special case.
The structure and the contributions of this tutorial are as follows: In Section 2, we recapitulate and further refine our recent findings on the fundamental lemma for deterministic LTI descriptor systems (Schmitz et al., 2022b). Moreover, we give an overview of Willems-inspired fundamental-lemma-type results. To support the further developments, Section 3 turns to model-based descriptions of stochastic systems; it recalls the basics of Wiener chaos expansions and PCE, and it revisits the covariance propagation for LTI systems. In Section 4 the insights of Willems and Wiener are combined, i.e., we discuss different behavioral representations of stochastic systems. We introduce the behavior of PCE coefficients before we turn to the random-variable behavior and to the moment behavior. We also introduce the concept of behavioral lifts, i.e., we characterize the underlying relations of behaviors of PCE coefficients, random variables, and realizations. We show why and how moment characterizations fall short in the behavioral context. Section 5 then continues to data-driven representations via a tailored fundamental lemma for stochastic descriptor systems, thereby extending Pan et al. (2021). The crucial observation, which carries over from Pan et al. (2021), is that due to the behavioral lift the Hankel matrices can be constructed from measurements of realization data (or sampled trajectories) while the PCE framework allows to capture random variables exactly. Section 6 discusses data-driven stochastic optimal control, while Section 7 draws upon numerical examples to illustrate our findings. The paper closes with an outlook on open problems and conclusions in Sections 8 and 9.
Notation. The non-negative and the positive integers are denoted by N and Z_+, respectively. For n, m ∈ N with n ≤ m we define I_[n,m] := [n, m] ∩ N. For two sets Ω and M, Ω^M denotes the set of all mappings f : M → Ω. Further, for two sets Ω_1 and Ω_2, we identify the Cartesian product Ω_1^M × Ω_2^M with (Ω_1 × Ω_2)^M. The restriction of a function f ∈ Ω^M to a subset M_0 ⊂ M is denoted by f|_{M_0}. In the case M = I_[n,m] we use for f ∈ Ω^M the notation f_k := f(k) for k ∈ M. Moreover, for f ∈ Ω^M, where Ω = R^d and M ⊂ N such that I_[n,m] ⊂ M, we define f_[n,m] := [f_n^⊤ … f_m^⊤]^⊤. The identity matrix and the zero matrix are designated by I_n and 0_{n×m}, respectively. For a matrix A we use the notations rk(A), colsp(A), and A^† for the rank, the column span, and the Moore-Penrose inverse of A, respectively. The Kronecker product of two matrices A and B is denoted by A ⊗ B.

The Deterministic Fundamental Lemma Revisited
We first recall the analysis of descriptor systems via the quasi-Weierstraß form, then we revisit their behavioral description and a tailored fundamental lemma. This section concludes with an overview of fundamental-lemma-type results.

The Quasi-Weierstraß Form of Descriptor Systems
We consider linear time-invariant discrete-time systems

E x_{k+1} = A x_k + B u_k,    (1a)
y_k = C x_k + D u_k,    (1b)

where x_k ∈ R^{n_x}, u_k ∈ R^{n_u}, and y_k ∈ R^{n_y} are the state, the input, and the output at time instant k ∈ N, respectively. Further, E, A ∈ R^{n_x×n_x}, B ∈ R^{n_x×n_u}, C ∈ R^{n_y×n_x} and D ∈ R^{n_y×n_u}. If E is an invertible matrix, the system can be transformed into an (explicit) LTI system, where E is replaced by the identity matrix I_{n_x}. Otherwise, i.e. whenever rk(E) < n_x, system (1) is referred to as an LTI descriptor system. For the remainder of this paper, the term LTI system will refer to the explicit case where E is invertible, while descriptor system refers to the non-invertible case. Henceforth, we assume that the matrix pencil λE − A is regular, i.e., det(λE − A) ≠ 0 holds for some λ ∈ C. In this case there exist invertible matrices P, S ∈ R^{n_x×n_x} such that the pencil λE − A can be transformed into quasi-Weierstraß form

S(λE − A)P = λ [I_{n_J} 0; 0 N] − [J 0; 0 I_{n_N}],    (2)

where N ∈ R^{n_N×n_N} is a nilpotent matrix and J ∈ R^{n_J×n_J} with n_J + n_N = n_x, cf. Berger et al. (2012); Dai (1989); Kunkel & Mehrmann (2000). Whenever E is invertible, then n_N = 0 and N is an empty matrix. System (1) can be transformed into the equivalent system

z^J_{k+1} = J z^J_k + B_J u_k,   N z^N_{k+1} = z^N_k + B_N u_k,    (3a)
y_k = C_1 z^J_k + C_2 z^N_k + D u_k,    (3b)

which we refer to as quasi-Weierstraß system, where

z = P^{-1} x = (z^J; z^N),   SB = (B_J; B_N),   CP = (C_1, C_2).    (4)

Note that system (3a) is decoupled and consists of the dynamical part for z^J_k and the algebraic part for z^N_k. The state evolution for the quasi-Weierstraß system (3) is given by

z^J_k = J^k z^J_0 + Σ_{i=0}^{k−1} J^{k−1−i} B_J u_i,   z^N_k = −Σ_{i=0}^{δ−1} N^i B_N u_{k+i},    (5)

cf. Belov et al. (2018), where δ is the structured nilpotency index as defined in Definition 1 below. The solution formula in (5) indicates that descriptor systems (1) can, in general, be considered as non-causal, i.e., the present state is influenced by input actions at subsequent time instants. Another way of thinking about these systems regards the choice of future input actions as constrained by the present state value. Commonly, the usual nilpotency index δ̂ of the matrix N, i.e. the minimal number such that N^δ̂ = 0 and N^{δ̂−1} ≠ 0, is used as a reference for the number of necessary future input actions (Belov et al., 2018). This can be further improved by using the following, slightly different definition, which will foster the analysis of system (1).
Definition 1 (Structured nilpotency index). The number

δ := min{ ℓ ∈ Z_+ : N^ℓ B_N = 0 }

is called the structured nilpotency index of system (1).
In the light of data-driven control, the structured nilpotency index, which is bounded from above by δ̂, should be preferred over the usual nilpotency index. As we will see, it leads to tighter estimates on the data demand. We further remark that, although the quasi-Weierstraß form (2) is not unique, the structured nilpotency index δ, the nilpotency index δ̂, and the dimensions n_J, n_N are uniquely determined. In particular, they do not depend on the choice of the transformation matrices S and P, cf. (Kunkel & Mehrmann, 2000, Lemma 2.10). Moreover, we note the conceptual relationship between the structured nilpotency index and the input index for continuous-time descriptor systems introduced in Ilchmann et al. (2018, 2019). The next example illustrates the fact that δ < δ̂ may hold. Example 2 (Nilpotency indices). Consider a system given in quasi-Weierstraß form whose nilpotent part N has nilpotency index δ̂ = 4. The columns of B_N are orthogonal to the rows of N², i.e., N² B_N = 0, while N B_N ≠ 0. Therefore, the structured nilpotency index is δ = 2.
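The distinction between the two indices can be checked numerically. The following sketch uses hypothetical matrices (the concrete N and B_N of Example 2 are not reproduced here): a single nilpotent Jordan block N with usual index δ̂ = 4 and a B_N chosen such that N²B_N = 0 but NB_N ≠ 0, so that δ = 2 < δ̂ = 4.

```python
import numpy as np

# Hypothetical data mirroring Example 2: N is a single nilpotent Jordan block
# (usual nilpotency index 4); B_N is chosen so that N^2 B_N = 0 while N B_N != 0.
N = np.diag(np.ones(3), k=1)
B_N = np.array([[0.0], [1.0], [0.0], [0.0]])

def nilpotency_index(N):
    """Smallest k with N^k = 0 (the usual nilpotency index)."""
    M, k = np.eye(N.shape[0]), 0
    while not np.allclose(M, 0.0):
        M, k = M @ N, k + 1
    return k

def structured_nilpotency_index(N, B_N):
    """Smallest k with N^k B_N = 0 (Definition 1)."""
    M, k = B_N, 0
    while not np.allclose(M, 0.0):
        M, k = N @ M, k + 1
    return k

print(nilpotency_index(N))                  # usual index
print(structured_nilpotency_index(N, B_N))  # structured index
```

For this choice the usual index evaluates to 4 and the structured index to 2, illustrating δ < δ̂.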
From (5) one immediately observes that in the case N B_N ≠ 0, i.e. δ ≥ 2, systems (3) and (1) are non-causal, i.e., the value of the state at time k depends on the input signal until time k + δ − 1. Further, the representation (5) allows describing the set of consistent initial values of system (1), cf. Belov et al. (2018). Next, we recall controllability and observability concepts for descriptor systems as introduced by Dai (1989), using the equivalent characterizations in accordance with Belov et al. (2018); Stykel (2002).
Remark 4 (Controllability/observability in the Weierstraß form). R-controllability can be equivalently characterized by the Kalman-like criterion for the quasi-Weierstraß system (2),

rk [λI_{n_J} − J   B_J] = n_J for all λ ∈ C,    (10a)

and, similarly, R-observability by

rk [(λI_{n_J} − J)^⊤   C_1^⊤]^⊤ = n_J for all λ ∈ C.    (10b)

Remark 5 (Equivalent explicit LTI system). Considering future control inputs as additional states, the LTI descriptor system (1) can be lifted to an explicit LTI system. Given the equivalent quasi-Weierstraß system (3), we collect the augmented state ξ_k consisting of z^J_k and the stored inputs u_k, …, u_{k+δ−2}, and we consider the explicit LTI system (11) with state ξ and input u_{k+δ−1}. Using the evolution formulas (5), we have that (3) implies (11) for all k ∈ N; the inverse implication follows analogously. Put differently, the explicit LTI system (11) resolves the non-causality at the expense of an increased state dimension n_J + n_u(δ − 1). R-controllability of system (1) implies controllability of the explicit LTI system (11). R-observability of (1), however, does not guarantee observability of (11). The idea of state augmentation is borrowed from the continuous-time setting (Ilchmann et al., 2018, 2019), where the state is augmented by derivatives of the control input.
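The state augmentation of Remark 5 can be sketched numerically. The block matrices below are hypothetical (a dynamic part (J, B_J) with n_J = 2, scalar input, and structured nilpotency index δ = 3 are assumed); the augmented state is ξ_k = (z^J_k, u_k, …, u_{k+δ−2}) with new input u_{k+δ−1}, and the check verifies that the lifted dynamics reproduce z^J_{k+1} = J z^J_k + B_J u_k.

```python
import numpy as np

# Hypothetical dynamic part (n_J = 2), scalar input, structured index delta = 3.
J = np.array([[0.5, 1.0], [0.0, 0.4]])
B_J = np.array([[0.0], [1.0]])
n_J, n_u, delta = 2, 1, 3

# Augmented state xi_k = (z^J_k, u_k, ..., u_{k+delta-2}); input is u_{k+delta-1}.
n_xi = n_J + n_u * (delta - 1)
A_tilde = np.zeros((n_xi, n_xi))
A_tilde[:n_J, :n_J] = J
A_tilde[:n_J, n_J:n_J + n_u] = B_J          # z^J-update uses the stored u_k
for i in range(delta - 2):                  # shift the stored future inputs
    r, c = n_J + i * n_u, n_J + (i + 1) * n_u
    A_tilde[r:r + n_u, c:c + n_u] = np.eye(n_u)
B_tilde = np.zeros((n_xi, n_u))
B_tilde[-n_u:, :] = np.eye(n_u)             # new input enters the last slot

# Sanity check: lifted dynamics reproduce z^J_{k+1} = J z^J_k + B_J u_k.
rng = np.random.default_rng(0)
u = rng.standard_normal((12, n_u))
zJ = rng.standard_normal(n_J)
xi = np.concatenate([zJ, u[:delta - 1].ravel()])
ok = True
for k in range(9):
    zJ = J @ zJ + B_J @ u[k]
    xi = A_tilde @ xi + B_tilde @ u[k + delta - 1]
    ok = ok and np.allclose(xi[:n_J], zJ)
```

The augmented state dimension is n_J + n_u(δ − 1) = 4, matching the count given in Remark 5.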

The Behavioral Approach to Descriptor Systems
The trajectories of system (1) can be described in the language of behavioral systems theory advocated by Polderman & Willems (1997), see also Willems (1991); Markovsky et al. (2006). The (full) behavior of system (1) is defined as

B_∞ := { (x, u, y) ∈ (R^{n_x} × R^{n_u} × R^{n_y})^N : (1) holds for all k ∈ N }.

The behavior B_∞ contains all state-input-output trajectories of (1). The trajectories of finite horizon T ∈ N are collected in the finite-horizon behavior

B_T := { b|_{I_[0,T]} : b ∈ B_∞ }.

We recall the definition of behavioral controllability, cf. Markovsky et al. (2006) and Wood & Zerz (1999).
Definition 6 (Behavioral controllability). B_∞ is said to be controllable if for each two trajectories b, b̃ ∈ B_∞ and every T ∈ N there exist a transition delay T′ ∈ Z_+ and a trajectory b̂ ∈ B_∞ such that b̂_k = b_k for all k ≤ T and b̂_k = b̃_{k−T−T′} for all k > T + T′. Vividly, controllability of B_∞ allows switching from one trajectory to another, however, with some transition delay T′.
Lemma 7 (Controllability of the behavior). The behavior B_∞ is controllable if and only if system (1) is R-controllable. In this case the transition delay can be chosen as T′ = δ + n_J independently of the particular trajectories.
Proof. The proof is based on the observation that the behavior B_∞ admits a kernel representation. Specifically, (x, u, y) ∈ B_∞ if and only if R(σ)(x, u, y) = 0, where R is the polynomial matrix

R(σ) = [σE − A  −B  0; C  D  −I_{n_y}],

and σ denotes the left-shift operator on (R^{n_x+n_u+n_y})^N, i.e. (σf)_k = f_{k+1}. Controllability of B_∞ is equivalently characterized via the criterion that R(λ) has constant rank for all λ ∈ C, see Markovsky et al. (2006). That is, n_x + n_y = rk R(λ) = rk [λE − A  B] + n_y for all λ ∈ C. The latter condition is, in turn, equivalent to R-controllability of (1). It remains to show that the transition delay does not depend on the particular trajectories. Let b = (x, u, y), b̃ = (x̃, ũ, ỹ) ∈ B_∞ and set z = P^{-1}x, z̃ = P^{-1}x̃, where P is the transformation matrix used to obtain (2). Furthermore, fix T ∈ Z_+ and set T′ = δ + n_J. We choose the input u⋆ to coincide with u up to time T + δ − 1 and with the time-shifted ũ after time T + T′. Since the system is R-controllable, that is, (10a) holds, we find values u⋆_k for T + δ − 1 ≤ k ≤ T + T′ − 1 steering the dynamic part from z^J onto the time-shifted z̃^J; a solution is, e.g., given via the pseudo-inverse. Let z⋆ ∈ (R^{n_x})^N with z⋆^J_0 = z^J_0 evolve according to the state evolution of system (3), cf. (5), subject to the input u⋆. Moreover, choose y⋆ ∈ (R^{n_y})^N with y⋆_k = C_1 z⋆^J_k + C_2 z⋆^N_k + D u⋆_k. Then b⋆ = (Pz⋆, u⋆, y⋆) ∈ B_∞ and the property of Definition 6 holds by construction.
In general, the state variable x is latent and only input-output trajectories of the system (1) are directly accessible through measurements. The input-output trajectories are collected in the manifest behavior, i.e., the set of all (u, y) for which there exists a state trajectory x such that (x, u, y) ∈ B_∞; the input-output trajectories of finite horizon T ∈ N are obtained by restriction to I_[0,T]. Remark 8 (Controllability of the manifest behavior).
Regarding the manifest behavior, assuming R-controllability and R-observability of the underlying system is no limitation. According to the quasi-Weierstraß form (2), the transfer function G(z) = C(zE − A)^{-1}B + D can be decomposed into G(z) = W(z) + Ŵ(z) + D with a strictly proper rational matrix W(z) related to the dynamical part and a polynomial matrix Ŵ(z) related to the algebraic part, see (Dai, 1989, Theorem 2-6.2). The rational matrix W(z) gives rise to a minimal representation W(z) = C_J(zI − J)^{-1}B_J with matrices J, B_J, C_J satisfying (10a) and (10b). For the polynomial matrix Ŵ(z) there exist matrices N, B_N, C_N, where N is nilpotent, such that Ŵ(z) = C_N(zN − I)^{-1}B_N, see (Dai, 1989, Lemma 2-6.2). This implies the existence of a state-space representation corresponding to the manifest behavior in terms of an R-controllable and R-observable system.
The next lemma, which tightens the assertion of (Schmitz et al., 2022b, Lemma 1) by using the structured nilpotency index δ, states a lower bound on the length of input-output trajectories such that the state x is uniquely determined.
Lemma 9 (Uniqueness of state trajectory). Suppose that the system (1) is R-observable. If (x, u, y), (x̃, u, ỹ) ∈ B_{n_J−2+δ+T} for some T ∈ N are such that y|_[0,n_J−1] = ỹ|_[0,n_J−1], then x = x̃ and, consequently, y = ỹ. Lemma 9 can be verified following the same argumentation as in Schmitz et al. (2022b). For the sake of convenience, we provide a proof below.
Proof. Let z = P^{-1}x be the state variable of the quasi-Weierstraß system (2) with its components z^J and z^N as in (4); similarly for z̃ = P^{-1}x̃. Since both trajectories share the same input u, the algebraic parts coincide, z^N = z̃^N, cf. (5). As the outputs of both trajectories (x, u, y) and (x̃, u, ỹ) coincide until time n_J − 1, one has by (3b) and (5)

C_1 J^k z^J_0 = y_k − C_2 z^N_k − D u_k − C_1 Σ_{i=0}^{k−1} J^{k−1−i} B_J u_i = C_1 J^k z̃^J_0

for all k = 0, …, n_J − 1. R-observability, i.e. (10b), implies z^J_0 = z̃^J_0. According to (5) this yields z = z̃. The assertion follows with the transformations x = Pz and x̃ = Pz̃.

The Fundamental Lemma for LTI and Descriptor Systems
Willems' fundamental lemma gives rise to a non-parametric description of linear time-invariant systems by means of Hankel matrices containing only data generated by a trajectory, which carries sufficiently rich information (Willems et al., 2005). This richness is specified in the following definition.
Definition 10 (Persistency of excitation). The input trajectory u : {0, …, T − 1} → R^{n_u} is said to be persistently exciting of order L, where L ∈ N with T ≥ L(n_u + 1) − 1, if the Hankel matrix

H_L(u) := [u_0 u_1 … u_{T−L}; u_1 u_2 … u_{T−L+1}; …; u_{L−1} u_L … u_{T−1}]

has full row rank n_u L.
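Definition 10 translates directly into a numerical test: build the block Hankel matrix and compare its rank against n_u L. The sketch below (with hypothetical test signals) also shows that a constant input fails the condition for L ≥ 2.

```python
import numpy as np

def hankel(u, L):
    """Block Hankel matrix H_L(u) with L block rows, built from u of shape (T, n_u)."""
    T = u.shape[0]
    return np.vstack([u[i:T - L + 1 + i].T for i in range(L)])

def is_persistently_exciting(u, L):
    """Check Definition 10: H_L(u) has full row rank n_u * L."""
    T, n_u = u.shape
    assert T >= L * (n_u + 1) - 1, "trajectory too short for order L"
    return np.linalg.matrix_rank(hankel(u, L)) == n_u * L

rng = np.random.default_rng(0)
u_rand = rng.standard_normal((7, 1))   # T = 7 suffices for L = 4, n_u = 1
u_const = np.ones((7, 1))              # a constant input carries little excitation
```

A generic random input of minimal admissible length is persistently exciting of order 4 here, whereas the constant input is not (its Hankel matrix has rank 1).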
The original statement of the fundamental lemma by Willems et al. (2005) is given in a behavioral context. Here, we focus only on input-output trajectories. A corresponding statement including the state variable can be derived by imposing stronger assumptions on the output, i.e. C = I.
Moreover, the choice of g in the representation provided by Lemma 11 is, in general, not unique. Remark 12 (The structured nilpotency index in the lemma).
In contrast to Lemma 11, the version of the fundamental lemma given by Schmitz et al. (2022b) is formulated with respect to the nilpotency index δ̂ of N. The proof by Schmitz et al. (2022b) relies on the state evolution (5) of the quasi-Weierstraß system (3) and can be directly adapted to the structured nilpotency index.
The second statement in Lemma 11, which reflects the non-causality of the system, is not explicitly given in Schmitz et al. (2022b). It can be seen, however, in the proof of Lemma 2 by Schmitz et al. (2022b) that the input signals are artificially trimmed to the length L. Therefore, a slight modification of the block matrices U and V in that proof yields the above assertion. Summing up, modulo minor changes, the proof of Lemma 11 follows from the one given by Schmitz et al. (2022b).
For the LTI case, i.e. rk(E) = n x = n J , n N = 0 and δ = 1, we have the usual fundamental lemma, see for instance van Waarde et al. (2020).
Corollary 13 (The case rk(E) = n_x). Suppose rk(E) = n_x and that the LTI system (1) is controllable. Let (u, y) ∈ B_{T−1} be such that u is persistently exciting of order L + n_x. Then (ũ, ỹ) ∈ B_{L−1} if and only if there is g ∈ R^{T−L+1} such that

[H_L(u); H_L(y)] g = [ũ_[0,L−1]; ỹ_[0,L−1]].

The next lemma, which generalizes Moonen et al. (1989) from (explicit) LTI to descriptor systems, yields a rank condition for the stacked Hankel matrix in (19).
Lemma 14 (Rank of the Hankel matrices). Suppose that system (1) is R-controllable and R-observable, that L ≥ n_J, and that (u, y) ∈ B_{T−1} with u sufficiently persistently exciting (cf. Lemma 11). Then the stacked Hankel matrix satisfies the rank condition

rk [H_{L+δ−1}(u); H_L(y)] = n_u(L + δ − 1) + n_J.    (21)

Proof. Without loss of generality we assume that the system is in quasi-Weierstraß form (3). Let (z, u, y) ∈ B_{T−1}, where z is decomposed into z^J and z^N as in (4). Then the matrix [H_{z^J}; H_u], which stacks the first block row of the Hankel matrix of z^J on top of H_u = H_{L+δ−1}(u), has full row rank, cf. (Schmitz et al., 2022b, Proof of Lemma 2). By the solution formula (5) we can write H_L(y) = S H_{z^J} + T H_u, where S ∈ R^{n_y L×n_J} and T is a suitable matrix, and the algebraic part satisfies H_{z^N} = R H_u with R ∈ R^{n_N L×n_u(L+δ−1)}. Choose a matrix H_u^⊥ whose columns span the kernel of H_u. R-observability together with L ≥ n_J implies that S has full column rank. Together with the full row rank of the matrix [H_{z^J}; H_u] we obtain rk [H_u; H_L(y)] = n_u(L + δ − 1) + n_J. This shows the assertion.
Remark 15 (Nilpotency index and system dimensions). In certain situations the physical interpretation of the underlying system may help to identify the structured nilpotency index δ and the dimension n_J. Moreover, comparison of Lemma 11 and Corollary 13 shows that the explicit LTI case requires more data than the descriptor setting. In the case of unknown δ and n_J, Lemma 14 provides a promising approach to estimate these quantities by testing the rank condition (21) for various input-output trajectories.
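The explicit LTI case of Corollary 13 is easy to verify numerically. The following hedged sketch (hypothetical system matrices) generates one persistently exciting trajectory offline, stacks the input and output Hankel matrices, and checks that a freshly generated length-L trajectory from a different initial condition lies in their column space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical controllable and observable explicit LTI system (rk(E) = n_x).
A = np.array([[0.9, 1.0], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
n_x, n_u = 2, 1

def simulate(x0, u):
    """Output trajectory for initial state x0 and input u of shape (T, n_u)."""
    x, ys = x0, []
    for uk in u:
        ys.append(C @ x + D @ uk)
        x = A @ x + B @ uk
    return np.array(ys)

def hankel(w, L):
    """Block Hankel matrix with L block rows from w of shape (T, n)."""
    T = w.shape[0]
    return np.vstack([w[i:T - L + 1 + i].T for i in range(L)])

# Offline data: one sufficiently long, randomly excited trajectory.
T, L = 40, 5
u_d = rng.standard_normal((T, n_u))
y_d = simulate(rng.standard_normal(n_x), u_d)
H = np.vstack([hankel(u_d, L), hankel(y_d, L)])   # stacked Hankel matrix

# A new length-L trajectory from a different initial condition ...
u_new = rng.standard_normal((L, n_u))
y_new = simulate(rng.standard_normal(n_x), u_new)
b = np.concatenate([u_new.ravel(), y_new.ravel()])

# ... lies in the column space of the stacked Hankel matrix (Corollary 13).
g, *_ = np.linalg.lstsq(H, b, rcond=None)
residual = np.linalg.norm(H @ g - b)
```

Since the random input is (generically) persistently exciting of order L + n_x, the least-squares residual vanishes up to numerical precision.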

Overview of Deterministic Variants of the Lemma
Naturally, the previous exposition motivates the question whether the fundamental lemma can be tailored or extended to other settings and how such extensions can be formalized. Table 1 gives an overview of both items. As one can see, there exist manifold extensions of the original result of Willems et al. (2005). This includes more easily accessible proofs in the LTI setting (De Persis & Tesi, 2019), the extension to data segmentation in the Hankel matrices (van Waarde et al., 2020), the relaxation of the usual controllability assumption (Yu et al., 2021), and the consideration of affine structures, i.e. linear dynamics with constant additive inhomogeneities. There have also been recent extensions to linear descriptor systems (see Schmitz et al. (2022b) and the recapitulation and refinements in Section 2) and to linear stochastic systems (cf. Pan et al. (2021) and Section 5). Both are foundational for the present tutorial, as we give an extension to stochastic descriptor systems in Section 5.
Besides the LTI setting, there are tailored variants for linear parameter-varying systems (Verhoek et al., 2021) and for linear time-varying systems (Nortmann & Mylvaganam, 2021). We conclude this overview with a crucial observation: while the issues surrounding measurement noise in fundamental-lemma-type results have seen manifold research activities -- cf., e.g., Coulson et al. (2019b); Yin et al. (2021a,b) -- the question of data-driven propagation of stochastic uncertainties has received much less attention. Specifically, we mention two lines of research:
• We have shown in Pan et al. (2021) that finite-dimensional series descriptions of exogenous uncertainties enable a stochastic fundamental lemma.² Below, we extend our earlier results towards stochastic descriptor systems.
• Dörfler et al. (2021) discuss certainty-equivalence LQR design. However, while a data-driven surrogate of the closed-loop state-feedback system matrix A + BK is provided therein, covariance propagation as such is not discussed.

Models of Stochastic Linear Systems
We want to analyze descriptor systems of the form (1) subject to stochastic, exogenous process disturbances. To this end, we model the trajectories of the system as stochastic processes.
Let (Ω, F, P) be a probability space consisting of the sample space Ω (a.k.a. set of outcomes), the σ-algebra F, and the probability measure P. The space L²((Ω, F, P), R^d) consists of vector-valued random variables with realizations in R^d, d ∈ N, and finite variance. For X, Y ∈ L²((Ω, F, P), R^d), the expected value and the covariance matrix are

E[X] := ∫_Ω X(ω) dP(ω),   Cov[X, Y] := E[(X − E[X])(Y − E[Y])^⊤].

For stochastic processes X, Y ∈ L²((Ω, F, P), R^d)^N the expectation operator E and the covariance operator Cov are defined pointwise in time. Subsequently, we assume that the probability space (Ω, F, P) is separable, i.e., F is generated by a countable family of sets Ω_k ⊂ Ω, k ∈ N. Then L²((Ω, F, P), R^d) is a separable Hilbert space and it possesses a countable orthogonal basis, cf. Brezis (2011).
To the end of recalling models of stochastic linear systems, we consider

E X_{k+1} = A X_k + F V_k,    (22a)
Y_k = C X_k + D V_k,    (22b)

where the state, exogenous input, and output signals are modelled as stochastic processes, that is, X_k ∈ L²((Ω, F, P), R^{n_x}), V_k ∈ L²((Ω, F, P), R^{n_v}), and Y_k ∈ L²((Ω, F, P), R^{n_y}). Similar to the deterministic setting (cf. (7)), invoking the assumed regularity of the pencil λE − A, system (22) can be transformed into quasi-Weierstraß form. Further, the random variable X_0 has to be drawn from the set of consistent initial conditions, cf. Markovsky (2021), to be consistent with the system dynamics (22). Notice that the variable V contains all exogenous inputs of the system, i.e., the manipulated control inputs U and the exogenous process disturbances W are related to V via the split

V_k = (U_k^⊤, W_k^⊤)^⊤.    (24)

For many of our formal developments, we do not elaborate on the distinction between controls U and disturbances W, as this fits well with the behavioral viewpoint. Yet, from a stochastic-systems point of view, further considerations are necessary.

Stochastic Filtrations and the Markov Property.
The non-causality (for structured nilpotency index δ ≥ 2) of the regular descriptor system (22) implies that the solution X = (X_k)_{k∈N} is, in general, not a Markov process with respect to its natural filtration (σ(X_0, …, X_k))_{k∈N}, where σ(X_0, …, X_k) denotes the σ-algebra generated by X_0, …, X_k. However, by augmenting the state variable with future control inputs (cf. Remark 5) one can recover the Markov property.
To avoid further cumbersome technicalities, we consider the case where the exogenous disturbance W at future time instances does not influence the present state. More precisely, given system (22) with split input (24), we assume that the matrix N in the quasi-Weierstraß form annihilates F_N, i.e. NF_N = 0, where SF = (F_J^⊤, F_N^⊤)^⊤, cf. (2). Further, suppose that the noise W = (W_k)_{k∈N} is a sequence of independent square-integrable random vectors. According to the non-causality (cf. (5)), we think of a control law which assigns the new input action U_{k+δ} on the basis of the current value of the state X_k, i.e. U_{k+δ} = K(X_k) with some fixed measurable map K. We consider the stochastic processes ∆ and Γ obtained by augmenting the state with the stored future inputs (cf. Remark 5) and by collecting finite windows of outputs and inputs, respectively. The next proposition discusses the Markov property of both stochastic processes.
Proposition 16 (Markov property). Suppose that NF N = 0. Further, let W 0 , W 1 , . . . be mutually independent and assume that X 0 , U 0 , . . . , U δ−1 are independent of W 1 , W 2 , . . . . Let the system be governed by the control law U k+δ = K(X k ) for k ∈ N with a measurable function K. Then (i) W j for j ≥ k + 1 is independent of F k as well as σ(∆ k ) and (ii) ∆ is a Markov process, i.e.
If in addition the system (22) is R-observable, then (iii) W_j for j ≥ k + n_J is independent of G_k as well as of σ(Γ_k), and (iv) the stochastic process Γ satisfies the Markov property for every Borel set B ⊂ R^{n_y n_J + n_v(n_J + δ − 1)}.
Proof of Proposition 16. We show (i). According to the state evolution (5), the state X_{k+1} can be expressed linearly in terms of X_k, U_k, …, U_{k+δ} in combination with W_k, W_{k+1}. Together with the control law U_{k+δ} = K(X_k) we find a functional description of ∆_{k+1} in terms of ∆_k and W_{k+1} with functions f and f̃. This shows that ∆_{k+1} is measurable with respect to σ(∆_k, W_{k+1}); by induction we see that W_j for j ≥ k + 2 is independent of σ(∆_0, …, ∆_k, W_{k+1}) and, therefore, is also independent of F_{k+1} as well as of σ(∆_{k+1}). We show (ii). The conditional probability can be written in terms of the conditional expectation operator employing the characteristic function 1_A of the set A. The F_k-measurability of ∆_k and the fact that W_{k+1} is independent of F_k imply (25a), cf. (Malliavin et al., 1995, Problem IV-34). We show (iii). By (22b) the random variable Γ_k is given in a linear way by ∆_k, …, ∆_{k+n_J−1}. This yields σ(Γ_k) ⊂ G_k ⊂ F_{k+n_J−1}, which together with (i) implies (iii).
We show (iv). The second recursion in (26) yields, for all k ∈ N, a representation of Γ k+1 in terms of a linear map h̃. By R-observability of system (22) there is a linear map ĥ such that ∆ k = ĥ(Γ k ) for all k ∈ N, cf. Lemma 9. Therefore, there is a linear map h relating consecutive values of Γ. Assertion (25b) follows with a similar argument as in (ii).
Corollary 17 (Non-anticipativity). Let the assumptions of Proposition 16 hold. Then U k and W j are independent for all k, j ∈ N with j ≥ k − δ + 2.
Proof. For k ≤ δ − 1 this follows by assumption. For k ≥ δ the random vector U k is F k−δ+1 -measurable. Therefore, the assertion follows with Proposition 16 (i).

Realizations and Moments.
A common way to handle the stochastic system (22) is via path-wise dynamics. The outcome ω ∈ Ω (sample) implicitly defines the realizations of all involved random variables, and we arrive at the realization dynamics (27). Modulo the notation change from u to v, respectively, V in (22), this resembles the previous deterministic system (1). Put differently, any realization (i.e. sampled trajectory) triplet (x, v, y) satisfies (27). Yet, without oracle-like knowledge of future realizations of v i = V i (ω), i ≥ k, and leaving sampling-based approaches aside, the realization dynamics are mostly helpful in an a-posteriori sense. Alternatively, passing over to statistical moments also allows one to describe the system dynamics. Even though for moments of first order the equations remain linear, higher-order moments result in nonlinear equations. Specifically, for first- and second-order moments we obtain the usual propagation (28).

3.1. Representations of L 2 -Random Variables

At this point, it is fair to ask which benefits the chosen setting of L 2 -random variables actually implies. As we will see below, this approach enables a linear representation of Gaussian and non-Gaussian random variables, which turns out to also be numerically tractable, under suitable assumptions.
In the following, we use the short-hand notation L 2 (Ω, R d ) instead of L 2 ((Ω, F , P), R d ) whenever there is no ambiguity. The space L 2 (Ω, R d ) can be identified with the orthogonal sum of d copies of L 2 (Ω, R). Therefore, the scalar-valued orthogonal basis (φ i ) i∈N of L 2 (Ω, R) gives rise to a series expansion for each random vector X ∈ L 2 (Ω, R d ), i.e. X = Σ i∈N x i φ i , where the series converges in the L 2 -sense, cf. (29). The coefficients are uniquely determined by the usual Fourier quotient x i = E[X φ i ] / E[φ i 2 ]. Using this notation, the L 2 -formulation of the dynamics (22) combined with (29) gives the PCE coefficient dynamics (30) for all k, i ∈ N. The transition from (22) to (30) using (29) is referred to as Galerkin projection as it technically means to project (22) onto the basis functions φ i used in (29), cf. Ghanem & Spanos (2003).
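For intuition, the Galerkin projection can be sketched numerically. The following is a minimal sketch with illustrative scalar numbers (not taken from the paper): each PCE coefficient obeys the same linear recursion as the L 2 dynamics, and the resulting mean and standard deviation match Monte Carlo sampling.

```python
import numpy as np

# Illustrative sketch (numbers not from the paper): scalar dynamics
# X_{k+1} = a*X_k + b*V_k with X_0 = 1 + 0.5*xi, xi ~ N(0,1), and a
# deterministic input V_k = 1. PCE basis: phi_0 = 1, phi_1 = xi. Galerkin
# projection: every PCE coefficient obeys the same linear recursion.
a, b, T = 0.8, 1.0, 10

x = np.array([1.0, 0.5])        # coefficients [x_0, x_1] of X_0
v = np.array([1.0, 0.0])        # the input has no stochastic component
for _ in range(T):
    x = a * x + b * v           # PCE coefficient dynamics

# Monte Carlo reference for the L2 random-variable dynamics
rng = np.random.default_rng(0)
X = 1.0 + 0.5 * rng.standard_normal(200_000)
for _ in range(T):
    X = a * X + b * 1.0

print(x[0], X.mean())           # mean = coefficient of phi_0
print(abs(x[1]), X.std())       # std  = |coefficient of phi_1| here
```

Note that the coefficient recursion is deterministic and exact, while the Monte Carlo reference carries sampling error.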
To sum up and as depicted in Figure 3, the stochastic dynamics admit at least four different representations:
• in L 2 random variables (22),
• in realizations (i.e. sampled trajectories) (27),
• in statistical moments (28), and
• in series expansions (30).

Exactness and Polynomial Chaos Expansions
From a numerical and a conceptual point of view, it is desirable if series expansions are of finite order, i.e., that the random variables are represented by finitely many basis functions.
Definition 18 (Exact series expansions). We say a random variable Z ∈ L 2 (Ω, R d ) admits an exact series expansion if there exists p ∈ Z + such that the expansion coefficients satisfy z i = 0 for all i ≥ p.

Observe that in the above definition, the series expansion stops at a finite order. Put differently, for i > p − 1, the coefficients satisfy z i = 0, while in the formal statement (29) we have i ∈ N. Hence, one may ask if such finite series representations can be attained.
A popular approach for uncertainty quantification and uncertainty propagation are Polynomial Chaos Expansions (PCE), which date back to Wiener (1938). Given a family F of basic random variables, the pivotal idea of PCE is to choose an orthogonal system (φ i ) i∈N consisting of polynomials with indeterminates in F.
Consider, e.g., a finite or countable family F of independent normally distributed random variables. Each random variable in F possesses moments of all orders. For a finite selection of random variables of F, mixed moments are, due to stochastic independence, simply the product of the individual moments. Therefore, for n ∈ N, the set Π n of polynomials of degree at most n in finitely many elements of F is a linear subspace of L 2 ((Ω, F , P), R). The space Π 0 consists of almost surely constant random variables, and all elements of Π 1 are normally distributed. For n > 1 the space Π n contains also non-Gaussian random variables. Using the Gram-Schmidt process one can construct an orthogonal basis (φ i ) i∈N of ⋃ n∈N Π n . In the simplest nontrivial case, where F consists of one standard normally distributed random variable ξ, this basis can be given in terms of Hermite polynomials, that is φ i = H i (ξ), where H i is the ith Hermite polynomial. Moreover, the Cameron-Martin theorem (Cameron & Martin, 1947) states that the closure of this span equals L 2 ((Ω, σ(F), P), R), where σ(F) designates the σ-algebra generated by the family F. As a consequence, every random variable in L 2 ((Ω, σ(F), P), R) admits an L 2 -convergent series expansion in terms of the orthogonal polynomial basis (φ i ) i∈N . This series is referred to as Wiener-Hermite polynomial chaos expansion. We note that in general only σ(F) ⊂ F holds, that is, L 2 ((Ω, σ(F), P), R) ⊂ L 2 ((Ω, F , P), R).
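The orthogonality relation E[H i (ξ)H j (ξ)] = i! δ ij underlying the Wiener-Hermite expansion can be verified numerically, e.g., via Gauss quadrature for the Gaussian weight. The following sketch uses NumPy's probabilists' Hermite module:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Sketch: verify E[H_i(xi) H_j(xi)] = i! * delta_ij for xi ~ N(0,1) using
# Gauss quadrature for the probabilists' Hermite weight exp(-x^2/2).
nodes, weights = hermegauss(30)
weights = weights / np.sqrt(2.0 * np.pi)   # normalize to the Gaussian measure

def inner(i, j):
    """E[H_i(xi) H_j(xi)] via quadrature."""
    ci = np.zeros(i + 1); ci[i] = 1.0      # coefficient vector selecting H_i
    cj = np.zeros(j + 1); cj[j] = 1.0
    return float(np.sum(weights * hermeval(nodes, ci) * hermeval(nodes, cj)))

print(inner(2, 3))   # ~0 (orthogonality)
print(inner(3, 3))   # ~6 = 3!
```

The quadrature is exact for polynomial integrands of the degrees arising here, so the check is tight up to round-off.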
Furthermore, every random variable in Π n admits an exact series expansion. Under certain conditions regarding the underlying distributions also a family F of non-Gaussian basic random variables allows the construction of an orthogonal basis consisting of polynomials in F, and series expansions with respect to these polynomials. For more details on this so-called generalized polynomial chaos expansion, in particular for its convergence properties, we refer the reader to Ernst et al. (2012); Sullivan (2015); Xiu (2010). For details on truncation errors and error propagation we refer to Field & Grigoriu (2004) and to Mühlpfordt et al. (2018b). Fortunately, random variables that follow some widely used distributions admit exact finite-dimensional PCEs with only two terms in suitable polynomial bases. For Gaussian random variables Hermite polynomials are preferable. For other distributions, we refer to Table 2 for the usual basis choices that allow exact PCEs (Koekoek & Swarttouw, 1996; Xiu & Karniadakis, 2002). Observe that in order to use the PCE framework, one needs to specify the basis functions and their random-variable arguments. Hence Table 2 also lists those arguments.
Before concluding this part, we give an example showing how exactness-preserving bases for polynomial mappings can be constructed. We also comment on the relation of PCE and reproducing kernel Hilbert spaces.
Example 19 (PCE basis construction). For the sake of illustration, consider the scalar-valued map f : L 2 (Ω, R) → L 2 (Ω, R), X ↦ X 2 , i.e. Y = f (X) = X 2 . Let X be Gaussian. Then, it admits the exact PCE X = x 0 φ 0 + x 1 φ 1 , where φ 0 = 1 and φ 1 = ξ, ξ ∼ N(0, 1). Consider X 2 which in terms of the PCE of X reads X 2 = x 0 2 + 2x 0 x 1 ξ + x 1 2 ξ 2 . Apparently, the last term cannot be expressed in the subspace spanned by φ 0 , φ 1 . Hence, we extend the basis for Y by φ 2 = ξ 2 − 1, which is the third Hermite orthogonal polynomial. The exact PCE of Y reads Y = (x 0 2 + x 1 2 )φ 0 + 2x 0 x 1 φ 1 + x 1 2 φ 2 . Further details on how the structure of explicit polynomial maps can be exploited to construct exactness-preserving bases are given by Mühlpfordt et al. (2018b).
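A quick numerical check of Example 19 is possible with illustrative coefficient values (x 0 = 1.5, x 1 = 0.7 are arbitrary choices): the PCE coefficients of Y = X 2 are recovered by Galerkin projection, i.e., by the Fourier quotient with respect to the Hermite basis.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Numerical check of Example 19 with illustrative values x0 = 1.5, x1 = 0.7:
# project Y = (x0 + x1*xi)^2 onto the Hermite basis via the Fourier quotient
# y_i = E[Y H_i(xi)] / E[H_i(xi)^2], with E[H_i^2] = i! (factorial).
x0, x1 = 1.5, 0.7
nodes, weights = hermegauss(20)
weights = weights / np.sqrt(2.0 * np.pi)   # Gaussian measure normalization

def project(f, i):
    c = np.zeros(i + 1); c[i] = 1.0        # coefficient vector selecting H_i
    return float(np.sum(weights * f(nodes) * hermeval(nodes, c))) / math.factorial(i)

y = [project(lambda s: (x0 + x1 * s) ** 2, i) for i in range(4)]
print(np.round(y, 6))   # [x0^2 + x1^2, 2*x0*x1, x1^2, 0] = [2.74, 2.1, 0.49, 0]
```

The projection reproduces exactly the three nonzero coefficients predicted in the example, and all higher coefficients vanish, illustrating exactness.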
Remark 20 (Link of PCE and RKHS). Besides the fundamental lemma, data-driven control using Reproducing Kernel Hilbert Spaces (RKHS) is also of tremendous interest, see, e.g., Yan et al. (2018); Zhang et al. (2020). A well-known example of an RKHS framework is given by Gaussian Processes, which are often used as function approximators in machine learning (Deisenroth et al., 2013; Muandet et al., 2017). Hence, it is natural to ask for the link between RKHS and the L 2 -probability spaces considered here. In general, an L 2 space is a Hilbert space of equivalence classes of functions but not a Hilbert space of functions. Thus, without further refinement an L 2 -probability space is not an RKHS.
A straightforward observation, which points towards this issue, is that if two random variables have the same PCE in a given basis, this does not mean they are identical. Indeed, they could differ on sets of measure zero, e.g., in terms of countably many outcomes ω with probability zero. However, using the concept of exact PCEs, one may close the gap, i.e., random variables in exact and finite PCE bases are connected to learning methods in RKHS. Consider the case where the family F consists of one standard normally distributed random variable. Then, the underlying Hermite polynomials {H i } i∈N give rise to an RKHS induced by a Mehler kernel with parameter α ∈ (0, 1), cf. (Rainville, 1960, p. 196).

Moments, Densities, and Conditional Probabilities in PCE
Almost surely, the careful reader has recognized that the overview of Figure 3 does not mention the modelling of stochastic dynamics in terms of probability densities, which is, e.g., common in the case of Markov chains on measurable spaces, i.e., whenever the densities are treated as state variables. In a purely Gaussian and linear setting the moment description (28) allows capturing the density evolution, while in general non-Gaussian and nonlinear cases, it is difficult to analytically capture the evolution of densities. Hence, we now briefly discuss how moments and densities can be accessed in the PCE framework.

Moments and PCEs. Consider X, Y ∈ L 2 ((Ω, F , P), R d ) with exact PCEs of order p, cf. Definition 18. The expected value and the covariance of X and Y can be obtained directly from the PCE coefficients. For further insights into the computation of higher-order moments from PCEs we refer to Lefebvre (2020).

Probability Densities and PCE. For the sake of illustration, consider X ∈ L 2 ((Ω, F , P), R). Let Y = f (X) be the image of the non-Gaussian random variable X with the PCE in the Hermite polynomial basis. Moreover, let the argument of the basis be standard normally distributed, i.e., ξ ∼ N(0, 1). Then, a straightforward sampling-based strategy to approximate the density of Y over Ω = R is immediate: one samples realizations ξ(ω), ω ∈ R, of the argument and, then, evaluates f to obtain samples of the realizations of the non-Gaussian random variable Y(ω). Without further elaboration we remark that structural knowledge on X or f can and should be exploited. Moreover, we observe a very helpful property of PCEs, i.e., the series expansion separates the real-valued and deterministic series coefficients x i from the basis functions φ i (ξ), which capture the stochasticity.

Conditional Probabilities and PCE.
Another aspect that deserves clarification in the PCE framework is how to capture conditional probabilities, which are of major importance in many systems and control contexts. To this end, consider Y = X + W, where X ∈ L 2 ((Ω, F , P), R) and W ∈ L 2 ((Ω, F , P), R) are independent. Let the PCEs of X and W be given in bases with arguments ξ x and ξ w , where the subscripts · x , · w emphasize that the arguments of the PCE bases are independent random variables as X and W are independent. We are interested in the conditional probability P(Y ∈ Y | W = w) for some given set Y ⊂ R and given w ∈ R. Suppose that W(ω̄) = w, i.e. ω̄ ∈ Ω is the unique outcome (sample) which realizes the event W = w. Then, the conditional probability can be replaced with an unconditional one in which the PCE for W is evaluated to meet the condition. Put differently, in the PCE framework conditional probabilities can be captured by evaluating the basis functions corresponding to the condition at the considered outcomes-in the example the basis polynomials of W are evaluated at ξ w (ω̄)-while still regarding the arguments ξ x of the other basis polynomials as random variables.
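This evaluation-based treatment of conditioning can be sketched numerically. In the following sketch, the PCE coefficients of X and W are illustrative assumptions (not taken from the paper): conditioning on W = w fixes ξ w at the corresponding value while ξ x remains random, and the sampled estimate is compared against the exact Gaussian answer.

```python
import math
import numpy as np

# Sketch with illustrative PCE coefficients (not from the paper):
# X = 1 + 0.5*xi_x and W = 2 + 0.3*xi_w, xi_x, xi_w ~ N(0,1) independent,
# and Y = X + W. Conditioning on W = w evaluates the W-basis at the
# outcome, i.e. xi_w is fixed at (w - 2)/0.3 while xi_x stays random.
w = 2.3
xi_w_fixed = (w - 2.0) / 0.3

rng = np.random.default_rng(1)
xi_x = rng.standard_normal(500_000)            # X-basis argument stays random
Y = (1.0 + 0.5 * xi_x) + (2.0 + 0.3 * xi_w_fixed)

est = np.mean((3.0 <= Y) & (Y <= 4.0))         # estimates P(Y in [3,4] | W = w)

# exact answer: Y | W = w  ~  N(1 + w, 0.5^2)
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
exact = Phi((4.0 - 1.0 - w) / 0.5) - Phi((3.0 - 1.0 - w) / 0.5)
print(est, exact)                              # agree up to sampling error
```

The interval [3, 4] and the condition w = 2.3 are arbitrary; any Borel set amenable to sampling works the same way.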
Concluding this excursion, we note that combining the conceptual ideas expressed in the two examples above illustrates how conditional densities are captured in the PCE framework. Yet, in the general case they might not be directly and analytically accessible, while numerical approximation is immediate. Importantly, provided the PCE propagation is exact in the sense of Definition 18, no information about unconditional and conditional densities is lost. We conclude this part by pointing the interested reader to the introductory text by O'Hagan (2013) which provides insights into the relation of PCE and other series expansions such as Karhunen-Loève expansions or Gram-Charlier series.
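The moment formulas recalled above can also be checked numerically. The following sketch uses arbitrary illustrative coefficients in a Hermite basis: mean and covariance are computed from exact PCE coefficients and validated by Monte Carlo sampling.

```python
import numpy as np

# Sketch with arbitrary illustrative coefficients: for a Hermite basis with
# phi_0 = 1 and E[phi_i] = 0 for i >= 1, the mean is the zeroth coefficient
# and Cov(X) = sum_{i>=1} <phi_i, phi_i> x_i x_i^T with <H_i, H_i> = i!.
norms = np.array([1.0, 1.0, 2.0])        # <H_i, H_i> for i = 0, 1, 2
x = np.array([[1.0, 0.5, 0.2],           # PCE coefficients, one row per
              [2.0, 0.1, 0.0]])          # dimension of X in R^2
mean = x[:, 0]
cov = sum(norms[i] * np.outer(x[:, i], x[:, i]) for i in range(1, 3))

# Monte Carlo check against the random variable X = sum_i x_i phi_i(xi)
rng = np.random.default_rng(2)
xi = rng.standard_normal(400_000)
phi = np.stack([np.ones_like(xi), xi, xi**2 - 1.0])
X = x @ phi
print(mean, X.mean(axis=1))              # analytic vs sampled mean
print(cov, np.cov(X), sep="\n")          # analytic vs sampled covariance
```

Note how the deterministic coefficients carry all moment information, in line with the separation of coefficients and basis functions emphasized above.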

Behavioral Representations of Stochastic Linear Systems
The previous section has recalled representations of stochastic systems in conceptually different settings, cf. Figure 3. Next we establish the connections between these settings and the behavioral framework of Section 2.2.

Behavioral Representations
Henceforth and with slight abuse of notation, B ∞ , B i/o ∞ , B T , and B i/o T refer to the full, the manifest, the finite-horizon, and the manifest finite-horizon behaviors of (27), respectively. In short, we say that the realization dynamics (27) generate the realization behavior B ∞ , whereby in view of (24) the variable v combines controls and disturbances.
The Behavior of the PCE Coefficients. One approach to a behavioral description of the stochastic system (22) is via the coefficients of the series expansion of the random variables in terms of the scalar-valued orthogonal basis (φ i ) i∈N . To this end, we first discuss the behavior of PCE coefficient dynamics (30).
To this end, consider the expansion coefficients stacked over all basis dimensions i for a given time instant k ∈ N. Stacking the time instants in similar fashion over the horizon 0, . . . , T , T ∈ N, we obtain the PCE coefficient trajectories. Naturally, we could also tensorize this notation which, for the sake of readability, we avoid here. We define the full and the manifest behavior of the expansion coefficients by (33). Observe that N in (ℓ 2 (R n z )) N , z ∈ {x, v, y}, refers to the considered time horizon, while N in (x i , v i , y i ), i ∈ N, refers to the expansion order (which in case of finite-dimensional expansions is denoted as p). Hence, the finite-horizon behaviors C T and C i/o T are defined similarly as before in the deterministic case, i.e. by truncating (x, v, y) ∈ C ∞ to the bounded time horizon [0, T ], cf. (13) and (18). Consequently, in case of exact PCEs and finite time horizons, C T is finite dimensional.
The behavior of the expansion coefficients, C ∞ , inherits desirable properties like completeness and controllability from the realization behavior B ∞ .
Lemma 21 (Completeness of C ∞ ). The behavior C ∞ is complete, i.e. b| [0,T ] ∈ C T for all T ∈ N implies b ∈ C ∞ .

Lemma 22 (Controllability of C ∞ ). Suppose that the realization behavior B ∞ is controllable. Then C ∞ is controllable with transition delay T = δ + n J .
Proof. Let c, c̃ ∈ C ∞ and T ∈ Z + . Then c i , c̃ i ∈ B ∞ for all i ∈ N. By controllability of B ∞ , for each i there exists an intermediate trajectory connecting c i and c̃ i . In order to see that the resulting c ∈ C ∞ , we need to show for each time step k ∈ N that the components of c k = (x k , v k , y k ) are square-summable sequences composed of the expansion coefficients. For k ≤ T − 1 and k ≥ T + T this is clear, as c, c̃ ∈ C ∞ . From the state evolution (5) in the quasi-Weierstraß form (3) together with the system dynamics (27) it can be seen that it suffices to show that v k ∈ ℓ 2 (R n v ) for all k = T, . . . , T + T − 1. From the construction of the input signal for the intermediate trajectory in the proof of Lemma 7 it can be seen that v k ∈ ℓ 2 (R n v ) for k = T, . . . , T + δ − 2, cf. (15). For the remaining n J time instants k = T + δ − 1, . . . , T + T − 1 there exists a linear (uniformly bounded with respect to i) map K, cf. (17), which yields the claim.

The Stochastic L 2 -Behavior. A reasonable requirement for a behavior of the stochastic system (22) is that its trajectories considered path-wise, that is, evolving in time for a fixed outcome ω ∈ Ω, satisfy the realization dynamics (27). To this end, we define the full and the manifest stochastic behaviors corresponding to system (27) as in (34). The corresponding finite-horizon behaviors S T and S i/o T are defined similarly to the deterministic case, see (13) and (18). Note that in the stochastic behavior S ∞ , besides existence of second-order moments, no further assumptions on the statistical nature of the considered stochastic processes are made. In particular, stochastic independence is not required.

Behavior Inclusion, Lift, and Equivalence
Notice that the realization trajectories in B ∞ can also be considered as stochastic processes which are at each time instant almost surely constant. Hence, the stochastic behavior as defined in (34) includes the realization behavior; this inclusion extends to finite-time and manifest behaviors. Likewise, the coefficient behavior (33) satisfies the inclusion C ∞ ⊂ ⨉ i∈N B ∞ , cf. (35). The equality C ∞ = ⨉ i∈N B ∞ holds if a square-summability condition on the elements of ⨉ i∈N B ∞ along the running index i is imposed. In the case of finite time horizons and exactness of the PCE this additional condition is not necessary for the equality. Inclusion (35) expresses the fact that for all dimensions i ∈ N of the expansion (which correspond to basis directions φ i ) the realization behavior B ∞ describes the dynamics. Put differently, in the context of model-based system representations, for all basis dimensions i ∈ N, the PCE coefficients satisfy identical linear system equations (30) which in turn correspond to the realization dynamics (27). Consequently, it is fair to ask for the relation between S ∞ and C ∞ .
Equivalence and Lift. In the above definition of the stochastic behavior the realm of finite-dimensional systems is left behind. Below we derive relationships which sustain our conception of stochastic behavior. The next theorem shows how elements of the behaviors can be mapped onto each other.
Theorem 23 (Behavioral lift). (i) The linear map Φ : C ∞ → S ∞ induced by the series expansion with respect to (φ i ) i∈N is bijective. (ii) For fixed ω ∈ Ω, the linear map Ψ ω : S ∞ → B ∞ given by path-wise evaluation is such that Ψ ω ◦ Φ is surjective.

Proof. Assertion (i) is a consequence of the isometric isomorphism between ℓ 2 (R) and L 2 ((Ω, F , P), R) established by series expansion in terms of the orthogonal basis (φ i ) i∈N , cf. the Fischer-Riesz theorem. We show that the image of C ∞ under Φ is contained in S ∞ . Let (x, v, y) ∈ C ∞ and set (X, V, Y) = Φ(x, v, y). By definition (x i , v i , y i ) ∈ B ∞ for every i ∈ N. Therefore, (30) holds for all i, k ∈ N. Together with the orthogonality of (φ i ) i∈N we obtain (37) for all k. Hence, (22) holds for all k ∈ N and (X, V, Y) ∈ S ∞ . Next, we show that Φ is surjective. Let (X, V, Y) ∈ S ∞ and take the series expansion with respect to (φ i ) i∈N , that is, (x, v, y) is given by (36). As (X, V, Y) ∈ S ∞ , (22) holds for all k ∈ N. Therefore, (37) holds, which implies (30) for all i, k ∈ N. Hence, (x, v, y) ∈ C ∞ and Φ(x, v, y) = (X, V, Y). The injectivity of Φ follows from the uniqueness of the series expansion. This shows (i). The surjectivity of Ψ ω ◦ Φ in (ii) follows from the inclusion of the realization behavior in the stochastic behavior.

The relations between the behaviors and the maps derived in Theorem 23 are sketched in Figure 4. Observe the structural similarity of the relations between the behaviors in Figure 4 and the relation between the models in Figure 3. Similar to Theorem 23 one finds comparable maps between the manifest behaviors as well as behaviors with respect to finite time horizons. Specifically, in the case of exact PCEs on finite time horizons, the corresponding linear map Φ T : C T → S T can be understood as a lift of the finite-dimensional c ∈ C T to the infinite-dimensional object S ∈ S T .
As an immediate consequence of the behavioral lift we find the following relationship between the realization behavior and coefficient behavior.
By Corollary 24 the sequence c ∈ ℓ 2 (R) allows, in accordance with the identification (35), the embedding of B ∞ into C ∞ , that is, for each b ∈ B ∞ one has cb = (c 0 b, c 1 b, . . . ) ∈ C ∞ . The next theorem establishes the equivalence of the behaviors.
Proof. We show the equivalence for T = ∞. The equivalence between (i) and (ii) as well as between (iii) and (iv) follows by definition. The equivalence of (ii) and (iii) follows with the behavioral lift, Theorem 23 (i).
By restricting the elements of B ∞ , S ∞ , and C ∞ to a finite time horizon T one immediately obtains the equivalences for B T , S T , and C T .
The next lemma, featuring completeness and controllability of the stochastic behavior S ∞ , is a direct consequence of Lemmas 21 and 22 in combination with Theorems 23 and 25.
Lemma 26 (Completeness and controllability of S ∞ ). The stochastic behavior S ∞ is complete, i.e. S | [0,T ] ∈ S T for all T ∈ N implies S ∈ S ∞ . If the realization behavior B ∞ is controllable, then S ∞ is controllable with delay T = δ + n J .
Proof. Let S = (X, V, Y) and suppose that S | [0,T ] ∈ S T for all T ∈ N. Taking the corresponding coefficients of the series expansion c = (x, v, y) one has c| [0,T ] ∈ C T by Theorem 25. The completeness of C ∞ , see Lemma 21, yields c ∈ C ∞ and with Theorem 25 we see S ∈ S ∞ . The controllability of S ∞ follows in a similar way employing Lemma 22.
Completeness and controllability of the stochastic behavior are both important features in the context of control design and in view of optimal control. Controllability guarantees that, given a consistent initial condition, there exists an intermediate trajectory steering the system, for instance, into some equilibrium. Provided a sufficiently long time horizon, this implies feasibility of the optimal control problem (provided no other constraints are considered). Completeness, on the other hand, ensures that successively solving the optimal control problem, while stepping forward in time, leads to a valid trajectory on the infinite time horizon. In essence, completeness of the behavior gives completeness of the dynamic system as such (Sontag, 1998).
Non-Equivalence of Moment Behaviors. The propagation of moments with respect to the system dynamics, i.e. (28), leads to the definition of the (second-order) moment behavior M ∞ . Despite the nonlinearity in the definition of higher-order moments, the (second-order) moment behavior M ∞ as such is a linear vector space. However, comparing Figure 3 to Figure 4, the only available map is Ξ, cf. (39), which assigns to a stochastic process S . = (X, V, Y) ∈ S ∞ its first two moments. Even in the case of Gaussian processes, which are fully determined by first and second-order moments, the map Ξ is not injective.
Example 27 (Non-equivalence of M ∞ and S ∞ ). Let B ∞ , S ∞ , and M ∞ be the realization, stochastic, and moment behavior, respectively, corresponding to system (27). Consider independent standard normally distributed random variables X 0 , . . . and V 0 , . . . and set Y k = X k . Define c xx , c xv , etc. as before. Then c xx = c vv = c yy = (1, 1, . . . ), c xv = c xy = c vy = (0, 0, . . . ), and (39) holds. On the other hand, for each k ∈ N the corresponding random variable is normally distributed with zero mean and strictly positive variance.

The above example illustrates that even if the series of moments belongs to the moment behavior, it might happen that with probability one all realizations of the corresponding stochastic process are incompatible with the system dynamics. This intrinsic difficulty of working with moments and their behaviors can also be seen in Figure 4: specifically, a structured means to map an element of the moment behavior M ∞ to the stochastic behavior S ∞ or to the coefficient behavior C ∞ is not available.

The Stochastic Descriptor Fundamental Lemma
The behavioral lifts and behavioral equivalence as established in Theorems 23 and 25, respectively, enable the formulation of a version of the fundamental lemma applicable to stochastic descriptor systems.

Equivalence and Inclusion of Column Spaces
We begin with a result whose counterpart for (explicit) LTI systems has been given by Pan et al. (2021).
Lemma 28 (Column-space equivalence). Let there exist a controllable realization behavior B ∞ based on a realization model given by a regular descriptor system with dimension n J of the dynamical part and structured nilpotency index δ. For T ∈ Z + , let (V, Y) ∈ S i/o T −1 with corresponding expansion coefficients (v, y) ∈ C i/o T −1 and realization data (v̂, ŷ) ∈ B i/o T −1 . Further, assume that v̂ and the coefficients v i , i ∈ N, are persistently exciting of order L + n J + δ − 1.
Proof. While the lemma is formulated using the language of (stochastic) behaviors, the proof uses similar ideas as the one given by Pan et al. (2021). Assertion (i) follows from Lemma 11 and Theorem 23. We show (ii). Let g ∈ R T −L−δ+2 . For i ∈ N set g i . = Ĥ † H i g, where Ĥ † denotes the Moore-Penrose inverse of Ĥ. Then g = (g 0 , g 1 , . . . ) ∈ ℓ 2 (R T −L−δ+2 ) and G . = Σ i∈N φ i g i ∈ L 2 (Ω, R T −L−δ+2 ). As H i g ∈ colsp Ĥ by (i), we have Ĥg i = H i g for all i ∈ N, which yields the assertion.

Observe that from the applications point of view, the above lemma is subject to a severe limitation since persistency of excitation is assumed for all PCE dimensions i ∈ N. That is, in case of stochastic processes composed of i.i.d. random variables, the PCE coefficients have to be constant. Put differently, for i.i.d. random variables persistency of excitation of the corresponding PCE coefficients does not hold. Hence, the assumption of persistency of excitation is, in general, too strong.
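The failure of persistency of excitation for i.i.d. processes can be seen in a short numerical sketch (illustrative numbers): the PCE coefficient sequence of an i.i.d. process is constant in time, so its Hankel matrix has rank one.

```python
import numpy as np

# Sketch: an i.i.d. process W_k = mu + sigma*xi_k has time-constant PCE
# coefficients (e.g. its zeroth coefficient equals mu for every k). The
# Hankel matrix of a constant sequence has rank one, so persistency of
# excitation of any order L > 1 (which requires rank L) must fail.
def hankel(seq, L):
    return np.array([seq[i:i + L] for i in range(len(seq) - L + 1)]).T

mu_coeffs = np.full(20, 1.7)    # illustrative zeroth PCE coefficient sequence
H = hankel(mu_coeffs, 4)
print(H.shape, np.linalg.matrix_rank(H))   # (4, 17) and rank 1, not 4
```

This makes the limitation tangible: no amount of data length helps, since every column of the Hankel matrix is identical.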
The next result recaps an observation made by Pan et al. (2021), which relaxes the relation of column spaces from equivalence to inclusion. Doing so greatly improves applicability, since persistency of excitation is no longer assumed for the individual coefficients v i , where i ∈ N corresponds to the PCE dimension, but only for the realization data.
Corollary 29 (Column-space inclusion). Let there exist a controllable realization behavior B ∞ based on a realization model given by a regular descriptor system with dimension n J of the dynamical part and structured nilpotency index δ.
Assume that v̂ is persistently exciting of order L + n J + δ − 1. Then, we have the inclusion colsp H i ⊆ colsp Ĥ for all i ∈ N.

Based on the two results stated above, we can now formulate the fundamental lemma for stochastic descriptor systems. It extends and combines the developments of Pan et al. (2021) and Schmitz et al. (2022b).

Lemma 30 (Stochastic descriptor fundamental lemma).
Let there exist a controllable realization behavior B ∞ based on a realization model given by a regular descriptor system with dimension n J of the dynamical part and structured nilpotency index δ. Let (v, y) ∈ B i/o T −1 such that v is persistently exciting of order L + n J + δ − 1. Then, statements (42a) and (42b) hold.

Proof. The first statement leverages the column-space inclusion of Corollary 29 and follows together with Lemma 11 and Theorem 23. The linear equation (42a) is under-determined. Therefore, given some trajectory (ṽ, ỹ) ∈ C i/o L−1 , a particular solution g is given in terms of the pseudo-inverse.
The square summability of this g is obvious.
The second assertion follows combining the first one and Theorem 23, where the relationship between G and g is established by G = Σ i∈N φ i g i .
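While Lemma 30 concerns the stochastic descriptor case, its deterministic explicit-LTI core (cf. Lemma 11) can be sketched numerically. The matrices below are hypothetical illustration values, not taken from the paper: a fresh trajectory of the system is reproduced from the column space of Hankel matrices built from a single persistently exciting data trajectory.

```python
import numpy as np

# Sketch of the deterministic fundamental lemma for a hypothetical explicit
# LTI system: every length-L input/output trajectory lies in the column
# space of Hankel matrices built from one persistently exciting data record.
rng = np.random.default_rng(3)
A = np.array([[0.9, 0.2], [0.0, 0.5]])
B = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])

def simulate(u, x0):
    """Output trajectory of x+ = A x + B u, y = C x."""
    x, ys = np.array(x0, dtype=float), []
    for uk in u:
        ys.append(C @ x)
        x = A @ x + B * uk
    return np.array(ys)

def hankel(seq, L):
    """Depth-L Hankel matrix of a scalar sequence."""
    return np.array([seq[i:i + L] for i in range(len(seq) - L + 1)]).T

T, L = 60, 8
ud = rng.standard_normal(T)          # generic input: persistently exciting
yd = simulate(ud, [0.0, 0.0])
H = np.vstack([hankel(ud, L), hankel(yd, L)])

# fresh trajectory from an arbitrary initial state
u_new = rng.standard_normal(L)
b = np.concatenate([u_new, simulate(u_new, [0.3, -0.2])])

g, *_ = np.linalg.lstsq(H, b, rcond=None)
print(np.linalg.norm(H @ g - b))     # numerically zero: b lies in colsp(H)
```

A random input of sufficient length is generically persistently exciting of the required order, which is what makes the least-squares residual vanish.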

Discussion and Comments
Disturbance Modelling and Data Acquisition. For starters, we emphasize that the Hankel matrices in (42a) and (42b) are in terms of realization data. That is, they are constructed from measurement data, not from PCE data or random variables.
In this context, we remark that the conceptual split of the exogenous inputs (v, V) into manipulated controls (u, U) and process disturbances (w, W)-which is expressed in (24)-can be directly translated by splitting the Hankel matrices accordingly. In turn, this implies that disturbance data w [0,T −δ] is required for the application of the stochastic (descriptor) fundamental lemma. This can be done either via estimation or, for certain applications, via measurements. The latter is, e.g., the case for energy systems, where (w, W) would model volatile renewables or consumer demand, i.e., stochastic processes whose realizations are accessible through measurements. For first results on the estimation of past disturbance data in case of LTI systems, we refer to Pan et al. (2021). The preceding issue hints at the fact that the stochastic process disturbance W could be further split into disturbance and noise, W k = W d k + W n k , where W d k models non-Gaussian disturbances and W n k reflects Gaussian/non-Gaussian noise. Put differently, W n k can be used to capture the i.i.d. zero-mean part of the disturbance, which we would usually denote as process noise, while W d k would model further disturbances acting on the system, which could be non-i.i.d. component-wise and in time.
Moreover, measurement noise will corrupt the data entering the Hankel matrices in any real-world application. Indeed, in case of LTI systems the issue of robustness of the Hankel matrix descriptions with respect to measurements corrupted by noise has received widespread attention, see Dörfler et al. (2022); Yin et al. (2021a,b) for analysis of Hankel matrix predictions and Coulson et al. (2019b); Berberich et al. (2022) for robustness analysis in the context of data-driven MPC.
Alternative Approaches to UQ for Descriptor Systems. It is worth remarking that the stochastic fundamental lemma provides a handle for uncertainty propagation and Uncertainty Quantification (UQ) in descriptor systems which does not rely on explicit model knowledge.
To illustrate the appeal of the proposed approach, consider a descriptor system whose matrices depend on a random parameter Θ ∈ L 2 (Ω, R n θ ) which models the uncertainty surrounding the system matrices. Put differently, the realization Θ(ω) is constant for all time steps k ∈ N. Under the conditions outlined in Section 2, the deterministic fundamental lemma (Lemma 11) elegantly covers these cases in the input-output setting, i.e., it enables UQ. Indeed, without further elaboration, we remark
• that, under the assumption Θ ∈ L 2 (Ω, R n θ ), PCE has seen frequent use for model-based UQ and control design for the LTI counterpart of the uncertain descriptor system given above, see Wan et al. (2021, 2022); Fisher & Bhattacharya (2008), and
• that the data-driven setting does not require the assumption Θ ∈ L 2 (Ω, R n θ ) as long as any realization Θ(ω) remains finite and constant over the considered time horizon.
Lemma 30 pushes UQ and uncertainty propagation for uncertain descriptor systems even further as it allows to handle the case of X k , U k , W k , Y k ∈ L 2 (Ω, R n z ), n z ∈ {n x , n u , n w , n y }, and Θ ∉ L 2 (Ω, R n θ ) but constant over time. We remark that a model-based PCE approach to the easier setting with Θ ∈ L 2 (Ω, R n θ ) is subject to the fundamental complication that the multiplication of the uncertain system matrices with the random-variable states, inputs, etc.-e.g., A(Θ)X k , B(Θ)U k , . . . -leads to multiplication of PCE bases, which induces several technicalities in the Galerkin projection, cf. Mühlpfordt et al. (2018b). Note that the data-driven approach based on the stochastic fundamental lemma alleviates such problems entirely.
Alternative Hankel Constructions in the Lemma. One may wonder what happens if the column-space representation in (42b) is altered. There are two main ways to do so: (a) Use Hankel matrices in random variables and a real column-space selector g. Leaving the non-subtle technicality of defining persistency of excitation for V [0,T −1] aside, it can be shown that the image of g under the corresponding linear map λ specifies an element of S i/o L−1 . Yet, this construction does not span the entire finite-time behavior S i/o L−1 . A detailed PCE-based construction of an LTI counterexample and the proof of the inclusion statement are given by Pan et al. (2021).
(b) Use Hankel matrices in random variables and a random-variable column-space selector G, i.e., consider the image of G ∈ L 2 (Ω, R T −L−δ+2 ) under the Hankel matrices in random variables. At this point, we conjecture without further elaboration that the fundamental inclusion gives the entire manifest behavior, i.e., that the intersection of the co-domain of Λ and the set of length-L trajectories coincides with S i/o L−1 . However, in such a setting one has to handle nonlinear operations on PCEs which renders the formal analysis more complicated.

Data-Driven Stochastic Optimal Control
In the preceding sections we introduced behavioral characterizations of stochastic systems. Next we turn towards using these concepts for control. Specifically, we discuss data-driven optimal control of stochastic descriptor systems and the numerical solution of the arising problem. In terms of an outlook, we also comment on data-driven stochastic predictive control.

Behavioral Problem Formulations with S N and C N
For starters, we consider the stochastic system (22) with input partition (24), and we model the stochastic input U_k as a stochastic process adapted to the filtration {G_k}_{k∈N} as in Proposition 16. That is, for the sake of avoiding tedious technicalities, we suppose that the noise does not affect the algebraic subsystem. In the underlying filtered probability space (Ω, F, {G_k}_{k∈N}, P), the filtration accounts for the potential loss of causality with respect to the inputs induced by the descriptor structure.
We consider the conceptual behavioral Optimal Control Problem (OCP) (43) over the finite horizon N in the decision variables (U, W, Y), where Q ≻ 0, R ≻ 0 are symmetric positive definite weighting matrices of appropriate dimensions. Optimal solutions are denoted by (U^⋆, W^⋆, Y^⋆), whereby, due to the noise specification (43d), we have W^⋆ = W. The consistency condition (43c) together with R-observability of the underlying system guarantees that for a sufficiently large horizon N the latent state trajectory is uniquely defined. This requires n_J + δ − 1 consistent random-variable input and n_J output pairs, cf. Lemma 9. This consistency combined with the characterization of the filtration in Proposition 16 and Corollary 17 is the reason why the manifest behavior in (43b) has the horizon length N + δ + n_J − 1.
As the finite-horizon manifest behavior S^{i/o}_{N+δ+n_J−2} can be characterized by Hankel matrices in realization data (cf. Lemma 30), the data-driven reformulation of OCP (43) as OCP (44) is immediate. Note that the need to specify the disturbances in (43d) is alleviated in OCP (44), as this data is directly included in (44b).
Lemma 31. Consider the stochastic descriptor LTI system (22) and suppose that W_k for k ∈ I_{[0,N+δ+n_J−2]}, Ŷ_k for k ∈ I_{[0,n_J−1]}, and U_k for k ∈ I_{[0,δ+n_J−1]} admit exact PCEs of finite dimensions p_w and p_ini, respectively. Then (i) the optimal solution (U^⋆, Y^⋆, G^⋆) of OCP (44) with horizon N admits exact finite-dimensional PCEs with p terms, and (ii) the finite-dimensional joint basis (φ_i)_{i=0}^{p−1} is constructed from the bases of the problem data.
The proof is based on the observation that the Hankel matrix description (44b) is a linear equation, i.e., if the PCE basis for G contains the union of the bases of W, Y, and U, then the images Y and U can be expressed exactly in the joint basis. Exploiting the assumption of exact PCEs for the problem data W_k, Y_k, U_k leads directly to the basis construction. A similar construction considering explicit LTI systems and uncertain initial conditions is given in the proof of Proposition 1 of Pan et al. (2021). The crucial change is the consideration of the structured nilpotency index δ to extend the result to regular descriptor systems. For the sake of brevity, we omit the details of the proof.
The previous lemma implies that, as the prediction horizon N grows, the PCE dimension required for exactness in (i) increases linearly. The reason is that the realizations of W_k are independent at each time instant k ≤ N + δ + n_J − 2. Hence, the finite-dimensional polynomial basis in (ii) enables exact propagation of the uncertainties over finite horizons. Moreover, observe that, as the PCE basis grows linearly with the horizon N, the number of decision variables in a PCE reformulation grows quadratically in N.
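The linear basis growth and the resulting quadratic growth of the decision-variable count can be sketched with a back-of-the-envelope computation. The per-step variable count below is an illustrative assumption, not a quantity from the tutorial.

```python
# Scalar i.i.d. disturbance: each W_k contributes one first-order basis
# polynomial; together with the constant term the joint basis over the
# horizon has dimension N + 1 (cf. the linear growth stated above).
def pce_dimension(N):
    return N + 1

def num_decision_vars(N, n_vars_per_step=3):
    """Illustrative count: n_vars_per_step PCE-expanded decision
    variables per time step, each with pce_dimension(N) coefficients."""
    return N * n_vars_per_step * pce_dimension(N)

for N in (10, 20, 40):
    print(N, pce_dimension(N), num_decision_vars(N))

# doubling the horizon roughly quadruples the decision-variable count
assert num_decision_vars(40) / num_decision_vars(20) > 3.5
```

This quadratic scaling motivates the dimension-reduction techniques (null-space method, multiple shooting with moment matching) discussed later in the paper.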

Remark 32 (Filtered stochastic processes with PCE).
Considering the polynomial basis given in Lemma 31, the causality (non-anticipativity) of the filtration (G_k)_{k∈N} used in Proposition 16 implies (see Corollary 17) a condition on the PCE coefficients of the inputs. This condition ensures that U_k does not depend on the disturbance W_j at time instances j ≥ k − δ + 2. For a similar discussion in the explicit model-based setting we refer the reader to Ou et al. (2021).
Now, we are ready to reformulate the behavioral OCP (43) in the finite-horizon manifest PCE coefficient behavior as OCP (45). In comparison to OCP (43), which is formulated in random variables, OCP (45) is structurally similar in terms of the objective and the constraints (45b)-(45d). That is to say, (45b)-(45d) can be derived by applying Galerkin projections to their random-variable counterparts (43b)-(43d). The main structural change occurs in (45e), as this constraint models the causality requirement of the filtered stochastic process U in the applied PCE basis. Note that in the behavioral OCP (43) in random variables this causality is implicitly handled by the choice of the σ-algebra, which in turn is crucial in the definition of the L^2 probability spaces, cf. Proposition 16.
Similar to the change from the behavioral OCP (43) to the data-driven OCP (44), we now reformulate OCP (45) in data-driven fashion as OCP (46) in the decision variables (u, y, g), subject to the corresponding Hankel, consistency, and causality conditions. At this point, we do not repeat the comparison done for problems (43) and (44). The process of reformulations and the equivalence relations between the four considered OCPs are summarized in Figure 5.

Constrained Formulations and Implementation Aspects
Several comments on the considered OCPs, on how to extend their formulations with chance constraints, and on the numerical implementation are in order.
Chance Constraints. So far the considered OCPs (44) and (46) involve equality constraints which model behavioral constraints, consistency conditions (a.k.a. initial conditions of the dynamics), and causality requirements. Yet, in many applications it is of interest to also model inequality constraints in a probabilistic/stochastic setting.
Consider a scalar box constraint z ∈ [z, z̄]. Lifting z ∈ R to a probability space, i.e., to Z ∈ L^2(Ω, R), the previous deterministic requirement needs attention. There are three main options:
• In-expectation constraint satisfaction, i.e., one uses E[Z] ∈ [z, z̄]. This is a weak formulation in the sense that for arbitrary L^2(Ω, R) random variables it might happen that E[Z] ∈ [z, z̄] while many realizations Z(ω) lie outside [z, z̄]. In this case, the in-expectation satisfaction of the constraint might lead to erroneous conclusions.
• Robust constraint satisfaction, i.e., one requests that Z(ω) ∈ [z, z̄] for all ω ∈ Ω. In this setting the worst-case outcome ω ∈ Ω will likely dictate whether or not the constraint can be satisfied with certainty. Moreover, observe that in case of random variables with an unbounded set of outcomes Ω (e.g. Gaussian) such a constraint can never be satisfied with certainty.
• Probabilistic constraint satisfaction, i.e., one requires the constraint to hold in probability, P(Z ∈ [z, z̄]) ≥ 1 − ε, whereby the parameter ε ∈ [0, 1] is given and 1 − ε is usually denoted as the confidence level. Constraints of this type are referred to as chance constraints. In case ε = 0 holds, we say the constraint is satisfied almost surely.
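A quick Monte Carlo experiment illustrates the gap between the three notions. The distribution and the box bounds below are hypothetical illustrative numbers.

```python
import numpy as np

# Z ~ N(0.5, 1) against the box [z, zbar] = [-1, 1] (illustrative choice)
rng = np.random.default_rng(0)
z_samples = 0.5 + rng.standard_normal(100_000)
z_lo, z_hi = -1.0, 1.0

# in-expectation: only the mean must lie in the box
in_expectation = z_lo <= z_samples.mean() <= z_hi

# robust: every sampled outcome must lie in the box
robust = bool(np.all((z_lo <= z_samples) & (z_samples <= z_hi)))

# probabilistic: empirical violation probability
violation_prob = np.mean((z_samples < z_lo) | (z_samples > z_hi))

print(in_expectation, robust, round(float(violation_prob), 3))
```

Here the in-expectation constraint holds while a large fraction of realizations violates the box, and the robust notion fails because the Gaussian has unbounded support, which is exactly the discussion above.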
In many applications chance constraints are of tremendous interest as, in particular in the context of optimization problems, they allow trading off performance against constraint satisfaction, see Mesbah (2016). We also observe that there is a subtle difference between robust and almost-sure constraint satisfaction, as the latter allows for constraint violation on a subset of Ω with measure zero.
Naturally, this raises the question of how to formulate chance constraints in a numerically tractable fashion in the PCE framework. In stochastic MPC a common reformulation of scalar chance constraints is E[Z] + σ(ε)·√(V[Z]) ≤ z̄ (and analogously for the lower bound), cf. Farina et al. (2016). A conservative choice for σ is given by σ(ε) = √((2−ε)/ε). In case of Gaussian distributions one can also choose σ(ε) according to the standard normal table to avoid conservatism.
Let the PCE of Z be given as Z = Σ_{i=0}^{p_Z−1} z^i φ_i. Using (32), i.e., E[Z] = z^0 and V[Z] = Σ_{i=1}^{p_Z−1} (z^i)^2 E[φ_i^2], we obtain a reformulation of the chance constraint in the PCE coefficients. It is straightforward to see that this reformulation directly leads to second-order cone constraints. For further details on the reformulation of chance constraints we refer to Farina et al. (2016); Calafiore & Ghaoui (2006).

Initial Conditions. In the random-variable OCPs (43) and (44) we have phrased the consistency conditions (43c) and (44c) in terms of random variables. This setting entails the (more) common situation of deterministic initial conditions obtained from measurements as a special case. Moreover, it allows modelling additive measurement noise in the PCE framework by Y_k = y_k + M_k with M_k ∈ L^2(Ω, R^{n_y}). Suppose that a finite PCE for M_k is known and that M_k has zero mean; then the PCE for Y_k is immediately obtained. We emphasize that, for deterministic initial data, the PCE formulation of the consistency conditions (45c) and (46c) is directly able to handle this. All it takes is to set the PCE coefficients ŷ^i_k = 0 for i > 0. Moreover, it is well known that noise-corrupted data in the Hankel matrices and in the consistency constraints might lead to infeasibility or deficient numerical solution properties of the Hankel matrix constraints (44b) and (46b). A common remedy is to add slack vectors σ_i of appropriate dimension and to penalize them in the objective. Analyses of the implications of different penalization strategies have been done by, e.g., Coulson et al. (2019b); Yin et al. (2021a).
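A minimal sketch of the resulting coefficient-level check follows, assuming an orthogonal basis with known norms E[φ_i^2] and the conservative distribution-free factor σ(ε) = √((2−ε)/ε). Function and variable names are ours, not from the tutorial.

```python
import numpy as np

def chance_constraint_ok(z_coeffs, basis_norms, z_hi, eps):
    """Check E[Z] + sigma(eps) * sqrt(V[Z]) <= z_hi in PCE coefficients.
    z_coeffs[0] is the mean; V[Z] = sum_{i>=1} (z^i)^2 * E[phi_i^2]."""
    mean = z_coeffs[0]
    var = np.sum(z_coeffs[1:] ** 2 * basis_norms[1:])
    sigma = np.sqrt((2.0 - eps) / eps)   # conservative distribution-free factor
    return bool(mean + sigma * np.sqrt(var) <= z_hi)

# Z = 0.2 + 0.1 * He_1 with E[He_1^2] = 1: mean 0.2, std 0.1
z = np.array([0.2, 0.1])
norms = np.array([1.0, 1.0])
print(chance_constraint_ok(z, norms, z_hi=1.0, eps=0.1))   # → True
```

Since mean and standard deviation are, respectively, linear and norm-like in the coefficients, the constraint `mean + sigma * ||·||_2 <= z_hi` is a second-order cone constraint in the decision variables, as stated above.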
Numerical Implementation and Toolboxes. The discussion on slack variables above has already addressed aspects of numerical implementation. However, this subject warrants further comments. For starters, observe that the usual Hankel matrix equality constraints (44b) and (46b) entail large dense matrices. This as such is not numerically beneficial. This is evident upon comparison to model-based linear-quadratic OCP formulations with inequality constraints, in which the state recursion typically results in sparse equality constraints of favourable structure (Axehill, 2015).
It is known that Hankel matrices can also be constructed from segmented data (van Waarde et al., 2020). From a numerical perspective, it is even more promising to segment the time horizon, that is, to use Hankel matrices of smaller dimension and to couple the solution pieces by continuity constraints. This idea has been suggested by O'Dwyer et al. (2021). It resembles the classic concept of multiple shooting in the data-driven setting (Bock & Plitt, 1984). In a recent paper, we have shown that data-driven multiple shooting can be applied to the stochastic setting of OCP (46) (Ou et al., 2023). Specifically, one can combine the multiple-shooting idea with moment matching. This way the dimension of the PCE basis, and thus the number of decision variables, can be reduced considerably. We refer to Ou et al. (2023) for further details.
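The segmented-data construction can be sketched as a mosaic of per-segment Hankel blocks placed side by side. The segment data below are arbitrary illustrative numbers; the construction itself follows the idea of building Hankel matrices from multiple recordings.

```python
import numpy as np

def mosaic_hankel(segments, L):
    """Stack depth-L Hankel matrices of several independently recorded
    trajectory segments side by side (mosaic-Hankel construction)."""
    blocks = []
    for w in segments:
        w = np.atleast_2d(np.asarray(w, dtype=float))
        n, T = w.shape
        if T < L:
            continue                      # segment too short to contribute
        cols = [w[:, j:j + L].reshape(-1, order="F") for j in range(T - L + 1)]
        blocks.append(np.stack(cols, axis=1))
    return np.concatenate(blocks, axis=1)

# Two separately recorded scalar segments of lengths 8 and 6:
segs = [np.arange(8.0), np.arange(10.0, 16.0)]
H = mosaic_hankel(segs, L=3)
print(H.shape)   # → (3, 10): 6 columns from the first segment, 4 from the second
```

Each block only contains windows from within one recording, so no artificial transitions across segment boundaries enter the column space.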
Another aspect which requires attention is the computation of the numerous quadratures needed to evaluate the inner products E(φ_i φ_j) and to perform Galerkin projections of nonlinear equations. Fortunately, there exist a number of efficient numerical packages which can be used. This includes tools for Matlab (Petzke et al., 2020), julia (Mühlpfordt et al., 2020), and python (Feinberg & Langtangen, 2015; Baudin et al., 2017).

Numerical Examples
We present a scalar system subject to disturbances of alternating structure and a 4th-order descriptor system subject to Gaussian noise. The implementation is done in Matlab R2021b.

Scalar Example with Alternating Disturbance Sequence
We consider the scalar system from Ou et al. (2021). The stochastic disturbance W_k switches between two distributions, i.e.,
where an i.i.d. Gaussian noise models the disturbance for even time indices k and an i.i.d. uniform distribution at each odd value of k. We suppose that the disturbance distribution is known and sufficient past realization data of W is available, while its future realizations are not known. The example illustrates the flexibility of PCE to model stochastic disturbances beyond the purely Gaussian setting. The weighting matrices are Q = R = 1. Additionally, we add the term 2 Σ_{k=0}^{N−1} E[(U_{k+1} − U_k)^2] to smoothen the input sequence. We record 60 state-input-disturbance measurements to construct the Hankel matrices. We solve with horizon N = 20 for a randomly sampled deterministic initial condition. The dimension of the PCE basis is p = N + 1 = 21. Figure 6a depicts the first two moments of the solutions in terms of X^⋆ and U^⋆, while Figure 6b shows the PCE coefficient trajectories. Moreover, we sample a total of 20 sequences of noise realizations and compute all the corresponding state and input trajectories, see Figure 6c. As one can see, the state and input trajectories converge to 0 in expectation. Interestingly, the variance of the state and the input does not converge to 0, cf. Figure 6a. Figure 6c shows realizations for 20 distinct disturbance sequences. Observe that the increase in variance towards the end of the horizon is also visible for the realizations. In terms of realizations and moments, this is reminiscent of a turnpike property, cf. Ou et al. (2021); Faulwasser & Grüne (2022). A detailed discussion of the phenomenon in the stochastic setting is, however, beyond the scope of the current paper.
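The alternating disturbance model can be sampled as follows. This is a sketch with illustrative noise levels; the exact distribution parameters of the example may differ.

```python
import numpy as np

# Disturbance alternating between i.i.d. Gaussian (even k) and i.i.d.
# uniform (odd k) germs; 0.1 is an illustrative noise scale.
rng = np.random.default_rng(0)
N = 20
w = np.empty(N)
for k in range(N):
    if k % 2 == 0:
        w[k] = 0.1 * rng.standard_normal()     # Gaussian step, Hermite germ
    else:
        w[k] = rng.uniform(-0.1, 0.1)          # uniform step, Legendre germ

# Each W_k is a first-order polynomial in its own germ, so the per-step
# PCE is exact with one non-constant term; together with the constant
# term the joint basis over the horizon has dimension p = N + 1.
p = N + 1
print(p)   # → 21
```

The mixed Hermite/Legendre germs illustrate why the joint basis of Lemma 31 grows by one polynomial per time step, matching the basis dimension p = 21 reported for the example.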

Descriptor Example
We consider a stochastic extension of the 4th-order linear descriptor system considered in Schmitz et al. (2022b), from which the system matrices are taken. The system can be transformed into quasi-Weierstraß form with n_J = δ = 2 via suitable transformation matrices; R-controllability and R-observability are easily verified via Remark 4. Note that in the quasi-Weierstraß form we have NF_N = 0. Thus, causality with respect to the disturbance W is easily obtained in the PCE problem formulation as sketched in Remark 32. For all k ∈ I_{[0,N−1]}, the disturbance W_k is modelled as an i.i.d. Gaussian random variable with distribution N(0, 0.1^2).
We want to steer the system to the point (y, u) = ([20, 0]^T, 0) and solve the OCP in the form of (46) with horizon N = 20. The weighting matrices in the objective function are chosen to be Q = I_2 and R = 1.
In the data collection phase, we record 160 output-input-noise measurements to construct the Hankel matrices. With respect to the initial condition, we assume that no noise measurement is available at run-time. Thus we model the noise in OCP (46) via its PCE. Moreover, the input applied to the system is randomly sampled from a uniform distribution with support [−1, 1]. In sum, we obtain an uncertain consistency condition (a.k.a. initial condition) (46c), which is modelled via PCE.
Slack variables are added to the initial condition as in (47), and we penalize them via the 1-norm with a weighting parameter. Moreover, since the PCE coefficients of W are known, we employ the null-space method to reduce the dimension of the coefficients g^i and thus accelerate the computation. For further details we refer to Pan et al. (2021).
The trajectories of the first two moments of Y and U are depicted in Figure 7a, while the trajectories of Y and U for 20 different noise realizations are shown in Figure 7b. Furthermore, we sample a total of 1000 initial conditions as well as noise realization sequences and compute the corresponding output/input trajectories. The evolution of the normalized histograms of the output realizations y_1 at k = 0, 4, 9, 14, 19 is illustrated in Figure 7c. As one can see, the proposed data-driven optimal control achieves a narrow distribution of Y_1 around y_1 = 20.

Discussion and Open Problems
As discussed in Section 2.2, the fundamental lemma as such has deep roots in behavioral systems theory. In this tutorial we proposed extensions of the fundamental lemma to deterministic and stochastic descriptor systems, cf. also Figure 2. To this end, we formalized concepts for behavioral characterizations of stochastic descriptor systems. We introduced the stochastic L^2-behavior and its counterpart in terms of series expansion coefficients (which for suitable polynomial chaos expansions can be shown to be finite-dimensional). We have analyzed how these behaviors relate to the behavior of sampled trajectories (i.e., the realization behavior). We have also shown that the description in stochastic moments is structurally limited: (i) moments are by their very construction nonlinear projections of random variables, and (ii) this projection, especially if limited to the first two moments, is subject to a substantial loss of information.
Moreover, we remark that while the stochastic behavioral description as such does not necessitate the existence of a finite series expansion, a finite-dimensional parametrization of the random variables fosters numerical tractability in applications.

Data-driven Output-Feedback Predictive Control. While the fundamental lemma was published by Willems et al. (2005) already in 2005, its impact and reception peaked upon realizing that it opens a path towards data-driven output-feedback MPC (Yang & Li, 2015; Coulson et al., 2019a). Not only does the data-driven approach alleviate the need for explicit identification of a state-space model, it also works with input-output data. Recall that MPC as such is traditionally a model-based state-feedback strategy, i.e., observer design is an inevitable step in almost all implementations. The data-driven approach allows one to overcome this burden. In view of the contributions of this tutorial, our developments pave the road towards data-driven output-feedback stochastic MPC. First steps towards data-driven stochastic MPC, i.e., problem formulation and stability analysis for the case of usual LTI systems with state feedback, have been taken by Pan et al. (2021, 2022). At the time of writing these lines, the analysis for the output-feedback case is work in progress. The extension to the stochastic descriptor case is completely open. Another dimension, which will be key for the success of data-driven MPC, is tailored numerical methods. While for model-based MPC powerful codes enable computation times down to a few microseconds (depending on the problem size and the computational platform), the results on multiple-shooting formulations of the OCP, which we commented on in Section 6, are first steps in this direction. Yet, they leave substantial room for further refinements and improvements.

Robustness.
Robustness analysis for data-driven descriptions of deterministic systems has been subject to widespread research efforts (Coulson et al., 2019b; Yin et al., 2021b; Huang et al., 2021). We expect that, due to the close relation between the realization behavior and the coefficient behavior, cf. Theorem 25, these results can be transferred to the stochastic setting.

Data-driven Analysis of Descriptor Systems. The data-driven approach can be used beyond feedback design. Indeed, it also allows the analysis of system properties, see, e.g., Romer et al. (2019b,a) for passivity and dissipativity. When it comes to descriptor systems, much less has been done. Specifically, the verification of structural properties besides the dimension of z_J in Lemma 14 (nilpotency index, regularity of the matrix pencil, etc.) would be appealing.

Nonlinear Systems. Data-driven control of nonlinear systems is a topic which has seen substantial progress. One can mention the recent work on data-driven prediction of so-called observables, i.e., real- or complex-valued functions of the state, in the Koopman framework by means of (extended) Dynamic Mode Decomposition (DMD), see Brunton & Kutz (2022) as well as the recent survey by Brunton et al. (2021) and the references therein, or SINDy (Brunton et al., 2016). The Koopman framework can also be applied for control. To this end, Arbabi et al. (2018) augmented the state dimension n_x by the number of input variables n_u to set up a surrogate model, see also Mauroy et al. (2020). To alleviate the curse of dimensionality, Williams et al. (2016), Surana (2016), and Klus et al. (2020) proposed a bilinear approach for control-affine systems which shows superior performance for systems with coupling between state and control, see also (Mauroy et al., 2020, Section 4). Moreover, Schaller et al.
(2022) provide rigorous bounds on the prediction error for control, explicitly depending on the dictionary size (projection) and the number of employed data points (estimation). The analysis of the estimation error can be extended to the setting of stochastic differential equations and ergodic sampling (Nüske et al., 2021). However, data-driven control and fundamental-lemma-type results for nonlinear stochastic systems are not known yet. Similarly, there is substantial prospect in deriving fundamental-lemma-like results for infinite-dimensional systems.

Conclusions
This paper has made steps towards a behavioral theory for, and data-driven control of, stochastic systems. We have constructed equivalent behavioral characterizations of stochastic descriptor systems in terms of L^2-random variables and in terms of series expansions of these variables. In other words, we have connected the seminal contributions of Jan C. Willems and co-workers with concepts put forward by Norbert Wiener. Our approach allows extending the celebrated fundamental lemma to stochastic descriptor systems. Crucially, this extension works with Hankel matrices in realizations of stochastic variables, i.e., it relies on measurement data only. Thus, the presented contributions open up new perspectives on data-driven control of stochastic systems, on data-driven uncertainty quantification, and on data-driven uncertainty propagation.