Linear Parameter-Varying Subspace Identification: A Unified Framework

In this paper, we establish a unified framework for subspace identification (SID) of linear parameter-varying (LPV) systems to estimate LPV state-space (SS) models in innovation form. This framework enables us to derive novel LPV SID schemes that are extensions of existing linear time-invariant (LTI) methods. More specifically, we derive the open-loop, closed-loop, and predictor-based data-equations, i.e., input-output surrogate forms of the SS representation, by systematically establishing an LPV subspace identification theory. We show the additional challenges of the LPV setting compared to the LTI case. Based on the data-equations, several methods are proposed to estimate LPV-SS models using a maximum-likelihood or a realization-based argument. Furthermore, the established theoretical framework for the LPV subspace identification problem allows us to lower the number of to-be-estimated parameters and to overcome dimensionality problems of the involved matrices, leading to a decrease in the computational complexity of LPV SIDs in general. To the authors' knowledge, this paper is the first in-depth examination of the LPV subspace identification problem. The effectiveness of the proposed subspace identification methods is demonstrated and compared with existing methods in a Monte Carlo study of identifying a benchmark MIMO LPV system.

The field of subspace identification applies realization theory to find SS model estimates based on surrogate input-output (IO) models (with appropriate noise models) directly estimated from data. These specialized IO models are estimated by convex optimization and it can be shown that they correspond to a maximum-likelihood (ML) estimate under the considered assumptions. Then, an SS realization is obtained from the IO model either by a direct realization step or by an intermediate projection step. In the latter approach, a projection is found to estimate the unknown state-sequence via matrix decomposition methods, after which the SS matrices are estimated in a least-squares fashion. Obtaining such a state-sequence relies heavily on realization theory, as the estimated state basis should be consistent with the behavior of the underlying system. In the LTI setting, the IO model estimation and realization of the SS model under the presence of process and measurement noise are well understood [9][10][11][12]. In the LPV case, contrary to the LTI setting, the stochastic interpretation of the methods with the appropriate noise representation is not well understood, nor has the connection between the various methods ever been studied.
LPV subspace schemes also suffer heavily from the curse of dimensionality, e.g., see [8, Table 1], resulting in ill-conditioning of the estimation problem and high parameter variance. Consequently, two common assumptions are taken to reduce the dimensionality: (i) the excitation, in terms of the variation of the scheduling variable p, is periodic or white [2,3], and/or (ii) the output-equation of the SS representation is assumed to be p-independent [1,5,8]. However, such assumptions restrict the practical applicability of the methods. To tackle ill-conditioning and to reduce estimation variance, kernel-based regularization techniques have been proposed [8,13]. However, the computational complexity of the involved kernels grows polynomially or exponentially w.r.t. the design parameters, which significantly compromises the effectiveness of these schemes. Alternatively, SS models can be estimated directly by minimization of the ℓ2-loss in terms of the prediction-error associated with the model. These so-called prediction-error methods (PEM) minimize the ℓ2-loss directly using gradient-based methodologies [14][15][16][17] or by the expectation-maximization scheme [17]. However, minimization of the ℓ2-loss w.r.t. the LPV-SS model parameters is a nonlinear and nonunique optimization problem, requiring an initial estimate close to the global optimum.
The goal of this paper is to obtain a unified formulation to treat the LPV subspace identification problem and to derive its associated stochastic properties by systematically establishing an LPV SID theory. This unified framework enables us to (i) understand the relations and performance of LPV SIDs, (ii) extend most of the successful LTI subspace schemes to the LPV setting, (iii) decrease the dimensionality problems, and (iv) relax assumptions on the scheduling signal. In addition, we establish stochastic LPV realization theory, which provides state estimation with maximum-likelihood efficiency. To the authors' knowledge, this paper is the first in-depth treatment of subspace theory in the LPV case. In this paper, we focus on projection-based schemes, but the direct realization schemes can easily be abstracted from the developed results, e.g., see [18]. We follow well-known concepts from the LTI literature, e.g., [9,11], and our theoretical results also build on preliminary studies in the LPV setting [8,13,19]. The main contributions of this paper are: i) formulating the state estimation problem by a maximum-likelihood approach based on canonical correlation analysis and by a realization-based approach; ii) a stochastic interpretation of state estimation with maximum-likelihood efficiency under the presence of noise; iii) a computationally efficient formulation of SIDs to decrease the effects of the curse of dimensionality. The unified subspace theory is tackled in the global identification setting, i.e., under general trajectories of the scheduling signal, contrary to some results in the literature [3,20,21].
The proposed schemes could also be applied in a setting where the scheduling signal contains additive white noise that might be correlated with the input additive noise. In such a case, the IO estimation step could be performed by using an instrumental variable approach [22]. However, investigation of such a formulation is outside the scope of this paper. This paper is organized as follows: first, the assumed data-generating system with LPV-SS representation and general innovation noise structure is presented and the open-loop, closed-loop, and predictor-based data-equations are derived (Sec. 2). Then, the considered parametric LPV-SS identification problem is introduced (Sec. 3). Next, the state realization problem is tackled from a maximum-likelihood and a realization-based argument, first for the open-loop identification setting (Sec. 4) and then for the closed-loop identification setting (Sec. 5), leading to the LPV formulation of various well-known LTI subspace methods. The efficiency of the unified framework is demonstrated by a Monte Carlo study on an LPV-SS identification benchmark (Sec. 6).

The data-generating system
The goal is to obtain an SS model estimate of the data-generating system S_o, represented in the LPV-SS innovation form (1),¹ where x : Z → X = R^{n_x} is the state variable, y : Z → Y = R^{n_y} is the measured output signal, u : Z → U = R^{n_u} is the input signal, p : Z → P ⊂ R^{n_p} is the scheduling signal, and ξ : Z → R^{n_y} is the sample path realization of a zero-mean stationary process, where ξ_t : Ω → R^{n_y} is a white noise process with sample space Ω (set of possible outcomes) and Ξ² ∈ R^{n_y×n_y} is a positive definite covariance matrix. Furthermore, we will assume u, p, y, ξ to have left compact support to avoid technicalities with initial conditions. The matrix functions A(·), . . . , K(·) defining the SS representation (1) are affine combinations of bounded scalar functions ψ^{[i]}(·) : P → R, as in (3), where the coefficient matrices {A_i, . . . , K_i}_{i=0}^{n_ψ} are constant, real matrices with appropriate dimensions and ψ^{[0]}(·) ≡ 1 is assumed to be constant. Additionally, for well-posedness, it is assumed that the functions {ψ^{[i]}}_{i=1}^{n_ψ} are linearly independent over an appropriate function space and are normalized w.r.t. an appropriate norm or inner product [23].

¹ In the majority of the subspace literature [2,9,11], the data-generating system is assumed to be in the innovation form as given in (1). However, in [18], it is shown that the noise description in (1) is not equivalent to a state-space form with a general noise representation, i.e., a representation with different noise processes on the state and output equations. [18] also shows that a static, affine K(p_t) can approximate the general setting if the state dimension is increased. In practice, we often need to restrict the parameterization of K, e.g., to the static, affine parameterization in (1), to reduce the complexity of the estimation method and the variance of the model estimates. Hence, despite the possible increase of the state order of the equivalent innovation form, the usage of this affine form has been found adequate in practical applications [1][2][3][5].
Due to the freedom to consider arbitrary functions ψ^{[i]}, (3) can capture a wide class of static nonlinearities and time-varying behaviors. For notational simplicity, we define
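As a small, concrete illustration of the affine dependency structure (3), the sketch below evaluates a matrix function A(p) = A_0 + ψ^{[1]}(p) A_1 at a frozen scheduling value. This is only a hedged example: the function name `affine_matrix` and all numeric values are hypothetical, chosen purely for illustration.

```python
import numpy as np

def affine_matrix(coeffs, psi_vals):
    """Evaluate M(p) = sum_i psi_i(p) * M_i for the affine dependency (3).

    coeffs   : constant coefficient matrices [M_0, M_1, ..., M_npsi]
    psi_vals : basis function values [psi_0(p), ..., psi_npsi(p)], psi_0 = 1
    """
    return sum(psi * M for psi, M in zip(psi_vals, coeffs))

# Example: A(p) = A_0 + p * A_1, i.e., basis psi = [1, p]
A0 = np.array([[0.5, 0.1],
               [0.0, 0.3]])
A1 = np.array([[0.1, 0.0],
               [0.0, 0.1]])
p = 0.5
A_p = affine_matrix([A0, A1], [1.0, p])
```

The same evaluation applies to B(·), C(·), D(·), and K(·) with their respective coefficient matrices.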

The open-loop data-equation
The first step in tackling the subspace identification problem is to represent the dynamics of the data-generating system (1) as an equivalent IO representation, the so-called data-equation. The unknowns in these data-equations are estimated by convex optimization and the final SS model is obtained from these data-equations using matrix decomposition techniques (see Sec. 3-5 for more details). Hence, the data-equations are key in formulating the subspace problem.
Open-loop data-equations are rarely used in the literature, as the innovation noise ξ_t is unknown. In light of the MAX identification setting in [18,24], the innovation noise ξ_t can be uniquely obtained by convex optimization, which renders the open-loop equations attractive for further investigation, similar to [25] in the LTI setting. Using (1b), the output w.r.t. a future window f ∈ N_+, where N_+ = {i ∈ Z | i > 0}, starting from time-instance t can be written as (4), where ž_t is the extended "input" signal and y_t^{t+f}, ξ_t^{t+f}, and ž_t^{t+f} are stacked sequences according to the given notation. Furthermore, the matrix functions in (4) are given in (5a)-(5d), where A_t, . . . , Ď_t is a shorthand notation for A(p_t), . . . , Ď(p_t) and the product over i = 1, . . . , f is considered with left multiplication. In (4) and (5a)-(5d), the ⋄ operator is a shorthand notation for dynamic dependency on the scheduling signal. Next, the state can be decomposed by using the past values of the input and noise signals as in (6), with past window p ∈ N_+, past data ž_{t-p}^{t}, and the matrices in (7a)-(7b). Combining the output-equation based on the future values (4) with the state-equation based on the past values (6) results in the open-loop data-equation (8), which has the form of a MIMO LPV-IO model. Estimating the underlying IO relationship of (8) requires the input-scheduling pair (u, p) and the innovation noise ξ to be uncorrelated in order to obtain an unbiased estimate of the relationship (8) under PEM, e.g., see [11,26,27]. The case when (u, p) and ξ are uncorrelated is usually referred to as the open-loop identification setting [11,28,29], characterized by the following two assumptions:

A.1 The input signal u is quasi-stationary² and uncorrelated with ξ.
A.2 The scheduling signal p is quasi-stationary and uncorrelated with ξ.

Assumptions A.1 and A.2 are not restrictive, for example, when considering the LPV modeling problem of a thermal loop in a wafer scanner.
The thermal distribution of the wafer varies with the position, but it does not influence the measurement noise of the position sensor and, therefore, the position as scheduling signal fulfills Assumption A.2. On the other hand, an inverted pendulum setup with stabilizing controller where the angle of the pendulum is the scheduling signal (and output) will not satisfy Assumptions A.1-A.2. In such a case, p is correlated with past values of ξ due to the closed-loop interconnection between plant and controller.

The closed-loop data-equation
To overcome the limitations of the open-loop setting, the data-equation (8) can be written in an alternative form. Analogously to LTI identification [11,26,27], the output-equation (1b) is substituted into the state-equation (1a), resulting in (9), where z̃_t = [u_t^⊤ y_t^⊤]^⊤ and the corresponding matrix functions are given in (10).

² The generalized expectation operator Ē of a process u is defined as Ē{u_t} = lim_{N→∞} (1/N) Σ_{t=1}^{N} E{u_t}. A process u is said to be quasi-stationary if there exist finite c_1, c_2 ∈ R such that i) ‖E{u_t}‖² < c_1 for all t, and ii) Tr Ē{u_t u_{t-τ}^⊤}² < c_2 for all τ, e.g., see [29].
It is important to note that (9) does not depend explicitly on the stochastic process ξ. Hence, the state-equation (9) can be treated in a deterministic setting. However, moving from the open-loop to the closed-loop dynamics comes at the cost of polynomial dependency of the Ã and B̃ matrix functions. As opposed to the LTI setting where K(p_t) = K ∈ R^{n_x×n_y}, applying (9) instead of (1a) increases the model complexity. Using (9), the stacked output-equation (4) can equivalently be represented as (11). In (11), (Õ_f ⋄ p)_t denotes the observability matrix with Ã instead of A, (L̃_f ⋄ p)_t is constructed with Ã, B̃, and the future values z̃_t^{t+f} are stacked similarly to ž_t^{t+f}. The state can be written as a combination of past signals (similar to (6)) as in (12), where p ∈ N_+ is the past window, (R̃_p ⋄ p)_t denotes the reachability matrix (7a) with Ã and B̃ instead of A and B, and X̃_p is the initial condition (7b) with Ã instead of A.
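Since (9) is obtained by substituting the output-equation into the state-equation, the closed-loop matrices at each scheduling value take the familiar predictor form Ã = A − KC and B̃ = [B − KD  K], acting on z̃ = [u^⊤ y^⊤]^⊤. The sketch below numerically checks this equivalence for one frozen scheduling value; all matrices are random placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, nu, ny = 2, 1, 1
A = 0.3 * rng.standard_normal((nx, nx))
B = rng.standard_normal((nx, nu))
C = rng.standard_normal((ny, nx))
D = rng.standard_normal((ny, nu))
K = rng.standard_normal((nx, ny))

x = rng.standard_normal(nx)
u = rng.standard_normal(nu)
xi = rng.standard_normal(ny)           # innovation sample

y = C @ x + D @ u + xi                 # output-equation (1b)

# Open-loop state update (1a): x_{t+1} = A x_t + B u_t + K xi_t
x_next_ol = A @ x + B @ u + K @ xi

# Closed-loop form (9): x_{t+1} = (A - K C) x_t + (B - K D) u_t + K y_t
x_next_cl = (A - K @ C) @ x + (B - K @ D) @ u + K @ y
```

Both updates produce the same next state, which is exactly why (9) can be treated deterministically: the innovation ξ_t enters only through the measured output y_t.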
Combining (11) and (12) results in the closed-loop data-equation (13). To formulate our identification problem in the closed-loop case, we take the following assumptions:

A.3 The input signal u is quasi-stationary and uncorrelated with future values of ξ.
A.4 The scheduling signal p is quasi-stationary and uncorrelated with future values of ξ.

Assumptions A.3 and A.4 allow us to identify systems under general feedback structures, e.g., see [11,28,29].

Derivation of the predictor
A commonly applied data-equation for subspace identification is the predictor form, e.g., see [8,27,30]. To formulate the one-step-ahead predictor of the output, the closed-loop state-equation (9) is substituted into the output-equation (1b) and the conditional expectation is taken, resulting in the predictor ŷ of (14). Note that (14) is the minimal variance estimator of y_t and that (14) represents an LPV-ARX model, where p → ∞ will diminish the influence of the initial condition X̃_p under the assumption that Ã is stable. The one-step-ahead predictor of the output can be stacked similarly to the closed-loop data-equation (13), leading to the predictor-based data-equation (15). Note that (15) is the one-step-ahead predictor of (13). Hence, the SS representation of S_o can be captured by the predictor (14), from which (15) can be constructed [2,8,27,30].

Parametric subspace identification setting
Known LTI and LPV subspace schemes are based on the aforementioned data-equations or their simplifications. The subspace schemes rely on matrix decomposition techniques applied to O_f R_p to obtain a realization of these two matrices; however, these decomposition techniques cannot be directly applied to parameter-varying matrices. As shown in [8], the main difficulty comes from the time-varying observability matrix, as the dependency structure of the reachability matrix can be absorbed into an extended input vector.
In this paper, we are interested in estimating the unknown matrices {A_i, . . . , K_i}_{i=0}^{n_ψ}, i.e., a parameter vector θ of dimension (n_ψ + 1)(n_x² + 2n_y n_x + n_u n_x + n_y n_u). The parameters of the data-generating system S_o are denoted as θ_o and we denote with S(θ′) the model (1) with parameters θ′. The identification problem of SS models based on a data set D_N = {(y_t, p_t, u_t)}_{t=1}^{N} has non-unique solutions up to a transformation matrix, e.g., see [18,19]. Hence, we aim at identifying an isomorphic, jointly state-minimal S(θ) w.r.t. S(θ_o), defined by the set (16), where the indistinguishability manifold S is given in (17).
Given a data set D_N and the basis functions {ψ^{[i]}}_{i=0}^{n_ψ}, the goal of this paper is to obtain a consistent estimate θ̂ of the data-generating system S_o such that θ̂ → θ ∈ I_{θ_o} with probability one as N → ∞. For the identification setting to be well-posed, the following standard assumptions are taken:

A.5 S(θ_o) is an element of the model set, meaning that ∃θ ∈ Θ such that θ ∈ I_{θ_o}.
A.6 The state-minimal SS representation with static, basis affine dependency structure of the system S_o is structurally state-observable w.r.t. the pair (A(p_t), C(p_t)) and structurally state-reachable w.r.t. the pair (A(p_t), [B(p_t) K(p_t)]).
A.7 The open-loop dynamics A(p_t) or closed-loop dynamics Ã(p_t) are asymptotically stable for the open-loop or closed-loop case, respectively.
A.8 The past window p is chosen sufficiently large, such that X_p ≈ 0 or X̃_p ≈ 0, ∀p ∈ P^Z, for the open-loop or closed-loop case, respectively.

We can only estimate system dynamics that manifest in the data, so the system is represented with a structurally minimal IO-equivalent SS representation, as formalized in Assumption A.6. With Assumption A.7, the influence of the initial state x_{t-p} can be neglected in (8), (13), or (15). This property is widely applied in subspace identification [8,9,[30][31][32]]. See [31, Lemma 5] for an upper bound on the approximation error of this assumption. Note that we do not take the assumption that either (C(p), D(p)) or K(p) is parameter-independent to reduce the complexity of the IO model, as opposed to state-of-the-art subspace schemes [8,13].
Next, we will develop a unified theory to extend the LTI N4SID, MOESP, CVA, SS-ARX, and PBSID principles to the LPV case. There are two significant differences with respect to the LTI case. Firstly, almost all LTI formulations apply a (partial) ARX model structure; however, in the LPV case, the LPV-ARX model comes with a significantly larger parameterization compared to the MAX representation in the open-loop setting. Secondly, we apply a predictor pre-estimation step to identify the unknown quantities of the matrices O_f Ř_p, Ľ_f, Õ_f R̃_p, etc., and construct the full matrices, instead of estimating the matrices O_f Ř_p, Ľ_f, Õ_f R̃_p directly using the data-equations (8) or (13). Direct estimation of the matrices by oblique projections comes with a significant computational cost [31, Table 1] compared to the predictor formulation [8, Table 1], especially in the LPV case. Furthermore, direct estimation of the matrices does not take the structural restrictions of Ľ_f into account, which leads to a non-causal model estimate, as pointed out in [33]. Therefore, we follow an alternative route by estimating a predictor in the pre-estimation step and constructing O_f Ř_p, Ľ_f, Õ_f R̃_p from it, to lower the computational demand and to enforce a causal model, similar to recent literature [11,26,27,34].

Subspace identification in open-loop form
In this section, we derive two methods to realize the state-sequence based on the open-loop data-equation (8). The first method is based on a maximum-likelihood argument using canonical correlation analysis (CCA) (Sec. 4.2) and the second applies a realization-based argument (Sec. 4.3). The latter deterministic state realization approach results in the LPV extension of various LTI schemes by using different weighting matrices in the state realization step.

Main concept
The stochastic and the deterministic approaches use the fact that the observability and reachability matrices can be decomposed into a parameter-independent and a parameter-dependent part. To this end, the p-step extended reachability matrix Ř_p and the f-step extended observability matrix O_f are defined accordingly. Using Assumption A.8, the open-loop data-equation (8) can be decomposed as in (20). Data-equation (20) describes the IO relations of the data-generating system based on an SS form. The unknowns in this IO relation are the so-called sub-Markov parameters. Using the relation (20), the sub-Markov parameters and the unknown noise sequence ξ_t can be estimated by LPV-MAX model estimation using convex optimization [18, Thm. 5.5].
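Since the (sub-)Markov parameters enter the data-equation (20) linearly, they can be estimated by ordinary least squares once a regressor of past data is built. The sketch below shows this idea for the LTI special case (n_ψ = 0, scalar signals); in the LPV case the regressor additionally contains products of past inputs with the basis functions ψ^{[i]}(p_t). All numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 2000, 10                        # data length and past window

# Simulate a stable SISO system: x_{t+1} = a x_t + b u_t, y_t = c x_t + e_t
a, b, c = 0.7, 1.0, 1.0
u = rng.standard_normal(N)
e = 0.01 * rng.standard_normal(N)
x = 0.0
y = np.zeros(N)
for t in range(N):
    y[t] = c * x + e[t]
    x = a * x + b * u[t]

# Regress y_t on the past-input window [u_{t-1}, ..., u_{t-p}]:
# the resulting coefficients are the Markov parameters c a^{k-1} b.
Phi = np.array([u[t - p:t][::-1] for t in range(p, N)])
theta, *_ = np.linalg.lstsq(Phi, y[p:], rcond=None)

true_markov = np.array([c * a**k * b for k in range(p)])
```

With white input excitation, the tail of the impulse response beyond the past window acts as additional zero-mean noise, so the truncated regression remains unbiased; this mirrors the role of Assumption A.8.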
In this section, the state realization is accomplished by assuming that the sub-part of the structural observability matrix O_f associated with the parameter-independent part of the SS representation, i.e., C_0 and A_0, is of full column rank (a common assumption applied in practice [1,2,5,6]).⁴ To this end, define the scheduling-independent observability matrices O_f^0 and Õ_f^0 for the open-loop and closed-loop settings, respectively.⁵

A.9 The scheduling-independent part of the observability matrix is of rank n_x, i.e., rank(O_f^0) = n_x or rank(Õ_f^0) = n_x for the open-loop or closed-loop case, respectively.

To make use of this assumption, the observability matrix O_f in (20) is split into a part that depends on O_f^0 and a part that does not, as in (21). Based on (21), we introduce the data-equation (22) describing the so-called open-loop corrected future. Using an LPV-MAX estimate of (20), the open-loop corrected future in (22) can be efficiently computed from data. Then, using this surrogate variable, (21) can be reduced to the data-equation (23), which excludes the time-variation in the observability matrix. Representation (23) allows matrix decomposition methods to be applied to (23) to obtain an estimate of the state-sequence.

Maximum-likelihood estimation
The corrected formulation (23) is the fundamental data-equation to obtain an estimate of the state-sequence. In this section, the state-sequence is estimated using canonical correlation analysis. CCA is a well-known method in statistics that finds a (lower-dimensional) space that maximizes the correlation between two random variables [35]. In our case, this translates to the objective of finding the unknowns O_f^0, Ř_p, and the subspace of x_t by maximizing the correlation between the corrected future and the past data, e.g., see [10,27,[36][37][38]] to mention a few. In [10,27], statistical optimality of CCA in the LTI setting has been shown by formulating the optimal one-step-ahead predictor of the state based on either the past or future data. We take an alternative viewpoint by formulating an estimate of the state-sequence by maximizing the log-likelihood function associated with the least-squares (LS) estimation problem of the unknowns O_f^0 Ř_p based on the corrected future and past signals of the model (23). [37,38] claim maximum log-likelihood of the state estimate obtained by CCA; however, the mathematical derivations are scattered within the literature and appear to be incomplete, as pointed out in [36]. In Theorem 1, we prove the maximum log-likelihood property for the LPV case. For notational simplicity, let us define the following data-matrices, with the matrices Ũ and Ṽ given by:

⁴ In such a case, additional assumptions should be taken on the associated scheduling variable to fulfill the observability criterion, which is not treated here to simplify the discussion.
⁵ The closed-loop scheduling-independent observability matrix is presented here for compactness of the paper.

Theorem 1 (CCA based state estimation: open-loop case) Under Assumptions A.1-A.2 and A.5-A.9, the estimate (25), where Ṽ_{n_x} denotes the first n_x columns of Ṽ, is a maximum-likelihood estimate of the state-sequence. The associated log-likelihood function minimized by this estimate is (26), where S̃ = diag(s̃_1, . . . , s̃_{n_x}).
PROOF. See the Appendix.
In the case of infinite data, i.e., N → ∞, S̃ will contain exactly n_x nonzero singular values, which are equal to one (see [18, Remark 9.1]). In the case of finite data, the state order n_x can be selected by a gap in magnitude between the singular values [9,11]. Alternatively, the stochastic interpretation of the CCA in Theorem 1 allows for a data-driven selection of the model order n_x based on the log-likelihood function L in (26) using an information criterion such as Akaike's or the Bayesian information criterion [29]. An in-depth investigation of order selection is beyond the scope of this paper.
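The CCA computation behind Theorem 1 can be sketched schematically: whiten the corrected-future and past data matrices, take the SVD of the whitened cross-covariance (its singular values are the canonical correlations used for order selection), and read the state estimate off the leading canonical variates of the past. The function name and data below are hypothetical, and numerical safeguards are omitted.

```python
import numpy as np

def cca_state_estimate(Yc, Zp, nx):
    """Sketch of CCA-based state realization from a corrected-future data
    matrix Yc and a past data matrix Zp (columns are time instances)."""
    N = Yc.shape[1]
    Ryy = (Yc @ Yc.T) / N
    Rzz = (Zp @ Zp.T) / N
    Ryz = (Yc @ Zp.T) / N
    Ly = np.linalg.cholesky(Ryy)
    Lz = np.linalg.cholesky(Rzz)
    # Whitened cross-covariance; its singular values are canonical correlations
    M = np.linalg.solve(Ly, np.linalg.solve(Lz, Ryz.T).T)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Leading canonical variates of the past span the estimated state
    X = Vt[:nx] @ np.linalg.solve(Lz, Zp)
    return X, s

# Toy usage: the "future" is strongly correlated with one direction of the past
rng = np.random.default_rng(3)
N = 2000
Zp = rng.standard_normal((3, N))
Yc = Zp[:1] + 0.1 * rng.standard_normal((1, N))
X, s = cca_state_estimate(Yc, Zp, nx=1)
```

In this toy case a single canonical correlation is close to one, illustrating the singular-value gap used to select n_x.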
Using the estimate of the state-sequence x̂, the state-space matrices {A_i, . . . , K_i}_{i=0}^{n_ψ} are estimated using two linear regression steps, e.g., see [31, Sec. 2.5]. The first step is a standard ℓ2-loss minimization problem based on the output-equation (1b), with the solution (27), where Φ† defines the right pseudo-inverse of Φ and the regression matrices are defined accordingly. Using the output-equation (1b), an estimate of the innovation noise is found in the form of the residual error of (27). The remaining state-space matrices are estimated by a second linear regression step based on the state-equation (1a).
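The two regression steps can be sketched as follows for the LTI special case with scheduling-independent matrices (the true state is assumed known here, standing in for the state estimate x̂; function name and values are hypothetical). Step one regresses the output on [x_t; u_t] to obtain (C, D) and the innovation residual; step two regresses x_{t+1} on [x_t; u_t; ξ̂_t] to obtain (A, B, K).

```python
import numpy as np

def estimate_ss_matrices(xhat, u, y):
    """Two-step least-squares estimation of the innovation-form matrices
    (LTI special case). Columns of xhat, u, y are time instances."""
    nx, nu = xhat.shape[0], u.shape[0]
    X, U, Y = xhat[:, :-1], u[:, :-1], y[:, :-1]
    # Step 1: output-equation regression, cf. (27)
    Phi1 = np.vstack([X, U])
    CD = Y @ np.linalg.pinv(Phi1)                  # [C D]
    Xi = Y - CD @ Phi1                             # innovation residual
    # Step 2: state-equation regression
    Phi2 = np.vstack([X, U, Xi])
    ABK = xhat[:, 1:] @ np.linalg.pinv(Phi2)       # [A B K]
    return (ABK[:, :nx], ABK[:, nx:nx + nu], CD[:, :nx],
            CD[:, nx:], ABK[:, nx + nu:])

# Noise-free sanity check with a known system and the exact state
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.0, 0.3]]); B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]]);             D = np.array([[0.2]])
N = 200
u = rng.standard_normal((1, N))
x = np.zeros((2, N))
for t in range(N - 1):
    x[:, t + 1] = A @ x[:, t] + B @ u[:, t]
y = C @ x + D @ u
A_est, B_est, C_est, D_est, K_est = estimate_ss_matrices(x, u, y)
```

In the LPV case, the regressors are extended with the products of the state, input, and residual with the basis functions ψ^{[i]}(p_t), so each coefficient matrix A_i, . . . , K_i appears as a separate block in the solution.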

Realization based estimation
In Section 4.2, a statistical viewpoint has been taken based only on the input-scheduling-corrected output signals. Alternatively, inspired by the Ho-Kalman scheme in [23] or the PBSID scheme in [2], the problem can be tackled from the realization point of view, i.e., the state is realized by decomposing the sub-Markov coefficients in O_f^0 Ř_p. As opposed to the Ho-Kalman scheme, we are interested in the state-sequence of the innovation form (1), including the noise dynamics. In the LTI case, these ideas have been extensively exploited, resulting in many variants of subspace identification schemes, e.g., see [9,11,39]. As recognized in [9, Thm. 12], most LTI subspace schemes only differ by left- and right-multiplication of the Hankel matrix with different weightings. Following this concept, a unified LPV formulation of subspace schemes can be introduced:

Theorem 2 (Unified state realization: open-loop case) Given an LPV data-generating system (1) and an associated data set D_N, let O_f^0 Ř_p be estimated based on (23). Compute the SVD (32), where the full rank weightings can be taken as, e.g., HK: W_1 = I. The Ho-Kalman (HK), numerical subspace state space system identification (N4SID), and projected canonical correlation analysis (p-CCA) weightings follow the default naming in the subspace literature, see [9]. Then, a realization of the state-sequence is given by (31).

PROOF. The LPV estimation problem can be rewritten in an LTI formulation, because the IO model (23) has a parameter-independent observability matrix. Hence, taking (6) and Assumption A.8 into account, right-multiplying the reachability matrix Ř_p in (32) with the data matrix Ž_{p,N} leads to a realization of the state as in (31).
An important fact in Theorem 2 is the absence of the closed-loop dynamics, contrary to the LTI case [9,39], and the required pre-estimation step to obtain O_f^0 Ř_p. As opposed to the LTI case, we do not apply oblique projections to remove the effect of future inputs, e.g., see [9, Sec. 4.2] (the oblique projection is an indirect LS estimation and prediction step).
In the LPV case, we apply the pre-estimation step, making the oblique projections of the future input superfluous; therefore, a MOESP-like weighting is not present. The unified formulation in Theorem 2 applies N4SID- and CCA-like weightings to the estimated Hankel matrix, but it is not an LPV extension of these methods, due to the missing oblique projections. In addition, it is important to note that the CCA in Theorem 1 and the p-CCA in Theorem 2 are different, as CCA is based on stochastic realization theory and signal relations while p-CCA is based on pre-estimated sub-Markov coefficients. This theoretical split can also be found in the LTI literature, e.g., between [40] and [9], respectively. Both principles are equivalent for N → ∞, as the oblique projections and least-squares estimates are consistent and unbiased [41]. The choice of the weightings W_1 and W_2 has been discussed by many authors. In the LTI case, it has been proven that W_1 has no influence on the asymptotic accuracy of the estimates, see [41][42][43]. On the other hand, on finite data, the optimal choice is still an open question.
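In miniature, the weighted-SVD realization of Theorem 2 reads: given an estimate of the rank-n_x Hankel matrix H = O R and full rank weightings W_1, W_2, the truncated SVD of W_1 H W_2 yields factors of H up to a state-basis change. The sketch below uses the Ho-Kalman choice W_1 = W_2 = I and random placeholder factors; the function name is hypothetical.

```python
import numpy as np

def weighted_realization(H, W1, W2, nx):
    """Factor H = O_est @ R_est (up to a state basis) from the rank-nx
    truncated SVD of W1 @ H @ W2, cf. the unified weighting scheme."""
    U, s, Vt = np.linalg.svd(W1 @ H @ W2, full_matrices=False)
    S_half = np.diag(np.sqrt(s[:nx]))
    O_est = np.linalg.solve(W1, U[:, :nx] @ S_half)    # extended observability
    R_est = S_half @ Vt[:nx] @ np.linalg.inv(W2)       # extended reachability
    return O_est, R_est, s

# Ho-Kalman weighting (W1 = I, W2 = I) on a synthetic rank-2 Hankel matrix
rng = np.random.default_rng(2)
nx = 2
O_true = rng.standard_normal((6, nx))
R_true = rng.standard_normal((nx, 8))
H = O_true @ R_true
O_est, R_est, s = weighted_realization(H, np.eye(6), np.eye(8), nx)
```

Different full rank choices of W_1 and W_2 change the realized state basis but not the product O_est R_est, which is the sense in which the weightings unify the different schemes.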
For any applied weighting, the estimated state-sequence in the unified formulation (31) is not guaranteed to have unit variance compared to the estimate by the CCA method in (25). In the LTI case, it can be shown that the resulting model estimate is stochastically balanced for any choice of the weighting [44], similar to deterministic Ho-Kalman realization. In the LPV case, the observability and reachability Gramians are scheduling-dependent and the authors believe that the state-sequence (31) is structurally balanced, but formally proving this property is a subject of future research.

Subspace identification in closed-loop form
The concepts of the presented state estimation and realization schemes for the open-loop identification setting in Th. 1 and 2 will be extended to the closed-loop case in this section. Similar to the open-loop case, the realization problem is first tackled from an ML point of view (Sec. 5.2) and then from a deterministic realization viewpoint (Sec. 5.3). We would like to emphasize that the scheme presented in [8] simplifies the realization problem by considering the matrix functions C, D, and K constant. No such assumption will be taken next.

Main concept
The construction of the matrices Ñ_{t,f}, Õ_f, R̃_p, and M̃_{t,p} in the closed-loop case is more involved due to the multi-quadratic parameterization of Ã(p_t) and B̃(p_t) in (10). First, define all unique combinations of the scheduling-induced variation ψ_t ⊗ ψ_t as the signal μ : Z → M ⊂ R^{n_μ+1} with dimension n_μ = ½ n_ψ(n_ψ + 3), called the extended scheduling variable in the sequel. Next, define the closed-loop p-step extended reachability matrix R̃_p and the closed-loop f-step extended observability matrix Õ_f. Finally, the scheduling-dependent data-matrices are defined accordingly. By applying the aforementioned matrices and Assumption A.8, Eq. (13) reads as (35). The unknown quantities found in Õ_f R̃_p and L̃_f are the sub-Markov coefficients of the multi-quadratic parameterization of the closed-loop formulation (35). These unknown quantities can be estimated by a linear regression of an LPV-ARX model.
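The dimension n_μ = ½ n_ψ(n_ψ + 3) of the extended scheduling variable follows from counting the unique entries of ψ_t ⊗ ψ_t when ψ^{[0]} ≡ 1: one constant, n_ψ linear terms, and n_ψ(n_ψ + 1)/2 products ψ^{[i]}ψ^{[j]} with i ≤ j. A small sketch with hypothetical basis values:

```python
from itertools import combinations_with_replacement

def extended_scheduling(psi):
    """Unique entries of psi ⊗ psi for psi = [1, psi_1, ..., psi_npsi]:
    the constant 1, each psi_i, and each product psi_i * psi_j with i <= j."""
    return [a * b for a, b in combinations_with_replacement(psi, 2)]

n_psi = 3
psi = [1.0, 0.5, -0.2, 0.7]          # psi_0 = 1 plus n_psi basis values
mu = extended_scheduling(psi)

# Dimension check: len(mu) = n_mu + 1 = n_psi * (n_psi + 3) / 2 + 1
```

The quadratic growth of n_μ in n_ψ is precisely the cost of moving from the open-loop affine parameterization to the closed-loop multi-quadratic one.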
The developed concepts of the open-loop setting can be applied to obtain a realization of the model in the closed-loop setting. This concept has been used successfully in the LPV literature, e.g., in [8,31,38] to mention a few. To this end, the closed-loop counterpart of (21) is (36). Based on (36), the closed-loop corrected future can be introduced; then, using the same principle as in the open-loop case, (36) is reduced to the data-equation (38), in which the time-variation in the observability matrix disappears due to the use of the corrected future. The closed-loop form (38) enables us to treat the state realization problem equivalently to the open-loop case in Section 4.

Maximum-likelihood estimation
The corrected formulation (38) is the fundamental data-equation to obtain an estimate of the state-sequence. In this section, the state-sequence is estimated using the ML point of view introduced in Section 4.2. For notational simplicity, let us define the data-matrices analogously to the open-loop case. For the closed-loop case, we can reformulate Th. 1 as:

Theorem 3 (CCA based state estimation: closed-loop case) Given an LPV data-generating system (1) and an associated data set D_N, with the data-matrix Z̃_{p,N} and the matrices Ũ and Ṽ given by:

Under Assumptions A.3-A.4 and A.5-A.9,
the corresponding estimate provides a maximum-likelihood estimate of the state-sequence. The associated log-likelihood function minimized by this estimate is given by the analogous expression, where S̃ = diag(s̃_1, . . . , s̃_{n_x}).
PROOF. Follows the same reasoning as in Theorem 1 with trivial adaptations. The complete proof is found in [18].
Note that Theorem 3 is the LPV counterpart of the LTI SS-ARX scheme presented in [32]. Hence, an important contribution of our framework is the extension and generalization of CCA to the LPV setting, making it possible to directly extend the SS-ARX scheme. In addition, as a contribution, the derived CCA setting allows us to prove the maximum-likelihood property and to obtain the log-likelihood function of the estimate, which had not been formally proven in the LTI case, see [32].

Realization based estimation
The state estimation problem has been tackled from the input-scheduling-corrected output statistics point of view in Theorem 3. Alternatively, the state-sequence realization problem can be interpreted as a weighted decomposition of the stochastic, closed-loop Hankel matrix Õ_f^0 R̃_p. More specifically, the concepts introduced for the open-loop case in Section 4.3 can be directly extended to the closed-loop case, leading to a unified theory which immediately extends various LTI subspace methods to the LPV case:

Theorem 4 (Unified state realization: closed-loop case) Given an LPV data-generating system (1) and an associated data set D_N, let Õ_f^0 R̃_p be estimated based on (38). Compute the following SVD, where the full rank weightings can be taken as, e.g., HK: W_1 = I. The HK, predictor based subspace identification (PBSID), and projected state-space autoregressive exogenous (p-SS-ARX) weightings follow the default naming in the subspace literature, see [8,9,32]. Then, a realization of the state-sequence is given by (42).

PROOF. Based on a similar argument as for Theorem 2.
In [8,27] the implementation and derivation of the PBSID method is accomplished differently. Without exploring the theoretical basis, similar to the general theory given above, the authors in [8,27] aimed at realizing a computationally efficient estimator by computing the SVD W_1 Õ_0^f R̃_p Z̃_p,N = U S V^⊤ and realizing the state by X̂_N = S^{1/2} V_nx^⊤. Obviously, this method is equivalent to the above defined PBSID weighting, but it is computationally more efficient as it avoids the square root operation in (42). Additionally, [27] proves asymptotic equivalence between LTI PBSID and LTI SS-ARX. Extension of this proof to the LPV case has not been accomplished yet, but it is likely to hold. A so-called "optimal" formulation of Theorem 4 can also be derived [18, Sec. 9.8] based on the LTI formulation [27]. The general idea of [27] is to prove that the initial condition X_p in the data-equation falls within the variance of the estimator and can be neglected if the past window p is chosen large enough. This concept translates to taking the corresponding assumption on the past window. Moreover, Theorems 1, 2, 3, and 4 can straightforwardly be modified such that A(p_t), . . . , K(p_t) are affinely dependent on individual basis functions.
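The SVD-based state realization used by the shortcut of [8,27] can be sketched in a few lines of Python. The matrices below are generic placeholders (the weighted product W_1 Õ_0^f R̃_p Z̃_p,N is replaced by a toy rank-n_x product), so the snippet only illustrates the realization step, not the full identification procedure.

```python
import numpy as np

def realize_state(weighted_product, nx):
    """State realization from a single SVD, as in the PBSID-style shortcut:
    compute the SVD of the full weighted data product once and set
    X = S^{1/2} V_nx^T, truncated to the desired state order nx."""
    U, s, Vt = np.linalg.svd(weighted_product, full_matrices=False)
    return np.diag(np.sqrt(s[:nx])) @ Vt[:nx, :]

# Toy check: a noiseless rank-nx product yields a state-sequence that
# equals the true one up to a (non-unique) similarity transform T.
rng = np.random.default_rng(0)
nx, N = 2, 100
O = rng.standard_normal((6, nx))       # placeholder observability-type factor
X_true = rng.standard_normal((nx, N))  # placeholder true state-sequence
Xhat = realize_state(O @ X_true, nx)
T, *_ = np.linalg.lstsq(X_true.T, Xhat.T, rcond=None)
assert np.allclose(T.T @ X_true, Xhat, atol=1e-8)
```

The final assertion reflects the well-known fact that subspace methods recover the state only up to a state-basis transformation; any T relating the two sequences is equally valid.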

Remark 6
To lower the computational load w.r.t. the IO estimation and realization, we can apply a kernelization-based computation similar to [8,13].
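While the full kernelized SID computation of [8,13] is beyond this remark, the core identity it exploits is the standard dual form of ℓ2-regularized least squares, which replaces a d×d solve by an N×N solve. This matters here because the number of LPV regressors d grows rapidly with the past window and the scheduling dimension, and typically exceeds the number of samples N. A minimal illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, lam = 50, 2000, 0.1       # few samples, many regressors
Phi = rng.standard_normal((N, d))
y = rng.standard_normal(N)

# Primal ridge solution: requires a d x d solve, O(d^3)
theta_primal = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)
# Dual (kernelized) solution: only an N x N solve, O(N^3), same estimate
theta_dual = Phi.T @ np.linalg.solve(Phi @ Phi.T + lam * np.eye(N), y)
assert np.allclose(theta_primal, theta_dual, atol=1e-6)
```

The identity (Φ^⊤Φ + λI)^{-1}Φ^⊤ = Φ^⊤(ΦΦ^⊤ + λI)^{-1} holds exactly; only the N×N Gram matrix ΦΦ^⊤ is ever needed, which is the essence of the kernel trick.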

Simulation Example
In this section, we will demonstrate the performance of the discussed LPV subspace identification schemes on the benchmark example given in [13]. The developed subspace schemes are compared to the PBSID opt method [8].

Identification setting
The benchmark is based on a MIMO LPV-SS model with input dimension n_u = 2, scheduling dimension n_p = 2, state dimension n_x = 2, and output dimension n_y = 2. To be able to compare the developed approaches to existing LPV subspace methods, we consider the simplified setting with a scheduling-independent innovation noise model, where the SNR is defined per output channel y^[i]. To evaluate the statistical properties of the subspace schemes, we carry out two simulation studies with N = {10^3, 10^4} samples in the identification data set D_N, and in each simulation study N_MC = 100 Monte Carlo runs are executed. In each run, new realizations of the input and scheduling signals are used. The simulated output or one-step-ahead predicted output ŷ of the estimated model is compared to the measured output and the one-step-ahead predicted output y of the true system (oracle), respectively, by means of the best fit rate (BFR) using a validation data set D_val as in [13]. In (44), ȳ denotes the mean of the simulated output or the one-step-ahead predicted output y of the oracle.
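Assuming the BFR in (44) takes the form commonly used in the LPV identification literature, BFR = 100% · max(0, 1 − ‖y − ŷ‖_2 / ‖y − ȳ‖_2), a minimal per-channel implementation reads:

```python
import numpy as np

def bfr(y, y_hat):
    """Best fit rate in percent for a single output channel:
    100 * max(1 - ||y - y_hat|| / ||y - mean(y)||, 0)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    num = np.linalg.norm(y - y_hat)
    den = np.linalg.norm(y - np.mean(y))
    return 100.0 * max(1.0 - num / den, 0.0)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(bfr(y, y))                        # 100.0: a perfect fit
print(bfr(y, np.full(4, np.mean(y))))   # 0.0: no better than the mean
```

A BFR of 100% corresponds to a perfect fit, while 0% means the model does no better than predicting the mean of the validation output.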
For the open-loop setting, the MAX model is estimated using pseudo-linear regression (PLR), where the update is determined by the enhanced Gauss-Newton method [18, Appx. B.3]. Note that this optimization problem is convex. The PLR is initialized with an FIR model estimate using ℓ2-regularized least squares with generalized cross-validation (GCV) to select the regularization parameter [45, Sec. 6.1.4]. For the closed-loop setting, an ARX model is estimated using ℓ2-regularized least squares with GCV. The PBSIDopt method [8] uses Tikhonov regularization with GCV.
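The GCV score used to select the ℓ2-regularization parameter can be evaluated cheaply from a single SVD of the regressor matrix. The sketch below implements the textbook criterion V(λ) = N‖(I − H(λ))y‖² / tr(I − H(λ))², with hat matrix H(λ) = Φ(Φ^⊤Φ + λI)^{-1}Φ^⊤; the function name and the candidate grid are illustrative, not the exact procedure of [45].

```python
import numpy as np

def gcv_ridge(Phi, y, lambdas):
    """Pick the ridge parameter minimizing the GCV score, evaluated
    efficiently through one SVD of Phi: in the SVD basis the hat matrix
    is diagonal with shrinkage factors s^2 / (s^2 + lam)."""
    N = Phi.shape[0]
    U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
    Uty = U.T @ y
    best_score, best_lam = np.inf, None
    for lam in lambdas:
        f = s**2 / (s**2 + lam)                       # shrinkage factors
        # ||(I - H)y||^2 split into the span of U and its orthogonal complement
        resid2 = np.sum(((1 - f) * Uty)**2) + (y @ y - Uty @ Uty)
        score = N * resid2 / (N - np.sum(f))**2       # GCV criterion
        if score < best_score:
            best_score, best_lam = score, lam
    return best_lam
```

Because tr(H(λ)) = Σ s_i²/(s_i² + λ), the whole λ-grid is swept without refactorizing Φ, which is why GCV is an attractive default for the regularized FIR and ARX estimation steps.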
Next, we provide a summary of the used design parameters, which are optimized to provide the highest BFR. Figure 1 also shows that the structural estimation bias of the realization based schemes is bigger than that of the maximum-likelihood schemes. The structural bias is caused by the fact that the initial condition in Assumption A.8 is not yet small enough. The bias can be further reduced by increasing the past window p; however, this will increase the parameter variance and, therefore, decrease the overall BFR of the estimate.

Analysis of the results
Compared with PBSIDopt proposed in [8], we can see that the direct implementations of the CCA and SS-ARX schemes have comparable performance. Theoretically, however, PBSIDopt should be close to the BFR performance of the standard implementation of PBSID, but Table 1 and Figure 1 highlight a clear difference, which is caused by the kernel trick of PBSIDopt that significantly improves numerical accuracy. In addition, the difference in BFR between some realization techniques is on the order of 10^-9.
This is, for example, the case for HK OL, N4SID, and p-CCA with a data set of sample size N = 10^3. It indicates that the IO estimation step is dominant over the realization step in terms of the BFR for these particular cases. These observations show how important it is to develop a numerically efficient implementation of the developed subspace identification schemes to enhance their performance beyond the theoretical developments of this paper. Therefore, extending the kernel implementation to Theorems 1, 2, 3, and 4 is an important objective for future research. Furthermore, while the comparison is provided here with p-independent innovation noise models, the subspace schemes developed in this paper are capable of accomplishing state estimation under p-dependent noise scenarios that are beyond the capabilities of the current state-of-the-art.

Conclusion
In this paper, we have presented a unified framework to formulate extensions of subspace identification methods for LPV identification by systematically developing an LPV subspace identification theory. Based on the derived open-loop, closed-loop, and predictor-based data-equations, several methods have been proposed to estimate LPV-SS models in one unified framework based on a maximum-likelihood or a realization based argument. Hence, we have shown how to extend LTI CVA, SS-ARX, PBSID, and N4SID to the LPV setting. The effectiveness of the presented subspace identification methods is demonstrated in a Monte Carlo study by identifying a MIMO LPV benchmark system. An important future direction of research is to improve numerical efficiency and reduce the computational load of the developed methods.

There might be other solutions to the minimization problem of (A.12); however, we take the solution equal to the CVA solution of [37]. Hence, the latter choice of Q maximizes the marginal likelihood function (A.5). An estimate of the reachability matrix R̃_p is obtained by reformulating (A.9) as R̂_p = Q^⊤ Ṽ^⊤ = [I_nx 0] Ṽ^⊤, (A.14) which results in selecting the first n_x columns of Ṽ. Then, the estimates of the observability matrix (A.4a) and noise covariance (A.4b) follow equivalently when all the singular values are selected, due to the product S̃Q. In case the number of data points goes to infinity, i.e., N → ∞, S̃ will contain exactly n_x nonzero singular values. In the finite data case, the number of states in the realization is selected based on Q. In conclusion, the SVD (26) maximizes the marginal-likelihood function of the linear estimation problem (23) w.r.t. the unknowns O_0^f, Ř_p, and the covariance Ξ^2, with state-sequence (25) and log-likelihood function (26).
Note that in early literature on CVA SID [37,40], the constrained SVD (A.6) was performed with an arbitrary positive-definite weight Λ ∈ S such that I = Ṽ^⊤ Λ Ṽ, which is called the CVA method. The CCA and CVA methods coincide for the weighting choice in (A.6). The CVA method with Λ = (1/N) Ž_p,N (Ž_p,N)^⊤ leads to a minimal prediction-error solution [40, Eq. (10)], but will not lead to a maximum-likelihood estimate.
Lemma 7 (Poincaré separation theorem) Let A ∈ S^n and B ∈ R^{n×r} be matrices such that B^⊤ B = I_r. Let λ_i{·} denote the i-th eigenvalue of a matrix, sorted in descending order. Then, λ_i{B^⊤ A B} ≤ λ_i{A}, i = 1, . . . , r.
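A quick numerical sanity check of Lemma 7, with a random symmetric A and a random B having orthonormal columns:

```python
import numpy as np

# Poincaré separation theorem: for symmetric A and B with B^T B = I_r,
# the sorted eigenvalues of B^T A B are bounded by those of A.
rng = np.random.default_rng(0)
n, r = 6, 3
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                   # symmetric A in S^n
B, _ = np.linalg.qr(rng.standard_normal((n, r)))    # orthonormal columns
eig_A = np.sort(np.linalg.eigvalsh(A))[::-1]        # descending order
eig_BAB = np.sort(np.linalg.eigvalsh(B.T @ A @ B))[::-1]
assert np.all(eig_BAB <= eig_A[:r] + 1e-10)
```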