Learning linear modules in a dynamic network using regularized kernel-based methods

In order to identify one system (module) in an interconnected dynamic network, one typically has to solve a Multi-Input-Single-Output (MISO) identification problem that requires identification of all modules in the MISO setup. For application of a parametric identification method this would require estimating a large number of parameters, as well as an appropriate model order selection step for a possibly large-scale MISO problem, thereby increasing the computational complexity of the identification algorithm to levels that are beyond feasibility. An alternative identification approach is presented, employing regularized kernel-based methods. Keeping a parametric model for the module of interest, we model the impulse responses of the remaining modules in the MISO structure as zero mean Gaussian processes (GP) with a covariance matrix (kernel) given by the first-order stable spline kernel, accounting for the noise model affecting the output of the target module and also for possible instability of systems in the MISO setup. Using an Empirical Bayes (EB) approach, the target module parameters are estimated through an Expectation-Maximization (EM) algorithm with a substantially reduced computational complexity, while avoiding extensive model structure selection. Numerical simulations illustrate the potential of the introduced method in comparison with state-of-the-art techniques for local module identification.

In this paper we focus on the local module identification problem. In Dankers et al. (2016) and Van den Hof et al. (2013), the classical direct method for closed-loop identification (Ljung, 1999) has been generalized to the framework of a dynamic network. Similarly, in Dankers et al. (2016), Gevers et al. (2018) and Van den Hof et al. (2013), the indirect identification methods have been generalized to the dynamic network framework. A direct method to handle correlated process noise has been provided in Ramaswamy and Van den Hof (2021) and Van den Hof et al. (2019). A method that combines the frameworks of the direct and the indirect method by using additional excitation signals as predictor inputs has been introduced in Ramaswamy et al. (2019). Considering the effect of sensor noise in the measurements, the aforementioned setting has been generalized in Dankers et al. (2015). A simultaneous minimization of the prediction error approach is introduced in Günes, Dankers, and Van den Hof (2014) for identifying the target module in a dynamic network with only sensor noise. This method has been extended to a Bayesian setting in Everitt et al. (2018), where regularized kernel-based methods are used to decrease the variance of the estimated target module.
In this paper we aim at improving the performance of the direct method for dynamic networks, since the direct method exploits both the external excitation signals and the noise signals for data informativity. Assuming a known topology of the network, in Van den Hof et al. (2013) it was shown that, in order to identify a given module of interest using the direct method, we have to formulate a multi-input single-output (MISO) identification problem where the inputs of the MISO setup correspond to the inputs of all modules of the network sharing the same output with the module of interest (see Section 3 for details). A relaxed setup has been provided in Dankers et al. (2016), where the MISO setup contains only a subset of the above mentioned inputs. This implies that, in both approaches, to avoid possible bias in the parameter estimates, one has to identify all the modules constituting the MISO structure, bringing into the problem a possibly high number of parameters to be estimated that are of no primal interest to the experimenter. For example, considering the network in Fig. 1 with the target module of interest for identification being G_31, one has to identify G_31, G_32 and G_34. In addition, a model order selection step needs to be performed to select the number of parameters for each module using complexity criteria like AIC, BIC, or cross-validation (Ljung, 1999). For this, it is required to test a number of combinations of candidate model orders that increases exponentially with the number of models in the MISO structure, making the model order selection step computationally infeasible (e.g., for 5 modules with an FIR model structure and orders from 1 to 5, one has to test 5^5 possible combinations). More importantly, if any of the modules constituting the MISO structure is unstable, the prediction error identification approaches available in the literature cannot be used, since the predictors are unstable. We stress that the presence of unstable modules is compatible with stable input-output dynamics in a network. For example, in the network of Fig. 1 the effect of instability in G_31 and/or G_32 could be canceled by suitable controllers G_23 and/or G_12.
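To make the combinatorial growth concrete, the following minimal sketch (not from the paper; the numbers mirror the example in the text) counts the candidate model-order combinations that a full model order selection would have to evaluate:

```python
from itertools import product

n_modules = 5                   # modules in the MISO structure (example from the text)
candidate_orders = range(1, 6)  # FIR orders 1 to 5

combos = list(product(candidate_orders, repeat=n_modules))
print(len(combos))  # 5**5 = 3125 order combinations to evaluate
```

Each combination requires fitting and scoring a full MISO model, which is why the selection step quickly becomes infeasible as the number of modules grows.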
In this paper, we address the aforementioned problems by developing an identification method based on non-parametric regularized kernel-based methods that
• identifies a local module through a direct approach, exploiting both the external excitation signals and the disturbance signals for data informativity,
• avoids the complexity of model order selection for large-scale problems,
• reduces the number of nuisance parameters that need to be estimated in local module identification, and
• can be used irrespective of the stability of the modules in the MISO structure, with no need of prior information on possible unstable modules.
In Ramaswamy et al. (2018), a method to improve the performance of the direct method for dynamic networks based on non-parametric regularized kernel-based methods has been introduced. Even though the method in Ramaswamy et al. (2018) achieves the first three above mentioned objectives, it does not achieve the fourth and cannot be used in the presence of unstable modules in the MISO structure. In the current paper, building upon the preliminary work of Ramaswamy et al. (2018), we provide a different and unified framework to identify the module of interest, which does not depend on the stability of the modules in the MISO structure.
In order to develop this method, we build on the following approach. We keep a parametric model for the target module of interest in order to have an accurate description of its dynamics. The impulse responses of the remaining modules in the MISO structure are modeled as zero mean Gaussian processes (GP), with covariance (or kernel) given by the first-order stable spline kernel (Chen, Ohlsson, & Ljung, 2012; Pillonetto, Dinuzzo, Chen, De Nicolao, & Ljung, 2014), which encodes stability and smoothness of the processes. However, we need to handle the prior inclusion of the stability property using kernel-based methods in the presence of unstable modules, and also incorporate process noise modeling in our framework to avoid increased bias in the estimated target module. We do this by appropriately rewriting the network dynamics.
Using the aforementioned approach, we obtain a Gaussian probabilistic description that depends on a vector of parameters η containing the parameters of the module of interest, the variance of the output noise, and the hyperparameters characterizing the stable spline kernel. Therefore, estimating η provides the parameters of the target module. This is accomplished by using an Empirical Bayes (EB) approach (Maritz & Lwin, 1989), where η is estimated by maximizing the marginal likelihood of the data, which requires solving a nonlinear non-convex optimization problem. To this end, we use the Expectation-Maximization (EM) method (Dempster, Laird, & Rubin, 1977), which provides a solution by iterating over simple sub-problems that either admit analytical solutions or require solving scalar optimization problems. Numerical experiments performed on simulated dynamic networks show the potential of the developed method in comparison with available classical methods.
This paper is organized as follows. In Section 2, the setup of the dynamic network is defined. Section 3 provides a summary of the direct method, and the extension of this framework using regularized kernel-based methods, leading to a marginal likelihood estimation problem, is provided in Section 4. Next, we provide the approach and solution to the marginal likelihood problem using the EM method. Section 6 provides the results of numerical simulations performed on simple dynamic networks, which is followed by the Conclusions. The technical proofs of all results are collected in the Appendix.

Problem statement
Following the setup of Van den Hof et al. (2013), we consider a dynamic network that is built up of L scalar measurable internal variables or nodes w_j(t), j = 1, ..., L. The dynamic network is defined by the equation

w(t) = G^0(q) w(t) + r(t) + v(t),   (1)

where w(t) := [w_1(t) ... w_L(t)]^T, r(t) and v(t) are defined analogously, and G^0(q) is the L × L transfer function matrix with entries G^0_jk(q).
The representation in (1) is an extension of the Dynamic Structure Function (DSF) representation (Gonçalves & Warnick, 2008). In the above equation,
• q^{-1} is the shift (delay) operator, i.e. q^{-1}u(t) = u(t − 1);
• G^0_jk(q) is a strictly proper rational transfer function for j = 1, ..., L and k = 1, ..., L;
• v_j(t) is an unmeasured process noise entering node w_j(t). It is a realization of a stationary stochastic process represented by v_j(t) = H^0_j(q)e_j(t), with e_j(t) a Gaussian white noise process with unknown variance σ^2_j and H^0_j(q) a monic, stable and minimum-phase filter;
• r_j(t) is a measured external excitation signal entering node w_j(t). In some nodes, it may be absent.
We assume that the dynamic network is stable, i.e. (I − G^0(q))^{-1} is stable, and well posed (see Van den Hof et al., 2013 for details).
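Because every G^0_jk(q) is strictly proper, the implicit network equation can be simulated forward in time. The following sketch illustrates this for a hypothetical two-node network (the modules, gains and noise levels are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
# hypothetical strictly proper FIR modules (first impulse-response tap at lag 1)
g12 = np.array([0.5, 0.2])   # taps of G12 at lags 1, 2
g21 = np.array([0.3])        # tap of G21 at lag 1
r1 = rng.standard_normal(N)
v1 = 0.1 * rng.standard_normal(N)
v2 = 0.1 * rng.standard_normal(N)

w1 = np.zeros(N)
w2 = np.zeros(N)
for t in range(N):
    # strict properness: each node depends only on past samples of the other
    # node, so w(t) = G(q) w(t) + r(t) + v(t) can be iterated forward in time
    w1[t] = sum(g12[k] * w2[t - 1 - k] for k in range(len(g12)) if t - 1 - k >= 0) + r1[t] + v1[t]
    w2[t] = sum(g21[k] * w1[t - 1 - k] for k in range(len(g21)) if t - 1 - k >= 0) + v2[t]
```

The small loop gain keeps (I − G(q))^{-1} stable here, consistent with the standing assumption above.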
Also, we consider that the process noise v_j(t) entering the node w_j(t) is uncorrelated with the process noise entering any other node of the network. We assume that we have collected N measurements of the internal variables {w_k(t)}_{t=1}^{N}, k = 1, ..., L, and that we are interested in building a model of the module directly linking node i to node j, that is G^0_ji(q), using the measurements of the internal variables, and possibly r. To this end, we choose a parameterization of G^0_ji(q), denoted G_ji(q, θ), that describes the dynamics of the module of interest for a certain parameter vector θ. We define G^0_jk, k ∈ N_j, and H^0_j as rational transfer functions such that G^0_jk(q) = B^0_jk(q)/F^0_jk(q) and H^0_j(q) = C^0_j(q)/D^0_j(q), where B^0_jk(q), F^0_jk(q), C^0_j(q) and D^0_j(q) are polynomials, n_{b_jk}, n_{f_jk}, n_{c_j}, n_{d_j} are positive integers, and N_j is the set of node indices k such that G_jk ≢ 0. We now expand the parameterization of G^0_ji(q) as G_ji(q, θ), where θ_B and θ_F are the parameterized coefficients of the polynomials B^0_ji(q) and F^0_ji(q), respectively, as in Eq. (2).

The standard direct method
Following the definition of a dynamic network in the previous section, each scalar internal variable can be described as:

w_j(t) = Σ_{k ∈ N_j} G^0_jk(q) w_k(t) + r_j(t) + v_j(t).   (3)

The above equation represents a MISO structure and is the starting point of the methodology presented in this paper, which is based on extending the direct method (Van den Hof et al., 2013).
In the standard direct method for dynamic networks (Van den Hof et al., 2013), we consider the one-step-ahead predictor (Ljung, 1999) of w_j(t):

ŵ_j(t|t − 1; θ) = H_j^{-1}(q, θ) [ Σ_{k ∈ N_j} G_jk(q, θ) w_k(t) + r_j(t) ] + (1 − H_j^{-1}(q, θ)) w_j(t),

which is a function of the parameter vector θ. Not only the target module, but also the modules G^0_jk(q), k ∈ N_j \ {i}, and the noise model H^0_j(q) are suitably parameterized with additional parameters. The parameter vector of interest θ is identified by minimizing the sum of the squared prediction errors ε_j(t) = w_j(t) − ŵ_j(t|t − 1; θ). We note that in this formulation the prediction error also depends on the additional parameters entering the remaining modules and the noise model, which need to be identified to guarantee consistent estimates of θ. Therefore, the total number of parameters may grow large if the cardinality of N_j is large, with a detrimental effect on the variance of the estimate of θ in the case where N is not very large.
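The joint-parameterization issue above can be sketched with a toy MISO least-squares problem (all names and values are illustrative; FIR modules with taps starting at lag 0 are used for simplicity): the regressor must contain columns for every module sharing the output, so the parameter count grows with |N_j|.

```python
import numpy as np

rng = np.random.default_rng(1)
N, l = 2000, 3
# hypothetical FIR modules feeding the same output node
g1 = np.array([0.8, 0.4, 0.1])
g2 = np.array([-0.5, 0.2, 0.05])
w1 = rng.standard_normal(N)
w2 = rng.standard_normal(N)
wj = np.convolve(w1, g1)[:N] + np.convolve(w2, g2)[:N] + 0.05 * rng.standard_normal(N)

def toeplitz_reg(u, n_taps):
    # column k holds u delayed by k samples
    return np.column_stack([np.concatenate([np.zeros(k), u[:len(u) - k]])
                            for k in range(n_taps)])

# all modules must be parameterized jointly to avoid bias in any of them
Phi = np.hstack([toeplitz_reg(w1, l), toeplitz_reg(w2, l)])
theta = np.linalg.lstsq(Phi, wj, rcond=None)[0]  # prediction-error minimizer
```

Here 2l parameters are estimated even if only one module is of interest; with more inputs or higher orders the variance penalty discussed above sets in.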

The developed empirical Bayes identification technique
We now discuss how to use regularized kernel-based methods to avoid parameterization of the additional modules (all modules except the target module) in the MISO structure. Considering suitably defined predictor-filter quantities, Eq. (3) can be rewritten as (4), where we isolate the target module G^0_ji(q). A main challenge when using kernel methods for LTI system identification is that prior knowledge on the stability of the predictor filters in (4) is typically imposed to reduce the MSE of the estimated impulse response of the system (see Pillonetto et al., 2014; Ramaswamy et al., 2018). When all systems (i.e., G_jk, k ∈ N_j) are stable, as assumed in Ramaswamy et al. (2018), the predictor filters in (4) are stable and the setup in (4) lends itself to kernel-based estimation of the predictor filters. However, when some or all systems in the MISO structure are not stable, the imposition of prior knowledge on stability is not possible unless we suitably rewrite the network dynamics in (3).
Proposition 1. Consider the network equation of the output node signal w_j(t) in (3). The network equation can be represented in the alternative form (5),¹ where the M_⋆(q) are strictly proper predictor filters, B_ji(q) and F̄_ji(q) = −(1 − F_ji(q)) are stable polynomials representing G_ji(q), and ē_j(t) is a Gaussian white noise with variance σ̄^2_j.
Proof. Collected in the Appendix. The expressions for the M_⋆(q) are provided in the Appendix. □

Since all the predictor filters in the rewritten network dynamics are stable, this formulation lends itself to the Bayesian approach (Ramaswamy et al., 2018), as described in the subsequent sections.

Vector description of the dynamics
In order to apply a kernel-based method to (5), we formulate a vector description of the network dynamics for the available N measurements. For notational purposes, we consider N-dimensional vectors b_ji and f_ji (which also depend on θ, although we keep this dependence tacit), which contain the parameterized coefficients of B_ji(q, θ_B) and F̄_ji(q, θ_F), respectively, stacked with zeros. Similarly, we define the vectors m_k, k ∈ N_j \ {i}, and m_j as the vectors containing the first l coefficients of the impulse responses of M_jk(q), k ∈ N_j \ {i}, and M_j(q), respectively. The integer l is chosen large enough to ensure m_k(l + 1), m_j(l + 1) ≃ 0.
Lemma 1. Let the vector notation for the node w_j(t) be w_j := [w_j(1) ... w_j(N)]^T. Considering the parameterization of G^0_ji, the network dynamics in (5) can be represented in the vector form (6), where ē_j is the vectorized noise and W, W_ji and W_k are Toeplitz matrices constructed from measurements of the nodes in the MISO structure.
Proof. We denote by W_k ∈ R^{N×l} the Toeplitz matrix built from the samples of w_k, k ∈ N_j \ {i}, and analogously for W_ℓ, ℓ ∈ {i, j}. Also, G_b and G_f denote the Toeplitz matrices of b_ji and f_ji, respectively. Considering the parameterization of G^0_ji and the above established notation, we can rewrite the network dynamics in (5) in the form (6), where ē_j is the vectorized noise. □

Modeling strategy for the additional modules
We now have a vector description of the module dynamics where we have isolated the objective of the identification method, namely g_ji, from the non-interesting nuisance terms, namely m_k and m_j.¹ As the next step, we discuss our modeling strategy based on regularized kernel-based methods. Our goal is to limit the number of parameters necessary to describe w_j in (6), in order to increase the accuracy of the estimated parameter vector of interest θ. To achieve this, we keep a parametric model for g_ji (accounting for the zeros in g_ji), while the remaining impulse responses in (6) are modeled non-parametrically as zero mean Gaussian processes. The choice of Gaussian processes is motivated by the fact that, with a suitable choice of the prior covariance matrix (usually referred to as kernel), we can obtain a significant reduction in the variance of the estimated impulse responses (Pillonetto et al., 2014). Therefore, we model m_j and m_k, k ∈ N_j \ {i}, as independent² zero mean Gaussian processes (vectors in this case). The covariance matrix (kernel) of these vectors is given by the first-order stable spline kernel, whose general structure is

[K(λ, β)]_{x,y} = λ β^{max(x,y)},

where β ∈ [0, 1) is a hyperparameter that regulates the decay velocity of the realizations of the corresponding Gaussian vector, while λ ⩾ 0 tunes their amplitude. The choice of this kernel is motivated by the fact that it enforces favorable properties such as stability and smoothness in the estimated impulse responses (Pillonetto, Chiuso, & De Nicolao, 2011; Pillonetto & De Nicolao, 2010). We assign different hyperparameters to the impulse response priors to guarantee flexible enough models.

¹ From now on the superscript 0 is dropped for convenience.
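The first-order stable spline kernel described above can be sketched directly from its definition [K(λ, β)]_{x,y} = λ β^{max(x,y)}; realizations of the corresponding Gaussian vector decay exponentially, which is how the prior encodes stability (a minimal sketch; the values of l, λ and β are illustrative):

```python
import numpy as np

def stable_spline_kernel(l, lam, beta):
    """First-order stable spline kernel: K[x, y] = lam * beta**max(x, y)."""
    idx = np.arange(1, l + 1)
    return lam * beta ** np.maximum.outer(idx, idx)

K = stable_spline_kernel(50, lam=1.0, beta=0.8)
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(np.zeros(50), K)  # a realization that decays with lag
```

The diagonal K[k, k] = λ β^k shrinks later impulse-response taps more strongly, while the off-diagonal structure promotes smoothness between neighboring taps.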

Incorporating empirical Bayes approach
We define the stacked vector m := [m_{k_1}^T ... m_{k_p}^T m_j^T]^T, where k_1, ..., k_p are the elements of the set N_j \ {i}, together with the corresponding stacking of the Toeplitz matrices. Using the above, we can rewrite (6) in compact form. Having assumed a Gaussian distribution of the noise, we can write the joint probabilistic description of m and w_j, which is jointly Gaussian; this pdf depends upon the vector of parameters

² It is clear that these impulse responses share some common dynamics, given by the pre-multiplication with the inverse of the noise model H_j(q). However, for computational purposes it is convenient to treat the impulse responses as independent. Furthermore, incorporating the mutual dependence through a suitable choice of prior distribution seems a non-trivial problem that deserves a thorough analysis, which is outside the scope of this paper.
η, which contains the parameter vector of the target module, the hyperparameters of the kernels of the impulse response models of the other modules, and the variance of the ''dummy'' noise corrupting w_j(t). Therefore, we focus on the estimation of η, since it contains the parameter of interest θ. To this end, we apply an Empirical Bayes (EB) approach. We consider the marginal pdf of w_j, which is obtained by integrating out the dependence on m and corresponds to

p(w_j; η) ∼ N(W_ji g_ji, P).   (16)

Then, the estimate of η is obtained by maximizing the marginal likelihood of w_j, namely

η̂ = arg max_η p(w_j; η).   (17)

Solving this optimization problem can be a cumbersome task, because it is nonlinear and involves a large number of decision variables. In the next section, we study how to solve the marginal likelihood problem through a dedicated iterative scheme.
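Evaluating the objective in (17) amounts to evaluating a Gaussian log-density. A minimal numerical sketch (function name and arguments are illustrative, not the paper's notation) is:

```python
import numpy as np

def neg_log_marginal(wj, mean, P):
    """-log N(wj; mean, P), evaluated stably via slogdet and a linear solve."""
    r = wj - mean
    _, logdet = np.linalg.slogdet(P)
    return 0.5 * (logdet + r @ np.linalg.solve(P, r) + len(wj) * np.log(2.0 * np.pi))
```

In the EB approach this quantity would be minimized over θ, the kernel hyperparameters and the noise variance, all of which enter through the mean W_ji g_ji and the covariance P.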

Solution to the marginal likelihood problem
In this section, we focus on solving the problem in (17) by deriving an iterative solution scheme through the EM algorithm (Dempster et al., 1977). For this, we first need to define a latent variable whose estimation simplifies the computation of the marginal likelihood. In our case, a natural choice is m. Then, the solution to (17) using the EM algorithm is obtained by iterating between the following two steps:
• E-step: Given an estimate η̂(n) computed at the nth iteration, compute Q(n)(η), where the expectation of the joint log-likelihood of w_j and m is taken with respect to the posterior p(m | w_j; η̂(n));
• M-step: Update the estimate as η̂(n+1) = arg max_η Q(n)(η).   (19)
When iterating between the above steps, convergence to a stationary point of the marginal likelihood is ensured (Boyles, 1983). This stationary point can be a local or global maximum of the objective function. In the next section, we show that there is a clear advantage in solving the original marginal likelihood problem (17) by repeatedly solving (19) using the EM algorithm: the nonlinear optimization problem becomes a problem of iteratively constructing analytical solutions and solving scalar optimization problems, which significantly simplifies solving (17).

Computation of E-step
First we focus on the E-step. The posterior distribution of m given w_j and an estimate of η is Gaussian (see also Anderson & Moore, 1979), with mean and covariance given by (20). Let m̂(n) and P(n)_m be the posterior mean and covariance of m obtained from (20) using η̂(n). We define the corresponding matrix of posterior second moments and consider its l × l diagonal blocks; these sub-matrices correspond to the posterior second moments of the estimated impulse responses. The following lemma provides the structure of the function Q(n)(η) for the particular situation of our setup in (17).
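The posterior in (20) is standard Gaussian conditioning. The sketch below (with an illustrative regressor W and identity kernel, not the paper's quantities) computes the posterior mean and covariance in the information form, and the test verifies it against the equivalent covariance form:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 4, 20
W = rng.standard_normal((N, n))   # illustrative regressor matrix
Km = np.eye(n)                    # illustrative prior covariance (kernel) of m
sigma2 = 0.1
wj = W @ rng.standard_normal(n) + np.sqrt(sigma2) * rng.standard_normal(N)

# Gaussian conditioning: posterior of m given w_j (information form)
P_post = np.linalg.inv(W.T @ W / sigma2 + np.linalg.inv(Km))
m_post = P_post @ W.T @ wj / sigma2
```

By the matrix inversion lemma this is identical to m_post = Km W^T (W Km W^T + σ²I)^{-1} wj, which is the form one obtains by conditioning the joint Gaussian of (m, w_j).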
Lemma 2. Let η̂(n) be the estimate of η at the nth iteration of the EM algorithm according to (19). Then Q(n)(η) admits the decomposition given in (21)-(23).

Proof. See the Appendix. □
The function Q(n)(η) is the sum of several terms that depend on different components of the vector η. In particular, we have a term of the type Q_m for each additional module in the MISO structure, and a term Q(n)_0(σ̄^2_j, θ) for the module of interest and the noise variance. Therefore, the update of η according to (19) splits into a number of independent and smaller optimization problems.

Computation of M-step
We now focus on the M-step according to (19). From (21), it is evident that each kernel's hyperparameters can be updated independently of the rest of the parameters. The following theorem, inspired by Bottegal, Aravkin, Hjalmarsson, and Pillonetto (2016) and Everitt et al. (2018), shows how to update the kernel hyperparameters.
Theorem 1. For the update of each kernel's hyperparameters, which requires maximizing (23), we define the quantities in (24) for k ∈ {N_j ∪ {j}} \ {i}. Then the updates are obtained as in (25) and (26).

Proof. See the Appendix. □
The optimization problem in (25) can be difficult to solve in practice when the determinant of the kernel is very small or when the inversion of the kernel becomes ill-conditioned. To tackle this, we exploit the factorization of the first-order stable spline kernel as in Bottegal et al. (2016), writing K = λ L D(β) L^T, where L is lower-triangular with known entries (essentially, an ''integrator'') and D(β) is diagonal with entries that are essentially exponential functions of β. Using this technique also increases the computational speed of the algorithm.
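The factorization can be verified numerically from the kernel definition alone. The sketch below uses the telescoping identity Σ_{k ≥ max(i,j)} d_k = β^{max(i,j)} with d_k = β^k − β^{k+1} (and d_l = β^l); in this index ordering the triangular factor comes out upper-triangular, which corresponds to the lower-triangular "integrator" of the reference under index reversal (values of l, λ, β are illustrative):

```python
import numpy as np

l, lam, beta = 30, 2.0, 0.7
idx = np.arange(1, l + 1)
K = lam * beta ** np.maximum.outer(idx, idx)  # first-order stable spline kernel

# Triangular-times-diagonal factorization via the telescoping sum:
# sum_{k >= max(i,j)} d_k = beta**max(i,j)
U = np.triu(np.ones((l, l)))          # U[i, k] = 1 for k >= i
d = beta ** idx - beta ** (idx + 1.0)  # d_k = beta**k - beta**(k+1)
d[-1] = beta ** l                      # last entry: d_l = beta**l
assert np.allclose(K, lam * U @ np.diag(d) @ U.T)
```

Since the determinant and inverse of triangular and diagonal factors are cheap and numerically benign, this avoids working with the (nearly singular) kernel directly.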
We note from (26) that we get closed-form solutions for all λ_k, k ∈ {N_j ∪ {j}} \ {i}, while the β_k, k ∈ {N_j ∪ {j}} \ {i}, can be updated by solving scalar optimization problems on the domain [0, 1), as detailed in (25). Therefore, the hyperparameter update turns out to be a computationally fast operation.
We now turn our attention to the update of θ and σ̄^2_j, for which we need to maximize (22). We notice that the optimum with respect to θ does not depend on the optimal value of σ̄^2_j. Then, we can first update θ and subsequently use its optimal value to update σ̄^2_j. How to update θ is explained in the following theorem.
Theorem 2. The estimate of the parameter vector θ is updated by solving a quadratic problem that has the closed-form solution given in (28), where Â(n) and b̂(n) are computed using the current estimates m̂(n) and η̂(n), and g_ji = Mθ, where M ∈ R^{2N×n_θ} is a matrix with entries 1 or 0.

Proof. See the Appendix. □
Therefore, the parameter vector of the target module is updated by solving the analytical expression (28).
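Numerically, a closed-form quadratic update of this type reduces to one linear solve. A minimal sketch (Â and b̂ here are illustrative stand-ins, not the paper's expressions) is:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
R = rng.standard_normal((10, n))
A = R.T @ R + np.eye(n)        # illustrative positive-definite A-hat
b = rng.standard_normal(n)     # illustrative b-hat
theta = np.linalg.solve(A, b)  # closed-form minimizer of the quadratic cost
```

Using `np.linalg.solve` rather than forming the explicit inverse is the standard numerically preferable choice for such updates.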
Remark 1. An additional advantage of the method developed in this paper is that it relies on iteratively solving a quadratic least-squares problem to find the solution for the parameters of the target module θ, rather than solving a non-linear least-squares problem as in Ramaswamy et al. (2018), making the method computationally more efficient.
We are left with updating σ̄^2_j, which is given in the next theorem.
Theorem 3. Let ĝ(n+1)_ji and Ŵ(n+1) be constructed by inserting θ̂(n+1) into the general expressions of g_ji and W. Then the update of σ̄^2_j admits a closed-form expression. Thus, a closed-form solution for the estimate of the noise variance is also obtained.
Remark 2. We estimate the ''dummy'' noise variance σ̄^2_j, which is a scaled version of the original output noise power σ^2_j in the network. If there are no unstable systems in the MISO setup, then σ̄^2_j equals σ^2_j. This will be verified with numerical simulations in Section 6.
All in all, we have obtained a fast iterative procedure that provides a local solution to the marginal likelihood problem (17). All the updates follow simple rules that allow for fast iterative computation. Algorithm 1 summarizes the steps to follow to obtain η̂ and, therefore, θ̂.
The stopping criterion for the algorithm depends on the change in the estimates between consecutive iterations. This value should be small for convergence, so that the algorithm can be terminated; a threshold of 10^{-2} is used for the numerical simulations in Section 6. The other stopping criterion is the maximum number of iterations, which is set to 50.
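The stopping rule described above can be sketched as follows (a relative-change test is assumed here as the specific form of the change measure; the tolerance 1e-2 and the 50-iteration cap are from the text):

```python
import numpy as np

def converged(eta_new, eta_old, tol=1e-2):
    """Relative-change stopping rule (tolerance 1e-2 as in Section 6)."""
    return np.linalg.norm(eta_new - eta_old) / max(np.linalg.norm(eta_old), 1e-12) < tol

MAX_ITER = 50  # second stopping criterion from the text
```

In the EM loop one would check `converged(eta_next, eta)` after each M-step and break either on convergence or after `MAX_ITER` iterations.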
Remark 3. Being applicable to a MISO identification setup, the introduced method can also be used for parametric SISO identification, where the process noise modeling is now simplified by avoiding the model order selection and reducing the number of parameters of the noise model to two (the hyperparameters λ_j, β_j).
Remark 4. We notice that:
• The method does not require prior information about the stability of the systems G_jk, k ∈ N_j, or about the number of unstable poles in these systems.
• According to Dankers et al. (2016), in view of consistency of the target module estimate, it is not necessary to take all nodes w_k, k ∈ N_j, as inputs in the MISO structure; it is sufficient to take a subset of nodes in N_j as inputs such that every parallel path³ from w_i to w_j and every loop around w_j passes through a selected input. This may lead to confounding variables, which can be handled using additional inputs (Dankers, Van den Hof, Materassi, & Weerts, 2017). At the same time, in view of an appropriate bias-variance trade-off, especially under limited data circumstances, it could be attractive to include more predictor inputs than the ones that are strictly necessary for achieving consistency. While the algorithm presented in this paper can be applied to any choice of such MISO structure, we have formulated the results for the situation where all nodes w_k, k ∈ N_j, are taken as inputs.

Non-parametric identification of modules in the MISO structure
In this section we slightly adapt the developed method to obtain a non-parametric estimate of the target module. For this, we rewrite the network equation (3) as

w_j(t) = M_j(q) w_j(t) + Σ_{k ∈ N_j} M_jk(q) w_k(t) + ē_j(t),   (29)

where the predictor filters M_jk(q) and M_j(q), defined in (30), are stable. Following a similar approach as introduced before, but modeling the impulse responses of all the modules (including m_i of M_ji, which represents the target module) as zero mean Gaussian processes with prior covariance matrix given by the first-order stable spline kernel, we arrive at an iterative algorithm to estimate the parameter vector η, which contains the hyperparameters λ_k, β_k, k ∈ N_j, and the noise variance σ̄^2_j. Since we do not parameterize any module, θ is not part of the parameter vector η. The solutions for the β's and λ's at each iteration are given by (25) and (26), respectively. The solution for σ̄^2_j at each iteration is equivalent to the solution of σ̄^2_j in Theorem 3, however without the terms that are functions of θ (i.e. g_ji, G_b, G_f, W_ji g_ji). Thus we end up with the same Algorithm 1, with the steps related to θ (step 4) not applicable. The posterior means of m_k, k ∈ N_j, and m_j, obtained using (20) (neglecting the effect of W_ji g_ji) at the converged η̂, provide the impulse responses of M_jk and M_j, respectively. From these, the impulse response estimates of the modules G_jk, k ∈ N_j, can be obtained. Thus we obtain a non-parametric identification method to identify all the modules in the MISO structure as a by-product of the earlier developed identification technique.

³ A path from w_i to w_j that does not pass through G_ji.
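The core computation of such a non-parametric step, the posterior-mean impulse response estimate under a stable spline prior, can be sketched on a single stable SISO channel (all signals and constants below are illustrative, not the paper's simulation setup):

```python
import numpy as np

rng = np.random.default_rng(4)
N, l = 400, 40
g_true = 0.8 ** np.arange(1, l + 1)  # a stable, exponentially decaying impulse response
u = rng.standard_normal(N)
y = np.convolve(u, g_true)[:N] + 0.1 * rng.standard_normal(N)

# Toeplitz regressor of the input and a first-order stable spline kernel prior
Phi = np.column_stack([np.concatenate([np.zeros(k), u[:N - k]]) for k in range(l)])
idx = np.arange(1, l + 1)
K = 1.0 * 0.8 ** np.maximum.outer(idx, idx)
sigma2 = 0.01

# Posterior-mean (regularized, kernel-based) estimate of the impulse response
g_hat = K @ Phi.T @ np.linalg.solve(Phi @ K @ Phi.T + sigma2 * np.eye(N), y)
```

No model order is selected: the kernel hyperparameters (here fixed for illustration; tuned by marginal likelihood in the method) control the effective complexity of the estimate.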

Numerical simulations
Numerical simulations are performed to evaluate the performance of the developed method, which we abbreviate as Empirical Bayes Direct Method (EBDM). The simulations are performed on the dynamic network depicted in Fig. 1. The goal is to identify G^0_31. To show the effectiveness of the introduced method and its flexibility in handling stable and unstable modules within a single unified identification framework, we perform the simulations for two different cases:
(1) Case 1: All modules in the MISO setup are stable.
(2) Case 2: The modules in the MISO setup including the target module can be stable or unstable.
The results of the numerical simulations are presented below.

Case study 1
The EBDM is compared with the standard direct method and the two-stage method (see Van den Hof et al., 2013 for details). The modules of the network in Fig. 1 are fixed, stable transfer functions. We run 50 independent Monte Carlo (MC) experiments where the data is generated using known reference signals r_2(t) and r_4(t) that are realizations of white noise with unit variance. The number of data samples is N = 500. The noise sources e_1(t), e_2(t), e_3(t) and e_4(t) have variances 0.05, 0.08, 0.5 and 0.1, respectively. We assume that we know the model order of G^0_31(q). In the case of the direct method, we solve a 3-input/1-output MISO identification problem with w_1(t), w_2(t) and w_4(t) as inputs. In the two-stage method, the projections of the three inputs on the external signals r_2(t) and r_4(t) are used as inputs to the MISO identification problem. For both these methods, we consider the case where a model order selection of all the modules in the MISO structure (except for the target module) is required, and the case where the model orders are known. Moreover, in order to improve the accuracy of the identified module in the two-stage method, we identify a noise model even though it is not necessary for consistency.
Fig. 4 shows the estimated impulse responses at the end of each MC simulation using the EBDM. It can be verified that, in line with our framework, the estimates provide a description of the dynamics of M_j, M_jk, k ∈ N_j, and G_ji. To evaluate the performance of the methods, we use the standard goodness-of-fit metric

fit = 1 − ‖g^0_ji − ĝ_ji‖_2 / ‖g^0_ji − ḡ_ji‖_2,

where g^0_ji is the true impulse response of G^0_ji, ĝ_ji is the impulse response of the estimated target module and ḡ_ji is the sample mean of g^0_ji. The box plots of the fits of the impulse response of G_31(q) are shown in Fig. 2, where we have compared the two-stage method with true model orders ('TS + TO'), the direct method with true model orders and with model orders selected via BIC ('DM + TO' and 'DM + MOS', respectively), and the Empirical Bayes Direct Method ('EBDM'). For the latter, we choose l = 100. It can be noted that in this setup the EBDM achieves a fit on par with the direct method and significantly better than the two-stage method. Fig. 3 shows the mean and standard deviation of the parameter estimates of G_31. It is evident that the EBDM gives a smaller bias and a greatly reduced variance compared to the other considered identification methods. The reduction in variance is attributed to the regularization approach used in this method. The fit is calculated using the impulse response obtained from the estimated parameters of the target module. Even though the variability of the parameters estimated with the direct method is high, it did not affect the fit of the impulse response, which produces a result on par with the EBDM in Fig. 2. However, Fig. 3 clearly shows that the EBDM performs better than the other considered approaches. Considering a relatively small network with 3 modules in the MISO structure, the developed method already proves effective. When the size of the network grows, the results of the direct method may deteriorate further due to an increase in variance; furthermore, it is expected that in large networks the model order selection step contributes to inaccurate results. Thus the EBDM, by offering reduced variance and circumventing the problem of model order selection, can stand out as an effective local module identification method in large dynamic networks.
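The standard goodness-of-fit metric used above can be sketched as a small helper (names are illustrative):

```python
import numpy as np

def fit_metric(g_true, g_hat):
    """fit = 1 - ||g0 - ghat|| / ||g0 - mean(g0)||."""
    g_true = np.asarray(g_true, dtype=float)
    g_hat = np.asarray(g_hat, dtype=float)
    return 1.0 - np.linalg.norm(g_true - g_hat) / np.linalg.norm(g_true - g_true.mean())
```

A perfect estimate gives a fit of 1, while estimates no better than the mean of the true response give a fit near 0 or below.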

Case study 2
Now we look into the case where the modules in the MISO structure may not be stable. We consider the same network as in Fig. 1, but with an unstable target module G^0_31 and an unstable G^0_32. The network modules are the same as in the previous section, except for the unstable G^0_31 and G^0_32: G^0_31 has two unstable complex poles, and G^0_32 has four poles, one of which is a real unstable pole. The noise source e_3(t) has variance 0.1. The experimental setup is similar to the previous case, and we run 50 MC experiments with the method introduced in this paper.
To evaluate the performance of the EBDM, we use the standard goodness-of-fit metric

fit = 1 - ||θ^0 - θ̂||_2 / ||θ^0 - θ̄||_2,

where θ^0 contains the true parameters of the target module, θ̂ the estimated parameters, and θ̄ is the sample mean of θ^0. Due to the instability of the target module, we compute the fit on the parameters rather than on the impulse response. The box plot of the fit of the parameters of G_31(q) is shown in Fig. 5, where the Empirical Bayes Direct Method ('EBDM') is used to identify the unstable target module; we choose l = 200. The box plot lies above 0.9, which indicates a good fit. Fig. 6 shows the mean and standard deviation of the parameter estimates of G_31. The bias and variance are evidently small; the reduction in variance is attributed to the regularization approach used in this method.
It is worthwhile to compare the introduced EBDM with other available approaches that can identify unstable modules. In Galrinho, Everitt, and Hjalmarsson (2017), a method to identify unstable SISO systems with a Box-Jenkins (BJ) structure using high-order ARX modeling has been introduced. This method is effective in estimating the unstable poles of the system with high accuracy (low variance) (Galrinho et al., 2017), but the estimated model will have high variance due to the high-order modeling. Moreover, the estimated model will be of high order unless a sufficiently large data set is available. Fig. 7 shows the Bode magnitude plot of the estimates after 50 MC simulations with the experimental setup of case study 2, using the EBDM and the ARX-modeling method of Galrinho et al. (2017); ARX models of 15th order are used for the latter. Even though the latter method estimates the unstable poles with high accuracy, the EBDM performs significantly better in terms of accuracy, with less variance in the identified frequency response. Since we have limited data (N = 500), the model estimated with the method in Galrinho et al. (2017) is of high order, which can be verified from Fig. 7.
A three-step parametric identification method for unstable SISO systems is introduced in Galrinho, Rojas, and Hjalmarsson (2016). The first step identifies the unstable poles of the parameterized model, exploiting the result that the unstable poles can be estimated with high accuracy using the method in Galrinho et al. (2017). In the next step, the parameters of the anti-stable part are fixed at the obtained estimates, and a weighted null-space fitting (WNSF) method is used to identify the remaining parameters of the parameterized model of interest. However, for the MISO identification setup in the dynamic network framework, the first step, where ARX modeling is used, may produce 'false' unstable poles for the target module: due to the high-order ARX modeling, these 'false' unstable poles can be unstable poles of modules in the MISO setup other than the target module, and it becomes difficult to distinguish which unstable poles belong to which module, so that the estimates of the unstable roots of the target module could be fixed for the second step. For example, in the simulations depicted in Fig. 7 using the ARX-modeling method, we estimate a target module of order 15 with 3 unstable poles, of which 2 are poles of G^0_31 and the extra one is the unstable pole of G^0_32. Therefore, it is difficult to use the WNSF method in this setup without prior knowledge of the unstable poles. An alternative BJ model structure has been proposed in Forssell and Ljung (1998) that can be used within the prediction error framework; however, its implementation is significantly more complex than the introduced EBDM.
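To make the high-order ARX idea concrete, the sketch below fits an ARX model by ordinary least squares and inspects its poles for instability. This is a simplified illustration of high-order ARX modeling, not the actual algorithm of Galrinho et al. (2017); the function name `fit_arx` and the toy system are ours:

```python
import numpy as np

def fit_arx(y, u, na, nb):
    """Least-squares fit of an ARX model A(q) y(t) = B(q) u(t) + e(t).

    Returns the A- and B-polynomial coefficients and the model poles,
    i.e. the roots of A(z) = z^na + a1 z^(na-1) + ... + a_na.
    """
    n = max(na, nb)
    rows = []
    for t in range(n, len(y)):
        # Regressor: [-y(t-1), ..., -y(t-na), u(t-1), ..., u(t-nb)]
        rows.append(np.concatenate([-y[t - na:t][::-1], u[t - nb:t][::-1]]))
    Phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(Phi, y[n:], rcond=None)
    a, b = theta[:na], theta[na:]
    poles = np.roots(np.concatenate([[1.0], a]))
    return a, b, poles

# Unstable poles show up as |pole| >= 1; with a high model order, poles of
# other modules in the MISO setup may also appear among the estimates,
# which is exactly the 'false' unstable pole problem discussed above.
```

On noiseless data from a true ARX system, the least-squares estimate recovers the coefficients exactly, so the estimated poles coincide with the true ones.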

Table 1
Results of the simulations performed using the setup of case study 1 (upper) and case study 2 (lower) with different noise variances of e_3 acting on the output node w_3. Table 1 shows the actual ("dummy") noise variance to be estimated and the noise variance estimated using the EBDM for the experimental setups of case 1 (upper) and case 2 (lower).

Estimated noise variance
Using the experimental setup of case studies 1 and 2, but with different noise powers (variances) of e_3 (σ_3^2) acting on the output node w_3, we performed simulations using the EBDM for the network in Fig. 1. For case study 1, since all modules are stable (i.e., F_a/F_a^⋆ = 1), the estimated noise variance σ̂_3^2 should be approximately equal to the actual noise variance σ_3^2 (see Remark 2). This can be verified from Table 1 (upper), where the estimated noise variance approximates the actual noise variance in the network well. For case study 2, the estimated noise variance should be approximately equal to a scaled version of the actual noise variance, given by σ̂_3^2 = |F_a/F_a^⋆|^2 σ_3^2, which can be verified from Table 1 (lower).

Additional remarks
The method described in this paper can be developed using any of the kernels available in the literature on regularized system identification. The kernel adopted in this paper is the result of a balance between empirical effectiveness (see Pillonetto et al., 2014) and computational efficiency (due to its factorization and the low number of hyperparameters). Other choices of kernel (e.g. the DC kernel proposed in Chen et al., 2012) may result in higher final accuracy, but require estimating an additional hyperparameter, which may bring an additional cost in complexity. On the other hand, it is well known (see Chen et al., 2012) that the optimal kernel is constructed from the true impulse response, which is unknown (it is the actual object of interest). The question of which kernel is best suited for dynamic networks is open and requires a thorough theoretical analysis that is outside the scope of this paper.
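For reference, the first-order stable spline (TC) kernel adopted in this paper can be sketched as follows. This is a minimal illustration under the standard parameterization K(i, j) = λ β^max(i,j); the function name `tc_kernel` and the hyperparameter values are ours, not taken from the paper's experiments:

```python
import numpy as np

def tc_kernel(n, lam, beta):
    """First-order stable spline (TC) kernel matrix:
    K[i, j] = lam * beta**max(i, j), with 0 < beta < 1.

    Only two hyperparameters (scale lam, decay beta), which keeps the
    marginal-likelihood optimization in the EM scheme cheap.
    """
    idx = np.arange(1, n + 1)
    return lam * beta ** np.maximum.outer(idx, idx)

K = tc_kernel(50, lam=1.0, beta=0.9)
# K is symmetric positive definite and encodes exponentially decaying
# (BIBO-stable) impulse responses as prior knowledge.
```

The exponential decay built into the prior is what allows the impulse responses of the non-target modules to be modeled as Gaussian processes without an explicit model order selection step.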

Conclusions
An effective regularized kernel-based approach for local module identification in dynamic networks has been introduced in this paper. The introduced method (EBDM) circumvents the model order selection step for all the modules that are not of primary interest to the experimenter, but still need to be identified in order to obtain a consistent estimate of the target module. Furthermore, by using regularized non-parametric methods, the number of parameters to be estimated is greatly reduced, with a clear benefit in terms of the mean square error of the estimated target module. The method is therefore computationally less complex and scales favorably to large networks. It is also capable of performing identification in networks composed of unstable modules, without any prior information about the stability of the modules. Numerical experiments performed with a dynamic network example illustrate the potential of the developed method in comparison with the already available methods on networks of stable modules. The developed method provides better estimates, and a reduced variance is observed in the identified model due to the integration of the regularization approach in the method.

Appendix D

The term can be removed from the objective function since it becomes a constant. Then we get (D.1). We now introduce the following notation. Let D_1 ∈ R^(N^2 × N) and D_2 ∈ R^(N^2 × N) be the two matrices such that, for any vector w ∈ R^N, D_1 w = vec(W), where W is the Toeplitz matrix of w, and D_2 w = vec(W^⊤). Let m̄(n) ∈ R^N be the vector whose first min(N, l) elements equal the corresponding elements of m(n) and whose remaining elements (when N > l) equal 0. Let M(n), W_ℓ ∈ R^(N × N), ℓ ∈ {i, j}, be the Toeplitz matrices of m̄(n) and w_ℓ, respectively.
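The matrices D_1 and D_2 can be constructed numerically by applying the corresponding linear maps to the canonical basis vectors, which also verifies the identities D_1 w = vec(W) and D_2 w = vec(W^⊤). The sketch below assumes the lower-triangular (convolution) Toeplitz convention and column-major vec; the function names are ours:

```python
import numpy as np

def toeplitz_lower(w):
    """Lower-triangular Toeplitz matrix with first column w
    (the convolution-matrix convention assumed here)."""
    N = len(w)
    W = np.zeros((N, N))
    for j in range(N):
        W[j:, j] = w[:N - j]
    return W

def duplication_maps(N):
    """Build D1, D2 in R^(N^2 x N) with D1 w = vec(W) and D2 w = vec(W^T),
    by applying each linear map to the canonical basis vectors."""
    D1 = np.zeros((N * N, N))
    D2 = np.zeros((N * N, N))
    for k in range(N):
        e = np.zeros(N)
        e[k] = 1.0
        W = toeplitz_lower(e)
        D1[:, k] = W.flatten(order='F')    # vec(.) stacks columns
        D2[:, k] = W.T.flatten(order='F')
    return D1, D2
```

Since both maps are linear in w, verifying them on the basis vectors verifies them for every w.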

Fig. 1. Network example with 4 internal nodes, 2 reference signals and a noise source at each node.

Fig. 2. Box plot of the fit of the impulse response of Ĝ_31 obtained by the two-stage method, the direct method and the EBDM. Number of data samples used for estimation is N = 500.

Fig. 3. Bias and standard deviation of each parameter obtained from 50 MC simulations using different identification methods.

Fig. 4. The bottom-right plot provides the impulse response estimate of the target module at the end of each MC simulation, obtained from the estimated parameter θ̂. The other plots show the impulse response estimates of the filters modeled as GPs, obtained by computing the posterior (20) from the estimated hyperparameters. The black dashed line shows the true impulse response of the modules.

Fig. 6. Bias and standard deviation of each parameter obtained from 50 MC simulations using different identification methods.

Fig. 7. Bode magnitude plot comparing the estimates of the introduced approach (upper) and the approach in Galrinho et al. (2017) (lower).
Fig. 5. Box plot of the fit of the parameters of Ĝ_31 obtained by the proposed method. Number of data samples used for estimation is N = 500.