Maximum likelihood inference for a class of discrete-time Markov-switching time series models with multiple delays

Autoregressive Markov switching (ARMS) time series models are used to represent real-world signals whose dynamics may change over time. They have found application in many areas of the natural and social sciences, as well as in engineering. In general, inference for these systems involves two problems: (a) detecting the number of distinct dynamical models that the signal may adopt and (b) estimating any unknown parameters in these models. In this paper, we introduce a class of ARMS time series models that includes many systems resulting from the discretisation of stochastic delay differential equations (DDEs). Remarkably, this class includes cases in which the discretisation time grid is not necessarily aligned with the delays of the DDE, resulting in discrete-time ARMS models with real (non-integer) delays. We describe methods for the maximum likelihood detection of the number of dynamical modes and the estimation of unknown parameters (including the possibly non-integer delays) and illustrate their application with an ARMS model of the El Niño-Southern Oscillation (ENSO) phenomenon.


Introduction

Background
Discrete-time autoregressive Markov switching (ARMS) time series models [10] are used to represent real-world signals whose dynamics may change over time. For example, if {x_n} is the signal of interest, we may model its evolution as

x_n = ϕ(x_{0:n−1}, u_n, l_n),    (1)

where n ≥ 0 is the current time, x_{0:n−1} = {x_0, x_1, . . ., x_{n−1}} is the signal history, {u_n} is some noise (random) process and {l_n} is a Markov chain [32], i.e., a sequence of discrete random indices that change over time according to a Markov kernel that describes the conditional probabilities Prob(l_n = i | l_{n−1} = j) for suitable integers i and j. If l_n ≠ l_{n−1} then the functions ϕ(•, •, l_n) and ϕ(•, •, l_{n−1}) are different as well, and hence the dynamics of x_n change. ARMS models have found a plethora of applications in statistical signal processing for econometrics [34, 14, 6, 5, 8], engineering [12, 11, 31, 22], the Earth sciences [1, 26, 19], or complex networks [18], to name a few examples.
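As a concrete (and purely illustrative) sketch of a model of the form (1) with additive noise, the snippet below simulates a scalar two-layer ARMS signal. The transition matrix, the per-layer maps and the noise scale are hypothetical choices for demonstration, not values taken from this paper.

```python
import numpy as np

def simulate_arms(T, M, phis, sigma=0.1, x0=0.0, seed=0):
    """Simulate a scalar ARMS series x_n = phi(x_{n-1}, l_n) + sigma * u_n.

    M is an L x L Markov transition matrix (rows sum to 1) and phis is a
    list of per-layer maps; both are illustrative stand-ins for the layer
    functions of the text, not the paper's model.
    """
    rng = np.random.default_rng(seed)
    L = len(phis)
    l = 0                                   # initial layer index
    x = np.empty(T)
    layers = np.empty(T, dtype=int)
    prev = x0
    for n in range(T):
        l = rng.choice(L, p=M[l])           # Markov switch: P(l_n | l_{n-1})
        prev = phis[l](prev) + sigma * rng.standard_normal()
        x[n] = prev
        layers[n] = l
    return x, layers

# Two hypothetical layers: a contractive linear map and a damped nonlinear one.
M = np.array([[0.95, 0.05],
              [0.10, 0.90]])
phis = [lambda x: 0.8 * x, lambda x: 0.5 * np.tanh(x)]
x, layers = simulate_arms(500, M, phis)
```

Plotting `x` coloured by `layers` makes the regime switches visible; this is the kind of signal the inference methods below are meant to handle.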
Inference for ARMS models involves (a) the detection of the number of possible values that the Markov chain {l_n} can take, and (b) the estimation of unknown parameters in each function ϕ(•, •, l_n).
We may assume, without loss of generality, that l_n ∈ {1, . . ., L}. Following [18], in this paper we refer to each dynamical model ϕ(•, •, l), 1 ≤ l ≤ L, as a layer and, hence, task (a) consists in estimating the number of active dynamical layers L from a sequence of data samples x_{0:n}. This is a model order selection problem that can be tackled using the Akaike and Bayesian information criteria (AIC and BIC, respectively) [20, 33, 30, 19], penalised distances [21], penalised likelihoods [27, 26], the three-pattern method [30] or the Hannan-Quinn criterion (HQC) [30]. Linear ARMS models admit an equivalent representation as autoregressive moving-average (ARMA) systems and, in this case, the number of layers L can also be inferred from the covariance matrix of the ARMA process [38, 6].
As for problem (b), maximum likelihood (ML) and maximum a posteriori estimators can be approximated using different forms of the expectation-maximisation (EM) algorithm [10, 1, 26], while Markov chain Monte Carlo (MCMC) methods have been applied for Bayesian estimation [34, 22, 39, 5, 18]. Simpler moment matching techniques can also be applied for parameter estimation in linear ARMS systems [15].
See [10,28] for a survey of recent ARMS models and methods.

Contributions
In this paper we investigate nonlinear ARMS models where the independent noise process {u_n} is additive and each dynamical layer depends on a different delay of the signal. For example, we may rewrite Eq. (1) as

x_n = ϕ(x_{n−1}, x_{n−D[l_n]}, l_n) + u_n,    (2)

where {l_n} is a homogeneous Markov chain and each D[l] > 1 (1 ≤ l ≤ L) is a (possibly long) delay, specific to the l-th dynamical layer. A detailed description of the relevant family of models is given in Section 2 below. Our formulation is devised to target time series models that result from the discretisation of delay differential equations [3, 4], which appear often in geophysics [17, 36].
The specific contributions of this work can be summarised as follows:
• We introduce an ARMS time series model that includes systems resulting from the discretisation of stochastic DDEs. In particular, the proposed model includes cases in which the times at which the signal can be observed are not necessarily aligned with the delays of the DDE, which leads to discrete-time models with real (non-integer) delays.
• We provide an EM framework for parameter estimation, based on space alternation and a simple stochastic optimisation algorithm, that can be systematically implemented for the models of the proposed class. This scheme can be easily combined with an ML detector of the number L of dynamical layers.
• We illustrate the application of the proposed model and inference methodology by discretising and then fitting a stochastic DDE which has been proposed as a representation of the El Niño-Southern Oscillation (ENSO) [36]. We obtain numerical results for models with up to three dynamical layers and either integer or real delays.
In this example, non-integer delays appear naturally when the observation times are not aligned with the physical delays. We validate the model and estimation algorithm using synthetically generated observations and then apply the methodology to real ENSO data.

Notation
Scalar magnitudes are denoted by regular-face letters, e.g., x. Column vectors and matrices are represented by bold-face letters, lower-case and upper-case, respectively. For example, x = (x_1, . . ., x_m)^⊤ is an m × 1 vector (the superscript ⊤ denotes transposition) and X = (x_1, . . ., x_d) is an m × d matrix, with x_i = (x_{1i}, . . ., x_{mi})^⊤ denoting its i-th column. Discrete time is indicated as a subscript, e.g., x_n. Dependence on an integer index other than time is represented with the index between square brackets; e.g., D[l] in (2) is the delay associated with the l-th dynamical layer.
We abide by a simplified notation for probability functions, where p(x) denotes the probability density function (pdf) of the random variable (r.v.) x. This notation is argument-wise; hence, if we have two r.v.'s x and y, then p(x) and p(y) denote the corresponding density functions, possibly different; p(x, y) denotes the joint pdf and p(x|y) is the conditional pdf of x given y. The notation for multidimensional r.v.'s, e.g., x and y, is analogous, i.e., p(x, y), p(x|y), etc. The probability mass function (pmf) of a discrete r.v. x is denoted P(x) (note the upper case) and we follow the same argument-wise convention as for pdf's.

Contents
The rest of the paper is organised as follows. In Section 2 we introduce the new class of discrete-time, nonlinear ARMS models with multiple delays and provide sufficient conditions that ensure convergence to a limit distribution. An EM framework for inference is described in Section 3, the remaining sections report computer experiments with synthetic and real ENSO data, and we close with a summary of conclusions.

Discrete-delay Markov-switch nonlinear models

Let {l_n} be a Markov chain taking values in the set L = {1, . . ., L}, with an L × L transition matrix M. The entry in the i-th row and j-th column of M, denoted M_ij, is the probability mass P(l_n = j | l_{n−1} = i). The delayed nonlinear ARMS model with L layers, denoted DN-ARMS(L), is constructed as

x_n = ϕ_α(x_{n−1}, x_{n−D[l_n]}, l_n) + u_n,    (3)

where ϕ_α(•, •, l) is the nonlinear map of the l-th layer, parameterised by a real vector α, D[1:L] = {D[1], . . ., D[L]} is a set of positive integer delays, one per dynamical layer, D_+ := max_{1≤l≤L} D[l] is the maximum delay and x_{i:j} denotes the subsequence x_i, x_{i+1}, . . ., x_j. Note that a distribution for x_{−1} alone is not sufficient to specify the model because of the (possibly long) delays.

Continuous-delay Markov-switch nonlinear models
Model (3) can be extended to incorporate real positive delays. Non-integer delays arise, for example, from the discretisation of stochastic DDEs when the continuous-time delays are not aligned with the time grid of the observed series. Such scenarios are quite natural. For example, in Section 4.1 we look into models of ENSO temperature anomalies. These temperatures are typically collected on a monthly basis; however, there is no physical reason for the delays in the DDE models to be an integer number of months.
Assume that D[l] ∈ (1, +∞) for all l ∈ L. Model (3) can be extended to account for possibly non-integer delays if we rewrite it as

x_n = ϕ_α(x_{n−1}, x̃_{n−D[l_n]}, l_n) + u_n,    (4)

where x̃_{n−D[l]} is constructed as an interpolation of consecutive elements of the series x_n. In general, for τ ∈ R_+, we let

x̃_τ = Σ_{m=0}^{∞} κ(τ − m) x_m,

where κ : R → R is an interpolation kernel satisfying that x̃_τ = x_τ when τ is an integer. In the computer experiments of Sections 4 and 5 we restrict our attention, for simplicity, to the first-order (linear) interpolation kernel, which yields

x̃_τ = (1 − (τ − ⌊τ⌋)) x_{⌊τ⌋} + (τ − ⌊τ⌋) x_{⌊τ⌋+1},    (5)

where, for a real number r ∈ R, ⌊r⌋ = sup{n ∈ Z : n < r}.
Let us remark that x̃_{n−D[l_n]} in Eq. (4) is not an observed data point; however, it can be deterministically computed from observed data. Also, model (4) reduces to model (3) when D[1], . . ., D[L] are all integers. We refer to model (4), with real delays, as cDN-ARMS(L).
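The first-order kernel of Eq. (5) amounts to ordinary linear interpolation between the two samples that bracket the real index. A minimal sketch (the series values below are arbitrary; integer indices are handled directly so that no interpolation error is introduced at grid points):

```python
import math

def interp_delayed(x, tau):
    """Linear-interpolation surrogate x~_tau for a real index tau >= 0,
    as in Eq. (5): with f = floor(tau) and w = tau - f,
    x~_tau = (1 - w) * x[f] + w * x[f + 1]."""
    f = math.floor(tau)
    w = tau - f
    if w == 0.0:                 # integer index: return the sample itself
        return x[f]
    return (1.0 - w) * x[f] + w * x[f + 1]

series = [0.0, 1.0, 4.0, 9.0]
print(interp_delayed(series, 2.0))   # -> 4.0
print(interp_delayed(series, 2.5))   # -> 6.5
```

In the model, `tau` would be n − D[l_n]; the surrogate is a deterministic function of the observed samples, as remarked above.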

Invariant distribution and mean power
The classical results of [37] on the ergodicity and invariant distributions of first-order ARMS models with additive noise can be extended to the cDN-ARMS(L) model described in Section 2.2 above. Let us fix the model parameters α and define the maps ϕ_α[l](•, •) := ϕ_α(•, •, l) for l ∈ L. We make the following assumptions on the cDN-ARMS(L) model:

Assumption 1. The functions ϕ_α[1], . . ., ϕ_α[L] are Lipschitz. In particular, there are finite constants c[l] such that

∥ϕ_α[l](x) − ϕ_α[l](x′)∥ ≤ c[l] ∥x − x′∥

for l = 1, . . ., L, where ∥•∥ denotes the Euclidean norm.

Assumption 2. The Markov chain {l_n} with transition matrix M is stationary, with invariant distribution P_∞(l), l ∈ L.

Assumption 3. The invariant distribution P_∞ and the Lipschitz constants c[l], l ∈ L, satisfy the inequality

Σ_{l∈L} P_∞(l) log c[l] < 0.

The results stated below follow from Theorems 3 and 4 in [37].
Proposition 1. Let η_n denote the probability law of x_n. If Assumptions 1, 2 and 3 hold, then there exists a unique probability law η such that lim_{n→∞} η_n = η.
Proof. We convert model (4), with delays, for x_n ∈ R^d into a first-order model (without delays) in a higher-dimensional space. Specifically, let us define the stacked vector

z_n := (x_n, x_{n−1}, . . ., x_{n−S})^⊤,    (6)

where S := ⌈D_+⌉, so that z_n ∈ R^{d(S+1)}. Model (4) can now be equivalently rewritten as a first-order Markov-switching system for z_n, driven by functions φ_α[l] which are Lipschitz. Indeed, from Eq. (6) and Assumption 1 we obtain Lipschitz constants for the maps φ_α[l] in terms of the constants c[l] of the original maps ϕ_α[l]. Combining this result with Assumptions 2 and 3 we can apply Theorem 3 in [37], which implies the statement of Proposition 1.
Let us additionally introduce the matrix M_s, with entries (M_s)_{ij} := M_ij c[j]^s, where s ≥ 1, and let ρ(M_s) denote the spectral radius of M_s. Then, we can characterise the moments of order s of the sequence x_n.
Proposition 2. If Assumptions 1 and 2 hold, and ρ(M_s) < 1, then the unique law η = lim_{n→∞} η_n has finite moments of order s. In particular, if ρ(M_2) < 1 then the sequence {x_n} has finite mean power, i.e., sup_n E[∥x_n∥²] < ∞.

Proof. This result follows from Theorem 4 in [37] by the same argument as in the proof of Proposition 1.
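Assumptions 2-3 and the spectral-radius condition of Proposition 2 are straightforward to check numerically for a given model. The sketch below does so for a hypothetical two-layer example (the transition matrix and the Lipschitz constants c[l] are illustrative choices, not taken from the paper): it computes the invariant distribution P_∞ as the normalised left eigenvector of M for eigenvalue 1, evaluates the drift condition Σ_l P_∞(l) log c[l] < 0, and checks ρ(M_2) < 1.

```python
import numpy as np

# Hypothetical two-layer chain: one mildly expansive layer (c = 1.05)
# and one strongly contractive layer (c = 0.4).
M = np.array([[0.8, 0.2],
              [0.2, 0.8]])
c = np.array([1.05, 0.4])

# Invariant distribution P_inf: left eigenvector of M for eigenvalue 1.
vals, vecs = np.linalg.eig(M.T)
p_inf = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
p_inf = p_inf / p_inf.sum()

# Assumption 3: sum_l P_inf(l) * log c[l] must be negative.
log_drift = float(p_inf @ np.log(c))

def rho(M, c, s):
    """Spectral radius of M_s with entries (M_s)_ij = M_ij * c[j]**s."""
    Ms = M * (c ** s)[None, :]
    return float(np.max(np.abs(np.linalg.eigvals(Ms))))

# Both the ergodicity and the finite-mean-power conditions hold here,
# even though one layer is expansive in isolation.
print(log_drift, rho(M, c, 2))
```

The example illustrates the qualitative message of [37]: an individually unstable layer is tolerable provided the chain spends enough time in contractive layers.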

Model inference
We introduce a space-alternating (SA) EM algorithm for iterative ML parameter estimation in the general cDN-ARMS(L) model described in Section 2.2. First, we obtain the likelihood for the proposed class of models, recall the standard EM method and explain why it is not tractable. We then describe the SA-EM scheme and conclude this section with a succinct discussion of the ML detection of the number L of dynamical layers.

Likelihood function
Let x_{0:T} = {x_0, . . ., x_T} be a sequence of observations (and assume that x_n is given for all n < 0). The set of model parameters to be estimated is Λ = Λ_M ∪ Λ_D ∪ Λ_α, where Λ_M, Λ_D and Λ_α are the set of entries of the L × L transition matrix M, the set of (possibly real) delays and the set of entries of the k × 1 vector α, respectively.
We denote the likelihood of the parameter set Λ given the observed sequence x_{0:T} as p(x_{0:T} | Λ). In order to obtain an explicit expression for the likelihood we write it in terms of the joint distribution of x_{0:T} and the Markov sequence of layers l_{0:T}, namely,

p(x_{0:T} | Λ) = Σ_{l_{0:T}} p(x_{0:T}, l_{0:T} | Λ).    (9)

For any 0 < n ≤ T, we can use Bayes' theorem to obtain a recursive decomposition of the joint distribution,

p(x_{0:n}, l_{0:n} | Λ) = p(x_n | x_{0:n−1}, l_n, Λ) P(l_n | l_{n−1}, Λ) p(x_{0:n−1}, l_{0:n−1} | Λ),    (10)

where we have used the Markov property and the fact that l_n is conditionally independent of x_{0:n−1} given l_{n−1}. If we (repeatedly) substitute the recursive relationship (10) into Eq. (9) we readily obtain

p(x_{0:T} | Λ) = Σ_{l_{0:T}} p(x_0 | l_0, Λ) P(l_0 | Λ) Π_{n=1}^{T} p(x_n | x_{0:n−1}, l_n, Λ) P(l_n | l_{n−1}, Λ),

where all factors can be computed. In particular, note that p(x_0 | l_0, Λ) is tractable because we have assumed that x_{−1}, x_{−2}, . . . are known.
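The sum over all L^{T+1} layer sequences need not be enumerated explicitly: it can be evaluated with the standard forward recursion at O(T L²) cost. A generic sketch, in which the per-layer conditional log-densities log p(x_n | x_{0:n−1}, l_n = l, Λ) are supplied by the caller (the Gaussian example at the bottom is a hypothetical stand-in, not the paper's layer density):

```python
import numpy as np

def log_likelihood(x, M, loglik_layer, p0):
    """Forward recursion for the marginal likelihood of an ARMS model.

    loglik_layer(n, l) returns log p(x_n | x_{0:n-1}, l_n = l, Lambda);
    p0 is the pmf of l_0.  All work is done in the log domain for
    numerical stability.  This is a generic sketch: the layer densities
    are abstracted away behind loglik_layer.
    """
    L = len(p0)
    log_alpha = np.log(p0) + np.array([loglik_layer(0, l) for l in range(L)])
    for n in range(1, len(x)):
        # alpha_n(l) = p(x_n | ., l) * sum_j alpha_{n-1}(j) * M[j, l]
        trans = np.logaddexp.reduce(log_alpha[:, None] + np.log(M), axis=0)
        log_alpha = trans + np.array([loglik_layer(n, l) for l in range(L)])
    return float(np.logaddexp.reduce(log_alpha))

# Toy usage: two Gaussian layers with different means (illustrative only).
x = np.array([0.1, 2.0, 1.9, 0.0])
mu = [0.0, 2.0]
def ll(n, l):
    return -0.5 * (x[n] - mu[l]) ** 2 - 0.5 * np.log(2 * np.pi)
M = np.array([[0.9, 0.1], [0.1, 0.9]])
llh = log_likelihood(x, M, ll, np.array([0.5, 0.5]))
```

For the cDN-ARMS(L) model, `loglik_layer` would evaluate the Gaussian transition density of layer l, using the (possibly interpolated) delayed sample.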
The ML estimator of the parameters is the solution of the problem Λ_ML ∈ arg max_Λ p(x_{0:T} | Λ). However, even when the ϕ[l]'s are linear, it is not possible to compute Λ_ML exactly and we need to resort to numerical approximations [10].

Expectation-maximisation algorithm
Let x and y be r.v.'s, let θ be some parameter and let f(x) be some transformation of x. We write E_{x|y,θ}[f(x)] to denote the expected value of f(x) w.r.t. the distribution with pdf p(x|y, θ), i.e., E_{x|y,θ}[f(x)] = ∫ f(x) p(x|y, θ) dx. A standard EM algorithm [7, 25] for the iterative ML estimation of Λ from the data sequence x_{0:T} can be outlined as follows.
1. Initialisation: choose an initial estimate Λ̂_0 and set i = 0.
2. Expectation step: given an estimate Λ̂_i, obtain the expectation

Q(Λ; Λ̂_i) := E_{l_{0:T} | x_{0:T}, Λ̂_i}[log p(x_{0:T}, l_{0:T} | Λ)].    (11)

3. Maximisation step: obtain a new estimate, Λ̂_{i+1} ∈ arg max_Λ Q(Λ; Λ̂_i).
Using standard terminology, x_{0:T} is the observed data, l_{0:T} is the latent data and {x_{0:T}, l_{0:T}} is the complete data set. The basic guarantee provided by the EM algorithm is that the estimates Λ̂_0, Λ̂_1, . . . have non-decreasing likelihoods, i.e., for every i ≥ 0, p(x_{0:T} | Λ̂_{i+1}) ≥ p(x_{0:T} | Λ̂_i) [25].
If we substitute the recursive decomposition (10) into the expectation in (11) we arrive at a decomposition of Q(Λ; Λ̂_i) into separate terms involving Λ_M on one side and Λ_D ∪ Λ_α on the other, written in terms of posterior expectations of the layer indices, where P(l_n | x_{0:n}, Λ̂_i) and P(l_{n−1}, l_n | x_{0:n}, Λ̂_i) are the posterior pmf's of the random indices l_n and l_{n−1:n}, respectively, conditional on the observations x_{0:n} and the i-th parameter estimates Λ̂_i.
Given these probabilities, from the sequence of Eqs. (12), (13), (14) and (15) it is apparent that the update Λ̂_{M,i+1} = arg max_{Λ_M} Q(Λ; Λ̂_i) can be computed in closed form. Unfortunately, the problem Λ̂_{D,i+1} ∪ Λ̂_{α,i+1} ∈ arg max_{Λ_D ∪ Λ_α} Q(Λ; Λ̂_i) is analytically intractable in general and the (typically) large number of parameters in Λ_D ∪ Λ_α makes numerical approximations hard in practice. As a workaround, we propose a SA-EM algorithm [9] that can be used systematically for the class of models described by (4).

Space-alternating expectation-maximisation algorithm
Let us re-partition the parameter set as Λ = Λ_M ∪ Λ_* ∪ Λ_c, where Λ_M, Λ_* and Λ_c are disjoint sets, and
• Λ_M contains the L × L entries of the transition matrix M, as before;
• Λ_* contains the parameters that can be maximised in closed form in the M-step, possibly including some entries of α;
• Λ_c contains the remaining parameters, i.e., the delays in Λ_D and those entries of α for which no closed-form update is available.
We can now summarise the proposed SA-EM algorithm as follows.
2. Expectation step: given an estimate Λ̂_i, run a forward-backward algorithm to compute the pmf's P(l_n | x_{0:n}, Λ̂_i) for all l_n ∈ L and P(l_{n−1:n} | x_{0:n}, Λ̂_i) for all (l_{n−1}, l_n) ∈ L × L, as in Eqs. (14) and (15), respectively.
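The posteriors needed in the E-step can be obtained with a standard HMM forward-backward pass over the layer indices. The sketch below is generic rather than the paper's exact implementation: it assumes the layer log-densities logB[n, l] = log p(x_n | x_{0:n−1}, l_n = l) are precomputed, and it returns the smoothed posteriors over the whole record (the filtered pmf's follow from the forward pass alone).

```python
import numpy as np

def forward_backward(logB, M, p0):
    """Standard HMM forward-backward pass in the log domain.

    Returns gamma[n, l] = P(l_n = l | data) and
    xi[n-1, i, j] = P(l_{n-1} = i, l_n = j | data)."""
    T, L = logB.shape
    logM = np.log(M)
    la = np.empty((T, L))          # forward log-messages
    lb = np.zeros((T, L))          # backward log-messages
    la[0] = np.log(p0) + logB[0]
    for n in range(1, T):
        la[n] = logB[n] + np.logaddexp.reduce(la[n - 1][:, None] + logM, axis=0)
    for n in range(T - 2, -1, -1):
        lb[n] = np.logaddexp.reduce(logM + logB[n + 1] + lb[n + 1], axis=1)
    lg = la + lb
    gamma = np.exp(lg - np.logaddexp.reduce(lg, axis=1, keepdims=True))
    xi = np.empty((T - 1, L, L))
    for n in range(1, T):
        lx = la[n - 1][:, None] + logM + logB[n] + lb[n]
        xi[n - 1] = np.exp(lx - np.logaddexp.reduce(lx.ravel()))
    return gamma, xi

# Toy layer likelihoods (illustrative values, not from the paper).
M = np.array([[0.8, 0.2], [0.3, 0.7]])
p0 = np.array([0.5, 0.5])
logB = np.log(np.array([[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]))
gamma, xi = forward_backward(logB, M, p0)
```

In the M-step, the transition-matrix update then takes the usual form M̂_ij ∝ Σ_n xi[n−1, i, j], normalised row-wise.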
The SA-EM algorithm can be run for a fixed, prescribed number of iterations or stopped when some criterion is fulfilled, e.g., that the difference between successive estimates is smaller than a given threshold.
Remark 1. The SA-EM algorithm is designed in the vein of the space-alternating generalised EM methods of [9]. However, the scheme introduced in this paper does not rigorously belong to the class of algorithms described in [9]. Nevertheless, it is not hard to prove that the estimates Λ̂_i generated by the SA-EM scheme have non-decreasing likelihoods, i.e., p(x_{0:T} | Λ̂_{i+1}) ≥ p(x_{0:T} | Λ̂_i) for every i ≥ 1. This is the same property enjoyed by the standard EM method and the generalised algorithms of [9].

Estimation of the number of layers L
When the number of dynamical layers, L, in the cDN-ARMS(L) model is unknown we can use the proposed SA-EM algorithm to estimate it. In particular, assume that L ∈ A := {a_−, . . ., a_+}. We can run I iterations of the SA-EM algorithm for each candidate value of L and compute the log-likelihood ℓ_T(L) := log p(x_{0:T} | Λ̂_I), where Λ̂_I is the set of parameter estimates after I iterations (note that the number of elements in this set increases with L). The likelihood p(x_{0:T} | Λ̂_I) above can be computed exactly as a by-product of the forward-backward algorithm run in the expectation step, with a computational cost O(T) (see [10] for details).
Choosing the value of L ∈ {a_−, . . ., a_+} that maximises ℓ_T(L) typically leads to overestimation due to over-fitting. Following [30], we adopt a penalised likelihood ℓ̃_T(L), obtained by subtracting from ℓ_T(L) a penalty that grows with the number of parameters in the model, and select the value of L that maximises ℓ̃_T(L).

A stochastic DDE model of ENSO

The ENSO phenomenon involves recurrent anomalies in the trade winds and the sea surface temperature (SST) at the equator of the Pacific Ocean, and its warm phase (El Niño) is associated with a strong increase in rainfall intensity and duration in Central and South America. These anomalies have been historically modelled as the solution of a DDE [35, 29, 36].
Several conceptual equations have also been proposed in the literature [13,16] that incorporate stochastic terms or display chaotic dynamics.
For the computer experiments in this section we consider a nonlinear DDE based on the model of Ghil et al. [13], where we include a diffusion term to obtain the stochastic DDE, in Itô form,

dT(t) = [−a tanh(κ T(t − τ)) + b cos(2π ω t)] dt + σ dW(t),    (17)

where t denotes continuous time (the time unit is 1 year), T(t) is the SST anomaly, τ ∈ R_+ is a time delay, a, b, κ and ω are constants, W(t) is a standard Wiener process and σ > 0 is a diffusion factor that determines the intensity of the stochastic perturbation.
Equation (17) can be integrated numerically using different schemes [4]. For simplicity, we apply a weak-order 1.0 Euler-Maruyama scheme with constant time step h > 0, which yields

T_n = T_{n−1} + h[−a tanh(κ T_{n−D}) + b cos(2π ω n h)] + σ √h u_n,    (18)

where T_n ≈ T(nh) is an approximation of the temperature process at t = nh, D is a discrete delay computed as D = τ/h (we assume that τ can be expressed as an integer multiple of h) and {u_n} is an i.i.d. sequence of N(0, 1) r.v.'s.
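For illustration, the Euler-Maruyama recursion (18) can be implemented in a few lines. The parameter values below and the zero initial history are hypothetical choices for demonstration only, not the values used in the experiments of this paper:

```python
import numpy as np

def simulate_enso_dde(a, b, kappa, omega, sigma, tau, h, T_end, seed=0):
    """Euler-Maruyama integration of the delayed SDE (17):
    dT = [-a*tanh(kappa*T(t - tau)) + b*cos(2*pi*omega*t)] dt + sigma dW.

    Assumes tau is an integer multiple of h (so D = tau / h is exact) and
    a zero history T(t) = 0 for t <= 0, which is an illustrative choice.
    """
    rng = np.random.default_rng(seed)
    D = int(round(tau / h))              # discrete delay D = tau / h
    n_steps = int(round(T_end / h))
    x = np.zeros(n_steps + D + 1)        # first D+1 entries: zero history
    sqh = np.sqrt(h)
    for n in range(D, D + n_steps):
        t = (n - D) * h
        drift = -a * np.tanh(kappa * x[n - D]) + b * np.cos(2 * np.pi * omega * t)
        x[n + 1] = x[n] + h * drift + sigma * sqh * rng.standard_normal()
    return x[D:]                         # drop the artificial history

traj = simulate_enso_dde(a=1.0, b=1.0, kappa=10.0, omega=1.0,
                         sigma=0.1, tau=0.5, h=1/12, T_end=20.0)
```

With h = 1/12 (one month) the returned array is the monthly series that the DN-ARMS(L) construction below takes as its starting point.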
Model (17) and its discretised version (18) can be shown to yield sequences of temperatures which are (qualitatively and quantitatively) similar to the measurements of SST anomalies in the South Pacific Ocean. However, these models are not accurate enough for reliable forecasting and, in particular, it has not been possible to use them to predict the large "spikes" in SST that characterise the El Niño phenomenon. In an attempt to extend the applicability of the model, we propose to construct multi-layer cDN-ARMS(L) models based on (18) that extend the flexibility of the original equation.

DN-ARMS(L) model with integer delays
Let us construct a DN-ARMS(L) model of the form in Eq. (3) for SST anomaly time series. Publicly available SST data for ENSO is given in the form of monthly averaged SSTs and temperature anomalies. For this reason, we adopt h = 1/12 as the time step in the Euler scheme (18). This time step is known and common to all L layers of the DN-ARMS(L) model. Additionally, we define:
• A Markov chain {l_n} taking values in L = {1, . . ., L} with an L × L transition matrix M.
The assumption that τ[l]/h ∈ N for every l may be a mild one when h can be chosen sufficiently small; however, it is hardly realistic for a time step of one month (and we drop it in Section 4.3 below).
The resulting DN-ARMS(L) model of the ENSO time series can be compactly written as

x_n = x_{n−1} + h[−a[l_n] tanh(κ[l_n] x_{n−D[l_n]}) + b[l_n] cos(2π w[l_n] n h)] + σ[l_n] √h u_n,    (19)

where x_n is the SST anomaly at time t = nh and {u_n} is an i.i.d. N(0, 1) sequence.

Estimation of the number of layers L
In the first set of computer experiments we assess the estimation of the number of layers L using the penalised maximum likelihood method described in Section 3.4.
We simulate two data sets, one generated from a model with L_o = 2 layers and another one from a model with L_o = 3 layers, and fit DN-ARMS(L) models with different values of L to each of them.

Table 1: Log-likelihoods and penalised log-likelihoods when the true model has L_o = 2 layers (left) and L_o = 3 layers (right).
Table 1 shows the log-likelihoods and the penalised log-likelihoods obtained for the models with L = 2, 3, 4, when the true model generating the data has L_o = 2 layers (left) and L_o = 3 layers (right). It is seen that higher likelihoods are obtained as L is increased, even when L > L_o. This is due to over-fitting of the model parameters.
When a simple penalisation is included, the correct number of dynamical layers is detected in both experiments.

Parameter estimation
In this section we study the accuracy of the SA-EM parameter estimation algorithm for the model with L_o = 2 layers described in Section 4.2.1. If D̂[l](r) is the estimate of the delay D[l] in the r-th independent simulation, for r = 1, . . ., R, then the frequency of (correct) detections is

F_D = Σ_{r=1}^{R} Π_{l=1}^{L_o} δ[D̂[l](r) − D[l]],

where δ[•] is the Kronecker delta function.
The computer experiment consists of the following steps: i) Generate R = 100 independent realisations of the time series model with L_o = 2 described in Section 4.2.1. Each signal consists of T = 1,000 data points. A sample signal is displayed in Figure 1a.
ii) For each independent realisation, extract four subsequences containing the first 250, 500, 750 and 1,000 data points, respectively.
iii) Generate an initial condition for each realisation, Λ̂_0(r), r = 1, . . ., R. Apply the SA-EM algorithm to each subsequence of each realisation and obtain parameter estimates.
iv) For each r = 1, . . ., R and each subsequence, compute the normalised errors for the transition matrix (e_M(r)) and each real parameter (e_αi(r)), as well as the detection frequency F_D for the integer delays.
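The error metrics of step iv) can be computed directly from their definitions (Frobenius norm for the transition matrix, entry-wise normalised absolute errors for α, exact match for the integer delays). A small sketch with hypothetical example values:

```python
import numpy as np

def error_metrics(M_hat, M, alpha_hat, alpha, D_hat, D):
    """Normalised estimation errors, following the definitions in the text:
    e_M = ||M_hat - M||_F / ||M||_F, e_alpha_i = |a_hat_i - a_i| / |a_i|,
    and a correct-detection indicator for the integer delays."""
    e_M = np.linalg.norm(M_hat - M) / np.linalg.norm(M)   # Frobenius norm
    e_alpha = np.abs(alpha_hat - alpha) / np.abs(alpha)
    detected = bool(np.array_equal(D_hat, D))             # all delays exact
    return e_M, e_alpha, detected

# Hypothetical true values and estimates for one simulation run.
M_true = np.array([[0.9, 0.1], [0.2, 0.8]])
M_est = M_true.copy(); M_est[0, 0] = 1.0                  # one entry off by 0.1
alpha_true = np.array([2.0, -4.0])
alpha_est = np.array([2.2, -4.0])
e_M, e_alpha, hit = error_metrics(M_est, M_true, alpha_est, alpha_true,
                                  np.array([3, 9]), np.array([3, 9]))
```

Averaging `detected` over the R runs gives the detection frequency F_D (up to the factor R if absolute counts are preferred).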
The purpose of this setup is to demonstrate the effectiveness of the SA-EM algorithm, study the existence of local maxima of the likelihood and illustrate the improvement in the accuracy of the parameter estimates as the length of the observed series increases. The errors for the same parameter in the two layers, e.g., a[1] and a[2], are grouped in the same box plot.
Figure 2 shows box plots of the normalised errors for the remaining parameters. For each parameter and each length of the data sequence, the red line indicates the median of the errors, the box extends between the .25 and .75 quantiles of the empirical distribution, the whiskers extend to the complete distribution and the circles are outliers (i.e., points located above the upper quartile by more than 1.5 times the interquartile range). We observe how the median error decreases when more data points are available. As with the delays, outliers are due to simulations where the SA-EM algorithm converges to a local maximum of the likelihood that differs significantly from its global maximum.
These simulations indicate that, in any practical application, the SA-EM algorithm should be run with multiple initialisations (even for a single data set).One can then select the parameter estimates from the run that attained the highest log-likelihood (which is computed by the SA-EM algorithm as a by-product of the forward-backward procedure).
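The multiple-initialisation strategy can be wrapped around any EM routine. In the sketch below, `run_em` and `init_sampler` are hypothetical placeholders for the SA-EM routine and an initial-guess generator; the wrapper simply keeps the run that attains the highest final log-likelihood:

```python
import numpy as np

def best_of_restarts(run_em, init_sampler, n_starts=10, seed=0):
    """Run `run_em` from several initialisations and keep the best run.

    run_em(init) must return a pair (estimate, log_likelihood);
    init_sampler(rng) draws an initial parameter guess.  Both are
    placeholders for the user's actual estimation routine.
    """
    rng = np.random.default_rng(seed)
    best_est, best_ll = None, -np.inf
    for _ in range(n_starts):
        est, ll = run_em(init_sampler(rng))
        if ll > best_ll:
            best_est, best_ll = est, ll
    return best_est, best_ll

# Toy check: a 'likelihood' with its maximum at theta = 3, and three
# deterministic initial guesses standing in for random restarts.
inits = iter([0.0, 2.9, 5.0])
est, ll = best_of_restarts(lambda t: (t, -(t - 3.0) ** 2),
                           lambda rng: next(inits), n_starts=3)
print(est)   # -> 2.9
```

Because the SA-EM algorithm already produces the log-likelihood as a by-product of the forward-backward pass, this wrapper adds no extra cost beyond the repeated runs themselves.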

cDN-ARMS(L) model with non-integer delays
The assumption of integer delays, i.e., that the delays τ[l], l = 1, . . ., L, are all integer multiples of the one-month time step h = 1/12, is unrealistic. In this section, we allow the discrete delays to take real values D[l] ∈ (1, +∞) and construct a cDN-ARMS(L) model of the form in Eq. (4) for the ENSO time series. To be specific, the SA-EM algorithm is implemented with the model

x_n = x_{n−1} + h[−a[l_n] tanh(κ[l_n] x̃_{n−D[l_n]}) + b[l_n] cos(2π w[l_n] n h)] + σ[l_n] √h u_n,    (20)

where x̃_{n−D[l_n]} is computed by linear interpolation as shown in Eq. (5). The transition matrix M and the parameters α are defined in the same way as in Section 4.2, and x_n ∼ N(0, 1) for all n < 0.

Generation of synthetic time series
The interpolated signal x̃_{n−D[l]} in (20) is an approximation that we impose to incorporate a real delay into a discrete-time model. In order to put this approximation to a test, we generate time series data using stochastic DDEs of the form in Eq. (17) that we integrate with an Euler-Maruyama scheme on a finer grid, namely, with a time step of the form h = 1/(12m), where m is a positive integer. In particular, we let again L_o = 2 and generate an auxiliary data sequence {y_i} from the finer-grid counterpart of model (19), where the D[l]'s are positive integer delays (in the finer time scale), {u_i} is a standard Gaussian i.i.d. sequence of noise variables, y_i ∼ N(0, 1) for all i < 0, l_i ∼ P(l_i | l_{i−1}) = M_{l_{i−1}, l_i} when i is an integer multiple of m, and l_i = l_{i−1} when i is not an integer multiple of m (i.e., the index l_i can only change every m time steps).
The actual data set used for the computer simulations is then obtained by subsampling {y_i} by a factor m, namely, x_n = y_{nm} for n = 0, 1, . . . In this way, we generate a data sequence {x_n} that depends on non-integer delays and does not rely on interpolation or any other approximation based on the observed data.
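The fine-grid-then-subsample construction can be sketched as follows. The per-step map here is a toy stand-in, not the ENSO model of Eq. (19); the point is only that the observed series x_n = y_{nm} carries the non-integer delay D/m whenever the fine-grid delay D is not a multiple of m:

```python
import numpy as np

def subsampled_series(m, T, D_fine, seed=1):
    """Simulate a delayed AR-type recursion on a grid m times finer than
    the observation grid, then keep every m-th sample.

    The recursion y_i = 0.9*y_{i-1} - 0.2*tanh(y_{i-D_fine}) + 0.05*u_i
    is an illustrative stand-in for the fine-grid model of the text;
    the zero history for i < 0 is also an assumption.
    """
    rng = np.random.default_rng(seed)
    n_fine = T * m
    y = np.zeros(n_fine + D_fine)                 # zero fine-grid history
    for i in range(D_fine, n_fine + D_fine):
        y[i] = (0.9 * y[i - 1] - 0.2 * np.tanh(y[i - D_fine])
                + 0.05 * rng.standard_normal())
    return y[D_fine::m]                           # observed x_n = y_{n m}

# Fine-grid delay 7 with m = 2 gives an observed (non-integer) delay 3.5.
x = subsampled_series(m=2, T=100, D_fine=7)
```

This mirrors the experimental design: the data genuinely depend on a non-integer delay in the observed time scale, while no interpolation enters the generation process.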

Parameter estimation
We conduct a set of computer experiments similar to those in Section 4.2.2, but using data sequences generated by the procedure in Section 4.3.1. Figure 3 shows the box plots of the normalised errors for all parameters. We note that the errors are grouped per parameter through the two layers, e.g., the box plot in Figure 3a corresponds to the population of errors {e_a[1](r), e_a[2](r) : r = 1, . . ., R}.
Similar to Figure 2, we observe that the median errors decrease consistently as the length of the data sequence increases, except for the real delays D[1:2]. In this case we see that the normalised errors are small (close to 10^−2) and their variance reduces with increasing data length, but the median error remains approximately constant. This numerical result indicates that the estimators of D[1] and D[2] present a bias due to the mismatch between model (20), used by the SA-EM algorithm, and the model of Section 4.3.1 used to generate the data.

Fitting real ENSO data

We have applied the SA-EM algorithm to the real ENSO data set in order to fit different models, namely:
• cDN-ARMS(L) models with L = 2, 3, 4,
• two Markov-switching models with L = 2 layers where each layer is a linear AR system; in one model each layer is AR(3) and in the second model each layer is AR(4).
All of these models are fitted using the SA-EM algorithm. For the cDN-ARMS(L) models the algorithm is applied in the same way as in Section 4.3. For the models with linear AR layers, the algorithm simplifies considerably, as Λ_c = ∅, which implies that there is no need for the ARS optimisation scheme. The overall estimation procedure becomes very similar to the one in [10] (except that the models in [10] are of order 1).
Figure 4 (right) shows sample series generated with four of the fitted models: two cDN-ARMS(L) models based on Ghil et al.'s Eq. (17) and two Markov-switching linear AR models. We skip the cDN-ARMS(4) model, which attained poor estimates (due to insufficient data).

Parameter estimates
We examined the estimated transition matrices and delays for both the cDN-ARMS(2) and the cDN-ARMS(3) models. We remark that, in both cases, the estimated delays have non-negligible fractional parts. This result justifies the need for modelling the delays as real random variables.

Model assessment
Table 2 shows a comparison of the four models in terms of their log-likelihood ℓ_T(L) and their penalised log-likelihood ℓ̃_T(L), computed as described in Section 3.4.
We observe that the cDN-ARMS(L) models attain a higher log-likelihood than the Markov-switching linear AR(L) models, and increasing the number of parameters to fit (by increasing L or the order of the AR models) yields a higher likelihood. When looking at the penalised log-likelihood, however, the lower-order models yield the best results and the Markov-switching linear AR(3) model with L = 2 layers attains the highest value. In terms of the empirical autocorrelation (Figure 5), on the other hand, the cDN-ARMS(2) model yields the autocorrelation closest to the one obtained from the real data. Also, the two cDN-ARMS(L) models capture the negative autocorrelation of the ENSO data (after month 12), although they both underestimate it. The Markov-switching models with linear AR layers fail to capture the true autocorrelation after 5-6 months.
The cDN-ARMS(3) model also yields a good approximation of the right tail of the distribution (for SST anomalies > 2). The representation of the left tail is poorer, however. We note that the positive anomalies ('El Niño') in the ENSO data of Figure 4 (left) are stronger than the negative ones ('La Niña'), which may explain why the cDN-ARMS(3) model attains a better fit of the right tail of the distribution.

Conclusions
We have introduced a class of nonlinear autoregressive Markov-switching time series models where each dynamical layer (or sub-model) may have a different, possibly non-integer delay. This class includes a broad collection of systems that result from the discretisation of stochastic DDEs where the characteristic delays are not known a priori. Such models are common in geophysics.
The proposed family of models admits an asymptotic-regime analysis similar to the classical results of [37] for first-order Markov-switching systems. In particular, we have explicitly described sufficient conditions for the random sequences generated by the proposed models to converge to a unique invariant distribution. We have also introduced numerical methods, based on a space-alternating EM procedure, to detect the number of dynamical layers in the model and to compute ML estimators of any unknown parameters, including the multiple, possibly non-integer delays. The performance of these inference methods has been tested on nonlinear autoregressive Markov-switching models that combine two or three dynamical layers, each one of them originating from a DDE typically used to represent anomalies in the sea surface temperature of (certain regions of) the Pacific Ocean due to the ENSO phenomenon.
Real-world ENSO data are recorded as a monthly series, yet the delays that characterise the phenomenon are not known a priori and there is no physical reason for them to be integer multiples of a month. Our computer simulations show that it is possible to detect the number of layers and estimate the parameters of models with at least two or three dynamical layers using relatively short series of observations. The cDN-ARMS(L) models fitted using real ENSO data and the proposed space-alternating EM algorithm can capture the autocorrelation features of the data up to 15 months, as well as its central and right-tail quantiles. The delays estimated from the real data are, indeed, non-integer.

and aim at estimating the transition matrix M as well as the integer delays D[1:2] = (D[1], D[2])^⊤ and the model parameters α = (a[1:2], b[1:2], κ[1:2], w[1:2], σ[1:2])^⊤. We assess the accuracy of the estimators of real parameters in terms of normalised errors. Assume we run R independent simulations, all with the same true parameters (but independently generated observations x_{0:T}). Then, the normalised estimation errors for the transition matrix M are e_M(r) := ∥M̂(r) − M∥_F / ∥M∥_F, r = 1, . . ., R, where ∥•∥_F denotes the Frobenius norm and M̂(r) represents the estimate of M in the r-th independent simulation. For the parameters in α, the normalised errors are computed as e_αi(r) = |α̂_i(r) − α_i| / |α_i|, r = 1, . . ., R, where α_i denotes the i-th entry of vector α and α̂_i(r) is its estimate in the r-th independent simulation. Since the delays D[1:2] are integers, a normalised error of this kind does not provide a meaningful characterisation of performance. Instead, we assess the estimation algorithm by computing the frequency of correct detections, i.e., the fraction of runs in which D̂[l](r) = D[l] for every layer l.

Figure 1b shows a bar diagram with the absolute frequency of correct detections, F_D, for each data sequence length. Since there are R = 100 simulations (for each length), the maximum value of F_D is R = 100. We see that, as the length of the data sequence increases, the value of F_D improves as well. When the number of data points in the sequence is T = 1,000 we obtain F_D ≈ 95, i.e., D[1] and D[2] are both detected correctly in ≈ 95% of the simulations. The delays become mismatched in the simulation runs where the SA-EM algorithm converges to a local maximum of the likelihood.
Note that the delays D[l], which are integers in the discrete-time scale of {y_i}, become rational in the discrete-time scale of {x_n}; for m = 2, the possible values are of the form D[l] ∈ {r, r ± 1/2}, where r ∈ Z_+. For general m ∈ Z_+, the delays in the subsampled time scale of the series {x_n} are of the form D[l] ∈ {r, r + 1/m, . . ., r + (m−1)/m}, with r ∈ Z_+.

The SA-EM algorithm is applied as described in Section 3, adapted to account for non-integer delays. The estimation errors for M and the parameters in vector α are computed in the same way as in Section 4.2.2. The estimates of the delays in these experiments are real numbers; hence, we compute normalised errors of the form e_D[l](r) = |D[l] − D̂[l](r)| / |D[l]|, where D[l] is the true value of the delay and D̂[l](r) is the estimate in the r-th simulation run. The procedure for the computer experiments is the same as in Section 4.2.2, with R = 100 independently generated data sequences of length T = 1,000, each of them split to obtain subsequences of length 250, 500, 750 and 1,000. The values of the true parameters α and the transition matrix M used to generate the data are the same as in the model with L_o = 2 in Section 4.2.1. The auxiliary sequence {y_i} is generated with a time step 1/(12m), with m = 2. The true non-integer delays are D[1] = 3.5 and D[2] = 9.5.

Finally, Figure 5 (right) shows the quantile-quantile (QQ) plots for the four models compared to the ENSO data. Here, the ENSO quantiles are obtained from the empirical distribution of the real data, while the quantiles for the models are computed by averaging over 2,000 independent simulations of each model. We observe that the Markov-switching models with linear AR layers fail to capture both the central quantiles of the distribution of the ENSO data and its tails. The two cDN-ARMS(L) models yield a very good approximation of the central quantiles (roughly from -2 to 2), and the cDN-ARMS(3) model also yields a good approximation of the right tail of the distribution.

Fig. 5: Left: autocorrelation functions for the real ENSO data and the fitted models, from 0 to 15 months. Right: QQ-plots of the fitted models. Models are labelled in the same way as in Table 2.

The positive anomalies ('El Niño') in the ENSO data of Figure 4 (left) are stronger than the negative ones ('La Niña'), which may explain why the cDN-ARMS(3) model attains a better fit of the right tail of the distribution.

Table 2: Log-likelihoods and penalised log-likelihoods of the cDN-ARMS(L) (Ghil) models with L = 2, 3, 4 and the 2-layer linear AR models, when fitted to the real data.

Other comparisons yield a different view, however. Figure 5 (left) shows the empirical autocorrelation function computed using the ENSO data (blue colour) and the empirical autocorrelations obtained for the same four models in Table 2. These curves are computed by averaging the empirical autocorrelations of 2,000 independently generated series for each model. It is seen that the cDN-ARMS(2) model (Ghil, L = 2) yields the autocorrelation closest to the one obtained from the real data.