Testing for observation-dependent regime switching in mixture autoregressive models

Testing for regime switching when the regime switching probabilities are specified either as constants ('mixture models') or are governed by a finite-state Markov chain ('Markov switching models') is a long-standing problem that has also attracted recent interest. This paper considers testing for regime switching when the regime switching probabilities are time-varying and depend on observed data ('observation-dependent regime switching'). Specifically, we consider the likelihood ratio test for observation-dependent regime switching in mixture autoregressive models. The testing problem is highly nonstandard, involving unidentified nuisance parameters under the null, parameters on the boundary, singular information matrices, and higher-order approximations of the log-likelihood. We derive the asymptotic null distribution of the likelihood ratio test statistic in a general mixture autoregressive setting using high-level conditions that allow for various forms of dependence of the regime switching probabilities on past observations, and we illustrate the theory using two particular mixture autoregressive models. The likelihood ratio test has a nonstandard asymptotic distribution that can easily be simulated, and Monte Carlo studies show the test to have satisfactory finite sample size and power properties.


Introduction
Different regime switching models are in widespread use in economics, finance, and other fields. When the regime switching probabilities are constants, these models are often referred to as 'mixture models', and when these probabilities depend on past regimes and are governed by a finite-state Markov chain, the term (time homogeneous) 'Markov switching models' is typically used. In this paper, we are interested in the case where the regime switching probabilities depend on observed data, a case we refer to as 'observation-dependent regime switching'. Models of this kind can be viewed as special cases of time inhomogeneous Markov switching models (in which regime switching probabilities depend on both past regimes and observed data). Overviews of regime switching models can be found, for example, in Frühwirth-Schnatter (2006) and Hamilton (2016). Of critical interest in all these models is whether the use of several regimes is warranted or if a single-regime model would suffice. Testing for regime switching in all these models is plagued by several irregular features such as unidentified parameters and parameters on the boundary and is consequently notoriously difficult.
Tests for Markov switching have been considered by several authors in the econometrics literature. Hansen (1992) and Garcia (1998) both considered sup-type likelihood ratio (LR) tests in Markov switching models but they did not present complete solutions. Hansen derived a bound for the distribution of the LR statistic, leading to a conservative procedure, while Garcia did not treat all the non-standard features of the problem in detail. Cho and White (2007) analyzed the use of a LR statistic for a mixture model to test for Markov-switching type regime switching. They found their test based on a mixture model to have power against Markov switching alternatives even though it ignores the temporal dependence of the Markov chain. Carrasco, Hu, and Ploberger (2014) took a different approach and proposed an information matrix type test that they showed to be asymptotically optimal against Markov switching alternatives. Very recently, both Qu and Zhuo (2017) and Kasahara and Shimotsu (2017) have studied the LR statistic for regime switching in Markov switching models.
Regarding testing for mixture type regime switching, the existing literature is extensive, and several early references can be found, for instance, in McLachlan and Peel (2000, Sec. 6.5.1). Most papers in this literature consider the case of independent observations without regressors. Notable exceptions allowing for regressors (but not dependent data) and having set-ups closer to the present paper are Zhu and Zhang (2004, 2006) and Kasahara and Shimotsu (2015) who consider (among other things) LR tests for regime switching. Further comparison to these works will be provided in later sections.
In contrast to testing for Markov switching or mixture type regime switching, there exists almost no literature on testing for observation-dependent regime switching. The only two exceptions we are aware of are the unpublished PhD thesis of Jeffries (1998) and the recent paper of Shen and He (2015). Jeffries's thesis, which appears to have gone largely unnoticed, analyzes the LR test in a specific (first-order) mixture autoregressive model; we will discuss his work further in later sections. Shen and He (2015) consider the case of independent observations with regressors and observation-dependent regime switching, and propose an 'expectation maximization test' for regime switching.
In this paper we consider testing for observation-dependent regime switching in a time series context. Specifically, we analyze the asymptotic distribution of the LR test statistic for testing a linear autoregressive model against a two-regime mixture autoregressive model with observation-dependent regime switching. Mixture autoregressive models have been discussed for instance in Wong and Li (2000, 2001), Dueker, Sola, and Spagnolo (2007), Dueker, Psaradakis, Sola, and Spagnolo (2011), and Kalliovirta, Meitz, and Saikkonen (2015, 2016); further discussion of this previous work will be provided in Section 2. Motivation for allowing the regime switching probabilities to depend on observed data stems, for instance, from the desire to associate changes in regime with observable economic variables.1,2 Following Kasahara and Shimotsu (2015) it would also be possible to consider the more general testing problem that in a model with more than two regimes the number of regimes can be reduced. However, as even the case of two regimes is quite complex in our set-up, we leave this extension to future research.
We consider mixture autoregressive (MAR) models in a rather general setting employing high-level conditions that allow for various forms of observation-dependent regime switching. As specific examples, we treat the so-called logistic MAR (LMAR) model of Wong and Li (2001) and (a version of the) Gaussian MAR (GMAR) model of Kalliovirta et al. (2015) in detail. The technical challenges we face in analyzing the LR test statistic are similar to those when testing for Markov switching and mixture type regime switching. First, there are nuisance parameters that are unidentified under the null hypothesis. This is the classical Davies (1977, 1987) type problem. Second, under the null hypothesis, there are parameters on the boundary of the permissible parameter space. Such problems (also allowing for unidentified nuisance parameters under the null) are discussed in Andrews (1999, 2001). Third, the Fisher information matrix is (potentially) singular, preventing the use of conventional second-order expansions of the log-likelihood to analyze the LR test statistic. Such problems are discussed by Rotnitzky, Cox, Bottai, and Robins (2000), and suitable reparameterizations and higher-order expansions are needed to analyze the LR statistic. A particular challenge in the present paper is to deal with these three problems simultaneously. Similar problems were faced by Kasahara and Shimotsu (2015), and inspired by their work we consider a suitably reparameterized model, write a higher-order expansion of the log-likelihood function as a quadratic function of the new parameters, and then derive the asymptotic distribution of the LR test statistic by slightly extending and adapting the arguments of Andrews (1999, 2001) and Zhu and Zhang (2006) (who partially generalize results of Andrews).
Our two examples demonstrate that, compared to the mixture type regime switching considered by Kasahara and Shimotsu (2015), observation-dependent regime switching can either simplify or complicate the analysis of the LR test statistic.
We contribute to the literature in several ways. (1) To the best of our knowledge, apart from the unpublished PhD thesis of Jeffries (1998), we are the first to study testing for observation-dependent regime switching using the LR test statistic and among the rather few to allow for dependent observations. (2) We provide a general framework to cover various forms of observation-dependent regime switching, making our results potentially applicable to several models not explicitly discussed in the present paper. (3) From a methodological perspective, we slightly extend and adapt certain arguments of Andrews (1999, 2001) and Zhu and Zhang (2006), which may be of independent interest.
The rest of the paper is organized as follows. Section 2 reviews mixture autoregressive models. Section 3 analyzes the LR test statistic for testing a linear autoregressive model against a two-regime mixture autoregressive model. Simulation-based critical values and a Monte Carlo study are discussed in Section 4, and Section 5 concludes. Appendices A-C contain technical details and proofs. Supplementary Appendices D-E, available from the authors upon request, contain further technical details omitted from the paper.
Finally, a few notational conventions are given. All vectors will be treated as column vectors and, for the sake of uncluttered notation, we shall write x = (x_1, ..., x_n) for the (column) vector x where the components x_i may be either scalars or vectors (or both). For any vector or matrix x, the Euclidean norm is denoted by ||x||. We let X_Tα = o_pα(1) and X_Tα = O_pα(1) stand for sup_{α∈A} ||X_Tα|| = o_p(1) and sup_{α∈A} ||X_Tα|| = O_p(1), respectively, and λ_min(·) and λ_max(·) denote the smallest and largest eigenvalue of the indicated matrix.

1. See, for instance, Hamilton (2016), whose Handbook of Macroeconomics chapter begins "Many economic time series exhibit dramatic breaks associated with events such as economic recessions, financial panics, and currency crises. Such changes in regime may arise from tipping points or other nonlinear dynamics and are core to some of the most important questions in macroeconomics."

2. More general models in which the regime switching probabilities are allowed to depend on both past regimes and observable variables have also been considered, see, e.g., Diebold, Lee, and Weinbach (1994), Filardo (1994), and Kim, Piger, and Startz (2008).

General formulation
Let y_t (t = 1, 2, ...) be a real-valued time series of interest, and let F_{t−1} = σ(y_s, s < t) denote the σ-algebra generated by past y_t's. We use P_{t−1}(·) to signify the conditional probability of the indicated event given F_{t−1}. In the general two-component mixture autoregressive model we consider, the y_t's are generated by

y_t = s_t (φ̃_0 + φ̃_1 y_{t−1} + · · · + φ̃_p y_{t−p} + σ̃_1 ε_t) + (1 − s_t)(φ̂_0 + φ̂_1 y_{t−1} + · · · + φ̂_p y_{t−p} + σ̂_2 ε_t),   (1)

where the parameters σ̃_1 and σ̂_2 are positive, and conditions required for the autoregressive parameters φ̃_i and φ̂_i (i = 1, ..., p) will be discussed later. Furthermore, ε_t and s_t are (unobserved) stochastic processes which satisfy the following conditions: (a) ε_t is a sequence of independent standard normal random variables such that ε_t is independent of {y_{t−j}, j > 0}; (b) s_t is a sequence of Bernoulli random variables such that, for each t, P_{t−1}(s_t = 1) = α_t with α_t a function of y_{t−1} = (y_{t−1}, ..., y_{t−p}); and (c) conditional on F_{t−1}, s_t and ε_t are independent. The conditional probabilities α_t and 1 − α_t (= P_{t−1}(s_t = 0)) are referred to as mixing weights. They can be thought of as (conditional) probabilities that determine which one of the two autoregressive components of the mixture generates the next observation y_t. In condition (b) it is assumed that the mixing weight α_t (and hence also the conditional distribution of y_t given its past) only depends on p lags of y_t; allowing for more than p lags in the mixing weight would be possible at the cost of more complicated notation.
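To make the data generating mechanism in (1) and conditions (a)-(c) concrete, the following sketch simulates a two-component mixture autoregression for an arbitrary mixing-weight function. The function name and interface are illustrative choices, not code from the paper.

```python
import numpy as np

def simulate_mixture_ar(T, phi1, phi2, sigma1, sigma2, mixing_weight, y0, rng=None):
    """Simulate y_1, ..., y_T from a two-component mixture AR model as in (1).

    phi1, phi2    : regime-specific coefficient vectors (intercept first), length p+1
    sigma1, sigma2: regime-specific error standard deviations
    mixing_weight : function mapping (y_{t-1}, ..., y_{t-p}) to alpha_t in [0, 1]
    y0            : list/array of p initial values (most recent value last)
    """
    rng = np.random.default_rng(rng)
    phi1, phi2 = np.asarray(phi1, float), np.asarray(phi2, float)
    p = len(y0)
    y = [float(v) for v in y0]
    for _ in range(T):
        lags = np.array(y[-p:][::-1])        # (y_{t-1}, ..., y_{t-p})
        alpha_t = mixing_weight(lags)        # P(s_t = 1 | past), condition (b)
        s_t = rng.random() < alpha_t         # Bernoulli regime indicator
        eps = rng.standard_normal()          # standard normal innovation, condition (a)
        if s_t:
            y_t = phi1[0] + phi1[1:] @ lags + sigma1 * eps
        else:
            y_t = phi2[0] + phi2[1:] @ lags + sigma2 * eps
        y.append(float(y_t))
    return np.array(y[p:])
```

As a quick sanity check, with a mixing weight identically equal to one and a zero error standard deviation the recursion collapses to a deterministic linear AR(p).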
We assume that of the original parameters φ̃ = (φ̃_0, φ̃_1, ..., φ̃_p, σ̃_1²) and φ̂ = (φ̂_0, φ̂_1, ..., φ̂_p, σ̂_2²) in the two regimes, q_1 parameters are a priori assumed the same in both regimes and the remaining q_2 parameters are potentially different in the two regimes (with q_1 + q_2 = p + 2). For instance, one may assume that φ̃_0 and φ̂_0 are equal, or alternatively that σ̃_1² and σ̂_2² are equal. If such an assumption is plausible, taking it into account when devising a test for regime switching will be advantageous (it will lead to a test with better power). To this end, let β be a q_1 × 1 vector of common parameters, and let φ and ϕ be q_2 × 1 vectors of (potentially) different parameters. Then, for some known (p + 2)-dimensional permutation matrix P, (β, φ) = P φ̃ and (β, ϕ) = P φ̂. For simplicity, we assume that β and φ are variation-free, requiring each of the autoregressive coefficients φ̃_1, ..., φ̃_p to be contained in either β or φ (the same variation-freeness is assumed of β and ϕ). If there are no common coefficients in the two regimes, the parameter β can be dropped and φ = φ̃ and ϕ = φ̂.
As for the mixing weight α_t, in addition to past y_t's it depends on unknown parameters which may include components of the parameter vector (β, φ, ϕ) and an additional parameter α (scalar or vector). When this dependence needs to be emphasized we use the notation α_t(α, β, φ, ϕ).
Using equation (1) and the conditions following it, the conditional density function of y_t given its past, f(· | F_{t−1}), is obtained as

f(y_t | F_{t−1}) = α_t f_t(β, φ) + (1 − α_t) f_t(β, ϕ),   (2)

where the notation f_t(β, φ) signifies the density function of a (univariate) normal distribution with mean φ̃_0 + φ̃_1 y_{t−1} + · · · + φ̃_p y_{t−p} and variance σ̃_1² evaluated at y_t, that is,

f_t(β, φ) = (1/σ̃_1) N((y_t − φ̃_0 − φ̃_1 y_{t−1} − · · · − φ̃_p y_{t−p}) / σ̃_1),

with N(u) = (2π)^{−1/2} exp(−u²/2) the density function of a standard normal random variable and π = 3.14... the number pi. The notation f_t(β, ϕ) is defined similarly by using the parameters φ̂_i (i = 0, ..., p) and σ̂_2² instead of φ̃_i (i = 0, ..., p) and σ̃_1². Thus, the distribution of y_t given its past is specified as a mixture of two normal densities with time-varying mixing weights α_t and 1 − α_t.
Different mixture autoregressive models are obtained by different specifications of the mixing weights (or, in our case, the single mixing weight α_t). In some of the proposed models more than two mixture components are allowed, but for reasons to be discussed below we will not consider these extensions. If the mixing weights are assumed constant over time, the general mixture autoregressive model introduced above reduces to (a two-component version of) the MAR model studied by Wong and Li (2000). In the LMAR model of Wong and Li (2001), a logistic transformation of the two mixing weights is assumed to be a linear function of past observed variables. Related two-regime mixture models with time-varying mixing weights have also been considered by Gouriéroux and Robert (2006), Dueker et al. (2007), and Bec, Rahbek, and Shephard (2008), whereas Lanne and Saikkonen (2003) and Kalliovirta et al. (2015) have considered mixture autoregressive models in which multiple regimes are allowed.
A common problem with the application of mixture autoregressive models is determining the value of the (usually unknown) number of component models or regimes. As discussed in the Introduction, several irregular features make this problem difficult, and these difficulties are encountered even when the observations are a random sample from a mixture of (two) normal distributions. To our knowledge the only solution presented for mixture autoregressive models is provided for a simple first-order case with no intercept terms in the unpublished PhD thesis of Jeffries (1998). As discussed in the recent papers by Kasahara and Shimotsu (2012, 2015) and the references therein, some of the difficulties involved stem from properties of the normal distribution.
The difficulties referred to above also partly explain the complexity of our testing problem and why we only consider test procedures that can be used to test the null hypothesis that a two component mixture autoregressive model reduces to a conventional linear autoregressive model. Following the ideas in Zhu and Zhang (2006) and Kasahara and Shimotsu (2012, 2015), we derive a LR test in the general set-up described above and apply it to two particular cases, the LMAR model of Wong and Li (2001) and the GMAR model of Kalliovirta et al. (2015). Next, we shall discuss these two models in more detail.

Two particular examples
LMAR Example. The LMAR model of Wong and Li (2001) is defined by specifying the mixing weight α_t as

α_t = exp(α_0 + α_1 y_{t−1} + · · · + α_m y_{t−m}) / (1 + exp(α_0 + α_1 y_{t−1} + · · · + α_m y_{t−m})),   (3)

where the vector α = (α_0, α_1, ..., α_m) contains m + 1 unknown parameters and the order m (1 ≤ m ≤ p) is assumed known.
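As a concrete illustration, the LMAR mixing weight — a logistic function of the m most recent observations — can be computed as follows. This is a sketch assuming the standard logistic specification; the function name is illustrative.

```python
import numpy as np

def lmar_mixing_weight(alpha, lags):
    """Logistic (LMAR-style) mixing weight.

    alpha : array (alpha_0, alpha_1, ..., alpha_m) of m+1 parameters
    lags  : array (y_{t-1}, ..., y_{t-m}) of the m most recent observations
    Returns alpha_t = exp(z) / (1 + exp(z)) with z the linear index.
    """
    z = alpha[0] + np.dot(alpha[1:], lags)
    return 1.0 / (1.0 + np.exp(-z))  # equivalent, numerically safer form
```

Note that when the slope coefficients (α_1, ..., α_m) are all zero the weight no longer depends on the lags, which is exactly the degenerate constant-weight case discussed later.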
GMAR Example. In the GMAR model of Kalliovirta et al. (2015) the mixing weight is defined as

α_t = α n_p(y_{t−1}; φ̃) / (α n_p(y_{t−1}; φ̃) + (1 − α) n_p(y_{t−1}; φ̂)),   (4)

where α ∈ (0, 1) is an unknown parameter and n_p(y_{t−1}; ·) denotes the density function of a particular p-dimensional (p ≥ 1) normal distribution defined as follows. First, define the auxiliary Gaussian AR(p) processes (cf. equation (1))

ν_{1,t} = φ̃_0 + φ̃_1 ν_{1,t−1} + · · · + φ̃_p ν_{1,t−p} + σ̃_1 ε_t   and   ν_{2,t} = φ̂_0 + φ̂_1 ν_{2,t−1} + · · · + φ̂_p ν_{2,t−p} + σ̂_2 ε_t,   (5)

where the autoregressive coefficients are assumed to satisfy

φ̃(z) := 1 − φ̃_1 z − · · · − φ̃_p z^p ≠ 0 and φ̂(z) := 1 − φ̂_1 z − · · · − φ̂_p z^p ≠ 0 for |z| ≤ 1.   (6)

This condition implies that the processes ν_{1,t} and ν_{2,t} are stationary and that each of the two component models in (1) satisfies the usual stationarity condition of the conventional linear AR(p) model. Now set ν_{m,t} = (ν_{m,t}, ..., ν_{m,t−p+1}) and 1_p = (1, ..., 1) (p × 1), and let µ_m 1_p and Γ_{m,p} signify the mean vector and covariance matrix of ν_{m,t} (m = 1, 2).3 The random vector ν_{1,t} follows the p-dimensional multivariate normal distribution with density

n_p(ν_{1,t}; φ̃) = (2π)^{−p/2} det(Γ_{1,p})^{−1/2} exp(−(1/2)(ν_{1,t} − µ_1 1_p)' Γ_{1,p}^{−1} (ν_{1,t} − µ_1 1_p)),

and the density of ν_{2,t}, denoted by n_p(ν_{2,t}; φ̂), is defined similarly. Equation (1) and conditions (4)-(6) define the (two-component) GMAR model (condition (5) is part of the definition of the model because it is used to define the mixing weights).
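For p = 1 the stationary distribution of the auxiliary AR(1) process has the closed form N(φ_0/(1 − φ_1), σ²/(1 − φ_1²)), so the GMAR mixing weight can be sketched directly. The function name and the restriction to p = 1 are illustrative simplifications, not the paper's code.

```python
import numpy as np

def gmar_mixing_weight_p1(alpha, regime1, regime2, y_lag):
    """GMAR mixing weight for p = 1.

    alpha            : mixing parameter in (0, 1)
    regime1, regime2 : tuples (phi_0, phi_1, sigma2) with |phi_1| < 1
    y_lag            : the lagged observation y_{t-1}
    Weights the stationary AR(1) densities n_1(y_{t-1}; .) of the two regimes.
    """
    def stationary_density(y, params):
        phi0, phi1, sigma2 = params
        mu = phi0 / (1.0 - phi1)              # stationary mean
        gamma0 = sigma2 / (1.0 - phi1**2)     # stationary variance
        return np.exp(-0.5 * (y - mu)**2 / gamma0) / np.sqrt(2.0 * np.pi * gamma0)
    d1 = stationary_density(y_lag, regime1)
    d2 = stationary_density(y_lag, regime2)
    return alpha * d1 / (alpha * d1 + (1.0 - alpha) * d2)
```

When the two regimes share the same parameters, d1 = d2 and the weight equals the constant α for every y_{t-1}, which is exactly the case in which the GMAR model reduces to a constant-weight mixture.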

Test procedure
We now consider a test procedure for the null hypothesis that a two component mixture autoregressive model reduces to a conventional linear autoregressive model.

The null hypothesis and the LR test statistic
We denote the conditional density function corresponding to the unrestricted model as (see (2))

f_{2,t}(α, β, φ, ϕ) = α_t(α, β, φ, ϕ) f_t(β, φ) + (1 − α_t(α, β, φ, ϕ)) f_t(β, ϕ),

where we now make the dependence of the mixing weight on the parameters explicit. With this notation the log-likelihood function of the model based on a sample (y_{−p+1}, ..., y_T) (and conditional on the initial values (y_{−p+1}, ..., y_0)) is

L_T(α, β, φ, ϕ) = Σ_{t=1}^T l_t(α, β, φ, ϕ),   where   l_t(α, β, φ, ϕ) = log f_{2,t}(α, β, φ, ϕ).

3. We have µ_1 = φ̃_0/φ̃(1) and µ_2 = φ̂_0/φ̂(1), whereas each of Γ_{m,p} (m = 1, 2) has the familiar form of a p × p symmetric Toeplitz matrix with γ_{m,0} = Cov[ν_{m,t}, ν_{m,t}] along the main diagonal, and γ_{m,i} = Cov[ν_{m,t}, ν_{m,t−i}], i = 1, ..., p − 1, on the diagonals above and below the main diagonal. Similarly to µ_1 and µ_2, the elements of the covariance matrices Γ_{1,p} and Γ_{2,p} are treated as functions of the parameters φ̃ and φ̂, respectively (for details of this dependence, see Lütkepohl (2005, eqn. (2.1.39))).
Assumption 1. (i) The y_t's are generated by a stationary linear Gaussian AR(p) model with (the true but unknown) parameter value φ̃* an interior point of Φ̃, a compact subset of {φ̃ = (φ̃_0, φ̃_1, ..., φ̃_p, σ̃²) ∈ R^{p+2} : 1 − φ̃_1 z − · · · − φ̃_p z^p ≠ 0 for |z| ≤ 1, and σ̃² > 0}. (ii) The parameter space of (α, β, φ, ϕ) is A × B × Φ × Φ, where A is a compact subset of R^a and B and Φ are those compact subsets of R^{q_1} and R^{q_2}, respectively, that satisfy (β, φ) ∈ B × Φ if and only if P^{−1}(β, φ) ∈ Φ̃ (here P is as in the third paragraph of Section 2.1).
As our interest is to study the asymptotic null distribution of the LR test statistic, Assumption 1(i) requires the data to be generated by a stationary linear Gaussian AR(p) model. Assuming a compact parameter space in Assumptions 1(i) and (ii) is a standard requirement which facilitates proofs. Assumption 1(ii) accommodates the main cases of interest, namely β = φ̃_0, β = (φ̃_1, ..., φ̃_p), β = σ̃², or any combination of these.4 Assumption 1(iii) implies that our two-component mixture autoregressive model reduces to a linear autoregression only when φ = ϕ, regardless of the values of α ∈ A and β ∈ B. The null hypothesis to be tested is therefore φ = ϕ and the alternative is φ ≠ ϕ or, more precisely,

H_0 : φ = ϕ   against   H_1 : φ ≠ ϕ.

Note that under the null hypothesis the parameter α vanishes from the likelihood function and is therefore unidentified. Let f_t^0(φ̃) := f^0(y_t | y_{t−1}; φ̃) and L_T^0(φ̃) denote the conditional density and log-likelihood corresponding to the restricted model, that is,

L_T^0(φ̃) = Σ_{t=1}^T log f_t^0(φ̃)

(here the superscript 0 refers to the model restricted by the null hypothesis). Note that these quantities are obtained from a linear Gaussian AR(p) model. As f_{2,t}(α, β*, φ*, φ*) = f_t^0(φ̃*) for any α ∈ A, in the unrestricted model the parameter vector (α, β*, φ*, φ*) corresponds to the true model for any α ∈ A. As already indicated, Assumption 1(iii) implies that the restriction φ = ϕ is the only possibility to formulate the null hypothesis. However, this is not necessarily the case if (against Assumption 1(iii)) the mixing weight α_t(α, β, φ, ϕ) were allowed to take the boundary values zero and one. Of our two examples this would be possible for the GMAR model of Kalliovirta et al. (2015) but not for the LMAR model of Wong and Li (2001). In the GMAR model α_t(α, β, φ, ϕ) takes the boundary values zero and one when the parameter α takes these values (see Section 2.2).
In both cases a linear autoregression results and either the parameter φ or ϕ is unidentified (see (2)) (the MAR model of Wong and Li (2000) provides a similar example). It would be possible to obtain tests for the GMAR model by using the null hypothesis α = 0 or the null hypothesis α = 1. However, as in Kasahara and Shimotsu (2012) (see also Kasahara and Shimotsu (2015)), this approach would require rather restrictive assumptions and would also lead to very complicated derivations.5 Therefore, we will not consider this option.
As the parameter α is unidentified under the null hypothesis, the appropriate likelihood ratio type test statistic is

LR_T = sup_{α∈A} LR_T(α),

where, for each fixed α ∈ A,

LR_T(α) = 2 [sup_{(β,φ,ϕ)∈B×Φ×Φ} L_T(α, β, φ, ϕ) − sup_{φ̃∈Φ̃} L_T^0(φ̃)].

To obtain an operational test statistic let, for each fixed α ∈ A, (β̂_Tα, φ̂_Tα, ϕ̂_Tα) denote an (approximate) unrestricted maximum likelihood (ML) estimator of the parameter vector (β, φ, ϕ). We make the following assumption.
Assumption 2. The unrestricted ML estimator satisfies the following conditions: (i) L_T(α, β̂_Tα, φ̂_Tα, ϕ̂_Tα) ≥ sup_{(β,φ,ϕ)∈B×Φ×Φ} L_T(α, β, φ, ϕ) − o_pα(1); (ii) (β̂_Tα, φ̂_Tα, ϕ̂_Tα) − (β*, φ*, φ*) = o_pα(1).

Assumption 2(i) means that (β̂_Tα, φ̂_Tα, ϕ̂_Tα) is assumed to maximize the likelihood function only asymptotically. This assumption is technical and made for ease of exposition (see Andrews (1999) and Zhu and Zhang (2006) for similar assumptions in related problems). Assumption 2(ii) is a high-level condition on (uniform) consistency of the ML estimator and is analogous to Assumption 1 of Andrews (2001). It has to be verified on a case-by-case basis (this is exemplified below for the LMAR and GMAR models).
As for the term sup_{φ̃∈Φ̃} L_T^0(φ̃) in the LR test statistic, note that L_T^0(φ̃) is the (conditional) log-likelihood function of a linear Gaussian AR(p) model. Let φ̂_T denote an (approximate) maximum likelihood estimator of the parameters of a linear Gaussian AR(p) model, that is, φ̂_T satisfies6

L_T^0(φ̂_T) ≥ sup_{φ̃∈Φ̃} L_T^0(φ̃) − o_p(1).

Noting that L_T(α, β*, φ*, φ*) = L_T^0(φ̃*) for any α now allows us to write LR_T(α) as

LR_T(α) = 2 [L_T(α, β̂_Tα, φ̂_Tα, ϕ̂_Tα) − L_T(α, β*, φ*, φ*)] − 2 [L_T^0(φ̂_T) − L_T^0(φ̃*)] + o_pα(1).

The analysis of the second term on the right hand side is standard, while dealing with the first term is more demanding, requiring a substantial amount of preparation.
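The decomposition above rests on the fact that, when φ = ϕ, the mixture conditional density collapses to the Gaussian AR density for every α. A minimal numerical sketch of the two log-likelihoods entering LR_T(α), restricted to p = 1 with illustrative function names; in practice each term of LR_T(α) would be obtained by numerical maximization over the respective parameter spaces.

```python
import numpy as np

def gaussian_ar1_loglik(params, y):
    """Conditional log-likelihood L_T^0 of the null Gaussian AR(1) model;
    params = (phi0, phi1, sigma2), y contains the initial value y_0 first."""
    phi0, phi1, sigma2 = params
    resid = y[1:] - phi0 - phi1 * y[:-1]
    return -0.5 * np.sum(np.log(2.0 * np.pi * sigma2) + resid**2 / sigma2)

def mixture_ar1_loglik(alpha_t_fn, params1, params2, y):
    """Conditional log-likelihood L_T of the two-component mixture AR(1) model
    with mixing weights alpha_t_fn evaluated at the lagged observations."""
    def cond_dens(params):
        phi0, phi1, sigma2 = params
        resid = y[1:] - phi0 - phi1 * y[:-1]
        return np.exp(-0.5 * resid**2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
    a = alpha_t_fn(y[:-1])                   # alpha_t as a function of y_{t-1}
    return np.sum(np.log(a * cond_dens(params1) + (1.0 - a) * cond_dens(params2)))

# LR_T(alpha) would then be 2 * (maximized mixture log-lik - maximized null log-lik),
# and LR_T the supremum of LR_T(alpha) over a grid of alpha values.
```

Setting params1 = params2 makes the mixture log-likelihood identical to the Gaussian AR(1) log-likelihood for any mixing-weight function, which is the identity L_T(α, β*, φ*, φ*) = L_T^0(φ̃*) used above.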
LMAR Example. When m = 0, the mixing weight α_t^L (and hence 1 − α_t^L) is constant and the LMAR model reduces to the MAR model of Wong and Li (2000). In this special case our testing problem requires different and more complicated analyses than in the 'real' LMAR case where m ≥ 1 and (α_1, ..., α_m) ≠ (0, ..., 0) (we shall discuss this point more later). Therefore, the conditions m ≥ 1 and (α_1, ..., α_m) ≠ (0, ..., 0) will be assumed in the sequel. A similar restriction is made by Jeffries (1998) in his (first-order) logistic mixture autoregressive model to facilitate the derivation of the LR test (see the hypotheses at the end of p. 95 and the following discussion, as well as the end of p. 110).
GMAR Example. The GMAR model exemplifies the setting with common coefficients by assuming that the intercept terms in the two regimes are the same (note that this still allows for different means in the two regimes). As will be discussed in more detail in Section 3.3.1, this assumption is partly due to the fact that otherwise the derivation of the LR test would become extremely complicated. Hence, in this example β = φ̃_0 (= φ̂_0), φ = (φ̃_1, ..., φ̃_p, σ̃_1²), ϕ = (φ̂_1, ..., φ̂_p, σ̂_2²), q_1 = 1, and q_2 = p + 1. To satisfy Assumptions 1(ii) and (iii) the parameter space A of α can be any compact and convex subset of (0, 1) (this also rules out the possibility that α = 0 or α = 1 discussed after Assumption 1). For the verification of Assumption 2(ii), see Appendix C.
It may be worth noting that there are cases where the mixing weight α_t^G is time-invariant and equals α. If this happens the GMAR model reduces to the MAR model of Wong and Li (2000).7 However, unlike in the case of the LMAR model, this fact does not complicate the derivation of our test. The reason seems to be that in the GMAR model the reduction occurs only for certain values of the parameters φ̃ and φ̂, whereas in the LMAR model it occurs for all values of φ̃ and φ̂.

Reparameterized model
In standard testing problems the derivation of the asymptotic distribution of the LR test would rely on a quadratic expansion of the log-likelihood function L_T(α, β, φ, ϕ) = Σ_{t=1}^T l_t(α, β, φ, ϕ); when the parameter α is not identified under the null hypothesis, the relevant derivatives in this expansion would be with respect to (β, φ, ϕ) for fixed values of α ∈ A. In problems with a singular information matrix it turns out to be convenient to follow Rotnitzky et al. (2000) and Kasahara and Shimotsu (2012, 2015) and employ an appropriately reparameterized model.
The employed reparameterization is model specific and aims at two conveniences. First, it transforms the null hypothesis φ = ϕ into a point null hypothesis where some components of the parameter vector are restricted to zero and the rest are left unrestricted. Second, and more importantly, it simplifies derivations in cases where the conventional quadratic expansion of the log-likelihood function breaks down because, under the null hypothesis, the scores of the parameters (β, φ, ϕ) are linearly dependent and, consequently, the (Fisher) information matrix is singular. As will be seen later, this is the case for the GMAR model of Kalliovirta et al. (2015) but not for the LMAR model of Wong and Li (2001).
We sometimes refer to the reparameterization described in Assumption 3 as the 'π-parameterization' and the original parameterization as the 'φ-parameterization'. Note that the transformed parameters π and ϱ generally depend on α but, for brevity, we suppress this dependence from the notation. The parameter space of (π, ϱ) also depends on α. By Assumption 3(ii), the null hypothesis φ = ϕ can be equivalently written as the point null hypothesis ϱ = 0. Note that under H_0, the parameters β and π are identified, but α is not. As for Assumption 3(iii), it is a high-level condition similar to Assumption 2(ii) from which it can be derived with appropriate additional assumptions. A simple Lipschitz condition similar to Andrews (1992, Assumption SE-1(b)), given in Lemma A.1 in Appendix A, is one possibility.
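As a generic illustration of such a reparameterization (not the particular mapping used in the paper for either example), one can always transform the pair (φ, ϕ) so that the null φ = ϕ becomes the point null ϱ = 0 while the map remains invertible:

```python
import numpy as np

def to_pi_rho(phi, vphi):
    """Illustrative reparameterization: pi = (phi + vphi)/2, rho = phi - vphi.
    The null hypothesis phi = vphi is equivalent to the point null rho = 0."""
    phi, vphi = np.asarray(phi, float), np.asarray(vphi, float)
    return 0.5 * (phi + vphi), phi - vphi

def from_pi_rho(pi, rho):
    """Inverse map: recovers (phi, vphi) from (pi, rho)."""
    pi, rho = np.asarray(pi, float), np.asarray(rho, float)
    return pi + 0.5 * rho, pi - 0.5 * rho
```

Invertibility guarantees that nothing is lost by working in the (π, ϱ) coordinates; the model-specific reparameterizations used for the LMAR and GMAR examples serve the same purpose but are chosen to also disentangle linearly dependent scores.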
LMAR Example. The reparameterization we employ in the LMAR model is defined via a mapping π_α(·, ·). Note that in this case the reparameterization does not depend on α, and the same is true for the parameter space of (π, ϱ). Verification of Assumption 3 is straightforward using Lemma A.1 (for details, see Appendix B). In the LMAR case, the only benefit of the reparameterization is to transform the null hypothesis into a point null hypothesis.

GMAR Example.
In the GMAR model our reparameterization is defined, for any fixed α ∈ A, via an α-dependent mapping π_α(·, ·). Verification of Assumption 3 is again straightforward using Lemma A.1 (for details, see Appendix C). In the GMAR model, simplifying the null hypothesis is not the only benefit of the reparameterization, as will be discussed next. As discussed before Assumption 3, the relevant derivatives when expanding L_T(α, β, φ, ϕ) are with respect to (β, φ, ϕ) and, in the GMAR case, these derivatives are linearly dependent under the null hypothesis. To see this, and to see how the reparameterization affects this feature, note first that straightforward differentiation yields the scores of (β, φ, ϕ) when the null hypothesis φ = ϕ is imposed (here ∇ denotes differentiation with respect to the indicated parameters). These scores satisfy a linear dependence, so that the score vector, and hence the (Fisher) information matrix, is singular with rank p + 2. In contrast to the above, in the π-parameterization the score vector (see Supplementary Appendix C) has the property that, when the null hypothesis ϱ = 0 is imposed, the score of ϱ is identically zero, so that the reparameterization simplifies linear dependencies of the scores, which turns out to be very useful in subsequent asymptotic analyses.

Quadratic expansion of the (reparameterized) log-likelihood function
As alluded to above, in standard testing problems the asymptotic analysis of a LR test statistic is based on a second-order Taylor expansion of the (average) log-likelihood function around the true parameter value. An essential assumption here is positive definiteness of the (limiting) information matrix but, as illustrated in the previous section, this assumption does not necessarily hold in our testing problem due to linear dependencies among the derivatives of the log-likelihood function. As in Rotnitzky et al. (2000), Zhu and Zhang (2006), and Kasahara and Shimotsu (2012, 2015), we therefore consider a quadratic expansion of the log-likelihood function that is not based on a second-order Taylor expansion but (possibly) on a higher-order Taylor expansion. The need for higher-order derivatives is illustrated by the GMAR example: as the score of ϱ is identically zero, the second derivative (which turns out to be linearly independent of the score of (β, π)) now provides the first (nontrivial) local approximation for ϱ.
The following assumption ensures that the (reparameterized) log-likelihood function (8) is (at least) twice continuously differentiable.
In our general framework the reparameterized log-likelihood function is assumed to have, for each α ∈ A, a quadratic expansion in a transformed parameter vector θ(α, β, π, ϱ) around (β*, π*, 0) given by

L_T^π(α, β, π, ϱ) − L_T^π(α, β*, π*, 0) = θ(α, β, π, ϱ)' S_Tα − (T/2) θ(α, β, π, ϱ)' I_α θ(α, β, π, ϱ) + R_T(α, β, π, ϱ).   (10)

To illustrate this expansion, suppose the information matrix is positive definite so that the quantities on the right hand side are (typically) based on a second-order Taylor expansion with S_Tα and I_α functions of (α, β*, π*, 0). As already mentioned, this is the case for the LMAR model where (the following will be justified shortly) the parameter θ(α, β, π, ϱ) is independent of α and given by (π − π*, ϱ) and, for each α ∈ A, S_Tα is the score vector, I_α is the (positive definite) Fisher information matrix, and R_T(α, β, π, ϱ) is a remainder term. As the notation indicates, these three terms depend on α, and in general they may involve partial derivatives of the log-likelihood function of order higher than two (this is the case for the GMAR model, as will be demonstrated shortly). Then it may also become more complicated to find the reparameterization of the previous subsection and the transformed parameter vector θ(α, β, π, ϱ), as the examples of Kasahara and Shimotsu (2015, 2017) and the discussion of the GMAR model below show; one possibility is to consider the iterative procedure discussed by Rotnitzky et al. (2000, Sections 4.4 and 4.5) (for a recent illuminating illustration of this approach, see Hallin and Ley (2014)). Our next assumption provides further details on expansion (10). We use ⇒ to signify weak convergence of a sequence of stochastic processes on a function space. In the assumption below, the weak convergence of interest is that of the (suitably normalized) process S_Tα (indexed by α ∈ A) to a process S_α.
The two function spaces relevant in this paper are B(A, R k ) and C(A, R k ): the former is the space of all R k -valued bounded functions defined on (the compact set) A, equipped with the uniform metric d(x, y) = sup a∈A ‖x(a) − y(a)‖, and the latter is its subspace consisting of functions that are also continuous (with respect to α ∈ A).
Assumption 5. For each α ∈ A, the log-likelihood function L π T (α, β, π, ) has a quadratic expansion given in (10), where (ii) S T α = Σ T t=1 s tα is a sequence of R r -valued F T -measurable stochastic processes indexed by α ∈ A; S T α does not depend on (β, π, ); S T α has sample paths that are continuous as functions of α; and T −1/2 S T • ⇒ S • , where the limit S α is a mean zero Gaussian process that satisfies E[S α S α ] = I α for all α ∈ A and has continuous sample paths (as functions of α) with probability 1.
Assumption 5(i) describes the transformed parameter θ(α, β, π, ), with part (b) being an identification condition. Assumption 5(ii) is the main ingredient needed to derive the limiting distribution of our LR test whereas 5(iv) ensures that the remainder term R T (α, β, π, ) has no effect on the final result. Assumption 5(iii) imposes rather standard conditions on the counterpart of the information matrix.
As in Andrews (1999, 2001), Zhu and Zhang (2006), and Kasahara and Shimotsu (2012, 2015), for further developments it will be convenient to write the expansion (10) in the alternative form (11), where Z T α = I −1 α T −1/2 S T α . Assumptions 5(ii) and (iii) imply the following facts (which will be justified in the proof of Lemma 1 in Appendix A): Z T α is F T -measurable, independent of (β, π, ), continuous as a function of α with probability 1, and Z T • ⇒ Z • , where the limit Z α = I −1 α S α is a mean zero R r -valued Gaussian process that satisfies E[Z α Z α ] = I −1 α for all α ∈ A and has continuous sample paths (as functions of α) with probability 1.
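If (10) has the generic quadratic shape θ′S T α − (T/2)θ′I α θ + R T (an assumption here, since the displays are not recoverable in this version), then setting λ = T 1/2 θ and completing the square yields the alternative form:

```latex
% Completion of the square behind the alternative form (11);
% \epsilon is a placeholder for the lost parameter symbol.
2\bigl[L^{\pi}_{T}(\alpha,\beta,\pi,\epsilon) - L^{\pi}_{T}(\alpha,\beta_{*},\pi_{*},0)\bigr]
  \;=\; Z_{T\alpha}'\, I_{\alpha}\, Z_{T\alpha}
  \;-\; (\lambda - Z_{T\alpha})'\, I_{\alpha}\, (\lambda - Z_{T\alpha})
  \;+\; 2R_{T}(\alpha,\beta,\pi,\epsilon),
\qquad \lambda = T^{1/2}\theta(\alpha,\beta,\pi,\epsilon).
```

The quadratic form (λ − Z T α )′I α (λ − Z T α ) appearing here is the one minimized over Θ α,T later in the asymptotic analysis.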
As mentioned in the LMAR example of Section 3.1.1, the treatment of the special case where the mixing weight α L t is constant is more complicated than that of the 'real' LMAR case. Indeed, replacing the mixing weight α L t by a constant in the preceding expression of the score vector S T α immediately shows that the second-order Taylor expansion (14) breaks down because, contrary to Assumption 5(iii), the components of S T α are not linearly independent and, consequently, the Fisher information matrix I α is singular. Thus, a higher order Taylor expansion is needed to analyze the LR test statistic.
To give an idea of how one could proceed, we first note that the partial derivatives of the log-likelihood function behave in the same way as their counterparts in Kasahara and Shimotsu (2015) where mixtures of normal regression models (with constant mixing weights) are considered (see particularly the discussion following their Proposition 1). This is due to the fact that in the special case of constant mixing weights the LMAR model is obtained from the model considered in Kasahara and Shimotsu (2015) by replacing the exogenous regressors therein by lagged values of y t . Thus, the arguments employed in that paper could be used to obtain the asymptotic distribution of the LR test statistic. Instead of a conventional second-order Taylor expansion this would require a more complicated reparameterization and an expansion based on partial derivatives of the log-likelihood function up to order eight. As most of the details appear very similar to those in Kasahara and Shimotsu (2015), we have preferred not to pursue this matter in this paper.
The preceding discussion means that, in the case of the LMAR model, time-varying mixing weights are beneficial when the purpose is to derive a LR test for the adequacy of a single-regime model. A similar observation was already made by Jeffries (1998, p. 80). However, this does not happen in all mixture autoregressive models with time-varying mixing weights, as the following discussion of the GMAR model demonstrates.

GMAR Example.
As alluded to earlier, in the case of the GMAR model the expansion (10) cannot be based on a second order Taylor expansion of the log-likelihood function. A higher order expansion is required, and similarly to Kasahara and Shimotsu (2012) the appropriate order turns out to be the fourth one with the elements of ∇ β l π t (α, β * , π * , 0) and ∇ π l π t (α, β * , π * , 0) and the distinctive elements of ∇ l π t (α, β * , π * , 0) (suitably normalized) used to define the vector S T α . In Appendix C we present, for an arbitrary fixed α ∈ A, the explicit form of a standard fourth-order Taylor expansion of L π T (α, β, π, ) = T t=1 l π t (α, β, π, ) around (β * , π * , 0) with respect to the parameters (β, π, ). Therein we also demonstrate that this fourth-order Taylor expansion can be written as a quadratic expansion of the form (10) (or (11)) with the different quantities appearing therein defined as follows.
In the GMAR example we have assumed that the intercept terms φ 0 and ϕ 0 in the two regimes are the same. We are now in a position to describe the difficulties that allowing for φ 0 ≠ ϕ 0 (and, hence, dropping β) would entail. In this case, the additional parameter would correspond to 1 , the first component of . As in Section 3.2.1, it would again be the case that ∇ 1 l π t (α, π * , 0) = 0, leading us to consider second derivatives. But now, due to the properties of the Gaussian distribution, ∇ 2 1 1 l π t (α, π * , 0) would be linearly dependent on the components of ∇ π l π t (α, π * , 0), making it unsuitable to serve as a component of S T . A reparameterization more complicated than that used in Section 3.2.1 would be needed, with the aim of obtaining ∇ 2 1 1 l π t (α, π * , 0) = 0 and, instead of ∇ 2 1 1 l π t (α, π * , 0), using ∇ 3 1 1 1 l π t (α, π * , 0) or perhaps ∇ 4 1 1 1 1 l π t (α, π * , 0) as the counterpart of the score of the parameter 1 . It turns out (restricting the discussion to the case p = 1 only) that the third derivative is suitable when α ≠ 1/2 and φ 1 ≠ −1/2, but that fourth (or higher) order derivatives are needed when α = 1/2 or φ 1 = −1/2. Similar difficulties (involving situations comparable to the cases α = 1/2 vs. α ≠ 1/2, but apparently not ones involving also a counterpart of φ 1 ) were faced by Cho and White (2007, Sec. 2.3.3) and Kasahara and Shimotsu (2017, Sec. 6.2), whose analyses suggest that expanding the log-likelihood at least to the eighth order is required. As the required analysis gets excessively complicated, we have chosen to leave it for future research.

Asymptotic analysis of the quadratic expansion
We continue by analyzing the expansion (11) evaluated at (β T α ,π T α ,ˆ T α ). A similar analysis was previously provided by Andrews (2001), but his approach is not directly applicable in our setting. The reason is that in the quadratic expansion (11) the dependence of the parameter θ(α, β, π, ) and its parameter space Θ α on the nuisance parameter α is not compatible with the formulation of Andrews (2001, eqn (3.3)). The results of Zhu and Zhang (2006) probably cover our case but, instead of trying to verify the assumptions employed by these authors, we prove the needed results by adapting the arguments used in Andrews (1999, 2001) and Zhu and Zhang (2006) to our setting. We proceed in several steps.
Asymptotic insignificance of the remainder term. We first establish that the remainder term R T (α, β, π, ), when evaluated at (β T α ,π T α ,ˆ T α ), has no effect on the asymptotic distribution of the quadratic expansion. A crucial ingredient is showing that the transformed parameter vector θ(α,β T α ,π T α ,ˆ T α ) is root-T consistent in the sense that T 1/2 θ(α,β T α ,π T α ,ˆ T α ) = O pα (1). This, together with part (iv) of Assumption 5, allows us to obtain the result R T (α,β T α ,π T α ,ˆ T α ) = o pα (1). We collect these results in the following lemma.
Lemma 2. If Assumptions 1-5 hold, then

Approximating the parameter space with a cone. In the previous subsection the quadratic form (λ − Z T α ) I α (λ − Z T α ) was minimized over the set Θ α,T , which can be complicated and hence difficult to use. Therefore we next show that this quadratic form can instead be minimized over a simpler set, and to this end we first introduce some terminology. We say that a collection of sets {Γ α , α ∈ A} (where, for each α ∈ A, Γ α ⊂ R r ) is 'locally (at the origin) uniformly equal' to a set Λ ⊂ R r if there exists a δ > 0 such that Γ α ∩ (−δ, δ) r = Λ ∩ (−δ, δ) r for all α ∈ A. Note that '{Γ α , α ∈ A} is locally uniformly equal to Λ' implies (i) that 'for all α ∈ A, Γ α is locally equal to Λ' in the sense of Andrews (1999, p. 1359), although the reverse does not hold; and also (ii) that {Γ α , α ∈ A} is uniformly approximated by the set Λ in the sense of Zhu and Zhang (2006, Defn. 3). Finally, we say that a set Λ ⊂ R r is a 'cone' if λ ∈ Λ implies aλ ∈ Λ for all positive real scalars a.
Based on the preceding discussion we state the following assumption.
Note that by Assumption 5(i)(a), 0 ∈ Θ α for all α ∈ A, so that the cone Λ in Assumption 6 necessarily contains 0 (∈ R r ). The cone Λ also does not depend on α. Now we can establish the following result.
Lemma 3. If Assumptions 1-6 hold, then

Describing the limiting random variable. From Lemmas 2 and 3 and the definition of λ T α we can now conclude that
(20) The assumed weak convergence of S T α (and hence that of Z T α , with limit Z α = I −1 α S α ) allows us to derive the weak limit of this random process, described in the following lemma.
Lemma 5. If Assumptions 1-7 hold, then

Explicit expressions for (I −1 α ) ϑϑ and Z ϑα in terms of S α and I α are given in the proof of this lemma in Supplementary Appendix D.

Derivation of the test statistic
The previous subsection described the asymptotic behavior of 2[L π T (α,β T α ,π T α ,ˆ T α )−L π T (α, β * , π * , 0)], the first term in the expression of LR T (α) in (9). Now consider the second term, namely 2[L 0 T (φ T ) − L 0 T (φ * )], corresponding to the model restricted by the null hypothesis. Recall that L 0 T denotes the log-likelihood function of the null model, with φ * an interior point of Φ. Denote the score vector and limiting information matrix by S 0 T and I 0 , respectively. For the following assumption, partition the process S T α of Assumption 5 as S T α = (S T θα , S T ϑα ) (with S T θα q θ -dimensional and S T ϑα q ϑ -dimensional). The following simplifying assumption, which holds in our examples (see the expressions of S T α in (13) and (16)), allows us to obtain a neat expression for the likelihood ratio test statistic in Theorem 1 below.
Together with the earlier assumptions, Assumption 8 implies that T −1/2 S T θα = T −1/2 S 0 T d → S 0 , a q θ -dimensional Gaussian random vector with mean zero and covariance matrix I θθα = E[S 0 S 0 ] = I 0 . Standard likelihood theory now implies the following result.
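The statement of the result just referred to (Lemma 6) is missing from this version of the text; consistent with the notation S 0 T and I 0 above and with its use in the proof of Theorem 1, it is presumably of the standard form:

```latex
% Schematic statement of Lemma 6 (the original display is not recoverable here):
2\bigl[L^{0}_{T}(\hat{\varphi}_{T}) - L^{0}_{T}(\varphi_{*})\bigr]
  \;\xrightarrow{\;d\;}\; S^{0\prime}\,(I^{0})^{-1}\,S^{0},
\qquad S^{0} \sim N\bigl(0,\, I^{0}\bigr).
```

This matches the random variable S 0 (I 0 ) −1 S 0 attributed to Lemma 6 in the proof of Theorem 1 below.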
The preceding results, in particular Lemmas 4, 5, and 6, now yield the distribution of the LR test statistic in the following theorem.
Theorem 1. If Assumptions 1-8 hold, then

This completes the derivation of the LR test statistic. The asymptotic distribution is similar to that in Andrews (2001, Thm 4). As we next discuss, this distribution simplifies in both the LMAR and the GMAR examples.

Examples (continued)

LMAR Example.
As was noted in Section 3.3.1, the LMAR case is rather standard in the sense that a conventional second-order Taylor expansion with a nonsingular information matrix and with no parameters on the boundary was sufficient to study the LR test. The only nonstandard feature in this case is the presence of unidentified parameters under the null hypothesis. Validity of Assumptions 6-8 is easy to check (see Appendix B), with the cone Λ of Assumption 6 equal to R r . Thus the infimum in the distribution of the LR T statistic in Theorem 1(ii) equals zero and the result therein simplifies to

For every fixed α ∈ A, the quantity Z ϑα ((I −1 α ) ϑϑ ) −1 Z ϑα is a chi-squared random variable, so that the limiting distribution is the supremum of a chi-squared process, similarly as in, for example, Davies (1987), Hansen (1996, Thm 1), and Andrews (2001, eqn. (5.7)).

GMAR Example.
In Section 3.3.1 it was seen that in the GMAR example Z α and I α do not depend on α. As the cone Λ of Assumption 6 does not depend on α either, the weak limit of LR T (α) does not depend on α. Therefore the result of Theorem 1 (validity of Assumptions 6-8 is checked in Appendix C) simplifies to

where the unnecessary α has been dropped from the notation. Here Z ϑ follows a q ϑ -variate Gaussian distribution with covariance matrix (I −1 ) ϑϑ , and the limiting distribution, sometimes referred to as the chi-bar-squared distribution, is similar to the one in Kasahara and Shimotsu (2012, Proposition 3c,d). Note that the cone Λ ϑ = v(R q 2 ) (see Appendix C) is not convex (in contrast to (at least most of) the examples in Andrews (2001), but similarly to Kasahara and Shimotsu (2012, Proposition 3c,d)), and the dimension of this cone, q ϑ = q 2 (q 2 + 1)/2, may not be small either (q ϑ = 3, 6, 10, . . . for q 2 = 2, 3, 4, . . .).
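The chi-bar-squared limit requires minimizing the quadratic form (λ − Z)'I(λ − Z) over the non-convex cone Λ ϑ = v(R q 2 ) numerically. The numpy sketch below is purely illustrative and not the authors' implementation: the specific form of the map v (stacking the scaled products c ij x i x j , with c ii = 1/2 and c ij = 1 for i < j, as suggested by the c ij weights in Appendix C) is an assumption, and the multi-start descent is a crude stand-in for whatever optimizer one prefers.

```python
import numpy as np

def v(x):
    """Candidate map onto the cone Lambda_vartheta = v(R^q2): stacked
    scaled products c_ij * x_i * x_j for i <= j, with c_ii = 1/2 and
    c_ij = 1 for i < j.  The exact form of v is an assumption here."""
    q2 = len(x)
    return np.array([(0.5 if i == j else 1.0) * x[i] * x[j]
                     for i in range(q2) for j in range(i, q2)])

def inf_over_cone(z, I_mat, starts, n_steps=200):
    """Approximate inf over lam in v(R^q2) of (lam - z)' I (lam - z)
    by crude multi-start descent with a forward-difference gradient."""
    def obj(x):
        d = v(x) - z
        return d @ I_mat @ d
    best = np.inf
    for x0 in starts:
        x = np.asarray(x0, dtype=float).copy()
        step, h = 0.1, 1e-7
        for _ in range(n_steps):
            f0 = obj(x)
            g = np.array([(obj(x + h * np.eye(len(x))[k]) - f0) / h
                          for k in range(len(x))])
            x_new = x - step * g
            if obj(x_new) < f0:
                x = x_new            # accept the descent step
            else:
                step *= 0.5          # otherwise shrink the step size
                if step < 1e-12:
                    break
        best = min(best, obj(x))
    return best
```

For instance, when z itself lies on the cone the infimum is (approximately) zero; the chi-bar-squared realization is then Z'IZ minus this infimum. Random normal starting points usually suffice, since v is a low-degree polynomial map.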

Simulating the asymptotic null distribution
Similarly to Hansen (1996) and Andrews (2001), the asymptotic null distribution of the LR statistic in Theorem 1 is typically application-specific and cannot be tabulated. Following these papers, we use simulation methods to obtain critical values of the asymptotic null distribution. The following procedure is based on Hansen (1996) and is analogous to the one used by Zhu and Zhang (2004, Sec 2.1) in a related mixture setting. Let A G be some finite grid of α values in A. For each fixed α ∈ A G , let ŝ tα signify an empirical counterpart of s tα (see Assumption 5) in which the unknown parameter φ * (or (β * , π * )) is replaced by its consistent estimator under the null, φ T . (The specific forms of ŝ tα in the LMAR and GMAR examples are provided in Appendices B and C, respectively.) Set Î T α = T −1 Σ T t=1 ŝ tα ŝ tα . Now, for each j = 1, . . . , J (where J denotes the number of repetitions), do the following.
(ii) For each α ∈ A G , setŜ j T α = T t=1ŝ tα v tj ,Ẑ j T α =Î −1 T α T −1/2Ŝ j T α , and (using similar partitioning notation as before) here the minimization of the quadratic form over the cone Λ ϑ has to be performed numerically.
This yields a sample { LR 1 T,A G , . . . , LR J T,A G } of J realizations. An approximate p-value corresponding to an observed LR test statistic LR T is computed as J −1 J j=1 1( LR j T,A G > LR T ) (here 1(·) denotes the indicator function). The precision of this approximation can be controlled by choosing J large enough, see Hansen (1996) (in the illustration below we use J = 1000).
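The procedure above can be sketched in a few lines of numpy. Everything below is illustrative rather than the authors' code: the scores `s_hat` are synthetic placeholders, the multipliers v tj are taken to be i.i.d. N(0,1) (step (i) of the procedure, whose display is elided above), and the LMAR-type statistic (cone Λ = R r , so no cone minimization) is computed.

```python
import numpy as np

def simulate_pvalue(s_hat, q_theta, LR_obs, J=1000, seed=42):
    """Multiplier simulation of the asymptotic null distribution (LMAR-type).

    s_hat   : dict mapping each grid value alpha in A_G to a (T, r) array of
              empirical scores s_hat_{t,alpha} (placeholders here).
    q_theta : dimension of the theta-block; the remaining components form the
              vartheta-block that enters the statistic.
    LR_obs  : observed LR test statistic.
    """
    rng = np.random.default_rng(seed)
    alphas = list(s_hat)
    T = s_hat[alphas[0]].shape[0]
    pre = {}
    for a in alphas:
        S = s_hat[a]                                  # (T, r) score matrix
        I_hat = S.T @ S / T                           # I_hat_{T,alpha}
        I_inv = np.linalg.inv(I_hat)
        W = np.linalg.inv(I_inv[q_theta:, q_theta:])  # ((I_hat^{-1})_vv)^{-1}
        pre[a] = (S, I_inv, W)
    draws = np.empty(J)
    for j in range(J):
        v = rng.standard_normal(T)                    # step (i): N(0,1) multipliers (assumed)
        stat = -np.inf
        for a in alphas:
            S, I_inv, W = pre[a]
            Z = I_inv @ (S.T @ v) / np.sqrt(T)        # Z_hat^j_{T,alpha}
            Z_var = Z[q_theta:]                       # vartheta-block
            stat = max(stat, float(Z_var @ W @ Z_var))
        draws[j] = stat                               # sup over the alpha grid
    # approximate p-value: J^{-1} sum_j 1(LR^j > LR_obs)
    return float(np.mean(draws > LR_obs))
```

In the GMAR case one would additionally subtract, inside the loop over j, the numerically computed infimum of the quadratic form over the cone Λ ϑ .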

A small Monte Carlo study
We now study the finite sample properties of the LR test statistics and the simulation-based critical values. The results are presented in Table 1. We consider two LR test statistics, one based on an estimated LMAR model, and another based on an estimated GMAR model (as in our two examples in the preceding sections). In all simulations, we use an autoregressive order p = 1, J = 1000 repetitions (see the previous subsection), and three different sample sizes: T = 250, 500, and 1000.
The top part of Table 1 presents results for size simulations. Data is generated from an AR(1) model (for a range of different parameter values shown in Table 1) and AR(1), LMAR(1), and GMAR(1) models are estimated (LMAR with m = 1; GMAR with the restrictionφ 0 =φ 0 ; in estimation of the mixture models we use a genetic algorithm as singularity of the information matrix may render gradient based methods unreliable). Two LR test statistics are calculated based on the estimated LMAR and GMAR models, respectively, and labelled 'LMAR LR T ' and 'GMAR LR T '. Simulation-based p-values are computed based on the asymptotic distributions in Section 3.5.2 and using the simulation procedure in Section 4.1. Using nominal levels 10%, 5%, and 1%, a reject/not-reject decision is recorded. This exercise is repeated 1000 times, and the six rightmost columns in Table 1 present the empirical rejection frequencies (for the LMAR LR T and GMAR LR T tests and the three nominal levels used).
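For concreteness, the null DGP in the size simulations is a linear Gaussian AR(1). A minimal simulator is sketched below; the parameter values are placeholders, not those of Table 1.

```python
import numpy as np

def simulate_ar1(T, phi0=0.0, phi1=0.5, sigma=1.0, burn=200, seed=0):
    """Simulate y_t = phi0 + phi1 * y_{t-1} + sigma * eps_t, eps_t iid N(0,1),
    with |phi1| < 1.  A burn-in is discarded so the retained sample is close
    to stationarity."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(T + burn)
    y = np.empty(T + burn)
    y[0] = phi0 / (1 - phi1)          # start at the stationary mean
    for t in range(1, T + burn):
        y[t] = phi0 + phi1 * y[t - 1] + sigma * e[t]
    return y[burn:]
```

Each Monte Carlo replication would generate such a sample, estimate the AR, LMAR, and GMAR models, and record the reject/not-reject decision at the chosen nominal level.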
As can be seen from the results in Table 1 (top part), the LMAR LR T test's size is satisfactory overall, typically being somewhat oversized for sample sizes T = 250 and 500, and somewhat conservative for the largest sample size (T = 1000). The parameter values used in simulation do not seem to have a large effect on the size. The GMAR LR T test, on the other hand, appears to be moderately oversized across all sample sizes and parameter values used.
The lower part of Table 1 presents results for power simulations. Data is generated either from a GMAR model or from an LMAR model (for a range of different parameter values shown in Table 1), and empirical rejection frequencies are calculated as above. Both the LMAR LR T test and the GMAR LR T test appear to have good overall power. As expected, when the two regimes differ more from each other, the tests have higher power, and the same happens when sample size is increased. Besides having good power against the 'right' alternatives, the tests also turn out to have decent power against 'wrong' alternatives: When data is generated from the GMAR (resp., LMAR) model, the LMAR LR T (resp., GMAR LR T ) test rejects reasonably often (the GMAR LR T test in particular seems capable of picking up LMAR type regime switching). Naturally, the power of the tests may be inflated due to the tests being oversized.
As a computational remark we note that the LMAR LR T and GMAR LR T tests and their p-values are reasonably straightforward to compute in a matter of seconds using a standard, modern desktop computer (for one particular model, one particular sample size, and J = 1000 repetitions). The GMAR LR T test is computationally more demanding than the LMAR LR T test as it involves the minimization of a quadratic form over a cone which is not needed in the LMAR case (see Sections 3.5.2 and 4.1); this is also one potential reason for the less precise size of the GMAR LR T test.

Conclusions
This paper has studied the asymptotic distribution of the LR test statistic for testing a linear autoregressive model against a two-regime mixture autoregressive model. A distinguishing feature of the paper is that the regime switching probabilities are observation-dependent. Technical challenges resulting from unidentified parameters under the null, parameters on the boundary, and singularity of the information matrix were dealt with by considering an appropriately reparameterized model and higher-order expansions of the log-likelihood function. The resulting asymptotic distribution of the LR test statistic is non-standard and application-specific. Critical values can be obtained by a straightforward simulation procedure, and a Monte Carlo study indicated the proposed tests to have satisfactory size and power properties.
The general theory of the paper was illustrated using two concrete examples, the LMAR model of Wong and Li (2001) and (a version of the) GMAR model of Kalliovirta et al. (2015). Considering other mixture AR models, as well as the general GMAR model, is left for future research. This paper was concerned with testing linearity against a two-regime model, and considering tests of M ≥ 2 regimes versus M + 1 regimes, similarly as in Kasahara and Shimotsu (2015) in a related setting, forms another interesting research topic.

Appendix A Details for the general results
Lemma A.1. When Assumptions 2(ii) and 3(i,ii) hold, a sufficient condition for Assumption 3(iii) is that

where C is a finite positive constant, h : [0, ∞) → [0, ∞) is a strictly increasing function such that h(x) ↓ 0 as x ↓ 0, and ‖·‖ * is any vector norm on R 2q 2 .
To complete the proof of Lemma 1, we now justify the last equality in (22). By Assumption 5(ii), T −1/2 S T • ⇒ S • , and by the continuous mapping theorem (justification in Supplementary Appendix D), Z T α = I −1 α T −1/2 S T α converges weakly in C(A, R r ) to a mean zero R r -valued Gaussian process Z α = I −1 α S α whose sample paths are continuous in α with probability one and that has E[Z α Z α ] = I −1 α for all α ∈ A. A further application of the continuous mapping theorem (justification in Supplementary Appendix D) implies that sup α∈A ‖Z T α ‖ converges in distribution in R and, as all probability measures on R are tight, the limit must be tight. This justifies the last equality in (22).

Proof of Lemma 3. For any vectors
for any point p ∈ R r and a set S ⊂ R r , we define ‖p − S‖ I −1 α via

With this notation, we need to prove that the difference between ‖Z T α − Θ α,T ‖ 2 I −1 α and ‖Z T α − Λ‖ 2 I −1 α is o pα (1). First note that, because Λ is a cone, we have, for any T ,

Similarly, by the definitions of Θ α,T and Θ α ,

Because {Θ α , α ∈ A} is locally uniformly equal to the cone Λ, we can find a δ > 0 such that Θ α ∩ (−δ, δ) r = Λ ∩ (−δ, δ) r for all α ∈ A. Furthermore, 0 ∈ Θ α and 0 ∈ Λ (here 0 ∈ R r ). Therefore, we can find a neighborhood N 0 of 0 such that for all (α, x) ∈ A × N 0 ,

Now define G T (α), a random function of α, as G T (α) = G T (α, T −1/2 Z T α ). In the proof of Lemma 1 it was shown that sup α∈A ‖Z T α ‖ = O p (1) (see (22)), so that T −1/2 Z T α = o pα (1). Therefore, for all > 0,

where the equality holds because G T (α, x) = 0 for all (α, x) ∈ A × N 0 , and the convergence holds because T −1/2 Z T α = o pα (1).

Proof of Lemma 4. It was shown in the proof of Lemma 1 that Z T • ⇒ Z • in C(A, R r ). Therefore also (see Billingsley (1999, Thm. 3.9)). As the function g is continuous (justification in Supplementary Appendix D), the continuous mapping theorem is applicable. This, together with Billingsley (1999, Thm 3.1) (for which it is necessary that the remainder term in (20) is o pα (1) and not only o p (1)), implies that

establishing the desired result.
Proof of Lemma 5. The proof consists of reasonably straightforward matrix algebra. For details, see Supplementary Appendix D.
Proof of Lemma 6. The required arguments are standard but are presented for completeness and to contrast them with the arguments that lead to Lemma 4. The reparameterization described in Assumption 3 is unnecessary and the original φ-parameterization may be used (alternatively, consider the identity mapping π = π(φ) = φ). As for the quadratic expansion of the log-likelihood function, let θ(φ) = φ − φ * take the role of θ(α, β, π, ), and note that straightforward derivations (similar to those used in the LMAR example in Section 3.3.1) yield

with φ̄ denoting a point between φ and φ * . Validity of Assumption 5 follows from the arguments used in connection with the LMAR example together with Assumption 8. Assumption 6 holds with Λ = R p+2 . Arguments analogous to those that lead to Lemma 4 now yield the stated convergence result, and the convergence is joint as in both cases it follows from the weak convergence result T −1/2 S T • ⇒ S • .
Proof of Theorem 1. Under Assumption 8, the random process S θα I −1 θθα S θα in Lemma 5 coincides with the random variable S 0 (I 0 ) −1 S 0 in Lemma 6. Therefore the expression of LR T (α) in (7), Lemmas 4, 5, and 6, and Billingsley (1999, Thm 3.1) (for which it is necessary that the remainder term in (7) is o pα (1) and not only o p (1)) imply the weak convergence result for LR T (α). The result for LR T follows from the continuous mapping theorem.

B Details for the LMAR example
In this appendix it appears convenient to denote α L 1,t instead of α L t and to set α L 2,t = 1 − α L 1,t . In some cases we also include the argument α and denote α L 1,t (α) and α L 2,t (α). The same notation is employed in the Supplementary Appendix and a similar modification is used in the case of the GMAR model.
Assumptions 1-4. Assumption 1(i) is assumed to hold, 1(ii) holds as A is compact, and 1(iii) holds by the definition of the mixing weight. For the verification of Assumption 2, see the GMAR example in Appendix C; the LMAR case is treated there as well. To verify Assumption 3, note first that conditions (i) and (ii) clearly hold, and for condition (iii) we have

Choosing ‖x‖ * = ‖x‖ 1 = Σ 2q i=1 |x i | and using the triangle inequality, it is straightforward to check that

Thus, Assumption 3(iii) holds by Lemma A.1. Regarding Assumption 4, as α L 1,t does not depend on (φ, ϕ) and π −1 α (π, ) = (π, π − ), the required differentiability conditions hold for all positive integers k.
To complete the verification of part (iii), we show that I α is a continuous function of α and such that 0 < inf α∈A λ min (I α ) and sup α∈A λ max (I α ) < ∞. For continuity, let α n be a sequence of points in A converging to α • ∈ A. It suffices to demonstrate that lim n→∞ E α L 2,t (α n ) ∇ft(π * ) and similarly with α L 2,t (·) replaced by its square. This, however, is an immediate consequence of the dominated convergence theorem because α L 2,t (α) is a continuous positive function of α and smaller than 1, and because E ∇ft(π * ) < ∞ due to Lemma F.1 (in Supplementary Appendix F.5). The statements on the eigenvalues follow from the continuity of I α , the compactness of its domain A, and the positive definiteness of I α for all fixed α ∈ A shown above.
Assumptions 6-8. That Θ is locally (uniformly) equal to the cone Λ = R 2q follows from the expression of the set Θ given in the verification of Assumption 5 above and the fact that 0 (∈ R 2q ) is an interior point of Θ. Assumption 7 is clear, as Assumption 6 holds with the cone Λ = R 2q . Assumption 8 is clear from the verification of Assumption 5.

C Details for the GMAR example
Assumption 1. Assumption 1(i) is assumed to hold. Assumption 1(ii) holds as A is a compact subset of (0, 1). Assumption 1(iii) holds by the definition of the mixing weight.
The arguments used for the GMAR model above can also be used for the LMAR model, but two things are worth noting. First, the proof given for the GMAR model above goes through even when there are no common parameters, so that φ and ϕ could be used in place of (β, φ) and (β, ϕ). Second, equation (35) can be obtained in the same way as in the GMAR case even though the derivation of the related equation (A.4) in Kalliovirta et al. (2015) made use of the explicit expression of the stationary density of (y t , y t−1 ), which is known for the GMAR model but, in general, unknown for the LMAR model. The reason for this is that the null hypothesis is here assumed to hold, so that y t is a linear Gaussian AR(p) process, implying that (y t , y t−1 ) is normally distributed with density function a (p + 1)-dimensional counterpart of the p-dimensional normal density function n p (ν 1,t ; φ) defined in equation (6) (see the GMAR example of Section 2.2). After observing these two facts we can proceed in the same way as in the GMAR case and conclude that equation (35) holds also for the LMAR model as long as we replace the mixing weights of the GMAR model with those of the LMAR model. As the arguments employed in the proof of the GMAR case after equation (35) made no use of the mixing weights, they apply also to the LMAR model and can be used to complete the proof.
Assumptions 3 and 4. Conditions (i) and (ii) of Assumption 3 clearly hold, and condition (iii) can be verified in the same way as in the case of the LMAR model. Specifically, we have

and choosing ‖x‖ * = ‖x‖ 1 = Σ 2q i=1 |x i | it can straightforwardly be seen that condition (iii) holds by Lemma A.1. Regarding Assumption 4, based on the expression of α G t and the definition of π −1 α (π, ) = (π + (1 − α) , π − α ), the required differentiability holds for all positive integers k.
For part (ii), notice from (16), (17), and (38)

where, for i, j ∈ {1, . . . , q 2 }, the c ij 's are as in Section 3.3.1 (c ij = 1/2 if i = j and c ij = 1 if i ≠ j) and (note that s t , and hence S T , does not involve α as it cancels out from the expressions in (17) and (40)). Therefore the first three requirements in part (ii) are clearly satisfied. For the weak convergence requirement in part (ii) it now suffices to show that T −1/2 S T d → S in R r for some multivariate Gaussian random vector S with mean zero and E[SS ] = I. To this end, s t clearly forms a stationary and ergodic process. Moreover, due to Lemma F.3 in Supplementary Appendix F.5, E[s t | F t−1 ] = 0, so that s t is a martingale difference sequence. From the expression of I in (49) in Supplementary Appendix F.2 it is clear that E[s t s t ] = I. Positive definiteness of I is proven in Supplementary Appendix F.3. The stated convergence result now follows from the central limit theorem of Billingsley (1961) in conjunction with the Cramér-Wold device.
For part (iii), it suffices to show the finiteness and positive definiteness of I; these are proven in Supplementary Appendices F.2 and F.3. Part (iv) is proven in Supplementary Appendix F.4.
Expression of ŝ tα in Section 4.1. Let ε̂ t and ∇f t (φ T )/f t (φ T ) be as in the LMAR example (see Appendix B) and set (see (41) and (42))

where, for i, j ∈ {1, . . . , q 2 }, the c ij 's are as in Section 3.3.1 (c ij = 1/2 if i = j and c ij = 1 if i ≠ j) and .
Explicit expressions for the elements of ∇ 2 f t (φ T )/f t (φ T ) can be obtained from (50) in Supplementary Appendix F.3 by replacing ε t and σ * 1 therein with ε̂ t and σ T , respectively. Expressions for the elements of ∇n p (φ T )/n p (φ T ) can be obtained by evaluating (51).

Supplementary Appendix to 'Testing for observation-dependent regime switching in mixture autoregressive models' by Meitz and Saikkonen (not meant for publication)

D Further details for the general results
Proof of Lemma 1, further details. To justify that the last term on the right-hand side of (26) is dominated by − 1 4 ‖θ T α ‖ 2 + o pα (1), note first that

where W T α = − 1 4 + o pα (1). Thus, P (sup α∈A W T α ≤ 0) → 1 and (here 1(·) denotes the indicator function)

where the last term is non-negative and is positive with probability at most P (sup α∈A W T α > 0) → 0. Thus, combining the above derivations yields the desired result.

To justify the use of the continuous mapping theorem, note that in the first instance it is applied with the function g : C(A, R r ) × {I α } → C(A, R r ) mapping (x • , I • ) to I −1 • x • . Here I −1 α x α is continuous in α by Assumption 5(iii). Also, the latter set in the product C(A, R r ) × {I α } contains only the non-random function I α ; this product space can be equipped with essentially the same metric as C(A, R r ); cf. Andrews and Ploberger (1994, pp. 1392 and 1407) and Zhu and Zhang (2006, proof of Theorem 5). In the second instance, the continuous mapping theorem is applied with the function g : B(A, R r ) → R mapping x • (∈ B(A, R r )) to sup α∈A ‖x α ‖. For continuity, we need to establish that if a sequence x n• converges to x • in B(A, R r ), then g(x n• ) → g(x • ) in R (i.e., if sup α∈A ‖x nα − x α ‖ → 0, then |sup α∈A ‖x nα ‖ − sup α∈A ‖x α ‖| → 0). The triangle inequality implies that sup α∈A ‖x nα ‖ ≤ sup α∈A ‖x nα − x α ‖ + sup α∈A ‖x α ‖, as well as the same result with x nα and x α interchanged, and the desired result follows from these inequalities.
Proof of Lemma 4, further details. It remains to verify the continuity mentioned in the proof. For simplicity, consider the continuity of the functions g 1 :

and similarly for the other infimum; we need to consider

Using the triangle inequality and properties of the Euclidean vector norm,

and similarly with x nα and x α exchanged, so that

As was noted after Assumption 6, the cone Λ contains the origin, so that the term in parentheses in (43) is dominated by (λ max (I α )) 1/2 (‖x nα ‖ + ‖x α ‖). Now, due to Assumption 5(iii), the fact that x n• and x • are bounded, and the assumed sup α∈A ‖x nα − x α ‖ → 0, the quantity in (43) converges to zero.
Proof of Lemma 5. For brevity and clarity, within this proof we use somewhat simplified notation and let the blocks A, B, and C denote the partition of I −1 α (so that, e.g., C is shorthand for (I −1 α ) ϑϑ ). This implies that I α can be expressed as

where D = A − BC −1 B (thus, e.g., D −1 = I θθα ). Note also that A, C, and D are symmetric (as I α is symmetric).
Finally, (I −1 α ) ϑϑ and Z ϑα can be expressed as

where the two different expressions result from two different ways of writing the inverse of a partitioned matrix.
F Further details for the GMAR example
with respect to σ 2 1 , and use of the explicit expressions given above, it suffices to show that the components of the vector s t = (s t,1 , s t,2 , s t,3 , s t,4 , s t,5 , s t,6 ) (where the dimensions of the six components are 1, p, 1, p(p + 1)/2, p,

E[X * t,i,j X * t,k,l ] + E[X * t,i,k X * t,j,l ] + E[X * t,i,l X * t,j,k ] = 1 4

This completes the proof of claim (b). Therefore, the verification of Assumption 5(iv) is complete.
The following notation will be helpful: