Inference in mixed causal and noncausal models with generalized Student's t-distributions

The properties of the Maximum Likelihood estimator in mixed causal and noncausal models with a generalized Student's t error process are reviewed. Several existing methods are typically not applicable in the heavy-tailed framework. To this end, a new approach to make inference on causal and noncausal parameters in finite sample sizes is proposed. It exploits the empirical variance of the generalized Student's t, which remains computable even when the population variance does not exist. Monte Carlo simulations show a good performance of the new variance construction for fat-tailed series. Finally, different existing approaches are compared using three empirical applications: the variation of daily COVID-19 deaths in Belgium, monthly wheat prices, and the monthly inflation rate in Brazil.


Introduction
Mixed causal and noncausal models (MARs) are time series processes that consist of lead and lag components. Unlike causal models (e.g. ARMA models), these specifications can capture nonlinear features, such as bubbles, which can be defined as economic or financial processes that undergo a rapid increase followed by a sudden crash. MARs have proven their value in various applications, for instance commodity prices, inflation rates, bitcoin, and other forms of equity (see inter alia Hencic and Gouriéroux (2015), Gouriéroux and Jasiak (2016), Bec et al. (2020), Karapanagiotidis (2014), Lof and Nyberg (2017), Hecq and Sun (2020), Gourieroux et al. (2021), and Gourieroux et al. (2020)). Furthermore, the interpretation of MARs is rooted in economic theory. More specifically, these models can be interpreted as situations in which economic agents have more information than econometricians. For this reason, mixed causal and noncausal models are linked to the existence of non-fundamentalness in structural econometric models (see Alessi et al. (2011), and Lanne and Saikkonen (2013)). Finally, since MARs can be used to test for the time reversibility of a stochastic process, they can also be employed to detect business cycle asymmetry. In recent decades, seminal work has sparked renewed interest in the topic (see Breidt and Davis (1992), Ramsey and Rothman (1996), and Proietti (2020)). Despite the wide applicability of MARs, their estimation and inference are far from trivial.
A common assumption in the MAR literature is an error term distributed according to a generalized Student's t, in order to capture the heavy tails of the relevant data. Typically, the degrees of freedom of the Student's t are initially unknown and must be estimated. Many studies have found that the estimated degrees of freedom of the generalized Student's t lie between 1.5 and 2 (see Fries and Zakoian (2019), Hecq and Voisin (2019), and Hecq and Voisin (2021)). This empirical finding poses a difficulty for inference in MARs, namely, that the variance does not exist. In order to bypass this difficulty, several methods have been proposed: (1) using a different asymptotic framework (Davis and Resnick (1985)); (2) adopting other distributions (Fries and Zakoian (2019), and Fries (2021)); (3) employing bootstrap estimators (Cavaliere et al. (2020), in the case of purely noncausal models). Our novel strategy uses simulations to identify a robust estimator of the variance which performs well in small sample sizes. An estimator of this kind facilitates inference in MARs via a standard t-test. Extensive Monte Carlo simulations show that the resulting t-tests have empirical rejection frequencies (E.R.F.) close to the nominal significance level.
The rest of the paper is organized as follows. Section 2 introduces mixed causal and noncausal models; it presents the different ways of obtaining the expected Fisher Information matrix for MARs and briefly reviews the existing strategies. Section 3 proposes a new approach to compute the standard errors of causal and noncausal parameters based on a robust estimator of the residual scale. In Section 4, Monte Carlo simulations are used to compare the performance of current methodologies with the novel approach proposed here. Section 5 presents empirical applications to three different time series. Section 6 concludes.

Mixed causal and noncausal models
Consider the univariate MAR(r,s) defined as:

φ(L) ϕ(L⁻¹) y_t = ε_t,  (1)

where L is the backshift operator and L⁻¹ produces leads, such that L⁻¹ y_t = y_{t+1}. To ease notation, we assume that y_t has been demeaned. We denote by φ(L) the causal/autoregressive polynomial of order r and by ϕ(L⁻¹) the noncausal/lead polynomial of order s, with p = r + s. Both polynomials φ(z) and ϕ(z) are assumed to have their roots outside the unit circle:

φ(z) ≠ 0 and ϕ(z) ≠ 0 for |z| ≤ 1.  (2)
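Because a MAR(r,s) contains leads, it cannot be simulated by a simple forward recursion alone. A common device is to solve the noncausal filter backward in time and the causal filter forward. The Python sketch below illustrates this under the generalized Student's t error assumption used throughout the paper; the function name and burn-in lengths are our own choices, not the paper's.

```python
import numpy as np

def simulate_mar(phi, psi, nu, eta, T, burn=2000, seed=0):
    """Sketch: simulate phi(L) psi(L^-1) y_t = eps_t with t(nu) errors
    scaled by eta; phi/psi are the causal/noncausal coefficient lists."""
    rng = np.random.default_rng(seed)
    n = T + 2 * burn
    eps = eta * rng.standard_t(nu, size=n)
    # noncausal part: psi(L^-1) w_t = eps_t, solved backward in time
    w = np.zeros(n)
    for t in range(n - len(psi) - 1, -1, -1):
        w[t] = eps[t] + sum(p * w[t + j + 1] for j, p in enumerate(psi))
    # causal part: phi(L) y_t = w_t, solved forward in time
    y = np.zeros(n)
    for t in range(len(phi), n):
        y[t] = w[t] + sum(p * y[t - j - 1] for j, p in enumerate(phi))
    return y[burn:burn + T]    # discard both burn-in segments
```

Trimming a burn-in at both ends removes the influence of the zero initializations of the backward and forward recursions.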
Note that purely causal or noncausal models are obtained by setting ϕ(L⁻¹) = 1 or φ(L) = 1, respectively. Furthermore, ε_t is an independent and identically distributed (i.i.d.) sequence of generalized Student's t random variables (the non-Gaussianity assumption on the error term is required to distinguish noncausal from causal dynamics) with mean zero and, when it exists, finite variance σ², such that its density function is:

f(ε_t; ν, η) = Γ((ν+1)/2) / (Γ(ν/2) √(πν) η) [1 + (ε_t/η)²/ν]^{-(ν+1)/2},  (3)

with corresponding (approximate) log-likelihood function:

l_T(θ) = Σ_{t=r+1}^{T−s} log f(φ(L)ϕ(L⁻¹)y_t; ν, η),  (4)

where the vectors φ = (φ_1, ..., φ_r)′ and ϕ = (ϕ_1, ..., ϕ_s)′ respectively collect the causal and noncausal coefficients. In addition, ν denotes the degrees of freedom, η is the scale parameter, and

σ² = η² ν / (ν − 2)

is the variance of the error term (finite for ν > 2). Since the first r and the last s observations are lost in the estimation of the MAR(r,s) in (1), the parameter vectors φ and ϕ are estimated by an AMLE (approximate Maximum Likelihood) approach. Lanne and Saikkonen (2011) work under a more general assumption than we do: they consider mixed causal and noncausal models whose innovation density satisfies conditions (A1)-(A7) of Andrews et al. (2006), of which the generalized Student's t is one instance. In order to present their theorem, some provisional notation is necessary.
Let ζ_t ∼ i.i.d.(0, 1) and define the AR(r) stationary process u*_t by φ_0(L)u*_t = ζ_t and the AR(s) stationary process v*_t by ϕ_0(L⁻¹)v*_t = ζ_t.

Theorem 1 (Lanne and Saikkonen (2011)). Given conditions (A1)-(A7) of Andrews et al. (2006), there exists a sequence of local maximizers θ̂ = (φ̂, ϕ̂, η̂, ν̂) of l_T(θ) in (4) such that

√T (θ̂ − θ_0) →_d N(0, diag(Σ, Ω)⁻¹),  (5)

where diag(Σ, Ω) is the expected Fisher information matrix (it yields the Cramér-Rao bound, i.e. the lower bound of the asymptotic covariance matrix of the ML estimators). The diag notation in (5) means that the expected Fisher Information matrix is asymptotically block diagonal: the two blocks Σ and Ω are asymptotically independent and can therefore be treated separately. Σ is the expected Fisher Information matrix of the AML estimators (φ̂, ϕ̂); Lanne and Saikkonen (2011) express it, in equation (6), in terms of the autocovariances of u*_t and v*_t and of a scalar J. Ω, on the other hand, is the expected Fisher Information matrix of the distributional parameters ν and η. This paper focuses on the matrix Σ only, since we are exclusively interested in the standard errors of the causal and noncausal parameters. Σ is positive definite if J > 1 (see condition (A5) of Andrews et al. (2006)), and when the innovation term follows a generalized Student's t-distribution, Andrews et al. (2006) derive a closed form for J from which it follows that Σ is positive definite for ν > 2. The shortcoming of the approach proposed by Lanne and Saikkonen (2011) is therefore that it cannot accommodate processes whose generalized Student's t error term has fewer than two degrees of freedom. This is restrictive for heavy-tailed time series such as stock and commodity prices, bitcoin, and other forms of equity, for which the estimated degrees of freedom typically range between 1.3 and 1.9 (without reaching the Cauchy case ν = 1, though). Hecq et al. (2016) introduce an approximate and more straightforward way to compute the matrix defined in (6).
This methodology is implemented in the R package MARX and has been applied in several studies. It can be used whenever the error term is i.i.d. with a density function as in (3). Adapting results obtained by Fonseca et al. (2008) to the context of MARs, Hecq et al. (2016) derive an approximate expected Fisher Information matrix of the causal and noncausal parameters, denoted Σ_D in (8). Whenever the error term has a finite variance, matrix (8) has the same diagonal blocks as matrix (6). For ν < 2, the approximate matrix Σ_D is still positive definite and provides standard errors, unlike Σ. However, a closed-form solution for the limiting distribution of the MAR parameters does not exist in this context (see Davis et al. (1992), and Andrews and Davis (2013)). This problem could be overcome by employing bootstrap and simulation-based methods.
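Before turning to the new estimator, the likelihood machinery above can be made concrete. The Python sketch below evaluates density (3) and the approximate log-likelihood (4) for given coefficient vectors; the function names are our own, and the filter simply drops the first r and last s observations as described in the text.

```python
import numpy as np
from scipy.special import gammaln

def logpdf_gt(e, nu, eta):
    # log-density of the generalized (scaled) Student's t, density (3)
    return (gammaln((nu + 1) / 2) - gammaln(nu / 2)
            - 0.5 * np.log(np.pi * nu) - np.log(eta)
            - (nu + 1) / 2 * np.log1p((e / eta) ** 2 / nu))

def mar_loglik(y, phi, psi, nu, eta):
    """Approximate log-likelihood (4): filter e_t = phi(L) psi(L^-1) y_t,
    dropping the first r and last s observations, then sum log f(e_t)."""
    r, s = len(phi), len(psi)
    T = len(y)
    e = []
    for t in range(r, T - s):
        # u_t = phi(L) y_t, then e_t = psi(L^-1) u_t
        u = y[t] - sum(p * y[t - j - 1] for j, p in enumerate(phi))
        u -= sum(q * (y[t + j + 1]
                      - sum(p * y[t + j - i] for i, p in enumerate(phi)))
                 for j, q in enumerate(psi))
        e.append(u)
    return np.sum(logpdf_gt(np.array(e), nu, eta))
```

Note that (3) coincides with scipy's Student's t density once η is passed as the `scale` argument, which offers a convenient cross-check.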

A new robust estimator
In this section, we propose a new methodology to compute the standard errors of MAR parameters. It is valid in finite sample sizes when the error term is assumed to follow a generalized Student's t-distribution. The next section uses Monte Carlo simulations to show empirically that the new estimator performs well for ν ∈ (1, D], with D < ∞, even though the theoretical limiting distributions of the parameters cannot be derived in the heavy-tail framework. As stated in Section 2, the variance of the error term, σ², multiplies the diagonal blocks of Σ. Since the generalized Student's t-distribution with heavy-tailed innovations has infinite variance, the expected Fisher Information matrix cannot be computed in this context. Our alternative strategy consists of replacing the error term's variance in (6) with the variance of the residuals, σ̂². Furthermore, we expect the residuals to take a wide range of values, especially when the population variance is infinite. In order to reduce the effect of large outliers, we estimate the standard deviation of the residuals using the robust estimator introduced by Rousseeuw and Croux (1993):

σ̂ = k · MAD,  (9)

where MAD denotes the median absolute deviation, a robust estimator of the variability of the residuals,

MAD = median_t |ε̂_t − median(ε̂)|,  (10)

and k is a scalar. Rousseeuw and Croux (1993) note that the value of k depends on the distribution of the residuals and that, under Gaussianity, k = 1.48 yields a robust estimate of the standard deviation. To recover k = 1.48 empirically, we run a Monte Carlo experiment in which a Gaussian error term is simulated (with expected value 0 and standard deviation 5), with T = 1000 observations and N = 700000 replications. A large number of replications is required to obtain as accurate an empirical density function as possible.
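The estimator in (9) is simple to implement. A minimal sketch, assuming the Gaussian calibration k = 1.48:

```python
import numpy as np

def mad(x):
    # median absolute deviation: median_t |x_t - median(x)|
    return np.median(np.abs(x - np.median(x)))

def robust_sd(x, k=1.48):
    # equation (9): sigma_hat = k * MAD; k = 1.48 is the Gaussian calibration
    return k * mad(x)
```

For a large N(0, 5) sample, `robust_sd` returns a value close to 5 while remaining insensitive to a handful of extreme outliers.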
In each replication, we compute the value of k implied by equation (9). In this way, the experiment yields as many estimates of k as there are replications. To analyze the behavior of these estimates, we compute the empirical density function of k by kernel density estimation. It is well known that extreme values of k can distort the non-parametric estimation (see Kim and Scott (2012)). To this end, we only retain the values of k that lie in the interval

[Q1 − 1.5 · IQR, Q3 + 1.5 · IQR],  (11)

where Q1 and Q3 indicate the first and third quartiles of k respectively, and IQR is its interquartile range. Figure 1 shows the result: a distribution centered around the mode 1.48, the same value identified by Rousseeuw and Croux (1993). Let us now find the value of k that provides a robust estimate of the standard deviation of the residuals under the assumption that the error process follows a generalized Student's t-distribution with ν ∈ (1, D]. In this case, we cannot use k = 1.48 because, as stated above, k changes with the distribution considered. Hence, we apply the same empirical approach used in the Gaussian case. More specifically, after estimating the degrees of freedom ν̂ of our MAR process, we run a Monte Carlo experiment in which the error term is simulated setting ν_0 = ν̂ and the sample size equal to the number of observations at hand, T. The next step is to keep the values of k that lie inside the interval (11) and compute their empirical density function. Finally, as in the Gaussian case, we extract a single value of k from this empirical density function: we take its mode and denote it k*.
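The Gaussian calibration experiment can be reproduced at a much smaller scale. In this sketch, the per-replication k is taken as the ratio of the sample standard deviation to the MAD (our reading of how equation (9) is inverted, so an assumption); the values are trimmed by the interquartile fences of interval (11), and the mode is read off a kernel density estimate.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
T, N = 1000, 2000            # far fewer replications than the paper's 700000

ks = np.empty(N)
for i in range(N):
    e = rng.normal(0.0, 5.0, T)
    m = np.median(np.abs(e - np.median(e)))   # MAD of this replication
    ks[i] = e.std(ddof=1) / m                 # implied k in equation (9)

# keep only values inside the interquartile fences of interval (11)
q1, q3 = np.percentile(ks, [25, 75])
iqr = q3 - q1
kept = ks[(ks >= q1 - 1.5 * iqr) & (ks <= q3 + 1.5 * iqr)]

# mode of the kernel density estimate of the retained k values
grid = np.linspace(kept.min(), kept.max(), 512)
k_mode = grid[np.argmax(gaussian_kde(kept)(grid))]
print(round(k_mode, 2))   # should sit near 1.48
```

Even with two thousand replications the mode lands close to the Rousseeuw-Croux value of 1.48; the paper's 700000 replications sharpen the density estimate further.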
It is important to note that, under the Student's t assumption, the standard deviation of the residuals depends on two parameters: the degrees of freedom ν and the sample size T. This implies that we can rewrite (9) as:

σ̂(ν, T) = k(ν, T) · MAD.  (12)

In other words, k is a random variable whose density function depends on ν and T. To investigate how k changes with ν and T, two Monte Carlo simulations are carried out. In the first, we analyze how k changes with ν, keeping T fixed: the error term is simulated with T = 500, ν_0 = (1.2, 1.8, 3, 50, 1000), and N = 700000 replications. The resulting empirical density functions, shown in Figure 2, differ markedly across values of ν.
The second Monte Carlo experiment analyzes how the empirical distribution of k changes with the sample size. The data generating process now uses several sample sizes, T = (100, 200, 500, 1000, 3000), with the degrees of freedom fixed at ν_0 = 1.5. Figure 3 illustrates how the mode of the empirical distributions shifts to the right as T increases. Hence k* can be expressed as a function of both ν and T; Appendix A provides its value for different ν and T.
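Putting the two experiments together, one can tabulate k* on a (ν, T) grid, as in Appendix A. A reduced-scale sketch (the function name is ours, and N is far below the paper's 700000 replications):

```python
import numpy as np
from scipy.stats import gaussian_kde

def k_star(nu, T, N=2000, eta=1.0, seed=2):
    """Mode of the empirical density of k = sd/MAD under simulated
    t(nu) errors of length T (reduced-scale version of the paper's
    calibration experiment)."""
    rng = np.random.default_rng(seed)
    ks = np.empty(N)
    for i in range(N):
        e = eta * rng.standard_t(nu, T)
        m = np.median(np.abs(e - np.median(e)))     # MAD
        ks[i] = e.std(ddof=1) / m
    # interquartile fences, then the mode of a kernel density estimate
    q1, q3 = np.percentile(ks, [25, 75])
    iqr = q3 - q1
    kept = ks[(ks >= q1 - 1.5 * iqr) & (ks <= q3 + 1.5 * iqr)]
    grid = np.linspace(kept.min(), kept.max(), 512)
    return grid[np.argmax(gaussian_kde(kept)(grid))]
```

In line with Figure 3, for heavy-tailed ν the returned mode should shift to the right as T grows, while for large ν it stays close to the Gaussian value.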
In conclusion, this approach leads to an expected Fisher Information matrix of the causal and noncausal coefficients, denoted Σ_R, obtained by replacing the error variance σ² in (6) with the robust estimate σ̂²(ν, T) of the residual variance.

Monte Carlo simulations
Let us now analyze the numerical stability of our new robust estimator σ̂ as the sample size increases. In other words, we want to analyze how the new estimator behaves as T rises. To this end, we run several Monte Carlo simulations with N = 10000 replications each. The data generating process is a MAR(1,1) with scale parameter η_0 = 3 and sample sizes T = (100, 200, 500, 1000, 2000, 3000). We also consider several degrees of freedom, ν_0 = (1.2, 1.5, 1.8, 3), and various combinations of causal and noncausal coefficients. For each replication, we estimate the standard deviation of the residuals using equation (9). Finally, we display the results with boxplots. In the finite variance framework (ν_0 = 3), the median of the estimated standard deviation converges to its true value as the sample size increases. On the other hand, for ν_0 = (1.2, 1.5, 1.8), σ̂ diverges regardless of the causal and noncausal coefficients considered. This is particularly evident for ν_0 = 1.2, where the distribution of the residuals has the fattest tails. However, it is important to underline that the robust estimate σ̂ never explodes dramatically: in all cases considered, a 100% increase in the sample size raises the median of the estimated standard deviation by less than 100%. In other words, the robust estimate σ̂ does diverge as T increases, but at a slow rate.
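The slow divergence can be checked directly on simulated generalized Student's t errors. This is a reduced-scale sketch: the MAR filtering step is omitted and raw simulated errors stand in for residuals, with fewer replications than the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(3)
nu, eta, N = 1.2, 3.0, 500   # heavy tails: population variance is infinite

def median_sd(T):
    # median, across N replications, of the sample standard deviation
    return np.median([(eta * rng.standard_t(nu, T)).std(ddof=1)
                      for _ in range(N)])

s_small, s_large = median_sd(200), median_sd(2000)
print(s_small < s_large)        # the estimate grows with T ...
print(s_large < 10 * s_small)   # ... but much more slowly than T itself
```

Multiplying the sample size by ten raises the median estimated standard deviation by far less than a factor of ten, which is the slow divergence described above.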
To understand whether this divergence could be a problem for inference, we analyze the E.R.F. of t-tests and compare them to the nominal significance level. We therefore run several Monte Carlo simulations with the same data generating process as before. In each replication, we test whether the estimated causal and noncausal coefficients equal their true values. In particular, we compute two t-tests, H_0: φ = φ_0 and H_0: ϕ = ϕ_0, against the two-sided alternatives φ ≠ φ_0 and ϕ ≠ ϕ_0 respectively. Tables 1-3 show the empirical rejection frequencies obtained at the 5% nominal significance level when the standard errors are computed with the new methodology and with the methods introduced in Section 2. The tables do not display all the results of our data generating processes, since they are qualitatively similar; the results for the MAR(1,1) with φ_0 = 0.35, ϕ_0 = 0.65 and ν_0 = 1.5 are available on request. We observe that for ν > 2 (Table 1), the method developed by Hecq et al. (2016) yields an empirical t-distribution with tails fatter than the standard normal. The reason is that the standard errors in the denominator of the t-test are underestimated, owing to the block diagonality assumption imposed on matrix (8) (see Section 2). We also observe that our new approach generates fewer distortions in small sample sizes (T = (100, 200)) than Σ: the E.R.F. obtained with matrix (6) only approach the 5% nominal rejection frequency for T = 1000. On the other hand, for ν ≤ 2 (Tables 2-3), the matrix Σ cannot be computed, and the E.R.F. generated by Σ_D remain far from the nominal significance level (especially for small sample sizes and small degrees of freedom).
Our new approach, Σ_R, is the only one that provides a t-distribution close to the standard normal. It should also be noted that the empirical rejection frequencies generated by this methodology decrease as the sample becomes larger. This is due to the fact that, within this framework, the robust estimator σ̂ diverges as T increases. However, as previously stated, since this divergence occurs slowly, the resulting drift of the empirical rejection frequencies towards 0 is negligible in finite sample sizes.

Note to Tables 1-3 (empirical rejection frequencies, nominal level 5%, MAR(1,1)): the first column, Σ, reports the empirical rejection frequencies obtained with the matrix defined in (6); the last two columns, Σ_D and Σ_R, report those obtained using the standard errors developed by Hecq et al. (2016) and our new robust approach, respectively.


Empirical applications
We consider three time series: (a) monthly wheat prices; (b) the monthly inflation rate in Brazil, measured by the IPCA index (the IPCA targets families with household income ranging from 1 to 40 minimum wages, an income range that guarantees 90% coverage of families living in 13 geographic zones), observed from January 1997 to June 2020 (source: Central Bank of Brazil); and (c) the variation of daily COVID-19 deaths in Belgium from 10 March 2020 to 17 July 2020 (source: WHO). Figure 6 presents the data. With this panel of applications, we want to show that MAR models are of interest for modeling other series besides the usual commodity prices. Estimating MAR models involves a series of steps. Firstly, a conventional causal autoregressive model is estimated by OLS in order to select the lag order p using information criteria (see Lanne and Saikkonen (2011), and Hecq et al. (2016)). We find p = 2 for the inflation rate and wheat prices, whereas p = 4 is chosen for the Belgian COVID-19 series.
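The first step, lag-order selection by OLS plus an information criterion, can be sketched as follows; BIC is used here as the example criterion, which is an assumption, since the text does not pin down the exact criterion.

```python
import numpy as np

def select_p(y, pmax=8):
    """Fit causal AR(p) by OLS for p = 1..pmax on a common sample
    and return the BIC-minimizing lag order."""
    y = np.asarray(y, dtype=float)
    bics = []
    for p in range(1, pmax + 1):
        Y = y[pmax:]                                   # common sample across p
        X = np.column_stack([y[pmax - j:len(y) - j] for j in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        n = len(Y)
        bics.append(n * np.log(resid @ resid / n) + p * np.log(n))
    return int(np.argmin(bics)) + 1
```

Using a common effective sample for every candidate p keeps the criteria comparable across orders.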
Secondly, using an AML approach and searching for the r and s (with p = r + s) that maximize the generalized Student's t likelihood function, we find that wheat prices and Brazilian inflation follow a MAR(1,1), whereas the variation of COVID-19 deaths follows a MAR(2,2). Finally, we report the estimated parameters and their standard errors, using both the methods reviewed and the novel method introduced in this paper.
The degrees of freedom estimated for the three series lead us to expect some differences and similarities in light of the simulation results: ν̂ = 1.17 for the COVID-19 data, ν̂ = 2.21 for wheat prices, and ν̂ = 3.22 for Brazilian inflation. Although we observe fat tails in each series, only the Belgian daily data has estimated degrees of freedom below 2; however, none of the estimates is significantly different from 2. To verify this, we use the standard errors given by −(T − p)⁻¹ ∂²l_T(φ̂, ϕ̂, θ̂₂)/∂θ₂∂θ₂′, with θ̂₂ = (ν̂, η̂), which is a consistent estimator of the expected Fisher information matrix of the distributional parameters (see Lanne and Saikkonen (2011)). Unlike Σ, Ω is not subject to such restrictions and can also be computed when the population variance is infinite.
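A sketch of how standard errors for (ν, η) can be obtained numerically, using scipy's scaled Student's t in place of density (3) and simulated draws as stand-in residuals; the optimizer settings and the finite-difference step are our own arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import t as student_t

rng = np.random.default_rng(4)
res = 2.0 * rng.standard_t(3.0, size=1500)   # stand-in residuals: nu=3, eta=2

def nll(theta):
    # negative log-likelihood of the scaled Student's t (scipy scale = eta)
    nu, eta = theta
    if nu <= 0.1 or eta <= 0.01:
        return np.inf
    return -student_t.logpdf(res, nu, scale=eta).sum()

fit = minimize(nll, x0=[4.0, 1.0], method="Nelder-Mead")
nu_hat, eta_hat = fit.x

# numerical Hessian of the negative log-likelihood (observed information)
h = 1e-3
H = np.empty((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = np.eye(2)[i] * h, np.eye(2)[j] * h
        H[i, j] = (nll(fit.x + ei + ej) - nll(fit.x + ei - ej)
                   - nll(fit.x - ei + ej) + nll(fit.x - ei - ej)) / (4 * h * h)

se_nu, se_eta = np.sqrt(np.diag(np.linalg.inv(H)))
```

Inverting the observed information of the total log-likelihood is the same calculation, up to the (T − p) normalization, as the −(T − p)⁻¹ ∂²l_T expression quoted in the text.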
The standard errors of the causal and noncausal parameters differ depending on the approach used to compute them. In the application to the Brazilian inflation rate (Table 5), the standard errors obtained through the robust estimator of the residuals are larger than those obtained with the "traditional" methodologies described in Section 2. The same holds for the causal coefficient in the wheat prices application (Table 4), whereas for the noncausal coefficient of that application we obtain smaller standard errors when they are computed with Σ̂. Finally, the matrix Σ̂ does not provide standard errors for the series of the variation of the number of COVID-19 fatalities in Belgium (Table 6), because this series is characterized by an error term with infinite variance. For this series, our methodology generates standard errors smaller than those obtained with the matrix Σ̂_D.

Conclusions
In this paper, we review the behavior of the ML estimator for mixed causal and noncausal models. In particular, we focus on those with an error term that is assumed to follow a generalized Student's t-distribution. As demonstrated by Lanne and Saikkonen (2011), the expected Fisher Information matrix of the causal and noncausal parameters can be computed if and only if the probability density function satisfies a certain set of assumptions. Generalized Student's t-distributions with infinite variance do not meet one of these assumptions (J > 1); hence this methodology is not applicable in the infinite variance context. This is a serious limitation, since we cannot consider time series with estimated degrees of freedom equal to or less than two. Hecq et al. (2016) propose an approximate and simplified way of computing the standard errors of these parameters, which is implemented in the R package MARX and applied in several studies. However, through a simulation study, we show that this approach does not deliver reliable inference on the MAR parameters: the t-tests exhibit empirical rejection frequencies far from the nominal significance level, especially in small sample sizes and for small degrees of freedom. In order to bypass these problems, we propose a novel way to compute standard errors based on a simple alternative estimator of the variance of the residuals. Monte Carlo simulations show the good performance of this new estimator, even when the population variance is not finite. Finally, we estimate MAR models on three time series and illustrate the differences in the estimated standard errors produced by the various approaches discussed.