1 Introduction

It is widely believed that there is an asymmetric inertia in many major economic time series, often attributed to differences in time series dynamics in periods of business-cycle contraction and expansion. This asymmetric behavior has been documented by, for instance, Wecker (1981) for US industrial prices. Brännäs and De Gooijer (1994) and Elwood (1998) find empirical evidence of asymmetry in US real GNP growth rates. Some more recent works include Gonzalo and Martínez (2006) for US GNP data, and Taştan (2017) for Turkish real GNP and industrial production index series. Further related papers are by Brännäs and de Luna (1998), Brännäs and Ohlsson (1999), Nebeling and Salish (2017), and Sáfadi and Morettin (1998).

A common feature of the above studies is the use of univariate asymmetric moving average (asMA) models and their variants. The dynamics of these models respond to innovations according to one of two different rules, depending on whether the innovation is positive or negative, and hence induce asymmetry. Obviously, the univariate structure of an asMA model offers limited information about the asymmetry in a data generating process (DGP) because the dynamic interrelationships between several variables are ignored. This calls for a more flexible multivariate (vector) dynamic specification with features similar to those of the univariate asMA model.

The purpose of the paper is to introduce and study the class of asymmetric vector moving average (asVMA) models. The asVMA model may be viewed as an extension of the univariate asMA model proposed by Wecker (1981). Roughly speaking, the asymmetric effect in an asVMA model is characterized by different MA filters applied to the components of vectors of lagged positive and negative innovations. This allows for a detailed investigation of the interrelationships among model innovations of different sign. From an empirical standpoint, there are sound reasons for using vector nonlinear models to detect asymmetries in macroeconomic time series; see, e.g., Atanasova (2003), Keating (2000), and Weisse (1999).

The rest of the paper is organized as follows. In Sect. 2, we introduce the asVMA model and discuss some covariance properties. Related to this, we briefly investigate the global invertibility of the proposed model. Section 3 is on estimation and testing. First, we describe the log-likelihood function. Next, we propose a multivariate Wald-type test statistic for testing an asVMA against a linear vector MA model. We evaluate the finite-sample performance of the proposed Wald-type test statistic in a Monte Carlo simulation study. In Sect. 4, we devise an exploratory test statistic based on lagged sample cross-bicovariance estimates. Section 5 contains two illustrative applications. A summary is given in Sect. 6. All proofs are relegated to an “Appendix”.

2 Asymmetric vector moving average model and properties

2.1 Model

Consider an m-dimensional stochastic process \({\mathbf {Y}}_{t}=(Y_{1t},\ldots ,Y_{mt})^{\prime }\). Let \(\varvec{\varepsilon }_{t}=(\varepsilon _{1t},\ldots ,\varepsilon _{mt})^{\prime }\) be an m-dimensional i.i.d. white noise process with \(m\times 1\) mean vector \({\mathbf {0}}_{m}\) and \(m\times m\) positive definite covariance matrix \(\varvec{\Sigma }_{\varepsilon }\), with \(\varvec{\varepsilon }_{t}\) independent of \({\mathbf {Y}}_{s}\), \(s<t\). Assume that the dynamic relationships in \(\{{\mathbf {Y}}_{t}, t\in {\mathbb {Z}}\}\) are represented through a linear vector filter of positive innovations \(\varvec{\varepsilon }^{+}_{t} =(\varepsilon ^{+}_{1t},\ldots ,\varepsilon ^{+}_{mt})^{\prime }\) and a linear filter of negative innovations \(\varvec{\varepsilon }^{-}_{t} =(\varepsilon ^{-}_{1t},\ldots ,\varepsilon ^{-}_{mt})^{\prime }\) which satisfy, respectively,

$$\begin{aligned} \varvec{\varepsilon }^{+}_{t}&= \big (\max \{0,\varepsilon _{1t}\},\ldots ,\max \{0,\varepsilon _{mt}\}\big )^{\prime } = \big (I(\varepsilon _{1t}\ge 0)\varepsilon _{1t},\ldots , I(\varepsilon _{mt}\ge 0)\varepsilon _{mt}\big )^{\prime }, \\ \varvec{\varepsilon }^{-}_{t}&= \big (\min \{0,\varepsilon _{1t}\},\ldots ,\min \{0,\varepsilon _{mt}\}\big )^{\prime } = \big (I(\varepsilon _{1t}< 0)\varepsilon _{1t},\ldots , I(\varepsilon _{mt}< 0)\varepsilon _{mt}\big )^{\prime }, \end{aligned}$$

where \(I(\cdot )\) denotes the indicator function. Then an m-dimensional asymmetric vector moving average process of order \((q, q)\), denoted by asVMA\((q,q)\), is defined as

$$\begin{aligned} {\mathbf {Y}}_{t}&= \varvec{\varepsilon }_{t} + \sum ^{q}_{v=1}{\mathbf {B}}^{+}_{v}\varvec{\varepsilon }^{+}_{t-v}+ \sum ^{q}_{v=1}{\mathbf {B}}^{-}_{v}\varvec{\varepsilon }^{-}_{t-v} \nonumber \\&= \varvec{\varepsilon }_{t} + \sum ^{q}_{v=1}{\mathbf {B}}^{+}_{v} \varvec{\varepsilon }_{t-v} +\sum ^{q}_{v=1}\big ({\mathbf {B}}^{-}_{v}-{\mathbf {B}}^{+}_{v}\big ) \varvec{\varepsilon }^{-}_{t-v}, \end{aligned}$$
(1)

where \({\mathbf {B}}^{\pm }_{v}=[b^{\pm }_{rs,v}]^{m}_{r,s=1}\) \((v=1,\ldots ,q)\) are \(m\times m\) parameter matrices.

We see that in (1), each component of the vector process \(\varvec{\varepsilon }_{t}\) is transformed by a separate MA filter, and the asymmetry in \(\{{\mathbf {Y}}_{t}, t\in {\mathbb {Z}}\}\) depends on the sign of the innovations. Moreover, in (1) we assume that each component of \(\varvec{\varepsilon }_{t}\) can move in one of two possible directions, above or below the fixed value zero. Note that by introducing a suitable number of zero parameter matrices, the order q can differ between \(\varvec{\varepsilon }^{+}_{t}\) and \(\varvec{\varepsilon }^{-}_{t}\). When \(({\mathbf {B}}^{-}_{v}-{\mathbf {B}}^{+}_{v})={\mathbf {0}}_{m\times m}\) for all v, (1) reduces to an m-dimensional VMA(q) process in “standard form”. Also note that for \(m=1\), model (1) corresponds to the univariate asMA(q) process proposed by Wecker (1981), i.e., \(Y_{t}=\varepsilon _{t}+\sum ^{q}_{v=1}b_{v}^{+}I(\varepsilon _{t-v}\ge 0)\varepsilon _{t-v}+\sum ^{q}_{v=1}b_{v}^{-}I(\varepsilon _{t-v}<0)\varepsilon _{t-v}\). For one-step-ahead forecasts of stock returns, Koutmos (1999) used a variant of this model with the conditional mean given by an asMA(1) model and the conditional standard deviation given by a threshold GARCH(1, 1) model. Guay and Scaillet (2003) modified the univariate asMA(q) model to allow for contemporaneous asymmetric effects.
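To make the definition concrete, the recursion in (1) can be simulated directly. The following sketch (in Python, assuming Gaussian innovations; the function and variable names are our own) generates an asVMA\((q,q)\) path by applying the two MA filters to the componentwise positive and negative parts of the innovations:

```python
import numpy as np

def simulate_asvma(B_plus, B_minus, T, rng=None, burn=500):
    """Simulate an m-dimensional asVMA(q, q) process as in Eq. (1).

    B_plus, B_minus: lists of q (m x m) coefficient matrices for the
    positive- and negative-innovation filters, respectively.
    """
    rng = np.random.default_rng(rng)
    q = len(B_plus)
    m = B_plus[0].shape[0]
    eps = rng.standard_normal((T + burn + q, m))   # i.i.d. N(0, I_m) innovations
    eps_pos = np.maximum(eps, 0.0)                 # componentwise positive parts
    eps_neg = np.minimum(eps, 0.0)                 # componentwise negative parts
    Y = eps.copy()
    for v in range(1, q + 1):
        # Y_t += B^+_v eps^+_{t-v} + B^-_v eps^-_{t-v}, in row form
        Y[v:] += eps_pos[:-v] @ np.asarray(B_plus[v - 1]).T
        Y[v:] += eps_neg[:-v] @ np.asarray(B_minus[v - 1]).T
    return Y[burn + q:burn + q + T]                # discard the burn-in segment
```

Setting \({\mathbf {B}}^{+}_{v}={\mathbf {B}}^{-}_{v}\) for all v reproduces a linear VMA(q) series, which is a convenient sanity check of the implementation.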

2.2 Invertibility

Forecasting with an asVMA model is, in principle, the same as producing forecasts from a VMA model provided the model is invertible. For each univariate component of \({\mathbf {Y}}_{t}\), the empirical invertibility of (1) can be checked by Monte Carlo simulation; see, e.g., De Gooijer and Brännäs (1995). Their approach can be easily extended to the multivariate case. Alternatively, we can use the following global invertibility condition (Niglio and Vitale 2013)

$$\begin{aligned} |\lambda (\varvec{\Psi }^{+})|^{p}|\lambda (\varvec{\Psi }^{-})|^{1-p}<1, \end{aligned}$$
(2)

where \(\varvec{\Psi }^{\pm }\) is an \(mq\times mq\) matrix defined by

$$\begin{aligned} { \varvec{\Psi }^{\pm } = \left( \begin{array}{cc} {\mathbf {B}}^{\pm }_{1}\,\cdots \,{\mathbf {B}}^{\pm }_{q-1} &{} {\mathbf {B}}^{\pm }_{q} \\ {\mathbf {I}}_{m(q-1)} &{} {\mathbf {0}}_{m(q-1)\times m} \\ \end{array} \right) . } \end{aligned}$$

Here, \(\lambda (\cdot )\) is the maximum eigenvalue of a square matrix and \(p={\mathbb {E}}[I({\mathbf {1}}_{m}^{\prime }\varvec{\varepsilon }_{t-1}\ge 0)]\) with \({\mathbf {1}}_{m}\) an \(m\times 1\) vector consisting of ones.

As an illustration, consider a two-dimensional asVMA(1, 1) model with \({\mathbf {B}}^{+}_{1}=-{\mathbf {B}}^{-}_{1}\) and with the values of \(b^{\pm }_{12,1}\) fixed at zero. The reason for choosing this particular model will become clear in Sect. 2.3. Using (2) with \(p=1/2\), we noticed that the shape of the global invertibility region is the same for all pairs of parameters \((b^{+}_{rr,1},b^{-}_{ss,1})\) \((r,s=1,2)\). Figure 1 shows a graphical representation where the elements of \({\mathbf {B}}^{\pm }_{1}\) are taken in the range \([-4,\,4]\) using a step-size of 0.1. It is interesting to see that the region indicates invertibility over a wide range of parameter values. In fact, a simple approximation to the global invertibility region is given by the condition \(|b^{+}_{rr,1}||b^{-}_{ss,1}|<1\). This result follows from approximating the values at the boundaries of the invertibility–non-invertibility region by nonlinear functions of the parameter combinations.
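Condition (2) is easy to evaluate numerically. The sketch below (Python; it assumes the usual \(mq\times mq\) companion form for \(\varvec{\Psi }^{\pm }\), and takes \(p=1/2\), appropriate for innovations symmetric about zero) checks global invertibility for a given pair of coefficient sequences:

```python
import numpy as np

def companion(B_list):
    """Stack q (m x m) MA matrices into the (mq x mq) companion matrix Psi."""
    q = len(B_list)
    m = B_list[0].shape[0]
    Psi = np.zeros((m * q, m * q))
    Psi[:m, :] = np.hstack(B_list)                # top block row: B_1, ..., B_q
    if q > 1:
        Psi[m:, :m * (q - 1)] = np.eye(m * (q - 1))
    return Psi

def globally_invertible(B_plus, B_minus, p=0.5):
    """Evaluate condition (2): |lambda(Psi+)|^p * |lambda(Psi-)|^(1-p) < 1."""
    lam_p = max(abs(np.linalg.eigvals(companion(B_plus))))
    lam_m = max(abs(np.linalg.eigvals(companion(B_minus))))
    return lam_p ** p * lam_m ** (1 - p) < 1
```

For the two-dimensional asVMA(1, 1) example above, the returned values trace out the region of Fig. 1: e.g., the diagonal pair \((2,\,0.4)\) satisfies \(|b^{+}||b^{-}|<1\) and is flagged invertible, whereas \((2,\,0.6)\) is not.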

Fig. 1

Scatter plot of the global invertibility region for the pair of parameters \((\beta ^{+},\beta ^{-})\equiv (b^{+}_{rr,1},b^{-}_{ss,1})\) \((r,s=1,2)\) of a two-dimensional asVMA(1, 1) model with \({\mathbf {B}}^{+}_{1}=-{\mathbf {B}}^{-}_{1}\) and with \(b^{\pm }_{12,1}=0\). The black solid lines represent the boundary of the condition \(|\beta ^{+}||\beta ^{-}|<1\)

2.3 Covariance properties

For a linear stationary vector time series process \(\{{\mathbf {Y}}_{t}, t\in {\mathbb {Z}}\}\), the mean vector, variance and cross-covariance matrices provide useful summary information on the strength and direction between its components. Given a particular VMA model, these theoretical properties are well known. By contrast, explicit expressions for the mean and covariance of a general asVMA process are more difficult to derive.

Consider as an illustrative example the following m-dimensional asVMA(1, 1) process

$$\begin{aligned} {\mathbf {Y}}_{t}&= \varvec{\varepsilon }_{t}+{\mathbf {B}}^{+} \varvec{\varepsilon }_{t-1} +({\mathbf {B}}^{-}-{\mathbf {B}}^{+}) \big (I(\varepsilon _{1,t-1}< 0)\varepsilon _{1,t-1},\ldots , I(\varepsilon _{m,t-1}< 0)\varepsilon _{m,t-1}\big )^{\prime }, \end{aligned}$$
(3)

where we suppressed the subscript v in \({\mathbf {B}}^{\pm }_{v}\). The next proposition summarizes some properties of (3).

Proposition

Let \(\{{\mathbf {Y}}_{t},t\in {\mathbb {Z}}\}\) follow the process in (3) with \(\{\varvec{\varepsilon }_{t}\}{\mathop {\sim }\limits ^{\tiny {\text{ i.i.d. }}}} {{\mathcal {N}}}({\mathbf {0}}_{m},{\varvec{\Sigma }}_{\varepsilon })\). Then the unconditional mean vector, the unconditional variance-covariance matrix, and the unconditional lag-\(\ell \) cross-covariance matrix are given by, respectively,

$$\begin{aligned} {\mathbb {E}}({\mathbf {Y}}_{t})&= (2\pi {\mathbf {1}}_{m}^{\prime }\varvec{\Sigma }_{\varepsilon }{\mathbf {1}}_{m})^{-1/2} \big ({\mathbf {B}}^{-}-{\mathbf {B}}^{+}\big ) \varvec{\Sigma }_{\varepsilon }{\mathbf {1}}_{m}, \\ \text{ Var }({\mathbf {Y}}_{t})&= \varvec{\Sigma }_{\varepsilon } + \frac{1}{2}\Big ({\mathbf {B}}^{-}\varvec{\Sigma }_{\varepsilon }\big ({\mathbf {B}}^{-}\big )^{\prime } + {\mathbf {B}}^{+}\varvec{\Sigma }_{\varepsilon } \big ({\mathbf {B}}^{+}\big )^{\prime }\Big ) - {\mathbb {E}}({\mathbf {Y}}_{t}){\mathbb {E}}({\mathbf {Y}}^{\prime }_{t}), \\ \varvec{\Gamma }(\ell )&= [\gamma _{ij}(\ell )]^{m}_{i,j=1}= \text{ Cov }({\mathbf {Y}}_{t},{\mathbf {Y}}_{t-\ell }) = {\left\{ \begin{array}{ll} \frac{1}{2}\big ({\mathbf {B}}^{-}+{\mathbf {B}}^{+}\big ) {\varvec{\Sigma }}_{\varepsilon }, &{}\ell =1,\\ {\mathbf {0}}_{m\times m}, &{} \ell >1. \end{array}\right. } \end{aligned}$$

We see that unlike the m-dimensional VMA(1) process, where \({\mathbb {E}}({\mathbf {Y}}_{t})={\mathbf {0}}_{m}\), the mean vector of the asVMA(1, 1) process in (3) is a function of the parameter matrices \({\mathbf {B}}^{-}\) and \({\mathbf {B}}^{+}\) and is in general a nonzero vector. However, the m-dimensional asVMA(1, 1) process, like the m-dimensional VMA(1) process, has the property that \(\varvec{\Gamma }(\ell )={\mathbf {0}}_{m\times m}\) for \(\ell > 1\). Thus, it will be hard to distinguish between the two processes on the basis of the sample estimate of \(\varvec{\Gamma }(\ell )\). Also, setting \({\mathbf {B}}^{-}=-{\mathbf {B}}^{+}\), it follows from the Proposition that \(\varvec{\Gamma }(\ell )={\mathbf {0}}_{m\times m}\) for all lags \(\ell \ge 1\). When this parameter configuration occurs, it is impossible to distinguish an m-dimensional asVMA(1, 1) process from an m-dimensional Gaussian white noise process using the lag-\(\ell \) cross-covariance matrix.
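The lag-1 result of the Proposition is easily checked by Monte Carlo. The sketch below (Python, with \(\varvec{\Sigma }_{\varepsilon }={\mathbf {I}}_{2}\) and coefficient matrices of our own choosing) simulates the asVMA(1, 1) process (3) and compares the sample lag-1 cross-covariance matrix with \(\frac{1}{2}({\mathbf {B}}^{-}+{\mathbf {B}}^{+})\varvec{\Sigma }_{\varepsilon }\), and the lag-2 matrix with \({\mathbf {0}}_{2\times 2}\):

```python
import numpy as np

rng = np.random.default_rng(12345)
T, m = 200000, 2
B_plus = np.array([[0.5, 0.0], [0.3, -0.4]])
B_minus = np.array([[-0.2, 0.1], [0.0, 0.6]])

eps = rng.standard_normal((T + 1, m))                  # Sigma_eps = I_2
Y = (eps[1:] + np.maximum(eps[:-1], 0) @ B_plus.T
             + np.minimum(eps[:-1], 0) @ B_minus.T)    # Eq. (3)

Yc = Y - Y.mean(axis=0)                                # center at the sample mean
Gamma1 = Yc[1:].T @ Yc[:-1] / (T - 1)                  # sample Cov(Y_t, Y_{t-1})
Gamma2 = Yc[2:].T @ Yc[:-2] / (T - 2)                  # sample Cov(Y_t, Y_{t-2})
Gamma1_theory = 0.5 * (B_minus + B_plus)               # (B- + B+) Sigma_eps / 2
```

With \(T=200{,}000\) both sample matrices agree with their theoretical counterparts to roughly two decimal places; the discrepancy shrinks at the usual \(T^{-1/2}\) rate.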

Following the proofs in the “Appendix”, the results in the Proposition can be extended to asVMA processes with \(q>1\). The same identification problem then arises, i.e., it is impossible to distinguish between an asVMA process with \({\mathbf {B}}^{-}_{v}=-{\mathbf {B}}^{+}_{v}\) and an m-dimensional Gaussian white noise process based on the unconditional lag-\(\ell \) cross-covariance matrix alone. In Sect. 3.2, we address this problem by introducing a Wald-type test statistic.

3 Estimation and testing

For ease of notation, and without loss of generality, we assume throughout this section that the asVMA model order is fixed at q. In Sect. 5 we present estimation results with some lags excluded from the fitted model specification.

3.1 Likelihood

Assume that \(\{{\mathbf {Y}}_{t}\}^{T}_{t=1}\) is generated by the m-dimensional asVMA\((q,q)\) model in (1) with a constant vector term \(\varvec{\mu }=(\mu _{1},\ldots ,\mu _{m})^{\prime }\) included on the right-hand side. For the conditional likelihood approach, we also assume that the initial observations \({\mathbf {Y}}_{0},{\mathbf {Y}}_{-1},\ldots ,{\mathbf {Y}}_{1-T}\) are fixed, and \(\varvec{\varepsilon }_{0}=\varvec{\varepsilon }_{-1}=\cdots =\varvec{\varepsilon }_{1-T}={\mathbf {0}}_{m}\). This assumption does not affect the asymptotic properties of the test statistic developed in Sect. 3.2. We define the \(T\times m\) matrices \({\mathbf {Y}}=\big (({\mathbf {Y}}_{1}-\varvec{\mu }),\ldots , ({\mathbf {Y}}_{T}-\varvec{\mu })\big )^{\prime }\), and \(\varvec{\varepsilon }=(\varvec{\varepsilon }_{1},\ldots , \varvec{\varepsilon }_{T})^{\prime }\). In addition, using the vectorizing operation “vec” which forms a vector from a matrix by stacking the columns of the matrix one underneath the other, we define the \(mT\times 1\) vectors

$$\begin{aligned} {\mathbf {y}} = \text{ vec }(\mathbf {Y'}),\quad {\mathbf {e}}=\text{ vec }(\varvec{\varepsilon }^{\prime }) \end{aligned}$$

as well as the \(m^{2}\times 1\) parameter vector \(\varvec{\beta }_{v}^{(i)}=\text{ vec }({\mathbf {B}}^{(i)}_{v})\) (\(i=1,2;v=1,\ldots ,q),\) introducing the notation \({\mathbf {B}}_{v}^{(1)}\equiv {\mathbf {B}}_{v}^{-}\) and \({\mathbf {B}}_{v}^{(2)}\equiv {\mathbf {B}}_{v}^{+}\). Next, we introduce the \(T\times T\) lag matrix \({\mathbf {L}}\) with ones on the (sub)diagonal directly below the principal diagonal and zeros elsewhere, with \(({\mathbf {L}}^{v}\otimes {\mathbf {I}}_{m}){\mathbf {e}} = ({\mathbf {0}}_{m}^{\prime },\ldots ,{\mathbf {0}}_{m}^{\prime }, \varvec{\varepsilon }^{\prime }_{1},\ldots , \varvec{\varepsilon }_{T-v}^{\prime })^{\prime }\) \((v=1,\ldots ,q)\). Let \({\mathbf {D}}_{v}=({\mathbf {B}}_{v}^{(1)}- {\mathbf {B}}_{v}^{(2)})\). Then the model in (1) can be written as

$$\begin{aligned} {\mathbf {y}}&={\mathbf {e}}+ \sum ^{q}_{v=1}({\mathbf {L}}^{v}\otimes {\mathbf {B}}^{(2)}_{v}){\mathbf {e}} +\sum ^{q}_{v=1}({\mathbf {L}}^{v}\otimes {\mathbf {D}}_{v}){\mathbf {Q}} {\mathbf {e}}\nonumber \\&= \big (\varvec{\Theta }^{(1)} + \varvec{\Theta }^{(2)}\big ){\mathbf {e}}, \end{aligned}$$
(4)

where \(\varvec{\Theta }^{(1)}=({\mathbf {I}}_{T}\otimes {\mathbf {I}}_{m})+\sum ^{q}_{v=1}({\mathbf {L}}^{v}\otimes {\mathbf {B}}^{(2)}_{v})\) and \(\varvec{\Theta }^{(2)}= \sum ^{q}_{v=1}({\mathbf {L}}^{v}\otimes {\mathbf {D}}_{v}){\mathbf {Q}}\) are two \(mT\times mT\) matrices, and \({\mathbf {Q}} = \text{ diag }\{I(\varepsilon _{11}< 0),\ldots ,I(\varepsilon _{m1}< 0),\ldots ,I(\varepsilon _{1T}< 0),\ldots ,I(\varepsilon _{mT}< 0)\}\) is an \(mT\times mT\) diagonal matrix, so that \({\mathbf {Q}}{\mathbf {e}}\) collects the negative parts of the innovations.
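The stacked representation can be verified numerically against the recursion in (1) on a small example. In this Python sketch we realize \(\varvec{\Theta }^{(2)}\) as \(\sum _{v}({\mathbf {L}}^{v}\otimes {\mathbf {D}}_{v}){\mathbf {Q}}\), with \({\mathbf {Q}}\) the \(mT\times mT\) diagonal matrix of sign indicators (our reading of the time-major stacking; the numerical values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
T, m = 6, 2
Bp = np.array([[0.5, 0.1], [0.0, -0.4]])          # B^(2) = B^+
Bm = np.array([[-0.2, 0.0], [0.3, 0.6]])          # B^(1) = B^-

eps = rng.standard_normal((T, m))
e = eps.reshape(-1)                               # e = vec(eps'), time-major

L = np.diag(np.ones(T - 1), -1)                   # T x T lag matrix
Q = np.diag((eps < 0).astype(float).reshape(-1))  # mT x mT sign indicators
Theta = (np.eye(m * T) + np.kron(L, Bp)           # Theta^(1)
         + np.kron(L, Bm - Bp) @ Q)               # Theta^(2), D_1 = B^(1) - B^(2)
y = Theta @ e

# direct recursion from Eq. (1), with the pre-sample innovation set to zero
Y = eps.copy()
Y[1:] += np.maximum(eps[:-1], 0) @ Bp.T + np.minimum(eps[:-1], 0) @ Bm.T
assert np.allclose(y, Y.reshape(-1))              # the two representations coincide
```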

Define the \((2qm^{2}+m)\times 1\) parameter vector

$$\begin{aligned} \varvec{\theta }=\big ( \varvec{\mu }^{\prime },(\varvec{\beta }^{(1)}_{1})^{\prime },\ldots , (\varvec{\beta }^{(1)}_{q})^{\prime }, (\varvec{\beta }^{(2)}_{1})^{\prime }, \ldots ,(\varvec{\beta }^{(2)}_{q})^{\prime }\big )^{\prime }. \end{aligned}$$
(5)

Then on the assumption of normality of the \(\varvec{\varepsilon }_{t}\) and since \({\mathbf {e}}{\mathop {\sim }\limits ^{\tiny {\text{ i.i.d. }}}} \mathcal {N}({\mathbf {0}}_{mT},{\mathbf {I}}_{T}\otimes \varvec{\Sigma }_{\varepsilon })\), the log-likelihood function, apart from an additive constant term, is given by

$$\begin{aligned} \ell (\varvec{\theta }) = \sum ^{T}_{t=1} \ell _{t}(\varvec{\theta })&= -\frac{T}{2}\log |\varvec{\Sigma }_{\varepsilon }| -\frac{1}{2} \sum ^{T}_{t=1}\varvec{\varepsilon }^{\prime }_{t} \varvec{\Sigma }^{-1}_{\varepsilon } \varvec{\varepsilon }_{t} \nonumber \\&= -\frac{T}{2}\log |\varvec{\Sigma }_{\varepsilon }| -\frac{1}{2}{\mathbf {e}}^{\prime }( {\mathbf {I}}_{T}\otimes \varvec{\Sigma }_{\varepsilon }^{-1} ){\mathbf {e}}, \end{aligned}$$
(6)

where \({\mathbf {e}}=\varvec{\Theta }^{-1}{\mathbf {y}}\) with \(\varvec{\Theta }=\varvec{\Theta }^{(1)}+\varvec{\Theta }^{(2)}\), assuming the inverse exists. For a fixed parameter vector \(\varvec{\theta }\), it is clear that maximization of (6) with respect to \(\varvec{\Sigma }_{\varepsilon }\) yields \(\widehat{\varvec{\Sigma }}_{\varepsilon }=\sum ^{T}_{t=1} \varvec{\varepsilon }_{t}\varvec{\varepsilon }^{\prime }_{t}/T\).
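For a given parameter vector, the conditional log-likelihood (6) can be evaluated by recursively reconstructing the innovations and concentrating out \(\varvec{\Sigma }_{\varepsilon }\). A minimal Python sketch (the function name is ours; pre-sample innovations are set to zero, as assumed above):

```python
import numpy as np

def asvma_concentrated_loglik(Y, mu, B_plus, B_minus):
    """Concentrated conditional log-likelihood of an asVMA(q, q) model.

    Residuals are built by inverting Eq. (1) recursively with zero
    pre-sample innovations; Sigma_eps is then replaced by its MLE E'E/T,
    so that -(1/2) tr(Sigma^-1 E'E) reduces to the constant -T*m/2.
    """
    T, m = Y.shape
    q = len(B_plus)
    eps = np.zeros((T + q, m))                   # eps[:q] hold the zero pre-sample values
    for t in range(T):
        acc = Y[t] - mu
        for v in range(1, q + 1):
            e = eps[q + t - v]
            acc -= B_plus[v - 1] @ np.maximum(e, 0) + B_minus[v - 1] @ np.minimum(e, 0)
        eps[q + t] = acc
    E = eps[q:]
    _, logdet = np.linalg.slogdet(E.T @ E / T)   # log|Sigma_hat|
    return -0.5 * T * logdet - 0.5 * T * m
```

On data simulated from an invertible asVMA(1, 1) model, the concentrated log-likelihood at the true parameters exceeds its value at, say, the zero-coefficient (white noise) specification, as expected.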

For a vector autoregressive moving average model with Gaussian distributed errors, Reinsel et al. (1992, Sect. 2) derived an expression for the vector of partial derivatives (also called gradient or score vector) of the log-likelihood function with respect to the parameter vector. Their result carries over to an asVMA\((q,q)\) process. In particular, the expression for the partial derivatives of the log-likelihood function \(\ell (\varvec{\theta })\) is given by

$$\begin{aligned} \frac{\partial \ell (\varvec{\theta })}{\partial \varvec{\theta }}&= -\frac{\partial {\mathbf {e}}^{\prime }}{\partial \varvec{\theta }}({\mathbf {I}}_{T}\otimes \varvec{\Sigma }^{-1}_{\varepsilon }){\mathbf {e}} \nonumber \\&= {\mathbf {Z}}^{\prime } (\varvec{\Theta }^{\prime })^{-1}({\mathbf {I}}_{T}\otimes \varvec{\Sigma }^{-1}_{\varepsilon }){\mathbf {e}}, \end{aligned}$$
(7)

where \({\mathbf {Z}}=[({\mathbf {1}}_{T}\otimes {\mathbf {I}}_{m}),({\mathbf {L}}\varvec{\varepsilon }^{-}\otimes {\mathbf {I}}_{m}),\ldots ,({\mathbf {L}}^{q}\varvec{\varepsilon }^{-}\otimes {\mathbf {I}}_{m}),({\mathbf {L}}\varvec{\varepsilon }^{+}\otimes {\mathbf {I}}_{m}),\ldots ,({\mathbf {L}}^{q}\varvec{\varepsilon }^{+}\otimes {\mathbf {I}}_{m})]\) is an \(mT\times ( 2qm^{2}+m)\) matrix, with \(\varvec{\varepsilon }^{\pm }=(\varvec{\varepsilon }^{\pm }_{1},\ldots ,\varvec{\varepsilon }^{\pm }_{T})^{\prime }\). In practice, Eq. (7) needs to be solved by an iterative numerical optimization procedure. Related to this, it is often useful to have a convenient expression for the \((2qm^{2}+m)\times (2qm^{2}+m)\) Hessian matrix \(\mathcal {H}(\varvec{\theta })\) of second partial derivatives of \(\ell (\varvec{\theta })\). When \(T\rightarrow \infty \), it is well known that \(\mathcal {H}(\varvec{\theta })\) can be approximated by the outer product of the per-observation score vectors, which is equivalent to

$$\begin{aligned} \mathcal {H}(\varvec{\theta }) \approx \sum ^{T}_{t=1}\Big (\frac{\partial \ell _{t}(\varvec{\theta }) }{\partial \varvec{\theta }}\Big ) \Big (\frac{\partial \ell _{t}(\varvec{\theta }) }{\partial \varvec{\theta }}\Big )^{\prime }= {\mathbf {Z}}^{\prime }(\varvec{\Theta }^{\prime })^{-1}({\mathbf {I}}_{T}\otimes \varvec{\Sigma }^{-1}_{\varepsilon }) \varvec{\Theta }^{-1}{\mathbf {Z}}. \end{aligned}$$
(8)

In Sect. 3.2, we use the inverse of \(\mathcal {H}(\varvec{\theta })\) as an approximation of the variance-covariance matrix of the vector of parameter estimates.

3.2 Wald-type test statistic

Let \(\varvec{\beta }=\big ((\varvec{\beta }^{(1)})^{\prime },(\varvec{\beta }^{(2)})^{\prime }\big )^{\prime }\) denote the \(2qm^{2}\times 1\) vector of parameters of an m-dimensional asVMA\((q,q)\) process, excluding the constant term \(\varvec{\mu }\). It is apparent from the notation introduced in Sect. 3.1 that the problem of testing the null hypothesis of symmetry is equivalent to testing the restriction \(\varvec{\beta }^{(1)} = \varvec{\beta }^{(2)}\). A convenient test statistic can be obtained as follows. Let \({\mathbf {R}}\) denote a known restriction matrix of dimension \(qm^{2} \times (2qm^{2}+m)\) such that \({\mathbf {R}}\varvec{\theta }={\mathbf {r}}\) with \({\mathbf {r}}\) a \(qm^{2}\)-vector of restricted parameters. Next, from the partition \({\mathbf {R}}=({\mathbf {R}}_{1}:{\mathbf {R}}_{2})\), where \({\mathbf {R}}_{1}\) is a \(qm^{2}\times m\) matrix of zeros and \({\mathbf {R}}_{2}\) is a \(qm^{2}\times 2qm^{2}\) matrix, the problem becomes one of testing the null hypothesis

$$\begin{aligned} {\mathbb {H}}_{0}:{\mathbf {R}}_{2}\varvec{\beta }={\mathbf {0}}_{qm^{2}}\,\, \text{ versus }\,\,{\mathbb {H}}_{1}:{\mathbf {R}}_{2}\varvec{\beta }\ne {\mathbf {0}}_{qm^{2}}. \end{aligned}$$
(9)

Let \(\widehat{\varvec{\theta }}\) be the vector of parameter estimates of \(\varvec{\theta }\) under \({\mathbb {H}}_{0}\), and \(\mathcal {H}^{-1}(\widehat{\varvec{\theta }})\) the estimate of the corresponding covariance matrix evaluated under the null. Then the Wald-type (W) test statistic is given by

$$\begin{aligned} \text{ W}_{T} = \widehat{\varvec{\beta }}^{\prime }{\mathbf {R}}_{2}^{\prime } [{\mathbf {R}}\mathcal {H}^{-1}(\widehat{\varvec{\theta }}) {\mathbf {R}}^{\prime }]^{-1} {\mathbf {R}}_{2}\widehat{\varvec{\beta }}, \end{aligned}$$
(10)

with \(\widehat{\varvec{\beta }}\) denoting the unrestricted estimator of \(\varvec{\beta }\). Under \({\mathbb {H}}_{0}\), and as \(T\rightarrow \infty \), \(\text{ W}_{T}\overset{D}{\longrightarrow } \chi _{qm^{2}}^{2}\). Any consistent estimate of \(\mathcal {H}(\varvec{\theta })\) will lead to a different variant of the Wald-type test statistic.
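Given the unrestricted estimates and an estimate of their covariance matrix, the statistic (10) is a few lines of code. A Python sketch using the parameter ordering of (5), in which the restriction matrix picks out \(\varvec{\beta }^{(1)}-\varvec{\beta }^{(2)}\):

```python
import numpy as np

def wald_stat(theta_hat, cov_hat, m, q):
    """Wald-type statistic (10) for H0: beta^(1) = beta^(2).

    theta_hat: (2*q*m**2 + m,) unrestricted estimates ordered as in (5),
               i.e. (mu, beta^(1)_1..q, beta^(2)_1..q).
    cov_hat:   estimated covariance matrix of theta_hat, e.g. the inverse
               Hessian approximation (8).
    """
    k = q * m * m
    # R theta = beta^(1) - beta^(2); R_1 is the k x m block of zeros
    R = np.hstack([np.zeros((k, m)), np.eye(k), -np.eye(k)])
    diff = R @ theta_hat
    return diff @ np.linalg.solve(R @ cov_hat @ R.T, diff)
```

Under \({\mathbb {H}}_{0}\) the returned value is compared with a \(\chi ^{2}_{qm^{2}}\) critical value; e.g., for \(m=2\) and \(q=1\) the relevant reference distribution is \(\chi ^{2}_{4}\).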

Remark 1

The asymptotic distribution of the restricted estimator \(\widehat{\varvec{\theta }}\) can be established using a central limit theorem for martingale difference sequences (see, e.g., White 2001, Ch. 5), and is given by

$$\begin{aligned} T^{1/2}(\widehat{\varvec{\theta }} - \varvec{\theta })\overset{D}{\longrightarrow } {{\mathcal {N}}}_{2qm^{2}+m}\big ({\mathbf {0}},({\mathbf {R}}^{\prime } {\mathbf {V}}{\mathbf {R}})^{-1}\big ), \end{aligned}$$
(11)

where \({\mathbf {V}}= \lim _{T\rightarrow \infty }T^{-1}{\mathbb {E}}\big (-\partial ^{2}\ell (\varvec{\theta })/ \partial {\mathbf {r}}\partial {\mathbf {r}}^{\prime }\big )\). Given (11), the asymptotic distribution of the Wald-type test statistic (10) follows from White (2001, Thm. 4.31(ii)).

Remark 2

Let \(\{q^{\pm }_{rs}\}^{m}_{r,s=1}\) denote the sets of lag orders corresponding to the positive (negative) innovations of an m-dimensional asVMA model. Then, testing for symmetry implies \(q^{-}_{rs}=q^{+}_{rs}\equiv q_{rs}\). In this case \(\text{ W}_{T}\) is asymptotically \(\chi ^{2}\) distributed with \(\sum ^{m}_{r=1}\sum ^{m}_{s=1}q_{rs}\) degrees of freedom.

3.3 Test performance of \(W_{T}\)

To study the empirical size of \(\text{ W}_{T}\), we consider a two-dimensional asVMA(1, 1) model (DGP-1) with \({\mathbf {B}}^{\pm }=\left( {\begin{matrix}\delta &{} -\delta \\ -\delta &{}\delta \end{matrix}}\right) \), where \(\delta =0,\, \pm 0.2,\, \pm 0.4\), and \(\{\varvec{\varepsilon }_{t}\}{\mathop {\sim }\limits ^{\tiny {\text{ i.i.d. }}}} {{\mathcal {N}}}({\mathbf {0}}_{2},{\mathbf {I}}_{2})\). To avoid the effect of any starting-up transient on the generated process, the first 500 observations were discarded from each series. Based on 1000 Monte Carlo replications, Table 1 shows the observed sizes of the Wald-type test statistic at nominal size 0.05 for \(T=250, 500,\, 1000\) and 2000. The table reveals that \(\hbox {W}_{{T}}\) is generally well sized. It is also apparent that the observed sizes are about the same for positive and negative values of \(\delta \).

In addition, we also studied the size of the Wald-type test statistic for a three-dimensional asVMA(1,1) model with \(\{\varvec{\varepsilon }_{t}\}{\mathop {\sim }\limits ^{\tiny {\text{ i.i.d. }}}} {{\mathcal {N}}}({\mathbf {0}}_{3},{\mathbf {I}}_{3})\) with the parameter matrix given by DGP-4 below. Under \({\mathbb {H}}_{0}:{\mathbf {B}}^{+}={\mathbf {B}}^{-}\), the size (5% nominal level) of the test statistic is 0.091 \((T=500)\), 0.051 \((T=1000),\) and 0.046 \((T=2000\)) for \(\delta =0\). For \(|\delta |\ge 0.2\), however, we noticed serious size distortions. The size improves slightly for \(T=2000\). Nevertheless, caution is needed when interpreting the results of the Wald-type test when the sample size is relatively small.

Table 1 Size of the Wald-type test statistic \(\hbox {W}_{{T}}\) at nominal size 0.05 for DGP-1 \((m=2)\). Number of simulations is 1000

To study the empirical power of \(\text{ W}_{T}\), we employ three m-dimensional asVMA(1, 1) (\(m=2,\,3\)) DGPs with the following parameter configurations,

$$\begin{aligned} \text{ DGP-2: }&\quad {\mathbf {B}}^{-} = \left( \begin{array}{cccc} 0.5 &{}\quad 0.2 \\ 0.4 &{}\quad 0.3 \end{array} \right) ,\,\, {\mathbf {B}}^{+} = \left( \begin{array}{cccc} 0.7 &{}\quad -0.3 \\ -0.2 &{}\quad -0.5 \end{array} \right) ,\,\, \\ \text{ DGP-3: }&\quad {\mathbf {B}}^{-} = \left( \begin{array}{cccc} 0.7 &{}\quad 0 \\ 0.3 &{}\quad 0.5 \end{array} \right) ,\,\, {\mathbf {B}}^{+} = -{\mathbf {B}}^{-},\\ \text{ DGP-4 }:&\quad {\mathbf {B}}^{-} = \left( \begin{array}{cccc} \delta &{}\quad -\delta &{}\quad 0\\ -\delta &{}\quad \delta &{}\quad -\delta \\ 0 &{}\quad -\delta &{}\quad \delta \end{array} \right) , \,\, {\mathbf {B}}^{+}=-{\mathbf {B}}^{-}, \quad (\delta =0.2,\,0.4,\,0.6,\,0.8), \end{aligned}$$

where in all cases \(\{\varvec{\varepsilon }_{t}\}{\mathop {\sim }\limits ^{\tiny {\text{ i.i.d. }}}} {{\mathcal {N}}}({\mathbf {0}}_{m},{\mathbf {I}}_{m})\). Note that none of the DGPs includes a constant vector term \(\varvec{\mu }\).

Table 2 shows the empirical rejection frequencies of the \(\hbox {W}_{{T}}\) test statistic at a 5% nominal significance level. The power of the test statistic is reasonable to good, irrespective of the dimension m and the sample size T. This also applies to the special case \({\mathbf {B}}^{+}=-{\mathbf {B}}^{-}\) (DGP-3 and DGP-4) which we discussed in Sect. 2.3.

Table 2 Empirical probability of rejecting symmetry by the \(\text{ W}_{T}\) test statistic when the DGP is given by the three m-dimensional asVMA(1, 1) models \((m=2,\,3)\) specified in Sect. 3.3

Table 3 shows parameter estimates and their corresponding standard deviations for DGP-2 and DGP-3. In all cases the performance of the estimation method is good, i.e., both the bias and the standard deviation of \(\widehat{\varvec{\theta }}\) decrease as T increases. This empirical result is in agreement with the asymptotic result in (11).

Table 3 Parameter estimates and their corresponding standard deviations (in parentheses) for DGP-2 and DGP-3

4 Two exploratory test statistics

4.1 asMA model

To investigate the possible asymmetry of an asMA model (\(m=1\)), Welsh and Jernigan (1983) introduced the lag \(\ell \) bicovariance function, defined as \({\widetilde{\gamma }}(\ell )=\text{ Cov }(Y^{2}_{t-\ell },Y_{t})\). They proposed testing the null hypothesis \({\mathbb {H}}_{0}:{\widetilde{\gamma }}(\ell )=0\). For \({\mathbb {E}}(Y_{t})=0\), an unbiased and consistent estimator of \({\widetilde{\gamma }}(\ell )\) is given by \({\widehat{\gamma }}(\ell )=(T-\ell )^{-1}\sum ^{T}_{t=\ell +1}Y^{2}_{t-\ell }Y_{t}\). The authors showed that under the null hypothesis the standardized bicovariance

$$\begin{aligned} Q(\ell )=\sum ^{T}_{t=\ell +1}Y^{2}_{t-\ell }Y_{t}/\sqrt{3(T-\ell )} \end{aligned}$$
(12)

has an asymptotic \(\mathcal {N}(0,1)\) distribution. For an asMA(1) process, \(Q(\ell )\) has good power, as demonstrated by Brännäs et al. (1998).
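The statistic (12) is straightforward to compute. A Python sketch (the series is assumed mean-corrected, consistent with \({\mathbb {E}}(Y_{t})=0\) above; the function name is ours):

```python
import numpy as np

def Q_stat(Y, ell):
    """Standardized lag-ell sample bicovariance, Eq. (12); under the null
    it is asymptotically N(0, 1) for mean-zero, unit-variance data."""
    Y = np.asarray(Y, dtype=float)
    T = len(Y)
    # sum_{t=ell+1}^{T} Y_{t-ell}^2 * Y_t, then standardize by sqrt(3(T-ell))
    return np.sum(Y[:T - ell] ** 2 * Y[ell:]) / np.sqrt(3.0 * (T - ell))
```

For example, \(Q(1)\) computed on the series \((1,2,3)\) equals \((1^{2}\cdot 2+2^{2}\cdot 3)/\sqrt{3\cdot 2}=14/\sqrt{6}\).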

4.2 asVMA model

A natural extension of \({\widetilde{\gamma }}(\ell )\) is the lag \(\ell \) cross-bicovariance function, defined as \({\widetilde{\gamma }}_{ij}(\ell )={\mathbb {E}}[Y^{2}_{i,t-\ell }Y_{j,t}]\) \((i,j=1,\ldots ,m; \ell \in {\mathbb {N}}^{+})\). However, unlike the correlation matrix used for the identification of linear vector time series processes, the resulting cross-bicovariance matrix is not symmetric with respect to the principal diagonal since \({\widetilde{\gamma }}_{ij}(\ell )\ne {\widetilde{\gamma }}_{ji}(\ell )\). To obtain a symmetric matrix, we first define the following matrices

$$\begin{aligned} \varvec{\Gamma }^{(1)}(\ell ) =[{\gamma }^{(1)}_{uv}]^{m}_{u,v=1}= \left\{ \begin{array}{lcc} {\widetilde{\gamma }}_{uv}(\ell ), &{} u\le v,\\ {\widetilde{\gamma }}_{vu}(\ell ), &{} u>v, \end{array} \right. \quad \varvec{\Gamma }^{(2)}(\ell ) =[{\gamma }^{(2)}_{uv}]^{m}_{u,v=1}= \left\{ \begin{array}{lcc} {\widetilde{\gamma }}_{vu}(\ell ), &{} u\le v,\\ {\widetilde{\gamma }}_{uv}(\ell ), &{} u>v. \end{array} \right. \end{aligned}$$
(13)

Then, a symmetrized lag-\(\ell \) cross-bicovariance matrix is given by

$$\begin{aligned} \widetilde{\varvec{\Gamma }}(\ell )=\frac{1}{2}\big (\varvec{\Gamma }^{(1)}(\ell ) + \varvec{\Gamma }^{(2)}(\ell )\big ). \end{aligned}$$
(14)

Note that the principal diagonal of \(\widetilde{\varvec{\Gamma }}(\ell )\) is composed of the elements \({\widetilde{\gamma }}_{uu}(\ell )\) \((u=1,\ldots ,m)\), while the off-diagonal elements are given by \(\big ({\widetilde{\gamma }}_{uv}(\ell )+{\widetilde{\gamma }}_{vu}(\ell )\big )/2\) \((u\ne v;\, u,v=1,\ldots ,m)\).

Using (14), the null- and alternative hypotheses of interest are

$$\begin{aligned} {\mathbb {H}}_{0}:\widetilde{\varvec{\Gamma }}(\ell )=[{\widetilde{\gamma }}_{ij} (\ell )]^{m}_{i,j=1}={\mathbf {0}}_{m\times m}\, \text{ versus }\,\, {\mathbb {H}}_{1}:\widetilde{\varvec{\Gamma }}(\ell ) \ne {\mathbf {0}}_{m\times m}, \quad (\ell =1,\ldots ,q), \end{aligned}$$
(15)

where \(q\in {\mathbb {Z}}^{+}\) is a prescribed constant integer. Let \(\widehat{\varvec{\Gamma }}_{m}(\ell )=[Z_{ij}(\ell )]^{m}_{i,j=1}\) denote an estimator of \(\widetilde{\varvec{\Gamma }}(\ell )\), where \(Z_{ij}(\ell )=\sum ^{T}_{t=\ell +1}Y^{2}_{i,t-\ell }Y_{j,t}/\sqrt{3(T-\ell )}\). Then, an exploratory test statistic for testing \({\mathbb {H}}_{0}\) can be based on the squared Frobenius norm. That is, the test statistic is given by

$$\begin{aligned} Q_{m}(\ell )&= \text{ Tr }\big ( {\widehat{\varvec{\Gamma }}}_{m}^{\prime }(\ell ) {\widehat{\varvec{\Gamma }}}_{m}(\ell ) \big ), \nonumber \\&=\Big ( \sum ^{m}_{i=1}Z^{2}_{ii}(\ell )+\frac{1}{2}\sum ^{m}_{i\ne j}\big (Z_{ij}(\ell )+Z_{ji}(\ell )\big )^{2}\Big ), \,\,\, (\ell =1,\ldots ,q). \end{aligned}$$
(16)

Large values of \(Q_{m}(\ell )\) indicate that \({\mathbb {H}}_{0}\) should be rejected.

Given (16), it is easily verified that under \({\mathbb {H}}_{0}\) the asymptotic distribution of \(Q_{m}(\ell )\) is characterized by the sum of two uncorrelated random variables, i.e., \(Q_{m}(\ell )=(X_{1}+X_{2})\). Here, the distribution of \(X_{1}\) is \(\chi ^{2}(1)\) multiplied by a constant \((m+1)/2\). Then, the random variable \(((m+1)/2)\chi ^{2}(1)\) has a gamma distribution \(\Gamma (k,\theta )\) with shape parameter \(k=1/2\) and scale parameter \(\theta =(m+1)\). The distribution of \(X_{2}\) is \((m-1)/2\) times the product of two independent \(\mathcal {N}(0,1)\) random variables, say Z, where the exact probability density function of Z is given by the well-known result \(f(z)=(1/\pi )K_{0}(|z|)\) with \(K_{0}(\cdot )\) the modified Bessel function of the second kind of order zero. Clearly, for \(m=1\) the null distribution of \(Q_{1}(\ell )\) is \(\chi ^{2}(1)\). For \(m\ge 2\), however, the null distribution of \(Q_{m}(\ell )\) is intractable. In that case, we use a stationary bootstrap scheme with automatic block-length selection; see Politis and White (2004) and Patton et al. (2007).
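For reference, a Python sketch of \(Q_{m}(\ell )\) computed via (13)–(16), with the off-diagonal sum in (16) read over unordered pairs \(i<j\) (our reading; variable names are ours):

```python
import numpy as np

def Qm_stat(Y, ell):
    """Statistic (16): squared Frobenius norm of the symmetrized lag-ell
    sample cross-bicovariance matrix (14). Y is a (T x m) array."""
    Y = np.asarray(Y, dtype=float)
    T, m = Y.shape
    # Z_ij(ell) = sum_t Y_{i,t-ell}^2 * Y_{j,t} / sqrt(3(T - ell))
    Z = (Y[:T - ell] ** 2).T @ Y[ell:] / np.sqrt(3.0 * (T - ell))
    Gamma = 0.5 * (Z + Z.T)          # symmetrization, Eq. (14)
    return np.trace(Gamma.T @ Gamma)
```

Since the null distribution is intractable for \(m\ge 2\), the observed value would be compared with stationary-bootstrap quantiles, as described above.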

4.3 Test performance of \(Q_{m}(\ell )\)

To study the empirical power of \(Q_{m}(\ell )\), we employ two m-dimensional asVMA(1, 1) (\(m=2,3\)) DGPs with the following parameter configurations

$$\begin{aligned} \text{ DGP-5 }:&\quad {\mathbf {B}}^{-}\! =-{\mathbf {B}}^{+}= \begin{array}{lcc} {\begin{pmatrix}\delta \,\,&{} -\delta \\ -\delta \,\, &{} \delta \end{pmatrix}},\quad \,\,\, \varvec{\Sigma }_{1}= {\begin{pmatrix}1 &{} 0.2\\ 0.2 &{} 1 \end{pmatrix}},\,\, \end{array} \begin{array}{lcc} \varvec{\Sigma }_{2}= {\begin{pmatrix}1 &{} 0.3\\ 0.3 &{} 1 \end{pmatrix}},\,\, \end{array}\\ \text{ DGP-6 }:&\quad {\mathbf {B}}^{-} =-{\mathbf {B}}^{+}= \begin{array}{lcc} {\begin{pmatrix}\delta \,&{} -\delta \, &{} 0\\ -\delta &{} \delta &{} -\delta \\ 0\, &{} -\delta \, &{} \delta \end{pmatrix}},\,\, \varvec{\Sigma }_{1} = {\begin{pmatrix}1 &{} 0.2 &{} 0\\ 0.2 &{} 1 &{} 0.2\\ 0 &{} 0.2 &{} 1\end{pmatrix}},\,\, \varvec{\Sigma }_{2} = {\begin{pmatrix}1 &{} 0.3 &{} 0\\ 0.3 &{} 1 &{} 0.3\\ 0 &{} 0.3 &{} 1\end{pmatrix}},\,\, \end{array} \end{aligned}$$

where \(\delta =0.1,\dots , 0.5\), and \(\{\varvec{\varepsilon }_{t}\}{\mathop {\sim }\limits ^{\tiny {\text{ i.i.d. }}}} {{\mathcal {N}}}({\mathbf {0}}_{m},{\mathbf {I}}_{m})\).
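Under the sign decomposition \(\varepsilon ^{+}_{t}=\max (\varepsilon _{t},0)\) and \(\varepsilon ^{-}_{t}=\min (\varepsilon _{t},0)\) used in Wecker-type asMA models, a path from a DGP such as DGP-5 can be simulated as follows. This is a hedged sketch: the intercept is set to zero, innovations are drawn as \({\mathcal {N}}({\mathbf {0}}_{m},{\mathbf {I}}_{m})\) per the specification above, and the roles of \(\varvec{\Sigma }_{1}\) and \(\varvec{\Sigma }_{2}\) in the Monte Carlo design are not reproduced.

```python
import numpy as np

def simulate_asvma11(B_plus, B_minus, T, seed=0):
    """Simulate an m-dimensional asVMA(1,1) path:
    Y_t = eps_t + B+ eps+_{t-1} + B- eps-_{t-1},
    with eps+ = max(eps, 0) and eps- = min(eps, 0) elementwise
    (sign convention assumed, following the univariate asMA model)."""
    m = B_plus.shape[0]
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((T + 1, m))       # i.i.d. N(0, I_m) innovations
    eps_pos = np.maximum(eps, 0.0)
    eps_neg = np.minimum(eps, 0.0)
    return eps[1:] + eps_pos[:-1] @ B_plus.T + eps_neg[:-1] @ B_minus.T

delta = 0.3
B_minus = np.array([[delta, -delta],
                    [-delta, delta]])           # DGP-5: B- = -B+
B_plus = -B_minus
y = simulate_asvma11(B_plus, B_minus, T=200)    # array of shape (200, 2)
```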

For a 5% nominal significance level, Table 4 shows the empirical power of \(Q_{m}(\ell )\) at time-lag \(\ell =1\) with respect to three different sample sizes T and five parameter values \(\delta \). It is seen that \(Q_{m}(1)\) has good power for all dimensions m. For fixed \(\delta \), power improves as the sample size T increases. Note that for \(\delta =0.1\) both DGPs are close to a vector Gaussian white noise process; even so, the rejection rates are satisfactory. As expected, in all cases the empirical size (not shown here) is close to the nominal size.

Table 4 Empirical power of \(Q_{m}(1)\) when the vector time process follows two m-dimensional asVMA(1, 1) models \((m=2,3)\) for sample sizes \(T=50, 100, 200\)

5 Illustrative applications

No general strategy has been developed for selecting the most appropriate asVMA model in practice. We found it convenient to proceed as follows. First, based on the values of the test statistic \(Q(\ell )\), we obtain the best-fitting univariate asMA model for each time series separately. Next, we use these specifications as initial polynomial estimates for the diagonal elements of the associated asVMA model, with low-order polynomial specifications for the off-diagonal elements of the vector model. This provides an initial guess of the structure of the asVMA model, which can be fine-tuned using a suitable model order selection criterion (e.g., \(\text{ AIC }(q)=\ln |\widehat{\varvec{\Sigma }}_{\varepsilon }|+2qm^{2}/T\)) and by diagnostic checking of the residuals. In addition, the pattern of the lag-\(\ell \) cross-bicovariance matrix \(\widehat{\varvec{\Gamma }}_{m}(\ell )\) and the corresponding test statistic \(Q_{m}(\ell )\) may suggest directions for model improvement. In the interest of parsimony, however, we recommend keeping the initial asVMA model simple. Deleting asVMA model parameters that are small compared with their standard errors may also provide a better understanding of the DGP under study.
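The order selection criterion quoted above is straightforward to compute from the estimated residual covariance matrix; a minimal sketch (the function name `asvma_aic` is ours):

```python
import numpy as np

def asvma_aic(sigma_hat, q, T):
    """AIC(q) = ln|Sigma_hat_eps| + 2*q*m^2 / T for an m-dimensional
    asVMA model with q lags, fitted to T observations."""
    m = sigma_hat.shape[0]
    _, logdet = np.linalg.slogdet(sigma_hat)  # numerically stable log-determinant
    return logdet + 2.0 * q * m * m / T
```

For example, with \(\widehat{\varvec{\Sigma }}_{\varepsilon }={\mathbf {I}}_{2}\), \(q=3\), and \(T=100\), the log-determinant term vanishes and the penalty alone gives \(2\cdot 3\cdot 4/100=0.24\).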

5.1 GDP and CPI

As a first illustration, we use quarterly US real GDP (seasonally adjusted; not inflation adjusted) and quarterly US consumer price index (CPI) total all items (seasonally adjusted), covering the period 1960(i)–2017(iv). In particular, we employ the GDP growth rate, i.e., \(Y_{1t}=\ln (\text{ GDP}_{t}/\text{GDP}_{t-1})\) and the inflation rate \(Y_{2t}=\ln (\text{ CPI}_{\,t}/\text{CPI}_{\,t-1})\) \((t=2,\ldots ,232)\). We begin our analysis by presenting results for the exploratory test statistics discussed in Sect. 4.
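The growth and inflation rates are plain log first differences of the level series; a minimal sketch:

```python
import numpy as np

def log_growth(series):
    """Log growth rate y_t = ln(x_t / x_{t-1}) of a positive level series."""
    x = np.asarray(series, dtype=float)
    return np.log(x[1:] / x[:-1])
```

Applied to a level series of length T, this yields T−1 growth observations, which matches the index range \(t=2,\ldots ,232\) above.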

Table 5 reports values of the test statistics \(Q(\ell )\) and \(Q_{2}(\ell )\) \((\ell =1,\ldots ,10)\). For series \(Y_{1t}\), \(Q(\ell )\) has no values exceeding the 95% confidence limits at lags 1–10. For series \(Y_{2t}\), however, \(Q(\ell )\) is significant at lags 1–7 and 9–10. On the other hand, the univariate version of the Wald-type test statistic rejects the null hypothesis of symmetry for both series. In Table 5 the numbers within parentheses are the p-values of the test statistic \(Q_{2}(\ell )\). We see that at lags 1–6, the null hypothesis in (15) is rejected at the 5% significance level. This strongly suggests that an asVMA model should be entertained with parameter matrices specified at the first six lags, and perhaps also at lag 10.

Table 5 Values of the \(Q(\ell )\) and \(Q_{2}(\ell )\) test statistics. For \(Q_{2}(\ell )\) the number of bootstrap replicates is set at \(B=500\), and the number of simulations is \(1\!000\)

In our further analysis, we employed AIC to search over different asVMA specifications. The “best” fitted model is a two-dimensional asVMA given by

$$\begin{aligned} \widehat{\varvec{\mu }}&= \left( \begin{array}{ccc} \underset{\small {(4.597)}}{0.007} \\ \underset{\small {(8.240)}}{0.011} \end{array} \right) ,\,\, \widehat{{\mathbf {B}}}^{-}_{1} = \left( \begin{array}{ccc} \underset{\small {(-0.981)}}{-0.151} &{} \underset{\small {(1.996)}}{0.426}\\ \underset{\small {(0.454)}}{0.063} &{} \underset{\small {(4.351)}}{0.639} \end{array} \right) ,\,\, \widehat{{\mathbf {B}}}^{-}_{2} = \left( \begin{array}{ccc} \underset{\small {(0.888)}}{0.151} &{} \underset{\small {(1.545)}}{0.295}\\ \underset{\small {(0.183)}}{0.023} &{} \underset{\small {(0.518)}}{0.047} \end{array} \right) ,\,\,\\ \widehat{{\mathbf {B}}}^{-}_{3}&= \left( \begin{array}{ccc} \underset{\small {(1.237)}}{0.228} &{} \underset{\small {(-1.337)}}{-0.245}\\ {-} &{} {-} \end{array} \right) ,\\ \widehat{{\mathbf {B}}}^{-}_{10}&= \left( \begin{array}{ccc} {-} &{} {-}\\ \underset{\small {(-0.545)}}{-0.126} &{} \underset{\small {(-101.590)}}{-0.152} \end{array} \right) ,\\ \,\, \widehat{{\mathbf {B}}}^{+}_{1}&= \left( \begin{array}{ccc} \underset{\small {(4.301)}}{0.615} &{} \underset{\small {(-2.024)}}{-0.654}\\ \underset{\small {(1.799)}}{0.316} &{} \underset{\small {(5.980)}}{1.120} \end{array} \right) ,\,\, \widehat{{\mathbf {B}}}^{+}_{2} = \left( \begin{array}{ccc} \underset{\small {(1.497)}}{0.256} &{} \underset{\small {(-3.475)}}{-0.908}\\ \underset{\small {(2.276)}}{0.310} &{} \underset{\small {(0.420)}}{0.061}\\ \end{array} \right) ,\,\, \\ \widehat{{\mathbf {B}}}^{+}_{3}&= \left( \begin{array}{ccc} \underset{\small {(-2.114)}}{-0.265} &{} \underset{\small {(0.038)}}{0.009}\\ {-} &{} {-}\\ \end{array} \right) ,\\ \widehat{{\mathbf {B}}}^{+}_{10}&= \left( \begin{array}{ccc} {-} &{} {-}\\ \underset{\small {(-0.031)}}{-0.004} &{} \underset{\small {(0.570)}}{0.081}\\ \end{array} \right) ,\,\, \widehat{\varvec{\Sigma }}_{\varepsilon } = \left( \begin{array}{ccc} 4.27 &{}\quad 0.13 \\ 0.13 &{}\quad 2.97 \\ \end{array} \right) \times 10^{-5}, 
\end{aligned}$$

with t-values in parentheses. The Wald-type test statistic is \(\widehat{\text{ W }}_{T}=142.60\) and AIC\(=-20.04\).

The Wald-type test statistic is asymptotically distributed as \(\chi ^{2}_{12}\) with a p-value of zero, and hence the \({\mathbb {H}}_{0}\) of symmetry is rejected at the 5% nominal level. We see that for \(Y_{1t}\) the parameter estimates \({\widehat{b}}^{-}_{12,1}\), \({\widehat{b}}^{+}_{11,1}\), \({\widehat{b}}^{+}_{12,1}\), \({\widehat{b}}^{+}_{12,2}\), and \({\widehat{b}}^{+}_{11,3}\) all have significant t-values. This implies that for this series there is a strong interrelationship between the innovations \(\varepsilon ^{-}_{2,t-1}\), \(\varepsilon ^{+}_{1,t-1}\), \(\varepsilon ^{+}_{2,t-1}\), \(\varepsilon ^{+}_{2,t-2}\) and \(\varepsilon ^{+}_{1,t-3}\) of both time series \(Y_{1t}\) and \(Y_{2t}\). For \(Y_{2t}\), on the other hand, the significant parameter estimates are \({\widehat{b}}^{-}_{22,1}\), \({\widehat{b}}^{-}_{22,10}\), \({\widehat{b}}^{+}_{22,1}\), and \({\widehat{b}}^{+}_{21,2}\). In this case the dynamic structure depends on the innovations \(\varepsilon ^{-}_{2,t-1}\), \(\varepsilon ^{-}_{2,t-10}\), \(\varepsilon ^{+}_{2,t-1}\), and \(\varepsilon ^{+}_{1,t-2}\). The fitted asVMA model agrees with common expectations: US inflation has a negative, and statistically significant, asymmetric effect on the growth rate of US real GDP.

Table 6 ML parameter estimates of an asMA(1) model, their t-values (in parentheses), residual variance \({\widehat{\sigma }}^{2}_{\varepsilon }\), AIC, and results of the Wald-type test statistic for the first differences of the logarithms of six univariate time series

5.2 Production price indices

As a second illustration, we reconsider the second application of Wecker (1981, Sect. 4). He fitted univariate asMA(1) models to six series of the first differences of logarithms of monthly industrial price series \((T=119)\), i.e., production price indices. Three series were obtained from the Bureau of Labor Statistics (BLS) and three from the National Bureau of Economic Research (NBER), each set covering Carbon Steel Sheet and Strip – Hot rolled, Tin plate, and Regular gasoline. The BLS series are based on “spot” and quoted prices, while the NBER series are based on contract prices. Figure 2 shows time plots of the original series, initially compiled by Stigler and Kindahl (1970); see www.nber.org/chapters/c3321.pdf.

Fig. 2
figure 2

Six industrial production series for the period 1957:01–1966:12

Stigler and Kindahl (1970) noted that there are no systematic trend differences between the indices for the period 1957–1961, while for the period 1962–1966 the BLS-based indices rose about 0.7 percent a year relative to the NBER-based indices. They argue that this phenomenon is due to an asymmetric inertia (delay) in industrial price movements. Price quotations are not revised immediately when, for instance, market conditions and transaction prices change. The asymmetry stems from the prudence of sellers, who are slow to authenticate price reductions and prompt to authenticate price increases. This behavior is best captured by spot or quoted prices, and therefore by the BLS-based indices. A buyer's behavior, by contrast, is measured by the price of a contract, and therefore by the NBER-based indices. Given these observations, the BLS-based production indices are likely to show more asymmetries in sign-based persistence of the innovations than the NBER-based indices.

Below, we first investigate the presence of asymmetries by estimating and testing univariate asMA models for each of the six production series separately. Next, we capture asymmetries for the three BLS-based (NBER-based) indices jointly as a group by fitting asVMA models.

5.2.1 asMA models

Wecker (1981) noted that the three BLS-based indices, denoted by \(Y^{\tiny {\text {BLS}}}_{it}\) \((i=1,2,3)\), showed statistically significant signs of asymmetry using a likelihood ratio (LR) test statistic of the form \(2\log ({\widehat{\sigma }}^{2}_{s}/{\widehat{\sigma }}^{2}_{\varepsilon })\), where \({\widehat{\sigma }}^{2}_{s}\) is the estimated variance of the innovations under \({\mathbb {H}}_{0}\) (symmetry) and \({\widehat{\sigma }}^{2}_{\varepsilon }\) is the estimated variance under \({\mathbb {H}}_{1}\) (asymmetry). No evidence for asymmetry was found in the three NBER-based indices, denoted by \(Y^{\tiny {\text {NBER}}}_{it}\) \((i=1,2,3)\). For more information about asymmetric effects, however, it is reasonable to fit asMA(1) models to the univariate time series first. Table 6 summarizes estimation and testing results. The ML parameter estimates are close to those reported by Wecker (1981, Table 3). Note, however, that in almost all cases the estimates of \(b^{\pm }\) are not statistically different from zero at the 5% nominal level. Moreover, with one exception, the Wald-type test statistic indicates that there is little evidence to reject the null hypothesis of symmetry.
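For completeness, the LR statistic in the form quoted above can be computed from the two estimated innovation variances. A hedged sketch: the scaling factor 2 follows the form quoted in the text, whereas the conventional Gaussian LR statistic instead scales the log variance ratio by the sample size T.

```python
import math

def lr_symmetry(var_sym, var_asym):
    """LR statistic 2 * ln(sigma2_s / sigma2_eps), where var_sym is the
    estimated innovation variance under symmetry (H0) and var_asym the
    estimated variance under asymmetry (H1). Scaling follows the text."""
    return 2.0 * math.log(var_sym / var_asym)
```

Since the symmetric model is nested in the asymmetric one, \({\widehat{\sigma }}^{2}_{s}\ge {\widehat{\sigma }}^{2}_{\varepsilon }\) and the statistic is nonnegative, with large values favoring asymmetry.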

The entries in the first six rows of Table 7 are values of the \(Q(\ell )\) \((\ell =1,\ldots ,10)\) test statistic. We see that, except for the BLS-CSSS Hot rolled series at lag 8, there is no evidence to reject the null hypothesis \({\mathbb {H}}_{0}:{\widetilde{\gamma }}(\ell )=0\) at the 5% nominal level. The null hypothesis is rejected, however, for the NBER-CSSS Hot rolled series at lags 1, 2, 6, and 7. Values of \(Q_{3}(\ell )\) and their corresponding p-values are reported in rows 7 and 8 of Table 7. For the NBER-CSSS Hot rolled series the test rejects the null hypothesis in (15) at lag 3, using a 5% nominal level. By contrast, for all remaining lags, \(Q_{3}(\ell )\) gives a strong indication not to reject the hypothesis of symmetry, on the basis of the p-values. This information may be used to drop insignificant parameter values from the final model specification, thereby improving model interpretation.

Table 7 Values of the \(Q(\ell )\) and \(Q_{3}(\ell )\) test statistics

5.2.2 asVMA models

We employ AIC as a model selection criterion, with the additional requirement that the final selected model be parsimonious. A fully specified asVMA model of dimension \(m=3\) is often hard to interpret; in that case, it makes sense to drop insignificant parameters. With this objective, we obtain the following estimation results for the BLS-based indices

$$\begin{aligned} \widehat{\varvec{\mu }}&= \left( \begin{array}{ccc} \underset{\small {(1.000)}}{0.002} \\ \underset{\small {(1.000)}}{0.000} \\ \underset{\small {(-8.25)}}{-0.010} \end{array} \right) ,\,\, \widehat{{\mathbf {B}}}^{+}_{1} = \left( \begin{array}{ccc} - &{} - &{} \underset{\small {(-0.056)}}{-0.007}\\ \underset{\small {(0.156)}}{0.032} &{} \underset{\small {(-0.344)}}{-0.147} &{} \underset{\small {(-0.138)}}{-0.006}\\ \underset{\small {(0.314)}}{0.122} &{} \underset{\small {(-0.216)}}{-0.107} &{} \underset{\small {(11.032)}}{0.453} \end{array} \right) ,\,\,\\ \widehat{{\mathbf {B}}}^{-}_{1}&= \left( \begin{array}{ccc} - &{} - &{} \underset{\small {(0.375)}}{0.081} \\ \underset{\small {(0.203)}}{0.156} &{} \underset{\small {(-4.130)}}{-0.400} &{} \underset{\small {(-3.564)}}{-0.064} \\ \underset{\small {(1.072)}}{0.229} &{} \underset{\small {(-1.195)}}{-0.417} &{} \underset{\small {(-456.000)}}{-0.593} \end{array} \right) , \\ \widehat{{\mathbf {B}}}^{+}_{2}&= \left( \begin{array}{ccc} \underset{\small {(0.141)}}{0.063} &{} \underset{\small {(0.038)}}{0.029} &{}-\\ - &{} - &{} - \\ - &{} - &{} - \end{array} \right) ,\,\, \widehat{{\mathbf {B}}}^{-}_{2} = \left( \begin{array}{ccc} \underset{\small {(0.351)}}{0.238} &{} \underset{\small {(0.044)}}{0.106}&{} -\\ - &{} - &{} - \\ - &{} - &{} - \end{array} \right) ,\\ \widehat{{\mathbf {B}}}^{+}_{4}&= \left( \begin{array}{ccc} \underset{\small {(-0.102)}}{-0.046} &{} - &{} - \\ - &{} - &{} - \\ - &{} - &{} - \end{array} \right) ,\,\, \widehat{{\mathbf {B}}}^{-}_{4} = \left( \begin{array}{ccc} \underset{\small {(-0.485)}}{-0.325} &{} - &{} -\\ - &{} - &{} - \\ - &{} - &{} - \end{array} \right) ,\,\, \widehat{\varvec{\Sigma }}_{\varepsilon } = \left( \begin{array}{ccc} 4.02 &{} -0.25 &{} 0.81 \\ -0.25 &{} 1.98 &{} -0.74\\ 0.81 &{} -0.74 &{} 37.11 \end{array} \right) \times 10^{-5}, \end{aligned}$$

with \(\widehat{\text{ W }}_{T}=3.82\) and \(\hbox {AIC}= -27.71\). The Wald-type test statistic is asymptotically distributed as \(\chi ^{2}_{10}\) with a p-value of 0.95, and hence the \({\mathbb {H}}_{0}\) of symmetry given by (9) is not rejected at the 5% nominal level. Note that only four parameter estimates \({\widehat{b}}^{+}_{33,1}\), \({\widehat{b}}^{-}_{22,1}\), \({\widehat{b}}^{-}_{23,1}\), and \({\widehat{b}}^{-}_{33,1}\) are significantly different from zero.
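The reported p-value can be checked without special software: for even degrees of freedom the \(\chi ^{2}\) survival function has a closed form as a truncated Poisson sum, \(P(X>x)=e^{-x/2}\sum _{k=0}^{\nu /2-1}(x/2)^{k}/k!\). A sketch:

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-square variable with even df,
    via the closed-form Poisson sum (exact for even df)."""
    assert df % 2 == 0 and df > 0
    lam = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= lam / k          # (x/2)^k / k!, built up iteratively
        total += term
    return math.exp(-lam) * total

p_value = chi2_sf_even_df(3.82, 10)   # approx 0.955, consistent with the reported 0.95
```

The same function applied to \(\widehat{\text{ W }}_{T}=101.44\) with 10 degrees of freedom returns a p-value that is zero to many decimal places, matching the NBER-based results reported below.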

Similarly, for the three NBER-based indices the “best” asVMA model is given by

$$\begin{aligned} \widehat{\varvec{\mu }}&= \left( \begin{array}{ccc} \underset{\small {(0.600)}}{0.000} \\ \underset{\small {(0.136)}}{0.000} \\ \underset{\small {(-0.600)}}{-0.001} \end{array} \right) ,\,\, \widehat{{\mathbf {B}}}^{+}_{1} = \left( \begin{array}{ccc} - &{} - &{} \underset{\small {(0.178)}}{0.055}\\ \underset{\small {(-0.252)}}{-0.041} &{} \underset{\small {(0.909)}}{0.215} &{} \underset{\small {(-0.154)}}{-0.071}\\ \underset{\small {(-0.233)}}{-0.038} &{} \underset{\small {(7.389)}}{0.211} &{} \underset{\small {(0.615)}}{0.214} \end{array} \right) ,\,\,\\ \widehat{{\mathbf {B}}}^{-}_{1}&= \left( \begin{array}{ccc} - &{} - &{} \underset{\small {(-0.600)}}{-0.186} \\ \underset{\small {(-0.590)}}{-0.287} &{} \underset{\small {(-0.154)}}{-0.041} &{} \underset{\small {(0.962)}}{0.154} \\ \underset{\small {(12.417)}}{0.190} &{} \underset{\small {(-0.312)}}{-0.071} &{} \underset{\small {(0.698)}}{0.250} \end{array} \right) ,\\ \widehat{{\mathbf {B}}}^{+}_{2}&= \left( \begin{array}{ccc} \underset{\small {(-1.028)}}{-0.164} &{} \underset{\small {(-0.652)}}{-0.012} &{}-\\ - &{} - &{} - \\ - &{} - &{} - \end{array} \right) ,\,\, \widehat{{\mathbf {B}}}^{-}_{2} = \left( \begin{array}{ccc} \underset{\small {(-0.198)}}{-0.079} &{} \underset{\small {(0.034)}}{0.083}&{} -\\ - &{} - &{} - \\ - &{} - &{} - \end{array} \right) ,\\ \widehat{{\mathbf {B}}}^{+}_{6}&= \left( \begin{array}{ccc} \underset{\small {(0.066)}}{0.010} &{} - &{} - \\ - &{} - &{} - \\ - &{} - &{} - \end{array} \right) ,\,\, \widehat{{\mathbf {B}}}^{-}_{6} = \left( \begin{array}{ccc} \underset{\small {(-0.116)}}{-0.051} &{} - &{} -\\ - &{} - &{} - \\ - &{} - &{} - \end{array} \right) ,\,\, \widehat{\varvec{\Sigma }}_{\varepsilon } = \left( \begin{array}{ccc} 4.10 &{} 0.48 &{} 0.17 \\ 0.48 &{} 3.97 &{} -0.21\\ 0.17 &{} -0.21 &{} 2.21 \end{array} \right) \times 10^{-5}, \end{aligned}$$

with \(\widehat{\text{ W }}_{T}=101.44\) and \(\hbox {AIC}=-29.84\).

In this case, the Wald-type test statistic is asymptotically distributed as \(\chi ^{2}_{10}\) with a p-value of zero, and hence the \({\mathbb {H}}_{0}\) of symmetry given by (9) is rejected at the 5% nominal level. The rejection of \({\mathbb {H}}_{0}\) is most likely due to the significant, and positive, estimates \({\widehat{b}}^{+}_{32,1}\) and \({\widehat{b}}^{-}_{31,1}\). Given the magnitude of these estimates, it appears that positive innovations persist longer than negative innovations. The remaining asVMA parameters are indistinguishable from zero. In summary, our multivariate estimation and testing results cast doubt on Wecker’s univariate test results: we only find evidence of asymmetric effects in the three NBER-based indices when tested jointly.

6 Summary

In this paper, we introduced the class of asVMA models. The general concept and structure of these models complement the well-known class of univariate asMA models. We derived some basic properties of an asVMA model. We also proposed a multivariate Wald-type test statistic to uncover asymmetric effects in vector time series. Simulation experiments demonstrated reasonable to good power of the Wald-type test statistic in finite samples. We illustrated the proposed test and estimation procedure by finding evidence of asymmetry in two sets of empirical time series, one set modeled by a two-dimensional asVMA model, and one set by a three-dimensional asVMA model. Given these results, we believe that the asVMA model, in conjunction with the proposed test statistic, has great potential to uncover asymmetric phenomena in multivariate time series with more precision than has been possible before.