Abstract
Intensive longitudinal data (ILD) are an increasingly common data type in the social and behavioral sciences. Despite the many benefits these data provide, little work has been dedicated to realizing the potential they hold for forecasting dynamic processes at the individual level. To address this gap in the literature, we present the multi-VAR framework, a novel methodological approach that allows for penalized estimation of ILD collected from multiple individuals. Importantly, our approach estimates models for all individuals simultaneously and is capable of adaptively adjusting to the amount of heterogeneity present across individual dynamic processes. To accomplish this, we propose a novel proximal gradient descent algorithm for solving the multi-VAR problem and prove the consistency of the recovered transition matrices. We evaluate the forecasting performance of our method against a number of benchmark methods and provide an illustrative example involving the day-to-day emotional experiences of 16 individuals over an 11-week period.
References
Allen, P. G., & Morzuch, B. J. (2006). Twenty-five years of progress, problems, and conflicting evidence in econometric forecasting. What about the next 25 years? International Journal of Forecasting, 22(3), 475–492.
Bańbura, M., Giannone, D., & Reichlin, L. (2010). Large Bayesian vector auto regressions. Journal of Applied Econometrics, 25(1), 71–92.
Basu, S., & Michailidis, G. (2015a). Regularized estimation in sparse high-dimensional time series models. Annals of Statistics, 43(4), 1535–1567.
Basu, S., & Michailidis, G. (2015b). Supplement to “Regularized estimation in sparse high-dimensional time series models”. Annals of Statistics, 43(4), 1535–1567.
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202.
Bergmeir, C., & Benítez, J. M. (2012). On the use of cross-validation for time series predictor evaluation. Information Sciences: An International Journal, 191, 192–213.
Bergmeir, C., Costantini, M., & Benítez, J. M. (2014). On the usefulness of cross-validation for directional forecast evaluation. Computational Statistics & Data Analysis, 76, 132–143.
Bergmeir, C., Hyndman, R. J., & Koo, B. (2018). A note on the validity of cross-validation for evaluating autoregressive time series prediction. Computational Statistics & Data Analysis, 120, 70–83.
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.
Bringmann, L. F., Vissers, N., Wichers, M., Geschwind, N., Kuppens, P., Peeters, F., Borsboom, D., & Tuerlinckx, F. (2013). A network approach to psychopathology: New insights into clinical longitudinal data. PLoS ONE, 8(4), e60188.
Bulteel, K., Mestdagh, M., Tuerlinckx, F., & Ceulemans, E. (2018). VAR(1) based models do not always outpredict AR(1) models in typical psychological applications. Psychological Methods, 23(4), 740.
Bulteel, K., Tuerlinckx, F., Brose, A., & Ceulemans, E. (2018). Improved insight into and prediction of network dynamics by combining VAR and dimension reduction. Multivariate Behavioral Research, 53(6), 853–875.
Cerqueira, V., Torgo, L., & Mozetič, I. (2020). Evaluating time series forecasting models: An empirical study on performance estimation methods. Machine Learning, 109(11), 1997–2028.
Chen, M., Chow, S.-M., Hammal, Z., Messinger, D. S., & Cohn, J. F. (2020). A person- and time-varying vector autoregressive model to capture interactive infant–mother head movement dynamics. Multivariate Behavioral Research, 56(5), 739–767.
Epskamp, S., Waldorp, L. J., Mõttus, R., & Borsboom, D. (2018). The Gaussian graphical model in cross-sectional and time-series data. Multivariate Behavioral Research, 53(4), 453–480.
Fisher, Z. F. (2021). multivar: Penalized estimation and forecasting of multiple subject vector autoregressive (multi-VAR) models. R package version 1.0.0. https://CRAN.R-project.org/package=multivar.
Fisher, Z. F., Chow, S.-M., Molenaar, P. C. M., Fredrickson, B. L., Pipiras, V., & Gates, K. M. (2020). A square-root second-order extended Kalman filtering approach for estimating smoothly time-varying parameters. Multivariate Behavioral Research, 1–19.
Fredrickson, B. L. (2013). Chapter One—Positive emotions broaden and build. In P. Devine & A. Plant (Eds.), Advances in experimental social psychology (Vol. 47, pp. 1–53). Academic Press.
Fredrickson, B. L., Boulton, A. J., Firestine, A. M., Van Cappellen, P., Algoe, S. B., Brantley, M. M., Kim, S. L., Brantley, J., & Salzberg, S. (2017). Positive emotion correlates of meditation practice: A comparison of mindfulness meditation and loving-kindness meditation. Mindfulness, 8(6), 1623–1633.
Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
Gates, K. M., & Molenaar, P. C. M. (2012). Group search algorithm recovers effective connectivity maps for individuals in homogeneous and heterogeneous samples. NeuroImage, 63(1), 310–319.
Groen, R. N., Snippe, E., Bringmann, L. F., Simons, C. J. P., Hartmann, J. A., Bos, E. H., & Wichers, M. (2019). Capturing the risk of persisting depressive symptoms: A dynamic network investigation of patients’ daily symptom experiences. Psychiatry Research, 271, 640–648.
Gross, S. M., & Tibshirani, R. (2016). Data shared lasso: A novel tool to discover uplift. Computational Statistics & Data Analysis, 101, 226–235.
Han, F., & Liu, H. (2013). Transition matrix estimation in high dimensional time series. In International conference on machine learning (pp. 172–180).
Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical learning with sparsity: The lasso and generalizations. CRC Press.
Ji, L., Chow, S.-M., Crosby, B., & Teti, D. M. (2020). Exploring sleep dynamics of mother–infant dyads using a regime-switching vector autoregressive model. Multivariate Behavioral Research, 55(1), 150–151.
Kock, A. B., & Callot, L. (2015). Oracle inequalities for high dimensional vector autoregressions. Journal of Econometrics, 186(2), 325–344.
Lane, S., Gates, K., Fisher, Z., Arizmendi, C., & Molenaar, P. (2019). gimme: Group iterative multiple model estimation. R package version 0.6-1.
Li, J., & Chen, W. (2014). Forecasting macroeconomic time series: LASSO-based approaches and their forecast combinations with dynamic factor models. International Journal of Forecasting, 30(4), 996–1015.
Loh, P.-L., & Wainwright, M. J. (2012a). High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity. Annals of Statistics, 40(3), 1637–1664.
Loh, P.-L., & Wainwright, M. J. (2012b). Supplement to “High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity”. Annals of Statistics, 40(3), 1637–1664.
Lütkepohl, H. (2007). New introduction to multiple time series analysis. Springer.
Medeiros, M. C., & Mendes, E. F. (2016). \(\ell _1\)-Regularization of high-dimensional time-series models with non-Gaussian and heteroskedastic errors. Journal of Econometrics, 191(1), 255–271.
Molenaar, P. C. M. (1985). A dynamic factor model for the analysis of multivariate time series. Psychometrika, 50(2), 181–202.
Nesterov, Y. (2007). Gradient methods for minimizing composite objective function. Technical Report 2007, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
Nicholson, W. B., Matteson, D. S., & Bien, J. (2017). VARX-L: Structured regularization for large vector autoregressions with exogenous variables. International Journal of Forecasting, 33(3), 627–651.
Ollier, E., & Viallon, V. (2014). Joint estimation of \(K\) related regression models with simple \(L_1\)-norm penalties. arXiv:1411.1594 [stat].
Ollier, E., & Viallon, V. (2017). Regression modelling on stratified data with the lasso. Biometrika, 104(1), 83–96.
Parikh, N., & Boyd, S. (2014). Proximal algorithms. Foundations and Trends in Optimization, 1(3), 127–239.
Polson, N. G., Scott, J. G., & Willard, B. T. (2015). Proximal algorithms in statistics and machine learning. Statistical Science, 30(4), 559–581.
Robertson, J. C., & Tallman, E. W. (2001). Improving federal-funds rate forecasts in VAR models used for policy analysis. Journal of Business & Economic Statistics, 19(3), 324–330.
Sims, C. A. (1980). Macroeconomics and reality. Econometrica, 48(1), 1–48.
Song, S., & Bickel, P. J. (2011). Large vector auto regressions. arXiv:1106.3915 [q-fin, stat].
Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460), 1167–1179.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
Wild, B., Eichler, M., Friederich, H.-C., Hartmann, M., Zipfel, S., & Herzog, W. (2010). A graphical vector autoregressive modeling approach to the analysis of electronic diary data. BMC Medical Research Methodology, 10(1), 28.
Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. Journal of the American Statistical Association, 57(298), 348–368.
Zheng, Y., Wiebe, R. P., Cleveland, H. H., Molenaar, P. C. M., & Harris, K. S. (2013). An idiographic examination of day-to-day patterns of substance use craving, negative affect, and tobacco use among young adults in recovery. Multivariate Behavioral Research, 48(2), 241–266.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429.
Acknowledgements
Vladas Pipiras was supported in part by the NSF grant DMS-1712966.
Appendix
In this technical appendix, we discuss some theoretical aspects of LASSO estimation in the multi-VAR setting, namely its consistency and sparsistency.
1.1 Consistency
Consistency of LASSO estimation for single (stable) VAR models was established in the seminal work of Basu and Michailidis (2015a, b), building on analogous results in the regression setting by Loh and Wainwright (2012a, b). In the multi-VAR setting, the model is inherently unidentifiable. It could be that the LASSO solution is consistent for some particular \( \varvec{\mu }^{*}\), \(\varvec{\Delta }^{*}_{k}\) in the model (7), or over a subset of such identifications, but this problem appears largely unresolved. Some related results can, however, be found in the discussion of sparsistency below, following Ollier and Viallon (2017). Here, we discuss a weaker form of consistency of \(\hat{\mathbf {B}}_{k}=\hat{\varvec{\mu }}+\hat{\varvec{\Delta }}_{k}\) for \(\mathbf {B}_k^{*}\). The arguments are quite straightforward, shed some light on the problem, and seemingly have not yet been made in the related literature.
We first describe the basic result for a single VAR model expressed in the regression form (5), and then turn to a multi-VAR model. We index the model quantities with subscript k or superscript (k), \(k=1,\ldots ,K\), representing the individual models in the multi-VAR setting. After expanding the quadratic term of the objective function (6), the estimation equation can be rewritten as in Basu and Michailidis (2015a) in terms of the quantities
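In the standard lag-regression notation, these quantities take the following form (a reconstruction consistent with the deviation condition (33) below; scaling conventions may differ from the original display):
$$\begin{aligned} \widehat{\Gamma }_{k}=\frac{\mathbf {X}_{k}'\mathbf {X}_{k}}{N},\qquad \hat{\gamma }_{k}=\frac{\mathbf {X}_{k}'\mathbf {Y}_{k}}{N}, \end{aligned}$$
where \(\mathbf {X}_{k}\) and \(\mathbf {Y}_{k}\) denote the stacked design and response matrices of the regression form (5) for individual \(k\).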
Estimation consistency is proved under the following two conditions on these quantities:
-
Restricted eigenvalue condition: The matrix \(\widehat{\Gamma }_{k}\) is said to satisfy this condition with parameters \(\alpha _{k},\tau _{k}>0\), if
$$\begin{aligned} \beta _{k}'\widehat{\Gamma }_{k}\beta _{k}\ge \alpha _{k}\Vert \beta _{k}\Vert _{2}^{2}-\tau _{k}\Vert \beta _{k}\Vert _{1}^{2},\quad \beta _{k}\in \mathbb {R}^{q}, \end{aligned}$$(32)with \(q=pd^{2}\).
-
Deviation condition: This condition is satisfied if
$$\begin{aligned} \Vert \hat{\gamma }_{k}-\widehat{\Gamma }_{k}\mathbf {B}_{k}^{*}\Vert _{\infty } \le Q_{k}(\mathbf {B}_{k}^{*},\Sigma _{k,\varepsilon }) \sqrt{\frac{\log q}{N}}, \end{aligned}$$(33)for a deterministic function \(Q_{k}\).
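These two conditions can be probed numerically. The sketch below simulates a small stable VAR(1), forms sample versions of \(\widehat{\Gamma }_{k}\) and \(\hat{\gamma }_{k}\) as in the reconstruction above, and evaluates the deviation term on the left-hand side of (33); variable names are illustrative and not taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 500                        # number of series, effective sample size

# Stable VAR(1): eigenvalues of B* are 0.5, 0.4, 0.3 (all inside unit circle)
B_star = np.array([[0.5, 0.2, 0.0],
                   [0.0, 0.4, 0.1],
                   [0.0, 0.0, 0.3]])
y = np.zeros((N + 1, d))
for t in range(N):
    y[t + 1] = y[t] @ B_star.T + rng.normal(scale=0.5, size=d)

X, Y = y[:-1], y[1:]                 # regression form: Y ~ X B*'
Gamma_hat = X.T @ X / N              # sample Gram matrix, Gamma_hat_k
gamma_hat = X.T @ Y / N              # cross term, gamma_hat_k

# Deviation term: expected to be of order sqrt(log q / N)
dev = np.max(np.abs(gamma_hat - Gamma_hat @ B_star.T))

# Restricted eigenvalue inequality (32) with tau_k = 0 on a random direction,
# using the smallest eigenvalue of Gamma_hat as alpha_k (fine when d << N)
alpha = np.linalg.eigvalsh(Gamma_hat).min()
beta = rng.normal(size=d)
lhs, rhs = beta @ Gamma_hat @ beta, alpha * (beta @ beta)
print(dev, lhs >= rhs - 1e-10)
```

In low dimensions with \(N\) large, the deviation term is small and the restricted eigenvalue inequality holds with \(\tau _{k}=0\); the conditions only become delicate when \(q\) is comparable to or larger than \(N\).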
Let \(s_{k}=\Vert \mathbf {B}_{k}^{*}\Vert _{0}\) denote the sparsity of the model. Under the conditions above, and assuming \(s_{k}\tau _{k}\le \alpha _{k}/32\), Proposition 4.1 of Basu and Michailidis (2015a) states that any solution \(\hat{\mathbf {B}}_{k}\) of (6) satisfies, for any \(\lambda \ge 4Q_{k}(\mathbf {B}_{k}^{*},\Sigma _{k,\varepsilon })\sqrt{\frac{\log q}{N}}\),
Additionally, a result on the support of thresholded estimators of \(\hat{\mathbf {B}}_{k}\) is also available. The consistency results in (34) apply to generic LASSO estimators as long as the quantities \(\widehat{\Gamma }_{k},\hat{\gamma }_{k}\) satisfy the restricted eigenvalue and deviation conditions.
Among the key contributions of Basu and Michailidis (2015a) are their results (Propositions 4.2 and 4.3) proving that \(\widehat{\Gamma }_{k}\) and \(\hat{\gamma }_{k}\) satisfy the restricted eigenvalue and deviation conditions with high enough probability, and expressing the parameters \(\alpha _{k}\), \(\tau _{k}\), and \(Q_{k}(\mathbf {B}_{k}^{*},\Sigma _{k,\varepsilon })\) appearing in these conditions in terms of the VAR model parameters. Furthermore, in the restricted eigenvalue condition, \(\tau _{k}\) can be chosen so that \(s_{k}\tau _{k}\le \alpha _{k}/32\). We also note that the right-hand sides of the inequalities (34) are expected to be negligible for small \(\lambda \), and hence for small \(\log q/N\). The regime in which the dimension enters only through \(\log q/N\) is the typical LASSO scenario.
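For orientation, bounds of the type referenced in (34) take the following general form, up to universal constants \(c_{1},c_{2}\) (this is a hedged paraphrase; see Proposition 4.1 of Basu and Michailidis, 2015a, for the exact statement and constants):
$$\begin{aligned} \Vert \hat{\mathbf {B}}_{k}-\mathbf {B}_{k}^{*}\Vert _{1}\le \frac{c_{1}\,s_{k}\lambda }{\alpha _{k}},\qquad \Vert \hat{\mathbf {B}}_{k}-\mathbf {B}_{k}^{*}\Vert _{2}\le \frac{c_{2}\,\sqrt{s_{k}}\,\lambda }{\alpha _{k}}. \end{aligned}$$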
In the multi-VAR setting, the optimization problem (9) can be expressed through the objective function
A consistency bound for the minimizer \(\hat{\mathbf {B}}_{k}\) of (35) can still be obtained similarly as for single VAR models if one is willing to make the assumption
The constraint (36) could be imposed while optimizing (35), ensured by choosing \(\lambda _1\) appropriately large, or inferred to hold (with high enough probability) from a sparsistency result, if available. Indeed, under (36), a consistency bound can be derived as in the proof of Proposition 3.3 in Basu and Michailidis (2015a, b). Observe first that
and rearranging the terms and setting \(\mathbf {v}_k=\hat{\mathbf {B}}_k-\mathbf {B}_k^*\), we deduce
With \(\hat{J}_{k}=\text {supp}\{\mathbf {B}_k^{*} - \hat{\varvec{\mu }}\}\) being the index support of \(\mathbf {B}_k^{*} - \hat{\varvec{\mu }}\), repeating the argument in Basu and Michailidis (2015a, b), we get
as long as \(\lambda _{2,k}\ge 4Q_k(\mathbf {B}_k^{*},\Sigma _{k,\varepsilon })\sqrt{\frac{\log q}{N}}\) (with the function \(Q_k\) from the deviation condition), where \((\cdot )_{\hat{J}}\) and \((\cdot )_{\hat{J}^c}\) denote restrictions to the index sets \(\hat{J}\) and \(\hat{J}^c\), respectively. Then,
and one also has
by the Cauchy–Schwarz inequality (applied twice) and the fact that \(|\hat{J}_{k}|\le s_0+s_k\), where \(\Vert \mathbf {v}\Vert _2^2=\sum _{k=1}^{K}\Vert \mathbf {v}_k\Vert _2^2\). Similarly, by the restricted eigenvalue condition (32) for each model, and assuming \(s_k\tau _k \le \alpha _k/32\), we have
A combination of (37)–(39) yields, e.g.,
or
This is the multi-VAR analogue of the second consistency bound in (34). One can similarly obtain a bound on \(\Vert \mathbf {v}\Vert _1\) analogous to the first one in (34).
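The composite objective (35) — a smooth least-squares term plus separable \(\ell _1\) penalties on the common effect \(\varvec{\mu }\) and the individual deviations \(\varvec{\Delta }_{k}\) — can be minimized by proximal gradient descent with entrywise soft-thresholding. The following is a minimal sketch under that decomposition; function names, step-size rule, and penalty scaling are illustrative assumptions, not the estimator implemented in the multivar package.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: entrywise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def multivar_prox_grad(X, Y, lam1, lam2, n_iter=500):
    """Sketch of proximal gradient descent for a multi-VAR-style objective
        0.5 * sum_k ||Y_k - X_k (mu + Delta_k)||_F^2
            + lam1 * ||mu||_1 + lam2 * sum_k ||Delta_k||_1,
    where X, Y are lists of K design/response matrices (N x d each).
    """
    K, d = len(X), X[0].shape[1]
    mu = np.zeros((d, d))
    Delta = [np.zeros((d, d)) for _ in range(K)]
    # Conservative Lipschitz bound for the stacked smooth part
    L = 2.0 * sum(np.linalg.norm(Xk.T @ Xk, 2) for Xk in X)
    for _ in range(n_iter):
        # Gradients of the smooth part at the current (mu, Delta)
        grads = [Xk.T @ (Xk @ (mu + Dk) - Yk)
                 for Xk, Yk, Dk in zip(X, Y, Delta)]
        # Joint gradient step, then the prox of each l1 penalty
        mu = soft_threshold(mu - sum(grads) / L, lam1 / L)
        Delta = [soft_threshold(Dk - g / L, lam2 / L)
                 for Dk, g in zip(Delta, grads)]
    return mu, Delta
```

With \(\lambda _1\) large, \(\hat{\varvec{\mu }}\) absorbs structure shared across individuals while the \(\hat{\varvec{\Delta }}_{k}\) capture individual deviations, which is the adaptivity to cross-individual heterogeneity described in the abstract.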
1.2 Sparsistency
We comment briefly on the possibility of recovering the supports of \(\varvec{\mu }^{*}\) and \(\varvec{\Delta }_{k}^{*}\). The same issue of (non)identifiability is fundamental here as well. Some results are nevertheless available in the literature for special cases. Assuming effectively that \(s\lambda _1/\lambda _{2,k}=cK^{1/2}\), Ollier and Viallon (2017) gave conditions for identifiability and sparsistency, with the limiting common parameter of interest \(\varvec{\mu }^{*}\) defined as the entrywise median of \(\mathbf {B}_k^{*}\). Their approach proceeds by verifying a particular well-known irrepresentability condition on a design matrix. It could in principle be adapted to the multi-VAR context, but the value of this effort might be questionable. First, irrepresentability conditions are quite restrictive and difficult to verify; as a result, adaptive LASSO versions are often advocated. Second, the setting in which the limiting parameter of interest is necessarily tied to the median could also be viewed as restrictive.
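To make the limiting common parameter in Ollier and Viallon's (2017) setting concrete, the entrywise median across individual transition matrices can be computed directly; the matrices below are made up for illustration.

```python
import numpy as np

# Toy individual transition matrices B_k* for K = 3 individuals
B = np.stack([
    np.array([[0.5, 0.1], [0.0, 0.4]]),
    np.array([[0.5, 0.0], [0.2, 0.4]]),
    np.array([[0.6, 0.1], [0.2, 0.3]]),
])

# Entrywise median across individuals: the limiting common parameter mu*
mu_star = np.median(B, axis=0)
print(mu_star)   # [[0.5, 0.1], [0.2, 0.4]]
```

Entries on which most individuals agree survive in \(\varvec{\mu }^{*}\), while idiosyncratic effects are pushed into the deviations \(\varvec{\Delta }_{k}^{*}\).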
Fisher, Z.F., Kim, Y., Fredrickson, B.L. et al. Penalized Estimation and Forecasting of Multiple Subject Intensive Longitudinal Data. Psychometrika 87, 1–29 (2022). https://doi.org/10.1007/s11336-021-09825-7