Distributed quantile regression for longitudinal big data

Fan, Ye; Lin, Nan; Yu, Liqun

doi:10.1007/s00180-022-01318-0

Original paper
Published: 17 January 2023

Volume 39, pages 751–779, (2024)

Computational Statistics

Abstract

Longitudinal data, measurements taken from the same subjects over time, appear routinely in many scientific fields, such as biomedical science, public health, ecology and environmental sciences. With the rapid development of information technology, modern longitudinal data are becoming massive in volume and high-dimensional, and hence often require distributed analysis in real-world applications. Standard divide-and-conquer techniques do not apply directly to longitudinal big data because of within-subject dependence. In this paper, we develop a distributed algorithm to support quantile regression (QR) analysis of longitudinal big data, which remains an open and challenging problem. We employ weighted quantile regression (WQR) to accommodate the correlation in longitudinal big data, and parallelize the WQR estimation process with a two-stage algorithm to support distributed computing. The first stage estimates the weights by the Newton–Raphson algorithm; given these weights, the second stage solves the WQR problem using the multi-block alternating direction method of multipliers (ADMM). Simulation studies show that, compared to traditional non-distributed algorithms, the proposed method has favorable estimation accuracy and is computationally more efficient in both non-distributed and distributed environments. We also analyze an air quality data set to illustrate the practical performance of the method.
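As a rough illustration of the weighted quantile regression objective mentioned in the abstract, the sketch below evaluates a weighted check loss for the simplest possible model (an intercept only) and finds its minimizer by direct search. It is not the paper's two-stage algorithm: the function names `check_loss`, `weighted_objective` and `weighted_quantile` are hypothetical, the weights are arbitrary placeholders rather than the Newton–Raphson estimates of the first stage, and no ADMM step is involved.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_objective(b, y, w, tau):
    """Weighted check loss sum_i w_i * rho_tau(y_i - b) for an intercept-only model."""
    return sum(wi * check_loss(yi - b, tau) for yi, wi in zip(y, w))


def weighted_quantile(y, w, tau):
    """Minimize the weighted objective over b by direct search.

    The objective is convex and piecewise linear in b with kinks at the
    observations, so a minimizer is attained at one of the y_i.
    Assumes non-negative weights (placeholder values, not estimated ones).
    """
    return min(sorted(y), key=lambda b: weighted_objective(b, y, w, tau))


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(weighted_quantile(y, [1.0] * 5, 0.5))                    # equal weights: ordinary median
    print(weighted_quantile(y, [1.0, 1.0, 1.0, 1.0, 10.0], 0.5))   # heavy weight pulls the fit upward
```

With equal weights the minimizer reduces to the ordinary sample quantile; unequal weights shift it toward the heavily weighted observations, which is the mechanism a weighting scheme can use to account for within-subject correlation.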

Funding

Nan Lin’s work is supported by the NVIDIA GPU grant program. Ye Fan’s work is supported by the Initial Scientific Research Fund of Young Teachers in Capital University of Economics and Business [Grant No. XRZ2022062], and partially supported by the Special Fund for Basic Scientific Research of Beijing Municipal Colleges in Capital University of Economics and Business [Grant No. QNTD202207].

Author information

Authors and Affiliations

School of Statistics, Capital University of Economics and Business, Beijing, 100070, China
Ye Fan
Department of Mathematics and Statistics, Washington University in St. Louis, St. Louis, MO, 63130, USA
Nan Lin & Liqun Yu

Corresponding author

Correspondence to Nan Lin.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 405 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fan, Y., Lin, N. & Yu, L. Distributed quantile regression for longitudinal big data. Comput Stat 39, 751–779 (2024). https://doi.org/10.1007/s00180-022-01318-0

Received: 01 November 2021
Accepted: 15 December 2022
Published: 17 January 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00180-022-01318-0
