Distance-Based Regression Analysis for Measuring Associations

Journal of Systems Science and Complexity

Abstract

The distance-based regression model, a nonparametric multivariate method, is widely used in genetic association studies, genomic analyses, and many other research areas to detect associations between the variation in a distance or dissimilarity matrix of outcomes and predictor variables of interest. To this end, a pseudo-F statistic that partitions the variation in the distance matrix is often constructed. To the best of our knowledge, the statistical properties of the pseudo-F statistic have not yet been well established in the literature. To fill this gap, the authors study the asymptotic null distribution of the pseudo-F statistic and show that it is asymptotically equivalent to a mixture of chi-squared random variables. Because the pseudo-F test has unsatisfactory power when the correlations among the response variables are large, the authors propose a square-root F-type test statistic, which replaces the similarity matrix with its square root. The asymptotic null distribution of the new test statistic and the power of both tests are also investigated. Simulation studies validate the asymptotic distributions of the tests and demonstrate that the proposed test has more robust power than the pseudo-F test. Both test statistics are illustrated with a gene expression dataset for a prostate cancer pathway.
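To make the two statistics concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes the commonly used trace form of the pseudo-F statistic, built from a Gower-centered similarity matrix G and the hat matrix H of the predictors, and it assumes that the square-root variant simply substitutes the symmetric positive semi-definite square root of G. The exact definitions in the paper may differ. All function names (`gower_center`, `hat_matrix`, `psd_sqrt`, `pseudo_f`, `permutation_pvalue`) and the toy data are hypothetical, and the p-value is obtained by permutation instead of the asymptotic chi-squared mixture studied in the article.

```python
import numpy as np

def gower_center(D):
    """Similarity matrix G = -1/2 * C D^2 C, where C = I - 11'/n."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * C @ (D ** 2) @ C

def hat_matrix(X):
    """Projection onto the column space of X plus an intercept column."""
    X1 = np.column_stack([np.ones(X.shape[0]), X])
    return X1 @ np.linalg.pinv(X1)

def psd_sqrt(G):
    """Symmetric square root of G; negative eigenvalues are clipped to zero
    (an assumption for distances that are not Euclidean-embeddable)."""
    w, V = np.linalg.eigh((G + G.T) / 2)
    w = np.clip(w, 0.0, None)
    return (V * np.sqrt(w)) @ V.T

def pseudo_f(G, H, m):
    """Trace-form pseudo-F with m predictors (intercept not counted)."""
    n = G.shape[0]
    R = np.eye(n) - H
    explained = np.trace(H @ G @ H) / m
    residual = np.trace(R @ G @ R) / (n - m - 1)
    return explained / residual

def permutation_pvalue(G, H, m, n_perm=999, seed=0):
    """Empirical p-value from permuting the samples (rows and columns of G)."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(G, H, m)
    n = G.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        if pseudo_f(G[np.ix_(p, p)], H, m) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): n samples, m predictors, q response variables.
rng = np.random.default_rng(1)
n, m, q = 40, 2, 5
X = rng.normal(size=(n, m))
Y = X @ rng.normal(size=(m, q)) + rng.normal(size=(n, q))
# Pairwise Euclidean distances between the rows of Y.
D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))

G = gower_center(D)
H = hat_matrix(X)
f_stat = pseudo_f(G, H, m)            # pseudo-F on G
f_sqrt = pseudo_f(psd_sqrt(G), H, m)  # same form with G replaced by G^(1/2)
p_val = permutation_pvalue(G, H, m, n_perm=199)
print(f_stat, f_sqrt, p_val)
```

With Euclidean distances and a single response variable, this trace form reduces to the classical regression F statistic, which is a convenient sanity check for the sketch.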

Author information

Corresponding author

Correspondence to Qizhai Li.

Additional information

This work was partially supported by Beijing Natural Science Foundation under Grant No. Z180006.

This paper was recommended for publication by Editor JIN Baisuo.

About this article

Cite this article

Shi, Y., Zhang, W., Liu, A. et al. Distance-Based Regression Analysis for Measuring Associations. J Syst Sci Complex 36, 393–411 (2023). https://doi.org/10.1007/s11424-023-2070-7

  • DOI: https://doi.org/10.1007/s11424-023-2070-7
