Published by De Gruyter, November 17, 2018

A novel method to accurately calculate statistical significance of local similarity analysis for high-throughput time series

  • Fang Zhang, Ang Shan and Yihui Luan

Abstract

In recent years, a large amount of time series microbial community data has been produced in molecular biological studies, especially in metagenomics. Among the statistical methods for time series, local similarity analysis is used in a wide range of environments to capture potential local and time-shifted associations that traditional correlation analysis cannot detect. Initially, the permutation test was widely applied to obtain the statistical significance of local similarity analysis. More recently, a theoretical approximation has also been developed for this purpose. However, all of these methods assume that the observations in each time series are independent and identically distributed. In this paper, we propose a new approach based on the moving block bootstrap to approximate the statistical significance of local similarity scores for dependent time series. Simulations show that our method controls the type I error rate reasonably well, while the theoretical approximation and the permutation test perform less well. Finally, our method is applied to human and marine microbial community datasets, indicating that it can identify potential relationships among operational taxonomic units (OTUs) and significantly decrease the rate of false positives.
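To make the idea concrete, the following is a minimal sketch of how a local similarity score and a moving block bootstrap p-value might be computed. It is an illustration under stated assumptions, not the procedure of the paper: the function names, the block length, the choice to resample the two series independently to mimic the null hypothesis of no association, and the add-one p-value correction are all assumptions made here. The inputs are assumed to be already normalized.

```python
import random


def local_similarity(x, y, max_delay):
    """Largest local similarity score (positive or negative association)
    over all alignments with |delay| <= max_delay, divided by the length."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    n = len(x)
    best = 0.0
    for d in range(-max_delay, max_delay + 1):
        pos = neg = 0.0  # running local scores along this delay diagonal
        for i in range(n):
            j = i + d
            if 0 <= j < n:
                prod = x[i] * y[j]
                pos = max(0.0, pos + prod)  # positively associated segment
                neg = max(0.0, neg - prod)  # negatively associated segment
                best = max(best, pos, neg)
    return best / n


def moving_block_resample(series, block_len, rng):
    """Concatenate randomly chosen overlapping blocks of length block_len
    until the original length is reached, then truncate."""
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]


def mbb_p_value(x, y, max_delay, block_len, n_boot=1000, seed=0):
    """Bootstrap p-value (assumed scheme): resample each series
    independently in blocks, which keeps short-range autocorrelation
    within a series but breaks the alignment between the two series."""
    rng = random.Random(seed)
    observed = local_similarity(x, y, max_delay)
    exceed = 0
    for _ in range(n_boot):
        xb = moving_block_resample(x, block_len, rng)
        yb = moving_block_resample(y, block_len, rng)
        if local_similarity(xb, yb, max_delay) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)  # add-one correction avoids p = 0
```

With `block_len=1` the resampling reduces to an ordinary bootstrap that ignores serial dependence; longer blocks retain more of each series' autocorrelation, at the cost of fewer distinct blocks to draw from.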

Award Identifier / Grant number: 11371227, 61432010, 11626247

Funding statement: The research was supported by the Natural Science Foundation of China Grants (Funder Id: 10.13039/501100001809, 11371227, 61432010, 11626247).

Appendix A. Supplementary Materials

The type I error rate performance of the three models with different time delays is shown in the Supplementary Materials.

Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2018-0019).


Published Online: 2018-11-17

©2018 Walter de Gruyter GmbH, Berlin/Boston
