Statistical Modeling of Coverage in High-Throughput Data

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1038)

Abstract

In high-throughput sequencing experiments, the number of reads mapping to a genomic region, also known as the “coverage” or “coverage depth,” is often used as a proxy for the abundance of the underlying genomic region in the sample. The abundance, in turn, can be used for many purposes including calling SNPs, estimating the allele frequency in a pool of individuals, identifying copy number variations, and identifying differentially expressed shRNAs in shRNA-seq experiments.

In this chapter we describe the fundamentals of statistical modeling of coverage depth and discuss the problems of estimation and inference in the relevant experimental scenarios.
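
To make the idea of coverage as a proxy for abundance concrete, here is a minimal sketch of a common baseline model. It is not taken from the chapter, whose full text is not shown here. It assumes reads are sampled uniformly along the genome, so that the per-base depth is approximately Poisson with mean N * L / G, and that the read count in a region scales with the region's copy number. All function and parameter names are hypothetical and chosen for illustration only.

```python
import math


def expected_coverage(n_reads, read_length, genome_length):
    """Mean per-base coverage depth under uniform sampling: N * L / G."""
    return n_reads * read_length / genome_length


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam); assumes k is a non-negative integer."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def copy_ratio_estimate(region_count, region_length, total_reads, genome_length):
    """Observed read count in a region divided by the count expected
    for a single-copy region of the same length (illustrative assumption:
    reads start uniformly at random along the genome)."""
    expected = total_reads * region_length / genome_length
    return region_count / expected


# Example with made-up numbers (not data from the chapter).
depth = expected_coverage(1_000_000, 100, 10_000_000)         # 10.0
p_uncovered = poisson_pmf(0, depth)                           # exp(-10), about 4.5e-05
ratio = copy_ratio_estimate(200, 1_000, 1_000_000, 10_000_000)  # 2.0
```

Real coverage data are usually more variable than the Poisson model predicts, for example because of GC-content bias, so this sketch is a starting point for intuition rather than an analysis recipe.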

Copyright information

© 2013 Springer Science+Business Media New York

About this protocol

Cite this protocol

Golan, D., Rosset, S. (2013). Statistical Modeling of Coverage in High-Throughput Data. In: Shomron, N. (ed) Deep Sequencing Data Analysis. Methods in Molecular Biology, vol 1038. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-514-9_4

  • DOI: https://doi.org/10.1007/978-1-62703-514-9_4

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-513-2

  • Online ISBN: 978-1-62703-514-9

  • eBook Packages: Springer Protocols
