Data Analysis in Single-Cell Transcriptome Sequencing

  • Protocol
Computational Systems Biology

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1754))

Abstract

Single-cell transcriptome sequencing, often referred to as single-cell RNA sequencing (scRNA-seq), measures gene expression at the single-cell level and resolves cellular differences at higher resolution than bulk RNA-seq. With this more detailed and accurate information, scRNA-seq is expected to greatly advance the understanding of cell function, disease progression, and treatment response. Although scRNA-seq experimental protocols have improved rapidly, many challenges in scRNA-seq data analysis remain. In this chapter, we introduce and discuss the current state of research on scRNA-seq data normalization and cluster analysis, the two most important challenges in scRNA-seq data analysis. In particular, we present a protocol to discover and validate cancer stem cells (CSCs) using scRNA-seq. We also offer suggestions to help researchers rationally design their scRNA-seq experiments and data analyses in future studies.

Acknowledgments

I am equally grateful for the help of the following people: Professor Wenjun Bu; Professor Lin Liu; Ph.D. student Hua Wang; Master’s students Yu Sun and Deshui Yu from the College of Life Sciences, Nankai University; Professor Jishou Ruan; Ph.D. student Zhenfeng Wu from the School of Mathematical Sciences, Nankai University; and Associate Professor Weixiang Liu from Shenzhen University.

Author information

Corresponding author

Correspondence to Shan Gao.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Gao, S. (2018). Data Analysis in Single-Cell Transcriptome Sequencing. In: Huang, T. (eds) Computational Systems Biology. Methods in Molecular Biology, vol 1754. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7717-8_18

  • DOI: https://doi.org/10.1007/978-1-4939-7717-8_18

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7716-1

  • Online ISBN: 978-1-4939-7717-8

  • eBook Packages: Springer Protocols
