Abstract
Single-cell transcriptome sequencing, often referred to as single-cell RNA sequencing (scRNA-seq), measures gene expression in individual cells and therefore resolves cellular differences that bulk RNA-seq averages away. Because it provides more detailed and accurate information, scRNA-seq is expected to greatly advance our understanding of cell functions, disease progression, and treatment response. Although scRNA-seq experimental protocols have improved rapidly, many challenges in scRNA-seq data analysis remain to be overcome. In this chapter, we introduce and discuss the current state of research on scRNA-seq data normalization and cluster analysis, which are the two most important challenges in scRNA-seq data analysis. In particular, we present a protocol to discover and validate cancer stem cells (CSCs) using scRNA-seq. We also offer suggestions to help researchers rationally design the scRNA-seq experiments and data analyses in their future studies.
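The abstract names normalization and cluster analysis as the two central analysis steps. The sketch below chains both on a cells × genes count matrix, for illustration only. It uses a simple library-size normalization (scale each cell to 10,000 total counts, then log1p), PCA computed by SVD, and k-means with a deterministic farthest-point initialization, all in plain NumPy. These are generic stand-ins, not the specific normalization or clustering methods evaluated in this chapter or its CSC protocol. The scale factor, the synthetic toy matrix, and every function name here are hypothetical.

```python
import numpy as np


def normalize_counts(counts, scale=1e4):
    """Library-size normalization (assumed method): rescale each cell (row)
    to `scale` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("every cell needs at least one count")
    return np.log1p(counts / totals * scale)


def pca(x, n_components=2):
    """Project rows of x onto the top principal components via SVD."""
    xc = x - x.mean(axis=0)                      # center each gene
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def farthest_point_init(x, k):
    """Deterministic seeding: start at row 0, then repeatedly add the
    point farthest from all centroids chosen so far."""
    idx = [0]
    for _ in range(1, k):
        d = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return x[idx].copy()


def kmeans(x, k, n_iter=100):
    """Lloyd's k-means on the rows of x; returns (labels, centroids)."""
    centroids = farthest_point_init(x, k)
    for _ in range(n_iter):
        # squared Euclidean distance from every cell to every centroid
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Synthetic toy data (not from the chapter): 6 cells x 4 genes, two groups.
counts = np.array([
    [50, 40, 3, 2],
    [60, 35, 2, 3],
    [45, 50, 3, 3],
    [3, 2, 55, 40],
    [2, 3, 48, 52],
    [3, 3, 60, 45],
])

labels, _ = kmeans(pca(normalize_counts(counts), n_components=2), k=2)
print(labels.tolist())  # → [0, 0, 0, 1, 1, 1]
```

Normalizing before clustering keeps cells with different sequencing depths from separating on total counts alone. The choice of normalization method is exactly the kind of decision the chapter argues must be evaluated rather than assumed.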
Acknowledgments
I am equally grateful for the help of the following people: Professor Wenjun Bu, Professor Lin Liu, PhD student Hua Wang, and Master's students Yu Sun and Deshui Yu from the College of Life Sciences, Nankai University; Professor Jishou Ruan and PhD student Zhenfeng Wu from the School of Mathematical Sciences, Nankai University; and Associate Professor Weixiang Liu from Shenzhen University.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Gao, S. (2018). Data Analysis in Single-Cell Transcriptome Sequencing. In: Huang, T. (eds) Computational Systems Biology. Methods in Molecular Biology, vol 1754. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7717-8_18
DOI: https://doi.org/10.1007/978-1-4939-7717-8_18
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7716-1
Online ISBN: 978-1-4939-7717-8
eBook Packages: Springer Protocols