Computational approaches for inferring tumor evolution from single-cell genomic data
Introduction
Cancer is a disease emerging from a single cell in the somatic tissue and is driven by a complex interplay of somatic mutations, copy number alterations (CNAs) and chromosomal rearrangements [1, 2]. As a tumor progresses, diverse genomic aberrations give rise to genetically heterogeneous subpopulations (clones) of cells interacting with each other in a Darwinian framework of mutations, fitness and selection [3, 4, 5]. Intratumor heterogeneity (ITH) complicates the diagnosis and treatment of cancer patients and causes relapse and drug resistance [6, 7, 8]. The emergence of next-generation sequencing (NGS) technologies enabled a thorough analysis of tumor heterogeneity through the generation of large-scale quantitative genomic datasets [9, 10, 11]. However, despite these advances, a comprehensive understanding of ITH has proved elusive thus far [12, 13].
Bulk high-throughput sequencing has been the technology of choice for studying heterogeneity and tumor evolution [14, 15]. Subpopulations are computationally inferred [16, 17, 18, 19, 20, 21, 22] from variant allele frequencies (VAFs) of mutations detected in bulk DNA, which is an admixture of DNA from millions of cells in a cancer tissue. VAFs, however, provide a noisy signal for deconvoluting heterogeneity [23, 24] and cannot reliably reconstruct rare subclones or subclones with similar frequencies in the tumor mass. The single-sample approach of bulk sequencing is augmented by multi-region sequencing, in which multiple samples obtained from different geographical regions of a tumor are analyzed [25, 26, 27, 28]. Although multi-region sequencing can reveal geographically segregated subpopulations, resolving spatially intermixed subclones remains difficult, and this approach still relies on deconvolution of subclones for phylogeny inference [29].
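To make the limitation concrete, the following minimal sketch computes the VAF of each mutation as the fraction of reads supporting the variant and then groups mutations whose VAFs lie close together. The read counts, the `tolerance` parameter and the grouping rule are hypothetical and chosen only for illustration; they are not the model used by any of the deconvolution tools cited above.

```python
def vaf(alt_reads, total_reads):
    """Variant allele frequency: fraction of reads supporting the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def group_by_vaf(vafs, tolerance=0.05):
    """Naively group mutations whose sorted VAFs differ by less than `tolerance`.

    Returns lists of mutation indices, ordered by increasing VAF.
    `tolerance` is an illustrative assumption, not a recommended value.
    """
    order = sorted(range(len(vafs)), key=lambda i: vafs[i])
    groups = []
    for i in order:
        # Extend the current group if this VAF is close to the previous one.
        if groups and vafs[i] - vafs[groups[-1][-1]] < tolerance:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


# Hypothetical (alt, total) read counts for five mutations in one bulk sample.
counts = [(50, 100), (48, 100), (21, 100), (19, 100), (3, 100)]
vafs = [vaf(alt, total) for alt, total in counts]
groups = group_by_vaf(vafs)
```

In this toy sample, mutations 2 and 3 fall into the same group whether or not they arose in the same subclone, because frequency is the only signal available; two distinct subclones of similar prevalence would be indistinguishable here.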
The emergence of single-cell DNA sequencing (SCS) technologies has enabled sequencing of individual cancer cells, providing the highest-resolution view of the mutational histories of cancer [23, 30]. SCS aims to further our knowledge of different aspects of cancer biology, including resolving clonal substructure, tracing tumor evolution, identifying rare subclones and understanding the role of the cancer microenvironment in tumor progression [23, 24, 31]. In this review, we discuss the state of the art of SCS technologies, the technical challenges and the computational approaches to overcome them, and finally, approaches for understanding ITH and tumor evolution from SCS data.
Section snippets
An overview of single-cell DNA sequencing methods
Figure 1 illustrates the steps of a single-cell DNA sequencing study. The first step in producing high-quality SCS data is the isolation of individual cells. Early experiments used techniques such as serial [32] or microwell dilution [33], micropipetting [34] and laser-capture microdissection (LCM) [35] to isolate cells from a solid tissue. Several methods [36, 37] opted for isolation of single nuclei that remain intact in frozen samples. Later, fluorescence-activated cell sorting (FACS) [38, 39] and
Single-cell sequencing errors
Technical artifacts arising during the single-cell DNA sequencing workflow may introduce noise into the datasets, confounding bioinformatics analysis (Figure 2). Inadvertent isolation of DNA from multiple cells violates the basic assumption of the methods designed for analyzing single-cell data, resulting in spurious biological conclusions [72]. Specifically, the presence of ‘cell doublets’ is a persistent error (ranging from 1% [38, 42, 56] to 10% [60, 61, 73]), in which more than one cell
Variant calling from single cells
Detection of copy number variants from SCS data commonly involves a variable binning method in which the genome is divided into bins and the read count in each bin indicates whether the region is over- or under-represented compared to a diploid genome [38, 56]. Loess normalization is applied to correct bias due to GC content, and circular binary segmentation (CBS) [76] is used to segment the copy number profiles. Specific algorithms account for technical artifacts introduced by WGA [77, 78]. A
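The read-count step of such a pipeline can be sketched as follows. This is a minimal illustration under stated assumptions: bin boundaries are taken as given, the median bin is assumed to be diploid, and GC correction and segmentation are omitted entirely. The function names and the toy read positions are hypothetical and do not correspond to any published tool.

```python
from bisect import bisect_right
from statistics import median


def count_reads_per_bin(read_positions, bin_starts):
    """Count reads per bin; bin i spans [bin_starts[i], bin_starts[i + 1])."""
    counts = [0] * (len(bin_starts) - 1)
    for pos in read_positions:
        i = bisect_right(bin_starts, pos) - 1
        if 0 <= i < len(counts):  # ignore reads outside the binned region
            counts[i] += 1
    return counts


def copy_number_estimates(counts, ploidy=2):
    """Scale counts so that the median bin corresponds to `ploidy` copies.

    Assumes most of the genome is at the baseline ploidy, which may not
    hold for highly aneuploid tumor cells.
    """
    m = median(counts)
    if m == 0:
        raise ValueError("median read count is zero; coverage too sparse")
    return [ploidy * c / m for c in counts]


# Toy example: four bins of unequal width along one chromosome.
bin_starts = [0, 100, 250, 400, 500]
reads = (
    list(range(0, 100, 10))               # 10 reads in bin 0
    + list(range(100, 250, 15))           # 10 reads in bin 1
    + [250 + 7 * i for i in range(20)]    # 20 reads in bin 2 (gain)
    + [400 + 20 * i for i in range(5)]    # 5 reads in bin 3 (loss)
)
counts = count_reads_per_bin(reads, bin_starts)
estimates = copy_number_estimates(counts)
```

With these toy reads, the third bin appears amplified and the fourth appears to carry a single copy relative to the diploid baseline. A real analysis would correct for GC content before scaling and would segment the profile rather than interpret individual bins.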
Subclonal reconstruction from single cells
Variants detected from single cells are used to infer clonal subpopulations. Dimensionality reduction techniques such as PCA [89] and multidimensional scaling [90] have been used to infer monoclonality [60] or polyclonality [63] of a tumor. Hierarchical clustering has been applied on CNV profiles [70, 91] as well as SNV profiles [62, 63, 86, 92] to uncover the clonal composition in a tumor. Failure to account for errors in variant calling can result in spurious clustering. To overcome the
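As an illustration of hierarchical clustering on SNV profiles, the sketch below applies average-linkage agglomerative clustering with Hamming distance to a small binary cell-by-mutation matrix. The matrix, the choice of linkage and the fixed number of clusters are assumptions made for the example; the sketch deliberately does not model genotyping errors, which is exactly the weakness noted above.

```python
def hamming(a, b):
    """Number of positions at which two binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering; returns lists of cell indices per cluster."""
    clusters = [[i] for i in range(len(profiles))]

    def dist(c1, c2):
        # Average pairwise Hamming distance between two clusters.
        total = sum(hamming(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, a, b = min(
            (dist(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical genotypes: rows are cells, columns are SNVs (1 = mutated).
cells = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],  # could be a true genotype or a dropout of the second SNV
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
clones = average_linkage(cells, n_clusters=2)
```

Here the third cell is assigned to the first clone because it differs at only one site. Asking for three clusters instead would place it in a clone of its own, which shows how a single missed variant can create a spurious subpopulation when errors are not modeled.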
Reconstruction of phylogeny from single cells
One of the major applications of SCS is to study tumor evolution via the inference of phylogeny, a binary genealogical tree along which the tumor cells evolve. Even though concepts borrowed from population genetics such as selection and fitness are useful in the context of tumor evolution [96], many concepts (e.g., meiotic recombination, sexual selection) do not apply to tumors [4]. The presence of technical artifacts further inhibits a straightforward use of classical phylogeny inference
Conclusion & future directions
In conclusion, single-cell genomics is a promising new method that can improve many facets of cancer research, by illuminating tumor initiation, metastasis and therapy resistance. In the clinic, these tools are likely to have important applications in early detection, non-invasive monitoring and personalized therapy. However, significant challenges still remain and will need to be overcome before clinical applications can truly be realized. Even though the error rates of SCS datasets have
Funding
The study was supported by the National Cancer Institute (grant R01 CA172652 to KC), the NCI-designated Cancer Center Support Grant to MD Anderson Cancer Center (P30 CA016672), and the Andrew Sabin Family Foundation. This work was also supported by grants to NN from NCI (1RO1CA169244-01) and the Chan-Zuckerberg Foundation (HCA-A-1704-01668).
References (131)
- et al. The life history of 21 breast cancers. Cell (2012)
- et al. Converting cancer therapies into cures: lessons from infectious diseases. Cell (2012)
- et al. Inferring the mutational history of a tumor using multi-state perfect phylogeny mixtures. Cell Syst (2016)
- et al. Single cell analysis of cancer genomes. Curr Opin Genet Dev (2014)
- et al. Multiregional tumor trees are not phylogenies. Trends Cancer (2017)
- et al. Advances and applications of single-cell sequencing technologies. Mol Cell (2015)
- et al. Single cell profiling of circulating tumor cells: transcriptional heterogeneity and diversity from breast cancer cell lines. PLoS One (2012)
- et al. Whole-genome amplification by degenerate oligonucleotide primed PCR (DOP-PCR). Cold Spring Harb Protoc (2008)
- et al. Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell (2012)
- et al. Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell (2012)
- Single-cell sequencing analysis characterizes common and cell-lineage-specific mutations in a muscle-invasive bladder cancer. Gigascience
- Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature
- Accurate identification of single-nucleotide variants in whole-genome-amplified single cells. Nat Methods
- A single cell level based method for copy number variation analysis by low coverage massively parallel sequencing. PLoS One
- Integrative analysis of genome-wide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer. Genome Res
- The cancer genome. Nature
- Cancer genome landscapes. Science
- The clonal evolution of tumor cell populations. Science
- Cancer as an evolutionary and ecological process. Nat Rev Cancer
- Evolution of the cancer genome. Nat Rev Genet
- Clonal evolution in cancer. Nature
- Evolutionary dynamics of carcinogenesis and why targeted therapy does not work. Nat Rev Cancer
- The causes and consequences of genetic heterogeneity in cancer evolution. Nature
- Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature
- Mutational landscape and significance across 12 major cancer types. Nature
- Turning ecology and evolution against cancer. Nat Rev Cancer
- The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature
- Mutations driving CLL and their evolution in progression and relapse. Nature
- PyClone: statistical inference of clonal population structure in cancer. Nat Methods
- SciClone: inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution. PLoS Comput Biol
- Reconstruction of clonal trees and tumor composition from multi-sample sequencing data. Bioinformatics
- PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biol
- Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing. Proc Natl Acad Sci
- Reconstructing metastatic seeding patterns of human cancers. Nat Commun
- Cancer genomics: one cell at a time. Genome Biol
- Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med
- Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing. Nat Genet
- Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing. Science
- Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat Med
- Unravelling biology and shifting paradigms in cancer with single-cell sequencing. Nat Rev Cancer
- Clonal growth of mammalian cells in a chemically defined, synthetic medium. Proc Natl Acad Sci U S A
- Massively parallel polymerase cloning and genome sequencing of single cells using nanoliter microwells. Nat Biotechnol
- Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science
- Laser capture microdissection for analysis of single cells. Methods Mol Med
- Cancer stem cells: a guide for skeptics. J Cell Biochem
- Circulating tumor cells, disease progression, and survival in metastatic breast cancer. N Engl J Med
- Tumour evolution inferred by single-cell sequencing. Nature
- Single-cell mutational profiling and clonal phylogeny in cancer. Genome Res
- Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics. Proc Natl Acad Sci U S A
- Optimizing sparse sequencing of single cells for highly multiplex copy number profiling. Genome Res