Applications of single-cell sequencing for human lung cancer: the progress and the future perspective

Human lung cancer is an extremely heterogeneous disease. Cell heterogeneity and diversity are responsible for the invasion, metastasis and therapy resistance of lung cancer. Recent developments in single-cell analysis have made DNA sequencing, RNA sequencing and genomic element sequencing possible for single cells from lung cancer. The methodology of single-cell sequencing has been improved to reduce the errors that arise from working with tiny amounts of genetic material. Single-cell sequencing of lung cancer has begun to reveal deep insights into cancer evolution and to provide new targets for clinical care. In this review, we briefly describe the methods for the isolation, amplification and sequencing of single cells. We also discuss current progress in lung cancer research and the future prospects of single-cell analysis for the disease.


Introduction
Lung cancer is one of the leading causes of cancer death around the world [1]. Lung cancer can be divided into two broad categories. Small cell lung cancer (SCLC) accounts for 15% of lung cancer cases and is a highly malignant tumour exhibiting neuroendocrine characteristics. Non-small cell lung cancer (NSCLC) can be further classified into three major subtypes: adenocarcinoma, squamous cell carcinoma, and large cell carcinoma. Adenocarcinoma accounts for 38.5% of all lung cancer cases, squamous cell carcinoma for 20% and large cell carcinoma for 2.9% [2]. In recent years, systematic approaches such as epigenetic studies, transcriptional profiling, exome sequencing and chromatin immunoprecipitation (ChIP) sequencing have brought a huge influx of data to lung cancer research. However, data from bulk tumour tissue cannot illustrate clonal selection during cancer growth. It is also difficult to resolve cell-to-cell variation and to identify rare mutated cells that may play key roles in disease progression [3]. Intra-tumour heterogeneity results from the rapid divergence of tumour lineages [4]. The genetic heterogeneity of cancer is significantly responsible for tumour progression and treatment outcomes [5]. Single-cell genome profiling can provide the highest-resolution analysis of intra-tumour genetic heterogeneity and can also reduce the complexity of the genomic signal through the physical separation of cells [6,7]. The genotype-phenotype correlation in single cells can also provide important information for selecting the most appropriate clinical treatment for heterogeneous cancer [8].
With the development of single-cell isolation, amplification and sequencing technology, it is now possible to sequence DNA and RNA from single cells obtained either from solid lung tumour mass or from circulating tumour cells in blood, and thereby to identify the relationships between genetic mutations, gene expression and the diagnosis, progression and response to targeted or immune therapies of lung cancer. Single-cell sequencing data have only recently emerged for lung cancer and have already influenced our understanding of the heterogeneity and diversity of the disease. In this review, we briefly introduce the methodologies for single-cell sequencing and the current research progress for lung cancer. We also discuss the future prospects of research on the disease.

The Methods of Single-cell Sequencing of Cancer
Single-cell sequencing consists of three major steps: single-cell isolation, whole genome amplification (WGA) or whole-transcriptome amplification (WTA) and next generation sequencing.

Single-cell isolation
Obtaining high-quality single-cell sequencing data depends on efficient physical isolation of individual cells, amplification of the genome or transcriptome of each cell to acquire sufficient material for downstream analysis, and the ability to distinguish true variation from technical bias [7]. One of the major challenges in analyzing single-cell genomics data is to develop tools that differentiate the technical artefacts and noise introduced during single-cell isolation, WGA, WTA and sequencing from true biological variation. During single-cell isolation, the population of cells being interrogated can be biased through selection of cells based on size, viability or propensity to enter the cell cycle. Using cell lines as controls may be problematic, as cell lines or cell types may not be diploid; they can be highly aneuploid or even polyploid, and this affects experimental performance [7].
Single cancer cells are normally obtained from two kinds of sample. One is solid tissue; the other is body fluid containing circulating cancer cells. Obtaining single cells from solid tissue is technically challenging and requires mechanical or enzymatic dissociation that keeps the cells viable without biasing for specific subpopulations. Patient-derived xenograft (PDX) cells provide a good model for single-cell analysis of the parental tumours [9]. Diseased tissues can have different dissociation kinetics, and dissociation can also vary between samples of the same disease. Laser-assisted microdissection (LAM) provides a low-throughput way of isolating DNA from single cells in solid samples [10]. LAM is also used to isolate rare cells [3]. For circulating cells, automatic sorting and manual manipulation have been developed to isolate single cells. Fluorescence-activated cell sorting (FACS) is one of the most common methods for isolating single circulating cells. Manual manipulation includes serial dilution, micropipetting, microwell dilution and optical tweezers [7]. Many commercial platforms have been developed for isolating circulating tumour cells (CTCs). These include the CellSearch system and the MagSweeper system [3]. These methods rely on magnets and on EpCAM or CD45 antibodies conjugated to nanoparticles. The DEPArray system uses a microchip, and the CellCelector applies a robotic micromanipulation capillary system to isolate single cells [11,12]. Nanofilters are used to isolate rare cells by size exclusion [13]. Nuclear isolation has also shown advantages for single-cell sequencing of frozen tissue [14]. The CellSearch and MagSweeper systems can be used for CTC enrichment, whereas the DEPArray system provides CTC sorting and recovery for previously enriched samples.
Automated micromanipulation uses droplets or micro-mechanical valves in microfluidic devices. It is important to confirm accurately that a single cell has been physically isolated before sequencing, so that spurious biological conclusions are not drawn from chambers that are empty or contain multiple cells [7].

Single-cell DNA and RNA sequencing
The methods for single-cell DNA and RNA sequencing have been developed over the past five years. Single cells contain tiny amounts of genetic material, so sequencing DNA or RNA from a single cell is technically challenging. A typical cancer cell contains 6-12 pg of DNA and 10-50 pg of RNA [15]. After successful isolation of single cells, the next step is WGA or WTA to generate sufficient input material for constructing sequencing libraries. A number of errors can be introduced during amplification and become apparent after sequencing. These technical errors include allelic dropout events, amplification distortion, false positive or false negative errors and coverage nonuniformity [16]. Starting with two genome copies by sorting tetraploid nuclei improves the recovery of the genome by about 10% when degenerate oligonucleotide-primed PCR (DOP-PCR) is used [14,17]. The method can be combined with flow sorting and next generation sequencing (NGS) to obtain copy number profiles of single cells at good resolution [18], but it is not suitable for detecting mutations at base-pair resolution because of its poor physical coverage [3]. Another genome amplification method is multiple displacement amplification (MDA). MDA uses the Phi29 DNA polymerase and can achieve high physical coverage of a single-cell genome [19]. Phi29 DNA polymerase is increasingly used in molecular biology for multiple displacement amplification procedures and has a number of features that make it particularly suitable for single-cell sequencing. It allows mutations to be detected at base-pair resolution. Multiple annealing and looping based amplification cycles (MALBAC) is another single-cell DNA amplification method; it performs a quasi-linear amplification of the genome with a strand-displacing polymerase, and Bst polymerase is one of the polymerases used in the reaction [20]. The method is suitable for copy number profiling.
Single nucleus exome sequencing (SNES) takes advantage of G2/M nuclei, which have duplicated their genomic DNA and therefore provide four copies of the genome as input material, thereby reducing technical error rates [21]. SNES also applies MDA for amplification. Before performing any single-cell sequencing study, one should carefully plan which method will be used and apply statistical methods from ecology and population genetics to estimate the sample size needed for adequate experimental power [4]. Copy number variation (CNV) analysis has been successfully performed in single-cell studies [22,23]. These studies demonstrated the potential to overcome WGA bias and to detect CNVs (larger than 1 Mb) at the single-cell level through low-coverage massively parallel sequencing [22]. nbCNV is a read-depth based method for detecting copy number variants in single cells. It applies negative binomial distributions to approximate the read depth at loci along the whole genome. nbCNV has been shown to achieve superior performance and high robustness in detecting CNVs in single-cell sequencing data [23].
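The idea behind read-depth based CNV detection can be illustrated with a minimal sketch. It is not the nbCNV algorithm: it simply assumes that read counts have already been tallied in fixed-size genomic bins, that most bins are at the baseline ploidy, and that copy number scales linearly with depth. The function name, the bin counts and the default ploidy of 2 are illustrative assumptions.

```python
from statistics import median


def estimate_copy_number(bin_counts, ploidy=2):
    """Estimate an integer copy number for each genomic bin.

    Assumption: the median bin depth corresponds to the baseline ploidy,
    so copy number is ploidy * (bin depth / median depth), rounded.
    """
    if not bin_counts:
        return []
    baseline = median(bin_counts)
    if baseline <= 0:
        raise ValueError("median read depth must be positive")
    return [round(ploidy * count / baseline) for count in bin_counts]


# Hypothetical read counts for eight consecutive bins of one cell.
counts = [100, 98, 103, 150, 152, 99, 51, 101]
copy_numbers = estimate_copy_number(counts)
print(copy_numbers)
```

In real single-cell data the depth is overdispersed by amplification bias, which is why model-based approaches such as the negative binomial fit described above are preferred over simple rounding.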
Single-cell RNA sequencing data can reveal the global expression profile and exon splicing with appropriate depth and analysis [24]. They can identify the functional heterogeneity or diversity of a population of cancer cells and uncover the molecular characteristics, specific signals and pathways of cancer [25]. They can also detect mutations in the transcripts of single cells [26]. Since it is not yet possible to sequence RNA molecules directly, a common strategy for capturing the single-cell transcriptome relies on three steps: reverse transcription of RNA into first-strand cDNA, second-strand synthesis and cDNA amplification [27]. To profile the transcriptomes of single cells, the initial methods used oligo-dT primers followed by ligation adapter PCR, or linear transcription with T7 RNA polymerase [28]. Strong 3' bias can occur because of inefficient first-strand cDNA synthesis by reverse transcriptase. Moloney Murine Leukemia Virus (MMLV) reverse transcriptase amplifies only full-length mRNA, which can reduce sequencing errors [29]. Unique molecular indexes (UMIs) were developed to label each RNA molecule with a unique barcode prior to WTA, thereby reducing amplification bias [30]. After WTA, the resulting cDNA libraries are barcoded and pooled for multiplexed next generation sequencing. Shallow RNA sequencing can faithfully detect heterogeneity and activated signalling pathways, with about 50,000 reads per cell being sufficient for unsupervised cell-type classification [31]. qRT-PCR can be used for a panel of genes in single cells and may have even higher sensitivity than RNA sequencing in single cells [32]. The interpretation of single-cell sequencing data is also a challenge, as appropriate computational and statistical methods are key requirements for the success of experiments [33].
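How UMIs reduce amplification bias can be shown with a small sketch. It assumes that each read has already been assigned a cell barcode, a gene and a UMI sequence, and it collapses reads by exact UMI match only; the function name and the example reads are hypothetical, and production pipelines usually also tolerate sequencing errors within the UMI.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique molecules per (cell, gene) from barcoded reads.

    reads: iterable of (cell_barcode, gene, umi) tuples. Reads that share
    the same cell, gene and UMI are treated as amplification copies of a
    single original RNA molecule and are counted once.
    """
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Hypothetical reads; the second read is a PCR duplicate of the first.
reads = [
    ("cell1", "EGFR", "AACG"),
    ("cell1", "EGFR", "AACG"),
    ("cell1", "EGFR", "TTGA"),
    ("cell1", "KRAS", "CCAT"),
    ("cell2", "EGFR", "AACG"),
]
molecule_counts = count_molecules(reads)
print(molecule_counts)
```

Counting distinct UMIs rather than reads means that a transcript amplified a thousand times contributes the same count as one amplified ten times.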
Six single-cell RNA sequencing approaches are commonly used in the laboratory: poly(A) tailing, template switching, in vitro transcription, rolling circle amplification, 5' selection and 3' selection [27]. SMART-seq is a template switching method that utilizes an intrinsic property of the M-MuLV reverse transcriptase to add three to four cytosines specifically to the 3' end of the first cDNA strand to anchor a universal PCR primer. It ensures that full-length transcripts are amplified [34]. CEL-seq and MARS-seq pool cells and libraries to reduce the labour intensity of in vitro transcription (IVT) for single-cell sequencing [35,36]. STRT-seq is a highly multiplexed and strand-specific method for sequencing the 5' ends of single-cell RNA. It is more suitable for large-scale quantitative analysis, as well as for the characterization of transcription start sites [37].
Methods for co-sequencing the genome and the transcriptome of the same single cell have been reported; such data could enable a multidimensional analysis of heterogeneity, stratification and phenotype regulation in cancer [38,39].

Single-cell genomic elements analysis
Single-cell epigenetic assays are very promising tools that can relate cellular variation in the regulatory elements of the genome to the complicated phenotypes of human cancer. Future methodological research will focus on improving bisulphite sequencing and the coverage of sequencing assays [25].
DNase I hypersensitive sites (DHSs) provide important information on the presence of transcriptional regulatory elements and the state of chromatin in mammalian cells. Single-cell DNase sequencing can detect genome-wide DHSs at the single-cell level. The DHSs are highly reproducible among individual cells. Among different single-cells, highly expressed gene promoters and enhancers associated with multiple active histone modifications display constitutive DHS whereas chromatin regions with fewer histone modifications exhibit high variation of DHS. There were thousands of tumour-specific DHSs associated with promoters and enhancers critically involved in cancer development [40].
DNA methylation refers to the addition of a methyl group to the cytosine of a CpG dinucleotide in CpG islands by DNA methyltransferases [41]. Specific tumour-suppressor genes are usually significantly hypermethylated at their promoter or associated regions during carcinogenesis [42]. The hypomethylation of repetitive DNA sequences across the genome is the other form of epigenomic regulation in cancer. Single-cell reduced representation bisulfite sequencing has been developed and can detect 0.5 to 3.7 million CpG sites in a single-cell genome [43,44]. These approaches provide single-nucleotide resolution of CpG methylation patterns and represent an exciting start to exploring CpG methylation in single cells, although the overall coverage still needs to be improved.
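The readout of bisulfite sequencing at a single CpG site can be sketched as follows. Bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C, so the methylation level is the fraction of C reads among all reads covering the site. The function name and the per-site counts below are illustrative assumptions, not data from the cited studies.

```python
def methylation_level(c_reads, t_reads):
    """Return the fraction of reads supporting methylation at one CpG.

    c_reads: reads showing C (cytosine protected, i.e. methylated).
    t_reads: reads showing T (cytosine converted, i.e. unmethylated).
    Returns None when the site has no coverage, which is common in
    single-cell data.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


# Hypothetical (C, T) read counts at four CpG sites of one cell.
sites = {"cpg_1": (3, 1), "cpg_2": (0, 4), "cpg_3": (0, 0), "cpg_4": (2, 0)}
levels = {name: methylation_level(c, t) for name, (c, t) in sites.items()}
covered = sum(level is not None for level in levels.values())
print(levels, covered)
```

Sites without coverage are reported as missing rather than as unmethylated, which keeps low coverage from being mistaken for hypomethylation.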

Single-cell Sequencing for Lung Cancer
As lung cancer is an extraordinarily heterogeneous and diverse disease, single-cell sequencing will provide unique opportunities to identify clinically important subpopulations within heterogeneous tumour cell populations. This work has only just begun for lung cancer, but as the technology for single-cell isolation and sequencing develops, it will play an increasingly important role in lung cancer research.
A recent study analysed 336 single-cell RNA-sequencing libraries from seven lung adenocarcinoma cell lines [45]. Analysis of individual cells treated with the multi-tyrosine kinase inhibitor vandetanib revealed that housekeeping genes reduced their relative expression diversity during treatment, whereas the genes directly targeted by vandetanib, EGFR and RET, remained constant. The expression patterns of cancer-related genes were more diverse than expected from the founder cells. Characteristic patterns of gene expression divergence, which would not be revealed by transcriptome analysis of bulk cells, may also play important roles when cells acquire drug resistance [45].
Lung adenocarcinoma is characterised by genetic alterations in the receptor tyrosine kinase (RTK)-RAS-mitogen-activated protein kinase (MAPK) pathway [46]. Specific targeted therapies have been developed against mutations of EGFR, KRAS, BRAF and ALK. Targeted therapy against one mutation often leads to resistance because new mutations emerge in the cancer cells. A more comprehensive genomic analysis of individual lung adenocarcinoma patients is therefore required [47].
Individual lung cancer cells originate from a common ancestor and share early tumour-initiating genetic mutations, but tumour cells frequently diverge and show heterogeneity in growth, drug resistance and metastasis [48,49]. This genetic heterogeneity is significantly associated with tumour progression and the treatment outcomes of lung cancer [50].
In a recent report on single-cell mRNA sequencing of 34 patient-derived xenograft (PDX) tumour cells from lung adenocarcinoma patients, fifty lung cancer-specific single nucleotide variations, including KRAS G12D, were observed to be heterogeneous among individual PDX cells. PDX cells that survived in vitro anti-cancer treatment displayed transcriptome signatures consistent with the group characterised by KRAS G12D [26]. Further analysis of the 34 PDX tumour cells for intrinsic transcriptomic signatures identified two distinct intra-tumoural subgroups that were primarily distinguished by the gene module G64. The G64 module was predominantly composed of cell-cycle genes. E2F1, a transcription factor, most likely mediates the expression of the G64 module in single lung adenocarcinoma cells. The G64 module also indicated inter-tumoural heterogeneity based on its association with patient survival and other clinical variables such as smoking status and tumour stage [51].
Circulating tumour cells (CTCs) offer an alternative source for the detection of genetic alterations, as a form of "liquid biopsy" [52]. Currently the CellSearch system is the only FDA-approved CTC enumeration system [53]. A reliable platform that detects and captures a small number of mutation-bearing CTCs from a blood sample is necessary for the development of non-invasive cancer diagnosis. A capture system for single CTCs based on high-density dielectrophoretic microwell array technology was developed and tested on single cells from lung cancer cell lines. The detection rate was markedly higher than that obtained using the CellSearch system, suggesting the superior sensitivity of the system in detecting EpCAM-tumour cells. Single captured tumour cells were isolated, and EGFR mutations were then detected by Sanger sequencing [53].
In a recent report on the detection of CNVs, MALBAC was applied for whole-genome amplification to sequence single CTCs from patients with lung adenocarcinoma and small cell lung cancer (SCLC). Every CTC from an individual patient, regardless of the cancer subtype, showed reproducible CNV patterns similar to those of the metastatic tumour of the same patient. Different patients with lung adenocarcinoma shared similar CNV patterns in their CTCs, whereas patients with SCLC had CNV patterns distinctly different from those of adenocarcinoma patients. These findings indicate that CNVs at certain genomic loci are selected for during cancer metastasis and that the reproducibility of cancer-specific CNVs offers potential for CTC-based cancer diagnostics [54].
One study used a model CTC system of spiked tumour cells to determine EGFR mutations in single cancer cells. LAM was applied to isolate individual CTCs, followed by WGA of the DNA and detection of the exon 19 micro-deletion and the L858R and T790M mutations by PCR sequencing. EGFR mutations were successfully measured. Sequencing of the amplicons showed allele dropout in the amplification reaction, but mutations were correctly identified in 80% of the amplicons. To overcome allele dropout and to obtain reliable information about the tumour's EGFR status, multiple individual tumour cells should be assayed [55].
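The benefit of assaying several cells can be made concrete with a short calculation. It rests on two simplifying assumptions that are not claims of the cited study: that the 80% rate can be read as a per-cell probability of a correct call, and that dropout events in different cells are independent. The function names are hypothetical.

```python
def detection_probability(per_cell_success, n_cells):
    """Probability that at least one of n_cells yields a correct call,
    assuming independent cells with the same success probability."""
    return 1.0 - (1.0 - per_cell_success) ** n_cells


def cells_needed(per_cell_success, target):
    """Smallest number of cells whose combined detection probability
    reaches the target (0 < per_cell_success <= 1, 0 <= target < 1)."""
    if not 0.0 < per_cell_success <= 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("probabilities out of range")
    n = 1
    while detection_probability(per_cell_success, n) < target:
        n += 1
    return n


# With an assumed 80% per-cell success rate.
p_three = detection_probability(0.8, 3)
n_for_99 = cells_needed(0.8, 0.99)
print(round(p_three, 3), n_for_99)
```

Under these assumptions, three cells already give a detection probability above 99%; correlated dropout or tumour heterogeneity would raise the number of cells required.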
A study of copy-number aberrations (CNAs) in circulating tumour cells from pretreatment SCLC blood samples was recently published. A CNA-based classifier was generated from the analysis of 88 CTCs isolated from 13 patients (training set). The results were then validated in 18 additional patients (testing set, 112 CTC samples) and in six SCLC patient-derived CTC explant tumours. The classifier correctly assigned 83.3% of the cases as chemorefractory or chemosensitive. A significant difference in progression-free survival (PFS) was observed between patients designated as chemorefractory and those designated as chemosensitive by the baseline CNA classifier. Notably, CTC CNA profiles obtained at relapse from five patients with initially chemosensitive disease did not switch to a chemorefractory CNA profile, which suggests that the genetic basis of initial chemoresistance differs from that underlying acquired chemoresistance [56]. Another recent report described a massively parallel, multigene-profiling nanoplatform that compartmentalizes and analyses hundreds of single CTCs; its single-cell nanowell array performed CTC mutation profiling using modular gene panels. Multigene expression profiling of individual CTCs from NSCLC patients showed remarkable sensitivity. The platform was considered ideal for single-cell mutation profiling of individual lung cancer CTCs, towards minimally invasive prediction of cancer therapy response and disease monitoring [57].
The current progress in single-cell analysis of lung cancer is summarized in Table 1.

The Limitations of Single-cell Sequencing and Future Research Direction of Lung Cancer
The major concerns for single-cell sequencing in the clinic are the high cost of sequencing and the choice of platforms appropriate for different clinical requirements, such as screening, detection, monitoring or personalized therapy guidance. Technological progress and marketplace competition have reduced the cost of sequencing, and more kits are now available.
The first limitation of single-cell sequencing is low coverage, which is common when WGA, WTA or chromatin profile amplicons are sequenced. Aside from current DNA methylation methods, which suffer from low coverage, even the commonly used full-length RNA amplification kits yield mostly sequence fragments that are far from full length. Developing more sensitive methods that increase the overall coverage, and the consistency of coverage between single cells, will be an objective for clinicians and researchers. The second limitation is that the technologies available for comprehensive molecular analysis of a single cell are still in their infancy, and few technologies can robustly detect the whole proteome of a single cell [58]. The third challenge is the temporal and spatial measurement of the molecular profile of a single cell. In situ sequencing and real-time sequencing, as well as in vivo analysis of DNA and RNA from single cells, have been developed, but these methods need greater sensitivity, coverage and accessibility, as well as lower cost [59,60]. In vivo technology for single-cell analysis would provide real-time dynamics, as demonstrated with other in vitro technologies. To this end, progress has been made with just a limited number of transcripts [61]. The fourth challenge involves data analysis. Most algorithms currently used for single-cell sequencing analysis were originally designed for bulk cell samples. These analysis methods do not take into account the inherent properties of a single cell or any amplification- or sequencing-introduced biases, noise, incomplete coverage or errors. Enhanced algorithms are therefore needed for the new era of single-cell sequencing [25].
Single-cell sequencing will be a valuable asset for designing personalized tumour therapy regimens based on the subpopulation-specific genetic alterations in individual patients [62]. Most single-cell DNA and RNA sequencing work in lung cancer has been performed on adenocarcinoma. There are few data from other lung cancers such as small cell lung cancer, squamous cell carcinoma and large cell carcinoma. The genetic landscapes of different lung cancers have different heterogeneities and diversities. As the techniques for single-cell sequencing develop, they will bring more exciting data from all types of lung cancer and will eventually identify rare variants that are responsible for the invasion, metastasis and resistance to targeted therapy or immunotherapy of the cancer.

Conflicts of Interest
The authors declare no conflicts of interest in this paper.