Exploring viral infection using single-cell sequencing

Single-cell sequencing (SCS) has emerged as a valuable tool to study cellular heterogeneity in diverse fields, including virology. By studying the viral and cellular genomes and/or transcriptomes, the dynamics of viral infection can be investigated at the single-cell level. Most studies have explored the impact of cell-to-cell variation on the viral life cycle from the point of view of the virus, by analyzing viral sequences, and from the point of view of the cell, mainly by analyzing the host cell transcriptome. In this review, we focus on recent studies that use single-cell sequencing to explore viral diversity and cell variability in response to viral replication.


Introduction
Cells are the basic biological units of living organisms. They are grouped by function into tissues and organs, of which they are the most basic structural component, and their transcriptome and proteome define the identity of these structures. Despite sharing the same DNA content and being exposed to apparently identical conditions, cells display some level of functional heterogeneity. This heterogeneity can be explained by extrinsic features, such as cell identity (cell type, subpopulation or lineage) and cell state or process (cell cycle, circadian rhythm), or by the intrinsic stochastic nature of gene expression (Battich et al., 2015; Satija and Shalek, 2014; Stoeger et al., 2016). These cell-to-cell variations can affect cell function, communication, proliferation and fate, as well as the behavior of the cell population at large (Altschuler and Wu, 2010; Huang, 2009; Satija and Shalek, 2014). Moreover, rare cells or small fractions of cells can play an important biological role in specific disorders and environments (e.g. cancer cells), but their contribution can be masked by the larger cell population.
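The masking effect mentioned above can be made concrete with a little arithmetic. The sketch below uses purely hypothetical numbers (a 1% subpopulation expressing a gene tenfold more than the rest) and only illustrates why a population-average measurement barely registers such cells.

```python
# Toy illustration with hypothetical numbers: how a rare subpopulation with a
# distinct expression level is masked in a bulk (population-average) measurement.

def bulk_mean(fractions, levels):
    """Population-average expression of a mixture of subpopulations.

    fractions: share of each subpopulation in the population (must sum to 1)
    levels:    mean expression of the gene in each subpopulation
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("subpopulation fractions must sum to 1")
    return sum(f * x for f, x in zip(fractions, levels))


# 99% of cells express the gene at 10 units, 1% (rare cells) at 100 units.
fractions = [0.99, 0.01]
levels = [10.0, 100.0]

bulk = bulk_mean(fractions, levels)        # 0.99*10 + 0.01*100 = 10.9
bulk_shift = bulk / levels[0]              # ~1.09x: a barely detectable change
rare_vs_majority = levels[1] / levels[0]   # 10x: obvious when cells are measured one by one
print(f"bulk mean = {bulk:.1f}; shift vs majority = {bulk_shift:.2f}x; "
      f"rare vs majority = {rare_vs_majority:.0f}x")
```

A tenfold difference in 1% of the cells thus appears as a shift of less than 10% in the bulk signal.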
Single-cell technology has evolved rapidly in recent years, from simple FACS analysis assessing the expression of a specific protein to sophisticated techniques that allow analysis of the single-cell genome, transcriptome and proteome. The advent of single-cell analysis, and specifically of single-cell omics, was an important milestone in areas such as cancer, stem cell biology, epigenetics and immunology.
In the immunology field, single-cell technologies were critical for the discovery of new gene networks, novel cell subpopulations, and new cell states or cell types within the immune system, and for elucidating the relationship between cell clonality and functional phenotype (Vieira Braga et al., 2016). For example, Stubbington et al., using a new computational method that reconstructs paired T cell receptor (TCR) sequences from lymphocyte single-cell RNA sequencing data, directly correlated T cell clonal origin with functional phenotype in a mouse Salmonella infection model (Stubbington et al., 2016). Buettner and colleagues developed a single-cell latent variable model (scLVM) that identified otherwise undetectable subpopulations of cells corresponding to different stages of the differentiation of naive T cells into T helper type 2 (Th2) cells (Buettner et al., 2015). In another study, single-cell messenger RNA sequencing revealed rare intestinal cell types. Also, single-cell analysis of CD4+ T-cell differentiation characterized three major cell states during Th2 polarization, from an intermediate activated state to the mature cytokine-secreting effector state.
In the cancer field, single-cell studies have enabled significant progress in understanding carcinogenesis, tumor progression, metastasis and drug resistance (Qian et al., 2016). Single-cell RNA-seq led, for example, to the identification of distinct tumor subpopulations in lung adenocarcinoma (Min et al., 2015), revealed subclonal heterogeneity in the anti-cancer drug responses of lung adenocarcinoma cells (Kim et al., 2015b), and identified distinct gene expression patterns, including candidate biomarkers, in melanoma circulating tumor cells (Ramskold et al., 2012). Single-cell genome, epigenome and transcriptome sequencing also led to considerable advances in the stem cell field, improving knowledge of both pluripotent and tissue-specific stem cells. Single-cell RNA-seq made it possible to decipher gene expression dynamics during mammalian pre-implantation development by analyzing the transcriptome profiles of human and mouse pre-implantation embryonic cells (Deng et al., 2014; Yan et al., 2013). Single-cell RNA-seq studies also identified new stem cell types (Treutlein et al., 2014) and dissected cell heterogeneity within stem cell populations (reviewed in Wen and Tang, 2016).
Single-cell technology can also be a great asset for the virology field. Viruses depend on the host cell to replicate; therefore, heterogeneity in the host cell population will be reflected in the outcome of viral infection (Fig. 1). For many cell types, infecting 100% of the cells is difficult. This can be due (i) to the heterogeneity of virus particles, i.e. defective viruses or mutations that decrease infectivity, or (ii) to cellular heterogeneity, either because cells are in different states when infected (e.g. different states of activation or phases of the cell cycle) or because the population is a mixture of different subsets of the same cell type (e.g. CD4+ T cell subsets). Single-cell analysis, and specifically single-cell sequencing (SCS), therefore makes it possible to investigate the source of this heterogeneity through the joint analysis of virus replication and the host cell environment.
Fig. 1. Impact of cellular heterogeneity on viral infection. The percentage of infected cells in a population can reflect the cellular heterogeneity in response to viral exposure. This heterogeneity can arise from different cellular states (activation stage or cell cycle) or from different subsets of the same cell type (e.g. CD4+ T cell subsets). Indeed, specific cell states or cell subsets may be permissive to viral infection while others may be resistant. The proportion of infected cells at the population level would then mirror the proportion of permissive cells within the total population.
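The reasoning of Fig. 1 amounts to a weighted average over cell states or subsets. The minimal sketch below assumes hypothetical subset shares and per-subset infection probabilities; it shows that the infected fraction equals the permissive fraction only when permissive cells are always infected and resistant cells never are.

```python
def fraction_infected(subsets):
    """Expected fraction of infected cells in a mixed population.

    subsets: list of (share_of_population, probability_of_infection) pairs,
    one per hypothetical cell state or subset.
    """
    if abs(sum(share for share, _ in subsets) - 1.0) > 1e-9:
        raise ValueError("subset shares must sum to 1")
    return sum(share * p for share, p in subsets)


# Hypothetical population: 30% permissive cells (infected when exposed),
# 70% resistant cells (never infected).
all_or_none = [(0.30, 1.0), (0.70, 0.0)]
print(fraction_infected(all_or_none))  # 0.3, i.e. the permissive fraction

# If permissive cells are only infected half of the time (e.g. because of
# defective virus particles), the infected fraction no longer equals it.
partial = [(0.30, 0.5), (0.70, 0.0)]
print(fraction_infected(partial))  # 0.15
```

The second case shows why virus-side heterogeneity (i) and cell-side heterogeneity (ii) cannot be disentangled from the bulk percentage alone.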
To date, single-cell technology has already been applied to study virus evolution and virus-host interaction. The first studies exploiting single-cell analysis used simple techniques to study viral DNA, RNA or protein such as time-lapse microscopy, RT-qPCR, and FACS (recently reviewed in Ciuffi et al., 2016). The development of SCS and its application to the virology field holds great potential to help decipher virus biology and its interaction with the host cell. In this review, we will focus on the main techniques that exist to date to perform SCS as well as on the computational methods for SCS data analysis, and present recent studies that used SCS to explore the impact of host cell heterogeneity on virus infection.

Single-cell sequencing (SCS)
Single-cell sequencing (SCS) has proved to be a powerful tool to study cell heterogeneity through analysis of the whole genome and transcriptome of an individual cell. Although it builds on deep-sequencing methods developed for nucleic acids extracted from cell populations, SCS had to overcome three additional challenges: first, physically isolating an individual cell; second, dealing with the limited amount of biological material (about 6 pg of DNA, 10 pg of total RNA and 0.1 pg of mRNA per cell); and third, discriminating bona fide cell-to-cell variation from noise.
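To give a sense of scale, the per-cell amounts above can be converted into the fold amplification required before library preparation. The target input of about 1 ng used below is an assumption for illustration only (actual requirements depend on the library kit), and the cycle count assumes ideal doubling at every cycle.

```python
import math

# Approximate per-cell amounts quoted in the text (picograms).
PER_CELL_PG = {"genomic DNA": 6.0, "total RNA": 10.0, "mRNA": 0.1}

# Assumption for illustration only: ~1 ng (1000 pg) of material is needed
# as input for library preparation; real kits differ.
TARGET_INPUT_PG = 1000.0


def required_fold(amount_pg, target_pg=TARGET_INPUT_PG):
    return target_pg / amount_pg


def ideal_cycles(amount_pg, target_pg=TARGET_INPUT_PG):
    # Idealized exponential amplification: material doubles every cycle.
    return math.ceil(math.log2(required_fold(amount_pg, target_pg)))


for molecule, amount in PER_CELL_PG.items():
    print(f"{molecule:<12} {required_fold(amount):>7.0f}-fold, "
          f">= {ideal_cycles(amount)} ideal doubling cycles")
```

Under these assumptions, mRNA needs roughly a 10,000-fold increase (at least 14 perfect doubling cycles, after reverse transcription), which is why amplification is unavoidable and why its biases matter.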
To overcome these issues, SCS techniques include (i) single-cell isolation methods, (ii) extensive amplification of DNA or cDNA before library preparation and sequencing, and (iii) dedicated data analysis tools (Fig. 2).

Single-cell isolation
Several methods have been used to isolate single cells from tissues or from cell populations in suspension (Table 1; for more detailed information, see Gross et al., 2015; Hodne and Weltzien, 2015).
Initially, micromanipulation was often used as the gold standard (Brauns and Goos, 2005; Hodne and Weltzien, 2015). It uses a glass micropipette to aspirate one cell at a time from a cell population under a microscope. Similarly, laser-capture microdissection isolates individual cells from cell populations or tissues under direct microscopic visualization (Frost et al., 2015; Kroneis et al., 2016). These methods, although useful when working with tissues or with rare and fragile cells, have significant drawbacks, notably time-consuming and labor-intensive manual handling. Automated isolation based on fluorescence-activated cell sorting (FACS), on the other hand, is faster and allows the isolation of thousands to millions of cells in a couple of hours. Cells can also be sorted according to their size, morphology, or expression of specific markers detected with fluorescently labeled antibodies (Gross et al., 2015; Saliba et al., 2014). Microfluidics, a more automated isolation method, captures single cells in dedicated integrated fluidic circuits (IFCs). Single-cell capture efficiency can then be confirmed by imaging. Each cell undergoes a series of controlled chemical reactions carried out in successive microchambers, including cell lysis, reverse transcription of RNA into cDNA, and PCR amplification (Reece et al., 2016; Wheeler et al., 2003). All reactions are performed in very low (nanoliter) volumes and in a fully automated way.
The C1™ Single-Cell Auto Prep System from Fluidigm is the most commonly used system. Although very convenient, this technique has two major weaknesses. First, single-cell capture is performed on fixed-size chips accommodating three cell size ranges, potentially introducing a size bias in the cells that can be selected. Second, only a limited number of cells (up to 96 cells for RNA-seq and up to 800 cells for tag-based RNA-seq) can be captured on one chip, which is inefficient for isolating rare cells from a mixed cell population; an additional enrichment step, by FACS for instance, would then be required. By contrast, the ICELL8 Single-Cell System (WaferGen Biosystems) can capture up to 1800 cells without any size bias for tag-based RNA-seq. Very recently, microfluidics has evolved to physically isolate individual cells in separate aqueous droplets using hydrogel microsphere emulsions. Each droplet constitutes a physical compartment in which cell lysis, reverse transcription, DNA amplification and other steps of SCS can subsequently be carried out (Brouzes et al., 2009; Hummer et al., 2016; Mazutis et al., 2013).
Fig. 2. Single-Cell Sequencing. SCS can be divided into three major steps: (i) single-cell isolation and capture, (ii) nucleic acid amplification and library preparation, and (iii) sequencing and data analysis. First, cells have to be physically isolated prior to cell lysis. Genomic DNA is then amplified, whereas RNA must first be reverse-transcribed into cDNA before amplification. Amplified DNA is then used to prepare libraries that are subsequently sequenced, producing data sets that are computationally analyzed. SCS therefore allows analysis of the genome and transcriptome of individual cells.
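Why chip capacity limits the study of rare cells can be checked with simple binomial arithmetic. The sketch below uses the per-run capacities quoted for the C1 and ICELL8 systems and a hypothetical rare-cell frequency of 1 in 1,000; it assumes cells are captured at random, without enrichment.

```python
import math

# Maximum number of cells captured per run, as quoted in the text.
CAPACITY = {
    "C1, full-length RNA-seq": 96,
    "C1, tag-based RNA-seq": 800,
    "ICELL8, tag-based RNA-seq": 1800,
}


def expected_rare(capacity, freq):
    """Expected number of rare cells among `capacity` randomly captured cells."""
    return capacity * freq


def prob_at_least_one(capacity, freq):
    # Binomial model: each captured cell is rare with probability `freq`.
    return 1.0 - (1.0 - freq) ** capacity


def runs_needed(target_rare, capacity, freq):
    """Runs needed to expect at least `target_rare` rare cells (no enrichment)."""
    return math.ceil(target_rare / expected_rare(capacity, freq))


FREQ = 0.001  # hypothetical: 1 rare cell per 1,000 cells in the sample
for platform, n in CAPACITY.items():
    print(f"{platform:<26} expected={expected_rare(n, FREQ):.3f} "
          f"P(>=1)={prob_at_least_one(n, FREQ):.2f} "
          f"runs for 10 rare cells={runs_needed(10, n, FREQ)}")
```

With 96 cells per chip, more than 100 chips would be needed to expect 10 such cells, which illustrates why a prior enrichment step (raising `FREQ`) is recommended.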

Single-cell DNA sequencing (DNA-SCS)
After isolation, each single cell is lysed and its nucleic acid material is amplified, a step that the limited amount of starting DNA makes unavoidable. For DNA amplification, the most commonly used techniques are degenerate-oligonucleotide-primed PCR (DOP-PCR), multiple displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC) (reviewed in detail in Gawad et al., 2016; Huang et al., 2015; Wang and Navin, 2015). These amplification methods differ in coverage, in amplification efficiency and in the amount of technical errors they generate (Table 2). Briefly, DOP-PCR is a PCR-based technique that uses degenerate primers (containing a known sequence at their 5′ end and terminating with random hexamers at their 3′ end) to first amplify DNA at a low annealing temperature for a few cycles. A second round of PCR is then carried out at a higher annealing temperature with primers specific to the known sequence of the degenerate primers. The concentrations of primers and polymerase directly affect amplification efficiency. DOP-PCR generally exhibits low genome coverage and a high rate of technical errors, which can lead to over-amplification of small differences and under-amplification of some regions of the genome (Telenius et al., 1992). Like DOP-PCR, MDA uses random hexamer primers to initiate DNA synthesis, but synthesis is carried out by the Phi29 DNA polymerase, a high-fidelity polymerase with 3′→5′ exonuclease (proofreading) activity. Annealing and amplification are performed at a constant temperature (Dean et al., 2001; Spits et al., 2006). MDA has higher genome coverage than DOP-PCR, but still introduces some technical errors due to its exponential amplification process.
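The contrast between exponential and (quasi-)linear amplification can be illustrated with a toy model. The per-cycle efficiencies below are hypothetical, and the "linear" scheme is idealized (only the original template is copied each cycle); the sketch does not simulate any specific protocol.

```python
# Toy model (hypothetical efficiencies): two genomic regions that are copied with
# slightly different per-cycle efficiencies, e.g. because of sequence context.

def exponential_copies(n0, efficiency, cycles):
    # Every molecule, including earlier copies, serves as template each cycle.
    return n0 * (1.0 + efficiency) ** cycles


def linear_copies(n0, efficiency, cycles):
    # Only the original template is copied each cycle (idealized linear scheme).
    return n0 * (1.0 + efficiency * cycles)


CYCLES = 20
EFF_A, EFF_B = 0.95, 0.85  # hypothetical per-cycle efficiencies

exp_ratio = exponential_copies(1, EFF_A, CYCLES) / exponential_copies(1, EFF_B, CYCLES)
lin_ratio = linear_copies(1, EFF_A, CYCLES) / linear_copies(1, EFF_B, CYCLES)

# Both regions started at one copy (true ratio 1.0); the observed ratio shows the
# distortion introduced by each amplification scheme.
print(f"exponential: {exp_ratio:.2f}x   linear: {lin_ratio:.2f}x")
```

A 10-point difference in efficiency becomes a nearly threefold distortion after 20 exponential cycles but only about 11% under linear copying, which is the rationale behind the quasi-linear pre-amplification of MALBAC described next.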
Although it combines features of DOP-PCR and MDA, the MALBAC technique has the advantage of performing a quasi-linear amplification of the genome. It uses degenerate primers (27 known nucleotides followed by a random octamer) to prime extension by a strand-displacing DNA polymerase at a constant temperature. Full amplicons carry the 27-nt common sequence at one end and its complement at the other, so their two ends can hybridize to form a loop, thereby preventing further amplification. After 8-12 …
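The looping step can be sketched with plain string operations. In this simplified model, the 27-nt common sequence and the genomic fragment are hypothetical placeholders (not the published primer), and the random 3′ bases of the primer are treated as part of the copied sequence. It shows why amplicons copied directly from genomic DNA remain available as templates, whereas full amplicons copied from them close into loops.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical 27-nt common sequence (placeholder, not the published primer).
COMMON = "GATTACAGATTACAGATTACAGATTAC"
assert len(COMMON) == 27


def extend_primer(template, site):
    """New strand primed on `template`: the primer's random 3' part anneals at
    position `site`, and the polymerase copies the template toward its 5' end.
    Simplification: the annealed random bases are treated as part of the copy."""
    return COMMON + revcomp(template[:site])


def can_loop(amplicon):
    """True if the 3' end is complementary to the 5' common sequence, so the two
    ends can hybridize into a loop that blocks further priming."""
    return amplicon.startswith(COMMON) and amplicon.endswith(revcomp(COMMON))


genomic = "TTGACCGTAGGCATCGATCGGATCCATGCAAGTCGTACCGT"  # arbitrary fragment

semi = extend_primer(genomic, site=30)  # copied from genomic DNA: one tagged end
full = extend_primer(semi, site=45)     # copied from a semi-amplicon: two ends

print(can_loop(semi), can_loop(full))   # False True
```

Because only the original genomic DNA and semi-amplicons keep serving as templates, copy numbers grow roughly linearly during these cycles rather than exponentially.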

Single-cell RNA sequencing (RNA-SCS)
Single-cell RNA sequencing starts with a reverse transcription (RT) step to generate cDNA, as RNA cannot yet be sequenced directly. This step is followed by cDNA amplification and sequencing. The most commonly used methods for single-cell RNA sequencing are SMART-seq, SMART-seq2, CEL-seq, CEL-seq2, STRT-seq, Cyto-seq and Drop-seq, which differ in the RT step (conventional RT, modified RT-PCR or T7 in vitro transcription) and in whether full-length or tag-based cDNA is amplified (Table 3; for more information refer to Picelli, 2016; Saliba et al., 2014). These differences are reflected in the length of the amplified cDNA, in coverage efficiency and in 3′ or 5′ amplification bias.
SMART-seq (Switching Mechanism At the 5′ end of the RNA Transcript) is a full-length RNA sequencing method that exploits the dual activity of the Moloney murine leukemia virus (MMLV) reverse transcriptase, i.e. reverse transcription (RT) and template switching (TS). The RT reaction requires a primer (often an oligo-dT tailed with a known sequence for subsequent PCR amplification) and an RNA template to achieve first-strand cDNA synthesis. TS relies on the addition of 3-4 non-templated cytosines to the 3′ end of the cDNA once the RT reaches the 5′ end of the RNA. This 3′ C-overhang then pairs with a template-switching oligonucleotide carrying complementary G residues and a known sequence, which the reverse transcriptase copies, thereby appending a second known sequence for subsequent PCR amplification. This ensures that mainly full-length transcripts are further amplified by PCR. Furthermore, strand information is preserved because the added C residues mark the 5′ end of the transcript (method detailed in Goetz and Trimarchi, 2012; Ramskold et al., 2012). SMART-seq provides high transcriptome coverage with intermediate sensitivity, but with low read coverage toward the 5′ end of transcripts and an underrepresentation of transcripts with high GC content (Picelli et al., 2013). SMART-seq2 improved on this technique through (i) a refinement of the RT and TS steps and (ii) a pre-amplification step that increases cDNA yield. This improves gene detection, especially for genes with high GC content, and lowers technical biases. However, this method does not preserve the strand specificity of mRNAs and remains labor intensive, since samples can only be pooled just prior to sequencing. This limitation can now be overcome with automated platforms or microfluidic devices such as those from Fluidigm (Picelli et al., 2013, 2014). Indeed, Clontech™ recently released the SMART-Seq v4 Ultra Low Input RNA Kit, which includes all the improvements of SMART-seq2 and can be used with the Fluidigm C1 instrument.
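The template-switching logic can be sketched as a string simulation. The two PCR handle sequences below are hypothetical placeholders rather than the oligonucleotides of any kit, RNA is written in the DNA alphabet for simplicity, and the model only captures one point: a cDNA acquires the second known sequence, and thus becomes PCR-amplifiable, only if the RT reaches the 5′ end of the mRNA.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical PCR handles (placeholders, not the sequences of any commercial kit).
HANDLE_1 = "CTACGATCGTAGCTAG"  # 5' tail of the oligo-dT RT primer
HANDLE_2 = "GCTTAGGCATCCGTAA"  # 5' part of the template-switching oligo (TSO)
POLY_T = 20                    # oligo-dT length


def first_strand_cdna(mrna: str, rt_stop: Optional[int] = None) -> str:
    """Simulate first-strand synthesis on an mRNA written 5'->3' (DNA alphabet)
    that ends with a poly(A) tail of at least POLY_T bases.

    rt_stop: number of bases the RT copies beyond the primer before falling off;
             None means the RT reaches the 5' end of the mRNA.
    """
    primer = HANDLE_1 + "T" * POLY_T          # anneals to the poly(A) tail
    template = mrna[:-POLY_T]                 # part of the mRNA left to copy
    copy = revcomp(template)
    reached_5p_end = rt_stop is None or rt_stop >= len(copy)
    cdna = primer + (copy if reached_5p_end else copy[:rt_stop])
    if reached_5p_end:
        # Non-templated Cs are added; the TSO's G residues pair with them and
        # the RT switches template, copying HANDLE_2.
        cdna += "CCC" + revcomp(HANDLE_2)
    return cdna


def is_pcr_amplifiable(cdna: str) -> bool:
    # PCR needs a known sequence at both ends of the molecule.
    return cdna.startswith(HANDLE_1) and cdna.endswith(revcomp(HANDLE_2))


mrna = "ATGGCCATTGTAATGGGCCGC" + "A" * 30
print(is_pcr_amplifiable(first_strand_cdna(mrna)))             # True
print(is_pcr_amplifiable(first_strand_cdna(mrna, rt_stop=5)))  # False
```

Truncated cDNAs lack the second handle and are not exponentially amplified, which is the basis for the full-length enrichment described above.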
Both SMART-seq and SMART-seq2 are full-length RNA sequencing methods based on exponential PCR amplification. Other methods were developed using in vitro transcription (IVT), which allows linear RNA amplification. The CEL-seq (Cell Expression by Linear amplification and Sequencing) method combines IVT with a specific barcoding system. Briefly, the RNA of each single cell is uniquely barcoded during reverse transcription and, after second-strand synthesis, the reactions are pooled for IVT. The amplified RNA is then fragmented, purified and sequenced (Hashimshony et al., 2012). CEL-seq preserves strand specificity and has high barcoding efficiency. However, CEL-seq has a strong 3′ bias and low sensitivity for lowly expressed transcripts (Hashimshony et al., 2012). An updated version of this method, CEL-seq2, offers higher sensitivity, lower costs and faster processing (Hashimshony et al., 2016). STRT-seq (single-cell tagged reverse transcription sequencing) evolved from SMART-seq and was modified into a tag-based method by introducing 6-base barcodes at the 3′ end. Additional modifications were implemented to increase TS efficiency (Islam et al., 2011, 2012). The original STRT-seq protocol was subsequently simplified, improved and adapted to the C1 Single-Cell Auto Prep System (Fluidigm), a version known as STRT/C1 (Islam et al., 2014). STRT-seq and STRT/C1 display high sensitivity and good coverage, although they are still limited by a 5′-end bias.
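Pooling barcoded reactions, as in CEL-seq, requires assigning each sequenced read back to its cell of origin. The sketch below shows exact-match demultiplexing under a simplifying assumption: each read begins with a hypothetical 8-nt cell barcode. Real protocols define their own read layouts, and the barcode whitelist here is invented for illustration.

```python
from collections import defaultdict

# Assumption for illustration: each read starts with an 8-nt cell barcode,
# followed by the cDNA-derived sequence. Real protocols define their own read
# layouts (e.g. barcode on a separate mate of a read pair).
BARCODE_LEN = 8


def demultiplex(reads, barcode_to_cell, barcode_len=BARCODE_LEN):
    """Assign pooled reads back to their cell of origin by exact barcode match."""
    per_cell = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        cell = barcode_to_cell.get(barcode)
        if cell is None:
            unassigned.append(read)  # sequencing error in barcode, or unknown barcode
        else:
            per_cell[cell].append(insert)
    return dict(per_cell), unassigned


barcodes = {"AACCGGTT": "cell_01", "TTGGCCAA": "cell_02"}  # hypothetical whitelist
pooled = ["AACCGGTTGATTACAGG", "TTGGCCAACCCTTTAGG",
          "AACCGGTTTTTTAAACC", "AACCGGTAGGGCCCAAT"]
cells, unknown = demultiplex(pooled, barcodes)
print({c: len(r) for c, r in cells.items()}, len(unknown))
```

Exact matching discards reads whose barcode carries a sequencing error; practical pipelines often tolerate one mismatch when barcodes are designed to be far enough apart.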
Single-cell RNA-seq methods have greatly improved in efficiency, speed and technical error rates. Further improvements are still being developed, aiming at scalable approaches compatible with routine analyses that allow digital gene expression profiling of thousands of single cells across an arbitrary number of genes, without robotics or automation. For these reasons, novel methods have been combined with established single-cell RNA-seq techniques, such as Cyto-seq, which reduces reaction volumes (picoliter wells), and Drop-seq and inDrop, which are based on the physical compartmentalization of cells in emulsion droplets (Brouzes et al., 2009; Hummer et al., 2016; Mazutis et al., 2013).
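In droplet-based methods, cells are loaded at limiting dilution, and the number of cells per droplet is commonly modeled as Poisson-distributed. Assuming that model (the loading rates below are hypothetical), the sketch shows the trade-off between wasted empty droplets and multiplets, i.e. droplets in which two or more cells share one compartment.

```python
import math


def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)


def droplet_occupancy(lam):
    """Occupancy of droplets when cells are loaded at random with a mean of
    `lam` cells per droplet (Poisson assumption).

    Returns (empty, single, multiple, multiplet_rate), where multiplet_rate is
    the share of cell-containing droplets that hold more than one cell.
    """
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return empty, single, multiple, multiple / (1.0 - empty)


for lam in (0.05, 0.1, 0.5):
    empty, single, multiple, rate = droplet_occupancy(lam)
    print(f"lambda={lam}: empty={empty:.3f} single={single:.3f} "
          f"multiplet rate={rate:.3f}")
```

Loading about one cell per ten droplets leaves most droplets empty but keeps the multiplet rate near 5% under this model.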

DNA/RNA dual sequencing
Single-cell technology was further improved with the recent development of methods that analyze both the genome and the transcriptome of the same single cell. Two methods have been described for this dual-omics approach: genome and transcriptome sequencing (G&T-seq) and DNA and RNA sequencing (DR-seq) (Table 4). In G&T-seq, poly(A) RNA is captured by a biotinylated oligo-dT primer and thus separated from genomic DNA; the genome and the transcriptome are then amplified and sequenced in parallel (Macaulay et al., 2015). The full-length mRNA is amplified using SMART-seq2, and the genomic DNA can be amplified using different whole-genome amplification (WGA) methods. In contrast, DR-seq does not separate RNA from DNA before amplification; instead, cells are lysed and RNA is directly reverse-transcribed (similarly to CEL-seq), and both genomic DNA and cDNA are then amplified by quasi-linear whole-genome amplification (MALBAC) (Dey et al., 2015).
Recently, a further step has been taken toward capturing all layers of information from a single cell. New methods are emerging that combine RNA and/or DNA SCS with epigenome sequencing, either by sequencing the single-cell methylome through reduced-representation bisulfite sequencing (scRRBS) (Hou et al., 2016) or by investigating chromatin interactions and chromosomal organization (3D genome structure) through single-cell Hi-C (Furlan-Magaril et al., 2015; Nagano et al., 2015).

Computational analysis of single-cell sequencing data
The technically challenging steps of SCS experiments, such as capturing single cells, handling minute amounts of DNA or RNA, and amplifying this material, introduce systematic noise and biases into the data. These complicate data processing and make it difficult to discriminate bona fide cell-to-cell variation from technical noise. Several tools that take these issues into consideration are currently available for the analysis of single-cell sequencing data.
Initial preprocessing of SCS data typically involves the same steps as for conventional bulk DNA or RNA sequencing, namely read trimming, quality control and mapping to the reference genome, using the same tools as for bulk sequencing (Tables 5 and 6). However, SCS datasets contain high levels of artifacts and noise, such as allelic dropout, non-uniform coverage and spurious sequencing errors; after alignment, they therefore require a tailored downstream analysis (Gawad et al., 2016; Li et al., 2016; Stegle et al., 2015). Quality assessment of each individual single-cell sample is necessary. This step consists of inspecting different preprocessing results, such as read count, percentage of uniquely aligned reads, genome coverage and allelic dropout. These features allow the detection of low-quality samples from broken or dead cells and the selection of viable datasets relevant for further study of biological questions (Bacher and Kendziorski, 2016; Gawad et al., 2016). Ilicic et al. proposed a complete pipeline for RNA-SCS data processing, including a classification method to discard unreliable single-cell datasets. Their approach accounts for features such as read count, read mappability, proportions of non-exonic and mitochondrial reads, and transcriptome variance (Ilicic et al., 2016).
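A per-cell quality filter based on the kinds of features listed above can be sketched in a few lines. The thresholds and example statistics below are hypothetical; suitable cut-offs depend on the protocol, sequencing depth and cell type, and this simple rule set is not the classifier of Ilicic et al.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; appropriate values depend on the
# protocol, sequencing depth and cell type.
MIN_READS = 100_000
MIN_UNIQUE_FRACTION = 0.5
MAX_MITO_FRACTION = 0.2


@dataclass
class CellStats:
    cell_id: str
    total_reads: int
    uniquely_mapped: int
    mitochondrial: int  # uniquely mapped reads falling on mitochondrial genes

    @property
    def unique_fraction(self) -> float:
        return self.uniquely_mapped / self.total_reads if self.total_reads else 0.0

    @property
    def mito_fraction(self) -> float:
        return self.mitochondrial / self.uniquely_mapped if self.uniquely_mapped else 1.0


def qc_failures(cell: CellStats) -> list:
    """Return the reasons a cell fails QC (an empty list means it passes)."""
    reasons = []
    if cell.total_reads < MIN_READS:
        reasons.append("low read count")
    if cell.unique_fraction < MIN_UNIQUE_FRACTION:
        reasons.append("low unique alignment rate")
    if cell.mito_fraction > MAX_MITO_FRACTION:
        # A high mitochondrial share is often read as a sign of a broken cell
        # that lost cytoplasmic mRNA.
        reasons.append("high mitochondrial fraction")
    return reasons


cells = [
    CellStats("A01", 850_000, 700_000, 35_000),
    CellStats("A02", 60_000, 25_000, 2_000),
    CellStats("A03", 500_000, 400_000, 150_000),
]
for c in cells:
    print(c.cell_id, qc_failures(c) or "pass")
```

Reporting the reasons, rather than a single pass/fail flag, makes it easier to inspect why a cell was discarded before deciding on thresholds.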
Because of false positives specific to whole-genome amplification methods (de Bourcy et al., 2014), non-uniform coverage, sequencing errors and high rates of allelic dropout, detecting copy number variants (CNVs) or single nucleotide variants (SNVs) in DNA-SCS data is difficult. To ease variant detection, a handful of studies have attempted to quantify the different biases in single-cell data and proposed normalization approaches to correct for them (Table 5; Cheng et al., 2011; Zhang et al., 2015). Zong et al. used bulk DNA sequencing data to train a mixture model that identifies SNVs in DNA-SCS data, and a hidden Markov model (HMM) with a predefined transition matrix for CNV detection (Zong et al., 2012). Another recently proposed approach for SNV calling in DNA-SCS data is Monovar. Using a statistical model, the method accounts for allelic dropout, uneven coverage and false positives, and calculates for each locus the posterior probability that it carries an SNV (Zafar et al., 2016). DNA-SCS also allows de novo genome assembly of viruses and other organisms that are not easy to culture for bulk sequencing and assembly. Many traditional genome assemblers rely on the assumption of uniform read coverage, which does not hold for DNA-SCS data. Novel assembly methods such as SPAdes (Bankevich et al., 2012) and IDBA-UD (Peng et al., 2012) tackle issues such as uneven coverage, amplification errors and spurious reads, and perform appropriate k-mer correction.
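Why allelic dropout must be modeled can be shown with a deliberately simplified Bayesian genotype call for one locus in one cell. This is not Monovar's model; the error rate, dropout rate and genotype priors below are hypothetical, and the heterozygous likelihood simply mixes "both alleles amplified" with "one allele lost".

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def snv_posterior(alt, depth, error=0.01, dropout=0.0,
                  prior=(0.998, 0.001, 0.001)):
    """Posterior probability that one cell carries a variant at one locus.

    Genotypes: 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
    alt/depth: alternate-allele reads / total reads at the locus.
    error:     per-read sequencing error rate (hypothetical value).
    dropout:   probability that one allele of a heterozygous site fails to amplify
               (allelic dropout); either allele is assumed equally likely to drop.
    prior:     hypothetical genotype priors.
    """
    lik_ref = binom_pmf(alt, depth, error)
    lik_alt = binom_pmf(alt, depth, 1 - error)
    lik_het = ((1 - dropout) * binom_pmf(alt, depth, 0.5)
               + dropout / 2 * lik_ref     # alternate allele dropped out
               + dropout / 2 * lik_alt)    # reference allele dropped out
    weighted = [prior[0] * lik_ref, prior[1] * lik_het, prior[2] * lik_alt]
    return (weighted[1] + weighted[2]) / sum(weighted)


# Low coverage, no alternate reads: modeling dropout makes it harder to rule out
# a heterozygous variant, so the posterior rises (but stays small).
print(snv_posterior(alt=0, depth=3, dropout=0.0))
print(snv_posterior(alt=0, depth=3, dropout=0.3))
# Half of the reads support the alternate allele: strong evidence for a variant.
print(snv_posterior(alt=5, depth=10, dropout=0.3))
```

Ignoring dropout makes a low-coverage locus look more confidently homozygous than the data justify, which inflates false negatives for heterozygous SNVs.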
The single-cell assembler SPAdes comes with an arsenal of genome assembly tools capable of handling diploid genomes with numerous polymorphisms (dipSPAdes; Safonova et al., 2015), Illumina TruSeq barcoded reads (TruSPAdes) or RNA assembly (rnaSPAdes). Nikolenko et al. proposed a novel k-mer error-correction method that uses connected components in Hamming graphs and Bayesian clustering of these components prior to genome assembly (Nikolenko et al., 2013).
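The Hamming-graph idea can be sketched on toy reads: k-mers differing at a single position are joined into connected components, and rare erroneous k-mers are mapped to an abundant neighbor. In this simplified sketch, a majority rule stands in for the Bayesian sub-clustering of the published method, and the all-pairs comparison is only practical for tiny inputs.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(reads, k):
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def hamming_components(kmers, max_dist=1):
    """Connected components of the graph whose edges join k-mers that differ
    at <= max_dist positions (all-pairs comparison: fine for a toy, too slow
    for real data)."""
    parent = {km: km for km in kmers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(kmers, 2):
        if hamming(a, b) <= max_dist:
            parent[find(a)] = find(b)
    groups = {}
    for km in kmers:
        groups.setdefault(find(km), []).append(km)
    return list(groups.values())


def correction_map(counts, max_dist=1):
    """Map each k-mer to the most abundant k-mer of its component.
    Simplification: a majority rule stands in for Bayesian sub-clustering."""
    mapping = {}
    for component in hamming_components(list(counts), max_dist):
        center = max(component, key=lambda km: (counts[km], km))
        for km in component:
            mapping[km] = center
    return mapping


reads = ["ACGTACGTAC"] * 10 + ["ACGTACCTAC"]  # last read carries one error (G->C)
fixes = correction_map(kmer_counts(reads, k=5))
print({km: c for km, c in fixes.items() if km != c})
```

A majority rule fails when a genuine variant k-mer has a neighbor of similar abundance, which is one motivation for the probabilistic clustering used in practice.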
After preprocessing, alignment and cell quality control, the next computational task in RNA-SCS is gene expression quantification. Raw gene expression counts can be obtained with existing tools for bulk RNA-seq, such as HTSeq (Anders et al., 2015). Note that relative expression measures such as reads per kilobase per million mapped reads (RPKM) are not appropriate for RNA-SCS data because they rely on unsuitable assumptions, such as homogeneous RNA content and gene expression across cells (Bacher and Kendziorski, 2016). After normalization of the read counts, typical downstream analyses include differential expression, identification of cellular subpopulations based on heterogeneous gene expression patterns, study of splicing heterogeneity across cells, pseudo-temporal ordering of cells and functional analysis (Table 6).
With only approximately 10-20% of transcripts being successfully reverse-transcribed in RNA-SCS (Islam et al., 2014; Kolodziejczyk et al., 2015), and with numerous sources of variation and technical noise (Grün et al., 2014), normalization of raw gene expression counts is challenging. The key requirement in this step is to deconvolve technical and biological variation. Several computational tools use spike-ins to quantify technical noise and identify genes showing significant biological variation across cells (Table 6; Brennecke et al., 2013; Kim et al., 2015a; Vallejos et al., 2015, 2016). In the absence of spike-ins, normalization methods rely on global scaling factors, as proposed in the DESeq tool (Anders and Huber, 2010). However, this may not be well suited to single-cell data, as global scaling factors assume homogeneous RNA amounts across cells, which might not be the case (Bacher and Kendziorski, 2016). Furthermore, RNA-SCS data contain a large number of dropout events, so gene expression profiles contain a high proportion of zeros. Some computational tools model dropout events as a separate component to infer the expected gene expression levels in each cell (Finak et al., 2015; Kharchenko et al., 2014). Another approach pools read counts from different cells and deconvolves the pooled size factors into cell-specific size factors (Lun et al., 2016).
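The DESeq-style global scaling mentioned above is usually computed as a median of ratios to a per-gene geometric mean. The minimal NumPy sketch below, run on hypothetical count matrices, implements that calculation and shows the practical problem dropouts create: genes with a zero in any cell have a geometric mean of zero and must be discarded, so sparse single-cell data leave few genes to estimate from.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """DESeq-style size factors for a genes x cells count matrix.

    Each gene's geometric mean across cells serves as a pseudo-reference; a cell's
    size factor is the median of its count/reference ratios. Genes with a zero in
    any cell have a geometric mean of zero and must be excluded.
    Returns (size_factors, number_of_genes_used).
    """
    counts = np.asarray(counts, dtype=float)
    usable = np.all(counts > 0, axis=1)
    if not usable.any():
        raise ValueError("no gene is detected in every cell")
    log_counts = np.log(counts[usable])
    log_ref = log_counts.mean(axis=1, keepdims=True)  # log geometric means
    return np.exp(np.median(log_counts - log_ref, axis=0)), int(usable.sum())


# Cell 2 was sequenced twice as deeply as cell 1 (hypothetical counts).
bulk_like = np.array([[10, 20], [20, 40], [30, 60], [5, 10]])
sf, n_used = median_of_ratios_size_factors(bulk_like)
print(round(sf[1] / sf[0], 6), n_used)  # 2.0 4

# With dropout-heavy single-cell data, few genes are nonzero in every cell.
sparse = np.array([[10, 0, 3], [0, 40, 7], [30, 60, 0], [5, 10, 2]])
print(median_of_ratios_size_factors(sparse)[1])  # 1
```

With thousands of cells, the set of genes detected in every cell can shrink to nothing, which is one reason pooling-based and dropout-aware approaches were developed.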
Clustering single cells based on their gene expression profiles stratifies the total population of single cells into subpopulations or subtypes of functional relevance. Several studies have performed this task using conventional methods such as principal component analysis (PCA), hierarchical clustering or k-means (Table 6; for a detailed review see Bacher and Kendziorski, 2016). However, many classical clustering methods may be vulnerable to confounding factors, such as the cell cycle, which should be corrected for.
For this reason, novel clustering methods tailored to single-cell gene expression data account for hidden confounding effects and robustly identify different subgroups of cells (Buettner et al., 2015). The clustered cells can then be organized into so-called cellular hierarchies according to their transcriptional state profiles. This type of analysis provides a glimpse of the heterogeneity in transcription and regulatory dynamics. Juliá et al. introduced the Sincell R package, which provides a comprehensive and rigorous workflow for identifying cell-state hierarchies. Sincell lets the user adapt the analysis pipeline and choose between alternative approaches to identify, evaluate and interpret cell-state hierarchies (Juliá et al., 2015). RNA-SCS also makes it possible to address other biological questions that are computationally challenging, such as inferring gene expression kinetics (Kim and Marioni, 2013) and analyzing splicing heterogeneity across cells (Shalek et al., 2013; Welch et al., 2016).
Table 6. Computational methods designed for analysis of RNA-SCS data.
• Normalization based on Poisson bootstrapping using spike-ins; statistical test for identifying differential expression.
• Zero-inflated factor analysis (ZIFA), which mitigates the effect of dropout events for dimensionality reduction and clustering: Pierson and Yau (2015).
• MAST, a generalized linear model for analyzing time-series RNA-SCS data, performing differential expression assessment and gene set enrichment analysis: Finak et al. (2015).
• Shared nearest neighbor clustering approach based on clique joining (SNN-Cliq): Xu and Su (2015).
• RaceID, a k-means-based clustering method focused on identifying rare cells.
• BackSPIN single-cell biclustering: Zeisel et al. (2015).
• pcaReduce, which combines PCA and hierarchical clustering to identify novel cell types: Žurauskienė and Yau (2016).
• A method that applies classical clustering methods to transcript compatibility counts from SCS data.
• Splicing heterogeneity analysis: SingleSplice constructs a weighted splice graph and fits Gamma regression models to spike-in measurements to estimate technical variability due to noise and dropout events, then uses bootstrapping to identify biological isoform variation across cells: Welch et al. (2016).
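A conventional PCA-then-k-means pipeline of the kind cited above can be sketched with NumPy alone. The data are simulated (two hypothetical subpopulations, each with its own block of marker genes), the k-means uses a farthest-first initialization chosen here for determinism, and no confounder such as the cell cycle is modeled; none of the specialized tools of Table 6 is reproduced.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project cells (rows of X, e.g. log-normalized expression) onto the top
    principal components computed by SVD of the gene-centered matrix."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialization."""
    rng = np.random.default_rng(seed)
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[dist.argmax()])  # farthest point from existing centers
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Simulated data (hypothetical): 2 subpopulations of 20 cells x 50 genes, each
# with its own block of 10 highly expressed marker genes.
rng = np.random.default_rng(1)
pop_a = rng.normal(0.0, 0.3, size=(20, 50))
pop_a[:, :10] += 3.0
pop_b = rng.normal(0.0, 0.3, size=(20, 50))
pop_b[:, 10:20] += 3.0
X = np.vstack([pop_a, pop_b])

labels = kmeans(pca_scores(X, n_components=2), k=2)
print(labels)
```

If a strong confounder (for example, cycling versus non-cycling cells) dominated the first components, this pipeline would cluster on it instead, which is the weakness the tailored methods address.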

Heterogeneity of cell population reflects differences in viral infection
When studying viral infection, the virus cannot be separated from its host cell and, consequently, from the intrinsic heterogeneity of the cell population. Differences in cell subpopulations or subsets, or in cell cycle or differentiation stage, are reflected in the specific molecular composition of cells, both in the nature and in the amount of RNA molecules and proteins, thereby influencing viral replication and pathogenesis. This composition also defines cell susceptibility and permissiveness to viral infection.
Many examples link cell heterogeneity to differences in susceptibility to viral infection. For example, the differentiation of monocytes into macrophages changes their susceptibility to viral infection: macrophages are more susceptible than monocytes to human influenza A virus (IAV) (Hoeve et al., 2012) and to human immunodeficiency virus (HIV) infection. In the case of HIV, this differential susceptibility was shown to depend on the cell differentiation status, correlating with differences in the expression of genes such as the APOBEC3A and APOBEC3G restriction factors (Peng et al., 2007) and the CCR5 co-receptor (Tuttle et al., 1998), or with the expression of miRNAs that target the 3′ UTR of HIV-1 transcripts (Wang et al., 2009).
The cell cycle can also determine the efficiency of viral infection, as illustrated by retrovirus infections (Katz et al., 2005). Upon cell entry, the positive-sense, single-stranded RNA genome common to all retroviruses is first reverse-transcribed into double-stranded DNA and then integrated into the host cell genome. To achieve this last step, the murine leukemia virus (MLV), a gammaretrovirus, requires the cell to be dividing, so that the nuclear membrane is disrupted and the viral DNA can be tethered to the host chromatin (reviewed in Ciuffi, 2016; Suzuki and Craigie, 2007). In contrast, the avian sarcoma virus (ASV), an alpharetrovirus, can integrate its genetic material into the genome of non-dividing cells, but requires cells to be in S phase to do so efficiently (Humphries et al., 1981). The human foamy virus (HFV), a spumavirus, is unable to productively infect G1/S or G2 growth-arrested cells and thus requires cell cycle progression to replicate successfully (Bieniasz et al., 1995; Patton et al., 2004). HIV, the prototypical lentivirus, can infect both dividing and non-dividing cells. However, depending on cell division, HIV DNA reaches the nuclear chromatin differently, owing to the absence or presence of an intact nuclear envelope and thus to the interaction with nuclear pore components. The cell cycle therefore determines the route by which the viral genome reaches the chromatin, contributing to integration site selection and potentially affecting viral gene expression (Ciuffi, 2016; Demeulemeester et al., 2015).
In addition, viral infection itself triggers the expression of numerous cellular genes, and this response can differ from cell to cell, adding to cellular heterogeneity. This is illustrated by a study from Mohammadi et al., in which HIV replication in SupT1 cells (a CD4+ T cell line) was followed over time (Mohammadi et al., 2013). Kinetic differences between cells appeared early in the course of infection: most cells reverse-transcribed the viral RNA genome within 7 to 10 h after virus exposure and expressed virus-encoded green fluorescent protein (GFP) within 22 to 25 h. However, a minority of cells did so earlier or later, reverse-transcribing as early as 3 h or as late as 23 h post-exposure, and expressing GFP as early as 16 h or as late as 32 h post-exposure. Similarly, Holmes et al. investigated the dynamics of the HIV-1 life cycle using time-to-inhibitor-addition experiments in MT4 cells (a CD4+ T cell line) and observed similar dynamics of viral progression, although on a different time scale (Holmes et al., 2015). These studies clearly illustrate that individual cells differ in their ability to support productive infection.
Lastly, differences in the innate immune response, due to cell type or cell state, can affect the outcome of viral infection. Differential expression of antiviral effector molecules can contribute to cell permissiveness to viral infection, at least in vitro. Several studies comparing the permissiveness of cell lines to HIV infection revealed critical differences in the expression of such proteins and led to the discovery of interferon (IFN)-induced antiviral proteins, also called restriction factors, such as MX2 (Kane et al., 2013), APOBEC3G (Sheehy et al., 2002) or BST2/tetherin (Kluge et al., 2015;Neil et al., 2008). Similarly, Ye et al. compared the proteomes of two hepatocyte cell lines differing in their permissiveness to hepatitis C virus (HCV) infection (Ye et al., 2015). They observed that, compared with the more permissive Huh7.5.1 cell line, the less permissive HepG2 cell line expressed high levels of major vault protein (MVP), encoded by an IFN-inducible gene, which led to an inhibition of HCV replication.
Moreover, innate immune defenses that block viral replication are not expressed equally in all cell types, which contributes to differences in cell susceptibility to viral infection. For instance, primary CD4+ T cells are more susceptible to HIV infection when activated and less susceptible when quiescent (Doitsh and Greene, 2016;Jakobsen et al., 2015;Pan et al., 2013;Rausell et al., 2016). This was shown to be due, at least in part, to the differential expression of cellular restriction factors, such as SAMHD1, in these cell states. Similarly, transformed cell lines are usually more permissive to viral infection because they display various defects in the expression of innate immune defenses.
Population-level studies thus revealed cellular heterogeneity and its impact on the outcome of viral infection. Advances in single-cell technologies now make it possible to study viral infection in the context of this heterogeneity. Specifically, single-cell sequencing (SCS) makes it possible to evaluate viral diversity without the averaging bias of cell population measurements, and to study the impact of cellular heterogeneity on viral replication. SCS applied to virology is an emerging field. Studies so far have investigated viral diversity at the single-cell level, through viral genome analysis, or explored cellular variability in response to viral infection by evaluating the genome and transcriptome of single infected cells.

Viral genome analysis by SCS
Several studies have used SCS to better understand viral diversity and to identify different quasispecies at the single-cell level. McWilliam Leitch and McLauchlan analyzed viral RNA heterogeneity by establishing a system to amplify the HCV genome from single infected cells and analyzing quasispecies diversity (McWilliam Leitch and McLauchlan, 2013). Using RT-qPCR and deep sequencing of viral RNA, they determined the quasispecies composition of 16 individual cells and compared it with that of the total cell population. On average, a single cell contained 113 copies of replicon RNA (range, 84 to 160 copies). Moreover, HCV quasispecies composition differed significantly among cells, indicating a "cellular compartmentalization" of replicon RNA with distinct sequences, which may lead to divergent evolution. Although the wild-type sequence was the most prevalent in both single cells and the total population, single cells also harbored different variants that appeared as minor variants at the population level.
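A minimal sketch of how single-cell and population-level quasispecies data can be compared is given below. It assumes that per-variant read counts are already available for each cell and for the bulk population; the variant labels, counts and the 5% "minor variant" cutoff are hypothetical, and Shannon entropy is used here as one common diversity measure, not necessarily the metric used in the cited study.

```python
import math

def frequencies(counts):
    """Convert per-variant read counts into relative frequencies."""
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}

def shannon_entropy(counts):
    """Shannon entropy (natural log) of variant frequencies; 0 means a single variant."""
    return sum(-p * math.log(p) for p in frequencies(counts).values() if p > 0)

def minor_in_population(cell_counts, pop_counts, minor_cutoff=0.05):
    """Variants detected in one cell whose frequency in the bulk population is below minor_cutoff."""
    pop_freq = frequencies(pop_counts)
    return sorted(
        v for v, n in cell_counts.items()
        if n > 0 and pop_freq.get(v, 0.0) < minor_cutoff
    )

# Hypothetical read counts per variant ("WT" = wild-type sequence); not data from the cited study
population = {"WT": 900, "varA": 60, "varB": 30, "varC": 10}
cell_1 = {"WT": 70, "varC": 30}
cell_2 = {"WT": 100}

for name, cell in [("cell_1", cell_1), ("cell_2", cell_2)]:
    print(name, round(shannon_entropy(cell), 3), minor_in_population(cell, population))
```

In this toy example, cell_1 carries a variant that makes up only 1% of the bulk population but 30% of that cell's reads, which is the kind of signal that population-level sequencing would dilute.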
Combe et al. studied the genetic diversity of vesicular stomatitis virus (VSV), an RNA virus of the Rhabdoviridae family, by combining single-cell isolation with ultra-deep sequencing (Combe et al., 2015). Baby hamster kidney (BHK-21) cells were infected with VSV for 45 min, separated by micromanipulation and further incubated for 24 h to allow completion of viral infection. Supernatants were collected and used both to quantify infectious virion progeny by plaque assay and to investigate viral genome sequence variation by deep sequencing. Full-length sequences from 881 viral plaques, derived from 90 individual cells and corresponding to three viral generations, were compared with the initial viral genome input. In total, 532 single nucleotide polymorphisms were identified: 36 were already present in the viral stock and 496 arose de novo through spontaneous mutation, suggesting that new sequence variants emerge rapidly. The authors exploited this pre-existing genetic diversity to track viral infection at the single-cell level and showed that, in most cases, single cells were infected by more than one sequence variant of the virus, even at a low multiplicity of infection. Furthermore, the rate of spontaneous mutation in the viral progeny varied between single cells.
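Splitting the observed polymorphisms into pre-existing and newly arising ones amounts to comparing each progeny variant against the variants already present in the input stock. The sketch below illustrates that comparison under simplifying assumptions: mutations are encoded as hypothetical strings of the form reference base, position, alternative base (e.g. "A1234G"), variant calls for the stock and the progeny are assumed to be already available, and per-cell counts are not normalized by the number of genomes sequenced, which a real mutation-rate estimate would require.

```python
def classify_polymorphisms(stock_snps, progeny_snps):
    """Split progeny SNPs into those already present in the input stock and those arising de novo."""
    stock = set(stock_snps)
    progeny = set(progeny_snps)
    pre_existing = progeny & stock  # variants inherited from the viral stock
    de_novo = progeny - stock       # variants absent from the stock, i.e. new mutations
    return pre_existing, de_novo

def de_novo_per_cell(progeny_by_cell, stock_snps):
    """Number of distinct de novo SNPs found among the progeny of each single cell."""
    stock = set(stock_snps)
    return {cell: len(set(snps) - stock) for cell, snps in progeny_by_cell.items()}

# Hypothetical variant calls (not data from Combe et al., 2015)
stock = ["A1234G", "C2210T", "G5001A"]
progeny_by_cell = {
    "cell_01": ["A1234G", "T3100C"],
    "cell_02": ["C2210T", "G4410A", "A7002G"],
    "cell_03": ["A1234G"],
}

# Pool the progeny of all cells, then partition observed SNPs into stock-derived and new
all_progeny = set().union(*progeny_by_cell.values())
pre_existing, de_novo = classify_polymorphisms(stock, all_progeny)
print(len(all_progeny), "=", len(pre_existing), "+", len(de_novo))
print(de_novo_per_cell(progeny_by_cell, stock))
```

Because the two sets are disjoint by construction, their sizes always add up to the total number of observed polymorphisms, mirroring the 36 + 496 = 532 breakdown reported above.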
Building on a previously developed SCS assay (Josefsson et al., 2011), Josefsson et al. investigated the genetic composition of HIV in individual memory and naïve CD4+ T cells from lymph node tissue and peripheral blood. Single cells from patients were isolated by FACS sorting, and their HIV DNA was amplified by PCR and sequenced (Josefsson et al., 2013). The majority (90%) of infected CD4+ T cells in lymph node tissue and peripheral blood contained only one HIV-1 DNA molecule. Furthermore, the viral DNA sequences obtained by the SCS assay from memory and naïve T cells in lymph node tissue and peripheral blood were similar to each other and to HIV-1 RNA sequences from contemporaneous plasma of the same patients, implying an exchange of virus and/or infected cells between these compartments during untreated chronic infection.

DNA sequencing
Some studies have used DNA-SCS to investigate the heterogeneity of immune responses to viral infection at the single-cell level. Tanaka et al. studied the T-cell receptor (TCR) repertoire of cytotoxic T lymphocytes specific for human T-cell lymphotropic virus type 1 (HTLV-1) (Tanaka et al., 2010). HTLV-1 is the major causative agent of adult T-cell leukemia (ATL), an aggressive lymphoproliferative malignancy that has been associated with specific HLA alleles. Allogeneic hematopoietic stem cell transplantation has proved to be an effective treatment for ATL. Tanaka et al. used a single-cell approach to study the TCR repertoire of cytotoxic T cells from patients with specific HLA haplotypes who had or had not undergone allogeneic hematopoietic stem cell transplantation. They isolated single T cells by FACS, followed by RT-PCR and direct sequencing of the TCR V-D-J CDR3 region. In total, they analyzed 160 cells from two patients before treatment and 282 cells from three patients after transplantation. At the single-cell level, cytotoxic T cells in bone marrow and peripheral blood showed highly restricted, oligoclonal diversity both before and after allogeneic transplantation. The authors also observed that the P-D/P-R amino acid motif of CDR3 was conserved between unrelated ATL patients and within the same patient before and after transplantation.
Tsioris et al. used single-cell analysis to study the humoral response against West Nile virus (WNV) in recently infected individuals (Tsioris et al., 2015). Single antibody-secreting cells and stimulated memory B cells were identified and isolated by microengraving, and the corresponding antibodies were recovered, cloned and analyzed by next-generation sequencing (NGS). Single-cell analysis allowed the identification of WNV-specific antibodies. Moreover, the data showed that WNV-specific memory B cells and antibody-secreting cells persisted in post-convalescent individuals and that the antibody response was independent of whether the disease was asymptomatic or symptomatic.
Similarly, Cox et al. isolated dengue-neutralizing antibodies from single, FACS-sorted, antigen-specific human memory B cells (Cox et al., 2016). Antibody sequencing allowed the characterization of the diversity of antibodies produced by human memory B cells and led to the identification of a novel epitope on dengue virus serotype 2.

RNA sequencing
Most single-cell transcriptome studies have used RNA-seq technology to investigate cell-to-cell variability in the context of viral infection. Wu et al. studied the heterogeneity of HeLa S3 cells, a human papillomavirus (HPV)-infected cell line derived from a cervical cancer, by single-cell RNA-seq. They developed a customized full-length mRNA amplification and library construction system (MIRALCS, followed by a modified SMART-seq2 protocol) based on a high-throughput platform in which single-cell RNA and the resulting cDNA are prepared at nanoliter scale in a customized microwell chip. Using this method, they amplified full-length transcripts from 669 single HeLa S3 cells and randomly selected 40 individual cells for single-cell RNA sequencing. The total number of transcripts differed significantly among single cells (from 67,000 to 233,000). Cellular heterogeneity was observed at the level of gene expression, alternative splicing and chimeric RNA transcripts (originating from fusion genes, or from two different genes through trans-splicing, and translated into chimeric proteins). Moreover, they identified a set of genes that potentially interact with, or are regulated by, the E6 and E7 oncogenes.
A single-cell RNA-seq approach, combined with a dedicated bioinformatic pipeline (Juliá et al., 2015), was used to investigate cellular heterogeneity and identify biomarkers of HIV permissiveness (Ciuffi et al., 2015;Rato et al., 2016). Uninfected activated CD4+ T cells from one highly and one poorly susceptible individual were analyzed by single-cell RNA-seq (SMART-seq) using the Fluidigm C1 technology. RNA-seq profiles were successfully obtained from 85 highly permissive and 81 poorly permissive single cells, with ∼25 million reads per cell. Transcriptional heterogeneity translated into a continuum of intermediate cell states in cells from both the highly and the poorly permissive donor, mainly driven by TCR-mediated cell activation. Differentially expressed genes were further investigated as candidate biomarkers of HIV permissiveness.
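Comparing cells from two donors requires correcting for differences in sequencing depth between cells before looking for differentially expressed genes. The sketch below shows one common, simple scheme: counts-per-million (CPM) normalization with a log2(1 + x) transform, followed by the difference in mean log-CPM between two groups as a rough log2 fold-change. It is not the pipeline of Juliá et al. (2015); the gene names and counts are hypothetical, and a real analysis would add a statistical test and multiple-testing correction.

```python
import math

def log_cpm(counts):
    """Normalize one cell's raw counts to counts per million, then apply log2(1 + x)."""
    depth = sum(counts.values())
    return {gene: math.log2(1 + n * 1e6 / depth) for gene, n in counts.items()}

def mean_expression(cells, gene):
    """Mean log-CPM of a gene across a group of normalized cells."""
    return sum(cell[gene] for cell in cells) / len(cells)

def log2_fold_change(group_a, group_b, gene):
    """Difference in mean log-CPM between two groups (an approximate log2 fold-change)."""
    return mean_expression(group_a, gene) - mean_expression(group_b, gene)

# Hypothetical raw counts for two genes in cells from a highly and a poorly permissive donor
high_raw = [{"GENE_X": 50, "GENE_Y": 950}, {"GENE_X": 30, "GENE_Y": 970}]
low_raw = [{"GENE_X": 5, "GENE_Y": 995}, {"GENE_X": 0, "GENE_Y": 1000}]

high = [log_cpm(c) for c in high_raw]
low = [log_cpm(c) for c in low_raw]

for gene in ("GENE_X", "GENE_Y"):
    print(gene, round(log2_fold_change(high, low, gene), 2))
```

The pseudocount of 1 in log2(1 + x) keeps genes with zero counts finite; in this toy data GENE_X separates the two donors while GENE_Y does not.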
Martin-Gayo et al. studied at the single-cell level a subset of dendritic cells (DCs) from elite controller (EC) patients, which had previously been described to mount a more efficient immune response against HIV-1 infection (Martin-Gayo et al., 2015). DCs were first challenged with HIV-1, then isolated by FACS and processed for single-cell RNA-seq. Whole-transcriptome analysis of 85 cells revealed distinct transcriptional patterns in the expression of interferon-stimulated genes, cytokines, cytokine receptors and co-stimulatory molecules. Furthermore, single-cell analyses allowed the authors to identify specific markers of a highly functional DC subset with an improved ability to induce T cell proliferation (Martin-Gayo et al., 2016).
Single-cell analysis was also used to study the recent outbreak of Zika virus (ZIKV), a mosquito-borne flavivirus known to enter human skin fibroblasts, keratinocytes and immature dendritic cells (Hamel et al., 2015). Reports of infants with microcephaly born to infected mothers have recently increased markedly, establishing a pathological link between ZIKV and the central nervous system. Nowakowski et al. used single-cell RNA-seq (SMART-seq) to identify cell populations that could be susceptible to ZIKV (Nowakowski et al., 2016). Using the Fluidigm technology, they examined the expression of candidate ZIKV entry receptors across diverse cell types of the developing brain. They observed that the receptor AXL, known to mediate ZIKV and dengue virus entry into human skin cells, was highly expressed by radial glial cells, astrocytes, endothelial cells and microglia in the developing human cortex, and by progenitor cells in the developing retina. These results suggested that all these cell types may be particularly vulnerable to ZIKV infection.
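A receptor survey of this kind essentially asks, for each annotated cell type, what fraction of cells express a given entry factor. A minimal sketch follows, assuming that cells have already been annotated and that expression values are normalized (for example as log-CPM); the cell-type labels, expression values and detection threshold are hypothetical placeholders, not values from Nowakowski et al. (2016).

```python
from collections import defaultdict

def fraction_expressing(cells, gene, threshold=1.0):
    """Fraction of cells of each annotated type whose expression of `gene` exceeds `threshold`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for cell in cells:
        cell_type = cell["cell_type"]
        totals[cell_type] += 1
        # A gene absent from the expression dict is treated as not detected (0.0)
        if cell["expr"].get(gene, 0.0) > threshold:
            positives[cell_type] += 1
    return {cell_type: positives[cell_type] / n for cell_type, n in totals.items()}

# Hypothetical normalized expression values; cell-type labels are placeholders
cells = [
    {"cell_type": "type_1", "expr": {"AXL": 5.2}},
    {"cell_type": "type_1", "expr": {"AXL": 4.8}},
    {"cell_type": "type_2", "expr": {"AXL": 0.0}},
    {"cell_type": "type_2", "expr": {"AXL": 1.5}},
    {"cell_type": "type_3", "expr": {}},
]

print(fraction_expressing(cells, "AXL"))
```

Cell types in which most cells express the receptor become candidates for susceptibility; expression alone, however, does not demonstrate that these cells are actually infected.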

Conclusions and perspectives
Single-cell technologies have the potential to change the experimental approach to the study of viral infection. Specifically, single-cell sequencing makes it possible to examine the impact of cell-to-cell variability on the outcome of viral infection. It can capture viral diversity and evolution by identifying sequence variation within viral quasispecies, examine cellular heterogeneity through changes in the cell transcriptome, and explore the immune response to viral infection at the single-cell level.
The current challenge in the field is the joint study of viral and cellular heterogeneity. Ideally, this would require the ability to analyze the same single cell before and after viral infection, as the cellular state and gene expression evolve in response to the incoming virus. This would also help identify the effect of cellular heterogeneity on the outcome of infection. Future improvements of single-cell technologies, from cell isolation to biological sampling and analysis, should aim at a more comprehensive snapshot of single-cell composition.
Ultimately, single-cell-based technologies could be used as a "personalized" tool to identify individual cells in a cell population, in the same way as personalized medicine deals with different individuals. Indeed, personalized medicine originated from the initial identification of outlier individuals, for example individuals with a higher or lower response to a given drug treatment than the average population. A better characterization of these outlier individuals, by genotyping or by transcriptome or proteome profiling, provided the foundation for understanding the individual specificities underlying their differential response to drug treatment, including antivirals, thereby allowing drug treatment to be refined and personalized. Similarly, SCS is laying the groundwork for understanding individual cellular responses by characterizing individual cells at the level of their DNA, RNA and protein signatures.
Finally, novel technologies allowing a controlled and specific treatment of individual cells, followed by SCS, may help identify the cellular features that determine the quality of the individual cellular response to a given stimulus, drug or virus exposure. This should prove useful for future clinical applications, including personalized medicine.

Author contributions
SR, MG, AT and AC wrote the manuscript.

Conflicts of interest
The authors declare no conflict of interest.