The use of single-cell RNA-Seq to understand virus–host interactions

Single-cell analyses allow uncovering cellular heterogeneity, not only per se , but also in response to viral infection. Similarly, single cell transcriptome analyses (scRNA-Seq) can highlight speciﬁc signatures, identifying cell subsets with particular phenotypes, which are relevant in the understanding of virus– host interactions. This review presents a practical guide to help researchers design a scRNA-Seq study. It provides a simple overview of the scRNA-Seq procedure as well as a comparison between the scRNA-Seq workﬂows available to date.


Introduction
Recent technological advances coupled with mathematical modeling and computer science revolutionized the field of biology. Indeed, deep sequencing of the whole human genome in 2001 signed a first milestone in the 'omics era' [1,2]. From that point on, lot of efforts have been devoted to develop new chemistries and devices with high throughput capacity, such as Next Generation Sequencing (NGS) and Mass Spectrometry (MS), and to bring them to basic and clinical research [3 ,4]. The combination of multiple omics is known today as the 'integrome' and should allow a better and more comprehensive understanding of biological processes or diseases [5].
To date, a series of state of the art techniques is available for studies at population level resolution, with many biological applications, including cancer [6], stem cells [7], and infectious diseases [8,9]. These omic studies are likely to be valuable assets for personalized medicine [10][11][12].
Although essential and informative, population studies have some limitations as they contain a mixture of cells. As such, these cells can appear on one hand as being similar or homogeneous for some parameters, such as cell size or shape, tissue localization, protein expression or cell type subset. On the other hand, a cell population can also appear heterogeneous for some other parameters, such as cell cycle or metabolic activation. Thus, analyses at the cell population level may average and minimize individual cellular differences, potentially masking rare cells or cell subsets with a significant specific phenotype [13 ]. Illustrations of this can be found in cancer, where heterogeneity in intra-tumor cells at genetic, epigenetic and phenotypic level can lead to resistance in cancer therapies, as well as in infectious diseases where cell heterogeneity can reveal differential susceptibility to infections or different immunological responses [14 ,15,16]. In order to start investigating this cellular heterogeneity, single-cell analyses were initially performed using imaging techniques and further developed to benefit from sequencing technologies [14 ,17]. Single-cell analyses bring up several technical challenges, including single cell isolation, handling minimal amounts of biological sample (6 pg of DNA, 10 pg of RNA), and data normalization and analysis [5,[17][18][19][20][21][22]. Nowadays, an array of kits and methods are available, overcoming these challenges and allowing single cell omics analyses (reviewed in [23][24][25][26]).
Characterization of cell heterogeneity in the field of virology is obviously of great interest as it is likely impacting virus replication cycle and thus infection outcome. Indeed, the success of viral replication is highly dependent on its host, as viruses adapted to exploit the cell machinery to their own benefit. Interestingly, achieving universal infection in a given cell population is quite rare. Two non-mutually exclusive hypotheses can explain this unequal susceptibility to infection: (i) viral heterogeneity, that is, the presence of a mixture of competent, mutated and defective viral particles, displaying differences in infection ability, and (ii) cellular heterogeneity, that is, the presence of a mixture of cells with differences in metabolism, composition, activation status or cell cycle, resulting in specific cellular environments shaping virus progression success throughout the cell. In this perspective, single-cell analyses represent novel opportunities to identify specific cellular and molecular features promoting or in contrast restricting virus replication, thereby adding to the understanding of virus-host interactions, as well as providing new targets to inhibit viral replication.
To date, only a limited amount of single cell studies in virology have been published (reviewed in [14 ,27]).
Initial studies used single-cell technologies to explore virus evolution, through methods including fluorescent reporters, FACS analysis and sorting, as well as time lapse microscopy (recently reviewed in [27]), and it is only recently that transcriptome analyses at single cell level started to uncover cell subsets with specific response to viral infection. In this review, we will focus on the prominent role of single-cell RNA-Seq (scRNA-Seq) in virus-host interactions. In particular, we will provide an overview of the complete scRNA-Seq workflow as well as studies using this technology to identify specific features relevant for virus-host interactions.
Transcriptome analysis: single cell RNA-Seq workflow The scRNA-Seq analysis pipeline can be divided in three essential steps ( Figure 1): (i) single cell isolation, (ii) RNA-Seq procedure, that is, RNA extraction, reverse transcription (RT), library preparation and sequencing, and (iii) data analysis.

Single cell isolation
In order to perform single-cell RNA-Seq (scRNA-Seq) the first trivial step is to isolate single cells. To date single cell isolation methods can be divided according to two criteria: (i) the cell isolation capacity, that is, high or low throughput, depending of the level of automation and (ii) the cell selection method, that is, blind selection without any a priori or biased selection according to a specific parameter (Table 1). For example, some high throughput technologies such as Fluorescence Activated Cell Sorting (FACS) and Magnetic Activated Cell Sorting (MACS) allow a biased selection of the cell population based either on cell size and shape or on the expression of specific cell surface markers, while microfluidics-based and microdroplets-based technologies allow unbiased separation of single cells. Similarly, low throughput techniques that require micromanipulation, such as manual cell picking or Laser Capture Microdissection (LCM), also allow biased cell selection based on a specific cell morphology or the expression of a fluorescence marker. By contrast, limiting dilution-based methods are unbiased. (For more information on the isolation methods, please refer to [27][28][29]30 ].) RNA-Seq procedure: RNA extraction, reverse transcription, library preparation, and sequencing Standard protocols for library preparation have been optimized to use only a few ng (10-100 ng) of starting DNA material. However, one single cell contains on average only 10 pg of total RNA. Therefore, workflows for RNA extraction and library preparation were adapted and optimized to work with single-cell material.
Isolated single cells need first to be lysed in order to access RNA. This step can be performed through dedicated automated devices (such as fluidigm C1, 10X genomics chromium, ddSEQ single-cell isolator), or manually, using commercially available kits (such as GenElute TM Single Cell RNA Purification Kit (Sigma) or Single Cell Lysis Kit (Thermo Scientific)). Alternatively, manual cell lysis and total RNA purification can be performed simultaneously using resin-based columns (e.g. Single Cell RNA Purification Kit (NorGen Biotek Corp), PicoPure1 RNA Isolation Kit (Thermo Scientific)) or magnetic bead separation (e.g. 5 min Single Cell RNA Extraction Kit (Biofactories)). Afterwards, RNA is enriched for mRNA, either by polyA selection or more rarely by ribosome depletion (Table 2, Figure 1) [31,32]. The enriched RNA fraction is then usually reverse transcribed with modified oligodT primer through different protocols, either polyA tailing or template switching. During the RT step, some protocols allow tagging of single molecules with unique molecular identifiers (UMI), which are random hexanucleotides that can be used to quantify more precisely the number of initial mRNA molecules present in one single cell [33,34]. Following reverse transcription, cDNA is amplified by in vitro transcription or by PCR. The amplified cDNA library is then used for library preparation and high throughput sequencing (currently most protocols use Nextera for library preparation and Illumina as a sequencing platform) (for additional information on RNA-Seq procedures, please refer to [3 ,15,23,25,29,35,36 ,37 ]).

Single-cell data analysis
As every single cell is unique, it is not possible to perform experimental replicates and assess noise. It is thus essential to have some quality controls to ensure data reliability. This can be achieved by adding synthetic mRNAs of known sequence and amount, such as External RNA Controls Consortium (ERCC) RNA spike-ins, to each cell lysate. The number of reads recovered from the spike-ins will provide information about inter-sample technical variability [3 ,25,38 ,39].
It has been recently shown that sequencing errors in UMI sequence are common [34], and may thus bias transcript quantification when used. To minimize the error rate of UMI during sequencing, Lau et al. developed a new approach consisting in building a high error resistant exponentially-expanded barcodes (EXBs) with giga-scale diversity and introduce them into double strand cDNA by mean of a transposase [40]. Multiple packages for singlecell data analysis are already available and many computational tools are being developed to improve current analyses in order to handle bona fide cell-to-cell variation and technical noise [38 ,39,41,42,43 ,44,45 ,46]. Most of these analyses rely on cell-to-cell vicinity, or transcriptome similarity, thereby allowing a neighborhood-based graphical representation of individual cells, such as Principal Component Analysis (PCA) or t-distributed Stochastic Neighbor Embedding (t-SNE) plots. These plots are commonly used as they allow easy and intuitive visualization of cell heterogeneity and cell clusters that can be further analyzed by differential expression analysis. In conclusion, transcriptome analysis at single-cell level (scRNA-Seq) can reveal cell heterogeneity in a cell population, thereby highlighting the existence of one or multiple discrete cell clusters with specific gene expression profiles.
Identification of a cell-specific transcriptional signature correlating with a specific phenotype in response to viral infection Although valuable, scRNA-Seq alone does not provide any information on the association and relevance of a specific cell cluster (with a specific gene expression signature) with a specific cell phenotype. For this purpose, identification and assessment of a specific phenotype in response to viral infection is required.
Phenotypic analyses usually rely, either on the expression of a virus reporter gene or a specific cellular gene, or on the activity of a specific cellular protein ( Figure 1). Furthermore, phenotypic analyses can be performed either at single-cell level associated with scRNA-Seq or at population level independently of scRNA-Seq. Indeed, assessment of the phenotype of interest on the same single cell that will be lysed and processed for scRNA-Seq allows for direct overlapping of transcriptome and phenotype, thereby providing a direct and straightforward association between gene expression signature and specific phenotype. This is possible if single cells are first characterized by FACS or by microscopy for the expression of a fluorescent reporter gene before scRNA-Seq for example. Alternatively, specific phenotypes of interest need to be first characterized at population level. In this case, it will be helpful to identify two cell populations with extreme opposite phenotypes. For example, a cell population A is more permissive than a cell population B. Single-cell analysis of the two cell populations may then reveal imbalanced cell clustering, correlating with the observed phenotypes. For example, the cell population A contains a higher proportion of cells in cluster 3 and only few cells in the other two clusters, while cell population B contains only a few cells in cluster 3 and a higher proportion of cells in the other clusters. Cells in cluster 3 will thus represent permissive cells and will be characterized by a specific transcriptional signature.
In conclusion, joint analysis of scRNA-Seq profiles and phenotype assessment brings additional and valuable information, as it allows correlating a specific cell phenotype to a transcriptional signature (Figure 1). These analyses should help identifying novel biomarkers specific for viral phenotypes.  Figure 1 Legend Continued) finally sequencing. (iii) Sequencing reads are further processed and aligned to host and virus genome reference sequences to assess abundance and structure of transcripts, thereby defining the gene expression profile of each single cell. Multiple transcriptomic analyses can be performed to inform about data structure and cell heterogeneity using dimensionality reduction plots, such as t-Distributed Stochastic Neighbor Embedding (t-SNE) or Principal Component Analysis (PCA). Cell clustering and differential expression analysis can provide specific gene expression profiles. Finally, single cell transcriptomes can be compared to cell population transcriptome or to quantitative RT-qPCR analyses. These three first steps characterize the process for single-cell transcriptomic analysis, allowing mostly revealing the level of cellular heterogeneity in a cell population, that is, identifying one or multiple cell subpopulations. (iv) Phenotypic analyses at single cell level (with direct association with corresponding scRNA-Seq) or using two cell populations with extreme opposite phenotypes can complement transcriptome analyses. Phenotypic analyses include assessment of protein expression level and localization, using FACS staining, immunofluorescence or western blots, or assessment of protein activity studies, using assays quantifying enzymatic activity, transport activity (channels) or translocation assays (metabolites). The analysis of specific phenotypes should also reveal single cell heterogeneity (i.e. single cells displaying different levels of protein expression or activity) or differences between multiple cell populations. (v) Finally, correlation analysis between transcriptomic data and phenotypical analyses should help identifying specific cell subsets displaying a specific phenotype and characterized by a defined transcriptional and molecular signature.

Single-cell RNA-Seq in virology
To date, a few single-cell RNA-Seq studies have been performed, investigating virus or cell diversity (Table 3).
In 2013, McWilliam Leitch and McLauchlan published the first scRNA-Seq analysis in virology, aiming at exploring the heterogeneity of Hepatitis C virus (HCV) quasispecies and assessing their fitness [47]. They isolated 16 Huh7 single cells that express constitutively JFH1_2a subgenomic replicon, extracted total RNA and amplified a 708 bp-specific N5B region by RT-qPCR. The PCR products were then cloned and 20 clones per single cell were sequenced, providing a total of 320 viral sequences. They showed that one single cell harbored on average 113 copies of viral RNA and that the sequence of HCV quasispecies differed extensively among cells, with some single cells displaying only WT viral RNA sequences while others contained up to four different viral RNA sequences, and diverse mutations. These data highlighted the co-existence of multiple quasi-species within the same single cell, suggesting distinct individual viral evolution, likely due to mutations introduced during genome replication by the error-prone RNA-dependent RNA polymerase. Furthermore, fitness analysis of three major variants was compared to the prevalent WT replicon and showed that the best replicative fitness was displayed by the WT replicon, consistent with a selective advantage for WT replication and its high prevalence.
In 2015 Wu et al. developed a dedicated pipeline called mRNA amplification and library construction system (MIRALCS) to investigate both cellular and viral RNA of the tumoral HeLa S3 cell line, originally transformed by Human Papilloma Virus (HPV) [48]. scRNA-Seq analysis revealed a high degree of heterogeneity among individual virus-infected cells. In particular, they identified alternative splicing of HPV genome, resulting either in internal E6 splicing leading to a truncated and nonfunctional E6 protein, or in aberrant HPV-host sequence fusion transcripts. Furthermore, coupling transcriptomic data with co-expression analysis identified a set of 283 genes co-regulated with E6 and E7, representing novel potential HPV interactors.
Tsioris et al. used microengraving to identify and capture West Nile Virus (WNV)-specific actively secreting and memory B cells isolated from 11 WNV-infected individuals, followed by single cell RT-PCR and sequencing of WNV-specific antibodies (V, D, J, CDR3 sequences of heavy and light immunoglobulin chains) [49,50]. They successfully identified and isolated 98 WNV-specific cells out of thousands of cells and identified complete antibody sequence from 19 single cells. Phenotypic analyses were carried out to characterize the diverse antibodies, using Plaque Reduction Neutralization Assay and binding affinity to WNV Envelope protein. This strategy successfully led to the identification of four novel WNV neutralizing antibodies.

Afik et al. developed T cell receptor (TCR) Reconstruction Algorithm for Paired-End Single cell (TRAPeS)
, an algorithm facilitating reconstruction of full-length TCR sequences from scRNA-Seq-derived short reads, in order to investigate TCR heterogeneity among CD8+ T cells [51]. Their analysis revealed that naïve CD8+ T cells expressed distinct TCR, thereby suggesting a heterogeneous population, while effector memory CD8+ T cells displayed some levels of oligoclonality. Similarly, virusexposed CD8+ T cells, during vaccination protocols or chronic infections, presented various degrees of heterogeneity. Indeed, Yellow Fever Virus (YFV) and Hepatitis C virus (HCV)-exposed CD8+ T cells (short-term exposure during vaccination protocol) displayed higher levels of TCR heterogeneity than cytomegalovirus (CMV)exposed cells (long-term exposure during persistent infection). These data suggest that TCR heterogeneity and cell oligoclonality were negatively correlated and depended on the duration of virus exposure. ScRNA-Seq data of 353 single cells were further explored to assess cell heterogeneity of CMV and YFV-exposed CD8+ T cells. The analysis clearly separated clusters of naïve CD8+ T cells, effector CD8+ T cells and CMV-specific CD8+ T cells. By contrast, YFV-specific CD8+ T cells displayed some cellular heterogeneity, with one effector memory-like subpopulation and one naïve-like subpopulation. Further analysis revealed that YFV-specific CD8+ T cell subpopulations were associated with a distinct TCR sequence, displaying longer CDR3 sequences in the naïve-like subset.
Recently, two studies used scRNA-Seq to investigate Zika virus (ZIKV)-mediated neuropathogenesis [52,53]. First, Nowakosky and colleagues used scRNA-Seq to characterize the gene expression profile of multiple cell types of the developing human cortex, in order to correlate ZIKV tropism and pathogenesis [52]. In particular, they evaluated the expression of 23 gene candidates potentially involved in ZKV across cell types and identified AXL as being highly expressed in radial glial cells, in astrocytes, in microglia and in endothelial cells. Further analysis of AXL expression was performed by immunostaining of multiple tissues, including iPSC-derived cerebral organoids, highlighting AXL expression in radial glia (ventricular zone staining) and neural stem cells. AXL expression pattern seemed thus consistent with cells susceptible to ZIKV infection and cerebral lesions (microcephaly and ocular lesions), further arguing for a possible role of AXL in ZIKV entry.
Onorati et al. established cells lines derived from neuroepithelial stem (NES) cells isolated from primary neocortex (NCX) or from spinal cord (SC) in order to investigate ZIKV infection success according to tropism and gene expression profile (scRNA-Seq) [53]. They confirmed previous data showing high expression of AXL in NES cells and radial glial cells, as well as preferential localization of ZIKVinfected cells at the ventricular zone. ZIKV infection of Table 3 ScRNA-Seq studies in virology.   Key findings/outcome: -Latent CD4+ T cells were heterogeneous, with two distinct cell clusters -Latent CD4+ T cells displayed differential response to TCR-mediated stimulation -Transcriptional heterogeneity was linked to viral reactivation potential organotypic brain slices confirmed the absence of infected cells in the cortical plate and the preferential infection of cells residing in the ventricular zone such as radial glial cells. Over time, a progressive disruption of radial glia scaffold and disorganization of the neocortical structure was observed, consistent with brain damage. They also investigated the role of TNAK-binding kinase 1 (TBK1), previously involved in innate immunity and cell proliferation, in NES cell growth arrest and cell death. Although scRNA-Seq data did not reveal any difference in TBK1 gene expression, fetal cortex analyses revealed a higher level of phosphorylated TBK1 (pTBK1) in the ventricular zone. Furthermore, ZIKV infection relocated pTBK1 from mitotic centrosomes to the mitochondria in NES-infected cells, resulting in mitosis disruption and increased cell death, and consistent with ZIKV-mediated cerebral lesions.
Johnson et al. performed scRNA-Seq on glioblastoma tumors and glioblastoma cell lines, previously reported to be heterogenous, in order to investigate a possible role of CMV infection in glioblastoma [54,55]. The analysis of 854 single cells failed to detect CMV transcripts, suggesting either the absence or the rarity of CMV sequences. Therefore, these data indicated that glioblastoma tumors may not contain active CMV.
Recent studies on the Human Immunodeficiency Virus (HIV) used single-cell RNA-Seq to identify the transcriptional signature of specific cell subsets, such as dendritic cells from elite controllers, highly permissive CD4+ T cells or inducible latently infected CD4+ T cells [56][57][58]. Martin-Gayo et al. characterized conventional dendritic cells (cDCs) isolated from elite controllers, at the single-cell level, in order to understand if these cells contributed to the improved response observed in these individuals [56,59]. Using scRNA-Seq, they identified three cDC subsets after HIV exposure: one subset similar to control unexposed cells, one subset of exposed cDCs, and a third exposed cDC subset with a different expression profile, showing upregulated antiviral response (interferon-stimulated genes and cytokines) and effective antigen-presenting properties. This third functional subset was enriched in elite controllers compared to chronic progressors and uninfected individuals, and may thus contribute to the improved immunological control observed in elite controllers.
Rato et al. identified two CD4+ T cell populations (isolated from two individuals) that, upon TCR-mediated stimulation, displayed high and low permissiveness to HIV infection [57]. scRNA-Seq analysis of non-infected activated CD4+ T cells revealed transcriptional heterogeneity, with individual cells reflecting a continuum of cell states and correlating with cellular activation status. High permissive cells contained more individual cells with successful activation profile than low permissive cells. Analysis of cell surface protein expression was further assessed by FACS and correlated with permissiveness to HIV. Correlation analysis of scRNA-Seq, cell activation, protein expression and HIV permissiveness identified candidate cellular markers marking HIV permissive cells. A few of these biomarkers were further validated as biomarkers of HIV permissiveness. Subsequent transcriptome analysis of biomarker-sorted cell subpopulations identified specific transcriptional features associated with supported viral replication. In particular, highly permissive cells displayed downregulated expression of 95 genes involved in antiviral response.
Using a similar approach, Golumbeanu et al. investigated the cell heterogeneity of HIV-infected cells during latent and activated stages, in a primary model of latently infected CD4+ T cells [58,60]. Using a virally-encoded fluorescent reporter gene to assess successful induction of HIV expression coupled with scRNA-Seq analysis, Golumbeanu et al. identified two cell subsets, with differential responsiveness to cell stimulation and HIV reactivation. One cell cluster displayed globally higher gene expression levels, and was more susceptible to cellular stimulation and HIV expression than the other cell cluster. Differential expression analysis also identified a 134 genespecific signature to discriminate the two cell subsets.

Conclusions
Overall, scRNA-Seq is a powerful technique allowing characterizing cellular heterogeneity within a cell population. Association of transcriptome profilings with specific cellular phenotypes may help identifying specific proteins or biomarkers of interest involved in virus-host interactions, such as antibodies, TCR, virus receptor or cell-specific biomarkers.
Current analyses capture a unique snapshot of a singlecell gene expression profile at a specific time point, and do not allow manipulation of a single cell with follow-up analyses over time. Future technical improvements may possibly overcome this limitation, at least partially. Furthermore, development of novel protocols allowing simultaneous analysis of multiple molecular molecules, such as DNA, RNA and proteins, within the same individual cell would provide an additional step toward a more comprehensive 'single-cell integrome', which will likely be instrumental to further our understanding of viral replication and virus-host interactions.