Cell-to-Cell Variation in Defective Virus Expression and Effects on Host Responses during Influenza Virus Infection.

Virus and host factors contribute to cell-to-cell variation in viral infections and determine the outcome of the overall infection. However, the extent of the variability at the single-cell level and how it impacts virus-host interactions at a system level are not well understood. To characterize the dynamics of viral transcription and host responses, we used single-cell RNA sequencing to quantify at multiple time points the host and viral transcriptomes of human A549 cells and primary bronchial epithelial cells infected with influenza A virus. We observed substantial variability in viral transcription between cells, including the accumulation of defective viral genomes (DVGs) that impact viral replication. We show (i) a correlation between DVGs and virus-induced variation of the host transcriptional program and (ii) an association between differential inductions of innate immune response genes and attenuated viral transcription in subpopulations of cells. These observations at the single-cell level improve our understanding of the complex virus-host interplay during influenza virus infection.IMPORTANCE Defective influenza virus particles generated during viral replication carry incomplete viral genomes and can interfere with the replication of competent viruses. These defective genomes are thought to modulate the disease severity and pathogenicity of an influenza virus infection. Different defective viral genomes also introduce another source of variation across a heterogeneous cell population. Evaluating the impact of defective virus genomes on host cell responses cannot be fully resolved at the population level, requiring single-cell transcriptional profiling. Here, we characterized virus and host transcriptomes in individual influenza virus-infected cells, including those of defective viruses that arise during influenza A virus infection. We established an association between defective virus transcription and host responses and validated interfering and immunostimulatory functions of identified dominant defective viral genome species in vitro This study demonstrates the intricate effects of defective viral genomes on host transcriptional responses and highlights the importance of capturing host-virus interactions at the single-cell level.

Influenza virus defective interfering particles (DIs) readily generated during successive high multiplicities of infection (MOIs) in cell culture passages (17)(18)(19)(20)(21) and observed in natural infections (22,23) contribute further to the observed cell-to-cell variation in infection. Influenza virus DIs have defective viral genomes (DVGs) that carry large internal deletions but retain conserved 5= and 3= termini identical to those in the full-length genomes (24,25). They have a significant effect on productive infection (26), as they possess the ability to interfere with the replication of infectious viruses (18). Because DIs diminish the productivity of infectious progeny and may impact disease outcome, they are of great interest for therapeutic and prophylactic purposes (reviewed in references 27 to 30). A specific influenza virus DI (i.e., DI244) was shown to effectively provide protection against IAV infection in mice (31,32) and ferrets (33,34). One of the ways influenza virus DIs are thought to modulate viral infections is through interaction with a cytosolic pattern recognition receptor (PRR), retinoic acid-inducible gene I (RIG-I), essential for interferon (IFN) induction (35). It has been assumed that, besides having enhanced IFN induction in vitro (36)(37)(38)(39) and in vivo (40), DIs may compete with standard viruses for cellular resources (reviewed in references 27 to 30 and 36). Recent studies on paramyxovirus revealed high heterogeneity in the accumulation of copy-back DVGs, resulting in the establishment of persistent infection in a subpopulation of cells (8) and differential levels of production of standard and defective viral particles (7). However, similar studies have not been done with influenza virus DVGs.
While diverse DIs can arise during IAV infection (40,41), the emergence and accumulation of distinct DVGs and their impact on host gene expression have not been well characterized at the population level nor at a single-cell resolution. Using singlecell transcriptome sequencing (RNA-seq), which allows us to probe viral and host transcriptomes simultaneously in the same cells and determine the abundance and diversity of DVGs, we monitored host-virus interactions in cultured cells over the course of IAV infection. These data established a temporal association between the level of viral transcription and effects on the host transcriptome and characterized the diversity and accumulation of DVG transcripts.

RESULTS
Cell-to-cell variation in virus gene expression. To determine how both the viral and host cell transcriptional programs relate to each other over the course of an influenza virus infection, we (i) infected two cell types, the adenocarcinomic human alveolar basal epithelial A549 cell line and human bronchial epithelial cells (HBEpC), at a high multiplicity of infection (MOI; 5) with A/Puerto Rico/8/34 (H1N1) (PR8) and (ii) performed transcriptome profiling by conventional bulk RNA-seq and a droplet-based single-cell RNA-seq approach. A high-MOI infection ensures that virtually all the cells can rapidly be infected, promotes the accumulation of DVGs, and consequently enables the characterization of the host response and DVG diversity. We first determined the percentage of reads that uniquely aligned with viral genes from the total number of mapped reads to obtain the relative abundances of virus transcripts within cells at each time point. Similarly to what has been observed at early stages of infection during a low-MOI infection with IAV (12), the relative abundances of virus transcripts were heterogeneous across cells from both cell types, with 0 to 70% of the total reads in each cell being derived from virus transcripts and the relative abundances of these transcripts increasing over time (see Fig. S1a in the supplemental material). The same trend was also seen when we analyzed segment-specific virus transcripts within individual cells over the course of the infection (Fig. S1b).

Heterogeneity of defective virus segment expression across the cell population.
Since DVGs are known to accumulate in cell culture, especially during high-MOI infections (17)(18)(19)(20)(21), and can serve as templates for transcription (42), we characterized their abundances and diversity by examining gap-spanning reads in the sequencing data. To determine the relative abundances of DVGs identified, we calculated the frequency of each unique DVG transcript type (determined by 5=/3= gap coordinates) across single cells and the ratio of these to non-gap-spanning transcripts derived from the corresponding viral segments (i.e., DVG/full-length genome [FL] ratio). Although DVG transcripts could be detected in all viral segments, those originating from the polymerase segments (i.e., PB2, PB1, and PA) occurred with the highest abundances and frequency in both cell types over the course of the infection (data available on GitHub, https://github.com/GhedinLab/Single-Cell-IAV -infection-in-monolayer/blob/master/AdditionalFiles/AdditionalFile1_DVG-from-all -segments_DVG-FL-ratios.pdf), consistent with observations made in other studies (22,43). We thus focused our analyses on the detection of DVG transcripts derived from those segments. We collected reads with large internal deletions spanning Ն1,000 nucleotides (nt) (Fig. 1a) and identified the junction coordinates of these gap-spanning reads. We observed a diverse pool of DVG transcripts, including some shared with the viral RNA (vRNA) segments from the PR8 stock (data are available on GitHub, https:// github.com/GhedinLab/Single-Cell-IAV-infection-in-monolayer/blob/master/ AdditionalFiles/AdditionalFile2_Polymerase-segments-DVG-species.pdf). The sizes for the majority of these DVG transcripts are estimated to be between 300 nt and 1,000 nt. To evaluate the accumulation of DVG transcripts in the virus transcript pool over the course of the infection, we compared changes in the ratios of DVG transcripts to FL transcripts derived from a given polymerase segment (DVG/FL) across time points. The polymerase DVG/FL ratios increased significantly between the early (6 h postinfection [hpi]) and late (24 hpi) stages of the infection (P Ͻ 2.2 ϫ 10 Ϫ16 ) and displayed a high level of heterogeneity across single cells from both cell types (Fig. S2).
Interestingly, the junction sites aggregated in specific genomic regions, as they systematically occurred in a high percentage of cells. We identified two predominant types of DVG transcripts corresponding to defective PB2 and PB1 segments ( Fig. 1b and c and Fig. S3 and S4). These same DVG transcripts were present in the stock, increased in prevalence over the course of the infection, and were conserved in different cell types and at different MOIs (data are available on GitHub, https:// github.com/GhedinLab/Single-Cell-IAV-infection-in-monolayer/blob/master/ AdditionalFiles/AdditionalFile3_Polymerase-DVG-species-abundance_previous-PB2-PB1.pdf), indicating likely carryover from the stock virus rather than de novo formation in each cell type. For PA, one particular DVG type that was present in the stock was found in A549 and Madin-Darby canine kidney (MDCK) cells at different MOIs, but not in HBEpC. Its prevalence was also much lower than for PB2 and PB1 DVGs (data are available on GitHub, https://github.com/GhedinLab/Single-Cell-IAV -infection-in-monolayer/blob/master/AdditionalFiles/AdditionalFile4_Polymerase-DVGspecies-abundance_previous-data-and-PA-data.pdf). The dominant DVG PB2 and PB1 transcripts derived from corresponding defective segments in both the stock and the infected cells showed stable relative abundances across individual cells over the course of the infection, suggesting the persistence of the defective viral segments.
To determine if any of the inferred defective polymerase segments could lead to the generation of defective interfering particles (DIs), we tested their interference potential experimentally at the bulk level. We generated clonal PB8-DI162 and PR8-DI222 viruses carrying the deletion sites in the PB2 segment identified as those in two dominant PB2-DVG species and then coinfected each of the DIs with wild-type (WT) PR8 virus in A549 cells. Both the PR8-DI162 and PR8-DI222 viruses inhibited the productivity of WT virus as well as PR8-DI244, known to be an effective DI (31) (Fig. 1d). To confirm that inhibition is specific to the DIs and not due to competition between coinfecting strains, we also tested a full-length PR8 virus with a frame-shifting point mutation in the PB2 gene. Coinfection with this mutant virus did not affect the productivity of WT virus (a) Selection criteria for reads considered to be representatives of DVGs. Given the lengths of the reads, we required that each portion of the gap-spanning reads were at least 10 bp in length. Considering that DVGs typically lose approximately 80% of their original sequences and considering the distribution in the size of the deletions detected in this study, we also required that the size of the gap should be at least 1 kb. (b and c) Schematic diagram showing two types of defective virus transcripts derived from the PB2 (b) and PB1 (c) segments, respectively. The caret (^) denotes a deletion event, and the numbers before and following it indicate the coordinates of the junction sites. (d) Interfering capacity of generated defective interfering particles (DIs) carrying identified deletion sites in the PB2 segment during coinfection with the WT PR8 virus. A549 cells were coinfected with WT virus and one type of DI at a ratio of 1:1,000 or 1:1. Viral titers were measured by the TCID 50 assay on MDCK cells. Four types of defective viruses, including two DIs identified in this study, a previously reported DI244 virus (31) that has been considered a potential candidate for antiviral therapy (27), and a PB2 polymerase-defective PR8 virus mutant (i.e., PB2del) that carries 1 nucleotide deletion in the coding sequence of the PB2 gene, were tested. For each type of DI, the information about the junction sites is given in parentheses. The error bar representing the standard deviation of the mean was obtained from three biological replicates. Each panel shows the results of an independent interference assay. (Fig. 1d). The PR8-DI222 virus had a strong inhibitory effect at a DI/WT ratio of 1,000:1, and a mild effect could still be detected at a DI/WT ratio of 1:1 (Fig. 1d). This confirmed the interfering ability of the two types of defective PB2 segments identified in the single-cell data. The interference test was not intended to mimic a single-cell scenario, given the difficulties in recreating single-cell infection conditions at a bulk level.
Cell-to-cell variation in concurrent virus-host transcriptional changes over the course of the infection. Viral infection can trigger massive changes in the host transcriptional program, including the activation of the interferon (IFN) response (reviewed in reference 44). Since the induction of IFNs is also subject to cell-to-cell variation (13,15,45), we first evaluated the expression frequency of type I IFN (e.g., beta IFN [IFN-␤]) and type III IFN (i.e., IFN-) to determine the extent of the variation during infection. While these were significantly differentially expressed over the course of the infection at the population level, as measured by bulk RNA-seq (Fig. 2a), the expression of IFNs was detected in only less than 3% of cells until the late stage of infection, although this proportion increased over time (Fig. 2b and c). The same expression profile was observed for a number of IFN-stimulated genes (ISGs), such as RSAD2, CXCL10, GBP4, GBP5, IDO1, and CH25H, in A549 cells and for IFI44L, CMPK2, IFIT1, BST2, OASL, and XAF1 in HBEpC (Fig. 2). In silico pooling of the single-cell data to mimic the population-level measurement resembles the bulk RNA-seq data, thus excluding the possibility of substantial technical limitation of single-cell RNA-seq (Fig. 2a).
To evaluate cell-to-cell variation in the host response manifested by the alteration in the host transcriptome following IAV infection, we performed unsupervised cell clustering using solely the host transcriptional data. We then annotated single cells in each subpopulation with information about the relative abundances of virus transcripts and the DVG/FL ratio. We observed a correlative pattern between the distribution of the relative abundances of virus transcripts and the host transcriptional landscape across cells within a population, as cells with a high abundance of virus transcripts formed a separate cluster by 12 hpi in both cell types. This population structure, marked by a drastic difference in the level of viral transcription between clusters, was retained at 24 hpi ( Fig. 3a and b). However, cell cycle status, which contributes to the heterogeneity of the cell population by 12 hpi and the formation of clusters in distinct cell cycle stages ( Fig. S5a and Fig. S5b) with similar abundances of virus transcripts, does not seem to have a significant impact on population structure in A549 cells at 24 hpi. Consistently with previous reports on G 0 /G 1 cell cycle arrest (46,47), at 24 hpi, we observed a drastic increase in the proportion of cells in the G 0 /G 1 phase compared to proportions at earlier time points (Fig. S5c). Interestingly, this same pattern was not observed in HBEpC at the same time point; in these cells, the host and viral transcriptome landscape at 24 hpi is similar to that in A549 cells at 12 hpi, suggesting a temporal difference in host response and viral replication between cell types.
To characterize host genes driving the changes associated with viral transcription, we assessed the overrepresentation of gene ontology (GO) terms associated with differentially expressed genes in each subpopulation of A549 cells and HBEpC compared to the other subpopulations of cells at 24 hpi. In two subpopulations of A549 cells with high relative abundances of virus transcripts (clusters 0 and 1 in Fig. 4a), there is an enrichment of genes involved in viral transcription, RNA processing, translation, signal recognition particle-dependent cotranslational protein targeting to the membrane, and mitochondrial electron transport (Fig. 4a). These GO terms are also associated with genes highly expressed in HBEpC, which have the highest abundances of virus transcripts at 24 hpi (cluster 3 in Fig. 4b) (P is Ͻ2.2 ϫ 10 Ϫ16 in a comparison with the other clusters). In contrast, genes involved in antiviral responses, such as the type I IFN signaling pathway and the negative regulation of viral genome replication, are highly expressed in the other two subpopulations of A549 cells with similar or severely reduced relative abundances of virus transcripts (clusters 3 and 2, respectively, in Fig. 4a). However, some antiviral genes are differentially induced in cells with different levels of viral transcription. For example, type I and III IFNs, as well as a subset of ISGs, are highly expressed in cluster 3, and another subset of ISGs is highly expressed in To further understand how viral and host transcriptional activities are coordinated in clusters of cells that are subjected to different levels of viral stimuli, we used multiscale embedded gene coexpression network analysis (MEGENA) (48) and subsequent correlation analyses to directly characterize virus-host interactions. This was done for each cluster individually for A549 cells and HBEpC collected at 24 hpi to identify modules of coexpressed genes representing coherent functional pathways. The relevance of a module to viral infection was evaluated by correlating the first principal component of a module with the relative abundance of virus transcripts. As with the processes identified in the GO term overrepresentation approach using differentially expressed genes identified within a given cluster of cells (Fig. 4), those related to the host cell cycle and viral life cycle, including transcription, translation, RNA processing, metabolic process, and protein localization to the membrane, were enriched in the modules that were correlated with viral transcription and identified in the clusters of A549 cells with high abundances of virus transcripts (Fig. S7). Conversely, the host innate immune response (e.g., the response to type I IFN) was associated with viraltranscription-correlated modules in cells with the lowest abundance of virus transcripts (cluster 2 of A549 cells at 24 hpi in Fig. 5a). For example, module M100 negatively correlated with the relative abundance of virus transcripts ( ϭ Ϫ0.262, P ϭ 2.9 ϫ 10 Ϫ5 ) in cluster 2 of A549 cells, which has genes, including ISG20, RNF213, IFI35, STAT1, RBCK1, IFI27, OAS2, OAS3, XAF1, STAT2, and IFIT5, that are involved in the innate antiviral response and that were significantly highly expressed compared to their expression in other cells of the same population (Fig. 5b). In HBEpC, viral-transcription-correlated modules, such as M227, which were enriched for type I IFN signaling genes (e.g., ISG15, MX1, MX2, IFI35, IFI27, IFIT1, IFITM3, OAS1, OASL, and ISG20) and positively correlated with the relative abundance of virus transcripts ( ϭ 0.103, P ϭ 0.021) ( Fig. 5c and d), were detected in one cluster of cells with a low abundance of virus transcripts (cluster 2 of HBEpC at 24 hpi in Fig. 5c). Modules correlated with viral transcription in the other two clusters of HBEpC with low abundances of virus transcripts (clusters 0 and 1 in Fig. S8) were enriched for genes associated with the host cell cycle, consistent with observations of A549 cells at 12 hpi (Fig. S6), while the viral-transcription-correlated modules in HBEpC with a high abundance of virus transcripts (cluster 3 in Fig. S8) were enriched for genes involved in the endoplasmic reticulum (ER)-associated ubiquitindependent protein catabolic process, the small-molecule metabolic process, and vesicle-mediated transport, to name a few. Overall, by identifying modules of genes that have a direct correlation with virus transcript abundance, the gene coexpression network analysis further demonstrates a complex association between virus transcript abundance and host responses, consistent with the pattern revealed by GO term We also indicate another overrepresented GO term related to the type I IFN signaling pathway in cluster 2 of HBEpC, as shown separately in a lower section of the bubble chart. Each GO term is denoted by a bubble. The color intensity of each bubble indicates the fold enrichment of the corresponding GO term, and the size corresponds to the ratio of queried genes in the gene set associated with a given GO term. The GO terms enriched in each cluster are color-coded by cluster. The asterisks next to the GO terms in the bubble charts denote the GO terms enriched in two clusters, and their colors indicate the identity of the other cluster besides the one denoted by the color of the GO term. The bar chart above the bubble chart of the GO terms shows the median relative abundances of the virus transcripts across cells in each cluster. The colors of the bar are consistent with those of the GO terms.
analysis of highly expressed genes in individual clusters, and provides further evidence for the complexity of virus-host interactions.
Differential expression of host genes linked to defective viral genomes. Since DVGs are known to play an important role in IFN induction and inhibition of viral replication (reviewed in references 28 to 30 and 36), we determined whether the accumulation of DVGs was associated with attenuated viral transcription and induction of innate immune antiviral response genes. We observed a subtle but consistent inverse relationship between the relative abundance of virus transcripts and the PB2 and PB1 DVG/FL ratios within each cluster of A549 cells at 12 hpi and that of HBEpC at 24 hpi ( Fig. 6a to d). The cluster of cells with the highest abundance of virus transcripts (cluster 1 of A549 cells at 12 hpi in Fig. 6a and cluster 3 of HBEpC at 24 hpi in Fig. 6b) had the lowest PB2 and PB1 DVG/FL ratios ( Fig. 6c and d) (P was Ͻ2.2 ϫ 10 Ϫ16 in a comparison of PB2 and PB1 DVG/FL ratios in cluster 1 of A549 cells or cluster 3 of HBEpC against the other clusters). However, in A549 cells at 24 hpi, higher PB2 DVG/FL ratios were Each GO term is denoted by a bubble. The color intensity of each bubble indicates the fold enrichment of the corresponding GO term, and the size corresponds to the log 10 -transformed corrected P value for a given GO term. Red and blue color bars above the bubble charts denote positive or negative correlation of viral transcription with corresponding modules, respectively. The right panels show the MEGENA network of modules enriched with the innate immune response, including module M100 in cluster 2 of A549 cells (b) and module M227 in cluster 2 of HBEpC (d). Red and blue nodes represent significantly up-or downregulated genes compared to their expression in other cells in the population, while light-green nodes denote genes that are not significantly differentially expressed. Diamond nodes indicate key regulators. The size of the nodes indicates node strength after multiscale hub analysis within the MEGENA pipeline. Modules that have a significant correlation with the relative abundance of virus transcripts and enriched GO biological process terms are shown in the GO term bubble charts (a and c). Module member information for all viral-transcription-correlated modules can be found on GitHub (https://github.com/GhedinLab/Single-Cell-IAV-infection-in-monolayer/tree/master/AdditionalFiles/MEGENA_Tables).

Defective Influenza Virus Expression and Host Response
® observed in the cluster of cells with the lowest abundance of virus transcripts (cluster 2 in Fig. 6e and f) (P was Ͻ2.2 ϫ 10 Ϫ16 in a comparison with clusters 0 and 1 in Fig. 6f) and in a cluster of cells with a relatively high abundance of virus transcripts (cluster 3 in Fig. 6e and f) (P was 4.036 ϫ 10 Ϫ11 in a comparison of clusters 0 and 1 in Fig. 6f). The highest level of PB1 DVG/FL ratios was observed only in cluster 2 (P was 1.943 ϫ 10 Ϫ9 in a comparison of cluster 2 with the other clusters). Moreover, innate immune response genes were highly expressed in cells with either an elevated DVG/FL ratio for both DVG PB2 and PB1 transcripts and a low abundance of virus transcripts (cluster 2 of A549 cells at 24 hpi in Fig. 6e and f) or an elevated DVG/FL ratio for PB2 transcripts and a relatively high abundance of virus transcripts (cluster 3 in Fig. 6e and f). It suggests that both DVGs and full-length genomes may play a role in stimulating the innate immune response and that DVG immunostimulation may also be independent of viral abundance. Notably, the abundances of two dominant and ubiquitous PB2-DVG transcripts in the total virus transcript pool vary between cells and across clusters. The PB2-DVG transcripts carrying the same deletion sites as in PR8-DI222 were highly abundant in the cluster of cells with the lowest abundance of virus transcripts (cluster 2 of A549 cells at 24 hpi in Fig. 6g) (P was Ͻ2.2 ϫ 10 Ϫ16 in a comparison with the other clusters), while the other PB2-DVG transcripts corresponding to PR8-DI162 were highly abundant in both cluster 2 and cluster 3 (Fig. 6g) (the P values were 2.067 ϫ 10 Ϫ10 and 0.001241, respectively, in comparisons with clusters 0 and 1), suggesting a potential difference in the levels of induction of innate immune response genes by different DVG species. We did not detect a difference in the relative abundances of defective PA transcripts among cells with different levels of viral transcription (Fig. S9).
Given the enrichment of PB2-DVGs and the overexpression of ISGs in cluster 2 of A549 cells at 24 hpi compared to their representation in the rest of the cells in the same population, we experimentally validated the immunostimulatory impact of PB2-DVGs on the host cells. We infected A549 cells with each of the 3 PR8-DIs (PR8-DI162, -DI222, or -DI244) or with WT virus. We quantified the expression levels of the influenza virus M gene, of ISGs that were shown to be significantly highly expressed in cluster 2 of A549 cells, including IFI35, IFI27, and MX1, and of the type I (IFN-␤) and III (IFN-) IFNs that were shown to be significantly highly expressed in cluster 3 of A549 cells ( Fig. S10a  and b). Due to their limited ability to replicate without a helper virus, DIs cultured on their own exhibit a significantly lower level of viral transcription than WT virus (Fig. S10c). Despite the lower replication of DIs, there was a higher induction of ISGs in DI-infected cells than in cells infected with WT virus, while there were comparable or marginally higher inductions of IFN-␤ and IFN-in the WT-infected cells (Fig. S10c). Together, these data demonstrate that at the late stage of infection, PB2-DVGs have a strong immunostimulatory effect on the host cells manifested by a significant induction of ISGs, even at very low levels of viral transcription and independently of type I and III IFN expression levels.

DISCUSSION
The goal of this study was to quantitatively and qualitatively characterize viral and host factors that contribute to cell-to-cell transcriptional variation observed during IAV infection. We identified DVGs as potentially important factors in the temporal variation of the host cell transcriptome. The substantial cell-to-cell variation in IAV replication and host response that we detected highlights how single-cell RNA-seq experiments can provide novel insight into the complexity of virus-host (and DI-host) interactions in influenza virus infection. d, and f) DVG/FL ratios for DVG PB2 and PB1 transcripts calculated as the ratio of gap-spanning reads to non-gap-spanning reads aligned to a 3= coverage peak region derived from the corresponding viral segments in each cluster. (g) DVG/FL ratios for PB2-DVG transcripts carrying deletion sites similar to those in PR8-DI222 and PR8-DI162 viruses in each cluster of A549 cells at 24 hpi. All box plots show the first and third quantiles as the lower and upper hinges, the median in the center, and a 1.5 interquartile range (IQR) from the first and third quantiles as whiskers. Asterisks over the boxes denote the statistical significance levels of pairwise comparisons determined by the one-tailed Wilcoxon rank sum test: *, P Յ 0.05; **, P Յ 0.01; ***, P Յ 0.001; and ****, P Յ 0.0001. ns, not significant.

®
The inherent stochasticity of molecular processes, especially during the initial stages of infection, can contribute to the cell-to-cell variation in viral replication (1). Evidence from studies examining low-MOI IAV infections with different virus strains and cell lines suggests that this intrinsic stochasticity can impact viral replication and result in the loss of viral genome segments, transcripts, or proteins (1,11,12,16). Similarly, at different time points of the high-MOI infection, we observed cell-to-cell variation in viral transcription, including segment-specific transcription. Although the mechanisms leading to this observation are unclear based on our data, it may be explained by the following possibilities as discussed in reference 16, including (i) the absence of one or more viral genome segment in infecting virions, (ii) a deficiency in intracellular trafficking of incoming viral ribonucleoprotein complexes (vRNPs), (iii) the random degradation of vRNAs prior to transcription, or (iv) mutations resulting in the decreased stability of viral mRNAs.
Another source of variability in viral replication stems from the emergence and accumulation of various vRNA species synthesized by the error-prone viral polymerase (reviewed in references 49 and 50). Indeed, viral genetic diversity, including amino acid mutations in the NS1 and PB1 proteins, the absence of the NS gene, and an internal deletion in the PB1 gene, contributes to the cell-to-cell variation in the innate immune response, as shown in a recent report on a low-MOI infection (13). Although the 3= sequencing strategy that we used in our study limits the identification of mutations in the full-length segments, our experimental setup provides a unique opportunity to evaluate the diversity of DVGs by characterizing DVG transcripts, since a high-MOI infection promotes the accumulation of DVGs. Late stages of infection also emphasize the impact of accumulated DVGs on the host response. We identified a diverse pool of DVG transcripts derived from the three polymerase segments, including several dominant species. Notably, the fact that all the dominant DVG PB2 and PB1 species identified in the infected cells were also present in the virus stock and that the dominant PB1 DVG species were detected in an increasing proportion of cells over the course of the infection suggest potential cellular transmission of DIs carrying these DVGs. However, we cannot exclude the possibility that the regions where the hot spots were located are most prone to polymerase skipping and that DVG enrichment over the course of the infection resulted in their abundance above a detection threshold in an increasing number of cells.
The accumulation of DVGs can have consequences on viral replication by directly competing with the full-length genomes and modulating host innate immune responses (reviewed in references 27 to 30, 49, and 51 to 53). The immune-modulatory effect of DVGs has been attributed to efficient activation of the IFN induction cascade and antiviral immunity, a mechanism especially well known for the copy-back DVGs found in many negative-sense RNA viruses, such as SeV (reviewed in references 28 to 30 and 36). Consistently with previous reports, we detected enrichment of highly expressed genes involved in the IFN response in one subpopulation of cells with significantly attenuated viral transcription. In this cluster of cells, we also observed that DVG transcripts accumulated to a high level. The observation was further validated in our DI or WT infection assay, in which we detected a significantly higher expression of some ISGs in DI-infected cells independently of the level of viral transcription and of IFNs at the late stage of the infection. The observed strong innate immune response in DI-infected cells may be explained by reduced expression of the NS1 protein, a major IFN antagonist, compared to that in WT-infected cells. However, the observation of DI-mediated antiviral protection against lethal homologous WT virus infection in type I IFN receptor knockout mice suggests the involvement of innate immune factors other than type I IFN (54). Alternatively, the IFN mRNA dose-independent ISG expression in DI-infected cells may suggest a potential novel mechanism of DVG immunostimulation that is independent of IFN induction. Expression of a subset of ISGs can be induced in the absence of IFN signaling (55,56). It is likely to be at least partially attributed to similar and overlapping consensus DNA-binding motifs for the interferon regulatory factor (IRF) family, most notably IRF3 and IRF7, which are key regulators of type I IFN production, and for IFN-stimulated gene factor 3 (ISGF3), which promotes the expression of many ISGs in response to IFN signaling (57)(58)(59)(60). For example, an IRF3 chromatin immunoprecipitation sequencing (ChIP-seq) experiment with adenovirus-infected human primary bronchial/tracheal epithelial cells demonstrated IRF3 binding to the promoter region of induced ISGs, including MX1 and ISG15, which have the IFNstimulated response element (ISRE) bound by IRFs and ISGF3 in the promoter regions (61). Further investigation of ISG induction in the absence of IFN signaling during DI infection will determine whether an IFN-independent mechanism of DI immunostimulation does indeed exist. In addition, given the fact that some ISGs may be highly expressed in cells with a high level of viral transcription or a high abundance of DVGs in the virus pool suggests that viral factors other than the accumulation of DVGs may also play an important role in triggering the innate immune response. Nevertheless, unlike SeV copy-back DVGs, which carry complementary termini generated by copying back the authentic 5= terminus to the 3= terminus (62,63), IAV "deletion" DVGs have termini identical to those of the full-length genomes. In the SeV copy-back DVGs, a stretch of double-stranded RNA (dsRNA) adjacent to a 5= triphosphate serves as an effective RIG-I ligand and strongly activates IFN-␤ (35,36,63,64), and an RNA motif presented in the DVGs plays an important role in stimulating a strong antiviral response (65,66). In influenza virus infection, short viral RNA templates (shorter than 149 nt) retaining the conserved 5= and 3= termini, called mini vRNAs (mvRNAs) (67), can be transcribed and replicated in the absence of nucleoprotein (NP) (68) and induce strong IFN-␤ expression by binding to, and activating, RIG-I (67). However, the mechanism of a more effective immunostimulation by IAV DVGs (which are larger than mvRNAs) than by the full-length genomes remains elusive (reviewed in reference 36). In addition, the interfering and immune-modulatory abilities of influenza virus DVGs may be attributed to DVG-encoded proteins that directly interact with mitochondrial antiviral signaling (MAVS) and act independently of RIG-I in IFN induction, as previously reported (69).
Cell-to-cell variation during viral infection has been reported in studies of different viruses using different single-cell techniques (1, 2, 4-10, 12-15, 70-74). In the past few years, dissecting heterogeneity of viral replication and understanding complex virushost interactions have been greatly facilitated by the application of fast-evolving single-cell RNA-seq approaches (reviewed in reference 75). Our study, along with several other in vitro (12,13,15) and in vivo (14) studies of single-cell transcriptome dynamics during influenza virus infection, reveals substantial heterogeneity of viral gene expression and cellular responses, especially the innate antiviral response. However, unlike other in vitro studies using single-cell RNA-seq (12,13,15) that focused either on a low-MOI infection scenario, which limits the chance of coinfection, or on the early-to-middle stages of a high-MOI infection, which emphasizes the early host response, our study of high-MOI infection over multiple infection stages allows us to characterize and determine the abundances of diverse DVG species that accumulated in individual cells and to comprehensively capture variations in host responses in these infected cells. By distinguishing different DVG species at the sequence level, it also allowed us to associate the enrichment of certain DVG species with the attenuation of viral transcription and the induction of innate immune responses at a single-cell resolution. For example, the accumulation of PB2-DVGs in one subpopulation of cells was associated with significantly attenuated viral transcription and a strong host innate immune response manifested by high expression of some ISGs. Although this appeared to be associated primarily with one dominant PB2-DVG species (PR8-DI222), we cannot rule out the effect of other non-dominant defective PB2-DVGs, DVGs derived from other segments, or certain mutations in stimulating host responses. Diverse DVG species can play different roles in shaping the virus-host interaction. Further investigation is necessary to elucidate the dynamics and impact of diverse DVG species during influenza virus pathogenesis. Our characterization of DVGs and corresponding host responses at the single-cell level provides a different view into complex virus-host interactions and demonstrates the intricate effects of various viral factors, including (but probably not limited to) the accumulation of DVGs, in shaping infection outcome.
Infection assays. Subconfluent monolayers of A549 cells in T-25 flasks were washed with A549 infection medium (F-12K supplemented with 0.15% bovine serum albumin [BSA] fraction V and 1% antibiotic-antimycotic) and infected with influenza virus at a multiplicity of infection (MOI) of 5 in 0.5 ml precooled A549 infection medium. The flasks were incubated at 4°C for 30 min with agitation every 5 to 10 min, followed by the addition of 4.5 ml prewarmed A549 infection medium, and transferred to a 37°C, 5% CO 2 incubator for further incubation. HBEpC were infected similarly except that the HBEpC infection medium was PromoCell airway epithelial cell growth medium. The inoculum was back-titrated to confirm that the desired MOI was used. The virus-infected cells were collected at 6, 12, and 24 hpi, and the mock-infected cells were collected at 12 hpi. There was no substantial cell death observed at any time point. Cells were extensively washed before their resuspension in PBS containing 0.04% BSA. A proportion of cells from one of the two duplicate flasks per time point were subject to 10ϫ Genomics Chromium single-cell library preparation. The remaining cells in the two flasks were used for conventional bulk RNA-seq library preparation. Notably, due to a failure in the single-cell library preparation for the PR8-infected HBEpC collected at 12 hpi, we replaced this sample by repeating the same infection assay with HBEpC.
Sample preparation, library construction, and sequencing. A 10ϫ Genomics single-cell library preparation with Chromium 3= v2 chemistry was performed by following the manufacturer's protocol. A range of 3,900 to 7,500 cells were used as the input in each single-cell preparation, with a median of ϳ3,353 cells (range, 2,075 to 5,254 cells) obtained following sequencing, as described below. Sequencing was performed on the HiSeq 2500 in HighOutput mode (v2) with one library per lane according to the manufacturer's recommended sequencing configuration (i.e., for paired-end read 1, 27 bp, for read 2, 99 bp, and for the i7 index, 8 bp). An average of 59,200 reads per cell was obtained. For bulk mRNA sequencing, total RNA from two replicate flasks per time point was extracted using an RNeasy minikit (Qiagen, Hilden, Germany) with on-column DNA digestion using RNase-free DNase (Qiagen). The conventional bulk RNA-seq library was prepared using an NEBNext Ultra II RNA library prep kit for Illumina following poly(A) mRNA enrichment. Libraries were multiplexed and sequenced on an Illumina NextSeq 500 in HighOutput 2ϫ 75-bp mode (v3). Viral genomic RNA (vRNA) of the virus stocks used in the infection assay was amplified using multisegment reverse transcription-PCR (M-RTPCR) (77). The M-RTPCR product was subject to Illumina library preparation with a Nextera DNA library prep kit and sequenced on the HiSeq 2500 in rapid 2ϫ 150-bp mode (v2).
Upstream computational analyses of single-cell RNA-seq data. The Cell Ranger (v2.1.0) single-cell software suite (10ϫ Genomics) was applied to the single-cell RNA-seq data to perform the alignment with the concatenated human (hg19) and influenza A virus (A/Puerto Rico/8/34) references with STAR (v2.5.3a) (78) and gene counting with the processed unique molecular identifiers (UMIs) for each cell. The UMI counts for the viral reads with the CIGAR string containing a gap (i.e., MNM) and different UMI sequences for comparing to the ungapped reads were added into the expression matrix later, since they were not considered during gene counting. Given the fact that we performed 3=-end sequencing, transcripts derived from alternative splicing of M and NS segments, encoding M1/M2 and NS1/NEP, respectively, could not be distinguished in our analyses, as those transcribed from the same genes have identical 3= ends.
To identify and exclude low-quality cells in the single-cell data from the downstream analyses, the following steps were taken for each sample and were employed to filter out cells that failed to meet these criteria: we (i) removed cells with fewer than 2,000 detected genes, (ii) removed cells with an alignment rate of less than the mean minus 3 standard deviations, (iii) removed cells for which the number of reads after log 10 transformation was not within 3 standard deviations below or above the mean, (iv) removed cells for which the number of UMIs after log 10 transformation was not within 3 standard deviations below or above the mean, (v) removed cells for which the percentage of reads aligned with mitochondrial genes was not within 3 standard deviations below or above the mean, and (vi) removed mock-infected cells with 1 viral read detected. Although we filtered cells based on the size of the total mRNA pool (i.e., total reads) as one of the criteria described above, we also checked the distribution of the size of the host mRNA pool (i.e., host reads) for individual cells. As reported in recent virus-infected single-cell studies (12,14,15), while virus-induced host shutoff of gene expression (79) is likely to be mediated mainly by host mRNA degradation, it remains unclear what effect viral transcription has on the size of the total mRNA pools. In our study, we observed that virtually all the cells with small or large host mRNA pools for which the number of host reads after log 10 transformation was not within 3 standard deviations below or above the mean were already being excluded when the 6 criteria described above were applied; thus, we decided to filter cells based on these. Approximately 120 to 250 cells were eliminated from each sample, with a median of 164 cells per sample. We further removed genes that were detected in fewer than three cells. After initial cell and gene quality control, the majority of samples had approximately 3,000 to 3,500 cells left for downstream analysis. The expression levels of host genes were normalized based on the size of the host mRNA pool.
Cell clustering with Seurat. An unsupervised cell clustering to identify subtle changes in the population structure after infection was performed on each time point datum separately by following the procedures of the Seurat package (v2.1.0) (80) using the normalized, scaled, and centered host gene expression matrix, without consideration of the relative abundances of virus transcripts within individual cells. Briefly, the highly variable genes with average levels of expression of Ͻ4 and dispersion of Ͼ1 were used as input for the principal-component analysis (PCA). The statistically significant and biological meaningful PCs determined by the built-in jackstraw and elbow analyses and manual exploration were retained for visualization by t-distributed stochastic neighbor (t-SNE) and subsequent clustering by a shared nearest neighbor (SNN) graph-based approach. The legitimacy of the initially identified clusters was validated using the ValidateClusters function in Seurat, which built a support vector machine (SVM) classifier with significant PCs and then applied the accuracy cutoff of 0.9 and the minimal connectivity threshold of 0.001. To identify markers that are differentially expressed among clusters, the FindMarkers function in Seurat was used with the different test options, including bimod (81), poisson, negbinom, and MAST (v1.4.1) (82). Only the genes that showed at least a 0.25-fold difference on the log scale between two groups and were expressed in at least 25% of the cells in either group were tested for differential expression. Significantly differentially expressed genes for each cluster (i.e., the markers) were identified by applying the adjusted P value cutoff of 0.05. Markers identified by all four methods were retained. GO term overrepresentation analysis of the upregulated markers was performed with the online service DAVID (83,84) by applying the Benjamini P value cutoff of 0.05. To determine the cell cycle stage associated with individual cells, cell cycle scoring and assignment were performed with Seurat using the CellCycleScoring function based on the expression of canonical markers (85).
Computational analyses of bulk RNA-seq and virus stock sequencing data. The raw bulk RNA-seq and virus stock sequencing data were first trimmed with Trimmomatic (v0.36) (86) to remove the adaptors and trim off low-quality bases. Reads with a minimal length of 36 bases in the trimmed bulk RNA-seq data set were aligned to the concatenated human (hg19) and influenza A/Puerto Rico/8/34 (H1N1) references with STAR (v2.5.3a) (78) with the default parameters and counted with featureCounts (87) in the Subread package (v1.5.1) (88), while reads in the virus stock sequencing data set were aligned to the influenza A/Puerto Rico/8/34 (H1N1) reference with STAR (v2.5.3a) (78). Differential-expression analysis was performed with DESeq2 (v1.18.1) (89) and edgeR (v3.20.5) (90, 91) using the bulk RNA-seq and the merged single-cell RNA-seq data. Host genes with the adjusted P value of Ͻ0.05 were identified as significantly differentially expressed at each time point.
Deletion junction identification, filtering, and quantification. Reads that aligned to both ends of a viral segment, with each aligned portion comprised of at least 10 nucleotides in length, were collected. Following UMI deduplication, reads with junction coordinates within a 10-nucleotide window were grouped together. We excluded from downstream analyses reads with junction coordinates that occurred fewer than 10 times in each sample. To compare quantitatively gap-spanning reads for each segment across cells and samples, the number of gap-spanning reads was normalized to the total number of non-gap-spanning reads aligned to a 100-nt region, with the coverage peak centered at the 3= end. For the polymerase segments, gap-spanning reads were also filtered by the size of deletions (i.e., Ն1,000 nucleotides). The 1,000-nucleotide threshold for the polymerase segments was chosen based on the distribution of the size of the deletions (data are available on GitHub, https://github.com/ GhedinLab/Single-Cell-IAV-infection-in-monolayer/blob/master/AdditionalFiles/AdditionalFile5_ polymerase-DVG-deletion-size-distribution.pdf), as the majority of the deletions are larger than 1,000 nucleotides. Because cells with low infection (especially those harvested at the early stage of infection or inoculated at a low MOI) typically have poor coverage for the viral segments of interest, the DVG/FL ratio in those cells calculated as described above is typically inflated and may even fail to be calculated because all the viral reads corresponding to a given segment span gaps or there are no viral reads. To mitigate this effect, we overwrote those values with 0 in the data sets, including the data set collected from HBEpC at 6 hpi.
Interference test for PR8-DI162 and PR8-DI222 viruses and validation of their immunostimulatory effects. To test the interfering ability of PR8-DI162 and PR8-DI222 viruses that carry the deletions identified in two dominant PB2-DVG species from the single-cell data set, their inhibitory effects on wild-type PR8 virus replication were quantified in cultured cells at different DI/WT ratios. Subconfluent monolayers of A549 cells in 12-well plates were washed with infection medium (MEM supplemented with 0.15% BSA fraction V, 1% antibiotic-antimycotic, and 1 g/ml tosylsulfonyl phenylalanyl chloromethyl ketone [TPCK]-treated trypsin), and each well was coinfected with one type of defective virus at an MOI of 5 and the PR8 virus either at an MOI of 0.005 for a DI/WT ratio of 1,000:1 or at an MOI of 5 for a DI/WT ratio of 1:1 in 1 ml infection medium. For a DI/WT coinfection ratio of 1,000:1, plates were transferred to a 37°C, 5% CO 2 incubator and supernatants were collected at 2 h postinfection and 1, 2, and 3 days postinfection. For a DI/WT coinfection ratio of 1:1, infection was similar to that used for single-cell RNA-seq, including incubation at 4°C for 30 min with agitation every 5 to 10 min after infection followed by addition of prewarmed A549 infection medium, and cells were transferred to a 37°C, 5% CO 2 incubator for further incubation. Supernatants were collected at 2, 6, 12, and 24 h postinfection and titrated as described above. The wild-type PR8 virus yield from collected supernatants was determined by a TCID 50 assay using MDCK cells, which do not support replication of DIs due to the lack of functional PB2 proteins.
Gene coexpression network analysis. Multiscale embedded-gene coexpression network analysis (MEGENA) (48) was performed to identify host modules of highly coexpressed genes in influenza virus infection. The MEGENA workflow comprises 4 major steps: (i) fast planar-filtered-network construction (FPFNC), (ii) multiscale clustering analysis (MCA), (iii) multiscale hub analysis (MHA), (iv) and cluster-trait association analysis (CTA). A cutoff of 0.05 after perturbation-based false-discovery rate (FDR) calculation was used. The differentially expressed genes are identified between virus-infected cells of a particular cluster and the rest of the cells in the same population, determined by a t test. Only the genes that showed at least a 0.25-fold difference on the log scale between two groups and were expressed in at least 25% of the cells in either group were tested for differential expression. Significantly differentially expressed genes were identified by applying the adjusted P value cutoff of 0.05.
Identification of enriched GO terms, key regulators in the host module, and relative abundances of virus transcripts associated with host modules. To functionally annotate gene signatures and gene modules identified in this study, enrichment analysis of the established gene ontology (GO) categories was performed. The hub genes in each subnetwork were identified using the adopted Fisher's inverse chi-square approach in MEGENA; Bonferroni-corrected P values smaller than 0.05 were set as the threshold to identify significant hubs.
The relative abundances of virus transcripts and the DVGs associated with the host modules were identified using Spearman correlation between the first principal component of the gene expression in the corresponding module and the relative abundances of virus transcripts. Significantly associated traits were identified using the Benjamini-Hochberg FDR-corrected P value of 0.05 as the cutoff.
Statistical analysis. The statistical significance of the changes in the relative abundances of viral and DVG transcripts between 3 or more groups of cells was first determined by the one-or two-tailed Kruskal-Wallis rank sum test, followed by the one-tailed Wilcoxon rank sum test to calculate the pairwise comparisons. The statistical significance of expression fold changes in the qPCR validation assay was determined using the two-tailed Student t test. A P value of Յ0.05 was considered statistically significant.
Data availability. The code used to generate all the results is available on GitHub (https://github .com/GhedinLab/Single-Cell-IAV-infection-in-monolayer). Sequencing data that support the findings of this study have been deposited in the Gene Expression Omnibus (GEO) repository with the accession code GSE118773.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.