Differential alphavirus defective RNA diversity between intracellular and encapsidated compartments is driven by subgenomic recombination events

Alphaviruses are positive-sense RNA arboviruses that can cause either a chronic arthritis or a potentially lethal encephalitis. Like other RNA viruses, alphaviruses produce truncated, defective genomes featuring large deletions during replication. Defective RNAs (D-RNAs) have primarily been isolated from virions after high-multiplicity of infection passaging. Here, we aimed to characterize both intracellular and packaged viral D-RNA populations during early passage infections under the hypothesis that D-RNAs arise de novo intracellularly that may not be packaged and thus have remained undetected. To this end, we generated NGS libraries using RNA derived from passage 1 (P1) stock chikungunya virus (CHIKV) 181/clone 25, intracellular virus, and encapsidated P2 virus and analyzed samples for D-RNA expression, followed by diversity and differential expression analyses. We found that the diversity of D-RNA species is significantly higher for intracellular D-RNA populations than encapsidated and specific populations of D-RNAs are differentially expressed between intracellular and encapsidated compartments. Importantly, these trends were likewise observed in a murine model of CHIKV 15561 infection, as well as in vitro studies using related Mayaro, Sindbis, and Aura viruses. Additionally, we identified a novel subtype of subgenomic D-RNA that are conserved across arthritogenic alphaviruses. D-RNAs specific to intracellular populations were defined by recombination events specifically in the subgenomic region, which was confirmed by direct RNA nanopore sequencing of intracellular CHIKV RNAs. Together, these studies show that only a portion of D-RNAs generated intracellularly are packaged and D-RNAs readily arise de novo in the absence of transmitted template. IMPORTANCE Our understanding of viral defective RNAs (D-RNAs), or truncated viral genomes, comes largely from passaging studies in tissue culture under artificial conditions and/or packaged viral RNAs. Here, we show that specific populations of alphavirus D-RNAs arise de novo and that they are not packaged into virions, thus imposing a transmission bottleneck and impeding their prior detection. This raises important questions about the roles of D-RNAs, both in nature and in tissue culture, during viral infection and whether their influence is constrained by packaging requirements. Further, during the course of these studies, we found a novel type of alphavirus D-RNA that is enriched intracellularly; dubbed subgenomic D-RNAs (sgD-RNAs), they are defined by deletion boundaries between capsid/E3 and E1/3’UTR regions and are common to chikungunya, Mayaro, Sindbis, and Aura viruses. These sgD-RNAs are enriched intracellularly and do not appear to be selectively packaged, and additionally may exist as subgenome-derived transcripts.


ABSTRACT INTRODUCTION
Although D-RNAs have been considered an epi-phenomenon of cell-culturing practices, emerging deep 79 sequencing technologies have enabled researchers to uncover D-RNAs in both natural and laboratory 80 animal infections (15,16), as well as in human samples (17,18).  RNAs were generated by undiluted/high MOI serial passaging and were found to suppress wild-type 99 parental virus replication. Later, SINV D-RNA production in persistently-infected BHK cells was 100 positively correlated with increased resistance to SFV challenge (29), indicating a potential for D-RNAs 101 to interfere with the replication cycles of heterologous alphaviruses. The first alphavirus D-RNAs to be 102 sequenced were derived from passage (P) 11 SFV; Lehtovaara et al subsequently found that SFV D-RNAs contained conserved nucleotide sequences from the most 5' and 3' ends of the genome, which 104 were rearranged across several repeats (30,31). Similar deletions spanning the majority of the two 105 open reading frames were later identified for SINV (32), although Monroe et al additionally found that 106 D-RNA populations were generally heterogeneous and comprised of numerous species. From these 107 and other studies (26-28, 33, 34) it has been hypothesized that alphavirus D-RNAs are encoded by 108 defective negative-sense templates that are subsequently transcribed into defective positive-sense 109 transcripts. This hypothesis is supported by the discovery of novel double-stranded intracellular RNA 110 species in late-passage SINV infections (33), which suggests the presence of a truncated intermediate. 111 Later, a particularly common deletion spanning from the nsP1-E1 genes of the SINV genome was unexplored. In this study, we characterized both intracellular and packaged D-RNA populations during 127 early viral passages, under the hypothesis that numerous D-RNAs arise de novo intracellularly that are 128 not packaged. By thoroughly investigating intracellular RNA diversity and its relation to packaged RNA 129 populations, we can define D-RNA subtypes and thereby elucidate specific roles D-RNAs play during 130 infection. To this end, we used Illumina sequencing to sequence RNA from passage (P) 1 alphavirus 131 stock, intracellular RNA (P1.5), and resultant P2 encapsidated RNA, and various bioinformatics 132 approaches to analyze differences in D-RNA diversity and expression. Intracellular D-RNA expression 133 was also evaluated using Oxford Nanopore Technologies' direct RNA sequencing. Finally, we utilized 134 a murine model of CHIKV infection to demonstrate that these in vitro trends hold true in a complex To investigate potential differences between intracellular and packaged D-RNA populations 140 during early passages, African green monkey kidney cells (Vero) were infected with CHIKV 181/clone 141 25 at multiplicity of infection (MOI) of 2. After 12 hours incubation, cells were washed three times with 142 PBS, and then total cellular RNA was extracted and rRNA-depleted. Concomitant infections were 143 incubated for 48 hours, and then supernatant was collected, clarified by centrifugation, and 144 concentrated using PEG-NaCl precipitation; concentrated virus was incubated with 2µg RNase A for 1 145 hour at room temperature to remove non-packaged RNAs prior to RNA extraction. By following this 146 study design, we were able to maximize available viral RNA from both compartments and thus enable 147 us to identify rare events. RNAseq libraries were then constructed using the previously described 148 ClickSeq library preparation method optimized for the discovery of rare recombination events (43,44) 149 and sequenced on an Illumina NextSeq550. Reads were trimmed and filtered using fastp (45) and 150 analyzed using the ViReMa v1.5 pipeline, which provides alignment data, recombination events, and 151 associated count data (20,44,46). This study design was repeated once with one replicate and a 152 second time with 3 replicates (used for statistical analyses). Additionally, RNA from two separate stock 153 vials was purified and sequenced. These were not included in statistical comparisons, but are shown 154 none-the-less to establish a baseline D-RNA phenotype. All reactions resulted in robust sequencing 155 read coverage across the reference virus genome, with median nucleotide coverage ranging between 156 34268 and 71485 reads (Table 1). Median coverage was slightly higher for nucleotides in the 157 subgenomic region in intracellular samples, ranging between 5.4-5.9x higher than those in the 158 nonstructural gene cassette. Overall, after normalizing count data to the number of mapped reads per 159 million sequenced reads, both the total number of recombination events, as well as the number of 160 unique events, were significantly higher among intracellular RNAs than encapsidated RNAs (Student's 161 T-test, p<0.01 for all; Fig. 1A-B), indicating that the majority of D-RNAs generated during replication 162 were not packaged. This is supported by Shannon's entropy analysis, calculated from raw D-RNA count 163 data weighted against median wild-type coverage as previously described (44), which found that 164 Shannon's diversity index (H) for D-RNAs is significantly higher for intracellular RNAs than  for intracellular and encapsidated D-RNAs ( Fig. 2A), which revealed that intracellular samples cluster 175 together while two of the three encapsidated replicates cluster together with the third clustering closer 176 to intracellular samples, indicating that variation in D-RNA expression patterns is mostly unique to its 177 respective sources (intracellular vs. encapsidated). This is supported by differential expression analysis  To compare specific boundaries, all normalized ViReMa count data from each data set (stock, 187 intracellular, and encapsidated) were combined; normalized count data for D-RNAs present in at least 188 two out of three samples were averaged, and then the boundaries for D-RNAs with greater than 2 189 counts per 10 6 mapped reads are illustrated using heatmaps, along with associated coverage data (Fig For the recombination events observed in the subgenomic region, it is possible that they occur 206 on either genomic or sg transcripts intracellularly; additionally, it is difficult to quantify relative numbers 207 of D-RNAs to full-length transcripts using short-read sequencing due to, for example, variance inherent 208 to RNAseq in nucleotide coverage across the virus genome and at boundaries of recombination events.

209
Thus, to put these sgDels into context, two separate Oxford Nanopore Technologies' (ONT) Direct RNA 210 Sequencing (DRS) libraries were generated using two intracellular samples derived from independent 211 infections that utilized completely distinct CHIKV 181/clone25 stocks. ONT DRS directly sequences the 212 input RNA on a single molecule by molecule basis (there is no PCR or amplification process), 213 maintaining native features of the RNA (50). Therefore, this technology can robustly quantify relative 214 numbers of genomic and sgRNA reads. Reads were based-called using Albacore v2.0.1 and aligned 215 to the CHIKV 181/clone 25 genome using minimap2 (51). Full-length reads were recovered (totaling 216 75-82 reads per data set), and median nucleotide coverage ranged from 229-249.5 with decreasing 217 coverage from the 3'UTR (due to 3' to 5' sequencing from the poly(A) tail) and a sharp drop in coverage 218 following the 5'UTR of the sgRNA (Fig. S2), showing a clear difference between genomic and 219 subgenomic reads. From aligned data, specific read junctions were extracted from the SAM file and 220 reads containing deletions were assessed (Table 2). For both sequencing data sets, multiple deletion 221 events on a single read were rare, with 96-98% of deleted reads featuring just one deletion. First, reads were placed into one of two categories: 1) genomic reads containing deletions, 225 defined as a read beginning between NT 1-7450; and 2) sg reads containing deletions, defined as a 226 read beginning between NT 7500-7550. Beyond these nucleotides, it is not possible to discern between 227 true sgRNA reads and genomic reads that may have simply been truncated during the sequencing 228 process (either due to RNA fragmentation or incomplete 3' to 5' nanopore sequencing of the RNA). The representative replicate shown), along with read/deletion boundary counts. These data confirm that 231 many of the sgDels occur specifically on sgRNAs rather than genomic RNAs. Interestingly, although 232 sgDels were present in both genomic and sgRNA transcripts, there was little to no commonality 233 between sgDels observed in genomic reads and sgDels observed in sg reads, with only two sgDels 234 common to both sgRNA and genomic RNA reads in DRS1. Thus, we were unable to confirm the 235 presence of a common defective template for the overwhelming majority of sgDels observed.

236
Finally, these data additionally reveal that 2.9-3.2% of all aligned reads from intracellular   Fig. 8), similar to that observed for CHIKV: for 277 MAYV, near NT 7600 (Fig. S3); for SINV, near NT 7760 (Fig. S4); and for AURV, near NT 7840 (Fig.   278   S5). These sites were accompanied by an additional downstream donor site that was nearly absent in    (Table 4); therefore, spleen and liver were 300 excluded from downstream analyses.  Also consistent with in vitro results, heat maps of recombination junctions reveal the same three 317 acceptor sites observed in vitro across both serum and tissue samples, as well as a similar donor site 318 specific to tissues rather than serum (Fig. 10). Interestingly, skeletal muscle, and especially kidney 319 tissue, was particularly enriched for sgD-RNAs. 3'UTR recombination events remained the most 320 common type of D-RNA in all samples. This is also evident when comparing the top deletion events 321 across different tissues, where kidney samples unsurprisingly most closely recapitulate in vitro data 322 ( Fig. S2-3). In addition to the 3' boundary observed in the 3'UTR, all mouse samples showed an  formation is through a copy-choice mechanism, in which a polymerase retains the nascent RNA chain 368 but switches templates during transcription (69,70). In this model, it is generally thought that template-369 switching occurs during anti-sense strand synthesis, thus giving rise to a defective template prior to 370 generation of daughter sense-strand RNA. This is a reasonable hypothesis for a few reasons, including: 371 1) anti-sense template is generally rare, affording few opportunities for recombination between discrete 372 templates; and 2) to produce D-RNAs to transmissible levels, presumably a defective template would 373 be required to generate a minimum copy number to increase packaging efficiency. Kirkegaard and 374 Baltimore were also able to demonstrate this experimentally using a wild-type poliovirus and a 375 guanidine-resistant poliovirus mutant, which showed preferential template-switching activity consistent 376 with that predicted for the generation of recombinant negative-sense template RNA (69).

377
However, the data presented here suggests that template-switching may not occur exclusively 378 during anti-sense template synthesis. Firstly, that alphavirus D-RNA production likely occurs de novo 379 implies that transmitted template is not a strict requirement for D-RNA production. In addition, 380 differential D-RNA diversity between intracellular and encapsidated alphavirus RNA was due almost 381 entirely to recombination events in the subgenomic region encoding the structural proteins. The 382 packaging signals of alphaviruses are thought to be in the nsP1 (71, 72) and, in the case of Semliki 383 Forest virus, nsP2 genes (3) in the nonstructural gene cassette, thus the lack of packaging of these 384 particular D-RNAs may be due to the fact that they are specifically on sgRNA transcripts and not 385 genomic transcripts. This was confirmed by DRS, which showed that CHIKV sgDels were expressed 386 primarily in sgRNA transcripts. Furthermore, since genomic and sgRNA transcripts are both transcribed 387 from full-length negative-sense template RNA, then we would reasonably expect to see the same 388 deletions in both genomic and subgenomic reads in long-read sequencing data if template-switching 389 occurs during negative-sense strand synthesis; however, because no overlap in deletions was 390 observed between genomic and subgenomic reads in either of our DRS data sets, we could not confirm 391 the presence of a common defective template. Together, these results suggest that either: 1) template-392 switching can occur during either antisense-or sense-strand synthesis; or 2) sgD-RNAs are not 393 generated through a copy-choice mechanism. Clustering of recombination junctions into "hotspots" 394 supports the former.

395
Similar to previous work, we also observed substantial recombination activity in the 3'UTR of 396 CHIKV. These CHIKV 3'UTR recombination events mostly consisted of duplications, especially in 397 intracellular populations. Duplications in the CHIKV 3'UTR are well-documented and are thought to 398 influence CHIKV host adaptability, particularly in the insect vector (36, 38, 73). Filomatori and Bardossy 399 and colleagues additionally found that these recombination events arise from a mixture of homologous-400 and nonhomologous-template switching (39), although the relative quantities of each varied depending 401 on host type. In addition to these, we have also shown that CHIKV, MAYV, SINV, and AURV all readily 402 produce sgD-RNAs featuring recombination events in the subgenomic region, especially subgenomic 403 deletions (sgDels) during early, low-MOI passages, as well as in a murine model of CHIKV infection.

404
All sgD-RNAs share similar boundaries between all viruses tested, particularly deletions with the first 405 junction occurring between capsid-E3 regions and the second occurring between E1-3'UTR regions.

406
Because these populations arise regularly after infection not only with different CHIKV stock viruses 407 but also during infection with heterologous alphaviruses, with unpredictable overlap between those 408 observed among intracellular and encapsidated RNAs, this suggests they arise de novo and are 409 consistently expressed at low levels intracellularly. The boundaries appear to be loosely conserved 410 between the alphaviruses tested, which all either cause or are highly related to viruses that cause 411 arthritogenic disease (56). Interestingly, 3'UTR events were relatively enriched in encapsidated factions 412 compared to sgDels and other subgenomic recombination events, potentially indicating a different 413 mechanism through which these events occur.

414
Intriguingly, although this study focused on arthritogenic alphaviruses, the general sgD-RNA 415 boundaries correspond to a well-known recombination event between Eastern equine encephalitis virus 416 (EEEV) and SINV that gave rise to the Western equine encephalitis virus (WEEV) species (56, 74).

417
Little is known about the particulars of this recombination event, such as whether it occurred in an insect 418 or mammalian host, but determining the mechanism of expression of sgD-RNAs among alphaviruses 419 may help delineate the circumstances that gave rise to WEEV. This is especially important, as CHIKV

424
A common concern regarding D-RNA research is that they may simply be an artifact of tissue  Type 1 interferon-deficient IFNαR -/-(A129) mice of varying ages and genders were kindly provided by 493 Dr. Slobodan Paessler. Mice were infected with 10 3 PFU CHIKV AF15561 in 10uL PBS in the left rear-494 footpad and monitored for weight and signs of disease. Animals were euthanized 4 days post-infection 495 by CO2 overdose followed by cardiac puncture, whereupon blood, skeletal muscle from the infected 496 and contralateral legs, heart, kidney, liver, and spleen: organs were placed in 300uL RNA later solution, 497 while whole blood was centrifuged at 3,380 rcf for 5 minutes and serum collected and stored at -80 until 498 further use. Upon processing, tissue was removed from RNAlater solution, rinsed in sterile PBS, and 499 placed in a 2mL microcentrifuge tube with 300uL Buffer RLT and a stainless steel ball bearing. Samples 500 were heat inactivated at 60C for 15 minutes, and then organs were homogenized using a TissueLyzer 501 II (Qiagen, Hilden, Germany) shaking at 26 p/sec for 5 minutes. RNA was then extracted from organs 502 in Buffer RLT using RNeasy Fibrous Tissue mini kits following manufacturer's instructions (Qiagen), The resulting .txt files from ViReMa were used to calculate Shannon's entropy (H), weighted by median 527 nucleotide coverage, utilizing a previously described custom python script (44). Further, using DESeq2 528 (47), for each virus recombination events were compiled, normalized, and expression changes 529 evaluated based on ViReMa recombination output file. From these results, hierarchical clustering was 530 performed using Cluster 3.0 (48), filtering for a minimum of 2 replicates with expression values >2; 531 resulting trees were visualized using TreeView (49). Additionally, normalized data from DESeq2 were 532 used to perform principle component analyses in SigmaPlot. ONT sequences were aligned to 533 alphavirus genomes using minimap2 (51)and read alignments extracted using a custom python script.

535
All statistics were performed in SigmaPlot. For Shannon's entropy and unique/total D-RNA 536 comparisons, data were tested for normality using the Shapiro Wilk test, followed by Student's T-test 537 comparing intracellular and encapsidated P2 samples. For animal data, data were tested for normality 538 followed by one-way analysis of variance (ANOVA). Data formatted by DESeq2 were used for principle 539 component analyses performed in SigmaPlot. 540 Data availability 541 All raw data files for Illumina and default quality-filtered/base-called data for direct RNA nanopore