The Genome Segments of Bluetongue Virus Differ in Copy Number in a Host-Specific Manner

The variation in viral gene frequencies remains a largely unexplored aspect of within-host genetics. This phenomenon is often considered to be specific to multipartite viruses. Multipartite viruses have segmented genomes, but in contrast to segmented viruses, their segments are each encapsidated alone in a virion. A main hypothesis explaining the evolution of multipartism is that, compared to segmented viruses, it facilitates the regulation of segment abundancy, and the genes the segments carry, within a host. These differences in gene frequencies could allow for expression regulation. Here, we show that wild populations of a segmented virus, bluetongue virus (BTV), also present unequal segment frequencies. BTV cycles between ruminants and Culicoides biting midges. As expected from a role in expression regulation, segment frequencies tended to show specific values that differed between ruminants and midges. Our results expand previous knowledge on gene frequency variation and call for studies on its role and conservation beyond multipartite viruses.

the so-called "set-point GF" (i.e., the median segment frequencies in a population of infected hosts) (1). The fact that GFs deviate from segment equimolarity is puzzling from an evolutionary standpoint. Such population genetics should imply a cost derived from the higher probability of losing scarce segments during population bottlenecks (5). However, populations close to the set-point GF tend to accumulate at higher rates, suggesting a link with virus fitness (1,4), probably through optimization of gene expression (1,6). As expected from a role in expression regulation, the set-point GF has been shown to be host specific (1,4). However, the putative relationship between GF and gene expression has not yet been directly demonstrated in multipartite viruses (7).
Currently, there are two examples of nonmultipartite viruses with a behavior reminiscent of set-point GFs in multipartite viruses. In the first example, populations of a monopartite baculovirus bore specific differences in gene frequencies that are due to certain genotypes with a deletion in the same genomic region (8). In the second example, a lower encapsidation rate of segment N led to unequal gene frequencies in populations of an experimentally evolved clone of influenza A virus, a segmented virus (9). In both cases, increases in fitness were observed despite the essential role of the genes with lower frequencies (9,10). Moreover, fitness improvements were probably due to regulation of virus expression (9,10).
These observations raise the question on how widespread gene frequency variation is. We have explored this question in bluetongue virus (BTV; Orbivirus, Reoviridae), a segmented virus affecting livestock worldwide. BTV possesses a double-stranded RNA (dsRNA) genome divided into 10 segments, a segment number among the highest in viruses. Most segments code for a single protein (11). Moreover, BTV is mainly an arthropod-borne virus infecting ruminants and Culicoides biting midges. This complex cycle probably involves yet-unknown mechanisms allowing expression regulation in phylogenetically distant hosts.

RESULTS
We first built a collection of samples from naturally infected hosts during the 2014/2019 epizooty of BTV serotype 4 (BTV-4) in Europe (12). Blood samples were collected from sheep and cows from Corsica (France) and sheep from Sardinia (Italy), two Mediterranean islands ( Table 1). The collection also included pools of 50 adult females of Culicoides imicola from the same islands and period (Table 1).
We then developed a SYBR green quantitative PCR (qPCR) approach to quantify genome segments similar to those used with multipartite viruses (1,3,4). The SYBR green assay was chosen over other qPCR assays due to its accuracy, simplicity, and cost efficiency in schemes requiring the validation of many primer pairs in parallel and the analysis of large sample numbers. Importantly, BTV segments can be quantified apart from viral transcripts because the negative strand of the dsRNA segments is generated only within fully formed capsids (13). Hence, we developed a two-step reverse transcription (RT)-qPCR assay for each segment of BTV-4 (strain BTV4᎑16᎑03) (12) in which the RT primer targets only the negative strand. We tested a large panel of primer couples to identify couples with similar efficiencies (minimum/maximum ϭ 91.8/95.2) and specificity when using BTV-free templates ( Table 2).
Each RNA sample was screened with the 10 RT-qPCR assays. To ensure exact calibration of the 10 qPCRs, we generated and used a single plasmid as the template for all the standard curves. This plasmid harbors a concatenate of the 10 regions targeted by our RT-qPCR assays, thus ensuring exactly the same initial number of copies per amplicon. We selected for further analysis only those samples showing a threshold cycle (C T ) value below 31 for all segments (approximately 1,000 copies). The final data set consisted of 15, 12, and 9 samples from sheep, cows, and biting midges, respectively (Table 1). We tested for primer-independent cDNA synthesis, a phenomenon in which RNA secondary structures or nucleic acid molecules (e.g., tRNA) can act as primers in the RT reaction (14). This phenomenon can affect strand-specific quantification in viruses (15), but to our knowledge, it is not tested for during quantification of BTV or other reoviruses (e.g., reference 16). We compared RT-qPCR output with or without a primer at the RT step for all segments and samples. RT reactions in the absence of primers for segments 2, 7, and 10 were the only ones yielding mean copy numbers at least 1% of that obtained with an RT primer (9%, 1%, and 26%, respectively; Fig. 1A). Primerindependent amplification could not be abolished without losing sensitivity through changes in the elongation temperature or the primer concentration at the RT step as previously proposed (15) (Fig. 1B). We compared segment frequencies in all samples between two data sets, corrected or not for the copy number obtained without a primer during the RT step. We observed a significant difference for segment 10 (Wilcoxon test, false-discovery rate [FDR] corrected, P value ϭ 0.008), although this effect was detected only in sheep samples when the data set was split per host species (Wilcoxon test, FDR corrected, P value ϭ 0.026 for the sheep data set) ( Fig. 2A). Therefore, all statistical analyses were carried out in three data sets in parallel: (i) a data set with the raw copy numbers, (ii) a data set obtained after subtraction of copy numbers found in the absence of an RT primer (corrected data set), and (iii) a data set with raw copy numbers but without data from segment 10. No main difference in the significance of the statistical tests was observed between the three data sets ( Table 3).
The results presented below were obtained with the corrected data set. As observed in multipartite viruses, relative segment frequencies were variable and their distributions sometimes deviated from those expected under segment equimolarity ( Fig. 2A). Moreover, distribution of segment frequencies in each sample did not suggest multimodality in frequency distributions between samples (for example, a group of samples with equimolar amounts of segments and another group with segment frequencies deviating from equimolarity) ( Fig. 2B and Fig. 3). Segments 1 and 9 had lower frequencies and segments 5 and 7 had higher frequencies than expected under equimolarity in all hosts (Student's t test one-sided, Benjamini-Hochberg corrected, P values Ͻ0.01) (Fig. 2A). The frequencies of segments 2, 6, and 10 also significantly deviated from equimolarity but only in sheep, cows, or midges, respectively (Table 4).
An analysis of variance showed a significant effect of the variable "segment" on segment frequencies ( Fig. 2A; Table 3). The analysis did not detect a significant effect for primer-independent cDNA amplification. The correction consisted of subtracting the copy number yielded in an RT-qPCR assay without a primer in the RT step, and it was done for each segment and sample. Segment numbers appear at bottom. The asterisk indicates the only segment-host combination (segment 10 in sheep) for which a significant difference was detected between the uncorrected and corrected CNs. (B) Influence of temperature and primer concentration at the RT step on primerindependent cDNA amplification and qPCR sensibility. Data presented were obtained with the RT-qPCR assay for segment 10, the segment showing significant primer-independent amplification. RNA extracted from BTV-infected BHK cells was used as the template. RT-qPCRs were carried out as described in the text. Colors follow final primer concentration (M) in RT step 1. Red dots indicate PCR output when RT was performed without a primer. The experiment was performed with two or three biological replicates. We selected an RT temperature of 42°C and a primer concentration of 2 M. of host or island but found significant interactions between segment and host, and between segment and island ( Table 3). The analysis also found a marginally significant interaction among the three independent variables (P value ϭ 0.049) ( Table 3). Such crossover interactions are likely to take place when examining frequency data with several independent variables. We thus analyzed the effect of host species on segment frequency as previously done (1). Several factors hampered a robust analysis of a putative island effect, and it was not further explored (i.e., limited sample size, lack of cow samples from Sardinia, and differences in sample processing between islands).
No significant difference in the frequencies of any segment was found between BTV populations from ruminants (Tukey's honestly significant difference [HSD] test for each segment with host as independent variable [ Fig. 2A]; the full statistical output can be found at https://github.com/loire/MatSupp_Moreau20JVir). In contrast, segments 1, 7, and 10 showed significant differences in their frequencies between midge and ruminant samples (Fig. 2A). A significant difference was also identified for segment 2 between midge and sheep samples ( Fig. 2A). The results obtained with the plasmid control ("Control," boxplots in gray) are shown to allow comparison with a template with equimolar amounts of PCR targets. Dot shape indicates sample origin (island) or template type (plasmid control versus field samples). The black and white arrows indicate the segments for which the average relative frequency was significantly higher or lower, respectively, than expected under equimolarity. Significant differences in frequencies among hosts for a given segment are indicated with a star above boxplots. Black asterisks correspond to significant differences between populations in the two ruminant hosts versus those in midges, whereas the red asterisk shows a significant difference between BTV populations from sheep and biting midges. (B) Segment frequencies per sample. Dots represent the relative frequency of a given segment (x axis) in a sample. Samples have been split per host in three panels (sheep, cow, and midge). For each host, all dots from a given sample are linked by a line of the same color.
These results supported the existence of host-specific set-point GFs in BTV-4 (Fig. 4). The relative abundances of segments 5, 6, 7, and 10 were between 2-and 3-fold higher than that of segment 1 in midges, whereas differences in ruminants were below 2-fold (Fig. 4). The largest differences were in the same order as those in nonmultipartite viruses with GFs improving fitness (between 3-and 4-fold) (8,9).

DISCUSSION
Deletions and nonconservative encapsidation have been observed in BTV populations infecting cell culture (17)(18)(19). These phenomena, undistinguishable with our PCR approach, could generate the observed differences among segments. Moreover, both phenomena would result in differences in the number of fully functional copies between segments and thus impact gene expression. However, if random, these phenomena alone cannot explain reproducible host-specific GFs. The observed GFs may also be (i) generated by host factors, (iii) encoded in the virus genome, or (iii) the product of selection on randomly generated GF variation (1,5).
Host-specific GFs are expected under a hypothetical role in viral gene expression. However, we cannot provide a robust functional explanation on the differences between segments or hosts. For example, differences in copy number roughly followed stoichiometry in capsids for segment 7 (Fig. 4). However, this trend was perceptible only in midges whereas capsid structure is supposedly similar in ruminants and midges. Whatever their role (or absence of), unbalanced GFs should have an impact on BTV evolution. If segments differ in their population sizes (i.e., their copy number), we could expect different evolutionary rates among segments and the genes they carry (5,(20)(21)(22). Moreover, a given segment could have different evolutionary rates depending on the host.
Overall, our results strongly suggest that wild BTV populations can have set-point GFs that differ from segment equimolarity and, even more intriguingly, between insect a Analyses of variance (model: frequency ϳ segment ϫ host ϫ island) with three data sets (raw, corrected, and Wo-10). Raw, raw copy numbers per segment and sample; corrected, copy numbers after subtraction of copy numbers found in primer-independent amplifications; Wo-10, raw copy numbers for all segments but segment 10. and ruminant hosts. This work expands our limited knowledge on gene frequency variation in viruses and calls for studies on its role and conservation in BTV and beyond.

MATERIALS AND METHODS
Sample collection and RNA extraction. Samples were obtained from two Mediterranean islands, Corsica (France) and Sardinia (Italy), between 2016 and 2017. Blood samples from sheep were collected from symptomatic animals from farms undergoing a BTV outbreak in both Corsica and Sardinia. Blood samples from cows were obtained during routine testing for BTV in slaughterhouses in Corsica. Biting midges were collected using OVI traps as previously described (23), except that the water in the traps that was replaced with a solution limiting RNase activity (24). Culicoides imicola females were identified on a cold plate, gathered in pools of 50 individuals, and stored in absolute ethanol at Ϫ20°C for further investigation.
RNA extraction from Corsican samples was performed with the RNA viral extraction kit (Macherey-Nagel) following manufacturer's instructions and using linear acrylamide (New England Biolabs) as carrier. Sardinian samples were extracted with the MagMAX core nucleic acid purification kit (Applied Biosystems) following manufacturer's instructions.
Development and validation of the quantitative PCR designs. Primers were designed with the LightCycler probe design software 2.0 (Roche) using as the template the segment sequences of strain BTV4᎑16᎑03 (GenBank accession numbers KY654328 to KY654337). Primer pairs located in the central regions of segments were preferred. Primers for the reverse transcription of the complementary strand were tested for potential hybridization on the coding strand using Geneious v10 software, and only primers with at least 5 mismatches were selected (Table 2). Then, primer pairs underwent several PCR efficiencies between 90% and 100% and a single peak during melting curve analysis of amplicons were selected for further validation. Specificity for selected primer pairs was tested against cDNA suspensions obtained from RT-processed RNA from BTV-free sheep blood and Culicoides biting midges and with concentrations similar to those of the field samples. Only primer pairs providing C T values above 34 were selected for further validation. Another round of efficiency validation was then carried out but this time using a plasmid as the template. The plasmid contained a concatenate of the regions targeted by primer pairs (i.e., each amplicon plus 20-bp flanking regions in both extremities of each amplicon), thus ensuring exactly the same initial number of copies per amplicon. Only primer pairs showing similar efficiencies and profiles of the melting temperature analyses between assays using either plasmids or cDNA were selected. Moreover, primer concentration was optimized to improve efficiency so as to obtain primer pairs with efficiencies at most differing in 5 units and, thus, with highly similar PCR dynamics. In total, we tested 61 primer pairs to generate the final set ( Table 2; the primers rejected can be found at https://github.com/loire/MatSupp_Moreau20JVir). Reaction and cycling conditions of RT-qPCRs. Tables 5 and 6 present the reaction and cycling conditions. First, RNA solutions were denatured with a dimethyl sulfoxide (DMSO) treatment. Then, RT and qPCRs were carried out in a two-step process. RT reactions were prepared separately for each segment as presented in Tables 5 and 6. RT reactions were performed with primers at a final concentration of 2 M for all segments and as described in Table 5 with the Revert-Aid kit (Thermo Fisher Scientific) in a Simpliamp Thermal Cycler (Thermo Fisher Scientific). The RT primer volume was replaced with nuclease-free water in tests for primer-independent cDNA amplification. All qPCRs were carried out in triplicate (primer concentrations are shown in Table 2) with the LightCycler FastStart DNA Master Plus SYBR green I kit (Roche) in a LightCycler 480 thermocycler (Roche).
Statistical analyses. All statistical tests were carried out with the R software (R Development Core Team, 2011, version 1.2.5033). Distances of segment frequencies in a sample to equimolarity (dEQ)   25), where i is the segment, p is the relative frequency of the segment in the sample, and 0.1 is the relative frequency at equimolarity.