Evolutionary trajectory of fish Piscine novirhabdovirus (=Viral Hemorrhagic Septicemia Virus) across its Laurentian Great Lakes history: Spatial and temporal diversification

Abstract Piscine novirhabdovirus = Viral Hemorrhagic Septicemia Virus (VHSV) first appeared in the Laurentian Great Lakes with large outbreaks from 2005 to 2006, as a new and novel RNA rhabdovirus subgenogroup (IVb) that killed >30 fish species. Interlude periods punctuated smaller more localized outbreaks in 2007, 2010, and 2017, although some fishes tested positive in the intervals. There have not been reports of outbreaks or positives from 2018, 2019, or 2020. Here, we employ a combined population genetics and phylogenetic approach to evaluate spatial and temporal evolutionary trajectory on its G‐gene sequence variation, in comparison with whole‐genome sequences (11,083 bp) from a subset of 44 individual isolates (including 40 newly sequenced ones). Our results show that IVb (N = 184 individual fish isolates) diversified into 36 G‐gene haplotypes from 2003 to 2017, stemming from two originals (“a” and “b”). G‐gene haplotypes “a” and “b” differed by just one synonymous single‐nucleotide polymorphism (SNP) substitution, remained the most abundant until 2011, then disappeared. Group “a” descendants (14 haplotypes) remained most prevalent in the Upper and Central Great Lakes, with eight (51%) having nonsynonymous substitutions. Group “b” descendants primarily have occurred in the Lower Great Lakes, including 22 haplotypes, of which 15 (68%) contained nonsynonymous changes. Evolutionary patterns of the whole‐genome sequences (which had 34 haplotypes among 44 isolates) appear congruent with those from the G‐gene. Virus populations significantly diverged among the Upper, Central, and Lower Great Lakes, diversifying over time. Spatial divergence was apparent in the overall patterns of nucleotide substitutions, while amino acid changes increased temporally. VHSV‐IVb thus significantly differentiated across its less than two decades in the Great Lakes, accompanied by declining outbreaks and virulence. Continuing diversification likely allowed the virus to persist at low levels in resident fish populations, and may facilitate its potential for further and future spread to new habitats and nonacclimated hosts.

Phylogenetic approaches have elucidated the overall evolutionary patterns of emerging and resurging viruses, such as Zika (Faye et al., 2014), West Nile (May, Li, Davis, Galbraith, & Barrett, 2010), Measles (Kimura et al., 2015), and the VHSV fish virus (Pierce & Stepien, 2012;Stepien et al., 2015Stepien et al., , 2020. Such broadscale examinations generally have lacked the fine-scale resolution to address recent temporal and spatial trends, which can be assessed with a population genetics approach. Population genetic investigations analyze changes in gene frequencies over spatial and temporal scales to discern the effects of natural selection, drift, and gene flow on mutational variation, as well as their respective influences on fitness and adaptations (summarized by Hedrick, 2011;Lowe, Kovach, & Allendorf, 2017).
Most traditional virus studies have defined a viral population as a genetic isolate from a single host individual (Beerenwinkel & Zagordi, 2011;Yang et al., 2012). Here, we define a viral population as the gene pool obtained from isolates at a given geographic location and from a common time point, comprising a single outbreak. The few population genetic studies of viruses to date largely have been restricted to agriculturally important plant viruses (Alabi, Martin, & Naidu, 2010;Lin, Rubio, Smythe, & Falk, 2004;Tsompana, Abad, Purugganan, & Moyer, 2005) or human pathogens (Bahl, Vijaykrishna, Holmes, Smith, & Guan, 2009;Kearney et al., 2009;Pesko & Ebel, 2012). Most examined viral populations either within a single host species (Alabi et al., 2010;Pesko & Ebel, 2012;Tsompana et al., 2005) or a single host individual (Kearney et al., 2009), rather than focusing on broadscale evolutionary trends across a virus' temporal and spatial history, as addressed here.

| VHSV evolution, outbreaks, and hosts
Viral Hemorrhagic Septicemia Virus infects >140 species of fishes in marine, estuarine, and freshwater environments across the Northern Hemisphere, rendering it one of the world's most serious fish diseases (Escobar, Escobar-Dodero, & Phelps, 2018). The evolutionary origin of VHSV has implicated a North Atlantic marine ancestor (Pierce & Stepien, 2012), whose lineage diversified into four genogroups (designated I-IV), with I-III found in the Northeastern Atlantic region of Europe (Hedrick et al., 2003;Meyers & Winton, 1995;Pierce & Stepien, 2012). VHSV genogroup I has a wide and diverse geographic range across Western Europe and has diversified into several subgenogroups (Pierce & Stepien, 2012). Genogroup I infects the most fish host species across a variety of estuarine and freshwaters (Kurath, 2012;Pierce & Stepien, 2012), including many aquacultured species (Abbadi et al., 2016;Ghorani et al., 2016). VHSV genogroup II diverged in Baltic Sea estuarine waters and is the sister group of genogroups I and III, with genogroup III mostly occurring in marine and estuarine waters of the North Sea (Pierce & Stepien, 2012;Stepien et al., 2020).
The Great Lakes' endemic subgenogroup-IVb-was back-traced to its apparent origin in a 2003 muskellunge (Esox masquinongy) from Lake St. Clair (Ammayappan & Vakharia, 2009). The first outbreaks of IVb occurred during the spring months of 2005 and 2006, resulting in massive and widespread fish kills across the Great Lakes (Groocock et al., 2007;Lumsden et al., 2007;Thompson et al., 2011).
In 2000, subgenogroup IVc was discovered in marine/estuarine North Atlantic waters (Gagné et al., 2007). IVc is the sister group of IVb, which together comprise a clade that is the sister group of IVa (Pierce & Stepien, 2012;Stepien et al., 2015Stepien et al., , 2020.

| Aim and objectives
Our investigation aimed to evaluate whether VHSV-IVb diversified during its two decades in the Great Lakes, and to elucidate possible spatial and temporal evolutionary patterns. The combined phylogenetic and population genetic approach analyzed partial sequences of glycoprotein (G-gene) and 44 whole-genome sequences (from a subset) from field-collected samples of the virus' hosts, which had naturally occurring infections, using new samples, historic isolates, and sequences available in NCBI GenBank (https://www.ncbi.nlm. nih.gov/GenBank). We then related the results with patterns in other viruses, to augment understanding of virus and host coevolution.

Middle
TA B L E 1 Field-collected individuals surveyed and numbers of individuals that tested VHSV positive, their isolate numbers (note some individuals are designated with >1 system (per Niner, 2019, and Stepien et al., 2015, homology indicated by =), species' common and scientific names, total length (TL, mm), VHSV concentrations (molecules per actin molecules per our laboratory test in ; also see Figure 3), haplotypes (G-gene and whole-genome), and GenBank accession numbers from our field-collected individuals in 2012, 2015, and 2016    Individual surgical sites (anus-operculum) each were cleaned with alcohol wipes, and a new sterile razor blade was used; the spleen was removed using sterile forceps and placed in a sterile labeled tube containing 1.5 ml RNAlater (Qiagen). Each liver was wrapped individually in sterile aluminum foil, and archived at −80°C. Spleen and liver tissues later were pooled in single tubes for up to five individuals (<50 mm TL) of the same species, collection, and location. Between specimens, forceps and aluminum dissection trays were sterilized, razor blades discarded in sharps containers, and gloves changed.
Specimen disposal followed the respective sampling agency protocol and/or UT IACUC#106419.
In 2015, to examine potential harboring of IVb, 50-100 dreissenid mussels were collected from all sites except for Budd Lake, where sufficient numbers occurred, and tested for VHSV.
Each mussel was smashed on a sterile surface with the back end of sterile forceps. Half of each was placed in RNAlater (as above), and the other half was wrapped in sterile foil and stored at −80°C.

| RNA extraction and reverse transcription
Tissues from up to 11 individuals per location were pooled in sterile 1.5 ml tubes for initial processing and ≤0.  Figure 3. " " = same as above.

TA B L E 1 (Continued)
TA B L E 2 VHSV-IVb samples used in our analyses. Isolate name (and alternate name from Niner, 2019, andStepien et al., 2020, used for whole-genome sequences), year, location information, geographic coordinates, host species, haplotype, GenBank accession number, and gene dataset (G-gene or All = whole genome) are provided for the (1) Early, (2) Middle, and (3) Later time periods

Isolate name
Year

Host species
Haplotype Accession no.

Isolate name
Year

Host species
Haplotype Accession no.

Isolate name
Year

Host species
Haplotype Accession no.

Isolate name
Year

Host species
Haplotype Accession no.

Isolate name
Year

Host species
Haplotype Accession no.

| Preparation of historic isolates in cell culture
Samples from historic VHSV outbreaks (2006)(2007)(2008)(2009)(2010)(2011) were received from G. Kurath (USGS, Seattle, WA) as frozen media from BF2 cell culture or as RNA, to which 30 and 150 µl of serum-free MEM (minimum essential media; Thermo Fisher Scientific) were added per well of a plate confluent with BF2 cells. Cells were incubated with media at 20°C for 1 hr, after which media were replaced with 10% serum MEM, and incubated at 20°C for <1 week. At ≥80% cytopathogenicity (CPE), media were collected in a 1.5 ml tube, 250 µl versene added for 10 min, centrifuged for 4 min at 1,700 g at 4°C, and the supernatant discarded. We added 250 µl Trizol ® to the cell solids, and isolated RNA and cDNA as above. All samples were passaged twice, to minimize possible NT mutations during cell culture. We were successful at sequencing the whole genome from 40 isolates.

| Sequencing VHSV isolates
cDNA was synthesized from total RNA extracted from tissue samples using SuperScript IV (Invitrogen), following manufacturer's instructions. Genomic cDNA was amplified in four segments using primers from Schönherz, Lorenzen, Guldbrandtsen,

Isolate name
Year

Host species
Haplotype Accession no. Positive controls of "a" were resequenced to confirm accuracy, and nuclease-free ddH 2 O served as a negative control. Any and all NT differences from "a" in the historic and new samples were confirmed with corresponding trace files. Amplicons were examined under UV light on 1% agarose gels stained with ethidium bromide.
Target PCR products were gel excised and purified using QIAquick Gel Extraction kits (Qiagen).
Additional PCRs amplified the front 700 nucleotides Genomic sequencing was performed at Ohio State University's Molecular and Cellular Imaging Center (Wooster, OH). Sequences were uploaded by us to the Galaxy web platform, and analyzed with usega laxy.org programs (Afgan et al., 2018). Segments were aligned to the reference VHSV-IVb genome (C03MU, GenBank: GQ385941) using MAP WITH BWA-MEM (Li, 2013). For each of the isolates, consensus sequences were generated followed by manual checking of each single-nucleotide polymorphism (SNP) and coverage read using Integrative Genomics Viewer (IGV: Robinson et al., 2011;Thorvaldsdóttir, Robinson, & Mesirov, 2013). Consensus sequences, front, and end segments were concatenated, aligned, and trimmed in MEGA X.
Haplotypes are defined here as "unique gene sequences that differ by one or more NT substitutions" from haplotype "a," which was the original 2003 isolate MI03GL sequenced from a Lake St. Clair muskellunge (GQ385941) (see Table 2). Haplotypes from the whole genomes were designated with their G-gene designations followed by a number.

F I G U R E 3
Concentrations of VHSV-IVb (±standard error) in wild-caught fish tissues, compared to results from experimental laboratory haplotype "a" challenged muskellunge, determined with qPCR assay developed in Stepien's laboratory using internal standards . Laboratory samples (squares) are named by the number of days (6-42D) after VHSV-IVb inoculation, H: high virus dosage (1 × 10 5 pfu/ml), and L: low dosage (100 pfu/ml) (data from Pierce, 2013). Haplotype of each sample is listed above its standard error bars. Solid line denotes the experimental threshold for clinical signs of disease and dashed line the cell culture detection threshold . Wild-caught samples (circles) are designated by abbreviated common name, followed by collection year and sample number (Table 1)  We calculated haplotypic and nucleotide diversity, and number and relative proportion of private haplotypes in populations with ARLEQUINv3.5 (Excoffier & Lischer, 2010). Evolutionary relationships among haplotypes are depicted with POPART (https://popart. otago.ac.nz) and TCS networks (Clement, Posada, & Crandall, 2000).
Pairwise divergences between populations were analyzed using θ ST (F ST analogue; Weir & Cockerham, 1984) in ARLEQUIN and with exact tests of differentiation (χ 2 ) in GENEPOP v4.6 (Raymond & Rousset, 1995;Rousset, 2008). The latter employed a MCMC procedure with 1,000 batches and 10,000 iterations to randomly sample allelic frequencies. Probabilities were adjusted with sequential Bonferroni correction (Rice, 1989) and reported both prior and after adjustment, to identify borderline cases.
Tajima's (1989) D tests in ARLEQUIN evaluated possible influence of selection. We also further examined selection pressures using unconstrained Bayesian approximation (FUBAR) (Murrell et al., 2013) to identify positive or purifying selection. Since FUBAR's assumption of constant selection might not accurately represent IVb, we also used MEME (mixed-effects model of evolution), which can detect positive selection, under strong purifying selection or the removal of detrimental variants (Murrell et al., 2013). FUBAR and MEME were run with HyPHY on DataMonkey (www.datam onkey. org), with significance evaluated using posterior probability >.95 for FUBAR and p < .05 for MEME. substitutions, using the G-gene.
A neighbor-joining genetic distance tree analyzed population relationships using Reynolds' R ST genetic distances (Reynolds, Weir, & Cockerham, 1983) in PHYMLv3.697 (Felsenstein, 2007) with 10,000 bootstrap pseudoreplications (Felsenstein, 1985). Possible relationships between genetic distance (θ ST ) and geographic distance were evaluated with separate Mantel (1967) tests for the Early, Middle, and Later time periods, using shortest waterway distances (km) between outbreak locations or the most direct road route for landlocked locations. We also tested the relationship between genetic distance (using all samples) and time (sampling years).

| VHSV-IVb detections
We . These had two new haplotypes: 13 with "w" and six with "x" (Table 1). No VHSV positives were detected from 1,003 dreissenid mussels collected in May-September 2015; thus, no further testing was conducted. Additionally, none of the 2017 fishes we sampled tested VHSV positive.
Our qPCR analyses of field-collected fishes (Figure 3) discerned VHSV-IVb concentrations (log values) for the 2015 white perch (haplotype "u") of 1.8 × 10 3 VHSV/10 6 β-actin molecules, with the round goby 2015 individual ("v") being much higher at 5.2 × 10 6 . The latter concentration was the highest recorded, much above the challenge experiment threshold for clinical signs of disease based on haplotype "a," yet the fish exhibited no visible VHSV signs . A wide range of viral concentrations (5.1 × 10 1 to 1.9 × 10 6 ) occurred in our 2016 field-collected positives ("w" and "x"), as had been found in the laboratory-challenged fish (haplotype "a") experiments (the latter from Pierce, 2013). Three 2016 gizzard shad individuals from Lake Erie possessing VHSV-IVb haplotype "w" also had virus concentrations that were above the clinical disease sign threshold (Figure 3).

| Evolutionary patterns
We analyzed a total of 184 G-gene sequences comprising 36 haplotypes, and a subset of those for whole-genome sequences containing 34 haplotypes. Figure 4 shows the consensus G-gene phylogenetic tree from maximum likelihood (PHYML) and Bayesian (MR. BAYES) analyses, which was rooted to genogroup VHSV-IVa. Estimated divergence times are indicated (BEAST). This tree is congruent with the tree from Stepien et al. (2015), but expanded to include the additional haplotypes to date. There are two major clades (labeled 1 and 2), with the new 2016 and 2017 haplotypes ("w," "x," and "bd") contained inside clade 1, along with the original haplotype "a." The F I G U R E 4 VHSV-IVb G-gene phylogeny. Phylogenetic tree of VHSV haplotypes based on the G-gene from maximum likelihood and Bayesian analyses. Values above nodes = 2,000 bootstrap pseudoreplicates/Bayesian posterior probabilities. Values in parentheses and italics = estimated divergence time (years) (Stepien et al., 2015). VHSV-IVa (AB179621) served as the out-group. Two clades discussed here (1 and 2) are designated by brackets. Symbols designate area in the Great Lakes (diamonds) and time period (squares) new haplotypes from 2015 ("u" and "v") are in clade 2. Evolutionary rates for the G-gene were similar in both datasets (partial G-gene: 1.00 × 10 −4 substitutions site −1 yr −1 ; whole G-gene: 8.51 × 10 −5 ). The overall rate for the entire genome dataset was slower, at 6.64 × 10 −5 (Table 3). Figure 5 depicts the G-gene haplotype networks, including separate networks (b, d, f) based on nonsynonymous substitutions alone (those with AA changes). Two predominant haplotypes, "a" and "b," are centrally located as the largest circles, containing 74 (40%) and 45 (24%) respective isolates. One single synonymous transition from cytosine to guanine at position 3,996 separated haplotypes "a" and "b." Thirteen unique haplotypes descend from "a," with mean divergence of 1.46 ± 0.22 NT. Twenty-one unique haplotype descendants surround "b," diverging by a mean of 1.67 ± 0.19 NT. Regional patterns are apparent (Figure 5a, Table 4), with 81% of haplotype "a" occurrences in the Upper (30%) and Central Great Lakes (51%), and 93% of "b" in the Lower Great Lakes. Similar geographic separations characterized the descendants: the "a" group was more prevalent in the Upper (N = 9, 25%) and Central Great Lakes (N = 24, 67%) and "b" in the Lower Great Lakes (N = 23, 74%).
Temporal patterns also are apparent in the networks (Figure 5c, Table 4). Haplotypes "a" and "b" were most abundant during the Early time period, comprising 51% and 46% of the samples. During A diversity of host species (Figure 4e-f) possessed either "a" or "b" haplotypes; none predominated in "a," but haplotype "b," and its descendants frequently were found in round goby (constituting 38% of the occurrences of "b"). Overall, "a" was in 13 host species and "b" in 11, with nine having either "a" or "b." Multiple occurrences of "a" group haplotypes were in gizzard shad (N = 12, including haplotype "w"), round goby (N = 4, all "x"), and bluegill (Lepomis macrochirus, N = 3) (Figure 5e). All emerald shiner and white bass (Morone chrysops) contained haplotype "a," and the single white perch with VHSV possessed "b." Both of the positive invertebrate samples (leech and amphipod) had "a" (Faisal & Schulz, 2009;Faisal & Winters, 2011).
Mean number of substitutions and relative percentage of AA changes in VHSV-IV significantly increased over time (Table 5A). (1.25, 0.50; 40%); that is, more substitutions were synonymous in the latter.

| Congruence with whole-genome analyses
Lower sample sizes (N = 45) and lack of available samples from some areas and temporal periods precluded more in-depth population analyses using whole-genome sequences, versus the larger sample sizes available for the G-gene alone (see Table 3).

| Population genetic patterns
Haplotype network results were statistically supported by pairwise genetic divergence (θ ST ) analyses. Pronounced population genetic divergence occurred over time, with the Later period differing from both the Early and Middle time periods (Table 6A). Populations from all three regions (Upper, Central, and Lower Great Lakes) significantly diverged (Table 6B), with the greatest distinction between the Central and Lower Lakes. Among individual water bodies (Table 6C), F I G U R E 5 G-gene haplotype networks. Partial G-gene sequences (669 NT) from 176 isolates using POPART (https://popart.otago.ac.nz) and TCS (Clement et al., 2000) for ( Table 2) illustrated with POPART and TCS. Numbers on branches denote numbers of NT changes. Red = Upper Great Lakes, Yellow = Central Great Lakes, Blue = Lower Great Lakes, and Underline = Haplotypes from the Later time period. Individual isolates sharing haplotype "a" for the whole-genome sequences were the following: E06WBc, C06NP, C06RB, C06SR, C06YP, C06FB, M08AMa-b, C08LEa-b, and C09MU, along with C03MU (see Table 2) TA B L E 6 Pairwise genetic divergences of VHSV populations between (A) sampling time periods, Early ), Middle (2007), and Later (2011, (B) Great Lakes regions (Upper, Central, Lower), and (C) individual water bodies, based on variation for the (1) G-gene and (2) entire genome data sets, using exact tests (GENEPOP; above diagonal) and θ ST divergences (ARLEQUIN; below diagonal) time periods (51%, p < .001), totaling 69% of the variation. AMOVA (Table 7B) found less but significant variation when variation first was partitioned among the three time periods (0.52%, p < .001), and more among their component sampling events (67%, p < .001).
The neighbor-joining genetic distance tree (Figure 7) shows the relationships among VHSV-IVb populations, which were each collected from a single area at a single time. The tree depicts two primary population clusters, whose upper cluster is dominated by "a" and its descendants, and lower cluster by haplotype "b" and its Further analyses to uncover possible selection using FUBAR and MEME on the entire genome data set also indicated that purifying selection characterized IVb's N-(codon 313), G-(codon 342), and L-genes (six codons: 8, 119, 333, 460, 1,284, and 1,758) (Table 9).

| VHSV-IV occurrences and evolutionary trajectory
VHSV-IVb has undergone extensive evolutionary changes across its less than two decades in the Great Lakes, showing significant TA B L E 7 Relative distribution of genetic variation among VHSV-IVb isolates using analysis of molecular variance (AMOVA, Excoffier, Smouse, & Quattro, 1992), calculated from partial G-gene sequences for (A) nucleotide sequences, and (B) amino acid changes, using ARLEQUINv3.5.1.3 (Excoffier & Lischer, 2010)  population spatial and temporal patterning, and increased genetic differentiation. Its haplotype distributions showed geographic structuring among the Upper, Central, and Lower Great Lakes. Amino acid changes also revealed significant diversification over time. Such increased genotypic and phenotypic variation may allow a virus to overcome its hosts' immune systems, both within and among host species (Novella & Presloid, 2012;Stepien et al., 2015). This diversity may allow the virus population to persist under consistent and/or variable environmental conditions, and to enter new hosts and their habitats.
VHSV-IVb continued to diversify following a quasispecies pattern, radiating new variants from the original central "a" and "b" haplotypes. Ojosnegros and Beerenwinkel (2010)  and other clinical signs of disease. In contrast, the first VHSV-IVb outbreaks were characterized by high virulence, hemorrhaging, and mass fish die-offs (Kim & Faisal, 2011). Similarly, the classic example of Australian Myxoma Virus began with high mortality in feral rabbits, which lessened over time due to coadaptations between the virus and host populations (Alves et al., 2019;Elsworth et al., 2014). That pattern also appears to have characterized VHSV-IVb, whose initial host fish populations may have been more susceptible than later ones, followed by increased acclimation and resistance of the hosts. Meanwhile, the virus continued to differentiate over time.

| Evolutionary patterns across space and time
Evolutionary diversification of VHSV-IVb, based on our G-gene results, radiated from the original haplotypes "a" and "b," with the "a" descendent group primarily found in the Upper and Central Great Lakes, and the "b" group in the Lower Great Lakes and St. Lawrence River. The two original haplotypes may have originated from separate introductions into the Great Lakes, with "a" in the Upper and Central Great Lakes, and "b" in the Lower Great Lakes. An alternate hypothesis is that "b" descended from "a," with the latter first appearing in Lake St. Clair ca. 2003. VHSV-IVb haplotypes "a" and "b" appeared nearly monotypic in the host populations during the Early time period (2003)(2004)(2005)(2006), accounting for 90% of the known isolates, but both haplotypes disappeared after 2011.

F I G U R E 7
Neighbor-joining genetic distance tree depicting relationships among VHSV-IVb population samples. Reynolds' (1983) genetic distances (R ST ) used on G-gene haplotypes and their frequencies in PHYLIP (Felsenstein, 2007). Bootstrap percentage support for nodes from 10,000 replications is shown. Sample sizes (N) are in parentheses. Symbols designate area in the Great Lakes (diamonds) and time period ( The closely related IHNV (=Salmonid novirhabdovirus) also exhibited geographic patterning among three distinct geographic Northeastern Pacific regions, with its "M" group having more substitutions than its "L" and "U" groups (Kurath et al., 2003). In 1985, IHNV was reported in China, where it likely was introduced from Japan, and now constitutes a distinct clade (Xu et al., 2018). Since regional differences in VHSV-IVb are based on a history of less than two decades in the Great Lakes, it appears likely that geographic divergence and genetic diversity may develop further (unless the virus "disappears").
In our study, genetic divergence (θ ST ) was strongly correlated with geographic distance among samples. The relationship between genetic divergence and time (years) also showed significant positive correlation, indicating continued changes over time. Increased genetic divergence was accompanied by greater overall population variability, consistent with the quasispecies theory.
In contrast with the patterns we discerned for VHSV-IVb, evolutionary relatedness of IHNV G-gene sequences along the Northeastern Pacific did not correspond to changes over time (Kurath et al., 2003). Another Northeastern Pacific IHNV study identified greater diversity in viral isolates in later years, but attributed those to increased sampling effort and surveillance (Black, Breyta, Bedford, & Kurath, 2016), which was not the case in our study.
Like genetic structuring in VHSV-IVb and IHNV, Balkan RABV isolates (N = 210) were found to possess genetic structure (with five genetic groupings), whose relationships better corresponded to geo-  Abbreviation: pp, posterior probability. " " = same as above. (Velazquez-Salinas et al., 2014). We discerned that spatial structure of VHSV-IVb exhibited more pronounced patterning than did temporal structure, although both were significant.

| Substitution rates, types, and patterns
VHSV-IVb appears to have evolved at a fairly consistent rate over time, from 2005 to 2017 in the Great Lakes' region. We previously determined its partial G-gene region rate to be 2.8 × 10 −4 substitutions/nucleotide site per year (Stepien et al., 2015), which was slightly faster than our 1.0 × 10 −4 estimate here, based on 77 additional isolates (based on the sequences we used for the present population genetic analyses). The calculated rate for the entire Ggene here, which added more conserved (slowly evolving) regions, is 8.5 × 10 −5 (Table 3). The rate based on our sequences across the entire genome is 6.6 × 10 −5 (Table 3), which matches the partial gene rate calculation (from a combination of partial G-, N-, Nv-, M-, and P-gene sequences) of 6.6 × 10 −5 calculated in Stepien et al. (2015).
Nonsynonymous substitutions constituted 41% of the 34 substitutions in the partial G-gene region examined here (668 NT and 97.5% with the full-genome sequences (Table 3). Most G-gene haplotypes differed by just single substitutions (62%). In comparison, using a shorter 360 NT G-gene region, Benmansour et al. (1997) identified 92% AA conservation between the Northeastern Pacific VHSV-IVa subgenogroup (six isolates) and the European VHSV genogroups I-III (eight isolates), and 98% similarity within IVa. Stone, Way, and Dixon (1997)  In comparison, IHNV displayed 91% sequence conservation for a 303 NT G-gene segment of 323 Northeastern Pacific coastal isolates (Kurath et al., 2003). Complete IHNV G-gene sequences showed 94.4% sequence conservation among 38 international samples (Nishizawa, Kinoshita, Kim, Higashi, & Yoshimizu, 2006). G-gene sequences from 61 rabies (RABV) cases in Brazilian livestock had 98% NT similarity and 97% AA conservation, with 18/27 (67%) SNPs resulting in AA changes (Cargnelutti et al., 2017). A comparable study in India examined 25 full-length G-gene RABV sequences from six host species, revealing 96% sequence and 96% AA conservation (Cherian et al., 2015). Those AA changes occurred away from antigenic sites, suggesting conservation of critical gene functions and avoidance of the host immune system (Cherian et al., 2015). with 123 being nonsynonymous (33.4%), which also is higher than ours for both and might again reflect sampling size or a mutation rate difference. Similar to our results, a Zika Virus study found 16% nonsynonymous changes among 1,030 SNPs across 110 whole genomes (Metsky et al., 2017), close to the proportion we identified here in the partial G-gene sequences and across the entire VHSV-IVb genome. Expressed genetic variation in VHSV-IVb thus appears high.

| Gene-specific variation
The G-gene encodes the glycoprotein of the viral capsid, which enables the viral particle to attach and enter the host cell via endocytosis (Kurath, 2012). In response, the fish hosts mount immune defenses against glycoprotein (Dietzgen & Kuzmin, 2012) whose relatively high mutation rate and genetic diversification may help the viral population to avoid detection (Stepien et al., 2015. Adaptive radiation in the VHSV-Ia G-gene occurred in European freshwater rainbow trout (Oncorhynchus mykiss) culture following flooding of fish farms with marine water that carried marine hostbased VHSV-Ia into the area (Schönherz, Forsberg, Guldbrandtsen, Buitenhuis, & Einer-Jensen, 2018). Here, evolutionary diversification of VHSV-IVb G-gene variants likewise may facilitate localized adaptation to match cell receptors of particular hosts, such as round goby and gizzard shad, in similar manner to that of Ia in rainbow trout.
Of the individual genes we examined, the novirhabdovirus genus-specific Nv-gene of IVb had more substitutions (3.8%), as previously was observed across all global VHSV genogroups (Pierce & Stepien, 2012), within the European genogroups (Basurco et al., 1995), and for IHNV (He, Ding, He, Yan, & Teng, 2013;He, Yan, Liang, Sun, & Teng, 2014). The evolutionary rate of the Nv-gene (9.76 × 10 −5 ) was the highest among the IVb genes (Table 3). Conversely, Getchell et al. (2017) found a single Nv substitution in just one of their four IVb genomes, whereas Basurco et al. (1995) identified one synonymous change in one of nine IVa isolates, likely due to the smaller sample sizes of both studies. The Nv-gene encodes a small nonstructural protein (Ammayappan, Kurath, Thompson, & Vakharia, 2011;Biacchesi, 2011) and has the fastest mutation rate among the VHSV genes (Stepien et al., 2015), suggesting its potential role in evading host detection and/or defense (Pierce & Stepien, 2012).  discovered that Nv-knockout mutants induced cell apoptosis earlier than did nonaltered (wild-type) VHSV-IVb and IHNV, implying that Nv prolongs the length of infection, resulting in increased transmission potential.
P-serves as a chaperone for N-protein synthesis, preventing it from binding to cellular RNA products, and also forms the RNA polymerase complex with the L-protein (Dietzgen & Kuzmin, 2012).
We found that the VHSV-IVb M-and P-genes possessed fewer substitutions (2.8% and 1.9%, respectively), whose lower diversity indicates that they are more conserved. This likely is due to the importance of the M-and P-proteins in viral replication. greater than Getchell et al.'s (2017); that is, the lack of nonsynonymous substitutions in their study likely was due to their sample size.
The N-gene encodes the nucleocapsid protein involved with balancing viral transcription and replication. N-protein binds to viral RNA, forming a complex that serves as the template for transcription and replication, and additionally interacts directly with P-and L-proteins at multiple binding sites (Dietzgen & Kuzmin, 2012); thus, its conservation is expected.
The remaining gene, L (5,955 NT), has been relatively little studied in VHSV until now. We found that it has evolved the most slowly (1.9%), with 33.9% of its mutations being nonsynonymous. Schönherz et al. (2016) sequenced the full genomes of four VHSV-Ia isolates, finding the most mutations in the G-and L-genes. The latter is true here, due to the length of the L-gene, rather than its rate. The L-protein functions in viral transcription and replication (Kurath, 2012), with its variants increasing the virus' temperature range to >20°C, as demonstrated with gene-swapping experiments (Kim, Yusuff, Vakharia, & Evensen, 2015).

| Host species generality, specificity, and infection
Less than 1% of the individual fishes from our 2015 to 2017 sampling tested VHSV positive. In comparison, 2010 samples from the Great Lakes described 13% VHSV-IVb incidence among >5,000 fish individuals, with 25% of those occurrences found in round goby, which was targeted due to its high infection incidence (Cornwell et al., 2015). That overall incidence was a decrease from 16% positive occurrences found in 2009 (Cornwell, Eckerlin, et al., 2012 (Vennerström et al., 2018). Thus, global VHSV occurrences and predictability appear sporadic. All dreissenid mussels tested by us were VHSV negative. Throckmorton et al. (2017) likewise discovered no VHSV in cylindrical papershell mussels (Anodontoides ferussacianus) from Budd Lake, despite positive largemouth bass and Hyalellidae amphipods from that location. Researchers in Denmark tested for a VHSV-II invertebrate reservoir in isopods, krill, and squid, but none were positive (Skall, Olesen, & Mellergaard, 2005 and specialization among IVb haplotype groupings to date. Similar to our geographic findings, VHSV-IVa isolates from eight fish species showed higher G-gene similarity to isolates from the same geographic region than to those from the host species (Hedrick et al., 2003). Moreover, no host species associations were found for 63 full G-gene IVa isolates collected over a 20-year span, with just four of those haplotypes recovered from the same host species (Garver et al., 2013).
Much differentiation of the VHSV-IVb haplotype "b" group occurred in round goby from the Lower Great Lakes region during the Middle and Later time periods, suggesting that this invasive species might serve as a reservoir or vector (also see Cornwell et al., 2014).
Many round goby individuals collected during and outside of large VHSV-IVb outbreaks displayed classic hemorrhaging and other signs of disease (Cornwell, Getchell, Groocock, Walsh, & Bowser, 2012;Groocock et al., 2007), yet others had none (this study and Cornwell, Eckerlin, et al., 2012). Round goby has become one of the most common benthic fishes in the lower Great Lakes (Snyder & Stepien, 2017) and is a major prey item for game fishes (Johnson, Bunnell, & Knight, 2005), possibly aiding VHSV transmission among species.
Two round goby individuals from the peak of a small outbreak each possessed two different IVb haplotypes (Cornwell et al., 2014); such variation within individuals could increase transmission likelihood, since the host might fail to recognize some newer sequence variants. Getchell et al. (2019) also reported polymorphisms at single NT positions in some isolates from the Cayuga Lake 2017 outbreak. Here, we found no evidence of multiple haplotypes within a single individual. Coinfections by multiple isolates are not easy to detect, as the sequence having more RNA will exhibit greater PCR amplification (Hoorfar et al., 2004).
The round goby's large population sizes and relatively high genetic diversity levels (Brown & Stepien, 2009;Snyder & Stepien, 2017) may facilitate VHSV-IVb's coevolutionary success. The goby's high VHSV-IVb infection incidence may suggest that the virus is in the process of evolving some host specificity (Cornwell et al., 2014), notably in the "b" haplotype group indicated here. Since past research targeting round goby mostly focused on Lake Ontario (Cornwell et al., 2014) and the St. Lawrence River (Groocock et al., 2007), future efforts also should examine other populations.
Gizzard shad was the second most commonly infected species (22 VHSV-IVb isolates), dominating the original 2006 Lake Erie outbreak (primarily haplotype "a"; Thompson et al., 2011;C. A. Stepien, personal observation, 2006). The 2017 Lake St. Clair outbreak also primarily infected gizzard shad (M. Faisal & G. Whelan, personal communications, 2017), comprising a single unique haplotype ("be"). Many gizzard shad positives occurred during the Later time period in the Central Great Lakes, suggesting a possible reservoir species. Gizzard shad have a high death rate from VHSV . We also here report the first IVb detection in alewife (Lake Michigan, 2016), which, like gizzard shad, belongs to the Clupeid family.
Similarly, group "b"-derived haplotypes of VHSV-IVb might develop more specificity in round goby over time, meriting future investigation.
In comparison, high RABV genetic diversity is believed to have facilitated jumps among a wide taxonomic range of key host species, including bats, raccoons, and many canids (Rodríguez-Nevado, Lam, Holmes, & Pagán, 2018). A genetically diverse host pool, whether in terms of population or species, can increase diversity of viral sequences (Ojosnegros & Beerenwinkel, 2010); this appears to be the case for VHSV-IVb.
The effects of VHSV-IVb on different host species, as well as within a given host species, varied considerably in our study, as measured by concentrations of the virus in infected field-collected fish tissues determined with our qPCR assay. Our qPCR assay used internal standards (IS), increasing measurement accuracy .

Similar breadth in virus titers characterized individual fish in viral
challenge experiments infected with haplotype "a" . Notably, four of our field-collected fish samples were above the threshold for clinical onset of disease signs in haplotype "a" laboratory-challenged fish .

| Selection and coevolution
Tajima's D analysis results indicate that the G-gene and the VHSV-IVb genome as a whole, have evolved under significant purifying selection. We found evidence of purifying selection acting on single N-and G-gene codons, and on six codons in the L-gene, based on results of Bayesian analyses in FUBAR and MEME. Other studies have found that purifying selection regulated VHSV evolution in other genogroups (Abbadi et al., 2016;He et al., 2014), as well as in other rhabdoviruses (Kuzmin, Novella, Dietzgen, Padhi, & Rupprecht, 2009) and RNA viruses in general (Hughes & Hughes, 2007).
Diversifying selection was indicated for two VHSV-IVb genes: G and Nv, in our results. Positive selection was implicated for changes in two IVb G-gene codons. Positive selection, as well as diversifying selection, on viral genes constitute support for the host-pathogen "arms race," during which the virus's evolutionary course acts to thwart and/or suppress host immune responses (Pereira & Amorim, 2013).
Following a quasispecies pattern, VHSV-IVb radiated into many closely related variants (toward the possible result of evading recognition by the host's immune system), with purifying selection removing less fit ones from the viral gene pool.
The original haplotypes "a" and "b" disappeared over time Padhi & Verghese, 2012). Purifying selection pressures on IHNV appeared greater than for VHSV-IVb, attributed to the former's more limited reported realm of host species and its longer temporal history (He et al., 2013). Analysis of 27 rhabdovirus taxa revealed that purifying selection has characterized all members and was greatest in the Lyssavirus genus (including RABV; Kuzmin, Hughes, & Rupprecht, 2006). In the case of RABV in Taiwan, purifying selection in the viral population increased in response to reductions in vaccinating its potential host populations, as infection incidences expanded (Lin et al., 2016). Global RABV evolutionary patterns have displayed strong purifying selection across mammalian hosts on their overall phylogenetic tree, being most pronounced in the bat and dog clades that are some of that virus' most prevalent hosts (Troupin et al., 2016). It appears likely that VHSV evolutionary patterns will continue to be shaped by purifying selection.
Illustrating local host population resistance, Millard, Brenden, LaPatra, Marcquenski, and Faisal (2014) discerned that VHSV-IVb antibodies lasted up to one year in muskellunge. Throckmorton, Peters, Brenden, and Faisal (2015) detected VHSV-IVb from Budd Lake in 2011 following its absence since 2007, suggesting an interlude of dormancy due to local immunity. Similar to our findings, a study of Danish VHSV-Ia isolates distinguished an almost four-year gap between fish kills, despite some detections in local fishes (Kahns et al., 2012). This may explain the relative rarity of VHSV-IVb from 2010 to present in Great Lakes populations. However, our results show that during these interludes, the virus continued to evolve to maintain "a foothold" in the local populations. VHSV-IVb thus underwent considerable diversification over time and space, whose trajectory may continue.

| SUMMARY AND CON CLUS I ON S
VHSV-IVb appears to have remained geographically constrained to the Great Lakes region and surrounding areas, albeit at lower incidence and virulence over time, and continued to mutate and genetically diversify between outbreaks. Significant sequence diversification of its viral populations occurred both spatially and temporally. This coevolutionary pattern likely aided the virus in circumventing immune system recognition by the hosts, in response to the increasing resistance of the host populations. More recently, the virus has displayed reduced virulence, as well as evolution toward possible persistence at low levels in resident host populations.
It also is likely that VHSV-IVb may continue to infect naïve fish populations and could spread to other water bodies, facilitated by its quasispecies pattern of continuing evolutionary differentiation.