Genomic and immunogenic changes over the history of the Viral Hemorrhagic Septicemia (VHS-IVb) fish virus (=Piscine novirhabdovirus) in the Laurentian Great Lakes

Viral Hemorrhagic Septicemia Virus (VHSV) (=Piscine novirhabdovirus) appeared in the Laurentian Great Lakes in 2005, constituting a unique and highly virulent genogroup (IVb), which killed >32 fish species in large 2005 and 2006. Periods of apparent dormancy punctuated smaller outbreaks in 2007, 2008, and 2017. We conducted the first whole genome analysis of IVb, evaluating its evolutionary changes using 46 isolates, in reference to immunogenicity in cell culture, and the genomes of other VHS genogroups (I–IVa) and other Novirhabdoviruses. IVb isolates had 253 genomic nucleotide substitutions (2.3% of the total 11,158 nucleotides), with 85 (16.6%) being non-synonymous. The greatest number of substitutions occurred in the non-coding region (NCDS; 4.3%) followed by the Nv- (3.8%), and M- (2.8%) genes. The M-gene possessed the greatest proportions of amino acid changes (52.9%), followed by the Nv- (50.0%), G- (48.6%), N- (35.7%) and L- (23.1%) genes. Among VHS genogroups, IVa from the northeastern Pacific exhibited the fastest substitution rate (2.01×10-3), followed by Ivb (6.64×10−5), and I/III from Europe (4.09×10−5). A 2016 gizzard shad isolate from Lake Erie was the most divergent IVb isolate (38 NT, 15.0%, 15 AA), yet exhibited reduced virulence with in vitro immunogenicity analyses, as did other 2016 isolates, in comparison to the first IVb isolate (2003). The 2016 isolates exhibited lower impact on innate antiviral responses, suggesting phenotypic effects. Results suggest continued sequence change and lower virulence over the history of IVb, which may facilitate its long-term persistence in fish host populations.

St. Clair, and Erie/Ontario) significantly diverged from each other (Table 5B), with greatest 358 difference between Lakes Michigan/Budd and Erie/Ontario. Significant regional geographic 359 differences occurred between Lakes Erie/Ontario versus other regions for M and NCDS, whereas 360 Lakes Michigan/Budd significantly diverged from others with L, and G differed for Lake St. 361 Clair. N differed between Lakes Michigan/Budd and St. Clair. 362 The Novirhabdovirus phylogenetic tree ( Figure 4) defines each species with 1.00 363 posterior probability (pp) and 100% bootstrap support (bs). IHNV and HRRV are sister species, 364 comprising the sister group to VHSV+SHRV. The 79 genomic VHSV sequences form two 365 clades: European genogroups I-III and North American/Asian IV. Within I-III, the single 366 VHSV-II isolate is basally located. Two of the five VHSV-III genotypes group with two Ia 367 isolates (Hededam DK35928), thus not monophyletic (i.e., not a clade). In contrast, the seven Ib 368 isolates are a well-defined clade, and the III isolates comprise another. Genogroup IV (Figure 3) 369 possesses 1.00 pp and 100% bs support, with IVa (N=25) and IVb (N=39) on separate branches, 370 having 1.00 pp and 100% bs support. 371 The IVb phylogenetic tree ( Figure 5) contains several inner clades, with M08RB 372 appearing basal (weakly supported), followed by O06SB, CellC03, and O13GS, respectively 373 (0.60-.90 pp/<50% bs). Two Lake Erie isolates from 2014-15 (E14GS, E15RG) group together 374 (1.00 pp/74% bs), followed by two Lake Michigan 2007-08 isolates, located prior to the main 375 cluster of the remaining 39 isolates, including C03MU. Within the major group are two clades 376 containing two isolates (B07BG/B07PS: 0.80-.89 pp/66% bs; M11YP/C06GS: 0.80-. 89 377 pp/<50% bs) and two larger clades. The first larger clade incorporates the 2016 isolates (0.  . 69 pp/<50% bs), further subdivided in two, separating Lake Michigan (1.00 pp/98% bs) from 379 Lake Erie and cell culture derivatives (1.00 pp/100% bs). A second well-defined clade (1.00 380 pp/83% bs) contains 14 isolates, including most early to middle (2006-08) Lake Erie genotypes, 381 along with its very distinct 2012 isolate. The 2012 genotype is the most distant, comprising the 382 sister taxon to E08ES (1.00/94%), which are linked by four shared SNPs and 2 AAs (M: one 383 SNP, one AA; L: two SNPs, one AA; NCDS: one SNP). E12FD and E08ES share an additional 384 five SNPs: two in N (two AA), and two in L (one AA). 385 14 BLAST searches returned no similar sequences to Nv when novirhabdoviruses were 386 omitted. The NT phylogeny ( Figure 6A) resolved all genes as comprising distinct clades, except 387 for Nv. The phylogeny contained two groups of clades: M+G and N+P+L. SHRV-Nv grouped 388 with G. IHNV-Nv and HIRRV-Nv are sister groups, sharing a common ancestor with the N clade. 389 VHSV's Nv gene appeared a divergent part of the L clade. All Nv clades were fully supported by 390 Bayesian analysis. 391 When AAs alone were considered ( Figure 6B), the IHNV and HIRRV-Nv clade was 392 sister to VHSV-Nv and the G clade. SHRV-Nv formed its own clade, more closely related to the 393 P clade, but was not well supported. 394

Differences in cytopathicity and immune response 395
A series of cell-based studies were used to determine whether sequence variations 396 impacted viral function: SRB assays to assess cytopathicity, virus yield assays to estimate viral 397 production, antiviral assays to assess IFN production, and qPCR to determine levels of IFN and 398 VHSV-IVb mRNA production. Sequencing confirmed each isolate's identity and revealed 399 changes acquired during cell culture propagation. The cell culture amplified control, CellC03,  Table A Since IFN antiviral bioassays are relatively insensitive to minimal changes, we extended the  418   previous studies by assessing IFN mRNA synthesis in infected cells with RT-qPCR. IFN mRNA  419 was expressed significantly more in Cell16a-c, compared with the CellC03 control across all 420 time points beyond 24 hpi ( Figure 10). These data suggest significant difference in ability of the 421 2016 isolates to induce cellular IFN expression, or significant reduction in their suppression of 422 IFN expression. Consequently, by 48  al. [45] did not uncover diversifying selection for either G or Nv. Getchell et al. [45] discerned 441 variation in two N-gene codons and one codon each for M and L, which were unsupported in our 442 investigation. We found evidence of purifying selection acting on single N-and G-gene codons, 443 and on six codons in L. In our IV analyses. Thus, fewer codons were under purifying or 444 diversifying selection than in Getchell et al.'s [45] findings. None matched between the two 445 analyses, with ours being the larger, more robust dataset. 446 16 Purifying selection was the major evolutionary force discerned across all VHSV 447 genogroups, subgenogroups, and samples. Our study identified ≥one codon per gene under 448 negative selection, for IVa and I/III. No NT sites under selection were in common between IVb 449 and the other genogroups, with one exception: codon 1758 in L for IVb and I/III. Our dN/dS 450 ratio further indicated purifying selection, with values being >0.5 across all genes and VHSV 451 genogroups. Other investigations found that purifying selection regulates VHSV evolution [30, 452 77], as well as in other rhabdoviruses [78] and RNA viruses in general [79]. He et al. [77] 453 compared dN/dS ratios for individual genes among VHSV genogroups, with all six genes having 454 lower dN/dS than found with our data. However, most of the IVb (N=45) and IVa (N=18, from 455 [80]) sequences used in our analyses were unavailable to He et al. [77]. 456 Positive selection was indicated for changes in three G-gene codons: two in IVb and one 457 in IVa. Positive selection on viral genes constitutes evidence that supports the host-pathogen 458 "arms race", which typically suppresses host immune responses [51]. Abbadi et al. [30]  viruses [51], including Human Immunodeficiency Virus (HIV) [81]. Viral RNAs can suppress 465 host RNA interference pathways involved in antiviral immunity [82] or block the abilities of host 466 antiviral proteins to engage with viral dsRNAs and mount an immune response [83]. 467 We found that overall evolutionary rates across the entire genome were slower than were 468 previous estimates based on partial gene sequences [12]. These differences stem from our present 469 use of more samples and complete genome sequences, which included conserved areas that were 470 previously omitted. 471 Among the three VHSV genogroups, V-IVa exhibited the fastest rate -2.01x10 -3 . He et 472 al. [77] estimated a rate of 5.60x10 -4 for 48 IVa G-gene sequences, which was slower than our 473 IVb rate for G of 1.02x10 -3 . Our inclusion of recently published diverse sequences from Korea 474 between 2012-16 [80] may have increased this, as those isolates differed by >20nt. Just two 475 complete sequences of IVa were from Japan, which were even more different than the Korean 476 isolates. This suggests that the Korean isolates are diverging from the Japanese IVa isolates, but 477 17 additional IVa genomes from Japan are needed for robust conclusion. IVa first was detected 478 along the North American Pacific Coast [32,33,34], but no full-genome sequences then were 479 available. All IVa Korean isolates were obtained from aquaculture; such transportation of 480 infected fishes may introduce VHSV to naïve fishes and different environmental conditions, 481 which can enhance adaptation and dramatically alter the virus, as observed in IHNV [84] and 482 HIRRV [85]. 483 Some Ia, Ib, and III isolates grouped together in our phylogeny, indicating that their 484 previous phylogenetic relationships, based on partial genes [12,30,86], had been poorly 485 resolved in the earlier analyses. We calculated that genomes of VHSV genogroups I/III evolved 486 at an overall rate of 4.09x10 -5 , more similar to that of IVb. Previous G-gene rate estimates for 487 VHSV-I and -III [77] using fewer complete gene sequences were faster than ours (4.54 x 10 -5 ), 488 having been based on 201 I sequences (5.57x10 -4 ) and seven III isolates (1.63x10 -3 ). Evolution of 489 subgenogroup Ia had been estimated at 1.74x10 -3 (N=34) [23] to 7.3x10 -4 from 108 G-gene 490 Italian isolates [30]. Our combination of the European groups may have yielded our slower rate. 491 Despite these variations, all of the above estimates are within the known range of evolutionary 492 rates for RNA viruses [87] (Duffy et al. 2008). 493

Phylogenetic patterns: Novirhabdoviruses 494
The phylogeny from the whole genome ( Figure 3) is congruent with prior partial gene 495 studies [12,14]. Two main sister groups characterize the novirhabdoviruses: IHNV+HIRRV and 496 VHSV+SHRV, with VHSV+SHRV diverging first from the common ancestor. This relationship 497 is congruent with Kurath's (2012 [18]) analysis of complete N-gene sequences. Two studies 498 examined rhabdoviruses using partial L-gene sequences [88,89], which likewise supported our 499 sister group pairing of IHRV+HIRRV. Our phylogenetic consensus tree yielded 100% ML and 500 Bayesian support for these relationships, whereas Kurath [18] found just 76% support for the 501 SHRV+VHSV clade using N. Thus, our analysis of the entire genome using a larger number of 502 isolates significantly increased confidence in resolving novirhabdovirus evolutionary 503 relationships. 504 Phylogenetic results for VHSV relationships from whole genome sequences support most 505 known genogroups and subgenogroups, except that III and Ia are not monophyletic. Prior 506 analyses of G-gene sequences by Dale et al. [90] and Ghorani et al. [31] thus were incorrect in 507 separating III and Ia. Genogroup Ia occurs in freshwater hosts [23], whereas both III and Ib are 508 18 marine [90]. All three are capable of infecting rainbow trout (Oncorhynchus mykiss), which 509 spends portions of its lifecycle in both environments. Our results indicate that Ib is 510 monophyletic, whereas III and Ia comprise a single taxon, indicating that they need to be re-511 defined as a single taxon. Additional III and Ia isolates should be completely sequenced. sharing an additional four AA changes in the coding regions of N, P, and L. These AA changes 526 possibly provided some advantage or lacked deleterious effects, since the substitutions remained 527 conserved for four years. Although most of that clade was recovered from Percidae hosts, several 528 of its isolates were in freshwater drum. In addition to the 2006 and 2008 Lake Erie cases [11], 529 freshwater drum die-offs were reported in 2005 from Lake Ontario [9], which appeared more 530 closely related to E06FD. 531 Both of our 2007 samples from the inland Budd Lake form a clade, located near C03MU. 532 Wayne County, Michigan, which is adjacent to Lake St. Clair that is a highly urbanized area with 533 the largest number of registered anglers in the state [91]. Anglers traveling from that area to 534 Budd Lake may have unknowingly transported VHSV, perhaps via live bait [12,14,92]. 535 The 2014-15 Lake Erie VHSV-IVb genotypes differ from one another, but share a sister 536 relationship branching from the main part of the phylogeny, located closer to C03MU. 537 Genotypes E14GS and E15GS shared the host species of gizzard shad and round goby with the 538 2016 isolates, but are genetically distant from the latter. The 2016 genotypes form a clade, which 539 19 is linked by SNPs as shown in the genotype network. Interestingly, these occurred in different 540 host species, with the E16 isolates exclusively recovered from gizzard shad, and both M16 541 isolates in round goby. Other studies identified potential evolutionary radiation of IVb in the 542 round goby host, particularly in Lake Ontario [40,75]. There have been relatively few VHS 543 positive round goby samples detected in Lake Michigan, despite its reported deaths during the 544 2008 outbreaks [16]. Round goby has been implicated in spreading the virus [93], often is used 545 as bait, and is transported among water bodies by anglers [94]. 546 Other VHS-positive isolates were recovered from Lakes Michigan and Erie in 2016, 547 whose viral titers were too low for genome sequencing [75], further indicating that infection 548 levels differ spatially and temporally. One of the first 2006 outbreaks intensely affected the Lake 549 St. Clair gizzard shad population [16], as well as its 2017 outbreak (G. Whelan, personal 550 communication 2017). Gizzard shad populations frequently experience large die-offs from cold 551 weather [95], water temperature fluctuations, and/or spawning activities [96], coinciding with the 552 optimal temperature range of VHSV-IVb (12-18 °C [97]; 10-14 °C [98]. In addition to the 2016 553 isolates, IVb-positive gizzard shad also occurred in Lakes St. Clair, Erie, and Ontario dating back 554 to 2006; however, only its 2016 isolates were closely related. Possible host specialization should 555 be examined in future investigation. 556 All but one of our identical sequences (C06FD group) were collected from Lake St. Clair 557 over a five-year span. This consistency suggests that the C06FD genome may have been 558 prevalent and locally adapted to environmental conditions and host species. It differed from 559 C03MU by a single AA in the L-gene, and was found in six different fish host species and the 560 sole two known invertebrate host species [99,100]. Yusuff et al. [101] discerned no effect on 561 host specificity when swapping L-genes between C03MU and Ia's DK-3592B. L variants appear 562 to differ in optimal temperature range, since swapping L from IVa into IVb led to increased virus 563 at higher temperatures, than occurred for IVa [102]. The single AA change between the C06FD 564 group and C03ML was not shared with IVa; instead C03ML and IVa were identical at that NT, 565 meriting future examination. 566 Analyses of population divergence patterns using the VHSV-IVb genome discern similar 567 spatial and temporal patterns as found in our prior partial G-gene analyses [75]. Lakes. This pattern is evident in all individual genes as well, suggesting that the G-gene depicts 571 overall population trends. During the later time period, VHSV-IVb has occurred more 572 sporadically in smaller, localized outbreaks, which may reflect increased host population 573 immunity and resistance, as predicted by the "Red Queen hypothesis" [49,50]. Likewise, viral 574 phage Φ2 showed greater genome divergence when exposed to changing hosts versus consistent 575 ones, particularly in genes associated with host immune suppression [103]. On the host side, 576 Alves et al. [104] described genomic changes in immune-related genes of wild rabbit populations 577 following 60 years of exposure to Myoxma Virus (MYXV). 578

Evolutionary perspectives for the Nv-gene 579
Our multi-gene phylogenies based on NT and AA sequences reveal little consensus 580 regarding the evolutionary origin of the Nv gene. All results support the sister relationship 581 between IHNV and HIRRV. VHSV-Nv appears possibly related to the L-gene in the NT 582 sequence tree, but pairs with IHNV-Nv and SHRV-Nv in the AA tree, showing closer 583 relationship to the G-gene. SHRV-Nv differs the most from other novirhabdoviruses, appearing 584 more closely related to the G-gene in the NT tree and to the P-gene for the AA tree. This 585 discrepancy might reflect differential functionality, since studies showed that Nv is not essential 586 to pathogenicity in SHRV [20,22]. In contrast, VHSV and IHNV require Nv for suppression of 587 the host immune response [19]. We conclude that Nv is highly saturated and homoplastic, 588 especially for SHRV. 589   [43] found that when 600 four AA M-gene mutations were introduced to C03MU, the virus was less able to suppress the 601 21 transcription of host immune defenses. These data suggest that even subtle changes can lead to 602 altered gene function and viral phenotypic characteristics. 603

Differences in cytopathogenicity 590
It appears that the oldest VHSV-IVb isolate, C03MU, has remained the most virulent to 604 date. More recent isolates from 2006-16 showed reduced virulence, potentially resulting from 605 virus-host coevolution. Other studies observed similar trends. For example, Imanse et al. [44] 606 examined vcG001 (C03MU) and vcG002 (not included in our analysis), finding faster growth 607 but lower titers in cells exposed to vcG002. Those had similar levels of viral RNA, suggesting 608 that vcG002 was less efficient at producing infective particles. Getchell et al. [45] also found 609 reduced viral load in O06SB, O13GS, and E14GS. In our results, viral RNA production and 610 virulence of Cell16a-c likewise did not significantly differ from CellC03, as evidenced by lower 611 peak virus production in the viral yield assay. Reduced virulence over time has characterized 612 MYXV in Australian rabbits [105], human papillomaviruses [106], and RABV in hyenas [107], 613 leading to a longer infectious period that may aid spread of the virus. Such pattern may similarly 614 characterize VHSV-IVb, meriting further work. 615  Table 1). 659 All analyses are deposited in Dryad under XXX.    regions (Lake Michigan and Budd Lake, Lake St. Clair, Lake Erie and Lake Ontario) using exact tests (GENEPOP; above diagonal) and θST divergences (ARLEQUIN; below diagonal).

Figure 1. Map of VHSV (I-IV) full genome isolates included in this study, colored by
Genogroup and with shapes denoting substrains.