Gene duplication, gene loss, and recombination events with variola virus shaped the complex evolutionary path of historical American horsepox-based smallpox vaccines

ABSTRACT Vaccinia virus is the active component of all modern smallpox vaccines after the mid-20th century, but it is uncertain to what extent cowpox, vaccinia, and horsepox viruses were used to produce vaccines before then. Genome sequences of six smallpox vaccines used in the United States between 1850 and 1902, namely VK01, VK02, VK05, VK08, VK12, and Mulford_1902 vaccines, revealed >99.5% similarity with a 1976 strain of horsepox in the genome core. However, how these historical vaccines relate to horsepox and vaccinia viruses is still unknown. Here, we present a detailed investigation of the gene content and genomic structure of these historical smallpox vaccines. Except for VK05, all historical vaccines differ from horsepox in the genomic architecture of the flanking variable regions showing complex patterns of gene duplication/transposition, gene fragmentation, and gene loss. The Mulford_1902 vaccine is the closest virus to contemporary vaccinia viruses and the VK02 vaccine is the most different, with several stretches of variola virus genes recombined in its genome. Our data suggest that in the late 19th and early 20th centuries, different horsepox-based vaccines and probably related unsampled progenitors of modern vaccinia virus coexisted. A better understanding of the evolutionary path of the now extinct horsepox-based vaccines will increase our knowledge of the origins of contemporary vaccinia viruses and the pathways that led to the consolidation of current smallpox vaccines. This is particularly important now that the resumption of production of smallpox vaccines for use against mpox is widely discussed, as is the improvement of available vaccines. IMPORTANCE Modern smallpox vaccines, such as those used against mpox, are made from vaccinia viruses, but it is still unknown whether cowpox, horsepox, or vaccinia viruses were used in the early 20th century or earlier. The mystery began to be solved when the genomes of six historical smallpox vaccines used in the United States from 1850 to 1902 were determined. Our work analyzed in detail the genomes of these six historical vaccines, revealing a complex genomic structure. Historical vaccines are highly similar to horsepox in the core of their genomes, but some are closer to the structure of vaccinia virus at the ends of the genome. One of the vaccines is a recombinant virus with parts of variola virus recombined into its genome. Our data add valuable information for understanding the evolutionary path of current smallpox vaccines and the genetic makeup of the potentially extinct group of horsepox viruses.

Mapping of VK01 (A-D) and VK12 (E to I) reads to each transition region was performed using Geneious Prime and is shown below the schemes.A low coverage region was observed downstream region 6 in VK01 genome (D) and region 8 in VK12 genome (I).This low coverage region corresponds to AT-rich sequences located in the intergenic region between HPXV-15b and C11R genes (Figure S7G) and were detected in all VK genomes.HXPV-15b is the last gene of the 10.7 kb insertion at the left end of the HPXV genome (Figures 3B and 3C).

Figure S1 .
Figure S1.Comparison of genomic structure, percent identity, and gene content among CPXV strains, HPXV_MNR-76, and VK05.A) Alignment of the variable regions of the genomes of CPXV strains from the five clades, HPXV_MNR-76 and VK05.The ORF map highlights the similarity between the CPXV strains, HPXV_MNR-76, and VK05 at the variable ends.The HPXV and VK05 ORFs in the 10.7 kb region (left panel) and in the 5.5 kb region (right panel), which are absent in all VACV strains, are indicated above the figures.The nine CPXV orthologs found in the corresponding 10.7 kb region are highlighted in red only in the CPXV_GRI-90 genome, but they are also seen in the other CPXV strains, according to the alignment.The asterisk indicates the 4.7 kb region found in CPXV strains but absent in HPXV_MNR-76 and VK05.CPXV genes found in this region are highlighted in gray in CPXV_GRI-90.B) Pairwise comparison of the percent of shared identity between HPXV_MNR-76, VK05, and two CPXV strains from each of the five clades.C) Venn diagram showing the number of genes in the hypervariable regions that are shared by CPXV and HPXV/VK05.

Figure S2 .
Figure S2.Mapping of VK01 (A-D) and VK12 (E-I) reads to the regions of long insertions in VK01 and VK12 genomes.(A -I) Top schemes show the long insertions of VK01 and VK12 (green bars) and fragments 1, 2, 6, 7, and 8 of the 10.7 kb insertion of HPXV (red bars) at the left end of the genomes, as shown in Figure 3A.Each region of transition is indicated by colored boxes and letters.

Figure S3 .
Figure S3.Bootscan analysis of region 1 to detect recombination subregions in the VK02 genome.Region 1 highlighted in Figure 7 was reanalyzed by Bootscan, using VK02 as query, window size 200, and step size 10.Six putative recombinant subregions with VARV were identified and marked from A to F. The VK02 sequences between recombination breakpoints were extracted and realigned with several orthopoxviruses for phylogenetic analysis, using either maximumlikelihood or neighbor-joining models.Blue boxes indicate the cluster containing HPXV and HPXVrelated viruses and pink boxes indicate the VARV cluster.The position of VK02 in the trees is indicated by a red star.Numbers indicate the bootstrap support from 1,000 replicates (>50% is shown).The scale bar indicates the number of substitutions per site.

Figure S4 .
Figure S4.Bootscan analysis of region 2 to detect recombination subregions in the VK02 genome.Region 2 highlighted in Figure 7 was reanalyzed by Bootscan, using VK02 as query, window size 200, and step size 20.Five putative recombinant subregions with VARV were identified and marked from A to E. The VK02 sequences between recombination breakpoints were extracted and realigned with several orthopoxviruses for phylogenetic analysis using either maximumlikelihood or neighbor-joining models.Blue boxes indicate the cluster containing HPXV and HPXVrelated viruses and pink boxes indicate the VARV cluster.The position of VK02 in the trees is indicated by a red star.Numbers indicate the bootstrap support from 1,000 replicates (>50% is shown).The scale bar indicates the number of substitutions per site.

Figure S5 .
Figure S5.Bootscan analysis of region 3 to detect recombination subregions in the VK02 genome.Region 3 highlighted in Figure 7 was extracted and reanalyzed by Bootscan, using VK02 as query, window size 600, and step size 80.Two putative recombinant subregions with VARV were identified and marked A and B. The VK02 sequences between recombination breakpoints were extracted and realigned with several orthopoxviruses for phylogenetic analysis using either maximum-likelihood or neighbor-joining models.Blue boxes indicate the cluster containing HPXV and HPXV-related viruses and pink boxes indicate the VARV cluster.The position of VK02 in the trees is indicated by a red star.Numbers indicate the bootstrap support from 1,000 replicates (>50% is shown).The scale bar indicates the number of substitutions per site.

Figure S6 .
Figure S6.Bootscan analysis of region 4 to detect recombination subregions in the VK02 genome.Region 4 highlighted in Figure 7 was extracted and reanalyzed by Bootscan, using VK02 as query, window size 500, and step size 80.Two putative recombinant subregions with VARV were identified and marked A and B. The VK02 sequences between recombination breakpoints were extracted and realigned with several orthopoxviruses for phylogenetic analysis using either maximum-likelihood or neighbor-joining models.Blue boxes indicate the cluster containing HPXV and HPXV-related viruses and pink boxes indicate the VARV cluster.The position of VK02 in the trees is indicated by a red star.Numbers indicate the bootstrap support from 1,000 replicates (>50% is shown).The scale bar indicates the number of substitutions per site.

Figure S7 .
Figure S7.Mapping of VK02 (A-F) reads to the recombination regions of VK02 genome.(A -F) The six recombination regions of VK02 (yellow arrows) and coverage profile are shown on the top panel.Bottom panels show the left and right boundaries of each recombination region.(G) An alignment of the intergenic region between HPXV-15b and C11R genes is shown.The left arrow indicates the end of HPXV-15b gene, and the right arrow indicates the beginning of the C11R gene.The AT-rich region is indicated by a black bar on the top.The left boundary of recombination region 6A is indicated with a red bar.

Figure S8 .
Figure S8.Similarity profile and Bootscan analysis to detect recombination events in the left variable region of the MFDV_1902 genome.A) The multialignment of the left variable region of the six historical smallpox vaccines and the HPXV_MNR-76 genomes was scanned for similarity analysis using Simplot, using HPXV_MNR-76 as query, window size 500, step size 50.A black arrow indicates the region of least similarity between MFDV_1902 and HPXV_MNR-76.B) The initial 3 kb segment of the multialignment was used in Simplot with window size 300, step size 40, and MFDV_1902 was selected as the query.The black arrow indicates the region of the least similarity between MFDV_1902 and both HPXV and VK08/VK01.C) The multialignment shown in B, was analyzed by Bootscan, using MFDV_1902 as query, window size 300, and step size 40.A putative recombinant region with VARV was identified (black arrow).The MFDV_1902 sequence between recombination breakpoints was realigned with several orthopoxviruses for phylogenetic analysis (D) using either maximum-likelihood or neighbor-joining models.Blue boxes indicate the HPXV cluster and orange boxes indicate the VACV cluster.The position of MFDV_1902 in the tree is indicated by a red star.Numbers indicate the bootstrap support from 1,000 replicates (>50% is shown).The scale bar indicates the number of substitutions per site.

Figure S9 .
Figure S9.Details of the recombination events with VARV in the VK02 genomes.A) part of the nucleotide alignment of region 6A that contains the ortholog of C11R.Note the similar patterns of SNPs, insertions, and deletions in the VK02 and VARV genomes.These patterns are missing in other historical smallpox vaccines.B) location of the recombinant breakpoints, size of the recombinant regions and subregions, and gene loci where the recombination event occurred.