Genomic Variability of Monkeypox Virus among Humans, Democratic Republic of the Congo

Health authorities should be vigilant for this rapidly evolving virus.

of MPXV is increasing (20,21). In 2003 in the DRC, 7 generations of uninterrupted spread among humans were reported (21). Increasing susceptibility coincident with decreasing herd immunity is expected to profoundly increase the introduction and spread of MPXV among humans. In 2003, an outbreak of monkeypox in the midwestern United States revealed the propensity for transmission to MPXVnaive populations (22). The strain of MPXV responsible belonged to the Western African clade; these strains tend to be less pathogenic than the Central African strains that circulate in the DRC (2).
MPXV is a linear DNA genome of ≈197 kb and contains ≈190 nonoverlapping ORFs >180 nt long (1,7,26). Like all orthopoxviruses, the central coding region sequence (CRS) at MPXV nucleotide positions ≈56000-120000 is highly conserved and flanked by variable ends that contain inverted terminal repeats (ITRs) (1). VACV homologs to genes found in the terminal ends of the MPXV genome are predominantly involved in immunomodulation, and most are either predicted or known to influence host range determination and pathogenicity (27). Unlike VARV, which lacks ORFs in the ITR region, MPXV contains at least 4 ORFs in the ITR region (7,26).
The evolution of short-read, high-throughput sequencing technologies has made genomic surveys of large populations readily attainable (23,24). It has been suggested that the progressive loss of genes that were not essential for pathogenesis in humans is the mechanism that led to the emergence of a highly adapted virus that causes serious disease and is capable of efficient and rapid human-tohuman transmission (7). A recent study demonstrated that gene copy number variation might be a crucial factor for modulating virus fitness (25). In the study reported here, we investigated the genomic plasticity of MPXV strains obtained from patients in the DRC during active surveillance for monkeypox disease (20).

Sample Selection
From November 2005 through November 2007, a total of 760 cases of human monkeypox in the Sankuru District, DRC, were identified by active disease surveillance and confirmed by quantitative real-time PCR (20). From these, we selected 60 cases on the basis of 3 epidemiologic measures: severity, transmission, and geography. The samples chosen represented the entire set of samples and included samples from every category of severity and transmission observed and every geographic location sampled. Of the 60 cases, we sequenced 29 representative scab or vesicle fluid samples from patients whose samples met the criteria for concentration and quality of genetic material (>500 ng total DNA and >1.8 260/230 and 260/280 absorbance ratio). We used the Genome Analyzer IIx and MiSeq sequencers (Illumina, Inc., San Diego, CA, USA) to conduct 76-bp or 100-bp, paired-end sequencing. To avoid laboratory-induced mutation events, we sequenced these samples directly from scab or vesicle material without virus isolation or propagation.

Sample Collection and Processing
Techniques for sample collection, processing, nucleotide isolation, and real-time PCR have been described (20,28,29). The World Health Organization scoring system used during the smallpox eradication program was used to generate severity designations: presence of <25 lesions on the body was considered mild disease; 25-99 lesions, moderate; 100-250 lesions, severe, and >250 lesions, serious. Libraries with an average insert size of ≈300 bp were processed by using the TruSeq (Illumina, Inc.) sequencing kit version 5 and sequenced for either 76 or 100 cycles from each end on an Illumina Genome Analyzer IIx or MiSeq instrument.

Genome Assembly
All raw Illumina sequence data were processed through an in-house-developed filtering program that removes or trims reads with known artifacts. Specifically, all reads containing adaptor sequence, low-complexity reads, or PCR duplicates were removed or trimmed. Host read exclusion was performed by removing reads that aligned to the human genome (Homo sapiens chromosome 10, GRCh37 primary reference assembly) by using the CASA-VA pipeline 1.8 (Illumina, Inc.) with default parameters. De novo assemblies were performed by using SeqMan Pro NGen (DNAStar, Inc., Madison, WI, USA) with default parameters. Genome size was set to 200 kb. The optimal k-mer was determined by using an iterative assembly process. We closed gaps in coverage and resolved large repeats by using Sanger sequencing for the 23 genomes described in this work. Sequences were submitted to GenBank with accession nos. JX878407-JX878429.

Phylogenetic Analysis
We conducted the phylogenetic analysis by using methods identical to those reported for VARV diversity (4). In brief, we aligned the CRS (MPXV nucleotide positions ≈56000 120000) of samples with sufficient sequence information with 11 MXPV reference sequences available through the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/), GenBank nos. KC257459, KC257460, DQ011154, DQ011155, HQ857562, HQ857563, NC_003310, AY603973, AY753185, AY741551, DQ011156, DQ011153, and DQ011157 in Se-qMan Pro NGen (DNAStar, Inc.). Manual editing of gaps and ambiguous base calls of the alignment were performed to include the draft genomes and provide a more robust phylogenetic narrative. The resulting alignment was analyzed by using Bayesian methods of phylogenetic reconstruction and the GTR+Γ evolutionary model. MrBayes version 3.1.2 (30) was run by using 1,000,000 generations with 4 parallel runs. Markov chain Monte Carlo convergence was assessed by checking the average standard deviation of split frequencies (<0.01) during >10,000 generations. Figure 1 shows the unrooted tree based on alignment of the 11 MPXV reference sequences and the viral genomes from the 23 clinical samples in this study.

PCR Screening
All primers were developed by using Primer-BLAST (www.ncbi.nlm.nih.gov/tools/primer-blast/) with default parameters. This software uses Primer3 to design the primers and then BLAST to determine specificity. We designed qualitative PCRs to detect the left and right short tandem repeats (STRs) at MPXV nucleotide positions 72-486 and 196710-197124, a region spanning the 625-bp deletion between bases 189820 and 190444 in the right ITR, and a 410-bp region of the CRS that contains 4 single-nucleotide polymorphism locations positively typing the lineages described in this article between nucleotide positions 89273 and 89944. All PCRs were run by using standard methods and optimized annealing temperatures of 55°C through 65°C. The PCR product Sanger sequences were aligned to the reference in SeqMan Pro software (DNAStar) for validation by using default parameters. Primer sequences are available on request.

Results
Referenced and de novo alignments were performed by using SeqMan Pro NGen to determine the scaffold of the viral genome. Genomes were finished by using PCR and traditional dideoxynucleotide sequencing (Sanger) to resolve gaps and repetitive and low coverage areas. We ob- by ≈6-fold. To assess the potential genomic diversity of MPXV strains circulating in the DRC, we performed a phylogenetic analysis of the CRS. The resulting cladogram suggested the emergence of 4 distinct lineages within the Central African clade.

Genomic Reduction
Among the MPXV alignments, we detected a polymorphism in the noncoding region of the ITR with 12 variants. Of the 23 (17.4%) complete genome sequences, 4 showed substantial genomic instability directly upstream of the right ITR. This subset of samples contained a 625bp deletion between bases 189820 and 190444 (genome positions based on MPXV-COG_2003_358). This deletion completely removes MPV-Z-N2R and eliminates the first 103 bp of the orthopoxvirus major histocompatibility complex class I-like protein (OMCP, MPV-Z-N3R) (Figure 2, panel A). The function of the protein expressed from MPV-Z-N2R is unknown, and no homologous gene is in the VARV or West African MPXV genomes. OMCP is a secreted protein that binds to NKG2D and interferes with NKG2D-mediated cell killing by natural killer cells (31). A conventional PCR created to characterize the deletion enabled us to detect this deletion in 6 additional genomes (online Table, wwwnc.cdc.gov/EID/article/20/2/13-0118-T1. htm.) We found no evidence of a mixed population within the sequence information for the isolates containing the deletion or in the gel electrophoresis of the Amplicon Sanger sequencing products for samples tested (data not shown).
This genetic polymorphism was absent from all MPXV sequences available through GenBank. Moreover, although 1 full-length copy and 1 truncated copy of the OMCP gene are present in all genomes of the Western African clade MPXV, only the full-length copy of the gene is present in genomes of the Central African MPXV clade (Figure 2, panel A). No homologous gene is present in the VARV genome (26) (www.poxvirus.org). The deletion identified in our clinical samples completely removes the region of OMCP containing the leader sequence (Figure 2, panel B) and substantially alters protein folding (data not shown), suggesting that the truncated protein is nonfunctional. The deletion was identified in genomes belonging to lineage B only (online Table). Data regarding route of infection were available for 44 patients; transmission for 24 cases was classified as human-to-human (secondary) and for 20 as animal-to-human (primary) (online Table). Metadata were available for 8 of the samples containing the deletion; 7 (87.5%) of the 8 were from cases resulting from human-to-human transmission (p = 0.0544, Fisher exact test), suggesting that this deletion might be associated with increased fitness in humans. Further work, however, is needed to elucidate the role of OMCP in transmissibility.

STR Analysis
Substantial variations from other orthopoxviruses were detected in the 70-bp STR region (bases 72-486 and 196710-197124 in MPXV-COG_2003_358) (4,(32)(33)(34). A traditional PCR designed against the left and right STR regions enabled us to evaluate this polymorphism in all 60 samples. We did not quantitatively test the purity of the repeat population beyond visualization of the PCR product by gel electrophoresis. A small percentage of the samples (10%) had faint secondary products that could not be resolved. The STRs described here should be considered to be the dominant population. In total, we identified 12 unique STR combinations (online Table). STR variations have been described for VARV isolates obtained during 1944-1974 (34); however, no studies have addressed the possibility of STR variations in VARV isolates belonging to the same geographic cluster. Samples collected during episodes of VARV activity in Bangladesh, South Africa, or the Far East showed limited variation (<3 left STR patterns); whereas, we observed 12 patterns for MPXV samples collected during a 2-year period in the DRC. Given that the poxvirus genomes in our study were characterized directly from clinical samples, it is plausible that this smaller range of diversity for VARV is only the result of a genetic bottleneck that occurred during cell culture isolation. Rapid copy number variation of the K3L gene in the VACV ITR has recently been described as a mechanism for host immunomodulation (25). However, none of the STR copy number variations observed among the MPXV strains involved a known coding region. In contrast to some VARV isolates, all 60 MPXV samples had identical right and left STRs (online Table).

Phylogenetic Analysis of the Entire Dataset
Because only 23 samples contained enough material for complete genome characterization, to expand our analysis to all 60 samples, we performed PCR amplification of a 410-bp variable region of the CRS corresponding to the H4L gene (VACV-COP gene notation). When selecting this region, we used the phylogenetic relationships of the CRS regions of the Central African clade by using 72 segregating sites over 61 kb of sequence. The selected region contained 4 segregating sites that are phylogenetically relevant to Central African clade MPXV (MPXV accession no. DQ01154: 89505, 89545, 89674, and 89914). No evidence of additional diversity was detected at these sites among the samples from the DRC except at the fourth site (89914) and only among the members of lineage B. Phylogenetic reconstruction of this region among the samples resulted in a topologic distribution for DRC isolates that correlates with the whole-genome tree (bootstrap values, as a measure for confidence measures, were lineage A (n = 12, 82%), lineage B (n = 11, 81%), lineage C (n = 6, 80%), and lineage D (n = 1). Moreover, this region contains 7 sites that differentiate between the Central African (n = 29) and Western African (n = 6) clades with a bootstrap value of 82%, which can be evaluated as another measure of the phylogenetically informative character of this genomic area. In this region, we identified 1 nonsynonymous and 3 synonymous changes. An additional nonsynonymous change was identified in all lineages ≈800 bp downstream of this region, for a total of 5 polymorphic sites in H4L within the Central African clade.
Overall, 11.6% of the genetic variations identified in this study were located in an ≈1.2-kb conserved region of the CRS. H4L is reported to be tightly associated with the DNA-dependent RNA polymerase, aiding in early gene transcription before initiation and termination (35,36). Increased mutation rates in a tightly constrained area of a genome can be a marker of increased selective pressure. The effect of the genetic polymorphisms on the function of H4L is unknown, and further studies to evaluate their effects on early gene expression are needed.
The phylogenetically informative single-nucleotide polymorphisms were assessed by PCR amplification and dideoxynucleotide sequencing for all 60 samples. This information was then used to extrapolate the lineage of each sample (online Table).
Lineage A contains the widest range of STR repeats (1-15) and represents 33 (55.0%) of the samples analyzed. The lineage seems to have arisen in the Lomela health zone (forest) where it continued to circulate throughout 2007. The remaining members of the lineage are a cluster of strains with 13-15 STR repeats that were geographically restricted to the Vangakete health zone (savannah). According to the World Health Organization scoring system for severity used during smallpox eradication, the virus strains belonging to lineage A caused predominantly moderate disease (66.7%, p = 0.2816). No significant differences in animal-acquired versus human-acquired infection were associated with this lineage.
Lineage B contains 1-8 STRs and represents 16 (26.7%) of the samples analyzed. These strains were predominantly isolated from the Kole (forest) and Lomela health zones and were associated with moderate disease (80.0%, p = 0.2179). Although no statistically significant association with secondary (human) transmission was observed among all lineage B strains, the MPV-Z-N2R/ OMCP deletion was identified only in genomes belonging to this lineage and, as stated above, was significantly associated with human-to-human transmission.
Lineage C showed high homogeneity with only 1 STR and represented 10 (6.7%) of the samples analyzed. These strains were predominantly obtained from the Djalo-Ndjeka (ecotone) and Vangakete health zones and were associated with severe/serious disease (70.0%, p = 0.0234).
Only 1 representative of lineage D was identified and contained 10 STRs. The sample was collected in Bene Dibele (forest), was associated with severe disease, and was acquired from a human source.
Although the data do not support causality linking the number of STR repeats to the severity of any lineage, the composition of the STR region merits further investigation for effects on virulence and transmission and as a marker of passaging effects. Our data support a wide diversity of STR composition that varies according to lineage; 1 population exhibited only a dominating single STR repeat and others had variants between 1 and 15 repeats. On the basis of our data and the information previously reported (25) depicting an accordion-like expansion of gene-copy in response to antiviral systems, there is precedence for a copy number variation mechanism used by the virus. Considering that there is no coding element in this region, the most likely need to vary this region is structural; therefore, we predict that the STR region may be under selective pressure to provide stability to this region of the genome.

Discussion
Our data demonstrate that a combination of genomic destabilization and genetic polymorphism are influencing the evolution of MPXV strains actively circulating in the Sankuru District of the DRC. Patterns of genomic destabilization and gene loss were consistent with models of orthopoxvirus genomic evolution (7). A recently submitted genomic sequence derived from 2 isolates obtained in Sudan in 2005 contains a large indel in the right flanking region. This indel simultaneously inserts a 10.8-kb inverted repeat of several immune modulatory coding regions from the left flanking region and deletes a 2.1-kb region that surrounds the region that we identified as reduced. The authors who reported the sequence posit that this insertion and deletion resulted from a single event, and the identifying sequence is the only evidence of either the insertion or the exact deletion in any Poxviridae isolate (37). The 2 reported changes cannot be functionally compared in this study; however, additional evidence supports genetic variability in this region. Genomic reduction might have played a role in the emergence of VARV as a highly adapted human pathogen capable of efficient human-tohuman spread (7). Although it has been suggested that gene loss was associated with a restricted host range for VARV, no link has been demonstrated between gene loss and increased severity of disease. VARV is thought to have diverged from its rodent-borne ancestor 3,000-4,000 years ago (38). The strains of VARV isolated during periods of high activity in the 20th century were already well adapted to their evolutionary niche (humans), and, as a result, their genetic sequences were highly conserved. Conversely, the level of variability for MPXV isolates might suggest an active pattern of change; however, longer-term surveillance is required.
Although genomic changes are predicted by models of host transition, the correlation of the gene loss pattern with secondary transmission might indicate that MPXV is adapting for efficient replication in a novel ecologic niche: humans. It is also plausible that the association of OMCP gene loss and transmissibility is coincidental because numerous factors, including vaccination status and human encroachment on reservoir habitats, could also explain increased human-to-human transmission and variant introduction frequency. The scarcity of information regarding historical orthopoxvirus emergence and the absence of sequencing data for MPXV reservoir isolates precludes our ability to pinpoint the exact source of the observed variability; however, we predict that the 4 lineages are circulating in the reservoir population and are introduced into the human population after direct contact with those reservoir hosts. Further genomic surveillance that includes reservoir species might help explain the variant emergence described herein.
Active surveillance during November 2005-November 2007 in Sankuru District, DRC, suggested a strong correlation between smallpox vaccination and protection from MPXV (20). However, as the number of unvaccinated persons continues to increase, so does the size of the susceptible population. Other factors in the DRC, such as malnutrition, disease, and an inadequate health care system, have provided an ideal environment for MPXV. It is possible that increased circulation of MPXV in the human population of the DRC is a driving force in the evolution of the virus. Our data also indicate that certain genetic changes might affect disease severity or human-to-human transmissibility, suggesting an active period of adaptation that could result in virus strains with increased fitness in humans. However, we have no evidence to directly link the genetic changes to increased severity or transmissibility in vivo.
The global effects of the emergence of MPXV strains that are highly adapted to humans could be devastating. Importation of MPXV by infected vertebrates is of concern because of the potential for establishment of new reservoirs outside Africa. In fact, American ground squirrels have been found to be susceptible to infection (39), suggesting that other rodent species worldwide might also be susceptible. Small genetic changes could favor adaptation to a human host, and this potential is greatest for pathogens with moderate transmission rates (such as MPXV) (40). The ability to spread rapidly and efficiently from human to human could enhance spread by travelers to new regions. Therefore, active disease surveillance should continue to be used monitor MPXV for changes that are consistent with increasing adaptation to humans. Continued active surveillance of the Sankuru District, and expansion to all other regions where the virus is known or predicted to circulate, would help determine the true geographic range of this virus. Given the apparent rapid evolution of this virus, when suspected or confirmed cases in humans are observed, health authorities in presently unaffected areas should become vigilant and actively prepare to take immediate action.