Viral transmission and evolution dynamics of SARS-CoV-2 in shipboard quarantine

Abstract
Objective To examine transmission and evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the shipboard quarantine of the Diamond Princess cruise ship.
Methods We obtained the full SARS-CoV-2 genome sequences of 28 samples from the Global Initiative on Sharing All Influenza Data database. The samples were collected between 10 and 25 February 2020 from individuals who had been tested for SARS-CoV-2 during the quarantine on the cruise ship, and were later sequenced in either Japan or the United States of America. We analysed the evolution dynamics of SARS-CoV-2 using computational tools for phylogenetics, natural selection pressure and genetic linkage.
Findings The SARS-CoV-2 outbreak on the cruise ship most likely originated either from a single person infected with a virus variant identical to the WIV04 isolate, or from two simultaneous primary cases, the second infected with a virus containing the 11083G > T mutation. We identified a total of 24 new viral mutations across 64.3% (18/28) of samples, and the virus evolved into at least five subgroups. Increased positive selection of SARS-CoV-2 during the quarantine was statistically significant (Tajima's D: −2.03, P < 0.01; Fu and Li's D: −2.66, P < 0.01; and Zeng's E: −2.37, P < 0.01). Linkage disequilibrium analysis confirmed that ribonucleic acid (RNA) recombination involving the 11083G > T mutation also contributed to the increase in mutations among the viral progeny.
Conclusion The findings indicate that the 11083G > T mutation of SARS-CoV-2 spread during shipboard quarantine and arose through de novo RNA recombination under positive selection pressure.

required intensive care and nine (2.4%) died. 6,7 The shipboard quarantine provided a closed environment in which to observe SARS-CoV-2 transmission and adaptation independently of other sources of infection. 6,7 This environment presents an ideal static population, with little interfering noise, for measuring the viral phylodynamics of the COVID-19 outbreak. We therefore used this opportunity to study de novo evolution of SARS-CoV-2 in a closed population.

Data resources
Viral sequences and sequencing methods are available in the Global Initiative on Sharing All Influenza Data (GISAID) database 8 and GenBank®. 9 From these databases, we downloaded sequences and annotations of the isolates from the cruise ship, as well as the reference genomes of SARS-CoV-2 isolates PBCAMS-WH-04 (accession number MT019532), WIV04 (MN996528), Hu-1 (NC_045512) and WHU01 (MN98868). 4

Statistical and phylogenetics analyses
We aligned FASTA files of viral sequences using MAFFT 7 software (Kazutaka Katoh, Research Institute for Microbial Diseases, Osaka, Japan). 10,11 To analyse phylogenetic relationships between viral sequences, we used the neighbour-joining method and the Jukes-Cantor substitution model, with five bootstrap resamplings. We generated the rectangular phylogenetic tree using the Archaeopteryx Java plugin of MAFFT 7. 12

To investigate linkage disequilibrium (the non-random association of alleles at different loci) in SARS-CoV-2 genomes, we first converted 148 SARS-CoV-2 genomic sequences, downloaded from GISAID, using the SNP_tools plug-in for Excel (Microsoft, Redmond, United States of America) to create a baseline. 21 Using HaploView software, we calculated the normalized coefficient of linkage disequilibrium, D' = D/Dmax, where D is the difference between the observed and expected haplotype frequencies and Dmax is the theoretical maximum of that difference. We also used the same software to calculate the log of the odds of there being linkage disequilibrium between two loci, and the squared coefficient of correlation (r²). In the absence of natural selection or other evolutionary forces, D' decays towards zero over time at a rate that depends on the recombination rate between the two loci. We used the χ² test to examine whether the observed linkage disequilibrium was statistically significant. To detect RNA recombination, we plotted 95% confidence bounds for D' using HaploView. 23 Pairs were considered to be in strong linkage disequilibrium if the upper 95% confidence bound was above 0.98 (that is, consistent with no recombination) and the lower bound was above 0.7. Conversely, pairs for which the upper confidence bound of D' was below 0.9 were considered to show strong evidence of recombination. To determine haplotype blocks, we searched for a solid spine of strong linkage disequilibrium running from one marker to another along the legs of the triangle in the linkage disequilibrium chart. 22
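The two-locus statistics reported by HaploView can be illustrated with a minimal Python sketch. It assumes each genome has already been reduced to a two-character haplotype (the nucleotides at two variable sites) and applies the standard formulas for D, D' and r², together with the 1-degree-of-freedom χ² test. The helpers `ld_stats` and `classify_pair` are hypothetical names, not HaploView's code. The 95% confidence bounds passed to `classify_pair` are assumed to come from elsewhere, because HaploView derives them itself. Only the thresholds quoted above are applied.

```python
import math


def ld_stats(haplotypes):
    """Hypothetical helper: LD statistics for two biallelic loci.

    `haplotypes` is a list of two-character strings, one per genome, holding
    the nucleotide observed at locus A and at locus B (for example "GT").
    This is an illustrative sketch, not HaploView's implementation.
    """
    n = len(haplotypes)
    alleles_a = sorted({h[0] for h in haplotypes})
    alleles_b = sorted({h[1] for h in haplotypes})
    if len(alleles_a) != 2 or len(alleles_b) != 2:
        raise ValueError("both loci must be biallelic in this sample")
    a, b = alleles_a[0], alleles_b[0]  # allele tracked at each locus

    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == a + b for h in haplotypes) / n

    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    chi2 = n * r2  # Pearson chi-square of the 2x2 haplotype table, 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return {"D": d, "D_prime": abs(d) / d_max, "r2": r2,
            "chi2": chi2, "p_value": p_value}


def classify_pair(lower, upper):
    """Hypothetical helper: label a marker pair from 95% bounds on D'.

    The bounds are inputs (assumed to be computed elsewhere); only the
    thresholds quoted in the Methods are applied here.
    """
    if upper > 0.98 and lower > 0.7:
        return "strong LD"
    if upper < 0.9:
        return "strong evidence of recombination"
    return "uninformative"
```

For 28 hypothetical genomes in which the two alleles always co-occur, `ld_stats(["GC"] * 7 + ["TA"] * 21)` gives D' = 1.0, r² = 1.0 and χ² = 28.0, whereas fully independent sites give D = 0 and a P value of 1.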

Viral variants
A total of 28 specimens with viral sequences were available for this analysis, including 25 samples from the United States and three samples from Japan (Table 1). Whether the single 11083G > T substitution occurred spontaneously during the quarantine, or the patients had been infected with a viral variant containing this mutation before boarding the ship, is unclear. Nevertheless, all of the viral sequences were more similar to the WIV04 sequence than to the other 143 SARS-CoV-2 isolates in the GISAID database (data repository). 24 This result suggests that the 24 new mutations identified were generated de novo on the ship rather than deriving from multiple geographic origins.
The analysis revealed two possible origins of the virus. Either the virus (except hCoV-19/USA/CruiseA-18/2020) originated from a single primary case with the WIV04 sequence and all substitution mutations occurred during the quarantine, or there were two simultaneous primary cases, one infected with a virus identical to the WIV04 sequence and one with a virus containing the 11083G > T substitution.
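Comparing each cruise ship sequence with WIV04 and the other references amounts to calling substitutions against an aligned reference and measuring distance under the Jukes-Cantor model named in the Methods. The Python sketch below assumes the sequences are already rows of one multiple sequence alignment (for example MAFFT output) and reports positions in ungapped reference coordinates, using the same "11083G>T" style as the text. The helper names and the toy reference names are hypothetical. This is not the authors' pipeline.

```python
import math

VALID = set("ACGT")


def call_mutations(ref_row, query_row):
    """Hypothetical helper: substitutions in an aligned query, e.g. '11083G>T'.

    Positions are counted on the ungapped reference, starting at 1;
    insertions, deletions and ambiguous bases such as N are skipped.
    """
    mutations = []
    ref_pos = 0
    for r, q in zip(ref_row.upper(), query_row.upper()):
        if r != "-":
            ref_pos += 1  # advance only on reference bases
        if r in VALID and q in VALID and r != q:
            mutations.append(f"{ref_pos}{r}>{q}")
    return mutations


def jukes_cantor_distance(row1, row2):
    """Jukes-Cantor corrected distance between two aligned sequences."""
    pairs = [(x, y) for x, y in zip(row1.upper(), row2.upper())
             if x in VALID and y in VALID]
    p = sum(x != y for x, y in pairs) / len(pairs)  # proportion of differing sites
    if p >= 0.75:
        return math.inf  # correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def closest_reference(query_row, references):
    """Name of the aligned reference with the smallest Jukes-Cantor distance."""
    return min(references,
               key=lambda name: jukes_cantor_distance(query_row, references[name]))
```

For example, `call_mutations("ACGTACGTAC", "ACTTACGTAC")` returns `['3G>T']`. With toy references `{"ref_A": "ACGTACGTAC", "ref_B": "ACGTTCGTTC"}`, `closest_reference` picks `ref_A` for that query.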

Natural selection of mutations
Four variants had three mutations in their genome, one variant had four mutations and two variants had six mutations (Fig. 1 and Table 1). To test the hypothesis that viral mutations on board the cruise ship arose under selection pressure rather than through neutral (random) evolution, we used two neutrality tests. 16,17 First, Fu and Li's test gave a D value of −2.66 (P < 0.01), suggesting that the quarantine procedure exerted a purifying or positive selection pressure that generated an excess of singleton sites. This conclusion was corroborated by the fact that 39.3% (11/28) of cases contained 15 new singleton mutations (Fig. 1). Second, Zeng's E value of −2.37 (P < 0.01) supported population growth of the virus after a recent bottleneck. We conclude that SARS-CoV-2 viral evolution was positively correlated with the increase in selection pressure during the shipboard quarantine.
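Of the three neutrality statistics listed in the Abstract, Tajima's D is the simplest to show. It compares the mean number of pairwise differences (π) with Watterson's estimator, based on the number of segregating sites. An excess of rare variants, such as singletons, pulls π below Watterson's estimator and makes D negative. The Python sketch below is a textbook implementation, not the software the authors used. The function name is hypothetical. Fu and Li's D and Zeng's E need further quantities (singleton counts and derived-allele frequencies) and are not implemented here.

```python
import math
from itertools import combinations

VALID = set("ACGT")


def tajimas_d(seqs):
    """Hypothetical helper: Tajima's D for aligned sequences of equal length.

    Alignment columns with a gap or ambiguous base in any sequence are ignored.
    Returns NaN when there are no segregating sites (D is undefined).
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    rows = [row.upper() for row in seqs]
    columns = [col for col in zip(*rows) if all(base in VALID for base in col)]
    seg_sites = sum(len(set(col)) > 1 for col in columns)
    if seg_sites == 0:
        return float("nan")
    # pi: mean number of pairwise differences over all sequence pairs
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(col[i] != col[j] for col in columns)
        for i, j in combinations(range(n), 2)
    ) / n_pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = seg_sites / a1  # Watterson's estimator
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / math.sqrt(variance)
```

With four toy sequences, a single singleton site (`["AAAA", "AAAA", "AAAA", "AAAT"]`) gives a negative D of about −0.61. A variant shared by half the sample (`["AAAA", "AAAA", "AAAT", "AAAT"]`) gives a positive D.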

RNA recombination
Seven samples contain the 11083G > T mutation even though they belong to different subgroups (Fig. 1). Assuming a substitution rate of 0.92 × 10⁻³ substitutions/site/year, 26 it is unlikely that virus variants in different subgroups all independently acquired the same G-to-T substitution at position 11083 within three weeks. One hypothesis is that these variants gained the 11083G > T mutation through RNA recombination.
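The argument above can be made concrete with a rough calculation. The Python sketch below treats the quoted rate as a per-lineage Poisson rate and assumes a G is equally likely to change to A, C or T. It then asks how likely it is that k independent lineages all acquire the same change in 21 days. Both assumptions are simplifications, since a phylogenetic substitution rate is not the same as a per-infection mutation probability. The number of independent lineages k is a hypothetical input, because the text does not state how many subgroups carry the mutation.

```python
import math

RATE = 0.92e-3  # substitutions per site per year, as quoted in the text
DAYS = 21       # roughly the three-week quarantine window


def p_specific_change(rate=RATE, days=DAYS):
    """Chance that one lineage acquires one particular substitution at one site.

    Assumes a Poisson process at `rate` and that the three possible
    substitutions from a G are equally likely (Jukes-Cantor-style assumption).
    """
    expected = rate * days / 365.25   # expected substitutions at the site
    p_any = 1 - math.exp(-expected)   # at least one substitution
    return p_any / 3                  # only G->T is of interest


def p_parallel(k, rate=RATE, days=DAYS):
    """Chance that k independent lineages all acquire the same change."""
    return p_specific_change(rate, days) ** k
```

Under these assumptions, `p_specific_change()` is about 1.8 × 10⁻⁵, and `p_parallel(2)` is already about 3 × 10⁻¹⁰. Parallel acquisition of the same change in several lineages is therefore very improbable.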

3'-UTR mutations
In three samples, we found two mutations, 29736G > T and 29751G > T, in the stem loop-II motif (Fig. 5). We used the published three-dimensional crystal structure of the SARS-CoV stem loop-II motif RNA 27 to map the nucleotides G29736 and G29751, and found that they correspond to nucleotides G13 and G28 in the SARS-CoV stem loop-II motif (Fig. 5). In SARS-CoV, G13 (G29736 in SARS-CoV-2) forms a base triple with A38 and C39 in a seven-nucleotide asymmetric bubble, while G28 (G29751 in SARS-CoV-2) participates in the formation of an essential RNA base quartet composed of two G-C pairs (G19, C20, G28, C31; Fig. 5). 29
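Both correspondences in this paragraph imply the same constant offset between genome and motif numbering (29736 − 13 = 29751 − 28 = 29723). The small Python sketch below converts between the two numberings on the assumption that the offset holds across the whole motif, which means no insertions or deletions relative to SARS-CoV in this stretch. That assumption should be checked against an actual alignment. The function names are hypothetical.

```python
# Offset implied by the two mappings in the text: 29736 - 13 == 29751 - 28.
# ASSUMPTION: the offset is constant across the motif (no indels).
OFFSET = 29736 - 13


def genome_to_motif(pos):
    """SARS-CoV-2 genome coordinate -> SARS-CoV stem loop-II motif numbering."""
    return pos - OFFSET


def motif_to_genome(pos):
    """SARS-CoV stem loop-II motif numbering -> SARS-CoV-2 genome coordinate."""
    return pos + OFFSET
```

Under this assumption, the base-quartet nucleotides G19, C20, G28 and C31 would sit at genome positions 29742, 29743, 29751 and 29754. The base-triple partners A38 and C39 would sit at 29761 and 29762.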

Discussion
Viral phylogenetics is a useful tool for studying epidemiological and evolutionary processes, such as epidemic spread and spatiotemporal dynamics, including metapopulation dynamics, zoonotic transmission, tissue tropism and antigenic drift. 30 Here we report the viral phylodynamics of SARS-CoV-2 in patients in a shipboard quarantine for three weeks in February 2020. Transmission started from either one or two primary cases with the WIV04 sequence and/or the 11083G > T mutation, and the virus then quickly separated into at least five subgroups based on new mutations.
Increased positive selection and RNA recombination of SARS-CoV-2 were evident during the quarantine. These results should be considered when formulating future management protocols for SARS-CoV-2 outbreaks in relatively close quarters, such as ships, submarines, dormitories, prisons and hospitals.
While the quarantine averted many infections on board the ship, 31 the phylogenetic analysis showed that viral transmission and RNA recombination occurred between the five identified subgroups. Our data fit the coalescent model, which uses the diversity of the viral genome, the viral evolutionary rate and the estimated time of infection to determine the number of viral genotypes present in the initial infected population. 32 However, we cannot rule out that other evolutionary processes, such as the transmission bottleneck that determines how much of the viral diversity generated in one host passes to another during transmission, also shaped the viral phylogenies. While spatial structure is the most general virus population structure in phylodynamic analyses, SARS-CoV-2 evolution may also have been influenced by host characteristics, such as age, race and risk behaviour. 30 Because viral transmission can occur preferentially between patients who share any of these attributes, the factors driving transmission of these virus variants require further study. Studies are also needed on whether quarantine in close quarters promotes more rapid acquisition of mutations through RNA recombination.
Despite the small sample size of this study, our computational and statistical analyses indicated that the selection pressure was not random. We assume that the SARS-CoV-2 variants were at an initial stage of evolution rather than at the fixation stage. 33 From these analyses of selection pressure, we conclude that on the cruise ship the virus evolved under strong positive selection, or was possibly undergoing selective sweeps, which could allow beneficial mutations to reach fixation quickly in SARS-CoV-2.
Although RNA recombination in SARS-CoV-2 had been suggested previously, 28 this study provides evidence that RNA recombination occurred de novo in the SARS-CoV-2 genome.
Within three weeks, viral genomes sampled from four infected individuals had gained the same 11083G > T mutation, suggesting that RNA recombination also contributed to the evolution of the virus. The 11083G > T mutation is also present in the UPHL-01 variant. However, it is unknown whether the carrier of UPHL-01 acquired the variant from a cruise ship passenger or whether the mutation appeared independently of the cruise ship variants. Because this mutation was later also detected in variants other than UPHL-01, 36 future studies should investigate whether 11083G > T increases the fitness of its carrier. Other studies have suggested that 11083G > T could be a beneficial mutation linked to asymptomatic infection. 36,37

Publication: Bulletin of the World Health Organization; Type: Research Article ID: BLT.20.255752

Of the 24 mutations we identified, 11 led to amino acid substitutions and two occurred in the stem loop-II motif in the 3'-UTR region. This RNA motif is highly conserved in more than 30 coronaviruses. 29,38 We have also reported unique 29742G > A or 29742G > U substitutions in the stem loop-II motif RNA of SARS-CoV-2 isolates in Australia (Fig. 5), 28 reinforcing the idea that the stem loop-II motif is a mutation hotspot in SARS-CoV-2 rather than a conserved RNA domain. 27

Notes: We generated the phylogenetic tree of the viral sequences from the cruise ship using the neighbour-joining method in MAFFT. 10 Mutations appearing in more than one variant are colour coded. We included UPHL-01 (accession no. EPI_ISL_415539U) because it is the first sample with both 11083G > T and 26326C > T.

Notes: Alignments of viral sequences were generated using MAFFT. 10

Notes: Linkage disequilibrium plot from HaploView, displayed with the confidence-bounds colour scheme. Each box represents a pair of mutations.
The solid spines of strong linkage disequilibrium running from one marker to another along the legs of the triangle in the linkage disequilibrium chart determine the haplotype blocks. 22 We defined strong evidence of recombination as pairs for which the upper confidence bound of the coefficient of linkage disequilibrium was less than 0.9 (white squares).

Notes: Numbers next to each haplotype block are haplotype frequencies. The joining lines represent combined haplotypes. In the crossing areas between haplotype blocks, a value of multi-allelic D' (the normalized coefficient of linkage disequilibrium) is shown to represent the level of recombination between blocks.
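Under the solid-spine rule, a run of markers forms a block when its first and last markers are each in strong linkage disequilibrium with every intermediate marker. The intermediate markers need not be in strong linkage disequilibrium with each other. The Python sketch below applies that rule greedily, taking the longest valid run from each starting marker. It is a simplified reading of the rule described in the text, not HaploView's block-finding code. It assumes markers are in genome order and that pairwise strong-LD calls, such as those made with the confidence-bound thresholds in the Methods, are already available. The helper names are hypothetical.

```python
def strong_ld_matrix(n_markers, strong_pairs):
    """Hypothetical helper: symmetric boolean matrix, True where a pair is in strong LD."""
    mat = [[i == j for j in range(n_markers)] for i in range(n_markers)]
    for i, j in strong_pairs:
        mat[i][j] = mat[j][i] = True
    return mat


def spine_blocks(strong):
    """Greedy haplotype blocks under a 'solid spine' rule (simplified sketch).

    A run of markers i..j is a block when the first marker is in strong LD
    with every later marker in the run and the last marker is in strong LD
    with every earlier one (the two legs of the triangle in the LD chart).
    """
    m = len(strong)
    blocks = []
    i = 0
    while i < m:
        end = i
        for j in range(i + 1, m):
            first_leg = all(strong[i][k] for k in range(i + 1, j + 1))
            last_leg = all(strong[k][j] for k in range(i, j))
            if first_leg and last_leg:
                end = j  # keep the longest run satisfying both legs
        if end > i:
            blocks.append((i, end))
        i = end + 1
    return blocks
```

For example, with five markers where pairs (0, 1), (2, 3), (2, 4) and (3, 4) are in strong linkage disequilibrium, `spine_blocks(strong_ld_matrix(5, {(0, 1), (2, 3), (2, 4), (3, 4)}))` returns `[(0, 1), (2, 4)]`.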