Genomic evolution of SARS-CoV-2 in Reunion Island

Island communities are interesting study sites for microbial evolution during epidemics, as their insular nature reduces the complexity of the population's connectivity. This was particularly true on Reunion Island during the first half of 2021, when international travel was restricted in order to mitigate the risk for SARS-CoV-2 introductions. Concurrently, the SARS-CoV-2 Beta variant became dominant and started to circulate at high levels for several months before being completely replaced by the Delta variant as of October 2021. Here, we explore some of the particularities of SARS-CoV-2 genomic evolution within the insular context of Reunion Island. We show that island isolation allowed the amplification and expansion of unique genetic lineages that remained uncommon across the globe. Islands are therefore potential hotspots for the emergence of new genetic variants, meaning that they will play a key role in the continued evolution and propagation of COVID-19 as the pandemic persists.


Introduction
Oceanic islands have historically been attractive environments for the study of the ecology and evolution of many organisms (Emerson, 2002). Recognized as "natural laboratories", these peculiar ecosystems have stimulated research on colonization, diversification and specialization processes of a wide range of animal and plant species, and recently of human diseases (Jean et al., 2016). Indeed, tropical islands are in the front line for disease emergence related problems; the islands of the Western Indian ocean, Caribbean and Pacific often shoulder this burden, as demonstrated by recent outbreaks associated to vector-borne diseases such as Chikungunya, Dengue and Zika (Tortosa et al., 2012;Roth et al., 2014;Van Bortel et al., 2014).
Identifying the source, as well as understanding the mechanisms underlying transmission dynamics, and predicting the evolutionary trajectory of emerging viruses remains very challenging. Because of their peculiar features, oceanic islands provide relevant conditions for the study of such aspects. For instance, spatial isolation may represent a major barrier to the introduction of infectious agents. The discrete geographical nature of islands allows the identification and quantification of infectious agent migration processes associated with human travel and animal trade; and their small sizes also offer possibilities for rapid and efficient monitoring of epidemics. Oceanic islands could be considered as simplified ecosystems as compared to continental habitats, in which the quantification of the factors involved in diseases transmission may be easier to achieve, and where intervention measures may be more effective on smaller scales.
The COVID-19 pandemic, caused by SARS-CoV-2, has had an unprecedented impact on the global community since its suspected emergence from wildlife populations in China in 2019 (Huang et al., 2020;Lu et al., 2020). According to WHO data to date (July 2022), >500,000,000 cases of COVID-19 have been declared across the globe, with >6,000,000 associated deaths. Over more than two years, patterns of human movement associated with both trade and travel have shaped the progression of the pandemic, with countries introducing movement restrictions at differing scales to try to mitigate the spread of the disease. However, all global communities have now been affected by  This includes many small island communities, which have faced unique challenges dealing with the burden of the COVID-19 pandemic in insular and often geographically isolated contexts, sometimes with limited access to healthcare and medical resources.
In this study, we investigated SARS-CoV-2 genomic evolution in the island of Reunion, a French overseas territory located in the Western Indian Ocean. We hypothesized that population factors characterizing small islands could favour the selection and emergence of particular variants. But contrastingly, this effect may be diluted by continuous population mixing associated with numerous incoming and newly infected travellers. Based on 7423 consensus genome sequences generated between the 1st of January 2021 and 15th April 2022, we investigate key evolutionary features of the epidemic in Reunion, paying particular attention to aspects of the epidemic that were unique (or otherwise uncommon) to this particular island setting.

Sample and data collection
COVID-19 positive nasopharyngeal swabs identified as part of routine diagnostic screening across Reunion Island were provided for genomic analysis. Epidemiological data relating to COVID-19 cases between 1 January 2021 and 15 April 2022 were collected through the national and regional surveillance system (SI-DEP).

Genomic data
Genomic data were generated using the ARTIC amplicon-tiling protocol for the Oxford Nanopore Technologies (ONT) sequencing platform (MinION) (Quick et al., 2017). Briefly, RNA was extracted from nasopharyngeal swabs using the Qiagen RNA Mini extraction kit on the Qiacube pipetting robot. cDNA was generated using the Lunascript random hexamer reverse transcriptase protocol from New England Biolabs. Overlapping PCR products spanning the entire length of the genome were amplified in two independent reactions. The ARTIC v3 standard primer sets until October 2021, and the ARTIC v4 standard primer sets after this (Lambisia et al., 2022). Changing primer sets corrected for mutations in the primer regions that were present in the Delta variant. PCR products were pooled and quantified using the DNA HS assay kit on a Qubit 4.0 (ThermoFisher). Native barcodes were attached to PCR products via ligation using the TA blunt DNA ligase before samples were pooled and purified using AmpPure XT magnetic beads (Beckman). The nanopore sequencing adapter molecule was attached to the purified barcoded DNA using the Quick T4 DNA ligase kit from New England Biolabs before being purified for a second time using AmpPure XT magnetic beads. Sequencing of DNA libraries was performed on the MinION Mk1c device using R9.4.1 flowcells from Oxford Nanopore. Sequencing times varied depending on the number of samples included in each library, but overall, sufficient data were generated to provide a median read depth of 1484×. Basecalling and demultiplexing were performed in high accuracy mode and specifying the "-require-both-ends" parameter using the most recent version of the Guppy software from Oxford Nanopore at the time of sequencing. Genomic assemblies were performed using the latest version of the Medaka assembly software at the time of sequencing. Genomes with >5% missing data (1495 of 29,903 bases), or a median read depth lower than 300× were excluded from genetic analyses.

Lineage assignment and phylogenetic analyses
Lineage assignments were performed using PANGOLIN (Phylogenetic Assignment of Named Global Outbreak LINeages) v3.1.2 (O'Toole et al., 2021), and the NextClade CLI v1.7.0 (Aksamentov et al., 2021). The output from NextClade was used for sequence alignment and to identify the presence of specific substitutions, deletions, and insertions for each genome.

Epidemiological context
The epidemiological aspects of the COVID-19 in Reunion Island have been partially addressed elsewhere (Mercier et al., 2022). Briefly, the number of observed cases of COVID-19 rose in Reunion Island at the beginning of 2021, following the Christmas-period holidays of 2020. This rise in cases was associated with the introduction of the Beta variant, which became the main circulating variant on the island by February 2021 (week 7) of 2021. Case incidence then remained stable for approximately 20 weeks at around 100 cases per 100,000 inhabitants until the introduction of the Delta variant, which resulted in a sharp rise in incidence prompting the rapid implementation of stricter containment measures. Curfew and social-distancing measures resulted in a steady decrease in case incidence, which was accompanied by an increase in the proportion of Delta-variant associated cases. By the end of September 2021 (week 39), the Delta variant had completely displaced the Beta variant as the main circulating lineage on the island (Fig. 1). The first imported case of Omicron was observed in November (week 46), and autochthonous circulation of Omicron likely started around week 48 of 2021. The circulation of Omicron was accompanied by a sharp rise in the number of reported cases, which peaked at approximately 3500 cases per 100,000 individuals in week three of 2022. By the end of the study period (week 15 of 2022) Omicron lineage BA.2 had entirely replaced Omicron lineage BA.1 and was essentially the only variant circulating on the island. (See Fig. 1).

Lineage replacement and sequential waves of variant circulation
A total of 7423 genomes with >95% coverage at a median read depth of 1484× (IQR 962-2003) were included in this study. Genomes originated from a pseudo-random selection of individuals that had undergone testing for COVID-19 infection in private laboratories across Reunion Island between the 1st of January 2021 and 15th April 2022. Weekly sequencing activity generated genomic data for a median of 5% (IQR 3.1% -10.7%) of all declared COVID cases in Reunion.
Based on pangolin classification data, the major trend in lineage evolution was one of complete, or near-complete selective sweeps that led to lineage replacement on at least four separate occasions (Fig. 1). The first of these transitions has been documented elsewhere (Mercier Fig. 1. Summary of the epidemiological and genomic classification data from Reunion 2021-2022. Panels display (from top to bottom): 1) Weekly numbers of reported cases, and the ratio between imported and autochthonous cases (black line), 2) Weekly proportions of genomes attributed to different variants of concern, 3-7) Weekly numbers of sequenced genomes attributed to different PANGO lineages, data are displayed separately for each major variant of concern. et al., 2022); Briefly, non-variant-of-concern (VOC) lineages were replaced by Beta subvariants B.1.351 and B.1.351.2 around week 7 of 2021. Prior to this time, high proportions of imported cases were reported ( Fig. 1) that were associated with circulating European lineages. For example, subvariants of lineage B.1.177 were common, this lineage is thought to have originated in Spain and circulated predominantly around Europe towards the end of 2020 .
Among the non-VOC lineages identified in weeks 1-15 of 2021, we identified a strain that showed significant dissimilarity to all available sequences in the GISAID database. As a result, lineage B.1.622 was assigned to the pango-lineages database based on data from 71 observed cases. We detected B.1.622 in Reunion between weeks 03 and 15 of 2021, and its relative detected prevalence was highest in week 06, when B.1.622 sequences represented 13% of all sequenced genomes. B.1.622 isolates were highly divergent from the original sequence obtained from Wuhan in 2019 (Fig. 3), carrying an average of 24 synonymous and nonsynonymous mutations relative to the reference (Table 1). This level of divergence is similar, for example, to that observed for VOC Beta Fig. 3. Phylogenetic analyses of SAR-CoV-2 genomes from Reunion 2021-2022. a) Timetree phylogeny, leaves are coloured by variant of concern, shaded regions represent 6-month intervals. variants at the same time. The GISAID online tool CoVizu e was used to identify similar lineages from the global isolate database. Lineage B.1.622 was predicted to share a common ancestry with two other Pangolin lineages; B.1.474 which showed a widespread distribution across Europe, and B.1.629, a relatively uncommon lineage which was identified both in continental Africa and across Europe (Fig. 2). All three lineages exhibited significant genetic dissimilarities to other known strains and each other, appearing at the ends of long branches from the B.1 parental root position of the global time-scaled tree. Interestingly, a single sequence of B.1.622 was also isolated in France, falling at a basal position close to the root node of the B.1.622 clade in phylogenetic analyses (Fig. 2).
B.1.622 carried mutations characteristic of other major lineages from 2020 such as the D614G spike mutation that increases SARS-CoV-2 fitness (Plante et al., 2020). In addition, B.1.622 was characterised by a specific genetic profile combining: five mutations in the polymerase (ORF1a:S428N, T2196N, M3621I and L3977V; ORF1b: S2312L); two mutations in the Nucleocapsid gene, Q70R, which is located in the SARS-CoV-2 specific T cell epitope (Nelde et al., 2020), and S187L in the linker domain; one mutation in ORF9b, K67E. Moreover, there was a high density of mutations in the ORF3a gene including Q57H which was associated with a wave of infection in China and has been shown not to have a significant effect on SARS-CoV-2 fitness (Chu et al., 2021), as well as T14I and S216P which have both been described to alter the structure and function of the ORF3a-encoded protein (Azad and Khan, 2021). The full list of amino acid substitutions is shown in Table 2. Beta subvariants B.1.351 and B.1.351.2 co-circulated in Reunion in relatively stable proportions for nearly 35 weeks before being displaced by Delta in the second major strain transition that occurred around week 29 of 2021. Interestingly, however, the majority of Delta strains in circulation could be attributed to one of two pangolin lineages, AY.40 (974 of 3098 isolates) and AY.43 (1092 of 3098 isolates). Arrival of AY.40 around week 27 resulted in a large peak in the number of infections at week 31 (weekly incidence of infection of 350 per 100,000). AY.40 infections dropped to much lower levels by week 41, and AY.40 was itself replaced by AY.43. Case numbers of predominantly AY.43 infections then increased steadily from weeks 41 to 51 (Fig. 1).
The third major lineage replacement event occurred between weeks 52 of 2021 and 02 of 2022, when circulation of VOC Omicron BA.1 began during a period of time when AY.43 (Delta) infections were increasing. This likely had a synergistic effect on infection rates, resulting in a massive increase in case incidence to above 3500 per 100,000 individuals in week 03 of 2022.
Omicron BA.1 remained the main detected circulating variant for only 8 weeks. From week 09 of 2022 Omicron BA.2 became the majority variant and had almost fully replaced Omicron BA.1 by week 12. It should be noted that due to the extremely elevated case numbers, only a small fraction of cases was sequenced during 2022, and that case travel history is absent for the majority of this period due to the concurrent saturation of the health system.

Different dynamics and genetic characteristics of VOCs
Overall dynamics were similar for VOCs Beta, Delta and Omicron, with each variant successfully replacing its predecessor rapidly after the establishment of autochthonous circulation on the island. However, differences in the relative selective advantages between competing lineages resulted in different durations of the periods of lineage replacement. Beta became the majority variant (>50% of sequenced genomes) after seven weeks of circulation, compared to five weeks for Omicron and two weeks for Delta (Fig. 4).
Genetic diversity profiles of each VOC were highly contrasting. Beta lineages showed low diversity after importation, with overall strain diversity increasing almost linearly over the entire duration of its circulation in Reunion (Fig. 4). In contrast, high strain heterogeneity of Delta variants was observed rapidly after its arrival, both in terms of the number of predefined pangolin lineages detected (Fig. 1) and overall genetic diversity (Fig. 4). Interestingly, the overall genetic diversity of Omicron variants was also observed to increase linearly over time but remained lower than that of both Beta and Delta variants (Fig. 4) despite a large number of predefined Pangolin lineages being detected (Fig. 1), reflecting a likely high rate of importation. This may be indicative of negative selection limiting Omicron divergence.
We examined the amino acid substitutions of all VOCs and lineages that had demonstrated significant selective advantage in Reunion between 2021 and 2022. All substitutions that were conserved within VOCs are shown in Table 2, whereas all substitutions that distinguish between locally important subvariants of VOCs are shown in Table 3. Interestingly, there were several observations of the same substitution being acquired independently by different lineages, potentially indicating positive selection for these mutations. Local lineage B.1.622 shared ORF3a:Q57H with Beta variants B.1.317 and B.1.317.2. Beta variants shared spike mutation S:N501Y all Omicron variants, and spike mutation S:K417N was also shared between Beta variants and Omicron BA.2 subvariants. Both of these mutations have been linked to immune escape mechanisms (Harvey et al., 2021), and S:K417N was additionally observed in some Delta subvariants belonging to pangolin lineage AY.4.2 Interestingly, strain AY.43 (Delta) which replaced similar variant AY.40, shared the ORF1a:T3255I with all Omicron variants as well as spike gene mutation S:G142D with Omicron subvariant BA.2. However, the S:G142D mutation has been identified as a potential sequencing artefact resulting from the use of the ARTIC V4 primer sets (Davis et al., 2021).
As an overseas department of France, Reunion Island shares a strong link with continental France, and is also a travel hub between Europe, Africa, and Asia. As such, one of the key questions surrounding the epidemiology of SARS-CoV-2 in Reunion Island is why the Alpha variant did not become dominant during 2021, when it was the majority variant in continental France and in Europe. We observed multiple importation events of Alpha variants to Reunion (Figs. 1 and 3), but the proportion of Alpha cases was never observed to increase significantly (Fig. 4). Studies in other island communities have reported importation bottlenecks that result in the delayed entry of otherwise common variants. This, for example, resulted in unique variant profiles in the Canary Islands despite strong ties with continental Spain (Ciuffreda et al., 2022) and the island of Ilhabela despite strong ties with continental Sao Paolo (Brazil) (Viala et al., 2022). It may therefore be that Alpha was not imported sufficiently frequently for it to become a dominant variant, or it may be that imported Alpha variants did not possess a sufficient selective advantage to outcompete others in a population where Beta variant circulation was already dominant.
It is not known if the Beta variant possesses selective advantages that may account for it becoming the dominant lineage on Reunion, although it has been suggested that Beta variants exhibit partial immune evasion meaning that they could have a selective advantage in populations where cumulative incidence exceeds 20% . As the cumulative number of cases reported in Reunion at the beginning of 2021 represented only 1% of the total population, this is unlikely to explain the prevalence of Beta. The linear increase in genetic diversity of Beta variants over time and associated tree topology (Fig. 3) suggest continued genetic drift among local transmission chains in a large naive population, resulting in local strain diversification. Simultaneously, low numbers of imported cases were observed during the main period of Beta circulation, due to movement restrictions and confinement measures (Fig. 1).
Of particular interest, the most frequently detected Beta subvariant in Reunion Island, pangolin lineage B.1.351.2, has never been reported to circulate significantly outside of the Indian Ocean region, but was frequently reported in Reunion Island, Mayotte and Madagascar over the same period. Local population movements thus appear to have generated an insular community which allowed this otherwise rarelyobserved lineage to circulate and diversify on a large scale.
The transmission advantage of the Delta variant is largely explained by its ability to replicate faster compared to many other VOCs (Hart et al., 2022). However, the genetic diversity of Delta observed in Reunion Island (Fig. 4) as well as relative increases in the proportion of imported cases surrounding the period of Delta circulation (Fig. 1) suggests a very different dynamic when compared to Beta circulation. Numerous subvariants of Delta were likely imported very frequently, and a resulting two-phase selective sweep was observed with AY.40 and AY.43 becoming dominant variants. The observed dynamic is, again, different for Omicron; which was imported on multiple occasions, has a selective advantage explained by increased replication rates, immune evasion and increased reinfection rates (Hart et al., 2022), circulated in high levels but still retained a low level of genetic diversity compared to other variants.

Substitution rate heterogeneity
We calculated the substitution rate of SARS-CoV-2 from our data (Fig. 5). A strong linear correlation was observed between the root-to-tip distance and date of infection, providing strong evidence for a strict molecular clock. Similar to other studies, we estimated an overall substitution rate of 0.00128 substitutions/site/year (95% confidence Table 2 Mutations conserved in >90% of genomes are shaded in green, mutations conserved in >70% are shaded in yellow. OMICRON  Substitution B.1.622 B.1.351 B.1.351.2  However, calculated substitution rates for individual VOCs also showed strong linear correlations but ranged from >2 times slower for Beta and Omicron to more than ten times slower for Alpha and Delta when compared to the overall substitution rate (Table 4). Clear stepwise accumulations of mutations accompanied each lineage replacement event (Fig. 5), leading to stepwise jumps in the number of substitutions fixed within the population over time, and suggesting that selective sweeps are the strongest driver of SARS-CoV-2 evolution.  Substitution rate estimates for SARS-CoV-2 genomes from Reunion 2021-2022. a) Overall linear correlation between root-to-tip distance in phylogenetic analyses and collection date for all SARS-CoV-2 genomes. b) Individual correlations between root-to-tip distance and collection date for all variants of concern. c) Average substitutions per site per year calculated for each major variant of concern, error bars represent the 95% confidence intervals of the linear regression for each correlation.

Conclusion
Here, we report a detailed description of the genomic evolution of SARS-CoV-2 on the island community of Reunion based on genomic surveillance data collected between January 2021 and April 2022. While many of the observations we have made are in line with the global evolution of the COVID pandemic, we have seen some interesting aspects that are unique to the insular epidemiology of Reunion Island. Most notably, the dominance of VOC Beta subvariants B.1.351 and B.1.351.2 between weeks 07 and 39 of 2021. We also noted that the interplay between importation rates, epidemic dynamics and substitution rates were all highly contrasted between different VOCs. Overall, the most clear characteristic of SARS-CoV-2 epidemiology is one that is shared globally -that of sequential waves of dominant variants that replace their predecessors.
In common with other island communities, we observed that bottlenecks in strain importation followed by local dissemination resulted in contrasted variant profiles when compared to related continental settings, as well as the amplification and diversification of locally circulating lineages. Within our dataset, the most obvious examples of this came from lineages B.1.622, and Beta variants B.1.351 and B.1.351.2. These observations suggest that insular communities may be important in the emergence of new viral lineages, and it will be important to continue monitoring the genomic evolution of SARS-CoV-2 across the globe.

Funding
This work was also supported by the "Enhancing Whole Genome Sequencing (WGS) and/or Reverse Transcription Polymerase Chain Reaction (RT-PCR) national infrastructures and capacities to respond to the COVID-19 pandemic in the European Union and European Economic Area" Grant Agreement ECDC/HERA/2021/007 ECD. 12221.

Ethics statement
Biological materials were stored and treated using standard diagnostic procedures. No personal data was used in this analysis and samples were anonymized, inhibiting any correlations of samples with the respective patients. Thus, according to national regulations and the institutional rules for Good Scientific Practice, the requirement for submission to an ethical committee and for obtaining patients' informed consent was waived. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Declaration of Competing Interest
The authors declare no conflict of interest.

Data availability
All genomic data generated as part of this study are freely available through the GISAID database.