Genomic surveillance of SARS-CoV-2 in Puerto Rico reveals emergence of an autochthonous lineage and early detection of variants

Puerto Rico has experienced the full impact of the COVID-19 pandemic. Since SARS-CoV-2, the virus that causes COVID-19, was first detected on the island in March of 2020, it spread rapidly though the island’s population and became a critical threat to public health. We conducted a genomic surveillance study through a partnership with health agencies and academic institutions to understand the emergence and molecular epidemiology of the virus on the island. We sampled COVID-19 cases monthly over 19 months and sequenced a total of 753 SARS-CoV-2 genomes between March 2020 and September 2021 to reconstruct the local epidemic in a regional context using phylogenetic inference. Our analyses revealed that multiple importation events propelled the emergence and spread of the virus throughout the study period, including the introduction and spread of most SARS-CoV-2 variants detected world-wide. Lineage turnover cycles through various phases of the local epidemic were observed, where the predominant lineage was replaced by the next competing lineage or variant after approximately 4 months of circulation locally. We also identified the emergence of lineage B.1.588, an autochthonous lineage that predominated circulation in Puerto Rico from September to December 2020 and subsequently spread to the United States. The results of this collaborative approach highlight the importance of timely collection and analysis of SARS-CoV-2 genomic surveillance data to inform public health responses.


Introduction
The current coronavirus disease 2019  pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was initially declared a Public Health Emergency of International Concern in January 2020 1,2 . Despite global efforts to interrupt transmission chains with quarantine, isolation and travel restrictions at the onset of the pandemic, SARS-CoV-2 spread rapidly across the globe, creating a global pandemic and threat to human health worldwide. By November 15th, 2021, the World Health Organization (WHO) reported 253 million con rmed COVID-19 cases in 222 countries and over 5 million deaths 3,4 . SARS-CoV-2 reached all 50 states of the United States and associated territories, including Puerto Rico, by March 2020, after multiple introductions by travelers with infection 5,6,7 . The rapid spread across the United States was primarily propelled by interstate transmission chains and air travel to the associated territories 6, 8, 9 . SARS-CoV-2 is an enveloped virus with a single-stranded positive-sense RNA genome of approximately 30,000 base pairs. During replication, a virus-encoded exonuclease provides a proof-reading activity that contributes to the observed low mutation rate and stable genome 10,11 . Nevertheless, the unprecedented spread of SARS-CoV-2 globally and the wealth of genomic sequence data available through the international initiative for genomic studies and surveillance has facilitated phylodynamic approaches to infer viral evolutionary rate, growth rate, and estimated time of origin for speci c outbreaks 11 . Studies have revealed that the viral genome has been accumulating mutations of concern, especially in the spike protein region, which confer phenotypes with increased tness and pathogenicity 12,13,14 . Increased infectivity, resistance to monoclonal antibody therapy and evasion of the immune response were among the most frequently observed phenotypes attributed to WHO-monitored variants; these phenotypes often dominated transmission and replacement of other lineages upon emergence 15,16,17,18 . The Variant Being Monitored (VBM) B.1.1.7 (Alpha) was rst identi ed in the United States in late December 2020 and was then characterized by a considerable increase in COVID-19 incidence associated with increased infectivity and occasionally more severe disease manifestations that increased hospitalization rates 19,20 . Alpha became the dominant variant, especially in Europe and the United States, until the emergence of Variant of Concern (VOC) B.1.617.2/AY.x (Delta), rst identi ed in the United States in May 2021, which developed into a prominent variant with an apparent higher virulence and pathogenic phenotype 21,22,23 . Because of the potential for increased transmissibility, morbidity mortality, and decreased e cacy of vaccines and other intervention strategies, monitoring the spread of variants (VBMs and VOCs) rapidly became a public health concern and priority 24, 25 .
Puerto Rico, an unincorporated territory of the United States, is a densely populated island and a popular tourist destination located in the Caribbean basin. SARS-CoV-2 was rst identi ed in Puerto Rico on March 13th, 2020, in two European travelers who arrived on a cruise ship and in one local resident who had close contact with family members with recent travel history. Additional travel-related and local cases were con rmed within the following weeks 26 . In response to the emerging threat, the government of Puerto Rico executed the most restrictive (compared to the United States) national stay-at-home order on March 15th, 2020, to mitigate transmission while preparing the public health infrastructure for the imminent impact 27, 28 . Travel restrictions imposed by the United States during the initial pandemic minimized international tra c to Puerto Rico, although domestic travel from the United States continued. Puerto Rico represents a unique epidemiologic setting in a geographically isolated location (an island), but with a regular in ux of travelers mostly from the United States. This is an ideal setting to monitor introduction and spread of SARS-CoV-2 variants and answer questions to help inform SARS-CoV-2 spread and disease prevention strategies. Puerto Rico's public health response incorporated the increase of laboratory capacity extensive molecular surveillance, which presented a unique opportunity to study the impact of SARS-CoV-2 variant turnover, local dissemination, and evolution during a period of changing epidemiology and public health responses.
In response to the impending local epidemic, we established a partnership with the local health authorities and academia to conduct a genomic surveillance initiative to sample complete genomes of SARS-CoV-2 across the island through time, monitor lineage circulation and understand the genomic epidemiology of the COVID-19 pandemic in Puerto Rico. This report presents the results from 19 months of genomic surveillance and phylogenetic analyses, which identi ed multiple introduction events that propelled the rapid expansion and persistent transmission of the virus on the island and lead to the establishment of an autochthonous lineage between August 2020 and January 2021.

Local epidemic and variant detection
During March-July 2020, the number of con rmed COVID-19 cases remained low, associated with the strict stay-at-home order. When the order was lifted in peaks in December 2020, April 2021, and August 2021. By September 30th, 2021, the Puerto Rico Department of Health (PRDH) and the Centers for Disease Control and Prevention (CDC) reported over 181,599 con rmed cases 26 .
We conducted genomic surveillance for 19 months since March 2020, where, each month, we sequenced SARS-CoV-2 positive specimens from recently symptomatic and asymptomatic patients residing in 63 of the 78 municipalities, covering all 7 health regions of the island. The frequency of lineages detected was calculated periodically from all viral genomic sequences from Puerto Rico published in GISAID, including our sequence datasets, and reported to the PRDH to inform case investigations and surveillance ( Figure 1B). Our data comprises the genomes from the initial SARS-CoV-2 con rmed infections detected in March 2020, which included 3 European tourists that arrived on the island in a cruise ship and 8 residents with no recent travel history declared. PANGO  Figures 1A and 1B). During this period, most COVID-19 cases in Puerto Rico were caused by Delta and approximately 18 Delta sub-lineages were detected in the island, with AY.3 as the most frequently sampled sub-lineage ( Figure 1C). By September 30th, 2021, a steep decrease in con rmed cases was observed, a point in which more than 77% of the population had received at least one dose of the vaccine ( Figure 1A).

Phylogenetic reconstruction of the local pandemic
This study generated 753 complete genomes from viruses sampled between March 2020 and September 2021. Our dataset was combined with 2,611 publicly available genomes in GISAID to understand the emergence and spread of the viruses circulating in Puerto Rico in a global context. We reconstructed the local and regional epidemic using a time-calibrated phylogenetic tree inferred with maximum likelihood (Figure 2). This global phylogenetic analysis estimated that the initial SARS-CoV-2 introductions occurred between February 19 and March 16, 2020. Most viral genomes from Puerto Rico descend or are closely related to genomes from the United States. However, we were unable to determine the precise origin at the state level due to the limited sampling during the emergence period and subsequent low circulation. The resulting tree topology inferred viral sequences from Puerto Rico scattered across the global tree, smaller shortlived monophyletic clusters, and larger monophyletic clusters that suggest sustained transmission of a particular genotype. Our analysis also showed the emergence and spread of the SARS-CoV-2 variants detected in Puerto Rico. Multiple monophyletic clusters of Puerto Rican sequences were inferred within the clades formed by each emergent variant and the size of the clades is proportional to the frequency of genomes sampled in the island ( Figures 1B and 2). The Emergence of SARS-CoV-2 variants VBM Alpha was rst detected in Puerto Rico in January 2021, co-circulating with local predominant lineage B.1.588 and other B lineages at a lower rate ( Figure 1B). Notably, this VBM replaced the well-established autochthonous lineage B.1.588. The emergence and epidemiology of Alpha in Puerto Rico resembled the patterns observed in the United States, with rapid spread and a sharp increase in con rmed cases 19,20 . To understand the emergence and spread of Alpha in Puerto Rico, we inferred a maximum likelihood phylogenetic sub-tree with all Alpha genomes obtained in our dataset in addition to a subset of other Alpha genomes from Puerto Rico, the United States, and a regional context backdrop (Figure 4). The resulting inference estimated that the emergence of Alpha in Puerto Rico may have occurred between November 6th, 2020, and December 31st, 2020. Tree topology showed multiple monophyletic clusters of Puerto Rican sequences diverging across a period of 4-5 months of circulation. The larger clusters of Puerto Rican sequences suggest that local transmission of speci c Alpha genotypes was sustained, succeeding after in multiple introduction events. Most of these clusters were associated with sequences from the United States, suggesting that multiple introductions occurred over a period of 5-6 months, propelling the local transmission of this variant. We also found a subset of Puerto Rican sequences associated with sequences from the Caribbean and the Americas but low node support impaired resolution of transmission patterns.
VOC Delta was rst detected in Puerto Rico in June 2021, during a period when SARS-CoV-2 transmission was declining, and the vaccination campaign was progressing rapidly (Figures 1A and 1B). After its initial detection, Delta spread rapidly across the island ( Figure 1B). Over 30% of the COVID-19 cases sampled and sequenced in June 2021 were caused by Delta. This variant has been characterized broadly as the most dominant emerging variant, replacing most lineages, and causing most of COVID-19 cases in the United States and Puerto Rico from its emergence through November 2021. To understand the rapid emergence and spread of Delta in the island, we reconstructed a maximum likelihood phylogenetic sub-tree with all Delta Puerto Rican sequences obtained in our dataset supplemented with additional sequences from Puerto Rico and the United States retrieved from GISAID with collection dates between May 1st, 2021, and September 31st, 2021. According to our phylogenetic inference, the emergence of Delta in Puerto Rico may have occurred between April 15th and June 14th, 2021, potentially after one or multiple introductions. The precise origin of the introductions was challenging to resolve, considering that multiple sequences from Mexico, the United States, and the Caribbean cluster among the early sampled sequences from Puerto Rico with low node support ( Figure 5).
The rst Delta lineage to be detected was B.1.617.2, which seems directly related to a small number of VBM B.1.617.1 (Kappa) that clustered basal to the subtree. Tree topology is similar to the patterns observed in the Alpha sub-tree, where more than 17 distinct clusters with sequences from Puerto Rico were observed diverging across 4 months of circulation in the island. Most of these clusters were associated with distinct Delta sub-lineages and seem closely related to similar sequences from the United States and the Caribbean. These clustering patterns and the diversity of Delta sub-lineages detected suggest that multiple introductions throughout 5-6 months propelled the emergence and transmission of this variant in the island.

Discussion
Despite initial containment of COVID-19 cases in the spring of 2020, SARS-CoV-2 subsequently spread rapidly across the island propelled by multiple introductions and extensive intra-island transmission. Our partnership responded rapidly with the launch of this collaborative study aimed at enhancing the local capacity for genomic surveillance, understanding the development of the epidemic, and tracking viral lineages of public health concern. Our efforts also led to the early identi cation and tracking of most of the SARS-CoV-2 variants of lineages circulating in the island; information that was reported monthly to the Puerto Rico Department of Health (PRDH) to inform case investigation and epidemic response. As a result, our study contributed more than 75% of the genomes from the initial phase of the local epidemic through December 2020, and overall, to over 23% of the genomes sampled in Puerto Rico published in GISAID by October 31st, 2021.
Our systematic genomic surveillance and phylogenetic analyses suggest that the local epidemic was initiated and frequently boosted by travel-associated cases. This hypothesis is supported by the diversity of B.x lineages detected in the island during a period when the local population movement was restricted by stay-at-home orders and other response measures enforced by the local government 27, 28 . In addition, the identi cation of multiple clusters of Puerto Rican sequences phylogenetically related to viruses found circulating primarily in the United States further supports this assessment. The increase in COVID-19 cases observed in the summer of 2020 coincided with lifting the stay-at-home order and increased tra c of tourists during a period of high SARS-CoV-2 transmission in the United States. Travel restrictions between countries with high incidence and the United States affected Puerto Rico by interrupting international travel but potentially increasing the volumes of domestic travel to and from the island. Consequently, we did not observe frequent clustering of Puerto Rican sequences with sequences from regions other than the Americas. The frequent mixing of viruses between Puerto Rico and the United States, the lineage diversity circulating in the island, and density of sampling at the location of origin generated phylogenetic uncertainty, limiting the capacity to resolve the precise origin of some local clusters and speci c transmission chains 29 .
Upon the emergence and rapid spread of SARS-CoV-2 during the rst 10 months of the epidemic (March 2020-January 2021), we observed a shift in phylogenetic clustering patterns that most likely resulted from in situ viral evolution and adaptation to the local population or environment. This pattern was observed with the divergence of lineage B.1.588 from the persistent local transmission of B.1 lineage, possibly due to founder effect. The xation of two nonconservative mutations in the spike and nucleocapsid proteins seem to have improved tness for local circulation. Similar clustering patterns resembling adaptation, as well as the divergence of local lineages, have been observed in other regions of the world after a period of sustained local transmission leading to in situ evolution, including in the United Kingdom after the re-opening of a national lockdown 30,31,32,33 . Interestingly, B.1.588 declined during a period of peak transmission of SARS-CoV-2 and increased travel in which Alpha was potentially introduced. These observations indicate the limited effectiveness of the initial efforts to prevent SARS-CoV-2 introductions and spread. The subsequent decline of the rst epidemic peak could potentially be attributed to the combination of the local government response measures and the launch of the vaccination campaign.
The predominance of Alpha and the subsequent second epidemic peak could suggest that this variant presented a more virulent phenotype with a higher infectivity rate that outcompeted the autochthonous B.1.588 and other lineages in January 2021. This is consistent with the recurring observation that, once VBM/VOC emerged in the island, the frequency of other B.1x lineages was reduced substantially from that point forward. It is possible that this dynamic may have contributed to increased transmission during the initial phase of vaccination before the local population reached a 50% vaccination rate. Our ndings are concordant with the epidemiological scenario in the United States.
As the vaccination campaign progressed during the spring of 2021, the number of cases decreased substantially, and the local government reduced most restrictions to public gatherings and indoor activities 27 . Puerto Rico then experienced another wave of increased travel during the summer of 2021 coinciding with the emergence of Delta. Though this variant was rst detected in June 2021, our analyses estimated that the variant could have been introduced during the second epidemic peak in April 2021. If so, Delta could have faced a complex scenario with Alpha predominating circulation and almost half of the population vaccinated against the virus. However, the transmission of Delta displaced most of the circulating lineages and led to the third epidemic peak. Two different scenarios could be proposed for the emergence of Delta in Puerto Rico. First, Delta could have been introduced when Alpha dominated transmission, the two variants co-circulated, but Delta presented the phenotype that could outcompete Alpha. Key mutation patterns in Delta have been proposed to confer an advantage over other lineages 34 . Alternatively, the transmission of Alpha could have been already declining when Delta was introduced and emerged in a susceptible population. This second scenario proposed a gap between the dominance of Alpha and Delta. Recent reports from Madrid suggest that virus competition upon the emergence of Delta was not the exclusive factor driving the decline of Alpha but also a period of declined Alpha transmission facilitating the emergence of Delta in the region 35 . Though it is possible to speculate that Delta's phenotype exhibited some resistance to the vaccine, the decline of the third epidemic peak after approximately 4 months of Delta circulation, coincides with the population reaching a 77% vaccination rate with at least one dose.
The pattern of lineage turn-over observed in this study, where the predominant variants circulated for approximately 4 months and then were replaced by another variant in Puerto Rico, should be further compared with transmission in larger countries with larger populations and human movement. Similar patterns have been observed in the United States, where variant emergence and spread potentially affect Puerto Rico 50 . Travel restrictions could have blocked the introduction of additional variants directly to Puerto Rico through international travel. However, our results suggest that lineage turn-over was in part driven by domestic travel with the United States. We also speculate that these cycles could be related to the limitations of an island geography, the rapid response of the local government with restrictive measures to controls the spread of the virus and a vaccination campaign that reached over 84% of the population by December 1st, 2021 26 .
Although our study sampled viral genomes from various cases and periods during the local pandemic, the universal ARTIC v3 tiled PCR-amplicon NGS work ow's sensitivity used in this study is limited to specimens with PCR Ct values below 28. Thus, we question if there are viral genetic differences from infections with lower viral loads or inconsistent variant-calling from samples with lower viral loads that might affect the accuracy of lineage assignment 36, 37,38 . The sampling for this study was also limited due to the ability of our partnership to procure samples from every municipality of the island, especially during the rst year of the local epidemic. Future national genomic surveillance programs could bene t from improved systematic sampling engaging with clinical laboratories to ensure timely reporting of results to the local public health authorities, proper sample storage, and transfer to sequencing laboratories. Regarding phylogenetic analyses, the slow mutation rate and the low genetic diversity of the virus frequently impair the resolution of internal nodes with statistical support affecting the interpretation of phylogenetic histories and geographic origins. These analyses could also be affected by selecting context genomes from the unprecedented abundance of genomic data published in GISAID.
This study provides an overview of the COVID-19 epidemic in Puerto Rico during March 2020-November 2021 from the molecular epidemiology perspective.
The documentation of an autochthonous lineage and dynamics of virus movement between the United States and Puerto Rico is important to inform prevention and surveillance efforts in both regions. Our phylogenetic study offers the genomic framework to understand the genomic changes occurring through time in the Puerto Rican population and elucidate the mutational landscape within this region. Furthermore, our ongoing genomic surveillance initiative will facilitate the study of SARS-CoV-2 intra-island phylodynamics and compare pre-and post-vaccination populations. Finally, this report highlights the importance of government and academic partnerships to respond to public health threats and the potential of systematic genomic surveillance to improve disease prevention and control.

DISCLAIMER
The ndings and conclusions in this report are those of the author(s) and do not necessarily represent the o cial position of the Centers for Disease Control and Prevention or the National Institute of Health.
(ThermoFisher), and tiling PCR amplicons were generated using Q5® high-delity DNA polymerase (New England Biolabs) and the ARTIC nCoV-2019 V3 primer scheme purchased from Integrated DNA Technologies (https://github.com/artic-network/artic-ncov2019/blob/master/primer_scheme/nCoV-2019/V3/nCov-2019.tsv). Candidate samples for sequencing presented clearly visible bands of target size (approximately 400bp) in DNA gel electrophoresis for both primer pools. PCR products were puri ed with AMPure XP magnetic beads (Beckman Coulter) and quanti ed using Qubit 4.0 uorometer (ThermoFisher). DNA libraries were generated using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs) reducing all reagents volumes to 25% from the manufacturer recommended protocol to increase throughput. The resulting products were screened for size and quality using the Bioanalyzer 2100 instrument (Agilent Technologies) and quanti ed with Qubit 4 uorometer (ThermoFisher). Passing libraries were pooled and run in the MiSeq sequencer instrument (Illumina) using the MiSeq Reagent Kit v3 in 600-cycle program.
The resulting sequence reads were screened for quality, trimmed and assembled into complete consensus SARS-CoV-2 genomes using the Genome Detective Virus Tool v1.136 42 (https://www.genomedetective.com) and assembly con rmed with iVar 43 . The Pangolin COVID-19 Lineage Assigner tool was used for lineage assignment 41 (https://pangolin.cog-uk.io). A total of 753 samples were sequenced with more than 95% genome coverage at a minimum of 10x sequence depth. All sequence data obtained for this study was submitted to GISAID, accession numbers available in Supplementary Table 1.

Phylogenetic analysis
Our Puerto Rico SARS-CoV-2 genomes dataset was analyzed against a diverse panel of genomes from across the world which provide regional phylogenetic context. Initially, we downloaded the Genomic Epidemiology metadata package for all entries from GISAID on August 18 th , 2021 to screen genomes for subsampling. However, due to the large number of genomes available in GISAID, we downloaded and combined the following pre-sampled datasets for regional studies: NextRegion-North America, NextRegion-South America, and NextRegion-Global. We then used the standard ncov augur/auspice multiple input work ow available in the Nextstrain platform 44 (https://github.com/nextstrain/ncov) to subsample contextual genomes and phylogenetic inference with timestamped trees. The custom subsampling scheme program selected 2,611 contextual genomes from the United States, North America, the Caribbean, Central America, South America, Africa, Europe, Asia, and Oceania, with higher proportions on The Americas based on collection dates and genetic proximity to our Puerto Rico dataset. The combined dataset of 3,364 genomes was aligned using MAFFT 45  bootstrap replicates using IQ-TREE v1.6.12 46 . The resulting tree topology and node support were compared to Bayesian maximum clade credibility (MCC) tree reconstruction using BEAST v1.10.4 49 . Brie y, we used time-calibrated genomes with sample collection dates and the nucleotide substitution model parametrized using Yang96 model under strict molecular clock, and Bayesian Skyline coalescent model. Markov chains were run for a total of 100 million steps with sampling every 10,000 steps in the chain. Run results were evaluated in Tracer (http://tree.bio.ed.ac.uk/software/tracer/) to ensure stationary parameters with statistical errors re ected in 95% highest probability density ranges with effective sample size (ESS) higher than 200 for each tree prior. MCC trees were generated in TreeAnnotator from BEAST package after discarding 10% of runs as burn-in.   (n=1,360 genomes). Sub-lineages with more than 5% detection are represented by individual color, whereas sub-lineages with less than 5% detection are categorized as a collective labeled "Other".

Figure 2
Phylogenetic reconstruction of local SARS-CoV-2 epidemic in Puerto Rico in a global context. Maximum likelihood tree inferred with 3,364 complete genomes including 753 viral genomes from Puerto Rico sampled between March 23 rd , 2020 and September 30 th , 2021 (red branches) combined with 2,611 complete genomes retrieved from GISAID during the same period to provide a global backdrop with a higher focus on the Americas region. Node structure is supported by 1,000 bootstrap replicates. Branches marked in red represent taxa from Puerto Rico. The outer ring is color-coded by region of origin. The inner wedges are color-coded to represent emerging variants of interest or concern. The phylogenetic tree is rooted in Wuhan/WH01/2019 and Wuhan/Hu-1/2019 reference genomes.

Figure 4
Emergence and spread of VBM Alpha in Puerto Rico driven by multiple introductions. Phylogenetic reconstruction using a maximum likelihood tree inferred with 730 time-calibrated complete genomes, including 160 viral genomes from Puerto Rico and 570 contextual viral genomes from the United States and the Americas to provide a regional backdrop. Node structure supported by 1,000 bootstrap replicates. Tree topology shaded in red represents clusters of viral genomes from Puerto Rico, blue shades represent clusters of genomes from the United States, and the gray shades represent clusters from other countries.

Figure 5
Emergence and spread of VOC Delta in Puerto Rico driven by introduction of multiple sub-lineages. Phylogenetic reconstruction using a maximum likelihood tree inferred with 815 time-calibrated complete genomes including 324 viral genomes from Puerto Rico and 491 contextual genomes from the United States and the Americas to provide a regional backdrop. Node structure supported by 1,000 bootstrap replicates. Tree topology sections shaded in red represent clusters of viral genomes from Puerto Rico. Each cluster from Puerto Rico is labeled with cluster number C.x and Delta sub-lineage PANGO assignment [AY.x].

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.