SARS-CoV-2 Infections in Vaccinated and Unvaccinated Populations in Camp Lemonnier, Djibouti, from April 2020 to January 2022
Viruses 2022, 14, 1918

The global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has highlighted the disparity between developed and developing countries for infectious disease surveillance and the sequencing of pathogen genomes. The majority of SARS-CoV-2 sequences published are from Europe, North America, and Asia. Between April 2020 and January 2022, 795 SARS-CoV-2-positive nares swabs from individuals in the U.S. Navy installation Camp Lemonnier, Djibouti, were collected, sequenced, and analyzed. In this study, we described the results of genomic sequencing and analysis for 589 samples, the first published viral sequences for Djibouti, including 196 cases of vaccine breakthrough infections. This study contributes to the knowledge base of circulating SARS-CoV-2 lineages in the under-sampled country of Djibouti, where only 716 total genome sequences are available at time of publication. Our analysis resulted in the detection of circulating variants of concern, mutations of interest in lineages in which those mutations are not common, and emerging spike mutations.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first identified in Wuhan, China, in December of 2019 [1,2] and causes a severe respiratory disease, termed coronavirus disease 2019 (COVID-19) [3]. The World Health Organization (WHO) declared COVID-19 a pandemic on 11 March 2020, and the virus caused infections on all continents, with over 533 million cases and over 6.3 million deaths worldwide as of 8 June 2022 [4]. SARS-CoV-2 is a member of the Coronaviridae family and has a positive-sense single-stranded (ssRNA(+)) RNA genome that consists of one linear RNA segment. One feature that partly distinguishes coronaviruses from other RNA viruses is the proofreading ability of the virus during transcription. There are 14 open reading frames (ORFs) with ORF1ab, the largest ORF in the genome, encoding a polypeptide [5]. The mature peptide nsp14, which is cleaved from the ORF1ab polypeptide, is an exonuclease that can proofread the nascent RNA being transcribed, excising incorrectly incorporated nucleotides [6]. Although nsp14 performs this important function, mutations still occur, and resulting variants emerge. Due to the prolonged nature of the pandemic, numerous variants have emerged, with some harboring concerning mutations that may affect the efficacy of diagnostic, prophylactic, and therapeutic countermeasures, as well as transmission rates. Examples include the emergence of lineage Alpha (B.1.1.7), a WHO-designated variant of concern (VOC); Alpha is the first lineage with this distinction and was shown to have a higher transmission rate [7]. Subsequent VOCs Beta (B.1.351), Delta (B.1.617.2), and Omicron (B.1.1.529) also have mutations in the receptor-binding domain (RBD) of the spike protein, affecting the ability of antibodies to bind and neutralize the virus [8][9][10][11].
Various technologies exist for the detection of SARS-CoV-2, including polymerase chain reaction (PCR), nanotechnology-based sensors, and viral genome sequencing [12,13]. While PCR is very commonly implemented, it provides only limited information (i.e., the presence of the virus or its genes), similar to nanosensors. Conversely, viral genomic sequencing allows for a full analysis, including elucidation of mutations and lineage information as well as tracking the spread and emergence of variants and the associated effectiveness of available countermeasures. SARS-CoV-2 genomic sequencing in African countries has been vastly underrepresented when compared with that of other countries such as the United Kingdom (UK), China, and the United States. Prior to our sequencing efforts, no sequences had been published for Djibouti in the Global Initiative on Sharing All Influenza Data (GISAID) [14], a heavily used database containing SARS-CoV-2 sequences. As of June 2022, two and a half years after the first identified SARS-CoV-2 cases, the surrounding countries had limited sequences published to GISAID (not including the genome submissions reported herein as part of our work): 35 for Somalia, 626 for Ethiopia, and 0 for Eritrea.
Multiple vaccines were developed for SARS-CoV-2, and within the UK, the ChAdOx1 nCoV-19 vaccine AZD1222 (AstraZeneca) received regulatory approval on 30 December 2020 from the UK medicines regulator, the Medicines and Healthcare products Regulatory Agency (MHRA) [15]. Sputnik V (Gamaleya Research Centre) was developed and approved in Russia in August 2020 [16]. CoronaVac COVID-19 vaccine (Sinovac) was developed and approved in China in June 2021 [17]. Within the United States, BNT162b2 (Pfizer-BioNTech) was the first COVID-19 vaccine to receive emergency use authorization (EUA) from the U.S. Food and Drug Administration (FDA), on 11 December 2020, followed by full FDA approval on 23 August 2021 [18,19]. The Moderna (mRNA-1273) COVID-19 vaccine received an FDA EUA on 18 December 2020 and full approval on 31 January 2022 [20,21]. The Janssen COVID-19 vaccine received an EUA on 27 February 2021 [22]. The most recent vaccine to receive an FDA EUA is the Novavax NVX-CoV2373 COVID-19 vaccine, authorized on 13 July 2022 [23]. Both Pfizer-BioNTech and Moderna are two-dose mRNA vaccines, with three weeks and four weeks recommended between doses, respectively. AstraZeneca and Sputnik V are two-dose adenovirus vector vaccines, with a recommended time between doses of 8-12 weeks for AstraZeneca and 21 days for Sputnik V. Sinovac is a two-dose inactivated viral vaccine, administered 2-4 weeks apart. Janssen is a single-dose adenovirus vector vaccine, while Novavax is a two-dose recombinant protein vaccine administered 21 days apart.
For all the vaccines (except the single-dose Janssen vaccine), the recipient is considered fully vaccinated two weeks after the second dose. Instances of infection in fully vaccinated individuals have been documented and are not unexpected, as the maximum efficacy of the vaccines is 95% in those without prior infection (for Pfizer-BioNTech) [18]. Other vaccines, including Sputnik V, Sinovac, AstraZeneca, Janssen, Novavax, and Moderna, had lower than 95% efficacy rates [15,16,17,18,20,22,24]. At the time of the vaccine efficacy trials, the VOCs were not widely circulating, but recent studies have shown some reduction in efficacy against the VOCs specifically. Studies in Qatar showed reduced effectiveness of the Pfizer vaccine against three of the VOCs (Alpha, Beta, and Delta), with Beta and Delta showing the greatest evasion of vaccine-induced immunity [25,26]. Studies show reduced neutralization of Beta and Delta by convalescent and vaccinated sera, but minimal immune evasion by Alpha [8,9,27,28,29]. A study in Scotland showed reduced efficacy of the AstraZeneca vaccine against both Alpha and Delta [30]. Finally, a South African study showed the Pfizer vaccine had reduced efficacy against the latest VOC, Omicron. In this study, we described 589 SARS-CoV-2 genomes, including those from 196 cases of vaccine breakthrough infections (VBI) at Camp Lemonnier, Djibouti, in individuals vaccinated with the Moderna, Pfizer-BioNTech, Janssen, AstraZeneca, Sinovac, or Sputnik V vaccines.
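The "fully vaccinated" rule described above can be sketched in Python as follows. This is an illustrative sketch only, not the procedure used in the study: the function name, the status labels, and the choice to apply the same two-week window to the single-dose Janssen vaccine are assumptions made for the example.

```python
from datetime import date, timedelta

# Doses needed to complete the primary series, per the regimens described above.
DOSES_REQUIRED = {
    "Pfizer-BioNTech": 2,
    "Moderna": 2,
    "AstraZeneca": 2,
    "Sputnik V": 2,
    "Sinovac": 2,
    "Novavax": 2,
    "Janssen": 1,
}

# Two weeks after the final dose. Using the same window for the single-dose
# Janssen vaccine is an assumption; the text only states the two-dose rule.
FULL_VACCINATION_DELAY = timedelta(days=14)


def vaccination_status(vaccine, dose_dates, collection_date):
    """Classify a positive sample as 'unvaccinated', 'partial', or 'full'."""
    # Only doses given on or before the sample collection date count.
    given = sorted(d for d in dose_dates if d <= collection_date)
    if not given:
        return "unvaccinated"
    required = DOSES_REQUIRED[vaccine]
    if len(given) < required:
        return "partial"
    final_dose = given[required - 1]
    if collection_date >= final_dose + FULL_VACCINATION_DELAY:
        return "full"
    return "partial"
```

Under these assumptions, a Moderna recipient with doses on 4 January 2021 and 1 February 2021 would be classed as "full" for a sample collected on or after 15 February 2021 and as "partial" before that date.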

Sample Collection
From April 2020 through January 2022, the 795 patient samples in this study were collected at Camp Lemonnier, Djibouti, and tested positive for SARS-CoV-2. The samples were collected from symptomatic and asymptomatic patients via nasopharyngeal swabs and then placed into viral transport media (VTM). All samples tested SARS-CoV-2 positive via one of the FDA emergency-use-authorized BioFire panels: Respiratory Panel 2.1 or COVID-19 Test v1.1. After testing positive for SARS-CoV-2, all samples were stored at −20 °C for further testing.

RNA Extraction and Genome Sequencing
Viral genome sequencing and analysis were conducted from primary material at Naval Medical Research Center, Fort Detrick, MD, under non-human subject research determination PJT 20-08. Briefly, RNA was extracted from 0.25 mL of VTM using 0.75 mL of TRIzol LS reagent (Invitrogen; Carlsbad, CA, USA) according to the manufacturer's protocol. The RNA concentration was measured using a Qubit RNA High Sensitivity assay (ThermoFisher Scientific; Waltham, MA, USA) prior to use in the ARTIC nCoV-2019 sequencing protocol [31], the NEBNext ARTIC SARS-CoV-2 Library Prep Kit (New England Biolabs; Ipswich, MA, USA), and/or the QIAseq DIRECT SARS-CoV-2 Kit (Qiagen; Valencia, CA, USA). The RNA was reverse-transcribed, and cDNA was then amplified using multiplex PCR and either the associated ARTIC primer pools or the QIAseq DIRECT primer pools. The inserts were then polished and ligated to Illumina-compatible adaptors and indexes. The libraries were quality-checked using an Agilent High Sensitivity DNA kit (Agilent Technologies; Santa Clara, CA, USA) and quantitated using a Qubit DNA High Sensitivity assay (ThermoFisher Scientific) prior to sequencing using Illumina MiSeq v3 2 × 300 sequencing kits and MiSeq sequencers (Illumina; San Diego, CA, USA).

Bioinformatic Analyses
Viral Amplicon Illumina Workflow (VAIW) [32] was used to collate and analyze the SARS-CoV-2 genomes from resulting sequencing reads as described previously [33]. Briefly, Illumina reads were trimmed and filtered to Q20 and a minimum length of 50 bp using BBDuk [34]. The paired reads were then merged using BBMerge with default settings [35]. The trimmed, filtered, and merged reads were then aligned to the Wuhan reference genome (NCBI GenBank accession NC_045512.2/MN908947.3) using BBMap with local alignment and a maximum insertion/deletion of 500 bp [34]. The amplicon primers were trimmed from sequences using align_trim from the ARTIC workflow/pipeline [36]. Consensus genomes were generated when possible, and single-nucleotide variants (SNVs) were determined using SAMtools mpileup [37] and iVar (intrahost variant analysis of replicates) [38]. The resulting mappings were visualized and examined for artifacts, and when necessary, the genomes were manually closed in CLC Genomics Workbench v2021.0.4 (Qiagen; Valencia, CA, USA). The lineage determination of the consensus genomes was conducted using Pangolin (Phylogenetic Assignment of Named Global Outbreak LINeages; v.3.1.20) [39]. The clade assignments and consensus mutations were determined using Nextclade CLI 1.2.0 and Nextalign CLI 1.2.0. The viral genome data resulting from this study are available in GISAID, and their accessions are found in the Supplementary Materials. Alignments were performed using MAFFT [40], and a maximum likelihood (ML) tree was generated with IQ-TREE [41] (GTR+G model) with 1000 bootstraps using a subset of the SARS-CoV-2 genomes available from the Global Initiative on Sharing All Influenza Data repository (GISAID; accessed February 2022). The resulting trees were visualized [42]. The lineage distribution over time was visualized with a custom Python script using the library Matplotlib [43].
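To make the read-filtering criteria concrete, the following Python sketch trims low-quality bases from both ends of a read and discards reads shorter than 50 bp. It is a simplified illustration and not BBDuk's trimming algorithm: the function names are hypothetical, Phred+33 quality encoding is assumed, and only the Q20 and 50 bp thresholds come from the text.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_and_filter(seq, qual, min_q=20, min_len=50):
    """Trim bases below min_q from both ends; return None if the read is too short.

    Simplified end-trimming for illustration; BBDuk uses its own algorithm.
    """
    scores = phred_scores(qual)
    start, end = 0, len(seq)
    # Advance past low-quality bases at the 5' end.
    while start < end and scores[start] < min_q:
        start += 1
    # Step back past low-quality bases at the 3' end.
    while end > start and scores[end - 1] < min_q:
        end -= 1
    if end - start < min_len:
        return None
    return seq[start:end], qual[start:end]


# Hypothetical read: '#' encodes Q2 and 'I' encodes Q40 under Phred+33.
example_seq = "A" * 60
example_qual = "#" * 5 + "I" * 52 + "#" * 3
trimmed = trim_and_filter(example_seq, example_qual)
```

In this hypothetical example, the five leading and three trailing Q2 bases are removed, leaving a 52 bp read that passes the length filter.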
Molecular docking was performed on the homology-modelled receptor-binding domain (RBD) of the SARS-CoV-2 spike (S) protein and human ACE2 proteins using the hybrid docking method of HDOCK (http://hdock.phys.hust.edu.cn/, accessed on 29 July 2022). The RBD protein sequences of the wild-type Wuhan virus (YP_009724390), A425, and A425 with mutated S494P were used as ligands, and PDB id 6LZG Chain A was chosen as the human ACE2 receptor sequence.

Lineage Distribution Trends
A total of 795 SARS-CoV-2 positive samples were collected at Camp Lemonnier, Djibouti, between April 2020 and January 2022. Of those, 655 (82% of total samples) passed the library preparation quality control and thus proceeded for viral genome sequencing, resulting in 445 coding-complete genomes (68% of sequenced genomes) [44]. Of the remaining 210 consensus genomes, 144 genomes (22% of sequenced genomes) reached a consensus length ≥ 20,000 nucleotides (nt), while 66 samples (10% of sequenced genomes) either did not reach a consensus length ≥ 20,000 nt or were omitted due to sequence quality issues and thus were not included in subsequent analysis. An exception is the vaccine breakthrough infection (VBI) samples under 20,000 nt, which are discussed further below. The metadata (collection date, consensus genome length, Pango lineage, NextClade, and GISAID accessions) associated with 589 samples (the coding-complete and those with a consensus genome length ≥ 20,000 nt) can be found in Supplementary Table S1.
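The sample accounting above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the variable and function names are chosen for illustration.

```python
# Counts as reported in the text.
total_positive = 795      # SARS-CoV-2-positive samples collected
sequenced = 655           # passed library-preparation quality control
coding_complete = 445     # coding-complete genomes
partial_ge_20k = 144      # consensus length >= 20,000 nt, not coding-complete
excluded = 66             # too short or omitted for sequence quality


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Genomes carried into the downstream analysis.
analyzed = coding_complete + partial_ge_20k

summary = {
    "sequenced_of_total": pct(sequenced, total_positive),
    "coding_complete_of_sequenced": pct(coding_complete, sequenced),
    "partial_of_sequenced": pct(partial_ge_20k, sequenced),
    "excluded_of_sequenced": pct(excluded, sequenced),
}
```

The three sequenced categories sum to 655, and the 445 coding-complete plus 144 partial genomes give the 589 samples listed in Supplementary Table S1.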
The Pango lineage assignments of the sequenced samples revealed trends of the emergence of specific lineages over the 21-month timeframe at Camp Lemonnier (Figure 1A; Figure 1B). Interestingly, despite its dominance at Camp Lemonnier at that time, AY.127.1 is a rare lineage, with only 337 genomes published to GISAID as of June 2022, accounting for less than 0.5% of lineages worldwide at that time. This therefore likely represents cluster outbreaks with two possible introductions occurring on Camp Lemonnier (February 2021 and August 2021). All 589 samples with lineages assigned are shown in a maximum likelihood phylogenetic tree, with the four main groups highlighted: VOCs Alpha, Beta, Delta, and Omicron (Figure 2).
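A lineage-over-time summary of this kind can be tallied as in the sketch below, which counts lineage assignments per collection month and reports the most frequent lineage in each month. It is not the study's plotting script: the function names are hypothetical, the example records are made-up placeholders rather than study data, and the Matplotlib plotting step is omitted.

```python
from collections import Counter, defaultdict
from datetime import date


def lineage_counts_by_month(records):
    """Map 'YYYY-MM' to a Counter of Pango lineages collected in that month."""
    counts = defaultdict(Counter)
    for collection_date, lineage in records:
        month = collection_date.strftime("%Y-%m")
        counts[month][lineage] += 1
    return dict(counts)


def dominant_lineage(counts):
    """Return the most frequent lineage for each month, in chronological order."""
    return {
        month: lineage_counter.most_common(1)[0][0]
        for month, lineage_counter in sorted(counts.items())
    }


# Placeholder records (collection date, lineage); NOT data from the study.
example_records = [
    (date(2021, 3, 1), "B.1.1.7"),
    (date(2021, 3, 5), "B.1.1.7"),
    (date(2021, 3, 9), "B.1"),
    (date(2021, 12, 20), "BA.1.1"),
    (date(2021, 12, 21), "BA.1.1"),
    (date(2021, 12, 22), "AY.127.1"),
]

monthly = lineage_counts_by_month(example_records)
dominant = dominant_lineage(monthly)
```

The per-month counters could then be passed to a stacked bar or area plot to show how lineages replace one another over time.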

Vaccine Breakthrough and Reinfection Cases
Of the 655 sequenced samples, 252 of these were VBI (either partial vaccination, full vaccination, or unknown vaccination details) cases (

Spike Mutations
In addition to characterizing the genomes by lineage, we also looked for specific mutations of interest and found that a group of 16 B.1 lineage viral genomes collected in November 2020 had the spike mutation of interest, S494P. This is striking because B.1 lineage viruses rarely have this mutation, which was reported in less than 0.5% of B.1 lineage viruses in GISAID as of June 2022, more than a year after these samples were collected. Located in the receptor-binding domain of the spike protein, the S494P mutation contributes to increased binding affinity to ACE2, the cellular receptor for SARS-CoV-2 [46], and reduces neutralizing antibody efficacy [47]. This group of samples likely represents an outbreak, with 14 of the samples collected on the same day, 16 November 2020; this is further supported by the clustering of these genomes on the phylogenetic tree (Figure 4A). S494P was also found in a cluster of nine BA.1.1 (Omicron) lineage viruses collected in December 2021, all of which were VBIs (Figure 4B). As with B.1, this mutation is rare in this lineage, being found in less than 0.5% of BA.1.1 lineage virus sequences as of June 2022. Consistent with another study [46] showing increased binding affinity of S494P spike for human ACE2, our analysis with HDOCK showed a predicted increased binding affinity of a BA.1.1 lineage genome (sample A425) with the addition of the S494P spike mutation. The docking score decreased from −347 to −364 with the addition of S494P; in HDOCK, a more negative score indicates more favorable predicted binding.
Additionally, six genomes of the B.1.1.306 lineage have another spike mutation of interest not commonly associated with this lineage, P681H (Supplementary Table S1). P681H is suggested to increase infectivity of the virus as it directly neighbors the furin cleavage site of the spike protein [48]. One sample was assigned to lineage B.1.1.25 but also had the P681R spike mutation, a mutation found in less than 0.5% of the B.1.1.25 lineage viruses on GISAID as of June 2022. Similar to P681H, P681R may increase infectivity of the virus [50].
In addition to the BA.1.1 VBI samples with spike S494P discussed above, we noted other viral genomes from VBIs in this study having mutations of interest that are not commonly associated with their assigned lineages. One genome from a VBI case, which was assigned lineage B.1.1, had the spike mutations of concern, E484K and N501Y, which are found in less than 0.5% of the published B.1.1 lineage sequences as of June 2022 and are known to reduce antibody neutralization efficacy alone (E484K) or in combination (E484K/N501Y) [49]. This particular genome also had the spike mutation H1271Y, which is present in the C-terminus of the S2 intravirion region of the spike protein and is involved in the incorporation of the spike protein into the virion and the spike-mediated virus-host cell membrane fusion [50]. Strikingly, H1271Y is present in only one other B.1.1 sequence published to GISAID, and that was collected in Switzerland. Two other genomes, both from VBIs and assigned lineage B.1.1.7 (Alpha), had the spike mutation H49Y, a mutation found in less than 0.5% of the Alpha (B.1.1.7) sequences on GISAID as of June 2022. It was also found in 20 other samples of the VOC Alpha genomes sequenced in this dataset that were not VBIs. H49Y is in the N-terminal domain of the S1 sub-unit of the spike protein and has been shown to increase the stability of the spike protein and to increase cellular entry [46,51]. Due to the rarity of this mutation in the B.1.1.7 lineage sequences, this group likely represents a cluster outbreak occurring at Camp Lemonnier, with the first case collected on 25 February 2021 and the last case collected on 10 March 2021, further supported by their clustering on the phylogenetic tree (Figure 4C).
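The rarity criterion used throughout this section (a mutation present in less than 0.5% of a lineage's published sequences) can be expressed as a short Python sketch. The record layout, function names, and example counts are hypothetical and chosen for illustration; they are not GISAID data or the study's analysis code.

```python
RARE_THRESHOLD = 0.005  # 0.5% of sequences in the lineage, as used in the text


def mutation_prevalence(genomes, lineage, mutation):
    """Fraction of genomes in `lineage` that carry `mutation`, or None if no genomes."""
    in_lineage = [g for g in genomes if g["lineage"] == lineage]
    if not in_lineage:
        return None
    carriers = sum(1 for g in in_lineage if mutation in g["spike_mutations"])
    return carriers / len(in_lineage)


def is_rare(genomes, lineage, mutation, threshold=RARE_THRESHOLD):
    """True when the mutation is found in fewer than `threshold` of the lineage."""
    prevalence = mutation_prevalence(genomes, lineage, mutation)
    return prevalence is not None and prevalence < threshold


# Placeholder reference set: 1000 made-up B.1 genomes, 3 of them carrying S494P.
example_genomes = [
    {"lineage": "B.1", "spike_mutations": {"D614G", "S494P"}} for _ in range(3)
] + [
    {"lineage": "B.1", "spike_mutations": {"D614G"}} for _ in range(997)
]

example_prevalence = mutation_prevalence(example_genomes, "B.1", "S494P")
```

With these placeholder counts, S494P would be flagged as rare in B.1 (3 of 1000 genomes), whereas D614G, present in every placeholder genome, would not.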

Discussion
An analysis of 795 samples from Camp Lemonnier, Djibouti, from April 2020 through January 2022 resulted in the detection of VOCs and variants under monitoring (VUMs), the identification of mutations of interest in lineages not normally associated with them, the identification of vaccine breakthrough infections, and the general knowledge of circulating lineages and mutations in this geographic location. Overall, during the timeframe of this study, Camp Lemonnier followed global trends for the circulation of certain lineages, such as B.  [14]. B.1.324 comprised less than 0.5% of worldwide lineages as of June 2022, with the greatest prevalence in Djibouti (including all samples discussed in this text) and the British Virgin Islands [14]. The samples in this dataset with these lineages likely represent small cluster outbreaks, which is further supported by the 11 cases of B.1.324 being from passengers arriving from the U.S. together on an inbound flight. The most prevalent lineage circulating in the dataset is BA.1.1, a sub-lineage of Omicron; within the vaccine breakthrough samples, VOC Omicron is the predominant lineage, followed by the VOC Delta. The VOCs Delta, Beta, and Omicron were documented to partially evade the immunity induced by the currently available vaccines [25,26,30,52], which is consistent with the reduced efficacy of antibodies to Beta, Delta, and Omicron due to the mutations present in the spike protein [8,9,10,11,25,29,53].
In addition to determining the circulating lineages at Camp Lemonnier, we identified spike mutations of interest in lineages where they are rarely found. This included a group of samples from the B.1 and BA.1.1 (Omicron) lineages with the spike S494P mutation, a mutation of interest as it reduces neutralizing antibody efficacy [47]. Other spike mutations of interest identified in lineages where they are rarely found included P681R, P681H, E484K, N501Y, and H49Y. These all are associated with previous evidence in the literature of increasing infectivity and/or reduced neutralization by antibodies [46,48,49,54]. Although the major variants circulating at Camp Lemonnier broadly appear to align with the variants that circulated in other regions of the world at the same time, the finding of multiple "rare" mutations in these different lineages at Camp Lemonnier, in a region of the world where relatively few SARS-CoV-2 genomic sequences have been reported overall, suggests that uneven sampling from various geographic regions may be skewing the databases toward more industrialized areas and thereby missing or underrepresenting the viral genetic variations in the rest of the world. Tracking emerging mutations of interest in new or even previously described lineages is critical for advising on therapeutics and public health policies, as well as for the prediction of risk to deployed military forces around the globe.
The viral sequences in this analysis were the first published to GISAID from the country of Djibouti. Despite some unique characteristics of infectious disease surveillance in a deployed military setting, it has been demonstrated previously in influenza-like illness (ILI) surveillance that more than a quarter of the ILI cases surveilled at Camp Lemonnier were among the personnel living and interacting within the local community [55]. Therefore, it is reasonable to infer that the SARS-CoV-2 surveillance data from Camp Lemonnier would include not only lineages that may be introduced from recent personnel movement but also lineages that are circulating regionally within Djibouti. Overall, this dataset provides a critical addition to the knowledge base of SARS-CoV-2 lineages circulating in Camp Lemonnier, Djibouti, specifically and perhaps in Africa in general, and it underscores the importance of efforts aimed at sampling from various geographic regions. The surveillance of SARS-CoV-2 infections and vaccine breakthroughs in the U.S. military and adjunct staff at Camp Lemonnier is necessary for the implementation of control measures, as well as for advising policy on protective measures for the overseas military.