Unique Genomic Epidemiology of COVID-19 in the White Mountain Apache Tribe, April to August 2020, Arizona

ABSTRACT The first case of coronavirus disease 2019 (COVID-19) within the White Mountain Apache Tribe (WMAT) in Arizona was diagnosed almost 1 month after community transmission was recognized in the state. Aggressive contact tracing allowed for robust genomic epidemiology of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and subsequent phylogenetic analyses implicated only two virus introductions, which resulted in the spread of two unique viral lineages on the reservation. The phylogenies of these lineages reflect the nature of the introductions, the remoteness of the community, and the extraordinarily high attack rates. The timing and space-limited nature of the outbreaks validate the public health tracing efforts involved, which were illustrated by multiple short transmission chains over a period of several weeks, eventually resulting in extinction of the lineages. Comprehensive sampling and successful infection control efforts are illustrated in both the effective population size analyses and the limited mortality outcomes. The rapid spread and high attack rates of the two lineages may be due to a combination of sociological determinants of the WMAT and a seemingly enhanced transmissibility. The SARS-CoV-2 genomic epidemiology of the WMAT demonstrates a unique local history of the pandemic and highlights the extraordinary and successful efforts of their public health response. IMPORTANCE This article discusses the introduction and spread of two unique viral lineages of SARS-CoV-2 within the White Mountain Apache Tribe in Arizona. Both genomic sequencing and traditional epidemiological strategies (e.g., contract tracing) were used to understand the nature of the spread of both lineages. Beyond providing a robust genomic analysis of the epidemiology of the outbreaks, this work also highlights the successful efforts of the local public health response.

most federally recognized NA tribes in the country (n = 22); some of these tribes (e.g., Navajo Nation) experienced case and mortality rates higher than anywhere else in the county (7,10,11). While COVID-19 mortality rates in some NA peoples have been disproportionately high (16,17), the remarkably low case fatality rate in other Arizona tribal communities, specifically the White Mountain Apache Tribe (WMAT) within the Fort Apache reservation, is noteworthy (3,18).
Although the first case of SARS-CoV-2 in Arizona was detected in late January 2020 and community transmission was confirmed in early March (19), the WMAT did not report its first case of COVID-19 until a month later (18). The Fort Apache reservation encompasses 1.67 million acres in Arizona and has over 18,000 members living in nine main communities. Multigenerational households with $5 members are typical, and the population is significantly impacted by high rates of obesity, heart disease, diabetes, and bacterial infections (18,20,21). The WMAT's first COVID-19 case was diagnosed on 1 April 2020, and the tribe soon experienced one of the highest infection rates documented during the early days of the pandemic. By August, approximately 15% of the residents had tested positive for COVID-19 (.2,200 reported cases). Transmission of COVID-19 in the community was rapid, and secondary rates of infections among households ranged from 80 to 100% (3). However, during the peak of their 2020 COVID-19 wave, the community had a case fatality rate of just ;1.2% (22), one of the lowest recorded at the time. They sustained that low rate well into 2021 (18) despite the high rates of both comorbidities and social determinants known to increase the risk of severe disease (3,7,11,12,18,20,21). The low fatality rates were driven in large part by successful on-ground, rapid test-and-treat interventions by local health workers (3,18,22).
With the multitude of opportunities for SARS-CoV-2 to mutate due to millions of people being infected and reinfected over time, the emergence of more transmissible variants and their diverging population structure was not a surprising occurrence (23). While most mutations are not phenotypically impactful, they may serve as genotypic lineage-defining markers that allow for phylogenetic analysis at global and local levels (4,19). That being said, some SARS-CoV-2 mutations have indeed changed the phenotype of the lineage they define; for example, the D614G spike mutation was the first to be associated with higher viral loads and faster transmission (24) and to sweep the global population (25). Subsequent mutations continually drive the epidemiology of this virus and have resulted in variants of concern (VOCs) with higher fitness (referred to here by their Pango lineage names [26,27] and WHO labels), including B.1.1.7 (Alpha), B.1.617.2 and AY lineages (Delta), and BA.1, BA.2, BA.4, and BA.5 (Omicron) (28,29).
With a phylogenetic analysis of the SARS-CoV-2 derived from several epidemiologically linked clusters within the WMAT, we report a surprisingly limited number of viral introductions onto the reservation during the first several months of the pandemic and describe their lineage-defining mutations. These analyses illustrate the viral and host community dynamics that first drove and then halted the outbreak.

RESULTS
The initial SARS-CoV-2 wave across the WMAT began 1 April and died out by 13 August 2020, infecting at least 15% of the community population. Although epidemiologic and clinical analyses of the outbreak pointed to five initial clusters before widespread transmission was noted, genomic analysis identified only two main introduction events of the virus onto reservation lands, which together encompass samples from the epidemiological clusters. The majority (n = 732 [88%]) of the 831 virus genomes from this wave fell into two monophyletic clades, which comprised samples only from Arizona and almost exclusively from this tribal community. Many viral genomes (n = 99 [12%]) collected from the WMAT over the same time frame did not fall into these two virus populations, but none of these outliers appeared to be transmitted to other tribal members; therefore, these were omitted to maintain focus on the outbreak lineages.
The first monophyletic clade (n = 276) ( Fig. 1) represents lineage B.1.289 and is defined by one unique point mutation in the spike gene, conferring an H245Y amino acid change.   1) and B.1.516 (Fig. 2) illustrate the rapid rise and fall of each population and the robustness of the COVID-19 outbreak case identification. Both plots show a steady increase in genomic diversity as each lineage spread and diverged, ultimately reaching a peak in viral diversity within weeks. This peak was followed by an equal decrease in diversity as each lineage was eliminated. Both plots also show tight confidence intervals, indicating near completeness in capturing the viral diversity in these localized outbreaks. By 13 August 2020, 2,335 cumulative COVID-19 cases had occurred in this community; thus, roughly 36% of known case viruses were sequenced. Successful SARS-CoV-2 genome recovery from specimens reflects the high viral loads observed for the WMAT cases, as indicated by low cycle threshold (C T ) values obtained via real-time reverse transcription-PCR (rRT-PCR) (30). Of the 814 positive-case samples collected from this community with C T values available, 593 (73%) yielded C T values of #33.0 in the TG-N2 assay. Figures (Fig. 1). This lineage expanded through April and May, then diminished in June, and presumably went extinct in July ( Fig. 1; Fig. S1). The tree structure shows that B.1.289 followed a largely singular path for the month of April, where one branch from each node of the tree dead-ends at one or a minimal number of case genomes, while the other branch leads to a majority of all subsequent case genomes. These cases include four of the original five clusters that appeared to be separate unrelated outbreaks based on field epidemiology alone. In late April, the B.1.289 phylogeny split into four main sublineages comprising the May cases, and these sublineages remained in the population through June and early July.
The phylogeny of the second lineage, B.1.516 (N: R209I), shows that after introduction in early May, the virus diverged into several well-populated sublineages that continued into the late summer months ( Fig. 2; Fig. S2). However, after these early bifurcations, the sublineages take on a branching structure somewhat similar to that of the B.1.289 phylogeny, where each sublineage follows a largely singular pathway. The B.1.516 lineage's latest samples, collected in August, fall into only four small subclades which subsequently died out.
Three of the five epidemiologically linked clusters detected in April and May included several cases that provided viral genomes along with substantial epidemiological information. This population's index case (A1) (Fig. 3) presented to the Emergency Department (ED) in late March 2020, with fever and low back pain, and was discharged on antibiotics. Despite recent travel to Phoenix, a location experiencing a high number of cases, the patient continued to work and reportedly refused to wear a mask. They re-presented at the ED 1 day later with additional symptoms, at a time when there were no known cases of COVID-19 in this community or surrounding localities. Thus, this patient was the first in the region to meet the Arizona state testing criteria then in use. A nasopharyngeal (NP) swab sample collected the same day tested positive for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by rRT-PCR at the Arizona State Public Health Laboratory (ASPHL) on 1 April (with a  Fig. S1 for the full phylogenetic tree). The largely linear transmission pathway illustrated by the tree and the robust sampling and sequencing illustrated by the N e plot reflect the unique circumstances of this lineage's introduction and aggressive response by tribal public health. B.1.289 first appeared in April 2020, and its elimination was successful by July. Colored boxes denote SARS-CoV-2 cases associated with cluster 1 and 2, matching those depicted in the epidemiological linkage maps in Fig. 3.
COVID-19 Genomic Epidemiology in the WMA Tribe mSphere TG_N2 assay C T value of 30.7 on the remnant specimen). The patient was hospitalized for 6 days and fully recovered. Patient A1 had five household (HH-A) contacts, all of whom became symptomatic and tested positive (Fig. 3). Given the high attack rate, extensive contact tracing was conducted (3,18). Patient A1 identified family contacts living in a second household (HH-B), where seven of nine individuals tested positive with a range of reported symptoms. Two of A1's coworkers, C1 and E1, developed mild symptoms, and C1 tested positive. In an attempt to prevent spread  COVID-19 Genomic Epidemiology in the WMA Tribe mSphere from households HH-X, -Y, and -Z, whom W1 had provided transportation to or had visited (Fig. 3). The high attack rate appeared to continue for this cluster, as three of four members of HH-X and all six members of HH-Y tested positive; the fourth member of HH-X was symptomatic but refused testing. Tracing efforts led to individual Y7, the spouse of Y1, who likely introduced the virus into HH-Y before being incarcerated April 7. Testing at the tribal department of corrections (DOC) started 16 April and revealed another 13 inmates and four guards with COVID-19. Transmission in the DOC continued the outbreak within the facility and apparently back to the community. Notably, none of the close contacts of HH-X members tested positive. Phylogenetic evidence supports the contact tracing inferences of the viral spread and provides additional key epidemiological information. In addition to the genomic links among the individuals in cluster 2, the phylogeny also shows that cluster 2 was born out of cluster 1 and that an additional mutation, T67I in the ORF1b nonstructural replicative enzyme, occurred on or around 10 April between clusters 1 and 2 and quickly went to fixation. The first sample with this genotype (TG276187) was collected April 10, while the last sample without it was collected April 23. Our analysis also revealed that SARS-CoV-2 genomes from cluster 2 sit basal to the rest of the H245Y outbreak and that genomes from the DOC cases are interspersed with those from other cases, indicating several lines of transmission between the DOC and community. These links appear to be responsible for the change from a singular transmission path to expansion of B.1.289 into several sublineages. At least 275 cases of COVID-19 were caused by B.1.289, and 261 (95%) of those were caused by B.1.289 with Orf1b: T67I, after introduction onto the reservation.
In contrast to clusters 1 and 2, where field epidemiology initially linked cases within each cluster, rapid SARS-CoV-2 genomics first alerted the tribal public health officials to cluster 3 and to an independent introduction of a new lineage, B.1.516. This combination of patient tracking and genomic epidemiology revealed that the provenance of B.1.516 with N: R209I was around 21 April, with the congregation of the first infected individuals in a home associated with substance abuse. These cases were diagnosed 2 to 5 May, and the phylogenetic tree shows that the dispersal of these individuals and their contacts launched at least two main sublineages of B.1.516 that each spread to several members of the tribal community (Fig. 2). It is possible that the origin of B.1.516 was a correctional facility, as statewide genomic surveillance identified a sample (TG276516) collected April 14 from a correctional facility in a noncontiguous county that shared a recent ancestor with the initial cluster 3 samples. This sample was used as the phylogenetic tree's outgroup. Additionally, one of the first case samples in the second major sublineage of B.1.516 (TG343840) (Fig. 2) was collected from a separate correctional facility. The lack of B.1.516 in other Arizona communities may counter this hypothesis of a correctional facility origin, instead suggesting introductions from members of the B.1.516 outbreak to correctional facilities (and rapid containment); in either case, the genomic linkage suggests an early connection between the individuals suffering substance abuse and incarceration.

DISCUSSION
COVID-19 has had significant and unprecedented impacts on human populations across the globe. In North America, NA communities have been particularly hard hit, not only in terms of morbidity and mortality (12,16) but also in the social, cultural, and economic impacts of lockdown measures and lack of access to and delays in health care, coverage, and funding (10,13,31). Reservations in Arizona have been surrounded for much of the pandemic by some of the highest case rates and hospital occupancy rates in the world, which undoubtedly impacted rates in tribal communities. Early and rapid proactive measures by tribal leadership delayed the local COVID-19 epidemics and likely saved lives (3,8,12,32); however, in spite of these efforts, infection rates in tribal communities described here and elsewhere eventually surged to unparalleled levels (3,12,13).
Through a combination of shoe-leather approaches and genomic epidemiology, we showed that only two SARS-CoV-2 viral lineages, introduced under very different circumstances approximately 1 month apart, were the cause of the multiple clusters inclusive of and exclusive to a single tribal community in Arizona. The phylogenetic patterns of both B.1.289 and B.1.516 reflect the nature of the virus introduction as well as the community's social structure and the response by public health officials. The rapid testing by the tribe's health care staff and the community's unique response to a positive case (e.g., sending healthy but infected household contacts to stay at a relative's house) may account for the singular transmission paths illustrated by the phylogenies. Initially, public health efforts were just behind the virus spread, likely preventing its expansion into multiple transmission lines. However, these efforts quickly caught up to the pace of transmission and subsequently extinguished both variants.
The introductions of B.1.289 and B.1.516 were estimated by molecular clock analysis to have occurred on 26 March and 20 April 2020, respectively, only 3 and 12 days before the first cases were reported. The tribe's public health team used their "luxury of time" in March, when Arizona's pandemic began but well before the tribe's first cases, to prepare for their response (3,22). The path of B.1.289 was largely linear from the first identified case. This case was easily visible, occurring in a health care worker and within a setting of well-informed individuals. This diagnosis came before individuals infected by this index case could spread it into a larger transmission network. In contrast, the initial cases of B.1.516 were believed to have been contracted simultaneously by congregates in a community "trap house" that then dispersed to other parts of the community, seeding several initial sublineages of this variant. It has been well documented that substance abuse and its association with a disregard for social distancing have been drivers of COVID-19 transmission (33). This hypothesis has strong merit because of the remarkable field epidemiology conducted by the tribal members and the effective population size analysis showing nearly complete viral sampling and allowing for thorough genomic epidemiology. This does not rule out possible cryptic sublineages that went undetected due to lack of sample collection (e.g., the virus spread into an unsampled host population); indeed, virus sequencing efforts significantly slowed all over the world in late summer and fall of 2020. However, subsequent comprehensive statewide SARS-CoV-2 genomic surveillance did not reveal S: H245Y and N: R209I mutations in Arizona or in the global genome database, evidencing extinction of B.1.289 and B.1.516 and refuting the possibility of a lack of sampling.
The defining mutations of these two variants, spike H245Y and Orf1b T67I in B.1.289 and nucleocapsid R209I in B.1.516, though phenotypically uncharacterized, are in regions of the SARS-CoV-2 genome considered hot spots of enhanced growth rate-associated mutations (34). Spike H245 lies in the N-terminal domain (NTD) of the protein, amid polymorphic loci associated with widespread lineages of variants of concern (e.g., A222V in Delta and D253G in Iota). Orf1b T67I falls within nsp12, the RNA-dependent RNA polymerase (RdRp) protein involved in viral replication. The mutations S202R and R203M in the linker region of the nucleocapsid protein potentially increase efficiency of SARS-CoV-2 RNA packaging and enhanced replication in lung epithelial cells (35). R209I of B.1.516 is also in this linker region. The low real-time PCR C T values and low fatality rates observed here are at least partly due to early health care interventions and aggressive contact tracing by local health care workers (3,18), which allowed for sample collection during peak viral loads (i.e., early in the infection, which also allowed for complete genome sequencing), and the high household attack rates may be due to the nature of NA society. Whether the virus lineages' mutation profiles might also contribute to this epidemiology or its exclusivity to a NA population remains to be established.
Transmission of COVID-19 in the community was rapid, and secondary rates of infections among households ranged from 80 to 100% (3). The low fatality rates were driven in large part by the successful, on-ground rapid test-and-treat interventions from local health workers. Despite prevention efforts, COVID-19 has permeated the landscape and successfully spread to some of the most remote communities in the United States. Taken together, the two introductions of SARS-CoV-2 into the WMAT, the confinement of each lineage to this population, and the low case fatality rate given the extreme attack rate in this community are highly noteworthy, especially given the common comorbidities and other factors predisposing NA communities to severe outcomes (3,11,13,18,21,32,36,37). The relatively late introduction of COVID-19 into this region provided time for response preparations, while the involvement of tribal members in the response allowed for a more personalized approach, both of which likely limited the spread and severity of disease (1,3). However, social circumstances unique to tribal populations in Arizona negatively impacted the public health response (3,32); these included (i) frequent interactions among extended family members, friends, and neighbors living in close proximity, as illustrated by clusters 1 and 2, and (ii) a number of home-insecure individuals who congregate to engage in shared substance abuse activities, as illustrated by cluster 3. The rapid and aggressive contact tracing efforts and immediate clinical support activities in this community are an exemplary paradigm for public health response efforts everywhere (1,3) and demonstrate the resilience of NA communities (13), especially given the inherent challenges in health care and response on NA reservations (10,14,15,32). Ensuring highly effective field epidemiology in conjunction with rapid genomic sequencing and analysis can better prepare public health and tribal health agencies to respond effectively to future emerging pathogens.

MATERIALS AND METHODS
Ethics approval. This work was conducted in collaboration with the tribal health department as a public health surveillance activity, involving genomic sequencing and analysis of deidentified remnant biospecimens, and therefore is exempt from needing human subject research board approval. The White Mountain Apache Tribe health board and council approved publication of this work.
Specimen collection, processing, and testing. NP swabs were collected at health care facilities serving the WMAT or in the field during contract tracing investigations. Specimens were submitted in phosphate-buffered saline or viral transport media and tested at the TGen North Clinical Laboratory (TNCL) or ASPHL. RNA was extracted from all NP swab medium samples using the Quick Viral kit (Zymo Research). TNCL samples were run on three rRT-PCR assays to diagnose COVID-19 using Reliance one-step multiplex supermix with 450 nM primer and 150 nM probe on the CFX96 real-time PCR detection system (Bio-Rad). The three rRT-PCR assays target the SARS-CoV-2 nucleocapsid gene (TG-N2_F, TTCAGCGTTCTTCGGAATGTC; TG-N2_R, TGGCACC  TGTGTAGGTCAAC; TG SARS-CoV-2 genome sequencing and analysis. Methods used for sample preparation and genome sequencing were described previously (19,30). RNA was treated with DNase I (Zymo Research) and prepared for either total RNA sequencing or targeted amplicon sequencing. Total RNA was prepared with either the SmartSeq Stranded kit (TaKaRa) or the Ovation Solo kit (Nugen) without a fragmentation step and sequenced on a v2 2 Â 151 high-output kit on a NextSeq system (Illumina). For targeted virus genome sequencing, cDNA was synthesized according to the nCoV-2019 sequencing protocol v2 (https://doi.org/10.17504/protocols .io.bdp7i5rn). Tiled PCR was performed with the nCov-2019/V3 primers (39) using Q5 hot start master mix (New England Biolabs [NEB]), and the amplicons were purified with Ampure XP beads (Beckman Coulter). Amplicon libraries were prepared with either Nextera Flex (Illumina) or plexWell 384 (seqWell) and sequenced with a v3 2 Â 300 kit on a MiSeq system (Illumina).
Virus genome consensus sequences were built using the Amplicon Sequencing Analysis Pipeline (ASAP) as described previously (19,30). First, adapters were trimmed from reads using bbduk (http://jgi.doe .gov/data-and-tools/bb-tools/) and mapped to the Wuhan-Hu-1 genome (GenBank accession no. MN908947) with bwa mem (40) using local alignment with soft clipping. BAM files were then processed to generate a consensus sequence and metrics on the quality of the assembly according to the following: (i) individual base calls with a quality score below 20 were discarded; (ii) remaining base calls at each position were tallied; (iii-a) if coverage was $10Â and $80% of the reads agreed at a locus, a consensus call was made; (iii-b) if either of these parameters was not met, an "N" call was made; and (iv) gaps in coverage (usually the result of a missing COVID-19 Genomic Epidemiology in the WMA Tribe mSphere amplicon) were filled in with a lowercase "n" to preserve alignment with the reference genome. Only consensus genomes covering at least 90% of the breadth of the reference genome with an average depth of $30Â were used in subsequent analyses. Genomes were uploaded into GISAID (41); accession numbers are listed in Table S1. Viruses were typed with Pangolin (Pango v.4.0.4 PLEARN-v1.3) (27). Phylogenetic analyses were conducted in Nextstrain (42), supplemented with genomes from GISAID (41) selected by the genome sampler (43) for context, and with BEAST 1.10.5 (44) to estimate the timing of relevant events. Substitution model selection was carried out with IQ-Tree (45), where GTR1F1I best fitted the data set. To determine the best-fitting clock and demographic model combinations for the data, the generalized stepping stone marginal likelihood estimator (46) was employed to compare the strict or uncorrelated lognormal (UCLN) (47) clock models combined with the exponential or Bayesian Skygrid demographic (48) models. The four model combinations were each run for 100,000,000 generations, whereas Markov chains were sampled every 10,000 generations. For both the B.1.289 and B.1.516 analyses, the combination of UCLN clock and Bayesian Skygrid outperformed the other three (Fig. S1). Using the UCLN Bayesian Skygrid model, three additional chains for 100,000,000 generations were run, with sampling every 10,000. Tracer v1.7.2 (49) was used to find convergence within and among chains. LogCombiner 1.10.4 was used to merge the four different chains, discarding the first 30% as burn in (30,000,000 generations per chain) and resampling every 30,000 generations. The resulting file from each data set was input into TreeAnnotator to produce a maximum clade credibility tree, which was visualized using FigTree v1.4.4 (50).
Data availability. The genomic data presented are available through GISAID, a public repository (https://www.gisaid.org/). Accession IDs linking to full sequence records and sample metadata are included in Table S1, and additional data are available upon request.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.