Population genomics of Bacillus anthracis from an anthrax hyperendemic area reveals transmission processes across spatial scales and unexpected within-host diversity

Genomic sequencing has revolutionized our understanding of bacterial disease epidemiology, but remains underutilized for zoonotic pathogens in remote endemic settings. Anthrax, caused by the spore-forming bacterium Bacillus anthracis , remains a threat to human and animal health and rural livelihoods in low- and middle-income countries. While the global genomic diversity of B. anthracis has been well-characterized, there is limited information on how its populations are genetically structured at the scale at which transmission occurs, critical for understanding the pathogen’s evolution and transmission dynamics. Using a uniquely rich dataset, we quantified genome-wide SNPs among 73 B. anthracis isolates derived from 33 livestock carcasses sampled over 1 year throughout the Ngorongoro Conservation Area, Tanzania, a region hyperendemic for anthrax. Genome-wide SNPs distinguished 22 unique B. anthracis genotypes (i.e. SNP profiles) within the study area. However, phylogeographical structure was lacking, as identical SNP profiles were found throughout the study area, likely the result of the long and variable periods of spore dormancy and long-distance livestock movements. Significantly, divergent genotypes were obtained from spatio-temporally linked cases and even individual carcasses. The high number of SNPs distinguishing isolates from the same host is unlikely to have arisen during infection, as supported by our simulation models. This points to an unexpectedly wide transmission bottleneck for B. anthracis , with an inoculum comprising multiple variants being the norm. Our work highlights that inferring transmission patterns of B. anthracis from genomic data will require analytical approaches that account for extended and variable environmental persistence, as well as co-infection.

Each isolate is attributed to a carcass ID, with the sample type indicated following the underscore (B = blood, D = soil, I = insect, S = swab, T = tissue). A number follows the sample type if more than one isolate was sequenced from the same sample.    Table S4. Identical or nearly identical isolates from carcasses sampled several months apart.

Study area
The population of the Ngorongoro Conservation Area (NCA) is comprised mostly of Maasai pastoralists. The size was based on a population estimate of just over 70,000 inhabitants in 2012 and an annual growth rate of 2.7%. Given this area's conservation status, in addition to livestock, people live in close proximity with various wildlife species.

Research and ethical approval
Material and data transfer agreements were established as part of the research approvals. This study complied with Tanzania's national access measures for genetic material. Approval and permission to access communities were also obtained from relevant local authorities. Verbal and/or written informed consent was obtained from all owners of livestock sampled, with verbal consent obtained in lieu of written consent where participants preferred. Both verbal and written consent had been approved by the ethical committees.

Sample collection and handling during shipping
Anthrax was suspected in livestock for any acute mortality where the animal had appeared healthy until the time of death. Terminal bleeding from natural orifices was variably observed.
Samples were triple-bagged, with the inner container first disinfected with 10,000 ppm sodium hypochlorite reconstituted from Haz-Tab tablets (Guest Medical, UK). GPS coordinates were taken for samples collected later in the study; for earlier samples, coordinates were estimated by having the field team identify the sampling location on a map of the study area and subsequently plotting these points using Google Earth. When available, GPS coordinates are provided in with the isolate metadata to three decimal places to ensure participant anonymity.
Aliquots from a total of 278 samples from 109 carcasses were shipped to the University of

Swabs
The dry swabs were moistened with 350 µL Dulbecco's saline (fortified with additional 1 M calcium carbonate) for 24 hours. The swabs were softened by the saline solution and the solute took on the appearance of diluted blood after vortexing.

Blood
Blood samples were diluted in 250 µL PBS and were heat treated at 65 °C for 10 minutes to inhibit competition from heat-sensitive bacteria.

Insects
The insects were identified under a stereoscope while being placed in sterile 1.5 mL Eppendorf tubes filled with 350 µL PBS. A homogenizing pestle (Lasec, South Africa) was used to homogenize the sample for plating.

Sub-Culture and DNA extraction
Only grey-white, ground-glass textured, non-hemolytic colonies were selected for subculture from sheep blood agar (SBA) (within 24 hours) and dome shaped, rough white colonies were selected from PET (within 48 hours). Differentiating morphology characteristics among suspect isolates included (i) pronounced "medusa head" edges around the colony forming unit; (ii) an uncharacteristic tacky texture when touched with the bacteriologic loop; and (iii) a larger colony diameter than other colony forming units on the same plate.
Once pure, single colonies could be isolated after passage, species was confirmed using a 10 µg penicillin disc (Oxoid) and 10 µL of diagnostic Gamma phage (5x10 9 pfu/mL); colonies were identified as B. anthracis when demonstrating both penicillin and γ-phage sensitivity after overnight incubation at 37°C, as well as testing positive by qPCR for the protective antigen pag gene (BAPA) [3]. DNA extraction for qPCR confirmation was performed by harvesting all material on the purity plate in 5 mL of PBS and crude boiled at a 110 °C for 15 minutes. The boiled solution was then centrifuged at 6000 G for 30 min, reserving the supernatant.
To extract genomic DNA for sequencing, a loopful of bacterial colonies from the purity plates was used as the input for the Bioline Isolate II Genomic DNA kit (Meridian Biosciences, UK) with 2 hour 37°C incubation in 20 mg/mL lysozyme (Fluka) solution according to the manufacturer's instructions for Gram-positive bacteria.

qPCR
The qPCR for BAPA [3][4][5]  for 30 second, slope 20 °C/second [3,5]. Plates and CFU that were negative for both BAPA as well as phage sensitivity were excluded for final nucleic acid extraction selection.

Additional quality control
DNA was extracted from 96 isolates and shipped to the UK for further quality control and subsequent sequencing. DNA extracts from 75 isolates from 33 carcasses were submitted for sequencing; these were selected based on having sufficiently high concentration as measured by

Simulation modelling
One hundred simulations running for 25 generations were performed for inoculum sizes one, two, five, and ten. Due to computational restrictions, only 50 simulations running for 25 generations were performed for dose 20. One hundred simulations running for 20 generations were performed for doses 50 and 100, while an additional 7 simulations with an initial dose of 50 were run for 25 generations.

Bacterial Isolation
Of the 278 specimens processed for culture, 122 yielded suspect B. anthracis colonies, while the remainder had almost no growth on either the B. anthracis selective and non-selective media.
After testing the suspect colonies with penicillin, Gamma phage and qPCR for the protective  [8,9]. The role of biting flies as a source of B. anthracis infection is less well defined [3].

Sequence quality
GC content of the raw reads was unexpectedly low (<34; n=1) or high (>36; n=5), and/or total de novo assembly length was above the expected length of ~5.5 MB (n=13), and/or there were a high number of contigs (>1000; n=11), indicative of challenges with assembly that could be related to mixed culture. A total of 16 isolates had one or more of these issues (see isolate metadata). Strict filtering criteria were implemented during reference-based mapping to address this issue.

Simulation modelling
After 20 generations and across 100 simulations, a population with a starting dose of 1 (n = Across inoculum sizes, the proportion of genomes identical to the infecting dose in our simulations tended to remain high, and while genomes differing by a maximum of 5 (doses 1 and 2) and 6 (doses 5 -100) SNPs were observed, these were incredibly rare and therefore unlikely to be sampled; the majority of variant genomes differed from the inoculum genome by a single SNP. Greater variability around the average proportion of the population with different numbers of SNPs was observed in populations simulated from the smaller inoculum sizes due to increased impact of stochasticity in early generations. For example, in a single simulation with an inoculum size of 1, a mutation emerged in the first replication and therefore no genomes identical to the infecting dose were observed in further generations (Fig. S7-A), while in some simulations initiated with an infectious dose of 2, two major variants separated by a single SNP were both present at fairly even proportions (Fig. S8-B).
Proportions of sampled pairs of genomes from the simulated populations with different numbers of SNP differences are summarized in Table S5, which also shows the mean SNP distance averaged across sampled pairs. This mean SNP distance ranged from 0.15 (dose = 1) to 0.17 (dose = 100). While pairwise distances ranging from 0 to 4 were observed, this only included 33 observations of 3 SNP distance (0.05%) and 5 observations of 4 SNP distance (0.008%).
The relationship between higher dose and higher mean SNP distance was statistically significant (p < 0.005) though the estimated effect size was small (+0.0014 per extra 10 genomes in dose) and resulted in only a slight increase in the chance of drawing pairs of genotypes with greater SNP distances. For example, the proportion of samples with SNP distances of 2 or more increased only from 1.22% with an inoculum size of 1, to 1.37% with an inoculum of 100 genomes.
At least 50 simulations initiated with inoculums of 1, 2, 5 10 and 20 bacteria were run for an additional 5 generations. The mean SNP distance between pairs sampled in generations 20 to 25 tended to increase slightly (Fig. 6D), by around 0.0085 per generation (p < 0.0001). This gradual increase resulted in the proportion of samples with a SNP distance of 2 rising from 1.15% in generation 20 to 1.74% in generation 25. In 270,000 pairs sampled in generations 21 to 25, a SNP distance of 4 was observed only 8 times and a SNP distance of 5 was sampled once (in a population simulated from an infectious dose of 10 sampled in the 24 th generation). Such an observation is incredibly rare as it requires, in the most likely scenario, sampling 2 genotypes differing from the inoculum by 2 and 3 non-shared mutations.