Pathogen diversity and antimicrobial resistance transmission of Salmonella enterica serovars Typhi and Paratyphi A in Bangladesh, Nepal, and Malawi: a genomic epidemiological study

Summary Background Enteric fever is a serious public health concern. The causative agents, Salmonella enterica serovars Typhi and Paratyphi A, frequently have antimicrobial resistance (AMR), leading to limited treatment options and poorer clinical outcomes. We investigated the genomic epidemiology, resistance mechanisms, and transmission dynamics of these pathogens at three urban sites in Africa and Asia. Methods S Typhi and S Paratyphi A bacteria isolated from blood cultures of febrile children and adults at study sites in Dhaka (Bangladesh), Kathmandu (Nepal), and Blantyre (Malawi) during STRATAA surveillance were sequenced. Isolates were charactered in terms of their serotypes, genotypes (according to GenoTyphi and Paratype), molecular determinants of AMR, and population structure. We used phylogenomic analyses incorporating globally representative genomic data from previously published surveillance studies and ancestral state reconstruction to differentiate locally circulating from imported pathogen AMR variants. Clusters of sequences without any single-nucleotide variants in their core genome were identified and used to explore spatiotemporal patterns and transmission dynamics. Findings We sequenced 731 genomes from isolates obtained during surveillance across the three sites between Oct 1, 2016, and Aug 31, 2019 (24 months in Dhaka and Kathmandu and 34 months in Blantyre). S Paratyphi A was present in Dhaka and Kathmandu but not Blantyre. S Typhi genotype 4.3.1 (H58) was common in all sites, but with different dominant variants (4.3.1.1.EA1 in Blantyre, 4.3.1.1 in Dhaka, and 4.3.1.2 in Kathmandu). Multidrug resistance (ie, resistance to chloramphenicol, co-trimoxazole, and ampicillin) was common in Blantyre (138 [98%] of 141 cases) and Dhaka (143 [32%] of 452), but absent from Kathmandu. Quinolone-resistance mutations were common in Dhaka (451 [>99%] of 452) and Kathmandu (123 [89%] of 138), but not in Blantyre (three [2%] of 141). Azithromycin-resistance mutations in acrB were rare, appearing only in Dhaka (five [1%] of 452). Phylogenetic analyses showed that most cases derived from pre-existing, locally established pathogen variants; 702 (98%) of 713 drug-resistant infections resulted from local circulation of AMR variants, not imported variants or recent de novo emergence; and pathogen variants circulated across age groups. 479 (66%) of 731 cases clustered with others that were indistinguishable by point mutations; individual clusters included multiple age groups and persisted for up to 2·3 years, and AMR determinants were invariant within clusters. Interpretation Enteric fever was associated with locally established pathogen variants that circulate across age groups. AMR infections resulted from local transmission of resistant strains. These results form a baseline against which to monitor the impacts of control measures. Funding Wellcome Trust, Bill & Melinda Gates Foundation, EU Horizon 2020, and UK National Institute for Health and Care Research.

Typhi and Paratyphi A genomes sequenced in this study [CSV file] Table S2: Logistic regression models testing for association of age and sex with pathogen features.
Table S3: Logistic regression models of severe Typhi infection.
Table S4: Comparison of phenotypic and genotypic assessment of non-susceptibility to antimicrobials for Typhi.
Table S5: Comparison of phenotypic and genotypic assessment of non-susceptibility to antimicrobials for Paratyphi A.

Passive surveillance for enteric fever
The manuscript reports genome sequence data for all Salmonella isolates collected during passive surveillance for enteric fever during the STRATAA study.The full STRATAA protocol is reported in Darton et al. 2017 1 and the primary disease burden results are reported in Meiring et al .2022 2 ; key details are summarised here.
The three study sites in Blantyre (Malawi), Dhaka (Bangladesh), and Kathmandu (Nepal) were selected from locations across Africa and Asia based on known high rates of enteric fever and capacity to deliver a large-scale and logistically complexity study.Each study site demonstrated different epidemiological profiles and historical patterns of enteric fever incidence 1 .Sample size calculations are described in the STRATAA protocol, and the target catchment population size (n=100,000) was powered to estimate typhoid incidence rate through passive surveillance with precision (half-width) of 50% 1 .Baseline and final demographic censuses enumerated a target population around 100,000 at each site including 22,000-26,000 households 2 .Participants residing in these catchment areas were encouraged to attend study health-care facilities for blood culture and treatment if they developed fever.Those presenting with a history of fever for ≥48 h or current temperature of ≥38•0°C were approached for recruitment.For analyses presented here, the inclusion criteria was blood culture confirmed cases derived from separate individuals resident within the census area, collected within the surveillance period.
For Dhaka and Kathmandu the surveillance period was Jan 2017-Dec 2018, and Oct 2016-Sep 2018 for Blantyre.As detailed in Table 1, an additional ten months of surveillance data was available for Blantyre, which was leveraged to better understand these populations, but these data were not included in diversity analyses to prevent bias from a longer sampling frame.

Bacterial isolates and whole genome sequencing
Isolates cultured from blood of febrile individuals recruited into the STRATAA passive surveillance studies at each of the three sites were stored locally until the end of the recruitment period.Antimicrobial susceptibility testing was performed by disk diffusion method for Ampicillin, Azithromycin, Ceftriaxone, Cefixime, Ciprofloxacin, Chloramphenicol, Co-trimoxazole, Nalidixic Acid and Meropenem following either Clinical and Laboratory Standards Institute (CLSI) 3 or British Society of Antimicrobial Chemotherapy (BSAC) breakpoint guidelines (available at: http://www.bsac.org.uk/) as described in reference 2. Subsequently, isolates were cultured overnight and genomic DNA extracted using the Wizard Genomic DNA Extraction Kit following the manufacturers recommendations (Promega, WI, USA).DNA was shipped to the Wellcome Sanger Institute and subjected to indexed whole genome sequencing on an Illumina HiSeq 2500 platform to generate paired-end reads of 100 bp in length, as described previously 4 .
Isolates from n=452/454 cases in Dhaka (99•6% of all culture-positive) from the period January 2017-December 2018 were successfully sequenced and confirmed as Typhi or Paratyphi A (summary in Table 1, genome list in Table S1).From Kathmandu, n=138/164 cases (84•1% of all culture-positive) from the same time period were successfully sequenced and serovars confirmed.From Blantyre, n=83/115 cases (72•2% of all culture-positive) from October 2016-October 2018 were successfully sequenced and typhoidal serovars confirmed.Recruitment continued for an additional 10 months in Blantyre, resulting in total n=141/158 sequenced typhoidal isolates (89•2% of positive cultures).The full set of n=141 sequences from Blantyre are included in the main dataset 'STRATAA', which comprises genome sequence data for n=731 unique typhoidal Salmonella blood-culture isolates (622 Typhi and 109 Paratyphi A, see Table 1).A small number of additional blood-culture isolates were captured at the study sites before or after these formal surveillance periods (n=30 from Dhaka, n=27 from Kathmandu) or outside the boundaries of the surveillance catchment area (n=35 from Blantyre); these were also sequenced and included in phylogenetic trees to provide context (total n=707 Typhi, n=116 Paratyphi A; see Table S1), but were excluded from statistical analyses.

Phylogenomic analysis
Maximum-likelihood (ML) phylogenetic trees were inferred from the aforementioned chromosomal SNV alignments using RAxML (v8.2.8) 26 .A generalised time-reversible model and a Gamma distribution was used to model site-specific rate variation (GTR+Γ substitution model; GTRGAMMA in RAxML) with 100 bootstrap pseudoreplicates 27 used to assess branch support for the ML phylogeny.The resulting phylogenies were visualised and annotated using Microreact 28 and the R package ggtree (v2.2.4) 29 .
As most azithromycin resistance was not explained by known mechanisms (acrB-717 mutation or acquired genes), we screened Ribosomal RNA (rRNA) operons for potential explanatory mutations.A single copy of the rRNA operon (coordinates 4257263-4262892 in the CT18 reference genome) was extracted, and used as a reference sequence for mapping of Typhi reads using RedDog (as described above).Read alignments (BAM format) were subjected to low-frequency variant calling LoFreq (v2.1.3.1) 37, and SNV read-depth data extracted from the resultant variant calling format (VCF) files with the read.vcfR()function from the R package vcfR (v1.12.0) 38 .Read depths for SNVs detected in the rRNA operon were normalised using the average chromosomal read depths for each sequence reported by RedDog across the entire CT18 reference sequence to identify single copy mutations (read depth of ~1x the average reported for the CT18 chromosome).These single copy mutations were examined for an association with observed azithromycin resistance phenotypes, but were found to correlate with the genotype backgrounds in which they occurred and not with resistance.The same approach was used to assess Paratyphi A genomes, where reads were mapped to a single rRNA operon (coordinates 3870557-3876131 in the Paratyphi A AKU_12601 reference genome), but again no evidence of resistance-associated mutations in the rRNA operon were identified.DBGWAS (v0.5.4) 39 was utilised to carry out a bacterial genome-wide association study (GWAS) to screen for genetic loci and/or variants associated with the azithromycin resistance.For this, raw reads were assembled de novo with Unicycler (v0.4.7) 40 and used as input to DBGWAS, along with azithromycin resistance status for each sequence.DBGWAS was run using default parameters and p-value <0•01 interpreted as a significant association; none were identified.

Transmission of molecular determinants of AMR
To determine if resistant enteric fever cases resulted from infection with locally-circulating resistant strains, imported strains, or emerged de novo, we carried out ancestral state reconstruction (ASR) analyses.For each genotype where molecular determinants of AMR were detected, we inferred a ML phylogeny including all available STRATAA genomes plus global contextual sequences of the same genotype (phylogenetic inference methods as detailed above).We then conducted ASR, for each detected AMR determinant and country of origin (Nepal, Bangladesh, Malawi, or other), onto these ML tree topologies using a maximum-parsimony method implemented in the ancestral.pars()function in the R package phangorn (v2.5.5) 41 .The inferred states for each variable (country and AMR determinant) were extracted for each internal node of each tree and used to infer, for each resistant isolate (tree tip), whether the resistance was most likely: (i) inherited from a local resistant strain (parent node and tip share same AMR determinant and location, interpreted as local transmission of a resistant strain); (ii) inherited from an imported resistant strain (parent node and tip share the same AMR determinant but parent node is located in a different country, interpreted as transmission of an imported resistant strain); (iii) recently emerged de novo (parent node lacks the AMR determinant, interpreted as recent emergence of resistance rather than transmission of an established resistant strain).

Logistic regression models
Multivariable logistic regression models were fit using the logistf package (v1.24.1) 42 , which implements Firth's bias-reduced penalized-likelihood logistic regression.Age-group was treated as a categorical variable reflecting groups that have broadly different contact networks relevant to enteric fever transmission/exposure: pre-school age (<5 years and interacting mainly with other household members, reference category in logistic regression), school-age children (5-15 years, interacting with household members and other school-age children), and working age (≥15 years old, interacting with household members and the wider community).Notably, treating age-group categories as ordinal, or treating age (in years) as a continuous variable, gave comparable results in logistic regression models with no significant difference in model fit (assessed with AIC); therefore we report all results using categorical age groups as we consider these more interpretable in terms of enteric fever epidemiology.Other variables (serovar, dominant genotype, MDR, severity) were coded as binary data.The population is mainly acrBwildtype; two STRATAA isolates appear to share an acrB mutation inherited from a common ancestor, consistent with local transmission of an azithromycin-resistant variant; the third acrB mutant is distantly related from others and shows no evidence of transmission of a resistant ancestor.The emergence of the gyrA-S83Y mutation conferring ciprofloxacin non-susceptibility is also highlighted.(D) Tree highlighting the phylogenetic position of the single ciprofloxacin-resistant Typhi 4.3.1.2sequenced from STRATAA surveillance in Dhaka (labelled with arrow), which suggests it was imported into Bangladesh (dark green) from India (light green) where it is believed to have emerged 13,43 .(E) Tree highlighting the phylogenetic position of the single acrB mutant of Typhi 4.3.1.1 sequenced from STRATAA surveillance in Dhaka (labelled with arrow), which suggests it results from local transmission of an acrB mutant (i.e.azithromycin-resistant) clade that has been detected in previous studies from Dhaka 30 (2013-2016).(F) Tree highlighting the phylogenetic position of the single acrB mutant (i.e.azithromycin-resistant) Typhi 3.2.2sequenced from STRATAA surveillance in Dhaka (labelled with arrow); this suggests the mutation arose locally, in the background of a locally-circulating clade with wildtype acrB.This was categorised as a de novo resistance mutation.(G) Tree highlighting the phylogenetic position of the single gyrA-S83Y mutant (i.e.ciprofloxacin non-susceptible) Paratyphi A genotype 2.3 sequenced during STRATAA surveillance in Kathmandu (labelled with arrows).The position in the phylogeny suggests it was imported into Nepal (dark green) from India (light green).(H) Tree highlighting the phylogenetic position of the single gyrA-S83Y mutant (i.e.ciprofloxacin non-susceptible) Typhi genotype 4.3.1.1 sequenced during STRATAA surveillance in Kathmandu (labelled with arrows).The position in the phylogeny suggests it was imported into Nepal (dark green) from India (light green).For panels (C-H), branches are coloured by country of origin and coloured bars indicate samples from this study (STRATAA) by site, country of origin, and AcrB/GyrA status, as per the legend in panel (C).

Figure S2 :
Figure S2: Breakdown of patient sex and Typhi features, within each age-group, at each site, for the main STRATAA dataset.

Figure S3 :
Figure S3: Phylogenetic trees of Typhi and Paratyphi A showing STRATAA isolate sequences in context with publicly available global genome data.

Figure S4 :
Figure S4: Detail of subpopulations discussed in text.

Figure S5 :
Figure S5: Phylogenetic tree of Typhi sequenced from STRATAA surveillance in Blantyre.

Figure S6 :
Figure S6: Phylogenetic tree of Typhi sequenced from STRATAA surveillance in Dhaka.

Figure S7 :
Figure S7: Phylogenetic tree of Typhi sequenced from STRATAA surveillance in Kathmandu.

Figure S8 :
Figure S8: Spatial distribution of zero-SNV clusters from STRATAA surveillance

Figure S1 .
Figure S1.Genetic diversity of enteric fever pathogen populations.(A-C) Epidemic curves for sequenced enteric fever cases in the three STRATAA catchment areas, stratified by pathogen genotype.Bars indicate monthly case counts, coloured to indicate serovar and genotype as per inset legend.*4.3.1.1 in Blantyre is sublineage 4.3.1.1.EA1.Inset labels 'D( )' indicate Simpson's diversity index calculated from pathogen genotype counts for Typhi, Paratyphi A or both ('D(All)').(D-E) Maximum-likelihood phylogenetic trees for STRATAA isolates of serovars Typhi (D) and Paratyphi A (E), inferred from genome-wide single nucleotide variant alignments.Each tree was outgroup-rooted using genomes of the other serovar.Branches are coloured by country as per inset legend, and labelled by genotype.Data in all panels (A-E) represent the main STRATAA dataset (n=622 Typhi and n=109 Paratyphi A; seeTable1).
Figure S1.Genetic diversity of enteric fever pathogen populations.(A-C) Epidemic curves for sequenced enteric fever cases in the three STRATAA catchment areas, stratified by pathogen genotype.Bars indicate monthly case counts, coloured to indicate serovar and genotype as per inset legend.*4.3.1.1 in Blantyre is sublineage 4.3.1.1.EA1.Inset labels 'D( )' indicate Simpson's diversity index calculated from pathogen genotype counts for Typhi, Paratyphi A or both ('D(All)').(D-E) Maximum-likelihood phylogenetic trees for STRATAA isolates of serovars Typhi (D) and Paratyphi A (E), inferred from genome-wide single nucleotide variant alignments.Each tree was outgroup-rooted using genomes of the other serovar.Branches are coloured by country as per inset legend, and labelled by genotype.Data in all panels (A-E) represent the main STRATAA dataset (n=622 Typhi and n=109 Paratyphi A; seeTable1).

Figure S3 .
Figure S3.Phylogenetic trees of Typhi and Paratyphi A showing STRATAA isolate sequences in context with publicly available global genome data.(A) Non-H58 genomes (i.e., excluding 4.3.1 and derived genotypes).(B) Subtree of H58 (4.3.1 and derived) genotypes.A maximum-likelihood tree was inferred for n=3,835 Typhi sequences, including n=707 from this study plus global genomes, based on an alignment of genome-wide single nucleotide variants and outgroup rooted using Paratyphi A. The subtree of 4.3.1 genotypes (panel B) was pruned from the larger tree to aid visualisation.Genotypes are indicated with labels, and branch lengths indicate substitutions per variable site, as per inset scalebar.All branches and rings are coloured as per inset legend.Branch colours and first two rings indicate the geographic origin for the sequences; ring 1 highlights the n=707 sequences from this study.Other rings indicate antimicrobial resistance determinants as labelled.AcrB mutations are associated with azithromycin resistance.QRDR, quinolone resistance determining region.MDR, multidrug-resistant (MDR).XDR, extensively drug resistant.An interactive version of the full tree is available at https://microreact.org/project/wim5TssQ3AqSfgWXTSP2Bd.(C)Paratyphi A genomes.A maximum likelihood phylogeny was inferred in the same manner as for (A) from n=375 genomes and outgroup rooted using Typhi.Branch colours and the first two coloured bars indicate geographic origin of the sequences; bar 1 highlights the n=116 sequences from this study.Other coloured bars indicate antimicrobial resistance determinants as labelled.An interactive version of the full tree is available at: https://microreact.org/project/SgHbIR9cP.

Figure S4 .
Figure S4.Detail of subpopulations discussed in text.Subtrees were extracted from the full tree of STRATAA plus global isolates (shown in Fig.S3) to highlight specific points discussed in the manuscript.(A) Typhi 3.3.2tree, highlighting the two distinct clades identified as being either endemic or imported in Nepal.Arrows indicate the de novo emergence of ciprofloxacin non-susceptibility driven by non-synonymous mutations in gyrA (as labelled).(B) Epidemic curve of Typhi sequenced from STRATAA surveillance in Kathmandu, showing counts of imported and endemic genotype 3.3.2,defined using the tree in panel A. (C) Tree highlighting the phylogenetic positions of three acrB mutant (i.e.azithromycin-resistant) Paratyphi A 2.4.4 sequenced from STRATAA surveillance in Dhaka (labelled with arrows).The population is mainly acrBwildtype; two STRATAA isolates appear to share an acrB mutation inherited from a common ancestor, consistent with local transmission of an azithromycin-resistant variant; the third acrB mutant is distantly related

Figure S5 .
Figure S5.Phylogenetic tree of Typhi sequenced from STRATAA surveillance in Blantyre.Whole-genome maximum-likelihood phylogeny of n=141 Typhi sequences from Blantyre, Malawi (main 'STRATAA' dataset).Tip colours and first column of the heatmap indicate the genotype of the sequenced isolate (as per inset legend and branch labels).Second and third columns indicate presence of quinolone resistance-associated mutations in gyrA, remaining columns indicate presence of repC (a marker of the common multidrug resistant transposon), acquired genes associated with resistance to first-line drugs, and patient age group (coloured as per inset legends).Branch lengths indicate substitutions per variable site, as per the inset scalebar.Interactive phylogeny available at: https://microreact.org/project/beTX7esTof71d6J329iagM

Figure S7 .
Figure S7.Phylogenetic tree of Typhi sequenced from STRATAA surveillance in Kathmandu.Whole-genome maximum-likelihood phylogeny of n=124 Typhi sequences from Kathmandu, Nepal (main 'STRATAA' dataset).Tip colours and first column of heatmap indicate the genotype of the sequenced isolate (as per inset legend and branch labels).'Columns 2-7 indicate presence of quinolone resistance-associated mutations in gyrA and parC, remaining columns indicate presence of repC (a marker of the common multidrug resistant transposon), acquired genes associated with resistance to first-line drugs, and patient age group (coloured as per inset legends).Branch lengths indicate substitutions per variable site, as per inset scalebar.Interactive phylogeny available at: https://microreact.org/project/nJ42r2XHmd7gZCQ1eqQHGT.

Figure S8 .
Figure S8.Spatial distribution of zero-SNV clusters from STRATAA surveillance.Each panel (A-D) shows the spatial distribution of clusters on a map.Filled points on the map each represent a sequenced case, coloured by the zero-SNV cluster as per inset legend in each panel (unfilled points indicate other cases, including unclustered cases or members of smaller clusters).

Table S6 :
Publicly available Typhi and Paratyphi A genome sequences used to provide global context for genotypes detected in STRATAA sequences [CSV file]

Table S7 :
Representative Typhi sequences (one per defined genotype) used to outgroup-root Typhi phylogenies [CSV file]

Table S4 . Comparison of phenotypic and genotypic assessment of non-susceptibility to antimicrobials for Typhi.
Drugs for which phenotype results were available for ≥20% of isolates are shown.Positive predictive value (PPV) and negative predictive value (NPV) of genetic determinants for prediction of resistant phenotypes are shown.Square brackets indicate 95% CI for PPV and NPV.Genetic determinants used as predictors of resistance are listed in Supplementary Methods.

Table S5 . Comparison of phenotypic and genotypic assessment of non-susceptibility to antimicrobials for Paratyphi A.
Drugs for which phenotype results were available for ≥20% of isolates are shown.Positive predictive value (PPV) and negative predictive value (NPV) of genetic determinants for prediction of resistant phenotypes are shown.Square brackets indicate 95% CI for PPV and NPV.Genetic determinants used as predictors of resistance are listed in Supplementary Methods.