Multidrug Resistance Dynamics in Salmonella in Food Animals in the United States: An Analysis of Genomes from Public Databases

ABSTRACT The number of bacterial genomes deposited each year in public databases is growing exponentially. However, efforts to use these genomes to track trends in antimicrobial resistance (AMR) have been limited thus far. We used 22,102 genomes from public databases to track AMR trends in nontyphoidal Salmonella in food animals in the United States. In 2018, genomes deposited in public databases carried genes conferring resistance, on average, to 2.08 antimicrobial classes in poultry, 1.74 in bovines, and 1.28 in swine. This represents a decline in AMR of over 70% compared to the levels in 2000 in bovines and swine, and an increase of 13% for poultry. Trends in resistance inferred from genomic data showed good agreement with U.S. phenotypic surveillance data (weighted mean absolute difference ± standard deviation, 5.86% ± 8.11%). In 2018, resistance to 3rd-generation cephalosporins in bovines, swine, and poultry decreased to 9.97% on average, whereas in quinolones and 4th-generation cephalosporins, resistance increased to 12.53% and 3.87%, respectively. This was concomitant with a decrease of blaCMY-2 but an increase in blaCTX-M-65 and gyrA D87Y (encoding a change of D to Y at position 87). Core genome single-nucleotide polymorphism (SNP) phylogenies show that resistance to these antimicrobial classes was predominantly associated with Salmonella enterica serovar Infantis and, to a lesser extent, S. enterica serovar Typhimurium and its monophasic variant I 4,[5],12:i:−, whereas quinolone resistance was also associated with S. enterica serovar Dublin. Between 2000 and 2018, trends in serovar prevalence showed a composition shift where S. Typhimurium decreased while S. Infantis increased. Our findings illustrate the growing potential of using genomes in public databases to track AMR in regions where sequencing capacities are currently expanding. IMPORTANCE Next-generation sequencing has led to an exponential increase in the number of genomes deposited in public repositories. This growing volume of information presents opportunities to track the prevalence of genes conferring antimicrobial resistance (AMR), a growing threat to the health of humans and animals. Using 22,102 public genomes, we estimated that the prevalence of multidrug resistance (MDR) in the United States decreased in nontyphoidal Salmonella isolates recovered from bovines and swine between 2000 and 2018, whereas it increased in poultry. These trends are consistent with those detected by national surveillance systems that monitor resistance using phenotypic testing. However, using genomes, we identified that genes conferring resistance to critically important antimicrobials were associated with specific MDR serovars that could be the focus for future interventions. Our analysis illustrates the growing potential of public repositories to monitor AMR trends and shows that similar efforts could soon be carried out in other regions where genomic surveillance is increasing.

We searched for all available Salmonella enterica assemblies recovered from food-animals 33 released until the end of 2018 in three public genomic data repositories: the National Center 34 for Biotechnology Information (NCBI) Nucleotide database(1), EnteroBase (2)  Metadata tables were imported into R (version 3.6.0)(5). Manipulation of metadata was 44 performed with the tidyverse(6) (version 1.3.0), data.table(7) (version 1.12.8) and plyr (8)  45 (version 1.8.6) packages. 46 All entries that did not report a country of origin or geographic coordinates were removed. 47 Thereafter, we inspected isolation sources and to identify food-animal key words that allowed 48 to reduce the datasets, but would maximize the number of hits. The resulting filtered datasets 49 were then manually curated to exclude entries that did not meet the criteria of food-animal. 50 We considered four levels of data aggregation for host attribution: • Source Niche: highest level of aggregation and indicates whether samples were 52 recovered from food-products (Food) or from the animals themselves (Poultry or 53 Livestock); 54 • Generic Host: aggregation within animal-production group such as poultry, swine, 55 bovine, ovine and caprine. The categories dairy, meat and environment were also 56 introduced when no specific animal was given. The category environment denotes 57 food-animal-related samples not collected directly from the animal or their food-58 products such as drag swabs, poultry litter, eggshells, animal bedding and barns; 59 • Source Type: indicates the specific animal from which the samples were collected; 60 • Source Details: contains the original sample description as input by the submitter. 61 62 Geographic coordinates were retrieved from metadata tables when available. Coordinates 63 expressed in cardinal directions were converted to decimal degree. For entries without 64 coordinates, an addresse was constructed based on the available information of the isolation 65 location (country, province, state, region, city, zip code, etc.). We queried addresses for their 66 decimal degree coordinates with the geocode function of the ggmap package(9) (version 67 3.0.0). Assemblies returning no coordinates were inspected manually and queried in Google 68 Earth Pro(10). A column with country's three-letter code based on the ISO 3166-1 guidelines 69 was also assigned. 70 Isolation dates were harmonized according to the ISO 8601 format (year-month-day) using 71 lubridate (11) and anytime(12) packages. A dedicated column for year of isolation was also 72 created. Finally, the NCBI BioSample and BioProject (when available) were also kept. 73 74 1.3.

Creation of a Consensus Dataset and Assembly Download 75 76
We used the BioSample identifier to compare entries across databases and created a 77 consensus dataset by removing duplicate entries. We primarily kept entries from EnteroBase 78 given the dedicated pipeline this database has towards short-read sequences assembly, 79 quality control, and molecular typing). Then we retrieved data from PATRIC and finally from 80 NCBI RefSeq(13). EnteroBase derived assemblies were kindly provided by the curators, 81 PATRIC assemblies were downloaded through the PATRIC Command Line Interface (14) phenotypes were assigned based on the gene with the best alignment score, but with a 96 minimum of 97% identity. When no matches were found in CARD, we used NCBI's BLAST (16)  97 instead. For the matches with the highest identity and coverage, we inspected the referred 98 manuscripts where such ARG and their respective resistance phenotypes were described. 99 Finally, for ß-lactamase genes, we added cephalothin manually to the Phenotype column. This 100 is because early generation cephalosporins were not included in the ResFinder phenotype list, 101 although TEM-types(17), AmpC(18), OXA-Types(19) hydrolyze these ß-lactams. We removed 102 aac(6')-Iaa from the dataset as this gene has been described as intrinsic to S. enterica and 103 does not cause phenotypic resistance (20,21). 104 In the case of point mutations, we only kept those that have known resistance phenotype in 105 the PointFinder database (22), which can extracted directly from the staramr output(23). 106 107 3. Multidrug Resistance Score Calculation 108 109 We devised a metric to summarize multidrug resistance based on the number of different 110 classes of antimicrobials an isolate is predicted to have resistance based on its content in 111 ARGs (acquired ARGs or point mutations). We call this metric the Multidrug Resistance Score 112 (MDR Score). We based this metric on microbiological resistance. In brief, microbiological 113 resistance is identified when the minimal inhibitory concentration (MIC) is above the 114 epidemiological cut-off (ECOFF) value(24) -the highest MIC for organisms devoid of 115 phenotypically detectable acquired resistance mechanisms(25). We assume that all identified 116 ARGs are functional and thus resulting MICs would be above the ECOFF. For the majority of 117 ARGs, the phenotype would not be affected. However, it could affect the predicted 118 phenotype ß-lactamase genes since for some variants the amino acid changes result in 119 different resistance phenotypes(26). Although only 0.44% of the ß-lactamase genes had a 120 coverage and identity below 100% in our study.
To calculate the MDR Score, we used a list of antimicrobials of clinical importance 122 Enterobacteriaceae in relation to acquired resistance(27). We used cephalothin as a surrogate 123 for cefazolin since they are both early generation cephalosporins. 124 The MDR score was computed as follow: 125 • For each genome, the unique predicted resistance phenotypes were identified, and 126 antibiotics were grouped into the different molecular classes: • The MDR Score will increase by one when an antibiotic is assigned to one of the 145 described molecular classes. If more ARGs confer resistance to the same molecular 146 class, the MDR score still only increases by one. 147

Model Weights Calculation 168
For the temporal trend analysis of resistance, we need to weight the observations relative to 169 their representativeness in our dataset. To achieve this, we weighted all observations by the 170 countries' Population Correction Unit (PCU) for each host (expressed as proportion) and 171 corresponding isolation year times the proportion of genomes contributed by a given a 172 country for a given year. We calculated PCU as described by Tiseo and colleagues (28) for all 173 countries as follows: 174 where Ank,s is the number of animal type, k, for each production system, s (intensive or 176 extensive), in each country; nk,s is the number of production cycles for each animal type in 177 each production system; Yk is the quantity of meat in each country for each animal type; and 178 RCW/LW,k is the carcass weight to live weight ratio for each animal type. The PCU allows for 179 direct comparisons of animals raised for food in across countries. For some countries, PCU 180 data was unavailable before 1999 for Belgium, before 1991 for Belarus, before 1992 for Czech 181 Republic and Slovakia before, and before 1991 for Belarus, Croatia, Estonia, and Lithuania. In 182 addition, no PCU data existed prior to 1985. In such cases, we assigned the PCU value 183 corresponding to the earliest available year in the time series. PCU data can be found in 184 Supplementary Dataset S1.  Tables  308  309  310  Table S1.    Year -isolation year; Generic_Host -animal host; Phenotype -antimicrobial class; Prevalence 353 -fitted prevalence; low_CI -lower bound of 95% confidence interval; upp_CI -upper bound 354 of 95% confidence interval. signif -wether covariate "Year" was statistically significant or not; 355

IV.
Legend for Dataset S1 370 371 372 Dataset S1. PCU data for all countries between 1985 and 2018. Each column corresponds to 373 a food-animal/year combination. "Ca" refers to bovine, "Ch" refers to poultry, and "Pg" refers 374 to swine. 375