Whole-genome sequence analysis of 91 Salmonella Enteritidis isolates from mice caught on poultry farms in the mid-1990s

Abstract
Salmonella enterica serovar Enteritidis (SE), the most commonly reported serovar associated with human salmonellosis, has been frequently associated with poultry farms, eggs, and egg products. Mice are known vectors of SE contamination in these facilities. The objective of this study was to use whole-genome sequencing (WGS) to analyze SE from mice obtained at poultry farms in Pennsylvania. Documenting pathogen diversity can identify reliable biomarkers for rapid detection and speed up outbreak investigations. We sequenced 91 SE isolates from 83 mice (62 spleen isolates, 29 intestinal isolates) caught at 15 poultry farms between 1995 and 1998, using an Illumina NextSeq 500. We identified 742 single nucleotide polymorphisms (SNPs) that distinguished every isolate from every other. Isolates were divided into two major clades, with more SNP differences within Clade B than within Clade A. All isolates containing antimicrobial resistance genes belonged to Subgroup B2. Clade-defining SNPs provided biomarkers distinguishing isolates in 12 individual subgroups, which were separated by farm location or year of collection. Nonsynonymous changes among the clade-defining SNPs offered insight into possible genetic variation among these isolates. For a broader view of SE diversity, we included data from the NCBI Pathogen Detection Isolates Browser, in which subgroups in Clade B formed new SNP Clusters.

Importance
WGS and SNP analyses are powerful tools for investigating SE phylogenies. Identifying the evolutionary relationships among SE isolates from mice, poultry, the environment, and clinical cases, along with patterns of genetic diversity, advances understanding of SE and of the role mice may play in SE contamination and spread among poultry populations. Our data distinguished SE isolates from different farms or years of collection.
Moreover, annotation of the clade-defining SNPs provided information about possible protein functions in the SE isolates of each subgroup. Clade-defining or farm-unique biomarkers can be useful for rapid detection and outbreak investigations.


Introduction
Salmonella enterica serovar Enteritidis (SE) is a long-standing public health concern in the US (1); salmonellosis can result in hospitalization or death among infants, the elderly, and people with compromised immune systems (2, 3). This pathogen has been strongly associated with poultry farms, eggs, and egg products (4, 5). In 2010, SE linked to shell eggs caused an outbreak that required the recall of half a billion eggs (https://www.cdc.gov/salmonella/2010/shell-eggs-12-2-10.html) (6).
One of the challenges in resolving foodborne outbreaks associated with SE is its extreme genomic homogeneity within a given geographic location or ecological system, together with its broad host range (6, 7). Mice are important biological vehicles contributing to SE dissemination and amplification in chicken houses, especially among laying hens (8, 9). In fact, SE has been strongly correlated with rodent activity; chickens in caged housing where mice are present are more likely to carry SE (10). Understanding the evolutionary relationships among SE isolates from mice, poultry, environmental surfaces, and clinical cases is important both for outbreak investigations and for identifying strains with genetic markers for virulence or capacity for rapid host adaptation, such as mutations in the mismatch repair gene mutS that can contribute to rapid evolution in immunocompromised hosts (11).
Whole genome sequencing (WGS) methods have revealed variation among otherwise indistinguishable isolates, including isolates from eggs and egg products (6, 12), SE associated with reptile feeder mice (13), and S. Montevideo from red and black pepper (14).
Genome-wide single nucleotide polymorphisms (SNPs) detected by WGS are considered the most valuable genetic markers for investigating the evolutionary relationships among SE

Phylogenetic analysis
Overview
We identified 742 SNPs and generated a maximum-likelihood phylogenetic tree from these SNPs (Fig 1). Tree tips were labeled with isolate name, source, year, farm, and NCBI Pathogen Detection Isolates Browser SNP Cluster. For example, CFSAN051866 was labeled CFSAN051866_spleen_1996_FarmVII_SCA, indicating that this isolate was recovered from a mouse spleen in 1996, that the mouse came from Farm VII, and that the isolate falls within SNP Cluster A (28), as designated by the NCBI Pathogen Detection Isolates Browser (Table 1). Subgroup names and the number of clade-defining SNPs were labeled on the internal branches. For example, Subgroup B1 had the most clade-defining SNPs (179), while Subgroup A5 had only 6.
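As a rough illustration of what a "clade-defining" SNP count measures, the sketch below counts SNP columns at which every member of a subgroup shares one allele that no isolate outside the subgroup carries. This is a simplified, hypothetical criterion (the function name, input format, and toy data are assumptions), not necessarily the method used to label Fig 1; tree-based approaches that reconstruct ancestral states can count differently, for example when the same site mutates more than once.

```python
from typing import Dict, List, Set


def clade_defining_positions(snps: Dict[str, str], clade: Set[str]) -> List[int]:
    """Return SNP-matrix columns where every isolate in `clade` shares one
    allele and no isolate outside `clade` carries that allele.

    `snps` maps isolate name -> SNP-allele string, with all strings aligned
    to the same SNP positions (a hypothetical input format; missing calls
    are not handled here).
    """
    inside = [snps[isolate] for isolate in clade]
    outside = [seq for isolate, seq in snps.items() if isolate not in clade]
    if not inside or not outside:
        raise ValueError("need isolates both inside and outside the clade")

    positions: List[int] = []
    for pos in range(len(inside[0])):
        alleles_inside = {seq[pos] for seq in inside}
        if len(alleles_inside) != 1:
            continue  # clade members disagree, so this site cannot define the clade
        (allele,) = alleles_inside
        if all(seq[pos] != allele for seq in outside):
            positions.append(pos)
    return positions


# Toy example: four hypothetical isolates typed at four SNP positions.
toy = {"iso1": "ACGT", "iso2": "ACGA", "iso3": "TCGA", "iso4": "TCCA"}
print(clade_defining_positions(toy, {"iso1", "iso2"}))  # [0]
print(clade_defining_positions(toy, {"iso4"}))          # [2]
```

The count reported on a branch would then be the length of the returned list for that subgroup.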

Phylogenetic Tree Construction
We recognized two major clades, A and B, which were further subdivided into 12 subgroups: A1 to A8 and B1 to B4. Notably, all isolates carrying antimicrobial resistance genes belonged exclusively to Subgroup B2. Isolates within each subgroup also differed by varying numbers of SNPs. The maximum SNP differences within Subgroups A1 and A5 were 33 (between CFSAN063779 and CFSAN063803) and 27 (between CFSAN063788 and CFSAN063792), respectively, whereas the maximum in Subgroup A3 was only 6 SNPs (between CFSAN051856 and CFSAN051861). Subgroups A1 and A2 were the two largest subgroups, containing 18 and 17 isolates, respectively. Subgroup B3, with only two isolates, was the smallest group in the tree.
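The within-subgroup maxima above (for example, 33 SNPs in Subgroup A1) are the largest pairwise SNP distances among a subgroup's isolates. A minimal sketch of that calculation follows, assuming each isolate is represented by a string of alleles at the shared SNP positions with no missing calls; the function names and toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(a: str, b: str) -> int:
    """Number of SNP positions at which two aligned allele strings differ."""
    if len(a) != len(b):
        raise ValueError("SNP strings must cover the same positions")
    return sum(1 for x, y in zip(a, b) if x != y)


def max_pairwise_distance(snps: Dict[str, str]) -> Tuple[int, str, str]:
    """Largest pairwise SNP distance within a subgroup, plus the pair attaining it."""
    if len(snps) < 2:
        raise ValueError("need at least two isolates")
    best = (-1, "", "")
    # Iterate over every unordered pair of isolates exactly once.
    for (id_a, seq_a), (id_b, seq_b) in combinations(sorted(snps.items()), 2):
        dist = snp_distance(seq_a, seq_b)
        if dist > best[0]:
            best = (dist, id_a, id_b)
    return best


subgroup = {"iso1": "ACGTA", "iso2": "ACGTT", "iso3": "TCGAA"}
print(max_pairwise_distance(subgroup))  # (3, 'iso2', 'iso3')
```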

Impact of Farm
Subgroups differed in their geographic distribution. Some consisted exclusively of isolates from a single farm: for example, all Subgroup A2 isolates were from Farm III, all Subgroup B1 isolates were from Farm X, and Subgroup B3 contained only two isolates, both from Farm I. In contrast, Subgroup A3 contained isolates from Farms V and VII.
Our phylogeny revealed that some isolates from different farms grouped together and were closely related: isolates in Subgroup A3 obtained from Farms V and VII showed few to no SNP differences among them. For example, no SNP differences were observed between CFSAN051854 (CFSAN051854_spleen(m7)_1996_FarmV_SCA) and CFSAN051864 (CFSAN051864_spleen_1996_FarmVII_SCA) (Table S1).
Conversely, some isolates from the same farm were only distantly related and, unsurprisingly, fell into different subgroups. For example, Subgroups B1 and B2 both contained isolates from Farm X, indicating that these isolates were distantly related to the rest of the isolates we sequenced.
There were several cases in which isolates from different farms belonged to the same subgroup: isolates in Subgroup B2, which was defined by 11 clade-defining SNPs, came from Farms VI, X, and XI, and isolates in Subgroup A3 came from Farms V and VII, with only very small SNP differences among them (Table S1).
Although Subgroups A1 and A5 each contained isolates from more than one farm, isolates from the same farm shared common ancestors. Specifically, all Subgroup A1 isolates came from Farms XII and XV; the Farm XII isolates formed a cluster with a common ancestor, and all Farm XV isolates shared another common ancestor.
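Whether a subgroup is confined to one farm (like A2 and B1) or spans several (like A3 and B2) can be tabulated directly from isolate metadata. The sketch below assumes a hypothetical record layout of (isolate, subgroup, farm); the toy records only mimic the pattern described for Subgroups A2 and A3 and are not the study's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def farms_per_subgroup(records: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each subgroup to the set of farms its isolates came from.

    Each record is (isolate_id, subgroup, farm); this layout is hypothetical.
    """
    farms: Dict[str, Set[str]] = defaultdict(set)
    for _isolate, subgroup, farm in records:
        farms[subgroup].add(farm)
    return dict(farms)


def classify_subgroups(farms: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each subgroup as confined to one farm or spread across several."""
    return {
        subgroup: "single-farm" if len(farm_set) == 1 else "multi-farm"
        for subgroup, farm_set in farms.items()
    }


toy_records = [
    ("iso1", "A2", "III"),
    ("iso2", "A2", "III"),
    ("iso3", "A3", "V"),
    ("iso4", "A3", "VII"),
]
print(classify_subgroups(farms_per_subgroup(toy_records)))
# {'A2': 'single-farm', 'A3': 'multi-farm'}
```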

Impact of Isolation Year
Isolates in each subgroup were collected during the same year, with only two exceptions: A1 contained isolates from 1995 and 1996, and A5 contained isolates from 1997 and 1998. Within Subgroup A1, the 1995 isolates grouped together and shared a common ancestor, as did the 1996 isolates. In Subgroup A5, all isolates were collected in 1998 except CFSAN063788, which was from 1997.
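Statements such as "the 1995 isolates shared a common ancestor" amount to checking whether a set of tips is monophyletic, that is, whether some node of the tree has exactly those tips as descendants. A minimal sketch, assuming a rooted tree encoded as nested Python tuples (a hypothetical representation; in practice a tree would be read from a Newick file with a phylogenetics library and rooted first, since maximum-likelihood trees are typically unrooted); the tip names are toy values:

```python
from typing import FrozenSet, List, Set, Tuple, Union

# A rooted tree written as nested tuples; leaves are tip-label strings.
Tree = Union[str, Tuple["Tree", ...]]


def clade_tip_sets(tree: Tree) -> List[FrozenSet[str]]:
    """Return the set of tips below every node of the tree (leaves included)."""
    collected: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            tips = frozenset([node])
        else:
            # An internal node's tips are the union of its children's tips.
            tips = frozenset().union(*(walk(child) for child in node))
        collected.append(tips)
        return tips

    walk(tree)
    return collected


def shares_exclusive_ancestor(tree: Tree, group: Set[str]) -> bool:
    """True if some node's descendants are exactly `group` (a monophyletic group)."""
    target = frozenset(group)
    return any(tips == target for tips in clade_tip_sets(tree))


# Toy subtree: two 1995 tips and two 1996 tips.
toy_tree = (("a_1995", "b_1995"), ("c_1996", "d_1996"))
print(shares_exclusive_ancestor(toy_tree, {"a_1995", "b_1995"}))  # True
print(shares_exclusive_ancestor(toy_tree, {"a_1995", "c_1996"}))  # False
```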

The dissemination of SE via mice, particularly on poultry farms, is considered one of the most serious threats to the poultry industry today (2). Here, we characterized a set of 91 SE isolates that (i) were recovered from two mouse organs that have been associated with dissemination of SE among poultry and hence to humans, (ii) were isolated at 15 farms in Pennsylvania during the mid-1990s, a period from which few mouse-derived SE isolates had previously been sequenced, and (iii) were analyzed in combination with the open access NCBI Pathogen Detection Isolates Browser. Together, these steps allowed us to construct a more nuanced picture of SE dissemination during the 1990s and to identify connections between historic isolates and current SE phenotypes.
Our study demonstrated that WGS not only reliably distinguishes closely related SE isolates from mice and traces a genome back to its farm of origin and year of isolation, but also provides sufficient resolution to distinguish isolates collected from different organs (spleen and intestine) of an individual mouse. In addition, our analyses showed that (i) isolates carrying antimicrobial resistance genes formed a separate subgroup, which could indicate a shared mechanism enabling that trait, (ii) open access WGS databases add broader context to our understanding of selected isolates, and (iii) new clade-defining markers and NCBI Pathogen Detection Isolates Browser SNP Clusters were identified, offering high-resolution tools for outbreak investigations and for the rapid detection of clades associated with particular years or locations.
Our results strongly suggest that distinct SE ecologies can develop on individual farms, although local adaptation is not inevitable.
Farms I and X exemplify this range of possibilities: Farm I yielded heterogeneous isolates, whereas isolates from Farm X were highly similar. SE can spread from one location to another in multiple ways: insects (29), wild birds (30), wild animals (31, 32), and even wind (33) can carry contamination between sites. Among these possible transmission routes, however, mice are ubiquitous pests (8-10), and their behavior may help shape such local ecologies: mice migrate periodically but also defend their territories. Understanding the genetic relatedness between the SE carried by mice and the SE found in veterinary, food, and human sampling will help improve safety and security in the poultry industry.
WGS data identified a subgroup consisting exclusively of isolates carrying antimicrobial resistance genes
Previously, WGS has been used to differentiate drug-resistant S. enterica isolates from different locations, which can exhibit notable differences in resistance-relevant genotypic and phenotypic characteristics (34). Other research has shown that WGS can be valuable in predicting phenotypic resistance in both S. enterica (34, 35) and E. coli (36). In the current study, WGS analyses revealed that all Subgroup B2 isolates carried blaTEM-1 and tetA. It is possible that Subgroup B2 isolates share specific genetic features that permit them to acquire antimicrobial resistance genes via horizontal gene transfer, or that make those genes more likely to be maintained. For example, bacteria with a non-functional Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas system could more readily acquire plasmids carrying antimicrobial resistance genes; possession of a fully functioning CRISPR/Cas system is inversely correlated with antimicrobial resistance in bacteria (37-39).
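The finding that resistance genes occur in every Subgroup B2 isolate and in no other isolate is a two-way set comparison. The sketch below assumes per-isolate gene calls are already available (for example, exported from a resistance-gene screening tool) and uses hypothetical function names and toy data; it does not reproduce the study's screening step.

```python
from collections import defaultdict
from typing import Dict, Set


def amr_counts_by_subgroup(
    subgroup_of: Dict[str, str], amr_genes: Dict[str, Set[str]]
) -> Dict[str, Dict[str, int]]:
    """Count isolates with and without any detected resistance gene per subgroup."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"with_amr": 0, "without_amr": 0})
    for isolate, subgroup in subgroup_of.items():
        key = "with_amr" if amr_genes.get(isolate) else "without_amr"
        counts[subgroup][key] += 1
    return dict(counts)


def amr_exactly_matches_subgroup(
    subgroup: str, subgroup_of: Dict[str, str], amr_genes: Dict[str, Set[str]]
) -> bool:
    """True if the resistance-gene carriers are exactly the members of `subgroup`."""
    carriers = {isolate for isolate, genes in amr_genes.items() if genes}
    members = {isolate for isolate, sg in subgroup_of.items() if sg == subgroup}
    return carriers == members


subgroup_of = {"iso1": "B2", "iso2": "B2", "iso3": "A1"}
amr_genes = {"iso1": {"blaTEM-1", "tetA"}, "iso2": {"blaTEM-1", "tetA"}, "iso3": set()}
print(amr_exactly_matches_subgroup("B2", subgroup_of, amr_genes))  # True
print(amr_counts_by_subgroup(subgroup_of, amr_genes))
# {'B2': {'with_amr': 2, 'without_amr': 0}, 'A1': {'with_amr': 0, 'without_amr': 1}}
```

Checking set equality, rather than only asking whether carriers fall inside B2, captures both halves of the claim: no carriers outside B2, and no B2 isolate without the genes.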

Open access genome databases allow greatly expanded genomic and phylogenetic investigations
In the NCBI Pathogen Detection Isolates Browser, comprehensive data were available for each genome, including up to 40 columns of detail such as WGS run quality, outbreak relatedness, and antimicrobial resistance genotypes. The Browser also assigns cluster ID numbers computed from SNP distances. Although these cluster numbers can change as new data are added to the Browser, this feature allows researchers to quickly identify the isolates most closely related to target isolates, which can help reveal possible connections among clinical cases. The phylogenetic analyses from the Browser were consistent with our phylogenetic tree. Multiple subgroups in the current study formed distinct SNP Clusters containing isolates exclusively from our collection, such as the B3 isolates in SCC. We identified clinical and poultry-related isolates closely related to our isolates, such as SE PNUSAS011122
Finding such genetic identifiers can help rapidly determine outbreak lineages and accurately distinguish highly clonal clades (6). The nonsynonymous changes we identified in this study suggest that a combination of several genetic factors has facilitated the survival and growth of SE, resulting in different contamination risks for each subgroup. For example, icmF, in which we identified a change in Subgroup A1, is part of the Type VI Secretion System, which is known to be required for full virulence in mice (40, 41). Similarly, fimH alleles have been associated with the ability of Salmonella to bind to avian or mammalian cells (42). Despite the clonal structure of SE, isolates vary greatly in their ability to contaminate eggs, a trait that is independent of the phage types to which those isolates belong (15, 43, 44). Heterogeneity in the metabolic profiles of SE isolates might explain this variation in contamination capability (15).
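The Browser's SNP Clusters are built from pairwise SNP distances. As a simplified illustration of distance-threshold clustering (not NCBI's actual pipeline, whose thresholds and details are not described here), the sketch below links any two isolates whose SNP distance is at or below a chosen threshold and reports the connected groups. The threshold, function name, and toy distances are assumptions. Because a single new isolate can bridge two groups, this kind of single-linkage clustering also illustrates one way cluster membership can change as data are added.

```python
from typing import Dict, List, Tuple


def single_linkage_clusters(
    isolates: List[str],
    pairwise: Dict[Tuple[str, str], int],
    threshold: int,
) -> List[List[str]]:
    """Connected components of the graph that links isolates whose SNP
    distance is <= threshold (single linkage), computed with union-find."""
    parent = {isolate: isolate for isolate in isolates}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), distance in pairwise.items():
        if distance <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for isolate in isolates:
        groups.setdefault(find(isolate), []).append(isolate)
    return sorted(sorted(members) for members in groups.values())


isolates = ["a", "b", "c", "d"]
distances = {("a", "b"): 3, ("b", "c"): 10, ("c", "d"): 40, ("a", "d"): 60}
print(single_linkage_clusters(isolates, distances, threshold=12))  # [['a', 'b', 'c'], ['d']]
```

Adding a hypothetical isolate within 12 SNPs of both c and d would merge the two clusters into one, even though c and d themselves remain 40 SNPs apart.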
The accumulation of mutations that affect gene function is a significant part of the process by which S. enterica becomes host-adapted (45). Such host adaptation may well be occurring at some of the farms where we collected SE from local mice, with important consequences for the safety and security of the poultry supply chain. Notably, serovars Enteritidis, Gallinarum, and Pullorum can circulate within the same farm, and sometimes within the same bird, as evidenced by field analyses conducted in South America (46). Therefore, WGS also has potential for detecting evolutionary trends within SE that could threaten the poultry supply chain. Our data also pave the way for research on the poultry-pathogenic serovars S. Gallinarum and S. Pullorum, which diverged independently from an Enteritidis-like ancestor (3, 47, 48).

spleens and intestines. We constructed the phylogenetic tree using 742 single nucleotide polymorphisms (SNPs). All sequenced isolates were divided into two major clades, Clades A and B, which were further grouped into 12 subgroups.

Figure 2.
[Figure image not reproduced. Recoverable branch labels, as subgroup (number of clade-defining SNPs): A2 (19), A3 (23), A4 (14), A5 (6), A6 (31), A7 (13), A8 (10), A3&4&5 (2), A7&8 (14), B1 (179), B2 (11), B3 (23), B4 (16), B3&4 (52).]