Epistatic interactions between the high pathogenicity island and other iron uptake systems shape Escherichia coli extra-intestinal virulence

The intrinsic virulence of extra-intestinal pathogenic Escherichia coli is associated with numerous chromosomal and/or plasmid-borne genes, encoding diverse functions such as adhesins, toxins, and iron capture systems. However, the respective contribution to virulence of those genes seems to depend on the genetic background and is poorly understood. Here, we analyze genomes of 232 strains of sequence type complex STc58 and show that virulence (quantified in a mouse model of sepsis) emerged in a sub-group of STc58 due to the presence of the siderophore-encoding high-pathogenicity island (HPI). When extending our genome-wide association study to 370 Escherichia strains, we show that full virulence is associated with the presence of the aer or sit operons, in addition to the HPI. The prevalence of these operons, their co-occurrence and their genomic location depend on strain phylogeny. Thus, selection of lineage-dependent specific associations of virulence-associated genes argues for strong epistatic interactions shaping the emergence of virulence in E. coli.






analyse the region of genome plasticity. Mafft v7.31.0 was used to generate multiple alignments of some virulence-associated genes (VAGs). The R package "ape" v5.1 was used to compute patristic distances from the phylogenetic tree of the virulence-associated genes. The R package "phytools" was used to correct for phylogenetic structure using Pagel's model when searching for associations between VAGs at the whole-species level, and for associations between the inactivation of a given VAG and the presence/absence of other VAGs.

Study description
The study aims to decipher the genetic determinants of virulence at both the clonal (CC87) and species levels. To this end, it used a mouse model of sepsis coupled with a genome-wide association study to identify the most relevant genes. Particular emphasis is placed on genes related to iron acquisition, in terms of prevalence, co-occurrence and genomic location (i.e. chromosome or plasmid).

Research sample
A first dataset consists of 232 strains belonging to the clonal complex 87 (CC87). This dataset gathers strains and genomes from several origins (described in Supplementary Data 1) as well as genomes obtained from a previous study by Reid et al. (PMID: 35115531). This dataset was used because it is representative of the whole diversity of the CC87, including both ST58 and ST155 strains. The second dataset is composed of 370 genomes representative of the genus Escherichia and has been previously described (PMID: 33112851). Finally, the third dataset consists of the complete E. coli genomes available on RefSeq on September 19, 2022. This dataset was used because it is composed of high-quality circularized genomes (chromosome +/- plasmids) from strains belonging to the main E. coli phylogroups.
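The patristic distance between two tips of a phylogenetic tree, which the study computed with the R package "ape", is the sum of the branch lengths along the path joining them. The following is a minimal Python sketch of that definition, not the authors' code. The toy tree, the tip names (`geneA`, `geneB`, `geneC`) and the branch lengths are hypothetical and serve only to illustrate the calculation.

```python
from itertools import combinations

# Hypothetical toy tree, stored as child -> (parent, branch length).
# Names and lengths are illustrative only, not taken from the study.
TREE = {
    "n1": ("root", 0.10),
    "n2": ("root", 0.30),
    "geneA": ("n1", 0.05),
    "geneB": ("n1", 0.07),
    "geneC": ("n2", 0.02),
}


def path_to_root(node, tree):
    """Return {ancestor: cumulative distance from node}, including node itself."""
    dist, out = 0.0, {node: 0.0}
    while node in tree:
        parent, length = tree[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic(a, b, tree):
    """Sum of branch lengths on the path between tips a and b."""
    up_a = path_to_root(a, tree)
    up_b = path_to_root(b, tree)
    # Among shared ancestors, the most recent one minimises the summed distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


tips = ["geneA", "geneB", "geneC"]
distances = {(a, b): patristic(a, b, TREE) for a, b in combinations(tips, 2)}
```

In this toy tree, `geneA` and `geneB` share the internal node `n1`, so their distance is the sum of their two terminal branches, whereas the path from either of them to `geneC` passes through the root.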
No sample size calculation was performed. The genomes from the first dataset (232 CC87 genomes) were obtained from various strains as described in Supplementary Data 1. These strains were chosen because they were diverse in terms of sequence type (ST58 or ST155), source, host and pathotype. No data were excluded from the analysis.
Mouse sepsis assays were performed 10 times for each tested strain. Negative and positive controls were included in each experiment. In all cases, we obtained similar results for the controls: mice were killed only by CFT073, at the same time after inoculation (+/- 4 h).
The genomes from the CC87 were grouped based on their phylogenetic history, in congruence with sequence types. The complete genomes from RefSeq were grouped based on phylogroups and sequence types.
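Once genomes are grouped by phylogroup or sequence type, the prevalence and co-occurrence of iron-uptake operons can be tabulated per group. The sketch below shows one simple way to do this in Python. It is not the authors' pipeline: the strain identifiers and gene contents are invented, and the function name `summarise` is hypothetical. Unlike the analysis with Pagel's model mentioned in the software description, this raw tabulation does not correct for phylogenetic structure.

```python
from collections import defaultdict

# Hypothetical presence/absence records: (strain, phylogroup, set of operons).
# Strain names and gene contents are invented for illustration.
STRAINS = [
    ("s1", "B1", {"HPI", "aer", "sit"}),
    ("s2", "B1", {"HPI", "sit"}),
    ("s3", "B1", set()),
    ("s4", "B2", {"HPI", "sit"}),
    ("s5", "B2", {"HPI"}),
]


def summarise(strains, gene_a, gene_b):
    """Per-group counts: strains, carriers of each gene, and carriers of both."""
    summary = defaultdict(lambda: {"n": 0, gene_a: 0, gene_b: 0, "both": 0})
    for _, group, genes in strains:
        row = summary[group]
        row["n"] += 1
        row[gene_a] += gene_a in genes
        row[gene_b] += gene_b in genes
        row["both"] += gene_a in genes and gene_b in genes
    return dict(summary)


table = summarise(STRAINS, "HPI", "sit")
for group, row in sorted(table.items()):
    prevalence = row["HPI"] / row["n"]
    print(group, row, f"HPI prevalence = {prevalence:.2f}")
```

Counting carriers of both operons separately from carriers of each one makes it easy to see whether co-occurrence differs between groups, which is the kind of lineage-dependent pattern the study examines.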
The mice were inoculated in a blind experiment by a zootechnician who was unaware of the status of the strain.
Female OF1 mice weighing 14-16 g (4 weeks old) from Charles River (L'Arbresle, France) were used for the mouse sepsis assay. Housing conditions for the mice complied with French law, with a dark/light cycle and constant ambient temperature (21°C +/- 2°C) and humidity (50% +/- 10%).
The study did not involve wild animals. Only female mice were used.
The study did not involve animals collected from the field.