Convergence of plasmid architectures drives emergence of multi-drug resistance in a clonally diverse Escherichia coli population from a veterinary clinical care setting

The purpose of this study was to determine the plasmid architecture and context of resistance genes in multidrug resistant (MDR) Escherichia coli strains isolated from urinary tract infections in dogs. Illumina and single-molecule real-time (SMRT) sequencing were applied to assemble the complete genomes of E. coli strains associated with clinical urinary tract infections, which were either phenotypically MDR or drug susceptible. This revealed that multiple distinct families of plasmids were associated with building an MDR phenotype. Plasmid-mediated AmpC (CMY-2) beta-lactamase resistance was associated with a clonal group of IncI1 plasmids that has remained stable in isolates collected up to a decade apart. Other plasmids, in particular those with an IncF replicon type, contained other resistance gene markers, so that the emergence of these MDR strains was driven by the accumulation of multiple plasmids, up to 5 replicons in specific cases. This study indicates that vulnerable patients, often with complex clinical histories, provide a setting leading to the emergence of MDR E. coli strains in clonally distinct commensal backgrounds. While it is known that horizontally-transferred resistance supplements uropathogenic strains of E. coli such as ST131, our study demonstrates that the selection of an MDR phenotype in commensal E. coli strains can result in opportunistic infections in vulnerable patient populations. These strains provide a reservoir for the onward transfer of resistance alleles into more typically pathogenic strains and provide opportunities for the coalescence of resistance and virulence determinants on plasmids, as evidenced by the IncF replicons characterised in this study.


Introduction
E. coli is an important commensal organism and a significant pathogen. It is associated with intestinal and extra-intestinal infections and is a leading cause of urinary tract infections (UTIs) and bacteraemia leading to sepsis (Pitout, 2010).
Beta-lactam antimicrobials are widely used in both human and animal medicine. Due to their spectrum of activity, pharmacokinetic characteristics and good safety profile, members of the group are a good choice in the treatment of UTIs associated with E. coli. However, there is increasing resistance to these antimicrobials, mediated by the ability of Enterobacteriaceae such as E. coli to produce plasmid-mediated AmpC (pAmpC) and extended spectrum beta-lactamase enzymes (ESBLs). This imparts resistance to most of the beta-lactam antimicrobials, including the later generation cephalosporins. In addition, many of these strains are also resistant to other antimicrobial classes, rendering them multi-drug resistant (MDR) and significantly increasing morbidity and mortality (Harris, 2015). The prevalence of pAmpC and ESBL resistance is increasing in both hospital and community acquired infections (Nakai et al., 2016). Less information is available for companion animals, but studies that have evaluated this report a resistance epidemiology similar to that observed in human clinical care (Dierikx et al., 2012; Gibson et al., 2010; Murphy et al., 2010; Tamang et al., 2012). Successful dissemination of resistance within MDR E. coli is attributable to the fact that resistance genes are mostly located on horizontally transferable elements (HTEs) such as plasmids and transposable elements. HTEs are often promiscuous with a diverse bacterial host range, conferring resistance not just within but also between bacterial species, generating a large potential resistance reservoir (Bortolaia et al., 2011). It is important to determine the genetic context of resistance alleles to understand co-inheritance of traits that may drive selection and the potential of resistances to coalesce in single isolate backgrounds.
As sequencing technologies have advanced, an increasing number of resistant E. coli strains have been sequenced. Initially, much of the sequence comparison relied upon read mapping and short read de novo assembly. These methods often fail to accurately resolve complete chromosomes and other replicons because of the multiple repeat regions that are often present in mobile genetic elements. As a consequence, there is a relative lack of high quality sequence data relating to complete bacterial plasmids encoding antibiotic resistance. The advent of long read sequencing technologies is now leading to increased representation of these sequences in public databases.
The aim of this study was to characterise MDR E. coli associated with urinary tract infections in dogs focusing on the use of Illumina sequencing combined with Pacific Biosciences single molecule real time (SMRT) sequencing to elucidate the plasmid repertoires that have been assembled to generate MDR isolates.

Bacterial strains
All isolates (16 MDR and 14 antibiotic susceptible (AS)) were previously characterized in terms of bacterial identification, phylogenetic group, plasmid replicon type, MDR phenotype and ST type (Wagner et al., 2014).

Illumina sequencing
E. coli isolates were cultured overnight at 37°C, 170 rpm, in lysogeny broth (LB). DNA was extracted from 1 ml of bacterial culture using a Qiagen DNeasy extraction kit (Qiagen U.K.), according to the manufacturer's specifications. Following integrity assessment on agarose gels and quantification/quality determination by spectrophotometry (including the 260:280 nm absorbance ratio), Nextera XT libraries were prepared by Edinburgh Genomics. Paired-end Illumina sequencing was performed using an Illumina HiSeq 2500 sequencer with read lengths of 100 bp to achieve an average depth of 60×.
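The relationship between read count, read length and the stated average depth of 60× can be illustrated with a short calculation. This is a minimal sketch only: the genome size of 5 Mb is an assumed round figure for an E. coli genome and is not taken from this study, and the function names are hypothetical.

```python
import math


def mean_depth(n_read_pairs: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Expected mean depth = total sequenced bases / genome size."""
    total_bases = 2 * n_read_pairs * read_length_bp  # two reads per pair
    return total_bases / genome_size_bp


def pairs_needed(target_depth: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of read pairs that reaches the target mean depth."""
    bases_needed = target_depth * genome_size_bp
    bases_per_pair = 2 * read_length_bp
    return math.ceil(bases_needed / bases_per_pair)


# Assumed genome size (illustrative only, not a value reported in the paper).
ASSUMED_GENOME_SIZE_BP = 5_000_000

pairs = pairs_needed(60, 100, ASSUMED_GENOME_SIZE_BP)
depth = mean_depth(pairs, 100, ASSUMED_GENOME_SIZE_BP)
```

Under these assumptions, reads lost to quality filtering would reduce the realised depth, so the number of pairs is a lower bound rather than a sequencing target.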

SMRT sequencing
SMRT sequencing was performed using the PacBio platform (Pacific Biosciences). E. coli were inoculated into 10 ml LB and incubated overnight at 37°C, 170 rpm. DNA was extracted using the Qiagen 100/G extraction kit, according to the manufacturer's instructions. Extracted DNA was prepared for sequencing using AMPure beads (Beckman Coulter), with a target insert size of 10 kb or greater. DNA was sheared using a g-TUBE™ (Covaris). Sheared DNA was purified and concentrated using AMPure beads. Samples were eluted using PacBio Elution Buffer, and single stranded fragments were removed with ExoVII. Fragments were then repaired using the PacBio DNA Damage and Repair Ends buffers, as per the protocol. Processed DNA was further purified using AMPure beads; blunt ended sequence adapters were then ligated, and ExoIII and ExoVII exonucleases were used to remove any failed ligation products. Successfully ligated sequence fragments were concentrated using three successive AMPure bead purification steps. The final DNA quantity for all of the samples exceeded 5 μg. DNA was sequenced on the PacBio RSII sequencing platform.

Sequence assembly and annotation
Illumina sequence reads were filtered using Sickle (Joshi and Fass, 2011) with a minimum read quality score of 30 and a minimum read length of 50 bp. Reads were re-shuffled with the shuffleSequences_fastq.pl script (Zerbino, 2010). De novo assembly used both paired reads and singletons. Illumina sequence reads were assembled with the VelvetOptimiser script, from Velvet 1.2.08 (Zerbino and Birney, 2008), using a k-mer size range of 47 to 67.
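The effect of the quality and length thresholds can be sketched as follows. This is a simplified illustration and not Sickle's implementation: it trims only the 3′ end at the first window whose mean quality falls below the threshold, and the window size of 5, the Phred+33 encoding and all function names are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def phred_scores(quality_line: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(c) - offset for c in quality_line]


def trim_3prime(seq: str, qual: str, min_q: int = 30, window: int = 5) -> Tuple[str, str]:
    """Cut the read at the first window whose mean quality is below min_q."""
    scores = phred_scores(qual)
    if not scores:
        return seq, qual
    for start in range(max(1, len(scores) - window + 1)):
        win = scores[start:start + window]
        if sum(win) / len(win) < min_q:
            return seq[:start], qual[:start]
    return seq, qual


def filter_read(seq: str, qual: str, min_q: int = 30, min_len: int = 50,
                window: int = 5) -> Optional[Tuple[str, str]]:
    """Return the trimmed read, or None if it is shorter than min_len after trimming."""
    trimmed_seq, trimmed_qual = trim_3prime(seq, qual, min_q, window)
    if len(trimmed_seq) < min_len:
        return None
    return trimmed_seq, trimmed_qual
```

A 100 bp read whose quality collapses after base 60 would be kept in trimmed form, whereas one that collapses after base 40 would fall below the 50 bp minimum and be discarded; discarded mates are what give rise to the singletons mentioned above.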
SMRT Analysis was used to generate a fastq file from the PacBio reads, and reads were error-corrected using PBcR with self-correction (Koren et al., 2013). The longest reads providing 20× coverage were then assembled with Celera Assembler 8.1 and polished using Quiver (Chin et al., 2013). Annotated genomes (Do-It-Yourself Annotator (DIYA) (Stewart et al., 2009)) were imported into Geneious (Biomatters Ltd., Auckland, New Zealand) (Kearse et al., 2012) and duplicated sequence was removed from the 5′ and 3′ ends to generate the circularized chromosomes/plasmids. The origin of replication was approximated using Ori-Finder (Luo et al., 2014) and the chromosome reoriented using the origin as base 1.
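The two circularization steps (removing the duplicated terminal sequence, then reorienting to the origin) can be sketched on plain strings. This is an illustration only and not the procedure carried out in Geneious: it assumes the duplicated ends match exactly, whereas overlaps in real assemblies usually need an alignment-based comparison, and the function names and the minimum overlap length are hypothetical.

```python
def trim_terminal_overlap(seq: str, min_overlap: int = 20) -> str:
    """Remove the 3' copy of a sequence that exactly duplicates the 5' end.

    The longest candidate overlap is tested first, so the largest exact
    duplicate is removed. If none of at least min_overlap bases is found,
    the contig is returned unchanged.
    """
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return seq[:-k]
    return seq


def reorient(seq: str, origin_index: int) -> str:
    """Rotate a circular sequence so that origin_index (0-based) becomes base 1."""
    return seq[origin_index:] + seq[:origin_index]


# Toy contig: a 30 bp "replicon" whose first 10 bases are repeated at the 3' end.
unit = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
contig = unit + unit[:10]
circular = trim_terminal_overlap(contig, min_overlap=5)
rotated = reorient(circular, 10)
```

Rotation preserves the sequence length and content, so annotation coordinates can be shifted by the same offset modulo the replicon length.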

Plasmid sequence comparison
CCTViewer (Grant et al., 2012) was used to visualise SMRT sequenced plasmids, using 1428 p96 as a reference sequence for the IncI1 plasmids (Accession no CP023370) and 144 p134 (Accession no CP023363) as the reference for the IncF plasmids (Table 2). ProgressiveMauve (Darling et al., 2010) alignment was performed using default parameters. ResFinder (Zankari et al., 2012) was used to confirm annotated antimicrobial resistance markers. BlastKOALA (Kanehisa et al., 2016) was used to identify putative virulence factor sequences on the plasmids.
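Plasmid labels such as 1428 p96 follow the convention given in the notes to Table 2: the isolate ID, the letter p, and the plasmid size in kb. A small parser makes the convention explicit; the class and function names here are hypothetical and the pattern assumes purely numeric isolate IDs, as in the labels used in this study.

```python
import re
from typing import NamedTuple


class PlasmidName(NamedTuple):
    isolate: str   # isolate ID, kept as text
    size_kb: int   # approximate plasmid size in kb


_LABEL = re.compile(r"^(\d+)\s*p(\d+)$")


def parse_plasmid_name(label: str) -> PlasmidName:
    """Split a label such as '1428 p96' into isolate ID and size in kb."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a plasmid label: {label!r}")
    return PlasmidName(isolate=match.group(1), size_kb=int(match.group(2)))


incI1_reference = parse_plasmid_name("1428 p96")
incF_reference = parse_plasmid_name("144 p134")
```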

Analysis of plasmid core and pan genomes
Plasmid replicon and clonal complex types were determined using the pubMLST database (Jolley and Maiden, 2010). Core sequence homology between the plasmid sequences was detected with GET_HOMOLOGUES (Altschul et al., 1997) using bi-directional best-hits (BDBHs) and orthoMCL algorithms. Protein clusters were aligned with Muscle (Edgar, 2004). Amino acid alignments were then translated back into nucleotide sequences using PAL2NAL (Suyama et al., 2006), concatenated and transformed into phyml format with the catfasta2phyml/catfasta2phyml.pl script for RAxML phylogenetic estimation under a GTR model with 100 bootstrap replicates (Stamatakis, 2014). GET_HOMOLOGUES was also used for the pan genome analysis, using the PARS program from the PHYLIP package (Contreras-Moreira

Comparative analysis of E. coli strains
Core SNP-based phylogenetic analysis of the 30 clinical UTI associated isolates (16 MDR and 14 AS) plus 10 reference strains (Table 1) was carried out, and identified at least two distinct clusters that mostly correlated with the isolates' MDR status (Fig. 1). Isolates were ancestrally diverse; however, there was a trend for susceptible isolates to be more closely associated with human UPEC E. coli reference sequences, whereas MDR isolates clustered with commensal E. coli reference sequences. Whole genome sequencing (WGS) analysis of the MDR strains confirmed previous standard phylotyping results regarding their diverse commensal backgrounds. Isolates that possessed the IncI1/IncF plasmid genotype were well dispersed throughout the tree. Tree clades were largely independent with respect to date of isolation.

Sequencing and plasmid carriage
Plasmid sequences could not be assembled using Illumina short read sequence data. Therefore, 8 MDR isolates (127, 1223, 1283, 1428, 144, 317, 746 and 1943 (Supplementary Table S1)) were sequenced by SMRT; these were selected to examine the IncI1 replicon context of the CMY-2 (AmpC) beta-lactamase. Two sensitive isolates carrying an IncI1 replicon (1190 and 1105 (Supplementary Table S2)) were also sequenced by SMRT. The plasmid combinations present in the MDR strains are described in Table 2 and depicted in Fig. 2, along with their contribution to MDR as defined by carriage of specific resistance alleles.
While 7/8 MDR strains had the anticipated replicons, one isolate (317) only carried an IncF plasmid, despite PCR and Illumina sequence data indicating the presence of IncF and IncI1 replicons. It is possible that the plasmid was lost during the experimental process. One susceptible isolate, 1105, was PCR positive for the IncI1 genotype, but this replicon could not be assembled from the SMRT sequence data.

IncI1 comparative analysis
In the MDR isolates, the IncI1 plasmid sequence sizes ranged from 85 to 96 kb, with between 106 and 113 coding domain sequences; many of which still had no ascribed function. All but two of the IncI1 plasmids belonged to the same clonal complex (CC-2), determined in silico, based on the presence and sequence similarity of repI1, ardA, trbA, sogS, and pilL to previously published IncI1 pMLST profiles. As anticipated, the IncI1 plasmids associated exclusively with the CMY-2 encoded pAmpC beta-lactamase resistance gene; this was located within a resistance cassette associated with a mobile element (Figs. 2 and 3).
Significant periods of time separate isolation of the sequenced strains; the first collected in 2001, the most recent 2011. This makes the comparison of the IncI1 replicons unique as it provides insight into the evolution of this resistance-encoding replicon in our local context over this time period.
BLAST Atlas search using the CCT comparison tool for analysis of the IncI1 replicons showed greater than 90% sequence similarity for most of the sequences. Multiple collinear blocks, with high synteny and sequence similarity and limited large-scale rearrangements, were identified using progressiveMauve (Fig. 3).
IncI1 plasmids were compared against 42 plasmid sequences obtained through GenBank, which were identified by BLAST as being similar (Supplementary Fig. 1). Pan-genomic analysis detected 305 homologous gene clusters across all sequences. Phylogenetic estimations using homologous cluster presence or absence indicated a close phylogenetic relationship between all IncI1 plasmids sequenced in this study, including the IncI1 plasmid from the susceptible E. coli isolate (1190/01, Accession no CP023387). The resistance cassette conferring CMY-2 mediated pAmpC resistance, although not exclusive to the canine resistance plasmids, was not detected in many other IncI1 sequences available on databases. Ten of the homologous gene clusters (traL, traM, nikB, traO, traJ, traF, traE, traT, traI, and traX), all relating to plasmid transfer and replication, were core to all the IncI1 plasmid sequences (Fig. 4). These were extracted from the pan-genomic analysis and aligned for maximum likelihood core genome phylogenetic estimation (Supplementary Fig. 1). Despite only weak bootstrap support overall, the phylogenetic tree is congruent with the predicted in silico MLST plasmid groups. The IncI1 plasmids included in the analysis were from different Salmonella serovars and E. coli isolated from different animal hosts, suggesting that interspecies transmission of the IncI1 plasmids occurs. However, IncI1 sequences associated with bacteria isolated from humans were largely absent, despite the over-representation of human isolates in databases in general. The majority of canine sequences form a sub-cluster, with chicken- and porcine-associated plasmid sequences being the most similar. Of note, CP009566 and KF434766 are the only two reference IncI1 plasmids associated with canine clinical infections; they were isolated in Salmonella enterica serovar Newport in Arizona in 2015 (Cao et al., 2015) and E. coli in Denmark in 2008 (Hansen et al., 2016), respectively.
These were closely related to the canine E. coli plasmids sequenced in this study even though their host bacterial strains were obtained from different geographical areas and at different times (Supplementary Fig. 1).
Table 2 notes: * Each plasmid is referred to by the isolate ID followed by the letter p and then a number which represents the size of the plasmid in kb. nt = non typable, CC = clonal complex, − = non identified, § = putative split contig, CDS = coding sequence, ST = sequence type.

IncF comparative analysis
The IncF plasmids were more heterogeneous in size (100-147 kb), the number of coding regions (105-150), and the number and classifications of various resistance markers. Aminoglycoside, chloramphenicol, potentiated-sulfonamide, and beta-lactam resistance markers (other than CMY-2) were associated with IncF or smaller untyped plasmids (Fig. 2).
Despite the greater variety and number of resistance markers shared amongst some of the IncFII plasmids, five contained no detectable resistance markers. Virulence genes, mostly associated with iron uptake, and genes associated with metabolism were detected on many of the IncF plasmids. In silico replicon typing indicates that many of these plasmids were of mixed lineage (Table 2). IncF plasmids showed less sequence similarity when compared by BLAST or by progressiveMauve sequence analysis. With one exception, all regions of local co-linearity had little synteny or support across all the plasmid sequences (Fig. 5). GenBank BLAST searches identified 9 sequences similar to the IncFII/IncFIB plasmids. Pan-genomic maximum parsimony analysis indicated diverse content (Fig. 6a). No core genes could be identified for all the plasmid sequences, although a subset, excluding 1428 p66, 746 p62, 746 p72, and 1943 p80, could be compared using 8 homologous gene clusters; as with the IncI1 plasmids these genes were mostly associated with plasmid maintenance and replication, including traA, traL, traE, traB and traX. The different replicon sub-types are depicted in Fig. 6b, with FII/FIB the most commonly identified. A single clade of plasmids, exclusively IncFII, showed greater sequence homology than the remaining plasmid sequences (Fig. 6a and b). Maximum likelihood phylogenetic analysis of IncFII, IncFIA, and IncFIB replicon types showed dispersal of all sequences of different replicon types throughout the tree (Fig. 6b). None of the PCR typed IncFII plasmids sequenced in this study were members of the same clade and 317 p100 was a significant outlier from the rest of the plasmid sequences.
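The distinction between pan and core gene content, and the way a few divergent plasmids can leave an empty core for the full set, can be illustrated with set operations on cluster presence or absence. This is a toy sketch and not the GET_HOMOLOGUES procedure: the plasmid names and the assignment of clusters to plasmids are invented for illustration, and only some cluster names are borrowed from the text.

```python
from typing import Dict, Set


def pan_genome(clusters_by_plasmid: Dict[str, Set[str]]) -> Set[str]:
    """Union of gene clusters: present on at least one plasmid."""
    return set().union(*clusters_by_plasmid.values())


def core_genome(clusters_by_plasmid: Dict[str, Set[str]]) -> Set[str]:
    """Intersection of gene clusters: present on every plasmid."""
    cluster_sets = list(clusters_by_plasmid.values())
    if not cluster_sets:
        return set()
    return set.intersection(*cluster_sets)


# Hypothetical presence/absence data (not results from this study).
toy = {
    "plasmid_1": {"traA", "traL", "traE", "res1"},
    "plasmid_2": {"traA", "traL", "traE"},
    "plasmid_3": {"traA", "traL", "res2"},
    "plasmid_4": {"res2", "hyp1"},  # divergent plasmid lacking transfer genes
}

core_all = core_genome(toy)
core_subset = core_genome({k: v for k, v in toy.items() if k != "plasmid_4"})
pan_all = pan_genome(toy)
```

In this toy data the core is empty for all four plasmids but contains traA and traL once the divergent plasmid is excluded, which mirrors the reasoning behind comparing only a subset of the IncF sequences.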

Other plasmids
Numerous other extra-chromosomal contigs were detected from SMRT sequencing. IncI1, IncFII, IncFIA, and IncFIB were the only detected incompatibility types, yet account for a fraction of plasmid sequences. The nature of the SMRT sequencing and the analysis means that these sequences are very unlikely to be part of the main chromosome or the plasmids with defined replicon-types. Whilst many of these do not contain identifiable resistance markers, some do. Many of these sequences do not have established replication and transfer machinery encoded on them, so their transfer capacity is currently unknown, but they may represent small replicons that can be co-inherited with other transferred plasmids or possibly by other methods such as transduction.
Fig. 3. Core sequence alignment of IncI1 replicon type plasmid sequences, using 1428 p96 as the index sequence. Comparisons were performed using a BLAST Atlas search using the CCT comparison tool. Both the whole nucleotide sequence (smaller ring) and coding domain sequence specific (larger ring) BLAST comparisons were carried out. ProgressiveMauve was also used to compare the plasmids. Local co-linear blocks detected by progressiveMauve have been annotated onto the BLAST comparisons.

Discussion
MDR E. coli isolates are relatively rarely isolated from in- and outpatient samples presented for testing at the University of Edinburgh's small animal hospital. As previously described (Wagner et al., 2014), MDR isolates are often associated with animals with complex medical histories, and understanding their emergence in the context of generally antibiotic susceptible (AS) E. coli isolates provides an important opportunity to develop our understanding of MDR emergence in a clinical setting.
Combined use of Illumina and SMRT sequencing has allowed detailed examination of individual plasmids and strain background, as well as an overview of plasmid carriage in the context of individual isolates.
Phenotypic beta-lactamase resistance is attributed to pAmpC, encoded exclusively by the CMY-2 allele on IncI1 replicon plasmids, forming a notably closely related phylogenetic cluster, with high levels of homogeneity in the IncI1 plasmid sequence, despite the strains having been collected over a 10 year period. The CMY-2 allele was the only identifiable resistance gene present on this subset of IncI1 plasmids. This has been reported previously in CMY-2 carrying IncI1 plasmids isolated from canine, feline and human hosts (Bortolaia et al., 2014;Sidjabat et al., 2014). The majority of the MDR IncI1 plasmids share a closest common ancestor, using maximum-likelihood analysis of the core genome. This is either due to i) limited sequence divergence or ii) sequence convergence. Given the genetic distance between the susceptible IncI1 plasmids, the absence of any dominant E. coli clone associated with the IncI1 plasmids, the lack of any indication of potential bacterial host range of the plasmids (other than E. coli), and the discordance of sequence similarity with the chronological sequence of the E. coli isolates; it is difficult to identify which. The reliability of any estimation of rates of divergence between the different plasmids is questionable; the identification of a similar plasmid backbone in canine clinical isolates CP009566 and KF434766 from the USA and Scandinavia does suggest underlying core genome stability (Bogaerts et al., 2015;Bortolaia et al., 2014). CMY-2 has also been identified from faeces of healthy dogs in a number of geographical locations including the Netherlands, Mexico, France and Japan, (Baede et al., 2015;Haenni et al., 2014;Okubo et al., 2014;Rocha-Gracia et al., 2015) and where determined, the plasmid context is predominantly IncI1. This may indicate that the IncI1-CMY-2 is endemic to the commensal population especially in the dog. 
The common use of cephalosporins such as cephalexin in companion animal practice has been implicated in high prevalence carriage particularly in canine isolates (Haenni et al., 2014).
In comparison to the IncI1 plasmids, the IncF plasmids are more disparate in core and pan-genome sequence content. This could be a consequence of experimental design, as the isolates for this study were collected based on their pAmpC production, the gene for which is present on the IncI1 plasmid. However, other studies also provide evidence for the relative heterogeneity of the IncF plasmids (Villa et al., 2010), and relatively few sequences sharing nucleotide similarity with the IncF sequences in this study could be identified in the NCBI database, with little conservation of resistance markers between one IncF plasmid and another. Another contrasting feature of the IncF plasmids, supported by other studies, was their greater complement of resistance alleles and putative virulence-associated genes compared to the IncI1 plasmids. As most of the strains containing the IncI1/IncF plasmid combination specifically were associated with commensal phylogroups, we speculate that acquisition of IncF plasmids in combination with IncI1 drives emergence of normally commensal E. coli strains resulting in clinical disease in vulnerable patients, especially in the presence of antibiotic selection. All patients from which MDR isolates were identified fell, without exception, into this category, having significant underlying disease, often multiple antibiotic treatments and, in some cases, immunosuppressant therapies (Wagner et al., 2014). The dispersion of resistance genes across different plasmids, in many of the isolates, suggests sequential acquisition, perhaps whilst part of the normal flora of the gastrointestinal tract, as it is unlikely that the transfer of multiple plasmids would occur as a single event. Commensal strains have previously been implicated as a significant component of the resistance reservoir (Marshall and Levy, 2011), whilst other studies have clearly identified the carriage of CMY-2 in more typical UPEC strains such as ST131 (Dashti et al., 2014). SMRT sequencing identified strains containing up to 5 plasmids contributing up to 300 kb of additional genetic information. Much of the function of these coding regions is unknown. Sequencing also revealed the presence of small non-typable plasmids of which we were previously unaware. It is assumed that the acquisition of so much extrachromosomal DNA may be energetically costly to the bacterial host, and many of the plasmids may not be stably maintained in this combinatorial manner in these specific backgrounds without antibiotic selective pressure. The long-term stability of these plasmids within their host bacterial genomes is currently unknown, but investigation of this in future work would be of significant value.
Fig. 4. Maximum-parsimony analysis using detected presence or absence of homologous gene clusters from the pan genome of PacBio IncI1 replicon sequences, and comparator IncI1 sequences obtained from the NCBI nucleotide sequence database. Plasmid sequences from this study lie within the red box. Core genome genes (used for maximum-likelihood analysis) are indicated by *. The CMY-2 resistance cassette, common to the IncI1 plasmid sequences isolated in this study, is highlighted by the blue box.
Fig. 5. Core sequence alignment of IncFII replicon type plasmid sequences, using 1223 p147 as the index sequence. Comparisons were performed using a BLAST Atlas search using the CCT comparison tool. Both the whole nucleotide sequence (smaller ring) and coding domain sequence specific (larger ring) BLAST comparisons were carried out. ProgressiveMauve was also used to compare the plasmids. Local co-linear blocks detected by progressiveMauve have been annotated onto the BLAST comparisons.
The MDR E. coli strains characterised in this study were isolated from clinical cases with significant underlying disease, which had often received multiple courses of antimicrobial chemotherapy. This represents a model of the genetic and phenotypic adaptation of E. coli to current clinical practices in both the human and veterinary setting. Patient vulnerability and antibiotic selective pressures provide an environment for the emergence of opportunistic MDR E. coli based on the acquisition of at least two plasmid replicon groups, with numerous other horizontally acquired DNA molecules also under selection. It will be of interest to evaluate the long-term stability of extra-chromosomal DNA in relation to antibiotic selective pressure and individual plasmid and resistance gene effects. In addition, it would also be of value to establish the reservoir potential of isolates as a source of resistance. It could be argued that the commensal background of the isolates requires a confluence of several factors for disease to occur, but should the horizontal transfer of plasmids to more pathogenic E. coli occur, the clinical significance of these strains would increase considerably.

Transparency declarations
None to declare.

Funding
This work was funded through a PhD CASE studentship jointly funded by the Biotechnology and Biological Sciences Research Council (R42122) and Zoetis (R82977).