Population genomics and antimicrobial resistance in Corynebacterium diphtheriae

Corynebacterium diphtheriae, the agent of diphtheria, is a genetically diverse bacterial species. Although antimicrobial resistance has emerged against several drugs including first-line penicillin, the genomic determinants and population dynamics of resistance are largely unknown for this neglected human pathogen. Here, we analyzed the associations of antimicrobial susceptibility phenotypes, diphtheria toxin production, and genomic features in C. diphtheriae. We used 247 strains collected over several decades in multiple world regions, including the 163 clinical isolates collected prospectively from 2008 to 2017 in France mainland and overseas territories. Phylogenetic analysis revealed multiple deep-branching sublineages, grouped into a Mitis lineage strongly associated with diphtheria toxin production and a largely toxin gene-negative Gravis lineage with few toxin-producing isolates including the 1990s ex-Soviet Union outbreak strain. The distribution of susceptibility phenotypes allowed proposing ecological cutoffs for most of the 19 agents tested, thereby defining acquired antimicrobial resistance. Penicillin resistance was found in 17.2% of prospective isolates. Seventeen (10.4%) prospective isolates were multidrug-resistant (≥ 3 antimicrobial categories), including four isolates resistant to penicillin and macrolides. Homologous recombination was frequent (r/m = 5), and horizontal gene transfer contributed to the emergence of antimicrobial resistance in multiple sublineages. Genome-wide association mapping uncovered genetic factors of resistance, including an accessory penicillin-binding protein (PBP2m) located in diverse genomic contexts. Gene pbp2m is widespread in other Corynebacterium species, and its expression in C. glutamicum demonstrated its effect against several beta-lactams. A novel 73-kb C. diphtheriae multiresistance plasmid was discovered. This work uncovers the dynamics of antimicrobial resistance in C. diphtheriae in the context of phylogenetic structure, biovar, and diphtheria toxin production and provides a blueprint to analyze re-emerging diphtheria.


Background
Diphtheria, if untreated, is one of the most severe bacterial infections of humans. It typically affects the upper respiratory tract causing pseudomembrane formation, sometimes leading to suffocation and death. The infection can be complicated by systemic symptoms, caused by the diphtheria toxin. Other forms of disease are skin and invasive infections, including endocarditis [1,2].
The agent of diphtheria is Corynebacterium diphtheriae, a member of the phylum Actinomycetes [3,4]. The diphtheria toxin, encoded by the tox gene, is carried by lysogenized corynephages within the chromosome of some C. diphtheriae strains [5,6]. Concern exists about the possibility of lysogenic conversion of previously nontoxigenic strains during colonization, infection, or transmission chains [7]. However, knowledge on the microevolutionary dynamics between tox-positive and tox-negative strains is limited. The high genetic diversity of C. diphtheriae strains underlies their variable colonization, adhesion, and pathogenicity properties [8][9][10]. Although three main biovars (Mitis, Gravis, and Belfanti) are distinguished since the 1950s, their phylogenetic relationships are poorly defined [11][12][13].
Diphtheria used to be one of the deadliest infections in young children but has been largely controlled by vaccination with the highly effective toxoid vaccine [14]. Even so, thousands of cases of diphtheria are still reported annually [15], and large outbreaks can quickly follow the disruption of public health systems [14,[16][17][18]. In countries with high vaccination coverage, diphtheria cases are associated with travel and migration from endemic regions [19][20][21]. As diphtheria vaccination is performed using an inactivated form of diphtheria toxin, it is not considered to prevent asymptomatic colonization and silent transmission of the pathogen, which still circulates and is the object of intense epidemiological surveillance [4]. However, vaccine preparations may include other antigens, and the impact of vaccination on C. diphtheriae evolution deserves further studies [22].
Clinical management of infections with toxigenic isolates includes treatment with diphtheria antitoxin (DAT), which can prevent or reduce the systemic effects of the toxin [4]. Nevertheless, antimicrobial treatment is critical in the clinical management of both tox-positive and tox-negative infections, as it contributes to the elimination of the bacteria within the patient and limits transmission to novel individuals [23]. With DAT production being threatened [24], antimicrobial treatment might become even more critical in diphtheria therapy.
Antimicrobial resistance genes have been described in C. diphtheriae, including the erythromycin resistance gene ermX on plasmid pNG2 [36] and genes dfrA16, qacH, and sul1 carried on a class 1 integron, mobilized by IS6100 [37]. However, the prevalence and phylogenetic distribution of resistance genes in C. diphtheriae clinical isolates are unknown. Six chromosomal penicillin-binding proteins (PBPs) have been reported in C. diphtheriae [38], but so far, no association between pbp or other genetic variation and penicillin resistance has been described. Understanding the genetic basis of antimicrobial resistance in C. diphtheriae would improve our ability to diagnose and track its spread.
The aims of this study were (i) to characterize antimicrobial resistance phenotypes in a large collection of C. diphtheriae strains with diverse geographical and temporal origins and to uncover genomic determinants of resistance and (ii) to analyze the population structure of C. diphtheriae and define the associations between antimicrobial resistance, diphtheria toxin production, biovars, and phylogenetic sublineages.
Second, we included 15 clinical isolates collected in France between 1981 and 1991, 11 of which had been deposited in the Collection de l'Institut Pasteur (CIP; A. Strains per year and resistance phenotypes origins B. Geographic origins historical clinical isolates subset in Additional file 1: Table S1). Third, to increase the genetic diversity and geographic range of the sample, the 65 available reference strains of ribotypes that belong to C. diphtheriae were included [40]. These reference strains represent an international collection of isolates collected over several decades and originating from multiple world regions including the Americas, Europe, Asia, Africa, and Oceania. Our subcultures of these strains were controlled for tox gene presence, toxin production, and biovar, leading to modifications of published characteristics in some instances (Additional file 1: Table S1; Fig. 1). Finally, four reference strains were included: strain NCTC13129, which is used as genomic sequence reference [38]; strain NCTC10648, which is used as the tox-positive and toxinogenic reference strain in PCR and Elek tests, respectively; strain NCTC11397 T , which is the taxonomic type strain of the C. diphtheriae species; and the vaccine production strain PW8, which corresponds to CIP A102 [41]. This third subset is referred to as "ribotype and reference strains" subset (Additional file 1: Table S1).

Bacterial cultures, identification, and biovar
Bacteria were cultivated on Trypto-Casein-Soy (TCS) agar for 24 h at 35-37°C. Bacterial identification was performed at the NRC-CCD as described previously [42] by multiplex polymerase chain reaction (PCR) combining a dtxR gene fragment specific for C. diphtheriae and a multiplex PCR that targets a fragment of the pld gene specific for C. pseudotuberculosis, the gene rpoB (amplified in all species of the C. diphtheriae complex) and a fragment of 16S rRNA gene specific for C. pseudotuberculosis and C. ulcerans, respectively. Isolates collected since 2014 were confirmed as C. diphtheriae by matrixassisted laser desorption-ionization time-of-flight mass spectrometry (MALDI-TOF MS) using Bruker technology. In order to exclude strains initially identified as C. diphtheriae but now classified as C. belfantii [42] or C. rouxii [43], genome-wide average nucleotide identity (ANI) was used as described previously [42]. Strains were characterized biochemically for pyrazinamidase, urease, and nitrate reductase and for utilization of maltose and trehalose using API Coryne strips (BioMérieux, Marcy l'Etoile, France) and the Rosco Diagnostica reagents (Eurobio, Les Ulis, France). The Hiss serum water test was used for glycogen fermentation. The biovar of isolates was determined based on the combination of nitrate reductase (positive in Mitis and Gravis, negative in Belfanti) and glycogen fermentation (positive in Gravis only). The rare biovar Intermedius was not identified, as its distinction from other biovars is based on colony morphology, which is considered subjective, or on lipophily, which was not tested.

Determination of the presence of the tox gene
Determination of the diphtheria toxin gene (tox gene) presence was achieved by a conventional tox PCR assay [44], while its phenotypic production was assessed by the modified Elek test [45]. We also confirmed tox PCR results by BLASTN (query: tox gene sequence from strain NCTC13129, RefSeq accession number: DIP_ RS12515) analysis of the genomic assemblies.
Antimicrobial susceptibility was determined using the disk diffusion method with impregnated paper disks (Bio-Rad, Marnes-la-Coquette, France) on Mueller Hinton agar plates supplemented with 5% of sheep blood and 20 mg/L β-NAD, as recommended. Minimum inhibitory concentrations (MICs) were determined using E-test strips (BioMerieux, Marcy l'Etoile, France). The control strain used is Streptococcus pneumoniae ATCC 49619. The zone diameter (ZD) data were interpreted into S, I, and R categories in the following way. First, we used the CA-SFM/EUCAST V.1.0 (January 2019) document (https://www.sfm-microbiologie.org/wp-content/ uploads/2019/02/CASFM2019_V1.0.pdf), which contains interpretative criteria for Corynebacterium spp. only for CIP, GEN, CLD, TET, RIF, and TMP-STX. Second, for the other agents, we used the interpretative criteria published in Table III of the CA-SFM 2013 recommendations (https://resapath.anses.fr/resapath_uploadfiles/files/ Documents/2013_CASFM.pdf). Note that for RIF, we used the 2013 breakpoints, as they fitted better with the observed distribution of ZD values. Clarithromycin breakpoints were taken from those for erythromycin, as recommended. The breakpoint for oxacillin was derived from the one used for Staphylococcus spp. ZD interpretation breakpoints are given in Additional file 1: Table S2.
Penicillin susceptibility was initially determined using 10-UI (6 μg) disks (resistance breakpoint, 18 mm), but CA-SFM/EUCAST recommendations were changed in 2014 to use 1-UI disks, while the resistance breakpoint was increased from 18 to 29 mm. As all C. diphtheriae strains end up in the resistant category following this recommendation, E-test strips were used to define the penicillin MIC since 2014 (Additional file 1: Table S2: isolates starting from FRC0259); the EUCAST breakpoint of 0.125 g/L was used as a cutoff. Penicillin E-test was also performed systematically for strains tested as resistant before 2014 (using 10-UI disks), as well as for some susceptible isolates (Additional file 1: Table S1).
Multidrug-resistant C. diphtheriae (MDR-DIP) were defined as strains resistant to three or more antimicrobial agent categories (defined in Additional file 1: Table  S2), excluding intrinsic resistance to fosfomycin [46]. Note that we used ecological cutoffs rather than currently proposed clinical breakpoints (see Additional file 1: Table S2 for a comparison of both types of breakpoints).

Whole-genome sequencing by Illumina and Oxford Nanopore Technologies
DNA was extracted from broth cultures, by making use of DNeasy Blood & Tissue Kit (QIAGEN, Hilden, Germany). However, a lysis step was added to the extraction protocol described by the manufacturer as previously described [47]: a 1-μL loopful of bacterial colonies was emulsified in 180 μL of lysis buffer containing 20 mM Tris-HCl, pH 8, 2 mM EDTA, 1.2% Triton X-100, and 20 mg/mL lysozyme, in a DNase/RNase-free 1.5-mL Eppendorf tube and incubated in a heating block at 37°C for 1 h, with mixing every 20 min. After extraction, DNA concentration was measured with the Qubit 3.0 Fluorometer (Invitrogen), employing the Qubit dsDNA BR Assay Kit (Invitrogen). Besides, the DNA quality was verified using a D-One spectrophotometer (Nanodrop). Multiplexed paired-end libraries (2 × 150 bp) were prepared using the Nextera XT DNA kit (Illumina, San Diego, CA, USA) and eventually sequenced with an Illumina NextSeq-500 instrument at a minimum of 50× coverage depth. Trimming and clipping were performed using AlienTrimmer v0.4.0 [48]. Redundant or over-represented reads were reduced using the khmer software package v1.3 [49]. Finally, sequencing errors were corrected using Musket v1.1 [50]. A de novo assembly was performed for each strain using SPAdes v3.12.0 [51]. The genomic sequences of the four reference strains were retrieved from public repositories (Additional file 1: Table S1).
Additionally, the multidrug-resistant isolate FRC0402 was subjected to long-read sequencing using Oxford Nanopore Technologies (ONT). Genomic DNA was extracted using the phenol-chloroform protocol combined with Phase Lock Gel tubes (Qiagen GmbH). Libraries were prepared using a 1D ligation sequencing kit (SQK-LSK-108) without fragmentation and sequenced using a MinION FLO-MIN-106 flow cell. Finally, ONT and Illumina short reads were combined to generate a hybrid assembly using Unicycler v0.4.4 (normal assembly mode, default parameters).

Phylogeny, recombination, and genomic sequence analyses
We built a core genome multiple sequence alignment (cg-MSA) from the assembled genome sequences. For this, the genome sequences were annotated using PROKKA v1.14.2 [52] with default parameters, resulting in GFF files. Roary v3.6 [53] was used to define proteincoding gene clusters, with a threshold set at 70% amino acid identity. Core genes were defined as being present in 95% of genomes and were concatenated into a cg-MSA by Roary. ClonalFrameML v1.11 [54] was used to build a phylogenetic tree based on the cg-MSA, which quantifies and accounts for the effects of recombination events. PhyML v20131022 [55] was used to build an initial tree. To evaluate bootstrap support, IQtree version 2 [56] with best-fit model GTR+F+R10 was used.
MLST genotypes were defined using the international MLST scheme for C. diphtheriae and C. ulcerans [12].

Genome-wide association studies
The software treeWAS [57] was used to find genomewide associations between either antimicrobial resistance phenotypes or biovar on the one hand and genetic variants (both core-genome SNPs and accessory genome gene presence/absence) on the other hand. Coregenome SNPs were derived either from a mapping approach (Samtools v1.9 and GATK v3.4-0), which comprises intergenic regions, or from the alignment of core coding sequences found using Roary. We ran treeWAS v1.1 with default parameters, using as input the previously computed ClonalFrameML phylogenetic tree and distribution of homoplasies, in order to account for both the population structure and effect of recombination. For this analysis, susceptibility phenotypes were classified into resistant or susceptible categories based on zone diameter phenotypes using the CA-SFM/EUCAST 2019 cutoffs (Fig. 3, Additional file 3: Table S3). The seven chromosomal PBP coding sequences (including gene with locus tag RS14485 in RefSeq NC_002935.2, or DIP0637 in the original GenBank file, renamed by us as PBP4b) were extracted from the genomic sequences and translated into amino acid (AA) sequences, which were also analyzed for association with penicillin resistance.

Cloning and transformation experiments
For ectopic expression in C. glutamicum, the pbp2m gene was amplified from C. diphtheriae strain FRC0402 and put under the control of the inducible P gntK promoter on the shuttle vector pTGR5 [58] (Additional file 4: Escherichia coli CopyCutter EPI400 (Lucigen) was used for the cloning of the pbp2m gene and was grown in Luria-Bertani (LB) broth or agar plates at 37°C supplemented with 50 μg/mL kanamycin. The pTGR5_ pbp2m plasmid was sequenced and electroporated into Corynebacterium glutamicum ATCC 13032. Positive colonies were grown in brain heart infusion (BHI) at 30°C and 120 rpm supplemented with 25 μg/mL kanamycin and 1% (w/v) gluconate when required for ectopic expression of Pbp2m.

Mapping of SNPs on C. diphtheriae PBP sequences
Functional annotation of the different sequences was performed with InterPro [59]. Conserved transpeptidation motifs SxxK, SxN, and KTG were identified and mapped on the PBP sequences from Corynebacteriales based on the results of multiple sequence alignments performed with Clustal Omega [60]. When uncertainty between a transmembrane domain and a signal peptide existed, a decision was made based on the previous characterization of the homologous PBP in other Corynebacteriales in the literature.

Results
Provenance and microbiological characteristics of C. diphtheriae isolates We studied 247 C. diphtheriae strains of diverse geographic and temporal origins (Fig. 1). This collection included 163 isolates prospectively collected between 2008 and 2017 from French mainland and overseas territories, 15 older  French clinical isolates, 65 ribotype reference strains [40], and 4 other reference strains. All isolates were confirmed as C. diphtheriae (excluding C. belfantii and C. rouxii) based on an average nucleotide identity (ANI) value higher than 96% with the C. diphtheriae type strain NCTC11397 T .
Approximately one third (n = 78, 32%) of isolates were tox-positive (as defined by the detection of the tox gene by PCR), whereas the remaining 169 isolates (68%) were tox-negative. The proportions of tox-positive isolates were 42%, 34%, and 2% among reference strains, 2008-2017 clinical isolates, and older clinical isolates, respectively (Additional file 4: Fig. S2). Of the 78 tox-positive isolates, 17 (21.8%) were negative for toxin production and thus correspond to non-toxigenic toxin gene-bearing (NTTB) isolates. Six of the NTTB isolates had a stop codon within the tox gene sequence (Additional file 1: Table S1, Additional file 5: Table S4). However, for the 11 remaining strains, we found no explanation for the observed lack of toxin production.
Phylogenetic structure of C. diphtheriae and distribution of the toxin gene To infer a phylogenetic tree, we first aimed to detect and remove homologous recombination events among C. diphtheriae genomic sequences. ClonalFrameML inferred a relative rate of recombination to mutation (R/ theta) of 0.86, with an average length of recombination segments (delta) of 287 bp. The mean genetic distance between donor and recipient of recombination (nu) was estimated by ClonalFrameML to be 0.02, resulting in a relative impact of recombination to mutation in genomic diversification (r/m = R/theta × delta × nu) of 5.01 [61].
The recombination-corrected phylogeny was similar to the uncorrected phylogeny but with shorter branches, as expected (Additional file 4: Fig. S3). The phylogeny, rooted with C. belfantii and C. rouxii (Additional file 4: Fig. S4), was star-like, with a multitude of sublineages branching off deeply. The deepest branching sublineages corresponded to two ribotype reference strains of biovar Mitis: CIP107521 (ribotype Dagestan) and CIP107534 (ribotype Kaliningrad). Remarkably, isolates of biovars Mitis and Gravis were mostly distributed in two distinct branches of the tree. We therefore named the two major branches, lineage Mitis (156 strains, of which 86% were of biovar Mitis) and lineage Gravis (91 strains, of which 77% were of biovar Gravis). The Gravis lineage branched off from within the Mitis lineage ( Fig. 2; Additional file 4: Fig. S5). However, although most shallow branches were strongly supported, bootstrap support values of the deep nodes of the phylogeny were generally weak, implying that the position of the Gravis branch within the Mitis diversity was uncertain, as were the relative positions of the early-branching Gravis and Mitis sublineages (Additional file 4: Fig. S5, Fig. S6). Reference strains PW8 and NCTC11297 T belonged to the Mitis lineage, whereas NCTC13129 (from the ex-Soviet Union 1990's outbreak) and NCTC10648 belonged to the Gravis lineage. The Belfanti isolates were scattered in three distinct sublineages within the Mitis lineage and one within the Gravis lineage.
The isolates carrying the tox gene belonged mostly to the Mitis lineage (68 of 78, 87.2%), in which they were distributed in multiple sublineages. In the Mitis lineage, 69 (44.2%) were tox-positive. In contrast, within the Gravis lineage, only 10 (11%) isolates were tox-positive, and they corresponded to the earliest-branching Gravis sublineages with only one exception. Interestingly, this exception corresponded to the large ex-Soviet Union outbreak in the 1990s (( Fig. 2; Additional file 4: Fig. S5). This phylogenetic pattern is consistent with an evolutionary scenario where Mitis is the ancestral biovar of C. diphtheriae and where Gravis evolved from the Mitis lineage as an initially tox-positive sublineage, with subsequent loss of the toxin gene. In this scenario, the ex-Soviet Union outbreak sublineage would have reacquired the tox gene. All NTTB isolates belonged to the Mitis lineage except strain CIPA99 (ribotype Rhone, biovar Belfanti; Fig. 2), and they were distributed in multiple sublineages, showing convergent evolution towards the loss of toxin production.

Genetic events linked to biovar status
Biovar Mitis and Gravis are distinguished by the ability to utilize glycogen (positive in Gravis, negative in Mitis). The spuA gene, which codes for a putative alpha-1,6-glycosidase, was reported as being specific for biovar Gravis isolates [62]. Our genome-wide association study (GWAS) of accessory genes with the biovar phenotype revealed a strong association of a cluster of genes that includes spuA (DIP357; Additional file 4: Fig. S7) with biovar Gravis isolates, providing statistical support to the discovery of Santos et al. [62]. This association was stronger within the Gravis lineage; in contrast within the Mitis lineage, few (5 out of 17) of the biovar Gravis isolates possessed spuA (Fig. 2). GWAS analysis of core SNPs further demonstrated that a SNP (at position 324,487, Additional file 4: Fig. S7) downstream of the spuA cluster insertion point was also associated with biovar, suggesting homologous recombination among core genes as a mechanism for the spuA cluster insertion event.
The nitrate reductase activity differentiates Mitis and Gravis isolates, which are positive, from Belfanti isolates, which are nitrate-negative. We found that the nitrate reduction narKGHJI gene cluster [62] was disrupted in three of the six isolates assigned to the biovar Belfanti: strains FRC0480 and FRC0481 had a G to A mutation at position 675 of the narG gene, leading to a stop codon, whereas in strain CIPA99, approximately 100 nucleotides were inserted at position 446 in narG. No molecular explanation was found for the lack of nitrate reductase ability of the three other Belfanti strains when scrutinizing the narKGHJI gene cluster and adjacent molybdenum cofactor biosynthesis genes [63].

Antimicrobial susceptibility variation
Susceptibility to 19 antimicrobial agents was determined for the 247 clinical isolates and reference strains (Additional file 3: Table S3). For each agent, the distribution of zone diameter (ZD) values (Fig. 3) revealed a predominant mode located towards the right end of the distribution. This mode likely corresponds to the natural susceptibility distribution within the C. diphtheriae population and was used to define tentative epidemiological cutoffs (ECOFFs, also called ecological cutoff [64]). The proposed ECOFFs and their comparison with clinical breakpoints are presented in Additional file 2: Table S2. For each antimicrobial agent except cefotaxime, this approach led to the identification of strains that had ZD values smaller than those in the main mode, thus potentially corresponding to acquired resistance (Fig. 3).
Penicillin was exceptional in that the predominant mode (centered around 36 mm) was less neatly defined, due to the partial overlap with a second mode of smaller diameter values centered around 24 mm. This second mode corresponds mostly to the "intermediate" interpretative category (18 ≤ ZD < 29 mm) but also overlaps with the "resistant" category (< 18 mm). The distribution of ZD values for tetracycline also showed a clear second mode. For multiple other agents (amoxicillin, oxacillin, imipenem, kanamycin, rifampicin, ciprofloxacin, clindamycin and more evidently sulfonamide, trimethoprim, and the trimethoprim-sulfamethoxazole combination), some strains had the minimal diameter (6 mm, corresponding to growth at the disk contact). For trimethoprim, we observed both a second mode centered around 14 mm and a group of even more resistant strains with growth at disk contact.
Antimicrobial resistance levels were similarly distributed between tox-positive and tox-negative isolates (Additional file 4: Fig. S8) as well as between the two main phylogenetic lineages or biovars (Additional file 4: Fig. S9; Additional file 1: Table S1). In particular, the Fig. 2 Phylogenetic tree of C. diphtheriae. The tree was obtained using ClonalFrameML and was rooted using C. rouxii and C. belfantii isolates (see supplementary material). The main lineages Mitis and Gravis are labeled, and their branches are drawn using purple and green, respectively. The first (internal) circle around the tree corresponds to the three strain subsets (red, recent clinical isolates; blue, reference strains; gray, older clinical isolates). The second circle (stars) gives the toxigenic status. The third circle corresponds to biovars Mitis (purple), Gravis (green), and Belfanti (yellow). The next three circles indicate the presence of the spuA-associated gene cluster; DIP357 = spuA gene. The positions of reference strains PW8, NCTC13129, and NCTC10648 are indicated. The scale bar gives the number of nucleotide substitutions per site proportion of multidrug-resistant strains was not statistically different between biovars Mitis (n = 14) and Gravis (n = 4; chi-square test p value 0.2) or between tox-positive and tox-negative isolates (6 versus 12, p value 0.87).
Resistance rates were 17.2%, 2.5%, and 2.5% for penicillin, amoxicillin, and erythromycin, respectively, among the prospectively collected 2008-2017 clinical isolates (Fig. 4). Reference strains were generally susceptible to most agents, including penicillin, but were partially resistant to tetracycline (18%) and sulfonamide (35%). The resistance profile distribution showed that approximately half (121/247) of the strains had a fully susceptible phenotype, whereas 18 isolates (7.3%) were multidrugresistant (Table S1). Notably, four isolates were resistant at the same time to penicillin and macrolides (Fig. 4 inset), and two of them (FRC0402 and FRC0466) additionally had a reduced susceptibility to amoxicillin. Two of these isolates (FRC0475 and FRC0478) were collected from a foot arch wound and in respiratory carriage in the same patient (French mainland, with recent travel from New Caledonia). The two others came from a patient living in La Réunion Island (FRC0402) and from a patient living in Paris, who had recently traveled to Tunisia (FRC0466). Of the 18 multidrug-resistant isolates, 1 was a 1994 isolate from India and 17 belonged to the 2008-1017 prospective subset of clinical isolates (representing 10.4% of these). Isolates from La Réunion Island and mainland France showed resistance to multiple antimicrobial agents more often than isolates from other geographic origins (Fig. 1a, inset).

Genomic associations with antimicrobial resistance phenotypes
We first searched for the presence in the genomic sequences, of previously described antimicrobial resistance genes (ARGs). This approach led to the detection of 12 ARGs (Additional file 6: Table S5). We identified three tetracycline resistance genes (tetW, tet33, and tetO), four aminoglycoside resistance genes [aph (3′)-Ia, aph (3″)-Ib, aph (6)-Id, and aadA1 = ant (3′)-Ia], and also ermX, dfrA16, dfrA15b, dfrA1, and sul1 genes. We observed a strong correlation between the presence of ARGs and the  Fig.  S10), particularly for ermX (macrolide resistance; phi coefficient = 0.66 with erythromycin resistance), sul1 (sulfonamide resistance; phi = 0.80), and aph (3′)-Ia (kanamycin resistance; phi = 0.84). In strains FRC0137 and FRC0375, this latter gene was linked to aph (3′)-Ib (strA), aph (6)-Id (strB), and ermX on a Tn5432-like genomic region with an IS1249 insertion sequence [65]. The phylogenetic distribution of ARGs ( Fig. 5; Additional file 4: Fig. S6) revealed their presence in multiple unrelated sublineages, consistent with independent acquisitions by horizontal gene transfer. Gene ermX was present either in proximity to gene pbp2m (see below) or in a fragmented insertionsequence rich accessory region. Gene dfrA16 was associated with sul1 on a reported [37] class 1 integron (see the "Discovery of a multidrug-resistant conjugative plasmid carrying the gene pbp2m" section). Tetracycline resistance was associated either with the ribosomal protection protein genes tet(O) or tet(W) or with the efflux pump gene tet33. These three genes were present in distinct strain subsets and appear to contribute independently to tetracycline resistance in C. diphtheriae (Fig. 5); they were mostly associated with insertion sequences but not with other ARGs.
Next, in order to identify novel genetic determinants potentially associated with antimicrobial resistance in C. diphtheriae, a GWAS approach was followed, based on either core genome SNPs or accessory gene presence/absence. SNPs that were strongly associated with ciprofloxacin, trimethoprim, and rifampicin resistance were identified within genes for gyrase subunit A, dihydrofolate reductase, and RNA polymerase subunit B, respectively (Additional file 6: Table S5; Additional file 4: Fig. S11), consistent with known mechanisms and validating our approach. SNPs were also found to be associated with penicillin, kanamycin, and tetracycline resistance (Additional file 6: Table  S5), but based on the annotations of the genes where they are located, the functional implications of these SNPs are unclear in these cases. No association was found for penicillin resistance within the core PBP genes using the genomewide approach. However, using a concatenation of the amino acid sequences of the seven identified putative PBPs of C. diphtheriae, we identified amino acid positions that were statistically associated with penicillin resistance (Additional file 4: Fig. S12). The identified positions were mapped onto the predicted functional domains of the different PBPs (Additional file 4: Fig. S13) revealing several mutation hotspots. A number of significant SNPs were  found within the transpeptidase (TP) domains of the different PBPs, but none of the mutations mapped to the conserved transpeptidation motif. Other mutations were found outside the TP domains, for instance, in the transglycosylase and PASTA domains of PBP1b or in the dimerization domain of PBP2b. GWAS analysis of accessory genes demonstrated significant associations with phenotypic resistance. Associated genes included those mentioned above for erythromycin, tetracycline, kanamycin, sulfonamide, and trimethoprim (Additional file 6: Table S5), showing that these genes are the main mechanisms of resistance to these antimicrobial agents in C. diphtheriae populations. In addition, an accessory penicillin-binding protein gene (which we name pbp2m) was strongly associated with penicillin resistance. This gene is described in more detail below.

Discovery of a penicillin-binding protein (PBP2m) associated with penicillin resistance
The PBP gene pbp2m was observed in 11 isolates, 8 of which were penicillin resistant with minimum inhibitory concentrations (MIC) ranging from 0.19 to 1.5 mg/L. Two of these isolates were also resistant to amoxicillin and one was in addition resistant to oxacillin (Additional file 2: Table S2). The phylogenetic distribution of pbp2m-positive strains was compatible with multiple independent acquisitions of the gene through horizontal gene transfer (Fig. 5). Sequence analysis showed that the newly identified PBP2m is almost identical (3 differences out of 593 amino acids, 99.5%) to PBP2c from C. jeikeium, a class B PBP with an Nterminal signal peptide followed by a lipobox domain and the C-terminal transpeptidase domain (Additional file 4: Fig. S13). The C. jeikeium PBP2c is a lowaffinity PBP and was associated with beta-lactam resistance in C. jeikeium [66].
To demonstrate the role of PBP2m in penicillin resistance, its gene was PCR amplified from FRC0402 and cloned into the pTGR5 plasmid (Additional file 4: Fig.  S1). Transformation of the plasmid into C. glutamicum strain ATCC 13032 raised the MIC for penicillin from 0.125 to 1.5 mg/L, and the MICs of the other betalactams amoxicillin, oxacillin, and cefotaxime also increased importantly ( Fig. 6; Additional file 7: Table S6). In contrast, MICs of non-beta-lactam agents were not changed. Imipenem was less effective against the transformant based on disk diffusion but not based on E-test. Transformation with the empty plasmid used as control (See figure on previous page.) Fig. 5 Phylogenetic distribution of antimicrobial resistance phenotypes and genes. The phylogenetic tree is the same as in Fig. 1; the Mitis branch is in purple, the Gravis branch in green. To the right of the tree, each bloc indicates first the phenotype (red, resistant; see key) and relevant corresponding genotypes (orange, gene or mutation presence) Fig. 6 Phenotypic effect of pbp2m expression. Compared susceptibility phenotypes for C. glutamicum transformants with plasmid pTGR5 containing, or not, the pbp2m gene. Left, the shift in zone diameter size; right, the shift in the minimum inhibitory concentration (MIC). Diamonds are positioned on the scales, at positions corresponding to the difference of zone diameters (without pbp2mwith pbp2m) or the log2 of MIC ratios (with pbp2m/without pbp2m). Red, penicillins or cephalosporins; blue, other agents. Tmp-Stx, trimethoprim-sulfamethoxazole did not affect the MIC of any agent. These results show that PBP2m confers resistance to a broad range of betalactams.
Discovery of a multidrug-resistant conjugative plasmid carrying the gene pbp2m Strain FRC0402, a tox-negative isolate from La Réunion Island, stood out as being resistant to 12 agents (Fig. 4  inset). In addition to pbp2m, this isolate carried genes sul1, ermX, and dfrA16 and a tetA family tet(Z)-like (71%) tetracycline efflux gene. To define the genomic context of resistance genes, a complete genome sequence was obtained. The assembly revealed a chromosome of 2,397,465 bp and a circular plasmid of 73,763 bp (Fig. 7), which we propose to name pLRPD (for large resistance plasmid of C. diphtheriae).
The pbp2m gene was located on the large plasmid in a region comprising three other genes, a blaB betalactamase family gene, a LysR family regulator gene, and the ermX gene, flanked by two insertion sequences (IS1628) of the IS6 family (Fig. 7). With disparate direct repeat sequences, it remains unclear if this region Fig. 7 Map of plasmid pLRPD from isolate FRC0402. Predicted coding sequences are portrayed by arrows and colored based on the predicted gene function (refer to key). Inner blue circle, G+C% content; inner green circle, A+T% content. IS3502 annotation is putative; IS3503 is truncated represents a single transposable unit or a mosaic of gene acquisition events in pLRPD. A nearly identical PBP was observed in 78% of publicly available C. jeikeium genomes, in 57% of C. striatum genomes and in multiple other Corynebacterium genomes (Additional file 8: Table  S7). However, the genetic context of PBP2m was highly variable in C. diphtheriae and among other Corynebacterium species (Additional file 4: Fig. S14). A putative transposable PBP-containing unit (PCU) comprising genes pbp2m, blaB, and lysR, commonly flanked by IS3503 (IS256 family) with a fragment identified in pLRPD (Fig. 7), appeared to be highly conserved and was associated variably with ermX in C. diphtheriae and with a helicase in C. diphtheriae and other Corynebacterium species. The PCU was sometimes found in 2 or 3 tandem copies and was chromosomally located in most genomes.
Further elements carried by pLRPD included an integron carrying genes dfrA16, qacL, and sul1, as well as elements of a putative conjugation apparatus gene cluster (Fig. 7). Our conjugation experiments aiming to demonstrate the transfer of pLRPD into recipient C. diphtheriae isolates failed. This plasmid was not found in other C. diphtheriae strains.

Discussion
Strains of C. diphtheriae that are resistant to antimicrobial therapy may compromise the management of diphtheria cases and the control of pathogen transmission. Here, we aimed to define the genomic determinants of resistance to penicillin and other antimicrobial agents in C. diphtheriae and to analyze the relationships of resistance with diphtheria toxin production, biochemical variants, and phylogenetic sublineages. To this aim, we characterized phenotypically and genotypically a large sample of C. diphtheriae isolates from diverse geographic and temporal origins. We confirmed that the species is made of multiple phylogenetic sublineages [9,13,67] and showed that homologous recombination contributes five times more to their diversification than mutation, consistent with previous evidence of recombination in C. diphtheriae populations [12,13].
Historically, C. diphtheriae isolates have been classified into three main biovars, but the links between biovars and phylogenetic structure have remained obscure. Whereas previous work concluded on the absence of an association [11], our phylogenetic analyses reveal that Gravis and Mitis, the two main biovars of C. diphtheriae, are associated strongly with two phylogenetic lineages. Lineage Gravis appears to have acquired ancestrally a gene cluster comprising the extracellular glycogen debranching enzyme gene spuA [62]. Although the Gravis phenotype was associated with spuA within lineage Gravis, other genomic determinants of glycogen utilization remain to be discovered within the Mitis lineage. Whereas most of our biovar Belfanti isolates were excluded from this work because they belonged to C. belfantii or C. rouxii, a few Belfanti isolates did belong to C. diphtheriae. For three of them, mutations leading to a non-functional nitrate reductase were uncovered, explaining the phenotypic switch to biovar Belfanti. Our results show that biotyping is subject to parallel evolution, perhaps due to a yet-unknown selective pressure, and has limited species-level identification (as Belfanti is found in the three species C. diphtheriae, C. belfantii, and C. rouxii) or epidemiological typing value.
The most important factor of C. diphtheriae pathogenicity is the diphtheria toxin. Despite early realization that it is encoded on a prophage [5], few studies have investigated the phylogenetic distribution of the tox gene in C. diphtheriae [8,9]. Here, we show that tox-positive strains mainly belong to the Mitis lineage and to earlydiverging branches of the Gravis lineage. The distribution of tox-positive isolates into multiple Mitis sublineages is strongly indicative of independent acquisitions of the toxin gene. Alternately, this pattern might result from initial acquisition of the tox gene, followed by secondary loss in multiple sublineages. The phylogenetic pattern is also consistent with an ancestral presence of the tox-bearing phage in the Gravis lineage, with subsequent loss of the tox gene in the branch leading to the ancestor of most Gravis isolates. Future work should investigate the dynamics of the lysogenic corynephages and molecular determinants of their sublineage distribution. One important open question is the likelihood of tox-negative strains acquiring the tox gene during colonization, infection, or short-term epidemiological timeframes [7].
Remarkably, except for early-diverging sublineages, only one Gravis sublineage was found to carry the tox gene. This sublineage happens to correspond to the largest outbreak in recent times, which occurred in Newly Independent States of the ex-Soviet Union in the 1990s [13,14,38]. This sublineage, which comprises the ST8 reference strain NCTC13129, is genetically distant from other tox-positive lineages, which belong to the Mitis lineage. Hence, its antigenic structures or other pathogenicity properties may have diverged from those of more common tox-positive isolates, which might have contributed to its exceptional transmission in the 1990s, in addition to the decline in vaccine coverage [14]. Of note, biovar Gravis was named to reflect a perceived higher severity of infection compared to diphtheria cases caused by biovar Mitis isolates [68,69]. Recently, it was shown that most diphtheria vaccines contain, besides the anatoxin, multiple other C. diphtheriae immunogens [22]. The impact of vaccination on the evolution of C. diphtheriae populations, and possible variations of crossprotection as a function of strain diversity, is currently undefined. This work provides a framework onto which future studies can build to address this important question.
Although antimicrobial-resistant C. diphtheriae strains have been reported on numerous occasions [23,32], knowledge on antimicrobial resistance in C. diphtheriae is largely fragmented and suffers from the lack of harmonization. Breakpoints used to define resistance vary according to world region and have changed over time within single countries [70]. The lack of consensus on the definition of resistance restricts our ability to define the magnitude of the problem and its global significance.
We aimed to define biologically meaningful cutoffs [64] based on susceptibility phenotype distributions, taking advantage of our large and diverse sample. Our data allowed us to propose tentative ecological cutoffs for C. diphtheriae. This approach should in the future be extended to MIC values [71] and should use larger and more diverse strain collections. Nevertheless, our analyses suggest that non-susceptibility to at least one antimicrobial agent was acquired by half of C. diphtheriae strains, regardless of lineage, biovar, or toxigenic status. This study further suggests that acquired resistance to penicillin, the first-line therapy against diphtheria, is far from being rare, affecting > 15% of C. diphtheriae isolates collected in the last decade in France and its overseas territories. Besides, 10.4% of these isolates were multidrug-resistant. The high prevalence of resistance to penicillin, tetracycline, and trimethoprim/sulfamethoxazole found here is consistent with susceptibility surveys of recent C. diphtheriae isolates in Algeria [29], Indonesia [31,72], and India [73]. Many high-income countries such as France have chosen to use amoxicillin as the first choice for antibiotic therapy [74], as this molecule remains highly active. Still, widespread penicillin resistance is concerning, since diphtheria mainly occurs in resource-poor settings where penicillin G is largely used. In contrast, resistance to erythromycin and other macrolides remains rare. Our results call for concerted research into the magnitude of the antimicrobial resistance threat in C. diphtheriae.
Knowledge of the genetic mechanisms of antimicrobial resistance is critical for defining appropriate treatments, refining diagnostics, and conducting epidemiological studies of antimicrobial resistance. Resistance genes to several antimicrobial classes have been described in C. diphtheriae [36,37,75], while additional genes described in other Corynebacterium species [76] might also be present in C. diphtheriae. Here, we defined the prevalence and phylogenetic distribution of previously reported and newly identified resistance determinants in C. diphtheriae. We demonstrate the co-occurrence of resistance phenotypes and genes, suggesting a causative link in multiple instances. We further show that resistance genes have been acquired independently in multiple sublineages, demonstrating a dynamic resistome in C. diphtheriae. In addition, we demonstrate an association between alterations in chromosomally encoded targets and phenotypic resistance for quinolone, trimethoprim, and rifampicin. Fluoroquinolone resistance was previously linked to mutations in the gyrA gene in C. amycolatum [77], C. striatum [78], and C. belfantii [79] but seemingly never for C. diphtheriae. Finally, we demonstrate the co-occurrence within some strains of multiple resistance determinants and uncover a previously undescribed large resistance plasmid in C. diphtheriae. The mechanism of genetic transfer of this plasmid remains to be investigated. This work provides the first overview of the C. diphtheriae resistome and will facilitate further studies into the evolutionary emergence of multiresistant C. diphtheriae strains.
Here, we discovered an accessory PBP (PBP2m) and experimentally showed it to confer resistance to penicillin and other beta-lactam antimicrobial agents. Its distribution in multiple sublineages, and its presence in other Corynebacterium species, clearly demonstrates its horizontal transfer, and we revealed a multiplicity of genomic contexts in which it is found within Corynebacterium. PBP2m is a putative low-affinity PBP, which would explain why it is less affected by beta-lactam antibiotics. Very recently, Forde et al. [80] have reported high-level penicillin resistance (MIC > 256 g/L) in C. diphtheriae strain BQ11 and have attributed this phenotype to a penicillin-binding protein located on transposon Tn3503. We found that the pbp2m-containing unit is 99.92% similar to Tn3503, which additionally harbors a helicase (Additional file 4: Fig. S14). Besides, PBP2m is 100% identical to the PBP from BQ11 and therefore also possesses the V361A alteration (using C. jeikeium as reference) that was hypothesized to confer high-level penicillin resistance [80]. Further studies on the expression of pbp2m in its variable genetic backgrounds are required to understand the observed difference in penicillin resistance levels. The location of pbp2m on a mobile element and its association with beta-lactam and carbapenem resistance represents a public health threat.
The seven chromosomal PBPs of C. diphtheriae (including the newly annotated PBP4b) were investigated to identify amino acid sequence polymorphisms associated with penicillin resistance. Although several alterations were significantly associated, none was directly implicated with the catalytic residues of the transpeptidase or transglycosylase domains. The association with resistance may not be directly linked to these catalytic residues but could be due to secondary sites that are thought to interfere with beta-lactam ligand binding. While biochemical and structural studies are necessary to understand how these mutations affect penicillin susceptibility, we postulate that some changes in these domains could lead to allosteric effects ultimately resulting in beta-lactam resistance, as described for Staphylococcus aureus PBP2a [81,82] or Streptococcus pneumoniae PBP2x [83]. Other SNPs might simply have been hitchhiking due to their physical linkage with functionally important SNPs [84].

Conclusions
Our phylogenetic analyses showed that C. diphtheriae is highly diverse and comprises multiple sublineages that are grouped into two major lineages, characterized by distinctive associations with biovars Mitis and Gravis and with the diphtheria toxin gene. Antimicrobial resistance remains rare in France and its overseas territories, but multidrug-resistant strains were found and have a wide phylogenetic distribution. Analyses of the genetic bases of antimicrobial resistance defined the contribution of previously described mechanisms of resistance and uncovered a penicillin-binding protein, PBP2m, associated at the population level with penicillin resistance and shown experimentally to confer resistance to several beta-lactam antimicrobial agents. PBP2m is encoded in diverse genetic contexts, including on a large putatively conjugative resistance plasmid. This study reveals how biovars, diphtheria toxin gene, and antimicrobial resistance are associated with the phylogenetic diversity of C. diphtheriae strains, and provides novel data into the evolutionary dynamics of these medically important features. This framework will facilitate future studies investigating the emergence of sublineages of C. diphtheriae involved in diphtheria outbreaks, antimicrobial resistant infections, and the patterns and drivers of global spread of C. diphtheriae sublineages.