Antibiotic resistance and genomic features of Clostridioides difficile in southwest China

Background Clostridioides difficile infection (CDI) caused by toxigenic strains leads to antibiotic-related diarrhea, colitis, or even fatal pseudomembranous enteritis. Previously, we conducted a cross-sectional study on prevalence of CDI in southwest China. However, the antibiotics resistance and characteristics of genomes of these isolates are still unknown. Methods Antibiotic susceptibility testing with E-test strips and whole genome sequence analysis were used to characterize the features of these C. difficile isolates. Results Forty-nine strains of C. difficile were used in this study. Five isolates were non-toxigenic and the rest carried toxigenic genes. We have previously reported that ST35/RT046, ST3/RT001 and ST3/RT009 were the mostly distributed genotypes of strains in the children group. In this study, all the C. difficile isolates were sensitive to metronidazole, meropenem, amoxicillin/clavulanic acid and vancomycin. Most of the strains were resistant to erythromycin, gentamicin and clindamycin. The annotated resistant genes, such as macB, vanRA, vanRG, vanRM, arlR, and efrB were mostly identified related to macrolide, glycopeptide, and fluoroquinolone resistance. Interestingly, 77.55% of the strains were considered as multi-drug resistant (MDR). Phylogenetic analysis based on core genome of bacteria revealed all the strains were divided into clade 1 and clade 4. The characteristics of genome diversity for clade 1 could be found. None of the isolates showed 18-bp deletion of tcdC as RT027 strain as described before, and polymorphism of tcdB showed a high degree of conservation than tcdA gene. Conclusions Most of the C. difficile isolates in this study were resistant to macrolide and aminoglycoside antibiotics. Moreover, the MDR strains were commonly found. All the isolates belonged to clade 1 and clade 4 according to phylogenetic analysis of bacterial genome, and highly genomic diversity of clade 1 was identified for these strains.


Bacterial source, growth and DNA extraction
Forty-nine isolated C. difficile strains from our previous study (978 fecal samples) were used (Table S1). C. difficile were grown at 37 C in an anaerobic chamber (Mitsubishi, Tokyo, Japan) by using an anaerobic gas pack (Thermo Fisher Scientific, Waltham, MA, USA). BHI agar (Oxoid, Basingstoke, Hampshire, United Kingdom) supplemented with 0.5% yeast extract (Oxoid, Basingstoke, Hampshire, United Kingdom) and 0.03% L-cysteine (Oxoid, Basingstoke, Hampshire, United Kingdom) was used for bacterial cultures, and plates were incubated for 24 to 48 h. Total genomic DNA of the isolated bacteria was extracted using a bacterial total genomic DNA extraction kit (Tiangen, Beijing, China) following the manufacturer's instructions. All DNA samples were stored at −20 C for complete genome analysis.
Breakpoints for these antibiotics were determined according to the recommendations of the Clinical and Laboratory Standards Institute (CLSI) M11-A7 and M100-S24. C. difficile ATCC700057 was used as a quality control. Multi-drug resistance (MDR) was defined as acquired non-susceptibility to at least one agent in three or more antimicrobial categories Magiorakos et al., 2012).

Genome sequencing, assembly and annotation
Genomic sequencing of all isolates was performed by our laboratory on the Illumina MiSeq platform using 2 × 150 bp paired-end reads (Gu et al., 2020). The libraries were built by using a Nextera XT DNA Library Prep Kit following the manufacturer's instructions. In general, 1 ng genomic DNA of each strain was used. The segment and purification were performed, then index PCR was used to add the barcodes using i5 and i7 primers, after that the purification was executed again. Finally, the libraries were normalized, pooled and sequenced using an Illumina MiSeq sequencing system (Illumina, San Diego, CA, USA).
The raw data were trimmed and filtered by Trimmomatic (version 0.38) (Bolger, Lohse & Usadel, 2014). SOAP denovo (version 2.04) was used to assemble the draft genomes with k-mer values optimized to the best assembly results (Li et al., 2009). The open reading frame (ORF) of each genome was predicted by GenemarkS software (version 4.28) (Besemer, Lomsadze & Borodovsky, 2001) with default parameters. The predicted amino acid sequences were aligned and annotated by DIAMOND (E-value: 1e−5, top 5) (Buchfink, Xie & Huson, 2015) to SwissProt, NR, COG, KEGG, Virulence Factor Database (VFDB), Antibiotic Resistance Genes Database (CARD), and Pathogen-Host Interaction database (PHI). The heatmaps based on CARD and VFDB database annotations were drawn using pheatmap in the R package (version 3.2). CD630 with accession number: CP010905 was used as the reference strain for alignment.

Phylogenetic analysis and statistics
Phylogenetic analyses based on whole genomes were performed for all 49 C. difficile in this study with 20 reference strains isolated from different countries in different years ( Table S2). The core SNP alignment phylogenetic tree was built based on the SNP of concatenated core genes with Snippy (Bush, 2021), and the recombination region removed with BAGEP (Olawoye, Frost & Happi, 2020). tcdA, tcdB and tcdC genes were extracted by using CLC genomic workbench, and only full length of these genes were used for alignment. The sequences of each strain were aligned by MEGA 6.0 using the neighbour-joining (NJ) method with 1,000 bootstrap replicates to generate a phylogenetic tree. CD630 (accession number: CP010905) and R20291 (accession number: FN545816) were used as references.
Statistical analysis was performed using the SPSS software package (version 16.0, IBM, Armonk, NY, USA). Kolmogorov-Smirnov Z, T-test or χ2 test were used where appropriate. A P value of <0.05 was recognized as statistically significant.

Availability of data and materials
All data generated or analysed during this study are included in this published article. The sequence data have been deposited into the National Center for Biotechnology Information (NCBI), https://www.ncbi.nlm.nih.gov/ with BioProject accession number: PRJNA785426 (SRR17146436-SRR17146484).

Ethics statements
The human sample collection and detection protocols were carried out in accordance with relevant guidelines and regulations. All experimental procedures were approved by the Ethics Review Committee (Institutional Review Board (IRB)) of Yunnan Provincial Centre for Disease Control and Prevention (Number: YNCDC-2017005). All adult subjects provided written consent, and a parent or guardian of any child participant provided written consent on their behalf.

Antibiotic resistance
All the strains were sensitive to MTZ, MEM, AMC and VA. All of the isolates were intermediately resistant to CTX. However, 87.80% (43/49) of the C. difficile strains were resistant to erythromycin, 81.60% (40/49) to CN, 61.20% (30/49) to DA, 36.70% (18/49) to CAZ, and 28.60% (14/49) of isolates to LEV (Table 1). Among all the C. difficile isolates in this study, 77.55% (38/49) were defined as MDR strains. Twenty-four strains were resistant to three classes of antibiotics, nine strains were resistant to four classes, three isolates were resistant to five classes, and two C. difficile isolates resistant to six classes.
We further compared the antibiotic resistance results between children and adult groups (Table 1). Except the gentamicin, there was no statistical significance between two groups for all antibiotics. The resistant rate of gentamicin in children group (69.39%) was significantly higher than adult group (12.25%) (χ2 = 9.269, P = 0.010).

Genomic sequencing results
The general information and details of genomic sequencing results are shown in Table 2 and Table S3. The effect reads after quality control of raw data were 91.39 ± 2.02 (%) for all the 49 C. difficile strains in this study, and the average nucleotide identity (ANI) was 99.65 ± 0.48. The average genome size was 4.07 ± 0.10 Mbp, GC content was 28.86 ± 0.35 (%), and the coverage for all strains was 98.14 ± 0.68 (%). The reference CD630 strain showed 4.27 Mbp genome size and 29.00% GC content. For variation analysis, the average SNPs of all the C. difficile strains was 16,665.84 ± 9,676.51 bp, from 3,023 bp to 36,394 bp. There was 116.69 ± 59.12 bp insertion, and 144.39 ± 75.04 bp deletion of all the bacteria.

Annotation results
The annotation results of C. difficile isolates in this study revealed that the average numbers of genes annotated using the different databases were 3,925.27 ± 129.82. There were 3,873.78 ± 120.07, 1,938.49 ± 33.77, 2,949.37 ± 43.02, and 1,236.92 ± 8.26 genes annotated with NR, KEGG, COG and SwissProt databases, respectively, as shown in Fig. 1A. Furthermore, 132.47 ± 2.14, 93.84 ± 4.86, and 193.63 ± 3.30 genes were annotated using PHI, CARD and VFDB databases (Fig. 1A). Specifically, macB (19.78%) belonged to ATP-binding cassette (ABC) antibiotic efflux pump resistant to macrolide antibiotic was the most distributed gene in CARD database. arlR (4.98%) and efrB (4.32%) were also the top distributed genes in CARD database, both belonged to antibiotic efflux pump. vanRA (5.62%), vanRG (9.45%), and vanRM (3.90%) belonged to glycopeptide resistance gene cluster were the top distributed genes for glycopeptide antibiotic resistance, as Fig. 1B shown. For VFDB annotation, two clusters of strains were generated according to the virulence genes distribution (Fig. 1C). From YNCD947 to YNCD55 isolates, secretion system related genes were highly distributed, however from YNCD6 to YNCD505 strains, the toxin related genes were highly distributed.

Phylogenetic analysis
Phylogenetic analysis was conducted on the 49 C. difficile isolates used in this study with core genomes of bacteria, as shown in Fig. 2. Five clusters of branches were obviously identified, and each cluster belonged to a different clade, namely clade 1 to 5, as shown in Fig. 2. All the strains in this study were divided into clade 1 and clade 4. Except three isolates that belonged to clade 4 (ST39 strains; YNCD887, YNCD175 and YNCD372), all others belonged to clade 1. However, there were some sub-clusters found in clade 1, and each sub-cluster corresponded to the major sequence type of the strains. For example, from YNCD905 to YNCD947, all these strains belonged to ST35; from YNCD601 to YNCD916, all belonged to ST3; from YNCD086 to YNCD088, both belonged to ST2; from YNCD082 to YNCD147, all belonged to ST54 (Fig. 2). The high genomic diversity characteristic of clade 1 could be detected in the strains that were grouped into this clade.

Toxin gene analysis
Toxin gene analysis revealed the presence of the full length tcdA gene in 18 C. difficile strains studied, and the alignment results indicated that six clusters were generated according to the gene polymorphism, as shown in Fig. 3A. Compared with highly pathogenic strain (R20291, RT027), all the tcdA genes were similar with low pathogenic strain (CD630, RT012). For tcdB gene, 28 full lengths of this gene were extracted, but only four clusters were generated, which showed a high degree of conservation (Fig. 3B). Similar with tcdA, all the tcdB genes were clustered with low pathogenic strain CD630, but far away from highly pathogenic strain R20291. The tcdC analysis revealed that 17 full lengths of genes were extracted, and all the strains did not have the 18-bp deletion of R20291 from 391 to 408 nt of the gene, as shown in Fig. 3C. Only some

DISCUSSION
Antibiotic usage has been associated with CDI in clinics, which disrupt the normal gut microbiota of hosts and lead to the proliferation of C. difficile, especially for macrolides, aminoglycosides, quinolones and cephalosporins. Previous studies revealed that clade 4 of C. difficile often associated with MDR, and was recognized as primary clade of MDR after clade 1 in China Li et al., 2019). Several European countries have reported the reduced metronidazole susceptibility of strains, but still not found in China (Peng et al., 2017;Spigaglia, Barbanti & Mastrantonio, 2011). In our study, all the strains were sensitive to metronidazole, meropenem, amoxicillin/clavulanic acid and vancomycin, which showed identical results as previously reported. Li et al. (2019) performed a study on antibiotic resistance in C. difficile isolated from clinical sources across China. In general, 319 isolates tested by using 11 antibiotics, 313 were resistant to at least one antibiotic. The ciprofloxacin, clindamycin, and erythromycin were the highest rate of resistant antibiotics among all age groups. All isolates were susceptible to metronidazole and vancomycin. Moreover, the proportion of multi-drug resistant strains was much lower than previous reports, except for chloramphenicol and meropenem. In this study, most of the strains were resistant to erythromycin, gentamicin and clindamycin, followed by ceftazidime and levofloxacin. The CARD annotation results were consistent to phenotypic results, macB, vanRA, vanRG, vanRM, arlR, and efrB were resistant genes referred to macrolide, glycopeptide, and fluoroquinolone. There were some associations between the phenotypic resistance and the annotated antibiotic resistance genes, except for vanRA, vanRG and vanRM and glycopeptide resistance, which was not phenotypically tested. The macB, arlR and efrB genes were all belonged to antibiotic efflux pump for macrolide and fluoroquinolone resistance, in accordance with the erythromycin and levofloxacin resistant phenotypes. Interestingly, the MDR strains showed extremely high proportion than other areas of China, and 77.55% were defined as MDR. Clade 1 represents highly heterogeneous toxigenic and non-toxigenic sequence types and ribo-types, including many clinically significant strains, such as RT014, RT002 and RT018, some of the most commonly isolated types of RTs from CDI patients (Bauer et al., 2011;Collins, Hawkey & Riley, 2013;Foster et al., 2014;Lim et al., 2014). Clade 4 contains RT017 (ST37), which has a different toxin spectrum and is generally resistant to clindamycin and fluoroquinolone. Despite the absence of toxin A and CDT toxin expression, RT017 still causes widespread CDI, which is associated with outbreaks in Europe and North America (Alfa et al., 2000;Drudy et al., 2007;Goorhuis et al., 2009). In addition, RT017 is responsible for the most cases of CDI in Asia Wu et al., 2022). Clade 2 contains highly toxic RT027 and other clinically important RTs, including RT244 and RT176 (Valiente et al., 2012). Recent MLST and WGS studies have shown that the heterogeneity of clade 5 is more diverse than originally imagined, including not only RT078, but also RTs from numerous clinical, animal and food sources (RT033, RT045, RT066, RT126, RT127, RT237, RT280, RT281 and RT288 strains) (Knight et al., 2019;Wu et al., 2019). In this study, only clade 1 and 4 of strains were identified, and isolates in clade 1 revealed characteristics of highly genomic diversity. The sub-clusters of strains were consistency with ST types in clade 1.
Both tcdA and tcdB are located at the 19.6 kb pathogenicity locus (PaLoc). tcdC is located downstream of tcdA and is transcribed from the opposite direction of the two toxin genes (Pruitt et al., 2010). The decrease in TcdC expression corresponds to the increase in TcdA and TcdB expression, indicating that TcdC may be a negative regulator of toxin production (Matamouros, England & Dupuy, 2007). However, the role of tcdC in repression of PaLoc toxin expression has been subsequently disproven (Bakker et al., 2012). RT027 (highly pathogenic clone) strain from the United Kingdom showed 18-bp deletion for tcdC gene, and this deletion indicated to be conserved among all RT027 isolates (Curry et al., 2007). Furthermore, previous report demonstrated that epidemic C. difficile that produced high levels of toxins carried deletion or frame shift mutations in tcdC gene (Curry et al., 2007). In this study, none of the isolates showed 18-bp deletion as RT027 strain as described before, and it was considered that all these C. difficile were low pathogenic clones. Moreover, the tcdB gene was more conserved than the tcdA gene. Steele et al. (2013) investigated the efficacy of specific human monoclonal antibodies against TcdA and TcdB toxins. Their results showed that TcdB was more important than TcdA as virulence factor associated with gastrointestinal tract disease. Therefore, the nucleotide polymorphisms probably were associated with pathogenic ability of these toxins.

CONCLUSIONS
In this study, the antibiotic susceptibility tests and whole genomic sequencing method were both used to analyze the features of C. difficile in southwest China. Most of the strains were resistant to erythromycin, gentamicin and clindamycin. A total of 77.55% of the strains were considered as MDR. Phylogenetic analysis based on core genome of bacteria revealed all the strains were divided into clade 1 and clade 4. The characteristics of genome diversity for clade 1 could be found. None of the isolates showed 18-bp deletion of tcdC as RT027 strain as described before, and the tcdB gene was more conserved than the tcdA gene.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
The authors received no funding for this work.