PPE Barcoding Identifies Biclonal Mycobacterium ulcerans Buruli Ulcer, Côte d’Ivoire

ABSTRACT Mycobacterium ulcerans, an environmental opportunistic pathogen, causes necrotic cutaneous and subcutaneous lesions, named Buruli ulcers, in tropical countries. PCR-derived tests used to detect M. ulcerans in environmental and clinical samples do not allow one-shot detection, identification, and typing of M. ulcerans among closely related Mycobacterium marinum complex mycobacteria. We established a 385-member M. marinum/M. ulcerans complex whole-genome sequence database by assembling and annotating 341 M. marinum/M. ulcerans complex genomes and added 44 M. marinum/M. ulcerans complex whole-genome sequences already deposited in the NCBI database. Pangenome, core genome, and single-nucleotide polymorphism (SNP) distance-based comparisons sorted the 385 strains into 10 M. ulcerans taxa and 13 M. marinum taxa, correlating with the geographic origin of strains. Aligning conserved genes identified one PPE (proline-proline-glutamate) gene sequence to be species and intraspecies specific, thereby genotyping the 23 M. marinum/M. ulcerans complex taxa. PCR sequencing of the PPE gene correctly genotyped nine M. marinum/M. ulcerans complex isolates among one M. marinum taxon and three M. ulcerans taxa in the African taxon (T2.4). Further, successful PPE gene PCR sequencing in 15/21 (71.4%) swabs collected from suspected Buruli ulcer lesions in Côte d’Ivoire exhibited positive M. ulcerans IS2404 real-time PCR and identified the M. ulcerans T2.4.1 genotype in eight swabs and M. ulcerans T2.4.1/T2.4.2 mixed genotypes in seven swabs. PPE gene sequencing could be used as a proxy for whole-genome sequencing for the one-shot detection, identification, and typing of clinical M. ulcerans strains, offering an unprecedented tool for identifying M. ulcerans mixed infections. IMPORTANCE We describe a new targeted sequencing approach that characterizes the PPE gene to disclose the simultaneous presence of different variants of a single pathogenic microorganism. This approach has direct implications on the understanding of pathogen diversity and natural history and potential therapeutic implications when dealing with obligate and opportunistic pathogens, such as Mycobacterium ulcerans presented here as a prototype.

collection, and its genome sequence is not known, limiting any further study of this unique environmental isolate.
The detection of M. ulcerans is routinely based on the detection of nucleic acid sequences thought to be specific for this opportunistic pathogen. More precisely, detection of M. ulcerans in samples mainly relies on PCR-based multiplex detection of the plasmid-encoded Ketereductase B-domain gene (KR-B) and two insertion sequences IS2404 and IS2606 (4). This multiplex system is limited in its capability to accurately identify M. ulcerans from closely related mycobacteria, including M. ulcerans subsp. shinshuense and Mycobacterium liflandii, and to genotype M. ulcerans in a context where the geographical microevolution of the pathogen has been observed in several Buruli ulcer regions of endemicity in Africa and Australia (5).
Here, investigating a large collection of M. marinum/M. ulcerans complex whole-genome sequences (WGSs) in the context of taxonomic and phylogenetic investigations of this pathogen (6), we observed one hot spot of genomic sequence variation among M. ulcerans complex members (including different taxa). We further analyzed this hot spot and are now reporting that investigating this hot spot by using an appropriate PCR sequencing system described here could add a convenient laboratory tool for diagnosing genomic diversity of M. ulcerans in clinical materials.

RESULTS
WGS database. We investigated a database containing a total of 385 M. marinum complex genomic sequences. In detail, 355 M. ulcerans genomes were isolated from humans and animals from Angola, Benin, Cameroon, Côte d'Ivoire, Democratic Republic of the Congo, Gabon, Ghana, Nigeria, Papua New Guinea, Republic of the Congo, Togo, Uganda, Australia (Mornington Peninsula, Bellarine Peninsula, Gippsland, Melbourne, Ballarat, Phillip Island, Queensland, Darwin, and Port Hedland regions), the United States, Japan, and French Guiana, 3 M. pseudoshottsii genomes were isolated from the United States and China, 3 M. liflandii genomes were isolated from Israel, and 24 M. marinum genomes were isolated from France, Thailand, Israel, the United States, Japan, and the United Kingdom.
Identification of taxa using the PPE gene as a hot spot-specific target. Based on output conserved alignment genes from the pangenome analysis of 385 different M. ulcerans and M. marinum isolates from different areas, we searched for a specific marker gene whose sequence would detect and discriminate between the announced taxa. Using the Seaview application, we observed that one PPE gene sequence was variable enough to discriminate between the M. ulcerans and M. marinum taxa, acting as a hot spot sequence proxy for the complete genome sequence, as detailed above. M. ulcerans strain SRR6346343 in taxon 2.1, isolated from T. vulpecula in Gippsland, completely lacked the PPE gene. A phylogenetic tree drawn from 384 extracted PPE gene sequences in the assembled genomes of M. marinum and M. ulcerans showed coherent clustering of taxa with the core genome phylogenetic tree (Fig. 2). This 1,068-bp PPE gene was annotated as a PPE family protein (CP003899.1) in the M. liflandii strain 128FXT used as the reference (GenBank accession number NC_020133.1). The PPE protein was predicted to be an alpha helical transmembrane protein with a 0.934205 probability. We further observed deletions in the 59 extremity of the   Fig. 3 and 4). All the mutations or deletions in this region were noted in each taxon according to their position (  Fig. 3; Table S3). In taxon 2.1, the Australian isolate SRR6346364 has two unique mutations (C/G at position 66 and C/A at position 86) in the common region compared to all the other isolates (Fig. 2,  Experimental data. Partial PCR sequencing of the PPE gene in nine cultured M. ulcerans and M. marinum isolates belonging to four taxa retrieved an experimental sequence identical to the one derived in silico from the whole-genome sequence of the same strain (Fig. 4). Regarding the mutation profile, we detected the same mutation profile in each isolate as the standard mutation PPE gene for each strain (Table S3). As for DNA extracted from 24 swabs collected for the diagnosis of Buruli ulcers, the insertion sequence IS2404 real-time PCR found 21 positive samples exhibiting cycling threshold (C T ) values of ,32 ( Table 1). PCR of the PPE gene further detected 15/21 (71.4%) positive samples, which all exhibited real-time PCR IS2404 C T values of ,39.9 ( Fig. 4; Table 1). Sequencing PCR products showed that the 15 samples were related to Africa taxon T2.4 (Fig. 4). In detail, the M. ulcerans T2.4.1 genotype was detected in eight samples, whereas a biclonal T2.4.1/T2.4.2 genotype infection was observed in seven samples by detecting the specific character of each taxon on output sequencing reads. Interestingly, biclonal infection was observed specifically in Toumodi and Sakassou regions, two cities located 15 miles apart one from the other, suggesting a cluster of cases in this region. The consensus sequence of each taxon was extracted, and a phylogenetic tree was generated with the reference PPE gene (Fig. 4).

DISCUSSION
To complement public databases currently containing only 14 complete M. ulcerans genome sequences, here, we succeeded in assembling a total of 341 additional complete M. marinum/M. ulcerans complex genome sequences directly from reads recovered from isolates acquired from two large epidemic areas, Africa and Australia. This work gave us the opportunity to conduct an unprecedented pangenome comparative analysis of many M. marinum/M. ulcerans complex members, with proposals for the reclassification of M. marinum/M. ulcerans complex members in different taxa based on core genomes, SNP distance, and geographical isolation of isolates, following our previous preliminary study.
As a practical output for the detection of M. marinum/M. ulcerans complex organisms in clinical practice, this tremendous bioinformatic work indicated that one PPE gene was a hot spot for genome variability. In silico data indicated that a 390-bp chromosomal fragment sequence of the PPE gene discriminated between the different taxa regions derived from a previous WGS analysis (5)(6)(7)(8) and was experimentally confirmed on cultured samples and on clinical samples in the presence of negative controls. Together, these data indicate that the PCR-sequencing protocol described here could be used for the effective one-shot detection, identification, and genotyping of organisms within the M. marinum/M. ulcerans complex in clinical samples previously detected by IS2404 real-time PCR. In the present study, negative detection of the PPE gene in some clinical samples correlated with low specific DNA concentration, marked by IS2404 real-time PCR C T values of .32. Indeed, whereas the number of IS2404 copies varies from 150 to 200 copies in the M. ulcerans genome (4,9), the PPE gene investigated here is a monocopy gene. Moreover, the PPE gene PCR sequencing system developed here may miss PPE gene deletions, making it undetectable by PPE gene primers.
The protocol developed here could be of interest for laboratories investigating M. marinum/M. ulcerans complex organisms in tropical countries, as M. ulcerans and M. marinum genotyping is currently achieved by analyzing mycobacterial interspersed repetitive units (9). The only obvious alternative would be whole-genome sequencing, an approach so far limited to M. marinum/M. ulcerans complex isolates, as no metagenomic data have ever been reported directly from Buruli ulcer lesions; therefore, the actual genomic diversity of M. ulcerans that causes Buruli ulcers remains underestimated. The protocol reported here therefore offers a convenient, albeit limited, proxy for the study of polyclonal M. ulcerans infection. Accordingly, we report a biclonal M. ulcerans infection in Buruli ulcer lesions in Côte d'Ivoire, an unprecedented observation opening new avenues in terms of understanding the natural history of Buruli ulcers in front of a yet undetermined genomic diversity of environmental M. ulcerans.
We therefore propose that partial PPE gene PCR sequencing could be implemented in the routine laboratory investigation of otherwise PCR-confirmed Buruli ulcer lesions to assess polyclonal infection. The protocol reported here will be adapted to the rapid Oxford Nanopore sequencing platform, which is based on a mapping approach, for point-of-care application.

MATERIALS AND METHODS
In silico M. ulcerans whole-genome sequence analyses. We retrieved WGS data for 44 assembled genomes of M. ulcerans and M. marinum available in the NCBI database (parameter used: Assembly; November 2020) as well as reads deposited for 174 M. ulcerans and 167 M. ulcerans isolates from two large epidemic areas in Australia and Africa (Table S1 in the supplemental material) (5)(6)(7)(8). FastQC was used for quality control of the sequencing data (https://github.com/s-andrews/FastQC). These 341 M. ulcerans sequencing reads were assembled using SPAdes version 3.14.0 and annotated using Prokka version 1.14.6 (10,11). To run SPAdes, we used the pipeline option "-careful" to reduce the number of mismatches and short insertions/deletions (indels). Default parameters for k values, that is, k-mer values of 127, 99, 77, 55, 33, and 21, were applied. All contigs with sizes under 800 bp or with depth values lower than 25% of the mean depth were possible contaminants and were removed. Roary 3.13.0 set at 95% sequence identity was used to generate the pangenome and core genome (conserved genes) (12). Snp-dists 0.6.3 was then used to calculate the single-nucleotide polymorphism (SNP) distance based on the core genome of the 385 studied WGSs. The core genome alignment was visualized using Seaview (13). FastTree v2.1.11 on the NGphylogeny website was used to create a phylogenetic tree using default parameters of the core genome and of the PPE gene of interest in this study (M. ulcerans Agy99; GenBank PPE gene ID: ABL03116.1). Moreover, one PCR primer pair specifically targeting a 390-bp region in the PPE gene of interest was designed using primer BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/). PPE gene-based PCR. Eight M. ulcerans isolates and one M. marinum isolate maintained in the IHU Méditerranée Infection Mycobacteriology Laboratory collection and firmly identified by WGS were incubated for 6 weeks at 30°C on solid Löwenstein-Jensen medium (Bio-Rad, Marnes la Coquette, France). DNA was extracted from colonies using the InstaGene matrix following the manufacturer's instructions (Bio-Rad). In addition, 24 leftover swab samples taken from suspected Buruli ulcer cutaneous lesions in 24 different patients as part of routine medical management in Côte d'Ivoire were tentatively diagnosed as M. ulcerans infections at the Institut Pasteur Laboratory, Abidjan, Côte d'Ivoire (Table 1). Swabs were hydrated and suspended in sterile distilled water, which was used for the extraction of total DNA using a Qiagen kit following the manufacturer's instructions (DNeasy blood and tissue kit, Qiagen, Courtaboeuf, France). M. ulcerans DNA was searched using a real-time PCR targeting the IS2404 gene, as previously described (4). After strict anonymization of leftover extracted DNA, the remaining DNA was sent to IHU Méditerranée Infection to refine the diagnosis using the PPE gene and was stored at 220°C until use. PCR amplifications of a 429-bp fragment were performed in a Bio-Rad MyCycler (Bio-Rad, Marnes-la-Coquette, France) in 50-mL reaction mixtures containing 5 mL of genomic DNA template, 25 mL of AmpliTaq Gold 360 master mix (Thermo Fisher Scientific), 1.  gel electrophoresis, were purified using the Millipore NucleoFast 96 PCR kit according to the manufacturer's recommendations (Macherey-Nagel, Düren, Germany) and sequenced using the BigDye terminator cycle sequencing kit (Applied Biosystems) with an ABI automated sequencer (Applied Biosystems). Sequences were assembled using the Chromas Pro 1.7 software (Technelysium Pty, Ltd., Tewantin, Australia). Ethics statements. No clinical specimen was sampled specifically for this study, which only incorporated anonymized, leftovers of samples routinely collected for the diagnosis of Buruli ulcers in consulting patients.
Data availability. The data supporting the findings of this study are available within the article and its supplemental material. Raw data that support the findings of this study are available from the corresponding author upon reasonable request.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, PDF file, 0.4 MB. SUPPLEMENTAL FILE 2, XLSX file, 0.8 MB.