Ongoing Genome Reduction in Mycobacterium ulcerans

M. ulcerans is adapting to a more stable environment.

evolution of Mycobacterium ulcerans, the causative agent of Buruli ulcer, is hampered by the striking lack of genetic diversity of this emerging pathogen. However, by using a prototype plasmid-based microarray that covered 10% of the genome, we found multiple genomic DNA deletions among 30 M. ulcerans clinical isolates of diverse geographic origins. Many of the changes appear to have been mediated by insertion sequence (IS) elements IS2404 and IS2606, which have high copy numbers. Classification of the deleted genes according to their biological functions supports the hypothesis that M. ulcerans has recently evolved from the generalist environmental M. marinum to become a niche-adapted specialist. The substantial genomic diversity, along with a prototype microarray that covered a small portion of the genome, suggests that a genome-wide microarray will make available a genetic fingerprinting method with the high resolution required for microepidemiologic studies.
The study of genetic diversity within bacterial species has provided information on aspects such as virulence (1,2), antimicrobial drug resistance (3), epidemiology, and microbial evolution (4-7). For mycobacteria such as Mycobacterium tuberculosis and M. ulcerans, low intraspecies diversity limits the use of genetic fingerprinting techniques that are based on sequence diversity in selected genetic elements. For M. tuberculosis, M. bovis, and the various bacillus Calmette-Guérin daughter strains, genome-wide microarray analyses have identified large sequence polymorphisms (4,8-10). However, the complete genome sequence of an organism is required for the design of synthetic oligonucleotide or PCR product-based microarrays. When this information is not available, an alternative is a PCR product-based shotgun DNA microarray (11), which we developed further into a plasmid-based microarray. We used this method for the differential genomic analysis of M. ulcerans, a human pathogen for which the fully assembled and annotated genome sequence was not available at the time of the study.
M. ulcerans is the causative agent of Buruli ulcer, an infectious disease characterized by chronic necrotizing skin ulcers (12). Buruli ulcer is an emerging infectious disease found mostly in West African countries but also in tropical and subtropical regions of Asia, the Western Pacific, and Latin America (13). Genetic analyses suggest recent divergence of M. ulcerans from M. marinum, a well-known fish pathogen that can cause limited granulomatous skin infections in humans (14). One of the hallmarks of the emergence of M. ulcerans as a more severe pathogen is the acquisition of a 174-kb plasmid that bears a cluster of genes necessary for the synthesis of the polyketide toxin mycolactone. This toxin appears largely responsible for the massive tissue destruction seen in Buruli ulcer (15). The epidemiology and mode of transmission of M. ulcerans disease are not fully understood, partly because no molecular typing method with sufficiently high resolution for microepidemiologic analyses is available.
Standard molecular typing methods such as multilocus sequence typing, restriction fragment length polymorphism, and fingerprinting using variable number of tandem repeats have shown an apparent lack of genetic diversity of M. ulcerans within individual geographic regions, which is indicative of a clonal population structure. The genotyping technique that has shown the highest discriminatory power so far is based on the use of outward-directed primers specific for the insertion sequence (IS) IS2404, in combination with an oligonucleotide that targets a repeated GC-rich motif (16). This method resolved 10 different M. ulcerans genotypes, which correspond to the geographic origin of the isolates. However, this level of resolution is not sufficient for microepidemiologic analyses. We hypothesized that, as for M. tuberculosis (17), deletional and insertional events mediated by repetitive sequence elements are a major mechanism for genomic variation in M. ulcerans. To test this hypothesis, we developed a plasmid-based microarray and analyzed genomic DNA from 30 M. ulcerans isolates of diverse origins.

Plasmid-Based DNA Microarray
From a shotgun clone library of strain Agy99, 352 Escherichia coli plasmids (pCDNA2.1, Invitrogen, Basel, Switzerland) were randomly selected. Each plasmid contained an M. ulcerans DNA fragment of ≈2.3-2.7 kb. Given a genome size of 5,806 kb (18), this set of plasmid inserts represents a theoretical genome coverage of ≈10%. Plasmid DNA was prepared by using a Biomek 2000 Workstation (Beckman Coulter, Krefeld, Germany) and dissolved at a concentration of 150 ng/μL in 3× SSC (20× SSC stock solution is 3 M sodium chloride, 0.2 M sodium citrate, pH 7.0). The DNA samples were loaded on a piezo-dispensing head that contained 24 channels and spotted onto glass slides coated with poly-L-lysine (Superfrost Plus, Menzel, Braunschweig, Germany) by using a Topspot spotter (Biofluidix, Freiburg, Germany). Slides were incubated at 4°C overnight and rehydrated under 50%-60% humidity for 1 h at room temperature. The spots resulting from a volume of ≈1 nL had an average diameter of 270 μm and were 500 μm apart from each other. The microarray layout displayed 2 identical fields (for hybridization with 2 different probes) that consisted of 2 replicates each, both of which contained 32 controls and 352 plasmids.
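The theoretical coverage quoted above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes a mean insert size of 2.5 kb (the midpoint of the stated 2.3-2.7 kb range) and ignores overlaps between inserts, so it gives an upper bound. The function name is hypothetical.

```python
def theoretical_coverage(n_plasmids, mean_insert_kb, genome_kb):
    """Upper-bound fraction of the genome represented by the inserts.

    Overlaps between randomly selected inserts are ignored, so the
    true coverage can only be lower than this value.
    """
    return n_plasmids * mean_insert_kb / genome_kb


GENOME_KB = 5806       # genome size of strain Agy99 given in the text
MEAN_INSERT_KB = 2.5   # assumption: midpoint of the 2.3-2.7 kb range

# All 352 plasmids spotted on the slide
spotted = theoretical_coverage(352, MEAN_INSERT_KB, GENOME_KB)

# The 232 spots retained for analysis after quality filtering
retained = theoretical_coverage(232, MEAN_INSERT_KB, GENOME_KB)

print(f"352 spotted plasmids:  {spotted:.1%}")
print(f"232 retained plasmids: {retained:.1%}")
```

Under these assumptions, the ≈10% figure corresponds to the 232 plasmids that passed quality filtering, whereas the full set of 352 spotted plasmids would give roughly 15%.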

Hybridization of Microarray Slides
Five micrograms of biotinylated DNA was mixed with 30 μg human Cot-1 DNA (Roche Applied Science, Indianapolis, IN, USA) and 100 μg yeast tRNA (Gibco/BRL). The hybridization mix was concentrated with a Speed Vac Concentrator System (Eppendorf, Basel, Switzerland), redissolved in 3× SSC, 0.3% SDS, denatured for 3 min at 95°C, and incubated for 30 min at 37°C before hybridization. Microarray slides were cleaned with a nitrogen flow, exposed to UV light in a Stratalinker 2400 (Stratagene, La Jolla, CA, USA) at 650 × 100 μJ, and heated for 5 min to 95°C before application of 13 μL of the hybridization mix on each array field. Hybridization occurred for 20 h at 65°C in a hydration chamber. Hybridized slides were washed once with 2× SSC, 0.03% SDS for 5 min at 65°C, twice with 1× SSC for 5 min at room temperature, and finally with 0.2× SSC for 5 min at room temperature. The coloration step was performed with 2 mL staining solution containing 50% casein, 1× maleic acid buffer (Roche Applied Science), and 2 μg Streptavidin Cy3 Fluorolink (Amersham, Piscataway, NJ, USA) for 30 min at room temperature, followed by additional washings for 5 min with 1× TBS (0.15 M sodium chloride, 0.02 M Tris, pH 7.5) as well as 0.1× TBS and drying with a nitrogen flow. DNA of all 30 M. ulcerans strains was processed under identical conditions and hybridized at least twice, which yielded 4 sets of data for each strain. Human Cot-1 DNA and plasmid DNA without insert as well as a hybridization mix without DNA served as negative controls for hybridization. A 500-bp β-lactamase gene fragment and Cy3-labeled random oligonucleotides (Microsynth, Balgach, Switzerland) were used as positive controls and for estimation of the amount of spotted DNA.

Microarray Scanning and Data Evaluation
Images of the microarrays were acquired by using a laser microarray scanner (GenePix 4100A, Axon Instruments Inc., Foster City, CA, USA) with an excitation wavelength of 532 nm, an emission wavelength of 570 nm, and standardized measurement parameters. The resulting image was analyzed by the software GenePix Pro 4.1 (Axon Instruments Inc.), which enabled assignments of mean intensity values used for data interpretation. To select spots to be included in the analysis of genomic diversity of M. ulcerans strains, replicates of 10 hybridizations were performed by using M. ulcerans Agy99 genomic DNA. All spots that showed a signal lower than twice that given by the negative control plasmid without insert were rejected, as were all spots for which the coefficient of variation was >30%. Further analysis used 232 spots that had an average signal above the threshold and sufficient signal stability. For each plasmid, we calculated the average signal value, standard deviation, and coefficient of variation and assessed a signal ratio in comparison with the reference strain. Outlier spots with a ratio higher than U2 (U2 = upper quartile + 3× interquartile range) were identified through a box-plot analysis.
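The spot selection and outlier rules described above can be sketched as follows. This is a minimal illustration, not the software used in the study: the function names and example values are hypothetical, the sample standard deviation and the "inclusive" quartile method are assumptions, and the ratio is assumed to be reference signal divided by test signal, so that a deleted region gives a high ratio.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def keep_spot(replicate_signals, negative_control_signal,
              min_fold=2.0, max_cv=0.30):
    """Apply the two selection rules from the text to one spot.

    A spot is rejected if its mean signal is below twice the signal of
    the insert-free control plasmid, or if its CV exceeds 30%.
    """
    mean_signal = statistics.mean(replicate_signals)
    if mean_signal < min_fold * negative_control_signal:
        return False
    return coefficient_of_variation(replicate_signals) <= max_cv


def upper_outlier_fence(ratios):
    """U2 = upper quartile + 3 x interquartile range."""
    q1, _, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    return q3 + 3 * (q3 - q1)


def outlier_spots(ratio_by_plasmid):
    """Return the names of plasmids whose ratio exceeds U2."""
    fence = upper_outlier_fence(list(ratio_by_plasmid.values()))
    return sorted(name for name, ratio in ratio_by_plasmid.items()
                  if ratio > fence)


# Hypothetical reference/test signal ratios for 8 spots in one isolate
example_ratios = {
    "pA": 1.0, "pB": 1.1, "pC": 0.9, "pD": 1.2,
    "pE": 1.0, "pF": 0.95, "pG": 1.05, "pH": 6.0,
}
print(outlier_spots(example_ratios))
```

In this made-up example only the spot with a ratio of 6.0 lies above the fence; in the study, each such candidate was subsequently checked by PCR.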

Characterization of Large Sequence Polymorphisms
Microarray data that indicated the presence of a deletion were verified by PCR analysis, which used primer pairs that spanned the insert sequences of the respective plasmids, the flanking regions, or both. The 5′ and 3′ limits of the confirmed genomic deletions with respect to the genome of strain Agy99 were determined by PCR analysis, which used multiple sets of primers complementary to flanking genomic regions. PCR analyses that bridged the genomic breakpoints were performed by using a long-range PCR polymerase mix (Fermentas, St Leon-Rot, Germany) according to the manufacturer's description. PCR products were cloned into pGEM-T (Catalys AG, Promega, Wallisellen, Switzerland) and sequenced using an ABI PRISM 310 genetic sequence analyzer (Perkin-Elmer, Waltham, MA, USA).

Comparative Genomic Hybridization of M. ulcerans Isolates
We constructed a microarray based on a random selection of 232 Escherichia coli plasmids obtained from a shotgun sequence library of the M. ulcerans isolate Agy99 from Ghana. Genomic DNA hybridization signal intensities from 30 M. ulcerans clinical isolates of worldwide distribution (Figure 2) were compared with those obtained with strain Agy99. Box-plot analysis (Figure 3) identified plasmids that yielded outlier signals with respect to strain Agy99. For 19 of 20 plasmids, PCR analysis confirmed an association of the outlier signal with a genomic deletion. Only 1 low hybridization signal represented a false-positive result (p188 from strain 940511, Côte d'Ivoire; Figure 3). The number of confirmed outlier plasmids per isolate ranged from zero for most African isolates to 9 for isolates from Suriname and French Guiana (Figure 1).
Of the 19 plasmid inserts that yielded confirmed outlier signals, 3 (p111, p299, and p341) contained sequences from the virulence plasmid pMUM001 of M. ulcerans. Of the 16 plasmids derived from the M. ulcerans chromosome, some contained fragments that overlapped the same region (Figure 4). Hybridizing regions were almost identical for p60 and p61. Both plasmids yielded outlier values with the isolates from Suriname and French Guiana. A cluster of overlapping inserts was observed for p88, p153, and p360; these produced outlier values for both of the Mexican isolates. The same pattern was seen with p124 and p291, which have inserts that are located in close proximity to each other in the genome (Figure 3). These results from related inserts demonstrated the reproducibility of the differential hybridization analysis. Because the inserts p60-p61, p88-p153-p360, and p124-p291 were each part of the same deletion in regions of difference (RDs) 4, 5, and 8, respectively (Figure 4), altogether 12 chromosomal RDs were identified (Figure 1).

Characterization of Genomic RDs
The 5′ and 3′ limits of the genomic deletions with respect to the genome of strain Agy99 were determined by PCR analysis that used multiple sets of primers complementary to plasmid inserts and to flanking genomic regions. The size of the deletions ranged from 1.8 kb to 53.1 kb (Table).
In 3 of the 12 RDs (RD3, 9, and 12), 2 distinct types of overlapping deletions (designated A and B) were observed, leading to a total of 15 large deletions. The overlapping deletions shared neither common 5′ nor 3′ end sequences. The strains from Australia had a 3.5-kb deletion in RD3; strains from Suriname and French Guiana had a slightly larger (3.8-kb) deletion. The isolates from Suriname and French Guiana had a larger (25.4-kb) deletion in RD9 than the isolates from Japan and China (17.7 kb). The largest deletion (53.1 kb) was designated RD12A and was observed in strains from Japan and China. Isolates from Suriname and French Guiana had a significantly smaller deletion in RD12 (35.2 kb). The 19.7-kb deletion 6 was found in isolates from 2 different regions (Mexico and Japan/China, respectively). All other deletions were observed in 2 isolates from the same region (Table).
To assess whether polymorphisms undetected by the microarray analysis would frequently occur in the identified RDs, we performed a detailed PCR analysis in all 30 M. ulcerans strains included in this study for 2 randomly selected RDs (RD5 and 12). We used 4 distinct primer pairs to span the insert sequence plus 5′ and 3′ flanking sequence stretches. For RD12, the PCR analysis confirmed the presence of a deletion in the 4 strains that had outlier signals in the microarray analysis, but no evidence for deletional polymorphism was obtained in the other strains. For RD5, PCR analysis confirmed the presence of a deletion in the 2 Mexican strains that had outlier signals (not shown). In addition, this PCR analysis identified the presence of an insertion in strains from Japan, China, Suriname, and French Guiana. The sequence of this 765-bp DNA insert was identical for all 4 strains. Its G+C content was 64%, and BLAST searches showed 98% identity with a sequence stretch of the M. marinum genome (www.sanger.ac.uk/cgi-bin/blast/submitblast/m_marinum) but no significant homology with sequences in the National Center for Biotechnology Information BLAST databases (www.ncbi.nlm.nih.gov/blast).

Association of Deletions with Insertions
Of the 15 identified genome rearrangement events, 1 (deletion 3A observed in 2 Australian isolates) was found to be a simple deletion, with the genomic sequences flanking the 5′ and 3′ borders of the 3,451-bp deletion being directly joined (Figure 5). Analysis of the other 14 deletions showed that the loss of DNA in a given strain with respect to the genome of Agy99 was associated with the insertion of substituting sequences of varying sizes unrelated to the deleted regions. As an example, the larger (3,784-bp) deletion 3B found in the isolates from Suriname and French Guiana was associated with the insertion of an unrelated DNA fragment, which comprised the 1,368 bp of IS2404 (20) plus an additional DNA stretch of 163 bp (Figure 5). For most of the other deletions, 1 of the 2 highly abundant insertion sequence elements (IS2404 or IS2606) was situated in the genomic sequences that flanked the deletion, in the deleted parts, or in the substituting sequence stretches (as for deletion 3B).

Analysis of Coding Sequences and Pseudogenes in the Deleted DNA Sequences
The 15 deletions identified contained 52 pseudogenes and 185 predicted protein-coding sequences (CDSs), which represent 5.7% of the annotated 4,143 CDSs in the genome of the M. ulcerans strain Agy99 (18). The number of deleted CDSs and pseudogenes ranged from 2 (RD4) to 50 (RD8 and 12A) and averaged 18.6 per deletion (Table). CDSs were classified into 11 functional categories (17). When compared with the gene composition of the entire Agy99 genome, the following functional categories were overrepresented among the 185 deleted CDSs: insertion sequences, unique hypothetical genes, and predicted proteins involved in detoxification (Figure 6). Also overrepresented was the deletion of the 52 pseudogenes that contain frame shift mutations and premature stop codons or that are disrupted by an insertion sequence. In contrast, genes involved in intermediary metabolism, information pathways, and cell wall/cell processes were underrepresented among the deleted CDSs (Figure 6). Of the 185 deleted functional CDSs, 89 had orthologs with >50% amino acid sequence identity to proteins from the M. tuberculosis H37Rv genome. A tendency for gene categories to cluster within the RDs was found. RD2 comprises 2 PPE genes; RDs 1, 12A, and 12B are predominantly CDSs involved in lipid metabolism, and RDs 9A and 11 include mainly transcriptional regulators. However, overall M. ulcerans lineages from distinct geographic origin (Africa, Australia, Asia, South America, Mexico) did not differ markedly in the categories of deleted genes. RD8 (deleted in the Mexican strains) is particularly interesting because it contains a cluster of proteins of the mammalian cell entry mce3 operon and associated regulators thereof. The transcriptional repressor, Mce3R, is considered to be an essential gene required for growth of M. tuberculosis (21). In addition, RD8 comprises a collection of CDSs of almost every functional category (online Appendix Table, available from www.cdc.gov/EID/content/13/7/1008-appT.htm). The spectrum of RD8-associated CDSs involved in detoxification included the multidrug transport protein mmr, the epoxide hydrolase EphB, the thiol peroxidase Tpx, and the alkyl hydroperoxide reductase C protein AhpC.
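The proportion of affected genes quoted at the start of this subsection can be checked with simple arithmetic. The sketch below is illustrative only and uses just the counts given in the text; the variable and function names are hypothetical. It suggests that the 5.7% figure corresponds to deleted CDSs and pseudogenes taken together, relative to the 4,143 annotated CDSs.

```python
ANNOTATED_CDS = 4143        # annotated CDSs in strain Agy99, from the text
DELETED_CDS = 185           # predicted protein-coding sequences deleted
DELETED_PSEUDOGENES = 52    # pseudogenes deleted


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole


cds_only = percent(DELETED_CDS, ANNOTATED_CDS)
cds_plus_pseudogenes = percent(DELETED_CDS + DELETED_PSEUDOGENES,
                               ANNOTATED_CDS)

print(f"CDSs only:              {cds_only:.1f}%")
print(f"CDSs plus pseudogenes:  {cds_plus_pseudogenes:.1f}%")
```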
Although CDSs involved in intermediary metabolism were underrepresented among the deleted genes, 21 (42%) of deleted CDSs of this category were dehydrogenases (such as acyl-CoA short-chain alcohol, saccharopine, and aldehyde dehydrogenases), which are central enzymes in anaerobic metabolism (22) and important for survival in poorly oxygenated environments such as soil (23). In addition, other genes associated with anaerobic respiration, such as nitroreductases and electron transfer proteins, were found among the deleted CDSs.

Discussion
We describe the use of a plasmid-based DNA microarray for identifying large deletional and insertional genomic polymorphisms in a collection of 30 M. ulcerans strains of geographically diverse origin. A set of plasmids randomly selected from an E. coli shotgun library of M. ulcerans genomic DNA was spotted on microarray slides. This is a newly developed technology, highly suitable for situations in which the complete genome sequence of a microorganism is not available. The prototype array used comprised 232 plasmids that yielded a reproducible and stable signal. Plasmids contained M. ulcerans genomic DNA fragments of 2.3-2.7 kb, thus reaching a theoretical genome coverage of 10%. Despite this incomplete coverage, 12 chromosomal and 3 virulence plasmid-associated RDs were identified. Fifteen distinct deletions of 1.8-53.1 kb were found and characterized in detail by sequence analysis within the 12 genomic RDs. The deletions identified were found in >1 M. ulcerans isolate, which demonstrates that they do not reflect events that occur during in vitro cultivation of individual isolates. The diversity of deletions within some genomic regions implies recombination hot spots or a selective advantage for loss of particular sequence stretches. Recombination events between adjacent copies of IS6110 in M. tuberculosis and IS100 in Yersinia pestis have been shown to promote the deletion of intervening DNA segments (9,23-26). Close association of RDs with the high copy number elements IS2404 and IS2606 of M. ulcerans indicates that these are involved in insertional and deletional events.
Although genome coverage with the prototype microarray used here was low, several geographic types of M. ulcerans could be differentiated. The largest group comprised all the African isolates (from Ghana, Benin, Côte d'Ivoire, Democratic Republic of Congo, Angola, and Togo), the isolates from Papua New Guinea, and some of the Australian isolates. A second group comprised the Australian strains 5142 and 5147, and a third group included the South American strains (from Suriname and French Guiana). The Mexican isolates represented a fourth; the Asian isolates (from Japan and China), a fifth subgroup. An extended analysis of insertions and deletions is expected to eventually give insight into the phylogenetic relationship between M. marinum and different lineages of M. ulcerans. Moreover, the use of a microarray that covers the whole genome may lead to the development of a genomic fingerprinting method, which is urgently needed for microepidemiologic studies that aim to characterize transmission pathways and environmental reservoirs of M. ulcerans.
The 15 distinct genomic deletions that we identified affected 6.2% of the M. ulcerans Agy99 genome, or 5.7% of the annotated CDSs and pseudogenes. When a whole-genome microarray was used to compare genomic DNA of 100 M. tuberculosis isolates, 5.5% of the genes were found to be affected (27). When one considers the limited genome coverage of the M. ulcerans prototype array used here, the findings demonstrate a remarkably high degree of insertional and deletional diversity in M. ulcerans. In contrast, single nucleotide polymorphisms are rare (14).
Comparative genomic studies have shown that M. ulcerans recently evolved from the ubiquitous, fast-growing environmental bacterium M. marinum (www.sanger.ac.uk/projects/m_marinum) by lateral gene transfer and reductive evolution (18). Our comparative genomic hybridization analysis of a worldwide collection of M. ulcerans strains indicates that the downsizing of the genome from 6.6 Mb (M. marinum) to 5.8 Mb (M. ulcerans Agy99) is an ongoing process. Further genome reduction appears to be driving genetic diversification of M. ulcerans. Studies of other groups of microorganisms indicate that genome reduction is usually associated with adaptation to a more stable environment. An example is M. leprae, which has eliminated >2,000 genes upon adaptation to its human host (28). To which ecologic niche(s) in the environment or in host organisms M. ulcerans is adapting remains to be investigated.
Among the deleted CDSs are 11 members of the mammalian cell entry mce3 operon, which are regarded as virulence determinants in other mycobacteria. In M. tuberculosis the mce operons have been shown to code for genes important for entry and survival of the pathogen in mammalian cells (29,30). The 4 mce operons of M. tuberculosis have homologs among other mycobacteria. In particular, the mce3 operon has been found in M. avium and M. smegmatis; its deletion in M. bovis has also been documented (31). The 12.7-kb region that codes for the mce3 operon is located near the 3′ end of the RD2 element (32) that is present in M. bovis but absent in some strains of M. bovis BCG, which suggests the potential instability of this region. A mouse model of intradermal infection has recently shown that M. ulcerans is initially captured by phagocytes (33). In vitro studies suggest that the M. ulcerans intracellular stage is transient because phagocytic cells enter apoptosis-mediated cell death within 1 day. It will be interesting to investigate whether the mce3 operon plays a role during the transient invasion of host cells by M. ulcerans.
Overrepresentation of proteins involved in detoxification processes among the deleted CDSs indicates adaptation to a more stable environment. Deletion of many dehydrogenases thought to be involved in anaerobic respiration and of anaerobic respiratory enzymes and transporters may give a hint that this niche is not anaerobic. At least in highly disease-endemic areas, M. ulcerans' long-term persistence in chronic wounds and shedding into the environment may be relevant for the propagation of this species. Whether M. ulcerans is primarily adapting to persist in a specialized environmental habitat, in arthropod hosts (34), or in chronic wounds of mammalian hosts remains to be determined.