2 Whole Genome Analyses of Treponemes: New Targets for Strain- and Subspecies-Specific Molecular Diagnostics



Introduction
The genus Treponema comprises several uncultivable human pathogens, including Treponema pallidum subspecies pallidum (TPA, the causative agent of sexually transmitted syphilis), Treponema pallidum subspecies pertenue (TPE, the causative agent of yaws), Treponema pallidum subspecies endemicum (TEN, the causative agent of endemic syphilis) and Treponema carateum, which causes pinta. In addition, the rabbit pathogen Treponema paraluiscuniculi (TPC) is very similar to the syphilis treponeme but is not pathogenic to humans. Other pathogenic treponemes (e.g. Treponema denticola and T. vincentii) differ from these by having considerably larger genomes (MacDougall & Girons, 1995; Seshadri et al., 2004); moreover, they can be cultivated in vitro. Infections caused by the uncultivable human pathogenic treponemes can be ranked by invasiveness, from the most invasive bacterium, which causes venereal syphilis, to Treponema carateum (pinta), a noninvasive spirochete causing local dermal lesions (Antal et al., 2002). The non-venereal treponemes T. pallidum subspecies pertenue and endemicum are considered moderately invasive. Whole genome analyses of treponemes began with the completion of the genome sequence of the T. pallidum Nichols strain in 1998 (Fraser et al., 1998). Since then, a number of genome studies have been performed (e.g. Brinkmann et al., 2006; Giacani et al., 2010; Harper et al., 2008a; Matějková et al., 2008; McKevitt et al., 2003; McKevitt et al., 2005; Mikalová et al., 2010; Šmajs et al., 2005; Strouhal et al., 2007; Titz et al., 2008). The genomic data have provided new opportunities to study pathogenic treponemes and to increase our understanding of these unique pathogens. Serological tests are considered the standard laboratory methods for diagnosing syphilis, because direct diagnostic methods are limited by the fact that T. pallidum cannot be cultured continuously in vitro. The rabbit infectivity test (RIT) is the gold standard for demonstrating T. pallidum infection, but it is impractical for clinical use.

2. Whole genome analyses of uncultivable treponemes

2.1 Whole genome fingerprinting
The genomes of nine uncultivable treponemes, including T. p. pallidum strains (Nichols, SS14, DAL-1 and Mexico A), T. p. pertenue strains (Samoa D, CDC-2 and Gauthier), the Treponema paraluiscuniculi Cuniculi A strain and the Fribourg-Blanc simian isolate, were studied using the whole genome fingerprinting technique (WGF; Mikalová et al., 2010; Strouhal et al., 2007; Weinstock et al., 2000). More than 130 individual amplicons covering the entire genome were digested with a set of restriction endonucleases, and the resulting restriction fragments were visualized by gel electrophoresis. WGF was used to estimate genome size and genome structure and to identify sequence-diverse chromosomal regions (Table 1). Differences in the presence of restriction target sites grouped the T. p. pallidum strains into a cluster separate from the T. p. pertenue strains. The Fribourg-Blanc isolate, although more distant, clustered with the TPE strains (Fig. 1). Phylogenetic analysis of the tprC and tprI genes (Gray et al., 2006) revealed a similarly close relationship between the Fribourg-Blanc treponeme and T. p. pertenue strains. The Fribourg-Blanc isolate is infectious to humans and can cause symptoms of yaws (Smith, 1971; Smith et al., 1971). Although the genome analysis of T. pallidum ssp. endemicum strain Bosnia A (Grin, 1952) has not yet been completed, a preliminary analysis of more than a quarter of the genome (25.6%) has revealed relatedness among the TEN Bosnia A strain, the TPE strains and the Fribourg-Blanc treponeme (Fig. 1, panel B). The relatedness between the TPE and Fribourg-Blanc strains suggests a possible common origin and may indicate transmission of treponemal strains between humans and African primates. Since yaws and simian treponemal infections occur in overlapping geographic areas, human treponemal pathogens may have originated in Africa (Livingstone, 1991). The observed restriction target site diversity among TPA strains indicates the presence of two  (Šmajs et al., 2002) was added to the previously published genome sequence (Fraser et al., 1998).
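Conceptually, WGF records which restriction target sites (RTS) are present in each amplicon and compares these profiles between genomes; the trees in Fig. 1 are scaled in RTS changes per RTS. The Python sketch below illustrates that comparison in silico rather than on a gel, under stated assumptions: the enzyme panel is hypothetical, sites are matched by position within amplicons that are assumed to share coordinates, and the distance (sites found in only one genome divided by all sites found) is one simple choice, not necessarily the measure used to build Fig. 1.

```python
import re

# Hypothetical enzyme panel (EcoRI, BamHI, HindIII); the enzymes actually used
# for WGF are not listed here. Motifs are the standard recognition sites.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def restriction_sites(amplicon_id: str, seq: str) -> set[tuple[str, str, int]]:
    """Return (amplicon, enzyme, position) for every recognition site in seq.

    Simplification: sites are identified by position, which assumes that the
    compared amplicons share coordinates (no indels upstream of a site).
    """
    seq = seq.upper()
    sites = set()
    for enzyme, motif in ENZYMES.items():
        # A zero-width lookahead also reports overlapping occurrences.
        for match in re.finditer(f"(?={motif})", seq):
            sites.add((amplicon_id, enzyme, match.start()))
    return sites


def genome_profile(amplicons: dict[str, str]) -> set[tuple[str, str, int]]:
    """Pool the restriction target sites (RTS) of all amplicons of one genome."""
    profile: set[tuple[str, str, int]] = set()
    for amplicon_id, seq in amplicons.items():
        profile |= restriction_sites(amplicon_id, seq)
    return profile


def rts_distance(a: set, b: set) -> float:
    """RTS changes per RTS: sites present in only one genome / all sites seen."""
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


strain_a = {"amp1": "TTGAATTCAAGGATCCTT"}
strain_b = {"amp1": "TTGAATTCAAGGATGCTT"}  # BamHI site lost by one substitution
print(rts_distance(genome_profile(strain_a), genome_profile(strain_b)))  # 0.5
```

A pairwise matrix of such distances over all strains is the kind of input from which an unrooted tree like those in Fig. 1 can be drawn.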
Table 1. Genome sizes and sequence identity to the Nichols genome of the T. p. pallidum, T. p. pertenue, T. paraluiscuniculi and Fribourg-Blanc strains.
The WGF technique also identified genomic regions that varied in most of the investigated strains, including the intergenic region between TP0126 and TP0127 and the arp, TP0470 and TP0967 genes. Among the investigated TPA and TPE genomes, the tprK-like sequence inserted between the TP0126 and TP0127 genes was found in three different versions (Mikalová et al., 2010). In the Nichols genome, this insertion was present in only part of the treponemal population (Šmajs et al., 2002). In the arp gene (Pillay et al., 1998), a variable number of tandem repeats was found in the tested genomes. Based on amino acid variation, previous studies (Harper et al., 2008b; Liu et al., 2007) classified the TPA and TPE Arp repeat motifs into 4 types (I, II, III and II/III), and the variability in repeat types correlated with the sexual transmission strategy (Harper et al., 2008b). Differences among the tested strains were also found in the number of 24 bp tandem repeats in TP0470, a gene encoding a hypothetical protein, and in indels within the hypothetical TP0967 gene (Mikalová et al., 2010). As with the tprK-like insertion between TP0126 and TP0127, the number of 24 bp repeats in TP0470 was reported to vary within individual bacterial isolates (Marra et al., 2010).
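Repeat-number differences such as those in TP0470 can be checked computationally once an amplicon is sequenced. The sketch below counts exact tandem copies of a given unit; the 24 bp unit is a hypothetical placeholder, not the real TP0470 repeat, and exact matching is a simplification, since real repeats may carry sequence variants (as the Arp repeat types do). When only amplicon length is known, copy number can instead be estimated as (amplicon length minus the combined flank length) divided by 24, assuming the flank length is known.

```python
# Hypothetical 24-bp repeat unit, used only for illustration; it is not the
# actual TP0470 repeat sequence.
UNIT_24 = "GCAGCTGCCGTTGAAGAGGTTCCA"


def count_tandem_copies(seq: str, unit: str, start: int = 0) -> int:
    """Count consecutive exact copies of `unit` beginning at index `start`."""
    copies, pos = 0, start
    while seq.startswith(unit, pos):
        copies += 1
        pos += len(unit)
    return copies


def longest_tandem_run(seq: str, unit: str) -> tuple[int, int]:
    """Return (start index, copy number) of the longest exact tandem array."""
    best_start, best_copies = -1, 0
    for i in range(len(seq) - len(unit) + 1):
        copies = count_tandem_copies(seq, unit, i)
        if copies > best_copies:
            best_start, best_copies = i, copies
    return best_start, best_copies


allele_5 = "TTT" + UNIT_24 * 5 + "AAA"
allele_8 = "TTT" + UNIT_24 * 8 + "AAA"
print(longest_tandem_run(allele_5, UNIT_24))  # (3, 5)
print(longest_tandem_run(allele_8, UNIT_24))  # (3, 8)
```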

Whole genome sequencing
Historically, syphilis and yaws treponemes were considered to be separate species (based on differences in the clinical manifestations of the corresponding diseases), but since 1984 they have been classified as subspecies (Smibert, 1984) on the basis of DNA hybridization experiments (Miao & Fieldsteel, 1980).
The WGF technique revealed high sequence relatedness among all investigated genomes; even the most divergent genome, that of T. paraluiscuniculi, differed in less than 2% of its sequence (Strouhal et al., 2007). Because the genomes differ so little, complete, high-quality sequences were required for meaningful treponemal genome comparisons. The sequenced treponemal genomes and the status of sequencing are listed in Table 2.
With the exception of the Nichols and Chicago genomes, whole genome sequencing was performed using a combination of approaches including comparative genome sequencing (CGS; Matějková et al., 2008), 454 pyrosequencing (Margulies et al., 2005) and the Solexa/Illumina method (Bennett, 2004). For most of the sequenced strains, isolated genomic DNA was amplified before sequencing. All discrepancies among the CGS, 454 and Solexa/Illumina data were resequenced using the dideoxy-terminator (Sanger) method until a final consensus sequence was obtained. Sequencing of the T. paraluiscuniculi genome revealed 99.16% sequence identity between the conserved regions of the Nichols and Cuniculi A genomes (Šmajs et al., 2011). Sequencing of three TPE genomes revealed more than 99.8% identity between TPA and TPE genomes (Čejková et al., unpublished data). No major genome rearrangements were found in any of the sequenced genomes. Despite different clinical manifestations and host specificities, the gene order was nearly identical in TPA, TPE and T. paraluiscuniculi strains, further establishing the close genetic relationship between these treponemal pathogens. The accuracy of the genome assemblies and the sequencing error rate were estimated using the WGF approach, which confirmed high-quality genome sequences with an error rate below 10⁻⁴. All investigated TPE strains were very similar in genome size, with a difference of only 414 bp between the largest (CDC-2) and the smallest (Samoa D) genome. Nucleotide diversity (π) among the sequenced TPE genomes was low (0.00032). In contrast, the nucleotide divergence (d_A) between TPA and TPE genomes was 3.6 to 4.6 times higher than the nucleotide diversity within each subspecies. These data indicate that yaws and syphilis strains form clearly separated, though closely related, evolutionary lineages. Sequencing of additional TPA and TPE strains should further narrow down the set of genetic differences relevant to the distinct clinical manifestations of yaws and syphilis. Altogether, 13 pseudogenes were found in the TPE genomes. In addition to pseudogenes, genetic changes were analyzed in 970 protein-coding genes annotated similarly in both TPE and TPA strains. Compared to TPA strains, 70.4% of TPE genes encoded either identical proteins or proteins differing only by strain-specific changes; 194 (19.7%) genes encoded proteins with a single amino acid substitution found in all tested TPE strains, 63 (6.4%) genes encoded proteins with 2 to 5 amino acid changes, and only 34 (3.5%) genes encoded proteins with 6 or more amino acid replacements or other major protein changes.
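The comparison between diversity within a subspecies (π) and divergence between subspecies (d_A) can be made concrete. The sketch below uses the usual definitions, which are assumed here because the section does not spell them out: π is the mean pairwise proportion of differing sites within a group, d_XY is the mean proportion of differing sites between members of two groups, and the net divergence is d_A = d_XY - (π_X + π_Y)/2. Sequences are assumed to be pre-aligned and gap-free, and the toy alignments are not treponemal data.

```python
from itertools import combinations, product
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(group: list[str]) -> float:
    """pi: mean pairwise p-distance among sequences of one group (>= 2 needed)."""
    return mean(p_distance(a, b) for a, b in combinations(group, 2))


def net_divergence(x: list[str], y: list[str]) -> float:
    """d_A = d_XY - (pi_X + pi_Y) / 2, with d_XY the mean between-group distance."""
    d_xy = mean(p_distance(a, b) for a, b in product(x, y))
    return d_xy - (nucleotide_diversity(x) + nucleotide_diversity(y)) / 2


# Toy alignments standing in for two subspecies (not real treponemal data).
group_x = ["AAAAAAAAAA", "AAAAAAAAAT"]
group_y = ["CCAAAAAAAA", "CCAAAAAAAT"]
print(nucleotide_diversity(group_x), net_divergence(group_x, group_y))
```

Dividing d_A by π for each subspecies gives the kind of ratio quoted above (3.6 to 4.6); in this toy case d_A = 0.15 and π = 0.1, a ratio of 1.5.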
3. Targets for TPA strain-specific molecular diagnostics

Multilocus analyses of treponemal strains
The whole genome analyses revealed chromosomal regions with accumulated genetic diversity between TPA and TPE strains (Fig. 2). In these regions, we identified the most genetically diverse genes and analyzed them in a set of TPA strains. In addition to four TPA strains (Nichols, SS14, DAL-1 and Mexico A), the TPA strains Grady (Atlanta, 1980), MN-3 (Minnesota, year unknown), Philadelphia-1 (Philadelphia, 1988), Philadelphia-2 (Philadelphia, year unknown) and Bal-73-01 (Baltimore, 1973) were analyzed. All these strains were kindly provided by D. L. Cox, CDC, Atlanta, GA. The results of this analysis are summarized in Tables 3 and 4. The greatest nucleotide difference was found in the TP0136 locus, followed by TP0548, TP0326 and TP0488. Interestingly, all investigated strains split into two subclusters containing either the Nichols or the SS14 strain (Fig. 3).

Table 3. Analysis of 4 chromosomal loci as potential targets for PCR detection and typing of clinical treponemal samples. The table shows the percentage of nucleotide differences compared to sequences present in the Nichols genome (Fraser et al., 1998).
Fig. 3. Unrooted tree constructed from the concatenated sequences of the TP0136, TP0326, TP0488 and TP0548 loci determined in several TPA strains (concatenated lengths ranging from 8342 to 8412 nucleotides). The scale bar corresponds to 0.01 nucleotide changes per site. Note that the TPA strains form two subclusters, one associated with the Nichols strain and the other with the SS14 strain.
In addition to the TP0136, TP0548, TP0326 and TP0488 loci, we also tested other candidate chromosomal regions, including the TP0346, TP0515, TP0558 and TP0868 genes. The observed nucleotide diversity of these loci is shown in Table 4. Nucleotide diversity was considerably lower in this second group of genes (TP0346, TP0515, TP0558 and TP0868), indicating that they are less suitable for typing clinical TPA samples than the loci shown in Table 3. As with the first group, Nichols-like strains were extremely similar to each other and distinct from SS14-like strains, reflecting the different evolutionary histories of these two groups.
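Table 3 reports per-locus differences relative to the Nichols genome, and Fig. 3 is built from the four loci concatenated. A minimal Python sketch of both steps follows, assuming that the sequences of each locus have already been aligned across strains; the fragments and strain keys are toy placeholders, gap characters are simply counted as differences (an assumption; other conventions exist), and tree inference from the concatenated alignment (for example, neighbor joining in a phylogenetics package) is not shown.

```python
def percent_difference(query: str, reference: str) -> float:
    """Percentage of aligned columns at which `query` differs from `reference`.

    Assumption: gap characters ('-') count as differences; other conventions
    exclude gapped columns or score an indel as a single event.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(q != r for q, r in zip(query, reference))
    return 100.0 * diffs / len(reference)


def concatenate_loci(alignments: dict[str, dict[str, str]],
                     loci: list[str]) -> dict[str, str]:
    """Join per-locus alignments into one concatenated sequence per strain."""
    strains = alignments[loci[0]].keys()
    return {s: "".join(alignments[locus][s] for locus in loci) for s in strains}


# Toy aligned fragments (not real TP0136/TP0548 sequences).
alignments = {
    "TP0136": {"Nichols": "ACGTACGTAC", "SS14": "ACGTTCGTAC"},
    "TP0548": {"Nichols": "GGCCTTAA", "SS14": "GGCATTAA"},
}
for locus, seqs in alignments.items():
    print(locus, percent_difference(seqs["SS14"], seqs["Nichols"]))
concat = concatenate_loci(alignments, ["TP0136", "TP0548"])
print("concatenated", round(percent_difference(concat["SS14"], concat["Nichols"]), 2))
```

The per-locus values here are 10.0 and 12.5, and the concatenated value (2 differences in 18 columns) lies between them, which is why concatenation smooths out locus-specific rate differences when building a single tree.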

PCR analyses of clinical samples
Chromosomal genes including TP0136, TP0326, TP0488, TP0548 and TP0868, previously sequenced in TPA type strains, were used as targets for amplifying treponemal DNA from clinical samples. The tested clinical samples were collected in the Czech Republic between 2004 and 2010. The numbers of nucleotide changes detected were considerably lower than among the tested TPA type strains, indicating genetic homogeneity of syphilis-causing strains in the Czech Republic. At the TP0868 locus, no diversity was observed among 5 tested clinical samples (see Table 5), and all TP0868 sequences were identical to the SS14 sequence. Interestingly, only strains identical or very similar to the SS14 strain were found among all investigated clinical strains (taken from 91 patients with sequenced treponemal DNA; unpublished results). Although not all gene sequences were determined in the investigated strains, the number of unique sequences identified (shown as unique a-d; Table 5) correlated with the number of nucleotide changes shown in Table 3. This indicates that the most variable chromosomal regions identified during whole genome analyses were also the most variable among closely related syphilis-causing clinical strains from a particular geographic area.
Table 5. Analysis of chromosomal loci as potential targets for PCR detection and typing of treponemes. Data for 13 selected samples isolated between 2004 and 2010 from patients in the Czech Republic are shown.
Table 5 shows the numbers of different genotypes identified by sequencing several treponemal chromosomal loci. The greatest numbers of genotypes were found for the TP0548 locus (5) and for genetic variants detected by amplification of the arp gene (5). The use of the TP0548 locus in molecular typing of syphilis was first described by Flasarová et al. (2006), and this locus was recently incorporated into the enhanced molecular typing system for Treponema pallidum (Marra et al., 2010). Analysis of TP0136 revealed 3 genotypes. Although the TP0326, TP0488 and TP0868 loci were sequenced in only a few of the presented isolates, one or two different genotypes were identified. Moreover, the unique sequences at the investigated TP loci appear to vary independently of each other, as well as independently of the number of repeats in the arp gene and of the restriction profile of the tprEGJ genes. This finding indicates that these chromosomal loci could be used for detailed genetic identification of clinical samples. Typing of the 23S rDNA locus revealed three genotypes, one associated with sensitivity to macrolide antibiotics and two conferring macrolide resistance (the A2058G and A2059G mutations; Lukehart et al., 2004; Matějková et al., 2009; Stamm & Bergen, 2000); however, the value of this locus for molecular typing is probably limited because these mutations can be selected in a population by the use of macrolide antibiotics. In the Czech Republic, more than 35% of clinical samples were found to contain a mutation conferring resistance to macrolide antibiotics (Flasarová et al., unpublished results). In contrast, screening of 23S rDNA for A2058G in syphilitic samples from patients in Madagascar revealed no such mutation in 141 samples (Van Damme et al., 2009). The recently improved CDC typing system (Marra et al., 2010) relies on determining the number of repeats in the arp gene, on amplification and restriction digest analysis of the tprEGJ genes and on sequencing of part of the TP0548 gene.
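Because alleles at the investigated loci vary independently, they can be combined into a composite strain type, and the number of distinguishable types grows multiplicatively: the 3 TP0136, 5 TP0548 and 3 23S rDNA genotypes reported here allow up to 3 × 5 × 3 = 45 combinations. The sketch below assumes a three-locus scheme (TP0136, TP0548 and 23S rDNA). It labels unique sequences a, b, c, ... in order of first appearance, in the spirit of the "unique a-d" notation of Table 5, and calls the 23S genotype directly from the bases at positions 2058 and 2059 rather than by restriction analysis. The amplicon offsets, the toy sequences and the "a/b/wt" label format are hypothetical, and the CDC scheme (arp repeats, tprEGJ pattern, TP0548) is not implemented.

```python
# Hypothetical 0-based offsets of the 23S rRNA positions 2058 and 2059
# (E. coli numbering) within the sequenced amplicon; real offsets depend
# on the primers used.
POS_2058, POS_2059 = 10, 11


def classify_23s(amplicon: str) -> str:
    """Call the macrolide-related 23S genotype from the two diagnostic bases."""
    b2058, b2059 = amplicon[POS_2058].upper(), amplicon[POS_2059].upper()
    if (b2058, b2059) == ("A", "A"):
        return "wt"  # no resistance mutation at either position
    if (b2058, b2059) == ("G", "A"):
        return "A2058G"
    if (b2058, b2059) == ("A", "G"):
        return "A2059G"
    return "unknown"


class AlleleRegistry:
    """Label unique sequences per locus as a, b, c, ... in order of first sight."""

    def __init__(self) -> None:
        self._alleles: dict[str, dict[str, str]] = {}

    def label(self, locus: str, seq: str) -> str:
        table = self._alleles.setdefault(locus, {})
        if seq not in table:
            # Single letters suffice for a sketch (up to 26 alleles per locus).
            table[seq] = chr(ord("a") + len(table))
        return table[seq]


def composite_type(reg: AlleleRegistry, tp0136: str, tp0548: str, rrna: str) -> str:
    """Combine TP0136 and TP0548 allele labels with the 23S call, e.g. 'a/b/wt'."""
    return f"{reg.label('TP0136', tp0136)}/{reg.label('TP0548', tp0548)}/{classify_23s(rrna)}"


WT = "C" * 10 + "AA" + "C" * 4
R2058 = "C" * 10 + "GA" + "C" * 4
reg = AlleleRegistry()
print(composite_type(reg, "ACGT", "TTGA", WT))     # a/a/wt
print(composite_type(reg, "ACGA", "TTGA", R2058))  # b/a/A2058G
print(composite_type(reg, "ACGT", "TTGC", WT))     # a/b/wt
```

Keeping the registry across samples matters: labels are only comparable when every sample is typed against the same growing allele table.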

Chromosomal targets for detection of TPE strains
Endemic treponematoses (caused by T. pallidum subsp. pertenue, T. pallidum subsp. endemicum and T. carateum) are estimated to currently affect more than 2.5 million people worldwide (Antal et al., 2002). During the previous century, the number of yaws cases decreased from 50 million to a few million. In recent years, yaws has re-emerged in several rural populations in Africa, Asia and South America, and a new effort to eradicate the disease has recently been undertaken (Asiedu et al., 2008). Because single-dose penicillin treatment is cheap and available and no disease reservoirs other than humans and primates are known, the chances of yaws eradication are relatively good. In this situation, molecular diagnosis of yaws treponemes is of fundamental importance. In the last two decades, several subtle genetic differences between TPA and TPE strains have been published (Walker et al., 1995; Centurion-Lara et al., 1998; Centurion-Lara et al., 2006). The most prominent indels common to all investigated TPE strains are shown in Table 6. The regions listed in Table 6 and several additional regions (Mikalová et al., 2010) need to be tested in other TPE strains before the most suitable target for molecular diagnosis of yaws-causing strains can be selected. Interestingly, all these indels are also found in the Fribourg-Blanc genome. However, the Fribourg-Blanc isolate, as well as individual TPE strains, can be differentiated on the basis of specific indels (Mikalová et al., 2010).
Table 6. Most prominent indels identified in all investigated TPE strains (Samoa D, CDC-2 and Gauthier). All these changes were also found in the Fribourg-Blanc genome (Mikalová et al., 2010).

Chromosomal targets for detection of TEN strains
The ongoing whole genome sequencing project for T. p. endemicum strain Bosnia A has already identified indels in at least 4 regions (ranging from 13 to ~60 bp) that can be used to differentiate the bejel treponeme from both T. p. pallidum and T. p. pertenue strains (Table 7).
Table 7. Indels identified in the TEN Bosnia A strain compared with other investigated TPA and TPE strains.
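Subspecies-specific indels such as those in Tables 6 and 7 lend themselves to length-based detection: if the amplicon spanning an indel differs in size between subspecies, a panel of such loci yields a subspecies profile. The sketch below matches observed amplicon sizes against a hypothetical panel; locus names and sizes are placeholders, not values from Tables 6 and 7, and the ±5 bp tolerance is an assumption. Indels as small as 13 bp call for a sizing method that resolves differences on that scale, or for sequencing of the amplicon.

```python
# Hypothetical marker panel: expected amplicon sizes (bp) per subspecies.
# Locus names and sizes are placeholders, not values from Tables 6 and 7.
EXPECTED = {
    "locus1": {"TPA": 500, "TPE": 467, "TEN": 467},
    "locus2": {"TPA": 800, "TPE": 800, "TEN": 740},
    "locus3": {"TPA": 620, "TPE": 587, "TEN": 607},
}
SUBSPECIES = ("TPA", "TPE", "TEN")


def match_subspecies(observed: dict[str, int], tolerance: int = 5) -> list[str]:
    """Return the subspecies whose expected sizes fit every observed amplicon.

    `tolerance` (bp) models sizing error; it must stay well below the smallest
    size difference between subspecies for the call to be unambiguous.
    """
    return [
        sub for sub in SUBSPECIES
        if all(abs(size - EXPECTED[locus][sub]) <= tolerance
               for locus, size in observed.items())
    ]


print(match_subspecies({"locus1": 468, "locus2": 741, "locus3": 606}))  # ['TEN']
print(match_subspecies({"locus1": 499, "locus2": 801, "locus3": 621}))  # ['TPA']
print(match_subspecies({"locus1": 467}))  # ['TPE', 'TEN']: one locus is not enough
```

The last call shows why several loci are needed: a single TPA/TPE indel cannot separate TPE from TEN, which is the role of the TEN-specific indels.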

Conclusions
The genomes of 9 pathogenic treponemes, including T. p. pallidum strains (Nichols, SS14, DAL-1 and Mexico A), T. p. pertenue strains (Samoa D, CDC-2 and Gauthier), the Fribourg-Blanc isolate and T. p. endemicum (Bosnia A), were analyzed using several approaches, including whole genome fingerprinting and whole genome sequencing. Genome analyses revealed several important chromosomal loci suitable for the diagnosis of i) syphilis-causing treponemes, including their molecular typing, ii) yaws treponemes and iii) bejel treponemes.
A sequencing-based typing scheme using simultaneous analysis of 3 loci (TP0136, TP0548 and 23S rDNA) in the T. p. pallidum genome was also evaluated. In addition, amplification of the 23S rDNA locus followed by restriction target analysis was used to detect mutations leading to macrolide resistance. The unique sequences at the investigated TP loci appear to combine independently of each other, of the number of repeats in the arp gene and of the restriction profiles of the tprEGJ genes. Several genomic regions were found to differ between T. p. pallidum and T. p. pertenue strains; these comprised indels ranging from 33 bp in the TP0266 gene to 635 bp in the tprF gene (TP0316). In all cases, the Fribourg-Blanc simian isolate showed changes similar to those of T. p. pertenue strains, suggesting a close relationship to the pertenue subspecies. A partial genome analysis of the T. p. endemicum strain Bosnia A showed that this strain clustered with TPE strains, though more distantly than the Fribourg-Blanc isolate. The Bosnia A genome contained indels in at least 4 regions (ranging from 13 to ~60 bp) that can be used to differentiate the bejel treponeme from both T. p. pallidum and T. p. pertenue strains.

Fig. 1. Unrooted trees constructed from restriction target site data of the analyzed TPA and TPE genomes. Panel A: unrooted tree constructed from whole genome analyses; the Fribourg-Blanc isolate clusters with TPE strains. Panel B: unrooted tree constructed from 25.6% of the tested genomes; the Bosnia A strain clusters with TPE genomes, indicating a close relationship between this T. pallidum ssp. endemicum strain and TPE strains. The scale bars correspond to 0.01 (panel A) and 0.1 (panel B) restriction target site (RTS) changes per RTS. TPA strains are shown in bold.

Fig. 2. Number of nucleotide changes between TPA and TPE strains in 20 kb intervals along the treponemal chromosome. The exact number of nucleotide changes in the four most diverse regions is shown next to each column. Positions of selected chromosomal loci (see Tables 3 and 4) are marked with asterisks, and positions of tpr genes with triangles (∆).
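The plot in Fig. 2 is a windowed summary of a whole-genome comparison. Given the chromosomal positions of nucleotide changes between aligned TPA and TPE genomes (assumed here to be already extracted from the alignment), the counts per 20 kb interval and the most divergent intervals can be obtained as in this Python sketch. Non-overlapping windows are an assumption consistent with the 20 kb intervals of the caption, and the positions below are toy values, not real data.

```python
from collections import Counter


def changes_per_window(positions: list[int], genome_length: int,
                       window: int = 20_000) -> list[int]:
    """Count 1-based change positions falling into consecutive windows."""
    n_windows = -(-genome_length // window)  # ceiling division
    counts = Counter((p - 1) // window for p in positions if 1 <= p <= genome_length)
    return [counts.get(i, 0) for i in range(n_windows)]


def top_windows(counts: list[int], genome_length: int, window: int = 20_000,
                k: int = 4) -> list[tuple[int, int, int]]:
    """Return (start, end, count) of the k windows with most changes, 1-based."""
    ranked = sorted(range(len(counts)), key=lambda i: (-counts[i], i))[:k]
    return [(i * window + 1, min((i + 1) * window, genome_length), counts[i])
            for i in ranked]


# Toy change positions on a 50 kb "chromosome" (not real TPA/TPE data).
changes = [5, 150, 19_999, 20_001, 45_000, 45_010, 45_020]
counts = changes_per_window(changes, 50_000)
print(counts)                            # [3, 1, 3]
print(top_windows(counts, 50_000, k=2))  # [(1, 20000, 3), (40001, 50000, 3)]
```

With k = 4, top_windows returns the four most diverse intervals, the ones annotated with exact counts in Fig. 2.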

Table 2. Whole genome sequencing of uncultivable treponemal strains

Table 4. Analysis of chromosomal loci as potential targets for PCR detection and typing of treponemes. The table shows the number of detected nucleotide changes compared to sequences present in the Nichols genome.
Positions of detected indels in the TEN Bosnia A genome are shown as coordinates in the Samoa D genome (GenBank accession no. CP002374).