Whole Genome Sequence of the Treponema pallidum subsp. endemicum Strain Bosnia A: The Genome Is Related to Yaws Treponemes but Contains Few Loci Similar to Syphilis Treponemes

Background T. pallidum subsp. endemicum (TEN) is the causative agent of bejel (also known as endemic syphilis). Clinical symptoms of syphilis and bejel are overlapping and the epidemiological context is important for correct diagnosis of both diseases. In contrast to syphilis, caused by T. pallidum subsp. pallidum (TPA), TEN infections are usually spread by direct contact or contaminated utensils rather than by sexual contact. Bejel is most often seen in western Africa and in the Middle East. The strain Bosnia A was isolated in 1950 in Bosnia, southern Europe. Methodology/Principal Findings The complete genome of the Bosnia A strain was amplified and sequenced using the pooled segment genome sequencing (PSGS) method and a combination of three next-generation sequencing techniques (SOLiD, Roche 454, and Illumina). Using this approach, a total combined average genome coverage of 513× was achieved. The size of the Bosnia A genome was found to be 1,137,653 bp, i.e. 1.6–2.8 kbp shorter than any previously published genomes of uncultivable pathogenic treponemes. Conserved gene synteny was found in the Bosnia A genome compared to other sequenced syphilis and yaws treponemes. The TEN Bosnia A genome was distinct but very similar to the genome of yaws-causing T. pallidum subsp. pertenue (TPE) strains. Interestingly, the TEN Bosnia A genome was found to contain several sequences, which so far, have been uniquely identified only in syphilis treponemes. Conclusions/Significance The genome of TEN Bosnia A contains several sequences thought to be unique to TPA strains; these sequences very likely represent remnants of recombination events during the evolution of TEN treponemes. This finding emphasizes a possible role of repeated horizontal gene transfer between treponemal subspecies in shaping the Bosnia A genome.

While yaws is found in warm, moist climates, bejel is found in drier climates. In both cases, infection is spread by direct contact (e.g. skin-to-skin or skin-to-mucosa). In addition, bejel can also be transmitted by contact with contaminated utensils [1,2]. The current, and widespread, belief that yaws and bejel are non-sexually transmitted may simply reflect that these diseases mostly affect children that have not reached sexual maturity [3,4].
Diagnosis of endemic treponematoses comprises clinical symptoms, epidemiological data, and serology. Since there is significant clinical similarity between the symptoms of syphilis and endemic treponematoses, and serology cannot discriminate between infection with TPA, TPE, and TEN strains, the epidemiology plays a major role in establishing a diagnosis. While yaws remains endemic in poor communities in Africa, Southeast Asia, and the western Pacific, bejel is predominant in western Africa and in the Middle East (reviewed in [2,4]). Imported cases of yaws and bejel have been documented in children in Europe and Canada [5,6]. With the accumulation of genetic data, molecular targets that can be used to differentiate treponemal subspecies, at the molecular level, have become available [2]. Endemic syphilis has been described almost everywhere in Europe since the 16th century (for review see [7]) and often was described under different names, e.g. the disease that appeared in Brno, CZ in 1575 was called morbus Brunogallicus, although it is not clear whether this infection was not perhaps caused by the syphilis treponeme [8]. The Bosnia A strain was isolated in 1950 in Bosnia, a country in southern Europe, from a 35-year old male with mucous patches under the tongue and on the tonsils; additionally, the patient showed secondary lesions (papules) on the face, trunk and extremities. Material for experimental inoculation of laboratory animals was taken from an ulcer on the shaft of the penis [9]. Although several other isolates were collected from bejel patients, only one additional strain of T. pallidum subsp. endemicum (Iraq B) is currently propagated in laboratory settings.
In this study, the complete genome sequence of the T. pallidum subsp. endemicum Bosnia A strain was obtained using a combination of next-generation sequencing approaches and compared to the genomes of the four TPE strains (Samoa D, CDC-2, Gauthier, Fribourg-Blanc isolate) and five TPA strains (Nichols, DAL-1, Chicago, SS14, Mexico A), all of which have been determined in recent years [10][11][12][13][14][15].

Amplification of TEN Bosnia A DNA
Bosnia A DNA was provided by Dr. Sylvia M. Bruisten from the Public Health Service, GGD Amsterdam, Amsterdam, The Netherlands. Bosnia A genomic DNA was amplified using the pooled segment genome sequencing (PSGS) method as described previously [11,15]. Briefly, Bosnia A DNA was amplified with 214 pairs of specific primers to obtain overlapping PCR products (Table S1). To facilitate sequencing of paralogous genes containing repetitive sequences, PCR products were mixed in equimolar amounts into four distinct pools. Prior to next-generation sequencing (454 pyrosequencing, Illumina and SOLiD), the PCR products constituting each pool were labeled with multiplex identifier (MID) adapters and sequenced as four different samples. Two genomic regions were not amplified during PSGS and therefore were not used for sequencing the whole genome (gaps between coordinates 332290-335395 and 1123251-1123648 according to the Nichols sequence, AE000520.1 [16]; see Table S1). Sequences in these regions were Sanger sequenced at the University of Washington in Seattle (WA), USA. The 454 and Illumina reads obtained for each pool (Table S1) were separately assembled de novo using the Newbler assembler (454 Life Sciences, Branford, CT, USA) or TIGRA [17], respectively. The resulting 454 and Illumina contigs obtained for each pool were then aligned to the corresponding sequences (representing each pool sequence) of the reference CDC-2 genome (CP002375.1 [11]) using Lasergene software (DNASTAR, Madison, WI, USA). All gaps and discrepancies between these platforms within each pool were resolved using Sanger sequencing. Altogether, 20 genomic regions of the Bosnia A genome were amplified and Sanger sequenced. The final overlapping pool sequences were joined to obtain the complete genome sequence of the Bosnia A strain. The SOLiD sequencing results were mapped to the reference Samoa D genome (CP002374.1 [11]) using the CLC Genomics Workbench (CLC bio, Cambridge, MA, USA) and were processed as mentioned above.
The genome sequence obtained from SOLiD was then compared with the consensus genome sequence obtained from 454 and Illumina. All discrepancies were resolved using Sanger sequencing. Two TPE genomes (CDC-2 and Samoa D) were used as reference genomes for contig alignments, since only a few minor genetic differences have been found among individual TPE strains [11].

Author Summary
Uncultivable treponemes represent bacterial species and subspecies that are obligate pathogens of humans and animals causing diseases with distinct clinical manifestations. Treponema pallidum subsp. pallidum causes sexually transmitted syphilis, a multistage disease characterized in humans by localized, disseminated, and chronic forms of infection, whereas Treponema pallidum subsp. pertenue (agent of yaws) and Treponema pallidum subsp. endemicum (agent of bejel) cause milder, non-venereally transmitted diseases affecting skin, bones and joints. The genetic basis of the pathogenesis and evolution of these microorganisms is still unknown. In this study, a high quality whole genome sequence of the T. pallidum subsp. endemicum Bosnia A strain was obtained using a combination of next-generation sequencing approaches and compared to the genomes of available uncultivable pathogenic treponemes. Relative to all known genomes of Treponema pallidum subspecies, no major genome rearrangements were found in the Bosnia A genome. The Bosnia A strain clustered with other yaws-causing strains, while syphilis-causing strains clustered separately. In general, the Bosnia A genome showed similar genetic characteristics to yaws treponemes but also contained several sequences thought to be unique to syphilis-causing strains. This finding suggests a possible role of repeated horizontal gene transfer between treponemal subspecies in shaping the Bosnia A genome.

DNA sequencing and assembly of the Bosnia A genome
Due to low coverage, one genomic region (Treponema pallidum interval; TPI) was amplified with specific primers using a GeneAmp XL PCR Kit (Applied Biosystems, Foster City, CA, USA) [18,19]. This TPI-48 interval contained the paralogous genes tprI and tprJ. The PCR product was purified using a QIAquick PCR Purification Kit (QIAGEN, Valencia, CA, USA) according to the manufacturer's instructions and Sanger sequenced using internal primers. The tprK (TENDBA_0897), arp (TENDBA_0433), and TENDBA_0470 genes were amplified and cloned into the pCR 2.1-TOPO cloning vector (Invitrogen, Carlsbad, CA, USA). Nine independent clones for the tprK and arp genes and seven clones for TENDBA_0470 were sequenced as previously described [11]. A total of 7 genomic regions (in genes TENDBA_0040, TENDBA_0348, TENDBA_0461, TENDBA_0697, TENDBA_0859, TENDBA_0865 and TENDBA_0966) revealed intra-strain variability in the length of homopolymeric (G- or C-) stretches. The prevailing length of these regions was determined by TOPO TA-cloning and Sanger sequencing. At least five independent clones were sequenced as previously described [15].
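The idea of taking the prevailing homopolymer length across sequenced clones can be illustrated with a minimal Python sketch. The clone lengths, the function name `prevailing_length`, and the decision to reject ties are assumptions made for illustration; they are not part of the published protocol.

```python
from collections import Counter

def prevailing_length(clone_lengths, min_clones=5):
    """Return the most frequent homopolymer length among sequenced clones.

    min_clones mirrors the 'at least five independent clones' criterion;
    rejecting ties is an assumption of this sketch.
    """
    if len(clone_lengths) < min_clones:
        raise ValueError("too few clones to call a prevailing length")
    ranked = Counter(clone_lengths).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise ValueError("no single prevailing length (tie)")
    return ranked[0][0]

# Hypothetical G-stretch lengths observed in five clones of one region.
example_lengths = [9, 10, 10, 10, 11]
example_call = prevailing_length(example_lengths)
print(example_call)
```

With the hypothetical input above, three of five clones carry a stretch of 10 nucleotides, so 10 would be reported as the prevailing length.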

Gene identification, annotation and classification
The final whole genome sequence of the Bosnia A strain was assembled from SOLiD, 454 and Illumina contigs. In addition, Sanger sequencing was used for finishing the complete genome sequence and for additional sequencing including paralogous, repetitive and intra-strain variable chromosomal regions. Geneious software v5.6.5 [20] was used for gene annotation based on the annotation of the TPE CDC-2 genome [11]. Genes were tagged with TENDBA_ prefix. The original locus tag numbering corresponds to the tag numbering of orthologous genes annotated in the TPE CDC-2 genome [11]. The TENDBA_0897 gene, coding for TprK, showed intra-strain variable nucleotides and therefore nucleotides in variable regions were denoted with Ns in the complete Bosnia A genome. For proteins with unpredicted functions, a gene size limit of 150 bp was applied. Protein domains and functional annotation of analyzed genes were characterized using Pfam [21], CDD [22] and KEGG [23] databases.

Comparisons of whole genome sequences
Whole genome nucleotide alignments of five TPA strains, four TPE strains and the Bosnia A strain were used for determination of genetic relatedness using several approaches, including calculation of nucleotide diversity (π) and construction of a phylogenetic tree. All positions containing indels in at least one genome sequence were omitted from the analysis. There were a total of 1,128,391 nucleotide positions aligned in the final dataset. TPA strains comprised Nichols (re-sequenced genome CP004010.2 [14]), DAL-1 (CP003115.1 [13]), SS14 (re-sequenced genome CP004011.1 [14]), Chicago (CP001752.1 [10]), and Mexico A (CP003064.1 [12]) genomes, while TPE strains included Samoa D (CP002374.1 [11]), CDC-2 (CP002375.1 [11]), Gauthier (CP002376.1 [11]) and Fribourg-Blanc (CP003902.1 [15]). Whole genome alignments were constructed using Geneious software [20] and SeqMan software (DNASTAR, Madison, WI, USA). Nucleotide differences among studied whole genome alignments were analyzed using DnaSP software, version 5.10 [24]. An unrooted phylogenetic tree was constructed from the whole genome sequence alignment using the Maximum Parsimony method and MEGA5 software [25]. To test whether the mosaic character of identified loci was a result of intra-strain recombination, potential donor sites were screened from the entire Bosnia A genome using several computer programs and algorithms including RDP3 [26], EditSeq software (DNASTAR, Madison, WI, USA), BLAST (http://blast.ncbi.nlm.nih.gov), and Crossmatch (http://www.phrap.org/phredphrapconsed.html). We failed to find any potential donor sites in the Bosnia A genome. We also failed to find any TPA- or TPE-specific NGS reads in the regions having a mosaic character.
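The two steps described above, omitting alignment columns that contain indels and then computing nucleotide diversity, can be sketched in a few lines of Python. This is an illustrative toy, not a reimplementation of DnaSP: the strain names and sequences are invented, and diversity is taken here simply as the mean proportion of differing sites over all sequence pairs.

```python
from itertools import combinations

def strip_indel_columns(alignment):
    """Drop every alignment column with a gap ('-') in at least one sequence."""
    seqs = list(alignment.values())
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}

def pairwise_difference(a, b):
    """Proportion of sites that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def nucleotide_diversity(alignment):
    """Mean pairwise proportion of differing sites over all sequence pairs."""
    pairs = list(combinations(alignment, 2))
    total = sum(pairwise_difference(alignment[x], alignment[y]) for x, y in pairs)
    return total / len(pairs)

# Invented toy alignment; column 8 holds a gap and is removed before analysis.
toy = {
    "strain_1": "ACGTACGT-A",
    "strain_2": "ACGTACGTTA",
    "strain_3": "ACGAACGTTA",
}
clean = strip_indel_columns(toy)
toy_pi = nucleotide_diversity(clean)
print(len(clean["strain_1"]), toy_pi)
```

In the toy data, nine columns survive the indel filter and only strain_3 differs from the others at one site, so the two non-zero pairwise values are each 1/9 and the mean over three pairs is 2/27.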

Nucleotide sequence accession number
The complete genome sequence of the Bosnia A strain was deposited in GenBank under accession number CP007548.

Whole genome sequencing, genome parameters, gene annotation
Sequencing of the TEN Bosnia A strain genome using three independent next-generation sequencing platforms yielded a total combined average coverage of 513×. The summarized genomic features of the Bosnia A strain in comparison to previously sequenced TPA and TPE strain genomes are shown in Table 1. The size of the Bosnia A genome (1,137,653 bp) was 1,628-2,828 bp shorter than the sizes of previously published genomes for TPA and TPE strains [10][11][12][13][14][15]. The overall gene order in the Bosnia A genome was identical to other TPE and TPA strains. Altogether, 1125 genes were annotated in the Bosnia A genome, including 54 untranslated genes encoding rRNAs, tRNAs and other ncRNAs (short bacterial RNA molecules that are not translated into proteins). A total of 640 genes (56.9%) encoded proteins with predicted function, 137 genes encoded treponemal conserved hypothetical proteins (TCHP, 12.2%), 141 genes encoded conserved hypothetical proteins (CHP, 12.5%), 145 genes encoded hypothetical proteins (HP, 12.9%) and 8 genes (TENDBA_0082a, TENDBA_0146, TENDBA_0316, TENDBA_0370, TENDBA_0520, TENDBA_0532, TENDBA_0812 and TENDBA_1029; 0.7%) were annotated as pseudogenes. The average and median gene lengths of the Bosnia A genome were calculated to be 979.2 bp and 831 bp, respectively. The intergenic regions covered 52.6 kbp and represented 4.63% of the total Bosnia A genome length. In general, other calculated genomic parameters were similar to other TPE strains.
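The gene counts and percentages reported above can be cross-checked with simple arithmetic. The following Python sketch uses only the numbers quoted in this section; the dictionary keys and variable names are illustrative.

```python
GENOME_BP = 1_137_653
TOTAL_GENES = 1125

# Gene counts per annotation category, as quoted in the text.
gene_counts = {
    "predicted function": 640,
    "TCHP": 137,
    "CHP": 141,
    "HP": 145,
    "pseudogenes": 8,
    "untranslated RNA genes": 54,
}

category_sum = sum(gene_counts.values())
percentages = {
    name: round(100 * count / TOTAL_GENES, 1)
    for name, count in gene_counts.items()
}

# Intergenic fraction computed from the rounded 52.6 kbp figure.
intergenic_pct = 100 * 52_600 / GENOME_BP

print(category_sum, percentages, round(intergenic_pct, 2))
```

The categories sum to the 1125 annotated genes, and the rounded percentages agree with those given in the text. The intergenic fraction computed from the rounded 52.6 kbp value comes out at about 4.62%, which is consistent with the reported 4.63% once rounding of the intergenic length is taken into account.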
When compared to TPA strains, the Bosnia A genome contained a 635 bp long insertion in the tprF locus. In this respect, the Bosnia A genome was similar to TPE strains. When compared to both TPA and TPE genomes, the Bosnia A genome contained a 2300 bp long deletion involving the tprF and tprG loci (TPANIC_0316 and TPANIC_0317 in the Nichols genome CP004010.2 [14]). Moreover, the predicted TENDBA_0316 gene (1860 bp in length) was a chimera encompassing the tprG 5′-region, a tprI-like sequence and the tprF 3′-region, and was hence designated tprGI, as previously described by Centurion-Lara et al. [27] (Table 2). Two insertions of 65 bp and 52 bp, respectively, resulted in the prediction of two hypothetical genes, TENDBA_0126b and TENDBA_0548a. The same orthologs were also predicted in TPE but not in TPA strains (Table 2).

Similarity of the Bosnia A genome to the available TPA and TPE genomes
Sequence relatedness of the Bosnia A genome to other Treponema pallidum genomes is shown in Fig. 1. This unrooted tree was constructed using several available whole genome sequences of uncultivable pathogenic treponemes. The tree clearly showed clustering of the Bosnia A strain with the TPE strains. The Bosnia A genome was found to be 99.91-99.94% and 99.79-99.82% identical to the TPE and TPA genomes, respectively (Table 3). The nucleotide diversity between TPE strains and the Bosnia A strain (0.00063±0.00032 to 0.00086±0.00043) was about three times lower than the nucleotide diversity between TPA strains and the Bosnia A strain (0.00181±0.00090 to 0.00212±0.00106). For comparison, calculated π values between the Bosnia A strain and individual TPA strains were of the same order of magnitude as π values between TPA and TPE strains (Table 4).
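The reported identity percentages and nucleotide diversity values can be related by a short calculation. The sketch below assumes that percent identity is approximately 100 × (1 − π) when π is expressed as differences per aligned site; this approximation is an assumption of the example and not a statement of how Table 3 was computed. The diversity ranges are 0.00063 to 0.00086 (Bosnia A versus TPE) and 0.00181 to 0.00212 (Bosnia A versus TPA).

```python
def identity_percent(pi):
    """Approximate percent identity from per-site nucleotide diversity."""
    return 100 * (1 - pi)

# Lower and upper nucleotide diversity between Bosnia A and each group.
diversity_ranges = {
    "TPE": (0.00063, 0.00086),
    "TPA": (0.00181, 0.00212),
}

identity_ranges = {
    group: (round(identity_percent(high), 2), round(identity_percent(low), 2))
    for group, (low, high) in diversity_ranges.items()
}

# Ratio of TPA to TPE diversity at the lower and upper ends of the ranges.
ratio_low = diversity_ranges["TPA"][0] / diversity_ranges["TPE"][0]
ratio_high = diversity_ranges["TPA"][1] / diversity_ranges["TPE"][1]

print(identity_ranges, round(ratio_low, 2), round(ratio_high, 2))
```

Under this approximation the diversity ranges reproduce the reported identity ranges (99.91-99.94% for TPE and 99.79-99.82% for TPA), and the TPA-to-TPE diversity ratio falls between roughly 2.5 and 2.9, in line with the statement that diversity relative to TPE strains is about three times lower.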

Bosnia A specific sequences
To identify Bosnia A-specific differences, the Bosnia A genome was compared to the available genomes of TPE strains [11,15] and TPA strains [10,12-14]. The Bosnia A strain-specific sequences were defined as those not present in both TPA and TPE strains and altogether comprised 406 differences (indels and substitutions with a total length of 2772 bp) equally distributed along the Bosnia A genome (Fig. 2). Differences in coding regions included 9 deletions, 5 insertions and 360 nucleotide substitutions for a total of 2728 bp (Table 5). Those 360 substitutions resulted in 197 Bosnia A-specific amino acid differences in the putative proteome. Most of the nucleotide substitutions were found in the TENDBA_0136, TENDBA_0548, TENDBA_0856, TENDBA_0859 and TENDBA_0865 genes (Table 5). Bosnia A-specific frameshift mutations (caused by three deletions and one insertion) resulted in significant gene truncation (TENDBA_0082a, TENDBA_0316 and TENDBA_1029) or elongation (TENDBA_0126b) (Table 2). Other detected indels resulted in 6 protein shortenings (TENDBA_0067, TENDBA_0136, TENDBA_0225, TENDBA_0548, TENDBA_0859, and TENDBA_0865) and 4 protein elongations (TENDBA_0856, TENDBA_0859, TENDBA_0897, and TENDBA_0898) (Table 5).
All affected genes code for hypothetical proteins of unknown function except for TENDBA_0898, which codes for RecB (exodeoxyribonuclease V beta subunit; EC 3.1.11.5). TENDBA_0136 and TENDBA_0865 have been predicted to be putative outer membrane proteins. In addition, TPA and TPE orthologs of TENDBA_0136 have been experimentally shown to bind human fibronectin [28]. TENDBA_0856 has been predicted to be a putative lipoprotein. No putative conserved domains have been detected in the hypothetical proteins except for TENDBA_0067, TENDBA_0225 and TENDBA_1029, which contain a TPR (tetratricopeptide) domain, an LRR_5 (leucine rich repeat) domain and a DbpA (RNA binding) domain, respectively (Table 5). All nonsynonymous substitutions have been identified outside the predicted domains.

Bosnia A sequences shared with TPE but not TPA strains
Genome sequences differentiating the Bosnia A strain from the TPA but not TPE strains are shown in Fig. 2. These sequences were found to be regularly distributed along the Bosnia A genome and altogether comprised 1422 differences (indels and substitutions with a total length of 2335 bp). In the coding regions, 2128 bp including 13 deletions, 9 insertions and 1296 substitutions differentiated genomes of TPA strains from Bosnia A and other TPE strains (Table 6). A set of 1296 substitutions resulted in 631 amino acid differences in the encoded proteins. Most of the differences were found in genes TENDBA_0117 (tprC), TENDBA_0131 (tprD), TENDBA_0133, TENDBA_0134,
Table 1 (fragment). No. of rRNA loci: 6 (2 operons) in each of the five genomes listed; No. of ncRNAs: 3 in each of the five genomes listed. Footnotes: a, [15]; b, [11]; c, in previous studies [11,15], the Samoa D, CDC-2, Gauthier and Fribourg-Blanc genomes were compared to the Nichols CP004010.2 genome sequence [14].

Bosnia A sequences shared with TPA but not TPE strains
Genome sequences differentiating the Bosnia A strain from TPE but not TPA strains are shown in Fig. 2. These sequences were also found to be regularly distributed along the Bosnia A genome and, altogether, comprised 197 differences in genome positions (containing indels and substitutions encompassing a total of 635 bp). Three deletions, three insertions and 174 substitutions (Table 7) were found within the Bosnia A coding regions, encompassing a total of 612 bp. The 174 substitutions resulted in 101 amino acid differences in the putative encoded proteins.
Most of the substitution differences were found in genes TENDBA_0136, TENDBA_0488, TENDBA_0577, TENDBA_0856a/TENDBA_0858, TENDBA_0859, TENDBA_0865 and TENDBA_0968 (Table 7). An insertion of 378 bp in TENDBA_1031 (tprL) resulted in a gene elongation (Table 2). TENDBA_0488 codes for the Mcp (methyl-accepting chemotaxis) protein. All other genes code for hypothetical proteins of unknown function. Two genes have been predicted to encode putative outer membrane proteins (TENDBA_0136 and TENDBA_0865) and one gene has been predicted to encode a putative lipoprotein (TENDBA_0858). No putative conserved domains have been detected in the hypothetical proteins (Table 7).

Several genetic loci of the Bosnia A genome show striking similarity to TPA sequences
Despite the overall sequence similarity of the Bosnia A genome to TPE strains, several chromosomal sequences were found to be almost identical to sequences in TPA strains. The Bosnia A sequence in the TENDBA_0577 locus was identical to four of the five orthologous sequences of completely sequenced TPA strains (Fig. 3). In the TENDBA_0968 locus, stretches of TPA- and TPE-like sequences were found (Fig. 3) and a similar pattern was also found in TENDBA_0858 (not shown). In addition, TENDBA_0326 (tp92, bamA) was identical to the orthologous sequence of TPA SS14 (coordinates 1593-1649, Fig. 3) and to all TPA strains (with the exception of the TPA Mexico A strain) between coordinates 2127-2494. The TPA Mexico A strain is, in this region, similar to TPE strains [12,29]. While the latter TPA-like sequences in TENDBA_0326 were almost 0.4 kbp long, other TPA-like sequences were usually relatively short, ranging from about 50 to 70 bp. However, TPA-like sequences of the Bosnia A strain were clearly different from Bosnia A-specific sequences with sporadic nucleotide positions identical to TPA sequences (TENDBA_0856; Fig. 3). The previously reported 378 bp insertion almost identical to TPA strains (differing only in one nucleotide position [27]) was confirmed in TENDBA_1031, as was the nucleotide mosaic in the TP0488 (mcp2-1) locus, which revealed a sequence identical to TPA Mexico A (with the exception of 2 single nucleotide substitutions [12]). Altogether, at least seven TPA-like sequences having 5 or more nucleotide positions identical to TPA sequences and not interrupted by TPE-like nucleotide positions were found in the Bosnia A genome.
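The criterion used above for a TPA-like sequence (5 or more positions identical to TPA, not interrupted by TPE-like positions) can be illustrated with a minimal Python sketch. The function name, the toy sequences, and the treatment of positions where Bosnia A matches neither subspecies (ignored rather than counted as an interruption) are assumptions of this example, not a description of the procedure actually used.

```python
def tpa_like_runs(bosnia, tpa, tpe, min_sites=5):
    """Find runs of TPA-like sites in three aligned, equal-length sequences.

    Only sites where TPA and TPE differ are informative. A run collects
    consecutive informative sites where Bosnia A matches TPA and is closed
    by the first site where Bosnia A matches TPE. Sites where Bosnia A
    matches neither are skipped (an assumption of this sketch).
    Returns a list of (first_index, last_index, n_sites) tuples.
    """
    runs, current = [], []
    for i, (b, a, e) in enumerate(zip(bosnia, tpa, tpe)):
        if a == e:
            continue  # site does not discriminate TPA from TPE
        if b == a:
            current.append(i)
        elif b == e:
            if len(current) >= min_sites:
                runs.append((current[0], current[-1], len(current)))
            current = []
    if len(current) >= min_sites:
        runs.append((current[0], current[-1], len(current)))
    return runs

# Invented toy alignment in which every site discriminates TPA from TPE.
toy_tpa = "AAAAAAAAAA"
toy_tpe = "CCCCCCCCCC"
toy_bosnia = "CAAAGAACAA"
toy_runs = tpa_like_runs(toy_bosnia, toy_tpa, toy_tpe)
print(toy_runs)
```

In the toy data, sites 1 to 6 contain five TPA-like positions (site 4 matches neither reference and is skipped) before a TPE-like position at site 7 closes the run; the trailing two TPA-like sites fall below the threshold of five and are not reported.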

Discussion
The first complete genome sequence of the bejel-causing agent, T. pallidum subsp. endemicum (TEN) strain Bosnia A, was determined using three independent next-generation sequencing techniques. Because the total combined coverage was >500× and all sequencing ambiguities were resolved with Sanger sequencing, the quality of this new genome is very high. This allowed us to carry out a comparative analysis of the Bosnia A genome with the already available treponemal genomes [10][11][12][13][14][15]30] with a high degree of confidence that our results would not be affected by sequencing errors. In several of the previously published genomes, the whole genome sequence was compared to whole genome fingerprinting data to assess the quality of the genome sequence. In each of the previously tested genomes, the sequencing error rate was less than 10⁻⁴ [11,12,15,30].
The genome length of strain Bosnia A (1,137,653 bp) is about 2 kbp shorter than the length of TPE or TPA genomes. This is caused by a 2300 bp deletion in the tprF and tprG loci. This deletion was also confirmed in the TEN Iraq B sequence [27], suggesting that this is a common feature of bejel strains. An identical deletion was also found in the T. paraluisleporidarum ec. Cuniculus genome (formerly denoted T. paraluiscuniculi Cuniculi A [30,31]). Moreover, this type of deletion was observed during PCR amplification of the tprF and tprG loci in other treponemal genomes (M. Strouhal, D. Šmajs; unpublished data). This fact, together with the presence of repeats in the flanking regions, suggests that this 2300 bp deletion is a result of polymerase slippage and that this deletion could have happened several times independently during evolution. In fact, no other similarities between the Bosnia A and T. p. ec. Cuniculus genomes were found with respect to other identified indels in the T. p. ec. Cuniculus genome.
The overall genetic similarity of Bosnia A to the sequenced TPE strains is 99.91-99.94%, at the DNA level. For comparison, the sequence similarity between TPA and TPE strains is greater than 99.8% [11,15]. This enormous sequence similarity among TPA, TPE and TEN strains is the molecular basis for the long established fact that individual etiological agents of syphilis and endemic treponematoses (yaws and bejel) cannot be distinguished by their morphology or serology.
Although syphilis, yaws, and bejel show differences in their geographical distribution, mode of transmission, invasiveness and pathogenicity, it is known that the clinical symptoms of these diseases overlap and one disease can mimic the others. Interestingly, in very dry areas, yaws symptoms are almost the same as bejel symptoms [32]; which again reflects the extremely high sequence similarity between TPE and TEN strains. In many or perhaps most cases, the final diagnosis is therefore often based on the epidemiological context of the infection. However, at the same time, even small genomic differences (although not known at present) have the potential to influence the phenotypic differences between the clinical manifestations of syphilis, yaws and bejel. Additional whole genome sequences of TPA, TPE and TEN strains will help to identify a set of invariant differences between the etiological agents of these diseases, which could help answer this question.
At the same time, the TEN Bosnia A strain is clearly distant from the cluster of TPE strains. However, additional TEN whole genome sequences will be needed to assess the variability within TEN strains. To our knowledge, there is only one additional laboratory stock of TEN, i.e. strain Iraq B. Previous studies on the Iraq B isolate revealed a high degree of similarity to Bosnia A [27,29,33-36], suggesting that this strain is more related to Bosnia A than to TPE strains.
Table 4. Calculated nucleotide diversity (π ± standard deviation) between TPA and TPE strains, within individual TPE strains, within TPA strains, and between Bosnia A strain and TPA and TPE strains. Columns: Strains; Nucleotide diversity (π ± SD).
Most prominent genetic changes between Bosnia A and TPE and/or TPA genomes resulting in protein truncations or elongations were located in just 14 genes. These genes encoded TprA, F, G, and L proteins, RecQ protein, ethanolamine phosphotransferase, and treponemal conserved hypothetical proteins [...] (5). Both Tpr and RecQ proteins were found to also be affected in the T. p. ec. Cuniculus genome [30]. While the tprA gene was functional in Bosnia A and TPE strains but not among TPA strains (except for strain Sea 81-4; see [37]), tprF and tprG were partially deleted (similarly to the T. p. ec. Cuniculus genome) and the tprL gene was elongated in a way that was similar to that seen in TPA strains. These changes were already described in detail by Centurion-Lara et al. [27]. Tpr proteins likely play an important role in treponemal infectivity, pathogenicity, immune evasion and host specificity. Tpr proteins induce an antibody response during infection and exhibit heterogeneity both within and among T. pallidum subspecies and strains [38][39][40]. In the T. p. ec. Cuniculus genome, a mutation in recQ resulted in a predicted RecQ protein without a C-terminal or DNA-binding domain [41]; on the other hand, in Bosnia A the frameshift reversion led to a functional recQ gene (similar to that seen in TPE genomes [11]). Other prominent changes seen in the Bosnia A strain include a different number of tandem repeat units in the TENDBA_0433 (encoding Arp) and TENDBA_0470 (encoding a conserved hypothetical protein) genes compared to orthologous genes in individual TPE and TPA strains. The same number of 60-bp tandem repeat units (all of Type II) within the arp gene was found in the Bosnia A genome as previously described [42]. Variable numbers of tandem repeat units in genes orthologous to TENDBA_0470 have already been described in TPE and TPA strains [11,15,19]. The genome of Bosnia A showed several genetic loci with sequences identical to TPA sequences (Fig. 3). The TENDBA_0577 gene encoded a treponemal conserved hypothetical protein of unknown function with predicted cytoplasmic membrane localization. This gene was completely identical to TPA orthologs and differed from TPE orthologs by deletion of 12 nucleotides and substitution of 5 nucleotides.
Recent studies of σ factor RpoE (TP0092) binding sites identified gene TP0577 (orthologous to TENDBA_0577) as one of 22 putative TP0092-controlled ORFs [43]. TENDBA_0577 thus could possibly code for a protein integrated in the stress response pathway during the first days post infection. Similarly, the 378 bp insertion in TENDBA_1031 is, with the exception of a 1-nucleotide insertion, almost identical to orthologs of the TPA strains (but not to TPE strains). In other genes (TENDBA_0968, TENDBA_0858), 50-70 bp long sequences identical to one or several TPA strains were found, indicating that the genome of Bosnia A incorporated sequences identical to TPA strains. Most of the above mentioned genes were found to evolve under positive selection in TPA-TPE comparisons [11]. In fact, previous papers found this type of mixed TPA and TPE sequences in TPA Mexico A and South Africa strains [12,29]. Moreover, previous reports have shown that TEN strain Bosnia A contains the same nucleotide mosaic at the TP0488 (mcp2-1) locus as TPA Mexico A (with the exception of 2 single nucleotide substitutions). Despite numerous efforts to identify potential donor sites within TPA Mexico A that could explain the existence of these sequences by intra-strain recombination [12], no such sites have been identified in the Mexico A genome. Similarly, no donor sites have been identified in the Bosnia A genome either. It is likely that these sequences identical to TPA in the Bosnia A genome could result from an inter-strain recombination event between TPA and TEN strains during a simultaneous infection of multiple hosts during TEN evolution. Although the overall genome sequence of Bosnia A is related to TPE strains, horizontal gene transfer appears to be the mechanism that introduced at least seven chromosomal sequences related to TPA SS14, TPA Mexico A, and other TPA strains. In fact, both the TPA SS14 and Mexico A sequences are required and sufficient to provide sequences to the Bosnia A genome.
Moreover, at least two subsequent transfers had to occur to introduce both SS14- and Mexico A-specific sequences. Experimental infection with either TPA, TPE or TEN strains did not result in complete cross-protection [9]. In addition, recombination mechanisms are more active during treponemal infection and represent important genetic mechanisms for avoiding the host immune response [40]. Moreover, the absence of modification and restriction systems and the presence of genes for homologous recombination in pathogenic treponemes [16] appear to allow incorporation of foreign DNA molecules with subsequent integration into chromosomal DNA. Therefore, uptake of TPA DNA by a TEN strain during a simultaneous infection of multiple hosts appears to be a possible explanation. It is clear that TPA strains can be classified as SS14-like (SS14, Mexico A) and Nichols-like strains (Nichols, DAL-1, Chicago) [14,44] and that most of the TPA strains causing infections throughout the world are in fact SS14-like strains [36]. However, it is not clear if the SS14 and Mexico A sequences in the Bosnia A genome reflect a greater prevalence of SS14-like strains in the human population or an accidental coincidence of transfers from SS14-like strains. Moreover, there are several loci in the Bosnia A genome similar to the TENDBA_0856 locus (TENDBA_0483, TENDBA_0858, TENDBA_0865) that represent regions of Bosnia A-specific sequences with only sporadic nucleotide positions that are identical to TPA sequences. These sequences may be identical to other, yet unidentified, TPA strains or isolates. If such TPA isolates are identified in the future, they may help to unravel the evolution of TPA and TEN treponemes.

Supporting Information
Table S1 Sample preparation of Bosnia A strain for whole genome sequencing using pooled segment genome sequencing (PSGS) strategy. Sheet 1 (TableS1_BosniaAprimers) contains a list of primers used for whole genome amplification of the Bosnia A strain using PSGS strategy. Sheet 2 (TableS1_BosniaA-overlap reg) contains a list of primers used for amplification of TPI-overlapping regions shorter than 60 bp. (XLS)