Clinical and genomic diversity of Treponema pallidum subspecies pallidum to inform vaccine research: an international, molecular epidemiology study

Summary
Background The increase in syphilis rates worldwide necessitates development of a vaccine with global efficacy. We aimed to explore Treponema pallidum subspecies pallidum (TPA) molecular epidemiology essential for vaccine research by analysing clinical data and specimens from early syphilis patients using whole-genome sequencing (WGS) and publicly available WGS data.
Methods In this multicentre, cross-sectional, molecular epidemiology study, we enrolled patients with primary, secondary, or early latent syphilis from clinics in China, Colombia, Malawi, and the USA between Nov 28, 2019, and May 27, 2022. Participants aged 18 years or older with laboratory confirmation of syphilis by direct detection methods or serological testing, or both, were included. Patients were excluded from enrolment if they were unwilling or unable to give informed consent, did not understand the study purpose or nature of their participation, or received antibiotics active against syphilis in the past 30 days. TPA detection and WGS were conducted on lesion swabs, skin biopsies, skin scrapings, whole blood, or rabbit-passaged isolates. We compared our WGS data to publicly available genomes and analysed TPA populations to identify mutations associated with lineage and geography.
Findings We screened 2802 patients and enrolled 233 participants, of whom 77 (33%) had primary syphilis, 154 (66%) had secondary syphilis, and two (1%) had early latent syphilis. The median age of participants was 28 years (IQR 22–35); 154 (66%) participants were cisgender men, 77 (33%) were cisgender women, and two (1%) were transgender women. Of the cisgender men, 66 (43%) identified as gay, bisexual, or other sexuality. Among all participants, 56 (24%) had HIV co-infection. WGS data from 113 participants showed a predominance of SS14-lineage strains with geographical clustering. Phylogenomic analyses confirmed that Nichols-lineage strains were more genetically diverse than SS14-lineage strains and clustered into more distinct subclades. Differences in single nucleotide variants (SNVs) were evident by TPA lineage and geography. Mapping of highly differentiated SNVs to three-dimensional protein models showed population-specific substitutions, some in outer membrane proteins (OMPs) of interest.
Interpretation Our study substantiates the global diversity of TPA strains. Additional analyses to explore TPA OMP variability within strains are vital for vaccine development and understanding syphilis pathogenesis on a population level.
Funding US National Institutes of Health National Institute for Allergy and Infectious Disease, the Bill & Melinda Gates Foundation, Connecticut Children’s, and the Czech Republic National Institute of Virology and Bacteriology.


STUDY DESIGN
The study will be a longitudinal observational study of patients with primary, secondary, or early latent syphilis.
From our well-established syphilis network (Figure 2), we will recruit, screen, and enroll patients presenting with primary or secondary syphilis (SS) (Figure 1). We will also enroll a smaller number of patients with early latent syphilis from the UNC and Cali sites, and pregnant women with early syphilis at our Cali site. Our clinical sites, located on four different continents, will provide access to a diversity of patient populations with early syphilis, including heterosexual men and women, men who have sex with men, pregnant women, and HIV-coinfected persons.
To participate, participants must be 18 years of age or older; be willing to provide informed consent; and have signs or symptoms of primary or secondary syphilis, documented seroconversion within the previous 12 months, or sexual exposure to an individual known to have early syphilis diagnosed within the last 12 months. Each of the four sites has slightly different capabilities for screening and testing patients for syphilis (see Study Procedures).
Following written consent, patients will undergo evaluation to confirm a diagnosis of early syphilis using direct detection and/or rapid treponemal antibody (Ab) testing and stat non-treponemal antibody testing (for patients with a known history of prior syphilis). Participants with unknown or prior HIV-negative status will undergo HIV testing. Additional STI testing will be conducted as per standard, country-specific care, and urine pregnancy testing will be performed in women with childbearing potential.
Study participants with initial confirmation of early syphilis diagnosis will be consented for additional study procedures, including a detailed standardized questionnaire and specimen collection (exudate collection and ulcer swabs from genital/perianal lesions, skin biopsies from patients with secondary syphilis, and serum and whole blood collection). All patients will receive syphilis therapy as per country-specific standards of care. Patients enrolled in Cali and UNC will be asked to return for a follow-up visit within 6 months after initial diagnosis. At the follow-up visit, PBMC and serum samples will be obtained for Ab binding experiments and immunologic studies. If a patient is re-infected and clinically symptomatic with either primary or secondary syphilis, new samples will be obtained for TPA DNA analysis and OMP sequencing.

STUDY POPULATIONS
Selection of the Study Population
Site 1: UNC Infectious Disease Clinic, Chapel Hill, North Carolina: This clinical site will recruit and enroll known HIV-infected patients who present to the walk-in clinic or their routine appointments with genital lesions or a rash suggestive of early syphilis, and patients with early latent syphilis.
Site 2: CIDEIM, Cali, Colombia: This clinical site will recruit and enroll early syphilis patients identified through the clinical network and referred for enrollment to the Clinical Research Unit at CIDEIM.
Site 3: UNC Project, Lilongwe, Malawi: This clinical site will recruit and enroll patients who present to the Bwaila Hospital STI Clinic with genital ulcer disease (GUD).
Site 4: Southern Medical University Dermatology Hospital, Guangzhou, China: This clinical site will recruit and enroll patients who present to the STI clinic with a presumptive diagnosis of primary or secondary syphilis.

Inclusion/Exclusion Criteria
Inclusion criteria for screening (based on site capability for routine syphilis testing): participants with suspected primary or secondary syphilis); 4. Willingness to provide serum and whole blood for peripheral blood mononuclear cells; 5. Study subject is willing to return to the clinic for study follow-up between 1-6 months following enrollment (UNC and Cali sites only).
Pregnant women, HIV-infected persons, or patients who have been previously enrolled in the study will not be excluded. We will actively recruit participants who are suspected of reinfection with primary, secondary, or early latent syphilis.
Exclusion criteria: 1. Patient is unwilling or unable to give informed consent; 2. Patient does not understand the purpose of the study or the nature of their participation; 3. Patient has received antibiotics active against TPA in the last 30 days.

Study Procedures
Screening: Potential participants with signs/symptoms of early syphilis, a history of recent exposure to a syphilis-infected sexual partner, and/or a reactive syphilis screening test will present to the clinics for routine evaluation. This may involve a routine history, targeted physical examination, and initial evaluation for syphilis. Subjects with unknown HIV status or prior HIV-negative status will undergo HIV testing. Additional STI testing will be conducted and pregnancy testing will be performed in women of child-bearing potential.
If a patient is deemed to be potentially eligible, study staff will review the eligibility criteria and will obtain written informed consent before screening and additional study procedures. An example informed consent form (ICF) is provided in Appendix A. Study-specific procedures are those tests that are not routinely performed in the clinic as part of standard of care.
Enrollment: Upon initial confirmation of primary, secondary, or early latent syphilis with at least one of the tests above at each clinical site, study participants will undergo additional informed consent for the following study procedures. All study-related forms and specimens will be labeled with study identification numbers, and no private identifiable information will be collected in this study.
1) Case report forms (CRFs, Appendix B) will be reviewed to collect additional demographic characteristics (i.e., age, sex/gender, race/ethnicity, marital status, education); behavioral characteristics (i.e., sexual orientation, age at sexual debut); and sexual partnerships (i.e., number and gender of long-term and short-term sexual partners, number of sexual acts per week with long-term and short-term sexual partners, number of sex acts with use of condoms, number of concurrent sexual partners, duration of long-term and short-term partnerships, history of sexual contact with sex workers).
2) If the subject is pregnant, we will also collect obstetric characteristics (i.e., prenatal care, existing number of children, number of pregnancies and parity, and previous pregnancy loss).
3) Clinical data will be collected from the medical records and documented in CRFs, including syphilis symptoms, medical history (i.e., HIV status, history of syphilis or other STDs, history of viral hepatitis, history of drug use), physical examination findings (i.e., size, number, and location of genital ulcer lesions; type and distribution of rash, mucous patches, lymphadenopathy, condylomata lata), number of days from onset of symptoms to initiation of therapy, and treatment for syphilis.
4) Specimen collection for research purposes.
5) Treatment for syphilis, counseling regarding abstinence and condom use, and partner services will be provided according to standards of care at each clinical site.
Site-specific procedures are outlined below. Each site will develop and follow procedures based on site-specific algorithms that consider standard of care for syphilis management in their country.
Site 1 (UNC ID Clinic, Chapel Hill) - CIDEIM for the study.
3) CIDEIM staff will obtain screening consent for HIV and consent for enrollment, and complete the CRF.
4) HIV screening and rapid syphilis test will be performed.
5) Perform stat RPR and rapid treponemal test in patients without a history of prior syphilis, and pregnancy test in women.
6) If the genital ulcer is suggestive of primary syphilis, or any of the syphilis tests are positive, the patient is eligible for additional procedures. (If negative, the patient is not eligible. Provide screening incentive.)
7) Proceed with enrollment procedures: additional CRFs, study specimen collection (e.g., darkfield microscopy, exudate collection, lesion swab, skin biopsy, whole blood, serum), and specimen processing.
8) Provide study incentive and schedule study follow-up visit at 1-6 months.
9) Provide antibiotic treatment per country-specific protocol for appropriate stage of syphilis.
10) Other routine care (e.g., counseling, partner services, condoms, STI testing).
Site 3 (UNC Malawi, Lilongwe) -
1) Patient ≥18 years of age presents for routine care with a genital ulcerative lesion suggestive of primary syphilis.
2) Refer patient to research staff for recruitment and study screening.
3) Research staff will obtain screening consent.
4) After screening consent, review eligibility criteria and complete screening CRF.
5) Obtain ulcer swab and perform darkfield microscopy.
6) If darkfield microscopy is positive, obtain additional consent for enrollment. (If negative, the patient is not eligible. Provide screening incentive.)
7) Proceed with enrollment procedures: additional CRFs, RPR, treponemal antibody test in patients without a history of prior syphilis, HIV test (unless known HIV-positive), pregnancy test in women, study specimen collection (e.g., lesion swab, serum), and specimen processing.
8) Provide study incentive and antibiotic treatment per country-specific protocol for appropriate stage of syphilis.
9) Other routine care (e.g., counseling, partner services, condoms).
Site 4 (SMU Dermatology Hospital, Guangzhou) -
1) Patient ≥18 years of age presents for routine care to the STD clinic with a characteristic genital lesion or rash suggestive of early syphilis.
2) Perform darkfield microscopy, stat TRUST/RPR, rapid treponemal test in patients without a history of prior syphilis, HIV test (unless known HIV-positive), and pregnancy test in women.
3) If any of the syphilis tests are positive, refer patient to research staff for recruitment and study enrollment.
4) Research staff will obtain consent for enrollment.
5) After consent, review eligibility criteria and complete screening CRFs.
6) Proceed with enrollment procedures: additional CRFs, study specimen collection (e.g., lesion swab, whole blood, serum), and specimen processing.
7) Provide antibiotic treatment per country-specific protocol for appropriate stage of syphilis.
8) Provide study incentive.
9) Other routine care (e.g., HIV care, partner services, counseling, condoms).

Clinical Evaluations (Post-screening)
Complete medical history will be obtained by a personal interview and questionnaire of each subject upon enrollment. A targeted physical examination (genital, rectal, oral, skin, and lymph node examinations) will be performed at each visit. All physical examinations will be performed by a qualified study clinician. Clinical photography of skin or mucosal lesions will be performed if the patient consents.
Specimens to be collected specifically for research purposes from study subjects who meet criteria for early syphilis will include exudate collection and ulcer swabs from genital/perianal lesions and/or skin biopsies for TPA PCR and amplicon-based OMP sequencing. Study participants will undergo phlebotomy to collect 10 mL of whole blood to obtain serum for opsonophagocytosis assays and up to 60 mL of whole blood for peripheral blood mononuclear cells and immunology assays.

Clinical Laboratory Evaluations
Diagnostic laboratory tests for syphilis, other STIs, HIV, CD4 count, hepatitis, and inflammatory markers (e.g., ESR, CRP) as indicated will be performed locally according to local protocols. Phlebotomy for serological testing and specimen storage to determine study outcomes will be performed locally. A pregnancy test will be performed on all subjects of childbearing potential.

Specimen collection for research purposes
Specimens collected from study participants who meet criteria for early syphilis and have signed informed consent for the full study will include exudate swabs from genital/perianal lesions and/or biopsies from secondary syphilis (SS) skin lesions for qPCR, genome sequencing, and OMP typing. Study participants will undergo phlebotomy to collect 10 mL of whole blood for serum for opsonophagocytosis assays and up to 60 mL of whole blood for PBMC extraction and immunology assays.

TPA DNA sample procurement and processing
In addition to whole blood and serum samples, genital ulcer material (CIDEIM, Malawi, China) and skin biopsies (CIDEIM and UNC only) will be used to extract TPA DNA from patients with primary and secondary syphilis. TPA DNA samples will be initially stored locally at each clinic, and shipped in batches to the University of North Carolina at Chapel Hill Genomics Core or the Duke Human Vaccine Institute (under the direction of Dr. Moody, U19 co-PI). Samples can then be transferred to Dr. Moody's immunology laboratory at Duke, the GGC at UNC, and the Salazar/Hawley laboratory in Connecticut for additional molecular and immunologic analyses (i.e., opsonophagocytosis). DNA will be extracted from fresh genital ulcer material on site within 2 hours of collection.

Laboratory Evaluations/Assays
Spirochetal burdens can be quantitated by qPCR for TPA polA 17 by study site personnel and, in the case of the ID clinic in Chapel Hill and UNC Project Malawi, by the Genetics and Genomics Core. As stated above and in Table 3, each site will have slightly different screening procedures; Malawi and China will enroll mostly primary syphilis cases, while Cali and Chapel Hill will enroll a combination of primary, secondary, and early latent syphilis patients. A detailed description and site-specific enrollment are described in the Human Subjects section of this application.

Rabbit infectivity testing (RIT)
SMU (China) has the ability to isolate TPA from syphilis patients' blood and genital ulcer material, a technique known as rabbit infectivity testing. Cali has a new rabbit facility on site at CIDEIM. Procedures for rabbit injection and TPA isolation have been previously described. Approximately 20 live TPA isolates will also be isolated from a cohort of primary or secondary syphilis patients by RIT for whole genome sequencing (WGS).

Genomic sequencing
The Genetics and Genomics Core (GGC) will be housed primarily at UNC, with support from a second laboratory facility at Masaryk University (MU) in the Czech Republic. Each GGC laboratory will be responsible for sequencing of samples from different field collection sites. Specifically, UNC will process and sequence de-identified DNA from samples collected in North America, South America, Africa, and Asia. MU will process and sequence archived, de-identified DNA from samples previously collected in Europe. The GGC will employ next-generation sequencing techniques for whole-genome and/or targeted sequencing of known OMP genes. Sequencing data will be analyzed using established bioinformatics pipelines that permit assessment of minority variants in polyclonal infections. The GGC will also perform WGS on a subset of clinical isolates, including those successfully passaged in rabbits as described above. Sequences will be deposited in public repositories as described below (see Publication Policy).

STUDY SCHEDULE
We will recruit adult patients identified with early syphilis from four clinical sites located at the UNC-Chapel Hill Infectious Disease Clinic in Chapel Hill, North Carolina; CIDEIM in Cali, Colombia; Bwaila Hospital STI Clinic in Lilongwe, Malawi; and the Dermatology Hospital of SMU in Guangzhou, China. Heterosexual men and women, men who have sex with men, pregnant women, and HIV co-infected persons presenting for routine evaluation with symptoms suggestive of primary or secondary syphilis or suspected early latent syphilis will be recruited by study staff located at each clinical site. Recruitment will also be conducted through outreach activities and IRB-approved materials to be shared in person or via social media to promote syphilis screening among high-risk populations in the communities.

Screening and Enrollment
The study procedures for each clinical site are outlined and discussed in Section 6, Study Procedures.
• The potential study participant will be provided with a description of the study (purpose and study procedures) and asked to read the ICF or have it read to him/her. The ICF must be signed before any screening or study procedures are performed.
• Demographic information will be collected from the study participant.
• Eligibility criteria will be reviewed, and the following procedures will be conducted.
o Complete medical history will be obtained by interviewing subjects to assure eligibility.
o Sexual history for the past 60 days will be collected.
o A targeted physical examination will be performed by a qualified study clinician.
o If primary or secondary lesions are present, clinical staff may swab the lesions.
o A urine or serum pregnancy test will be performed on all subjects of childbearing potential.
o Blood will be collected for RPR test, storage, and HIV assays (including a CD4 if not available for HIV-infected patients). Clinicians will provide pre- and post-test counseling, obtain written consent if required for HIV testing by local authorities, and provide treatment and referrals per local standard of care.
• Protocol requirements will be reviewed with the participant.
• Contact information will be collected, and the preferred method of contact will be noted for any follow-up visit(s) at each site.

Follow-up and Final Visits, if applicable
To improve retention (UNC and Cali only), enrolled participants will be provided an appointment card for the follow-up visit to be scheduled after enrollment and to monitor response to syphilis therapy (defined as 3-6 months after treatment for early syphilis, or monthly for pregnant women with syphilis). Appointment reminders via telephone calls will also be provided to increase the likelihood of follow-up. Three separate attempts (e.g., via telephone calls) to contact participants for follow-up will be conducted by study staff.
Participants will be considered lost to follow-up if there is no response or return to clinic after three attempts.

Criteria for Discontinuation or Withdrawal of a Subject (or a Cohort), if applicable
Study participants may voluntarily withdraw their consent for further study participation at any time and for any reason without penalty or prejudice to future medical care.

Specification of the Appropriate Outcome Measures
Significant advances have been made in the characterization of outer membrane proteins of TPA. However, developing a syphilis vaccine requires a better understanding of the OMPeome in different geographic areas and different transmission groups. The variety of TPA strains and intra-strain variability that has been observed in clinical isolates indicates that immune protection will require an effective immune response across multiple TPA strains.

Primary Outcome Measures
Generate a catalog of the global repertoire of TPA outer membrane proteins, establishing the frequency of TPA clades and strains within each geographic area, as well as within specific transmission groups. Genome sequencing and OMP typing in clinical isolate samples will allow us to characterize the distribution and frequency of the alleles of current OMPs and may allow us to identify new OMPs.
Structural analysis of OMPs identified in clinical isolates will be used to identify proposed extracellular loops, which will then be used to evaluate serum reactivity and to identify epitope-specific B-cells. Epitope-specific B-cells will be utilized to generate monoclonal antibodies (MAbs) to be used in opsonization experiments. Epitope-specific MAbs found to have higher opsonization capacity will be considered to be directed against epitopes that will potentially have higher immunogenicity, and therefore could be used for immunogenicity experiments in animal models.

Secondary Outcome Measures
Based on the frequency of the OMP alleles, recombination of alleles from different strains within clinical isolates, and the number of mutations within variable regions of the OMPs, a new classification system will be developed, which then can be used for local surveillance studies to make sure vaccine cocktails in specific locales have the correct representation of OMP variants.
Demographic and behavioral surveys of the study participants will be used to generate models of transmission dynamics and sexual partnership, which will be helpful to analyze the impact of vaccine candidates in different geographic areas and transmission groups.

Case report forms
Paper or electronic case report forms were used to document eligibility criteria for screening and enrollment, and to review participants' demographics and sexual histories. In-depth information regarding male or female sexual partnerships in the past year was also collected in this study to assist with future syphilis vaccine modeling. Clinical data were collected from participants and their medical records to include the following: history of sexually transmitted infections (STIs) and HIV; medical history and drug use; pregnancy status (from women); symptoms and physical examination findings; point-of-care and clinical laboratory testing; and research specimen collection. Data on other STI testing and repeat syphilis testing following treatment were collected if conducted as part of routine clinical care at each site. Any missing data or data anomalies were communicated to the sites for clarification and resolution. All queries were addressed locally and REDCap records were corrected when applicable.

Screening Eligibility Form
Inclusion Criteria: 0 = No 1 = Yes
Participants must respond "Yes" to all site-specific questions below to be eligible for screening.
If sexual partner is male or female:
23. In an average month during which you were sexually active with this partner in the past year, how many times per month did you perform oral sex on this partner (you put your mouth on your partner's genitals)? 99 = Decline to Answer
24. In an average month during which you were sexually active with this partner in the past year, how many times per month did this partner perform oral sex on you (he/she put his/her mouth on your vaginal area)? 99 = Decline to Answer
STI and HIV History Form
1. Previous self-testing (self-test defined as you take your own sample and interpret the result yourself) 0 = No 1 = Syphilis test, specify result: 2 = HIV test, specify result:
For ALL subjects: Responses to the following questions can be obtained from the patient. However, review of medical records should be conducted for additional documentation.
For participants with multiple TPA qPCR results per sample type, overall TPA qPCR sample results were classified as follows: if at least one individual sample result was >0 copies/µl, the overall sample type result was classified as positive, with the overall quantitative result taken as the geometric mean of all >0 copies/µl results. Otherwise, the overall sample result was classified as negative (0 copies/µl). Only results from the local laboratories were included in this determination, with the exception of Malawi, for which all TPA PCR testing was performed at UNC. Genomic data analyses are described below.
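The qPCR classification rule above can be illustrated with a minimal Python sketch. The function name `summarize_qpcr` and the plain list-of-numbers input are hypothetical conveniences for illustration; they are not taken from the study's data management workflow.

```python
import math
from typing import List, Tuple


def summarize_qpcr(results_copies_per_ul: List[float]) -> Tuple[str, float]:
    """Collapse replicate TPA qPCR results for one sample type.

    Returns ("positive", geometric mean of all results > 0) if any result
    is above 0 copies/µl, otherwise ("negative", 0.0).
    """
    positives = [r for r in results_copies_per_ul if r > 0]
    if not positives:
        return ("negative", 0.0)
    # Geometric mean computed on the log scale to avoid overflow.
    mean_log = sum(math.log(r) for r in positives) / len(positives)
    return ("positive", math.exp(mean_log))


if __name__ == "__main__":
    print(summarize_qpcr([0, 4, 16]))  # zero replicate is ignored
    print(summarize_qpcr([0, 0]))
```

Note that zero-valued replicates are excluded from the geometric mean, so a single low positive replicate is enough to classify the sample type as positive.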

TPA enrichment, library preparation, and sequencing
Determination of TPA polA copy number and total DNA concentration was conducted on samples by qPCR and Qubit fluorometer with dsDNA HS2 reagents (Thermo Fisher Scientific, Waltham, MA, USA), respectively. Specimens with ≥40 polA copies/µL were selected for WGS, with the exception of two low-concentration samples from China. For participants with multiple specimens, we selected one sample for inclusion in the study by prioritizing DNA samples that, when available, had been extracted directly from chancre swabs or skin biopsies (without rabbit passage) and had the highest polA copy number. For some patients enrolled in Guangzhou, China, only rabbit T. pallidum isolates were available for sequencing.
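The specimen selection logic described above can be sketched as follows. This is an illustration under stated assumptions: the `Specimen` class, its field names, and the choice to apply the ≥40 copies/µL threshold before prioritizing are all hypothetical, and the two low-concentration exceptions from China are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_POLA_COPIES_PER_UL = 40.0  # WGS threshold stated in the text


@dataclass
class Specimen:
    specimen_id: str          # hypothetical identifier
    pola_copies_per_ul: float
    rabbit_passaged: bool     # True for rabbit-passaged isolates


def select_for_wgs(specimens: List[Specimen]) -> Optional[Specimen]:
    """Pick one specimen per participant for sequencing.

    Assumed order of operations: keep specimens at or above the polA
    threshold, prefer directly extracted DNA over rabbit-passaged
    isolates, then take the highest polA copy number.
    """
    eligible = [s for s in specimens
                if s.pola_copies_per_ul >= MIN_POLA_COPIES_PER_UL]
    if not eligible:
        return None
    # False sorts before True, so direct extractions come first.
    return min(eligible,
               key=lambda s: (s.rabbit_passaged, -s.pola_copies_per_ul))


if __name__ == "__main__":
    candidates = [
        Specimen("swab", 55.0, False),
        Specimen("rabbit", 900.0, True),
        Specimen("biopsy", 120.0, False),
    ]
    print(select_for_wgs(candidates).specimen_id)
```

With these inputs the directly extracted biopsy is chosen even though the rabbit-passaged isolate has a higher copy number, which mirrors the stated preference for specimens without rabbit passage.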
TPA enrichment and WGS were conducted as previously described, 1,2 with several alterations. In brief, samples with total DNA concentrations <2ng/µL (i.e., <10ng DNA in 50µL) were first subjected to parallel, pooled whole-genome amplification (ppWGA) with at least three replicates using random hexamer primers and phi29 polymerase (Genomiphi v2, Cytiva, Marlborough, MA, USA). Amplified products were pooled and cleaned up using 1.8x AMPure XP beads (Beckman Coulter, Brea, CA, USA). DNA or cleaned-up ppWGA product was then acoustically sheared and enriched for TPA DNA using Sure Select XT Low Input (Agilent Technologies, Santa Clara, CA) or Sure Select XTHS2 custom 120-nucleotide RNA oligonucleotide baits at UNC or SMU, respectively, according to manufacturer instructions. Efforts were made to normalize capture efficiency and sequencing depth of all samples. Samples were pooled for hybrid capture by relative TPA input (TPA:total DNA ratio). Rabbit-passaged samples were pooled separately from directly extracted samples. Capture pools consisted of up to 40 libraries per capture reaction. These pools were combined and sequenced using the MiSeq platform (Illumina, San Diego, CA, USA) at UNC or NovaSeq platform at SMU with paired-end, 150bp reads. Raw sequencing data after removal of residual human reads are available through the Sequence Read Archive (SRA, BioProject PRJNA815321).
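One way to picture the pooling strategy described above is the sketch below. The text states only that libraries were pooled by relative TPA input, that rabbit-passaged samples were pooled separately, and that pools held up to 40 libraries; sorting by ratio and chunking into consecutive groups is an assumption made here for illustration, as are the function name and tuple layout.

```python
from typing import List, Tuple

# (library_id, TPA:total DNA ratio, rabbit_passaged) -- hypothetical layout
Library = Tuple[str, float, bool]


def build_capture_pools(libraries: List[Library],
                        max_pool_size: int = 40) -> List[List[str]]:
    """Group libraries into hybrid-capture pools.

    Directly extracted and rabbit-passaged libraries are never mixed.
    Within each group, libraries are sorted by TPA:total DNA ratio so
    that each pool contains libraries of similar relative TPA input.
    """
    pools: List[List[str]] = []
    for passaged in (False, True):
        group = sorted((lib for lib in libraries if lib[2] == passaged),
                       key=lambda lib: lib[1], reverse=True)
        for start in range(0, len(group), max_pool_size):
            chunk = group[start:start + max_pool_size]
            pools.append([lib[0] for lib in chunk])
    return pools


if __name__ == "__main__":
    direct = [(f"d{i}", float(i), False) for i in range(45)]
    rabbit = [(f"r{i}", 1.0, True) for i in range(3)]
    for pool in build_capture_pools(direct + rabbit):
        print(len(pool), pool[:3])
```

Grouping libraries with similar TPA fractions is intended to keep any one high-titre library from dominating a capture reaction, in line with the stated aim of normalizing capture efficiency and sequencing depth.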

Sequencing, data processing, and phylogenomic analysis
Sequencing data, along with a convenience sample of publicly available data from geographically diverse locations published by Lieberman et al., 3 was processed, aligned, and analyzed using an adaptation of our previously described bioinformatic pipeline, 1 as described at https://github.com/IDEELResearch/Tpallidum_genomics. In brief, adapter sequences were trimmed using trimmomatic (v0.39), 4 and trimmed reads analyzed for contaminants using strainseeker (v1.5.1); 5 aligned to the SS14 (accession CP004011.1) and Nichols (accession CP004010.2) reference genomes using bwa (v0.7.17); 6 filtered using samtools (v1.16), 7 picard (v2.26.11), 8 and custom shell scripts; and variant called using GATK's (v3.8.1) haplotypecaller utility. 9 Downstream analyses were conducted using a consensus genome FASTA file constructed based on single nucleotide variants (SNVs); only loci with ≥3 unique mapping reads were called. Consensus genomes for SS14- and Nichols-like strains were derived from the SS14- and Nichols-reference-based alignments, respectively. Genomes were aligned using MAFFT (v7.490), 10 including well-characterized reference genomes for TPA (Nichols, SS14, Mexico A), T. pallidum subsp. pertenue (TPE, Samoa D), and T. pallidum subsp. endemicum (TEN, Bosnia A). Repetitive and putative recombination regions were identified and masked using Gubbins (v3.2) (see appendix 2 tab 6) before construction of a maximum likelihood (ML) tree using RAxML (v8.2.12), 11,12 performed with 1,000 rapid bootstraps. Trees were visualized and annotated in R (v4.1.2; R Core Team, Vienna, Austria) using the ggtree package (v3.2.1). 13 Clades and subclades were assigned by visual inspection and comparison to those defined by Lieberman et al, 3 and informed by bootstrap support. Bayesian modeling using fastbaps was also used to identify clustering as described below, using maximum likelihood phylogeny results as a prior to inform partitioning for the purposes of annotating the phylogenomic tree (figure 3, appendix 1 figure 3). Maximum-likelihood-informed fastbaps clusters were not called in eight samples (appendix 2 tab 1) due to differences in filters applied during global TPA genomic population structure analysis (see below).
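The rule that only loci with ≥3 unique mapping reads were called when building consensus genomes can be sketched in a few lines of Python. This is not the pipeline in the repository cited above; the function name, the per-position depth list, and the use of "N" for uncalled loci are assumptions made for illustration.

```python
from typing import Dict, List


def consensus_sequence(reference: str,
                       depth: List[int],
                       snvs: Dict[int, str],
                       min_depth: int = 3) -> str:
    """Build a SNV-based consensus against a reference sequence.

    reference : reference bases
    depth     : unique mapping reads at each 0-based position
    snvs      : called alternate base at each 0-based position
    Positions below min_depth are emitted as "N" (assumed convention).
    """
    if len(depth) != len(reference):
        raise ValueError("depth must have one entry per reference base")
    bases = []
    for pos, ref_base in enumerate(reference):
        if depth[pos] < min_depth:
            bases.append("N")  # locus not called
        else:
            bases.append(snvs.get(pos, ref_base))
    return "".join(bases)


if __name__ == "__main__":
    # Position 2 has a variant call but too little coverage, so it is masked.
    print(consensus_sequence("ACGTAC", [5, 3, 2, 10, 0, 4], {1: "T", 2: "A"}))
```

In the example, the SNV at position 1 is applied because depth meets the threshold, while the SNV at position 2 is discarded along with the uncovered position 4.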
Macrolide-resistant strains were identified using a previously described competitive mapping method followed by variant calling for the 23S rRNA A2058G and A2059G variants. 14

Analysis of global TPA genomic population structure and allele frequency differences
A separate analysis pipeline was used to align, call, and analyze these and all publicly available TPA genomes, as described at https://github.com/IDEELResearch/Tpallidum_genomics. Following adapter trimming using trimmomatic (v0.39), we assessed sequences for traces of host genome using bbmap (v38.82) by mapping reads against human (hg19) and rabbit (oryCun2) reference genomes and removing reads with ≥2 hits against the reference genomes. 15 Broken reads were removed using repair.sh, embedded in bbmap. We mapped quality-filtered reads to a modified version of the Nichols reference genome (accession CP004010.2; 23S rRNA, tpr, arp, and tp0470 genes masked using BEDTools v2.30) using bwa (v0.7.17). 16 Sequence alignments were subjected to post-alignment filtering, including removal of duplicate reads, indel realignment, and removal of reads with excessive mismatches, soft and hard clips, chimeric alignments, repetitive alignments, and low mapping quality. Following post-alignment filtering, genomic sequences with ≥3 unique reads mapping across >80% of the genome were retained for analysis. Variant calling was performed using GATK (v3.8.1). 9 To assign SS14 versus Nichols clade membership, a maximum likelihood phylogenetic tree of clinical isolates alongside the Nichols (CP004010.2) and SS14 (CP004011.1) reference genomes was constructed using IQ-tree (v1.6.12), and clades were assigned according to clustering with the corresponding reference genomes. 17

We evaluated TPA population structure using several approaches. First, we performed principal component analysis (PCA) using the smartpca package in EIGENSOFT (v6.1.4), 18 including only biallelic single nucleotide variant (SNV) sites with <20% missingness and >1% allele frequency across the global TPA gene pool. Then two lineage-specific PCAs were constructed using SNVs with within-population allele frequency of >1% and >5% for SS14- and Nichols-lineage strains, respectively. PCAs were visualized and annotated using ggplot2 (v3.3.6) in R. 19 Next, we employed fastbaps (v1.0.8) to identify clades and subclades using a Bayesian approach. 20 Fisher's exact test was used to identify significantly differentiated SNVs between SS14 and Nichols clades using a Bonferroni-corrected p value threshold of <4.451 × 10⁻⁶ to account for multiple comparisons; results were compared to fixation index (F_ST) values calculated using vcftools (v0.1.15) (see appendix 2 tab 5). 21 For all SNVs with >1% within-population allele frequency, we compared allele frequencies by clade and continent using a heatmap generated using ggplot2 in R. We used snpEff (v5.1) for functional annotation of significant SNVs. 22 SNVs were then mapped to 3-dimensional protein models previously predicted by Hawley et al. 23 (TP0515, TP0858, TP0865, TP0966) or by AlphaFold2 (TP0136, TP0179, TP0462) for genes with high-frequency, population-informative mutations, 24 and visualized using Chimera (v1.16). 25
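The per-site differentiation test described above can be illustrated with a minimal sketch. It assumes each SNV site is summarised as a 2x2 table of allele counts by lineage and uses a standard-library implementation of the two-sided Fisher's exact test rather than the study's own pipeline. The function names, the example counts, and the number of tested sites are hypothetical; the study's threshold of 4.451 × 10⁻⁶ is not re-derived here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Rows are lineages (e.g. SS14, Nichols); columns are allele counts
    (alternate, reference) at a single SNV site.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that row 1 carries x alternate alleles.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts: alternate allele in 10/10 SS14 and 0/10 Nichols genomes.
p = fisher_exact_two_sided(10, 0, 0, 10)
threshold = bonferroni_threshold(0.05, 100)  # 100 tested sites is illustrative
is_differentiated = p < threshold
```

Dividing the family-wise error rate by the number of tested sites keeps the chance of any false positive across all sites at or below that rate, which is why the per-site threshold becomes very small when thousands of SNVs are tested.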

Study outcomes and sample size
Primary outcomes of this study included the number and frequency of unique TPA genomic sequences distributed geographically and associated individual-level characteristics. Secondary outcomes included the predominant clades, subclades, and alleles identified in our study population and relative to the TPA global population structure. Target sample sizes at each site were informed by rarefaction curves, which are classically used in ecological studies to depict observed species richness at different sample sizes but are also used to assess allelic diversity in genomic analyses. 26 Based on estimates of the genetic diversity of a subset of TPA outer membrane protein (OMP) genes available at the time of study launch, 27 we estimated that sequences of 30–50 TPA strains from each clinical site (target total sample size of 120–200 for four sites) were required to adequately sample each site's TPA OMP genetic diversity with 95% confidence.
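As an illustration of how a rarefaction curve relates sample size to observed diversity, the sketch below computes the classical expected richness for subsamples drawn without replacement. This is a generic textbook formulation, not necessarily the calculation used to set the study's targets; the function names and the allele counts are hypothetical.

```python
from math import comb
from typing import List, Sequence


def expected_richness(type_counts: Sequence[int], subsample_size: int) -> float:
    """Expected number of distinct types in a random subsample (no replacement)."""
    total = sum(type_counts)
    if not 0 <= subsample_size <= total:
        raise ValueError("subsample_size must be between 0 and the total count")
    all_draws = comb(total, subsample_size)
    # A type is missed only when every draw comes from the other types.
    return sum(
        1 - comb(total - count, subsample_size) / all_draws
        for count in type_counts
    )


def rarefaction_curve(type_counts: Sequence[int]) -> List[float]:
    """Expected richness at every subsample size from 0 to the full sample."""
    total = sum(type_counts)
    return [expected_richness(type_counts, n) for n in range(total + 1)]


# Hypothetical data: three alleles observed 5, 3, and 2 times among 10 strains.
curve = rarefaction_curve([5, 3, 2])
```

A curve that flattens before the full sample size is reached suggests that additional sequences would add few new variants, which is the intuition behind using rarefaction to choose a target number of strains per site.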

Characteristics of additional participants included in the analyses
There were 43 DFM-negative participants from Malawi who provided WGS data; their median age was 32 years (IQR 23–38), and 22 (51%) were male. Five of 31 (16%) with known HIV status were living with HIV. Among the 10 participants from Cali, Colombia, the median age was 28 years (IQR 24–33). Six were male, and three reported being gay, bisexual, or other in sexual orientation. All of the participants had laboratory-confirmed SS, and three participants were living with HIV.

SUPPLEMENTARY TABLES
The following tables are provided as an associated .xlsx file to improve readability.
Table 7: Allele frequency of highly differentiated SNVs by population.

Figure 1. Enrollment and Study Procedures

Figure 1: Participant characteristics by TPA lineage across clinical sites.

Figure 2: Principal components analysis illustrating the relationship between TPA strains sequenced as part of this study (n=166) and global TPA genomic diversity (n=1,413). Points are not offset; thus, samples with identical SNVs overlap.

Figure 3: Comparison of TPA population assignments using baps Bayesian modeling with different partitioning strategies. The maximum-likelihood (ML) whole-genome phylogeny, including samples derived from 166 individuals in this study, 62 recently published genomes, and 5 reference genome samples included in figure 3, is shown at left. Results of different baps partitioning methods are shown, including one using ML whole-genome phylogeny results as a prior for partitioning (fastbaps_iqtree; annotated in figure 3) and four using internal partitioning methods. The "optimized symmetric" (opt.symmetric) approach was used during analysis of global TPA genetic population structure (figures 4-5, appendix 2 tabs).

Figure 4: Allele frequencies of lineage-informative, highly differentiated (Fisher's exact p < 4.451 × 10⁻⁶) SNVs identified during analysis by lineage and geographical region, excluding tpr family, tp0470, and arp genes, with gene annotation by SnpEff. Fixed SNVs are not shown. SNV coordinates reference the Nichols genome.
b) Study subject has untreated secondary syphilis characterized by clinical findings as described above AND initial serologic confirmation defined by either a positive non-treponemal antibody test (i.e., RPR) titers >1:8 dilutions and/or a positive rapid treponemal antibody test.
c) Suspected early latent syphilis with no reported symptoms but documented seroconversion of syphilis serologies within the previous 12 months or documented sexual exposure to an individual known to have early syphilis diagnosed within the last 12 months; with reactive non-treponemal (i.e., RPR) titers >1:8 dilutions and treponemal antibody tests;
2. Willingness to provide informed consent for additional study procedures;
3. Willingness to undergo ulcer swabs or skin biopsies for T. pallidum DNA extraction and genomics (for

1. Age 18 years of age or older;
2. Any of the following presentations:
a) Suspected primary syphilis characterized by one or more painless ulcerative lesions (e.g., chancre); with or without a reactive non-treponemal or rapid treponemal antibody test.
b) Suspected secondary syphilis with characteristic mucocutaneous lesions (e.g., macular, maculopapular, papular, or pustular cutaneous lesions, mucous patches and/or condyloma lata), with or without generalized lymphadenopathy AND with reactive non-treponemal AND treponemal antibody tests.
c) Suspected early latent syphilis with no reported symptoms but documented seroconversion of syphilis serologies within the previous 12 months or documented sexual exposure to an individual known to have early syphilis diagnosed within the last 12 months; with reactive non-treponemal (i.e., RPR) titers >1:8 dilutions and

18 years of age presents for routine care with a characteristic genital lesion or rash suggestive of early syphilis, or a newly reactive RPR (>1:8 titer) or positive rapid treponemal syphilis test, or recent exposure to a sexual partner diagnosed with syphilis within last 12 months.
2) Health care institution staff identify and refer patient to Clinical Research Unit at
Patient has untreated primary syphilis based on clinical findings AND confirmation by darkfield microscopy, OR
b. Patient has untreated secondary syphilis based on clinical findings AND initial confirmation by stat non-treponemal antibody test and/or rapid syphilis test (if no prior history of syphilis) OR
c. Patient has untreated early latent syphilis based on documented seroconversion of syphilis serologies within past 12 months, or sexual exposure to an individual diagnosed with early syphilis within last 12 months
3. Patient has a hemoglobin level ≥ 10.5 (for Lilongwe only);

If sexual partner is male:
21. In an average month during which you were sexually active with this partner in the past year, how many times per month did you have receptive vaginal intercourse with this partner (your partner inserted his penis in your vagina)?
In an average month during which you were sexually active with this partner in the past year, how many times per month did you have receptive anal intercourse with this partner (your partner inserted his penis in your butt or anus)? 99 = Decline to Answer. If 0, skip Q22a.
22a. Of those times that you had receptive anal intercourse, how many times did this partner put his penis in your rectum (butt or anus) without a condom? 99 = Decline to Answer
Error message if Q22a > Q22: The number of times without a condom cannot be greater than the total number of times reported for receptive anal intercourse.
6. Patient has not received antibiotics active against syphilis in the last 30 days (all sites: penicillin, azithromycin, doxycycline or syndromic GUD treatment)
7. Patient understands the purpose of the study or the nature of their participation (all sites);
Study Eligibility Determination: (all sites)
8. Is the patient eligible to participate? (responded "yes" to all inclusion criteria above):
9. Patient has provided informed consent for screening procedures

); however, the two participants with ELS had very low qPCR copy numbers from whole blood compared to other sample types from participants with PS or SS. Overall, there were 277 samples with positive PCR results, including 108 (83%) lesion swabs, 92 (85%) skin biopsies, seven (37%) skin scrapings, and 70 (51%) whole blood specimens (appendix 2 tab 2). Darkfield genital ulcer swabs from Malawi had geometric mean qPCR values of 1016 copies/µl (range 12–10,438), compared to 39 (range 1–862) and 194 (range 11–8,835) copies/µl from the China and Colombia sites, respectively (appendix 2 tab 2). SS lesion swabs from Malawi had the highest mean values, at 1858 (range 115–82,147) and 2830 (range 235–20,143) copies/µl for DFM-positive and DFM-negative samples, respectively. TPA qPCR copy numbers in blood from SS participants with non-treponemal titers <1:32 versus >1:32 were similar.

Table 1: Genomes and metadata for samples included in this study, comprising both new and publicly available genomes. 23S rRNA mutations associated with TPA macrolide resistance calls for samples in this study are depicted in the A2058G and A2059G columns. The convenience sample of published genomes included in figure 3 is highlighted in the "Convenience_sampling_phylogeny" column.

Table 2: Qualitative and quantitative T. pallidum polA PCR results based on stage of syphilis, specimen type and site of enrollment.

Table 3: Frequency of TPA populations sampled in this study by country.

Table 4: Fixed SNVs identified during comparison of Nichols- (reference sequence) and SS14-lineage TPA strains, including functional annotation.

Table 6: Putative recombination regions masked by Gubbins during phylogenomic analysis (see figure 3).