Comparative genomics between Trichomonas tenax and Trichomonas vaginalis: CAZymes and candidate virulence factors

Introduction: The oral trichomonad Trichomonas tenax is increasingly appreciated as a likely contributor to periodontitis, a chronic inflammatory disease induced by dysbiotic microbiota, in humans and domestic animals and is strongly associated with its worst prognosis. Our current understanding of the molecular basis of T. tenax interactions with host cells and the microbiota of the oral cavity is still rather limited. One laboratory strain of T. tenax (Hs-4:NIH/ATCC 30207) can be grown axenically and two draft genome assemblies have been published for that strain, although the structural and functional annotation of these genomes is not available.

Methods: GenSAS and Galaxy were used to annotate two publicly available draft genomes for T. tenax, with a focus on protein-coding genes. A custom pipeline was used to annotate the CAZymes for T. tenax and the human sexually transmitted parasite Trichomonas vaginalis, the best-characterized trichomonad. A combination of bioinformatics analyses was used to screen for homologs of T. vaginalis virulence and colonization factors within the T. tenax annotated proteins.

Results: Our annotation of the two T. tenax draft genome sequences and their comparison with T. vaginalis proteins provide evidence for several candidate virulence factors. These include candidate surface proteins, secreted proteins and enzymes mediating potential interactions with host cells and/or members of the oral microbiota. The CAZymes annotation identified a broad range of glycoside hydrolase (GH) families, with the majority of these being shared between the two Trichomonas species.

Discussion: The presence of candidate T. tenax virulence genes supports the hypothesis that this species is associated with periodontitis through direct and indirect mechanisms. Notably, several GH proteins could represent potential new virulence factors for both Trichomonas species. These data support a model where T. tenax interactions with host cells and members of the oral microbiota could synergistically contribute to the damaging inflammation characteristic of periodontitis, supporting a causal link between T. tenax and periodontitis.


KEYWORDS: protein-coding genes, surface proteins, secreted proteins, glycoside/glycosyl hydrolases (GHs), peptidases/proteases, exosomes, pore-forming proteins

Introduction
The trichomonad Trichomonas tenax (Phylum Parabasalia, Class Trichomonadea; Cepicka et al., 2010) is an obligate symbiont of the oral cavity reported to have a potential role in periodontitis (Marty et al., 2017; Santi-Rocca, 2020; Eslahi et al., 2021; Martin-Garcia et al., 2022). However, T. tenax is one of the least well-understood microbial species of the complex oral microbial ecosystem, and very little is known about how it might modulate human oral disease (Baker et al., 2024). Like the closely related sexually transmitted parasite Trichomonas vaginalis, which is human-specific, T. tenax was commonly considered to be specifically associated with humans (Honigberg, 1990; Maritz et al., 2014). However, sensitive molecular diagnostic tools have clearly demonstrated that T. tenax is common in the oral cavity of a broader range of hosts, including cats, dogs and horses, with some additional reports among birds (Kellerová and Tachezy, 2017; Matthew et al., 2023). There are also reports of the isolation of T. tenax from extra-oral locations including the respiratory tract, lymph nodes, salivary glands and the urogenital tract (Maritz et al., 2014; Brosh-Nissimov et al., 2019). Studies have reported an increased prevalence of T. tenax among patients with periodontitis compared to healthy controls (Marty et al., 2017; Eslahi et al., 2021; Martin-Garcia et al., 2022). However, the ability of T. tenax to contribute, directly or indirectly, to the pathobiology of periodontitis is still poorly understood. Indeed, there is a lack of data on the molecular and cellular basis of T. tenax interactions with host cells (Matthew et al., 2023), members of the microbiota, and its potential endosymbionts, three key factors considered important in modulating the pathobiology of T. vaginalis (Hirt et al., 2011; Mercer and Johnson, 2018; Riestra et al., 2022; Margarita et al., 2023).
Trichomonad genomes typically exceed 50 Mb, making them unusually large compared to other parasitic microbial eukaryotes (Zubáčová et al., 2008). The published draft genome of T. vaginalis (G3) is consistent with these estimates, with a size of ∼160 Mb (Carlton et al., 2007). The first draft whole genome sequence for T. tenax (strain Hs-4:NIH, ATCC 30207; Manassas, VA, USA) was published in 2019 (∼47 Mb, across 4,161 contigs) (Benabdelkader et al., 2019). Benabdelkader et al. (2019) conducted this study to investigate the prevalence and genetic diversity of T. tenax and its potential involvement in the severity of periodontitis among humans. A second draft whole genome sequence dataset for the same strain of T. tenax was published in 2022 (63.4 Mb, across 4,906 contigs) (Yang et al., 2022). Both draft genomes are smaller than the 133 ± 4 Mb estimated from flow cytometry (Zubáčová et al., 2008). The two published T. tenax draft genomes did not include annotation of protein-coding genes, which is essential to guide the study of the molecular cell biology of a given organism.

Here, we investigated T. tenax protein-coding genes by integrating and annotating the two published draft genomes (Benabdelkader et al., 2019; Yang et al., 2022). The predicted T. tenax protein-coding genes and their functional annotations were compared with the predicted proteome of a new assembly of T. vaginalis. We also annotated the CAZymes (Lombard et al., 2014; Wardman et al., 2022) from T. tenax and T. vaginalis. We focused our analyses on testing the hypothesis that T. tenax encodes a range of virulence and colonization factors that contribute directly and indirectly to the pathobiology of periodontitis through interaction with various host cells and members of the oral microbiota, as known for T. vaginalis (Hirt, 2013; Mercer and Johnson, 2018; Riestra et al., 2022). Indeed, only a handful of publications have investigated the potential virulence of T. tenax on host cells and the virulence factors that could be associated with the observed impact on host cells (reviewed in Matthew et al., 2023). Similarly, we are aware of only one study that has considered T. tenax genes encoding candidate proteins mediating interactions with members of the microbiota, i.e., the T. vaginalis NlpC/P60 peptidases that target bacterial cell walls and are conserved in Trichomonas gallinae (Barnett et al., 2023). Here we took advantage of the rich annotation available for T. vaginalis virulence factors (Carlton et al., 2007; Hirt et al., 2011; Hirt, 2013; Mercer and Johnson, 2018; Riestra et al., 2022) to expand the bioinformatic characterization of T. tenax candidate virulence factors. These could mediate binding of T. tenax to human epithelial cells, immunocytes or the microbiota in the oral cavity, or could degrade host structural proteins or those involved in innate and adaptive immunity, and by doing so could contribute to the pathobiology of periodontitis.

Genome annotation
GenSAS (Humann et al., 2019) was used to annotate the two draft genome sequences of T. tenax strain Hs-4:NIH (NCBI Genome assemblies PRJEB22701 and ASM2309173v1) (Benabdelkader et al., 2019; Yang et al., 2022). Repeats were identified and masked using RepeatMasker (Nishimura, 2000) and RepeatModeler (Smit and Hubley, 2008) on GenSAS. Multiple tools integrated into GenSAS were employed for the ab initio prediction of protein-coding genes, including Augustus (Stanke et al., 2004), GeneMark-ES (Lomsadze et al., 2005), GlimmerM (Salzberg et al., 1999), GENSCAN (Burge and Karlin, 1997) and SNAP (Korf, 2004). For homology-based gene predictions, protein and transcript sequences from other trichomonads, including T. vaginalis (Carlton et al., 2007), and other relevant eukaryotic species were downloaded from NCBI and aligned to the genomes using BLAST (Altschul et al., 1990), BLAT (Kent, 2002), PASA (Haas et al., 2003) and DIAMOND (Buchfink et al., 2021) with default parameters. A published T. tenax RNA-Seq dataset (Handrich et al., 2019) (NCBI SRA accession SRX2052871) was used as evidence for expression of protein-coding genes. The RNA-Seq reads were aligned to the repeat-masked genome assembly using HISAT2 (Kim et al., 2015) and TopHat2 (Kim et al., 2013) with default parameters. EvidenceModeler (Haas et al., 2008) was used to create the consensus gene set by incorporating the outputs of all ab initio and alignment-based gene predictions. The official gene set (OGS), defined as the set of most trusted gene predictions, was generated by PASA refinement (Haas et al., 2003). In this study, all outputs generated by the gene prediction tools were integrated to generate a consensus annotation.

Protein functional annotation
For protein functional annotation, tools integrated into GenSAS were used, including DIAMOND (Buchfink et al., 2021) and InterProScan (Jones et al., 2014). These analyses were complemented with tools integrated into Galaxy (Afgan et al., 2016), including BLASTP (Camacho et al., 2009), eggNOG-mapper (Huerta-Cepas et al., 2016), Motif Search (Kanehisa, 2002), PANNZER2 (Törönen et al., 2018) and GhostKOALA (Kanehisa et al., 2016). The predicted proteins were used to query the NCBI non-redundant (nr) refseq_protozoa protein database (released on 9/12/2021) using BLASTP, with an E-value cutoff of 1 × 10⁻⁸. That version of the database contains 1,095,419 protein sequences from different protozoans, including T. vaginalis. The inferred protein sequences of T. tenax were also used to query the SWISS-PROT and TrEMBL databases using DIAMOND to further improve the accuracy of function assignment. Gene names were assigned by recording BLASTP top hits and PANNZER2 predictions (Törönen et al., 2018). Additional functional domains were predicted by eggNOG-mapper (Huerta-Cepas et al., 2017), using default settings with an E-value threshold of 1 × 10⁻³. Both InterProScan and eggNOG-mapper provided additional evidence for existing gene annotations, including database cross-references (Dbxref) and gene ontology (GO) terms. We also used SMART (Letunic and Bork, 2018) to investigate the domain organization of specific proteins of interest. To further enrich the protein functional annotation, GhostKOALA (Kanehisa et al., 2016) was used to assign K numbers to the annotations by GHOSTX search against a non-redundant set of KEGG GENES. GhostKOALA is a web-based server that performs automatic annotation of genomes with Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (Kanehisa et al., 2016). Protein localization was inferred using Phobius (Käll et al., 2007), SignalP (Petersen et al., 2011) and, for selected entries, DeepTMHMM (Hallgren et al., 2022). Data produced by the different software were mined and an annotation summary was generated manually.
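The BLASTP step described above (an E-value cutoff followed by recording the top hit per query) can be illustrated with a minimal Python sketch. It assumes the hits were exported in the standard 12-column BLAST tabular layout (`-outfmt 6`); the function name `top_hits` and all sequence identifiers are hypothetical, and this is not the pipeline used in the study.

```python
import csv
import io


def top_hits(blast_tsv, max_evalue=1e-8):
    """Return {query_id: (subject_id, evalue, bitscore)} keeping, for each
    query, the highest bit-score hit whose E-value is <= max_evalue.

    Assumes 12-column BLAST tabular output (outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore
    """
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit does not pass the E-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical identifiers and values, for illustration only.
example = io.StringIO(
    "Tt_0001\tTVAG_000001\t62.5\t200\t75\t0\t1\t200\t1\t200\t1e-50\t250\n"
    "Tt_0001\tTVAG_000002\t40.0\t180\t108\t0\t1\t180\t5\t184\t1e-20\t120\n"
    "Tt_0002\tTVAG_000003\t30.0\t90\t63\t0\t1\t90\t1\t90\t1e-5\t45\n"
)
hits = top_hits(example)
print(hits)
```

In this toy input, the second query has no hit passing the 1 × 10⁻⁸ cutoff and is therefore left without a name assignment from this step.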
In addition to the published 2007 annotation of the T. vaginalis G3 draft genome (Carlton et al., 2007), we leveraged updated annotation from a new T. vaginalis G3 assembly featuring six chromosome-length scaffolds derived from long-read sequencing. The new genome sequence data and corresponding assembly and annotation are all available at the NCBI [NCBI: Name, NYU_TvagG3_2, BioProject PRJNA16084, NCBI RefSeq assembly GCF_026262505.1; also TrichDB (Aurrecoechea et al., 2009): release 65, 14/09/2023]. These updated data are referred to here as the G3_2022 data. Supplementary Table 1 provides a lookup table between the G3_2007 and G3_2022 annotations, providing continuity with the previous assembly by including, where possible, the TrichDB IDs (i.e., TVAG_XXXXXX, TVAG_RG_XXXXX).

CAZy annotation
The general protein functional annotations from the T. tenax (this study) and T. vaginalis (Carlton et al., 2007) genomes were enriched by submitting the predicted protein sequences to the CAZy annotation pipeline. The data reported in this study have a specific focus on glycoside hydrolases (GHs), key Carbohydrate Active enZymes (CAZymes), and on carbohydrate-binding modules (CBMs), as these are more likely to represent potential virulence factors. Briefly, automated processing was performed, followed by manual curation of borderline cases and fragments, as described for hundreds of annotated genomes and used for updating the CAZy database (Lombard et al., 2014). Predictions of enzyme activities for Trichomonas spp. GHs, based on significant similarities with entries from a library of experimentally determined enzymatic activities of CAZymes, are also reported (Drula et al., 2022).

Estimates of completeness
Genome annotation quality and the completeness of gene annotation were estimated using the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool (Manni et al., 2021). The annotated proteins of T. tenax were searched against selected near-universal single-copy orthologs from the eukaryotic database (eukaryota_odb10 v5.2.2) of the hierarchical catalog of orthologs (OrthoDB v10) (Kriventseva et al., 2007). BUSCO classifies orthologs that are expected to be highly conserved among related species as "complete, single copy", "complete, duplicated", "fragmented" or "missing" in the query proteome. The output of this evaluation is presented as percentages of the total 255 BUSCOs in the eukaryota_odb10 lineage from the OrthoDB database.
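As a worked illustration of how the BUSCO categories translate into percentages of the 255 eukaryota_odb10 orthologs, the following Python sketch may help. The function `busco_summary` and the counts passed to it are hypothetical placeholders, not the values obtained in this study.

```python
def busco_summary(single, duplicated, fragmented, missing, total=255):
    """Convert BUSCO category counts into percentages of the lineage total.

    'complete' is the sum of the single-copy and duplicated categories.
    """
    if single + duplicated + fragmented + missing != total:
        raise ValueError("category counts must sum to the lineage total")

    def pct(n):
        return round(100.0 * n / total, 1)

    return {
        "complete": pct(single + duplicated),
        "single_copy": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
    }


# Hypothetical counts for illustration; they sum to the 255 BUSCOs of the lineage.
summary = busco_summary(single=100, duplicated=29, fragmented=40, missing=86)
print(summary)
```

Reporting the single-copy and duplicated fractions separately can be informative for genomes with many expanded gene families, since duplicated BUSCOs still count toward completeness.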

Gene family analyses
We employed OrthoVenn3 (using the OrthoFinder algorithm) (Emms and Kelly, 2015; Sun et al., 2023) to identify and compare orthologs in the protein sequence sets from T. tenax (integrated annotated proteins) and T. vaginalis. OrthoFinder utilizes DIAMOND for identifying sequence similarity and DendroBLAST (Kelly and Maini, 2013) for gene tree inference.
We also performed a gene network analysis using EGN (Halary et al., 2013) to predict homologous gene families using the full list of annotated proteins from the genomes of T. vaginalis G3_2022 (GenBank accession GCA_026262505.1) and T. tenax (this study). The results of an all-vs-all BLASTP search were clustered into networks, with edges linking gene nodes according to BLAST hits. Thresholds for recording significant BLAST hits were an E-value ≤ 1 × 10⁻¹⁰, a percentage identity >25% and an alignment length covering ≥90% of the length of both the query and the subject sequence. A selection of gene networks for complex gene families of interest was visualized. Figures were generated in Cytoscape using the yFiles organic layout algorithm (Shannon et al., 2003).
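The thresholds above define which BLAST hits become edges in the gene network; candidate gene families then correspond to connected components. The Python sketch below illustrates that logic with a small union-find. It is not EGN itself: the function names, the tuple layout of a hit and the protein identifiers are assumptions, and the BLAST alignment length is used as a simple proxy for coverage.

```python
from collections import defaultdict


def significant(hit, lengths, max_evalue=1e-10, min_identity=25.0, min_cov=0.90):
    """True if a hit passes the E-value, identity and two-sided coverage thresholds.

    hit = (query, subject, percent_identity, alignment_length, evalue)
    """
    query, subject, pident, aln_len, evalue = hit
    if query == subject:
        return False  # ignore self-hits
    covers_both = (aln_len >= min_cov * lengths[query]
                   and aln_len >= min_cov * lengths[subject])
    return evalue <= max_evalue and pident > min_identity and covers_both


def gene_families(hits, lengths):
    """Cluster proteins into connected components linked by significant hits."""
    parent = {protein: protein for protein in lengths}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for hit in hits:
        if significant(hit, lengths):
            parent[find(hit[0])] = find(hit[1])

    families = defaultdict(set)
    for protein in lengths:
        families[find(protein)].add(protein)
    return sorted((sorted(m) for m in families.values()),
                  key=lambda m: (-len(m), m))


# Hypothetical protein lengths (amino acids) and BLASTP hits, for illustration only.
lengths = {"Tt_A": 500, "Tv_A": 510, "Tv_B": 505, "Tt_C": 300}
hits = [
    ("Tt_A", "Tv_A", 40.0, 480, 1e-60),
    ("Tv_A", "Tv_B", 35.0, 470, 1e-40),
    ("Tt_C", "Tv_A", 50.0, 150, 1e-30),  # fails the 90% coverage requirement
]
families = gene_families(hits, lengths)
print(families)
```

Requiring coverage of both sequences means that proteins sharing only a single domain are not linked, which is why entries placed in the same network tend to have similar length and overall organization.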

Gene predictions, functional annotation and completeness of T. tenax annotations
An overview of the annotation of the two T. tenax Hs-4:NIH draft genome sequences and their level of completeness is provided in Table 1. Supplementary Data 1 and 2 are the GFF3 files of the annotation of the T. tenax 2019 and 2022 draft genomes, respectively. Integrating the two annotations predicted a total of 20,286 distinct protein-coding genes for the T. tenax strain Hs-4:NIH. A FASTA file with the 20,286 protein sequences is available as Supplementary Data 3. The BUSCO completeness score of 50.6% for the T. tenax integrated annotation (20,286 proteins) is similar to that of T. vaginalis (2022 annotation), which has a completeness score of 53%. Here we describe the functional annotation of a selection of T. tenax proteins, with a focus on CAZymes and some of the best-characterized candidate virulence factors, taking advantage of the more extensive annotation of T. vaginalis proteins (Carlton et al., 2007, 2010; Hirt et al., 2011; Riestra et al., 2022). Supplementary Table 2 lists the 20,286 T. tenax (strain Hs-4:NIH) proteins with their functional annotations and inferred protein sequences, derived from the integrated annotation of the 2019 and 2022 draft genome sequence data.

CAZy annotation for T. tenax and T. vaginalis
A specific CAZy annotation was performed for both T. tenax (annotation from this study) and T. vaginalis (G3_2007 annotation) (Supplementary Table 3). Only 21 of the 287 T. vaginalis CAZy-annotated entries (7%) were deprecated, or not yet vetted, in the most recent assembly and corresponding annotation (G3_2022) (Supplementary Tables 1, 3). The majority of Trichomonas spp. CAZy entries correspond to CAZymes of the GH class, with 126 TtGHs and 181 TvGHs. There are 86 T. tenax glycosyltransferases (TtGT) and 91 T. vaginalis glycosyltransferases (TvGT) (Supplementary Table 3). There are far fewer polysaccharide lyases (PLs), with three TtPL and 14 TvPL (Supplementary Table 3). Three T. tenax proteins are predicted to contain a CBM domain only, without an identified CAZyme domain (two entries with one CBM20, and one entry with one CBM32). Similarly, there are only three proteins with a single CBM13 domain and no identified CAZyme domain in T. vaginalis (Supplementary Table 3). Of the 189 GH families currently recognized in CAZy (http://www.cazy.org/; April 25th, 2024) (named GH1 to GH189), a total of 26 families were identified in at least one of the two species (Table 2, Supplementary Table 3). Some of the Trichomonas spp. GH proteins have evidence of a signal peptide (SP) and/or one or more transmembrane domains (TMDs), suggesting that these CAZymes could act on glycans with extracellular locations or within an organelle (Table 2). Among the GH13 proteins (14 in T. tenax and 24 in T. vaginalis), two in each species contain a GH133 domain (Table 2, Supplementary Table 3). Seven families have 10 or more entries in at least one Trichomonas species: GH13, GH16, GH30, GH31, GH47, GH77 and GH163 (Table 2). Table 3 lists selected features for the GH families that have the largest number of entries, or that include activities potentially targeting mucins (mucinases) (Labourel et al., 2023) or other host glycoproteins, and that are encoded by both Trichomonas species. Two families, GH5 and GH32, are specific to T. vaginalis, with one entry for each family, and these are confirmed in the G3_2022 annotation (Table 2, Supplementary Table 3). Table 4 lists selected characteristics for these two T. vaginalis-specific GH families.
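The shared versus species-specific GH families reported above amount to a set comparison between the two annotations. A minimal Python sketch is given below; the function `compare_families` is hypothetical, and the two sets contain only a small subset of families named in the text, not the full content of Table 2.

```python
def family_number(name):
    """Numeric part of a CAZy family name, e.g. 'GH13' -> 13 (for sorting)."""
    return int(name[2:])


def compare_families(species_a, species_b):
    """Return (shared, a_only, b_only) family lists, sorted by family number."""
    def ordered(families):
        return sorted(families, key=family_number)

    return (ordered(species_a & species_b),
            ordered(species_a - species_b),
            ordered(species_b - species_a))


# Illustrative subset only: families described in the text as present in both
# species, plus the two families reported as specific to T. vaginalis.
tt_families = {"GH13", "GH31", "GH47", "GH77"}
tv_families = {"GH13", "GH31", "GH47", "GH77", "GH5", "GH32"}

shared, tt_only, tv_only = compare_families(tt_families, tv_families)
print("shared:", shared)
print("T. tenax only:", tt_only)
print("T. vaginalis only:", tv_only)
```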
Identification of candidate virulence factors in T. tenax through comparisons with T. vaginalis

Peptidases
Similar to the 469 predicted peptidases in T. vaginalis (Carlton et al., 2007), a large number of candidate peptidases (476) were identified in T. tenax (Supplementary Table 4). These include 240 cysteine peptidases, 141 metallopeptidases, 85 serine peptidases and seven aspartic peptidases (Supplementary Table 4). A selection of T. tenax peptidase families, including some of those with the largest number of homologs to T. vaginalis factors that are experimentally demonstrated to contribute to host damage or are candidate virulence factors, is listed in Table 5. A number of these peptidases represent candidate secreted or cell surface proteins based on the presence of inferred signal peptides and/or TMDs, consistent with a potential role in targeting extracellular or phagocytosed proteins from host or microbiota cells (including peptidoglycans) (Supplementary Table 4, Table 5). The T. tenax genome encodes a large number (100 members) of family C19 ubiquitin hydrolases, similar to T. vaginalis (117 members) (Carlton et al., 2007) (Supplementary Table 4). In T. vaginalis, >50% of the C19 ubiquitin hydrolases are likely to be functional (Hirt et al., 2011). This large number of C19 peptidases, which are known to be functionally associated with the 20S proteasome (ubiquitin C-terminal hydrolases) in model systems (Barrett and Rawlings, 2001), highlights the importance of cytosolic protein degradation for these two species.

Surface proteins as adhesins
A large number of genes were annotated in T. vaginalis as encoding candidate surface proteins mediating key interactions with its environment, including host epithelial cells (adhesins) and members of the microbiota (Carlton et al., 2007; Hirt et al., 2011; Riestra et al., 2022). Several of these proteins have been experimentally demonstrated in T. vaginalis to be expressed on the cell surface and to mediate binding to host cells (Hirt et al., 2011; Riestra et al., 2022). We focus here on some specific examples which indicate that T. tenax has a similar repertoire of candidate factors for binding to host cells, and potentially to members of the oral microbiota. These include 244 TtBspA-like (Supplementary Table 5) and 24 TtPmp-like (Supplementary Table 6) proteins (Figure 1). Two TtBspA-like proteins contain a peptidase domain: one an M8 peptidase and the other an NlpC/P60 peptidase (Figure 2). There is also evidence for at least 14 T. tenax BAP-like proteins related to TVAG_244130 (de Miguel et al., 2010), with extracellular domains that are shared with bacterial proteins (Table 6). Two of the TtBAP-like proteins have a similar structural organization to eight TvBAP-like proteins (>25% identity level, similar protein length and shared TMD-CT), including the TvBAP-like proteins TVAG_166850 and TVAG_244130 (Figure 3).

Table 6 footnote: The bit scores, E-values and number of hits shown are derived from BLASTP searches (E-value ≤ 1 × 10⁻¹⁰) at the NCBI BLAST server using the T. vaginalis BAP-like protein TVAG_244130 sequence as query against (i) the nr database or (ii) the 20,286 annotated proteins from T. tenax Hs-4:NIH. Only the features of the top hits are listed for each taxon. Entries are ranked from top to bottom of the table according to bit score, with the highest value at the top.

Extracellular vesicles
As carefully reviewed by Riestra et al. (2022), there is published evidence, including electron microscopy, proteomics surveys and cell biology, demonstrating that T. vaginalis can secrete two types of extracellular vesicles (EVs), exosomes and microvesicles (MVs), and that these modulate several important aspects of T. vaginalis-host cell interactions (Twu et al., 2013; Rai and Johnson, 2019; Artuyants et al., 2020; Molgora et al., 2023; Salas et al., 2023). We identified T. tenax homologs for the Endosomal Sorting Complex Required for Transport (ESCRT) machinery made of four complexes (complexes I to IV) (Leung et al., 2008), which are involved in exosome formation in multivesicular bodies (MVBs) and secretion in many eukaryotes, strongly suggesting that T. tenax also possesses MVBs and could secrete exosomes (Supplementary Table 8).

Discussion

CAZymes
CAZymes, including GHs, are of great interest as their glycan substrates are widely distributed across all cellular life forms as well as viruses. Glycans mediate a myriad of biological functions, including handling of carbon reserves, structural roles, or mediating intra- and intercellular interactions between organisms or between organisms and viruses. CAZymes, and GHs in particular, but also CBM-containing proteins, include virulence factors of numerous cellular and viral pathogens (Garron and Henrissat, 2019; Wardman et al., 2022). A diverse array of GH families was identified in the two Trichomonas species and the majority of these are shared between them, suggesting that these play key roles in the biology of T. tenax in the oral cavity and of T. vaginalis in the urogenital tract. A number of families are characterized by relatively large membership, with more than 10 members in one or both species. Three families, GH13, GH31 and GH47, stand out as the largest families present in both species (ranging from 14 to 27 entries), further emphasizing their potential importance in the mucosal lifestyle of the respective species. Notably, some GH entries represent candidate virulence factors. Seven GH families (GH2, GH16, GH18, GH16, GH29, GH33 and GH163) could include members from either species that target mucins (Labourel et al., 2023) (candidate endo-β-N-acetylglucosaminidases in GH163, candidate fucosidases in GH29, candidate sialidases in GH33, candidate β-galactosidases in GH2), important structural proteins at mucosal surfaces mediating key innate immune functions in host-microbe interactions (Hansson, 2020). T. vaginalis might be able to degrade mucins more extensively as it has candidate O-glycan endo-β-1,4-galactosidases in GH16, while T. tenax does not seem to encode a homolog with the same predicted activity. Mucins are typically targeted by mucosal pathogens as well as some members of the microbiota (Labourel et al., 2023). One or more of these enzymes could also target SIgA, a key immunoglobulin in the oral cavity (Feller et al., 2013) and the urogenital tract (Garcia et al., 2015), an activity demonstrated for a GH163 from a common member of the human gut microbiota (Briliute et al., 2019).
Several GH families could also target the cell walls of members of the microbiota, either bacterial (lysozymes in GH19 and GH25) or fungal (chitinases in GH19), while GH47 members target glycoproteins containing α-1,2-linked mannose residues. Candidate lysozymes could be functional partners of the NlpC/P60 peptidases, which were demonstrated to target peptidoglycans (Barnett et al., 2023), and together contribute to deconstructing bacterial cell walls. These sets of enzymes could be important for the two Trichomonas species to target bacteria for interspecies competition and for the extraction of important nutrients. As fragments of peptidoglycans are strong pro-inflammatory molecules (Irazoki et al., 2019), the products of the combined activities of these enzymes (GHs and peptidases) could also be an important factor contributing to inflammation, such as the damaging inflammation characteristic of both periodontitis (Baker et al., 2024) and trichomoniasis (Mercer and Johnson, 2018).
As most of these GH families are known to mediate multiple distinct enzymatic activities, it will be important to study their exact substrate specificities, expression profile and cell biology to experimentally establish their functional importance for Trichomonas species.

. Candidate surface proteins and exosomes
Several key aspects of host-microbe interactions are mediated by cell surface proteins and EVs, such as exosomes, which modulate cell-cell interactions and contact-dependent cytolysis (e.g., Hirt et al., 2011;Riestra et al., 2022).As for T. vaginalis (Carlton et al., 2007;Hirt et al., 2011;Handrich et al., 2019;Riestra et al., 2022), the annotation of T. tenax proteins identified a large number of candidate surface proteins including members of TtBspA-like, TtPmp-like and TtBAP-like families.Two TtBspA-like proteins possess a peptidase domain (one with an M8 metallopeptidase domain and the other with NlpC/P60 cysteine peptidase domain), a configuration distinct from the single TvBspA-like protein with a peptidase domain (a partial serine peptidase S8/S53-like domain) (Noël et al., 2010).Potential surface proteins also include several candidate peptidases and GHs, as they possess inferred TMDs.There is transcriptional evidence for the expression of 81 TtBspAlike and 11 TtPmp-like encoding genes (Handrich et al., 2019).In T. vaginalis, one or more members of these three families have been experimentally demonstrated to be expressed on the cell surface and to mediate interactions with vaginal epithelial cells (de Miguel et al., 2010;Handrich et al., 2019).Hence some of the identified homologs from these protein families in T. 
tenax could contribute to interactions of this species with epithelial cells of the oral cavity, interactions which are likely key to mediating the longterm colonization of the periodontal pocket, and that could also contribute to cell contact-mediated cytolysis of host cells (Ribeiro et al., 2015;Matthew et al., 2023).Our gene network analyses that used stringent coverage of the pairwise alignment length criteria (≥90% coverage) indicate that a number of BspA-, Pmp and BAPlike entries have similar structural organization, with a number of these sharing similar length and a TMD and cytoplasmic tail (TMD-CT).Visual inspection of the cytoplasmic tail of TtBAP-and TvBAP-like proteins identified conserved TMD-CT shared with some TvBspA-and TvPmp-like proteins (Hirt et al., 2011;Handrich et al., 2019).This indicates that the gene fusion events underlying these shared TMD-CT domains between BspA and Pmp-like coding genes (Hirt et al., 2011) have also involved some members of the BAP-like family (Handrich et al., 2019) and that these fusion events might have occured in a common ancestor to T. tenax and T. vaginalis.This further supports the functional importance of this type of TMD-CT and in the context of various unrelated extracellular domains: (i) BspA-like proteins characterized by a specific type of leucine-rich repeat (TpLRR), (ii) Pmp-like repeats (that include two specific motifs GGA[ILV] and FXXN) and (iii) BAP domain (related to "Cell surface glycoprotein; S-layer").The two TvBAP-like proteins (TVAG_166850, TVAG_244130) were shown to be significantly more highly expressed by more adherent T. vaginalis strains compared to less adherent strains (de Miguel et al., 2010).Overexpression of some members of these three gene families, one TvBspA, two TvPmp (Handrich et al., 2019) and two BAP-like proteins (TVAG_166850, TVAG_244130) (de Miguel et al., 2010) significantly increased T. 
vaginalis binding to vaginal epithelial cells in in vitro binding assays, suggesting that some members of the TtBspA-like, TtPmp-like and TtBAPlike proteins families could contribute to T. tenax binding to oral epithelial cells.The Tv/TtBspA-like proteins could also mediate interactions between the Trichomonas species and members of the microbiota based on their known functions in bacteria-bacteria interactions among oral bacterial pathogens (Sharma, 2010), which could contribute to bacteria phagocytosis, as previously speculated for T. vaginalis (Noël et al., 2010).Notably, the TtBspA-like proteins could also contribute to the inflammatory response of the ./fmicb. .periodontal pocket, as demonstrated for some oral bacterial BspA proteins (Sharma, 2010).Some members of these protein families also contribute to the proteome of T. vaginalis exosomes (Riestra et al., 2022;Ong et al., 2024).Our identification of homologs of subunits of the ESCRT machinery (with homologs identified for all four complexes I to IV), suggests that exosomes are likely produced and secreted by T. tenax.Hence it will be important to affirm the presence and functional characteristic of EVs in T. tenax, including their protein and RNA complement as has been done for T. vaginalis (Riestra et al., 2022;Matthew et al., 2023;Ong et al., 2024).These potentially include members of the BspA-like proteins and peptidases (e.g.some calpain-like cysteine peptidase and metallopeptidases, see next section).These different proteins possibly play multiple roles, including modulating T. tenax binding properties to host cells, and mediating cytolytic capabilities, which could both contribute to the pathobiology of periodontitis.Other proteins identified in T. vaginalis exosomes include membrane protein tetraspanins and CBM20-GH77 enzymes (candidate 4α-glucanotransferase) (Rai and Johnson, 2019).We identified homologs of these proteins in T. 
tenax, including 15 tetraspanins, of which 10 were inferred to possess the canonical four TMDs of tetraspanins (Supplementary Table 2), and five proteins with the CBM20-GH77 domain configuration, with one protein inferred to possess a TMD (Supplementary Table 3). In T. vaginalis, some tetraspanin proteins were also shown to mediate binding of the parasite to itself (Cóceres et al., 2015), thus contributing to swarming, which is thought to have implications for its virulence (Hirt, 2013). Notably, the CBM20 domain from one of the T. vaginalis GH77 exosomal proteins was demonstrated to bind to heparan sulfate from host proteoglycans (Rai and Johnson, 2019). Hence, these different proteins represent primary targets to study T. tenax interactions with oral epithelial cells and with itself. Cell surface proteins and EVs could also play important roles in T. vaginalis and T. tenax interactions with immunocytes, members of the microbiota or extracellular matrix (ECM) proteins.
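The coverage criterion used for the gene network analyses discussed above (≥90% of the pairwise alignment length) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the tabular column layout, the definition of coverage relative to the longer of the two sequences, the e-value cut-off and the sequence identifiers are all assumptions made for illustration only.

```python
import csv
import io
from collections import defaultdict

# Assumed (hypothetical) tab-separated BLASTP layout; the study does not state its format.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qlen", "slen", "evalue"]

def filter_hits(handle, min_cov=0.90, max_evalue=1e-5):
    """Yield (query, subject, percent identity) for hits passing both cut-offs."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        q, s = row["qseqid"], row["sseqid"]
        if q == s:
            continue  # self-hits carry no information for the network
        # Assumption: coverage = alignment length / length of the longer sequence.
        cov = int(row["length"]) / max(int(row["qlen"]), int(row["slen"]))
        if cov >= min_cov and float(row["evalue"]) <= max_evalue:
            yield q, s, float(row["pident"])

def build_network(hits):
    """Undirected graph: nodes are proteins, edges keep the best percent identity."""
    adj = defaultdict(dict)
    for q, s, pident in hits:
        best = max(pident, adj[q].get(s, 0.0))
        adj[q][s] = best
        adj[s][q] = best
    return adj

def connected_components(adj):
    """Return the networks (connected components), largest first."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node].keys() - comp)
        seen |= comp
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)

# Made-up demonstration records (identifiers are not real locus tags).
DEMO = "\n".join([
    "Tt_A\tTv_A\t85.0\t950\t1000\t1000\t1e-100",  # coverage 0.95: kept
    "Tv_A\tTv_B\t60.0\t920\t1000\t1010\t1e-80",   # coverage ~0.91: kept
    "Tt_C\tTv_D\t70.0\t500\t1000\t1000\t1e-50",   # coverage 0.50: dropped
    "Tt_A\tTt_A\t100.0\t1000\t1000\t1000\t0.0",   # self-hit: dropped
])

network = build_network(filter_hits(io.StringIO(DEMO)))
components = connected_components(network)
```

With the demonstration records, only the two alignments that span at least 90% of the longer sequence become edges, so the three proteins they connect form a single network. Measuring coverage against the longer sequence is one way to favor pairs of similar length and overall organization; other definitions (for example, query coverage only) would be less strict.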
Hence, these enzymes are of primary interest to investigate the molecular basis of T. tenax pathobiology. Indeed, initial investigations indicated the secretion of cysteine peptidases and metallopeptidases by T. tenax (reviewed in Matthew et al., 2023). Like T. vaginalis (Carlton et al., 2007; Hirt et al., 2011), T. tenax is endowed with a large number and broad diversity of peptidases. A number of these could be secreted as soluble proteins, secreted as exosome cargo, or expressed on the cell surface or within organelles such as the lysosomes, where they could contribute to the degradation of proteins on the cell surface of the parasite or from phagocytosed microbial or human cells. Several M8 peptidases have an inferred SP and/or one or more TMDs. One member of the M8 family is characterized by a BspA-like LRR-containing domain and is inferred to possess three TMDs and to be exposed to either the cell surface or the lumen of an organelle, a location consistent with the function of the related M8 peptidases, including leishmanolysin, an important virulence factor among Leishmania species (Hirt et al., 2011).
In addition to peptidases potentially targeting host and microbial proteins, some peptidases are also likely to specifically target the peptides of bacterial peptidoglycans. These include the NlpC/P60 peptidases, and initial analyses of the first published draft genome of T. tenax (Benabdelkader et al., 2019) identified six genes encoding bacteria-like TtNlpC/P60 peptidases (Barnett et al., 2023). The genes encoding these NlpC/P60 peptidases were likely acquired through two distinct lateral gene transfer events from distinct Gram-positive bacterial donors into a common ancestor of Trichomonas species, as these genes are found in the genomes of T. vaginalis, T. gallinae (Alrefaei et al., 2019) and T. tenax (Barnett et al., 2023). A total of seven distinct TtNlpC/P60-encoding genes were identified when annotating the more recent draft genome (Yang et al., 2022). The most recently identified TtNlpC/P60 protein is characterized by a distinct structural configuration, which includes two segments of BspA-like LRR, one at each end of the protein. As the LRR of some cell surface BspA-like proteins from oral bacteria mediate bacteria-bacteria interactions (Sharma, 2010), these two LRR segments could target this peptidase to the cell surface of specific bacteria. Six of the seven TtNlpC/P60 proteins have an inferred SP, consistent with the digestion of extracellular targets such as bacterial peptidoglycans. The identification of several candidate GHs potentially targeting the glycans of peptidoglycans (e.g., TtGH25, candidate lysozymes) suggests that the TtNlpC/P60 proteins could be functional partners of these CAZymes to effectively deconstruct peptidoglycans from the bacterial members of the oral and vaginal microbiota. As peptidoglycan fragments derived from the combined activity of peptidases and CAZymes are very strong pro-inflammatory molecules (Irazoki et al., 2019), these enzymes could represent important indirect virulence factors contributing to inducing the damaging inflammation
characteristic of periodontitis (Baker et al., 2024) and trichomoniasis (Mercer and Johnson, 2018; Riestra et al., 2022). In addition, as T. vaginalis and T. tenax are associated with dysbiotic microbiota, the potential preferential targeting by these enzymes of mutualistic bacteria, such as Lactobacillus species (which have anti-inflammatory properties and other important protective functions and are typically depleted or eradicated by T. vaginalis infections), could also contribute to the worsening of the inflammatory response of the oral and vaginal mucosa.
The 10 candidate TtSAPLIP pore-forming proteins are also potentially important virulence factors for T. tenax, as discussed for T. vaginalis (Hirt et al., 2011; Diaz et al., 2020). The functional characterization of TvSAPLIP12 demonstrated that it is haemolytic and cytotoxic toward epithelial cells and the bacterium Escherichia coli (Diaz et al., 2020). The inference of an SP for the majority of the TtSAPLIP proteins is consistent with the hypothesis that these candidate virulence factors are secreted and able to act as pore-forming proteins targeting host cells and/or members of the microbiota, as shown (or suggested) for the TvSAPLIP proteins (Hirt et al., 2011; Diaz et al., 2020; Margarita et al., 2022).

Conclusions
Our detailed integrated annotations of the two T. tenax draft genomes, both derived from the single strain currently available from the American Type Culture Collection (ATCC, strain Hs-4:NIH), further support a model where T. tenax is a pathobiont that directly and/or indirectly contributes to periodontitis (Ribeiro et al., 2015; Marty et al., 2017; Benabdelkader et al., 2019; Bisson et al., 2019; Matthew et al., 2023). Comparison between the T. tenax and T. vaginalis annotations also identified some novel shared candidate virulence factors, including GHs potentially targeting host or bacterial glycans, which warrant further study in both species. Based on the features of the protein complement of T. tenax shared with T. vaginalis, the oral protist could contribute to worsening the prognosis of and/or initiating periodontitis in multiple ways, including through host cell cytolysis and the targeting of bacteria, which together could contribute synergistically to inducing and/or amplifying damaging inflammation. Periodontitis is primarily recognized as a condition driven by bacterial dysbiosis, which is sufficient to develop the condition (Baker et al., 2024). However, the high prevalence of T. tenax among humans and pets (dogs and cats) and the heterogeneity of disease prognosis between individuals make it potentially highly relevant to numerous cases of periodontitis, and its detection could also help in stratifying patients (Marty et al., 2017). Hence, affected hosts could potentially benefit from treatments targeting both T. tenax and pathogenic bacteria to reduce or eliminate the damaging inflammation characteristic of periodontitis. Establishing the potential causality of T. tenax for periodontitis will require further investigation, including longitudinal studies among humans and animals, and the molecular and cellular characterization of several candidate virulence factors. Notable virulence factors associated with T.
vaginalis are the Trichomonas vaginalis viruses (TVV), which are members of the Totoviridae known to have strong proinflammatory properties (Fichorova et al., 2013). Hence, it will also be very interesting to investigate the potential presence and relevance of related TVV-like viruses in T. tenax isolated from humans and pets. Furthermore, T. vaginalis is associated with two mycoplasma-like endosymbiotic bacteria, which are known to promote T. vaginalis growth and to boost virulence by up-regulating adhesion to human epithelial cells and haemolysis under in vitro conditions (Margarita et al., 2022). It is currently unknown whether related endosymbionts are associated with T. tenax. It will be particularly important to sequence genomes from across the spectrum of the known genetic diversity of T. tenax, which includes specific genotypes associated with worse periodontitis (Benabdelkader et al., 2019). This should include recent clinical isolates from both humans and animals (e.g., dogs and cats), as the existing strain (Hs-4:NIH) has been grown in vitro since 1959 and thus may not represent the best, albeit a very valuable and important, model system to study T. tenax-host interactions.
* Listed references are either reviews or original publications describing the listed peptidases as virulence factors. These were either directly demonstrated to degrade host proteins or damage host cells (C1 peptidases) or act through indirect mechanisms by targeting members of the microbiota (C40 peptidases). M60-like peptidases are speculated to represent virulence factors based on their cell surface location in T. vaginalis and enriched distribution among mammalian host-associated microbial pathogens and members of the microbiota. S8 peptidases are speculated to represent virulence factors based on their cell surface location in T. vaginalis and known targets among several microbial pathogens including parasites, fungi and bacteria.

FIGURE Sequence comparisons between T. tenax and T. vaginalis BspA- and Pmp-like proteins. The gene networks illustrate the relationships between (A) BspA- and (B) Pmp-like protein families. Gene networks in which nodes represent individual homologous genes and edges represent significant BLASTP alignments. Nodes are colored according to species (pink: T. vaginalis, orange: T. tenax). Edge color is scaled according to BLASTP alignment percentage identity from -% (light blue), -% (blue) and > % (purple). Purple edges are displayed with greater width to aid visibility. The two illustrated networks correspond to the one with the largest number of nodes (genes) for the respective gene families. The arrow in (A) indicates the edge between the two BspA-like sequences with the highest level of sequence similarity between T. tenax and T. vaginalis proteins (C). (C) The structural organization inferred by SMART for the two most similar BspA-like proteins ( % identity) between T. tenax and T. vaginalis is illustrated and drawn to scale. Not all domains identified by SMART are illustrated due to overlap between some of these inferred domains. None of these proteins have an inferred SP or TMD. (D) The structural organization inferred by SMART for one T. tenax Pmp-like protein, which is inferred to possess both an SP and a TMD, with the segment containing the Pmp domain (motifs GGA[ILV] and FXXN) inferred to be exposed to the extracellular environment.

FIGURE Sequence comparisons between T. tenax and T. vaginalis BAP-like proteins. (A) Gene network in which nodes represent individual homologous genes and edges represent significant BLASTP alignments. Nodes are colored according to species (pink: T. vaginalis, orange: T. tenax). Edge color is scaled according to BLASTP alignment percentage identity from -% (light blue), -% (blue) and > % (purple). Purple edges are displayed with greater width to aid visibility. The locus tags for the two TtBAP-like annotated proteins that are members of the illustrated network are indicated, as are the two T. vaginalis BAP-like proteins ( locus tag) with functional data (see main text). (B) The structural organization inferred by SMART for one T. tenax and one T. vaginalis BAP-like protein is illustrated and drawn to scale. The symbols for SP, TMD, low complexity and coiled coil sequences are indicated. The red dashed boxes illustrate the position of the HHPred top hit alignments between the alignment of selected BAP-like proteins used as query: "Cell surface glycoprotein; S-layer csg, STRUCTURAL PROTEIN; HET: CA, BGC; .A {Haloferax volcanii DS }", probability: ., e-values:. . The alignment starts and ends at position and , respectively, from NG. g . (C) Multiple sequence alignment illustrating the conservation of the TMD and cytoplasmic tail for selected BAP-like, BspA-like and Pmp-like proteins. The approximate position of the inferred TMD is indicated by the dashed box, and the positions of the [FY]NPX[YF]-like motif and the acidic cluster are indicated by a blue and a purple line below the alignment, respectively; these could represent signals for endocytosis (see Hirt et al.,  ; Handrich et al.,  ).
TABLE Overview of selected characteristics and annotations for the two published Trichomonas tenax draft genome sequences.
TABLE Annotated candidate glycoside hydrolases (GH) for T. tenax and T. vaginalis.
TABLE Selected characteristics of glycoside hydrolase families with or more entries or known to include enzymes targeting mucins or other host glycoproteins, encoded by both T. tenax and T. vaginalis.
* Values in brackets (x/y) are the number of annotated genes encoding listed GH families for respectively T. tenax and T. vaginalis.
TABLE Selected characteristics of the two GH families restricted to T. vaginalis.
TABLE Selected annotated peptidase families representing candidate virulence factors of T. tenax.