Mitogenomics and Evolutionary History of Rodent Whipworms (Trichuris spp.) Originating from Three Biogeographic Regions

Trichuris spp. is a widespread nematode which parasitizes a wide range of mammalian hosts including rodents, the most diverse mammalian order. However, genetic data on rodent whipworms are still scarce, with only one published whole genome (Trichuris muris) despite an increasing demand for whole genome data. We sequenced the whipworm mitogenomes from seven rodent hosts belonging to three biogeographic regions (Palearctic, Afrotropical, and Indomalayan), including three previously described species: Trichuris cossoni, Trichuris arvicolae, and Trichuris mastomysi. We assembled and annotated two complete and five almost complete mitogenomes (lacking only the long non-coding region) and performed comparative genomic and phylogenetic analyses. All the mitogenomes are circular, have the same organisation, and consist of 13 protein-coding, 2 rRNA, and 22 tRNA genes. The phylogenetic analysis supports geographical clustering of whipworm species and indicates that T. mastomysi found in Eastern Africa is able to infect multiple closely related rodent hosts. Our results are informative for species delimitation based on mitochondrial markers and could be further used in studies on phylogeny, phylogeography, and population genetics of rodent whipworms


Introduction
Trichuris is a genus of obligate parasitic nematode which infects a broad range of mammalian hosts, including humans. It is a common caecum parasite with fecal-oral transmission. Depending on the whipworm species in question, the number of hosts ranges from one to several taxonomically related species [1]. The overall degree of host specificity of Trichuris spp. remains an unresolved issue [2]. In humans, whipworms are responsible for the soil-transmitted disease trichuriasis, which manifests itself as gastrointestinal inflammation and can, with heavy infections, cause serious complications, such as rectal prolapse, iron-deficiency anemia, and long-term disabilities (especially in children). Trichuriasis affects over half a billion people annually and is considered to be one of the neglected tropical diseases burdening the developing world [3].
Rodents are the most diverse order of mammals, encompassing around 40% of all mammalian species, with family Muridae, native to the Old World, leading in both abundance and number of species [4]. Rodents are not only distributed worldwide but also have a rich evolutionary history with evidence of multiple adaptive radiation events [5,6]. They are also reservoirs for a plethora of parasitic and pathogenic organisms, some of which

Sampling, DNA Extraction, and Sequencing
We used 7 Trichuris spp. (see Table 1 for sample information) preserved in 70% ethanol collected during previous studies on the evolutionary history, taxonomy, and ecology of rodents and their associated parasites and pathogens in Asia, Africa, and Europe [7,12,29]. Three of these samples were identified based on their morphological characteristics as T. mastomysi, recovered from the Natal multimammate mouse, Mastomys natalensis, from Tanzania; T. arvicolae, recovered from the common vole, Microtus arvalis, in the Czech Republic; and T. cossoni, recovered from the greater bandicoot rat, Bandicota indica in Laos [10,12,24]. DNA was extracted using Invisorb Spin Forensic kit (Stratec, Berlin, Germany) following the manufacturer instructions (but without using the carrier RNA). For 3 of our samples for which we obtained a very low amount of DNA after Qubit (Life Technologies, Eugene, OR, USA) measurement (<1 ng/µL), we performed a whole genome amplification (WGA) step using the Illustra Ready-To Go GenomiPhi V3 DNA amplification kit (GE Healthcare, Chalfont, UK) to increase the amount of DNA before library preparation. Trichuris DNA (WGA or not) was equimolarly pooled with DNA from another nematode belonging to the family Oxyuridae, and libraries were prepared using the KAPA HyperPrep kit (Roche, KAPA, Cape Town, South Africa). A pool of 24 dual indexed libraries (with 7 containing the Trichuris samples of this study and 17 additional ones from a wider nematode mitochondrial study) were sequenced using Illumina MiSeq platform and v2 chemistry (San Diego, CA, USA) (i.e., 2 × 250 bp paired-end reads) at the CEITEC Genomics Core Facility (Brno, Czech Republic). After de novo assembly of the mitogenomes, we found that the long non-coding region (NCR-L) was missing for all samples. So, for two samples for which we had some DNA left, i.e., Trichuris samples from Mus caroli and Mus mahomet, we characterised the variable region NCR-L by Sanger sequencing with primers designed in the surrounding NAD1 and NAD2 genes.

Genetic Characterisation of Rodent Hosts
For 2 host samples for which we had access to tissues, we confirmed the field identification by cytochrome b (CYTB) genotyping. We amplified the CYTB gene with primers H15915 and L14723 [30] using the Multiplex PCR kit (Qiagen, Hilden, Germany) in a final volume of 15 µL and using 1 µL of extracted DNA. Amplicons were Sanger-sequenced at Eurofins Genomics (Germany). For the Asiatic rodent samples, we did not have access to tissue or DNA, but a metagenomic study of the saliva sample from the same Mus specimen allowed us to retrieve a 300 nucleotide-long contig that BLASTed 100% with Mus caroli R5231 CYTB (GenBank AN: KJ530558). We thus used the complete CYTB from R5231 in the subsequent analysis. The CYTB of the Mastomys natalensis individual host of T. mastomysi was sequenced in a previous study (GenBank AN: MK454444) [31].
The sequences were aligned by Mauve genomic aligner [38]. The open reading frames were analysed with the Geneious ORF Tool using the invertebrate mitochondrial code and compared to the annotated genome of T. muris. Pairwise mean nucleotide and aminoacid identities were calculated for homologous genes. Putative secondary structures of 22 tRNA genes were identified using software ARWEN [39]. As an independent check, we compared these annotations to the ones obtained using the MITOS pipeline web server [40]. Additional statistical analysis was performed using a custom Python 3.6 script [41] Python Software Foundation, Python Language Reference, version 3.6, using packages Biopython 1.75 [42], Numpy 1.16 [43], and Matplotlib 3.3.4 [44]. We used a sliding-window approach with 1000 bp windows and 25 bp steps to compare GC contents and mean pairwise differences between each sample pair.

Phylogenetic Analysis
For the purpose of phylogenetic analysis, the mitogenomes of T. ovis (NC_018597), T. trichiura (NC_017750), and Trichinella spiralis (NC_002681) were added to the alignment and used as outgroup samples. jModelTest v2.1.10 [45] was used to test the substitution models. Phylogenetic analysis was performed via Bayesian inference (BI) conducted in MrBayes v3.2.6 [46] and complemented by a maximum likelihood phylogenetic analysis conducted in RaxML v8.2.12 [47]. The whole genome alignments were used for the phylogenetic analyses. GTR+G was used as substitution model. The analysis was performed using four independent Markov chain runs for 4,000,000 MCMC generations, sampling every 2000th generation. The first 25% of samples were discarded as burn-in. To compare the level of consistency between the two phylogenetic methods, we first checked the topologies for any inconsistencies and then plotted the individual values of pairwise differences between same sample pairs given by the different methods against each other. This comparison was performed to compare the difference between branch lengths as given by the two methods of phylogenetic analysis.
Only the Bayesian inference analysis using MrBayes was performed for the host samples. We used similar settings for host CYTB sequence analysis, with Rhizomys pruinosus (MH189045) used as outgroup. For host species for which we did not have access to tissue, we used the following GenBank sequences: Bandicota indica: HM217476, Microtus arvalis: KX380038, Mus musculus: KX790793, and Mus mahomet: MN223610. The only host/parasite difference in the MrBayes settings was the choice of JC96 as the host substitution model.

Genomic Features and Annotations
All assemblies were cut in the NCR-L, which uniformly had the lowest coverage. We completed this region for sample LO613 obtained from Mus caroli and ETH232 obtained from Mus mahomet via Sanger sequencing. However, the NCR-L was missing from the other assembled sequences. The two complete mitogenomes, T. sp. ex Mus caroli and T. sp. ex Mus mahomet (Figure 1), had lengths 14,165 bp and 14,100 bp, respectively (see Table 2 for comparison and Table S1 for comparison of the incomplete genomes). All mitogenomes contained 13 protein coding genes (PCGs) (COX1-COX3, NAD1-NAD6, NAD4L, ATP6, ATP8, and CYTB), with NAD5 being the longest and ATP8 the shortest in all samples, 2 rRNA genes, and 22 tRNA genes (except T. cossoni, see below). Four PCGs (NAD2, NAD5, NAD4, and NAD4L) and 10 tRNA genes (tRNA-Met, tRNA-Phe, tRNA-His, tRNA-Arg, tRNA-Pro, tRNA-Trp, tRNA-Ile, tRNA-Gly, tRNA-Cys, and tRNA-Tyr) were encoded by the L-strand, with the rest being encoded by the H-strand. The NCR-L was located between NAD1 and tRNA-Lys, while the short non-coding region (NCR-S) was found between NAD3 and tRNA-Ser(UCN) ( Figure 1). These findings are consistent with published literature on Trichuris mitogenomes. Genes NAD1 and NAD2 were incomplete, and the tRNA-Lys sequence was missing in T. cossoni ex Bandicota indica, as these lie in the immediate vicinity of the missing NCR-L region. In the same sample, the NAD4L gene was abnormally longer (336) than the rest (249-252 bp), with an overlap with NAD4 gene. The reason is a mutation in position 7427 with A replacing T and changing the function of an original stop codon. We checked the trimmed reads to rule out an assembly error. For the seven studied mitogenomes, the average nucleotide frequencies were 34.6% A, 12.3% C, 13.9% G, and 37.2% T. The nucleotide composition was strongly biased towards A+T (73.3%), with T being the most favoured and C being the least favoured nucleotide.    All PCGs used ATG, ATA, TTG, or ATT as their initiation codon. All PCGs except for NAD5 had complete termination codons. In the whole dataset, only COX2 and COX3 (and possibly NAD4, but this could not be inferred due to the missing sequence in T. cossoni sample) had initiation and termination codons identical across all samples. Most of the 22 tRNA genes could be folded into standard secondary structures. The length of tRNA genes ranged from 45 to 94 bp, consistent with other literature on Trichuris mitogenomes, and the tRNA-Ser lacked D-arm and loop [16]. In each genome, the 12S-rRNA gene was located between tRNA-Ser(AGN) and tRNA-Val, while 16S-rRNA gene was found between tRNA-Val and ATP6 gene. There were no differences in tRNA anticodons across the samples. The position of NCR-L could be inferred from the assembly even in those samples, being between the genes NAD1 and tRNA-Lys. However, very limited conclusions about the base composition in said region could be drawn, since the coverage of NCR-L was either too low or non-existent in the five samples mentioned. The NCR-S was located between NAD3 and tRNA-SerUCN.

Comparative Genomic Analyses
The reference genome of T. muris was added to our samples for the analyses within this section. For summary statistics, see Table 3. The sequence divergences between the genomes were substantial, ranging from 2.2% to 34.5%, with average pairwise divergence being 21.7% (Figure 2). The number and order of mitochondrial genes (13 PCGs, 22 tRNA genes, and 2 rRNA genes) and non-coding regions was identical across all studied mitogenomes. The mean pairwise divergence between the samples was 24.7%, with 7320 invariant sites across the whole dataset (45.8%). The magnitude of nucleotide sequence divergence in each gene ranged from 15% in COX1, being the most conserved gene, to 32.8% in ATP8, being the least conserved one. Amino-acid sequences were also compared. There were a total of 1361 amino-acid substitutions across all individual proteins without NAD1 and NAD2. The mean pairwise difference ranged from 7.3%-38%, with COX1 being the most and ATP8 the least conserved protein. There were no marked differences in GC content across our samples ( Figure 3). Both non-coding regions were heavily AT biased (AT percentages 78.6% for NCR-L and 80.2% for NCR-S). Although the information about NCR-L came from two samples only, these were very similar to the NCR-L of T. muris both in length (82 bp in T. sp. ex Mus caroli and 100 bp in T. sp. ex Mus mahomet vs. 112 bp in T. muris) and composition (GC content 21.4% average in our samples vs. 19.6% in T. muris). The average length of NCR-S (107.4 bp in our samples vs. 106 bp in T. muris) and GC content was similar to the T. muris (19.8% average in our samples vs. 20.8% in T. muris). Table 3. Summary of differences in nucleotide sequences and predicted amino-acid sequences. * statistics for NAD1 and NAD2 were computed excluding the sample of T. cossoni, which had incomplete sequences at these sites. Abbreviations: Nuc, nucleotide; ini, initiation; ter, termination. ** statistics for NCR-L were computed using only samples of T.

Phylogenetic Analysis
Both maximum likelihood and Bayesian inference analyses gave the same phylogenetic topology ( Figure 4) and very similar branch lengths. The difference between individual pairwise divergences for each method of phylogenetic analysis was low, ranging from 0-0.25, with average difference of 0.056 (for more details, see Figure S1). The phylogenetic analyses revealed that the Trichuris derived from Afrotropical rodents clustered into a single haplogroup (Figure 4). Samples from this haplogroup had a mean pairwise difference at 4.4%, with 12,917 sites being completely invariant (91.5%). Trichuris from Palearctic rodents, M. arvalis (Czech Republic) and M. musculus, formed a sister group to this haplogroup. T. sp. ex Mus caroli (Laos) was basal to all other rodent derived Trichuris, with T. cossoni ex Bandicota indica (Laos) being the second most basal sample. The samples from Afrotropical and Palearctic regions formed two highly supported monophyletic haplogroups, while the Indomalayan samples did not form a monophyletic group. The divergence (per Bayesian inference analysis) between whipworms found in ungulates (T. ovis) and whipworms found in rodents was of the same order of magnitude than between two rodent pinworms: The pairwise difference between T. sp. ex Mus caroli and T. muris was 33.1%, while the difference between it and the T. ovis was 36.7%. The phylogenetic pattern strongly supports geographical clustering of rodent-derived Trichuris, with no clear evidence of codivergence between the host and parasite. The reconstructed host phylogeny (see Figure S2), although partially unresolved, agrees with published results on murine phylogenetics [5,6,48]. Individual phylogenetic distances are given in Table 4.

Phylogenetic Analysis
Both maximum likelihood and Bayesian inference analyses gave the same phylogenetic topology ( Figure 4) and very similar branch lengths. The difference between individual pairwise divergences for each method of phylogenetic analysis was low, ranging from 0-0.25, with average difference of 0.056 (for more details, see Figure S1). The phylogenetic analyses revealed that the Trichuris derived from Afrotropical rodents clustered into a single haplogroup (Figure 4). Samples from this haplogroup had a mean pairwise difference at 4.4%, with 12,917 sites being completely invariant (91.5%). Trichuris from Palearctic rodents, M. arvalis (Czech Republic) and M. musculus, formed a sister group to this haplogroup. T. sp. ex Mus caroli (Laos) was basal to all other rodent derived Trichuris, with T. cossoni ex Bandicota indica (Laos) being the second most basal sample. The samples from Afrotropical and Palearctic regions formed two highly supported monophyletic haplogroups, while the Indomalayan samples did not form a monophyletic group. The divergence (per Bayesian inference analysis) between whipworms found in ungulates (T. ovis) and whipworms found in rodents was of the same order of magnitude than between two rodent pinworms: The pairwise difference between T. sp. ex Mus caroli and T. muris was 33.1%, while the difference between it and the T. ovis was 36.7%. The phylogenetic pattern strongly supports geographical clustering of rodent-derived Trichuris, with no clear evidence of codivergence between the host and parasite. The reconstructed host phylogeny (see Figure S2), although partially unresolved, agrees with published results on murine phylogenetics [5,6,48]. Individual phylogenetic distances are given in Table 4.

Discussion
We have described seven mitochondrial genomes from different rodent whipworms. Two genomes were described completely, while in five others, the information about NCR-L is lacking. The divergence between the genomes was substantial (average pairwise divergence of 21.7%). The gene organisation was identical in all samples and did not differ from that of Trichuris found in other animals. The missing tRNA-Lys sequence in T. cossoni was a result of sequencing issues, and there is no reason to suspect that the gene itself is not present within the said mitogenome. GC content did not differ drastically across the samples (Figure 2), and the degree of genome conservation (Figure 3) is in line with other studies, with rRNA subunits being relatively the most conserved region [16,17].

Discussion
We have described seven mitochondrial genomes from different rodent whipworms. Two genomes were described completely, while in five others, the information about NCR-L is lacking. The divergence between the genomes was substantial (average pairwise divergence of 21.7%). The gene organisation was identical in all samples and did not differ from that of Trichuris found in other animals. The missing tRNA-Lys sequence in T. cossoni was a result of sequencing issues, and there is no reason to suspect that the gene itself is not present within the said mitogenome. GC content did not differ drastically across the samples (Figure 2), and the degree of genome conservation (Figure 3) is in line with other studies, with rRNA subunits being relatively the most conserved region [16,17].
Previously, a high degree of amino-acid divergence between different Trichuris species was reported [16,17] relatively to other nematodes. In accord with these findings, we observed a similar pattern, with mean pairwise amino-acid divergence at the least conserved gene (ATP8) being 38%, which is similar to the reported divergence between T. ovis and T. discolor (33.9%). When the samples Trichuris ex Mus caroli and Trichuris ex Bandicota indica were excluded from the comparison, this number dropped to 18.1%, as these two samples are the most distinct of the whole dataset. However, this number is still high, since a divergence of around 10% is common between different nematode species, such as in the genera Ancylostoma with around 4% [19,20] or Toxocara spp. with around 5-7% [21]. We suggest these observations are likely not the result of an unusually high mtDNA substitution rates within the Trichuris genus but rather due to sparse sampling from deep mitogenome phylogeny. Given that enoplids are a group that diverged early in the evolution of Nematoda, e.g., the related genus Trichinella diverged from other Nematoda as early as 275 million years ago [49,50], it is probable that the reported amino-acid divergences are a function of deep time: whipworms are likely also an early diverging nematode clade. If so, whipworms may contain a great cryptic biodiversity only starting to be discovered, which is suggested by the recent studies as well as by our own data. Extrapolating from previous studies that compared the amino-acid differences between Trichuris species [16,51], the pairwise amino acid divergences we observe would suggest four of our seven mitogenomes comes from different whipworm species. A fifth taxon of whipworms from African murid genera Mastomys and Mus rodents is also clear. The issue of species delimitation within this taxon will be returned to below.
Our results also support a clear geographic pattern of differentiation between different whipworm lineages, with no evidence of co-divergence between the hosts and parasites in question. The samples clustered together based on the continent/area of their origin: the two samples from Ethiopia, although sampled from the Murini and Praomyini tribes, cluster together before clustering with two other Trichuris samples from the Praomyini tribe from Tanzania and Kenya. Similarly, T. arvicolae from a Cricetidae rodent host clusters first with T. muris found in Apodemus sylvaticus, Mus musculus, or Rattus rattus in Europe [2,52], those hosts belonging to the Murini and Rattini tribes of the Muridae family. This pattern might be expected in parasites without strict host specificity. The samples from Asian rodents (Mus caroli and Bandicota indica) did not form a single monophyletic group. In fact, Trichuris genomes obtained from these two samples were the most distinct pair over all the studied samples (pairwise phylogenetic divergence being 34.5%), strongly suggesting an existence of two distinct Trichuris haplogroups in Southeast Asia. This is in line with recent results of Ribas et al. [12], which showed that Trichuris from Asian rodents are distinct from the population of European Trichuris based on morphological and genetic characteristics. Our phylogenetic analysis, while only preliminary, also hints at Southeast Asia as a possible evolutionary origin of murine Trichuris, as the samples obtained there both clustered basally and were as distinct from each other as they were from the samples from Africa and Europe.
Our study also has potential implications for the question of species delineation. In our phylogenetic analysis, we used two recognized species, T. muris and T. arvicolae (described on the basis of morphological differences), with 17.7% sequence divergence. Yet several other samples, such as T. cossoni ex Bandicota indica or Mus caroli, were in fact more divergent from any other sample than the aforementioned figure. The average distance of Trichuris ex M. caroli and B. indica to other murine samples was 33.4% and 29.8%, respectively (see Table 3 for more details). These numbers are similar to the phylogenetic distance between two recognized and distinct species, T. muris and T. ovis, which in the same analysis amounts to 35.4%. This is in line with results obtained by Liu et al. [16], who found that the genome divergence between T. ovis and T. discolor was 21.9%. If one were to go by the average distance between recognized Trichuris species, one ought to already recognize some of our samples as a putative species.
Returning to the African rodent-host samples: We can state that our observations for samples obtained from species Mastomys natalensis, Mastomys erythroleucus, Praomys missonei, and Mus mahomet are consistent with them all belonging to a single species: Trichuris mastomysi. These samples not only formed a monophyletic haplogroup in our phylogenetic analysis, but they also exhibit low mean pairwise distance (4.7%), while being strongly distinct from all other samples ( Figure 4). It should be emphasized this consistency does not preclude the three samples being representatives of more than one species. The answer to that question depends on how species are defined. Most species definitions would require data from more than one locus to resolve species-hood (the mitogenome is assumed to comprise a single non-recombining locus). Trichuris mastomysi has been already reported on another species than Mastomys natalensis, namely Mastomys erythroleucus in Western Africa [53]. In this study, T. mastomysi was described as clade II, while another clade, clade I, grouped together Trichuris individuals found in three species of the genus Mastomys (M. natalensis, M. erythroleucus, and M. huberti), Arvicanthis niloticus, and Gerbilliscus gambianus. The genetic distance separating clades I and II was 7.5% based on the nuclear ITS-1,5.8S and ITS-2 rDNA genes, suggesting clade I corresponds to individuals belonging to a different Trichuris species than T. mastomysi. It thus appears that, whatever T. mastomysi or Trichuris clade I, both species have the ability to infect several related murid-host species. Nevertheless, it cannot be conclusively ruled out that M. natalensis is the main host of both Trichuris species and the other hosts are just the result of spillovers and thus accidental in nature. There are two important considerations regarding this issue. First, there has never been a systematic investigation into Trichuris on the aforementioned species, so the expected prevalence in African rodents studied here is unknown. Second, the more rodent species T. mastomysi is found to inhabit, the more likely are two specific scenarios. Either T. mastomysi naturally occurs in multiple related rodent species and can infect these without difficulty, or it is particularly prone to spillover events. More systematic research into prevalence of T. mastomysi in different rodent species is needed before this question can be satisfactorily answered.

Conclusions
In the present study, we described seven new mitogenomes of rodent Trichuris representing four different species and analysed their phylogenetic relationships. It would be interesting to get additional material of Trichuris ex Mus caroli to check if morphological characters, generally in males, support the large genetic divergence observed at the mitochondrial level. The large divergence between our Trichuris species coming from three major biogeographic zones in the evolutionary history of the Muridae provides an important source of information for mitochondrial markers to be used as basis for subsequent rodent-borne Trichuris phylogenetic, phylogeographic, and population genetic studies.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/life11060540/s1, Table S1: Genome organisation of the incomplete mitogenomes; Figure S1: Pairwise comparison of branch lengths between each sample pair as constructed by the maximum likelihood and Bayesian inference phylogenetic analyses. The black line denotes equality; the red dashed line is the regression line from the linear model. The regression slope is equal to 1.06, while the intercept is equal to −0.05; Figure