The ribosomal transcription units of Haplorchis pumilio and H. taichui and the use of 28S rDNA sequences for phylogenetic identification of common heterophyids in Vietnam

Background Heterophyidiasis is now a major public health threat in many tropical countries. Species in the trematode family Heterophyidae infecting humans include Centrocestus formosanus, Haplorchis pumilio, H. taichui, H. yokogawai, Procerovum varium and Stellantchasmus falcatus. Molecular phylogenetic and systematic studies on trematodes require additional candidate markers for taxonomic identification and classification. This study provides near-complete ribosomal transcription units (rTU) from Haplorchis pumilio and H. taichui and demonstrates the use of 28S rDNA sequences for identification and phylogenetic analysis. Results The near-complete ribosomal transcription units (rTU) of H. pumilio and H. taichui from human hosts in Vietnam were determined and annotated; they consist of the 18S, 5.8S and 28S rRNA genes and the ITS1 and ITS2 spacers. Sequence analysis revealed tandem repetitive elements in ITS1 in H. pumilio and in ITS2 in H. taichui. A phylogenetic tree inferred from 28S rDNA sequences of 40 trematode strains/species, including 14 Vietnamese heterophyid individuals, clearly confirmed the status of each of the Vietnamese species: Centrocestus formosanus, Haplorchis pumilio, H. taichui, H. yokogawai, Procerovum varium and Stellantchasmus falcatus. However, the family Heterophyidae was clearly not monophyletic, with some genera apparently allied with other families within the superfamily Opisthorchioidea (i.e. Cryptogonimidae and Opisthorchiidae). These families and their constituent genera require substantial re-evaluation using a combination of morphological and molecular data, and our new molecular data will assist in such studies. Conclusions The 28S rDNA sequences are conserved among individuals within a species but vary between genera.
Based on an analysis of 40 28S rDNA sequences representing 19 species in the superfamily Opisthorchioidea and an outgroup taxon (Alaria alata, family Diplostomidae), six common human-pathogenic heterophyids were identified and clearly resolved. The phylogenetic tree inferred from these sequences again confirmed anomalies in the molecular placement of some members of the family Heterophyidae and demonstrated the need for reappraisal of the entire superfamily Opisthorchioidea. The new sequences provided here supplement those already available in public databases and add to the array of molecular tools that can be used for the diagnosis of heterophyid species in human and animal infections.

The aim of this paper is to present the sequence of near-complete ribosomal transcription units from Haplorchis pumilio and H. taichui, commonly found in humans. Portions of the 28S rRNA gene from other heterophyids infecting humans in Vietnam are also presented, i.e. Centrocestus formosanus, Haplorchis yokogawai, Procerovum varium and Stellantchasmus falcatus. The data will be used to explore the phylogenetic positions of these genera in the family Heterophyidae and in the class Trematoda.
Adults of Centrocestus spp., Haplorchis spp., Procerovum spp. and Stellantchasmus spp., originating from Ha Giang, Nam Dinh, Quang Tri and Quang Ninh Provinces, in the  [5,14] (Table 1). Each adult worm, unstained or stained with acetic carmine, was morphologically identified to species by light microscopy [3,5,14]. Up to ten worms of each species recovered from each person were individually fixed in 70% ethanol, and one or two worms of each species were subjected to molecular analysis. The samples HTAQT3 of Haplorchis taichui and HpDzH of H. pumilio, collected from people in Quang Tri and Nam Dinh Provinces, respectively, were chosen for amplification and sequencing of the rTU. For the other species, only the 28S region was amplified and sequenced, for molecular identification and phylogenetic analysis (Table 1).

Genomic DNA extraction, primers and amplification
Total genomic DNA was extracted from individual cercariae, metacercariae or adult specimens using the GeneJET™ Genomic DNA Purification Kit (Thermo Fisher Scientific Inc., MA, USA), according to the manufacturer's instructions. Genomic DNA was eluted in 50 μl of the elution buffer provided in the kit and stored at -20°C. The DNA concentration was estimated using a GBC UV/visible 911A spectrophotometer (GBC Scientific Equipment Pty. Ltd., Braeside VIC, Australia). DNA was then diluted to a working concentration of 50 ng/μl, and 2 μl of this dilution was used as template in a 50 μl PCR. All rTU-universal primers used for both amplifying and sequencing the rTU of H. pumilio and H. taichui are listed in Table 2. Primers UD18SF/U3SR amplified the 18S and ITS1 region, and U3SF/1500R amplified the ITS2 and 28S region. The primer pairs U18SF/U18SR and U28SF/U28SR were used to obtain major fragments of the ribosomal 18S and 28S genes, respectively. These primers were also used as sequencing primers, as were additional internal primers (Table 2).
The amplicons were eluted from the gel and subjected to direct sequencing by primer-walking in both directions.
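The template quantities stated above imply a simple mass balance. Two microlitres of a 50 ng/μl working dilution deliver 100 ng of genomic DNA, which is 2 ng/μl in a 50 μl reaction. The Python sketch below reproduces that arithmetic and the usual C1V1 = C2V2 dilution rule. The helper name `dilution_volume` and the 200 ng/μl stock concentration in the example are hypothetical illustrations, not values from this study.

```python
# Sketch of template-DNA arithmetic for the PCR set-up described in the Methods.
# WORKING_NG_PER_UL, TEMPLATE_UL and REACTION_UL come from the text;
# dilution_volume() and the 200 ng/ul stock in the example are hypothetical.

WORKING_NG_PER_UL = 50.0   # working DNA concentration (ng/ul)
TEMPLATE_UL = 2.0          # template volume added per reaction (ul)
REACTION_UL = 50.0         # total PCR volume (ul)


def dilution_volume(stock_ng_per_ul: float, target_ng_per_ul: float,
                    final_ul: float) -> tuple[float, float]:
    """Return (stock volume, diluent volume) in ul, from C1 * V1 = C2 * V2."""
    if not 0 < target_ng_per_ul <= stock_ng_per_ul:
        raise ValueError("target must be positive and not exceed the stock concentration")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return stock_ul, final_ul - stock_ul


# DNA delivered to each reaction and its final concentration in the PCR.
template_ng = WORKING_NG_PER_UL * TEMPLATE_UL        # 100 ng
final_ng_per_ul = template_ng / REACTION_UL          # 2 ng/ul

# Hypothetical example: prepare 20 ul of working dilution from a 200 ng/ul eluate.
stock_ul, diluent_ul = dilution_volume(200.0, WORKING_NG_PER_UL, 20.0)  # 5 ul + 15 ul
```

In practice the stock concentration would be the spectrophotometer reading for each extract, so the stock and diluent volumes change from sample to sample. The amount of template per reaction stays fixed at 100 ng.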

Annotation and phylogenetic analysis
Boundaries of the ribosomal 18S, 5.8S and 28S genes were determined by alignment, using the Clustal X program [36], with known ribosomal DNA sequences. These reference sequences were taken from complete or near-complete rTU sequences available in GenBank or in previous publications, i.e. for Euryhelmis costaricensis (GenBank: AB521797), Isthmiophora hortensis (AB189982), Paragonimus kellicotti (HQ900670) and Paramphistomum cervi [33]. Some partial rTUs were also used, namely those of Centrocestus sp. (AY245699), Haplorchis pumilio (AY245706) and Haplorchis taichui (AY245705) [12]. Of the internal transcribed spacers, ITS1 was recognized as the region between the 18S and 5.8S genes, and ITS2 as the region between the 5.8S and 28S genes. Tandem repeats (TRs) in ITS1 or ITS2 were detected using Tandem Repeat Finder v3.01 [37].

Newly obtained partial 28S sequences (approximately 1,100 nucleotides) of 14 Vietnamese heterophyids were aligned with 25 additional sequences from GenBank using GENEDOC2.7 (available at: http://iubio.bio.indiana.edu/soft/molbio/ibmpc/genedoc-readme.html) (Tables 1 and 3). The GenBank sequences represent species of all three families of the superfamily Opisthorchioidea and include another 17 sequences from members of the family Heterophyidae. Alaria alata (family Diplostomidae) was also included in the alignment as an outgroup species. The alignment was trimmed to the length of the shortest sequence, saved in FASTA format and imported into MEGA6.06. To examine the phylogenetic position of the Vietnamese heterophyids relative to other trematodes, a phylogenetic tree was reconstructed from the sequences listed in Tables 1 and 3. Maximum likelihood (ML) analysis was used with the general time reversible (GTR) + G + I model (gamma rate heterogeneity and a proportion of invariant sites). Among the models compared in MEGA, this model had the best Bayesian information criterion (BIC) score. Confidence in each node was assessed using 1,000 bootstrap resamplings [38].
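Node confidence in the ML tree was assessed by 1,000 bootstrap resamplings [38]. The idea is that each pseudo-replicate alignment is built by sampling alignment columns with replacement. The bootstrap value of a clade is then the percentage of replicate trees that contain it. The Python sketch below illustrates this under stated assumptions. It uses a toy alignment held in a dict, and the function names are hypothetical. The per-replicate ML tree search, which was done in MEGA in this study, is not shown.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement.

    Every sequence uses the same sampled column indices, so sites stay aligned,
    and the replicate has the same number of columns as the original.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Bootstrap value (%) of `clade`: the share of replicate trees containing it.

    Each replicate tree is represented only by the set of its clades (leaf sets),
    which would come from running a tree search on each pseudo-replicate.
    """
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


# Toy example with hypothetical sequences: 1,000 pseudo-replicates, as in the text.
toy = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "ATGTTA"}
rng = random.Random(42)
replicates = [bootstrap_replicate(toy, rng) for _ in range(1000)]
```

Resampling whole columns, rather than individual bases, keeps each site's pattern across taxa intact. That is what makes the replicate a plausible alternative dataset drawn from the same sites.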
A Bayesian analysis was also conducted using MrBayes v3.2 [39] with the same model of sequence evolution. The analysis was run for five million generations (two parallel runs, each with four chains), more than were required for the standard deviation of split frequencies to fall below 0.01. Plots indicated that convergence was approached after fewer than 1,000,000 generations. The first 1,000,000 generations were therefore discarded as burn-in, and trees were sampled every 1,000 generations.
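The sampling scheme above determines how many trees contribute to the Bayesian consensus. The Python sketch below reproduces that arithmetic using the run settings stated in the text. It also computes an average standard deviation of split frequencies across runs, the convergence diagnostic mentioned above, under simplifying assumptions. These are: population standard deviation, and splits absent from a run counted as frequency 0. MrBayes applies its own filtering rules, so the value it reports may differ. The function name is hypothetical.

```python
from statistics import pstdev

# Run settings as stated in the Methods.
N_GENERATIONS = 5_000_000
SAMPLE_FREQ = 1_000
BURNIN_GENERATIONS = 1_000_000
N_RUNS = 2

# MrBayes also writes the starting tree (generation 0); it is ignored here for simplicity.
samples_per_run = N_GENERATIONS // SAMPLE_FREQ          # 5,000 trees per run
burnin_samples = BURNIN_GENERATIONS // SAMPLE_FREQ      # 1,000 discarded per run
kept_per_run = samples_per_run - burnin_samples         # 4,000 retained per run
kept_total = kept_per_run * N_RUNS                      # 8,000 retained across runs


def avg_sd_split_freq(run_freqs: list[dict[frozenset[str], float]]) -> float:
    """Simplified average standard deviation of split frequencies across runs.

    `run_freqs` holds, for each run, the post-burn-in frequency of every split
    (a split is written as the leaf set on one side). Splits missing from a run
    count as frequency 0. Population SD is used; MrBayes's own filtering differs.
    """
    splits = set().union(*run_freqs)
    if not splits:
        raise ValueError("no splits supplied")
    sds = [pstdev([run.get(split, 0.0) for run in run_freqs]) for split in splits]
    return sum(sds) / len(sds)
```

When both runs sample the same splits at similar frequencies, this value approaches zero. That is why a threshold such as 0.01 is used as evidence that the runs have converged on the same posterior.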

Results
Structural organization and characteristics of the ribosomal transcription unit of Haplorchis pumilio and H. taichui
We did not sequence the IGS because this region contains highly repetitive sequences. The five regions of the rTU (18S, ITS1, 5.8S, ITS2 and 28S) are arranged in the order typical of the ribosomal DNA operon of metazoans (Fig. 1).
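Once gene boundaries have been located by alignment with reference rTUs, the spacers follow by definition. ITS1 is whatever lies between the end of 18S and the start of 5.8S, and ITS2 is whatever lies between the end of 5.8S and the start of 28S. The hypothetical Python helper below sketches this final slicing step. It assumes that 1-based, inclusive gene coordinates are already known, and it does not perform the Clustal X alignment used in this study to find them.

```python
def split_rtu(seq: str, ssu: tuple[int, int], five_eight_s: tuple[int, int],
              lsu_start: int) -> dict[str, str]:
    """Split a ribosomal transcription unit into its five regions.

    Gene coordinates are 1-based and inclusive, as in GenBank annotations.
    ITS1 is taken as everything between the 18S end and the 5.8S start, and
    ITS2 as everything between the 5.8S end and the 28S start.
    """
    ssu_start, ssu_end = ssu
    f_start, f_end = five_eight_s
    if not 1 <= ssu_start <= ssu_end < f_start <= f_end < lsu_start <= len(seq):
        raise ValueError("expected 18S < 5.8S < 28S coordinates inside the sequence")
    return {
        "18S": seq[ssu_start - 1:ssu_end],
        "ITS1": seq[ssu_end:f_start - 1],
        "5.8S": seq[f_start - 1:f_end],
        "ITS2": seq[f_end:lsu_start - 1],
        "28S": seq[lsu_start - 1:],   # partial 28S: runs to the end of the read
    }


# Toy sequence with upper-case genes and lower-case spacers (hypothetical).
toy_rtu = "AAAAcgGGGtaCCCC"
regions = split_rtu(toy_rtu, ssu=(1, 4), five_eight_s=(7, 9), lsu_start=12)
```

As a purely illustrative assumption, suppose the H. pumilio fragment began exactly at the first base of its 1,992 bp 18S gene. The 1,106 bp ITS1 would then occupy positions 1,993-3,098, and the 160 bp 5.8S gene positions 3,099-3,258.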
In both H. pumilio and H. taichui, the 18S gene was 1,992 bp long and the 5.8S gene 160 bp long. The sequenced portion of the 28S gene was 1,397 bp in H. pumilio and 1,403 bp in H. taichui (Table 4). These lengths represent only part of the complete 28S gene, which is around 3.2-5.5 kb in total for various trematode species [16]. The Vietnamese H. pumilio ITS1 region (1,106 bp) contains five complete tandem repeats: three copies of TRA (TRA1-2-3, each of 136 bp) and two copies of TRB (TRB1-2, each of 123 bp). These are followed by a partial TRB3 of 84 bp (Table 4; Fig. 1 (Table 1). These were aligned with 26 previously published sequences representing 20 species of trematodes in 4 families, including additional representatives of the Heterophyidae (Table 3). The alignment used was 1,100 bp in length. The phylogenetic tree shown in Fig. 2 is based on the maximum likelihood (ML) analysis, with Bayesian posterior support values and ML bootstrap values shown at relevant nodes. The Bayesian and ML trees were almost identical, differing only in the placement of Centrocestus formosanus. In the Bayesian tree, this species fell into a clade (posterior support 0.86) with members of the Cryptogonimidae. In the ML tree, it was placed basal to all other opisthorchioideans (Fig. 2), albeit with low bootstrap support. Sequences of each of our six target heterophyid species consistently grouped with those of the same species from published sources, confirming our morphological identifications. With one exception, species clustered within their respective genera. The exception was Procerovum varium, which was nested among species of Haplorchis. The Heterophyidae were not recovered as monophyletic.
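The repeat composition reported above for the Vietnamese H. pumilio ITS1 can be checked with simple arithmetic. Three 136 bp TRA copies, two 123 bp TRB copies and an 84 bp partial TRB3 total 738 bp, about two-thirds of the 1,106 bp ITS1. The Python sketch below performs that sum and adds a naive scanner for perfect tandem copies. The scanner, `longest_exact_tandem`, is a hypothetical and simplified stand-in: the Tandem Repeat Finder program used in this study also tolerates mismatches and indels between copies, which this exact-match loop does not.

```python
# Repeat lengths for the Vietnamese H. pumilio ITS1, as reported in the text.
ITS1_LEN = 1_106
TRA_COPIES, TRA_LEN = 3, 136      # TRA1-2-3
TRB_COPIES, TRB_LEN = 2, 123      # TRB1-2
TRB3_PARTIAL = 84                 # incomplete third TRB copy

repeat_bp = TRA_COPIES * TRA_LEN + TRB_COPIES * TRB_LEN + TRB3_PARTIAL   # 738 bp
repeat_fraction = repeat_bp / ITS1_LEN                                     # about 0.667


def longest_exact_tandem(seq: str, period: int) -> tuple[int, int]:
    """Return (0-based start, copy number) of the longest run of identical,
    adjacent copies of a `period`-long unit. Exact matches only (unlike TRF)."""
    if period < 1:
        raise ValueError("period must be positive")
    best_start, best_copies = 0, 0
    for start in range(len(seq) - period + 1):
        unit = seq[start:start + period]
        copies = 1
        # Extend while the next period-long window repeats the unit exactly.
        while seq[start + copies * period:start + (copies + 1) * period] == unit:
            copies += 1
        if copies > best_copies:
            best_start, best_copies = start, copies
    return best_start, best_copies
```

Because real ITS repeats usually differ at a few positions between copies, an exact scanner like this would tend to undercount them. This is one reason a dedicated tool was used.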
The Centrocestus formosanus sequences grouped either as sister to the Cryptogonimidae (Bayesian analysis) or basal within the Opisthorchioidea (ML analysis). Sequences of two other heterophyids, Euryhelmis costaricensis from Japanese martens (Martes melampus) [32] and Cryptocotyle lingua, fell into a strongly supported clade (Bayesian posterior support 1.0, ML bootstrap support 96%). All other members of this clade belonged to the family Opisthorchiidae (Fig. 2).

Discussion
In this study, we have presented near-complete ribosomal transcription unit (rTU) sequences for two heterophyid species, Haplorchis pumilio and H. taichui. The obtained sequences encompass virtually the complete 18S gene (typical length range 1.7-2.9 kb) and almost half of the 28S gene (typical length range 3.3-5.5 kb) [16,17]. We also obtained the complete ITS1, 5.8S gene and ITS2 sequences of these species.
We found tandemly arranged repetitive sequences in the ITS1 of H. pumilio and in the ITS2 of H. taichui. ITS sequences of both species have been reported from Israel [12]. Israeli H. pumilio possessed only two short tandem repeats (30 bp) in the ITS1. This is in strong contrast to the Vietnamese sequences, in which the ITS1 contained five complete repeats and one incomplete copy. ITS1 length differed substantially between Vietnamese and Israeli individuals of the same species (1,106 vs 640 bp in H. pumilio, and 797 vs 582 bp in H. taichui) owing to differences in the numbers of tandem repeats. These differences indicate intraspecific polymorphism, which is commonly reported in trematodes [8,12,33]. Likewise, ITS2 showed repetitive sequence differences between individuals from different locations. Repeats in the internal transcribed spacers of trematodes have been reported for several taxa, including members of the Schistosomatidae, Opisthorchiidae, Heterophyidae, Paramphistomatidae and others [8,32,33,40]. Repeats, length variation and sequence variation, both within and between species, all make ITS regions difficult to align. The difficulty is greatest when phylogenetically divergent species are compared, which suggests that these regions are not suitable for deep-level phylogenies [17]. At the genus and species levels, however, alignments of ITS sequences have proved valuable for phylogenetic studies and molecular taxonomy [17,41,42].
The topology of the phylogenetic tree inferred from the 40 trematode sequences in this study (Fig. 2) generally agreed well with previous findings. Most genera represented by multiple sequences formed well-supported monophyletic clusters. One striking exception was Procerovum varium, whose sequence rendered Haplorchis paraphyletic. This relationship has also been reported previously (e.g. [10]). Clearly, the definitions of these two genera need to be revisited. The three families  [15] using 18S and 28S sequences. An additional heterophyid genus, Centrocestus, showed an affinity with members of the Cryptogonimidae or appeared basal within the Opisthorchioidea (Fig. 2). Such a placement was not supported by an earlier analysis of concatenated 18S and 28S sequences [18]. It is clear that the entire superfamily Opisthorchioidea presents broad systematic and taxonomic challenges, to be met in the future using combined morphological and molecular approaches.
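The monophyly and paraphyly statements above have a precise meaning on a rooted tree. A group is monophyletic only if some node has exactly that group's members as its descendant leaves. The Python sketch below encodes a rooted tree as nested tuples and tests this condition; the function names are hypothetical. The four-ingroup-taxon topology in the example is made up purely for illustration and is not the tree in Fig. 2. It places a Procerovum leaf among Haplorchis leaves, mirroring the pattern described in the text.

```python
from typing import Union

# A rooted tree is either a leaf name or a tuple of child subtrees.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> frozenset[str]:
    """All leaf names below (and including) this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree: Tree) -> list[frozenset[str]]:
    """Leaf sets of every node of the rooted tree, internal nodes and leaves."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    clades = [leaf_set(tree)]
    for child in tree:
        clades.extend(all_clades(child))
    return clades


def is_monophyletic(tree: Tree, taxa: set[str]) -> bool:
    """True if some node has exactly `taxa` as its descendant leaves."""
    return frozenset(taxa) in all_clades(tree)


# Made-up topology for illustration only (not the tree in Fig. 2).
toy_tree = ("Outgroup", (("Haplorchis_a", ("Haplorchis_b", "Procerovum_x")), "Haplorchis_c"))
haplorchis = {"Haplorchis_a", "Haplorchis_b", "Haplorchis_c"}
```

In this toy tree, the Haplorchis leaves alone do not form a clade, but they do once the nested Procerovum leaf is added. This is the situation described as Procerovum rendering Haplorchis paraphyletic. The test depends on correct rooting, which is why an outgroup such as Alaria alata is needed.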

Conclusions
In conclusion, the present study determined and annotated the near-complete ribosomal transcription unit […]. The phylogenetic tree inferred from these sequences again confirmed anomalies in the molecular placement of some members of the family Heterophyidae and demonstrated the need for reappraisal of the entire superfamily Opisthorchioidea. The new sequences provided here supplement those already available in public databases and add to the array of molecular tools that can be used for the diagnosis of heterophyid species in human and animal infections.

Fig. 2 […] Alaria alata (Diplostomidae) was used as the outgroup taxon. The tree depicted was inferred using maximum likelihood (ML) analysis with the general time reversible (GTR) + G + I model (gamma rate heterogeneity and a proportion of invariant sites) in the MEGA 6.06 package. Support for each node was evaluated using 1,000 bootstrap resamplings [38]. An almost identical tree was found using Bayesian analysis (see text for details). Numbers at nodes are Bayesian posterior support values/ML bootstrap values. The basal node for the superfamily Opisthorchioidea is indicated by an arrow. The scale-bar indicates the number of substitutions per site. Accession numbers are given at the end of each sequence name.