Comparative analysis of the ribosomal DNA repeat unit (rDNA) of Perna viridis (Linnaeus, 1758) and Perna canaliculus (Gmelin, 1791)

Perna viridis and P. canaliculus are economically and ecologically important species of shellfish. In this study, the complete ribosomal DNA (rDNA) unit sequences of these species were determined for the first time. The gene order, 18S rRNA–internal transcribed spacer (ITS) 1–5.8S rRNA–ITS2–28S rRNA–intergenic spacer (IGS), was similar to that observed in other eukaryotes. The lengths of the P. viridis and P. canaliculus rDNA sequences ranged from 8,432 to 8,616 bp and from 7,597 to 7,610 bp, respectively, this variability was mainly attributable to the IGS region. The putative transcription termination site and initiation site were confirmed. Perna viridis and P. canaliculus rDNA contained two (length: 93 and 40 bp) and one (length: 131 bp) repeat motifs, respectively. Individual intra-species differences mainly involved the copy number of repeat units. In P. viridis, three cytosine-guanine (CpG) sites with sizes of 440, 1,075 and 537 bp were found to cover nearly the entire IGS sequence, whereas in P. canaliculus, two CpG islands with sizes of 361 and 484 bp were identified. The phylogenetic trees constructed with maximum likelihood and neighbour-joining methods and based on ITS sequences were identical and included three major clusters. Species of the same genus were easily clustered together.


INTRODUCTION
The family Mytilidae comprises a diverse group of bivalves that are broadly distributed in marine environments (Distel, 2000). This family includes mussels of the genus Perna, which encompasses three currently recognised species of intertidal mussels: Perna viridis, P. canaliculus and P. perna. All three species are considered economically, ecologically and environmentally important (Wood et al., 2007). The green mussel P. viridis is a warm water bivalve species distributed along the coasts and estuaries of the Asia-Pacific region. This mollusc species is highly prized both as a food source and an important aquaculture component in southeast Asia (Tan & Ransangan, 2017;Wang et al., 2018). The green-lipped mussel P. canaliculus is endemic to New Zealand, and its range extends from the warm temperate northern waters (35.5 S) to the cold temperate waters south of Stewart Island (47 S) (Wood et al., 2007). The aquaculture industry surrounding the green-lipped mussel has expanded rapidly and now represents three quarters of all aquaculture exports (by value) from New Zealand (Ibarrola, Hilton & Ragg, 2017).
To date, molecular phylogenetic studies of Perna species have been based mainly on mitochondrial cytochrome oxidase I (COI, 623 bp in length) and ribosomal internal transcribed spacer (ITS) (ITS1 & 2, 711 bp) sequence data (Wood et al., 2007;Cunha et al., 2014;Gardner et al., 2016). Although COI and ITS (ITS1 and ITS2) sequences can be used as molecular markers to distinguish Perna species (Wood et al., 2007), these data do not supply sufficient information for Perna classification based on geographical identification or among strains (or even individuals). Within a given locus, short amplicon sequences may include less genetic variation than longer amplicons, which would reduce the ability to distinguish closely related species (Heeger et al., 2018). Therefore, this study aimed to identify a sufficiently informative molecular marker useful for inter-species and intra-species phylogenetic analyses of Perna species.
In eukaryotes, nuclear ribosomal DNA (rDNA) is usually organised in long tandem repeats. These repeats are arranged head-to-tail in large rDNA clusters to form nucleolar organising regions in chromosomes (Dyomin et al., 2016). Each repeated transcribed unit of rDNA comprises a coding region (18S, 5.8S and 28S rRNA genes) separated by two internal transcribed spacers (ITS1 and ITS2). Moreover, each transcribed unit is separated by an intergenic spacer (IGS) which can be subdivided into external transcribed spacers (ETSs) and the non-transcribed spacer (NTS). The coding regions are highly conserved between organisms, and have been selected as genetic markers for higher-level relationships within the molluscan (Distel, 2000;Passamaneck, Schander & Halanych, 2004;Combosch & Giribet, 2016). Whereas non-transcribed regions (ITS1, ITS2 and IGS) exhibit high structural variability (Li et al., 2016;Huang et al., 2017;. The IGS region often contains repeat fragments, promoter and enhancer sites. Both the IGS and its embedded promoters can evolve more rapidly than other parts of the ribosomal repeat unit, which can lead to considerable differences in the sequences between related species and considerable intraindividual variations in length (Huang et al., 2017). The IGS region was shown to be more variable and phylogenetically informative than the ITS (Zhao et al., 2011). The availability of a complete rDNA sequence would provide researchers with various options, depending on the molecular variability. For Perna species, however, the NCBI database only contains partial rDNA sequences (18S and ITS) and a complete rDNA unit has never been reported.
In the present study, the complete sequences of nrDNA in P. viridis and P. canaliculus are reported for the first time, and the molecular features of various rDNA components are characterised, especially IGS. The phylogenetic relationships among members of the family Mytilidae are revealed based on ITS sequences. Comparative analyses are performed with several known complete rDNA sequences to clarify the rDNA relationships among these molluscs and other eukaryotes.

Mussel material and DNA extraction
Perna viridis samples were collected offshore of Leizhou Bay (20.968614 N, 110.325743 E) in China, and P. canaliculus samples were collected from Pegasus Bay (43.312595 S, 172.854629 E) in New Zealand. Three individuals from each species were used in this study. Total genomic DNA was extracted from muscle tissue using the TIANamp Marine Animals DNA Kit (TianGen Biotech Co., Ltd, Beijing, China) according to the manufacturer's instructions.

PCR amplification, cloning and sequencing
The rDNA fragments were amplified using the primers listed in Table 1. The primers used to amplify the 18S and 28S rRNA genes were designed based on partial sequences corresponding to the 18S rRNA genes of P. viridis (EF613234) and 28S rRNA genes of Geukensia demissa (AY145405), respectively. The primers used to amplify the 18S-ITS-28S region were designed from the 3′end of P. viridis (EF613234) and 5′end of G. demissa (AY145405), while the IGS region were designed based on the 3′end of G. demissa (AY145405) and 5′end of P. viridis (EF613234). Each PCR had a total volume of 25 ml and contained the following: one mL of DNA template, 0.5 mL of each primer (10 mmol/L), 12.5 mL of 2 × TransTaq PCR SuperMix and 10.5 mL of H 2 O. The reactions were initially denatured at 95 C for 3 min and subjected to 35 cycles of denaturing at 95 C for 45 s; annealing at 52 C for 18S and 28S rRNA, 50 C for 18S-ITS-28S and 56.5 C for 28S-IGS-18S for 30 s, and extension at 72 C for 1-3.5 min (1 kb/min). Finally, all reactions were subjected to a final extension at 72 C for 7 min. The PCR products were electrophoresed on 0.8% agarose gels, extracted and purified. The purified PCR products were cloned using the pUCm-T Vector Cloning Kit and SanPrep Column Plasmid MiniPreps Kit (Sangon Biotech Co., Ltd., Shanghai, China) according to the manufacturer's instructions. Three positive recombinant plasmid colonies corresponding to each amplified region were picked, cultivated and sequenced by Beijing Ruiboxingke Biotechnology Co., Ltd. (Beijing, China).

Sequence analysis
The complete rDNA unit was assembled using DNAMAN software (Kaukonen et al., 2000), and both sequences have been deposited in the GenBank database (accession numbers: MK419104-MK419109). The boundaries of each region were confirmed using the NCBI database and BLAST software (Johnson et al., 2008). Multiple sequence alignment was performed using T-Coffee software (Notredame, Higgins & Heringa, 2000) with manual adjustments. The general molecular features of P. viridis and P. canaliculus rDNA were calculated using MEGA 7.0 (Kumar, Stecher & Tamura, 2016). Sub-repeat (SR) fragments and inverted repeats of the IGS region were analysed with the Tandem Repeats Finder (Benson, 1999) and Unipro UGENE (Okonechnikov, Golosova & Fursov, 2012), respectively. Predictions of putative transcription initiation sites (TIS) and transcription termination sites (TTSs) in the IGS regions were based on a comparative analysis of the sequences with data for various marine animal species from the literature (Ki, Kim & Lee, 2009;Luchetti, Scanabissi & Mantovani, 2006;Chae et al., 2018;. Cytosine-guanine (CpG) islands were identified using CpGPlot (Polanco & De La Vega, 1994). The sequence identity was confirmed using DNAMAN software. The genetic distance between P. viridis and P. canaliculus was calculated with MEGA 7.0 software according to a Kimura two-parameter model. Nucleotide diversity (p) was calculated using the software DnaSP 5.10 (Librado & Rozas, 2009). For phylogenetic reconstruction, the ITS sequences of P. viridis and P. canaliculus were first determined. Next, nine complete or near-complete sequences from the following Mytilidae species were obtained from the GenBank database: Mytilus edulis (AY695798), M. unguiculatus (former M. coruscus) (MK157201), M. galloprovincialis (JX081670), M. trossulus (JX081669), Aulacomya maoriana (former Aulacomya atra maoriana) (DQ924557), P. perna southern Indian population (former P. indica) (JQ622200), P. perna Omani population (KC692037), P. perna Mauritanian population (former P. picta) (DQ924548) and Modiolus rectus (EF035114). Maximum likelihood (ML) and neighbourjoining (NJ) phylogenetic trees based on ITS sequences were constructed with 1,000 bootstrap replicates using MEGA 7.0 software and the GTR+G nucleotide substitution and Kimura two-parameter models, respectively. Crassostrea ariakensis (EU252081) was selected as the outgroup.

RESULTS
Complete rDNA sequences of P. viridis and P. canaliculus The complete rDNA units of P. viridis and P. canaliculus were determined after sequencing, comparison and assembly. Figure 1 presents a scheme of the structural organisation of the rDNA repeat unit, which includes the following elements in order: 18S-ITS1-5.8S-ITS2-28S-IGS. The complete rDNA units of P. viridis and P. canaliculus varied from 8,432 to 8,616 bp and from 7,597 to 7,609 bp, respectively.
The length, GC content, pairwise identity and variable sites of each region are shown in Table 2. An analysis of the P. viridis and P. canaliculus sequences revealed low levels of genetic variation in the 18S, 5.8S and 28S rRNA regions, which had pairwise identities of 99.56%, 100% and 98.94%, respectively. The variable (V) sites in 18S and 28S rRNA made respective contributions of only 0.83% and 1.22%, indicating that the coding regions (18S, 5.8S and 28S rRNA) were even more highly conserved than the complete rDNA unit (6.25%). Differences between P. viridis and P. canaliculus were observed mainly in non-coding regions (ITS1, ITS2 and IGS). In intra-species comparisons of three individuals per species, the sequence similarities of ITS1 and ITS2 in P. viridis were 99.38% and 99.87%, respectively, and these regions differed by only five and one base, respectively. Among P. canaliculus individuals, both regions had sequence similarities of 100%. The respective sizes of ITS1 and ITS2 were 269 and 263 bp in P. viridis and 304 and 258 bp in P. canaliculus. Differences in the lengths and variable sites of the ITS1 and ITS2 regions resulted in relatively low sequence identities (66.99% in ITS1, 67.03% in ITS2).
Comparisons of the IGS sequences at various levels revealed that the highest (91.84% for P. viridis, 98.31% for P. canaliculus) and lowest similarities (33.32%) were found at the individual and inter-species levels, respectively, and the variable sites between the IGS  sequences of the two species accounted for 15.65% of all sites in this region (Tables 2  and 3). The sizes of the IGS sequences ranged from 1,399 to 1,411 bp and 2,267 to 2,451 bp in P. canaliculus and P. viridis, respectively. This result suggested that the IGS sequences of them had large differences in length and a relatively high level of sequence heterogeneity. Due to heterogeneity in the length of IGSs at the individual and inter-species levels of the genus Perna, the longest IGS sequence of each species was chosen as a representative to outline the canonical structural organisation of the IGSs from P. canaliculus and P. viridis. The detailed analysis of the molecular structure of IGS sequences of them contains six distinct regions: TTS, TIS, NTS, ETS, SR and CpG islands.

Beginning and ending sequences of transcripts, ETS and NTS
A poly (T) tract (5′-TTTTCGTTTGCCTTTTTCGTTCCTTTTTTTTT-3′) was identified in the P. viridis IGS region from 171 to 202 bp. The sequence 5′-TTACTTGT-3′ was detected in the P. canaliculus IGS region from 120 to 127 bp. Both sequences were considered putative TTSs. Accordingly, the positions of the 3′ETS sequences in the P. viridis and P. canaliculus 5′IGS regions were determined to correspond to bp 1-202 and bp 1-127, respectively. The sequence 'TTATTATGTGGAGTGGG' was considered a putative TIS for RNA polymerase I, and the +1 position of the TIS was the C/T nucleotide in P. viridis and P. canaliculus. The 5′ETSs were localised between the putative TIS and the beginning of the 18S rRNA gene and had lengths of 581 bp and 500-503 bp in P. viridis and P. canaliculus, respectively. Non-transcribed spacers comprised the remainder of IGS. The length, nucleotide diversity, intra-and inter-species sequence identity of 3′ETS, NTS and 5′ETS in P. viridis and P. canaliculus were shown in Table 4, comparision with 3′ETS and 5′ETS, NTS had lower intra-and inter-species sequence identity and higher nucleotide diversity. The high level of individual, intra-and inter-species divergence within the NTS sequences was attributed to lower functional constraints. In contrast, the nucleotide sequences were more conserved in the ETS region than in the NTSs, despite differences in the length of the former between P. viridis and P. canaliculus.

Sub-repeat regions and CpG islands in the IGS sequence
Next, the DNA repeats in the IGS sequences were analysed to explore the IGS genetic structure in greater detail. The identification of the numbers and lengths of the repetitive  1 and 2). A comparison of the IGS regions indicated an extremely high level of sequence diversity within the SR region. The P. canaliculus sequence included only one type of SR (length: 131 bp) which was repeated 2.9 times, and three poly (A) tracts in the NTS area. The P. viridis sequence included two repeat motifs: SR1 (length: 93 bp), which was repeated 9.3-11.5 times within the NTS, and SR2 (length: 40 bp), which was repeated 2.5 times within the 5′ETS. Differences between individuals of the same species were due to SR copy number variation.
No inverted repeats were detected in either Perna species. The GC content of the IGS region was slightly higher in P. viridis than in P. canaliculus (Table 2). Similarly, the number and lengths of the CpG islands were greater in P. viridis than in P. canaliculus. In P. viridis, three CpG sites were found to cover nearly the Table 4 The length, nucleotide diversity (p), intra-and inter-species sequence identity of 3′ ETS, NTS and 5′ ETS in P. viridis and P. canaliculus.

Sample
Length (

Phylogenetic tree analysis based on ITS sequences
Maximum likelihood and NJ phylogenetic trees were constructed based on the ITS sequences of Mytilidae species (Fig. 3). Clustering pattern analysis revealed that the NJ and ML trees were identical, both including three major clusters, and species of the same genus clearly clustered together with relatively high supporting values. For both P. viridis and P. canaliculus, the individuals were classified into single clusters with high sequence identity. All Mytilidae species included in the phylogenetic analysis belong to the subfamily Mytilinae, except M. rectus which belongs to the subfamily Modiolinae. During tree formation, M. rectus and A. maoriana initially clustered together, and then clustered with Perna and Mytilus species.

DISCUSSION
In this study, the complete rDNA units of P. viridis and P. canaliculus were sequenced for the first time. The rDNA gene orders in these organisms were identical to those in other eukaryotes. Reported rDNA unit lengths vary among taxa, as shown in Table 5, with values of 43 kb (human) (Gonzalez & Sylvester, 1995) and 45 kb (mouse) for mammals (Grozdanov, Georgiev & Karagyozov, 2003); 13.67 kb for fish (Cyprinus carpio) (Vera et al., 2003); 12.26 kb (Brachiola algerae) (Belkorchia et al., 2008), 9.5 kb (Plasmodiophora     (Ki, Park & Lee, 2011), 7.7 kb (Aurelia sp.1) (Ki, Kim & Lee, 2009) and 9.6-10.7 kb (Haliotis species) (Guo et al., 2017 for marine invertebrates; 7.9-8.9 kb (Oryza sativa) (Fujisawa et al., 2006) and 8.0-8.9 kb (Stipa spp.) (Krawczyk et al., 2017) for land plants and 11.76-12.57 kb (Bangia)  and 13.65 kb (Pyropia yezoensis) (Li et al., 2016) for sea algae. Meanwhile, the rDNA lengths of P. viridis and P. canaliculus were 8.6 and 7.6 kb, respectively. Table 5 compares each region of the rDNA unit between the two Perna species and other eukaryotes. The length of the coding region (18S, 5.8S and 28S rRNA genes) exhibits conservative features. The length of 18S rRNA varies from 1,799 bp to 1,996 bp except for Plasmodiophora brassicae (3,105 bp) and Brachiola algerae (1,391 bp). The sizes of 5.8S rRNA in most eukaryotes are around 160 bp except for Entamoeba invadens (116 bp) and Brachiola algerae. In B. algerae, the ITS size was estimated at 22 bp and the 5.8S rRNA gene was not detected (Belkorchia et al., 2008), while the length of 28S rRNA ranged from 3 to 5 kb. Differences in rDNA lengths among species are mainly attributable to the IGS. According to the database of rDNA unit sequences submitted to GenBank, IGS lengths vary from 0.58 kb in Paramphistomum cervi to over 30 kb in mammals.
This study observed variations in the length and sequences of the IGS region in comparisons at both intra-and inter-species levels. Length differences in this region were largely responsible for variations in the entire rDNA unit lengths of P. viridis and P. canaliculus at both levels and are mainly attributable to differences in indels between individuals. Compared with the entire IGS region and the NTS, the beginning of the 3′ETS and latter half of the 5′ETS sequences were more highly conserved and exhibited greater sequence identity (>65%). All rDNA IGS sequences shared characteristics such as the presence of SR elements. Therefore, heterogeneity in the IGS length can be attributed to duplications or deletions of the SR region, which may be present in different copy numbers in almost all species (Ambrose & Crease, 2011;Huang et al., 2017). In an intra-species comparison of individuals, length variants were mainly attributable to the SR zone; for example, 9.3-11.5 copies of the repeat motif SR1 were present in P. viridis individuals. Huang et al. (2017) speculated that indels are restricted to SR-rich regions. Therefore, length polymorphisms in IGS result from concerted evolution, and unequal crossover between SR elements might be a major driving force underlying the evolution of rDNA units (Ganley & Scott, 1999;Stage & Eickbush, 2007). The repeat fragments were found to be species-specific and rarely adhered to any predictable behaviour or similarities even within a genus. For example, P. viridis and P. canaliculus exhibited totally different repeat patterns, and this phenomenon has also been observed in other eukaryotes (Krawczyk et al., 2017;. Possibly, the repeat pattern could be used as a genetic marker for species identification. The IGS contains several functional elements, including TTS and TIS regions. According to Ki, Kim & Lee (2009;Ki, Park & Lee, 2011),  and Chae et al. (2018), a poly(T) tract on the 5′side of the IGS can be considered a the putative TTS in marine invertebrates. The typical structure of the region near the beginning of the 5′IGS was also observed in P. viridis and P. canaliculus, even though the sequences were not totally identical. The core promoter contains TATA and GGGG boxes, which appear to be a general feature of rDNA transcription, and this site is immediately preceded by an AT-rich region, which is commonly found in both plants and animals (Vera et al., 2003;Wang, Zhao & Li, 2003;Maggini et al., 2008;Krawczyk et al., 2017). However, the beginnings of the P. viridis and P. canaliculus IGS genes contained only two and one TATA-box sequences, respectively, and lacked AT-rich regions. Only one previous report described a similar putative TIS sequence (TTATTATGTGGAGTG GG). Specifically, Krawczyk et al. (2017), after an analysis of Poaceae species, also reported that an AT-rich region upstream of the TIS was not ubiquitous. The rDNA genes are widely used to resolve phylogenetic relationships between species at various taxonomic levels. However, few studies have used IGS to identify species or reveal phylogenetic relationships among mussels. A comparison of the genetic distances of rDNA between P. viridis and P. canaliculus yielded the largest value in the IGS region (0.304-0.313), which suggests that this region could be used to identify species even at a sub-genus level. Given its relatively rapid evolution (compared with other rDNA sequences) and differences in the SR sequences (e.g., copy number, sequence and length) among species, the IGS region may be more suitable than ITS and 18/28S rRNA for reconstructing both inter-and intra-species phylogenetic relationships.
In conclusion, we sequenced the complete rDNA unit of P. viridis and P. canaliculus for the first time. In both species, the structural organisation of the rDNA unit was similar to those of many other eukaryotes. The unit lengths in P. viridis and P. canaliculus were 8,432-8,616 bp and 7,597-7,610 bp, respectively, and variations in the length were mainly attributable to the IGS region. We further investigated the characteristics of the IGS and assessed sequence diversity at intra-and inter-species levels. The Perna rDNA unit provides a structural model of nuclear rDNA for molecular comparisons, particularly among Mytilidae species. These research discoveries will hopefully pave the way for analyses of mollusc population genetics and evolution.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This work was supported by the Shandong Provincial Natural Science Foundation, China (Grant Number ZR2017PEE008). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.