Introduction

The red crab, Charybdis feriata (Crustacea: Decapoda: Portunidae) (Linnaeus, 1758), also known as the crucifix crab, is a large swimming crab species that is broadly distributed in the Indo-Pacific sea areas, including Japan, China, Indonesia, Australia, India, Pakistan, Oman and South Africa1,2. C. feriata can be easily distinguished by its striking red and white colour pattern and the distinct cross on the median surface of its carapace3. Young crabs usually live in the sandy shore, whereas adults prefer to inhabit the areas of sandy and muddy bottoms at depths from 30 m to 60 m4. Given its fast growth speed, large size, good flavor and high market demand, this species is considered as one of most valuable fisheries resources and has become a potentially important target for aquaculture, domestication and stock enhancement5,6. The wild females usually weigh 200 - 350 g, but the males can grow up to 1 kg7. In the last few decades, the catching production and the wild resources of C. feriata have been decreasing on a yearly basis8 due to over-exploitation and environmental deterioration. Despite its economic importance, studies have been limited to reproductive biology7,8, larvae characteristics9,10 and fishery biology11. Little information could be available about the germplasm resource and population genetic structure for this species by now, except that moderate variation was reported for two wild populations sampled from Shanwei City and Zhoushan City, China based on the cytochrome c oxidase subunit I (COI) gene and microsatellite markers, respectively12,13. To estimate and conserve this important crab resource, genetic studies such as population genetic diversity and evolutionary history should be carried out.

The genus Charybdis (De Haan, 1833) is an important group in family Portunidae. For a long time, the taxonomic status of this genus has remained controversial. Several studies have recommended to classify it into subfamily Portuninae of family Portunidae14,15,16. However, some other studies suggested to assign it to subfamily Thalamitinae of family Portunidae17,18. Molecular studies based on COI, 16S rRNA and RAPD19,20 supported the latter opinion. However, the phylogenetic analysis based on 13 protein-coding genes from the mitochondrial genome21 supported the former opinion. Thus, in order to better solve this problem, more studies need to be carried out in the future.

Mitochondrial genome is a typically closed-circular molecule ranging approximately from 14 to 18 kb in size and it consists of 13 protein-coding genes, 22 transfer RNA (tRNA) genes, 2 rRNA genes and a control region. It is thought to be an ideal marker for studies on population genetic diversity, molecular phylogeny and species identification because of its high mutation rate, simple structure, abundant distribution and maternal inheritance22,23,24. Thus far, complete mitochondrial genomes have been reported in many crustacean species, such as Litopenaeus vannamei25, C. japonica21, Scylla serrata26 and S. paramamosain23. Although several mitochondrial gene sequences of C. feriata are present in the GenBank database, the complete mitochondrial genome information is still not available by now. The lack of complete mitochondrial genome has limited the development of population genetic diversity and molecular evolution for this species.

The purpose of this study is to report the complete mitochondrial genome for C. feriata, elucidate its evolutionary status and estimate population genetic diversity and differentiation. This work should be helpful to better understand the evolutionary status and population genetic diversity of C. feriata and other related crustacean species.

Materials and methods

Sampling and DNA extraction

A total of 70 wild individuals of C. feriata were sampled from the southeastern coasts of China, with 21 from Hainan Island (named HN), 24 from the city of Xiamen (named XM) and 25 from the city of Zhoushan (named ZS) (Fig. 1). Animals were killed by a lethal dose of MS-222. Muscle tissues were collected and fixed in 99% ethanol at room temperature. Genomic DNA was extracted using the traditional proteinase K and phenol-chloroform extraction protocol as described by Ma27.

Figure 1
figure 1

Three localities of Charybdis feriata collected in this study.

HN, Hainan locality; XM, Xiamen locality; ZS, Zhoushan locality. This map was created by the software Adobe Photoshop 7.0.

Primers, PCR and sequencing

First, partial sequences of three genes (12 S rRNA, 16 S rRNA and COI) of C. feriata and the complete mitochondrial genomes of three closely related crab species (C. japonica, S. paramamosain and Portunus trituberculatus) were downloaded from the GenBank database. Then the three genes were confirmed by resequencing. Next, the complete mitochondrial genome of C. feriata was generated by overlapping PCR with specific or degenerate primers (Supplementary Table 1) and sequencing. Furthermore, the complete COI gene sequence was employed to evaluate the population genetic diversity and genetic differentiation of C. feriata population. A pair of primers (COI-f: 5′–AATAAGAAAGTTAATAACTTGTGTT–3′ and COI-r 5′–GAAGAAAAGTATCTTCCTAGTAGG–3′) with an anealing temperature of 52 °C were successfully designed. Seventy individuals collected from three localities (HN, XM and ZS) were evaluated in this study.

PCRs were carried out in a 25 μL volume that included 0.4 μM each primer, 0.2 mM each dNTP, 1 × PCR buffer, 1.5 mM MgCl2, 0.75 unit Taq polymerase and approximately 100 ng template DNA at the following conditions: one cycle of denaturation at 94 °C for 4 min; 37 cycles of 30 s at 94 °C, 50 s at a primer-specific annealing temperature (Supplementary Table 1) and 50 s at 72 °C. As a final step, the products were extended for 7 min at 72 °C. The PCR products were separated on 1.5% agarose gels and directly sequenced in both directions by using an ABI Prism 3730 automated DNA sequencer (PE Corporation). The sequences were edited and assembled using two softwares, EditSeq and SeqMan (DNASTAR).

Complete mitochondrial genome analysis

The graphical map of the complete mitochondrial genome (Fig. 2) was drawn using the online software OrganellarGenomeDRAW (http://ogdraw.mpimp-golm.mpg.de/)28. The genome structure was determined by sequence comparisons with the known complete mitochondrial genomes of the closely related species, including S. paramamosain23 and C. japonica21. tRNAs were identified by their proposed clover-leaf secondary structure and anticodons by using the web-based tRNA-scan SE 1.21 program (http://lowelab.ucsc.edu/tRNAscan-SE/)29 with default search mode. Protein-coding genes were translated into amino acids by using the software MEGA 4.030. The codon usage of protein-coding genes and the nucleotide composition of the mitochondrial genome were also determined using MEGA 4.0. Finally, the complete mitochondrial genome DNA sequence was deposited into the GenBank database by using the software Sequin 12.30 (http://www.ncbi.nlm.nih.gov/Sequin/).

Figure 2
figure 2

Graphical map of complete mitochondrial genome of Charybdis feriata.

Genes encoded by the heavy strand were shown outside the circle and encoded by the light strand were shown inside the circle respectively. The inner ring showed the GC content of this genome.

Phylogenomic analysis

The complete mitochondrial genomes of 19 species under Decapoda were downloaded from the GenBank database, including Charybdis japonica (FJ460517), Callinectes sapidus (NC_006281), Eriocheir japonica (NC_011597), Eriocheir hepuensis (NC_011598), Eriocheir sinensis (NC_006992), Geothelphusa dehaani (NC_007379), Fenneropenaeus chinensis (DQ518969), Litopenaeus vannamei (DQ534543), Pagurus longicarpus (NC_003058), Macrobrachium rosenbergii (NC_006880), Marsupenaeus japonicus (NC_007010), Penaeus monodon (NC_002184), Portunus trituberculatus (AB093006), Panulirus japonicus (NC_004251), Pseudocarcinus gigas (NC_006891), Scylla serrata (NC_012565), Scylla tranquebarica (NC_012567), Scylla olivacea (NC_012569) and Scylla paramamosain (JX457150). Furthermore, one species, Harpiosquilla harpax (NC_006916), was used as an outgroup taxon.

Protein-coding genes were aligned using Clustal W in MEGA 4.0 with default settings. As a result, gene ND6 showed high heterogeneity that consistently causes poor phylogenetic performance31. Thus, the remaining 12 protein-coding genes alignments were concatenated to a single multiple sequence alignment. Then the multiple sequence alignment was formatted and analyzed using RAxML web-servers (http://embnet.vital-it.ch/raxml-bb/index.php)32. The CAT model was used to estimate the evolutionary rate of the 12 protein-coding genes. Maximum likelihood (ML) search was carried out after bootstraps. The phylogenetic tree was drawn by the software FigTree v1.4.2.

Population genetic analysis

Haplotypes were identified using software Dna SP version 5.033 and deposited into the GenBank database. For each locality and overall locality, haplotype diversity (h) and nucleotide diversity (π) were calculated using Dna SP version 5.0. Molecular variance (AMOVA) analysis was carried out using software Arlequin version 3.1134 to explain the genetic structure and differentiation among these three localities. Significant level of the test was assessed using 1000 permutations of each pairwise comparison. Neutrality tests including Ewens-Watterson35,36, Chakraborty37, Tajima D38 and Fu’s Fs39 with 1000 permutations were performed using software Arlequin version 3.11. Mismatch analysis40 with 10000 bootstrap replicates were also performed using Arlequin 3.11. The histogram of mismatch distribution was constructed using the software Network version 4.6.1.2 (http://www.fluxus-engineering.com/)41. The median-joining network of haplotypes was also constructed using software Network version 4.6.1.2. The rough time of population expansion was estimated using the following equation t = τ/2u42, where t is the time since population expansion, τ is the mutational time scale, which is calculated using software Arlequin version 3.11 and 2u is calculated using the equation 2u = μ × length of sequence × generation time, where μ is the mutation rate. Given the lack of a calibrated mutation rate of COI gene of C. feriata, the mutation rates of COI gene ranging from 1.66% to 2.33% per million years of Sesarma43 were used. In addition, a generation time of one year was also used.

Results and Discussion

Genome organization

The mitochondrial genome of C. feriata was a typically circular molecule with 15,660 bp in size (GenBank accession no. KF386147) and consisted of 13 protein-coding genes, 22 tRNA genes, two rRNA genes and a control region (Fig. 2) as found in most metazoan species such as Lutra lutra44, C. Japonica21 and S. paramamosain23. This genome was slightly smaller than those of most sequenced crab species under Decapoda but larger than that of P. gigas, whose size was 15,515 bp21,23,26,45. Such a small genome was mainly due to the short size of the control region (762 bp). Moreover, the lengths of other regions of the genome were approximately equal among these species. Heavy strand (H-strand) encoded 23 genes, whereas light strand (L-strand) encoded the remaining 14 genes. The arrangement and order (Table 1) of the 37 genes were completely identical to those of reported species of Portunids, such as C. japonica21 and S. paramamosain23. However, the location of tRNAHis (Table 1) was different from that of most arthropods. In most arthropods, tRNAHis was between NAD4 and NAD5, whereas it was found between tRNAGlu and tRNAPhe in C. feriata. The phenomenon of gene rearrangements in mitochondrial genome was a relatively common event in crustacean species25. The tandem duplication of gene regions was thought to be the most acceptable mechnism of mitochondrial gene rearrangement, in this case, the slipped-strand mispairing took place first and then followed by gene deletions46.

Table 1 Gene structure of mitochondrial genome of Charybdis feriata.

Ten gene overlaps and ten intergenic spacers were found in mitochondrial genome of C. feriata, most of them have been reported in many other species mitochondrial genomes23,24,47. The total length of overlaps and intergenic spacers were 24 and 105 bp, with ranges from 1 to 7 and from 2 to 27 bp, respectively. The overall A + T content of C. feriata mitochondrial genome was 70.15%, which was similar to those of C. japonica, S. olivacea, S. serrata and P. trituberculatus (Table 2). However, different regions had different A + T contents. The control region had the highest A + T content (78.74%), whereas the protein-coding region had the lowest A + T content (68.60%).

Table 2 Comparison of mitochondrial genomes of partial crustacean species.

Protein-coding genes

A total of 13 protein-coding genes were identified, of which 9 (COI, COII, COIII, APT6, ATP8, ND2, ND3, ND6 and Cyt b) were encoded by H-strand and 4 (ND1, ND4, ND4L and ND5) were encoded by L-strand. These genes consisted of 11,182 bp in length and coded 3716 amino acids in total. All 13 genes were initiated by the start codon ATN (ATG, ATA and ATT), with an exception (GTG) in ATP8 (Table 1). The typical stop codon (TAA or TAG) was detected in nine protein-coding genes (ATP6, ATP8, ND1, ND2, ND3, ND4, ND4L, ND5 and ND6), whereas the remaining four genes (COI, COII, COIII and Cyt b) were ended by incomplete stop codons (T-- or TA-). Variable start codons and incomplete stop codons have been reported in many other mitochondrial genomes. For example, four types of start codons (ATT, ATG, ATA and ACG) were detected in mitochondrial genome of Myrmeleon immanis48. Two and one incomplete stop codons were found in Lutjanus russellii and S. paramamosain mitochondrial genomes23,49. For the incomplete stop codon, the missed nucleotides may be produced by post-transcriptional polyadenylation50. In addition, ND6 had the highest A + T content (72.78%), whereas COIII had the lowest A + T content (64.85%) (Table 3).

Table 3 The base composition for different regions of mitochondrial genome of Charybdis feriata (the genes which are encoded by the L-strand are converted to complementary strand sequences).

Transfer and ribosomal RNA genes

A total of 22 tRNA genes ranging from 64 to 74 bp in length were identified from the mitochondrial genome of C. feriata. All of them were capable of folding into a typically clover-leaf secondary structure (Fig. 3). In the closely related crabs C. japonica and S. paramamosain, tRNASer (AGN) could not form a secondary structure because it lacked the dihydrouracil (DHU) arms21,23. Fourteen tRNA genes were located on H-strand, whereas the remaining eight were located on L-strand. All tRNA genes had a common length of 7 bp for the aminoacyl stem and an invariable size of 7 bp for the anticodon loop. Variable nucleotide lengths of tRNAs were found at the DHU, TΨC and anticodon arms. All these 22 tRNA genes possessed the common anticodons of Decapods mitochondrial genomes, except that tRNALys and tRNASer (AGN) possessed TTT and TCT anticodons rather than CTT and GCT, respectively. Seven unmatched base pairs were found in 22 tRNA genes, which was lower than the number detected from tRNA genes of S. paramamosain23. The overall A + T content of 22 tRNA was 71.76%, with the highest content (84.62%) in tRNAAsp and the lowest content (57.35%) in tRNALys.

Figure 3
figure 3

Putative secondary structures of 22 tRNAs detected from mitochondrial genome of Charybdis feriata.

Both 16S and 12S rRNA genes were found on the L-strand of the mitochondrial genome. They were located between tRNALeu (CUN) and the putative control region and were separated by tRNAVal. The sizes of 16S rRNA and 12S rRNA genes were 1321 and 843 bp and the A + T contents were 74.26% and 71.89%, respectively.

Non-coding regions

A total of 11 non-coding regions were identified in the mitochondrial genome of C. feriata. The major non-coding region (762 bp in length) was found between 12S rRNA and tRNAIle, which was considered to be the putative control region. The other 10 non-coding regions were small, ranging from 2 to 27 bp in length. The A + T content of control region was higher (78.74%) than that of other regions in mitochondrial genome. The high rate of A + T content was due to the existence of A/T repeated motifs. In control region, TA, AT, TAA, AAT and TTA were found to be the most abundant motifs. Additionally, microsatellite sequences were detected, such as (TA)3, (AT)3, (TA)4, (AT)4 and (TA)11. Microsatellites were also identified from control region of the mitochondrial genome in M. immanis and Nymphes myrmeleonoides48. In addition, the nucleotide composition of control region in this study was 41.99% for A, 7.87% for G, 36.75% for T and 13.39% for C (Table 3).

Phylogenetic relationship

The taxonomic status of genus Charybdis within Portunidae has been a highly contentious issue for a long time. In this study, we estimated the evolutionary relationship of C. feriata within Decapoda by reconstructing a phylogenetic tree. This tree was created based on 12 concatenated protein-coding genes from the mitochondrial genome of 21 species. From the tree topologies (Fig. 4), we found that C. feriata and C. japonica first formed a monophyletic group and showed the closest relationship to each other. Together with C. sapidus and P. trituberculatus, they then formed another monophyletic group. Furthermore, these four species showed a sister relationship with another four species of genus Scylla. Thus, our results supported the opinion of classifying genus Charybdis into subfamily Portuninae of family Portunidae. The same suggestion was also proposed by Liu and Cui21.

Figure 4
figure 4

Phylogenetic relationship of Decapoda species based on 12 protein-coding genes.

Population genetic diversity and differentiation

The complete COI gene sequence (1534 bp) was employed to estimate the genetic diversity and differentiation of C. feriata population. A total of 33 haplotypes (Table 4) were identified from 70 individuals, of which 14 were from HN locality, 11 from XM locality and 16 from ZS locality. H9 was the most abundant haplotype, which was present in each locality. A high level of genetic diversity (Table 5) was found with h ranging from 0.819 to 0.867 and π ranging from 0.0011 to 0.0013 per locality. The h value was slightly higher than that (0.787) reported by Huang12. In our previous study, moderate genetic variation of C. feriata was detected by microsatellite markers13. Moreover, a high genetic diversity was also observed in other marine animals, such as S. serrata51 and Salmo salar52. The following factors, including life history charateristics, environmental heterogeneity and large population sizes, may help in maintaining a high level of genetic diversity among marine animals53,54. AMOVA analysis indicated that 99.77% of the total genetic variation was contributed by within-localities variation, whereas only 0.23% was caused by among-localities variation (FST = 0.0023, P > 0.05) (Table 6). In addition, no significant genetic differentiation was found among three localities (Table 7). The above analysis showed that the genetic differentiation among three localities of C. feriata was at a low level, suggesting a single population in East China Sea and South China Sea. The summer southwest monsoonal wind and winter northeast monsoonal wind drive seasonal ocean current circulation between South China Sea and East China Sea55,56. This homogenous population structure of C. feriata might be related with the ocean current circulation and the high larval dispersal ability of this crab species.

Table 4 Polymorphic sites of 33 haplotypes of complete COI gene of Charybdis feriata., identical.
Table 5 Genetic diversity of Charybdis feriata from three localities.
Table 6 Analysis of molecular variance (AMOVA) of three localities of Charybdis feriata.
Table 7 Genetic distance (below diagonal) and fixation index (FST) (above diagonal) among three localities of Charybdis feriata.

Neutrality tests, mismatch analysis and population expansion estimation

Four kinds of neutrality tests, including Ewens-Watterson, Chakraborty, Tajima D and Fu’s Fs, were carried out in C. feriata. All of them, except Ewens-Watterson, suggested a significant deviation from mutation-drift equilibrium (Table 8). Mismatch distributions analysis (Table 9) indicated that the estimated effective population size after population growth was significantly larger than that before population growth. In addition, a star-like topology was produced based on 33 haplotypes (Fig. 5). Of these haplotypes, H9 was located in the center of this topology and it was closely linked with the majority of other haplotypes, thereby suggesting that it is the ancestral haplotype in C. feriata. All above analysis indicated that C. feriata might have undergone a population expansion event. Meanwhile, the high h (between 0.819 and 0.867) and low π (between 0.0011 and 0.0013) also suggested that C. feriata underwent population expansion after a period of low effective population size. Sudden population expansion can affect population genetic diversity and haplotypes and in this process more haplotypes were generated by mutation than were removed by genetic drift57. We further deduced that the population expansion event of C. feriata might have occurred between 61,498 and 43,814 years ago. This period of population expansion is a little bit later than the Last Interglacial complex (140-75 kya)58. This findings showed that the Last Interglacial complex might have played an important role in demographic history of C. feriata. Further, a big changes of nutrient concentrations and sea water temperature could affect the population distribution of marine orgnisms too59.

Table 8 Neutrality tests results of Charybdis feriata.
Table 9 Mismatch distributions analysis of Charybdis feriata.
Figure 5
figure 5

The median-joining network of 33 haplotypes of complete COI gene sequence of Charybdis feriata.

XM, Xiamen locality; HN, Hainan locality; ZS, Zhoushan locality.

Conclusion

This study first described the complete mitochondrial genome of C. feriata, which was 15,660 bp in length, including a typical set of 37 genes and a control region. Phylogenomic analysis results supported that genus Charybdis should be classified into subfamily Portuninae rather than into subfamily Thalamitinae. Furthermore, a high level of genetic diversity and a low level of differentiation of C. feriata were found and a population expansion event was deduced to have occurred between 61,498 and 43,814 years ago. This study should be helpful for studies on evolution and phylogeny, population genetic structure and conservaton genetics for C. feriata and related species60,61,62.

Additional Information

How to cite this article: Ma, H. et al. First mitochondrial genome for the red crab (Charybdis feriata) with implication of phylogenomics and population genetics. Sci. Rep. 5, 11524; doi: 10.1038/srep11524 (2015).