Transcriptome Analysis and Differential Gene Expression on the Testis of Orange Mud Crab, Scylla olivacea, during Sexual Maturation

Adequate genetic information is essential for sustainable crustacean fisheries and aquaculture management. The commercially important orange mud crab, Scylla olivacea, is prevalent in Southeast Asia region and is highly sought after. Although it is a suitable aquaculture candidate, full domestication of this species is hampered by the lack of knowledge about the sexual maturation process and the molecular mechanisms behind it, especially in males. To date, data on its whole genome is yet to be reported for S. olivacea. The available transcriptome data published previously on this species focus primarily on females and the role of central nervous system in reproductive development. De novo transcriptome sequencing for the testes of S. olivacea from immature, maturing and mature stages were performed. A total of approximately 144 million high-quality reads were generated and de novo assembled into 160,569 transcripts with a total length of 142.2 Mb. Approximately 15–23% of the total assembled transcripts were annotated when compared to public protein sequence databases (i.e. UniProt database, Interpro database, Pfam database and Drosophila melanogaster protein database), and GO-categorised with GO Ontology terms. A total of 156,181 high-quality Single-Nucleotide Polymorphisms (SNPs) were mined from the transcriptome data of present study. Transcriptome comparison among the testes of different maturation stages revealed one gene (beta crystallin like gene) with the most significant differential expression—up-regulated in immature stage and down-regulated in maturing and mature stages. This was further validated by qRT-PCR. In conclusion, a comprehensive transcriptome of the testis of orange mud crabs from different maturation stages were obtained. This report provides an invaluable resource for enhancing our understanding of this species’ genome structure and biology, as expressed and controlled by their gonads.


Introduction
Orange mud crab, Scylla olivacea is widely distributed along the equator and predominantly found in the Southeast Asia region [1][2][3][4][5]. It is considered as one of the most economically important marine crustacean species in Southeast Asean countries including Malaysia, Thailand, Philippines and Indonesia [2,6,7]. Until now, landing of S. olivacea around Southeast Asia region depends solely on wild fisheries and although small scale aquaculture productions were reported [2], they often only involve fattening of wild-caught marketable-sized crabs with low flesh content or production of soft-shelled crabs from captured juvenile crabs. The overexploitation of wild S. olivacea resources, coupled with habitat loss and pollution, negatively affect its population health and indirectly impact the livelihood of coastal communities as well. One of the ways to help safeguard the natural resources of S. olivacea is to meet the market's demand with farmed animals. In 2014, the estimated world aquaculture production of Scylla species was approximately 183,000 tonnes (Scylla serrata Fact Sheet, Cultured Aquatic Species Information Programme, Fisheries and Aquaculture Department, Food and Agriculture Organization of the United Nations; http://www.fao.org/fishery/species/2637/en [accessed February 20, 2016]). Unfortunately, most of these productions still rely on wild broodstocks and juveniles [2]. Full involvement of S. olivacea in aquaculture is currently still not possible due to the lack of in-depth knowledge in many fields, especially regarding its basic reproductive biology and physiology.
Directly related to sexual maturation and reproduction, testis is responsible for the production of male gametes via spermatogenesis and androgenic hormones. The morphology and ultrastructure of testis and germ cells of Scylla spp., and their histological changes during sexual maturation has been described in detail by Anbarasu et al. [8] and Waiho et al. [9], yet the regulatory mechanism and gene expression in testis during sexual maturation are still poorly understood. Extremely limited molecular studies were conducted on S. olivacea [10,11]. Most studies focus primarily on the maturation of females and tissue-specific gene expression profiles in male S. olivacea are currently unavailable [11]. The limited genome and transcriptome information available for this economically important portunid species hampers the largescale aquaculture of S. olivacea, especially in the field of broodstock selection and artificial seed production.
Transcriptome analysis is able to reveal genes that are being actively expressed in specific tissue and species of interest, and also facilitate the discovery of potential molecular markers. This is in particular useful in non-model organisms where the full genome data is still not available for comparison [12][13][14]. The use of transcriptome analysis has been reported in several economically important aquaculture species [15][16][17][18]. The reproduction-related genes of commercially important crustacean species, such as swimming crab Portunus trituberculatus, Chinese mitten crab Eriocheir sinensis, green mud crab Scylla paramamosain and Oriental river prawn Macrobrachium nipponense were successfully identified via transcriptome sequencing [12,[19][20][21]. To date, the sequencing of whole genome and research involving next-generation sequencing of S. olivacea has yet to be reported. The availability of sufficient genome or transcriptome data are potentially useful for studies on differential gene expressions, gene regulatory mechanisms, and molecular marker application. Present study presents a comprehensive analysis of the transcriptome data derived from testis tissue of S. olivacea in different maturation stages using Illumina HiSeq. An annotated S. olivacea testis transcriptome library was constructed via de novo assembly of sequenced reads. The findings in this study provide an in-depth insight to the changes occurring in the testis of S. olivacea at molecular and genomic level, and could further facilitate future studies on specific functional genes, identification of molecular markers and the construction of detailed genetic map in this species.

Sample collection
Male S. olivacea (carapace width range = 60.0 to 123.0 mm) were obtained from Setiu Wetlands, Terengganu, Malaysia (5˚38'19''N; 102˚46'20''E) during July 2014. Setiu Wetlands is a common fishing ground and no licensing was required for the acquisition of mud crabs. We adhered to the ASAB (2012) "Guidelines for the treatment of animals in behavioural research and teaching" published in Animal Behaviour 83: 301-309. None of the work involved endangered or protected species. All crab handling and experimental procedures were approved by the Ethics Committee of Institute of Tropical Aquaculture, Universiti Malaysia Terengganu in accordance with the "Malaysian code of practice for the care and use of animals for scientific purposes" outlined by Laboratory Animal Science Association of Malaysia. All crabs were transported live back to marine hatchery of Institute of Tropical Aquaculture, Universiti Malaysia Terengganu, Terengganu, Malaysia, disinfected and maintained briefly in filtered sea water before being sacrificed.

RNA extraction and cDNA library preparation
Crabs were categorised into three maturation stages, i.e. immature, maturing and mature, based on their gonadosomatic index (GSI) and gonad external morphologies: immature-GSI = <0.15, vas deferens are translucent and barely visible; maturing-GSI = <0.36, vas deferens are visible, milky white but not enlarged; mature-GSI = >0.40, vas deferens are milky white and swollen [9]. Testes of crabs from all maturation stages were removed and snap frozen in liquid nitrogen, with six samples per stage. Testes were homogenized using mortar and pestle and temperature was maintained low using liquid nitrogen. RNA extraction using Direct-zol™ RNA MiniPrep (Zymo Research, U.S.A) was conducted independently on one sample from each tissue to ensure that RNA extraction method used was able to extract sufficient quantity of high quality RNA. Subsequently, equal amount (25 mg) of the remaining homogenized samples were pooled according to maturation stage (five samples per stage) and total RNA was extracted for each pooled samples. The RNA quality and quantity were assessed using Nano-Drop 2000 (Thermo Fisher Scientific Inc., USA) and Qubit 2.0 RNA Broad Range Assay (Invitrogen, USA) respectively. The RNA integrity number (RIN) of each samples were measured using Agilent Bioanalyzer (Agilent, USA). All samples were selected for sequencing (RIN in the range of 7.4-8.3). RNA were then pooled according to maturation stages. mRNA isolation and cDNA synthesis were performed using NEBNext 1 Ultra™ RNA Library Prep Kit for Illumina 1 according to manufacturer's protocol. The synthesized cDNA was quantified using Qubit 2.0 DNA Broad Range Assay (Invitrogen, USA). A minimum of 10ng cDNA was fragmented using Covaris S220 (Covaris Inc, USA) to a targeted size of 200-300 bp. The fragmented cDNAs were then end-repaired, ligated to NEBNext adapters, and PCR-enriched using NEBNext 1 Ultra™ RNA Library Prep Kit. The final sequencing libraries were quantified using KAPA kit (KAPA Biosystem, USA) on Agilent Stratagene Mx-3005p quantitative PCR (Agilent, USA) and sizes were confirmed using Agilent Bioanalyzer High Sensitivity DNA Chip (Agilent, USA). The resulting sequencing libraries were sequenced using an Illumina flow cell, and 209 cycles on the Illumina HiSeq™ 2000 platform (Illumina, USA). The sequencing run generated a total of 17 GB of raw data. [23]. FastQC assessment reports of sequence reads were used to evaluate read quality before and after pre-processing. All subsequent analyses were conducted using clean reads.
After pre-processing, the clean reads from the data sets were assembled by de novo assembly using Trinity RNA-Seq version 2.0.4 [24]. Reference transcripts were generated by combining all clean reads of the Illumina sequencing data sets. Only one gene (the longest one) was selected to represent the assembled component from each cluster to prevent redundancy [24]. Transcriptome assembly completeness was analysed using BUSCO [25] against a set of 2,675 arthopoda genes to evaluate the quality of the final assembly. All clean reads of de novo assembly sequence data from S. olivacea were deposited in GenBank, National Centre for Biotechnology Information (NCBI, USA, http://ww.ncbi.nlm.nih.gov/) under the Accession No. GDRN00000000 (BioProject Accession No. PRJNA289610).

Functional annotation
Homology searches and assembled transcripts mapping were conducted using Blastx (version: ncbi-blast-2.2.30+) against the UniProt database, Interpro database, Pfam (Protein family) database and Drosophila melanogaster protein database with a cut-off e-value of 1e-5. The top (best) hit from each assembled transcript comparisons were used as the annotation reference for the respective transcripts. The Gene Ontology (GO) terms of S. olivacea were further analysed using Blast2GO software v.2.6.0 [26,27] based on default parameters (e-value < 1e-6, annotation cut-off > 55 and a GO weight > 5).

Single Nucleotide Polymorphism (SNP) calling
For SNPs calling, only reliable, Bowtie mapped reads were considered. Insertion or deletion variations (InDels) were excluded because alternative splicing impedes reliable InDel discovery. SNPs were called using SAMtools mpileup [28]. Genotype likelihoods were computed using SAMtools utilities. Variable positions in the aligned reads were compared to the reference transcripts using the BCFtools utilities. Read depth ! 10, SNP reads/total reads ratio ! 25, SNP quality ! 50 and mapping quality ! 20 were used to filter false positive SNPs by using in-house Perl scripts.

Identification and validation of differentially expressed gene
To identify differentially expressed genes, paired-end reads were first aligned back to the assembled transcripts (length ! 300 bp) using RSEM [29]. Transcripts' abundance was then estimated and alternatively-spliced transcripts were constructed. In some rare cases, these transcripts may be from paralogs that shared high sequence similarity. Differential expression analysis between samples was conducted using edgeR [30]. Expected counts of mapped read pairs were normalized, and the fold changes and p-values for each gene or transcript were calculated. Results were then filtered based on a set of threshold values (log2FoldChange and adjusted P-(P adj ) value < 0.05). For the identification of significantly differentially expressed genes, only genes with padj value of < 1e-10 was considered.
Total RNA from immature, maturing and mature specimens were extracted using Direct-zol™ RNA MiniPrep (Zymo Research, U.S.A) and converted to cDNA using iScript™ Reverse Transcription Supermix (Bio-Rad, USA) as per manufacturer's protocol. Approximately 5 μl of RNA served as template for cDNA conversion and the incubation protocol was: priming at 25˚C for 5 min, reverse transcription at 42˚C for 30 min and inactivation at 85˚C for 5 min. Quantitative real-time polymerase chain reaction (qPCR) was run in Miniopticon Real-time PCR system (Bio-Rad, USA) with SYBR Green PCR Master Mix (Bio-Rad, USA) to validate differentially expressed genes obtained from transcriptome data. Primers were designed using PrimerQuest Tool (Integrated DNA Technologies Inc., Singapore) with housekeeping gene 18S rRNA [31] as internal control (normalization gene) ( Table 1). Three biological replicates and two technical replicates for each maturation stage were run along with internal control in qPCR. Standard manufacturer protocol was applied, with each qPCR reaction (total volume = 25 μl) contained 10 ng cDNA as template. The temperature profile used was initial denaturation at 95˚C for 3 min, followed by 40 cycles of denaturation at 95˚C for 15 s and annealing at 60˚C for 30 s. cDNA template was replaced with diethylpyrocarbonate water in negative control. Comparative Cycle Threshold (C T ) method [32] was used to determine the fold difference of studied gene in different maturation stages. One-Way ANOVA was used to determine statistical difference between maturation stages (significant value at p < 0.05), followed by Tukey's test. All statistical analyses were conducted using Microsoft excel 2013.

Transcriptome sequencing and read assembly
Three cDNA libraries representing different maturation stages (i.e. immature, maturing and mature) of S. olivacea were sequenced using Illumina HiSeq 2000 platform. A total number of 76,337,338, 64,928,802 and 30,841,304 raw reads were obtained from immature, maturing and mature male crabs respectively. Approximately 86.27%, 86.05% and 74.02% of clean reads were retrieved after pre-processing (adaptor removal, quality trimming and N removals) to discard low quality and empty reads ( Table 2). A large number of reads (86.50%) aligned back to the transcripts as expected ( Table 2). Reads that did not map back to the assembled transcripts corresponded to either low quality reads or lowly-expressed transcripts that could not be assembled due to the minimum length requirement (! 300 nt). The assembled transcripts (n = 160,569) had a total size of 142,192,028 bp, an average size of 886 bp, assembled transcript range of 300 bp to 16,041 bp and a N50 assembled transcripts length of 1,225 ( Table 2). Nearly half of (45.55%) of the assembled transcripts were at the length range of 300-499 nt (Fig 1). Approximately 41% (n = 64,793) of assembled transcripts contained proteincoding potential. Busco analysis revealed that 2,045 out of 2,675 genes could be fully annotated (76% completeness) and 2,355 out of 2,645 genes met the criterion for partial annotation (88.04% completeness).

Functional annotation
BLASTx search against the UniProt database, Interpro database, Pfam database and D. melanogaster protein database was conducted to annotate the consensus sequences. Out of 160,569 total number of assembled transcripts, 36,642 (22.82%) transcripts mapped back to UniProt database, 25,511 (15.89%) transcripts mapped back to Interpro database, 23,620 (14.71%) transcripts mapped back to Pfam database and 25,375 (15.80%) transcripts mapped back to D. melanogaster protein database (1e-5 cut-off threshold). A total of 240 transcripts (0.95%) to the D. melanogaster protein database were full length. Approximately 75.32% of the top-hit alignments had a similarity of higher than 40% (Fig 2). Seven out of the top ten organism hits in S. olivacea transcriptome against UniProt database were Arthropods (Table 3). Nevada termite, Zootermopsis nevadensis had the highest matched assembled transcripts percentage (11.38%) followed by water flea, Daphnia pulex (6.67%) and European centipede, Strigamia maritima (4.96%) (Fig 3). Among the annotated transcripts, 480, 56, 8 and 1 transcripts were similar to that of other Scylla species in UniProt database, i.e. S. paramamosain, S. serrata, S. olivacea and S. tranquebarica respectively. The top 20 high quality annotations of S. olivacea transcriptome based on E value and bit score are listed in Table 4. GO terms of S. olivacea transcriptome were analysed using the GO classification system. A total of 19,155 (52%) transcripts were GO-categorized into one of the three GO domains, i.e. biological process (12,250 transcripts), cellular component (11,129 transcripts) and molecular function (26,805 transcripts) while the remaining 17,487 transcripts were unassigned. Fig 4 shows the distribution of transcripts across the top 10 GO terms for each of the three GO domains. The top three categories in the biological process GO domain were "DNA integration" (698 transcripts), "transmembrane transport" (381 transcripts) and "regulation of transcription, DNA-templated" (350 transcripts). In the cellular component GO domain, most of the transcripts were involved in "integral component of membrane" (3185 transcripts), "nucleus" (1406 transcripts) and "membrane" (908 transcripts). "nucleic acid binding", "ATP binding" and "zinc ion binding" were the top three categories in the molecular function GO domain, with a total number of assigned transcripts of 2222, 1794 and 1511 respectively.

Genes associated with growth, development and reproduction
During the annotation process, a number of GO terms associated with growth, development and reproduction processes, especially with the term from the ontology of "multicellular organismal development" (GO:0007275). The child terms and co-occurring terms associated with this parent category are listed in Table 5. The regulators (i.e. proteins) of growth, development and reproduction were identified from S. olivacea transcriptome annotation results ( Table 6).  Identification and validation of differentially expressed gene A total of 200 genes were up-or down-regulated with a P adj value of < 0.05 (Table 7, S2 Appendix). Of these differentially expressed genes, only 69 genes were successfully annotated, while the remaining 65.5% are novel genes. Significant differential expression patterns between  different maturation stages of S. olivacea are clearly seen in the heatmaps (Figs 6, 7 and 8). In general, more differentially expressed genes were found in the comparison involving immature crabs (67 and 106 differentially expressed genes were found for the comparison between immature and mature crabs, and between immature and maturing crabs, respectively) than the comparison between mature and maturing crabs (27 differentially expressed genes). Differentially expressed genes that were annotated (excluding genes encoding for uncharacterized proteins) are tabulated in Table 8 based on the different clustering within each heatmap. However, application of minimum threshold of Padj < 1e-10 revealed only one gene that is likely a potential candidate marker for immature crabs. The most significant differentially expressed gene-the 1515 bp beta crystallin like gene (accession no: GDRN01147796.1) was up-regulated in immature specimens but down-regulated in maturing and mature specimens. No significantly differentially expressed genes with minimum threshold of Padj < 1e-10 were found when comparing mature and maturing specimens. The beta crystallin like gene was validated using qPCR and gene-specific primers (Fig 9).

Discussion
In recent years, the usage of high-throughput sequencing technique to reveal various genomic and genetic information, even in non-model organisms has been steadily gaining momentum [33,34]. In addition, transcriptome sequencing allows the profiling of genes that are differentially expressed under different physiological conditions [35]. Current study used pooled samples to represent each developmental stage for the differential expression analysis as we were interested in the gene expression among stages rather than the inter-individual variation within specific stage. Thus, in this context, pooling minimizes the effects of biological variation (difference among individuals) [36] and highlights the substantive gene expressions expressed during each stage [37]. Konczal et al. [38] reported that when liver transcriptomes of bank voles were sequenced individually and as pooled samples, the accuracy of allele frequency estimation was minimally affected by inter-individual variation in gene expression and that pooled RNA-seq is as accurate as pooled genome resequencing. A total of 17 Gbp transcriptome data consisting of 144,560,704 clean reads were successfully obtained in three runs in present study. The amount of clean reads retrieved were higher than that acquired from the Chinese mitten crab (Eriocheir sinensis, 25,698,778 reads in two runs) [12] and boreal spider The discoveries and annotations of known genes were based on four protein databases, i.e. UniProt, Interpro, Pfam and D. melanogaster protein database. The low number of successful gene annotations (approximately 15-23% hits when compared to the four protein databases) might be due to unavailability of whole genome of the studied crab species and the scarcity of genomic data of closely related organisms in public domains [41]. Using the same next-gene sequencing (NGS) technology, i.e. Illumina HiSeq 2000 platform, approximately 18.62% of clean reads of a non-model organism, the swimming crab (Portunus trituberculatus) were annotated in Swiss-prot [21]. In addition, aquaculture sector and researchers also focus more on female candidates of most commercially important species, resulting in richer genetic information compared to males. The high percentage of unannotated sequences (more than 75%) from the transcriptome data of S. olivacea implies that potentially useful genetic information, especially differentially expressed genes that might be available was missed and remain unexploited. Thus, current transcriptome data might still hold many important genes and valuable genetic information that can be mined in the near future.
In the transcriptome data of S. olivacea, predominant gene clusters were found to be involved in various biological processes (e.g. DNA transcription and signal transduction processes) and molecular functions (e.g. molecular binding activities), in addition to formation of structural component of cells, such as nucleus, membrane and cytoplasm. The consistency of gene distribution based on GO terms and GO categories in the present study with other  -43] showed that genes encoding these functions are rather conserved and easily annotatable from database. Functional annotation and enrichment analysis of GO functions aid in mapping out genes and their potential functions at transcriptomic level. The transcriptome data in present study represents an extensive gene catalog particularly expressed in the testis of S. olivacea, with important role in several biochemical processes such as reproductive development, growth and sexual differentiation. These transcriptome data will be useful for future genomic and gene functional analysis of S. olivacea.
Although the role of gonad in regulating developmental processes in crustaceans with the aid of a variety of regulatory factors (e.g. hormones and neurotransmitters) have been extensively studied [44][45][46], the underlying molecular mechanisms governing their biosynthesis remain largely unexplored [21]. Gene sequences related to growth, sexual differentiation and reproduction were identified in the transcriptome data of S. olivacea. Known for their regulatory role in reproduction in crustaceans [44,47], the identification of crustacean hyperglycemic hormone (CHH) family peptides (Table 6) in this study may aid in providing possible alternatives to the conventional eyestalk ablation methods to promote growth and sexual maturation. Found in our gonad transcriptome of male S. olivacea, neurotrophins (Table 6)   Identified mostly in insects, neuroparsins are multifunctional neurohormones that are antigonadotropic, involved in the regulation of hemolymph lipid and trehalose levels, and in their reproduction development [53,54]. Recently, a crustacean neuroparsin-Metapenaeus ensis neuroparsin (MeNPLP) homologous to the insect neuroparsin was discovered in most major organs of sand shrimp Metapenaeus ensis, including in the hepatopancreas, nerve cord, brain, heart, ovary and muscle. Surprisingly, no expression of MeNPLP was found in the testis. MeNPLP is involved in the ovarian maturation in shrimp as a drop in the production of vitellogenin protein in hemolymph and ovary was observed following the RNAi silencing of MeNPLP [55]. The discovery of neuroparsin gene expression in the testis of S. olivacea ( Table 6) might indicates that the neuroparsin is vital for the development and reproduction of male mud crab but not in shrimp. Similar postulate was proposed to explain the absence of neuroparsin gene in the widely-studied Drosophila melanogaster (Arthropoda, Insecta) genome and that due to different metamorphosis patterns, neuroparsin becomes non-essential in some Drosophila species [56].
VASA gene is vital for germ cell development, proliferation and maintenance and can be found in both invertebrates and vertebrates [57,58]. This gene encodes for RNA-dependent helicase and is specifically expressed in germ cells throughout all developmental stages [59]. The function and regulation of VASA proteins during gonadal development and gametogenesis have been described for several crustacean species [57,[59][60][61][62], including S. paramamosain [31]. Only found to be expressed in the ovary and testis, VASA gene was highly expressed during early gametogenesis of S. paramamosain, with significantly higher expression levels were observed in the testes of immature and maturing males. In contrary, no significant decrease in the expression of VASA gene was found among different developmental stages of S. olivacea ( Table 6, S2 Appendix). This inconsistency of VASA expression was also found in other crustacean species. For example, in Chinese white shrimp Fenneropenaeus chinensis, the expression of VASA gene showed a decrease pattern from spermatogonia to spermatids, and no expression was observed in mature sperm [63]; while VASA RNA was found in the nucleus and cytoplasm of sperms of giant freshwater prawn (Macrobrachium rosenbergii) [60].
Prostaglandins (PGs) are cell-signalling autocoids derived from lipids and some are known to be involved in the reproduction development in crustacean, i.e. the level of PGD2, PGE2 and PGF2 α increased with the progression of vitellogenesis and ovarian developmental stages [21,63]. However, most of the previous studies in crustaceans focused on the involvement of PGs on oogenesis and ovarian development [64][65][66]. In the present study, we identified two PGs, namely PGE2 and PGF (Table 6). Both PGs are known for their regulatory roles during oocyte maturation in animals including crustaceans [64,67,68]. Thus, the findings in this study suggest that PGs might also be involved in the regulation of testicular development in S. olivacea.    Ecdysteroid receptors (EcR) are nuclear receptors that are to be bound and activated by ecdysteroids [69]. They act as ligand-dependent ecdysteroid signalling mediators and upon binding with ecdysteroids, corresponding genes will be actively transcribed and a cascade reaction will be initiated. Although present in all arthropods, the number of hormones and receptor isoforms' structures in crustaceans differ with that of insects'. In crustaceans, ecdysteroids are produced by Y-organs and positively regulate molting, gametogenesis and gonad maturation [21,70]. Some EcR splice variants are organ-specific and they might play different roles although present in both sexes [71]. Four types of EcR were found in this transcriptome, namely EcR, EcR2, EcR3 and putative ecdysteroids/dopamine receptor ( Table 6). As shown by Li et al. [72] in Drosophila, EcRs found in the testis of S. olivacea might also play the same role -maintenance of testis stem cells.
SNPs are potential markers that are frequently used in trait-mapping and whole-genome association studies due to their wide distribution and abundant polymorphisms [73,74]. They serve as potential markers in non-model species lacking full annotated genome sequences [16,75,76]. For example, an ATP-dependent DNA helicase gene, RuvB-like 2, with three SNPs (one exonic and two intronic) was significantly expressed in the ovaries of mature giant tiger shrimp (Penaeus monodon) and influenced overall body weight during ovarian development [77]. In addition, four intronic SNPs in the actin and CHH were reported to influence the growth performance in M. rosenbergii [78]. The mean Ts: Tv ratio (2.22: 1.00) of SNPs reported in current study can aid in the identification of genes affected by selection [76,79]. Studies showed that unlike in fish [75,[80][81][82], the mean Ts: Tv ratio is species-specific in crustaceans. The mean Ts: Tv ratios in M. rosenbergii [83] P. trituberculatus [84], green mud crab (S. paramamosain) [20] and Chinese mitten crab (Eriocheir sinensis) [85] were 1.99: 1.00, 1.00: 1.79, 3.48: 1.00 and 1.00: 1.84, respectively. In addition, the superiority of Illumina HiSeq 2000 over the Roche/454 platform and its potential in the development of SNP markers were highlighted in this study, with approximately eleven-fold increase in the SNPs discovery (13,271 SNPs detected in the testis and ovary of S. paramamosain as reported by Gao et al. [20] in comparison with 156,181 SNPs found in the testis of S. olivacea in this study). The putative SNPs found in this study are useful in various fields of fisheries and aquaculture regarding S. olivacea, such as the study of population genetic structures, conservation of wild population, mapping of economically important traits, and provide resource for potential valuable markers for future selective breeding of S. olivacea.
The availability of transcriptomic data from the testis of S. olivacea found in this study proved to be beneficial, in which soon after approximately 160,000 transcriptome shotgun assembly sequences of S. olivacea were made public in GenBank, our data were mined for putative peptide-encoding transcripts to further understand the peptidergic control systems in S. olivacea and subsequently suggest possible endocrine manipulation to improve its aquaculture production [86]. Being the largest and most diverse class of hormones, peptides function as major signal transducers and essentially regulate behavioural and physiological changes in all aspects, including growth, sexual development, reproduction and metabolism [87][88][89][90]. This mined peptidome identified 49 transcripts encoding putative peptide precursors and subsequently predicted 187 distinct peptides for S. olivacea [86]. Based on the high similarity in peptide structure and the numbers of peptide families found between S. olivacea and S. paramamosain [91], Christie [86] postulated that the physiological roles of these peptides might be conserved in both Scylla species. The precursors of neuropeptides found in this study, e.g. CHH and vitellogenin-inhiting hormone (VIH), are mainly produced by the X-organ-sinus gland complex located at the eyestalk ganglia of S. olivacea [45,92,93]. However, some of the peptide groups, such as the CHH, were reported to be produced and released by non-neuronal tissues (epithelial endocrine cells of the gut) as well in other crab species for the regulation of water and ion during moulting [93]. Thus, the discovery of these putative peptide-encoding transcripts in the testis of S. olivacea suggests that testis might be involved in the production and regulation of reproductive hormones in Scylla spp. and possibly also in other brachyurans or crustaceans more than what we expected. In support of this postulate, a neuropeptide-pigment dispersing hormone (PDH)-encoding transcript, was also found to be produced in the reproductive organs (i.e. ovaries) of S. paramamosain [94].
The reproductive regulatory mechanism and development are complex processes, with testis being the main regulator. The differentially expressed genes found between the testis expression profile of different maturation stages serve as a large candidate database for the mining of novel genes involved in the gonad development, maturation and reproduction in S. olivacea and other crustaceans as more than half (65.5%) of the differentially expressed genes are novel genes (S2 Appendix). Most of the annotated differentially expressed genes (e.g. transposase, prosaposin and aminopeptidase) are involved in general cell regulation and signalling pathways ( Table 8). Genes that regulate growth, maturation and reproduction such as Farne-soic_acid O-methyltransferase and vitellogenin increased in expression in the testis of S. olivacea as the crab matures. Other genes expressed in testis such as Dmrt (reported in S. paramamosain [20] and E. sinensis [95]) and Feminization-1 (FEM-1) (reported in S. paramamosain [20]) that are involved in sex differentiation and testis development were not found in this study. Of a total of 200 genes that were differentially expressed at Padj < 0.05 (S2 Appendix), beta crystallin like gene was the most significant differentially expressed gene at Padj < 1e-10 (up-regulated in immature stage but down-regulated in maturing and mature stages). Thus, this gene serves as a good candidate for a marker of immaturity in crab testis. The beta crystallin domain (Pfam PF00030) is a water-soluble calcium binding domain found in a diverse set of proteins. Proteins within this domain are multifunctional and although primarily found in the eye lens, beta crystallin is also regulated in other sites such as brain and testis [96,97]. Found in all vertebrate classes, beta crystallin is highly expressed during developmental stages, presumably involved in the formation of complex optical properties in the eye lens [98]. In addition, betaB2-crystallin proteins are postulated to be involved in fertility as mutation in betaB2-crytallin gene resulted in subfertile mice in both males and females [99,100]. This gene is found to be upregulated in the testis of mice during the initiation of spermatogenesis [99], similar to the result found in current study. The relationship between beta crytallin proteins and the gonad maturation in invertebrates is still unexplored and this study serves as the first report of this gene in invertebrate and its possible involvement in the gonadal maturation. This finding broadens our understanding on the reproductive biology of invertebrates, particularly crustaceans, as they are known for regulating their reproductive development with the aid of neuropeptides produced in the eyestalk [45]. If their functions remain the same, the beta crystallin like proteins are also likely to be found in the eye lens of crustaceans. Thus, the use of the frequently adopted procedure of eyestalk ablation to promote faster gonadal maturation especially in male crustaceans for aquaculture production need to be reviewed because although eyestalk ablation removes testis inhibiting factors and resulted in the increase in the size of testis and the number of number of mature spermatocytes [101,102], it also removes beta crystallin like proteins, which promotes testicular maturation and the absence of it may influence fertility. The negative effect of eyestalk ablation on the quantity and quality of spawning, and subsequent larvae viability have been reported in female crustaceans [103,104]. Further study on this specific beta crystallin like gene that was found upregulated in immature male S. olivacea might provide more insight on its involvement in crustacean fertility and reproductive development.

Conclusions
The first transcriptome analysis on the testis of orange mud crab (S. olivacea) was carried out successfully and yielded 144,560,704 high quality reads. Present study also demonstrated the usefulness of next generation sequencing (Illumina) in characterizing transcriptome profile and gene expression of non-model organism using tissue-specific samples. Data obtained in present study greatly contributes to the understanding of the gene expression and genome structure occurring within the testis of S. olivacea throughout its developmental stages. Potential SNPs reported in this study is useful for future selective breeding, trait-mapping, and gene localization studies. The discovery and validation of differentially expressed beta crystallin like gene based on the testis transcriptome profiles of S. olivacea show that this particular gene might be suitable to be use as immaturity marker in male S. olivacea in the future.