Data on RNA-seq analysis of Garcinia mangostana L. seed development

Abstract
Mangosteen (Garcinia mangostana L.) has exceptional potential for commercial and pharmaceutical applications owing to its delicious fruit and medicinal properties. Nevertheless, the molecular mechanisms of mangosteen seed development are poorly understood. In this study, we performed transcriptomic analysis of four seed developmental stages: eight, ten, twelve and fourteen weeks after anthesis. An Illumina HiSeq™ 4000 sequencer was used to generate approximately 68 Gb of raw data. From 451,495,326 raw reads, 406,143,756 clean reads were obtained. The raw data were deposited in the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA395504. These data provide a basis for further exploring and understanding the molecular mechanisms of mangosteen seed development.

Value of the data
These data, generated by Illumina sequencing, constitute the first report of RNA-seq of mangosteen seed at different developmental stages (eight, ten, twelve and fourteen weeks after anthesis).
They permit the identification of differentially expressed genes that may play important roles in mangosteen seed development.
Transcriptomic analysis provides a foundation for elucidating the molecular regulation of mangosteen seed development. The data will be valuable for further investigation and discovery of putative genes and proteins involved in mangosteen seed development.

Data
This dataset consists of raw reads from mangosteen seed at four developmental stages: eight, ten, twelve and fourteen weeks after anthesis. The reads were subsequently assembled de novo into a full-length transcriptome.

Plant materials
Mangosteen fruits were obtained from mangosteen plots at Universiti Kebangsaan Malaysia, Bangi (2°55′09.0″N 101°47′04.8″E). Mangosteen flowers were labelled at anthesis during the flowering season (March to April 2014). During the fruiting period (June to August 2014), fruits were harvested at eight, ten, twelve and fourteen weeks after anthesis, representing different developmental stages, and their seeds were collected. Seed samples were stored at −80 °C and ground to a fine powder prior to analysis.

Total RNA extraction and quality control, library preparation and transcriptomic service
Total RNA was extracted from seeds [1,2] using a modified CTAB method [3]. For quality control, total RNA quantity, quality and integrity were determined with a NanoDrop spectrophotometer (Thermo Fisher Scientific Inc., USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, USA). Samples with an RNA integrity number (RIN) of approximately 8.0 or higher were selected for library preparation and sequencing. mRNA libraries were prepared using the SureSelect Strand-Specific RNA Library Prep for Illumina Multiplexed Sequencing protocol (version E0, March 2017). RNA-seq was then performed on an Illumina HiSeq™ 4000 (Theragen Etex, South Korea), generating 150 bp paired-end reads (Table 1).
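As a rough consistency check on the sequencing figures, the sketch below multiplies the total raw read count from the abstract by the 150 bp read length. It rests on two assumptions that the text does not state: that the 451,495,326 raw reads count each mate of a pair separately, and that every untrimmed raw read is exactly 150 bp. The constant and function names are illustrative only.

```python
# Back-of-the-envelope check: raw read count x read length vs. the
# "approximately 68 Gb" of raw data quoted in the abstract.
# ASSUMPTION: the raw read total counts each mate of a pair separately,
# and every raw (untrimmed) read is the full 150 bp.

RAW_READS = 451_495_326  # total raw reads reported for all four stages
READ_LENGTH_BP = 150     # length of each mate in the paired-end run


def total_bases(n_reads: int, read_length: int) -> int:
    """Total sequenced bases, assuming every read has the same length."""
    return n_reads * read_length


def to_gigabases(n_bases: int) -> float:
    """Convert bases to gigabases (1 Gb = 1e9 bases)."""
    return n_bases / 1e9


if __name__ == "__main__":
    bases = total_bases(RAW_READS, READ_LENGTH_BP)
    print(f"{bases:,} bases = {to_gigabases(bases):.1f} Gb")
```

Under these assumptions the product is about 67.7 Gb, in line with the approximately 68 Gb of raw data reported in the abstract.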

De novo transcriptome assembly
The quality of the raw reads was assessed using FastQC version 0.11.2 [4]. High-quality reads were then obtained by trimming adapters and other unwanted sequences with cutadapt version 1.9.1 [5] and by filtering the reads with an in-house script from Theragen Etex Bio Institute, Republic of Korea (Table 2). The reads were assembled de novo [7] using Trinity version 2.1.1 [6] with default settings, and TIGR Gene Indices clustering tools version 2.1 (identity threshold 0.94) [8] were used to remove redundant sequences and cluster the remainder into a non-redundant unigene set. A total of 101,384 unigenes were obtained, with an average length of 784 bp (Table 3).

Table 2. Statistics of raw and clean reads and bases of the mangosteen seed development transcriptome. Mangosteen seed developmental stages: eight (W8), ten (W10), twelve (W12) and fourteen (W14) weeks after anthesis.
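The read-retention and unigene summary statistics in this section reduce to simple arithmetic. The Python sketch below is illustrative only: it does not replicate FastQC, cutadapt, the Theragen Etex in-house filter, Trinity or the TIGR clustering tools. It computes the clean-to-raw read ratio from the totals given in the abstract, plus the number and mean length of sequences in a FASTA file. The function names and the toy FASTA records are hypothetical.

```python
import io
from typing import Iterable, List, Optional


def retention_fraction(raw_reads: int, clean_reads: int) -> float:
    """Fraction of raw reads that remain after trimming and filtering."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return clean_reads / raw_reads


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Length of every record in a FASTA stream (multi-line records allowed)."""
    lengths: List[int] = []
    current: Optional[int] = None  # length of the record being read, if any
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # add this sequence line's bases
    if current is not None:
        lengths.append(current)  # close the last record
    return lengths


def mean_length(lengths: List[int]) -> float:
    """Arithmetic mean of sequence lengths."""
    if not lengths:
        raise ValueError("no sequences")
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Totals reported in the abstract (all four stages combined).
    frac = retention_fraction(451_495_326, 406_143_756)
    print(f"clean/raw reads: {frac:.2%}")

    # Hypothetical toy FASTA standing in for the unigene file.
    toy_fasta = io.StringIO(">unigene_1\nACGT\nACG\n>unigene_2\nACGTACGTAC\n")
    lens = fasta_lengths(toy_fasta)
    print(len(lens), "sequences, mean length", mean_length(lens), "bp")
```

Applied to the reported totals, roughly 90% of the raw reads were retained as clean reads. Run on the final unigene FASTA instead of the toy input, the same length functions could be used to recompute the unigene count and average length summarized in Table 3.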