Dataset of the first de novo transcriptome assembly of the arillode of Baccaurea motleyana

Baccaurea motleyana Müll. Arg. (rambai) is an underutilized fruit native to Indonesia, Thailand, and the Malay Peninsula, and it is mostly cultivated on Java island (Lim, 2012) [1]. The edible part of the fruit is the white or reddish arillode, which tastes sweet to acid-sweet. However, nucleotide and transcriptome information for this species is still scarce, and no information has been deposited in GenBank. In this data article, we report the first de novo transcriptome assembly for this species, generated with paired-end Illumina technology. Contigs were assembled using Trinity and, after filtering and clustering, 37,077 contigs were obtained. Contig lengths ranged from 201 to 4972 bp, with an N50 of 696 bp. The contigs were annotated against several databases: SwissProt, TrEMBL, and the NCBI nr and nt databases. The raw reads were deposited in DDBJ under DRA accession number DRA007358. The assembled transcriptome contigs are deposited in the DDBJ TSA under accession numbers IADP01000001–IADP01037077 and can also be accessed at http://rujakbase.id.



Value of the data
These data provide the first transcriptome of Baccaurea motleyana, obtained from the fruit arillode. They will be useful for developing microsatellite and single nucleotide polymorphism markers for breeding programs in B. motleyana and related genera.
These data will also be valuable for gene expression analyses under different treatments in this species and related genera.

Data
In this data article, a de novo transcriptome assembly of Baccaurea motleyana (rambai) is reported for the first time. Tissue was collected from the reddish arillode of rambai, and high-quality RNA was extracted for 150 bp paired-end Illumina sequencing. High-quality reads were obtained, and de novo assembly was performed using Trinity v.2.4.0 [2]. Statistics of the reads and the assembled sequences are given in Table 1. The contigs were reconstructed using CAP3 [3] and CD-HIT-EST v.4.6.8 [4] to remove redundant contigs, and were then filtered and clustered using Corset v.1.06 [5]. The contigs were annotated against several databases using the BLAST+ v.2.7.1 program [6]. An overview of the transcriptome assembly of B. motleyana is presented in Table 2.

Experimental design, materials, and methods
Fruits of B. motleyana (rambai) cultivar Merah (reddish arillode) were collected at the ripening stage from Mekarsari Fruit Garden. The arillode flesh was used for RNA extraction. Total RNA was extracted using ISOLATE RNA (Bioline) following the manufacturer's protocol. The quality and quantity of the RNA were checked with a P360 NanoPhotometer (Implen, München, Germany). The total RNA was used to prepare a paired-end library for RNA sequencing on the Illumina HiSeq X Ten (BGI, Hong Kong). After sequencing, the raw reads were filtered by removing adaptor sequences, contamination, and low-quality reads. The high-quality reads were used to construct the transcriptome contigs using the Trinity package with default parameters and a minimum contig length of 200 bp. The assembled contigs were processed with CAP3 (-p 90) and CD-HIT-EST (-c 0.90 -M 0 -T 0), and were clustered with Corset after filtering out low-expression reads below 1 CPM. The nt and nr databases from NCBI and the SwissProt and TrEMBL databases from UniProt were used to annotate the contigs using the BLAST program with a cut-off of 10⁻⁵.
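
Two of the quantities used above, the N50 of the assembly and the 1 CPM expression threshold, can be illustrated with a minimal Python sketch. This is not the pipeline used to produce the dataset: the function names, the input structures, and the assumption that CPM is computed per contig from the mapped read counts of a single library are hypothetical and serve only to show the arithmetic.

```python
from typing import Dict, Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty input


def filter_by_cpm(counts: Dict[str, int], min_cpm: float = 1.0) -> Dict[str, float]:
    """Return the CPM of every contig whose CPM is at least min_cpm.

    Assumption: CPM = read count * 1,000,000 / total mapped reads,
    computed within a single library (hypothetical input layout).
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    cpm = {name: count * 1_000_000 / total for name, count in counts.items()}
    return {name: value for name, value in cpm.items() if value >= min_cpm}


if __name__ == "__main__":
    # Toy values, not taken from the dataset.
    print(n50([100, 200, 300, 400, 500]))
    print(filter_by_cpm({"contig_a": 1, "contig_b": 1_999_999}))
```

In practice, the contig lengths would be read from the assembled FASTA file and the read counts from the output of a read-mapping step; both inputs are left abstract here because the article does not describe them in that detail.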

Data accessibility
All raw data and sequences have been deposited in DDBJ under accession number DRA007358, and the assembled contigs have been deposited in the Transcriptome Shotgun Assembly (TSA) database under accession numbers IADP01000001–IADP01037077. The contigs can also be downloaded at http://rujakbase.id/content/download.