Dataset from de novo transcriptome assembly of Myristica fatua leaves using MinION nanopore sequencer

Myristica fatua is a tropical fruit tree species originating from Indonesia. Very few genomic resources are available for the species. We developed a full-length transcriptome assembly using long-read sequencing (MinION Nanopore technology), which produced 4.3 million reads (3.5 Gb). De novo assembly with the RATTLE program yielded 21,098 transcripts ranging from 201 to 14,174 bp in length, with an N50 of 2,017 bp. The transcripts were annotated against the UniProt database using BlastX, and functional annotation was performed using Blast2GO software. A total of 8,445 contigs containing microsatellite motifs were identified. The raw reads are deposited in the European Nucleotide Archive (ENA) under experiment accession number ERX6798613.


Value of the Data
• These data provide Myristica fatua coding sequences (CDS) as the first transcriptome reference for the species, generated using Oxford Nanopore Technologies long-read sequencing.
• These data could help molecular biologists identify full-length transcripts related to flavonoid biosynthesis for downstream analysis in Myristica fatua and related genera.
• These data provide EST-microsatellite molecular markers that breeders can use to improve breeding programs in Myristica fatua and related genera.
• The raw sequencing data may be used further in differential gene expression studies.

Objective
Myristica fatua is one of the most promising potential spice sources from Indonesia. However, genetic information such as transcriptome data is not yet available for the species. These data were therefore generated to provide transcriptome information from leaves of M. fatua at the seedling stage. The transcripts were obtained using long-read sequencing from Oxford Nanopore Technologies. The resulting full-length transcripts are useful for gene expression analysis.

Data Description
In this dataset, full-length transcripts of Myristica fatua were sequenced using long-read sequencing. High-quality total RNA was extracted from leaves at the seedling stage. Sequencing produced 4.3 million raw reads (3.5 Gb) [2]. The raw reads are deposited in the ENA database under accession number ERX6798613 [3]. Clean reads were obtained by filtering the raw reads with the Pychopper and Cutadapt programs. The de novo assembly was constructed using the RATTLE program and produced 21,098 transcripts [4]. Summary statistics for the reads and the assembled transcripts are given in Table 1. The transcripts were annotated against a filtered UniProt database using the BLAST+ v2.7.1 program [5] and processed with Blast2GO software (Table 2) [6][7][8]. An overview of the Myristica fatua Gene Ontology (GO) classification [9] is presented in Fig. 1a for Biological Process, Fig. 1b for Molecular Function, and Fig. 1c for Cellular Component, along with KEGG pathways [10]. Open reading frames (ORFs) in the transcripts were determined using the TransDecoder program (Table 3) [11]. EST-SSRs in the transcripts were identified, and their distribution determined, using the MISA program (Table 4) [12].
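
Two of the summary quantities reported above, the assembly N50 and the presence of microsatellite motifs in a transcript, can be illustrated with a minimal Python sketch. This is not the pipeline used to produce the dataset (RATTLE and MISA were used for that); it only shows what the numbers mean. The function names are hypothetical, and the minimum-repeat thresholds are an assumption based on the values commonly shipped as MISA defaults, since the thresholds applied to this dataset are not stated here. Only perfect repeats are detected; compound SSRs are not handled.

```python
import re

# ASSUMPTION: motif length -> minimum number of repeats. These mirror the
# commonly used MISA defaults; the thresholds used for this dataset are
# not stated in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit
    (for example, 'AA' is just the mononucleotide 'A' repeated)."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.
    Start positions are 0-based."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-base unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)
```

For example, `n50([2, 3, 4, 5, 6])` returns `5`, and `find_ssrs("TT" + "AG" * 6 + "CT" + "A" * 10 + "C")` returns `[(2, "AG", 6), (16, "A", 10)]`, that is, one dinucleotide and one mononucleotide repeat. Applied to the transcript lengths and sequences of an assembly, the same two functions would give figures of the kind reported in Table 1 and Table 4, although exact counts depend on the thresholds and on how compound repeats are treated.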

Total RNA extraction
Total RNA was extracted from young leaves using the RNeasy PowerPlant Kit (Qiagen) following the manufacturer's protocol. RNA quality and quantity were checked with a NanoPhotometer NP-80 (Implen) and the Qubit™ RNA Broad Range (BR) assay on a Qubit® Fluorometer (Invitrogen).

Ethics statements
Not applicable.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
RujakBase project - Myristica database for Whole Genome and Transcriptome Studies (Original data) (European Nucleotide Archive (ENA)).