Datasets for genome assembly of six underutilized Indonesian fruits

Indonesia has a high genetic diversity of tropical fruits, but genomic studies of them remain very limited. In this data article, we report estimated genome sizes and partial genome assembly data for six underutilized Indonesian fruits: Artocarpus nangkadak (Artocarpus heterophyllus x Artocarpus integer), Salacca sumatrana, Flacourtia inermis, Lansium domesticum, Pometia pinnata, and Syzygium samarangense. These genome data may be used to develop molecular markers for plant systematics and for breeding programs of these species. Paired-end libraries were sequenced on the BGISeq-500 platform, generating approximately 5 Gb of sequence per species. The raw sequences have been deposited in the DNA Data Bank of Japan (DDBJ) under the umbrella BioProject accession number PRJDB7265 and in the DDBJ Read Archive under the following accession numbers: Artocarpus nangkadak (DRA007398), Salacca sumatrana (DRA007394), Flacourtia inermis (DRA007395), Lansium domesticum (DRA007393), Pometia pinnata (DRA007396), and Syzygium samarangense (DRA007397).



Value of the data
These data provide genomic resources for six underutilized Indonesian fruits for use in genetic studies and breeding programs.
These data will be useful for developing molecular markers, such as microsatellites and single nucleotide polymorphisms, for breeding and selection of new cultivars of the six underutilized Indonesian fruits.
These data will also be valuable for more complex studies of plant systematics at the species and genus levels.

Data
Many edible tropical fruits are native to Southeast Asian countries such as Indonesia, Malaysia, the Philippines, and Thailand. Some underutilized fruits in Indonesia are important genetic resources for crop improvement, biomass, and food security [1]. In this data article, we report genome size estimates and draft genome assemblies of six underutilized Indonesian fruits: Artocarpus nangkadak (Artocarpus heterophyllus x Artocarpus integer), Salacca sumatrana, Flacourtia inermis, Lansium domesticum, Pometia pinnata, and Syzygium samarangense. Genome size was estimated using flow cytometry [2]. The genomes of the six fruits were sequenced using paired-end libraries on the BGISeq-500 platform.

Genome size estimation
Approximately 1 cm² of leaf tissue was placed in the nuclei extraction buffer of CyStain UV Precise P (Cytotechs, Kandatsu, Japan). The nuclei were isolated by chopping the leaf tissue with a razor blade and were then stained with the staining buffer of the same kit. The stained nuclei were analyzed on a CyFlow flow cytometer (Sysmex Partec, Görlitz, Germany), and the data were processed using Flow-Max Software. Raphanus sativus was used as the reference plant for 2C DNA value estimation [2].
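The calculation behind a reference-based 2C estimate can be sketched as follows. This is a minimal illustration of the standard ratio method, not the authors' analysis script: the function names are hypothetical, the peak means and the reference 2C value are placeholders that the user must supply, and the conversion factor of 978 Mbp per pg is the commonly used value from Doležel et al. (2003), which is an assumption here because the article does not state which factor was applied.

```python
# Commonly used conversion factor (Dolezel et al., 2003): 1 pg DNA = 978 Mbp.
# Assumption: the source article does not state the factor it used.
MBP_PER_PG = 978.0


def estimate_2c_pg(sample_peak_mean, reference_peak_mean, reference_2c_pg):
    """Estimate the sample 2C DNA content (pg) from G0/G1 peak means.

    sample 2C = reference 2C * (sample peak mean / reference peak mean)
    """
    if min(sample_peak_mean, reference_peak_mean, reference_2c_pg) <= 0:
        raise ValueError("peak means and reference 2C value must be positive")
    return reference_2c_pg * sample_peak_mean / reference_peak_mean


def haploid_size_mbp(two_c_pg):
    """Convert a 2C value in pg to a 1C genome size in Mbp (1C = 2C / 2)."""
    return (two_c_pg / 2.0) * MBP_PER_PG


if __name__ == "__main__":
    # Placeholder numbers for illustration only; they are not measured data.
    two_c = estimate_2c_pg(sample_peak_mean=200.0,
                           reference_peak_mean=100.0,
                           reference_2c_pg=1.0)
    print(f"2C = {two_c:.2f} pg, 1C = {haploid_size_mbp(two_c):.0f} Mbp")
```

The reference 2C value is passed in as a parameter rather than hard-coded, so that the published value for the Raphanus sativus standard used in [2] can be supplied.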

DNA extraction, whole genome sequencing and assembly
Genomic DNA was extracted from young leaves using the DNeasy Plant Mini Kit (Qiagen) following the manufacturer's protocol. DNA quality and quantity were checked with a P360 NanoPhotometer (Implen, München, Germany). The extracted genomic DNA was used to prepare paired-end libraries for genome sequencing, and library quality was assessed on the Agilent Bioanalyzer 2100 system. The libraries were sequenced on the BGISeq-500 platform, which is based on sequencing by synthesis, with 100 bp paired-end reads (BGI, Hong Kong). After sequencing, the raw reads were filtered to remove adaptor sequences, contamination, and low-quality reads (Table 1).
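As a rough guide to what about 5 Gb of 100 bp paired-end data means for a given genome, the arithmetic can be sketched as below. The function names are hypothetical, and the genome size in the example is a placeholder rather than a value from this article; the estimated sizes from flow cytometry would be substituted in practice.

```python
def read_pairs(total_bases, read_length=100):
    """Number of read pairs that yield total_bases with paired reads of read_length."""
    return total_bases // (2 * read_length)


def expected_depth(total_bases, genome_size_bp):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


if __name__ == "__main__":
    total = 5_000_000_000            # about 5 Gb per species, as reported
    placeholder_genome = 1_000_000_000  # hypothetical 1 Gb genome, illustration only
    print(read_pairs(total), "read pairs")
    print(f"{expected_depth(total, placeholder_genome):.1f}x average depth")
```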
Reads from each species were assembled through the DDBJ Read Annotation Pipeline [3,4] using ABySS 1.3.2 [5], Platanus 1.2.2 [6], SOAPdenovo 2.04-r240 [7], and Velvet 1.2.10 [8] with default parameters, and contigs shorter than 200 bp were filtered out. Contig statistics for each assembler were calculated using the Assembly-stat program [9] (Table 2). The contigs generated by the four assemblers will be made available at http://rujakbase.id.
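The kind of summary reported in a contig statistics table can be sketched as follows. This is not the Assembly-stat program itself, only a minimal illustration of the usual definitions (contig count, total length, longest contig, N50) applied after a 200 bp minimum-length filter; the function name and the returned keys are hypothetical.

```python
def contig_stats(lengths, min_length=200):
    """Summarize contig lengths after discarding contigs shorter than min_length.

    N50 is the length of the contig at which the running sum of lengths,
    taken from longest to shortest, first reaches half of the total length.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        return {"count": 0, "total": 0, "longest": 0, "n50": 0}

    total = sum(kept)
    running = 0
    n50 = 0
    for n in kept:
        running += n
        if running * 2 >= total:
            n50 = n
            break

    return {"count": len(kept), "total": total, "longest": kept[0], "n50": n50}


if __name__ == "__main__":
    # Toy lengths for illustration only; not taken from the assemblies.
    print(contig_stats([100, 200, 300, 400, 500]))
```

Working on a plain list of lengths keeps the sketch independent of any FASTA parser; the lengths could come from whatever tool is used to read the contig files.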

Data accessibility
The raw read data were submitted to the DDBJ Read Archive (Table 1).