Data set for transcriptome analysis of Apocynum venetum L.

In this paper, we present the transcriptome profile of A. venetum L. obtained by RNA-Seq. A total of 6.57 Gb of raw data were obtained, and 52,983 unigenes with an average length of 1009 bp and an N50 of 1632 bp were annotated against seven databases. The unigenes annotated in the KEGG database were divided into 21 categories from 6 main groups. Among these, 4952 (22.21%) unigenes were assigned to “Global and overview maps”, and 1834 (8.23%) to “Carbohydrate metabolism”. In addition, 6340 unigenes containing 7579 SSRs were identified; mononucleotide, dinucleotide, and trinucleotide motifs were the most common motif types (95.59% combined), accounting for 39.62%, 36.02%, and 19.95%, respectively.



Value of the data
Apocynum venetum (luobuma) is a common fiber and medicinal plant widely distributed in salt marshes, desert margins, alluvial flats, and riversides [2,3], which makes it a valuable model for research on bast fiber development and plant stress resistance.
Genetic information and gene sequences for A. venetum are scarce in public databases. This large dataset of transcripts and unigenes provides abundant genetic information for identifying A. venetum genes.
The unigenes obtained provide a good resource for developing SSR markers for evolutionary genetic studies of A. venetum.

Data
Here we report a de novo transcriptome assembly of A. venetum. Our aim was to obtain a high-quality reference transcriptome of A. venetum leaves, to elucidate the molecular pathways of fiber and flavonoid synthesis and of stress resistance, and to find candidate genes for these processes (see Tables 1-3 and Figs. 1-3).
The de novo transcriptome assembly of A. venetum L. and the SRA records are accessible at the following link: https://www.ncbi.nlm.nih.gov/sra/SRP151546.

Plant materials
The seeds of A. venetum were collected from Xinjiang Province, China, in November 2016. Seeds were surface-sterilized by rinsing in 70% (v/v) ethanol for 60 s, then in 5% (v/v) sodium hypochlorite (NaClO) for 30 min while rocking on a platform, and were then washed in distilled water for 8 min. The seeds were germinated and grown for 30 days on half-strength MS agar medium inside a growth chamber with a 14 h light/10 h dark cycle, an air temperature of 25°C, and a photon flux density (PFD) of 280 μmol m⁻² s⁻¹. The leaves of A. venetum were collected, immediately frozen in liquid nitrogen, and stored at -80°C until use. Total RNA was extracted using TRIzol Reagent (Invitrogen, Life Technologies, USA) following the manufacturer's instructions, then treated with DNase I (Invitrogen, Life Technologies, USA). RNA integrity was verified using an Agilent 2100 Bioanalyzer (Agilent, USA).

RNA sequencing
RNA-Seq libraries were constructed using the RNA Library Prep Kit for Illumina (NEB, USA) according to the manufacturer's instructions. Library quality was assessed on the Agilent Bioanalyzer 2100 system. The libraries were sequenced on the BGISEQ-500 platform (BGI Technologies, Shenzhen, China), which is based on sequencing by synthesis, with 100 bp paired-end reads. All RNA-Seq data were deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive under accession number SRP151546.
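As a rough sanity check on sequencing depth, the reported raw data volume can be converted into an approximate read count. The sketch below assumes that 6.57 Gb means 6.57 × 10^9 bases and that every raw read is exactly 100 bp; neither assumption is stated in the data description, and the function name is illustrative only.

```python
def estimate_read_counts(total_bases, read_length):
    """Estimate the number of reads and read pairs from a total base count.

    Assumes uniform read length and paired-end sequencing (two reads per pair).
    """
    reads = round(total_bases / read_length)
    pairs = reads // 2
    return reads, pairs


# Assumption: 6.57 Gb = 6.57e9 bases, all reads 100 bp long.
reads, pairs = estimate_read_counts(6.57e9, 100)
print(f"approx. {reads / 1e6:.1f} M reads ({pairs / 1e6:.2f} M pairs)")
```

Under these assumptions the dataset corresponds to roughly 65.7 million reads, or about 32.85 million read pairs.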

Leaf transcriptome assembly and gene functional annotation
The raw reads were first filtered and combined to form longer fragments, then assembled de novo into unigenes using the short-read assembly program Trinity with default settings [4,5]. Functional annotation of the unigenes was performed by searching the following databases: Nr, Pfam, KOG/COG, Swiss-Prot, KEGG, and GO. The annotation information was summarized, and the distribution of annotated unigenes across databases was illustrated with a Venn diagram (Fig. 2).
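The assembly statistics quoted for the unigenes (average length and N50) can be derived from the list of unigene lengths alone. The following is a minimal sketch of the usual N50 definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases). The toy lengths are made up for illustration and are not taken from the dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half of the
        # total assembled bases is the N50.
        if running >= half_total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical unigene lengths (bp), for illustration only.
toy_lengths = [200, 300, 500, 800, 1200, 2000]
print("N50:", n50(toy_lengths))
print("mean length:", round(mean_length(toy_lengths), 2))
```

Because N50 is weighted toward long sequences, it is normally larger than the mean length, which matches the relationship between the reported values (N50 of 1632 bp versus a mean of 1009 bp).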

Identification of SSR markers
Using the MISA software [6], we identified 6340 unigenes containing 7579 SSRs; 1040 of these unigenes contained more than one SSR.
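To illustrate what perfect-SSR detection involves, the sketch below scans a sequence for tandem repeats of 1 to 6 bp motifs using regular expressions. It is not the MISA implementation: the minimum repeat counts in MIN_REPEATS are assumptions chosen for illustration (the settings actually used are not reported here), compound or interrupted SSRs are not handled, and the example sequence is invented.

```python
import re
from collections import Counter

# Assumed thresholds: motif length (bp) -> minimum number of tandem repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """Return True if the motif is not a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit, min_rep in sorted(min_repeats.items()):
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip e.g. 'AA' repeats, which are already reported as 'A' repeats.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Invented example sequence: A x12, AG x7, and ATC x5 separated by short flanks.
example = "GGC" + "A" * 12 + "CGT" + "AG" * 7 + "GGT" + "ATC" * 5 + "G"
ssrs = find_ssrs(example)
for start, motif, count in ssrs:
    print(start, motif, count)

# Tally SSRs by motif length (1 = mononucleotide, 2 = dinucleotide, ...).
by_type = Counter(len(motif) for _, motif, _ in ssrs)
print(dict(by_type))
```

Tallying hits by motif length, as in the last step, is how per-type shares such as the mononucleotide, dinucleotide, and trinucleotide percentages would be obtained once the scan is run over all unigenes.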