Transcriptome data of Prorocentrum donghaiense Lu under nitrogen and phosphorus limitation

Prorocentrum donghaiense Lu is one of the most frequently occurring harmful algal species in the coastal waters of China. The growth of P. donghaiense can be limited by nitrogen or phosphorus in the marine environment. However, the molecular mechanisms by which P. donghaiense responds to nitrogen and phosphorus limitation are poorly understood. In this study, we summarize transcriptome datasets of P. donghaiense grown under nitrogen or phosphorus depletion. Approximately 19 GB of raw data were generated on an Illumina HiSeq™ 4000 sequencer. From 250,539,604 raw reads, 211,394,052 clean reads were obtained. The raw data were deposited in the SRA database under BioProject ID 436946. This dataset provides a resource for analyses of gene expression related to metabolic processes in P. donghaiense.


Value of the data
The data show changes in gene expression levels of P. donghaiense in response to nitrogen or phosphorus limitation and can be used to estimate the effect of nutrient variation on P. donghaiense cells.
The data can be used by other groups studying the molecular biology of P. donghaiense and will support analyses of gene expression related to metabolic processes in this species.

Data
This article reports transcriptome data of Prorocentrum donghaiense under nutrient-replete, nitrogen-limited or phosphorus-limited conditions. The raw data were deposited in the NCBI SRA database as detailed in Table 1.

Total RNA extraction and quality control
Total RNA was extracted from frozen P. donghaiense cells using a total RNA extraction kit (Magen, Shanghai, China). To remove traces of genomic DNA, the RNA was treated with RNase-free DNase (Magen, Shanghai, China) according to the manufacturer's instructions. The quality, quantity and integrity of the total RNA were assessed with a Drop spectrophotometer (Kai Ao, China), a Qubit® 3.0 Fluorometer (Life Technologies, USA) and the RNA Nano 6000 Assay Kit on an Agilent 2100 Bioanalyzer (Agilent Technologies, USA).

Library preparation and RNA-seq
mRNA was purified from total RNA using poly-T oligo-attached magnetic beads and fragmented with divalent cations at elevated temperature. First-strand cDNA was synthesized from these fragments using random hexamer primers and RNase H, and second-strand cDNA was then synthesized using first-strand buffer, dNTPs, DNA polymerase I and RNase H. The cDNA fragments were purified with the QIAquick PCR kit and eluted with EB buffer. The fragments were then end-repaired, poly(A)-tailed and ligated to adapters. The target products were size-selected by agarose gel electrophoresis and PCR-amplified to construct the cDNA library. Clustering of the index-coded samples was performed on a cBot cluster generation system using the HiSeq PE Cluster Kit v4-cBot-HS (Illumina), after which the libraries were sequenced on an Illumina HiSeq™ 4000 sequencer (Illumina, Shanghai, China) to generate 150 bp paired-end reads.

Transcriptome de novo assembly
To obtain high-quality (clean) reads, the raw data were processed with Perl scripts to remove reads containing adapter sequences, low-quality reads and reads in which ambiguous bases (N) accounted for more than 5% of the sequence. The clean reads were assembled with Trinity (version 20140710) [1] (Table 2). For post-assembly evaluation, the clean reads were mapped back to the assembled transcripts with Bowtie2 [3].
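The read-filtering step can be sketched as follows. This is a minimal illustration, not the authors' Perl scripts. Only the 5% N cutoff comes from the text. The adapter string (the common Illumina TruSeq adapter prefix `AGATCGGAAGAGC`), the Q20 low-quality cutoff, the 50% low-quality fraction, and the names `Read`, `is_clean` and `filter_pairs` are assumptions chosen for illustration.

```python
"""Sketch of clean-read filtering; thresholds other than the N cutoff are assumed."""

from dataclasses import dataclass

# Assumption: common Illumina TruSeq adapter prefix, used here for illustration only.
ADAPTER = "AGATCGGAAGAGC"
MAX_N_FRACTION = 0.05        # from the text: discard reads with more than 5% N
LOW_QUAL_PHRED = 20          # assumption: bases below Q20 count as low quality
MAX_LOW_QUAL_FRACTION = 0.5  # assumption: discard reads with over 50% such bases


@dataclass
class Read:
    seq: str
    qual: str  # Phred+33 encoded quality string, same length as seq


def phred_scores(qual: str) -> list:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def is_clean(read: Read) -> bool:
    """Return True if the read passes adapter, N-content and quality filters."""
    seq = read.seq.upper()
    if not seq:
        return False
    if ADAPTER in seq:  # simple exact-match adapter check (assumed rule)
        return False
    if seq.count("N") / len(seq) > MAX_N_FRACTION:
        return False
    low = sum(q < LOW_QUAL_PHRED for q in phred_scores(read.qual))
    return low / len(seq) <= MAX_LOW_QUAL_FRACTION


def filter_pairs(pairs):
    """Keep a read pair only if both mates pass (paired-end data)."""
    return [(r1, r2) for r1, r2 in pairs if is_clean(r1) and is_clean(r2)]
```

Here both mates of a pair are discarded together so that the surviving reads remain properly paired for assembly. Whether the original pipeline handled pairs this way is not reported.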

Bioinformatic analyses
Reads mapped to each gene were counted with HTSeq v0.6.0, and RPKM (reads per kilobase of transcript per million mapped reads) was used to estimate gene expression levels in each sample [2]. DEGseq was used to identify genes that were up- or down-regulated between two samples using a model based on the negative binomial distribution. The P-value of each gene was adjusted with the Benjamini and Hochberg procedure to control the false discovery rate (FDR). Genes with FDR ≤ 0.05 and |log2 ratio| ≥ 2 were identified as differentially expressed genes (DEGs) [3]. When N-limited cultures were compared with N-replete cultures, 34 transcripts were up-regulated and 31 were down-regulated. When P-limited cultures were compared with nutrient-replete cultures, 224 transcripts were up-regulated and 507 were down-regulated (Fig. 1).
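The expression and DEG-calling steps can be sketched as follows, assuming per-gene read counts and per-gene P-values are already available. The sketch does not reimplement the DEGseq statistical test. It shows only three things: the RPKM formula (count × 10^9 / (gene length in bp × total mapped reads)), the Benjamini and Hochberg adjustment, and the FDR ≤ 0.05 and |log2 ratio| ≥ 2 cutoffs from the text. For RPKM, the sample's total counted reads are used as the mapped-read total, which is an assumption. The pseudocount in `log2_ratio` and all function names are hypothetical.

```python
import math


def rpkm(counts, lengths_bp):
    """RPKM = count * 1e9 / (length_bp * total_reads).

    Assumption: total_reads is the sum of counted reads in the sample.
    """
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def log2_ratio(treated, control, pseudocount=1.0):
    """log2 fold change of expression values; the pseudocount (assumed) avoids log(0)."""
    return [math.log2((t + pseudocount) / (c + pseudocount))
            for t, c in zip(treated, control)]


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(log2_ratios, pvals, fdr_max=0.05, min_abs_log2=2.0):
    """Return (up, down) index lists for genes passing both cutoffs."""
    fdr = bh_adjust(pvals)
    up, down = [], []
    for i, (lr, q) in enumerate(zip(log2_ratios, fdr)):
        if q <= fdr_max and abs(lr) >= min_abs_log2:
            (up if lr > 0 else down).append(i)
    return up, down


if __name__ == "__main__":
    print(rpkm([100, 300], [1000, 2000]))
    print(call_degs([3.0, -2.5, 1.0, -4.0], [0.001, 0.002, 0.0001, 0.9]))
```

Normalizing each gene's count by both transcript length and library size makes RPKM values comparable across genes and samples. The P-values themselves would come from the DEG tool's negative binomial model, not from this sketch.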