De novo transcriptomic data of salt tolerant halophytes Dichanthium annulatum (Forssk.) Stapf and Urochondra setulosa (Trin.) C.E.Hubb.

Two halophytes, Dichanthium annulatum (moderately salt tolerant) and Urochondra setulosa (highly salt tolerant), were selected to generate transcriptomes at different salinity levels. RNA samples were sequenced on the Illumina HiSeq platform for de novo transcriptome assembly from the leaf tissues of D. annulatum at a salinity of ECe ∼30 dS/m and of U. setulosa at three salt levels (ECe ∼30, ∼40 and ∼50 dS/m). A total of 267,196 and 384,442 transcripts were assembled with Trinity in the two plants, respectively, and DESeq was used to identify differentially expressed transcripts. A total of 32,246 and 25,479 SSRs were identified in the two plants, respectively, using the MISA Perl script, with mono- and tri-nucleotide repeats as the most common motifs.


Specifications
Subject: Biological Sciences
Specific subject area: Transcriptomics
Type of data: Table, Chart, Figure
How data were acquired: Illumina HiSeq
Data format: Raw sequencing reads (FastQ)
Parameters for data collection: Total RNA was isolated for sequencing from leaves of both halophytes, D. annulatum at EC 30 dS/m and U. setulosa at salinity treatments of EC 30, 40 and 50 dS/m (∼300, 400 and 500 mM NaCl).
Description of data collection: Leaves of both plants were collected on ice, and RNA was isolated separately for each plant from one control set and the different salinity treatments, with two replications each. Sequencing was performed on the Illumina HiSeq platform. The RNAseq libraries were prepared with the Illumina-compatible NEBNext® Ultra™ Directional RNA Library Prep Kit. Processed reads were assembled with the graph-based approach of the Trinity program. Assembled transcripts were clustered by sequence similarity using CD-HIT-EST. Processed reads were aligned back to the final assembly using Bowtie with end-to-end parameters. DESeq was used for differential expression analysis. SSRs were identified using MISA.
Data

Value of the Data
• These halophytes, Dichanthium annulatum and Urochondra setulosa, are naturally salt-loving plants; the former is moderately salt tolerant, surviving up to EC 30 dS/m (∼300 mM NaCl), while the latter is highly salt tolerant, with salt tolerance up to EC 50 dS/m (∼500 mM NaCl). No reference genome is available for these two halophytic plants; hence, the transcriptomic information generated here will be useful for further identification of genes, pathways and mechanisms operating at high salinity in related species.
• The studied halophytes are important desert plants that also have economic value and potential for desalinating waste lands. The information generated is valuable for plant researchers working on abiotic stress.
• For crop improvement programmes, this information might be useful in the development of markers/QTLs, genic markers and SNPs, or in identifying transcription factors involved in the various pathways operating at high salt levels, which is needed to enhance crop productivity under changing climatic conditions.

Data Description
An aliquot of RNA isolated from leaves of control and salt-treated plants of both halophytes was run on an Agilent TapeStation to check RNA integrity. All samples had RIN (RNA integrity number) values above 7 (Table 1). A schematic overview of the experimental design and the transcriptomic data analysis pipeline used in this work is shown in Fig. 1. After sequencing, a total of 44.3-49.6 million paired-end reads were obtained from 8 RNA libraries in Urochondra and 4 libraries in D. annulatum. The quality of the data was assessed using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and important parameters such as mean quality scores per position, per-sequence quality scores, GC content distribution and read length distribution were measured (Fig. 2). The Phred quality score per position was higher than 30 in all libraries, with a normal distribution of GC content. This analysis indicated no sequence contamination during sequencing. After trimming the raw reads using Cutadapt [1], an average of 96% of high-quality data was retained, and more than 90% of the data aligned to the clustered transcripts. The details of the transcriptomic data of D. annulatum and U. setulosa are shown in Table 2. The Trinity assembly of high-quality reads resulted in 352.78 million transcripts in the Urochondra samples, which were further clustered into 282,719 transcripts with an average length of 1,259 bp and an N50 value of 1,819 bp, whereas in D. annulatum 267.19 million transcripts were clustered into 188,353 transcripts with an average length of 864.2 bp and an N50 value of 1,100 bp. Trinity [2] combines overlapping reads of a given length and quality into longer contig sequences without gaps. The main properties of the assembled contigs, including average length, N50 length, and maximum and minimum length, were calculated.
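The contig statistics mentioned above can be illustrated with a minimal Python sketch. It is not the script used in this work: the function names and the toy lengths are hypothetical, and only the 300 bp cutoff is taken from the text. N50 is computed here in the usual way, as the length of the contig at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total assembled length.

```python
def n50(lengths):
    """Return the N50 (in bp) of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def assembly_summary(lengths, min_length=300):
    """Summarise contigs, ignoring those shorter than min_length bp."""
    kept = [n for n in lengths if n >= min_length]
    return {
        "count": len(kept),
        "mean": sum(kept) / len(kept),
        "min": min(kept),
        "max": max(kept),
        "n50": n50(kept),
    }


# Hypothetical toy lengths, not values from the dataset.
toy_lengths = [250, 300, 400, 500, 800, 1000, 2000]
summary = assembly_summary(toy_lengths)
print(summary)
```

In this toy case the 250 bp contig is dropped, the six remaining contigs sum to 5,000 bp, and the cumulative length passes 2,500 bp at the 1,000 bp contig, which is therefore the N50.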
Contigs shorter than 300 bp were not counted, since shorter sequences may lack a characterized protein domain or may yield no significant match. A BLAST search against "Viridiplantae" was used for functional annotation of the clustered transcripts. A total of 65.52% of the transcripts were annotated in Urochondra and 64.47% in D. annulatum. Transcripts matching with an E-value below 1e-5 and a minimum of 30% similarity were assigned a homologous protein from other organisms. The E-value distribution showed that 47.99% of the aligned transcripts had similarity with an E-value in the range of 1e-05 to 1e-60, whereas the remaining 52% of the homologous sequences ranged from 1e-5 to 0. The similarity distribution showed that 55.07% and 42.75% of the sequences had a similarity higher than 80% in D. annulatum and U. setulosa, respectively, and the remaining 44.9% and 57% of the sequences in each plant had a similarity in the range of 21-80% (Fig. 3). We also analysed novel gene expression patterns by examining the differentially expressed genes (DEGs) related to salinity/salt tolerance. DESeq [3] normalized expression values were used to calculate the fold change for a given transcript, and the regulation of each transcript was assigned based on its log2 fold change. As the assembly was de novo, transcripts with a log2 fold change below -1 were counted as down-regulated and those with values above 1 as up-regulated. The data on differentially expressed genes, with their expression levels at different salt concentrations in both plants, are available at Mendeley Data (https://data.mendeley.com//datasets/c9zwjncxb4/1). In Urochondra, out of a total of 345,729 transcripts, 68,455 genes were up-regulated and 69,759 were down-regulated. Volcano plots for each saline treatment are shown in Fig. 4.
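The log2 fold change rule described above (below -1 as down-regulated, above 1 as up-regulated) can be written as a short Python sketch. The function names are hypothetical, and the pseudocount added to both values to avoid division by zero is an assumption of this sketch, not a step reported for this dataset.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of normalized expression values.

    The pseudocount is an assumption of this sketch; it keeps the ratio
    defined when a transcript has zero expression in one condition.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify(log2fc, threshold=1.0):
    """Apply the cutoffs from the text: > 1 is up, < -1 is down."""
    if log2fc > threshold:
        return "up"
    if log2fc < -threshold:
        return "down"
    return "unchanged"


# Hypothetical normalized expression values (treated, control).
examples = {"t1": (7.0, 1.0), "t2": (1.0, 7.0), "t3": (5.0, 5.0)}
calls = {name: classify(log2_fold_change(t, c)) for name, (t, c) in examples.items()}
print(calls)
```

Transcripts whose log2 fold change is exactly 1 or -1 fall into the unchanged class here, because the text describes the cutoffs as strictly "more than" and "less than".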
We identified a total of 8,074 DEGs commonly up-regulated and 1,929 DEGs commonly down-regulated in U. setulosa across the three stress treatments (p ≤ 0.05). In addition, 1,065 transcripts (2.8%) were commonly up-regulated at EC 30 and 40 dS/m, 11,209 (29.2%) transcripts ...

... transcripts in Urochondra and 4,114 (14.99%) in Dichanthium were found to have more than one SSR. In U. setulosa and D. annulatum, 1,401 and 1,060 SSRs, respectively, were identified in compound form, while the remaining were perfect SSRs. The identified SSRs were classified as per the criteria proposed by Weber [4]. Mononucleotide repeats (17,432 and 13,890) and tri-nucleotide repeats (11,008 and 6,724) were the most abundant motifs in both plants, with mononucleotide repeats representing about 54% of the total SSRs in each plant and tri-nucleotide repeats representing 34.13% and 26.39%, respectively (Fig. 5). These were followed by dinucleotide (10.38%, 17.99%), tetra-nucleotide (0.90%), penta-nucleotide (0.28%, 0.13%) and hexanucleotide (0.22%, 0.06%) repeats in D. annulatum and U. setulosa, respectively. Motif type prediction revealed T/A as the most abundant motif in both halophytes, followed by CT/GA and TC/AG in Urochondra and by CCG/GGC and CGC/GCG in Dichanthium (Fig. 5).

... m. These saline levels were maintained regularly. After six months, the final saline treatments were applied at the flowering stage, and leaves were harvested after 48 hours for RNA isolation. Three replicates were pooled to make one biological replicate, and two (pooled) biological replicates per treatment were used for RNA library construction and further transcriptome profiling in both plant types.

RNAseq library preparation and RNA sequencing
Total RNA was isolated using the Qiagen RNeasy Plant Mini Kit, and RNA quantity and purity were checked on a NanoDrop spectrophotometer. The Illumina-compatible NEBNext® Ultra™ Directional RNA Library Prep Kit (NEB, USA) was used to prepare the RNAseq libraries as per the manufacturer's instructions. The cDNA library was prepared following the standard Illumina protocol, with first-strand synthesis using Actinomycin D (Gibco, Life Technologies, CA, USA) followed by second-strand synthesis. Double-stranded cDNA was purified using HighPrep magnetic beads (Magbio Genomics Inc, USA) and, after end repair and adenylation, was ligated to Illumina multiplex barcode adapters as per the NEBNext® Ultra™ Directional RNA Library Prep Kit protocol.
Indexing PCR was then performed to enrich the adapter-ligated cDNA fragments. The reaction was carried out at 37 °C for 15 min, followed by denaturation at 98 °C for 30 s, 15 cycles of 98 °C for 10 s and 65 °C for 75 s, and a final step at 65 °C for 5 min. The sequencing library (final PCR product) thus constructed was purified with HighPrep beads and quality-checked on a Qubit fluorometer (Thermo Fisher Scientific, MA, USA), and the fragment size distribution was analysed on an Agilent 2200 TapeStation.
The constructed RNAseq libraries were sequenced on an Illumina HiSeq sequencer at Genotypic Technology, Bangalore (India) to generate 150 bp paired-end reads. On average, 460.88 and 428.85 million raw sequencing reads were generated in U. setulosa and D. annulatum, respectively, and were processed for quality assessment and low-quality filtering before the assembly. The quality of the raw data was checked using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Pre-processing of the data was done with Cutadapt, which included removal of the adapter sequences and low-quality bases (< Q30).

Processed reads were assembled with the graph-based approach of the Trinity program [2], which combines overlapping reads of a given length and quality into longer contig sequences without gaps. Assembled transcripts were clustered by sequence similarity using CD-HIT-EST [5] at 95% similarity between sequences, which reduces redundancy without excluding sequence diversity. These clustered transcripts were used for annotation and differential expression analysis. To evaluate the read content and assess the quality of the assembly, the processed reads were aligned back to the final assembly using Bowtie [6] with end-to-end parameters. Differential expression of transcripts was analysed using DESeq [3]. Sequencing bias among the samples (uneven library size/depth) was removed by library normalization using the size factor calculation in DESeq.
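The size factor normalization mentioned above can be illustrated with a minimal Python sketch of the median-of-ratios idea on which DESeq size factors are based. This is a stand-alone illustration under stated assumptions, not the DESeq package code or the commands used in this work; the function names and the toy count table are hypothetical. Each transcript's geometric mean across samples serves as a reference, and a sample's size factor is the median of its counts divided by that reference. Transcripts with a zero count in any sample are left out of the size factor estimate because their geometric mean is zero.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one per transcript, each a list of raw read
    counts with one entry per sample.
    """
    n_samples = len(counts[0])
    # Keep only transcripts with non-zero counts in every sample.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no transcript has non-zero counts in every sample")
    # Log of the geometric mean of each usable transcript.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [30, 60], [0, 5]]
factors = size_factors(toy_counts)
normalized = normalize(toy_counts)
print(factors)
print(normalized)
```

In the toy table the second size factor is twice the first, so after normalization the first three transcripts have equal values in both samples, which is the depth correction the text refers to.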

SSR marker detection
The MISA (MicroSatellite identification tool) Perl script was used for mining Simple Sequence Repeats (SSRs) in each transcript sequence. Sequence repeats were identified, together with their length and motif type, using the recommended default protocol of MISA [7].
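The kind of search performed in this step can be sketched with a regular expression in Python. This is a simplified illustration and not the MISA script itself. The minimum repeat counts per motif length (10, 6, 5, 5, 5 and 5 for mono- to hexanucleotide motifs) are assumed to match the defaults commonly distributed with MISA and are not stated in this article. The function name and the test sequence are hypothetical, and compound SSRs are not handled.

```python
import re

# Motif length -> minimum number of repeats. Assumed values, see lead-in.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA"),
            # which are already reported under the shorter unit length.
            # Simplification: such a skipped match can mask an SSR that
            # overlaps its last bases.
            if any(
                motif == motif[:k] * (unit // k)
                for k in range(1, unit)
                if unit % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Hypothetical test sequence with one mono-, one di- and one tri-nucleotide SSR.
toy_sequence = "GG" + "A" * 12 + "CC" + "CT" * 6 + "AA" + "CCG" * 5 + "TT"
ssrs = find_ssrs(toy_sequence)
print(ssrs)
```

Start positions are 0-based. A complete tool would also merge neighbouring SSRs into compound repeats and report the strand-equivalent motif classes (for example CT/GA) used in the text.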
The commands used for all these programs are available in Supplementary Table 1.

Ethics Statement
All the authors hereby declare that all the experiments were conducted while maintaining all ethical rules and regulations. None of the studies included humans or animals.