Dataset of De novo leaf, salicylic induced leaf and flower transcriptome profiling of Datura metel plant

Datura metel L (thorn's apple) is a popular plant belonging to the family Solanaceae, growing all around the year in humid and warm climates. The importance of D. metel as a medicinal marvel is due to secondary metabolites within various parts of the plant, which serve different therapeutic functions. The whole plant is considered a narcotic, anodyne, and antispasmodic, while the leaves, bark, and seeds are also separately used in extractions. The biological potency of the plant has been used in traditional medicine for over a century. Currently, plant parts are used as a rich source in pharmaceutical manufacturing of secondary metabolites such as flavonoids, saponins, alkaloids, steroids, tannins, and withanaloids. D. metel has proven advanced functions of antiviral effects, antibacterial and antifungal effects, anti-inflammatory, analgesic, antipyretic, hepatoprotective, nephroprotective effect, anticancer, and to treat chronic cardiovascular diseases, diabetes, and neurological ailments. This is the first report on transcriptome assembly for this plant. The raw RNA sequencing data for leaf, salicylic-induced leaf, and flower are available at the NCBI Sequence Read Archive (SRA) under the Bioproject access PRJNA838784. The raw RNA sequencing data that is currently accessible can be utilized to conduct differential gene expression investigations pertaining to various secondary metabolite pathways and diverse tissues, as well as for the research of gene expression related to stress induced by salicylic acid in leaf tissues of the plant. Gene functions can be evaluated and mostly utilized for gene clustering data analysis, gene characterizations, and the identification of genes involved in linked biological pathways in plant studies.


Transcriptomics
De novo sequencing Secondary metabolites Gene annotations Leaf Datura metel a b s t r a c t Datura metel L (thorn's apple) is a popular plant belonging to the family Solanaceae, growing all around the year in humid and warm climates.The importance of D. metel as a medicinal marvel is due to secondary metabolites within various parts of the plant, which serve different therapeutic functions.The whole plant is considered a narcotic, anodyne, and antispasmodic, while the leaves, bark, and seeds are also separately used in extractions.The biological potency of the plant has been used in traditional medicine for over a century.Currently, plant parts are used as a rich source in pharmaceutical manufacturing of secondary metabolites such as flavonoids, saponins, alkaloids, steroids, tannins, and withanaloids.D. metel has proven advanced functions of antiviral effects, antibacterial and antifungal effects, anti-inflammatory, analgesic, antipyretic, hepatoprotective, nephroprotective effect, anticancer, and to treat chronic cardiovascular diseases, diabetes, and neurological ailments.This is the first report on transcriptome assembly for this plant.The raw RNA sequencing data for leaf, salicylic-induced leaf, and flower are available at the NCBI Sequence Read Archive (SRA) under the Bioproject access PRJNA838784.The raw RNA sequencing data that is currently accessible can be utilized to conduct differential gene expression investigations pertaining to various secondary metabolite pathways and diverse tissues, as well as for the research of gene expression related to stress induced by salicylic acid in leaf tissues of the plant.Gene functions can be evaluated and mostly utilized for gene clustering data analysis, gene characterizations, and the identification of genes involved in linked biological pathways in plant studies.
© Quantitative and qualitative RNA assessment employed a NanoDrop-10 0 0 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, USA).The RNA purity was determined through absorbance measurements at 260 and 280 nm (A260/280) and 260/230 nm (A260/230), with ratios between 1.8 and 2.0 indicating optimal purity.RNA integrity was evaluated using an Agilent 2100 bioanalyzer, with a minimum RIN of 7.5 established as the threshold for our samples.RNA sequencing RNA sequencing and library preparation were done at Macrogen Korea.There, the ribosomal RNA (rRNA) was briefly removed from Ribo-Zero rRNA removal beads using TruSeq Stranded Total RNA with Ribo-Zero Plant (Illumina).Following purification, the RNA is fragmented into small pieces using divalent cations under elevated temperatures.The cleaved RNA fragments are copied into the first-strand cDNA using reverse transcriptase and random primers, followed by second-strand cDNA synthesis using DNA polymerase I and RNase H enzyme.The products were purified and enriched in a PCR to create the final cDNA library.The QC of the amplified library was determined using a high-sensitivity bioanalyzer chip (Agilent Technologies Inc., Santa Clara, CA).Then the libraries were sequenced on Illumina NovaSeq 60 0 0, and 100 base-pair end sequencing was conducted.Short read data in FASTQ file format were produced and the short reads that did not pass Illumina standard quality filter were eliminated.The process yielded clean reads from the mRNA pool isolated from D. metel tissues of leaves, induced leaves, and flower.

RNA sequencing analysis (RNA-Seq analysis)
Raw reads were filtered to exclude reads containing adapter and low-quality sequences using Trimmomatic software.The N20, N30, and N50 values, the GC content, and the clean data sequence duplication level were calculated.De novo assembly of each transcriptome and a common transcriptome of leaf, induced leaf, and flower buds were performed with Trinity 2.8.4 software.The default Kmer size of 25 was set for the de novo transcriptome assembly of the flowers and leaves.The length of the assembled unigenes used for further study was ≥500 bp.All downstream analyses were based on high-quality clean data.CD-HIT-EST (version 4.6.3)analysis was performed by selecting the longest isoform, and other contigs were merged according to a similarity criterion of 90 %.This would eliminate the redundancy of the FASTA sequences generated from assembly.BUSCO assessment was conducted to find the completeness of transcriptome assembly under viridiplantea (Kingdom) lineage and cut-off E value of 1.03E

Value of the Data
• The integration of transcriptomics and metabolomics knowledge, along with their applications, will provide a more comprehensive path for gathering all information related to plant metabolomics.This approach is critical for identifying missing genes and transcription factors that regulate the production of secondary metabolites within their respective biosynthesis pathways.A deep understanding of secondary metabolite biosynthesis pathways empowers synthetic biologists to integrate the acquired knowledge into heterologous systems, thereby enhancing the production of secondary metabolites in a cost-effective and tailored manner.As a result, RNA-seq studies hold particular significance, especially for non-model plants.
• This dataset represents the first instance of RNA-seq data encompassing the most critical tissue parts responsible for producing numerous medicinally important secondary metabolites.The sequencing was executed using the Novaseq platform, resulting in a read depth of 60 M in a single sample.Consequently, this dataset stands as a valuable resource, providing a foundation for subsequent gene expression analysis endeavors.• Moreover, a sample induced with salicylic acid was also sequenced with the intention of facilitating future differential gene expression analysis.This specific setup offers an excellent opportunity to gain insights into how chemically induced plants elevate the production of secondary metabolites in Datura metel species.

Data Description
The Bioproject encompasses the raw sequencing data obtained through transcriptomic sequencing of leaf, salicylic-induced leaf, and flower tissues from Datura metel plants.This data was collected in January 2018 and has been deposited under the Bioproject with the accession number PRJNA838784.RNA was extracted from the leaf, salicylic-induced leaf, and flower tissues of Datura metel ( Table 1 and Fig. 1 ) and used for de novo transcriptome analysis using Illumina NovaSeq 60 0 0 sequencing technology.The 60x depth of sequencing coverage was received.The percentage of Q20 bases and Q30 bases were around 98 %, and 94 %, respectively ( Table 1 ).The GC content was at least 45 % or above ( Table 2 ).The sequencing results showed good quality for all tissues and provided valid information for subsequent analyses.De novo assembly of each transcriptome and a common transcriptome of leaf, induced leaf and flower buds were performed with Trinity 2.8.4 software.The default kmer size of 25 was set for the de novo transcriptome assembly of the flowers and leaves.The length of the assembled unigenes used for further study was ≥500 bp.All downstream analyses were based on high-quality clean data.All reads were de novo assembled into a single transcriptome individually and finally merged into a single transcriptome containing 232,032 contigs and representing 131,784 unigenes.The N50 was equal to 2967 bp for transcript 1866 bp for the longest isoform for the gene ( Table 3 ).The mean lengths of transcripts and unigenes were 1529 bp and 978 bp, respectively.The contigs reported as the output of the Trinity package contain a FASTA file that contains redundancy.Therefore, the CD-HIT clusters [4] are used to remove duplicate sequences.The BUSCO completeness analysis [5] has revealed that 98 % of the completeness of the genome ( Fig. 2 ).
Fig. 2. analysis results.Blue represents the complete single-copy genes (85.41 %), green represents (13.65 %) of the complete duplicate-copy genes, and orange for fragmented genes (0.71 %) and missing genes (0.23 %).(For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Raw sequencing data for flower sample
• Datura metel flower: Illumina sequencing data were obtained from an Illumina NovaSeq 60 0 0 run, producing 57.7 million spots with a total of 17 gigabases and 5 gigabytes of downloadable data.The experiment number is SRX17976631, and the corresponding raw data is labeled as Library ID "DF_control".Other information is shown in Tables 4 and 5 .
The SRA accession number for the file is SRR21994161.• Datura metel Salicylic Acid Induced Leaf: Illumina sequencing data were acquired from an Illumina NovaSeq 60 0 0 run, yielding 54.7 million spots with a total of 16.1 gigabases and 4.8 gigabytes of downloadable data.The experimental number is SRX17976630, and the corresponding raw data is labeled as Library ID "DL_treated".( Tables 4 and 5 ).The SRA accession number for the file is SRR21994162 and its statistical information is also provided in Tables 4 and 5 .• Datura metel normal Leaf: Illumina sequencing data were generated from an Illumina No-vaSeq 60 0 0 run, resulting in 57.7 million spots with a total of 17.1 gigabases and 5.1 giga- bytes of downloadable data.The experimental number is SRX17976629, and the associated raw data is labeled as Library ID "DL_control " ( Tables 4 and 5 ).The SRA accession number for the file is SRR21994163 and statistical information is given in Tables 4 and 5 .

Experimental design
In January 2018, D. metel seeds independently germinated in germination pots, with a 4:1 mixture of soil and sand.After 68 days of germination, the seedlings were transplanted into 2.5 × 2.0 m plots, spaced 50 × 60 cm apart, and cultivated within the greenhouse of the Sri Lanka Institute of Nano Technology.

RNA extraction
Previous investigations identified the optimal salicylic acid (SA) concentration for exogenous application as 5 mM for higher secondary metabolites production in Solanaceae plants [1] .As such, three plants had 10 ml of 5 mM SA sprayed on their third leaves every 12 h over a span of 36 h.Water was sprayed as the control for another three plants.Three hours following the final treatment, SA sprayed leaves and normal leaf and flower tissues were excised (two replicates from each tissue sample) and immersed in an RNA stabilization reagent, RNAlater (Cat 76104, Qiagen, Germany), to facilitate RNA isolation and subsequent transcriptome analysis.RNA extraction was executed using a Qiagen RNeasy plant mini kit, in adherence to the manufacturer's instructions.Quantitative and qualitative RNA assessment employed a NanoDrop-10 0 0 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, USA).The RNA purity was determined through absorbance measurements at 260 and 280 nm (A260/280) and 260/230 nm (A260/230), with ra-tios between 1.8 and 2.0 indicating optimal purity.RNA integrity was evaluated using an Agilent 2100 bioanalyzer, with a minimum RIN of 7.5 established as the threshold for our samples.

RNA sequencing
For RNA sequencing and library construction of D. metel , the RNA was dispatched to Macrogen in Korea.Trimmomatic software [2] was employed to filter raw reads, excluding sequences with adapters and low quality.Key parameters including N20, N30, N50 values, GC content, and clean data sequence duplication level were calculated.De novo assembly of individual transcriptomes and a shared transcriptome for leaf, induced leaf, and flower buds was executed using Trinity 2.8.4 software [3] , with a default KMER size of 25 adopted for flower and leaf assemblies.Assembled unigenes exceeding a length of ≥500 bp were selected for further analysis, exclusively utilizing high-quality, clean data.

Quality control and sequence analysis
CD-HIT-EST (version 4.6.3)[4] analysis involved the selection of the longest isoform, and other contigs were merged based on a similarity criterion of 90 %.This step eliminated redundancy among FASTA sequences derived from the assembly.BUSCO [5] assessment evaluated the completeness of transcriptome assembly within the Viridiplantea (Kingdom) lineage, employing an E value cut-off of 1.03E-3.

Limitations
Sample replicates were pooled and did not sequence separately.

Fig. 1 .
Fig. 1.(a) Bioanalyzer electropherogram of total RNA for leaf tissue (b) Bioanalyzer electropherogram of total RNA for SA induced leaf (c) Bioanalyzer electropherogram of total RNA for flower tissue.
2023The Authors.Published by Elsevier Inc.
A salicylic acid (SA) concentration of 5 mM was sprayed on the third leaf every 12 h over a span of 36 h.Water was sprayed as the control for another three plants.Three hours following the final treatment, SA sprayed leaves, normal leaves, and flower tissues were excised (two biological repeats from the tissue type) and immersed in an RNA stabilization reagent, RNAlater (Cat 76,104, Qiagen, Germany), to facilitate RNA isolation and subsequent transcriptome analysis.RNA extraction was executed using a Qiagen RNeasy plant mini kit, in adherence to the manufacturer's instructions.

Table 1
RNA concentration determined by NanoDrop.

Table 2
Raw RNA sequencing of the three tissues information.

Table 4
Experimental description of the dataset.