Dataset of De Novo hybrid berry transcriptome profiling and characterization of Piper species (Piper nigrum and Piper longum) using Illumina and Nanopore sequencing

Piper nigrum and Piper longum are among the most popular and economically important spice crops, valued globally for their aromatic alkaloids, especially piperine. However, the mechanisms of the piperine biosynthetic pathway are not yet well understood. This work aimed to generate a full-length comparative berry transcriptome dataset of P. nigrum and P. longum using the Illumina and Nanopore sequencing platforms. Although short-read sequencing is widely used to capture transcriptome profiles, it is limited by read length. We used Oxford Nanopore technology for long reads and the Illumina sequencing platform for short reads to generate a hybrid transcriptome assembly from half-matured and fully matured berries of P. nigrum and P. longum. From P. nigrum and P. longum, 37.3 million and 38.1 million raw reads were generated, respectively. A total of 308369 contigs from P. nigrum and 267715 contigs from P. longum were obtained and successfully annotated. The transcriptome data revealed gene families involved in the biosynthesis of piperine and other secondary metabolites. This dataset sheds light on the piperine biosynthetic pathway, its transcriptomic changes, and its evolution, and supports their further exploration. The data have been submitted to the NCBI SRA under BioSample accessions SAMN13981803 and SAMN22826456.



Value of the Data
• These data include downstream analyses such as relative abundance, differential expression, pathway analysis, and orthology relationships.
• The full-length hybrid berry transcriptome data and associated annotation of Piper nigrum and Piper longum will help to explore the molecular mechanism of the piperine biosynthetic pathway and of other important metabolites unique to each species.
• The hybrid transcriptome sequences will serve as a future reference. They are a valuable resource for examining the molecular characteristics of genes that play a role in the biosynthesis of beneficial secondary metabolites in both plants.
• Meta-analysis of the raw sequencing data may be carried out for further in silico comparative genomics studies.

Data Description
The statistical report and other details of the transcripts and unigenes for the full-length berry transcriptome are provided in Table 1. A total of 308369 unigenes from P. nigrum and 267715 unigenes from P. longum, with an average length of 1120, were obtained. All unigenes were annotated using the KOG, NR, KEGG, and Pfam databases (Table 2). The workflow of the hybrid transcriptome analysis of the Piper nigrum and Piper longum samples is provided in Fig. 1. The raw reads were submitted to the NCBI database and are publicly accessible under BioSample accession nos. SAMN13981803 and SAMN22826456. The annotation and analysis files, including the filtered pathways, detected SSRs, KOG, NR, KEGG, and plant transcription factor data, have been deposited in Mendeley Data (https://data.mendeley.com/datasets/vyr4r7mxj8/draft?a=a8fd91d3-2868-4ae5-bf3d-d6ab370ab792).
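Summary figures such as the number of unigenes and their average length can be recomputed from an assembly FASTA file. The following is a minimal sketch, not the authors' script: the function names `fasta_lengths` and `summarize` and the demo sequences are hypothetical, and it assumes a plain multi-line FASTA file.

```python
from io import StringIO
from typing import Dict, Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each FASTA record, in file order."""
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current is not None:
                yield current  # the previous record is complete
            current = 0
        elif current is not None:
            current += len(line)  # sequence may span several lines
    if current is not None:
        yield current


def summarize(lengths: Iterable[int]) -> Dict[str, float]:
    """Return the record count, total length, and mean length."""
    values = list(lengths)
    count = len(values)
    total = sum(values)
    return {
        "count": count,
        "total_length": total,
        "mean_length": total / count if count else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical two-contig assembly used only for illustration.
    demo = StringIO(">contig_1\nACGTAC\nGT\n>contig_2\nGGCC\n")
    print(summarize(fasta_lengths(demo)))
```

For a real assembly, an open file handle can be passed to `fasta_lengths` in place of the in-memory demo.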

Sample collection
Half-matured and fully matured berry samples of P. nigrum variety IISR-Thevam and P. longum were collected from the ICAR-Indian Institute of Spices Research Experimental Farm, Kozhikode, Kerala, India.

Total RNA Extraction
A modified protocol of the Spectrum Plant Total RNA Kit (STRN50-Sigma) was used to extract total RNA. RNA integrity was assessed using a Bioanalyzer 2100 (Agilent, USA). Sequencing was performed on samples with an RNA integrity score of at least six.

Transcriptome sequencing and De novo assembly
Illumina and Nanopore sequencing were used to perform de novo transcriptome sequencing of the berry samples. The Illumina data were demultiplexed using bcl2fastq, and the Nanopore fast5 data were base-called using Albacore [1]. The quality of the Illumina data was analyzed using FastQC [2]. The short reads were processed using the velvet (ver. 1.1.04-ver. 0.1.21) de novo assembly pipeline [3], with the minimum k-mer length set to 69 (-hash_length 69) and the maximum k-mer value set to 194 (-MAXKMERLENGTH 194). The options selected were the short paired read type (-shortPaired), two separate files for paired reads (-separate), and tracking of short-read positions in the assembly (-read_trkg yes). The data from both the Illumina and Nanopore platforms, together with the short-read transcriptome assembly, were submitted to hybrid transcriptome assembly using the IDP-denovo assembler [4] with a k-mer length of 69 (-K_MER_LENGTH 69). The options for the left-mate short reads (-SR_left), right-mate short reads (-SR_right), and long Nanopore reads file (-long_reads) were specified. The assembler was run with multiple threads. The de novo hybrid transcriptome assembly pipeline is illustrated in Fig. 1.
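The short-read Velvet step described above can be sketched as a pair of command lines. This is an illustrative reconstruction under stated assumptions, not the authors' exact invocation: the output directory and FASTQ file names are hypothetical, the `-fastq` input format flag is assumed, and the helper functions only build argument lists without running anything. In standard Velvet distributions the maximum k-mer length is normally fixed when the program is compiled (`MAXKMERLENGTH`), so it is not passed on the command line here.

```python
from typing import List


def velveth_command(out_dir: str, left_fastq: str, right_fastq: str,
                    hash_length: int = 69) -> List[str]:
    """Build a velveth call for paired short reads stored in two files."""
    return [
        "velveth", out_dir, str(hash_length),
        "-shortPaired",   # short paired-end read type
        "-fastq",         # assumed input format (not stated in the text)
        "-separate",      # left and right mates are in separate files
        left_fastq, right_fastq,
    ]


def velvetg_command(out_dir: str) -> List[str]:
    """Build a velvetg call that tracks short-read positions."""
    return ["velvetg", out_dir, "-read_trkg", "yes"]


if __name__ == "__main__":
    # Hypothetical directory and file names, for illustration only.
    for cmd in (velveth_command("velvet_k69", "berry_R1.fastq", "berry_R2.fastq"),
                velvetg_command("velvet_k69")):
        print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps the sketch easy to inspect and allows the lists to be handed to `subprocess.run` unchanged if Velvet is installed.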

Functional Annotation
The full-length contigs were annotated by homology searches against the NR database [5] with an E-value cutoff of 1e-5. Functional annotation was performed using KOG [6] and GO [7]. Transcripts were compared and annotated against KEGG [8] with the parameters -species ko and an E-value of 1e-5. Contigs were classified using Pfam [9].
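The E-value cutoff of 1e-5 can be illustrated with a small filter over homology search results. The sketch below assumes the hits are available in the standard 12-column BLAST tabular layout (E-value in the eleventh column), which the text does not state; the function name `best_hits` and the example identifiers are hypothetical. It keeps, for each contig, the hit with the lowest E-value at or below the cutoff.

```python
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str],
              max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Map each query ID to its (subject ID, E-value) hit with the lowest E-value."""
    best: Dict[str, Tuple[str, float]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column tabular record
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue  # hit fails the significance cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


if __name__ == "__main__":
    # Hypothetical hits: two for contig_1, one weak hit for contig_2.
    demo = [
        "contig_1\tsubj_A\t90.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t200",
        "contig_1\tsubj_B\t95.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t250",
        "contig_2\tsubj_C\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t30",
    ]
    print(best_hits(demo))
```

Contigs whose hits all exceed the cutoff, such as `contig_2` in the demo, are simply absent from the result and would remain unannotated at this step.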

Ethics Statement
Nil.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Data Availability
Dataset of De Novo Hybrid Berry Transcriptome Profiling and Characterization of Piper sps. (Piper nigrum and Piper longum) using Illumina and Nanopore Sequencing (Original data) (Mendeley Data).