Dataset from transcriptome profiling of Musa resistant and susceptible cultivars in response to Fusarium oxysporum f.sp. cubense race1 and TR4 challenges using Illumina NovaSeq

This dataset comprises RNA-seq data generated in response to Fusarium oxysporum f.sp. cubense (Foc) race1 (Cavendish-infecting strain VCG 0124), using resistant (cv. Rose, AA) and susceptible (Namarai, AA) cultivars, and to Tropical Race 4 (TR4, strain VCG 01213/16), using resistant (cv. Rose, AA) and susceptible (Matti, AA) cultivars. The contrasting cultivars were independently challenged with Foc race1 and TR4, and root and corm samples were collected in duplicate at five time points [0th (control), 2nd, 4th, 6th, and 8th days]. All 80 RNA samples passed the primary quality checks, including an RNA integrity number > 7. Library preparation and secondary quality control were completed successfully for all samples before sequencing. Sequencing yielded 10 to 31 million paired-end raw reads per sample, with a cumulative raw data size of 11–50 GB. The raw reads were aligned against the reference genome of Musa acuminata ssp. malaccensis version 2 (DH Pahang), as well as the pathogen genomes of Foc race 1 and Foc TR4, using the HISAT2 alignment tool. The study focused on differential gene expression patterns of Musa spp. upon Foc infection. In Foc race1 resistant and susceptible root samples across the sampled days, the number of up-regulated genes ranged from 1 to 228 and the number of down-regulated genes from 1 to 274. In corm samples, up-regulated genes ranged from 1 to 149 and down-regulated genes from 3 to 845. For Foc TR4 resistant and susceptible root samples, up-regulated genes ranged from 31 to 964 and down-regulated genes from 316 to 1315. In corm samples, up-regulated genes ranged from 57 to 929 and down-regulated genes from 40 to 936. In addition to the primary analysis, a secondary analysis was conducted, including Gene Ontology (GO), euKaryotic Orthologous Groups (KOG) classification, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and investigations into Simple Sequence Repeats (SSRs), Single Nucleotide Polymorphisms (SNPs), and microRNA (miRNA). The complete dataset was curated and housed at ICAR-NRCB, Trichy, for the duration of the study. Further, the raw transcriptome read datasets have been submitted to the National Center for Biotechnology Information - Sequence Read Archive (NCBI-SRA) database, making the dataset accessible and reproducible for further research.

Dataset link: Transcriptome analysis of Foc TR4 challenged Musa spp. (Original data)
Dataset link: Transcriptome analysis of Foc race1 challenged Musa spp. (Original data)
Dataset link: Dataset from transcriptome profiling of Musa resistant and susceptible cultivars in response to Fusarium oxysporum f.sp. cubense race1 and TR4 challenges using Illumina NovaSeq (Original data)
Dataset link: RNA-seq raw read counts to genes in the Foc race1 and TR4 challenged corm, root samples (before normalization) (Original data)

Keywords:
Banana; Fusarium oxysporum f.sp. cubense; Transcriptome; Gene-expression; Annotation; Illumina sequencing

Value of the Data
• This information uncovers a range of downstream analyses, including assembly, annotation, differential expression, pathway investigation, and exploration of interactions between Musa spp. and the respective Foc race 1 and TR4.
• By examining the transcriptome profiles and annotations of Fusarium wilt resistant and susceptible cultivars that are challenged with Foc race1 and TR4 independently, we can gain insights into the intricate molecular mechanisms that drive defense pathways in banana.
• The outcomes aid in pinpointing potential genes associated with plant resistance within the non-model organism, banana. Furthermore, they enhance our existing comprehension of interactions between hosts and pathogens.
• The transcriptome sequences will function as essential references and valuable reservoirs for investigating genes linked to resistance and defense, which hold significant roles in Musa cultivars afflicted by Foc race1 and TR4 infections.
• Repurposing the data facilitates the creation of markers for Marker-Assisted Selection, enabling breeders to precisely target and enhance desired traits, particularly in relation to Fusarium wilt resistance.
• Employing the data in Genome-Wide Association Studies aids in the identification of candidate genes responsible for controlling plant immunity.
• Utilizing the data empowers breeders to develop new cultivars with heightened resistance to Fusarium wilt. The information obtained from these data will provide a baseline understanding of the genes of practical importance in a resistance breeding programme.

Objective of the Study
The primary objective of this study is to compare the infection responses of diploid resistant and susceptible banana cultivars to Foc race 1 (the Cavendish-infecting strain) and TR4. To achieve this goal, a comprehensive analysis of the global transcriptome responses has been conducted in three banana cultivars: cv. Rose (AA, resistant to both races), Namarai (AA, susceptible to Foc race 1), and Matti (AA, susceptible to Foc TR4). The study aims to identify differentially expressed genes (DEGs) and elucidate the distinct defense responses triggered by these two Foc races.

Data Description
Detailed statistics on transcripts and unigenes for the Foc race1 and TR4-challenged Musa transcriptome profiles are available in Supplementary data 1a, 1b and 2a, 2b (https://data.mendeley.com/datasets/pnnhtxpd23/1). On average, root samples yielded 342,699 and 424,031 unigenes, while corm samples yielded 410,984 and 373,939 unigenes, for Foc race1 and TR4 challenged Musa cultivars, respectively. These unigenes had an average length of around 1,120 base pairs. The annotation process used a range of databases, notably KOG, NR, KEGG, InterPro, GO, and Pfam, as outlined in Table 3. The entire workflow of the transcriptome analysis of Musa samples is depicted in Fig. 1. To enhance data accessibility, the raw reads were deposited into the NCBI-SRA database.

Sample collection
The study focused on the RNAseq data obtained from the interaction between Musa species and two distinct Foc strains: race1 (Cavendish-infecting strain VCG 0124) [1,2] and Tropical Race 4 (TR4, strain VCG 01213/16) [3]. The investigation targeted both resistant and susceptible cultivars of Musa: cv. Rose (resistant, AA, Accession no. 0638) and Namarai (susceptible, AA, Accession no. 0185) for the Foc race1 strain, and cv. Rose (resistant, AA) and Matti (susceptible, AA, Accession no. 0182) for the TR4 strain [4]. The chosen cultivars were inoculated with Foc race1 and TR4, after which root and corm samples were collected at several time points [0th (control), 2nd, 4th, 6th, and 8th days], each in duplicate.
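The sampling design described above can be enumerated as a quick consistency check. The following is a minimal sketch, not part of the original workflow; the label strings and the function name `enumerate_samples` are illustrative assumptions. It shows how the factors in the text multiply out to the 80 RNA samples reported for the study.

```python
from itertools import product

# Factors taken from the text; the label strings themselves are illustrative.
CHALLENGES = {
    "Foc race1": ["Rose (resistant)", "Namarai (susceptible)"],
    "Foc TR4": ["Rose (resistant)", "Matti (susceptible)"],
}
TISSUES = ["root", "corm"]
DAYS = [0, 2, 4, 6, 8]  # day 0 is the control
REPLICATES = [1, 2]


def enumerate_samples():
    """Return one record per RNA sample implied by the design."""
    samples = []
    for race, cultivars in CHALLENGES.items():
        for cultivar, tissue, day, rep in product(cultivars, TISSUES, DAYS, REPLICATES):
            samples.append(
                {
                    "race": race,
                    "cultivar": cultivar,
                    "tissue": tissue,
                    "day": day,
                    "replicate": rep,
                }
            )
    return samples


samples = enumerate_samples()
# 2 races x 2 cultivars x 2 tissues x 5 days x 2 replicates
print(len(samples))  # 80
```

Each challenge contributes 40 of these samples, so a sample sheet built this way can be split per race before downstream analysis.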

Total RNA extraction
Total RNA was isolated from root and corm samples using the RNA extraction kit (Sigma-Aldrich, USA). Total RNA was analyzed by agarose gel electrophoresis for size and integrity. Total RNA was quantified with a NanoDrop 2000 (Thermo Scientific™ NanoDrop™ 2000/2000c spectrophotometers). Subsequently, the integrity of the RNA used for library preparation was checked using a Bioanalyzer (Agilent, USA), requiring an RNA integrity number > 7. The RNA samples that passed quality control (QC) were then processed for library preparation.

Library preparation
DNA-free RNA was used for cDNA synthesis and amplification employing the NEBNext® Single Cell for cDNA Synthesis & Amplification Module (E6421), New England Biolabs, Massachusetts, USA. Following cDNA synthesis, the resulting product was purified using Pronex beads (NG103B) (Promega, Madison, USA). The length of the cDNA library was evaluated using the Agilent TapeStation instrument (Agilent Technologies). Subsequently, the prepared library was sequenced on the Illumina NovaSeq 6000 platform (Illumina, USA).

Transcriptome sequencing and assembly
Raw reads obtained from sequencing were processed to obtain high-quality reads. All reads were trimmed with Trimmomatic 0.35 [5] to remove low-quality reads and any adapter sequences. Sequence quality was assessed using the fastp tool (a FASTQ data pre-processing tool) with default settings [6]; it provides quality control, adapter trimming, quality filtering, and read pruning. The resulting high-quality reads of each sample were mapped to the Musa acuminata DH-Pahang v2 reference genome from the Banana Genome Hub (https://banana-genome-hub.southgreen.fr/download) [7,8,9] using BWT and HISAT2 [10,11]. The transcriptome assembly pipeline is illustrated in Fig. 1 [12,13].
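The alignment step described above can be sketched as command construction for HISAT2. This is a minimal illustration only: the text does not report the HISAT2 parameters used, so the thread count, the index prefix `pahang_v2`, and all file names below are hypothetical. Only standard HISAT2 options are used (`-x` index prefix, `-1`/`-2` paired-end reads, `-S` SAM output, `-p` threads).

```python
import shlex


def hisat2_build_cmd(genome_fasta, index_prefix):
    """Command to build a HISAT2 index from a reference FASTA."""
    return ["hisat2-build", genome_fasta, index_prefix]


def hisat2_align_cmd(index_prefix, reads_1, reads_2, out_sam, threads=4):
    """Command to align one paired-end sample against an existing index."""
    return [
        "hisat2",
        "-p", str(threads),   # number of alignment threads (assumed value)
        "-x", index_prefix,   # index built with hisat2-build
        "-1", reads_1,        # forward (mate 1) reads
        "-2", reads_2,        # reverse (mate 2) reads
        "-S", out_sam,        # SAM output file
    ]


# Hypothetical file names, for illustration only.
cmd = hisat2_align_cmd(
    "pahang_v2",
    "sample_R1.trimmed.fastq.gz",
    "sample_R2.trimmed.fastq.gz",
    "sample.sam",
)
print(shlex.join(cmd))
```

Building the argument list separately from execution keeps the sketch testable without HISAT2 installed; on a machine with the tool available, the list could be passed to `subprocess.run(cmd, check=True)`.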

Functional annotation
We used the DESeq2 package [14], which is designed for the normalization, visualization, and assessment of differential gene expression (DGE) in high-dimensional count data. For the comparison of resistant and susceptible conditions against the control, we applied defined criteria to identify up-regulated and down-regulated genes. Genes were considered up-regulated if their log2 fold change (log2FC) was greater than or equal to 2 and their adjusted p-value (padj) was below 0.05. Conversely, genes were deemed down-regulated if their log2 fold change was less than or equal to -2 and their adjusted p-value was below 0.05. These criteria identified significant changes in gene expression associated with resistance and susceptibility. Following infection with Foc race1, we detected an average of 47 and 34 up-regulated genes, as well as 92 and 133 down-regulated genes, in samples taken from the roots and corms of resistant and susceptible cultivars, respectively. These samples were collected at multiple time points [0th (control), 2nd, 4th, 6th, and 8th days]. Similarly, in response to TR4 infection, we observed an average of 564 and 381 up-regulated genes, along with 730 and 477 down-regulated genes, in the roots and corms of resistant and susceptible cultivars, respectively (Supplementary data 3a-3h) (https://data.mendeley.com/datasets/pnnhtxpd23/1). The commonly expressed genes upon Foc race1 and TR4 infection in root and corm samples of resistant and susceptible cultivars are provided in Fig. 2.

The assembled contigs, which encompassed full-length sequences, were annotated through similarity searches against the non-redundant (NR) databases [15], with an e-value threshold of 1e-5. Functional annotation was conducted using both the KOG [16] and GO [17] databases. For KEGG pathway analysis [18], the parameters were species ko and an E-value cutoff of 1e-5, which facilitated the comparison of annotated transcripts. Furthermore, to assess the distribution of differentially expressed genes (DEGs) across various pathways, we used the WEGO tool [19] to compute statistical GO enrichment (Fig. 3). The contigs were classified using the Pfam database [20]. This annotation approach characterizes the functions and potential roles of the identified contigs and genes (Tables 1, 2a and 2b).
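The DEG criteria stated in this section (log2FC >= 2 or <= -2, with padj < 0.05) can be expressed as a small classification rule. The following is a minimal sketch applied to made-up values, not the authors' script; the function name `classify_gene` and the example gene identifiers are hypothetical. Genes whose adjusted p-value is missing are treated as unchanged here, which is an assumption, since DESeq2 can report NA for padj.

```python
import math
from collections import Counter


def classify_gene(log2fc, padj, lfc_cutoff=2.0, alpha=0.05):
    """Label a gene 'up', 'down', or 'unchanged' using the stated thresholds."""
    # A missing or non-significant adjusted p-value is never called a DEG.
    if padj is None or math.isnan(padj) or padj >= alpha:
        return "unchanged"
    if log2fc >= lfc_cutoff:
        return "up"
    if log2fc <= -lfc_cutoff:
        return "down"
    return "unchanged"


# Hypothetical rows: (gene id, log2 fold change, adjusted p-value).
results = [
    ("geneA", 2.7, 0.001),
    ("geneB", -3.1, 0.02),
    ("geneC", 2.4, 0.20),
    ("geneD", -1.2, 0.001),
    ("geneE", 2.0, float("nan")),
]

calls = {gene: classify_gene(lfc, padj) for gene, lfc, padj in results}
counts = Counter(calls.values())
print(calls)
print(counts)
```

Counting the "up" and "down" labels per comparison gives the kind of per-sample DEG totals summarized in the text.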

Fig. 1. Workflow illustrating the transcriptome analysis of Musa challenged by Foc race1 and TR4.

The NCBI-SRA deposits carry the following temporary BioSample accession numbers: Foc race1 samples are under SAMN36510589-SAMN36510628, and Foc TR4 samples are under SAMN36780136-SAMN36780175. The transcriptome analysis and annotation files, encompassing predicted SSRs, SNPs, KOG, NR, KEGG pathways, and plant transcription factors, have been securely archived on an in-house server.

Fig. 2. Venn diagrams of commonly expressed genes upon Foc race1 and TR4 infection in root and corm samples of resistant and susceptible cultivars, based on absolute normalized threshold values > 1 per sample. A) Foc race1 corm (Res vs Sus); B) Foc race1 root (Res vs Sus); C) Foc TR4 corm (Res vs Sus); D) Foc TR4 root (Res vs Sus).
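The set comparison behind a resistant-versus-susceptible Venn diagram can be sketched as follows. This is an illustration under stated assumptions: a gene is counted as expressed when its absolute normalized value exceeds 1, which is one reading of the threshold in the caption, and the gene identifiers and values are made up. The function names `expressed_genes` and `venn_partition` are hypothetical.

```python
def expressed_genes(normalized, threshold=1.0):
    """Genes whose absolute normalized value exceeds the threshold."""
    return {gene for gene, value in normalized.items() if abs(value) > threshold}


def venn_partition(resistant, susceptible):
    """Split two gene sets into the three regions of a two-set Venn diagram."""
    return {
        "resistant_only": resistant - susceptible,
        "shared": resistant & susceptible,
        "susceptible_only": susceptible - resistant,
    }


# Hypothetical normalized values for one tissue and one Foc race.
resistant_values = {"g1": 5.2, "g2": 0.4, "g3": 12.0, "g4": 1.0}
susceptible_values = {"g1": 3.1, "g2": 2.2, "g3": 0.0, "g5": 7.5}

regions = venn_partition(
    expressed_genes(resistant_values),
    expressed_genes(susceptible_values),
)
for name, genes in regions.items():
    print(name, sorted(genes))
```

The sizes of the three regions are the numbers that would be drawn in each panel.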

Table 1
Average transcriptome summary statistics.

Table 2a
Total number of contigs from the transcriptomes of Musa plants, specifically the corm and root tissues of both resistant and susceptible cultivars, upon challenge with Foc race1 and TR4.

Table 2b
Total number of peptides from the transcriptomes of Musa plants, specifically the corm and root tissues of both resistant and susceptible cultivars, upon challenge with Foc race1 and TR4.

Table 3
Statistics of annotation and analysis.