Dataset for transcriptomic profiles associated with development of sexual structures in Aspergillus flavus

Information on the transcriptomic changes that occur within sclerotia of Aspergillus flavus during its sexual cycle is very limited and warrants further research. Such findings will broaden our knowledge of the biology of A. flavus and can provide valuable insights into the development or deployment of non-toxigenic strains as biocontrol agents against aflatoxigenic strains. This article presents transcriptomic datasets included in our research article entitled “Development of sexual structures influences metabolomic and transcriptomic profiles in Aspergillus flavus” [1], which used transcriptomics to identify possible genes and gene clusters associated with sexual reproduction and fertilization in A. flavus. RNA was extracted from sclerotia of a high fertility cross (Hi-Fert-Mated), a low fertility cross (Lo-Fert-Mated), and unmated strains (Hi-Fert-Unmated and Lo-Fert-Unmated) of A. flavus collected immediately after crossing and every two weeks thereafter until eight weeks of incubation on mixed cereal agar at 30 °C in continuous darkness (n = 4 replicates from each treatment for each time point; 80 total). Raw sequencing reads obtained on an Illumina NovaSeq 6000 were deposited in NCBI's Sequence Read Archive (SRA) repository under BioProject accession number PRJNA789260. Reads were mapped to the A. flavus NRRL 3357 genome (assembly JCVI-afl1-v2.0; GCA_000006275.2) using STAR software. Differential gene expression analyses were performed with the DESeq2 R package, followed by functional enrichment analyses and weighted gene co-expression network analysis (WGCNA). The raw and analyzed data presented in this article can be reused for comparisons with other datasets to identify transcriptional differences among strains of A. flavus or closely related species. The data can also be used for further investigation of the molecular basis of different processes involved in sexual reproduction and sclerotia fertility in A. flavus.

Value of the Data
• This article reports a transcriptomic dataset from sclerotia of A. flavus exhibiting a high level of fertility, compared with sclerotia exhibiting a low level of fertility and with unfertilized sclerotia. The data will be useful for researchers interested in the gene expression, genomics, and functional genomics of A. flavus and other fungi with a sexual cycle.
• The raw data and methodologies in this article can be reused for comparison with similar datasets to identify transcriptional differences among strains of A. flavus or closely related species.
• The reported data can be used to screen for candidate genes that are involved in the initiation of sexual reproduction, development of sexual structures, and other fertilization-associated processes in A. flavus. The data can further be used to investigate the molecular basis and functional pathways of these processes.
• Genes that are differentially expressed between treatments and time points can be used as markers for sclerotia fertility and can be useful in developing biocontrol strategies against aflatoxigenic strains of A. flavus.

Data Description
A total of 80 transcriptome libraries were generated from four samples collected from each of four treatments (Hi-Fert-Mated, Lo-Fert-Mated, Hi-Fert-Unmated and Lo-Fert-Unmated) at five sampling time points (T0, T1, T2, T3 and T4, described below). The sequence reads obtained on an Illumina NovaSeq 6000 were deposited in NCBI's Sequence Read Archive (SRA) repository under BioProject accession number PRJNA789260. A list of samples according to treatment × time point combination is shown in Table 1. The reported values for % duplicates, % GC content, and total sequence lengths were calculated after low-quality reads and adapters had been removed from the raw sequence data. Each library contains an average of 18.69 million quality-filtered reads, yielding a total of 1.50 billion reads (Table 1).
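The arithmetic behind the library count and the read total can be checked with a short sketch. This is an illustration only, not part of the deposited data: the sample identifiers are hypothetical and do not correspond to the sample names in Table 1 or in the SRA records.

```python
from itertools import product

# Design factors as described in the text.
TREATMENTS = ["Hi-Fert-Mated", "Lo-Fert-Mated", "Hi-Fert-Unmated", "Lo-Fert-Unmated"]
TIME_POINTS = ["T0", "T1", "T2", "T3", "T4"]
REPLICATES = 4
MEAN_READS_PER_LIBRARY = 18.69e6  # average quality-filtered reads per library


def build_manifest():
    """Enumerate every treatment x time point x replicate combination.

    The sample_id format is hypothetical and used only for illustration.
    """
    return [
        {
            "sample_id": f"{treatment}_{time_point}_rep{replicate}",
            "treatment": treatment,
            "time_point": time_point,
            "replicate": replicate,
        }
        for treatment, time_point, replicate in product(
            TREATMENTS, TIME_POINTS, range(1, REPLICATES + 1)
        )
    ]


manifest = build_manifest()
total_reads = len(manifest) * MEAN_READS_PER_LIBRARY

print(f"libraries: {len(manifest)}")
print(f"approximate total reads: {total_reads / 1e9:.2f} billion")
```

Four treatments, five time points, and four replicates give 80 libraries, and 80 libraries at an average of 18.69 million reads each give roughly 1.50 billion reads, in agreement with the totals reported above.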
Multifactor analyses were used to identify genes that were differentially expressed between main factor effects: fertility (high vs. low), sampling time point (each of 2, 4, 6, and 8 weeks of incubation vs. T0), and mating (mated = TRUE vs. unmated = FALSE). Analyses were conducted in DESeq2 using the formula: ∼ time + fertility + mating. Differentially expressed genes (DEGs) were defined as having a fold change of at least 2 (|log2 fold change| ≥ 1) and an adjusted p-value < 0.05. Expression values for genes that meet these criteria are listed in Table 2. A total of 2804 DEGs were identified between fertility levels, up to 3810 DEGs between sampling time points, and 731 DEGs between mating categories (Table 2A and 2B). The interaction effect between fertility and mating was investigated using the formula: ∼ time + fertility + mating + fertility:mating and can be identified in the dataset as Fertilityhigh.matedTRUE. This analysis identified 710 DEGs that were detected in Hi-Fert-Mated but not in Lo-Fert-Mated (Table 2A and 2B). All DEGs identified in the multifactor comparisons were subjected to functional enrichment analysis. P-values for each functional term are reported in separate tables for up-regulated and down-regulated genes combined (Table 2C), up-regulated genes only (Table 2D), and down-regulated genes only (Table 2E). Pairwise analyses between 36 different treatment × time point combinations were also performed. These comparisons identified genes that were differentially expressed between mated strains at the same time point, between unmated strains at the same time point, and within the same treatment at consecutive time points. The number of DEGs in the pairwise comparisons ranged from 2 to 3058 genes (Table 2F and 2G).
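The DEG criteria described above can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the authors' code: the gene identifiers and values are invented, and treating a missing adjusted p-value as "not a DEG" is an assumption made here for safety.

```python
import math


def is_deg(log2_fold_change, padj, lfc_cutoff=1.0, alpha=0.05):
    """Return True if a gene meets the fold-change and adjusted p-value criteria.

    A fold change of at least 2 in either direction corresponds to
    |log2 fold change| >= 1. Missing values are treated as non-significant
    (an assumption of this sketch).
    """
    if log2_fold_change is None or padj is None:
        return False
    if math.isnan(log2_fold_change) or math.isnan(padj):
        return False
    return abs(log2_fold_change) >= lfc_cutoff and padj < alpha


def direction(log2_fold_change):
    """Label a DEG as up- or down-regulated by the sign of its log2 fold change."""
    return "up" if log2_fold_change > 0 else "down"


# Hypothetical results: (gene id, log2 fold change, adjusted p-value).
example_results = [
    ("gene_A", 2.3, 0.001),         # passes both criteria, up-regulated
    ("gene_B", -1.0, 0.04),         # passes both criteria, down-regulated
    ("gene_C", 0.8, 0.0001),        # significant, but fold change below 2
    ("gene_D", 3.1, 0.20),          # large change, but not significant
    ("gene_E", 1.5, float("nan")),  # no adjusted p-value available
]

degs = {
    gene: direction(lfc)
    for gene, lfc, padj in example_results
    if is_deg(lfc, padj)
}
print(degs)
```

Splitting the retained genes by the sign of the log2 fold change mirrors the separate up-regulated and down-regulated gene sets used for the enrichment tables.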
Co-expression module analysis using weighted gene co-expression network analysis (WGCNA) identified 25 modules of highly correlated genes in the Hi-Fert strain (NRRL 29507) (Table 3A). By default, WGCNA uses colors to name the modules. The overall gene expression profile of each module was correlated with mating and time point. The black, dark red, salmon, and pink modules yielded correlation values with mating above 0.5 (p-value < 0.05) (Table 3A). The preservation (Z) score shows how strongly the modules in the Hi-Fert strain are preserved among genes in the Lo-Fert strain (NRRL 21882). Modules with Z scores between 2 and 10 were considered weakly to moderately preserved, while modules with Z scores above 10 were considered highly preserved. The pink module was highly preserved, while the other three modules were weakly to moderately preserved in the low fertility strain (Table 3A). Results of the enrichment analysis for these four modules are shown in Table 3B. The list of genes for each co-expression module is shown in Table 3C.
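The preservation thresholds described above can be expressed as a small classification function. This is a sketch only: the label for Z scores below 2 is an assumption based on common WGCNA practice and is not stated in the text, and the example scores are invented rather than taken from Table 3A.

```python
def classify_preservation(z_score):
    """Map a module preservation Z score to a qualitative category.

    Thresholds follow the text: 2 to 10 is weak to moderate, above 10 is high.
    The category for Z < 2 is an assumption of this sketch.
    """
    if z_score > 10:
        return "highly preserved"
    if z_score >= 2:
        return "weakly to moderately preserved"
    return "no evidence of preservation"


# Hypothetical Z scores for illustration; real values are in Table 3A.
example_scores = {"module_1": 14.2, "module_2": 6.5, "module_3": 1.1}

for module, z in example_scores.items():
    print(f"{module}: Z = {z} -> {classify_preservation(z)}")
```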

Treatments and sclerotia production
This article reports the transcriptomes of A. flavus sclerotia exhibiting different levels of fertility, collected over an eight-week period of incubation at 30 °C in continuous darkness. The treatments consisted of a high fertility cross (Hi-Fert-Mated, NRRL 29507 sclerotia × NRRL 21882 conidia), a low fertility cross (Lo-Fert-Mated, NRRL 21882 sclerotia × NRRL 29507 conidia), and unmated strains (Hi-Fert-Unmated, NRRL 29507 sclerotia; Lo-Fert-Unmated, NRRL 21882 sclerotia) of A. flavus. Selection of parental strains (NRRL 29507 and NRRL 21882) was based on the study of Horn et al. [2], and the sclerotia and conidia from parental strains were prepared according to the methodologies in Luis et al. [1,3]. Briefly, Hi-Fert-Mated was prepared by placing sclerotia of NRRL 29507 over a layer of NRRL 21882 conidia on a mixed cereal agar (MCA) [4] plate. Lo-Fert-Mated was prepared by placing sclerotia of NRRL 21882 over a layer of NRRL 29507 conidia on an MCA plate. Sclerotia of NRRL 29507 and of NRRL 21882 were individually plated on MCA to serve as unmated controls. Culture plates were sealed with parafilm, arranged in zip lock bags, and then incubated at 30 °C under continuous darkness.
Changes in transcription profiles over the eight-week period were assessed by collecting culture plates from each treatment immediately after crossing (T0) and at 2 weeks (T1), 4 weeks (T2), 6 weeks (T3) and 8 weeks (T4) of incubation. During harvesting, 3-5 mL of distilled water containing 0.01% Triton-X was poured onto the culture plate and the sclerotia were then carefully detached from the agar using a transfer loop. Residual conidia that remained attached to the sclerotia were removed by transferring the sclerotia to 50 mL microtubes, washing them repeatedly in DEPC-treated water, and then filtering them through Miracloth (MilliporeSigma). The sclerotium samples (n = 4 sample replicates per treatment per time point; 80 total) were flash frozen, stored at -80 °C until all samples were collected, then submitted to the North Carolina State University Genomic Sciences Laboratory (Raleigh, NC, USA) for RNA extraction and sequencing.

RNA extraction, library construction and Illumina sequencing
RNA extraction was performed using Qiagen RNeasy mini columns and reagents (Germantown, MD, USA) following the manufacturer's instructions. Integrity, purity, and concentration of RNA were checked using an Agilent 2100 Bioanalyzer with an RNA 6000 Nano Chip (Agilent Technologies, USA). Messenger RNA (mRNA) was purified using oligo-dT beads included in the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs, USA). Complementary DNA (cDNA) libraries for Illumina sequencing were prepared with the NEBNext Ultra Directional RNA Library Prep Kit (NEB) and NEBNext Multiplex Oligos for Illumina (NEB) following the manufacturer's protocol. Amplified library fragments were purified, quality-checked and quantified for final concentration using an Agilent 2200 Tapestation (Agilent Technologies, USA). Quantified libraries were pooled in equimolar amounts for clustering and sequencing on an Illumina NovaSeq 6000 DNA sequencer in 3 XP split lanes using a 100 bp single-end sequencing SP reagent kit (Illumina, USA). Raw bcl files were generated via the Real Time Analysis software package, then de-multiplexed by sample into fastq files.

Differential expression analysis
The quality of raw sequence reads was checked using FastQC [5] prior to analysis. Low-quality sequences and adapters were removed using BBDuk [6]. The sequencing reads were then mapped to the A. flavus NRRL 3357 genome (assembly JCVI-afl1-v2.0; GCA_000006275.2) using STAR v2.6.1 [7]. Multifactor analyses comparing the effects of fertility, mating and sampling time point were conducted in DESeq2 v1.28.1 [8]. Multifactor analyses were modeled using the formula: ∼ time + fertility + mating. The interaction effect between fertility and mating was analyzed using the formula: ∼ time + fertility + mating + fertility:mating. Values for the interaction effect were extracted with the "results" function from DESeq2 as "results(dds, name = 'fertilityhigh.matedTRUE', alpha = 0.05)". Genes with |log2(fold change)| ≥ 1 and adjusted P ≤ 0.05 were considered differentially expressed. Pairwise analyses of differentially expressed genes (DEGs) between treatments or time points were also conducted in DESeq2. All differentially expressed gene sets from the multifactor and pairwise comparisons were subjected to functional enrichment analyses using annotation terms from the Gene Ontology, KEGG pathways, SMURF secondary metabolite clusters, Apoplast-p, Signal-p, Effector-p, Deeploc, and Interpro domains.
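To illustrate what the interaction formula estimates, the sketch below builds one design-matrix row per sample using treatment (dummy) coding. It is not the authors' code and does not call DESeq2. It assumes that T0, low fertility, and unmated are the reference levels, and that fertility is assigned by sclerotial parent strain. The column names are hypothetical and are not DESeq2 coefficient names.

```python
TIME_LEVELS = ["T0", "T1", "T2", "T3", "T4"]  # T0 assumed to be the reference level

# Hypothetical column names for the model ~ time + fertility + mating + fertility:mating.
COLUMNS = [
    "intercept",
    "time_T1", "time_T2", "time_T3", "time_T4",
    "fertility_high",
    "mated",
    "fertility_high:mated",
]


def design_row(treatment, time_point):
    """Return the dummy-coded design row for one sample.

    Treatment names follow the text, e.g. 'Hi-Fert-Mated' or 'Lo-Fert-Unmated'.
    """
    if time_point not in TIME_LEVELS:
        raise ValueError(f"unknown time point: {time_point}")
    fertility_high = 1 if treatment.startswith("Hi-Fert") else 0
    mated = 1 if treatment.endswith("-Mated") else 0
    time_dummies = [1 if time_point == level else 0 for level in TIME_LEVELS[1:]]
    # The interaction column is the product of the two main-effect columns,
    # so it is non-zero only for Hi-Fert-Mated samples.
    return [1] + time_dummies + [fertility_high, mated, fertility_high * mated]


for treatment in ["Hi-Fert-Mated", "Lo-Fert-Mated", "Hi-Fert-Unmated", "Lo-Fert-Unmated"]:
    row = dict(zip(COLUMNS, design_row(treatment, "T2")))
    print(treatment, row)
```

Under this coding, the interaction coefficient captures the part of the mating response that is specific to the high fertility strain. That is, it measures how the mated vs. unmated difference in Hi-Fert departs from the same difference in Lo-Fert.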

Weighted gene co-expression network analysis
Variance-stabilized mRNA counts from DESeq2 were used as input for WGCNA [9] to create individual co-expression networks for the Hi-Fert (NRRL 29507) and Lo-Fert (NRRL 21882) strains. The two networks were created using 40 samples with NRRL 29507 and 40 samples with NRRL 21882 as sclerotial parents. The settings used for network adjacency matrix creation were "corFnc = 'bicor', type = 'signed hybrid', power = 10". Module preservation analysis was conducted with the modulePreservation function, with the Hi-Fert modules used as the reference for comparison with the Lo-Fert network. Functional enrichment analysis was performed on each co-expression module gene set using annotation terms from the Gene Ontology, KEGG pathways, SMURF secondary metabolite clusters, Apoplast-p, Signal-p, Effector-p, Deeploc, and Interpro domains.
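The "signed hybrid" adjacency with power = 10 can be illustrated with a minimal sketch. In a signed hybrid network, the adjacency between two genes is the correlation raised to the chosen power when the correlation is positive, and zero otherwise. For simplicity, this sketch uses the Pearson correlation in place of the biweight midcorrelation (bicor) used in the analysis. The gene names and expression values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def signed_hybrid_adjacency(expression, power=10):
    """Compute a signed hybrid adjacency for every gene pair.

    adjacency = cor ** power if cor > 0, else 0.
    `expression` maps gene name -> list of expression values across samples.
    """
    genes = list(expression)
    adjacency = {}
    for gene_i in genes:
        for gene_j in genes:
            r = pearson(expression[gene_i], expression[gene_j])
            adjacency[(gene_i, gene_j)] = r ** power if r > 0 else 0.0
    return adjacency


# Hypothetical variance-stabilized expression values across four samples.
example_expression = {
    "gene_A": [1.0, 2.0, 3.0, 4.0],
    "gene_B": [2.0, 4.0, 6.0, 8.0],   # perfectly positively correlated with gene_A
    "gene_C": [4.0, 3.0, 2.0, 1.0],   # perfectly negatively correlated with gene_A
    "gene_D": [1.0, 3.0, 2.0, 4.0],   # moderately correlated with gene_A
}

adjacency = signed_hybrid_adjacency(example_expression, power=10)
for pair in [("gene_A", "gene_B"), ("gene_A", "gene_C"), ("gene_A", "gene_D")]:
    print(pair, round(adjacency[pair], 4))
```

Raising correlations to a high power suppresses weak associations, while setting negative correlations to zero means that only genes changing in the same direction are connected in the network.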

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.