The transcriptomic (RNA-Sequencing) datasets collected in the course of floral induction in Chenopodium ficifolium 459

The transition from vegetative growth to reproduction is an essential commitment in plant life. It is triggered by environmental cues (day length, temperature, nutrients) and regulated by a complex gene signaling network and by phytohormones. The control of flowering is well understood in Arabidopsis thaliana and in some crops, but much less is known about other angiosperms. We performed a detailed transcriptomic survey of the course of floral induction in seedlings of Chenopodium ficifolium accession 459, a close relative of the important crop Chenopodium quinoa. This accession flowers earlier under short days (6 hours light) than under long days (18 hours light). Plants were sampled at the age of 14, 18, 21 and 24 days, in the morning and in the afternoon, under both long and short days, for RNA-Sequencing and for phytohormone analyses. We employed the Illumina NovaSeq6000 platform to generate raw reads, which were cleaned and mapped against the de novo constructed transcriptome of C. ficifolium. Global gene expression levels under long and short days were compared pairwise at each time point. We identified differentially expressed genes associated with floral induction in C. ficifolium 459. Particular attention was paid to the genes responsible for phytohormone metabolism and signaling. The datasets produced by this project contribute to a better understanding of the regulation of growth and development in the genus Chenopodium.

Table
Subject: Plant Science: General
Specific subject area: Transcriptomic changes during floral induction; differential gene expression under short and long photoperiod
Type of data

Value of the Data
• The gene expression data provide a comprehensive picture of transcriptomic changes during floral induction in Chenopodium ficifolium accession 459, making it possible to identify genes putatively involved in the regulation of flowering.
• The transcriptomic data set may be used not only by specialists investigating flowering, but also by researchers interested in plant growth and development, plant stress response and phytohormone function.
• This comprehensive data set may also be used for comparison with the course of floral induction in C. ficifolium accessions with the opposite response to photoperiod, which flower earlier under long days, or for comparison with the important crop Chenopodium quinoa. The integrative analysis of transcriptomic and hormonomic data will contribute to a plausible model of the control of flowering in the genus Chenopodium, which is phylogenetically distant from the current model plants.

Data Description
The general overview of the transcriptomic data is given in Table 1, which presents the accession numbers of the raw data generated by RNA sequencing at particular time points of the floral induction experiment, as well as the counts of raw and trimmed Illumina reads. Clean reads were mapped against the reference de novo transcriptome of C. ficifolium by Salmon, and differential expression (DE) between short day (SD)-treated and long day (LD)-treated plants at particular time points was estimated by DESeq2. The most highly DE genes were analyzed for GO enrichment by OmicsBox v.1.3.3. Table 2 shows the enriched GO categories among the 6096 DE genes with a sum of log2 fold changes above the selected threshold. The enriched GO categories include hydrogen peroxide catabolism, hydrolase and peroxidase activities, and defense response. We generated graphs of gene expression in the course of floral induction under contrasting photoperiods. Fig. 1 shows the graph for the LATE ELONGATED HYPOCOTYL (LHY) gene as an example. LHY is the homolog of the central clock oscillator gene in A. thaliana and may perform the same function in C. ficifolium. The gene expression graphs for the phytohormone-related genes that were not presented in [1] are accessible on Mendeley (DOI: 10.17632/gxh32vrrxc.2). The graphs were constructed from TMM coverage values and log2 fold changes between SD- and LD-grown plants.

Table 1. Accession numbers and read counts for raw data of the transcriptomes from the specific time points in the course of floral induction (days after sowing, DAS) in C. ficifolium 459 under short and long days.

Plant material
The accession C. ficifolium 459 was originally collected in Central Asia [2]. The plants were cultivated in the Institute of Experimental Botany greenhouse and propagated by self-pollination. Seeds were surface-sterilized and germinated as described by Štorchová et al. [2]. Average-sized seedlings with opened cotyledons and uniform growth were selected for the experiments. Plants

RNA sampling and extraction
The seedlings were collected twice a day (in the morning at 9.00 and in the afternoon at 15.00) at 14, 18, 21 and 24 DAS under SD and LD. The light was switched on at 9.00 under both regimes. Above-ground parts of the seedlings (14 and 18 DAS) or upper leaves and stems with apical parts of young plants (21 and 24 DAS) from each photoperiodic regime were collected and flash-frozen in liquid nitrogen. Three biological replicates, each consisting of three to four seedlings from LD conditions and eight to ten seedlings from SD conditions, were sampled at each time point. Total RNA was extracted using a Plant RNeasy Mini kit (Qiagen, Valencia, CA, USA). DNase I treatment was performed according to the manufacturer's protocol (DNA-free, Ambion, Austin, TX, USA) to remove genomic DNA. If necessary, the DNase I treatment was repeated to eliminate any traces of genomic DNA. RNA concentration and quality were checked on a 0.9% agarose gel and with the NanoDrop (Thermo Fisher Scientific, Vantaa, Finland).
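The sampling design described above can be enumerated explicitly. The following is a minimal sketch in Python, not part of the original workflow; the sample label format is hypothetical and chosen only for illustration. It shows how four ages, two times of day, two photoperiods and three biological replicates give the eight time points and 48 seedling samples.

```python
from itertools import product

# Factors taken from the text; label strings are illustrative assumptions.
DAYS_AFTER_SOWING = [14, 18, 21, 24]
TIMES_OF_DAY = ["morning", "afternoon"]  # sampled at 9.00 and 15.00
PHOTOPERIODS = ["SD", "LD"]
REPLICATES = [1, 2, 3]


def build_sample_sheet():
    """Return one hypothetical label per day/time/photoperiod/replicate."""
    return [
        f"{regime}_{das}DAS_{time}_rep{rep}"
        for das, time, regime, rep in product(
            DAYS_AFTER_SOWING, TIMES_OF_DAY, PHOTOPERIODS, REPLICATES
        )
    ]


samples = build_sample_sheet()
time_points = len(DAYS_AFTER_SOWING) * len(TIMES_OF_DAY)
print(time_points, len(samples))
```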

RNA-Sequencing
Total RNAs extracted from the seedlings collected at eight time points under SD and LD were stabilized by GenTegra technology (GenTegra, Pleasanton, California, USA) and sent to Macrogen (Seoul, Korea) in GenTegra microtubes. Additional RNAs were prepared from leaves, flowers, and roots of mature plants grown in the greenhouse to supplement the seedling RNA specimens and achieve a more complete transcriptome assembly. Strand-specific cDNA libraries were constructed from polyA-enriched RNA and sequenced on the Illumina NovaSeq6000 platform.
We obtained 753,019,719 paired-end (PE) reads (150 nt), about 14.8 million reads per sample. The read quality in phred scores per base is shown in Fig. 2. These raw reads were first error-corrected using Rcorrector [3] with default settings, to address random sequencing errors in the RNA-Seq dataset.
After error correction, ribosomal RNA was filtered out with SortMeRNA [4], using the provided SILVA rRNA databases as reference. The resulting reads were then quality- and adapter-trimmed with TrimGalore [5]. We used a trimming length of 145 bp and a quality threshold (-q) of 5, with the default options for stringency and maximum allowed error rate (stringency 1, -e 0.1). This trimming procedure removed approximately 25% of the data, leaving 567,261,573 paired reads after the cutoff.
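The read statistics quoted above can be checked with simple arithmetic. The short Python sketch below assumes that the total refers to 51 libraries (the 48 seedling samples plus the three adult-tissue samples); this sample count is inferred from the text and is not stated alongside the read totals.

```python
RAW_READS = 753_019_719      # paired-end reads reported after sequencing
TRIMMED_READS = 567_261_573  # paired reads reported after trimming
N_SAMPLES = 48 + 3           # assumption: seedling samples + adult tissues

reads_per_sample = RAW_READS / N_SAMPLES
fraction_removed = 1 - TRIMMED_READS / RAW_READS

print(f"{reads_per_sample / 1e6:.1f} million reads per sample")
print(f"{fraction_removed:.1%} of reads removed")
```

Under this assumption the result agrees with the reported values of about 14.8 million reads per sample and approximately 25% of the data removed.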
The raw and trimmed reads of the 48 samples (14, 18, 21, and 24 DAS) were deposited under the BioProject number PRJNA771226 with SRA accessions SRR16327138-SRR16327180 for the raw reads and SRR16380491-SRR16380533 for the trimmed reads. The raw and trimmed reads of three samples (leaves, roots, and flowers of adult plants) are available under the same BioProject number under SRA accessions SRR19142490-SRR19142492 and SRR19143407-SRR19143409, respectively.

Transcriptome assembly and evaluation
A subset of the trimmed reads, one replicate per sampling time point and treatment, together with the three individual samples from leaves, roots and flowers of adult plants, was used for the de novo assembly with Trinity v.2.9.0 [6] with default options and the strand-specific RNA-Seq read orientation parameter (--SS_lib_type RF). The resulting assembly was first roughly evaluated with the Perl script provided in the Trinity pipeline (TrinityStats.pl), which reported 213,741 transcripts, 168,036 potential 'genes', and an N50 value of 1530 based on all transcripts.
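For readers unfamiliar with the N50 statistic, a minimal Python sketch of its usual definition follows: the transcript length L such that transcripts of length L or longer contain at least half of all assembled bases. This is an illustration on toy lengths, not the Trinity script itself.

```python
def n50(lengths):
    """Return the length L such that sequences >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy transcript lengths (hypothetical values, total 2500 bases).
toy_lengths = [100, 200, 300, 400, 500, 1000]
print(n50(toy_lengths))  # 1000 + 500 = 1500 >= 1250, so N50 is 500
```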
The redundancy of the Trinity assembly was first reduced with CD-Hit v.4.8.1 [7], applying a similarity cutoff of 99.9%. The assembly was then condensed further into a non-redundant transcript set with the EvidentialGene script tr2aacds.pl, using MINCDS = 50. The resulting okay set, containing 55,020 transcripts and 51,146 potential genes, was used as the final assembly and as input for a blastx search against the nr database. The blastx search was run with the command-line application in the faster blastx-fast mode, with an e-value of 0.01 and a maximum of 10 target sequences. The blastx results were imported into the MEGAN pipeline [8], and only plant hits were retained.
The EvidentialGene assembly was used for all subsequent analyses and deposited at DDBJ/ENA/GenBank in the TSA archive under the accession GJOD01000000. The version described in this paper is the first version, GJOD01000000.
After this step, we applied three evaluation methods to check the quality of the assembly. First, we used BUSCO v.3.1.0 [9] with the embryophytes_odb9 database in transcriptome mode (-mode trans) to assess the assembly. BUSCO reported 1329 complete BUSCOs (1279 single-copy and 50 duplicated), 34 fragmented and 76 missing BUSCOs. Second, we employed DETONATE with the RSEM-EVAL package v.1.11 [10], using bowtie2, the transcript-length-parameters 959_APVO_SCC_Genes.fasta as true_transcript_length_distribution, and the --strand-specific and --paired-end options for the assembly of 145 bp reads. This evaluation resulted in a score of -78578280472.09. Finally, a custom script was used to evaluate the completeness and contiguity of the Trinity assembly as described in [11]. The assembly showed a completeness of 0.915 and a contiguity of 0.904.
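The BUSCO counts can be converted to percentages for easier comparison with other assemblies. The Python sketch below takes the sum of the four reported categories as the denominator, which is an assumption made here for illustration; the size of the lineage dataset is not stated in the text.

```python
# BUSCO counts as reported in the text.
busco = {
    "complete_single_copy": 1279,
    "complete_duplicated": 50,
    "fragmented": 34,
    "missing": 76,
}

# Assumption: the denominator is the sum of the reported categories.
total = sum(busco.values())
complete = busco["complete_single_copy"] + busco["complete_duplicated"]

percentages = {name: 100 * count / total for name, count in busco.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
print(f"complete (all): {100 * complete / total:.1f}%")
```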
To annotate the transcriptome, blastx-based homology searches (BLAST+ 2.9.0) of the final transcriptome assembly against the NCBI nr protein database were performed. The cutoff E-value was set to < 10⁻⁴, and the maximum number of allowed hits was set to 10. The OmicsBox program v. 1.3.3 (BioBam Bioinformatics S.L., Valencia, Spain) was then used to annotate the "Trinity" genes based on gene ontology (GO) terms, InterProScan, and nr database annotation.

Transcript quantification and pairwise differential expression
Transcript quantification was done within the Trinity pipeline, using the alignment-free method Salmon v.1.4.0 [12] with default parameters, but specifying the strand-specific library type with --SS_lib_type RF for all 48 samples. The resulting estimated fragment counts and normalized expression metrics (transcripts per million transcripts; TPM) were reported for the transcripts and Trinity 'genes' in each of the samples. In the next step, a matrix of estimated counts and a second matrix of cross-sample normalized expression values, obtained with the TMM (trimmed mean of M-values) method, were built for all samples at the transcript and gene level. These matrices were used for the subsequent analyses of DE genes.
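The TPM metric mentioned above can be illustrated with a short Python sketch. It follows the standard definition (counts divided by effective length, rescaled to sum to one million within a sample) and uses toy numbers; it is not the Salmon implementation, and the cross-sample TMM normalization is not reproduced here.

```python
def tpm(counts, effective_lengths):
    """Convert estimated fragment counts to transcripts per million (TPM)."""
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have equal length")
    # Length-normalized rate for each transcript.
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    # Rescale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rates]


# Toy example with three hypothetical transcripts.
toy_counts = [100, 200, 300]
toy_lengths = [1000, 1000, 2000]
toy_tpm = tpm(toy_counts, toy_lengths)
print([round(v, 1) for v in toy_tpm])
```

Because TPM values sum to the same total in every sample, they describe relative abundance within a sample, which is why a separate cross-sample normalization such as TMM is applied before samples are compared.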
The differential gene expression analysis was carried out at both the transcript and the Trinity 'gene' level, using the Bioconductor package DESeq2 v.1.32.0 [13] and the scripts within the Trinity pipeline. The three biological replicates for each sampling time point were compared pairwise, contrasting the LD with the SD condition. The standard single time point analysis was used. DE genes were extracted for each sampling time point with a cutoff of 0.05 for FDR-corrected p-values. Only gene-level data were used for the subsequent analyses.
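The FDR cutoff can be illustrated with a sketch of the Benjamini-Hochberg procedure, which is the default adjustment method in DESeq2. Whether the default was kept in this analysis is an assumption, and the p-values below are toy numbers, not results from the dataset.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values for four hypothetical genes.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_padj = benjamini_hochberg(toy_p)
significant = [i for i, p in enumerate(toy_padj) if p < 0.05]
print(toy_padj, significant)
```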
To define the collection of DE genes used for the Gene Ontology Term Enrichment analysis (GO analysis), an index was created based on the fold change values between SD- and LD-treated samples obtained with DESeq2 [13]. The absolute values of the log2 fold change of each DE gene between SD and LD were summed over all sampling time points. A high sum denoted large pairwise differences in expression between SD and LD, whether positive or negative. Index thresholds of 10, 15, and 20, corresponding to 6096, 3011, and 1545 DE genes, respectively, were selected to perform the GO analysis. After comparing the GO analysis outputs and the gene expression graphs of selected DE genes, the set of 6096 genes was chosen as the most robust set for the GO enrichment analysis. The Fisher exact test (p-value < 0.05) implemented in the OmicsBox program v. 1.3.3 was used for this analysis.
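The index described above can be sketched in a few lines of Python. The gene names and log2 fold change values are hypothetical, and the use of a strict "greater than" comparison against the threshold is an assumption, since the text does not say whether the threshold itself is included.

```python
def fold_change_index(log2fc_by_time_point):
    """Sum of absolute log2 fold changes (SD vs. LD) over all time points."""
    return sum(abs(value) for value in log2fc_by_time_point)


def select_genes(log2fc_table, threshold):
    """Return the genes whose index exceeds the threshold (strict comparison assumed)."""
    return [
        gene
        for gene, values in log2fc_table.items()
        if fold_change_index(values) > threshold
    ]


# Hypothetical log2 fold changes at the eight sampling time points.
toy_table = {
    "gene_a": [2, -3, 1.5, -2.5, 0.5, 1, -1, 0.5],  # index 12
    "gene_b": [0.5] * 8,                             # index 4
    "gene_c": [-4, 3, -3, 4, -2, 2, -3, 1],          # index 22
}

for threshold in (10, 15, 20):
    print(threshold, select_genes(toy_table, threshold))
```

Summing absolute values means that a gene strongly up-regulated under SD at one time point and strongly down-regulated at another still receives a high index, which matches the stated aim of capturing both positive and negative differences.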

Ethics Statements
Our data were obtained from plant material; no animals were used.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Repository of Supplementary files from the paper entitled: "The transcriptomic (RNA-Sequencing) datasets collected in the course of floral induction in Chenopodium ficifolium 459" (Original data) (Mendeley Data).