Sequence read alignment and differential expression analysis
RNA-Seq data were generated from 22 extreme segregants of an F2 population derived from interspecific cross between Saccharum officinarum 'LA Purple' and wild species Saccharum robustum MOL5829, including 14 high biomass and 8 low biomass segregants (Additional file 1; Table S-1). Extremely low and high biomass hybrids ranged between 18.0 to 37.0 and 78.7 to 106.5 metric tons/hectare of 12 months old plants, respectively. After mRNA extraction from the 3rd, 6th and 9th internodes and leaf, cDNA libraries (combined of all plant sections) were constructed and used for RNA sequencing. The details of phenotypic data and sugar contents (reducing and non-reducing) was previously reported and sugarcane transcriptome was analyzed using the Sorghum bicolor L. Moench as a reference genome due to the unavailability of the high-quality sugarcane reference genome at that time [23]. Although the small genome of sorghum is an attractive model for tropical C4 photosynthetic grasses, but it has diploid genome and minimal duplication events in the genome. The genome of Saccharum species is highly polyploid and aneuploid, exhibiting variable ploidy levels among different loci, a large genome size and abundant repetitive sequences [24]. Comparing sugarcane transcript data with sorghum would compromise the contribution of allelic and duplicated copies of gene models, which are equally contributing in polyploid crops. Therefore, we re-analyzed our Illumina raw sequences of 22 RNA-seq samples (34 GB, 293.5 million reads and 151 nucleotides length) with a high-quality allele specific Saccharum officinarum reference genome. After quality trimming and removing the adapter sequences, data were reduced to 19 GB and 191.3 million clean reads. Although, HISAT2 is an efficient aligner with better performance in the terms of high percentage of correctly aligned reads and high-precision rates. Nevertheless, in our data, STAR outperformed in alignment with a higher alignment rate (~85%) in all the samples followed by transcriptome assembly and differential expression analysis.
Among 22 samples, 4 samples from high biomass and 1 sample from the low biomass group were excluded, being outliers in clustering analysis of samples. Finally, differential expression of 17 samples (10 HB, 7 LB) resulted in 174593 genes and alleles (GMs (Gene Models), alleles, tandem duplicates and paralogs, (Additional file 1; Fig. S-1, Table S-1) which were sorted using the thresholds, FDR |≤0.05| and pval |<0.01| while log2FC |>2| for up-regulated and log2FC |<-2| for down-regulated. Using the above criteria of selection, 8059 genes and alleles were regarded as DEGs, 5540 were up-regulated and 2519 were down-regulated in high biomass samples (Additional file 2; Table S-3). Among these 8059 genes and alleles, 21.5% belonged to GMs, 68% was allelic contribution, and 0.14% were tandem duplicates and 10% paralogs. Further, Partial least squares discriminant analysis (PLS-DA), hierarchical clustering and volcano plot of the differentially expressed genes to visualize expression patterns in samples based on FPKM and log2FC (Fig. 1). PLS-DA plot reduced the dimensions of large data and showed distinct differences in gene expression patterns among low and high biomass segregants. PC1(26%) exhibited maximum variation in the clustering patterns and showed low biomass samples are tightly clustered on the positive x-axis (Fig.1A). Hierarchical clustering heat map showed variability between two biomass groups, by clustering all the genes in six clusters using expression values (Fig. 1B). The volcano plot visualization showed an overall trend of fold change in high biomass genotypes (Fig. 1C).
Exploration of biological processes involved in growth and biomass by GO
To explore the pathways underlying a large number of differentially expressed GMs, alleles and duplicated genes, GO enrichment analysis was performed (Fig. 2B, Additional file 1; Fig. S-2). Regarding GO, a significant number of genes were enriched in the biological process category in high and low biomass samples, including signaling (GO:0023052), transmembrane transport (GO:0055085), and cellular nitrogen metabolism biosynthetic (GO:0034641) processes. Significant number of genes belonging to membrane metabolism (GO:0016020), cell wall (GO:0005618) and extracellular region related processes (GO:0005576) were identified in the cellular component's category. Moreover, the genes involved in electron transport, catalysis (GO:0003824), transmembrane transporter activity (GO:0022857), transferase activity (GO:0016740) and macromolecules modification (GO:0043412) were significantly enriched in the molecular function category (Additional file 1; Fig. S-2, Additional file 2; Table S-6).
Hormone dynamics in extreme segregants:
To identify the regulatory role of hormones in growth, cell wall remodeling, and biomass accumulation, KEGG analysis was performed for the functional annotation of DEGs between high and low biomass samples (Fig. 2A, Additional file 2; Table S-4, S-7). A considerable number of genes were enriched in different metabolic pathways in KEGG analysis. For example, 96 GMs, alleles and duplicates were differentially regulated in signal transduction pathways, 346 in carbohydrate metabolism (glycolysis, sugars, starch, pyruvate metabolism), and 42 in glycan biosynthesis. Hormone-related sequences obtained from KEGG analysis were categorized into GMs (Gene Model), allelic copies and gene duplications (tandem and paralogous duplicate) using eight-column allele table. In polyploid species, alleles are the homologous genes on the homologous chromosomes at the same loci. Tandemly duplicated genes are originated owing to unequal crossing over, categorized on the criteria if they are having a difference of configurable gene rank =1. Dispersly duplicated paralogs are neither neighbors nor on homologous chromosome, and they are re-labelled based on blastp hits [25, 26]. The details of all the discussed genes and their respective categories (GMs, alleles, tandem and paralogous duplicates) are discussed in (Additional file 2; Table S-5).
Auxin related genes:
Auxin, an essential hormone in modulating growth and development, showed the enrichment of several GMs, alleles and duplications in high biomass samples. Annotated sequences were confirmed using blastp in phytozome and NCBI (https://phytozome.jgi.doe.gov/pz/portal.html). Sorghum genome, a model for C4 plant species having high evolutionary proximity with sugarcane, has a significant number of auxin-related gene family members i.e., 26 Aux/IAA genes, 16 GH3 genes, 36 LBD genes, and 25 ARF genes [27]. Two members (Soffic.03G0032240-1A, Soffic.01G0025830-1P) of auxin influx AUX 1 (Auxin influx carrier protein) were up and down regulated in HB whereas one allele (Soffic.05G0006040-2B) related to auxin efflux PIN (PIN-FORMED) was up-regulated. There was over representation of five auxin-responsive Aux/IAA (repressor) sequences (GMs and alleles) in high biomass genotypes. Two members (Soffic.08G0009450-1P, Soffic.03G0018710-2B) of Gretchen Hagen3 (GH3) family, (indole-3-acetic acid-amido synthetase) showed a profound upregulation in high biomass genotypes. 18 members of small auxin-up RNAs (SAUR) family, and 6 members encoding lateral organ boundaries (LBD) were up-regulated in HB genotypes as compared to low biomass genotypes. Among 8 differentially regulated ARF (AUXIN RESPONSIVE FACTORS) sequences, 7 showed up-regulation in HB genotypes. Active contribution of auxin signaling and responsive genes in HB genotypes predicts auxin mediated growth changes accounting for high biomass (Fig. 3A, Additional file 2; Table S-5).
Jasmonic acid (JA) signaling pathway genes
Jasmonic acid is a lipid-derived hormone, which regulates important stress, and immunity related responses. Many jasmonic acid-related genes were highly enriched in the signal transduction category identified by KEGG analysis (Fig. 3A, Additional file 2; Table S-5). Coronatine-insensitive (COI1) protein acts essentially as a receptor of jasmonoyl–isoleucine (JA–Ile), a conjugated form of JA. Three DEGs coding COI1 were downregulated in HB genotypes indicating absence of substrate for signaling of jasmonic acid responses. There was over representation of 15 ZIM-domain (JAZ) repressor proteins, among which 13 were upregulated in HB genotypes. The action of JAZ mediated repression is accomplished by Groucho/Tup1-type co-repressor TOPLESS (TPL), one allele encoding TPL is also up regulated in HB genotypes in our data. Thanks to the upregulation of JA repressor genes, which inhibit the activation of JA downstream stress responsive genes inhibiting the activation of stress response and allowing the normal function of growth-related genes for increased biomass.
Abscisic acid (ABA) related genes
Regarding abscisic acid signaling pathway, 23 members were differentially expressed in high and low biomass genotypes. Type 2C protein phosphatases (PP2Cs) act as a negative regulator in abscisic acid signaling pathways. Our data shows down and up-regulation of 6 and 4 PP2C coding sequences. One class of protein kinases, i.e., Snf1-related protein kinases 2 (SnRK2s), is activated by autophosphorylation after repression of PP2C. The results exhibited that 7 SnRK2 genes showed up-regulation while 3 genes were down-regulated. Moreover, two Basic leucine zipper (bZIP) and one cis-regulatory DNA element i.e Abscisic acid (ABA) response elements (ABREs) were both down-regulated in HB genotypes, respectively (Additional file 1; Fig. S-3.).
Hormones responsive genes accounting for growth
Several genes may be induced or suppressed by the biologically active hormones in their vicinity, for example, cell wall synthesis CELLULOSE SYNTHASE (CESA) and cell expansion genes xyloglucan (XTH) and Expansins (EXP) strongly respond to auxin, brassinolide and gibberellic acid. Five alleles encoding CESA proteins, were up-regulated in HB genotypes. Cell wall loosening and expansion is critical for cell growth and secondary cell wall formation. A plethora of XTH genes and EXP genes were up regulated in HB genotypes i.e., XTH =19, EXP = 4. Three genes encoding galactosyl transferase, 8 genes belonging to GDP-mannose 4, 6 dehydratase, and 14 members of glycosyltransferase family were up-regulated in HB genotypes. UDP-glycosyltransferases play their indispensable roles in glucosylation of aglycones such as hormones, secondary metabolites engaged in stress and defense responses, and xenobiotics. 14 genes encoding UGTs were downregulated, whereas 11 genes were up regulated in HB genotypes (Fig. 3B, Additional file 2; Table S-5).
Flowering and senescence genes relating vegetative growth
Flowering is an important developmental transition, and time to flowering induction is key trait for deciding assimilate portioning and biomass. Major integrators of flowering related pathways are FLOWERING PROMOTING FACTOR 1 (FPF1), EARLY FLOWERING 3 (ELF3) and AGAMOUS-LIKE MADS-BOX PROTEIN (AGL12). Two gene coding FPF1, one ELF3 and one AGL12 were down-regulated in HB genotypes. Expression of certain other genes responsible for inflorescence architecture, for example HOMEOBOX related genes, BREVIPEDICELLUS, and PROTODERMAL FACTOR 2, was also down regulated in HB genotypes. Newly identified family genes (S-40) related to senescence were down regulated in HB genotypes, which supplements the above-mentioned activity of growth-related hormones and growth responsive genes (Fig. 3B, Additional file 2; Table S-5).
Identification of WGCNA modules associated with hormone and cell wall expansion genes
Weighted gene co-expression network analysis (WGCNA) was performed using 8540 non-redundant genes to identify modules with similar expression patterns. All the genes used in WGCNA showed 21 distinct modules (Fig. 4 A, Additional file 1; Fig. S-4.). From the module-trait relationship co-relation heat map, ‘blue’ and ‘greenyellow’ modules were selected as highly correlated (r = 0.83, r= 0.91) modules in HB genotypes. Cytoscape representation of blue module genes having edge weight over 0.10 indicated high level of connections between genes. Blue module network (left) showed the connections between different hormone repressor genes (IAA, TIFY, and PP2C) and cell wall related growth genes i.e., XTH (xyloglucan endohydrolysis (XEH) and or endotransglycosylation (XET)), UDP-glycosyl-transferase, Histone-lysine N-methyltransferase and O-methyltransferase (Fig. 3.). Green yellow module network (right) comprised of co-expression of DELLA, TIFY, SAUR, PP2C(HAB2), ABCB1, and cell wall related gene Pectinesterase (PME18). Module ‘blue’ of 1158 genes and ‘greenyellow’ of 412 genes appeared to be associated with hormone regulation and biomass gain (Fig. 4 B, 4C). These modules represented highly significant gene connections having the merge cut height (~0.25).
Expression profiles of transcription factors and protein kinases (kinome)
To apprehend the regulatory roles of transcriptions factors (TFs) and protein kinases (PKs), we identified TFs from the protein sequences of differentially expressed genes. In plants, DNA transcription involves more than 1500 TFs to regulate target genes by binding with cis-regulatory elements in promoter region [28]. Mapping the important transcription factors can provide insights to understand transcriptional regulation of the plant hormone genes in achieving the enhanced biomass (Fig. 5A, Additional file 2; Table S-8). In this RNA-Seq data, a high proportion of TFs of 46 different families were differentially expressed between high and low biomass groups, for example, 45 WRKY, 16 TIFY, 38 NAC, and 18 GRAS were differentially regulated between high and low biomass groups. The over-representation of the WRKY transcription factors in high biomass genotypes showed the crucial roles in regulating genes accounting for assimilates accumulation and modulating the growth processes. Owing to their dual role, they can either mediate activation or act as repressors of target genes. For example, in rice OsWRKY72 and OsWRKY77 they activate ABA signaling and repress GA signaling. Qiao and co-workers reported the direct role of WRKY in ABA signaling and drought mediated regulation responses. 7 out of 45 WRKY TFs were down-regulated in high biomass groups whereas, 38 TFs were up-regulated in high biomass [29]. All the 16 TIFY were up regulated in HB genotypes whereas, 21 NAC TFs were down regulated among 38 differentially expressed TFs. Among 52 bHLH differentially expressed TFs, 6 were down-regulated while 46 were up-regulated in HB genotypes. bHLH TFs have a binding site for regulating MeJA inducible genes, and also reported to be transcribed in root apical meristem as a target of ARF5 downstream in auxin signaling pathway [30]. Auxin responsive TFs containing B3 domain were also differentially expressed and 12 out of 18 ARFs were up-regulated in HB genotypes. Moreover, most members of HSF (Heat Shock Factors) were down-regulated in HB genotypes, which shows the inactivated immunity and growth inhibitory processes responses.
Similarly, protein sequences of DEGs were subjected to the identification of protein kinases (PK) from an online database iTAK. In Sorghum bicolor and Saccharrum spontaneum, kinome comprised 1210 PKs for Sbi, and 2919 PKs besides allelic copies and segmental duplications [31]. From 8059 sequences, 132 sequences encoding for PKs were identified from an online database, iTAK. After reconfirmation by BLAST similarities, a heatmap of all the PKs having |log2FC>2|, showing 40 (30%) PKs as down-regulated and 70% were up-regulated was generated (Fig. 5B, Additional file 2; Table S-9). Among 132 PKs, 104 (78%) were receptor-like Kinase (RLKs), 6 PKs (4.8%) were calcium- and calmodulin regulated kinase (CAMK) 7 (5.3%) sequences coding cyclin dependent kinase-like kinase (CMGC) and other families belonged to serine/threonine kinase (STE), tyrosine kinase-like kinase (TKL) etc. RLKs are the kinases having P Kinase domain, either membrane localized (RLK) or membrane-associated proteins localized in cytoplasm close to plasma membrane (RLCK). Several Pelle (RLK) kinase proteins are implicated in cell wall metabolism, hormone signaling, and growth-related responses. Similarly, CAMK and CGMC protein kinases are involved in the cell cycle and certain growth responses.
Hormone quantification
To confirm the empirical evidence of FPKM profiling in signaling of IAA, ABA and JA of the hormones, we quantified three hormones from the plant leaf samples (Fig. 6A). IAA and ABA concentration was high (41.32%, 18.82) in high biomass samples. Whereas, JA contents were drastically high in low biomass samples by 155.86%. This shows highly activated defense system in low biomass genotypes, which may hinder the activity of growth responsive genes. Further, linear regression between hormone concentration and hormone signaling genes was performed, which showed auxin concentration has a positive linear relation with AUX1, ARF and XTH, whereas negative relationship was observed in JA (Fig. 6B).
RNA-seq data validation by qRT-PCR
To validate expression patterns of genes identified by RNA-seq data, 15 genes were selected and examined by qRT-PCR (Fig. 7). One gene related to ABC transporter (Soffic.10G0011210-1A), four genes from auxin signal transduction pathway (Soffic.10G0011210-1A, Soffic.03G0032240-1A, Soffic.10G0012000-1A, Soffic.01G0033270-1A), two JA (TIFY), 1 ABA (EIN) confirmed differences in transcript abundances in high and low biomass genotypes. Similarly, cell wall synthesis and remodeling genes (CSLA2, XTH) were highly expressed in high biomass genotypes. Moreover, flowering and senescence related genes i.e., ELF3, FPF1 and S-40 (Soffic.03G0016710-4D, Soffic.03G0036590-2B, Soffic.03G0024310-3H) showed consistent expression patterns in RNA-seq and qPCR quantification.