Comparative Transcriptome Analysis Revealed Key Genes Regulating Gossypol Synthesis in Tetraploid Cultivated Cotton

Tetraploid cultivated cotton (Gossypium spp.) produces cottonseeds rich in protein and oil. Gossypol and related terpenoids, stored in the pigment glands of cottonseeds, are toxic to human beings and monogastric animals. However, a comprehensive understanding of the genetic basis of gossypol and gland formation is still lacking. We performed a comprehensive transcriptome analysis of four glanded versus two glandless tetraploid cultivars distributed in Gossypium hirsutum and Gossypium barbadense. A weighted gene co-expression network analysis (WGCNA) based on 431 common differentially expressed genes (DEGs) uncovered a candidate module that was strongly associated with the reduction in or disappearance of gossypol and pigment glands. Further, the co-expression network helped us to focus on 29 hub genes, which played key roles in the regulation of related genes in the candidate module. The present study contributes to our understanding of the genetic basis of gossypol and gland formation and serves as a rich potential source for breeding cotton cultivars with gossypol-rich plants and gossypol-free cottonseed, which is beneficial for improving food safety, environmental protection, and economic gains of tetraploid cultivated cotton.


Introduction
Tetraploid cultivated cotton (Gossypium spp.) is a globally appreciated and important economic crop, which produces the leading natural fiber for textile and provides a large quantity of cottonseeds containing 21% oil and 23% protein [1]. However, the oil or protein from cottonseeds is inedible directly due to the presence of gossypol, which is toxic to humans and monogastric animals [2,3]. It must be dephenolized before application, which limits the comprehensive utilization of cottonseed. Gossypol is a yellowish phenolic compound that can protect cotton plants against pests, diseases, and abiotic stresses [4,5]. Therefore, breeders have exerted much effort to cultivate cotton varieties with gossypol-free cottonseed and plants with a normal gossypol content [1,6,7].
Pigment glands appear as small dark spots and are specialized cavity structures that store a wide variety of secondary metabolites, including gossypol [8,9]. Given the specific storage locations of gossypol and related terpenoids, they are significant indicators of gossypol, with significant positive correlations with the numbers [10,11]. Therefore, it is important to explore the genetic basis of pigment glands for developing cotton cultivars with gossypol-free cottonseed and plants with a normal gossypol content. Pigment glands are one of the major characteristics of the Gossypium species [12]. They are secretory organs with a large central cavity which are formed by the degradation of gland primordium cells [13]. Most of the pigment glands are distributed in the subepidermal tissues of the aerial parts and the cortex of roots in cotton plants [3].
Studies on the genetic mechanisms of pigment glands formation in cotton began in the early 20th century [14][15][16][17]. Since then, genetic research conducted by cotton breeders revealed the roles of six gland genes in gland formation, of which two major genes were Gl 2 and Gl 3 [18]. In a previous study, the glandless trait of the whole plant and cottonseed was controlled by two recessive genes (gl 2 gl 2 gl 3 gl 3 ) on A12 and D12 chromosomes or one dominant gene (Gl 2 e ) at the Gl 2 locus [19][20][21]. The Gl 2 e was discovered in a mutant released as Bahtim 110 in Egypt and Hai 1 (G. barbadense) in China [21,22]. Then, it was identified through map-based cloning [23,24]. The gl 1 was mapped based on bulked segregant analysis and sequencing [25]. Virus-induced gene silencing (VIGS) of the candidate GoSPGF resulted in glandless stems and a dramatically reduced gossypol content [25]. In addition, three 'Cotton Gland Formation' genes (CGF1, CGF2, and CGF3) were identified using the RNA-seq analysis of embryos from near-isogenic glanded (Gl 2 Gl 2 Gl 3 Gl 3 ) versus glandless (gl 2 gl 2 gl 3 gl 3 ), and VIGS against them led to significant reductions in the gland number in the leaves and a significantly lower gossypol level and related terpenoids [2]. A new glandassociated gene GauGRAS1 was identified in the gland-forming stage and functionally validated by VIGS, which caused glandless stems and petioles in Gossypium australe [26]. A transcription factor named CGP1 was identified by comparative transcriptome analysis on the stem tissues of glanded and glandless varieties, which was involved in regulating gland pigmentation [3]. Although several genes related to gland formation or gossypol synthesis have been discovered and cloned, the regulatory relationships between them are still not clear. Understanding the specific mechanism of pigment gland formation and gossypol synthesis could facilitate the cultivation of glandless cottonseeds without affecting the amount of gossypol in the whole plant.
Recently, weighted gene co-expression network analysis (WGCNA) has been widely applied as a powerful method to illustrate the genetic architecture underlying complex traits by extracting meaningful differences across the integration of large-scale transcripts and complex traits [27][28][29][30]. The gene sets and their co-expression modules, including hub genes which play a key role in the regulation of the gene expression network, can be identified effectively, which are strongly specifically associated with the target traits [31]. WGCNA has been widely used to study fiber traits [32][33][34], stress resistance [35], and so on, but few studies focused on gossypol and pigment glands exist.
In this study, we employed comparative transcriptome and WGCNA analysis of glanded and glandless cultivars distributed in G. hirsutum and G. barbadense to identify the major genes involved in gland formation in tetraploid cultivated cotton. A series of differentially expressed genes were identified in the leaves from glanded and glandless cotton cultivars. Moreover, a WGCNA analysis was performed using differentially expressed genes to identify important co-expression modules and related metabolic networks, which were strongly associated with the formation of the pigment gland and related secondary metabolism. Finally, the hub genes were highlighted and further validated using a quantitative reverse transcription polymerase chain reaction (qRT-PCR). The results of this study provide insight into the molecular genetic basis of gossypol gland formation and serve as a rich potential source for developing cotton cultivars with gossypol-free cottonseed and plants with a normal gossypol content.

Plant Materials
Four glanded cultivars (G. hirsutum 'X10', G. hirsutum 'TM-1', G. barbadense 'H7124', and G. barbadense '3-79') and two glandless cultivars (G. hirsutum 'X18' and G. barbadense 'H1') obtained from the Institute of Cotton Research, Chinese Academy of Agricultural Sciences (Anyang, China) were planted at Anyang Experiment Station. Among these cultivars, 'X10', 'TM-1', 'H7124', and '3-79' have glands throughout the whole plant and cottonseed. 'H1' is produced by introducing Gl 2 e in G. barbadense and shows no gland in the whole plant and cottonseed [21]. X18 is a mutant bred by dominant glandless line Zhong 5655 (G. hirsutum, bred from H1) with backcrossing glanded line X10 through years of selections [36][37][38]. X18 has the special property of producing low-gossypol cottonseed and has a few glands in its stem and vein, but no glands in its leaves. Apical fresh leaves of each cultivar were collected without veins at the full-bloom stage (approximately week 12 after emergence) for total RNA extraction. Three biological replicates were maintained for each cultivar and frozen in liquid nitrogen immediately and stored at −80 • C.

RNA Extraction, Library Construction, and RNA Sequencing
Total RNA was isolated from the collected leaves of each sample using the Plant RNA Rapid Extraction kit (Molfarming, Nanjing, China). RNA quality and concentration were examined with the Agilent 2100 RNA 6000 Nano kit (Agilent Technologies, Santa Clara, CA, USA). Only the RNA samples with OD260/280 = 1.8-2.2, OD260/230 > 2.0 and RNA integrity number (RIN) > 8 were used for RNA sequencing. Library construction and sequencing were accomplished by Majorbio Bio-pharm Technology Corporation, Ltd. (Shanghai, China). The mRNA was enriched from total RNA using magnetic beads with oligo (dT) for library preparation. A total of 18 libraries were sequenced using the Illumina Novaseq 6000 platform with 2 × 150 bp paired-end raw reads.

RNA-seq Data Analysis
The RNA-seq raw data were processed to filter out the adapter, poly-N, and lowquality reads using Trimmomatic (v0.36) software [39]. The clean data were mapped to the reference genome of TM-1 (G. hirsutum) [40] using HISAT2 [41]. Gene expression values were estimated using the Subread suite (v1.5.2) [42], and the transcripts per kilobase of exon model per million mapped reads (TPM) were calculated to measure the gene expression level [43]. Pearson correlation coefficients between samples were calculated, and samples with correlation coefficients less than 0.8 between the biological replicates were eliminated. The differentially expressed genes (DEGs) between glandless and glanded samples were identified based on the average expression of biological replicates using DESeq2 R package [44], which were carried out in G. hirsutum and G. barbadense samples, respectively. The genes with padj ≤ 0.05 and an absolute value of log2 fold change ≥ 1.5 were defined as significant DEGs. A Venn diagram and volcano plot were drawn using the R package ggplot2 [45].

Construction of Gene Co-Expression Networks
Gene co-expression networks were constructed using the R pipeline WGCNA [31]. The common DEGs were clustered into co-expression modules, and the correlations between each module eigengene and glanded/glandless trait were used to estimate moduletrait associations. When the co-expression analysis was completed, the edge files were sorted according to the weight value. The co-expression networks among DEGs in the candidate modules that significantly related to the glanded/glandless trait were established with an eigengene-based connectivity (K ME ) value ≥ 0.9 and edge weight value ≥ 0.5. Network visualization was performed using the Cytoscape software version 3.6.1 [46]. Furthermore, hub genes, which show the most significant connections in networks, were identified on the basis of their high module membership (K ME ) values > 0.97 and gene significance > 0.55.

Function Annotation of DEGs
The function annotation of DEGs within the candidate module was performed according to their homology with the annotated Arabidopsis genes from TAIR (Arabidopsis information resources). Gene Ontology (GO) functional enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed on the Cotton Functional Genomics Database (CottonFGD: http://cottonfgd.org/ (accessed on 8 October 2021)) [47] with the criterion of corrected p Value ≤ 0.05.

Validation of DEGs by qRT-PCR
First-strand cDNA was synthesized from the total RNA of each sample using the HiScript II Reverse Transcriptase Kit (Vazyme, Nanjing, China). The cDNA was diluted to 100 ng/µL and mixed with TransStart TOP Green qPCR SuperMix (TransGen, Beijing, China) to a total of 20 µL for qRT-PCR. The amplifications were conducted on an ABI Prism 7500 Fast Real-time PCR System (Applied Biosystems, Foster City, CA, USA) according to the instructions provided by the manufacturer. Each qRT-PCR reaction included three biological replicates and three technical replicates. The expressions levels were normalized using ACTIN (GenBank: AY305733) as an internal reference and calculated using the 2 −∆∆Ct method [48]. The specific primers (Table S6) were designed using Oligo 7 software [49] and synthesized by Sangon Biotech (Shanghai, China).

Transcriptome Analysis
To explore the molecular basis underlying pigment gland formation, we compared the transcriptomes of four glanded and two glandless cultivars distributed in G. hirsutum and G. barbadense. In total, 18 RNA-seq libraries each with 6 cultivars with 3 biological replications were constructed, and a total of 124.76 Gb clean data from 836.44 million clean reads were generated. The clean data of each sample were more than 6.22 Gb, with a quality score Q30 > 92.59%, and the average GC content was 44.57%. The clean reads of each sample were mapped to the G. hirsutum reference genome of TM-1 [40], and the alignment rate ranged from 95.82% to 97.83% (Table S1).
To further exploit RNA-seq results, we employed correlation analysis for each sample based on the expression of all genes, which showed that the G. hirsutum and G. barbadense samples as well as the glanded and glandless samples were significantly divided into different groups ( Figure 1).

Function Annotation of DEGs
The function annotation of DEGs within the candidate module was performed according to their homology with the annotated Arabidopsis genes from TAIR (Arabidopsis information resources). Gene Ontology (GO) functional enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed on the Cotton Functional Genomics Database (CottonFGD: http://cottonfgd.org/ (accessed on 8 October 2021)) [47] with the criterion of corrected pValue ≤ 0.05.

Validation of DEGs by qRT-PCR
First-strand cDNA was synthesized from the total RNA of each sample using the HiScript II Reverse Transcriptase Kit (Vazyme, Nanjing, China). The cDNA was diluted to 100 ng/µL and mixed with TransStart TOP Green qPCR SuperMix (TransGen, Beijing, China) to a total of 20 µL for qRT-PCR. The amplifications were conducted on an ABI Prism 7500 Fast Real-time PCR System (Applied Biosystems, Foster City, CA, USA) according to the instructions provided by the manufacturer. Each qRT-PCR reaction included three biological replicates and three technical replicates. The expressions levels were normalized using ACTIN (GenBank: AY305733) as an internal reference and calculated using the 2 −∆∆Ct method [48]. The specific primers (Table S6) were designed using Oligo 7 software [49] and synthesized by Sangon Biotech (Shanghai, China).

Transcriptome Analysis
To explore the molecular basis underlying pigment gland formation, we compared the transcriptomes of four glanded and two glandless cultivars distributed in G. hirsutum and G. barbadense. In total, 18 RNA-seq libraries each with 6 cultivars with 3 biological replications were constructed, and a total of 124.76 Gb clean data from 836.44 million clean reads were generated. The clean data of each sample were more than 6.22 Gb, with a quality score Q30 > 92.59%, and the average GC content was 44.57%. The clean reads of each sample were mapped to the G. hirsutum reference genome of TM-1 [40], and the alignment rate ranged from 95.82% to 97.83% (Table S1).
To further exploit RNA-seq results, we employed correlation analysis for each sample based on the expression of all genes, which showed that the G. hirsutum and G. barbadense samples as well as the glanded and glandless samples were significantly divided into different groups ( Figure 1).

WGCNA Analysis of DEGs
In order to identify the specific gene sets that are strongly correlated with pigment gland formation, the co-expression modules were generated by WGCNA using the TPM of 431 DEGs of all samples. The power of β = 18 (R 2 = 0.84) was selected as a soft threshold to ensure a scale-free network. Some genes with a higher correlation coefficient were clustered into the same cluster, and then the dynamic cutting method was used to cut the branches into different modules and the modules with similar expression patterns were merged according to a correlation coefficient greater than 0.8. As a result, a total of seven distinct modules associated with the specific expression profiles of different samples were obtained (Figure 3a,b). The turquoise module contains the largest number of genes, and the red module contains the least number of genes (Figure 3c). The grey module represents genes which cannot be classified into any one module and/or whose TPM < 1 in more than 50% of the samples. "H7124 vs. H1", and 431 common DEGs were shared between the 3858 DEGs of the G. hirsutum group and the 1208 DEGs of the G. barbadense group (Figure 2f).

WGCNA Analysis of DEGs
In order to identify the specific gene sets that are strongly correlated with pigment gland formation, the co-expression modules were generated by WGCNA using the TPM of 431 DEGs of all samples. The power of β = 18 (R 2 = 0.84) was selected as a soft threshold to ensure a scale-free network. Some genes with a higher correlation coefficient were clustered into the same cluster, and then the dynamic cutting method was used to cut the branches into different modules and the modules with similar expression patterns were merged according to a correlation coefficient greater than 0.8. As a result, a total of seven distinct modules associated with the specific expression profiles of different samples were obtained (Figure 3a,b). The turquoise module contains the largest number of genes, and the red module contains the least number of genes (Figure 3c). The grey module represents genes which cannot be classified into any one module and/or whose TPM < 1 in more than 50% of the samples. The correlation coefficients between trait features of each sample (glanded/glandless) and module eigengenes are shown in Figure 3d. No module is significantly associated with the glanded trait features of H7124, 3-79, TM1, and X18 at the same time. The module eigengenes of 'green' and 'yellow' were extremely significantly positively associated with The correlation coefficients between trait features of each sample (glanded/glandless) and module eigengenes are shown in Figure 3d. No module is significantly associated with the glanded trait features of H7124, 3-79, TM1, and X18 at the same time. The module eigengenes of 'green' and 'yellow' were extremely significantly positively associated with the glandless trait features of H1 but significantly negatively associated with the glandless trait features of X18. On the contrary, the module eigengene of 'turquoise' was extremely significantly positively associated with the glandless trait features of X18 but significantly negatively associated with the glandless trait features of H1. Of particular note, the module eigengene of 'blue', which included 136 genes, was significantly negatively associated with the glandless trait features of both H1 and X18 and positively associated with the glanded trait features of H7124, 3-79, TM1, and X18 at the same time, which showed a consistent association between G. hirsutum and G. barbadense cultivars (Figure 3d). Consequently, the 'blue' module was selected as a candidate module, which could be used to identify the common genetic basis for gland formation in tetraploid cultivated cotton. Corresponding to the significant negative association of the glandless and 'blue' module, the expression of most genes in this module was significantly down-regulated in H1 and X18 (Figure 4a). Moreover, significant positive correlations were observed between the gene significance (GS, or the correlation of gene expression and trait features values) and module membership (or K ME ) of each gene in H1 and X18 (Figure 4b,c). In addition, the 136 genes of the 'blue' module were spread over the genome, except for chromosome A02, and a pair of homologous chromosomes, A01 and D01, carried the most genes (Table S2).
the glandless trait features of H1 but significantly negatively associated with the glandless trait features of X18. On the contrary, the module eigengene of 'turquoise' was extremely significantly positively associated with the glandless trait features of X18 but significantly negatively associated with the glandless trait features of H1. Of particular note, the module eigengene of 'blue', which included 136 genes, was significantly negatively associated with the glandless trait features of both H1 and X18 and positively associated with the glanded trait features of H7124, 3-79, TM1, and X18 at the same time, which showed a consistent association between G. hirsutum and G. barbadense cultivars (Figure 3d). Consequently, the 'blue' module was selected as a candidate module, which could be used to identify the common genetic basis for gland formation in tetraploid cultivated cotton. Corresponding to the significant negative association of the glandless and 'blue' module, the expression of most genes in this module was significantly down-regulated in H1 and X18 (Figure 4a). Moreover, significant positive correlations were observed between the gene significance (GS, or the correlation of gene expression and trait features values) and module membership (or KME) of each gene in H1 and X18 (Figure 4b,c). In addition, the 136 genes of the 'blue' module were spread over the genome, except for chromosome A02, and a pair of homologous chromosomes, A01 and D01, carried the most genes (Table S2).

Construction of Co-Expression Gene Networks and Identification of Hub Genes
The WGCNA analysis helped us to focus on the 'blue' module, in which a completeness co-expression network was constructed ( Figure 5 and Table S2). Interestingly, 21 hub genes were identified based on the criteria of KME ≥ 0.98 and gene significance for X18 ≥ 0.57 (Figures 4b and 5 (red and blue)), and 26 hub genes were identified based on the criteria of KME ≥ 0.98 and gene significance for H1 ≥ 0.55 (Figures 4c and 5 (red and orange)). There were 18 common hub genes which showed the strongest co-expression correlations. Additionally, a total of 42 genes in the network were identified as homologous genes of the known genes related to the formation of gossypol and glands, and 13 of these were hub genes ( Figure 5 (octagons) and Table S2).

Construction of Co-Expression Gene Networks and Identification of Hub Genes
The WGCNA analysis helped us to focus on the 'blue' module, in which a completeness co-expression network was constructed ( Figure 5 and Table S2). Interestingly, 21 hub genes were identified based on the criteria of K ME ≥ 0.98 and gene significance for X18 ≥ 0.57 (Figures 4b and 5 (red and blue)), and 26 hub genes were identified based on the criteria of K ME ≥ 0.98 and gene significance for H1 ≥ 0.55 (Figures 4c and 5 (red and  orange)). There were 18 common hub genes which showed the strongest co-expression correlations. Additionally, a total of 42 genes in the network were identified as homologous genes of the known genes related to the formation of gossypol and glands, and 13 of these were hub genes ( Figure 5 (octagons) and Table S2).  Figure 5. The co-expression network of genes in blue module. Red: common hub genes for H1 and X18; blue: unique hub genes for X18; orange: unique hub genes for H1; black: non-hub genes. Octagon: the homologous genes of known genes related to the gossypol and/or glands formation; Vshaped: the genes that have not been reported.

Functional Annotations of DEGs in Candidate Module
To identify and confirm the roles of the candidate 'blue' module in the formation of gossypol and glands, the DEGs were annotated using functional annotation, GO enrichment, and KEGG pathway enrichment. The results of the functional annotation showed that 123 genes were accurately annotated based on the high homology with Arabidopsis thaliana (Table S3). Notably, 24 genes were annotated as 'cytochrome P450 (CYP) family', among which 9, 7, and 7 genes were CYP71B, CYP706A, and CYP82C, respectively. Moreover, nine genes were annotated as 'NAD(P)-binding superfamily protein' involved in oxidation reduction process; seven genes were annotated as 'terpene synthase/cyclases'; and five genes were annotated to each of 'disease resistance responsive family protein', 'lactoyl-glutathione lyase (glyoxalase I) family protein', and 'transmembrane protein', respectively. Additionally, 24 of the 29 hub genes were accurately annotated, which were also mainly related to 'cytochrome P450 (CYP) family', 'NAD(P)-binding superfamily protein', 'lactoyl-glutathione lyase (glyoxalase I) family protein', 'terpene synthase/cyclases', and '2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein' (Table S3).

Functional Annotations of DEGs in Candidate Module
To identify and confirm the roles of the candidate 'blue' module in the formation of gossypol and glands, the DEGs were annotated using functional annotation, GO enrichment, and KEGG pathway enrichment. The results of the functional annotation showed that 123 genes were accurately annotated based on the high homology with Arabidopsis thaliana (Table S3). Notably, 24 genes were annotated as 'cytochrome P450 (CYP) family', among which 9, 7, and 7 genes were CYP71B, CYP706A, and CYP82C, respectively. Moreover, nine genes were annotated as 'NAD(P)-binding superfamily protein' involved in oxidation reduction process; seven genes were annotated as 'terpene synthase/cyclases'; and five genes were annotated to each of 'disease resistance responsive family protein', 'lactoyl-glutathione lyase (glyoxalase I) family protein', and 'transmembrane protein', respectively. Additionally, 24 of the 29 hub genes were accurately annotated, which were also mainly related to 'cytochrome P450 (CYP) family', 'NAD(P)-binding superfamily protein', 'lactoyl-glutathione lyase (glyoxalase I) family protein', 'terpene synthase/cyclases', and '2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein' (Table S3).

Confirmation of DEGs by qRT-PCR
To validate the expression levels of DEGs identified by RNA-seq, 25 genes were selected to carry out qRT-PCR, including 18 common hub genes, 2 unique hubgenes for X18 which were the homologous genes of enzymes catalyzing the defined steps in MVA for gossypol pathways, 2 unique hubgenes for H1 which were the homologous gene of the enzyme in MVA pathways and the homologous gene of pcC13-62 relating to the nectary In the KEGG pathways analysis, the unigenes of the 'blue' module were significantly enriched into 16 pathways, which were prominently related to 'metabolism of terpenoids and polyketids', 'metabolism of cytochrome P450', 'steroid hormone (isoprenelike) biosynthesis', 'metabolism', 'diterpenoid biosynthesis', 'lipid metabolism', 'sesquiterpenoid and triterpenoid biosynthesis', 'biosynthesis of various plant secondary metabolites', and so on (Figure 6b and Table S5). Further, most of the hub genes were mainly enriched in the pathway of 'metabolism of terpenoids and polyketides', 'cytochrome P450', and 'metabolism'. Note, in particular, six hub genes, GH_A13G2336, GH_A03G0193, GH_D03G1778, GH_D13G2328, GH_A13G1576, and GH_D05G3845, were significantly enriched in multiple metabolism pathways, such as 'metabolism of terpenoids and polyketides', 'cytochrome P450', 'steroid hormone biosynthesis', 'diterpenoid biosynthesis', and 'sesquiterpenoid and triterpenoid biosynthesis', which were directly involved in the gossypol biosynthesis and pigment gland development (Table S5). Obviously, the genes in the candidate module played important roles in the metabolic for the formation of gossypol biosynthesis and pigment gland in cotton.

Confirmation of DEGs by qRT-PCR
To validate the expression levels of DEGs identified by RNA-seq, 25 genes were selected to carry out qRT-PCR, including 18 common hub genes, 2 unique hubgenes for X18 which were the homologous genes of enzymes catalyzing the defined steps in MVA for gossypol pathways, 2 unique hubgenes for H1 which were the homologous gene of the enzyme in MVA pathways and the homologous gene of pcC13-62 relating to the nectary formation of bean plant, and 3 non-hubgenes which were the homologous genes of CYP76B, CGP1, and GoPGF identified previously (Table S2).
As expected, most of the selected genes showed the similar trend of expression profiles between qRT-PCR and RNA-seq across samples, particularly the expression between glanded and glandless materials in each genetic background (G. hirsutum or G. barbadense) (Figure 7). The results confirmed the reliability of the RNA-seq data, which also confirmed that the reduction in or disappearance of gossypol and pigment glands were strongly associated with the down-regulation of genes identified in the candidate co-expression network.
Genes 2023, 14, x FOR PEER REVIEW 10 of 16 formation of bean plant, and 3 non-hubgenes which were the homologous genes of CYP76B, CGP1, and GoPGF identified previously (Table S2). As expected, most of the selected genes showed the similar trend of expression profiles between qRT-PCR and RNA-seq across samples, particularly the expression between glanded and glandless materials in each genetic background (G. hirsutum or G. barbadense) (Figure 7). The results confirmed the reliability of the RNA-seq data, which also confirmed that the reduction in or disappearance of gossypol and pigment glands were strongly associated with the down-regulation of genes identified in the candidate co-expression network.

Discussion
As the exclusive repository of toxic gossypol, pigment glands contribute to the resist of cotton against pathogens, pests, herbivores, and abiotic stresses [50][51][52]. However, the presence of toxic gossypol hinders the utilization of cottonseed rich in protein and oil. Worse still, the dephenolization of cottonseed is poor in operability, of a high cost, and contributes to environmental pollution. Obviously, it is valuable to breed new cotton cultivars with gossypol-free cottonseed and plants with a normal gossypol content, which can meet the multiple demands of agricultural production and broaden food resources for humans.
Usually, the number of glands can be used as a more convenient and efficient phenotypic indicator to study the gossypol content in cotton [8][9][10]. Numerous attempts have been made by breeders to study the morphogenesis of glands formation as well as the accumulation of related secondary metabolites in them [2,5,24]. However, the research progress of this area was at one time limited by the time-consuming and laborious task of map-based cloning for the corresponding loci. With the development of RNA sequencing, comparative transcriptome analysis was proved to be a rather straightforward and useful method to identify candidate genes that related to pigment gland formation [2,3]. In the present study, in addition to the identification of 13,113 DEGs by the comparative transcriptome analysis of several glanded and glandless cotton cultivars (Figure 2), a further

Discussion
As the exclusive repository of toxic gossypol, pigment glands contribute to the resist of cotton against pathogens, pests, herbivores, and abiotic stresses [50][51][52]. However, the presence of toxic gossypol hinders the utilization of cottonseed rich in protein and oil. Worse still, the dephenolization of cottonseed is poor in operability, of a high cost, and contributes to environmental pollution. Obviously, it is valuable to breed new cotton cultivars with gossypol-free cottonseed and plants with a normal gossypol content, which can meet the multiple demands of agricultural production and broaden food resources for humans.
Usually, the number of glands can be used as a more convenient and efficient phenotypic indicator to study the gossypol content in cotton [8][9][10]. Numerous attempts have been made by breeders to study the morphogenesis of glands formation as well as the accumulation of related secondary metabolites in them [2,5,24]. However, the research progress of this area was at one time limited by the time-consuming and laborious task of map-based cloning for the corresponding loci. With the development of RNA sequencing, comparative transcriptome analysis was proved to be a rather straightforward and useful method to identify candidate genes that related to pigment gland formation [2,3]. In the present study, in addition to the identification of 13,113 DEGs by the comparative tran-scriptome analysis of several glanded and glandless cotton cultivars (Figure 2), a further WGCNA was innovatively applied to identify the key co-expression network and major genes for pigment gland formation (Figures 3-5, Table S2).
Many of the DEGs in the candidate 'blue' module (network) were directly involved in gossypol biosynthesis and pigment gland development. Specifically, first, 24 and 7 genes were annotated as 'CYP450 family' and 'terpene synthase/cyclases family', respectively (Table S3), whose family members were participated in the metabolism of gossypol and/or pigment glands, such as the homologous genes of CYP82D113 [5,53] CYP76B6 [54], and CDN/CDNC ((+)-δ-cadinene synthase) [5,25,53] (Table S2). Those related genes were significantly enriched in the pathway of 'metabolism of terpenoids and polyketides', 'metabolism of cytochrome P450, 'Steroid hormone biosynthesis', and 'diterpenoid biosynthesis', which enriched at least 48 genes of the candidate module. Second, GH_D12G2619 and GH_A12G2598, as the homologous genes of GoPGF/CGF3 which had been proved to control both gland morphogenesis and gossypol biosynthesis [23,24] through regulating the expression of JAZ, WRKYs, and TPSs [24], were enriched in the 'anthocyanin compound metabolic (GO:0046283)' and 'flavonoid metabolic (GO:0009812)', which enriched 8 and 13 genes of the candidate module, respectively. It should also be noted that CGP1 (homologous gene of GH_A07G0851), a MYB transcription factor in the nucleus, can interact with GoPGF and form heterodimers to control the synthesis of gossypol and other secondary metabolites in cotton [3] (Table S3).
In addition, in our transcriptome results, 122 homologous genes with more or less expression in at least one cultivar were identified to be involved in the 18 enzymatic reactions of the know MVA and gossypol pathways [5,53], most of which were downregulated in glandless materials ( Figure 8). Additionally, full relative expressions of them in six cultivars were presented in Figure 8, serving as a reference resource for other relevant studies. Interestingly, lots of homologous genes of related enzymes were extensively expanded with tandem duplications (Figure 8), which appear to have arisen from local duplications, such as CDN, DH1, CYP82D113, and 2-ODD-1. However, here, we are focused on genes that were effectively expressed in most samples (>50%), especially the DEGs clustered in the candidate 'blue' module. In total, in this module, 25 homologous genes of related enzymes were identified to be involved in at least 11 enzymatic reactions of the know MVA and gossypol pathways [53], especially each of the enzymatic reactions of the gossypol pathways. Additionally, all of them were significantly down-regulated or specifically not expressed in glandless materials ( Figure 8, Table S2), which were very consistent with the results of previous studies [5,25,53,54]. Together, so many (42) homologous genes of known genes were clustered in the 'blue' module (or co-expression network) ( Figure 5), which suggested that the WGCNA analysis provided a powerful approach to identifying candidate genes for pigment gland formation and gossypol biosynthesis and that the reliable scientific results were identified in present studies. Thus, we believe that the expression network constructed by 136 DEGs should play a crucial role in the formation of gland and gossypol in tetraploid cultivated cotton. In total, 29 hub genes were highlighted in the co-expression network ( Figure 5, Table  S2), of which 13 were the homologous genes of know genes involved in the metabolism In total, 29 hub genes were highlighted in the co-expression network ( Figure 5, Table S2), of which 13 were the homologous genes of know genes involved in the metabolism of pyruvate, terpenoids and polyketides, cytochrome P450, sesquiterpenoid, and triterpenoid biosynthesis. Notably, 12 hub genes were homologous to 8 enzyme genes of the know pathway mentioned above [53]. Specifically, two common hub genes GH_A13G2336 and GH_D13G2328 for 2-ODD-1; two common hub genes GH_A03G0193 and GH_D03G1778 for CYP706B1; two common hub genes GH_A03G0399 and GH_D03G1574 and a unique hub gene GH_A03G0398 for SPG; and common hub genes GH_A13G1576, GH_D01G2288, and GH_D02G2103 for CYP71BE79, DH1, and HMGR, respectively. Moreover, a common hub gene GH_D05G3845 was one of the homologous genes of CDN/CDNC, which was reported to decrease 95.1% of hemigossypolone and 96.7% of gossypol by VIGS [5]. GH_D05G2016, a unique hub gene for H1, was one of the homologous genes of CYP82D113, which was reported to decrease more than 50% of hemigossypolone and gossypol by VIGS [5]. Likewise, GH_D08G0492, a unique hub gene for H1, was detected as the homologous gene of pcC13-62, which was identified as a major nectar protein (nectarin) of the bean plant and is expressed exclusively in the stylopodium, where the nectary is located [55]. The relationship between pcC13-62 and pigment gland or gossypol synthesis requires further study. Although identified as a non-hub gene in tetraploid cultivated G. hirsutum and G. barbadense, GH_A04G0525 was detected as the homologous gene of Gbi08G2110 (CYP76B6) regulated by GoPGF, which was identified in G. bickii and showed an important regulatory role for the biosynthesis of gossypol [54]. Altogether, in line with the crucial role of the identified gene network in the formation of gland and gossypol in tetraploid cultivated cotton, we believe that these hub genes should play an important role in the regulation of the expression network, which should be given priority in future studies.
Gossypol-free is a highly desirable trait for cottonseeds that increases the value of commercial cotton cultivars. Tissue-specific silencing is an effective measure to silence a gene in a particular tissue without affecting its expression in other tissues, and the trait created by this way is stable and heritable [56][57][58]. Therefore, the identification of candidate genes that are associated with gossypol biosynthesis or pigment gland formation provides us with the genetic resources which can be used for strict tissue-specific silencing to eliminate or reduce the gossypol in cottonseed kernel. Nowadays, as genetic engineering technology has developed, RNAi, CRISPR/Cas9, and CRISPR/Cas13a systems were applied for the destruction of specific transcripts [1,59,60]. Any such gene silencing technologies in conjunction with a seed-specific promoter can be used to eliminate the glands or gossypol from cottonseed only and develop a cotton cultivar with glanded plants and glandless cottonseed.

Conclusions
In the present study, 29 hub genes and related regulatory networks, which played key roles in the formation of gland and gossypol in tetraploid cultivated cotton, were identified by the RNA-seq of glanded and glandless cultivars distributed in G. hirsutum and G. barbadense. Our study provided the opportunity for a more accurate and comprehensive resolution of the genetic basis of gossypol and gland formation and should serve as a rich source for breeding cotton cultivars with gossypol-rich plants and gossypol-free cottonseed.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/genes14061144/s1, Table S1: Summary of the RNA-seq data; Table S2: Gene information in blue module; Table S3: Function annotation of genes in blue module; Table S4: Go enrichment information in blue module; Table S5: KEGG enrichment information in blue module; Table S6: Primer sequences for qRT-PCR; Table S7: Expression TPM of top 20 significantly up-regulated and down-regulated genes. Data Availability Statement: The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.