Genome-wide identification and expression profile of GhGRF gene family in Gossypium hirsutum L.

Background Cotton is the primary source of renewable natural fiber in the textile industry and an important biodiesel crop. Growth regulating factors (GRFs) are involved in regulating plant growth and development. Methods Using genome-wide analysis, we identified 35 GRF genes in Gossypium hirsutum. Results Chromosomal location information revealed an uneven distribution of GhGRF genes, with maximum genes on chromosomes A02, A05, and A12 from the At sub-genome and their corresponding D05 and D12 from the Dt sub-genome. In the phylogenetic tree, 35 GRF genes were divided into five groups, including G1, G2, G3, G4, and G5. The majority of GhGRF genes have two to three introns and three to four exons, and their deduced proteins contained conserved QLQ and WRC domains in the N-terminal end of GRFs in Arabidopsis and rice. Sequence logos revealed that GRF genes were highly conserved during the long-term evolutionary process. The CDS of the GhGRF gene can complement MiRNA396a. Moreover, most GhGRF genes transcripts developed high levels of ovules and fibers. Analyses of promoter cis-elements and expression patterns indicated that GhGRF genes play an essential role in regulating plant growth and development by coordinating the internal and external environment and multiple hormone signaling pathways. Our analysis indicated that GhGRFs are ideal target genes with significant potential for improving the molecular structure of cotton.


INTRODUCTION
Cotton (Gossypium spp.) is the most important natural fiber and biodiesel crop in the world (Eevera & Pazhanichamy, 2013;Li et al., 2015;Zhu & Li, 2013). However, the yield and quality of cotton production must be improved. Cotton fiber and oil are closely related to the growth and development of cotton plants, and genes involved in controlling cotton growth and development can be used to improve cotton varieties. The GRF (growth-regulating factors) gene family is a branch of the small plant-specific transcription factor (TF) gene family, whose first identified member ''OsGRF1'' was isolated by differentially displaying the mRNA from internode intercalary meristem tissues of rice treated with gibberellin (Van der Knaap, Kim & Kende, 2000). Since then, many genome-wide characterizations of GRF genes have been performed and genome-sequencing technology has advanced. The GRF family is universally present in many plant taxa. For example, nine GRF family in Arabidopsis (Kim, Choi & Kende, 2003), 12 in Oryza sativa (Choi, Kim & Kende, 2004), 14 in Zea mays (Zhang et al., 2008), 10 in Brachypodium distachyon (Filiz, Koc & Tombuloğlu, 2014), 17 in Brassica rapa (Wang et al., 2014a), eight in Vitis vinifera, 19 in Populus trichocarpa (Cao et al., 2016), 10 in Fragaria vesca (Li et al., 2021), 10 in Jatropha curcas (Tang et al., 2021), and 25 in Nicotiana tabacum (Zhang et al., 2018).
GRF is a plant-specific transcription factor that plays an essential role in the development of roots, stems, and leaves and in the formation of flowers and seeds. However, GRF expression is controlled by miRNA396, and the regulation mode of GRF-miRNA396 is a core of plant development (Noon, Hewezi & Baum, 2019;Omidbakhshfard et al., 2015). The regulatory network formed by GRF and miRNA396 is essential for developing soybean roots (Noon, Hewezi & Baum, 2019). MiRNA is a small RNA regulator, typically 20-21 nucleotides in length, that can regulate the post-transcriptional inhibition of target genes. MiRNAs are a crucial regulator during plant growth and development. Heterologous miRNA396 expression in Arabidopsis and tobacco led to a decrease in the expression of three of the four tested NtGRF genes, which was accompanied by a severe reduction in leaf size and narrow leaf phenotype. miRNA396 expression is positively regulated by upstream TCP transcription factors, while miRNA396 regulates the transcriptional abundance of downstream GRF. miRNA396/GRF regulatory network affects the development of plant leaves (Curaba, Singh & Bhalla, 2014;Omidbakhshfard et al., 2015). The conserved miRNA396 is involved in growth, development, and abiotic stress responses in various plants by regulating its target gene growth regulator (GRF) transcription factor genes (Noon, Hewezi & Baum, 2019).
Cotton is an essential cash crop and produces 90% of the world's lint. Currently, 45 diploids and five tetraploid cotton species are present in the cotton genus (Huang et al., 2021;Wendel & Cronn, 2003). Allopolyploid cotton (G. hirsutum and G. barbadense) is widely cultivated because of its significant economic value. G. hirsutum and G. barbadense (A and D subgenomes) were derived about 1-2 million years ago (MYA) from isolated diploid genomes of G. arboreum and G. raimondii. These diploid cotton species were hybridized through transoceanic dispersal (Hu et al., 2019;Huang et al., 2020;Wendel & Cronn, 2003). These diploid and tetraploid species are primarily used for evolutionary and biological studies of cotton.
Advancements in cotton genome sequencing (Li et al., 2015;Li et al., 2014;Wang et al., 2012) have provided a new opportunity to exploit gene resources to genetically improve cotton. In this study, we identified GRF genes in tetraploid upland cotton (G. hirsutum), the main cultivated cotton species (Li et al., 2015). The phylogenetic tree was generated using the protein sequences of AtGRFs, OsGRFs, and GhGRFs. Furthermore, we analyzed the exon and intron structures of the GhGRF genes, reverse complementary fragments in their mRNA with microRNA396 sequences, and the deduced protein motifs, as well as the cis-acting regulatory elements in their promoter regions. We also characterized their expression profiles in different organs, developmental stages, and responses to various abiotic and hormonal stresses. Our results provide a foundation for further studies assessing how GhGRF genes can be used to improve cotton genetics.

Identification of GRF genes
The genomic and protein sequences of G. hirsutum (NAU, v1.1; ZJU, v 1.0; JGI, v1.1) were downloaded from CottonFGD (https://cottonfgd.org/about/download.html) (Zhu et al., 2017), NCBI (https://www.ncbi.nlm.Nih.gov/genome/), and Phytozome 13 (https://phytozome.jgi.doe.gov/pz/portal.html). GRF protein sequences of rice were downloaded from NCBI GenBank, and the protein sequences of AtGRFs were acquired from TAIR 10 (http://www.arabidopsis.org). AtGRF sequences were used as query sequences to blast against the G. hirsutum protein database to search homologous candidate sequences in the GRF gene family with default parameters. The candidate sequences were submitted to the Batch CD-search tool in NCBI and ScanProsite tool in the ExPASy prosite database (https://prosite.expasy.org/scanprosite/) to detect QLQ and WRC domains, which are the structural features to distinguish GRF family members from other gene families. The GRF family genes were determined by predicting the conserved domain of the candidate sequence.

Gene structure and protein motifs of GhGRFs
We used the online software Gene Structure Display Server (GSDS) for gene structure analysis (http://gsds.cbi.pku.edu.cn/) (Hu et al., 2015), as previously described (Wang et al., 2021). The CDS sequences were aligned with the reverse complementary sequence of miRNA396a, which was acquired from previously published research papers (Hewezi et al., 2012;Yang et al., 2009). Motif analysis of the full-length protein sequence was performed using the MEME online software (Bailey et al., 2015) (https://meme-suite.org/meme/) as previously described (Qanmber et al., 2019b), according to the following parameters: classic mode, zero or one occurrence per sequence, and motif number 10.

Biophysical properties and sequence logos of GhGRF genes
Various physical and chemical parameters, such as the number of amino acids, MW (molecular weight), pI (isoelectric point), and GRAVY (grand average of hydropathicity) of GhGRF gene coding protein sequences were determined using the ExPASy ProtParam tool (http://us.expasy.org/tools/protpara-m.html). The GRF protein sequences were aligned using Muscle in MEGA 5.2 (Tamura et al., 2011), and the results were analyzed with the WEBLOG online program (http://weblogo.plusone.com/create.cgi) (Crooks et al., 2004) as previously described (Ali et al., 2020;Qanmber et al., 2019a) to generate the sequence logos of conserved amino acid residues.

Classification and phylogenetic analyses of GRF genes
Classification and phylogenetic analyses were performed using MEGA 5.2. A neighborjoining tree of G. hirsutum, O. sativa, and Arabidopsis was constructed using full-length protein sequences according to the following parameters: Poisson correction, 95% partial deletion, and bootstrap analysis with 1000 replicates, as previously described (Xiao et al., 2018;Yu et al., 2018). Another phylogenetic tree was built using maximum likelihood (ML) methods (default parameters and JTT+G as amino acid substitution models) to confirm the authentication of the phylogenetic tree.

Plant material and Real-time quantitative PCR
For expression pattern analysis, tissue samples were collected under field conditions. Cotton plants growing under normal conditions were placed into a container with a predetermined concentration of BL, IAA, JA, PEG, and NaCl. The samples were collected at different time points (0, 0.5, 1, 3, and 5 h), immediately frozen into liquid nitrogen, and stored at −80 • C for subsequent RNA extraction and qRT-PCR analysis. We used the Trizol (Invitrogen, Carlsbad, CA, USA) method to extract total RNA from each cotton sample, as previously described. cDNA was synthesized using a PrimeScript 1st strand cDNA Synthesis Kit (Takara), according to the manufacturer's instructions. RT-qPCR was performed using SYBR Green Master Mix (Takara) on an IQ5 Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA) (Wang et al., 2014b). RT qPCR experiments were performed using the actin gene as an internal reference gene. The PCR program consisted of an initial polymerase activation at 95 • C for 1 min, followed by 40 cycles of 95 • C for 10 s and 60 • C for 30 s. Upon completion of the PCR, a dissociation curve was generated to confirm the specificity of the product and avoid generating primer dimers.

Identification and naming of GhGRF genes
To identify the GRF gene family, candidate sequences lacking both domains (QLQ and WRC) or with only one of the two domains were discarded. Finally, 35 GRF genes were identified (Fig. 1). To further enhance the credibility of the results, two additional blasts were performed against NAU and ZJU protein sequence data of G. hirsutum. The 42 or 41 candidates were also submitted to the Batch CD-search tool in NCBI, and 35 GhGRF family members were confirmed (Figs. S1 and S2). Multiple sequence alignment was performed using JGI (35), N AU (33), and ZJU (35) GhGRF deduced protein sequences. The gene names and their corresponding IDs are displayed in Table 1, and the nomenclature is described in detail in later paragraphs.

Conserved amino acid residues within QLQ and WRC domains
The GRF gene family members were characterized by the presence of both conserved QLQ and WRC domains in the N-terminal end (Kim, Choi & Kende, 2003;Van der Knaap, Kim & Kende, 2000). To further explore the conservative properties of the QLQ and WRC domains of GhGRFs, multiple sequence alignment was performed to depict sequence logos of QLQ and WRC domains for G. hirsutum, Arabidopsis, and rice. The conserved domain sequence logos showed that most amino acid residues distributed in two domains were highly conserved among three plant species (Fig. 2). Amino acid residues include Q (6), E (9), Q (13), Y (19), V (26), P (27), and L (30) in the QLQ domain, while EP (3-4), RC (6-7), RRTDGKKWRC (9-17), KYC (26-28), H (31) and R (38) in the WRC domain were highly conserved.

Phylogenetic analysis and nomenclature of GRF genes
To better understand the phylogenetic relationships between GRFs in G. hirsutum, rice, and Arabidopsis, a neighbor-joining phylogenetic tree was generated using 56 GRF proteins. GRF genes from the three plant species were classified into five groups (G1 to G5) (Fig. 3). To validate the phylogenetic tree constructed using the NJ method, the maximum likelihood (ML) method was used to build another phylogenetic tree. Results indicated that 56 GRFs were also divided into five clades based on the ML tree. The topology of the ML tree was slightly different from the NJ tree, but the composition in each clade was the same, indicating that the NJ tree could be used for further analysis.
The number of GRF genes differed between the clades of the NJ tree. There were 18, 17, six, five, and 10 members in G1, G2, G3, G4, and G5 clades, respectively. The G1, G2, and G5 groups had the highest coverage of all three species, including the monocot (rice) and dicots (G. hirsutum and Arabidopsis). In contrast, the G3 and G4 groups contained GRF genes only from dicots without no GRF member from the monocot. The proportion of GRF family members from three different plants in each clade differed. In the G1 group, the number of AtGRF, OsGRF, and GhGRF members were two, four, and 12, respectively, while in the G2 group, the number of AtGRFs, OsGRFs, and GhGRFs were two, five, and ten, respectively. Similarly, in the G3 group, there were two AtGRFs and four GhGRFs members; the G4 group has two AtGRFs and three GhGRFs genes. G5 group contained one AtGRF, three OsGRFs, and six GhGRFs. These results indicate that GRF families of different plants experienced different evolutionary processes, and that cotton and Arabidopsis thaliana are dicotyledons and have a relatively close genetic relationship.
In the phylogenetic tree, names were assigned to each GhGRF gene based on their phylogenetic relationship and homologous sequence consistency (Fig. 3). Gh is used as a prefix before the GRF name of upland cotton, At for Arabidopsis GRFs. We proceeded according to the following rules of nomenclature: first, GhGRF genes were named after their orthologs in Arabidopsis, which were part of the same clade of the phylogenetic tree; second, if there were more than one orthologous counterpart of Arabidopsis, the GhGRF genes were named after the Arabidopsis homologous gene possessing the highest sequence consistency with the cotton genes; third, ''a'', ''b'', and ''c'' were appended to gene names as suffixes to distinguish the GRF genes sharing the common Arabidopsis orthologs based on the sequence consistency order from high to low. Finally, At or Dt was attached respectively to the end of each gene name to distinguish the sub-genomes in tetraploid cotton in which the gene was located; ''t'' means tetraploid.

Biophysical properties and chromosomal distribution of GhGRFs
Physical and biochemical properties include the distribution of genes on the chromosome, gene ID, length of coding sequence (CDS), number of amino acids (protein length), molecular weight (MW), isoelectric point (pI), and grand average of hydropathicity (GRAVY) of GhGRF genes and their deduced protein is displayed in Table 1. The CDS length of GhGRF genes ranged from 498 bp (GhGRF9b_At ) to 1,833 bp (GhGRF1c_At ), and the number of amino acids of their corresponding deduced proteins ranged from were computed for the GhGRF5a_Dt and GhGRF1c_Dt, respectively. We observed that GRAVY scores for all GhGRF genes were negative, indicating that all GhGRF proteins were hydrophilic; however, the degree of hydrophilicity varied greatly. Annotation of the GhGRF genome sequences (GFF3 profile) helped us to predict the locations of GhGRF genes on chromosomes (Fig. 4). The 35 GRF genes were unevenly distributed among 20 chromosomes, with 17 genes on nine chromosomes of the Atsubgenome and 18 genes on 11 chromosomes of the Dt-subgenome. There were three genes on chromosomes At02, At05, At12, Dt05, and Dt12. For At10, At13, Dt03, Dt10, and Dt13, there were two genes on each chromosome. Ten chromosomes (At06, At08, At09, At11, Dt02, D04, Dt06, Dt108, Dt09, and Dt11) have one GhGRF gene. However, six chromosomes (At01, At03-04, At07, Dt01, and Dt07) have no GhGRF gene. Four pairs of homologous chromosomes, including At05/Dt05, At06/Dt06, At08/Dt08, and At13/Dt13, have a similar distribution of GhGRF genes. However, the gene distribution on three chromosome pairs (At02/Dt02 to At04/Dt04) differed. We inferred that gene loss occurred from the At04 homologous chromosome (GhGRF3b_Dt) to the Dt04 chromosome and unidirectional translocation of a chromosome segment containing the two genes, GhGRF6c-At and GhGRF1b_At, from At03 to At02 occurred (Fig. 4). These results are consistent with those of previous studies (Yang et al., 2017).

Analysis of gene structure and deduced protein motif
To explore the exon-intron structure of GhGRF genes, their coding and genomic sequences were aligned and the number and positions of exons and introns were detected (Fig. 5B). Results indicated that the structural patterns of genes were conserved between Arabidopsis and upland cotton, and the majority of GhGRF and AtGRF genes possessed 2-3/3-4 introns/exons, respectively. The MEME web server was used to analyze the types and distributions of motifs in GhGRF deduced proteins, and ten kinds of motifs were detected (Fig. 5C). Motif 1, characterized by the WRC domain, was predicted in 34 out of 35 GhGRFs. Motif 2, characterized by the QLQ domain, was present in all GhGRFs. In the C-terminus, there were many conserved motifs, including motifs 3-5, 7, and 10. Moreover, motifs 6, 8, and 9 were primarily located in the N-terminal end close to WRC and QLQ of G1 members. This conserved exon-intron and motif structure pattern supports the evolutionary relationship of GRF genes in upland cotton and Arabidopsis.
To investigate whether microRNA396 inhibits the regulation of GhGRF genes on plant growth and development, we tried to find fragments in CDS complementary to microRNA396a through multiple sequence alignment using ClustalW in MEGA5.2. Except for GhGRF9b (which lost the second half of the gene), all other gene mRNA sequences contained highly conserved fragments reverse complementary to microRNA396a (Fig. 6). These results suggested that microRNA396a is essential for controlling the function of GhGRF genes when regulating plant growth and development by decreasing their transcripts.

Analysis of cis-regulatory elements
To analyze cis-regulatory elements, 1,500 bp upstream sequence before the transcriptional start codon was retrieved, and along with the 5 -UTR sequence, was submitted to the PlantCARE database. More than 20 types of cis-elements were identified in GhGRF promoter regions and divided into four categories based on their functions: (1) Phytohormone responsive elements (ABA, MeJA, auxin, GA, and SA); (2) environmental or stress-responsive elements (light, defense, low-temperature, anaerobic conditions, and drought); (3) cis-elements related to growth and development (meristem, palisade mesophyll cells, circadian control, cell cycles, and endosperm); (4) cis-elements related to the regulation of bioactive compound metabolism (zein, phytochrome, flavonoid biosynthesis); ( Table 2). The distribution of these elements in the promoters regions of GhGRF was not uniform, indicating that GRF genes became functionally differentiated during germline evolution. The diversity of promoter cis-elements demonstrated that the GRF genes play an essential role in the growth and development of cotton.

Expression patterns of GhGRF genes
To analyze the expression patterns of GhGRF genes, publicly available transcriptomic data was retrieved and used to describe the expression levels of GhGRF genes in mature organs, developing tissues, and abiotic stress conditions (Fig. 7). The expression patterns of GhGRF genes in different tissues indicated that most GhGRF genes transcripts developed high amounts of ovules and fibers, except for two genes (GhGRF1f_Dt and GhGRF6a_Dt ) whose expression was barely detected during ovule and fiber development (Fig. 7A). More than half of the GhGRF genes were highly expressed during seed germination and the root and cotyledon development processes (Fig. 7B). As shown in Fig. 7C, 27 genes exhibited high expression in mature organs (pistil). The transcript level of 2-5 genes was high for other organs such as calycle, leaf, petal, root, stamen, stem, and torus. Only one gene (GhGRF9b_Dt ) indicated a high transcript level in the torus. We next observed the expression levels of GhGRF genes against abiotic stresses (cold, heat, salt, and PEG) using transcriptomic data. GhGRF1a_At and GhGRF9b_Dt exhibited downregulated expression under the exposure of cold and PEG, while the expression levels of these two genes were upregulated under heat and salt stress. Moreover, the expression of GhGRF1c _At was high at 6 h and 12 h of cold and heat treatment, while GhGRF3a_At was up-regulated only in response to cold. However, GhGRF8_At was down-regulated in response to cold, heat, and salt.
Full-size DOI: 10.7717/peerj.13372/ fig-8 GhGRF9cDt was significantly higher for all time points under all abiotic stresses (Fig. 9), suggesting that these GhGRF genes could improve plant resistance to abiotic stresses and are potential candidate genes for further study. The transcript level of GhGRF7At decreased at all time points with NaCl and PEG treatment compared to the control. The transcription level of other genes decreased at 0.5, 1, and 3 h after NaCl and PEG treatment, including GhGRF1cAt and GhGRF1cDt. For hormonal treatments, including IAA, BL, and GA, some genes (GhGRF1cAt, GhGRF1cDt, GhGRF5aAt ) had relatively high expression levels at all time points. This highlights the important role they play in hormone signaling pathways ( Fig. 10). Other GhGRF genes exhibited moderate to low expression when exposed to IAA, BL, and GA treatments.

DISCUSSION
GRF family transcription factors are involved in regulating various life processes during plant growth and development by controlling cell division (Horiguchi, Kim & Tsukaya, 2005;Kim & Kende, 2004;Kim & Lee, 2006;Lee et al., 2009;Omidbakhshfard et al., 2018), providing an opportunity for genetically improving crops including rice, wheat, and Brassica napus (Che et al., 2015;Chen et al., 2019;Duan et al., 2015;Hu et al., 2015;Li et al., 2016;Liu et al., 2012;Raz et al., 2018;Sun et al., 2016). In our study, 35 GRF genes were identified in G. hirsutum, which was the highest number of GRF of all selected species (Cao et al., 2016;Choi, Kim & Kende, 2004;Filiz, Koc & Tombuloğlu, 2014;Kim, Choi & Kende, 2003;Wang et al., 2014a;Zhang et al., 2008;Zhang et al., 2018). We primarily analyzed GhGRFs and identified their potential functions in growth and development. Previous studies reported that GRF proteins with regulatory functions in plant growth and development share common structural features that are characterized by two highly conserved motifs, QLQ and WRC (Choi, Kim & Kende, 2004;Kim, Choi & Kende, 2003;Kim & Kende, 2004;Van der Knaap, Kim & Kende, 2000). In this study, we performed a comparative analysis of the structural features of GhGRFs and AtGRFs and found that they have very conserved QLQ and WRC domains, including TQL (motif 4), GGPL (motif 3), and FDW (motif 5). Only one gene, GhGRF9b_At, was shortened without the end-half of the WRC domain and the C-terminus. Therefore, most GhGRFs regulate the expression of downstream genes involved in cell division and therefore control cotton growth and development.
Previous studies reported that GRF family TFs were highly expressed in growing and developing tissues, including shoot tips, flower buds, and roots, but that they respond weakly in mature stem and leaf tissues (Kim, Choi & Kende, 2003). We found that nearly all GhGRF genes were highly expressed during the ovule development stages, and nearly half of the selected genes were highly expressed during the development of fiber tissues. The expression level of most GhGRF genes was up-regulated under stress treatment (PEG and NaCl) and hormonal treatment (IAA, BL, and JA). These results suggest a possible role for GhGRF genes in regulating the growth and development of cotton plants by cell division in young and growing tissues.
It is commonly believed that gene expression is controlled by promoter regions, which function as a multi-channel switch. Our analysis of promoter cis-elements indicated that the GhGRF gene promoter regions contain various cis-elements related to light, low temperature, drought, biological stresses, and a variety of hormones, including auxin, GA, SA, MeJA, and ABA binding with TFs such as ERF, MYB, MYC, and DREB. Our research demonstrated that these GhGRF genes could be essential for signal transduction pathways and integrating the internal and external environment. Consistent with previous studies of AtGRFs, we found that they were greatly accumulated in GhGRF gene transcripts and less so in mature tissues. It has been documented that microRNA396 is the major repressor of GRF gene expression (Baucher et al., 2013;Bazin et al., 2013;Beltramino et al., 2021;Chen, Luan & Zhai, 2015;Debernardi et al., 2014;Debernardi et al., 2012;Hewezi et al., 2012;Liu et al., 2009;Liu et al., 2014;Rodriguez et al., 2010;Yang et al., 2009). In our study, we found that a reverse complementary sequence of microRNA396a existed in all GRF gene CDS except GhGRF9b_At, suggesting that microRNA396a might be the reason for the lower expression level of GhGRF genes in mature organs.
Our results indicate that GhGRF genes can be used in cotton genetic engineering to improve fiber and cottonseed oil yield. However, little is known about the regulatory network and mechanism of GhGRF genes, ecological risks and technological limitations have hampered further study. However, progress has been made researching the function and mechanism of OsGRF4 with mutations in GL2 (Che et al., 2015), GS2 (Duan et al., 2015;Hu et al., 2015), GLW2 (Li et al., 2016), PT2 (Sun et al., 2016), LGS1 (Chen et al., 2019), and TtGRF4-A (Raz et al., 2018) (homologs to OsGRF4 in wheat). These spontaneous mutations of GRF genes occurred in regions reverse complementary to microRNA396 and stopped the microRNA396 inhibition of GRF genes, leading to high accumulation of GRF mRNA. However, these mutations did not change the structure and function of proteins encoded by the GRF genes, meaning that a high accumulation of GRF mRNA promotes extra growth and development. For example, Large grain Size 1 (LGS1) is an allele of the OsGRF4 gene that contains 2 bp missense mutation in the OsGRF gene coding region reverse complementary to microRNA396, with no influence on its normal function.
However, the mutation disrupts the pairing of the LGS1 mRNA with microR396, leading to the up-regulation of the OsGRF4 transcript accumulation, the promotion of plant growth and development, the formation of larger grains, and increased cold tolerance (Chen et al., 2019). However, the development of the CRISPR/Cas9 system has made it possible to perform targeted mutagenesis on functional genes in plants (Belhaj et al., 2015;Demirci, Zhang & Unver, 2018;Samanta, Dey & Gayen, 2016).
Inspired by this research progress, we proposed a feasible technical route for the molecular improvement of cotton production. The natural mutations mentioned above can be mimicked using gene-editing techniques to produce desirable mutations of some GhGRF genes with high frequency. Specifically, the CRISPR/Cas9 system can be used to edit the bases in reverse complementary fragments of the GhGRF genes to the microRNA396a sequences and produce gene mutations, making its transcripts unable to pair with microRNA396a without damaging its ability to translate into proteins with normal function. Transgenic ingredients of the Cas9 vector can be removed through hybridization and selection. Using this method, we can easily obtain mutants possessing expected agronomic traits, which can be used in cotton breeding and production, avoiding the negative effects of these transgenic technologies.

CONCLUSIONS
In this study, 35 GRF genes were identified in G. hirsutum, with highly conserved deduced protein structures. GhGRFs shared the QLQ and WRC domains at the N-terminal end with GRFs in Arabidopsis and rice, which laid the structural foundation for the regulation of its growth and development. The phylogenetic tree revealed the homologous relationship among GRF family members from Arabidopsis, G. hirsutum, and rice, indicating that the GhGRF genes have similar functions and mechanisms as Arabidopsis rice. The mRNA sequences of GRF genes contain reverse complementary sequences with microRNA396a, indicating that microRNA396a can effectively inhibit its function. We found that the expression of most GhGRF genes was high during the growth and development of ovules, fibers, and pistils. The significant changes in the expression levels of GhGRF genes under stress further illustrate that these genes are required for growth and development in normal environments and under stress conditions. Promoter cis-elements and expression patterns analyses indicated that the GhGRF genes play a key role in regulating plant growth and development. Therefore, the production of oil and fiber can be increased using CRISPR/Cas9 technology to conduct targeted mutagenesis in the reverse complementary fragment of GhGRF genes with microRNA396a to promote the growth and development of cotton.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This study was supported by the following funds: National Natural Science Foundation of China (31801416); Innovation Program of the Chinese Academy of Agricultural Sciences