Genome-wide comparative analysis of RNA-binding Glycine-rich protein family genes between Gossypium arboreum and Gossypium raimondii

RB-GRP (RNA-binding Glycine-rich protein gene) family belongs to the fourth subfamily of the GRP (Glycine-rich protein gene) superfamily, which plays a great role in plant growth and development, as well as in abiotic stresses response, while it has not been identified in cotton. Here, we identified 37 and 32 RB-GRPs from two cotton species (Gossypium arboreum and Gossypium raimondii, respectively), which were divided into four distinct subfamilies based on the presence of additional motifs and the arrangement of the glycine repeats. The distribution of RB-GRPs was nonrandom and uneven among the chromosomes both in two cotton species. The expansion of RB-GRP gene family between two cultivars was mainly attributed to segmental and tandem duplication events indicated by synteny analysis, and the tandem duplicated genes were mapped into homologous collinear blocks, indicated that they shared a common ancestral gene in both species. Furthermore, most RB-GRPs in two cotton species undergone stronger negative selective pressure by evolutionary analysis of RB-GRP orthologous genes. Meanwhile, RB-GRPs participated in different abiotic stresses (Abscisic acid, salt and Polyethylene glycol) responses and tissues at different developmental stages between two cotton species were showed by gene expression analysis. This research would provide insight into the evolution and function of the RB-GRPs in Gossypium species.


Identification and classification of RB-GRPs in G.aboreum and G.raimondii
In the draft genome of G.aboreum, RB-GRPs were identified using Hidden Markov Model (HMM) profile corresponding to the Pfam RRM family (PF00076, PF04059, PF08777, PF10378, PF10598, PF13893, PF14259) and CSD family PF00313 through the HMMER 3.0 package with the E-value<1×10 −4 as threshold. From the selected protein sequence screened through RRM domain and CSD domain, the RBPs containing high content of glycine residues (more than 50% residues within any 20-animo acid peptide are glycine) were obtained using perl script. Then, the RB-GRPs were confirmed within the database SMART (http://samart. embl-heidelberg.de/) and INTERPRO (http://www.ebi.ac.uk/interpro/) according to their conserved domain architecture. The RB-GRPs were divided into different subgroups, Sector allotment was based on their conserved motif composition as described previously. Molecular weight (MW), theoretical isoelectric point (pI), and size of the RB-GRPs were investigated with the online tool ExPASy (http://expasy.org/tools/). Subcellular locations were predicted by software WoLF PSORT (http://wolfpsort.org/).

Chromosome distribution of RB-GRPs
The positions of the RB-GRP genes were physically mapped to the 13 chromosomes in each genome with GFF file downloaded from http://cgp.genomics.org.cn, respectively. After that, Mapchart was used to draw the physical maps of RB-GRPs on chromosomes with SVG module.

Multiple sequence alignment and phylogenetic analyses of RB-GRP genes
To explore the evolution relationship of RB-GRP genes, conserved domains among five species have been extracted using perl script and aligned according to the "hmmalign" module by HMMER V3.0 programmer, and the files containing conserved domain sequences should be converted into a format that MEGA 5.0 can recognize. Then the resulting sequences were used to construct a phylogenetic tree using the N-J method in MEGA 5.0 with the random seed of phylogeny test, poisson correction and pairwise deletion option parameters enable. A bootstrap test with 1,000 replicates was tested to obtain the reliability of the trees, and only a test value higher than 50% in the clades was selected for the conserve tree.
Exon/intro structure analysis and motif prediction of RB-GRPs between G.arboreum and G.raimondii. The existing gff files of G.arboreum and G.raimondii were downloaded from http://cgp.genomics.org.cn, and then extracted the files contained candidate RB-GRP genes to analyze the exon/intro structures by GSDS (http://gsds.cbi.pku.edu.cn). Conserved motifs and zinc finger structures of the selected protein sequences were confirmed by SMART database (http://smart.embl-heidelberg.de/) and INTEPRO (http://www.ebi.ac.uk/interpro/ scan.html), respectively.

Detection of collinear tandem arrays
A tandem array at an ancestral locus which termed collinear tandem array, may imply positional gene family expansion. In this study, BLASTP was provided for detection of collinear tandem arrays of RB-GRP genes between G.arboreum and G.raimondii. In any BLASTP hit, the two genes are relabeled as 'tandem duplicates' if they have a difference of gene rank = 1.

Identification and non-synonymous/synonymous substitution (Ka/Ks) ratios of orthologous gene pairs of BP-GRP genes between G.aboreum and G.raimondii
A two-step method was used to obtain the orthologous gene pairs of BP-GRP genes to detect the evolutionary relationship between different cotton species. Firstly, MCscanX was employed to identify the orthologous regions between G.arboreum and G.raimondii. Secondly, orthologous gene pairs of RB-GRP genes were then extracted according to orthologous regions containing RB-GRP genes with small Ks value. PAML package was used to calculate the orthologous rate of Ka and Ks to characterize the collinear genes between G.arboreum and G.raimondii.

Plant materials and growth condition
G.arboreum and G.raimondii were grown in soil mixture in a climate-controlled greenhouse (16 h light/8 h dark at 30˚C). 250 mM NaCl, 100 mg/L ABA and 10% PEG (Polyethylene glycol) were treated after the expansion of the first true leaf to induce salt stress, hormone stress and drought stress, respectively. For each induction treatment, we collected leaf samples at five time points (0 as control, 6, 12, 24 and 48 h). To analyze the expressions of RB-GRPs in different tissues, plants were tagged on the day of flowering (0 DPA), fiber was separated from 50 plants at 0 DPA (ovule), 3 DPA, 6 DPA, 10 DPA and 15 DPA, and seed at 10 DPA, 20 DPA, 30 DPA and 40 DPA. And then immediately frozen in liquid nitrogen and stored at -80˚C freezer for RNA extraction. Three biological replicates were conducted for each sample.

RNA isolation and qPCR analysis
Total RNA of all the collected samples were extracted using the RNAprep Pure Plant kit (Aidlab, Beijing, China). A total of 2 μg of RNA was used as the template, and the first-strand cDNAs were synthesized using the Takara Reverse Transcription System (TaKaRa, Shuzo, Otsu, Japan). The gene expressions of all RB-GRP genes were detected by the 170-8792iCycler iQ Calibration Kit qPCR (Quantitative Real-time polymerase chain reaction) system (Bio-Rad, USA). SYBR Green Real-time PCR Master Mix (Toyobo) was used to perform the reaction. The details of the protocol were as follows: (Step 1) initial denaturation step of 30 s at 95˚C, (Step 2) 40 cycles of 5 s at 95˚C, 34 s at 60˚C and (Step 3) melting curve analysis, and the comparative Ct (2 -ΔΔCt ) method was used to calculate gene expression levels. The β-actin gene was chosen as the reference gene. The primer sequences are shown in S1 Table. Specificity of primers used in this study was verified by subcloning the generated amplicons using the TOPO TA Cloning Kit (Thermo Fischer Scientific, Reinach, Switzerland), and then using them for sequencing (data not shown). Gradient dilution of validated plasmids was then used to construct a standard curve. Amplification efficiency of primer pairs of all genes we detected were no less than 98%. Each experiment was repeated three times.

RNA-sequencing analysis
Total RNA was extracted using the RNAprep Pure Plant kit (Aidlab, Beijing, China).CA, USA) from different cotton tissues during different development stages. The RNA samples were sent to the Beijing Berrygenomics for sequencing on an Illumina HiSeq2000 sequencing platform. The DEGseq package was used for identifying genes differentially expressed between paired samples pairings, and P-values were adjusted according to the Benjamini and Hochberg method [23].

Identification and classification of BP-GRP genes in G.aboreum and G. raimonii
Until now, 23 and 8 glycine-rich RNA-binding protein genes were identified in the genomes of Zea mays and Arabidopsis thaliana, respectively [5,7]. 434 and 405 non-redundant RNAbinding protein (RBP)-coding genes were identified by the HMM profile from the Pfam database in the genome assemblies of G.arboreum and G.raimondii, respectively. 50 and 47 RNAbinding glycine-rich protein genes were then selected according to presence of (Gly)n-X repeats in the 434 and 405 RB-GRPs. The protein sequences of above candidate genes were then confirmed within the SMART database (http://smart.embl-heidelberg.de/) and BLASTP according to the conserved domains of their own. Finally, 37 and 32 RB-GRP genes were selected from G.arboreum and G.raimondii (S1 and S2 Figs). Meanwhile, we have identified 13 and 15 glycine-rich RNA-binding protein genes in the genomes of Arabidopsis thaliana and Theobroma cacao using the same method (Table 1). Then we categorized these RB-GRP encoding genes into four subtribes (IVa, IVb, IVc and IVd) according to domain motif consistent with previous principles of classification (Table 1) [4]. According to Table 1, numbers of Class IVa in genomes of five different plant species were all bigger than any other subtribes, followed by the Class IVd. In addition, the numbers of RB-GRP genes in two cotton species were bigger than other plant species. GaRB-GRP1 to GaRB-GRP37 and GrRB-GRP1 to GrRB-GRP33 were ordered according to Tables 2 and 3.

Chromosome location of RB-GRPs between G.arboreum and G.raimondii
The 37 GaRB-GRP genes were located on the 13 G.arboreum chromosomes (Fig 1). Normally, the number of GaRB-GRP genes on each chromosome varied widely. Chromosome 5 and chromosome 13 have a maximum of five GaRB-GRP genes, respectively. Four GaRB-GRP genes on chromosome 1, 6, 7 and 8, followed by on chromosome 10 which three members were found. Chromosome 2, and 4 contained two genes each, whereas each only single GaRB-GRP gene was localized on chromosome 3, 11 and 12 (Fig 1). Obviously, they were distributed unevenly among 13 chromosomes, except for no GaRB-GRP gene was found on chromosome 9 (Fig 1). Four pairs of GaRB-GRPs were linked on the same chromosome. The rest genes were found as singletons on chromosomes (Fig 1).
Like the case in G.arboreum, the 32 GrRB-GRP genes distributed unevenly across the 13 chromosomes in G.raimondii (Fig 2). Chromosomes 2 had a maximum of six GrRB-GRP genes, five GrRB-GRP genes each on chromosome 5 and 10, respectively, three on chromosome 1 and 7, two on chromosome 9 and 8, one each distributed on the other four chromosomes, respectively (Fig 2). Five pairs of GrRB-GRPs in G.raimondii were linked on the same chromosome (Fig 2).

Phylogenetic analysis of RB-GRP gene family
To investigate the molecular evolution of RB-GRP gene family comprehensively and systematically, all the putative RB-GRPs (with protein conserved domains) from two cotton species, as well as the RB-GRPs from Arabidopsis thaliana, Theobroma cacao and Zea mays, were aligned to generate an unrooted phylogenetic tree separately with Neiboring-Joining method. According to the position among the protein sequence, the domain sequence naming scheme was added a suffix N, M and C behind the original sequence name, such as GaRB-GRP37 N, GaRB-GRP37 M and GaRB-GRP37 C. There are two domains (RRM and CSD) existed in RB-GRPs, while the RRM-type RB-GRPs accounted for the majority. Thus, the RRM-type phylogenetic tree of RB-GRP proteins from G.raimondii or G.arboreum, Arabidopsis thaliana, Theobroma cacao and Zea mays was established (Fig 3). It suggested that most of RRM-type RB-GRPs of five species were divided into two subgroups according to the position of RRM domain (C-terminal or N-terminal), expect for two GaRB-GRPs (GaRB-GRP28 and GaRB-GRP29), two GrRB-GRPs (GrRB-GRP5 and GrRB-GRP26) and TcRB-GRP 10.

Gene structure of GaRB-GRPs and GrRB-GRPs
To investigate the possible structural evolution of RB-GRP gene family in the two diploid cotton species, the gene structures of GaRB-GRPs and GrRB-GRPs were compared separately. In

Whole genome collinearity analysis of RB-GRPs between G.arboreum and G.raimondii
The RB-GRP gene family in G.arboreum and G.raimondii have established a close relationship of collinear and synteny for each other (Fig 5), exploited by Circos software. The analysis of collinear blocks of two cotton species indicated that the large-scale syntenies contained 27 GaRB-GRPs, 26 GrRB-GRPs and one gene (Gorai.009G401300) in G.raimondii genome that was identified as not RB-GRP gene found to share synteny with G.arboreum. 10 RB-GRPs were single G.arboreum-to-G.raimondii orthologs, which indicated these genes should have been in the genome of the last common ancestor of G.arboreum and G.raimondii. While the rest genes showed one-to-more or more-to-one correspondence, for example, twelve cases that two G. arboreum genes corresponded to one G.raimondii genes, thirteen cases that one G.arboreum gene corresponded to multiple G.raimondii genes. Genome-wide analysis of the RB-GRP family

Expansion pattern of RB-GRP genes in G.arboreum and G.raimondii
Detection of collinear orthologs is important for understanding gene evolution. Segment and tandem duplications are the main mechanism resulting in gene family expansion. Distribution of paralogs could be used to analysis the potential duplications and the evolution of RB-GRPs between G.arboreum and G.raimondii. Sixteen paralogs GaRB-GRPs and twelve GrRB-GRPs were detected based on the bootstrap value in the phylogenetic analysis, and all of them randomly distributed on chromosomes of G.arboreum and G.raimondii, which indicated that segmental duplication is predominant during RB-GRP gene family expansion. Meanwhile, Cotton_A_22861/Cotton_A_22864, Gorai.005G053000/Gorai.005G052700 and Gorai.002G019510/Gorai.002G019500 were six tandem duplication genes, and Cotton_A_22861/ Gorai.005G052700 belonged to the orthologous pair of single G.arboreum-to-G.raimondii gene correspondence. It illustrated expansion of RB-GRP gene family was also associated with tandem duplication event, and the expansion pattern of RB-GRPs in G.arboreum is consistent with G.raimondii.

Selective pressure analysis of orthologous gene pairs for RB-GRP genes
To investigate the selective constrains on RB-GRP genes, the non-synonymous (Ka) to synonymous (Ks) substitution ratio for orthologous gene pairs for RB-GRP genes were calculated. As is known to all that the ratio (Ka/Ks) indicated neutral mutation when Ka equals to Ks, negative (purifying) selection when ka is less than Ks, and positive (diversifying) selection when Ka exceeds Ks [24]. In this study, 20 orthologous gene pairs in the RB-GRP gene family of G.arboreum and G.raimondii were investigated (S3 Fig). For RRM-type, the mean Ka/Ks ratio of all of orthologous genes was 0.154 between G.arboreum and G.raimondii, with most of them being even<0.3, which suggested that they had experienced strong purifying selection pressure (S3

Expression analysis of RB-GRPs during fiber and seed development between G.arboreum and G.raimondii
We used RNA-seq analysis to compare the gene expression profiles in fiber or seed during different development, the result showed that 10 GaRB-GRPs and 11 GrRB-GRPs were participated in fiber cell development (Fig 6), 7 GaRB-GRPs and 11 GrRB-GRPs were then participated in seed development of G.arboreum and G.raimondii, respectively (Fig 7). Expressions of above genes during different development were further analyzed by q-PCR. Expression of Cotton_A_00105, Cotton_A_09110, Cotton_A_11378, Cotton_A_25290, Cotton_A_34922 and Cotton_A_35063 were higher in the earlier development stage of fiber, especially Cotton_A_34922 and Cotton_A_35063 had the highest expression level at 0-3 DPA, and decreased with fiber development, while Cotton_A_10822 presented a gradually increasing trend during 0-15 DPA (Fig 8). Unlike GaRB-GRPs, most GrRB-GRPs presented stable high expression during 0-3 DPA (Fig 9), which means that GrRB-GRPs played an important role in earlier fiber development of G.raimondii. In addition, while two (Gorai.002G19510 and Gorai.013G03680) kept high expressions during 0-15 DPA (Fig 9), indicated that they may participated in the initiation and elongation of cotton fiber. There are many RB-GRPs differentially expressed during the seed development of cotton (Figs 10 and 11). In G.arboreum, expressions of most RB-GRPs showed a decline trend with seed development of cotton, while Cot-ton_A_19718 kept high and stable expression during whole seed developmental stage, and Cot-ton_A_35063 showed a dramatic increase at 40 DPA (Fig 10). But in G.raimondii, only Gorai.013G03680 showed a high stable expression in seed during different developmental stages. Expressions of Gorai.002G07700, Gorai.002G01580 and Gorai.006G03410 presented transient increase in seed at 40 DPA (Fig 11).

RB-GRP genes in different cotton species
RB-GRP family is characterized by the presence of a glycine-rich domain arranged in (Gly)n-X repeats and a RRM [7]. A number of RB-GRPs have been reported in different plants, such as eight in Arabidopsis [25] and six in rice [26], which play important roles in plant growth, development, and stress resistance [8,9]. In our study, 37 GaRB-GRPs and 32 GrRB-GRPs Genome-wide analysis of the RB-GRP family were identified in genomes of G.arboreum and G.raimondii. Conserved protein sequence analysis indicated that the GaRB-GRPs and GrRB-GRPs could be divided to four subgroups (Class IVa, IVb, IVc and IVd), which was similar to that in other plant species [3]. However, two GaRB-GRPs (Cotton_A_35374 and Cotton_A_38093) and one GrRB-GRP (Gorai.002G167100) belong to Class IVb contained one RRM domain and two ZnF_RanBP 2 , which is not found in other plant species. The species phylogenetic tree displayed the value of synonymous substitution/site (Ks) among G.arboreum, G.raimondii and T.cacao, and the divergence time of cotton species can be calculated by Ks values. The result of divergence time was consistent with that the common ancestor of two diploid cotton species have diverged from T.cacao 18-58 million years ago [19]. To sum up, we could speculated that RB-GRP gene in cotton was in an expansion trend after speciation from the T.cacao lineage.

Tandem duplication and segmental duplication contributed to the expansion of the RB-GRP gene family in G.arboreum and G.raimondii
Gene family expansion mainly caused by tandem duplication, segmental duplication and transposition events [7]. Through the analyses of 50 gene families in Arabidopsis thaliana, tandem duplication is the most prominent, whereas segmental and transposition events occurred  Genome-wide analysis of the RB-GRP family more frequently in other plants [27]. Due to a recent and an ancient whole-genome duplication event occurring in G.arboreum and G.raimondii genomes, and the whole-genome duplication event cannot be ruled out as the cause of gene family expansion in G.arboreum and G. raimondii. In our study, no orthologous genes in G.raimondii could be found corresponding to 11 GaRB-GRPs, and 8 GrRB-GRPs have no corresponding orthologous genes in G.arboreum, which may result from losing or deleting of orthologous after whole genome duplication event, and the missing genes may not survive the evolutionary selection pressure. Moreover, the two cotton species experienced a retrotransposon insertion event, and the insertion was the fundamental reason for the different size between A and D genome [28], including the losing or deleting of orthologous genes. The number of copies of the orthologous genes between G.arboreum and G.raimondii was not increased by the whole genome duplication. Thus, whole genome duplication of cotton did not contribute to expansion of RB-GRPs in G.arboreum and G.raimondii.
Only two (5.4%) and four (12.5%) tandem duplication genes were found in RB-GRPs among G.arboreum and G.raimondii, while seven pairs of paralogous RB-GRP genes in G. arboreum and five pairs of paralogous RB-GRP genes in G.raimondii randomly distributing on chromosomes indicated that segmental duplication is predominant in the three duplication events, which was consistent with the expansion of the ZmRB-GRPs in maize [7]. Overall, both tandem duplication and segmental duplication contributed to expansion of RB-GRP gene family between G.arboreum and G.raimondii.

Role of the GaRB-GRPs and GrRB-GRPs during fiber and seed development of cotton
Cotton plays a crucial role in human life, the world economy, and scientific research, and the fiber of cotton is an essential raw material for the textile industry [29,30], and the seed of cotton, a by-product of fiber, is increasingly recognized to have excellent potential as a source of food and biofuel [31]. A large number of genes are believed to be involved in fiber and seed developmental process, such as WLIM1a, Sus (Sucrose synthase), GhRDL1 [32,33,34]. The tissue-specific expression patterns of the selected RB-GRP genes under normal condition reflected that they might play versatile functions in the growth and development of cotton (Dong et al. 2016). Comparing to G.arboreum, there were more RBGRPs involve in the cotton fiber development of G.raimondii, while more RBGRPs differently expressed in seed of G. arboreum. Although most of the expression patterns between GaRB-GRP encoding genes and GrRB-GRP encoding genes were different in same tissue between G.arboreum and G.raimondii, few orthologous genes presented similar expressions between two cotton species, which implied their functional conservation.

Role of the GaRB-GRPs and GrRB-GRPs under different stresses tolerance
Most of Arabidopsis, rice and maize RB-GRPs were involved in response to stresses [5,6,7]. In this study, expressions of 8 GaRB-GRPs and 8 GrRB-GRPs in two cottons under different stresses were analyzed by qPCR. Cotton_A_19718, Cotton_A_19121, Cotton_A_00739, Cot-ton_A_16121, Cotton_A_22297, Cotton_A_18468 in G.arboreum and Gorai.010G13910, Gorai.002G00350, Gorai.013G03680, Gorai.002G11140, Gorai.002G23920, Gorai.002G16700 in G. raimondii were significantly responded to ABA, NaCl and PEG response, while Cot-ton_A_39551, Cotton_A_10822 in G.arboreum and Gorai.009G24350, Gorai.001G07470 in G. raimondii were involved in the early stages of different stress, while showed no obvious significant difference with the prolongation of treatment, which implied that they were involved in rapid response to environmental stress, and other RB-GRPs play an important role in stress resistance. The gene expression patterns can provide important clues for gene function [35]. However, their detailed roles in stress responses need to be further studied in future.

Conclusion
The RB-GRP gene family has been extensively studied in model plant species such as Arabidopsis, rice and mazie [5,6,7], while there has been a lack of systematic analysis of RB-GRP family genes in cotton. Here, we identified and compared the RB-GRP gene family members between the two cotton species (G. raimondii and G. arboretum). The RB-GRP genes in cotton likely experienced tandem and segmental duplication events, similar to other species. Most RB-GRPs in two cotton species undergone stronger negative selective pressure by evolutionary analysis of RB-GRP orthologous genes. qRT-PCR analyses revealed that RB-GRPs have crucial functions during fiber and seed development of cotton, and may be involved in ABA, NaCl and PEG response. The results have provided a basis for further assessment of physiological roles of different RB-GRP genes in response to stresses in cotton species.