Genome-wide identification and expression analysis of the VQ gene family in soybean (Glycine max)

Background VQ proteins, the plant-specific transcription factors, are involved in plant development and multiple stresses; however, only few articles systematic reported the VQ genes in soybean. Methods In total, we identified 75 GmVQ genes, which were classified into 7 groups (I-VII). Conserved domain analysis indicated that VQ gene family members all contain the VQ domains. VQ genes from the same evolutionary branches of soybean shared similar motifs and structures. Promoter analysis revealed that cis-elements related to stress responses, phytohormone responses and controlling physical as well as reproductive growth. Based on the RNA-seq and qRT-PCR analysis, GmVQ genes were showed expressing in nine tissues, suggesting their putative function in many aspects of plant growth and development as well as response to stress in Glycine max. Results This study aims to understand the roles of VQ genes in various development processes and their expression patterns in responses to stimuli. Our results provide basic information in identification and classification of GmVQ genes. Further experimental analysis will allows us to know the functions of GmVQs participation in plant growth and stress responses.


INTRODUCTION
VQ genes are plant specific genes, which involved in plant development and multiple stress responses (Cheng et al., 2012). A conserved amino acid region has been identified within them, which composed of approximately 50-60 amino acids with a highly conserved the FxxhVQxhTG motif (Jing & Lin, 2015). The VQ domain possesses multiple biological functions in VQ proteins, such as the mutant strain of AtVQ14 (changes from IVQQ to EDLE) in the VQ domain result in producing small seeds, nevertheless the mutations in other locations does not have this characteristic (Wang et al., 2010). Furthermore, studies have reported that VQ genes are different in plants and do not have any intron in higher plants, whereas most VQ genes contain one or more introns in moss (Li et al., 2014;Jiang, Sevugan & Ramachandran, 2018;Dong et al., 2018).VQ proteins can interact with the WRKY proteins, for example, SIB1 and SIB2 are also VQ proteins, they were interacted with WRKY33 by recognizing the WRKY domain in C-terminal to activating the defense of plants (Lai et al., 2011).
VQ proteins were reported in dicotyledon such as Arabidopsis thaliana (Cheng et al., 2012), Vitis vinifera , Camellia sinensis (Guo et al., 2018), and monocotyledon such as Oryza sativa (Kim et al., 2013a;Kim et al., 2013b), Zea mays (Song et al., 2016). VQ proteins perform a variety of functions in plant development. For example, IKU1 (AT2G35230) is one of the VQ protein, it involved in regulating endosperm development and affect the seed formation during plant growth (Garcia & Berger, 2003). Under the far-red and low intensity of white light conditions, over expression of AtVQ29 can reduces the hypocotyl growth and it has higher expression in stem cells (Perruc et al., 1999). Furthermore, VQ genes regulate varying functions under abiotic and biotic stresses. AtCaMBP25 (also named AtVQ15) overexpression in transgenic plants had highly sensitive to osmotic stress in germination and early growth of seeds (Perruc et al., 1999). AtVQ9 alleviated the activity of WRKY8 under salt stress (Hu et al., 2013). The transcript levels of AtVQ23 and AtVQ16 are strongly induced by Botrytis cinerea infection and SA stress (Lai et al., 2011).
Glycine max is an important economic crop, widely cultivated in a number of countries. They are often subjected to abiotic stresses during the growth process, such as drought, high salinity, and other abiotic stresses were severely influenced on soybean production (Liu & Li, 2010). Therefore, identification of resistance genes has great significance for improving the yield and quality of soybean through molecular breeding. In this study, we identified 75 VQ genes of the soybean genome, and analyzed their phylogenetic, evolutionary motif, structure, promoter, and expression pattern. In addition, we analyzed the GmVQs's expression level in different multiple abiotic stresses. Our results provide a basic information on identification and classification of GmVQ genes, and further experimental analysis allows us to comprehend the functions of GmVQs participate in plant growth and stress responses.

Identification of VQ genes
The Hidden Markov Model (HMM) profiles of the VQ motif PF05678 were downloaded from the Pfam database (Punta et al., 2012). HMM searched VQ motif (PF05678) from the G. max proteins database with the values (e-value) cut-off at 0.1 (Punta et al., 2012). The integrity of the VQ motif was determined using the online program SMART (http://smart.embl-heidelberg.de/) with an e-value < 0.1 (Letunic, Doerks & Bork, 2012). In addition, the three fields (length, molecular weight, and isoelectric point) of each VQ protein were predicted by the online ExPasy program (http://www.expasy.org/tools/) (Rueda et al., 2015).

Phylogenetic analysis
To investigate the phylogenetic relationship of the VQ gene families among A. thaliana, O. sativa, and G. max, AtVQ and OsVQ proteins were downloaded from phytozomes (http://www.phytozome.org) based on the previous studies (Cheng et al., 2012;Li et al., 2014;Goodstein et al., 2012). VQ proteins were aligned using the BioEdit program. A neighbor-joining (NJ) phylogenetic tree was constructed using these proteins through MEGA7.0 software (Tamura et al., 2011). Bootstrapping was performed with 1,000 replications. Genes were classified according to the distance homology with A. thaliana and O. sativa genes (Cheng et al., 2012;Li et al., 2014).

Sequence alignment, motif prediction and gene structure of GmVQ genes
Multiple alignments of the VQ full length proteins were conducted using Jalview software with default parameter settings. The online MEME analysis used to identify the unknown conserved motifs (http://meme.ebi.edu.au/meme/intro.html) using the following parameters: site distribution: zero or one occurrence (of a contributing motif site) per sequence, maximum number of motifs: 20, and optimum motif width ≥6 and ≤200 (Bailey et al., 2015). A gene structure displaying server program (http://gsds.cbi.pku.edu.cn/index.php) was used to show the structure of Glycine max VQ gene.

Calculating Ka and Ks
The Ka and Ks were used to assess the selection history and divergence time of gene families (Li, Gojobori & Nei, 1981). The number of synonymous (Ks) and nonsynonymous (Ka) substitutions of duplicated VQ genes was computed by using the KaKs_Calculator 2.0 with the NG method (Xu et al., 2018). The divergence time (T ) was calculated using the formula T = Ks/(2 × 6.1 × 10 −9 ) × 10 −6 million years ago (MYA) (Kim et al., 2013a;Kim et al., 2013b).

VQ genes expression analysis of soybean
The expression data of VQ genes in different tissues, including seed, pod, SAM, stem, flower, leaf, root, root hair and nodule, is available in Phytozome V12.1 database (https://phytozome.jgi.doe.gov/pz/portal.html). The expression profile for VQ genes was utilized for generating the heatmap and k-means clustering using R 3.2.2 software (Gentleman et al., 2004).

Plant material and treatments
Glycine max (Williams 82) was used in this study. Seeds were planted in a 3:1 (w/w) mixture of soil and sand, germinated, and irrigated with half-strength Hoagland solution once every 2 days. The seedlings were grown in a night temperature of 20 • C and day temperature of 22 • C, relative humidity of 60 %, and a 16/8 h photoperiod (daytime: 05:00-21:00). After 4 weeks, the germinated seedlings were treated with 20% PEG6000 (drought), 250 mM NaCl solution (salt), 4 • C (cold), 100 µM abscisic acid (ABA), 100 µM salicylic acid (SA) solutions. Control and treated seedlings were harvested 1 h, 6 h, 12 h, and 24 h after treatment. All samples were frozen in liquid nitrogen and stored at −80 • C until use.

RNA extraction and Quantitative real-time PCR (qRT-PCR)
Total RNA was extracted from G. max using RNAiso Plus (TaKaRa, Tokyo, Japan) according to manufacturer's instructions. The cDNA synthesis was carried out with approximately 2 µg RNA using PrimeScript RT reagent Kit with gDNA Eraser (TaKaRa, Tokyo, Japan). Quantitative Real-time PCR (qRT-PCR) was performed using SYBR Premix Ex Taq II (TaKaRa, Tokyo, Japan) on an ABI Prism 7000 sequence detection system (Applied Biosystems, USA) with the primers listed in Table S1. PCR amplification was performed in accordance with SYBR Premix Ex Taq (TaKaRa, Tokyo, Japan) response system. For each sample, three technical replicates were conducted to calculate the averaged Ct values. Relative expression was calculated by the 2 − Ct method (Livak & Schmittgen, 2001). The actin and GAPDH genes were used as internal control.

Gene Ontology Enrichment
Once the sequences were obtained ran a BLASTX search against the UNIPROT database at a 1e-30 significance level. The matches were extracted and compared to the GO annotation generated against UNIPROT hits located at EBI. The GO annotation of the GmVQ genes by using WEGO 2.0 website (http://wego.genomics.org.cn/).

Analyzed the cis-elements of GmVQ promoters
The cis-elements of GmVQ promoters were analyzed to further understand the GmVQ gene family. We examined the sequences within 1,500 base pairs (bp) upstream of initiation codons (ATG) for promoter analysis and searched for these sequences in the soybean genome. The cis-elements in promoters were subsequently searched using the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/).

Gene interaction network
Protein sequence of GmWRKY transcription factors were obtained from the genome database of soybean, also were mapped to the WRKY proteins of Arabidopsis by BLASTP tool in the TAIR database. Subsequently, the interaction between GmVQs and GmWRKYs were forecasted based on the PAIR website (https://rc.webmail.pair.com/), and their network was drawn in Cytoscape 3.6.1.

Identification of GmVQs
Hidden Markov Model (HMM) of the VQ motif (PF05678) was used to search for putative VQs in soybean proteins database. A total of 75 VQs were identifiedand were named from GmVQ1 to GmVQ75 based on their physical locations on the chromosomes. This is different from the previous study, which 74 GmVQs were identified before the database updated (Wang et al., 2014;Zhou et al., 2016). ExPasy predicted that these 75 VQ proteins have different physical and chemical properties whose amino acid lengths ranged from 89 aa (GmVQ37 ) to 486 aa (GmVQ18), with an average of 223 aa and most of them were less than 300 aa. The molecular weights of these 75 VQ proteins ranged from 10.03 kDa (GmVQ37 ) to 52.79 kDa (GmVQ18) and their isoelectric points ranged from 4.29 (GmVQ69) to 10.74 (GmVQ51) ( Table 1).

Phylogenetic analysis and multiple alignment of the VQ genes
To explore the phylogenetic relationships among the VQ genes of soybean, A. thaliana and O. sativa, a NJ phylogenetic tree was constructed (Fig. 1). We found that soybean and A. thaliana have a closer relationship than rice. Based on their relationship with AtVQs and OsVQs and the characteristics of GmVQs' core domain, they were divided into 7 groups, designated Group I-VII (Figs. 1 and 2). For the 75 GmVQ proteins, Group VI contains two VQ proteins; Group V has the biggest amount, with 17 VQ proteins. Groups I, II, III, IV, VII contain 7, 15, 8, 12, 14 members respectively. At the same time, we found 5 types of VQ specificity domain: FxxxVQxLTG (54/75), FxxxVQxFTG (16/75), FxxxVQxVTG (2/75), FxxxVQxLTR (1/75), FxxxVQxLTS (1/75), besides, there is also a GmVQ protein (GmVQ10) has partial domain deletion (Fig. 2). Different types of VQ domains indicate that they might have different biological functions.

Conserved motifs and gene structures of the VQ gene family
We predicted that the 75 GmVQs contained 20 conserved motifs, with the motif length ranged from 11 aa to 50 aa (Fig. S1). Every GmVQ member contains 1-7 conserved motifs ( Fig. 3B). All of the proteins, excepted GmVQ22, show motif 1 which contains a specialty VQ domain. Additionally, an unrooted phylogenetic tree was constructed with VQ protein sequences, suggested that the motifs organization of VQ genes were consistent with the phylogenetic tree (Fig. 3A). Group V contains motif 4, Group IV contains motif 2. We found that most groups possess more than two motifs, suggested that every group might have special functions with a highly conserved amino acid residue. Through the VQ gene structures analysis, half of the group VI has introns; genes in group V have longer coding regions, while genes of group I have shorter coding regions than other groups (Fig. 3C). Interestingly, 78.67% (59/75) of GmVQ genes are intronless genes. It is speculated that a large number of introns might be lost in VQ genes during evolution. The phylogenetic

Chromosome location and gene duplication
We drew a chromosomal location map of GmVQ genes on each chromosome (Fig. 4).
GmVQs are distributed on all soybean chromosomes, except chromosome 16, and were densely distributed on chromosome 8 and chromosome 13, containing 13 and 7 members, respectively (Fig. 4). Most of GmVQs are distributed on the two ends of chromosomes. Segmental or tandem duplicate in many gene families are the main expanding way in plants.
To better study the evolution of GmVQ genes, we further explored gene duplication events using the MCScanX software. We found that 52 pairs of genes originated from segmental duplication, and 4 pairs of genes involved in tandem duplication events (Table S2).

Evolution and divergence of the VQ gene family in soybean and Arabidopsis
With the OrthoMCL software, we found 56 paralogous pairs in soybean, 37 orthologous pairs between soybean and Arabidopsis. Some VQ genes have never had any homology genes. All the paralogous and orthologous pairs are listed in Table 2. At the same time, we found that two or more GmVQ genes match to one AtVQ gene, implying that they might promote the expansion of the VQ gene family during evolution. We calculated Ka/Ks ratios of 55 paralogous pairs in soybean (Table 3). Most Ka/Ks ratios are <1, however, the GmVQ54/GmVQ63 and GmVQ65/GmVQ36 pairs are <1. In addition, the genetic differentiation of the 55 gene pairs occurred between 5 and 30 MYA.

Expression analysis of GmVQ genes among various tissues
Sixty-seven GmVQ genes were investigated using available RNA-seq data from nine different tissues (seed, pod, SAM, stem, flower, leaf, root, nodule, and root hair) (Fig.  5). We found that the expression levels of the GmVQs varied significantly in different tissues.Most GmVQ genes were found expressed in more than one detected organ. As shown in Fig. 5, genes in group A are expressed in all analyzed tissues. The expression levels of group B in pod and stem tissues are higher. Genes in group C have specific expression in leaf and root.

Cis-elements in GmVQ promoters
We found many hormone-and stress-related promoter's cis-elements in GmVQ genes. Enhancer regions (CAAT-box) and core promoter element are around −30 bp of transcription start (TATA-box). Cis-acting regulatory element (A-box) are the common cis-acting elements in the promoter. Others cis-elements that were found in the 75 GmVQ s can be classified into three groups (Fig. 11). Twelve cis-elements involve in the hormone responsiveness; five cis-elements are stress-related elements: ARE/GC/LTR/MBS/TC;   some GmVQ genes containe plant growth and development elements, such as CATbox/circadian/GCN4/HD-Zip 1/MSA-like/RY-element. In addition, some GmVQ genes containe W-box motif, which is binding site for WRKY transcription factor.

Gene Ontology Enrichment
To further understand the functions of the GmVQs, we performed GO annotation and GO enrichment analyses ( Fig. S2 and Table S8). The GO terms included three categories, biological process (BP), molecular function (MF) and cellular component (CC). GO enrichment confirmed that these GmVQs were enriched in the biological process (GO:0008150), regulation of biological process (GO:0050789) and biological regulation (GO:0065007) terms of the BP category. Cellular component (GO:0005575), intracellular  (Table S8). MF was enriched in molecular function (GO:0003674) and binding (GO:0005488). The GO ebrichment suggested that GmVQs were play curcial roles in regulated of biological process.

Gene interaction network analysis
Based on the PAIR tool, we found the functions and their interactions of the GmVQs and GmWRKYs. As shown in Fig. 12A, 3 GmWRKYs are supposed to interact with GmVQ proteins, included GmWRKY115, GmWRKY149 and GmWRKY156, all of them belong to WRKY's IIc group. In the Fig. 12B, we found that GmWRKYs and AtWRKYs are quite

DISCUSSION
VQ protein is a kind of specific protein that widely exists in plant, involved in plant growth and can response to different stresses (Petersen et al., 2010;Fiil & Petersen, 2011;Xie et al., 2010). Hence, we completed genome-wide analysis of soybean VQ proteins by bioinformatic analysis and qRT-PCR to understand their regulation when environmental changed. In the previous study, 74 GmVQ genes were identified (Wang et al., 2014;  al., 2016). After the database was updated, we identified and isolated 75 GmVQ genes in the soybean genome. Compared with previous study, the number of genes in chromosome 2, 4 and 17 show a big difference. Soybean contains more VQ genes than that of A. thaliana (34) (Cheng et al., 2012), Populus trichocarpa (51) (Chu et al., 2016) and O. sativa (42) (Kim et al., 2013a;Kim et al., 2013b). The reason is the whole genome duplication events (WGD). There are two rounds of genome duplication, occurred at around 59 and 13 million years ago, which caused 75% soybean genes duplicated (Jeremy et al., 2010). Seventy-five VQ genes were identified in Glycine max's genome, divided into seven groups based on their comprehensive phylogenetic tree among G. max, A. thaliana, and O. sativa. These proteins are in the shorter branches and with closer spacing, suggesting that they were highly conserved during the evolution. The more closer related genes within the same group shared more similar gene structures, either in their intron or in the exon patterns. Whereas, the variation in different groups suggested the functional diversity of the VQ genes (Jiang, Sevugan & Ramachandran, 2018). In addition, most GmVQ genes (59; 78.67%) were found intronless, and most GmVQ genes (64; 85.33%) encoded relatively small proteins with protein length less than 300 amino acid. This suggests that VQ gene families were intronless and they were highly conserved during evolution. At the same time, gene duplication can help plants to adapt to different environments during their development and growth (Huang et al., 2016;Storz, 2009). The main expansion of GmVQ gene family is segmental duplication (52; 92.9%), only 4 pairs of genes involved in tandem duplication events (4; 7.1%). A similar phenomenon was reported in the BrVQ gene family, which contains a high proportion of segmental duplication (71.9%) and low proportion of tandem duplication (28.1%) (Zhang et al., 2015). Nonfunctionalization, subfunctionalization, and neofunctionalization generally take place after genome duplication, resulting in lose or fix of genes (He & Zhang, 2005;Sandve, Rohlfs & Hvidsten, 2018;Stark et al., 2017). Soybean has undergone the WGD and the whole genome triplication (WGT) compared to grapevine (Wang et al., 2017). As there are 18 VQ genes in grapevine genome, the predicted number of VQ genes in soybean should be more than 100 . However, in this study, we only found 75 VQ genes in the soybean genome, suggesting that there were gene loss events after genome duplication. In addition, the Ks value of each paralogous pairs was calculated to find gene duplication events, the most duplication events in GmVQ gene occurred between 5 and 30 MYA, consistent with the recent WGD in soybean (Wang et al., 2017;Jeremy et al., 2010).
The Ka/Ks ratios in different gene pairs are different, but most gene pairs' Ka/Ks ratios are less than one and only two gene pairs' (GmVQ54-GmVQ63 and GmVQ65-GmVQ36 ) ratios are larger than 1, implying these gene pairs undergo different selection pressure. The above analysis indicated that purifying selection played a crucial role during the evolution, conserved VQ proteins evolved much slowly at the protein level. Expression patterns of 67 GmVQ genes were performed to determine their tissue expression using RNA-seq data. The results showed that 24 genes were relatively highly expressed in nine tissues, indicated that they may relate to the growth and development of plants. Moreover, 76% (57/75) and 64% (48/75) of GmVQ genes' expression levels were obviously increased in leaves and roots, respectively. More and more studies have shown that VQ proteins played a significant role in plants development. The study of A. thaliana mutants showed that AtVQ8 had a certain influence on chlorophyll formation and leaf growth and development (Cheng et al., 2012). In this study, GmVQ7 and GmVQ75 were in the same evolutionary branch with AtVQ8. Their high expression in leaves indicating they might have similar function as AtVQ8 (Cheng et al., 2012). These results will help us to study the further function of soybean's VQ proteins.
Plants need to face various abiotic stresses during their growth in natural conditions, the most common of which are high salt, drought and cold (Kim et al., 2013a;Kim et al., 2013b;Wang et al., 2014). Except for regulation by environmental factors, VQ gene family is regulated by defense-related hormones, such as SA and ABA. In our study, we selected 25 GmVQs for qRT-PCR analysis under five different stresses (salt, drought, cold, SA and ABA stresses). In this study, most GmVQ genes were up-regulated with the SA treatment, the result is consistent with previous study that most AtVQ genes can response to pathogen or the SA treatment (Cheng et al., 2012). In addition, fifteen GmVQ genes (e.g., GmVQ2/21/29/46 ) were up-regulated under SA treatment, suggesting that they play a potential role in stress resitance. 56% GmVQ genes (14/25) were up-regulated, which is different with the up-regulation of VQ genes in rice that only three OsVQ genes were up-regulated more than two fold (Kim et al., 2013a;Kim et al., 2013b). Increasing evidence suggests that VQ genes are involved in various stress response. For example, 23% ZmVQ genes were upregulated, all the VvVQ genes were up-regulated by drought stress (Song et al., 2016;Wang et al., 2015). Consistently, 30% of GmVQ genes were up-regulated, GmVQ2/29/33 were highly expressed under drought stress. Nevertheless, AtVQ9 and AtVQ15 were reported can response to abiotic stress during high salinity treatment. The response of VQ genes to cold stress is similar to that of Chinese cabbage (Hu et al., 2013;Zhang et al., 2015;Cheng et al., 2012). In our study, GmVQ5/6/7/31/46/58/59 and GmVQ7/9/28 were activated the salt and cold stresses, respectively, because that their promoter region exists in specific stress cis-elements. Besides, homologous GmVQ genes possessed similar expression pattern but may exhibit opposite expression trend under stress, such as GmVQ9-GmVQ21 were up-regulated under SA treatment, but GmVQ9 was up-regulated and GmVQ29 was down-regulated during cold stress. These results suggest that GmVQ genes participate in response mechanism of abiotic stresses, their regulation mechanism is complex and diverse.
As auxiliary factor, VQ genes regulate transcription, can interact with many proteins to participate in regulating complex physiological and biochemical processes of plants, such as they can interact with WRKY transcription factors Lei et al., 2017;Lai et al., 2011). Studies have shown that the responses under three different pathogens, VQ protein are interacted with WRKY protein in rice (Li et al., 2014). VQ proteins and WRKY proteins may form a protein complex to exercise function. We found some of the GmVQ genes interact with group I's WRKY , most VQ genes interact with groups I and IIc's VQ protein in various stresses in previous reports (Dong et al., 2018;Guo et al., 2018;Lei et al., 2017). The promoter analysis indicated that 23 of 75 GmVQ genes (30.67%) contained one or more W-box motif in their 1,500 bp promoter regions, W-box were present in 78% VvVQ genes, 91% ZmVQ genes contained one or more W-box motif (Song et al., 2016). In the promoters of GmWRKY genes, W-boxes could regulate GmWRKY members (Dong, Chen & Chen, 2003). It indicates that WRKY protein affect VQ genes expression and thus responses to environmental stimuli (Dong et al., 2018;Guo et al., 2018).