The essence of NAC gene family to the cultivation of drought-resistant soybean (Glycine max L. Merr.) cultivars

The NAC gene family is notable due to its large size, as well as its relevance in crop cultivation – particularly in terms of enhancing stress tolerance of plants. These plant-specific proteins contain NAC domain(s) that are named after Petunia NAM and Arabidopsis ATAF1/2 and CUC2 transcription factors based on the consensus sequence they have. Despite the knowledge available regarding NAC protein function, an extensive study on the possible use of GmNACs in developing soybean cultivars with superior drought tolerance is yet to be done. In response to this, our study was carried out, mainly through means of phylogenetic analysis (rice and Arabidopsis NAC genes served as seeding sequences). Through this, 139 GmNAC genes were identified and later grouped into 17 clusters. Furthermore, real-time quantitative PCR was carried out on drought-stressed and unstressed leaf tissues of both sensitive (B217 and H228) and tolerant (Jindou 74 and 78) cultivars. This was done to analyze the gene expression of 28 dehydration-responsive GmNAC genes. Upon completing the analysis, it was found that GmNAC gene expression is actually dependent on genotype. Eight of the 28 selected genes (GmNAC004, GmNAC021, GmNAC065, GmNAC066, GmNAC073, GmNAC082, GmNAC083 and GmNAC087) were discovered to have high expression levels in the drought-resistant soybean varieties tested. This holds true for both extreme and standard drought conditions. Alternatively, the drought-sensitive cultivars exhibited lower GmNAC expression levels in comparison to their tolerant counterparts. The study allowed for the identification of eight GmNAC genes that could be focused upon in future attempts to develop superior soybean varieties, particularly in terms of drought resistance. This study revealed that there were more dehydration-responsive GmNAC genes as (GmNAC004, GmNAC005, GmNAC020 and GmNAC021) in addition to what were reported in earlier inquiries. It is important to note though, that discovering such notable genes is not the only goal of the study. It managed to put emphasis on the significance of further understanding the potential of soybean GmNAC genes, for the purpose of enhancing tolerance towards abiotic stress in general. This scientific inquiry has also revealed that cultivar genotypes tend to differ in their drought-induced gene expression.


Background
Being an excellent source of protein and calcium (surpassing other major staples such as wheat and sorghum), soybean (Glycine max [L.] Merr) is generally considered the most valuable legume variety. Aside from being produced for human consumption, the plant is used to feed livestockhence further increasing its significance as a food source [1,2]. Moreover, its oil is used in biodiesel production [2,3]. In 2015 and 2016, soybean yield worldwide is estimated at 320.15 million metric tons [4].
Still, like other crops, soybean is normally exposed to a myriad of stress factors, ranging from the biotic (organism-induced) to the abiotic (environmental causes) [5]. Without proper management or sufficient tolerance, stresses can limit plant development and thus affect overall yield [6]. Among stress factors though, droughts brought forth by climate change have become among the most concerning, and are expected to adversely affect soybean production around the world in the years to come [6,7]. Thus, it is important to choose cultivars with the highest drought tolerance to address this expected threat to proteins and food security [8].
The NAC transcription factor family is a big group of proteins, which are named after three important proteins, Petunia No Apical Meristem (NAM), Arabidopsis transcription activation factors (ATAF1 and ATAF2) and Cup-shaped cotyledon 2 (CUC2) [9,10]. NAM is essential in the Petunia plant's embryo development and flower pattern formation [11]. In Arabidopsis, CUC2 is vital to embryo, flower, and apical meristem growth [12], while ATAF1 and ATAF2 are known to be responsive to stress factors [13][14][15][16]. The NAM domain content of these NAC genes covers N-terminus subdomains A to E [17,18]. Subdomains B and E are flexible, while the rest are highly conserved [19].
NAC transcription factors are relevant in both abscisic acid (ABA) dependent and independent pathways in drought stress signaling [28,29]. This was first revealed in the discovery of multiple stress-responsive genes in Arabidopsis, namely ANAC019, GmNAC055, and GmNAC072. The overexpression of the aforementioned genes led to enhanced drought tolerance [30,31]. In rice, one notable NAC gene is SNAC1, which is vital to drought stress signaling in guard cells. The overexpression of SNAC1 improves the drought tolerance of transgenic varieties [32]. OsNAC10, a root-specific transcription factor, is another noteworthy NAC. Overexpression of this gene enhances drought tolerance and provides another valuable benefitincreased grain production (under normal circumstances) [33]. SNAC2/ OsNAC6, Os045, and Os063 are other genes in rice whose overexpression boosts tolerance to abiotic stress factors (such as salinity, drought and cold) [34][35][36][37].
Shifting focus towards soybean, Meng et al. [38] recognized the first six GmNAC genes (designated 1 to 6). The expressions of GmNAC1-6 under osmotic stress were examined [39]. A more comprehensive research on GmNAC genes in soybean was conducted later on. In the said study, 9 of the 31 GmNAC genes tested were affected by abiotic stress factors, such as ABA treatments, high salinity, cold, and/or dehydration [40].
In 2011, Le et al. carried out another inquiry on soybean GmNAC genes, in which 152 full-length GmNAC transcription factors (inclusive of 11 membrane-bound transcription factors) were identified [23]. The researchers also discovered that out of the 38 GmNAC genes tested, six were repressed and 25 induced twice as much or more upon being subjected to dehydration treatment under real-time quantitative PCR (RT-qPCR) [23]. The same group of GmNAC genes also showed differences in drought-responsive expressions within the same tissue, at different developmental stages, and in different tissues at the same developmental stage [41].
Due to the recognized role of NAC transcription factors in various physiological and biological processes, a more comprehensive study of the NAC gene family in soybean was done. This was to establish the possible application of the said genes in creating highly droughttolerant transgenic cultivars. The research involved a total of 28 GmNAC genes from drought-sensitive (B217 and H228) and drought-tolerant (Jindou74 and 78) soybean varieties. It was hypothesized that the varying expression of GmNAC genes might have an impact on drought tolerance. Furthermore, the possible correlation between GmNACs and tolerance degrees was looked into.

Results and discussions
Identification of NAC Transcription factors and phylogenetic analysis of NAC genes in soybean, Arabidopsis, and rice NAC transcription factors in soybean were identified using Phytozome v.10.2 database using BLAST search technique and two well-known NAC proteins from rice and Arabidopsis were used as query sequences. During the search, the presence of NAC domain was found and confirmed and furthermore redundant sequences were removed. Non-redundant putative NAC genes which were identified in soybean genome were 139 GmNACs.
From the alignments of predicted NAC proteins, phylogenetic tree was constructed and GmNAC001 to GmNAC139 were designated in accordance with the tree. These listings are shown in Additional file 1. In these GmNACs genes, the coding proteins were composed of 190 to 378 amino acids in length. For every orthologous which is found in rice and Arabidopsis, two or more soybean NAC genes were found in most of the cases. Additional file 2 contains detailed information along with accession numbers regarding NAC family genes present in rice and Arabidopsis.
Phylogenetic relationships exists between NAC family proteins in Arabidopsis, rice and soybean and for clarification of these complex relationships, an unrooted tree can be constructed using NAC protein sequences from rice, Arabidopsis and soybean. Most of the groups obtained from the results were similar to previous phylogenetic analyses [42,43]. All the NAC members were clustered in the phylogenetic tree into 17 groups and this is shown in (Fig. 1).
GmNAC geness are as diverse as NAC proteins and in 16 groups there was unequal distribution of 139 GmNACs was found. In Group IV no member of GmNACs was present. In the phylogenetic analysis of Arabidopsis and rice this similar formula was observed [18,42], which indicates that either this group is acquired in rice and Arabidopsis or is lost in soybean plant after their common ancestors went through the process of divergence. Move over in rice and Arabidopsis plants, certain specific roles are played by these NACs and for studying the relationship between different plant species understanding of characteristics of these groups is important.
Usually similar functions are performed by the genes which have a close evolutionary relationship. Predicting the functions of gene is important in many functional studies and polygenetic analysis can be used for this purpose. Furthermore study of phylogenetic relationships is effective for prediction of stress-related functional genes [24,42]. From the phylogenetic tree, it can be established that GmNAC100, GmNAC101, GmNAC102, GmNAC103 GmNAC118 and GmNAC119 might play important part in the formation and development of shoot apical meristem as these are combined together in single with CUC1 and CUC3 [44,45]. OsNAC52, ANAC017, ONAC063, GmNAC004, GmNAC005, GmNAC116 and GmNAC117 which are combined together in a single group are responsible for drought tolerance and oxidative stress in transgenic plants [36,37,46]. A close evolutionary relationship is shown by GmNAC020 and GmNAC021 with ANAC096. ABF2 and ABF4 functions synergistically with ANAC096 and through their combined function plants can survive under osmotic stress and dehydration [47]. GmNAC064 to GmNAC082 are orthologous with ORE1/ANAC092/ AtNAC2, NAC047, OsNAC1, ANAC072/RD26, ANAC019, OsNAC9, OsNAC6, ANAC032, ANAC055 and ANAC102 and their functions in plants is to respond to stress conditions thereby resulting in improved yield in different stress conditions such as limited water [30,34,[48][49][50][51]. From GmNAC128 to GmNAC139, VNDs (including VND1-VND7) and NSTs (including NST1, NST2, NST3/SND1) combined together in a single group are considered as regulators in vascular vessel development [52]. In Group XVI, OsNAC10 and ANAC042 are present and they are responsible for oxidative stress response functions [33,53]. From GmNAC008 to GmNAC009 were orthologues to VN12 which functions in environmental stress responses and senescence and also interacts with gemeniviral replication initiator protein [54]. Drought-resistance response is inducted by GmNAC040, GmNAC041, GmNAC042, GmNAC043, NTL6 and NTL9 which are combined together in a single group and lead senescence is also affected by this drought-resistance response [55,56]. For cold response and co-ordination of flowering time, an important role is played by GmNAC046, GmNAC047, GmNAC052, GmNAC053 and LOV1 which are clustered in one group [57]. These results indicated different possible functions which are performed by NAC genes in the soybean plant.

Structural and motifs analysis of GmNAC genes
During the evolution period of multigene families, the structure of gene was commonly diversified and this facilitates the evolutionary co-option of genes for adopting different new functions in accordance with changes in the environment [21]. For in-depth investigation of structure of soybean NAC genes, we analyzed conserved motifs and intron/exon distribution according to the phylogenetic relationships (Fig. 2).
The analysis of gene structure indicates that introns present in GmNACs genes varies from 1 to 7 and this is similar to banana, in which the number of introns vary from 0 to 6. [58]. Nevertheless, the introns in cotton and rice vary from 0 to 9 and 0 to 16 respectively [21,59]. Through these results it is suggested that there exist small diversity in the gene structure of GmNACs genes when compared to NAC genes present in cotton and rice. It was also found that in 94 of 139 GmNAC genes two introns were present. In cotton, rice and banana this phenomena was observed and two introns were present in majority of NAC genes [21,58,59]. Nuruzzaman et al. [21] suggested that in rice the intron loss is faster compared to intron gain after the process of segmental duplication. Furthermore, 7 introns are present in GmNAC039, and exon-intron structure is similar in most of the GmNAC members which are present in the same group. In each group and close evolutionary relationship are supported by the conserved intron numbers.
For further examination of diverse structure of soybean NAC proteins, MEME program along with subsequent annotation with InterPro was used and 20 conserved motifs were identified and predicted (Fig. 2). In all the GmNACs there were at least 5 of the 7 main motifs present (motif 1, 2, 4, 5, 6, 7, 8 and 10). These motifs were annotated as no apical meristem (NAM) or as NAC domain, and therefore all soybean NACs which are identified in the study possessed conserved features which are present in the NAC family. The conserved motifs which are present in the N-region of the NAC proteins provides functions for DNA-binding and this indicated that these motifs are very important for proper functioning of NAC proteins and Singh et al. [60] observed a similar phenomenon for NACs present in potato. The motif number 19 was particularly found in 4 genes only. Moreover Motif 1 was similar to one present in CUC3 gene which is involved in leaf development and abiotic stress responses [45]. In the AtSOGI motifs 7 and 10 were found which provides response to DNA damage [61]. In the VNDs of Arabidopsis NAC genes Motif 9 and 16 were found and these were also associated with regulation of xylem vessel for example AtVND4,  AtVND5, AtVND2, AtVND6 and also performed functions in biosynthesis of vessels [52]. Similar motif composition is shared by NAC proteins which are combined together in a same group and this indicates that members present is same group possess similar functionalities.
Differential expression profiles of the 139 GmNAC genes using previous results of previous studies The productivity of soybean is severely affected by drought stress and abiotic stress and therefore study of genes of this crop important to increase crop yield. The research conducted by Dung et al. (2011) [23] indicated that among 152 GmNACs having putative full-length open reading frame, about fifty stress-related genes have been predicted using phylogenetic analyses of ANAC, GmNAC and ONAC families. According to the results of their study seven of our 139 GmNAC genes (namely GmNAC033, 064, 065, 068, 066, 067and 069) were found ubiquitously in all eight tissues, while 22 genes were found expressed strongly in stressed leaves as shown (Additional file 4). Silico analysis of GmNACs was conducted in the study along with rice and Arabidopsis counterparts and similar NAC architecture was revealed [23]. These results indicate that the stressrelated NAC genes are differentially expressed spatially and temporally in response to abiotic stresses.
In the above discussion, different biological aspects of GmNAC TFs have been identified and it can be considered that there is a need of future follow-up studies for improving the understanding of different regulatory functions performed by NAC members. If the operations of TFs are made understandable to greater extends then this understanding can be translated into potential application for increasing plant productivity.
Predicted GmNAC genes and dehydration stress: analysis of expression patterns among drought-tolerant and drought-sensitive soybean cultivars As previously discussed, dehydration stress has altered expression of many GmNAC genes in the soybean leaves [23,41]. Pinpointing the specific function of genes is of great importance in agriculture and other related industries. It is, after all, through such means that superior cultivars could be developed with great accuracy. In this study, the goal was to gather information on gene expression patterns in relation to dehydration stress, potentially revealing genes that could allow for new drought-tolerant varieties to be engineered. The study involved the analysis of 28 predicted stress-related GmNAC genes, mainly done through qRT-PCR. As to be expected, gene-specific primers were utilized (Additional file 5), and the samples were subjected to simulated drought conditions.
Despite being subjected to the same conditions, though as to be expected, the drought-tolerant and droughtsensitive cultivars had different phenotypic expressions in response to drought. Both H288 and B217 (drought-sensitive varieties) exhibited wilting at a much greater extent compared to the Jindou 74 and 78 (drought-tolerant varieties). The impact on growth was also much more prominent on the former pair than on the latter. These effects became more noticeable as the experiment progressed, and the differences were most evident upon reaching the eighth day (Additional file 6). As for gene expression, the GmNAC genes analyzed were expressed differently among drought-tolerant and drought-sensitive cultivars (Fig. 4). The drought-tolerant varieties (which include Jindou 74 and 78) exhibited high expression levels for GmNAC004, GmNAC021, GmNAC065, GmNAC066, GmNAC073, GmNAC082, GmNAC083 and GmNAC087. The droughtsensitive cultivars (B217, H228), on the other hand, were detected to have lower expressions for the aforesaid genes. Do note that this trend was observed in both extreme drought and standard stress setups. Interestingly, the Jindou varieties still maintained higher expressions for all (See figure on previous page.) Fig. 2 Phylogenetic relationships, gene structure and motif compositions of soybean NAC genes. a Multiple alignments of 139 full length proteins of NAC genes from Soybean were executed by MEGA 6.0 to construct the phylogenetic tree by the Neighbour-Joining (NJ) method with 1,000 bootstrap replicates. MEGA6.0 by the Neighbor-Joining (NJ) method with 1,000 bootstrap replicates. b Exon/intron structures of NAC genes from Soybean. Exons and introns are represented by blue, yellow boxes and black lines, respectively. c Schematic representation of the conserved motifs in the NAC proteins from Soybean elucidated by MEME. Each motif is represented by a colored box. The black lines represent the non-conserved sequences. Refer to Additional file 3 for the details of individual motif genes mentioned so far, even under non-stress conditions. Although associated with higher expression in more genes while under drought stress (24 out of 28 GmNAC genes to be exact), the Jindou varieties failed to match H228 and B217's expression for GmNAC067, GmNAC072, and GmNAC080 under normal conditions (Fig. 4).
Furthermore, the drought-linked GmNAC genes (GmNAC004, GmNAC021, GmNAC065, GmNAC066, GmNAC073, GmNAC082, GmNAC083 and GmNAC087) were found to be inducible regardless of drought sensitivity of cultivars, although in both drought and non-drought conditions, such genes showed greater transcript levels in the Jindou or drought-tolerant soybean varieties than that in drought-sensitive varieties. The same set of genes were also upregulated throughout different stages of leaf growth. The results suggest that these genes may play conserved roles in leaf growth and in plant responses to drought stresses.

Conclusion
Based on the data gathered throughout the course of the study, it may be said that drought tolerance is influenced by the expression of certain GmNAC genes, particularly GmNAC004, GmNAC021, GmNAC065, GmNAC066, GmNAC073, GmNAC082, GmNAC083 and GmNAC087. It is crucial to mention, however, that only three among the six have been previously known to be connected to drought. It may also be concluded that there is a correlation between gene expression, transcriptional regulation, and superior tolerance to drought, as seen from the different stages of leaf development across tolerant and sensitive cultivars. All in all, the study has revealed a number of potential gene candidates that could be given focus by those attempting to develop soybean cultivars with even greater drought tolerance.

Identification of soybean NAC genes
Putative NAC proteins in G. Max were identified by a BLAST search at Phytozome v 10.0 [63] using well known NAC proteins from Arabidopsis and rice as guidance for the criterion of the gene without NAC. The NAM domains were expelled from the analysis. The database of Arabidopsis and rice were extracted from the National Centre for Biotechnology Information [64]. The search was based upon NAC transcription factor as a key word. Approximately 400 protein sequences were Fig. 3 Differential expression data of 139 GmNAC genes in V6 and R2 leaves under drought stress. Genes shown are either up-regulated or down-regulated at least by two-fold; the colors indicate expression intensity (red, high expression; green, low expression; grey, no expression). 66 K Affymetrix Soybean Array GeneChip source: Dung Tien Le et al. [41] see Additional file 7 carried out. Upon removal of the putative, redundant, and unknown sequences, a total of 110 NACs from Arabidopsis and rice remained. All candidate GmNACs that contained at least one NAC domain (PF02365) were confirmed using SMART [65] and Pfam [66] online software. Lastly, 139 non-redundant and complete NACdomain containing protein sequences were selected for further analysis. The genes and corresponding Gene bank accession numbers for Arabidopsis and rice are listed in Additional file 2.

Phylogenetic analysis
The finalized 139 G. Max protein sequences that contained NAC-domain, 78 Arabidopsis NAC proteins, and 32 rice NAC proteins sequences in FASTA format were aligned for the construction of an un-rooted phylogenetic tree by the neighbor-joining method. The confidence level of the monophyletic groups was estimated using a bootstrap analysis of 1000 replicates. A multiple sequence alignment (MSA) and phylogenetic analysis were performed using quicktree [67]. Finally, the evolutionary tree is designed by MEGA6 or figtree. The NAC protein sequences of the phylogenetic tree are listed in Additional file 8.
We utilized the Gene structure display server program GSDS [68] in order to show the exon/intron structure of each GmNAC gene by comparison of the coding sequences with their corresponding genomic sequences from Phytozome.
A MEME 4.10.1 program utility [69] was used for the display of motifs and sequence logos of the GmNAC proteins from soybean [70]. The parameters were set as: Motif Count: Search for 20 motifs Occurrence of a single motif: 0 to 1 occurrence per sequence of a contributing motif site Motifs width range: between 6 and 50 wide -inclusive ) and drought (dr.) conditions. Error bars above means denote LSD value of three replicates. Different alphabetical letters (a, b, c…) above means show the significant differences (LSD < 0.05) among treatments of a cultivar at control and 8 days after drought. Error bars above means denote LSD value of three biological replicates All motifs discovered by MEME were annotated by adopting Interpro [71] and SMART [64]. The transmembrane domain into the NAC proteins was expected via the ARAMEMNON plant membrane protein database [72]. Motif information was listed in Additional file 3.

Plant materials, growth conditions and treatments
Four soybean varieties viz., (Jindou74 and 78) drought tolerant and (H228 and B217) drought sensitive were sown in each pot on 10 May 2015. Most of the seedlings emerged within 7 days after sowing. Plants of each pot received adequate watering regularly to maintain optimal soil moisture until the water stress treatment was imposed. Plants of all the genotype were subjected to two levels of water regime viz., ck: Non-stress (Control); water was applied when it is required and dr: Drought stress throughout the growing period; while the plant has two trifolium leaves, they were affected to drought stress by holding water. For Quantitative real-time RT-PCR, leaves of v2 growth phase were collected from control plant and plant with eight days of drought treatment as appearance of wilting symptom was clearly observed comparing with the control plant, and the results are shown at Additional file 6.

RNA isolation and cDNA synthesis
Pre-frozen leaves of control and stress samples were ground to a fine powder in liquid nitrogen; and total RNA was isolated using the TRIzol reagent (Invitrogen) according to the manufacturer's instructions. The obtained RNA concentration and purity were measured using a spectrophotometer (NanoDrop, ND-1000), and contaminating DNA in the total RNA was degraded using RNase-free DNase, according to the manufacturer's instructions. High-quality 2 μg RNA was utilized to obtain the first strand of cDNAs from each sample synthesis kit (M-MLV, China), according to the manufacturer's protocol. The quality of the cDNA and contamination with genomic DNA were examined using (NanoDrop, ND-1000) and a standard PCR assay with primers of the β-actin soybean gene.
Quantitative real-time PCR and statistical analyses cDNA was diluted 1:10 with ddH 2 O for (qRT-PCR) assay, Gene-specific primers were designed for the 28 GmNACs genes, according to their predicted CDS and gene structure using PrimerQuest IDT (http://www.idtdna.com/pri merquet/home/index/). Quantitative real-time RT-PCR (qPCR) was performed on a qPCRsoft V 3.0. Using NI -ROX DYE I according to the manufacturer's protocol the reaction mixture contained 10 μL of SYBR Green I Master Mix (Applied Biosystems), 0.8 μm of each primer and 2 μL diluted template cDNA. The qPCR assays were performed with three replicates. The amplification programs the following cycling condition: 95°C for 3 min, 45 cycles of 95°C for 10 s, 58°C for 30 s, and 72°C for 30 s. each sample had three replicates to ensure the accuracy of results, and reaction mixtures with β -actin of glyma templates were used as controls to evaluate the specificity of each real-time PCR. The relative gene expression levels were determined by the ΔΔCT method as described previously [73]. The genes ID with specific primers information were listed in Additional file 5.

Statistical analysis
The experiment was arranged in a completely randomized design (CRD) with three independent replicates. Each replicate was consisted of four pots. Analysis of Variance (ANOVA) was used for analyzing the data statistically. For the significant difference, a margin of 5% probability level was used utilizing Statistix 8.1 (Analytical Software, Tallahassee, FL, USA) software.

Additional files
Additional file 1: Additional file 4: Heat map representation for tissue-specific expression of 44 predicted stress-responsive and 6 previously reported dehydrationresponsive GmNAC genes. Expression patterns of 44 GmNAC genes were analysed using Illumina transcriptome data (source: [22]). Box A indicates a group of ubiquitously expressed GmNAC genes in the eight types of tissues examined. The colors indicate expression intensity (red, high expression; green, low expression; grey, no expression). (JPG 74 kb) Additional file 5: Table of