Identification and characterization of aquaporin genes in Arachis duranensis and Arachis ipaensis genomes, the diploid progenitors of peanut

Aquaporins (AQPs) facilitate transport of water and small solutes across cell membranes and play an important role in different physiological processes in plants. Despite their importance, limited data is available about AQP distribution and function in the economically important oilseed crop peanut, Arachis hypogea (AABB). The present study reports the identification and structural and expression analysis of the AQPs found in the diploid progenitor genomes of A. hypogea i.e. Arachis duranensis (AA) and Arachis ipaensis (BB). Genome-wide analysis revealed the presence of 32 and 36 AQPs in A. duranensis and A. ipaensis, respectively. Phylogenetic analysis showed similar numbers of AQPs clustered in five distinct subfamilies including the plasma membrane intrinsic proteins (PIPs), the tonoplast intrinsic proteins (TIPs), the nodulin 26-like intrinsic proteins (NIPs), the small basic intrinsic proteins (SIPs), and the uncharacterized intrinsic proteins (XIPs). A notable exception was the XIP subfamily where XIP1 group was observed only in A. ipaensis genome. Protein structure evaluation showed a hydrophilic aromatic/arginine (ar/R) selectivity filter (SF) in PIPs whereas other subfamilies mostly contained a hydrophobic ar/R SF. Both genomes contained one NIP2 with a GSGR SF indicating a conserved ability within the genus to uptake silicon. Analysis of RNA-seq data from A. hypogea revealed a similar expression pattern for the different AQP paralogs of AA and BB genomes. The TIP3s showed seed-specific expression while the NIP1s’ expression was confined to roots and root nodules. The identification and the phylogenetic analysis of AQPs in both Arachis species revealed the presence of all five sub-families of AQPs. Within the NIP subfamily, the presence of a NIP2 in both genomes supports a conserved ability to absorb Si within plants of the genus. The global expression profile of AQPs in A. hypogea revealed a similar pattern of AQP expression regardless of the subfamilies or the genomes. The tissue-specific expression of AQPs suggests an important role in the development and function of the respective organs. The AQPs identified in the present study will serve as a resource for further characterization and possible exploitation of AQPs to understand their physiological role in A. hypogea.


Background
Aquaporins (AQPs) are small (21-34 kD) integral membrane proteins, which form channels facilitating movement of water and other small solutes across the cell membrane. Aquaporins are conspicuously present across all kingdoms of life including plants where they co-ordinate water transport from the soil to different plant parts [1][2][3][4]. Based on sequence similarity and subcelluar localization, five subfamilies of AQPs have been identified in seed plants: the plasma membrane intrinsic proteins (PIPs), the tonoplast intrinsic proteins (TIPs), the nodulin26-like intrinsic proteins (NIPs), the small basic intrinsic proteins (SIPs) and the uncategorized intrinsic proteins (XIPs) [5][6][7][8]. Variation in the number of AQP subfamilies specific to different plant species has been reported. Among the five subfamilies, XIPs are absent entirely from monocots and dicots like Brassicaceae [9][10][11]. In primitive land plants, two additional unique classes of AQPs, GlpF-like intrinsic protein (GIPs) and hybrid intrinsic proteins (HIPs) have been described and are presumed to have been lost in the course of evolution [12]. Among AQPs, TIPs and PIPs are specifically located in vacuolar and plasma membranes, respectively. Being the most abundant in plants, TIPs and PIPs play a central role in mediating water transport across the plant system. The SIPs were the first to be unraveled via genome sequence analysis and are generally localized in the endoplasmic reticulum (ER). NIPs are homologous to GmNod26, an abundantly expressed transcript in the peribacteroid membrane of nitrogen-fixing nodules of soybean roots [13] and are mostly found in the plasma membrane.
The general AQP structure resembles an hourglass formed by six transmembrane (TM) α helices (H1 to H6) joined by five inter-helical loops (A to E). At the center of the pore formed by the six TM domains, two different constricts are formed: one that harbors conserved NPA (Asn-Pro-Ala) motifs, and another one known as aromatic/arginine (ar/R) selectivity filter (SF) formed with four amino acids in the channel. Among the four amino acids, one is located in each of helix 2 (H2) and helix 5 (H5), and two residues are located in loop E (LE1 and LE2). These two constrictions predominantly determine solute specificity and permeability within a given AQP [9,14,15].
The availability of whole genome sequences in cultivated crop plants has accelerated the genome-wide identification and analysis of the AQP-encoding genes [16]. The genome-wide characterization of AQPs has revealed important properties such as their distribution, evolution and conserved structural features involved in solute transport [16]. In this context, identification and characterization of AQP genes is the first step to decipher their presence and role in regulating transport of water and other physiologically important molecules. Translating this information to crop plants carries important implications with regards to breeding or engineering plants with improved water and nutrient uptake.
Plant AQPs exhibit abundant diversity in comparison with AQPs from bacteria and animals. This assists plants to overcome their disadvantage of immobility as they encounter varied environmental and climatic conditions. While initially AQPs were largely known as water channel proteins, they are now recognized to transport a plethora of small solutes like urea, H 2 O 2 , silicon, boron, ammonia and CO 2 [17]. The regulation of AQP genes in response to biotic and abiotic stresses has been reported in several crop plants [18][19][20]. Aquaporins also serve as key regulators modulating plant growth and development during various physiological and environmental states.
Arachis hypogea (L.), popularly known as peanut, is by far the most economically important species of the Arachis genus. It is an allotetraploid (2n = 4x = 40), thought to be derived from a single recent hybridization event between two wild ancestors, Arachis duranensis (AA) and Arachis ipaensis (BB) [21]. The crop is valued for the kernel, an important source of protein (28%), edible oil (42%), and numerous nutrients and minerals [22]. The production of the A. hypogea can be altered by different biotic and abiotic stresses causing significant yield losses annually. In recent years, weather fluctuations have caused severe water-deficit conditions threatening the sustainable production of A. hypogea. Drought causes tissue dehydration due to an imbalance between plant water uptake and transpiration [23]. These imbalances can be alleviated by AQPs, which play an important role in maintaining water balance and homeostasis under different environmental and stress conditions [24]. However, very little is known about the AQP distribution and function in A. hypogea (AABB), and how they could help efforts to develop more drought tolerant cultivars .
Recently the two progenitor genomes, A. duranensis and A. ipensis were sequenced to facilitate the study of the complete genome of cultivated A. hypogea [21]. In the present study, we took advantage of these available sequences to identify all AQPs in the diploid progenitor genomes of A. hypogea. Subsequently, we were able to characterize them according to their phylogenetic distribution, gene structure, conserved motifs and ar/R SF. Finally, we analyzed AQP expression in different tissues using available transcriptomic data from A. hypogea. This study brings novel and relevant information with regards to the many and specific functions AQPs play in A. hypogea and offers avenues to exploit this information to improve stress resistance in A. hypogea.

Genome-wide identification and distribution of AQPs in A. duranensis and A. ipaensis
The homology based search performed in the A. duranensis and A. ipaensis genomes revealed the presence of 32 and 36 AQPs, respectively. Subsequent identification of conserved domains also confirmed all the predicted AQPs (Additional file 1). Interestingly, based on the recent release of A. hypogea genome, we observed 73 AQPs distributed among different subfamilies (Additional file 2). HiddenMarkov model-based prediction of transmembrane helices showed the presence of six signature transmembrane domains in 23 out of 32 AQPs in A. duranensis and 23 out of 36 in A. ipaensis (Additional file 3). Tertiary protein structure analysis of the AQPs confirmed the typical hourglass-like structure formed with six TM domains for all proteins analysed except AipNIP1-3 (Additional file 4). The A. duranensis and A. ipaensis AQPs were found to be distributed among nine out of 10 chromosomes. In A. duranensis, the highest number (six) of AQPs were found on chromosome 3, 9 and 10 ( Table 1). Similarly, in A. ipaensis the highest number (10) of AQPs was found on chromosome 3, while five AQPs each were found on chromosome 9 and 10 ( Table 2).
Phylogenetic and gene structure analysis of AQPs in A. duranensis and A. ipaensis Phylogenetic analysis of AQP candidates from A. duranensis and A. ipaensis along with known AQPs from Arabidopsis thaliana and Glycine max formed five distinct clusters representing different classes of AQPs (Fig. 1). The AQP candidates were classified according to their respective cluster. In A. duranensis, AQPs were grouped into nine PIPs, 11 TIPs, eight NIPs, three SIPs and one XIP. For their part, AQP candidates from A. ipaensis were grouped into nine PIPs, 10 TIPs, 10 NIPs, three SIPs and four XIPs. The AduPIPs and AipPIPs formed two major sub-groups of PIP1s and PIP2s comprising five and four members, respectively. Likewise, the TIP family classified into five subclusters containing a different number of TIPs in each subcluster. NIPs formed three (NIP1, NIP2 and NIP3) and four (NIP1, NIP2 NIP3 and NIP4) groups in A. duranensis and A. ipaensis respectively. The SIPs from both species formed two groups, SIP1 and SIP2, containing two and one members, respectively. Among XIPs, no XIP1 was found and only a single member of XIP2 (AduXIP2-1) was observed in A. duranensis. In A. ipaensis, XIP1s had three members (AipXIP1-1, AipXIP1-2and AipXIP1-3) and XIP2s (AipXIP2-1) had one member.
Characterization of NPA motif, transmembrane domains and sub-cellular localization of A. duranensis and A. ipaensis AQPs Candidate Arachis AQPs were found to have differences in NPA motifs and residues at ar/R SF ( Table 3 and  Table 4). In both species, all PIPs and TIPs displayed two conserved NPA motifs. Among NIPs, NIP1s and NIP2s showed two conserved NPA domains, while NIP3s showed variation. Among NIP3s the variation included a substitution of alanine by serine or valine and substitution of asparagine by lysine. Additionally, all the members of the XIP and SIP subfamilies had varying NPA domains. All PIP subfamily members from both studied species displayed conservation at the ar/R SF residues (Table 3 and Table 4) with phenylalanine at H2, histidine at H5, threonine at LE1 and arginine at LE2 (Table 3 and Table 4). Most of the TIP subfamily members showed group specific conservation of ar/R SF. For instance, all TIP1s contained Histidine-Isoleucine-Alanine-Valine, while all TIP2s comprised of Histidine-Isoleucine-Glycine-Arginine in both the species. Similarly, NIPs also displayed subgroup specific conservation of ar/R SF except NIP3s, which displayed variation in the SF.
To characterize spatial expression of AQPs from both species, their subcellular localizations were predicted in silico (Additional file 5). In A. duranensis and A. ipeanensis, the majority of the PIP homologs were predicted to be localized in the plasma membrane but AduPIP1-4 that was predicted to be localized in the mitochondria. Expectedly, most TIP sub-family members were predicted to be in the vacuole. However, a few family members were predicted to be localized in the cytoplasm (AduTIP1-1, AipTIP3-1), the chloroplast (AduTIP1-2, AduTIP5-1, AipTIP1-3, AipTIP5-1), Mitochondria (AduTIP3-1) and the plasma membrane (AipTIP2-2). Most NIPs from both species were expected to be found in the plasma membrane. The SIP family members had candidates in different sites including the plasma membrane (AduSIP1-1, AipSIP1-1, AipSIP1-2), the chloroplast (AduSIP1-2) and the cytoplasm (AduSIP2-1, AipSIP2-1). The members of XIPs were found to be likely localized in the cytoplasm, chloroplast, or nucleus.

Aquaporin expression profiling across different tissues in A. hypogea
Analysis of RNA-seq data from A. hypogea showed expression of 27 out of 32, and 32 out of 36 identified AQPs from A. duranensis and A. ipeanensis, respectively. Similar patterns of expression for members of different AQP subfamilies from both AA and BB genomes were observed. Analysis of expression of AQPs at different developmental stages revealed a higher expression of PIPs followed by TIPs, SIPs, NIPs and XIPs (Fig. 3). In general, PIPs showed higher expressions across all tissues analyzed. Among TIPs, TIP2s (AduTIP2-2 and Aip-TIP2-3) showed higher expression in the roots, pistil and leaves. Higher expression of TIP3s (AduTIP3-1 and AipTIP3-1) was observed in all the five different developmental stages of seeds. Similarly, few NIPs (Adu-NIP1-2, AipNIP1-3) showed high to moderate expression in four out of five different developmental stages of seeds. The AduNIP1-4 and AipNIP1-5 showed strong expression in root nodules. Among XIPs, no expression of the unique XIP member specific to the AA genome (AduXIP2-1) was observed. However, the BB genome specific XIP members, AipXIP2-1 accumulated at higher levels in nodules of A. hypogea.

Discussion
In this study, we exploited the availability of the whole genome sequence of A. duranensis and A. ipeansis, the progenitors of cultivated A. hypogea, [21] to provide an exhaustive identification and characterization of A. hypogea AQPs as a way to facilitate a better understanding of their roles in the development of the plant. Although the full genome of A. hypogea has recently been made available, genome-wide study of AQPs in true diploid progenitor species provides some advantages over its analysis in a highly complex and polyploid genome like peanut. Indeed, it is more informative about the relative importance and function of each aquaporin in its respective diploid progenitor, and about the impact of genome polyploidization on AQP gene structure, function and dosage-dependence on gene expression pattern. The advent of next generation sequencing platforms has enabled the decoding of AQPs in many plant species [25][26][27] and has highlighted their many functions in metabolism regulation, namely in the case of biotic and abiotic stresses [28], information that can have many positive implications in developing new varieties better adapted to stress conditions. The number of AQPs identified in A. duranensis and A. ipaensis, 32 and 36, was found to be fairly proportional to their respective genome size of 1.25 and 1.56 Gb [21]. By comparison, many dicots such as Arabidopsis (35) [7], Phaseolus (41) [29], and pigeon pea (40) [30] bear similar numbers. On the other hand, some plant species such as canola which evolved with polyploidization contains as many as 120 AQPs [28]. However, notwithstanding this lower number of AQP candidates, the phylograms of both species showed homologs representing all five subfamilies (PIPs, TIPs, NIPs, SIPs and XIPs) as observed in most higher plant species. The presence of XIPs is particularly interesting since all monocots and dicots belonging to the Brassicaceae family are characterized by a complete absence [6,7,27]. The analysis of A. hypogea genome revealed the presence of 73 AQPs representing homologs of most of the AQPs identified from its progenitor genomes, A. duranensis (32) and A. ipaensis (36). The difference in the number of aquaporin genes in A. hypogea can be attributed to gene duplication and loss specific to different subfamilies of AQPs over the course of evolution. Similar observations of gene expansion have been reported in LEA and SWEET genes in Brassica napus (AACC) compared to its progenitors, Brassica rapa (AA) and Brassica oleracea (CC) [31,32]. The exon-intron structure observed in the two Arachis species was found conserved and does correlate well with their phylogenetic distribution. Since the exon-intron structure in AQP subfamilies was similar in both species, this indicates that a diversification of AQPs preceded the evolution of the genus Arachis. The similar gene structure also points to a conserved function of AQPs within the genus. The intron number was reported to be correlated to gene expression, duplication, and diversification in plants [33]. For instance, the high intron number variation in NIPs is correlated with their vulnerability to evolution.
The two conserved NPA motifs along with the four amino acids that form the ar/R SF largely determine solute specificity and transport of the substrate across Legend: AQPs [9,14,15]. Based on our analyses, the respective members of each AQP subfamily in both Arachis species showed conserved NPA motifs and similar ar/R SF. Indeed, all PIPs were found to harbor the characteristic double NPA motif and a hydrophilic ar/R SF (F/H/T/R) Small basic intrinsic proteins (SIPs) Plasma membrane intrinsic proteins (PIPs) Nodulin-26 like intrisic proteins (NIPs) as observed in AqpZ [34], confirming their affinity to transport water. The same filter was found conserved for PIP members from other plant species such as Zea mays [5], Solanum lycopersicum [35], A. thaliana [7], G. max [30], and Phaseolus vulgaris [29]. PIP members are known to regulate water transport in several plant species and play an instrumental role in maintaining root and leaf hydraulics [29]. PIPs have also been shown to regulate photosynthesis in A. thaliana, Hordeum vulgare and Nicotiana tabaccum by facilitating CO 2 diffusion in mesophyll tissues [29]. Therefore, the conserved features of PIPs suggest similar functions in Arachis, a conclusion reinforced by RNA-seq analyses that showed a higher expression of PIPs across different tissues analyzed. Among TIPs, NPA motifs were conserved and, TIP1s and TIP3s showed more hydrophobic residues than TIP2s, TIP4s and TIP5s. Generally, TIPs are located in vacuolar membranes and act as transporters of water and small solutes like ammonia, hydrogen peroxide , boron and urea [36][37][38][39]. The residues found in the ar/R SF in TIP subfamilies were similar to TIPs from other plant species pointing to a similar conserved role in Arachis species. A high accumulation of TIP3s was observed in seeds from different developing stages of A. hypogea, a phenomenon observed in A. thaliana [40,41], H. vulgare [42] and G. max [30] and reported as a role in seed desiccation processes. The TIP3s are involved in maturation of the vacuolar apparatus and allow optimal water uptake during embryo development and seed germination [43]. Recently, BvCOLD1, a boron transport TIP was found to be involved in cold tolerance in sugar beet [39]. Further studies on TIP regulation could help better understand why cold stress represents a major limitation for peanut cultivation. Interestingly, in a recent study, Devi et al. [44] suggested that putative TIPs and PIPs in A. hypogea played a role in regulating drought tolerance.

Small basic intrinsic proteins (SIPs)
Among the NIPs, NIP1s showed a selectivity filter with more hydrophobicity (WVAR) compared to NIP2s and NIP3s. In the present study, a single NIP2 gene containing a GSGR selectivity filter was observed in both Arachis species. NIP2s with a GSGR selectivity filter play a unique role in plants by allowing influx of silicon (Si) [9,30]. In turn, Si accumulation has been shown to protect plants against a wide variety of biotic and abiotic stresses [45]. Interestingly, the presence of functional NIP2s for Si permeability vary greatly among plant species and our results bring the first evidence that A. hypogea has the appropriate channel to benefit from Si fertilization. In general, NIPs will display lower expression than PIPs or TIPs, and our results confirmed this trend whereAduNIP1-4 and AipNIP1-5 were found specifically expressed in roots and root nodules. A similar specificity of NIP expression was observed in G. max [46] and Medicago truncatula [47]. Additionally, in M. truncatula, NIPs were found to be expressed from the early to late stage of nodule development, which indicates their importance in nodule organogenesis [47].
In the XIP subfamily, a single XIP2 member was found in A. duranensis, while members of both XIP1s and XIP2s were observed in A. ipaensis. This suggests that A. duranensis lost XIP1s during the course of evolution, an observation reported in many other species including all monocots, which raises the question of their importance or role in plants. When they are present, the hydrophobic nature of their selectivity filter is believed to facilitate the transport of hydrophobic and bulky molecules such as urea, glycerol, and boric acid in plants [48]. Interestingly, the expression data showed that the only A. ipaensis XIP member, AipXIP2-1, accumulated at higher levels in the nodules of A. hypogea supporting its involvement in nodule development. Several studies have established the role of AQPs in key developmental processes [49,50]. For instance, it has been reported that the increased abundance of TIPs facilitates the development of new lateral root primordia in A. thaliana [51]. Nevertheless, XIPs deserve further studies since their exact role and the consequences of their loss in some species, remain poorly understood.

Conclusions
Genome-wide analysis and characterization of the AQP gene family were performed in A. duranensis (AA) and A. ipaensis (BB) the probable progenitor genomes of A. hypogea (AABB). The identification and the phylogenetic analysis of AQPs in both species revealed the presence of all five sub-families of AQPs. Within the XIP subfamily, the loss of XIP1s from the AA genome was observed, while the presence of a NIP2 in both genomes support a conserved ability to absorb Si within plants of the genus. The global expression profile of AQPs in A. hypogea through RNA-seq data analysis revealed a similar pattern of expression of AQPs regardless of the subfamilies or the genomes. A higher expression of TIP3s was observed in different stages of seed development in A. hypogea supporting a critical physiological role of TIP3s in seed development. The high accumulation of the BB genome specific-AipXIP2-1 in nodules of A. hypogea suggests a novel role in nodule development for the elusive XIPs. The AQPs identified in the present study will serve as a resource for further characterization and possible exploitation of AQPs to understand their physiological role in A. hypogea.

Genome-wide identification and distribution of AQPs in A. duranensis and A. ipaensis
The genome sequences of A. duranensis A. ipaensis and A. hypogea were retrieved from the PeanutBase (https:// peanutbase.org/) [52]. Predicted protein sequences were used to create a local database using BioEdit ver. 7.2.5 [53]. Homologs of the AQPs coding genes were identified by BLASTp search performed against the local database using AQPs from A. thaliana, Oryza sativa and G. max (Additional file 6). An e-value of 10 − 5 was kept as an initial cut-off to identify high scoring pairs (HSPs). The blast output was tabulated, and the HSPs having greater than 100-bit score were selected. Finally, redundant hits were removed to select unique sequences for further analysis.

Structural characterization of A. duranensis and A. ipaensis AQPs
The genomic and cDNA sequences of AQPs identified in A. duranensis and A.ipaensis were retrieved from Pea-nutBase. Structural annotations of the gene models (in gff3 format) were also retrieved from PeanutBase. The gene structure of AQPs was analyzed using GSDS ver. 2.0 [54].

Identification of functional motif and transmembrane domains and estimation of isoelectric point (pI) for AQPs
The NPA motifs were identified in predicted protein sequences using conserved domain database (CDD, www. ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) [55]. Missing NPA motifs in few AQP sequences were confirmed with a manual examination. Transmembrane domains in the genes were identified using TMHMM (http://www.cbs. dtu.dk/services/TMHMM/) [56] and SOSUI online software tools [57]. The transmembrane domains were analyzed manually to confirm alterations or complete loss. The isoelectric point (pI) of AQP protein sequences were calculated using the online tool Sequence Manipulation Suite version 2 (http://www.bioinformatics.org/ sms2/protein_iep.html) [58].

Phylogenetic analysis of AQPs in A. duranensis and A. ipaensis
The predicted AQP protein sequences were aligned using CLUSTALW alignment tool in MEGA6 [59]. The phylogenetic tree was constructed by the maximum likelihood method, and the stability of the branch node was analysed by performing 1000 bootstraps. The AQP subfamilies, PIP, SIP, TIP, NIP, and XIP, were classified according to the nomenclature used for A. thaliana, and G. max [6,9].
Expression profiling of A. hypogea AQPs RNA-seq data derived from 22 different tissues of cultivated peanut available in PeanutBase (Genbank BioProject PRJNA291488) were used for expression analysis. The transcriptome assembly and expression value estimation were done as described in Clevenger et al. [62]. Briefly, de novo assembly was carried out by a genome-guided approach using assembly pipeline from Trinity [63]. Total reads were mapped to the transcript assembly from 58 libraries using Bowtie [64], allowing two mismatches within a particular 25 bp seed. Uniquely mapped raw read counts per gene were normalized using the formula of Reads Per Kilobase of transcript per Million mapped reads (RPKM) = Number of Reads / (Gene Length/1000 * Total Number of Reads/1,000,000). The RPKM values for AQPs were extracted and used for heat map preparation. A heat map was constructed using TIGR Multi Experiment Viewer (MeV,http://mev. tm4.org). Hierarchical clustering with average linkage method was performed to cluster the AQPs.