Investigation of Aquaporins Involved in the Adaptation of Canavalia rosea to Saline-alkaline Soils and Drought Stress

Background: Canavalia rosea (Sw.) DC. (bay bean) is an extremophile halophyte that is widely distributed in coastal areas of the tropics and subtropics. Seawater and drought tolerance in this species may be facilitated by aquaporins (AQPs), channel proteins that transport water and small molecules across cell membranes and thereby maintain cellular water homeostasis in the face of abiotic stress. In C. rosea, AQP diversity, protein features, and biological functions are still largely unknown. Results: We describe the action of AQPs in C. rosea using evolutionary analyses coupled with promoter and expression analyses. A total of 37 AQPs were identified in the C. rosea genome and classified into five subgroups: 11 plasma membrane intrinsic proteins, 10 tonoplast intrinsic proteins, 11 Nod26-like intrinsic proteins, 4 small and basic intrinsic proteins, and 1 X-intrinsic protein. Analysis of RNA-Seq data and targeted qPCR revealed organ-specific expression of aquaporin genes and the involvement of some AQP members in the adaptation of C. rosea to extreme coral reef environments. We also analyzed C. rosea sequences for phylogeny reconstruction, protein modeling, cellular localization, and promoter analysis. Furthermore, one PIP1 gene, CrPIP1;5, was functionally characterized using a yeast expression system and transgenic overexpression in Arabidopsis. Conclusions: Our results indicate that AQPs play an important role in C. rosea responses to saline-alkaline soils and drought stress. These findings not only increase our understanding of the role AQPs play in mediating C. rosea adaptation to extreme environments, but also improve our knowledge of plant aquaporin evolution more generally.

The full-length cDNAs of CrPIP1;5 (GenBank accession number MT787665) and CrPIP2;3 (GenBank accession number MT787666) were isolated from the cDNA library of C. rosea seedlings, in which all cDNAs were inserted into a yeast expression vector (pYES-DEST52) using Gateway® techniques (Life Technologies). The recombinant plasmids containing the CrPIP1;5 and CrPIP2;3 cDNAs were designated CrPIP1;5-pYES-DEST52 and CrPIP2;3-pYES-DEST52 and used as template DNA in the following PCR assays. The open reading frame (ORF) regions of CrPIP1;5 and CrPIP2;3 were PCR-amplified using the primer pairs PIP1-5BDF/PIP1-5BDR and PIP2-3BDF/PIP2-3BDR, respectively (Table S2), and then inserted into the pGBKT7 vector using In-Fusion® techniques (In-Fusion® HD Cloning System, Clontech) to construct the pGBKT7-CrPIP1;5 and pGBKT7-CrPIP2;3 bait plasmids. The prey plasmids, pGADT7-CrPIP1;5 and pGADT7-CrPIP2;3, were generated by cloning the CrPIP1;5 and CrPIP2;3 ORFs into the pGADT7 vector after amplification with the primer pairs PIP1-5ADF/PIP1-5ADR and PIP2-3ADF/PIP2-3ADR, respectively (Table S2). The constructed activation domain (AD) and binding domain (BD) vectors were co-transformed in pairs into competent AH109 yeast cells, and transformants were plated on SD/-Leu-Trp and SD/-Leu-Trp-His media to test protein interactions. Transformant spots on SD/-Leu-Trp-His medium were also supplemented with 40 µg/mL 5-bromo-4-chloro-3-indoxyl-α-D-galactopyranoside (X-α-Gal, 2 µL per spot) to further confirm interactions of the different co-transformants. Each experiment was independently repeated three times.
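The pairwise co-transformation design described above can be sketched in a few lines of Python. This is only an illustration: the empty pGBKT7 and pGADT7 vectors are included as assumed negative controls for self-activation, and the exact set of control pairs used in the assays is not stated in this section.

```python
from itertools import product

# Bait (BD) and prey (AD) constructs named in the text.
# ASSUMPTION: the empty vectors are added here as negative controls.
baits = {"pGBKT7": None, "pGBKT7-CrPIP1;5": "CrPIP1;5", "pGBKT7-CrPIP2;3": "CrPIP2;3"}
preys = {"pGADT7": None, "pGADT7-CrPIP1;5": "CrPIP1;5", "pGADT7-CrPIP2;3": "CrPIP2;3"}

def classify_pair(bait_insert, prey_insert):
    """Label a co-transformation by what it is able to test."""
    if bait_insert is None or prey_insert is None:
        return "control"
    if bait_insert == prey_insert:
        return "homo-interaction"
    return "hetero-interaction"

# Every bait/prey combination with its label.
design = {
    (bait, prey): classify_pair(baits[bait], preys[prey])
    for bait, prey in product(baits, preys)
}

for (bait, prey), label in design.items():
    print(f"{bait} + {prey}: {label}")
```

Under these assumptions the design has nine co-transformations: five controls, two tests of self-interaction, and two reciprocal tests of the CrPIP1;5-CrPIP2;3 interaction.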

In addition to water transport, AQPs facilitate the transport of small molecules such as urea, H2O2, and NO, and elements such as boron and silicon across cell membranes [6]. Aquaporins are found in a wide variety of taxa, including microbes, animals, and plants, and are the oldest family of major intrinsic proteins (MIPs).
Aquaporins have traditionally been classified into four major subfamilies: plasma membrane intrinsic proteins (PIPs), tonoplast intrinsic proteins (TIPs), Nod26-like intrinsic proteins (NIPs), and small and basic intrinsic proteins (SIPs) [7]. Additionally, in some plant genomes, a small number of AQPs have been identified as a fifth subfamily called X-intrinsic proteins (XIPs), which are absent from model plants and crops such as Arabidopsis, rice, and maize [8]. Furthermore, GlpF-like intrinsic proteins (GIPs) isolated from a moss (Physcomitrella patens) and hybrid intrinsic proteins (HIPs) found in a fern (Selaginella moellendorffii) and a moss (P. patens), which are rare in most plants, are both classified into the AQP family [9,10].
Structurally, almost all AQPs consist of six transmembrane domains (α-helices, H1 to H6) with the N and C termini facing the cytosol [11]. The six transmembrane domains are joined by five interhelical loops (A-E). The conserved loops (B and E) are extremely hydrophobic and often contain internal repeats of asparagine-proline-alanine residues (NPA motifs). These conserved, hydrophobic loops form short helices and seem to be the most important features maintaining AQP function [12,13]. Aromatic/arginine (ar/R) regions and Froger's positions are also conserved in most AQPs [14]. Generally, AQPs are inserted into membranes as tetramers comprising four independent pores, each formed by one AQP monomer [15].
Besides being water channel proteins, some AQPs are also involved in facilitating the transport of CO2 [16], NO [17], glycerol [18], H2O2 [19], some trivalent elements [20], and a wide range of small uncharged solutes [21]. It is clear that AQPs have versatile functions in water uptake, nutrient balancing, long-distance signal transfer, nutrient/heavy metal acquisition in plant development, and stress responses [22].
Unlike the AQP members in yeast (only two genes, AQY1 and AQY2) [23] or animals (only 13 AQPs in mammals) [24], plant AQPs comprise large, highly diverse gene families, which may be linked to plants' greater need to adapt to local conditions given their sessile nature [11,25]. Many AQP gene families have been identified using cDNA and whole-genome analyses in a wide variety of plant species, including Arabidopsis (35 members) [26], maize (31 members) [27], and rice (34 members) [28]. Given advances in whole-genome sequencing, AQP-related research has recently gained traction in studies of plant adaptation, especially for halophytes and drought-tolerant plants. As a close relative of Arabidopsis, Eutrema salsugineum has been considered a model extremophile for identifying mechanisms of salt tolerance. The AQP family in E. salsugineum has been characterized by assessing differential gene expression patterns, with research mostly focused on responses to salt, cold, and drought stress [29,30]. Chickpea (Cicer arietinum) has better drought tolerance than most leguminous species, and its AQP gene family has been characterized to further investigate its adaptability to water deficit [31,32]. Furthermore, the AQP gene family of cassava (Manihot esculenta), a drought-tolerant tuber crop that is an important food resource in many African countries, has been characterized in terms of its evolution, structure, and expression patterns [33]. Canavalia rosea is more tolerant of drought, high salinity, heat, and low nitrogen and phosphorus than most leguminous plants. It is therefore of particular interest to identify the complete set of AQPs within C. rosea (CrAQPs) and to perform comparative analyses to understand their evolutionary relationships, particularly regarding the adaptability of this species to coastal and coral reef habitats.
In our study, the availability of whole-genome sequence data for C. rosea facilitated genome-wide analysis to identify the evolutionary relationships between C. rosea AQPs and those of related leguminous species. We characterized the structure of CrAQPs and their chromosomal locations. We also investigated the expression profiles of CrAQP genes in various tissues, in response to different abiotic stressors, and in different habitats, along with promoter analyses. Additionally, a single plasma membrane intrinsic protein gene, CrPIP1;5, was functionally characterized using heterologous transgenic assays.

Results
Identification of the C. rosea AQP family
Based on protein BLAST and Hidden Markov model profile (Pfam ID: PF00230) searches, a total of 37 CrAQP members were identified and annotated in the C. rosea genome database. The set of CrAQPs includes 11 NIPs, 11 PIPs, 10 TIPs, 4 SIPs, and 1 XIP (Table 1), which were named according to their phylogenetic and sequence identity relationships with AtAQP and GmAQP proteins (Table 1). Based on multiple alignments, a neighbor-joining phylogenetic tree was constructed with the amino acid sequences of AQPs from C. rosea, Arabidopsis, and soybean (Fig. 1). The clustering results clearly showed that only one sequence in C. rosea encodes an XIP protein. In addition, the SIP subfamily in C. rosea (CrSIP) formed a smaller but more conserved cluster than the other three subfamilies (CrNIP, CrPIP, and CrTIP). We also compared the number of AQP genes in C. rosea with those in other plant genomes (Table 2). The numbers of AQP genes in all of these typical diploid species were similar. The soybean (G. max) genome contains 72 AQP genes, which might be due to a whole-genome duplication event in the distant past [34]. The length of the CrAQP proteins ranged from 155 aa (CrSIP1;3) to 709 aa (CrNIP4;1), while most were between 230 and 320 aa. The predicted molecular weights and isoelectric points of the CrAQPs ranged from 17.13 kDa to 78.97 kDa and from 4.8 to 9.92, respectively (Tables 1 and 3). Thirty-four of the 37 CrAQPs included six transmembrane domains, and the remaining three members (CrNIP4;1, CrSIP1;3, and CrXIP1;1) possessed seven, three, and five transmembrane domains, respectively (Table 3). The transmembrane regions identified in the CrAQPs are shown in Figure S1.
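As a quick consistency check on the figures above, the following Python sketch tallies the subfamily counts and estimates molecular weight from sequence length. It assumes an average residue mass of about 110 Da, which is a common rule of thumb and not the method used to produce Table 1; the function name is hypothetical.

```python
# ASSUMPTION: ~110 Da per residue is a rule-of-thumb average, used only
# to sanity-check the reported range; it is not the Table 1 method.
AVG_RESIDUE_MASS_DA = 110.0

def approx_mass_kda(length_aa: int) -> float:
    """Approximate protein mass in kDa from its length in residues."""
    return length_aa * AVG_RESIDUE_MASS_DA / 1000.0

# Subfamily counts as reported in the text.
subfamily_counts = {"NIP": 11, "PIP": 11, "TIP": 10, "SIP": 4, "XIP": 1}
total_members = sum(subfamily_counts.values())

# Shortest (CrSIP1;3) and longest (CrNIP4;1) proteins reported in the text.
shortest_kda = approx_mass_kda(155)
longest_kda = approx_mass_kda(709)

print(f"total CrAQPs: {total_members}")
print(f"estimated mass range: {shortest_kda:.2f}-{longest_kda:.2f} kDa")
```

The estimates (about 17 kDa and 78 kDa) fall within 1 kDa of the predicted values of 17.13 kDa and 78.97 kDa, so the reported lengths and molecular weights are mutually consistent.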

Features of AQP proteins
The phosphorylation state of AQP proteins is a key factor regulating the transport of water and other small molecules. In this study, we predicted the possible phosphorylation sites of CrAQPs. In brief, all CrAQPs except for CrXIP1;1 contained all three types of phosphorylation site (Ser, Thr, and Tyr; Table 1). We also predicted the subcellular localization of CrAQPs. The two programs used (WoLF_PSORT and Plant-mPLoc) gave similar results: most CrAQPs were predicted to be located in the plasma membrane, although some were predicted in vacuoles, plastids, and the endoplasmic reticulum (Table 1). The predicted subcellular localizations of CrAQPs were diverse and broad, suggesting that in vivo compartmentation allows each member to regulate the transport of water and/or solutes across the plasma membrane and intracellular membrane systems, thereby exercising unique biological functions.
The NPA motifs, ar/R filter, and Froger's positions of AQPs are critical for their substrate selectivity. A multiple alignment between CrAQPs and other plant AQPs was performed, and the conserved NPA motifs and the amino acids at the ar/R filter and Froger's positions are listed in Table 3 and Figure S1. Except for CrPIP2;1, the other 36 CrAQPs all contained two NPA motifs, one in loop B (LB) and one in loop E (LE), and most of these motifs were conserved. However, some CrAQPs, such as CrTIP4;1 and the four CrSIPs, displayed a variable third residue in the LB NPA motif, in which the A residue was replaced by S/T/L. In addition, the CrXIP1;1 protein had variable first and third residues in the LB NPA motif (SPV). The LE NPA motif was more conserved, and only CrNIP1;3, CrNIP3;1, CrNIP3;2, and CrNIP3;3 showed substitutions of A by V. In CrSIP1;3, the LE NPA motif degenerated into NLG, so this protein showed greater divergence in the residues of the two NPA motifs than the other CrAQPs (Table 3). The spacing between the two conserved NPA motifs varied from 79 to 127 aa, and most values were between 108 and 119 aa (Table 3). At the ar/R selectivity filter and Froger's positions, the CrAQPs displayed more differences than in the NPA motifs (Table 3). These variations may determine the substrate specificity of CrAQPs.
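The motif variants and spacing described above can be illustrated with a minimal Python sketch. It is an assumption-laden toy: the pattern only covers the variants named in the text, the sequence is synthetic, and a plain pattern scan would produce spurious hits on real proteins, where motifs are located from the multiple alignment instead.

```python
import re

# ASSUMPTION: only the variants named in the text are matched
# (NPA, NPS/NPT/NPL, NPV, SPV, NLG).
NPA_LIKE = re.compile(r"NP[ASTLV]|SPV|NLG")

def find_npa_like(seq: str):
    """Return (0-based start, motif) for each NPA-like hit in seq."""
    return [(m.start(), m.group()) for m in NPA_LIKE.finditer(seq)]

def npa_spacing(seq: str):
    """Residues between the first two NPA-like motifs, or None if fewer than two."""
    hits = find_npa_like(seq)
    if len(hits) < 2:
        return None
    first_start, second_start = hits[0][0], hits[1][0]
    return second_start - (first_start + 3)  # each motif is 3 residues long

# Synthetic sequence (hypothetical, not a CrAQP): two NPA motifs
# separated by 110 filler residues.
toy_seq = "MK" + "G" * 10 + "NPA" + "V" * 110 + "NPA" + "L" * 5

hits = find_npa_like(toy_seq)
spacing = npa_spacing(toy_seq)
print(hits, spacing)
```

Here the spacing is counted as the number of residues strictly between the two motifs; whether Table 3 uses this convention is not stated in this section.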

Chromosomal locations and evolutionary characterization of CrAQPs
To investigate the evolutionary relationships among CrAQP genes, chromosome maps were constructed (Fig. 2a). There are eleven chromosomes in the C. rosea genome, and CrAQP genes were found on all except chromosome 5. On the other ten chromosomes, the CrAQPs were unevenly distributed: chromosome 3 had seven CrAQP genes, chromosome 8 had six, chromosome 2 had five, chromosome 4 had four, chromosomes 1, 6, and 9 had three each, and chromosomes 7, 10, and 11 had two each.
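The distribution reported above can be tallied in a short Python sketch to confirm that the per-chromosome counts account for all 37 genes. The dictionary simply transcribes the numbers in the text; the variable names are illustrative.

```python
# Number of CrAQP genes per chromosome, transcribed from the text.
genes_per_chromosome = {
    1: 3, 2: 5, 3: 7, 4: 4, 5: 0, 6: 3,
    7: 2, 8: 6, 9: 3, 10: 2, 11: 2,
}

total_genes = sum(genes_per_chromosome.values())
empty_chromosomes = [c for c, n in genes_per_chromosome.items() if n == 0]
richest = max(genes_per_chromosome, key=genes_per_chromosome.get)

print(f"total: {total_genes}, without CrAQPs: {empty_chromosomes}, most genes: chr{richest}")
```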
Gene duplication events among CrAQPs were also investigated. A total of eighteen and four CrAQP genes were found to be segmentally and tandemly duplicated, respectively (Table 4). The distribution of segmentally duplicated CrAQPs across the C. rosea chromosomes is shown in Fig. 2b. The selection pressure acting on CrAQP genes was inferred from the ratio of non-synonymous (Ka) to synonymous (Ks) substitution values.
Our data indicate that all duplicated CrAQP genes were under selective constraint, with an average Ka/Ks ratio of 0.1523. All Ka/Ks ratios were well below one (range: 0.0989-0.2738) (Table 4). These results suggest that CrAQPs experienced strong purifying selection pressure with limited functional divergence after duplication.
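The interpretation of Ka/Ks used above follows the usual convention (a ratio below one suggests purifying selection, near one neutral evolution, and above one positive selection). A minimal Python sketch of that rule is given below; the tolerance band around one is an arbitrary assumption for illustration, and the function name is hypothetical.

```python
def classify_selection(ka_ks: float, tolerance: float = 0.05) -> str:
    """Interpret a Ka/Ks ratio under the conventional rule of thumb.

    ASSUMPTION: the tolerance band around 1.0 is arbitrary; formal
    tests of selection require a statistical model, not a cutoff.
    """
    if ka_ks < 1.0 - tolerance:
        return "purifying"
    if ka_ks > 1.0 + tolerance:
        return "positive"
    return "near-neutral"

# Summary values reported in the text for duplicated CrAQP pairs.
reported = {"minimum": 0.0989, "mean": 0.1523, "maximum": 0.2738}
labels = {name: classify_selection(value) for name, value in reported.items()}

for name, label in labels.items():
    print(f"{name} Ka/Ks = {reported[name]}: {label}")
```

Even the largest reported ratio (0.2738) is far from the neutral expectation of one, which is why the whole range is read as purifying selection.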

Gene structures and protein motif compositions
Gene structure analyses performed using the GSDS tool revealed relatively large variation in the number and length of introns and exons, which resulted in variation in CrAQP gene length (720-14,816 bp) across the five CrAQP subfamilies (Fig. 3a and b). The number of introns ranged from zero (CrSIP1;2) to eleven (CrNIP4;1). Most CrNIPs and CrPIPs possessed three to four introns, and most CrTIPs had two introns, except for CrTIP4;1, which had three. Three of the four CrSIPs had two introns, while CrSIP1;2 was intronless. The sole XIP gene, CrXIP1;1, also had two introns.
In general, the motif compositions were similar within each CrAQP protein subfamily (Fig. 3c).

Cis-acting regulatory elements
Although the sequence specificity of CrAQP proteins provides the functional diversity necessary for maintaining water balance and mediating transmembrane transport of small neutral molecules, the regulation of CrAQP expression remains a key mediator of CrAQP function, especially in stress responses and in plant growth and development. Cis-acting regulatory elements are nucleotide motifs that bind specific transcription factors, thereby regulating transcription in plants. In this study, we identified putative cis-acting elements in the promoter regions of all CrAQPs by scanning them with the online PlantCARE program.
The promoter analyses of all 37 CrAQPs identified 68 putative cis-acting elements, including 25 light-responsive elements, 4 ABA-responsive elements, 3 gibberellin-responsive elements, 2 MeJA-responsive elements, 2 auxin-responsive elements, 1 ethylene-responsive element, 22 abiotic or biotic stress-related responsive elements, and 18 development-related responsive elements (Table S3). We grouped these elements into 12 categories: light-responsive elements, gibberellin-responsive elements, MeJA-responsive elements, auxin-responsive elements, salicylic acid-responsive elements, ABRE, ERE, MYC, MYB, MBS, TC-rich repeats, and LTR. The numbers of these elements in each CrAQP promoter region are summarized in Fig. 4a. In addition, because PIPs play important roles in maintaining water balance in plant cells, we summarized the abiotic stress-related cis-acting elements (including ABRE, ERE, MYB, MBS, TC-rich repeats, and MYC) within the 11 CrPIP promoter regions (Fig. 4b). The categories and numbers of these elements suggest that the mechanisms regulating CrPIP expression are involved in stress responses. However, further functional studies are warranted to confirm the functions of these cis-acting elements in CrAQP promoters.
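Counting cis-acting elements in a promoter amounts to searching both DNA strands for short motifs. The Python sketch below illustrates the idea only; it is not PlantCARE's procedure. The two motif strings are illustrative stand-ins for an ABRE core and a MYC site, and the promoter sequence is synthetic.

```python
import re

# ASSUMPTION: illustrative motif strings, not PlantCARE's full motif set.
MOTIFS = {"ABRE": "ACGTG", "MYC": "CATGTG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]

def count_overlapping(motif: str, dna: str) -> int:
    """Count motif occurrences, allowing overlaps, via a lookahead."""
    return len(re.findall(f"(?={re.escape(motif)})", dna))

def count_motifs(promoter: str) -> dict:
    """Count each motif on both strands of the promoter."""
    counts = {}
    for name, motif in MOTIFS.items():
        forward = count_overlapping(motif, promoter)
        rc = reverse_complement(motif)
        # Skip the reverse scan for palindromic motifs to avoid double counting.
        reverse = 0 if rc == motif else count_overlapping(rc, promoter)
        counts[name] = forward + reverse
    return counts

# Synthetic promoter fragment (hypothetical, not a CrAQP promoter).
toy_promoter = "TTACGTGGCATTCATGTGAACACGTAA"
element_counts = count_motifs(toy_promoter)
print(element_counts)
```

In this toy fragment the ABRE-like motif is found once on each strand and the MYC-like motif once on the forward strand.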

Expression profiles of CrAQPs in different tissues and plants residing in different habitats
Tissue- and habitat-specific expression profiles of CrAQPs were assessed by examining Illumina RNA-Seq data representing seven tissue types: roots, vines, young leaves, flowering buds, and young fruits gathered from SCBG, and two mature leaf samples gathered from SCBG and YX island, respectively. Expression of all CrAQPs was detected in at least one of the examined tissues, although transcript levels varied. Overall, the CrPIP members had relatively high expression in all tissues. The CrPIP and CrTIP subfamilies also produced abundant transcripts in most examined tissues (Fig. 5). Young flowering buds and young fruits tended to have high levels of CrAQP expression across the whole family (Fig. 5a). We also compared the expression levels of CrAQPs between YX island and SCBG, and expression levels were higher in the YX sample than in the SCBG sample for most CrAQP members, particularly the CrPIP members (Fig. 5b). These results suggest that CrAQPs might play diverse roles in the growth and development of C. rosea and in this extremophile halophyte's adaptation to coral reef habitats.
Expression profiles of CrPIPs in response to different stressors and the ABA treatment
We performed a gene expression analysis on different C. rosea tissues to examine the expression patterns of CrPIP genes under various abiotic stress conditions and an ABA hormone treatment. The purpose of these treatments was to mimic reef and coastal adversity as closely as possible. We performed qRT-PCR to measure the transcript levels of the genes in this subfamily. As shown in Fig. 6, expression of all CrPIPs was affected by the stressors and hormone application. We also found several CrPIP members that showed relatively stable expression patterns, even under the various stressors. These genes included CrPIP1;5, CrPIP2;2, CrPIP2;3, and CrPIP2;5 (Fig. 6). Combining these results with the RNA-Seq data (Fig. 5), it is evident that these genes maintained higher expression levels than the other CrAQP genes across different tissues and habitats, suggesting that they may be involved in maintaining basic water homeostasis during C. rosea growth and development. Under high salt stress, CrPIP1;2 was induced in roots, vines, and leaves, while CrPIP1;1, CrPIP1;3, CrPIP1;4, CrPIP2;1, and CrPIP2;6 showed elevated expression in vines and leaves but were downregulated in roots. In general, alkaline stress had a smaller effect on the expression of CrPIPs. The genes CrPIP1;2, CrPIP1;3, CrPIP1;4, CrPIP1;5, CrPIP2;1, CrPIP2;4, CrPIP2;5, and CrPIP2;6 were downregulated in roots, while CrPIP1;1, CrPIP1;2, CrPIP2;4, and CrPIP2;5 were slightly upregulated in aerial tissues. High osmotic stress increased the expression of CrPIP1;1, CrPIP2;4, and CrPIP2;6, and the ABA treatment increased the expression of CrPIP1;2, CrPIP1;4, CrPIP2;1, and CrPIP2;6 (Fig. 6). These results indicate that these genes play a role in multiple abiotic stress responses and in the ABA signaling response in C. rosea.
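Relative transcript levels from qRT-PCR are commonly computed with the 2^-ΔΔCt method. The Python sketch below illustrates that calculation under the assumption that this approach applies here; the quantification method is not stated in this section, and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change by the 2^-ddCt method.

    ASSUMPTION: amplification efficiency is ~100% for both the target
    and the reference gene, as the method requires.
    """
    delta_treated = ct_target_treated - ct_ref_treated   # normalize to reference
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control          # compare with control
    return 2.0 ** (-delta_delta)

# Hypothetical Ct values (not measured data): the target crosses the
# threshold two cycles earlier under stress, with an unchanged reference.
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
print(f"fold change: {fold_change}")
```

A fold change above one indicates induction and below one indicates repression relative to the untreated control.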
Interactions between CrPIP1;5 and CrPIP2;3
A previous study indicated that plant PIP1 and PIP2 members can associate in heterodimers and tetramers [35]. Therefore, in this study we analyzed two PIP members, CrPIP1;5 and CrPIP2;3, to test whether CrAQPs can form homodimers or heterodimers. To explore CrPIP1;5-CrPIP2;3 interactions, a series of DNA constructs was prepared for a yeast two-hybrid assay (Fig. 7a). BD and AD vectors were co-transformed into yeast strain AH109. Neither CrPIP1;5 nor CrPIP2;3 self-activated, but both could form homodimers through direct interactions with themselves (Fig. 7b). Furthermore, CrPIP1;5 and CrPIP2;3 could interact with each other (Fig. 7c). Together, these results indicate that, at least in yeast cells, these two CrPIP members can interact with themselves and with each other, which may allow them to form active pores for water and small molecule transport across membranes.
Abiotic stress tolerance of yeast and Arabidopsis heterologously expressing CrPIP1;5
We performed a functional analysis of CrPIP1;5 using a yeast expression system with a CrPIP1;5-pYES-DEST52 recombinant vector (Fig. 8a). As seen in Fig. 8b, W303 transformed with either CrPIP1;5 or pYES2 developed normally and did not differ in growth rate on the SDG control plate. However, with the addition of PEG8000 or sorbitol, W303 transformed with CrPIP1;5 showed an obvious growth lag compared to yeast containing the pYES2 control. When NaCl was added to the SDG medium, the W303 yeast containing CrPIP1;5 showed better growth than the control (Fig. 8b). We also checked H2O2 transport activity using the yeast expression system. CrPIP1;5 resulted in increased H2O2 sensitivity of yeast and lower growth rates, while both the BY4741 strain and the H2O2-sensitive mutant strain skn7Δ showed growth performance similar to that on the SDG control plate (Fig. 8c). These results indicate that, at least in yeast cells, CrPIP1;5 is an active H2O and H2O2 transporter.
To further assess the effects of CrPIP1;5, we generated transgenic Arabidopsis plants that ectopically expressed CrPIP1;5 under the control of the 35S promoter. The plants were confirmed as transgenic using genomic PCR, RT-PCR, and qRT-PCR (Figure S2). Plants from three homozygous T3 lines (OX 1#, OX 5#, and OX 10#) were subjected to salt, salt-alkaline, high osmotic, and drought tolerance tests. In seed germination assays, CrPIP1;5 OX lines did not have better germination rates on salt, salt-alkaline, and high osmotic MS plates (Figure S3), and in seedling root length assays, CrPIP1;5 OX lines grew slightly less on the salt and salt-alkaline MS plates than on the control plates (Figure S4).
WT and CrPIP1;5 OX plants were grown from seed in well-watered conditions for 30 days, and prior to the salt, drought, and alkaline stress treatments, the growth rates of adult plants (WT and the three CrPIP1;5 OX lines) were relatively consistent. There was no difference in tolerance between WT and transgenic plants (OX 1#, OX 5#, and OX 10#) under salt (200 mM NaCl) and salt-alkaline (100 mM NaHCO3, pH 8.2) stress (results not shown). However, CrPIP1;5 overexpression resulted in a weak increase in sensitivity to drought (Fig. 9a). After 10 days of water withdrawal, all plants, both WT and the three CrPIP1;5 OX lines, had wilted to some degree. After re-watering and growth for another 7 days, most of the CrPIP1;5 OX plants did not recover, while most of the WT plants did recover and had a higher survival rate than the CrPIP1;5 OX plants (Fig. 9b). This result indicates that overexpression of CrPIP1;5 decreased plant resistance to drought.
We found that GFP-fused CrPIP1;5 was constitutively expressed (under the control of the CaMV 35S promoter) in transgenic Arabidopsis plants. Root tip fluorescence of roughly three- to four-day-old transgenic Arabidopsis seedlings was easily discerned by confocal microscopy; the GFP-CrPIP1;5 protein was visible in the plasma membranes of transgenic plants, while in control plant roots, the GFP signal was distributed evenly throughout the cytoplasm (Figure S5). These results suggest that the subcellular localization of CrPIP1;5 is consistent with that of the PIP1 subfamily, being predominantly at the plasma membrane. Within the plasma membrane, CrPIP1;5 presumably folds into a specific transmembrane channel and functions as a water transporter.

Discussion
Water deficit, caused by drought, high salinity/alkalinity, high temperature, cold/freezing conditions, or other abiotic stressors, can negatively affect plant growth and survival. However, plants have developed intricate mechanisms to cope with this type of stress, including alterations to signal perception and transduction and differential expression of stress-responsive genes through complex networks. Aquaporins are a class of integral membrane proteins that facilitate the diffusion of water and other small solutes. Plants often maintain large and diverse AQP families compared to animals and microorganisms. Aquaporins have been reported to play crucial roles in plant water balance and homeostasis under adverse growing conditions [5,21] and in response to specific biotic challenges [36,37]. In this study, we performed genome-wide identification and characterization of AQPs in C. rosea to understand the evolution of this family and its molecular role. We were particularly interested in resolving the molecular mechanisms underlying this extremophile halophyte's adaptation to coral reef habitats and its responses to acute salt, alkaline, and drought stressors.
The AQP protein family within the C. rosea genome was characterized, and 37 putatively functional CrAQP isoforms (based on Pfam domain sequences) were identified, belonging to the PIP (11 isoforms), TIP (10), NIP (11), SIP (4), and XIP (1) subfamilies (Table 1). We performed whole-genome sequencing of C. rosea, and our results indicate that this species is diploid, with a genome size of 534.94 Mbp (unpublished data). The number of AQPs was similar to that in other diploid plant species (Table 2), and their protein sequences were highly similar. This indicates that the number of AQPs and their sequence specificity may not be directly related to the adaptation of C. rosea to extreme environments. The roles that CrAQPs play in stress tolerance need to be further studied from other perspectives, such as transcriptional regulation, protein modification, and the regulation of AQP transmembrane transport activities.
Although numerous studies have identified AQPs in model plant species, research on this gene family has increasingly focused on plants that inhabit novel environments. This is largely because AQP genes are seen as candidates for use in genetic modification of crops to increase agricultural productivity [38,39]. The saltbush Atriplex canescens is highly tolerant of saline-alkaline soils, drought, heavy metals, and cold; the AQP genes AcPIP2 and AcNIP5;1 have been shown to be involved in abiotic stress tolerance in this species, and their overexpression in transgenic Arabidopsis altered tolerance to drought and salt [40,41]. Compared with cultivated soybean, the wild Glycine soja is relatively salt-alkaline tolerant. Two AQP genes from G. soja, GsTIP2;1 and GsPIP2;1, reduced tolerance to salt and dehydration stress when overexpressed in Arabidopsis, implying that they have negative impacts on stress tolerance by regulating water potential [42,43]. In most functional analyses conducted in transgenic plants, the overexpression of AQP genes caused elevated tolerance to salt and drought, as for genes from Malus zumi (MzPIP2;1) [44], Sesuvium portulacastrum (SpAQP1) [45], Stipa purpurea (SpPIP1) [46], Simmondsia chinensis (ScPIP1) [47], Thellungiella salsuginea (TsPIP1;1) [48], and Phoenix dactylifera (PdPIP1;2) [49]. The elevated expression of AQP genes in plants can lead to cellular changes in water potential, which cause alterations in water uptake and transpiration and ultimately modify tolerance to water deficit stress. In this respect, understanding the distribution, expansion, regulation, phylogenetic diversity, and evolutionary selection of AQP genes in extremophile plants like C. rosea is an important step toward potentially improving the water utilization abilities and drought adaptations of other plant species, including agricultural crops.
Plant AQPs play versatile physiological roles in combating abiotic stress, not only by regulating water content and potential, but also by transporting certain signaling molecules and nutrients. Generally, AQPs consist of six transmembrane helices connected by five loops (A-E) and cytosolic N- and C-termini. Loops B (cytosolic) and E (non-cytosolic) both contain the highly conserved NPA (asparagine-proline-alanine) motifs that form part of the core of these proteins. The aromatic/arginine (ar/R) constriction is located at the non-cytosolic end of the pore. The substrate specificity of AQPs is closely related to several different signature sequences, including the NPA motifs, the ar/R filter, and Froger's positions (FPs) [50]. In the CrAQP NPA motifs, the first two residues were the most conserved, except in CrSIP1;3 and CrXIP1;1, in which the loop E and loop B NPA motifs degenerated into NLG and SPV, respectively. The third residue of the NPA motifs was more variable, with A frequently replaced by L, S, T, or V. However, compared to the NPA motifs, the 10 amino acid residues at the ar/R filter and Froger's positions were more variable across the CrAQPs (Table 3). In some subfamilies, the ar/R selectivity filter sequences were similar, such as in CrPIPs (F-H-T-R), CrTIP1s (H-I-A-V), and CrNIP1s (W-V-A-R).
We also analyzed Froger's positions (P1-P5), five conserved amino acid residues that are related to glycerol transport in water-conducting AQPs. The P2, P3, P4, and P5 Froger's positions in CrPIPs were relatively conserved (S-A-F-W), and in CrTIPs, they were less conserved (S-A-Y/F-W). In CrNIPs and CrSIPs, the P3 and P4 positions were mostly A and Y. It has been proposed that plant TIPs may transport various small solutes, including H2O2, NH4+, and urea, in addition to water [39]. As with other plant TIPs, CrTIPs are mainly located in vacuolar membranes and may be involved in the regulation of water flow across subcellular compartments [51]. The variation among CrTIPs in ar/R selectivity filter sequences may contribute to their multiple transport functions, and their NPA spacing varies from 79 to 127 amino acid residues, which suggests that CrTIPs might also be involved in the transmembrane transport of multiple small molecules.
Gene structure organization, gene expansion, and gene diversity are critical indicators of the evolution of gene families. The CrPIP and CrTIP subfamilies exhibit relatively stable gene structures in comparison with the other subfamilies (Fig. 3). Most of them possess three (CrPIPs) or two (CrTIPs) introns, suggesting that the members of each subfamily might share a common ancestral origin. Consistent with previous reports of very few or no intronless AQP genes in other plant species [30,31,52], only one intronless AQP gene was identified in C. rosea. This intronless gene, CrSIP1;2, might have evolved recently through retrotransposition. The CrAQP family has undergone a number of duplication events, consistent with the highly duplicated nature of plant genomes (Fig. 2; Table 4).
Segmental and tandem duplication events like those identified in this study have also been reported in other plant species [33,34]. In the present study, some duplicated CrAQPs had distinct patterns of expression in different tissues and habitats, and under different stressors and hormone exposure (Figs. 5 and 6). It is likely that these duplicated gene pairs have similar protein functions yet act in different biological processes, probably as a result of transcriptional regulation or posttranscriptional modification.
Canavalia rosea is a salt- and alkaline-tolerant and drought-adapted halophyte, and abiotic stressors, such as saline-alkaline soil, seasonal drought, strong solar irradiance, and high temperatures, are the main limiting factors that induce osmotic stress and disturb water balance for this species and other tropical seaside plants.
Aquaporin genes, especially the PIP isoforms, play major roles in maintaining plant water homeostasis and in responses to abiotic stress. Gene transcript levels depend on the structures of their promoters. Therefore, the cis-acting elements in promoter regions might provide the key to understanding the genetic factors influencing responses to signal molecules and environmental elicitors. We summarized the abiotic stress-related cis-acting elements in CrAQP promoters (Fig. 4), and our findings suggest diversity in CrAQP expression patterns. Furthermore, the expression profiles of CrAQPs in different tissues revealed by RNA-Seq indicate that some members of the CrPIP and CrTIP subfamilies had higher expression levels than members of other subfamilies (Fig. 5a), and habitat-specific RNA-Seq data further indicated that most of the CrPIP members had greater expression levels in coastal C. rosea (YX) than in inland C. rosea (SCBG; Fig. 5b). Our results suggest that differential expression of CrPIPs might be associated with different water use strategies in different habitats, and the higher expression level of CrAQPs in coastal C. rosea plants might be an adaptive mechanism to deal with intracellular and extracellular water-deficit signals. Therefore, we further investigated the expression patterns of CrPIPs under salt, alkaline, and drought stress and the ABA hormone treatment using qRT-PCR (Fig. 6). The results showed that CrPIP expression was most affected under the saline-alkaline, high osmotic stress, and ABA treatments. Furthermore, some CrPIPs showed clearly different and even opposite expression patterns in roots, vines, and leaves. This may be because in roots the PIP proteins mainly facilitate water absorption from the external environment, while in vines and leaves, the PIPs may play a larger role in transpiration. Broadly, our results suggest a role for PIPs in regulating C. rosea hydraulics and probably in adaptation to the challenging environmental conditions found on tropical coral reefs and islands.
Two CrPIP members, encoded by CrPIP1;5 and CrPIP2;3, were highly expressed in all tested tissues and almost constitutively expressed under the abiotic stress challenges and ABA treatment (Fig. 5a and Fig. 6). Using yeast two-hybrid assays, we found that these two CrPIP members could bind to themselves and to each other to form homodimers and heterodimers (Fig. 7). This is consistent with previous findings that some PIP1 and PIP2 members can assemble as homotetramers and heterotetramers, thereby triggering channel activities, influencing substrate specificity, and regulating PIP trafficking [53]. Here, our results on the expression patterns of CrPIP1;5 and CrPIP2;3 provide a detailed understanding of their regulatory modes and help to illuminate CrAQP functions. These data are especially helpful for characterizing AQP-interacting protein complexes involved in the adaptation of C. rosea to harsh environmental conditions such as low water availability and saline-alkaline soils.
Our results from the yeast overexpression system indicate that CrPIP1;5 is an active transmembrane H2O and H2O2 transporter (Fig. 8). We assessed the overexpression of CrPIP1;5 in transgenic Arabidopsis and found that it led to slightly reduced saline-alkaline and drought tolerance. This suggests that CrPIP1;5 could play a key role in water transport. We also found that high levels of salt, alkali, and ABA slightly decreased the expression of CrPIP1;5 in C. rosea. This further suggests that this gene is highly important for water movement between cells and tissues, and is indeed involved in a stress response pathway that protects plants from water loss under high salinity conditions and promotes water release under high osmotic stress caused by PEG or sorbitol (Fig. 8b). The effect of CrPIP1;5 overexpression in transgenic Arabidopsis contrasts with most previous findings [39], which suggest that overexpression of plant PIPs results in improved agronomic traits and abiotic stress tolerance.
There are also a few studies reporting that the overexpression of plant PIPs can increase sensitivity to drought stress. For example, transgenic tobacco (Nicotiana tabacum) plants overexpressing AtPIP1;4 and AtPIP2;5 displayed rapid water loss under dehydration stress and showed enhanced water flow under drought stress [54]. The Glycine soja gene GsPIP2;1 negatively impacts salt and drought stress tolerance by regulating water potential when overexpressed in transgenic Arabidopsis [43]. In addition, Arabidopsis plants overexpressing AcPIP2 (a PIP gene from the saltbush A. canescens) exhibited drought-sensitive phenotypes [40]. Together, these studies suggest that regulation of PIP genes within different plant species promotes plant responses to abiotic stressors by maintaining water homeostasis.

Conclusions
The leguminous nitrogen-fixing plant C. rosea exhibits extreme saline-alkaline and drought resistance and is used as a pioneer species on islands and reefs for artificial vegetation construction. In the present study, we conducted a genome-wide analysis and characterization of AQPs in C. rosea. Our results will be helpful for understanding the involvement of this gene family in adaptation to stressful abiotic conditions, particularly through its impact on water balance. We determined that the CrAQP family consists of 37 members distributed across five subfamilies. Each member had subtle variations in gene and protein structures, transcriptional regulation, subcellular localization, substrate specificity, and post-translational regulatory mechanisms.
Expression profiling of CrAQPs revealed higher expression of PIP-associated genes in almost every tissue of C. rosea plants, suggesting that this subfamily likely plays important roles in developmental processes and abiotic stress responses. As predicted, the PIP1 and PIP2 members CrPIP1;5 and CrPIP2;3 formed homodimers and heterodimers through protein interactions. We also functionally characterized one of the CrPIP1 members, CrPIP1;5, because it had the highest expression levels in different tissues of C. rosea. However, our results showed that overexpression of CrPIP1;5 could increase sensitivity to saline-alkaline and drought conditions in yeast and plants. The identification of CrAQPs in this study will be useful for further investigating the roles that AQPs play in the various developmental stages and physiological processes of C. rosea, for elucidating the possible ecological adaptation mechanisms of C. rosea to extreme environments, and for identifying candidate genes for potential introduction into transgenic agricultural crops.

Plant materials and stress treatments
Canavalia rosea plants growing on Yongxing Island (YX, 16˚83′93′′ N, 112˚34′00′′ E) and in the South China Botanical Garden (SCBG, 23˚18′76′′ N, 113˚37′02′′ E) were used in this study. To analyze tissue-specific transcriptional patterns of the identified CrAQPs, roots, stems, leaves, flowers, and fruits were gathered from C. rosea plants grown in SCBG. In addition, to investigate the involvement of the CrAQPs in adaptation to different habitats, adult leaves were gathered from C. rosea plants growing in both YX and SCBG.
To investigate the involvement of the CrAQP genes in abscisic acid (ABA) and in various stress responses, C. rosea was germinated from seed and 30-day-old seedlings were exposed to stressors. In brief, for the high osmotic stress treatment, seedlings were removed from their pots, carefully washed with distilled water to remove soil from the roots, and then transferred into a 300 mM mannitol solution. For high salt stress, seedlings were soaked in a 600 mM NaCl solution. For alkaline stress, seedlings were soaked in a 150 mM NaHCO3 (pH 8.2) solution. For ABA treatment, a freshly prepared working solution of 100 µM exogenous ABA was sprayed on the leaves of seedlings. The second and/or third mature leaves from the seedling apexes were collected at 0, 2, and 24 hours of the stress treatments described above, with the 0-hour time point used as the control. All samples were immediately frozen in liquid nitrogen and stored at −80 °C for subsequent gene expression analysis. Three independent biological replicates were used.

Identification of CrAQP genes and gene duplication analysis of the CrAQP family
To identify all putative CrAQP genes, the genome database of C. rosea (data not published) was used to obtain DNA and protein sequences. In brief, DIAMOND [55] and InterProscan

Analysis of gene structure and conserved protein motifs
The gene structure for each CrAQP was illustrated using the Gene Structure Display Server 2.0 (http://gsds.cbi.pku.edu.cn/). To identify the biochemical features of all CrAQPs, ProtParam (http://web.expasy.org/protparam/) was used to predict the molecular weights (MW) and isoelectric points (pI) of the candidate CrAQP proteins. The transmembrane domains (TMDs), NPA motifs, and other conserved amino acid residues were identified by sequence alignment of CrAQPs with AtAQPs [26] and GmAQPs [34]. The numbers of phosphorylation sites within CrAQPs were predicted using NetPhos 3.

Promoter sequence profiling of CrAQPs
Putative CrAQP promoter sequences (2,000 bp upstream of ATG) were retrieved from the C. rosea genome database (Table S1). Sequences were then uploaded into the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) for cis-acting regulatory element analysis. The cis-acting elements were classified as either hormone-specific (gibberellin-responsive elements, MeJA-responsive elements, auxin-responsive elements, salicylic acid-responsive elements, EREs, and ABREs) or abiotic stress-responsive (light-responsive elements, MYCs, MYBs, MBSs, TC-rich repeats, and LTREs). The different elements were summarized, and several selected CrPIP promoters were visualized using TBtools [57].
The transcript abundance of several CrAQPs was investigated using a qRT-PCR assay. In brief, total RNA was extracted from C. rosea seedling tissues under the stress/ABA treatments and reverse transcribed to cDNA. Quantitative RT-PCR was conducted using the LightCycler480 system (Roche, Basel, Switzerland) and TransStart Tip Green qPCR SuperMix (TransGen Biotech, Beijing, China). All gene expression data obtained via qRT-PCR were normalized to the expression of CrEF-α. The primers used for qRT-PCR (CrEF-αRTF/CrEF-αRTR for the reference gene and other CrAQP-specific primer pairs) are listed in Table S2.
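The normalization of qRT-PCR data to a reference gene can be illustrated with a short Python sketch. The text states only that expression was normalized to CrEF-α and that the 0-hour time point served as the control; the 2^(−ΔΔCt) formula, the assumption of roughly 100% primer efficiency, the function name `fold_change`, and all Ct values below are illustrative assumptions rather than details reported in this study.

```python
from statistics import mean, stdev


def fold_change(ct_target, ct_reference, control_delta_ct):
    """Relative expression by the 2^(-ddCt) method (assumed, not from the paper).

    ct_target: Ct of the gene of interest in the treated sample
    ct_reference: Ct of the reference gene (e.g. CrEF-alpha) in the same sample
    control_delta_ct: mean (target Ct - reference Ct) of the 0-hour control
    """
    delta_ct = ct_target - ct_reference       # normalize to the reference gene
    delta_delta_ct = delta_ct - control_delta_ct  # compare with the control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct pairs (target, reference) for three biological replicates.
control_0h = [(24.0, 18.0), (24.5, 18.0), (23.5, 18.0)]
treated_2h = [(25.0, 18.0), (25.5, 18.5), (26.0, 18.0)]

# Mean delta-Ct of the 0-hour control samples.
control_dct = mean(t - r for t, r in control_0h)

# Fold change of each treated replicate relative to the control.
fold_changes = [fold_change(t, r, control_dct) for t, r in treated_2h]

# Summarize as mean and sample standard deviation across replicates.
mean_fc = mean(fold_changes)
sd_fc = stdev(fold_changes)
print(f"relative expression: {mean_fc:.3f} +/- {sd_fc:.3f}")
```

In this sketch a fold change below 1 indicates lower transcript abundance than in the 0-hour control, and a value above 1 indicates higher abundance. If primer efficiencies differ appreciably from 100%, an efficiency-corrected model would be more appropriate than the simple power of 2 used here.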

Functional identification of CrPIP1;5 in transgenic Arabidopsis plants
The coding sequence (CDS) of the CrPIP1;5 cDNA was PCR-amplified using the primer pair PIP1-5OXGF/PIP1-5OXGR (Table S2) and then inserted into the plant expression vector pEGAD to generate CrPIP1;5-pEGAD. Transgenic Arabidopsis plants (three overexpression lines, OX 1#, OX 5#, and OX 10#) were then generated with this construct. After confirmation with genomic PCR and quantitative RT-PCR, these T3 homozygous transgenic lines were tested for stress tolerance according to their seed germination rate and their seedling and adult plant growth rates. These tests were intended to evaluate the biological functions of CrPIP1;5.
In brief, the seed germination rates of the CrPIP1;5 transgenic Arabidopsis lines (OX 1#, OX 5#, and OX 10#) and WT were measured under the following stress treatments: NaCl (175 mM, 200 mM, and 225 mM; salt stress); 5 mM NaHCO3 plus 95 mM NaCl, 7.5 mM NaHCO3 plus 92.5 mM NaCl, and 10 mM NaHCO3 plus 90 mM NaCl (all pH 8.2; alkaline stress); and mannitol (200 mM, 300 mM, and 400 mM; osmotic stress). The goal of these treatments was to determine the effect of CrPIP1;5 overexpression on the salt, alkaline, and osmotic tolerance of transgenic Arabidopsis seeds during germination. Additionally, root length was measured to evaluate the influence of CrPIP1;5 overexpression on transgenic Arabidopsis seedlings under abiotic stress (100 mM, 150 mM, and 200 mM NaCl for salt stress; 0.5 mM NaHCO3 plus 99.5 mM NaCl, 0.75 mM NaHCO3 plus 99.25 mM NaCl, and 1 mM NaHCO3 plus 99 mM NaCl, pH 8.2, for alkaline stress; 200 mM, 300 mM, and 400 mM mannitol for osmotic stress). Wild-type Arabidopsis and Murashige & Skoog (MS) medium or MS plus 100 mM NaCl (pH 8.2) medium were used as controls. The seed germination and seedling growth experiments were both performed on MS plates with or without stress factors, in the same greenhouse environment used to grow the Arabidopsis plants.
Drought tolerance assays were also performed on transgenic adult Arabidopsis plants. Both WT and transgenic seeds (OX 1#, OX 5#, and OX 10#) were grown on MS medium. Ten-day-old seedlings were transplanted into square pots filled with vermiculite soaked in nutrient solution. Thirty to forty plants of each genotype were cultured in the greenhouse as described above without watering for another 20 days to ensure adequate growth. The water content of the vermiculite in the pots decreased over this timeframe but did not induce drought stress. The plants were then subjected to a drought tolerance assay in which WT and transgenic plants (OX 1#, OX 5#, and OX 10#) were maintained under continuous drought conditions for 10 days and then watered for 7 days. Survival rates were then calculated according to the number of living plants at the end of the experiment.
Subcellular localization of CrPIP1;5 in Arabidopsis was also examined using a GFP fusion protein in seedling roots. Seeds of the OX homozygous lines of CrPIP1;5-pEGAD and of the control (pEGAD) transgenic plants were sterilized and sown on MS plates to generate seedlings. Then, three- to four-day-old seedlings were imaged using a camera fitted to a confocal laser scanning microscope to record the GFP fluorescence of different tissues. To confirm that it was the cell membrane that was fluorescing, seedling roots were stained with a propidium iodide solution (1 mg/mL in phosphate buffer solution).

Statistical analysis
All the experiments in this study were repeated three times independently, and results are shown as mean ± SD (n ≥ 3). Pairwise differences between means were analyzed using Student's t-tests in Microsoft Excel 2010.
Phylogenetic relationships, gene structures, and motif compositions of the AQP genes in C. rosea. a The phylogenetic tree (left) was constructed using MEGA 6.0. The five major groups are marked with different background colors. b The exon-intron organization of the CrAQPs (middle) was constructed using GSDS 2.0. c The conserved motifs of each group (right) were identified using the MEME web server. Different motifs are represented by different colored boxes, and the motif sequences are provided in Table S4.
Homodimer and heterodimer detection for CrPIP1;5 and CrPIP2;3 by yeast two-hybrid assay. a Maps of the different constructs. b Both CrPIP1;5 and CrPIP2;3 showed self-interaction. c CrPIP1;5 and CrPIP2;3 could interact with each other.