Genome-Wide Bioinformatics Analysis of Aquaporin Gene Family in Maize (Zea mays L.)

Aquaporins are a superfamily of major intrinsic proteins (MIPs) that facilitate the fast, passive movement of water across cell membranes. This study presents the genome-wide identification, characterization and functional prediction of aquaporins in maize using bioinformatics. A total of 41 non-redundant putative aquaporins were identified and classified into four subfamilies: 18 TIPs, 12 PIPs, 8 NIPs and 3 SIPs. Exon-intron organization was conserved within subfamilies. Analysis of conserved domains and motifs predicted six transmembrane domains (TM1-TM6) along with various aromatic/arginine (ar/R) selectivity filters. Functional prediction suggested roles for ZmAQPs in the transport of multiple compounds, e.g., water, glycerol, carbohydrates, metal ions and other small solutes. ZmAQPs were also predicted to be constituents of membranes such as the plasma membrane and vacuolar membrane. These results provide valuable information for addressing the functions of ZmAQPs, as well as basic data for improving plant growth and development.


Introduction
Plants are among the entities most adversely affected by water scarcity, which is aggravated by the rapid depletion of groundwater. Water moves from the soil into plants mainly through the roots, by three parallel pathways, i.e., symplastic, transcellular and apoplastic [1]. Aquaporins (AQPs) are an ancient family of channel proteins embedded in plant membranes that transport water and other neutral metabolites across membranes [2]. Most are involved in hydraulic conductivity and can increase water transport across the plasma membrane 10- to 20-fold [3]. This property makes AQPs important in many plant processes, such as the maintenance and regulation of water status [4], cell elongation [5], soil-water relations [6,7], cell osmoregulation [8], seed germination [9] and even reproduction [10,11]. Aquaporins also influence leaf movement and physiology [12], salt tolerance, fruit ripening [13] and drought resistance [14].
Zea mays (maize or corn) is an important cash crop belonging to the grass family, Poaceae. It is the third most significant food crop grown globally (http://par.com.pk/). Besides human food, maize contributes substantially to animal feed and to other uses such as bioethanol production and secondary metabolites. However, maize yield is reduced by environmental stresses, especially water shortage, so addressing this problem is a major concern of research worldwide.
Many studies have been carried out on maize aquaporins, showing their crucial role in biological regulation. The first study was performed by Chaumont [3] based on expressed sequence tags (ESTs). Later, functional aspects of different aquaporins were studied under various conditions; for example, expression of some ZmPIPs (PIPs of Zea mays) altered the regulation of stomatal opening and closing. In the present study, we performed a genome-wide identification of maize aquaporins together with a comparative characterization and evolutionary analysis alongside the aquaporins of Arabidopsis and chickpea. We also predicted many important biological features of maize aquaporins, including protein sequence features, transmembrane domains, conserved motifs in relation to the phylogenetic tree, gene structure in relation to the evolutionary tree, chromosomal distribution, and gene ontology (biological process, molecular function and subcellular localization). These bioinformatics results may be helpful for further experimental analysis of aquaporins in the maize genome.

Database searching and identification of ZmAQP genes
Aquaporin protein sequences from Arabidopsis thaliana (35 sequences) [24] and Cicer arietinum (40 sequences) [31] were used as queries in several database search engines, including NCBI BLASTp, Phytozome BLAST and MaizeDB BLAST. Position-Specific Iterated BLAST (PSI-BLAST) was also used to refine the search [32]. Proteins identified by BLASTp with more than 75% similarity to a query were selected. After redundant sequences were removed, the candidate aquaporins were validated for the presence of the MIP domain using the CD-Search program [33]. Finally, the amino acid, cDNA and genomic DNA sequences of the selected ZmAQPs were retrieved from MaizeDB [34].
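The identity filter and redundancy removal described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes the BLASTp hits were exported in the standard BLAST+ tabular format (-outfmt 6, where the third column is percent identity), treats the paper's "similarity" threshold as BLAST percent identity, and keeps one best hit per subject sequence. The function name filter_hits and any sequence IDs are hypothetical. CD-Search validation of the MIP domain is not shown.

```python
from typing import Dict, Iterable


def filter_hits(lines: Iterable[str], min_identity: float = 75.0) -> Dict[str, float]:
    """Keep the best hit per subject whose percent identity exceeds min_identity.

    Assumes BLAST+ tabular output (-outfmt 6) with the columns:
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    best: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        fields = line.split("\t")
        subject, pident = fields[1], float(fields[2])
        # Strictly greater than the threshold ("more than 75%").
        if pident > min_identity and pident > best.get(subject, 0.0):
            # Several queries (Arabidopsis and chickpea AQPs) may hit the same
            # maize protein; one entry per subject removes that redundancy.
            best[subject] = pident
    return best
```

Because the function only iterates over lines, an open file handle can be passed in place of a list.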

Multiple sequence alignment and phylogenetic tree analysis
The full-length amino acid sequences of the 41 identified ZmAQP genes were aligned using the MUSCLE program [35], and the alignment was used to construct a phylogenetic tree by the maximum likelihood method with 1000 bootstrap replicates in PhyML 3.0 [36]. The tree was visualized in MEGA 6.0 [37]. To corroborate this tree, a second phylogenetic tree was constructed with MrBayes v3.2.6 [38], using Markov chain sampling over the space of all possible reversible substitution models, with the amino acid model prior set to mixed. The ZmAQPs were classified into four subfamilies (NIPs, TIPs, PIPs and SIPs) based on the known nomenclature of the AQPs used as queries in the initial BLAST search. Thirty-one ZmAQPs had already been annotated with AQP names by Chaumont [3], and the remaining 10 ZmAQP genes were named according to their positions in the combined phylogenetic tree of ZmAQPs with Arabidopsis [24] and chickpea [31] AQPs.
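What "1000 bootstrap replicates" means can be illustrated with a short sketch. PhyML performs this resampling internally; the code below only shows the underlying idea of non-parametric bootstrapping, in which alignment columns are drawn with replacement to create pseudo-replicate alignments. A tree would then be inferred from each replicate, and the support for a clade is the fraction of replicate trees that contain it (tree building is not shown). The function names are hypothetical.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # The same column indices are applied to every sequence, so each sampled
    # column keeps the residues of all taxa together.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n_replicates: int = 1000,
                         seed: int = 0) -> List[Dict[str, str]]:
    """Return n_replicates pseudo-replicate alignments (1000 as in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

A fixed seed makes the replicates reproducible, which is convenient when comparing support values between runs.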

Gene structure analysis, chromosomal distribution and Gene Ontology (GO) of ZmAQPs
Genomic data, including DNA and cDNA sequences, chromosome number, start and end base pairs, and the numbers of exons and introns, were retrieved from MaizeGDB (http://www.maizegdb.org) for gene structure analysis and chromosomal distribution. The exon-intron organization of ZmAQPs was analyzed using the Gene Structure Display Server 2.0 (GSDS; http://gsds.cbi.pku.edu.cn/) [45], and the physical location map was drawn manually in Microsoft Excel. Gene Ontology terms for ZmAQP proteins were predicted from the amino acid sequences with the Blast2GO program [46] using default parameters and several databases: Swiss-Prot, NCBI non-redundant protein (nr), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) protein families and Clusters of Orthologous Groups (COGs).
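Exon and intron counts of the kind drawn by GSDS follow directly from exon coordinates. The sketch below assumes 1-based, inclusive exon coordinates on a single strand, as in GFF3 annotations; the function name and any example coordinates are hypothetical and do not describe a real ZmAQP gene.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def gene_structure(exons: List[Interval]) -> Dict[str, object]:
    """Derive intron positions, counts and lengths from exon coordinates."""
    ordered = sorted(exons)
    # Each intron lies between the end of one exon and the start of the next.
    introns = [(prev_end + 1, next_start - 1)
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]
    return {
        "exon_count": len(ordered),
        "intron_count": len(introns),
        "exon_lengths": [end - start + 1 for start, end in ordered],
        "intron_lengths": [end - start + 1 for start, end in introns],
    }
```

A single-exon gene, such as the intronless ZmTIP4;2 described in the results, yields an intron count of zero.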

Identification and prediction of the Zea mays aquaporin gene family
The database searches returned 46 putative AQP genes with strong matches in the maize (Zea mays L.) genome. After duplicate gene loci were removed and the MIP superfamily domain was validated in the amino acid sequences, 41 putative ZmAQPs were retained. For comparison, a literature survey of the AQP superfamily and its subfamilies in other plant species is given in Table 1. Generic names (ZmAQPs: ZmPIPs, ZmNIPs, ZmTIPs and ZmSIPs) were assigned to the Zea mays aquaporins. The individual genes and their predicted characteristics are listed in Table 2, including gene name, NCBI accession, chromosome number (Chr#), number of amino acid residues (aa), transmembrane domains (TMD), isoelectric point (Ip), molecular weight (MW, kDa), start base pair, end base pair and subcellular localization (C.L).
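Molecular weights such as those in Table 2 are usually computed by summing average residue masses and adding one water molecule for the free termini. A minimal sketch under that assumption follows; the residue masses are standard average values (as tabulated by ExPASy), but the tool actually used for Table 2 is not stated in this section, so results may differ slightly in the last decimals. Isoelectric point prediction depends on a chosen set of pKa values and is not shown.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_weight_kda(protein: str) -> float:
    """Average molecular weight of a protein sequence in kDa."""
    seq = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    unknown = set(seq).difference(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    mass_da = sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    return mass_da / 1000.0
```

Ambiguous codes such as X or B are rejected rather than guessed, so sequences containing them need to be cleaned first.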

Sequence alignment and identification of NPA motifs, ar/R selectivity filter residues and Froger's positions
Multiple sequence alignment of ZmAQPs showed that most amino acid residues within the domains were conserved among all 41 ZmAQPs. The NPA motifs, ar/R selectivity filter residues and Froger's positions are given in Table 3 and Figure S2.
However, a few genes deviated from this pattern (Figure 3, Supplementary File S1).
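Locating the canonical NPA motifs in an individual sequence is a simple string search, sketched below. This is illustrative only: in the study the motifs were read from the multiple sequence alignment, and divergent NPA-like variants would be missed by an exact search like this one. The function names are hypothetical.

```python
from typing import List


def find_motif_positions(protein: str, motif: str = "NPA") -> List[int]:
    """Return 1-based start positions of every exact occurrence of motif."""
    seq, motif = protein.upper(), motif.upper()
    positions: List[int] = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)         # convert 0-based index to 1-based
        start = seq.find(motif, start + 1)  # continue searching after this hit
    return positions


def has_two_npa(protein: str) -> bool:
    """True if the sequence carries exactly two canonical NPA motifs."""
    return len(find_motif_positions(protein)) == 2
```

Sequences for which has_two_npa returns False are candidates for the kind of outliers noted above and would need inspection in the alignment.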

Gene ontology (GO)
Gene Ontology analysis suggested important roles for ZmAQPs in distinct biological, cellular and molecular processes. Predicted biological processes included the transport of ions, glycerol and other small solutes (Figure 4a, Figure S3). Predicted molecular functions included substrate-specific transmembrane transporter activity, active and passive transmembrane transporter activity, carbohydrate and organic hydroxy compound transporter activity, receptor activity, metal binding, heterocyclic compound binding and structural molecule activity (Figure 4a, Figure S4). Overall, the most prominent predicted activity of ZmAQPs was the transport of compounds such as glycerol, ions, water and alcohols across the plasma membrane, which may be due to substrate-specific channel activity (Figure 4a). Predicted subcellular localization placed all ZmAQPs in the membranes of various cellular components, such as the plasma membrane, vacuolar membrane, membrane-bounded organelles and other intrinsic components of membranes. ZmAQPs were also associated with the cell periphery, intracellular organelles, plasmodesmata and cell-cell junctions (Figure 4b, Figure S5).

Gene structure analysis of the ZmAQP gene family
The comprehensive genomic and cDNA data available for Z. mays made it possible to analyze the structural components of aquaporin genes in the Z. mays genome. GSDS analysis showed that the four subfamilies differ in their numbers of exons and introns. Among the ZmAQP subfamilies, ZmNIPs had the highest number of exons (up to five), followed by ZmPIPs (up to four), ZmTIPs (up to three) and ZmSIPs (three). In more detail, among ZmTIPs, nine genes contain two exons, eight genes carry three exons and ZmTIP4;2 has no intron. All ZmNIPs carry five exons except ZmNIP1;1 and ZmNIP1;2, which have lost the second exon. Among ZmPIPs, five genes have four exons, three genes (ZmPIP2;5, ZmPIP2;6 and ZmPIP2;7) have three exons and only two genes (ZmPIP1;5 and ZmPIP1;5) have two exons. All ZmSIP genes carry three exons (Figure 5).

Chromosomal location of ZmAQP genes
The chromosomal locations of the 41 ZmAQPs are shown in Figure 6. Chromosomes 4 and 5 carry the highest number of ZmAQP genes (7 each), followed by chromosome 2 (6 genes) and chromosomes 6 and 9 (4 genes each). Chromosomes 1, 7 and 8 carry three ZmAQP genes each, while chromosomes 3 and 10 carry only two each. Most ZmAQP genes occur in clusters located toward either the top or the bottom of a chromosome. Interestingly, many genes of the same subfamily are located on the same chromosome: for example, ZmPIP2;2, ZmPIP2;7 and ZmPIP2;5 form a cluster within a ~500 kb segment on chromosome 2, while ZmPIP1;5 and ZmPIP1;3 are clustered on chromosome 4. Similarly, ZmNIP1;2 and ZmNIP2;2 are clustered on chromosome 6 (Figure 6).
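The "~500 kb segment" criterion used to call clusters can be made explicit with a small greedy scan over sorted gene coordinates. This sketch assumes one particular definition (a cluster is a run of at least two genes on the same chromosome whose total span, from the first start to the last end, is at most 500 kb); the exact rule used in the study is not stated, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (gene name, chromosome, start bp, end bp)
Gene = Tuple[str, str, int, int]


def _chrom_key(chrom: str):
    # Sort chromosomes "1".."10" numerically; non-numeric names go last.
    return (0, int(chrom)) if chrom.isdigit() else (1, chrom)


def cluster_genes(genes: List[Gene], window_bp: int = 500_000) -> List[List[str]]:
    """Greedy left-to-right clustering; each cluster spans at most window_bp."""
    by_chrom: Dict[str, List[Gene]] = {}
    for gene in genes:
        by_chrom.setdefault(gene[1], []).append(gene)

    clusters: List[List[str]] = []
    for chrom in sorted(by_chrom, key=_chrom_key):
        current: List[Gene] = []
        for gene in sorted(by_chrom[chrom], key=lambda g: g[2]):
            # Close the current cluster if this gene would stretch it past the window.
            if current and gene[3] - current[0][2] > window_bp:
                if len(current) > 1:
                    clusters.append([g[0] for g in current])
                current = []
            current.append(gene)
        if len(current) > 1:
            clusters.append([g[0] for g in current])
    return clusters
```

Other definitions, such as linking adjacent genes whose gap falls below the window, can produce larger clusters and may suit some questions better.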

Discussion
Advances in genome sequencing and computational biology have made a large number of plant genomes available, which has greatly improved the identification and characterization of physiologically vital gene families in plants. Computational analysis of aquaporin proteins can support hypothesis generation and subsequent experimental validation, and ultimately the development of genetically improved crops. Researchers study genes and their expression in order to understand their roles in a particular plant. In this study, the AQP family was selected because of its significant contribution to plant growth and development [47]. The primary function of AQPs is regulating the movement of water and some solutes across cell membranes [47,48]. Given this importance, genome-wide analyses of AQPs have been conducted in many plant species, including monocots such as rice, wheat and barley and dicots such as potato, tomato, cabbage, carrot, celery and Arabidopsis (see Table 1). The current study is valuable because, to the best of our knowledge, few studies have addressed the genome-wide identification, characterization and functional prediction of aquaporins in maize [49]. Chaumont [3] studied ZmAQPs based on EST/cDNA sequences and identified 31 AQPs, whereas the current study is based on the whole genome and proposes 41 AQP genes. The difference between the two studies might arise because an EST-based survey captures only the transcripts expressed at the time of sampling. This limitation of the Chaumont study [3] was addressed here by exploiting the whole genome sequence of maize. Unlike the Chaumont study, we also compared ZmAQPs with Arabidopsis and chickpea AQPs, which gave evolutionary insight into monocots and dicots.
The Chaumont study was restricted to ESTs and their phylogenetic tree, whereas here we provide many other biological features in detail, including ZmAQP protein sequence analysis, conserved motif prediction, gene ontology, gene structure analysis, and chromosomal and subcellular localization prediction. The genome-wide analysis showed that maize has more AQP genes than Arabidopsis (35 AQPs), sweet orange (34 AQPs), rice (34 AQPs) and physic nut (32 AQPs), but fewer than some other species, such as banana (47 AQPs), tomato (47 AQPs), cottonwood (55 AQPs), soybean (66 AQPs) and Chinese cabbage (53 AQPs) (see Table 1 for the comparison). These differences in gene number among species may reflect differences in genome size or evolutionary adaptation to natural environments [50].
Citation: Bari
The phylogenetic tree showed that all ZmAQPs fall into four subfamilies, namely ZmTIPs, ZmPIPs, ZmNIPs and ZmSIPs (Figure 1), in agreement with Chaumont [3] and the other studies listed in Table 1. We also identified groups and subgroups within subfamilies; for instance, ZmTIPs were divided into five subgroups (TIP1, TIP2, TIP3, TIP4 and TIP5). Similar subgroups have been reported for AQPs of other monocots, e.g., rice, banana, sorghum and barley (Table 1). Because TIPs are involved in the transport of various small solutes such as NH₄⁺, H₂O₂ and urea [51][52][53], information about TIPs in the maize genome may help to improve understanding of these transport mechanisms. The ZmPIPs were divided into two subgroups, ZmPIP1 and ZmPIP2, which appear to be conserved in all other plant species listed in Table 1. Experimental analysis of PIPs has confirmed their role in water uptake in roots and in leaf turgor [54][55][56]. PIPs also facilitate CO₂ diffusion in the mesophyll, which enhances photosynthesis [57,58]. Studying individual PIP genes in maize may therefore help in understanding the C4 photosynthetic mechanism, which is crucial for engineering C4 features into C3 plants such as rice, wheat and potato [59]. The ZmNIPs were divided into four subgroups: ZmNIP1, ZmNIP2, ZmNIP3 and ZmNIP7 (ZmNIP7;1). By contrast, Chaumont proposed three ZmNIP subgroups and did not identify ZmNIP7. The NIP7 subgroup has a unique sequence and has been reported in genome-wide AQP studies of Arabidopsis, chickpea, rice, banana, barley, sweet orange, tomato, common bean and sorghum (Table 1). NIPs appear somewhat more diverse than the other subfamilies, as they are more species-specific. Monocot species also differ in their number of NIP subgroups: sorghum and rice have four, moso bamboo has three and banana has five [60][61][62].
NIPs have generally been reported to facilitate the transport of water and various small solutes such as glycerol, silicon, lactic acid and urea [63,64]. Differences among crops in their NIP subgroups might reflect differences in their capacity to take up glycerol, silicon and other small solutes. The ZmSIP subfamily is divided into two subgroups (ZmSIP1 and ZmSIP2), consistent with the Chaumont study [3] and with the other plants listed in Table 1.
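The subgroup counts discussed above follow from the naming scheme, in which, for example, ZmPIP2;5 is member 5 of subgroup PIP2 in subfamily PIP. The sketch below parses such names and groups them by subfamily. The regular expression is an assumption about the name format, and the function name is hypothetical; running it on the full list of 41 names would give the complete subgroup sets.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Assumed name format: "Zm" + subfamily + subgroup number + ";" + member number.
AQP_NAME = re.compile(r"^Zm(?P<subfamily>PIP|TIP|NIP|SIP)(?P<group>\d+);(?P<member>\d+)$")


def subgroups_by_subfamily(names: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each subfamily (PIP, TIP, NIP, SIP) to the subgroups present in names."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        match = AQP_NAME.match(name)
        if match is None:
            raise ValueError(f"unrecognized AQP name: {name!r}")
        groups[match["subfamily"]].add(match["subfamily"] + match["group"])
    return dict(groups)
```

With only the handful of names mentioned in this section (ZmPIP2;2, ZmPIP1;5, ZmNIP7;1, ZmTIP4;2 and so on), the result is necessarily a partial view of the subgroups.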
The comparative phylogenetic analysis of AQPs from maize, Arabidopsis and chickpea indicated that Arabidopsis and chickpea AQPs are more closely related to each other than to maize AQPs. This suggests that AQPs within dicots are more similar to each other than to monocot AQPs; for example, ZmAQPs grouped with other ZmAQPs rather than with those of the two dicots. However, further study is required to understand how AQP diversity relates to function. Furthermore, the 13 ZmAQP-ZmAQP sister pairs suggest segmental duplication events within the maize genome, which may have played a key role in evolutionary adaptation to various environmental stresses. Other important sequence features of ZmAQPs, including molecular weight (Mw), isoelectric point (Ip) and amino acid length, were similar to those reported for MIP proteins in Arabidopsis, banana, moso bamboo and other species [17,61,65]. These predicted features should be helpful for the functional characterization of ZmAQPs. Identifying transmembrane domains in ZmAQPs provides information on how their structure relates to their functions. The six transmembrane domains (TM1 to TM6) identified in ZmAQPs are the same as those reported in other crops [17,61,65].
The transport properties of AQPs are determined by the NPA motifs and the ar/R selectivity filter, which together shape the water channel [66,67]. These regions confer high substrate specificity and are essential for the selective transport of water and small solutes [66,67]. The Chaumont study [3] only presented the NPA motifs, whereas the current study also includes the ar/R selectivity filter and Froger's positions, which are important for selecting the molecules that cross biological membranes [66,67]. All ZmAQPs showed the two representative NPA motifs, as reported by Chaumont [3]. The ZmPIPs contain a highly conserved ar/R selectivity filter (F-H-G-R), and the same pattern has been identified in PIPs of other species such as Arabidopsis, tomato, wheat, barley and poplar (Table 1). In addition, the presence of S, A, F and W residues at positions P2-P5 in PIPs has been reported as a signature of CO₂ transporters [68]. Mutations in these conserved residues of ZmPIPs could therefore alter the protein's capacity for CO₂ diffusion; a mutation that increases CO₂ diffusion might also help in establishing C4 traits in crops. Similarly, ZmTIPs carry H-I-G-R or H-I-A-V residues in the ar/R selectivity filter and T-S-A-Y-W or T/S-A-A-Y-W residues at positions P1-P5, combinations reported to transport urea and H₂O₂ across membranes [68]. The ar/R filters of ZmNIPs are identical to those of sweet orange and soybean NIPs, which act as water facilitators [69]. One ZmNIP subgroup shows G-S-G-R at the ar/R filter, a combination involved in the transport of water, silicon (Si) and boron (B) in Arabidopsis and sweet orange [70]; by contrast, AtNIP5, with A-I-G-R in the ar/R filter, transports arsenic and boron but not silicon. The basic structure and function of SIPs are still being characterized, but in Arabidopsis, AtSIPs act as functional water channels [71].
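The ar/R and Froger's-position comparisons above amount to reading residues at fixed alignment columns and matching them against known combinations. A minimal sketch follows; the lookup table contains only the combinations cited in this section, the helix/loop labels (H2, H5, LE1, LE2) follow the usual ar/R convention, and the alignment column indices passed in are hypothetical, since they depend on the alignment used. The function names are also hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# ar/R residues in the order H2, H5, LE1, LE2 (helix 2, helix 5, loop E).
Signature = Tuple[str, str, str, str]

# Only the combinations cited in this section; not an exhaustive reference.
KNOWN_AR_R: Dict[Signature, str] = {
    ("F", "H", "G", "R"): "PIP-type (as reported for ZmPIPs)",
    ("H", "I", "G", "R"): "TIP-type",
    ("H", "I", "A", "V"): "TIP-type",
    ("G", "S", "G", "R"): "NIP-type: water, Si and B (Arabidopsis, sweet orange)",
    ("A", "I", "G", "R"): "AtNIP5-like: As and B, not Si",
}


def extract_residues(aligned_seq: str, columns: Sequence[int]) -> Tuple[str, ...]:
    """Read residues at 0-based alignment columns (gaps appear as '-')."""
    return tuple(aligned_seq[c].upper() for c in columns)


def classify_ar_r(signature: Sequence[str]) -> Optional[str]:
    """Return the label for a known ar/R combination, or None if not listed."""
    return KNOWN_AR_R.get(tuple(r.upper() for r in signature))


def is_pip_co2_signature(froger: Sequence[str]) -> bool:
    """True if Froger's positions P2-P5 read S, A, F, W (input is P1..P5)."""
    return tuple(r.upper() for r in froger[1:5]) == ("S", "A", "F", "W")
```

Returning None for unlisted combinations keeps unusual filters visible for manual review instead of forcing them into a subfamily.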
Differences in ar/R selectivity filter residues might alter substrate selectivity, so point mutations in these residues can either increase or decrease the transport capacity of aquaporin proteins [70,72,73]. The variation in these conserved segments reflects the evolutionary diversity among ZmAQPs. Besides the NPA and ar/R filter regions, other important motifs were predicted with the MEME motif discovery server (Figure 3), including phosphoserine, amidation, casein kinase II phosphorylation, N-myristoylation, PK_Phospho and phosphothreonine sites, as well as some novel motifs. Such motifs also play essential roles in the organization and regulation of the MIP domain and can sometimes be targeted for the regulation of AQPs and their interaction with other cellular components. The predicted subcellular localizations of ZmAQPs (plasma membrane, vacuolar membrane and mitochondrial membrane) correspond to reports in other plant species, including soybean, sorghum, rubber tree, sweet orange, common bean and moso bamboo (see Table 1). ZmTIPs are mainly localized in the vacuole, where they may help control osmotic potential, ZmPIPs are integrated into the plasma membrane, and ZmNIPs occur in various membranes; these findings agree with the literature [74,75].
We presented a schematic of the gene structures of all ZmAQPs together with their evolutionary relationships (Figure 5), which was not shown in Chaumont [3]. Our results agree with previous studies in other plant species [17,61,65]. Most ZmAQPs have gene structures similar to those of their orthologs in other plants. Among TIPs, for example, ZmTIP1, SbTIP1 (sorghum) [60], MaTIP1 (banana) [65] and PeTIP1 (moso bamboo) [61] all have two exons; ZmTIP2 is also consistent with its orthologs in sorghum, banana and moso bamboo, except ZmTIP2;1, which has an additional exon compared with these species [17,61,65]. ZmTIP3 resembles its sorghum ortholog but diverges somewhat from those of moso bamboo and banana. The other subfamilies showed similar patterns. Thus, most of our findings agree with previous studies (Table 1). We therefore suggest that the loss or gain of exons in ZmAQP genes may have arisen under natural selection [28,76]. Previous studies have shown that exon loss or gain is a common feature of evolution in plant genomes [76,77].
The chromosomal location of a gene can provide information about its expression capacity [78]. The Chaumont study [3] did not report where ZmAQPs are located in the maize genome, whereas we mapped ZmAQPs onto the whole genome sequence. ZmAQP genes are distributed across all 10 chromosomes, often in groups, and genes of the same subfamily mostly form clusters within a 500 kb window. This clustering may reflect segmental duplication, a key mechanism of gene family expansion that increases genetic diversity [79]. This mechanism may also drive functional divergence by increasing the number of members of a given gene subfamily [80].

Conclusion
In this study, we used several bioinformatics tools for the genome-wide identification, analysis and characterization of aquaporins in the maize genome, and predicted several structural features together with their likely biological functions. The aquaporin gene family has been widely studied in many important crops, demonstrating its structural and functional diversity, and the availability of genome sequences made this study possible. The current study identified 41 aquaporins in Zea mays L., named them and classified them into four subfamilies. The structural and functional features of ZmAQPs were predicted, and a comparative phylogenetic study of ZmAQPs, CaAQPs and AtAQPs was conducted, which provided insights into the evolution of AQPs within plant species.
The results of this study not only provide valuable information for future functional analysis of ZmAQP genes but also serve as a reference for surveying gene family expansion in Zea mays and other crops from an evolutionary perspective.