Construction and analysis of protein–protein interaction networks based on nuclear proteomics data of the desiccation-tolerant Xerophyta schlechteri leaves subjected to dehydration stress

ABSTRACT In order to understand the mechanism of desiccation tolerance in Xerophyta schlechteri, we carried out an in silico study to identify hub proteins and functional modules in the nuclear proteome of the leaves. Protein–protein interaction networks were constructed and analyzed from proteome data obtained from Abdalla and Rafudeen. We constructed networks in Cytoscape using the GeneMania software and analyzed them using a Network Analyzer. Functional enrichment analysis of key proteins in the respective networks was done using GeneMania network enrichment analysis, and GO (Gene Ontology) terms were summarized using REViGO. Also, community analysis of differentially expressed proteins was conducted using the Cytoscape Apps, GeneMania and ClusterMaker. Functional modules associated with the communities were identified using an online tool, ShinyGO. We identified HSP 70–2 as the super-hub protein among the up-regulated proteins. On the other hand, 40S ribosomal protein S2–3 (a protein added by GeneMANIA) was identified as a super-hub protein associated with the down-regulated proteins. For up-regulated proteins, the enriched biological process terms were those associated with chromatin organization and negative regulation of transcription. In the down-regulated protein-set, terms associated with protein synthesis were significantly enriched. Community analysis identified three functional modules that can be categorized as chromatin organization, anti-oxidant activity and metabolic processes.


Introduction
Resurrection plants can survive extreme water loss and survive long periods in an abiotic state and, upon watering, rapidly restore their normal metabolism [2]. Understanding the mechanisms of desiccation tolerance (DT) in resurrection plants is important as they are deemed to be an excellent model to study the mechanisms associated with the desiccation tolerance phenomenon.
Proteomics can generate expression information for the proteins in a sample at a given time point [3]. Thus, through proteomics, proteins that are up-regulated or down-regulated in response to perturbation are identified -the so-called differentially expressed proteins (DEPs) [4]. Proteomic profiling offers the opportunity to identify proteins that mediate the pathways involved in the DT mechanisms, when cells are subjected to desiccation stress. Several proteomics investigations have been carried out for the leaves of some angiosperm resurrection plants during desiccation [1], [5], [6], [7], [8], [9], [10]. However, in all these works, no attempt was made to gain deeper insights from the data using such bioinformatics approaches as protein-protein interaction (PPI) studies. PPI analysis can be used to carry out a rigorous investigation of proteomics data, producing novel insights and testable hypothesis [11], [12], [13], [14], [15], [16].
Generally, PPI studies have been extensively explored to investigate the biological significance of differential expression data and datasets, e.g. [17], [18], [19]. Given a list of proteins or genes, network biology algorithms make it possible to identify additional associated proteins [20]. This, arguably, is ideal when studying orphan species such as the resurrection plants. Often, high-throughput experiments in orphan plants do not lead to the identification of all proteins in a sample [9], [21], [22]. This is because protein IDs rely on inferring from homologs of distantly related species. In addition, analysis of biological networks makes it possible to identify what are called 'hubs'. Hubs in a network are the highly connected nodes in a scalefree system and have come to be associated with the key genes in the networks [23], [24] Therefore, re-analysis of data deposited and stored in public databases or found in publications can provide valuable clues for new research. In the past decade, advances in highthroughput proteomics and interactomics technologies resulted in an accumulation of numerous predicted plant PPIs [25], [26], [27].
Despite the importance of resurrection plants as potential model organisms in studies that focus on the improvement of the desiccation stress tolerance of crops [28] and, despite major advances in the interactome field [29], [30], [31], [32], there is still a dearth of literature exploiting PPIs in these plants.
In this study, we created interaction networks of differentially expressed X. schlechteri (formerly called X. viscosa) [33] nuclear proteins as well as observing additional related proteins predicted by GeneMANIA. PPIs analysis is important because it leads to the identification of hubs. Many plant proteins identified as hubs play a role in the plant stress response [24]. Specifically, in plant stress responserelated networks, hubs are assumed to play an essential role in the signal transduction cascade following stress detection [34].
The hubs participate in many PPIs and have significant central roles in signaling pathways (reviewed in [24]. Plant hubs can be classified into the functional categories of kinases and phosphatases, transcription factors or components of ubiquitin ligase complexes [24]. Other hub proteins include heat shock proteins (HSPs), ribosomal proteins, proteins involved in redox pathways, proteins involved in DNA repair mechanisms and cytoskeletoninteracting proteins that play a role in protecting cells and cellular components, thus making them essential and often conserved [24], [34], [35], [36], [37]. In addition to these hubs, there are other highly interacting hubs that are still uncharacterized or whose functions are not completely understand [24], [38].

Methods
A summary of the methodology used in this study is presented in Figure 1.

Protein data pre-processing
The initial protein list used in PPI analyses was downloaded from the iTRAQ proteomics data from Abdalla and Rafudeen [1]. As far as we are aware, the publication is currently the most comprehensive study on resurrection plant nuclear proteome. The nuclear proteins have a variety of biological functions and are arranged into intricate regulatory networks [39]. Therefore, characterizing the nuclear proteome is crucial for understanding gene regulation in response to desiccation in resurrection plants . The iTRAQ experiment identified 122 proteins  with 28 up-regulated, 13 down-regulated and 81 unchanged, in response to desiccation. The GenInfo Identifiers (GI numbers) used for protein IDs in the primary paper were replaced with Arabidopsis Information Resource (TAIR) gene identifiers using the UniProt ID mapping tool (www.uniprot.org) and gProfiler [40]. The conversion to TAIR gene identifiers was necessary to ensure compatibility with the bioinformatics software used in our study. The resulting gene/ protein lists are presented in the Appendix (Table S1).

Construction of PPI networks of DEPs
In order to predict the PPIs between the DEPs, Cytoscape v3.9.0 [41] coupled with the GeneMania v3.5.2 [42] application were used, with Arabidopsis thaliana as the source organism. Two interaction networks were constructed using i) up-regulated proteins and ii) down-regulated proteins. The complexity of the networks was reduced by removing duplicate edges and self-loops. The use of GeneMania allows the addition of new genes to the query list that are predicted to be associated with DEPs. Given a query list, GeneMANIA extends the list by adding functionally similar genes that it identifies using available proteomics and genomics data [42], [43] GeneMania also allows the evaluation of relationships of proteins in the respective networks in terms of co-expression, physical and genetic interactions, pathways, colocalization, protein domain similarity, and predicted interactions.

Identification of Hub proteins
The hub proteins were identified by calculating the node degree distribution by using the Network Analyzer application of Cytoscape [44]. The degree of a node represents the number of edges linked to nodes, and the node with the highest degree represents significant biological functions [20], [45]. A protein with the highest degree distribution was considered as a hub in the respective network.

Construction of a sub-network consisting of key proteins
The nodes with high degree distribution are viewed as being key to the PPI network [46] and can be considered the most important proteins in the DT mechanism. In this study, for each PPI network, the top 10% of the nodes based on degree distribution were considered as being the key proteins in the response to DT. These key proteins were constructed into a PPI network from which functional enrichment analysis was carried out.

Functional enrichment analysis of key proteins
In order to obtain deeper insights into the biological reality modeled by the key proteins in each constructed PPI network, we carried out functional enrichment analysis using GeneMania with default settings. Functional enrichment analysis of these key proteins enables a better understanding of the biological relevance of the proteins in response to desiccation stress. GeneMania allowed for the analysis of the statistically overrepresented GO BP (Gene Ontology Biological Process) terms [47]. To reduce the GO term redundancy, REViGO [48] was used. Gene ontology provides a common framework for functionally annotating protein/gene sets [47], [49], [50]. REViGO takes long lists of GO terms and summarizes them by removing the redundant terms. REViGO was implemented with default parameters.

Community structure analysis to identify functional modules
PPI networks are usually comprised of several subnetworks or functional modules that contribute to various diverse biological processes and pathways. On its own, a node may have negligible influence on the global network or global properties, yet, in a sub-network, it may have a significant influence with specific functionality [51]. Therefore, in a study like ours, it is essential to construct a network consisting of all the DEPs and carry out cluster analysis to identify densely connected regions [51]. Various clustering approaches have been suggested in the literature [52], [53], [54], [55], [56]. In this study, we used community clustering (GLay) [57] to find functional modules (communities) among the DEPs. This was achieved by using Cytoscape coupled with clusterMaker2 v1.3.1 [58]. For community analysis, we constructed a network that was a combination of the up-and down-regulated proteins. To identify the statistically overrepresented GO biological processes within each cluster (communities), the detected clusters were subjected to functional enrichment analyses using the ShinyGO v0.74 tool [59]. The ShinyGO parameters were P-value cutoff (FDR) − 0.05, species -A. thaliana, background -all the proteins detected by the proteomics experiment + their predicted associates. For functional enrichment analysis, only communities with at least 10 nodes were analyzed.

Results and discussion
The role of proteins in different plant biotic and abiotic stress responses is crucial because proteins are the key executors of cellular mechanisms involved in maintaining cellular homeostasis [4]. Also, proteins participate in forming new plant phenotypes to adapt to environmental changes [4]. However, the actions of individual proteins usually do not account for the complex signaling networks and the dynamic regulation of cellular processes involved in the plant response to different stresses [47]. Thus, results from a proteomics experiment may need to be investigated further by such bioinformatics approaches such as PPI studies to obtain a more holistic picture of plant stress response mechanisms. With network biology, it is possible to study the stress response mechanisms in cells by studying the interaction dynamics of the proteins. The interactome may be graphically represented and interpreted by constructing a network in which a protein is represented by a node, while edges represent interactions between the nodes [20]. Generally, biological networks exhibit scalefree properties, where only a few nodes have many connections that represent hubs in the network [24]. In the PPIs network, the nodes with high degrees are defined as the hub proteins [60]. It is widely considered that proteins with high connectivity (hub proteins) play more important biological roles than non-hub proteins, i.e., proteins with low connectivity [45].

PPI networks of DEPs
An initial set of 28 up-regulated and 13 down-regulated proteins were analyzed in this study. GeneMANIA was used to predict interactions between the DEPs and additional related proteins in the networks, using A. thaliana as the source organism. From the up-regulated and downregulated protein lists, GeneMANIA could recognize additional 20 proteins for both the up-regulated and downregulated initial query lists. The PPI networks for the DEPs are shown in Figures 2 and 3. It is important to note that the proteins added by GeneMANIA are not necessarily differentially expressed in response to desiccation. Rather, these are proteins predicted to be functionally linked to the DEPs.
In this work, we considered the nodes with the highest degrees distribution as the hub proteins in the PPIs. Using this criterion, the 'super-hub' protein for the network constructed from up-regulated proteins, and their predicted associates, was HSP81-2, with a degree distribution of 29. This protein was found in the proteomics experiment among the DEPs. The 'super-hub' protein for the downregulated protein set, and their predicted functionally linked proteins, is RPS2C (40S ribosomal protein S2-3) with a degree distribution of 23. Interestingly, this protein was not in the initial query list but was predicted by GeneMania to be functionally related with the downregulated proteins.

Network analysis and functional enrichment analysis of key proteins
In this study, the nodes with a large degree were regarded as 'key', and we considered the sub-network of these proteins for further investigation of the key biological processes associated with them. The key proteins (Tables 1 and 2) were subjected to GeneMania to construct sub-networks for both the up-regulated proteins (and their predicted associates) and down-regulated proteins (and their predicted associates). In order to understand the biological processes associated with the key proteins, these proteins were then subjected to network-based functional enrichment analysis in GeneMania.

PPI network of key proteins for the up-regulated proteins and predicted associated proteins
HSP81-2, which is the hub protein in the network derived from the up-regulated proteins is also the hub in the corresponding sub-network, with a degree distribution of 29 (Table 1 and Figure 4). Proteins in the HSP family are often identified as hubs in plant network [24]. HSPs or chaperone proteins, are essential for cellular homeostasis and are known to be responsible for proper protein folding localization [61]. After exposure to stressful conditions, such as high temperature and drought, HSPs also direct the abnormal/damaged protein toward the protein degradation machinery for degradation [61], [62], [63], [64], [65], [66]. During drying, cellular contents become concentrated, thereby increasing the likelihood of inappropriate molecular interactions and membrane appression [67]. Ultimately, the lack of sufficient water to surround macromolecules can cause their denaturation and loss of membrane integrity [50], [68], [69]. This loss of macromolecular structural integrity must be mitigated for plant survival during stressful conditions. Desiccation-induced proteins, such as HSPs, are believed to act directly as protective molecules during desiccation and rehydration in resurrection plants [70], [71] Given the critical role of chaperones, it is thus not surprising that HSP81-2, which has chaperonin activity, is a hub in the PPI network of the up-regulated proteins.
The other key proteins in this network are proteins in the histone family. Generally, histones have not been regarded as hubs in PPI networks. However, the fact that histones have been highly interacting in the present work make them interesting targets for future research, especially concerning desiccation stress response. Generally, histones are structural proteins that function to organize eukaryotic DNA [72]. There are four core histones (H2A, H2B, H3, and H4) which are involved in the packaging and maintenance of chromosomes [73], [74] Both histones H2A and H2B play a role in the regulation of transcription and gene expression [72], [75] The fact that we identified these two among the key proteins in our bioinformatics study, suggest that they are important in the DT mechanism in X. schlechteri. Histone H2B also plays an essential role in DNA replication and repair [72]. Abiotic stresses are known to cause DNA damage in plant cells [76], [77] a situation that compromises cellular integrity and function if the damage is not repaired. Our results, therefore, suggest an important role of H2B in DNA repair in response to desiccation stress in X. schlechteri.    The overrepresented GO biological process terms include 'regulation of protein stability', 'chromatin organization' and 'negative regulation of transcription, DNA templeted' (see Table 2 and Figure 5). The over-representation of these GO biological process terms shows their importance in stressed cells to ensure the maintenance of protein stability during desiccation and the need to ensure the minimization of synthesis  of proteins that are not essential for survival. The minimization of transcription in non-essential processes such as growth and reproduction means that resources are directed toward synthesis of protectant proteins such as antioxidants and chaperonins [6], [21]. Generally, plant stress response involves chromatin re-organization (reviewed in [78]). Such chromatin reorganization includes changes in the nuclear structure [78]. It has been previously suggested that chromatin modifications may be partly responsible for gene regulatory changes necessary for plant DT [79], [80], [81]. Thus, it is not surprising that the GO term 'chromatin organization' is overrepresented in proteins from upregulated proteins network, as this points to the need to ensure the maintenance of chromatin integrity during desiccation stress.

PPI network of key proteins for the down-regulated proteins and predicted associated proteins
The key proteins in this PPI network are 30S ribosomal protein S5, chloroplastic and 40S ribosomal protein S3a ( Figure 6 and Table 3). Generally, ribosomal proteins have been recognized as hubs in plant PPI networks [24]. Functional enrichment analysis shows that the enriched GO biological process terms included 'peptide biosynthetic processes' (Table 4 and Figure 7). These data suggest a general shutting down of translation and protein synthesis during desiccation stress. Water deficit stress has been observed to result in a very significant decrease in global translation rates [82], [83] The apparent downregulation of processes that are associated with protein synthesis is in agreement with the suggestion that plant development processes often need to be down-  regulated to give room to processes involved in protection mechanisms, such as antioxidants and chaperones. In the resurrection plants Craterostigma plantagineum and Haberlea rhodopensis, water deprivation leads to down-regulation of photosynthesisrelated transcripts, but up-regulation of transcripts encoding protective molecules such as late embryogenesis abundant proteins, sucrose, and lipocalins, among others (reviewed in [84]).

Community structure analysis
The purpose of community detection for PPI networks is to divide proteins into groups so that the proteins within each group are more similar to one another than those in the other groups [57], [85], [86]. Finding communities is very important, because the communities can reflect functional modules i.e. functional units within a system [86]. Modules are generally regarded as self-contained components of a system with well-defined interfaces with the other components [87]. Topological studies of PPIs can, therefore, help detect and define these modules [87]. In this work, community clustering studies on all DEPs resulted in three protein clusters that had more than 10 members each ( Figure 8). We then subjected each cluster to a functional enrichment analysis with the results shown in Figures 9-11. Cluster A constituted of 21 proteins. Of these, 11 came from the down-regulated proteins, while 8 came from the up-regulated proteins and the remainder were those predicted by GeneMana. The fact that proteins from the up-regulated set and those from the down-regulated set act together as a functional unit suggest a high level of crosstalk within the cell in response to desiccation. The overrepresented GO biological processes in this cluster are those associated with redox processes (Figure 9). The process of desiccation results in the accumulation of reactive oxygen species (ROS) in cells [88]. The accumulation of ROS causes damage to proteins, nucleic acids, lipids, membranes and organelles [22], [89]. Thus, the involvement of proteins associated with anti-oxidant activities is crucial to the survival of the plants. Studies in X. schlechteri leaves have shown that    peroxidases are up-regulated during the dehydration process and down-regulated upon rehydration [6]. Cluster B consisted of 21 proteins (8 from upregulated proteins and 11 from down-regulated proteins). This cluster is involved in chromatin organization. The importance of maintaining chromatin integrity has been discussed in the section on 'PPI network of key proteins for the up-regulated proteins and predicted associated proteins'.
Cluster C consisted of 17 proteins, of which 12 are from the up-regulated proteins. This cluster contains proteins with diverse metabolic processes that can be broadly classified into i) proteins involved in transport and localization, ii) regulation of metabolic processes and iii) proteins involved in carbohydrate metabolism. Generally, this cluster reflects the biological processes involved in the mobilization of resources that is required to get a co-ordinated response against desiccation stress.

Conclusion
Currently, proteomics studies to understand the mechanisms of DT in resurrection plants have generated a considerable amount of data on the DEPs.  However, only a few studies have carried out a thorough investigation using bioinformatics approaches. Mining these DEPs and constructing PPI networks could be regarded as an effective way to explore the biological significance of the DEPs. Firstly, building networks can result in the identification of associated proteins that are otherwise not detected in a conventional proteomics investigation. This is especially important in plant proteomics involving 'orphan species' where lists of identified proteins are usually small owing to the lack of specific databases for protein identification. The networks we constructed were all characterized by a small number of highly connected proteins (nodes), while the majority of nodes have only a few connections, which is the classical characteristic of PPI networks [90].
We have used bioinformatics analysis to get more insights from the nuclear proteomics data presented by Abdalla and Rafudeen [1]. The new data produced by our study include the identification of hub proteins in the up-regulated and down-regulated protein sets. Specifically, we showed that the hub proteins for the up-regulated genes were HSP81-2 while 40S ribosomal protein S2-3 was the hub for the downregulated proteins. Second, we identified functional modules for the DEPs. These functional modules can be grouped into 'redox proteins', 'chromatin organization' and 'general metabolism'. Overall, our work suggests that in order to ensure the survival of the plant during desiccation, X. schlechteri increases the synthesis of protectant proteins while minimizing the synthesis of non-essential proteins. Our work represents the first effort to use networks biology to understand the mechanisms of DT in resurrection plants.
Understanding the cellular mechanisms of DT in resurrection plants may enable future molecular improvement of drought tolerance in crop plants. Thus, resurrection plants are potential sources for gene discovery. The identification of hub proteins/genes, especially from the up-regulated protein list, can be a potential source for gene discovery. We, however, note that, network biology is meant to be a hypothesis generation process [91], [92], [93]. Thus, the results presented in this study need to be validated experimentally for the predicted interaction networks and the resulting bioinformatics inference to be considered as factual. For example, the predicted PPIs in the results are not from organ (leaf) specific databases and cannot be expected to be 100% accurate. Also, PPI analysis in non-model organisms involves inferences based on organisms such as A. thaliana, which are often evolutionarily distantly related. This is an inherent problem associated with bioinformatics of orphan species, such as resurrection plants, whose genomes are either not fully sequenced or are not fully annotated. The primary proteomics experiment identified only a few proteins (122), whose PPIs were analyzed in our work. Potentially, if thousands of proteins could be identified in a proteomics study, additional or new hub proteins and pathways can be identified. Given these limitations, our work can only be considered as a first pilot study for future analysis.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.