In silico analysis of protein-protein interfaces: Tools and approaches

Two or more proteins interact in vivo to perform complex molecular functions, including catalysis, regulation, assembly, immunity and inhibition, through the formation of stable interfaces. This interaction is governed by several factors that are selective, sensitive and specific in nature. Several interface features have been documented since 1975. Studying these interface features and their dynamics during interaction with different proteins helps in understanding the mechanisms underlying diverse molecular functions and biological processes. Computational tools greatly assist in studying the interface features that determine the interaction between two or more proteins. In this context, this review enumerates the interface features reported thus far, along with the tools that aid in deciphering protein features (physicochemical characteristics, binding site and interface residue prediction, and hotspot residues) and the approaches employed in predicting these features. The review also discusses the advantages and limitations of the experimental techniques and computational biology tools deployed for deciphering protein-protein interactions. Altogether, the review provides insights into the optimal tools and strategies for protein interaction studies, which should help researchers understand the structural features of proteins and the molecular principles of protein-protein interactions with known functions.


INTRODUCTION
Proteins are fundamental elements of every biological process in a living organism. Two or more proteins communicate with one another to carry out specific molecular functions, including catalysis, regulation, assembly, immunity and inhibition, with high selectivity, specificity and sensitivity through the formation of a stable interface (Schreiber and Keating, 2011). Hence, studies of protein-protein interactions (PPI) play a vital role in understanding the molecular mechanisms of several biological processes in a cell.
Developments in high-throughput in-vivo and in-vitro techniques make it feasible to determine protein-protein interactions experimentally. However, the experimental techniques come with their own limitations: they are time-consuming, costly and laborious, and they suffer from high false positive rates and inaccuracies in data generation that affect the interaction results. These limitations pave the way for in-silico methods, which produce results close to those determined using experimental techniques (Mrowka, 2001).
Protein interactions have been comprehensively analyzed since 1975 for several physicochemical features at the interface, as these features help in decoding the molecular principles underlying different protein functions (Jones and Thornton, 1995). Protein interface features guide proteins to select their partners with high precision through the formation of a stable interface. Protein-protein interface features are studied using the 3D structures available in the Protein Data Bank (PDB). Protein interfaces are characterized by the physicochemical features of the interface, which in turn depend on the varying strength of protein association among different protein complexes.
Proteins associate with multiple partners at varying intensity, forming many interfaces that result in permanent or transient associations. Prediction of binding sites at weak or transient protein interfaces is an intricate process due to the limited availability of transient protein 3D structures in the PDB (Ozbabacan et al., 2011). Interactions between the two proteins of a dimer are considered strong, as the monomers are rarely functional (Jones and Thornton, 1995). Hence, the structural information of an interaction combined with the deciphered interface features is important in understanding the molecular mechanism of a biological function.
Therefore, in this review we present the protein interface features reported thus far, along with the currently available and updated tools that aid in deciphering protein features. We also discuss the advantages and limitations of the experimental techniques and computational biology tools deployed for deciphering protein-protein interactions.

Protein-Protein Interface Structural Features
The protein interface (Figure 1) is the part of the surface that comes into contact with other proteins during binding. The physicochemical features of the interface play an important role in understanding the principles of protein interaction and disease mechanisms. This understanding can lead to the discovery of optimal therapeutics for diseases. Hence, we present here a comprehensive review of the interface features discussed in the literature since 1975 (Table 1).
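
The definition above can be made concrete with a distance criterion. The sketch below is an illustration only and is not taken from the studies cited in this review: it treats a residue as an interface residue when any of its atoms lies within a cutoff distance of any atom of the partner chain. The 4.5 Å cutoff, the toy coordinates and the function name `interface_residues` are all assumptions.

```python
import math

def interface_residues(chain_a, chain_b, cutoff=4.5):
    """Return residue ids of chain_a with any atom within `cutoff` angstroms of chain_b.

    Each chain is a dict mapping a residue id to a list of (x, y, z) atom coordinates.
    The cutoff value is an assumed, illustrative choice.
    """
    partner_atoms = [atom for atoms in chain_b.values() for atom in atoms]
    found = []
    for res_id, atoms in chain_a.items():
        # A single close atom pair is enough to call the residue an interface residue.
        if any(math.dist(a, b) <= cutoff for a in atoms for b in partner_atoms):
            found.append(res_id)
    return found

# Toy coordinates (hypothetical, not taken from a PDB entry).
chain_a = {"A:10": [(0.0, 0.0, 0.0)], "A:11": [(20.0, 0.0, 0.0)]}
chain_b = {"B:5": [(3.0, 0.0, 0.0)]}
print(interface_residues(chain_a, chain_b))
```

With real structures the coordinates would come from a PDB file, and the all-pairs loop would normally be replaced by a spatial index for speed.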

Role of Hydrophobicity In Protein-Protein Interaction
The role of hydrophobicity in protein-protein interaction, and the structures responsible for strong binding, have been examined in the following studies. Chothia and Janin (1975) showed that interfaces are closely packed and concluded that they are stabilized mainly by hydrophobicity, while complementarity plays the selective role in determining which proteins associate. Korn and Burnett (1991) reviewed 40 multisubunit proteins and 2 protein-protein complexes and described the hydropathy distribution as higher at the interface than at the surface but lower than in the core. They also devised a tool called hydropathy complementarity to quantify the strength of the hydropathy distribution in a protein. About two decades after the first of these studies, Jones and Thornton (1995) structurally characterized protein interfaces, compared interface residues with surface and core residues, and found that hydrophobic residues are more abundant at the interface than at the surface but less abundant than in the protein interior. Xu et al. (1997) examined the importance of the hydrophobic effect at the interface using 362 non-redundant protein interfaces and 57 oligomeric interfaces and found that the effect of hydrophobicity in protein binding is not as strong as it is in protein folding. Together, these findings show the importance of hydrophobicity, hydrophobic effects, hydrophobic residues and hydrophobic patches at the interface and their impact on stability in protein binding.
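
The core > interface > surface ordering described above can be illustrated with a mean hydropathy score. The following sketch uses the Kyte-Doolittle hydropathy scale, which is a substitution made here for illustration and is not the hydropathy complementarity measure of Korn and Burnett (1991). The three residue sets are hypothetical and were chosen to mirror the reported trend; they are not data from any of the cited studies.

```python
# Kyte-Doolittle hydropathy index per residue (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(residues):
    """Average Kyte-Doolittle hydropathy over a string of one-letter residue codes."""
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)

# Hypothetical residue sets for one protein region each (illustration only).
core = "ILVFA"
interface = "LAYSG"
surface = "KDESN"

for name, residues in [("core", core), ("interface", interface), ("surface", surface)]:
    print(name, round(mean_hydropathy(residues), 2))
```

In practice the three residue sets would be derived from a solved complex, for example by classifying residues according to their solvent accessibility in the free and bound states.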

Function of Interface
Nearly a decade later, the role of the subunit interface in stabilizing protein-protein binding was studied. Miller et al. (1987), working with 23 oligomeric proteins, recognized that the surface area within and between oligomeric proteins is directly proportional to the relative molecular mass, and stated that the exposure of the surface to the solvent varies slightly between the core and the interface region. This finding may have a significant role in the stability and activity of oligomeric protein interfaces. Caffrey (2004) tested 64 protein interfaces with conservation scores obtained from two types of multiple sequence alignment, recognized fewer alignment gaps at obligate interfaces than at transient interfaces, and reported that buried interface residues are more conserved than partially buried residues. Bahadur et al. (2004) used interface area, crystal packing, residue propensities and hydrophobic interaction energy to measure specific interfaces (70 protein-protein complexes) and non-specific interfaces (188 monomeric proteins) and observed that atomic packing is less compact in non-specific interfaces than in specific interfaces. Vaishnavi et al. (2010), using 156 heterodimer protein interfaces, reported that the variation in the size of the interacting protein partners is a significant factor in deciding the mode of interaction. Conte et al. (1999) analyzed 75 protein-protein interfaces with known structure and observed small interfaces with small conformational changes and large interfaces with large conformational changes; they also reported that some interfaces are predominantly polar and some are non-polar, with charged residues above average. Jones and Thornton (1997) analyzed protein interfaces using surface patches described by six features, namely solvation potential, hydrophobicity, accessible surface area (ASA), interface residue propensity, planarity and protrusion (SHARP2), to distinguish interface patches from surface patches and to identify the position of recognition sites. Xu et al. (1997) studied hydrogen bonds and salt bridges across 319 non-redundant protein-protein interfaces and acknowledged the presence of a pattern of charge complementarity and the conservation of hydrogen bonds at the interface. They also illustrated the importance of specificity at the protein interface and its application in docking and molecular design. Chakrabarti and Janin (2002), using a dataset of 70 protein interfaces, described small interfaces as having a single patch and large interfaces as enclosing multiple patches, where each patch buries interface atoms surrounded by a rim. Guharoy and Chakrabarti (2010) utilized 121 homodimers and 392 heterocomplexes and reported that conserved residues at protein interfaces are not random but are distinctly clustered. They also reported that buried residues at the interface are more conserved than partially buried residues. It has recently been hypothesized that evolutionary constraints on signal transduction and enzyme interfaces play an important role in characterizing the interacting surface, while energetic constraints are decisive in immune, inhibitor and structural assembly interfaces (Marchetti et al., 2019). Li et al. (2006) found that hotspot residues play a vital role in determining the stability of interacting proteins. Hotspots possess optimal energy and are therefore the most favored residues at the interface. Gromiha et al. (2009) designed an energy-based approach to understand the mechanism of recognition at the protein interface and found that charged and aromatic residues play an important role in protein binding.
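
Several of the studies above rely on interface area. A widely used way to compute it, sketched here as an assumption and not as the exact procedure of any cited study, is to subtract the accessible surface area (ASA) of the complex from the summed ASA of the isolated partners. The function names and the numerical values below are hypothetical.

```python
def buried_surface_area(asa_a, asa_b, asa_complex):
    """Total surface area (in square angstroms) buried when partners A and B associate."""
    return asa_a + asa_b - asa_complex

def interface_area_per_partner(asa_a, asa_b, asa_complex):
    """Half of the buried surface area, i.e. the average area contributed by each partner."""
    return buried_surface_area(asa_a, asa_b, asa_complex) / 2.0

# Hypothetical ASA values for two isolated chains and their complex.
asa_a, asa_b, asa_complex = 8000.0, 6000.0, 12400.0
print(buried_surface_area(asa_a, asa_b, asa_complex))
print(interface_area_per_partner(asa_a, asa_b, asa_complex))
```

Whether a paper reports the total buried area or the per-partner value differs between studies, so the convention should be checked before comparing numbers.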

Formation of Homodimer and Heterodimer Interfaces
Jones and Thornton (1995) examined the factors that influence the formation of protein-protein associations in homodimers, heterodimers, enzyme-inhibitor complexes and antigen-protein complexes, and found that obligatory (permanent) protein interfaces are closely packed, with fewer hydrogen bonds than non-obligatory (transient) interfaces. Zhanhua et al. (2005) identified critical heterodimer interface parameters through multidimensional scaling in Euclidean space using 65 heterodimers with known 3D structures. Sowmya et al. (2015) analyzed 278 heterodimer protein interfaces using five key features, namely interface area, abundance of interface polar residues, hydrogen bonds, solvation free energy gain and binding energy, and found that salt bridges increase with interface area in regulator-inhibitor interfaces. This suggests that regulator-inhibitor interfaces are held together mainly by salt bridges through electrostatic interactions. Nilofer et al. (2020) showed that protein interfaces are dominated by van der Waals (vdW) interactions and that H-bonds and electrostatics play a discerning role in broad functional specificity: with increasing interface size (number of interface residues), H-bonds increase in obligatory and immune complexes, while electrostatics increase in non-obligatory regulator-inhibitor complexes. By analyzing 2557 homodimer and 393 heterodimer interfaces using vdW, hydrogen bonds and electrostatics, they also reported that small interfaces are rich in electrostatics and are often linked to regulatory function.

Experimental Methods and Their Limitations
Determination of protein-protein interactions (PPI) using experimental methods depends on the physical and chemical characteristics of the protein interface (Szilágyi et al., 2005). Advancements in experimental methods are promising, yet precision and coverage remain limited. Hence, consensus results from different experimental approaches should be considered for PPI prediction. Experimental methods include in-vivo and in-vitro techniques. In-vivo techniques include yeast two-hybrid and affinity purification followed by mass spectrometry, and in-vitro techniques include co-immunoprecipitation, pull-down assay and label transfer assay; these have been discussed elsewhere (Fields and Song, 1989; Piehler, 2005). Computational biology approaches therefore play a pivotal role in selecting the important data for high-throughput experimental analysis, thereby reducing cost, time and the false positive rate.

Protein-Protein Interface Tools and Approaches
The protein interface is the significant part of a protein-protein interaction. Analyzing protein interfaces is therefore important in understanding the molecular mechanisms underlying biological function. The protein-protein interface is characterized by its physicochemical features, interface residues and hotspot residues (Table 2). Hence, we present here a comparative discussion of the available tools and their different approaches under each category.

Tools for The Prediction of Physicochemical Features at the Interface
ProFace (Saha et al., 2006) is a web-based suite of programs used for calculating physicochemical parameters across the interface formed between two or more subunits. It also helps in identifying the interface residues at protein recognition sites. PIC (Protein Interactions Calculator) (Tina et al., 2007) estimates various interactions, including disulphide bonds, hydrophobic interactions, ionic interactions, hydrogen bonds, and aromatic-aromatic, aromatic-sulphur and cation-π interactions, within and between proteins using standard criteria. The server also analyzes the accessible surface area and the distance of a residue from the surface of a protein. Each interaction can be visualized using the Jmol or RasMol visualization tools. ProtorP (Reynolds et al., 2009) is a web server that helps in predicting the physicochemical features (size and shape, intermolecular bonding, residue and atom composition, and secondary structure contributions) at protein recognition sites that contribute to the overall binding energy of the interaction. PPCheck (Sukhwal and Sowdhamini, 2015) is a web server that computes total stabilizing energy, hydrogen bonds, hydrophobic interactions, salt bridges, favourable electrostatic interactions, unfavourable electrostatic interactions and short contacts by assigning pseudoenergies to quantify the strength of a protein-protein interaction. The server also identifies hotspot residues, performs computational alanine scanning, assesses residue conservation and predicts the right docking pose. ProFunc (Laskowski et al., 2005) is a web-based server used to predict the function of a protein with known structure. PDBsum (Laskowski et al., 2018) is a web server that provides structural details on a given protein. The output is generally in image format, and the structural details include protein secondary structure, protein-ligand and protein-DNA interactions, PROCHECK analyses of structural quality, and many others. The output image files can be viewed using RasMol, PyMOL and a JavaScript viewer.
SHARP2, ProtorP and ProFace are eminent web servers for predicting physicochemical features at the protein interface. However, these web servers are no longer updated and are non-functional. Among the servers compared here, PIC is the only one that calculates physicochemical properties for both intra-protein and inter-protein interactions. PPCheck gives a complete analysis of a protein interaction in a single window, including residue conservation, the strength of the protein interaction, hotspot residues, computational alanine scanning and prediction of the right docking pose. PPCheck is therefore an independent platform for the analysis of protein-protein interactions. MAPPIS, however, also recognizes spatially conserved physicochemical features and identifies hotspot residues using multiple sequence alignment. PIC is restricted to the analysis of physicochemical properties, whereas PPCheck extends to other aspects of protein-protein interaction. PDBsum is among the finest web servers for providing the structural information of a protein in an image format.
HBPLUS (McDonald and Thornton, 1994) is a web-based program that calculates hydrogen bonds and their positions at the interface. The ConSurf server (Landau et al., 2005) is an automated web tool that recognizes the binding residues at the interface using evolutionary conservation scores. These scores are generated using a Bayesian method. PIER (Protein IntErface Recognition) (Kufareva et al., 2007) uses local statistical properties of the surface to identify binding residues at the atomic level for monomers. SPPIDER (Solvent accessibility based Protein-Protein Interface iDEntification and Recognition) (Porollo and Meller, 2006) is a protein interface residue prediction method that uses the relative solvent accessibility (RSA) of an amino acid residue as a fingerprint. SPPIDER was developed using machine learning approaches, including Support Vector Machines and neural networks, together with other informative features. iFrag (Garcia-Garcia et al., 2017) is a sequence-based server that predicts protein interface residues using minimal sequence fragments. iFrag accepts a FASTA sequence as input.
iFrag is the only server that accepts sequence as input, whereas the other methods depend on the protein's structural information. SPPIDER's RSA fingerprint method is a novel approach and yields better results than the other methods. The ConSurf server bases its prediction on evolutionary conservation and the structural information of a protein complex.
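
To make the RSA quantity concrete: relative solvent accessibility is the observed accessible surface area of a residue divided by a reference maximum for that residue type. The sketch below only computes RSA and a simple exposed/buried call; it is not SPPIDER's algorithm. The reference maxima (a small subset), the 0.25 exposure threshold and the function names are assumptions made for illustration.

```python
# Approximate reference maximum ASA values in square angstroms (assumed, subset only).
MAX_ASA = {"ALA": 129.0, "ARG": 274.0, "GLY": 104.0, "LEU": 201.0, "LYS": 236.0}

def relative_solvent_accessibility(res_name, asa):
    """RSA = observed ASA / reference maximum ASA, capped at 1.0."""
    return min(asa / MAX_ASA[res_name], 1.0)

def is_exposed(res_name, asa, threshold=0.25):
    """Call a residue exposed when its RSA reaches an assumed threshold."""
    return relative_solvent_accessibility(res_name, asa) >= threshold

# Hypothetical per-residue ASA values.
print(relative_solvent_accessibility("GLY", 52.0))
print(is_exposed("LEU", 20.1))
print(is_exposed("LYS", 150.0))
```

Comparing the RSA of a residue in the isolated chain with its RSA in the complex is one simple way to flag residues that become buried on binding.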

Tools Predicting the Hotspot Residues at the Interface
Hotspot residues are the interface residues that contribute most to binding affinity, thereby increasing the total binding energy. Prediction of hotspot residues has a great impact in the fields of drug discovery and protein design. Some currently active hotspot residue predictors, which differ in approach and visualization, are described here.
PCRPi-W (Presaging Critical Residues in Protein interfaces-Web Server) (Mora et al., 2010) identifies protein interfacial hotspot residues by integrating structural, energetic and evolutionary measures using Bayesian Networks (BNs). SpotOn (Moreira et al., 2017) identifies and classifies interface residues as hotspots or null spots with high accuracy and sensitivity. The SpotOn algorithm was developed using a machine learning approach.
PCRPi-W accepts only PDB structural data as input, although 3D structural data are not available for all proteins. SpotOn, in contrast, accepts both structure and sequence data as input. Accuracy, sensitivity and specificity play a vital role in protein interfacial hotspot prediction. The SpotOn server reports hotspot residues in a table, and they can also be visualized using Jmol. In addition, it provides a sequence viewer that tabulates the probability of hotspot prediction. PCRPi-W, for its part, combines evolutionary measures with structural and energetic ones through Bayesian Networks to identify hotspot residues.
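
The hotspot / null spot distinction is commonly grounded in alanine scanning: a residue is labelled a hotspot when mutating it to alanine raises the binding free energy by at least a chosen threshold. The sketch below applies that convention with an assumed threshold of 2.0 kcal/mol. It does not reproduce the models behind PCRPi-W or SpotOn, and the residue labels and ΔΔG values are hypothetical.

```python
def classify_residues(ddg_by_residue, threshold=2.0):
    """Label each residue 'hotspot' or 'null spot' from its alanine-scanning ddG (kcal/mol).

    The threshold is an assumed convention and can be changed by the caller.
    """
    return {
        residue: "hotspot" if ddg >= threshold else "null spot"
        for residue, ddg in ddg_by_residue.items()
    }

# Hypothetical ddG values for three interface residues.
ddg_values = {"A:TRP104": 3.1, "A:SER62": 0.4, "B:ARG43": 2.0}
for residue, label in classify_residues(ddg_values).items():
    print(residue, label)
```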

Limitations in the Usage of Computational Biological Tools
Computational biology tools are available on the web as commercial (licensed) software and as free software (for educational purposes). Commercial tools are all-in-one platforms; they are expensive, but they are properly maintained and updated and come with customer support. Free trial versions of commercial tools are available with minimal features for 30 days upon payment agreement. In contrast, most free software is neither properly maintained nor updated, and problems with its operation are not properly addressed. In the majority of cases the published contact person's e-mail address no longer works, owing to a change of workplace (laboratory) or the completion of the project tenure. Moreover, free software often requests an educational e-mail address for access, which restricts its use by independent researchers and industry professionals. The links to some computational tools (for example SHARP2, ProtorP and ProFace) given in the published papers are not working. Furthermore, some tools are built with a CUI (Character User Interface) and must be operated from the command line, which researchers with no coding knowledge find difficult. Unlike wet-lab experiments, computational biology offers no one-platform, one-solution situation, as there are multiple tools with multiple approaches for a single biological problem. Hence, scientists have to run their query in multiple tools to trust the accuracy of their results. Computational biology tools require high-end computers or servers for operation; nevertheless, the results are not considered accurate until verified by wet-lab experiments. Computational tools also have reasonably high false positive and false negative rates. These limitations should be considered and rectified when developing computational biology tools.
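
Running one query through several tools naturally leads to a consensus step. A minimal sketch of majority voting over per-tool interface residue predictions follows; the tool names, the residue labels and the `min_votes` parameter are hypothetical, and real tools would first need their outputs mapped to a common residue numbering.

```python
from collections import Counter

def consensus_residues(predictions, min_votes=2):
    """Return residues predicted by at least `min_votes` tools, sorted by label.

    `predictions` maps a tool name to the list of residues that tool predicted.
    """
    votes = Counter(
        residue
        for residues in predictions.values()
        for residue in set(residues)  # count each tool at most once per residue
    )
    return sorted(residue for residue, count in votes.items() if count >= min_votes)

# Hypothetical outputs from three tools for the same protein chain.
predictions = {
    "tool_a": ["A:10", "A:12", "A:45"],
    "tool_b": ["A:12", "A:45"],
    "tool_c": ["A:45", "A:77"],
}
print(consensus_residues(predictions))
```

Raising `min_votes` trades coverage for confidence, which mirrors the false positive and false negative concerns noted above.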

CONCLUSIONS
Protein-protein interactions play a fundamental role in all molecular processes that carry out a biological function, and protein interface features play a pivotal role in choosing a selective, specific and sensitive partner for interaction. Protein interfaces are described using several features, such as hydrophobic effect, hydropathy, hydration, H-bonds, charge and shape complementarity, peptide segments, ion pairs, single and multiple patches, hydrophobicity, hydrophilicity, interface residue propensity, charged, aromatic and arginine residues, binding energy, hotspot residues, van der Waals interactions, electrostatics and salt bridges. These features help in deciphering the driving forces of protein association. Advancements in protein-protein docking methods, algorithms, tools and databases provide deep insights into protein interactions, but it remains difficult to mimic these features under in-vivo conditions with high accuracy. Hence, understanding the fundamentals of protein-protein interactions through interface features will provide deeper knowledge and clarity about stable interface formation. An examination and combination of both atomic and residue features at the interface, compared with the surface, in different functional protein complexes is required to differentiate protein interfaces. Therefore, studying the distinguishing features of different protein interfaces with known structure, and then analyzing the resulting recognition patterns of the interface, are critical in identifying the protein partners for association.

Funding Support
The authors declare that they have no funding support for this study.