A novel method for prediction of protein interaction sites based on integrated RBF neural networks
Introduction
With the new century, the genome project has been largely completed, and we have entered the post-genomic era [1], [2]. At this stage, attention should shift to proteins. According to current knowledge, proteins are much more complex than genes [3]. The number of genes is fixed and genes do not change. In contrast, proteins, which are the products of genes, are far more numerous. They are involved in essentially every process of life, and some of them change all the time. Because they are unstable, understanding them is difficult. Among all of these aspects, protein interactions are essential [4]. They control these processes and are the basis and hallmark of cellular life. For example, protein interactions are involved in metabolism, signal transduction, gene transcription, protein translation, modification, localization and so on. To study the principles of protein interactions, we first have to find the positions on a protein that take part in the interactions, called interaction sites. For these reasons, predicting protein interaction sites is a very important task.
With the development of bioinformatics, many methods have been used to predict protein interaction sites, for example neural networks, classification trees, Bayesian networks, support vector machines (SVM) [26], [27], [28], [29], [30] and so on. In recent years, researchers have done a great deal of work in this area. Iqbal et al. [5] used a hybrid rule-induction/likelihood-ratio based approach to predict protein–protein interactions. They selected four features, mRNA Co-expression (COE), MIPS Functional Similarity (MIPS), GO Functional Similarity (GOF) and Marginal Essentiality (MES), and obtained good results. Oh et al. [6] predicted protein binding sites based on three-dimensional protein modeling. They applied a two-stage, template-based ligand binding site prediction method to CASP8 targets and achieved high-quality results. Lan et al. [7] adopted SVM and focused mainly on feature generation and representation, which also improved classification performance. Liu et al. [8] also focused on features, but their method was based on PseAA (pseudo amino acid) composition and hybrid feature selection. Their prediction model was trained and tested with a k-nearest neighbors (KNN) learning system.
Predicting protein interaction sites is a binary classification problem. Interaction sites are labeled as ‘1’, and non-interaction sites are labeled as ‘0’. In this paper, RBF neural networks [9] were employed and trained using the particle swarm optimization (PSO) algorithm [10], [11]. Besides the conventional features, we selected some new features, such as entropy, relative entropy, conservation weight and sequence variability. We represented these features with six sliding windows containing 1, 3, 5, 7, 9 and 11 amino acid residues respectively. Each sliding window was fed into its own RBF neural network, and the results of these RBF neural networks were then integrated. We used two integration strategies, decision fusion (DF) [12] and Genetic Algorithm based Selected Ensemble (GASEN) [13]. The experiments showed that the proposed method performs better than other related methods.
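The sliding-window representation described above can be illustrated with a minimal sketch. It assumes each residue already has a numeric feature vector. The helper name `sliding_windows`, the toy feature values, and the zero padding at the chain ends are all assumptions made for illustration, since the text does not say how terminal residues are handled.

```python
from typing import Dict, List, Sequence

# Window lengths stated in the text (number of residues per window).
WINDOW_SIZES = (1, 3, 5, 7, 9, 11)


def sliding_windows(features: Sequence[Sequence[float]], size: int,
                    pad_value: float = 0.0) -> List[List[float]]:
    """Return one flattened feature vector per residue, centred on that residue.

    Padding the chain ends with `pad_value` is an assumption of this sketch.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("window size must be a positive odd integer")
    half = size // 2
    n_feat = len(features[0]) if features else 0
    pad = [pad_value] * n_feat
    # Pad `half` dummy residues on each side so every residue has a full window.
    padded = [pad] * half + [list(row) for row in features] + [pad] * half
    windows = []
    for i in range(len(features)):
        flat: List[float] = []
        for row in padded[i:i + size]:
            flat.extend(row)
        windows.append(flat)
    return windows


# Toy chain: 4 residues with 2 hypothetical features each.
toy = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.4, 4.0]]

# One input matrix per window size, i.e. one per network in the ensemble.
inputs: Dict[int, List[List[float]]] = {
    size: sliding_windows(toy, size) for size in WINDOW_SIZES
}
```

Each entry of `inputs` would serve as the input set for one of the six networks. The input dimension grows linearly with the window size, which is why each window needs its own network.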
Definition of protein interaction sites
To create a sample set, we must first define protein interaction sites. A site is an amino acid residue. If a residue is defined as a protein interaction site, it is labeled as ‘1’; otherwise it is labeled as ‘0’. We first introduce two terms:
MASA (monomer accessible surface area): the accessible surface area of a residue in a monomer (a single chain).
CASA (complex accessible surface area): the accessible surface area of a residue in a complex (containing one or several chains).
If
Data set
In this paper, a data set containing 38 proteins is used, the same one used by Wei Meng [23]. They selected this data set from the SPIN database. First, they excluded the strong specific signals generated by proteases and homodimers. They also excluded protein chains involved in several interactions, because they wanted to focus on dimers. The data were then filtered to remove chains labeled as membrane peptides, small proteins, or coiled coils in the SCOP
Visualization of experimental results
We validated our method on proteins 1npo and 1tmc. 1npo is a hormone-transport complex. It contains four chains, of which we used chain A, which has 95 residues. GASEN predicted 75 of these residues correctly. 1tmc is a histocompatibility antigen. It contains three chains, of which we used chain A, which has 175 residues. GASEN predicted 142 of these residues correctly (see Fig. 6, Fig. 7, Fig. 8, Fig. 9).
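The residue counts above translate directly into per-chain accuracy. The short sketch below only reproduces that arithmetic from the numbers given in the text. The function name `accuracy` and the dictionary layout are illustrative choices, not part of the original method.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of residues whose label (site / non-site) was predicted correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# (correctly predicted residues, residues in chain A), as reported in the text.
results = {
    "1npo chain A": (75, 95),
    "1tmc chain A": (142, 175),
}

for name, (correct, total) in results.items():
    print(f"{name}: {correct}/{total} = {accuracy(correct, total):.1%}")
```

This gives roughly 78.9% for 1npo and 81.1% for 1tmc.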
Conclusion
In this paper, a new method based on integrated RBF neural networks was proposed to predict protein–protein interaction sites. A number of features were extracted, namely sequence profiles, entropy, relative entropy, conservation weight, accessible surface area and sequence variability. Six sliding windows over these features were then constructed, containing 1, 3, 5, 7, 9 and 11 amino acid residues respectively. Six RBF neural networks were then trained to predict the protein–protein
Conflicts of interest statement
None declared.
Acknowledgments
This research was partially supported by the Natural Science Foundation of China (61070130), the Key Project of Natural Science Foundation of Shandong Province (ZR2011FZ003) and the Key Subject Research Foundation of Shandong Province.
References (30)
- et al. Feature generation and representations for protein–protein interaction classification. J. Biomed. Inf. (2009)
- et al. Prediction of protein–protein interactions based on PseAA composition and hybrid feature selection. Biochem. Biophys. Res. Commun. (2009)
- et al. Time series prediction using RBF neural networks with a nonlinear time-varying evolution PSO algorithm. Neurocomputing (2009)
- et al. Ensembling neural networks: many could be better than all. Artif. Intell. (2002)
- et al. Predicting protein interaction sites from residue spatial sequence profile and evolution rate. FEBS Lett. (2006)
- et al. Using pre & post-processing methods to improve binding site predictions. Patt. Recog. (2009)
- et al. Adaptive compressive learning for prediction of protein–protein interactions from primary sequence. J. Theoret. Biol. (2011)
- et al. MppS: an ensemble of support vector machine based on multiple physicochemical properties of amino acids. Neurocomputing (2006)
- Entering the postgenome era. Science (1995)
- Trawling for proteins in the post-genome era. Nature Biotech. (1996)