Biochemical and Biophysical Research Communications
Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition☆
Materials and methods
The proteins used in this study were collected from the NPD (Nuclear Protein Database) [12] at http://npd.hgu.mrc.ac.uk/. The protein sequences in NPD are derived from the SWISS-PROT and TREMBL Data Banks [13]. To construct a high-quality working dataset, all entries were screened strictly according to the following procedure. (1) Only sequences with a clear description of their location in the nucleus were included. (2) For protein sequences having the same name but from different
Results and discussion
The predictions were examined by the re-substitution test and the jackknife test on the 370 proteins classified into 9 subnuclear locations (Table 1). The re-substitution test examines the self-consistency of a prediction method [21], whereas the jackknife test is deemed the most objective and rigorous cross-validation procedure [21] and has been adopted by a growing number of investigators [8], [9], [11], [22], [23], [24], [25], [26], [27] to examine the power of various prediction methods.
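The difference between the two tests can be illustrated with a minimal sketch. It is not the OET-KNN classifier of this paper: a plain 1-nearest-neighbor rule with Euclidean distance is swapped in as a stand-in, and the function names and toy data are hypothetical. In the re-substitution test every sample is predicted with itself still in the training set; in the jackknife test each sample is removed in turn and predicted from the remaining ones.

```python
import math

def nearest_neighbor_label(query, samples, labels):
    """Label of the training sample closest to `query` (Euclidean distance).
    Stand-in classifier for illustration only, not OET-KNN."""
    best = min(range(len(samples)), key=lambda i: math.dist(query, samples[i]))
    return labels[best]

def resubstitution_accuracy(samples, labels):
    """Predict each sample while it is still part of the training set."""
    hits = sum(
        nearest_neighbor_label(x, samples, labels) == y
        for x, y in zip(samples, labels)
    )
    return hits / len(samples)

def jackknife_accuracy(samples, labels):
    """Leave each sample out in turn and predict it from all the others."""
    hits = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        hits += nearest_neighbor_label(x, rest_x, rest_y) == y
    return hits / len(samples)

# Hypothetical toy data: two-dimensional feature vectors with two class labels.
toy_samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0), (0.6, 0.7)]
toy_labels = ["A", "A", "B", "B", "A"]

resub = resubstitution_accuracy(toy_samples, toy_labels)
jack = jackknife_accuracy(toy_samples, toy_labels)
```

On this toy set the re-substitution rate is perfect because each sample is its own nearest neighbor, while the jackknife rate is lower because the last sample lies closer to the other class once it is held out. This is why the re-substitution test measures only self-consistency.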
For the
Conclusion
The OET-KNN algorithm is a powerful classifier. Representing protein samples by pseudo amino acid composition incorporates a considerable amount of sequence-order information that the conventional amino acid composition omits entirely. Because the current approach combines these two advantages, it can significantly outperform other approaches such as ProtLock and SVM. It is anticipated that with the improvement of the training dataset as more proteins with known
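How pseudo amino acid composition retains sequence-order information can be sketched as follows. This is a minimal illustration under stated assumptions, not the formulation used in this paper: a single property scale (the Kyte-Doolittle hydropathy index) is swapped in for the physicochemical properties, and the function names, the number of correlation tiers `lam`, and the weight `weight` are hypothetical choices. The first 20 components are the usual amino acid frequencies; the extra `lam` components are averaged squared property differences between residues `k` positions apart.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (assumed property scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def standardize(scale):
    """Rescale a property table to zero mean and unit standard deviation."""
    mean = sum(scale.values()) / len(scale)
    std = (sum((v - mean) ** 2 for v in scale.values()) / len(scale)) ** 0.5
    return {aa: (v - mean) / std for aa, v in scale.items()}

def pseudo_amino_acid_composition(sequence, lam=2, weight=0.05):
    """Return a (20 + lam)-dimensional vector for a sequence of the 20
    standard residues; `lam` and `weight` are illustrative defaults."""
    n = len(sequence)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")
    prop = standardize(KYTE_DOOLITTLE)
    # Conventional composition: occurrence frequency of each residue type.
    freqs = [sequence.count(aa) / n for aa in AMINO_ACIDS]
    # Sequence-order terms: mean squared property difference at separation k.
    thetas = []
    for k in range(1, lam + 1):
        total = sum(
            (prop[sequence[i]] - prop[sequence[i + k]]) ** 2
            for i in range(n - k)
        )
        thetas.append(total / (n - k))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]

# Two hypothetical peptides with identical composition but different order.
vec_blocked = pseudo_amino_acid_composition("AALL")
vec_alternating = pseudo_amino_acid_composition("ALAL")
```

The two peptides above have the same amino acid composition, so a conventional 20-dimensional representation cannot tell them apart, whereas the additional components differ because they depend on which residues are neighbors.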
References (30)
- et al. A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics (1992)
- Protein sorting signals and prediction of subcellular localization. Adv. Protein Chem. (2000)
- et al. Relation between amino acid composition and cellular location of proteins. J. Mol. Biol. (1997)
- et al. Using functional domain composition and support vector machines for prediction of protein subcellular location. J. Biol. Chem. (2002)
- et al. A joint prediction of the folding types of 1490 human proteins from their genetic codons. J. Theor. Biol. (1993)
- et al. Predicting protein folding types by distance functions that make allowances for amino acid interactions. J. Biol. Chem. (1994)
- Prediction of protein subcellular locations using Markov chain models. FEBS Lett. (1999)
- et al. SLLE for predicting membrane protein types. J. Theor. Biol. (2005)
- et al. Towards a systematics for protein subcellular location: quantitative description of protein localization patterns and automated analysis of fluorescence microscope images. Proc. Int. Conf. Intell. Syst. Mol. Biol. (2000)
- et al. Protein subcellular location prediction. Protein Eng. (1999)
- Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach. J. Protein Chem.
- Subcellular location prediction of apoptosis proteins. PROTEINS: Struct. Funct. Genet.
- Using complexity measure factor to predict protein subcellular location. Amino Acids
☆ Abbreviations: KNN, K-nearest neighbors; ET-KNN, evidence theoretic KNN; OET-KNN, optimized evidence theoretic KNN; NPD, Nuclear Protein Database.