ConPred_elite: a highly reliable approach to transmembrane topology prediction
Introduction
Transmembrane (TM) proteins, which perform vital biological functions, are reported to account for approximately 20–30% of the genes in most genomes (Krogh et al., 2001; Liu and Rost, 2001; Arai et al., 2003). Their functions are closely correlated with their TM topology, i.e., the number of transmembrane segments (TMSs), their positions, and their orientation relative to the membrane lipid bilayer, and TM proteins can be classified and identified using topology information (Sugiyama et al., 2003). High-quality TM topology information is therefore a prerequisite for comprehensive analysis of TM protein functions.
Many topology prediction methods have been proposed, but previous studies (Moller et al., 2001; Ikeda et al., 2002; Chen et al., 2002) revealed that the prediction accuracy of individual methods is still not high enough: at most 60–70% for the whole topology. As a practical way to improve performance, consensus prediction methods have been tried successfully, increasing accuracy by as much as 10% (Ikeda et al., 2002). Recently, Nilsson et al. (2000, 2002) and Käll and Sonnhammer (2002) presented interesting alternative consensus approaches built on TM topology prediction methods, aiming at higher-quality TM topology models. The approach of Nilsson et al., however, cannot predict the whole TM topology over an entire sequence. The algorithm of Käll and Sonnhammer can predict the whole topology, but its prediction accuracy (i.e., reliability) is not given explicitly.
Here, aiming at still more reliable TM topology data, we propose a new consensus approach for TM topology prediction (ConPred_elite) that predicts the whole topology with accuracies of 0.98 and 0.95 for prokaryotic and eukaryotic TM protein sequences, respectively. Applying ConPred_elite to TM protein sequences extracted from 29 prokaryotic and 10 eukaryotic proteomes, we obtained 3871 and 6980 TM topology models, respectively, for use in subsequent analyses.
Test dataset of transmembrane protein sequences with experimentally-characterized topologies
As a test dataset of TM protein sequences with experimentally characterized TM topologies, we used the TMPDB_alpha_non-redundant dataset (<30% sequence similarity) extracted from TMPDB (Release 6.2; http://bioinfo.si.hirosaki-u.ac.jp/∼TMPDB/; Ikeda et al., 2003). This dataset includes 138 prokaryotic and 93 eukaryotic sequences whose TM topologies were determined experimentally by X-ray diffraction, NMR, two-dimensional crystal diffraction, gene fusion technique, substituted cysteine
Determining and ascertaining prediction reliability of the best combination from five methods for consensus transmembrane topology prediction (ConPred_elite)
We evaluated consensus topology prediction performance for the six possible combinations of five methods drawn from six selected prediction methods, varying the “allowable deviation”, n, from 7 to 20 residues, using the TMPDB_alpha_non-redundant dataset. Performance was evaluated separately for the prokaryotic and eukaryotic sub-datasets. The results are summarized in Table 1, with the values of n, 15 and 11 residues for prokaryotic and eukaryotic sequences, respectively, that make
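The evaluation grid described in this section can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the excerpt does not state the exact agreement criterion, so the rule used here (same number of TMSs, same N-terminal orientation, and every pair of corresponding TMS centres within n residues) is an assumption, as are the `Topology` class, the function names, and the placeholder method names `"A"` to `"F"`.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Topology:
    """Hypothetical topology model: orientation plus TMS positions."""
    n_term_inside: bool   # True if the N-terminus is on the cytoplasmic side
    segments: tuple       # ((start, end), ...) residue range of each TMS


def agree(a, b, n):
    """Assumed criterion: same orientation, same TMS count, and each pair
    of corresponding TMS centres no more than n residues apart."""
    if a.n_term_inside != b.n_term_inside:
        return False
    if len(a.segments) != len(b.segments):
        return False
    return all(
        abs((s1 + e1) - (s2 + e2)) / 2 <= n
        for (s1, e1), (s2, e2) in zip(a.segments, b.segments)
    )


def consensus(models, n):
    """Return the first model if every pair of models agrees, else None."""
    if all(agree(a, b, n) for a, b in combinations(models, 2)):
        return models[0]
    return None


# Placeholder names: the six real prediction methods are not named here.
METHODS = ("A", "B", "C", "D", "E", "F")

# Six combinations of five methods, each tried with n = 7..20 residues.
grid = [
    (combo, n)
    for combo in combinations(METHODS, 5)
    for n in range(7, 21)
]
```

Each `(combo, n)` pair in `grid` corresponds to one setting that would be scored against the test dataset, separately for the prokaryotic and eukaryotic sub-datasets. Requiring agreement among all models trades coverage for reliability: sequences on which the methods disagree receive no consensus model.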
Acknowledgements
This research was supported in part by a Grant-in-Aid for Scientific Research on Priority Areas (C) ‘Genome Information Science’ (No. 15014203) and a Grant-in-Aid for Scientific Research (C) (No. 14580665) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
References (33)
- et al. Comprehensive analysis of transmembrane topologies in prokaryotic genomes. Gene (2003)
- et al. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J. Mol. Biol. (1984)
- et al. Reliability of transmembrane predictions in whole-genome data. FEBS Lett. (2002)
- et al. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. (2001)
- et al. Consensus predictions of membrane protein topology. FEBS Lett. (2000)
- et al. Forced transmembrane orientation of hydrophilic polypeptide segments in multispanning membrane proteins. Mol. Cell (1998)
- et al. Sequence and topology of the CorA magnesium transport systems of Salmonella typhimurium and Escherichia coli. Identification of a new class of transport protein. J. Biol. Chem. (1993)
- et al. Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J. Mol. Biol. (1998)
- et al. PlasmoDB: the plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res. (2003)
- et al. GenBank. Nucleic Acids Res. (2003)
- The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res.
- Transmembrane helix predictions revisited. Protein Sci.
- Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res.
- TopPred II: an improved software for membrane protein structure predictions. Comput. Appl. Biosci.
- SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics
- Tmbase—a database of membrane spanning proteins segments. Biol. Chem. Hoppe-Seyler