ConPred_elite: a highly reliable approach to transmembrane topology prediction

https://doi.org/10.1016/j.compbiolchem.2003.11.002

Abstract

The function of transmembrane (TM) proteins is closely correlated with their TM topology, and large quantities of highly reliable TM topology data are increasingly needed. We present a new consensus approach for TM topology prediction (ConPred_elite) that predicts the whole topology with accuracies of 0.98 for prokaryotic and 0.95 for eukaryotic proteins on a dataset of experimentally characterized TM topologies. The prediction yield on the dataset is 30.4% for prokaryotic and 21.5% for eukaryotic proteins. Applying ConPred_elite to predicted TM proteins extracted from 29 prokaryotic and 10 eukaryotic proteomes, we obtained 3871 and 7271 highly reliable TM topologies (yields, 19.8 and 13.3%), respectively. The predicted TM topology data may contribute to further research into a comprehensive functional classification and identification of TM proteins based on topology information.

Introduction

Transmembrane (TM) proteins, which perform essential cellular functions, have been reported to account for approximately 20–30% of the genes in most genomes (Krogh et al., 2001, Liu and Rost, 2001, Arai et al., 2003). Their functions are closely correlated with their TM topology, i.e., the number of transmembrane segments (TMSs), their positions, and their orientation relative to the membrane lipid bilayer, and TM proteins can be classified and identified using this topology information (Sugiyama et al., 2003). High-quality TM topology information is therefore a prerequisite for comprehensive analysis of TM protein functions.

Many topology prediction methods have been proposed, but previous studies (Moller et al., 2001, Ikeda et al., 2002, Chen et al., 2002) revealed that the prediction accuracies of individual methods are still not high enough, i.e., at most 60–70% for the whole topology. As a practical way of improving prediction performance, consensus prediction methods have been tried successfully, increasing accuracies by as much as 10% (Ikeda et al., 2002). More recently, Nilsson et al. (2000, 2002) and Käll and Sonnhammer (2002) have presented alternative approaches that form a consensus across TM topology prediction methods, aiming at higher-quality TM topology models. The approach proposed by Nilsson et al., however, cannot predict the whole TM topology over an entire sequence. The algorithm of Käll and Sonnhammer can predict the whole topology, but it does not give the prediction accuracy (i.e., reliability) explicitly.

Here, with the aim of obtaining still more reliable TM topology data, we propose a new consensus approach for TM topology prediction (ConPred_elite) that predicts the whole topology with accuracies of 0.98 and 0.95 for prokaryotic and eukaryotic TM protein sequences, respectively. Applying ConPred_elite to TM protein sequences extracted from 29 prokaryotic and 10 eukaryotic proteomes, we finally obtained 3871 and 6980 TM topology models, respectively, for use in subsequent studies.

Section snippets

Test dataset of transmembrane protein sequences with experimentally-characterized topologies

As a test dataset of TM protein sequences with experimentally characterized TM topologies, we used the TMPDB_alpha_non-redundant dataset (<30% sequence similarity) extracted from TMPDB (Release 6.2; http://bioinfo.si.hirosaki-u.ac.jp/∼TMPDB/; Ikeda et al., 2003). This dataset includes 138 prokaryotic and 93 eukaryotic sequences whose TM topologies were determined experimentally by X-ray diffraction, NMR, two-dimensional crystal diffraction, gene fusion technique, substituted cysteine

Determining and ascertaining prediction reliability of the best combination from five methods for consensus transmembrane topology prediction (ConPred_elite)

We evaluated the consensus topology prediction performance for the six possible combinations of five methods drawn from six selected prediction methods, varying the “allowable deviation”, n, from 7 to 20 residues, using the TMPDB_alpha_non-redundant dataset. Performance evaluation was carried out separately for the prokaryotic and eukaryotic sub-datasets. The results are summarized in Table 1, with the values of n (15 and 11 residues for prokaryotic and eukaryotic sequences, respectively) that make
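The evaluation setup described in this snippet can be illustrated with a minimal Python sketch. It is not the authors' implementation: the agreement rule used here (same N-terminal orientation, same number of TMSs, and segment centres within n residues for every pair of methods) is an assumption, since the snippet does not define how the allowable deviation is applied. The names `Topology`, `agree`, `consensus`, and `method_subsets` are hypothetical, as is the choice to return the first method's prediction as the representative model.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Topology:
    """One method's predicted topology (hypothetical representation)."""
    n_term_inside: bool                     # True if the N-terminus is cytoplasmic
    segments: Tuple[Tuple[int, int], ...]   # (start, end) residue of each TMS


def centre(segment: Tuple[int, int]) -> float:
    """Midpoint of a transmembrane segment."""
    return (segment[0] + segment[1]) / 2


def agree(a: Topology, b: Topology, n: int) -> bool:
    """Assumed rule: same orientation, same TMS count, centres within n residues."""
    if a.n_term_inside != b.n_term_inside:
        return False
    if len(a.segments) != len(b.segments):
        return False
    return all(abs(centre(s) - centre(t)) <= n
               for s, t in zip(a.segments, b.segments))


def consensus(predictions: Sequence[Topology], n: int) -> Optional[Topology]:
    """Return a topology only if every pair of predictions agrees, else None."""
    if all(agree(a, b, n) for a, b in combinations(predictions, 2)):
        return predictions[0]  # assumption: first method's model is the representative
    return None


def method_subsets(methods: Sequence[str], size: int = 5) -> List[Tuple[str, ...]]:
    """All combinations of `size` methods; six methods taken five at a time give six."""
    return list(combinations(methods, size))
```

Under this sketch, a sequence contributes to the yield only when `consensus` returns a topology, and sweeping n from 7 to 20 for each of the six subsets reproduces the shape of the search described above. A stricter or looser agreement rule would change both yield and accuracy.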

Acknowledgements

This research was supported in part by a Grant-in-Aid for Scientific Research on Priority Areas (C) ‘Genome Information Science’ (No. 15014203) and a Grant-in-Aid for Scientific Research (C) (No. 14580665) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

References (33)

  • Boeckmann, B., et al. (2003). The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res.
  • Chen, C.P., et al. (2002). Transmembrane helix predictions revisited. Protein Sci.
  • Clamp, M., et al. (2003). Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res.
  • Claros, M.G., et al. (1994). TopPred II: an improved software for membrane protein structure predictions. Comput. Appl. Biosci.
  • Hirokawa, T., et al. (1998). SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics
  • Hofmann, K., et al. (1993). Tmbase - a database of membrane spanning proteins segments. Biol. Chem. Hoppe-Seyler