Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics
Prediction and characterization of cyclic proteins from sequences in three domains of life☆
Introduction
Cyclic proteins (CPs) have their termini linked together to form a cyclic backbone and thus effectively have no beginning and no end in their native conformation. Naturally occurring circular proteins have been found in bacteria, plants, fungi, and animals [1], [2]. Compared with their non-cyclic counterparts, they are relatively short (from about a dozen to 100 amino acids), less prone to degradation, more structurally stable, and harder to denature [1], [3]. Cyclotides, one of the largest CP families, comprise disulfide-rich chains of 28 to 37 amino acids with a characteristic cyclic cystine knot formed by an interlocking arrangement of three disulfide bridges [4]. They were the first discovered family of gene-expressed CPs and remain the most populated family in CyBase [5], the worldwide repository of CPs, which as of January 2013 included 633 cyclic proteins from 86 species. Cyclotides are among the most structurally stable proteins and are implicated in a diverse range of functions, from plant defense [3], [6] to anti-HIV, antimicrobial, hemolytic, and uterotonic activities [7]. They also have strong therapeutic potential and are being actively pursued as peptide-based drug leads, molecular probes, diagnostic agents, and immunosuppressants [7], [8], [9], [10], [11].
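Because a cyclic chain has no termini, the same protein can be written as different linear strings depending on which residue is listed first. The sketch below is our own illustration, not part of CyPred: it maps every rotation of a sequence to the lexicographically smallest one so that such strings become directly comparable. The function names are hypothetical. Only rotations are considered, not reversals, because the peptide backbone keeps its N-to-C direction even when closed into a ring.

```python
def canonical_rotation(seq: str) -> str:
    """Return the lexicographically smallest rotation of a cyclic sequence."""
    if not seq:
        return seq
    n = len(seq)
    doubled = seq + seq  # every rotation is a length-n window of seq+seq
    # Brute force over all n rotations, O(n^2); fine for chains of ~100 residues.
    return min(doubled[i:i + n] for i in range(n))


def same_cyclic_chain(a: str, b: str) -> bool:
    """True if b is a rotation of a (same ring, different starting residue)."""
    return len(a) == len(b) and canonical_rotation(a) == canonical_rotation(b)
```

For example, the toy strings `"GLPVCGETCV"` and `"CGETCVGLPV"` differ only in their starting residue, so `same_cyclic_chain` treats them as the same cyclic chain.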
Besides cyclotides, two other CP families are the trypsin inhibitors and the bacteriocins. Cyclotides and trypsin inhibitors share the cystine knot motif, whereas bacteriocins are larger than both and do not contain a cystine knot. Bacteriocins exhibit various inhibitory functions, mainly against bacteria, such as inhibition of cell-wall synthesis and RNase or DNase activity [12]. Importantly, CPs can be produced artificially [13], and efforts are under way to lower the associated production costs [14]. Together, these characteristics make CPs particularly attractive as potential therapeutic agents [15], [16].
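The structural facts above imply a simple necessary condition: three disulfide bridges require at least six cysteines, since each bridge joins two of them. The naive screen below combines that condition with the 28 to 37 residue cyclotide length range. It is a hypothetical illustration, not CyPred and not a family classifier: having six cysteines is necessary but not sufficient for a cystine knot, and the count says nothing about which cysteines are actually bonded.

```python
def cysteine_count(seq: str) -> int:
    """Number of cysteine residues (one-letter code C, case-insensitive)."""
    return seq.upper().count("C")


def could_form_three_disulfides(seq: str) -> bool:
    """Necessary (not sufficient) condition: two cysteines per disulfide bridge."""
    return cysteine_count(seq) >= 6


def naive_cyclotide_screen(seq: str, min_len: int = 28, max_len: int = 37) -> bool:
    """Hypothetical pre-filter: cyclotide-like length and enough cysteines for a knot."""
    return min_len <= len(seq) <= max_len and could_form_three_disulfides(seq)
```

A filter like this only narrows candidates; a sequence that passes still needs a proper predictor or experimental evidence before it can be called cyclic.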
Recent studies show that CPs are more common in the plant kingdom than previously thought [5], and some reports suggest that the cyclotide family might include thousands of members [17]. CyBase continues to grow and is expected to keep growing at a substantial pace [18]. Moreover, the biosynthetic mechanism of cyclization remains uncertain, so knowledge of this mechanism cannot currently be used to indicate which species produce cyclic proteins, or to what degree. These considerations motivated the current study, in which we design an accurate and fast in silico method to predict whether a given protein chain is cyclic. Most importantly, we use this method to predict and characterize putative CPs on a proteomic scale across hundreds of eukaryotic, bacterial, and archaeal proteomes. Similar computational studies were recently carried out to characterize other functional classes of proteins, e.g., disordered proteins [19], [20], caspases [21], and zinc proteins [22].
Datasets
We collected representative sets of data for cyclic and non-cyclic proteins. All wild-type cyclic chains downloaded from CyBase [5] in July 2011 were clustered at 90% sequence similarity with CD-HIT [23] to remove redundancy, and one chain from each cluster was kept. CD-HIT is a popular method (e.g., it was used to cluster UniProt to create the UniRef datasets) that implements a fast greedy incremental clustering algorithm, which groups sequences into clusters that are characterized by sequence
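Greedy incremental clustering of the kind CD-HIT performs can be sketched as follows: sort sequences from longest to shortest, then compare each sequence with the existing cluster representatives and either join the first one that meets the threshold or start a new cluster. This is a simplified illustration under stated assumptions, not CD-HIT itself: CD-HIT defines its threshold as sequence identity and uses short-word filtering plus banded alignment, whereas here identity is approximated (hypothetically) by the longest common subsequence divided by the length of the shorter sequence, which ignores gap costs and can overestimate identity. All function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Approximate identity: LCS length over the length of the shorter sequence."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0


def greedy_incremental_clustering(
    seqs: dict[str, str], threshold: float = 0.9
) -> dict[str, list[str]]:
    """Map each representative ID to the IDs of the members of its cluster."""
    # Longest sequences first, so each representative is the longest cluster member.
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)
    clusters: dict[str, list[str]] = {}
    for sid in order:
        for rep in clusters:
            # Join the first existing cluster whose representative is similar enough.
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            # No representative matched: this sequence starts a new cluster.
            clusters[sid] = [sid]
    return clusters
```

Keeping one chain per cluster then amounts to taking `list(greedy_incremental_clustering(chains))`, i.e. the representatives; which member the authors actually retained is not specified in this snippet.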
Evaluation of predictive quality and runtime on test datasets
The predictive quality of CyPred was compared with that of currently available approaches for identifying cyclic proteins, which include sequence alignment methods. Each test sequence was aligned against all sequences (both cyclic and non-cyclic) in the TRAINING dataset, and the label of the most similar training sequence was assigned as the prediction. This allowed a side-by-side comparison with CyPred, which also uses the TRAINING dataset to build its prediction model. The similarity was measured
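The alignment baseline described above is a nearest-neighbor label transfer. The sketch below illustrates it with a plain Smith-Waterman local alignment score; the reference list cites BLAST and the Smith-Waterman and Gotoh alignment papers, but the scoring used here (match +2, mismatch -1, linear gap -2, no substitution matrix) is our assumption and may differ from how the authors measured similarity. The training pairs shown in the usage note are toy strings with hypothetical labels.

```python
def smith_waterman_score(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> int:
    """Best local alignment score (Smith-Waterman with a linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap       # residue of a aligned to a gap
            left = cur[j - 1] + gap  # residue of b aligned to a gap
            score = max(0, diag, up, left)  # local alignment never drops below zero
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def predict_by_nearest_neighbor(query: str, training: list[tuple[str, str]]) -> str:
    """Return the label of the training sequence with the highest alignment score."""
    _, label = max(training, key=lambda item: smith_waterman_score(query, item[0]))
    return label
```

For instance, with `training = [("GLPVCGETCVGGTC", "cyclic"), ("MKTAYIAKQRQISF", "non-cyclic")]`, the query `"GLPVCGETCV"` aligns best to the first pair and inherits its label. Because every query is aligned against the whole training set, runtime grows with training-set size, which is one reason runtime is reported alongside predictive quality.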
Conclusions
We designed and empirically tested a novel model, called CyPred, that predicts whether a protein chain is cyclic. The prediction model focuses on the four currently well-populated families of CPs (cyclotides, cyclic defensins, circular bacteriocins, and trypsin inhibitors). Prediction of other CP families will be addressed in future work as more annotated data become available.
Empirical results on TRAINING and TEST datasets showed that CyPred achieves MCC = 0.95 and sensitivity = 100%, and
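For reference, with cyclic chains as the positive class, sensitivity = TP / (TP + FN) and MCC = (TP·TN - FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). A sensitivity of 100% therefore means FN = 0, i.e., no cyclic chain in the evaluated data was predicted as non-cyclic. The helper functions below are a minimal sketch with hypothetical names; returning 0.0 when a denominator is zero is a common convention that we assume here.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positives (cyclic chains) that were predicted positive."""
    return tp / (tp + fn) if tp + fn else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which makes it more informative than accuracy when cyclic chains are far outnumbered by non-cyclic ones.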
Acknowledgements
We express sincere thanks to the organizers of the 2010/11 Sanofi BioGENEius Challenge Canada where initial results of this study were presented, and to Zhenling Peng who helped with the collection of data for the evolutionary pace analysis. MJM was supported by the University of Alberta Dissertation Scholarship. DJC is grateful for the support of a National Health and Medical Research Council (Australia) Fellowship and grant (APP1049928 and APP1047857).
References (40)
Circular proteins-no end in sight. Trends Biochem. Sci. (2002).
Circling the enemy: cyclic proteins in plant defence. Trends Plant Sci. (2009).
Plant cyclotides: a unique family of cyclic and knotted proteins that defines the cyclic cystine knot structural motif. J. Mol. Biol. (1999).
Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol. (2004).
Critical assessment of high-throughput standalone methods for secondary structure prediction. Brief. Bioinform. (2011).
Basic local alignment search tool. J. Mol. Biol. (1990).
Identification of common molecular subsequences. J. Mol. Biol. (1981).
An improved algorithm for matching biological sequences. J. Mol. Biol. (1982).
Naturally occurring circular proteins: distribution, biosynthesis and evolution. Org. Biomol. Chem. (2010).
CyBase: a database of cyclic protein sequences and structures, with applications in protein discovery and engineering. Nucleic Acids Res. (2008).
Host-defense activities of cyclotides. Toxins.
Cyclotides, a novel ultrastable polypeptide scaffold for drug discovery. Curr. Pharm. Des.
Do plant cyclotides have potential as immunosuppressant peptides? J. Nat. Prod.
Cyclotides as a basis for drug design. Expert Opin. Drug Discov.
Cyclotides: a patent review. Expert Opin. Ther. Pat.
Cyclotides, a promising molecular scaffold for peptide-based therapeutics. Biopolymers.
Peptide AS-48: prototype of a new class of cyclic bacteriocins. Curr. Protein Pept. Sci.
Production of cyclic peptides and proteins in vivo. Proc. Natl. Acad. Sci. U. S. A.
Synthesis of cyclic peptides and cyclic proteins via ligation of peptide hydrazides. Chembiochem.
Potential therapeutic applications of the cyclotides and related cystine knot mini-proteins. Expert Opin. Investig. Drugs.
Cited by (18)
Primary Structural Analysis of Cyclotides. Advances in Botanical Research (2015).
Citation excerpt: "Bioinformatic tools will undoubtedly assist in the characterization of cyclic peptides. CyPred, a recently developed bioinformatic tool (Kedarisetti, Mizianty, Kaas, Craik, & Kurgan, 2014), is able to predict whether a given peptide chain is cyclic. The model was designed based on the four well-populated cyclic peptide families: cyclotides, cyclic defensins, circular bacteriocins and trypsin inhibitors."
A glimpse into peptidomic approach. Integrated Omics Approaches to Infectious Diseases (2021).
Production of bioactive cyclotides: a comprehensive overview. Phytochemistry Reviews (2020).
Computational prediction of intrinsic disorder in protein sequences with the disCoP meta-predictor. Methods in Molecular Biology (2020).
Prediction of Intrinsic Disorder with Quality Assessment Using QUARTER. Methods in Molecular Biology (2020).
☆ This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.