Prediction and characterization of cyclic proteins from sequences in three domains of life

https://doi.org/10.1016/j.bbapap.2013.05.002

Highlights

  • CyPred accurately and quickly predicts whether a given chain is cyclic.

  • ~3500 putative cyclic proteins were found in 642 fully sequenced proteomes.

  • 74 proteomes (4 archaea, 52 bacteria, and 18 eukaryotes) have 10+ cyclic proteins.

  • Half of proteomes have at least one cyclic chain predicted with high confidence.

  • CyPred and putative cyclic chains are available at biomine.ece.ualberta.ca/CyPred/.

Abstract

Cyclic proteins (CPs) have circular chains with a continuous cycle of peptide bonds. Their unique structural traits result in greater stability and resistance to degradation when compared to their acyclic counterparts. They are also promising targets for pharmaceutical/therapeutic applications. To date, only a few hundred CPs are known, although recent studies suggest that their numbers might be substantially higher. Here we developed a first-of-its-kind, accurate and high-throughput method called CyPred that predicts whether a given protein chain is cyclic. CyPred considers currently well-represented CP families: cyclotides, cyclic defensins, bacteriocins, and trypsin inhibitors. Empirical tests demonstrate that CyPred outperforms commonly used alignment methods. We used CyPred to estimate the incidence of CPs and found ~3500 putative CPs among 5.7+ million chains from 642 fully sequenced proteomes from archaea, bacteria, and eukaryotes. The median number of putative CPs per species ranges from three for archaeal proteomes to two for eukaryotes/bacteria, with 7% of archaeal, 11% of bacterial, and 16% of eukaryotic proteomes having 10+ CPs. The differences in the estimated fractions of CPs per proteome are as large as three orders of magnitude. Among eukaryotes, animals have higher ratios of CPs compared to fungi, while plants have the largest spread of the ratios. We also show that proteomes enriched in cyclic proteins evolve more slowly than proteomes with fewer cyclic chains. Our results suggest that further research is needed to fully uncover the scope and potential of cyclic proteins. A list of putative CPs and the CyPred method are available at http://biomine.ece.ualberta.ca/CyPred/. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.

Introduction

Cyclic proteins (CPs) have their termini linked together to create a cyclic backbone and thus effectively have no beginning and no end in their native conformation. Naturally produced circular proteins have been found in bacteria, plants, fungi, and animals [1], [2]. Compared to their non-cyclic counterparts, they are relatively short (about a dozen to 100 amino acids), less prone to degradation, more structurally stable, and harder to denature [1], [3]. One of the largest CP families, the cyclotides, comprises disulfide-rich chains of 28 to 37 amino acids with a characteristic cyclic cystine knot consisting of an interlocking arrangement of three disulfide bridges [4]. They were the first discovered family of gene-expressed CPs and remain the most populated family among all depositions in the worldwide repository of CPs called CyBase [5], which as of January 2013 includes 633 cyclic proteins from 86 species. Cyclotides are among the most structurally stable proteins and are implicated in a diverse range of functions, from plant defense [3], [6] to anti-HIV, antimicrobial, hemolytic, and uterotonic capabilities [7]. They also have strong therapeutic potential and are being actively pursued as peptide-based drug leads, molecular probes, diagnostic agents, and immunosuppressants [7], [8], [9], [10], [11].

Besides cyclotides, two other families of CPs are trypsin inhibitors and bacteriocins. Cyclotides and trypsin inhibitors share the cystine knot motif. By contrast, bacteriocins are larger than cyclotides and trypsin inhibitors and do not contain a cystine knot. Bacteriocins exhibit various inhibitory functions, mainly against bacteria, such as inhibition of cell-wall synthesis and RNase or DNase activity [12]. Importantly, CPs can be produced synthetically [13] and efforts are being made to lower the corresponding production costs [14]. These characteristics make CPs particularly desirable as potential therapeutic agents [15], [16].

Recent studies show that CPs are more common in the plant kingdom than was previously thought [5], including reports which suggest that cyclotides might include thousands of members [17]. The CyBase repository continues to grow and is expected to keep growing at a substantial pace [18]. Moreover, the biosynthetic mechanism of cyclization remains uncertain, and thus information on mechanisms currently cannot be used to indicate which species produce cyclic proteins, and to what degree. These considerations motivated the current study, in which we design an accurate and fast in silico method to predict whether a given protein chain is cyclic. Most importantly, this method is used to predict and characterize putative CPs on a proteomic scale across hundreds of eukaryotic, bacterial, and archaeal proteomes. Similar computational studies were recently carried out to characterize various functional classes of proteins, e.g., disordered proteins [19], [20], caspases [21], and zinc proteins [22].

Section snippets

Datasets

We collected representative sets of data for cyclic and non-cyclic proteins. All wild-type cyclic chains, which were downloaded from CyBase [5] in July 2011, were clustered at 90% sequence similarity with CD-HIT [23] to remove redundancy; one chain from each cluster was kept. CD-HIT is a popular method (e.g., it was used to cluster UniProt to create the UniRef datasets) that implements a fast greedy incremental clustering which groups sequences into clusters that are characterized by sequence
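The greedy incremental clustering mentioned above can be illustrated with a short sketch. This is not CD-HIT itself: the `identity` function is a hypothetical stand-in built on Python's `difflib` rather than an alignment-based sequence identity, and the sequences are arbitrary toy strings. The sketch only shows the general idea of processing sequences from longest to shortest and keeping one representative per cluster.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Hypothetical similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering: returns {representative: [members]}."""
    clusters = {}
    # Process longest sequences first, so each representative is the
    # longest member of its cluster.
    for s in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if identity(rep, s) >= threshold:
                clusters[rep].append(s)  # join the first close-enough cluster
                break
        else:
            clusters[s] = [s]  # no representative is close enough: new cluster
    return clusters


# Arbitrary toy sequences (assumption: not taken from CyBase).
toy = [
    "GLPVCGETCVGGTCNTPGCTCSWPVCTRN",
    "GLPVCGETCVGGTCNTPGCTCSWPVCTRD",
    "MKKAVIVENKGCATCSIGAACLVDGPIPDFEIAGATGLFGLWG",
]
clusters = greedy_cluster(toy, threshold=0.9)
representatives = list(clusters)  # one chain kept per cluster
```

Keeping only `representatives` corresponds to retaining one chain from each cluster to reduce redundancy.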

Evaluation of predictive quality and runtime on test datasets

The predictive quality of CyPred was compared with currently available approaches to identify cyclic proteins, which include sequence alignment methods. A given test sequence was aligned against all sequences (both cyclic and non-cyclic) from the TRAINING dataset and the label of the most similar training sequence was assigned as the prediction. This allowed for a side-by-side comparison with CyPred that also uses the TRAINING dataset to build the prediction model. The similarity was measured
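The alignment-based baseline described above amounts to nearest-neighbor label transfer, which can be sketched as follows. The `similarity` function here is an assumption made purely for illustration (a `difflib` ratio), not the similarity measure used in the study, and the training pairs are toy data.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Assumed stand-in score in [0, 1]; an alignment score would replace this."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def predict_by_nearest(query, training):
    """Return the label of the training sequence most similar to `query`.

    `training` is a list of (sequence, label) pairs.
    """
    _, best_label = max(training, key=lambda pair: similarity(query, pair[0]))
    return best_label


# Toy training set (hypothetical sequences and labels).
training = [
    ("ACDCACDCACDC", "cyclic"),
    ("MKKLLPTAAAGLLLL", "non-cyclic"),
]
prediction = predict_by_nearest("ACDCACDCACDA", training)
```

Because the query's label is copied from a single most similar training sequence, the baseline depends entirely on how close the nearest annotated sequence is.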

Conclusions

We designed and empirically tested a novel model, called CyPred, that predicts whether a protein chain is cyclic. The prediction model focuses on the four currently well-populated families of CPs (cyclotides, cyclic defensins, circular bacteriocins, and trypsin inhibitors). Prediction of other families of CPs will be addressed in the future as more annotated data becomes available.

Empirical results on TRAINING and TEST datasets showed that CyPred achieves MCC = 0.95 and sensitivity = 100%, and
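For readers unfamiliar with the two measures quoted above, the following sketch computes the Matthews correlation coefficient (MCC) and sensitivity from confusion-matrix counts using their standard definitions. The counts in the example are invented for illustration and are not the study's results; returning 0.0 when the MCC denominator is zero is a common convention assumed here.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cyclic chains that are predicted as cyclic."""
    return tp / (tp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for degenerate confusion matrices
    return (tp * tn - fp * fn) / denom


# Invented counts: every positive is found, with a few false positives.
example_sensitivity = sensitivity(tp=50, fn=0)
example_mcc = mcc(tp=50, tn=945, fp=5, fn=0)
```

Unlike sensitivity, MCC also penalizes false positives, so it remains informative when non-cyclic chains greatly outnumber cyclic ones.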

Acknowledgements

We express sincere thanks to the organizers of the 2010/11 Sanofi BioGENEius Challenge Canada where initial results of this study were presented, and to Zhenling Peng who helped with the collection of data for the evolutionary pace analysis. MJM was supported by the University of Alberta Dissertation Scholarship. DJC is grateful for the support of a National Health and Medical Research Council (Australia) Fellowship and grant (APP1049928 and APP1047857).

References (40)

  • D.J. Craik. Host-defense activities of cyclotides. Toxins (2012)

  • A. Gould et al. Cyclotides, a novel ultrastable polypeptide scaffold for drug discovery. Curr. Pharm. Des. (2011)

  • C. Gründemann et al. Do plant cyclotides have potential as immunosuppressant peptides? J. Nat. Prod. (2012)

  • D.J. Craik et al. Cyclotides as a basis for drug design. Expert Opin. Drug Discov. (2012)

  • A.B. Smith et al. Cyclotides: a patent review. Expert Opin. Ther. Pat. (2011)

  • K. Jagadish et al. Cyclotides, a promising molecular scaffold for peptide-based therapeutics. Biopolymers (2010)

  • M. Maqueda et al. Peptide AS-48: prototype of a new class of cyclic bacteriocins. Curr. Protein Pept. Sci. (2004)

  • C.P. Scott et al. Production of cyclic peptides and proteins in vivo. Proc. Natl. Acad. Sci. U. S. A. (1999)

  • J.S. Zheng et al. Synthesis of cyclic peptides and cyclic proteins via ligation of peptide hydrazides. Chembiochem (2012)

  • D.J. Craik et al. Potential therapeutic applications of the cyclotides and related cystine knot mini-proteins. Expert Opin. Investig. Drugs (2007)

Cited by (18)

  • Primary Structural Analysis of Cyclotides

    2015, Advances in Botanical Research

    Citation excerpt: Bioinformatic tools will undoubtedly assist in the characterization of cyclic peptides. CyPred, a recently developed bioinformatic tool (Kedarisetti, Mizianty, Kaas, Craik, & Kurgan, 2014), is able to predict whether a given peptide chain is cyclic. The model was designed based on the four well-populated cyclic peptide families: cyclotides, cyclic defensins, circular bacteriocins and trypsin inhibitors.

  • A glimpse into peptidomic approach

    2021, Integrated Omics Approaches to Infectious Diseases