Toxicon

Volume 165, July 2019, Pages 95-102
Mapping the chemical and sequence space of the ShKT superfamily

https://doi.org/10.1016/j.toxicon.2019.04.008

Highlights

  • For highly diverse sequences, multidimensional scaling of residue properties identifies key clusters and sequence features.

  • The ShKT superfamily contains two distinct clusters, the ShK-like sequences and cysteine-rich domains of CRISPs.

  • The ShK-like family contains seven sub-clusters which have distinct biophysical fingerprints.

  • These analyses may be used as a guide for selection of ShK-like sequences for future functional characterisation.

Abstract

The ShKT superfamily is widely distributed throughout nature and encompasses a wide range of documented functions and processes, from modulation of potassium channels to involvement in morphogenesis pathways. Cysteine-rich secretory proteins (CRISPs) contain a cysteine-rich domain (CRD) at the C-terminus that is similar in structure to the ShK fold. Despite the structural similarity of the CRD and ShK-like domains, we know little of the sequence-function relationships in these families. Here, for the first time, we examine the evolution of the biophysical properties of sequences within the ShKT superfamily in relation to function, with a focus on the ShK-like family. ShKT data were sourced from published sequences in the protein family database, in addition to new ShK-like sequences from the Australian speckled anemone (Oulactis sp.). Our analysis clearly delineates the ShK-like family from the CRDs of CRISP proteins. The four CRISP sub-clusters separate out into the main taxonomic groups of Mammalia, Insecta and Reptilia. The ShK-like family is in turn composed of seven sub-clusters, the largest of which contains members from across the eukaryotes, with a continuum of intermediate properties. Smaller sub-clusters contain specialised members such as nematode ShK-like sequences. Several of these ShKT sub-clusters contain no functionally characterised sequences. This chemical space analysis should be useful as a guide to select sequences for functional studies and to gain insight into the evolution of these highly divergent sequences with an ancient conserved fold.

Introduction

The Stichodactyla helianthus K channel toxin-like superfamily (ShKT) consists of molecules derived from both venomous and non-venomous species (Chang et al., 2018). These molecules are involved in a range of functions and processes, from modulatory actions on potassium channels to roles in morphogenesis and cell differentiation (Gibbs et al., 2008; Prentis et al., 2018; Tsang et al., 2007; Tudor et al., 1996; Yan et al., 2000). ShKT domains are widespread throughout nature, occurring in sea anemones, reptiles, nematodes and mammals (Castañeda et al., 1995; Galea et al., 2014; Gibbs et al., 2006; Minagawa et al., 1998; Nguyen et al., 2013; Sunagar et al., 2012; Tsang et al., 2007). However, we know little of their evolution or of the relationship between their chemical properties and function.

The ShKT superfamily contains two main families, the ShK-like proteins and the cysteine-rich domains (CRD) of cysteine-rich secretory proteins (CRISPs), which share a common fold (Fig. 1A). ShK-like domains occur as discrete single domains, multiple repeats or in combination with peptidase and other enzyme domains (Castañeda et al., 1995; Finn et al., 2014; Galea et al., 2014) (Fig. S1). The CRISPs have a more constrained organisation, with a ShKT domain linked by a flexible hinge to the pathogenesis-related 1 (PR1) domain (Gibbs et al., 2008). Despite their structural similarity, superfamily members have a documented range of functions, processes and taxonomic distribution.

The prototypical superfamily member is the sea anemone toxin ShK, a 35-residue peptide isolated from the sea anemone Stichodactyla helianthus (Castañeda et al., 1995). The structure of ShK is characterised by two α-helices (Fig. 1B) and three disulfide bonds (Cys1-Cys6, Cys2-Cys4 and Cys3-Cys5) (Tudor et al., 1996). CRISP sequences are larger polypeptides, characterised by 10–16 cysteines and molecular masses of 20–30 kDa, with distinct domains. The CRD region has a similar fold and cysteine connectivity to that of ShK (Fig. 1C) and is therefore thought to be homologous to ShK-like peptides, although their sequence identity is low (Guo et al., 2005).
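The connectivity notation above refers to cysteines by their order in the sequence, not by residue number. The following minimal sketch shows how the ordinal pairing (Cys1-Cys6, Cys2-Cys4, Cys3-Cys5) translates into residue positions. The function name `disulfide_pairs` is hypothetical, and the ShK sequence string is an assumption quoted from the published mature peptide; it should be checked against a curated database entry before reuse.

```python
# Ordinal cysteine pairing quoted in the text: Cys1-Cys6, Cys2-Cys4, Cys3-Cys5.
CONNECTIVITY = [(1, 6), (2, 4), (3, 5)]

# Assumed mature ShK sequence (35 residues); verify before reuse.
SHK = "RSCIDTIPKSRCTAFQCKHSMKYRLSFCRKTCGTC"


def disulfide_pairs(sequence, connectivity=CONNECTIVITY):
    """Map ordinal cysteine pairings onto 1-based residue positions."""
    cys_positions = [i + 1 for i, aa in enumerate(sequence) if aa == "C"]
    needed = max(max(pair) for pair in connectivity)
    if len(cys_positions) < needed:
        raise ValueError(
            f"expected at least {needed} cysteines, found {len(cys_positions)}"
        )
    return [(cys_positions[a - 1], cys_positions[b - 1]) for a, b in connectivity]


print(disulfide_pairs(SHK))  # [(3, 35), (12, 28), (17, 32)]
```

The same helper could be applied to a CRD sequence trimmed to its six fold-defining cysteines, which is one simple way to see why the two families are described as sharing a cysteine framework despite low sequence identity.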

Four main functional and biological processes have been attributed thus far to proteins in the ShKT superfamily: potassium channel modulation, calcium channel modulation, involvement in morphogenesis pathways and cell regeneration. ShK, the prototypical peptide of the ShKT domain, is a potent KV1 channel blocker (Castañeda et al., 1995). An analogue of this peptide is in clinical trials for treatment of the autoimmune disease psoriasis (Chandy and Norton, 2017; Chi et al., 2012; Tarcha et al., 2017). Other ShK-like single domain peptides show activity across various KV channels, for example BgK (from Bunodosoma granulifera) (KV1.1, 1.2, 1.3) and OspTx2a (from Oulactis sp.) (KV1.2, 1.6) (Cotton et al., 1997; Sunanda et al., 2018). ShK-like sequences are also polyfunctional, with ShPI-1 (from Stichodactyla helianthus) and APEKTx1 (from Anthopleura elegantissima) both being Kunitz-type inhibitors with KV channel binding activity (García-Fernández et al., 2016; Peigneur et al., 2011). The precursor domain of NEP3 (from Nematostella vectensis), a ShKT domain repeat protein, was shown to be neurotoxic to fish (Columbus-Shenkar et al., 2018; Moran et al., 2013). The functions of the remaining ShKT domains in NEP3 are yet to be determined (Columbus-Shenkar et al., 2018).

The toxin-like domain (TxD) of the matrix metalloprotease MMP-23 has high structural similarity to ShK/BgK and the CRD region of CRISPs (Galea et al., 2014; Nguyen et al., 2013; Rangaraju et al., 2010). Functionally, MMP-23 may play a role in potassium channel trafficking to cell membranes (Galea et al., 2014). The TxD is a blocker of several KV channels (KV1.1, 1.3, 1.4, 1.6 and 3.2), although it is ineffective against others (KV1.2, 1.5, 1.7 and KCa3.1) (Galea et al., 2014).

The importance of the ShKT domain in morphogenesis pathways is exemplified by Mab-7, a protein from the roundworm Caenorhabditis elegans (Tsang et al., 2007). The ShKT domain in Mab-7 is essential for protein binding, the protein itself being responsible for sensory ray morphogenesis, which is required for contact between the male and hermaphrodite during mating (Tsang et al., 2007).

CRISPs are found predominantly in vertebrates and are composed of four main functional groups. CRISP-1 is located in the reproductive tracts of mammals and CRISP-2 performs an autoantigen function in sperm. Functional studies of the mouse CRD of Tpx-1, a CRISP-2 member, show that it regulates cardiac ryanodine receptor Ca2+ signalling, in line with its relatedness to the ion-channel modulators BgK and ShK (Gibbs et al., 2006). CRISP-3 is found in the venom of reptiles (snakes and lizards) and CRISP-4 is rodent-specific, where it is implicated in sperm interactions (Gibbs et al., 2008; Turunen et al., 2011). Several homologous CRISP-like sequences have been identified in insects (e.g. mosquitoes, beetles and ants), although most of these proteins remain uncharacterised (Holt et al., 2002; Oxley et al., 2014; The Tribolium Genome Sequencing Consortium, 2008).

Cysteine-rich peptide sequences from nature comprise a valuable library of bioactive compounds. Their compact, stable structures permit a large variety of loop sequences to be tolerated, leading to high sequence and function diversity. However, this diversity typically precludes most traditional sequence analyses, such as phylogenetics (Inkpen and Doolittle, 2016; Pearson and Sierk, 2005; Rost, 1999). We therefore built quantitative maps based on sequence/chemical space to define the functional regions explored by the extant ShKs. This technique has been applied successfully to investigate the evolution of two superfamilies of defensins, sequences that are also cysteine-rich and highly sequence-diverse (Mitchell et al., 2019; Shafee et al., 2017; Shafee and Anderson, 2018), as well as for traditional globular proteins (Jackson et al., 2018). Sequences can be explicitly placed within a multidimensional sequence space based on a multiple sequence alignment (MSA) combined with the biophysical properties of their residues (Atchley et al., 2005). This multidimensional space can be summarised in a smaller number of dimensions that capture the important sequence properties. Clusters within such a space indicate sets of co-varying peptide chemical properties (Shafee and Anderson, 2018).
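As a rough illustration of how aligned sequences can be placed in a numerical space, the sketch below encodes each alignment column with two residue properties and computes pairwise Euclidean distances. Several choices here are assumptions made for illustration and are not taken from the paper: the two properties (Kyte-Doolittle hydropathy and a simple formal charge with histidine treated as neutral), the encoding of gaps as zeros, the function names `encode` and `distance_matrix`, and the toy alignment `TOY_MSA`, which is hypothetical data.

```python
import math

# Kyte-Doolittle hydropathy values, used here as a stand-in property scale.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified formal charge near neutral pH (assumption: His counted as 0).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def encode(aligned_seq):
    """Turn one aligned sequence into a flat vector, two values per column."""
    vec = []
    for aa in aligned_seq:
        if aa == "-":
            vec.extend([0.0, 0.0])  # assumption: gaps contribute zeros
        else:
            vec.extend([HYDROPATHY[aa], CHARGE.get(aa, 0.0)])
    return vec


def distance_matrix(alignment):
    """Pairwise Euclidean distances between encoded aligned sequences."""
    vectors = [encode(seq) for seq in alignment]
    return [[math.dist(u, v) for v in vectors] for u in vectors]


# Hypothetical toy alignment; all rows must have the same length.
TOY_MSA = ["CKDLC-RC", "CKELCGRC", "CAVLC-AC"]
dist = distance_matrix(TOY_MSA)
for row in dist:
    print([round(d, 2) for d in row])
```

A distance matrix of this kind is the sort of input that a multidimensional scaling or principal component routine would reduce to a few axes. In the toy data, the first two sequences differ only by a conservative substitution and a gap, so they sit close together, while the third, with charged residues replaced by hydrophobic ones, lies far from both.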

Here, we explore the biophysical determinants of these clusters and their relationship to function. This gives insights into the main groups of ShK-like sequences, and their evolution, while clearly delineating the CRISP CRD family within the ShKT superfamily. This analysis can also be used to guide exploration of the available ShKT sequences by identifying sequences from poorly characterised regions of chemical space.

Sequence data

Sequences deposited in the protein family database (Pfam - https://pfam.xfam.org/) under the ShKT clan (CL0213), CRISP (PF08562) and ShK-like (PF01549) were downloaded (Finn et al., 2014). Full-length sequences were examined, and the predicted mature peptide retained for parsing through to the alignments and chemical space analysis. Sequences also included two functionally characterised ShK-like sequences isolated from the tentacle transcriptomes of the sea anemone Oulactis sp., U-AITX-Oulsp1

Sequence retrieval and data preparation

A total of 1082 sequences, including single and multiple domains, were subjected to analysis after division into their ShK and CRD domain components. The analysis included 50 new sequences from the speckled anemone (Oulactis sp.) transcriptome containing 90 ShK-like domains. Many of these sequences contained multiple ShK-like repeat domains, up to nine repeats in one sequence, or were in multi-domain proteins associated with peptidases (e.g. astacin) or protease inhibitors (e.g. Kunitz-type) (Fig.

Discussion

The ShKT fold is an ancient scaffold found in cnidarians (∼700 mya), nematodes, mammals and toxicoferan-reptiles (∼65 mya) in the form of ShK-like proteins and the CRD region of CRISPs (Chhabra et al., 2014; Fautin, 1998; Gibbs et al., 2008, 2006; Sunagar et al., 2012; Tudor et al., 1996). Within this constrained disulfide structure, sequence divergence is exceptionally high. However, the sequences are clearly not random; otherwise, the sequence space would show a spherical cloud. The clusters

Ethical statement

Ethical statement On behalf of, and having obtained permission from all the authors, I declare that:

  • (a)

    the material has not been published in whole or in part elsewhere;

  • (b)

    the paper is not currently being considered for publication elsewhere;

  • (c)

    all authors have been personally and actively involved in work leading to the review. All authors have read the manuscript and agree to its publication in Toxicon.

Conflict of interest

The authors declare there are no conflicts of interest.

Acknowledgment

The authors acknowledge Dr. Rodrigo A. V. Morales and Edward Airey for their contribution to the preparation of sequences used in the analysis. This project was funded in part by ARC linkage grant LP150100621. M.L.M acknowledges an Australian Government Research Training Program Scholarship, Monash Medicinal Chemistry Faculty Scholarship and Monash University-Museum Victoria Scholarship top-up. R.S.N acknowledges fellowship support from the Australian National Health and Medical Research Council

References (64)

  • H.M. Nguyen et al.

    Intracellular trafficking of the KV1.3 potassium channel is regulated by the prodomain of a matrix metalloprotease

    J. Biol. Chem.

    (2013)
  • T.V. Ovchinnikova et al.

    Aurelin, a novel antimicrobial peptide from jellyfish Aurelia aurita with structural features of defensins and channel-blocking toxins

    Biochem. Biophys. Res. Commun.

    (2006)
  • P.R. Oxley et al.

    The genome of the clonal raider ant Cerapachys biroi

    Curr. Biol.

    (2014)
  • W.R. Pearson et al.

    The limits of protein sequence comparison?

    Curr. Opin. Struct. Biol.

    (2005)
  • S. Peigneur et al.

    A bifunctional sea anemone peptide with Kunitz type protease and potassium channel inhibiting properties

    Biochem. Pharmacol.

    (2011)
  • S. Rangaraju et al.

    Potassium channel modulation by a toxin domain in matrix metalloprotease 23

    J. Biol. Chem.

    (2010)
  • Y. Shikamoto et al.

    Crystal structure of a CRISP family Ca2+ channel blocker derived from snake venom

    J. Mol. Biol.

    (2005)
  • S.W. Tsang et al.

    mab-7 encodes a novel transmembrane protein that orchestrates sensory ray morphogenesis in C. elegans

    Dev. Biol.

    (2007)
  • W.R. Atchley et al.

    Solving the protein sequence metric problem

    Proc. Natl. Acad. Sci. Unit. States Am.

    (2005)
  • H.M. Berman et al.

    The protein data bank

    Nucleic Acids Res.

    (2000)
  • A. Campen et al.

    TOP-IDP-Scale: a new amino acid scale measuring propensity for intrinsic disorder

    Protein Pept. Lett.

    (2008)
  • S.C. Chang et al.

    ShK toxin: history, structure and therapeutic applications for autoimmune diseases

    WikiJ. Sci.

    (2018)
  • S. Chhabra et al.

    KV1.3 channel-blocking immunomodulatory peptides from parasitic worms: implications for autoimmune diseases

    FASEB J.

    (2014)
  • Y.Y. Columbus-Shenkar et al.

    Dynamics of venom composition across a complex life cycle

    Elife

    (2018)
  • J. Cotton et al.

    A potassium-channel toxin from the sea anemone Bunodosoma granulifera, an inhibitor for KV1 channels - revision of the amino acid sequence, disulfide-bridge assignment, chemical synthesis, and biological activity

    Eur. J. Biochem.

    (1997)
  • G.E. Crooks

    WebLogo: a sequence logo generator

    Genome Res.

    (2004)
  • G. Csárdi et al.

    The igraph software package for complex network research

    InterJournal Complex Syst.

    (2006)
  • T.S. Dash et al.

    A centipede toxin family defines an ancient class of CSαβ defensins

    Structure

    (2019)
  • W.L. DeLano

    Pymol: an open-source molecular graphics tool

    CCP4 Newsl. Protein Crystallogr.

    (2002)
  • R Development Core Team

    R: A Language and Environment for Statistical Computing

    (2011)
  • D.G. Fautin

    Cnidaria

  • R.D. Finn et al.

    Pfam: the protein families database

    Nucleic Acids Res.

    (2014)