Computational analysis of evolution and conservation in a protein superfamily

doi:10.1016/S1046-2023(03)00200-7

Methods

Volume 32, Issue 2, February 2004, Pages 73-92

https://doi.org/10.1016/S1046-2023(03)00200-7

Abstract

Many gene superfamilies have hundreds or thousands of members and hence pose a significant challenge when performing a large-scale phylogenetic analysis. Derivation of the most accurate alignment possible and inference of evolutionary relationships (with an appropriate measure of confidence) are significant “bottlenecks” in the process. A generally applicable strategy is outlined for identifying and aligning sequences, performing simple analysis of the resulting alignment, and inferring evolutionary relationships. Reference is made to the serpin superfamily. The ‘partition cluster’ method, a relatively rapid technique for extracting underlying associations from phylogenetic bootstrap trees, is also presented.

Introduction

Large-scale consolidation and organization of sequences into groups closely related in evolution is a useful step in the classification of proteins [1] and prediction of protein function [2]. In most cases, this is not a trivial undertaking and requires the careful selection of methods for aligning sequences and inferring phylogenetic relationships. Considerations include both the applicability of a particular method to the data (e.g., different models of evolution, different degrees of divergence) and the practical consideration of computational feasibility.

This paper will outline a general strategy for the identification and alignment of sequences from a family of proteins, and the inference of patterns of amino acid conservation and evolutionary relationships between them. This is based in part on approaches used previously to analyze the serpin gene superfamily [3], [4], [5], [6]. Information on the serpin superfamily can be found elsewhere (for a brief overview, see [7]; a detailed treatment can be found in [8]).
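As a rough illustration of what "patterns of amino acid conservation" can mean in practice, the sketch below scores each column of a multiple alignment by its Shannon entropy (0 bits means fully conserved). Entropy is chosen here only as a common, simple measure; it is an assumption and not necessarily the measure used by the authors. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column, ignoring gaps ('-')."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0  # an all-gap column carries no residue information
    total = len(residues)
    counts = Counter(residues)
    # p * log2(1/p) summed over the residue types observed in the column
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conservation_profile(alignment: List[str]) -> List[float]:
    """Per-column entropy for a list of aligned sequences of equal length."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(col)) for col in zip(*alignment)]


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy_alignment = ["MKTA", "MKSA", "MRTA", "MK-A"]
    for position, score in enumerate(conservation_profile(toy_alignment), start=1):
        print(f"column {position}: {score:.3f} bits")
```

Low-entropy columns are candidates for conserved positions; how such scores are best interpreted depends on the alignment quality and on how redundant the sequence set is.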

Section snippets

Description of method

Table 1 summarizes the software programs used in this paper and the Internet locations from which they can be accessed (as at April 2003). Most of the programs operate under a UNIX-type environment (e.g., Linux), which readily permits scripting and linking of software together. Many programs can be obtained as ready-to-run executables, but depending on the system used, some may need to be compiled. For information on the PERL programming language, the O’Reilly text ‘Learning PERL’ is

Concluding remarks

A large-scale phylogenetic analysis should be undertaken using both sequence and non-sequence-based data. A consideration of the completeness of the sequence data is important; removing sequences that show identity to another sequence above a given threshold (e.g., 90%) can dramatically increase the manageability of the data and the speed with which it is processed. Alignment accuracy is critical and reference to structural information can be very useful in this regard. From this alignment, simple techniques can
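The redundancy filtering mentioned above (removing sequences above an identity threshold such as 90%) can be sketched as follows. This is a minimal greedy filter over sequences that are assumed to be already aligned to the same length; the function names, the gap handling, and the toy data are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def filter_redundant(seqs: Dict[str, str], threshold: float = 90.0) -> Dict[str, str]:
    """Greedily keep a sequence only if its identity to every sequence
    kept so far does not exceed the threshold."""
    kept: Dict[str, str] = {}
    for name, seq in seqs.items():
        if all(percent_identity(seq, other) <= threshold for other in kept.values()):
            kept[name] = seq
    return kept


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    toy = {
        "seqA": "MKTAYIAKQR",
        "seqB": "MKTAYIAKQR",  # identical to seqA, so it is dropped
        "seqC": "MKT-YLSKHR",  # 6 of 9 compared columns match seqA, so it is kept
    }
    print(list(filter_redundant(toy)))
```

Because the filter is greedy, the result depends on input order: the first sequence of each near-identical group is the one retained. For sets of hundreds or thousands of sequences, the all-against-kept comparison grows quadratically, so a dedicated clustering tool may be preferable.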

Acknowledgements

The authors wish to thank Michael Cameron for suggesting the use of edit distance in the partition cluster method. J.C.W. is an Australian National Health and Medical Research Council Senior Research Fellow.

References (81)

G.A. Silverman et al., J. Biol. Chem. (2001)
J. Park et al., J. Mol. Biol. (1998)
O. Kruger et al., Gene (2002)
S.F. Altschul et al., Trends Biochem. Sci. (1998)
R. Sadreyev et al., J. Mol. Biol. (2003)
C. Notredame et al., J. Mol. Biol. (2000)
W.R. Pearson, Methods Enzymol. (1996)
S.F. Altschul et al., J. Mol. Biol. (1990)
O. Gotoh, J. Mol. Biol. (1996)
D.G. Higgins et al., Methods Enzymol. (1996)
W.R. Atchley et al., Mol. Biol. Evol. (2001)
P.G. Gettins, Chem. Rev.
EMBO J. (1986)

Cited by (9)

Haemonchus contortus: Cloning and characterization of serpin
2010, Experimental Parasitology
The serpin gene of Haemonchus contortus (hc-serpin) was cloned and characterized in this study. Specific primers for rapid amplification cDNA ends (RACE) were designed based on the expression sequence tag (EST, BM173953) to amplify the 3′- and 5′-ends of hc-serpin. The full length of the cDNA of this gene was obtained by overlapping the sequences of 3′- and 5′-extremities and amplification by reverse transcription-PCR. The biochemical activities of the recombinant protein (rHc-Serpin), which was expressed in prokaryotic cells and purified by affinity chromatography and size-exclusion chromatography, were analyzed by assays of trypsin inhibition, anti-coagulation activity, and stability to temperature and pH. The results showed that the cloned full-length cDNA comprised 1317 bp and encoded a peptide with 367 amino acid residues which showed sequence similarity to several known serpins. The rHc-Serpin inhibited trypsin activity effectively and prolonged the coagulation time of rabbit blood in vitro. The rHc-Serpin was stable from pH 2.0–10.0 and kept activity at high temperature until 75 °C. Optimal pH of rHc-Serpin protein to inhibit trypsin activity was at pH 7.6. The natural serpin of H. contortus detected by immunoblot assay was about 63 kDa, and the rHc-Serpin was recognized strongly by serum from naturally infected goats. By immunohistochemistry, the serpin was localised exclusively in the epithelial cells of gastrointestinal tract in adult H. contortus. The results indicated that the cloned gene was serpin and that the protein may play important roles in the biological functions of H. contortus.
A comparative analysis of serpin genes in the silkworm genome
2009, Genomics
Serine protease inhibitors (serpins) are a superfamily of proteins, most of which control protease-mediated processes by inhibiting their cognate enzymes. Sequencing of the silkworm genome provides an opportunity to investigate serpin structure, function, and evolution at the genome level. There are thirty-four serpin genes in Bombyx mori. Six are highly similar to their Manduca sexta orthologs that regulate innate immunity. Three alternative exons in serpin1 gene and four in serpin28 encode a variable region including the reactive site loop. Splicing of serpin2 pre-mRNA yields variations in serpin2A, 2A′ and 2B. Sequence similarity and intron positions reveal the evolutionary pathway of seven serpin genes in group C. RT-PCR indicates an increase in the mRNA levels of serpin1, 3, 5, 6, 9, 12, 13, 25, 27, 32 and 34 in fat body and hemocytes of larvae injected with bacteria. These results suggest that the silkworm serpins play regulatory roles in defense responses.
Shape-shifting serpins - advantages of a mobile mechanism
2006, Trends in Biochemical Sciences
Serpins use an extraordinary mechanism of protease inhibition that depends on a rapid and marked conformational change and causes destruction of the covalently linked protease. Serpins thus provide stoichiometric, irreversible inhibition, and their dependence on conformational change is exploited for signalling and clearance. The regulatory advantages provided by structural mobility are best illustrated by the heparin activation mechanisms of the plasma serpins antithrombin and heparin cofactor II. This mechanistic complexity, however, renders serpins highly susceptible to disease-causing mutations. Recent crystal structures reveal the intricate conformational rearrangements involved in protease inhibition, activity modulation and the unique molecular pathology of the remarkable shape-shifting serpins.
The murine orthologue of human antichymotrypsin: A structural paradigm for clade A3 serpins
2005, Journal of Biological Chemistry
Citation excerpt:
Sequence Analysis—Antitrypsin and antichymotrypsin-like protein sequences were identified from human, cow, pig, and rat using the BLAST algorithm (45) and muACT-n as a probe with an expected threshold of 1 × 10⁻⁶ and minimum sequence identity of 35%; mouse sequences were those described previously (21). Following construction of an initial set of 500 distance neighbor-joining bootstrapped distance trees using MOLPHY (46), sequences showing a strong association with muACT-n were short-listed using the partition consensus method (11, 47). A multiple protein alignment was performed using ClustalW (48), and patterns of divergence across sites within these sequences were examined using maximum likelihood dN/dS determination (49).
Antichymotrypsin (SERPINA3) is a widely expressed member of the serpin superfamily, required for the regulation of leukocyte proteases released during an inflammatory response and with a permissive role in the development of amyloid encephalopathy. Despite its biological significance, there is at present no available structure of this serpin in its native, inhibitory state. We present here the first fully refined structure of a murine antichymotrypsin orthologue to 2.1 Å, which we propose as a template for other antichymotrypsin-like serpins. A most unexpected feature of the structure of murine serpina3n is that it reveals the reactive center loop (RCL) to be partially inserted into the A β-sheet, a structural motif associated with ligand-dependent activation in other serpins. The RCL is, in addition, stabilized by salt bridges, and its plane is oriented at 90° to the RCL of antitrypsin. A biochemical and biophysical analysis of this serpin demonstrates that it is a fast and efficient inhibitor of human leukocyte elastase (k_a: 4 ± 0.9 × 10⁶ M⁻¹ s⁻¹) and cathepsin G (k_a: 7.9 ± 0.9 × 10⁵ M⁻¹ s⁻¹) giving a spectrum of activity intermediate between that of human antichymotrypsin and human antitrypsin. An evolutionary analysis reveals that residues subject to positive selection and that have contributed to the diversity of sequences in this sub-branch (A3) of the serpin superfamily are essentially restricted to the P₄–P₆′ region of the RCL, the distal hinge, and the loop between strands 4B and 5B.
Protein superfamilies based phylogenomic analysis of archaeal domain
2011, Biochemistry Research Updates
Serpins in plants and green algae
2008, Functional and Integrative Genomics
