Memoir: template-based structure prediction for membrane proteins

Ebejer, Jean-Paul; Hill, Jamie R.; Kelm, Sebastian; Shi, Jiye; Deane, Charlotte M.

doi:10.1093/nar/gkt331

Abstract

Membrane proteins are estimated to be the targets of 50% of drugs that are currently in development, yet we have few membrane protein crystal structures. As a result, for a membrane protein of interest, the much-needed structural information usually comes from a homology model. Current homology modelling software is optimized for globular proteins, and ignores the constraints that the membrane is known to place on protein structure. Our Memoir server produces homology models using alignment and coordinate generation software that has been designed specifically for transmembrane proteins. Memoir is easy to use, with the only inputs being a structural template and the sequence that is to be modelled. We provide a video tutorial and a guide to assessing model quality. Supporting data aid manual refinement of the models. These data include a set of alternative conformations for each modelled loop, and a multiple sequence alignment that incorporates the query and template. Memoir works with both α-helical and β-barrel types of membrane proteins and is freely available at http://opig.stats.ox.ac.uk/webapps/memoir.

INTRODUCTION

Membrane proteins mediate the exchange of signals and chemicals into every cell. Despite their pharmaceutical importance, few membrane protein crystal structures exist. The MPStruc database (http://blanco.biomol.uci.edu/mpstruc/) estimates that there are 383 unique membrane protein structures in the Protein Data Bank (PDB; as of 26 January 2013). The PDB itself contains ∼50 000 unique chains (1), meaning that although they comprise ∼25% of known sequences (2), membrane proteins constitute <1% of known structures.
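
The <1% figure follows directly from the two counts quoted above. A minimal check of the arithmetic, using only those two numbers (the variable names are illustrative):

```python
# Counts quoted in the text: MPStruc estimate and approximate PDB size.
membrane_structures = 383
pdb_unique_chains = 50_000

# Fraction of known structures that are membrane proteins.
fraction = membrane_structures / pdb_unique_chains
print(f"{fraction:.2%}")  # prints 0.77%
```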

In the absence of a crystal structure, the best source of structural information for a sequence is a homology model. A homology model is constructed by aligning the residues of the ‘target’ sequence onto the structure of a related ‘template’ protein. The accuracy of the model is determined by the quality of the alignment between the target and template, and by the coordinate generation method that turns this alignment into a 3D structure.

Owing to the small number of known membrane protein structures, a target membrane protein normally shares little sequence identity with any template, making accurate modelling challenging. Fortunately, structural constraints imposed on the protein by its biological membrane are thought to make membrane protein models more accurate than similarly remote globular protein models (3). The membrane also imposes constraints on sequence that can be used to improve the target–template alignment (4). Several web servers exist to produce homology models for globular proteins, including HHpred (5), Swiss-Model (6) and RaptorX (7). However, no fully automated web server designed for general membrane proteins exists. At best, this means that the constraints imposed by the membrane are not used in modelling; at worst, the use of scoring functions designed for globular proteins may lead to distorted models.

Our Memoir web server is specifically designed for membrane proteins. An overview of Memoir’s pipeline is shown in Figure 1. First, the template protein is annotated with membrane-specific information by iMembrane (8). Next, homologous sequences are gathered for both the target and template proteins. These are aligned by MP-T (9), guided by the membrane information from iMembrane. Membrane information is again used in model building by the Medeller program (10), and the model is completed with a membrane protein-specific version of the FREAD loop-modelling method (11,12). These steps are described in more detail below.

Figure 1.

The Memoir pipeline. The user inputs are a target sequence to be modelled, and a template structure on which to base the model. The sequence of the template is annotated by iMembrane with structural information, such as position within the membrane and secondary structure. The target and template sequences, together with a set of proteins that are homologous to them, are then aligned by MP-T, guided by this annotation. The alignment is used as a blueprint for model building by Medeller. The resulting ‘core’ model is available for download. Loops are then added to the core model to generate Memoir’s principal outputs: the high accuracy (Hiacc) and high coverage (Hicov) models.

MATERIALS AND METHODS

iMembrane: Annotating template membrane proteins

Template protein structures are annotated by the iMembrane program (8). iMembrane annotates each residue in the structure according to its accessible surface area, secondary structure, membrane positioning and extent of contact with lipids. iMembrane’s annotations are determined from molecular-dynamics simulations in the CGDB database (13). The use of molecular dynamics allows for distortions of the protein structure and membrane due to their mutual interaction. It also allows residues to be classified by the fraction of the simulation time for which they contact each part of a membrane lipid. Membrane lipids have hydrophilic heads and hydrophobic tails, so the local electrostatic environment of a residue is determined by the part of the lipid that it contacts.

Homologue selection for alignment

The next step in the pipeline (Figure 1) is the collection of homologues of the target and the template using PSI-BLAST (14), running for five iterations on the Uniref90 database (15). A subset of the homologues is then selected as in (9). This selection procedure (see below) is a mixture of steps that filter out non-homologous sequences (such as a sequence identity cut-off), and steps that help the alignment algorithm (such as a cap on the maximum number of sequences).

Putative homologues are rejected if they have <15% sequence identity to the query, or if they are >3/2 or <2/3 the length of the query. The surviving homologues are made non-redundant at 80% sequence identity, and the homologues from the target and template are combined in equal numbers to prevent bias. This combined set is again made non-redundant. Up to 125 of the surviving sequences are randomly selected to help guide the target–template alignment.
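
The rejection rules and the cap described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `(name, identity, length)` tuple layout and the fixed random seed are hypothetical, and the two redundancy-removal steps and the equal-number combination of target and template homologues are deliberately left out.

```python
import random

def select_homologues(query_len, hits, max_seqs=125, min_identity=15.0, seed=0):
    """Filter putative homologues using the thresholds quoted in the text.

    hits is a list of (name, percent_identity_to_query, length) tuples;
    this layout is an assumption made for the sketch.
    """
    kept = [
        hit for hit in hits
        # Reject <15% identity, and lengths >3/2 or <2/3 of the query length.
        if hit[1] >= min_identity
        and (2 / 3) * query_len <= hit[2] <= (3 / 2) * query_len
    ]
    # Randomly keep at most max_seqs sequences (seeded here for repeatability).
    if len(kept) > max_seqs:
        kept = random.Random(seed).sample(kept, max_seqs)
    return kept

example_hits = [
    ("a", 10.0, 100),  # identity too low
    ("b", 20.0, 100),  # kept
    ("c", 50.0, 200),  # too long
    ("d", 30.0, 66),   # too short
    ("e", 30.0, 150),  # kept (exactly 3/2 of the query length)
]
print([hit[0] for hit in select_homologues(100, example_hits)])  # prints ['b', 'e']
```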

MP-T: Target–template alignment

The target and template are aligned with the MP-T sequence-structure alignment method. The MP-T algorithm first copies the annotation of the template on to each homologue. Subsequently every pair of sequences is aligned guided by these annotations. For example, a residue that is annotated as being in a transmembrane α-helix will rarely be aligned to a gap (indels are rare in transmembrane elements), and will be preferentially aligned to an amino acid type that is favoured in transmembrane helices.
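
The effect of annotation-dependent gap penalties can be illustrated with a small global alignment score. This is not MP-T: the scoring values, the single-letter annotation ('M' for transmembrane) and the function name are assumptions chosen for the sketch, and MP-T's environment-specific substitution preferences are not modelled.

```python
def annotated_alignment_score(target, template, annotation,
                              match=2, mismatch=-1, gap_loop=-2, gap_tm=-8):
    """Best global alignment score with annotation-dependent gap penalties.

    annotation[j] is 'M' if template residue j is transmembrane; any other
    character is treated as non-membrane. All scores are illustrative.
    """
    if len(annotation) != len(template):
        raise ValueError("annotation needs one character per template residue")
    n, m = len(target), len(template)

    def template_gap(j):
        # Cost of aligning template residue j to a gap in the target.
        return gap_tm if annotation[j] == "M" else gap_loop

    def target_gap(j):
        # Cost of inserting a target residue between template residues j-1 and j.
        inside_tm = 0 < j < m and annotation[j - 1] == "M" and annotation[j] == "M"
        return gap_tm if inside_tm else gap_loop

    # Standard dynamic programming table for global alignment.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + target_gap(0)
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + template_gap(j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if target[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,              # align two residues
                score[i - 1][j] + target_gap(j),         # gap in the template
                score[i][j - 1] + template_gap(j - 1),   # gap in the target
            )
    return score[n][m]

# The same deletion is cheap in a loop and expensive inside a helix.
print(annotated_alignment_score("AACC", "AAGCC", "MMLMM"))  # prints 6
print(annotated_alignment_score("AACC", "AAGCC", "MMMMM"))  # prints 0
```

With these toy values, a gap opposite a transmembrane residue costs four times as much as one opposite a loop residue, so the alignment places indels in loops whenever it can.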

The pairwise alignments are used to construct a guide tree to select homologues for a multiple alignment phase: only sequences judged by the guide tree to be descendants of the most recent common ancestor of the target and template are selected. Multiple alignment then proceeds using MP-T’s implementation of the T-Coffee objective criterion (16). This criterion attempts to make a multiple alignment that is as consistent as possible with the pairwise alignments.

Medeller: Coordinate generation

The target–template alignment is then fed to Medeller for coordinate generation. Homology modelling is most effective in the middle of transmembrane sections, where membrane proteins are under the greatest structural constraints. The Medeller coordinate generation method builds models outwards from these constrained sections. Models consist of the protein backbone and C_β atoms, as well as the side chains of conserved residues. Model building stops when a local assessment of the quality of the sequence alignment suggests that structural similarity can no longer be assumed. This results in a ‘core model’, which is then extended by the FREAD fragment modelling method (Figure 1).

FREAD: fragment modelling

FREAD searches a protein database for fragments of the appropriate length to fill gaps in a model. Potential matches are filtered based on the propensity for the un-modelled residues to assume the conformation required by the fragment. The remaining fragments are then ranked by how closely their termini match the flanking regions of the gap in the model.

Memoir generates two models, which differ in how highly scoring a database fragment must be before it is included in the model: one is termed the ‘high accuracy’ model (∼70% of the target sequence is modelled), the other the ‘high coverage’ model (∼76% of the target sequence). To produce the high-accuracy model, FREAD is run on a database of membrane protein fragments. The high coverage model includes additional lower scoring loops from the membrane fragment database as well as loops from a soluble fragment database. Both models include all major secondary structure elements.
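
The filter-then-rank logic, and the way the two models differ only in how permissive the filter is, can be sketched as below. The `Fragment` fields, the numerical score cut-offs and the function name are hypothetical; FREAD's actual scores and thresholds are described in (11,12).

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    name: str
    source: str         # "membrane" or "soluble" fragment database
    score: float        # hypothetical sequence-compatibility score, higher is better
    anchor_rmsd: float  # mismatch between fragment termini and gap flanks, in Å

def pick_loop(fragments, min_score, allowed_sources):
    """Filter candidate fragments, then rank the survivors by anchor match."""
    candidates = [
        f for f in fragments
        if f.source in allowed_sources and f.score >= min_score
    ]
    if not candidates:
        return None  # the gap is left un-modelled
    return min(candidates, key=lambda f: f.anchor_rmsd)

fragments = [
    Fragment("a", "membrane", 30.0, 0.5),
    Fragment("b", "soluble", 50.0, 0.3),
    Fragment("c", "membrane", 45.0, 0.9),
]

# Strict cut-off, membrane fragments only (cut-off values are made up).
high_accuracy = pick_loop(fragments, min_score=40.0, allowed_sources={"membrane"})
# Looser cut-off, membrane and soluble fragments.
high_coverage = pick_loop(fragments, min_score=25.0,
                          allowed_sources={"membrane", "soluble"})
print(high_accuracy.name, high_coverage.name)  # prints c b
```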

Web server usage

The Memoir server accepts a template structure in PDB format and a sequence to be modelled in FASTA format. The template can either be uploaded or specified by a PDB code. A typical query takes <1 h to run. An example results page is shown in Figure 2. Two models are produced: one with higher accuracy, and one with higher coverage. These are displayed in the Jmol 3D graphics viewer (17) and are available for download in PDB format (Figure 2a).
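
Before submitting a job, it can be useful to confirm locally that the target file holds exactly one protein sequence in FASTA format. The check below is a sketch of such a sanity test and is an assumption of this example; the server's own input validation is not described here, and the function name is hypothetical.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def read_single_fasta(text):
    """Return (header, sequence) for a FASTA string holding one protein."""
    header = None
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected exactly one sequence")
            header = line[1:].strip()
        else:
            if header is None:
                raise ValueError("sequence data found before a '>' header")
            parts.append(line.upper())
    sequence = "".join(parts)
    if header is None or not sequence:
        raise ValueError("no sequence found")
    unknown = set(sequence) - STANDARD_AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return header, sequence

header, sequence = read_single_fasta(">target example\nMKTLLV\nAGGIL\n")
print(header, len(sequence))  # prints target example 11
```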

Figure 2.

Parts of a Memoir results page: (a) two models are generated, one prioritizing accuracy (the ‘high accuracy’ model) and the other completeness (the ‘high coverage’ model). They are displayed in the Jmol 3D graphics viewer and are available for download in PDB format. Additional information on model creation can be downloaded using the ‘Download all results’ button. (b) Also displayed is the alignment between the target and template structure that was used in model building. (c) The alignment is accompanied by a guide to model quality, an extract of which is shown here. Values referenced in the guide, such as sequence identity, are calculated and displayed with traffic-light colour-coding (e.g. green for values that are likely to lead to a good model).

A proxy for the expected quality of a model is the quality of the corresponding target–template alignment. The results page displays this alignment (Figure 2b) together with a guide to model quality estimation based on alignment properties (an extract of which is shown in Figure 2c).

The generation of a homology model requires several programs, each of which produces its own output. A ‘Download all results’ button provides the supporting information for these methods. This information includes alternative loop structures for each loop modelled by FREAD, a Medeller model without fragment modelling (the ‘core’ model) and the full multiple sequence alignment from which the target–template alignment is inferred.

RESULTS

The main source of error in homology models is inaccuracies in the target–template alignment (18). When tested against seven other methods on a set of 115 pairs of membrane proteins, MP-T produced alignments with the smallest fraction of misaligned residues (9). Reducing the fraction of misaligned residues allows better models to be built by coordinate-generation programs.

The most cited coordinate-generation software is Modeller (19). Medeller has been tested against Modeller on a data set of 616 target–template membrane protein pairs spanning a range of sequence identities (10). On average Medeller’s core models (i.e. the models before FREAD fragment modelling, see Figure 1) had a backbone root mean square deviation (RMSD) of 1.97 Å to the native structure, compared with 2.57 Å for Modeller. This trend was true at all levels of sequence identity and may be caused by distortions of the backbone introduced by Modeller’s probability density function, which is designed for soluble proteins.
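
For reference, the RMSD between two equally sized sets of atom coordinates is the square root of the mean squared distance between matched atoms. A minimal sketch, assuming the model and native coordinates have already been superposed and matched atom by atom (the superposition step itself is not shown, and the function name is illustrative):

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Two atoms, one displaced by 2 Å: sqrt(4 / 2) = sqrt(2).
print(round(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]), 3))  # prints 1.414
```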

When different alignment methods were used with Medeller, models generated from MP-T alignments had marginally lower coverage, but significantly higher GDT_TS (20), than models from the next best alignment method (a quarter of models saw an increase in GDT_TS of ≥4%) (9).
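
GDT_TS averages the percentage of residues that lie within 1, 2, 4 and 8 Å of their native positions. The sketch below computes that average from per-residue distances under a single, given superposition; the full measure searches over many superpositions for each cut-off, which is not attempted here, and the function name and arguments are illustrative.

```python
def gdt_ts(distances, n_target=None):
    """Simplified GDT_TS from per-residue model-to-native distances in Å.

    n_target is the number of residues in the target; it defaults to the
    number of distances supplied, i.e. a fully modelled target.
    """
    n = len(distances) if n_target is None else n_target
    if n <= 0:
        raise ValueError("target must contain at least one residue")
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

print(gdt_ts([0.5, 1.5, 3.0, 9.0]))  # prints 56.25
```

Because un-modelled residues count against the score when `n_target` exceeds the number of distances, a measure of this kind rewards coverage as well as accuracy.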

Memoir produces more complete models than those described above by augmenting the core. During this process the core is fixed, preserving the RMSD advantage that Medeller enjoys over Modeller. On a test set of 156 loops from 59 Medeller core models, loop modelling led to a high-coverage model that filled 150 of the loops. In 109 of these 150 cases, the FREAD loop model was more accurate than Modeller’s ab initio loop model on the same set.

To illustrate Memoir’s use, models of the transmembrane domains of 15 membrane proteins were built using Memoir, HHpred and Swiss-Model’s automated mode (Table 1). Over the residues common to all three models Memoir had the lowest average RMSD (2.57 Å). In four cases, Memoir’s high accuracy model had <80% coverage, but the region that Memoir left un-modelled was modelled poorly by the other methods: seven of the eight fuller models built by HHpred and Swiss-Model had RMSDs of >5 Å.

Table 1.

Comparison of models of 15 transmembrane domains built by Memoir (high-accuracy model), HHpred and Swiss-Model

Target/template	% id	% Cov^a	Memoir RMSD^b (Å)	HHpred RMSD^b (Å)	Swiss-Model RMSD^b (Å)
2Q7MC/2H8AA	10	57	4.07	3.63	3.76
2JMMA/2LHFA	13	62	3.85	3.85	5.29
3GIAA/3L1LA	15	93	3.97	4.20	4.81
3O0RB/3MK7A	18	92	3.18	2.64	2.91
1OGVM/2AXTa	19	59	2.60	5.06	3.39
2VL0A/3RHWA	22	93	2.61	2.58	2.64
3BRYA/3DWOX	23	84	4.25	3.67	3.55
2WIEA/2X2VA	27	80	1.31	1.47	1.31
1YC9A/3PIKA	27	89	1.35	2.16	1.35
2D57A/2W2EA	31	97	1.80	2.06	2.02
2HYDA/3B60A	34	94	2.31	2.97	2.33
1L0LD/1ZRTD	35	89	1.30	1.54	1.28
1EZVE/2FYNC	47	65	2.11	3.72	3.01
1M56C/1OCCC	48	99	1.10	2.33	2.04
2QKSA/3SYOA	50	90	2.72	2.58	3.15
Mean		83	2.57	2.96	2.86

An entry is in bold if the RMSD for the method is >0.2 Å lower than that of the next most accurate method.

^aCoverage is assessed over the transmembrane domain.

^bRMSD is assessed over common residues in all the models in the transmembrane domain.
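
The summary row of Table 1 and the count of low-coverage cases can be recomputed from the tabulated values. The snippet below simply transcribes the table; the tuple layout and names are choices made for this sketch.

```python
# (target/template, % identity, % coverage, Memoir, HHpred, Swiss-Model RMSD in Å)
TABLE_1 = [
    ("2Q7MC/2H8AA", 10, 57, 4.07, 3.63, 3.76),
    ("2JMMA/2LHFA", 13, 62, 3.85, 3.85, 5.29),
    ("3GIAA/3L1LA", 15, 93, 3.97, 4.20, 4.81),
    ("3O0RB/3MK7A", 18, 92, 3.18, 2.64, 2.91),
    ("1OGVM/2AXTa", 19, 59, 2.60, 5.06, 3.39),
    ("2VL0A/3RHWA", 22, 93, 2.61, 2.58, 2.64),
    ("3BRYA/3DWOX", 23, 84, 4.25, 3.67, 3.55),
    ("2WIEA/2X2VA", 27, 80, 1.31, 1.47, 1.31),
    ("1YC9A/3PIKA", 27, 89, 1.35, 2.16, 1.35),
    ("2D57A/2W2EA", 31, 97, 1.80, 2.06, 2.02),
    ("2HYDA/3B60A", 34, 94, 2.31, 2.97, 2.33),
    ("1L0LD/1ZRTD", 35, 89, 1.30, 1.54, 1.28),
    ("1EZVE/2FYNC", 47, 65, 2.11, 3.72, 3.01),
    ("1M56C/1OCCC", 48, 99, 1.10, 2.33, 2.04),
    ("2QKSA/3SYOA", 50, 90, 2.72, 2.58, 3.15),
]

def column_mean(rows, index):
    """Arithmetic mean of one numeric column of the table."""
    return sum(row[index] for row in rows) / len(rows)

columns = {"coverage": 2, "Memoir": 3, "HHpred": 4, "Swiss-Model": 5}
means = {name: column_mean(TABLE_1, index) for name, index in columns.items()}

# Pairs for which the Memoir high-accuracy model covers <80% of the domain.
low_coverage = [row[0] for row in TABLE_1 if row[2] < 80]

for name, value in means.items():
    print(f"{name}: {value:.2f}")
print(len(low_coverage), "pairs with <80% coverage")
```

The recomputed means agree with the ‘Mean’ row (83, 2.57, 2.96 and 2.86), and four pairs fall below 80% coverage.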

CONCLUSION

Memoir is currently the only web server designed for the homology modelling of general membrane proteins. Memoir works on all types of transmembrane protein (α-helical and β-barrel) and is easy to use. The main outputs of the server are two models in PDB format, one of which prioritizes model accuracy, and the other model completeness. Memoir’s results include supplementary information that could be used in manual model refinement, such as a multiple sequence alignment incorporating the target and template protein sequences and alternative conformations for each modelled loop. A video tutorial and a guide to the interpretation of results are provided.

ACKNOWLEDGEMENTS

We would like to thank our fellow members of the Oxford Protein Informatics Group for useful discussions.

FUNDING

Engineering and Physical Sciences Research Council (to J.R.H., S.K. and C.M.D.); European Union Framework Programme 7-funded Marie Curie Initial Training Network STARS [PITN-GA-2009-238490 to J.P.E.]; Biotechnology and Biological Sciences Research Council (to S.K. and C.M.D.); University of Oxford Doctoral Training Centres (to C.M.D.). Funding for open access charge: Public Body.

Conflict of interest statement. None declared.

REFERENCES

1 Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank, Nucleic Acids Res., 2000, vol. 28 (pg. 235-242)

2 Oberai A, Ihm Y, Kim S, Bowie JU. A limited universe of membrane protein families and folds, Protein Sci., 2006, vol. 15 (pg. 1723-34)

3 Forrest L, Tang C, Honig B. On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins, Biophys. J., 2006, vol. 91 (pg. 508-517)

4 Hill JR, Kelm S, Shi J, Deane CM. Environment specific substitution tables improve membrane protein alignment, Bioinformatics, 2011, vol. 27 (pg. i15-i23)

5 Söding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology detection and structure prediction, Nucleic Acids Res., 2005, vol. 33 (pg. W244-W248)

6 Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling, Bioinformatics, 2006, vol. 22 (pg. 195-201)

7 Källberg M, Wang H, Wang S, Peng J, Wang Z, Lu H, Xu J. Template-based protein structure modeling using the RaptorX web server, Nat. Protoc., 2012, vol. 7 (pg. 511-522)

8 Kelm S, Shi J, Deane CM. iMembrane: homology-based membrane-insertion of proteins, Bioinformatics, 2009, vol. 25 (pg. 1086-1088)

9 Hill JR, Deane CM. MP-T: improving membrane protein alignment for structure prediction, Bioinformatics, 2013, vol. 29 (pg. 54-61)

10 Kelm S, Shi J, Deane CM. MEDELLER: homology-based coordinate generation for membrane proteins, Bioinformatics, 2010, vol. 26 (pg. 2833-2840)

11 Deane CM, Blundell TL. CODA: a combined algorithm for predicting the structurally variable regions of protein models, Protein Sci., 2001, vol. 10 (pg. 599-612)

12 Choi Y, Deane CM. FREAD revisited: accurate loop structure prediction using a database search algorithm, Proteins, 2010, vol. 78 (pg. 1431-1440)

13 Scott KA, Bond PJ, Ivetac A, Chetwynd AP, Khalid S, Sansom MS. Coarse-grained MD simulations of membrane protein-bilayer self-assembly, Structure, 2008, vol. 16 (pg. 621-630)

14 Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res., 1997, vol. 25 (pg. 3389-3402)

15 Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH. UniRef: comprehensive and non-redundant UniProt reference clusters, Bioinformatics, 2007, vol. 23 (pg. 1282-1288)

16 Notredame C, Higgins DG, Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment, J. Mol. Biol., 2000, vol. 302 (pg. 205-217)

17 Hanson RM. Jmol a paradigm shift in crystallographic visualization, J. Appl. Crystallogr., 2010, vol. 43 (pg. 1250-1260)

18 Ginalski K. Comparative modeling for protein structure prediction, Curr. Opin. Struct. Biol., 2006, vol. 16 (pg. 172-177)

19 Sali A. Comparative Protein Modelling by Satisfaction of Spatial Restraints, J. Mol. Biol., 1993, vol. 234 (pg. 779-815)

20 Zemla A, Venclovas, Moult J, Fidelis K. Processing and evaluation of predictions in CASP4, Proteins, 2001, Suppl 5 (pg. 13-21)

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
