BlockLogo: Visualization of peptide and sequence motif conservation

doi:10.1016/j.jim.2013.08.014

Journal of Immunological Methods

Volumes 400–401, 31 December 2013, Pages 37-44

https://doi.org/10.1016/j.jim.2013.08.014

Highlights

• We developed a tool for visualization of linear and non-linear immunological motifs.
• Utility is demonstrated for neutralizing influenza B cell epitopes.
• Utility is demonstrated for allergenic and hypoallergenic Bet v 1 allergens.
• Utility is demonstrated for variability of HLA-DRB1 binding pocket P1.
• The BlockLogo tool is available at http://research4.dfci.harvard.edu/cvc/blocklogo/.

Abstract

BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, a selection of motif positions, the type of sequence, and the output format definition. The output consists of a BlockLogo, the corresponding sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). An additional example shows a visualization and analysis of structural motifs that determine the specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions, and it provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://met-hilab.bu.edu/blocklogo/.

Introduction

Sequence logos are useful tools for visual display of conservation and variability in a multiple sequence alignment (MSA) of DNA, RNA, or protein sequences (Schneider and Stephens, 1990). Individual nucleotides or residues in each position in an MSA are displayed by stacking the characters, where the height of each character corresponds to its frequency relative to the frequencies of all the characters in that position, and the height of the stack is determined by the total information content (Shannon, 1948). Sequence logos aid the interpretation of sequence data by visualization of conserved motifs representing various functional or structural properties. Examples of motifs that have been visually analyzed using sequence logos are: transcription factors (Wade et al., 2004), enzyme DNA sequences (Goll and Bestor, 2005), proteolytic cleavage sites (Mahrus et al., 2008), T-cell epitopes (Bryson et al., 2009; Olsen et al., 2011), and the analysis of targets of neutralizing antibodies in HIV (Sun et al., 2008), among others. Sequence logos display stacked motifs with the most frequent residues shown at the bottom and the least frequent motif displayed on the top of the stack. Sequence logos visualize biological sequence motifs where the height of the logo element represents its log-transformed frequency displayed in bits of information. Logos often do not display low-frequency motifs because their heights are below useful resolution.
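The relationship between character height and information content described above can be illustrated with a minimal sketch. It assumes the commonly used definition in which the stack height of a column is log2(alphabet size) minus the Shannon entropy of that column, and each character's height is its relative frequency times the stack height. The small-sample correction is omitted, gaps are not handled, and the function name `column_letter_heights` is hypothetical rather than part of any logo tool.

```python
import math
from collections import Counter

def column_letter_heights(column, alphabet_size=20):
    """Return {character: height in bits} for one alignment column.

    Assumes stack height = log2(alphabet_size) - Shannon entropy,
    with no small-sample correction (a simplification).
    """
    counts = Counter(column)
    total = sum(counts.values())
    freqs = {ch: n / total for ch, n in counts.items()}
    # Shannon entropy of the column, in bits
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    # Total information content of the column (height of the stack)
    information = math.log2(alphabet_size) - entropy
    # Each character's height is proportional to its relative frequency
    return {ch: p * information for ch, p in freqs.items()}

# Toy DNA column (invented data): seven A's and one G
heights = column_letter_heights("AAAAAAAG", alphabet_size=4)
for ch, h in sorted(heights.items(), key=lambda item: -item[1]):
    print(f"{ch}: {h:.3f} bits")
```

In this toy column the rare G receives a height of well under 0.2 bits, which illustrates why low-frequency characters can fall below useful resolution.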

The most popular sequence logo web server is WebLogo (Crooks et al., 2004). It enables users to generate standard sequence logos for DNA, RNA, and protein sequences. In addition to the WebLogo web server, several specialized logo generators have been developed to visualize specific motifs or functional sequence units that are not apparent from the standard sequence logos. Examples of extensions to the basic sequence logo are: RNA structure logo (Gorodkin et al., 1997), which combines the standard sequence logo with information about base pairing and mutual information of base pairs; enoLOGOS (Workman et al., 2005), which displays energy measurements, probability matrices and alignment matrices in addition to the standard sequence logo; two-sample logo (Vacic et al., 2006), which displays comparative sequence logos for two sets of MSAs; CorreLogo (Bindewald et al., 2006), which calculates mutual information of nucleotides in different positions to determine correlation and potential base pairing; Phylo-mLogo (Shih et al., 2007), which creates sequence logos for the comparison of phylogenetically distinct clades within an MSA of DNA sequences; BLogo (Li et al., 2008), which displays a sequence logo with statistically significant bias of individual positions; RNALogo (Chang et al., 2008), which extends the RNA structure logo with a graphical representation of secondary structure; PoreLogo (Oliva et al., 2009), which uses sequence logos and 3D protein structures to visualize motifs of channels in transmembrane proteins; iceLogo (Colaert et al., 2009), which provides a probability-based visualization by allowing users to define reference sequences of the sample's origin; Seq2Logo (Thomsen and Nielsen, 2012), which offers the capacity to visualize amino acid sequence profiles in terms of amino acid enrichment and depletion; RIlogo (Menzel et al., 2012), which visualizes RNA–RNA interactions; and CodonLogo (Sharma et al., 2012), which enables visualization of conserved codon patterns.

The BlockLogo web server (Fig. 1) enables visualization of continuous and discontinuous immune epitopes and various sequence motifs. To our knowledge, it is the first logo web server that specifically enables visualization and analysis of immunologically relevant motifs.

WebLogo is suitable for the visualization of immunological motifs such as immune epitopes. A main limitation of the standard sequence logo for this type of application is that sequence logos carry no information about the relationship between the residues in the logo, but treat each residue as an individual independent position. Often, such logos have limited interpretability. For example, the sequence logo of influenza A HA peptide 232–241 (Fig. 2A) shows variability that can be encoded by as many as 3072 different peptides (4 × 1 × 1 × 4 × 3 × 2 × 4 × 2 × 2 × 2, corresponding to the number of different residues in each position). The BlockLogo presented in Fig. 2B and Table 1 shows, at a glance, that the vast majority of actual sequence diversity is produced by only five peptides that can be read directly from the BlockLogo. The actual number of different peptides that produced the sequence logo displayed in Fig. 2A is seven, as shown in Table 1. The peptides visible in this BlockLogo have frequencies > 6%, while each of the two peptides not readable from the BlockLogo has a frequency of < 1%. Sequence logos can be useful for visualizing individual anchor position variability of MHC binding peptides; however, since many motifs, such as T-cell epitopes, are recognized as linear peptides rather than individual residues, they should be visualized as continuous sequence blocks or fragments. Typical MHC class I T-cell epitopes are between 8 and 11 amino acids long. MHC class II epitopes can be longer than 30 amino acids, but they bind MHC through a nine amino acid long binding core (Reinherz et al., 1999). The input to the BlockLogo web server tool is an MSA of nucleotides, of short peptides of equal length, or of a user-defined subset of positions (here termed a “block”) within an MSA of longer protein sequences. The user-defined positions from within an MSA (i.e. positions derived from the continuous or discontinuous motifs) define the blocks.

The information content (Shannon entropy) and relative frequency of each block are calculated, and the sequences are printed in the BlockLogo, stacked according to frequency, from the most to the least frequent, from the bottom to the top of the stack. An extension of BlockLogo enables the prediction of the binding affinity of identified peptides for a selection of common HLA molecules using the netMHC prediction algorithms (Lundegaard et al., 2011; Nielsen et al., 2007) that have been experimentally validated for accuracy.
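The contrast between the number of peptides implied by position-wise variability and the number of blocks actually observed can be illustrated with a short sketch. This is not the BlockLogo server code: the function names are hypothetical, the toy alignment is invented, and positions are assumed to be 1-based alignment columns.

```python
from collections import Counter
from math import prod

def extract_blocks(aligned_seqs, positions):
    """Read the selected 1-based alignment columns from every sequence."""
    return ["".join(seq[p - 1] for p in positions) for seq in aligned_seqs]

def possible_combinations(blocks):
    """Peptides implied when each position is treated as independent:
    the product of the number of distinct residues per position."""
    return prod(len(set(column)) for column in zip(*blocks))

def block_frequencies(blocks):
    """Distinct blocks with relative frequencies, most frequent first
    (the order in which they would be stacked from the bottom up)."""
    counts = Counter(blocks)
    total = len(blocks)
    return [(block, n / total) for block, n in counts.most_common()]

# Toy alignment (invented data); columns 2, 3 and 5 form a discontinuous block.
# A continuous block would use e.g. list(range(start, end + 1)) instead.
alignment = ["MKTAYL", "MKTAYL", "MKSAYL", "MRTAFL"]
blocks = extract_blocks(alignment, [2, 3, 5])

print("possible by position:", possible_combinations(blocks))
print("actually observed:", len(set(blocks)))
for block, freq in block_frequencies(blocks):
    print(f"{block}\t{freq:.2f}")
```

In this toy case the per-position view allows 2 × 2 × 2 = 8 combinations, while only three distinct blocks occur, which mirrors the 3072-versus-seven observation above on a small scale.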

Section snippets

Variability and conservation metrics

Calculation of information content of individual positions in an MSA of homologous protein sequences is based on Shannon entropy (Shannon, 1948). Similarly, Shannon entropy can be calculated for each motif within a defined block. Each block contains W unique motifs of length l in a dataset of N sequences. The formula used for the calculation of block entropy is (Olsen et al., 2011): $H(B_{x}) = -\sum_{w=1}^{W} P_{w}(x) \log_{2} P_{w}(x)$ where H(B_x) is the total entropy of a block of motifs starting at position x, and w is a
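A minimal sketch of the block entropy formula follows. Because the definition of the symbols is cut off in this snippet, the sketch assumes that P_w(x) is the relative frequency of the w-th unique motif among the N motifs in the block; the function name `block_entropy` and the toy data are hypothetical.

```python
import math
from collections import Counter

def block_entropy(motifs):
    """Shannon entropy (bits) over the unique motifs of one block.

    Assumes P_w(x) = count of motif w / N, where N = len(motifs).
    """
    counts = Counter(motifs)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Toy block (invented data): W = 3 unique motifs among N = 4 sequences
motifs = ["KTY", "KTY", "KSY", "RTF"]
print(f"H = {block_entropy(motifs):.2f} bits")
```

Under this reading, a fully conserved block has an entropy of 0 bits, and the entropy of a block with N motifs cannot exceed log2(N), which is reached when every motif is unique.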

User interface

The user is prompted to copy/paste an MSA, or upload a file containing an MSA, in standard FASTA or ClustalW formats. Users can select a block from the MSA by specifying the start and end positions of the subset, or a series of individual positions corresponding to the positions of a discontinuous motif. The motifs that have a gap in any of the positions within the specified range will be excluded by default. In the analysis of discontinuous motifs, the sequences with gaps in specified
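The input handling described above can be sketched as follows, assuming FASTA input only. The sketch is illustrative rather than the server's implementation: the function names are hypothetical, `-` and `.` are assumed to be the gap characters, and the default gap-exclusion rule is applied to discontinuous positions as well, which is an assumption because the corresponding description is cut off in this snippet.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: aligned sequence}.
    Duplicate names overwrite earlier records (a simplification)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def select_motifs(records, positions, gap_chars="-."):
    """Extract motifs at 1-based alignment positions and drop any motif
    that has a gap character in one of the selected positions."""
    motifs = {}
    for name, seq in records.items():
        motif = "".join(seq[p - 1] for p in positions)
        if any(ch in gap_chars for ch in motif):
            continue  # excluded by default, as for continuous ranges
        motifs[name] = motif
    return motifs

# Toy alignment (invented data); sequence "b" has a gap in column 3
fasta = ">a\nMKTAYL\n>b\nMK-AYL\n>c\nMRTAFL\n"
records = parse_fasta(fasta)
print(select_motifs(records, [2, 3, 5]))           # discontinuous positions
print(select_motifs(records, list(range(1, 3))))   # continuous range 1-2
```

Sequence "b" is dropped from the first selection because column 3 is a gap, but it is retained in the second selection, where no selected column contains a gap.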

Conservation of influenza A T-cell epitopes

To illustrate the utility of BlockLogo, we analyzed a block of peptides in 29,113 influenza virus HA protein sequences, containing approximately 36.1 bits of information. All peptides in the block of 10-mers starting at position 232 were predicted to bind to HLA-A*02:01 with similar affinities. The relative frequencies of individual peptides within the viral population cannot be determined from the standard sequence logo produced with WebLogo (Fig. 2A), but are clear from the BlockLogo (Fig. 2

Conclusion and discussion

BlockLogo is a novel sequence logo tool optimized for the visualization of user-defined continuous and discontinuous motifs, fragments, and peptides. Paired with the prediction of HLA binding, BlockLogo is a useful tool for the rapid assessment of the immunological potential of selected regions within an MSA, such as those containing human pathogen sequences or tumor antigen alignments. The BlockLogo tool provides an easily interpretable visual representation of the immunological status and

Funding

LRO was funded by the Novo Nordisk Foundation; UJK was funded by the Oticon Foundation; C. Simon was funded by the Novo Scholarship Programme; and GLZ, JS, VB, and ELR acknowledge funding from NIH grant U01 AI 90043.

References (35)

S.F. Altschul et al. Basic local alignment search tool. J. Mol. Biol. (1990)
G. Chelvanayagam. A roadmap for HLA-DR peptide binding specificities. Hum. Immunol. (1997)
C. Lundegaard et al. Prediction of epitopes using neural network based methods. J. Immunol. Methods (2011)
S. Mahrus et al. Global sequencing of proteolytic cleavage sites in apoptosis by specific labeling of protein N termini. Cell (2008)
Z.-Y.J. Sun et al. HIV-1 broadly neutralizing antibody extracts its epitope from a kinked gp41 ectodomain region on the viral membrane. Immunity (2008)
G.L. Zhang et al. Machine learning competition in immunology — prediction of HLA class I binding peptides. J. Immunol. Methods (2011)
E. Bindewald et al. CorreLogo: an online server for 3D sequence logos of RNA and DNA alignments. Nucleic Acids Res. (2006)
S. Bryson et al. Crystallographic definition of the epitope promiscuity of the broadly neutralizing anti-human immunodeficiency virus type 1 antibody 2F5: vaccine design implications. J. Virol. (2009)
T.-H. Chang et al. RNALogo: a new approach to display structural RNA alignment. Nucleic Acids Res. (2008)
C. Chothia. Hydrophobic bonding and accessible surface area in proteins. Nature (1974)
N. Colaert et al. Improved visualization of protein consensus sequences by iceLogo. Nat. Methods (2009)
G.E. Crooks et al. WebLogo: a sequence logo generator. Genome Res. (2004)
M.G. Goll et al. Eukaryotic cytosine methyltransferases. Annu. Rev. Biochem. (2005)
J. Gorodkin et al. Displaying the information contents of structural RNA alignments: the structure logos. Comput. Appl. Biosci. (1997)
K. Katoh et al. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. (2013)
W. Li et al. BLogo: a tool for visualization of bias in biological sequences. Bioinformatics (2008)
H.H. Lin et al. Evaluation of MHC class I peptide binding prediction servers: applications for vaccine research. BMC Immunol. (2008)
