Special Section on 3DOR 2021
SHREC 2021: Retrieval and classification of protein surfaces equipped with physical and chemical properties
Introduction
Automatically identifying the different conformations of a given set of proteins, as well as their interaction with other molecules, is crucial in structural bioinformatics. The well-established shape-function paradigm for proteins [1] states that a protein of a given sequence has one main privileged conformation, which is crucial for its function. However, during its time evolution every protein explores a much larger part of the conformational space. The most stable conformations visited by the protein can be experimentally captured by the NMR technique; an additional advantage of NMR structures is that the hydrogen atoms are already included in the atomic model, which leaves fewer ambiguities in the charge assignment.
Recognising a protein from an ensemble of geometries corresponding to the different conformations it can assume means capturing the features that are unique to it, and it is a fundamental step from the structural bioinformatics viewpoint. It is preliminary to the definition of a geometry-based notion of similarity, and, subsequently, complementarity, between proteins. From the application standpoint, the identification of characteristic features can point to protein functional regions and to new target sites for blocking the activity of pathological proteins in the drug discovery field. These features can become more specific if the geometry of the molecular surface is complemented with information on the main physicochemical descriptors, such as local electrostatic potential [2], residue hydrophobicity [3], and the location of hydrogen bond donors and acceptors [4].
The aim of this track is to evaluate the retrieval and classification performance of computational methods for protein surfaces characterized by physicochemical properties. Starting from a set of protein structures in different conformational states generated via NMR experiments and deposited in the PDB repository [5], we build their Solvent Excluded Surface (SES) with the freely available software NanoShaper [6], [7]. Unlike previous SHREC tracks [8], [9], [10], [11], we enrich the protein SES triangulations with scalar fields representing physicochemical properties, evaluated at the surface vertices.
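To make the data model concrete, the following minimal sketch shows one way a triangulated surface with per-vertex scalar fields could be represented. The class name `ProteinSurface`, its methods, the field names, and all numerical values are hypothetical and chosen for illustration only; they do not reflect the file format or tooling actually used in the track.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ProteinSurface:
    """Triangulated surface with per-vertex scalar fields (illustrative only)."""
    vertices: List[Tuple[float, float, float]]
    triangles: List[Tuple[int, int, int]]
    scalar_fields: Dict[str, List[float]] = field(default_factory=dict)

    def add_scalar_field(self, name: str, values: List[float]) -> None:
        # A scalar field must provide exactly one value per surface vertex.
        if len(values) != len(self.vertices):
            raise ValueError(
                f"{name}: expected {len(self.vertices)} values, got {len(values)}"
            )
        self.scalar_fields[name] = list(values)

    def values_at(self, vertex_index: int) -> Dict[str, float]:
        # Collect the value of every stored field at a single vertex.
        return {name: vals[vertex_index] for name, vals in self.scalar_fields.items()}


# Toy surface: one triangle with made-up coordinates and property values.
surface = ProteinSurface(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    triangles=[(0, 1, 2)],
)
surface.add_scalar_field("electrostatic_potential", [-0.5, 0.1, 0.3])
surface.add_scalar_field("hydrophobicity", [1.8, -3.5, 4.2])
print(surface.values_at(1))
```

Keeping the geometry and the scalar fields in the same object means that a geometry-only method can simply ignore `scalar_fields`, while a method that also uses physicochemical properties reads them at the same vertex indices.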
The remainder of this paper is organized as follows. Section 2 overviews the previous benchmarks that were aimed at protein shape retrieval aspects. Then, in Section 3 we detail the dataset, the ground truth and the retrieval and classification metrics used in the contest. The methods submitted for evaluation to this SHREC are detailed in Section 4, while their retrieval and classification performances are presented in Section 5. Finally, discussions and concluding remarks are in Section 6.
Related benchmarks
Recognising proteins and other biomolecules solely on the basis of their structure is a lively challenge in biology, and the scientific literature is seeing a rise of datasets and methods for surface-based retrieval of proteins. The Protein Data Bank (PDB) repository [5] is the most widely known public repository for experimentally determined protein and nucleic acid structures. The PDB collects over 175,000 biological macromolecular 3D structures of proteins, nucleic acids, lipids,
The benchmark
Over the years, we have witnessed the consolidation of the idea that a more satisfactory answer to the protein shape retrieval problem requires combining geometry with patterns of chemical and geometric features [14]. For this reason, we build on the previous SHREC experiences to create a dataset equipped with both characteristics.
Description of methods
Eight groups from five different countries registered for this track. Five of them proceeded with the submission of their results. Each participant was allowed to send us up to three runs for each task, in the form of one dissimilarity matrix per run. All but one submitted three runs per task; the remaining participant delivered three runs for Task A and one for Task B. Overall, Task A gathered 15 runs, while Task B gathered 13 runs.
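As an illustration of how a submitted dissimilarity matrix can be turned into retrieval results, the sketch below ranks models for a query and computes a nearest-neighbour score against class labels. This is a generic measure assumed here for illustration and is not necessarily the one adopted in the track; the function names, the toy matrix, and the labels are all hypothetical.

```python
from typing import List, Sequence


def ranked_list(dissimilarity: Sequence[Sequence[float]], query: int) -> List[int]:
    """Indices of all models except the query, sorted by increasing dissimilarity."""
    n = len(dissimilarity)
    others = (j for j in range(n) if j != query)
    return sorted(others, key=lambda j: dissimilarity[query][j])


def nearest_neighbour_accuracy(dissimilarity: Sequence[Sequence[float]],
                               labels: Sequence[int]) -> float:
    """Fraction of queries whose closest other model has the query's label."""
    n = len(labels)
    hits = 0
    for q in range(n):
        closest = ranked_list(dissimilarity, q)[0]
        hits += labels[closest] == labels[q]
    return hits / n


# Toy 4x4 matrix with made-up values: models 0 and 1 belong to class 0,
# models 2 and 3 belong to class 1.
D = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.1],
    [0.9, 0.7, 0.0, 0.3],
    [0.8, 0.1, 0.3, 0.0],
]
labels = [0, 0, 1, 1]

print(ranked_list(D, 1))
print(nearest_neighbour_accuracy(D, labels))
```

Because a run is fully described by its matrix, the same evaluation code can be applied unchanged to any submission and to any ground-truth labelling of the dataset.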
In the following, we will denote the methods proposed by the five
Comparative analysis
The performance of each run presented in Section 4 is here quantitatively evaluated on the basis of the measures described in Section 3.3. We remind the reader that Task A relies on geometry alone, while Task B includes both geometry and physicochemical properties; for any run, the method name and its specific settings are given in Section 4. The performance measures are presented for both the PDB and BLAST classifications detailed in Section 3.2.
An additional analysis, reported in
Concluding remarks
In this paper, we have provided a detailed analysis and evaluation of state-of-the-art retrieval and classification algorithms dealing with protein similarity assessment based on molecular surfaces, which we believe deserve attention from the research community. The introduction of physicochemical properties into the benchmark represents an element of originality among the available benchmarks for structural biology and provides a more complete representation of the protein. To enable the
CRediT authorship contribution statement
Andrea Raffo: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Project administration, Writing - original draft, Writing - review & editing. Ulderico Fugacci: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Project administration, Writing - original draft, Writing - review & editing. Silvia Biasotti: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Writing -
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The track organisers thank Dr. Michela Spagnuolo and Dr. Davide Boscaini for the fruitful discussions. Special thanks go to Ms. Daniela Bejan, for her help in using the software PyMol.
This project is co-funded by the project “TEACUP: Metodi e TEcniche innovative per lo sviluppo di librerie per la modellazione, l’Analisi e il confronto CompUtazionale di Proteine”, POR FSE, Programma Operativo Regione Liguria 2014–2020, No RLOF18ASSRIC/68/1. The CNR-IMATI research is partially developed in the
References (42)
- et al. A simple method for displaying the hydropathic character of a protein. J Mol Biol (1982)
- et al. An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein–protein complexes. J Mol Biol (2003)
- et al. SHREC 2020: Multi-domain protein shape retrieval challenge. Computers & Graphics (2020)
- Community detection in graphs. Phys Rep (2010)
- et al. Pytorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703 (2019)
- et al. Molecular biology of the cell (2002)
- et al. Extending the applicability of the nonlinear poisson-boltzmann equation: multiple dielectric constants and multivalent ions. The Journal of Physical Chemistry B (2001)
- et al. The protein data bank. Nucleic Acids Res (2000)
- et al. A general and robust ray-casting-based algorithm for triangulating surfaces at the nanoscale. PLoS ONE (2013)
- et al. Nanoshaper-vmd interface: computing and visualizing surfaces, pockets and channels in molecular systems. Bioinform (2019)
- SHREC’17 Track: protein shape retrieval. 3Dor ’17. Proceedings of the Workshop on 3D Object Retrieval
- SHREC 2018 - Protein shape retrieval. Proceedings of the 11th Eurographics Workshop on 3D Object Retrieval
- SHREC19 Protein shape retrieval contest
- Areas, volumes, packing, and protein structure. Annu Rev Biophys Bioeng
- Analytical molecular surface calculation. J Appl Crystallogr
- Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods
- Meshlab: an open-source mesh processing tool
- Using delphi capabilities to mimic protein’s conformational reorganization with amino acid specific dielectric constants. Commun Comput Phys
- Improvements to the apbs biomolecular solvation software suite. Protein Sci
- Twilight zone of protein sequence alignments. Protein Engineering, Design and Selection
- Blast+: architecture and applications. BMC Bioinformatics
This article has been certified as Replicable by the Graphics Replicability Stamp Initiative: http://www.replicabilitystamp.org
1 Track organizer.