Elsevier

Computers & Graphics

Volume 99, October 2021, Pages 1-21
Special Section on 3DOR 2021
SHREC 2021: Retrieval and classification of protein surfaces equipped with physical and chemical properties

https://doi.org/10.1016/j.cag.2021.06.010

Highlights

  • A new benchmark of protein surfaces equipped with three physicochemical properties.

  • Protein surfaces are retrieved and classified with respect to different ground truths.

  • Analysis of the retrieval and classification performances of the methods that participated in SHREC 2021.

  • A thorough investigation of how different physicochemical properties impact the performance of retrieval methods.

Abstract

This paper presents the methods that have participated in the SHREC 2021 contest on retrieval and classification of protein surfaces on the basis of their geometry and physicochemical properties. The goal of the contest is to assess the capability of different computational approaches to identify different conformations of the same protein, or the presence of common sub-parts, starting from a set of molecular surfaces. We addressed two problems: defining the similarity solely based on the surface geometry or with the inclusion of physicochemical information, such as electrostatic potential, amino acid hydrophobicity, and the presence of hydrogen bond donors and acceptors. Retrieval and classification performances, with respect to the single protein or the existence of common sub-sequences, are analysed according to a number of information retrieval indicators.

Introduction

Automatically identifying the different conformations of a given set of proteins, as well as their interactions with other molecules, is crucial in structural bioinformatics. The well-established shape-function paradigm for proteins [1] states that a protein of a given sequence has one main privileged conformation, which is crucial for its function. However, over time every protein explores a much larger part of its conformational space. The most stable conformations visited by the protein can be captured experimentally by NMR spectroscopy; NMR structures are also convenient because their atomic models already include the hydrogen atoms, which leaves fewer ambiguities in the charge assignment.

Recognising a protein from an ensemble of geometries corresponding to the different conformations it can assume means capturing the features that are unique to it, and it is a fundamental step from the structural bioinformatics viewpoint. It is a prerequisite for defining a geometry-based notion of similarity, and subsequently of complementarity, between proteins. From the application standpoint, the identification of characteristic features can point to the functional regions of a protein and, in drug discovery, to new target sites for blocking the activity of pathological proteins. These features become more specific if the geometry of the molecular surface is complemented with the main physicochemical descriptors, such as the local electrostatic potential [2], residue hydrophobicity [3], and the location of hydrogen bond donors and acceptors [4].

The aim of this track is to evaluate the retrieval and classification performance of computational methods on protein surfaces characterized by physicochemical properties. Starting from a set of protein structures in different conformational states, determined by NMR experiments and deposited in the PDB repository [5], we build their Solvent Excluded Surface (SES) using the freely available software NanoShaper [6], [7]. Unlike previous SHREC tracks [8], [9], [10], [11], we enrich the SES triangulations with scalar fields representing physicochemical properties, evaluated at the surface vertices.
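To make the data layout concrete, here is a minimal sketch of a triangulated surface carrying per-vertex scalar fields. It is not the benchmark's actual file format, which this section does not specify; the class name `SurfaceMesh`, the field name `"potential"` and the face-averaging helper are hypothetical, chosen only to show that each field stores exactly one value per vertex and can be interpolated over triangles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SurfaceMesh:
    """Hypothetical container: a triangle mesh plus scalar fields sampled at its vertices."""
    vertices: list[tuple[float, float, float]]
    faces: list[tuple[int, int, int]]  # each face lists three vertex indices
    fields: dict[str, list[float]] = field(default_factory=dict)

    def add_field(self, name: str, values: list[float]) -> None:
        # A vertex-based field must carry exactly one value per vertex.
        if len(values) != len(self.vertices):
            raise ValueError(
                f"field {name!r} has {len(values)} values for {len(self.vertices)} vertices"
            )
        self.fields[name] = list(values)

    def face_average(self, name: str) -> list[float]:
        # Average the field over the three corners of each triangle.
        vals = self.fields[name]
        return [(vals[a] + vals[b] + vals[c]) / 3.0 for a, b, c in self.faces]


# Usage: a single tetrahedron (4 vertices, 4 faces) with made-up field values.
tet = SurfaceMesh(
    vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)],
    faces=[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)],
)
tet.add_field("potential", [0.0, 3.0, 6.0, 9.0])
print(tet.face_average("potential"))  # [3.0, 4.0, 5.0, 6.0]
```

In a real pipeline, several such fields (for example, one per physicochemical property) would sit side by side in `fields`, all indexed by the same vertices.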

The remainder of this paper is organized as follows. Section 2 overviews the previous benchmarks that were aimed at protein shape retrieval aspects. Then, in Section 3 we detail the dataset, the ground truth and the retrieval and classification metrics used in the contest. The methods submitted for evaluation to this SHREC are detailed in Section 4, while their retrieval and classification performances are presented in Section 5. Finally, discussions and concluding remarks are in Section 6.

Section snippets

Related benchmarks

Recognising proteins and other biomolecules solely on the basis of their structure is a lively challenge in biology, and the scientific literature is seeing the rise of datasets and methods for surface-based protein retrieval. The Protein Data Bank (PDB) repository [5] is the most widely known public repository of experimentally determined protein and nucleic acid structures. The PDB collects over 175,000 biological macromolecular 3D structures of proteins, nucleic acids, lipids,

The benchmark

Over the years, the idea has consolidated that a more satisfactory answer to the protein shape retrieval problem requires combining geometry with patterns of chemical and geometric features [14]. For this reason, we build on previous SHREC experiences to create a dataset equipped with both kinds of information.

Description of methods

Eight groups from five different countries registered for this track, and five of them submitted results. Each participant was allowed to send us up to three runs per task, each in the form of a dissimilarity matrix. All but one participant submitted three runs per task; the remaining participant delivered three runs for Task A and one for Task B. Overall, Task A gathered 15 runs and Task B 13 runs.
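Because every run is delivered as a dissimilarity matrix, it may help to see how such a matrix can arise from per-surface descriptors. The sketch below is purely illustrative and is not the method of any participant: it assumes each surface is summarised by a normalised histogram of one per-vertex scalar field, and compares histograms with the L1 distance. The functions `histogram`, `l1` and `dissimilarity_matrix` are hypothetical names.

```python
def histogram(values, bins, lo, hi):
    """Normalised histogram of a per-vertex scalar field over [lo, hi]."""
    if not values:
        raise ValueError("empty field")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = int((v - lo) / width)
        k = min(max(k, 0), bins - 1)  # clamp out-of-range values to the end bins
        counts[k] += 1
    return [c / len(values) for c in counts]


def l1(p, q):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def dissimilarity_matrix(descriptors):
    """Symmetric n x n matrix with a zero diagonal."""
    n = len(descriptors)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = l1(descriptors[i], descriptors[j])
    return d


# Usage: three toy "surfaces", each reduced to its field values.
fields = [
    [0.1, 0.2, 0.8, 0.9],
    [0.15, 0.25, 0.85, 0.95],
    [0.5, 0.5, 0.5, 0.5],
]
descs = [histogram(v, bins=2, lo=0.0, hi=1.0) for v in fields]
D = dissimilarity_matrix(descs)
print(D)  # [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
```

Only the upper triangle is computed and mirrored, since a dissimilarity between two surfaces should not depend on their order.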

In the following, we will denote the methods proposed by the five

Comparative analysis

The performance of each run presented in Section 4 is evaluated here quantitatively using the measures described in Section 3.3. We remind the reader that Task A relies on geometry alone, while Task B includes both geometry and physicochemical properties; for each run, the method name and its specific settings are given in Section 4. The performance measures are reported for both the PDB and BLAST classifications detailed in Section 3.2.
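For readers unfamiliar with retrieval indicators, the sketch below computes three measures that are common in SHREC evaluations, nearest neighbour (NN), first tier (FT) and second tier (ST), from a dissimilarity matrix and class labels. This section does not list the measures of Section 3.3, so treat this choice as an assumption; the function `retrieval_scores` is hypothetical. For a query whose class has C members, FT is the fraction of the C-1 closest items that share its class, and ST counts matches among the 2(C-1) closest items, still dividing by C-1.

```python
def retrieval_scores(dist, labels):
    """Mean NN, FT and ST over all queries; each query is excluded from its own ranking."""
    n = len(labels)
    nn = ft = st = 0.0
    queries = 0
    for q in range(n):
        # Number of other items sharing the query's class (C - 1).
        relevant = sum(1 for j in range(n) if j != q and labels[j] == labels[q])
        if relevant == 0:
            continue  # singleton class: nothing to retrieve, so skip the query
        # Rank all other items by increasing dissimilarity (stable sort: ties by index).
        ranked = sorted((j for j in range(n) if j != q), key=lambda j: dist[q][j])
        hits = [labels[j] == labels[q] for j in ranked]
        nn += hits[0]
        ft += sum(hits[:relevant]) / relevant
        st += sum(hits[:2 * relevant]) / relevant
        queries += 1
    if queries == 0:
        raise ValueError("no query has a non-singleton class")
    return nn / queries, ft / queries, st / queries


# Usage: two classes of two items; items 1 and 2 are (wrongly) very close.
labels = ["A", "A", "B", "B"]
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 0.5, 6.0],
    [4.0, 0.5, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
]
print(retrieval_scores(dist, labels))  # (0.5, 0.5, 1.0)
```

Ties in distance are broken by item index because Python's sort is stable; an official evaluation script might handle ties differently.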

An additional analysis, reported in

Concluding remarks

In this paper, we have provided a detailed analysis and evaluation of state-of-the-art retrieval and classification algorithms for protein similarity assessment based on molecular surfaces, a topic that we believe deserves attention from the research community. The introduction of physicochemical properties into the benchmark is an element of originality among the available benchmarks for structural biology and provides a more complete representation of the protein. To enable the

CRediT authorship contribution statement

Andrea Raffo: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Project administration, Writing - original draft, Writing - review & editing. Ulderico Fugacci: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Project administration, Writing - original draft, Writing - review & editing. Silvia Biasotti: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Writing -

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The track organisers thank Dr. Michela Spagnuolo and Dr. Davide Boscaini for the fruitful discussions. Special thanks go to Ms. Daniela Bejan, for her help in using the software PyMol.

This project is co-funded by the project “TEACUP: Metodi e TEcniche innovative per lo sviluppo di librerie per la modellazione, l’Analisi e il confronto CompUtazionale di Proteine”, POR FSE, Programma Operativo Regione Liguria 2014–2020, No RLOF18ASSRIC/68/1. The CNR-IMATI research is partially developed in the

References (42)

  • N. Song et al. SHREC’17 Track: protein shape retrieval. 3DOR ’17, Proceedings of the Workshop on 3D Object Retrieval (2017).
  • F. Langenfeld et al. SHREC 2018 - Protein shape retrieval. Proceedings of the 11th Eurographics Workshop on 3D Object Retrieval (2018).
  • F. Langenfeld et al. SHREC19 Protein shape retrieval contest.
  • F.M. Richards. Areas, volumes, packing, and protein structure. Annu Rev Biophys Bioeng (1977).
  • M.L. Connolly. Analytical molecular surface calculation. J Appl Crystallogr (1983).
  • P. Gainza et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods (2020).
  • P. Cignoni et al. MeshLab: an open-source mesh processing tool.
  • L. Wang et al. Using DelPhi capabilities to mimic protein’s conformational reorganization with amino acid specific dielectric constants. Commun Comput Phys (2013).
  • E. Jurrus et al. Improvements to the APBS biomolecular solvation software suite. Protein Sci (2018).
  • B. Rost. Twilight zone of protein sequence alignments. Protein Engineering, Design and Selection (1999).
  • C. Camacho et al. BLAST+: architecture and applications. BMC Bioinformatics (2009).

    This article has been certified as Replicable by the Graphics Replicability Stamp Initiative: http://www.replicabilitystamp.org

    1 Track organizer.
