PDBrt: A free database of complexes with measured drug-target residence time [version 2; peer review: 1 approved with reservations]

Background: Difficulties in translating the in vitro potency determined by cellular assays into in vivo efficacy in living organisms complicate the design and development of drugs. However, the residence time of a drug in its molecular target is becoming a key parameter in the design and optimization of new drugs, as recent studies show that residence time can reliably predict drug efficacy in vivo. Experimental approaches to binding kinetics and target-ligand complex solutions are currently available, but known bioinformatics databases do not usually report information about the ligand residence time in its molecular target.
Methods: To extend existing databases we developed the Protein Data Bank (PDB) residence time database (PDBrt), which reports drug residence time. The database is implemented as an open access web-based tool. The front end uses Bootstrap with Hypertext Markup Language (HTML), jQuery for the interface and 3Dmol.js to visualize the complexes. The server-side code uses a Python web application framework, Django Rest Framework, with PostgreSQL as the backend database.
Results: The PDBrt database is a free, non-commercial repository for 3D protein-ligand complex data, including the measured ligand residence time inside the binding pocket of the specific biological macromolecules as deposited in the Protein Data Bank. The PDBrt database contains information about both the protein and the ligand separately, as well as the protein-ligand complex, binding kinetics, and the residence time of the ligand inside the protein binding site.


Introduction
Current knowledge allowing for the determination of the effectiveness of small molecules in vivo is limited. It is known that the leading reason for drug candidate failure is the lack of efficacy caused by a poor translation of in vitro potency assays into in vivo activity in humans (Copeland, 2016a, 2016b; Swinney, 2009). In vitro experimentation refers to closed system conditions in which the drug molecule and its target are present at unchanging concentrations throughout the experiment (Tummino & Copeland, 2008). However, in living organisms, processes run under open, non-equilibrium conditions where the drug is constantly interacting with various molecules during many physiological processes in addition to its native target. To improve the prediction of in vivo drug efficacy, the residence time of the drug-target complex, defined as the reciprocal of the dissociation rate constant (k_off), should be considered. Drug-target residence time is crucial because pharmacological activity depends on the drug being bound to its molecular target. When the drug dissociates from the binding site, the target molecule is free to continue its pathophysiological function. Within 10 years of the development of the drug-target residence time concept, the parameter has become an extremely important factor in the process of optimizing lead structures in computer-aided drug design (Copeland et al., 2006; Copeland, 2016a, 2016b). Traditional computational approaches (such as molecular docking) take into account only the equilibrium affinity of the drug for its molecular target (e.g., K_i); however, the concept of residence time also takes into account conformational dynamics of molecules, which can have a significant effect on the binding and dissociation of the drug (Copeland et al., 2006; Tummino & Copeland, 2008; Copeland, 2021). Nevertheless, there is some criticism indicating that using the residence time as the only measure of drug efficacy provides a limited picture of binding and affinity kinetics and is a suboptimal way to guide drug discovery programs (Folmer, 2018). Existing free and commercial software does not include a residence time model that is able to estimate this quantity, and thus the effects of ligand modification relevant to the design and optimization of drug candidates are not available.
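The definition above can be written compactly. Assuming simple one-step reversible binding (an assumption made here for illustration, not stated in the text), the residence time depends only on the dissociation rate constant, whereas the equilibrium dissociation constant depends on both rate constants:

```latex
\tau = \frac{1}{k_{\mathrm{off}}}, \qquad K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}
```

For example, a complex with k_off = 10^-3 s^-1 has a residence time of 1000 s, roughly 17 minutes. Under the same assumption, two ligands with identical K_d can still differ in residence time if their k_on and k_off values scale together, which is why an equilibrium affinity alone does not capture this parameter.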
To enable further studies and the development of useful drug-target residence time models, we created a database of experimentally measured residence times for biomolecular complexes deposited in the Protein Data Bank (PDB). These data represent a link between structural and kinetic information of the complexes, which may be helpful for various computational and machine learning studies on drug-like molecules in biological systems. The current version of our database, called PDBrt, contains 59 complexes with experimentally measured residence times, covering seven protein families and 56 small molecules. Summary statistics are available in Figure 1 and Table 1.

Implementation
PDBrt database design and structure
The PDBrt database has been designed as an interactive web interface where the user can browse and extract information about the ligand residence time in its molecular target. A RESTful application programming interface (API) built with the Django Rest Framework (v2.2.20), backed by a PostgreSQL (v12) database and served by an Nginx (v1.21.3) web server, returns the output of a query as a user-friendly web page generated with Bootstrap (v3.3.7), Hypertext Markup Language (HTML), Cascading Style Sheets (CSS) and jQuery (v3.5.1). Search, query, data extraction and visualization systems were developed for searching ligand residence time and binding kinetics coefficients.

PDBrt database management system
The Database Management System (DBMS) allows users and programmers to manipulate the data in a systematic way. The DBMS serves as an interface between the database and end users, ensuring consistent data organization and easy accessibility. The PDBrt database is managed by a relational DBMS that uses Structured Query Language (SQL) as the standard language for data manipulation.
Development of the PDBrt database was a multi-step process consisting of a systematic literature search, abstract and report screening, and article review. Several sequential steps involved in data management ensure that the processed data is accessible, reliable, and current for its users (Figure 2).

REVISED Amendments from Version 1
The changes made include the addition of new references for a more critical approach to the residence time model.
Any further responses from the reviewers can be found at the end of the article.

Table 1. A list of chemical compounds in pairs with a given protein family.

Binding kinetics data acquisition and extraction
Data acquisition starts with the collection of protein-ligand binding kinetics data from the available literature by reviewing the primary reference of each PDB file in the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB) to retrieve the experimentally measured ligand-target residence time or dissociation rate (k_off). The references were downloaded and carefully studied to manually extract the data. Only complexes with a known ligand-target residence time or dissociation rate were added to the final dataset. Additionally, major binding kinetics coefficients were collected where available: the inhibition constant (K_i) and the association rate (k_on). Currently the PDBrt database includes 59 protein-ligand complexes with known ligand-target residence time. Structures will be added on a regular basis as the respective data become available.
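The inclusion rule described above can be illustrated with a short sketch. It assumes that a record qualifies if either a residence time or a k_off value was reported, and that a missing residence time is derived as 1/k_off; the function name, field names, PDB codes and values below are hypothetical and are not taken from the PDBrt code base.

```python
from typing import Optional


def residence_time_seconds(
    residence_time_s: Optional[float] = None,
    k_off_per_s: Optional[float] = None,
) -> Optional[float]:
    """Return the residence time in seconds, or None if it cannot be determined.

    A directly reported residence time takes precedence; otherwise it is
    derived as the reciprocal of the dissociation rate constant (1 / k_off).
    """
    if residence_time_s is not None:
        return residence_time_s
    if k_off_per_s is not None and k_off_per_s > 0:
        return 1.0 / k_off_per_s
    return None


# Hypothetical literature records; codes and values are illustrative only.
candidates = [
    {"pdb_id": "XXX1", "residence_time_s": 1200.0, "k_off_per_s": None},
    {"pdb_id": "XXX2", "residence_time_s": None, "k_off_per_s": 0.002},
    {"pdb_id": "XXX3", "residence_time_s": None, "k_off_per_s": None},
]

accepted = []
for record in candidates:
    tau = residence_time_seconds(record["residence_time_s"], record["k_off_per_s"])
    if tau is not None:  # keep only complexes with usable kinetic data
        accepted.append({"pdb_id": record["pdb_id"], "residence_time_s": tau})
```

In this sketch the third record is dropped because neither quantity is available, mirroring the rule that only complexes with a known residence time or dissociation rate enter the dataset.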

Structural data acquisition
Complexes with particular PDB identifiers were downloaded from the RCSB PDB database into the internal PDBrt database core. The protein molecule, along with other components such as water molecules and metal ions, was saved in the pdb format, while the ligand (drug) was saved in Structure Data Format (sdf). Neither the protein nor the ligand was subjected to any structural optimization or modification after being downloaded from the RCSB PDB.
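How each downloaded entry was split is not described in the text. As a rough sketch only, separating the ligand records from the rest of a PDB file could look like the following, assuming the ligand is identified by its three-letter residue name. The function name split_complex and its arguments are hypothetical. Conversion of the ligand records to SDF is not shown, since it would require a cheminformatics toolkit.

```python
def split_complex(pdb_text, ligand_resname):
    """Split PDB-format text into (protein_lines, ligand_lines).

    Ligand lines are HETATM records whose residue name (columns 18-20 of the
    PDB format) equals ligand_resname. All other lines, including waters and
    metal ions, stay with the protein part.
    """
    protein_lines, ligand_lines = [], []
    for line in pdb_text.splitlines():
        is_ligand = (
            line.startswith("HETATM")
            and line[17:20].strip() == ligand_resname
        )
        if is_ligand:
            ligand_lines.append(line)
        else:
            protein_lines.append(line)
    return protein_lines, ligand_lines
```

Because no coordinates are altered, a split of this kind is consistent with the statement that neither molecule is optimized or modified after download.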

PDBrt data content
In addition to the coordinates and general information required for all structures deposited in the RCSB PDB database, PDBrt entries contain the target residence time and other binding kinetics coefficients, the structure files mentioned above, basic ligand properties such as the simplified molecular-input line-entry system (SMILES) or International Chemical Identifier (InChI) string, as well as links to external databases (RCSB PDB, PDBj, PDBe, PDBsum) and to the reference literature in PubMed. The database includes citations to the original sources (publications) that contain information about the experimentally measured residence time or dissociation rate. For each complex, a web-based three-dimensional rendering is provided using 3Dmol.js, a free, openly available object-oriented JavaScript library for visualizing molecular data.

PDBrt database availability and updates
PDBrt is available to the community through its web-based interface. Since the PDB database is growing rapidly and drug-target residence time is a parameter of great interest for drug design and optimization, the PDBrt will be updated on a regular basis with major versions issued annually. New data can be added by the authors (database administrators) either by uploading an xlsx (MS Excel) file or manually.

PDBrt database architecture
PDBrt has a three-tier architecture:
1. The data tier comprises the database and the data access layer.
2. The application tier controls application functionality by performing detailed processing.
3. The presentation tier is accessible to end users and displays information on the website.
A core relational database managed by a PostgreSQL server provides information storage for the deposited data. The back end (data access layer) was implemented in Python (v3.6) and the front end of the database (presentation layer) in HTML/CSS.

PDBrt database model
In PDBrt, data is presented as a collection of relations (tables). Each column (also called an attribute or field) in a table has a distinct name and a specific data type assigned to it. All the information related to a particular entry is stored in a row (also called a record) of that table. The PDBrt database has three main tables ('Complex', 'Protein', 'Ligand') and 59 records with a one-to-one type of relationship. This means that one protein or ligand can only belong to one complex, and one complex consists of only one specific protein and ligand molecule (Figure 3).
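The one-to-one relationships described above can be sketched as a small relational schema. This is an illustration under stated assumptions, not the PDBrt schema: it uses SQLite from the Python standard library instead of PostgreSQL so that it runs without a server, and all column names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE ligand  (id INTEGER PRIMARY KEY, name TEXT NOT NULL, smiles TEXT);
    CREATE TABLE complex (
        id INTEGER PRIMARY KEY,
        pdb_id TEXT NOT NULL UNIQUE,
        residence_time_s REAL NOT NULL,
        -- UNIQUE foreign keys make each relationship one-to-one
        protein_id INTEGER NOT NULL UNIQUE REFERENCES protein(id),
        ligand_id  INTEGER NOT NULL UNIQUE REFERENCES ligand(id)
    );
    """
)

# Illustrative placeholder data only.
conn.execute("INSERT INTO protein (id, name) VALUES (1, 'example kinase')")
conn.execute("INSERT INTO ligand (id, name, smiles) VALUES (1, 'example ligand', 'CCO')")
conn.execute(
    "INSERT INTO complex (pdb_id, residence_time_s, protein_id, ligand_id) "
    "VALUES ('XXXX', 600.0, 1, 1)"
)

# Join the three tables to assemble one complete record.
row = conn.execute(
    "SELECT c.pdb_id, p.name, l.name, c.residence_time_s "
    "FROM complex c "
    "JOIN protein p ON p.id = c.protein_id "
    "JOIN ligand l ON l.id = c.ligand_id"
).fetchone()
```

With the UNIQUE constraints in place, an attempt to attach the same protein or ligand row to a second complex is rejected by the database, which is one way to enforce the rule that a protein or ligand belongs to exactly one complex.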

Operation
PDBrt is available to the community through its web-based interface and is freely available to non-commercial users. The PDBrt database runs on all modern web browsers. See Software availability (Ługowska & Pacholczyk, 2021) for access to the database and code.

Use case
To show the usage of the PDBrt database, a Unified Modelling Language (UML) use case diagram was adopted. Figure 4 shows five use cases of PDBrt as well as three actors: the system administrator, the end user and the database. The database is an actor in all five use cases, the system administrator in two and the end user in three. The database actor is involved in all five use cases because it stores the data and enables operations by which advanced data handling functions are created. In the 'complex management' use case the system administrator can perform basic Create, Read, Update and Delete (CRUD) actions on the 'Complex' table: add, delete and update records, as well as upload new data as a Microsoft Excel (xlsx) file. In the 'user management' use case the system administrator can add, edit, and delete a regular user. The end user is involved in the 'view', 'search' and 'download' use cases. In the 'view' use case a list of complexes is displayed to the user, who can choose a single complex and view its details as well as its 3D visualization. In the 'search' use case the user can filter the whole database by PDB code, residence time, and protein or ligand name. The 'download' use case allows users to download the data stored in the database either as a plain text file or in pdb/sdf file formats.
A query interface has been implemented for querying data within PDBrt. Figure 5 shows how the query options are organized. The search engine provides one form field for keyword search and allows retrieval by PDB code, protein name and residence time.
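A single-field keyword search of this kind can be sketched as follows. The sketch assumes exact, case-insensitive matching against each searchable field, in line with the reviewer's observation that only a perfect match returns results; the record layout, field names and values are hypothetical, and the actual PDBrt query logic may differ.

```python
def search(records, keyword):
    """Return records in which any searchable field equals the keyword.

    Matching is exact and case-insensitive; the residence time is compared
    as text, so the keyword "45" matches a residence time of 45.
    """
    needle = keyword.strip().lower()
    fields = ("pdb_id", "protein", "ligand", "residence_time")
    return [
        rec
        for rec in records
        if any(str(rec[field]).lower() == needle for field in fields)
    ]


# Illustrative placeholder records only.
records = [
    {"pdb_id": "XXX1", "protein": "example kinase", "ligand": "ligand A", "residence_time": 1200},
    {"pdb_id": "XXX2", "protein": "example protease", "ligand": "ligand B", "residence_time": 45},
]

by_code = search(records, "xxx1")
by_time = search(records, "45")
partial = search(records, "kinase")  # partial keywords do not match
```

Exact matching keeps the logic simple but returns nothing for partial keywords; a substring comparison would be the natural relaxation if broader matching were wanted.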
Two user interfaces provide extensive information for the result sets obtained for a particular search query. The 'homepage' interface allows access to some general information in tabular format and offers the possibility to download whole sets of data files for result sets consisting of multiple PDBrt entries. The 'complex detail' interface provides information about individual structures as well as cross-links to many external resources for macromolecular structure data.

Discussion
Drug-target residence time has been shown to play an important role in the prediction of in vivo efficacy of the drug. Since the parameter is independent of drug and enzyme concentration, it is important in the drug design process and should be considered at an early stage. The availability of information about drug-target complexes with known (measured) ). However, these databases (except for BindingDB) do not store detailed information about binding rates, and none of these contain direct information about ligand (drug) residence time in its target macromolecule. The PDBrt database is dedicated to reporting the structure of protein-small molecule complexes, along with their target residence time and additional binding parameters (K_i, k_on, k_off). Currently a total of 59 protein-ligand complexes are deposited in the web-based PDBrt database and this data will be updated frequently as more data is made available. The information associated with the existing 59 protein-ligand complexes is available in Underlying data (Ługowska, 2021).

Wiesław Nowak
Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Torun, Poland

Mankind needs new, effective medicines, and science should provide them.
For many years expectations with respect to molecular modeling were high: medicinal chemists looked for substantial help from the computer modeling community, and computational chemists provided hundreds of methods, including high-throughput virtual docking, aimed at selecting the best drug leads. There are some success stories of such an approach (Śledź & Caflisch, 2018) but, given the human effort and money, the outcome measured by the number of new effective drugs is, in my opinion, at most "moderate", and some other more critical medical specialists may say "mediocre". The problem of rational, structure-based drug design is too complex to be solved by, for example, a simple ligand-protein docking. Fortunately, there is hope in this field. My optimism is based on the recent success of the AlphaFold2 computer method that won the CASP14 competition in protein folding (Jumper et al., 2021). The Artificial Intelligence (AI) based algorithm outperformed all the best research groups in a series of tasks that seemed to be too complex to be solved in reasonable time: how to predict true 3D protein structures from the known 1D sequences. So, one may expect that AI/Machine Learning (ML) based methods will be developed to a similar level of efficacy in the drug design field as well. However, ML needs good quality training data, and for that the ligand residence time seems to be a very important factor in estimating the biological effect of a tested molecule on a metabolic pathway of interest.
The manuscript "PDBrt: A free database of complexes with measured drug-target residence time" by M. Ługowska and M. Pacholczyk addresses an important drawback in computational drug design: the lack of a database collecting measured drug residence times in one place. The authors prepared the first version of a database containing some 50 ligand-protein complexes with known kinetic characteristics. This is a highly desirable tool that may contribute to improving rational computational pharmacology.
The idea of creating such a database is in my opinion very good and timely.
Presentation of the software project is correct and contains the majority of required information.
The depositories indicated by the authors are active and apparently contain relevant data. The information content of the database could always be improved; for example, I would appreciate having in it pictures (just in jpg format) showing the 2D structural formulas of the scrutinized ligands. Biological assays used to obtain residence time data are very different; perhaps a separate field with comments on that, and adequate error bars extracted from the original papers, could also contribute to more critical usage of the data. So, the effort is ranked high, but, for the current version of the manuscript, I have two main objections/postulates:
1. The web version of the PDBrt did not work for me, neither on Windows (Firefox) nor on iPad (Safari). The authors do not specify in the manuscript how the user can actually work with the online version of the PDBrt. On the main web page there is no Help or Readme button. I do not know: shall I register to use it or not? From that main page an inexperienced user simply does not know how to get records with the sought ligand residence time for a particular protein (keyword search works only if a perfect match is indicated). One can reasonably expect that the residence time should be presented in some time units (seconds/minutes/hours) or, if it is not the case (depending on a definition adopted), an explanation should be given somewhere (at least in the manuscript).
2. The GitHub depository contains the source code of the database, but again, it was not clear to me what I need in order to install all the software locally (what are the hardware requirements, under what system?) to make full use of it.
A minor point: In the literature there is some criticism against the residence time concept, and this should be mentioned in the paper, see for example: (Folmer 2018). R.A. Copeland, the main author of the idea of drug residency time, recently published an interesting review and update of this topic; it should be referenced in the corrected manuscript as well (Copeland 2018).
After adding more instructions for less experienced potential users of the database, the manuscript should be promoted to "abstracted" status and will certainly attract some users. Sure enough, the number of records should grow and real users will customize the created software to their local needs.

replication of the software development and its use by others? Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: biophysics, bioinformatics, computer modeling of proteins, single molecule nanomechanics, atomic and molecular physics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
where the user can find the following information:
○ "Getting started" with a general description of the PDBrt.
○ "How to search?" with a description of search options and query results (how the user can obtain desired information about drug-target residence time or other parameters).
○ "Explore PDBrt entry" with a description of the details of a single protein-ligand complex and how the user can work with it.

From that main page an inexperienced user simply does not know how to get records with the sought ligand residence time for a particular protein (keyword search works only if a perfect match is indicated).
To search the database, please enter one of the following in the field: PDB ID, protein or ligand name, residence time. Unfortunately, the current version does not allow advanced search, including combining individual features of the searched complex.
One can reasonably expect that the residence time should be presented in some time units (seconds/minutes/hours) or, if it is not the case (depending on a definition adopted), an explanation should be somewhere given (at least in the manuscript).
Information about the unit of time in which the residence time is presented has been added to the main table on the homepage. Such information can also be found in the table with detailed information about a single complex on the Structure Detail Page.

GitHub depository contains a source code of the database, but again, for me it was not so clear what do I need to install all software locally (what are hardware requirements, under what system?) to make full use of it.
A README file was added to the GitHub repository with a basic description of how to work with the PDBrt locally.
In the literature there is some criticism against residence time concept, and this should be mentioned in the paper, see for example: (Folmer 2018). R.A. Copeland, the main author of the idea of drug residency time, recently published an interesting review and update of this topic; it should be referenced in the corrected manuscript as well (Copeland 2018).
Thank you for bringing these two references to our attention. These have, of course, been included in the revised manuscript.

Figure 1. Distribution of protein families stored in PDBrt.

Figure 3. PDBrt database model: table of data relations and types.

Figure 4. A use case diagram for the PDBrt database. The diagram shows a subset of functions available to the regular user and website administrator. 'Rectangle' represents a user, 'rounded rectangle' represents a use case and 'arrow' represents a relationship.

Figure 5. Current query capabilities of the PDBrt diagram. The diagram presents the structure of the PDBrt website. Each symbol (shape or arrow) represents a PDBrt web page (single web view), page content, a group of related content on a single page, a relationship between web pages or a group of similar web pages, as described in the legend.