AlloMAPS: allosteric mutation analysis and polymorphism of signaling database

ABSTRACT
The AlloMAPS database provides data on the causality and energetics of allosteric communication obtained with the structure-based statistical mechanical model of allostery (SBSMMA). The database contains data on allosteric signaling in three sets of proteins and protein chains: (i) 46 proteins with comprehensively annotated functional and allosteric sites; (ii) 1908 protein chains from the PDBselect set of chains with low (<25%) sequence identity; (iii) 33 proteins with more than 50 known pathological SNPs in each molecule. In addition to the energetics of allosteric signaling between known functional and regulatory sites, allosteric modulation caused by binding to these sites, by SNPs, and by mutations designated by the user can be explored. Allosteric Signaling Maps (ASMs), which are produced via exhaustive computational scanning for stabilizing and destabilizing mutations and for the modulation range caused by the sequence position, are available for each protein/protein chain in the database. We propose to use this database for evaluating the effects of allosteric signaling in the search for latent regulatory sites and in the design of allosteric sites and effectors. The database is freely available at: http://allomaps.bii.a-star.edu.sg.


INTRODUCTION
It is now widely accepted that allosteric signaling is omnipresent (1) in the regulation of activities of proteins (2) and molecular machines (3) with different structures and functions, regardless of their sizes, oligomerization states and interactions with other molecules (4,5). Non-competitive and modulatory rather than on/off modes of action, both archetypal characteristics of allosteric signaling, opened an opportunity for the design of allosteric drugs (6,7), which could help to avoid or reduce side effects, such as toxicity and receptor desensitization, typical of traditional orthosteric compounds (4,7,8). These advantages of allosteric effectors and the growing number of success cases in the design of allosteric drugs (9-13) have fostered active research in the field of allostery. Additionally, clear indications of the allosteric effects of mutations (14,15) and their therapeutic potential (16), as well as the recently reported abundance of deleterious mutations in allosteric sites (17), open a new field of allosteric mutagenesis.
Freely available data and resources for investigating allosteric mechanisms (18-21) range from small datasets of so-called classical allosteric proteins, with detailed descriptions of experimentally determined sites and the phenomenology of allosteric regulation, to benchmark sets and collections (22), and to large literature-based lists of proteins, descriptions of allosteric modulatory actions, and allosteric networks derived from the analysis and cross-linking of different databases (23). Our goal here is two-fold: (i) to provide comprehensive data on allosteric causality and signaling obtained, for the first time, on the basis of a physics-based model and expressed in real energy units; (ii) to derive these data both for a selection of classical allosteric proteins and for almost two thousand PDBselect protein chains (24) with low sequence identity and diverse structures belonging to 411 CATH (25) topologies. Additionally, we analyzed 33 proteins with >50 SNPs, documenting allosteric effects of mutations and potential allosteric polymorphism.
D266 Nucleic Acids Research, 2019, Vol. 47, Database issue
Despite the wide diversity of biological functions involving allosteric regulation, the molecular mechanism of allosteric communication is always determined by the protein structural dynamics; hence, it can be formalized in the framework of a generic physical model. We have recently developed a structure-based statistical mechanical model of allostery (SBSMMA), which allows one to quantify the free energy of allosteric modulation originated by perturbations such as ligand binding and mutations ((14,21,26), see also reference (7) in Tutorial). Reversibility of allosteric signaling, its potential for predicting allosteric sites, and for inducing required allosteric signaling from newly designated ones were also demonstrated (27). We have also developed the AlloSigMA web server (21), which is an implementation of the SBSMMA (26). In contrast to the earlier published SPACER server (18), which is based on the more phenomenological concepts of binding leverage (2) and leverage coupling (3), AlloSigMA allows one to evaluate the real energetics of allosteric signaling on the basis of SBSMMA (26). However, the computational cost of obtaining exhaustive allosteric signaling maps (ASMs) in SBSMMA, especially for large proteins, made it impossible to perform this analysis online and motivated us to build a database comprising a large set of proteins with precalculated ASMs. The AlloMAPS database is a suite of interactive tools for exploratory analysis of the causality and energetics of allosteric signaling. It quantifies direct and reverse allosteric signaling from/to allosteric sites and mutated residues, evaluates the modulatory effects of perturbations on allosteric regulation, and allows estimation of the allosteric effects of non-native allosteric sites and mutations.
The database can be used for exploring allosteric communication between known allosteric and functional sites, for the detection of potential latent regulatory sites, for using allosteric effects of mutations for direct modulation of protein activity, and for affecting the latter via tuning allosteric signaling from regulatory exosites.

THEORETICAL BACKGROUND AND COMPUTATIONAL METHODS
The recently developed structure-based statistical mechanical model of allostery (SBSMMA, (26)) is used here for the calculation of the allosteric free energy, or work exerted on the regulated sites and residues as a result of a perturbation such as ligand binding, mutations, or their combinations. SBSMMA is based on the harmonic model of a protein (Figure 1), where perturbations by ligand binding are mimicked by increasing the interaction strength of contacts between residues of the binding site, and the effects of destabilizing/stabilizing mutations are modeled by weakening/strengthening interactions in the contact network of the mutated residue ((14,21), see also reference (7) in Tutorial).
The model consists of three sequential steps. First, the configurational ensembles of the unperturbed (0) and perturbed (P) states are characterized by the corresponding sets of orthonormal modes $\mathbf{e}^{(0)}_{\mu}$ and $\mathbf{e}^{(P)}_{\mu}$. Second, the normal modes (in the normal mode analysis, the effective Cα harmonic potential introduced in (28) is used to approximate the global dynamics of the proteins near equilibrium) are used for the calculation of an allosteric potential
$$U_i(\sigma) = \frac{1}{2} \sum_{\mu} \varepsilon_{\mu,i}\, \sigma_{\mu}^{2},$$
where $\varepsilon_{\mu,i} = \sum_{j} |\mathbf{e}_{\mu,i} - \mathbf{e}_{\mu,j}|^{2}$ are the parameters. Here $\sigma = (\sigma_1, \ldots, \sigma_\mu, \ldots)$ is a vector of Gaussian variables with zero mean and variance $1/\varepsilon_{\mu,i}$, each of which is associated with the corresponding normal mode. The allosteric potential measures the total elastic work experienced by a residue as a result of the change of displacement of its neighboring residues caused by a linear combination of normal modes, where the change of displacement of a residue is $\Delta \mathbf{r}_i = \sum_{\mu} \mathbf{e}_{\mu,i}\, \sigma_{\mu}$.
Third, by integrating the allosteric potential over possible configurations of the residue's neighbors, the per-residue free energy difference between the free and bound states caused by the perturbation (P) is estimated as
$$\Delta g_i^{(P)} = \frac{1}{2} k_B T \sum_{\mu} \ln \frac{\varepsilon^{(P)}_{\mu,i}}{\varepsilon^{(0)}_{\mu,i}},$$
which depends exclusively on the parameters $\varepsilon_{\mu,i}$ that characterize the unperturbed (0) and perturbed (P) protein conformational ensembles (for details see (14,21,26,27), also reference (7) in Tutorial). The allosteric modulation (Figure 1), or background-free allosteric effect, is evaluated as the deviation of the obtained free energy difference from its mean value over the protein chain:
$$\Delta h_i^{(P)} = \Delta g_i^{(P)} - \langle \Delta g_i^{(P)} \rangle_{\mathrm{chain}}.$$
Allosteric modulation close to zero indicates that the response at the residue/site of interest is similar to the protein-average $\Delta g_i^{(P)}$ value, i.e. to the background effect on the whole protein. In order to monitor the effect of a perturbation on the functional sites of interest, the allosteric modulation per site is obtained as an average over all the residues belonging to the site:
$$\Delta h_{\mathrm{SITE}}^{(P)} = \langle \Delta h_i^{(P)} \rangle_{i \in \mathrm{SITE}}.$$
In an UP mutation ($m\uparrow$), the strength of interactions in the contact network of the mutated residue is increased to simulate a substitution to a bulky residue at position m. Conversely, a DOWN mutation ($m\downarrow$) of a residue to a small one (Ala/Gly-like residue) is modeled by a decrease in the strength of interactions with its neighbors (Figure 1). The modulation range, a generic description of the allosteric effect of an amino acid substitution in a certain sequence position, is calculated as the difference between the responses caused by mutation from the smallest (Ala/Gly-like) to the bulkiest amino acids:
$$\Delta h_i^{(m\downarrow\uparrow)} = \Delta h_i^{(m\uparrow)} - \Delta h_i^{(m\downarrow)}.$$
Large positive or negative $\Delta h_{\mathrm{SITE}}^{(P)}$ and $\Delta h_i^{(P)}$ values correspond to an increase or decrease of work exerted on a residue/site by its neighbors, which may induce or prevent local conformational changes.
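The three steps above can be illustrated with a small numerical sketch. It is not the SBSMMA implementation: it replaces the effective Cα potential of (28) with uniform springs under a distance cutoff, and the cutoff (10 Å), the spring scaling factors (10 and 0.1), the number of modes (10), the value of kT, the toy helix-like coordinates and all function names are assumptions chosen for illustration. Integrating the Gaussian variables of the allosteric potential gives a per-residue free energy difference of the form (1/2) kT Σ_μ ln(ε(P)/ε(0)), which the sketch evaluates and then centers on the chain average to obtain the allosteric modulation.

```python
import numpy as np

KT = 0.6  # kcal/mol near room temperature (assumed units for this sketch)


def springs(coords, cutoff=10.0):
    """Uniform springs between C-alpha atoms closer than `cutoff` (a simplification)."""
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    return ((dist < cutoff) & (dist > 0)).astype(float)


def hessian(coords, k):
    """3N x 3N Hessian of a harmonic network with spring constants k[i, j]."""
    n = len(coords)
    d = coords[:, None, :] - coords[None, :, :]           # (N, N, 3)
    r2 = (d ** 2).sum(-1)
    np.fill_diagonal(r2, 1.0)                             # avoid 0/0 on the diagonal
    blocks = -(k / r2)[:, :, None, None] * d[:, :, :, None] * d[:, :, None, :]
    idx = np.arange(n)
    blocks[idx, idx] = -blocks.sum(axis=1)                # diagonal blocks balance each row
    return blocks.transpose(0, 2, 1, 3).reshape(3 * n, 3 * n)


def lowest_modes(coords, k, n_modes=10):
    """Lowest non-trivial orthonormal modes, shape (n_modes, N, 3)."""
    _, vecs = np.linalg.eigh(hessian(coords, k))
    # The first six eigenvectors are rigid-body motions (assumes a rigid network).
    return vecs[:, 6:6 + n_modes].T.reshape(n_modes, len(coords), 3)


def epsilon(modes, contacts):
    """eps[mu, i] = sum over neighbors j of |e_mu,i - e_mu,j|^2."""
    diff = modes[:, :, None, :] - modes[:, None, :, :]    # (M, N, N, 3)
    return ((diff ** 2).sum(-1) * contacts).sum(-1)       # (M, N)


def modulation(coords, k0, k_pert, n_modes=10, kT=KT):
    """Per-residue free energy difference minus its chain average."""
    contacts = k0 > 0
    eps0 = epsilon(lowest_modes(coords, k0, n_modes), contacts)
    eps_p = epsilon(lowest_modes(coords, k_pert, n_modes), contacts)
    dg = 0.5 * kT * np.log(eps_p / eps0).sum(axis=0)
    return dg - dg.mean()


def mutate(k, m, factor):
    """Scale all springs of residue m: factor > 1 mimics UP, factor < 1 mimics DOWN."""
    k = k.copy()
    k[m, :] *= factor
    k[:, m] *= factor
    return k


# Toy helix-like C-alpha trace with small noise (hypothetical coordinates).
rng = np.random.default_rng(0)
t = np.arange(40)
coords = np.c_[2.3 * np.cos(1.745 * t), 2.3 * np.sin(1.745 * t), 1.5 * t]
coords += rng.normal(scale=0.2, size=coords.shape)

k0 = springs(coords)

# Ligand binding: stiffen contacts between residues of a hypothetical site.
site = [3, 4, 7, 8]
k_bound = k0.copy()
k_bound[np.ix_(site, site)] *= 10.0
dh_bound = modulation(coords, k0, k_bound)
dh_remote_site = dh_bound[[30, 31, 34]].mean()            # per-site average

# Modulation range at every residue for a mutation at position 20.
mod_range = (modulation(coords, k0, mutate(k0, 20, 10.0))
             - modulation(coords, k0, mutate(k0, 20, 0.1)))
```

Because the modulation is defined relative to the chain average, `dh_bound` sums to zero over the chain by construction, and an unperturbed network gives zero modulation everywhere.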

DESCRIPTION OF THE DATABASE
The database provides access to extensive data on allosteric signaling in about 2000 proteins and protein chains grouped in three sets. First, the 'Allosteric proteins' set includes 46 proteins whose allosteric regulation is well documented on the basis of experimental work (27). This set of proteins can be used for exploring the causality and energetics of allosteric signaling between known functional and regulatory sites and for predicting possible locations of latent allosteric sites in these proteins (2). Second, the 'PDBselect chains' set of 1908 protein chains contains representative structures with low sequence identity (less than 25%), allowing the user to survey allosteric signaling in the wide diversity of structures present in the Protein Data Bank (29). Finally, the 'Allosteric polymorphism' set includes 33 proteins with multiple (more than 50 in each protein) known pathology-related SNPs, allowing the study of potential allosteric mechanisms in the modulation or disruption of protein activity by mutations.
Given the list of available and annotated binding sites, the effects of ligand binding to these sites and their combinations on all residues of the protein are pre-calculated and can be accessed immediately. Modulation of these effects by individual mutations can be evaluated for residues designated by the user. Allosteric signaling from the sites with bound ligands to other known binding sites in the protein can be observed, along with cooperativity effects upon sequential binding of ligands to the subunits of an oligomer. Data on the allosteric modulation of the sites as a result of individual mutations of all residues are also available, as well as data on the effects of individual mutations and their combinations on all residues in the protein. Sequence positions that undergo or originate stronger/weaker allosteric modulation can be determined. The strongly modulating residues can be considered as candidates for new, de novo designed allosteric sites that achieve the required allosteric signaling. The ASMs are precalculated for the modulation range and for UP and DOWN mutations, and they are presented along with the protein distance matrices for efficient analysis of allosteric signaling from the perspective of the protein's structural traits.
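As an illustration of how an ASM and a distance matrix might be analyzed together, the sketch below ranks mutated positions by their mean effect on a site of interest while skipping positions that are spatially close to that site. The matrix orientation (rows are mutated positions, columns are responding residues), the 10 Å distance threshold, the function names and the synthetic data are all assumptions made for this example; they do not describe the database's export format.

```python
import numpy as np


def site_response(asm, site):
    """Mean modulation of the `site` residues for each mutated position (ASM row)."""
    return asm[:, site].mean(axis=1)


def distant_modulators(asm, dist, site, min_dist=10.0, top=5):
    """Strongest modulators of `site` among positions at least `min_dist` away from it."""
    resp = site_response(asm, site)
    far = np.flatnonzero(dist[:, site].min(axis=1) >= min_dist)
    ranked = far[np.argsort(-np.abs(resp[far]))]
    return [(int(m), float(resp[m])) for m in ranked[:top]]


# Synthetic example: 30 residues on a line, 3.8 A apart (hypothetical data).
rng = np.random.default_rng(1)
n = 30
x = 3.8 * np.arange(n)
dist = np.abs(x[:, None] - x[None, :])
asm = rng.normal(scale=0.1, size=(n, n))

site = [2, 3, 4]
asm[25, site] = 3.0   # planted long-range modulator
asm[5, site] = 5.0    # strong but adjacent to the site, so it is filtered out

hits = distant_modulators(asm, dist, site)
```

Filtering by distance is one simple way to separate long-range signaling from the direct effect of a mutation on its immediate neighbors; the threshold would need tuning for a real protein.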
All the data obtained by the user in the working session can be downloaded for further processing, using the 'Download data' button in the protein information panel accessible at the top of any page (see Tutorial for details).
The database consists of three sets of proteins accumulated according to the criteria below.
Allosteric proteins. This set of 46 proteins contains classic allosteric enzymes from previous studies (21,26,27), complemented by proteins from the benchmarking collection of allosteric proteins ASBench (22), on the basis of the following requirements (see (27) for more details): (i) proteins were omitted if their allosteric effects involve a change of the oligomerization state or protein-protein interactions, if the protein records lack information on functional sites or part of the structure, or if other relevant information is missing; (ii) the operational definition of allosteric sites (27) was used to obtain the list of proteins with true regulatory exosites.
PDBselect set. We use the PDBselect list as of Nov 2017, which contains 4184 protein chains with <25% sequence identity. Selecting structures that belong to the 13 'most popular' architectures according to CATH annotation, were solved only by X-ray crystallography, and are >30 amino acid residues in size, we analyzed 1908 protein chains, which represent 411 topologies and 984 homologous superfamilies.
Allosteric polymorphism. Proteins with a 3D structure and at least one sequence variant reported to be implicated in a disease were obtained from the list of human entries with polymorphisms or mutations (https://www.uniprot.org/docs/humpvar) from UniProt. From this list, 33 proteins with at least 50 disease-linked SNPs were collected using the human polymorphisms and disease mutations index (https://www.uniprot.org/docs/humsavar). Both the humpvar and humsavar lists were obtained as of September 2017. There is at least one ligand-binding site in 27 proteins in this list.

STRUCTURE OF THE DATABASE AND WEBSITE NAVIGATION
Figure 2 shows a flowchart for navigating through the database. Starting from the homepage with three icons for the 'Allosteric proteins', 'PDBselect chains' and 'Allosteric polymorphism' parts of the database, the user can see a list of proteins or protein chains (in the case of the 'PDBselect chains' option) in the corresponding part of the database. Choosing the required entry from the list, the user gains access to the main data-record page for the selected protein/protein chain. The page is divided into two parts: 'Structure view' on the left half and 'Sequence view' on the right half. Below, the functional tabs and buttons are listed with a brief description of their role in the database.
'Binding sites' and 'Mutations' buttons, located in the left structure view panel, allow the user to display the corresponding site/residue in the structure and to obtain data on the role of these sites/mutations in allosteric modulation.
'Residues: Binding/Mutations' tab provides access to the structure and sequence views in the data pages, allowing the user to analyze the structure, the allosteric effects of sites and mutations, the effects on the sites (using the 'Show effects on sites' button at the bottom of the sequence view panel), and strongly/weakly modulated sequence positions resulting from ligand binding and/or mutations (using the 'High Δh' and 'Low Δh' buttons). Using the 'UP' or 'DOWN' tabs, the user can analyze the effects of stabilizing or destabilizing mutations in the sequence positions picked in the sequence view.
'Allosteric signaling map' tab provides access to the total scanning of single-residue mutations. The 'UP' and 'DOWN' tabs visualize Allosteric Signaling Maps (ASMs) for stabilizing and destabilizing mutations, respectively. The 'Modulation range' tab shows the ASM for a generic characteristic of the allosteric effect of an amino acid substitution in a sequence position, i.e. the difference between the allosteric responses to mutation from the smallest (Ala/Gly-like) to the bulkiest residue in the corresponding position. All ASMs are complemented by matrices of the inter-residue distances in the protein/protein chain, which help to interpret the information contained in the ASM in relation to the structure and the distances between allosterically communicating parts/residues of the protein.
'Sites: Mutations' tab allows one to observe the effects of mutations on the binding sites described in the PDB file. By clicking on a button under the heading 'Effect of mutations on sites', the user obtains data on the allosteric signaling from all sequence positions to the corresponding site.
'Sites: Binding' tab provides a graph representation of the allosteric signaling from the site picked by the user to other binding sites listed in the PDB file. Cooperativity effects emerging upon sequential ligand binding to the subunits of an oligomer can also be observed by using the buttons that designate the number of bound monomers.
Figure 2 illustrates the database outputs, using 3pfk (ATP-dependent 6-phosphofructokinase) and 5in3 (galactose-1-phosphate uridylyltransferase) as examples. The 3pfk main data-record page (top center) shows the results of the analysis of allosteric modulation in the PFK structure caused by the binding of the ADPa activator to all four PFK subunits. The per-residue allosteric modulation caused by the 4xADPa binding is indicated by the color gradient in both the structure and sequence views. The 'High Δh' button shows residues under stronger than average positive modulation (highlighted in orange), i.e. residues likely undergoing conformational changes larger than the average of the protein/protein chains as a result of the perturbation caused by the 4xADPa binding. For proteins with multiple symmetry-related binding sites, the binding sites chosen by the user are perturbed concurrently, essentially mimicking the binding of multiple ligands to the corresponding sites in different subunits. The database provides the per-residue Δh values upon perturbation of any combination of symmetry-related binding sites. For example, a user interested in the allosteric modulation upon perturbing two of the four

CONCLUSION
The AlloMAPS database provides unique opportunities for exploring allosteric regulation of protein activity, as it combines the analysis of about 2000 proteins and protein chains representing a wide diversity of sequences, structures, and functions with a rigorous calculation of the per-residue allosteric free energy on the basis of the recently developed structure-based statistical mechanical model of allostery (SBSMMA, (21,26,27), see also reference (7) in Tutorial). The causality and energetics of allosteric signaling analyzed for known allosteric sites and SNPs allow the user to obtain important estimates of the energetics of allosteric effects, which can be used in the search for latent allosteric sites and in the design of new allosteric ligands. Several instruments in the database are aimed at modeling direct allosteric signaling caused by ligand binding and mutations, as well as at exploring and using the modulatory effects of sites and mutations on the energetics of other sites and sequence positions. Different combinations of binding and mutations can also be considered. Allosteric Signaling Maps (ASMs) obtained from the total scanning of mutations are the first exhaustive description of allosteric signaling at per-residue resolution. In addition to the comprehensive description of stabilizing and destabilizing mutations, they provide a generic quantitative characteristic of the allosteric effect of a sequence position: the ASMs of the modulation range. We expect that ASMs for stabilizing/destabilizing mutations and for the modulation range will be used as a valuable source of information on the energetics of allosteric signaling in different design efforts.
In particular, ASMs provide an opportunity to estimate modulatory effects of mutations, to combine signaling from several protein residues in order to achieve required strength of allosteric response, and even to estimate signaling from potential new sites built as a combination of considered residues. To conclude, we hope that AlloMAPS can become instrumental in targeting important tasks in protein engineering and design, such as prediction of latent regulatory exosites, evaluation of allosteric effects of mutations, modulation of protein activity and effects of the ligand binding via directed mutagenesis, and, finally, harmonized design of allosteric sites and effectors.

FUNDING
Funding for open access charges: Biomedical Research Council, Agency for Science, Technology and Research (A*STAR). Conflict of interest statement. None declared.