Computational modeling of protein conformational changes - Application to the opening SARS-CoV-2 spike

We present a new approach to compute and analyze the dynamical electro-geometric properties of proteins undergoing conformational changes. The molecular trajectory is obtained from Markov state models, and the electrostatic potential is calculated using the continuum Poisson-Boltzmann equation. The numerical electric potential is constructed using a parallel sharp numerical solver implemented on adaptive Octree grids. We introduce novel a posteriori error estimates to quantify the solution's accuracy on the molecular surface. To illustrate the approach, we consider the opening of the SARS-CoV-2 spike protein using the recent molecular trajectory simulated through the Folding@home initiative. We analyze our results, focusing on the characteristics of the receptor-binding domain and its vicinity. This work lays the foundation for a new class of hybrid computational approaches, producing high-fidelity dynamical computational measurements serving as a basis for protein bio-mechanism investigations.


Introduction
Proteins are polymers composed of amino acid chains that are folded into well-defined three-dimensional shapes. The structure and of this shape is crucial for proper protein function. However, in many proteins the three-dimensional structure is not enough, and function must be facilitated by specific intra-protein motions. [15,18,37,39]. The SARS-CoV-2 spike (S) protein is one such protein [15,18,37,39]. Naturally found as a homotrimer, the S-protein contains a buried receptor binding domain (RBD). The RBD has a high affinity to the human angiotensin-converting enzyme 2 (ACE2), and binding mediates viral entry to the cell. However, to facilitate ACE2 binding, the RBD must be exposed through a series of complex motions that occur in the unbound S protein trimer. [2,18,20,33]. Recently, Zimmerman et al. [43] produced the first Markov state models simulation of the SARS-CoV-2 spike opening. This computational tour de force involved millions of citizen scientists collaborating through the Folding@home initiative [1], and produced an overall 0.1s of molecular trajectories. For a full characterization of the protein interaction, the molecular trajectory may not be enough.
One aspect that may provide some insight into the interactions and function of a protein is the electrostatic potential it generates. For the SARS-CoV-2 spike, according to a previous study, the affinity constant (derived from the electrostatic potential) for the RBD of SARS-CoV-2 to the ACE2 is 10 to 15 times greater than that of SARS-CoV, potentially contributing to its transmission efficiency [37]. The reason for the higher binding affinity was attributed to several mutations, most notably from the residue Val404, found in SARS-CoV, to the positively charged Lys417 in SARS-CoV-2. This mutation resulted in an intensified electrostatic potential complementarity between the negatively charged ACE2 binding site and the now more positively charged RBD of SARS-CoV-2 [15,37]. To understand the contribution of protein charge repositioning and also to help inform drug design strategies that leverage this distribution, it is imperative to know how the protein's electrostatic potential changes as it deforms.
The Poisson-Boltzmann equation has long been recognized as the representation of choice to model the electric potential generated by proteins in solvents. It has drawn significant interest from the computational community since the pioneering calculations of Warwicker and Watson in the early 1980s [38], which has lead to the production of a broad variety of numerical solvers [4,3,6,7,13,21,17,22] and open source software [12,16,32], built over the traditional spectrum of numerical methods. Such tools are, for example, employed in the context of drug development and discovery to calculate solvation free energies [10,14,28]. Using massively parallel architectures [11], these calculations can be carried out on considerably large proteins, such as the entire HIV-1 capsid (4, 884, 312 atoms, Protein database entry: 3J3Q).
In this work, we combine Markov state model simulations with partial differential equation modeling to obtain dynamical electric potential maps of deforming proteins. To illustrate this novel approach, we leverage the S protein opening trajectory created by the Folding@home initiative to examine and characterize the potential of the SARS-CoV-2 S protein during trimer opening. At each frame of the simulated trajectory, we reconstruct the protein surface and calculate the generated electric potential with the approach developed by Mirzadeh et al. [26,27]. Using adaptive non-graded Octree grids and sharp discretizations, we efficiently produce high-fidelity solutions to the non-linear Poisson-Boltzmann equation. In the continuum model, all temporal variations are neglected, and as a consequence, all potential maps are independent, making their computation embarrassingly parallelizable. To verify the accuracy of our method, we construct practical a posteriori error estimates for the surface representation and electrostatic properties.
Our study of the electrostatic dynamics of the S protein opening reveals dramatic rearrangements of the electrostatic field during this process. These rearrangements act to localize a negatively charged field towards the interior of the S protein and expose a positive surface of its residue binding domain. This is in line with the negative charge of the target binding region on the ACE2 receptor and may aid the S protein in binding to its target with high affinity. The paper is arranged as follows: in section 2, we present the trajectory used for this study. Next, section 3 entails a thorough mathematical design for the calculation of the potential field that is done on each frame in the trajectory. Section 4 encapsulates the numerical method used along with a convergence study for computational validation. The dynamical electrostatic-geometric properties of the spike protein example are characterized in section 5, and conclusions are drawn in section 6.

Opening of the SARS-CoV-2 spike protein
The S protein exists primarily in the closed configuration, hiding the three identical RBDs in its core [43]. The Fold-ing@home trajectory focuses on the opening of the unglycosylated, uncleaved spike configuration through the detachment of a single monomer from the spike's core, revealing its associated binding site. A similar (but not identical) monomericopening state has recently shown to be populated in ≈ 16% of the unbound spike population in a recent cryoEM study [5].

Simulated trajectory
The trajectory, from Zimmerman et al. [43] contains a 71-frame sequence that captures the most populated pathway of the spike protein's transition from its closed to open state. This path was calculated utilizing a goal-oriented adaptive sampling algorithm (FAST, [42]) to favorably sample spike opening.
Each frame consists of a list of N = 51, 671 atoms, represented by their positions x i=1..N , radius r i=1..N and fixed partial charge z i=1..N . Throughout these frames, the protein undergoes continuous deformations, which shift the receptor-binding domain of one of the monomers from being hidden while the spike is in its closed conformation to being fully exposed once the spike opens. Fig. 1a and 1b represent the protein's molecular structure in the open and closed configurations. Each one of its chains is depicted in a different color to illustrate the protein's trimeric structure. The revealed receptor-binding domain is located at the opening extremity of the red chain (see Fig. 1a, 1b).

Relative opening measurement
To provide preliminary quantification of the spike opening, dozens of atom pairs on opposing sides of the spike were chosen, and the distance between them measured as the spike transitions from one conformation to another (i.e. as a function of frame) ( Fig. 1a and b). One of the atoms in the pair was chosen from the binding interface, the monomer depicting in red in Fig. 1(a and b), while the second atom in the pair was chosen across the top of the spike opening. Although all pairs presented the same general trajectory, four of these atom pairs were arbitrarily selected as a subset for illustration.
The resulting relative distance between the atom pairs is measured in Angstroms (Å), as shown in Fig. 1c. The lines labeled 1, 2, 3, and 4 refer to the behavior of the four-pair subset, with their average behavior depicted in black. All four measurements generally follow the same trend. This extension happens continuously outside of frames 50 to 55, which show abrupt variations. This distance is relative to the average distance between each respective atom pair (1, 2, 3, and 4 in the graph), hence the term' relative distance'. (d) The root mean squared deviation (RMSD) is calculated as a function of time with frame 1 (closed state) as the reference frame. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)

Magnitude of the conformation change
To measure the total structural deformation, we compute the root-mean-square deviation (RMSD). It measures the average distance between atoms in the current position and some reference configuration, defined here as the initial closed configuration. Specifically where the superscripts n and 1 are for the current and initial states. Despite large variations at the initial and final stages, the RMSD evolution (depicted in Fig. 1d), shares similar features with the evolution of the relative opening. These similarities suggest that the significant transformations that occur during the opening are recapitulated by the four distances in Fig. 1c.

Continuum modeling
This section describes the reconstruction of the potential map at each iteration from the current protein structure using the non-linear Poisson-Boltzmann (PB) equation. Because of the way the trajectory has been obtained, all frames are independent. Thus, the potential maps are decoupled and can be computed separately. A more comprehensive model, such as the Poisson-Boltzmann-Nernst-Planck model [41,25], would consider the diffusion of the ions and introduce time derivatives in the partial differential equations. Because at the atomic scale (L = 1 Å), for typical diffusivities (D ≈ 10 −9 m 2 s −1 ),

Molecular surfaces
A common way of portraying a molecular surface is by use of the van der Waals Surface depiction (see Fig. 2b). Each atom in a molecule is depicted by a sphere with location x i=1...N and radius r i=1...N that is defined by an isosurface on their electron density. The union of these spheres, depicted in gray in Fig. 2, forms the van der Waals Surfaces (vdWS) of that molecule. The nature of the vdWS means that some regions might be identified as being exposed to the solvent, while in fact, their geometry makes them inaccessible to the solvent particles. As a consequence, we define the Solvent Accessible Surface (SAS [19]) of the protein, depicted as the blue outline in Fig. 2 b; it is formed through the addition of the solvent's particle radius, r P , to each r i=1..N resulting in a buffer around the vdWS. While the puffed-up SAS depiction may be useful for some areas of study, the Solvent Excluded Surface (SES) is preferable when discussing surface details of a molecule [31], as shown in Fig. 2b. As the name suggests, the inner molecule region defined by this surface includes all locations the solvent cannot occupy, including the vdWS and the tiny crevasses on its exterior, shown as concave black triangles at the meeting of atoms A and B in Fig. 2b.

Mathematical representation and numerical construction
To represent the biomolecule, we employ the level set method [30] and capture the SES location as the zero level set of an auxiliary field φ(x) defined over the domain of interest . The solvent − , the Solvent Excluded Surface , and the inside of the molecule + are defined as The normal n to the interface is defined as pointing toward + . It is calculated as the normalized level set gradient n = ∇φ |∇φ| .
The level set function is constructed following the approach proposed by Mirzadeh et al. in [27]. We start by constructing the level set function φ S AS (x) representing the SAS as The SAS level set function is then reinitialized to be a signed-distance function (i.e. |∇φ S AS | = 1) by solving the reinitialization equation in fictitious time τ until the steady state is reached Spike relative permittivity Elementary charge e - Probe radius r p - From the reinitialized function R(φ S AS ), the SES level set function is then obtained as As it was pointed out in [27], this procedure can create non-physical inner cavities. They are identified by finding physical points disconnected from the contour of the computational domain ∂ where the level set function φ(x) is positive. In practice, such points are isolated by solving the following Laplace problem and detecting where φ(x) < 0 and c = 0. The cavities are removed by switching the sign of the level set function at these problematic positions. Finally, for computational purposes the SES level set is systematically reinitialized.

Poisson-Boltzmann equation
The electrostatic potential, , around a biomolecule immersed in a binary z:z electrolyte solution can be described by the following non-linear Poisson-Boltzmann (PB) equation, where is the relative permittivity, being equal to + in the molecule ( + ) and − in the electrolyte ( − ). 0 is the permittivity of a vacuum, N A is the Avogadro number, c b is the bulk salt concentration, e is the elementary charge, k B is the Boltzmann constant, T is the temperature, z is the valence of the background electrolyte, q i and x i are the partial charge and position of the i th atom respectively, N is the total number of atoms in the molecule. c b (x) = 0 inside the molecule. In non-dimensional form, the Poisson-Boltzmann equation takes the following form where the characteristic length L is chosen to be 1 Å, the potential is scaled by the thermal voltage ( k B T e ), z i is the nondimensional partial charge on the i th atom, and κ D = 2c b N A e 2 z 0 k B T L 2 is the non-dimensional inverse of the Debye length. Inside the molecule, κ D is null. The constant λ B = e 2 k B T 0 L is the non-dimensional Bjerrum length. The above non-dimensional PDE is completed with the following jump conditions on non-dimensional potential where the jump operator is defined for any quantity ζ defined in both domain as [ζ ] = ζ + − ζ − . All parameters and non-dimensional numbers, along with their values for this study, are summarized in Table 1.

Solution decomposition
Following [27], we treat the singularities arising in the solution due to the singular charges inside the molecules by using the decomposition proposed by Chern et al. [8]. Doing so we split the non-dimensional potential ψ , into regular and singular parts: ψ and ψ , respectively ψ =ψ +ψ.
The singular part is itself split into two parts ψ * and ψ 0 where ψ * is the Coulombic potential due to singular charges, and ψ 0 satisfies the following Poisson's problem Utilizing the decomposition shown above, the regular (non-singular) part of the solution is given by solving subject to the following jump conditions: [ψ] = 0, ∀x ∈ (21) [ ∇ψ · n] = − + ∇(ψ * + ψ 0 ) · n| , ∀x ∈ Since ψ * is known analytically, the gradient, ∇ψ * , appearing in the right hand side of (22) can be computed exactly while ∇ψ 0 must be numerically approximated.

Numerical method
The numerical approach for the resolution of the above Poisson-Boltzmann problem is presented in this section, along with novel practical a posteriori error estimates. We then verify the method for the entire trajectory by conducting a systematic convergence study for these estimators.

Implementation
The numerical method, implemented on non-graded adaptive Octree grids, follows the general description given in [27], with each snapshot of the protein treated independently. The surface and grid generation is done using the level set framework developed by Min and Gibou [23]. In particular, as Fig. 3 illustrates, the mesh is systematically adapted to the SES location. All quantities are stored at the nodes of the mesh for improved accuracy and facilitated manipulation. The numerical solutions of the Poisson systems (9)-(10)-(11) and (18)- (19) (for the cavities detection and the construction of the regular part of the solution ψ 0 respectively) are obtained using the second-order approach presented in [35], itself based on [24]. The solution, ψ , to the problem defined by Eqs. (20), (21) and (22) is constructed using a nodal version of the jump solver presented in [34]. The non-linearity of Eq. (20) is addressed using Newton's method [8,26,27], with a relative error tolerance of 10 −6 , chosen to be orders of magnitude smaller than the desired overall numerical error. The whole method is parallelized in a shared memory fashion using OpenMP [9,29].
The entire computational domain is defined as the cube of side length 400 Å (about twice the size of the whole protein) and center Calculations were performed on Octrees of maximum level ranging from 7 to 11. On the finest grids, the minimal spatial resolution is 0.18 Å.

Error estimates
To monitor the convergence of the overall method, we construct a posteriori numerical error estimators. They only rely on the decomposition presented in section 3.4, and can therefore be employed independently of the numerical approach. For the current implementation, detailed in section 4, we refer the interested reader to [26,27] for formal convergence studies, using analytic solutions and order estimations.
The error on the interface representation, gradient of the solution (i.e. electric field) and solution itself can be estimated using the following metrics e , e E , e ψ where | | denotes the surface area of . In virtue of Gauss's Theorem the first two integrals are null. Because of the boundary condition (19), the third quantity should also be null. When computing e , because the gradient of the Coulombic and the total charge are known exactly, numerical errors can only arise from calculating the local normal n or approximating the surface integral. Therefore, this metric focuses on the geometry and its manipulation only.
The numerical errors in the gradient of the component ψ 0 are the primary source of errors in the second metric e E .
Thus, it can be interpreted as a lower bound estimate for the average total normal electric field on the interface. The metric e ψ , the average error in ψ 0 on the interface, can similarly be used to estimate the numerical error in the total potential on the interface. Fig. 4a depicts the spike protein's electrostatic potential at the initial frame (closed configuration) as the mesh is refined. Positive electrostatic potential is colored in blue, neutral (no charge) in white, and the negative electrostatic potential is shown in red. The coarsest simulations (max level = 7, 8) are only able to reproduce the general structure of the protein but fail at creating an accurate electrostatic map. As the maximal resolution reaches the characteristic atomic radius (r min = 1 Å), the finest geometrical features are correctly reproduced, leading to appreciably more accurate results (max level = 9, minimal resolution 0.79 Å). Further increasing the spatial resolution refines these molecular structures and the small scale potential variations even more (max level = 10, 11).

Convergence study
The time evolution of our three error estimates for all examined resolutions is presented in Fig. 4. As expected, all three metrics converge with increasing resolution. The impact of using subatomic resolution (i.e. max level ≥ 9) is well illustrated with the convergence of the electric field and potential error estimates: it is unclear for super-atomic resolutions (max level = 7, 8), and evident for subatomic ones (max level = 9, 10, 11).
The error estimation for the total potential (e ψ ) is significantly larger than the variations between consecutive maximum levels observed in Fig. 4a, which is a proxy for the error on the regular part of the solution ψ . Since e ψ involves the singular part of the solution, which exhibits large spatial variations over small length scales, it is prone to higher numerical errors and expected to be larger than the actual error on the regular part of the solution. Closer inspection of our measurements reveals that these two metrics may differ by at least one order of magnitude. The error estimation for ψ , using max level = 10 is approximately equal to the maximum absolute surface potential value (≈ 10) observed on Fig. 4a. This misleadingly suggests the relative error on the total solution is as large as 100%. The comparison between the potential maps for max level = 10 and max level = 11 indicates that in practice the relative error on the surface potential probably lies between 1% and 10%.
From all these remarks, we are confident that the method is correctly implemented and that the most refined simulations accurately capture the S protein's electrostatic potential. For the finest resolution, the estimated average error on the total potential is close to 2, which in light of the above discussion, suggests that the average practical relative error on the surface potential is under 2%.

Results
The S protein remains predominantly in the closed conformation to mask its receptor-binding domains (RBDs), thereby impeding their binding. To bind with ACE2, the S protein transforms into its open conformation, revealing its binding interface.
In describing our results, we refer to the part of the spike containing the three RBDs as the top part and the predominantly negatively charged portion binding to the virus membrane as the bottom part. [43].
Our simulations (see Fig. 5a) illustrate that the top part of the spike protein is predominantly positively charged, in accordance with the negative charge of the ACE2 binding site (see Fig. 6). Surprisingly, the top of the spike also reveals an underlying core with a surface area that has a dense negative charge. As the spike opens, this area could be exposed to the solvent, generating a negatively charged electrostatic cloud in the upper part of the protein, which may repulse the negatively charged ACE2 receptor. Therefore, we characterize the dynamics of the geometrical and electric properties of this negatively charged core χ (Fig. 5b), the resulting repulsive cloud C (Fig. 5c), the binding site B (Fig. 6c), and the far electric field of the spike protein as they shift from the dynamics that occur during spike opening.

Negatively charged core
We define the negatively charged core χ , illustrated in Fig. 5b, as the area of the SES in the top of the protein where the electric potential is more negative than a threshold value c χ = −5. Fig. 5 c and d depict the evolution of its surface and average charge. As the spike opens, the area shrinks by about 50%, while the variation in the average potential remains small (≈ 10%). In both cases, the most significant variations happen during the first ten iterations. In comparison, the molecular structure analysis (see Fig. 1) displays continuous variation over the entirety of the trajectory, indicating that the structural and electro-geometric transformations are non-trivially coupled. The diminution of both quantities could be explained by the fact that the spike opening removes a barrier, one of the monomers, between the core and solvent, causing the dilution of the surface charges in the solvent, and therefore a contraction of the core and its charge.

Repulsive electric cloud
We define the negatively charged electric cloud C, generated by the core χ , as the region in the upper part of the solvent where the potential is below the threshold value c C = −2 (see Fig. 5c). The measurements presented in Fig. 5d and e indicate C expands by ≈ 42% as the spike opens, while its average potential decreases in magnitude by 30%. Again we interpret this phenomenon as an effective dissolution of the negative charges induced by removing the protecting monomer.
For both geometries χ and C, we observe large variations in the first ten frames. The abrupt change (around frame 50) in the protein structure, discussed in section 2, is only perceptible in the size variations of the electric cloud C.

Binding site characterization
The interface between the SARS-CoV-2 receptor-binding domain (RBD) and ACE2 are of particular interest as the binding of the two facilitates virus entry into cells [2,18,20,33]. Lan et al. [18] undertook the task of determining the residues of the RBD and ACE2 interface which form a connection, finding the location of the binding site on the spike, B, to consist of 17 residues between Lys417-Tyr505 and the ACE2 site to be formed of 20 residues between Gln24-Tyr83 and Asn330-Arg393.  . When the spike is in the closed position B is located near its center, but as the spike shifts to its open conformation, B is moved outward and exposed to the solvent, as seen in Fig. 7. As the spike opens, we monitor the evolution of the binding site area, average absolute curvature, and potential. The average is computed over the binding site surface. We interpret the average absolute curvatures as measures of the global convexity of the binding site. For the potential evolution, we distinguish between the average potential, average positive potential, and average negative potential. Fig. 6 provides a depiction of the resulting information. A relative decrease is observed for the area and average absolute curvature: as the spike opens, the binding site shrinks and flattens, in each case by 10 − 15%, suggesting minor conformational changes. The average negative potential also experiences a decrease, becoming less negative as the spike opens.
Meanwhile, the average positive potential remains relatively constant, resulting in the total average potential of B following very closely with the changes in average ψ − and ultimately nearing 0 in the final frame. This increase in average potential is consistent with the knowledge that the ACE2 receptor is designed to bind to negative charges, and may be necessary in directing the ACE2 binding to the correct region.

Electric field structure
The spike electric far-field streamlines are depicted in Fig. 7. In the closed configuration, most streamlines emanating from the bottom part of the spike are pointing away from the spike. Only a few of them, caused by the rare presence of positive charges, return rapidly to the protein. In the upper part, the streamlines are predominantly closed, suggesting that in this configuration, the spike protein may not be able to attract charges of any polarity at long range, and therefore has limited attracting potential. This may cause a lower affinity. As the spike opens, the streamlines in the bottom remain unchanged. However, above the top of the protein, the streamlines are now predominantly open. Fig. 8(a)-(b) depicts the entire electric field structure along with the region in the solvent where the electric field point away from the protein, or equivalently where negative charges would be drawn to the spike. In the closed configuration, this region is predominantly located above the spike, while in the open configuration, it is split into three sub-regions, each centered around one of the RBDs. In the latter, the sub-region centered around the revealed RBD represents 45% of the entire region and carries 34% of its total electric potential energy.
To characterize the restructuring of the electric field of the S protein, we define the first four moments of the non- Dipole Quadrupole Octupole Note that the last two moments are defined in their symmetric traceless form, as they would be for a standard electrostatic multipole expansion. Because here our continuum model is more complex, it should be reminded that these four moments are not guaranteed to be the ones appearing in the multipole expansion. Nonetheless, they remain a pertinent tool to characterize the structure of the electrostatic solution. Their evolution throughout the opening, illustrated in Fig. 8c, show that the strength of the dipole (D) is decreasing, while all other moments norms are increasing. The two most informative moments for the structure of the electric field, the dipole and the quadrupole, are diminished by 9% and increased by 67% respectively. The principal direction of Q associated to the positive eigenvector appears to remain quasi-parallel to the dipole direction. In fact, all four directions (the dipole direction and the three principal directions of Q ) appear to undergo the same rotation. The opening also reinforces the anisotropy in the tensor Q , by amplifying the difference between the magnitude of its eigenvalues, compressing one of its directions and effectively turning it into a two-dimensional quadrupole.

Conclusions
We examined the SARS-CoV-2 spike protein's transition from its open to closed conformation, observing the protein's electrostatic potential dynamics with a particular focus on its receptor-binding domains. In the Folding@home trajectory we analyzed, one of the three monomers detaches from the core of the complex and becomes visible to the surrounding environment, consistent with recent cryo-EM derived structures [5]. Our results show that despite the dramatic molecular displacement engendered by this strategic repositioning, the geometric and electric properties of the RBD itself remain largely unaltered. Instead, as we describe below, changes emanating from the exposure of the core of the S protein cause a change in the electric fields surrounding the RBD.
Our continuum analysis, both of the surface potential and the volumetric potential in the vicinity of the binding region, points to the existence of an inner negatively charged core on the surface of the spike, which is revealed to the solvent as (d) Depicts the evolution of the norm of all four first moments. We use the standard L 2 entry-wise norm. the spike opens. This negatively charged surface shrinks as the spike opens, inducing a negatively charged region between the exposed RBD and the two hidden ones. The emergence of this electric cloud is a priori puzzling: the binding site of the ACE2 receptor being almost entirely negatively charged, we would expect this cloud to repulse the receptor. However, this cloud does not envelop the spike's binding site, B, which becomes more positively charged. This is in line with what we expect is a strong binder of the ACE2 receptor and indicates that targeting the open state of the S protein may be a more viable drug design strategy than the closed configuration. Indeed, recent cryoEM studies showed that ≈ 16% of a recombinantly expressed S protein population is in an open state, with a single monomer "erected" from the core [5].
The electric field, which we observe uncoiling above the protein, exhibits a dramatic rearrangement. This is correlated to the emergence of the negatively charged cloud and manifests in the multipole moments decomposition. In particular, we observe the quadrupole moment growing in magnitude, rotating, and compressing one of its principal directions. This transformation can be interpreted as a strategic transition from an undirected configuration where the entire top part of the spike attracts negatively charged structures, such as the ACE2 binding site, to one where the attraction is directed toward a specific RBD.
The progression of time will uncover a multitude of breakthroughs regarding the behavior of the SARS-CoV-2 S protein. In the short time since the conception of our investigation, numerous discoveries by other researchers have been made public already. For example, the trajectory presented in [43] contains glycans, chemical compounds known to coat the exterior of many viruses, which may have an impact on the results presented here. Recall that the trajectory we explored here is simply one possible path the spike may follow. So there is a possibility that a different trajectory would be a more accurate representation of the true function of the spike protein. Recent studies displayed cryo_EM structures for the spike, presenting an open configuration that differs from the open configuration represented here [40,36]. A similar analysis to the one presented here may be required on different structures, as more probable trajectories are predicted and novel molecular structures uncovered, to capture a more relevant image of the electric potential on and around the spike. Utilizing the same scientific pipeline, we are confident high-fidelity quantitative insights -into the electric potential of molecules unrestricted to the one investigated here -could be obtained efficiently, supporting the global scientific effort.
At the computational level, we have developed a novel framework, combining Markov states models simulations with a continuum physics numerical solver to produce high-resolution multiscale dynamical potential maps of deforming proteins and their surrounding environment. Our framework includes practical mathematical tools to quantify the numerical error and characterize protein electro-geometric properties. The molecular trajectories alone are insufficient for understanding protein electrostatic interactions and, thus, protein core bio-mechanisms. On the other hand, it is unrealistic to envision an approach where trajectories of such a large protein could be obtained solely through continuum modeling. We believe hybrid approaches, such as the one presented here, can provide the scientific community with invaluable information, which we hope can be used to elucidate how changes to physical properties such as electrostatics translate into function.

CRediT authorship contribution statement
M. Theillard and S. Sukenik conceived the presented project. M. Theillard developed and implemented the numerical method. S. Strango, A. Kucherova, and M. Theillard carried the computations and analyzed the Results. All authors contributed to the redaction of the manuscript and approved its final form.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.