On regularization of charge singularities in solving the Poisson-Boltzmann

Numerical treatment of singular charges is a grand challenge in solving the PoissonBoltzmann (PB) equation for analyzing electrostatic interactions between the solute biomolecules and the surrounding solvent with ions. For diffuse interface PB models in which solute and solvent are separated by a smooth boundary, no effective algorithm for singular charges has been developed, because the fundamental solution with a space dependent dielectric function is intractable. In this work, a novel regularization formulation is proposed to capture the singularity analytically, which is the first of its kind for diffuse interface PB models. The success lies in a dual decomposition – besides decomposing the potential into Coulomb and reaction field components, the dielectric function is also split into a constant base plus space changing part. Using the constant dielectric base, the Coulomb potential is represented analytically via Green’s functions. After removing the singularity, the reaction field potential satisfies a regularized PB equation with a smooth source. To validate the proposed regularization, a Gaussian convolution surface (GCS) is also introduced, which efficiently generates a diffuse interface for three-dimensional realistic biomolecules. The performance of the proposed regularization is examined by considering both analytical and GCS diffuse interfaces, and compared with the trilinear method. Moreover, the proposed GCS-regularization algorithm is validated by calculating electrostatic free energies for a set of proteins and by estimating salt affinities for seven protein complexes. The results are consistent with experimental data and estimates of sharp interface PB models.


Introduction
At atomistic level of detail, biological processes involve charged objects such as proteins, DNAs and RNAs, immersed in an aquatic environment with mobile ions. To understand such processes, an electrostatic analysis is indispensable, which concerns interactions between solute macromolecules, the surrounding solvent molecules, and ions. Explicit solvent calculations are very expensive since they involve millions of water molecules, while implicit solvent models, such as the Poisson-Boltzmann (PB) equation [1][2][3], allow for fast analysis, by computing interactions via a mean force approach. Physically, by treating the macromolecule and water as dielectric continuum, the PB model combines the Gauss's law in electrodynamics with Boltzmann distribution in statistical thermodynamics. Mathematically, the PB equation is an elliptic partial differential equation (PDE) with singular source terms to account for partial charges contained in the macromolecule, and is defined on a domain consisting of solute and solvent regions.
In the classical PB model [1][2][3], a sharp interface is assumed as the solute-solvent boundary, and a low dielectric constant is assigned to the biomolecule, while the water phase is considered as a high dielectric constant medium. The shape of the interface is known as a molecular surface, and is usually modeled according to the protein structure, as well as effect of water penetration. The most commonly used models include Van der Waals (VdW) surface, solvent accessible surface (SAS) [4], solvent excluded surface (SES) [5], and Gaussian surface [6]. Nevertheless, a sharp interface introduces a discontinuity in the dielectric coefficient across the solute-solvent boundary, so that the PB potential solution loses its regularity too. This demands special handling in numerical solution of the PB equation [7,8]. Moreover, geometrical singularities could occur at sharp molecular surfaces [9,10].
It is known physically that the dielectric coefficient of a medium is determined by the polarizability of the medium in responding to local electrostatic field. Thus, for biological and chemical systems in molecular scales, the assumption of a sharp interface in the PB model as the solute-solvent boundary seems to be unphysical [11][12][13]. In particular, beginning with macromolecule interior and moving toward the molecular surface and further into the water phase, the polarizability constantly increases, but should not undergo a sharp jump [14]. This motivates the development of many PB models by treating the solute-solvent boundary as a smooth diffuse interface [9,10,[15][16][17][18]. For example, in studying charged objects immersed in liquids, a smooth implicit solvent model has been developed by incorporating the structures of water dipoles and ions into mean field modeling, so that the effective dielectric coefficient becomes a smoothly variant function [15].
A popular mathematical formulation [9,10,[16][17][18] for studying polar and non-polar interactions between the macromolecule and solvent is to construct a free energy functional by incorporating various contributions, such as electrostatic effects, protein dimensions, surface curvatures, etc. Then, the Euler-Lagrange variation of the free energy minimization will lead to a coupled PDE system, and the resulted PB equation naturally involves a diffuse interface type dielectric boundary.
This work focuses on numerical treatments of the singular sources in the PB equation, which take the form of a sum of the Dirac delta distributions in modeling of the partial charges located at the atomic centers of the biomolecules [1][2][3]. Mathematically, the electrostatic potential blows up at the atom centers too, which introduces a great difficulty to the theoretical analysis [19]. Numerically, the accurate treatment of singular charge sources in grid based computations is a significant issue for the PB equation with either sharp or diffuse interfaces.
The existing algorithms for the PB charge singularities in the classical sharp interface setting can be classified into direct and indirect approaches. The most commonly used direct approach is to distribute the point charges to their neighboring vertexes of the cube or element by means of a trilinear approximation [3]. Alternatively, in a finite element variational form, one can apply the definition of the delta function so that a point charge can be evaluated through the trial functions [20]. Being computationally efficient, these direct approaches usually involve a large approximation error. To improve the accuracy, the approximation of partial charges by Gaussian function or high order polynomials has also been considered in the PB literature [21,22]. Nevertheless, these methods involve a wide stencil of grid nodes for approximating a point charge. Numerically, when the point charge is near the molecular surface, it could happen that one will use nodes outside the molecular surface to approximate partial charges inside the protein. Recently, a second order accurate geometric discretization of the multidimensional Dirac delta distribution has been introduced in [23] for solving Poisson's equation, in which the approximation is confined within a simplex. Special treatments have been designed so that the second order accuracy is maintained even when the simplex is intersecting with the dielectric interface.
In an indirect approach for solving the PB equation with a sharp interface, the inaccurate discretization of the unbounded delta function by finite grid values is avoided. In the biophysics community, an induced charge method has been developed [24,25], in which the free energy is split into Coulomb, reaction field, and ionic solvent components. The reaction field effects due to a dielectric boundary can be physically reproduced by an appropriate mapping of induced polarization charge placed on the molecular surface, so that one can use real charges rather than their grid representation in calculating the free energy. In the mathematical literature, a series regularization methods [8,19,[26][27][28][29][30][31][32][33] have been introduced for handling singular charges by directly manipulating the PDE. Typically, these methods involve a decomposition of the potential solution into two or three components with a singular one to capture the singularity. In most methods, this singular component is defined as the Coulomb potential and satisfies a Poisson's equation with the same singular sources, so that it can be analytically solved as Green's functions. In the range separated regularization method [26,31], the delta functions are further decomposed into local and global components. The singular component corresponding to the local singular sources can be accurately approximated by tensor discretizations. After removing the calculated singular component, other potential components in all regularizations become bounded so that their numerical computation becomes more accurate. Recently, a comparison of four popular regularizations was conducted in [34] to investigate an accuracy reduction issue. An accuracy recovery technique has been correspondingly designed so that all four methods yield the same high precision.
To the best of the authors' knowledge, only direct approaches have been implemented in the existing diffuse interface PB models, and indirect approaches have never been developed for handling charge sources, due to various challenges. In the induced charge method [24,25], the surface induced charges are defined on the sharp molecular surface. It is uncertain if this method can be generalized for a diffuse interface. For regularization methods, the difficulty is due to the space dependent dielectric function in the Laplacian operator of the PB equation. Mathematically, the fundamental solution that satisfies the Poisson's equation with such a Laplacian and singular sources is not analytically available, even though a semi-analytical method has been proposed to construct it for separable geometries via quasi-harmonic approximations [35]. Without an analytical representation of the singular potential, it is unclear on how to formulate a regularization for the PB equation with diffuse interfaces.
The main goal of this paper to introduce the first regularization method to treat singular charges for diffuse interface PB models. Following our recent study of Poisson's equation with a diffuse interface [36], the main idea of our new approach is to decompose the inhomogeneous dielectric function into a constant base plus a space variant part. This allows us to capture the singular potential by considering a Poisson's equation with the constant base dielectric value, and analytically solve the fundamental solution as Green's functions. Through a simple PDE analysis, the other component, i.e., the reaction field potential, can be shown to be bounded. With a smooth dielectric setting and being free of singular sources, the reaction field potential can be accurately approximated by any numerical discretization.
The second goal of this work is to introduce a simple algorithm for generating a diffuse interface in dealing with complex protein systems. In principle, the proposed regularization method can be applied to all existing diffuse interface PB models [9,10,[15][16][17][18]. Nevertheless, in order to validate our regularization in an independent setting, a Gaussian convolution surface (GCS) will be introduced together with a fast Fourier transform (FFT) implementation. The new algorithm can robustly handle various protein geometries, and is very efficient for large complex systems. The GCS algorithm will provide a smooth surface function to characterize the space, which takes a constant value of one and zero, respectively, in the solute and solvent domains. In a narrow transition layer at the solute-solvent boundary, the surface function decays smoothly from one to zero.
The rest of this paper is organized as follows. The proposed regularization method will be introduced in Section 2. Its numerical implementation will be discussed too. In Section 3, the details of the proposed GCS diffuse interface will be offered. Numerical tests will be carried out to examine the GCS model parameters. The validation of the proposed regularization based on analytical surface and GCS will be considered in Section 4 for several simple atomic systems. The biological applications to small macromolecules, large proteins, and binding protein complexes will be investigated in Section 5. Finally, this paper ends with a conclusion.

Poisson-Boltzmann model and regularization
In this section, we will develop a regularization approach for treating singular charge sources of the Poisson-Boltzmann (PB) equation, where a smooth solute-solvent boundary is assumed to be given by a known diffuse interface. Numerical discretization of the PB equation and energy calculation will be discussed too.

A Poisson-Boltzmann model with a diffuse interface
In implicit solvent models, a macromolecule such as a protein is regarded as a solute, being immersed into an aqueous solvent. Consider a large enough cubic domain Ω ⊂ R 3 that contains this three dimensional (3D) solute-solvent system. In a traditional two-dielectric PB model [1][2][3], the domain Ω is divided by a molecule surface Γ into two parts, namely the molecule domain Ω − and solvent domain Ω + , and the dielectric function ( r) is assumed to be a piecewise constant with = − in Ω − and = + in Ω + . See Figure 1a for an illustration.
In this paper, we will focus on a smooth dielectric PB model with a diffuse interface, see  transition layer Ω t in between Ω i and Ω e . The dielectric function ( r) will take constant values in solute and solvent with ( r) = i in Ω i and ( r) = e in Ω e , with 0 < i < e . In the smooth solutesolvent boundary Ω t , ( r) varies smoothly from i to e . Mathematically, the domain partition can be characterized by a level set or surface function S ( r), which equals to one and zero, respectively, in Ω i and Ω e . In Ω t , S ( r) monotonically decays from one to zero, so that S ( r) is at least a C 2 continuous function over the entire domain Ω. See Figure 2a for an illustration. The dielectric function is then defined as [9,12] ( r) = S ( r) i + (1 − S ( r)) e , for r ∈ Ω, which is also at least C 2 continuous, see Figure 2b. In the present study, we will take i = 1 for the protein and e = 80 for the water. The construction of the smeared surface function S ( r) for real proteins will be discussed in the next section. We assume S ( r) being given in the following discussion on regularization.
For the present setting, the nonlinear Poisson-Boltzmann (PB) equation for the electrostatic potential u( r) can be expressed as [12] − ∇ · ( ( r)∇u( r)) + (1 − S ( r))κ 2 sinh(u( r)) = ρ( r), in Ω, (2.2) in the dimensionless form. Here the source term is due to singular charges contained in the protein where δ( r − r j ) is the Dirac delta distribution. Consequently, the solution u to Eq (2.2) shall be defined in a weak form. We refer to the references [19,32,37] for more details about the weak solution and well-posedness of the PB equation. In this work, N m is the total number of atoms in the solute molecule, k B is the Boltzmann constant, and T is the temperature. For each atom, a partial charge q j in terms of the fundamental charge e c is located at the atom center r j . Since r j ∈ Ω i for all j, ρ( r) is only defined within in Ω i and S ( r) has actually been dropped in the source term, i.e., S ( r)ρ( r) = ρ( r) in Eq (2.2). With S ( r), the coefficient of the PB nonlinear term (1 − S )κ 2 is at least C 2 continuous, where the modified Debye-Hückel parameter κ takes a constant value κ 2 = 2N A e 2 c 100k B T I = 8.486902807Å −2 I. Here N A is the Avogadro's Number and I is the molar ionic strength. On the outer boundary ∂Ω, a Dirichlet boundary condition can be assumed We note that the potential u and its gradient ∇u are continuous everywhere in Ω, except at charge centers. We also note that the dimensionless potential u = e c φ k B T , where the original electrostatic potential φ has the unit kcal/mol/e c .

A novel regularization for the PB equation
Following our recent study [36] on the regularization of the Poisson's equation with a diffuse interface, we propose to regularize the PB equation (2.2) through a dual decomposition strategy. The potential u is split into a Coulomb component u C and a reaction field component u RF with u( r) = u C ( r) + u RF ( r) in Ω. The dielectric function is decomposed to be a constant base value plus space variant part ( r) = i +ˆ ( r), withˆ ( r) ≥ 0 for r ∈ Ω. Note thatˆ = 0 in Ω i andˆ = e − i in Ω e .
The key of regularization is to capture the solution singularity analytically by the Green's function. Similar to [36], we assume that the Coulomb potential u C satisfies a homogeneous Poisson's equation with the singular source ρ given by Eq (2.3), Then, the singular component u C is analytically given as the Green's function G( r) After removing u C , the reaction field potential u RF ( r) will be bounded everywhere inside Ω.
With the dual decomposition, the PB equation (2.2) can be expanded as By subtracting Eq (2.5) from Eq (2.7), the PB equation is free of singular charges Since u C = G( r) is known, we can move the first term to the right-hand side This actually gives rise to a regularized PB equation for the reaction field component where the gradient of Green's function is analytically given as (2.11) The regularized source ∇ · (ˆ ∇G) is smooth and non-vanishing only in Ω t . To see this, recall that = 0 inside Ω i . Even though ∇G is singular at atom centers r j in Ω i , the limit ofˆ ∇G as r goes to r j exists and actually equals to zero. Thus,ˆ ∇G is a smooth function in the entire domain Ω, so that the new source term can be rewritten as ∇ · (ˆ ∇G) = ∇ˆ · ∇G +ˆ ∆G, in Ω. (2.12) Since ∆G = 0 everywhere except at charge centers, whereasˆ = 0 inside Ω i including at charge centers, we haveˆ ∆G = 0 for all r ∈ Ω. Thus, this term can be dropped from the new source ∇ · (ˆ ∇G) = ∇ˆ · ∇G = ∇ · ∇G, in Ω, (2.13) where we have applied ∇ = ∇ˆ , because andˆ differ by a constant i . According to the definition of by Eq (2.1), ∇ = 0 in both molecule domain Ω i and solvent domain Ω e . So does ∇ · ∇G. In the transition layer Ω t , ∇G can be analytically calculated according to Eq (2.11), while ∇ shall be smooth. Thus, the regularized source ∇ · ∇G is bounded and smooth in entire Ω, while being non-vanishing only in Ω t . In summary, we propose a new regularized PB equation for the reaction field potential (2.14) We note that the decomposition of the dielectric function = i +ˆ is used only in the derivation, and all computations can be carried out based on ( r) only. Since (1 − S ) = 0 inside Ω i , the nonlinear term in Eq (2.14) is also vanishing in Ω i . This guarantees that the reaction field potential u RF will be bounded and smooth in Ω, and can be easily solved by any numerical method. Once u RF is computed from Eq (2.14), the potential solution of the PB equation (2.2) is simply given by u = u RF + G. Throughout this paper, the dielectric coefficients will be chosen as i = 1 and e = 80. The ionic strength will be taken as I = 0.15 M unless specified otherwise.

Numerical discretization
When the electrostatic potential is weak, sinh(u) can be approximated by u, which yields a linearized PB (LPB) model. For simplicity, we will numerically validate the proposed regularization method by considering the following LPB equation with a diffuse interface − ∇ · ( ( r)∇u( r)) + (1 − S ( r))κ 2 u( r) = ρ( r), in Ω, (2.15) subject to the same source term (2.3) and boundary condition (2.4). It can be similarly derived that now the reaction field component satisfies the following regularized LPB equation Note that the second source term is vanishing inside Ω i due to (1 − S ) factor. In Ω t and Ω e , the Green's function G( r) is bounded and can be directly computed by Eq (2.6). Consider a uniform mesh partition of the computational domain Ω, with N x , N y , N z being the number of the grid points in x, y, and z directions, respectively. Without the loss of generality, we assume the grid spacing h in all x, y, and z directions to be the same, i.e., h = ∆x = ∆y = ∆z, with the unit Å. For a function f defined at a node (x i , y j , z k ), we denote f i, j,k = f (x i , y j , z k ). For real proteins, the surface function S ( r) has to be generated numerically. Thus, in the present numerical discretization, we will assume S i, j,k being known for all i, j, and k. Similarly, by using Eq (2.1), we have all nodal values i, j,k . But we may not know S and values off-grid. In the source term, G i, j,k and ∇G i, j,k can be calculated analytically.
The central finite difference method is employed to solve the regularized LPB equation (2.16). For this purpose, we rewrite Eq (2.16) into its Cartesian component form Since both surface function S and dielectric function are available only on grid nodes, derivatives of u RF and in Eq (2.17) will be discretized by using central differences involving nodal values. For example, we have In the present study, the band-width of Ω t will be taken as 3Å in protein studies, which guarantees a high accuracy in approximating ∇ by Eq (2.19). After discretization, Eq (2.17) becomes a sparse linear system with dimension N 3 -by-N 3 , where N 3 = N x × N y × N z . A biconjugate gradient iterative solver [38] is employed to solve this linear system for u RF (x i , y j , z k ). The order of accuracy of the entire central difference discretization is two.
For a comparison, the traditional trilinear method [3] will also be employed to treat singular charges. In the trilinear method, for each charge q j in the source ρ located at r j , one will find a cubic cell containing this charge. One will then distribute the charge q j into eight corner nodes of the cell. Denote Q i, j,k as the resulting fractional charge at grid point (x i , y j , z k ). One can then directly discretize the LPB equation (2.15) by evaluating q j as eight Q i, j,k values in ρ, while the left hand side terms can be approximated by the same second order central differences. We denote the resulting potential of the trilinear method as u T L (x i , y j , z k ). The trilinear distribution usually involves a huge error-much larger than the discretization error of the central difference.

Electrostatic free energy
The energy released when the solute molecule is dissolved in solvent is known as the free energy of solvation. The polar component of solvation free energy can be calculated in the PB model by computing the difference between total electrostatic free energy of the macromolecule in the solvent and in the vacuum [1][2][3]. For the linearized PB equation with a diffuse interface, i.e., Eq (2.15), this involves two numerical solutions in traditional methods, such as the trilinear method. One first solves Eq (2.15) for the water state to obtain u( r). For the vacuum state, one solves a Poisson equation which is obtained by taking κ = 0 in the PB equation (2.15) and boundary condition (2.4), and i = e = 1 for defining in Eq (2.1). Here we denote the vacuum state solution as u 0 ( r). The polar solvation free energy or electrostatic free energy is then calculated as We note that the solution of Eq (2.20) is actually the coulomb component u C , which is known analytically as the Green's function G. Nevertheless, in the trilinear method, one still numerically solvers Eq (2.20) for u 0 . Because the same source approximation is utilized in the water and vacuum states on the same mesh, the discretization error of trilinear distribution can be offset when evaluating u T L − u 0 in Eq (2.21) for free energy calculation [3]. For the regularization approach, one does not need to solve Eq (2.20), and just solves the PB equation once in the water state for u RF . Then, the electrostatic free energy is computed as Since the calculated potentials u, u 0 , and u RF are available only at grid nodes (x i , y j , z k ), a linear interpolation is conducted to evaluate these potentials at charge centers r j in Eqs (2.21) and (2.22). In the present study, the electrostatic free energy will be reported in the unit of kcal/mol.

A simple diffuse interface model
In order to validate our regularization in an independent setting, we will introduce a simple algorithm in this section to compute the surface function S ( r). For any protein, we will determine Γ i and Γ e properly so that S ( r) = 1 in Ω i and S ( r) = 0 in Ω e , while S ( r) changes smoothly in Ω t . Our intention is not to introduce another molecular surface definition. Instead, our focus is on computational efficiency in treating large proteins, as well as numerical robustness in handling complex shapes in a discrete setting.

Molecular surface and mathematical theory
We first briefly review several commonly used molecular surfaces, see Figure 3a. For a protein with a total N m atoms, we know the center r l and radius r l of each atom. By representing each atom as a hard sphere, the Van der Waals (VdW) surface is simply defined as the smallest envelop enclosing the union of all spheres. The solute-accessible surface (SAS) and solute-excluded surface (SES) [4,5] are defined by rolling a probe sphere around the VdW surface, to mimic the solvent penetration by the water molecule. The probe radius r p will be fixed as 1.5Å in this study. The SAS is traced by the center of the probe, while the SES is traced by the inward-facing surface of the probe, see Figure 3a. However, these sharp interface molecular surface definitions are known to admit geometric singularities [9,10].
We propose a new algorithm to generate a diffuse interface type smooth molecular surface based on the SAS. We start by defining a Heaviside function H( r) such that H = 1 when r is inside the SAS or on the SAS, and H = 0 when r is outside. A smeared surface function S ( r) can be obtained by convoluting H( r) with a one-dimensional (1D) Gaussian kernel function K(x) defined as This normalized Gaussian function represents the probability density function of a normal distribution with a variance σ and the expected value being zero. The Gaussian convolution or filtering will be carried out in x, y, and z directions in a tensor product style. Thus, the convolution of H and K along a x grid line can be expressed as where we have abused the notation by denoting H as a 1D function function H(x). In Figure 3b, such a 1D Heaviside function H(x) is shown in the top portion. The convoluted function shown in the bottom portion clearly involves a smooth transition from 1 to 0, corresponding to a smooth solute-solvent boundary.
Mathematically, the convolution equation (3.2) can be simply realized via the Fourier transform F , which transfers a physical space in x to a frequency space in ω, By the convolution theorem, we have . The resulting function after convoluting H( r) with K(x) in x, y, and z directions is the desired smeared surface function S ( r), and the underlying diffuse interface will be called as the Gaussian convolution surface (GCS). The initial setup of the GCS computation is illustrated in Figure 4a. Centered around the SAS, we aim to create a transition layer with the width being 3Å as the smooth solute-solvent boundary. This guarantees that the solute domainΩ i inside the VdW surfaceΓ i has a fixed dielectric value = i . Mathematically, we define an enlarged SAS by augmenting the atomic radii by 2r p = 3Å, and denote it asΓ e . OutsideΓ e , the dielectric value is fixed as = e in the solvent domainΩ e . Note that we have added a tilde in all notations because they are physically defined before the Gaussian filtering. After convolution, the actual domains and boundaries will be slightly different. We will discuss them in the next subsection.

A fast algorithm for Gaussian convolution surface
In practice, the GCS is generated discretely based on a given uniform mesh with a spacing h. The fast Fourier transform (FFT) will be adopted to accelerate the computational speed. In such  computation, one does not need to specify the exact locations ofΓ i andΓ e . Instead, it is sufficient to generate nodal values of S (x i , y j , z k ) for all grid nodes, which then can be utilized in the regularization discretization.
We first numerically determine the Heaviside function. Consider a protein with N m atoms. For l = 1, 2, . . . N m , the atom center (x l , y l , z l ) and radius r l of the protein structure can be obtained from the Protein Data Bank (PDB) by using the PDB ID. Note that the SAS is actually a VdW surface by augmenting each atomic radius r l by the probe radius r p = 1.5Å. We first initiate H = 0 for all nodes (x i , y j , z k ) in Ω. Then for each atom, we will search for nearby nodes. If the distance between the atom center and one node is close enough, i.e., ( A proper discrete sampling of Gaussian kernel K(x) is crucial for convoluting with the Heaviside function H(x i , y j , z k ). In order to make sure that the width of the transition layer is around 3Å, we may control the window size of K(x) by adjusting σ. Another important factor is the width of the sampled Gaussian kernel, see Figure 4b. In the present study, we set the half-width to be W = σ − ln(2πτ 2 σ 2 ), (3.5) which is obtained by solving K(W) = τ, where τ is a tolerance measuring the height of the truncated Gaussian kernel above the x axis, see Figure 4b. The tolerance is fixed to be τ = 0.0025 in this work. Then, the discrete kernel will be sampled uniformly in the interval [−W, W]. The total number of sampling is related to the probe radius r p and the mesh spacing h, where · is the floor function. For example, for r p = 1.5 and h = 0.5, we have M = 7. We then sample K(x) defined by Eq (3.1) by a uniform grid with M nodes over the interval [−W, W], see Figure 4b. The resulting vector is denoted as K with K ∈ R M . The Gaussian convolution equation (3.2) will be carried out dimension by dimension. Without the loss of generality, let us discuss the filtering in the x direction with N x grid nodes. The discrete Heaviside function is a vector of dimension N x , and is denoted as H. An illustration of H and K is shown in Figure 5, along with the convoluted vector. Note that by the definition of both Heaviside function H( r) and surface function S ( r), a periodic boundary condition could be assumed for both functions over the domain Ω in any Cartesian direction. Thus, their discretized versions also satisfy the periodic boundary condition. Therefore, the convolution of two vectors H and K could be represented either as linear convolution or circular convolution. For instance, in case of linear convolution, let us formulate the convoluted vector ( H K) with its i-th element being (3.7) In order to make sure that the convoluted vector has dimension N x , the discrete convolution equation (3.7) shall be defined for i = 1, 2, . . . N x . This means that the dimension H has to be extended to N x + M − 1. This is achieved by zero padding, i.e., one simply adds enough number of zeros at the end of vector H. Moreover, in order to apply the FFT algorithm, the dimension of H should be an integer power of 2. Thus, more zeros will be added into H such that the dimension N f is the least integer satisfying N f > N x + M − 1 and N f = 2 m for some integer m. A vector H after zero padding is shown in the top portion of Figure 5. The vector K is also padded with zeros such that its dimension is N f as well, see the middle portion. By maintaining the same length for H and K, the discrete Fourier components of both vectors, which are obtained by the FFT, can be multiplied term-wisely. An inverse FFT will produce a vector of dimension N f , as shown in the bottom part of Figure 5. Because the convolution result by the FFT is based on the circular convolution, the first (M − 1)/2 points of this vector are 'corrupted' by circulation. Thus, we will skip the first (M − 1)/2 points, and read the next N x values to define a vector S as the convoluted result. See Figure 5. We note that the complexity for the FFT Gaussian convolution along one grid line is O(N f log N f ). The entire GCS algorithm involves the discrete Gaussian filtering in x, y and z directions, and for all grid lines. The overall complexity asymptotically equals to O(N 3 log N 3 ), where N 3 = N x × N y × N z is the the total degree of freedom (DOF). It is noted that since the width M of the Gaussian kernel K(x) is proportional to N x , the direct convolution using Eq (3.7) will result in a complexity O(N 2 3 ). This is much slower than the proposed FFT procedure.
At last, we need to process the generated discrete function S (x i , y j , z k ) to cancel numerical artifacts. Even though the present Gaussian kernel is properly defined so that the convolution could be confined within a band of 3Å, the numerical value of S could be slightly less than 1 insideΩ i , say 0.9999. In fact, such a long tail smoothing is because the Fourier component of the Gaussian function is not a bandlimited function in the frequency domain. This is a well known drawback of the Gaussian filtering. To rectify this, we usually conduct a post-processing after convolutions. For any node (x i , y j , z k ) insideΓ i , we simply reset S = 1. Similarly, for any node (x i , y j , z k ) outsideΓ e , we will force S = 0. The S values between two boundaries will not be changed. Because the long tail smearing is so weak, the rectified discrete function S (x i , y j , z k ) are still sufficiently smooth for the regularization analysis.
Mathematically, the solute domain Ω i , transition region Ω t , solvent domain Ω e , and their boundaries Γ i and Γ e can be defined based on the final S (x i , y j , z k ). For example, Γ i and Γ e could be represented as an isosurface S = 1 and S = 0, respectively. For most parts, these domains and boundaries will be close to their counterparts defined initially in Figure 4a. Nevertheless, the new boundaries Γ i and Γ e will become very smooth, which is different fromΓ i andΓ e . Moreover, our GCS algorithm is designed to guarantee thatΩ i ⊂ Ω i andΩ e ⊂ Ω e . Finally, we note that the explicit definition of domains and boundaries are actually unnecessary, because in our regularization computation, we just need to know nodal values of the GSC surface function S (x i , y j , z k ).

GCS model validation
In this subsection, we validate the GCS model numerically by considering some simple systems. In all tests, a cubic domain Ω = [−10, 10] 3 is employed. A uniform grid with mesh size N = N x = N y = N z is used with a spacing h = 20 N−1 . The Gaussian kernel in Eq (3.1) contains only one parameter: the variance σ, which controls the shape of Gaussian function. We first explore how σ 2 influences the GCS by studying a one-atom model with radius R = 2 and the atom center located at the origin (0, 0, 0). Because of a simple geometry, the boundaries Γ i and Γ e could be fixed in the one-atom model throughout the Gaussian filtering, and are chosen as spheres with radii r i = 2 and r e = 5, respectively. In Figure 6, we plot S along a x grid line y = z = 0 by considering different σ 2 values and N = 401 for one atom system. Due to the design of the GCS algorithm, all cases have the same S values for | r| < r i = 2 and | r| > r e = 5. For the transition layer r i ≤ | r| ≤ r e , the GCS shape depends on σ 2 . In particular, as σ 2 becomes smaller, the decay from S = 1 to S = 0 becomes steeper. It can be seen in Figure 6 that as σ 2 → 0, the GCS curve has a tendency to converge to a Heaviside function with radius r = 3.5. However, such a theoretical convergence is impossible to be realized numerically with a finite mesh size N = 401. As can be seen in Figure 6b, the GCS surface is still quite smooth when σ 2 is as small as 10 −20 . Moreover, a too small σ 2 value is usually not welcomed in the PB simulations, because the central difference will yield a large numerical error in resolving a rapid change. On the other hand, when σ 2 is too large, it can be seen that the decay becomes too slow in Figure 6b. The post-processing of S becomes critical in this case. Otherwise, one may concern about the smoothness of S near r i and r e . For simplicity, a median value such as σ 2 = 1 is what we recommend for general computations.
We next study a diatomic system with both radii being R = 2 and centers located at (±3, 0, 0). In Figure 7, the GCS heat-map figures for the diatomic system are shown. By using N = 401, the initial Heaviside function is plotted in part (a), while other parts use different σ 2 values. Similar to the one atom case, σ 2 controls the steepness of decay in the transition layer. Moreover, the smoothness of the GCS diffuse interface is well illustrated in Figure 7. For this diatomic system, the VdW surfacẽ Γ i consists of two isolated spheres, and the enlarged SASΓ e looks like the SAS, i.e., it involves two geometrical singularities, see Figure 7a. Nevertheless, after Gaussian filtering, domains and boundaries become sufficiently smooth. In particular, the solute domain Ω i becomes a connected region with Ω i ⊂ Ω i . For the solvent domain, we also haveΩ e ⊂ Ω e . In Ω t , S has a smooth decay from 1 to 0 in any direction.
To further illustrate the smoothness of the GCS model, we change the distance between two atom centers by fixing σ 2 = 1 and other parameters, see Figure 8. In particular, six cases are considered with the atom centers being chosen as (±2, 0, 0), (±2.3, 0, 0), (±2.6, 0, 0), (±2.9, 0, 0), (±3.2, 0, 0), and (±3.5, 0, 0). In the first case, the VdW surfaceΓ i involves two touched spheres, while in the last case, the enlarged SASΓ e also consists of two touched spheres. After Gaussian smoothing, a connected solute domain Ω i is resulted in the first five cases, while Ω i becomes two isolated spheres in the last case. In all cases, S is sufficiently smooth, especially near the origin in the last case. These qualitative studies indicate that the GCS model provides a suitable diffuse interface for the proposed regularization. In the rest of this paper, unless specified otherwise, we will fix σ 2 = 1 in all GCS studies.

How grid size N influences the GCS
When N goes to ∞ or h → 0, the GCS will approach certain limit. Nevertheless, a finite N has to be used in practice. We next explore how N influences the GCS, by considering the same one atom system studied above. In Figure 9a, we plot S under different grid sizes N = 51, 101, ...401, along the x-axis with y = 0 and z = 0. Visually, all curves coincide with each other, except the one with N = 51, which is a little bit away from the others. It gives us a rough idea that changing grid size will not cause much difference on the shape of GCS surfaces.
Two zoom-in plots are shown in Figure 9 to allow us to see some subtle changes. In particular,  Figure 9b, the zoom-in resolution is insufficient, so that the curves are still clustered. The order from top to bottom is tentatively given as "401 → 151 → 351 → 201 → 301 → 101 → 251 → 51", and shown in the legend. We note that these two sequences are given roughly because the curves cross each other and change their relative positions frequently. What is interesting is that these two sequences have completely different orders. In general, this means that the GCS convergence with respect to N is not monotonic. Consider a "ground truth" curve as the asymptotic limit, the GCS curve will not converge to this limit from left, nor from right. Instead, the GCS curve will oscillate around the limit in a random fashion. In next section, we will show that the electrostatic free energy sensitively depends on the GCS curve near Γ i , so that the energy convergence with respect to N is also oscillatory. To comprehend such oscillations, two diagrams are considered in Figure 10, where a "ground truth" sharp interface is fixed. In each diagram, two uniform meshes are shown. The interface representation problem underlying this study can be formulated like this: based on each coarse mesh, the sharp interface is approximated by a reconstructed interface (not shown) by certain approach. Now without knowing the sharp interface explicitly, the convergence of this approach can only be assessed by examining two reconstructed interfaces, which, unfortunately, are affected by the particular locations of the grid nodes with respect to the sharp interface. In the top subfigure of Figure 10, grid nodes with two general N values, such as N = 15 and N = 21, are plotted. These   nodes are mostly different, except for a few points. Consequently, the interfaces reconstructed based on two independent meshes are quite irrelevant. This is essentially why the convergence pattern in Figure 9 by using a general sequence of N is oscillatory. This motivates us to study the convergence by considering a special sequence of N. In the bottom part of Figure 10, the mesh spacing is halved from N = 11 to N = 21. In this manner, the denser mesh keeps all original nodes of the coarser mesh, while adding more nodes. It is expected that the interface information contained in the reconstruction on the coarser mesh is preserved with newly added information. A better convergence pattern could be resulted.
In Figure 11, we display the GCS curves of one atom system by considering N = 51, 101, 201, and 401. Similarly, two zoom-in plots are also given. It can be observed that the convergence is still not monotonic, but becomes more regular. In particular, if we take the GCS curve for N = 401 as the reference solution, and measure the distance of other curves away from it. Such a distance obviously becomes smaller when N increases. Such a convergence can be illustrated visually and quantitatively. Qualitatively, we can take the surface function S for N = 401 as the reference solution S re f . Note that all grid nodes of the other three mesh sizes are also nodes for N = 401. This enables us to calculate the absolute difference |S N − S re f | point-wisely, where S N is the surface function under grid size N = 51, 101, or 201. Such differences are depicted in Figure 12 as surface plots on the plane z = 0. The convergence is evidently. Quantitatively, we measure the differences by using the L 2 norm: i, j,k |S N − S re f | 2 /N 3 , and the results are found to be 1.4E-2, 3.5E-3, and 6.9E-4, respectively, for N = 51, 101, and 201. This is obviously a convergence on the order of O(h 2 ). Therefore, in the following studies, when we examine the GCS convergence, we will take N = 51, 101, 201, and 401.

Model validation
In this section, we will validate the regularization method and the GCS model by solving the linearized PB (LPB) equation for several simple atomic systems. Comparison of the regularization method and the trilinear technique will be conducted too. In all cases, a cubic domain Ω = [−10, 10] 3 is partitioned by a uniform mesh with the spacing h = 20 N−1 in all directions. Here the length unit for h is Å.

One atom system with an analytical diffuse interface
Consider the one atom system studied in Section 3.3 with center at (0, 0, 0) and radius 2. We first define an analytical smeared surface to verify the proposed regularization method. The boundaries Γ i and Γ e are spheres with radii being r i = 2 and r e = 5, respectively. In Ω i and Ω e , the level set function takes constant values S ( r) = s i = 1 and S ( r) = s e = 0, respectively. In the transition band r i < | r| < r e , it takes the form where the parameter k controls the steepness of transition. We choose k = 6, which is large enough to ensure that S ( r) is a smooth function throughout the domain Ω. This diffuse interface will be called tanh-like surface in the following discussions. We first demonstrate how error cancellation is so crucial for the trilinear method in calculating the electrostatic free energy. Consider a point charge q 1 = 1 at the atom center (0, 0, 0). In Figure 13, the result of the regularization method is compared with that of the trilinear method in two subfigures. Line plots of potentials along a x line with y = 0 and z = 0 are shown. As mentioned in Section 2.4, the electrostatic free energy depends on u( r) − G( r). However, the trilinear method produces a huge error in approximating the singular source q 1 . Thus, the plot of u T L −G has a sharp spike near x = 0. In fact, since G( r) is undefined at x = 0, in this plot, we re-defined the grid value G(0, 0, 0) as the average of six nearest neighboring nodes. For the regularization method, Green's function G is analytically excluded so that the reaction field potential u RF is free of singularity. Being invisible in Figure 13a, the difference between two potentials u T L − G and u RF is actually quite minor away from the point charge.
As suggested in Eq (2.21), in calculating the electrostatic free energy by the trilinear method, one will not use the Green's function directly. Instead, one numerically solves the Poisson's equation (2.20) to obtain an approximated Green's function u 0 . The plot of u T L − u 0 against u RF in Figure 13b shows the power of error cancellation. The error of singular source approximation has been compensated, because the same approximation is invoked twice in u T L and u 0 . Consequently, u T L − u 0 also yields a constant potential inside the sphere, and behaves similarly to the u RF . The height difference between u T L − u 0 and u RF becomes smaller when N is larger.
We next examine the electrostatic free energies for the one atom system with one point charge at the its center and tanh-like surface. The numerical results of the trilinear and regularization methods are listed in Table 1 for various mesh sizes. For the regularization method, because the level function S ( r) is analytically defined for the tanh-like surface, all terms in the regularized source of Eq (2.16) can be computed analytically, including ∇ . Thus, in this example, the numerical error of the regularization is solely from finite difference discretization of the PB equation. When h becomes smaller, the energy of the regularization method quickly approaches to a limit around −65.6 (kcal/mol). For the trilinear method, the charge distribution is actually not conducted, because for all odd N values, the charge center (0, 0, 0) is a grid node. Thus, one simply approximates the delta function by 1 h 3 on that node. Even though the error cancellation trick has been applied, the energy predicted by the trilinear method is still inaccurate. When h is as small as h = 0.05, the energy is still not close to that of the regularization method. With large h values, the errors are significant.
To fully illustrate the approximation error of the trilinear distribution, we next consider the same one atom system and tanh-like surface with two charges q 1 = q 2 = 1 located at (±1.475, 0, 0). In this way, these two charges are not located on grid nodes for all tested N values, so that trilinear distributions are indeed carried out in all cases. The potentials generated by trilinear and regularization are depicted in Figure 14. It can be seen that the potential is no longer flat inside the sphere. We note that charge centers are close to the molecular surface in the present study, which mimics the usual protein problems. Due to an intensive interaction between charges and ∇ in diffuse interface, the potential changes abruptly near the VdW surface x = ±2. This produces a strong grid error, especially when h is large for the trilinear method. The regularization method performs robustly for this difficult problem with a fast convergence. As can be seen in Table 1, energies for h < 1 are all around −303 and the relative difference between N = 201 and N = 401 is as small as 0.11%. The energies predicted by the trilinear method are not close to those of the regularization. In particular, taking the energy of the To further compare the regularization with the trilinear method, their CPU time for solving the one-atom system is reported in Table 2. It can be seen that in most cases, the trilinear method demands more CPU time than that of the regularization, because it needs to additionally solve the Poisson equation (2.20) for calculating the electrostatic free energy. Here a biconjugate gradient iterative algorithm [38] is employed for solving both the PB and Poisson equations. For the regularization, the computation of the discrete source terms takes more time, so that for a large N like N = 401, it becomes more expensive in the case of two charges.

Regularization method verification in GCS model
For the one atom studies above, tanh-like smeared surface can be constructed analytically. However, such surface cannot be applied to more complicated molecules. In order to test the regularization method for real problems, we have introduced an efficient FFT algorithm to generate a Gaussian convolution surface (GCS). For simplicity, we will call the proposed regularization method and GCS model as the REG-GCS method. In this subsection, we first validate the REG-GCS algorithm for several atomic systems.

One atom system
We first investigate the impact of the GCS model by considering again the one atom system. Here σ = 1 is used. One point charge q 1 = 1 will be assumed at (0, 0, 0), and is treated by the regularization method. Electrostatic free energies calculated based on both tanh-like and GCS surfaces are presented in Table 3. Obviously, the energy converges to a different limit when a different diffuse interface is employed. Another subtle difference can be seen in Figure 15a for the convergence lines with respect to all N values. For the tanh-like surface, the same analytical surface is utilized for all N values. Moreover, all source terms in the regularized PB equations are analytically calculated. Consequently, the convergence pattern of the regularization method for the tanh-like surface is superior-the energy Table 3. Electrostatic free energies (kcal/mol) of one-atom system with different grid size N. The regularization method is applied with tanh-like and GCS diffuse interfaces (σ = 1).  converges monotonically and becomes almost flat for large N values. On the other hand, the GCS convergence line is oscillatory in Figure 15a. Moreover, the magnitude of such oscillation decays, but does not decay quickly when even larger N values are used. The convergence is impacted in two ways. First, the GCS surface is constructed based on a given h value and the fixed width r e − r i = 3. In other words, a different smeared surface is used for a different N. Thus, the energy convergence line includes not only the numerical convergence of the regularization method for solving the PB equation, but also the convergence of GCS surface. Second, with the level set function S being defined numerically, ∇ in the regularized sources has to be approximated numerically, which further impact the convergence of the PB solver.
In Section 3.3, we have discussed how grid size N influences the GCS model. The surface function S ( r) has been shown to oscillate with respect to N in the transition region, and the essential reason of such oscillation has been discussed. In particular, the GCS line plots near Γ i roughly follow the order "51 → 101 → 251 → 301 → 151 → 351 → 401 → 201" as shown in Figure 9c, with some lines crossing each other. Since the zoom-in region is close to the charge center, we expect a correlation between the surface plot order and the sorted energy order. The energies in Table 3 can be re-ordered from large to small, which gives an order "51 → 251 → 101 → 301 → 351 → 151 → 401 → 201". By comparing these two orders, we found that they are mostly the same. There are only two tuples (101, 251) and (351, 151) in the GCS order that are switched in the energy order. This mismatch could be due to the selected window for the zoom-in plot. We note that the the GCS order is changing from place to place. So the present GCS order may have minor differences comparing with the surface plot sequences in the innermost transit region. Moreover, for these two tuples, their energy differences are about 0.1, which is less than the total average difference of the value about 0.3. For these reasons, the oscillatory energy convergence pattern is believed to be due to the GCS convergence.
Like we mentioned in the Section 3.3 about the GCS convergence, a monotonic convergence is unlikely when a general sequence of N is adopted. Nevertheless, if we choose a sequence of doubly refined meshes such as 51, 101, 201, 401, the interface information contained in the coarse mesh will be inherited in the fine mesh. This will results a better convergence in the GCS surface construction, as well as the energy calculation. In Figure 15b, we limited ourselves to this special sequence of N, and plot energies of the tanh-like and GCS surfaces. It can be seen that the energy convergence for the GCS is still not monotonic. However, the magnitude of oscillation will decay quickly when h is further halved. In particular, we take the energy at N = 401 as the reference value, and plot the absolute errors for N = 51, 101, and 201 in Figure 15c. The self-convergence now becomes evidently and is monotonic. This agrees with the finding presented in Figure 11c for the GCS convergence by using the same sequence 51, 101, 201, 401. In the following studies, we will test the convergence of the REG-GCS algorithm by only considering the doubly refined sequence N = 51, 101, 201, and 401. By using N = 401, we next test how the free energy depends on the GCS parameter σ, by studying the same one-atom system. As we mentioned before, as σ → 0, the GCS theoretically converges to a sharp interface. In this case, this is the SAS with radius r = 3.5, whose analytical energy is known to be −46.8447 (kcal/mol). It is interesting to examine if the GCS energy could converge to −46.8447 (kcal/mol) as σ goes to zero. However, we should emphasize that such a σ convergence is numerically unreachable. As shown in Figure 6b, the GCS is still a smooth diffuse interface at σ = 10 −10 , which is still far away from the Heaviside function, while an even smaller σ will be beyond the numerical resolution of the grid size N = 401. In the present study, the electrostatic free energies obtained by  For each molecule, the reference energy E re f is taken as the one calculated by h = 0.05Å. taking σ to be 1, 10 −2 , 10 −4 , 10 −6 , 10 −8 , and 10 −10 are plotted in Figure 16 against − log 10 σ. It can be seen that starting from −68.7645 (kcal/mol) at σ = 1, the GCS free energy becomes closer to −46.8447 (kcal/mol) as σ becomes smaller, and shows a good tendency towards the sharp interface limit. In the rest of studies, we will fix σ = 1 for simplicity.

Two atoms system
We next study a diatomic system with two atoms of the same radius 2. At two centers (±10/3, 0, 0), two point charges q 1 = q 2 = 1 are assumed. The surface plot of the potential u RF calculated by the REG-GCS method is shown in Figure 17a. Besides the surface plot over the plane z = 0, the contour plot is also shown at the bottom of Figure 17a. It can be seen that u RF is no longer flat inside two atoms. The smoothness of the potential contour plot also indicates the smoothness of the GCS diffuse interface. The self-convergence analysis of the electrostatic free energy is depicted in Figure 17b. As above, we treat the energy at N = 401 as exact, and calculate absolute errors for N = 51, 101, and 201. Again, for this doubly refined sequence, the REG-GCS energy converges monotonically.

Biological applications
In this section, we carry out numerical experiments to test the performance of the proposed regularization method and GCS diffuse interface for solving real biological systems. Considering the LPB model, we will examine the convergence of the REG-GCS algorithm by studying small compounds and proteins. Salt affinities of protein complexes will also be studied. The dielectric coefficients are chosen as i = 1 and e = 80. The ionic strength is taken as I = 0.15 M, except in studying the salt effect.

Numerical convergence
We first verify the numerical convergence of the proposed REG-GCS method by considering a set of 17 small compounds. The charges, atomic coordinates, and radii are defined based on a parameterization presented in [39]. The solvation free energies of this test set have been studied by using different methods in the literature [39,40]. We note that the present algorithm is designed to compute the polar component of the solvation free energy. Thus, without non-polar components [40], the present results cannot be directly compared with those in the literature. This test set is adopted in the present study, because of small sizes of these molecular systems. This allows to use dense meshes to verify the numerical convergence.
Since the physical dimensions of 17 compounds are different, we will adjust spacing h instead of mesh size N in the present study. For each compound, we will choose a fixed computational domain Ω and partition it into uniform meshes by using h = 0.4, 0.2, 0.1 and 0.05Å. Note that h is halved in each   Table 4. A clear convergent trend is seen in all compounds. To see the converging pattern better, we set the energy calculated by h = 0.05Å (densest case) as the reference value for each molecule. The absolute errors of the other meshes are calculated and depicted in Figure 18. It can be seen that the absolute error approaches to zero in a monotonic manner. This agrees with our previous findings for atomic systems. We next examine the convergence by considering a protein with protein databank (PDB) ID: 1AHO. A fixed domain Ω = [0, 60] 3 Å is partitioned into uniform meshes with a grid size N = N x = N y = N z . By using N = 51, 101, 201, and 401, the calculated electrostatic free energies are reported in Table 5. If we take the energy with N = 401 as the reference, the absolute errors for N = 51, 101, and 201, are 220.9509, 41.9946, and 32.7952, respectively, which shows a monotonic converge trend as well.
The CPU time for generating the GCS surface and solving the LPB equation is also reported in Table 5. For each N, the majority of execution time was spent on solving the three-dimensional (3D) PDE by using a biconjugate gradient iterative algorithm [38]. When one halves the spacing, the degree of freedom (DOF) N 3 increases by eight times. The CPU increment rate is calculated as the ratio of CPU time values between two mesh levels. It is interesting to note that the CPU increment rate of the GCS surface generation is about eight. This indicates the complexity of the proposed FFT Gaussian For the numerical solution of the LPB equation, the CPU time at N = 51 is larger than that at N = 101 for this problem. This is because with a coarse spacing h = 1.2Å, the finite difference discretization error is quite significant so that the biconjugate gradient solver takes too many iterations. When h is refined to h = 0.6Å, the iterative solver actually converges faster so that the CPU time becomes short. When N is large enough, we can see the CPU increment rate is larger than 10, which is normal for the biconjugate gradient method and 3D problems. We next visualize the GCS surface for the 1AHO. With the GCS diffuse interface, we generate three isosurfaces at S = 0.1, S = 0.5 and S = 0.9. It can be seen in Figure 19 that these three surfaces are equivalent in a homological sense, and becomes bigger and smoother as S decreases. The electrostatic potential u RF computed by the REG-GCS algorithm is mapped onto three isosurfaces. Such potential plots can help us seeing the changing location of electrostatic potential and are helpful to the study of protein interactions. As expected, the intensity of the potential map is stronger when S is larger or closer to the VdW surface. The potential becomes weaker as S is approaching zero.

Electrostatic free energies of proteins
We next apply the REG-GCS method for calculating electrostatic free energy of proteins by using a fixed mesh spacing h = 0.5Å. For this purpose, a set of 21 proteins which have been studied in [11] are chosen. The results are listed in Table 6 under the name REG-GCS. In order to validate our new model, the rMIB algorithm [8] has also been employed to compute the energies. In solving the LPB equation with a sharp interface, the rMIB method treats the molecular surface by using a second order accurate interface algorithm, and uses a two-component regularization to remove charge singularities. Thus, the rMIB method is one of the most accurate algorithms for solving the PB equation [8]. Because of the difference between sharp interface and diffuse interface PB models, the energies of the rMIB method do not agree with those of the REG-GCS. However, certain correlation can be identified.
In Figure 20a, the free energies calculated by the rMIB and REG-GCS models are plotted against the number of atoms for 21 proteins. It can be seen that the difference between two models becomes larger as the atom number increases. In Figure 20b, such difference is shown against the atom number N m . A linear trend is clearly behind the data points. This motivates us to construct a linear fitting model. Let us define the energy difference as D = E(rMIB) − E(REG-GCS), and assume that it is a linear function of N m , i.e., the number of atoms. By using the least squares fitting, we have obtained D = −0.2594N m − 59.6969 (kcal/mol).
Consider that it takes a longer computational time by applying the rMIB model, we propose a simple formula to predict the free energy results of the sharp interface PB model by using our diffuse interface PB model. With the REG-GCS energy, we will predict the rMIB energy as E(rMIB) ≈ E(REG-GCS) − 0.2594N m − 59.6969 (kcal/mol). The resulted energies are also shown in Table 6 under the name REG-GCS-predict. Obviously, these predictions are in good agreement with the original rMIB energies. Moreover, the predictions are plotted against the original rMIB energies in Figure 21. The Pearson correlation coefficient of the prediction results and actual rMIB results is found to be r = 0.998. This means the prediction model is able to yield the rMIB energies almost precisely identical.
This present study demonstrates the consistency between the REG-GCS model and rMIB model. This is not surprising, because both models share many common counterparts, such as the linearized PB equation, two-component regularization, and second order finite difference discretization. From numerical point of view, the REG-GCS model is easier to implement.

Salt effect on protein-protein binding energies
We finally investigate the performance of the proposed REG-GCS model for the evaluation of the salt effect on the binding of ligands, proteins, peptides, etc.. By solving the LPB model, one can calculate the polar component of the binding energy (∆E), which measures the difference in the electrostatic free energy between the protein complex and its monomers at a salt strength I.
where E AB (I), E A (I) and E B (I) are the electrostatic free energies of the complex AB, monomer A and monomer B, respectively, at a given ionic-strength I. The polar binding free energy can be further split into salt independent and dependent parts. The salt dependent part can be analyzed by calculating ∆E(I) based on some salt strength I and at the zero salt concentration. By taking difference of these two quantities, the salt dependent part of the binding free energy is defined as [41] ∆∆E(I) = ∆E(I) − ∆E(0), in which ln I is treated as the independent variable, and ∆∆E is regarded as a function of ln I. Numerically, the binding affinity A is estimated by calculating ∆∆E at several salt strengths of I or ln I, and conducting a least square fitting for the slope.
In this investigation, we tested the salt dependence of binding free energy on 7 protein-protein complexes that have been tested in [41,42]. CHARMM force field has been used. The results calculated by the proposed REG-GCS model with h = 0.5Å are listed in Table 7. For each complex, several salt strengths of I are calculated within a range. For the E9Dnase-Im9 and Lactoglobulin dimer, the range is [0.02, 0.08], while the other complexes use the range [0.02, 0.6]. The experiment results of the binding affinity are known for these 7 complexes, and are also shown in the table. For a comparison, numerical predictions by using a sharp interface PB model [41] and a Gaussian dielectric smooth PB model [42] are also included. It can be seen that the salt effect is much weaker for the present diffuse interface PB model.
In Figure 22, we plot the REG-GCS binding affinity A against the experimental value. With a weak salt effect, the range of the REG-GCS binding affinity is much shorter than that of the experimental data. Nevertheless, our results are actually highly related to the experimental results. The Pearson correlation coefficient between the binding affinities of the REG-GCS method and experimental ones is actually 0.9564. Physically, the much weaker salt effect is due to the nature of the diffuse interface model. The ionic strength I determines the value of the modified Debye-Hückel parameter κ for the solvent. For sharp interface and Gaussian-dielectric PB models [41,42], the full strength of κ is applied quickly outside the VdW surface or in traditional solvent region. For the present GCS diffuse interface, the salt effect is loaded gradually with (1 − S )κ in the transition band, and is fully operated after 3Å away from the VdW surface. Thus, the impact of changing I to the potential u becomes much insensitive in the diffuse interface model. This is essentially why the REG-GCS salt effect becomes weaker, but is still consistent with the experiment data.
An easy fix is carried out. Because of the underestimation, we will amplify the binding affinity of the REG-GCS by a constant m to give A m = mA, such that the scaled salt dependent part of the binding free energy m∆∆E(I) matches the experimental values in a least square sense. The best scaling factor is calculated as m = 3.9201. The scaled binding affinity values are listed in Table 7 for the column REG-GCS-scaled. The mean square error equals to 0.13 for such scaled A m and experimental A values. In Figure 22, A m is also plotted against experimental A. Without any distortion, this scaling maintains the same Pearson correlation coefficient 0.9564. For both original and scaled REG-GCS results, we also fit them as straight lines in Figure 22. The corresponding slopes are 0.2075 and 0.8134, respectively. In summary, with an appropriate scaling to account for the hidden physical feature, the REG-GCS diffuse interface PB model produces decent estimates of the binding affinity. It is believed that the scaling factor should be related to the width of the GCS transition layer. This will be explored in a future work.

Conclusions
The main contribution of this work is the introduction of the first regularization method in the literature for treating charge singularities of the Poisson-Boltzmann (PB) model in a diffuse interface setting. Traditionally, only direct methods such as a trilinear distribution of the point charges are available for diffuse interface PB models [9,10,[15][16][17][18]. However, by approximating the delta function by a finite quantity, the trilinear method produces a significant error at charge centers. Taking advantage of an error cancellation in calculating the electrostatic free energy, the trilinear method can produce bounded energy estimations. Nevertheless, the source approximation error is still large and makes this method unreliable for large spacing h. On the contrary, the proposed regularization method avoids singularity analytically, and produces a fast convergence in calculating free energies when the diffuse interface is analytically constructed.
In order to validate the proposed regularization for more general problems, we have also introduced a Gaussian convolution surface (GCS) to characterize a smooth solute-solvent boundary for proteins with complex geometries. Physically, beginning with macromolecule interior and moving into the water phase, the polarizability increases gradually, and should not undergo a sharp jump. In other words, the dielectric function will increase smoothly from i to e over a transition band. This is the physical justification of the diffuse interface modeling. Mathematically, the diffuse interface PB equation with a smooth function guarantees a better regularity for the potential u near the solute-solvent boundary, i.e., u is C ∞ continuous in Ω t . For the sharp interface PB equation, u is merely C 0 continuous, so that extra efforts are required for mathematical analysis and numerical computation. Computationally, based on the fast Fourier transform (FFT), the GCS generation is very efficient. In particular, the complexity of the GCS algorithm is O(N 3 ), where N 3 is the total degree of freedom. We note N 3 value depends on the physical size of protein domain and mesh spacing h. This is different from the SES generation, which only depends on protein domain. Moreover, in testing the performance of the proposed regularization (REG) together with the GCS model, we have found that the energy convergence of the REG-GCS model is dominated by the GCS convergence, which is typically oscillatory for a sequence of general mesh size N. Nevertheless, for a doubly refined sequence of N, the energy convergence pattern becomes regular. We note that the bandwidth of Ω t should be large enough, such as 3Å in the present study. Otherwise, finite difference approximation may not able to capture the change of ∇ in Ω t . Moreover, when the width shrinks or when σ goes to zero, the GCS surface function becomes a Heaviside function. In this case, the present GCS-REG model should not be applied. Instead, one should use various regularization schemes developed for the sharp interface PB model.
In our future studies, it is interested to apply the proposed regularization in other diffuse interface PB models [9,10,[15][16][17][18]. Moreover, this study opens a new theoretical direction for developing regularization formulations for other heterogeneous dielectric PB models, such as the super Gaussian PB model [12]. The development of more accurate and efficient numerical algorithms for solving the regularized PB equation with a smooth dielectric profile will also be explored. The GCS model optimization by taking σ as a tunable parameter deserves further investigation.