Machine Learning Inference of Molecular Dipole Moment in Liquid Water

Molecular dipole moment in liquid water is an intriguing property, partly due to the fact that there is no unique way to partition the total electron density into individual molecular contributions. The prevailing method to circumvent this problem is to use maximally localized Wannier functions, which perform a unitary transformation of the occupied molecular orbitals by minimizing the spread function of Boys. Here we revisit this problem using a data-driven approach satisfying two physical constraints, namely: i) The displacement of the atomic charges is proportional to the Berry phase polarization; ii) Each water molecule has a formal charge of zero. It turns out that the distribution of molecular dipole moments in liquid water inferred from latent variables is surprisingly similar to that obtained from maximally localized Wannier functions. Apart from putting a maximum-likelihood footnote to the established method, this work highlights the capability of graph convolution based charge models and the importance of physical constraints on improving the interpretability.


2
The most notable estimation of the molecular dipole of water was given by Coulson and Eisenberg with an induction model in 1966 [4]. They found that the reaction field from neighboring molecules in ice I h increases the molecular dipole to about 2.6 D, compared to 1.84 D of an isolated H 2 O molecule. This issue was revisited in 1999 [5]. Instead of partitioning the total electron density of liquid water to individual water molecules [6], Silvestrelli and Parrinello found that µ is about 3.0 D by using density functional theory based molecular dynamics (DFTMD) [7] and maximally localized Wannier functions (MLWFs) [8]. Their result agrees with the reanalysis of the Coulson-Eisenberg model [9] and implies a charge transfer of about 0.5e along each O-H bond [10,11].
In MLWFs, [8] the Boys localization [12] was employed to maximize the distance between centroids of orbitals (Wannier centers). However, alternative localization schemes are possible. [13]. In a recent work, [14] Edmiston-Ruedenberg localization [15], which maximizes the self-repulsion energy of orbitals, has been explored to investigate the molecular dipole moment in liquid water. Depending on the regularization parameter, the resulting µ varies between 2.35 D and 2.63 D [14].
In this work, we tackle this issue with a physically constrained data-driven approach. Similar to the dipole moment of an isolated molecule, the change in polarization ∆P in condensed phase systems is well defined and experimentally measurable [16]. For liquid water, with the choice of the molecular gauge, P is the so-called itinerant polarization [17]. This is the target quantity that we used in the regression task for liquid water. In contrast, molecular dipole moments here are inferred from latent variables and not involved in the training of the model. Another physical constraint built into our regression model is the charge neutrality of each water molecule, which is formally required by the integer change of the polarization quantum in the modern theory of polarization [18,19]. Taking these ingredients into account, we show that the distribution of molecular dipole moments inferred from our regression model using the graph convolutional neural network architecture PiNet is surprisingly similar to that obtained from MLWFs. Moreover, the trained model, with PiNet using only data at ambient conditions, is transferable to liquid water in a range of different densities.
The loss function L that we used for predicting the itinerant polarization P is the squared error in terms of the l 2 norm: where R i is a 3 × N matrix of the atomic coordinates of the configuration i for a system containing N atoms. Q i represents a column vector of the atomic charge in the configuration i. Ω is the volume of the simulation box. In our ML model, neutrality is enforced for each water molecule. This is done by subtracting the net charge using the following expression: where N w is the number of water molecules in the system, J is a column vector with each entry equal to one, and is used as the symbol for the concatenation of vectors. The loss function in Equation (2) was trained with PiNet, a high-performing graph convolutional neural network architecture which works for both isolated molecules and condensed phase systems [20]. Despite the simplicity of our scalar dipole model, PiNet leads to an outstanding learning curve with the QM9 dataset [21], compared to even more sophisticated models [22] (see Figure 1a). For liquid water, the itinerant polarization P was computed with the CP2K package [23] and the BLYP functional [24,25] using the Berry phase formula [26] (see Equation (S1)). In this case, the PiNet-dipole model gives an excellent prediction of ||PΩ|| with a mean absolute error (MAE) of 0.037 D for the test set (see Figure 1b). Here we followed the conventional 80:20 splitting of the dataset (see Electronic Supplementary Information for further descriptions of the computational methods and the dataset). Data of the MuML model was extracted from Ref. [22]; b) The parity plot for itinerant polarizations P of liquid water calculated using the Berry phase formula and the ones predicted from the PiNet-dipole model.
Before looking into the inferred molecular dipole moment in liquid water from the PiNet-dipole model, it is necessary to set up a baseline for the sake of comparison. The baseline model used here is the linear regression (LR)-dipole model. Equation (2) resembles a LR problem where Q may be considered as the weight vector. However, this requires the configuration-dependent Q i to be substituted by the configuration-independentQ. This leads to the loss function used in the LR-dipole model as where Tikhonov matrices Γ are introduced to ensure the charge neutrality of each water molecule and λ is the ridge parameter. For example, if the first three entries inQ are {q O , q H , q H } for one water molecule, then Γ j can be written as: Therefore, the corresponding optimal solution forQ * is: Note that here the subscript i is dropped and that R includes all the input data over three Cartesian coordinates, which make it a 3n × N matrix instead. As seen in Table 1, the itinerant polarization of liquid water predicted from the PiNet-dipole model is orders of magnitude more accurate than that predicted using the LR-dipole model. Moreover, the PiNet-dipole model with charge neutrality constraint performs significantly better than the one without charge neutrality constraint. These highlight the necessity of having environmental-dependent atomic charges built into the PiNet-dipole model and to impose physical constraints which are compatible with the modern theory of polarization. Now we are ready to compare the molecular dipole moments in liquid water as inferred from the PiNet-dipole model with those inferred from the linear regression dipole model. To put this comparison into perspective, we included the ones computed with Wannier centers from MLWFs (see Equation (S2)). The reference calculations of the Wannier dipole moments were done with the same computational setup as the one used in computing the itinerant polarization P (see Electronic Supplementary Information for details). Results of this comparison are given in Figure 2.
The most interesting finding in Figure 2a is that the distribution of molecular dipole moments in liquid water inferred from the PiNet-dipole model is very close to that computed with Wannier centers. Since there is no regularization term in Equation (2), this result may be seen as a maximum-likelihood footnote to MLWFs. In this regard, the LR-dipole model is quite off. In addition, as shown in Figure 2b, the charge neutrality constraint is an important factor to achieve a good correlation. Because reproducing the itinerant polarization of liquid water and retaining the charge neutrality for each water molecule are two main ingredients in the PiNet-dipole model, the question that naturally arises is whether methods which satisfy these two conditions will necessarily lead to molecular dipole moments which are close to the ones obtained with Wannier centers in MLWFs. The parity plot for molecular dipole moments in liquid water between the ones computed from Wannier centers and the ones inferred from the PiNet-dipole models (with and without charge neutrality constraint); c) The parity plot for molecular dipole moments in liquid water between the ones computed from Wannier centers, and the ones inferred from the LR-dipole model and the variational charge method. (2) as a regression problem, one may use reference charges Q • to supplement the equation and transform it to a set of linear equations [27,28]. This approach is what we call the variational charge method here, as the loss function shown below may be viewed as an energy functional with respect to the atomic charge Q i .

Instead of viewing Equation
where is a column vector which plays the role of a Lagrange multiplier and κ is a weight parameter. To introduce charge neutrality for each water molecule, we added a third term in Equation (7) where η is a Lagrange multiplier and Γ j is the same Tikhonov matrix as given in Equation (5).
Taking the derivative of Equation (7) with respect to Q i , and η leads to a set of where I is a N × N identity matrix. Note that Equation (8) is solved independently for each configuration i. The reference charges Q • for liquid water were obtained with the REPEAT method [29,30], which provides electrostatic potential derived charges in periodic systems (see Electronic Supplementary Information for details). Since both and η in Equation (7) are Lagrange multipliers, the two physical constraints mentioned before (reproducing the itinerant polarization and retaining the charge neutrality) are exactly satisfied in the variational charge method. However, as shown in Figure 2a and Figure 2c, the resulting molecular dipole moments are very different from the ones obtained with Wannier centers. This suggests that one should not take the striking agreement between the molecular dipole moments in liquid water inferred from the PiNet-dipole model and those computed with Wannier centers for granted. The effective inclusion of longrange charge transfer in graph convolution [31] is likely to be the reason behind this, because the delocalization tail along the hydrogen bonds contributes significantly to the molecular dipole moment in water clusters [32].
Finally, how good is the transferability of the PiNet-dipole model for liquid water? To answer that, we used a publicly accessible dataset for BLYP liquid water at different densities [33,34]. Results of the predicted itinerant polarizations P and the corresponding mean molecular dipole moments µ are shown in Figure 3. Despite the fact that the PiNet-dipole model was trained using data only at ambient conditions (i.e. ρ = 1.0 g/cm 3 ), this model is rather transferable for liquid water in a range of different densities (Figure 3a). Moreover, the mean water dipole moments µ at different densities as inferred from the PiNet-dipole model correlate quite well with those calculated from the Wannier centers (figure 3b). These excellent agreements may be due to the high expressiveness of the underlying graph convolutional neural network architecture PiNet [20], which is also seen in the learning curve shown in Figure 1a for isolated molecules.
Presumably, the paradigm in atomistic machine learning at present focuses on predicting physical quantities which enter into the loss function and the model training, e.g. energy, force and charge [35,36,37]. Instead, we looked into the latent variables of a trained network in this work and showed that the molecular dipole moments in liquid water inferred from the PiNet-dipole model are surprisingly in accord with those obtained from Wannier centers. Apart from the importance of physical constraints as shown in work, future studies should investigate other factors which could contribute to this agreement, e.g. an effective inclusion of long-range charge transfer. With these efforts, graph convolution based charge models could provide an alternative for describing the itinerant polarization in condensed phase systems [38]. Thus, their applications to charged systems, such as electrolyte materials and electrified solid-liquid interfaces [39,40], shall be anticipated.

Data availability statement
The polarization dataset for liquid water can be accessed using the following link https://doi.org/10.5281/zenodo.4752246.

Electronic Supplementary Information
The Electronic Supplementary Information (ESI) includes: Description of QM9 and liquid water datasets, description of graph convolution and PiNet architecture, description of hyper-parameters used in the PiNet-dipole model, the comparison of charge distributions in liquid water from different models, and the influence of cutoff radius on the PiNet-dipole model.

A Description of the QM9 and the liquid water datasets A.1 QM9
The QM9 dataset is a subset of the GDB-17 database of 166 billion organic molecules [1]. The subset consists of 133885 molecules containing up to nine heavy atoms. These molecules contain carbon, oxygen, nitrogen, fluoride and hydrogen. Therefore, the dataset presents a general sample of the chemical space for small organic molecules. 3054 molecules were excluded from the dataset used in this work, as these failed the geometry consistency check. More detailed information about this dataset can be found in the original paper by Ramakrishnan et al. [2].

A.2 Liquid Water
The liquid water dataset was created using a trajectory of ab initio MD simulations of liquid water. Simulations were done using the CP2K package [3,4]. A cubic box with a length of 12.432Å was filled with 64 water molecules. Then, an NVT simulation was performed at 330 K for 39000 steps with a timestep 0.5 fs, meaning a trajectory of 19.5 ps was generated. The BLYP functional [5,6] was used with the TZV2P basis set [7], and the GTH pseudopotential [8]. The plane-wave cutoff was chosen as 600 Ry. The Bussi-Donadio-Parrinello thermostat [9] was used with a time constant of 20 fs. For every other frame in the trajectory the REPEAT charges [10], the Wannier centers [11], and the Berry phase polarization [12,13] were computed using the CP2K package [3,4]. Note that the itinerant polarization and molecular dipole moments can be readily obtained from Equation S2 by using the molecular gauge. Equivalently, the Berry phase formula (Equation S1) also gives the itinerant polarization when the branch shift is taken into account.
where Z i and R i are nuclear charge and position respectively. Ψ is the many-body wave function, e i(2π/L) ir i is the multiplicative operator and L is the cell length of a cubic lattice (for the sake of simplicity) with the volume Ω.
where N ions and N occ illustrates summing over the ions and the occupied states. Here,r W C n is the position of the Wannier center [14].

B Description of graph convolution and the PiNet architecture
A graph convolutional neural network (GCNN) combines the structure of a graph neural network and that of a convolutional neural network. A graph neural network ensures that the molecular structure is represented as a graph, where each atom is a node, and the pairwise interactions between the atoms are weighted edges. At each iteration, a message is passed between the nodes, which results in the node and edge feature vectors being iteratively updated. The convolutional part comes into play as a learnable radial filter that learns the features of the atoms S2 within the cutoff radius R c . Because of this, a hierarchy of features is created. The graph convolutional neural network has the same subnet for each element, as the generated node features are element specific [15]. In this work, the PiNN library (https://github.com/Teoroo-CMC/PiNN/) was used which was developed previously [15]. PiNN is a Python library specifically designed to develop interpretable ANN architectures, and to implement existing ANNs in an efficient manner. To do this, PiNN describes neural networks through two different operations: the pairwise interaction (PI) operation, and the interaction pooling (IP) operation. PiNet uses these two operations to create an interpretable GCNN that describes pairwise interactions as a function of the distance between atoms. The form of the function is determined by the atomic properties of the interacting atoms.
In the PI operation, the atomic properties P i can be used to create a pairwise interaction I ij , which is a function of the starting properties of the two atoms as well as the distance r ij between them (see Equation S3).
In Equation S3, t is the iterator. This iterator goes up to the number of graph convolutional blocks in the graph convolutional neural network. The use of the iterator ensures that manybody interactions are taken into account.
The opposite of this is the IP operation. This operation can be used to compute a new atomic property by summing over all the pairwise interactions of a certain atom and passing this to a function, Equation S4. The result of this is an updated atomic property. Since all the interactions are summed over, this also ensures that the result is permutation invariant.
Combining these two operators creates an atomic fingerprint that is capable of capturing the atomic information of the neighbors and using this to update the information of the atom. When multiple graph convolutional blocks are used I ij gets updated as well. In particular, PiNet expresses the PI operation as three parts. First, the interatomic distances are expressed in a radial basis e ij . To calculate this, n basis Gaussian functions multiplied with a cutoff function f c are used. As a cutoff function, the cutoff function by Behler et al. [16] is used.
Then, the activation occurs by applying the PI layer, which is a feed-forward neural network. This neural network generates a weight matrix W ij from the atomic properties P i , and P j .
Finally, activation through the interaction-interaction (II) layer occurs. This II layer is another feed-forward neural network which uses the information from the previous two parts as input to compute the interaction I ij .
In order to further refine the updated atomic property, another feed-forward neural network is used. This is done in the property-to-property (PP) layer. Afterwards, the information is passed to the next GC block [15].

C Description of the hyper-parameters used in the PiNet-dipole model
The PiNet-dipole model used here has the following architecture: The II layer consists of four layers of 64 nodes. The PI layer consists of a single layer of 64 nodes. The PP layer consists of four layers of 64 nodes. The en-layer, which can be considered as an IP layer but with fully connected nodes and coefficients, consists of 64 fully connected nodes before the final result is produced. Five graph convolutional blocks were used in the network. Ten Gaussian basis functions were used which have a width of 3.0Å, and a cutoff radius R c of 4.5Å. The charge neutral model was trained for 1.5 million steps, the model without charge neutrality constraint was trained for 2 million steps, and a batch size of 200 data points was used during the training. The learning rate was set to be 10 −4 at the beginning and was scaled by 0.994 every 10000 steps.
D The comparison of charge distributions in liquid water from different models Figure S1: a) The distribution of charges of liquid water predicted from the PiNet-dipole model, the LR-dipole model, and the variational charge method.

S4
E The influence of the cutoff radius on the PiNet-dipole model Figure S2: a) The parity plot for itinerant polarizations P of liquid water predicted from the PiNet-dipole model using a cutoff range of 3Å, 4.5Å, and 6Å. ; b) The parity plot for molecular dipole moments in liquid water predicted from the PiNet-dipole model using a cutoff range of 3Å, 4.5Å, and 6Å.