Discovery of new microscopic structures in surface water on graphene using data science

We present the first-ever application of a statistical data analysis technique based on a persistent homology method combined with molecular dynamics simulations to determine the microscopic structure of water on graphene. This work assessed the rapid transition in the water structure on going from surface water to free water with increasing distance from the graphene surface, which is one of the most fundamental issues to be resolved in the field of water research. The crossover distance was unexpectedly short at approximately 1 nm, equivalent to three water layers. Within this 1 nm thick layer of surface water, the rotational freedom of tetrahedral water clusters was almost zero due to the presence of the graphene surface. Thus, water on hydrophobic surfaces such as graphene was found to contain a newly-discovered type of water cluster.


Introduction
Water is a very familiar substance and is necessary for life as we know it, and so both bulk water and water on surfaces or confined in nanospaces have been studied in detail. Surface water is especially important because it can greatly affect the physical and chemical properties of material surfaces, such as wettability and friction. 1) Graphite surfaces have been widely used to research such effects because graphite represents a typical hydrophobic surface. 2) Surface water on carbon nanomaterials such as graphene 3) and carbon nanotubes (CNTs) 4) has also attracted much attention. Because these carbon nanomaterials have non-polar, atomically flat surfaces, they are ideal for the study of the microscopic structure of surface water. Recent theoretical and experimental investigations showed that both graphene and CNT surfaces can be covered with several layers of water molecules even though these surfaces are hydrophobi. [5][6][7] This surface water can be directly observed using high-speed frequency modulation atomic force microscopy (FM-AFM). 8,9) In addition, recent theoretical studies based on first-principles calculations have demonstrated the existence of several stable types of polygonal water clusters in bulk water based on hydrogen bonding. [10][11][12] In the case of surface water on graphene, several planar water clusters were also identified in work by Maekawa et al., 13) who established that the probability of the formation of such clusters on surfaces is different from that in bulk water.
However, the differences between surface and bulk water have yet to fully established, because it is difficult to analyze the transition between the two scenarios using conventional simulation methods such as trajectory analysis and snapshot structural observations. Thus, this type of analysis remains one of the simplest but most important issues related to the study of water on material surfaces. In the present work, we employed a data science approach based on a persistent homology (PH) analysis 14,15) combined with molecular dynamics (MD) simulations. In contrast to the previous application of PH to a solid (i.e. silica) by Hiraoka et al., this study represents the first-ever use of PH to assess a liquid (i.e. water). 16,17) 2. Simulation methods

MD simulations
The structure of water on graphene, on which the present statistical analysis was based, was constructed using classical MD simulations. The MS ForcitePlus 18) module in the Materials Studio 8.0 software package was employed for this purpose, with force fields calculated using COMPASS II. 19,20) The cutoff for short-range interactions in these simulations was 1.0 nm while long-range interactions were calculated using the Ewald method. 21) The optimized structure of water molecules generated by the COMPASS II package included an O-H bond length of 0.11 nm and an O-H-O bond angle of 109.4°, and this molecular structure was fixed for use in the present MD simulations in conjunction with a time step of 1.0 fs. The MD temperature was controlled by a Nose thermostat and calculations were carried out at both room temperature (RT) and low temperature (LT). Since it is known that the MD temperature of T MD = 250 K corresponds to the actual temperature of 300 K based on the evaluation of the diffusion coefficient 13) in the COMPASS II force field, this value was used for RT simulation. On the other hand, a value of T MD = 100 K was used for LT simulation. The same initial configuration was used for simulations at both LT and RT. The simulations were performed over a time span of 100 ns at each temperature and the structure obtained from the last 10 000 steps was used for statistical analysis.

PH analysis
During the statistical analysis, the two-dimensional hydrogen bond network of the first layer on graphene was examined using a polygon extraction algorithm, whereas a threedimensional structural analysis was carried out during the PH analysis (see appendix A for details regarding the polygon extraction algorithm). PH is a method for the characterization of data structures including graphics and images in a multi-scale manner, and represents a type of topological data analysis. This method, which was first proposed by Edelsbrunner and Haere, 14) has recently being applied in a wide range of fields, and the details of PH have been described in a number of publications. [22][23][24][25][26] In brief, this technique is based on finding features in data by searching for so-called holes, such as rings and cavities. Together, the times at which a hole is born and dies comprise a persistence pair, and a plot of the birth and death times on the horizontal and vertical axes is termed a persistence diagram (PD). A PD plot in which the line is further from the diagonal suggests longer hole survival times. In the field of materials science, PD has been applied by regarding atom positions (specifically oxygen positions in the present work) as a collection of points in space. 16,17) Herein, we applied this technique to the structural analysis of water molecules for the first time ever. In order to simplify the computational treatment, a periodic boundary was not used in our MD simulations. CGAL, 27) Dionysus 28) and Diode 29) were used in the preparation of the PD data. As in our previous work, 13) we used a cell with x = 2.13 nm, y = 2.214 nm and z = 4.15 nm, in which 60, 100 or 160 water molecules were distributed. Since 60, 100 and 160 molecules formed what were essentially single, double and triple layers of water molecules, these are referred to as the 1L, 2L and 3L models, respectively. Similar calculations and analyses were also performed for a model in which 400, 250 and 150 water molecules were dispersed in a cell having dimensions of (x, y, z) = (3.409, 3.444, 6.134) nm. The results of such calculations were found not to be affected by the MD cell size, and so the analytical results for the 60, 100 and 160 molecular systems are shown in the following sections.

Results and discussion
3.1. Density distribution of water molecules perpendicular to graphene surface The structures of the 1L, 2L and 3L models and the distribution functions for oxygen atoms in the z-axis direction (perpendicular to the graphene surface) obtained from MD calculations are shown in Fig. 1. From these data, it is evident that water layer structures were formed at both RT and LT, particularly for the 1L and 2L models. In contrast, in the case of the 3L model, the formation of the first layer on the graphene surface is clearly visible but the second layer has a lower peak than the first while the third layer is broader and has no distinct peak structure. At RT, The second and third peaks connect smoothly. It should also be noted that subpeaks (at 0.5 and 0.8 nm) can be seen slightly below the main peaks (at 0.6 and 1 nm), indicating the appearance of second and third layers at LT. There has, to date, been little discussion of this complex peak structure, but its origin will be explored later (Sect. 3.5) in this paper.

Inner-layer hydrogen bonding network on graphene surface
Here, we discuss the analysis of the in-plane hydrogen bonding network structure of the first layer on the graphene surface. The network structure in the planes of 1L and 2L models have been discussed in detail by Maekawa et al. 13) and the results of the present work are in good agreement with these prior data. Figure 2 shows the number of polygons (from triangles to octagons) in the hydrogen bonding networks in the first layers on the graphene for a total of 10 000 steps sampled from MD production runs. Using the 1L and 2L models, tetragons appeared most frequently at both RT and LT and there is a characteristic increased number of appearance of tetragons at LT in the 1L model. In the 1L model, the formation of tetragons is the most efficient and stable means of covering the surface because all OH groups can interact via intra-layer hydrogen bonds. Other polygons, such as pentagons and hexagons, are less efficient. In the case that the water molecules in the layer adopt these other shapes, some OH groups must orient themselves vertically relative to the graphene surface, and so cannot connect with other water molecules by hydrogen bonding. Interestingly, the results obtained from the 3L model show the same trends as in the 1L and 2L models at RT, but the number of pentagons is highest at LT, which differs from the other scenarios.

Analyses of PH
The changes in the in-plane network structure of the first water layer on the graphene can be clearly seen from the results of the PD analysis for each model in Fig. 3. In these plots, the units for both axes are angstroms so that the data are easier to compare to the lengths of hydrogen bonds. At a distance of approximately 1.4 Å above the birth axis, the dark region is seen to move upwards (that is, in the direction in which the death time becomes longer) as the model transitions from 1L to 3L. If we consider a regular polygon for which the length of one side equals the distance between hydrogen bonds, the birth value is approximately 1.4 Å while the death values are approximately 2.0, 2.4 and 2.8 Å for a regular square, pentagon and hexagon, respectively, and the polygon moves upward along the same birth line (appendix B). Thus, it is evident that the change in the thick portion of the PD plot at which the birth value is 1.4 Å reflects the change in the in-plane cluster structure of the water in the first layer on the graphene. This result is also qualitatively consistent with the change in the proportion of polygons in the cluster structure formed by the water in the first layer on the graphene surface, as seen in Fig. 2. 3.4. Novel three-dimensional water structure discovered by PH Additional features can be obtained from the PD analysis, including a structure that appears in the vicinity of birth and death values of 2.2 Å and is prominent at LT. In the 1L model, there is no persistence pair near this point. Rather, this feature begins to appear in the 2L model and is not only more prominent in the 3L model but has a definite pinnate structure. When the origin of these persistence pairs was investigated by considering their relationship with the hydrogen bond distance, it was found that they were caused by a distorted triangle with two oxygen atoms at its apex and a water molecular at the center of gravity of a tetrahedron. The data also showed that the pinnate part represents a persistence pair derived from a polygon formed by the oxygen atoms located at the center of gravity of the tetrahedron and at its second nearest neighbors [ Fig. 4(a)]. Considering the efficient formation of the hydrogen bonding network, in the case of the 2L model, energetic stability is increased by closing the hydrogen bonding network through forming polygonal clusters in and between planes. That is, in the 2L model, tetrahedra are formed in a limited manner under the condition   that the direction of water molecules is restricted at the graphene-water and water-vacuum interfaces. Conversely, in the 3L model, since the third layer exists, more tetrahedra can be formed by increasing the degrees of freedom related to the direction of movement of water molecules in the intermediate layer. Therefore, the difference in the total number of tetrahedral structures formed suggests that no pinnate peak should appear in the 2L model at the same LT. At RT, this peak is buried in a broad distribution, likely because of the frequent rearrangement of the hydrogen bonding network due to thermal fluctuations and the presence of water molecules in various situations, including tetrahedral structures. The PD differences between the three models at RT appear small. However, an analysis of the tetrahedral order 30,31) shows an increase in tetrahedral structures in the 3L model compared to the other two models. Therefore, the behavior at LT evidently demonstrates that the formation of tetrahedral structures is more pronounced at RT. See appendix C for details.

Crossover between surface and free water
Considering the direction of formation of the tetrahedra, it is conceivable that any one of the triangular faces, sides and apexes may be situated in the first layer of the graphene surface. However, since the apexes of a tetrahedron are not hydrogen bonded and the distance between the apexes is about twice as long as the hydrogen bond distance, if the base of a tetrahedron is situated in the first layer, the gain in energy associated with hydrogen bonding will be lost. That is, from the viewpoint of energy theory, it is expected that tetrahedra whose apexes are located in the first layer on the graphene surface will be predominant. Thus, the oxygen atom at the center of gravity takes a position closer to the second layer than to the middle of the first layer and the second layer. The oxygen at the center of gravity of this anisotropic tetrahedral structure is the origin of the subpeak just below the second layer in the distribution of oxygen atoms along the z-axis shown in Fig. 1. The anisotropy of the tetrahedra resulting from the effect of the surface sharpens this subpeak. In addition, the subpeak seen under the third layer is also considered to be derived from the oxygen at the center of gravity of the tetrahedron. The broadening of the peak is thought to be due to the weakening of the anisotropy caused by the reduced effect of the surface and the approach to free water. Figure 4(b) shows some tetrahedral structures in the vicinity of graphene in a snapshot structure used in the analysis. In the actual snapshot structure, it can be seen that the oxygen of the tetrahedral center of gravity is located at the height corresponding to each subpeak. Although it has been reported that ice-like tetrahedral structures are formed in free water, 32) the number of water molecules required for the development of such three-dimensional structures is unknown. The present results indicate that the presence of tetrahedral structures is already significant when three layers of water molecules are stacked on the graphene surface, and that a two-dimensional hydrogen bonding network is transformed into a three-dimensional network. The data also suggest that tetrahedral structures between the first and second layers will be highly anisotropic, while tetrahedral structures between the second and third layers show lower anisotropy and approach free water. That is, the crossover between surface water and free water occurs within only three layers of water.

Conclusions
This work applied PH, which has become an important data analysis method in recent years, to the analysis of water structures for the first time ever. This analysis confirmed a crossover between surface water and free water, which has thus far been considered difficult to establish. This is a good example of the application of modern data analysis techniques, which have provided new and important insights. The crossover demonstrated herein is based on the anisotropic formation of tetrahedral structures that could possibly be directly observed using FM-AFM or by spectroscopic means such as sum frequency and infrared absorption spectroscopy. It is expected that these structures will be confirmed experimentally in the future. Graphene has a charge neutral, atomically flat surface and so it is not difficult to imagine that water on a polar surface or a more highly structured surface would have a more complicated structure. Even so, this study provides a starting point for discussions of more realistic surface effects, and we expect it will lead to the control of surface properties through a deeper understanding of wettability and friction.