Molecular Dynamics in a Grand Ensemble: Bergmann-Lebowitz model and Adaptive Resolution Simulation

This article deals with the molecular dynamics simulation of open systems that can exchange energy and matter with a reservoir; the physics of the reservoir and its interactions with the system are described by the model introduced by Bergmann and Lebowitz.Despite its conceptual appeal, the model did not gain popularity in the field of molecular simulation and, as a consequence, did not play a role in the development of open system molecular simulation techniques, even though it can provide the conceptual legitimation of simulation techniques that mimic open systems. We shall demonstrate that the model can serve as a tool to devise both numerical procedures and conceptual definitions of physical quantities that cannot be defined in a straightforward way by systems with a fixed number of molecules. In particular, we discuss the utility of the Bergmann-Lebowitz (BL) model for the calculation of equilibrium time correlation functions within the Grand Canonical Adaptive Resolution method (GC-AdResS) and report numerical results for the case of liquid water.


Introduction
The physics of open systems is considered to be of primary importance in the understanding of natural phenomena and in the development of modern technology [1]. Systems in real life, as well as in experimental setups, are open systems, that is, systems which exchange energy and particles with their environment; the process of the exchange of particles is the basis of their interesting properties (see e.g. [2]). From a theoretical point of view the conceptual development of the classical and quantum statistical mechanics of open systems is challenging; in fact, theorems of statistical mechanics and dynamics derived for systems with a fixed number of particles are no longer valid in their standard formulation and must be revised accordingly, e.g., if the deterministic evolution is substituted with the stochastic evolution which controls the process of exchange of particles [3][4][5][6][7]. We will argue that extensive theoretical work with effective, elegant and (from a practical point of view) useful concepts were developed a long time ago (much in advance of the advent of computer simulations) but have remained unnoticed by the majority of the molecular simulation community. As a matter of fact, until recently, open systems with a varying number of particles have been simulated using algorithms which did not succeed as expected. The lack of success was most probably due to reduced efficiency compared to techniques based on fixed particle numbers (see, e.g. [8]). However, recently algorithms of multiscale character, which aim at bridging different scales within one unified framework, have gained great popularity, which in turn has led to the construction of efficient techniques where systems exchange energy or particles with an external environment; for example techniques using molecular resolution that can adaptively change in space (adaptive resolution simulation), see e.g. [9] and [10] and references therein.
Adaptive resolution simulation techniques allow us to focus on a specific region in space, treated at a desired (high) resolution, while the rest of the system is treated at a lower resolution. In the resolved region, some interesting process takes place while the rest of the system stays in thermodynamic equilibrium with the subsystem of interest (or, beyond equilibrium, exchanges energy and particles according to well defined statistical physical laws). In contrast to the first generation of algorithms with a varying number of particles, such algorithms are technically highly efficient and flexible. This flexibility makes them feasible for use in the calculation of various statistical properties, such as time correlation functions, some of which require a theoretical redefinition (compared to the fixed particle number simulations). The necessity of a formal redefinition of equilibrium time correlation functions in modern open systems MD simulations calls for revisiting the theoretical concepts developed about five to six decades ago for the statistical mechanics of open systems in the context of state-of-the-art computer algorithms.
In this paper, following the terminology developed in [3][4][5], we will refer to open systems which exchange energy and matter with the environment as grand ensemble systems; the grand canonical ensemble is one particular realization of a grand ensemble, as discussed in [3][4][5]. The aim of this paper is: ( We will treat the specific case of the grand canonical-like adaptive resolution simulation method (GC-AdResS) and discuss its conceptual consistency with the theory present in the literature, together with its technical advantages/limitations. The hope is that this may stimulate further research along this direction and add to the theoretical foundation of MD simulations in a grand ensemble; the need to approach more complex systems characterized by the realistic process of exchange of energy and matter with the environment, prohibitive in the past, is becoming a guiding principle in the development and the application of molecular simulation techniques [11].
The paper is organized as follows: in the first section we will give a general overview of the theoretical concepts developed about the statistical mechanics of open systems. Next we will focus on what we will call the Bergmann-Lebowitz approach, a flexible and conceptually robust model that is of utmost relevance for many state-of-the-art MD algorithms. In the second section we will briefly discuss the general features of techniques of MD with a varying number of particles and introduce the idea of MD with molecular adaptive resolution simulation. In the third section we will introduce one of the techniques of adaptive resolution simulation (GC-AdResS) and report results where the BL theory of the first section is employed to give conceptual justification for the simulations and to the corresponding calculation technique. Finally, conclusions and future perspectives will be given. It must be noticed that the technical setup and the numerical results reported in this work are original in the development of the GC-AdResS methodology. In fact the results show that, with the technical setup developed in this work, the method is reliable not only for the calculation of static properties, on which past research focused, but also for the calculation of dynamical properties, thus allowing the study of a much larger class of phenomena.

Basic concepts of a grand ensemble and extended Liouville equation
When one uses the keywords 'statistical mechanics of open systems' in an automatic literature search, one finds a considerable amount of rich material (see, e.g., [3][4][5][6][7]12]). However, the vast majority of this material focuses mostly on the idea of coupling a system to a reservoir of energy, or to nonequilibrium scenarios, such as the transport of matter from an external source. Exchange of matter techniques are usually limited to a simple extensions of the concept of heat exchange and heat flow [4][5][6]. As a matter of fact, the exchange of heat has been historically most relevant in MD simulations, as the coupling of a system to an external reservoir of energy makes simulations numerically stable and physically targeted to the desired thermodynamic state, without requiring large systems as those necessary to NVE simulations [13]. The circumstances outlined above, together with the lack of success of grand canonical-like MD methods (see the later discussion), were the reasons why the theoretical concepts of grand ensemble, developed, e.g., in [4][5][6], did not become popular in MD simulations and thus were not implemented in practical tools of calculations. However, as underlined in the introduction, the rediscovery and further development of such work became a timely necessity. In this section we will trace the idea behind the theoretical treatment of open systems in equilibrium and we will restrict the discussion to those approaches where the coupling between system and reservoir is not required in an explicit form; such approaches represent the most general model open system. Moreover, we will restrict the treatment to classical systems because our main interest lies in the field of classical MD. In particular we will define a generalized Liouville equation and associated operator (the Bergmann-Lebowitz Liouville equation/operator).
Instead, the class of approaches which explicitly require a coupling term in the Hamiltonian is usually limited to transport processes (out of equilibrium), whose external source can be formalized in specific cases only (e.g. [14]) and which in general do not admit a grand ensemble. The essential idea behind approaches which do not require an explicit coupling term, is that a small system is in contact with a large reservoir (or more than one, but for simplicity let us consider only one). The aim is to extract (thermo)dynamical laws governing the small system, from the microscopic equation of the global system (comprising the reservoir). The Liouville equation of the global system is the ideal starting point, however, the variables considered in the reservoir are macroscopic variables that can be considered to be averages over microscopic states; in the optimal case these variables do not explicitly enter into the description of the evolution of the small system. The general hypothesis at the basis of such models is that the reservoir exerts its influence on the small system only via intensive properties (see e.g. [4,6]). The key idea is that, even if the extensive variables of the reservoir change, its intensive variables are constants of motion. As a consequence, the dynamical evolution of the small system does not contain any time-dependent function of the reservoir and the small system is then governed by a self-consistent dynamical evolution. In a pioneering work, Emch and Sewell [6] proposed a method based on the basic principles reported before. They treat quantum systems, and the generalized Liouville equation is a master equation governing the evolution of the statistical operator. However, they need an abstract projector operator which coarse-grains the microscopic variables of the reservoir into macroscopic variables, that, in turn, influence the small subsystem. For MD simulations, although the premises of the method and its formalism are certainly appealing, this idea is not practical; in fact, the explicit specification and formalization of a general coarse-graining operator is not straightforward. However, a similar but more appealing idea for its formal simplicity and from the viewpoint of practical implementation, has been put forward by Bergmann and Lebowitz [4], as will be outlined below; see also [5].

Bergmann-Lebowitz Liouville equation
In the seminal paper of Bergmann and Lebowitz [4] (and subsequently in the paper of Lebowitz and Shimony [5]), the authors derive a general model of a many-particle system that is interacting with different reservoirs. Here, for simplicity and for closer analogy to a standard grand canonical MD simulation, we will treat only the case of a single reservoir. The key ingredient of the model is an impulsive, Markovian interaction between the reservoir and the system. The effect of the reservoir on the system can be completely described if one specifies the stationary distribution of the reservoir before the reservoir-system interaction (thus the knowledge of the reservoir state as a function of time is not required). In their model, each interaction between the system and the reservoir produces a discontinuous transition of a system from a state with N particles (X N ′ ) to one with M particles (X M ). Such transitions are determined not only by the configuration of the system, X N ′ , but depends also on the configuration of the reservoir in phase space. Ignoring the reservoir state upon collision, the change in the system state can be described in terms of a Markovian transition kernel, K X X ( , ) is the probability that, in an infinitesimally small time interval, the system at X M makes a transition to X N ′ as a result of the interaction with the reservoir. The probability density function, X M t ( , , ) M ρ , at some point X M of the phase space is governed by the extended Liouville equation which we will name the Bergmann-Lebowitz Liouville equation: The generalized Liouville theorem expresses the fact that there is a probability flux in and out of the system as a result of the interaction with the reservoir which induces the change from N to M particles. The determinism of the Liouville equation, which characterizes a closed system, is now replaced by a stochastic evolution in time.
It is convenient to retain the original formulation of the Liouville theorem and define an extended Liouville operator (Bergmann-Lebowitz Liouville operator): This allows us to formally write the Liouville theorem as in the standard case, namely, If the kernel satisfies the following integral condition (flux balance) then the stationary grand ensemble is the grand canonical ensemble with density The flux balance (6) is both a necessary and sufficient condition for stationarity with respect to the grand canonical distribution. In such a case, due to the fact that the BL Liouville operator is formally reduced to the standard Liouvillian M M = * that is a Liouvillian corresponding to a Hamiltonian which propagates the system in time with a variable number of particles (time dependent, stochastically regulated). As a consequence the BL Liouville equation is formally reduced to the standard Liouville equation, with the number of particles being a stochastic process.

Molecular dynamics of subsystems with a varying number of molecules
MD with a varying number of particles have been developed mostly for the calculation of the excess chemical potential following the Widom insertion or the thermodynamic integration techniques [15,16]. Such methods describe the effect of inserting or deleting a molecule in a system of N molecules; they are computationally rather demanding and the calculation of the excess chemical potential is the only aim of such studies. An extension of such a technique is that of hybrid MD/MC methods, in which the dynamical evolution of the MD system is interfaced with MC moves which insert or remove particles and then equilibrate the system locally before the next MD step is actuated (see discussion in [16] and references therein). Such an approach is not optimal and is computationally expensive, in fact each insertion would have costs of the order of those of Widom-like techniques for the calculation of the chemical potential.
Fully MD grand canonical schemes that have been developed in the past did not gain popularity due to their computational costs and a certain conceptual and theoretical artificiality. A pioneering attempt was made by Pettitt and collaborators [8,17]; see also the work of Lo and Palmer [18]. The method is based on the introduction of an additional dynamic variable s that represents the number of additional particles. At any instant the total number of molecules of the system can be written as N s + and s, the new variable, corresponds to a fractional number depending on the degree of presence of an additional molecule. An extended Hamiltonian is then derived and equations of motion for N s + variables are derived, moreover the knowledge a priori of excess chemical potentials is required at least when the molecular species are more than one (e.g. mixtures). It has been shown that such an approach was not optimal when applied to liquid water [19] and further improvements were implemented in extended versions such as that of Eslami and Müller-Plathe [16]. In our assessment, the method of [16] represents a substantial improvement over previous methods with regard to numerical robustness, nonetheless, it did not meet expectations and the number of applications presented in the literature is rather limited. In our view the idea of fractional particles is conceptually very appealing, but it introduces extra computational costs together with a more complex situation regarding the numerical stability of the algorithm and its implementation into pre-existing computational architectures of flexible (popular) MD codes. Later on, with the increasing success of multiscale MD techniques and the development of concurrent coupling techniques, a new generation of algorithms entered into the game [10]. Such a category is that of adaptive molecular resolution techniques. The common idea to all methods in such a category is the definition of two main open boundary regions, one at high resolution (e.g. atomistic) and one at coarse-grained level (spherical liquid); they are interfaced by a smaller region where molecules crossing the border acquire or loose their high resolution degrees of freedom. Molecules in the different regions are coupled via space-dependent intermolecular forces [20,21,24], Hamiltonians [25,26] or Lagrangians [27]. Each of these algorithms, in principle, can be easily converted to a grand canonical MD scheme if (1) the coarse-grained region is large enough to assure physically realistic particle number density fluctuations and (2) the high resolution region is large enough to be of statistical relevance 4 . The computational efficiency of these kinds of techniques is provably superior to methods with a varying number of particles of the previous generation (see e.g. [24,28]). From this perspective they represent a realistic pathway to future MD simulations in general, and in particular for those cases in which the variation in time of the number of particles or the physics of a subsystem is of high relevance.
In the next section we will focus on one of these techniques, developed by some of the authors within the last five to six years, with the specific aim of designing a general grand ensemble algorithm via adaptive resolution simulation. We will present the grand canonical adaptive resolution simulation (GC-AdResS) method and connect its principles to the model of Bergmann and Lebowitz. In the following, the importance of such a connection for the definition and calculation of equilibrium time correlation functions will be discussed and illustrated with numerical results.

Grand canonical-like adaptive resolution simulation (GC-AdResS): basic principles
The basic structure of the original AdResS [20] is based on an intuitive technical requirement, namely, the construction of a numerical scheme which allows the system to pass smoothly from an atomistic to a coarsegrained dynamic evolution in space in such a way that the dynamics of the atomistic part is not perturbed significantly by the dynamics of the coarse-grained part and vice versa. The flow of molecules between the two regions must be constructed in such a way that the exchange happens under conditions of thermodynamic equilibrium; it is expected that static and dynamical properties of the atomistic region must be the same as in an 4 In order to convert specific adaptive methods to a grand canonical-like setup (in most such techniques) one must go beyond the request of having a global Hamiltonian with a physical meaning. In fact in the Hamiltonian and Lagrangian based techniques, one can find statements justifying the artificial/technical process of changing the resolution of a molecule as a 'realistic physical process'. This in turn led the adaptive idea to be conceptually forced in a canonical or microcanonical ensemble only for the sake of familiarity with standard MD approaches. A global Hamiltonian does not make the conceptual derivation more rigorous than a forced-based approach, in fact it leads to implicit violations of basic principles of statistical mechanics and/or an increase in computational costs (i.e., lack of energy conservation [25,34], thermodynamic state-dependent Hamiltonians employed for first principles statistical mechanics analysis [35], increase in number of interactions as a function of the size of the system which makes the number of calculations impossible even for futuristic computers [27]). It is our opinion that the essential point is that the change of resolution is not a realistic physical process thus there is not a physics of reference against which to compare the correctness of properties of molecules with hybrid resolution. The only known first principle Hamiltonians are those of the high resolution and of the coarse grained regions; they have a particle-dependent form and are Hamiltonians typical of a grand canonical formulation. The hybrid Hamiltonian cannot be first principle because nature does not display changing resolution as a physical realistic process, thus a global Hamiltonian, while it may be technically useful in some cases [36], is conceptually artificial by definition and may be interpreted as a regression of the conceptual validity of the adaptive resolution method (if the global Hamiltonian is used for conceptual validation of the method). Instead, the perspective we propose in this paper is at the same time simple and unambiguous, that is, to treat the adaptive technique as a grand canonical-like setup and consider the interface region as a nonphysical filter with negligible (but numerically quantified) surface-like effects over the rest of the system. equivalent subsystem of a fully atomistic reference simulation. The construction of such a numerical machine is reported step by step below: • The space is partitioned in three regions, one characterized by atomistic resolution (AT) and one characterized by coarse-grained (usually spherical) resolution (CG) and a relatively small interface region with hybrid resolution (transition region or hybrid region) (Δ or HY).
• Molecules in the different regions are smoothly coupled through a spatial interpolation formula for the forces: where i and j indicates two molecules, F AT is the force corresponding the atomistic interactions (U AT ) (e.g. standard Lennard-Jones or Coulomb atomistic potential) and F CG is the force corresponding to the coarsegrained interaction potential U CG (e.g. standard COM-COM potential, where COM stays for 'the center of mass'), r is the COM position of the molecule and w x ( ) is a smooth function, defined over the transition region (Δ), which goes from 0 to 1 (or vice versa). It acts in such a way that the lower resolution is slowly transformed in the high resolution (or vice versa), as illustrated in figure 1.
• A thermodynamic force, defined via first principles of thermodynamics, acts on the COM of each molecule and a thermostat is added to assure the overall thermodynamic equilibrium at the chosen temperature. The thermodynamic force is derived in such a way that: where p AT is the chosen pressure of the atomistic system (region), p CG is the pressure of the coarse-grained model, 0 ρ is the chosen molecular density of the atomistic system (region) [29] (the explicit expression of F r ( ) th will be specified later on). A thermostat is added to take care of the loss/gain of energy in the transition region. This is the first step to pass from the original intuitive idea of AdResS to a well founded grand canonical framework of the method. In the original AdResS setup, the thermostat acts over the whole system (see top panel of figure 1), in this work the idea has been developed further and in order to match the requirements of the reservoir of the BL model for the calculation of equilibrium time correlation functions, we have constructed a setup in which the thermostat is applied to the reservoir only (i.e. hybrid and coarse-grained region); see bottom panel of figure 1. In [24] and [21] necessary conditions in Δ were derived so that the spatial probability distribution in the atomistic region was close to that of a fully atomistic reference system up to a certain chosen order. The probability distribution is that of a grand canonical ensemble, hence the name grand-canonical-AdResS (GC-AdResS). We define the mth order statistics of a joint probability distribution of M molecules, p r r ( , , ) The molecular number density r ( ) ρ corresponds to the first order, the radial distribution function to the second, three-body distributions to the third order statistics and so on; examples of how the statistics in the atomistic region is reproduced will be shown later on. We emphasize that, by construction of the method, the accuracy in the atomistic region is independent of the accuracy of the coarse-grained model, thus, in the coarse-grained region, one can use a generic liquid of spheres whose only requirement is that it has the same molecular density as the reference system (i.e. we need only to know the distribution of the reservoir and not its microscopic details, which is in accordance with the basic principle of construction of the BL reservoir). It was numerically demonstrated for the case of liquid water that the target grand canonical distribution, numerically defined as the probability distribution of a subsystem (of the size of the atomistic region in GC-AdResS) in a large fully atomistic simulation, is accurately reproduced to (at least) third order. To complete the idea of grand canonicallike setup, it was shown that the sum of the work of F r ( ) th and of the thermostat in the transition region is equivalent to the difference of the chemical potentials between the atomistic and coarse-grained resolution (at the given thermodynamic conditions). Details will be given later on.
The construction of a thermostat that acts only in the hybrid and CG regions makes the reservoir of GC-AdResS the effective technical translation of the reservoir hypothesized by Bergmann and Lebowitz in their model. A detailed discussion of the validity of the approximations of the method in the light of the theoretical hypothesis of the BL model is outlined in the next section.

Bergmann-Lebowitz model and GC-AdResS
In this section we analyze the correspondence between the BL model and GC-AdResS, more specifically we will discuss the possible mathematical mapping between the formulas of the two models and analyze the corresponding algorithmic meaning.

Mapping the Hamiltonian of the AT region
For the ith molecule, at position r i in the AT region of AdResS (hereafter named 'system'), we have w r ( ) 1 i = , thus the corresponding force can be divided into two contributions; one is the force generated by the interaction of molecule i with molecules of the AT region: i j i j , , AT = ∀ ∈ and one is the force generated by the interaction with molecules of the reservoir (11) implies the possibility of expressing the force acting on molecule i in terms of the gradient of the atomistic potential: where i  is the gradient w.r.t. molecule i. Equation (12) expresses instead the action of molecules of the reservoir on molecule i, that is an external force. The system-reservoir coupling term of equation (12) rules out the existence of a microscopic Hamiltonian for the system (embedded in the reservoir) and thus impedes a straightforward correspondence between the BL Hamiltonian, H M , of equation (7) (or H X ( ) M of equation (8)) and the Hamiltonian of the AT region, H AT , of the AdResS model. However, here we want to advocate the view that the AdResS model can be mapped to the BL framework, even though a rigorous derivation of the BL kernel from a microscopic model is beyond the scope of this paper. We will provide numerical evidence for this point of view later on in the text. Roughly speaking, one may argue that the nonintegrable part of the dynamics in the HY region represents a boundary effect that can be absorbed in the definition of the transition kernel. To elaborate on this point, we first notice that equation (12) can be recast as: i j Hence the net force on the ith particle can be considered as a (nonlocal) gradient field that is instantaneously produced by the external field generated by the other molecules. As a consequence, the energy of the ith molecule at time t 0 > associated with the coupling force of equation (14) can be defined as where the U ij · represent the interaction energies between molecule i at position r i and the other molecules sitting at r j . The total energy in the system at time t is then defined as   (17) holds true when the HY region can be considered thin compared to the AT region and when the AT region is large. In this case, given the typical cutoff radius of interactions across the HY region, there is no direct interaction between the AT region and the CG region. However, equation (17) may not hold under more realistic conditions as they are routinely used in AdResS simulation, with a not too large AT region and an HY region that is not too thin so as to avoid numerically stiff systems. Figure 2  fully specifies the microscopic characteristics of the AT system.

The action of the reservoir and the interpretation of the transition kernel
We shall proceed with discussing the correspondence between the BL and GC-AdResS reservoirs and the role of the kernel. To this end we recall that, in the BL framework, K X X ( , )

NM N M
′ is the transition rate for the system in state X M to make a transition to X N ′ as a result of the interaction with the reservoir. Further, recall that (6) is both necessary and sufficient for the system to admit a unique stationary grand canonical distribution. This implies that (6) holds by construction of GC-AdResS that is ergodic with respect to the grand canonical distribution. ; the former is at least one order of magnitude larger than the latter. Inset: the relative effect of the interaction between the AT region and the reservoir as a function of time: can be clearly seen that the contribution is, at most, 10%. It must be underlined that in a test done with a much larger system, the effect goes below 1.0%.
This clearly does not uniquely determine the transition kernel, nor does it guarantee its existence, but we will discuss how the transition kernel can be interpreted within the GC-AdResS framework. The influence of the GC-AdResS reservoir on the dynamics in the AT region comprises three contributions: (a) the thermostat, (b) the thermodynamic force, and (c) the coupling force (14). Firstly, the function of the thermostat is that of assuring thermal stability of the reservoir and, as a consequence, of the system. Thermal stability is guaranteed by irreducibility of the kernel, so that it is possible to go from any region of the AT phase space to any other region with a positive probability [22]. A slightly stronger condition is that the dynamics are ergodic which is guaranteed by the recurrence of the dynamics, i.e., every phase space region is visited infinitely often with a positive probability. We should emphasize that this condition is known to be false for almost all deterministic Hamiltonian systems expect for certain billiards and geodesic flows on surfaces of constant negative mean curvature, therefore we use a gentle stochastic thermostat in AdResS. We refrain from going into detail here and instead refer to [23] for a discussion of this issue.
Secondly, the thermodynamic force is computed via the following iterative procedure: The fixed point iteration converges locally as the density profile across the HY region becomes flat. This requires an exchange of particles between the AT and the CG regions, hence the thermodynamic force has the effect that the number of particles in the AT region vary in such a way that the average number density is constant (equal to the fixed target density). This also means that, by transporting the action of the thermostat, the effect of F x ( ) th is to impose the stationary distribution of the reservoir at the first order ( x ( ) ρ ), independently of the interaction between the reservoir and the system; this condition is equivalent to the main condition requested/satisfied by the reservoir in the BL model. The computation of the thermodynamic force corresponds to the equilibration procedure of GC-AdResS; once the fixed-point iteration has converged (which it does at least locally), the obtained force is used for the simulations of production runs. The chemical potential, (6) is then automatically determined according to the equation (see [24,28] function of equation (9) and gas ω is the chemical potential in the absence of intermolecular interactions and · r 〈 〉 indicates the conditional equilibrium average for fixed AT configurations.
Equation (20) is the minimal necessary condition that the GC-AdResS system should satisfy in order to have a grand canonical-like molecular dynamics, i.e. to satisfy the condition equation (6), and, as stated above, it is imposed by the thermodynamic force. The numerical verification that indeed the AT region of GC-AdResS behaves as a grand canonical ensemble is then made by comparing quantities calculated in the GC-AdResS AT system with those calculated in an equivalent subsystem of a fully atomistic reference system (see results in section 6.1). A subsystem in a fully atomistic simulation, if the subsystem and the total system are large enough, is a natural grand canonical system. It follows that if the reservoir in the fully atomistic reference system and the GC-AdResS reservoir have the identical insertion/deletion behaviour (equation (6)), they must spend the same amount of energy in insertion/deletion, i.e. have the same chemical potential difference between the AT region and the rest of the system. This implies that the condition of equation (6) in the BL model corresponds to equation (20) of the GC-AdResS model.
Thirdly, in accordance with the above reasoning, the coupling force in (14) does not give a major energetic contribution to the AT interactions. Nevertheless it involves strong repulsive forces that prevent the molecules entering in the AT region from overlapping with molecules that are already in the AT region, which would produce (numerical) singularities that would automatically stop the simulation. This soft collision-avoidance has the effect that the smooth density of the transition kernel is exponentially decaying outside the admissible (non-overlapping) particle configurations. Hence, even though the coupling force can be conceptually neglected as far as the construction of the transition kernel is concerned, it plays a key role in the numerical simulation as it imposes collision-avoidance between AT and HY/CG particles in a robust and numerically efficient way.
Altogether, even though we cannot give a rigorous derivation of the BL kernel within the GC-AdResS framework, we have described how some of the properties of the kernel that guarantee well-posedness of the dynamics can be inferred from the properties of the various force contributions. It is unclear whether it is possible to write the kernel explicitly in terms of the forces. We shall argue that, even though such a direct link may not exist, it is still possible to realize the BL model numerically, and GC-AdResS does exactly this. For example, stochastic insertion/removal of molecules in the system (see [8,16]) can be used to realize K X X ( , ) in a Monte Carlo fashion. The basic idea is that a molecule is inserted in the system by searching a location that is close to a minimum free energy configuration followed by a local equilibration where the rate of insertion is defined by the chemical potential of the system in accordance with (6); equivalently, within the where Q GC is the grand canonical partition function, μ the chemical potential and N the number of particles (now varying in time) of the system. The difficulty lies in how to interpret the quantity b p p q q p q ( ( , ), ( , )) t N N t N N . In fact, at a given time t the system evolved from its initial condition and it is likely to have a number of particles/molecules N′ different from the initial state. The correspondence of GC-AdResS with the model of Bergmann and Lebowitz plays a key role for making sense of b p p q q p q ( ( , ), ( , )) t N N t N N in the numerical simulation as equation (7) states that there exists a Liouvillian iL M , the action of which is to evolve the system from p q ( , ) N N to p q ( , ) t t with N′ molecules. As we have argued, the operator iL M is well defined within the GC-AdResS framework. Thus the correspondence between the BL model and GC-AdResS leads to the following ready-to-use definition of the equilibrium time correlation functions for numerical simulations with GC-AdResS: 'if a molecule leaves the AT region in the observation time window, its contribution to the correlation function is neglected'. This principle is in agreement with the philosophy of the BL model, which asserts that a molecule entering into the reservoir loses its microscopic identity.

Numerical results
Here we report numerical results for liquid water (SPC/E model) at room conditions. The section is divided in two parts: the first is dedicated to the calculation of static properties with the intention of demonstratingnumerically-that GC-AdResS produces results typical of a natural grand canonical system (as defined before). The second part is dedicated to the calculation of the equilibrium time correlation functions. In such a case the exchange of particles with the reservoir poses, on the one hand, the conceptual question of how to define the Liuoville operator of the atomistic region and, on the other hand, the practical question of how to count correlations when a molecules leaves the atomistic region or enters it. The theoretical concepts of section 2 actually give the guidelines to solve both problems. We will first prove that, with the definitions taken from section 2, GC-AdResS gives the same results as those of an open subsystem of a fully atomistic NVE simulation. Next, since in the thermodynamic limit all ensembles are equivalent, we expect, for physical consistency, that by increasing the size of the atomistic region, results systematically converge to those of a full NVE simulation where the calculations are performed over the whole system; the numerical results reported below confirm our expectations.  [24] is that the transition region is considerably smaller and that the thermostat acts only in the reservoir. A few remarks in this regard are in order: in figure 3 the number particle density of GC-AdResS agrees in a satisfactory way with that of the NVE calculation, the largest deviation (below 5%) is at the border of the atomistic region with the hybrid region. This is due to the abrupt absence of the thermostat. The effect is negligible anyway, however, there are three technical options which allow us to make the effects of such a difference even smaller: (a) apply the rigorous GC-AdResS protocol and consider an additional (but, differently from [24], negligible) atomistic buffer as part of the transition region, (b) require that the convergence of the thermodynamic force is stricter, (c) slowly switch off the thermostat in the transition region near the atomistic region. Here we have opted for the simpler option (a), because in any case the effects of this discrepancy on the calculation of physical quantities produce no more than 10% of deviation compared to the reference data (see discussion below). Figure 5 reports the particle number probability distribution of the subsystem compared with an equivalent NVE subsystem, the shape of both curves is a Gaussian and the curve of GC-AdResS is indeed shifted compared to the NVE of reference, but only for two particles. If we apply the rigorous GC-AdResS protocol and consider an additional (negligible) atomistic buffer, then the two curves essentially overlap, see figure 5 (bottom). Table 1 shows the robustness of the method as a grand canonical setup for the calculation of a thermodynamic property, that is, energy fluctuation and the covariance (see appendix for definitions and technical details). Regarding the accuracy, in the worst case the deviation is no more than 10%, which would already be numerically satisfactory. However, if we apply the rigorous GC-AdResS protocol (as in figure 5 (bottom)) the maximum deviation falls down to 3% only, see table 2.

Static properties
An additional test was done in order to prove that GC-AdResS satisfies a thermodynamic condition of a grand canonical ensemble in the thermodynamic limit. In fact in the thermodynamic limit the isothermal compressibility, T κ , in a grand canonical ensemble, can be related to the fluctuations encoded in the particle number distributions [31]: where ρ is the density of particles, k B the Boltzmann constant and, T 298 K = , the temperature. The test was done for a system of 20000 molecules with a reservoir of 800000 (total number of molecules 100000) at a pressure of atm 1 ; in this case we obtained 44.6 1.6 10 (bar ) T 6 1 κ = ± − which should be compared with the value of 45.9 1.2 10 (bar ) 6 1 ± − of the corresponding fully atomistic system and with the value of about − from NPT simulations of SPC/E water [33]; the overall accuracy is within 5% (in the worst case), which can be considered a satisfactory result. It must also be underlined that an effective compressibility, equation (25), was found to be the same in GC-AdResS and in the fully atomistic simulation (see also [29]). Given the satisfactory tests for static properties, which prove that indeed the reservoir based on the local thermostat of GC-AdResS produces grand canonical statistics, we can   Table 1. Thermodynamic fluctuations calculated in atomistic subregion (EX = 1.2) in GC-AdResS and full-atom simulations. There is a discrepancy of around 5-10% between the results of GC-AdResS and those of the reference full-atom simulation.

Quantity
Full-atomistic GC-AdResS 4.4 ± 0.2 3.9 ± 0.2 Table 2. Same quantities as above calculated in the region excluding the (negligible) part where the density is 5% off compared to the reference density, as discussed in figure 3. The numerical results in GC-AdResS and the full-atom simulation agree now within 3%, which is highly satisfactory. now proceed with the calculations of equilibrium time correlation functions where the notion of the BL Liouville operator in the limit of a grand canonical ensemble comes into play in order to provide theoretical solidity to the numerical calculations.

Dynamic properties
Here we report the numerical results of the application of GC-AdResS to the calculation of three relevant equilibrium time correlation functions for SPC/E water at room conditions. The GC-AdResS results are compared with the results obtained for an equivalent subsystem in a fully atomistic NVE simulation. Figure 6 shows the velocity-velocity autocorrelation function, C t ( ) VV (top), (molecular) dipole-dipole autocorrelation function, C t ( ) μμ (middle), reactive flux correlation function, k t ( ) (bottom); the agreement between GC-AdResS and the fully atomistic NVE simulation is remarkable. This implies that the 'ideal' reservoir of the GC-AdResS method is very close to the thermodynamic limit of a microscopic system. A necessary condition of general validity of the concepts and calculations shown here is that, as the AT region of GC-AdResS increases, results must systematically converge to those obtained for the whole system of the fully atomistic NVE simulation. This principle corresponds to the fact that in the thermodynamic limit all the ensembles are equivalent. Figure 7 shows the systematic convergence of the curves to the fully atomistic reference as a function of the size of the AT subsystem of AdResS. A general remark valid when adaptive resolution is used as a multiscale technique rather than as grand canonical setup must be made: it must be noticed that the procedure defined above to calculate time correlation functions introduces a connection between the decay of a correlation function in time and the spatial locality of the process associated with such a decay. For example, in dense gases decay times are relatively large, thus if the size of the atomistic region is too small, many molecules are likely to leave such a region with the effect that the decay time would be shorter than the real one. In practical terms, a way to probe whether or not our method captures a certain decay process is to perform a study where the size of the atomistic region is systematically varied and observe the convergence of the correlation function of interest. At the same time it must also be noticed that the connection between decay times and spatial locality is not necessarily a limitation of the procedure, but actually represents one of its main conceptual advantages; in fact it allows us to identify the essential (atomistic) degrees of freedom (in space and time) required for a certain process.

Conclusions
We have discussed the BL model as a prototypical theoretical construction for describing the statistical mechanics of open systems. Despite its conceptual solidity, the model has not been employed or discussed in connection with the development of MD techniques with a varying number of molecules. As we have argued, however, the model turns out to be very useful as far as the conceptual validation of MD techniques is concerned. We have discussed its connection to the GC-AdResS MD technique and used its principles to define equilibrium time correlation functions for a system with a varying number of molecules. Numerical results for a relevant system, liquid water at room conditions, are highly promising. We have then discussed the computational efficiency and convenience of GC-AdResS. Given the technical robustness of GC-AdResS and its conceptual validation within the BL model, one can think from this perspective about moving forward and also approaching systems out of equilibrium, e.g. subject to an external perturbation. For example, biomolecules in solution whose conformational dynamics is driven by an external (electric) field as in [30]. The response of the system to an external perturbation requires a numerical technique similar to that employed in the calculation of equilibrium time correlation functions, moreover the region of microscopic interest is limited to the first two to three solvation shells of the molecule, which is an ideal test case for a AdResS-like technique. The study of open systems is gaining popularity and the development of techniques which are both computationally efficient and theoretically well founded is a necessity of modern research in the field of molecular simulation; GC-AdResS is such an example.  The parameters σ and ϵ in the current simulations are 0.30 nm and 0.65 kJ mol 1 − , respectively. The time step used in the simulations is 0.002 ps, and the coordinates and velocities are recorded after every 10 time steps, i.e. 0.02 ps. All simulations are performed at room temperature, 298 K. The coarse-grained and the hybrid region in the AdResS system are coupled to a Langevin thermostat, whose time scale is 0.1 ps. The reaction field method [40,41] is used for calculating the electrostatic interactions in the system, with dielectric constant RF ϵ = ∞, as this tends to give good energy conservation. The 'switch' cutoff method is used to treat the van der Waals interactions. The cutoff radius for interactions is 1.2 nm. For a 1 ns full atomistic simulation (without any thermostat), the total energy obtained is 195846 − kJ mol 1 − and the drift is just 11.4 kJ mol 1 − , which is less 0.01%. The dynamical results from this micro-canonical ensemble are compared with results from AdResS simulations. All the dynamical properties are computed from equilibrated trajectories of 1 ns in fully atomistic and AdResS simulations. The velocity autocorrelation function is defined as: where · 〈 〉denotes the equilibrium average and v t v ( ) · (0) i i 〈 〉 computes the correlation between the velocities of the ith molecule at times 0 and t. In this work, the velocity autocorrelation function is calculated only for the oxygen atoms. In the same way, the dipole autocorrelation function is defined as: 〉 computes the correlation between the dipole moment of the ith molecule at times 0 and t. In the current implementation of AdResS, the electrostatic interactions are calculated by a short ranged reaction field method. The dipole autocorrelation function results are consistent with the fully atomistic simulation, also using reaction-field. We also tested the particle mesh Ewald (PME) method [42] as an alternative approach to compute coulomb interactions and calculated the dipole autocorrelation function in a fully atomistic simulation, and found that the results were identical. The reactive flux hydrogen bond correlation [43][44][45][46] function is defined as: Here h is the hydrogen bond population operator for a particular pair of molecules. It is assigned a value of '1' if there is a hydrogen bond between this pair, otherwise a value '0'. The criteria for considering a hydrogen bond between two water molecules is (1) inter-oxygen distance is less than 0.35 nm and (2) the O H O − … angle is smaller than 30°. The function C t ( ) HH is the conditional probability that a hydrogen bond between a pair of molecules is present at time 't', given that it was present at time zero. In both the fully atomistic and AdResS simulations, first C t ( ) HH was calculated and then k t ( ) was obtained by taking the numerical derivative of C t ( ) HH , using a time step of 0.02 ps.

Appendix B. Thermodynamic fluctuations
The following thermodynamic quantities are analyzed in this work: where Var E ( )is the variance in the total energy of the molecules in the atomistic subregion in AdResS and an equivalent subregion in the full-atom simulations, CoVar N E ( , )is the covariance between the total energy of the molecules and the number of molecules which are present in the atomistic subregion in AdResS and an equivalent subregion in the full-atom simulations. The energy E consists of the sum of the kinetic energy of the molecules in the region considered, plus the energy coming from the interactions of each molecule with all the other molecules of the region considered. The interactions with the reservoir, defined in the text as 'technical interactions', are not counted, for consistency with the definition of reservoir in the BL model. The different properties are calculated from a 2 ns long trajectory. The error in the data was calculated using 'block-averaging' analysis.