Information and contact geometric description of expectation variables exactly derived from master equations

In this paper a class of dynamical systems describing expectation variables exactly derived from continuous-time master equations is introduced and studied from the viewpoint of differential geometry, where such master equations consist of a set of appropriately chosen Markov kernels. To geometrize such dynamical systems for expectation variables, information geometry is used for expressing equilibrium states, and contact geometry is used for nonequilibrium states. Here time-developments of the expectation variables are identified with contact Hamiltonian vector fields on a contact manifold. Also, it is shown that the convergence rate of this dynamical system is exponential. Duality emphasized in information geometry is also addressed throughout.


Introduction
Markov Chain Monte Carlo (MCMC) methods are well-known methods for obtaining expectation values (averaged values) of some quantities with respect to target distribution functions numerically. These methods have been applied in various disciplines including mathematical engineering, physics, statistics and so on. As this name suggests, MCMC methods are based on Monte Carlo methods with Markov chains. The original Monte Carlo method was introduced in 1953 for studying liquids [1], and then as stated below, various variants originated from the original method have been proposed. Although there were some progress in developing MCMC methods, there still had been issues to be solved. Such issues include the slow relaxation problem for the cases where target distribution functions are multi-peak. In addition, poor sampling efficiency and strong initial condition dependencies are caused by this multi-peak property. A series of methods were proposed after the invention of simulated annealing method [2], and they are called extended MCMC methods [3]. Extended MCMC methods can roughly be divided into two classes. One is simulated tempering [4,5], and the other the exchange Monte Carlo method [6]. Although there have been progress in developing MCMC methods as stated above, improvements and refinements for MCMC methods in various senses have been demanded.
Information geometry is a geometrization of mathematical statistics [7], and its structure and applications have been investigated. This geometry offers tools to study statistical quantities defined on statistical manifolds, where statistical manifolds are identified with parameter spaces for parametric distribution functions. Examples of applications of information geometry include statistical interference, quantum information, and equilibrium thermodynamics [8,9]. From these examples one sees that the application of information geometry to sciences and engineering enables one to visualize theories and to utilize differential geometric tools for their analysis. Thus, one is interested in how to extend information geometry. Since equilibrium thermodynamics can be formulated with information geometry, geometrization of nonequilibrium thermodynamics is one of the keys for this task. Along with this context, one extension of information geometry is to use contact geometry [10,11], where contact geometry is known to be an odd-dimensional counterpart of symplectic geometry [12,13]. This extension is consistent with geometrization of equilibrium thermodynamics [14,15,16,17]. In Ref. [10], a contact geometric description of relaxation processes in nonequilibrium thermodynamic systems was proposed, and a link between information geometry and contact geometry was clarified. Since Markov chains can be used as models of nonequilibrium thermodynamic systems [18], it can be expected that contact geometry and Markov chains are related, and that some geometrization of MCMC methods can be constructed with contact and information geometries. Also it should be noted that dynamical systems on statistical manifolds without contact geometry have been studied in the literature [19,20,21,22]. In addition, information geometric descriptions for Markov chains were investigated in the literature as well [23,24,25].
In this paper a combination of heat-bath methods is formulated with contact geometry, where the heatbath method is a particular kind of MCMC method. We call this method the interchange Monte Carlo method, where the term "interchange" is used for avoiding any confusion to do with the exchange MCMC methods. To formulate this Monte Carlo method with contact geometry, phase space is identified with a contact manifold, and dynamics is described as a combination of contact Hamiltonian dynamical systems. It is then shown that the time-asymptotic limit of the geometrized heat-bath-method is consistent with information geometry. Also, convergence rates are explicitly calculated after introducing a metric tensor field.
Some new terminologies will appear in this paper, and they play various roles. The relations among them are briefly listed below. This paper is organized as follows. In Section 2, the heat-bath method is reviewed, and the master equation is also termed as the primary master equation. Then the dual master equations are introduced. The explicit solutions of these master equations are shown. In Section 3, with the use of the derived solutions and clarified features of the primary and dual master equations, differential equations describing timedevelopment of some observables are derived. In Section 4, a geometrization of discussions in Section 3 is given. Finally, Section 5 summarizes this paper and discusses some future works.

Solvable heat-bath method
In this section the heat-bath method known as one of MCMC methods is reviewed and then its solvability is shown first. In reviewing this, it is shown how master equations play roles. Then, the idea called "dual" used in information geometry is imported to the heat-bath method. With this idea, dual master equation is introduced. To emphasize this duality, master equation is also referred to as primary master equation.

Primary master equation
In the following a heat-bath method is reviewed, and its solvable feature is explicitly clarified. After this, primary master equation is defined.
Let Γ be a set of finite discrete states, t ∈ R time, and p(j, t) dt a probability that a state j ∈ Γ is found in between t and t + dt. The first objective is to realize a given distribution function p eq θ that can be written where θ ∈ Θ ⊂ R n is a parameter set with θ = {θ 1 , . . . , θ n }, and Z(θ) the so-called a partition function so that p eq θ (j) is normalized. Thus, One way to achieve the objective is to employ a heat-bath method. In what follows, how this objective is done is shown. Let p : Γ × R → R ≥0 be a time-dependent probability function. Then assume the following.
• The time-evolution of p follows the master equation where w : Γ × Γ → I, (I := [ 0, 1 ] ⊂ R) is such that w(j|j ′ ) ∈ I denotes a probability that a state jumps from j ′ to j.
• Choose w to be w θ (j|j ′ ) = p eq θ (j). The MCMC method with this choice is referred to as the heat-bath method.
• The target distribution p eq θ (j) for any state j ∈ Γ does not vanish: p eq θ (j) = 0. To show an explicit form of p(j, t), one rewrites (1) with the assumptions. The equation of motion for p is derived from In this paper (2) is termed as follows.
Definition 2.1. (Primary master equation). The equation (2) is referred to as a primary master equation, or a master equation.
Then the following proposition holds.
Proposition 2.1. (Solution of the primary master equation). The solution of the master equation (2) is Then it follows that lim t→∞ p(j, t) = p eq θ (j).
Proof. Solving (2) for p(j, t) with a set of given initial values {p(j, 0)}, one has the explicit form of p(j, t), . From this explicit form, the long-time limit of p(j, t) can be given immediately as (4).
With this Proposition, one notices the following.
• A solution p depends on θ. Taking into account this, p(j, t) is denoted p(j, t; θ).
• Any state transition from j to j ′ ( = j) does not occur with this master equation.

Dual master equation
In this subsection, some features of target distribution functions p eq θ are discussed first. This provides a motivation for introducing the dual of the introduced master equation. Such master equations and dual master equations will be used in Section 3 to derive dynamical systems for observables.
Target distribution functions considered in this paper are the exponential family, since they are discrete distribution functions. This family is a class of probability functions defined as follows. Let θ = {θ a } ∈ Θ ⊆ R n be a finite parameter set, and ξ a set of random variables. If a probability functionp θ being parameterized by θ can be written of the form with some functions {O a }, C andΨ, thenp θ is said to belong to the exponential family. In this paper, the Einstein convention, when an index variable appears twice in a single term it implies summation of all the values of the index, is adapted. In what follows the function C is assumed to be eliminated. Also, the following is assumed. The functionΨ is used for normalizingp θ , and gives the quantities uniquely. Then the correspondence between θ a and η a is one-to-one. This uniqueness provides a motivation for introducing the other expression of p eq θ . Substituting η = η(θ) into p eq θ , one has the other expression of the target distribution. This is denoted by p eq η , and the corresponding master equation is obtained from (2) as ∂ ∂t p(j, t) = p eq η (j) − p(j, t).
This equation is termed in this paper as follows.
Definition 2.2. (Dual master equation). The equation (7) is referred to as the dual master equation associated with (2).
The primary master equations are used for the cases where θ = {θ a } is kept fixed, and η depends on random variables. On the other hand, the dual master equations are used for the cases where η = {η a } in (6) is kept fixed, and θ depends on random variables. Similar to Proposition 2.1, one has the following.  (7) is Then it follows that lim t→∞ p(j, t) = p eq η (j).
Proof. A way to prove this is analogous to the proof of Proposition 2.1.
Accordingly, the solution of this equation is denoted p(j, t; η).
In this paper heat-bath methods reducing to the primary master equation or the dual master equation are referred to as solvable heat-bath methods.
Before closing this Section, attention is concentrated on a spin system to show how the general theory developed in this Section is applied to a physical model.
Example. (Kinetic Ising model without interaction). Let σ be a spin variable that takes the values σ = ±1, H a constant magnetic field whose dimension is an energy, T the absolute temperature, and θ = H /(k B T ) with k B being the Boltzmann constant. Consider the equilibrium system consisting of the one-spin system being coupled with a heat bath with T . With these introduced variables, the canonical distribution is given by p eq Ising θ (σ) = exp( θ σ)/ Z Ising (θ), where Z Ising (θ) is calculated to be When this canonical distribution is chosen to be the target distribution function, the primary master equation is immediately obtained from (2) as where ψ eq Ising (θ) := ln Z Ising (θ) = ln(2 cosh(θ)). ψ eq Ising (θ) The dual master equation is derived as follows. First, from (6) and (11) it follows that η = tanh θ.
Substituting this and the mathematical formula Combining these with (10), one has Finally, the explicit form of the dual master equation is obtained from (7) as The equilibrium free-energy F eq Ising is written in terms of ψ eq Ising in (11) as from which one can interpret ψ eq Ising as the negative dimensional-less free-energy for this model. This system was briefly studied in Ref. [10], and is referred to as the kinetic Ising model without interaction in this paper. Note that this system is simplified one originally considered in Ref. [18].

Time-development of observables
In this section differential equations describing time-development of observables are derived with the solvable heat-bath method under some assumptions. Here observable is defined as a function that does not depend on a random variable or a state. Thus expectation values with respect to a probability distribution function are observables. Also, time-asymptotic limits of such observables are stated.

Expectation values associated with p(j, t; θ)
In this subsection, the case where O a in (5) depends on random variables is considered. First, one defines some expectation values.
are referred to as the expectation value of O a with respect to p, and that with respect to p eq θ , respectively. If a target distribution function belongs to the exponential family, then the function Ψ eq : Θ → R with plays various roles.
Here and in what follows, (13) is assumed to exist. In the context of information geometry, this function is referred to as the θ-potential. Discrete distribution functions are considered in this paper and it has been known that such distribution functions belong to the exponential family, then Ψ eq in (13) also plays a role throughout this paper. The function Ψ eq (θ) can be interpreted as the negative dimension-less free-energy (See (12)). One then can extend this function to that for nonequilibrium states as Ψ : Since p eq θ (j) = 0 and Ψ eq (θ) < ∞ by assumptions, the term (· · · ) in (14) exists. Extending the idea for the equilibrium case, the function Ψ may be interpreted as a nonequilibrium negative dimension-less free energy. Although there could be more suitable free-energy for nonequilibrium states, (14) is employed in this paper.
One notices the following. 1. Relations among the expectation values of quantities introduced above in the timeasymptotic limit are found with (4) as lim t→∞ O a θ (t) = O a eq θ , and lim t→∞ Ψ(θ, t) = Ψ eq (θ), 2. Since a target distribution function belongs to the exponential family, it follows that [7] ln p eq Here L[Ψ eq ] is the Legendre transform of Ψ eq , 3. Since a target distribution function is a discrete one and thus belongs to the exponential family, one has (see (6) and [7]) Also, define cross entropy H, and negative entropy at equilibrium H eq so that From these definitions with (4), the time-asymptotic limit of H is obtained as

Expectation values associated with p(j, t; η)
From discussions on dual master equations, it is natural to consider systems where θ = {θ a } depends on random variables. For this purpose, θ is treated as variables depending on j in this subsection. One defines the following.
Definition 3.2. (Expectation value with respect to p(j, t; η) ). Let θ a : Γ → R be a function with a ∈ {1, . . . , n}, and p : Γ × R → R ≥0 a distribution function that follows (7). Then θ a η (t) := j∈Γ θ a (j)p(j, t; η), and θ a eq η := j∈Γ θ a (j)p eq η (j), are referred to as the expectation value of θ a with respect to p, and that with respect to p eq η , respectively. One notices the following.
1. Relations among the expectation values of quantities introduced above in the timeasymptotic limit are found with (9) as lim t→∞ θ a η (t) = θ a eq η .

The following relation is satisfied with the set of time-independent variables
Here the function H eq as a function of η is obtained as the following two steps. (i) Solving η b = ∂Ψ eq /∂θ b for θ a , one has θ a = θ a (η). (ii) Substituting θ a = θ a (η) into H eq (θ), one has H eq (θ(η)). Letting Φ eq be the Legendre transform of Ψ eq , one has that (18) can be written as Φ eq (η) = H eq (θ(η)).
In the context of information geometry, the function Φ eq is referred to as the η-potential.

It can be shown that[7]
and where δ b a is the Kronecker delta giving unity for a = b, and zero otherwise. Similar to (17), one defines By definition, it follows that H eq (θ(η)) = H eq (η), where θ(η) is obtained by solving Explicit forms of the expectation values and ones of Remarks 3.1 and 3.2 for Example in Section 2 can be shown as follows.

Example. Consider the kinetic Ising model without interaction introduced in Example in
from which one has the expectation value of O with respect to p eq θ as The θ-potential Ψ eq defined in (13) is calculated to be Ψ eq Ising (θ) = ln σ=±1 e θ σ = ln ( 2 cosh(θ) ) = ψ eq Ising (θ), where ψ eq Ising (θ) has been defined in (11). Then, one has η in (15) as Also it turns out that Ψ eq Ising is strictly convex due to The negative entropy at equilibrium H eq Ising is calculated to be The relation (18) is verified for this Example below. First, the Legendre transform L Ψ eq Ising (η) is calculated as Second, it follows from (21) and (23) that L Ψ eq Ising (η) = H eq Ising (θ(η)). They are denoted by Φ eq Ising (η) = L Ψ eq Ising (η) = H eq Ising (θ(η)). Another expression of Φ eq Ising (η) is obtained from where −1 < η < 1 coming from (21) has been applied. Similar to Ψ eq Ising , the function Φ eq Ising is shown to be strictly convex directly. It follows from dH eq Ising dθ = θ sech 2 θ, and dH eq The relation (20) for this Example is verified by combining (22) and (24)

Closed dynamical systems
As shown below a set of differential equations for { O a θ } and Ψ can be found in a closed form. Proof. Non-trivial parts of this proof is for the second and third differential equations. A proof that the equation for O a θ holds is given first. It follows from (2) and (16) that Then, a proof that the equation for Ψ holds is given below. It follows from (2) and (14) that Remark 3.3. The explicit time-dependence for this system is obtained as θ a (t) = θ a (0), From these and (16), one can also verify that the time-asymptotic limit of these variables are those defined at equilibrium : lim For the later convenience, this dynamical system is termed as follows.
Definition 3.3. (Primary moemt dynamical system). The dynamical system in Proposition 3.1 is referred to as a (primary) moment dynamical system.
Similar to the primary moment dynamical system, one has the following dynamical system.  (15). Then θ a η (t) and H(η, t) are solutions to the differential equations on R 2n+1 Proof. Non-trivial parts of this proof is for the first and third differential equations. A proof that the equation for θ a η holds is given first. It follows from (7) and (19) that Then, a proof that the equation for H(η, t) holds is given as Remark 3.4. The explicit time-dependence for this system is obtained as η a (t) = η a (0), From these and (19), one can also verify that the time-asymptotic limit of these variables are those defined at equilibrium : lim t→∞ θ a η (t) = θ a eq η , and lim t→∞ H(η, t) = Φ eq (η).
For the later convenience, this dynamical system is termed as follows.
Definition 3.4. (Dual moemt dynamical system). The dynamical system in Proposition 3.2 is referred to as a dual moment dynamical system.
In the following, it is shown how the general theory developed above is applied by focusing on the system stated in Example in Section 2.
The dual moment dynamical system discussed in Proposition 3.2 for this case is Since θ is chosen as θ = H /(k B T ) for this Example, the equations above correspond to the case where temperature depends on time, t, if H is constant.

Interchange MCMC method
Before stating how an extended MCMC can be geometrized, we introduce the interchange MCMC method used in this paper. The basic assumptions are as follows.
• A probability distribution function obeys the solvable heat-bath method. Thus, time is taken to be continuous, t ∈ R.
• Physical quantities whose expectation values are of interest are denoted by {O a } with a ∈ {1, . . . , n}.
• A thermodynamic phase space is assumed to exist such that it expresses nonequilibrium states as well as equilibrium one. This phase space is denoted by N , and N is identified with the (2n+1)-dimensional manifold R 2n+1 .
• The coordinate sets for N are chosen for each dynamical system as follows.
2. Obtain q(t) ∈ R 2n+1 that is a t time-development of q(0) by applying the primary moment dynamical system.

Interchange the dynamical variables with static dynamical variables in such a way that
Also, Ψ(θ, t) → H(η, t).
4. Obtain q(t ′ + t) ∈ R 2n+1 that is a time-development of q(t) by applying the dual moment dynamical system.

Repeat 2-5 until demanded numerical values of expectation values are obtained.
In this paper the process stated above is termed as follows.

Geometric description of MCMC methods
For the model used in this paper, the equilibrium states can be geometrized with information geometry, since the target distribution functions belong to the exponential family [7]. Several geometrization of nonequilibrium states for some MCMC methods have been proposed [23,24,25]. Yet, suffice to say that there remains no general consensus on how best to extend information geometry of equilibrium states to a geometry expressing nonequilibrium states. In the following, a geometrization of nonequilibrium states is proposed for the solvable heat-bath method. Also it is shown that such a geometry is consistent with information geometry in the sense that the time-asymptotic limit of solutions to the primary and dual master equations is described in information geometry. Here one notices that manifolds and geometric framework expressing equilibrium states should be obtained in the limiting case of nonequilibrium ones. Thus, the geometry for expressing nonequilibrium states should be wider than the geometry expressing equilibrium states in some sense. One such a wider geometry is contact geometry [10,16].

Geometry of equilibrium states
In what follows the relation between contact geometry and information geometry is briefly reviewed. Let C be a (2n + 1)-dimensional manifold, (n = 1, 2, . . .). If a one-form λ on C is provided and satisfies then the pair (C, λ) is referred to as a contact manifold, and λ a contact one-form. It has been known that there exists a special set of coordinates (x, y, z) with x = {x 1 , . . . , x n } and y = {y 1 , . . . , y n } such that λ = dz − y a dx a . The existence of such coordinates is guaranteed mathematically stated as the Darboux theorem [13]. The Legendre submanifold A ⊂ C is an n-dimensional manifold where λ| A = 0 holds. One can verify that is a Legendre submanifold, where ̟ : C → R is a function of x on C. The submanifold A ̟ is referred to as the Legendre submanifold generated by ̟, and is used for describing equilibrium thermodynamic systems [15]. It should be noted that how statistical mechanics dealing with distribution functions adopts Legendre submanifolds was investigated [26,16]. Equilibrium states, or equivalently target states, are identified with the Legendre submanifolds generated by functions in the context of geometric thermodynamics [14,15]. Besides, in the context of information geometry, equilibrium states are identified with dually flat spaces. Combining these identifications, one has the following. Proposition 4.1. (A contact manifold and a strictly convex function induce a dually flat space, [10]). Let (C, λ) be a contact manifold, (x, y, z) a set of coordinates such that λ = dz − y a dx a with x = {x 1 , . . . , x n } and y = {y 1 , . . . , y n }, and ̟ a strictly convex function depending only on x. If the Legendre submanifold generated by ̟ is simply connected, then ((C, λ), ̟) induces the n-dimensional dually flat space Dually flat space is defined in information geometry, and this space consists of a manifold M, a (pseudo-) Riemannian metric tensor field g F , and flat connections ∇, ∇ * that satisfy Thus a dually flat space is a quadruplet (M, g F , ∇, ∇ * ).
To apply this Proposition to physical systems, the coordinate sets x and y are chosen such that x a and y a form a thermodynamic conjugate pair for each a. Here it is assumed that such thermodynamic variables can be defined even for nonequilibrium states, and they are consistent with those variables defined at equilibrium. In addition to this, the physical dimension of ̟ should be equal to that of y a dx a .
How to specify a function ̟ in Proposition is given as follows. In mathematical statistics, given a probability distribution functionp θ parameterized by θ, the Fisher information matrix is defined. Each component is defined by For the exponential family, these matrix components can be obtained not only by calculating (28), but also by differentiating Ψ eq in (13) with respect to θ twice [7]. This Ψ eq and its Legendre transform Φ eq = L[Ψ eq ], a θ-potential and an η-potential, are chosen as ̟ in this Proposition. This choice is a part of procedure linking the space of distribution functions and a contact manifold. Instead of discussing physical meaning of the Proposition, we explain how this Proposition applies to a physical system by analyzing Example in Section 2.
Example. Consider the kinetic Ising model without interaction introduced in Example in Section 2. Let C be the 3-dimensional manifold with C ∼ = R × (−1, 1) × R, and λ = dz − y dx. Identify the coordinates such that x = θ, y = σ θ , and z = Ψ, then the condition (25) is dθ ∧ d σ θ ∧ dΨ = 0. Then the Legendre submanifold generated by Ψ eq Ising , is a suitable submanifold expressing the equilibrium state. It is known that equilibrium thermodynamic systems can be described in terms of information geometry. For example, the Fisher metric tensor field and the expectation coordinate on A Ψ eq Ising are respectively, where (22), (21), and g F Ising,θθ = d 2 Ψ eq Ising dθ 2 , have been used. Combining these and the first equality of (24), one verifies that θ and η are dual with respect to g F Ising in the sense that [7] g F Ising ∂ ∂θ , ∂ ∂η = sech 2 θ dθ dη = 1.
Since the Fisher metric tensor filed is a type of Riemannian metric tensor field and it is known that a Riemannian metric tensor field induces the Levi-Civita connection uniquely [27], g F Ising induces the Levi-Civita connection ∇ (0) . Its connection component Γ where g F θθ Ising = (g F Ising,θθ ) −1 = cosh 2 θ. Then one can show where C F Ising,0 is the component of the cubic-form, and is related to the Levi-Civita connection. The cubicform is defined in information geometry as C F := ∇ g F with ∇ being a connection [28]. If ∇ is flat, then its component expression for this Example is obtained as The α-connection defined in information geometry gives dual connections. Since the target distribution function belongs to the exponential family, one can show that the component of the α-connection Γ (α) θ θθ can be obtained from Γ From this, it follows that ∇ (1) is flat with θ being its coordinate. Combining the equations above, one has the component expression of (27), Furthermore, it can be shown that ∇ (−1) is flat with η being its coordinate [7]. To summarize, the quadruplet (A Ψ eq Ising , g F Ising , ∇ (1) , ∇ (−1) ) is a dually flat space, and is induced from the contact manifold C, λ, and Ψ eq Ising . Instead of the identification employed above, one can identify the coordinates such that x = η, y = θ η , and z = Φ for C ∼ = (−1, 1) × R 2 . Then, the condition (25) is dη ∧ d θ η ∧ dΦ = 0. In this case the Legendre submanifold generated by Φ eq Ising , Ising dη , and Φ = Φ eq Ising , is a suitable submanifold expressing the equilibrium state. On this submanifold, geometric objects such as the Fisher metric tensor field, the cubic form, and the α-connection, can explicitly be constructed as well.
For example, the Fisher metric tensor field is The relation between A Ψ eq Ising and A Φ eq Ising can be argued as follows. It follows from (22) and (24) that there exists a diffeomorphism between these submanifolds [10,16]. Let φ ΨΦ : A Ψ eq Ising → A Φ eq Ising be such a diffeomorphism. Then it follows from straightforward calculations [27] that the pull-back of g F Φ Ising is shown to equal to g F Ψ Ising , φ * ΨΦ ( g F Φ Ising ) = g F Ψ Ising .
So far geometry of equilibrium states have been discussed. One remaining issue is how to give the physical meaning of the set outside A ̟ , C \ A ̟ . A natural interpretation of C \ A ̟ would be some set of nonequilibrium states, and is discussed in the next subsection.

Geometry of nonequilibrium states
As shown in Proposition2.1, initial states approach to the equilibrium state as time develops. This can be reformulated on contact manifolds. In the contact geometric framework of nonequilibrium thermodynamics, the equilibrium state is identified with a Legendre submanifold. Then, as it has been clarified in Refs. [10] and [16], this asymptotic behavior can be identified with a class of contact Hamiltonian vector fields on a contact manifold. Here, contact Hamiltonian vector fields are analogous to symplectic vector fields on symplectic manifolds, and symplectic vector fields correspond to canonical equations of motion in Hamiltonian mechanics. This statement on a class of contact Hamiltonian vector fields can be summarized as follows.
Proposition 4.2. (Legendre submanifold as an attractor, [10]). Let (C, λ) be a (2n + 1)-dimensional contact manifold with λ being a contact form, (x, y, z) its coordinates so that λ = dz − y a dx a with x = {x 1 , . . . , x n }, y = {y 1 , . . . , y n }, and ̟ a function depending only on x. Then, one has 1. The contact Hamiltonian vector field associated with the contact Hamiltonian h : C → R such that gives d dt 2. The Legendre submanifold generated by ̟, given by (26), is an invariant manifold for the contact Hamiltonian vector field.
3. Every point on C \ A ̟ approaches to A ̟ along an integral curve as time develops. Equivalently A ̟ is an attractor in C.
For completeness, the component expression of the contact Hamiltonian vector field for a given contact Hamiltonian is summarized in Appendix. Also, the explicit time-dependence of (31) is obtained as [10] x a (t) = x a (0), As well as the case in Section 4.1, the details of physical meaning in the general case are not discussed here. However, how to use this Proposition is explained with Example as follows.
Applying Proposition 4.2 to this system, one has that A Ψ eq Ising defined in (29) is an invariant manifold and an attractor. Also, one concludes that h Ψ Ising ( θ, σ θ , Ψ ) decreases exponentially as time develops if an initial point is on C \ A Ψ eq Ising . From the physical interpretations of Ψ eq Ising and Ψ, one can interpret h Ψ Ising in a physical language as follows. The value of h Ψ Ising at time t can be interpreted as the difference between dimensionless free-energy at the state specified by t and that at the equilibrium state, realized in the limit t → ∞. This type of contact Hamiltonian will be used in Proposition 4.3.
As well as the set of the identifications above, the other set of the identifications enables one to discuss relaxation processes. Then the value of h Φ Ising at time t can be interpreted as the difference between negative entropy at time t and that at the equilibrium state, realized in the limit t → ∞. This type of contact Hamiltonian will be used in Proposition 4.4.
Some basics and physical applications of contact Hamiltonian systems are summarized in Ref. [29]. Also for some discussions on nonequilibrium processes in terms of contact geometry, see Ref. [16].
Relations among introduced spaces and manifolds are as follows. Let P and S θ , (θ ∈ Θ ⊆ R n ) be the sets P := p p ≥ 0 can be normalized and parameterized by some finite set , and respectively. Then the geometrization of P, the space of distribution functions, is information geometry.
In particular, dually flat space (P, g F , ∇, ∇ * ) has intensively been studied. Besides, the present geometry is about a contact manifold (C, λ) provided maps (Γ, Θ) → C, where these maps consist of the integration of quantities over random variables with the weight of a distribution function in S θ , and identity maps. Since a target distribution function belongs to the exponential family, a convex function Ψ eq exists. Then ((C, λ), Ψ eq ) induces (A Ψ eq , g F , ∇, ∇ * ), which enables one to deal with information geometry of equilibrium states from a viewpoint of contact geometry. Also, a map C → C is realized by a flow of a contact Hamiltonian vector field, and is identified with a nonequilibrium process. As a particular case, choosing a contact Hamiltonian to be (30) with ̟ = Ψ eq , one has a relaxation process. Here such a relaxation process is defined such that an integral curve connects a point of C \ A Ψ eq and that of A Ψ eq . Also, one can choose ̟ = Φ eq for expressing a relaxation process.
with some function h. Note that this class contains (30), which can be verified by choosing h such that h(Υ) = Υ with Υ := ̟(x) − z. In general, the relaxation rate for the case of (36) is not exponential. On the other hand, the relaxation rate for the case of (30) is exponential (See (32)), which is the same as that for the solvable heat bath method, (3) and (8). For this reason the original contact Hamiltonian (30) is only considered in this paper.
Remark 4.2. Divergences are defined in information geometry and they play various roles [7,8]. They are often discussed on dually flat spaces in information geometry. In this paper, connections are not introduced in contact manifolds except for Legendre submanifolds. For this reason aspects on divergences on contact manifolds are not considered here. However it is shown that the negative relative entropy that can be written as a form of divergence S θ × S θ → R, and the primary master equation yield an inequality for the solvable heat-bath method. Substituting (2) into (37), one has has been used. Similarly one discusses an inequality of the divergence depending on p and p eq η , D η ( p p eq η ) := j∈Γ p(j, t; η) ln p(j, t; η) p eq η (j) .
It then follows that d dt D η ( p p eq η ) ≤ 0.

Geometry of the interchange MCMC method
In this subsection Propositions 3.1 and 3.2 are written in a contact geometric language. In what follows phase space is identified with a (2n + 1)-dimensional contact manifold, (C, λ). As shown below, the dynamical systems stated in these propositions are contact Hamiltonian systems. 4. Let X H be a vector field associated with the dynamical system in Proposition 3.2, and Exp(tX H ) : C → C the exponential map associated with X H . Then the point in C at time t ′ + t is obtained by applying Exp(t ′ X H ) to q(t). That is, q(t ′ + t) = Exp(t ′ X H )q(t). A set of schematic pictures of this process is depicted in Fig. 3. Figure 3: Schematic pictures of dual moment dynamical system. The attractor is the Legendre submanifold A Φ eq = { θ a η = θ a eq η and H = Φ eq }.
To summarize discussions so far, one has the following main theorem in this Subsection.

Curve length from the equilibrium state
For MCMC methods, attention is often concentrated on how far a state is close to a given target state. In general, to formulate such a distance in terms of geometric language, length of a curve can be used. A length is a measure for expressing how far given two points are away, where such points are connected with an integral curve of a vector field on a manifold equipped with a metric tensor field. Since contact manifolds have been introduced for the interchange Monte Carlo method, one is interested in introducing a metric tensor field on contact manifolds for defining a length. First, a general theory is briefly summarized. Then, after a metric tensor field on a contact manifold is given, lengths between nonequilibrium states and target states are calculated where contact Hamiltonian vector fields express nonequilibrium processes.
Let M be a manifold, γ : R → M, (t → γ(t)) a curve on M,γ a vector field on M that is obtained as the push-forward γ * (∂/∂t), and g a Riemannian metric tensor field. Then the length of γ for t 0 ≤ t ≤ t 1 is defined by [27] l With this introduced metric tensor field, one can calculate the length between a state and the target state, where a contact Hamiltonian vector field is identified withγ in (42). This can be used to estimate how far states are away from the equilibrium state or a target state. In Ref. [30] a metric tensor field on contact manifolds has been studied. Let (C, λ) be a (2n + 1)dimensional contact manifold, (x, y, z) coordinates such that λ = dz − y a dx a with x = {x 1 , . . . , x n } and y = {y 1 , . . . , y n }. One defines This metric tensor field is consistent with the one at the equilibrium state, in the sense that the Fisher metric tensor field on a Legendre submanifold generated by a function is obtained with the pull-back of G.
The following gives the length along the introduced contact Hamiltonian vector field.
Lemma 4.1. The length between a state and the equilibrium state for the primary moment dynamical system calculated with (43) is where this h is specified as (38). Then the convergence rate for (44) is exponential.
Proof. Since a state is described by X Ψ in (40). Substituting X Ψ into (42), one has With (38), the last equality above can be written in terms of the contact Hamiltonian h as (44). It follows from (32) that the convergence rate is exponential.
Similarly one has the following.
Lemma 4.2. The length between a state and the equilibrium state for the dual moment dynamical system calculated with (43) is where this h is specified as (39). Then the convergence rate for (45) is exponential.
Proof. A way to prove this is analogous to the proof of Lemma 4.1.
Combining Lemmas 4.1,4.2, and Theorem 4.1, one has the main theorem in this paper.

Conclusions
This paper offers a viewpoint that an extended MCMC method can be described in contact geometry and information geometry. To show this explicitly, a solvable toy model as a master equation has been concentrated. From this master equation and the viewpoint of duality in the sense of information geometry, the dual master equation has been defined. With these equations, the interchange Monte Carlo method as an extended MCMC method has been invented. To give a contact geometric description of that Monte Carlo method, a combination of two contact Hamiltonian vector fields has been introduced. By reviewing an existing study, how to describe attractors of such contact Hamiltonian vector fields has been discussed in terms of information geometry. Then, with an introduced metric tensor field, the convergence rate has been shown to be exponential. Throughout this paper explicit expressions of geometrical objects have been shown by analyzing a spin model. There are some potential future works that follow from this study.