Statistical analysis method for the worldvolume hybrid Monte Carlo algorithm

We discuss the statistical analysis method for the worldvolume hybrid Monte Carlo (WV-HMC) algorithm [arXiv:2012.08468], which was recently introduced to substantially reduce the computational cost of the tempered Lefschetz thimble method. In the WV-HMC algorithm, the configuration space is a continuous accumulation (worldvolume) of deformed integration surfaces, and sample averages are considered for various subregions in the worldvolume. We prove that, if a sample in the worldvolume is generated as a Markov chain, then the subsample in the subregion can also be regarded as a Markov chain. This ensures the application of the standard statistical techniques to the WV-HMC algorithm. We particularly investigate the autocorrelation times for the Markov chains in various subregions, and find that there is a linear relation between the probability to be in a subregion and the autocorrelation time for the corresponding subsample. We numerically confirm this scaling law for a chiral random matrix model.


Introduction
The numerical sign problem has prevented us from the first-principles analysis of various important systems, such as quantum chromodynamics (QCD) at finite density [1], quantum Monte Carlo calculations of quantum statistical systems [2], and the real-time dynamics of quantum fields.
Among various approaches to the sign problem, some utilize the complexification of dynamical variables. For example, in the complex Langevin method [3][4][5][6], one considers the Langevin equation in the complexified configuration space. In the path optimization method [7][8][9][10], with the aid of machine learning one looks for an optimized integration surface for which the average phase factor is maximal. In the Lefschetz thimble method [11][12][13][14][15][16][17][18][19][20][21][22][23], one deforms the integration surface according to the antiholomorphic gradient flow. The deformed surface asymptotes to a union of Lefschetz thimbles, each of which gives a constant value to the imaginary part of the action and thus is free from the sign problem. Although there can appear the ergodicity problem due to the existence of infinitely high potential barriers between thimbles, this ergodicity problem can be diminished by tempering the system with the flow time [19]. This tempered Lefschetz thimble method (TLTM) thus solves the sign and ergodicity problems simultaneously. Moreover, the computational cost of TLTM has recently been reduced significantly by developing the worldvolume tempered Lefschetz thimble method (WV-TLTM) [23], which we are going to review now.
Let R N = {x} be the configuration space and S(x) the action (allowed to be complexvalued). Our main interest is to numerically estimate the expectation values of observables O(x), (1.1) Under the assumption that e −S(z) and e −S(z) O(z) are entire functions of z ∈ C N , Cauchy's theorem allows us to continuously deform the integration surface without changing the value of integral. By expressing the deformation as a flow z t (x) (t ≥ 0) with z 0 (x) = x, the deformed integration surface at flow time t can be written as Σ t = {z t (x) | x ∈ R N }, and thus we have the equality Since the numerator and the denominator are both independent of t, we can integrate each of them over an arbitrary interval [T 0 , T 1 ] with an arbitrary function W (t). This rewrites (1.2) to the integration over the worldvolume R ≡ T 0 ≤t≤T 1 Σ t : where Dz is the induced volume element on R, and K(z) is the Jacobian: dt dz t = K(z) Dz. The weight factor e −W (t) is chosen so that the probability for a configuration to appear at time t is (almost) independent of t. This setting is especially necessary when the whole range of t is relevant to simulations, as in the WV-TLTM. We further rewrite (1.3) to the ratio of reweighted averages: Here, the reweighted average is defined for the weight e −V (z) ≡ e −Re S(z)−W (t(z)) , and A(z) is the reweighting factor whose explicit form can be found in Ref. [23].
The reweighted average (1.5) is estimated by the average over a sample generated by the hybrid Monte Carlo (HMC) algorithm with the potential V (z). We will generally call a HMC algorithm on an accumulation of integration surfaces the worldvolume hybrid Monte Carlo algorithm (the WV-HMC algorithm), which includes the WV-TLTM.
In the case of the WV-TLTM, the interval [T 0 , T 1 ] should include the region with large t so as to solve the sign problem, and at the same time include the region with small t so as to reduce the ergodicity problem. However, the small-t region may be contaminated by the sign problem, and the large-t region by the ergodicity problem. Thus, in order to reduce the contributions from these potentially contaminated regions, it was proposed in Ref. [23] to estimate the observables from a subsample in a subregion with [T 0 ,T 1 ] (⊂ [T 0 , T 1 ]) (see Fig. 1), whereT 0 andT 1 are chosen such that the estimated values vary only slightly against the changes ofT 0 andT 1 . 1 Figure 1: The worldvolume R and a subregionR.
In Ref. [23] the standard error analysis was employed as if the subsample itself is generated as a Markov chain with ergodicity and detailed balance. However, this is not obvious and needs a justification. In this paper, we prove that, if consecutive configurations in the worldvolume are generated as a Markov chain with ergodicity and detailed balance, then the subset consisting of the configurations belonging to a subregion can also be regarded as a Markov chain with ergodicity and detailed balance intact. We particularly investigate the integrated autocorrelation times for the Markov chains in various subregions, and find that there is a linear relation between the probability to be in a subregion and the integrated autocorrelation time for the corresponding subsample. We numerically confirm this scaling law for a chiral random matrix model (the Stephanov model [24,25]). This paper is organized as follows. In section 2, we first prove that the subset consisting of the configurations in a subregion is a Markov chain. We then argue that there should be a linear relation between the probability to be in a subregion and the integrated autocorrelation time for the corresponding subsample. Section 3 demonstrates this scaling by explicit numerical calculations for the Stephanov model. Section 4 is devoted to conclusion 1T 0,1 were written asT 0,1 in Ref. [23]. and outlook. In appendix A we explain our Jackknife method for estimating the statistical errors of the integrated autocorrelation times.

Stochastic process in a subregion and the integrated autocorrelation time
Let R = {z} be the full configuration space (the worldvolume in the case of the WV-TLTM). Suppose that we are given a Markov chain {z k } (k = 0, 1, 2, . . .) in R with the transition probability P (z k+1 |z k ) that satisfies ergodicity as well as the detailed balance condition with respect to the unique equilibrium distribution p eq (z):

Stochastic process in a subregion
We now look at a subregionR in R, whose complement we denote byR c ≡ R\R. 2 Then, from the Markov chain {z k } (k = 0, 1, 2, . . .) in R, we can extract a subsequence {z ℓ } (ℓ = 0, 1, 2, . . .) that consists of the configurations belonging toR. We first notice that this sequence is a Markov chain with the following transition probability fromz ∈R toz ′ ∈R: Since P (z ′ |z) is ergodic by assumption, and thus since a configuration which has leftR will eventually reenterR at a finite number of steps,P (z ′ |z) is ergodic and satisfies the probability conservation: Furthermore, using the expression (2.2), one can easily show thatP (z ′ |z) satisfies the following equality:P from which we find that the unique equilibrium distributionp eq (z) forP (z ′ |z) is given bỹ .

(2.5)
We thus have proved that the estimation of the expectation value R dzp eq (z) O(z) with the subsample from a subregionR can be statistically analyzed as if it is a Markov chain.

Integrated autocorrelation time for a subchain
Let {z k } (k = 0, 1, 2, . . .) again be a Markov chain in R with the transition probability The ratioτ int (O)/τ int (O) can be evaluated as follows, when both the numerator and the denominator are not too small. We first write by ǫ andǫ, respectively, the average Monte Carlo times evolved in one-step transition with P andP . 4 We then note thatǫ can be written with ǫ by using the probability p for a configuration inR to stay inR at the next step and the probability q for a configuration inR c to stay inR c as well: Furthermore, since autocorrelations should be the same at large Monte Carlo time scales, we have the equality Note that this renormalization-group-like argument holds only when τ int (O) andτ int (O) are not too small. Combining Eq. (2.7) and Eq. (2.8), we obtain the desired result:

Scaling law for the integrated autocorrelation times
We now apply the preceding arguments to the case where the configuration space is the worldvolume of the WV-TLTM. We argue that there must be a linear relation between the probability to be in a subregionR and the integrated autocorrelation timeτ int for the corresponding subchain. This claim will be confirmed numerically in the next section.
To simplify discussions, we assume that the integrated autocorrelation time for the flow time t [i.e., τ int (O) with O(z) = t(z)] is sufficiently small, τ int (t) ≃ 1. This can be easily realized, if necessary, by removing consecutive configurations from the chain at appropriate intervals. The smallness of τ int (t) means that the probability p simply expresses the probability for a configuration to be inR. We now recall that the distribution of t is uniform in equilibrium for the WV-TLTM [see the discussion below Eq. (1.3)]. Thus, p is given by A similar statement holds for the probability q which now expresses the probability for a configuration to be inR c , and thus we obtain the relation p + q = 1. Then, Eq. (2.7) leads to the relationǫ = ǫ/p, and thus, combined with Eq. (2.10), we obtain the following scaling law for the integrated autocorrelation times: . Since the distribution of t is almost uniform, the ratio N conf (T 0 ,T 1 )/N conf (T 0 , T 1 ) almost equals p, and thus we obtain the relation This in turn means that the effective number of configurations in a subsample does not depend on the choice of the interval [T 0 ,T 1 ], because When N conf is sufficiently large (as we always assume), the statistical error is given by the for-

Numerical confirmation of the scaling law
We numerically confirm the scaling law (2.11) for the WV-TLTM [23] applied to a chiral random matrix model (the Stephanov model [24,25]).

Setup
The partition function of the Stephanov model for N f quarks with equal mass m is given by where X = (X ij = x ij + i y ij ) is an n × n complex matrix. The 2n × 2n matrix D expresses the Dirac operator in the chiral representation, where µ and τ represent the chemical potential and the temperature, respectively. The chiral condensate and the number density are defined, respectively, by We will set the parameters to N f = 1, n = 2, µ = 0.6, τ = 0, m = 0.004.
We generate a sample {z j } (j = 1, . . . , N conf ) with the HMC algorithm using the potential V (z), and estimate the reweighted average O(z) R [see Eq. (1.5)] by the sample average (3.5) The estimator of the integrated autocorrelation time τ int (O) is given by Here, C k (O) is an estimator of the autocorrelation C k (O) = O 0 O k R, c , whose explicit form is given in appendix A. We have truncated the summation at k max to avoid summing up statistical fluctuations around zero (see, e.g., Ref. [28]). The statistical error δτ int (O; k max ) is estimated by a Jackknife method that is described in appendix A. Values of k max and bin sizes used in the estimations of τ int (O; k max ) are summarized in Table 1.  In the numerical simulation with the HMC, we set T 0 = 0 and T 1 = 0.02. The HMC updates are performed with the molecular dynamics time increment ∆s = 0.001 and the step number n HMC = 100 with the average acceptance rate more than 0.99. We employ 20 independent sets of configurations, each set consisting of 3 × 10 6 configurations in [T 0 , T 1 ]. The observables are measured at every 6 iterations of the HMC algorithm, so that we have 20 independent samples of size N conf = 5 × 10 5 . We fixT 1 = T 1 and varyT 0 asT 19), for each of which an independent set of configurations is used.

Results
We now demonstrate the scaling law (2.11) from explicit numerical calculations. Recall that the argument for the scaling is based on the smallness of the integrated autocorrelation time of t (τ int (t) ≃ 1) and the uniformity of the distribution of t. Figure 2 shows that the conditionτ int (t) ≃ 1 is satisfied for all p = (T 1 −T 0 )/(T 1 − T 0 ). Figure 3 is the histogram of p = (T 1 −T 0 )/(T 1 − T 0 ), which is almost flat as required. This is realized by tuning the functional form of W (t) as in Ref. [23].   . We perform the χ 2 -fit to these data points with Finally, in Fig. 6 we plot the expectation values of the chiral condensate and the number density, together with their statistical errors. The statistical errors are estimated with the Jackknife method with the bin sizes fixed to 500. We see that both the means and the statistical errors take almost constant values in a region where (T 1 −T 0 )/(T 1 − T 0 ) is not small. The deviation of the means should be attributed to the complex geometry at large flow times, which requires larger statistics and better control of systematic errors (such as those from numerical integrations of the flow equation and of the Hamilton equations accompanied by the projection from C N to R). The deviation of the statistical errors is due to the violation of the scaling law (2.11) for either of the numerator or the denominator (or both) in the ratio of the reweighted averages.

Summary and outlook
In this paper, we have established the statistical analysis method for the WV-HMC algorithm, whose major use is intended for the WV-TLTM [23]. We proved that, if consecutive configurations in the worldvolume are generated as a Markov chain with ergodicity and detailed balance, then the subset consisting of the configurations belonging to a subregion can also be regarded as a Markov chain with ergodicity and detailed balance intact. We particularly investigated the integrated autocorrelation times for the Markov chains in various subregions, and found that there is a linear relation between the probability to be in a subregion and the integrated autocorrelation time for the corresponding subsample. We numerically confirmed this scaling law for the Stephanov model. Now with this statistical analysis method at hand, we can safely apply the WV-TLTM to large-scale simulations of the systems that have serious sign problems, such as finite density QCD, strongly correlated electron systems, frustrated spin systems, and the realtime dynamics of quantum fields. A study along this line is now in progress and will be reported elsewhere. where C k (O) is the estimator of the autocorrelation C k (O) ≡ O 0 O k c . The summation is truncated at k max to avoid summing up statistical fluctuations around zero (see, e.g., Refs. [27,28]). The value of k max should not be set very large compared to τ int (O), otherwise contributions from statistical fluctuations around zero may dominate the error. There has been known an explicit formula for the statistical error δτ int (O) when 1 ≪ k max ≪ N conf (more precisely, as N conf → ∞, k max → ∞ and k max /N conf → 0) [26] (see also Refs. [27,28]), 5 but this may not be applicable to the case when τ int (O) = O(1), for which the condition k max ≫ 1 cannot be met. Therefore, in this paper we adopt the Jackknife method for the estimation of δτ int (O).
In order to apply a resampling method of Jackknife, we introduce a sample of multidimensional observables {X r = (X OO r,k , X O r )} (r = 1, . . . , N conf − k max ) with . (A.6) The statistical error δτ int (O) can then be estimated with the standard Jackknife method.