A new way to deal with fermions in Monte Carlo simulations

An exact, nonlocal, finite step-size algorithm for Monte Carlo simulation of theories with dynamical fermions is proposed. The algorithm is based on obtaining the new configuration U' from the old one U by solving the equation $ M(U') \eta = \omega M(U) \eta$, where $M$ is fermionic operator, $\eta$ is random Gaussian vector, and $\omega$ is random real number close to unity. This algorithm can be used for the acceleration of current simulations in theories with Grassmann variables. A first test was done for SU(3) QCD with purely fermionic term in the action.


Introduction.
From the earliest days of Monte Carlo simulations in lattice field theory fermionic fields have caused annoying difficulty, which stems from their being anticommuting variables. It is not immediately straightforward to put fermions on a computer, which is expected to manipulate numbers. This problem is only an algorithmic one, since for the most actions in use one can eliminate fermions by an analytic integration. However, the resulting expressions involve the determinants of very large matrices, making the numerical simulations extremely expensive.
Over the years many interesting tricks have been developed to circumvent this problem. The most often used algorithm, Hybrid Monte Carlo (HMC) [1], is now considered to be the standart simulation tool. In recent years it was successfully challenged by the Multiboson method, which has many attractive features, like the possibility of simulation of an odd number of quark flavors. The most popular version of Multiboson method, proposed by M. Lüscher [2], has been intensively studied by many authors (see [3] and references therein/thereon), and after various improvements was claimed to be competitive or even slightly better than HMC (see for example [4]). The Multiboson algorithm, proposed by A. Slavnov [5], is less known, and was tested in [6]. Besides HMC and Multiboson method, I like to mention some interesting ideas, which may finally result in efficient algorithms for simulation of theories with fermions. They are based on the polymer [7] and Jordan-Wigner occupation number [8] representations of fermion determinant, the direct evaluation of Grassmann integrals [9], the separation of low and high eigenmodes of the Dirac operator [10], and the direct simulation of loop expansion by means of using the stochastic estimators [11].
Despite the variety of different approaches to the problem, there is a common impression that all existing algorithms remain inefficient [12]. This is clearly seen if we compare the computational cost for the theories with dynamical fermions from the one side, and purely bosonic theories from the other side. In QCD even for heavy quarks the simulations are orders of magnitude slower than the ones in the quenched regime (the approximation where the quark determinant is set to be equal to unity). Such a situation suggests that one should not stop the attempts to obtain relatively cheap fermion simulation algorithm.
In this paper I propose a new computational strategy for treating dynamical fermions in Monte Carlo simulations. It has the virtues of exactness, nonlocality and finite step-size of update 1 .
1 There is a great probability, that a successful fermionic algorithm must be nonlocal and have a finite step-size of update. Indeed, HMC implies global changes in configuration space, but with infinitesimal step-size. This drustically slows down the simulations, becouse the high energy barries become nearly impassable. Multiboson algorithm has the advantage of finite step-size, but it updates the Unlike the other exact algorithms, the CPU-time required for the update in this algorithm grows only lineary with the volume V of the system. This makes the new method to be rather attractive for practical simulations, although there are still some problems to be solved.

The algorithm
Suppose that we aim at sampling the partition function of theory with two flavours of degenerate fermion fields: where M is the discretized fermion operator acting at some vector space Ω, U denotes the bosonic degrees of freedom coupled to fermion fields. For simplicity of formulae I postpone the inclusion of purely bosonic action S b [U ]. The sampling of full partition function will be considered later. At the beginning let me introduce a simple auxiliary algorithm, which will guide the later construction and help to understand the basis of new method. Starting from old configuration U , one executes the following instructions:

Prescription A
• generate random vector η ∈ Ω with Gaussian distribution: • propose new configurationŨ with symmetric probability • acceptŨ with the probability Here Z G normalizes the distribution: In eq.(4) and below I use the following short no- The full transition probability satisfies the detailed balance condition with respect to the partition function (1): This can be easily seen by making the change of variables in the expression (6). Therefore, if the proposal matrix (3) ensures ergodicity, then Prescription A is a valid algorithm for sampling the partition function (1). However, this exact algorithm is expensive due to the necessity to invert the large matrixM in the expression (4). The cost of inversion is proportional to the volume of the system V , so if one updates the fields U locally, the computational cost of updating the entire configuration grows as O(V 2 ). One can be more clever and perform the global updates by fixing φ = M η, introducing fictitious momenta p conjugate to the fieldsŨ (or some functions ofŨ ), and pursuing a discrete integration of Hamilton evolution (this is the core of HMC algorithm, which in its traditional form has a volume dependence like V 5/4 ). Nevertheless, the procedure remains expensive, becouse one should tend the size of molecular dynamics steps to zero for minimizing the integration errors and again invert the matrixM during the integration.
Can one somehow modify the Prescription A to escape the inversion of matrixM , implementing the global update U →Ũ in configuration space? The answer is 'yes' !
The key idea is to introduce into proposal matrix (3) the dependency on η to make the new configurationŨ satisfy the approximate equalitỹ The symmetry of expression (9) under the interchange of variables U ↔Ũ will help to garantee the reversibility of algorithm. Moreover, the acceptance probability (4) should be large, if the approximate equation (9) is close enough to its exact analogue. After these handwaving speculations let me present a rigorous construction. I do it in two steps. Firstly, I prove the detailed balance condition for some general Prescription B, in which the proposal matrix P 0 [ω, η ; U →Ũ ] depends on η and random real number ω distributed in the narrow interval near unity. Secondly, I specify the particular choice of P 0 [ω, η ; U →Ũ ] and present the final algorithmic scheme.
Here ǫ is the algorithmic parameter lying in the interval 0 ≤ ǫ < 1.
The condition (10) provides the invariance of the measure µ[ω]dω under the change of variable ω → 1 ω . Indeed, using eq.(10), one can easily check that for any integrable function f (ω) the following equality holds: (13) Using condition (11) together with the property (13), one can demonstrate the symmetry of averaged over η , ω proposal matrix under the interchange of variables U ↔Ũ : This makes the algorithm reversible. Finally, condition (12) ensures the fulfillment of detailed balance equation (7) for the full transition probability Indeed, making the change of variables (8) in expression (16), one obtains: Expression (17) is symmetric with respect to the interchange U ↔Ũ due to the eqs. (12,13). Therefore, Prescription B is again a valid algorithm for sampling the partition function (1), if the averaged proposal matrix (14) provides ergodicity. Now I propose to specify the choice of the matrix P 0 [ω, η ; U →Ũ ] by defining it through the equation: It means that, analytically or numerically, one finds some solutionŨ of the equation (18) and propose it as a new configuration. A good recipe for the numerical search may be the local iterative minimization of the quantity for fixed U, ω, η. Starting fromŨ = U , the minimization proceeds until R < δ is reached, where δ determines the accuracy of solving the eq.(18). One should construct the minimization procedure in a way that ensures reversibility of the algorithm (i.e. probability to obtain the configuratioñ U starting from U at any ω should be equal to probability to obtain U starting fromŨ at 1/ω). It can be checked that the proposal matrix, defined 2 through the eq.(18), satisfies the symmetry relation (11). Indeed, the equation (18) is invariant under the simultaneous interchange of variables U ↔Ũ ; ω ↔ 1/ω. Therefore, the proposals P 0 [ω, η; U →Ũ ] and P 0 [1/ω, η;Ũ → U ] are equiprobable, becouse they are defined through the same equation.
The same logic is applicable for proving the fulfillment of symmetry relation (12). The lhs. of expression (12)  Let us note, that on the surface (18) the acceptance probability (4) acquires the following simple form: The calculation of P acc becomes extremely cheap, since one does not need to invert the matrixM anymore. Moreover, the expression (20) does not depend onŨ , so one can accept or reject ω (or the pair (ω, η) ) even before solving the equation (18).
We also need to specify the probability µ[ω]. A good choice is the following expression: 2 The definition means the choice of some concrete reversible procedure of finding the solutionŨ .
Considering together the expressions (20,21), one gets the unified probability distribution for ω: Now we are ready to write down the final algorithmic scheme for sampling the partition function (1): The algorithm • generate random vector η ∈ Ω with Gaussian distribution: • generate ω ∈ 1 − ǫ ; The only computationally expensive ingredient of the algorithm is the obtaining of solutionŨ of eq.(18). One can expect, that the usage of the local iterative minimization of the functional (19) gives the computational cost proportional only lineary to the volume V .
If the procedure for obtaining the solution of eq.(18) is fully specified, the algorithm has only one free parameter ǫ, which controls the size of the deviation of ω from unity. A rather intriguing possibility is to set ǫ = 0, therefore fixing ω = 1. This makes the algorithm 'energy conserving', in a sense that |M −1 M η| 2 = |η| 2 . Unfortunately, in this case the search for the solution of eq.(18) by minimizing the functional (19) does not work, becouse one immediately obtains the trivial solu-tionŨ = U . However, in typical case (e.g. SU(3) QCD), the solutions of eq.(18) are highly degenerate, and the subspace of nontrivial solutions in configuration space is not empty. Finding the reversible procedure of nontrivial solving of equa-tionM η = M η is the perspective subject for future investigations.
In gauge theories one can use the gauge freedom to simplify the procedure of solving the eq.(18). Indeed, under the gauge transformations eq.(18) acquires the form: where η g ≡ Gη, G is the matrix representing the gauge transformation on the vector space Ω. Solving the eq.(24) can be particularly simple for some η g . Finally let me note, that the same algorithm can be used for simulations in theories with bosonic determinants, if we change the acceptance (20). Sampling the partition function one should use P acc [ω, η] = min (1, e 1−ω 2 )|η| 2 instead of the expression (20).

Potential problems
Despite the simple formulation and cheapness of the considered algorithm, it may be not applicable for some models. The main danger is the possible absence of solutions of eq.(18). This problem may be principal (no solutions exist at all), or mild (no solutions exist for some particular η, ω). In the second case one can reject the pairs (η, ω) until the solution is found. The value of this 'hidden' acceptance would determine the actual efficiency of the algorithm in such a case.
Another possible problem may be connected with the procedure of finding the solutions of eq.(18). Suppose that one uses the local iterative minimization of the quantity (19) for this purpose. If the functional (19) can have some local minima at which R > 0, then minimization procedure can stick at some of these minima before reaching R = 0. In that case one should think of the other way for solving the eq.(18).
The ergodicity of the algorithm may also be under the question. One should check for the model of interest if any region of configuration space can be reached by the sequential updates via eq.(18). If the algorithm is nonergodic, it can be used for the acceleration of other algorithms, like HMC and Multiboson.
For given U, η, ω the eq.(18) can have many degenerate solutions 3 . This is not a problem at all, if one respects the symmetry relations (11,12) when some particular solution is being chosen. However, one should be careful in order not to violate the detailed balance and reversibility.
In order to see, if these potential problems can cause any real troubles, I made some tests for the case of SU(3) QCD with purely fermionic term 4 . Let us however note, that these tests are preliminary, since for the reasons of simplicity and lack of computer time I did not include pure bosonic part S b [U ] into the action (this corresponds to β = 0). The more complicated simulations for the full QCD in physically interesting region will be done in the future [13].

Tests for SU(3) QCD with purely fermionic term in the action
The simulations were performed at 8 4 lattice for the partition function (1) with and D oe means that this operator acts from even to odd space sites. I used even-odd preconditioning to reduce the computational cost of the algorithm. By doing so one also increases the degeneracy of solutions of eq.(18), which now provides 12V real equations for 32V unknown variables, since M [U ] acts only on even space manifold 5 . The hopping parameter k was chosen to be k = 0.2, which gives the plaquette value P = 0.0089(1). The performance of algorithm was compared with that of usual HMC.
The minimization of functional (19) was implemented by making the random moves in each expression (18) for unconditioned fermion matrix provides (2 * N dirac * N color * V ) = 24V real equations for (Ngenerator * 4V ) = 32V variables.
color direction for all links lexicographically. Such minimization algorithm garantees reversibility of the procedure.
Firstly I checked the existence of solutions of eq.(18) at different ω for typical equilibrated configurations U . The tests were done in the interval ω ∈ [0.9, 1.1]. The good news is that for the considered ω one always finds some solution (i.e. for any δ the minimization procedure finally gives R < δ). No local minima of the functional (19) R min > 0 were observed.
On the average solving the eq.(18) at ǫ = 0.05 with precision δ = 5 * 10 −7 required 42 minimization iterations (it was checked, that improving the precision did not affect the results).
Since the cost of one iteration is roughly equal to 2 * N generator = 16 multiplications by matrix M [U ], the total cost of finding the solution of eq.(18) was approximately 670 matrix multilications. This has to be compared with the cost for generating one trajectory with HMC. At step-size τ = 0.033 and trajectory length equal to 1 (this ensures 70% acceptance), one trajectory costed aproximately 4420 matrix multiplications 6 , i.e. ≈ 6.6 times more expensive than solving eq.(18).
Performing the tests it was observed that our algorithm (let me name it Omega-algorithm in the following text) alone can not be used for sampling the partition function (1). The problem is that probability P[ω, η] to accept ω < 1 is strongly suppressed by large factor in the exponent. One knows that the value of squared norm of Gaussian vector can be estimated as |η| 2 ∼ I dof , where I dof is the dimensionality of vector space on which η is defined. For even-odd preconditioned SU(3) QCD at 8 4 lattice one has I dof = 24576! Therefore, using Omega-algorithm alone, one is restricted to choose between two unfavorable possibilities: 1) Using ǫ ∼ 1/I dof . It was observed that at such small values of ǫ the evolution of U fields becomes very slow, becouse the new configura-tionŨ always lies too close to the old one (large autocorrelation times); 2) Using ǫ ≫ 1/I dof . Then ω < 1 is almost never accepted, and the system is gradually cooled until it does not move.
In the second case the trouble appears due to the nonergodicity of algorithm at large ǫ. Nevertheless, even at large ǫ Omega-algorithm can be used for acceleration of other fermionic algorithms like HMC or Multiboson, since the ergodicity is provided by them.
I tested the combination of Omega-algorithm and HMC. After each HMC trajectory the global move in gauge configuration space was implemented by using Omega-algorithm at ǫ = 0.05. The average plaquette value for this algorithmical mixture was P = 0.0091(1), coinciding with correct one within the error bars. The efficiency of HMC-Omega mixture was estimated by measuring the autocorrelation times for the plaquette. The results were: Pure HMC, 1000 trajectories: τ int = 1.4(2); HMC+Omega, 1000 (traj + ω-update): τ int = 0.7(1). Note, that autocorrelation time was reduced almost at no cost, becouse the global Omegaalgorithm update is much cheaper, than HMC trajectory. Of course, these tests are not very illustrative, becouse the autocorrelation times for theory with pure fermionic term at this kappa value are rather small. The further tests in models of practical relevance are needed. In particular, one should find a cheap way of inclusion of pure bosonic action S b [U ] in simulations. This issue is discussed in the next section.
Let me summurize the results of the first test for Omega-algorithm. In its present version the algorithm was found to be nonergodic. Nevertheless, it was demonstrated, that Omega-algorithm can be efficiently used for acceleration of other fermionic algorithms.
Here I want to emphasize, that nonergodicity is not intrinsic for Omega-algorithm. It can be rather attributed to the unfavorable procedure of finding the solution of eq.(18) by minimizing the functional (19), which forces us to use large ǫ. One can still hope that it is possible to use ǫ ∼ 1/I dof , or even ǫ = 0, and obtain nontrivial solutionsŨ far away from U at the same time. This hope is based on the high degeneracy of solu-tions of eq.(18). The main problem is to garantee the reversibility of the procedure. An investigation along these lines is in progress [13].

Inclusion of bosonic action
Now let us consider the sampling of full partition function (2). The simplest way to include the contribution of purely bosonic sector is to add to the algorithm of the previous section the following instruction: • accept the new configurationŨ with the probability The reader can easily check that one gets a valid algorithm for sampling the partition function (2). However, the new configurationŨ , obtained by solving the eq.(18), differs globally from the old one, so the acceptance (28) can be very small. Of course, one can try to make the new configuration lying close to the old one to ensure the reasonable acceptance, but by doing so one lose one of the main advantages of the new algorithm -the finite step-size of update. A more radical way is to rewrite the partition function (2) in the form: where operator B satisfies the following identity: , where N is the size of matrix M . A more rational way is, probably, to insert the bosonic contribution into the operator M locally. One can represent the bosonic action as a sum over local contributions: (some of S (i) b can be equal to zero), and then define the matrix B as follows: (no summation over i is assumed in (33)). In any case, one should try to choose the most convenient version of operator B in order to simplify the procedure of solving the eq.(31).
The new method, proposed in this paper, can provide a cheap simulation algorithm for theories with dynamical fermions. The algorithm is exact and has a finite step-size of update.
Although in its present version the algorithm was found to be nonergodic, it can be used in combination with other ergodic algorithms like HMC and Multiboson. One may expect a good performance for such algorithmical mixture when the lattice volume increases, since computational cost for our new algorithm grows only lineary with the volume of system.
The efficiency of algorithm can be strongly affected by the clever choice of procedure of solving the eq.(18). Due to the high degeneracy of solutions one may hope to find the procedure, which makes the algorithm ergodic, and samples the configuration space fast enough. Also one can speculate on possibility of finding some analytic solutions for the eq.(18). If this is feasible at all, one can obtain a very cheap fermionic algorithm, comparable in cost with the algorithms for purely bosonic theories.