mVMC - Open-source software for many-variable variational Monte Carlo method

mVMC (many-variable Variational Monte Carlo) is an open-source software based on the variational Monte Carlo method applicable for a wide range of Hamiltonians for interacting fermion systems. In mVMC, we introduce more than ten thousands variational parameters and simultaneously optimize them by using the stochastic reconfiguration (SR) method. In this paper, we explain basics and user interfaces of mVMC. By using mVMC, users can perform the calculation by preparing only one input file of about ten lines for widely studied quantum lattice models, and can also perform it for general Hamiltonians by preparing several additional input files. We show the benchmark results of mVMC for the Hubbard model, the Heisenberg model, and the Kondo-lattice model. These benchmark results demonstrate that mVMC provides ground-state and low-energy-excited-state wave functions for interacting fermion systems with high accuracy.


Introduction
High-accuracy analyses of effective Hamiltonians for interacting fermion systems have been an important issue for a long time in studies of novel quantum phases in strongly correlated electron systems such as high-temperature superconductors [1] and quantum spin liquids [2,3]. Recent theoretical development in construction of low-energy effective Hamiltonians for real materials in a non-empirical way [4] urges developing ways for high-accuracy analyses of Hamiltonians with complex hopping parameters and electron-electron interactions. To promote materials design of correlated electron systems, it is highly desirable to develop a numerical solver, which can analyze a wide range of Hamiltonians with high accuracy.
To solve Hamiltonians for interacting fermion systems, various numerical algorithms have been developed so far [5]. The exact diagonalization method is one of most reliable theoretical methods applicable to general Hamiltonians [6,7]. However, system sizes which can be treated in the exact diagonalization method are limited, because the Hilbert-space dimensions increase exponentially as a function of systems sizes. The (unbiased) quantum Monte Carlo method is another important method applicable to a wide range of quantum many-body systems [8]. For interacting fermion systems, however, the negative sign problem, i.e., the appearance of the negative weights in the Monte Carlo samplings makes it difficult or practically impossible to obtain reliable results for a realistic numerical cost except a few special cases. For one-dimensional quantum systems, the density matrix renormalization group (DMRG) is an excellent method [9][10][11], which can treat larger system sizes. The tensor-network method, which has been developed keeping close relation with the DMRG method, has succeeded in treating two-dimensional systems without suffering from the negative sign problem [12][13][14]. However, the computational time by the tensor network method increases very rapidly with the increasing tensor dimension and the convergence to the exact estimate is not throughly examined so far in complex systems. Particularly in cases of itinerant fermion models we need special care about the entanglement entropy, which increases beyond the area law and the accuracy by the tensor network becomes worse.
The variational Monte Carlo (VMC) method is one of promising methods for highly accurate calculations for general systems [15]. In the VMC calculation, we introduce a variational wave function with variational parameters, and obtain approximate ground-state wave functions by optimizing these parameters according to the variational principle. For calculating expectation values of physical quantities, we employ the Markov-chain Monte Carlo sampling. In contrast to the ordinary (unbiased) Monte Carlo method, the VMC method does not suffer from the negative sign problem since the weight of Monte Carlo sampling is positive definite. The VMC method was applied to various quantum many-body systems such as liquid 4 He [16], liquid 3 He [17], the Hubbard model [18][19][20][21][22][23][24], the Kondo-lattice model [25,26], and the Heisenberg model [27][28][29]. In these applications, trial wave functions were assumed to be a simple mean-field form with only a few variational parameters, and were optimized to reproduce ground-state properties qualitatively. The accuracy of the VMC calculation using such simple trial wave functions is, however, not satisfactory, if one hope to identify a novel quantum phase competing with other possible phases, or to treat complicated low-energy Hamiltonians for real materials.
To improve the accuracy of the VMC method substantially, more than ten thousand variational parameters are introduced in trial wave functions where they are simultaneously optimized by using the stochastic reconfiguration (SR) method [30,31]. In addition, we implement quantum number projections to restore the symmetries of the wave functions to improve their accuracy. We have developed a program package named mVMC (manyvariable Variational Monte Carlo), which can perform highly accurate VMC calculations for a wide range of the quantum lattice models. mVMC has been applied to the Hubbard models [32][33][34], the Heisenberg models [35,36], and the Kondolattice models [37,38], and has succeeded in evaluating electronic states with high accuracy. mVMC has also applied to more realistic models such as the theoretical models for the interfaces of the cuprates (stacked Hubbard models) [39], ab initio low-energy Hamiltonians for the iron-based superconduc-tors [40][41][42][43], and ab initio low-energy Hamiltonians for the organic conductor [44]. It has been shown that the mVMC can be applied to systems with spin-orbit couplings [45][46][47] and systems with electron-phonon couplings [48,49]. Recently, it has been shown that mVMC can treat the real-time evolutions [50,51] and perform finite-temperature calculations [52] based on the imaginary-time evolutions.
Recently, the authors have released mVMC as open-source software with simple and flexible user interfaces [53]. By using this software, users can perform many-variable VMC calculations for widely studied quantum lattice models by preparing only one input file with less than ten lines. Although there are several open-source software of VMC method for continuous space such as CASINO [54], QWalk [55], and turbo-RVB [56], there is no open-source software of VMC method for the quantum lattice model such as the Hubbard or the Heisenberg model to the best of our knowledge. By preparing several additional input files, users can also define general Hamiltonians with any lattice structure, any spatial dimensions and any one and two-body (interaction) terms. The user interface of mVMC is designed for seamless connection to open-source software HΦ [7,57], which is developed by some of the authors for exact diagonalization calculations. By small changes of description in input files, it is easy to check the accuracy of the variational wave functions by comparing the results to the exact diagonalization calculations for small-size systems. mVMC also supports large-scale parallelization. In mVMC, the power-Lanczos method [58] is implemented. This method systematically improves the accuracy of the wave functions.
In this paper, we describe basic usage of mVMC and fundamental properties of trial wave functions implemented in mVMC. We also exposit the key algorithms implemented in mVMC such as the quantum number projections and the SR method. We show some examples of mVMC calculations and demonstrate that the relative systematic errors on the energy become typically less than 10 −2 % compared with the results of the exact diagonalization method. This paper is organized as follows (overview is shown in Fig. 1): In Section 2, we describe how to download and build mVMC (Section 2.1), how to use mVMC (Section 2.2), how to define models and lattices (Section 2.3), and how to visualize results of mVMC calculation (Section 2.4). In Section 3, we explain the algorithms implemented in mVMC. We describe the Monte Carlo sampling method (Section 3.1), details of trial wave functions (Section 3.2), the optimization method (Section 3.3), the power-Lanczos method (Section 3.4), and the parallelizations (Section 3.5). In Section 4, we show benchmark results of mVMC for the Hubbard model, the Heisenberg model, and the Kondo-lattice model. By these benchmarks, we demonstrate excellent performance of mVMC as a numerical solver for obtaining ground states and low-lying excited states of the standard Hamiltonians for interacting fermion systems. Finally, Section 5 is devoted to the summary. are sufficiently converged ? Figure 1: Overview of this paper. For readers who want to use mVMC quickly, please read Section 2 first, where we describe basic usage of mVMC. Section 3 orange be useful for readers who want to learn the key algorithms implemented in mVMC.

How to download and build mVMC
One can download the gzipped tar file of source codes [53], samples, and manuals from the mVMC download site. It is also possible to download the repository of mVMC through GitHub [59].
For building mVMC, a C compiler, the BLAS/LAPACK library [60], and the Message Passing Interface (MPI) [61] are prerequisite. The Scalapack [62] library is optionally required. By using the CMake utility [63], one can build mVMC as follow: $ tar xzvf mVMC-1.0.2.tar.gz $ cmake mVMC-1.0.2/ $ make One can also select the compiler as follows: where $Config is chosen from the following configurations: • gcc : GNU C Compiler • intel : Intel R C Compiler + MKL library • sekirei : Intel R C Compiler + MKL library on ISSP system-B (Sekirei) • fujitsu : Fujitsu C compiler + SSL2 library on the K computer We recommend users to use the CMake utility, because the CMake utility automatically finds the required libraries. However, installing CMake utility is sometimes difficult, for example, in the systems where one does not have the administrative permission. Thus, for a system without the CMake utility, we provide a script for making the Makefiles. By using the mVMCconfig.sh, one can generate the Makefiles as follows: $ bash mVMCconfig.sh gcc-openmpi $ make mVMC This is an example for gcc + openmpi configurations and we provide several options of the combinations between compilers and implementations of the MPI libraries such as the intel compiler and mpich. For details, please refer to the manuals [ In this procedure, one should prepare all the necessary files correctly and it is sometimes time consuming. To reduce efforts in preparing the input files, we provide a simple mode called Standard mode for the standard models in the condensed matter physics as explained in the next subsection. Here, W (L) represents the length for x (y) direction on the square lattice. Wsub (Lsub) is the length of the sublattice structure in the real space for variational parameters. model and lattice specify the types of the standard models and lattice structures, respectively. The hopping transfer and on-site Coulomb interaction are represented by t and U, respectively. By using this input file for the Hubbard model, one can perform mVMC calculations by executing the following command.

$ ./vmc.out -s Stdface.def
If one wants to check the input files without the calculation, it is possible to generate only the input files without calculations by using the following command.

$ ./vmcdry.out Stdface.def
Here, vmcdry.out is the executable file only for generating the input files.

Flow of calculation
Here, we summarize the flow of mVMC calculations. First, one prepares StdFace.def in Standard mode or all necessary input files in Expert mode. Then, by executing vmc.out, one performs the optimization of the variational parameters to lower the energy according to the variational principle. In the actual calculations, one uses the SR method [30,31]. This optimization is the main part and the most time-consuming part of mVMC. After the optimization, by using the optimized variational wave functions, one calculates the specified correlation functions. This flow is summarized in Fig. 2. Optimization is performed for NVMCCalmode=0 and the calculating physical quantities for NVMCCalmode=1. By choosing NLanczosMode=1 (NLanczosMode=2), one can perform the first-step power-Lanczos calculations without (with) calculating correlation functions.

Models and Lattices
We describe available Hamiltonians and lattices in mVMC. General form of available Hamiltonians for mVMC is given by where c † iσ (c iσ ) is the creation (annihilation) operator of an electron on site i with spin σ =↑ or ↓. This Hamiltonian includes the arbitrary one-body potentials and two-body interactions in the particle-conserved systems. General one-body potential t i jσ i σ j represents the hopping between site i with spin σ i and site j with spin σ j . General two-body interaction I i jklσ i σ j σ k σ l represents the interaction which annihilates a spin σ j particle at site j and a spin σ l particle at site l, and creates a spin σ i particle at site i and a spin σ k particle at site k. Corresponding keywords for H T and H I are shown in Table 1. In Standard mode, users can employ the anti-periodic boundary conditions and details are shown in Appendix A.
We note that localized spin-1/2 systems such as the Heisenberg model can be regarded as a special case of the above Hamiltonian by completely excluding the doubly occupied sites at half filling with the Gutzwiller projections. Therefore, by using the Gutzwiller projections, one can use mVMC for solving spin-1/2 quantum spin models by properly interpreting terms of diagonal elements of t i jσ i σ j and I i jklσ i σ j σ k σ l as the potentials and interactions for localized spins, respectively.  Although the Hamiltonian given in Eq.
(3) has the most general form, it is not efficient for specifying the simple interactions such as the on-site Coulomb interactions by using the general form. To reduce the efforts in making the input files, we provide simple forms for the conventional interactions such as the on-site Coulomb interactions (H U ), the off-site Coulomb interactions (H V ), the Ising-type Hund's rule couplings (H H ), the exchange interactions (H E ), and the pair hopping terms (H P ). Forms of each interaction are given as follows: where we define the charge density operator with spin σ at site i as n iσ = c † iσ c iσ and the total charge density operator at site i as n i = n i↑ + n i↓ . Corresponding keywords are shown in Table 1.
In Standard mode, users can select the standard models with the human-readable keywords, i.e., Hubbard-type models are specified with model="Hubbard", localized spin-1/2 Heisenberg models are specified with model="Spin", and Kondo-lattice models are specified with model="Kondo". By adding GC at the back of the keywords for models, for example model="HubbardGC", one can treat the S z non-conserved systems. We note that GC is an abbreviation grand canonical ensemble but mVMC only supports the S z non-conserved systems and does not support the particle non-conserved systems in ver.1.0. We summarize available keywords for lattices and models in Standard mode in Table.2 .
The form of the Hamiltonians with "Hubbard/HubbardGC" is given by where V i j denotes the off-site Coulomb interactions and its range depends on the lattice structures. The form of the Hamiltonians with "Spin/SpinGC" is given by Here, spin operators are defined as follows: The form of the Hamiltonians with "Kondo/KondoGC" is given by where spin operators for the localized spins are denoted by S α and the operators for itinerant electrons are denoted by c † i /c i and n i . Details of the parameters are given in manuals [53].

Display lattice geometry
By using lattice.gp, which is generated after executing vmc.out or vmcdry.out, we can display the geometry of the simulation cell generated by Standard mode as follows:

$ gnuplot lattice.gp
Here, we use gnuplot [66] for visualization. Then the lattice geometry is displayed in the new window as shown in Fig. 3. Figure 3 shows the lattice geometry of the 20-sites Hubbard model on the square lattice and we use the following input file: model = "Hubbard" lattice = "square" a0w = 4 a0l = 2 a1w = -2 a1l = 4 t = 1.0 U = 10.0 nelec = 20

Fourier transformation of correlation functions
mVMC has utility programs (fourier tool) to compute the structure factors in the reciprocal space, which are defined as where X i is an operator such as c i↑ and c † i↑ c i↑ . We note that N cell denotes the number of the unit cells, for example, N site = 2 × N cell for honeycomb lattice. We can compute this type of correlation function by using fourier tool. These programs support the calculations of the one-body correlation (momentum distribution) n σ (k), the charge structure factors N(k), and the  spin structure factors S (k), S xy (k), S z (k), which are defined as We demonstrate an example of calculation for S (k) on the 20-site Hubbard model on the square lattice shown in the previous section. After calculating the correlation function in the real space by using vmc.out with NVMCCalMode=1, the utility program fourier is executed in the same directory as follows: $ fourier namelist.def geometry.dat where geometry.dat is the file specifying the positions of sites, which is generated automatically in Standard mode. Then the fourier-transformed correlation function is stored in a file output/zvo_corr.dat, where zvo is the prefix specified in the input file. For two dimensional systems, we can display a color-plot by using another utility program corplot as follows: In this program, we choose the type of correlation function listed above, and can draw the correlation functions as shown in Fig. 4.

Sampling method 3.1.1. Monte Carlo sampling
In the VMC method, the importance sampling based on the Markov-chain Monte Carlo is used for evaluating the expectation values for the many-body wave functions [15,67]. As a complete basis, we use the real-space configuration where r nσ denotes the position of the nth electron with spin σ.
For the S z -conserved system, we used the fixed S z real-space configuration defined as where N up (N down ) denotes the number of up (down) electrons and the total value of S z is given by S z = (N up − N down )/2. By using {|x }, we rewrite the expected value of the operator A as follows: By performing the Markov-chain Monte Carlo with respect to the weight ρ(x), i.e., by generating the real-space configurations according to the weight ρ(x), we can evaluate A as where N MC is the number of Monte Carlo samplings. In mVMC, we use the Mersenne twister [68] for generating the pseudo random numbers.

Update of real-space configurations
For the itinerant electron systems such as the Hubbard model, we update the real-space configurations |x with the hopping process, i.e., one electron hops into another site as shown in Fig. 5(a). In addition to the hopping update, we can use the exchange update, i.e., opposite spins exchange as shown in Fig. 5(b). For the local spin models such as the Heisenberg model, the hopping update is prohibited and only the exchange update is allowed. Even in the itinerant electrons models with strong on-site interaction, it is necessary to use the exchange update for an efficient Monte Carlo sampling because the creation of the doubly occupied site (or equivalently the creation of the holon site) becomes a rare event in the strong coupling region.
In the S z non-conserved systems, we employ the hopping update with spin flip as shown in Fig. 5 (c). We also employ local spin flip update as shown in Fig. 5(d). Although the local spin flip is included in the hopping with spin flip, for an efficient sampling, it is necessary to explicitly perform the local spin flip.
As we show later, the inner product between the Pfaffian wave functions and the real-space configuration |x is given by the Pfaffian of the skew symmetric matrix X. It is time consuming to calculate the Pfaffian for each real-space configuration. Because the changes in the real-space configuration induce the changes only in a few rows and columns in X, numerical cost can be reduced by using the Sherman-Morrison-type update technique. Details of the fast update techniques of the Pfaffian wave functions are shown in Refs. [32,36].

Wave functions
In mVMC, the form of the variational wave function is given as where |φ Pf denotes the pair-product part of the wave functions, L denotes the quantum-number projectors such as the total-spin and momentum projections, and P denotes the correlation factors such as the Gutzwiller and the Jastrow factors. We detail each part of the wave functions in the following.

Pair-product part -Pfaffian wave functions
The pair-product part of the wavefunction in mVMC is represented as the Pfaffian wave function, which is defined as where N e is the number of electrons and N s is the number of sites. Index i denotes the number of sites and σ i denotes the spin at ith site. For simplicity, we denote the combination of the site index and spin index by the capital letter such as We note that site index i can include the orbital degrees of freedom in the multi-orbital systems. If the total value of S z is conserved and fixed to 0, we often use the anti-parallel Pfaffian wave function defined as The Pfaffian wave function is an extension of the Slater determinant defined as where I denotes the index including the site and spin index and n denotes the index of the occupied orbital. As we show in Appendix B, from the Slater determinant, it is possible to construct the equivalent Pfaffian wave function by using the relations This result shows that the Pfaffian wave functions include the Slater determinants. We note that the Pfaffian wave functions can describe states that are not described by the Slater determinant such as the superconducting phases with fixed particle numbers.
Inner product between N e -particle real-space configuration |x and the Pfaffian wave functions is described by the Pfaffian as follows: where X is a 2N e × 2N e skew-symmetric matrix X whose elements are given as This relation is the reason why |φ Pf is called the Pfaffian wave function. We note that the Pfaffian is defined for the skewsymmetric matrices and its square is the determinant of the skew-symmetric matrices, i.e., We use Pfapack [69] for calculating Pfaffians in mVMC. In the field of the quantum chemistry, the Pfaffian wave functions is often called geminal wave functions [31,70].
Although the Pfaffian wave function has 2N s × (N s − 1) complex variational parameters, to reduce the numerical cost, it is possible to impose the sublattice structures on the Pfaffian wave functions. For example, if we consider the 2 × 2 sublattice structures in the real space for the N s = L × L square lattice, the number of the variational parameters are reduced from O(N 2 s ) to O(N s ). In Standard mode, by specifying the keywords W sub (L sub ), one can impose the sublattice structures in x direction (y direction) for the variational parameters. We note that the sublattice structure allows the orders within the sublattice structures, i.e., in 2 × 2 sublattice in the real space, the ordering wave vectors in the momentum space are limited to (π, π), (0, π), (π, 0), and (0, 0). If one wants to examine stability of the long-period orders, it is better to take larger sublattice structures or not to impose the sublattice structure if possible.

Quantum number projection
The quantum-number projector L consists of the momentum projector L K , the point-group symmetry projector L P , and the total-spin projector L S , which are defined as Here, K is the total momentum of the whole system and T R is the translational operator corresponding to the translational vector R, T p is the translational operator corresponding to the vector point-group operations p, and g p is the character for pointgroup operations. The number of elements in the point group or translational group is denoted by N g . We only consider the total-spin projection which filters out the S z = 0 component in the wave function and generate the wave function with S z = 0 and the total spin S . In general, the total-spin projection which filters out S z = M component and generate the wave function with S z = M and the total spin S is possible. Details of such general spin projections are shown in the literature [71,72].
In the definition of L S , Ω = (α, β, γ) denotes the Euler angles, R(Ω) = e iαS z e iβS y e iγS z is the rotational operator in the spin space, P S (x) is the S th Legendre polynomial, respectively. Here, we explain the essence of the quantum-number projections by taking a point-group projections as an example. When the Hamiltonian preserves a symmetry, any exact eigenstate has to respect its quantum number associated with the symmetry, while wavefunctions constructed so far do not necessarily satisfy this requirement. Such a symmetry-preserved state with a given quantum number can be constructed from a given state |φ as where |α are the eigenvectors of T p and g α ( p) are the eigenvalues. Here, T p is defined as To extract |α from |φ , we define the projection operator as where N g is the number of elements in the point group or translational group. By using the projection operator, we can show the following relation.
For the momentum projection, we take the translational operators T R defined as where R denotes the translational vector and the corresponding character is e ik·R . By using T R and e ik·R , the momentum projection is defined by Eq. (40).
To define the momentum and point-group projections, it is necessary to specify the translational operations (T p ) and the associated weight g α ( p) in the qptrans.def with the keyword TransSym. By preparing the file qptrans.def, one can perform the desired momentum and the point-group projections. For the anti-periodic conditions, it is necessary to consider the changes of signs in the Pfaffian wave functions. Details are shown in Appendix A.
For the total-spin projection, sum of the discrete freedoms becomes the integration over the Euler angles in the spin space and translational operation becomes the rotational operator R(Ω) in the spin space. The associated character is given by P S (cos β). Thus, the spin projection is given in Eq. (42). Although it is possible to perform the spin projection to the general Pfaffian wave function defined in Eq. (31), where S z component is not conserved, mVMC ver. 1.0 only supports spin projection for anti-parallel Pfaffian wave functions defined in Eq. (32). Because total S z in |φ A−Pf is definitely zero, triple integration for the spin projection is reduced to a single integration as follows: The integration is performed by using the Gauss-Legendre formula. We note that anti-parallel Pfaffian wave function is transformed as where κ(σ i , σ j ) is defined as

Correlation factors
To take into account the many-body correlations, the Gutzwiller factors P G [74], the Jastrow factors P J [75,76], the m-site doublon-holon correlation factors P (m) d−h (m = 2, 4) [77] are implemented in mVMC [32], which are defined as follows: In the definitions of the doublon-holon correlation factors, α d mnt and α h mnt , α d 4nt are the variational parameters. Real-space diagonal operators ξ d imnt and ξ h imnt are defined as follows: When a doublon (holon) exists at the ith site and n holons (doublons) exist at the m-site neighbors defined by t around ith site, ξ d imnt (ξ h imnt ) become 1. Otherwise ξ d imnt or ξ h imnt are 0. We note that the above correlation factors are diagonal in real-space representation, i.e., where the P(x) is a scalar number.
By taking g i = −∞, it is possible to completely eliminate the doublons in the real-space configurations. Using this projection, we describe the spin 1/2 localized spins. In mVMC, one can specify the positions of the local spins in LocSpn.def with the keyword LocSpn.
It has been shown that the long-range Jastrow factors play important roles in describing the Mott insulating phase [76]. If the Jastrow factor becomes long ranged, i.e., the indices of the Jastrow factor run over all the sites, v i j often gets large and induce numerical instability. By subtracting constant value from v i j and g i , we are able to stabilize the numerical computation without modifying the results.

Optimization method
In this subsection, we describe the optimization method used in mVMC. The SR method proposed by Sorella [30,31] is an efficient way for optimizing many variational parameters. By using the SR method, it has been shown that more than ten thousands variational parameters can be simultaneously optimized. The SR method largely relaxes the restrictions in the variational wave functions, which allows to improve the accuracy of the variational Monte Carlo method.
The essence of the SR method is the imaginary-time evolution of the wave functions in the restricted Hilbert space that is spanned by the variational parameters as we explain in sec. 3.3.1. This means that the effective dimensions of the Hilbert space increase by increasing the number of the variational parameters. Thus, in principle, the accuracy of the imaginary-time evolution becomes high for the wavefunctions with many-variational parameters. In the SR method, because taking derivatives of wavefunctions is an important procedure, we detail how to differentiate the wave functions in sec. 3.3.1 and Appendix C.
In the SR method, it is necessary to solve the linear algebraic equation with respect to the overlap matrix S . Because the dimension of S is square of the number of the variational parameters, storing the matrix S is the most memory-consuming part in mVMC and it determines the applicable range of mVMC. To relax the limitation, a conjugate gradient (CG) method for the SR method is proposed (SR-CG method) [78]. In this method, because it is not necessary to explicitly compute S , the required memory in mVMC is dramatically reduced. By using this method, it is now possible to optimize the variational parameters up to the order of hundred thousands. We detail the SR-CG method in sec. 3.3.2.

Stochastic reconfiguration method
The imaginary-time-dependent Schrödinger equation is given by By substituting the normalized wave function |ψ(τ) defined by into the Schrödinger equation, we obtain If the wavefunction |ψ(τ) is represented by the variational parameter α(τ), the Schrödinger equation is rewritten as whereα k is the derivative of the kth variational parameter α k with respect to τ. By minimizing the L 2 -norm of the Schrödinger equation with respect toα k , i.e., miṅ α k kα we can obtain the best imaginary-time evolution in the restricted Hilbert space. This minimization principle is called time-dependent variational principle (TDVP) [79], and it is commonly applied to the real-or imaginary-time evolution [50,52,[80][81][82][83][84][85][86][87][88]. From the TDVP, we obtain the following equation By discretizing the derivativeα k as ∆α k /∆τ, we obtain the formula for updating the variational parameters as where and O * means the complex conjugate of O. The operator O k is defined as where |x is a real space configuration of electrons. Here we note that the set of the real space configurations {|x } is orthogonal and complete.
To calculate O k , it is necessary to differentiate the inner product x|ψ with respect to a variational parameter α k . When α k is the variational parameter of the correlation factor, it is easy to perform the differentiation. For example, if α k is the Gutzwiller factor at kth site (α k = g k ), x|ψ becomes e g k D k (x) x|ψ , where D k (x) is doublon at kth site and |ψ is the wavefunction without Gutzwiller factors g k . Thus, the differentiation becomes When α k is the variational parameters for the Pfaffian wave functions, differentiation of the Pfaffian Pf(X) is necessary. Because the coefficient F AB of the Pfaffian wave function is a complex number [F AB = F R AB + iF I AB , Re(F AB ) = F R AB , Im(F AB ) = F I AB ], it is necessary to consider the differentiations with respect to both the real part of F AB and the imaginary part of F AB . By using the following formula for the Pfaffian we obtain Detailed calculations including the proof of Eq. (71) are shown in Appendix C.
For the spin-projected wavefunctions, coefficients of F I J is given by In this case, differentiation with respect to f ab are given as Detailed calculations are given in Appendix C.

Conjugate gradient (CG) method
The overlap matrix S is positive definite and thus a linear equation system (66) can be solved by using a Cholesky decomposition S = LL † , where L is a lower triangular matrix [60,89]. In the Cholesky decomposition, it is necessary to store S , whose dimension is O(N 2 p ). Here, N p is the size of S matrix, i.e., the number of variational parameters. For more than a hundred thousand (10 5 ) parameters, it is difficult to store S matrix in single node and the memory cost determines the applicable range of mVMC. Neuscamman and co-workers [78] succeeded in optimizing 10 5 parameters with the conjugate gradient (CG) method, which requires only a matrix-vector product to solve a linear equation system. They derived a way to multiply an arbitrary vector v by without storing S itself, which reduces the numerical cost greatly. We describe their scheme (SR-CG method) in the following. In and Algorithm 1 CG method for linear equation system Ax = b 1: procedure CG(A, b, x 0 , ε 2 , k max ) 2: ← ε 2 b 2 2 termination threshold 3: x ← x 0 initial guess 4: r ← b − Ax residue vector 5: p ← r search direction 6: ρ ← r 2 2 7: for k ← 1 . . . k max do k max : maximum # of iterations 8: w ← A p 9: α ← ρ/ p w 10: x ← x + αp 11: r ← r − αw By using these form, we can multiply S by an arbitrary realvalued vector v as where This product requires 2(N MC + 1)N p = O(N p N MC ) product, and needs to store N p × N MC sized matrixÕ instead of N p × N p sized matrix S . Once a matrix-vector product y = S v is able to be calculated, we can also solve a linear equation system (66) by the well-known CG method (Algorithm 1). Since one iteration includes one matrix-vector product with O(N p N MC ) products and 5N p products, and the CG method converges within N p iterations, the computational cost of SR-CG in the worst case is O(N 2 p N MC ). Therefore, the SR-CG algorithm reduces the numerical cost if N MC < N p . For details of the CG method, the readers are referred to standard numerical linear algebra textbooks, for example, Ref. [89].

Power Lanczos method
In the power-Lanczos method [58], by multiplying the wave functions by the Hamiltonian, we systematically improve the accuracy of the wave functions. We explain the basics of the power-Lanczos method in this subsection.
The wave function with the nth-step Lanczos iterations is defined as where α n is a kind of variational parameter. After the nth-step Lanczos iterations, α n are determined by minimizing the energy as follows where α = (α 1 , α 2 , . . . , α N ).
In mVMC, the first-step Lanczos iteration is implemented and the energy is calculated as where we define h 1 , h 2(11) , h 2 (20) , and h 3 (12) as From the condition i.e., by solving the quadratic equations, we can determine the optimized value of the parameter α 1 . By using the optimized α 1 , we can calculate other expected values in a similar way.

Parallelization
mVMC supports MPI/OpenMP hybrid parallelization. We adopt different parallelization approaches in the Monte Carlo method and the optimization method as shown in Fig.6.
In the Monte Carlo method, N sampler independent Monte Carlo samplers are created. The number of MPI processes per sampler is unity by default but can be specified from the input file. First each sampler generates N MC /N sampler real-space configurations {|x }. In each MC sampler, the iterations of loops for summations in the quantum-number projections (40,41,42) are parallelized using both MPI and OpenMP. Then the realspace configurations are distributed among MPI processes and physical quantities ψ|A|x / ψ|x are calculated independently. In the SR method, if the CG method is not used, the linear equation (66) is solved by using ScaLAPACK routines. The elements of the overlap matrix S are distributed in block-cyclic fashion. In the CG method, multiplication between the matrix S and a vector is parallelized in the same manner as in the Monte Carlo method.

Benchmark results
In this section, we show benchmark results of mVMC calculations for several standard models in the condensed matter physics such as the Hubbard model, the Heisenberg model and the Kondo-lattice model. For these models, we compare the results by mVMC with the results either by the exact diagonalization or by the unbiased quantum Monte Carlo calculations. From these comparisons, we show that mVMC can generate highly accurate wave functions for the ground states and the low-energy excited states in these standard models.

Hubbard model and Heisenberg model on the square lattice
We show examples of the mVMC calculations for the Hubbard and Heisenberg models on the square lattice. We compare the mVMC calculation with the exact diagonalization and the available unbiased quantum Monte Carlo calculations in the literature.

Comparison with exact diagonalization
By using Standard mode in mVMC, it is easy to start the calculation for the Hubbard model. An example of the input file for 4 × 4-site Hubbard model at half filling is shown as follows: In Fig. 7, we show typical optimization processes for the Hubbard model. We take two initial states in mVMC calculations; one is a random initial state, i.e., each variational parameter of the pair-product part is given by pseudo random numbers. Another initial state is the unrestricted Hartree-Fock (UHF) solutions. Following the relation between the Slater determinant and the Pfaffian wave functions [see Eq.(35)], we generate the variational parameters of the Pfaffian wave function from the Hartree-Fock solutions. The program for the UHF calculations is included in mVMC package. As shown in Fig. 7, by taking the Hartree-Fock solution as an initial state, optimization becomes faster than with a random initial state. This result shows that preparing a proper initial state is important for efficient optimization.
We examine the accuracy of mVMC calculation by comparing to the exact diagonalization result. In mVMC calculations, we can improve the accuracy of the wave function by extending sublattice structures in the variational wave functions because larger sublattice structures can take into account the long-range fluctuations. In Table 3, we compare several physical quantities such as the energy per site E/N s , the doublon density D, the nearest-neighbor spin correlations S nn , and the peak value of the spin structure factorS (k). Definitions of the physical quantities are given as follows: where e µ denotes the nearest-neighbor translational vector, and S (k) is defined in Eq. (22). One can see that the accuracy of the energy is improved by taking the large sublattice structures.
In addition to the energy, it is shown that accuracies of other physical quantities are also improved. We also perform the first-step power Lanczos calculation, which can systematically improves the accuracy of the wave functions. As a result, we show that the first-step power Lanczos iteration improves the accuracy of the energy although other physical properties are not sensitive to the Lanczos iteration at least for the system sizes studied here. The relative error becomes about 10 −4 (10 −2 %) for the best mVMC calculation [mVMC(4 × 4)+Lanczos]. This result shows that mVMC can generate highly accurate wavefunctions in the Hubbard model. Next, in Table 4, we show the results for the Heisenberg model on the square lattice, which is a strong coupling limit of the Hubbard model at half filling. An example of the input file for the 4 × 4-site Heisenberg model is shown as follows: Because the exact diagonalization can be performed up to 6 × 6 square lattice within the realistic numerical cost, we compare mVMC calculations to exact diagonalization results for the 4×4 and 6×6 Heisenberg model. In addition to the energy, we calculate the nearest-neighbor spin correlations S nn , the next-nearestneighbor spin correlations S nnn , and the spin structure factors S (k peak ). Definitions of the physical quantities are the same as those of the Hubbard model. As it is shown in the Hubbard model, by taking a large sublattice structure and performing the power Lanczos method, the accuracies of the wave functions Table 3: Comparisons with exact diagonalization (ED) for 4 × 4 Hubbard model with U = 4 and t = 1 at half filling. ED is done by using HΦ [7,57]. mVMC(2 × 2/4 × 4) means f i j has the 2 × 2/4 × 4 sublattice structure.N p is the size of S matrix, which is the number of variational parameters and k peak = (π, π). Error bars are denoted by the parentheses in the last digit. Lanczos means that the first-step Lanczos calculations on top of the mVMC calculations. In the Lanczos calculations, to reduce the numerical cost, we calculate the diagonal spin correlations such as S z nn = 3/4N s i,µ S z r i · S z r i +eµ andS z (k) = S z (k)/N s , which are equivalent to S nn andS (k) when the spin-rotational symmetry is preserved. are improved. For the best mVMC calculations, the relative error η becomes 10 −8 (10 −6 %) for L = 4, and 10 −5 (10 −3 %) for L = 6. For L = 4, because the Hilbert space is small (about 16000), mVMC with 4 × 4 sublattice structure gives nearly the exact ground-state energy. In this situation, the power-Lanczos calculations become unstable because the variance becomes almost zero. Thus, we do not show the results of the power-Lanczos method for mVMC (4 × 4) in Table 4.
To examine the accuracy of mVMC beyond the system size that can be treated by the exact diagonalization, we calculate the L × L Hubbard and Heisenberg models.For large system sizes, the numerical cost becomes large, we only perform the 2 × 2 sublattice calculations and do not perform the power-Lanczos calculations. In Fig. 8, we show the size dependence of theS (π, π)/N s for the Hubbard model with U/t = 4 and the Heisenberg model. It has been shown that the ground states of the Hubbard model on the square lattice and the Heisenberg model on the square lattice have long-range antiferromagnetic order. To detect the antiferromagnetic order, we perform the size-extrapolation of the spin structure factors and obtain the spontaneous magnetization m s , which is given by We note that the full saturated magnetic moment is given by m s = 1 in this definition. For the Heisenberg model, we plot results from the unbiased quantum Monte Carlo method in Ref. [90] and compare with mVMC calculations. For L ≤ 10, the agreement is nearly perfect, i.e., mVMC can well reproduce the results by the QMC calculations for larger system sizes beyond the exact diagonalization. We find that the deviation from QMC results becomes larger for L ≥ 12. We estimate m s = 0.67 from mVMC calculations, which is higher than the estimate by the QMC calculations (m s = 0.607). This deviation originates from the lower accuracy for the larger system sizes. By taking large-sublattice structure and performing power-Lanczos calculation, one can resolve this slight deviations. However, because such extended calculations are expensive and beyond the scope of this paper, we do not perform such calculations. For the Hubbard model with U/t = 4, we obtain m s = 0.54 which is again slightly larger than those from the QMC calculations (m s ∼ 0.48 [32,91]).

Kondo lattice model on the one-dimensional chain
We apply mVMC to the Kondo lattice model in which the local spin-1/2 spins couple with the conduction electrons through the Kondo coupling J [92]. In mVMC, we describe the local spin by completely excluding the double occupancy. To examine the accuracy of mVMC for the Kondo lattice model, we perform mVMC calculations for one-dimensional Kondo lattice model. An example of the input file for the 8-site Kondo-lattice model is shown as follows:  Table 4: Comparisons with exact diagonalization for 4 × 4 and 6 × 6 Heisenberg model with J = 1. We note k peak = (π, π). The relative errors η become 10 −6 % for L = 4 and 10 −3 % for L = 6, respectively. The definitions of the spin correlations in the Lanczos method and N p are same as those of Table 3.  Table 5: Comparisons with exact diagonalization for one-dimensional Kondo-lattice model with J = 1, t = 1, and L = 8. Notations are the same as Table 3. Upper (Lower) panel shows the results for spin singlet (triplet) sector. In the triplet sector (S = 1), we take total momentum as K = π, which gives the lowest energy in S = 1, while we take total momentum as K = 0 for S = 1. The definitions of the spin correlations in the Lanczos method are the same as those of Table 3  In Table 5, we compare mVMC calculations to the exact diagonalization results. In the mVMC calculations, we examine the size dependence of sublattice structure and the improvement by the power-Lanczos method. Equally to the previous cases, the mVMC calculations for the Kondo lattice model well reproduce the results of the exact diagonalization and the relative error becomes 10 −4 %. Other physical quantities such as the local spin correlations (S onsite ) and next-neighbor spin correlations for local spins (S loc nn ), and the spin structures are well reproduced as shown in Table 5. Definitions of the physical quantities are given as where S i (s i ) denotes the spin operators for local (conduction electrons) spins. In addition to the ground state, by selecting the different quantum number, it is possible to obtain the low-energy state by using mVMC. In the lower panel in Table 5, we show the results of the spin triplet sector. We note that the total momentum of the lowest-energy state in the spin triplet sector is given by K = π. As shown in Table 5, mVMC well reproduces the low-energy-excited state.

Summary
In summary, we have exposited the basic usage of mVMC in Sec. 2 such as the way to download and build mVMC and how to begin calculations by mVMC. In Standard mode, by preparing one input file whose length is typically less than ten lines, one can perform mVMC calculations for standard models in the condensed matter physics such as the Hubbard model, the Heisenberg model, and the Kondo-lattice model. By changing keywords for lattice, one can treat several lattice structures such as the one-dimensional chains, the square lattice, the triangular lattice, and so on. We have also presented the visualization tools included in mVMC package.
In Sec.3, we have explained the basic algorithms used in mVMC calculations such as the sampling method and the update techniques in the sampling. We have also detailed the wavefunctions implemented in mVMC and the quantum number projections. In Sec.3.4, we have expounded on the optimization method used in mVMC, i.e., the basics of the SR method. In order to solve the large linear equations in the SR method, we employ the efficient algorithm proposed by Neuscamman et al. (SR-CG method) [78]. The SR-CG method is detailed in Sec.3.4.2. Although the SR-CG method is an efficient method for optimizing a large number of parameters, more efficient method for smaller number of variational parameters (<2000) were proposed [93]. It is an intriguing future issue to examine it for the small size calculations.
In Sec. 4, we have shown several benchmark results of mVMC calculations. For the Hubbard model, Heisenberg model, and the Kondo-lattice model, we have compared mVMC calculations to the exact diagonalization results and available unbiased calculations in the literature. For all the models, we have shown that mVMC well reproduces the results by the exact diagonalization and the unbiased calculations by the QMC. We note that mVMC well reproduces not only energy but also other physical quantities such as the spin correlations. For the Kondo lattice model, we have presented the low-energy excited state by choosing the spin triplet sector through the quantum number projections and have shown that mVMC also reproduces the low-energy excited states with high accuracy. All results show that mVMC generates highly accurate wavefunctions for the quantum many-body systems.
Recent studies show that the accuracy of the VMC method is much more improved by introducing the backflow correlations for Pfaffian wavefunctions [50] and combining with the tensor network method [94]. Furthermore, in addition to the ground-state calculations, recent studies show that the VMC method can be applicable to the real-time evolution and the finite-temperature calculations in the strongly correlated electron systems [50,52]. Implementation of such extensions in mVMC is a promising way to make mVMC more useful software and will be reported in the near future.

Acknowledgements
We would like to express our sincere gratitude to Daisuke Tahara for providing us his code of variational Monte Carlo method. A part of mVMC is based on his original code. We also acknowledge Hiroshi Shinaoka, Youhei Yamaji, Moyuru Kurita, Ryui Kaneko, and Hui-Hai Zhao for their cooperations on developing mVMC. We would also like to thank the support from "Project for advancement of software usability in materials science" by Institute for Solid State Physics, University of Tokyo, for development of mVMC ver.1.0. This work was also supported by Grant-in-Aid for Scientific Research (16H06345 and 16K17746) and Building of Consortia for the Development of Human Resources in Science and Technology from the MEXT of Japan. We also thank numerical resources from the Supercomputer Center of Institute for Solid State Physics, University of Tokyo. KI was financially supported by Grant-in-Aid for JSPS Fellows (No. 17J07021) and Japan Society for the Promotion of Science through Program for Leading Graduate Schools (MERIT). We thank the computational resources of the K computer provided by the RIKEN Advanced Institute for Computational Science through the HPCI System Research project, as well as the project "Social and scientific priority issue (Creation of new functional devices and high-performance materials to support next-generation industries; CDMSI)" to be tackled by using post-K computer, under the project number hp160201, and hp170263 supported by Ministry of Education, Culture, Sports, Science and Technology, Japan (MEXT) .

Appendix A. Anti-periodic boundary conditions
In mVMC, we can specify the phase for the hopping through the cell boundary with phase0 (a 0 direction) and phase1 (a 1 direction). For example, a hopping from the ith site to the jth site through the cell boundary with the positive a 0 direction becomes By using phase0 and phase1 we can treat the anti-periodic boundary conditions. For example, by taking phase0=π in the one-dimensional chain, we can treat the anti-periodic boundary conditions. When the anti-periodic boundary condition is used, the sign of the creation/annihilation operator changes when the site index is translated across the boundary of the simulation cell. This causes the sign change of the pair wavefunction f i j in the momentum projection or the sub-lattice separation. For example, we demonstrate the case of one dimensional four-sites system with the anti-periodic boundary condition. If we translate a term f 03 c † 0↑ c 3↓ with +2, this term becomes where T +2 is the translation operator, and we use the antiperiodic boundary condition at the last equation. If we divide this system into two sub-lattices, the wavefunction should be invariable to the translation T +2 . Therefore, the variational parameter must satisfy the condition f 03 = − f 21 .
In addition, we also should consider the sign-change in the momentum projection. The translation T R which associate to the vector R is applied to the two-body wavefunction as follows: When the site translated from site i with the vector R [T R (i)] crosses the boundary, it is out of the range from 1 to N s . We define the siteT R (i) as a site folded into the original simulation cell from the site T R (i) by using the periodicity and s R (i)(= ±1) as a sign caused during the translation from T R (i) toT R (i) across anti-periodic boundary, i.e., T R c i = s R (i)cT R (i) . Then Eq. (A.3) becomes where k = T R (i), l = T R ( j).
In mVMC, we can specify the anti-symmetric condition of two-body wavefunctions. (part f i j = − f kl ) as well asT R (i) and s R (i) for the quantum number projection. If we use antiperiodic boundary condition in Standard mode, they are automatically adjusted without any additional work.

Appendix B. Relation between a Pfaffian wave function and single Slater determinant
In this section, we show the relation between Pfaffian wavefunctions and Slater determinants, which are defined as where I, J denote the site index including the spin degrees of freedom. We assume that Φ is the normalized orthogonal basis, i.e., where δ nm is the Kronecker's delta. Due to this normalized orthogonality, we obtain the following relations: [ψ † n , ψ m ] + = δ nm , (B.4) Here, we rewrite |φ SL and obtain an explicit relation between F I J and Φ In . By using the commutation relation for ψ † n , we rewrite |φ SL as |φ SL ∝ where K † n = ψ † 2n−1 ψ † 2n and we use the relation K † n K † m = K † m K † n for n m. This result shows that F I J can be described as We note that this is one of expressions of F I J for single Slater determinant, i.e, F I J depend on the pairing degrees of freedom (how to make pairs from ψ † n ) and gauge degrees of freedom (we can arbitrarily change the phase of Φ as Φ I → e iθ Φ I ). This large number of degrees of freedom is the origin of huge redundancy in F I J .
For S z = 0 case, i.e., N up = N down , the anti-parallel Pfaffian wave functions and Slater determinant are given as |0 . (B.12) By using the similar argument, we obtain the relation as follows:

Appendix C. Differentiation of Pfaffian
We prove the relation given in Eq. (71). To prove the relation, we use the following relations given as where A is a skew-symmetric matrix, B is an arbitrary square matrix, and δx denotes infinitesimal quantity. A proof of Eq. (C.1) is shown in the literature [32] and it is easy to prove Eq. (C.2) by simple calculations. It is possible to expand a Pfaffian with respect to small δx as follows: For simplicity, we consider the derivative with respect to the real part of F I J or f i j . In mVMC, because (X) I J = F I J − F JI , its derivative with respect to F R AB is given by where we use the fact that X −1 is also a skew-symmetric matrix. In a similar way, for the spin-projected wave functions, the derivative with respect to f R ab is given by and we obtain Tr X −1 ∂Pf(X) ∂ f R ab = −2 σ,σ κ(σ, σ )(X −1 ) aσ,bσ . (C.10) Using the above relations, it is easy to obtain Eq. (75).