Permutation blocking path integral Monte Carlo: A highly efficient approach to the simulation of strongly degenerate non-ideal fermions

Correlated fermions are of high interest in condensed matter (Fermi liquids, Wigner molecules), cold atomic gases and dense plasmas. Here we propose a novel approach to path integral Monte Carlo (PIMC) simulations of strongly degenerate non-ideal fermions at finite temperature by combining a fourth-order factorization of the density matrix with antisymmetric propagators, i.e., determinants, between all imaginary time slices. To efficiently run through the modified configuration space, we introduce a modification of the widely used continuous space worm algorithm, which allows for an efficient sampling at arbitrary system parameters. We demonstrate how the application of determinants achieves an effective blocking of permutations with opposite signs, leading to a significant relieve of the fermion sign problem. To benchmark the capability of our method regarding the simulation of degenerate fermions, we consider multiple electrons in a quantum dot and compare our results with other ab initio techniques, where they are available. The present permutation blocking path integral Monte Carlo approach allows us to obtain accurate results even for $N=20$ electrons at low temperature and arbitrary coupling, where no other ab initio results have been reported, so far.

The widely used path integral Monte Carlo (PIMC) method, e.g. [17], is a highly successful tool for the ab initio simulation of both distinguishable particles ('boltzmannons', e.g. [18,19]) and bosons [17] and allows for the calculation of quasi-exact results for up to N 10 3 ∼ particles [20] at finite temperature. However, the application of PIMC to fermions is hampered by the notorious sign problem [21], which renders even small systems unfeasible for state of the art techniques and has been revealed to be NP-complete for a given representation [22]. With increasing exchange effects, permutation cycles with opposite signs appear with nearly equal frequency and the statistical error increases exponentially. For this reason, standard PIMC is applicable to fermions only at weak degeneracy, that is, at relatively high temperature or low density.
The recently introduced configuration path integral Monte Carlo (CPIMC) method [9,23,24] exhibits a complementary behavior. This conceptually different approach can be interpreted as a Monte Carlo simulation on a perturbation expansion around the ideal quantum system and, therefore, CPIMC excells at weak nonideality and strong degeneracy. Unfortunately, the physically most interesting region, where both fermionic exchange and interactions are strong simultaneously, remains out of reach.
A popular approach to extend standard PIMC to higher degeneracy is Restricted PIMC (RPIMC) [25], also known as fixed node approximation. This idea requires explicit knowledge of the nodal surfaces of the density matrix, which are, in general, unknown and one has to rely on approximations, thereby introducing an uncontrollable systematic error. In addition, it has been shown analytically [26,27] that RPIMC does not reproduce the exact density matrix in the limit of the ideal Fermi gas and, therefore, the results become unreliable at increasing degeneracy [9].
Recently, DuBois et al [28] have suggested that, at least for homogeneous systems, the individual exchange probabilities in PIMC are independent of the configuration of other permutations present and that permutation frequencies of large exchange cycles can be extrapolated from few-particle permutations. This would allow for a significant reduction of the configuration space and a drastic reduction of the sign problem. While first simulation results with this approximation for the short-range interacting 3 He are in good agreement with experimental data [28], the existing comparison [9] for long-range Coulomb interaction is insufficient to assess the accuracy and, in addition, inhomogeneous systems remain out of reach.
Another possibility to relieve the sign problem in fermionic PIMC without introducing any approximations is the usage of antisymmetric imaginary time propagators, i.e., determinants [10,[29][30][31]. It is well known that the sign problem becomes more severe with an increasing number of propagators arising from the Trotter-type factorization of the density operator. Consequently, it has been proposed to combine the antisymmetric propagators with a higher order factorization [32][33][34][35] of the density matrix. This has recently allowed to obtain an accurate estimate of the ground state energy of degenerate, strongly nonideal electrons in a quantum dot [36].
In the present work, we extend this idea to finite temperature. For this purpose, we combine a fourth-order propagator derived in [37], which has already been succesfully applied to PIMC by Sakkos et al [38], with a full antisymmetrization on all time slices to simulate fermions in the canonical ensemble. We demonstrate that the introduction of determinants effectively allows for the combination of N! configurations from usual PIMC into a single configuration weight, thereby reducing the complexity of the problem and blocking both positive and negative weights to drastically increase the sign. To efficiently exploit the resulting configuration space with the Metropolis algorithm [39] at arbitrary parameters, we develop a set of Monte Carlo updates similar to the usual continuous space worm algorithm (WA) [20,40].
To demonstrate the capability of our permutation blocking (PB-PIMC) method, we consider Coulomb interacting fermions in a 2D harmonic confinement, cf equation (30), which can be experimentally realized e.g. by spin-polarized electrons in a quantum dot [1][2][3][4]. Figure 1(A) shows the average sign S for N = 20 electrons, plotted versus the coupling strength λ, cf equation (31). CPIMC is applicable in the weakly nonideal regime [I], where the system is predominantly shaped by the Fermi statistics. In contrast, standard PIMC allows one to accurately simulate systems in the strongly coupled regime [III], where exchange effects are not yet dominating, and bosons and fermions exhibit a very similar behavior. The PB-PIMC method, as will be shown in this work, is applicable over the entire coupling range yielding reasonably accurate results with acceptable computational effort. Interestingly, this includes the physically most interesting transition region [II], where both the Coulomb repulsion and quantum statistics govern the system. Here no ab initio results have been reported to this date, except for very small particle numbers, since PIMC and CPIMC fail, due to the sign problem. In panel (B), we show density profiles from all three regimes. Evidently, the transition from the strongly coupled system with a pronounced shell structure ( 15 λ = ) to the nearly ideal Fermi gas with the characteristic weak density modulations ( 0.1 λ = ) can be resolved. In the remainder of this work, we introduce the PB-PIMC method in detail. We show that the optimal choice of two free parameters of the fourth-order factorization allows for a calculation of energies and densities with an accuracy of the order of 0.1% with as few as two or three propagators, even in the low temperature regime. We calculate energies and densities from PB-PIMC for N = 20 electrons at low temperature over the entire coupling range. We find excellent agreement with both PIMC and CPIMC in the limitting cases of strong and weak coupling, respectively, and perform simulations in the transition regime, where no other ab initio results are available. Finally, we investigate the performance behavior of our method when the system size is varied.

Idea of PB-PIMC
We consider the canonical ensemble (the particle number N, volume V and inverse temperature k T 1 B β = are fixed) and write the partition function in coordinate representation as contains the coordinates of all particles andπ σ denotes the exchange operator corresponding to a particular element σ from the permutation group S N . The Hamiltonian is given by the sum of the kinetic (K ) and potential (V ) energy, H K Vˆ= + . For the next step, we use the group property of the density operatorˆe e , Note that we have exploited the permutation operatorʼs idempotency property in equation (3) to introduce antisymmetry on all P imaginary time slices. Following Sakkos et al [38], we introduce the factorization from [37], e e e e e e e , for each of the exponential functions in equation (3). By including double commutator terms of the form we have to evaluate the total force on each particle, , and equation (4) is accurate to fourth order in ϵ. The explicit form of the modified potential terms Ŵ is given by  is visualized in figure 2. The inverse temperature β has been split into P = 4 intervals of length ϵ, which are further divided into three, in general, nonequidistant sub-intervals. Thus, for each main 'bead' τ α , there exist two daughter beads, A τ α and B τ α . Let us for a moment ignore the antisymmetry in equation (3) and evaluate the imaginary time propagator in a straightforward way [38]: with the definitions of the potential terms  The right panel illustrates the combination of all PN 3 ! possible trajectories into a single configuration weight W X ( ). Between each two adjacent time slices, both the connection between beads from the same particle (diagonal elements of the diffusion matrix, the blue and red lines) and between beads from different particles (off-diagonal elements, the green lines) are efficiently grouped together to improve the average sign.
where λ β denotes the thermal wavelength m 2 2 2 λ π β = ℏ β and D is the dimensionality of the system. Thus, the matrix elements of equation (10) are equal to the free particle density matrix, i j t r r ( , ) ( , , ) The permutation operator commutes with bothρ and Ĥ and we are, therefore, allowed to artificially introduce the antisymmetrization between all P 3 slices without changing the result. This transforms equation (8) to Finally, this gives the partition function and the integration is carried out over all coordinates on all P 3 slices: The benefits of the partition function equation (12) are illustrated in the right panel of figure 2 where the beads of two particles are plotted in the τ-x-plane. In the usual PIMC formulation (without the determinants), each of the particles would correspond to a single closed trajectory as visualized by the blue and red connections. To take into account the antisymmetry of fermions, one would also need to sample all configurations with the same positions of the individual beads but different connections between adjacent time slices, which have both positive and negative weights. By indroducing determinants between all slices, we include all N! possible connections between beads on adjacent slices (the green lines) into a single configuration weight and the usual interpretation of mapping a quantum system onto an ensemble of interacting ringpolymers [41] is no longer appropriate. Therefore, a large number of sign changes, due to different permutations, are grouped together resulting in an efficient compensation of many terms (blocking), and the average sign (cf equation (22)) in our simulations is significantly increased [31].

Energy estimator
The total energy E follows from the partition function via the familiar relation Substituting the expression from equation (12) into (14) and performing a lengthy but straightforward calculation gives the final result for the thermodynamic (TD) estimator  To split the total energy into a kinetic and a potential part, we evaluate and find the TD estimator of the kinetic energy Thus, the estimator of the potential energy is given by We notice that the forces contribute to both the kinetic and the potential energy. For completeness, we mention that, for an increasing number of propagators, P → ∞, the first and second terms in equation (15) diverge, which leads to a growing variance and, therefore, statistical uncertainty of both E and K. To avoid this problem, one might derive a virial estimator, e.g. [42], which requires the evaluation of the derivative of the potential terms instead. However, since we are explicitly interested in performing simulations with few propagators to relieve the fermion sign problem, the estimator from equation (15) is sufficient.

Monte Carlo algorithm
In section 2, we have derived an expression for the partition function Z, equation (12), which incorporates determinants of the diffusion matrices between all P 3 time slices, thereby combining PN 3 ! different configurations from the usual PIMC into a single weight W X ( ). However, each determinant can still be either positive or negative, depending on the relative magnitude of diagonal and off-diagonal elements. Hence, we apply the Metropolis algorithm [39] to the modified partition function and calculate fermionic expectation values as with the definition of the average sign and the signum of the configuration X, Let us summarize some important facts about the configuration space defined by equation (20): (i) With increasing number of propagators P, the effect of the blocking decreases and, for P → ∞, the sign converges to the sign of standard PIMC. Blocking is maximal if t 1 λ ϵ and t 2 0 λ ϵ are comparable to the average interparticle distance d, cf figure 3. Only in such a case, there can be both large diagonal and off-diagonal elements in the diffusion matrices.
(ii) Configuration weights W X ( ) | | can only be large, when at least one element in each row of each diffusion matrix is large. Therefore, we sample either large diagonal or large off-diagonal elements. Blocking happens naturally as a by-product and does not have to be specifically included into the sampling. This also means that we have to implement a mechanism to sample exchange, i.e., to switch between large diagonal and offdiagonal diffusion matrix elements.
(i) There are no fixed trajectories. Therefore, beads do not have a previous or a next bead, as in standard PIMC.
For an efficient and flexible sampling algorithm, we temporarily construct artificial trajectories and choose the included beads randomly.
The most efficient mechanism for the sampling of exchange cycles in standard PIMC is the so-called worm algorithm [20,40], where macroscopic trajectories are naturally realized by a small set of local updates which enjoy a high acceptance probability. In the rest of the section, we modify this algorithm to be applicable to the new configuration space without any fixed connections between individual beads.

Sampling scheme
To take advantage of the main benefits from the usual continuous space WA, we will temporarily construct artificial trajectories and sample new beads according to standard PIMC techniques, e.g. [43]. The initial situation for our considerations is illustrated in the left panel of figure 4, where a pre-existing trajectory (pink curve) with four missing beads in the middle is shown in the τ-x-plane. We choose the sampling probability to close the configuration as which is a Gaussian (cf the blue curves in figure 4) with the variance around the intersection of the connection between the previous coordinate, r i 1 − , with the end point r M and the time slice i τ r r.

Artificial worm algorithm
In the usual WA-PIMC, the configuration space is defined by the Matsubara Green function (e.g. [44]) which implies that the algorithm does not only allow for the change of the particle number N (grand canonical ensemble) but, in addition, requires the generation of configurations with a single open path, the so-called worm. However, in the PB-PIMC configuration space defined by equation (12), there are no trajectories and, therefore, no direct realization of a worm is possible. Instead, we consider an extended ensemble, which . Only with few propagators, the thermal wavelength λ ϵ of a single propagator is comparable to the mean interparticle distance d, which is crucial for an efficient grouping of permutations into a single configuration weight. With increasing P, diagonal (red and blue lines) and off-diagonal (green lines) distances are no longer of the same order and the permutation blocking is inefficient.
combines closed configurations with a total of NP 3 beads and open configurations, where on some consecutive time slices the number of beads is reduced by one, to N 1 − . Such a configuration is illustrated in the right panel of figure 4. There are two special beads which are denoted as 'head' and 'tail' and the triangles, circles and squares symbolize beads from three different particles. There are eight beads from different particles missing (indicated by the empty symbols at the right boundary) between For most slices, the computation of the diffusion matrix allows for no degree of freedom in the extended ensemble. We define the latter in a way, that the head bead does not serve as a starting point for the elements but is treated as if it was missing. This is justified because, otherwise, there does not necessarily exist a large matrix element in this particular row because no artificial connection has been sampled on the next slice. For the configuration from figure 4, the diffusion matrix of the headʼs time slice is given by However, we emphasize that the particular choice of the extended ensemble does not influence the extracted canonical expectation values as long as detailed balance is fulfilled in all updates. We have developed a simulation scheme which consists of four different types of moves that ensure detailed balance and ergodicity. The updates are presented in detail in the appendix. . The circles, triangles and squares distinguish beads from three different particles and the empty symbols at the right boundary indicate the missing beads on a particular slice.

Simulation results
As a test system to benchmark our method, we consider N spin-polarized electrons in a quantum dot [1][2][3][4], which can be described approximately by a harmonic confinement with a frequency Ω. We use oscillator units, i.e., the characteristic energy scale E 0 Ω = ℏ and oscillator length l m Ω = ℏ , and obtain the dimensionless Hamiltonian being defined as the ratio of Coulomb and oscillator energy. For large λ, the electrons are strongly coupled and exchange effects become negligible (region [III] in figure 1), while, for 1 λ ≪ , the ideal Fermi gas will be approached and the system is governed by the fermionic exchange (region [I] in figure 1). To confirm the quality of our simulations, we compare the results at weak and strong coupling with CPIMC and standard PIMC, respectively, where they are available.

Optimal choice of a 1 and t 0
We start the discussion of the simulation results by investigating the effects of the two free parameters a 1 and t 0 on the convergence of two different observables, namely the energy E and radial density n r ( ). ] denote two different combinations of free parameters and exhibit a clearly different convergence behavior towards the exact result known from CPIMC, i.e., the black line. For P = 2, the energy with parameter set (a) is too low by almost one percent. With increasing P, E increases and reaches a maximum around P = 5, until the curves approach the exact energy from above. For parameter set (b), the energy converges monotonically from above and, even for P = 2, the deviation from the CPIMC result is as small as 0.2%. The selected energies which are listed in table 1 reveal that the total energy is converged for P = 14 within the statistical uncertainty. For the panels (C) and (D), the energy has been split into a potential (V) and kinetic (K) contribution. For both parameter combinations, V converges monotonically, although from different directions. In addition, parameter set (b) gives a much better result for small P. Panel (D) reveals, that the kinetic energy K is responsible for the non-monotonous convergence of E for parameter set (a), which again delivers worse results for P = 2, as compared to the blue circles. Finally, panel (B) shows the average sign S as a function of P 1 . Both curves exhibit a similar decrease with an increasing number of propagators, as it is expected. However, parameter set (a) always allows for a better sign than (b). The reason for this behavior is the free parameter t 0 , which controls the relative spacing between the three time slices of an imaginary time step ϵ. For t 0.04 0 = , there are a single small and two large steps. The latter allow for more blocking, since the corresponding decay length t 1 λ ϵ in the diffusion matrices is large as well. For t 0.13 0 = , on the other hand, there are three nearly equal steps, each of which with a smaller decay length than the two large ones for parameter set (a). Therefore, less blocking is possible and more determinants with a negative sign appear in the Markov chain.
The different convergence behaviors of the two free parameter combinations for small P leads to the question how to choose t 0 and a 1 for optimal results. To provide an answer, we consider the same system as in figure 5, and investigate the accuracy of the total energy as a function of t 0 , for a fixed a 0.33 1 = . The simulation results are shown in the left panel of figure 6 for P = 2 (red squares), P = 3 (blue circles) and P = 4 (green diamonds). All three curves exhibit a similar decay towards the exact value starting from small t 0 , followed by a minimum around t 0.14 0 = and finally an increasing error for larger values. We note that as few as two propagators allow for an accuracy of E E 2 10 3 Δ | | < × − for the best choice of the free parameters. Figure 6(B) shows the dependency of the average sign S on t 0 . Again, we observe that S decreases with increasing t 0 as explained during the discussion of figure 5. In addition, it is revealed that the combination of P = 4 and t 0.01 0 = leads to a larger sign than P = 3 and t 0.10 0 > . However, the optimum free parameters allow for a higher accuracy even for P = 2, compared to small t 0 with more propagators. Therefore, it turns out to be advantagous to use the fourth order factorization with the two free parameters despite the smaller average sign for the same P compared to the factorization with only a single daughter slice for each propagator, i.e., t 0.0 0 = . Finally, we mention that the optimal choice of a 1 and t 0 depends on the observable of interest. In figure 7, we investigate the effects of the free parameters on the convergence of the radial density distribution n r ( ) for the same system as in figures 5 and 6. The left panel shows n as a function of the distance to the center of the trap, r, for four different P and the parameter combination a 0.33 1 = and t 0.13 0 = , which has been proven to allow for nearly optimum energy values at P = 2, cf figure 6. The black curve corresponds to P = 10 and is converged within statistical uncertainty. For P = 2 (red diamonds), there appear significant deviations to the latter, in particular n is too large around the maximum r 1. 25 ≈ and too small at the boundary of the system. The P = 3 results (blue squares) exhibit the same trends although the differences towards the black curve are reduced. Finally, the density for P = 4 (green circles) can hardly be distinguished from the converged data. The right panel compares the density for P = 2 with two different combinations of free parameters. The red diamonds (parameter set (a)) correspond to the curve from the left panel and the green circles (parameter set (b)) to a 0.0 1 = and t 0.04 0 = . The latter parameters clearly allow for a density distribution which is much closer to the exact results than the a 1 and t 0 values which provide the optimal energy.

Temperature dependence
In the last section, we have demonstrated that the optimal choice of the free parameters a 1 and t 0 allows for the calculation of energies with an accuracy of 0.1% with as few as two propagators, even at a relatively low temperature, 5.0 β = . However, with decreasing T (i.e., increasing β) the number of required propagators must be increased to keep the commutator error fixed. In figure 8, we investigate the effect of a decreasing temperature on the accuracy provided by a few propagators P for N = 4 electrons at indermediate coupling, The left panel shows the total energy E as a function of the inverse temperature β. We compare results for P = 2 (green circles), P = 3 (red diamonds) and P = 4 (blue triangles) to exact results from CPIMC (black stars). At larger temperature, 7.0 β ⩽ , all four datasets nearly coincide and exhibit the expected decrease towards the energy of the ground state. With increasing β, the P = 2 results exhibit an unphysical drop because two propagators are not sufficient and the commutator errors become more significant. The red and blue curves exhibit a qualitatively similar trend, however, the energy drop is weaker and shifted to lower temperature. Even at 10.0 β = , which is already very close to the ground state, three propagators allow for an accurate description of the system.
In the right panel of figure 8, the average sign S is plotted versus the inverse temperature. At small β, the wavefunctions of the electrons do not overlap and, hence, the system is not degenerate. With decreasing temperature, exchange effects become increasingly important which leads to a decrease of S. However, while for standard PIMC the sign is expected to exponentially decrease with β, S seems to converge for PB-PIMC with P = 3 and P = 4 and exhibits an even slightly non-monotonous behavior for P = 2. The application of antisymmetric propagators leads to a competition with respect to S and β. On the one hand, with increasing inverse temperature off-diagonal matrix elements are increased, which leads to more negative determinants and, therefore, more negative weights in the Markov chain. On the other hand, the thermal wavelengths  are increasing with β, which makes the blocking of large diagonal and off-diagonal elements more effective. Hence, the sign can even become larger with β once the system has reached the ground state, because the particle distribution remains constant while more elements in the diffusion matrix compensate each other in the determinants.
We conclude that few propagators allow for the calculation of accurate results up to low temperature, 10.0 β ⩽ . For higher β, the system is in its ground state and finite temperature PIMC is no longer the method of choice.

Dependence on the coupling strength
In the previous sections, we have restricted ourselves to the investigation of small systems to illustrate the convergence and sign behavior depending on relevant parameters. In this section, we demonstrate that PB-PIMC allows for the calculation of accurate results at parameters where no other ab initio results have been reported, so far. Figure 9 shows results for N = 8 and N = 20 electrons at 3.0 β = over a wide range of coupling parameters, λ. In panel (A), the average sign S is plotted versus λ for standard PIMC (squares), CPIMC (circles) and the present PB-PIMC (diamonds) with P = 2 and the parameter sets t 0.14 0 = and a 0.33 1 = (N = 8, blue symbols) and t 0.10 0 = and a 0.33 1 = (N = 20, red symbols), which are known to allow for accurate energies, cf figure 6. It is well understood that PIMC allows for the simulation of strongly coupled fermions, where exchange effects do not play a dominant role. With decreasing λ, the sign exhibits a sharp drop and the sign problem prevents the simulation within feasible computation time for 2.0 λ ⩽ and 5.0 λ ⩽ , respectively. Evidently, larger systems lead to a more severe decrease of S at larger coupling strength. CPIMC, on the other hand, can be interpreted as a Monte Carlo simulation on a perturbation expansion around the ideal quantum system, i.e., 0.0 λ = . Hence, the method efficiently provides exact results for small coupling, where the system is close to an ideal one. For N = 20 around 0.3 λ ≈ , the sign almost instantly drops from S 0.97 ≈ towards zero, and CPIMC is no longer applicable, without further approximation. This means that, in particular for larger systems, there have only been results for systems that are (a) almost ideal or (b) so strongly coupled that fermions and bosons lead to nearly equal physical properties. The physically particularly interesting regime where Coulomb correlations and Fermi statistics are significant simultaneously, has remained out of reach.
However, the average sign from PB-PIMC exhibits a much less severe drop with decreasing λ than standard PIMC and saturates for 0.7 λ ⩽ . For N = 8, the average sign remains above S 0.08 = , which allows for good accuracy with relatively low effort. The small sign, S 10 3 ∼ − , for N = 20 indicates that the simulations are computationally involved but, in contrast to PIMC and CPIMC, still feasible. In panel (B) of figure 9, the total energy E for N = 20 is plotted versus λ over the entire coupling range and the statistical uncertainty from the PB-PIMC results is smaller than the size of the data points. Both, at small and large λ, the P = 2 results are in excellent agreement with the exact energy known from the other methods and, in addition, results are obtained for the particularly interesting transition region (region [II] in figure 1). In panel (C), we show the radial density for N = 20 and low coupling, 0.10 λ = , calculated with the parameter set t 0.04 0 = and a 0.0 1 = , which has been proven effective for accurate densities n r ( ). The PB-PIMC results (red diamonds) are in excellent agreement with the exact CPIMC data (blue squares) over the entire r-range. For completeness, we mention that this combination of parameters allows for an approximately three times as high sign as the choice from panels (A) and (B), which was choosen to allow for a good energy, and the results have been obtained within t 10 CPU 3 ∼ Figure 8. Temperature dependence for N = 4 and 1.3 λ = with t 0.14 0 = and a 0.33 1 = -in the left panel, the total energy is plotted versus the inverse temperature β for P = 2, P = 3 and P = 4 propagators and compared to exact CPIMC results. The right panel shows the behavior of the sign. core hours. Panel (D) shows the density of a strongly coupled system, 15.0 λ = , and N = 20. Again, the two propagators already provide very good agreement with the exact curve. In figure 1(B), we have shown density profiles for coupling parameters over the entire coupling range. At 15 λ = (red pluses), there are three distinct shells and the physical behavior is dominated by the strong Coulomb repulsion. Decreasing the coupling to 5 λ = (green bars) leads to a reduced extension of the system, and the three shells exhibit a much larger overlap. At indermediate coupling, 2 λ = (blue crosses), both the interaction and fermionic exchange govern the system. The density profile is still significantly more extended than the ideal pendant, but n exhibits modulations instead of a flat curve. Decreasing the repulsion further to 0.7 λ = (pink circles) leads to a further reduction of the extension. However, n does not approach a Gaussian-like profile as for ideal boltzmannons or bosons, but continues to exhibit the density modulations which are characteristic for fermions. For 0.1 λ = , the system is almost ideal and the density is completely dominated by the quantum statistics.
Finally, in figure 10 we compare density profiles for N = 20 particles at 3.0 β = with Fermi-, Bose-and Boltzmann statistics. Panel (A) shows results for intermediate coupling, The distinguishable boltzmannons (blue diamonds) exhibit a nearly flat profile without any shell structure, i.e., a liquid-like behavior. The bosonic particles (green circles) lead to an even smoother curve, with a slightly reduced extension of the system. For fermions (red squares), on the other hand, the exchange already plays a significant role, as the particles exhibit an additional repulsion due to the Pauli principle, and n decays only at larger r. In addition, the fermionic density profile exhibits distinct modulations. In panel (B), we show a comparison for smaller coupling, 0.7 λ = . Again, the boltzmannons and bosons lead to smooth density profiles which are very similar, despite a reduced extension of the Bose-system and an increased density around the center of the trap. The fermions exhibit a different behavior as the system is significantly more extended and the density profile again features distinct modulations.
In conclusion, we have presented ab initio results for the energy and the density for up to 20 electrons over the entire coupling range. A comparison with standard PIMC and CPIMC has revealed excellent agreement in both the limits of weak and strong coupling. A more detailed investigation of the transition from the classical to the degenerate regime, including systematic comparisons with bosons and boltzmannons, is beyond the scope of this work and will be published elsewhere.

Particle number dependence
In the last section, we have shown that the sign problem is more severe for larger systems, cf figure 9(A). Here, we provide a more detailed investigation of the performance of our method in dependence on the particle number.
In figure 11, the average sign S is plotted versus N for 0.1 λ = and 3.0 β = , i.e., a very degenerate system, with two different combinations of free parameters. It is revealed that S exhibits an exponential decay with the system size and, as usual, the smaller t 0 leads to a more effective blocking. Therefore, the PB-PIMC approach still suffers from the fermion sign problem, and feasible system sizes for 2D quantum dots at weak coupling are limited to N 30 ⩽ . This is a remarkable result since standard PIMC simulations for 0.1 λ = and 3.0 β = are possible only for N 4 ⩽ .

Discussion
In summary, we have presented a novel approach to the PIMC simulation of degenerate fermions at finite temperature by combining a fourth-order factorization of the density matrix with a full antisymmetrization between all imaginary time slices. The latter allows to merge PN 3 ! configurations from the standard PIMC formulation into a single configuration weight, thereby efficiently grouping together permutations of opposite  signs which leads to a significant relieve of the fermion sign problem. To efficiently run through the resulting configuration space at arbitrary system parameters, we have modified the widely used continuous space WA by introducing an extended ensemble with open configurations and by temporarily constructing artificial trajectories. We have demonstrated the capabilities of our method by simulating up to N = 20 electrons in a quantum dot. It has been revealed that the (empirical) optimal choice of the free parameters a 1 and t 0 from the fourth order factorization allows for the calculation of energies with an accuracy of 0.1% even for just two propagators. For completeness, we mention that different observables lead to different optimal parameters. We have concluded, that it appears to be favourable to use two instead of a single daughter time slice for each time step ϵ, despite the reduced sign for the same number of propagators.
The investigation of the temperature dependence of the convergence with respect to the number of time steps P has revealed, that as few as three propagators are sufficient to accurately simulate fermions, up to 10.0 β ⩽ . For larger inverse temperatures, the system approaches its ground state and finite temperature PIMC techniques are no longer the methods of choice.
To demonstrate that our PB-PIMC approach allows for the calculation of accurate results for systems beyond the capability of any other quantum Monte Carlo technique, we have simulated N = 20 electrons at relatively low temperature, 3.0 β = , and arbitrary coupling strength. CPIMC excells at weak coupling and provides exact results for 0.3 λ < , i.e., in the region where the systems are still close to the ideal case. Standard PIMC, on the other hand, is applicable at strong coupling 5.0 λ ⩾ where exchange effects are not yet dominating, until the rapid decrease of the sign renders any simulation unfeasible. For PB-PIMC, the sign converges for 0.7 λ ⩽ and, hence, computations are possible at arbitrary degeneracy, in particular, in the physically most interesting transition region between classical and ideal quantum behavior. We find excellent agreement with both PIMC and CPIMC in both the limits of strong and weak coupling. Finally, we have demonstrated that PB-PIMC still suffers from the fermion sign problem, since, as expected, S decreases exponentially with the particle number.
A possible future application of PB-PIMC to the quantum dot system might include the investigation of the transition from the classical to the degenerate quantum regime, in particular a systematic comparison of fermions to bosons and boltzmannons. To describe realistic quantum dots, it will be important to include the spin degrees of freedom into the simulation. In particular, this should allow us to recover, for weak coupling, Hundʼs rules physics and also to address the spin contamination problem [45,46]. Furthermore, it could be interesting to extend the considerations to 3D confinements, e.g. [47,48], and study the impact of quantum statistics on structural transitions [49]. In addition, we expect our method to be of interest for the future investigation of numerous Fermi systems, including the finite temperature homogeneous electron gas [8][9][10], two-component plasmas [11][12][13] and fermionic bilayer systems [5][6][7]. • Select the time slice of the new head, head τ , uniformly from all P 3 slices.
• Select the bead of the new head, r head .
• Select the total number of links to be erased as m M [1,˜] ∈ .
• Select m beads on the next slices from  The random construction of an artificial trajectory (the beads marked by black arrows) is followed by the re-sampling of all beads between its first (start) and last (end) bead. In the right panel, the Swap move is demonstrated. The current head is 'connected' to a random target bead on the time slice of the tail.
• Sample m 1 − new beads according to equation (24) with head and tail being the fixed endpoints: The parameter μ is another degree of freedom of the algorithm and plays the same role as the chemical potential in the usual WA-PIMC scheme.
(iii) Swap: the Swap move very efficiently generates exchange, i.e., allows for a switch between large off-diagonal or diagonal diffusion matrix elements as it is illustrated in the right panel of figure A1. Let m denote the number of missing beads between head and tail.
• Choose a target bead on the slice tail τ according to The head itself cannot be selected on the last slice and the last bead will be the new head after the update.
• 'Connect' the old head with the target bead by re-sampling the m beads between the slices of head and tail according to (iv) Advance/ Recede: these updates move the head forward (backward) in the imaginary time. However, they are optional and, in principle, not needed for ergodicity. The Advance move is executed as follows: • Calculate the number of missing beads between head and tail, α. If 0 α = , the update is rejected.
• Select the number of new beads to be sampled, m [1, ] α ∈ .
• The reverse move is given by Recede. Let κ denote the total number of beads which can be removed. If 0 κ = , the update is rejected.
• Select the total number of beads to be removed as m [1, ] κ ∈ .
• Select m beads backwards starting from the old head from