Localised distributions and criteria for correctness in complex Langevin dynamics

Complex Langevin dynamics can solve the sign problem appearing in numerical simulations of theories with a complex action. In order to justify the procedure, it is important to understand the properties of the real and positive distribution, which is effectively sampled during the stochastic process. In the context of a simple model, we study this distribution by solving the Fokker-Planck equation as well as by brute force and relate the results to the recently derived criteria for correctness. We demonstrate analytically that it is possible that the distribution has support in a strip in the complexified configuration space only, in which case correct results are expected.


Introduction
Complex Langevin (CL) dynamics [1,2] provides an approach to circumvent the sign problem in numerical simulations of lattice field theories with a complex Boltzmann weight, since it does not rely on importance sampling. In recent years a number of stimulating results has been obtained in the context of nonzero chemical potential, in both lower and four-dimensional field theories with a severe sign problem in the thermodynamic limit [3][4][5][6][7][8] (for two recent reviews, see e.g. Refs. [9,10]). However, as has been known since shortly after its inception, correct results are not guaranteed [11][12][13][14][15][16]. This calls for an improved understanding, relying on the combination of analytical and numerical insight. In the recent past, the important role played by the properties of the real and positive probability distribution in the complexified configuration space, which is effectively sampled during the Langevin process, has been clarified [17,18]. An important conclusion was that this distribution should be sufficiently localised in order for CL to yield valid results. Importantly, this insight has recently also led to promising results in nonabelian gauge theories, with the implementation of SL(N, C) gauge cooling [8,10].
The distribution in the complexified configuration space is a solution of the Fokker-Planck equation (FPE) associated with the CL process. However, in contrast to the case of real Langevin dynamics, no generic solutions of this FPE are known (see e.g. Ref. [19]). In fact, even in special cases only a few results are available [11,17,20,21]. In Refs. [17,18] this problem was addressed in a constructive manner by deriving a set of criteria for correctness, which have to be satisfied in order for CL to be reliable. These criteria reflect properties of the distribution and, importantly, can easily be measured numerically during a CL simulation, also in the case of multi-dimensional models and field theories [6].
A widely used toy model to understand CL is the simple integral where the parameters in the action are complex-valued. This model has been studied shortly after CL was introduced [11,22,23], but no complete solution was given. As we will see below, its structure, with complex σ, is relevant for the relativistic Bose gas at nonzero chemical potential [4,20]. Recently, a variant of this model (with σ = 0 and λ complex) was studied by Duncan and Niedermaier [21]: in particular they constructed the solution of the FPE, using an expansion in terms of Hermite functions. They considered the case of "complex noise", in which both the real and imaginary parts of the complexified variables are subject to stochastic kicks. Unfortunately, it has been shown in the past that generically complex noise may not be a good idea, since it leads to broad distributions in the imaginary direction and hence incorrect results [17,18]. This was indeed confirmed in Ref. [21]. In this paper we aim to combine the insights that can be distilled from the criteria for correctness discussed above with the explicit solution of the FPE, adapting the method employed in Ref. [21] to the model (1.1). The paper is organised as follows. In Sec. 2 we discuss CL and the criteria for correctness. To keep the paper sufficiently accessible, we first briefly review how to arrive at the criteria for correctness and subsequently present numerical results, for both real and complex noise. In Sec. 3 we study the probability distribution in the complexified configuration space, by solving the FPE directly as well as by a brute-force construction using the CL simulation, again for complex and real noise (the latter was not considered in Ref. [21]). In Sec. 4 we combine our findings concerning the distribution and the criteria for correctness, and provide a complete characterisation of the dynamics. Sec. 5 contains the conclusion. Finally, in order to see whether the structure found numerically can be understood analytically, a perturbative analysis of the FPE is given in Appendix A.

Complex Langevin dynamics and criteria for correctness
We consider the partition function (1.1). We take λ real and positive, so that the integral exists, while σ is taken complex. Analytical results are available: a direct evaluation of the integral yields where ξ = σ 2 /(8λ) and K p (ξ) is the modified Bessel function of the second kind. Moments x n can be obtained by differentiating with respect to σ. Odd moments vanish.
The aim is to evaluate expectation values numerically, by solving a CL process. We start from the Langevin equation, where the dot denotes differentiating with respect to the Langevin time t and the (Gaussian) noise satisfies After complexification, the CL equations reaḋ with the drift terms (2.7) The form of the drift terms is similar as in the Bose gas, after a reduction to a single momentum mode [20]. The normalisation of the real and imaginary noise components follows from Eq. (2.3) and is given by Here N I ≥ 0 is a free parameter, which can be varied. In principle, expectation values should be independent of the choice of N I , but in practice they are not. Real noise amounts to N I = 0. Expectation values are obtained by averaging over the noise. After this averaging, holomorphic observables evolve according to O P (t) = dxdy P (x, y; t)O(x + iy), (2.9) where the distribution P (x, y; t) satisfies the FPĖ P (x, y; t) = L T P (x, y; t), (2.10) with the FP operator In order to justify the approach, we also consider expectation values with respect to a complex weight ρ(x, t), This equation has a simple stationary solution, ρ(x) ∼ e −S(x) , which is the desired weight.
The task is now to show that the two expectation values O P (t) and O ρ(t) are equal, 14) at least in the limit of large t, making use of the respective FPEs and the Cauchy-Riemann (CR) equations [17,18]. Here it is essential that only holomorphic observables are considered, which evolve according to with the Langevin operatorL We note that for holomorphic observables,L = L, where L is the transpose of L T introduced above. The equivalence (2.14) can indeed be shown, as discussed in detail in Refs. [17,18], provided that integration by parts in y is allowed, without the presence of boundary terms at infinity. This construction involves the products P (x, y; t)O(x+iy) for 'all' observables O(x), and hence it puts severe constraints on the decay of the distribution at infinity. This will indeed be shown to be crucial below. From now on we consider only the equilibrium distribution P (x, y), assuming that it exists, and hence drop the t dependence. In the large t limit, the equivalence (2.14) can then be expressed in terms of the criteria for correctness [17,18]  which in principle need to be satisfied for a complete set of observables O(z).
Here the expectation value is taken with respect to the equilibrium distribution P (x, y), or equivalently, a noise average. After separating real and imaginary parts, the criteria take the form where the primes denote differentiation with respect to z. We consider as observables O n (z) = 1 n z n , (2.20) with n even (the odd powers vanish by symmetry). The associated consistency conditions, then take the explicit form which are of course nothing but the standard Schwinger-Dyson (SD) relations between n-point functions, which should be satisfied in order for the theory to be solved correctly.  We now turn to the numerical solution of the CL process, using the simplest lowest-order discretisation with an adaptive stepsize [24]. For the results shown here, the total combined Langevin time for each parameter set is 2×10 6 Langevin time units and the maximal stepsize is 5 × 10 −5 . We have verified that finite stepsize corrections are negligible. We have studied various combinations of σ and λ, keeping Re σ = A > 0. Here we focus on σ = 1 + i and λ = 1. In Fig. 1 CL results are shown for the real and imaginary parts of the observables 1 n z n and for the criteria for correctness C n = 1 n L z n , for n = 2, 4, 6, 8. The figure shows the result for real noise, N I = 0: all expectation values agree with the exact result, denoted with the horizontal lines, and the criteria for correctness are all consistent with 0, as it should be.
In Fig. 2 we show how the observables and the criteria for correctness depend on the amount of complex noise. In the top figures we see that for small N I the observables with n = 2, 4 appear to be consistent with the exact result, while for larger N I they start to deviate. Perhaps surprisingly, the lowest-order criterium C 2 is consistent with 0 for all N I shown. This implies that even though z 2 and z 4 have converged to the wrong result at larger N I , this occurs in such a way that the condition (2.22), i.e. the corresponding SD equation, is still satisfied. The possibility of multiple solutions to the SD equations when solving CL has been observed earlier in Ref. [13] (see also Refs. [25,26]).
In order to detect problems, it is necessary to consider higher moments. In Fig. 2 (below), we observe that for small N I the observables (with n ≥ 6) and the criteria (with n ≥ 4) are only marginally consistent with the expected results, while for larger N I they suffer from large fluctuations and can no longer be sensibly determined. According to the analytical justification [17,18], this implies that the results from CL cannot be trusted in the presence of complex noise. Below we give an interpretation of this in terms of the properties of the probability distribution. For now we tentatively conclude that, if we assume that the large fluctuations reflect the slow decay of the distribution in the imaginary direction, P (x, y) should decay as 1/|y| α , with 5 α 7, which will indeed be confirmed below.

Probability distributions
A crucial role in the justification of the method is played by the equilibrium distribution P (x, y) in the complexified space. In Refs. [17,18] it was shown in detail that for CL to give correct results, it is necessary that the product of the distribution and a suitable basis of observables drops off fast enough in the imaginary direction. This condition can be translated into the criteria for correctness, as discussed above. Unfortunately the Fokker-Planck equation, satisfied by the distribution, cannot be solved easily, except in the case of a noninteracting model (λ = 0), see Appendix A.
In this section we study the distribution following two approaches. Firstly, it is possible to collect histograms of the (partially integrated) distribution during the CL evolution. Note that very long runs are required, in order to sample the configuration space properly. Here we will in particular be interested in the partially integrated distributions We note that this approach can easily be extended to multi-dimensional integrals and field theories. We refer to this as the brute force method. Secondly, for the zero-dimensional model we consider here, it is possible to expand the distribution in terms of a truncated set of basis functions and solve the resulting matrix problem numerically, following Duncan and Niedermaier [21]. We discuss this approach in the next subsection.

Solving the Fokker-Planck equation
We consider the eigenvalue problem where the FP operator L T was given in Eq. (2.11) and takes the explicit form We denote the eigenvalues of −L T with κ and the eigenfunctions with P κ (x, y).
If there is a unique ground state P 0 with eigenvalue κ = 0, and for all other eigenvalues Re κ > 0, the time-dependent distribution can be written as 4) and the equilibrium distribution is given by P 0 (x, y). In the CL simulations we observe convergence to well-defined expectation values (at least for the low moments, n = 2, 4) and hence we are certain that an equilibrium distribution exists.
In order to solve the eigenvalue problem, we follow closely Ref. [21]. The FP operator is invariant under x → −x, y → −y, which implies that eigenfunctions have a definite parity, P κ (x, y) = ±P κ (−x, −y). The ground state is expected to satisfy P 0 (x, y) = P 0 (−x, −y), such that observables of the type (x + iy) n , with n odd, vanish. If P κ is an eigenfunction of L T with eigenvalue κ, then so is P * κ with eigenvalue κ * . It is expected that P 0 is real.
In Ref. [21] P (x, y) was doubly expanded in a basis of Hermite functions, i.e.
where ω is a variational parameter appearing in the harmonic oscillator eigenfunctions, and N H indicates the number of Hermite functions included in the truncated basis. The coefficients c kl have to be determined. In order to do so, we introduce creation and annihilation operators, satisfying and write In terms of these, −L T reads with the quartic terms Note that X is independent of ω. Finally, in terms of the creation/annihilation operators, the FP operator reads and we introduced the rescaled parameters, In Ref. [21], where A = B = 0, ω was chosen to be proportional to √ λ, and no adjustable parameters were left on the RHS of Eq. (3.11). As we see below, there is a great advantage in keeping ω arbitrary.
We can now compute the matrix elements with respect to the Hermite functions, using the notation where The matrix elements are and Following Ref. [21], the double indices k, l and m, n (all taking values from 0 to N H − 1) are converted into single ones, via 19) and the inverse with i, j = 1, . . . , N 2 H . The matrix elements are denoted as L T ij = kl|L T |mn , and the eigenvalue problem is written as We have solved this matrix problem with a FORTRAN90 code using subroutines provided by the LAPACK library [27]. Since the matrix size is N 2 H × N 2 H , there is an upper limit of what is practically feasible. For the maximal number of Hermite functions we have considered, N H = 150, the numerical computation takes around 36 hours on a standard work station. Convergence can be tested by increasing N H and varying ω (see the detailed discussion below). Considering the eigenvalue at (or closest to) 0, the distribution P 0 (x, y) can be reconstructed from the corresponding eigenvector, as

Complex noise
We start with the case of complex noise. The parameters in the action are taken as σ = 1 + i and λ = 1, and we consider a basis with 30 ≤ N H ≤ 150 Hermite functions. The values of ω we used are listed in Table 1. In the limit of large N H the results are expected to be independent of the value of ω. In practice however, we find that for finite N H the parameter ω plays the role of a tuning parameter: in particular, when ω is too small, there are eigenvalues with a negative real part. This becomes more prominent as N I is reduced, see below.
Obviously, in this application this would mean that the FP evolution would not thermalise and display runaway behaviour. Since the CL evolution thermalises (and is obviously independent of the choice of ω), we expect the real parts of all eigenvalues to be nonnegative. When the value of ω is increased, we observe that the eigenvalues with a real negative part move into the positive half-plane and the spectrum around the origin converges. Convergence can also be seen by studying the reconstructed probability distribution P (x, y), using Eq. (3.23). Interestingly, we always find an eigenvalue consistent with 0. When ω is increased even more, convergence properties worsen again. We find therefore that there is an ω interval for which: 1. there is an eigenvalue consistent with 0; 2. the other eigenvalues are in the right half-plane; 3. the reconstructed ground state distribution is stable under variation of N H and ω.
The ω interval depends on the parameters and is pushed to larger values as N I is reduced. We have not found a special role for the ω value used in Ref. [21], namely ω = √ 3λ (in our conventions). We first consider N I = 1, as in Ref. [21]. The smallest 15 eigenvalues are shown in Fig. 3 (left), for several values of ω. For the ω values shown here, all eigenvalues are in the right half-plane and the spectrum around the origin is to a good extent independent of ω. The reconstructed distribution P (x, y), obtained using the eigenvector corresponding to the eigenvalue at (or closest to) the origin, is shown in Fig. 4 (top). We find a smooth distribution with a double peak structure, similar as in Ref. [21]. Next we reduce the amount of complex noise and consider N I = 0.01. The spectrum is shown in Fig. 3 (right) and the reconstructed distribution in Fig. 4 (below). The findings are similar as with N I = 1, but ω has to be increased more in order to find convergence and even then the larger eigenvalues are hard to establish. The distribution has again two peaks, which are now more pronounced and rotated in the xy-plane. We note the symmetry P (−x, −y) = P (x, y). Importantly, the distribution is more squeezed in the y direction and the main features are contained in the interval −0.45 < y < 0.45.
In order to clarify the relevance of these findings, we show in Fig. 5 the partially integrated distributions P x (x) and P y (y), see Eq. (3.1), on a logarithmic scale, for the case of N I = 1. Besides presenting results for various ω values, we also show the histogram obtained during a CL simulation. We observe an acceptable agreement between the CL results and the solution of the FPE for ω ∼ 1.5, 2, down to a relative size of 10 −6 , after which the FP solution can no longer cope. We interpret this as a manifestation of the truncation. When ω is taken too large, the disagreement occurs for smaller values of x and y.
The distributions do not go to zero rapidly but decay as a power, which is clearly visible on a log-log plot. In Fig. 6 we show the distributions multiplied by x k and y k respectively, for k = 4.8, 5, and 5.2, using the CL data. At large |x| and |y|, we observe a power decay with power 5, i.e.
This suggests that the distribution decays as which indeed decays as 1/r 5 . We note that this power decay is in agreement with the conclusions from the moments above: z 2 and z 4 are well defined and can be numerically determined without any problems, while the higher moments diverge, which in the CL simulation is reflected in large fluctuations.

Real noise
We now turn to the case where CL appears to work well, i.e. with real noise (N I = 0). The eigenvalues are shown in Fig. 7 for a number of ω values. For  Figure 6: As above, for x k P x (x) and y k P y (y) with k = 4.8, 5, 5.2, using the CL data. The dotted horizontal line is meant to guide the eye.
ω < 4 eigenvalues with negative real part are present (not shown in figure). We note that in all cases there is an eigenvalue at (or close to) the origin, but in general convergence is much harder to establish from a study of the eigenvalues alone. In order to have a handle on this we also analyse the partially integrated distributions P x and P y under variation of N H and ω, and also compare those with the histograms obtained with CL. The results are shown in Fig. 8 for P y (y) (top) and P x (x) (bottom). In the case of P y , convergence as N H is increased is clearly visible (top, left). We note that for the largest N H values the distribution agrees with the result obtained by direct Langevin simulation, indicated with the black line. The distribution is very well localised and appears to drop to 0 around y = 0.28. We come back to this below. Convergence as ω is increased is demonstrated in Fig. 8 (top, right) and we observe that a large value of ω is required, ω ∼ 50. It is of course expected that the chosen value of ω eventually becomes irrelevant, but for finite N H keeping ω as a tuning parameter is essential. The distribution P x (x) is shown in Fig. 8 (below) as a function of x (left) and x 4 (right), on a logarithmic scale. In contrast to the case of complex noise, we now find an exponential rather than a power decay. Results from FPE agree with the CL histogram, independently of the value of ω in this case, but only down to a relative size of 10 −4 ; varying ω does not help in this case (increasing N H probably will). From the CL result, we see that the distribution falls off as P x (x) ∼ e −ax 4 , a ∼ 0.295. (3.27) Naively this behaviour can be expected, since for large |x| the original weight behaves as ∼ exp (−λx 4 /4). We note that the prefactor is 0.295, which is slightly larger than λ/4 = 0.25. Interestingly this seems to be understandable from a perturbative analysis, see Appendix A.
The reconstructed distribution is shown in Fig. 9. This distribution has similar characteristics as at N I = 0.01, except that the two peaks are now very pronounced and the saddle around the origin is much deeper. The peaks lie mostly in the y direction and they are therefore clearly visible in P y (y). The distribution is squeezed even more than before and its main support is in the region −0.3 < y < 0.3. The ripples visible for larger y values are an artefact of the truncation. In fact, in the next section we will demonstrate that the distribution is strictly 0 when |y| > 0.3029.
We conclude that for this choice of parameters (σ = 1 + i and λ = 1) the decay in the case of real noise is manifestly different compared to complex noise. In the latter we found a power decay, resulting in ill-defined moments z n when n > 4, while here we find exponential decay in the x direction and, as we will see below, in the y direction support only inside a strip. As a result there is no problem in computing higher moments, since they are all well-defined.

Interpretation
From the solution of the FPE and the CL process, we conclude tentatively that for real noise the distribution is localised in the y direction and has support in a strip around the origin only, with −0.3 y 0.3. This conclusion can be made more precise by studying the classical flow diagram and properties of the FPE. This analysis can also be used to find parameter values for which CL breaks down for real noise (see Sec. 4.3).

Classical flow
The classical flow diagram is shown in Fig. 10, for σ = 1 + i and λ = 1. We show the direction of the classical force by an arrow pointing in the direction (K x (x, y), K y (x, y)). The arrows are normalised to have the same length. The classical force is of course independent of N I . There are three fixed points, where K x = K y = 0: an attractive point at the origin and two repulsive fixed points, determined by σ + λz 2 = 0, or yielding (x, y) = (±0.455, ∓1.10) in this case. The flow is directed towards the origin, provided that |y| is not too large. This can be made more precise by studying where K y (x, y) changes sign. We find that K y (x, y) = 0 at  We now realise that along the horizontal dashed lines, which are determined by the extrema of the centre curve where K y = 0 (y = ±0.3029 in this case), the flow is always pointing inwards, i.e. towards the real axis. In absence of a noise component in the vertical direction, this creates a barrier for the Langevin evolution beyond which it cannot drift. Note that the repulsive fixed points actually help to establish this. Hence, provided that the process starts within this strip, it will never be able to leave (in the case of real noise and in the limit of zero stepsize). We have verified that if the dynamics starts out outside of the strip, it quickly finds its way into it, due to the mostly restoring properties of the classical flow. We conclude therefore that in the case of real noise the process takes place in the strip determined by − 0.3029 < y < 0.3029. (4.4) This is consistent with the conclusions drawn above from the histograms and the FPE solution of the distribution P (x, y). In the presence of complex noise, this conclusion no longer holds and the entire xy-plane can be explored.

Strips in the complexified configuration space
It is possible to make the argument based on classical flow presented above rigorous and show directly from the FPE that the equilibrium distribution P (x, y) is strictly zero in strips in the xy-plane, assuming sufficient decay, i.e. K x,y (x, y)P (x, y) → 0 (4.5) as x and/or y → ±∞. To achieve this, we note that the FPE takes the form of a conservation law, i.e., which allows us to consider the charge, Specialising now to the equilibrium distribution (and hence dropping the t dependence), we find that Q(y) is independent of y, provided that the product of the drift K x (x, y) and the distribution P (x, y) drops to zero at large |x|, since (4.9) We note that the required condition is always satisfied in our case, even in the case of the power decay. Since Q(y) vanishes as y → ±∞ (because J y (x, y) does, again relying on the sufficient decay), we find that (4.10) For real noise, this yields therefore the condition for all y. Since P (x, y) is nonnegative, this condition allows us to derive the following useful property: if K y (x, y) has a definite sign as a function of x for given y, P (x, y) has to vanish for this y value. As a function of x, K y (x, y) is a parabola with an extremum at and a curvature of 6λy. The value at the extremum is given by Consider now the case that y is positive (negative). In that case, when F (y) > 0 (F (y) < 0), K y (x, y) is strictly positive (negative) and hence P (x, y) has to vanish. The zeroes of F (y) are given by provided that 3A 2 − B 2 > 0. Inspection shows that F (y) > 0 when y − < y < y + and F (y) < 0 when −y + < y < −y − : hence for these y values, P (x, y) = 0. When 3A 2 − B 2 < 0, F (y) has no zeroes and F (y) and y have opposite signs. In that case, K y (x, y) has no definite sign and the reasoning cannot be followed.
In the first case the distribution can in principle be nonzero in the outer region, y 2 > y 2 + . However, once the process is in the inner strip determined by y 2 < y 2 − , it will not be able to leave this strip, due to the nature of the drift terms. Hence there is no objection to putting the distribution to zero also when y 2 > y 2 + . We conclude therefore that the equilibrium distribution has support in the strip determined by y 2 < y 2 − only, in agreement with the reasoning above. Note that P (x, y) is therefore a nonanalytic function of y. Of course the value of y − agrees with the boundary determined in the example in the previous section, i.e. with the position of the dashed lines in Fig. 10, as it should be.
For vanishing B, the action is real and the distribution is (for real noise) strictly localised on the real axis, y = 0. For small B, the width of the allowed region around y = 0 is nonzero and set by Hence increasing the amount of complexity by increasing B results in a broadening of the distribution with a width ∼ 2B. The importance of this controlled increase has been emphasised earlier in Ref. [28].

Absence of strips
The argument presented above breaks down in the presence of complex noise. In that case, the process is pushed out in the y direction and the repulsive fixed points come into play. Once the repulsive fixed point is crossed, large excursions in the y direction take place and the distribution is no longer localised. When the amount of complex noise is small, it takes time to notice this, but eventually it will happen. There are therefore no strips for complex noise, which also follows from the formal derivation above. This is demonstrated in Fig. 12 (left), where P y (y) is shown for the values of N I considered above. As shown above, this leads to power decay, P y (y) ∼ 1/|y| 5 . Interestingly, the derivation above demonstrates that strips are only present when 3A 2 > B 2 . For larger B values, one may therefore expect a breakdown of CL with real noise, similar as with complex noise. This is indeed what happens. The distribution P y (y) as B is increased is shown in Fig. 12 (right), for real noise. Note the similarity with the figure on the left. The delocalisation has a detrimental effect on the results of the CL process. This is demonstrated in 13, where the moments minus the exact result are shown on the left and the criteria for correctness on the right. We observe that increasing B has a similar effect as increasing N I , cf. Fig. 2.
The distributions for the case that σ = 1 + 3i and λ = 1 are shown in Fig. 14. The top figure shows P (x, y), obtained with the FPE. We note that the distribution still appears to be mostly contained within a strip. However, a closer look at the partially integrated distributions obtained with CL, see Fig. 14 (bottom), shows that again power decay is present, with the same power as before. This power decay sets in once the process has crossed the repulsive fixed points, which for this choice of parameters are located at x = ±1.04 and y = ∓1.44. The weight of the power tails is clearly small, yet it is enough to give rise to fluctuations for the higher moments when solving the CL process. We conclude that in absence of strips a universal power law decay is present, which results in a breakdown of the formal justification [17,18] and wrong or wildly fluctuating results in practice.
Finally we will show that it is possible to understand the universal decay directly from the FPE. We start from the assumption that the distribution is of the form P (x, y) = c (x 2 + y 2 ) α (4.16) at large x and y, where we found numerically that the power α is consistent with 3. Substituting this Ansatz in the FPE (2.10), we find, after some algebra and the removal of common factors, that At large x and/or y the final term dominates: requiring that this term vanishes yields indeed α = 3. This construction assumes that the behaviour at large distance is approximately rotationally invariant in the xy-plane and that there are no preferred directions, which would invalidate the Ansatz and the power counting above. Based on our numerical evidence, this seems to be the case. We note that the final term in Eq. (4.17) is independent of σ = A + iB and N I ; hence the decay at large distance is independent of the parameters in the action and of the amount of complex noise. We also note that B has disappeared from Eq. (4.17): the reason is that B breaks the invariance under x → −x and independently y → −y, while the Ansatz is invariant under those.
The conclusion is therefore that the decay at large x and y is universal. Of course the presence of complex noise and/or a large value of B 2 > 3A 2 is essential in catalysing large excursions, which lead to the power decay. Notably, the power decay appears to be unavoidable unless its appearance is strictly forbidden, as in the case of the strips for real noise and B 2 < 3A 2 .

Conclusion
In order to justify the results obtained with complex Langevin dynamics, it is necessary that the probability distribution is sufficiently localised in the complexified configuration space. Here we have studied properties of this distribution via a number of methods, in the case of a simple model. Using the insights gathered from classical flow, histograms obtained during the CL process, the criteria for correctness and the explicit solution of the FPE, a complete characterisation of the distribution can be given.
In the case of real noise and provided that B 2 < 3A 2 , where σ = A + iB, we found that the distribution is strictly localised, i.e. it has support in a strip in the configuration space only, with exponential decay in the real direction. In this case all moments are well-defined and, relying on the analytical proof of the method, correct results are expected. We also found that the criteria for correctness are satisfied. In contrast, when the noise is complex or when B 2 > 3A 2 , the entire configuration space is explored. Large excursions are possible due to the presence of repulsive fixed points and the decay of the distribution changes dramatically. We found strong indications that for large |x| and |y|, the distribution decays as a power, according to P (x, y) ∼ 1 (x 2 + y 2 ) 3 . (5.1) A consequence of this slow decay is that higher moments are no longer welldefined. As a result, these and the criteria for correctness suffer from large fluctuations during the CL process, an important signal of failure. Here it is important to emphasise that the inclusion of higher moments is essential to observe the breakdown. In this model the FPE can be solved explicitly, via an expansion in a truncated set of basis functions. However, it is still a nontrivial problem and perhaps the best way to find the distribution is by brute force, i.e. during the CL simulation. This also has the benefit of being applicable to higher dimensional models. In the case of the localised distribution in the strip, the used basis set may not be the one that is best adapted to the problem and, in hindsight, once it has been demonstrated that the distribution has support in a strip only, a more suitable basis can be used. This would however limit the generality of the approach.
As an outlook, we note that in the more realistic cases of multi-dimensional models and field theories, the luxury of solving the FPE is typically not available. However, we have demonstrated that the essential insight can already be obtained from a combination of histograms of partially integrated distributions and the criteria for correctness, which gives a consistent picture of the dynamics. These tools are readily available in field theory. Finally, our conclusions are also immediately applicable to nonabelian SU(N) gauge theories, for which gauge cooling provides a means to control the distribution in SL (N, C), a possibility not present in simpler models.

A Perturbative solution of the FP equation
In order to understand the numerical solution for the distribution P (x, y) found above further, we discuss in this Appendix the perturbative solution of the FP equation (2.10) in the stationary limit. Although it is only of limited use, it provides some insight, especially along the x axis.

A.1 Lowest-order solution
We write the FP operator (2.11) as and The (normalisable) solution of the lowest-order equation, is given by 9) and N 0 is the normalisation constant, This solution is similar to the one found in the relativistic Bose gas at nonzero chemical potential [20]. It is easy to see that it is the correct solution at leading order, by computing (recall that N R − N I = 1 and σ = A + iB) More generally, one may equate the two expectation values which, assuming that it is possible to shift x → x − iy, yields the relation [29,30] ρ(x) = dy P (x − iy, y), (A.14) where the LHS should be independent of N R,I . Evaluating the y integral yields in this case 16) which is indeed the expected answer.

A.2 First-order correction
To compute higher-order corrections, we expand Higher-order corrections are determined by the inhomogeneous partial differential equation, The homogeneous equation is solved by P (0) . To find the particular solution, we factor out the leading order solution, (with p (0) = 1), and write Higher-order corrections are then determined by For the first-order correction, this yields The RHS of Eq. (A.24) is a fourth order polynomial with only even powers. As a particular solution we may therefore attempt a polynomial of fourth degree, with only even terms appearing and containing 8 unknown coefficients, as it should be. It is clear that the perturbative distribution is not positive definite and strictly speaking only applies when the perturbative correction λp (1) (x, y) is small with respect to 1, i.e. around the origin. However, it can be made positive definite by a simple exponentiation, P (x, y) = P (0) (x, y) exp λp (1) (x, y) , (A.37) which has the same leading order λ dependence. This distribution is normalisable since the coefficients of the quartic terms are all negative. An example is shown in Fig. 15. We observe a double peak structure, as in the main text. At large y values, the exponentiated construction cannot be correct, since it decays exponentially rather than be 0 outside the strip found above. In the x direction, however, the perturbative solution gives a surprisingly good description of the decay. Taking y = 0, we find  Fig. 16 (left), and is seen to agree better than expected. We note that the prefactor 0.3 is also close to what was observed for the integrated distribution P x (x). In Fig. 16 (right), we also show a comparison with the perturbative expression