Global sampling of Feynman's diagrams through Normalizing Flows

Normalizing Flows (NF) are powerful generative models with increasing applications in augmenting Monte Carlo algorithms due to their high flexibility and expressiveness. In this work we explore the integration of NF in Diagrammatic Monte Carlo (DMC), presenting an architecture designed to sample the intricate multidimensional space of Feynman's diagrams through dimensionality reduction. By decoupling the sampling of diagram order and interaction times, the flow focuses on one interaction at a time. This enables constructing a general diagram by employing the same unsupervised model iteratively, dressing a zero-order diagram with interactions determined by the previously sampled order. The resulting NF-augmented DMC is tested on the widely used single-site Holstein polaron model in the entire electron-phonon coupling regime. The obtained data show that the model accurately reproduces the diagram distribution, reducing sample correlation and statistical error for observables such as the polaron binding energy and the interacting Green's function.


I. INTRODUCTION
The investigation of many-body effects in interacting systems has long presented a significant challenge in the physics community. A modern approach to address this challenge centers on the strategic use of Feynman's diagrams for perturbative assessments of key quantities like self-energies and correlation functions [1,2]. These evaluations provide insights into the intricate ways in which single-particle properties are impacted by interactions. Consequently, this approach establishes a comprehensive framework with versatile applications extending to thermodynamics and quasi-particle properties, thereby attracting interest from a variety of fields [3].
Diagrammatic quantum Monte Carlo (DMC) is a powerful numerical method to obtain approximation-free estimates of diagrammatic perturbative expansions [4,5]. A target quantity, usually the interacting Matsubara Green's function G(k, τ), is sampled stochastically through the Markov chain Monte Carlo (MCMC) exploration of the Feynman's diagrams contributing to its power series. This approach has been widely used in condensed matter physics to accurately solve a wide range of physical systems described in terms of effective Hamiltonians. Representative examples are given by polarons [4-10] and excitons [11,12], the thermodynamics of spin systems [13,14], and correlations of the electron gas [15]. While considered to be numerically exact, DMC becomes progressively less efficient with increasing complexity of the diagrams' phase space, which rapidly scales when approaching real materials. For example, the incorporation of ab-initio material-specific band dispersions and electron-phonon couplings introduces distinctive features in the phase space, leading to a sluggish MCMC exploration with increased autocorrelation between samples, inducing critical slowing down effects [16]. Other Monte Carlo algorithms tackle this issue by employing global sampling strategies, which enhance the exploration of phase space within MCMC [17]. These approaches, however, are not yet present in the DMC literature [18]. Nevertheless, recent advancements in generative machine learning (GML) have demonstrated the potential for such architectures to serve as effective global updates in various physics-related scenarios [19-25].
From a broader perspective, the integration of ML techniques in many-body physics has gathered substantial interest in recent years [26]. Noteworthy successes in performance enhancement have been achieved across various contexts, encompassing density functional theory [27-29], numerical renormalization group [30,31], and interacting spin models [32]. Particularly relevant to the current context is the integration of GML models into Monte Carlo schemes to refine sampling from high-dimensional, intricate probability distributions [33,34]. Various GML algorithms have proven effective as global updates [17,35,36], enhancing MCMC procedures [23-25]. This enables the generation of uncorrelated samples within the chain and mitigates the critical slowing down problem.
Normalizing Flows (NF) represent an emerging family of generative models, standing out as one of the most promising techniques in GML. NF provides a method to approximate invertible maps to desired probability distributions [37,38]. The NF architecture facilitates both rapid sampling and inference of target distributions, exhibiting significant versatility in representing complex objects. It has found successful applications across a wide spectrum, including image and video generation, reinforcement learning, and the Monte Carlo integration of quantum lattice field models [19-22]. The prospect of designing a flow-based model to capture the features of the Feynman diagram's phase space within a DMC framework is highly appealing. However, the added complexity in this case lies in the fact that the sample's dimensionality is a variable of the distribution, rendering standard NF architectures unsuitable for addressing the problem. In this letter, we introduce an approach to circumvent such limitations by proposing a model that generates a diagram iteratively, focusing on individual interactions. To evaluate the feasibility and efficiency of this hypothesis, we employ the single-site polaron Holstein model as a test case, a paradigmatic electron-phonon (el-ph) effective Hamiltonian [39].
Our designed NF-augmented DMC approach yields a significant reduction in sample correlation within the Markov chain across a broad spectrum of electron-phonon coupling strengths. This reduction translates into a diminished number of samples required to achieve convergence. The proposed data-driven global updates enhance statistical convergence, leading to a roughly 50% reduction in both statistical errors associated with observed polaron properties and sample autocorrelation compared to the standard DMC local procedure. Since the code developed in this study was designed to enhance the exploration of Feynman's diagram space rather than to optimize performance, the actual sampling cost is higher compared to the standard approach.
The article is organized as follows. In the next section, we introduce the NF-augmented DMC architecture tailored for the Holstein polaron model in the atomic limit. Subsequently, we present and analyze the performance of the developed code in estimating specific polaron quantities.

II. METHOD

A. Diagram representation
The single-site Holstein model is expressed as

H = ε c†c + Ω b†b + g c†c (b† + b),   (1)

consisting of a single non-dispersive electronic band with energy ε, interacting with an Einstein-like phononic branch of frequency Ω. The electron-phonon coupling is characterised by a quadratic term with a constant interaction vertex g, and N represents the number of possible phonon states. We are interested in the quasi-particle properties of the model, which are encoded in the interacting Green's function. For polaronic systems the Green's function can be expressed as a sum over diagrams having an increasing number of phonons, as sketched in Fig. 1(a). Using Feynman's rules we can translate such an expansion into a mathematical expression that, in the Matsubara formalism, takes the form:

G(τ) = Σ_{n=0}^∞ ∫_0^τ dx_{2n} ∫_0^{x_{2n}} dx_{2n-1} ⋯ ∫_0^{x_2} dx_1 W_n(τ, x).   (2)

The ordered integration accounts for every diagram having n phonons, each described by a vector of 2n interaction times x. The contribution of each diagram is expressed through the zero-temperature electronic and phononic propagators, resulting in the following diagram's weight:

W_n(τ, b, e) = e^{-ετ} ∏_{i=1}^n g² e^{-Ω(e_i - b_i)},   (3)

where x was divided into phonon creation times, b, and annihilation times, e, allowing the diagram to be represented by a vector [τ, b, e] (see Fig. 1(b)). DMC directly integrates Eq. 2 by randomly sampling the expansion's diagrams from a distribution proportional to their weight function. Detailed information on how the MCMC procedure is implemented in diagram space can be found in [40,41], while the observable estimators for polaron quantities are introduced by Mishchenko [5]. Here we focus our attention on G(τ) and the polaron binding energy E_p. The Green's function is evaluated from the histogram collecting the distribution of the variable τ, while E_p is obtained through the ground-state energy estimator [5]:

E_0 = ⟨ (ετ + Ω Σ_{i=1}^n (e_i - b_i) - 2n) / τ ⟩.   (4)

Analytical expressions for both observables can be found in the literature [2], making them a suitable choice for testing the convergence properties of a new algorithm.
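To make the diagram representation concrete, the weight function and the ground-state energy estimator above can be evaluated directly on a vector [τ, b, e]. The following is a minimal Python sketch; the function names and explicit parameters (eps, omega, g) are ours, chosen for illustration, and do not correspond to any released code.

```python
import math

def diagram_weight(tau, b, e, eps, omega, g):
    """Weight of an n-phonon diagram [tau, b, e]: the bare electron factor
    exp(-eps*tau) times one g^2 * exp(-omega*(e_i - b_i)) per phonon line."""
    w = math.exp(-eps * tau)
    for bi, ei in zip(b, e):
        assert 0.0 <= bi <= ei <= tau, "unphysical phonon line"
        w *= g**2 * math.exp(-omega * (ei - bi))
    return w

def energy_estimator(tau, b, e, eps, omega):
    """Diagram-wise ground-state energy estimator; averaged over diagrams
    sampled proportionally to their weight, it converges to the polaron
    ground-state energy at large tau (for this model, eps - g^2/omega)."""
    n = len(b)
    phonon_time = sum(ei - bi for bi, ei in zip(b, e))
    return (eps * tau + omega * phonon_time - 2 * n) / tau
```

For a zero-order diagram the weight reduces to the bare propagator e^{-ετ} and the estimator returns ε, as expected.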

B. Flow architecture
Normalizing flows transform a simple starting probability distribution, p_Z, into a more complex desired density, p_X, through a series of invertible and smooth transformations. This is achieved using invertible networks [42-44] to construct a variational map T_θ: ℝ^D → ℝ^D that, when applied to samples {z_i}_{i=0}^N drawn from p_Z, generates a set x_i = T_θ(z_i) distributed as [37]:

p_θ(x) = p_Z(T_θ^{-1}(x)) |det J_{T_θ^{-1}}(x)|,   (5)

with J_{T_θ^{-1}} being the Jacobian of the invertible map. This provides fast sampling and density estimation of a parametrized distribution p_θ(x), which can be fitted to the target p_X(x) by minimising their Kullback-Leibler (KL) divergence.
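The change-of-variables rule above can be checked on a toy one-dimensional flow. Here a hypothetical affine map T(z) = a z + c (our illustrative choice, not part of the paper's architecture) is applied to a standard-normal base density, and the flow density must coincide with the analytic N(c, a²) result.

```python
import math

# Hypothetical one-dimensional flow T(z) = a*z + c, applied to a
# standard-normal base density p_Z (illustrative stand-in for T_theta).
a, c = 2.0, 1.0

def T(z):
    return a * z + c

def log_p_theta(x):
    """Change-of-variables density: p(x) = p_Z(T^-1(x)) |det J_{T^-1}(x)|."""
    z = (x - c) / a                       # T^-1(x)
    log_pz = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    log_det = -math.log(abs(a))           # |d T^-1 / dx| = 1/|a|
    return log_pz + log_det
```

Since T pushes N(0, 1) onto N(c, a²), `log_p_theta` agrees with the Gaussian log-density of mean c and variance a² at every point.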
For DMC the target p_X(x) is the diagram's weight distribution, proportional to the weight W(x), with x being the diagram's vector representation introduced in Eq. 3 and Fig. 1(b). The complexity of approximating the full p_X(x) in a NF framework lies in the dimensionality of the vector space spanned by the different diagrams. In fact, x changes dimension based on the number of phonons, described by n (see Eq. 2), typically referred to as the diagram's order. Approaching this problem with standard NF architectures would mean creating a different model for every possible dimension n, making it impossible to cover the entire space. Here, we avoid such a restriction by decoupling the sampling of the diagram's order and the interaction times with the approach schematised in Fig. 2.
The strategy relies on sampling single phonons using a conditional NF model T_θ [20], with specifics reported in appendix B, trained to fit a 2D distribution (p_X) describing the b and e statistics based on the current diagram's length, τ. This allows a new diagram to be generated through repeated use of T_θ, dressing the zero-order diagram defined by τ with a random number of phonons independently sampled from a preselected integer distribution p_n(n|τ). In this way, the entire diagram space at different τ can be explored using a single model possessing the following final density:

p(τ, b, e, n) = p_n(n|τ) n! ∏_{i=1}^n p_θ(b_i, e_i; τ),   (6)

where the n! is needed since every permutation of the phonon indices leads to the same diagram. The combination of this density with Eq. 5 can be used to completely define an update in the DMC chain. Finally, the model was completed by selecting p_n(n|τ) as a Poisson distribution with a mean value λ depending on both τ and the el-ph coupling strength g. The form of λ was chosen to best fit the order statistics collected in preexisting runs at different τ with varying g, as shown in appendix A. This leads to a final p_n conditioned also by the coupling constant g. Notably, sampling from different coupling regimes comes at no additional increase in the NF complexity, since the g dependence is completely contained in the simpler order distribution.
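The iterative construction can be sketched end to end. In the sketch below (all names ours), a simple rejection sampler for the conditional density proportional to e^{-(e-b)} stands in for the trained flow T_θ, and the Poisson mean λ(τ, g) takes the closed form given in appendix A.

```python
import math
import random

def lam(tau, g):
    """Mean diagram order lambda(tau, g), closed form from Appendix A."""
    return g**2 * (tau - 1.0 + math.exp(-tau))

def sample_poisson(mean, rng):
    """Knuth-style Poisson sampler for the order distribution p_n."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_phonon(tau, rng):
    """Stand-in for the trained flow T_theta: draws (b, e) from the density
    proportional to exp(-(e - b)) on 0 < b < e < tau by rejection."""
    while True:
        b, e = sorted(rng.uniform(0.0, tau) for _ in range(2))
        if rng.random() < math.exp(-(e - b)):
            return b, e

def sample_diagram(tau, g, rng):
    """Dress the zero-order diagram of length tau with n phonon lines,
    n drawn from the Poisson order distribution p_n(n | tau, g)."""
    n = sample_poisson(lam(tau, g), rng)
    return tau, [sample_phonon(tau, rng) for _ in range(n)]
```

Averaged over many draws, the number of phonon lines matches λ(τ, g), while every sampled line satisfies the physical ordering 0 < b < e < τ.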

C. Loss function
In the considered Holstein model, comparing Eq. 6 with the target weight in Eq. 3 shows that the model distribution p_θ should be trained to fit the unnormalized density p_X(b, e) = e^{-(e-b)}, where Ω is set as the unit of measure for the energies. Also, to generate a physical phonon the flow sampling needs to take into account that the annihilation e must happen after the creation b, and both of them need to be placed before the end of the diagram. We found that the most flexible way to let e and b satisfy such constraints is to multiply the target p_X by steep sigmoid functions, creating a new τ-dependent target p̃_X with vanishing contribution in the unphysical regions of the domain:

p̃_X(b, e; τ) = p_X(b, e) σ(b) σ(e - b) σ(τ - e),   (7)

where the sigmoids used were defined as

σ(x) = (1 + e^{-αx})^{-1},   (8)

with the α hyperparameter controlling the steepness of the function, recovering the exact form of the distribution for large values. We therefore selected an α value of 50 in order to obtain a smooth enough p̃_X that still presents all the main features of the real distribution, as can be seen in Fig. 3. This allowed us to train an autoregressive neural spline model [45] to map a 2D diagonal Gaussian into p̃_X by minimising the standard reverse KL divergence of the model,

L(θ) = E_{z∼p_Z}[ log p_θ(x) + (e - b) + S(x) ],  x = T_θ(z; τ),   (9)

where S(x) is a sum of softplus functions arising from the sigmoid contributions. We obtained the final T_θ(z; τ) through an unsupervised training in which new samples were generated, at every step, from a set of τ uniformly distributed in the interesting domain of G and used to estimate L as an average. Finally, the loss minimisation was carried out using standard stochastic gradient descent methods with a learning rate of 10^{-4}, reduced to 10^{-5} after thirty thousand steps.
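The smoothed target and the softplus penalty entering the loss can be written compactly by using the identity -log σ(u) = softplus(-αu). The sketch below (function names ours) evaluates log p̃_X for α = 50, the value quoted above.

```python
import math

ALPHA = 50.0  # steepness hyperparameter from the text

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_target(b, e, tau, alpha=ALPHA):
    """log p~_X(b, e; tau): log p_X = -(e - b), minus the softplus penalty
    S(x) coming from the three sigmoid factors sigma(b), sigma(e - b),
    sigma(tau - e), since -log sigma(u) = softplus(-alpha * u)."""
    S = (softplus(-alpha * b)
         + softplus(-alpha * (e - b))
         + softplus(-alpha * (tau - e)))
    return -(e - b) - S
```

Deep inside the physical region 0 < b < e < τ the penalty is exponentially small and log p̃_X ≈ -(e - b), while outside it the density is suppressed by roughly α times the constraint violation.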

III. APPLICATION AND DISCUSSION
The developed NF-based architecture was integrated into the set of local updates in the Holstein model, establishing an NF-augmented Markov chain. Subsequently, the statistical properties of this augmented chain were systematically compared against those of the standard one. Extensive data on observables and correlation times were gathered for various coupling strengths (g). Each run involved a set of 13 parallel chains, accumulating 10^7 steps. The utilisation of separate chains facilitated the extraction of uncorrelated estimates, enabling the assessment of errors on individual observable values. This approach offered a direct means of comparing the performance of the two methodologies by analysing the evolution of mean values and errors throughout the chain.
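Combining the independent per-chain estimates into a mean and an error bar follows the usual independent-samples recipe; a minimal helper (ours, for illustration of the procedure described above) is:

```python
import math

def chain_statistics(chain_estimates):
    """Combine independent per-chain estimates of an observable into a
    mean and a standard error of the mean (unbiased sample variance)."""
    m = len(chain_estimates)
    mean = sum(chain_estimates) / m
    var = sum((v - mean) ** 2 for v in chain_estimates) / (m - 1)
    return mean, math.sqrt(var / m)
```

With 13 chains, the error bar shrinks as 1/sqrt(13) relative to the inter-chain spread, provided the chains are genuinely uncorrelated.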

A. Statistical performance
The evolution of the polaron energy (E_p) for a specific coupling strength (g), as depicted in Fig. 4(a), clearly illustrates that the NF global update reduces the error in the estimate by approximately a factor of two compared to the local updates in standard DMC. Additionally, the Green's function presented in Fig. 4(b) exhibits nearly identical performance for both types of updates across the entire interaction spectrum. This outcome is anticipated, since the diagram's length τ, serving as the correlated variable in G(τ), is taken as input from p_W and is not sampled. Consequently, the correlation in the τ channel remains consistent with the standard case, with improvements primarily observed in phonon-related quantities. To quantitatively evaluate the performance of the NF global updates, we present the effective sample size (ESS) as a function of the coupling strength g in Fig. 5. The NF-based updates exhibit a noticeable reduction in the overall sample correlation, leading to an increase in ESS across all coupling regimes. This result underscores the effectiveness of the proposed model in accurately reconstructing the target density, thereby enhancing the overall performance of the NF-augmented Markov chain. However, a decrease in the performance of the NF model is observed in the strong coupling regime (g = 0.9). It is important to note that this anomalous behaviour is attributed to a peculiarity of the physical model rather than the NF architecture itself. Specifically, the single-site Holstein Green's function tends to a constant as g approaches unity, leading to a flattening of the Green's function with increasing g in Fig. 4(b). This causes every value of τ to have the same statistical weight, so that very large values of τ start to appear in the chain, entering an extrapolation regime and lowering the model's performance. This is a system-specific issue that does not undermine the applicability of the NF model. It is always possible to adjust parameters, such as increasing ε (currently set to 1), to artificially make high τ less probable without altering the observable estimate.
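The effective sample size reported in Fig. 5 can be estimated from the integrated autocorrelation time of the chain. A simple estimator (ours; it truncates the autocorrelation sum at the first non-positive lag, a common practical windowing choice) is:

```python
import math

def effective_sample_size(x):
    """ESS = N / tau_int with tau_int = 1 + 2 * sum_t rho(t), where rho(t)
    is the lag-t autocorrelation; the sum is truncated at the first
    non-positive autocorrelation (a common practical window)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    if var == 0.0:
        return float(n)
    tau_int = 1.0
    for t in range(1, n):
        rho = sum((x[i] - mu) * (x[i + t] - mu) for i in range(n - t)) / (n * var)
        if rho <= 0.0:
            break
        tau_int += 2.0 * rho
    return n / tau_int
```

A rapidly decorrelating sequence yields an ESS close to its length, while a slowly varying (strongly correlated) one yields an ESS many times smaller: exactly the gap between the NF-augmented and local-update chains that Fig. 5 quantifies.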

B. Computational aspects
The C++ package is constructed on top of the C++ API of PyTorch [46], while the plots were produced using NumPy [47] and Matplotlib [48]. The training was carried out on a single Nvidia GeForce GTX 1650 GPU and took a little less than an hour of computer time. The code is openly available on GitHub [49]. Additional details on the NF-DMC architecture are given in Appendix B.
As mentioned in the introduction, our code was not optimised for performance, since the focus was placed on exploring the possible statistical advantages of the methodology. Therefore, no parallelisation or batching was implemented, and GPUs were only utilised for training, leading to longer sampling times at higher orders. As a result, at present, the cost of our approach is approximately three times higher than that of the standard one at g = 0.1, where a ∼40% gain in effective sample size is observed, and becomes worse at larger g.

IV. CONCLUSIONS
In conclusion, we presented a global sampling strategy for connected Feynman diagrams based on the use of Normalizing Flows, integrating it into an NF-augmented Diagrammatic Monte Carlo framework. Our devised procedure was tested on the Holstein polaron model, demonstrating superior statistical performance when compared to conventional DMC, albeit at an increased computational cost. Diagrams are constructed using a bottom-up approach that sequentially samples interactions from a single unsupervised NF model trained solely using the weight function of the desired diagram type. This strategy represents an initial attempt to enhance DMC using generative machine learning. It could potentially be extended to other model Hamiltonians by designing NF networks capable of representing the distinctive features of the targeted diagrams. For instance, potential future developments include the incorporation of phonon momenta to account for dispersion and the utilisation of multiple NF models to describe systems with combined interactions [12,14]. Moreover, its high flexibility, stemming from the ever-growing variety of NF architectures, the freedom to choose the order distribution, and the option to train without the need for a database, may attract interest from the broader quantum Monte Carlo community, facilitating its application to other perturbative methods.

Appendix A: Order distribution

To complete the distribution, we notice that the normalisation constant e^{λ(τ,g)} must coincide with the final result of the integration in Eq. 2, p_n(n|τ, g) being the Poisson distribution

p_n(n|τ, g) = λ(τ, g)^n e^{-λ(τ,g)} / n!.   (A2)

Therefore, by substituting the known result for G(τ) [2] and using the fact that p_n is normalized, one obtains

exp[-ϵτ + g²(τ - 1 + e^{-τ})] = exp[-ϵτ + λ(τ, g)],   (A3)

where Ω was set as the unit of measure for the energy. In the end, a form of the function λ(τ, g) was obtained that perfectly respects the behaviour of the distribution,

λ(τ, g) = g²(τ - 1 + e^{-τ}),   (A4)

which avoided the need of interpolating the means found by fitting the collected statistics to get an approximated version of the exact distribution.
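The closed form of λ(τ, g) can be cross-checked numerically: g² times the integral of e^{-(e-b)} over the physical region 0 < b < e < τ must reproduce it. A quick midpoint-rule check (function names ours):

```python
import math

def lam(tau, g):
    """Closed form lambda(tau, g) = g^2 * (tau - 1 + exp(-tau))."""
    return g**2 * (tau - 1.0 + math.exp(-tau))

def lam_numeric(tau, g, m=2000):
    """g^2 * integral of exp(-(e - b)) over 0 < b < e < tau: the inner
    integral over e has the closed form 1 - exp(-(tau - b)); the outer
    integral over b is done with an m-point midpoint rule."""
    h = tau / m
    total = 0.0
    for i in range(m):
        b = (i + 0.5) * h
        total += (1.0 - math.exp(-(tau - b))) * h
    return g**2 * total
```

The two agree to the accuracy of the quadrature, confirming that the Poisson mean equals the single-phonon phase-space volume weighted by the phonon propagator.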

Appendix B: Description of the Normalizing Flow architecture
The model employed in this study is a conditional Normalizing Flow architecture using rational quadratic splines (RQS) [45] parametrized by the output of a masked multi-layer perceptron used as an autoregressive layer [50].To elucidate this architecture in greater detail, we present the mathematical formulation of T θ in an introductory way.
The parametric transformation constituting the flow is defined by its action on the components of the two-dimensional input vector z, which is represented as follows:

x_j = g_{ϕ_j}(z_j),  j = 1, 2,   (B1)

where the function g_{ϕ_j} is an invertible RQS parametrized by ϕ_j, containing the coordinates and the (strictly positive) derivatives of each node. Depending on the number of nodes, g_{ϕ_j} is capable of representing more complex invertible transformations within the boundaries defined by B, which was fixed to 22 in order to operate within the Green's function's domain of interest [0, 20]. Generally, the transformation is completed by employing specialized neural networks to compute the appropriate ϕ_j, based on the input z and our conditional parameter τ, leading to the specific transformation mapping the initial distribution p_Z to the target p̃_X. Our model employed a multi-layer perceptron (MLP) to evaluate the positions and derivatives of 5 nodes per spline, leading to two ϕ_j composed of 15 entries each for the 2D model used in this work. In particular, a masked MLP (MMLP) was constructed based on the implementation of the Masked Autoencoder for Distribution Estimation (MADE) [42], shown in Fig. 7, to compute ϕ_j as a function of τ and z_{<j} as follows:

[ϕ_1(τ), ϕ_2(τ, z_1)] = MMLP(x, θ),  x = concat(τ, z).   (B2)

Once Eq. B1 and Eq. B2 are coupled together, they form a map acting on ℝ² parametrized by the network weights θ and conditioned by τ, constituting the following conditional flow transform:

R_θ(z; τ) = [g_{ϕ_1(τ)}(z_1), g_{ϕ_2(τ, z_1)}(z_2)],  det J_{R_θ}(z, τ) = ∏_{j=1}^2 ∂g_{ϕ_j}(z_j)/∂z_j,   (B3)

where we can see how the determinant of the Jacobian, det J_{R_θ}(z, τ), takes a simple form since the matrix is triangular. The final transformation T_θ(·; τ) presented as our model in Sec. II B was given by a composition of two R_θ, giving the following variational map:

T_θ(z; τ) = R_θ(R_θ(z; τ); τ).   (B4)

The Jacobian determinant can then be evaluated using the chain rule of differentiation.
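The autoregressive structure and the triangular Jacobian can be illustrated with a minimal two-dimensional conditional flow. In this sketch (entirely ours), affine maps stand in for the rational quadratic splines and a toy function replaces the masked MLP; what it preserves is the key property that ϕ_1 depends only on τ while ϕ_2 also depends on z_1, so the map inverts component by component and log|det J| is a sum of diagonal terms.

```python
import math

def conditioner(tau, z1=None):
    """Toy stand-in for the masked MLP: returns (scale, shift).
    phi_1 depends only on tau; phi_2 also on z1 (autoregressive masking)."""
    if z1 is None:
        return 1.0 + 0.1 * tau, 0.0
    return 1.0 + 0.05 * math.tanh(z1), 0.1 * tau

def flow(z, tau):
    """x = R_theta(z; tau) and log|det J|; the Jacobian is lower
    triangular (dx1/dz2 = 0), so det J = a1 * a2."""
    a1, c1 = conditioner(tau)
    x1 = a1 * z[0] + c1
    a2, c2 = conditioner(tau, z1=z[0])
    x2 = a2 * z[1] + c2
    return (x1, x2), math.log(abs(a1)) + math.log(abs(a2))

def flow_inverse(x, tau):
    """Invert component by component: recover z1 first, then z2 given z1."""
    a1, c1 = conditioner(tau)
    z1 = (x[0] - c1) / a1
    a2, c2 = conditioner(tau, z1=z1)
    z2 = (x[1] - c2) / a2
    return (z1, z2)
```

Replacing the affine maps with RQS transforms and the toy conditioner with a MADE-style masked network recovers the structure of Eqs. B1-B3 without changing the inversion logic.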

FIG. 1. (a) Graphical representation of the Green's function expansion for the single-site Holstein model. (b) Vector description of a general diagram used in this work; the interaction times are stored in two separate vectors collecting the beginnings and the ends of the phonon lines, while τ encodes the length of the electron line, defining the diagram's length.

FIG. 2. Representation of the NF-DMC workflow. The model takes as input the diagram's length τ, which is used as a conditional parameter to sample both the order n and the phonon interaction times b and e. n is initially sampled from a suited distribution (e.g. the Poisson distribution shown in histogram form in the inset). Then an increasing number of phonon lines is added to the initial zero-order diagram (n = 0) defined by τ, by transforming Gaussian samples z using the τ-conditioned NF transform T_θ. Finally, for each τ an n-order diagram (i.e. with n phonon lines) is obtained, as sketched at the bottom.

FIG. 3. Target density p̃_X and the obtained reconstruction using the NF model p_θ. Both are plotted for a selected value of the conditional parameter τ of 10, while the hyperparameter α was set to 50.

FIG. 4. Comparison between standard and NF-augmented DMC for the estimate of E_p and G(τ). (a) E_p at g = 0.7 as a function of the chain's steps; the bold line marks the mean value evolution, the coloured area represents the error, and the black dashed line indicates the exact value. (b) Green's function estimate (in logarithmic scale) as a function of τ for different coupling strengths g. Lines and colours have the same meaning as in panel (a).

FIG. 5. Effective sample size of the two different Markov chains compared at different coupling constants. The algorithm using the NF global update remains superior along the whole line, showing a reduction in performance only upon reaching g close to the model instability given by G(τ) becoming flat.

FIG. 6. Test of the obtained p_n(n|τ, g) (red triangles) on the collected DMC distribution (blue histograms) at different τ and g values. A perfect overlap between the analytic form used in this work and the raw data is clearly seen.