ana_cont: Python package for analytic continuation

We present the Python package ana_cont for the analytic continuation of fermionic and bosonic many-body Green's functions by means of either Padé approximants or the maximum entropy method. The determination of hyperparameters and the implementation are described in detail. The code is publicly available on GitHub, where documentation and learning resources are also provided.


Introduction
One of the beauties of complex analysis is that if we know an analytic function of a complex variable, i.e. a holomorphic function, on an (open) subdomain, we know it on any connected domain. The many-body Green's functions of quantum field theory (Abrikosov et al., 1975) are such holomorphic functions, and it is often more convenient, or numerically more stable, to do calculations for imaginary frequencies or times. This is possible because of the analytic continuation. But it eventually requires an analytic continuation back to real (physical) frequencies and times at the end. Take, for instance, the time propagation e^{-iHt} with the Hamiltonian H from time zero to t (Planck constant ħ ≡ 1). An analytic continuation to imaginary times, t → -iτ, here also called a Wick rotation (Abrikosov et al., 1975), yields e^{-Hτ} and allows us to treat the time propagation on the same footing as the Boltzmann operator e^{-Hβ} (β = 1/T: inverse temperature; Boltzmann constant k_B ≡ 1). Not only because of the unified time propagation, (semi-)analytical calculations are often easier to formulate in imaginary times or frequencies.
Numerical approaches such as plain-vanilla exact diagonalization or solving the parquet equations (Bickers, 2004; Li et al., 2019) for imaginary Matsubara frequencies avoid poles on the real frequency axis and the connected numerical instabilities or discretization errors. Some numerical methods such as quantum Monte Carlo simulations (Gull et al., 2011; Wallerberger et al., 2019) do not even work properly for real times. In all of these cases one needs, at the very end, an analytic continuation back to the real axis if physically relevant dynamics is calculated.
Two state-of-the-art approaches to this end are the Padé approximation (Press et al., 2007) and the maximum entropy (MaxEnt) method (Jarrell and Gubernatis, 1996). For more recent advances cf. Bergeron and Tremblay (2016), Levy et al. (2017), and Kraberger et al. (2017). Alternatives such as sparse modeling of Green's functions in the intermediate representation (Otsuki et al., 2017) or neural networks (Fournier et al., 2020) are also discussed in the literature.
In this paper, we present a pedagogical introduction to the analytic continuation of Green's functions by means of the Padé approximation (Press et al., 2007) and the maximum entropy (MaxEnt) method. We exemplify how noise and the choice of parameters affect the results. Last but not least, we introduce the open-source Python package ana_cont and discuss its implementation and usage.
Outline. The manuscript is structured as follows: we first give a minimal introduction to many-body Green's functions and their analytic properties in Section 2. After that, we review the method of Padé interpolation in Section 3. Then we show how a probabilistic approach, using Bayes' theorem, leads to the maximum entropy method (MaxEnt) in Section 4. In the subsequent Section 5 we discuss technical aspects of MaxEnt, which allow us to establish stable workflows for determining hyperparameters. Then, in Section 6, we present our actual implementation. In detail, we describe numerical aspects as well as how to use it for self-energies and susceptibilities. The installation procedure is explained in Section 7. Finally, we conclude our work in Section 8.

Many-body Green's functions
Many properties of a system of interacting electrons are encoded in the retarded one-particle Green's function (Abrikosov et al., 1975)

G^R(t) = -iΘ(t) ⟨{ĉ(t), ĉ†(0)}⟩.    (1)

This definition uses second quantization, where an operator ĉ†(t) creates an electron at time t, and ĉ(t) annihilates it. For the sake of brevity we omit further indices such as momentum or site here, because only the general analytic properties are of relevance for the analytic continuation. The Heaviside function Θ(t) is 0 for all times t < 0, and 1 for t > 0. Angular brackets ⟨···⟩ denote averaging over a grand canonical ensemble. It is usually more instructive to look at the retarded Green's function in the frequency domain, where frequencies ω are equivalent to energies by setting ħ ≡ 1. However, computational methods for many-body systems often employ the Matsubara formalism (Matsubara, 1955), where correlation functions are calculated on the imaginary time or frequency axis. The fermionic one-particle Green's function in imaginary time τ is

G(τ) = -⟨T_τ ĉ(τ) ĉ†(0)⟩,

with the time-ordering operator T_τ, and by Fourier transform one obtains

G(iω_n) = ∫₀^β dτ e^{iω_n τ} G(τ),

where by iω_n we denote the fermionic Matsubara frequencies ω_n = (2n + 1)π/β.
It can be shown that all physical information of the Green's function is encoded in a spectral function A(ω), since at any complex frequency z off the real axis the Green's function is

G(z) = ∫ dω A(ω)/(z - ω).    (5)

By evaluating Eq. (5) at Matsubara frequencies z = iω_n, we get the Matsubara Green's function, and at z = ω + i0⁺ we get the retarded Green's function, as depicted in Fig. 1. Generally, Eq. (5) allows calculating the Green's function in the entire complex plane. By applying the Sokhotski-Plemelj theorem (often called the Weierstrass formula) we obtain the important relation

A(ω) = -(1/π) Im G^R(ω).
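As an illustration of Eq. (5), the following NumPy sketch evaluates the Matsubara Green's function of a Gaussian test spectrum by a simple Riemann sum. The grids and the spectrum are illustrative choices of ours, not part of ana_cont:

```python
import numpy as np

# Discretized real-frequency grid and a normalized Gaussian test spectrum
w = np.linspace(-10.0, 10.0, 4001)
dw = w[1] - w[0]
A = np.exp(-0.5 * w**2) / np.sqrt(2.0 * np.pi)

# Fermionic Matsubara frequencies w_n = (2n+1)*pi/beta
beta = 10.0
wn = (2 * np.arange(20) + 1) * np.pi / beta

# Eq. (5): G(iw_n) = int dw A(w) / (iw_n - w), evaluated as a Riemann sum
G = np.array([np.sum(A * dw / (1j * nu - w)) for nu in wn])

# Since A is normalized, G(iw_n) approaches 1/(iw_n) at large w_n
print(abs(G[-1] - 1.0 / (1j * wn[-1])))
```

The high-frequency check reflects the first term of the asymptotic expansion of Eq. (5), G(iω_n) = 1/(iω_n) + O(1/ω_n²) for a normalized spectrum.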

Padé interpolation
In Section 2 we have seen that the Matsubara Green's function is connected to the spectral density through the integral relation Eq. (5). Since the spectral density contains valuable information, it is of interest to extract it from the Matsubara Green's function.
If the Green's function is given analytically, one may simply substitute iω_n by ω + i0⁺. Considering that in most calculations the Green's function is evaluated only numerically, a different approach is required.
A logical next step is to approximate the numerical values of the Green's function by an analytical function, which can then be evaluated on the real axis through the substitution above. This is realized in the technique of Padé approximants (Press et al., 2007), where a given set of data points is interpolated by a rational function. An efficient algorithm, designed for Green's functions, was put forward already by Vidberg and Serene (1977). Although frequently called a fit, the Padé approximation is thus actually an interpolation of given data. Vidberg and Serene (1977) show that using Padé approximants for analytic continuation works remarkably well. The authors, however, also discuss a major drawback of the method: it is by no means clear a priori which Matsubara frequencies should be selected for constructing the interpolation. Instead they propose to compare several Padé approximants, where different sets of Matsubara frequencies are used.
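The Vidberg-Serene recursion is compact enough to sketch in full. The following minimal NumPy implementation is ours, for illustration only (ana_cont ships its own Padé solver): it builds the continued-fraction coefficients from the tableau recursion and evaluates the approximant by the standard forward recurrence.

```python
import numpy as np

def pade_coefficients(z, u):
    """Vidberg-Serene tableau; the coefficients are a_i = g_i(z_i)."""
    n = len(z)
    g = np.zeros((n, n), dtype=complex)
    g[0] = u
    for p in range(1, n):
        g[p, p:] = (g[p - 1, p - 1] - g[p - 1, p:]) / ((z[p:] - z[p - 1]) * g[p - 1, p:])
    return np.diag(g).copy()

def pade_eval(zq, z, a):
    """Evaluate the continued fraction by the forward recurrence for A_n, B_n."""
    A_prev, A = 0.0 + 0j, a[0]
    B_prev, B = 1.0 + 0j, 1.0 + 0j
    for i in range(1, len(a)):
        A, A_prev = A + (zq - z[i - 1]) * a[i] * A_prev, A
        B, B_prev = B + (zq - z[i - 1]) * a[i] * B_prev, B
    return A / B

# Noise-free test data: Matsubara values of a Gaussian spectral function
beta = 10.0
z = 1j * (2 * np.arange(6) + 1) * np.pi / beta
w = np.linspace(-8.0, 8.0, 3001)
A_w = np.exp(-0.5 * w**2) / np.sqrt(2.0 * np.pi)
u = np.array([np.sum(A_w * (w[1] - w[0]) / (zi - w)) for zi in z])

a = pade_coefficients(z, u)
# By construction, the approximant interpolates the input data exactly
print(max(abs(pade_eval(zi, z, a) - ui) for zi, ui in zip(z, u)))
```

Note that the recursion divides by tableau entries; for noise-free, generic data these are nonzero, but for data that are exactly rational of low degree the recursion degenerates, which is one reason implementations differ in detail.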
Another problem in the construction of Padé approximants arises when the data points on the imaginary axis are subject to stochastic uncertainty or noise. It is then, in general, impossible to interpolate these points by a rational function that is free of poles in the upper half-plane. Hence, noise has the potential to wreak havoc on the interpolation.
We illustrate this behavior in Fig. 2 by taking a simple test function with four poles. We evaluate it on a set of Matsubara frequencies and generate four test data sets with additional noise by adding random numbers from a normal distribution with four standard deviations of different orders of magnitude. The Padé approximant constructed from the exact data (blue symbols) perfectly reproduces the original function; its poles coincide with the true poles (black boxes) of the analytic function. Adding noise of magnitude 10⁻⁹ (orange symbols) just slightly shifts some of the poles, leaving the resulting spectral function practically unchanged. Noise of magnitude 10⁻⁶ (green symbols) already has a much more drastic effect: pairs of almost canceling poles and zeros appear in the upper half plane, which is necessary for the interpolation of the noisy test data. Further increasing the standard deviation of the noise to 10⁻³ (red symbols), a value that is by no means uncommon in actual numerical calculations, increases the number of pole-zero pairs, and the spectral function is already somewhat deteriorated.

Formulation of the analytic continuation problem
Commonly employed are quantum Monte Carlo simulations, whose results intrinsically contain statistical uncertainty. Although we have shown above that the Padé approximation can yield good results even for noisy data, it is difficult to assess its quality in real-world cases, where the exact solution is unknown. The interpolation treats data with random noise as if they were exact. Thus, the deviation of the resulting spectral function from the true underlying spectral function has an element of arbitrariness.
For this reason, most methods for analytic continuation do not try to construct an exactly interpolating analytic function. Instead, one tries to answer the following question: assume that N normal-distributed data points G_n have been measured with corresponding uncertainties σ_n. Given a linear relation like Eq. (5) between the spectral function A(ω) and G_n, or in operator notation G = KA, what is the most probable spectral function A(ω) that fits the data? One can immediately write down the probability for measuring G if the spectral function is A(ω):

Pr[G|A] ∝ exp(-χ²[A]/2),    (9)

with the merit function

χ²[A] = Σ_{n=1}^{N} |G_n - (KA)_n|² / σ_n².    (10)

The maximum likelihood method [cf., e.g., Press et al. (2007)] is based on the assumption that this is numerically equivalent to the probability of A being the true spectral function after G has been measured (assuming we have no a priori information about A): Pr[A|G] = Pr[G|A]. Then, the most probable spectral function is the one that maximizes the exponent of Eq. (9), or in other words, minimizes the merit function χ². This is a general least-squares problem (Press et al., 2007). For further analysis, let us discretize the integral,

χ²[A] = Σ_{n=1}^{N} |G_n - Σ_k K(iν_n, ω_k) A(ω_k) Δω_k|² / σ_n².    (11)

Only for the sake of demonstration, assume that the measurement uncertainty σ_n is constant for all n and the discretization of ω is uniform, Δω_k = Δω. In this case the least-squares problem is reduced to the set of normal equations K†K A = K†G (up to constant factors). Hence, the kernel determines the "hardness" of the problem, and it is worthwhile to have a closer look at its properties.

Properties of the kernel
The kernel that relates spectral functions to Green's functions in Matsubara frequencies is

K(iν_n, ω) = 1/(iν_n - ω).    (14)

Here, ω is a frequency on the real axis and iν_n are (fermionic or bosonic) Matsubara frequencies. The properties of the kernel are best understood by first discretizing the real-frequency axis, such that the kernel becomes a matrix K_jk = K(iν_j, ω_k). It is, however, not a square matrix, since the number of real frequencies has to be large enough to resolve all features of the spectral function, whereas the number of Matsubara frequencies only has to be large enough to reach the asymptotic region. These criteria are mutually independent, and thus it would not be justified to force the kernel to be a square matrix. Numerical necessity also often restricts the number of Matsubara frequencies.
A singular-value decomposition can readily be performed also for general non-square matrices:

K = U Ξ V^T,    (15)

where U and V are column-orthogonal matrices and Ξ = diag(ξ) contains the vector ξ of singular values. In Fig. 3 the imaginary part of the matrix V^T is shown, together with the singular values. The columns of V can be understood as basis functions for the spectral function (Shinaoka et al., 2017). The contribution of the m-th component of the spectral function in this basis to the Matsubara data is weighted by the m-th singular value ξ_m. Since only a few singular values are of significant size, the problem of minimizing χ² in Eq. (11) is ill-conditioned and there is a large space of degenerate solutions.
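The statement about the singular values is easy to verify numerically. The following sketch, with illustrative grids of our own choosing (not the package code), builds the fermionic kernel matrix and inspects the decay of its singular values:

```python
import numpy as np

# Fermionic kernel matrix K_jk = 1/(i nu_j - w_k) on illustrative grids
beta = 10.0
nu = (2 * np.arange(50) + 1) * np.pi / beta     # 50 positive Matsubara frequencies
w = np.linspace(-10.0, 10.0, 500)               # 500 real frequencies
K = 1.0 / (1j * nu[:, None] - w[None, :])

# SVD of the non-square kernel; numpy returns the singular values sorted
# in decreasing order
U, xi, Vt = np.linalg.svd(K, full_matrices=False)

# Only a handful of singular values are of significant size
print(xi[0], xi[-1] / xi[0])
```

The near-exponential decay of ξ is what makes the inversion ill-conditioned: components of A along columns of V with tiny ξ_m leave the Matsubara data essentially unchanged.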

Regularization term or prior probability
If there are many completely different solutions to the problem, this may mean that we do not have enough information to actually solve it. It may, however, also mean that we do not use all our knowledge about the problem in the right way. Our inference has to make use of all available information, and this can be done by means of Bayes' theorem,

Pr[A|G] = Pr[G|A] Pr[A] / Pr[G],    (16)

where we want to find the spectral function A (hypothesis) that maximizes the conditional or posterior probability Pr[A|G] of the hypothesis being true if data G have been measured. The likelihood function Pr[G|A] has already been defined in Eq. (9). Additionally, with Pr[A] and Pr[G] we now have the prior probabilities of the hypothesis and the data, respectively. Eq. (16) is much richer than Eq. (10), and there is hope that with Bayesian statistics we get better results than with the maximum likelihood method.
However, we first have to model the prior probability of a spectral function, Pr[A]. This modeling is not univocal (Press et al., 2007), but a highly successful and state-of-the-art choice is the entropic prior

Pr[A] ∝ exp(αS[A]),    (17)

with the entropy relative to a default model D(ω),

S[A] = ∫ dω [A(ω) - D(ω) - A(ω) ln(A(ω)/D(ω))].    (18)

This leads to the maximum entropy method (MaxEnt). The concept of entropy was introduced into information theory by Shannon (1948), and Jaynes (1957) established the principle of maximum entropy as "a method of reasoning" in statistical mechanics. The maximum entropy method was employed very successfully for the deconvolution of optical data (Frieden, 1972; Gull and Daniell, 1978). After further developments and applications to image processing (Gull and Skilling, 1984), it was also applied to the analytic continuation problem (Silver et al., 1990b,a; Jarrell and Gubernatis, 1996). Combining Eq. (16) with Eq. (9), Eq. (17), and Eq. (18), we have to maximize the probability

Pr[A|G] ∝ exp(-Q_α[A]),    (19)

where

Q_α[A] = χ²[A]/2 - αS[A],    (20)

with a yet unspecified scaling hyperparameter α > 0. The prior probability of the data, Pr[G], also referred to as the evidence, is fixed, since we are working with one measured data set. Therefore it does not depend on the spectral function and can, in our considerations, be absorbed in the normalization of Pr[A|G].
Instead of the least-squares problem of Eq. (11) we now face the minimization of the functional Q_α[A]. The entropy, which was introduced as the prior probability of the spectral function, now serves as a regularization term for the ill-conditioned least-squares problem. This becomes apparent by looking at the Hessian matrix of Q with respect to A,

∂²Q_α/∂A_i ∂A_j = (K^T Σ⁻¹ K)_{ij} + α δ_ij / A_i,    (21)

with Σ = diag(σ_n²), which is here computed by choosing a discretization of the real-frequency axis and deriving with respect to the values of the spectral function at these discrete points. The first term of Eq. (21) contains the product K^T K; thus its conditioning is even worse than that of K: most of its eigenvalues are practically zero. However, the second term of Eq. (21) adds a positive diagonal matrix, scaled by α. This has the effect of increasing the eigenvalues, i.e. the curvature of Q at its minimum, and thus makes it easier to actually locate the minimum.
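A small numerical experiment illustrates this regularization effect. The grids, the value of α, and the flat test spectrum are illustrative choices of ours; for a real spectrum A the diagonal term varies over the grid:

```python
import numpy as np

# Kernel on illustrative grids
beta = 10.0
nu = (2 * np.arange(30) + 1) * np.pi / beta
w = np.linspace(-10.0, 10.0, 200)
K = 1.0 / (1j * nu[:, None] - w[None, :])

# First Hessian term: for a real spectrum, chi^2 contributes Re(K^dagger K),
# a 200x200 matrix of rank at most 30, hence (nearly) singular
H0 = (K.conj().T @ K).real

# The entropy term adds alpha * diag(1/A_i); take a flat test spectrum
alpha = 0.1
A = np.full(w.size, 0.05)
H = H0 + alpha * np.diag(1.0 / A)

print(np.linalg.cond(H0), np.linalg.cond(H))
```

The condition number drops from an astronomically large value (the rank-deficient χ² term alone) to a modest one once the α-scaled diagonal is added, which is exactly why the minimum of Q_α becomes easy to locate.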
Let us note that, instead of the entropy, one may also choose a different regularization term. Otsuki et al. (2017) introduced the so-called sparse modeling technique, where solutions are prioritized if they are sparse in the singular space of the kernel. An implicit regularization effect can be achieved in stochastic sampling methods (Sandvik, 1998; Mishchenko et al., 2000; Nordström et al., 2016; Ghanem and Koch, 2020).

Choice of the hyperparameter α
Having seen that the functional Q_α[A] can be minimized for α > 0, we now have to determine how to actually choose a good value of α. The importance of this choice is illustrated in Fig. 4. We choose the asymmetric two-peak spectrum that is shown as a black curve in the right panel. By Eq. (14) it is transformed to Matsubara frequencies, and noise with an amplitude of 10⁻⁴ is added. Assuming a flat default model, we calculate the spectral function A(ω) that minimizes Eq. (20) for a large range of values of α.
The influence of α is shown as a density plot in the top left panel of Fig. 4. At very high values of α we recover the constant default model; no influence of the data can be noticed. As α is decreased, the spectral weight first concentrates in the middle, then we observe the formation of two distinct peaks. At small values of α the peaks split, and in the end we get six very sharp peaks instead of two broad ones. The number and positions of such peaks are largely determined by the noise of the data. In our test case, generating the noise with a different random seed would lead to a seemingly completely different spectrum in the low-α limit.
Since the result of the analytic continuation is apparently controlled to a large extent by α, it is extremely important to choose it in a reasonable and automatizable way. Various approaches have been proposed to tackle this problem.
In so-called historic MaxEnt, α was chosen such that the χ²-deviation (Eq. (11)) is approximately equal to the number of data points on the imaginary axis (Gull, 1989). It was however found to under-fit the data, and both Gull (1989) and Skilling (1989) argue that it is somewhat ad hoc.
Classic MaxEnt extends the Bayesian inference scheme to α and D (Skilling, 1989; Jarrell and Gubernatis, 1996):

Pr[A, α|G, D] ∝ Pr[G|A] Pr[A|α, D] Pr[α],    (22)

where the scale-invariant Jeffreys prior Pr[α] = 1/α (Jeffreys, 1939) is used for the prior probability of α. Subsequently, the dependence on the spectral function in Eq. (22) is integrated out and one obtains an explicit form for Pr[α|G, D]. In the bottom left panel of Fig. 4 the logarithm of the posterior probability Pr[α|G, D] is shown as an orange curve. Classic MaxEnt takes the spectral function at the most probable value of α as the best solution to the problem. This value is usually obtained by setting ∂Pr[α|G, D]/∂α = 0, which, after a few approximations, can be analytically transformed to the equation N_g = -2αS. Here N_g is called the "number of good measurements"; for its definition we refer to Jarrell and Gubernatis (1996). In the bottom left panel of Fig. 4 we plot -N_g/(2αS) as a blue line; its intersection with 1 is the classic MaxEnt solution, marked by a blue vertical line. The corresponding spectrum is shown in the right panel.
Closely related is Bryan's method (Bryan, 1990), where one takes the average over all values of α, weighted by the probability:

⟨A⟩ = ∫ dα A_α Pr[α|G, D].    (23)

The results are usually very close to classic MaxEnt, since the probability of α is sharply peaked; see the blue and orange dashed lines in Fig. 4. As one can clearly see, the solutions of classic and Bryan's MaxEnt may exhibit unphysical extra peaks, a drawback already noticed by Gull (1989).

There is, however, yet another approach to deciding on the best α. It is based on the so-called L-curve criterion (Lawson and Hanson, 1974, 1995). In the context of MaxEnt it was, to our knowledge, first proposed and explained by Bergeron and Tremblay (2016) and subsequently also applied by Kraberger et al. (2017). It is somewhat more heuristic than classic or Bryan's MaxEnt in that it relies only on the behavior of χ² (like historic MaxEnt). To apprehend this approach, we plot χ² as a green curve in the lower left panel of Fig. 4. In the limit α → ∞ it approaches a constant high value, and for α → 0 another constant of lower value is approached. Following the reasoning of Bergeron and Tremblay (2016), there is a clear interpretation of this behavior: in the high-α limit, the (large) χ²-deviation has negligible weight, such that the default model is essentially reproduced. In the low-α limit, on the other hand, the contribution of the entropy is reduced to enforcing positivity of the spectrum, and χ² slowly approaches its global minimum. The low-α region, where χ² is relatively flat, and the subsequent region with a steep increase of χ², are coined the noise-fitting and information-fitting regions (Bergeron and Tremblay, 2016). The optimal value of α is situated in the transition region between information fitting and noise fitting, where log χ²(log α) has a kink. We therefore call this method of determining α "chi2kink". This optimal α is indicated by a maximum in the curvature of χ²(α) (Bergeron and Tremblay, 2016; Kraberger et al., 2017). A more numerically stable and flexible approach is to fit the function

φ(x; a, b, c, d) = a + b/(1 + e^{-d(x-c)})    (24)

to several values of log₁₀χ² as a function of log₁₀α. The curvature of φ in Eq. (24) is maximal at x = c - ln(2 + √3)/d ≈ c - 1.317/d. Since this usually leads to strong underfitting, we suggest to use the more general x = c - f/d, i.e. α = 10^{c-f/d}, with preferred values of f ∈ [2, 2.5]. In Fig. 4 the value corresponding to f = 2 is marked by green vertical lines.

[Fig. 4 caption (partial): The red and blue vertical lines mark historic and classic MaxEnt, respectively (cf. colors in the legend of the right panel). The value of α obtained by the "chi2kink" method is marked by a green vertical line. Note that Bryan's method cannot be highlighted in such a way, since it is a weighted average over a range of α (the probability that serves as a weight is drawn in orange). Lower right panel: spectral functions obtained by the four different methods compared to the real (initial) spectrum.]
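A minimal sketch of this "chi2kink" fit follows, using synthetic log₁₀χ²(log₁₀α) data with a known kink. The synthetic data, the noise level, and the starting values are illustrative; ana_cont performs the corresponding fit internally:

```python
import numpy as np
from scipy.optimize import curve_fit

def phi(x, a, b, c, d):
    # Model of Eq. (24) for log10 chi^2 as a function of x = log10 alpha
    return a + b / (1.0 + np.exp(-d * (x - c)))

# Synthetic data with the typical plateau-kink-plateau shape (true c=2, d=3)
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 6.0, 30)
y = phi(x, 1.0, 6.0, 2.0, 3.0) + rng.normal(0.0, 0.01, x.size)

popt, _ = curve_fit(phi, x, y, p0=(0.0, 5.0, 1.0, 2.0))
a, b, c, d = popt

f = 2.0                          # preferred f in [2, 2.5]
alpha_opt = 10.0 ** (c - f / d)  # the accepted value of alpha
print(c, d, alpha_opt)
```

Because the fit uses all sampled values of χ², it is considerably more robust against noise in individual α-points than estimating the curvature maximum directly.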
In summary, historic MaxEnt seems to underfit only with respect to classic MaxEnt, since the latter tends to severe overfitting. Bergeron and Tremblay (2016) note that the formula for the posterior probability of α is an approximation that works well only when a good default model is used.
In order to get good results with classic MaxEnt, it has therefore become common practice to increase the standard deviation σ_n by a manually chosen factor, shifting the peak of the probability to a larger value of α. Since the rescaling factor is not known a priori (although a dependence on the actual noise amplitude is observed), this procedure introduces an additional degree of arbitrariness.
Determining the optimal α as the point where noise fitting starts and information fitting ends ("chi2kink") leads to considerably larger values of α, even with respect to historic MaxEnt. Despite the drawback of being less "Bayesian", it does not rely on additional approximations, and the noise amplitude of the data has to be known only up to a constant prefactor. The only remaining source of arbitrariness is the parameter f mentioned above, which controls the actual value of α to be accepted. It is, however, restricted to a rather narrow range, whereas the error rescaling factor in classic MaxEnt can vary by several orders of magnitude.
At least in the present context of analytic continuation, we are therefore convinced that the best method of determining α is to extract it as the border between the noise-fitting and information-fitting regions.

Elimination of unphysical features: Preblur
Even though we are now able to find the "best" value of α, the corresponding spectrum may show some undesirable artifacts. In the right panel of Fig. 4 we can see that all solutions have an enlarged curvature around ω = 0 and that side peaks emerge upon lowering α. An analogous tendency was observed already in the original application of MaxEnt in image processing, where it manifested itself in huge peaks in the intensity distribution. In order to address this problem, Skilling (1991) showed that the entropy does not need to be evaluated from the spectral function directly, but from a "hidden" function h(ω) that is related to the spectral function in a linear way,

A = Ω h.    (25)

In standard MaxEnt, Ω is just unity. Although initially requiring Ω to be an orthogonal matrix, Skilling (1991) subsequently assumed a Gaussian convolution A = g_b ∗ h with great success and named the technique "preblur". In the latter case, Ω is a matrix representation of the Gaussian

g_b(ω) = exp(-ω²/(2b²)) / (b√(2π)).

In the context of analytic continuation, preblur was introduced by Kraberger et al. (2017), although already Sandvik (1998) used a somewhat similar smoothing algorithm.
The functional of Eq. (20), which we have to minimize, thus becomes

Q_α[h] = χ²[Ωh]/2 - αS[h].    (26)

We now have, in addition to α, a second hyperparameter b that controls the width of the Gaussian g_b.
Let us now have a closer look at the χ²-term of Eq. (26). It sums up the deviation of the fit KA from the data G. Explicitly, the fit function is now

KA = K(Ωh) = (KΩ)h ≡ K̃h.    (27)

This means that the hidden function h is related to the Matsubara Green's function in a similar way as the spectral function, the only difference being the "blurred" kernel K̃ = KΩ. Therefore, preblur is realized by minimizing the functional Q with a slightly modified kernel, which then yields the hidden function h. The spectral function A itself is subsequently obtained by the convolution of h with the same Gaussian g_b(ω) as above.
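The construction of the blurred kernel can be sketched in a few lines. The grids and the discretized convolution matrix Ω below are our own minimal version, for illustration only:

```python
import numpy as np

# Real-frequency grid with uniform spacing (illustrative)
w = np.linspace(-10.0, 10.0, 400)
dw = w[1] - w[0]

# Gaussian blur matrix Omega_ij = g_b(w_i - w_j) * dw, so that A = Omega @ h
b = 0.5
Omega = np.exp(-0.5 * ((w[:, None] - w[None, :]) / b) ** 2) \
        / (b * np.sqrt(2.0 * np.pi)) * dw

# Preblur: the blurred kernel K~ = K Omega acts directly on the hidden function h
beta = 10.0
nu = (2 * np.arange(20) + 1) * np.pi / beta
K = 1.0 / (1j * nu[:, None] - w[None, :])
K_blur = K @ Omega

# Sanity check: blurring a normalized h (well inside the grid) preserves the norm
h = np.exp(-0.5 * w**2) / np.sqrt(2.0 * np.pi)
A = Omega @ h
print(np.sum(A) * dw, np.sum(h) * dw)
```

Norm conservation is the reason the convolution can be absorbed into the kernel without changing the sum rules of the spectrum.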
Although the minimization of Q itself does not become more difficult by introducing preblur, we now face the problem that the blur width b is not known a priori, similar to the entropy-scaling parameter α. However, knowing that we can determine a good value of α by analyzing χ²(α), it is worth analyzing the χ²-deviation as a function of both α and b. This is done in the upper left panel of Fig. 5. Over a large range of b, χ² depends only on α. Only beyond a sharp border (at b ≈ 1 in our example) does the χ²-deviation increase steeply with b, which can be seen best in the lower left panel of Fig. 5. In other words, increasing b up to a certain value does not change the quality of the fit (but potentially the result); beyond it, it is suddenly impossible to get a good fit. A heuristic explanation is that convolving the kernel with a Gaussian permits only peaks with a minimal width of 2b in the spectrum. Once this exceeds the "real" width of one of the peaks in the spectrum, a fit becomes impossible. This observation is corroborated by the fact that we find the maximal permissible b to be close to 1 in Fig. 5, while the actual spectrum (black curve) consists of two Gaussians with standard deviation 1.
The observations made in the upper left panel of Fig. 5 can be used to propose the following algorithm for the determination of the hyperparameters: determine the optimal α for different values of b, starting from 0 and increasing. Accept the last value of b where χ² at the optimal α is smaller than, e.g., 1.5 times χ² at the optimal α for b = 0. The optimal values of α are marked as a red dashed line in the upper left panel of Fig. 5, and the thus determined (α, b) is marked by a red asterisk. Since the optimal value of α hardly changes in the reasonable range of b, one may also use a simplified algorithm: determine the optimal α for b = 0, then increase b, always keeping the same α. If χ² exceeds, e.g., 1.5 times its value at b = 0, accept the last value of b before this happens as the best value. The corresponding (α, b) is marked in Fig. 5 by a green asterisk.
As we can see in the upper right panel, the red and green curves obtained by analytic continuation with these two (α, b) pairs are practically indistinguishable. In the lower right panel, the differences of these reconstructed spectra from the actual spectrum are plotted and show that the deviation is now satisfactorily small.
Let us note in passing that the unphysical wiggles around ω = 0 described above are most prominent in spectral functions similar to our example case. In cases where the spectral function shows a gap or a peak at ω = 0, preblur is usually not required.
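The simplified selection algorithm above can be sketched as follows. Note that chi2_at_optimal_alpha is a stand-in function mimicking the flat-then-steep behavior seen in Fig. 5, not a real MaxEnt run; in practice each evaluation would involve a full minimization of Q at fixed b:

```python
import numpy as np

def chi2_at_optimal_alpha(b):
    """Stand-in for a full MaxEnt run at blur width b: flat in b up to b ~ 1,
    then steeply increasing, mimicking the behavior seen in Fig. 5."""
    return 1.0 + 100.0 * max(0.0, b - 1.0) ** 2

def choose_blur_width(b_grid, tol=1.5):
    """Simplified algorithm: accept the largest b whose chi^2 stays below
    tol times the chi^2 at b = 0."""
    chi2_0 = chi2_at_optimal_alpha(0.0)
    b_best = 0.0
    for b in b_grid:
        if chi2_at_optimal_alpha(b) < tol * chi2_0:
            b_best = b
        else:
            break
    return b_best

b_grid = np.arange(0.0, 2.0, 0.05)
print(choose_blur_width(b_grid))
```

With the stand-in χ²(b), the accepted blur width lies just below the sharp border at b ≈ 1, exactly as described in the text.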

MaxEnt for offdiagonal elements
The considerations above were made for positive definite spectral functions with a finite norm, typically one: ∫dω A(ω) = 1 > 0. This property is fulfilled by diagonal elements of correlation functions. In some cases it is, however, necessary to analytically continue offdiagonal elements, which have zero norm, ∫dω A(ω) = 0, as a direct consequence of the fermionic anticommutation relations. A vanishing norm is not consistent with a positive definite function; thus the spectral function cannot have a definite sign for offdiagonal elements. Clearly, the definition of the entropy, Eq. (18), has to be adapted to this new circumstance (Kraberger et al., 2017). At this point, let us mention that the kernel K(iω_n, ω) is the same for diagonal and offdiagonal elements.
Let us, for offdiagonal components, write the spectrum as

A = A⁺ - A⁻,    (28)

where both A⁺ and A⁻ are positive definite and have the same norm. Then we can write the entropy as the sum of two entropies of the form of Eq. (18),

S[A⁺, A⁻] = Σ_{s=±} ∫ dω [Aˢ(ω) - Dˢ(ω) - Aˢ(ω) ln(Aˢ(ω)/Dˢ(ω))],    (29)

with default models D⁺ and D⁻. Instead of treating A⁺, A⁻ and A as completely independent, we can use Eq. (28) to eliminate A⁻. Since we are searching for a minimum of Q with respect to both A and A⁺, we can use the stationarity condition ∂Q/∂A⁺ = 0 to eliminate also A⁺. Substituting A⁻ = A⁺ - A in Eq. (29) and taking the derivative, A⁺ and A⁻ can be expressed by A as

A^± = (√(A² + 4D⁺D⁻) ± A)/2.

Inserting this back in Eq. (29), we arrive at the so-called positive-negative entropy

S[A] = ∫ dω [√(A² + 4D⁺D⁻) - D⁺ - D⁻ - A ln((√(A² + 4D⁺D⁻) + A)/(2D⁺))].

As a reasonable choice for the default model of an offdiagonal element A_ij, Kraberger et al. (2017) propose to include the diagonal elements of the spectrum via D⁺ = D⁻ = √(A_ii A_jj) + ε. In our implementation, it is not necessary to add the small number ε. Methods that continue matrix-valued Green's functions as a whole, rather than component-wise, have also been developed (Kraberger et al., 2017; Sim and Han, 2018).
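The algebra behind this positive-negative decomposition is easily verified numerically. In the following sketch the diagonal spectra, the offdiagonal test spectrum, and the grids are illustrative choices of ours:

```python
import numpy as np

w = np.linspace(-5.0, 5.0, 1001)
dw = w[1] - w[0]

# Illustrative diagonal spectra and the resulting offdiagonal default model
A_ii = np.exp(-0.5 * (w - 1.0) ** 2) / np.sqrt(2.0 * np.pi)
A_jj = np.exp(-0.5 * (w + 1.0) ** 2) / np.sqrt(2.0 * np.pi)
D = np.sqrt(A_ii * A_jj)          # D+ = D- (epsilon omitted, as in our code)

# A sign-changing offdiagonal test spectrum with zero norm (odd function)
A = w * np.exp(-w ** 2)

X = np.sqrt(A ** 2 + 4.0 * D * D)
A_plus = 0.5 * (X + A)
A_minus = 0.5 * (X - A)

# The two defining properties: A+ - A- = A, and stationarity A+ A- = D+ D-
print(np.max(np.abs(A_plus - A_minus - A)),
      np.max(np.abs(A_plus * A_minus - D * D)))
```

Both identities hold to machine precision, and both A⁺ and A⁻ stay positive, as required for a well-defined entropy of each part.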

Python package for analytic continuation: ana_cont
In the previous sections we have presented two methods for analytic continuation: the Padé approximation and the maximum entropy method. Although both of them are well established and have been implemented before [e.g. by Bergeron and Tremblay (2016), Levy et al. (2017), Kraberger et al. (2017)], we here introduce yet another code for analytic continuation. In contrast to other implementations, it is a Python package. The advantage of this approach is increased maintainability: there are too many ways of doing analytic continuation to encode all of them into a few command line arguments. The ana_cont package is available open source at https://github.com/josefkaufmann/ana_cont.

Package structure
The central class of ana_cont is AnalyticContinuationProblem, which holds problem-specific information: the real-frequency grid, the imaginary frequency or time grid, the data on the imaginary axis, the inverse temperature, and the type of the kernel; see Table 1.

Table 1: Keyword arguments of AnalyticContinuationProblem.
  im_axis      numpy array, shape (N_mats,)
  re_axis      numpy array, shape (N_real,)
  im_data      numpy array, shape (N_mats,), float or complex
  kernel_mode  one of 'freq_fermionic', 'freq_bosonic', 'time_fermionic', 'time_bosonic'
  beta         float (only required for "time" kernels)

It has a method solve, which takes several keyword arguments. The most important one is method. Currently, two methods are implemented: pade and maxent_svd. Calling solve creates an instance of a solver object. In the case of method='maxent_svd', the class MaxentSolverSVD is instantiated. This class contains all functions necessary to solve the analytic continuation problem defined before. Possible keyword arguments are listed in Table 2.
The kernel for the continuation problem at hand is provided by the class Kernel. It handles the real-frequency discretization of the kernel, preblur, and, if a covariance matrix is provided, the rotation to its eigenbasis.
Furthermore, there is a class GreensFunction, whose main purpose is to construct a full complex-valued Green's function from a given spectrum by applying the Kramers-Kronig relation Eq. (5).
We want to stress that this package does not contain a "main program" that can access all features by just specifying some parameters. While this may sound rather discouraging, we believe that in fact this reduces the required amount of work both for the user and for the maintainer. On the user side, a working script can be composed of as few as ten lines of Python code; example scripts can be found in the GitHub repository. There is full freedom in how to provide the imaginary-axis data: they can be read from a favorite file format or calculated on the fly. Computed real-axis data can be plotted or further processed without the detour of file I/O. Thus the present analytic continuation code can easily be embedded in complex postprocessing procedures. Also for the developers this principle is a large gain, since they do not have to take care of a sophisticated user interface and flow control. Since knowledge and experience of analytic continuation are absolutely necessary to use any analytic continuation program, we do not see a principal downside to this approach.
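For orientation, such a short script might look as follows. This is an untested sketch: the module path as well as the solve keywords model and stdev are assumptions on our part, based on the class and keyword names described above; the example scripts in the repository are the authoritative reference.

```python
import numpy as np
from ana_cont import continuation as cont   # module path is our assumption

# Problem setup: fermionic Matsubara data on the imaginary axis
beta = 10.0
iw = (2 * np.arange(20) + 1) * np.pi / beta      # im_axis
w = np.linspace(-10.0, 10.0, 501)                # re_axis
giw = np.load('giw.npy')                         # user-supplied imaginary-axis data

probl = cont.AnalyticContinuationProblem(
    im_axis=iw, re_axis=w, im_data=giw, kernel_mode='freq_fermionic')

# Hypothetical keywords: a flat default model and a constant error estimate
sol = probl.solve(method='maxent_svd',
                  model=np.ones_like(w) / (w[-1] - w[0]),
                  stdev=1e-4 * np.ones_like(iw))
```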
However, for the casual user, we have cast a few standard procedures for analytic continuation into scripts with a graphical user interface (GUI). Thus, most standard analytic continuations of bosonic and fermionic Green's functions by MaxEnt and Padé can be done even without knowledge of Python. Tutorials for the GUI can be found in the wiki of our repository on GitHub: https://github.com/josefkaufmann/ana_cont/wiki

Numerics

Analytic continuation is, compared to large quantum Monte Carlo simulations or Bethe-Salpeter equation inversions, a numerically rather inexpensive task. Therefore the performance of the code does not have to be the main goal. The ana_cont package is written entirely in Python, partly trading performance for flexibility and readability, while still keeping the possibility of later optimization. Nevertheless, several measures have been taken to make the code reasonably fast: a typical MaxEnt analytic continuation takes just O(1) seconds on an average desktop computer.

Frequency-space discretization.
For a numerical treatment it is necessary to discretize the real-frequency axis: ω → ω_i. For an arbitrary function F(ω) we then define F_i ≡ F(ω_i). Integrals over frequency space are converted to sums by ∫ dω F(ω) ≈ Σ_i F_i Δ_i, where Δ_i is the width of the frequency interval centered at ω_i.
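A minimal NumPy sketch (not part of ana_cont's API) of such a grid summation, using a non-uniform grid that is denser around ω = 0:

```python
import numpy as np

# Non-uniform real-frequency grid, denser around ω = 0
x = np.linspace(-1.0, 1.0, 201)
omega = np.sign(x) * x**2 * 100.0              # grid from -100 to 100

# Interval widths Δ_i: half the distance between the two neighboring grid points
delta = np.empty_like(omega)
delta[1:-1] = 0.5 * (omega[2:] - omega[:-2])
delta[0] = omega[1] - omega[0]
delta[-1] = omega[-1] - omega[-2]

# Normalized Lorentzian: on this window ∫ dω F(ω) = (2/π) arctan(100) ≈ 0.994
F = 1.0 / (np.pi * (omega**2 + 1.0))
integral = np.sum(F * delta)                   # Σ_i F_i Δ_i
```

The quadratic spacing concentrates grid points where the spectrum has sharp features, without inflating the total number of intervals.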

Singular value decomposition.
One of the most important steps of MaxEnt is the minimization of the functional Q_α[A] in the space of spectral functions. If the real-frequency axis is discretized into, say, 1000 intervals, the space of solutions has 1000 dimensions. This is too large for a deterministic solver, so that a Monte Carlo method would be needed (Sandvik, 1998). However, Jarrell and Gubernatis (1996) perform a singular value decomposition of the kernel and thereby achieve a large reduction of dimensions, e.g. from 1000 to 20 in typical cases. The optimization problem can then be solved by deterministic methods like Newton root finding or the Levenberg-Marquardt algorithm. The MaxentSolverSVD follows this strategy of singular value decomposition of the kernel, as defined in Eq. (15). The vector ξ of singular values is truncated such that only values larger than a certain threshold (10⁻¹⁰ in our case) are kept. In the singular space the spectrum is parameterized as A_i = D_i exp(Σ_m V_m(ω_i) u_m), with the default model D and the right singular vectors V_m, as proposed by Jarrell and Gubernatis (1996).
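The dimensional reduction can be illustrated with plain NumPy (a sketch, not ana_cont's internal code), here for the fermionic Matsubara-frequency kernel 1/(iν_n − ω); stacking its real and imaginary parts into one real matrix before the decomposition is our choice for illustration:

```python
import numpy as np

beta = 20.0
nu = np.pi / beta * (2 * np.arange(100) + 1)   # fermionic Matsubara frequencies
w = np.linspace(-10.0, 10.0, 1000)             # real-frequency grid

Kc = 1.0 / (1j * nu[:, None] - w[None, :])     # kernel 1/(iν_n - ω)
K = np.vstack([Kc.real, Kc.imag])              # stack into one real matrix

U, xi, Vt = np.linalg.svd(K, full_matrices=False)
keep = xi > 1e-10                              # truncation threshold used in ana_cont
n_sing = np.count_nonzero(keep)                # typically a few dozen dimensions

# Rank-reduced reconstruction: error is bounded by the largest dropped value
K_red = (U[:, keep] * xi[keep]) @ Vt[keep]
```

The singular values of analytic continuation kernels decay very quickly, which is precisely why the optimization can be restricted to the small singular space with negligible loss of information.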

Minimization problem.
The above definitions are inserted into Eq. (20), and after a few intermediate steps the stationary condition ∂Q_α/∂A_m = 0 leads to an equation f(u) = 0, which has to be solved for u_m. Root finding algorithms work better when the derivative of the root function is known analytically. Therefore, we take the derivative of f with respect to u to obtain the Jacobian J. The expressions Eq. (36) and Eq. (37) can be put into a form that is more efficient for numerical evaluation, in which the quantities W_ml, W_mil, and B_m are precomputed in order to minimize the number of matrix multiplications during the optimization procedure. Please note that we do not use the Einstein summation convention. For the optimization, the Levenberg-Marquardt implementation of scipy.optimize.root can be used by passing optimizer='scipy_lm' to the solver. However, we find simple Newton root finding to be much faster while remaining numerically stable. This is activated by setting optimizer='newton'.
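The structure of this step, Newton iterations with an analytically supplied Jacobian, can be sketched generically; the toy root function below merely stands in for the actual MaxEnt equations Eq. (36)/(37):

```python
import numpy as np

def newton_root(f, jac, u0, tol=1e-12, maxiter=100):
    """Simple Newton root finding: u <- u - J(u)^{-1} f(u)."""
    u = np.asarray(u0, dtype=float)
    for _ in range(maxiter):
        step = np.linalg.solve(jac(u), f(u))
        u = u - step
        if np.max(np.abs(step)) < tol:
            break
    return u

# Toy root problem with analytic Jacobian: f(u) = A u + e^u - 1, root at u = 0
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda u: A @ u + np.exp(u) - 1.0
jac = lambda u: A + np.diag(np.exp(u))
u_star = newton_root(f, jac, np.ones(2))
```

Supplying the exact Jacobian instead of a finite-difference approximation is what makes the deterministic optimization in the small singular space both fast and robust.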
While the above formulas are used for diagonal elements of correlation functions, only very small changes lead to a form that can be used for offdiagonal elements; the corresponding singular-space parameterization and minimization problem follow Kraberger et al. (2017).

Determination of optimal α

In Section 5.1 we have discussed various ways of determining a good value of the hyperparameter α. All of these methods are implemented in ana_cont and can be used by setting the solver keyword alpha_determination to one of 'historic', 'classic', 'bryan' or, as recommended, 'chi2kink'.
From the algorithmic point of view, all these methods have in common that the optimization is done for several values of α, starting at a very high value, e.g. α = 10¹². If alpha_determination='chi2kink', this starting value can be adapted by setting the keyword alpha_start. One should use a value that leads to a solution very close to the default model. Subsequently, α is decreased by a constant factor alpha_div in each step, which has the advantage that the solution of the previous step can be used as the starting point of the next. After reaching the smallest value of α (alpha_end), the function of Eq. (24) is fitted, and the parameter f is specified by the keyword fit_position. One has to be especially careful about the choice of α when continuing offdiagonal elements of correlation functions. With a spectrum that is not positive definite, there are more degrees of freedom for noise fitting. Therefore, in the double-logarithmic plot of χ²(α), the noise-fitting region may no longer be a plateau, but rather another, less steep linear slope. The optimal α then has to be chosen at the point where the slope changes.
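The fitting step of 'chi2kink' can be sketched with SciPy. We assume here a sigmoid form φ(x) = a + b/(1 + e^{−d(x−c)}) for log₁₀ χ² as a function of x = log₁₀ α, and the placement rule x_opt = c − f/d is our illustrative assumption for how fit_position enters:

```python
import numpy as np
from scipy.optimize import curve_fit

def phi(x, a, b, c, d):
    """Sigmoid model for log10 chi^2 as a function of log10 alpha."""
    return a + b / (1.0 + np.exp(-d * (x - c)))

# Synthetic log10 chi^2(log10 alpha) data with a kink around log10 alpha = 4
log_alpha = np.linspace(0.0, 12.0, 25)
log_chi2 = phi(log_alpha, -2.0, 5.0, 4.0, 1.5)

popt, _ = curve_fit(phi, log_alpha, log_chi2, p0=[0.0, 5.0, 2.0, 1.0])
a, b, c, d = popt

fit_position = 2.5                       # stand-in for the fit_position keyword
log_alpha_opt = c - fit_position / d     # placement rule assumed for illustration
alpha_opt = 10.0 ** log_alpha_opt
```

The fit locates the kink between the noise-fitting plateau (small α) and the default-model plateau (large α); the optimal α is then placed a controlled distance away from the inflection point.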

Preblur
MaxEnt with preblur, as described in Section 5.2, is implemented in ana_cont. Preblur is activated by passing preblur=True to the solver and specifying the parameter b by blur_width=b. Our recommended workflow is to first do an analytic continuation with b = 0 and then increase b step by step. The final value b* should be large enough to eliminate spurious features in the spectrum, but still small enough that the limit lim α→0 log χ²(α) is not significantly increased with respect to b = 0.
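A minimal sketch of the preblur construction (plain NumPy; the discretized Gaussian convolution matrix below is our illustration, not ana_cont's internal code): the visible spectrum A is a Gaussian blur, of width b, of a hidden spectrum h:

```python
import numpy as np

def blur_matrix(w, b):
    """Gaussian blur matrix B_ij ~ g_b(w_i - w_j) * dw on a uniform grid."""
    dw = w[1] - w[0]
    g = np.exp(-(w[:, None] - w[None, :])**2 / (2.0 * b**2))
    return g / (np.sqrt(2.0 * np.pi) * b) * dw

w = np.linspace(-10.0, 10.0, 401)
h = np.exp(-(w - 1.0)**2 / 0.02)           # sharp hidden spectrum
h /= np.trapz(h, w)                        # normalize to 1

A = blur_matrix(w, 0.5) @ h                # blurred spectrum A = B h, b = 0.5
```

The convolution preserves the norm of the spectrum while suppressing features sharper than b, which is why a suitable b removes spurious wiggles without distorting broad structures.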

MaxEnt for susceptibilities
The analytic continuation of susceptibilities, i.e. bosonic Green's functions, from the imaginary to the real axis is a frequently required task. In ana_cont it can be achieved by initializing the AnalyticContinuationProblem with kernel_mode='freq_bosonic'. As in the case of fermionic correlation functions, a relation that connects the values of the susceptibility on the real and imaginary axis is required. A susceptibility χ(ω) is a response function; causality therefore implies χ(ω) = χ*(−ω), i.e. its imaginary part is antisymmetric in frequency. Inserting this into the Kramers-Kronig relation yields

χ(iω_n) = (2/π) ∫₀^∞ dω ω Imχ(ω)/(ω² + ω_n²),

connecting the susceptibility at real frequencies with its values at the bosonic Matsubara frequencies ω_n. Since the factor ω/(ω² + ω_n²) is divergent for ω = ω_n = 0, it is common to introduce the function S(ω) = Imχ(ω)/ω, such that we can now write

χ(iω_n) = (2/π) ∫₀^∞ dω [ω²/(ω² + ω_n²)] S(ω),

with a kernel that is well defined for all values of ω and ω_n. Correspondingly, the result of the analytic continuation that is returned by the solver of ana_cont is S(ω).
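As a numerical sanity check of this construction, one can take a model susceptibility with poles slightly below the real axis, build S(ω) = Imχ(ω)/ω, and compare the kernel integral against the exact Matsubara values (plain NumPy sketch; the model and prefactor convention are ours):

```python
import numpy as np

beta, w0, gamma = 10.0, 2.0, 0.5
wn = 2.0 * np.pi / beta * np.arange(20)        # bosonic Matsubara frequencies
w = np.linspace(1e-6, 50.0, 20000)             # positive real-frequency grid

# Model susceptibility with poles at +/-w0 - i*gamma, i.e. below the real axis
chi_w = 2.0 * w0 / (w0**2 - (w + 1j * gamma)**2)
S = chi_w.imag / w                             # S(w) = Im chi(w)/w, finite at w -> 0

# Bosonic kernel w^2/(w^2 + wn^2), well defined on this grid
K = w[None, :]**2 / (w[None, :]**2 + wn[:, None]**2)
chi_matsubara = 2.0 / np.pi * np.trapz(K * S[None, :], w, axis=1)

# Exact Matsubara values of the model: chi(i*wn) = 2*w0/(w0^2 + (wn+gamma)^2)
chi_exact = 2.0 * w0 / (w0**2 + (wn + gamma)**2)
```

In an actual continuation the direction is of course reversed: the Matsubara data are the input, and S(ω) is the unknown that MaxEnt reconstructs.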
Our ana_cont code has been used extensively for the analytic continuation of susceptibilities by Geffroy et al. (2019).

MaxEnt for self-energies
Besides one-particle Green's functions and susceptibilities, it is often necessary to analytically continue self-energies from Matsubara to real frequencies. Self-energies have essentially the same analytic properties as one-particle Green's functions (Luttinger, 1961). The difference is that the high-frequency limit is a non-zero constant Σ₀ (the so-called Hartree term), and the self-energy approaches this constant like Σ₁/(iω_n), as opposed to 1/(iω_n) in the case of the Green's function. Σ₀ and Σ₁ are called the zeroth and first moment of the self-energy and can be calculated from density matrices (Wang et al., 2011). Note that ω_n are fermionic Matsubara frequencies here. In the absence of particle-hole symmetry, the spectral representation is

Σ(iω_n) = Σ₀ + Σ₁ ∫ dω a(ω)/(iω_n − ω),

where we defined the spectrum a of the self-energy as

a(ω) = −ImΣ^R(ω)/(πΣ₁). (47)

Thus, we can write the analytic continuation problem for the self-energy in a form very similar to Eq. (43):

[Σ(iω_n) − Σ₀]/Σ₁ = ∫ dω a(ω)/(iω_n − ω),

with the fermionic kernel 1/(iω_n − ω). We do not see any necessity for normalizing the self-energy by division through its first moment, but merely keep this notation for consistency with the literature. The solver of ana_cont then returns a(ω). From this we can construct the imaginary part of the retarded self-energy by inverting Eq. (47): ImΣ^R(ω) = −πΣ₁ a(ω). (49) If one wants to use the retarded self-energy for the calculation of the retarded Green's function, it is necessary to also obtain the real part of the retarded self-energy, which is given by the Kramers-Kronig relation, Eq. (5). This is implemented in the kkt method of the GreensFunction class. For a real-world case (SrVO₃) the procedure of analytic continuation of a self-energy, using MaxEnt and preblur, is illustrated in Fig. 6.
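The last step, obtaining ReΣ^R from ImΣ^R, can be sketched numerically with a plain principal-value sum (an illustration only; in ana_cont this role is played by the kkt method of the GreensFunction class). The one-pole self-energy below is a toy model:

```python
import numpy as np

sigma0, sigma1, gamma = 1.5, 2.0, 1.0
w = np.linspace(-200.0, 200.0, 40001)          # uniform grid, spacing 0.01
dw = w[1] - w[0]

# Toy retarded self-energy Sigma(w) = sigma0 + sigma1/(w + i*gamma)
im_sigma = -sigma1 * gamma / (w**2 + gamma**2)
re_exact = sigma0 + sigma1 * w / (w**2 + gamma**2)

# Kramers-Kronig: Re Sigma(w_i) = sigma0 + (1/pi) PV int dw' Im Sigma(w')/(w' - w_i),
# approximated by simply dropping the singular grid point
idx = np.array([20000, 20050, 20200])          # grid points w = 0.0, 0.5, 2.0
re_kk = np.empty(len(idx))
for k, i in enumerate(idx):
    d = w - w[i]
    d[i] = np.inf                              # exclude w' = w_i (principal value)
    re_kk[k] = sigma0 + dw / np.pi * np.sum(im_sigma / d)
```

On a uniform grid the contributions to the left and right of the singular point cancel symmetrically, so this simple rule is already accurate to second order in the grid spacing.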

Python 3
As a prerequisite for using the ana_cont package, you need Python 3 installed on your computer. This is possible via the official package repositories of most Linux distributions. After installing Python 3, it may also be necessary to install the Python package manager pip (again, from the official package repositories of your Linux distribution). If necessary, you can use pip to install missing Python packages:

pip install --user <package_name>

Make sure that your Python 3 environment is actually activated and that pip also refers to Python 3. In the Anaconda3 Python distribution, all necessary packages are already included.

ana_cont
The program files for the ana_cont package can be downloaded from https://github.com/josefkaufmann/ana_cont in several ways; it is, however, recommended to use git, as described below. After installation, the main classes, as described in Section 6, can be accessed as cont.AnalyticContinuationProblem and cont.GreensFunction. Before writing a custom script, it is recommended to go through some of our learning resources on GitHub.
A tutorial related to the example case of Fig. 6 is doc/tutorial_svo.ipynb.
The Python scripts with GUIs, which are located in the scripts directory, are maxent.py (fermionic MaxEnt), maxent_bosonic.py (bosonic MaxEnt), and pade.py for the Padé method. All details about their execution and usage are documented in the wiki of our repository on GitHub at https://github.com/josefkaufmann/ana_cont/wiki. There we also provide tutorials with links for downloading files with test data.

Conclusion
We have given an introduction to analytic continuation by means of Padé interpolation and the maximum entropy method. These are implemented within our Python package ana_cont, which can be used with great flexibility for the analytic continuation of arbitrary bosonic and fermionic Green's functions. Extensions to other analytic continuations outside the realm of quantum field theory are possible. By casting our established workflows into applications with GUIs, we have also made MaxEnt and Padé analytic continuation accessible to non-experts. Long-term users, on the other hand, will value the flexibility and adaptability of the Python package and adjust the provided scripts to their individual needs.

Figure 1 :
Figure 1: Simple one-pole Green's function G(z) = 1/(z − ω_p) on the plane of complex frequencies z. The background density/contours show only |Im G(z)|. The pole location ω_p is marked by a red asterisk. The inset in the upper left shows the retarded Green's function (evaluated along the blue line on the complex plane), where the full line is the imaginary part and the dashed line is the real part. Clearly, the imaginary part of the retarded Green's function has a peak at ω = ω_p. The inset in the lower left shows the Matsubara Green's function (evaluated at the green crosses on the complex plane). Its real and imaginary parts are drawn by dots and crosses, respectively.

Figure 2 :
Figure 2: Padé approximation for a test function with four poles (black squares). The Matsubara frequencies used to construct the approximant are marked by red dots. Reconstructed poles and zeros are marked by + and × symbols, respectively, with the color coding the noise level; see legend.

Figure 3 :
Figure 3: Singular-value decomposition of the kernel for the analytic continuation [Eq. (15)]. Left panel: matrix of right singular vectors V_m(ω) (m: index of the singular value). Right panel: log-plot of the singular values ξ_m.

Figure 4 :
Figure 4: Methods for determining α. A model spectrum (black curve in the right panel) is transformed to Matsubara frequencies, and normal-distributed noise with amplitude 10⁻⁴ is added. Upper left panel: spectral function A (color code) obtained through the minimization of the MaxEnt functional Q_α for a range of values of α. Lower left panel: illustration of the four different methods of determining the optimal value of α. Here the red and blue vertical lines mark historic and classic MaxEnt, respectively (cf. colors in the legend of the right panel). The value of α obtained by the "chi2kink" method is marked by a green vertical line. Note that Bryan's method cannot be highlighted in such a way, since it is a weighted average over a range of α (in this case the probability that serves as a weight is drawn in orange). Lower right panel: spectral functions obtained by the four different methods, compared to the real (initial) spectrum.

Figure 5 :
Figure 5: Analyzing the preblur parameter b. Upper left panel: χ²-deviation (color code and isolines) of the MaxEnt fit as a function of the hyperparameters b and α. The red dashed line marks the optimal α at each b; the green dotted line shows the optimal α at b = 0 for comparison. Lower left panel: χ² along the red dashed and green dotted lines of the upper left panel. Upper right panel: the true spectral function (black) and its MaxEnt reconstructions at the points marked by orange, red, and green asterisks in the upper left panel, drawn in the respective colors. The lower right panel shows the differences of these reconstructions to the true spectrum. The orange ∆A(ω) (corresponding to b = 0) is as large as 10⁻².

Figure 6 :
Figure 6: Analytic continuation of the DMFT self-energy of the V-t₂g orbitals of SrVO₃ (Si et al., 2020), which was obtained with symmetric improved estimators for extra precision (Kaufmann et al., 2019). Upper row: spectrum of the imaginary part of the self-energy a(ω) = −ImΣ(ω)/π for SrVO₃ without [left] and with [right] preblur. The figures show optimization results for several different values of α. Red corresponds to α = 10¹³, and as the color changes into blue, the value is lowered by a factor of 10 in every step. The final result, at optimal α, is drawn in black. Clearly, for the highest values of α, we recover the constant default model. Lower row: behavior of χ² as a function of α [left] and b [right] for the MaxEnt analytic continuation of SrVO₃. The values taken as optimal are highlighted in red.
git clone https://github.com/josefkaufmann/ana_cont.git

This will create a new directory ana_cont in your current working directory. Now change to this directory: cd ana_cont. There all the code files are located. If you plan to do analytic continuations by Padé interpolation, you have to compile the Padé core functions, which are written in Cython. The compilation is done by the setup script:

python setup.py build_ext --inplace

Now, in a Python script, you have to insert the path to the ana_cont package into the module search path and import the package:

import sys
sys.path.insert(0, '/path/to/ana_cont/')
import ana_cont.continuation as cont

Table 1 :
Keyword arguments for initializing the AnalyticContinuationProblem.