Three-body renormalization group limit cycles based on unsupervised feature learning

Both the three-body system and the inverse square potential carry a special significance in the study of renormalization group limit cycles. In this work, we pursue an exploratory approach and address the question of which two-body interactions lead to limit cycles in the three-body system at low energies, without imposing any restrictions on the scattering length. To this end, we train a boosted ensemble of variational autoencoders that not only provides a severe dimensionality reduction, but also allows us to generate further synthetic potentials, an important prerequisite for efficiently searching for limit cycles in low-dimensional latent space. We do so by applying an elitist genetic algorithm to a population of synthetic potentials such that a specially defined limit-cycle loss is minimized. The resulting fittest individuals suggest that the inverse square potential is the only two-body potential that minimizes this limit-cycle loss independently of the hyperangle.


I. INTRODUCTION
The interest in renormalization group (RG) limit cycles has steadily increased ever since Wilson pointed out in 1971 that coupling constants governed by RG equations for field theories of strong interactions do not necessarily flow towards a fixed point, see Ref. [1]. Instead, RG equations may also allow coupling constants to approach periodic trajectories in parameter space, in the sense that the same coupling constants c(r*) = c(λ^n r*) are obtained by multiplying the short-range cutoff r* by the n-th power of some preferred scaling factor λ, where n ∈ N. This close connection to discrete scale invariance and log-periodic cutoff dependence makes a variety of phenomena in particle and nuclear physics as well as in ultracold atoms attributable to some kind of RG limit cycle. For instance, while the asymptotic freedom of QCD is associated with an ultraviolet fixed point, Refs. [2,3] conjecture the QCD coupling constant to approach an infrared limit cycle in the three-nucleon system upon tuning the up and down quark masses and, thereby, further increasing the magnitudes of the np spin-singlet and spin-triplet scattering lengths a_{1S0} = −23.8 fm and a_{3S1} = 5.4 fm, which are already large compared to the spin-triplet effective range r_0 = 1.8 fm (or the pion Compton wavelength, λ_π = 1.4 fm). At the critical quark masses the deuteron can be shown to have vanishing binding energy, while the triton gains an infinite number of excited states. These findings are supported by Ref. [4], where triton energies in the critical region are computed up to next-to-next-to-leading order in pionless EFT.
In this context, it is especially worth mentioning the Efimov effect, which was originally formulated for a system of three identical bosons attracting each other in the S-wave channel via one short-ranged two-body interaction, see Refs. [5,6], and has experienced numerous generalizations to other systems like nucleons, see Ref. [7], or macromolecules, e.g. the three-stranded DNA, see Ref. [8]. In its original formulation it states that particles may enter three-body bound states even before the potential is strong enough to allow dimers (two-body bound states) to form. If the interaction becomes resonant, that is, if the S-wave scattering length approaches the unitary limit a_0 → ±∞, the three-body spectrum becomes an infinite geometric series governed by the transcendental number s_0 = 1.00624, while the two-body spectrum only consists of a single zero-energy bound state, see Ref. [6]. The Efimov effect can be explained by an RG limit cycle with preferred scaling factor λ = exp(π/s_0) = 22.69438. Ref. [9] provides an alternative approach to the Efimov effect based on effective field theory and recovers a similar value s_0 ≈ 1.0064. In 2005, first experimental evidence for the Efimov effect was found in an ultracold gas of caesium atoms by magnetically tuning scattering lengths based on Feshbach resonances, see Ref. [10]. It is important to note that the same geometric three-body spectrum as in Eq. (1) manifests itself independently of the short-range behavior of the two-body potential. This universal property originates in the fact that, due to a_0 → ±∞, an effective long-range 1/R² behavior in terms of the hyperradius R emerges, see Ref. [6]. With a suitable separation ansatz, treating the long-range sector reduces to solving the radial Schrödinger equation for the inverse square potential. Regarding their coupling constants, this implies the same RG flow as obtained from directly renormalizing the quantum mechanical 1/r² potential.
The inverse square potential, in turn, is known to exhibit an RG limit cycle in the two-body sector: Ref. [11] discusses its renormalization in momentum space, while Ref. [12] compares two different renormalization schemes in coordinate space: on the one hand, potential well renormalization yields infinitely many branches of continuous coupling constants, whereas delta-shell renormalization, on the other hand, provides one unique coupling constant with infinitely many log-periodic discontinuities. In experiment, the inverse square potential can be realized by neutral atoms interacting with a charged wire, see Ref. [13]. Ref. [14] analyzes the three-body spectrum of three identical bosons that interact pairwise via an actual inverse square potential. In contrast to the Efimov effect, the inverse square two-body potentials need not be resonant, and an approximate, though not exact, long-range 1/r² behavior of the resulting three-body potential arises directly by construction. Most interestingly, an infinite, approximately geometric series of three-body bound states is shown to exist slightly below the critical strength required to form dimers. A similar three-body spectrum is later found for a system of three identical fermions, which is quite intriguing, as the classical Efimov effect requires the pairwise interactions to take place in the S-wave channel. Due to antisymmetrization, however, this does not hold for the three fermions.
As follows from the detailed explanations in Ref. [15], singular potentials are strictly used as inputs in classical RG analyses and merely specify starting points of the RG flow in parameter space. For a system of three identical bosons, this motivates us to change the paradigm: we pursue an exploratory approach that consists of searching among singular, discretized and finitely-ranged two-body potentials for interactions leading to RG limit cycles in the three-body sector. Similar to Ref. [14], and in contrast to the Efimov effect, we do not impose any requirements on the S-wave scattering length. Since this system is only considered at low energies, the low-energy Faddeev equation, see Ref. [16], applies and can be solved using a generalization of the classical transfer matrix method, see Ref. [17], to hyperspherical coordinates. Delta-shell regularization is then applied to the resulting Faddeev wavefunctions. Successively increasing the short-range cutoff provides detailed information about the RG flow of the corresponding coupling constant. At this point, a coupling constant exhibiting a log-periodic cutoff dependence, that is, discrete scale symmetry with some preferred scaling factor, indicates an RG limit cycle. As a measure of log-periodicity we introduce the limit-cycle loss (LCLoss) on the search space: it is constructed such that it decreases the closer the RG flow of a coupling constant approaches an RG limit cycle. Consequently, an exact RG limit cycle is indicated by a vanishing LCLoss.
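The idea behind such a loss can be illustrated with a toy version: sample the coupling constant on a uniform grid in log(r*) and measure how far the flow is from coinciding with a copy of itself shifted by log(λ), minimized over candidate scaling factors. This is only a minimal sketch; the actual LCLoss used in this work differs in its details.

```python
import numpy as np

def toy_lc_loss(couplings):
    """Toy limit-cycle loss for a coupling flow H sampled on a uniform
    grid in log(r*): the smallest mean-squared mismatch between the flow
    and a copy of itself shifted by log(lambda), over all grid shifts."""
    n = len(couplings)
    losses = [np.mean((couplings[s:] - couplings[:-s]) ** 2)
              for s in range(1, n // 2)]
    return min(losses)

logs = np.linspace(0.0, 10.0, 1001)   # uniform grid in log(r*)
periodic = np.sin(2 * np.pi * logs)   # exactly log-periodic flow
drifting = logs                       # monotone flow, no limit cycle
```

An exactly log-periodic flow yields a numerically vanishing loss, while a monotonically drifting flow does not.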
In a machine learning (ML) context, each step of the considered discretized potentials can be understood as an input feature. A finer discretization, which is required to acquire reliable approximations of smooth potentials, implies a higher-dimensional feature space. Unsupervised feature learning appears to be a promising approach to gather equally expressive features on much lower-dimensional vector spaces, see Ref. [18]: we unsupervisedly train a boosted ensemble of convolutional variational autoencoders (VAEs) to reconstruct two-body potentials of a targetless training set specially set up for this pretext task. In fact, we benefit in two ways from this procedure: firstly, the mentioned ensemble is able to encode high-dimensional feature vectors to low-dimensional latent vectors, containing only the most distinctive information about the original potential, and vice versa. Thereby, it provides a severe dimensionality reduction and allows us to relegate our search for RG limit cycles to the much lower-dimensional latent space. Secondly, it allows us to generate infinitely many synthetic potentials, satisfying the feature distributions inherent to the training set, directly from latent space. The results of the pretext task are a key ingredient for the downstream task, where the actual search is performed. Here, we apply elitist genetic algorithms (GAs) motivated by Goldberg's Simple Genetic Algorithm, see Ref. [19], to several independent populations of synthetic potentials in parallel, drawn from a multivariate standard normal distribution in latent space.

This paper is organized as follows: First, Sec. II briefly recapitulates hyperspherical coordinates and the low-energy Faddeev equation. For locally constant potentials we demonstrate that a simple separation ansatz in terms of a hyperradial and a hyperangular wavefunction suffices to cover all possible solutions.
Accordingly, a generalized transfer matrix method allows us to construct solutions for piecewise-constant potentials. The resulting zero-energy Faddeev wavefunctions are then used to formulate a matching condition for the delta-shell regularization of the given two-body potential, which yields the corresponding coupling constant for any hyperangle and cutoff hyperradius. Proofs of commutation relations, eigenvalue equations and the lengthy computation of limits, which are necessary to comprehend the results of this section, can be found in Appendices A, B and C. Sec. III introduces a scaling operation on potentials which causes all features to be of a similar order of magnitude and, thereby, more suitable for ML tasks. Distinguishing between the short-range and long-range regime, it provides a detailed explanation of how training and test sets consisting only of scaled potentials are generated. Sec. IV motivates the necessity of dimensionality reduction and of dividing our exploratory approach into a pretext task and a downstream task. In the pretext task we train a boosted ensemble of convolutional VAEs to reconstruct the scaled potentials from the training set. Due to boosting, there is a hierarchical order among the individual members of the ensemble, which is inherited by the encoded potentials. This consideration leads to the concept of latent curves, from which synthetic potentials are generated in the downstream task. After having introduced the LCLoss, we may therefore rather understand it as a function mapping a latent curve to a non-negative number. This finally allows us to settle on an elitist GA that is applied to fifty populations drawn from a multivariate standard normal distribution in latent space. At the end of each GA run, we extract the fittest individual, that is, the latent curve with the lowest LCLoss, and compare the results in Sec. V. We end with an outlook on further related investigations.
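The elitist mechanism of the downstream search can be sketched schematically. In the toy version below, a quadratic function stands in for the LCLoss of a latent curve, and all hyperparameters (population size, mutation scale, elite count, tournament size) are illustrative assumptions; the defining feature is that the fittest individuals survive each generation unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

def elitist_ga(loss, dim, pop_size=50, n_gen=100, mut_scale=0.1, n_elite=2):
    """Minimal elitist GA over latent vectors: tournament selection,
    uniform crossover and Gaussian mutation; the n_elite fittest
    individuals survive each generation unchanged."""
    pop = rng.standard_normal((pop_size, dim))
    for _ in range(n_gen):
        fitness = np.array([loss(ind) for ind in pop])
        elite = pop[np.argsort(fitness)[:n_elite]]
        children = []
        while len(children) < pop_size - n_elite:
            # binary tournament selection of two parents
            i = rng.integers(pop_size, size=2)
            j = rng.integers(pop_size, size=2)
            p1 = pop[i[np.argmin(fitness[i])]]
            p2 = pop[j[np.argmin(fitness[j])]]
            # uniform crossover followed by Gaussian mutation
            mask = rng.random(dim) < 0.5
            children.append(np.where(mask, p1, p2)
                            + mut_scale * rng.standard_normal(dim))
        pop = np.vstack([elite] + children)
    fitness = np.array([loss(ind) for ind in pop])
    return pop[np.argmin(fitness)]

# quadratic toy loss standing in for the LCLoss of a latent curve
best = elitist_ga(lambda z: float(np.sum(z ** 2)), dim=4)
```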

A. Equation of motion
While the two-body problem is classically treated in spherical coordinates, a formulation in hyperspherical coordinates is advantageous for approaching systems of three particles, as demonstrated in Ref. [20]. Together with a few assumptions on the potential, angular momentum, and total energy, this leads to the low-energy Faddeev equation, an integro-differential equation of motion for the three-body system in the low-energy regime.

Hyperspherical coordinates
Let all interactions within the three-body system at hand be governed by one spherically-symmetric and finitely-ranged two-body potential V, such that the total potential is represented by the sum

V_tot = V(|r_1 − r_2|) + V(|r_2 − r_3|) + V(|r_3 − r_1|). (2)

In the following, any distance is given in units of the range ρ of V. Due to the translational and rotational invariance of V_tot around the center of mass, only four of the originally nine degrees of freedom r_1, r_2, r_3, with r_i representing the position vector of the i-th particle, remain. Assuming that all three bosons have the same mass, these are covered by the hyperradius R, which can be understood as the root-mean-square of the three pairwise distances, as well as three hyperangles α_i given by sin(α_i) = |r_j − r_k|/(√2 R), with (i, j, k) being any permutation of (1, 2, 3), see Ref. [16]. Note that R can take any value between 0 and ∞, whereas each α_i is restricted to the interval [0, π/2]. A large R therefore indicates a large |r_i − r_j| in general. In contrast, the hyperangles are much more configuration-sensitive. For instance, α_1 = 0 represents the scenario in which the distance between particles 2 and 3 is much smaller than their respective distances to particle 1, which is equivalent to the interaction of a single particle with a two-body cluster. Vice versa, if α_1 = π/2, particle 1 is much closer to the center of mass than particles 2 and 3. Finally, there is the special case α_i = π/4, from which we deduce R = |r_j − r_k|. If any other hyperangle takes the same value α_j = π/4, this implies that the three particles must be equidistant: since we then also have R = |r_k − r_i| and due to the root-mean-square nature of the hyperradius, the only option left for the remaining distance is |r_i − r_j| = R.
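These relations can be made concrete in a short helper, a minimal sketch assuming the equal-mass conventions above (R as the root-mean-square of the pairwise distances and |r_j − r_k| = √2 R sin(α_i)):

```python
import numpy as np

def hyperspherical(r1, r2, r3):
    """Hyperradius R and hyperangles alpha_i for three equal-mass
    particles, with R the root-mean-square of the pairwise distances
    and |r_j - r_k| = sqrt(2) R sin(alpha_i)."""
    d = np.array([np.linalg.norm(r2 - r3),   # distance opposite particle 1
                  np.linalg.norm(r3 - r1),   # distance opposite particle 2
                  np.linalg.norm(r1 - r2)])  # distance opposite particle 3
    R = np.sqrt(np.sum(d ** 2) / 3.0)
    alpha = np.arcsin(np.clip(d / (np.sqrt(2.0) * R), 0.0, 1.0))
    return R, alpha

# equilateral configuration: R equals the side length and all alpha_i = pi/4
a = 1.0
r1 = np.array([0.0, 0.0, 0.0])
r2 = np.array([a, 0.0, 0.0])
r3 = np.array([a / 2, a * np.sqrt(3) / 2, 0.0])
R, alpha = hyperspherical(r1, r2, r3)
```

The equilateral test case reproduces the special configuration discussed above, where all hyperangles equal π/4 and the hyperradius coincides with the pairwise distance.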

Low-energy Faddeev equation
After separating out the center-of-mass motion and transforming from Cartesian to hyperspherical coordinates, the Schrödinger equation reduces to the Faddeev equations. While these are classically a system of coupled differential equations, they can be decoupled by projecting onto S-wave states in order to work entirely in the low-energy regime, see Ref. [20]. The resulting solutions are superpositions of Faddeev components, with the Faddeev wavefunction ψ(R, α) satisfying a single integro-differential equation referred to as the low-energy Faddeev equation. As shown in Ref. [16], together with the operators for the hyperradial and, respectively, hyperangular kinetic energies, this equation takes the form of Eq. (10). From a computational perspective, Eq. (10) provides an efficient approach to solving the three-body problem in the low-energy regime.

B. Low-energy wavefunctions
The following analysis is based on working with finitely-ranged and piecewise-constant two-body potentials. In order to solve the low-energy Faddeev equation for this class of potentials, we first need to obtain local solutions at specific potential steps. Using a generalized transfer matrix method for hyperspherical coordinates to gather all relevant boundary conditions, we then connect these individual solutions smoothly with each other.

Separation ansatz for local solutions
The special case of piecewise-constant two-body potentials V entering Eq. (2) considerably simplifies the solution procedure. We therefore do not perform a hyperspherical expansion as in Ref. [16], where the Faddeev wavefunction ψ is decomposed into its individual channel contributions. Instead, we use the separation ansatz

ψ(R, α) = f(R) φ(α) / (R² sin(2α)), (11)

which fully exploits the assumption V(√2 R sin(α)) = u/2m, with u = const., for some hyperradius R and hyperangle α. Inserting Eq. (11) as well as the given potential into the low-energy Faddeev equation Eq. (10) allows us to move all hyperangular dependencies to the right-hand side, which defines a hyperradial function γ via Eq. (12). From here, we can deduce two equations of motion that are connected via the expression γ(R)/2m on the left-hand side: one governing the hyperangular and the other the hyperradial sector.

Solving the hyperangular sector
We first solve the hyperangular sector. The equation of motion for the hyperangular wavefunction φ is obtained by rearranging the terms on the right-hand side of Eq. (12) and the contribution of γ on the left-hand side. Note that the hyperradius enters only as a parameter. Like the low-energy Faddeev equation, Eq. (13) is a homogeneous integro-differential equation. It takes the shape of Eq. (14), where D(R) and I denote a differential and an integral operator, respectively, defined by their action on the hyperangular wavefunction φ(α). The solution to Eq. (14) is much less complicated than it may appear at first glance. This is because the two operators D(R) and I can easily be shown to commute (see App. A), which implies the existence of simultaneous eigenstates. This observation allows us to search for eigenstates of the individual operators and subsequently adapt γ to the corresponding eigenvalues such that Eq. (13) is fulfilled. Due to its fairly simple structure, we start with the eigenvalue equation (17) for the operator D(R), whose general solution is given by Eq. (18). We also have to consider the denominator of the separation ansatz in Eq. (11): in order to keep the Faddeev wavefunction ψ integrable, the hyperangular wavefunction φ needs to vanish wherever the expression sin(2α) in the denominator does. Since hyperangles α are restricted to 0 ≤ α ≤ π/2, this defines exactly two boundary conditions for Eq. (18),

φ(0) = 0 and φ(π/2) = 0. (19)

Obviously, the coefficient B must vanish to satisfy the first boundary condition. The second boundary condition restricts the eigenvalues to a discrete set and establishes a connection to the hyperradial function γ, see Eq. (20), with n = 1, 2, 3, . . . . Note that choosing n = 0 is forbidden, as this yields g_0(R) = 4 − γ(R): when inserting g_0 into Eq. (18), we see that this leads to the trivial solution φ(α) = 0. The eigenstates corresponding to the eigenvalues g_n then turn out to be the simple modes φ_n(α) = sin(2nα).
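The modes sin(2nα) can also be probed numerically against the integral operator I. The sketch below assumes the integration limits |π/3 − α| and π/2 − |π/6 − α| of the standard low-energy Faddeev kernel, since the displayed operator definitions are omitted above; under this assumption the claimed eigenvalues i_n = sin(2πn/3)/n are reproduced to quadrature accuracy.

```python
import numpy as np
from scipy.integrate import quad

def I_phi(phi, alpha):
    # action of the integral operator with the assumed kernel limits
    lo = abs(np.pi / 3 - alpha)
    hi = np.pi / 2 - abs(np.pi / 6 - alpha)
    return quad(phi, lo, hi)[0]

def eigen_error(n, alpha):
    """Deviation of (I sin(2n.))(alpha) from the claimed eigenvalue
    relation i_n sin(2n alpha) with i_n = sin(2 pi n / 3) / n."""
    i_n = np.sin(2 * np.pi * n / 3) / n
    return abs(I_phi(lambda a: np.sin(2 * n * a), alpha)
               - i_n * np.sin(2 * n * alpha))

errors = [eigen_error(n, alpha)
          for n in (1, 2, 3, 4, 6)
          for alpha in (0.2, 0.7, 1.3)]
```

The node indices n = 3 and n = 6 probe the vanishing eigenvalues of the infinite-dimensional kernel of I.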
It can easily be checked that these eigenstates form a complete orthogonal system, which agrees with the hermiticity of D(R). Since we did not perform a hyperspherical expansion, the index n does not label the individual channels as in Ref. [16]. Instead, considering the eigenstates φ_n once more, n can rather be understood as a node index, similar to the radial quantum number in the hydrogen atom. In App. B we examine how the integral operator I acts on the eigenstates φ_n from Eq. (21). In fact, for each node index n = 1, 2, 3, . . . the eigenstate φ_n of D(R) also turns out to be an eigenstate of I, see Eq. (22). Most interestingly, the corresponding eigenvalues i_n = sin(2πn/3)/n appear to be non-degenerate, with one severe exception: if the node index n = 3n′ is a multiple of three, we observe i_{3n′} = 0 with n′ = 1, 2, 3, . . . . This not only shows that the eigenvalue i_n = 0 is degenerate, but also that its eigenspace and, therefore, the kernel of I is infinite-dimensional. Now we insert the simultaneous eigenstates φ_n into Eq. (14) and substitute the operators D(R) and I by the respective eigenvalues given in Eqs. (20) and (22). As the resulting Eq. (23) must hold for any node index n, this finally defines a discrete family of hyperradial functions γ_n, see Eq. (24).

Solving the hyperradial sector

The equation of motion for the hyperradial sector of the low-energy Faddeev equation follows from the first equality in Eq. (12), where k² = 2mE is the square of the total momentum. Since it is clear that the hyperradial wavefunctions must depend on k as well as on the node index n, due to the connection to the hyperangular sector, we have included the pertinent indices provisionally. We use the expression from Eq. (24) for the family of hyperradial functions γ_n gathered by solving the hyperangular sector. After several steps of term rearrangement we arrive at Eq. (26). The k_n(u) can be understood as modified momenta and are defined on the entire complex plane by Eq. (27).
The binary function

ξ(k) = +1 for Re(k) ≥ 0,  −1 for Re(k) < 0, (28)

ensures that the total momentum is correctly reproduced on the entire complex plane, that is, k_n(0) = k if the potential vanishes locally and, respectively, k_{3n}(u) = k if the node index is a multiple of three. Note that k_n(u)²/2m corresponds to the kinetic energy in the limit n → ∞ of an infinitely large node index. Eq. (26) is related to the Bessel equation and is solved by the linear combination in Eq. (29), with J_n and Y_n denoting the n-th Bessel functions of the first and second kind, respectively. Finally, we can combine the hyperangular wavefunctions φ_n from Eq. (21) and the hyperradial wavefunctions from Eq. (29) according to the separation ansatz Eq. (11) and obtain the expression (30) for the Faddeev wavefunction.

FIG. 1. Piecewise-constant two-body potential with three transition radii r_1, r_2 and r_3. Due to r = √2 R sin(α), we can adopt the classical transfer matrix method to determine Bessel coefficients for hyperspherical coordinates. Keeping the hyperangle α fixed, we apply transfer matrices for adjacent hyperradii R. As a consequence, α enters both Bessel coefficients as a parameter.

Transfer matrix method for hyperspherical coordinates
Up to a normalization, the Bessel coefficients A_{k,n} and B_{k,n} in Eq. (30) can be chosen arbitrarily. However, Eq. (30) assumes the two-body potential V to be constant in the vicinity of r = √2 R sin(α) for some hyperradius R and hyperangle α. A piecewise-constant two-body potential V is constant between any adjacent transition radii 0 ≤ r_{i−1} < r_i ≤ 1, see Eq. (31), where the origin r_0 = 0 effectively serves as the zeroth transition radius. In contrast, the final transition radius is always r_F = 1, with F being the number of all potential steps. Radii beyond r_F correspond to radii beyond the range of V, which imposes u_{F+1} = 0. In the following we refer to the index i enumerating the steps u_i of the potential as the step index. The transition radius r_i can be translated into a transition hyperradius that also depends on the hyperangle,

R_i(α) = r_i / (√2 sin(α)). (32)

Eq. (32) allows us to translate Eq. (31) into hyperspherical coordinates. An exemplary potential with three non-zero transition radii is shown in Fig. 1. From now on, we work with dimensionless potentials U = 2mV, for which we obtain the hyperspherical segmentation given in Eq. (33). For some hyperangle α, we know that the Faddeev wavefunction ψ_{k,n}(R, α) must behave like the solution given in Eq. (30) between any pair of transition hyperradii R_{i−1}(α) and R_i(α), with k_n^{(i)} = k_n(u_i). If generalized to hyperspherical coordinates, the transfer matrix method, see Ref. [17], appears quite promising for providing the hyperradial and hyperangular dependences of the Bessel coefficients. The basic idea of this hyperspherical transfer matrix method is to leverage the continuity and differentiability of the Faddeev wavefunction to formulate two boundary conditions on the Bessel coefficients at each transition hyperradius R_i(α) of the piecewise-constant potential. Eqs. (35) and (36) form a system of linear equations that can be solved for the Bessel coefficients.
Eq. (37) can be understood as a single vector equation relating the Bessel coefficients of the two potential steps around some transition hyperradius R_i(α) for a given hyperangle α via a multiplication with the transfer matrix defined in Eq. (38). Using the product (39) for two countable function sequences {f_n}_{n∈N} and {g_n}_{n∈N}, the matrices M can be shown to satisfy the pattern given in Eq. (40), where we have kept the dependences on the hyperangle α, the two-body potential U, the node index n and the step index i implicit for the sake of brevity. The naive hyperspherical transfer matrix method allows us to freely choose Bessel coefficients A^{(1)}_{k,n} and B^{(1)}_{k,n} for the step index i = 1 and then apply a chain of the required transfer matrices to obtain the Faddeev wavefunction at any hyperspherical configuration (R, α). However, choosing B^{(1)}_{k,n} = 0 is necessary due to the singularity of the Bessel functions Y_n of the second kind at the origin. We can then choose A^{(1)}_{k,n} = 1, as the wavefunction can still be normalized afterwards. This completely determines all remaining coefficients from Eq. (30). There is a residual factor 1/R² in Eq. (30), which at first glance causes the Faddeev wavefunction to diverge at the origin, even if B^{(1)}_{k,n} = 0. Fortunately, this does not break the normalizability of the Faddeev wavefunction, for the following reasons: firstly, ψ has at least a first-order root at the origin, since the smallest allowed node index is n = 1. This definitely eliminates one factor of 1/R. Finally, the second 1/R factor is eliminated by the Jacobi determinant during a hyperspherical integration.
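The propagation of Bessel coefficients across transition hyperradii can be sketched numerically. In the sketch below the momenta and transition hyperradii are illustrative, and the Bessel order is taken to be the node index n as stated for Eq. (29); each transfer matrix follows from matching the value and first derivative of the local solution.

```python
import numpy as np
from scipy.special import jv, yv, jvp, yvp

def transfer(n, k_in, k_out, R):
    """2x2 matrix mapping the Bessel coefficients (A, B) of the segment
    inside the transition hyperradius R to those outside, by matching
    value and first derivative of f(R) = A J_n(k R) + B Y_n(k R)."""
    M_in = np.array([[jv(n, k_in * R),           yv(n, k_in * R)],
                     [k_in * jvp(n, k_in * R),   k_in * yvp(n, k_in * R)]])
    M_out = np.array([[jv(n, k_out * R),         yv(n, k_out * R)],
                      [k_out * jvp(n, k_out * R), k_out * yvp(n, k_out * R)]])
    return np.linalg.solve(M_out, M_in)

# regular solution in the innermost segment: B = 0 (Y_n singular at origin)
coef = np.array([1.0, 0.0])
n = 1
radii = [0.3, 0.6, 1.0]          # transition hyperradii (illustrative)
momenta = [2.0, 1.5, 1.2, 1.0]   # local modified momenta per segment (illustrative)
for i, R in enumerate(radii):
    coef = transfer(n, momenta[i], momenta[i + 1], R) @ coef
```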

C. Delta-shell regularization
The piecewise-constant two-body potentials that we later work with approximate singular potentials. These, in turn, are known to produce unphysical infinities in the short-range sector, see Ref. [21]. Since we are only interested in the low-energy sector, this issue can be remedied by a suitable renormalization method. In Fig. 2a), the corresponding polar plot using the relation r = √2 R sin(α) is displayed. Note that all contours are represented by horizontal lines R(α) = r/(√2 sin(α)) with fixed r. With this knowledge, a delta-shell regularization with short-range cutoff r* can be visually understood as eliminating the potential for all (R, α) below the line R*(α) = r*/(√2 sin(α)), as shown in Fig. 2b).
The key idea of the Wilsonian renormalization group approach is to eliminate the short-range degrees of freedom above some UV-cutoff by regularizing the potential and introducing additional cutoff-dependent couplings. It is essential that the renormalized potential faithfully reproduces chosen low-energy observables, which defines a matching condition on the new couplings.
Delta-shell regularization, which is thoroughly explained in Ref. [12], is a convenient regularization technique for potentials in configuration space. In contrast to related techniques like potential well regularization, it provides a unique coupling, and the short-range cutoff is not bounded from below. The standard delta-shell regularization substitutes a given two-body potential U up to some cutoff radius r* by a delta-shell potential, such that the regularized potential reads as in Eq. (43), with the coupling constant H(r*). Since we are about to match the logarithmic derivatives of the hyperradial zero-energy wavefunctions, we have to formulate Eq. (43) in terms of hyperspherical coordinates. This specific matching condition introduces respective dependences on R and α, as well as a dependence on the node index n, to the coupling constant and, lastly, to the renormalized potential. A very important step is to translate the radial delta distribution into a hyperradial one. Analogously to the cutoff radius r* we may define a cutoff hyperradius R* = r*/(√2 sin(α)) for some hyperangle α. As α only runs from 0 to π/2, we can be sure that sin(α) ≥ 0. For the delta distribution this means that we can extract a factor √2 sin(α) from its argument,

δ(r − r*) = δ(R − R*) / (√2 sin(α)). (44)

Inserting Eq. (44) into Eq. (43) and implementing the above-mentioned adjustments then yields the hyperspherical delta-shell potential (45), which vanishes for hyperradii R < R* enclosed by the delta-shell. A delta-shell regularization eliminates all transition hyperradii R_{i−1} < R* for any hyperangle α, as the regularized potential U_n(R* | R, α, H_n) uniquely vanishes inside the delta-shell, see Fig. 2a) in comparison to Fig. 2b). This effectively makes R_1 = R* the first transition hyperradius of the regularized potential and imposes u_1 = 0. All other transition hyperradii remain unchanged due to the limited range of the delta-shell potential, that is R_2 = R_i, R_3 = R_{i+1}, . . . , together with the corresponding potential steps u_2 = u_i, u_3 = u_{i+1}, and so on.
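The factor extracted from the delta distribution's argument can be verified numerically by replacing the delta with a narrow Gaussian; the test function g and the parameter values below are arbitrary.

```python
import numpy as np

def delta_approx(x, eps=1e-3):
    # narrow Gaussian standing in for the Dirac delta distribution
    return np.exp(-x ** 2 / (2 * eps ** 2)) / (np.sqrt(2 * np.pi) * eps)

alpha, r_star = 0.8, 0.3          # arbitrary hyperangle and cutoff radius
s = np.sqrt(2.0) * np.sin(alpha)
R_star = r_star / s               # cutoff hyperradius R* = r*/(sqrt(2) sin(alpha))
R = np.linspace(R_star - 0.05, R_star + 0.05, 200001)
dR = R[1] - R[0]
g = np.cos(R)                     # arbitrary smooth test function
# integral of g(R) delta(r - r*) over R, with r = sqrt(2) sin(alpha) R ...
lhs = np.sum(g * delta_approx(s * R - r_star)) * dR
# ... equals g(R*) / (sqrt(2) sin(alpha)): the delta picks up the factor 1/s
rhs = np.cos(R_star) / s
```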
In contrast to H(r*) from Eq. (43), we allow the coupling constant H_n(R*, α) in Eq. (45) to be an arbitrary function of R* and α. This is clearly a different situation from the standard formulation of the Efimov effect, but it ensures that the logarithmic derivatives of the Faddeev wavefunctions can be matched at all hyperangles.
When it comes to determining H_n(R*, α), the hyperradial delta distribution invites us to integrate over the low-energy Faddeev equation. It is helpful to simplify the low-energy Faddeev equation even further by inserting the eigenvalues of the integral operator I from Eq. (22), which yields Eq. (46). Eq. (46) has to be understood as a direct relation between the coupling constant and the zero-energy Faddeev wavefunction. Proceeding from here, an integration over the infinitesimal interval [R* − δR, R* + δR] covering the cutoff hyperradius yields Eq. (47). While the zero-energy Faddeev wavefunction itself, of course, must be continuous, the same does not hold for its hyperradial first-order derivatives, as a consequence of the delta-shell potential. Although this means that the right-hand side of Eq. (47) does not necessarily vanish, most terms of its integrand become negligible due to the integration over an infinitesimal interval. In fact, the only non-vanishing contribution comes from the second-order hyperradial derivative of the corresponding kinetic energy operator, cf. Eq. (8), acting on the hyperradial zero-energy wavefunction, which leads to Eq. (48).
Here, f^{(1)}_{0,n} denotes the hyperradial wavefunction inside the first segment of the regularized potential. We assume the cutoff hyperradius R*, which serves as the first transition hyperradius of the regularized potential, to lie in the i-th segment of the unregularized potential. This allows us to relate the corresponding hyperradial wavefunction segments to each other via f^{(2)}_{0,n} = f^{(i)}_{0,n}. On the right-hand side of Eq. (48) we see that the coupling constant is completely determined by matching the hyperradial logarithmic derivatives of the hyperradial zero-energy wavefunction close to the cutoff hyperradius. The calculation of these logarithmic derivatives is relegated to App. C. With Eq. (48) in mind, we need to use the limits computed there, Eqs. (49) and (50), to arrive at the final expression for the coupling constant, Eq. (51).

III. GENERATING SCALED, SINGULAR TWO-BODY POTENTIALS
The first step in our search for limit cycles is to decide on a specific class of potentials, which defines the search space, and to generate corresponding data sets. In Eq. (2) we have already restricted the total potential to superpositions of two-body potentials. For the sake of brevity, we refer to potentials as LC if their coupling constant H_n(R*, α), obtained by delta-shell regularization, exhibits an RG limit cycle in the three-body sector. If such a limit cycle does not manifest, we consequently call them non-LC.
In order to apply the hyperspherical transfer matrix method, the generated, attractive potentials need to be piecewise-constant. Ref. [12] suggests searching especially among singular potentials for LC potentials. We are therefore interested, on the one hand, in faithfully approximating smooth potentials, e.g. of type 1/r^n, and, on the other hand, in simultaneously covering the short-range and long-range regimes. These two criteria require us to use numerous non-equidistant transition radii and turn out to be satisfactorily fulfilled by an appropriate non-equidistant choice of the r_i for i = 1, . . . , F, with F = 10³. These suffice to probe coupling constants H_n(R*, α) over a notably large range of cutoff hyperradii. By fixing the number of transition radii, we have automatically specified each potential to consist of exactly F = 10³ negative segments with values u_i. Following machine learning terminology, we refer to the individual u_i as features. This allows us to understand potentials as vectors U = (u_1, . . . , u_F) in the feature space R^F and, accordingly, the number F of features as the feature space dimension.
The problem with naively applying machine learning algorithms to search among singular potentials is that their features vary strongly in magnitude. Instead, the scaled potentials Ū, which are related componentwise to the original potentials U, have their features ū_i at a similar order of magnitude and are more suitable for machine learning. Note that while all u_i are negative, the scaled features ū_i can vanish and in general take positive and negative values.
A. Short-range behavior of U

During the construction of data sets it is necessary to keep physical plausibility in mind.
For instance, oscillations in the short-range sector are not resolvable. Regarding its short-range behavior, we therefore construct U to be strictly decreasing. This motivates us to generate the short-range part separately from the long-range part as follows:

Algorithm 1 Generate Scaled Potential - Short Range
1: take log(r) from uniform distribution U([−8.4, −3.6])
2: x ← (−12, log(r), 0)
3: take y from uniform distribution U([0, 7.5]^3)
4: sort y in descending order
5: f ← quadratic spline interpolation w.r.t. x and y
6: U_short ← empty list
7: for i ∈ {1, . . . , F} do
8:   append f(log(r_i)) to U_short
9: end for
10: sort U_short in descending order
11: apply Savitzky-Golay filter of order 3 with window size 601 to U_short
12: sort U_short in descending order
13: return U_short   ▷ short-range part of generated potential

Note that the features (U_short)_i uniquely determine the short-range behavior due to the imposed normalization (U_short)_F = 0. This is equivalent to the condition U_short(1) = −1 for the unscaled short-range contribution.
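Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: in particular, the exact grid of transition radii is not reproduced in the text, so a log-spaced grid over [e^−12, 1] is assumed here, and the final normalization (U_short)_F = 0 is imposed by subtracting the last feature.

```python
import numpy as np
from scipy.interpolate import make_interp_spline
from scipy.signal import savgol_filter

F = 1000
# assumption: transition radii log-spaced over [e^-12, 1]
log_r = np.linspace(-12.0, 0.0, F)

def generate_short_range(rng):
    """Sketch of Algorithm 1: a monotonically decreasing short-range part."""
    knot = rng.uniform(-8.4, -3.6)                      # step 1
    x = np.array([-12.0, knot, 0.0])                    # step 2
    y = np.sort(rng.uniform(0.0, 7.5, size=3))[::-1]    # steps 3-4
    f = make_interp_spline(x, np.sort(y)[::-1], k=2)    # step 5: quadratic spline
    u = f(log_r)                                        # steps 6-9
    u = np.sort(u)[::-1]                                # step 10
    u = savgol_filter(u, window_length=601, polyorder=3)  # step 11
    u = np.sort(u)[::-1]                                # step 12
    return u - u[-1]  # assumed normalization: last feature equals zero

rng = np.random.default_rng(0)
u_short = generate_short_range(rng)
```

The double sorting around the Savitzky-Golay filter mirrors the listing: smoothing may locally break monotonicity, which the second sort restores.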

B. Long-range behavior of U
Having generated the short-range part U_short, we are only missing the long-range part U_long in order to obtain the scaled potential U as

U = U_short ∘ U_long.   (54)

Here we use the Hadamard product, defined as the elementwise multiplication (x ∘ y)_i = x_i y_i. In contrast to the short-range sector, resolvability does not impose any technical limitations here, which is why oscillations may also come into play:

Algorithm 2 Generate Scaled Potential - Long Range
⋮
6: append y_i + step to y
7: end for
8: f ← quadratic spline interpolation w.r.t. x and y
9: U_long ← empty list
10: for i ∈ {1, . . . , F} do
11: append f(r_i) to U_long
12: end for
13: apply Savitzky-Golay filter of order 3 with window size 101 to U_long twice in a row
14: return U_long   ▷ long-range part of generated potential

FIG. 3. Ten randomly generated, singular two-body potentials plotted over the radius r (a) and the corresponding scaled potentials (b); (c) and (d) show the same potentials over the logarithm of the radius. Potentials constructed this way behave monotonically in the short-range regime; vice versa, oscillations may only occur in the long-range part.
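The combination step of Eq. (54) is a plain elementwise product, illustrated below on small toy vectors:

```python
import numpy as np

def combine(u_short, u_long):
    """Eq. (54): the scaled potential is the Hadamard (elementwise)
    product of the short- and long-range parts, (x o y)_i = x_i * y_i."""
    return u_short * u_long

u = combine(np.array([2.0, 1.0, 0.0]), np.array([1.0, -0.5, 3.0]))
# -> [2.0, -0.5, 0.0]
```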

C. Training and test data sets
Combining the results of Algorithms 1 and 2 according to Eq. (54) yields one scaled two-body potential. Fig. 3b) displays ten such randomly generated potentials. By construction, the corresponding unscaled potentials shown in Fig. 3a) are singular at the origin. Comparing with Figs. 3c) and d), we convince ourselves that oscillations only appear in the long-range regime, whereas the short-range part behaves monotonically.
In this fashion, we generate 3 × 10 4 training potentials and 3 × 10 3 test potentials forming the training set X and, respectively, the test set Y .

IV. SELF-SUPERVISED SEARCH FOR LIMIT CYCLES
Both data sets X and Y generated in Sec. III are targetless and only contain singular, scaled potentials as shown in Fig. 3. Common supervised machine learning techniques are, therefore, ruled out. Instead, the search for LC potentials can be understood as an optimization problem that aims at minimizing some loss function directly processing bare, scaled potentials. This loss function is introduced in Sec. IV B 1 as the limit cycle loss (LCLoss) and measures how close a potential is to being LC. Naively, one could expect this optimization to take place in feature space. Since we do not assume to have already found LC potentials during the creation of the data sets X and Y, an essential part of this search is to generate new potentials, also called synthetic potentials. These synthetic potentials appear in neither X nor Y, but have to follow the same underlying potential distribution. However, due to the high dimension of the feature space, several problems may occur that are briefly referred to as the curse of dimensionality, see Ref. [22]. Clearly, not every point in feature space corresponds to a singular, scaled potential like those in X and Y. This makes generating synthetic potentials a nontrivial task that heavily relies on knowledge about patterns in both data sets as well as on the underlying distribution of potentials in feature space to draw new samples from. Finally, for faithfully estimating the mentioned potential distribution, the data sets with |X| = 3 × 10^4 and |Y| = 3 × 10^3 certainly do not contain enough samples.
Fortunately, the feature space dimension F by far exceeds the number of degrees of freedom in both data sets, which strongly invites the application of suitable dimensionality reduction techniques. We thus need to distinguish between a pretext task and a downstream task. The pretext task finds a low-dimensional representation containing the most distinctive information on scaled, singular potentials and thereby mitigates the curse of dimensionality. The downstream task then performs the actual search for LC potentials and benefits from the results of the pretext task in several ways: Firstly, knowing the low-dimensional representation makes it much easier to estimate a corresponding low-dimensional potential distribution to draw synthetic potentials from. Secondly, due to being carried out in a low-dimensional vector space, optimization techniques can be expected to converge faster and to yield more robust results. Note that the downstream task no longer relies on the data sets X and Y, but runs entirely on more abstract, machine-learned features found by the pretext task. Therefore, the described procedure can be attributed to unsupervised feature learning, see Ref. [18].

A. Pretext task
There are numerous dimensionality reduction techniques worth mentioning: The principal component analysis (PCA), see Ref. [23], is a quite popular method. It projects a data set onto the low-dimensional vector space R^L spanned by those L ∈ N eigenvectors of its correlation matrix that have the highest eigenvalues. Since the variance of the data is maximal along these axes, they are associated with the L most distinctive features of the data set. This makes PCA a rewarding approach for classification and anomaly detection tasks given smaller and less complex data sets, see Refs. [24,25]. Being a linear method, it is, however, not suited to extract complex, non-linear patterns. Ref. [26] discusses the helix problem to explain the severe limitations of PCA when dealing with non-linear data: The helix problem is a non-linear toy problem in which points are closely distributed along a helix in R^3. While it is intuitively clear that there is only one degree of freedom at play, a PCA tends to overestimate the number of required principal components to be L = 3 and, therefore, fails to reduce the dimension of the helix problem. In addition, Ref. [18] points out that stacking several PCAs does not yield more abstract and expressive features, as this sequence is again a linear operation and, therefore, does not enhance non-linearity.
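The failure of PCA on the helix is easy to reproduce numerically. The following toy illustration (our own parametrization, not taken from Ref. [26]) shows that the explained variance spreads over all three principal components, even though the data have only one intrinsic degree of freedom:

```python
import numpy as np
from sklearn.decomposition import PCA

# points along a helix in R^3: intrinsically one-dimensional,
# yet not separable by any linear projection
t = np.linspace(0.0, 6.0 * np.pi, 2000)
helix = np.column_stack([np.cos(t), np.sin(t), 0.2 * t])

ratios = PCA(n_components=3).fit(helix).explained_variance_ratio_
# all three ratios stay well above zero, so a linear method cannot
# compress the helix below L = 3
```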
In the case of singular two-body potentials within an F = 10^3 dimensional feature space, a non-linear approach, e.g. via autoencoders, is required. An autoencoder A = D ∘ E : R^F → R^F maps the feature space R^F to itself and is defined as the composition of an encoder E : R^F → R^L and a decoder D : R^L → R^F, see Refs. [27,28]. Both the encoder and the decoder are typically fully-connected (FC), convolutional (CNN) or recurrent neural networks (RNN) that are non-linearly activated and can be trained via gradient descent. They either map to or, respectively, map from the latent space R^L, which plays a central role in this concept of non-linear dimensionality reduction. The application of E to a point in feature space is called encoding. Consequently, applying D to some point in latent space is referred to as decoding. Having both an encoder and a decoder at hand allows us to easily associate an entire distribution X in feature space with the distribution E(X) of the corresponding encoded data in latent space.
The training objective for autoencoders is to reproduce inputs U ∈ X as faithfully as possible. This is quantified by the so-called reconstruction loss measuring the deviation of a given input U from its reconstruction A(U). For L we decide to use the L1-Loss, L_L1(x, y) = Σ_{i=1}^{n} |x_i − y_i|/n with x, y ∈ R^n (56), although other loss functions like the MSELoss L_MSE(x, y) = Σ_{i=1}^{n} (x_i − y_i)^2/n, for instance, would, of course, also be a reasonable choice.
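Both candidate losses amount to one line of NumPy each; a minimal sketch:

```python
import numpy as np

def l1_loss(x, y):
    """L1 reconstruction loss: mean absolute deviation."""
    return np.mean(np.abs(x - y))

def mse_loss(x, y):
    """Alternative choice: mean squared error."""
    return np.mean((x - y) ** 2)

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.0, 1.5])
# l1_loss(x, y) -> (0.5 + 0 + 0.5)/3 = 1/3
# mse_loss(x, y) -> (0.25 + 0 + 0.25)/3 = 1/6
```

Compared to the MSELoss, the L1-Loss penalizes small residuals more strongly relative to large outliers, which is one common reason to prefer it for reconstruction tasks.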
Note that in the case of equal feature and latent space dimensions, L = F, the reconstruction loss is minimized whenever the encoder and decoder counteract each other, such that A approaches the identity in feature space. However, if no further sparse coding techniques as described in Ref. [29] are applied, the downstream task hardly benefits from the newly acquired latent space features. Instead, in order to obtain a significant dimensionality reduction, the case L ≪ F is particularly interesting, as it strictly enforces the encoder to act as a projection onto the low-dimensional latent space. Vice versa, the decoder performs as an embedding from latent space back into the much higher-dimensional feature space. This feature extraction paradigm is also known as the bottleneck method, see Ref. [30]. On the one hand, such a bottleneck architecture may impose a severe information loss during encoding. On the other hand, if the reconstruction loss has been sufficiently minimized during training, this implies that the encoded potentials E(U) contain the L most distinctive features of the underlying data set. The decoder, in turn, has learned to reconstruct the essential behavior of the original scaled potential U.
Note that the training objective alone hardly constrains the distribution E(X) of encoded potentials in latent space. During encoding, we cannot exclude that minor changes in feature space blow up and become large deviations in latent space. Therefore, the encoded distribution in latent space may not only turn out to be disjoint and multimodal, but the Euclidean metric in latent space may also fail as a faithful similarity measure for the elements of feature space. This severely complicates exploratory approaches based on synthetic potentials. As the downstream task strongly depends on the latter, we need to ensure that any point between two encoded potentials in latent space can also be decoded to a meaningful, scaled potential satisfying the original feature space distribution. Therefore, we do not work with autoencoders as described above, but with so-called variational autoencoders (VAEs), see Ref. [31]. VAEs introduce the concept of randomness to the latent space. The central idea behind VAEs is that the encoder does not map directly to a point but rather to a multivariate probability distribution, in this case a Gaussian distribution N(µ, σ) with distinct mean µ ∈ R^L and standard deviation σ ∈ R^L in latent space. From a technical perspective, the encoder can be understood as a map R^F → R^{2L} that projects a vector U ∈ R^F from feature space to a tuple (µ, log(σ)) ∈ R^L × R^L with an elementwise logarithm. By construction, the VAE learns to associate adjacent points in feature space with adjacent points in latent space. Before entering the decoding pipeline, we need to keep in mind that the decoder is still a function D : R^L → R^F, requiring an L-dimensional input. Therefore, we first need to draw a single sample l ∈ R^L from the above distribution N(µ, σ), which then serves as the actual input to the decoder. However, doing so naively would spoil the differentiability of the VAE and the subsequent gradient-descent based training.
An alternative procedure that preserves differentiability and that comes into play here is referred to as the reparametrization trick, see Ref. [31]: Let ε ∈ R^L be a vector drawn from the multivariate standard distribution N(0, Σ_{i=1}^{L} e_i) in latent space. Then l = µ + σ ∘ ε can be treated as a sample of N(µ, σ). Since ε only enters as a parameter, this is a valid starting point for decoding while still being differentiable with respect to the encoder parameters.
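A minimal sketch of the reparametrization trick, here in NumPy with log(σ) as input to match the encoder output described above:

```python
import numpy as np

def reparametrize(mu, log_sigma, rng):
    """Reparametrization trick: l = mu + sigma o eps with eps ~ N(0, I).
    The randomness enters only through eps, so l remains differentiable
    with respect to mu and log_sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

rng = np.random.default_rng(0)
l = reparametrize(np.zeros(3), np.zeros(3), rng)  # one sample of N(0, I) in R^3
```

In an actual deep learning framework the same two lines appear inside the forward pass, so gradients flow through mu and log_sigma while eps is treated as a constant.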
Once the VAE has been trained, there is no need to keep this randomization, which would only increase the model variance. In this case, especially during validation, we do not apply the reparametrization trick. Instead, we directly extract the bare mean l = µ as the latent vector while ignoring the standard deviation σ.

Convolutional VAE Architecture
Since we need to faithfully cover the correct short- and long-range behavior of 3 × 10^4 training potentials in an F = 10^3 dimensional feature space, using deep encoders and decoders is strongly advisable. We decide to use fully-convolutional architectures, meaning that the operation of classical pooling layers is also handled by convolutional layers with increased strides, see Ref. [32].
The encoder and decoder architectures are shown in Figs. 4a) and b), respectively. As the given scaled potentials U ∈ X are treated as plain vectors, we employ one-dimensional convolutional layers in both networks. At the beginning of the encoding pipeline, a scaled potential is processed by two consecutive convolutional layers over a total of 16 channels with kernel size 3 and stride 3. Here, as well as within the decoder, convolutional layers are non-linearly activated by the GELU activation function, see Ref. [33]. Each of these convolutional layers reduces the number of features per channel by a factor of three. Stacking all 16 channels leaves us with a 1776-component vector. Since we choose to work with the latent space dimension L = 3, this vector is then mapped to a vector in R^6 via a fully-connected layer, whose first (last) three components are interpreted as µ (σ). Applying the reparametrization trick for VAEs then yields the corresponding encoded potential.
The decoder architecture is mainly a mirror of the encoder architecture: The encoded potential is again mapped to a 1776 component vector via a fully-connected layer, which is then split into 16 channels with each containing 111 features. Analogously to the convolutional layers of the encoder, we subsequently apply two transposed convolutional layers increasing the amount of features per channel by a factor of three, instead. However, the second transposed convolutional layer maps all incoming features to a single channel, which yields a 999 component vector. Finally, the remaining step of the reconstruction pipeline is to map this vector back to feature space using another fully-connected layer.
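The stated dimensions can be checked with a PyTorch sketch. The layer sizes follow the text (two stride-3 convolutions, 16 channels, a 1776-component intermediate vector, L = 3, and the mirrored decoder 111 → 333 → 999 → F); details not stated in the text, such as the channel progression of the first layer, are assumptions:

```python
import torch
import torch.nn as nn

F_DIM, L_DIM = 1000, 3  # feature and latent space dimensions

class Encoder(nn.Module):
    """Sketch of the encoder of Fig. 4a)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, stride=3), nn.GELU(),   # 1000 -> 333
            nn.Conv1d(16, 16, kernel_size=3, stride=3), nn.GELU(),  # 333 -> 111
        )
        self.fc = nn.Linear(16 * 111, 2 * L_DIM)  # 1776 -> (mu, log_sigma)

    def forward(self, u):
        h = self.conv(u.unsqueeze(1)).flatten(1)
        mu, log_sigma = self.fc(h).chunk(2, dim=1)
        return mu, log_sigma

class Decoder(nn.Module):
    """Mirror of the encoder: 3 -> 1776, two transposed convolutions
    (111 -> 333 -> 999), then a fully-connected map back to R^F."""
    def __init__(self):
        super().__init__()
        self.fc_in = nn.Linear(L_DIM, 16 * 111)
        self.deconv = nn.Sequential(
            nn.ConvTranspose1d(16, 16, kernel_size=3, stride=3), nn.GELU(),
            nn.ConvTranspose1d(16, 1, kernel_size=3, stride=3), nn.GELU(),
        )
        self.fc_out = nn.Linear(999, F_DIM)

    def forward(self, l):
        h = self.fc_in(l).view(-1, 16, 111)
        return self.fc_out(self.deconv(h).squeeze(1))

enc, dec = Encoder(), Decoder()
mu, log_sigma = enc(torch.randn(5, F_DIM))
u_rec = dec(mu)
```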

VAE Training
While autoencoders are trained to minimize the reconstruction loss, an additional regulator enters the objective function during the training of VAEs, known as the Kullback-Leibler divergence D_KL (KL-Divergence), see Ref. [34]. In general, the KL-Divergence is a positive, but asymmetric measure of how far two probability distributions deviate from each other. In the case of VAEs, the KL-Divergence penalizes the encoder the more the distribution E(U) = N(µ, σ) deviates from the multivariate standard distribution N(0, Σ_{i=1}^{L} e_i) in latent space and is given by D_KL = −(1/2) Σ_{j=1}^{L} (1 + log(σ_j^2) − µ_j^2 − σ_j^2), see Ref. [31]. The total loss that is minimized during the training of a VAE is the sum of the reconstruction loss and the KL-Divergence. Ref. [35] introduces an additional hyperparameter β to tune the contribution of the KL-Divergence, which yields the loss L_β = L_L1 + β D_KL. Working with β ≫ 1 causes the VAE to prioritize mapping any scaled potential U ∈ X as closely as possible to a multivariate standard distribution in latent space. While this provides an encoded distribution E(X) that approximates a standard distribution quite well, it severely reduces the reconstruction accuracy. In contrast, the case β ≪ 1 prioritizes accurate reconstructions over having a symmetric, unimodal encoded distribution E(X) in latent space. Ref. [34] proposes a normalized factor β_norm = βL/F, which is proportional to the quotient of latent and feature space dimensions. Although in our case L/F = 3 × 10^−3, we work with an even smaller factor of β = 10^−4. This yields more accurate reconstructions while still producing sufficiently Gaussian encoded distributions, as shown in Fig. 5.
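The β-weighted objective can be written compactly; this per-sample sketch uses the standard closed form of the KL-Divergence for a diagonal Gaussian against the standard normal (batch averaging of the KL term is omitted for brevity):

```python
import torch

def beta_vae_loss(u, u_rec, mu, log_sigma, beta=1e-4):
    """beta-VAE objective: L1 reconstruction loss plus beta times the
    KL divergence of N(mu, sigma) from the standard normal,
    D_KL = -1/2 * sum_j (1 + log sigma_j^2 - mu_j^2 - sigma_j^2)."""
    rec = torch.mean(torch.abs(u - u_rec))
    kl = -0.5 * torch.sum(1 + 2 * log_sigma - mu**2 - torch.exp(2 * log_sigma))
    return rec + beta * kl
```

For mu = 0 and sigma = 1 the KL term vanishes, so a perfect reconstruction drives the total loss to zero; any deviation of the encoded Gaussian from the standard normal adds a penalty scaled by beta.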
The actual training pipeline is fairly straightforward. The VAE is trained on a standardized data set containing the potentials with components (u_i − µ_i(X))/σ_i(X), where µ(X) and σ(X) denote the elementwise mean and, respectively, standard deviation of all potentials in X. We train the VAE over N_E = 10 epochs using batch learning with batch size 30 and the Adam optimizer, see Ref. [36]. In order to obtain more stable results, we apply an exponentially decaying learning rate schedule η_i = 10^−3/2^{i−1}. The resulting reconstruction loss of 3.35 × 10^−2 on the test set Y is not yet satisfactory: the VAE only manages to reproduce the superficial behavior of input potentials, as shown in Fig. 5. Using gradient boosting, see Sec. IV A 3, it is possible to drastically reduce the reconstruction loss.

Boosted VAE Ensemble
Due to the considerably low latent space dimension L = 3, the encoder shown in Fig. 4a) severely compresses incoming information. The whole VAE architecture can thus be understood as an extremely narrow information bottleneck. Consequently, the decoder's capacity to reconstruct not only the rough behavior of the input potential, but also finer oscillations, especially in the long-range regime, is fairly limited. This problem, however, can be remedied by a boosting-based approach. Boosting techniques combine several such weak learners into a single strong learner, see Refs. [37,38]. For the given regression task of reconstructing scaled potentials U ∈ X, we apply gradient boosting, see Ref. [39]: We sequentially train C autoencoders A_1, ..., A_C such that the i-th member A_i is trained to reconstruct the residuals left by its predecessor A_{i−1}:

Algorithm 3 Boosted VAE Ensemble -Training
Require: Training set X ⊆ R^F, test set Y ⊆ R^F
1: L ← empty list   ▷ list of test reconstruction losses
2: T ← (X, Y)   ▷ training and test sets
3: C ← 0   ▷ number of VAEs in ensemble
4: while L does not converge do
⋮

Hence, each subsequent VAE acts as a correction to its predecessor. This yields a hierarchical sequence A of VAEs acting on scaled potentials as follows:

Algorithm 4 Boosted VAE Ensemble -Reconstruction
Require: Pretrained VAEs A_1, . . . , A_C
Input: Scaled potential U
1: function A(U)
2: x ← U
3: r ← 0
4: for i ∈ {1, . . . , C} do
5: y ← A_i(x)
6: r ← r + y
7: x ← x − y
8: end for
9: return r   ▷ ensemble-reconstruction of U
10: end function

After having trained the eighth VAE A_8, we do not observe any further accuracy improvement when adding more VAEs to the ensemble. With C = 8, the effective latent space dimension is C × L = 24. As a result of gradient boosting, we reduce the reconstruction loss to 1.18 × 10^−3, which is more than one order of magnitude less than for the case of a single VAE. Fig. 5 shows how the reconstruction of an example potential U ∈ X improves with increasing C. While the true behavior of U can only be guessed from the reconstruction via a single VAE (C = 1), the ensemble with the maximum number of VAEs (C = 8) reliably reproduces oscillations in the long-range regime.
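The reconstruction loop of Algorithm 4 is framework-agnostic and can be sketched with any callables standing in for the trained VAEs. In the toy demonstration below, each "VAE" recovers half of its input, so C members leave a residual of (1/2)^C:

```python
import numpy as np

def ensemble_reconstruct(autoencoders, u):
    """Algorithm 4: each member reconstructs the residual left over by
    its predecessors; the ensemble output accumulates all contributions."""
    x, r = u, 0.0
    for A in autoencoders:
        y = A(x)   # reconstruction of the current residual
        r = r + y
        x = x - y  # residual handed to the next member
    return r

# toy stand-ins for trained VAEs: each recovers half of its input
halving = [lambda x: 0.5 * x] * 8
u = np.array([1.0, 2.0])
r = ensemble_reconstruct(halving, u)  # approaches u as C grows
```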
It is important to note that the individual VAEs' latent spaces are independent of each other. Hence, there is no way to express an ensemble-encoded potential by a single point in latent space, but rather by a sequence ζ = (ζ_1, ..., ζ_C) ∈ R^{C×L} of C points in latent space that we refer to as a latent curve. Here, the i-th point ζ_i is the encoded contribution of A_i to the reconstruction of some given potential U. As a consequence, ensemble-encoding satisfies a similar formulation as Algorithm 4:

Algorithm 5 Boosted VAE Ensemble -Encoding
Require: Pretrained VAEs A_1, . . . , A_C with A_i = D_i ∘ E_i
Input: Scaled potential U
1: function E(U)
2: x ← U
3: ζ ← empty list
4: for i ∈ {1, . . . , C} do
5: append E_i(x) to ζ
6: y ← A_i(x)
7: x ← x − y
8: end for
9: return ζ   ▷ ensemble-encoding of U
10: end function

Similar to a single encoder, we can relate X to a set E(X) ⊆ R^{C×L} of latent curves. When working with an ensemble of C VAEs, the effective latent space dimension is, therefore, C × L. For our exploratory approach we are particularly interested in directly generating synthetic potentials from the latent space. Given a latent curve ζ = (ζ_1, ..., ζ_C) ∈ R^{C×L}, the decoded potential is the sum of all decoded contributions,

D(ζ) = Σ_{i=1}^{C} D_i(ζ_i).   (59)

Knowing the underlying latent curve distribution E(X) is essential for generating synthetic potentials using Eq. (59). Fig. 6 displays the distribution of ζ_i for each autoencoder A_i for a sample of 200 latent curves ζ = (ζ_1, ..., ζ_8) ∈ E(X). The shown distributions strongly resemble each other. Considering the whole set E(X), we find each distribution centered close to the origin with standard deviations ranging between 0.8 and 0.9, depending on i and the considered axis in latent space. We will, therefore, approximate the latent curve distribution by a multivariate standard distribution. Fig. 7 shows three different families of synthetic potentials. The three main curves U_1(r), U_2(r) and U_3(r) have been generated from a standard distribution in R^{C×L}. The dashed curve is the decoded origin of R^{C×L} and can be understood as an average representative of X.
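Decoding a latent curve via Eq. (59) and drawing synthetic potentials from the approximated standard-normal latent curve distribution reduces to a few lines. The toy decoders below (each mapping R^L to a two-component "potential") only stand in for the trained D_i:

```python
import numpy as np

def decode_latent_curve(decoders, zeta):
    """Eq. (59): the decoded potential is the sum of the individually
    decoded contributions, D(zeta) = sum_i D_i(zeta_i)."""
    return sum(D(z) for D, z in zip(decoders, zeta))

def sample_synthetic(decoders, C, L, rng):
    """Draw a synthetic potential by sampling a latent curve from the
    approximate (standard normal) latent curve distribution E(X)."""
    zeta = rng.standard_normal((C, L))
    return decode_latent_curve(decoders, zeta)

# toy stand-ins for the trained decoders D_i (here F = 2)
toy_decoders = [lambda z: np.full(2, z.sum())] * 3
u = decode_latent_curve(toy_decoders, np.ones((3, 2)))
```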

Limit-cycle-loss
The boosted VAE ensemble from Sec. IV A 3 provides a dimensionality reduction which greatly simplifies the search for LC potentials in latent space. Having specified the latent space as the actual search space, this section is dedicated to motivating the limit-cycle-loss (LCLoss): For a latent curve ζ ∈ R^{C×L}, the LCLoss L_LC determines how much the coupling constant H_n(R*, α) of the potential D(ζ) deviates from a log-periodic behavior. Therefore, the LCLoss has to be understood as a function L_LC : R^{C×L} → R^+_0. If and only if L_LC(ζ) = 0, the potential D(ζ) is LC. For simplicity we choose a fixed hyperangle α = π/2 and node index n = 1. Figs. 8a) and b) demonstrate how the LCLoss for some ζ ∈ R^{C×L} is computed: The idea is to find all poles p_i of H_n(R*, α) in the interval log(R*) ∈ [−12 − log(√2 sin(α)), −log(√2 sin(α))], see Fig. 8a). Considering the pairwise distances p_{i+1} − p_i between adjacent log-poles yields a distribution Λ, see Fig. 8b). If D(ζ) is LC, then all poles are equidistant, which causes all pairwise distances to be identical and the standard deviation σ(Λ) to vanish. In this case, log(λ) = µ(Λ) is the log-periodicity of H_n(R*, α), which is related to the preferred scaling factor λ of that limit cycle. In general, larger LCLosses indicate larger deviations from a log-periodic behavior. This makes the quotient

L_LC(ζ) = σ(Λ)/µ(Λ)   (60)

a promising candidate for the required measure. However, there is one difficulty that still needs to be addressed. Namely, if H_n(R*, α) is not singular, or if it gives rise to only a small number P ∼ O(1) of poles, then the LCLoss defined as in Eq. (60) is no reliable measure of LC-ness. The same problem holds if the density of poles is of the same order as the hyperradial resolution, that is µ(Λ) ≈ 1.2 × 10^−4, which corresponds to a number of P ≈ 10^5 poles. Therefore, we require the number of poles to lie well between these two extremes in order to avoid an ill-defined LCLoss. Since all scaled potentials in X and Y have been generated to satisfy the normalization U(1) = 0 for the scaled potentials, that is U(1) = −1 for the unscaled ones, there is one additional degree of freedom we have ignored until now. In the following we introduce an additional displacement s for the rescaled potential, which corresponds to scaling the original potentials by the factor e^{8s}. While the number of poles increases with larger s, the shape of the distribution Λ turns out to be invariant under this operation: An LC (non-LC) potential stays LC (non-LC), regardless of the value of s.

FIG. 8. a) displays the coupling constant H_1(R*, π/2) obtained from delta-shell renormalization for a synthetic potential at hyperangle α = π/2. The poles are clearly not equidistant, which violates discrete scaling symmetry and indicates that the given potential does not give rise to a limit cycle. b) shows how the distances ∆x between adjacent poles are distributed. The width of this distribution can be used to measure the deviation from a limit cycle.
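Given the extracted log-pole positions, the LCLoss of Eq. (60) is a one-line statistic; a minimal sketch, assuming the poles of H_n(R*, α) have already been located:

```python
import numpy as np

def lc_loss(log_poles):
    """LCLoss of Eq. (60): relative spread of the spacings between
    adjacent log-poles. For an exact limit cycle the poles are
    log-equidistant, the loss vanishes, and the mean spacing equals
    the log-periodicity log(lambda)."""
    spacings = np.diff(np.sort(log_poles))  # the distribution Lambda
    return np.std(spacings) / np.mean(spacings)

# log-equidistant poles (limit cycle): loss is numerically zero
equidistant = np.linspace(-12.0, 0.0, 100)
# irregular poles: loss is of order one
irregular = np.array([0.0, 1.0, 2.0, 4.0])
```

Normalizing by the mean spacing makes the measure independent of the overall log-periodicity, so potentials with different preferred scaling factors λ can be compared directly.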
An appropriate choice of the displacement s for some latent curve ζ allows an alternative normalization in which H_n(R*, α) has only a fixed number P = 10^2 of poles and which is, therefore, more compatible with the above definition of the LCLoss. The relation between latent curves ζ ∈ R^{C×L} and the correct displacements is established via a supervisedly trained ensemble S of C = 8 CNNs S_1, ..., S_C : R^{C×L} → R. Each CNN shares the same architecture as the decoders D_i, see Fig. 4b). In fact, the only difference lies within the output layer, which maps to a real number s ∈ R instead of some U ∈ R^F. The loss function to be minimized during the supervised training of the S_i is simply the L1-Loss between targets and predictions, as defined in Eq. (56). The training and test sets, X_S and Y_S, we use to train the ensemble S contain |X_S| = 3 × 10^3 and, respectively, |Y_S| = 3 × 10^2 pairs (ζ, s). For each latent curve ζ ∈ R^{C×L} the corresponding displacement s is found via a grid search among the 150 equidistant displacements s_i = −5 + (i − 1)/149. As the number of poles increases strictly monotonically with the displacement s, there is a unique solution for each latent curve.
S undergoes a similar boosting procedure to the VAE ensemble A, see Algorithm 3. Of course, it is important to realize that during the i-th iteration we only adapt the targets of T_1 and T_2 to the previous residuals s − Σ_{j=1}^{i−1} S_j(ζ), while the given latent curves remain unchanged. The resulting ensemble-prediction of the displacement s is then the sum S(ζ) = Σ_{i=1}^{C} S_i(ζ). For the individual CNNs S_i a similar training pipeline as in Sec. IV A 2 proves to be useful. We only introduce minor changes like an additional weight decay of 10^−3, noise injection with standard deviation 0.025 for further regularization, and a modified learning rate schedule η_i = 10^−3 · 0.95^{i−1}. To verify that the ensemble S meets the requirements, we generate 10^3 synthetic potentials U = D(ζ) ∈ R^F from a standard distribution in R^{C×L}. After rescaling each potential according to Eq. (64), we count all poles of the coupling constant H_n(R*, α) for α = π/2 and n = 1. The resulting distribution of pole numbers P is approximately Gaussian, with mean µ = 101.8 and standard deviation σ = 21.2. Admittedly, the standard deviation is not that small compared to the mean. Nevertheless, using the rescaled potentials from Eq. (64), the ensemble S still suffices for a sound definition of the LCLoss in Eq. (60). This is because virtually every synthetic potential that is generated during the downstream task exhibits the right number P of poles in its coupling constant to satisfy Eq. (61).

Genetic algorithm for a latent curve population
Since the LCLoss depends on the exact pole positions of the coupling constant H_n(R*, α), its loss surface is neither smooth nor convex, which renders optimization approaches based on gradient descent fairly difficult. Instead, we decide to search for LC potentials using an elitist genetic algorithm (GA), which is motivated by Goldberg's Simple Genetic Algorithm, see Ref. [19].
The very first step of each GA is to initialize a population. Here, we draw 100 latent curves (the individuals) from a standard distribution in R C×L , which all together form the initial population. Then we compute the LCLoss for the corresponding rescaled potentials from Eq. (64). The main part of the GA is organized in generations, where each generation consists of a sequentially executed selection, crossover, mutation and removal phase. Using standard genetic operators we aim at evolving the population towards fitter individuals, that is potentials with lower LCLosses, within 100 generations.
At the beginning of each generation, we identify the fittest individual ζ_fittest, which is the latent curve with the lowest LCLoss. ζ_fittest is reserved for crossover and cannot be eliminated during the removal phase. Within the selection phase, we carry out ten tournament selections with tournament size 20 among all individuals but the fittest one. Once an individual has won a tournament due to having the lowest LCLoss compared to the other 19 competitors, it is reserved for crossover and cannot participate in the following tournaments. During the crossover phase, the fittest individual ζ_fittest mates with each of the ten individuals ζ_i, i = 1, ..., 10, that won a tournament during the selection phase. Each couple (ζ_fittest, ζ_i) generates two children via a heuristic crossover, with r drawn from the uniform distribution U([0, 1]). At this step, the population consists of 120 individuals. Being extremely elitist, this GA is prone to converge to local minima. In order to counteract this issue, we enhance the genetic diversity, which can be understood as the width of the population in R^{C×L}, during the mutation phase. Gaussian mutation satisfies these diversification requirements and is applied to the offspring provided by the crossover phase, where the mutation factor f ∈ R^{C×L} is drawn from the multivariate normal distribution N(1, σ_g), with g denoting the number of past generations. We apply an exponentially decaying mutation parameter σ_g = 0.5 · 0.98^g, which causes mutation to become less relevant compared to crossover towards later generations. In doing so, we assume that the GA finds well-performing individuals during early generations and afterwards only needs to perform crossover-based fine-tuning. At the end of each generation, we carry out 20 tournament selections with tournament size 20.
However, in contrast to the tournaments during the selection phase, we remove the individual with the highest LCLoss from the population, such that we are left with 100 individuals at the end of each generation. The GA as described above is not performed once, but in parallel for 50 populations that evolve independently of each other. After fulfilling the stopping criterion of reaching the 101st generation, we extract the fittest individual from each population.
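The three genetic operators can be sketched as follows. The tournament and mutation follow the text; the exact crossover formula is not reproduced above, so a standard heuristic crossover (child extrapolates from the mate towards and past the fittest parent) is assumed here:

```python
import numpy as np

def tournament(pop, losses, size, rng):
    """Tournament selection: among `size` randomly drawn competitors,
    the individual with the lowest LCLoss wins."""
    idx = rng.choice(len(pop), size=size, replace=False)
    return idx[np.argmin(losses[idx])]

def heuristic_crossover(fittest, mate, rng):
    """Assumed form of the heuristic crossover with r ~ U([0, 1])."""
    r = rng.uniform(0.0, 1.0)
    return fittest + r * (fittest - mate)

def gaussian_mutation(child, g, rng):
    """Gaussian mutation with elementwise factors f ~ N(1, sigma_g)
    and exponentially decaying sigma_g = 0.5 * 0.98^g."""
    sigma_g = 0.5 * 0.98 ** g
    return child * rng.normal(1.0, sigma_g, size=child.shape)

rng = np.random.default_rng(0)
population = rng.standard_normal((100, 24))  # 100 latent curves in R^(C x L)
losses = rng.uniform(size=100)               # placeholder LCLosses
```

With the decaying sigma_g, mutation perturbs aggressively in early generations and degenerates towards the identity later on, matching the intended crossover-based fine-tuning.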

V. RESULTS
Whether the search for LC potentials is successful depends primarily on the results of the downstream task. While the boosted VAE and CNN ensembles A and S merely provide the theoretical framework to reliably compute LCLosses in a low-dimensional representation, the actual search is carried out by the GAs in Sec. IV B 2. For this reason, the following analysis is heavily based on the fittest individuals that are produced by the latter.

A. Fittest individuals from genetic algorithm
The fact that we have applied the above GA not to one, but to fifty independent populations in parallel allows us to estimate a distribution Z of latent curves in R^{C×L}, based on the corresponding fittest individuals. In order to save computational resources, we want to avoid explicitly running the same GA on further populations and subsequently extracting the fittest individuals in each case. Instead, we directly draw new latent curves ζ from the distribution Z, which have expectedly low LCLosses and can compete with the fifty extracted latent curves from Sec. IV B 2. Fig. 9 displays the 1σ and 2σ levels of that distribution. It is remarkable how similar the corresponding scaled potentials U = D(ζ) are to each other: especially in the short-range regime, all U(r) behave notably linearly in terms of log(r).
The mentioned similarity implies that the loss surface of the LCLoss does not exhibit several distinct local minima that perform equally well, but instead one global minimum towards which each GA draws its offspring. Our search for LC potentials, therefore, appears to yield a unique solution. The fittest of all 50 individuals is the latent curve with the lowest LCLoss among all GAs. Its considerably low LCLoss of L_LC = 1.06 × 10^−2 indicates that the associated scaled potential is close to this unique LC potential.
The variance in the long-range regime is slightly increased, which can be traced back to the behavior of the VAE ensemble and the nature of the LCLoss. Depending on the given latent curve, the VAE ensemble may produce perturbations in the long-range sector. As long as the potential ensures a coupling constant H_n(R_*, α) with almost equidistant log-poles in the short-range sector, the loss gain due to these perturbations remains negligibly small. Hence, in an actual GA such an individual may still be superior to other individuals in the same population. To conclude, we attribute the non-linearity in the long-range sector to a lack of sensitivity of the LCLoss to deviations from LC-ness in narrow log(R_*)-intervals. As a consequence, we assume the demanded scaled potential to be a linear function in log(r). Fitting a linear model a log(r) + s to each of the 50 individuals yields one distribution per fit parameter. From these, we deduce a = −0.251 ± 0.003, s = 0.742 ± 0.029.
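A fit of the linear model a log(r) + s can be reproduced with a standard least-squares routine. The sketch below is purely illustrative: it fits a synthetic stand-in potential that we generate ourselves around the quoted parameter values, not one of the actual fittest individuals.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for one fittest scaled potential, generated around
# the quoted parameters a = -0.251, s = 0.742 with small noise (assumed).
r = np.logspace(-5.0, 0.0, 200)
U = -0.251 * np.log(r) + 0.742 + 0.001 * rng.standard_normal(r.size)

# Least-squares fit of the linear model U(r) = a * log(r) + s
a, s = np.polyfit(np.log(r), U, deg=1)
```

Repeating such a fit for each of the 50 fittest individuals and taking the spread of the resulting parameter samples yields the uncertainties quoted above.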
The fit model with the parameters a and s from Eq. (67) is displayed as the black curve in Fig. 9. The corresponding LCLoss L_LC = 1.5 × 10^−2 is even slightly larger than that of the previously mentioned fittest individual. Nevertheless, such linear scaled potentials are still worth pursuing, as we will show in Sec. V B. Note that a scaled potential U(r) ∝ log(r) corresponds, after undoing the rescaling, to a potential of type 1/r^p. Since p closely approximates the value 2, the 1/r^2 potential is a promising candidate for the demanded potential. Until now, however, the LCLoss has only been computed for the specific hyperangle α = π/2. As a consequence, the results of the previous GAs do not provide sufficient evidence to identify the 1/r^2 potential as a general solution that minimizes the LCLoss independent of the hyperangle.
B. Hyperangle dependence of the LCLoss
Evaluating the LCLosses of 1/r^p potentials over a two-dimensional grid of exponents and hyperangles sheds light on this problem.
Since the cutoff hyperradii R_* = r_*/(√2 sin(α)) entering the arguments of the Bessel functions in Eq. (51) are inversely proportional to sin(α), the number P of poles inside the considered interval [−12 − log(√2 sin(α)), − log(√2 sin(α))] diverges in the limit α → 0. As the ensemble S does not provide an appropriate normalization at hyperangles other than π/2, it cannot counteract the increase in P. Since the pole density must not exceed the resolution in log(R_*), we need to define a lower bound for the hyperangles considered here. Fig. 10 displays the evaluated LCLosses over the rectangle (p, α) ∈ [1.92, 2.08] × [π/8, π/2]. We observe that the shown loss surface has a distinctive ravine at the exponent p = 2. Ranging from the global minimum L_LC = 2.332 × 10^−4 to L_LC = 9.389 × 10^−4, the losses along p = 2 are all satisfactorily low. The fact that the LCLoss does not vanish exactly is most likely a discretization artifact and can be neglected for our purposes.
In summary, we determine the 1/r^2 potential to be the desired LC potential, as it minimizes the LCLoss independent of the hyperangle to a value close to zero. Due to the observed similarity between all fittest individuals from Sec. V A, we can exclude the existence of other LC potentials. The topology of the loss surface shown in Fig. 10 invites us to generalize our findings to smaller hyperangles as well. After fine-tuning the definition of the LCLoss to make it suitable for smaller hyperangles, the 1/r^2 potential seems likely to be the unique solution to our search for α ∈ [0, π/8] as well.
C. Log-periodicity for the 1/r^2 potential
There are two mechanisms that control the number P of poles the coupling constant H_n(R_*, α) has in the interval [−12 − log(√2 sin(α)), − log(√2 sin(α))]. When defining the LCLoss in Sec. IV B 1, we already took advantage of the first one, namely the displacement operation introduced in Eq. (62). It corresponds to multiplying an additional factor exp(8s) onto the rescaled potential and increases the momenta k_n(u) in Eq. (27). Entering the arguments of the Bessel functions in Eq. (51), this finally increases the pole density. The second mechanism is fine-tuning the hyperangle α, which has been briefly mentioned in Sec. V B. Due to the inverse proportionality between the cutoff hyperradii R_* and sin(α), the pole density increases whenever α decreases. Similar to the displacement operation, this happens because the arguments of the Bessel functions in Eq. (51) are proportional to R_*. The central result of our analysis is that the 1/r^2 potential uniquely minimizes the LCLoss. For each pair of displacement s and hyperangle α we can, therefore, assign to the 1/r^2 potential the log-periodicity log(λ)(s, α) of its coupling constant H_n(R_*, α) with node index n = 1. Fig. 11 shows how log(λ)(s, α) depends on s and α and thereby visualizes both discussed mechanisms of steering P(s, α) ≈ 12/log(λ)(s, α). The behavior of log(λ)(s, α) can be described by the fit model of Eq. (71), which suffices to cover both mechanisms.
It cannot be expressed via a separation ansatz log(λ_fit)(s, α) = f_1(s) f_2(α) due to containing several mixed terms. Fitting this model to the data shown in Fig. 11 yields b = −6.230 ± 0.027 and the fit parameters c_ij listed in Tab. I. Using Eqs. (70) and (71) we can now estimate the number P(s, α) for any given pair (s, α). For instance, Fig. 12 contains the coupling constant H_1(R_*, π/2) for the 1/r^2 potential with displacement s = 1/8, for which we count P(1/8, π/2) = 7 poles. In comparison, the estimate based on our fit model yields the log-periodicity log(λ_fit)(1/8, π/2) = 1.817 ± 0.117 and the quotient 12/log(λ_fit)(1/8, π/2) = 6.604 ± 0.425, which reproduces the counted pole number at the 1σ level. Having understood the behavior of log(λ)(s, α), we can retrospectively justify the normalization we agreed on in Sec. IV B 1. In order to achieve P ≈ 100 poles, the mean of all pairwise log-pole differences must take values µ(Λ) ≈ 1.2 × 10^−1. If the given potential is the 1/r^2 potential or at least close to being LC, this corresponds to the log-periodicity log(λ)(s, α). When training the ensemble S, we did not yet have the 1/r^2 potential in mind, which is why we need to ensure that it provides reasonable displacements in this special case. Inserting the displacement s = 0.742 ± 0.029 found in Eq. (67) and the hyperangle α = π/2 into Eq. (71), we obtain the log-periodicity log(λ_fit)(s, α) = 0.130 ± 0.023. This allows us to estimate P = 93^{+16}_{−17}, which agrees with the desired number of P = 100 poles at the 1σ level.
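The pole-number estimate used above reduces to the quotient P ≈ 12/log(λ) together with error propagation. The snippet below reproduces the numbers for the displacement of Eq. (67); the linearized (first-order) propagation is our own simplification and only needs to agree with the quoted asymmetric interval within rounding.

```python
# Log-periodicity obtained from the fit model at s = 0.742, alpha = pi/2
log_lam, sig = 0.130, 0.023

P = 12.0 / log_lam                 # pole number in the log(R*) interval of length 12
sig_P = 12.0 * sig / log_lam ** 2  # linear error propagation: |dP/dlog_lam| * sig

# P ≈ 92.3 with an uncertainty of roughly 16, consistent with the
# quoted P = 93 (+16/-17) within rounding.
```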
The function log(λ_fit)(s, α) from Eq. (71) used to model the log-periodicity log(λ)(s, α) also reproduces the expected limits lim_{s→−∞} log(λ_fit)(s, α) = ∞, lim_{s→∞} log(λ_fit)(s, α) = 0 and lim_{α→0} log(λ_fit)(s, α) = 0, which all lie beyond the fitting regime. To what extent the obtained function is suitable for extrapolation is beyond the scope of this work and reserved for future research.

VI. DISCUSSION
In this paper we pursue an exploratory approach to identify LC potentials among a larger set of discretized, attractive and singular two-body potentials based on unsupervised feature learning. Here, the expression LC refers to two-body potentials whose coupling constant's RG flow satisfies an RG limit cycle. The coupling constants H_n(R_*, α) themselves result from a delta-shell regularization of the given potential U ∈ R^F and depend not only on a cutoff hyperradius R_* > 0, but also on a hyperangle α ∈ [0, π/2] and node index n ∈ N. In contrast to the standard formulation of the Efimov effect, a delta-shell regulator that non-trivially depends on hyperspherical coordinates via H_n(R_*, α) is required in order to match the logarithmic derivatives of the Faddeev wavefunctions ψ_{0,n}(R_*, α) for arbitrary hyperangles. The ψ_{0,n}(R_*, α), in turn, are obtained by connecting local solutions of the low-energy Faddeev equation via a generalized transfer matrix method. Training a boosted ensemble A of C = 8 convolutional VAEs A_i : R^F → R^L, i = 1, . . . , C, in an unsupervised manner to reconstruct the scaled potentials U from the training set X ⊂ R^F yields a low-dimensional representation of all relevant discretized two-body potentials and allows us to relegate the search for LC potentials from the high-dimensional feature space R^F to the lower-dimensional, effective latent space R^{C×L}. The hierarchical structure of A is inherited by the ensemble-encoded potentials, which we refer to as latent curves ζ ∈ R^{C×L}. Each latent curve can be understood as the ordered sequence of encoded contributions ζ_i of all members A_i. It turns out that for each member A_i, the encoded contributions of training potentials approximately follow a standard normal distribution in R^L. This is why latent curves themselves and, consequently, synthetic potentials are simply drawn from a multivariate standard distribution N(0, 1) in R^{C×L}.
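Since the encoded training potentials approximately follow a standard normal distribution per member, synthetic potentials can be generated by drawing latent curves from N(0, 1) and decoding them. The sketch below uses a placeholder linear decoder in place of the trained VAE ensemble; the latent dimension L and the feature dimension F are assumed, as they are not fixed in this excerpt.

```python
import numpy as np

C, L = 8, 4      # ensemble size (from the text) and latent dimension (assumed)
F = 512          # feature-space dimension (assumed)

# Fixed stand-in weights for the placeholder decoder
W = np.random.default_rng(7).standard_normal((C * L, F)) / np.sqrt(C * L)

def decode(zeta):
    """Placeholder for the ensemble decoder D: R^{C x L} -> R^F.
    The real D combines the trained decoders of all C VAE members."""
    return zeta.reshape(-1) @ W

# Synthetic potentials: decode latent curves drawn from N(0, 1) in R^{C x L}
rng = np.random.default_rng(42)
zetas = rng.standard_normal((1000, C, L))
potentials = np.stack([decode(z) for z in zetas])
```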
As a measure of LC-ness we introduce the LCLoss L_LC, the quotient of the standard deviation and the mean of the distribution Λ of all consecutive log-pole differences p_{i+1} − p_i of the coupling constant H_1(R_*, π/2), evaluated over the interval log(R_*) ∈ [−12 − log(√2), − log(√2)], i.e., at the hyperangle α = π/2 and for the node index n = 1. Leveraging the fact that the shape of Λ is invariant under the displacements s provided by the boosted ensemble S, we normalize synthetic potentials such that the pole number of their coupling constant takes values sufficiently close to P ≈ 100. Thereby, sampling errors within the calculation of the LCLoss are reduced.
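Given the log-pole positions of a coupling constant, the LCLoss is simply the coefficient of variation of their consecutive spacings. A minimal sketch of this final step follows; the pole-finding itself, which requires solving the regularized three-body problem, is not shown.

```python
import numpy as np

def lc_loss(log_poles):
    """LCLoss: standard deviation over mean of the consecutive
    log-pole differences p_{i+1} - p_i (coefficient of variation)."""
    diffs = np.diff(np.sort(np.asarray(log_poles, dtype=float)))
    return float(np.std(diffs) / np.mean(diffs))

# Exactly log-periodic coupling constant: 100 equidistant log-poles
# spaced by log(lambda) = 0.12 give a vanishing LCLoss.
equidistant = np.arange(-12.0, 0.0, 0.12)
assert lc_loss(equidistant) < 1e-10

# Jittered spacings break discrete scale invariance: the LCLoss grows.
jitter = 0.01 * np.random.default_rng(3).standard_normal(equidistant.size)
assert lc_loss(equidistant + jitter) > 1e-2
```

The choice of P ≈ 100 poles keeps the sample of spacings large enough for the standard deviation in this ratio to be estimated reliably.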
Finally, we apply an elitist GA to fifty independent populations of latent curves, drawn from a multivariate standard distribution N(0, 1) in R^{C×L}, and extract the fittest individual, that is, the latent curve with the lowest LCLoss, once the final generation is completed. Via ensemble-decoding, this distribution of fittest latent curves ζ ∈ R^{C×L} implies a distribution of scaled potentials U = D(ζ) in feature space. The fact that these scaled potentials do not fall into several clusters, but behave notably similarly, indicates that they accumulate around exactly one LC potential. Since the training set covers a wide range of different singular potentials, we can safely dismiss the existence of further LC potentials. Successfully fitting a model a log(r) + s to each of these scaled potentials suggests the inverse square potential to be the desired, unique LC potential. By evaluating the LCLoss of 1/r^p potentials with exponents close to p = 2 not only for α = π/2, but for smaller hyperangles as well, we convince ourselves that the inverse square potential indeed minimizes the LCLoss independent of the hyperangle. Thereafter, we study how the log-periodicity of the coupling constant H_1(R_*, α) depends on the displacement s and the hyperangle α. Finding a hyperexponential dependence on s, we demonstrate by example how this fit model can be used to correctly predict the number P of poles of the coupling constant.
In the context of RG limit cycles, the inverse square potential carries a special significance and has been covered extensively in the literature. Hence, at first sight it is not surprising that the result of our search is at least to some degree related to the inverse square potential. However, it is remarkable that there turns out to be one unique LC potential that, in addition, is exactly the inverse square potential. Most interestingly, the corresponding three-body system of identical bosons with the resulting pairwise inverse square interactions is the same system whose three-body spectra have already been derived in Ref. [14]. Therefore, this paper can also be understood as a supplement to Ref. [14] highlighting the considered three-body system in view of the behavior under a delta-shell regularization.
It is important to note that while investigating RG flows of coupling constants, cutoff hyperradii have only been continuously increased. This successive elimination of short-ranged degrees of freedom renders the limit cycle found for the inverse square potential an infrared limit cycle. Of course, further attention also needs to be paid to the discrete set of transition radii and the finite range of all considered two-body potentials. These not only act as ultraviolet and infrared regulators themselves, but also restrict all meaningful analyses of coupling constants to a finite hyperradial interval. In contrast to the classical definition of a limit cycle, log-periodicity of the coupling constant in this finite interval therefore already poses a sufficient criterion for being LC.
The non-convex and non-smooth loss surface topology of the LCLoss as defined in this paper suggests exploring the latent space using GAs. At this point, it is important to emphasize that the existence of an alternative LCLoss that is suitable for gradient-descent-based optimization while reliably distinguishing between LC and non-LC potentials seems plausible. However, the formulation of such an alternative LCLoss requires a deeper understanding of the fundamental mechanisms controlling the RG flow of the considered coupling constant, which are yet to be uncovered by future research.
For further investigations it would be of great interest to determine the extent to which our results can be generalized to more complex few-body systems. Here, special attention needs to be paid to whether the found LC potential again corresponds to the two-body inverse square potential and whether the found solution is unique. Should several LC potentials arise for some few-body system, it would be promising to compare the corresponding bound state spectra and RG flows with each other.
lim_{R→R_*^+} f^(i)_{0,n}(R, α) = A^(i)_{0,n}(α) √R_* J_n[k^(i)_n R_*] + B^(i)_{0,n}(α) √R_* Y_n[k^(i)_n R_*]   (C1)
Let Ω_n denote either the first-kind Bessel function J_n or the second-kind Bessel function Y_n. The hyperradial derivative of √R_* Ω_n[k^(i)_n R_*] is evaluated as follows. Simply applying the product rule yields
d/dR_* ( √R_* Ω_n[k^(i)_n R_*] ) = Ω_n[k^(i)_n R_*] / (2 √R_*) + k^(i)_n √R_* Ω_n′[k^(i)_n R_*].   (C2)
We use the recursion relation Ω_n′(x) = Ω_{n−1}(x) − (n/x) Ω_n(x) for Bessel functions to express the last summand of Eq. (C2) in terms of Ω_n and Ω_{n−1}. The resulting derivative is involved when differentiating Eq. (C1):