Anatomy of a slow merger: dissecting secularly-driven inspirals of LIGO/Virgo gravitational wave sources

The dozens of compact object mergers detected by LIGO/Virgo raise a key theoretical question: how do initially wide binaries shrink sufficiently quickly that they are able to merge via gravitational wave (GW) radiation within a Hubble time? One promising class of answers involves secular driving of binary eccentricity by some external tidal perturbation. This perturbation can arise due to the presence of a tertiary point mass, in which case the system exhibits Lidov-Kozai (LK) dynamics, or it can stem from the tidal field of the stellar cluster in which the binary orbits. While these secular tide-driven mechanisms have been studied exhaustively in the case of no GW emission, when GWs are included the dynamical behavior is still incompletely understood. In this paper we consider compact object binaries driven to merger via high eccentricity excitation by (doubly-averaged, test-particle quadrupole level) cluster tides - which includes LK-driven mergers as a special case - and include the effects of both general relativistic precession and GW emission. We provide for the first time an analytical understanding of the different evolutionary stages of the binary's semimajor axis, secular oscillation timescale, and phase space structure all the way to merger. Our results will inform future population synthesis calculations of compact object binary mergers from hierarchical triples and stellar clusters.


INTRODUCTION
The LIGO/Virgo Collaboration has now detected around 90 compact object binary mergers (Abbott et al. 2021). However, there is still ambiguity on the theoretical side about which mechanisms drive these mergers.
One much-studied candidate is the Lidov-Kozai (LK) mechanism which operates in hierarchical triple systems (Lidov 1962;Kozai 1962;Naoz 2016), in which a compact object binary is both orbited and tidally torqued by a bound tertiary perturber. Provided the inclination angle between the two orbital planes of the hierarchical triple system is sufficiently large, the binary's eccentricity e can be driven periodically towards values approaching unity on secular timescales (i.e. much longer than any of the orbital periods in the system). This greatly reduces the pericentre distance p ≡ a(1 − e), where a is the binary's semimajor axis. Gravitational wave (GW) radiation is strongly enhanced at small p, i.e. is strongly concentrated around the maximum e → e max ≈ 1, so there is a 'burst' of orbital energy loss at each peak. By energy conservation, this corresponds to a decay in binary semimajor axis by some amount ∆a. Such losses accumulate over multiple secular cycles until the binary is so compact that it effectively decouples from the tidal perturbations and undergoes a GW-dominated inspiral. This basic understanding has inspired scores of papers that aim to understand binary black hole mergers and predict their rates (e.g. Miller & Hamilton 2002;Blaes et al. 2002;Wen 2003;Thompson 2011;Antonini & Perets 2012;Antognini et al. 2014;Silsbee & Tremaine 2017;Liu & Lai 2017).
More generally, every binary that resides in a stellar cluster feels the cluster's gravitational potential. As we showed in Hamilton & Rafikov (2019b,c) -hereafter referred to as Papers I and II respectively -the cluster potential provides a tidal torque on the binary just as a tertiary perturber would. This cluster-tidal torque can drive LK-like eccentricity oscillations in binaries on astrophysically relevant timescales. One can therefore consider the cluster itself to be a ubiquitous 'third body' which has the capacity to induce 'cluster tide-driven' mergers (Hamilton & Rafikov 2019a;Bub & Petrovich 2019;Arca Sedda 2020). The primary caveat to both the LK-driven and cluster-tide driven merger mechanisms is that 1pN general relativistic (GR) apsidal precession acts to quench the oscillations and hence delay mergers. In Hamilton & Rafikov (2021) -hereafter Paper III -we provided a systematic account of how the LK-like dynamics of binaries are modified as the strength of GR precession is varied.
Guided the basic understanding outlined above, most of the aforementioned studies either opt for direct numerical integration of the binary equations of motion (e.g. Silsbee & Tremaine 2017), or they aim at a parameterization of the total merger timescale as a function of e max (e.g. Thompson 2011;Liu & Lai 2018), and very little theoretical understanding has been developed beyond this (but see Randall & Xianyu 2018). On the other hand, a detailed look at numerical integrations suggests that LK-driven and cluster tide-driven mergers are in fact very dynamically rich, even in the doubly-averaged, testparticle quadrupole limit 4 that we consider exclusively in this paper. They exhibit non-trivial time evolution of the binary's semimajor axis, maximum and minimum eccentricity, phase space location, secular period, etc. To demonstrate this, we now provide a numerical example of a LK-driven merger. Figure 1 shows the result of integrating the doublyaveraged (hereafter 'DA') test-particle quadrupole Lidov-Kozai equations of motion for a binary with constituent masses m 1 = m 2 = 10M ⊙ orbiting a supermassive black hole (SMBH) of mass M = 4 × 10 6 M ⊙ . The peri/apocentre of the binary's 'outer' orbit around the SMBH, (r p , r a ), as well as its initial 'inner' semimajor axis a, eccentricity e, inclination i and argument of pericentre ω are given in the text at the top of the figure 5 . In panels (a)-(d) we plot the time evolution of a, e, i and pericentre distance p = a(1 − e) with black lines.

Example of a LK-driven merger
As expected, the binary undergoes large-amplitude eccentricity oscillations, which drive a decay in semimajor axis and ultimately lead to merger (a → 0). Moreover, the reader will notice three color-shaded vertical stripes in each of these panels blue, yellow and green) defining three representative time intervals of the dynamical evolution. In panels (f)-(h) we plot in black the trajectory of the binary in the (ω, e) phase space (familiar from Papers II and III) during those respective segments, starting at the green dot and ending at the red dot. Panel (e) shows ǫ GR , the dimensionless measure of the strength of GR precession (Paper III). In panels (i) and (j) we show with black dots the secular oscillation timescale t sec and the decay in semimajor axis |∆a|, respectively, as functions of a, evaluated at the end of each secular oscillation. Panel (k) shows the corresponding orbital decay timescale 6 τ a ≡ |d ln a/dt| −1 . The binary evolves from right to left in these last three panels, and the blue, yellow and green shaded segments are again indicated. There are also various critical values and analytical scalings shown with colored lines throughout the Figure, which will be explained in § §3-4.
This Figure exhibits several striking features, some of which are rarely mentioned in analyses of LK-driven mergers. For instance: • Despite the fact that a is changing with time (panel (a)), for most of the evolution there exist two approximately conserved quantities, namely the minimum inclination i min and the minimum pericentre distance p min reached during each secular cyclesee the red dashed lines in panels (c) and (d).
• The timescale for secular oscillations t sec changes by several orders of magnitude throughout the evo- Naoz (2016); Grishin et al. (2018). 5 For the precise definitions of these quantities see §2. 6 We note that the secular timescale was computed by finding the time elapsed between adjacent eccentricity maxima in panel (b), while ∆a was computed by calculating the semimajor axis before (a bef ) and after (a aft ) each peak, and defining ∆a(a bef ) ≡ a aft − a bef . To draw the colored contours (which are of constant DA Hamiltonian H * , see equation (3)) in panels (f)-(h) we set Γ = 1 and took the values of Θ and ǫ GR at the midpoint of the corresponding colored stripe (see §2 for the meaning of these constants). Note also that to not overload the plots we refrain from showing explicit separatrices. lution, and exhibits a highly non-trivial dependence on semimajor axis (panel (i)). Viewed as a function of time (right to left in that panel), it first increases with time (up until the yellow shaded stripe), then is almost constant (up to the green stripe), and thereafter decreases with time, ultimately becoming much smaller than its initial value. This decrease is counter-intuitive, since naively one might expect a more tightly bound binary to exhibit slower tide-driven secular evolution.
• The binary's phase space trajectory evolves as its semimajor axis shrinks. At early times it follows a librating trajectory around the fixed point at ω = π/2 (panel (f)), then it transitions to a circulating trajectory (panel (g)), which is ultimately pushed to very high minimum eccentricity (panel (h)).
Of these three observations, the first has been discussed by e.g. Wen (2003). The second was mentioned by Randall & Xianyu (2018), although their explanation of this phenomenon was incomplete as we show in Appendix E.
The third was briefly signposted by Blaes et al. (2002) and Antonini et al. (2016) -see §4. There is very little literature that goes into quantitative detail about these features and nobody has considered their interplay (for instance, how the changing phase space structure affects the evolution of the secular timescale). A central purpose of the present paper is to explain such dynamical characteristics. It will turn out that the qualitatively different behaviors exhibited by ∆a(a), t sec (a), τ a (a) etc. map onto different GR regimes and phase space features explored in Paper III.

Plan for the rest of this paper
This paper is organised as follows. In §2 we describe briefly the setup of our system without GW emission, and establish notation. We also gather in Appendix A some results from Papers I-III which we will refer to throughout this work. In §3 we introduce the effect of GW emission, and derive two approximate conservation laws, namely conservation of the minimum pericentre distance and the minimum inclination reached by the binary. In §4 we outline how a binary evolves through phase space and different GR regimes as its semimajor axis shrinks (much more detail is given in Appendix B). This allows us to establish several asymptotic regimes in which we can make analytic progress. Using these regimes, in §4.2 we write down approximate expressions for the secular timescale t sec (a), while in §4.3 we derive expressions for ∆a(a) and τ a (a). Again, details of the derivations are relegated to Appendices C-D. In §5 we verify our results via several more numerical examples akin to Figure 1, this time for binaries in (non-Keplerian) cluster potentials. In Appendix E we compare our work with the LK calculations of Randall & Xianyu (2018). In §6 we discuss our results, including their implications for LKdriven mergers, and summarize in §7.

DYNAMICAL FRAMEWORK
Here we briefly describe our setup in order to establish notation. For more detail see Papers I-III.
Consider a binary with component masses m 1 , m 2 , orbiting in a fixed, smooth, axisymmetric background potential Φ whose symmetry axis is Z. Let (X, Y ) describe show the time evolution of a, e, i, p and ǫ GR respectively. Panels (f), (g) and (h) show the phase space evolution during the time intervals shaded in blue, yellow and green respectively in panels (a)-(e). Finally, panels (i), (j) and (k) show with black dots the values of secular period tsec, semimajor axis jump |∆a| during the eccentricity peak, and characteristic orbital decay time τa as functions of a. the plane perpendicular to Z. Then on long timescales the orbit of the binary's barycenter (hereafter the 'outer' orbit) usually fills an axisymmetric torus. However, if Φ is spherically symmetric then the outer orbit is actually confined to a plane, which we can choose to be the (X, Y ) plane. Apart from phase information the orbit in this plane can be described by its peri/apocentre (r p , r a ). In this case, on timescales much longer than an outer orbital period the binary will fill an annulus with inner and outer radii (r p , r a ). In the special case of Keplerian Φ (the LK limit), the outer orbit describes not an annulus but a fixed ellipse with semimajor axis a g = (r a + r p )/2 and eccentricity e g = (r a − r p )/(r a + r p ).
The binary's internal ('inner') Keplerian orbital motion is described by the usual orbital elements: semimajor axis a, eccentricity e, inclination i (relative to the (X, Y ) plane), longitude of the ascending node Ω (relative to the X axis, which is arbitrary but fixed in the cluster frame), argument of pericentre ω and mean anomaly η. It is also useful to introduce Delaunay actions L = G(m 1 + m 2 )a, J = L √ 1 − e 2 , and J z = J cos i, and their conjugate angles η, ω and Ω, as well as the dimensionless variables Clearly, j must obey Θ 1/2 ≤ j ≤ 1 to be physically meaningful for a given Θ. The minimum/maximum j achieved in a given secular cycle is called j min/max . Ignoring GW emission, the evolution of the inner orbit is dictated by the mutual Newtonian gravitational attraction of the binary components, 1pN GR apsidal precession, and the perturbing tidal influence of the potential Φ. Expanding the tidal force due to the cluster to quadrupole order, and averaging over the inner and outer orbital motion (i.e. performing a weighted integral over the torus, annulus or ellipse mentioned above) we find the test-particle quadrupole doubly-averaged ('DA') equations of motion, given explicitly in equations (12)-(14) of Paper III. More succinctly, these can be derived from the DA Hamiltonian where A is a constant (with units of s −2 ) that depends on the potential and outer orbit. In the LK case, A = GM/[2a 3 g (1 − e 2 g ) 3/2 ], where a g and e g are respectively the semimajor axis and eccentricity of the outer orbit and M is the perturber mass. Next, are dimensionless Hamiltonians encoding the effects of cluster tides and GR precession respectively. The quantity Γ is a scalar parameter which, like A, depends on the cluster potential Φ and the choice of outer orbit. It takes values ∈ (0, 1) for binaries in spherically symmetric potentials Φ, and LK theory is recovered in the limit Γ = 1. Due to a dynamical bifurcation, it turns out that very high eccentricities are much more readily achieved by binaries with Γ > 1/5 than those with Γ < 1/5 (Paper II). Finally, the parameter ǫ GR measures the strength of GR precession: In the numerical estimate (7) we have assumed a spherical cluster of mass M and scale radius b, and A * ≡ A/(GM/b 3 ) -see Paper I 7 . The physical effect of GR precession is typically to quench the cluster tide-driven eccentricity oscillations, as we explored in detail in Paper III, and as has been long established in LK theory (Miller & Hamilton 2002;Fabrycky & Tremaine 2007;Bode & Wegg 2014). As we have shown in Paper III, there are typically no large e oscillations in the 'strong GR' regime ǫ GR ǫ strong ≡ 3(1+5Γ). On the other hand one can ignore GR precession in the 'weak GR' regime ǫ GR ≪ ǫ weak (see equation (A8)). In the intermediate regime of 'moderate GR', ǫ weak ǫ GR ≪ ǫ strong , high eccentricity excitation is modified but is not prohibited.
In Papers I-III we have gone into great detail about the phase space dynamics that follows from the Hamiltonian H, shown how phase space trajectories are split into librating and circulating families, and so on. We will not repeat the arguments here. However, it will be important throughout that we have expressions for various key quantities characterising secular evolution in terms of Γ and the three dimensionless numbers Θ, ǫ GR , j min , which are constants of motion when GW emission is ignored. It will also be important to split circulating phase space trajectories into two asymptotic regimes, characterised by j max ∼ 1 ('high-j max ') and j max ≪ 1 ('low-j max '), which is a distinction we did not make in previous papers. We gather all the relevant results in Appendix A, and draw upon them freely hereafter.

WAVE EMISSION
Gravitational wave emission modifies the conservative dynamical picture described in §2, allowing compact object binaries to merge. Throughout this paper we concentrate on those binaries whose merger timescale is significantly shortened by secular eccentricity excitationthe so-called 'cluster tide-driven mergers' (which includes LK-driven mergers as a special case). Following Randall & Xianyu (2018) we can separate these binaries further into 'fast mergers' and 'slow mergers'. Fast mergers are those that occur after only one (or at most a few) secular eccentricity cycles. Slow mergers occur after many secular eccentricity cycles, and inevitably involve a gradual transition of the binary from the weak-to-moderate GR regime to the strong GR regime (with e max and other important quantities changing over time) -see Figure 1. In this paper we will focus on understanding the physics of slow mergers.
We will assume throughout this paper that Γ > 0, and that the binary's maximum eccentricity e max is achieved at ω = ±π/2. As we have seen in Papers I-III these conditions cover almost all cases of practical interest (at least in spherical clusters), and are always satisfied in the important special case of LK dynamics (Γ = 1).

Slow mergers
Consider a binary with initial eccentricity not close to unity, and suppose that, unless excited to high e max ≈ 1, it will not merge within a Hubble time. For the required cluster tide-driven eccentricity excitation to be possible, we know from Paper III that the binary must begin its life in the weak-to-moderate GR regime. Supposing this is the case, and that the binary does indeed achieve high values of e periodically, then during each high-eccentricity episode its semimajor axis a will be decreased by some amount |∆a| because of GW emission. For a slow merger, by assumption, each individual decrease is small, |∆a| ≪ a, though of course ∆a itself depends on the value of a (see Figure 1j). Away from e ≈ 1, GW emission will be completely negligible, so we can treat a as constant there. Next, the time between these high-eccentricity episodes is t sec (a). Therefore on timescales longer than a few secular periods we can approximate the slow decay of a(t) as Equation (8) is an implicit equation for a(t) for slowmerging binaries. We can use it to define a characteristic orbital decay timescale at a given a: This is the quantity we plotted in Figure 1k. Eventually, a will become small enough that the binary reaches the strong GR regime and gets 'trapped' at high eccentricity. When this happens, equation (8) breaks down and we must use a different prescription to follow a(t) accurately all the way to merger.

Equations of motion
The main goals of §4 will be to understand the behavior of ∆a(a), t sec (a) and τ a (a) during a slow merger, and to appreciate how this behavior is intimately linked with the binary's phase space trajectory (librating or circulating) and the strength of GR precession (value of ǫ GR ). To achieve these goals we must first consider how GW emission affects our equations of motion.
Our DA theory without GW emission consists of equations (12)-(14) of Paper III, which govern the evolution of ω, j and Ω respectively under the combined effect of secular cluster tides and GR precession. In addition to this, to 2.5th post-Newtonian order GW emission causes the binary's semimajor axis and eccentricity to evolve according to the Peters (1964) To include GW emission in our theory we therefore add the following terms to our equations of motion for L, j and j z ≡ j cos i respectively: We see that GW emission affects both L and j z (and hence also Θ ≡ j 2 z ), which were constants when GW emission was ignored. Note also that at this order, GW emission does not directly affect the argument of pericentre ω or longitude of ascending node Ω (so equations (12) and (14) of Paper III are unchanged), nor does it affect inclination i.

Conserved quantities
When GW emission is switched off, the DA dynamics respects three exact conservation laws. The first is the conservation of a, which results from the 'adiabatic' assumption that the binary's inner orbital period is much shorter than timescale of variation of the weak cluster perturbation (i.e. the outer orbital timescale), allowing us to average over the inner orbit ('single averaging'). The second is the conservation of the dimensionless Hamiltonian H * (equation (3)), which follows from the fact that cluster tides are sufficiently weak that we can also average over the outer orbital period and thus treat the perturbation as time-independent ('double averaging'). The third is conservation of the z component of the binary's angular momentum, or equivalently Θ, which follows from the axisymmetry of the DA timeaveraged perturbation as viewed from the binary frame. Now that we are including GW emission, the binary's binding energy and inner orbital angular momentum will be dissipated according to equations (10)-(14), and so none of a, H * or Θ will be strictly conserved. On the other hand, it is clear from equations (10) and (11) that for a fixed semimajor axis a, changes in orbital el-ements due to GW emission are very strongly concentrated around peak eccentricity e → 1, as we already anticipated. Thus, in the weak-to-moderate GR regime we expect these GW contributions to the equations of motion to be completely negligible except in the vicinity of e ≈ 1. This means that binding energy and angular momentum can be treated as roughly conserved away from eccentricity peaks, since in the weak-to-moderate GR regime GW emission is negligible except for short bursts around e ≈ e max .
Additionally, as we saw in Figure 1, two new (approximate) conservation laws emerge which are valid on much longer timescales (≫ t sec ) -these are the conservation of the minimum pericentre distance p min ≡ a(1 − e max ), and the conservation of the minimum inclination i min . The fact that these conservation laws hold almost all the way to merger will facilitate our analytical understanding. We now derive each of them in turn.

Conservation of pmin
Approximate conservation of p min follows from the fact that, as e → 1, GW emission dissipates orbital energy of the binary much more efficiently than its angular momentum 9 . As mentioned above, because of the steep dependence on 1 − e 2 in (10)-(11), GW emission is most effective at changing a, e (and p) only at the eccentricity 'peak', when e → 1 (which is the case only during a small fraction ∼ j min ≪ 1 of each secular period), and can be neglected during the rest of the secular cycle. Thus, during each eccentricity peak GW emission causes changes of the binary orbital elements predominantly over the short time intervals δt ≪ T b during the periastron passages (where T b = 2π a 3 /[G(m 1 + m 2 )] is the inner orbital period of the binary).
Let the characteristic relative velocity of the binary components at periastron be v p ∼ G(m 1 + m 2 )/p, which follows from energy conservation and the fact that p = a(1 − e) ≪ a. Representing the effect of the GW radiation reaction as an impulsive force F acting over time δt, we can estimate the change in the binary orbital energy E over each periastron passage to be δE ∼ F v p δt. Thus, the characteristic timescale t E on which E evolves is Similarly, during each periastron passage GW emission changes the binary angular momentum J ∼ pv p by δJ ∼ F p δt, so that the characteristic time on which J evolves is The ratio of these two timescales is Thus for e ≈ e max → 1, we have t E /t J ∼ 1 − e max ≪ 1. In other words, when the binary is near peak eccentricity, 9 The following argument is not unique to the GW emission and can be generalized for any short-range dissipative force, e.g. due to the fluid tides acting inside the binary components.
its energy is dissipated much more rapidly than its angular momentum, so that one can assume that J ≈ J min is almost constant even though E (and a) evolves substantially. Since J min ∝ [a(1 − e 2 max )] 1/2 ≈ (2p min ) 1/2 for high e max , this implies that the minimum periastron distance p min does not change appreciably as a result of GW emission over a single eccentricity peak (Wen 2003, §3.1). And since the system undergoes quasi-periodic secular oscillations, the binary returns to the same value of J min (and hence the same p min ) at the e-peak of the following secular cycle.
To be more precise, let us consider the rate of change of the pericentre distance with respect to semimajor axis: Dividing (11) by (10) to get de/da, plugging this in to the right hand side of (18) and expanding near e = 1 gives In other words, the rate of change of pericentre distance p vanishes in the limit e → 1, so that p = p min is constant at the eccentricity peak. Since each individual secular cycle is symmetric (in time, relative to its eccentricity minimum), p min would then take the same value at the next eccentricity peak, be preserved there, and so on, just as we observed in Figure 1d. Thus we arrive at the conservation of the minimum pericentre distance: which holds true over multiple secular cycles as long as e max → 1. Equation (20) implies a simple scaling for j min in the weak-to-moderate GR regime: where p min = a(t i ) × (1 − e(t i )) at some reference time t i . This is an important result of this paper and will subsequently allow significant analytical simplification. We note that an argument similar to the one above also applies when the binary gets trapped at high eccentricity in the strong GR regime. In that case there are no more secular oscillations (the binary having decoupled from cluster tides), but energy is still being dissipated efficiently by GW emission while angular momentum is not (for more details see Wen (2003) and Antognini et al. (2014)). As a result p (rather than just p min ) stays approximately constant, so that j ∝ a −1/2 . We will use this scaling when considering the strong GR regime in §4.3.

Conservation of imin
The conservation of i min -the minimum value of the binary inclination which is reached at the eccentricity peak -follows from the fact that the GW emission does not affect the orientation of the orbital plane of the binary and thus does not affect its inclination. Again, because of the time-symmetry of each secular cycle, at the next eccentricity peak the binary will arrive at the same value of i min (which, again, will not be changed by the GW emission, regardless of the decay in a), and so on. As a result, it follows that over time intervals much longer than each secular cycle This is precisely what we saw in Figure 1c.
For the remainder of this paper we will take p min and i min as our two primary, a−independent constants of motion which persist over multiple secular cycles in the weak-to-moderate GR regime. We can then re-write the key secular evolution parameter Θ in terms only of a and these conserved quantities as which will greatly simplify our analytical understanding. We will discuss the circumstances in which the conservation of p min and i min breaks down, invalidating equation (23), in our detailed discussion of a numerical example in §5.2.

EVOLUTION OF A SHRINKING BINARY THROUGH TIME AND PHASE SPACE
The binary in Figure 1 started its life in the weak GR regime (see panel (e)). Then as its semimajor axis shrank it entered the moderate GR regime and finally ended up in the strong GR regime before merging. On a related note, we also saw that the binary's phase space trajectory evolved from librating (panel (f)), to circulating with e min ≪ 1 (panel (g); we call this a 'high-j max ' circulating trajectory), to circulating with e min ≈ 1 (panel (h); we call this a 'low-j max ' circulating trajectory). This pattern of behavior is rather general, and in fact has been discussed briefly in the case of LK-driven mergers by Blaes et al. (2002) and Antonini et al. (2016).
Physically, this evolution of phase space trajectory follows from the way GR precession modifies the phase space structure, by encouraging rapid pericentre precession at high e and hence expanding the region of circulating trajectories at the expense of librating trajectories. Hence, one can roughly think of increasing ǫ GR as pushing the separatrix 'down' to lower eccentricity in the (ω, e) plane. Since the increases in ǫ GR occur only when the binary is at e ≈ e max (because this is where GW emission, and hence the decay of a, is concentrated), these downward shifts of the separatrix coincide with the binary's highest eccentricity. In this way a binary on a librating trajectory inevitably moves 'towards' the separatrix from below (or rather the separatrix moves closer to it from above) and ultimately ends up crossing the separatrix onto a circulating trajectory. Eventually the binary gets trapped on a low-j max circulating trajectory in the strong GR regime, where asymptotically there are no eccentricity oscillations at all.
In this section we wish to understand more quantitatively how the binary's phase space trajectory and GR regime evolves as a function of semimajor axis a. The details of the various transitions between these regimes are quite technical, so we relegate most of our discussion to Appendix B and retain here only the salient points.

Characteristic lengthscales
As shown in Appendix B, the orbital evolution of a binary en route to merger admits a key lengthscale which is independent of a. There are then four critical semimajor axis values to contend with.
A binary that starts its life in the weak GR regime will inevitably move into the moderate GR regime at some point as its semimajor axis shrinks. Thus the first critical value is a weak , which we define to be the semimajor axis corresponding to ǫ GR = ǫ weak , i.e. to the transition between the weak and moderate GR regimes. Using (B4) and the fact that ǫ weak ≪ ǫ GR , this is Note that a weak ∼ d, unless cos i min ≪ 1.
Next we define a strong , which demarcates the inevitable transition between moderate and strong GR regimes, i.e. it corresponds to ǫ GR = ǫ strong ≡ 3(1 + 5Γ). Using (6) we get After entering the strong GR regime the binary gets trapped at high eccentricity, and its semimajor axis decays while p = a(1 − e) remains roughly constant. See §4.3.3 for more details. A binary initially on a librating phase space trajectory will transition to a circulating trajectory once a drops below some threshold value, as we saw in Figure 1. This threshold value is a sep , which is given by 10 Typically a sep d. Of course, a sep has physical meaning only if cos 2 i min < (1 + 5Γ)/10Γ. This is because for cos 2 i min > (1 + 5Γ)/10Γ, the binary is already on a circulating trajectory even for a ≫ d, so it never crosses a separatrix on its way to a → 0.
There is one further critical semimajor axis value, which we call 10 To derive this we set j 2 0 = j 2 + = 1 in equations (B1), (B3)see Papers II-III.
At a = a div the dimensionless numbers σ and κ, which play a role in setting time spent at highest eccentricity (see Paper III), diverge. This divergence will become important when we discuss the evolution of the secular timescale ( §4.2). Moreover, if a sep > a div , then a = a div corresponds approximately to the transition between high-j max (a div < a < a sep ) and low-j max (a < a div ) circulating trajectories -see Appendix B for details. For future reference we write down the ratio: To summarize, we have defined four critical semimajor axis values a weak , a strong , a sep , a div . The weak GR regime corresponds to a > a weak , while the moderate GR regime corresponds to a strong < a < a weak . We emphasize that we have purposely written e.g. a > a weak rather than a ≫ a weak here: it turns out that different dynamical regimes are not very well separated in semimajor axes (in fact we usually have a weak ∼ a sep ∼ a div ∼ d), despite being well separated in ǫ GR -see e.g. Figure 1. In Appendix B we show in more detail how the binary passes through these different regimes as a shrinks for several different values of Γ > 0. The results are quite complex: transitions between different regimes do not always happen in the same order, and the Γ > 1/5 case has to be considered separately from 0 < Γ ≤ 1/5. Nevertheless, the asymptotic regimes defined here will allow us to make analytical progress, and will give us a qualitative understanding of the behavior of t sec , ∆a and τ a throughout slow mergers, which is what we turn to next.

The secular timescale
One crucial quantity in any study of LK-driven or cluster tide-driven mergers is the period of secular eccentricity oscillations, t sec , since this gives the time elapsed between each episode of GW emission. In Figure 1i we plotted t sec as a function of semimajor axis a for a binary undergoing LK oscillations as it shrank and ultimately merged (time runs from right to left in that panel). We labelled four regimes of the t sec (a) curve, A, B, C, D, and each regime exhibited a different a-dependence. In this section we will explain the behavior in each of these regimes in turn.
In DA theory without GW emission, the definition of t sec is with dj/dt given in (A1). The right hand side of (32) of course depends on a, which decays throughout a slow merger.

Regime A: Librating trajectories, a > asep
Regime A in Figure 1i corresponds approximately to a > a sep , i.e. to librating phase space trajectories (see Figure 1f and §4.1). Binaries on librating phase space trajectories spend most of their time far away from e ≈ 1 (see Figure 1b). In the weak-to-moderate GR regime this means that the explicitly ǫ GR -dependent terms in (A1) are unimportant for most of the evolution, and so in this regime a good approximation to the secular period is found by evaluating (32) ignoring the explicitly ǫ GR -dependent terms. Technically speaking, such an approximation becomes exact in the asymptotic limit ǫ GR ≪ ǫ weak and σ → 0 (equation (B5)) -see Paper III. Moreover, for librating trajectories we know that , and assuming j 2 min ≪ j 2 max , we find (see eqations (32)-(34) of Paper II): where We emphasize that GR is still implicitly present in equation (33) because it affects the values of j + and ∆ that must be plugged into the right hand side.
We cannot simplify the expression (33) further without specifying the ordering of j 2 0 , j 2 ± , which itself depends on Γ. However, far from the separatrix between librating and circulating trajectories (i.e. a sufficiently greater than a sep ) we expect j + , ∆ to depend only weakly on a (Figures 10-11), so that This scaling is confirmed in regime A of Figure 1i with a black dashed line. Physically it arises because the binary's inner orbital period is proportional to a 3/2 . It has been noted by many authors when estimating a LKdriven merger timescale (e.g. Wen 2003; Thompson 2011;Randall & Xianyu 2018), although they did not tie it to the librating nature of the phase space trajectory. When the binary's phase space trajectory gets close to the separatrix a → a sep we know (Appendix A) that j 2 + , j 2 0 → 1 and so j max / √ ∆ → 1. Figure 2 shows that K(x) diverges for x → 1, so t sec should peak around this point. This again matches what we see 11 in regime A of Figure 1i.

Regime B:
Circulating trajectories, a div < a < asep After crossing the separatrix to a < a sep , the binary ends up on a circulating phase space trajectory. One can still use the results for t sec obtained in Paper II (equations (32)-(34)), but for circulating trajectories the scaling (34) we found in regime A no longer holds. The behavior of t sec (a) becomes significantly more complicated, and t sec does not necessarily increase with decreasing a, as we will see.
The argument we used in deriving (33) relied on e min being much smaller than unity, and the approximation σ ≪ 1. If these approximations are good (which in particular now requires high-j max circulation, i.e. a div < a < a sep , see Figure 10) then we can say that we are in regime B. This time we have j max = j 0 (equation (A14)) and j 2 + > 1 (Figures 10-11), so that ∆ ≈ j + and instead of (33), equations (32)-(34) of Paper II give Plugging equations (B1), (B3) into (35) gives us an explicit expression for t B sec (a) which we plot with a red dashed line in Figure 1i. We see that t B sec (a) diverges as we take a → a sep from below, as expected. But away from the divergence we expect the elliptic integral K to scale weakly with a, and we get (using equation (B1)) Thus, t B sec can either increase or decrease with a, depending on the value of (a/d) 7/2 cos 2 i min .

Regime C: Circulating trajectories, a a div
We see from Figure 1i that equation (35) becomes inaccurate once a a div . This is expected because σ → ∞ as a → a div (Figure 10), implying that the non-GR approximation for t sec derived in Paper II (which we used for regimes A, B) becomes invalid. Instead, we now need to use equations (32), (A1) to evaluate t sec . Figure 1i shows that t sec very rapidly diminishes as a decreases below a div , which coincides with the rapid increase of e min or, equivalently, rapid decrease of j max , which is now in the low-j max limit. This is because for high-j max circulating trajectories, a smaller a leads to a smaller j max , which in turn means that the binary spends more time at 'high' eccentricities (say with e above 0.9). The cluster tide-driven secular evolution is faster at high e than at e ∼ 0 because, even though the torque on a binary with, say, e = 0.1 is comparable to that on a binary with e = 0.9, the angular momentum of the latter is significantly smaller, so the relative change in angular momentum occurs over a much shorter timescale. For that reason t sec decreases with decreasing a, very rapidly (in a non-power law fashion) for a a div . We call this interval of rapid t sec decay regime C.
In Appendix C we use equations (32), (A1) to derive an approximation (C3) for t sec valid in regime C. Depending on the relationship between a div and a weak , there are two possibilities. For a weak ≪ a a div (as in Figure 1) On the other hand, for a a div ≪ a weak (as in Figure 5) we find Both of these expressions show a rapid decay of t sec as a drops even slightly below a div because of the a-dependent term in the denominator. However, this term rapidly becomes constant as a decreases further, switching again to a power-law behavior of t sec (a), which we cover next.
As a becomes substantially smaller than both a div and a weak , and the binary is in the moderate-GR regime, GR precession plays an even more important role in determining t sec . We call this situation regime D. In this case we can still use equation (C3) to find in the appropriate limit that The predicted scaling t D sec ∝ a 3/2 matches what we observed at the low-a end of Figure 1i. Note that this expression is valid regardless of the relationship between a div and a weak . Again the secular timescale decreases (rather than increasing like one would naively expect) as the semimajor axis shrinks, although not as rapidly as in regime C.

The evolution of semimajor axis
In this section we aim to understand how the semimajor axis of a binary decays with time in certain asymptotic regimes. To achieve this we first write down expressions for the decay in semimajor axis over one secular cycle, ∆a, in terms of a. This allows us to understand the behavior we saw in panel (j) of Figure 1. Then we plug our expressions for t sec (a) and ∆a(a) into the right hand side of (8) to calculate (very approximate) expressions for τ a .
To begin, we integrate equation (10) over one secular cycle, approximating a as constant to lowest order (which is valid since |∆a| ≪ a by assumption for a slow merger). The result is 12 12 Note that equation (40) is essentially identical to the first line in equation (55) where λ 0 ≡ (64/5)G 3 c −5 m 1 m 2 (m 1 + m 2 ) is independent of a. Assuming the binary reaches very high maximum eccentricity e max → 1, we show in Appendix D that we can ultimately approximate this as (equation (D4)): where is independent of a, and ξ is a complicated function of a and other parameters -see equation (D5).
To simplify the expressions for ξ, j + and j 0 in equation (41) we need to specify the strength of GR precession. The asymptotic regimes of interest for evaluating ∆a are therefore the weak, moderate and strong GR regimes. Unfortunately these do not map precisely onto the phase space regimes A-D that we used to understand t sec (a) behavior in §4.2. Nevertheless, there are situations where a binary sits in, for instance, both the weak GR regime and phase space regime A, and in such situations clean analytic results for τ a are possible, as we will see.

Weak GR
In Appendix D we show that for weak GR (a a weak ), From Figures 10 and 11 (or by inspection of equation (B5)) we know that for weak GR we normally have σ 1; thus ξ will also be O(1) and scale weakly with a in the weak GR regime. Since j + and j 0 also scale weakly with a in this regime (see equations (B1), (B3)), we find from (41) that This matches what we saw at the high-a end of Figure 1j. Physically, the scaling (43) just reflects the fact that the time spent in the high eccentricity state is proportional to the secular timescale, and that in the weak GR regime t sec ∝ a −3/2 (equation (34)). Indeed, one might have guessed the result (43) a priori by noting from (10) We can now evaluate the characteristic decay timescale τ a if we assume not only that the binary is in the weak GR regime but also that it is on a librating phase space trajectory (i.e. that we are in regime A). In this case we can plug equations (33) and (41), (42) into (9) to find where is dimensionless and typically O(1) away from separatri-  (49), which is valid for low-jmax circulating trajectories in the moderate GR regime. Note that the horizontal axis is logarithmic.
ces, and we defined a 'decay rate' The scaling τ a ∝ a predicted by equation (44) is exactly what we see at the high-a end in Figure 1k.

Moderate GR
In the moderate GR regime we can get a scaling for ∆a if we assume the (d/a) 7/2 terms dominate in equations (B1), (B3). Then from (41): We can simplify the expression for ξ (equation (D5)) if we further assume the binary is on a low-j max circulating trajectory (which is inevitably true at some point before the strong GR regime is reached). In this case In Figure 3 we plot ξ as a function of |κ| according to (49); in particular we see that ξ ∼ 1 except for very large |κ| ≫ 1. Also, away from a ≈ a div (i.e. in regime D) we know κ is almost never large compared to unity, and scales weakly with a ( Figures 10-11), so we can treat ξ as an order-unity constant for a rough analysis. As a result, (48) predicts a scaling ∆a ∝ a 2 .
We see from (50) that, unlike for weak GR (equation (43)), here the individual decrements in semimajor axis ∆a get smaller as the semimajor axis a shrinks. The scaling (50) matches what we saw in Figure 1j towards the low-a end. One can understand the result (50) qualitatively as follows. Like for weak GR, at very high e we again have roughly (da/dt) GW ∝ a −3 j −7 min , and again using (21) this is ∝ a 1/2 . This time, since the binary spends a large fraction of its secular period in the vicinity of j min , we get a rough estimate of ∆a by multiplying (da/dt) GW not by t min , but by t sec . Using the scaling (39) we get ∆a ∝ a 1/2 × a 3/2 ∝ a 2 . Loosely speaking, the factor of ξ in equation (48) accounts for the fraction of time that the binary actually spends in the vicinity of j min during each secular cycle.
We can also compute the decay timescale τ a in the moderate GR, low-j max circulating regime. To do so we plug (39) and (41) into (9) to find 13 where which is typically O(1). The scaling τ a ∝ a 1/2 is confirmed at the low-a end of Figure 1k. The moderate GR decay time (51) is shorter than the analogous weak GR result decay time (44) by a factor ∼ 2p min /a ∼ j min ≪ 1.

Strong GR
Once the strong GR regime is reached, equation (8) ceases to be valid because the binary decouples from cluster tides and so no longer undergoes secular eccentricity oscillations (Paper III). The evolution of a, e is then dictated purely by equations (10), (11). Supposing the transition to the strong GR regime happens at some reference time t i , we know from §3.3 that for t > t i the binary conserves its value of p = a(1 − e) = p min , meaning j = (2p min ) 1/2 a −1/2 (although p min can be a factor of ∼ 2 larger than its initial value p min (t = 0) -see Figure  7d and the final paragraph of §5.2.). With this we can eliminate eccentricity from equation (10), resulting in We now use the definition τ a ≡ |d ln a/dt| −1 (equation (9)) to calculate τ a directly from (53) as Thus the only difference between the characteristic decay timescale in the moderate GR regime (equation (51)) and that during the strong GR regime (equation (54)) is a factor of order unity which depends very weakly on a. This explains why the behavior of τ a ∝ a 1/2 is barely modified in Figure 1k once the binary enters the strong GR regime (a 10 0.45 AU).
13 Note that we do not take ∆a from (48): instead we used the more general equation (41), which allows us to take advantage of the cancellation of the factors |j + j 0 | without having to assume the dominance of the (d/a) 7/2 terms in j 2 + , j 2 0 . Similarly, ξ as given in equation (49) does not rely on this assumption.

NUMERICAL EXAMPLES
In this section we will provide further numerical examples akin to Figure 1, in particular for binaries moving in non-Keplerian potentials. Our aim is to verify and elucidate the approximate analytical results derived in § §3-4. We do this by direct numerical integration of the DA equations of motion, including both GR precession and GW emission, for various binaries that undergo slow mergers. First we give two Examples with Γ > 1/5 ( § §5.1-5.2), the first of which exhibits all the hallmark behavior of a slow merger beginning in the weak GR regime, and the second of which allows us to focus on the late-stage (moderate and strong GR) evolution. We then provide one further Example, this time for a binary with 0 < Γ ≤ 1/5 ( §5.3). Note that we also provide one additional numerical example in the LK limit in Appendix E, when comparing our work with that of Randall & Xianyu (2018).
To be clear, we note that we ran many more numerical experiments of slow mergers than those shown here. We have chosen to present here the minimal number of examples that still capture qualitatively all the possible interesting evolutionary scenarios. (There are of course non-interesting cases, such as binaries that are so tightly bound the cluster essentially plays no role in their evolution, but we do not include them here). 5.1. Example 2: Γ = 0.42 > 1/5. An initially librating trajectory in the weak GR regime In Figure 4 we show the result of integrating the DA equations of motion for a binary that orbits the spherical Hernquist potential Φ(r) = −GM/(b + r), where M = 10 6 M ⊙ and b = 1pc. The outer orbital peri/apocentre is chosen to be (r p /b, r a /b) = (1.5, 1.7). The resulting Γ value is 0.42 > 1/5. The figure is set up in precisely the same way as Figure 1.
Let us first focus on the initial ∼ 3000 Myr. From panel (a) we see that at t = 0 the binary has a 0 = 250 AU, and that for the first ∼ 3000 Myr, a(t) exceeds significantly each of the four critical values a weak , a sep , a div , a strong , which were defined in §4.1 and which we show with horizontal blue lines (see legend). It follows that during this time, the binary resides in the weak GR regime: and indeed, we see from panel (e) that ǫ GR is initially far smaller than ǫ weak (blue horizontal dotted line). Moreover, panel (b) shows that the binary undergoes the expected secular eccentricity oscillations (initially on a timescale of ∼ 30Myr), and reaches a very high maximum eccentricity of 1 − e max ≈ 10 −5 . Concomitantly there are secular oscillations in inclination i (panel (c)) and pericentre distance p (panel (d)), though as predicted in §3.3 the values of cos i min and p min reached at the peak of each secular eccentricity cycle are very nearly conserved (see the dashed red horizontal lines in these panels). Similarly, from panel (b) we see that the maximum eccentricity of the binary is well described by j 2 min = 2p min /a (equation (21)), while its minimum is well described by j max = j + . The latter fact implies that the binary is initially on a librating phase space trajectory (equation (A12)), and this is confirmed by panel (f), in which we show the phase space evolution during the time interval denoted by the blue shaded stripe.
So, we have a binary on a librating trajectory (regime A) in the weak GR regime, whose semimajor axis is slowly decaying with time. From panels (i), (j) and (k) we see that the binary obeys all the expected scalings for t sec , ∆a and τ a in this regime, namely equations (33), (43) and (44) respectively. As we know from §4 there are two key things that happen next to such a binary: one is that it enters the moderate GR regime, and the other is that its (ω, e) phase space trajectory crosses the separatrix and becomes circulating (ultimately a low-j max circulating trajectory). We also know from Figure 10 that these two occurrences can happen in any order. In this particular case the binary crosses the separatrix first: we see from panel (a) that a crosses a sep around t = 4800 Myr, and from panel (b) that around this time the minimum eccentricity gets very close to zero and then starts to increase and ceases to be well described by j max = j + . This inference is confirmed in panel (g), in which we see explicitly the evolution from libration to circulation that occurs during the time interval denoted by the yellow shaded stripe. Panel (i) confirms the expected scalings of t sec , ∆a and τ a as the binary moves through regime B.
At around t = 7100 Myr, the binary's dynamical evolution changes dramatically: the semimajor axis a approaches a div and its decay accelerates, whereas the secular timescale becomes very short (decaying in the nonpower law fashion expected of regime C -see §4.2.3). Furthermore we see that j 2 min = 2p min /a is still a good approximation for the maximum eccentricity and, though we do not show it here, the minimum eccentricity at this stage is fairly well-described by j max = −σj − (we defer a more careful, 'zoomed in' discussion of the late stages of a slow merger to §5.2). These characteristics are the hallmark of low-j max circulating trajectories in the moderate GR regime. To confirm this, we look at panel (h), which shows the phase space evolution during the green striped time interval. We see clearly that the binary rapidly evolves towards a purely high-eccentricity circulating trajectory (into regime D), whereafter it soon enters the strong GR regime and then merges. 5.2. Example 3: Γ = 0.42 > 1/5. A binary initially in the moderate GR regime In Figure 5 we show the evolution of a lower mass binary (m 1 = m 2 = 1.4M ⊙ ), but on the same outer orbit in the same cluster as in Example 2; thus we again have Γ = 0.42. We choose different initial conditions for the inner orbit, in particular a 0 = 49 AU. We see from panel (a) that because of this choice, a < a weak initially, putting the binary just inside the moderate GR regime (see also panel (e)). On the other hand, initially a > a sep so the phase space trajectory librates. This is different to Example 1 since in that case, by the time the binary reached the moderate GR regime, its phase space trajectory was already circulating. Thus with Example 3 we will not only be able to focus on the 'late-time' behavior of a slow merger, i.e. its evolution through the moderate and strong GR regimes (as promised in §5.1), but also to see a phase space transition from librating to circulating within the moderate GR regime.
From panel (b) we see that while the binary is on a librating phase space trajectory (t 4500 Myr) its maximum eccentricity is well described by j max = j + and its secular period increases with time. Once it enters the circulating region its secular period begins to decrease with time. Panel (i) shows that the t sec (a) behavior is again split cleanly into four regimes A − D. Panels (j) and (k) show that the binary also exhibits the expected asymptotic behavior for ∆a(a) and t sec (a) at small semimajor axes (i.e. in regime D). This is unsurprising since we know from panel (g) that the binary has reached lowj max circulation by this stage. On the other hand, the large-a scalings |∆a| ∝ a −3/2 (equation (43)) and τ a ∝ a (equation (44)) are never cleanly realized, because the binary does not begin its life in the weak GR regime.
Let us now turn to Figure 6, in which we zoom in on panels (a)-(d) of Figure 5, focusing on t 6000 Myr. In panel (b) we no longer show the j max = j + solid red line, since we know that for this time range the binary is certainly on a circulating trajectory. However, we have added a green dashed line that shows the minimum eccentricity that would be obtained if the binary was on a high-j max circulating trajectory, i.e. with j max = j 0 (equation (A14)). We have also added a blue dashed line showing the low-j max solution j max = −σj − (equation (A15)). We see that until around 8000 Myr the evolution is best described as a high-j max circulating trajectory with j max = j 0 . There is then a transitional stage around t ≈ 9000 Myr wherein neither high-j max nor low-j max is a good description (this corresponds to j 2 0 approaching and then crossing zero from above in Figures 9 and 10). After t ≈ 9400 Myr the evolution is quite well-described as a low-j max circulating trajectory, j max = −σj − . On the other hand there is small systematic error in this prediction, which we will explain momentarily.
We finish our discussion of Example 3 by zooming in on the very latest stage of the evolution, t 9400 Myr, which we plot in Figure 7. It is clear from this figure that at these late times the conservation of p min (panel (d)) begins to fail, evolving from its value on the red dashed line (p min (t = 0) ≈ 10 −3.11 AU) towards a slightly larger value. The reason for the evolution of p min is that by this stage e min has got so large that one cannot think of GW emission as being confined to just the peak-eccentricity portion of a secular cycle and negligible elsewhere. Instead there is a non-negligible amount of emission throughout the whole cycle, so a decays continuously rather than in a step-like fashion, as can be confirmed by zooming in on panel (a). As a result, the binary's semimajor axis upon entering the peak of its next secular cycle is slightly smaller than when it left the peak of the previous cycle, so ǫ GR is slightly larger and hence the binary achieves a larger value of p min ≈ aj 2 min /2 (see also §3.3 of Wen (2003)). The magnitude of the oscillations in p also diminish with time until, after around t = 10600 Myr, the binary reaches the strong GR regime and the oscillations are quenched. As predicted in §3.3 the value of p itself then remains effectively constant almost all the way to merger, taking a value 14 p strong ≈ 10 −2.86 AU ≈ 1.8p min (t = 0). We see from panel (b) that the resulting underestimate of  (Figure 4), so that again Γ = 0.42, but the binary constituent masses and inner orbit initial conditions are different. In this case the binary begins in the moderate GR regime on a librating phase space trajectory. p min at these late times leads to a slight overestimate of both the maximum and minimum eccentricities (the blue and red dashed lines each sit slightly too low in this panel) -hence the systematic error in the prediction j max = −σj − which was computed using the p min value from t = 0. Finally, we note that the conservation of i min also fails in these latter stages of the evolution (panel (c)), with cos i min undergoing a decrease from cos i min (t = 0) = 0.265 to roughly cos i strong ≈ 0.20. This change -which occurs for the same reason as that in p min , and is also discussed in §3.3 of Wen (2003) does not make a significant difference to our analysis because the cos 2 i min , sin 2 i min terms in e.g. equations (B1)-(B3) are already dominated by the (d/a) 7/2 terms by this stage. 5.3. Example 4: Γ = 0.176 < 1/5. In Figure 8 we provide one more Example, this time of a m 1 = m 2 = 10M ⊙ binary orbiting a M = 10 7 M ⊙ Hernquist cluster. We take new initial parameters for the inner orbit, as well as a much smaller outer orbit (r p /b, r a /b) = (0.1, 0.5), resulting in Γ = 0.176 < 1/5. This binary begins its life in the weak GR regime (a > a weak ), and is initially on a librating phase space trajectory (a > a sep ). It moves into the circulating regime (a < a sep ) at around t = 600 Myr and then into the moderate regime (a < a weak ) at around 720 Myr, and by around 860 Myr it has merged. Unsurprisingly, the secular period increases with time while the phase space orbit librates (regime A in panel (i)), and decreases with time once it begins to circulate (regimes C and D). There is no regime B in this example, which follows from the fact that in this case a sep < a div , so the divergence at a = a div does not occur while the binary is on a circulating trajectory. In fact we always have a sep < a div for 0 < Γ < 1/5, as guaranteed by equation (31). However we emphasize that this behavior is not limited to Γ < 1/5, and in fact it is easy to find examples of binaries with Γ > 1/5 that have a sep /a div < 1 and hence also have no B regime.
It is notable in this Example that compared to Examples 1-3, the transition in the behavior of e, i and p around 600Myr is very abrupt. At the beginning of the yellow shaded time interval the binary is on a librating orbit (starting at the green dot in panel (g)); then at around 600 Myr it drifts through the separatrix and joins the family of circulating trajectories; and by the end of the yellow interval it is on a very high eccentricity circulating trajectory (ending at the red dot). In other words, at some point around the middle of the yellow time interval the binary effectively 'jumps' from libration to lowj max circulation. The reason for this jump is that like in other examples, the binary inevitably ends up on a higheccentricity circulating trajectory, and for 0 < Γ < 1/5 the eccentricity of such a trajectory is forced to be larger than that of the saddle point at ω = 0, namely e f,0see Paper III. Thereafter the evolution matches the usual low-j max behavior as seen for Γ > 1/5, followed by a merger.
Finally, at the extremes of panels (i)-(k) we see the expected asymptotic behavior for regimes A and D taking shape. However the proper scalings are never fully developed at large a, simply because at t = 0 the binary is too close to the separatrix and to a div for j + , j 0 etc. to be considered near-constant.

DISCUSSION
In this paper, we have extended our theory of secular dynamics of binaries in stellar clusters by accounting for the effect of GW emission. We have demonstrated that cluster tides are capable of driving binaries to very high eccentricity, where they can emit GW bursts, shrink in semimajor axis, and ultimately merge. Our results also encompass -and in several ways extend -the theory of (test-particle, quadrupole, doubly-averaged) LK-driven compact object binary mergers, which is recovered exactly in the limit Γ = 1.
Throughout this paper we have focused on understanding the physics of 'slow mergers', i.e. those mergers that require many secular periods, but would not have occurred within a Hubble time had the cluster tidal perturbation not been present. This meant focusing on initially high-inclination systems for which eccentricity excitation is possible. We have also ignored octupole effects, short timescale fluctuations, stellar flybys and the like (see Paper II for discussion). Yet even in this relatively simple setting we have seen that the evolution of a binary from 'birth' to merger can be rather complex and is, in general, analytically intractable. Key to making analytical progress was our identification of several asymptotic regimes both in GR strength (weak, moderate, strong) and in phase space trajectory (librating, high-j max circulating, low-j max circulating). We emphasize that the analytical results derived in these regimes are only approximations. In practice the boundaries between asymptotic regimes are blurry and poorly separated, especially in a-space. Nevertheless, they have been sufficient for our purpose, which was to gain analytical -and consequently, physical -insight into an important class of problems that have traditionally been outsourced to a computer. In Table 1 we summarize our approximate asymptotic results.
To conclude this paper, we first discuss in §6.1 the implications of our results for the calculation of the total merger timescale. Finally in §6.2 we discuss our work more broadly in the context of previous studies of LKdriven mergers.
6.1. Merger timescale Secular dynamics of binaries including GW emission is a problem that has been considered many times in the LK context for hierarchical triple systems. Many LK studies that include GW emission are focused on the resulting observable merger rate, i.e. the number of binaries that merge per cubic Gpc per year in the local universe. To compute such a rate -as we did for cluster tide-driven compact object mergers in Hamilton & Rafikov (2019a) -one needs to know the time it takes for a given binary to merge as a function of its initial conditions. There are basically two ways to approach this problem. One can either integrate the equations of motion (DA, SA or N-body) directly and read off the merger time from the simulation, or one can seek an approximate (semi-)analytic formula that parameterizes the merger time in terms of those initial conditions (which can be checked using direct numerical integration for a small number of cases). The latter approach is obviously much faster

Weak GR & librating (regime A) Moderate GR & low-jmax circulating (regime D)
Strong GR Conserved quantities p min , i min p min , i min p, i tsec ∝ a −3/2 ∝ a 1/2 N/A ∆a ∝ a −3/2 ∝ a 2 N/A τa Λ weak U −1 a Λ mod U −1 (2p min ) 1/2 a 1/2 (8/5)U −1 (2p min ) 1/2 a 1/2 when one is dealing with millions or billions of binary initial conditions in a Monte-Carlo population synthesis. The merger time formula usually used in compact object merger calculations in the LK literature is where is simply the merger time of an isolated binary with initial semimajor axis a and very high initial eccentricity e ≈ 1 (Peters 1964). The formula (55) is typically justified via the following heuristic argument (Miller & Hamilton 2002;Thompson 2011;Liu & Lai 2018;Randall & Xianyu 2018). First, one assumes that the GW emission is negligible except around e ≈ e max , and so the total amount of time that needs to be spent at e ≈ e max before the binary merges is ≈ T iso m (e max ). But the amount of time that the binary actually spends in the vicinity of e max in each secular cycle is ≈ j min t sec ≡ (1 − e 2 max ) 1/2 t sec ; thus the number of secular cycles required until the time spent around e max accumulates to To get the total merger time we multiply this by t sec . Finally, evaluating everything at t = 0 we get the formula (55).
Of course, this heuristic derivation can be criticized on several levels. For instance, it makes no distinction between the values of e max , t sec at t = 0 and their values at later times, even though we know ( § §3-4) that both of these quantities vary with a. Also, it does not accurately treat the behavior of e around e max , instead assuming that e is precisely equal to e max within a discrete time window which lasts for (1 − e 2 max ) 1/2 t sec , and that GW emission is negligible outside that window.
In reality, we know from Paper II that even in the absence of GR precession, the fraction of each secular period spent in the vicinity of high eccentricity is not precisely proportional to t sec . Indeed, combining equations (34) and (59) of Paper II with equations (33) and (45) of the present paper, we find that the time for j to change from j min to √ 2j min in the ǫ GR → 0 limit (in regime A) is The factor (2Λ weak ) −1 can be significantly different from unity if the binary is near a separatrix -see Paper II. Finally, the expression for t min becomes even more complicated when we do include GR precession, especially for large values of σ and/or κ. Despite these shortcomings, the formula (55) actually works reasonably well in practice (to within a factor of order unity) when compared to direct numerical integration of the (DA, test-particle quadrupole) equations of motion for triple systems (Thompson 2011;Liu & Lai 2018;Randall & Xianyu 2018). To see why this might be the case, we now show that one can actually derive the formula (55) in a slightly less hand-waving fashion using the results of this paper. For slow mergers, a rather general formula for the merger time t m is found by integrating equation (8) from t = 0 to t = t m : (see equation (57) of Randall & Xianyu (2018)). Of course, as it stands (58) is an entirely impractical formula given the complexity of the general analytic expression for τ a that must then be integrated over. To make progress we assume that the majority of a slow merger is spent in the weak GR regime, and that by ignoring the time spent in the moderate and strong GR regimes we do not impart any major error (though we note that this approximation would fail in Figure 5, for example). Then a decent approximation to τ a is given by equation (44). Plugging this into (58) gives with Λ weak and U given in equations (45) and (46) respectively. Of course, Λ weak accounts for the fact that the time spent at highest eccentricity is not precisely proportional to t sec -see equation (57). Since p min (t) ≡ a(t) × (1 − e max (t)) is conserved throughout a slow merger, we can substitute in equation (46) the expression where in the second line we assumed e max (0) ≈ 1. Comparing the result to (55), we find that in this approximation the merger occurs at time Thus provided Λ weak ∼ 1, we recover the standard estimate of the merger timescale (55) to within a factor of order unity. If anything, one might expect that T m will be an overestimate of the 'true' merger time (even if one calculates this 'true' time by integrating the DA quadrupolar equations, i.e. ignoring SA effects, octupolar terms, and so on). That is because, as we saw in §4.2 and §5, the decay of a(t) speeds up substantially once the binary reaches its low-j max circulating phase in the moderate GR regime (see equation (51)). Thus, approximating the entire decay using the weak GR equation (44)) may seem overly conservative. It is therefore surprising to note Figure 8 of Thompson (2011) and Figure 5 of Randall & Xianyu (2018), both of which suggest that T m typically underestimates the true (DA) merger time by a factor ∼ 2 for compact object binaries in hierarchical triple systems. In future work it might be interesting to understand more deeply the reason for this trend. It may also be profitable to try to use the results of this paper to calibrate a merger timescale formula that is more accurate than (55) -even an estimate with typical in error at the level of only a few tens of percent would be a significant improvement. On the other hand, for realistic calculations such a formula may be of limited interest, since the true merger time can be greatly shortened when one includes sub-secular (e.g. 'singly-averaged') effects, octupolar terms, and so onsee e.g. Antonini et al. (2014); Grishin et al. (2018).

Relation to studies of LK-driven mergers
As discussed in §6.1, most LK studies 'solve' the problem of GW-assisted mergers either by direct numerical integration or by stating and then evaluating the merger time formula (55) after calculating e max from simple theory. There does not exist much in the literature that lies in between these extremes, in which an attempt is made to understand in detail the physics of each stage of the merger or to derive analytic results in specific asymptotic regimes as we have done here. Nevertheless, some of the key ideas covered in this paper have been considered by other authors, as we now describe.
A central result of §3 was the approximate conservation of p min and i min during slow mergers: this was ultimately what allowed us to express various important quantities (j min , t sec , etc.) as functions of a. These conservation laws (as well as their breakdown in the late stages of a slow merger) seem to have first been described in the LK limit by Wen (2003). The behavior of p during slow mergers has subsequently been appreciated as an important diagnostic of different regimes; for instance, Antonini has followed the p evolution in order to distinguish between 'LK dominated' and 'GW dominated' regimes (Antonini & Perets 2012;Antonini et al. 2014Antonini et al. , 2017. Some basic scalings of j min , t sec , etc. with a were also written down by e.g. Miller & Hamilton (2002), Wen (2003), Thompson (2011), although none of these authors venture beyond the weak GR regime in their analytical efforts, and so did not derive the peculiar results in the moderate GR regime that we have found here. No other studies have progressed beyond these simple scaling relations, to write down explicit formulae like we did in §4.
Another main achievement of the present paper has been to understand the interplay between the timeevolution of key dynamical quanties like (a, e), and the underlying phase space structure. The fact that a binary initially on a librating phase space trajectory necessarily transitions into the circulating regime as it shrinks was first mentioned by Blaes et al. (2002) (although they did not note the accompanying qualitative change in t sec behavior). Of course, since LK theory corresponds to Γ = 1 > 1/5, no previous authors have noted the new behavior that arises in the 0 < Γ ≤ 1/5 regime, e.g. the abrupt changes in phase space trajectory and the associated sharp 'kink' in t sec (a) -see §5.3. Furthermore, to our knowledge no LK study has distinguished between high-j max and low-j max circulating trajectories.
The only LK study we know of to have written down a formula for the decay in semimajor axis ∆a over one secular cycle is Randall & Xianyu (2018) -see their equation (55). These authors also wrote down an expression (their equation (57)) that is essentially the same as our equation (8), pertaining to the slow evolution of a. In addition, Randall & Xianyu (2018) seem to be the only authors who mentioned that t sec can sometimes decrease as a shrinks, even though every author who has integrated the equations of motion numerically must have encountered this phenomenon. In Appendix E we look in detail at some of the calculations of Randall & Xianyu (2018). As we show there, Randall & Xianyu (2018) implicitly assumed weak GR and σ ≪ 1 when deriving certain analytical results, so their calculations are not valid outside of this regime.

SUMMARY
In this paper we studied the (2.5pN) GW-driven orbital decay and subsequent merger of binary systems which are torqued to high eccentricity by cluster tides on secular timescales. We worked in the DA, test-particle quadrupole approximation and included the effect of (1pN) GR precession in our calculations. Our results may be summarized as follows.
• Cluster tides are capable of torquing binaries to sufficiently high eccentricity that they emit bursts of GWs and ultimately merge. Cluster tide-driven eccentricity excitation is therefore a viable mechanism for producing LIGO/Virgo mergers, similar to LK-driven mergers that have been widely explored in the past. In fact (test-particle quadrupole DA) LK-driven mergers are simply a special case of the cluster tide-driven mergers considered here.
• For slow mergers (those that take place over many secular periods) there are two approximate conservation laws that hold as the semimajor axis a decays, namely conservation of the minimum pericentre distance p min = a(1 − e max ) and conservation of the minimum inclination reached i min . The evolution of a decaying binary through phase space can be understood in terms of these conserved quantities.
• We uncovered several asymptotic regimes both in terms of GR strength and phase space morphology. The different regimes exhibit different characteristic behaviors of secular timescale t sec (a), decay in semimajor axis per cycle ∆a(a), and consequently the decay timescale τ a (a).
• We re-derived a formula for the merger timescale that has been much used in LK theory, and provided a more detailed justification for it than those that have been offered previously.
The insights from this paper will inform future studies of LK-driven and cluster tide-driven binary mergers.
We thank Ulrich Sperhake and Bence Kocsis for comments on an earlier version of this work. This work was supported by a grant from the Simons Foundation (816048, CH), STFC grant ST/T00049X/1 and Ambrose Monell Foundation (RRR).

HIGH ECCENTRICITY RESULTS WITHOUT GRAVITATIONAL WAVE EMISSION
In this section we gather some results from Papers II-III concerning cluster tide-driven secular dynamics without gravitational wave emission (but including GR precession). Though there is nothing strictly new here, it will be useful to have these results gathered in one place and written in a form that makes their meaning transparent.
Without GWs the entire eccentricity evolution is dictated by equation (15) of Paper III: where j 2 ± , j 2 0 are given in equations (16)-(19) of that Paper. It will be important that we are able to derive simple expressions for j ± , j 0 etc. in the high-eccentricity limit. If the binary initially has e not close to unity, then to reach high e max it is necessary to have both Θ ≪ 1 and the binary initially in the weak-to-moderate GR regime. Making these assumptions we can use equations (46) of Paper III, which we repeat here: Moreover, evaluating equations (18)-(19) of Paper III at ω = ±π/2, j = j min ≈ 1 we get the following equation for Σ which did not appear explicitly in Paper III: Using the approximations (A2), (A3) we find that we can write j 2 ± and j 2 0 exclusively in terms of Γ and the three dimensionless numbers Θ, ǫ GR , j min , which are constants when GW emission is ignored: We can write down an expression for j min by assuming Γ > 0 and that maximum e is achieved at ω = ±π/2. Then (see equation (52) of Paper III): where ǫ weak ≡ 6(1 + 5Γ)j 2 + j − ≈ (720ΓΣ) It follows that for weak GR (ǫ GR ≪ ǫ weak ), Similarly, for moderate GR (ǫ weak ≪ ǫ GR ≪ ǫ strong ) and using the constancy of cos i min ( §3.3.2) we have It follows from equations (A4), (A5), (A9) and(A10) that in the weak-to-moderate regime, provided Γ ∼ 1, we always have j 2 + ∼ 1 and j 2 − j 2 min ∼ Θ ≪ 1. Note that in §4 of Paper III we already arrived at the weak GR result (A9). However we did not arrive at the same moderate GR result (A10); instead we found j min ∼ ǫ GR ≫ Θ 1/2 . The reason for the discrepancy is that in Paper III we implicitly assumed that Θ was kept fixed while ǫ GR was increased. However, when the decay of a is due to GW emission, one cannot change ǫ GR without also changing Θ -see equation (23). Accounting for this fact leads to (A10).

Maximum angular momentum
For all the phase space trajectories in which we are interested, j min is given by the same formula (A7). However it turns out ( § §4.2-5) that to understand the behavior of slow mergers one must distinguish between qualitatively different trajectories, and in particular to know their maximum angular momentum j max (corresponding to minimum eccentricity e min ), so we will devote some effort to this now.
The maximum j can either be found at ω = ±π/2 (if the phase space trajectory librates) or at ω = 0 (if it circulates). In the librating case we find 15 j max ≈ j + (librating orbits). Finding an approximate expression for j max for circulating trajectories is more complex, because there are two qualitatively different regimes of circulating trajectory to consider. The first type of circulating trajectory, which we call 'high-j max ', corresponds to j max ∼ 1 or e min ∼ 0, i.e. the binary undergoes an order-unity oscillation in eccentricity during each secular cycle. This is the classic type of circulating trajectory undergone by, for instance, a binary with Γ > 1/5 in the weak GR regime starting out with small eccentricity at ω ≈ 0 -see e.g. Figure 12f for illustration. The second type of circulating solution, which we call 'low-j max ', corresponds to j max ≪ 1 or e min ∼ 1, so that the oscillation in eccentricity is actually rather small despite e max being large -see e.g. Figure 12h. In this case we can say that the binary is trapped at high eccentricity. Low-j max circulating trajectorie are important because every binary passes through this stage while in the moderate GR regime during a slow merger, as a precursor to the strong GR regime 16 .
To make the distinction between high-j max and low-j max trajectories quantitative, recall from Appendix A3 of Paper III that for all circulating trajectories j max is a solution to the cubic equation One can solve this cubic analytically, but for simplicity here we will just plot the solution. Figure 9 shows j max as a function of j 2 0 (which can be positive or negative) for different values of ǫ GR /[3(5Γ − 1)], shown with different colored solid lines. In particular, red, orange and green lines correspond to Γ > 1/5 while blue, cyan and purple lines are for Γ < 1/5. We see from Figure 9 that for Γ > 1/5, circulating solutions exist for all values of j 2 0 . However for Γ < 1/5 no solution exists below some (positive) value of j 2 0 , consistent with what we found in Appendix B of Paper III. We have chosen to split Figure 9 into two asymptotic regions, 'high-j max ' (j max > 0.141, i.e. e min < 0.99) and 'low-j max ' region (j max < 0.141, i.e. e min > 0.99). For high-j max trajectories, provided j 2 0 is positive and O(1) we expect they are well approximated by ignoring the ǫ GR term in (A13), so that j max ≈ j 0 ∼ 1 (high−j max circulating orbits). (A14) In Figure 9 we plot this solution with a dotted black curve. For low-j max trajectories, as long as j max ≪ |j 0 |, we find librating trajectories loop around fixed points at ω = ±π/2, they necessarily have j 2 max > j 2 f,π/2 . We know from Figure 3 of Paper III that j 2 f,π/2 ≫ Θ, and from equations (A9)-(A10) that Θ j 2 − , so we can ignore j − in the quartic and write Since in the weak-to-moderate GR regime we have j 2 + ∼ 1 and ǫ GR ≪ ǫstrong, the solution is obviously jmax ≈ j + . 16 For 0 < Γ ≤ 1/5, we know from Paper III that high-e circulating trajectories are immediately formed once ǫ GR exceeds 6(1 − 5Γ)Θ 3/2 ≪ ǫ weak . Thus, one does not necessarily need to be in the moderate GR regime to have low-jmax circulating orbits. However most of our focus in this paper will be on low-jmax circulating orbits that exist in the moderate GR regime, which occur for all Γ. from (A13) that We plot this solution with different grey dashed curves, using the same values of ǫ GR /[3(5Γ − 1)] that we used for the colored solid lines. We see that for Γ > 1/5 (red, orange and green lines) the true solution interpolates between the two asymptotic solutions (A14), (A15) as j 2 0 is varied. For Γ < 1/5 (blue, cyan and purple lines), equation (A15) provides a good approximation for sufficiently positive j 2 0 > 0. Overall we see that different j max curves touch the −σj − solution approximately at j max ≈ 0.141. In other words, binaries on circulating trajectories transition from high-j max to low-j max circulation around this point. Expressing j max in equation (A15) through e max and using equations (B2) and (B5) derived in the next section, we can calculate the semimajor axis at which this occurs, with the result a ≈ a div 1 + 1 − e max 10 −2 which is ≈ a div in most cases of interest since typically 1 − e max ≪ 10 −2 . Thus a good rule of thumb is that circulating trajectories with a > a div are high-j max , and circulating trajectories with a < a div are low-j max .
In this discussion we have ignored one possible regime, namely that of low-j max circulating trajectories with small |j 0 |, i.e. |j 0 | j max ≪ 1. However as Figure 9 shows, such solutions only exist for a narrow range of j 2 0 values centred around zero. This regime is typically short-lived in the sense that a shrinking binary passes through it rather quickly on the way to merger (equivalently it is centered on a very narrow semimajor axis range around a ≈ a div ). Throughout the rest of the paper we ignore this intermediate case, i.e. we always assume that low-j max circulating trajectories have |j 0 | ≫ j max .

PHASE SPACE EVOLUTION AND GR REGIMES FOR SHRINKING BINARIES
In §3.3 we have seen how j min and Θ depend on a -see equations (21) and (23). We can use these results to understand how a binary moves through phase space as its semimajor axis shrinks. To begin, we substitute (21), (23) and (6) into equations (A4)-(A6) to get j 2 ± , j 2 0 as explicit functions of semimajor axis: where d is defined in equation (24) and ℓ = [( √ 2 − 1)/2] −2/7 ≈ 1.57 -see the definition (26). Next we write down the important dimensionless quantities γ, σ and κ (familiar from equations (49), (50) and (63) of Paper III respectively) as functions of a, as follows. First, by combining equations (A8), (B1), (B2) and the definition (24) it is straightforward to show that Second, plugging (6), (B2) and (B3) into equation (50) These results lead naturally to the definitions of the critical semimajor axis values a sep and a div that we gave in §4.1. We now use these results, as well as the quantities a weak (equation (26)) and a strong (equation (27)), to understand more precisely how binaries move through phase space and different GR regimes as a decays. We begin with the regime Γ > 1/5, and then discuss 0 < Γ ≤ 1/5.   Figures 1, 12, 4 and 5 respectively. We also show the critical values a weak (dotted vertical line), a sep (dashed vertical line) and a div (dot-dashed vertical line), defined in equations (26)-(30). Additionally, in the upper panels we show with blue shading the region |j 2 | < (0.141) 2 , within which the split into 'high-j max ' and 'low-j max ' circulating trajectories is invalid (see the final paragraph of Appendix A). We show with orange shading the region (0.141) 2 < j 2 < 1. In particular, by looking at the runs of j 2 + and j 2 0 and whether they lie in this orange region, we will be able to infer the value of j max and hence infer what type of phase space trajectory the binary is on. Without loss of generality, for each example (a)-(d) we can consider a binary that starts at the extreme right of each panel, i.e. with a ≫ a weak (the weak GR regime), and follow it as a decreases.
First we focus on panels (a) and (b), which are for Γ = 1 (the LK limit). In panel (a) the binary 'begins' at large a with 0 < j 2 + < 1 and j 2 0 > 1; this means that it is on a librating trajectory in the weak GR regime, with j max ≈ j + (equation (A12)). Of course as a is decreased j 2 + is always increased, while j 2 0 is decreased, and when a = a sep the two cross over, j 2 0 = j 2 + = 1. At this point the binary switches to a high-j max circulating trajectory with j max ≈ j 0 . In this case a sep < a weak , so that the separatrix crossing occurs while the binary is still in the weak GR regime. Once a becomes smaller than a div we quickly get j 2 0 values that are strongly negative, and the binary transitions to a low-j max circulating trajectory (Figure 9) with j max ≈ −σj − ≪ 1. It will remain on such a trajectory until it gets trapped at high eccentricity in the strong GR regime around a ∼ a strong (not shown here).
Example (b) shows very similar behavior to example (a), except that the smaller value of cos i min means that the three values a weak , a sep and a div are now even more closely clustered together around a/d ≈ 1 (note also that a sep is now very slightly smaller than a weak ). Because of this clustering, example (b) is perhaps 'cleaner' than (a): for a significantly larger than d the binary is clearly on a librating trajectory in the weak GR regime, whereas for a significantly smaller than d it is clearly on a low-j max circulating trajectory in the moderate GR regime. In practice the transitions between these two various regimes are not always so well demarcated.
At this stage it is worth noting how different quantities scale with a in each regime. From examples (a)-(b) we see that in the weak GR regime (a > a weak ) we nearly always have |j 2 + |, |j 2 0 | ≫ 0.1, and both of these quantities scale very weakly with a. In the moderate GR regime (a < a weak ) the scaling of j 2 + and j 2 0 with a is much stronger, as we would expect from equations (B1), (B3). Moreover, in every case it is clear that |j 0 | 2 lies in the blue shaded region only for a very narrow range of semimajor axes surrounding a div (equation (A16)), and so we were justified in ignoring the small j 0 regime when discussing low-j max circulating trajectories in Appendix A. Turning to the bottom panels, we see that |σ| and |κ| both vary over several orders of magnitude as a is decreased. However, it is noteworthy that for a far away from a div , the value of |κ| is usually O(1) and scales weakly with a.
Finally we turn to examples (c) and (d), which are for Γ = 0.42. The physical interpretation of these examples is identical to those of (a) and (b), demonstrating a broad uniformity of evolution for all binaries in the Γ > 1/5 regime. In fact, this broad-brush picture can break down very close to Γ = 1/5, but we ignore this complication here.
Phase space evolution for 0 < Γ ≤ 1/5 In Figure 11 we plot the same quantities as in Figure 10, except this time we focus on the regime 0 < Γ ≤ 1/5. In particular the choices of Γ and cos i min in panel (a) coincide with those from Figure 8. We see that a rather different phase space evolution emerges for 0 < Γ ≤ 1/5 compared to Γ > 1/5. First we consider panel (a), which is for Γ = 0.176 and cos i min = 0.7. In this case, for large a ≫ d we have j 2 + 1 while j 2 0 is large and negative. This means that in the asymptotic weak GR regime the binary is on a librating trajectory, with j max ≈ j + . However, once a decreases below a div in this plot, we see that j 2 0 becomes positive (though still smaller than j 2 + ). Soon a reaches a sep , below which both j 2 + and j 2 0 are greater than unity: the binary has transitioned onto a low-j max circulating orbit (Figure 9). We note that all of this happens well before the binary reaches the moderate GR regime. This is not surprising because we know that a family of high-eccentricity circulating trajectories (i.e. low-j max ) naturally arises in the 0 < Γ ≤ 1/5 regime as soon as ǫ GR exceeds 6(1 − 5Γ)Θ 3/2 ≪ ǫ weak (Paper III). The binary stays on its low-j max circulating trajectory as a shrinks into the moderate GR regime a < a weak and onward to the strong GR regime.
A very similar story holds in panels (b)-(d). The only important difference is that as we decrease Γ or cos i min , or both, the value of j max ≈ j + for asymptotically weak GR (a ≫ d) decreases. This means that librating trajectories with high e max → 1 in the very weak GR regime do not reach low 17 e min (recall that we have assumed j min ≪ 1 in deriving our expression for j 2 + ). Said differently, for 0 < Γ < 1/5, binaries that initially have e ∼ 0 do not tend to reach e → 1 -that is, low minimum eccentricities are not typically associated with high maximum eccentricities, so this is typically not the type of situation in which we are interested.
Finally we mention that in all examples shown in Figure 11, for a sufficiently far from a div we again have |κ| ∼ O(1) or smaller, and κ varies only weakly with a.

DERIVATION OF tsec FORMULA IN REGIMES C AND D
As a crosses a div and the binary enters regime C, j 2 0 passes through zero and rapidly becomes strongly negativesee equation (B3) and Figures 10-11. Those Figures also show that j 2 + is typically large in amplitude for a a div , certainly larger than j 2 which is limited by j 2 max , and which is already in the low-j max regime. Also, for most of the secular cycle we can neglect j 2 − compared to j 2 since j max = −σj − is well separated from j min ≈ γj − . Taking the limit j 2 − ≪ j 2 ≪ j 2 + , j 2 0 (which is most accurate in regime D with moderate GR), equation (A1) reduces to dj dt Plugging this into equation (32) and performing the integral we get where in the second line we used j min + j max ≈ γj − − σj − = (1 − κ)j min (see equation (A15)). Using equations (21), (B1), (B3), (B6) for j min , j 2 + , j 2 0 , κ, correspondingly, we find

DERIVATION OF AN APPROXIMATE FORMULA FOR SEMIMAJOR AXIS DECAY
Assuming the binary reaches very high maximum eccentricity e max → 1 we can approximate equation (40) as where λ 1 ≡ (1 + 73/24 + 37/96)λ 0 = (170/3)G 3 c −5 m 1 m 2 (m 1 + m 2 ). In general dj/dt -given in equation (A1) -is so complicated that even this approximate integral is intractable. However, noting the very strong j −7 dependence in (D1) we expect the integral to be dominated by the contributions from very high eccentricity, i.e. j ≪ j + , |j 0 |. In this limit we can approximate dj/dt using equation (48) of Paper III. Moreover, since we know that the minimum j min is a zero of the first square bracket in that equation, we can write it as where j α ≡ γj − [1 − 1 + 4γ −2 ]/2 < 0 is the other root of the first square bracket in equation (48) of Paper III, and j σ ≡ −σj − . 18 We now take (D2) and plug it into (D1). Defining and using (21), the result is where λ 2 ≡ 1360G 7/2 m 1 m 2 (m 1 + m 2 ) 3/2 /[9c 5 A(2p min ) 3 |25Γ 2 − 1|] is independent of a, and We can simplify this result in the limit of weak GR. In this limit we have j min ≪ j max so that x max ≫ 1. We also have j σ < 0, so that |x σ − x| = x + |x σ |. In this case the integral in (D5) is completely dominated by the contribution from x ≈ 1, and so we may take the upper limit of the integral to x max → ∞ with impunity. Since j min ≈ j − and γ ≪ 1 in this limit (see equation (B4)), we can simply replace x σ with σ and x α → 1. An excellent approximation to the resulting integral (accurate to within a few percent over several decades of |σ|) is then given by equation (42). RELATION TO RANDALL & XIANYU (2018) Throughout the main text we referred to the paper by Randall & Xianyu (2018) -hereafter RX18 -several times. The RX18 paper largely inspired the present work, since those authors are among the few who have attempted to gain an analytical understanding of LK-driven slow mergers (indeed it is from their paper that we have taken the 18 Using the results of Appendix A one can check that the sign of the quantity inside the square root is positive. For instance, for Γ > 1/5 we recall that low-jmax circulating trajectories have j 2 0 < 0 and jmax = −σj − = jσ, while Type 1 circulating trajectories have j 2 0 > 0, jσ < 0 and jmax ∼ 1. terminology 'slow merger'). In particular, to our knowledge RX18 were first to (i) calculate ∆a explicitly, and (ii) comment upon the decrease in t sec as the binary shrinks and offer an explanation thereof. On the other hand, we feel that both (i) and (ii) as presented in RX18 can be improved. In this Appendix we explain how our calculations differ from those of RX18 regarding points (i) and (ii) ( § §E.1 and E.2 respectively). To begin we present Figure 12. This Figure reproduces exactly the numerical example shown in RX18's Figure  3, from which those authors drew several of their conclusions. Specifically, it follows the evolution of a binary of m 1 = m 2 = 10M ⊙ and a 0 = 0.1AU as it orbits a SMBH of mass 4 × 10 6 M ⊙ . We see that in this example the binary sits from the start in the moderate (rather than weak) GR regime on a circulating phase space trajectory, and that the secular timescale does indeed decrease as the binary shrinks. The merger occurs after around t = 7000 yr. We will refer to this Figure frequently throughout the remainder of this section.
Calculation of ∆a RX18 begin their calculation of ∆a by writing down their equation (55), the first two lines of which are identical to our equation (40) if we evaluate the final bracket at e = e max . One is then faced with the computation of an integral, ∆a ∝ dt(1 − e 2 (t)) −7/2 , over one secular cycle. To perform this integral in §4.3 we changed variables from t → j ∈ (j min , j max ) and hence wrote down equation (D1). On the other hand, RX18 choose to compute the integral by first approximating e(t) as a quadratic in time (see their equation (53)). In particular, using our notation and letting the maximum eccentricity occur at t = 0 without loss of generality, their equation (52) RX18 then plug this into dt(1 − e 2 (t)) −7/2 and integrate over t ∈ (−∞, ∞) to get ∆a. The result is their second equation (55), which in our notation and evaluating at e max ≈ 1 reads ∆a RX18 ≈ − 544G 3 m 1 m 2 (m 1 + m 2 ) 9c 5 a 3 j 6 min × d 2 e dt 2 −1/2 t=0 .
Finally, RX18 evaluateë| t=0 using their equation (53). However, RX18's method for computing ∆a implicitly makes two assumptions which are not true in general, as we now explain.
1. The assumption that e(t) is quadratic for small t is equivalent to the assumption that j(t) is quadratic for small t. We know from Paper III that this quadratic approximation is only good if the binary is in the weak GR regime (ǫ GR ≪ ǫ weak ) and it has σ ≪ 1 (equation (B5)). While these conditions do hold for many binaries of interest (i.e. see the early stages of Figures 1 and 4, for which σ ≈ 0.02 and 0.08 respectively), they are not true for the RX18 calculation shown in Figure 12 -this example begins in the moderate GR regime (panel (e)) and has σ ≈ 14.8.
The condition (E4) does happen to be true in the specific numerical example shown in Figure 12, but it is certainly not true in general, as we have seen in several numerical examples (Figures 1, 4, 5 and 8). In fact, we know from Appendix A that if a slow-merging binary is initially in the weak GR regime then it has j 2 min ∼ Θ all the way into the moderate GR regime and beyond, so in general one should use the formula (E3).
We can make a direct comparison between our method of computing ∆a and that of RX18 as follows. Let us follow the RX18 method and use equations (E2) and (E3), evaluating dω/dt at maximum eccentricity using equation (12) of Paper III -we call the result ∆a RX18 . We then compare the result to our equation for ∆a, namely (D4). Using  (2018). In this case a binary with m 1 = m 2 = 10M ⊙ orbits a supermassive black hole (i.e. Kepler potential, Γ = 1) of mass M = 4 × 10 6 M ⊙ . The outer orbit has semimajor axis ag = (ra + rp)/2 = 150 AU and eccentricity eg = (ra − rp)/(ra + rp) = 0.1. Note that a div and d overlap almost exactly, which follows from the fact that in this example cos i min ≪ 1 (see equation (30)). j 2 min = 2p min /a and after some algebra we arrive at We can make sense of (E5) by evaluating the right hand side in the weak and moderate GR regimes.
where to get the second equality we used (42). Note that for σ ≪ 1 we recover ∆a RX18 = ∆a, i.e. our calculation coincides precisely with that of RX18 when we make the approximations that they (implicitly) did, namely weak GR and σ ≪ 1. However, we emphasize that neither of these approximations is actually valid for the example shown in Figure 12.
In the moderate GR regime we assume that the ǫ GR term dominates the final bracket in (E5), and that the adependent terms dominate equations (B1), (B3). With these assumptions we get with ξ given in equation (49) (and plotted in Figure 3). All three fractions in (E7) are O(1) or larger. Thus we typically have ∆a RX18 /∆a ≫ 1, meaning that the method of RX18 can seriously overestimate the value of ∆a in the moderate GR regime.
Decrease in t sec with time As we mentioned in §1, the decrease in t sec with time during a slow merger was first pointed out by RX18 in their §3.1, when discussing the example shown in Figure 12. When interpreting this counter-intuitive scaling of t sec (a) physically, RX18 noted that smaller a (larger ǫ GR ) promotes faster apsidal precession, which is obviously true. They then claimed that this faster precession directly leads to a shorter secular period. They also claimed that it was directly responsible for the corresponding increase in maximum eccentricity with time and decrease in minimum eccentricity with time as the binary shrinks (Figure 12b). This interpretation is not quite right, and also does not explain why in the librating regime t sec increases with shrinking a. In reality, in the weak-to-moderate regime, GR precession is unimportant except during an extremely high eccentricity episode. Typically these extreme eccentricity episodes last a very short time compared to the secular period. In other words, for most phase space trajectories the second (GR) term in equation (56) of RX18 is completely negligible during the majority of the evolution, so barely affects t sec . What GR precession does do, when coupled with GW emission, is to alter the phase space morphology, and to periodically nudge the binary onto a new phase space trajectory every time it reaches high eccentricity (note how closely the contours of H * are bunched at these high eccentricities in Figure 12f-h). As a is decreased and ǫ GR is increased, after passing from libration to circulation the binary gets pushed ever further away from the separatrix, towards the low-j max circulating region where (39) applies. As long as this process continues the binary gets pushed to higher minimum eccentricity (smaller and smaller j max ), even though its e max is getting smaller. On average the binary spends more and more time at 'high' (say e 0.9) eccentricities where cluster tide-driven secular evolution is fast (since the binary angular momentum is small). We emphasize that this last statement is true regardless of GR precession: indeed, the binary typically does not care about GR precession directly when, say, e = 0.9. Thus, whereas RX18 attributed the evolution of t sec and e min/max to fast GR-aided ω precession during the whole secular cycle, both of these phenomena are present even in the weak GR regime where apsidal precession is nearly always negligible -see Figure 4.