Proving the short-wavelength approximation in Pulsar Timing Array gravitational-wave background searches

A low-frequency gravitational-wave background (GWB) from the cosmic merger history of supermassive black holes is expected to be detected in the next few years by pulsar timing arrays. A GWB induces distinctive correlations in the pulsar residuals --- the expected arrival time of the pulse less its actual arrival time. Simplifying assumptions are made in order to write an analytic expression for this correlation function, called the Hellings and Downs curve for an isotropic GWB, which depends on the angular separation of the pulsar pairs, the gravitational-wave frequency considered, and the distance to the pulsars. This is called the short-wavelength approximation, which we prove here rigorously and analytically for the first time.


Introduction
Gravitational waves (GWs) are ripples in the fabric of space-time, originating from some of the most violent events in the Universe, including the mergers of supermassive black holes. High frequency GWs from the merger of stellar-mass black holes were first detected by the Laser Interferometer Gravitational-wave Observatory (LIGO) in September 2015 [1], hailing the dawn of gravitational-wave astronomy. However, LIGO can only detect high frequency GWs, in the 100-1000Hz range. Similarly to electromagnetic radiation, different GW detectors are needed to probe different GW frequencies. Currently there are plans to launch a space-based GW detector in 2034-the Laser Interferometer Space Antenna (LISA) [2]-which will probe the millihertz GW frequency regime, thought to be populated primarily by merging supermassive black holes (SMBHs) in the 10 5 -10 6 M e range. At the very low-frequency end of the GW spectrum, one expects to find nanohertz GWs from very massive inspiraling SMBHs, in the  -M 10 10 8 9 range. These can be detected by timing millisecond pulsars, called a Pulsar Timing Array (PTA) [3][4][5][6]. Millisecond pulsars are excellent clocks, and delays or advances in their arrival times-inducing a timing residual-could signal the presence of GWs. PTA experiments are very active, and have been taking data for over a decade [7][8][9]. With a PTA, one can detect not only GWs from inspiraling SMBH binaries (SMBHBs), see e.g. [10,11], but the GW background (GWB) from the cosmic merger history of SMBHBs [12][13][14]. This GWB is expected to be detected in the next few years [15,16], with the details depending on the underlying astrophysics of the SMBH mergers. More details on PTAs can be found in recent review articles, e.g. [17][18][19][20], and an outline GW astrophysics covering nanohertz to kilohertz frequencies can be found in [21].
Indeed, a rigorous exploration and examination of the tools which will be used to make the first detection of a GWB is crucial. An isotropic GWB will induce characteristic correlations in the pulsar timing residuals. By cross-correlating these residuals, one expects to see a characteristic correlation called the Hellings and Downs curve [5]. Deviations from an isotropic GWB can be induced by nearby and/or particularly loud SMBHBs, inducing anisotropy in the GWB. Anisotropic GWBs will induce different correlations patterns, and have been explored by [22][23][24][25][26].
Here we prove analytically, and for the first time, that the Hellings and Downs curve can be extracted from the cross-correlated pulsar residuals, without making assumptions that the pulsars are all at the same distance L from the Earth. Part of this proof is a consequence of the application of the Riemann-Lebesgue lemma and the Lebesgue Dominated Convergence Theorem-well-known in the mathematics community, but less wellknown in the field of GWs. We emphasize that no previous work has been able to do this analytically, though computer-aided integration has been used to verify one's intuition numerically.

The characteristic strain
The International PTA (IPTA) published combined data on 49 millisecond pulsars in their first data release [14]. These millisecond pulsars are the most stable natural astrophysical clocks known [27], and are regularly monitored by 8 radio telescopes: 5 in Europe [8], 2 in North America [7] and one in Australia [9]. PTAs take advantage of the precise arrival times of millisecond pulsars to enable GW detection.
The GWB is described in terms of its characteristic strain, h c ( f ), with amplitude A at a reference frequency of 1/yr (e.g. [28]): The current upper limits on A are difficult to compare, since it was recently discovered that errors in planetary masses and positions (called the solar system ephemeris model) can directly affect the limit on A [12,29], and in some cases mimic a GWB signal.
While the current upper limit on A from NANOGrav can take this into account, and limit A<1.35×10 −15 , other PTAs have not yet published updates to their limits. Projections the characteristic strain accessible with future IPTA and Square Kilometer Array (SKA) [30][31][32] detectors are shown in figure 1.
The observed residuals due to the presence of a GWB with characteristic strain h c ( f ) is described by the cross-power spectral density of pulsar 1 and pulsar 2 by  see e.g. [34], where Γ 1,2 is the so-called overlap reduction function, which describes the GWB-induced correlation signature in the pulsar residuals. This is a function of the frequency of the GWB, the distance to the pulsars L 1,2 , and the angular separation of the pulsars, ζ. PTA geometry is explored in detail in figure 2. For an isotropic GWB, this is called the Hellings and Downs curve [5], and for anisotropic GWBs see [22][23][24][25].

The Hellings and Downs curve
In analogy with [22,24], we present an overview of how one arrives to the Hellings and Downs curve. A source of GWs in direction -Ŵ , see figure 2, generates a metric perturbation W (ˆ) h t, ij , which we describe as a plane wave: We note that General Relativity predicts only two independent polarizations, plus +, and cross ×, while other theories predict additional polarizations, such as breathing modes [37][38][39]. He we restrict ourselves to the wellknown tensor polarizations, A=+, ×. The W (ˆ) e ij A polarization tensors are uniquely defined by specifyingm andn-the GW principal axes, illustrated in figure 2: For a stationary, Gaussian, and unpolarized GWB, the polarization amplitudes satisfy (see e.g. [40]): The metric perturbation will change the proper distance between the Earth and the pulsars, inducing an advance or delay in the pulsar pulse's arrival time at the Earth. Consider for example a millisecond pulsar with frequency ν 0 whose location in the sky is described byp, at a distance L from the Earth. The metric perturbation affects the frequency of the radio pulses, ν, received at the radio telescope. This frequency shift is given by where The indices 'e' and 'p' refer to the Earth and the pulsar, however, it is standard write t e =t, see e.g. [22,24,41,42].
We can now write (3.6), using (3.1) and (3.2) as The fractional frequency shift, z(t), produced by a stochastic GWB is simply given by integrating equation (3.5) over all directions. Using (3.1) and (3.8), we obtain: are the antenna beam patterns for each polarization A, which we write as Searches for the GWB rely on looking for correlations induced by GWs in the timing residuals of pulsar pairs. Indeed, the observed quantity in PTA experiments is the timing residual ( ) r t , which is simply the integral of equation (3.9) in time: The expected value of the correlation between a residual from pulsar 1 at time t j , with that from a different pulsar, say pulsar 2 at time t k , depends on terms of the form: where H( f ) contains the information of the spectrum of radiation. In analogy with [22,23,36], we define the quantity above that depends on the angular separation of the pulsars, ζ, their distances from the Earth, L 1 , L 2 , and the GW frequency f, as the overlap reduction function In order to write a closed-form, analytic solution to (3.15), we choose a reference frame where one pulsar is placed along the z-axis and the other in the x-z plane as seen in figure 2. Specifically, we write 3.17 ) n e cos cos , cos sin , sin . 3.17 We remind the reader thatp 1 andp 2 are the unit vectors pointing to pulsars 1 and 2, respectively, Ŵ is the direction of GW propagation andm andn are the GW principal axes, see figure 2. Note that in this reference frame = F 0 a by (3.11), making it a convenient choice. For an isotropic GWB one is free to choose whichever coordinate system is most convenient, as was done here. However, one must be more careful when considering reference frames which are used to describe pulsar locations in an anisotropic GWB, as was done by [22].

Main results
We choose the coordinate system defined in equation (3.17), and apply it to equation (3.15). The result is equations (4.1) and (4.2).
Claim. Let L L f , , 1 2 , be real positive constants. Then, for each z p Î ( ] 0, , as  ¥ fL 1 and , a case covered in [24]. Here, Note that the above integrals are now written in terms of the coordinate system constructed in equation (3.17), and illustrated in figure 1, which was applied to (3.15) and (3.16).
Until now, one was only able to show this result by picking some values of pulsar distance 1, L 1 , and pulsar distance 2, L 2 and solve (4.1) numerically assuming some GW frequency f. In the literature, e.g. [22,24], the authors invoke the reader's physical intuition to support the numerical result-that if the exponents in (4.1) are large,  fL 1, these oscillatory pieces rapidly converge to zero. This is often referred to as the 'short wavelength approximation', and has been used without proof, which we will now provide.

Proof of claim
To prove this result, we estimate each of the four integrals (4.4)-(4.8) below which make up (4.1) separately. We apply the Lebesgue Dominated Convergence theorem (see appendix), Fubini's Theorem, and the twodimensional Divergence theorem to get the required limiting value (4.1). A key result used in the proofs which follow is a variant of the Riemann-Lebesgue Lemma in harmonic analysis (see [43], p. 277 and [44], p.2): let a, b be finite (though this is not necessary). Then for a Lebesgue integrable function f (a comprehensive definition and examples of this are given in appendix) over [a, b], The aforementioned Dominated Convergence Theorem, equation (A.1), basically gives us conditions under which we can interchange the operation of taking the limit of an integral with the integral of the limit. ) sin sin 1 cos 0, since both these quantities are necessarily non-negative. This, in turn, implies that for given ζ, θ=π−ζ and f=π or ζ=0, f any, or ζ=π, f any. Each of these cases is handled by limiting arguments.

Equation (4.5) tends to zero
Here we show that the first of the three equations with the exponential pulsar terms, (4.5), tends to zero. Using the above notation, and using the fact that K 2 is absolutely integrable over , Fubini's theorem on the interchange of iterated integrals yields the equality, where in the new variables is still absolutely integrable. Next, since * K 2 is absolutely integrable over its domain, the ordinary two-dimensional version of the Riemann-Lebesgue Lemma, equation Since the previous integral is itself  ¥ (|| || ) K 2 , the Lebesgue Dominated Convergence theorem (A.1) can be used to interchange the order of the limit and the integral. We find: Thus, (4.5) tends to zero as l  ¥.

Equation (4.6) tends to zero
Preamble. In order to extend the previous idea to more general exponents, we apply integration by parts to double integrals via the Divergence Theorem. In order to prove either (4.6) or (4.7) it suffices that we obtain the decay estimates  m as m  ¥ or l  ¥. What follows is the general idea which we then apply to the various cases. We need to estimate limits of the form i g g , i as w  ¥. (Note that we used Fubini's theorem to justify the interchange of the order of integration in the iterated integral (4.6)). Here  along with its boundary (or perimeter), , are completely contained in  and are chosen so that q f  ¹ ( ) g , 0on and inside   È (which necessarily has no points in common with ). By construction, the gradient of g , g does not vanish on   È and therefore the quantity We need to estimate the integral in (4.9) for large ω. First, observe that (suppressing the variables for clarity of exposition) where we assume, in addition, that f is sufficiently smooth so that  · u is defined.
which when inserted into the previous display and integrated over  yields, where n is the unit normal to , itself oriented in the positive direction, and σ is arc length. Combining the two previous displays we get, Once we know that both integrands are absolutely integrable over  and  respectively, we get  Then, by construction,  ¹ g 0 in  as well as on its perimeter, . Defining u as in (4.10) we then obtain (4.11) for suitably smooth functions f g , , i.e., w  ( ) I 0 on   È , for every e > 0. Now set = f K 2 and note that both integrals in (4.11) are finite on their respective region of integration. Thus, for every e > 0, i.e., so taking the limit as ε approaches zero, we must have as the right hand side of (4.14) is necessarily equal to (4.6) and so must vanish as well by (4.13), which is what we set out to prove. To this end, we note that, by continuity of the integrals, and that, in fact, the convergence is uniform in ω, and this last integral may be made arbitrarily small, independently of ω, if ε is sufficiently restricted. Hence the convergence in (4.15) is uniform in ω (actually for any w Î R but we only require this for ω on the half axis, ). We are now in a position to apply a fundamental theorem on the interchange of such limits (see citepf, p. 395) to validate the equality in (4.14) and complete the proof in the case where z p Î ( ) 0, .
In this case q f q = --( ) g , 1 c o s is independent of f and the resulting double integral can be handled in a similar way as (4.5), the only difference being the presence of a negative sign in the exponent. This, however, causes no difficulty with the argument in that section and so we omit the details.

Equation (4.7) tends to zero
We use the same basic technique as in the proof of (4.6). The proof of the limiting result for (4.7) can be obtained by reducing it to the case of (4.6) just proved. For example, that is absolutely integrable over , since K 2 is and the exponential term has modulus equal to one. So, it follows from the methods above leading to (4.6) approaching zero as m  ¥ that (4.7) also tends to zero as m  ¥. Similarly, interchanging the μ and λ terms in the preceding integral we obtain that (4.7) tends to zero as l  ¥ as well.

Final result: the Hellings and Downs curve
We have shown that for an isotropic GWB, and for  ¥ fL i , that the pulsar terms tend to zero. We can now write down the final form of the overlap reduction function: the 'Hellings and Downs' curve [5]: for z p Î ( ] 0, by [5,22,41]. Several comments should be made about equation (4.16) regarding a choice of normalization for the Hellings and Downs curve, the failure of the short-wavelength approximation, and the subsequent approximation of the pulsar term by a delta function for the autocorrelation.
When evaluating the autocorrelation, one can easily see that the value of the overlap reduction function is p 4 3. Indeed, it is a choice of normalization to set the autocorrelation equal to one when z = 0, which requires that equation (4.16) be multiplied by p ( ) 3 4 . Next, one will note the d + ( ) 1 1,2 term: this takes into account the failure of the short-wavelength approximation when evaluating the autocorrelation term. We approximate this is a delta function, however, in appendix C of [25], equation C4, it is shown that the exact solution is where we adopt natural units (c=1). Here = ( ) j x x x sin 0 is a spherical Bessel function of the first kind. Since the oscillatory piece is suppressed by a factor of ( ) fL 1 2 , approximating the pulsar term by a multiplicative factor of 2 for the z = 0 autocorrelation case is justified.
Mingarelli and Sidery 2014 [24] also explored analytic expressions for the autocorrelation, and found that the k W (ˆ) fL , 1 . This case was explored in detail by Mingarelli and Sidery 2014 [24], who found that there are strong additional correlations from the pulsar term when the pulsars are separated by less than one gravitational wavelength. The authors also gave first and second order corrections for these cases.

Discussion and conclusion
We have shown analytically that the Hellings and Downs curve approaches the Earth-term only solution, even when the pulsars are arbitrarily distant from the Earth, and not themselves at the same distance L from the Earth. Of course, the case when = ≔ fL fL fL 1 2 is easily recovered, since in this case, l m = and all terms (4.5)-(4.7) approach zero as the common value of this parameter fL approaches infinity.
The proofs indicate that the asymptotic estimate, equation (4.1), holds for sufficiently smooth kernels that are absolutely integrable over the region , and not just kernels of the form z f q ( ) K , , 2 as considered here. The astrophysical interpretation of this result is that if one monitors any galactic millisecond pulsar, and cross-correlates it with a pulsar in e.g. the Large Magellanic Cloud, the Hellings and Downs curve would still be correct correlation function to use, under the assumption that the GWB is isotropic. Anisotropic GWBs can be handled similarly, but care is required when evaluating the new kernel.
To summarize, we have shown that for pulsars at distances L 1 and L 2 from the Earth, that the pulsar terms tend to zero as the  ¥ fL i . The asymptotic estimate (4.1) is false if fL 2 is fixed as there is no reason for the integral (4.6) to tend to zero as  ¥ fL 1 , since it is independent of fL 1 . While this result is consistent with the previous intuition developed in the field of nanohertz GW astronomy, and indeed verified numerically for a few values of fL in [24], it has never before been proven analytically or generally for any fL i . This result is an important validation of a fundamental result in the field, and lends credibility and rigor to current GWB searches as we enter detection era in nanohertz GW astronomy.
Here we give a brief overview of the Lebesgue integral on the real line, R, see ( [45]), Chapter 10. Readers familiar with this concept may proceed immediately to the Main Results in section 3 and their proofs.
One of the great advantages of the Lebesgue integral over the classical (Riemann) integral is in the handling of limiting processes such as limits of integrals of sequences of functions which, in the classical case, usually requires uniform convergence of the sequence in question-this is not the case for the Lebesgue integral.
Fundamental to this now-standard integral in mathematics is the notion of Lebesgue measure. One intuitive notion of measure assigns the value b−a for the length of an interval [ ] a b , or, more generally, -- Aside traditional examples such as rectangles, unions of disjoint intervals, etc one can consider the set Q of rational numbers on the real line, or its complement, the set of irrational numbers there. Both the latter two sets are Lebesgue measurable, the former has Lebesgue measure zero while the latter has positive Lebesgue measure.
More precisely, a set Ì E R is said to have Lebesgue measure zero if for any given e > 0 there is a sequence of intervals [ ] a b , n n , n=1, 2, Ksuch that E is contained within the union of all these intervals where, in addition, e å -< = ¥ ( ) b a n n n 1 . For example, the set of all rational numbers Ì Q R has measure zero.
One of the connections between the Lebesgue theory and the ordinary Riemann theory of the integral is the following result: If a function f is bounded on an interval [ ] a b , then f is Riemann integrable over [ ] a b , if and only if the set of its discontinuities is a set of measure zero. By its very definition, the Lebesgue integrability of f forces that the absolute value of the function, | | f , in question be Lebesgue integrable, which is not so in the case of the Riemann integral. Still, the advantage in using Lebesgue integrals is huge in that we can extend the class of functions and sets over which we are integrating and still get meaningful results. Now we offer an extremely brief introduction to the Lebesgue integral, with a few key definitions and results. Briefly, with the Lebesgue integral we subdivide the range of a given function f, look for those parts where horizontal lines intersect the graph of f and then drop rectangles onto the domain of f. (Recall that we subdivide the domain of f in the case of the Riemann integral).
More generally, the measure of a set Ì E R is the greatest lower bound (see e.g. [45], p. 11) of the set of all numbers of the form å -= ¥ ( ) b a n n n 1 where the union of all the intervals [ ] a b , n n contains E. A set E is said to be Lebesgue measurable if it has finite measure. A function f is said to be measurable if the special set is measurable for every real number a. These are precisely the functions that we can integrate in the Lebesgue sense, i.e., the Lebesgue integrable functions over E. First, the integral is defined for simple functions (i.e., those functions whose range is a finite set of points). Then, using the fact that for a given measurable function In addition, the space ¥ + ( ) L R is the space of all such functions f for which there exists a constant, C, depending on f, such that < | ( )| f x C almost everywhere on + R .