The observed galaxy power spectrum in General Relativity

Measurements of the clustering of galaxies in Fourier space, and at low wavenumbers, offer a window into the early Universe via the possible presence of scale dependent bias generated by Primordial Non Gaussianites. On such large scales a Newtonian treatment of density perturbations might not be sufficient to describe the measurements, and a fully relativistic calculation should be employed. The interpretation of the data is thus further complicated by the fact that relativistic effects break statistical homogeneity and isotropy and are potentially divergent in the Infra-Red (IR). In this work we compute for the first time the ensemble average of the most used Fourier space estimator in spectroscopic surveys, including all general relativistic (GR) effects, and allowing for an arbitrary choice of angular and radial selection functions. We show that any observable is free of IR sensitivity once all the GR terms, individually divergent, are taken into account, and that this cancellation is a consequence of the presence of the Weinberg adiabatic mode as a solution to Einstein's equations. We then study the importance of GR effects, including lensing magnification, in the interpretation of the galaxy power spectrum multipoles, finding that they are in general a small, less than ten percent level, correction to the leading redshift space distortions term. This work represents the baseline for future investigations of the interplay between Primordial Non Gaussianities and GR effects on large scales and in Fourier space.


Introduction
The cosmological interpretation of galaxy clustering data is complicated by the fact we observe the angular position and redshift of the galaxies, rather than their true physical position on the lightcone. The mapping between these two systems of coordinates is distorted by a number of observational effects. The most important one is Redshift Space Distortions (RSD), i.e. the contribution of the galaxies peculiar velocity, projected along the line of sight (LOS), to their observed redshift. RSD select the observer's location as a special place and therefore break the isotropy and homogeneity of nth−point correlation functions. On the other hand, RSD give access to information about the velocity field we would not be able to retrieve otherwise.
Other observational effects, arising from the perturbation of the observed redshifts and angles, are proportional to the gravitational potential or its gradient and are therefore suppressed on sub-horizon scales with respect to the underlying dark matter density field, while becoming important only on very large scales. The relevant dimensionful parameter is the Hubble scale, which separates sub-horizon modes where Netwonian physics applies from the super-horizon scales where the full General Relativity machinery is at work. For this reason these observational features are usually called general relativistic (GR) effects 1 . Examples of GR effects are lensing, doppler magnification and the Integrated Sachs-Wolfe (ISW) effect. Galaxy clustering in a relativistic framework has been first derived in Refs. [1][2][3][4][5] in linear perturbation theory and then extended to higher orders in Refs. [7][8][9][10][11]. It has been also generalized to vector perturbations [12], non-flat FLRW universe [13], intensity mapping [14] and Ly-α forest [15] observables.
GR effects are important for two main reasons. The first one is that they provide a fully gauge-invariant framework to test gravity on the largest scales. Secondly, GR effects could be a contaminant to measurements of local Primordial Non Gaussianities (PNG) in the clustering of galaxies with scale dependent bias [16][17][18][19]. Local PNG appear as large scale divergences, proportional to the primordial gravitational potential, in the clustering of biased tracers. Local PNG arise from the coupling between the large and small scale modes generated during inflation, and in the squeezed limit they cannot be generated by the dynamics of GR at later times. However, projection effects, like the GR effects responsible for the mapping between true and observed coordinates, could mimic PNG, since they also contain terms proportional to the gravitational potential. The literature on the subject is vast: for photometric surveys measuring angular power spectra, marginalizing over the unknown free parameters that enter the full GR expression causes significant degradation of the error on local PNG [20][21][22][23]. This is in part due to GR effects contributing mostly to the cross correlation between different redshift bins, which would be zero in a standard Newtonian approach. In Fourier-space (but in plane-parallel approximation and neglecting integrated effects)Ref. [24] concluded that the degeneracy is reduced, especially if one is able to set priors on evolution and magnification biases, due to the different redshift evolution between local PNG and relativistic effects. A deeper understanding of this issue is therefore of utmost importance for future surveys that could improve over current CMB [25] and Large-Scale Structure [26] constraints by an order of magnitude [27].
On this topic, it has been recently argued in [28] that GR effects do not produce any PNG-like feature at large scales in the galaxy power spectrum. This is interpreted as a consequence of the equivalence principle. The cancellation of these terms holds for the variance of the galaxy density field, and for a galaxy power spectrum the authors in [28] define such that its variance corresponds to the observed one.
The goal of this work is to clarify on the issue of whether GR effects could be degenerate with a measurement of local PNG. We will confirm the result in [28] for the variance of the galaxy density field, but at the same time we will show the power spectrum as defined in [28] is not observable, and that true observed power spectrum as estimated in galaxy redshift surveys receives contributions proportional to the gravitational potential even in the absence of PNG. However, these terms do not show any sensitivity to long wavelength modes, which we will demonstrate is a consequence of the equivalence principle and of the existence of Weinberg adiabatic modes [6]. Contributions to the power spectrum arising from local PNG are instead genuinely sensitive long wavelength modes. For the first time we will also include observational effects like an arbitrary survey geometry and selection function when discussing the importance of GR effects on the power spectrum. Differently than previous analyses in Fourier space, our approach will include also wide-angle and evolution effects consistently.
This paper is organized as follows. In section 2 we introduce the power spectrum estimator and we discuss its interpretation and its connection to the theoretical modeling. In section 3 we describe the fully relativistic galaxy number counts and then in section 4 we discuss the cancellation of the IR divergences in the correlation function. In section 5 we compute the power spectrum including all the relativistic effects and then we conclude in section 6. Several appendices contain the details about azimuthally symmetric window functions (App. A), the full relativistic correlation function (App. B) and the contribution of the observer velocity (App. C).
In this work we adopt the following Fourier convention for the Fourier transform 2 The power spectrum estimator Given a catalog N (n, z) of galaxy positions in the sky, n, and redshifts z, we have several ways to construct a quadratic estimator of the data. For the remainder of this work we will focus on spectroscopic surveys, where redshift accuracy is obtained for each object in the catalog. If one wants to work in the observed coordinates, then a spherical harmonics approach leads to the measurements of as many angular power spectra as redshifts bins of the data (plus their cross correlation). The advantage of this approach is that the implementation of GR effects is simple and there is no need to assume a fiducial cosmology to perform the measurements, the so called Alcock-Pacinski effect [29] 2 . There are two main drawbacks of using angular power spectra. The first one is that, to achieve optimal signal extraction, the number of redshift bins has to be extremely large, see for instance Refs. [33,34]. This is more of a concern for small scales analysis than for PNG ones, however in the latter case one has to compute the cross correlation between all the different bins. This results in a large data vector, which makes the estimation of the covariance matrix of the data a challenging task. The second problem is that the connection to the perturbative approaches beyond linear theory, required to properly model the small scales, is non trivial 3 . Similar conclusions hold for spherical-Fourier-Bessel (sFB) methods [37][38][39][40][41][42][43][44], where the redshift coordinate is radially Fourier transformed. The other main option to build a quadratic estimator of the data is to work in a 3D coordinate system and measure the two-point correlation function or the power spectrum. This approach assumes a fiducial cosmology to convert angular positions and redshifts into distances. This causes no loss of information, but it requires some care if the assumed fiducial cosmology is very different than the true underlying one. That is however not a worry since Planck [45] and BAO [46,47] measurements have constrained the distance-redshift relation to a few percent. The advantages of 3D methods are the straightforward connection to perturbation theory and the reduced dimensionality of data vector, comprising of approximately 100 data points. The main drawback of 3D estimators is that GR effects are not easily implemented, especially the ones integrated along the line of sight, like lensing magnification, ISW and time-delay. One of the main goal of this work is to resolve this issue as well, providing the exact interpretation of the estimator of the power spectrum. Due to its practical advantages and robust theoretical interpretation the 3D methods should be preferred in spectroscopic surveys, and Fourier ones in particular to constraints on PNG.
A generic estimator for the multipoles of the power spectrum can be written by summing over all pairs of galaxies with the appropriate Fourier phases, (2.1) where A is a normalisation constant, ∆(s) denotes the galaxy density contrast, φ (s) is the survey window function, and L L is the Legendre polynomial of order L. This is the so called Yamamoto estimator [48], which reduces to the FKP estimator [49] for L = 0. Beyond the monopole, the estimator depends explicitly on the choice of the line of sight, d LOS , for the pair of galaxies at s 1 and s 2 .
For an unbiased estimate of cosmological parameters it is therefore of utmost importance to first understand the relation between the average of the estimator and the analytical prediction. Some of the following considerations have appeared elsewhere before [50,51], but we report them here for clarity. In Section 1 we said that RSD, and other GR effects, break homogeneity and isotropy of correlation functions, making for instance the power spectrum of the observed galaxy density field non-diagonal. This means that if we Fourier Transform ∆[s(n, z)] to ∆(k), we obtain while the Yamamoto estimator defined above is a function of a single wavenumber k. We will now show that the relation of P (k 1 , k 2 ) withP L (k) is LOS-dependent. Consider first the midpoint as the line of sight, d LOS = d m ≡ (s 1 + s 2 )/2, and assume that the window function is equal to unity everywhere inside the survey volume, which we take to cover the whole sky. It is easy to see that in this case the average of the estimator is given by [50,51] revealing that the Yamamoto estimator further compresses the underlying power spectrum by averaging over the wavenumber q associated to the line of sight d m . Another possible choice for the line of sight is one of the two pair members, d LOS = s 1 , usually called the endpoint line of sight. In this case which shows how different choices of LOS result in different weighting of the momentum q.
A more compact expression for the average of the estimator can be obtained by noticing that the configuration space two-point correlation function depends on the triangle formed by the observer and the pair of galaxies and it can therefore be formally Fourier transformed with respect to s ≡ s 2 − s 1 to define a LOS dependent power spectrum The Yamamoto estimator is then the LOS average of the LOS dependent power spectrum The above expression also clarifies how to take the plane-parallel limit of the estimator [50].

The convolution with a window function
For practical reasons the choice d LOS = s 1 is usually preferred [51][52][53], since it requires fewer Fourier Transforms than the identification of the LOS as the bisector between the two galaxies and the observer [50]. We shall use this choice, usually referred as the endpoint LOS in the literature, for the remainder of this work. Although not strictly necessary, in our implementation we will assume that the window function can be separated into a radial and angular part, i.e. φ(s) = φ(s)W (ŝ). Generalizations to non-separable masks are straightforward. If we start from the following decomposition for the two-point correlation function with µ ≡ŝ ·ŝ 1 , we arrive to a compact expression for the expectation value of the Yamamoto estimator where we have introduced The integration variable s 1 serves to keep track of the redshift evolution of the sample within the survey, and of the different dependence of the GR effects on the distance between the observer and the galaxies. In real surveys the functions F (s 1 , s) can be estimated using traditional pair counting algorithms or Fast Fourier Transforms (FFT).

The effective redshift
A useful approximation often employed in the literature is to assume that the theoretical prediction can be computed at some effective redshift. This allows to take the correlation function multipoles outside of the ds 1 integral in Eq. (2.9), that can be computed once and for all over the window function multipoles F . The effective redshift approximation is motivated by the fact that we are mostly interested in the clustering at separations much smaller than the comoving distance associated to the redshift of the individual galaxies, that can therefore be modeled at the same redshift. On the small scales relevant for BAO and RSD this has been shown to be an excellent approximation [54], as well for PNG studies on small patches of the sky [26]. To see how this works let us assume that the correlation function can be written as a sum over different terms, each of them being the correlation function between two operators O and O entering the expansion of the galaxy density field, For each contribution to the correlation function we could therefore choose an effective redshift such that at a given, small enough, reference separations, the model evaluated at this suitably defined redshift exactly matches the full prediction in Eq. (2.9). In practice, at small scales the correlation function is dominated by the Newtonian terms in the plane parallel limit, that allows us to define a single effective redshift for each multipole (2.12) Notice this operation is always possible as long as the transfer function and the growth factors are separable. Under the effective redshift approximation the average of the Yamamoto estimator reads The functions Q 1 can be easily estimated using FFT methods [55]. It is often the case that the different multipoles have very similar z eff, , and therefore a common effective redshift is defined for the entire correlation function 4 , (2.14) The definition above does not guarantee that the small scale power spectrum evaluated at z eff converges to the true answer, but the difference can usually be reabsorbed by changing the value of the unknown bias parameters. A global definition is also more prone to anisotropies in the orientation of galaxy pairs in the survey, which could also bias the theoretical prediction, as recently pointed out in [57]. The simple picture described above is complicated by those correlation function terms which are integrated along the LOS, like lensing magnification and ISW. In this case, the definition of effective redshift given above does not strictly hold. Nevertheless we will see in the next sections (see Figs. 4 and 10) that our choices in Eqs. (2.12, 2.14) are accurate enough for these terms as well, vastly simplifying the convolution with the window function.

The Integral Constraint
In redshift surveys the mean number density of the underlying galaxy population is usually not known a priori, and it is estimated from the data themselves. The resulting overdensity will therefore vanish when integrated over the entire volume of the survey, an effect known as the Integral Constraint (IC). In addition, if the redshift selection function is also estimated from the data, the overdensity has to vanish in radial bins when averaging over the angular footprint. This is the so called Radial Integral Constraint (RIC). The mathematics behind the two effects is very similar, therefore we will concentrate only the global IC. We follow Ref. [58], who showed that, to a very good accuracy, imposing the global IC boils down to following replacement of the theory prediction

The relativistic galaxy number density
In the previous section we laid out the formalism required to compute the measured power spectrum. The only missing ingredient is a model for the correlation function ξ(s, s 1 , µ) including all the relativistic effects. From the galaxy catalog N (n, z) in terms of the observed angle n and redshift z, we can defined the so-called galaxy number counts where .. denotes the angular average at fixed observed redshift. Being ∆ (n, z) an observable quantities we can express it in any gauge. We adopt therefore the Newtonian gauge where τ denotes the conformal time and the scalar metric perturbations Ψ and Φ are the Bardeen potentials. The full relativistic number counts to linear order in perturbation theory [1][2][3][4][5], including the observer terms [28,59], reads 5 where we have only assumed the validity of the Euler equation for baryons and dark matter. In the equation above V denotes the velocity potential and v || = n · v, where n is the unit vector pointing from the observer to the source, and v is the peculiar velocity in Newtonian gauge. We denote the partial derivative with respect to conformal time τ with a dot. The gauge-invariant density contrast D m coincides with the density fluctuation in the comoving gauge. To relation between dark matter and galaxies is parametrized by a galaxy bias b 1 , a magnification bias s b and a evolution bias f evo . Terms with a subscript 'o' indicate perturbations evaluated at the observer position which are needed for the 5 In Refs. [28,59] the Authors implicitly assume that the surveys are limited in volume. However, considering that current and upcoming surveys will be limited in flux we need to introduce also the magnification bias s b , defined as the slope of the luminosity function at the luminosity threshold, following the same convention of Refs. [3,13,31]. Therefore by Taylor expanding around the threshold luminosity and considering that the fractional fluctuation of the luminosity is twice the fractional fluctuation of the luminosity distance (δL/L = 2δD L /D L ), we need to replace To obtain Eq. (3.3), we have used the luminosity distance of Ref. [59] to properly include also the terms evaluated at the observer position.
gauge invariance of all the expressions. For example the term Ψ − Ψ o in the third line of the equation above is the standard gravitational redshift which is proportional to the difference between the gravitational potential at the source and the observer. We group the different relativistic effects in the following equation as follows: standard density plus RSD (first line), lensing (second line), Doppler (third line), local gravitational potentials (fourth line), integrated gravitational potentials (fifth line), where we have introduced the redshift dependent parameter We can indeed think of the relativistic number counts as a sum of different operators O(n, z), such that the total correlation function requires the computation of all possible terms of the form O (n 1 , z 1 ) O (n 2 , z 2 ) = ξ OO (s 1 , s 2 ,ŝ 1 ·ŝ 1 ) = ξ OO (s, s 1 , µ) . The correlation function can be computed directly as a function of (s, s 1 , µ), or rotated into this basis from another parametrization, for instance in terms of (s 1 , s 2 ,ŝ 1 ·ŝ 2 ). A pictorial representation of both coordinates system is shown in Fig. 1 We have checked that both methods give identical results, and we will use them interchangeably according to our convenience. For comparison with previous work we will more often work in the (s 1 , s 2 ,ŝ 1 ·ŝ 2 ) basis. The latter has the advantage of retaining a simple form even upon dropping the effective redshift approximation. So far we have not assumed any theory of gravity. Now, for the rest of the manuscript we will adopt General Relativity. Therefore, by using Einstein equations to relate metric and velocity perturbations to the density fluctuation we can write the correlation function as a linear combination of the functions where P (q) is the linear matter power spectrum at z = 0. In our notation, we introduce a differential operator D O associated to each O, which allows us to arrive to Figure 1. We show the systems of coordinates adopted in this work: (s 1 , s 2 , θ) and (s, s 1 , µ).
The line of sight variables χ 1 and χ 2 run from the observer to the sources along the vectors s 1 and s 2 , respectively.
where χ = χ 2 1 + χ 2 2 − 2χ 1 χ 2 cos θ. These differential operators, acting on j 0 (qχ) will lead to the functions I n . To avoid confusion, in this work we will always call q the momentum that enters the calculation of the correlation function, and k the wavenumber associated to the actual measurement ofP L (k). For the different perturbations we have (3.9) Einstein gravity will constrain these transfer functions and, in particular in the absence of anisotropic stress, we have a single scalar degree of freedom. In this case the transfer functions are given by where D 1 is the linear growth factor and f is the linear growth rate. The galaxy correlation function can be computed for instance using the public code COFFE [60], which we expanded to include all possible correlations between the observer's and source terms, between perturbations evaluated at the observer location, and to allow the multipole expansion in terms of the endpoint LOS.
The contributions arising from auto correlation of the observer terms represent the variance of these operators as they would be measured by observers with the same τ o , i.e. they account for the difference between a randomly chosen and a preferred observer [61]. In the cross correlations between the observer and the source what matters are instead the fluctuations on very large scales, comparable to the distance to the source. These long wavelength modes are stochastic in nature and cannot be known a priori. For example in a determination of the observer's velocity, what it is effectively being measured are the small scale fluctuations on the scale of our Local Group, which are by far the largest contribution to the observer's velocity.
4 Infrared divergences of the correlation function 4.1 The variance of the galaxy density field and the 1/q 4 terms If the expression for the galaxy number counts in Eq. (3.3) describes a gauge invariant and observable quantity, then all the summary statistics we compute out of it must be finite. However, a technical problem is immediately encountered in the calculation of the galaxy correlation functions. Namely, the contribution from any two powers of the metric potentials, for example (4.1) diverges in the Infra-Red (IR) for typical ΛCDM power spectra, i.e. when the lower integration limit is sent to zero. A healthy theory cannot have such a feature, which could resolve by itself within the framework, or could indicate a breakdown of one of the model assumptions.
A practical solution would be to remove by hand these divergences. This approach has been considered in [60], and we will now show that it is intimately related to the exact cancellation of the divergent terms. Consider the different operators O and O contributing to the divergent part of the correlation function where each term is proportional to I 4 0 defined in Eq. (3.7) 6 . One way to eliminate the IR divergency is to subtract unity from the spherical Bessel function in the definition of I 4 which is equivalent to removing the variance of each of the divergent pieces where the new functionsξ div OO are defined by the replacement of I 4 0 withĨ 4 0 . We can now compute all theξ's, which are finite, but we are left with the problem of adding a number 6 The different terms are proportional to I 4 0 for different arguments: s, s 1 , s 2 , χ 1 , χ 2 , s 2 1 + χ 2 2 − 2s 1 χ 2 cos θ, χ 2 1 + s 2 2 − 2χ 1 s 2 cos θ and χ 2 1 + χ 2 2 − 2χ 1 χ 2 cos θ of infinite terms at the end. However, it turns out that the sum of the variance of the divergent pieces is identically zero, once all the terms have been taken into account. Crucially the terms computed at the observer's position are necessary for the cancellation. This result implies that the lower integration limit in I 4 0 (s) does not play any role once all possible correlations have been taken into account, or equivalently that the final result is not sensitive to the large scale gravitational potentials. The cancellation of the IR-divergent terms in the variance of the galaxy density field however does not imply that the observed power spectrum does not contain terms scaling like P (k)k −4 . Subtracting a constant with respect to s as in Eq. (4.3) indeed changes only the zero mode, k = 0, of P L (k) . It is straightforward to see that for any finite mode k > 0, the Hankel transform ofĨ 4 0 and I 4 0 are the same 7 We therefore conclude that the true observed power spectrum will exhibit k −4 P (k)like behavior, which can however be exactly computed without any ad hoc procedure to tame the divergences.
The proof of Eq. (4.5) is long and tedious, and it is easily implemented with symbolic programming that can deal automatic with all the different pieces 8 . It is however instructive to look at a subset of terms, specifically the ones proportional to f evo , which cancels among themselves since f evo is an arbitrary free parameter of the model. The derivation illustrates the physical ingredients required for the cancellations. The terms we are interested in are (4.7) By using Eq. (3.9) we can compute the correlation function where D evo is the differential operator associated to Eq. (4.7), through the mapping in Eq. (3.9). In order to show that IR divergent contributions to the variance vanish we need to check that lim q→0 q 4 (4.9) 7 We used A little algebra shows that this condition is satisfied only if which by integrating over χ 1 reduces to In terms of the metric perturbation and the peculiar velocity this condition reads It is easy to show that this condition is satisfied by the Weinberg adiabatic mode [6], under the assumption that dark matter is the only clustering species. The existence of such adiabatic mode indeed guarantees that the long wavelength gravitational potential is unobservable and can always be reabsorbed with a change of coordinates. In this case for the dark matter velocity we have where we have explicitly used Φ = Ψ. Then by plugging this expression into Eq. (4.12) and taking a time derivative we obtain a second order differential equation for Φ Φ + 3HΦ + H 2 + 2Ḣ Φ = 0 , (4.14) where we have used The differential equation in Eq. (4.14) is precisely solved by the Weinberg adiabatic mode 9 [6, 62] Therefore we conclude that cancellation of the IR divergences is a direct consequence of the existence of the Weinberg adiabatic mode as a solution of the Einstein field equations. We also recognize that Eq. (4.12) agrees with the spatial curvature perturbation on comoving spatial surface which is conserved on super-Hubble scales for adiabatic perturbations in absence of anisotropic stresses. In presence of local Primordial Non Gaussianities, the sensitivity of the two-point function to the long-wavelength gravitational potential is physical, and it cannot therefore be removed. In this case, the correlation between large and small scale modes is actually imprinted during inflation. On the other hand, it is possible to show that the IR divergences are solely due to the auto-correlation of the primordial gravitational potential, and are hence proportional to f 2 NL . The terms involving one power of f NL and a metric perturbation are all individually divergent in the IR, but once again their sum is finite. This result is a consequence of the fact that in the squeezed limit local PNG cannot be generated by gravitational evolution.
Finally we notice that from the expression derived in this section, Eq. (4.9), directly implies lim q→0 q 3 The terms involving one power of the metric potentials are not IR sensitive in a ΛCDM Universe with parameters as determined by the Planck satellite [45]. Nevertheless, in the limit q → 0 we expect the gravitational potentials to disappear from the correlation function as shown in the previous section. This is an important consistency check of the theory, because, if the cancellation is a property of General Relativity and of Gaussian and adiabatic initial conditions, then long-wavelength modes are unobservable regardless of the actual value of cosmological parameters. Following the example in the previous section we show this is the case for the contribution in Eq. (4.7), while a complete treatment can be easily obtained through symbolic programming. Inspection of Eq. (4.7), reveals that we need to consider two set of terms to show the cancellation of a single metric potential in the large scale limit. Indeed we do not only have the contribution induced by the peculiar velocities, i.e. proportional to the gradient of the gravitational potential, but also the terms proportional to two powers of the gravitational potential combined with the expansion of the spherical Bessel at the order q 2 . In terms of the transfer function T Ψ , TΨ and T V we arrive to lim q→0 q 2 where we have introduced y = cos θ. Now combining the Euler equatioṅ 4.20) and the condition in Eq. (4.12), which we remind is satisfied Weinberg the adiabatic mode, we obtain or equivalently After some integration by parts and using the Friedman , we obtain the desired cancellation lim q→0 q 2 and as a consequence also that Similarly to the discussion of the divergent terms in the previous section, the cancellation only affects the zero mode of the power spectrum, hence GR contributions proportional to P (k)k −2 are present at any finite scale.
To summarise this Section, we found that the existence of the Weinberg adiabatic mode removes the IR sensitivity of the correlation function. The presence of the adiabatic mode is guaranteed for Gaussian and adiabatic initial conditions and by the diffeomorphism invariance of General Relativity. Notice all three conditions are required. We already discussed how local PNG induce correlation between long and short wavelength modes. If the initial conditions contained some amount of isocuvature perturbations or new long range forces are present in the dark sector, then the comoving curvature perturbation is not constant on super-horizon scales and the correlation function could exhibit a weak IR dependence.
While our calculation addresses the specific problem of the computation of the galaxy correlation function and power spectrum, similar arguments about the effects of long wavelength gravitational potentials, or better said lack thereof, on cosmological observables have long been known. For the late Universe, they usually exploit the Consistency Relations for LSS [63][64][65], which are valid under the same conditions required by the cancellation we showed above. For an application of the Consistency Relations to the relativistic bispectrum see [66]. Our result also implies that GR effects are not coupled to any super sample value of the gravitational potential and its spatial derivative, as it would be the case in presence of local PNG [67].
Our findings are in disagreement with the recent work in [28]. They assume that the observed power spectrum constructed from squaring the Fourier transform of the galaxy overdensity field is diagonal, i.e. the field is homogeneous and isotropic. But as we discussed in the introduction, RSD and GR effects break translational and rotational invariance: the Yamamoto estimator is a function of a single k mode because the other one has been integrated out. The power spectrum discussed in [28] is therefore not directly related to any observable. Where we agree with [28] is in the calculation of the variance of the galaxy overdensity field, which does not receive contribution from the gravitational potential and its first derivatives. As discussed above we clarified the necessary physical conditions for this cancellation, which requires more than just the equivalence principle to hold as suggested by [28]. The full two-point correlation function of the galaxy density field and any general estimator for the power spectrum like the one Eq. (2.1), are complicated average of all possible lines of sight for each pair of galaxies and should not be compared with the variance of the field itself, which is a function of a single line of sight.
Finally, one could argue that the any IR sensitivity of the model adopted here would be anyway removed by the presence of an Integral Constraint (IC). This could be the case, however it is important to check that the relativistic expression for the galaxy number counts provide a finite answer regardless of the way the mean number density of galaxy is measured. If one wants to resort to the IC, the latter should be included at the level of the integrand in Eq. (2.9), carefully including the effect of the window function. In our approach we removed the IR sensitivity in both Eq. (2.9) and in the expression of the IC, which can be therefore discarded for most practical applications.
The method described in this work to remove the IR divergences is also the only one that can deal with observations in which the mean is not known or cannot be measured. This is for instance the case of the excess brightness temperature of the 21 cm line, or any intensity line, measured with radio arrays in interefometric mode. With interferometers we directly have access to the fluctuations in Fourier Space, and there is no IC to subtract to the measurements of the power spectrum.

Relativistic effects in the galaxy power spectrum
Having resolved the issue of IR divergences in the correlation function we can now safely proceed to compute the prediction for the observed galaxy power spectrum multipoles. Our calculation of the correlation function includes the observer's terms, in auto-correlation with themselves and in cross-correlations with the source terms. Given the gauge invariant galaxy number counts, Eq. (3.3), we still have the freedom to perform a Lorentz boost to a preferred reference frame. In particular we are allowed to move to the CMB rest frame, and remove the terms proportional to the observer's velocity. While this operation is harmless in the theoretical prediction, some care should be used when expressing the measurement in the CMB or any other frame [68,69]. Alternatively one can keep these terms and try to measure the induced dipole to test the Copernican Principle. We provide expressions for this case in Appendix C.
The other required model ingredients are the multipoles of the window functions F (s, s 1 ), and the Fourier space windows functions Q L (k). While our expressions are fully general, we choose a rather simple survey geometry, whose properties can be analytically calculated. We thus assume that the window function has an azimuthal symmetry around an axes, for instance a spherical cap with a maximum opening angle θ max . We fix this angle by imposing the hypothetical survey covers one third of the sky, f sky = 1/3. This assumption allows us to sum over all values of and 1 in Eq. (2.9). We also assume that the radial selection function and angular footprint can be separated, which might not always be the case if different parts of the sky have large depth variations. Finally we assume a constant selection functions within a certain redshift range. We consider four redshift bins, [z min = 0.05, z max = 0.2], [0.5,1], [1,2] and [1,4]. For these choices of window We show the multipoles of the window functions considered in the manuscript Q L (s) and its Hankel transforms Q L (k) for the monopole (L = 0), the dipole (L = 1) and the quadrupole (L = 2), normalized to Q 0 (k → 0) = 1. The window function is defined as φ ( and we set f sky = 1/3. The vertical lines correspond to the largest scales probed by the survey, i.e. 2r(z max ) sin θ max in real space and 2π/ (2r(z max ) sin θ max ) in Fourier space.
function, the convolution with the correlation function further simplifies, as discussed in Appendix A. Figure 2 shows the multipoles Q L (s) and Q L (k) in our four reference surveys, for L = 0, 1, 2. The largest pair separation for an azimuthally symmetric survey geometry is given by 2r(z max ) sin θ max , where θ max = cos −1 (1 − 2f sky ). We see that the scale k ∼ 2π/ (2r(z max ) sin θ max ) indicates the boundary of the survey volume in Fourier space and we will display it in the other figures as well. The effect of the window function is quite dramatic at low k, as one can see in Fig. 3. The black dashed lines show the flat-sky prediction for the multipoles of the power spectrum without the window function, while the blue and red ones display the convolution with the window function, with and without the Integral Constraint respectively. Notice that our asymmetric choice of LOS generates a power spectrum dipole, as shown in the second row of Fig. 3.

Density and RSD
In the small angle limit the standard density and RSD correlation function and power spectrum correspond to the plane parallel limit of Kaiser [68]. For wide area surveys, like DESI or Euclid, deviations from the plane parallel regime take the name of wide-angle effects. The amplitude of these terms is controlled by ks eff , where s eff is the comoving distance to the effective redshift, as opposed to k/H for the other relativistic effects. Therefore for surveys of the size of the Hubble horizon, s eff ∼ H −1 , these effects are comparable, while for smaller surveys wide-angle effects are more important.
To derive the correlation function induced by density and redshift perturbations, i.e. what we denote by the standard Newtonian contribution to galaxy clustering, we follow Eq. (3.8), where Under the effective redshift approximation, the expression for the correlation function of density + RSD takes a simple, but long, analytical form. We have where x = s/r(z eff ) and b 1 and f are evaluated at the effective redshift z eff . The above expression can then be projected into multipoles. Notice that in linear theory all multipoles, even and odd ones, are non zero, and only in the plane-parallel limit when x → 0  we recover the standard expressions in [68]. The accuracy of the effective approximation is shown in Figure 4, where we plot the ratio between the full prediction, i.e. integrated over s 1 , and the power spectrum evaluated at the effective redshift, for the monopole, dipole and quadrupole. The prediction includes only the density plus RSD piece. For the left panels we assumed the linear bias scales with the growth factor, D(z)b 1 (z) = 1.5, while on the right ones we fixed b 1 (z) = 1.5. Different colors show different surveys, and the vertical bars display the largest scales measured in each of them. Continuous lines indicate the prediction evaluated at the z eff, (see Eq. (2.12)), which by definition are unbiased at sufficiently high k, while dashed ones use the global z eff (see Eq. (2.14)). Notice that for the dipole we only show the global z eff since in the plane parallel limit there is no intrinsic dipole. This figure shows that the effective redshift approximation with the z eff, 's is in general a very good approximation, better than a % in all cases except the more unrealistic scenario of a single redshift bin galaxy survey between z min = 1 and z max = 4. The global z eff performs slightly worse, and it is less accurate for constant galaxy bias. We also notice that the error introduced by the effective redshift approximation is usually multiplicative, and it can therefore be absorbed in a redefinition of the in principle unknown galaxy bias.
In real data the accuracy of the effective redshift prediction will depend on other observational effects, like completeness, that can be trivially included as additional redshift weights in our theoretical predictions.

Full sky and s/s 1 expansion
To obtain the estimated power spectrum we need to convolve the expression of the correlation function with the multipoles of the window function. Because of the finite survey volume, rather than computing the full ξ (s, s 1 ), a more practical approach for the convolution with the window function is to expand the multipoles in powers of s/s 1 Notice that in the above expression the ξ (j) might still explicit depend on s 1 to account for the redshift dependence of the growth factors.
The series expansion is certainly a bad approximation close to the survey boundary where the separation s could approach the distance to the pair, and it eventually diverges, but the finite size of the window function addresses many of these shortcomings. In Fig. 5 we show the ratio between the monopole of the predicted power spectrum up to a given order in the series expansion,P (j) L (k), and the full calculation. In all four cases considered we assume a sky fraction f sky = 1/3 and vary the redshift range of the measurements. We find that the second order expansion in s/s 1 , usually employed in the literature [50,55], is always a very good approximation up to the survey boundary, indicated by the thick vertical gray line in the panels. The difference of a few % is always well below the expected cosmic variance for these hypothetical surveys. Notice how the presence of the window function makes the flat-sky prediction, the blue lines at O(s/s 1 ) 0 , very accurate.
The same set of plots is shown for the quadrupole in Fig. 6. As expected the series expansion performs slightly worse than for the monopole, but it should be kept in mind that the cosmic variance increases accordingly. The perturbative expansion in s/s 1 breaks down at scale larger than k ∼ 2π/r eff , shown as the orange bar in the Figure. Indeed this scale corresponds to s/s 1 ∼ 1 in configuration space.
For the dipole (Fig. 7) the perturbative expansion starts to converge at O (s/s 1 ). Because the dipole of the correlation function vanishes for a single tracer, P 1 (k) in the plane-parallel approximation receives contributions only from the coupling between the window function and other multipoles. Figure 6. We plot the ratio between the quadrupole of the power spectrum expanded up to different orders in the wide-angle parameter s/s 1 and the full-sky power spectrum. The four panels show survey geometry with different deepness in redshift. The vertical orange line indicates k = 2π/r(z eff ), while the gray line refers to the largest scales within the survey geometry.
In the top panels we consider b 1 = 1.5/D 1 (z), while on the bottom panels b 1 = 1.5.

Lensing Magnification
At fixed pair separation s between two observed galaxies, terms like lensing magnification or ISW measure the correlation between points along the lines of sight of the two galaxies. The "non-locality" of such contributions makes harder to understand to which value of k in the power spectrum they mostly contribute. In our formalism, however, computing the average of the observed power spectrum requires only the evaluation of the threedimensional correlation function multipoles. We compute ξ(s 1 , s 2 , cos θ) or ξ(s, s 1 , µ) using FFTs, and then Monte-Carlo or quadrature methods for the integrals along the LOS and the projection into multipoles.
In this section we focus on lensing magnification, which enters the power spectrum at the same order in derivatives as the standard density and RSD terms. As mentioned before, we set s b = 0 as its value changes only the normalization of the lensing contribution to the power spectrum. Fig. 8 shows in red the contribution of lensing magnification to the monopole and quadrupole of the power spectrum, which can be compared to the RSD term (including wide angle effects) in blue. Both curves include the effect of the window function, but not the integral constraint. For reference the flat-sky linear theory prediction in absence of a window function is shown as the dashed black line. Fig. 9 instead shows, again in red, the relative difference between the RSD power spectrum and the RSD plus lensing power spectrum. Because we are adding the different contributions we can now consistently subtract the integral constraint. As expected lensing magnification is a tiny effect for low redshift surveys with a small extension in redshift, as displayed in the leftmost panels. In this case, for both the monopole and the quadrupole we find lensing magnification leads to sub-% changes on the total power spectrum. Moving to higher redshift and wider redshift ranges the effect of lensing increases, reaching tens of percent for the monopole and O(1) for the quadrupole in our most unrealistic setup of a single redshift bin between z = 1 and z = 4 over one third of the sky. We remark that the spikes in the lower panels of Fig. 9 appear because the RSD quadrupole crosses zero, see Fig. 3. The larger effect on the quadrupole can be understood by noticing that lensing magnification contains a transverse Laplacian, which depends strongly on µ and it is therefore more sensitive to the non uniform weighting of higher multipoles. The information carried by the lensing magnification effect is thus spread over several multipoles. Note for instance in Fig 8 that the lensing quadrupole, red line in the lower panels, is usually larger than the lensing monopole, red lines in the upper panels. Interestingly this could open the possibility to detect the lensing contribution by measuring the power spectrum at large scales for higher multipoles where the Newtonian effects are strongly suppressed. Investigations with more realistic angular masks and selection function are certainly needed, but our analysis indicates that lensing magnification will likely not play a major role in cosmological parameter inference in spectroscopic surveys.

The observed galaxy power spectrum
In this section we finally discuss all the relativistic effects and their effect on the observed power spectrum. As we can see from Figs. 8 -9, the relevance of the different effects changes with the survey volume and redshift coverage. In Fig. 9, as discussed in the previous Section, each line shows the ratio between the RSD prediction and RSD plus  all the GR terms above in the Figure legend. For example the green line is the relative difference between the RSD term and RSD plus lensing and Doppler terms, and so on. At low redshifts the main contribution is due to the Doppler effects. This behaviour is well explained by the presence of the dependence on 1/(Hr) of such terms. In our low redshift survey, the Doppler terms are approximately 5% and 10% of the RSD ones for the monopole and the quadrupole respectively. They are therefore much larger than the lensing magnification. Indeed, lensing is an integrated effect from the source to the observer and its contribution growths by summing up several lensing deflections along the line of sight. Local and integrated terms are subdominant at such low redshift. Notice that all GR terms are proportional to P (k)k n for some n < 0, until the window function dominates.
At intermediate redshifts, between z min = 0.5 and z max = 1, lensing become comparable to the Doppler terms, and the other local and integrated terms also grow. For this survey, GR corrections are still sub-% for the monopole and approximately 10% for the quadrupole on the largest scales. Interestingly, in our setup, the GR effects are minimized for a survey at intermediate redshifts, as the Doppler effect starts to decrease and lensing magnification has not yet taken over.
Moving to the high redshift Universe, in our survey between z min = 1 and z max = 2 or z min = 1 and z max = 4 the lensing magnification dominates over all other terms, and the conclusions we reached in the previous section apply.
Our results indicate that for current and upcoming LSS surveys, cosmological inference from Baryon Acoustic Oscillations and the shape of the power spectrum on quasilinear scales is largely unaffected by GR effects, which are sub per mille corrections to the RSD term.
We conclude this section by investigating the accuracy of the effective redshift ap-  Figure 9. We show the ratio between the multipoles (monopole on top panels and quadrupole in the bottom ones) sourced by RSD with respect to the other relativistic effects. We add the effect one by one, such that the lensing line contains also RSD, the doppler line also lensing and RSD, and so on. In the Integral Constraint we include the same effects. The vertical lines correspond to the largest scales probed by the survey, i.e. 2π/ (2r(z max ) sin θ max ).
proximation once we include all the relativistic effects. Our findings are shown in Fig. 10, assuming the linear bias scales as b 1 (z)D(z) = 1.5. While we know that in principle the redshift evolution of the integrated terms, including lensing, is scale-dependent, because the latter are a relatively small correction on top of the standard RSD power spectrum the effective redshift approximation works extremely well also in the presence of relativistic corrections, with 5% differences in the monopole only for the very extended survey between z min = 1 and z max = 4 shown in the rightmost column 10 .

Conclusions
In this work we presented the first computation, in General Relativity, for the ensemble average of the Fourier space estimators traditionally used in the galaxy power spectrum analyses of spectroscopic surveys. We clarified the relation between what is measured and what is predicted and how different estimators effectively perform different compressions of the data. Since GR corrections are important on large scales, a meaningful calculation must include observational effects like the presence of an angular mask, of a redshift selection function, and of the Integral Constraint. We implemented the presence of an arbitrary window function in the exact way, highlighting which assumptions could be employed to simplify the computation in certain situations. A prerequisite for the GR calculation is the understanding of the IR divergences, and more generally of the IR sensitivity, of the three dimensional correlation function of the galaxy overdensity field. We showed for the first time that these effects are not present once all the terms in the expression for the galaxy number counts are taken into account, and the full calculation for any observable is finite and insensitive to the large 1.06 Figure 10.
We show the accuracy of the effective redshift approximation (according to Eq. (2.12)) by including all the relativistic effects. Similarly to Fig. 9 we add each effect one by one. The top panels refer to the monopole, while the bottom ones to the quadrupole. The vertical lines correspond to the largest scales probed by the survey, i.e. 2π/ (2r(z max ) sin θ max ).
scale cutoff of the theory. We were able to trace back this cancellation to the presence of the Weinberg adiabatic mode [6] and of the Consistency relations for LSS [63][64][65]. From these results, our calculations therefore inherit the underlying conditions of validity such as adiabatic and Gaussian initial conditions, and diffeomorphism invariance. Our expression for the observed power spectrum differs from the one in Ref. [28], with whom we therefore disagree on the interpretation of the relevance of the IR contributions to the power spectrum. Since our definition coincides with the power spectrum routinely measured in galaxy surveys we conclude that the power spectrum proposed in Ref. [28] does not necessarily correspond to an observable quantity.
We then computed the observed power spectrum in four different hypothetical surveys, covering one third of the sky at different redshifts. We first confirmed earlier work in the literature on wide angle RSD, which we found to be a few % of the total signal on very large scales. We also validated the series expansion approach to wide angle RSD, which is sub-% accurate once the first few terms in the series have been included. We then checked the accuracy of using an effective redshift to model the clustering signal, and found that for our choice of radial selection functions it has a negligible impact, unless in a very extended survey between z = 1 and z = 4. The validity of the effective redshift approximation depends on both the redshift evolution of the sample and the selection function, therefore any conclusions on more realistic galaxy samples than what we considered here should rely on the full machinery provided in this work.
We then showed for the first time the impact of lensing magnification on the multipoles of the observed power spectrum, finding it is not a significant effect at low-z but it can account for the 5−10% of the monopole and the quadrupole at large scales for sufficiently realistic high redshift surveys.
A summary of our results, including all GR effects, can be found in Fig. 8 -9. For low-redshift surveys the Doppler term is the most important GR contribution, up to 5−10% of the total power, while at intermediate and high redshift lensing magnification becomes more significant. Other GR effects like ISW, time delays, and local terms are in general negligible. Being the GR effects a small contribution to the observed multipoles, the effective redshift approximation remains valid in most cases considered.
There are several directions along which this work can be extended. One of our main motivation was to understand the importance of GR effects when constraining local PNG with scale dependent bias. We presented the expression for the average of the Fourier Space estimator, but we cannot yet quantitatively address any possible degeneracy of GR terms with PNG because we miss an expression for the covariance of the measured power spectrum including GR effects and the window function. The mask indeed makes the covariance of the power spectrum non diagonal on very large scales, and so far only approximate analytical calculations in the flat sky limit have been proposed [70]. Such an investigation is subject of work in progress by the authors [71].
Our implementation of the Yamamoto estimator uses the endpoint LOS, but several other choices are possible. In particular, for analyses aiming at the Doppler terms a symmetric choice of LOS, like the bisector, is preferable because it sets to zero purely geometric dipoles. Also in this case the power spectrum can be measured with FFTs [35,72], and all our calculations go through unchanged with the replacement of s 1 with the length of the new LOS. It is also well known that Yamamoto-like estimators are not statistically optimal on large scales [73]. It is therefore natural to ask how to incorporate GR effects in optimal quadratic estimators of the power spectrum, which are traditionally implemented using a flat-sky RSD model [74]. As a final remark we note that the full GR calculation is computationally much slower than the Newtonian one. However given the smallness of many GR terms, one could imagine resorting to the flat-sky approximation in some cases and obtain order of magnitudes speed up of the numerical calculations [75]. We plan to return to these interesting problems in forthcoming publications.

A Azimuthally symmetric window function
In this appendix we provide explicit expressions for the convolution with an azimuthally symmetric window function For this survey geometry Eq. (2.10) reduces to For this symmetric window function we can also pursuit a different path, without relying on the calculation of the multipoles of the correlation function. In the case of relativistic effects integrated from the source to the observer, such approach can lead to a simpler numerical evaluation. In this case we start considering the expectation value of the power spectrum estimator where in the second line we have changed the integration variables (d 3 s 1 d 3 s 2 → d 3 s 1 d 3 s). By explicitly using the symmetry of the survey geometry we obtain where we have introduced dθ Θ x cos θ + cos θ 1 (A.10) We can see that this simplified approach fully agree with Eqs. (2.9-2.10). We can easily relate Eq. (A.9) with Eq. (2.9) through In particular, for the monopole (L = 0) we have

B Relativistic correlation function
The full relativistic correlation function can be written as and we have introduced The functions J ni are defined in this Appendix. In this section we summarize the relativistic contribution to the full-sky correlation function. The standard Newtonian contribution has been already derived in Section 5.1. To be consistent with previous derivations [60], we present them in terms of the system of coordinates (s 1 , s 2 , cos θ =ŝ 1 ·ŝ 2 ) 11 .

B.1.2 Doppler
We consider now the contribution of the Doppler term (including the peculiar velocity of the observer). By following the same approach we find

B.1.3 Local Gravitational Potential
The contribution of the local gravitational potential terms is determined by

B.1.4 Integrated Gravitational Potential
For the integrated terms we obtain

B.2 Cross-correlations
Here we present the different cross-correlations. In the final result we need to account for the permutation on the two different source positions.

B.2.1 Standard Newtonian x Lensing
We consider the correlation between the standard Newtonian terms and the lensing magnification.

C The observer's velocity
The numerical results in section 5 are computed in a frame determined by v obs = 0. We now want to investigate the contribution of the observer velocity in more detail. In general, the correlation function will contain terms proportional to the variance of the velocity field v || (2 + 1) φ 2 (k) g (θ max ) (C.3) We can see that if s max s min , the contribution to the power spectrum of the autocorrelation of the peculiar velocity evaluated at the observer position peaks at k ∼ π/s max and P v || . (C.9) Therefore the contribution at the largest scales is given by the square of the Hankel transform of the window function together with some geometrical factor due the partial sky coverage. Interestingly, in full-sky this effect is purely sourced by the dipole of the correlation function. This is a spurious dipole induced by the observer motion inside a sphere defined by the full-sky window function in terms of real space coordinates. However, in a real survey the window function is determined in redshift space and therefore we need to account for its relation to the real space coordinates φ(z 1 ) φ(s 1 ) + dφ(s 1 ) ds 1 ds 1 dz δz 1 . This is the so-called Kaiser rocket effect [68], and it has been also generalized in a relativistic framework in Ref. [76]. Fig. 11 shows the effect of the observers' velocity on the large scale monopole of the power spectrum computed from C.3. The shape of this contribution is completely determined by the angular and radial selection functions, as one can realize by noticing both auto-and cross-terms have the same shape.
So far we have implicitly assumed that the velocity field at the observer v obs is a random variable drawn by the linear Gaussian distribution of cosmological perturbation statistics. However, our motion with respect to the CMB frame can be measured from the CMB dipole [77][78][79][80] and from the modulation and aberration of the CMB anisotropies [81,82]. Our velocity with respect to the CMB is sourced by a combination of the linear gravitational field, usually identified with the Local Group Velocity, and short-scale non-linear effects, see e.g. [83]. In cross-correlating the observer velocity with the perturbations at the source position we need to account only for its linear component.
If the observer velocity is known we can include its effect in our estimator. By assigning the measured value to v obs , we are in practise drawing a single cosmological realization [61,84]. In this case we have (C.10) Now we compute from which it follows (C.13) The final result reads as