Consistency of the Parkes Pulsar Timing Array Signal with a Nanohertz Gravitational-wave Background

Pulsar timing array experiments have recently reported strong evidence for a common-spectrum stochastic process with a strain spectral index consistent with that expected of a nanohertz-frequency gravitational-wave background, but with negligible yet non-zero evidence for spatial correlations required for a definitive detection. However, it was pointed out by the Parkes Pulsar Timing Array (PPTA) collaboration that the same models used in recent analyses resulted in strong evidence for a common-spectrum process in simulations where none is present. In this work, we introduce a methodology to distinguish pulsar power spectra with the same amplitude from noise power spectra of similar but distinct amplitudes. The former is the signature of a spatially uncorrelated pulsar term of a nanohertz gravitational-wave background, whereas the latter could represent ensemble pulsar noise properties. We test the methodology on simulated data sets. We find that the reported common process in PPTA pulsars is indeed consistent with the spectral feature of a pulsar term. We recommend this methodology as one of the validity tests that the real astrophysical and cosmological backgrounds should pass, as well as for inferences about the spatially uncorrelated component of the background.


INTRODUCTION
Pulsar timing array (PTA) experiments pursue the goal of detecting nanohertz-frequency gravitational waves through temporal and spatial cross-correlation of pulse arrival times from millisecond radio pulsars. The primary target sources of such signals are coalescboris.goncharov@me.com ing supermassive binary black holes separated by less than ∼ 0.1 pc. Nanohertz gravitational waves produce correlations in the timing residuals between the measured arrival times and the arrival times predicted by the deterministic pulsar timing models (Edwards et al. 2006). Recent searches for the stochastic gravitational arXiv:2206.03766v2 [gr-qc] 21 Jun 2022 wave background by NANOGrav 1 (Arzoumanian et al. 2020), PPTA 2 (Goncharov et al. 2021a), EPTA 3 (Chen et al. 2021) and IPTA 4 (Antoniadis et al. 2022) reported evidence for the "common-spectrum process", the same power-law component in Fourier spectra of timing residuals. The spatial correlations necessary to claim a detection originate from the so-called Earth term of the gravitational wave background (Hellings & Downs 1983). The pulsar term of the background arises from the passage of gravitational waves near the pulsars and only manifests as a spatially uncorrelated process with the same spectrum of temporal correlations in all of the pulsars. It is expected that evidence for the commonspectrum process, to which both of the terms contribute, would precede the detection of the gravitational wave background (Romano et al. 2021;Pol et al. 2021). Goncharov et al. (2021a), on the other hand, pointed out that the methodology employed by Arzoumanian et al. (2020) does not allow to distinguish between common and similar noise processes in pulsars. So, it is unclear if the recent searches have detected a bona-fide gravitational wave background.
One might ask why do we need to precisely determine the degree of similarity in pulsar spectra if the detection mostly depends on spatial correlations anyway? There are several reasons. Noise processes with similar and not common noise spectra could arise from e.g. spin noise, stochastic irregularities in rotation of the pulsars (Shannon & Cordes 2010). In fact, the spin noise model discussed in Meyers et al. (2021) suggests a spectral index of timing residuals to be γ = 4 (where the power spectral density is modelled as P (f ) ∝ f −γ ), which would be difficult to distinguish from the nanohertz gravitational wave background expected from binary supermassive black holes, γ = 13/3, with current PTAs (see e.g. Figure 13 in Renzini et al. 2022, to inspect measurement uncertainties). Empirical models for spin noise in millisecond pulsars predict timing noise having spectral indices and amplitudes similar to that expected of the gravitational wave background. If the reported signal in PTAs is not common and thus not of a gravitationalwave origin, it could have interesting implications for spin noise and hence for neutron star physics (Melatos & Link 2014). Furthermore, the amplitude of the commonspectrum process is in tension with several predictions 1 The North American Nanohertz Observatory for Gravitational Waves (McLaughlin 2013) 2 The Parkes Pulsar Timing Array (Manchester et al. 2013) 3 The European Pulsar Timing Array (Desvignes et al. 2016) 4 The International Pulsar Timing Array (Verbiest et al. 2016), a consortium of all pulsar timing arrays across the world for the stochastic background amplitude. Recent work by Izquierdo-Villalba et al. (2022) suggests that the gravitational wave background of the same strain amplitude as the common-spectrum process is challenging to produce in theory given the constraints from the quasar bolometric luminosity functions or the local black hole mass function. Casey-Clyde et al. (2022) find the local number density of supermassive binary black holes inferred from the amplitude of the common-spectrum process, assuming it to originate from the stochastic background, to be five times larger than theoretical predictions. So, even if PTAs are observing hints of a gravitational wave background, inferences based on the common-spectrum process might be contaminated by pulsar-intrinsic noise, and it is important to clarify to what degree it is true. Such inferences can be promising because the constraints on the background amplitude from inter-pulsar correlations lag behind the ones based on spatial auto-correlations for most pulsar timing arrays (Pol et al. 2021).
Whereas previous analyses of time-correlated noise in pulsars were based on identifying (e.g. Lentati et al. 2016;Goncharov et al. 2021b) or modelling Goncharov et al. 2020;Chalumeau et al. 2022) noise power spectra, here we focus on modelling Bayesian priors using the second PPTA data release (PPTA-DR2, Kerr et al. 2020). The methodology is described in Section 2. In Section 3 we outline the results and we summarise the conclusions in Section 4. The timing residuals comprise a number of stochastic processes. Those that are not due to gravitational waves are considered as noise. Temporally correlated processes are called red, whereas processes without temporal correlations are referred to as white. The power spectral density of red processes is usually assumed to be a power law: where the amplitude A is in the units of strain amplitude of the stochastic gravitational wave background at f = yr −1 and −γ is a spectral index. The frequency in Fourier spectra of fluctuations in timing residuals and concurrently a frequency of a gravitational wave that would have induced these fluctuations is denoted f . To clarify, for white noise P is a constant, it does not depend on f . Red process spectra are modelled at n c harmonically related frequencies that are multiples of the reciprocal of the observation span. Values of n c as well as priors for A and γ and the list of noise terms found in PPTA DR2 were published in Goncharov et al. (2021a). Additional details on the noise models in PPTA DR2 are outlined by Goncharov et al. (2021b). All noise, timing model parameters and signals of interest are modelled using the multivariate Gaussian likelihood (Lentati et al. 2014;Arzoumanian et al. 2016;Taylor et al. 2017) and Bayesian posterior sampling. Without accounting for spatial correlations, the total PTA likelihood is a product of individual pulsar likelihoods. In particular, for a common-spectrum process with A c and γ c , the total PTA likelihood is: where θ = (θ 1 , ..., θ N ) are parameters of models that describe data of individual pulsars d = (d 1 , ..., d N ), including pulsar-intrinsic "spin" noise parameters A k , γ k and parameters of other noise terms which are not of interest to us for the purpose of this work. Some of these parameters are marginalized over analytically (van Haasteren et al. 2009) and others numerically 5 , so θ is omitted from the following equations. Both for the common-spectrum process found in PPTA DR2 and for a classical model of the stochastic gravitational wave background from circular supermassive black hole binaries, γ c = 13/3. We will fix γ c at this value throughout our analysis. We provide additional remarks on the data and analysis in Appendix A.

Importance sampling for pulsar timing arrays
When it is computationally challenging to evaluate a likelihood or to include several measurements in one likelihood, many data analyses resort to the so-called importance sampling (Chapter 10 in Gelman et al. 1995;Payne et al. 2019). The idea is that the analysis is first carried out assuming a proposal distribution -a likelihood or a prior which is easy to evaluate. Next, posterior samples obtained from this first step are used to evaluate the target distribution -a likelihood or a prior that represents the model we are ultimately interested in. Finally, if proposal samples are collected from subsets of a total data set, they can be combined into a single likelihood through the procedure known as posterior recycling . To sum up, importance sampling revolves around reweighing of likelihoods and priors.
Let us represent our signal likelihood given by Equation 2 through target likelihoods with the common-spectrum process, L(d k |A k,j , γ k,j , A c , γ c ), and the proposal likelihood without the common-spectrum process, L(d k |A k,j , γ k,j ). Both the proposal and the target likelihoods will represent individual pulsars, whereas the total signal likelihood includes contributions from all pulsars: Here, the sum is over the n k fiducial posterior samples j for pulsar k, generated for the proposal distribution. The product is over pulsars. The posterior samples include (A k , γ k ), the amplitude and the spectral index of the red noise for pulsar k. We use these fiducial samples as the "proposal distribution" in order to explore a more complicated likelihood (the "target distribution"). The Bayesian evidence Z(d k ) is the integral of the (proposal) likelihood over the prior. Importance sampling is similar to the factorized likelihood approach , where the amplitude of the common-spectrum process is obtained through the multiplication of posterior distributions obtained in analyses of individual pulsar data.

Common versus quasi-common
In the case of a common-spectrum process, which could originate from the gravitational wave background, nature provides us with one A c in all pulsars. In the language of a hierarchical Bayesian statistics (Chapter 5 in Gelman et al. 1995), A c,k for a pulsar k is drawn from a delta function distribution, described by a hyper-parameter that determines the position of a delta function distribution along possible values of A c . The standard null hypothesis is that there is no commonspectrum process and pulsar data sets are described by individual pulsar noise. In this work, we propose an alternative null hypothesis where A c,k are drawn from a Gaussian distribution described by hyper-parameters σ A , µ A and not from the delta function distribution described by A c . So, in Equation 3 we introduce multiple possible values of A c,k and marginalize over the likelihood of such a noise process over the Gaussian 6 prior π(A qc,k |µ A , σ A ) with hyper-parameters µ A and σ A being the mean and the standard deviation of the Gaussian prior, respectively: We provide a derivation in the Appendix C. For σ A = 0, π(A qc,k |µ A , σ A ) reduces to the delta function. Therefore, the likelihood L(d|A c ) in Equation 3 is a subset of the likelihood L(d|µ A , σ A ) in Equation 4. We refer to noise processes with σ A = 0 as quasi-common, which means that noise spectra in timing array pulsars are similar but not common. The measurement of σ A and µ A described in this Section is also applicable to modelling pulsar spin noise alone, which is broadly-distributed in A k . Instead, here we only infer parameters on the second common term A qc to map the measurements directly to the result of the gravitational wave search with PPTA DR2 (Goncharov et al. 2021a). The principle of the quasi-common noise model is similar to the dropout analysis (see Figure 9 in Arzoumanian et al. 2020), it quantifies how constraints on the common-spectrum process with one of the pulsars are consistent with constraints from the rest of the pulsars. As Bayes factors, high dropout factors for pulsars support the commonspectrum process, whereas low dropout factors that approach zero illuminate pulsars that do not have the same spectra as the others. We calculate the integral over A qc,k for every evaluation of the likelihood L(d|µ A , σ A ) in Equation 4 the following way.
First, we pre-compute L(d k |A k,j , γ k,j , A qc,k ) on a grid of A qc,k for n k posterior samples for N pulsars. Next, for every parameter sample (µ A , σ A ), we evaluate the prior for the grid of A qc,k , multiply it by the pre-computed likelihood L(d k |A k,j , γ k,j , A qc,k ), and evaluate the integral over the product numerically. Because parameter estimation is traditionally performed on the (base-10) logarithm of the red process amplitude, in practice we measure µ log 10 A and σ log 10 A instead of µ A and σ A .

RESULTS
We measure µ log 10 A and σ log 10 A in PPTA DR2 (Kerr et al. 2020), with a particular emphasis on σ log 10 A which distinguishes a common-spectrum process from a quasi-common noise process, as pointed out in Sec-tion 2.3. We find Savage- Dickey (1971) natural log Bayes factor in favor of σ log 10 A = 0 against other σ log 10 A within the uniform prior to be −0.16. This means that σ log 10 A is consistent with zero. We further calculate an upper limit on σ log 10 A < 0.44 at 95% credibility. Thus, we demonstrate that the data are more consistent with a common-spectrum process hypothesis than the extended quasi-common process model. We measure µ log 10 A = −14.71 +0.08 −0.14 , which is consistent with log 10 A = −14.66 ± 0.07 found in Goncharov et al. (2021a). The result of parameter estimation is provided in Figure 1. We note that the maximum a posteriori value of σ log 10 A differs from zero, which may represent a noise fluctuation.
We checked that the result is robust to the exclusion of three pulsars with particular noise properties: PSRs J0437−4715, J0711−6830 and J1643−1224. PSRs J1643−1224 and J0711−6830 show spin noise with unusual spectral indices and so we consider them population outliers. These two pulsars also do not contribute significantly to the common-spectrum process based on the dropout analysis ( Fig. 2 in Goncharov et al. 2021a), which is why it is acceptable to exclude them. PSR J0437−4715 shows excess noise that could, in principle, marginally affect measurement of the common-spectrum process, and that does affect the inference of spatial correlations (see the left panel of Figure 3 in Goncharov et al. 2021a).
Additionally, we tested our methodology using simulated data sets. The details of the simulations can be found in the Appendix B. One important issue that we identified is that it is essential to include the commonspectrum process in the proposal likelihood, not only in the target likelihood when testing the quasi-common noise hypothesis. If it is excluded, the measurement of σ log 10 A will always peak at zero, even if data sets were simulated with e.g. σ log 10 A = 2. This is because the proposal likelihood does not sufficiently match the target likelihood. Non-inclusion of the separate commonspectrum process term will prevent accumulating posterior support in two areas of the parameter space that correspond to the spin noise and the common-spectrum process, respectively. Nevertheless, our methodology is very sensitive to sub-threshold red-noise contributions, which is generally true for all Bayesian inference that combines data from multiple measurements (e.g., Goncharov et al. 2020). In particular, even though two Posterior probability Figure 1. Comparison of common-spectrum and quasi-common spectrum processes in the PPTA DR2. Left: marginalized posterior for σ log 10 A . The value is consistent with zero, which means that the common-spectrum hypothesis holds for the PPTA data. Vertical dashed line is the upper limit on σ log 10 A at 95-% credibility. Right: full posterior for µ log 10 A and σ log 10 A . Vertical dashed line corresponds to the measurement of log 10 A of the common-spectrum process in PPTA DR2 (Goncharov et al. 2021a). Our inference of the common-spectrum process amplitude is consistent with the previous measurement considering generalization of the model and the use of likelihood reweighting. Three closed lines correspond to the standard 1-, 2-and 3-σ levels. In both panels, the dash-dotted lines show the measurements performed without three PSRs with special noise properties discussed in Section 3: J0437−4715, J0711−6830, and J1643−1224. Removing these pulsars from the analysis increases uncertainty levels, as expected.
power law processes -(quasi-)common-spectrum process and pulsar-intrinsic spin noise -are not individually resolved in single pulsars, Bayesian inference with the total data set distinguishes them when models match the simulation.

CONCLUSIONS
We have shown that a pulsar timing array data set, in our case PPTA DR2, contains a red process that has spectra consistent with stochastic time-series realizations with the single power-law amplitude by measuring σ A consistent with zero. Note, this conclusion is (a) made under the assumption that the red process powerlaw index is 13/3, which is supported by the PPTA data, but (b) not yet applicable to all available PTA data sets. This discrepancy can potentially be resolved through further data combinations, for example by the IPTA. The identified common-spectrum process in the PPTA DR2 could therefore be the spatially-uncorrelated component of the stochastic gravitational wave background. We expect the variance in measured pulsar spectra from spatial correlations of the pulsar term to exceed that from different time-series realizations of the Earth term, and thus the uncertainty in σ A to be dominated by the pulsar term. A more detailed investigation of the contribution of both terms to σ A is a subject of a follow-up study.
The methodology of previous nanohertz gravitational wave searches might have led to identifying a commonspectrum process when none is present in the data (Goncharov et al. 2021a). Specifically, incorrect conclusions can be caused by a mismatch between uniform noise priors on A and γ in the models and the clustering of these parameters in real data. In particular, for the case of a single power-law red noise term in pulsars, evidence for the common-spectrum process disappears when the distribution of true parameters of such noise processes matches the Bayesian priors. The demonstration of the above two points is to be provided by A. Zic et al. (in prep.) simulations where the apparent common-spectrum process arises in a variety of scenarios that only contain pulsar-intrinsic spin noise, whereas the spurious emergence of spatial correlations in such data sets is very unlikely. Determining that the identified noise process is quasi-common could either mean that it is not the gravitational wave background or that the assumed uniform prior distributions for pulsar-intrinsic noise need to be replaced by more realistic priors. The latter may inform about stochastic irregularities in rotational properties of neutron stars and thus about neutron star physics (Melatos & Link 2014).
Neither our test nor any other test will be able to confirm the common-spectrum hypothesis because it is a matter of measuring the width of the distribution, and we would need an infinitely small measurement uncer-tainty to rule out all σ A other than zero. Thus, the techniques we introduce allow to perform a consistency test and to infer common noise properties, but cannot be used to detect the stochastic background. Once the gravitational wave background is detectable through spatial correlations, further modelling of the priors of pulsar-intrinsic spin noise with the techniques we outlined will be required to disentangle these terms. Further simulation study will be useful to test the fidelity of the presented methods under different conditions.
We would also like to point out that it is challenging to extend the methodology to allow γ to vary and thus to measure µ γ and σ γ simultaneously with µ A and σ A because the accuracy of numerical integration on a grid will decline for a higher-dimensional problem and the computational burden will significantly increase. We foresee several other modifications of the outlined methodology. For example, one could model the distribution of common-process amplitudes A linearly (rather than logarithmically), as well as to fix the amplitude and fit for the distribution of power-law indices. Moreover, one could apply our approach to fitting the distribution of pulsar-intrinsic noise amplitudes and power-law indices, which would result in larger σ A and σ γ . Furthermore, in case of finding stronger evidence for quasi-common noise with σ A > 0, it is possible to test a range of functional forms outlining the distribution of A other than the Gaussian distribution we have assumed.
From the perspective of future validation of the detection of the stochastic background, the tests we propose are complementary to other work. The dropout analysis in Arzoumanian et al. (2020) is developed to find outliers in the common-spectrum process. Taylor et al. (2013) introduced interpolant-based modeling for spatial correlations, which is important to ensure that pulsars exhibit precisely the Hellings-Downs correlations and not something else. Johnson et al. (2022) examine a bias from using a finite number of pulsars. Taylor et al. (2022) develop a factorized likelihood approach for cross-validation of measurements between subsets of timing array pulsars. Romero-Shaw et al. (2022) review several other tests to spot incorrectly specified models. Because, as outlined by Goncharov et al. (2021b), the background noise in pulsar timing arrays is often not white and Gaussian as assumed by current models and simulations, we suggest to study the effect of this noise on gravitational wave searches. the Australia Telescope, which is funded by the Commonwealth Government for operation as a National Facility managed by CSIRO. This paper includes archived data obtained through the CSIRO Data Access Portal We emphasise that the sources of noise, especially, the red-noise terms, should be modelled, to ensure the correctness of the analysis. The second data release of the PPTA contains several sources of red noise which were found and identified in Goncharov et al. (2021b) based their attribution to telescope observing band or system as well as on their radio frequency dependence. Goncharov et al. (2021b) also identified non-stationary sources of noise which affect red noise measurements if not modelled. Incorrect modelling of pulsar timing model parameters such as spin frequency derivatives or instrumental phase jumps can also appear as red noise. Pulsar timing models for PPTA DR2 in coordination with the noise analysis were performed by Reardon et al. (2021). The data set and the code is available at github.com/bvgoncharov/ppta dr2 noise analysis.
Parameter estimation and Bayesian evidence evaluation for individual pulsars is performed with nested sampling (Skilling 2006) implemented in pypolychord by Handley et al. (2015). White noise parameter estimation is performed with the ptmcmcsampler (Ellis & van Haasteren 2019). Pulsar likelihoods are modelled using the code enterprise ) and linked to pypolychord with bilby (Ashton et al. 2019), using the code available at github.com/bvgoncharov/enterprise warp.

B. VALIDATING THE GENERAL QUASI-COMMON NOISE MODEL
We performed a study using a simulated data set to confirm the validity of the importance sampling method as well as the quasi-common spectrum process model, for which the common-spectrum process is a limiting case. We note that the common-spectrum process hypothesis for a given pulsar is described by a likelihood from Equation 2, approximated with the importance sampling by Equation 3. This likelihood is then generalized in Equation 4 to represent the quasi-common spectrum process hypothesis.
We simulated a data set with 26 pulsars, as in PPTA DR2, observed for 2555 days, with the pulse arrival time errors of σ ToA = 0.1 µs and the white noise with variance proportional to σ ToA . We then simulate two red noise processes described by log 10 A and γ, as we expect from the stochastic gravitational wave background and pulsar noise. The simulations are based on 30 frequencies, which corresponds to the Fourier basis to model red noise in previous analyses. One red noise process for each pulsar is drawn from the truncated Normal distribution N (µ, σ) with µ log 10 A = −16.3, µ γ = 5, σ log 10 A = 2, σ γ = 2. Another red noise process, the quasi-common noise, has a fixed γ = 13/3 and log 10 A is drawn from the truncated Normal distribution with µ log 10 A = −13.3 and σ log 10 A = 0.5. Both Normal distributions are truncated to the edges of uniform priors used in Goncharov et al. (2021a) to avoid edge effects. The values were chosen so that pulsar-intrinsic red noise was below the detection threshold of some pulsars, as in PPTA DR2 where only 9 pulsars of 26 show evidence for spin noise. Moreover, quasi-common noise realizations have simulated A of the order of 10 −14 -10 −13 , as six realizations of the spin noise. The same number of pulsars in PPTA DR2 have the inferred common process amplitude and the spin noise amplitude of the same order of magnitude. The results of parameter estimation on the simulated data set are presented in Figure 2. Both the µ log 10 A and the σ log 10 A are consistent with the simulated values. Note, we tested that this data set shows strong evidence for the common-spectrum process when using the priors from Goncharov et al. (2021a), and yet with our new generalized model we correctly infer that it is not the case because σ log 10 A = 0. More precisely, we find log B CP ∅ > 13.5, where ∅ is the noise-only null hypothesis that includes only white and red noise, as per the simulations.
The noise in real data is more sophisticated than in the simulations, but it is not necessary to represent the whole complexity of the data to demonstrate the validity of the approach. We defer exhaustive tests of the methodology to future work. Our method is applicable to any data set under the assumption that physically relevant red noise processes (from interstellar propagation effects, instrumental noise, etc.) are separated from the common-spectrum process by including them in the models. Moreover, we also find that the current simulation study is robust to the choice of priors for pulsar-intrinsic noise, with the only caveat being that the proposal likelihood should include all red noise terms in pulsars to allow parameters of both processes to be sampled. We tested that the simulation works for a reduced case with only one red noise process per pulsar, correctly recovering position and width of a Gaussian distribution of pulsar red noise parameters. We also trialed it for the case where the maximum probability density of the distributions of pulsar spin noise amplitudes (µ log 10 A = −14.3, σ log 10 A = 1.3) and quasi-common noise amplitudes (µ log 10 A = −13.8, σ log 10 A = 0.4) are similar and the distributions have a broader overlap. Posterior probability Figure 2. Searches for a quasi common-spectrum process in a simulated data set. Left: marginalized posterior for σ log 10 A . The value is inconsistent with zero, which means that the signal that would have been interpreted as the common-spectrum process with the standard methods is only a quasi-common spectrum process. Based on this result, the gravitational wave origin of the signal can be ruled out. The simulated value is represented by the vertical dotted line and cumulative 1-σ credible levels are represented by a shaded region. Right: full posterior on µ log 10 A and σ log 10 A . The vertical and the horizontal lines correspond to the simulated values. Three closed lines correspond to the standard 1-, 2-and 3-σ levels.
Representing one proposal likelihood through the posterior P(θ k |d k ), the evidence Z(d k ), and the prior π(θ k ), we obtain L(d|µ, σ) = N k=1 Z(d k )P(θ k |d k ) L(d k |θ k , µ, σ) where the prior canceled out. Next, we approximate the integral for a k'th pulsar with a sum over n k posterior samples, θ k,j , providing where, omitting θ k,j , L(d k |µ, σ) = L(d k |A c,k )π(A c,k |µ, σ)dA c,k .
The approximation of an integral via the sum of posterior samples is explained for various applications in Hogg & Foreman-Mackey (2018) and .