The second data release from the European Pulsar Timing Array III. Search for gravitational wave signals

We present the results of the search for an isotropic stochastic gravitational wave background (GWB) at nanohertz frequencies using the second data release of the European Pulsar Timing Array (EPTA) for 25 millisecond pulsars and a combination with the first data release of the Indian Pulsar Timing Array (InPTA). We analysed (i) the full 24.7-year EPTA data set, (ii) its 10.3-year subset based on modern observing systems, (iii) the combination of the full data set with the first data release of the InPTA for ten commonly timed millisecond pulsars, and (iv) the combination of the 10.3-year subset with the InPTA data. These combinations allowed us to probe the contributions of instrumental noise and interstellar propagation effects. With the full data set, we find marginal evidence for a GWB, with a Bayes factor of four and a false alarm probability of $4\%$. With the 10.3-year subset, we report evidence for a GWB, with a Bayes factor of $60$ and a false alarm probability of about $0.1\%$ ($\gtrsim 3\sigma$ significance). The addition of the InPTA data yields results that are broadly consistent with the EPTA-only data sets, with the benefit of better noise modelling. Analyses were performed with different data processing pipelines to test the consistency of the results from independent software packages. The inferred spectrum from the latest EPTA data from new generation observing systems is rather uncertain and in mild tension with the common signal measured in the full data set. However, if the spectral index is fixed at 13/3, the two data sets give a similar amplitude of ($2.5\pm0.7)\times10^{-15}$ at a reference frequency of $1\,{\rm yr}^{-1}$. By continuing our detection efforts as part of the International Pulsar Timing Array (IPTA), we expect to be able to improve the measurement of spatial correlations and better characterise this signal in the coming years.


Introduction
The first direct gravitational wave (GW) detection (Abbott et al. 2016) marked the beginning of a new era in the exploration of the Universe.Although terrestrial interferometers such as LIGO, Virgo, and KAGRA are sensitive to GWs at kilohertz frequen-According to the established cold dark matter cosmological scenarios, galaxy formation proceeds in a hierarchical fashion, with small galaxies merging with each other over cosmic history to build progressively larger structures (White & Rees 1978).If these galaxies host SMBHs in their centres, in the aftermath of the merger, SMBH binaries (SMBHBs) will inevitably form (Begelman et al. 1980).Adiabatically inspiralling SMBHBs are anticipated to be the loudest sources of GWs at nanohertz frequencies.The incoherent superposition of their emitted GW signals forms a stochastic GW background (GWB) whose amplitude and spectral index relate to the galactic merger history of the Universe and to the dynamical properties of the emitting binaries (Jaffe & Backer 2003;Sesana et al. 2008;Sesana 2013).Besides SMBHBs, a stochastic GWB can be produced by a number of other physical processes potentially occurring in the early Universe, including non-standard inflationary fields (Guzzetti et al. 2016), first-order phase transitions (Caprini et al. 2010), and cosmological defects such as a network of cosmic strings (Damour & Vilenkin 2000).A comprehensive overview of these phenomena can be found in Caprini & Figueroa (2018).
Currently, this very low-frequency GW regime can only be accessed with pulsar timing arrays (PTAs, Foster & Backer 1990).The technique of pulsar timing relies on the exceptional rotational stability of a particular population of neutron stars, the millisecond pulsars (MSPs).The times of arrival (TOAs) of radio pulses observed at the telescope are measured precisely using maser clocks referenced to the international atomic time.A model, known as a phase-connected timing solution, is then used to account for every rotation of the pulsar for the entire series of TOAs (see Lorimer & Kramer 2004, for a detailed explanation).The pulsar timing technique has allowed high-precision measurements that have led to several significant breakthroughs, including the first indirect detection of GWs through the measured orbital shrinkage of PSR B1913+16 (Taylor & Weisberg 1989).In a PTA a network of the most stable MSPs is observed regularly and the TOAs are modelled.It is within the small deviations from the model (the residuals) that nanohertz GWs can be searched for.
The idea of using MSPs to detect nanohertz GWs was first proposed by Sazhin (1978) and Detweiler (1979).The distortions in spacetime caused by a GW propagating over the Earth or over a pulsar lead to stochastic advances or delays in the TOAs.Astrophysical sources produce GWs that cause larger delays over longer timescales, that is, a temporally correlated (red) signal.However, disentangling the GW signal from other red noise sources, such as variations in the interstellar medium (ISM) or intrinsic pulsar spin noise, with a single pulsar is impossible.Foster & Backer (1990) were the first to suggest a PTA as a method to overcome this problem.Not only would the GW signal result in a common red signal (CRS) in all pulsars, but the signal would be spatially correlated across the sky.This correlation is related to the quadrupolar nature of GWs.Although GWs passing over the individual pulsars would not be correlated, those propagating over the Earth would be.When the degree of correlation for each pair of pulsars is plotted against their angular separation, this results in the Hellings and Downs curve (HD, Hellings & Downs 1983).It is this HD curve that allows us to distinguish the GWB from other potential sources of correlated signal (e.g. the modelling of Earth's motion in the Solar system and local clock instabilities, Tiburzi et al. 2016).
The European Pulsar Timing Array (EPTA, Kramer & Champion 2013) was formed in 2004 to facilitate the detection of GWs.However, it uses pulsar observations taken well before its formal creation, some of which were specifically for PTA-style analysis.The EPTA data are provided by some of the largest radio telescopes in Europe: the Lovell telescope at the Jodrell Bank Observatory, the Nançay decimetric radio telescope, the Westerbork synthesis radio telescope, the Effelsberg 100 m radio telescope, and the Sardinia radio telescope.These telescopes supply independent data sets at a range of observing frequencies (see the EPTA Collaboration 2023), but since 2009 they have also worked together as the Large European Array for Pulsars (LEAP), a coherently phased interferometer with an equivalent dish diameter of up to 194 m (Bassa et al. 2016).
The earliest EPTA data date back to 1994, with most pulsars having over 15 years of data.The bandwidth available and the backends used to record the data have improved over the years.While only some coherently dedispersing backend systems were used initially, considerable upgrades were made around 2005-2010, when most telescopes switched to broadband, coherent dedispersion systems.In addition to offering a wide range of observing frequencies and high observing cadence (with weekly or even shorter spacings between successive observations), multiple telescopes also allow the data sets to be checked against each other, highlighting any local issues.This is crucial for reliability.
THE EPTA is a founding member of the International Pulsar Timing Array (IPTA, Verbiest et al. 2009), along with the Parkes Pulsar Timing Array (PPTA) which uses the Parkes telescope (Manchester 2006;Hobbs 2013), and the North American Nanohertz Observatory for Gravitational Waves (NANOGrav), which uses data from the Arecibo observatory, Green Bank radio telescope, and Very Large Array (Jenet et al. 2009;Arzoumanian et al. 2015).Recently, the Indian Pulsar Timing Array (InPTA, using the Giant Meterwave Radio Telescope, Joshi et al. 2018) has joined the IPTA as a full member, while the Chinese Pulsar Timing Array (CPTA, using the Five Hundred Meter Spherical Telescope, Jiang et al. 2019), and the MeerTIME programme (using the MeerKAT telescope, Bailes et al. 2020) have become observer members.
The first EPTA data release (DR1) was made in 2015 (Desvignes et al. 2016) and was used to place an upper limit on the GWB (Lentati et al. 2015).However, during the analysis of the six best pulsars in the array, a weak CRS was observed in the data.An analysis of the same pulsars in 2021 (Chen et al. 2021) allowed for a direct comparison with the earlier work and clearly showed that not only was the CRS still present in the data, but also that its properties could be significantly better constrained.This CRS is consistent with the findings of the NANOGrav 12.5year data set (Arzoumanian et al. 2020), PPTA DR2 (Goncharov et al. 2022a), and IPTA DR2 (Antoniadis et al. 2022).However, six pulsars do not provide enough pairs to sufficiently sample the HD curve, the crucial signature of GWs.To this end, EPTA Data Release 2 (DR2, the EPTA Collaboration 2023) has been created using 25 MSPs optimally selected among those timed by the collaboration, following the method described in Speri et al. (2023).
In this paper, we present the results of the search for a stochastic GWB at nanohertz frequencies in the EPTA DR2.A summary of the data set and noise models is given in Section 2. For more details, we refer interested readers to the companion papers (the EPTA Collaboration 2023; the EPTA and InPTA Collaborations 2023), respectively.In Section 3 we briefly review our analysis methods, which are similar to those used in the sixpulsar analysis presented in Chen et al. (2021).Our main results, including a comparison of the full DR2 data set against a reduced data set that includes only the new generation backend systems, are presented in Section 4. In Section 5 we discuss the addition of InPTA data and its impact on the GWB search, and draw our conclusions in Section 6.

Data set and noise models
The DR2 includes observations of 25 pulsars selected from the DR1 source list.These data were collected with six EPTA telescopes, including LEAP.The DR2 data set is a combination of data from DR1 with those recorded with a new generation of data acquisition systems which offer significantly wider bandwidth and thus greater sensitivity.The DR2 data set offers a variety of time spans for different pulsars, from a minimum of 14 to a maximum of 25 years.That data set also has a broad observing frequency coverage, starting from (∼300 MHz) and extending up to (∼4 GHz).A subset of DR2, using data for six pulsars was used for the common red noise process search presented in Chen et al. (2021).Since then, our multi-telescope data have allowed us to detect and correct an issue in the clock corrections applied to the data collected with the 'NUPPI' pulsar backend (Cognard et al. 2013) at the Nançay Radio Telescope.More details of the EPTA DR2 data set and timing analysis results can be found in our data release paper (the EPTA Collaboration 2023).
For the analysis presented in this paper, we used two versions of the DR2, the full data set and a truncated version that features data collected with the new generation of pulsar backends only.These are extended by incorporating data from the first InPTA data release (Tarafdar et al. 2022) for an overlapping set of ten pulsars.The InPTA data set was obtained using the upgraded Giant Metrewave Radio Telescope (uGMRT) from MJD 58235 to 59496 covering about 3.5 years.It complements the EPTA data with simultaneous observations in the 300-500 MHz and 1260-1460 MHz bands and adds about 0.7 years to the EPTA time span.To summarise, we analyse the following four data sets, additional details of which can be found in the EPTA data release paper (the EPTA Collaboration 2023) and the accompanying pulsar noise analysis paper (the EPTA and InPTA Collaborations 2023): 1. DR2full.The complete EPTA DR2 data set, covering 24.7 years of data; 2. DR2new.A reduced version of the entire data set, including only the last 10.3 years of data, collected with new generation wide-band backends; 3. DR2full+.The same as DR2full, but with the addition of InPTA data for ten pulsars, covering 25.4 years of data; 4. DR2new+.The same as DR2new, but with the addition of InPTA data for ten pulsars, covering 11.0 years of data.
The new generation backends use improved hardware for the conversion of the electric signals to digital data streams and allow for coherent dedispersion during the observations, whereas previous systems mostly operated with incoherent dedispersion.The increased processing power also enables us to use up to four times the frequency bandwidth as compared to the older, legacy backends.
Before a correlated signal can be searched for, the deterministic properties of individual pulsars need to be modelled.This includes the spin, astrometric, orbital (for binaries), and noise parameters of the pulsar (the EPTA Collaboration 2023).Single pulsar noise models for the data sets mentioned above have been obtained from a specific model selection scheme presented in the EPTA and InPTA Collaborations (2023).For all pulsars, the timing model parameters were analytically marginalised and the white noise parameters EFAC and EQUAD were set at fixed values; cf.Section 3. The TOAs are measured by averaging over a frequency range in which the pulse profile can be considered stable.For the legacy systems, the full bandwidth of about 128 MHz was used.However, for the new generation backends with larger bandwidths, we split the observation into sub-bands, treating each sub-band independently with a template and offset.This method allowed us to reduce the number of TOAs while retaining most of the information from the observations.With at most four TOAs per observation and due to the significantly lowered sensitivity as a result of the sub-banding we could not measure significant, time-correlated white noise.Thus the ECORR parameter, which describes the presence of such noise, was not included in the analysis.Model selection was applied for other time-correlated signals, allowing for the selection of the most favoured combination among observing-frequency independent red noise (RN), dispersion measure variations (DM), and scattering variations (SV).These correspond to stochastic signals that induce a delay in the timing residuals with a chromatic index k of zero, two, and four, respectively, which characterises the dependence on the observing radio frequency ν −k .Two large events were observed in PSR J1713+0747, one at MJD ∼ 54757 (Coles et al. 2015;Zhu et al. 2015;Desvignes et al. 2016) and one at MJD ∼ 57510 (Lam et al. 2018;Goncharov et al. 2021a;Chalumeau et al. 2022).These were assumed to be caused by sudden changes in the scattering and dispersion variation and modelled as deterministic signals with fixed chromatic indices k = 4 and k = 2, respectively, as obtained from a Bayesian fit to the data.While both events are spanned by DR2full, only the second event falls within the time-span of DR2new.
Each noise process is modelled as a sum over Fourier components.Following Chalumeau et al. (2022), we did not fix a priori the number of Fourier components of the various processes in the noise analysis.Instead, for all combinations of RN, DM, and SV, we determined the optimal number of Fourier components that best describes the data.We did not consider models that include SV but not DM, as this is not physically motivated.We obtained customised noise models for each pulsar from a Bayes factor (BF) evaluation among the candidate models and performed a final analysis by refitting the timing model parameters simultaneously with these favoured noise models.This enabled further refinement of the timing model parameters and a more reliable evaluation of the white noise parameters, which are subsequently fixed in the GWB analyses.The interpretation of custom noise models and further discussion on the robustness of these results are presented in the EPTA and InPTA Collaborations (2023).

Methods
The analysis methods closely follow those of Chen et al. (2021) and references therein.In the following, we summarise the key components of the analysis and provide details of additional analyses included in this work.
The PTA likelihood function for a CRS search is given by (van Haasteren et al. 2009) . (1) The post-fit residuals of pulsar I at observation i are denoted as δt I,i and C = C * + ΓC CRS is the sum of the block diagonal covariance matrices for all pulsars 1 and the overlap reduction function (ORF) Γ multiplied by the covariance matrix for the correlated common red signal C CRS .The timing model parameters are analytically marginalised over; see van Haasteren & Levin (2013); van Haasteren & Vallisneri (2014) for more details.
The covariance matrix for each pulsar contains information on the white and red noise components RN, DM, and SV.Measurement uncertainties on the TOAs can be calibrated with a pair of white noise parameters, EFAC and EQUAD, for each telescope, receiver and backend combination to modify the initial estimate from the instrument and TOA extraction method, The red noise power spectra were modelled with a power law representing long-term variations in the ToAs which are independent of the radio frequency of the observations.This model was used for both, individual pulsars as well as for any putative common red noise.Propagation of the radio signals through the interstellar medium adds delays that depend on the frequency of the radio photons.Following Chalumeau et al. (2022) we considered two types of processes; DM variations and scattering of the photons by electrons encountered along the line of sight between the pulsar and Earth.These were also modelled with power laws where K is the DM constant at a reference frequency of 1 MHz, k is the chromatic index of two or four for DM and SV, respectively, and ν is the radio frequency of the propagating photons.
The frequencies f of the Fourier basis were chosen to be f n = n/T (n = 1, ..., N), where T is the time interval between the first and last observations, and N is the number of frequency bins considered.This number was customised for each noise process in each individual pulsar, as described in the companion noise analysis paper (the EPTA and InPTA Collaborations 2023).
For the common red signals, we used two methods to determine the optimal number of frequency bins: 1) we fitted a broken power law to estimate the frequency where the red noise became dominant over the white noise (Arzoumanian et al. 2020); and 2) we constructed a free-spectrum model, where the power at each frequency was modelled with an independent parameter (Lentati et al. 2013;Arzoumanian et al. 2020).
For the detection of the GWB, the characteristic spatial correlation described by the HD curve is the key criterion.Here, ζ I J is the spatial angular separation between pulsars I and J, x I J = [1 − cos(ζ I J )]/2, and δ(x I J ) is the Kronecker delta function.
We employed three types of model to search for generic spatial correlations in the data to compare against the expected HD correlation from a GWB: 1. a binned correlation function (Taylor & Gair 2013), where we weighted the evidence for the pulsar pair correlations in ten bins of angular separations between pulsars; 2. a Chebyshev polynomial decomposition to the third order (Lentati et al. 2015;Chen et al. 2021) where y I J = (ζ I J − π/2)/(π/2) and c i are the Chebyshev polynomial parameters whose priors are uniform in the range [−1, 1].The cross-correlation is normalised so that Γ(x) ∈ [−1, 1].This decomposition can be used to compare against constraints from previous EPTA data sets.3. a Legendre polynomial decomposition to fifth order (Gair et al. 2014) where P i are the Legendre polynomial functions of order i and l i are the Legendre polynomial parameters whose priors are uniform in the range [−1, 1].The parameters can be interpreted as the amount of power in the monopole i = 0, dipole i = 1, quadrupole i = 2, and higher modes.A pure GWB would have no monopole or dipole contributions (i.e.l 0 = l 1 = 0) in this decomposition with all power at i ≥ 2.
The off-diagonal elements of the covariance matrix encode the information on cross-pulsar correlated common signals.
Apart from the quadrupole HD, we tested for the presence of a monopole (associated with, e.g.clock time errors) and a dipole (associated with, e.g.systematics in the model of the position of the Earth, the Solar system ephemeris, SSE) term.Unless otherwise stated, all analyses were performed with a fixed SSE model, DE440, produced by Park et al. (2021).To check for possible SSE systematics, we performed additional analyses using the BAYESEPHEM model (Arzoumanian et al. 2018;Vallisneri et al. 2020).
Using only the diagonal terms of the covariance matrix allows for fast computational analysis and corresponds to a model without any spatial correlations, which we refer to as common uncorrelated red noise (CURN).It is also possible to use only the cross-terms to search for a common signal in a split-likelihood analysis (Arzoumanian et al. 2020;Antoniadis et al. 2022).If the posterior distribution of the uncorrelated model has a substantial number of samples that are within the support of the correlated model, it is possible to employ the reweighting formalism, which was introduced in Hourihane et al. (2022), to approximate the posterior of the correlated model.The reweighting process involves sampling from the posterior distribution of the uncorrelated model (CURN) and then adjusting the weights of the obtained samples to reflect their likelihood under the correlated model.This technique enables the posterior of a correlated model to be obtained efficiently, the Bayes factor between the two models to be obtained, and the quality of the reweighted samples to be quantified.

Bayesian analysis
We estimated the parameters by evaluating the posterior probability, which is proportional to the likelihood given by Equation 1 multiplied by the prior.The inverse of the proportionality coefficient is referred to as Bayesian evidence and it is equal to the integral of the likelihood times the prior over the prior range.When searching for a GWB, we fixed the white noise parameters of each pulsar to the maximum likelihood values produced Table 1: Prior ranges for the parameters of the power laws used in the analysis: amplitude, A, and spectral index, γ.Subscripts RN, DM, and CRS denote the red noise, DM noise, and common red signal, respectively.

Parameter
Prior Type Range by the single pulsar noise analysis (the EPTA and InPTA Collaborations 2023) and we simultaneously evaluated the RN, DM, and SV of each individual pulsar and the CRS.The prior ranges adopted for these parameters is given in Table 1.Bayesian evidence was used for model selection, where the Bayes factor for one hypothesis over the other is equal to the ratio of the two evidence values corresponding to these hypotheses.The posterior evaluation was done mainly with PTMCMCSAMPLER with other samplers used for cross-checking: m3c2 (Falxa et al. 2023), Eryn (Karnesis et al. 2023), and (Williams et al. 2021).
In this work, we performed model selection in two ways.First, directly, by introducing a hyperparameter that switches between likelihoods corresponding to the two models.The ratio of the fraction of samples using one model to the fraction using the other model is the Bayes factor.This method is known as the product-space sampling method (Carlin & Chib 1995;Hee et al. 2016).The second method involves resampling from the CURN model; it is mentioned above and described in detail in Hourihane et al. (2022).

Frequentist analysis
We also used the frequentist optimal statistic (OS) framework, developed by Anholm et al. (2009), Demorest et al. (2013) and Chamberlin et al. (2015) with the noise marginalisation introduced by Vigeland et al. (2018), to compare against the Bayesian results.The Bayesian output of a CURN analysis was used as the input for the OS.This allowed for high computational efficiency, as the posterior distributions of pulsar noise were directly used to compute the statistics.With the OS we can compute the amount of correlated power for each pulsar pair.Comparing this correlation against different models gives signal-to-noise (S/N) estimates for different types of spatial correlations.

Software packages
As in Chen et al. (2021), we used both ENTERPRISE2 (Ellis et al. 2020;Taylor et al. 2021) and FORTYTWO3 (Caballero et al. 2016) for the PTA likelihood computation and cross-check the main results with both pipelines.Some of the more specific analyses were performed only with ENTERPRISE for computational cost efficacy since we have demonstrated in Chen et al. (2021) that the two pipelines produce broadly consistent results.

Search results on the EPTA data sets DR2full
and DR2new sults for the EPTA+InPTA data set analysis are presented in Section 5.
For simplicity and efficiency, our general analysis setup uses the DE440 Solar system ephemeris fit.The starting values for the marginalisation of the timing model are taken from the timing analysis (the EPTA Collaboration 2023).Pulsar noise models and observing system white noise parameters are taken from the single pulsar noise analysis (the EPTA and InPTA Collaborations 2023).
Previous PTA analyses (e.g.Arzoumanian et al. 2020;Antoniadis et al. 2022) have shown the importance of choosing the optimal number of frequencies to model any putative common signal and that most of the power of a common red signal can be found in the lowest frequency bins.We choose the width of the frequency bin to be 1/T , where T is the time span of the data set.For the power law the lowest 24 ( 9) frequency bins are used to model the CRS in the DR2full (DR2new) data set, which corresponds to a maximum frequency f max = 24/24.7 yr −1 (9/10.3yr −1 ) respectively.We subsequently used these limiting frequencies for the remaining analyses (unless otherwise specified).This choice has been verified with a broken power law analysis, which shows that using a larger number of frequency bins does not impact the recovery of the parameters of the CRS.
We show the posterior distributions of the free-spectrum model for the first 50 (20) frequency bins for the DR2full (DR2new) data set in Figure 1.The most noticeable difference is in the lowest constrained frequency bin.Extending the DR2new only best fit power law to lower frequencies would result in a lower CRS in the 1/24.7 yr −1 bin, compared to what is measured in DR2full.Analogously, fitting a power law to the DR2full data set excluding the lowest frequency bin could give constraints that are more consistent with the new data set.This could be due to the non-stationarity of the processes involved, deviations from the power law model or some additional unmodelled noise in the full data set.Further investigations are needed to understand better this difference and if and how it could be mitigated.

Spectral parameter constraints
The right panel of Figure 1 shows the posteriors of the recovered parameters for a power law model for a HD correlated process.The recovered power law parameters with DR2new are logarithmic amplitude log 10 A = −13.94+0.23  −0.48 and spectral index γ = 2.71 +1.18  −0.73 (90% credible regions).The spectral index is shallower than the expected mean value of 13/3 from a population of circular SMBHBs, which still lies within the 3σ credible region (also see Middleton et al. 2021).With DR2full the recovered logarithmic amplitude log 10 A = −14.54+0.28  −0.41 and spectral index γ = 4.19 +0.73  −0.63 is closer to 13/3.The two data sets give consistent results in the sense that the two posteriors overlap and lie on the same A − γ degeneracy line that corresponds to fixing the total HD-correlated power.Therefore, the HD-correlated power measured in DR2full and DR2new is comparable, although the spectral shape is not well constrained and appears to be different in the two data sets.In support of this statement, when fixing the spectral index to 13/3, the background amplitude inferred from the two data sets is consistent, with a value of log 10 A = −14.61+0.11  −0.12 .Table 2 gives the 90% credible regions of the recovered power law parameters for different analyses and models.Once the data set is fixed, the CRS (CURN or GWB) parameter constraints are consistent between different software packages,    Table 4: Median optimal statistic amplitudes and S/N for a single component fit for the monopole, dipole, or HD correlation fixing the spectral index of the common signal to 13/3.The uncertainties indicate the 90% credible region.namely ENTERPRISE and FORTYTWO, as well as different models.However, as already indicated by the free-spectrum analysis, there is a systematic difference in the CRS recovery between DR2full and DR2new.
We quantify the differences between the power law posterior distributions that arise from using different software packages and data sets by adapting the tensiometer package, outlined in Raveri & Doux (2021) and summarised in the EPTA and InPTA Collaborations (2023).This package essentially provides the probability density function of the parameter differences which can be integrated to obtain the mean probability for the presence of parameter shifts (see equation 4 in Raveri & Doux (2021)).The resulting probability for a parameter shift can be converted into an effective number of σs using the standard normal distribution.In short, the package produces a score that can be interpreted as 'within how many σ' two distributions are consistent (see also Raveri & Hu 2019, for more details).The results of this analysis in Table 3 indicate that the differences are minimal when comparing posteriors between different analysis software packages (ENTERPRISE vs FORTYTWO) regardless of the data set (either DR2full or DR2new).However, when comparing GWB posteriors between different data sets (DR2full vs DR2new), there are tensions of ∼ 1σ for CURN, ∼ 1.4σ for HD, and ∼ 1.6σ for Binned ORF, regardless of the software package used.
Figure 2 shows, in the left panel, the two-dimensional posterior difference distribution between the ENTERPRISE and FORTYTWO posteriors obtained for the DR2new data set, again showing consistency of the results provided by the two independent analysis packages.On the contrary, the corresponding distribution for the difference in the posteriors associated with DR2full and DR2new, shown in the right panel, highlights the significant difference between the two data sets, more detailed comparisons can be found at the following URL4 .
The parameter constraints from the Bayesian pipelines can be compared with the results of OS estimates.We first fix the spectral index to γ = 13/3 and compute the OS amplitude and S/N for a CRS with monopole, dipole, or HD correlation.A summary of our findings is given in Table 4, for the three correlation patterns in the four different data sets.The best-fit amplitudes for the HD correlation from the OS can be compared with the Bayesian value found when slicing the posterior at γ = 13/3, which is A 2 HD = 6.0 +4.0 −3.0 × 10 −30 .We notice that this value sits halfway between the OS amplitude estimate for the two data sets, with A 2 HD of 2.7 +3.0 −2.5 × 10 −30 for DR2full and 10.0 +5.1 −4.9 × 10 −30 for DR2new.Both estimates overlap with the Bayesian value within their 90% credible region.The median value for the OS S/N estimate for a HD-correlated process increases from 1.3 in DR2full to 3.5 for DR2new.The A 2 and S/N distributions of the correlated processes as estimated by the OS are shown in Figure 3, which further highlight the HD correlated signal emerging in DR2new.
Although we fixed γ = 13/3 in the previous analysis, the OS can also be computed for a common red process with an arbitrary spectral slope.Figure 4 shows how the OS amplitude and S/N of the DR2new data subset change as we vary the spectral index γ of the CRS model.We increased γ from 1.0 to 5.0 in steps of 0.5 and also included γ = 13/3 to show the expected spectral index of a stationary ensemble of inspiralling SMBHBs.We evaluate the S/N for the monopole, dipole, and HD correlations for each γ.The median of the HD S/N appears to peak around a γ of 2.0, broadly consistent with the shallow posterior found in the Bayesian analysis (cf. the right panel of Figure 1).The spread of the histograms, however, means that S/N values are self-consistent across the whole range of γ.

Spatial correlation constraints
After checking for spectral properties, we reconstruct the spatial correlation of the common red signal.The results of the Bayesian search for the correlations with ten binned free parameters and a common red signal power law are shown in Figure 5.The bins are chosen so that each of them contains 30 pulsar pairs.The grey-shaded histogram represents the distribution of pulsar pairs as a function of separation.Since the pulsar distribution is concentrated in the galactic plane, we have more pairs at small angular separations compared to an array of pulsars uniformly distributed across the sky.However, broad coverage of all angles is still achieved with the 25 pulsars chosen using the ranking procedure of Speri et al. (2023).When comparing the DR2full ORF constraints with those of DR2new, one can see that the latter appears much more consistent with the expected HD correlation.
In particular, the bins around 60, 80 and 135 degrees (i.e. the fifth, sixth, and ninth bins) have more positive correlation coefficients in DR2full.These appear to be responsible for the signal in DR2full being consistent with a CURN and a monopole, as also implied by the OS amplitude and S/N for a monopole correlation reported in Table 4.In contrast, DR2new is very consistent with a HD-correlated process.We also use Chebyshev and Legendre decompositions for the ORF in the Bayesian analysis and find ORF and power law constraints that are consistent with the binned free parameter analysis presented here; see Appendix A.
For comparison, the spatial correlations computed with the OS marginalised over the pulsar noise parameters are shown in Figure 6, where the correlation coefficients have been obtained by scaling to the median amplitude at fixed γ = 13/3, as given in Table 4.For each noise realisation, only the median values of the pulsar pair correlation are used.While Bayesian analysis averages the correlation within each bin, the OS uses each pulsar pair independently and fits the best correlation across all pairs.For comparison and visual purposes, we choose the same binning and avoid showing 300 individual pulsar pairs.Although the two methods give broadly comparable ORF constraints, several differences can be found.Firstly, the first bin with the pulsar pairs with the closest separations deviates away from the HD and is consistent with no correlation.Second, the fourth and seventh bins drop significantly into negative correlations.These dips are most prominent in DR2full, while DR2new follows the HD curve more closely.Consistent with the Bayesian evaluation, the OS reconstruction also shows prominent positive correlations for the fifth, sixth, and ninth bins in the DR2full data set, making the overall curve inconsistent with HD.
To quantify how likely the data set is actually showing evidence for a GWB with HD correlation, we compute Bayes factors comparing different spatial correlations: Hellings-Downs (HD) correlations that arise from a GWB, monopole correlations that could be produced by clock errors (CLK) and dipole correlations that could be due to SSE systematics (EPH).Firstly,  5).The evidence for these two correlations as additional processes to the CURN is inconclusive (row IDs 5 and 6).Conversely, while DR2full shows very little evidence for a GWB compared to the CURN, DR2new has a significant Bayes factor in favour of HD (row ID 2).Since this is a significant result, we have recomputed the Bayes factor using several alternative samplers and methods, as described in Section 3.1, obtaining Bayes factors of 66, 56, 62.In this data set, we also find that BFs for models including an additional CLK or EPH or CURN are about a factor of two smaller compared to the model including a GWB alone (row IDs 7, 8 and 9).On the contrary, the analogue BFs for DR2full are inconclusive with an indication for an additional monopole process (row ID 8).
These BFs can be compared to the S/N from the OS in Table 4 and Figure 3.The DR2new data set yields a median S/N ≈ 3.5 for the HD correlation, while it is about 1.3 in DR2full.These S/N estimates act as semi-independent confirmation of the BFs from Table 5.Consistent with the slightly higher BF for an additional monopole in DR2full the S/N is ≈ 1.2.For DR2new the S/N for a monopole drops to be consistent with zero.Lastly, no significant signal for a dipole correlation is found in either of the two data sets.Fig. 6: Constraints on the overlap reduction function from the optimal statistic.Blue and orange points indicate the results for DR2full and DR2new respectively.The correlation coefficients for each pair of pulsars are weighted and averaged following the description in Allen & Romano (2022) and grouped in the same way as those in Figure 5 for comparison.The HD correlation is plotted as a black line for reference.

Significance tests
To quantitatively estimate the significance of the hypothesis that a GWB signal with HD correlation is present in the data, the null hypothesis distribution need to be constructed.Many repetitions of an experiment need to be performed in order to define a strict p-value.This is, unfortunately, not possible for PTAs.Thus, we can only attempt to find a good proxy to estimate the true statistical p-value for the null hypothesis.In the following, we refer to the estimated value from our proxy methods as p-values for simplicity.The respective distributions can be constructed in two different ways, by introducing random phase shifts in the Fourier basis of the common red noise process (Taylor et al. 2017) or by moving the positions of the pulsars in the sky via a random scramble (Cornish & Sampson 2016).The aim of both methods is to effectively destroy the distinctive cross-pulsar correlations, unique to the GWB signal, while retaining the individual pulsar noise characteristics.One should emphasise that both methods should be robust against any mismodelled features in the data set, therefore they, in general, provide more conservative estimates of the significance in comparison to the possibly oversimplified noise simulation bootstrapping.The distributions of BFs under the null hypothesis (PSRN + CURN) were constructed for DR2full and DR2new using about 200 and 2000 phase shifts, respectively and are displayed in the upper panel of Figure 7.The DR2full measured BF from Table 5 lies within the 2σ range of the null hypothesis distribution with a p-value of 0.04.The p-value for the BF derived with the DR2new data set reaches a statistically interesting value of 0.0005, which corresponds to the 3σ level of significance ('evidence').The analysis was performed using both ENTERPRISE and FORTYTWO and shows consistent results between the two software packages.This significance test was repeated for the OS S/N values for the HD correlation and results are shown in the bottom panel of Figure 7.For DR2full a p-value of 0.07 is found.None of the 10000 realisations produced a S/N that is comparable to what has been found in DR2new.Therefore, only an upper limit can be set for the p-value < 0.0001, which corresponds to a significance of > 3.5σ.
Figure 8 shows the null distribution obtained with sky scrambles in the OS analysis in the top panel.A matching threshold of 0.2 for any two sky scrambles was imposed to produce about 5000 samples.A large difference particularly in the high S/N tail of the density functions can be found between DR2full and DR2new.The p-value for DR2full of 0.08 is comparable to that obtained with the phase shifts.This could indicate that in the low S/N regime, both methods produce reliable null distributions.In the high S/N regime, however, with DR2new the sky scramble p-value of 0.004 is not consistent with the phase shift method.
The bottom panel of Figure 8 compares p-values from simulations, theoretical computation and the two methods.A null distribution was generated using a set of realistic simulations resembling the statistical properties of the real DR2new data set and with the injected CURN only.The noise parameters as well 2 1 0 1 2 log 10 BF (GWB vs CURN)  The top panel is for the Bayes factors for PSRN+GWB versus PSRN+CURN using ENTERPRISE (EP) and FORTYTWO (42).It should be noted that due to the computational cost, we calculated only a limited number of phase-shifted BFs.This could explain the differences in the 1−CDFs.The OS S/N for the HD correlation with ENTERPRISE is shown in the bottom panel.Blue lines are for DR2full while orange lines are for DR2new.Vertical lines are the measured Bayes factor for PSRN+GWB versus PSRN+CURN reported in Table 5 or the OS S/N HD reported in Table 4 respectively.The estimated p-values for each method are given in the legends.
as the amplitude and slope of the CURN for the null distribution were taken as random samples from the posteriors of the CURN search with DR2new.Additionally, we compare these pvalues with those obtained from the theoretical null distribution described by generalised χ 2 (GX2) distributions and derived in Hazboun et al. (2023).The distribution is computed by fixing the noise parameters to the median values of the posteriors5 .
Both null distributions are compared to the proxy distributions obtained via phase shifting and sky scrambling.For consistency, instead of using the real data set, a target data set was simulated using the same procedure as the simulations for the null distribution, but with GWB injected instead of CURN.The lowest p-value of 0.002 is obtained using phase shifts.The most 3 2 1 0 1 2 3 4 5 4. The large discrepancy between the two data sets could be an indication that some remaining signal is still present after the scrambling, especially in the strong S/N regime.More checks need to be performed to assess the validity of this method.The bottom panel compares a simulated null distribution in cyan, the generalised χ 2 (GX2) distribution from (Hazboun et al. 2023) in purple and the two proxy methods of phase shifting in orange and sky scrambling in green.At the measured value from the DR2new, the two methods differ by a factor of a few.The estimated p-values for each method are given in the legends.
conservative number is obtained when using all sky scrambles without introducing any threshold or weighting with the p-value of 0.008.At S/N HD = 3.5 the p-values of the simulated (0.006) and theoretical (0.003) null distributions lie between those from phase shifting and sky scrambling, see Figure 8.We have tested introducing thresholds on the match between the true and scrambled pulsar positions and amongst the scrambles themselves and found that in general smaller thresholds lead to a decrease in the proxy null distribution.Similar results are obtained when adding weights to account for the different contributions from each pulsar.From all the above we can conclude that the p-value for the S/N found with DR2new should be ∼ 0.004, which was also obtained with sky scrambling at a threshold of 0.2 and no weights.
The inconsistencies between the p-values obtained with the real and simulated target data sets can be due to the incompleteness of our simulations (e.g.exponential dips for J1713+0747 are not included and a simpler noise model with a fixed number of frequency components was used).
For an optimal sky scrambling orthogonality may need to be ensured between different realisations.Additionally, a realistic PTA does not have equally good pulsars, which should be taken into account when assessing the match between different scrambles.This can limit the maximum number of possible sky scrambles or their effectiveness in breaking correlated processes (Di Marco et al. 2023).On the other hand, Hazboun et al. (2023) have shown that sky scrambling can lead to null distributions that are consistent with the theoretical prediction.Further studies are required to determine whether any method can be a good conservative proxy for PTA experiments to accurately estimate the p-value and significance of a detected signal.

Consistency tests
4.4.1.Comparing the power in the auto-correlation and cross-correlation terms.
For a true GWB both the auto-correlation and cross-correlationterms should constrain the same process.Since the crosscorrelated power is proportional to the square of the expectation value of the ORF, that is Γ 2 , which is always ≪ 1, it is expected that the power in the auto-correlation terms -equivalent to the CURN -is the dominant contributor to the signal.However, we stress that the cross-correlation terms contain the 'smoking gun' that can provide conclusive evidence for the presence of a GWB in the data.We apply the split likelihood technique (Arzoumanian et al. 2020;Antoniadis et al. 2022) to both DR2full and DR2new.Figure 9 shows the resulting posterior contours.While the autocorrelation-terms-only-analyses recover the CURN with consistent amplitude and slope for both data sets, noticeable differences can be seen for the power law parameters that are constrained using the cross-correlation terms only and assuming the HD correlation.For the DR2full data set, the cross-correlation terms contain virtually no power and thus only an upper limit is found for an HD-correlated signal.On the contrary, consistent with the much larger evidence for a GWB, the cross-correlation terms in DR2new produce a peak in amplitude.Since a long tail still remains, one cannot rule out the possibility of a zero amplitude.

Solar System Ephemeris systematics
The effects of SSE systematics on the PTA GWB search have been studied in Tiburzi et al. ( 2016); Guo et al. (2019) and models that can help mitigate the dipolar correlated signal induced by SSE systematics have been added to the tools for PTA analysis.We employ the widely used BAYESEPHEM model (Arzoumanian et al. 2018;Vallisneri et al. 2020) on the EPTA-only data sets.
Figure 10 compares the addition of BAYESEPHEM to each data set against the use of the DE440 fit alone.Allowing SSE systematics to be present and absorbed by a model is a more conservative approach.In general, the left panels show that certain frequency bins have lower power compared to the DE440 analysis.This in turn broadens the power law posteriors.Comparing the DR2full data set against the DR2new data set, the longer time span of DR2full helps to produce results that are more independent of the SSE fitting used, while the short time span of DR2new strongly limits its ability to separate SSE systematics from other common signals.In fact 10.3 years is close to the Jovian orbital  Adding BAYESEPHEM also affects the BF for the different signal models.Table 6 shows a selection of the same models as Table 5.The most relevant effect in the search for a GWB is that the BF in favour of the HD correlation in DR2new is reduced from about 60 to 17, a factor of about 3. Since BAYESEPHEM is known to partially absorb power from the GWB, this reduction follows the expectation, although the exact amount is difficult to predict (Vallisneri et al. 2020).model.This is because the standard uniform prior distributions for power law parameters of pulsar-intrinsic red noise do not match the observed distributions of these parameters.The issue was addressed in Goncharov et al. (2022b), where the authors introduced a hierarchical model that governs the distribution of power law amplitudes across pulsars in the CURN.As the distribution width is consistent with zero, this indicates the likely presence of a common-spectrum stochastic process in the data.Dropout analysis, (e.g.Arzoumanian et al. 2020) also enables the mitigation of this issue by identifying pulsar outliers.The effect of the range of simulated pulsar noise parameters on the spurious evidence for CURN is demonstrated in Figure 6 in Zic et al. (2022).
To test each pulsar's consistency with the CURN, we measured both the dropout factors and the distribution of power law amplitudes of the CURN spectrum in the pulsars.The results are shown in Figure 11.The two top panels show the measurements of σ log 10 A and µ log 10 A , the standard deviation and the mean of the power law amplitude of CURN measured in the pulsars, following Goncharov et al. (2022b).As expected, the mean µ log 10 A is consistent with the measurement of log 10 A CURN .At the same time the standard deviation is consistent with zero, confirming the presence of common temporal correlations in the data.Based on the measured dropout factors, we find that in both data sets only a few pulsars have intrinsic red noise that seems to be inconsistent with the CURN.The majority of pulsars display indiffer-ent behaviour, leaving around five pulsars to contribute most to the constraints of the CURN power law.However, the two data sets have CURNs with very different posterior contours.This can have an impact on the difference between certain pulsars agreeing more or less with the CURN.J1909−0747 and J1744−0744, for example, have very steep intrinsic red noise, thus they are more consistent with the steep CURN from DR2full compared to the shallow CURN from DR2new.More discussion on the intrinsic pulsar noise properties can be found in the EPTA and InPTA Collaborations (2023).

Continuous GW signal search
In addition to the GWB search, we have also searched our data for the presence of a continuous GW (CGW) signal from an individually resolvable SMBHB in a circular orbit.This subsection presents a preliminary analysis and a detailed investigation (which includes simulations and significance estimation) will be given in a separate paper.The main aim of this section is twofold: (i) to look for an alternative explanation of the observed common signal and (ii) to understand how the inclusion of the CGW signal in the model affects the main findings of our analysis.
First, we performed an analysis using the DR2full data set.The addition of the CGW signal to the custom pulsar noise and CURN is not informative about the presence of a CGW, with a Next, we move on to the analysis of the DR2new data set.We started by considering a simple model of a CGW source superimposed on PSRN.We found substantial support for the presence of a CGW: the BF of the model CGW+PSRN over the null model, PSRN, is 200 with the Earth term only and 260 if we include the pulsar term in the search.For this search we used the sampler and method described in Bécsy et al. (2022).The candidate source is localised in sky position and frequency.We have also computed the F e statistic (a frequentist approach; see Babak & Sesana 2012;Ellis et al. 2012).The results are shown in Figure 12.The F e statistic can be marginalised over the individual pulsar noise parameters in a similar manner as the OS.It is shown in blue with the mean value corresponding to S/N≈ 4.5.We computed the corresponding p-value (≈ 0.1% ) by scrambling the Fig. 12: F e -statistic for a CGW search marginalised over the noise uncertainties (blue).The F e obtained from the analysis of the data with scrambled sky position of pulsars is shown in orange and it is compared with theoretical (χ 2 4 ) distribution in black.
pulsar sky positions, assuming 0.2 match as the orthogonality criterion.The F e -statistic takes only the Earth-term into account and scrambling the pulsar's position destroys the coherence of the CGW, while preserving the noise properties.The F e of sky scrambles is shown in orange and is compared to the theoretically expected (for Gaussian noise) central χ 2 distribution with four degrees of freedom.
Including the CURN in the model does not change these results significantly: we can still identify the CGW candidate with BF CGW+CURN CURN = 7 − 20, with the exact value depending on whether one includes the pulsar term (BF = 7) or only the Earth term, as well as on the sampling method used (BF = 12, 20).Interestingly, the CURN parameters are much less constrained in the CGW+CURN model.We show partial results of the Bayesian parameter inference in Figure 13.
Finally, we have considered a model containing a GWB plus a CGW and found that the GWB 'absorbs' most of the CGW, that is, the CGW becomes poorly constrained.At the same time, our results indicate that we cannot exclude the presence of the CGW, since the BF is not informative (BF GWB CGW = 1.2), while the CGW model has a larger parameter space.We note that the identified CGW candidate frequency of ≈ 5 nHz is between the two lowest frequency bins that dominate the HD signal, as shown in Figure 1.The rest of the bins do not contribute much to the GWB signal, due to the relatively short observation span of the DR2new data set and the high level of white noise.To summarise, we find that the observed data is equally well explained by either a GWB or a single CGW.However, given the additional number of parameters for the CGW model and in the absence of additional data we favour the simpler GWB model.A detailed analysis including extensive simulations will be presented in a forthcoming publication.

DR2full+ and DR2new+
Analyses of DR2full+ and DR2new+ data sets including the InPTA data indicate general consistency of the results with the DR2full and DR2new data sets, respectively.We provide a comparison of posteriors for the CURN and GWB between the data sets, with and without InPTA data, in Tables 7 and 8, as well Fig. 13: Inference of the frequency and amplitude of a putative CGW in the CGW+CURN model.Plotted are the 50% and 90% credible regions.Dark-cyan contours are obtained using the Earth term only and ENTERPRISE with PTMCMCSAMPLER, purple contours with Eryn sampler, while coral contours are produced using a sampler described in Bécsy et al. (2022) and also include the pulsar term.
as in Figure 14; see also Appendix B. While the DR2full+ and DR2full produce very consistent posteriors, a ∼ 0.2σ difference can be found between DR2new+ and DR2new.The impact of the InPTA data is more noticeable in the shorter data set.
The results for the OS shown in Table 4 are consistent with the EPTA-only data sets.Both DR2full+ and DR2full give similar amplitudes and S/Ns for the three correlation models.An increase in the S/Ns of about 0.5 for monopole, dipole and HD correlations can be found with the DR2new+ against the S/Ns in the DR2new.
The corresponding BFs are in general agreement with the EPTA-only results (cf.Table 5).The BF between GWB and CURN in the DR2new+ of 65 is comparable to the 60 from DR2new.However, when testing for additional processes, we find significantly larger BFs for PSRN+GWB+CLK (row ID 8) and PSRN+GWB+EPH (row ID 9): 57 versus 28 and 43 versus 33, DR2new+ versus DR2new, respectively.The small BF difference between the models including the CLK or EPH error terms and the GWB-only model further supports the assumption that the signal could be a GWB.
Finally, we investigate the effect of the additional InPTA on the contribution of the individual pulsars to the CURN via the dropout factor in Figure 11.As with the previous results, most dropout factors are consistent between the DR2full+ and DR2full data sets.J1600−3053 shows an increase in the dropout factor, possibly due to better single pulsar noise modelling with the addition of the InPTA data.For the new generation, EPTA-only and EPTA+InPTA data sets the differences are more pronounced.Most pulsars have dropout factors that are in agreement.Two pulsars, J1713+0747 and J0613−0200, have reduced dropout factors, whereas J1909−3744 shows an increase.
In summary, the results from the EPTA and InPTA combination are in broad agreement with the EPTA-only data set.The consistency between DR2full+ and DR2full shows the robustness of the results from the full EPTA+InPTA combination.When comparing DR2new+ and DR2new the effects of the InPTA   data are more visible with differences in the power law posteriors, increased BFs and higher OS S/Ns for the GWB, but also for other possible noises.Further investigation is needed to assess and improve the combined sensitivity for GW searches.

Discussions and conclusions
The EPTA with its six telescopes and multiple observing systems has collected PTA observations for almost 25 years and using a new generation of observing systems for the last decade.For the DR2 we have increased the number of pulsars combined with the most recent observations from 6 to 25.A selection scheme has been applied to find the 25 pulsars that are sufficient to contribute about 95% of the expected sensitivity of the full array with 42 pulsars from DR1 (Speri et al. 2023).Here, we used the optimal timing and noise models obtained in the EPTA Collaboration (2023); the EPTA and InPTA Collaborations (2023) to search for a stochastic GWB.In addition, we combined data for ten common pulsars between InPTA DR1 (Tarafdar et al. 2022) and EPTA DR2 and conducted GWB searches also on those extended EPTA+InPTA data sets.We present the main results of the GWB search using two versions of the EPTA 25-pulsar data set, the full data set (DR2full) with a time span of 24.7 yr and a new backends-only data set (DR2new) with a time span of 10.3 yr.Both data sets measure a common red signal.By virtue of its length, the full data set yields a better constraint on the spectral properties of the GWB.However, the new backends-only data set provides a better measurement of the cross-correlated power.In the following, we give a summary of the results of our analysis and discuss possible reasons for the discrepancies between the two data sets.
The power law amplitude for an HD-correlated process using DR2full is log 10 A = −14.54+0.28  −0.41 , with a spectral index γ = 4.19 +0.73  −0.63 that is close to the expected value of 13/3 for a GWB from circular SMBHBs driven by GW emission.With DR2new we obtain a flatter spectrum with log 10 A = −13.94+0.23 −0.48 and γ = 2.71 +1.18  −0.73 .Fixing the spectral index to 13/3, the amplitudes for the two data sets are consistent, with values around log 10 A = −14.61+0.11  −0.15 .This indicates that the two data sets constrain the same amount of power in the GWB, although the detailed spectral shape appears to be different.
The free spectrum analysis provides the possibility to directly compare the results from different data sets in the frequency domain.Ignoring the lowest 1/24.7 yr −1 frequency bin of DR2full, the remaining bins show good consistency with those of DR2new.Including the lowest frequency bin in the power law fitting may lead to a steepening of the power law.With a shorter data span DR2new probes a different frequency range starting at about two times higher than 1/24.7 yr −1 .Power law fitting could also be affected by the frequency bins just above 10 −8 Hz, resulting in a flatter spectrum for DR2new.This could either indicate that a power law model does not provide a good fit to the common signal, or there is additional noise or signal around 10 −8 Hz or 10 −9 Hz.
The differences between the Bayesian correlation curves observed in the two data sets are most obvious around angular separations of 60 deg, 80 deg and 135 deg.The correlation curve produced by DR2full shows a prominent monopolar-like structure, with a central part shifted upward compared to the HD curve, while the correlation curve produced by DR2new follows the HD correlation much more closely.These results are in agreement with the measured BFs (PSRN+GWB vs PSRN+CURN), which are four and 60 for DR2full and DR2new, respectively.From the null hypothesis distributions of BFs (PSRN+GWB vs PSRN+CURN) constructed with phase shifts, we can infer a pvalue of 0.04 for DR2full and 0.001 for DR2new.There is essentially no evidence for other correlation patterns or additional common processes in either data set, with perhaps the exception of a tentative hint of an extra monopole in the DR2full data set, which will be the subject of further studies.
Optimal statistic analysis shows even more significant discrepancies between the two data sets.For DR2full the squared amplitude is only 2.7 × 10 −30 , which is lower than the amplitude from the Bayesian analysis (A 2 = 6.02 × 10 −30 ).There is a large scatter in the correlation coefficients, giving a similar S/N of around 1.2 for both the HD and the monopole correlations.The p-value for the S/N of the HD correlation is 0.07 from phase shifts and 0.08 from sky scrambles.For DR2new the squared amplitude is 1.0 × 10 −29 , which is more consistent with (yet higher than) the Bayesian result.The correlation coefficients also match the HD curve much better.The S/N for the HD correlation increases to 3.5, while the S/N for the monopole correlation drops to almost zero.Sky scrambles give a p-value of 0.002 for HD correlation, while phase shifts yield a p-value < 0.0001.This corresponds to > 3σ significance of the GWB signal.
Preliminary analysis including a CGW suggests that its contribution to the observed HD-correlated power cannot be ruled out.The presence of a CGW is not supported in the DR2full data set; its presence is preferred over CURN only in DR2new.The source amplitude and frequency are well-constrained.The candidate is also localised in the sky, but its position and error region depends on whether we include the pulsar term.However, adding a GWB to the analysis absorbs most of the power of the CGW, preventing any strong claim about its actual presence in the data.A more thorough analysis involving the CGW model will be presented in a separate paper.
The analysis of the combined EPTA DR2 and InPTA DR1 shows broadly consistent results with the EPTA DR2 alone.The power law parameter constraints with DR2full+ show little difference to those without the InPTA data.For DR2new+, the effect of the additional InPTA data is more pronounced.The power law parameters experience a small shift of 0.17σ towards a steeper spectral index.The BFs and OS S/Ns are also in general agreement with the EPTA-only data sets.Increases in the evidence for additional monopole and dipole correlated signals of about 0.5 can be found in DR2new+.A larger impact on the shorter data set can be expected, since the InPTA, with three years of time span, is a more significant fraction of DR2new (10.3 years) compared to DR2full (24.7 years).
With the high amplitude and large uncertainty in the spectral index, the observed HD correlated signal is broadly consistent with the expectation from a cosmic population of SMBHBs.In particular, as shown by Middleton et al. (2021) the high ampli-tude of logA = −14.61inferred when fixing γ = 13/3 is consistent with the recent discovery of over-massive black holes (e.g.McConnell et al. 2011) and the upward revision of the normalisation of the SMBH-host galaxy relations (see, e.g.McConnell & Ma 2013;Kormendy & Ho 2013).It is not straightforward, however, to construct a self-consistent SMBH and host galaxy cosmic evolution model that results in such a high GWB signal fulfilling other observational constraints on the SMBH mass function and on the evolution of the bolometric quasar luminosity function with redshift (Izquierdo-Villalba et al. 2022).A spectrum significantly flatter than γ = 13/3 can arise for a number of different reasons, including strong coupling with the environment, the predominance of highly eccentric SMBHBs (see e.g.Sesana 2013) or simply by the presence of extra power at high frequencies due to sparse and loud marginally resolvable individual binaries (Middleton et al. 2021).Besides a cosmic population of SMBHBs, the detected signal can be generated by processes occurring in the early Universe (Caprini & Figueroa 2018) as well as specific models of dark matter (Porayko et al. 2018).We plan to investigate the implications of this signal for all these formation scenarios in follow-up papers.
Our results seem to indicate that DR2new provides better constraints on the cross-correlated power than DR2full.It would normally be expected that the addition of more data would lead to a more significant detection of a stationary process.There are a few possible factors that could be contributing to this discrepancy that needs to be investigated in more detail.
1. Lower quality of the early data, which lacks multi-radio frequency coverage and polarisation calibration, may have allowed for residual unmodelled noise.This can lead to different noises and signals being recovered with the early and new generation observations.Investigating better noise modelling can help to increase the sensitivity and reliability of the early data.2. Improper weights for the power law fitting in different frequency bins could introduce bias in the recovery of the spectral properties.In particular, the lowest frequency bins only have contributions from a few pulsars with the longest time spans, but their weights are the highest since the largest amount of power in a common red signal is at the lowest frequencies.Considering a weighting scheme for the different frequency components for the power law fitting could produce an unbiased result.3. The presence of excess power at low frequencies can lead to a steepening of the power law.While excess noise at high frequencies can make the spectrum appear shallower.In both cases, noise leaks into the GW signal giving erroneous power law constraints.4. Non-stationarity of the pulsar noise or of the putative GW signal can cause the measured spectral properties to be different between the early and late part of a data set as the properties of the noise and signal could evolve over time.
Some of these differences between pulsars are smoothed out in the DR2new data set, as all pulsars have roughly the same time span ∼ 10 yr.This may help to measure the cross-correlations between pulsar pairs more robustly.
An extensive simulation campaign is ongoing to help to better understand the features of our data sets and to build more confidence in the internal consistency of our findings.Verification of the analysis algorithms and their performance on a realistic PTA data set is needed to set our expectations.These can then be compared against the real data set to test the effects that data quality and the noise/signal properties have on the results of the final analysis.Several simulation projects tackling different questions will be published in separate works.
Concurrent efforts from the NANOGrav collaboration on their 15-year data set (Arzoumanian et al., 2023) and the PPTA on their DR3 (Reardon et al., 2023) provide independent results on the search for a gravitational wave background.These will be compared in the IPTA framework to increase our confidence and prepare for the next IPTA data set.Additionally, the CPTA is preparing its first data release and analysis for a GWB signal (Xu et al., 2023).
Moving forward, we plan to add more pulsars timed with the new backends to DR2new.The EPTA data set is unique in its combination of time span, cadence, number of pulsars and, when combined with the InPTA data set, in DM monitoring.Indeed, among the pulsars timed by EPTA, there are more than 30 sources observed with new backends for a time span > 8 years displaying RMS< 2 µs, which can add significant value to the EPTA data set.A combination of the resulting data set with NANOGrav15, PPTADR3 and MeerKAT DR1 under the aegis of the third data release (DR3) of the IPTA, will produce a data set of unprecedented sensitivity that will help to pin down the nature of the signal presented here and to constrain its properties.

Fig. 1 :
Fig.1: Spectral properties of a CRS assuming HD correlations.The left panel shows the free spectrum, the independent measurement of common power at each frequency bin, for the two versions of the EPTA-only data set.The right panel shows the 1/2/3σ contour of the 2D posterior distribution of amplitude and spectral index when modelling the spectrum with a power law.In both panels, results for DR2full are in blue, while those of DR2new are in orange.The solid lines in the left panel are the power law best-fits to the GWB (see main text for the parameters of the fit), while the vertical dashed line indicates the position of f = 1 yr −1 .The vertical dashed line in the right panel denotes γ = 13/3.

Fig. 2 :
Fig. 2: Difference distributions between two posterior distributions originating from GWB processes.The left panel depicts the difference distribution between DR2new and DR2full data sets while employing the ENTERPRISE package.In comparison, the right panel shows the tension contour between ENTERPRISE and FORTYTWO software packages when we employ the DR2new data set.The plots contain three contours: 1σ, 2σ, and the ∆ contours that correspond to the value of the computed tension.

Fig. 3 :
Fig.3: Amplitude and S/N for a common red signal with γ = 13/3 for the optimal statistic.The top and bottom panels show the noise-marginalised distributions of the squared amplitude A 2 CRS and S/N, respectively, for a common signal with different correlation patterns, with HD in blue, monopole in orange, and dipole in green.Solid lines are results from DR2new and the dashed lines are results from DR2full.

Fig. 4 :
Fig. 4: Optimal statistic amplitude (left) and S/N (right) for a range of spectral indices for a common red signal using the DR2new data set.Results for the HD model are shown in blue, the dipole model in green, and the monopole model in orange.

Fig. 5 :
Fig. 5: Binned overlap reduction function.Blue is for DR2full while orange is for DR2new.The left panel shows violins of the posterior of the correlation coefficients averaged at ten bins of angular separations with 30 pulsar pairs each.The black line is the HD curve based on theoretical expectation of a GWB signal.The grey histogram is the arbitrarily normalised distribution of the number of pulsar pairs at different angular separations.The right panel is the corresponding 2D posterior for the amplitude and spectral index of the common correlated signal, showing 1/2/3 σ contours.

Fig. 7
Fig. 7: 1−cumulative density function (CDF) from phase shifts.The top panel is for the Bayes factors for PSRN+GWB versus PSRN+CURN using ENTERPRISE (EP) and FORTYTWO (42).It should be noted that due to the computational cost, we calculated only a limited number of phase-shifted BFs.This could explain the differences in the 1−CDFs.The OS S/N for the HD correlation with ENTERPRISE is shown in the bottom panel.Blue lines are for DR2full while orange lines are for DR2new.Vertical lines are the measured Bayes factor for PSRN+GWB versus PSRN+CURN reported in Table5or the OS S/N HD reported in Table4respectively.The estimated p-values for each method are given in the legends.

Fig. 8 :
Fig.8: 1−cumulative density function (CDF) from skyscrambled optimal statistic HD S/N distributions and a comparison between different methods.In the top panel, the blue histograms are for DR2full while the orange histograms are for DR2new.Vertical lines are the measured S/N HD values reported in Table4.The large discrepancy between the two data sets could be an indication that some remaining signal is still present after the scrambling, especially in the strong S/N regime.More checks need to be performed to assess the validity of this method.The bottom panel compares a simulated null distribution in cyan, the generalised χ 2 (GX2) distribution from(Hazboun et al. 2023) in purple and the two proxy methods of phase shifting in orange and sky scrambling in green.At the measured value from the DR2new, the two methods differ by a factor of a few.The estimated p-values for each method are given in the legends.

Fig. 9 :
Fig.9: Power law posterior constraints from a split analysis using the auto-correlation terms (CURN) and the crosscorrelation terms (assuming HD correlation) separately.The left two columns show the recovered power law from the autocorrelation terms and the right two columns show the crosscorrelation terms.Blue distributions are for DR2full while orange distributions are for DR2new.
4.4.3.Distinguishing a common red spectrum from similar individual pulsar noise spectra It was shown in Goncharov et al. (2021b) that the CURN hypothesis may become strongly favoured over the PSRN hypothesis even when spectra of temporal correlations across pulsars are similar and yet not strictly common, as implied by the CURN Article number

Fig. 10 :
Fig. 10: Spectral properties of a GWB signal in the style of Figure 1 for the DR2full (top) and DR2new (bottom) comparing the analysis without (blue) and with (orange) BAYESEPHEM.

Fig. 11 :
Fig.11: Tests of the CURN model.The top two panels show that pulsar spectra of temporal correlations attributed to CURN are indeed consistent with representing the same spectrum(Goncharov et al. 2022b).The blue lines show the measurement for DR2full, whereas the orange lines show the measurement for DR2new.The top left panel is the measurement of the standard deviation of the CURN amplitudes across pulsar spectra, σ log 10 A , marginalised over the mean of the CURN amplitudes, µ log 10 A , as well as pulsar-intrinsic noise parameters.The dashed vertical line denotes an upper limit at 95% credibility.The top right panel shows both the mean and the standard deviation.The inferred mean value is consistent with the measurement of log 10 A CURN denoted by a dashed vertical line.The bottom panel shows the logarithm of the dropout factors(Arzoumanian et al. 2020) which suggest pulsars' contribution to the CURN model.Positive values represent support for the CURN model, whereas negative values point towards inconsistency of pulsar data with the CURN model.The differences in the dropout factors between the data sets could be due to differences in the recovered CURN or the pulsar red noise spectral properties.

Fig. 14 :
Fig. 14: Difference distributions of posteriors while including the InPTA data.The left panel is associated with posteriors from DR2full+ and DR2full.In contrast, the right panel involves the comparison between DR2new+ and DR2new.Both panels show the tension for the GWB model.These plots provide three contours: 1σ, 2σ, and the ∆ that represent the computed tension value.

Fig. A. 2 :
Fig. A.1: Overlap reduction function reconstructed using Chebyshev polynomial.The figure is in the style of Figure 5, with the difference in the left panel being that the dashed and dotted lines indicate the central 95 and 99.7 % credible regions of the reconstructed spatial correlations.

FigFig. B. 2 :
Fig. B.1: Spectral properties of a CRS signal assuming HD correlations in the style of Figure 1 for all four data sets.

Table 2 :
90% credible regions for the power law parameters constraints in the different Bayesian analyses with DE440 for both DR2full and DR2new.The analyses included the search for common uncorrelated red noise (CURN), gravitational wave background (GWB), and a common correlated signal with overlap reduction function (ORF) modelled with different methods (binned ORF, Chebyshev ORF, and Legendre ORF).

Table 3 :
Raveri & Doux (2021)f σ) produced by the tensiometer package, detailed inRaveri & Doux (2021)when comparing posteriors produced by various data sets and software packages.The second column compares the posteriors between the DR2new and DR2full data set while employing ENTERPRISE and FORTYTWO (in brackets).On the contrary, the third column compares the posteriors given by the ENTERPRISE and FORTYTWO software packages running on the DR2new and DR2full data sets (in brackets).

Table 5 :
Model selection for different inter-pulsar correlation models for a common red signal (CRS).We present Bayes factors (BF), for different CRS models against the CURN model.We assume the DE440 SSE fit and use the PSRN+CURN model as the reference model.The model component acronyms are: (i) PSRN = individual Pulsar noise, (ii) CURN = common uncorrelated red noise, (iii) GWB = gravitational wave background with quadrupolar (HD) angular correlation, (iv) CLK = common signal with monopolar spatial correlation, as expected from a clock error, (v) EPH = common signal with dipolar spatial correlation, as expected from SSE errors.

Table 6 :
Bayes factors for different CRS models against the CURN model adding BAYESEPHEM to all models with the ENTERPRISE pipeline.The model component acronyms are as in Table 5

Table 7 :
90% credible regions for the constraints of power law parameters in the different Bayesian analyses with DE440 for both DR2full+ and DR2new+, which include the addition of InPTA data.