The orbital-decay test of general relativity to the 2% level with 6-year VLBA astrometry of the double neutron star PSR J1537+1155

PSR J1537+1155, also known as PSR B1534+12, is the second discovered double neutron star (DNS) binary. More than 20 years of timing observations of PSR J1537+1155 have offered some of the most precise tests of general relativity (GR) in the strong-field regime. As one of these tests, the gravitational-wave emission predicted by GR has been probed with the significant orbital decay ($\dot{P}_\mathrm{b}$) of PSR J1537+1155. However, compared to most GR tests provided with the post-Keplerian parameters, the orbital-decay test was lagging behind in terms of both precision and consistency with GR, limited by the uncertain distance of PSR J1537+1155. With an astrometric campaign spanning 6 years using the Very Long Baseline Array, we measured an annual geometric parallax of $1.063\pm0.075$ mas for PSR J1537+1155, corresponding to a distance of $0.94^{+0.07}_{-0.06}$ kpc. This is the most tightly-constrained model-independent distance achieved for a DNS to date. After obtaining $\dot{P}_\mathrm{b}^\mathrm{Gal}$ (i.e., the orbital decay caused by Galactic gravitational potential) with a combination of 4 Galactic mass distribution models, we updated the ratio of the observed intrinsic orbital decay to the GR prediction to $0.977\pm0.020$, three times more precise than the previous orbital-decay test ($0.91\pm0.06$) made with PSR J1537+1155.


Pulsars in double neutron star systems
Double neutron stars (DNSs) are prized testbeds on which to evaluate theories of gravity and to probe the composition of neutron stars (NSs). The DNS merger event GW170817 has been recorded both by gravitational-wave (GW) observatories and electromagnetically (e.g. Abbott et al. 2017a,b;Goldstein et al. 2017;Mooley et al. 2018), providing constraints on the interior composition of NSs (e.g. Annala et al. 2018). The same merger event also strengthens the belief that short Gamma Ray Bursts (SGRBs) are generated by DNS mergers (e.g. Coward et al. 2012), though most SGRBs are well beyond the horizon of the current ground-based GW detectors. In addition, DNS mergers are considered the prime sources of r -process elements (Eichler et al. 1989;Korobkin et al. 2012;Drout et al. 2017). To test the connection between DNS mergers and the observed abundance of r -process elements in the local universe, an estimate of the DNS merger rate is required, which can be constrained with observations of the Galactic DNS population (e.g. Kim et al. 2015;Pol et al. 2019).
During their steady inspiral stage, DNS systems can be studied by measuring and modeling the pulse timeof-arrivals (ToAs) from a pulsar residing in a DNS system (hereafter referred to as a "DNS pulsar"). So far, 16 known DNS pulsars and 3 suspected ones have been discovered from pulsar surveys (see Table 1 of Andrews & Mandel 2019), including two found in globular clusters. Though in shallower gravitational potentials compared to DNS mergers, DNS pulsars provide some of the most precise tests on gravitational theories in the strong-field regime with long-term timing observations (e.g. Stairs 2003; Kramer et al. 2006;). Gravitational theories are tested with DNS pulsars by comparing observed post-Keplerian (PK) parameters, which quantify effects beyond a simple Keplerian model of motion, to the predictions of a specific gravitational theory, e.g., the general theory of relativity (GR). However, the theory-dependent prediction of each PK parameters relies on the masses of the two DNS constituents. Therefore, one needs at least three PK measurements to test a gravitational theory, as two of them have to be used to determine the two DNS constituent masses (based on the theory to be tested). The PK parameters include (but are not limited to)Ṗ b ,ω, γ, r and s, which stand for, respectively, the orbital decay, the advance of periastron longitude, the Doppler coefficient, the "range" of the Shapiro delay effect and the "shape" of the Shapiro delay effect. To date, the best test of GR was provided with the double pulsar system PSR J0737−3039A/B (Kramer et al. 2006), thanks to the extra independent mass ratio constraint (unavailable for other DNSs).

PSR J1537+1155
PSR J1537+1155 (also known as PSR B1534+12, hereafter referred to as J1537) is the second DNS system discovered (Wolszczan 1991) in a 10.1-hr orbit. J1537 shows an exceptionally high proper motion among DNS pulsars (see Table 3 of Tauris et al. 2017), which has been explained with an unusually large kick of 175-300 km s −1 received from the second supernova (in the evolution history of J1537) (Tauris et al. 2017). Based on the timing observations of J1537, the combinedωγ -s test returned consistency with GR at the 0.17% level (Fonseca et al. 2014). However, its observed in-trinsicṖ b deviated from GR prediction, a result which was thought to be due partly or mostly to the poorly constrained distance to the pulsar (Stairs et al. 2002;Fonseca et al. 2014). Furthermore, due to the exceptionally high proper motion, the large uncertainty in the distance to J1537 has become the primary limiting factor of theṖ b test (Stairs et al. 1998(Stairs et al. , 2002Fonseca et al. 2014, also explained in Section 4).
The hitherto most precise distance to J1537 is 1.051 ± 0.005 kpc (Fonseca et al. 2014), obtained by solving for the distance that matches the orbital period derivative observed with pulsar timing, assuming the correctness of GR (Bell & Bailes 1996). However, such "timing kinematic distances" (which, in case of confusion, are conceptually different from the distances derived with radial velocities and a Galactic rotation model, e.g. Kuchar & Bania 1994;Wenger et al. 2018) cannot be used to test theories of gravity, as GR has been assumed to be correct. To carry out theṖ b test of GR with J1537, one has to have an independent measurement of its distance (Stairs et al. 2002) in order to correct the distancedependent terms from the observed orbital decay. Prior to this work, the best independent distance for J1537 has been based on its dispersion measurement (DM) along with a model of the distribution of Galactic free-electron density n e , i.e., 0.7 ± 0.2 kpc with the TC93 model (Taylor & Cordes 1993). However, there are significant downsides with employing DM-based distances for this purpose. While generally reliable for the population as a whole, DM-based distances can be inaccurate for individual sources (e.g. ). This inaccuracy is more likely for sources at high Galactic latitudes b, such as J1537 at b = 48 • , due to sparser pulsars (that allow DM measurements) in those directions. Moreover, the two more recent n e models (NE2001 and YMW16, Cordes & Lazio 2002;Yao et al. 2017) have been built using timing-derived distance of J1537, meaning the DM distance of J1537 is no longer independent for these two n e models.
Compared to the aforementioned ways to measure the distance to J1537, geometric measurements of the distance to J1537 (based on the change in angle or relative distance to the source as the Earth orbits the Sun) offer the ability to measure the source distance to higher precision and free of model dependency. Such geometric measurements can be realized with global fitting from pulsar timing or VLBI (very long baseline interferometry) observations in the radio band. Based on pulsar timing, the (timing) parallax of J1537 was measured to be 0.86 ± 0.18 mas (Fonseca et al. 2014). However, as is pointed out in Fonseca et al. (2014), precise determination of timing parallax is often hampered by the covariance between parallax and DM; the stochastic variations in the latter introduced by the changing sightline between the pulsar and Earth can corrupt the timing parallax. Therefore, VLBI astrometry remains the best way to obtain the most precise model-independent geometric distance to J1537.
In this letter, we present the astrometric results of J1537 obtained with VLBI observations spanning 6 years. Based on the new distance, we strengthen thė P b test of GR with J1537. Throughout this letter, the uncertainties are provided at 68% confidence level unless otherwise stated.

OBSERVATIONS AND DATA REDUCTION
As part of the MSPSRπ program (e.g. Ding et al. 2020a), J1537 was first observed with the Very Long Baseline Array (VLBA) at around 1.5 GHz from July 2015 to July 2017, which include 2 2-hr pilot observations under the project code BD179 and 9 1-hr ob-servations under the project code BD192. The astrometric campaign was extended with 6 2-hr VLBA observations between August 2020 and July 2021 under the project code BD229. The observation and correlation strategy is identical to that of the PSRπ program (Deller et al. 2019). ICRF J150424.9+102939 and ICRF J154049.4+144745 were observed as the bandpass calibrator and the primary phase calibrator, respectively. FIRST J153746.2+114215, 16. 3 away from J1537, has been identified and adopted as the secondary phase calibrator. At correlation of each observation, pulsar gating, based on pulse ephemerides of J1537 monitored with our timing observations, was applied to increase the S/N of detection.
All correlated data were reduced with the psrvlbireduce (https://github.com/dingswin/ psrvlbireduce) pipeline written in ParselTongue (Kettenis et al. 2006), which bridges python users to the two data-reduction packages AIPS (Greisen 2003) and DIFMAP (Shepherd et al. 1994). The final image-plane models of the two phase calibrators used for the data reduction can be found at https://github.com/dingswin/ calibrator models for astrometry.
The turbulent ionised interstellar medium between the Earth and J1537 leads to diffractive and refractive interstellar scintillation, which can increase or decrease the pulsar flux density (Stairs et al. 2002). In four of the 17 VLBA epochs, scintillation reduced the brightness of J1537 below the detection threshold. For each of the remaining 13 epochs of detection, the pulsar position and its statistical (or random) uncertainty was obtained by fitting an elliptical gaussian to the deconvolved pulsar image. The acquired pulsar positions are provided in Table 1.

ASTROMETRIC RESULTS
Upon obtaining the 13 pulsar positions, we proceeded to estimate their systematic errors. This is because small residual calibration errors remain, even though direction-dependent calibration terms (of systematic errors) have been mitigated by the use of a close in-beam calibrator. We used the empirically derived expression from Deller et al. (2019) to approach the systematic errors. For each epoch, the estimated systematic error was subsequently added in quadrature to the random error of the position. The positional uncertainties, including random and systematic errors, can be found in Table 1 alongside the pulsar positions. To make it easier for other researchers to reproduce the error budget, the image S/N for both J1537 and the secondary phase calibrator are also presented in Table 1. For the pulsar positions, the nominal systematic errors are around 0.14 mas and 0.33 mas in the right ascension (RA) and declination direction, respectively; in comparison, the median random errors are roughly twice the nominal systematic errors due to the faintness of J1537.
Based on the 13 pulsar positions and their associated positional uncertainties (including the systematic errors described above), we derived the astrometric results in three different methods: direct least-square fitting, bootstrap and Bayesian inference. Direct fitting was performed using pmpar (https://github.com/ walterfb/pmpar). A bootstrap was implemented as described in Section 3.1 of Ding et al. (2020a). Compared to direct fitting and bootstrap, Bayesian analysis offers a simpler means to incorporate prior astrometric information (obtained elsewhere), and to infer extra orbital parameters (e.g., the longitude of ascending node and inclination angle) when positional precision allows (e.g. Deller et al. 2013). We carried out Bayesian inference with sterne (aStromeTry bayEsian infeReNcE, https://github.com/dingswin/sterne). For the Bayesian inference, we assumed timing proper motion and parallax (reported in Fonseca et al. 2014) follow Gaussian distributions, and used them as prior distributions for proper motion and parallax; the negligible (at the 5 µas level) reflex motion of J1537 (i.e., sky-position shifts due to the orbital motion) was not fitted. For both bootstrap and Bayesian analysis, we adopted the median value (of the marginalized sample) for an astrometric parameter as the estimate, and used the 16th and 84th percentiles to mark the 1 σ uncertainty interval.
The astrometric results acquired with the three methods, as well as the three parallax-based distances, are summarized in Table 2. For comparison, the distances based on dispersion measure (DM) and pulsar timing are reproduced in Table 2. Here, the timing distance is quoted from the timing kinematic distance reported in Fonseca et al. (2014), which is derived from the orbital decay of J1537 by assuming GR is correct. In addition, the parallax signature, revealed by the 13 pulsar positions, is shown in Figure 1. Figure 2 presents the posterior samples simulated with MCMC in the Bayesian analysis, which suggests negligible correlation in most (7 out of 10) pairs of astrometric parameters. However, small correlation is found between the three astrometric parameters having RA component, i.e., the reference RA α J2000 , the RA proper motion component µ α and the parallax . The largest correlation coefficient |ρ| is 0.16 between and α J2000 , while |ρ| = 0.14 between µ α and .
According to Table 2, the astrometric results obtained with the three methods agree with each other; the new model-independent distances are generally con- sistent with the DM-based distance and the timing kinematic distance. The consistency between the new model-independent distances and the timing kinematic distance will be further improved in Section 4 after updating the timing kinematic distance. The small reduced chi-square χ 2 ν of 0.81 (for the method of direct fitting) implies that systematic errors for the 13 pulsar positions may have been slightly over-estimated. When applying the timing proper motion and parallax as prior information in the Bayesian analysis, χ 2 ν only rises a little to 0.84, which indicates the timing proper motion and parallax (Fonseca et al. 2014) are consistent with the VLBI data. Given a chi-square of ∼ 17 for 21 degrees of freedom, we did not see sufficient evidence to revise our estimated systematic uncertainties (reducing the estimated systematic uncertainty would bring the χ 2 ν closer to unity and increase the parallax significance).
In the following discussion, we adopt the astrometric results derived with Bayesian analysis, which incorporates the VLBI and timing measurements. For those who want to use VLBI-only results (such as pulsar timers of J1537), we recommend the bootstrap results in Table 2, as bootstrap can potentially correct improper error estimations to an appropriate level (see Ding et al. 2020b as a good example), especially when the number of measurements is relatively large ( 10 for VLBI astrometry).
whereṖ Gal b andṖ Shk b stand for the extrinsic orbital decay due to the apparent effect of radial acceleration Accordingly, the reference position errors do not take into account the position errors of the main and second phrase calibrators. b In the Bayesian analysis, we adopted the timing proper motion and parallax (Fonseca et al. 2014) as priors (assuming Gaussian distribution). c Taylor & Cordes (1993). For the two newer Galactic free-electron distribution models (Cordes & Lazio 2002;Yao et al. 2017), timing-derived distances of J1537 have been incorporated into their establishment, thus becoming correlated to the associated DM-based distances. d The timing results are reported in Fonseca et al. (2014); here, the quoted distance is the timing kinematic distance derived with the assumption that GR is correct (instead of with the timing parallax). We note that this timing kinematic distance is inferred withṖ Gal b = −3.5 fs s −1 (see Table 3 and Section 4 for more details) based on the Galactic mass distribution model by Nice & Taylor (1995). caused, respectively, by Galactic gravitational potential (Damour & Taylor 1991;Nice & Taylor 1995) and by transverse motion (Shklovskii 1970);Ṗ GW b represents the intrinsic orbital decay as a result of the GW emissions from the inspiraling DNS. The estimation of the two extrinsic orbital-decay terms rely on the distance to J1537, whileṖ Shk b also depends on the proper motion. On the other hand, the GR-basedṖ GW b can be calculated, provided the orbital period P b , the orbital eccentricity e and the masses of the two DNS constituents (Peters & Mathews 1963;Weisberg & Huang 2016), all of which have been precisely determined with pulsar timing (Fonseca et al. 2014). Hence, one can test GR by comparing the observed intrinsic orbital decaẏ . For J1537, this test is the one (among the tests with PK parameters, see Section 1.1) that showed the largest discrepancy with GR (see Figure 9 of Fonseca et al. 2014), possibly due to the unreliable DM-based distance used for the test.
Using Equation 22 of Weisberg & Huang (2016), we calculatedṖ GW b = −192.45 ± 0.06 fs s −1 . Using the proper motion µ = µ 2 α + µ 2 δ and the distance D acquired with Bayesian inference (see Table 2), we updateḋ P Shk b = µ 2 D/c · P b = 53 ± 4 fs s −1 . The uncertainties forṖ GW b andṖ Shk b were derived with error propagation, which were subsequently confirmed by Monte-Carlo simulations. In theṖ Shk b error estimation, we did not take into account the small correlation between µ α and (mentioned in Section 3). This is because the corre-lation between µ and is still negligible, as the declination component dominates the proper motion (see Table 2). can be attributed to two sources: the uncertainty in the measurements (such as distance and proper motion) and the inaccuracy of the Galactic mass distribution model. The formerṖ Gal b errors, at the ≤ 0.1 fs s −1 level (see Table 3), were derived with Monte-Carlo simulations. We approached the latterṖ Gal b errors with the standard deviation of thė P Gal b estimates listed in Table 3. For this calculation of the standard deviation, we do not include theṖ Gal b based on the analytical model by Nice & Taylor (1995), because 1) the analytical model is oversimplified (see the discussion in Appendix A of Zhu et al. 2018) and 2) the resultantṖ Gal b is inconsistent with other models (see Table 3). Accordingly, we adopted the averageṖ Gal b of the four remaining Galactic mass distribution models (Dehnen & Binney 1998;Binney & Tremaine 2011;Piffl et al. 2014;McMillan 2017) Table 2.
(1995) (see Table 3) for the calculation of the timing kinematic distance. Provided our newṖ Gal b , the timing kinematic distance of 1.05 kpc (reported by Fonseca et al. 2014 and quoted in Table 2) would decrease by 3% to 1.02 kpc, thus becoming consistent with the new model-independent distance (see Table 2).
Collectively, we reachedṖ int b = −188.0 ± 3.8 fs s −1 , corresponding toṖ int ḃ which is the third most precise orbital-decay test of GR in the strong-field regime according to Table 3 of Weisberg & Huang (2016). At the 2% precision level, the new observed intrinsic orbital decay is within 1.2 σ of the GR prediction (see Figure 3), which relieves the mild tension of the previousṖ b test (Ṗ int b /Ṗ GW b = 0.91±0.06 at 1.7 σ agreement, Stairs et al. 2002).
For visualisation, the mass-mass diagram of J1537 (updated from Figure 9 of Fonseca et al. 2014) is presented in Figure 3, which involves 6 PK parameters.
Apart from the 5 PK parameters already mentioned in Section 1, Ω spin 1 stands for the precession rate of the pulsar. Each PK parameter is a function of the two DNS constituent masses. Therefore, each observed PK parameter (and its uncertainty) offers a constraint on the two masses. If GR is correct, all mass-mass constraints should converge at the "true" masses of the pulsar and its companion. In Figure 3, this convergence is visible with the newṖ int b . Looking into the future, the bottleneck of the orbitaldecay test with J1537 will continue to be its parallax uncertainty, which would decrease with t −1/2 (e.g. Ding et al. 2021; here, t stands for the on-source time instead of the time span) in the long term despite fluctuations of J1537 brightness. This process of precision enhancement would be accelerated with high-sensitivity VLBI observations, as the VLBA observations of J1537 are generally sensitivity-limited, especially when J1537 is down-scintillated.  Nice & Taylor (1995) 1.1(1) −4.6(1) −3.51(6) a Dehnen & Binney (1998) 0.120(1) −2.04 (6)  b There are 4 models discussed in Dehnen & Binney (1998).
Here, we used the "model 3", which falls into the middle of the models 1 to 4, and is generally consistent with the other 3 models.
We are grateful to the anonymous referee for the swift and valuable comments, and appreciate the helpful discussions with Leonid Petrov, David Kaplan and Joseph Lazio. B.S. and A.L. thank Mitch Mickaliger for help with timing data acquisition and preparation. H.D. is supported by the ACAMAR (Australia-ChinA Consor-tiuM for Astrophysical Research) scholarship, which is partly funded by the China Scholarship Council (CSC). A.T.D is the recipient of an ARC Future Fellowship (FT150100415). Pulsar research at UBC is supported by an NSERC Discovery Grant and by the Canadian Institute of Advanced Research. Pulsar research at JBCA is supported by a Consolidated Grant from STFC. Parts of this research were conducted by the Australian Re-search Council Centre of Excellence for Gravitational Wave Discovery (OzGrav), through project number CE170100004. This work is based on observations with the Very Long Baseline Array (VLBA), which is operated by the National Radio Astronomy Observatory (NRAO). The NRAO is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc. Data reduction and anal-  Table 2) is provided with two dashed curves. For the other PK parameters, the green, pink, red, black and yellow strips stand for the mass-mass constraints posed by the time-averaged gravitational redshift γ = 2.0708(5) ms, the Shapiro delay "shape" s = 0.977(2), the Shapiro delay "range" r = 6.6(2) µs, the periastron advance rateω = 1.755795(2) deg yr −1 and the pulsar precession rate Ω spin 1 = 0.59 +0.12 −0.08 deg yr −1 , respectively (Fonseca et al. 2014).
ysis was performed on OzSTAR, the Swinburne-based supercomputer. This work made use of the Swinburne University of Technology software correlator, developed as part of the Australian Major National Research Facilities Programme and operated under license.