A new pulsar timing model for scalar-tensor gravity with applications to PSR J2222-0137 and pulsar-black hole binaries

Context. Scalar-tensor gravity (STG) theories are well-motivated alternatives to general relativity (GR). One class of STG theories, the Damour-Esposito-Farese (DEF) gravity, has a massless scalar field with two arbitrary coupling parameters. We are interested in this theory because, despite its simplicity, it predicts a wealth of different phenomena, such as dipolar gravitational wave emission and spontaneous scalarization of neutron stars (NSs). These phenomena of DEF gravity can be tested by timing binary radio pulsars. Aims. We aim to develop a new binary pulsar timing model DDSTG to enable more precise tests of STG theories based on a minimal set of binary parameters. The expressions for post-Keplerian (PK) parameters in DEF gravity are self-consistently incorporated into the model. The new technique takes into account all possible correlations between PK parameters naturally. Methods. Grids of physical parameters of NSs are calculated in the framework of DEF gravity for a set of 11 equations of state. The automatic Differentiation (AutoDiff) technique is employed, which aids in the calculation of gravitational form factors of NSs with higher precision than in previous works. The pulsar timing program TEMPO is selected as a framework for the realization of the DDSTG model. The implemented model is applicable to any type of pulsar companions. Results. We apply the DDSTG model to the most recently published observational data for PSR J2222-0137. The obtained limits on DEF gravity parameters for this system confirm and improve previous results. New limits are also the most reliable because DEF gravity is directly fitted to the data. We argue that future observations of PSR J2222-0137 can significantly improve the limits and that PSR-BH systems have the potential to place the tightest limits in certain areas of the DEF gravity parameter space.


Introduction
General Relativity (GR) proved to be the most successful theory of gravity for more than a century.Up to now, it has passed all experimental tests with flying colours.The weak field regime is verified by the Solar System experiments (Will 2014), whereas strong field effects are especially well tested by the timing of binary pulsars (Wex & Kramer 2020).Among other things, pulsar tests enable precise tests of the radiative properties of gravity and the strong equivalence principle (SEP).Furthermore, the largescale behaviour of gravity (low spacetime curvature and temporal variation) is tested in cosmological observations (Clifton et al. 2012a).Finally, in the last few years, ground-based grav-itational wave (GW) detectors observed GWs from coalescing binary black holes (Abbott et al. 2016(Abbott et al. , 2017c(Abbott et al. , 2019a(Abbott et al. , 2021b)), binary neutron stars (Abbott et al. 2017a(Abbott et al. ,b, 2019b) ) and also black hole-neutron star binaries (Abbott et al. 2021a); probing the hitherto unexplored highly dynamic, strong-field regime of gravity.Thus far, GR can account for all observed effects both in weak and strong gravitational fields, in the quasi-static and the highly dynamical regime, and on small as well as on large scales. 1  Despite such successes, there are still convincing reasons to investigate modified theories of gravity (Berti et al. 2015).GR describes gravitational interaction by a single massless, spin-2 tensor field g µν , known as the metric of spacetime.Some of the best motivated alternative gravity theories, the scalar-tensor gravity (STG) theories, incorporate additional scalar degrees of freedom to mediate gravity.Such scalar fields arise naturally in higher dimensional theories in their low energy limits, e.g., the old Kaluza-Klein theory (Kaluza 1921;Klein 1926) and string theories (Fujii & Maeda 2007).Scalar-tensor gravity theories are also motivated by cosmological questions of inflation, dark energy and even the thus far undiscovered quantum gravity (Clifton et al. 2012b).
The STG theory that for many years was considered as the only natural competitor of GR is known as Jordan-Fierz-Brans-Dicke (JFBD) gravity (Jordan 1955(Jordan , 1959;;Fierz 1956;Brans & Dicke 1961).2JFBD gravity is a metric theory (matter and nongravitational fields couple only to one specific spacetime metric) and therefore fulfils the Einstein equivalence principle by design (Will 1993;Damour 2012).However, an additional scalar field is nonminimally coupled to matter via the choice of a special conformal coupling function; thus, the strong equivalence principle (SEP) is violated (Will 2014).This coupling function depends on a single parameter ω BD , which by now is tightly constrained, in particular with the pulsar in the stellar triple system (Voisin et al. 2020).Bergmann (1968) and Wagoner (1970) presented the most general STG with one scalar field which is in its action at most quadratic in the derivatives of the fields.The most general mono-scalar-tensor theory with second-order field equations is Horndeski gravity (Horndeski 1974).In 1992, Damour & Esposito-Farèse (1992b) presented a generic class of STG theories with an arbitrary number of scalar fields.In the scope of this paper, we mainly focus on DEF gravity (Damour & Esposito-Farèse 1993).DEF gravity is a Bergman-Wagoner theory which has a massless scalar field and a quadratic coupling function with two arbitrary parameters.
Apart from being well-motivated, DEF gravity predicts interesting effects which are not present in weak fields of the Solar System but could be tested by employing pulsar astronomy.Neutron stars (NSs) have strong gravitational fields and large gravitational binding energies due to their high compactness.Because of this, in STG theories they can have large gravitational form-factors (also known as scalar charges), unlike weakly selfgravitating objects like normal stars and white dwarfs (WDs).Consequently, while STG theories generally predict the emission of dipolar and monopolar GWs, which increase the rate of orbital period decay relative to the GR expectation; this is expected to be especially strong in asymmetric binary systems containing a NS and a WD, due to the significant differences in their compactness and scalar charges.
Furthermore, Damour & Esposito-Farèse (1993, 1996) found a specific phenomenon in NSs called "spontaneous scalarisation".This fully non-perturbative effect excites the scalar field above the cosmological background value in NSs, allowing scalar charges of order unity, even if there is no deviation from GR in the weak-field regime.This could produce, for binary systems containing NSs with specific masses, a highly enhanced rate of dipolar GW emission.Some of the tightest tests of gravity with strongly selfgravitating objects are provided by the high precision timing of binary pulsars (Stairs 2003;Wex 2014;Shao & Wex 2016;Wex & Kramer 2020;Kramer et al. 2021).These tests are per-formed in a quasi-stationary strong-field gravity regime, where the gravitational field is strong near and inside NSs and produces high curvature, whereas the velocities are small compared to the speed of light v/c ∼ 10 −3 .The first timing model (BT) was introduced by Blandford & Teukolsky (1976) and allowed to extract information from the timing of binary pulsars.An extended model covering all relativistic effects in the dynamics of a binary system up to the first post-Newtonian level was proposed later by Damour & Deruelle (1985, 1986) (DD).
The DD model gave an opportunity to perform selfconsistent tests of gravity by means of the parametrised post-Keplerian (PPK) formalism (see, for instance, applications by Taylor & Weisberg 1989).In this generic framework, theoryindependent Keplerian and post-Keplerian (PK) timing parameters, which quantify the relativistic motion of the pulsar and the propagation of its radio signals, are fitted to the observational timing data.Apart from the well measured Keplerian parameters of the orbit, each PK parameter depends only on the masses of the pulsar and its companion.As soon as the expressions for the PK parameters in a particular gravity theory are known, these two masses can be derived from the PK parameters once at least two of them are known.If more than two PK parameters become available, then a test of the consistency of that gravity theory can be made.Later, Damour & Taylor (1992) expanded the DD model and introduced pulse structure parameters.They also showed that the DD model can be applied to a large set of fullyconservative theories of gravity and presented phenomenological PK expressions in a framework of generic boost-invariant theories (the modified Einstein-Infeld-Hoffmann formalism).
However, such an independent measurement of PK parameters means that any correlations between them inadvertently reduce the sensitivity of their measurements.Taylor & Weisberg (1989) solved this problem for GR with the introduction of the DDGR model, where the GR expressions for PK parameters are incorporated in the model, and two masses are directly fitted to the observational timing data.This mitigates or breaks any correlations between the PK parameters.
In this work we present a new pulsar timing model, the "DDSTG" model.The idea is very similar to that of the DDGR model: fitting directly for the masses of the components from pulsar timing data, but instead of GR using a particular member of the two-parametric DEF gravity.Apart from the theoretical predictions for all PK parameters within that theory, the model uses pre-calculated grids of physical parameters (gravitational form-factors) of NSs in that particular gravity theory.The direct fit of two masses of the companions reduces the model to a minimal set of parameters and naturally solves all the issues of possible correlations between the observed parameters.This feature, achieved by the construction, makes the DDSTG model superior to the traditional methods, e.g. the "PK method" based on the measurement of PK parameters.
As an application of the new timing model DDSTG, we use the most recently published timing data for PSR J2222−0137 from Cognard et al. (2017) and Guo et al. (2021b).This pulsar is interesting for several reasons: It is one of the closest pulsars known and has a good timing precision.A precise measurement of the variation of the orbital period ( Ṗb ) and the high asymmetry in compactness between the pulsar and its companion allows tight constraints on the emission of dipolar GWs.Moreover, the pulsar's mass m p = 1.831(10)M (Guo et al. 2021b) lies in the range (m p 1.5M ) subject to the spontaneous scalarisation effect; the non-detection of dipolar GW emission in this system has introduced strong constraints on the existence of that phenomenon, at least within the DEF framework (Zhao et al. 2022).
We organise the paper as follows: in Section 2, we introduce the basics of the Damour-Esposito-Farèse (DEF) gravity and present our machinery preparations necessary for the implementation of the new model.In Section 3, we discuss so far used standard timing models and introduce our new model DDSTG for testing STG theories.The next sections are devoted to the applications of our new timing model.In Section 4, we provide a brief description of the Guo et al. (2021b) data-set on PSR J2222−0137 and apply the DDSTG model to it, discuss the limits on DEF gravity and demonstrate that our new method provides indeed tighter and more reliable constraints than previous methods.In Section 5, based on simulated data-sets for the largest radio observatories, we discuss what limits on these theories might be achieved in the near future with continued timing of PSR J2222−0137 .In Section 6, we show possible future constraints from reasonable pulsar-black hole (PSR-BH) binary systems.We also briefly discuss the possible origin of PSR-BH systems and how it affects the test if such a system is located in a globular cluster (GC).Section 7 is devoted to a discussion of other alternative gravity theories and possibilities of DDSTG extensions beyond DEF gravity.In Section 8 we discuss the effect of the time-varying gravitational constant as a perspective future extension of the DDSTG model.In Section 9 we summarise our results.

Damour-Esposito-Farèse gravity
This paper investigates STG theories, natural alternatives to GR.Specifically, we mainly focus on DEF gravity.In the following we summarise aspects of that class of gravity theories that are relevant for this paper.

Theoretical aspects
DEF gravity is a STG theory with one long range, massless scalar field ϕ non-minimally coupled to the curvature scalar.The theory is fully described by the action, which is most simply presented in the Einstein frame (Damour & Esposito-Farèse 1993, 1996) where G * is the bare gravitational coupling constant, g * and R * are the determinant and the Ricci scalar associated with the Einstein metric (g * µν ).The last term in Eq. ( 1) describes an action associated with any matter fields (ψ m ).A(ϕ) is the coupling function and takes a specific exponential form in DEF gravity: Introduced quantities {α 0 , β 0 } are two arbitrary parameters of the theory and ϕ 0 is the scalar field at spatial infinity.The coupling function A(ϕ) plays the role of a conformal factor connecting the "physical metric" gµν = A 2 (ϕ)g * µν with the Einstein metric.The metric gµν is the one measured by laboratory clocks and rods due to the universal coupling of matter to this metric.The bare gravitational constant is simply related to the one measured in a Cavendish experiment by G Cav = G * (1 + α 2 0 ).GR corresponds to vanishing coupling parameters, i.e. α 0 = β 0 = 0, and JFBD gravity is recovered by setting β 0 = 0 while keeping α 0 0.
The field equations are derived by variation of the action in Eq. ( 1) and are most simply formulated in terms of the pure-spin field variables (g * µν , ϕ) in the Einstein frame: (2) with a material stress-energy tensor T µν * = 2cg −1/2 * δS m /δg * µν and α(ϕ) = ∂ ln A(ϕ)/∂ϕ, which measures the coupling strength between the scalar field and matter.
The parameterised post-Newtonian (PPN) framework allows describing a wide range of metric theories of gravity in their weak field approximation by ten independent PPN parameters.DEF gravity is a fully conservative theory and has only two PPN parameters that deviate from their GR values, which are also called Eddington parameters (Will 1993): (5) The PPN parameter γ Edd measures the spatial curvature induced by unit rest mass and β Edd is a measure of the amount of nonlinearity in the superposition law of gravity.Solar System experiments put limits on these parameters.The |γ Edd − 1| 2.3 × 10 −5 limit comes from the Shapiro time delay observed by the Cassini spacecraft (Bertotti et al. 2003;Will 2014).The |β Edd − 1| 8 × 10 −5 limit comes from observations of the perihelion shift of Mercury (Will 2014).The Cassini experiment yields a direct limit on α 2 0 1.15 × 10 −5 , whereas β 0 remains unconstrained from weak-field Solar System experiments, since α 0 can be arbitrarily small.

Scalarisation of neutron stars in DEF gravity
To place firm limits on β 0 parameter, we need to explore the strong-field effects of DEF gravity.One can expect deviations from GR in the presence of highly compact massive objects, in particular for NSs, due to the scalar field sourced by the strong internal gravitational field of the star.When placing limits on β 0 it is important that for certain areas in the DEF gravity parameter space one has a significant growth of the scalar field in the NS interior, even if the deviations for weakly self-gravitating bodies are very small, or even zero (Damour & Esposito-Farèse 1992a;Damour & Esposito-Farèse 1993, 1996).The origin of this spontaneous scalarisation lies in the tachyonic instability of the field equations happening when a NS reaches a sufficiently high compactness.The mechanism of this instability is similar to what happens in ferromagnets during spontaneous magnetisation (Damour & Esposito-Farèse 1996).It is a fully nonlinear non-perturbative effect, thus it requires accurate techniques to be properly analysed.
For testing DEF gravity, it is crucial to introduce three gravitational form-factors appearing in the expressions for PK parameters where all the derivatives are taken for a fixed baryonic mass ( mA ).The important quantity α A counts the effective couple strength between the ambient scalar field and a NS.The subscript A refers to the star A. α A is a strong-field counterpart to the parameter α 0 , which is its weak field approximation.If a NS mass exceeds the critical mass of m crit then α NS can become suddenly of order of unity α NS ∼ O(1) due to the scalarisation effect, while its weak field counterpart stays close to zero α 0 ∼ 0. The value of the critical mass depends on β 0 and the chosen equation of state (EOS).The second parameter β A reflects a change of the scalar charge when the asymptotic scalar field changes.The derivative β A can also obtain very large values in the transition scalarisation region, where α A changes fast with the change of a mass.The last quantity k A describes the field dependence of the moment of inertia I A and can become important in the eccentric systems.

Calculating grids of NSs
To test DEF gravity in precision timing experiments of binary pulsars, one must know the exact properties of NSs in that specific class of alternative gravity.In our work, we follow the procedure described in Damour & Esposito-Farèse (1996) for calculating NSs in DEF gravity.We assume the slowly rotating NS approximation with a stationary axisymmetric metric.Assuming this axial symmetry and neglecting terms of rotational velocity squared (or higher), the Einstein equations can be written as a system of 8 first-order ordinary differential equations (ODEs) for a set of radial functions (see the representation by Damour & Esposito-Farèse (1996)).The equations are complemented by appropriate boundary conditions placed at spatial infinity and the NS centre.Boundary conditions at spatial infinity and thus properties of NSs depend on the values of the theory parameters {α 0 , β 0 }.For solving NS structures, we developed a modular program in the Julia language (Bezanson et al. 2017).The program enables us to calculate both single NSs for selected parameters and grids of NSs for many desired parameters.The calculations are performed accurately and efficiently using the methods of parallel programming.Currently, DEF gravity is implemented as the chosen theory of gravity, but our program allows to use any coupling function A(ϕ) to extend our tests beyond DEF gravity parametrisation.
The internal structure and NSs parameters significantly depend on EOS.The strong dependence of the scalarisation phenomenon on the choice of EOS was extensively analysed by Shao et al. (2017) and Zhao et al. (2022) in order to put constraints on this non-linear phenomenon from radio pulsars.In our work, we use a piece-wise polytropic approximation for the EOSs following the procedure in Read et al. (2009).We select 11 EOSs that allow us to cover a range from soft to stiff while still being in agreement with the maximum mass observed for a NS (for more information about selected EOS we refer to Appendix A.2).For tests in the present paper, we select MPA1 highdensity EOS (Müther et al. 1987) in its piece-wise polytropic form (Read et al. 2009) with three polytropic pieces.MPA1 is a stiff EOS with a maximum mass of a NS equal to 2.46 M .For low densities, we use an analytic form of the SLy EOS proposed by Douchin & Haensel (2001) and approximated by four polytropic pieces (Read et al. 2009).In general stiff EOSs (e.g.MPA1) produce more conservative limits in most parts of DEF gravity parameter space.
In our program, we utilise the Automatic Differentiation (AutoDiff) technique (see e.g.Margossian (2018)) for calculating derivatives of NSs quantities.AutoDiff is a powerful alternative to numerical differentiation deprived of numerical errors (for detailed explanation see Appendix A.1). AutoDiff algorithm utilises exact expressions of derivatives for elementary functions and a chain rule to calculate complex derivatives with the working machine-precision.Thus our program with applied AutoDiff technique calculates gravitational form-factors {α A , β A , k A }, presented as derivatives in Eq. ( 6), with very high numerical precision.
Using the developed program, we calculate gravitational form-factors and masses for an extensive grid of DEF parameters and central pressures {α 0 , β 0 , p c } assuming a specific EOS.Our grids cover ranges of α 0 ∈ [−10 −1 , −10 −5 ], β 0 ∈ [−6, +10], p c ∈ [10 34 , 10 36 ] dyn/cm 2 and are sampled linearly for log α 0 , log p c and β 0 .The size of each grid is {101, 351, 121} points respectively.More technical details about the procedure of calculating NS properties, the AutoDiff technique, and the structure of precalculated grids can be found in Appendix A. We stick to precalculating grids because it is numerically costly to calculate NS structures for a particular mass on the fly.Further one can perform interpolation over these pre-calculated grids or methods beyond interpolation, e.g reduced-order models in Guo et al. (2021a).The achieved accuracy of grid values and further interpolation over the grid is sufficient for our purposes.
The accuracy of the calculations can be checked by comparison of the scalar charge (α A ) calculated as the derivative from Eq. (6) α A, deriv and its direct value from the asymptotic behaviour of the external scalar field obtained after integration α A, asympt (Damour & Esposito-Farèse 1993).The relative accuracy expressed by δ rel = |1 − α A, deriv /α A, asympt | lies in the range δ rel ∼ 10 −7 − 10 −13 for our calculations.The calculated scalar charges are significantly more accurate compared to previous works, e.g., Anderson & Yunes (2019) with typical accuracy of δ rel ∼ 10 −1 − 10 −5 .However, it is essential to mention that the accuracy of several percent is already sufficient for testing DEF gravity with binary pulsar timing.
Besides, we also find that the NS structure appears to be very sensitive to internal thermodynamic consistency.The piecewise polytropic EOSs fulfil the thermodynamic consistency by their definition.However, often one uses tabulated EOSs instead, which must be interpolated.Unfortunately, a commonly used log-log interpolation of tabulated EOSs can lead to significant numerical errors in the scalar derivatives of Eq. ( 6) because of its thermodynamic inconsistency. 3 For this reason, we use piece-wise polytropic approximations instead of tabulated 3 Doing an interpolation of a tabulated EOS in a way that correctly accounts for the first law of thermodynamics is not a trivial problem and requires a formulation in terms of Helmholtz free energy (Swesty 1996).If one uses a simple linear interpolation in {log n, log p} and {log n, log ε} planes for a tabulated EOS (n is the baryon number density and ε is the energy density), the accuracy check can fail by the amount of δ rel ∼ O(1) − O(0.1) in the region of strong scalarisation.The reason is that such interpolations do not obey the first law of thermodynamics dn d = n +p which is assumed for structure equations and defines the baryonic mass mA (Hartle 1967;Damour & Esposito-Farèse 1996).As a consequence, this not only affects NS masses but even more so the EOSs, which is absolutely sufficient for conducting our tests and for exploring the EOS dependence of pulsar constraints on DEF gravity.

A new timing model for scalar-tensor gravity
To place constraints on a gravity theory from the observations of a binary pulsar, one must know its predictions for the binary system's motion and the propagation of the electromagnetic signal in the curved spacetime of the binary system.The timing formula is the tool to capture the relativistic effects predicted by the theory from a sequence of pulse arrival times on Earth.The timing formula relates the observed (topocentric) time of arrival (TOA) of the pulse and its time of emission.

Binary pulsar timing models
The first timing model (BT) was introduced by Blandford & Teukolsky (1976) in order to describe the timing of the first binary pulsar, PSR B1913+16, which had been discovered earlier by Hulse & Taylor (1975).The BT model assumes that the pulsar and its companion follow a Keplerian motion with additional secular changes in the Keplerian parameters of the orbit.utilised Keplerian parameters are the orbital period (P b ), the epoch of periastron passage (T 0 ), the orbital eccentricity (e), the longitude of periastron (ω), and the projected semi-major axis of the pulsar's orbit (x).The model accounts for a combination of a special-relativistic time dilation and a gravitational redshift; this periodic effect is described by an extra parameter called the Einstein delay (γ).The BT model accounts for (linear-in-time) secular changes in P b , e, ω and x, introducing PK parameters: rates of change of the orbital period ( Ṗb ), eccentricity (ė), longitude of periastron ( ω) and projected semi-major axis ( ẋ).These secular changes can be caused by both relativistic and astrometric effects (Damour & Taylor 1992;Lorimer & Kramer 2004).
Later, Damour & Deruelle (1986) proved that all of the independent O(v 2 /c 2 ) timing effects could be described in a simple mathematical way for a wide range of alternative gravity theories.They developed a phenomenological (i.e.theory independent) timing model based on the full first post-Newtonian description of the two-body problem, which we will refer to as the "DD" model that uses a quasi-Keplerian solution to the equations of motion of a 2-body problem.This model allows working within a parametrised post-Keplerian approach (PPK).Damour & Taylor (1992) then showed that the DD model could be used to constrain a wide range of conservative theories of gravity obeying the modified Einstein-Infeld-Hoffmann (mEIH) framework (see also Will 1993).
For the binary system part, the timing formula of the DD model reads as: where t b is the Solar System barycentric (infinite-frequency) arrival time, t 0 is a chosen reference epoch, T is the pulsar's proper time, and D is a Doppler factor accounting for the relative radial motion of the centre of mass of the binary system and the Solar System barycentre.The quantities ∆ i in Eq. ( 7) are different time delays introducing corrections due to internal binary effects.Splitting the timing formula into a set of different contributions is to some extent a coordinate-dependent concept, however, within the approximations used it is convenient to work derivatives in Eq. ( 6), including the expression of α A from Eq. ( 6) as the derivative of the mass which enters the accuracy check.
with these individual expressions for timing delays.The term ∆ R (T ) is called "Roemer delay" and counts for the classical light travel time through the binary.∆ E (T ) is the "Einstein delay" and relates the proper emission time to the coordinate time of emission.∆ S (T ) is the "Shapiro delay" arising from the effect of the gravitational potential of the companion on the propagation of the pulsar signals.The "aberration delay" ∆ A (T ) places corrections due to the periodic changes in the direction of pulse emission (as seen in the frame of the rotating pulsar) while the pulsar follows its binary motion.
All the expressions depend on three sets of parameters.Keplerian parameters are and remain the same as for the BT model (the subscript 0 means a value at a given epoch).Separately measurable PK parameters are The DD model introduces a periastron-shift parameter k = ωP b /(2π), Shapiro "shape" (s = sin i, where i is the inclination angle) and Shapiro "range" (r) parameter for the signal propagation, and a relativistic deformation of the orbit (δ θ ).Not separately measurable PK parameters are including a second parameter for the relativistic deformation (δ r ), two aberration parameters (A and B) and a Doppler factor (D).
The mentioned delays in the framework of the DD model are presented by expressions ∆ S (T ) = −2r ln 1 − e cos u − s sin ω(cos u − e) where the secular changes incorporated in the projected semimajor axis x and eccentricity e are given by The true anomaly (A e ) and ω depend on the eccentric anomaly (u) via following the relations where u is itself a function of T and defined by solving Kepler's equation The parameters in Eq. ( 10) cannot be measured separately from the parameters in Eqs. ( 9) and ( 8) as shown by Damour & Deruelle (1986).It means that all not separately measurable parameters can be fully absorbed into the change of other parameters.More details about measurable parameters and the connection between observed parameters and the intrinsic parameters can be found in Damour & Taylor (1992).The redefinition of orbital parameters can absorb the Doppler factor (D).Such a redefinition does not affect the tests because it is only a rescaling of physical units, thus D is set to 1. 4 The aberration parameters A and B can be absorbed as well by redefinition of T 0 , x, e, δ r and δ θ .

A timing model in DEF gravity
The DD model can be applied to a wide range of fullyconservative theories of gravity.DEF gravity is a fullyconservative theory (Will 1993), thus we can investigate it in the framework of the DD model.
In different theories of gravity, the functional dependence of the PK parameters, i.e.
can differ substantially because of strong-field effects involving a highly compact NS and its companion.The PK parameters are functions of two masses, thus one can perform gravity tests if at least 3 PK parameters are measurable.In general, if there are n measured PK parameters, one can do n−2 independent tests for a given gravity theory.A common procedure of making tests using expressions for PK parameters is discussed in detail in Section 4, devoted to applications of a new model and comparison between different techniques.
The core part of the new timing model is the predictions for PK parameters (20) in DEF gravity.We use expressions for PK parameters in DEF gravity provided in the literature (Damour & Esposito-Farèse 1992b;Damour & Esposito-Farèse 1993, 1996).However, for parameters δ θ , δ r , A and B there are only more general expressions.A phenomenological theory-independent description of PK parameters from which we derive their expressions in DEF gravity is presented in Damour & Taylor (1992).The expressions of PK parameters and details are given in Appendix B.
The aberration parameters A and B can also be calculated in DEF gravity, provided we know the system's geometry.The applications we use in the paper assume a special situation when the pulsar's spin axis is aligned to the orbital angular momentum of the system.The alignment is a reasonable assumption for a recycled pulsar with a WD companion due to the preceding accretion process during the system's evolution.In this special case, A is calculated according to Eq. (B.7) assuming angles η = −π/2 and λ = i (Damour & Taylor 1992).The second aberration parameter B = 0 in this special situation.In general, an alignment may not be the case for a particular binary pulsar (e.g.double neutron star systems), but the new model can also handle this misalignment.Once the system's geometry is assumed, real A and B values can be calculated for a given epoch without the necessity to redefine other parameters. 5he PK parameters from Eq. ( 20) depend on the orbital parameters of the system and the physical properties of the pulsar.However, we also have to know properties of the companion.The PK expressions (B.1 -B.13) depend on gravitational formfactors of the companion {α c , β c }.The quantity k c does not appear in the expressions because they take into account only leading order terms, and is therefore of no interest.These quantities, in turn, depend on the type of companion.If the companion is a NS, they are calculated in the same way as for the primary pulsar using Eq. ( 6).If the companion is a black hole (BH), then the gravitational form-factors all vanish: This is a consequence of the "no-hair" theorem; the BH is not scalarised in DEF gravity (Hawking 1972;Damour & Esposito-Farèse 1992a).Section 6 is dedicated to a more thorough discussion on binary pulsar systems with a BH and the results which we can obtain from timing of a pulsar in a relativistic orbit with a BH.For a WD companion, the gravitational form-factors are approximated by their weak field expressions Such an approximation is sufficient for our purposes because WDs are not compact enough to show the strong field effects present in NSs and BHs.The next order approximation for weakly self-gravitating objects is α A α 0 (1 − 2s A ), where s A G * m A /(Rc 2 ) is the sensitivity (R is the object's radius) and has a typical value of s WD ∼ 10 −5 − 10 −3 for a WD (Damour & Esposito-Farèse 1992a;Damour & Esposito-Farèse 1993;Will 2018b).Thus the usage of weak field counterparts {α 0 , β 0 } for a WD companion instead of precisely calculated values is justified.
To summarise, the timing model in DEF gravity is defined by two theory parameters {α 0 , β 0 }, the chosen EOS for a NS, and the type of the companion among {WD, NS, BH}.We name the new model DDSTG arises as direct extension of the DD model for STG theories.

DDSTG implementation into TEMPO
Once we obtain the theoretical part of the new DDSTG timing model, we need to apply it to the timing data.We implemented the DDSTG model into one of the commonly used pulsar timing software -the TEMPO6 program (Nice et al. 2015).The local implementation of DDSTG model in TEMPO and precalculated grids of masses and gravitational form-factors are supplied with the paper7 .The authors intend to make DDSTG a part of the official TEMPO distribution to ensure forward compatibility.
There is a standard procedure of analysing radio pulsar timing data.The preprocessed TOAs are obtained after radio observations of the desired pulsar system.During the procedure, TEMPO reads TOAs, parameters of the binary model, and some coded instructions from supplied files.Then TEMPO fits the selected timing model accounting for the transformation to the Solar System barycentre, pulsar rotation, and its spin down for a chosen binary model.
Specifically for DDSTG, a user selects theory parameters {α 0 , β 0 }, EOS, and the type of a companion.During the initialisation TEMPO reads pre-calculated 3D grids of gravitational form-factors and NS masses {α A , β A , k A , m A } (see Section 2.3) for the chosen EOS.Each 3D grid depends on {α 0 , β 0 , p c } parameters, where p c is the central pressure.Then gravitational form-factors and masses are interpolated for the selected {α 0 , β 0 } values and saved into a smaller 1D grids (depending only on p c ) in the program memory.The final 1D grids remain unchanged and are used to interpolate gravitational form-factors for a particular theory in the mass range during the fitting procedure of TEMPO.
TEMPO with the selected DDSTG model fits two masses: the total mass (m tot ) of the system and the companion's mass (m c ).Every time when these masses change, the model recalculates the gravitational form-factors for the pulsar and the companion.Once the gravitational form-factors (Eq.( 6)) are known, the model calculates all PK parameters (Eqs.( 9) and ( 10)) using equations given in Appendix B. These calculated PK parameters are used afterwards in the timing formula (7).The details about the DDSTG implementation and the description of model parameters can be found in Appendix C.
In the timing formula ( 7), the Solar System barycentric arrival time (t b ) is known.However, to obtain the proper pulsar time (T ), one has to perform the inversion of the timing formula, i.e. get T as a function of t b : T = t b − ∆(t b ).The original DD model utilises an approximate analytic inversion, which sometimes is not accurate enough, for example, for the Double Pulsar (Kramer et al. 2021).In contrast, our DDSTG model performs accurate numerical inversion.Eq. ( 7) is solved iteratively for T for each TOA.The discrepancy due to an inaccurate inversion may influence the test for precise timing in the future or PSR-BH systems discussed further below.Nowadays, numerical inversion is considered a new standard and implemented in the DDS model of TEMPO (Kramer et al. 2021) and generally used in PINT (Luo et al. 2021).
The DDSTG model produces the best fit to the data for DEF gravity with a selected set of parameters and a given EOS.The output of TEMPO stays the same as for other binary models, e.g. the calculated χ 2 value and root mean square error of fit.The χ 2 statistics may be applied for further tests to compare the results for different theory parameters.

Advantages of DDSTG
The tests with pulsar data allow placing constraints on the gravity theory parameters.The most simple and common way to achieve this is to compare the measured PK parameters with the theoretical predictions from the theory.PK parameters are obtained by fitting a phenomenological model to the observational data, e.g., DD model.A phenomenological model estimates PK parameters and their uncertainties.
Within a particular gravity theory, each PK parameter depends on the masses of the pulsar and the companion and thus corresponds to a curve in a mass-mass space.Together with the measurement error, each PK parameter produces a strip in the {m p , m c } space.If there are two measured parameters, one may find the intersection area and obtain the estimated mass values.If the number of measured parameters is more than two, one can perform a test of that particular gravity theory.The test is passed if all three curves intersect in the range of errors at one point.Generally, n measured PK parameters result in n − 2 independent tests.
If a gravity theory has arbitrary parameters, tests can be done for any fixed values of theory parameters.For DEF gravity, this procedure results in an "allowed" area in {α 0 , β 0 } space that pass gravity tests within the measurement uncertainties.This area is bounded within the selected confidence limit by a curve, which is usually plotted.To date, the regions allowed by different tests have always included GR.
If the companion is optically bright, one can get information about the system from optical observations.For example, for a binary pulsar system with a bright WD it is possible to obtain the mass ratio and the companion mass using high-resolution optical spectroscopy of the companion (e.g., Antoniadis et al. 2012Antoniadis et al. , 2013)).In such cases, we do not have to measure three independent PK parameters based on the radio data, but we can combine several multi-wavelength constraints.
Another issue are possible correlations between PK and other parameters.Observed correlations can come from the theoretical correlation of parameters within the binary model (e.g.T 0 ↔ ω and P b ↔ ω in a low eccentric case) and from a nonuniform data sampling (e.g. a nonlinear correlation in a parametrisation of Shapiro delay r ↔ s).Often in the past, measured PK parameters were published without any information about observed correlations.This additional information from the timing data is lost in such cases.These days it became a common practice to publish observed correlations and even provide them explicitly.Anderson & Yunes (2019) and Anderson et al. (2019) accounted for possible observed and theoretical correlations between PK parameters within DEF and MO gravity via computationally highly demanding Markov chain Monte Carlo (MCMC) simulations utilising the correlation matrix from the TEMPO output.However, this elaborate approach would not work if the relativistic effect is not well measured or the dependence between the parameters is highly nonlinear.If the relation is nonlinear, the correlation matrix gives information only about the linear contribution.To fully account for a nonlinear relation one has to obtain a full probability density function in the parameter space.
Contrary to the PK method, the DDSTG model accounts for all even nonlinear correlations naturally and breaks them by a direct fit of the masses.This was already one of the main advantages of the DDGR model; and is one of the reasons why the DDSTG uses the same superior approach while extending it to STG theories.The model accounts for all effects internally and extracts all the information from the timing data.The information is not lost even if the correlating parameter cannot be measured but influences other parameters.Such possible correlations are relevant for PSR J1141−6545 (Venkatraman Krishnan 2019; Venkatraman Krishnan et al. 2020), where there is a correlation between the time dilation parameter (γ) of the Einstein delay and the rate of change of the projected semi-major axis due to the spin-orbit coupling caused by the fast rotating WD companion ( ẋSO ).Furthermore, the Shapiro delay cannot be measured independently but is still essential (see e.g.Bhat et al. 2008).We expect the DDSTG model to be of particular advantage for PSR J1141−6545 and there will be a dedicated paper (Venkatraman Krishnan et al. in prep.) about applying the new model to this system.
Compared to conventional methods, the constraints obtained by DDSTG may, in general, become either tighter or weaker depending on the particular case.However, the resulting restrictions are more reliable by the construction.This weakening of the limits can happen because of unaccounted correlations.

Application to PSR J2222−0137
As a demonstration of the DDSTG model, we now apply it to published timing data of a binary pulsar.We select PSR J2222−0137 because of its unusual characteristics, which we now list in detail.

About the system
We are particularly interested in recycled pulsars because of their, in general, better timing precision essential for precise gravity tests.The radio pulsar PSR J2222−0137 was discovered in the Green Bank Telescope (GBT) 350 MHz drift scan pulsar survey (Boyles et al. 2013).It is a mildly recycled pulsar which has a spin period (P) of 32.8 ms.The pulsar is in a binary system with P b of 2.44576 days and x of 10.848 light-seconds.We also expect the pulsar's spin axis to be aligned to the orbital angular momentum of the system.The alignment happens as a consequence of the pulsar recycling process, when the pulsar accretes matter from the companion.PSR J2222−0137 is already known as a unique laboratory for testing gravity theories because of its special characteristics; for more detailed information, we refer the reader to Cognard et al. (2017) and Guo et al. (2021b).It is one of the closest pulsars known, with the most precise distance measured with Very Large Baseline Interferometry (VLBI, see Deller et al. 2013) and excellent timing precision.The system has a highly significant detection of the Shapiro delay as well as the measured rate of advance of periastron ( ω), which yield precise mass measurements and ∼ 1% test of the GR predictions for the Shapiro delay (Guo et al. 2021b).
The most important characteristic in the scope of this paper is the precise measurement of Ṗb .Given the precise masses, this can be compared with a precise prediction for the orbital decay due to GW damping, furthermore the kinematic contributions to Ṗb (see Section 4.4) can be estimated precisely because of the good distance measurement.The Ṗb measurement constrains dipolar GW emission in this system, which would arise in DEF gravity (Damour & Esposito-Farèse 1992a) because of the large difference in the compactness of the components (NS and WD).Finally, the mass of the pulsar (m p = 1.831(10)M ) (Guo et al. 2021b) lies in the range where spontaneous scalarisation happens, yielding strong limits on this highly non-linear phenomenon (Shao et al. 2017;Zhao et al. 2022).

Observational data
The PSR J2222−0137 timing data used in this work are the same as used by Guo et al. (2021b).This includes observations of this pulsar going back to its original follow-up, which started on 2009 June 23 using the Green Bank Telescope (GBT); however these ended on 2011 December 26.Regular observations with the three largest European radio telescopes (3ERT) started the following years: the Nançay Radio Telescope on 2012 October 4, The Lovell Telescope at Jodrell Bank on 2012 November 20 and the dense orbital campaigns with the Effelsberg 100-m radio telescope on 2015 October 26.These observations continue to the present day; however Guo et al. (2021b) only analysed the data obtained until 2021 May 2; their processing and how the resulting ToAs were analysed is described in detail by Cognard et al. (2017) and Guo et al. (2021b).
Observations with the MeerKAT radio telescope array and the Five hundred meter Aperture Spherical Telescope (FAST) started in 2019 September 24 and 2020 October 5, but these timing data were not used by Guo et al. (2021b).However, the latter authors did use FAST data for a detailed study of the emission properties of PSR J2222−0137.In this work, we also extend the existing data by simulated data assuming the same timing properties of all of these telescopes to simulate how the timing parameters for this system will improve in the near future (see Section 5).

Mass-mass diagrams in DEF gravity
The illustrative and straightforward way to explore if a given gravity theory agrees with binary pulsar observational data is to calculate mass-mass diagrams.As mentioned in Section 3.4, firstly, one measures values of PK parameters with their uncertainties by fitting the phenomenological binary model DD to binary pulsar data.Then the observed PK values can be compared with their theoretical predictions of a specific gravity theory.The PK parameters from Eq. ( 20) depend on the mass of the pulsar (m p ) and the mass of the companion (m c ) (see Appendix B).For PSR J2222−0137 we use the measured PK parameters from Guo et al. (2021b).
To illustrate how the tests are performed, we select two points in DEF gravity parameter space near the scalarisation region with large negative β 0 and calculate mass-mass diagrams for them.The test is sensitive to the predicted dipolar contribution to Ṗb , which rises dramatically in the region of spontaneous scalarisation.The first point with α 0 = −10 −4 , β 0 = −4.3does not predict a strong enough dipolar contribution and passes the test within a 1σ limit.The corresponding mass-mass diagram is presented in the left panel of Figure 1.In contrast, the second point with α 0 = −10 −4 and a bit smaller β 0 = −4.35 is already excluded because of a rather strong scalarisation of the pulsar leading to significant dipolar GW damping (see the right panel of Figure 1).The Ṗb curve nicely shows that a significant enhancement in the scalarisation happens for a particular interval in the mass range defined by the EOS.

Various contributions to Ṗb
At this point we need to discuss the different contributions to the observed Ṗobs b and their influence on our results.The observed orbital decay of the system consists of many terms where the first term ṖGW b is due to GW damping and can include dipolar and monopolar terms in DEF gravity, besides the general quadrupolar prediction of GR.The full expressions of different GW contributions are presented in Appendix (B.9).The next two terms are of kinematic origin: the Shklovskii effect ( ṖShk b ) and the Galactic contribution ( ṖGal b ).They are the result of a timevarying Doppler factor D due to an (apparent) radial acceleration between the pulsar binary and the Solar System (Damour & Taylor 1991).The last three terms come from tidal effects, mass loss in the system, and a possible temporal variation of the gravitational constant G.
We are mainly interested in measuring the GW emission term ( ṖGW b ) and comparing it with the prediction of DEF gravity.The most significant additional effects in PSR J2222−0137 come from kinematic contributions ṖGal b and ṖShk b .These two effects arise beyond the binary system and can be combined in the overall external contribution The last three terms in Eq. ( 23) are parts of the internal contribution According to Damour & Taylor (1991), Nice & Taylor (1995), and Lazaridis et al. (2009), the Galactic differential acceleration may be analytically approximated with the expression ṖGal where l is Galactic longitude, b the Galactic latitude and β = (d/R 0 ) cos b − cos l.The quantity K z is the vertical component of the Galactic acceleration, which for Galactic heights z ≡ |d sin b| ≤ 1.5 kpc can be approximated with sufficient accuracy by where z kpc ≡ z(kpc) (Holmberg & Flynn 2004;Lazaridis et al. 2009).From Gravity Collaboration et al. ( 2021) we can take a value for the Sun's Galactocentric distance R 0 = 8275±9±33 pc.
The Galactic circular velocity at the location of the Sun (Θ 0 ) is taken to be 240.5(41)km/s (see Guo et al. 2021b, and references therein).The Shklovskii contribution (Shklovskii 1970) can be calculated using where µ α and µ δ are the proper motion in Right Ascension (RA) and Declination, respectively, and d is the distance to the pulsar.The astrometric values and uncertainties for PSR J2222−0137 are taken from Guo et al. (2021b).For PSR J2222−0137, the uncertainty in Ṗext b is small compared to the precision of the Ṗb measurement.The uncertainty of Ṗext b in our case can be neglected and therefore we can provide the fixed XPBDOT parameter in TEMPO.

Results of applying DDSTG
Our main goal is to obtain limits on DEF gravity parameters {α 0 , β 0 } by applying the DDSTG model.TEMPO allows to calculate a χ 2 value for every particularly selected pair of values (α 0 , β 0 ).Therefore we obtain χ 2 values for a grid of parameters and compare them to each other.For straightforward comparison we subtract the minimum χ 2 min over the {α 0 , β 0 } grid from calculated χ 2 .The location of the minimum χ 2 min is statistically in agreement with GR (α 0 = 0, β 0 = 0).The shifted quantity ∆χ 2 = χ 2 − χ 2 min has a χ 2 (d.o.f.= 2) distribution with 2 degrees of freedom.Finally we construct contours of a fixed ∆χ 2 value corresponding to desired confidence levels.In this paper we adhere to 90% confidence level limits with ∆χ 2 4.6.
During each run, TEMPO fits spin parameters, astrometric parameters, Keplerian parameters, two masses, and other parameters of interest, e.g.ẋ.A value of the external contribution to the rate of the orbital period change ( Ṗext b ) is fixed and selected, so that it fully accounts for Shklovskii effect and the Galactic Ṗb contributions (Shklovskii 1970;Brumberg et al. 1975   & Taylor 1992).The resulting ∆χ 2 map in {α 0 , β 0 } space for PSR J2222−0137 is presented in Figure 2. The limit on DEF parameters is placed by a contour of 90% confidence level limit of χ 2 (d.o.f.= 2) statistics.
Generally, the uncertainty of Ṗext b can be important for other systems where it is not well constrained.In this case, the DDSTG model allows a complete description with account for all the uncertainty due to external effects Ṗext b .First we have to assume a prior distribution for the Ṗext b value.Then we have to calculate the χ 2 value for a 3-dimensional map with {α 0 , β 0 , Ṗext b } as parameters of the three axes.Finally, we marginalise the probability density for the Ṗext b axis using Bayes' theorem and obtain the corrected 2-dimensional map of χ 2 corr .The details of the procedure are presented in Appendix E.
Calculating an extensive grid in a 2-dimensional parameter space with a lot of points (e.g.∼ 500) for each axis to obtain a finely resolved contour of the desired ∆χ 2 value is too computationally demanding.In this work, we use an adaptive mesh refinement technique to trace the location of the desired contour.We start from the sparse grid with 9 × 9 points in the {α 0 , β 0 } space.Then, the algorithm resolves all the cells that can have the contour line inside them.The algorithm repeats the refinement of important cells until we obtain the desired curve fineness.
For a N × N grid, the naive approach utilises O(N 2 ) iterations, whereas adaptive mesh refinement has only O(N log 2 N) complexity.For the present work we use the refinement level of 6, resulting in the final grid with 513 × 513 points.We need such a large resolution in the parameter space because of the "horn" feature at large α 0 values near β 0 −2.A simple analysis shows that adaptive refinement requires to calculate 56 times fewer points to resolve a single contour for a grid with 513 × 513 points.In Figure 3 we present an adaptive refinement map corresponding to the search of a contour from Figure 2. A high level of refinement is performed only along the contour of 90% CL limit.In Appendix D one can also find the mass-mass diagram showing what happens in the region of the "horn" in terms of PK parameters.

Comparison with the PK method
In this section we compare the constraining power of the newly developed approach of this paper with that of existing procedures.For this reason we perform the test with the traditional "PK method" based on the PK parameters from the timing data.The corresponding PK parameters are measured by means of fitting the DD model (Damour & Deruelle 1986) to the same timing data.Then for each specific choice of the theory parameters {α 0 , β 0 } we fit two masses m p and m c to minimise the χ 2 value.The χ 2 is calculated by comparison of the observed PK parameters p obs where σ obs p i is the standard deviation of the i-th measured PK parameter.Finally, we calculate the grid of χ 2 values over the desired {α 0 , β 0 } space with the same settings as for the DDSTG approach and shift χ 2 values by its minimum χ 2 min .The detailed explanation of this common "PK method" can be found in Damour & Esposito-Farèse (1998).For both methods the minimum χ 2 min is in statistical agreement with the GR value χ 2 GR .Figure 4 shows the comparison between the two methods.Each point on the plot presents a particular pair of {α 0 , β 0 } parameters.The analysis shows that, for PSR J2222−0137, the DDSTG model produces higher or equal values of ∆χ 2 compared to the traditional "PK method".Despite the absence of any strong correlations between the PK parameters in PSR J2222−0137, the new approach produces slightly more restrictive results for the area with negative β 0 .For certain areas in {α 0 , β 0 } space the difference in the shifted ∆χ 2 = χ 2 −χ 2 min values becomes statistically significant when we calculate contours of 90% confidence level limit (∆χ 2 4.6).purpose we simulate fake TOAs for a set of radio observatories, assuming realistic timing precision estimated from real timing data.

Simulated data-sets for FAST, MeerKAT and 3ERT
The PSR J2222−0137 timing data used up to this point are described in Section 4.2.We will now describe our simulations, which show what we might be able to achieve in the near and foreseeable future with timing from this system.We simulate TOAs spanning 10 years from 2021 to 2030 based on the current precision of the TOAs obtained with the 3ERT telescopes, as well as the TOA precisions from ongoing observations from MeerKAT and FAST.These simulations are conservative since they assume that there will be no improvement in the existing capabilities at these telescopes.
For TOAs from FAST, the radiometer noise reduces significantly thanks to its large collecting area, while the jitter noise becomes the primary limitation of timing precision.We find, however, that increasing the integration time to 15 min can largely reduce the jitter noise and eliminate its contribution to the timing precision (as it scales with the number of averaged pulses (N p ) as σ J ∝ 1/ N p (Lorimer & Kramer 2004).Therefore, the median TOA uncertainty from 15-min TOAs are adopted in the simulation.
Table 1 lists the telescopes assumed in our simulation, with the information on their effective diameter ( / eff ), observing bandwidth, and TOA uncertainties at L-band.All TOA uncertainties are scaled to an integration time of 15 min over the full bandwidth.For each telescope, we assume one full orbit observation (∼60 hours) per year, and split the observations into a monthly cadence, i.e. 5 hours per month8 , to allow a good estimation of timing parallax (which requires a good coverage of Earth's orbit).This is important for the estimation of uncertainties in the external Ṗb contributions shown in the next section.
The simulations are performed using the program developed in Hu et al. (2020).First, we simulate TOAs based on the above assumptions, and add the TOAs from Effelsberg, Nançay, and Lovell telescopes together to be compared with MeerKAT and FAST.For 3ERT, the simulated TOAs are combined with the existing TOAs in Guo et al. (2021b).We then adjust the TOAs to perfectly match the timing parameters measured in Guo et al. (2021b), and add a Gaussian white noise to each TOA based on its σ TOA .Finally, we fit for timing parameters and obtain their uncertainties, among which Ṗb and timing parallax are of most important here.The whole process is done with the TEMPO DDSTG model.

Contributions to the uncertainty of Ṗb
The predicted uncertainties of Ṗb are shown in Figure 5, where the pink, orange, and blue lines show the improvement in time with the simulated data from 3ERT, MeerKAT, and FAST, respectively.As discussed in Section 4.4, we also need to account for uncertainties from external effects, Ṗext b .They are also expected to become more precise in the future because of anticipated improvements in the model for the Galactic gravitational potential, distance, and proper motion from future observations.Correspondingly improved values for ṖGal b and ṖShk b may then be estimated using Eqs.( 28) and (30).For the Shklovskii effect, we find that it is mostly limited by the uncertainty in the distance, which comes from VLBI parallax or timing parallax measurement, whichever is better.With the simulated FAST data, the measurements of timing parallax and proper motion improve quickly with time.In particular, the uncertainty of timing parallax will soon surpass the VLBI parallax and hence improve the distance measurement.The corresponding uncertainty in ṖShk b will decrease with time as shown by the brown line in Figure 5.This line is below the predicted uncertainties from observed Ṗb most of the time, except at the very end of the simulation when compared to ∆ Ṗobs b with FAST (blue line).In addition, future VLBI parallax measurements will likely be improved so that Shklovskii effect will not be a limiting factor for Ṗb .
As for the contribution from the Galactic acceleration, a typical uncertainty in its vertical component (∆K z ) is about 10%, which contributes the most in ∆ ṖGal b , shown as the green dashed line in Figure 5.With FAST, Ṗb will then be limited by ∆K z from 2024 onwards, if there is no improvement for this quantity.In fact, we find that ∆K z needs to be improved to 3% (see the green dash-dotted line) to not limit the precision of Ṗb before 2030.The uncertainties in the Galactic potential do not limit the precision in this case.For the scope of further analysis, we assume that with future observations on pulsar timing, VLBI parallax, and Galactic acceleration, the precision of Ṗb will not be limited by Shklovskii and Galactic effects.For Shklovskii contribution it is a reasonable assumption at least up to 2030.We also expect our knowledge about the Galactic potential to improve in the future especially in the proximity of the Solar System, thus we assume the improvement of ∆K z for the selected very close pulsar to be 3%.

Potential future constraints on DEF gravity from PSR J2222-0137
We apply the same techniques as described in Section 4 including the adaptive mesh refinement for contours.The map of ∆χ 2 for a combined simulated data is presented in Figure 6.By 2030, we can expect a significant improvement in the limits for large positive β 0 .This region is susceptible to dipolar gravitational emission.The significant improvement in Ṗb measurement pushes the limit below the Cassini limit.
In Figure 6 we also present the comparison of constraining power for different observatories.The tightest constraints may be obtained for the combination of all three observatories.However, the precision of FAST is high enough to be significantly constraining on its own.

Simulations for PSR-BH systems
As a further application of the new timing model, we investigate what limits on DEF gravity we can obtain from a pulsar-black hole (PSR-BH) system.As mentioned above, our DDSTG implementation already allows to specify a BH as a companion, even though such a system has so far not been found.We apply

Simulated FAST data-sets
To investigate possible limits on DEF gravity from PSR-BH systems, we select presumed realistic parameters for simulating fake TOAs.These systems are assumed to comprise a 1.4 M pulsar and a 10.0 M BH in a highly eccentric orbit (e = 0.6).As PSR-BH systems are more likely to reside in GCs (see the discussion in Section 6.4 and references therein), we assume that our hypothetical systems are located in the GC M15 and take the position (right ascension "RA" and declination "DEC") and dispersion measure (DM) of PSR B2127+11C (M15C) for our simulations.To investigate how the limits on DEF gravity depend on the orbital period, we consider three cases with orbital periods of 3 d, 1 d, and 8 h, respectively.The selected parameters for PSR-BH systems are presented in Table 2.We simulate data-sets with two different TOA uncertainties (for 15-min integration time): moderate 10 µs (which is typical for pulsars in GCs), and precise 1 µs.Taking these two, by an order of magnitude different TOA uncertainties helps to explore the dependence of the DEF gravity test on the TOA precision.We effectively assume a recycled millisecond pulsar, because they in general produce a better timing precision.We assume 6 hours of timing observation on these systems every two weeks, each session starts at a random orbital phase.All simulations cover a time span of 5 years and have the same number of TOAs (n TOA = 3144).

How does a BH companion change the test of DEF gravity?
Binary systems consisting of a pulsar and a white dwarf (PSR-WD) are particularly interesting for constraining STG theories due to their high asymmetry in compactness (α WD α 0 ).The theory predicts a higher rate of orbital energy loss due to dipolar radiation from such asymmetric systems (Will 1993;Damour & Esposito-Farèse 1996).However, generally, PSR-BH systems are expected to be even more asymmetric up to a certain positive β 0 , as was pointed out by Damour & Esposito-Farèse (1998).As a result of the no-scalar-hair theorem to BHs in STG one has (Hawking 1972;Damour & Esposito-Farèse 1992a) This approximation is only valid for stationary BHs with an asymptotically flat spacetime and an asymptotically constant scalar field (Berti et al. 2013), which is not true anymore in the presence of a compact scalarised companion.However, a justification that it is still an excellent approximation can be found in Liu et al. (2014).The pulsar orbiting in an eccentric orbit will eventually induce a time-dependent scalar field at the location of the BH.This scalar field results in an induced effective scalar charge (α induced BH ) which, however, is totally negligible ( 8 × 10 −14 α p ; cf.Eq. ( 50) in Liu et al. 2014).
The absence of scalar charges for the BH results in simplified relations of the PK parameters.The main consequence of a BH presence is that all PK parameters except Ṗb are identical to  their GR expressions with an appropriate rescaling of the masses.In the GW damping sector, all the multipoles, e.g., monopolar, dipolar and quadrupolar modes, are still present.Thus PSR-BH systems are susceptible but only to the test of GW damping via Ṗb (Damour & Esposito-Farèse 1992a;Mirshekari & Will 2013;Liu et al. 2014).

Results
The limits obtained from the DDSTG model for the simulated PSR-BH systems are shown in Figure 7.For the tests presented here we do not include any potential uncertainty in the observed Ṗobs b value due to the external effects.The restrictive power of the test increases for more relativistic systems with shorter orbital periods.To place new limits on DEF gravity with a moderate TOA precision of 10 µs the orbital period should be a fraction of a day.
Another way to improve the restrictive power is to increase the precision of the TOAs (compare left and right panels of Figure 7).The obtained limits strongly depend on the accuracy of Ṗb measurement as most of the deviations from GR come from the predicted dipolar GW emission.Table 3 shows the predicted GR values of Ṗb and compares them to the corresponding ∆ Ṗb uncertainties.The lower uncertainty and the higher the predicted value the better the test.
We expect a dramatic improvement of limits on DEF gravity from relativistic PSR-BH system.Due to higher asymmetry, the test is extremely sensitive to the precision of Ṗb measurement.
To answer the possibility of obtaining accurate enough TOAs, we have to argue where we can find such a system.

Pulsar-black hole systems origin
A binary pulsar system with a stellar-mass BH may originate from several different evolutionary scenarios.The first way is a standard evolution of a massive binary system (Voss & Tauris 2003).It results in a PSR-BH system with a young slow spinperiod (∼ 0.1−1 s) pulsar and a wide, eccentric orbit.The second path is a reversal mechanism (Sipior et al. 2004) taking place under a specific set of circumstances.The pulsar is formed first and later spun-up by accretion during the red-giant phase of the companion (Pfahl et al. 2005), later becomes a BH.This is similar to the origin of PSR J1141−6545, a system where a massive WD star formed before a more massive NS (Tauris & Sennels 2000).The reversal mechanism may give a more desirable result of a recycled pulsar in a system with a BH, because recycled pulsars have generally more precise timing and a more stable rotation than the slow "normal" pulsars (Verbiest et al. 2009).
Moreover, the third possible way to form a PSR-BH system is a multiple body encounter, happening in regions of high stellar density, e.g., GCs and the Galactic Centre region (Verbiest et al. 2009;Clausen et al. 2014).Such encounters are the reason why there are ∼ 10 3 times more low-mass X-ray binaries (LMXBs) per unit stellar mass in GCs than in the Galactic disk (Clark 1975), which results in a similarly enhanced proportion of millisecond pulsars (MSPs).In these encounters, an old, inactive NS approaches a main sequence (MS) binary so closely that a chaotic interaction ensues.The most likely result of such an interaction is that the two most massive objects (in this case the old, recycled NS and the more massive MS star) will form a more compact binary system, with the lighter MS star component being ejected at high velocity (see review by Phinney 1992).That MS will evolve, fill its Roche lobe, and start transferring matter onto the NS, which is spun up in the process, this is the aforementioned LMXB stage.
If left undisturbed, many of these then evolve into binary MSP systems.A consequence of this is that GCs not only have many MSPs, but that the number of MSPs in each cluster appears to be roughly proportional to its rate of stellar encounters (Γ) (Verbunt & Hut 1987).
However, in some GCs -especially those with collapsed cores -stellar densities are so high that only few binaries evolve without being disturbed; these GCs have a large interaction rate per binary (γ GC ) (Verbunt & Freire 2014).This means that, even after being recycled, an MSP has a high probability of undergoing further ("secondary") exchanges.
In such exchanges, an incoming massive star interacts chaotically with the components of either a LMXB or a MSP binary.Again, the least massive object is the most likely to be ejected, in this case that will be the low-mass star that recycled (or was still recycling) the NS.A new binary will form, consisting of the NS and the intruding massive star.If the latter is degenerate, then the system will not undergo accretion and the orbital circularisation that comes with it, but will keep the orbital eccentricity it acquired after formation.
Several such obvious products of secondary exchange encounters have been found in GCs, invariably with a large γ GC : PSR B2127+11C, in the core-collapsed M15 GC (Prince et al. 1991;Jacoby et al. 2006), PSR J0514−4002A, in NGC 1851 (Freire et al. 2004;Ridolfi et al. 2019) (recently two additional such systems, D and E, were found in the same GC, see Ridolfi et al. 2022), PSR J1807−2500B, in the corecollapsed NGC 6544 (Lynch et al. 2012), PSR J835−3259A, in the core-collapsed NGC 6652 (DeCesar et al. 2015) and PSR J1823−3021G, in the core-collapsed NGC 6624 (Ridolfi et al. 2021).These discoveries suggest (but by no means assure us) of the possibility that similar secondary exchange encounters might produce MSP binaries with even more massive companions, such as BHs.As for the massive MSP binaries above, such MSP-BH systems would be preferentially produced in the GCs with large γ GC ; this is one of the reasons why they are targeted by the MeerKAT TRAPUM survey (Ridolfi et al. 2021).
These secondary exchange products are that their orbital periods vary between 8 h in the case of B2127+11C and 18.8 d in the case of PSR J0514−4002A, their orbital eccentricities are between 0.38 and 0.90.Thus the simulated PSR-PH systems listed in Table 2 have realistic orbital periods and eccentricities.

Influence of a globular cluster origin
For the pulsars observed in GCs, the derivatives of the spin period, P, and in the case of binary pulsars, the derivative of the orbital period, Ṗb are contaminated (and in most cases dominated) by the line-of-sight component of the acceleration of the pulsar (or binary) in the gravitational potential of the GC (a GC ); this enters Eq. ( 23) as an additional term that is similar to ṖGal b : For pulsars in GCs, the only radiative test done to date was with PSR B2127+11C (Jacoby et al. 2006).The orbital decay observed in this system is −3.95 ± 0.13 × 10 −12 s s −1 , which is within ∼ 3% of the predicted value.That test, however, is limited by the fact that the maximum value for |a GC /c| ∼ 6 × 10 −18 s −1 (Phinney 1993), i.e., in this case the maximum value for | ṖGC b | ∼ 0.17 × 10 −12 s s −1 .This is of the same order as their measurement precision, which means that this radiative test cannot be improved, unless one could measure the acceleration of the pulsar independently.The situation would be much worse if PSR B2127+11C were closer to the cluster centre, where accelerations are much larger.For instance, PSR B2127+11A, a couple of arcseconds from the centre, has P = 0.1106 s and Ṗ = −2.107× 10 −17 ; this implies that |a GC /c| > 1.9 × 10 −16 s −1 and thus | ṖGC b | > 5.5 × 10 −12 s s −1 , a value larger than the orbital decay predicted by GR.
For the three PSR-BH systems in Table 3, the situation would be similar.If they were at the locations of PSR B2127+11C, their values of P b would imply that the maximum values for ṖGC b would be 9, 3 and 1 times larger respectively, i.e., 1.53, 0.51 and 0.17 × 10 −12 s s −1 .This corresponds to, respectively, ∼ 6.9, ∼ 0.37 and 0.02 times the values of ṖGR b in Table 3.That means that, for the latter system, a ∼ 50-σ test of the radiative properties of a PSR-BH system would be possible.These numbers illustrate the immense advantage of an increasingly shorter P b , either for GW tests in GCs or in the Galaxy: on one hand, the ṖGR b increases with P −5/3 b , the "polluting" part ṖGC b decreases as P b .Thus, the significance of a particular GW test grows, everything else being identical, with P −8/3 b .However, if the 8-h PSR-BH system is placed at the location of PSR B2127+11A, the test would lose its significance almost entirely.Fortunately, even in this case, we can get a firm upper limit for | ṖGR b |; the reason is that we also measure the spin period derivative.That is also affected by the acceleration in the cluster and by other terms, such as the Shklovskii effect.Adding the equations for Ṗ and Ṗb , we obtain: Note that all the terms on the right can be measured precisely, except for the characteristic age of the pulsar, τ c , which cannot be measured independently.Therefore, the last term then quantifies the uncertainty of the test, which is, again, proportional to P b , which implies that the significance of the test is again proportional to P −8/3 b .However, that last term is necessarily positive.If we assume the pulsar is extremely old, then this term will be very small and we obtain, from the other two terms, a hard lower limit for ṖGR b .Since the latter is negative, this represents a hard upper limit of its magnitude.This means that such a test could still, in principle, falsify GR.An upper limit on the last term can be obtained from the lowest likely value for τ c ; for most MSPs τ c < 1 Gyr.This means that, for the 8-h PSR-BH system, this unknown term would be at most 0.46 × 10 −12 s s −1 , implying a ∼ 20-σ test of GR.Thus, this test will be the more significant the larger τ c is compared to the orbital decay timescale for the binary.
Despite such possible mitigation, it is clear that the location in a GC always degrades the quality of radiative tests.For the 8-h PSR-BH system discussed above, significances of 50 and 20 σ in the measurement of Ṗb represent a significant degradation relative to the tests listed in Table 3, where for a 8-h PSR-BH system timed with 10µs the significance of the Ṗb measurement is larger than 1600.

DDSTG and other gravity theories
The approach developed in this work is not restricted to DEF gravity.It can straightforwardly be extended to investigate a larger set of alternative gravity theories.Broadly speaking, every theory that maps to the DD phenomenological model (Damour & Deruelle 1986;Damour & Taylor 1992) can be put into the DDSTG framework with a new implementation of appropriate PK formulae.The DD model is a quasi-Keplerian solution in the first post-Newtonian approximation to the dynamics of a 2-body system within the modified Einstein-Infeld-Hoffmann (mEIH) framework (Damour & Taylor 1992;Will 1993Will , 2018b)).The mEIH formalism covers a large set of fully-conservative gravity theories without the "Whitehead term" in the post-Newtonian limit.Later it was extended by Will (2018a) from fully conservative to semi-conservative theories of gravity.However, this extension requires additional terms in mEIH Lagrangian, which are not accounted for in the DD model.
The DDSTG model can be directly applied to a certain class of gravity theories without a modification of the PK parameter equations.For example, DEF gravity is a specific case of a massless mono-scalar tensor gravity theory with a particular expression of the conformal coupling A(ϕ).DDSTG covers STG theories with any conformal coupling function depending on two arbitrary parameters {α 0 , β 0 } and a single massless scalar field.To work, the model only requires pre-calculated gravitational formfactors for the new coupling function.In case of a more complex coupling function depending on a higher number of parameters the TEMPO code needs to be adapted.
Recently, Mendes and Ortiz (MO, Mendes & Ortiz 2016) introduced an extension of DEF gravity.MO gravity is an example of a theory that can be easily incorporated into the DDSTG model.Its difference from DEF gravity is in the form of the conformal coupling where β 0 is a free parameter.The second parameter α 0 is hidden in MO gravity to the scalar field at infinity.MO theory received attention in recent years and was originally introduced as an analytical approximation to a more fundamental theory, where the action includes quadratic terms of the scalar field coupled to curvature.The developed framework can therefore be straightforwardly extended to MO gravity without the change of the TEMPO implementation.

DDSTG and a time-varying gravitational constant
Another interesting extension of the DDSTG model-planned for a future release-is the inclusion of the effects of a temporal variation of the gravitational constant (G).A time-evolving asymptotic scalar field (ϕ 0 ) of a gravitating system, generally, leads to a temporal variation of the local gravitational constant.
In DEF gravity, such a change of the Newtonian gravitational constant as measured in the Solar System reads ĠCav (see e.g.Uzan 2011).Generally, one expects a temporal change in ϕ 0 to arise from the expansion of the universe and φ0 to result from a cosmological model based on DEF gravity (see e.g.Damour & Nordtvedt 1993).However, as part of a more agnostic approach, φ0 can be treated as an additional, independent timing parameter.
The main impact on the orbital motion of a binary system that arises from a time-varying gravitational constant is a secular change in the orbital period (Damour et al. 1988).For two weakly self-gravitating masses, one simply has Ṗ Ġ b /P b = −2 Ġ/G.However, for binary pulsar systems, this simple expression needs to be extended by body-dependent contributions (Nordtvedt 1990(Nordtvedt , 1993)).One then has where the factor F AB accounts for all the corrections related to the strong gravitational fields of the pulsar and its companion, if the latter is also a NS.More specifically, following Nordtvedt (1993), F AB accounts for a change in the body-dependent part in the effective gravitational constant G AB as well as a change in the masses resulting directly from φ0 .9It has been demonstrated by Wex (2014) that, depending on the parameter space, pulsar mass and EOS, strong-field effects can considerably enhance the effect of a time-varying gravitational constant, i.e.F AB 1. Consequently, accounting for Ṗ Ġ b in binary pulsar tests not only provides an independent test for a varying gravitational constant but also probes strong-field aspects related to a time-varying gravitational constant.

Discussion and conclusions
In this work, we developed a new, improved approach for testing STG.We examined a specific class of STG theories known as "DEF gravity".This approach is based on a new timing model, called the DDSTG model, which is an extension of the DD model to work within DEF gravity.Analysis of pulsar timing data with this model overcomes some of the problems of conventional methods, which we have discussed in detail.The DDSTG timing model uses theoretical predictions for PK parameters in DEF gravity and therefore uses a minimal set of binary parameters.For that reason, it accounts for all the possible correlations between these parameters.All the information from the observational data is used to provide the most reliable tests of an alternative theory, directly without intermediate steps with phenomenological parameters of the DD model.As a demonstration of the DDSTG model, we applied it to the most recently published timing data of the binary pulsar PSR J2222−0137 described and used by Guo et al. (2021b).This system is of great importance for testing alternative gravity theories because it is very close to us (resulting in the most precise VLBI distance for any pulsar), has precise timing and shows a set of well measured relativistic effects.The system has a high asymmetry in the compactness between the components: it comprises a massive NS (m NS ∼ 1.82M ) and a massive WD (m WD ∼ 1.31M ).This high asymmetry results, for some areas in the DEF gravity parameter space, in the prediction of a very strong dipolar GW contribution to the rate of orbital decay ( ṖD b ).The non-detection of dipolar GWs in this system is used to constrain the DEF gravity parameter space.Moreover the mass of the pulsar lies in the "scalarisation gap" (m NS 1.5M ); this means that strict limits on the occurrence of spontaneous scalarisation, a highly non-linear phenomenon, can be placed (Zhao et al. 2022).
The results from the new method confirmed and improved the existing limits on DEF gravity parameters from this system.The DDSTG model appeared to be more constraining in the area near spontaneous scalarisation (β 0 −4.0) when compared to the commonly used PK method.It suggests that the combination of the DDSTG model with an EOS agnostic approach can improve limits placed on the spontaneous scalarisation.
Moreover, we applied the DDSTG timing model to the simulated TOA for PSR J2222−0137 covering the period of [2021][2022][2023][2024][2025][2026][2027][2028][2029][2030] to see what improvement we can expect from that system in the future.The mock timing data-sets were simulated for several large observatories FAST, 3ERT (EFF+NC+LT), MeerKAT with TOAs uncertainties based on the real timing data.We discussed the future importance of kinematic contributions (Shklovskii and Galactic) to Ṗb , and consequently to the precision of these tests.The main limiting factor to Ṗb comes from the uncertainty in the Galactic contribution ( ṖGal b ).In particular, the limit comes from the current uncertainty in the vertical component of the Galactic acceleration (∆K z ).Our analysis predicts that this limitation will disappear if the current uncertainty of ∆K z ∼ 10% can be improved to ∆K z 3%, which is likely to be the case in the future with improvement of models for the gravitational potential of our Galaxy.Future observations are expected to significantly improve the limits on DEF gravity, especially with the use of FAST data.
One of the most promising systems for testing gravity, which we hope to have in the near future, are binary pulsar-black hole systems (PSR-BH).We simulated artificial timing data-sets for three eccentric PSR-BH systems with reasonable orbital parameters for three different orbital periods.The results of applying the DDSTG model to the simulated PSR-BH data strongly depended on the precision of the Ṗb measurement.We briefly discussed possible evolution scenarios leading to the formation of the PSR-BH system, such as GC origin and reversal mechanism.Depending on the place of origin, there might be issues in obtaining a precise intrinsic Ṗb , i.e. accounting for the contamination of Ṗb from a kinematic contribution due to the acceleration of the system in the gravitational field of the GC.Depending on the timing precision and orbital properties, PSR-BH can place stringent limits on DEF gravity.
In the future, the DDSTG model can be applied to a range of different binary pulsar systems to improve the limits on DEF gravity.We expect especially interesting results from PSR J1141−6545 where DDSTG is expected to be superior to standard approaches (Venkatraman Krishnan 2019).PSR J1141−6545 is an asymmetric PSR-WD system in 4.7 hours orbit with significant spin-orbit coupling due to the fast rotating WD.Due to the spin-orbit coupling the system shows a change of the projected semi-major axis ẋSO which has a strong correlation with the time dilation parameter γ.The latter parameter is caused by the precession which cannot be calculated, because we do not know the exact spin properties of the WD.Moreover, PSR J1141−6545 shows a weak Shapiro delay in the timing data which the DDSTG model can fully exploit resulting in a significant improvement of the test.The high asymmetry in the compactness between the components (m p ∼ 1.26M NS and m c ∼ 1.02M WD) makes this system a perfect tool for radiative tests of gravity.We expect the DDSTG model to be of particular advantage because it accounts for both possible correlations and weak relativistic effects (Venkatraman Krishnan et al., in prep.).
Another perspective system to perform DEF gravity tests with the DDSTG model is the Double Pulsar PSR 0737−3039A which shows the largest number of PK parameters in the timing data (Kramer et al. 2021).The system consists of two radio pulsars with masses of m p 1.34 M and m c 1.25 M .Properties of NSs in DEF gravity (gravitational form-factors) strongly depend on the choice of the EOS, which in turn affects the test.Thus to put reliable limits on DEF gravity from PSR 0737−3039A independent of a choice of EOS, one must perform an EOS-agnostic analysis.EOS-agnostic test means that it is performed with a set of EOSs which are diverse in their properties (see Voisin et al. 2020, who used such an EOS-agnostic approach to constrain DEF gravity with a pulsar in a stellar triple system).With this paper we included a set of 11 EOSs varying from soft to stiff (see Appendix A.2) which can be used for such agnostic tests in the future.

Appendix A: Details on calculating grids of NSs
We select slowly rotating axisymmetric approximation of NSs following Damour & Esposito-Farèse (1996).The internal structure of such a NS in DEF gravity is described by a set of 8 ordinary differential equations depending on the radial variable (r).To calculate a single NS the program solves the system of ODEs by applying appropriate boundary conditions at the centre of a NS and spatial infinity.We use a shooting technique to match initial internal parameters with the external structure of the spacetime.The differential equations are solved by means of the Julia library DifferentialEquations.jl10(Rackauckas & Nie 2017).
A particular solution is determined by the choice of two theory parameters {α 0 , β 0 }, the EOS and the central pressure p c .The central value of the scalar field ϕ c is determined by the shooting procedure to correspond to the scalar field value at spatial infinity ϕ 0 = ϕ(r = ∞), which in our case is, without loss of generality, set to zero.Thus ϕ c is not an arbitrary parameter.The central density (p c ) uniquely corresponds to the gravitational mass m A for most masses of interest and fixed theory parameters {α 0 , β 0 }.In general, this is not the case -there may be several stable solutions for a fixed mass in the area of strong scalarisation.However, multiple solutions happen for large NS masses and very negative β 0 < −4.5 which is already ruled out and thus of no interest.
In the recent papers devoted to the calculation of NSs in scalar-tensor gravity (Anderson & Yunes 2019), the gravitational form-factors from Eq. ( 6) (they are also often called "scalar charges") are calculated using numerical differentiation.Numerical differentiation is performed on a grid of dependent variables.The precision of this approach is determined by two factors: a) the error in the calculation of the desired quantity and b) the error due the numerical differentiation formula.The first error depends on the precision of the numerical integration of the structure equations.The second error strongly depends on the fineness of the grid on which the derivatives are calculated.If the grid is too sparse the formula such as the central difference formula f (x) f (x + δx) − f (x − δx) / [2δx] is not accurate enough.On the other hand, if the grid is too fine, the error arises because of the subtraction of two close numbers in the computer memory.The total relative errors for calculating gravitational form-factors usually lie in the range of δ rel ∼ 10 0 − 10 −5 .In our work, we propose a more precise way to calculate all the quantities.

Appendix A.1: Automatic Differentiation
In our program, we utilise the Automatic Differentiation (Au-toDiff) technique.It is a special technique to calculate derivatives, when the algorithm knows the exact expressions for derivatives of all elementary functions and uses the chain rule to unfold complex derivatives.A review on the AutoDiff can be found, for instance, in Margossian (2018).We use an implementation of the AutoDiff technique in Julia from ForwardDiff.jllibrary, which is a part of a broad JuliaDiff framework (Revels et al. 2016).Au-toDiff allows calculating complex derivatives of quantities with respect to their arguments with the machine-precision (for Double Float numbers the corresponding precision is δ ∼ 2×10 −16 on 64-bit systems).It is done by internal implementation of sophisticated arithmetic on a special type of numbers (dual numbers) in the programming language.A derivative is calculated exactly in the desired point and does not require points in the vicinity.The precision of the method is constant and thus does not depend anymore on the fineness of the grid.
Automatic differentiation enables us to simultaneously calculate both the function F = F(x, y) and its derivatives with respect to arguments ∂F ∂x , ∂F ∂y .It takes twice as long as calculating a mere function itself and both are done with the machineprecision .To calculate a complex derivative, then the arguments are also functions B = B(x, y), C = C(x, y) we apply the chain rule: In our program, we calculate the gravitational form-factors from Eq. ( 6), which are complex derivatives of quantities ( e.g. a mass (m A ) and moment of inertia (I A ) of a NS) taken with respect to external parameters (the value of the scalar field at spatial infinity (ϕ 0 ).The properties of NSs are obtained through numerical integration of the structure equations and depend on the central scalar field (ϕ c ) and central pressure (p c ).For example for α A using Eq.(A.1) we simply obtain: where every simple derivative is calculated with the machineprecision due to the AutoDiff.As a result, the total precision of calculated gravitational form-factors is determined only by the accuracy of the differential equation solver and may be easily enhanced.We use the Julia library DifferentialEquations.jl which allows using AutoDiff on the results of the numerical integration.The numerical integration can take advantage of the AutoDiff itself because the Jacobian of the system can be calculated more accurately and used to integrate the ODE system.We obtain a higher precision if we select a finer adaptive step in the radial variable r.For our grids we set the integrator to the relative error of δ Int,rel ∼ 10 −11 .The obtained relative precision for gravitational form-factors typically lies in δ rel ∼ 10 −7 − 10 −13 range, which is more precise than previous calculations.
where we used variables Moreover, n = 2π/P b is the orbital circular frequency and ν p = 1/P is the pulsar's rotational frequency.The angle i is the inclination of the orbital plane with respect to the plane of sky, while angles λ and η are the polar angles of the spin axis.In the special case of alignment between the orbital and spin axes λ = i and η = −π/2 forcing B = 0.
The expression for Ṗb is composed from 4 different contributions (Damour & Esposito-Farèse 1992a) Ṗϕ,mon  The first two parameters "ALPHA0" and "BETA0" correspond to the values of two arbitrary DEF gravity parameters {α 0 , β 0 }.They can be selected to be zero to recover GR.Otherwise, the applicable range of values is determined by the range of the supplementary data files, containing grids of the calculated gravitational form-factors and masses.The grids are calculated as part of this work and supplied with the model.The supplied grids cover α 0 ∈ [−10 −1 , −10 −5 ] and β 0 ∈ [−6, +10] in the DEF gravity parameter space.
The third parameter "EOS" selects the equation of state to be used to perform the test.In the present work, we generally select the rather stiff EOS MPA1.TEMPO reads the supplied pre-calculated grids of the gravitational form factors and masses corresponding to the selected EOS (see also Table B.1).
The last parameter "COMP_TYPE" is used to select the type of the companion to the pulsar, among a white dwarf ("WD"), a neutron star ("NS") and a black hole ("BH").These During initialisation, TEMPO reads the parameter file and the file with the TOAs.When the {α 0 , β 0 } values and the EOS are provided, it reads the supplementary grids for the selected EOS.The supplied grids are three-dimensional and consist of the masses of NSs m A and their gravitational form factors {α A , β A , k A }. Two axes are {α 0 , β 0 } and the third axis is a fixed range of central pressure (p c ) of a NS.
Bilinear interpolation is performed in the {α 0 , β 0 } parameter space for the selected parameters.The result of the interpolation are four single-dimensional arrays for masses and gravitational form factors depending on a range of central pressures.Interpolated one-dimensional arrays are saved in the computer memory and used further without being changed.From this point gravitational form factors are ready to be interpolated in the mass range during the fitting procedure.
TEMPO with selected DDSTG model fits two masses in addition to spin, astrometric and Keplerian parameters.The masses 11 Since this is the case for any weakly self-gravitating body, the setting "WD" can also be selected for any non-degenerate companion such as a main-sequence star.
are iteratively changed during the fitting procedure.Only when the masses change TEMPO interpolates the gravitational form factors from the pre-calculated one-dimensional grids.Gravitational form factors are calculated for both the pulsar and the companion and are kept constant for all TOAs within a single iteration.
TEMPO tries to find the companion mass (m c ) and the total mass (m tot = m p + m c ) corresponding to the best fit.During the fitting, it utilises the derivatives of the timing formula (7) with respect to these two masses.We calculate derivatives separately for each delay in .We used the approximated expressions for the derivatives of the delays with respect to PK parameters provided by Damour & Deruelle (1986).After applying the chain rule for our derivatives we come to expressions depending on the derivatives of PK parameters with respect to the two masses.The model calculates the derivatives of the the PK parameters of DEF gravity (Eqs.B.1 -B.9) with respect to the two masses of the timing model, which are certainly different to those in GR.Moreover, for proper convergence of the model in highly nonlinear areas the additional corrections to the derivatives related to variation of gravitational form-factors with respect to masses (i.e.∂α A /∂m A , ∂β A /∂m A ) are required.
The formulae in Eq. (C.1) are large but can be obtained by straightforward differentiation of the PK formulae in Appendix B. The expressions are incorporated in the model and depend on gravitational form factors, masses and Keplerian parameters which are known on each iteration.Once the derivatives of PK parameters are calculated, the model calculates the derivatives of the timing delays with respect to the two masses and proceeds the iteration.The outcome of the DDSTG model in TEMPO is the same as for the DDGR model.The most important result is the χ 2 value, which is used further to perform tests of gravity.
The DDSTG timing model converges well if the eccentricity of the orbit is not too small and TEMPO is supplied with a reasonable initial guess of the orbital parameters.The model can run into difficulties in the convergence for solutions if those parameters are far away from GR or if the orbit is almost circular.The problem can happen if TEMPO changes masses significantly during the iteration and reaches a point where the Shapiro shape parameter calculated with new masses using Eq.(B.4) becomes greater than one, i.e. s > 1.In this case the expression of the Shapiro delay from Eq. ( 13) does not have a physical meaning and TEMPO breaks with an error for TOAs near conjunction, as the argument of the logarithm becomes negative.The convergence can be enforced by applying the special parameter GAIN, which however comes at the cost of increased computational time.A GAIN parameter of less than one forces TEMPO to use smaller steps while iterating.Another way which ensures the convergence is to supply TEMPO with a good initial guess for the masses in DEF gravity.The rough estimate of the masses in DEF gravity can be obtained from the traditional PK method.The combination of these two methods help to converge systems with small eccentricity and Shapiro shape parameter near one (s 1) even in the strongly nonlinear part of the {α 0 , β 0 } plane.overall Ṗb is close to the GR value, as the dipolar contribution is greatly suppressed.As a consequence, the test is passed.In To properly account for the uncertainty in Ṗext b we follow the method described in Splaver et al. (2002) based on Bayes inference.On the first stage TEMPO calculates χ 2 grid for three independent parameters {α 0 , β 0 , Ṗext b }.Then the shifted value ∆χ 2 = χ 2 − χ 2 min is described by χ 2 distribution with 3 degrees of freedom.The number of degrees of freedom is important when we calculate confidence level limits.The ∆χ where p(α 0 , β 0 , Ṗext b | {t j }) is the desired joint posterior probability function and p(α 0 , β 0 , Ṗext b ) is the prior.The Bayesian evidence p({t j }) is obtained from normalisation of posterior probability over the whole grid {α 0 , β 0 }.
In the next stage we select the prior p(α 0 , β 0 , Ṗext b ).We assume no prior information about α 0 and β 0 , so their distributions are taken uniformly for β 0 ∈ (−∞, +∞) and α 0 ∈ (−∞, 0].At this step, we utilise the information about the uncertainties in Ṗext where the integral is calculated on the grid of Ṗext b parameter.The marginalised probability than can be used to derive limits, for instance going back to χ 2 representation with 2 degrees of freedom.

Fig. 1 .
Fig. 1.Mass-mass diagrams for the PSR J2222−0137 system at the zone of high nonlinearity where a fast transition to a strongly scalarised pulsar happens.PK parameters are obtained by fitting the DD model to the timing data from Guo et al. (2021b).Calculations are preformed within DEF gravity for the MPA1 EOS allowing NSs with maximum mass in GR of M GR 2.46M .Left panel corresponds to the point in the DEF parameter space with α 0 = −10 −4 , β 0 = −4.3,which is not excluded by the test.Right panel shows the same test but for α 0 = −10 −4 , β 0 = −4.35.The prediction of a strong dipolar contribution ṖD b fails the test for the more negative β 0 = −4.35.The axes correspond to the masses of the pulsar m p and the companion c.The shadowed area is the allowed region at 68% CL limit for a corresponding PK parameter.The solid green line corresponds to the observed value of ṖGW b due to the GW emission.
and can be neglected for PSR J2222−0137.Thus the internal contribution has only one significant term left, the GW term.For the discussion about possible account of Ṗ Ġ b we refer the reader to Section 8. Within the DDSTG model, the parameter XPBDOT in TEMPO is used for treating both external and internal effects Ṗext b + Ṗint b − ṖGW b simultaneously subtracting the GW term.In our case XPBDOT accounts for only external Shklovskii and Galactic effects.There is an extensive analysis of the external terms for PSR J2222−0137 by Guo et al. (2021b) and their determined values are: ṖGal b = −0.0142(13)× 10 −12 s s −1 and ṖShk b = 0.2794(12) × 10 −12 s s −1 .These values correspond to the external variation of the observed orbital period of Ṗext b = 0.2652(18) × 10 −12 s s −1 , (26) which is consistent within 2σ with the total observed value Ṗobs b = 0.2509(76) × 10 −12 s s −1 , 0143(78) × 10 −12 s s −1 consistent with the GR prediction for quadrupolar GW emission ṖGR b = −0.00809(5)×10−12 s s −1 (Guo et al. 2021b).

Fig. 2 .
Fig. 2. ∆χ 2 map in DEF gravity parameter space for PSR J2222−0137.The test is performed by applying the DDSTG model to the timing data of Guo et al. (2021b) and the assuming rather stiff MPA1 EOS.The red line corresponds to 90% CL limit (∆χ 2 4.6), the area above the grey line is restricted by Cassini mission.GR with α 0 = 0, β 0 = 0 lies beyond the plotted domain in the blue region at the bottom at the infinity.

Fig. 3 .
Fig.3.Map of the refinement level in DEF parameter space.The test is the same as in Figure2.Areas in the parameter space with a higher refinement level have an exponentially growing resolution.The adaptive refinement procedure resolves the contour of 90% CL limit and saves computational time dramatically.

Fig. 4 .
Fig. 4. Comparison in the constraining power between the DDSTG model and the method based on measured PK parameters with the DD model.Both methods calculate χ 2 values for a grid in {α 0 , β 0 } space, each point corresponds to a unique theory.β 0 values are presented by the colour, while α 0 values cover the [−10 −1 , −10 −4 ] range.Red vertical line corresponds to the 90% CL limit, all points to the left are allowed by the test.

PFig. 5 .
Fig. 5. Comparison of different contributions in the uncertainty of Ṗb for simulated data from 2021 to 2030.The pink, orange and blue lines show the uncertainty of Ṗb using the simulated data from 3ERT (EFF, NC, LT), MeerKAT and FAST, respectively.The brown line indicates the uncertainty of ṖShk b when using the timing parallax (π x ) measured from simulated FAST data.The green lines indicate the uncertainties of ṖGal b when assuming 10% (dashed) and 3% (dash-dotted) uncertainties in the vertical component of Galactic acceleration K z .

Fig. 6 .
Fig.6.∆χ 2 map for PSR J2222−0137 from simulated data (2021-2030) combining different observatories and using MPA1 EOS.Assumed 1 orbit per year for FAST, MeerKAT, and 3ERT (EFF, NC, LT) based on their observation precision, appending to the existing data-set(Guo et al. 2021b).Each contour corresponds to 90% CL limits on DEF gravity parameters from different data-sets, grey line depicts the limit from the Cassini mission.

Fig. 7 .
Fig.7.∆χ 2 map in the DEF gravity parameter space from the simulated 5-yr timing data for three different PSR-BH systems.The left panel corresponds to 10 µs TOA uncertainty, the right panel to 1 µs.Solid lines show the 90% CL limits for PSR-BH systems with different orbital periods: 3 d, 1 d and 8 h.The grey dashed line depicts the limit from the Cassini mission and green dash-dotted line is the current limit from PSR J2222−0137 discussed in Sec. 4 (see Fig.2).
P b (d) ṖGR b (10 −12 s s −1 ) ∆ Ṗb , 10 µs ∆ Ṗb , The rate of the orbital period change in GR ( ṖGR b ) for selected PSR-BH systems and corresponding uncertainty in Ṗb measurement for three different orbital periods and two TOA uncertainties.The values of Ṗb,GR and ∆ Ṗb are given in 10 −12 s s −1 .
three compact objects have different properties in terms of gravitational form factors. NSs in DEF gravity can be moderately |α NS | |α 0 | or strongly |α NS | ∼ O(1) scalarised depending on the mass m A and the chosen DEF parameters.WDs are not enough compact to significantly deviate from the weak-field approximation |α WD | |α 0 |.Thus for WDs we apply α WD = α 0 , β WD = β 0 . 11In contrast, BHs in DEF gravity fulfil a no-hair theorem and they are completely descalarised α BH = 0 and β BH = 0.The choice of the companion type plays a significant role for the calculated values of PK parameters.
Fig. D.1.Mass-mass diagram in DEF gravity corresponding to a point near the "horn" with α 0 = −10 −1 , β 0 = −1.9.The performed test is the same as for Figure1.The shadowed area is the allowed region at 68 % CL limit for a corresponding PK parameter.The solid green line corresponds to the observed value of Ṗint b .
Figure D.1 we show a mass-mass diagram in DEF gravity with α 0 = −0.1,β 0 = −1.9.This point in the DEF gravity parameter space passes the test with timing data but is excluded by Cassini experiment (|α 0 | 3.4 × 10 −3 ).The picture is dramatically different from what we see in the zone of scalarisation in Figure 1.Appendix E: Proper account of Ṗext b uncertainty If the observed Ṗb is measured more precisely than the external contribution Ṗext b we have to take the uncertainty of the external contribution into account.In most situations Ṗext b consists of the Shklovskii and the Galactic contribution.The calculation of the Shklovskii contribution is limited by the uncertainty in the distance and the one in the proper motion.The uncertainty in the estimation of the Galactic contribution is determined by the uncertainty in the distance and our imperfect knowledge of the Galactic gravitational potential.The latter is partly systematic in nature and therefore somewhat more difficult to quantify.If the uncertainty ∆ Ṗext b is the limiting factor we no longer can have Ṗext b fixed on one value for the whole experiment (i.e.provide a fixed value for parameter XPBDOT in TEMPO) because its uncertainty can affect the derived limits.

Table 1 .
Parameters of telescopes used in simulations.
Notes.Effective diameter of telescopes, observing bandwidth, and TOA uncertainties of PSR J2222−0137 used in the simulations.All information is based on the L-band data from real observations, and is scaled to 15-min integration over the full bandwidth.

Table 3 .
Orbital period properties of simulated PSR-BH systems.

Table C .
1. Parameters used in DDSTG timing model in TEMPO.