Comments on the mass sheet degeneracy in cosmography analyses

We make a number of comments regarding modeling degeneracies in strong lensing measurements of the Hubble parameter $H_0$. The first point concerns the impact of weak lensing associated with different segments of the line of sight. We show that external convergence terms associated with the lens-source and observer-lens segments need to be included in cosmographic modeling, in addition to the usual observer-source term, to avoid systematic bias in the inferred value of $H_0$. Specifically, we show how an incomplete account of some line of sight terms biases stellar kinematics as well as ray tracing simulation methods to alleviate the mass sheet degeneracy. The second point concerns the use of imaging data for multiple strongly-lensed sources in a given system. We show that the mass sheet degeneracy is not fully resolved by the availability of multiple sources: some degeneracy remains because of differential external convergence between the different sources. Similarly, differential external convergence also complicates the use of multiple sources in addressing the approximate mass sheet degeneracy associated with a local ("internal") core component in lens galaxies. This internal-external degeneracy is amplified by the non-monotonicity of the angular diameter distance as a function of redshift. For a rough assessment of the weak lensing effects, we provide estimates of external convergence using the nonlinear matter power spectrum, paying attention to non-equal time correlators.

Strong gravitational lensing of galaxies probes the mass distribution in lens objects and the background cosmology [1][2][3][4]. Imaging data combined with gravitational time delays of quasars and supernovae could allow a determination of the Hubble parameter H 0 [5][6][7][8][9]. Subject to simplifying assumptions on the mass profile of lens galaxies, a handful of systems with quasar time delays were enough for measurements of H 0 [10][11][12][13][14][15] that were widely used for tests of the ΛCDM model [16][17][18]. The prospects for time delay cosmography will make a leap with the advent of various surveys [19][20][21] and notably the LSST [22], that will discover thousands of lensed quasars and dozens of lensed supernovae, bringing the number of strongly lensed quasars with time delay measurements to hundreds [23][24][25]; and with the JWST [26], that will sharpen constraints on lens stellar kinematics [27,28].
While observations become numerous and precise, systematic degeneracies are a well known limiting factor in the interpretation of lensing data [29][30][31][32][33]. For example, relaxing some of the simplifying assumptions made in [10][11][12][13][14][15], a possible tension between the value of H 0 inferred via lensing and via large-scale structure (LSS) analyses [34][35][36] may be replaced by a core feature in the lenses [37,38]. It is clear that a careful account of modeling degeneracies will be crucial to take advantage of the progress in observations.
In this paper we comment on certain modeling degeneracies that affect the connection of imaging, time delay, and kinematics with physical information on lens profiles and cosmology. Several aspects of our discussion have been considered in the past at various levels of detail (see, e.g. [29,30,32,39], and notably the discussion in [38]). However, as we show, recent cosmography campaigns still do not account for the degeneracy in full.
The outline of the paper is as follows. In Sec. II we review cosmological weak lensing effects [40,41], that are intertwined with the strong lensing reconstruction problem via the mass sheet degeneracy (MSD).
In Sec. III we show a limitation in using kinematics data to alleviate the MSD. The problem is that weak lensing entails three distinct effects, coming from the source-observer segment, the source-lens segment, and the lens-observer segment of the line of sight (LOS). Omitting shear for a moment, these effects are summarised by three convergence terms: κ s , κ ls , and κ l , respectively. In general, different combinations of κ s , κ ls , and κ l enter into the bias in H 0 and into the interpretation of kinematics data. To ameliorate this ambiguity, imaging+kinematics analyses such as [10-15, 28, 38] should introduce nuisance parameters for κ l , in addition to κ s .
In Sec. IV we discuss the use of ray tracing simulations to obtain an observationally-informed theoretical prior on the weak lensing correction. We note that accounting for the full bias in H 0 requires that the ray tracing be used to extract all of κ s , κ ls , and κ l . Existing analyses [10][11][12][13][14][15] neglected the κ ls and κ l terms, possibly resulting in residual bias to their inferred value of H 0 .
In Sec. V we consider systems with multiple sources. This is a timely problem because analyses of multiple sources in cluster lens systems are advancing [9], and the situation also occurs in some galaxy lenses [12,42], where we can expect significant observational progress with the advent of new surveys [23,24,43]. We show that the MSD associated with weak lensing is not resolved by multiple sources, and clarify what imaging data does measure: a certain differenceof-differences of convergence terms. This combination of convergence terms is not the same one that enters the H 0 reconstruction problem. To mitigate the MSD, theoretical estimates of the weak lensing effect must be input to the analysis, similarly (although not precisely the same) to the way it needs to be input in systems with a single source. Multi-source analyses such as Ref. [9] should be adjusted to include this effect.
In Sec. V A we consider the so-called internal MSD. Specifically, we are thinking of the impact of a sub-dominant core component in the density profile of lens galaxies, which could act as an approximate version of the MSD [32,37].
We show that imaging data by itself cannot distinguish a core deformation of the intrinsic lens model, from an adjustment of weak external convergence. Interestingly, this statement remains true even when multiple sources are available. Introducing theoretical estimates of weak lensing, it should indeed become possible to identify a core effect if the magnitude of the convergence term associated with the core is parametrically larger than that expected theoretically from weak lensing. However, we point out an important dilution factor that makes this distinction potentially difficult. In Sec. V B we estimate the effect for two sample systems.
A related discussion of the multi-source MSD was given in [44][45][46]. The main difference between that work and ours here, is that [44][45][46] considered the role of intermediate sources as additional deflectors, that must be modeled separately from the main lens, and that exhibit a residual multi-lens version of the MSD. We comment on this point in App. A. It does not replace our discussion, but adds another layer of complexity in the modeling.
We summarise in Sec. VI. In App. A we give a brief derivation of the weak lensing correction in strong lensing systems. In App. B we provide estimates of weak external convergence using the nonlinear matter power spectrum and paying attention to non-equal time correlators that arise due to projection. We use these computations for a rough assessment of the effect. This is enough for illustrating our main points in this paper, although direct weak lensing surveys or ray tracing techniques, specifically designed to match the bias of the field containing individual strong lensing systems [47][48][49][50][51][52][53][54][55], are probably mandatory for more accurate analyses.

II. RECAP: WEAK LENSING AND MASS SHEET DEGENERACY IN STRONG LENSING ANALYSES
Consider a gravitational lens system with N sources, located at redshifts z i , i = 1, . . . , N . The deflection angle caused by the lens (main deflector) relative to source i reads Here κ i ( θ) is the convergence, Σ( x) is the surface mass density of the lens computed at proper position x transverse to the observer-lens line of sight, Σ crit (z l , z i ) is the critical density (we use natural units with c = 1), is the angular diameter distance from an emitter at redshift z e to an observer at z o , and z i , z l are the redshifts of the i-th source and of the lens, respectively. Notice that we can write That is, the deflection angle affecting the i-th source is a scaled version of the deflection angle affecting the 1st source. When we discuss the internal lens model in what follows it would be convenient to highlight α 1 , understanding that α i follows by Eq. (4).
In writing α( θ) we think of the main deflector as a localised concentration of mass (localised compared with cosmological distances), assuming that α( θ) → 0 for | θ| much larger than the Einstein angle of the system, | θ E |, defined via 1 Weak lensing from large scale structure in the intervening space between the sources, the lens, and the observer, modifies the lens equation by introducing external convergence and shear. These modifications must be taken into account in lensing analyses [56]. In the tidal approximation, the lens equation becomes [41,44,45,47,[57][58][59] (see also App. A) where κ r i are external convergence factors for source i, is the reduced shear matrix, and the superscript r = l, s, ls indicates observer-lens, observer-source, and lens-source lines of sight. Compared with the internal convergence κ i , which is of order unity near the Einstein angle κ i ( θ E,i ) = O(1), the weak lensing terms are small, typically in the range |γ r,i |, |κ r i | ∼ 0.01 − 0.1. In App. B we estimate their magnitude; a typical result is illustrated in Fig. 1. We show the root mean square (RMS) values of κ l,s,ls , which are cosmological random variables. The shear terms γ l,s,ls 1,2 scale similarly. For coherence with the tidal approximation, in the following we will mostly keep first order in κ r,i , γ r,i 1,2 . We assume that the large-scale structure producing the weak lensing is distributed over cosmological scales 1 Mpc (compared with the galactic scale 1 Mpc of the primary lens that produces α i ), thus the weak lensing terms are approximated as constants over the angular range containing the strong lensing image information.
For simplicity of notation, we define 1 The definition of θ E in Eq. (5) applies for axisymmetric lenses, but may not apply for arbitrary lens mass distributions. This subtlety is not important for our analysis. Note that M s i and M ls i carry the source label i, while M l is common to all sources. With this notation, we can write a weak lensing-modified lens equation as The modified deflection angle α contains a mixture of terms, some local to the lens and some coming from weak lensing. Thus α( θ), in general, does not decay at large |θ|; instead, it satisfies α i ( θ) → M s i θ. The time delay between images A and B (associated, e.g., to a time-variable quasar) of source i is [45,58,60,61] (see also App. A) Here D i dt is the time-delay distance [62], and ψ i ( θ) = C i ψ 1 ( θ) is the intrinsic lensing potential, defined via ∇ψ i ( θ) = α i ( θ). In this analyses we do not ex-plore the possibility of obtaining time-delay data for more than one source. Thus, we will drop the source index i on ∆τ AB . The MSD affecting the lensing reconstruction problem [63] is usually represented by replacing, in Eqs. (9) and (10), where λ is an arbitrary real parameter. (More general degeneracies exist [64][65][66], but for our main points it is enough that we restrict ourselves to Eqs. (14)(15).) Image coordinates θ and magnification ratios are invariant under Eqs. (14)(15). However, time delays are affected, and therefore, so is the inference of H 0 . Eqs. (14)(15) imply a degeneracy in the modeling of weak lensing data, coupled with a reparameterization of the model describing the "intrinsic" deflection angle α 1 . Infinitely many different reparameterizations of M r i and α 1 can produce Eqs. (14)(15). In considering these possibilities we assume that lens and source redshifts are measured perfectly, so the cosmological functions C i are known without appreciable uncertainty (given a cosmological model).
Because of the inhomogeneous term (1 − λ) θ in Eq. (15), it is natural to associate the MSD with a reinterpretation of the inhomogeneous observer-source weak lensing term in Eq.
However, the interpretation is coupled to additional degeneracies with M ls i and M l . It is convenient to parameterize the combined degeneracy by allowing M ls i and M l to also be adjusted, alongside an adjustment of the intrinsic lens model and the modeled source coordinates: Here λ s , λ ls , and λ l are independent parameters. Note that Eq. (16) (for example) amounts to κ s (19)(20)(21) into Eq. (12), we see that the dimensionless time delay ∆τ AB of the transformed model changes according to (see [38] for an earlier discussion): We would like to emphasize that the availability of multiple sources does not, by itself, ameliorate the MSD: as far as imaging information is considered, the modeling degeneracy expressed by Eqs. (16)(17)(18)(19)(20)(21) remains exact. It simply amounts to a simultaneous reinterpretation of the weak lensing variables affecting all of the sources. (The same conclusion, with a different version of the MSD and a discussion of intermediate sources as additional strong lenses for background sources, was reached in Refs. [44,45].) We return to this point in Sec. V.
In the absence of a direct measurement of weak lensing applicable to the field of view of the strong lensing system, the only way to ameliorate the MSD is by appealing to theoretical estimates of the magnitude of weak lensing variables. For example, a theoretical estimate of the expected possible magnitude of κ s i , as shown in Fig. 1, could constrain the conceivable range of 1 − λ s in Eq. (16): for some systems, an additive shift of order |1−λ s | ≈ 0.1 in κ s i may be difficult to justify from a cosmological point of view. In App. B we estimate some of these theoretical constraints.

III. ON THE USE OF STELLAR KINEMATICS TO RESOLVE THE MSD
In an imaging analysis, if only a single source is available (say i = 1), one can use Eqs. (16)(17)(18)(19)(20)(21) with the choice (24) to eliminate all of κ s , κ ls , and κ l from the modeling. For this reason, the task of extracting lensing information in imaging data is often performed ignoring external convergence [10][11][12][13][14][15]. (The details of how shear is modeled [67] will not be important for the discussion in this section.) Suppose we denote the fit result for the "intrinsic deflection angle" in such an analysis by α model ( θ). By "eliminating external convergence from the equations", we mean that the fit looks for a deflection angle model α model ( θ) which goes to zero at large | θ|, possibly up to a uniform shear term Γ s θ. Eq. (20) implies that α model ( θ) is related to the true underlying physical intrinsic deflection angle by where κ s,ls,l are the true physical values of the weak lensing terms. Given a measurement of the physical image time delays, and deriving the dimensionless time delay ∆τ model AB from α model , one can extract an inferred result H model 0 , which is related to the truth value H 0 by [38] H model The usual challenge of the weak lensing MSD for cosmography is to constrain the correction factor Stellar kinematics is sensitive to the intrinsic mass-perradius (M (R)/R) of the lens, and can be used to partially resolve the MSD. Refs. [10][11][12][13][14][15] used kinematics to constrain the MSD, but in these works, weak lensing was only parameterised in terms of κ s , omitting κ ls and κ l . The omission of κ ls and κ l biases the inferred value of H 0 . To explain this we consider a simplified scenario, where we can inspect the information content of imaging, time delays, and kinematics separately.
Suppose that the intrinsic deflection angle of the lens is given by the power-law (PL) profile (we denote θ = | θ|) To this, we add some true physical values for κ s,ls,l , so altogether the imaging data satisfies Eqs. (9)(10). Note that because of weak lensing, the parameterθ E in Eq. (27) is not equal to the Einstein angle, that we will denote by θ E . The imaging part of the data can be summarised as a measurement of θ E . We will simplify the discussion by assuming that also γ PL is accurately determined. The effective modeling which transforms away the weak lensing terms would converge onto the model The relation between the PL parameterθ E and the Einstein angle θ E is, therefore, Turning to kinematics, the observable velocity dispersion for the PL profile is 2 In the second line we connect our result with Eq. (8) of Ref. [15] (see also [11,68]), defining J as a cosmologyindependent function that depends only on imaging observables. For simplicity, we assume that the velocity dispersion is isotropic.
is a function of the system redshifts and of cosmological parameters, but is independent of H 0 which cancels out in the ratio of angular diameter distances; for simplicity, we assume that it is known without error. Note that: (i) our derivation of Eq. (30) accounts explicitly for the impact of weak lensing, so there are no hidden insertions of κ r in the ratio d A (0, z s )/d A (z l , z s ) which here simply expresses the ratio of the two usual redshift integrals defining d A (z o , z e ) in an unperturbed FRW cosmology, and (ii) from the first line in Eq. (30), the kinematics measurement of σ 2 can be summarised as a measurement of (30)] and the imaging data [θ E in Eq. (29)], one can obtain a measurement of the weak lensing factor, This measurement is not equivalent to a measurement of the MSD factor that is needed in order to extract the truth value of H 0 from the effective model result H model 0 in Eq. (26); specifically, even assuming that γ PL is perfectly well known, the two weak lensing factors are offset by 1 − κ l in the denominator.
Ref. [15] presented a treatment of systematics in recent cosmographic analyses. There, the following expression was used to correct for the weak lensing MSD 3 : The terms κ ls,l were effectively set to zero in the modeling, as they were ignored in both kinematics and imaging. From Eq. (8) in Ref. [15] and our Eq. (30) it follows that for a PL density profile, the term κ ext should be identified with This expression coincides with the discussion in Ref. [68], cited by [15] for the treatment of kinematics, if we set κ ls → 0, in which case κ ext → κ s . Combining Eqs. (33), (32), and (26), we conclude that in Ref. [15] the relation between the inferred value and the truth value of the Hubble parameter was biased by the following factor: We should note that although we considered σ 2 as an observable, in practice it is not directly measured. Various observational effects such as luminosity weighting, point spread function, etc., must be taken into account. Moreover, there are important theoretical uncertainties due to the velocity anisotropy, and also due to the actual lens halo density profile (even in the simple power law model considered above, unknown profile parameters include the slope γ PL ), which must be marginalized over in the likelihood.
The correction for external convergence requires all of κ s,ls,l to be extracted simultaneously, and applied to the cosmography analysis via Eq. (26). However, Ref. [10][11][12][13][14][15] only used ray tracing to derive the observer-source LOS term, κ s . This was identified in these analyses with the parameter κ ext , that was applied to correct for the effect in the determination of H 0 using Eq. (32), with κ ls,l taken to vanish 4 5 .
Therefore we expect that in these analyses, the inferred value of H 0 (corrected by ray tracing for κ s ) is still biased w.r.t. the truth value of H 0 , by the amount: We note that the κ s,ls,l terms should be considered as separate (albeit statistically correlated) nuisance parameters in cosmography. To clarify this point, in Fig. 2 we show an estimate of the statistics of κ s and κ ls in a specific example (see, e.g. [47] and references in and of it for previous studies). For definiteness, for this example we use the results shown in Fig. 1 with a source redshift z s = 2. The top panel of Fig. 2 shows the 50% and 90% quantiles of the bivariate distribution of κ s , κ ls , assuming Gaussian statistics. The bottom panel shows the conditional probability distribution of κ ls given a measured value of κ s = 0.034 (corresponding to the RMS of κ s in this example). We emphasize that our calculation here uses the analysis of App. B, and can only be used as a rough estimate of the statistics of the weak lensing terms. More accurate results probably require ray tracing simulations. 4 As an aside, we note that the identification of κ ext with κ s extracted from ray tracing, and the alternative identification of κ ext via kinematics as in Eq. (33), are consistent for κ l = κ ls = 0, but generally inconsistent otherwise. 5 The correct definition of κ ext that incorporates all of κ s,ls,l was explicitly written in Ref. [38] (we thank Simon Birrer for drawing our attention to this fact). However, also in [38], when making contact with ray tracing priors it was assumed that κ ext = κ s ; see Sec. 5.1 there.

V. MULTIPLE SOURCES AND DIFFERENTIAL CONVERGENCE
If multiple sources are available, then the MSD requires a simultaneous adjustment of the weak lensing variables for all sources. In particular, the κ s,ls i parameters for all sources i must be adjusted together, following Eqs. (16,17). Therefore, in the multi-source scenario, a certain combination of external convergence terms is measurable from the imaging data. A quick way to see what this measurable combination is, is by assuming that the analysis pipeline attempts to fit the two systems i and j separately and independently, omitting external convergence from the equations using Eq. (25). The outcome of such a procedure are two independent fits for the effective deflection angle, the results of which should be related by an over-all factor: where δκ r ji := κ r j − κ r i . The left hand side of Eq. (36) is measurable, and the C i 's are known, so the combination is, in principle, measurable. Unfortunately, as this combination of terms is invariant under the MSD, it cannot resolve the MSD impact on the H 0 inference.
A. MSD-core ("internal convergence") Uncertainties in the intrinsic mass profile of the lens could pose a more serious problem to time-delay measurements of H 0 , than that posed by weak external convergence. Specifically, an extended cored density component in lens galaxies would act similarly to external convergence [32,37], but could, in principle, cause a much larger effect. This scenario could occur in some models of dark matter [69].
To make the discussion concrete, consider the following change to the intrinsic physical surface mass density of the primary lens, We can think of the original density profile, Σ( x), as some steeply-falling mass distribution. It could come, for example, from the sum of a CDM Navarro-Frenk-White (NFW) profile, with Σ NFW (r) ∝ 1/r 2 at r R S , where R S is the NFW length scale parameter, and a stellar mass distribution Σ * (r) that falls even faster at large r. At smaller radii, near and around the projected Einstein radius of the lenses, lensing analyses often assume Σ(r) ∼ 1/r (corresponding to 3D density scaling as ρ ∝ 1/r 2 ).
In contrast, we will assume that the core component Σ c (r) is nearly constant for r near and below the projected Einstein radius. Note that by adding the core component in Eq. (37), we are not eliminating the cusp of Σ(r) at small r, but rather just adding to it a sub-dominant constant density term. At large radii, r > R c , the core component is assumed to decay, eventually joining or falling below the original Σ(r). The lensing analyses constrain R c to be larger than a few times the projected Einstein radius of the lens, with precise details of the transition depending on the precise implementation of the core profile [38,69]. For the ultralight DM cores considered in [69], for example, lensing data demands that R c should be larger than ∼ 3 times the projected Einstein radius of the lens. In what follows, for simplicity, we will assume that R c is large enough so that we can neglect the finite radius corrections. Restoring these effects is straightforward, and not essential for our current analysis. Kinematics analyses [70] could also constrain a core feature, and may be able to provide an upper limit on R c , although the cusp+core composite model has not yet been included in existing studies.
If R c is large enough, then the core term in Eq. (37) is mathematically identical to a redefinition of the observersource external convergence. Considering Eq. (10), we see that at the level of the modeling of imaging data, the core component is indistinguishable from the shift where We will think of the internal core convergence κ c1 as a small parameter, albeit potentially somewhat larger than cosmological weak external convergence terms. We have in our mind the lensing contribution to the H 0 tension [17,18], that could be resolved by κ c1 ≈ 0.1 [37,69]. Expressed in terms of convergence and shear parameters, at leading order in weak lensing terms, we have With this understanding one can see that imaging data alone cannot directly separate a core component from weak lensing. One must resort to kinematics analyses, or to theoretical considerations that could limit the plausible weak lensing effect. We focus on the latter. It is worthwhile to highlight a key difference between convergence and shear. Under Eqs. (16)(17)(18), which deal purely with the modeling of external weak lensing, shear is adjusted multiplicatively, while convergence receives an additive correction. This feature is modified in Eqs. (40)(41), but a key part of it remains manifest: the addition of the core adjusts Γ s via an additive term, however that additive term is itself proportional to the shear terms Γ ls + Γ l . As a result, even if κ c1 is somewhat larger than typical weak lensing effects (e.g. κ c1 ≈ 0.1), this still only amounts to a relative correction of ∼10% in Γ s . Constraining such a small effect observationally or theoretically would be challenging. This point is important because certain combinations of weak lensing shear terms can, in principle, be measured directly from imaging data [59,67]. For convergence, Eq. (40) suggests a potentially large additive readjustment of κ s , if κ c1 is larger than typical weak lensing effects. However, if only one source is available (i = 1), then it could be difficult for imaging data alone to constrain κ c1 .
If more than one source is available, then we have seen in Sec. V that a certain combination of differential convergence terms is measurable from the imaging data. Inserting Eq. (40) into Eq. (36), and neglecting the small correction factor 1 − κ ls i − κ l ≈ 1 in Eq. (40), we see that the following ratio of deflection angles can be measured: At a first glance in Eq. (42), one could hope that multiple source systems could resolve the core-MSD ambiguity, because the last term on the right hand side contains the large additive term ∝ κ c1 . However, a second glance reveals a setback: in Eq. (42), κ c1 appears multiplied by the factor C j − C i , proportional to the relative difference of angular diameter distance combinations of the two sources (see Eq. (4)). Unfortunately, the angular diameter distance is a non-monotonous function of redshift; moreover, the sources of typical strong lensing systems are often located between z ∼ 1 and z ∼ 2.5, that is, around the shallow maximum of d A (0, z). As a result, in many systems of interest, the difference C j − C i ∼ O(0.1) is much smaller than unity. This "dilutes" the efficiency at which imaging data in multiple source systems could constrain the internal MSD. Fig. 3 illustrates our point. We show two examples of the curve C 2 −1. The blue line is inspired by the multiple source system of the cluster lens MACS1149.5+2223 [9,71]. The primary lens (cluster) redshift is z l ≈ 0.5. Time-delays are measured for a type Ia supernova (source 1) at z 1 ≈ 1.5. Fig. 3 shows C 2 − 1 as function of a second source redshift z 2 . (Actual additional sources of this system are distributed between z 2 ∼ 1.2 and z 2 ∼ 3.7.) The orange line is inspired by the galaxy lensing system DES J0408-5354 [12], z l ≈ 0.6, with time-delays measured to a quasar at z 1 ≈ 2.3.
Because C j − C i is a small number, the κ c1 term in Eq. (42) could be diluted down to the natural scale of weak cosmological convergence. To detect (or constrain) an internal core, it therefore becomes crucial to estimate the magnitude of weak differential convergence. We consider this problem in App. B, and comment on examples in Sec. V B.
Before we move on, let us make a rough assessment of the precision by which the left hand side of (42) can actually be measured. Note that most of the information in the lensing data comes from the angular range θ ∼ θ E , where for simplicity of this estimate we can consider spherically symmetric systems and drop the vector notation on θ. Using the fact that α(θ E ) = θ E , the relative uncertainty by which | α j / α i | can be measured is of similar size as the quoted precision on the ratio of Einstein angles, |θ Ej /θ Ei |. For typical TDCOSMO systems, this precision is at the level of ∼ 1%. Of course, this quoted precision corresponds to the main source considered by the analysis (usually, the source for which time-delays are measured). What we actually need is the differential convergence, and the precision on that would be dominated, given two sources i = 1, 2, by the source for which the precision on θ E,i is poorest.

B. Examples of multiple source systems
A key point of our analysis is that the availability of multiple sources in a lensing system can only resolve the core-MSD degeneracy to the extent, that the core-induced term, κ c1 (C i − C j ), is significantly larger than the natural expectation for the weak cosmological differential convergence term, δκ s ij − δκ ls ij , in Eq. (42). Having armed ourselves, in App. B and App. B 1, with an estimate for the external convergence, we now explore two multi-source systems from the literature.

DESJ0408-5354
As noted earlier, this galaxy lensing system has a primary lens at z l ≈ 0.6, and main source (lensed quasar-host galaxy) at z 1 ≈ 2.3, and a secondary source at z 2 ≈ 2.2. A TDCOSMO analysis of this system, fitting an elliptic powerlaw density model for the lens (without allowing for a core component), inferred a value of H 0 which was ≈ 11% higher than the CMB/LSS result [12]. Thus, a core component at κ c1 ≈ 0.1 could completely resolve the lensing H 0 tension for this system [37,69].
The question arises, whether the presence of the second lensed source for this system could resolve the core-MSD associated with κ c1 . To address this question, in the top panel of Fig. 4 we plot (blue line) the function C 2 − 1 for this system, weighted by the factor κ c1 ≡ 1 − λ = 0.1. To demonstrate the confusion with weak external convergence, following Eq. (42) we superimpose a band with width chosen as the RMS value of δκ s 12 − δκ ls 12 for the system. The red vertical line marks the redshift of the actual secondary source.
We conclude that multi-source imaging data for DESJ0408-5354 [12] is unlikely to help in constraining the core-MSD proposal sufficiently to solve the lensing H 0 tension.

MACS J1149.5+2223
As noted earlier, the lens in MACS J1149.5+2223 [9,71,72] is a galaxy cluster at z l ≈ 0.54. The main source is a type-Ia supernova at z 1 ≈ 1.5. Six additional multiplyimaged sources are distributed in redshift in the range z i ≈ 1.2 to z i ≈ 3.7.  H0 tension), compared with the cosmological RMS weak differential convergence δκ. Top: redshift parameters chosen to resemble the TDCOSMO system DESJ0408-5354 [12]. Bottom: parameters chosen to resemble the MACS J1149.5+2223 cluster system [9,71]. In both panels, the function Ci−1 vanishes at the redshift of the primary source (the source to which time-delays are measured). Vertical red lines mark the redshifts of secondary sources, that one could try to use to resolve the core-MSD. Code: .
In the bottom panel of Fig. 4 we show that for the secondary sources in MACS J1149.5+2223 [9,71], weak differential convergence should significantly (although, perhaps, not entirely) mask the presence of an internal MSD. We thus expect that adding differential convergence as nuisance parameters for the secondary sources (that is, the sources additional to the SNIa host, to which time delays were specified in the mock of [9]) would significantly increase the uncertainty on the impact of the MSD as compared to the preliminary results in the appendix of [9].

VI. SUMMARY
In the effort to determine the Hubble parameter H 0 using strong lensing time delays, a key challenge is the mass sheet degeneracy (MSD). The MSD can be naturally as-sociated with two physical phenomena: cosmological weak lensing ("external convergence" or "external MSD"); and the possibility of a core component in the lens object ("internal MSD"). Well known methods to alleviate the MSD are: (i) the combination of imaging data with stellar kinematics, (ii) the use of ray tracing simulations to obtain an observationally-informed theoretical prior on external weak lensing, and (iii) the study of systems containing more than one strongly-lensed source.
In this paper we discussed some issues related to the MSD. In Sec. III, regarding the use of kinematics, we noted that the relation between kinematics constraints and imaging data involves a combination of weak lensing terms that includes all of the observer-source, observer-lens, and source-lens segments of the line of sight (LOS). Neglecting the source-lens and observer-lens convergence termsa common practice in current analyses -could lead to a bias of the order of a few percent in the inference of H 0 from time delays. It is possible to account for the effect by adding the observer-lens term as nuisance parameter in the combined imaging+kinematics likelihood.
In Sec. IV we noted that the neglect of the source-lens and observer-lens LOS contributions also affects ray tracing methods. Here too, omitting some of the LOS terms should bias the H 0 inference. It should be possible to extract priors for all of the LOS terms, and not only the observer-source one, from ray tracing.
As we review in Sec. II, the MSD is not broken by the availability of multiple sources in the imaging analyses. In Sec. V we considered what multiple sources do allow one to measure, which is differential convergence between different sources. Interestingly, weak differential external convergence complicates attempts to resolve the internal MSD, even if the internal core effect is parameterically larger than the weak lensing terms. The problem is that multiple sources are only useful against the internal MSD to the extent that they come with significantly different angular diameter distances; in practice, however, the angular diameter distances in typical multi-source systems used in cosmography are similar to the 10% level.
In App. B we described a non-perturbative calculation of cosmological external convergence, that allows us to provide rough estimate of the expected size of the effect, as well as estimates of statistical correlations between different convergence terms. Our calculation suggests (what we think is) a natural approximate way to account for non-linear matter power spectra entering in correlation functions at different values of the cosmic time variable.

ACKNOWLEDGMENTS
We are grateful to Fred Courbin and especially Simon Birrer for comments on the manuscript, including spotting a mistake in our preliminary draft. We are also grateful to Daniel Johnson for spotting a mistake in a previous version of this work. The work of KB was supported by grant 1784/20 from the Israel Science Founda-tion. The work of YS was supported by grants from the NSF-BSF (No. 2018683), the ISF (No. 482/20), the BSF (No. 2020300) and by the Azrieli foundation. The work was supported by the International Helmholtz-Weizmann Research School for Multimessenger Astronomy, largely funded through the Initiative and Networking Fund of the Helmholtz Association. This work made use of the following public software packages: CAMB [73,74], pyfftlog (based on Ref. [75]).

Appendix A: The lens equation with weak lensing
In this appendix we review the derivation of the weak lensing effects in the lens equation. These results are known [40,41,44,45,47,[57][58][59], and we include them here for completeness of the main text. Let us suppose that we have a strong deflector located at a comoving distance η l . We can split the gravitational potential as where Φ t (β(η), η) is the weak gravitational potential associated to weak lensing effects, andΦ is the gravitational potential of the main deflector. We can implement the tidal approximation on Φ t by setting The lens equation may be written as [76] β Within the tidal approximation, Eq. (A2), we can write The second term on the RHS of Eq. (A4) is an unobservable overall shift of the deflection angle (independent of θ), which can be reabsorbed in the source coordinates. Defining we expect M ij terms to be small as long as we are dealing with weak fields and maintain only terms at first order in these quantities. In particular, for η < η l , substituting in Eq. (A4), we obtain β(η l ) = (I − M (η l , 0)) θ.
For η > η l , the situation changes due to the presence of the strong deflector. Considering the full Φ from Eq. (A2), avoiding the tidal approximation for the strong deflector (but using the thin lens approximation, encoded in the Dirac delta), we have, with η s as the comoving distance of the source, where on the last step we substituted β(η ) inside the integral with the term We can rewrite the term in square brackets in Eq. (A8) as The time delay between image solutions of Eq. (A11) can be computed by exploiting the Fermat principle [2,45,61]. First, note that we can write the potential part of the time delay due to the main deflector, t pot , as The Fermat principle states that, up to an affine transformation, the lens equation can be obtained by taking the gradient ∇ θ of the time delay function t( θ, β) and setting it to zero. Eq. (A12) can then be used to understand what is the correct prefactor (the affine parameter) entering the time delay function. We see that from the function one indeed recovers Eq. (A11) using ∇ θ t( θ, β) = 0, recalling the definition ∇ ξ ψ( ξ) = α( ξ). Notice that Eq. (A13) has the correct prefactor, Eq. (A12), in front of ψ((I − M l ) θ). Finally, Eq. (11) is recovered via ∆t AB = t( θ A , β) − t( θ B , β).

Multi-plane lens equation.
In our discussion, we did not take into account the possibility that nearer sources could act as additional lens planes for further sources [44,45]. It should be clear that adding this effect into the modeling increases the complexity and also adds more possible layers of degeneracy, beyond and on top of the weak lensing MSD we emphasized in our analysis.
Here we briefly explain how the effect can be embedded into our notation.
Adjusting our notation to that in Ref. [45], we label with the index i = 0 the primary lens plane and with index i > 0 the source planes, with i > j implying that source i has bigger redshift than source j.α i is now the deflection angle due to lens/source i, which relates with the usual quantity used in lens equations, α i , with With this, we can write the multi-plane lens equation as where M (η i , η j ) is defined in Eq. (A5) and where C ji is the generalization of the factor in Eq. (4) coming from the definition Eq. (A14), To incorporate these results into our discussion in the main text, one only needs to add to the MSD of Eqs. (14,15) the further requirement (for i > 1) This is a stretch of the argument in α j along with an over-all rescaling of α j and/or M (η i , η j ).

Appendix B: Cosmological external convergence
The cosmological external convergence between comoving distance η 1 and η 2 > η 1 in the directionn on the sky can be written as (see Ref [40] and Eq. (A5)): where δ(n, η) is the matter overdensity at x = ηn, is our comoving distance to the shell at z, and we have neglected 3-curvature and radiation in the cosmic energy budget. To calculate RMS differential convergence, δκ 2 i , we need to evaluate mixed correlation terms of the form Passing to Fourier space, and using the power spectrum we arrive at The typical angular separation of multiply-lensed sources in galaxy lensing campaigns is in the ballpark of arcseconds. This means that the proper transverse distance between the relevant geodesics is smaller than ∼ 10 kpc, which is a small separation w.r.t LSS. In the following, we will therefore compute the cosmological correlators at the same line of sight,n =n . With this simplification, the integral for the variance of differential convergence reads (κ(η l , η m ;n) − κ(η n , η o ;n)) 2 = 9H 4 0 Ω 2 with q lmno (η, η ) := q lm (η)q lm (η ) + q no (η)q no (η ) − 2q lm (η)q no (η ).
The quantities we are mostly interested in are and (We remind the reader that the indices o,s denote observer, source respectively.) Analogous formulas hold for (δκ ls ij ) 2 and (κ ls ) 2 , (κ l ) 2 . The line of sight integrals invoke the power spectrum of matter density perturbations δ, computed at non-equal times η, η . In Sec. B 1 we estimate these correlators using HALOFIT [77,78]. Our numerical results, obtained through this computation, are illustrated in Fig. 1.

Evaluation using HALOFIT
The main difficulty in evaluating expressions for the variance of the external convergence is obtaining a reliable estimate of the non-equal time matter power spectrum P δ (k, η, η ) 6 . This problem has been extensively studied in the literature, both analytically and numerically (see for instance [47][48][49][50][51] and references therein). The purpose of this section is to provide a simple, yet accurate enough analytical approximation to P δ (k, η, η ), which can be used to easily estimate the typical magnitude of external convergence given the lens and sources configuration.
In linear theory, the non-equal time matter power spectrum is simply given by where P lin (k) is the liner power spectrum evaluated at redshift zero and D(η) is the linear theory growth factor. However, since a significant contribution to the external convergence comes from very nonlinear scales, the linear theory estimate is not reliable. Indeed, as we are going to see making comparison to the results from simulations with ray-tracing, the linear theory predictions significantly underestimate the variance of external convergence.
To get a more reliable theoretical estimate, one has to use the nonlinear matter power spectrum, which can be simply obtained using HALOFIT [77,78]. Unfortunately, HALOFIT outputs the nonlinear power spectrum only at equal times. To extend this output to non-equal times requires some approximations. Inspired by the linear theory, the commonly used prescription is P δ (k, η, η ) = P δ (k, η)P δ (k, η ) . (B12) We are going to argue that this and other similar approximations do not properly capture the non-equal time matter power spectrum on small scales. The reason is large bulk flows, which displace the dark matter particles by O(10) Mpc. These large displacements exactly cancel for equal time correlation functions 7 , due to the equivalence principle. However, for non-equal time correlation functions, the dark matter particles are displaced by different amounts, depending on times at which the density fields are evaluated. On scales smaller than O(10) Mpc, this leads to exponential suppression of power in the non-equal time power spectrum. This important effect is not captured by Eq. (B12). In order to gain some intuition about how large displacements affect the non-equal time power spectrum, we can use Lagrangian perturbation theory. In this setup we have (with r the Euclidean coordinate and q the Lagrangian coordinate) where ψ is the displacement field r( q, η) = q + ψ( q, η).
In Fourier space, Hence, we can write the two-point correlator as where we used the shorthand ψ i := ψ( q i , z i ). Using homogeneity and isotropy of the universe, we can write which translates into the following formula for the non-equal time power spectrum P δ (k, η, η ) = d 3 q e −i q· k e −i k( ψ( q,η )− ψ(0,η)) . (B18) Note that for two different times the relative displacement in the exponent can be large. For a large k (small scales) this implies that the contribution to the power spectrum becomes exponentially suppressed, as we argued at the beginning of this section. We can calculate this exponential suppression a bit more explicitly. For this purpose we can focus on the simplest case of Zel'dovich approximation. The Zel'dovich displacement is simply given in terms of the linear density field as follows Using the cumulant theorem and assuming Gaussian initial conditions, the non-equal time Zel'dovich power spectrum is given by where P Z (k, η) is the standard equal-time Zel'dovich power spectrum and we have defined D(η 1 )D(η 2 ) =: D 2 (η), withη an appropriate mean comoving distance which can be determined using the form of the linear growth factor D. The same result was obtained in [80] (for a similar discussion see also [81]).
One can show that the same exponential suppression remains going to higher orders in perturbation theory. However, beyond Zel'dovich approximation, the nonlinear spectra cannot be simply expressed through the equal time counterparts anymore. For instance, the general structure of the one-loop result can be written as The exact form of δP (k, η, η ) is not important, but we know that it has two important properties. First, this correction is small in perturbation theory [82,83]. Second, δP (k, η, η ) vanishes for equal times. Therefore, we expect that the correction to the equal-time one-loop term in the square brackets is always small. Furthermore, given the expectation that P (k, η, η ) is a smooth function of η and η , when the two times are not equal, the exponential suppression at high k is always large enough to make any small mistake in the modeling of the nonlinear power spectrum insignificant.
This equation has the correct equal-time limit, on large scales (small k) it reduces to the linear theory result given by Eq. (B11), and on small scales it has the correct exponential suppression of power induced by the difference in magnitudes of large bulk flows at different redshift. The equal time power spectrum on the right hand side P δ (k,η) can be simply evaluated using the HALOFIT [77,78]. Eq. (B23) can be used in Eq. (B7) to compute the differential external convergence variance. The highly oscillating integrand (due to the presence of the Bessel function j 0 ) can be tamed by means of FFTlog techniques [75,84]. Finally, we introduce a cut-off in the k integral at some k cutoff . We choose k cutoff = 10 Mpc −1 , where individual galaxies and baryonic effects most likely lead to a breakdown of the HALOFIT result. Note that similar smoothing is implicitly used in the ray-tracing simulations when the gravitational potential is estimated from the distribution of matter. Changing k cutoff by a factor of 2 up or down affects our results at the level of a few tens of percent, which is comparable to other theoretical uncertainties in our equations. In Sec. B 2 we compare the results of our calculations with results obtained in the literature using ray tracing techniques.

Comparison to ray-tracing results
TDCOSMO derives Bayesian priors for external convergence by using ray-tracing through the Millennium simulation [85], on LOSs which are chosen to match the galaxy   B11)) on the second column and non-linear approximation (Eq. (B23)) on the third column. The ray tracing results from the TDCOSMO collaboration (available here) are shown in the last two columns (κ s variance on the fourth, κ s mean and the 16th and 86th percent quantiles on the fifth).
density observed in each strong lensing system of interest [6,51,86]. We can use these numerical results to compare with our analysis (Eq. (B10)). In Fig. 5 and Tab. I we compare our computation (linear, obtained using Eq. (B11) for the power spectrum; and nonlinear, obtained using Eq. (B23)) with ray tracing results from the TDCOSMO collaboration available here. Fig. 6 shows the results obtained in Ref. [51] for the probability distribution of external convergence, averaging over all LOSs (that is, not restricting to fields containing strong lensing systems).
Our nonlinear analysis (incorporating HALOFIT and the non-equal time approximation) reproduces the variance in κ s to within about 30% accuracy for the systems which have a mean value of κ s compatible with zero (first 4 systems in Tab. I, top 4 panels in Fig. 5). Some systems, however, are found in [51] to be biased with a mean κ s that is significantly off zero (last 3 systems in Tab. I, bottom 3 panels in Fig. 5). This probably reflects excess structure along the LOS, typical of systems in crowded fields. For these systems, our calculation not only misses the bias, but also underestimates the spread in κ s , by up to a factor of ∼ 4. Thus, indeed, the simplified computation from the previous section can only be used to provide a rough estimate of the magnitude of weak lensing effects, and ray tracing analyses on the lines of Refs. [51,[87][88][89] are probably mandatory on a system by system study. Comparing the probability distribution obtained in ray tracing [51] (blue bar histograms) with our computation, in linear theory (solid orange) and with the non-linear approximation (solid green: k cutoff = 10 Mpc −1 , dashed green: k cutoff = 5 and 20 Mpc −1 ). We remark that our results cannot reproduce the bias on the external convergence (nonzero mean seen in some of the blue bar histograms), since our computation is equivalent to an average over all LOSs, differently from the ray-tracing analysis TDCOSMO performs, which is calibrated to match the richness of the actual lensing systems. A fairer comparison between our computation and typical TDCOSMO results, obtained by averaging over many LOSs, is shown in Fig. 6. Code: .