Physics of the Dark Universe

Strongly lensed supernovae as a self-sufficient probe of the distance duality relation

The observation of strongly lensed Type Ia supernovae enables both the luminosity and angular diameter distance to a source to be measured simultaneously from a single observation. This feature can be used to measure the distance duality parameter η(z) without relying on multiple datasets and cosmological assumptions to reconstruct the relation between angular and luminosity distances. In this paper, we show how this can be achieved by future observations of strongly lensed Type Ia systems. Using simulated datasets, we reconstruct the function η(z) using both parametric and non-parametric approaches, focusing on Genetic Algorithms and Gaussian processes for the latter. In the parametric approach, we find that in the realistic scenario of N_lens = 20 observed systems, the parameter ϵ_0 used to describe the trend of η(z) can be constrained with the precision achieved by current SNIa and BAO surveys, while in the futuristic case (N_lens = 1000) these observations could be competitive with the forecast precision of upcoming LSS and SN surveys. Using the machine learning approaches of Genetic Algorithms and Gaussian processes, we find that both reconstruction methods are generally able to correctly recover the underlying fiducial model in the mock data, even in the realistic case of N_lens = 20. Both approaches learn effectively from the features of the mock data points, yielding 1σ constraints that are in excellent agreement with the parameterised results.


Introduction
The HOLISMOKES project recently demonstrated that the exciting possibility of using strongly lensed Type Ia supernovae (SNIa) as a precision probe in cosmology could soon become a reality [1]. Strong gravitational lensing occurs when a massive object lies along the line of sight between a luminous source and an observer. The gravitational field of the lens distorts the spacetime along the line of sight, bending the light path of photons coming from the source which results in a remapping of the source light into multiple images [2,3].
Due to the different light paths taken by photons coming from the source, these images arrive at the observer at different times and are therefore delayed with respect to one another. The time delay between images, which can range from days to years [4,5], is a typical lensing observable which is sensitive only to the mass profile of the lens and to a combination of the source and lens angular diameter distances, the so-called time delay distance [2,3]. Provided that one can properly reconstruct the lens mass profile, the strong lensing time delay can then be used as a cosmological probe. To accurately distinguish the lens galaxy from the lensed images in systems with good source-lens alignment, one needs sources with a typical brightness comparable to that of a galaxy. This has led to the use of lensed quasars as the major cosmological probe in the context of lensing, an approach which has been proven by the H0LiCOW collaboration to be highly successful in deriving cosmological constraints [18][19][20][21][22][23][24].
There exists another family of astrophysical objects that have luminosities comparable to that of a galaxy: supernova explosions. The concept of using strong lensing of SNIa as a cosmological probe was pioneered in 1964 by Refsdal [25], who showed that the strong lensing time delays can be used to directly measure the Hubble parameter, H(z). However, since lensed supernovae are thought to be far rarer than lensed quasars, the idea of using them for cosmology has long been considered a fruitless endeavour. This changed with the recent observations of two lensed supernova events (the core collapse supernova ''Refsdal'' [26] in 2014 and the Type Ia supernova iPTF16geu [27] in 2016), which reinvigorated the field [28]. As highlighted by HOLISMOKES [1], cosmology with strongly lensed SNIa will soon be possible with surveys like LSST, which is expected to measure around a thousand such events [29][30][31][32].
As previously mentioned, gravitational lensing remaps the source light from the source plane to the lens plane. While the source surface brightness is conserved in the process, the area on the lens plane in which source photons are remapped is not conserved. In other words the flux of the lensed images is different from the source flux, their ratio defining the magnification factor. From lensing observations, one typically measures the ratio of magnification between the images by comparing their measured fluxes, but the total magnification is not directly measurable because the unlensed source brightness (i.e. the unlensed source flux) is unknown. So, despite their relative rarity in comparison to lensed quasars, lensed SNIa have one compelling advantage: they allow the source brightness to be measured independently from lensing observations [33].
By assuming that SNIa are standardisable candles, the brightness (and brightness decay after the explosion) can be inferred from the light curves of the lensed events, which are well known from unlensed supernovae observations. The total magnification can then be tightly constrained, reducing the uncertainties in the lens mass profile and improving the possible cosmological constraints [33]. Since this enables us to measure the luminosity distance to these events, they can be used to test more fundamental aspects of the standard cosmological model.
We note that microlensing and other lensing effects related to substructures (such as dust clouds and subhalos) in the deflector galaxy can significantly affect the standardisable nature of SNIa, leading to large uncertainties in the inferred unlensed flux [31][32][33][34][35]. However, a significant fraction of lensed SNIa is expected to be standardisable: around 20% from an LSST-like survey [32,35,36]. In the following, we assume the effect of microlensing and other effects related to substructures in the lensing galaxy to be negligible.
The distance duality relation (DDR), which relates luminosity distances to angular diameter distances, is one example of a fundamental component of cosmology which is accessible with strongly lensed SNIa. Combining information from the velocity dispersion of stars in the lensing galaxy with lensing observations and supernova light curves, lensed SNIa can provide both measurements of angular diameter and luminosity distance, making these events particularly well-suited to probing the DDR and investigating any possible deviations from it, which could indicate the presence of new physics.
In this paper, we aim to reconstruct a function related to the DDR using mock datasets of strongly lensed SNIa. We create the mock datasets for an LSST-like survey, testing three cases: realistic (20 useful lensed SNIa, as expected from LSST after 10 years of observations [1]), optimistic (100 lenses, corresponding to the total number of spatially-resolved lensed SNIa expected from LSST [29]) and futuristic (1000 lenses, representing the number of events we expect to observe in the next few decades). Using both parametric and non-parametric approaches for our reconstructions, we investigate whether violations of the distance duality relation could be detected with datasets of this size, finding that the realistic LSST-like survey would be competitive with other more traditional probes of the DDR, such as the combination of SNIa and BAO observations. We note that a similar analysis, involving strong lensing in the context of constraining the DDR, was performed in [37][38][39]. However, our approach in this paper differs significantly from those previous works. The main difference is that in those works it was shown that angular diameter distance measurements from strong lensing events can be used in place of other observations able to provide this quantity (such as BAO), but additional luminosity distance measurements were still needed to constrain the DDR. Instead, we show that both the luminosity and angular diameter distances can be measured from strongly lensed SNIa, exploiting the standardisable nature of supernova explosions along with the ''standard ruler'' nature of strong lensing events. This makes strongly lensed SNIa a self-sufficient probe of the DDR.
The structure of our paper is as follows: in Section 2 we present some theoretical aspects of the distance duality relation, in Section 3 we discuss the physics of the strongly lensed supernovae and the details of the mock data, while in Section 4 we present our methodology, with the parameterised and nonparametric approaches, and our results. Finally, in Section 5 we summarise our conclusions.

The distance duality relation
The distance duality relation is given by [40]

d_L(z) = (1 + z)^2 d_A(z), (1)

where d_L(z) is the luminosity distance and d_A(z) is the angular diameter distance. It holds under the conditions that photons travel along null geodesics in a pseudo-Riemannian spacetime, and that the number of photons is conserved [41].
The first condition is a fundamental statement about the geometry of spacetime and the photon mass and is only violated in theories of gravity with a non-Riemannian geometry, or in theories where photons do not propagate on null geodesics due to coupling with other fields (see e.g. [42][43][44][45][46]). It is easier to imagine deviations from DDR occurring due to non-conservation of the photon number, for example by absorption or scattering by dust as they propagate to the observer, or via more exotic mechanisms such as the conversion of photons to axions as they interact with cosmic magnetic fields [47].
In order to investigate these possible deviations from the DDR, a function η(z) can be defined from Eq. (1) as

η(z) ≡ d_L(z) / [(1 + z)^2 d_A(z)], (2)

which is equal to unity if the DDR is not altered. DDR violation mechanisms are integrated effects, where photons interact with intervening components along the line of sight. Thus, one can expect that for a photon at redshift zero, such an effect does not have time to take place and no violation of the relation is present, meaning that η(z = 0) = 1. This is also clear from Eq. (2), whose limit for z → 0 is lim_{z→0} η(z) = 1. For this reason, we impose that η(z) is equal to 1 at vanishing redshifts, for both our parametric and non-parametric reconstructions.
The function η(z) is also commonly parameterised in the literature (e.g. [48,49]) as

η(z) = (1 + z)^ϵ(z), (3)

where ϵ(z) ≠ 0 is equivalent to η(z) ≠ 1, thus indicating a deviation from the standard DDR. To probe this relation and search for violations of the DDR, objects for which both a luminosity distance and an angular diameter distance are available are needed. This motivates the use of strongly lensed SNIa, which amply fulfil these criteria.
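Under this parameterisation with a constant ϵ(z) = ϵ_0, the duality parameter is a one-line function; a minimal sketch in Python (the function name is illustrative):

```python
def eta(z, eps0):
    """Distance duality parameter with a constant epsilon(z) = eps0,
    i.e. eta(z) = (1 + z)**eps0."""
    return (1.0 + z) ** eps0

# the DDR is recovered at vanishing redshift for any eps0: eta(0) = 1
assert eta(0.0, 0.05) == 1.0
# a small deviation eps0 = 0.05 gives eta(1) = 2**0.05, about 1.035
print(eta(1.0, 0.05))
```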

Strongly lensed supernovae
A survey of strongly lensed SNIa will observe the distance modulus of the supernovae, i.e. the difference between their apparent and absolute magnitudes, which is given by

µ(z_s) = 5 log10 [d_L(z_s)/Mpc] + 25, (4)

and the time delay distance (see e.g. [8])

d_∆t ≡ (1 + z_l) d_A(z_l) d_A(z_s) / [d_A(z_s) − (1 + z_l)/(1 + z_s) d_A(z_l)], (5)

where z_s is the redshift of the source and z_l the redshift of the lens. Notice that Eq. (5) only holds under the assumption of flat space, i.e. Ω_k = 0, in the context of a flat Friedmann-Lemaître-Robertson-Walker metric. In curved space, the denominator on the right hand side would become d_A(z_s, z_l), the angular diameter distance between lens and source. In this paper we want to obtain measurements of d_A(z_s), and therefore the assumption of a flat Universe allows us to isolate this term in the time delay distance expression. We leave the investigation of more general cases for future work. Under this assumption we can invert Eq. (5) to obtain d_A(z_s), and we can write our parameterisation of the distance duality relation in terms of the distance modulus, the angular diameter distance at the lens and the time delay distance as

η(z_s) = 10^(−5 + µ(z_s)/5) [d_∆t − (1 + z_l) d_A(z_l)] / [(1 + z_s)(1 + z_l) d_A(z_l) d_∆t]. (6)

The number of currently detected lensed SNIa is insufficient for any precise cosmological application, so we turn to mock datasets to forecast our future ability to probe the distance duality relation with these events.
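Assuming the flat-space form of the time delay distance used here, the inversion for d_A(z_s) and the resulting duality parameter can be sketched as follows (function names are illustrative; all distances are in Mpc):

```python
def dA_source(d_dt, dA_l, z_l, z_s):
    """Invert the flat-space time delay distance for d_A(z_s), assuming
    d_dt = (1+z_l) dA_l dA_s / [dA_s - (1+z_l)/(1+z_s) dA_l] (Omega_k = 0)."""
    return (1 + z_l) * dA_l * d_dt / ((1 + z_s) * (d_dt - (1 + z_l) * dA_l))

def eta_obs(mu_s, dA_s, z_s):
    """Duality parameter from the distance modulus and the angular diameter
    distance: eta = d_L / [(1+z_s)^2 d_A(z_s)], with d_L = 10**(mu/5 - 5) Mpc."""
    d_L = 10 ** (mu_s / 5 - 5)  # luminosity distance in Mpc
    return d_L / ((1 + z_s) ** 2 * dA_s)
```

For example, a source at z_s = 1 with µ = 40 (d_L = 1000 Mpc) and d_A(z_s) = 250 Mpc gives η = 1, the standard DDR value.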

Mock dataset
To generate our mock datasets, we focus on lensed SNIa for which measurements of the kinematics of the lens galaxy are available, along with time delay observations. In this scenario, strong lensing provides two independent distance measurements at the same time [11,13,17]: d_∆t and d_A(z_l). The measurements of the time delay distance of a lens are obtained by combining the observation of time delays between the light curves of multiple images, a lens mass model for the lensing galaxy and a reconstruction of the mass environment along the line of sight [18][19][20][21][22][23][24]. We therefore consider only these contributions to the uncertainties of d_∆t.
As in [1], to estimate the precision on d_∆t we conservatively adopt a 5% uncertainty for the time delay and a 3% uncertainty for both the mass profile and the lens environment. Summing these in quadrature, we obtain a cumulative uncertainty on d_∆t of 6.6%, in agreement with current constraints from lensed quasars [24] (see footnote 1). For the angular diameter distance to the lens, d_A(z_l), we assume a scenario where spatially-resolved observations of the kinematics of the lens galaxy are available, so that the uncertainties on d_A are essentially dominated by the time delay uncertainties. These measurements are expected to be obtained easily after all the SNIa images have faded. We therefore adopt a 5% precision for d_A.
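The quadrature sum quoted above can be checked in one line:

```python
import numpy as np

# time delay (5%), lens mass profile (3%) and lens environment (3%),
# summed in quadrature, give the cumulative uncertainty on d_dt
sigma_ddt = np.sqrt(0.05 ** 2 + 0.03 ** 2 + 0.03 ** 2)
print(f"{sigma_ddt:.1%}")  # -> 6.6%
```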
The missing ingredient of our mock dataset is now the distance modulus µ(z_s) of the lensed SNIa. This quantity must be reconstructed starting from the lensed distance moduli of the four lensed images. For standardisable candles this implies fitting the lensed light curves, with exactly the same procedure used for unlensed SNIa, to provide an estimate of the lensed distance modulus µ̃ without any cosmological assumption or knowledge of the lens model. The unlensed distance modulus is then related to the lensed one by the following relation:

µ(z_s) = µ̃(z_s) + 2.5 log10 A, (7)

where A is the magnification factor of the lensed event, defined as the ratio of the lensed to unlensed flux, i.e.

A ≡ F̃/F. (8)
This delensing procedure to infer the unlensed distance modulus can be summarised in two simple steps:

1. Estimate the lensed distance modulus, µ̃, from the observed light curves of the lensed SNIa.
2. Assume a mass profile to estimate the lensing magnification (see footnote 2), delens the SNIa and obtain the unlensed distance modulus, µ.
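A minimal sketch of step 2, assuming the convention that the unlensed modulus is fainter than the lensed one by 2.5 log10(A) for a magnification A > 1:

```python
from math import log10

def delens_distance_modulus(mu_lensed, magnification):
    """Step 2 of the delensing procedure: a magnified image appears brighter,
    so the unlensed distance modulus is fainter by 2.5 log10(A)."""
    return mu_lensed + 2.5 * log10(magnification)

# a total magnification A = 10 shifts the distance modulus by 2.5 mag
print(delens_distance_modulus(40.0, 10.0))  # -> 42.5
```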
Assuming this approach to be feasible for all the systems in our catalogues to infer the unlensed µ(z_s), we model its error budget following [50] for the SNIa brightness uncertainties, and to this we add in quadrature the magnification uncertainty:

σ_µ^2 = σ_flux^2 + σ_scat^2 + σ_intr^2 + δµ(z_s)^2 + (2.5 σ_log A)^2, (9)

where the systematic uncertainty due to flux calibration is σ_flux = 0.01, the intrinsic scatter of SNe at fixed colour (also known as colour smearing) is σ_scat = 0.025, and the intrinsic distance scatter is σ_intr = 0.12. Finally, we also include an irreducible distance modulus error, which we assume affects all events coherently and varies linearly with redshift in the form δµ(z_s) = e_M z_s, with e_M drawn from a normal distribution N(0, 0.01) [50]. For the error on the lensing magnification we assume a ∼20% fractional uncertainty, i.e. σ_log A = 0.09 [33].
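This error budget can be sketched as follows (a sketch assuming the quadrature form written above; the 2.5 factor converting the log10(A) scatter into magnitudes is our reading of the magnification term):

```python
import numpy as np

def sigma_mu(z_s, e_M, sigma_log_A=0.09):
    """Distance modulus error budget: SNIa terms from [50] plus the
    magnification term, all added in quadrature.  The factor 2.5 converts
    the scatter in log10(A) into magnitudes."""
    sigma_flux = 0.01    # flux calibration systematics
    sigma_scat = 0.025   # colour smearing
    sigma_intr = 0.12    # intrinsic distance scatter
    delta_mu = e_M * z_s  # coherent, linear-in-redshift systematic
    return np.sqrt(sigma_flux ** 2 + sigma_scat ** 2 + sigma_intr ** 2
                   + delta_mu ** 2 + (2.5 * sigma_log_A) ** 2)

# e_M is drawn once per catalogue from N(0, 0.01)
rng = np.random.default_rng(0)
print(sigma_mu(1.0, rng.normal(0.0, 0.01)))
```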
To generate the mock, we assume the lens distribution to be uniform in the range 0.1 ≤ z_l ≤ 0.9 and the source redshift to be twice the lens redshift, i.e. z_s = 2z_l, for simplicity. Even though there will be a distribution of source redshifts, this has a small impact on cosmological inference [51,52].

Footnote 1: The assumed uncertainties correspond to having perfect knowledge of the lens mass profile and its environment. As detailed in [10], a hierarchical analysis of the lensing observables may lead to higher uncertainties in the time delay distance.
Footnote 2: As the unlensed flux is not measured in lensing observations, the lensing magnification has to be determined from the lens mass profile. However, the same mass profile is needed to infer the angular and time delay distances and can be found by studying the lens galaxy and its environment [14,18,20,22,24]. Another possibility is to obtain the unlensed magnitude from an external catalogue of unlensed SNIa and estimate the magnification from Eq. (8) [27]. In this case one can still estimate the distance modulus, but it would be the same as the one inferred from the actual SNIa catalogue, spoiling the information of the lensed event except for the redshift z_s.

Assuming a ΛCDM fiducial cosmology with H_0 = 70 km s^−1 Mpc^−1 and Ω_m = 0.3 (with Ω_k = 0), we calculate the angular diameter distance d_A(z) at the given z_l and z_s. From this we can obtain d_∆t using Eq. (5), while to compute the fiducial distance modulus µ(z_s) we use Eq. (4), obtaining the luminosity distance from d_A(z) through Eq. (2), which implies choosing a fiducial η(z). We rely on the parameterised expression of η(z) of Eq. (3), and we choose for our fiducial a constant ϵ(z) = ϵ_0. We focus on three different choices for this parameter, in order to test the precision of future observations in different scenarios. We choose the standard DDR value ϵ_0 = 0, and two fiducials with different degrees of departure from the DDR, ϵ_0 = 0.01 and ϵ_0 = 0.05.
Once the fiducial trends for our observables are computed, we obtain the mock datasets by drawing a random Gaussian shift around the fiducial, using the estimated 1σ uncertainties for d_A(z_l), d_∆t and µ(z_s):

D_i,mock = D_true(z_i) + δD_i, with δD_i ∼ N(0, σ_D_i), (10)

with i = 1 . . . N_lens, D_true representing the fiducial value of either d_A, d_∆t or µ, δD_i being the corresponding Gaussian deviate and σ_D_i the 1σ uncertainty of the distance considered. Finally, we use Eq. (6) to obtain a mock catalogue for η(z_i) from the mock datasets of d_A(z_l), d_∆t and µ(z_s). To obtain the error on each of the data points of the η(z_i) mock, we employ an MCMC-like approach, detailed as follows:

1. We construct the distribution of each of the D_i,mock distances at each redshift z_i of the catalogue, drawing 10,000 random samples from the assumed distribution for D_i,mock.
2. We combine each of the 10,000 random samples using Eq. (6) to obtain 10,000 realisations of the distribution of η(z_i) at each redshift z_i.
3. We calculate the mean and standard deviation of log10 η(z_i) from the η(z_i) distributions at each redshift to construct our final mock datasets.
A more detailed explanation of the procedure followed to construct the mock datasets can be found in the Appendix.
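The sampling and propagation steps above can be sketched as follows (a sketch under the flat-space inversion of the time delay distance; the function and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
N_SAMPLES = 10_000

def log10_eta_moments(dA_l, d_dt, mu_s, sig_dA, sig_ddt, sig_mu, z_l, z_s):
    """Propagate the mock uncertainties into log10(eta): draw 10,000 samples
    of each distance, combine them through the flat-space inversion of the
    time delay distance, and return the mean and standard deviation."""
    dA = rng.normal(dA_l, sig_dA, N_SAMPLES)
    ddt = rng.normal(d_dt, sig_ddt, N_SAMPLES)
    mu = rng.normal(mu_s, sig_mu, N_SAMPLES)
    # d_A(z_s) from the time delay distance, assuming Omega_k = 0
    dA_s = (1 + z_l) * dA * ddt / ((1 + z_s) * (ddt - (1 + z_l) * dA))
    # eta = d_L / [(1 + z_s)^2 d_A(z_s)], with d_L = 10**(mu/5 - 5) Mpc
    log_eta = np.log10(10 ** (mu / 5 - 5) / ((1 + z_s) ** 2 * dA_s))
    return log_eta.mean(), log_eta.std()
```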
Our choice to construct the catalogue using log10 η(z_i) is motivated by the fact that the distributions of η(z_i) are almost lognormal, and therefore log10 η(z_i) is almost Gaussian distributed around zero, i.e. log10 η(z_i) ≈ N(0, σ_log10 η(z_i)). This allows us to derive constraints from our mock catalogues by employing an MCMC approach with a Gaussian likelihood of the form:

χ^2 = Σ_i [log10 η(z_i) − log10 η_th(z_i)]^2 / σ^2_log10 η(z_i), (11)

where log10 η_th(z_i) is the theoretical value of log10 η(z_i).
Furthermore, the choice of constructing the catalogue for log 10 η(z) is also useful for the application of Gaussian processes that we describe in Section 4.3 below; this approach requires the choice of a mean prior for the reconstructed function, which is usually assumed to be zero in standard applications. The choice of reconstructing log 10 η(z) allows us to keep this assumption without significantly biasing the results.

Methodology and results
In this section we describe the methodology we use in our analysis and our corresponding results. We first use a simple parameterisation of the DDR violation function η(z), forecasting the constraints that can be achieved with realistic (N lens = 20), optimistic (N lens = 100) and futuristic (N lens = 1000) mock datasets. We then focus only on the realistic and optimistic datasets and we apply machine learning approaches, namely Genetic Algorithms (GA) and Gaussian processes (GP), to reconstruct η(z).

Parameterised approach
We first adopt a simple parameterised approach to forecast the constraints achievable on DDR violation with future strongly lensed SNIa observations. We use the parameterisation of Eq. (3), and we assume the function ϵ(z) to be constant, with its value ϵ 0 the free parameter that we want to constrain with our mock dataset.
We build a likelihood module interfaced with the publicly available MCMC sampler Cobaya [53], which compares the prediction

log10 η_th(z) = ϵ_0 log10(1 + z), (12)

with the mock dataset we described in Section 3.1. The improvement brought by strongly lensed SNIa observations to this analysis is evident. In most previous constraints on DDR violations, predictions of both d_L(z) and d_A(z), which enter the definition of η(z) in Eq. (2), were required, as the two observables are compared independently with data (see e.g. [54][55][56]). Such an approach is intrinsically dependent on the assumptions made about the expansion history of the Universe, and in particular on the assumed dark energy model driving the late time accelerated expansion. Here, such an assumption is not necessary, as the distances entering Eq. (6) are obtained at each redshift from a single observation, and therefore there is no need to assume a cosmological model to reconstruct the luminosity and angular diameter distances.
However, it is important to note that we assume that η(z) as defined in Eq. (2) is a valid description of DDR violation, which implies that the Universe is to first approximation homogeneous and isotropic. Finally, for Eq. (6) to hold, we further assume that the contributions to the total energy density by curvature are negligible (Ω k = 0).
For these reasons, the only free parameter in this analysis is ϵ 0 , for which we use a flat prior. The constraints we obtain on this are shown in Table 1 and the posterior distributions in Fig. 1.
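As an illustration of the parameterised fit (not the actual Cobaya pipeline used here), a simple grid evaluation of the posterior for ϵ_0 under a flat prior on a toy mock might look like:

```python
import numpy as np

def chi2(eps0, z, log_eta_data, sigma):
    """Chi-squared of the constant-epsilon model log10(eta_th) = eps0*log10(1+z)."""
    model = eps0 * np.log10(1.0 + z)
    return np.sum(((log_eta_data - model) / sigma) ** 2)

# toy mock: 20 systems with a fiducial eps0 = 0.05 and 0.05 scatter in log10(eta)
rng = np.random.default_rng(1)
z = np.sort(rng.uniform(0.2, 1.8, 20))
sigma = np.full(20, 0.05)
data = 0.05 * np.log10(1 + z) + rng.normal(0, sigma)

# posterior on a grid, with a flat prior on eps0
eps_grid = np.linspace(-1.0, 1.0, 2001)
post = np.exp(np.array([-0.5 * chi2(e, z, data, sigma) for e in eps_grid]))
post /= post.sum()
mean = np.sum(eps_grid * post)
std = np.sqrt(np.sum((eps_grid - mean) ** 2 * post))
print(f"eps0 = {mean:.3f} +/- {std:.3f}")
```

A grid is enough for a single parameter; an MCMC sampler becomes necessary once nuisance parameters are marginalised over.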
We find that the realistic case (N_lens = 20) would achieve the same constraining power as current constraints obtained through the combination of SNIa and BAO observations [55], while the futuristic case (N_lens = 1000) reaches a sensitivity similar to that achievable by the combination of the Euclid BAO survey with the full LSST SNIa survey [55].
The optimistic case (N lens = 100) sits somewhere in the middle, but given the reduced number of assumptions made on the cosmological model in the analysis of strongly lensed SNIa, using this approach could allow DDR violation to be disentangled from other cosmological mechanisms [56].

Genetic algorithms
Here we describe a non-parametric reconstruction of the duality parameter η(z), which is based on a machine learning approach called Genetic Algorithms (GA) and is complementary to the parameterised analysis of the previous section. The GA are a stochastic optimisation approach, loosely inspired by the theory of evolution and mimicking the stochastic operations of crossover, i.e. the merging of different individuals to form descendants, and mutation, a random change in the chromosomes of an individual. This is achieved by emulating natural selection: in a given environment, a population (in our case a set of test functions) will evolve and adapt under the pressure of the operators of crossover and mutation.
In general, the reproductive success of every member of the population is assumed to be proportional to their fitness, which is a measure of how well they fit the data in question. Here we implement a standard χ 2 statistic as described in the previous sections. For more details on the GA and their applications to cosmology see Refs. [57][58][59][60][61][62][63][64].
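As a toy illustration of these ingredients (not the actual GA code used in this paper), a population evolving under tournament selection, averaging crossover and Gaussian mutation, with a χ² fitness and candidates satisfying η(z = 0) = 1 by construction, can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy mock: log10(eta) for a fiducial eps0 = 0.05, with small Gaussian scatter
z = np.linspace(0.2, 1.8, 20)
sigma = 0.005
y = 0.05 * np.log10(1 + z) + rng.normal(0, sigma, z.size)

def chi2(coeffs):
    """Fitness: chi-squared of a polynomial candidate with no constant term,
    so every candidate satisfies the physical prior eta(z = 0) = 1."""
    model = sum(c * z ** (k + 1) for k, c in enumerate(coeffs))
    return np.sum(((y - model) / sigma) ** 2)

pop = rng.normal(0.0, 0.05, (50, 3))  # initial population of cubic candidates
for generation in range(300):
    fitness = np.array([chi2(ind) for ind in pop])
    # tournament selection: for each slot, keep the fitter of two random members
    idx = rng.integers(0, 50, (50, 2))
    winners = np.where(fitness[idx[:, 0]] < fitness[idx[:, 1]], idx[:, 0], idx[:, 1])
    parents = pop[winners]
    # crossover: merge pairs of parents; mutation: small random change
    partners = parents[rng.permutation(50)]
    pop = 0.5 * (parents + partners) + rng.normal(0.0, 0.001, (50, 3))

best = pop[np.argmin([chi2(ind) for ind in pop])]
```

The real analysis evolves symbolic expressions built from a grammar of orthogonal polynomials rather than fixed-order coefficients, but the selection, crossover and mutation logic is analogous.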
A quick overview of the fitting process is as follows. During the initialisation of the code, a set of test functions is formed using a group of orthogonal polynomials, called the grammar. This is a crucial step, as it has been shown that the choice of the grammar may significantly affect the convergence rate of the GA code [57]. Using this initial population, we encode the duality parameter η(z) in every member of the population, and we also require that η(z) satisfies a set of physical priors and initial conditions. In our analysis we remain completely agnostic regarding the DDR deviation mechanism, so we only assume that the duality parameter satisfies η(z = 0) = 1; we make no assumption of a dark energy model.
After preparing the initial population, we estimate the fitness of every member using the χ^2, and then we apply the stochastic operators of crossover and mutation to a subset of the best-fitting functions chosen via tournament selection [57]. We then repeat this process thousands of times, so as to make certain the GA code has converged, and we also use several different random seeds, in order to avoid biasing the run due to a specific random seed. The errors in the reconstruction are calculated using the path integral approach of Refs. [58,60]. In this approach, the error regions are estimated by integrating the likelihood over all functions of the functional space scanned by the GA. This method has been validated by comparing its error estimates against bootstrap Monte Carlo and Fisher matrix errors [58]. Finally, here we use the publicly available Genetic Algorithms code (see footnote 3).

The results of the GA reconstruction can be seen in Fig. 2. In the left column we show the reconstructions for 20 lenses, while in the right column we show the case of 100 lenses. The mocks in the top row were made with ϵ_0 = 0, the ones in the middle row with ϵ_0 = 0.01, and the ones in the bottom row with ϵ_0 = 0.05.
As can be seen, in both the 20 and 100 lens cases, the GA is able to correctly recover, within the errors, the underlying fiducial model, log10 η_fid(z) = ϵ_0 log10(1 + z), shown with a dashed line in each of the panels.
Specifically, we find that in the case of the 20 lenses the GA is able to predict the fiducial model very well across all redshifts, albeit with a small tension at high redshifts (z ≳ 1.4) due to the lack of points. On the other hand, in the case of the 100 lenses the GA reconstruction remains very close to the fiducial model at all redshifts.

Gaussian processes
The classic definition of a Gaussian process (GP) is ''a collection of random variables, any finite number of which have a joint Gaussian distribution'' [65]. A GP can be thought of as a generalisation of a Gaussian probability distribution: whereas a probability distribution describes finite-dimensional random variables, a stochastic process governs the properties of functions. In our case, the function we reconstruct with a GP is log10 η(z), with the redshifts being the input fed to the GP. In general, a GP is completely specified by its mean and covariance functions, though the mean function is usually taken to be zero for the sake of simplicity, and a baseline value of zero is hard-coded into many of the popular GP regression packages. There are many options for the covariance function, or kernel, k(z, z̃). GPs have been applied to reconstruct a wide variety of functions in cosmology (see e.g. [66][67][68][69][70][71][72][73][74]) and there is still some debate over the best choice of kernel, as the choice can strongly influence the resulting GP reconstruction. In this work, we choose to proceed by tailoring the kernel to one supporting a reconstruction that finds an increasing trend in redshift, as this is what we expect the fiducial models to produce.
It was found in [75] that the Matérn class of kernels performed best when reconstructing the equation of state of dark energy, w(z), using SNIa data. This class of kernels takes the following form [65]:

k(z, z̃) = σ_M^2 (2^(1−ν)/Γ(ν)) [√(2ν) d(z, z̃)/ℓ]^ν K_ν(√(2ν) d(z, z̃)/ℓ), (13)

where d(z, z̃) represents the Euclidean distance between the inputs z and z̃, Γ(ν) is the gamma function, K_ν is a modified Bessel function and ν controls the shape of the covariance function, tending to the Gaussian limit as ν → ∞. The hyperparameters ℓ and σ_M correspond to the approximate length scale over which the function varies and the magnitude of those variations, respectively. The choice of a half-integer value for ν is made in order to remove the dependence on the Bessel function [75]. The larger the value of ν, the smoother the resulting GP, although for ν ≥ 7/2 the results become hard to distinguish from one another [65]. Overall, this makes ν = 5/2 a good choice.
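For ν = 5/2 the Bessel function reduces to an elementary expression, which can be cross-checked against scikit-learn's implementation (here with σ_M = 1):

```python
import numpy as np
from sklearn.gaussian_process.kernels import Matern

def matern52(d, ell, sigma_M):
    """Closed form of the nu = 5/2 Matern kernel, for which the Bessel
    function reduces to a polynomial times an exponential."""
    x = np.sqrt(5.0) * d / ell
    return sigma_M ** 2 * (1.0 + x + x ** 2 / 3.0) * np.exp(-x)

# cross-check against scikit-learn on a few redshifts
z = np.array([[0.1], [0.5], [1.2]])
K_sklearn = Matern(length_scale=0.3, nu=2.5)(z)
d = np.abs(z - z.T)  # Euclidean distances between the inputs
assert np.allclose(K_sklearn, matern52(d, 0.3, 1.0))
```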
In the course of our analysis, we found that when a Matérn kernel is used alone, the GP struggles to follow the trend in redshift introduced by the fiducial models with ϵ_0 = 0.01 and ϵ_0 = 0.05. We therefore create a custom kernel that better suits our problem, by adding a dot product kernel to a Matérn (ν = 5/2) kernel. The dot product kernel takes the general form

k(z, z̃) = σ_d + z · z̃, (14)

where the hyperparameter σ_d acts on the dot product kernel in a similar way to how σ_M acts on the Matérn kernel. For the Matérn class of kernels, σ_M rescales the GP covariance, whereas for the dot product kernel, σ_d acts as a constant offset of the covariance of the GP. We note that the dot product kernel is non-stationary, meaning that the resulting GP depends not only on the relative positions of the points, but on their absolute positions. A translation in the input space (i.e. shifting the mock data points in redshift) will therefore result in a different GP prediction from the dot product kernel, even if the kernel hyperparameter is kept fixed [76].
We use the Gaussian process regressor provided by the Python package scikit-learn [77] to perform our reconstruction of log 10 η(z) with the custom kernel described above. The package also allows for optimisation of the value of any hyperparameters in the kernel by maximising the log-likelihood of the GP output.
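A minimal sketch of such a reconstruction on a toy mock (the hyperparameter values and bounds here are illustrative; note that in scikit-learn's DotProduct the parameter sigma_0 enters as sigma_0², playing the role of σ_d in the dot product kernel):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, DotProduct

rng = np.random.default_rng(3)

# toy mock: log10(eta) for 20 lenses with a fiducial eps0 = 0.05
z = np.sort(rng.uniform(0.2, 1.8, 20)).reshape(-1, 1)
sigma = 0.05
y = 0.05 * np.log10(1 + z.ravel()) + rng.normal(0, sigma, 20)

# custom kernel: Matern(nu = 5/2) plus a dot product term
kernel = (Matern(length_scale=0.1, length_scale_bounds=(1e-5, 1e-1), nu=2.5)
          + DotProduct(sigma_0=0.01))
gp = GaussianProcessRegressor(kernel=kernel, alpha=sigma ** 2,
                              n_restarts_optimizer=5)
gp.fit(z, y)

z_pred = np.linspace(0.0, 2.0, 100).reshape(-1, 1)
mean, std = gp.predict(z_pred, return_std=True)
print(gp.kernel_)  # optimised hyperparameters after maximising the log-likelihood
```

Passing the data variance through `alpha` plays the role of the diagonal noise term, and `n_restarts_optimizer` mitigates the multiple-maxima issue discussed below.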
We list the optimised values of ℓ, σ M and σ d in Table 2 to give an idea of the general behaviour of our custom kernel. Note that we do not fix these values by hand in the kernel. The only information we give to the kernel is the upper and lower bound that the optimiser explores between for the value of the length scale ℓ. This choice of bound can have an effect on the resulting reconstruction, as there may be multiple values of the hyperparameters that maximise the log-likelihood. However, the optimisation routine will only be able to find one of the maximal values each time the procedure is run. The bounds can therefore be manually shrunk to eliminate all but one of the maximal values of each of the hyperparameters, forcing the GP to use that particular combination.
The value of the hyperparameter ℓ corresponds to the average variation in the z-direction of the data, and is expected to be of the order of the average distance between the mock data points. Therefore, to select the upper and lower bounds for the length scale in the Matérn kernel, we considered the approximate average distance between the mock data points in the catalogue, roughly 0.08 in redshift in the case of 20 lenses. Since it is squared, we then expect the learned length scale to be of order 10^−3. In the case of 100 lenses, the mock data points are spaced more closely together, leading us to expect a learned length scale of order 10^−4. We therefore set the bounds of the Matérn kernel length scale to 10^−5 and 10^−1 to safely incorporate these expected values.
The value of σ_M instead corresponds to the typical variation in amplitude of the function, which is expected to be of the order of the average error of the data points, i.e. ∼0.05. Finally, the dot product kernel is equivalent to a linear regression in which σ_d is the intercept of the fit. From Eq. (12) it is straightforward to see that σ_d ≈ ϵ_0^2 = O(10^−4). We therefore see that the expected values for σ_d and σ_M fall well within the imposed bounds for the GP hyperparameters. While at first glance this ''recipe'' used to build the kernel appears somewhat naïve, its validity is confirmed by the optimised hyperparameter values reported in Table 2.
The results of the GP reconstruction using the custom kernel are shown in Fig. 3. The left column shows the reconstructions of log 10 η(z) for the realistic case of 20 strongly lensed SNIa, and the right column shows the optimistic case of 100 lenses. The mock data in the top row was created with no deviation from ΛCDM or the standard DDR, i.e. ϵ = 0.0, while the middle row shows the mock data for which ϵ = 0.01 and the bottom row ϵ = 0.05.
In the realistic case of 20 lenses, we see that the relatively small number of points does not prevent the GP from correctly recovering the fiducial model (dashed line in all three panels of Fig. 3) to within 1σ in all three cases.
In the optimistic case of 100 lenses, the error of the GP at high redshift is reduced with respect to the 20 lens case, owing to the additional information provided to the GP by the extra mock data points. For this particular realisation of the mock dataset, however, the reconstruction does not recover the fiducial model quite as well as in the 20 lens case, slightly overestimating the log_10 η(z) function at higher redshifts for all three values of ϵ_0. Even so, the reconstruction never departs from the true fiducial model by more than 1σ.
In all cases we report the χ 2 statistic for the fiducial model and the GP reconstruction in the legend of the plots.
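The χ² statistic quoted in the legends is simply the error-weighted sum of squared residuals between a curve and the mock points. A minimal sketch, using hypothetical data rather than the catalogues of the paper:

```python
import numpy as np

def chi_squared(y_data, sigma, y_model):
    """Chi-squared of a model curve evaluated at the data redshifts,
    against data points with Gaussian errors sigma."""
    return float(np.sum(((y_data - y_model) / sigma) ** 2))

# Hypothetical example: 20 mock points scattered about a fiducial
# log10(eta) = 0 with errors of 0.05 (placeholder values).
rng = np.random.default_rng(0)
sigma = np.full(20, 0.05)
y_data = rng.normal(0.0, sigma)

fiducial = np.zeros(20)  # fiducial model at the data redshifts
chi2_fid = chi_squared(y_data, sigma, fiducial)
```

The same function applied to the GP mean evaluated at the data redshifts gives the reconstruction's χ², allowing the two to be compared directly.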

Conclusions
In this paper we investigated the possibility of using future observations of strongly lensed Type Ia supernovae to constrain deviations from the standard distance duality relation. A departure from the DDR would be a significant smoking gun for deviations from the standard cosmological model, as it would signal the violation of the fundamental assumptions discussed in Section 2.
Such violations are usually investigated in the literature by combining different observations; this allows the luminosity and angular diameter distances to be reconstructed separately and the function η(z), equal to unity in the standard model, to be constrained. In Section 3 we discussed how the observation of strongly lensed SNIa can instead directly provide the two distances at the redshift of the source, and can therefore be used to obtain measurements of η(z) without the need to reconstruct the two distances. Note, however, that such a measurement is possible only under certain assumptions: one must be able to obtain the luminosity distance of the lensed supernova and remove any magnification due to the lens, while the angular diameter distance at the source redshift can be obtained from the time delay distance only by assuming a flat Universe and only if kinematic measurements of the lens galaxy are available.
Other than these assumptions, the use of such observations allows us to obtain our results without any further dependence on the cosmological model, even in the parametric approach that we discuss in Section 4. For this case we find that, as expected, the results strongly depend on the number of systems that will be observed by future surveys: for a realistic number of strongly lensed systems (N_lens = 20) the parameter ϵ_0 can be constrained with the precision achieved by current SNIa and BAO surveys, while the futuristic case of N_lens = 1000 could be competitive with the forecast precision of upcoming LSS and SN surveys.

The results of the Genetic Algorithms reconstruction are shown in Fig. 2. In all cases the GA was able to correctly recover the underlying fiducial model within the errors. In Section 4.3, we presented the results of our Gaussian process reconstruction. We reconstructed log_10 η(z) for the fiducial models ϵ_0 = 0.0, ϵ_0 = 0.01 and ϵ_0 = 0.05 using both 20 lenses and 100 lenses, finding that the GP was well able to correctly recover the underlying fiducial in the mock data.

In summary, we have shown how strongly lensed SNIa will be a powerful probe of distance measures in cosmology in the upcoming LSST era. We have discussed how these systems are uniquely able to provide measurements of both luminosity and angular diameter distances, allowing excellent constraints to be placed on the distance duality relation. If any deviation from this relation were to be detected, it would be an exciting hint at possible new physics easily accessible to other next-generation surveys.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements
This work has received support from the NWO, The Netherlands and the Dutch Ministry of Education, Culture and Science (OCW), The Netherlands, and from the D-ITP consortium, a program of the NWO that is funded by the OCW, The Netherlands. NBH is supported by UK STFC studentship ST/N504245/1. MM has received the support of a fellowship from ''la Caixa'' Foundation (ID 100010434), with fellowship code LCF/BQ/PI19/11690015, and the support of the Spanish Agencia Estatal de Investigación through the grant ''IFT Centro de Excelencia Severo Ochoa SEV-2016-0597''. MM also wants to thank the Big Star Bar for providing a work space and an internet connection during this period of remote work. SN acknowledges support from the research projects PGC2018-094773-B-C32, the Centro de Excelencia Severo Ochoa Program, Spain SEV-2016-059 and the Ramón y Cajal program, Spain, through Grant No. RYC-2014-15843.
In this work we made use of the following Python packages that are not mentioned in the text: GetDist [78], a tool for the analysis of MCMC samples, Matplotlib [79], for the realisation of the plots in the paper, NumPy [80], for numerical linear algebra, and SciPy [81], for numerical sampling of the statistical distributions involved in our data analysis.

Appendix. Details of the mock catalogue creation
In this Appendix we describe in more detail the MCMC-like approach used to construct our mock catalogues of η(z_i) with i = 1 . . . N_lens. As discussed in the main text, the methodology followed to generate our mock catalogues has three distinct steps. We start by constructing the probability distribution function (PDF) of each of the distances involved in the DDR: for a given redshift z_i, we draw random Gaussian deviates, δD(z_i), from a distribution of the form δD(z_i) ∼ N(0, σ_D(z_i)), where D stands for each of the observables in turn.

With the PDFs of d_A, d_∆t and µ in hand, we proceed in an MCMC-like fashion. We assume the PDFs of d_A, d_∆t and µ to be the posteriors of a hypothetical MCMC run with the three distances as independent parameters, so that at each redshift z_i, each triplet {d_A,n, d_∆t,n, µ_n | n = 1 . . . 10⁴} constitutes a sample of an MCMC chain. At each n we therefore combine the triplet values using Eq. (6) to obtain a sample of the posterior of (log_10 η(z_i))_n, i.e. we treat log_10 η(z_i) as a derived parameter of the MCMC. We apply this procedure to all 10,000 samples to construct the distribution of log_10 η(z_i).

From the PDFs of log_10 η(z) we can also perform some sanity checks. First of all, assuming that log_10 η(z) = const, we can multiply the PDFs of all the log_10 η(z_i) to obtain a combined posterior and therefore the mock best fit for log_10 η(z). We show the combined PDFs of log_10 η for two mocks of N_lens = 20, 100, plotted against the combined true PDFs of log_10 η for N_lens = 20, in Fig. A.7. While this best-fit value will not be as accurate as one obtained from a full MCMC sampling, it can signal inconsistency in the mock dataset without the need for a complex analysis. Furthermore, as an additional sanity check, we can construct the χ² distribution by testing 10,000 realisations of a mock against the hypothesis log_10 η(z_i) ∼ N(0, σ_log10 η(z_i)). In Fig. A.8 we show the comparison between the distribution of χ² values for the 20 lens mock dataset and the theoretical χ² distribution for 20 degrees of freedom. We can see that the mock distribution follows the theoretical one extremely well.
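The derived-parameter step can be sketched as follows for a single redshift. All fiducial values and errors here are hypothetical, and for brevity we sample only d_A and µ directly, assuming the standard DDR combination η = d_L / [d_A (1+z)²] with d_L recovered from the distance modulus; in the full procedure d_∆t enters the determination of d_A, and the exact combination is that of Eq. (6):

```python
import numpy as np

rng = np.random.default_rng(1)
n_samp = 10_000
z_i = 0.5

# Hypothetical fiducial values and errors (not taken from the paper)
dA_true, dA_err = 1300.0, 65.0   # angular diameter distance [Mpc]
mu_true, mu_err = 42.27, 0.1     # SNIa distance modulus

# Step 1: draw Gaussian samples of each observable, as in the
# MCMC-like scheme described above.
dA = rng.normal(dA_true, dA_err, n_samp)
mu = rng.normal(mu_true, mu_err, n_samp)

# Step 2: treat log10(eta) as a derived parameter of each sample,
# recovering d_L from the distance modulus mu = 5 log10(d_L/Mpc) + 25.
dL = 10 ** ((mu - 25.0) / 5.0)   # luminosity distance [Mpc]
log10_eta = np.log10(dL) - np.log10(dA) - 2.0 * np.log10(1.0 + z_i)

# Step 3: the sample mean and standard deviation give the mock data
# point and its error at redshift z_i, with no error-propagation formula.
point, err = log10_eta.mean(), log10_eta.std()
```

Because the error is read off the derived-parameter samples themselves, no assumption of linear (Gaussian) error propagation is needed, which is advantage (2) noted below.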
So far, we have found that our mocks generally lie within the 1σ bounds of the true combined PDF, although realisations falling in the upper or lower tails of the mocks' χ² distribution can deviate significantly from the fiducial. In summary, this procedure has two main advantages: (1) it exposes the PDFs of the mock data points, allowing them to be used for sanity checks and, eventually, for a full MCMC sampling similar to that performed in the analysis of the H0LiCOW lenses (see e.g. [23]); and (2) it allows us to reconstruct the errors of the data points directly from their posteriors, removing any assumptions coming from the standard error propagation formula.
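The χ² sanity check against the theoretical distribution can be reproduced schematically as follows; the error value σ = 0.05 is an assumed placeholder, and we use a Kolmogorov-Smirnov test as one convenient way to quantify the agreement:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_real, n_lens = 10_000, 20
sigma = 0.05

# 10,000 realisations of a 20-point mock under the null hypothesis
# log10(eta) ~ N(0, sigma) at every redshift.
mocks = rng.normal(0.0, sigma, size=(n_real, n_lens))
chi2_values = np.sum((mocks / sigma) ** 2, axis=1)

# Compare against the theoretical chi-squared distribution with
# 20 degrees of freedom.
ks_stat, p_value = stats.kstest(chi2_values, stats.chi2(df=n_lens).cdf)
```

Since each mock is drawn exactly from the null hypothesis, the χ² values should track the theoretical 20-dof distribution closely, mirroring the agreement seen in Fig. A.8.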