Cosmological tests with strong gravitational lenses using Gaussian processes

Strong gravitational lenses provide source/lens distance ratios D_obs useful in cosmological tests. Previously, a catalog of 69 such systems was used in a one-on-one comparison between the standard model, ΛCDM, and the R_h = ct universe, which has thus far been favored by the application of model selection tools to many other kinds of data. But in that work, the use of model parametric fits to the observations could not easily distinguish between these two cosmologies, in part due to the limited measurement precision. Here, we instead use recently developed methods based on Gaussian Processes (GP), in which D_obs may be reconstructed directly from the data without assuming any parametric form.
This approach not only smooths out the reconstructed function representing the data, but also reduces the size of the 1σ confidence regions, thereby providing greater power to discern between different models. With the current sample size, we show that analyzing strong lenses with a GP approach can markedly improve the model comparisons, producing probability differences in the range ∼10-30%. These results are still marginal, however, given the relatively small sample. Nonetheless, we conclude that the probability of R_h = ct being the correct cosmology is somewhat higher than that of ΛCDM, with a degree of significance that grows with the number of sources in the subsamples we consider. Future surveys will significantly grow the catalog of strong lenses and will therefore benefit considerably from the GP method we describe here.
In addition, we point out that if the R_h = ct universe is eventually shown to be the correct cosmology, the lack of free parameters in the study of strong lenses should provide a remarkably powerful tool for uncovering the mass structure in lensing galaxies.


Introduction
The degree to which light from high-redshift quasars is deflected by intervening galaxies can be calculated precisely if one has enough information concerning the distribution of mass within the gravitational lens [1,2]. Depending on the mass of the galaxy and the alignment between source, lens, and observer, gravitational lenses may be classified as either macrolensing (with sub-classes of strong and weak lensing) or microlensing systems. Strong lensing occurs when the source, lens, and observer are sufficiently well aligned that the deflected light forms an Einstein ring. From the angle of deflection, one may derive the radius of this ring, from which one may then also compute the angular diameter distance to the lens. This distance, however, is model dependent. Hence, together with the measured redshift of the source, this angular diameter distance may be used to discriminate between various cosmological models (see, e.g., Refs. [3-7]).
In this paper, we use a recent compilation of 118 [8] plus 40 [9] strong lensing systems, with good spectroscopic measurements of the central velocity dispersion based on the Sloan Lens ACS (SLACS) Survey [4,10,11] and the Lenses Structure and Dynamics (LSD) Survey (see, e.g., Refs. [12,13]), to conduct a comparative study between ΛCDM [14,15] and another Friedmann-Robertson-Walker (FRW) cosmology known as the R_h = ct universe [16,17]. Over the past decade, such comparative tests between this alternative model and ΛCDM have been carried out using a wide assortment of data, most of them favouring the former over the latter (for a summary of these tests, see Table 1 in Ref. [18]). These studies have included high-z quasars [19], gamma-ray bursts [15], Type Ia SNe [21,22], and cosmic chronometers [23]. The R_h = ct model is characterized by a total equation of state p = −ρ/3, in terms of the total pressure p and density ρ of the cosmic fluid.
The results of these comparative tests are not yet universally accepted, however, and several counterclaims have been made in recent years. One may loosely group these into four general categories: (1) that the gravitational radius (and therefore also the Hubble radius) R_h is not really physically meaningful [24-26]; (2) that the zero active mass condition ρ + 3p = 0 at the basis of the R_h = ct cosmology is inconsistent with the actual constituents of the cosmic fluid [27]; (3) that the H(z) data favour ΛCDM over R_h = ct [25,28]; and (4) that Type Ia SNe also favour the concordance model over R_h = ct [25,28,29]. These works, and papers published in response to them [17,23,30-33], have generated an important discussion concerning the viability of R_h = ct that we aim to continue here. In Sect. 6 below, we discuss at greater length the need to use truly model-independent data in these tests, basing their analysis on sound statistical practices. Such due diligence is of utmost importance in any serious attempt to compare different cosmologies in an unbiased fashion.
The test most directly relevant to the work reported here was carried out using strong lenses in Ref. [7], which based its comparison on parametric fits derived from the models themselves and concluded that both cosmologies account for the data rather well. The precision of the measurements used in that application, however, was not good enough to favour either model over the other. In this paper, we revisit that sample of strong lensing systems and use an entirely different approach for the comparison, based on Gaussian Processes (GP), to reconstruct the function representing the data non-parametrically. In so doing, the angular diameter distance to the lensing galaxies is determined without pre-assuming any model, providing a better comparison of the competing cosmologies using the functional area minimization statistic described in Sect. 5. An obvious benefit of this approach is that a reconstructed function representing the data may be found regardless of whether or not any of the models being tested is actually the correct cosmology.
In Sect. 2 of this paper, we describe the lensing equation used in cosmological tests, and we then describe the data used in this application in Sect. 3. The Gaussian processes and the cosmological models being tested here are summarized in Sect. 4. The area minimization statistic is introduced in Sect. 5, where we also explain how it is used to obtain the model probabilities. We end with our conclusions in Sect. 6.

Theory of lensing
In work with strong lensing, the observed images are typically fitted using a singular isothermal ellipsoid (SIE) approximation for the lens [34]. The projected mass distribution at redshift z_l is assumed to be elliptical, with semi-major axis θ_2 and semi-minor axis θ_1. Often, an even simpler approximation suffices, and we make use of it in this paper: we use a singular isothermal sphere (SIS) for the lens model, in which the semi-major and semi-minor axes are equal, i.e., θ_1 = θ_2. To provide context for this approach, we first describe the SIE lens model and afterwards restrict it by setting θ_1 = θ_2. The lens equation [35], relating the position β in the source plane to the position θ in the image plane, is given by

β = θ − ∇_θ Φ(θ),   (1)

where Φ is the lensing potential of the SIE, given in Eq. (2) following Ref. [36], with an ellipticity ε related to the eccentricity via Eq. (3). In Eq. (2), θ_E is the Einstein radius, defined as

θ_E = 4π (σ_v/c)² (D_ls/D_s),   (4)

where σ_v is the velocity dispersion within the lens, and D_ls and D_s are the angular diameter distances from lens to source and from observer to source, respectively, entering only through the ratio

D ≡ D_ls/D_s.   (5)

Notice that Eq. (4) is therefore independent of the Hubble constant H_0. Nonetheless, one must still measure σ_v, the total velocity dispersion due to both stellar and dark matter. Obtaining this quantity is challenging because it is not simply the surface-brightness-weighted average line-of-sight velocity dispersion. The velocity dispersion of the SIS (σ_SIS, or σ_v) may be related to the central velocity dispersion σ_0, which is obtained from the stellar velocity dispersion within one-eighth of the effective optical radius (see, e.g., Refs. [4,5]). Though this works quite well for massive elliptical galaxies, which are kinematically indistinguishable from an SIE within one effective radius, σ_SIS and σ_0 are not actually equal. Dark matter is dynamically hotter than the luminous stars, so the velocity dispersion of the former must be greater than that of the latter [37]. Treu et al. [4] studied the homogeneity of early-type galaxies using the large samples of lenses identified by the Sloan Lens ACS Survey (SLACS; [10,38]) and found that f_SIS ≡ σ_SIS/σ_0 = 1.010 ± 0.017 when fitting the geometry of multiple images. Similar results were found in Ref. [39], which examined the ratio of the stellar velocity dispersion to σ_SIS for different anisotropy parameters. The accumulated evidence therefore suggests that f_SIS = 1.01, and this is the value we adopt for this study. Thus, following Ref. [40], we write the Einstein radius as

θ_E = 4π f_SIS² (σ_0/c)² D,   (6)

so that each lens provides a measured distance ratio

D_obs = c² θ_E / (4π f_SIS² σ_0²).   (7)

The data based on Eq. (6) will be used to compare our two cosmological models in this paper. The errors associated with individual measurements of D are calculated from the error-propagation equation, with contributions from θ_E, σ_0, and f_SIS = 1.01 with σ_f = 0.017. We follow Grillo et al. [5] and set σ_θE = 0.05 θ_E and σ_σ0 = 0.05 σ_0. The overall dispersion in D is then expected to be σ_D ∼ 0.11 D_obs.
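The conversion from observables to the distance ratio, and the quoted ∼11% error budget, can be checked numerically. Below is a minimal sketch (function names are ours; θ_E is assumed to be in radians and σ_0 in km/s), inverting Eq. (6) and propagating the fractional errors stated above:

```python
import numpy as np

C_KMS = 299792.458            # speed of light in km/s
F_SIS, SIGMA_F = 1.01, 0.017  # f_SIS and its uncertainty (Treu et al. [4])

def distance_ratio(theta_E_rad, sigma0_kms):
    """Invert Eq. (6): D_obs = c^2 theta_E / (4 pi f_SIS^2 sigma_0^2)."""
    return C_KMS**2 * theta_E_rad / (4.0 * np.pi * F_SIS**2 * sigma0_kms**2)

def distance_ratio_error(D_obs):
    """Propagate the fractional errors adopted in the text:
    sigma_thetaE = 0.05 theta_E, sigma_sigma0 = 0.05 sigma_0, sigma_f = 0.017.
    Since D ~ theta_E * sigma_0^-2 * f_SIS^-2, the fractional variances add as
    (0.05)^2 + (2 * 0.05)^2 + (2 * sigma_f / f_SIS)^2.
    """
    frac = np.sqrt(0.05**2 + (2 * 0.05)**2 + (2 * SIGMA_F / F_SIS)**2)
    return frac * D_obs
```

Evaluating the fractional error gives ≈ 0.117, consistent with the dispersion σ_D ∼ 0.11 D_obs quoted in the text.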

Data
The compilation we use here contains 158 strong lensing systems with excellent spectroscopic measurements of the central velocity dispersion, obtained using the Sloan Lens ACS (SLACS) Survey [4,10,11] and the Lenses Structure and Dynamics (LSD) Survey [12,13]. One can also find some of the original contributions to these datasets in Refs. [41-48]. The velocity dispersions (and their aforementioned ∼5% errors) are obtained from the Sloan Digital Sky Survey (SDSS) database. Given that two distances are involved for each lens-source pairing, the GP method calls for a reconstruction of D(z_l, z_s) in two dimensions. This will only be feasible, however, when the sample is large enough to warrant this full approach. For now, even with 158 strong lensing systems, we are constrained to consider small redshift ranges, effectively reducing the problem to a one-dimensional reconstruction in each sub-division. Because the data are less dispersed in the lens plane, where 0.1 < z_l < 0.5, and scattered much more in the source plane, 0.3 < z_s < 3.0, we carry out the reconstruction within thin redshift shells of sources, turning D_obs(z_l, z_s) into a one-dimensional function of z_l for what is essentially a fixed z_s. To minimize the scatter in source redshifts, we use a bin size no larger than 0.025 and choose those bins that contain at least five data points, allowing us to reconstruct D_obs(z_l, z_s) using GP for each of the selected bins. In our sample of 158 strong-lensing systems, these criteria allow us to assemble five different redshift bins, with anywhere from 5 to 9 lens-source pairs in each. These strong lenses are displayed in Tables 1, 2, 3, 4 and 5. Note that for the purpose of GP reconstruction in one dimension, we assume that all the sources in redshift bin (z_s, z_s + Δz) have the same average redshift z_s + Δz/2.
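The shell-selection procedure just described can be sketched in a few lines (a minimal illustration with hypothetical redshift arrays; the 0.025 bin width and the five-point minimum are the criteria stated above):

```python
import numpy as np

def select_source_shells(z_s, bin_width=0.025, min_count=5):
    """Group lens systems into thin source-redshift shells and keep only
    those shells containing at least `min_count` systems."""
    edges = np.arange(z_s.min(), z_s.max() + bin_width, bin_width)
    shells = []
    for lo in edges:
        hi = lo + bin_width
        idx = np.where((z_s >= lo) & (z_s < hi))[0]
        if len(idx) >= min_count:
            # all sources in the shell are assigned the shell's mean redshift
            shells.append((lo + bin_width / 2, idx))
    return shells
```

Each returned tuple holds the average source redshift z_s + Δz/2 assigned to the shell and the indices of the lens-source pairs it contains.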

Gaussian processes and model comparisons
Adapting the code developed by Seikel et al. [51] for Gaussian Processes in Python, we reconstruct D_obs(z_l, z_s) for each of the sub-samples in Tables 1, 2, 3, 4 and 5, without assuming any model a priori. The GP method shares some of the attributes of a Gaussian distribution, though the former describes a distribution over functions, while the latter describes a random variable. The reconstruction of a function f(x) at x using GP yields a Gaussian random variable with mean μ(x) and variance σ(x). The function reconstructed at x, however, is not independent of that reconstructed at x̃ = x + dx; the two are related by a covariance function k(x, x̃). Although many forms of k are possible, we use one that depends on the distance between x and x̃, i.e., the squared exponential covariance function,

k(x, x̃) = σ_f² exp[−(x − x̃)² / (2Δ²)].   (9)

Note that this function depends on two hyperparameters, σ_f and Δ, where σ_f characterizes the scale of variations in the y-direction and Δ represents the distance in the x-direction over which a significant change occurs. Together, these two hyperparameters characterize the smoothness of the function k; they are trained on the data using a maximum-likelihood procedure, which leads to the reconstructed D_obs(z_l, z_s) function for each source redshift shell centered on z_s. For this paper, we find hyperparameter values of 0.144 and 0.661, respectively. One of the principal features of the GP approach that we highlight in this application to strong lenses concerns the estimation of the 1σ confidence region attached to the reconstructed D_obs(z_l, z_s) curves. The 1σ confidence region depends on the actual errors of the individual data points, σ_Di, on the optimized hyperparameter σ_f (see Eq. 9), and on the product K_* K⁻¹ K_*^T (see Ref. [51]), where K_* is the covariance vector between the point of estimation x_* and the data points x_i, with elements k(x_*, x_i) (Eq. 10), and K is the covariance matrix for the original dataset.
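The reconstruction machinery is compact enough to sketch directly: with K the covariance of the data (kernel plus measurement noise), the posterior mean at the estimation points is K_* K⁻¹ y and the posterior covariance is K_** − K_* K⁻¹ K_*^T. The following is a minimal numpy sketch under simplified assumptions (the hyperparameters are fixed at the trained values quoted above rather than optimized, and the function names are ours; the analysis itself uses the code of Ref. [51]):

```python
import numpy as np

def sq_exp_kernel(x1, x2, sigma_f, delta):
    """Squared exponential covariance function, Eq. (9)."""
    return sigma_f**2 * np.exp(-(x1[:, None] - x2[None, :])**2 / (2.0 * delta**2))

def gp_reconstruct(x, y, yerr, x_star, sigma_f=0.144, delta=0.661):
    """GP posterior mean and 1-sigma band at the points x_star.
    Hyperparameters default to the trained values quoted in the text."""
    K = sq_exp_kernel(x, x, sigma_f, delta) + np.diag(yerr**2)
    K_s = sq_exp_kernel(x_star, x, sigma_f, delta)
    K_ss = sq_exp_kernel(x_star, x_star, sigma_f, delta)
    mean = K_s @ np.linalg.solve(K, y)                    # K_* K^-1 y
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)          # K_** - K_* K^-1 K_*^T
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

Note that the posterior 1σ at a data point is smaller than the measurement error whenever neighboring points lie within ∼√2 Δ of it, which is the effect discussed in the following paragraph.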
Note that the dispersion at point x_i will be less than σ_Di when there is a large correlation between the data at that point of estimation. From Eq. (9) it is clear that the correlation between any two points x and x̃ will be large only when |x − x̃| < √2 Δ. This condition is satisfied quite frequently for the strong lenses used in our study, which results in GP-estimated 1σ confidence regions that are smaller than the errors in the original data. We refer the reader to Ref. [51] for further details.

The principal goal of this paper is to use a GP reconstruction of the D_obs(z_l, z_s) functions in order to compare the predictions of the ΛCDM and R_h = ct cosmological models. The standard model contains radiation (photons and neutrinos), matter (baryonic and dark) and dark energy in the form of a cosmological constant. This blend of constituents, currently dominated by dark energy, is producing a phase of accelerated expansion, following an earlier period of deceleration when radiation and matter were dominant. In terms of today's critical density ρ_c ≡ 3c²H_0²/(8πG) and Hubble constant H_0, the Hubble expansion rate in this cosmology depends on the matter density, Ω_m ≡ ρ_m/ρ_c, the radiation density, Ω_r ≡ ρ_r/ρ_c, and the dark-energy density, Ω_de ≡ ρ_de/ρ_c, with the constraint Ω_m + Ω_r + Ω_de = 1. Since Ω_r is negligible in the current era, we ignore radiation and use Ω_de = 1 − Ω_m. For all the calculations, we use the parameters optimized by Planck, with Ω_m = 0.272 and Ω_de = 0.728. Thus, one deduces from the Friedmann equation that in ΛCDM

H(z) = H_0 [Ω_m (1 + z)³ + Ω_de]^(1/2).   (11)

The angular diameter distance between redshifts z_1 and z_2 in a flat universe is given as

D_A(z_1, z_2) = [c / (1 + z_2)] ∫ from z_1 to z_2 of dz / H(z).   (12)

Therefore, substituting for H(z) from Eq. (11), one gets

D_A^ΛCDM(z_1, z_2) = [c / H_0 (1 + z_2)] ∫ from z_1 to z_2 of dz / [Ω_m (1 + z)³ + Ω_de]^(1/2).   (13)

The R_h = ct universe [16,17,52-54] is also an FRW cosmology with radiation (photons and neutrinos), matter (baryonic and dark) and dark energy, with radiation and dark energy dominating the early Universe, and matter and dark energy dominating the current era [55].
But while it is similar to ΛCDM in this regard, it has an additional constraint on the total equation of state, i.e., ρ + 3p = 0, the so-called zero active mass condition, where ρ and p are the total energy density and pressure, respectively. With this additional constraint, the R_h = ct universe always expands at a constant rate, which depends on only one parameter, the Hubble constant H_0. Using the Friedmann equation with zero active mass, we find that

H(z) = H_0 (1 + z),   (14)

and from Eq. (12), we therefore find that

D_A^(R_h=ct)(z_1, z_2) = [c / H_0 (1 + z_2)] ln[(1 + z_2) / (1 + z_1)].   (15)
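For both expansion histories, the model prediction for the distance ratio D = D_ls/D_s can be evaluated directly; note that H_0 and the (1 + z_s) prefactors cancel in the ratio, so the R_h = ct prediction has no free parameters at all, and ΛCDM retains only Ω_m. A minimal sketch (function names are ours):

```python
import numpy as np
from scipy.integrate import quad

OM_M = 0.272  # matter density adopted in the text; Omega_de = 1 - Omega_m

def comoving_lcdm(z):
    """Dimensionless comoving distance (in units of c/H0) in flat LCDM."""
    integrand = lambda zp: 1.0 / np.sqrt(OM_M * (1 + zp)**3 + (1 - OM_M))
    return quad(integrand, 0.0, z)[0]

def D_lcdm(z_l, z_s):
    """D = D_A(z_l, z_s) / D_A(0, z_s) in LCDM, from Eqs. (12)-(13);
    in a flat universe the (1 + z_s) factors and c/H0 cancel."""
    return (comoving_lcdm(z_s) - comoving_lcdm(z_l)) / comoving_lcdm(z_s)

def D_rhct(z_l, z_s):
    """The same ratio in R_h = ct, using the closed form of Eq. (15)."""
    return np.log((1 + z_s) / (1 + z_l)) / np.log(1 + z_s)
```

Both functions return values in (0, 1) for 0 < z_l < z_s, approaching unity as the lens redshift goes to zero.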

The area minimization statistic
Now that we are dealing with a comparison between continuous functions, i.e., D_obs versus either D_ΛCDM or D_(R_h=ct) (each derived from Eq. 5 using Eqs. 13 and 15), we cannot use discrete sampling statistics, such as weighted least squares, to compare the different models. The reason is that sampling at random points to obtain the squares of the differences between the model and the reconstructed curve would lose information between those points, whose importance cannot be ascertained prior to the sampling. To overcome this deficiency, we introduce a new statistic, based on a previous application [56,57], which we call the "Area Minimization Statistic," to estimate each model's probability of being consistent with the data. Our principal assumption is that the measurement errors are Gaussian, which allows us to generate a mock sample of GP reconstructed curves representing the possible variation of D away from D_obs. We do this via

D_i(z_l, z_s) = D_(i,obs)(z_l, z_s) + r σ_Di,   (16)

where D_(i,obs)(z_l, z_s) are the actual measurements as a function of z_l for each source shell z_s, σ_Di are the actual observed errors, and r is a Gaussian random variable with zero mean and unit variance. Next, these D_i(z_l, z_s) are used together with the errors σ_Di to reconstruct the function D_mock(z_l, z_s) corresponding to each mock sample, and finally we calculate the weighted absolute area difference between D_mock(z_l, z_s) and the GP reconstructed function of the actual data according to

D_A = ∫ from z_min to z_max of |D_mock(z_l, z_s) − D_GP(z_l, z_s)| / σ_D(z_l) dz_l.   (17)

In this expression, z_min and z_max are the minimum and maximum redshifts, respectively, of the data range. We repeat this procedure 10,000 times to build a distribution of frequency versus area differential D_A, and from it construct the cumulative probability distribution. In Fig. 1 we show these quantities for the illustrative source shell 0.50 < z_s < 0.525 (the frequency is shown in the top panel, and the cumulative probability distribution in the bottom).
This procedure generates a 1-to-1 mapping between the value of D A and the frequency with which it arises. With the additional assumption that curves with a smaller D A are a better match to D obs , one can then use the cumulative distribution to estimate the probability that the difference between a model's prediction and the reconstructed curve is merely due to Gaussian randomness. When comparing a model's prediction to the data, we therefore calculate its D A and use our 1-to-1 mapping to determine the probability that its inconsistency with the data is just due to variance, rather than the model being wrong. These are the probabilities we then compare to determine which model is more likely to be correct. This basic concept is common to many kinds of statistical approaches, though none of the existing ones can be used when comparing two continuous curves, as we have here.
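The full procedure, mock generation (Eq. 16), the area statistic (Eq. 17), and the cumulative mapping to a probability, can be sketched as follows. This is a simplified illustration (function names are ours, and the GP reconstruction step is replaced by the observed points themselves to keep the example short; in the analysis proper each mock curve is itself a GP reconstruction):

```python
import numpy as np

rng = np.random.default_rng(42)

def area_statistic(z, curve, ref, sigma):
    """Weighted absolute area difference between two curves (cf. Eq. 17),
    integrated with the trapezoidal rule."""
    w = np.abs(curve - ref) / sigma
    return np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(z))

def model_probability(z_l, D_obs, sigma_D, D_model, n_mock=10000):
    """Probability that a model's area differential from the reconstructed
    curve arises from Gaussian scatter alone."""
    D_ref = D_obs  # stand-in for the GP reconstruction of the real data
    areas = np.empty(n_mock)
    for i in range(n_mock):
        mock = D_obs + rng.standard_normal(len(z_l)) * sigma_D  # Eq. (16)
        areas[i] = area_statistic(z_l, mock, D_ref, sigma_D)
    DA_model = area_statistic(z_l, D_model, D_ref, sigma_D)
    # fraction of mock curves with a larger area differential than the model
    return np.mean(areas >= DA_model)
```

A model that tracks the reconstruction closely yields a probability near unity, while one displaced by many σ from it yields a probability near zero; these are the values compared between ΛCDM and R_h = ct in Table 6.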
The reconstructed curves for our five subsamples are shown in the left-hand panels of Fig. 2. These correspond to the five source redshift shells in Tables 1, 2, 3, 4 and 5. The corresponding cumulative probability distributions are plotted in the right-hand panels, which also locate the D_A values for R_h = ct (yellow) and ΛCDM (red). The probabilities associated with these differential areas are summarized in Table 6. Along with the reconstructed functions, the left-hand panels also show the corresponding 1σ (dark) and 2σ (light) confidence regions provided by the GP, and the theoretical predictions of ΛCDM (dashed) and R_h = ct (dotted). As we highlighted earlier, the functions D_obs(z_l, z_s) have been reconstructed without pre-assuming any parametric form, so in principle they represent the actual variation of D with redshift, regardless of whether or not either of the two models being tested here is the correct cosmology.
The overall impression one gets from the results displayed in Fig. 2 and summarized in Table 6 is that, for every source redshift shell sampled here, the probability of R_h = ct being consistent with the GP reconstructed function D_obs is ∼10-30% higher than that of ΛCDM. Future surveys will greatly enlarge the sample of sources available for this type of analysis, allowing these two models to be differentiated with greater confidence.

Conclusions
In this paper, we have introduced the GP reconstruction approach to strong lensing studies, though clearly the available sample is still not large enough for us to make full use of this method. As noted earlier, one of the principal benefits of this technique is that the function representing the data (in this case D_obs) may be obtained without assuming any parametric form associated with particular models. This allows one to test different models against the actual D_obs, rather than against each other's predictions, neither of which may be a good representation of the measurements. In addition, GP reconstruction provides 1σ and 2σ confidence regions more in line with the population as a whole, rather than with individual data points, greatly restricting the ability of 'incorrect' models to adequately fit the observations on the strength of otherwise large measurement errors. This is reflected in the probabilities quoted in Table 6 for the two models we have examined here. Unlike previous model comparisons based on the use of parametric fits to the strong-lensing data, we now find that R_h = ct is favoured over ΛCDM with consistently higher likelihoods in all five source redshift shells we have assembled for this work. Though these statistics are still quite limited, it is nonetheless telling that the differentiation between models improves as the number of sources within each shell increases. Also, at least for R_h = ct, the probability of its predictions matching the GP reconstructed functions generally increases as the size of the lens sample grows. The outcome of this work underscores the importance of using unbiased data and sound statistical methods when comparing different cosmological models. As a counterexample, consider the use of H(z) measurements based on BAO observations instead of cosmic chronometers [28], which constitutes an unwitting use of model-dependent measurements to test competing models.
Such an approach ignores the significant limitations in all but the three most recent BAO measurements [58,59] for this type of work. Previous applications of the galaxy two-point correlation function to measure the BAO scale were contaminated with redshift distortions associated with internal gravitational effects [59]. To illustrate the significance of these limitations, and the impact of the biased BAO measurements of H (z), note how the model favoured by the data switches from ΛCDM to R h = ct when only the unbiased measurements are used [61].
A second counterexample is provided by the merger of disparate sub-samples of Type Ia SNe to improve the statistical analysis. We have already published an in-depth explanation of the perils associated with the blending of data with different systematics for the purpose of model selection [62], but let us nonetheless consider a brief synopsis here. The Union2.1 catalog [63,64] includes ≈ 580 SN detections, though each sub-sample has its own systematic and intrinsic uncertainties. The conventional approach minimizes an overall χ², while each sub-sample is assigned an intrinsic dispersion to ensure that χ²_dof = 1 [28,29]. Instead, the statistically correct approach would estimate the unknown intrinsic dispersions simultaneously with all the other parameters [62,65]. Quite tellingly, the outcome of the model selection is reversed when one switches from the improper statistical approach to the correct one. To emphasize how critical this reversal is in the case of ΛCDM, one simply needs to compare the outcome of using a merged supersample with that produced by a large, single homogeneous sample, such as the Supernova Legacy Survey sample [66]. Within this context, we highlight the fact that the features of the GP reconstruction approach in the study of strong lenses are promising because, although the use of these systems to measure cosmological parameters has been with us for over a decade (see, e.g., Refs. [4-6,67]), the results of this effort have thus far been less precise than those of other kinds of observations. For the most part, these earlier studies were based on the use of parametric fits to the data, but it is quite evident (e.g., from Figs. 1 and 2 in Ref. [7]) that the scatter in D_obs about the theoretical curves generally increases significantly as D_A(z_l, z_s) → D_A(0, z_s).
That is, measuring D incurs a progressively greater error as the distance to the gravitational lens becomes a smaller fraction of the distance to the quasar source. This has to do with the fact that θ_E changes less for large values of z_s/z_l, so, for a fixed error in the Einstein angle, the measurement of D_obs becomes less precise. As we have demonstrated in this paper, the analysis of strong lensing systems based on a GP reconstruction of D_obs improves our ability to distinguish between different models, albeit by a modest amount given the current sample.
Upcoming survey projects, such as the Dark Energy Survey (DES; [68]), the Large Synoptic Survey Telescope (LSST; [69]), the Joint Dark Energy Mission (JDEM; [70]), and the Square Kilometer Array (SKA; e.g., Ref. [71]), are expected to greatly enlarge the lens sample. The ability of GP reconstruction methods to differentiate between models will increase in tandem with this growth. Several sources of uncertainty still remain, however, including the actual mass distribution within the lens. And since such errors appear to be more restrictive for lens systems with large values of z_s/z_l, a priority for future work should be the identification of strong lenses with small angular diameter distances between the source and lens relative to the distance between the lens and observer.