GRAVITATIONAL LENS MODELING WITH BASIS SETS

Simon Birrer; Adam Amara; Alexandre Refregier

doi:10.1088/0004-637X/813/2/102

1. INTRODUCTION

The standard cosmological model is based on the standard model of particle physics, Einstein's theory of General Relativity, a cosmological constant, cold dark matter, and inflation. The physical origin of the cosmological constant, inflation, and dark matter remains a mystery to date. The predictions of the expansion history of the universe have been probed with high precision and structure formation has been tested from the horizon scale down to about 1 Mpc or even below (e.g., Dawson et al. 2013; Planck-Collaboration & Aghanim 2015). The smallest scale tests come from the Lyα forest (see e.g., Seljak et al. 2006) and strong and weak lensing in anomalous quadrupole lenses (e.g., Inoue et al. 2014). At even smaller scales in the nonlinear regime, there are observational and theoretical challenges in bringing models and data into agreement. This problem occurs predominantly in the number, phase space densities, and density profiles when comparing simulations of dark matter substructure with observations of luminous satellite galaxies in our Milky Way (see, e.g., Kauffmann et al. 1993; Klypin et al. 1999; Moore et al. 1999; Boylan-Kolchin et al. 2011). A potential non-gravitational (i.e., collisional) effect of a dark matter particle may influence structure formation on small scales without having an effect on larger scales. Probing the small-scale structure formation and mass distribution may thus provide information beyond the ΛCDM model.

Strong lensing is a powerful probe to test structure formation on small scales (Metcalf & Madau 2001; Dalal & Kochanek 2002; Yoo et al. 2006; Keeton & Moustakas 2009; Moustakas et al. 2009). Strong lensing is the effect caused by the bending of light by massive foreground over-density (e.g., galaxy, group or cluster) such that multiple images of the same background object appears. This effect is well suited for many astrophysical and cosmological applications (see, e.g., Treu 2010, a review focused on galaxy sized lenses). Strong Lensing was also proposed to detect luminous and dark substructure in the lens (Koopmans 2005; Vegetti & Koopmans 2009). This technique has been successfully applied to data (Vegetti et al. 2010, 2012) where sub-clumps down to about $2\times {10}^{8}\;{M}_{\odot }$ masses have been detected. Substructure also has an effect on the flux ratios in multiple lensed quasar images (see, e.g., Metcalf & Zhao 2002; Kochanek & Dalal 2004; Amara et al. 2006; Metcalf & Amara 2012; Xu et al. 2015). Anomalous flux ratios have thus been reported in the literature. Takahashi & Inoue (2014) pointed out that the anomalous flux ratios measured can be accounted by line-of-sight structure and do not have to necessarily come from structure within the lens. With recent and upcoming large scale surveys new area and depth becomes available to discover strong lens systems. Oguri & Marshall (2010) forecasted thousands of lensed quasar systems from DES and LSST. These data sets will help to constrain the statistical features of the small scale structure imprinted in the strong lensing signal. The increasing number of strong lens systems will in the future need to be analyzed with automated modeling approaches.

The aim of this paper is to describe a lens modeling approach that can be applied to different lens systems without adjusting parameter priors by hand and uses all the information contained in an image to constrain the projected mass density of the lens with a special emphasis on substructure. Our model approach is based on parameterized basis sets in the source surface brightness and lens model. The model framework can handle an adaptive complexity in the source and lens models. In addition to the basis sets, we show the power of modern sampling techniques and we make use of fast computational methods.

The paper is structured as follow. In Section 2 we give an overview of existing lens model techniques and show how they relate to our modeling approach. In Section 3 the source surface brightness and lens potential basis sets on which our model relies on are introduced. Section 4 describes the model fitting procedure and in particular how the source surface brightness reconstruction is done and how we deal with the high number of nonlinear parameters in the lens model. We test our fitting procedure on mock data and on Hubble data of the lens system RX J1131-1231. In Section 6, we study how well we can detect substructure in a lens model without prior information on the mass, slope, and position. This section is followed by a conclusion (Section 7).

2. OVERVIEW OF LENS MODEL TECHNIQUES

Galaxy-size strong lenses have been modeled extensively in the literature (see references in this section). The following aspects have to be modeled in a strong lens system when comparing data and models on the image level:

1.
the lens mass model;
2.
the source surface brightness profile;
3.
the lens surface brightness profile;
4.
the point-spread function (PSF).

Depending on the lens system and instrument, one has to also model dust extinction, external convergence, micro lensing by stars, and other aspects.

Depending on the scientific aim, the main focus is typically more on the source surface brightness reconstruction or on the lens mass reconstruction. In both cases one can, broadly speaking, divide the modeling techniques in two regimes:

(1) Parametric reconstruction: using simple and physically motivated functional forms with a controllable number of parameters (∼10; e.g., Kochanek 1991; Kneib et al. 1996; Keeton 2001; Jullo et al. 2007). A controllable number of parameters implies that one can fully explore the parameter space and convergence to the best-fitting configuration can often be optained.

(2) Pixel based reconstruction: this is most often done with a grid where each pixel is treated as a free parameter. Pixelized source surface brightness inversions have been proposed by e.g., Wallington et al. (1996), Warren & Dye (2003), Treu & Koopmans (2004), Dye & Warren (2005), Koopmans (2005), Brewer & Lewis (2006), Suyu et al. (2006, 2010), Wayth & Webster (2006). These methods often rely on a regularization of the pixel grid when there is not a unique solution. Depending on the regularization procedures, priors and the pixel size, one can come to different reconstructed sources (see e.g., Suyu et al. 2006, 2013). Recently Tagore & Keeton (2014) did an analysis of statistical and systematic uncertainties in pixel-based source reconstructions. Furthermore, these methods are computationally expensive as they rely on large matrix inversions. For the lens mass or its potential, grid based modeling has been applied e.g., by, Blandford et al. (2001), Saha & Williams (2004), Bradač et al. (2005), Koopmans (2005), Saha et al. (2006), Suyu & Blandford (2006), Jee et al. (2007); Vegetti & Koopmans (2009), Suyu et al. (2009), Vegetti et al. (2012), and Coles et al. (2014), and even mesh-free models Merten (2014).

Computational techniques also vary for different modeling approaches. Ray-tracing has generally been used to map extended surface brightness from the source to the image plane. If significant surface brightness variations occur on very small scales, such as for quasars due to their compact size, simple ray-tracing can lead to numerical inaccuracies. One way to model such systems is to approximate quasars as point sources. One then solves the lens equation numerically for the positions in the image plane (recently e.g., Suyu et al. 2013). An alternative to avoid the point source approximation is adaptive mesh refinement (e.g., Metcalf & Amara 2012; Metcalf & Petkova 2014) which changes the ray-tracing refinement scale depending on the local spacial variation of the source at different image positions.

In standard ΛCDM, the self-similarity of dark matter indicates that the same amount of complexity as seen in galaxy clusters must also be present in galaxy-sized strong lens systems. Its effect is much weaker in terms of deflection and magnification, but it must still be present. Ideally, we want a model that is flexible such that it can describe any lens mass and source surface brightness distribution. For this model we need to be able to explore its degeneracies and to converge to the "true" solution to extract the information contained in a strong lens system.

One of the aims of our work is to fill the gap between the parametric and non-parametric models. We do so by choosing basis sets that we treat in a fully parametrized form.

3. CHOICE OF BASIS SETS

In the following sections we describe our choices for basis sets and, in addition, we present how we produce mock data given a set of parameter values.

3.1. Basis for the Source

We make use of shapelets (introduced by Refregier 2003; Refregier & Bacon 2003; Massey & Refregier 2005) in the source surface brightness plane. We implemented the two-dimensional Cartesian shapelets (Equations (1) and (18) in Refregier 2003, or in Appendix A of this work). Independent of this work, Tagore & Jackson (2015) proposed a different method to use shapelets in the source reconstruction. Shapelets form a complete orthonormal basis for an infinite series. Restricting the shapelet basis to order n provides us with a finite basis set that is linked to the scales being modeled. If we wish to model a larger range of spatial scales in the surface brightness profile, we need to use more high order shapelets. The number of basis functions m is related to the restricted order n by $m=(n+1)(n+2)/2.$ The shapelet basis functions allow us to dynamically adapt to a given problem. We can increase the complexity when we need them and reduce it when it is not appropriate. Apart from the order n, one can also set the reference scale β of the basis function. The minimal and maximal scales ( ${l}_{\mathrm{min}},$ ${l}_{\mathrm{max}}$ ) being resolved up to order n with reference scale β are ${l}_{\mathrm{min}}=\beta /\sqrt{n+1}$ and ${l}_{\mathrm{max}}=\beta \sqrt{n+1}.$ The parameter β is a user specified choice. Another choice is the peak position of the shapelet center $({x}_{0},{y}_{0}).$ For any finite order in n, the choice of the center is crucial for the fitting result. A natural choice for $({x}_{0},{y}_{0})$ is the center of the light profile of the source galaxy. In that sense $({x}_{0},{y}_{0})$ must be interpreted as two nonlinear parameters.

3.2. Basis for the Lens

Choosing a realistic basis set for the lens mass distribution is a challenging task as there are many different scales involved, especially when considering low mass sub-clumps. These sub-clumps are very small in scale but are also very dense. Having a basis set which allows a general description of such clumpy halos on different scales typically involves a large number of parameters. Depending on the sub-clump mass limit being considered, there are hundreds or even thousands of sub-clumps expected. A minimal description requires at least information about individual positions, masses, and concentrations. Such a description leads to a degenerate and non-unique lens model (e.g., Keeton 2010; Kneib & Natarajan 2011). For cluster lenses, the typical masses of substructure are several orders of magnitude below the total lens mass, but it is possible to give strong priors on the location of the substructure, namely at the position of the luminous galaxies. For detecting invisible substructure such a prior can not be used. As often called "non-parametric" or "free-form" approach, meaning there are more parameters than data constraints (i.e., deliberately under-constrained) was proposed and implemented by Saha & Williams (2004) and Coles et al. (2014). Using the catalog level image position information and time-delay measurements, there is far less information available than parameters to be constrained. One is able to draw random realizations of lens models that meet all the constraints. Statements about the validity of a specific lens model can only be drawn statistically. Doing a comparison on the image level where about 10³–10⁴ pixels are involved, more information is available to constrain the model.

In our approach we start with a softened power-law elliptical potential (SPEP; e.g., discussed by Barkana 1998). The lensing potential Φ is parameterized as

$\begin{eqnarray}&&{\rm{\Phi }}({x}_{1},{x}_{2})=\displaystyle \frac{2{E}_{{\rm{p}}}^{2}}{{\eta }^{2}}{\left(\displaystyle \frac{{\rho }_{{\rm{p}}}^{2}+{s}_{{\rm{p}}}^{2}}{{E}_{{\rm{p}}}^{2}}\right)}^{\eta /2},\end{eqnarray} \tag{ 1 }$

where

$\begin{eqnarray}&&{\rho }_{{\rm{p}}}^{2}={x}_{1}^{2}+{x}_{2}^{2}/{\mathrm{cos}}^{2}{\beta }_{{\rm{p}}}\end{eqnarray} \tag{ 2 }$

with $\mathrm{cos}{\beta }_{{\rm{p}}}$ being the axis ratio of the potential, η the power-law index, E_p the normalization of the potential, s_p the smoothing length, and ${x}_{\mathrm{1,2}}$ the position rotated such that x₁ is in the direction of the major axis of the potential. For an additional sub-clump, we model them either as a spherical Navarro–Frenk–White profile (Navarro et al. 1997) or a spherical power-law potential (SPP). For both functions, we set the softening length ${s}_{{\rm{p}}}=\mathrm{const}=0\buildrel{\prime\prime}\over{.} 0001$ for computational reasons. In that sense, the softening is virtually zero and is not a free parameter in this work.

Combining the two functions (SPEP and SPP) we get $6+4=10$ nonlinear free and partially degenerated parameters to be fitted. With this parameterization we expect a good overall fit to many different lens systems and perhaps to catch the largest substructure within the lens, visible or invisible. Such tests are shown in Section 5.

In addition, we include two-dimensional Cartesian shapelets (same functional form as for the source in Section 3.1) in the potential. We choose the scale factor β to be the Einstein radius. This allows for perturbations at the global scale of the lens that can not be made with another peaked profile. The first derivatives of the potential (deflection angle) and second order derivatives (convergence and shear) can be computed analytically and can be expressed within the same shapelet basis functions (see Appendix A), thus enabling fast computations.

3.3. Basis for the Lens Light

For the description of the lens light, we use Sérsic profiles (Sérsic 1968). Depending on the lens galaxy, adding multiple Sérsic profiles can lead to better fits (see e.g., Suyu et al. 2013).

3.4. Image Making

Having a parametric description of the source surface brightness, a possible point source, the lens potential, the lens light and the PSF, an image can be generated using the following steps.

1.
Starting in the image plane one evaluates the analytic expression of the deflection angle using grid based ray-tracing. The resolution has to be of the order of (or slightly smaller than) $\beta /\sqrt{n}$ to capture the features in the extended source model.
2.
We then compute the point source image in an iterative ray-shooting procedure starting from the local minimas of the relative distance to the point source of step 1. Corrections for the next proposed ray-shooting position can be made when considering the relative displacement to the point source and the second-order derivatives of the lens potential. The requirement of the precision of the point source position in the image plane of about 1/1000 of the pixel size can be reached within very few iterations.
3.
For the point sources, which appear as PSFs, we normalize the externally estimated PSF to their intrinsic brightness and lens magnification. We do not lose significant computational speed when modeling the PSF further out to the diffraction spikes. For the extended surface brightness, a numerical convolution needs to be made. This can be done either at the pixel or sub-pixel level. This step is the most expensive computational process in the forward image modeling. The process scales roughly linearly with the number of pixels or sub-pixels in the convolution kernel. We use Fast-Fourier-Transforms implemented in a scipy routine in python. Our default kernel size is 15 × 15 pixels.
4.
The lens light is added with analytical Sérsic profiles convolved with the same PSF kernel as the extended source surface brightness.

For the modeling, we do not add noise. When simulating realistic images, we add a Gaussian background noise with mean zero to all pixels and a scaled Poisson noise on the signal (pixel by pixel).

4. MODEL FITTING

For the modeling, we have three questions to answer.

1.
What is the best-fit configuration of the model to match the data of a specific lens system? We want to find the global minima for the ${\chi }^{2}$ value.
2.
What level of complexity is needed to fit the data to a certain level? We want to compare consistency with the data by analyzing the reduced ${\chi }^{2}$ value and compare different model configurations with a Bayes factor analysis.
3.
How well is the model solution determined by the data? We want to sample the parameter space and determine confidence intervals.

As a result, many choices have to be made in the lens modeling. More than 10 parameters in the lens model with nonlinear behavior have to be specified. For a realistic surface brightness description the shapelet order n can be higher than 20, which corresponds to 154 basis' and their corresponding coefficients. Given this level of complexity, even the first question on its own is difficult to address. Once we have a method for addressing the first question, repeating the procedure with different choices of complexity and parameterization will provide an answer to question 2. Question 3 can then be answered with a Bayesian inference method such as a Markov chain Monte Carlo (MCMC) sampling. For this we use the software package CosmoHammer (Akeret et al. 2013), which is based on the emcee method of Goodman & Weare (2010) and its implementation by Foreman-Mackey et al. (2013). The software package allows for massive parallelization in the sampling process. In this section, we focus on question 1. We will describe in detail the methods and procedures we apply to make the algorithm converge to the best-fit lens model configuration. Question 2 and 3 are addressed with examples in Sections 5 and 6.

4.1. Source Surface Brightness Reconstruction

In our method we use a weighted linear least square approach to reconstruct the source surface brightness. This is a standard procedure to minimize the quadratic distance between data and model with weighted error measures. The estimation of the covariance can also be calculated (see Equations (3)–(6) below). The minimization problem has to be linear. Let ${\boldsymbol{y}}$ be the data vector of dimension d. In our system, it contains all the pixel values of the image in the area of interest for a surface brightness reconstruction. Let W be the weight matrix of dimension d × d. In a likelihood interpretation, W is the inverse covariance matrix of the data. Assuming that the pixel errors are uncorrelated, W is a diagonal matrix. Let ${\boldsymbol{\xi }}$ be the parameter vector of dimension m. In our case, ${\boldsymbol{\xi }}$ is the vector of the coefficients of the linear combination of shapelet basis functions. The number of shapelet basis functions m depends on the shapelet order n as described in Section 3.1. Let X be the linear response matrix of the shapelet parameters on the pixel values in the image plane of dimension d × m. The product $X{\boldsymbol{\xi }}$ describes a lensed and convolved surface brightness on the image plane. X can be computed by mapping all m shapelet basis functions from the source to the image plane, convolving, and resizing them separately on the pixel scale. The computational cost of this procedure is linear in the number of basis functions involved and dominates the process for low m.

Figure 1 illustrates how the shapelet basis functions are mapped. The problem of finding the best source configuration ${{\boldsymbol{\xi }}}_{0}$ given the data ${\boldsymbol{y}}$ and the weights W can be posed as a weighted linear least square problem:

$\begin{eqnarray}&&{{\boldsymbol{\xi }}}_{0}={\mathrm{min}}_{\xi }\parallel {W}^{1/2}({\boldsymbol{y}}-X{\boldsymbol{\xi }})\parallel .\end{eqnarray} \tag{ 3 }$

This equation can be written as

$\begin{eqnarray}&&({X}^{\top }{WX}){{\boldsymbol{\xi }}}_{0}={X}^{\top }W{\boldsymbol{y}},\end{eqnarray} \tag{ 4 }$

whose solution is given by

$\begin{eqnarray}&&{{\boldsymbol{\xi }}}_{0}={({X}^{\top }{WX})}^{-1}{X}^{\top }W{\boldsymbol{y}}.\end{eqnarray} \tag{ 5 }$

The covariance matrix of ${\boldsymbol{\xi }},\;$ ${M}^{\xi }$ is therefore given by

$\begin{eqnarray}&&{M}^{\xi }={({X}^{\top }{WX})}^{-1}.\end{eqnarray} \tag{ 6 }$

${M}^{\xi }$ becomes important when marginalizing the probability distribution over ${\boldsymbol{\xi }}.$

**Figure 1.** An illustration of the modeling of the source surface brightness with three different shapelet basis functions. Left panels: shapelet basis function in the source plane. Middle panels: mapped shapelets in the image plane with a SIS lens via ray-tracing. Right panels: PSF convolved image. From top to bottom: Shapelets with $({n}_{1},{n}_{2})=(1,0),(2,1),(3,5).$
Download figure:
Standard image High-resolution image

**Figure 1.** An illustration of the modeling of the source surface brightness with three different shapelet basis functions. Left panels: shapelet basis function in the source plane. Middle panels: mapped shapelets in the image plane with a SIS lens via ray-tracing. Right panels: PSF convolved image. From top to bottom: Shapelets with $({n}_{1},{n}_{2})=(1,0),(2,1),(3,5).$
Download figure:
Standard image High-resolution image

The procedure involves a matrix inversion of dimension m × m. The computational cost and memory allocation of this inversion becomes more significant with larger m. Moreover, the matrix $({X}^{\top }{WX})$ has to be invertible. If not, this method fails to find a solution and regularization is needed. A grid based regularization was introduced by Warren & Dye (2003). Conceptually and computationally, the method of Warren & Dye (2003) and the one presented in this paper differ significantly. The matrix $({X}^{\top }{WX})$ is a dense matrix where as the matrix in grid based regularization can be sparse. A sparse matrix can only be maintained when having a small PSF (e.g., 5 × 5 pixel). We use in our method a default PSF kernel of 15 × 15 pixels and a further extension affects only the FFT-convolution of the lensed shapelet basis functions. Our method is well suited to reconstruct also lensed sources in images with larger PSF's than Hubble Space Telescope (HST) images. But the main gain of our method is in terms of the number of parameters (i.e., the size of the matrix). Well chosen basis sets can allow for a smaller number of parameters compared to grid based methods significantly.

In Figure 2, we take a mock image produced with a chosen ${{\boldsymbol{\xi }}}_{0}$ (incl. point sources) with maximum shapelet order ${n}_{\mathrm{max}}=40$ and added Poisson and Gaussian background noise to the image. We check the reconstruction by computing the relative residuals and their correlation function. We see from Figure 2 that the results are almost consistent with purely uncorrelated noise. Only for very small separations, the correlation is marginally smaller than with noise. This effect highly depends on the signal-to-noise ratio of the shapelets. Since we know the input source surface brightness, we can also check its reconstruction. The error in input versus output in the source has features which represent the scales of the shapelet basis functions. The relative error of the surface brightness is about 10% or less. This reconstruction process with ${n}_{\mathrm{max}}=40$ and 15 × 15 pixel convolution kernel takes about 4 s on a standard personal computer. When reducing the number of shapelet coefficients to ${n}_{\mathrm{max}}=20$ the reconstruction falls below 1 s.

The specific reconstruction depends on the noise realization. By repeating the reconstruction 1000 times with different noise realizations, we find that the reconstruction is stable. In Figure 3 we plot the reduced ${\chi }^{2}$ distribution of the different realizations. We find a mean ${\chi }_{\mathrm{red}}^{2}=1.0015$ with a standard deviation of $\sigma =0.009.$

**Figure 3.** Distribution of the reduced ${\chi }^{2}$ values of 1000 realizations of the image reconstruction process (see Figure 2). Each realization differs in the Gaussian and Poisson noise realization. The mean value of the distribution is ${\chi }_{\mathrm{red}}^{2}=1.0015$ and the spread is $\sigma =0.009.$
Download figure:
Standard image High-resolution image

**Figure 3.** Distribution of the reduced ${\chi }^{2}$ values of 1000 realizations of the image reconstruction process (see Figure 2). Each realization differs in the Gaussian and Poisson noise realization. The mean value of the distribution is ${\chi }_{\mathrm{red}}^{2}=1.0015$ and the spread is $\sigma =0.009.$
Download figure:
Standard image High-resolution image

4.2. Convergence Techniques

In the previous section, we showed that we can linearize all parameters of the source model given a specific lens model, and thus we can express it as a linear minimization problem. The marginalization of the linear parameters can be made analytically (see Section 4.2.3 below). Changes in the lens model, however, have a nonlinear effect on the image. In that sense, we can marginalize over many parameters in our model and are left with about 10–30 nonlinear parameters. To explore this space, we use a Particle Swarm Optimization (PSO) Kennedy et al. (2001) algorithm to find the minimum of the parameter space. The algorithm is described in more detail with an illustration in Appendix C.

Convergence toward the global minimum in parameter space can depend on several factors. It depends on the volume of the parameter space, the number of local minima and the shape of the cost function around the absolute minimum. As we are marginalizing over all the source surface brightness parameters, one can have unexpected behavior of the cost function over the lens parameters. In the following we describe our convergence method, which goes beyond simply applying the PSO algorithm and which is important for the performance of our method.

4.2.1. Parameterization

The sampling in parameter space can be made in any parameterization with a bijective transformation to the originally described form. The parameterization can have a significant impact on the convergence capacity and performance of a specific algorithm. If there are periodic boundaries in a specific parameterization, some algorithms can have difficulties. In our model, this is the case for the parameter of the semimajor axis angle of the elliptical lens potential ${\theta }_{0}$ which is defined in the range $[0,\pi ).$ The model can continuously rotate the axis but the parameter space has to jump from 0 to π, or vise versa. Mapping ${\theta }_{0}$ and the axis ratio $q=\mathrm{cos}\beta$ into ellipticity parameters $({e}_{1},{e}_{2})$ with $f:[0,\pi )\times (0,1]\to (-1,1)\times (-1,1)$ given as

$\begin{eqnarray}&&f\left(\theta ,q\right)=\left(\displaystyle \frac{1-q}{1+q}\mathrm{cos}(2\theta ),\displaystyle \frac{1-q}{1+q}\mathrm{sin}(2\theta )\right)\end{eqnarray} \tag{ 7 }$

provides a continuous link between the lensing potential and the parameter space. Reducing the surface area of boundary conditions in the parameter space can also reduce the number of local minima at the boundary surface. The fewer local minima there are the better one can find the global minimum. Priors on (θ, q), i.e., based on the observed light distribution, must be transformed into priors on $({e}_{1},{e}_{2})$ accordingly. In this work we assign uniform uninformative priors on $({e}_{1},{e}_{2}).$

In general, the particular choice of the parameterization can be crucial. The smoother a change in parameter space reflects a small change in the model output, the better a convergence algorithm can deal with the system. The fewer constraints and boundary surfaces there are in the parameter space, the more general convergence algorithm manage to converge to the global minimum.

4.2.2. Convergence with Additional Constraints

In cases where the source galaxy hosts a quasar that dominates the luminosity, its lensed positions in the image plane can be determined by the data without knowledge of the lens, source position or the extended surface brightness. The feature in the image is very well predicted by the PSF model and dominates the brightness over an extended area in the image. Any proposed lens model that predicts the image positions displaced from the features in the image will be excluded by the data with high significance. The quasar point sources introduce a degeneracy of acceptable solutions within the original parameter space. Knowing about this degeneracy can lead to faster convergence.

When having N bright point source images, there are $2N$ constraints to the system (their positions in the image plane). This reduces the effective dimensionality of the parameter space by $2\;N.$ Lensing has three symmetries imprinted in the positional information: two translations and one rotation. These transformations do not change the lens model apart from its own transformation.

In general, we can use any parameterization ${\theta }_{i}$ of an originally M-dimensional parameter space to dimension $n=M-2N+3$ (with $N\gt =2$ ) if there exists a bijective transformation (an exact one-to-one mapping of the two sets) to the original parameter space with the applied constraints. In the case of four bright images of a quasar, we determine an $(M-5)$ -dimensional parameter space and solve for the source plane position of the quasar and five additional lens model parameter with a nonlinear solver. This reduces the nonlinear parameter space in the PSO process and leads to faster convergence without breaking any degeneracies. The choice of the five lens model parameters is arbitrary as long as the parameters can provide a solution ot the point source mapping. Priors on these parameters have also to be applied in the sampling process.

4.2.3. Likelihood Computation

The likelihood calculation on the image level is the product of the likelihoods of each pixel (see, e.g., Suyu et al. 2013, for a similar approach). We estimate the variance on the intensity at pixel i as

$\begin{eqnarray}&&{\sigma }_{\mathrm{pixel},i}^{2}={\sigma }_{\mathrm{bkgd},i}^{2}+{{fd}}_{\mathrm{model},i},\end{eqnarray} \tag{ 8 }$

where ${\sigma }_{\mathrm{bkgd},i}$ is the background noise estimated form the image, ${d}_{\mathrm{model},i}$ the model prediction at pixel i and f a scaling factor. A pure Poisson noise results in f being the product of exposure time and gain. The likelihood of the data ${d}_{\mathrm{data}}$ with ${N}_{{\rm{d}}}$ image pixels given a model ${d}_{\mathrm{model}}$ with nonlinear lens model parameters ${\boldsymbol{\theta }}$ can then be written as a marginalization over the linear parameters ${\boldsymbol{\xi }},$ the source surface brightness parameters:

$\begin{eqnarray}&&P({d}_{\mathrm{data}}| {\boldsymbol{\theta }})=\int d{\boldsymbol{\xi }}P({d}_{\mathrm{data}}| {\boldsymbol{\theta }},{\boldsymbol{\xi }})P({\boldsymbol{\xi }}),\end{eqnarray} \tag{ 9 }$

where

$\begin{eqnarray}&&P({d}_{\mathrm{data}}| {\boldsymbol{\theta }},{\boldsymbol{\xi }})=\displaystyle \frac{1}{{Z}_{{\rm{d}}}}\mathrm{exp}\displaystyle \sum _{i=1}^{{N}_{{\rm{d}}}}\left[-\displaystyle \frac{{({d}_{\mathrm{data},i}-{d}_{\mathrm{model},i})}^{2}}{2{\sigma }_{\mathrm{pixel},i}^{2}}\right],\end{eqnarray} \tag{ 10 }$

with ${{\boldsymbol{d}}}_{\mathrm{model}}={\boldsymbol{X}}({\boldsymbol{\theta }}){\boldsymbol{\xi }}.$ ${Z}_{{\rm{d}}}$ is the normalization

$\begin{eqnarray}&&{Z}_{{\rm{d}}}={(2\pi )}^{{N}_{{\rm{d}}}/2}\displaystyle \prod _{i}^{{N}_{{\rm{d}}}}{\sigma }_{\mathrm{pixel},{\rm{i}}}\end{eqnarray} \tag{ 11 }$

and $P({\boldsymbol{\xi }})$ the prior distribution of the shapelet coefficients. We assume a uniform prior distribution which is independent of the lens model. The integral in Equation (9) can be computed around the maximum ${{\boldsymbol{\xi }}}_{0}$ coming from Equation (5) with covariance matrix ${M}^{\xi }$ from Equation (6). With a second-order Taylor expansion around ${{\boldsymbol{\xi }}}_{0},$ Equation (9) can be written as

$\begin{eqnarray}&&P({d}_{\mathrm{data}}| {\boldsymbol{\theta }},{{\boldsymbol{\xi }}}_{0}+{\rm{\Delta }}{\boldsymbol{\xi }})\approx P({d}_{\mathrm{data}}| {\boldsymbol{\theta }},{{\boldsymbol{\xi }}}_{0})\cdot {e}^{-\frac{1}{2}{\boldsymbol{\Delta }}{\xi }^{T}{({M}^{\xi })}^{-1}{\boldsymbol{\Delta }}\xi }.\end{eqnarray} \tag{ 12 }$

Integrating Equation (12) over ${\rm{\Delta }}{\boldsymbol{\xi }}$ results in

$\begin{eqnarray}&&P({d}_{\mathrm{data}}| {\boldsymbol{\theta }})=P({d}_{\mathrm{data}}| {\boldsymbol{\theta }},{{\boldsymbol{\xi }}}_{0}){\left[{(2\pi )}^{m}\mathrm{det}({M}^{\beta })\right]}^{\frac{1}{2}}.\end{eqnarray} \tag{ 13 }$

In principle, Equation (10) is the cost function to use for image comparison. The information about the image positions is included in this cost function. The problem with this cost function is that convergence to a good model can be difficult. The use of additional or derived information, such as the explicit image positions, can facilitate convergence.

4.2.4. Steps Toward Convergence

Having presented our model parameterization in Section 3 and discussed certain aspects of model fitting and convergence in the previous paragraphs, we describe our steps which allows us to find a reasonable fit to the data. Figure 4 illustrates the framework. Prior to the convergence algorithm, the image data has to be analyzed, the model configuration has to be chosen, the prior values have to be set and the specific configuration of the PSO process has to be given as an input. The fitting should be done within an automated process where no interaction of the modeler is needed. The output of the PSO run can then be analyzed by the modeler in terms of convergence and quality of fitting. This may lead to a change in the model parameters, functions, configuration etc. and the process is run again. Once convergence is achieved and the fitting result is good, the MCMC process is run to map out the valid parameter space given the model parameters chosen. Figure 5 illustrates the PSO and MCMC process in a nine-dimensional parameter space. The thin blue lines corresponds to the PSO process. Once this process is converged we start the MCMC process around this position (light and dark gray contours). Certain parameters are more degenerate than others. We try to map the parameter space such that remaining degeneracies are controllable.

**Figure 4.** Chart of the framework highlighting user interactions. Human interactions are needed for some tasks (green) and decisions (blue). Automated tasks are shown in gray. The core of this framework is to clearly split the preprocessing from the fitting algorithm.
Download figure:
Standard image High-resolution image

**Figure 5.** Illustration of a combined PSO and MCMC chain in a nine-dimensional nonlinear parameter space. The blue lines connect the best-fit particle during the PSO process. The red lines mark the true input parameter. Dark (light) gray contours mark the 68%-CL (95%-CL) interval estimated from the MCMC process.
Download figure:
Standard image High-resolution image

5. EXAMPLE—RX J1131-1231

In the following, we test our method on the gravitational lens RX J1131-1231. This lens was discovered by Sluse et al. (2003) and the redshifts of the lens z_l = 0.295 and background quasar source z_s = 0.658 were determined spectroscopically. The lens was extensively modeled by Claeskens et al. (2006), Brewer & Lewis (2008), and Suyu et al. (2013). We use the same archival HST ACS image in filter F814W (GO 9744; PI: Kochanek) as Suyu et al. (2013) for our lens modeling and follow a similar procedure for the reduction process and error estimation. We make use of the MultiDrizzle product from the HST archive. The PSF is estimated from the stacking of nearby stars. We estimate a PSF model error by computing the variance in each pixel from the different stars after a sub-pixel alignment with an interpolation done using all of the stars. We assume that this model error is proportional to the intensity of the point source. This method is meant to demonstrate our method in fitting the best configuration. The lens model is parameterized as an SPEP (ellipsoid) and a second SPP (round) profile (see Equation (1)). Furthermore, we choose 15 shapelet basis sets in the potential and a constant external shear component. For the lens light we follow Suyu et al. (2013) and use two elliptical Sérsic profiles with common centroids and position angles to describe the main lens galaxy and a circular Sérsic profile with ${n}_{\mathrm{sersic}}=1$ for the small companion galaxy.

Figure 6 shows our result of the fitting process to the HST image. In the upper left panel we show the reduced data. Upper middle shows the best-fit model. On the upper right the normalized residuals are plotted. The reduced ${\chi }^{2}$ value of this fit is ${\chi }_{\mathrm{red}}^{2}=1.5$ without adjusting any Poisson factors nor the background noise level originally derived from the image data products. We clearly see that there are significant residuals around the point sources which indicates clearly that our PSF model needs further improvement and that even our error model on the PSF seems to underestimate the model error in certain regions. Furthermore, extended regions of over- or under-fitting indicate that the lens model can be improved. Source surface brightness adoptions could have acted to reduce the error in the fit in case of a perfect lens model. The lower left panel shows the reconstructed extended source surface brightness profile. We clearly see the presumably star forming clumps which lead to the features in the extended Einstein ring. In the lower middle panel of Figure 6 our lens model is shown in terms of the convergence map. We notice that without mass-to-light priors, the position of the two modeled clumps is strikingly close to the position of the luminous galaxy and its companion. In the lower right panel, the magnification map is shown. The reconstruction of the image depends on the number of shapelet coefficients used. In Appendix B we discuss the effect of n_max on the quality of the source and image reconstruction for this particular lens system.

Comparing different lens and source model reconstructions from different methods is difficult. Different source surface brightness reconstruction techniques use different numbers of parameters and thus can have different ${\chi }^{2}$ values without changing the lens model. Error models and masking do have a significant impact on ${\chi }^{2}.$ Setting priors may also lead to different results (in the case of Suyu et al. 2013, position of the sub-clump modeled as a singular isothermal sphere was fixed at the position of the luminous companion and additional information from velocity dispersion measurements). All in all, different lens modeling techniques can only be properly compared based on mock data. And even on mock data, different input types of lens and source models might have a significant influence on the relative performance of the methods.

6. DETECTABILITY OF SUBSTRUCTURE

One of our main focuses for the model we present is to find and quantify substructure within a lens. In this section, we want to discuss the following issues.

1.
To what extend are our model basis functions and description able to reproduce the true image?
2.
In case of a perfect modeling: are we able to recover the true parameter configuration in a large parameter space?
3.
In case of an imperfect modeling: how does this affect the sensitivity limit, finding and quantification of substructures?

To answer our first question, we refer to our data example of RX J1131-1231 in panel 5 of Figure 6. Even though the observed and predicted images can hardly be distinguish by eye, the residual map indicates room for improvements in our modeling. Nevertheless the fact that our mass-to-light prior-free lens model provides us with a realistic solution might indicate that we are not far from reality. A priori, we do not know whether the solution found in Section 5 is the global minimum of the parameter space chosen and therefore the best reachable solution within the choices and parameters made. We will investigate whether the finder algorithm is able to recover the true input parameters when fitting mock images in the next section.

6.1. Substructure Finding

To approach question 2 above, we take a mock image that is highly inspired by RX J1131-1231 of Section 5. We keep the image quality fixed (i.e., noise levels, pixel size and PSF) but change the lens model such that we have one big SPEP profile and a minor sub-clump, a SPP. Ideally, we do not want to set any prior on the position, mass, shape and number of substructures. If we were interested in luminous sub-structure, then we could add mass-to-light priors. As we want to use our method to potentially detect dark sub-structure, we are not allowed to give any mass-to-light prior. Therefore we want to check whether our algorithm finds the preferential parameter space in the model. The main focus is on the position of the sub-clump. To explore our capability of finding sub-clumps, we generate mock data with a sub-clump in the lens model at a random position. We add Poisson and Gaussian noise on the mock image. We then run the convergence method on that image with the same weak prior information as was done for the real image in Section 5. We repeat this procedure 10 times. Our result is as follows.

1.
Success rate in position of 100%. For our setting with a random sampling of the prior parameter space, all the runs ended around the right solution (PSO).
2.
Detectability down to 10⁻⁴ level of the total lens mass in the arc of the Einstein ring (MCMC).
3.
Time for convergence of about 10⁵ evaluations of a model configuration needed. One evaluation takes a few seconds.

For one realization of the input–output process the comparison is shown in Figure 7 in terms of convergence and magnification and their residuals. We clearly see that the position of the sub-clump can be well recovered and the appearance of the critical line do match very well. This means that there is no other degenerate solution within the parameter space that can reproduce a similar feature like a sub-clump no matter what combination of source surface profile and lens model we chose. This test shows that with an ideal model we can find a single sub-clump in the lens mass without any prior on its existence and position. We also highlight that the relative likelihood comparison is large (more than 5σ compared with the best-fit model without a sub-clump). This statement in this form can only be made if other effects (such as error in the PSF model) do not interfere. As we showed for the real image in Section 5, errors in the PSF model alone can potentially lead to a higher increase in the minimal ${\chi }^{2}.$ This test, together with the discovery of a sub-clump in RX J1131-1231, in both cases without setting priors on position, mass-to-light, concentration, and mass, are encouraging hints that our model approach can extract valuable information about a lens system in a rather unbiased way.

**Figure 7.** Lens model input–output comparison for convergence and magnification log(abs(magnification)). Without priors on the size or position of the sub-clump of the order of ${10}^{8}\;{M}_{\odot },$ the PSO can find the lens configuration. The input–output comparison of the image is illustrated in Figure 2. The sensitivity for the sub-clump detection is analyzed in Figure 5. We clearly see that the we are sensitive to the position up to 01.
Download figure:
Standard image High-resolution image

6.2. Substructure Sensitivity

The last section discussed the potential to recover sub-structure. We showed that this is possible for substructure with mass ratios of ${M}_{\mathrm{sub}}\sim {10}^{-4}{M}_{\mathrm{lens}}$ , consistent with the direct detection of a sub-clump in Vegetti et al. (2012). This implies that we are sensitive to this mass regime when fitting a mass-concentration relation. The concentration of the sub-clump is not exactly matched when comparing with the most likely solution of the PSO in Figure 7. To effectively see how well we can constrain certain parameters in the model from the data, we do a full likelihood analysis with a MCMC. The mapping of the entire parameter space of this realization with an MCMC is illustrated in Figure 5 for exactly the same realization as Figure 7. The red lines mark the input parameters. We see that the input parameters are always within 2σ of the output parameter distribution. We see that there is a partial degeneracy between mass and concentration of the sub-clump (γ (clump) and log ${\theta }_{E}$ (clump) in Figure 5. Not surprising, it is difficult to constrain the profile of a very small clump which itself is close to the detection limit. Even better data than that from HST, such as from James Webb Space Telescope or ALMA, can potentially detect clumps down to lower mass levels and also constrain the profiles of these small clumps. The sensitivity limit relies mostly on three criteria: FWHM of PSF, magnification at the position of the sub-clump, and source surface brightness variation at the lensed position of the sub-clump. For any different data qualities, telescopes, etc., we are able to perform such sensitivity tests.

7. CONCLUSION

In this paper, we introduced a new strong lensing modeling framework which is based on versatile basis sets for the surface brightness and lens model. We identified the following aspects of our framework.

1.
Its modular design allows for a step-by-step increase in complexity. We are able to determine which part of the modeling needs more complexity to reproduce a lens system.
2.
It allows for automated or semi-automated fitting procedures. An adaptive cost function combined with a best-fit algorithm, allows it to fit different strong lens systems without giving specific priors to each one of them. This allows for faster and more systematic analyses of large numbers of lens systems.
3.
It is suitable for a wide range of strong lensing systems and observing conditions. Our framework can be applied to various levels of image quality, different type of lens system, sizes of the lens, etc., as the convergence algorithm does not rely on strong initial priors.
4.
It features fast source reconstruction techniques. The evaluation of the cost function given a position in parameter space including the simulation of the image can be achieved within seconds. Furthermore our convergence algorithm allows for massive parallelization on a distributed computer architecture.

We further proposed a way to model strong lens systems to extract information about the substructure content within the lens. Such investigations can potentially provide useful constraints on the abundances of low mass objects. To learn about the dark matter properties from strong lensing, one needs to combine well chosen descriptions for the source light and lens mass, algorithm techniques which can find solution in high dimension parameter spaces and a combination of different data sets to break degeneracies (multi-band imaging, spectroscopy, etc.). A special focus has to be made in choosing the right set of basis functions and the algorithmic design of the convergence method. Our approach is encouraging as it succeeds in recovering substructure in the lens without setting mass-to-light priors.

S.B. thanks Sebastian Seehars, Claudio Bruderer, Andrina Nicola, Frederic Courbin, Sherry Suyu, and Kevin Fusshoeller for helpful comments and useful discussions and Joel Akeret for valuable input on the software design and algorithms. We also want to thank the anonymous referee who helped improving this manuscript. We acknowledge the import, partial use, or inspiration of the following python packages: CosmoHammer (Akeret et al. 2013), Ufig (Bergé et al. 2013), Hope (Akeret et al. 2015), astropy http://www.astropy.org/, triangle, numpy, scipy. This work has been supported by the Swiss National Science Foundation (grant number 200021_143906 and 200021_14944).

APPENDIX A: SHAPELETS

The two-dimensional Cartesian shapelets as described in Refregier (2003) are the multiplication of two one-dimensional shapelets $\phi (x):$

$\begin{eqnarray}&&{\phi }_{{\boldsymbol{n}}}({\boldsymbol{x}})\equiv {\phi }_{{n}_{1}}({x}_{1}){\phi }_{{n}_{2}}({x}_{2}).\end{eqnarray} \tag{ 14 }$

The one-dimensional Cartesian shapelet is given by

$\begin{eqnarray}&&{\phi }_{n}(x)\equiv {\left[{2}^{n}{\pi }^{\frac{1}{2}}n!\right]}^{-\frac{1}{2}}{H}_{n}(x){e}^{-\frac{{x}^{2}}{2}},\end{eqnarray} \tag{ 15 }$

where n is a non-negative integer and H_n the Hermite polynomial of the order of n. The dimensional basis function is described as

$\begin{eqnarray}&&{\phi }_{n}(x;\beta )\equiv {\beta }^{-\frac{1}{2}}\phi ({\beta }^{-1}x).\end{eqnarray} \tag{ 16 }$

In two dimensions, we write

$\begin{eqnarray}&&{{\boldsymbol{B}}}_{{\boldsymbol{n}}}(x;\beta )\equiv {\beta }^{-1}{\phi }_{{\boldsymbol{n}}}({\beta }^{-1}{\boldsymbol{x}}).\end{eqnarray} \tag{ 17 }$

This set of basis functions is used for the source surface brightness modeling and the lensing potential. To find the derivatives of these functions, Refregier (2003) introduced raising and lowering operators which act on the basis functions as

$\begin{eqnarray}&&\hat{a}{\phi }_{n}=\sqrt{n}{\phi }_{n-1},{\hat{a}}^{\dagger }{\phi }_{n}=\sqrt{n+1}{\phi }_{n+1}.\end{eqnarray} \tag{ 18 }$

The derivative operator can be written as

$\begin{eqnarray}&&\displaystyle \frac{d}{{dx}}=\displaystyle \frac{1}{\sqrt{2}}\left(\hat{a}-{\hat{a}}^{\dagger }\right)\end{eqnarray} \tag{ 19 }$

and therefore any derivative can be written as a superposition of two other shapelet basis functions (for further discussions, see Refregier 2003). In Figure 8, it is illustrated how shapelet basis functions in the potential space do map in the deflection angle and convergence.

**Figure 8.** Shapelet functions in potential space are plotted in the first row. From top to bottom: (0, 0), (1, 0), (0, 1), (0, 1) + (3, 0). The second and third rows show the deflection angles ${\alpha }_{1}$ and ${\alpha }_{2}.$ The last row shows the corresponding convergence κ.
Download figure:
Standard image High-resolution image

**Figure 8.** Shapelet functions in potential space are plotted in the first row. From top to bottom: (0, 0), (1, 0), (0, 1), (0, 1) + (3, 0). The second and third rows show the deflection angles ${\alpha }_{1}$ and ${\alpha }_{2}.$ The last row shows the corresponding convergence κ.
Download figure:
Standard image High-resolution image

APPENDIX B: NUMBER OF SHAPELET COEFFICIENTS

The choice of the maximal order of the shapelet coefficients n_max and its corresponding number $m\;=({n}_{\mathrm{max}}+1)\cdot ({n}_{\mathrm{max}}+2)/2$ has a significant influence in the goodness of fit to imaging data. In Figure 9 we illustrate this by reconstructing the source surface brightness with different n_max. We use the same lens model as in Figure 6. We see that even with ${n}_{\mathrm{max}}=10,$ most of the features in the arcs of the image could be reconstructed qualitatively but significant residuals remain. By increasing n_max, more and more details in the source appear and the residuals go down.

**Figure 9.** Source surface brightness reconstruction of the lens system RX J1131-1231 is modeled with different shapelet orders n_max. Upper panel: the reconstructed image. Middle panel: the normalized residual maps. Lower panel: the reconstructed source. From left to right: increasing number of shapelet order n_max from 10 to 50.
Download figure:
Standard image High-resolution image

APPENDIX C: PARTICLE SWARM OPTIMIZATION

The PSO description was introduced by Kennedy et al. (2001) as a method to find the global minima in a high dimensional nonlinear distribution. The algorithm is motivated by the physical picture of a swarm of particles moving in a physical potential. Every particle gets assigned a position in parameter space, a function evaluation (the log likelihood value) and a velocity in parameter space. The particles is assigned a swarm "physical" behavior when moving up or downwards a potential and a "swarm" behavior when redirecting their velocity toward the particle at the deepest place of the potential. The PSO process is illustrated in Figure 10 in a 20-dimensional parameter space. The implementation of the PSO algorithm used in this work is publicly available as part of the CosmoHammer Akeret et al. (2013) software package. The inertia weight strategy comes from Bansal et al. (2011) and the stopping criteria of Zielinski & Laur (2008, pp. 111–138) was implemented.

**Figure 10.** Illustration of the PSO process in 20 dimensions with 160 particles and 200 iterations. Left panel: evolution of the log likelihood of the best-fit particle. Middle panel: the difference of the parameter values from the best-fit at each iteration relative to the end point of the PSO process. Right panel: velocity of the best-fit particle at each iteration. Different colors are used for each of the parameters.
Download figure:
Standard image High-resolution image

GRAVITATIONAL LENS MODELING WITH BASIS SETS

Article metrics

Permissions

Author affiliations

ORCID iDs

Dates

ABSTRACT

1. INTRODUCTION

2. OVERVIEW OF LENS MODEL TECHNIQUES

3. CHOICE OF BASIS SETS

3.1. Basis for the Source

3.2. Basis for the Lens

3.3. Basis for the Lens Light

3.4. Image Making

4. MODEL FITTING

4.1. Source Surface Brightness Reconstruction

4.2. Convergence Techniques

4.2.1. Parameterization

4.2.2. Convergence with Additional Constraints

4.2.3. Likelihood Computation

4.2.4. Steps Toward Convergence

5. EXAMPLE—RX J1131-1231

6. DETECTABILITY OF SUBSTRUCTURE

6.1. Substructure Finding

6.2. Substructure Sensitivity

7. CONCLUSION

APPENDIX A: SHAPELETS

APPENDIX B: NUMBER OF SHAPELET COEFFICIENTS

APPENDIX C: PARTICLE SWARM OPTIMIZATION

GRAVITATIONAL LENS MODELING WITH BASIS SETS

Article metrics

Permissions

Share this article

Author affiliations

ORCID iDs

Dates

ABSTRACT

1. INTRODUCTION

2. OVERVIEW OF LENS MODEL TECHNIQUES

3. CHOICE OF BASIS SETS

3.1. Basis for the Source

3.2. Basis for the Lens

3.3. Basis for the Lens Light

3.4. Image Making

4. MODEL FITTING

4.1. Source Surface Brightness Reconstruction

4.2. Convergence Techniques

4.2.1. Parameterization

4.2.2. Convergence with Additional Constraints

4.2.3. Likelihood Computation

4.2.4. Steps Toward Convergence

5. EXAMPLE—RX J1131-1231

6. DETECTABILITY OF SUBSTRUCTURE

6.1. Substructure Finding

6.2. Substructure Sensitivity

7. CONCLUSION

APPENDIX A: SHAPELETS

APPENDIX B: NUMBER OF SHAPELET COEFFICIENTS

APPENDIX C: PARTICLE SWARM OPTIMIZATION