On the detection and precise localisation of merging black holes events through strong gravitational lensing

To unlock the full spectrum of astrophysical and cosmological applications of gravitational-wave detections, it is essential to localise the associated black-hole mergers to high precision inside their host galaxies. One possible method to achieve this is to compare the properties of multiple detections of gravitationally-lensed binary black-hole merger events with the properties of strong gravitational lens systems located in the joint sky localisation of the gravitational-wave detections. In this work, we simulate the population of binary black-hole mergers lensed by galaxy-scale lenses and detectable by LIGO-Virgo-Kagra in the coming decade and the population of galaxy-scale strong lenses that will be detected by Euclid. We use these simulations to investigate the prospects for localising strongly lensed binary black-hole mergers inside the lensed galaxies of 'Euclid-like' galaxy-scale strong lenses. We find that for 20-50% of strongly lensed gravitational wave events the lens system is detectable with Euclid, if the event falls in its survey footprint. Of these, we expect to correctly identify the strongly-lensed host galaxy as likely (with posterior probability) host galaxy - based on Bayesian evidence ranking of candidate hosts - for 34.6-21.9% of quadruply-lensed gravitational-wave events when given an a-priori 1-5 deg^2 gravitational-wave-only sky localisation. For triply and doubly lensed gravitational-wave events, this becomes 29.8-14.9% and 16.4-6.6% respectively. If successfully identified, however, the localisation can be better than a fraction of the host-galaxy size, i.e. of order milli-arcseconds. A first detection in the coming decade, however, probably requires dedicated deep and high-resolution follow-ups and continued upgrades in the current and planned gravitational-wave detectors.


INTRODUCTION
The LIGO Collaboration and Virgo Collaborations have thus far identified 90 gravitational-wave candidates (Abbott et al. 2019(Abbott et al. , 2021a;; The LIGO Scientific Collaboration et al. 2021).Independent analyses of the public data identified additional candidates (Magee et al. 2019;Nitz et al. 2019Nitz et al. , 2020Nitz et al. , 2021b;;Venumadhav et al. 2019Venumadhav et al. , 2020;;Zackay et al. 2019a,b).Further upgrades to LIGO, Virgo, and KAGRA are planned for the upcoming observing runs, and a fifth detector is to be built in India (Harry 2010;Somiya 2012;Iyer et al. 2011;Aso et al. 2013;Acernese et al. 2015;Aasi et al. 2015;Akutsu et al. 2020;Abbott et al. 2020b;Saleem et al. 2022).Several new gravitationalwave avenues could become observationally feasible as the detector network improves and expands.One such avenue is the gravitational lensing of gravitational waves.
In particular, based on the predictions of the expected GW sources and lenses, several independent studies have forecast gravitationalwave lensing observations in the coming years as the detectors are ★ E-mail: ewoudwempe@gmail.comupgraded (Ng et al. 2018;Li et al. 2018;Haris et al. 2018;Oguri 2018a;Smith et al. 2019a;Xu et al. 2021;Wierda et al. 2021) 1 .Indeed, although the rate of lensing can be subject to uncertainty if the mergers do not directly trace the star-formation rate density (Mukherjee et al. 2021), the forecasts are promising.
These improvements in terms of instruments and methodology open up a unique possibility of a new "multimessenger"2 channel: Study of gravitational lens systems using both gravitational-wave and electromagnetic information.Such an analysis might be possible if the strong lens galaxy that produces the lensed gravitational wave is localised.The localisation in this context could be performed using an "archival" search, by matching the gravitational-wave image properties with electromagnetic lenses: Only the lenses that can produce the observed gravitational-wave properties would be candidate hosts for the binary black hole (BBH).Indeed, the multimessenger analysis here refers to the combined analysis of a lens system with multiple messenger signals, as opposed to an analysis using electromagnetic counterparts, such as kilonovae.The search can be done with merging compact objects that do not emit detectable electromagnetic counterpart signals, as long as the correct lens system can be discerned among the several strong lenses that exist within the sky localisation of the gravitational wave.
Indeed, if a gravitational wave from a merging stellar remnant black hole within a galaxy is gravitationally lensed, then the light from within that galaxy is also lensed.Because the Earth and thus the detectors will rotate between the arrival time of the different gravitational-wave images, the same BBH is effectively detected several times at different detector orientations, allowing for an effectively "extended" network of detectors and thus significantly improved sky localisation Seto (2004); Hannuksela et al. (2020).The galaxy would appear as a lensed galaxy within the sky localisation of the wave which will be significantly better localized than a nonlensed gravitational wave Seto (2004); Hannuksela et al. (2020).
Here we vary the sky localization between ∼ 1 − 5 deg 2 .However, an accurate quantification of the sky localization capabilities for variable signal-to-noise ratios, detector networks, and number of strong lensing images, utilizing joint parameter estimation, will be needed in the future, as the sky localization will heavily depend on these assumptions, as has been observed for non-lensed events Petrov et al. (2022). 3Consequently, one can observe this restricted area to find plausible lenses from which the waves originated, resulting in a limited number of candidate lenses (Hannuksela et al. (2020), for example, estimate 100-1000 lenses).The correct lens system must be able to produce image properties that are consistent with the time delay and magnification ratios detected in the lensed gravitational wave, narrowing down plausible candidates down to (ideally) one.A subsequent follow-up analysis of the system could pave the way to a number of scientific applications, similar to the more traditional multimessenger lensing searches (Bianconi et al. 2022;Ryczanowski et al. 2022).
Crucial for this idea is the ability to detect lenses on the electromagnetic side, and substantial methodological improvements have been made on the lens finding (Metcalf et al. 2019), in particular by incorporating machine learning (Petrillo et al. 2019;Schaefer et al. 2018;Hartley et al. 2017).Observationally, the upcoming Euclid survey (Laureĳs et al. 2011) is expected to be sensitive enough to cover a substantial part of the lensing cross section of galaxies (Collett 2015).
Unfortunately, it is not clear how practically feasible such a localisation is with near-term instruments like Euclid and LIGO-Virgo-Kagra at final sensitivity.Hannuksela et al. (2020) showed that the localisation is possible, in principle, if we assume that we can detect a lensed gravitational-wave event, identify all strong lensing systems within the sky localisation of the event, using three characteristic lens systems as an example and without the full information of the lens and source properties.That is, without including a full analysis of the near-term EM instruments' capabilities.Here we let go of the latter two assumptions and study how feasible the localisation is in practice with Euclid and LIGO-Virgo-Kagra at final sensitivity, given that we have detected a strongly lensed gravitational-wave event from a galaxy lens.In addition, we extend the Bayesian analysis framework to account for a more realistic astrophysical scenario.
We restrict ourselves to galaxy-galaxy lensing, because the lens morphology is simpler to describe.
In particular, we perform a population study and a full multidimensional Bayesian modelling of time-delays and magnifications, including also our prior expectation for the galaxy's stellar mass content.In addition, we perform the full electromagnetic modelling and consider the degeneracies that arise from this.Furthermore, we analyse a substantial number of simulated lenses inside the footprint to estimate performance.We focus on using Euclid-like imaging, and we will in practice thus restrict to events in Euclid fields, although with a ground-based telescope like Subaru/HSC4 or Rubin/LSST5 , there are still good options if the source is not in the Euclid footprint.Additionally, the ∼40 % footprint of Euclid might be supplemented with the 17 500 deg 2 footprint of the also upcoming Chinese Space Station Telescope (Gong et al. 2019).
To test the framework, in Section 2, realistic simulations of lensed gravitational waves and lens systems are set up.The results and realisations of these simulations are shown in Section 3. In Section 4, we work out a procedure for the identification and localisation of lensed GWs, and apply it on a set of simulated lenses to test what accuracies we might achieve in practice.In Section 5, we summarise and discuss our findings.
Throughout this work, we assume the Planck18 cosmology (Planck Collaboration et al. 2020).

SIMULATIONS
In this section, realistic simulations are created of the gravitational lenses, the host galaxies, and the lensed gravitational waves.This is mostly analogous to Collett (2015), but includes GWs.A summary of the simulation parameters to supplement the text is in Table 1.We first generate the lens (Sections 2.1 and 2.2) and source populations (Section 2.3), then the gravitational wave populations (Section 2.4).
The gravitational waves are placed in source galaxies as described in Section 2.5.In Section 2.6, the detectability criteria of lens systems by imaging surveys are defined.Finally, in Section 2.7, the lensing probability is discussed.

Lens mass model
The lens population mostly follows Oguri (2018b) and Collett (2015).For the lens galaxy velocity dispersion   and lens redshift   we follow the procedure of Oguri (2018b), who takes a local velocity dispersion function measurement  loc from Bernardi et al. (2010) and combines it with the redshift evolution of the velocity dispersion function of Illustris (Genel et al. 2014;Vogelsberger et al. 2014;Genel et al. 2014) galaxies  hyd as characterised by Torrey et al. ( 2015) . (1) We verify that the implementation matches Figure 1 of Oguri (2018b).Including the comoving volume element gives the joint probability density function   : d As limits, we use 60 <   /(km/s) < 600 and 0 <   < 3.6   is then normalised over these limits.The choice of the velocity dispersion function can significantly impact the expected populations, as illustrated in Figure 1, where various weighted velocity dispersion functions are shown.These velocity dispersion functions are derived from stellar velocity dispersions, but we use them for the mass model too, which is consistent with Bolton et al. (2008).
The mass model is an elliptical power law based on the formalism of Tessore & Benton Metcalf (2015), see also Barkana (1998), with Einstein radius where  LS and  OS are angular diameter distances between Lens and Source and between Observer and Source respectively.The centroid of the lens is slightly perturbed such that it is not at a fixed location with respect to the pixel boundaries.We follow Collett (2015) and assume a Rayleigh distribution for their ellipticities This is equivalent to defining their complex ellipticity components   2010), and Choi et al. (2007).The lensing cross section scales roughly like ∝  4 .The lines show the distributions for all galaxies (blue), all galaxies with the redshift correction Equation 2 (orange), galaxies with a concentration threshold (green), a sample targeting S0's and E's (orange), a (smaller) morphologically selected sample of E's + S0's (red), just ellipticals (purple) and a selection of early-type galaxies (brown).Except one (brown) all distributions are taken from Bernardi et al. (2010).In this paper, we utilise the All (All galaxies in Table 2)) and Choi+ (Early-type galaxies).)velocity dispersion functions.
et al. 2015), there is still some scatter, so we add this scatter to the drawn light ellipticities: The power-law index is drawn from a singular isothermal ellipsoid would have   = 2, a point mass has   = 3 and a mass sheet has   = 1.We use   = 0.2.Although we assume this on the astrophysical distribution of   rather than the observed distribution (i.e. the distribution after selection effects, magnification bias, etc.), we found that after drawing our sample of lenses that match our detectability criteria, the distribution of   in our simulations matches well with the observed distribution of Auger et al. (2010).We also include an external shear, with magnitudes from a Rayleigh distribution with scale  = 0.05, and random lens galaxy position angles as in Collett (2015).We note that this external shear should not be interpreted as mostly originating from external matter (Etherington et al. 2023).Instead, it should be considered a first-order correction or first-order deviation of the lens model.For instance, it can absorb a galaxy being disky or boxy.

Lens galaxies
To each lens, a lens galaxy is coupled, with the centroid of the lens mass profile matching the centroid of the lens galaxy light profile.
To describe the light distributions of the lens galaxies, we assume that they are early-type galaxies that follow the rest-frame fundamental plane.Specifically, we follow the approach of Goldstein et al. (2019).This assumption is suitable mainly for ellipticals (and to a lesser degree for lenticulars), but only a small amount of the lensing cross section is due to spirals (Sygnet et al. 2010;Féron et al. 2009).The lens-galaxy light brightness profile parameters are distributed following the rest-frame Fundamental Plane, conditional on the sampled velocity dispersion from the velocity dispersion function.Unlike Goldstein et al. (2019), we choose to work with the FP in the -band instead of the -band, and additionally, we use the FP in the lens galaxy rest-frame.The FP is characterised as a multivariate Gaussian described by the surface brightness, , the log of the effective radius, log 10   , and the log of the velocity dispersion, log 10   , as obtained by Bernardi et al. (2003).Because the velocity dispersion was already sampled using Equation 2,  and   are drawn from this Gaussian conditional on the sampled velocity dispersion. 8The resulting surface brightness and half-light radii are converted to an absolute magnitude as follows, =  − 5 log 10 (  /(ℎ 70 (10 pc)(1 arcsec))) − 2.5 log 10 (2π), (10) where  lum is the luminosity distance.These can be derived by rewriting Equation (24) of Goldstein et al. (2019).
To generate images, we need the apparent magnitude  of the lens galaxies, which depends on the galaxy's absolute magnitude , its distance  and a -correction:  =  + 5 log 10 (/10 pc) + .The -correction is required because of the difference in brightness due to the shifting of the spectrum by the galaxy's redshift, while the 8 There is a typo in Goldstein et al. (2019) filter remains constant.Since the behaviour depends on the spectra of galaxies, we derive them from galaxies in a Millenium simulation lightcone of Henriques et al. (2015).We assume that they only depend on the redshift and absolute r-band magnitude.Specifically, a probability distribution was made of the mean and standard deviation of the k-corrections in a certain   ,  bin, giving   (  , ) and   (  , ).To smoothen the distribution, a third-order polynomial fit was made on this histogram.The -corrections are then drawn from a Gaussian given the calculated mean and standard deviation.Modelling the -correction as a Gaussian with these simulation-derived standard deviations as spread is necessary to account for the fact that observationally, the -correction is uncertain, especially at higher redshifts (although the bulk of our lenses has redshifts below 1, see Figure 4).We do not include the -correction required for the mismatch between the -band and the VIS-band.However, we note that our results are robust against the exact treatment of the light distributions of lens galaxies, because in the lens modelling, lens galaxies are subtracted anyway.
In the simulations, we assume that the actual light distribution is an elliptical de Vaucouleurs profile (Sérsic profile with   = 4).This assumption is reasonable for most lenses, which are massive early-type galaxies.In the lens modelling, however, the Sérsic index is a free parameter, with a broad uniform prior.We note that this, and other simplifying assumptions on the lens galaxies, only enter in the identification in the process of lens-galaxy subtraction and modelling, so the effects are considered second order.

Source galaxies
Considering that the magnifications due to lensing can make sources below the usual magnitude cutoff visible, the regular Euclid simulations do not go deep enough for our purposes.Thus, we use the simulated, but observationally motivated JAGUAR catalogue (Williams et al. 2018), constructed as forecast for the planned deep JADES survey on the James Webb telescope.It is a complete catalogue of galaxies between 0.2 <  < 15, and a stellar mass cutoff of  * > 10 6 M ⊙ (at high redshift this is converted into an equivalent UV-magnitude cutoff), such that there is no direct selection on the source magnitude.This catalogue includes stellar masses, Sérsic profile parameters and spectra.We integrated the spectra with the VIS filter to get VIS magnitudes.Galaxies with VIS-magnitudes above 32 are filtered out (this is less than 1 % of the simulation's stellar mass).We use a denoising density estimation tool (Bigdeli et al. 2020) to construct a smooth probability distribution function from this catalogue.Specifically, the sampled parameters are redshifts (  ), stellar masses ( * ) or star-formation rates (SFR), VIS magnitudes (  ), effective radii   , axis ratios   and Sérsic indices   .The distributions of these parameters are shown in Figure 2. The ellipticity angle is isotropically distributed.

Gravitational wave population
To generate realistic lensed GWs, we must couple a GW event to a source galaxy.The redshift is set equal to the source galaxy redshift.To account for the fact that the probability of emitting a GW varies per galaxy, it depends on the galaxy parameters, we include a GW emission probability, which is described in Section 2.7.We note that although this is a different way of sampling the gravitational wave redshifts, we find that the redshift distributions are very similar to what was found by Wierda et al. (2021), who directly sample the gravitational wave redshifts using distributions that come from the redshift evolution in stellar population models, instead of linking them to galaxies.For the other observed gravitational wave parameters, we follow the procedure of Wierda et al. (2021), who use the observed LIGO-Virgo-Kagra O1-O3a distributions.We adopt a stellar remnant black mass function based on the Power Law + Peak model fitted to the LIGO/Virgo detections across O1-O3a by Abbott et al. (2021c).Using these masses, the resulting signal-to-noise ratio (SNR) is calculated for an ideally oriented source, using the pycbc package (Nitz et al. 2021a), for A+ sensitivities for LIGO, and design sensitivities for Virgo and KAGRA.The antenna pattern, which is a function of the position on the sky (, ), the event time, and the inclination (the angle between the orbital angular momentum and the line of sight), includes 4 GW detectors (Hanford-Livingston-Virgo-KAGRA).This gives the intrinsic SNR of a GW (i.e., before any lensing is applied).The lensed SNRs of each image  is then the original signal-to-noise ratio times the square root of the magnification (  =  intrinsic √ ) (e.g.Ng et al. 2018).Here we assume 100% duty cycle for simplicity.However, in reality, the duty factor will alter the absolute rate of detections and thus the rate of gravitational-wave lensing detections.The detection efficiency, likewise, will depend on the number of detectors that are online, as a more extensive detector network will allow for better sky localization and thus a greater lensing detection efficiency.In O3, the duty factor for at least one detector being online was 97%; for any two detectors being online at the same time it was 82%; and for all three detectors together it was 45%.Further details regarding instrument performance and data quality for O3a are available in Abbott et al. (2021a).Time not observing is spent either acquiring lock, unlocked and undergoing maintenance, unlocked due to unfavorable environmental conditions (earthquakes, wind, storms), or locked and in a state of commissioning, where improvements are made to the detectors.If a subset of the gravitational-wave detectors is online, the signal-to-noise ratio will be lower, which will somewhat degrade the rate of detectable lensed events.However, the duty cycle factor is being constantly improved (see e.g., Buikema et al. (2020) for improvements from 2015 to 2022), and we furthermore conservatively do not include the LIGO India detector, a fifth detector, that is being built in India.9

Binary black hole source position
An astrophysical assumption that we make is that within the galaxy, the BBH source position prior follows the observed stellar VIS-band light distribution.This is a simplified model of reality: ideally, one would consider the rest-frame light distribution, in possibly some other band, depending on the exact (currently mostly unknown) astrophysical nature of the BBH population.For instance, if the BBH rates are linked to more massive stars, the blue light distribution would represent the position dependence of the BBH rate.
Given this assumption, the prior probability of the BBH position inside the galaxy,  BBH , is proportional to the source light profile.In a model where the gravitational wave population follows the stellar mass distribution, this is equivalent to assuming a constant massto-light ratio inside the galaxy.We furthermore sample conditional on a minimum number of images for the GW.For instance, when simulating systems with ≥4 images, a priori, the gravitational wave is constrained to be inside the diamond four-image caustic.
The first way of getting realisations that satisfy these two criteria, namely having the BBH source position being distributed like the light distribution, and constraining the number of images, is starting from source galaxy position   and sampling the BBH source position conditional on this source position.The priors are The denominator, which is the probability that a BBH, whose source position follows the light distribution of a particular source galaxy, is inside the diamond caustic, must also be taken into account: The second way is by first sampling the BBH source position uniformly within the diamond caustic (U ♦ , which depends on the lens parameters   and source parameters   ), and only afterward sampling the galaxy source position.Effectively, the BBH source position and source galaxy source position are switched; this is possible because we have a flat prior on the source position.The priors are: where this last step was made by assuming a flat prior on the source position ((  ) = 1/(4π)), and defining the Sérsic function to be normalised.
With some work, Equations ( 14) and ( 16) can be sampled from with simple inverse transform sampling.For forward simulations, the second method (using Equation 16) is most convenient, but for doing identification, we require the first method.In practice, the caustics are calculated analytically using the equations from Tessore & Benton Metcalf (2015), as detailed in Appendix G.For the lens equation solving (necessary for calculating magnifications and time delays), a semi-analytical recipe was used too, outlined in Appendix H.

Gravitational wave detectability and lens identifiability
Although in practice GW detections are classified as detection using the false alarm rate, we opt for a hard SNR threshold, which can be used as a reasonable proxy for a detection.Targeted lensed searches, when one or more super-threshold image is available, may allow one to uncover so-called sub-threshold triggers below the usual noise threshold by reducing the background noise and glitch contribution (Li et al. 2019;McIsaac et al. 2020).For the choice of threshold, we repeat the argument of Wierda et al. (2021) furthest distance a detector network will be able to detect an event) increase of ∼15 %.To take this into account, we choose as threshold SNR = 7 in our work (note that the SNR ∝  −1  ).Planned improvements to the sub-threshold search pipelines are expected to push the sensitivity down further.
To determine whether a lens system can be identified as such in Euclid images, we opt for hard thresholds on some the following parameters (see Appendix B for further justifications on the numbers): (i) The solid angle over which the flux from the background galaxy is detected at ≥1.5 per pixel exceeds 0.70 arcsec 2 .This area is a measure of the information content that an image has -and is a reasonable measure of its difficulty to be detected by lens finders (Nagam et al. 2023).Specifically, it corresponds to an area of 30 resolution elements (of (FWHM/2) 2 ).
(ii) The system is sufficiently resolved, this requires an Einstein radius above 0.33 arcsec.
(iii) The background galaxy brightness distribution varies sufficiently across the lens: if the lens is small compared to the angular scale at which the source galaxy light profile varies, then the deflection of the light rays due to the lens does lead to a noticeable distortion of the source galaxy, and therefore it is undetectable.We found that the parameter that describes this scale best is the semi-minor axis length, because this is the shortest length scale across which the galaxy varies, and we require   < 0.56  (see Appendix B).
The probability of detection is therefore a sharp cut:

Lensing and detection probabilities
Not all galaxies are equally likely to host BBHs, nor are all galaxies equally likely to lens the BBH. 10 We combine all the previous in-gredients to finally fold in our analysis the probability that a given system hosts a strongly lensed BBH.
(i) The first factor is a geometrical term and concerns the lensing cross section, i.e. the area within the relevant caustic in the source plane   (  ,   ).The probability density11 of a random GW event in the sky to occur inside that caustic is The lens and source parameters   and   follow the PDFs as earlier discussed.
(ii) Additionally there is the probability that a particular source galaxy with parameters   emits a gravitational wave.What this exactly looks like is quite uncertain.Predictions based on stellar population modelling vary (Artale et al. 2019;Santoliquido et al. 2021) and observationally the redshift evolution is most likely bracketed between the star formation rate and the stellar mass density (Abbott et al. 2021c).We, therefore, consider two models12 : one where the GW emission probability is proportional to the stellar mass of the source galaxy, and one where the probability is proportional to the galaxy's star formation rate (in the last 100 Myr), where   * and  SFR are constants of proportionality (e.g. the number of GWs per year per solar mass).
A rough estimation of the number of detections can be obtained by dividing the BBH merger rate as measured by Abbott et al. (2021c) by the stellar mass density (or SFR) at  = 0 as assumed in Williams et al. (2018), which gives (iii) To consider only detectable lenses there is a probability that we observe the system with an electromagnetic (EM) imaging survey,  EMdet .The detectability criterion of Section 2.6 is applied.
(iv) To consider only detectable gravitational waves, there is the detection probability  GWdet , as explained in Section 2.6.

SAMPLE AND DATA GENERATION
With the lensed-GW population model as described in the previous section, we can generate a mock sample of lenses.This is split in two parts: in Section 3.1, the physical parameters of the lenses are drawn, considering the lensing probabilities, as well as detectability criteria, and second, in Section 3.2, a mock dataset is generated for each drawn physical realisation, with noise and uncertainties added.indicates a sample of lens systems that are hosts of gravitational waves that have a high-enough SNR to be observed, and green (GWs+EM deterved) indicates lens systems that are both observable with imaging and are host to a detectable gravitational wave, generated as described in Section 3.1.

Physical realisation generation
The purpose of this paper is assessing our ability to link detected GWs with detected EM lenses.We therefore need 3 samples -one to analyse the properties of an EM-only lens sample, one to analyse the properties of a sample of detected lensed gravitational waves, and one to analyse a sample of systems where we can both detect the lens in the EM, as well as the lensed GW.We therefore run three different simulations, with different lensing probabilities: (i) Simulation of EM lenses that Euclid could detect in a random patch of sky (ii) Simulation of GW lenses that would emit a GW which will be detectable with LIGO-Virgo-Kagra  lens (GW observed) =  cross sec  emitted GW  GWdet (24) (iii) Simulation of EM+GW observed lenses that Euclid would detect and that emit a GW which will be detectable with LIGO-Table 2. All-sky lensing rates.Given are the number of lenses discoverable with Euclid (assuming a 15 000 deg 2 survey size), and then for two scenarios the lensed gravitational wave rates, the percentage of the GW lenses that correspond to an EM lens detectable with Euclid, if it lies in its 15 000 deg 2 survey footprint, and the percentage of those EM lenses that could be identified correctly -defined as being the top hypothesis in our Bayesian evidence ranking (Section 4.2), and the percentage with an identification of over 2 confidence over the next-most-likely hypothesis.We quote accuracies for the cases of an a-priori sky localisation of 1-5 deg 2 (e.g.26.3-13.0means an accuracy of 26.3% in case of a 1 deg 2 localisation, and 13.0% for 5 deg 2 .).The first scenario assumes that the galaxy's GW emission probability is  (GW emitted) ∝  * , and the second scenario assumes  (GW emitted) ∝ SFR.The three rows indicate the difference between requiring 2 or more, 3 or more, and 4 or more observable images/events.The effect of a different velocity dispersion function (Choi et al. 2007) is shown in the lower three rows.We note that for this velocity dispersion function, we have not performed the end-to-end analysis to obtain identification accuracies, because it would not be a fair experiment: it only represents early-type galaxies, and given a gravitational wave, one cannot know a-priori that it came from an early-type galaxy.Virgo-Kagra

Vel
For each case, ∼10 5 physical realisations are sampled with the nested sampler Polychord 13 (Handley et al. 2015a,b).For the number of lenses and lensed GW rates, we have where  (lens/source galaxies) is the number of lens/source galaxies in the sky,  JAG is the number of galaxies in the Jaguar catalogue,  JAG is the sky area covered by the Jaguar catalogue, () is the normalised prior of the lens, source, and gravitational wave parameters, and ⟨ lens (EM detected|)⟩  is the probability for a random lens galaxy and a random source galaxy to be have parameters and positions such that it is a strong lens system, and that is also detectable in the EM.Polychord is used to compute the integrals of the form ∫ d  lens (EM detected|)(), by making use of the fact that this integral is equivalent to the calculation of the evidence of a system with likelihood  lens (EM detected|) and prior ().Polychord simultaneously also samples physical realisations of lens systems.The distributions of the most important parameters of the lens mass model are shown in Figure 4 and of the source galaxies are in Fig- ure 2.An overview of the predicted lensed GW rates and detected GW counts, as well as detectable fractions, is listed in Table 2.
We note that detectability is strongly correlated with large time 13 Since Polychord and most other nested samplers do not deal well with plateaus in the likelihood (Fowlie et al. 2020), in the code, strict plateaus are avoided.
delays, since large time delays are paired with larger and thus easierto-observe lens systems.If a lensed gravitational wave is discovered, time delays are available without any modelling, and they provide a rough estimate of how probable they are to be observed in the EM, and how promising an electromagnetic follow-up could be.This is illustrated in Figure 3 for quads, where the fraction of EM-observable hosts rises with the time delay.Unfortunately, on the GW-side, the opposite is true -larger time delays are harder to confidently pair to each other, because the false positive rate of pairing two unrelated GW events increases, so in reality there is a trade-off.
There is also a detection bias towards higher GW magnifications in SNR-limited samples.We discuss this shortly in Appendix C.

Mock data generation
For each of these physical realisations, a corresponding observed dataset is simulated.This includes (i) Simulated Euclid-like images were made with lenstronomy14 (Birrer et al. 2015(Birrer et al. , 2021;;Birrer & Amara 2018).Euclid noise levels are described in Appendix A. Some example images for the different physical realisations are shown in Figure 5.
(ii) A photometric redshift measurement is simulated, assuming a Gaussian error of Δ/ = 0.05 (this approximately matches the expectations of Euclid Collaboration et al. 2020).
For the simulations of GW hosts (see Section 3), data derived from the GW signals is included: (iii) Uncertainties due to unmodelled substructure and model imperfections noises are added to the time delays and GW flux-ratios.Individual arrival times are given Gaussian noise of between 1.5-6% of the largest time-delay.This conservative range is consistent with the expected time delay errors due to substructure (Liao et al. 2018;Gilman et al. 2020), and its value depends on the substructure mass spectrum and the line of sight environment.Individually measured magnifications are given log-normal uncertainties between 10-30% (so relative magnifications have √ 2 times this error). 15Although exact numbers are unknown, for microlensing + substructure this is viewed as conservative when considering that GWs are not as susceptible to (stellar-mass) microlensing due to waveform suppression (Cheung et al. 2021).In reality, the scatter due to microlensing will be non-Gaussian, due to the non-linear nature of the substructure, and due to the magnification bias, but these effects are neglected in this work and considered secondary.
(iv) The effective luminosity distance  eff  (  / 1/2 ) is drawn from a log-normal distribution around the actual luminosity distance with (ln   ) = 0.4, which is approximately equal to the errors found on O1-O2 events (Abbott et al. 2020a).This is quite conservative, considering that when combining several GW events, the errors will reduce substantially.
(v) The sky localisation of the lensed GW event is drawn from a two-dimensional Gaussian, enclosing 5 deg 2 (90 %).In reality, the uncertainties depend on the orientation of detectors and the quality of the events themselves (Abbott et al. 2020a), so in Section 4.2.3, we vary this localisation precision over a wide range.

GRAVITATIONAL WAVE EVENT LOCALISATION
Using the simulations discussed in the previous section, we set up several mock identification runs.In Section 4.1, the likelihoods corresponding to the separate data products that are used in identification are explained.In Section 4.2, the identification process is described, and the expected performance in practice is assessed.Finally, in Section 4.3, we investigate the prospects for sub-arcsecond localisation of binary BHs via the combination of all available data.

The likelihood
For the identification and localisation, we use all of the mock data that was generated in Section 3.2, which we split up in data from electromagnetic imaging,  EM , consisting of the Euclid imaging and (photometric) redshifts (part i-ii of Section 3.2), and data from the gravitational waves,  GW , consisting of flux-ratios, time-delays, an effective luminosity distance (  / 1/2 ) measurement, and a sky localisation (part iii-vi of Section 3.2).We similarly also have an associated data likelihood for EM part (L EM = ( EM |)) and a GW part (L GW = (  GW |)).All the likelihoods are constructed to match the generation of the data.The EM likelihood consists of (i) The likelihood of the image, L img ( EM |  ), is calculated by lenstronomy.
(ii) The likelihood of the redshift is a Gaussian with uncertainty Δ/ = 0.05.If   >   , then L  = 0 (although it is unlikely, given the errors, one could measure a lens redshift higher than the source).
The GW likelihood consists of the following: (iii) Although gravitational-wave detectors can resolve the arrival times to sub-millisecond accuracy, they can not be directly related to the lens model, because our lens model does not include substructure within the lens and along the line-of-sight, which can alter the arrival times.Therefore, we add to each arrival time a Gaussian error due to substructure with an uncertainty   of 1.5 %, 3 % and 6 % of the largest time-delay.The uncertainty to account for modelling imprecisions due to substructure along the line of sight (the exceedingly small measurement errors are negligible).In reality, these errors between the images will be correlated and non-Gaussian, and could be absorbed in the lens model.However, uncorrelated Gaussian errors serve as a simple and conservative approximation.The distribution of the arrival times, constructed by the difference in the arrival times between the images, will have a non-diagonal covariance matrix.Using a time-delay likelihood with such a covariance matrix is equivalent to marginalizing over an unknown time offset with uncorrelated event time errors.
Explicitly, the log L is: where   () are the predicted time delays for model parameters ,  , are the measured time delays from the GW-side and (  )   =  2 0    + 2 0 is the 3×3 covariance matrix.This covariance matrix does have off-diagonal terms, and the derivation is given in Appendix F. The time delay uncertainty  0 is set to be 1.5 %, 3 % and 6 % of the time between the first and last event.
(iv) The flux ratio likelihood.The flux ratios are assumed to have log-normal errors of 10 %, 20 % and 30 %.This error includes errors due microlensing and substructure along the line of sight as well as measurement errors from the GW-side.The parities, or rather, the Morse phase indices of images (Dai & Venumadhav 2017;Ezquiaga et al. 2021;Dai et al. 2020) are taken into account as well 16 .The loglikelihood is set up analogously to the time-delay likelihood, with , and   → 0.1, 0.2 and 0.3, when assuming 10 %, 20 % and 30 % errors on the flux ratios.Note that the flux ratio and time-delay likelihood are linked: each image pairing is kept, and all permutations over images are taken as described in Appendix E.
(v) The GW-inferred luminosity distance is modelled as a lognormal distribution, with (ln   ) = 0.4.In the future, this could be replaced by a likelihood function that more closely resembles the GW-inferred luminosity distance.
(vi) The position (RA, DEC) information from the GW-side is assumed to be a two-dimensional Gaussian.The precise likelihood will depend on the sky localisation accuracy, which depends on orientation, the number of images of the event, and the GW SNRs.We assume that this likelihood is independent of any of the other parameters, and self-consistently assume a nominal localisation accuracy of 5 deg 2 (90%).
Lastly, because binary black holes formed from stellar remnants are expected to trace the star formation17 , we need to include this information in our model prior.Indeed, we have (vii) The astrophysical probability for a source galaxy to host a BBH (Section 2.7).
(viii) The probability that (in the case of  detected GWs) the BBH source position inside the source galaxy is within the -image caustic, namely Equation 13.

Identification
We now establish the formalism for the identification of the GWs.In Section 4.2.1, the identification is conceptually explained, in Section 4.2.2, the formal Bayesian method of identification is described, and in Section 4.2.3, the performance of the method on a realistic sample of lensed GWs is shown.

Parameter overlaps
Conceptually, the identification of an EM lens system given the GW data amounts to determining which of the EM candidate lenses matches the GW data best in terms of model parameters.Given the GW data-set, a posterior of the lens and source model parameters is determined.This posterior is normalised and acts as prior on the model parameters that determine the evidence for the EM data for each lens system considered as potential lens.One way to intuitively visualise this is by doing the inference using the GW data on its own, and then comparing the resulting parameter posteriors to the posteriors of the candidate EM-lenses.For the GWs, there are the time delays, relative magnifications and a rough redshift estimate.The inference for a test system is shown in Figure 6.In this case, the inferred parameters overlap much better when using the EM data of the true host of the GW than an incorrect host, thus favouring the true system.Additionally, in the right panel it is shown that it is possible to identify the source position of the BBH to high precision in the panel in the middle right, another goal of the joint modelling.In the next section, a more formal evidence Bayesian ranking is introduced, which takes into account all priors and uses a more realistic model.

Bayesian evidences
Firstly, we assume that the detected dataset of lensed GW events,  GW , which consists of multiple GW measurements that were identified as lensed signals of the same host.We assume that this identification was not a false positive.Folding in some false positive prior probability is straightforward, however, we cannot perform a population study on that in this work, because it is uncertain what the probability of falsely identifying a set of GWs as lensed from the GW-side.Furthermore, we assume that none of the Euclid-identified lenses are false positives (generalisations are straightforward, but not expected to change the results).
From the GW-side, we have the data  GW .From the EM-side, there is the data { EM, }, which consists of  Euclid images and (photometric) redshifts of lenses in a plausible area where the lensed GW event took place.The set of EM data of all observed EM-systems is denoted by { EM, }, where  EM, is the EM data of a single lens system .Note: we assume that the plausible area on the sky is fully covered by Euclid or a follow-up of Euclid-like quality.
The Bayesian identification is a model comparison, where the different models are (i) There is no electromagnetic imaging of the lens system that generated the lensed GW in our dataset { EM, }.This could be because the system is too faint or too small to be identified in imaging by Euclid or with other data.Combining GW data with Imaging Bayes Factor favouring true system: 985 GW GW+EM (true system) GW+EM (incorrect system) Figure 6.A simplified example of the identification process and joint modelling.A lensed GW event was generated according to the procedure outlined in the text, generating mock measured time delays, flux ratios, and a luminosity distance.This example assumes an isothermal density slope (  = 2), and no external shear.The posteriors of the inferred model parameters are shown in brown.In the top right panel, luminosity distance posteriors are shown.Additionally, a mock image and photometric redshift measurement from this galaxy was generated, and the posteriors of combining this EM data with the GW data are shown in green.For an unrelated lens, imaging and a photometric redshift measurement was also generated and combined with the measured GW, of which the posteriors are shown in orange.The top panel shows the luminosity distance measurement, and the bottom corner plot shows the posterior of the Einstein radius   , time-delay distance  Δ and axis ratio .The small panel on the right shows the sub-arcsecond source BBH source position inference.Note that the true system overlaps better with the GW data than the incorrect system, which is also seen in the Bayes factor.
(ii) There is electromagnetic imaging of the lens system that generated the lensed GW in our dataset { EM, }, and it came from the candidate lens system number .
To each, there is a prior belief, denoted as (not EM det) and (EM det), which is the probability that any lensed GW has an detectable EM lens.These probabilities correspond to the detectable fractions as described in Table 2.
For model (i), the evidence is that is, the model evidence encodes the probability that the system was not detected in the electromagnetic band For model (ii), the evidence for image  as host of the GW: The constant  is not important as it is common to all of the evidences, and only the evidences ( GW |  EM,i ) are needed.The factor  lenses is the number of detectable (i.e.findable in Euclid-like images using modern lens-finding algorithms) lenses in the sky, and this factor is necessary to be able to compare the Z  's with Z miss .One way to justify these factors is by considering the limit in which the likelihoods become non-informative (L GW = L EM = 1).Then all integrals, which are just integrals over priors, go to 1, and we are left with Z miss = (not EM det) and Z  = (EM det)/ lenses .
To calculate the Z  's, we rewrite the integrals to where the various (  | . ..) serve as a prior on the lens system parameters.We self-consistently set these to be equal to the astrophysical priors, that also generated the simulated lens systems.This can be relaxed to less informative priors or be replaced with other astrophysically inferred priors.Although in the simulations, the lens galaxies were fixed at   = 4, in the lens modelling, we do not use this, and instead model the lens Sérsic index freely, with a wide uniform prior:   ∼ U (0.5, 7.5).The evidence ratio is calculated by sampling the image-only information (L EM (  |EM det)), and then applying the importance sampling evidence estimator (Geweke 1989;Neal 2001) on the samples.We chose not to include the detectability criteria (i.e. EM det and  GW det ) in the evidence calculations, as they are not fundamental (we simply use hard cuts on the GW SNRs and detectability criteria), and it is more conservative to assess our achieved identification performances without them; this makes the prior more uninformative.Implicitly this removes the magnification bias, i.e. the bias stemming from the selection effect from detection thresholds, in the reconstruction.So it removes the restriction that the lens system and GWs must satisfy our detectibility requirements in our fitting.However, we note that for the magnification bias for EM galaxies (i.e. the omission of  EM det ), this is not a large issue, because any model that is compatible with the observed light distribution should anyway be classified as observable.The choice to ignore the GW magnification was largely made for computational reasons (which would not be an issue in reality), but the effect is not likely to be large because the luminosity distance likelihood already encompasses this information: the  GW det term would penalise models with GWs that are further away because they would be undetectable, but this information is largely already incorporated in the luminosity distance measurement.

Performance
To assess how reliable this Bayesian procedure works in identifying gravitational waves, what fraction we can identify accurately and A cumulative distribution function of the Bayesian rankings of the true system with respect to the most likely incorrect system.For example, for 2 GWs, the probability the true lens ranks first is 23 %, and is in the top-5 candidates for 48 % of the cases, when using combining all information.The three panels correspond to the case with 2, 3, and 4 detected GWs.The contribution of the different types of information used is split up into different colours, with in blue the position-only rankings.The other colors indicate the improved rankings upon considering also time delays and flux ratios (yellow), when considering the host GW emission probability (green), when considering the GW-inferred luminosity distance (orange-red), when considering the constraint that is obtained from the number of detected GWs (pink), and when combining all data (brown).We assume a  (GW emitted) ∝ SFR, a localisation area of 5 deg 2 , and 20 % and 3 % errors for the flux ratios and time delays.to what confidences, a sample of 200 lensed GW events with their corresponding lens systems are (drawn from the distribution of Equation 25), and observed GW and EM data of them are generated.We focus on the model where the binary black hole population is linked to the star formation rate.For the BBH population linked to the old stellar population we performed the analysis as well, but results differ only slightly, with a slightly better identification performance for this model, as summarised in 2. Additionally, around 400 nor-mal (EM-only, so drawn from the distribution of Equation 23) lens systems18 were generated, with corresponding simulated EM data, to represent the closest systems that are not the true host.For each of the EM and the EM+GW systems, inference of the lens and the source model parameters on the EM data was done.Then, for each of the simulated GW datasets, and for each incorrect lens system (one of the 400 lens systems) as well as for the correct system that generated the GW, the evidences for generating the GW-dataset were calculated.For each GW dataset, this results in an evidence ranking, ordering the most likely EM lens highest, which ideally would be the true lens system.This way, a 200 × 401 grid is made, where for each of the GW datasets the evidences are calculated of the EM datasets.i Because the sample of the 400 EM lens systems was selected based on position, the position likelihood is the most informative: it was already used to cut out the bulk of hundreds of thousands of lenses in the universe based on the Euclid selection criterion.The performance is in fact most affected by the sky localisation from the GW-side.This is illustrated in Figure 8, where we show the identification accuracy as a function of localisation precision.We define the identification accuracy as the fraction of gravitational waves that are identified as a top candidate, having the largest evidence of all candidates as well as having Z > Z miss .To improve statistics, we oversample the positions, reporting the average accuracy over many realisations of the positions.
If the system is not ranked highest, it is still probably in the top candidates: the rank-order of the true system, with respect to the incorrect systems is shown in Figure 7.The different probabilities are split to assess the statistical power that each part of the data has.One can see that all components are informative, but high accuracies are only attained when combining all components: measured time delays and flux ratios, the astrophysical prior, and the luminosity distance likelihood are informative.This encourages such a joint analysis.
Whereas accuracies are simply derived from the rank-ordering of the lens systems, we also consider the identification Bayes factors with which the true lenses are identified over the most likely incorrect system.Figure 9 summarises these expected confidences of identification that we can expect, varying the number of detected GW events, and varying the uncertainties on the flux ratios and time delays.The full distributions, along with numerical uncertainties on the Bayes factors, are shown in Figure 10.We note that the accuracies are negatively impacted by numerical uncertainties.
When doing such a probabilistic identification, there will inevitably be false positives.In the scenario for 4+ GWs, the true lens ranks top candidate in 21.9 % of cases (see also Table 2), and a false lens ranks top in 10 % of the cases.In the remaining 68 % percent of cases, the hypothesis that the lens system was not observed was ranked with the highest evidence.The probability that the true lens ranks top with a > 2 confidence is 7.4 %, the probability that a false lens ranks top with > 2 confidence is 2.1 %.

Sub-arcsecond localisation
In the case that a reliable host identification has been done, we can refine the source position of the BBH within the host galaxy by jointly modelling the EM+GW data.This is implicitly done in the identification framework (it is part of the evidence calculations), but we also separately investigate the localisation performance.We chose to look at the localisation for three typical systems, one system with two detected GWs, one with three, and one with four.The inferences of the BBH positions in the source plane, with respect to the host galaxy are shown in Figure 11.As expected, having more detected images will improve localisation.Although smaller source galaxies improve localisation in an absolute sense (due to the assumption that the BBH source positions follow the source light distribution), the downside is that the relative source position of the BBH within the galaxy is badly constrained, and moreover, smaller sources provide fewer constraints in the lens model and hence poorer identification of the EM system that hosts the GW.
In the right panel of Figure 11, the source position posterior is actually bimodal.This is caused by the symmetry of the lens model: for an elliptical power law lens, time delays, and magnifications remain constant upon reflection through the ellipses' major and minor axes.The source light distribution prior breaks this symmetry (we assume that the source position prior is proportional to the light intensity).External shear also breaks the symmetry (though not the 180-degree rotational symmetry), but generally the effect of an external shear is relatively small compared to uncertainties on magnifications and arrival times.

SUMMARY AND DISCUSSION
We have assessed the possibility and developed a methodology for identifying the host galaxy of a lensed gravitational wave, and subsequently performing a sub-arcsecond localisation of a lensed binary black hole gravitational wave inside its host galaxy.Such identifications have the potential to answer important open questions in as-trophysics, for example on the role and environment of binary black holes, on binary black hole physics, and time-delay cosmology.
Considering a Euclid-like instrument to identify lens systems and host galaxies, a GW event rate coupled to the young stellar population, four measured strongly-lensed gravitational waves, a combined sky localisation accuracy of 1−5 deg 2 , nominal errors on the arrival times and magnifications (due to microlensing, substructure and measurement errors), we find that our evidence rankings identify the correct host as the top ranked candidate 34.6−21.9% of the cases, and for 18.7−7.3% of the cases, it is identified it with a Bayes factor ln(B) > 2 (this corresponds to a confidence of >2) above the next-highest ranked candidate.For only three (two) measured lensed GWs, while keeping the sky localisation precision constant, the accuracy drops to 11.5−3.6 % (4.1−1.0 %).Subsequent localisation inside the host galaxy is also possible, and for well-measured systems very precise: this could be done to a precision of order 10 mas for a bright and well-resolved system (Figure 11), but because of symmetries on the lens model and depending on the host galaxy's mass distribution, there may be multiple possible degenerate locations.
Additionally, we estimate the fraction of GW systems that is observable with Euclid, and conservatively find that from 20 % to 50 % of the lens systems generating detectable lensed GWs, depending on the assumptions on the velocity dispersion function and the number of detected GWs.The GW rate mainly depends on the assumed astrophysical GW environment, and because there is considerable uncertainty on the redshift evolution of the gravitational waves (Mukherjee et al. 2021), we work out two scenarios, one scenario in which the GW population follows the old stellar population, and one scenario in which the GW population follows the star formation rate.The GW rate estimates based on a redshift evolution from stellar population models by Wierda et al. (2021) lie between this.The estimated number of lens systems observable with Euclid is also very similar to the rate estimated by Collett (2015), who estimate 170 000 detectable lenses in Euclid's footprint, versus our 215 000 (that is, when assuming the same velocity dispersion function).

Model limitations
As with all simulations, simplifying approximations were made, some of which are conservative and some which are optimistic.For example, we treat all lenses as thin single-plane lenses, with a powerlaw and an external shear.We do not include an external convergence (a mass sheet) in the lens modelling.Although including this may make the lens modelling a bit harder, we note that similar degeneracies are already included in the form of an external shear and a free power-law index.Additionally, the assumption of normal errors on the arrival times and log-normal errors on the amplitudes is also an approximation.In reality, there will be correlations between the uncertainties, and the PDF of the amplitudes will be somewhat non-Gaussian: the magnifications and time-delays due to microlensing and line-of-sight substructures are non-Gaussian, and additionally, the magnification bias on the GW SNRs will not give rise to some non-Gaussian uncertainties (the SNRs determine the flux ratios).These are second-order effects however.Furthermore, the lens and source light distribution was approximated as a Sérsic ellipse.Although this approximation is acceptable for the lens galaxies, which are mostly massive ellipticals, especially for the larger source galaxies, this approximation is not that good.However, in the case of large source galaxies, the lens model is not a limiting factor to identification, and instead, we are limited by the data from the gravitational wave, while for small source galaxies, the PSF and pixel size of the imaging instrument will smoothen the source galaxies such that it Significance (number of sigmas) 10%, 1.5% 10%, 3% 10%, 6% 20%, 1.5% 20%, 3% 20%, 6% 30%, 1.5% 30%, 3% 30%, 6% Flux ratio, time delay error The identification Bayes factors for different numbers of GWs, for different uncertainties on GW fluxes and time delays.These are boxenplots: enhanced boxplots starting at the median, creating the central box between the first and third quartile, and from then each subsequent level contains half of the remaining data (Hofmann et al. 2011).The right axis indicates the sigma-equivalent to the Bayes factors on the left axis.We assume a sky localisation of 5 deg 2 , and a GW emission probability  (GW emitted) ∝ SFR.
is almost indistinguishable from a Sérsic ellipse, so we do not think this will impact identification performance substantially.
In terms of estimating the rates, we have assumed a 100% duty cycle for all detectors, but generally some time is spent either acquiring lock, unlocked and undergoing maintenance, unlocked due to unfavorable environmental conditions (earthquakes, wind, storms), or locked and in a state of commissioning, where improvements are made to the detectors.If a subset of the gravitational-wave detectors is online, the signal-to-noise ratio will be lower, which will degrade the rate of detectable lensed events.However, the duty cycle factor is being constantly improved (see e.g., Buikema et al. (2020) for improvements from 2015 to 2022), and we furthermore conservatively do not include the LIGO India detector, a fifth detector that is being built in India, which would improve the rate estimates.We have also assumed as a reference estimate a precise 5 deg 2 sky localisation, which is better than the current 2-detector and 3-detector sky localisation of typical O3 LIGO-Virgo-Kagra detections The LIGO Scientific Collaboration et al. (2021).A ∼ 10 deg 2 localisation is roughly indicative of an average sky localisation accuracy of a 4-detector or 5-detector network Fairhurst (2018), worse than the 5 deg 2 estimate.However, due to Earth's rotation, strong lensing would effectively increase the number of detectors (e.g.Seto 2004;Goyal et al. 2021a); for example, in the case of a 4-image detection with 5 detectors online 100% of the time, one would effectively have a network of close to 20 detectors, which would yield better sky localisation than a 5detector network.Nevertheless, a systematic quantification with joint parameter estimation has not been done, and it requires folding in also duty cycles and low signal-to-noise ratio detections Petrov et al. (2022).Thus, a more detailed analysis regarding the duty cycles and sky localisations should be carried out in the future.

Future prospects and follow-ups in the EM
We also investigated the results of a ground-based follow-up with Subaru/HSC, as described in Appendix D, but found that <1 % of lensed GWs could be identified correctly, suggesting that groundbased follow-up remains inefficient largely due to its limited spatial resolution.However, we only have done single-band modelling of lenses in this work, and the multiple colours that would be available from ground-based follow-ups would be helpful too, in both the lens detection and the lens modelling process.We leave this for future analyses.We also leave the analysis with follow-up observations with higher-resolution and more sensitive space-based instruments (e.g.Roman Space Telescope) for future analyses.
However, how likely is successful identification?This depends largely on the model assumptions.In the best-case scenario, where the astrophysical errors on time delays and flux ratios are minimal, and with a localisation of 1 deg 2 , assuming a 40 % sky coverage by Euclid, we can expect a 2 identification with around 5 quadruple image events, while in the nominal scenario for the errors, and 5 deg 2 localisation, we would need around 13 quadruple image events.Unfortunately, this seems to imply that the Euclid survey will likely not be sufficient to make a successful identification.
Instead, what would likely be required is a dedicated follow-up from space of events outside the Euclid footprint with similar or greater depth and resolution.A follow-up with similar quality to Euclid could increase the success rate by at least a factor 2.5 (requiring a mere 2-5 quadruple image events) and possibly more with the upcoming Roman Space Telescope or even with the Hubble Space Telescope (a 1 deg 2 search area is smaller than the COSMOS field).Additionally, if the identification was inconclusive, follow-ups could still help.For instance, one could do a ground-based lens search that goes deeper than Euclid, which would pick up lenses that are too faint for Euclid, and therefore lower increase our chances of finding a lens (and also decreasing the posterior probability in the hypothesis that the lens was not found, which helps with the confidence of identification).However, we leave detailed estimates to future work.A lensed GW event would likely warrant such a dedicated follow-up program.
Note, also, that we have here assumed 2-sigma detection thresh-     log i , t i = 10%, 1.5% 10%, 3% 10%, 6% 20%, 1.5% 20%, 3% 20%, 6% 30%, 1.5% 30%, 3% 30%, 6% Figure 10.The identification confidences with estimated numerical uncertainties, given two, three and four detected GWs. Shown is the inverse CDF of the confidence level at which the GW is identified; the Bayes factor favouring it above the most likely injected lens, or if Z miss is higher, the Bayes factor identifying it above Z miss .The legend indicates the uncertainty on the measured fluxes and on the measured times.
olds.If the threshold was set at 5-sigma, then the number of successful identifications would decrease significantly.However, if a 2-sigma or 3-sigma detection were made, it might warrant a more dedicated follow-up, which may allow us to discriminate the lens from others at higher accuracy.For instance, if a candidate has a badly constrained lens model, one could do a deeper or more highresolution follow-up of this lens.The improvement in the precision of the model could then help us confirm or rule out the candidate with more confidence.However, this, too, will require detailed estimates of the capabilities, which we leave to future work.We also note that even if the lens candidates are narrowed down to a few candidates, as opposed to one candidate, these systems can still contribute to statistical studies.For example, one might perform cosmography studies by marginalizing the Hubble constant measurement, weighted by the Bayes factor, for each candidate.By doing so, one might perform any follow-up science-case analysis in a statistical manner, similarly to the galaxy catalog methodologies developed for dark siren follow-up programs Fishbach et al. (2019).

Improvements on lensed GW detections
There are also other improvements that we may expect in the future.In particular, once built, a planned gravitational-wave detector, LIGO India, will increase the initial sky localisation accuracy from the gravitational wave.Furthermore, planned improvements to the lensing sub-threshold search pipelines Li et al. (2019);McIsaac et al. (2020) are expected to allow us to increase the number of quadruple image detections.Detection of more images, even if they have relatively low signal-to-noise ratios, would likely help with sky localisation by allowing for an effectively increased detector network in strong lensing analyses, contributing to narrowing down the a priori number of lensed candidates.A further improvement from the inclusion of sub-threshold events would be given by the time-delay (and to lesser degree magnification) information from the increased number of images; for example, four gravitational-wave images would pinpoint quite accurately the binary source position even without the magnification information, which would improve our ability to discriminate the correct lens candidate from the complete set of potential strong lens candidates.With these improvements, the number of events that are possible to localise in the coming years is expected to become more optimistic.Thus, the continued upgrades to the detector network and to the lensing pipelines will be crucial for successful lens identification.
Finally, in the scenario that the current gravitational-wave detectors do not detect a sufficient number of lensed gravitational waves for the identification, the planned third-generation gravitational-wave detectors will.In particular, third-generation detectors such as the Einstein Telescope and Cosmic Explorer are expected to detect around 50-100 lensed events per year (Biesiada et al. 2014;Li et al. 2018;Xu et al. 2021), some of which events are sure to be correctly localised to their host lens systems with a dedicated program.
A complementary avenue to localise not lensed binary black holes, but binary neutron stars, is the identification of counterparts to lensed binary neutron star events Bianconi et al. (2022); Smith et al. (2022).In particular, if a strongly lensed binary neutron star is identified in low latency and its electromagnetic counterpart is located in a rapid follow-up search, this might likewise allow for identification of the host galaxy.Although here we target binary black holes, not neutron stars, it will still be interesting to consider this possibility, as many of the follow-up multimessenger applications, such as Hubble constant measurements, could also be performed with this type of search.

Prospects for follow-up science
If the lens system is uniquely identified, one gains access to the following information: (i) Binary black hole and lens redshifts (  ,   ) through spectroscopic or photometric measurements of the host and lens galaxies; (ii) Effective gravitational-wave luminosity distance  eff  (luminosity distance as measured directly from the gravitational wave,   / 1/2 ) 19 ; (iii) Lens model reconstruction; (iv) Image properties (magnifications, time delays, Morse phases) as predicted by the lens model reconstruction and as directly inferred from the gravitational waves; (v) Precise gravitational-wave sky location; (vi) Hostgalaxy properties (e.g. its stellar mass, age, metallicity), especially with additional follow-up observations; (vii) Location of the binary black hole within the galaxy.Here we discuss potential applications enabled by this additional information.
Firstly, a measurement of (i) the source redshift and (ii) luminosity distance allows one to perform high-redshift standard-candle measurements (Hannuksela et al. 2020).In particular, strongly lensed gravitational-wave events originate from redshifts   ∼ 1 − 4, which are much above the typical binary neutron star redshifts   ∼ 0.01 (Abbott et al. 2017).This would allow us to probe cosmological expansion at a high redshift, a region currently beyond our reach with standard siren measurements.Moreover, as the gravitational wave at high redshift will have propagated an extensive distance, it could allow us to probe the gravitational-wave propagation in the context of alternative theories at much-improved accuracy, as shown by Finke et al. (2021).
Secondly, access to the lens model reconstruction (iii) and the image properties from (iv) could allow for improved lens modelling.A common hindrance to lens modelling is the mass-sheet degeneracy, which refers to the degeneracy between the power-law slope of the lens model and the lensing time delays.It has been shown, in the context of supernova lensing, that the degeneracy can be broken, once we obtain a standard-siren measurement of the system (iv), thus allowing for improved modelling of the lens itself (Oguri & Kawano 2003).In addition to being a standard siren, gravitational waves do not suffer from scattering, dispersion, or scintillation by the intergalactic and interstellar medium, and are less hampered by microlensing (Cheung et al. 2021).
Thirdly, strongly lensed gravitational waves allow us to study the 19 Here and throughout we distinguish the luminosity distance as inferred from the gravitational waves without accounting for lensing ( eff  =   / 1/2 ) from the actual, intrinsic luminosity distance (  ).
full polarization content of gravitational waves by increasing the effective number of detectors (Smith et al. 2019b;Goyal et al. 2021a).Once the sky location is known (v), one can further improve the polarization tests (e.g.Pang et al. 2020).
Fourthly, identifying the host galaxy (vi) and the merging black hole's location within it (vii) will allow us to study several interesting questions about the origin of merging black holes.For example, if they originate from galaxies, do they mostly belong to massive galaxies?Do they mainly come from old or young galaxies?Do they live in the bulges of galaxies, and do they follow the old stellar population, or in outskirts with more young galaxies?Many different formation channels for binary black holes have been proposed (Santoliquido et al. 2021;Mapelli 2020, and references therein), each having some dependence on the environment and redshift.Locating the host galaxy of the binary black hole might allow us to investigate these questions in a way not possible with other methods.Moreover, precise, sub-arcsecond, GW localisation of these cosmological-distance events would allow us to study the problem under a microscope (Hannuksela et al. 2020).
Fifthly, owing to the localisation (vi), we could independently verify strong lensing detections (Dai et al. 2020).The searches for strong lensing rely on identifying two near-identical events.However, it is also plausible that two unrelated events appear identical within detector accuracy due to astrophysical coincidence (Çalışkan et al. 2022).An identification of the lens system in the electromagnetic band would allow us to independently confirm the detection.

Future work
In addition to what we have already incorporated in our identification methodology (Section 4.1), there is some additional information that can be used in practice that we have not included.For instance, we do not include the information given by the absence of detections in the full GW-databank: if some model parameters (or in extension, a candidate lens) predict an associated GW event with a high enough amplitude in a time window where no possible associated GW is seen (accounting for detector downtime and the response due to the antenna patterns of the detector), these parameters (lens) must be disfavoured.A similar line of thought was used by Dai et al. (2020) to put an upper limit on the redshift of their candidate triplet of lensed GWs.Additionally, if we find that the gravitational wave is microlensed, this can give a wealth of extra information (Cheung et al. 2021).Furthermore, a detailed analysis of the sky localisation capabilities for strongly lensed gravitational-waves with variable detector network is left for future work.This will be particularly interesting, as the strongly lensed gravitational-wave detections will allow for an effectively improved detector network, rendering potentially highly accurate sky localisations feasible (Seto 2004).
Future work could also generalise our results.For instance, in the lens modelling, the astrophysical priors (that were also used to generate the data) could be relaxed to more uninformative priors.This may make modelling the faint source galaxies a bit harder.One could also consider other instruments, for instance, the upcoming Chinese Space Station Telescope.The lensing simulations could also be made more realistic, for example by creating them by ray-tracing through cosmological simulations.Additionally, we have considered just the importance sampling evidence estimator.In some cases, the evidence uncertainty of this estimator is quite large.Although this could be solved by just sampling more points, more sophisticated evidence calculations, like annealed importance sampling (Neal 2001) or nested sampling may be preferred.Of course, the disadvantage of such strategies is that the lens modelling cannot be done separately from the GW modelling: currently, the importance sampling estimator can simply be applied as a postprocessing step on the posterior samples of any third-party lens modelling software.
Lastly, although it was not originally intended for this purpose, the code can be easily used to calculate the astrophysical lensing probability for lenses just based on GW information too, because the lensing statistics prior is consistently folded in with the GW-data likelihood.For example, one could compute astrophysical contribution to evidence of the lensing hypothesis conditional on some set of GW events in our galaxy-galaxy lensing framework, even without any images of candidate hosts.
exp = 565 s is the exposure time  sky = 22.35 mag/arcsec 2 is the sky brightness The aperture correction due to not getting all the signal in the 1.3 arcsec aperture negligible due to the small FWHM,  = exp − (1.3/2) 2 2•0.17 2 /8 log 2 = 2 × 10 −18 .Setting SNR = 10 at  = 25, gives a photometric zeropoint of   = 25.12 and a background noise of  bg = √︃  sky +  2 ro = 9.494 e − /pix for each image.The 3-image coadd would therefore have  bg = √ 3 • 9.494 e − /pix, with a total exposure time of 3 exp = 1695 s (due to the dithering pattern, half of the area is in fact 4 exposures, but this is neglected; see Collett (2015)).The same calculation by Euclid Collaboration et al. ( 2019) yields   = 24.0, with the difference arising because (i) the authors using the design sensitivity instead of the predicted sensitivity, (ii) the authors considering the calculation as a single exposure of 1695 s instead of 3 exposures of 565 s and (iii) the authors assume that the SNR of 10 is achieved with good SExtractor settings while in the Euclid Collaboration et al. ( 2019) simulations, it was actually achieved with very sub-optimal aperture photometry.Collett (2015) uses   = 25.5.We note that the results are robust against reasonable variation of the assumed Euclid noise parameters.

APPENDIX B: OBSERVABILITY BY EUCLID
To obtain realistic requirements for classifying an image as observable (that is, detectable and recognisable as a lens by a good lens finder, Section 2.6), a visual inspection of a subset of the simulated systems was done.A human classification was carried out for several hundred systems to assess whether it would be observable.Some example classifications (in particular some of the harder ones) are shown in Figure B1, where the top panel images were classified as lenses that have clear lensing features that a lens finder may pick up, whereas the images in the bottom panel are assessed as too bad for identification.Since some level of subjectivity is involved, we tried to remain conservative but are helped by the fact that Euclid has both a high spatial resolution and a relatively high SNR for most potential lenses that could lens GW events.The detectability seems to rely mainly on the Einstein radius and the background source information content (which is summarised as a score: the number of lensed-image pixels with SNR > 1.5, see Section 2.6).The relation between these parameters and observability is shown in Figure B2.It can be seen that a reasonable cut can be made by setting hard thresholds on these parameters, indicated with the gray region.As can be seen in a few of the lenses in the lower panel of Figure B1, there is a group of lens systems that is not observable because the background galaxy is too smooth, and therefore, lensing has too little visible impact on the image morphology.Because of this, we include a cut on the source galaxy semi-minor axis   , in Figure B1, galaxies that did not pass this threshold are indicated with crosses.The final thresholds (  > 0.33 arcsec, Score > 70 and   < 0.56  ) were found by maximising the accuracy with respect to the human classification while keeping the total fraction of detectable lenses in the visually inspected sample the same.More sophisticated lens finders are available, but this is left for future work.

APPENDIX C: MAGNIFICATION DISTRIBUTIONS
Due to the magnification bias, higher magnification systems are overrepresented compared to the astrophysical distribution in samples of lenses that are SNR-limited (or similar).It is therefore interesting to quantify how strong this effect is for our lensed gravitational waves, for instance, if the sample is dominated by high-magnification systems, or more average lens systems.For our simulations done in Section 3, we have put observability limits on the GW signals.The distributions, for the systems conditional on the GWs being detected (Equation 24), are shown in Figure C1.We note that in systems of high magnification, only 2 out of 4 images are usually highly magnified, and that because of this, when requiring 3 or 4 GWs to be detected, the magnification bias is not very strong.For cases where 2 GWs are detected, there is a more substantial high-magnification tail, but in total still only ∼20 % of systems have magnifications above 10.

APPENDIX D: GROUND-BASED FOLLOW-UP
In case the gravitational wave is outside of Euclid coverage, a groundbased follow-up may still be possible.To investigate this, we run the simulations and the identification tests also for imaging with a ground-based telescope.Specifically, we test our framework with Subaru/HSC imaging (although different telescopes, for instance a target-of-opportunity follow-up with Rubin/LSST on a night with good seeing would be possible too.).We (conservatively) assume the imaging depth parameters achieved in the r-band by the 'Ultradeep survey' of the Subaru Hiper-Suprime Cam Strategic Program (Aihara et al. 2018).Since seeing is the most important parameter, we consider two cases, one with good seeing of 0.4 arcsec and one where the seeing is 0.6 arcsec.The resulting accuracies are shown in Table D1.Although the numbers are quite a bit lower than in the Euclid-based analysis, in this ground-based follow-up scenario, we have only used single-band information.It is likely that multi-band information helps, especially on the lens finding because one can make use of the difference in colour of the lens and source galaxies (Collett 2015).

APPENDIX E: PAIRING GWS TO LENSED IMAGES
A priori, the pairings between model time-delays and flux ratios and measured time delays and flux ratios are not known.Unlike in quasar or supernova time-delay cosmography, there is no information that links a measured time delay time to a specific pair of images.An apparent fix would be to just sort the images based on their arrival times.However this ordering is not necessarily correct, or unique, because of the (systematic) uncertainties on the arrival times.In some lens configurations, this is problematic, because the sorting of measured arrival times leads to a wrong time-to-image correspondence.Ideally, one would add another parameter to the model: the permutation  that links each image to an arrival time and magnification.Since we are not directly interested in it and sampling this discrete parameter explicitly is impractical, this permutation is marginalised over.This gives The information linking the BBH position distribution to the light distribution and the assumption that the probability of a GW is proportional to the luminosity is already incorporated in the priors.4 this can be worked out further: This is a Normal distribution with inverse covariance matrix , where   is the  ×  identity matrix.Inverting this (using the Sherman-Morrison formula) yields  =  2 (   −1 + 1), or ()   =  2 +  2    .The determinant can be calculated with Sylvester's theorem, which yields det() = ( 2 )  −1 , confirming that the derived distribution is normalised, as it should be.

APPENDIX G: A FAST CAUSTICS CALCULATION
Tessore & Benton Metcalf (2015) do not give a fast (analytical) way of calculating the critical curves (and therefore the caustics).lenstronomy does include several methods of computing the caustics but its numerical uncertainty makes the resulting polygon inconvenient to sample from (for example, there will be unphysical self-intersections).A small description of a reliable analytical procedure for calculating critical curves for an elliptical power law + shear model is therefore shortly described here and has since been added to lenstronomy.The derivation involves adding an extra external shear to equation ( 17 To calculate the critical curve, a grid of  is sampled, and for

Figure 2 .
Figure2.Source galaxy parameter distributions of the simulated sample of lens systems that could host a quadruply-lensed GW.Colors are like in Figure4, with the addition of the red-brown distribution, which is the distribution of the intrinsic host galaxy population (the Jaguar catalogue(Williams et al. 2018)).Shown are the source redshift (  ), the galaxy's stellar mass (log 10 (  * / ⊙ ), the galaxy's ellipticity (1 −   ), the effective radius   , and the VIS-magnitude (  ).The bin-like structure for the ellipticity is an artifact of the generation of the Jaguar catalogue(Williams et al. 2018), generated as described in Section 3.1.

Figure 3 .
Figure3.The time delay distribution (we take here the time between the first and last event) for 4 detected GWs with  * as GW rate.The green line shows the fraction of lenses that are also observable in the EM in each bin.

Figure 4 .
Figure 4. Lens mass model parameter distributions of the simulated sample of lens systems with quadruple images -systems that could host a quadruply-lensed GW.Shown are the Einstein radius (  ), lens galaxy velocity dispersion (  ), ellipticity (1 −   ), lens redshift (  ), and the power-law index (  ) of the elliptical power-law model.The different colors refer to the different conditions on the lens systems: blue (EM deterved) indicates a sample of lens systems as could be observed with Euclid-like imaging, yellow (GWs observed) indicates a sample of lens systems that are hosts of gravitational waves that have a high-enough SNR to be observed, and green (GWs+EM deterved) indicates lens systems that are both observable with imaging and are host to a detectable gravitational wave, generated as described in Section 3.1.

Figure 5 .
Figure 5. Examples of simulated lens systems.Systems in the top row were generated to have ≥ 2 images, systems in the middle row have ≥ 4 images, and systems in the bottom row are systems generated to be hosts of lensed GWs.A BBH source position is placed inside each source galaxy.Its source plane source position is indicated with the small green plus, its lensed source positions are indicated with the red stars, with the sizes representing the magnifications.The critical curves (red) and caustics (purple) are also shown.For each system, the lens redshift, source redshift and maximum time delay is listed.
Figure7.A cumulative distribution function of the Bayesian rankings of the true system with respect to the most likely incorrect system.For example, for 2 GWs, the probability the true lens ranks first is 23 %, and is in the top-5 candidates for 48 % of the cases, when using combining all information.The three panels correspond to the case with 2, 3, and 4 detected GWs.The contribution of the different types of information used is split up into different colours, with in blue the position-only rankings.The other colors indicate the improved rankings upon considering also time delays and flux ratios (yellow), when considering the host GW emission probability (green), when considering the GW-inferred luminosity distance (orange-red), when considering the constraint that is obtained from the number of detected GWs (pink), and when combining all data (brown).We assume a  (GW emitted) ∝ SFR, a localisation area of 5 deg 2 , and 20 % and 3 % errors for the flux ratios and time delays.

Figure 8 .
Figure8.The identification accuracy of the sample with different localisation precisions.For the nominal scenario (the line), 20 % and 3 % error on the fluxes and arrival times are assumed.The shaded region indicates the spread between the best case and the worst case scenario for flux ratio and time delay errors (10 % and 1.5 % for the best case and 30 % and 6 % for the worst case).
Figure9.The identification Bayes factors for different numbers of GWs, for different uncertainties on GW fluxes and time delays.These are boxenplots: enhanced boxplots starting at the median, creating the central box between the first and third quartile, and from then each subsequent level contains half of the remaining data(Hofmann et al. 2011).The right axis indicates the sigma-equivalent to the Bayes factors on the left axis.We assume a sky localisation of 5 deg 2 , and a GW emission probability  (GW emitted) ∝ SFR.

Figure 11 .
Figure11.Sub-arcsecond localisations of GWs with lensing.Shown is the source plane light distribution, and overlayed the inference on the BBH position without any GW data (black) and the improvement when jointly modelling the EM+GW data (green-blue).The (red) contours for the galaxy centroid coordinates give an idea of the uncertainty in the EM modelling.The true BBH position (white x) is recovered.Additionally, the caustics are plotted (grey) and in the inset plots the observed lens images are displayed.

Figure B2 .Figure C1 .
Figure B2.Classification thresholds.Each galaxy is a blue or orange point, depending on whether it was deemed observable.If a galaxy did not pass the size threshold (semi-minor axis length   > 0.56  ), the dot was replaced with a cross.

Table 1 .
Summary of the model.See the full text and referred sections for more complete explanations.Mass axis ratio  mass  mass / light ∼ N (1,  2  )   = 0.2 Mass axis angle  mass  mass ∼ N (  light ,  2   (   ,   ),   (   ,   ) )