Testing the LSST Difference Image Analysis Pipeline Using Synthetic Source Injection Analysis

We evaluate the performance of the Legacy Survey of Space and Time Science Pipelines Difference Image Analysis (DIA) on simulated images. By adding synthetic sources to galaxies on images, we trace the recovery of injected synthetic sources to evaluate the pipeline on images from the Dark Energy Science Collaboration Data Challenge 2. The pipeline performs well, with efficiency and flux accuracy consistent with the signal-to-noise ratio of the input images. We explore different spatial degrees of freedom for the Alard–Lupton polynomial-Gaussian image subtraction kernel and analyze for trade-offs in efficiency versus artifact rate. Increasing the kernel spatial degrees of freedom reduces the artifact rate without loss of efficiency. The flux measurements with different kernel spatial degrees of freedom are consistent. We also here provide a set of DIA flags that substantially filter out artifacts from the DIA source table. We explore the morphology and possible origins of the observed remaining subtraction artifacts and suggest that given the complexity of these artifact origins, a convolution kernel with a set of flexible bases with spatial variation may be needed to yield further improvements.


Introduction
The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST; Ivezić et al. 2019) will survey the southern sky from 2025 to 2035.This wide-field ground-based system uses an 8.4 m (6.5 m effective) telescope with a 3.2 gigapixel camera and a 9.6 deg 2 field of view.The LSST survey will repeatedly observe ∼18,000 deg 2 of the sky in the ugrizy filters during its 10 yr of operations.
The LSST data will be useful for a wide variety of science (LSST Science Collaboration et al. 2009), with guiding areas of probing dark energy and dark matter, studying the transient optical sky, producing an inventory of the solar system, and mapping the Milky Way.Achieving these scientific goals requires the discovery of transient sources and accurate photometric measurements.The LSST is expected to discover about 106 transient sources per night (Ridgway et al. 2014;Graham et al. 2019;Ivezić et al. 2019), which requires a detection pipeline with high efficiency and a low rate of nonastrophysical artifacts.
The Rubin Observatory LSST Dark Energy Science Collaboration (DESC) developed the dia_pipe 6 package to run the LSST Science Pipelines7 Difference Image Analysis (DIA) algorithms.The dia_pipe combines different parts of the LSST Science Pipelines difference imaging code into a single pipeline, which enables the user to perform image difference, source association, and forced photometry.The DESC dia_pipe has been integrated with the LSST Data Release Production pipeline.
The LSST DESC Data Challenge 2 (DC2; LSST Dark Energy Science Collaboration (LSST DESC) et al. 2021) was a modeling and simulation effort that generated simulated LSST images.From a simulated Universe with galaxies, stars, and Type Ia supernovae (SNe Ia), DC2 used GalSim (Rowe et al. 2015) to render astronomical objects into a set of images for 5 yr at 300 deg 2 of the sky.These images were processed through the LSST Science Pipelines into data products modeled on those to be delivered by the Rubin Observatory.Sánchez et al. (2022) compared the DIA performance metrics to the Dark Energy Survey DIA performance (Kessler et al. 2015) using a 15 deg 2 subset of DC2 data, based on the SNe Ia that had been included in the DC2 simulation, and found similar performance to the DESC survey.Here, we perform a more indepth evaluation of DIA using additional synthetic source injection on a subset of the DC2 data set.Our goal is to connect and explore the DIA algorithmic concepts and code as implemented in the LSST Science Pipelines.
In this work, we inject a controlled set of synthetic sources onto DC2 images to (1) evaluate the pipeline at different source magnitude levels and host magnitude levels, (2) compare the Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence.Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
pipeline performance with different kernel spatial orders, and (3) determine a set of flag-based cuts to removing false-positive detections ("artifacts").
The essential strategy of the synthetic source injection technique is to add synthetic sources onto astronomical images based on the measured point-spread function (PSF) and monitor the detection status and flux measurements of these synthetic sources.
Traditionally, the use of DIA algorithms in surveys has suffered from significant subtraction artifacts, leading to the need for human-based or machine-learning-based classification (Kaiser et al. 2002;Frieman et al. 2008;Sako et al. 2008;Bernstein et al. 2012;Kessler et al. 2015).We explore the possible origins of these artifacts using difference images produced by the DIA pipeline.We study both the statistical properties of artifact measurements and the morphology of artifacts.A few possible causes are discussed, and we give suggestions for future improvements.
The outline of this work is as follows.Section 2 introduces image difference algorithms, Section 3 introduces the analysis framework of the synthetic source injection technique, Section 4 presents the analysis results of the DIA pipeline, Section 5 discusses the morphology and potential origin of subtraction artifacts, Section 6 discusses the current status of the DIA pipeline and possible future improvements, and we provide conclusions and outline future directions in Section 7.

Difference Image Algorithms
To begin DIA, we prepare two images that cover the same region of the sky with a time separation longer than a typical light-curve duration.One image, which usually was taken in good seeing conditions and has a high signal-to-noise ratio (S/N) for the magnitude range of interest, is designated as the template image.The other image is called the science image.Typically, the template image was taken months to years before the science image and may be a coaddition of many images to reduce the photon noise.In this analysis, we use coadded template images.The science image is typically from the current night, where we search for new transient sources or variations in the brightness or position of existing sources.
The next step is to match both images to the same seeing condition and the same astrometric reference so that we can subtract one image from the other to obtain a difference image that ideally represents the astrophysical differences between them.Source detection and photometric measurements are then performed on the difference image to produce a detection catalog.The LSST Science Pipelines version used in this work provides both the Alard-Lupton (AL; Alard & Lupton 1998) and the Zackay-Ofek-Gal-Yam (ZOGY; Zackay et al. 2016) algorithms to perform image differencing.The AL algorithm assumes that the PSFs of the science image (S) and template image (T) can be matched using a kernel κ.This process is done empirically by matching common sources in both images and does not explicitly use calculated PSFs for the images.The kernel κ is solved by minimizing the square sum of pixels from the difference image.The ZOGY algorithm assumes explicit knowledge of the PSF in each image and uses the Fourier transform between them to achieve the same seeing conditions of T and S and produce a detection image highlighting the astrophysical sources.A decorrelation method is also introduced in the ZOGY algorithm to perform decorrelation of the difference image.

AL Algorithm
The AL algorithm solves for the matching kernel κ, which uses stars in both images to match the PSF of the template image to the PSF of the science image.Galaxies can be used in addition to stars, although a galaxy provides less information than a star of the same flux level.Thus, this process is not dependent on reliable star-galaxy separation.In the spatially varying (Alard & Lupton 1998;Alard 2000) situation, the matching kernel can be expressed as Here N κ is the number of kernel bases, a q is a set of fitted spatially varying coefficients, and k q is a set of kernel bases.
The most commonly used kernel bases are composed with a set of Gaussian functions modulated with a set of polynomials (Alard & Lupton 1998).Another type of kernel basis is composed with delta bases (Bramich 2008;Miller et al. 2008), which are "shape-free" delta functions defined for each kernel pixel coordinate.
Using the matching kernel κ, the difference image of the AL algorithm can be expressed as where B(x, y) is the fitted sky background.

ZOGY Algorithm
The ZOGY algorithm was designed for the case where PSFs and flux scaling factors for T and S are properly estimated, and the sky backgrounds of both images have been properly subtracted. 8The difference image from the ZOGY algorithm in Fourier space can be expressed as However, ZOGY provides no mechanism to include information about the spatial variation of PSFs.One approach to using a spatially varying ZOGY would be to split T and S into subregions within which the spatial variation of PSFs is much smaller (Zackay et al. 2016;Reiss 2017;Hu et al. 2022).Performing the ZOGY algorithm on each subregion is achievable at the expense of more CPU and more complicated bookkeeping for detections near and across the edges of subregions.Such a "cell-based" approach to templates and subtraction could fail if the PSFs are varying rapidly across the images.

The DC2 Data Set
To prepare for analysis of LSST data, LSST DESC created the DC2 data set with an end-to-end approach (LSST Dark Energy Science Collaboration (LSST DESC) et al. 2021) to simulate a subset of the full LSST survey.This approach started with a large N-body simulation of galaxies in the Universe, applied LSST-like observations, simulated each exposure, ran the LSST Data Release Processing pipeline,9 and produced data products analogous to those that will be delivered by the Rubin Observatory.The DC2 observations include six optical bands (ugrizy) with coverage of 300 deg 2 in a wide-fast-deep area and 1 deg 2 in a deep drilling field.It simulates 5 yr of the LSST survey.
The DC2 data set provides two classes of imaging data.For single-frame processing, it provides calibrated exposures (calexps) with sky background subtraction.Each calexp image is a 4000 × 4072 pixel image array with 0 2 pixel −1 scale.It also has a source catalog (src) that contains measurements of sources in that visit.After producing calexps, multiple visits are coadded together to create deepCoadd exposures.The resampling grid of deepCoadds is defined with tracts and patches.Each tract contains 7 × 7 patches, and each patch contains 4100 × 4100 pixels with a scale of 0 2 pixel −1 .A visual representation of the tract-and-patch structure can be found in Sánchez et al. (2022) or LSST Dark Energy Science Collaboration (LSST DESC) et al. (2021).

Template Exposure
We focus on i-band exposures of the DC2 data set for our analysis.We choose "deepCoadd" exposures with seven patches as template exposures.The total area analyzed in this paper is about 0.36 deg 2 .

Host Selection Criteria and Science Exposure
We undertook this work from the perspective of SN cosmology, and so we are interested in the detection and photometric accuracy of new point sources (SNe) on top of extended sources (galaxies).GCRCatalogs (Mao et al. 2018) is a DESC package that provides users with a consistent interface to access both the input truth catalogs and the output processed catalogs for the DC2 simulations.We use GCRCatalogs to extract a list of potential host galaxies from the input truth table with a range of apparent host magnitudes.Next, we measure the local surface brightness (SB) with 1″ aperture photometry at the center of each galaxy in the simulated images.We sort the galaxy sample into 1 mag arcsec −2 bins distributed from 20 to 25 mag arcsec −2 with 1 mag arcsec −2 bin widths.While in the real Universe, SNe occur throughout galaxies, what we really want to explore here is the dependence of SN detection on underlying SB.Thus, we place the SN at the centers of galaxies with different central SB.
Within each patch and each SB magnitude bin, we select a set of 20 hosts.Galaxies are chosen to satisfy two criteria: (1) each host center is at least 200 pixels away from the image boundaries in both the x-and y-directions, and (2) each host center is at least 100 pixels away from another host center.These criteria help us avoid subtractions near image edges and injecting overlapping synthetic sources.Locations of hosts on a patch are shown in Figure 1.
After selecting the host set within each patch and each host SB magnitude bin, we match each host set to 10 calexps using the tracts_mapping.sqlite3database file of the DC2 data set.We inject synthetic sources onto each calexp and then use those as the science exposures for running the image difference algorithm.We define a calexp and its corresponding deepCoadd exposure as an "image pair."In total, we have 70 image pairs.The LSST survey takes images at a range of spatial and rotational dithers; thus, a single calexp often does not fully cover all 20 hosts in each host set.We restrict our sample to calexps that cover at least 15 host galaxies.

Synthetic Source Injection Framework
Our synthetic source injection procedure is based on the insertFakes module of the pipe_tasks10 package of the LSST Science Pipelines.We convert the R.A., decl. of the host to the x, y coordinate on the image.We then simulate point sources based on the calexp PSF model at that location on the image.The synthetic source injection framework converts the magnitude to the instrumental flux and scales the normalized synthetic source model by the instrumental flux.In the singleframe processing, the instrumental flux to magnitude calibration of the images was calculated with respect to a 12 pixel radius aperture.While the PSF model is finite in size, it is always larger than the calibration radius.The PSF model is normalized to sum to 1.We calculate the aperture correction between the full PSF and the smaller calibration radius and scale the model of the PSF by this correction.We then multiply the scaled PSF by the instrumental flux from the calibration to inject a source that has the correct photoelectron contribution for its magnitude.The scaled model is added to the image plane of the calexp without making any changes to its variance plane, which means we ignore the Poisson noise of the synthetic source.
After finishing the injection stage, we run the DIA pipeline on the injected data set using the imageDifferenceDriver.py of the dia_pipe package with v23.0.011 of the LSST Science Pipelines.This stage produces difference images and then runs photometry measurement algorithms on that difference image to produce the DIA source (diaSrc) table, which contains source measurements of the detected source.We obtain positions of variable stars and active galactic nuclei (AGNs) from the truth tables in GCRCatalogs and match them to the diaSrc tables.All variables and AGNs that get matched are removed from the diaSrc tables.As a result, the detections from the diaSrc tables only have synthetic sources and artifacts.12Next, we match the known injection positions to the diaSrc tables to determine whether the injected synthetic sources are detected in the difference exposures.The diaSrc fluxes and the flag information of the detected sources are also extracted from the diaSrc tables for further analysis.

LSST Software
In this work, we use the v23.0.0 version of the LSST stack package to provide basic functionalities for synthetic source injection and image difference.We have documented the proper software versions in the GitHub repo LSSTDESC/ dia_improvement,13 which includes scripts to reproduce our work.

DIA Pipeline Analysis
In this section, we describe performance metrics, injection strategy, and analysis results of the DIA pipeline.Different kernel spatial degree of freedoms have been implemented in the pipeline to reduce subtraction artifacts.We compare their performance here and identify the photometric flags that effectively select true positives with high efficiency and a low artifact rate.

Metrics
We evaluate the subtraction quality of the DIA pipeline using the following metrics.These metrics should reflect the detection ability and the photometric measurement accuracy of the pipeline.
1. Efficiency14 (Eff): the number of detected synthetic sources divided by the number of injected synthetic sources.We expect it to approach unity when synthetic sources are bright and decrease to 0 at the faint magnitude end.The efficiency curve can be fitted using a sigmoid model.In this work, we use the function to fit our data, with a and b as fitting parameters and m as the synthetic source magnitude. 15Using this equation, we can solve the magnitude depth m 1/2 , which corresponds to the synthetic source magnitude, where the efficiency drops to 50%.The magnitude depth m 1/2 is mathematically equivalent to the fitting parameter b. 2. Artifact rate (AR): the number of subtraction artifacts per square degree.3. Flux pull (ΔF/σ): This flux pull should follow a normal distribution with a mean of 0 and a standard deviation of 1.In this analysis, we calculate the pull statistic of the diaSrc PSF flux.

Analysis Strategies
We apply two analysis strategies to sets of DC2 calexps.The first strategy is to focus on measuring efficiency as a function of source magnitude and background SB; we inject synthetic sources from 20-23 mag in 0.5 mag steps and 23-24 mag in 0.1 mag steps.These finer steps focus on the magnitude range where the efficiency drops rapidly.We use this strategy to determine the magnitude depth m 1/2 at which 50% of the sources are recovered.
For the second strategy, we undertook a related but separate campaign of studying efficiency and artifact rate for high signal-to-noise synthetic sources and low signal-to-noise synthetic sources with different subtraction configurations with two consistent sets of 20 mag (MAG20) sources and 23 mag (MAG23) sources.

Performance of the Default Configuration of the DIA Pipeline
To get a baseline evaluation of the performance of the DIA pipeline, we run dia_pipe using the default configuration of the pipeline.The default difference imaging algorithm is AL with a set of Gauss-polynomial bases.The AL kernel has three Gaussians with sigma values of 0.7, 1.5, and 3.0 pixels, with respective order 4, 2, and 2 polynomials in kernel (u, v) space.These kernel terms are allowed to spatially vary at first order in x, y across the image.The background level is allowed to vary spatially at second order in x, y.
The upper plot of Figure 2 shows our recovered detection efficiency as a function of synthetic source magnitude.The efficiency follows predicted smooth curve efficiency functions, with a decrease in efficiency as a function of both (1) decreasing injected flux and (2) increasing underlying SB (and thus background noise) at the location the synthetic source is placed.The effect of the underlying host galaxy light is most visible for the brightest hosts, SB = 20-21 mag arcsec −2 , which shows a substantial 0.25 mag m 1/2 shift in the efficiency curve.The DIA pipeline successfully recovers all16 synthetic sources brighter than 22 mag.The efficiency starts to drop below 1 around 22 mag, and it drops below 0.5 between 23 and 24 mag.In the lower plot of Figure 2, we focus on the detection efficiency at the faint magnitude (low S/N) limit.The predicted values of the m 1/2 of the fitted function are between 23.40 and 23.71 mag.
Figure 3 shows the S/N distribution of the detected synthetic sources when injected at 21, 22, 23, and 24 mag.At the synthetic source magnitude of 21, the S/N distribution is far above the detection limit, which means the pipeline should be able to recover all synthetic sources.At the injected magnitude of 23, the S/N distribution overlaps the detection threshold, and the efficiency begins to drop.When the synthetic source magnitude is 24, most injections are below the detection threshold, which explains why the efficiency is low at this magnitude.

Comparison with Previous DESC DC2 Analysis
Here we give a comparison of the i-band m 1/2 to the previous DESC DC2 DIA (Sánchez et al. 2022).That work evaluated the m 1/2 of the LSST DIA pipeline using about 15 deg 2 of the DC2 data set with artificial SN Ia light curves overlaid onto exposures.The estimated m 1/2 in the i band is 23.50 mag for all sources.
In this work, we estimate the m 1/2 in the i band using about 0.36 deg 2 of the DC2 data set with carefully controlled synthetic point sources distributed across different host brightness.The fitted m 1/2 is between 23.40 mag and 23.71 mag, depending on the underlying SB.

Kernel Spatial Degree of Freedom
Alard & Lupton (1998) and Alard (2000) suggest expanding the AL kernel coefficients a q (x, y) (Section 2.1) as a function of a power series of positions.The highest-order d s of the power series defines the kernel spatial degree of freedom.The implementation of the AL algorithm in the DIA pipeline can be configured to run with different values of d s .In this section, we explore the pipeline performance with d s varying from 1 to 4. The pipeline default value of d s is 1.Our analysis was constrained to a maximum spatial order of d s = 4 due to limitations in v23.0.0 of the LSST Science Pipelines, which have since been lifted.Raising d s beyond 4 offers increased kernel flexibility, thereby potentially reducing the artifact rate.
However, this approach carries the risk of overfitting, since the number of spatial polynomials scales proportionally with d s 2 (Alard 2000;Bramich et al. 2013).Additionally, it would result in an increase in computational time.
The primary goal of exploring the spatial degrees of freedom is to reduce artifacts while maintaining high efficiency and flux accuracy.We focus here on high-S/N sources at 20 mag (MAG20), which is far above the detection limit, and low-S/N sources at 23 mag (MAG23), which is close to the m 1/2 .Since the comparison focuses on the region that is away from the image edge,17 it is possible that the high d s s are performing worse near the image edge.More evaluations near the image boundary are still needed in future work.
In Table 1, we show the efficiency, artifact rate,18 mean, and standard deviation of the flux pull distribution for each spatial degree of freedom at MAG20 and MAG23.We find that the results from high S/N and low S/N are consistent.Up to d s = 4, the artifact rate is a monotonically decreasing function of d s .The fluctuation of the recovered mean pull of the flux of the injected sources is less than 0.020.The fluctuation of the standard deviation is less than 0.010.The fluctuation of the efficiency is less than 0.013.
In Figure 5, we show the postage stamps of difference images from different spatial degrees of freedom.Each column of stamps is made at the same position, while each row corresponds to a specific d s .By comparing the stamps, we find that high d s s, especially d s = 4, can produce cleaner difference images.As a result, the artifact rate decreases as the d s increases.

Detection Flags
The LSST Science Pipelines are configured to detect any possible source and characterize it with flags to indicate detections that are not real astrophysical sources.These 109 flags identify pixel-level features such as saturation, bad pixels, and cosmic rays as well as characteristics of the source in the subtraction such as moment and dipole fits.We use these flags to reject artifacts, which we here define as detections that are not in circular regions with 0 8 radii of known synthetic sources.The data are from MAG20 and MAG23, with host SB between 20 and 21 mag arcsec −2 .In this synthetic source injection analysis, we split the detected sources from each diaSrc table into two tables: one for the detected synthetic sources (MAG20: 1184 entries; MAG23: 1003 entries) and the other for detections that are artifacts (MAG20: 2446 entries; MAG23: 2464 entries).
We use the efficiency and artifact rate to examine the power of each flag to classify sources into real and artifact.We calculate these metrics for all flags and show the full table in Table 2. From this information in this table, we can select a small set of flags where (1) the efficiency of each flag is close to 1 at MAG20 and (2) each flag has a clear physical interpretation related to artifact classification.We document this "selected flags" set in Table 3.To evaluate the efficacy of this flag set, we compare the efficiency and artifact rate before and after applying the union of these flags to our detection.At MAG20, we find that the efficiency drops from 1.000 to 0.999, and the artifact rate drops from 695 deg −2 to 69 deg −2 .At MAG23, the efficiency drops from 0.847 to 0.791, and the artifact rate drops from 700 deg −2 to 68 deg −2 .These results Clustering pixels (second from left) come from subtraction residuals of background sources or background noise.Dipoles (second from right) require measurements and fitting; many are successfully identified, but there is a continuum of less-offset and lower signal-to-noise dipoles that are hard to identify.The dipole artifact is a subclass of the clustering-pixels artifact.Deconvolution (right) leaves characteristic rings from trying to invert information that is not available in the larger PSF image.Note.The mean and standard deviations are of the pull distribution of the difference between detected flux and injected flux (ΔF/σ).We apply a 5σ clip to the pull distribution to remove outliers.At MAG20, the percentages of outliers are (0.249%, 0.232%, 0.182%, 0.265%) at (d s = 1, 2, 3, 4).At MAG23, the percentages of outliers are (0.054%, 0.036%, 0.054%, 0.036%) at (d s = 1, 2, 3, 4).The default spatial degree of freedom is d s = 1.suggest that using the "selected flags" to sift artifacts significantly improves the artifact rate with only a small decline in efficiency.We strongly recommend the use of these flags when looking to identify a set of real astrophysical detections in diaSrc tables.

Artifact Definition
Now that we can identify artifacts using the "selected flags" discussed in Section 4.6, we examine their physical origins using artifacts from AL-based subtractions on the MAG20 data set.We define four types of artifacts that are commonly seen in difference images.The morphology of these artifacts is shown in Figure 6.
1. Saturation artifact: caused by saturated pixels, where the PSF shape is distorted.As a result, an optimal kernel solution does not exist.2. Clustering-pixels artifact: a clustering-pixels artifact appears as clusters of positive pixels and negative pixels.kernel has to "deconvolve" the PSF of the template but is not able to do so because the necessary information is not available.

Artifact Morphology and Flags
To help us understand the relationship between flags and artifact morphology, we split the "selected flags" defined in Section 4.6 into three groups: saturation flags, dipole flags, and shape flags.As their names suggest, saturation flags are set to True if the detected source contains saturated pixels.We name detected artifacts with the saturation flag set to True saturationflag artifacts and name the rest non-saturation-flag artifacts.Dipole flags are set to True for dipole artifacts.It is possible that a detected source visually appears as a dipole but has no dipole flags set to True.This could be related to the trigger mechanism and flux separation of dipole measurements.When the DIA pipeline is making a dipole classification for a detected source, it fits the fluxes of the negative and positive lobes and their corresponding centroids simultaneously.A detected source is classified as a dipole if the quadrature sum of the fitted fluxes is above the corresponding minimum S/N threshold and the fitted fluxes weighted by the total flux are both smaller than the maximum flux ratio threshold.Any detection that visually appears as a dipole but does not satisfy these criteria is not classified as a dipole artifact.Also, the fitting algorithm itself may have degeneracy in estimating dipole fluxes.Dipoles from faint and highly separated sources are almost indistinguishable from dipoles from bright and closely separated sources (Reiss 2016).The inaccuracy of dipole flux estimation could further affect dipole classification.Possible improvements such as using presubtraction images in dipole fitting are discussed in Reiss (2016).Shape flags are set to True if the detected source has second moments that are inconsistent with a Gaussian shape.More detailed definitions of these flag sets are given in Table 4.

Seeing Conditions
Artifact morphology can be affected by the relative seeing conditions of template images and science images.We split our image pairs defined in Section 3.3 into three classes based on the FWHM of their coadd (template)/calexp (science) image PSFs.These classes are defined as 1. broad: calexp FWHM coadd FWHM; 2. near: calexp FWHM < coadd FWHM and (calexp FWHM − coadd FWHM) / (coadd FWHM) > −0.05; and 3. sharp: In the broad class, calexps have larger PSFs than coadd exposures, which is the condition that the AL algorithm is capable of matching the coadd PSF to the calexp PSF.In the near class, calexp PSFs are smaller than coadd PSFs.However, the relative difference of FWHMs is still close.In the sharp class, calexp PSFs are much smaller than coadd PSFs.The AL matching kernel has to "deconvolve" the coadd PSF to match the calexp PSF.

Artifact Morphology and Artifact Fraction
Using the above definitions, we can study artifact morphology in each FWHM class.We start with calculating the proportions of artifacts in each flag group.The results are shown in Table 5.The fraction of artifacts in each class with saturation flags set to True is nearly 50%.Since they are subtraction residuals of bright background sources (starts, galaxies, etc.), it is important to understand at what background source S/N this issue occurs most frequently.We match background sources from the calexp src (Section 3.1) to artifacts using a one-direction-matching19 method with a 2″ search radius.Figure 7 shows the artifact fraction with respect to the calexp source S/N. 20There is an increasing trend between the artifact fraction and the source S/N.This comparison shows that using saturation flags to identify pixel issues can help us to remove saturation artifacts, especially when the source S/N is greater than 600.
Next, we study the artifacts that have no saturation flags set to True.These non-saturation-flag artifacts are detected under different seeing conditions with different artifact rates.For our The algorithm is based on the elliptical shape (Lupton et al. 2001;Bosch et al. 2017) of the detected sources.
The implementation (v23.0.0) used in our analysis flags negative pixel values, which is not what we want for time-variable objects that can be brighter and fainter than in the template image.This bug has been fixed in v25.0.0 of the LSST Science Pipelines.
analysis, we find that the rates of non-saturation-flag artifacts per square degree are 286 (broad), 312 (near), and 414 (sharp), which suggests that as the calexp PSFs become sharper, the artifact rate increases (Table 5).
The morphologies of non-saturation-flag artifacts are shown in Figures 8, 9, and 10.By inspecting postage stamps of these artifacts, we find that the majority of artifacts in the broad class are dipole artifacts and clustering-pixels artifacts.In the near class, the artifact morphology is similar to the broad class.However, in the sharp class, there are more ring artifacts, which indicates the PSF matching kernel is deconvolving the seeing condition of the template image.
To understand the source properties of the non-saturationflag artifacts, we match them to the src using the one-directionmatching method with a 2″ matching radius. 21Of the artifacts, 80.2% have matches in the src, which means most of these artifacts are also subtraction residuals of background sources.In Figure 11, S/N distributions of matched sources in each FWHM class are plotted.Most sources have S/Ns close to or higher than 200, which means bright sources are the main contribution to detection artifacts.This is reasonable because we can only observe artifacts from bright sources where subtraction residuals can exceed sky background fluctuations.It is hard for faint sources to produce visible artifacts because their subtraction residuals are overwhelmed by sky background noise.Other than bright sources, we also find clusters of faint sources in the S/N distributions of each class.In Figure 12, we plot a few stamps of faint sources from coadd exposures, calexps, and their corresponding artifacts from difference exposures.These plots show that peak pixels are the main causes of these artifacts.As to the artifacts that have no matches from the src, we show their morphology in Figure 13.Clearly, background noise is the origin of these artifacts.The artifact rate of this type is 62 deg -2 in the broad class, 57 deg -2 in the near class, and 79 deg -2 in the sharp class.

Dipole Artifacts
Because dipole artifacts are prevalent in all classes with the AL algorithm, we investigate whether the ZOGY algorithm also produces dipole artifacts at the same locations.Figure 14 shows postage stamps of AL dipoles and postage stamps of the ZOGY algorithm at the same positions.Both the kernel matching method (AL) and the cross-filtering method (ZOGY) have dipole artifacts with the same orientations at the same locations.
Here, we give a brief discussion of several different possible causes of dipole artifacts.
(1) Image registration.It is possible that dipoles are caused by small errors in the relative astrometric registration of the images.Applying astrometric calibration may reduce the registration error.However, even subpixel errors could be magnified by bright sources, thereby producing visible dipole artifacts that have positive-negative separations of multiple pixels.
(2) Sampling.Another possible origin of dipoles is the sampling process used for observing or warping exposures.Errors introduced in this process can be magnified by the brightness of background stars, which would result in dipole artifacts.(3) Moving object.An object with a small motion between the images can create a dipole that represents real astrophysical change.However, there are no stars with proper motion in the DC2 simulation, so this particular effect is not applicable to our testing here.(4) PSF matching.It is also possible that the AL algorithm finds an approximate kernel solution.An asymmetric kernel convolving a symmetric coadd source and subtracted by a symmetric calexp source could produce a dipole artifact.However, we also find dipole artifacts using the ZOGY cross-filtering method, which has no kernel fitting involved (Figure 14).We do not entirely exclude the kernel fitting explanation because the PSF models used for cross-filtering are measured from images.It is thus also possible that the approximate PSF used for cross-filtering caused the dipole artifact, which is mathematically similar to a nonoptimal AL matching kernel.(5) Poisson fluctuation.In the bright region, the Poisson fluctuation of background sources can exceed the background sky fluctuation, which leads to local bright/faint residuals.These residuals can spread to nearby pixels after convolution operation, which may appear as dipole artifacts in difference images.More accurate estimations of variance and covariance may increase the detection thresholds and thereby remove these artifacts from the diaSrc table.(6) Differential chromatic refraction (DCR).The atmosphere of the Earth refracts incident light, thereby deflecting the observed positions of sources (FILIP-PENKO 1982).This effect could result in dipole artifacts in difference images.Since the DC2 images do not include this effect, we exclude DCR as the cause of the artifacts in our analysis.
From this discussion, we conclude that the origin of the dipole artifacts is still undetermined.To thoroughly understand the cause will require control of the image resampling in more detail and carefully considering the resulting PSFs of the calexps and coadd templates.

Possible Improvements
We here summarize some potential improvements to the performance of the LSST Science Pipelines image subtraction algorithms.(1) Implement spatially varying delta bases.In our work, we find that the default kernel bases used in the DIA pipeline do not fully correct sampling or registration issues.Instead of using Gauss-polynomial functions as the bases, we suggest using delta bases (Bramich 2008;Miller et al. 2008), which are the most flexible bases for image difference.We also recommend including spatial variation in the delta kernel because noise properties and PSF shapes may vary across a CCD image.To avoid overfitting the model to noise using these more flexible bases, different regularization techniques (Becker et al. 2012;Bramich et al. 2016) should also be tested.
(2) Correct the kernel solution when using preconvolution or choosing calexps as templates.When the PSF of a calexp is sharper than the PSF of a coadd exposure, the image subtraction kernel needs to perform the deconvolution to the coadd exposure, which usually results in ring artifacts in the difference image.To avoid ring artifacts, one approach is to "preconvolve" the calexp using either its own PSF or a more generic Gaussian kernel.The other approach is to use the calexp as the template image and match its PSF to the PSF of the coadd exposure.Both approaches involve convolving the calexp, which adds correlations between pixels.The default normal equation of the image difference solution assumes the template is noiseless.To account for the effect of correlations, we suggest including the pixel covariance matrix Σ in the kernel solution of the AL algorithm.Due to the huge size of Σ, how to properly estimate the correlations of neighboring pixels and store Σ are worthy questions for consideration in implementation.
(3) Improve the numerical stability of the ZOGY decorrelation kernel and its AL equivalent.The ZOGY algorithm provides a source-detection optimal difference image with the advantage of noise decorrelation.However, the numerical instability caused by the division of small values at high frequencies can affect the quality of its difference image and equivalent PSF (Reiss 2017; Kovacs 2021).Kovacs (2021)  , which is either 0 or ∞.In practice, the fast Fourier transform applied to obtain frequency images cannot maintain this ratio as a smooth function of frequency due to numerical accuracy.To correct this issue, Kovacs (2021) proposes three possible workarounds, which are (a) using Gaussian models to create PSFs in Fourier space directly, (b) replacing the high-frequency noise using Gaussian limit values, and (c) using the score    noise whitening.Mathematically, this is equivalent to the zerophase component analysis (ZCA; Kessy et al. 2015) in complex space.The proof of this equivalence is given in the Appendix.
In reality, however, the white-noise assumption for the original template image and science image is not proper for bright sources, where the Poisson noise of bright pixels is dominant.
To obtain white noise for bright sources, it would be possible to apply a standard whitening process for the region around each bright source as long as we have its covariance matrix.
However, given the large number of bright sources, calculating the inverse matrix of the covariance matrix may be timeconsuming.A work-around method would be to treat each row or column of the stamp as a random vector and require the correlation between rows or columns to be white (Shi et al. 2016;Seghouane & Shokouhi 2018).This simplification significantly reduces the dimensions of the covariance matrix; however, the reliability of this method for image difference has not been tested.

Conclusions
We have evaluated the LSST Science Pipelines DIA pipeline using synthetic source injection.We selected seven coadd template exposures along with 10 calexps overlapping each coadd in the i band from the DC2 data set.By tracing the recovery of injected synthetic sources, we predict the range of the magnitude depth (m 1/2 ) of the pipeline for realistic simulated LSST images to be 23.40-23.71mag when the host SB ranges between 20-21 mag arcsec −2 and 24-25 mag arcsec −2 .We explored the effects of changing the AL kernel spatial degree of freedom on reducing artifacts with synthetic injections at MAG20 and MAG23.
We find that the default setup of the AL algorithm is good in terms of transient source detection and photometric measurements.At the injected magnitudes of 20 and 21, the efficiency is above 0.989, and more than 99% of the flux measurements are within 0.1 mag of the injected magnitudes.In the region away from the image edge, increasing the spatial degrees of freedom up to 4 can reduce the artifact rate without hurting efficiency and flux measurement.We calculate the efficiency and artifact rate of the flags from the diaSrc table.We find that by applying a proper combination of saturation flags, dipole flags, and shape flags to the detection from the diaSrc table, we can reduce the artifact rate from 695 deg −2 to 69 deg −2 at MAG20, and the efficiency only drops from 1.000 to 0.999.At MAG23, the artifact rate drops from 700 deg −2 to 68 deg −2 , while the efficiency drops from 0.847 to 0.791.To study the morphology of the artifacts and understand their origins, we split the artifacts into broad, near, and sharp classes, which correspond to the seeing condition where the calexp PSF is larger than the coadd PSF, the calexp PSF is smaller but less than 5% smaller than the coadd PSF, and the calexp PSF is more than 5% smaller than the coadd PSF.After removing saturation artifacts from the analysis, the artifact per square degree rate is 286 in the broad class, 312 in the near class, and 414 in the sharp class.We find dipole artifacts and correlated-pixel artifacts in subtractions from all classes.Ring artifacts are seen only in the sharp class, which is consistent with rings being decorrelation artifacts.By mapping these artifacts to the src of the original calexp, we find that most of the artifacts are subtraction residuals from regions around bright (S/N 200) stars.Possible causes of these artifacts include inexact PSF matching, registration issues, sampling errors, and Poisson fluctuation of bright pixels.To remove these artifacts from difference images, we suggest possible improvements, such as applying a set of more flexible kernel bases with spatial variation and improving the treatment of correlated noise in the images.back to the original coordinate system.This is exactly the same whitening procedure as the ZOGY deconvolution solution in one dimension.
In 2D, images can be represented as 1D row-ordered vectors, and PSFs can be represented as doubly circulant matrices.The Fourier transform can be represented as a Kronecker product of 1D Fourier transform matrices.The inverse of the 2D Fourier transform matrix can diagonalize circulant matrices with their Fourier transforms as eigenvalues (Jain 1989).Thus, the proof in 1D can be generalized to 2D.

Figure 1 .
Figure 1.Locations of host galaxies on a patch (4100 × 4100 pixels).Hosts with different SB are indicated with different colors.

Figure 3 .
Figure 3. S/N distribution of detected sources with true mag listed above each panel.Injections from different host SB are binned together.The efficiency at each source magnitude is shown on each plot.The vertical line in each plot shows the S/N = 5.5 detection threshold.The S/N distribution is closer to the threshold for fainter synthetic sources.

Figure 5 .
Figure 5. Postage stamps of difference images from different kernel spatial degrees of freedom (d s ).Each column shows one location on the subtraction image.Each row corresponds to a specific d s .

Figure 6 .
Figure 6.Artifacts commonly seen in difference images.Sources with saturated pixels (left) are easy to identify and marked with saturation flags in the diaSrc table.Clustering pixels (second from left) come from subtraction residuals of background sources or background noise.Dipoles (second from right) require measurements and fitting; many are successfully identified, but there is a continuum of less-offset and lower signal-to-noise dipoles that are hard to identify.The dipole artifact is a subclass of the clustering-pixels artifact.Deconvolution (right) leaves characteristic rings from trying to invert information that is not available in the larger PSF image.

Note.
Percentage of artifacts in each flag set of the broad class, near class, and sharp class.The first row shows the fraction of saturation-flag artifacts.The second row shows the fraction of non-saturation-flag artifacts that have dipole flags set to True and shape flags set to False.The third row shows the fraction of non-saturation-flag artifacts that have dipole flags set to False and shape flags set to True.The fourth row shows the fraction of non-saturation-flag artifacts that have both dipole flags set to True and shape flags set to True.The fifth row shows the fraction of non-saturation-flag artifacts that have both dipole flags and shape flags set to False.The last row shows the artifact rate (deg −2 ) of non-saturation-flag artifacts.

Figure 7 .
Figure 7. Artifacts come from the mis-subtraction of bright sources.Here we show the fraction of detections that are artifacts as a function of the S/N of the nearest background source within 2″.The dashed line shows the fraction of all artifacts.The solid line shows the fraction of artifacts that have no saturation flags set to True.

Figure 8 .
Figure 8. Morphology of non-saturation-flag artifacts from the broad class.The top row shows artifacts with both dipole flags and shape flags set to True.The second row from the top shows artifacts with dipole flags set to True and shape flags set to False.The second row from the bottom shows artifacts with dipole flags set to False and shape flags set to True.The bottom row shows artifacts with both dipole flags and shape flags set to False.

Figure 9 .
Figure 9. Morphology of non-saturation-flag artifacts from the near class.The top row shows artifacts with both dipole flags and shape flags set to True.The second row from the top shows artifacts with dipole flags set to True and shape flags set to False.The second row from the bottom shows artifacts with dipole flags set to False and shape flags set to True.The bottom row shows artifacts with both dipole flags and shape flags set to False.

Figure 10 .
Figure 10.Morphology of non-saturation-flag artifacts from the sharp class.The top row shows artifacts with both dipole flags and shape flags set to True.The second row from the top shows artifacts with dipole flags set to True and shape flags set to False.The second row from the bottom shows artifacts with dipole flags set to False and shape flags set to True.The bottom row shows artifacts with both dipole flags and shape flags set to False.

Figure 11 .
Figure 11.Signal-to-noise distribution of the sources in different FWHM classes.

Figure 12 .
Figure 12.Morphology of artifacts from faint background sources.(Top row) Postage stamps of coadd images.(Second row from top) Postage stamps of calexp images.(Bottom row) Postage stamps of artifacts.

Figure 13 .
Figure 13.Morphology of artifacts that have no matches from the src.These artifacts are visually similar in each class.Inspecting their stamps shows that they are from correlated noise.

Figure 14 .
Figure 14.(Top row) Postage stamps of coadd images.(Second row from top) Postage stamps of calexp images.(Second row from bottom) Postage stamps of dipole artifacts in subtraction using the AL Gauss-polynomial algorithm.(Bottom row) Postage stamps of subtractions using the ZOGY algorithm made at the same positions as the AL dipoles.The artifacts are consistent between the two methods.
Here the hat symbol represents the Fourier transform of each quantity, P t and P s are the PSFs of T and S, F t and F s are the flux scaling factors of both images, σ t and σ s are background noise levels, and F D is the normalization constant.The expression in the denominator of D plays the role of decorrelating the noise.

Table 1
Flux Difference Pull Distribution, Efficiency, and Artifact Rate with Different Kernel Spatial Degree of Freedom

Table 2
Efficiency and Artifact Rate of All Flags Note.Flags can be used for removing artifacts.For a specific flag, detected sources are classified as artifacts if the flag is set to True.The first row shows the results without applying flags.The last row shows the results of applying all of the above flags.

Table 3
Efficiency and Artifact Rate of Selected Flags Flags can be used for removing artifacts.For a specific flag, detected sources are classified as artifacts if the flag is set to True.The first row shows the results without applying flags.The last row shows the results of applying all of the above flags.

Table 4
Definitions of Saturation Flags, Dipole Flags, and Shape Flags Some of these flags are not actually for saturation but rather for other types of pixel-level issues.We include them here because they also indicate pixel issues.Visually inspecting detected sources shows that the majority of artifacts that have these flags set to True are saturation artifacts.
suggests that if we express the ZOGY D