The effect of quasar redshift errors on Lyman-$\alpha$ forest correlation functions

Using synthetic Lyman-$\alpha$ forests from the Dark Energy Spectroscopic Instrument (DESI) survey, we present a study of the impact of errors in the estimation of quasar redshift on the Lyman-$\alpha$ correlation functions. Estimates of quasar redshift have large uncertainties of a few hundred $\text{km s}^{-1}\,$ due to the broadness of the emission lines and the intrinsic shifts from other emission lines. We inject Gaussian random redshift errors into the mock quasar catalogues, and measure the auto-correlation and the Lyman-$\alpha$-quasar cross-correlation functions. We find a smearing of the BAO feature in the radial direction, but changes in the peak position are negligible. However, we see a significant unphysical correlation for small separations transverse to the line of sight which increases with the amplitude of the redshift errors. We interpret this contamination as a result of the broadening of emission lines in the measured mean continuum, caused by quasar redshift errors, combined with the unrealistically strong clustering of the simulated quasars on small scales.


INTRODUCTION
The study of dark energy, as a potential explanation for the accelerated nature of the expansion of the Universe, demands high precision measurements of the expansion rate. These measurements are possible with the use of standard candles or standard rulers, particularly those that are visible out to large distances, or equivalently, to early cosmic times. Riess et al. (1998) and Perlmutter et al. (1998) measured the flux and redshift of type-Ia supernovae (SNIa) which are standardizable candles. This enabled the calculation of luminosity distance, , as a function of redshift, which showed that the expansion of the Universe is accelerating. Since then, both the quantity and quality of recent SNIa data have contributed to the reduction of uncertainties on parameters describing dark energy (Scolnic et al. 2018;Brout et al. 2019).
The accelerated expansion has been confirmed using a completely independent probe: baryon acoustic oscillations (BAO) as a standard ruler. These acoustic oscillations in the primordial plasma, prior to recombination, left an imprint on the large-scale structure of the Universe that corresponds to the size of the sound horizon, , at the drag epoch. This scale manifests as a peak in the matterdensity correlation function at comoving separations ∼ 147 Mpc, or equivalently, as an oscillatory pattern in the power spectrum. In the transverse direction, the BAO peak measures the ratio ( )/ , where ( ) = (1 + ) ( ) is the comoving angular-diameter distance. In the radial direction, it determines ( )/ , where ( ) = / ( ) is the Hubble distance. These observables are used to infer the expansion history of the Universe and derive cosmological parameters of the models that describe it.
Since the first BAO measurements (Eisenstein et al. 2005;Cole et al. 2005) several spectroscopic surveys have been built with the goal of measuring the BAO scale in the distribution of matter. This distribution has been traditionally mapped with galaxies as a tracer. As galaxies become relatively faint above redshifts of = 1, quasars have been used to trace the matter field at those higher redshifts. At > 2, a new window has been recently opened to observe BAO using Lyman-( Ly ) forests, features seen in the spectra of quasars, caused by the absorption of light by neutral hydrogen (Busca et al. 2013;Slosar et al. 2013;Kirkby et al. 2013;Delubac et al. 2015;Bautista et al. 2017;de Sainte Agathe et al. 2019) and in crosscorrelation with quasars (Font-Ribera et al. 2014;du Mas des Bourboux et al. 2017;Blomqvist et al. 2019;du Mas des Bourboux et al. 2020).
An accurate estimate of the redshift of quasars is essential for determining the 2-point statistics used for cosmological inference. Errors above several hundred km s −1 on the redshift estimates cause the 2-point functions to be smeared in the line-of-sight direction, reducing the precision in BAO measurements (even though this smearing is accounted for in the modelling). The broad emission line centres in the spectra of quasars are not necessarily good indicators of the host galaxy redshift (also named systemic redshift) which makes it problematic to obtain precise measurements. Due to the complex dynamic of the line-emitting regions in quasars, the broad line centres can be shifted with respect to their expected locations in the quasar rest-frame. Using spectra with high signal to noise ratio from 32 epochs in the Sloan Digital Sky Survey Reverberation Mapping project (SDSS-RM), Shen et al. (2016) calculated the relative velocity shifts between pairs of lines, and between lines and the systemic redshift when available. The systemic redshifts of quasars were obtained when narrow emission lines and/or stellar absorption lines could be observed. Broad high-ionisation lines, such as CIII, are typically blue-shifted by tens or hundreds of km s −1 and may have a strong luminosity dependence. Conversely the velocity shifts tend to be smaller for low-ionisation lines such as MgII, with no luminosity dependence. This study enabled the authors to derive empirical recipes for unbiased estimation of redshift with uncertainties based on various lines across a range of redshifts. Template fitting software, such as redrock, 1 uses templates built from a principal component analysis (PCA), which cannot fully account for all spectral variations of quasars.
The eBOSS collaboration (Dawson et al. 2016;Alam et al. 2021) used several different redshift estimators for quasars, all included in the official SDSS quasar catalogue (Lyke et al. 2020) from the Data Release 16 (Ahumada et al. 2020). In the clustering analysis of DR16 Ly forests, du Mas des Bourboux et al. (2020) performed a detailed study of the impact of different redshift estimators on the BAO parameters, using both real data and mock catalogues. From mocks, they observed that variations in the uncertainty of BAO best-fit parameters of ∼ 0.5 with different redshift estimators, was consistent with statistical error. Analogously, photometric redshift uncertainties in galaxy clustering and BAO measurements have been found to reduce the constraining power on the Hubble parameter (Chaves-Montero et al. 2018).
The Dark Energy Spectroscopic Instrument (DESI, DESI Collaboration et al. 2016) has recently begun a five-year programme of observations, in which it will observe three times more > 2 quasars than SDSS. They will be used for clustering measurements with the Ly forest. DESI is expected to produce roughly 1% uncertainties in the BAO parameters from these forests. DESI is a multi-object optical spectrograph that receives light from 5000 optical fibres mounted at the focal plane of the 4-metre class Mayall Telescope, in Arizona, USA. The light of each object is split and dispersed onto three cameras, each one corresponding to blue, red, and infra-red wavelengths. The resolution of spectra is about 2000 -3200 in the blue, 3200 -4100 in the red and 4100 -5000 in the infrared end. For each sky pointing, the exposure time is dynamically tuned to the current observing conditions in order to match the signal to noise ratio obtained for a 1000 second exposure taken in ideal conditions. Spectra are reduced and calibrated with a fully automated spectroscopic pipeline. DESI observes simultaneously different types of targets: emission line galaxies, luminous red galaxies, quasars as tracers and quasars containing Ly forests. At the end of its 5-year programme, DESI expects to cover about 14k deg 2 of the observable sky. The DESI Ly forest survey aims to obtain four observations for more than 800k quasars with redshifts > 2.1, corresponding to a density of 60 deg −2 (Chaussidon et al. in prep.). There are three methods of redshift estimation being employed: template fitting with redrock and two machine learning algorithms, quasarNET (Busca & Balland 2018) and SQUEzE (Pérez-Ràfols et al. 2020). Routine visual inspection is not a feasible option as DESI is expected to observe O (10 6 ) quasars.
In this work, we quantify the impact of errors in the redshift estimates on the clustering of the Ly forest. In particular, we looked into the impact on BAO parameters derived from the Ly autocorrelation and the Ly -quasar cross-correlation, using synthetic versions of the completed DESI dataset (five years of observations). This paper is organised as follows. Section 2 describes the data sets used in the analysis. Section 3 describes the methods used in our analysis, and the results from the analysis on simulated data are presented in Section 4. Section 5 contains a description of the model for the contamination of the correlation functions by redshift errors.
In Section 6, we discuss the implications of our findings and present our conclusions.
In this work, conversion from angular and redshift separations to physical separation are made using a flat-LCDM model with Ω = 0.315.

SYNTHETIC DATA SETS
This work is based solely on results obtained on synthetic sets of Ly data that mimic properties of the DESI Ly survey. In this section we describe the basic principles behind the production of these synthetic data sets, the particularities of DESI data, some special sets used to test our hypotheses, and how we mimic the intrinsic errors in quasar redshift estimates.

Procedure for Ly mock creation
Synthetic Ly forest datasets are used by spectroscopic surveys such as eBOSS or DESI to test the BAO analysis pipeline and evaluate systematic effects. They are designed to reproduce the astrophysical and instrumental characteristics of the survey data. By having a large number of realisations of synthetic surveys, usually a few hundred, we can test systematic effects to high precision.
There are several methods to create mock Ly data. N-body hydrodynamical simulations Rossi et al. 2014;Walther et al. 2021;Chabanier et al. 2020) are among the most realistic methods to create forests but are too computationally expensive for producing hundreds of realisations containing volumes as large as the real surveys. Hybrid methods, (e.g., Peirani et al. 2014;Sorini et al. 2016), also rely on N-body simulations to calibrate a model used to produce quick lognormal mock catalogues. The method we use in this work to create synthetic data is based on correlated Gaussian random fields (Coles & Jones 1991;Le Goff et al. 2011;Font-Ribera et al. 2012a;Bautista et al. 2015), whose correlations follow a given input power spectrum. The random field mimics the matter density field in the Universe. Quasars are placed via Poisson sampling in regions where the density is larger than a given threshold. One dimensional density "skewers" along the lineof-sight to each quasar are drawn by interpolating the same random field. These can then be transformed to a transmitted flux fraction, representative of a Ly forest, by adding small-scale fluctuations, converting from density to optical depth, and adding redshift-space distortions (RSD).
We used an implementation of the Gaussian method of synthetic Ly forests, named L C L R (Farr et al. 2020a). C L R mock catalogues use the package CoLoRe 2 (Ramírez-Pérez et al. 2021) to place quasars in a lognormal density field. Skewers are drawn from each quasar, have small-scale power added to them and these are transformed into transmitted flux fractions using the L C L R package. 3 This process produces noiseless transmission fields that have correct clustering properties on scales relevant for BAO studies. L C L R stores skewers of metal absorption and a table of high column density systems (HCDs) in its output, which may be optionally added to the Ly skewers during subsequent stages of the pipeline. These simulated datasets cover the projected 5-year DESI footprint and contain around one million quasars in the redshift range ∈ [1.8, 3.8], of which 700,000 have > 2.1.

Simulating DESI spectra of quasars
We generated mock quasar spectra reproducing observational properties of the Dark Energy Spectroscopic Instrument, as described in Section 1. All of the mock data sets used in this work have density of forests of 50 deg −2 within the full DESI footprint, using the methodology described in du Mas des Bourboux et al. (2020). Prior to Survey Validation, this was the expected density for five years of DESI observations, but this target has since been revised upwards to 60 deg −2 . Spectral properties of our mock quasars are simulated with thepackage. 4 Continuum templates (for the unabsorbed flux of the quasar with emission lines) 5 are generated by using functions from the library (McGreer et al. 2021), which contains a broad set of tools to generate mock quasar spectra. Templates are composed by a series of broken power-laws with independent Gaussian slope distributions, and a set of emission lines defined by their rest frame wavelength, equivalent width and Gaussian r.m.s. width. The slopes and emission line profile distribution used for the mocks in this work uses a modified version of the BOSS DR9 model, which includes some emission lines from the composite model of BOSS spectra from Table 4 of Harris et al. (2016), and some adjustment of the equivalent widths so that the mean continuum resembles better the one obtained in eBOSS Data Release 14. DESI intend to co-add four observations of Ly quasars. We therefore convolve spectra to the instrumental resolution, and add pixel noise corresponding to 4000 second exposures (i.e. four times the nominal exposure time).
Astrophysical contaminants are often included in synthetic realisations of the Ly forest. Metal absorbers were added to mocks for the first time in Bautista et al. (2015) and were first modelled in the correlation function in Bautista et al. (2017), where the change in the BAO peak parameters caused by metal absorption was found to be less than 1%. Similarly, Font-Ribera & Miralda-Escudé (2012) simulated the impact of High Column Density absorbers (HCDs), and discussed its impact on the measured correlations. It is important to model these contaminants in BAO analyses (du Mas des Bourboux et al. 2020), but its impact is orthogonal to the effect discussed in this paper. For this reason, we decided not to include them in our mocks.
However, we do simulate the impact of non-linear peculiar velocities in the quasar redshifts, a phenomenon known as Fingers of God (FoG). Since our mocks are generated from Gaussian fields, we only have access to linear peculiar velocities. Therefore we simulate FoG by applying a random shift to the quasar redshifts drawn from a Gaussian distribution with an r.m.s. given by the parameter ,FoG . We use by default a value of ,FoG = 150 km s −1 , similar to the expected velocity dispersion in halos hosting quasars at ∼ 2, but in Section 4 we also use mocks with an extreme value of ,FoG = 500 km s −1 .

Simulating quasar redshift errors
Even though we try to capture the diversity of quasar continua in our mocks, fitting algorithms on simulated data often perform better than in real data (Farr et al. 2020b). For this reason, instead of trying to estimate redshifts from our mock spectra we decided to emulate errors in the pipeline redshift estimation. We add Gaussian random errors to the quasar redshifts in the mock catalogues, using by default an r.m.s. of , = 500 km s −1 .
It is important to highlight a key difference between how we simulate FoG and redshift errors. Even though they both add random shifts to the quasar redshift, the shift emulating FoG is applied before we generate the quasar continuum. On the other hand, the shift emulating redshift errors only affects the value of redshift in the quasar catalogue that will be used in the analysis, but it does not change the simulated spectrum.

No-QSO-clustering and no-forest mock data sets
As described in Section 5, the effect of redshift errors on the Ly clustering depends on both the amplitude of the quasar clustering and the smoothing of the mean continuum template.
In order to test these assumptions, we created two special types of mock data sets in addition to the standard mocks: • No-QSO-clustering mock data sets contain quasars randomly distributed in the volume, regardless of the local density, so both the auto-correlation of quasars and the QSO-Ly cross-correlation are zero by construction. The Ly auto-correlation is conserved, since Ly forests are constructed from the correlated underlying density field as in the standard mock sets.
• No-forest mock data sets were constructed assuming that quasar spectra have no Ly absorption, to allow analysis of the effect without Ly clustering. The quasars are placed at the peaks of the density field as in the standard mock sets, so the QSO auto-correlation is conserved. These data sets are effectively noiseless, having a simulated exposure time of 10 6 seconds.
In Section 4 we show how quasar redshift errors impact the observed clustering of these special mock sets, validating the assumptions of our contamination model.

METHODS
Current studies of the Ly forest correlation function for the measurement of baryon acoustic oscillations follow a rather simple approach, summarised in this section. Briefly, the steps are: • fit of quasar continua, • estimate of transmission and associated weights, • estimate of correlation functions, • estimate of distortion and covariance matrices, • fit of the BAO model.
Each of these steps can be performed using the publicly available code "Package for Igm Cosmological Correlation Analyses", . 6 (du Mas des Bourboux et al. 2021) We refer the reader to du Mas des Bourboux et al. (2020) for the full description of the methodology and its validation.

Continuum fitting
To compute correlations, we use the contrast, , , of the transmission ( ) at a given observer-frame wavelength of pixel and quasar . The transmission contrast is defined as 6 https://github.com/igmhub/picca where¯is the sample's mean transmitted fraction at the absorber redshift, assumed to be only a function of redshift (or observed wavelength if we assume a single transition, such as Ly ). We can convert between redshift and observed wavelength by assuming a given rest-frame wavelength of the absorption. In this work, we focus on the Ly absorption, for which = 1216 Å. We use a rest-frame spectral region between the Ly and Ly broad emission lines, i.e., ∈ [1040, 1200] Å. The observed frame wavelength of the spectra is obs ∈ [3600, 5500] Å.
The transmission , = ( ) is defined as the ratio between the observed flux in a given pixel and quasar, , , and the unabsorbed flux level, commonly referred as the continuum, , , such that the contrast can be written as (2) In du Mas des Bourboux et al. (2020) and previous studies, the continuum is assumed to be a universal function of wavelength in the rest-frame of the quasar, scaled by a per-quasar linear function of log-wavelength,ˆ, where = /(1 + ), is the redshift of the quasar, and are fitted parameters for each quasar,¯is also referred as the mean continuum.
In practice, due to noise, spectrograph resolution, and the non-Gaussian nature of the distribution of the transmission, it is hard to break the degeneracy between the mean continuum¯and the mean transmission¯, and estimate them separately. Therefore, in du Mas des Bourboux et al. (2020) they use Eq. 3 as the model for the product ,¯i n Eq. 2, so we can re-write it as simply: Current methods start by assuming a shape for¯( ), dividing the observed flux by it, and fitting for + log( ), assuming Gaussian statistics. Once all quasar spectra are fitted,¯is computed by stacking the ratio /[ + log( )] in the quasar rest-frame. This new mean continuum is then used again to fit for all parameters and . This process is repeated a few times until convergence. We highlight that for computing¯, an estimate of the quasar redshift is used. This estimate might be affected by intrinsic biases, scatter or measurement errors.
If the mean continuum were flat, the redshift smearing would have no effect on the derived template. However, because of broad emission lines in the mean continuum that are further broadened by the redshift errors, the derived template is systematically different from the true mean continuum. As we will see in Section 5, this fact combined with quasar clustering modifies the quasar-forest and forest-forest correlations.

Correlation functions
The auto-correlation function of Ly transmission fluctuations is defined as where ì is the separation vector between two forest pixels in the volume, which can be decomposed into separations parallel to the line-of-sight, , and transverse to the line-of-sight, ⊥ . The crosscorrelation between quasars and Ly transmission fluctuations is defined as where ( ì) is the fluctuation of the number density of quasars. The approximated formula on the right-hand side is valid under the assumption that quasars are sparse (shot-noise dominated), in which case the cross-correlation is simply the average Ly transmission around quasars at positions given by ì (see Appendix B in Font-Ribera et al. (2012b)). The auto-correlation function between quasars is QSO×QSO (ì ) = (ì) (ì + ì ) . The quasar autocorrelation is an important ingredient of our model in Section 5 and is estimated from the mock catalogues.
In practice, the ensemble averages in the above definitions of the correlation functions are in fact averages over the pairs of objects in a given volume. Correlation functions are estimated in bins of separation and ⊥ . The estimator used for the auto-correlation of the Ly forest isˆL where is the weight assigned to pixel . The weights used are described in Eq. 4 of du Mas des Bourboux et al. (2020). The sums are over all pairs of pixels for which their separation is within the bounds of bin . Analogously, the cross-correlation estimator iŝ where indexes quasars. The auto-correlation of quasars is computed with the Landy-Szalay estimator (Landy & Szalay 1993) where = ( , ) ∈ / is the weighted number of quasar pairs in bin , is the weighted total number of available quasar pairs in the volume. A Poisson sample of unclustered points, named randoms, is built following the geometry of the survey.
is the normalised number of random pairs in bin , while is the number of cross pairs between quasars and randoms. The weights assigned to the transmission contrasts are the inverse of the total pixel variance, which is assumed to be a combination of some intrinsic variance (function of observed wavelength only), and instrumental variance. The intrinsic variance is estimated from the full set of forests. The weights assigned to both forest pixels and quasars also take into account the evolution of their clustering with redshift, so ∝ (1 + ) . The default values used in the standard mocks are the same as the evolution in the eBOSS data, = 1.9 for forest pixels (du Mas des Bourboux et al. 2017) and = 0.44 for quasars (du Mas des Bourboux et al. 2019). Imperfect weighting is not expected to bias the results. For the special mocks, such as no-QSO-clustering or no-forest (see Section 2.4), the redshift evolution is neglected.

Distortion and covariance matrices
Our continuum fitting procedure uses information from the forest itself, so eachˆ, is a linear combination of all , from the same quasar . When computing the correlation functions, each pair of pixels brings with it contributions from their whole respective forests. This distorts the measured correlation function. Bautista et al. (2017) introduced a method to account for this distortion which was also used in subsequent analysis, including du Mas des Bourboux et al. (2020). If the difference between the true and the distorted is a linear function of log , the distorted correlation is a matrix times the undistorted correlation. This matrix is referred to as the distortion matrix. We compute these distortion matrices for our mock catalogues following the same procedure.
Real forests contain absorption by elements other than hydrogen, such as silicon, nitrogen and iron, which creates spurious correlations. None of our mock catalogues contain these elements, as the effect is unrelated to the topic treated in this work.
The covariance matrix of our measurements are estimated by subsampling, i.e., we divide the footprint into sub-regions and compute the correlation function in each sub-region . The covariance is written as which assumes that correlations between sub-regions are negligible.
The number of bins of our correlation functions is usually larger than the number of the sub-regions , so we smooth the covariance matrix by assuming that its correlation coefficients are only a function of Δ = − and Δ ⊥ = ⊥ − ⊥ . We average all correlation coefficients that have the same (Δ , Δ ⊥ ).

Modelling the correlations
We model the large-scale correlations following the procedure from du Mas des Bourboux et al. (2020), that we shortly describe here.
A given correlation function (auto or cross) is defined as a sum of a smooth part and a BAO peak part: Only the peak part depends on the BAO dilation parameters and ⊥ , defined as where is the comoving size of the sound horizon at drag epoch, (¯) = / (¯) is the Hubble distance, (¯) = (1 +¯) (¯) is the comoving angular diameter distance assuming a flat universe, and¯is the mean redshift of the measurement. The correlation function model between two tracers is the Fourier transform of an anisotropic biased power spectrum written as where ì is the wavevector, with modulus and = / ; and are the linear bias and redshift-space distortions parameters, respectively; bin accounts for the binning of the correlation function, NL is a empirical term that accounts for the non-linear effects on small scales, and QL is the linear matter power spectrum with a empirical anisotropic damping applied to the BAO peak component. The linear matter power spectrum is computed from a Boltzmann solver code, such as (Lewis et al. 2000). The non-linear term NL is only included for the cross-correlation in this work since our mock forests are built from Gaussian random fields and do not contain non-linear clustering. Given the Gaussian nature of both FoG and redshift errors in the simulated datasets, we use a Gaussian kernel with width to model both effects in the cross-correlation: This is similar to the Gaussian kernel proposed in Percival & White Linear density bias (Eq. 13) Linear redshift-space distortions parameter (Eq. 13) Lorentzian radial dispersion of velocities (Eq. 14) (2009), with a factor of two difference to take into account that there is only one quasar field in the cross-correlation.
The final correlation function model accounts for the distortion matrix, as discussed in Section 3.3. The distorted correlations are written as dist = ∑︁ model .
In this work, we only focus on the effect of redshift errors, so we do not add metals or high-column density systems to the mock catalogues. Therefore, our theoretical model does not have terms that account for these effects.

Fitting the BAO scale
BAO fits are made over separations of ∈ [10, 180] h −1 Mpc and between directions ∈ [0, 1] and ∈ [−1, 1] for the auto and cross-correlations respectively. The correlation function bin size is 4 h −1 Mpc , corresponding to 1590 separation bins for the auto and 3180 bins for the cross. The joint fit of auto-and cross-correlations uses a total of 4770 measurements. Four parameters are let free for the fit of the auto-correlations: , ⊥ , Ly and Ly . For the crosscorrelation, an additional parameter (Eq. 14) is also fitted. The same five are also let free for combined fits. Table 1 summarises the parameters used in this work.
During the creation of the C L R mocks, a Gaussian smoothing of 2 h −1 Mpc is applied to the model power spectrum to account for the low resolution of the simulation grid. This adds an extra (isotropic) smoothing to our mocks even before we add FoG or redshift errors. We account for this by fitting Gaussian smoothing terms for the whole model, similar to ( ) in Eq. 13, while fixing to zero on the fiducial mocks. These values are subsequently used as fixed parameters when fitting mocks that do contain FoG and redshift errors.

ANALYSIS OF SYNTHETIC DATA
In this section we present the analysis of the correlations measured from the simulated data described in Section 2, using the methodology from Section 3. Figure 1 presents the estimated correlation functions versus ( , ⊥ ) for of the average of ten independent mock realisations. The top panels show the cross-correlation between Ly forests and quasars while the bottom ones show the auto-correlation of Ly forests. The left panels are mocks without redshift errors ( , = 0 km s −1 ) while the central panels contain , = 500 km s −1 . The right panels show the difference between central and left panels, isolating the features in the correlation function caused by redshift errors. We can see that, for both the cross-and auto-correlations, these oscillatory features are located at small transverse separations ⊥ , decreasing in amplitude as ⊥ increases. Additionally, the crosscorrelation signal near zero separations shows the effect of smearing along the line of sight also caused by the redshift errors, and the oscillations occur only for negative values. These values occur when the Ly forest pixel is in front of the neighbouring quasar. Where there is a strong cross-correlation, there is also likely to be a strong correlation between the two quasars (as a forest is always in front of its own quasar). For positive , where the pixel is behind its neighbouring quasar, its own quasar will be even further behind, so the quasar auto-correlation will be less strong. The asymmetry in the signal, therefore, strongly suggests a dependence on the auto-correlation of quasars. Figures 2 and 3 show respectively the cross-correlation of the Ly forest with quasars and the auto-correlation of Ly forests, measured from different sets of mocks. Different panels in these figures show different wedges of the correlations, i.e., averages over ranges of = / . The green bands correspond to the average of ten fiducial realisations (with different cosmic variance and instrumental noise), using the standard value of ,FoG = 150 km s −1 and ignoring redshift errors ( , = 0); the pink bands are from ten mocks with a larger value for ,FoG = 500 km s −1 ; the blue bands show measurements on the fiducial mocks, after adding redshift errors to the quasar catalogues ( , = 500 km s −1 ). The width of the bands correspond to the standard deviation between the ten realisations of each dataset, i.e., an estimate of the errors for one realisation. The dashed black lines show the best-fit model obtained when analysing the fiducial set of mocks.

Impact of non-linear peculiar velocities (FoG)
Adding FoG to the quasar spectra has a negligible effect in the Ly auto-correlation, making it impossible to distinguish the green and pink bands in Fig. 3. This is expected since the redshifts of absorption lines do not depend on the quasar redshift. On the other hand, the cross-correlation with quasars is smoothed out by the random shifts of the quasar position along the line of sight. This causes small differences on BAO scales, but the impact is clearly seen on scales below 25 h −1 Mpc in Figure 2.
The impact of FoG on the best-fit parameters from the BAO fits can be seen in Table 2, where the results from the average of ten realisations are compared. Here again FoG have a negligible impact on the Ly auto-correlation, but the uncertainties on from the cross-correlation are 15% larger (from 0.42 to 0.48%).

Impact of quasar redshift errors
The blue bands in Figures 2 and 3 show the impact of redshift errors. As discussed in Section 2.3, we add these redshift errors to the catalogues after the quasar spectra have been simulated, i.e., the redshifts listed are different than the redshifts that were used to generate the quasar continua.
For correlations that are not along the line of sight (| | < 0.8), the impact of FoG and redshift errors are indistinguishable: the crosscorrelation is smoothed on small scales, and the auto-correlation is not affected. However, clear differences appear in the −1 < < −0.8 cross-correlation wedge (top left panel of Fig. 2) and more surprisingly in the 0.8 < < 1 auto-correlation wedge (left panel of Fig. 3). As shown in Table 2, redshift errors seem to degrade the BAO performance not only in the cross-correlations, but also in the Ly auto-correlation.
We believe that this is the first time that these features are detected and discussed. The right panels show the difference between the left and centre panels, isolating the contamination, which is seen as an oscillating signal at small transverse separations. Note that negative values correspond to Ly forest absorption lying between the neighbouring quasar and the observer. This scenario results in a larger auto-correlation between the two quasars than the case where the forest lies behind its neighbouring quasar. Table 2. Best-fit parameters from ten combined realisations of standard mocks, with simulated Fingers of God velocities of ,FoG = 150 or 500 km s −1 , and simulated redshift errors of , = 0 or 500 km s −1 . The fourth and fifth columns show the BAO parameters. The last column shows the best-fit value of the parameter describing the line-of-sight (Gaussian) smoothing affecting the cross-correlation, which includes FoG and other sources of error on the redshift measurement. The model used to fit the correlations does not attempt to account for the new effect discussed in this work. The value of can be expressed in units of h −1 Mpc by simply dividing by 0 = 100ℎ km s −1 Mpc −1 . In Fig. 4 we show the contaminated wedges for realisations with different redshift errors , = 0, 250, 500 and 750 km s −1 . It is clear that the amplitude of the spurious correlations grows monotonically with the amplitude of the redshift errors, and importantly, changes the broad-band shape of the correlation functions which has an impact on the fitting procedures. Table 3 shows results of BAO fits to mock realisations with increasing values of , . We report the average best-fit parameters of ten independent realisations for each case. The reported errors are therefore √ 10 ∼ 3.16 smaller than the expected errors of the full 5-year DESI survey. The best-fit dilation parameters from the auto-correlation function do not present significant changes when increasing , , while uncertainties do increase slightly for larger redshift errors. The best-fit values correlate well with the input , , but are not in agreement, likely due to the fact that also accounts for FoG, or that our model (Eq. 14) is not necessarily a good match to the Gaussian errors added to mocks.

Data Set
These results show that BAO measurements are not biased, even when considering large values for quasar redshift errors , . Only the estimated errors on and ⊥ are increased for larger values of , , likely due to the lack of modelling of the effect caused by redshift errors. Properly accounting for these features in the model could potentially help to reduce these errors. In the next section we discuss a potential way to achieve this goal.

MODEL FOR THE CONTAMINATION
In this section we describe theoretically the expected impact of redshift errors in the correlation functions and we will show that the two main elements that are responsible for the contamination are: (i) a systematic error in the mean continuum estimate, , (Eq. 3), and (ii) a non-zero quasar-quasar correlation function.
The first effect of redshift errors, if assumed to be reasonably distributed randomly around some central value (which is not necessarily the true value), is to smooth the estimate of the mean continuum¯( ). Emission lines present in the mean continuum are smeared. Figure 5 shows how the estimated mean continuum from mock catalogues is modified when increasing redshift errors. Known emission lines in the forest region include SIV 1063, 1073, FeII 1082, OIII 1084, PV 1118, 1128, and CIII* 1175. The smoothing of these lines in the continuum is clearly visible in the bottom panel, which shows the relative difference between mean continua with and without errors. Redshift errors of this magnitude introduce biases in the mean continuum of the order of half a percent. These biases have the typical shape of the subtraction of two line profiles with different widths.
If we assume a systematic error ( ) in the estimate of the mean continuum relative to the true mean continuum¯( ), as shown in the bottom panel of Figure 5, such that then the biased transmission contrast can be written from Eq. 4 aŝ * = (1 +ˆ) where second-order terms have been neglected in the approximation. When computing correlations (auto-or cross-with quasars) with this biased contrast field, we perform averages over the distribution of quasars and observed/rest-frame wavelengths. Given that quasars are not uniformly distributed over the volume, i.e., they are clustered, the average of will not average to zero. This gives the intuition for the second element in our model: the clustering of the quasars.  We only show the line-of-sight wedges that show spurious correlations caused by redshift errors. These significantly alter the broad-band shape, which will affect the fitting of the correlation functions. Table 3. Average best-fit parameters of 10 independent realisations of standard mocks, with simulated redshift errors of , = 0, 250, 500 and 750 km s −1 . All these mocks contain simulated FoG with ,FoG = 150 km s −1 . The reported errors are the estimated errors of the mean of 10 realisations so they are √ 10 smaller than the expected errors from the full 5-year DESI survey. The model used to fit the correlations does not attempt to account for the new effect discussed in this work. The value of output by the fitter originally in units of h −1 Mpc and was converted to km s −1 by multiplying it by ( )/(1+ ) = 103.9ℎ km s −1 Mpc −1 for = 2.3.  -508 -434 -363 -294 -227 -162 -99 -38 r [h 1 Mpc] . Correlation function of quasars in real space (no RSDs) in CoLoRe boxes (at z=2.05), compared with AbacusSummit-based mocks (Alam et al., in prep.) at z=1.4, with the best-fit linear model, fitted on scales larger than 25 h −1 Mpc. The deviations from linear theory in the CoLoRe mocks are significantly stronger than in AbacusSummit mocks. The high clustering at small scales makes these mocks particularly sensitive to the contamination from redshift errors.

Impact on quasar-Ly cross-correlation
We consider two quasars placed at redshifts 1 and 2 . Their separation vector is ì . We will study the cross-correlation between quasar and the forest with a given separation vector ì (corresponding to a given separation bin in a binned estimation of the correlation function), that can be decomposed into parallel separation and transverse separation ⊥ . Note that given the small angles involved, we can assume that the transverse component of ì is ⊥ = ⊥ . For a given and 1 , the forest pixel will be at the observed wavelength The corresponding quasar rest-frame wavelength of the same pixel is given by = /(1 + 2 ).
The cross-correlationˆis the stack of allˆ * (Eq. 17) lying at separation ì from the quasars.
The average can be written as a double integral over 1 and 2 . The first integral is weighted by the distribution of quasars versus redshift ( 1 ), while the second is weighted by the distribution of quasars around 1 at a given transverse separation ⊥ = ⊥ , which we denote ( 2 | 1 , ⊥ ). The excess probability of finding a quasar 2 at a distance ì from quasar 1 is given by the quasar-quasar correlation function (ì ), so Note that for a given set of 1 , 2 and ì , there is only a single possible value for . Also, ( ) is only defined in the forest region, usually between 1040 and 1200Å. Given that ( ) is a slow-varying function of compared to ( ), the first term in the square brackets is washed out by the integrals over , leaving The level of contamination therefore depends on the clustering of quasars and on the amplitude of , which is determined by the amplitude of redshift errors. In mock catalogues, we can estimate to use in the calculation of the contamination. Figure 6 shows the auto-correlation function of quasars from CoL-oRe and AbacusSummit simulations Maksimova et al. 2021;Hadzhiyska et al. 2021;Bose et al. 2021), without redshift space distortions. The AbacusSummit mocks are tailored for quasars from DESI SV (Survey Validation) (Alam et al., in prep). It can be seen that the small-scale clustering in the CoLoRe mocks is significantly stronger than in AbacusSummit. This makes the CoL-oRe mocks particularly suitable for testing this model.
Eq. 20 was used to calculate a model for the contamination in the cross-correlation function introduced by redshift errors. This equation requires two inputs, first, the systematic bias in the mean continuum estimate and, second, the quasar-quasar correlation function. We compare our model with measurements performed on mock catalogues in the following section.
The model for the contamination in the auto-correlation of Ly forests could be constructed as an extension of the crosscorrelation model. However, results on mock catalogues show that the case of the auto-correlation is more complex than the cross correlation, indicating that there must be more ingredients to be taken into account in order to correctly reproduce the contamination. We leave this exploration for future work and we focus on results for the cross-correlation in the next section.

Contamination in special mocks
The contamination due to redshift errors on the correlations was further investigated by repeating the analysis on two special sets of mocks. These are the no-forest and the no-QSO-clustering mocks, described in detail in Section 2.4. These are meant to test our hypothesis that the contamination arising from redshift errors depends on both the amplitude of the quasar clustering and the systematic error in the mean continuum estimate. This means that the effect should not depend in principle on the clustering of the Ly forest itself. Figure 7 shows the average cross and auto-correlation functions at ⊥ < 8 h −1 Mpc , plotted as functions of . Three sets of mocks are displayed: a stack of 10 realisations of standard mocks (blue), noforest (orange) and a single realisation of no-QSO-clustering mocks (green). Each panel compares the radial correlations for mocks with and without redshift errors of , = 500 km s −1 . The oscillatory features caused by redshift errors are clearly seen in standard and no-forest mocks, while as expected, no-QSO-clustering mocks do not present any visible effect. Figure 8 displays the difference Δ = 500 − 0 between the crosscorrelations with and without errors. We observe a very good agreement between Δ from standard and no-forest mocks, confirming our hypothesis. Figure 8 also shows the Δ = 500 − 0 for the model described in Section 5 as a dashed red line. This model was computed using the bias in the mean continuum as shown in Fig. 5 and the quasar auto-correlation displayed in Fig. 6. In the case of the cross-correlation, the agreement between standard, no-forest and the model is very good, both in terms of the shape of the features and their amplitudes.
The results on these special mocks and the qualitative agreement of our model for the contamination demonstrate that the effect of redshift errors in the cross-correlation depend both on the amplitude of the quasar clustering and on the systematic errors in the continuum shape.

DISCUSSION AND CONCLUSIONS
In this paper, we have described the effect of errors in estimating quasar redshifts on the Ly forest correlation functions, and the consequent impact on the BAO parameters. We used mock Ly forests with redshifts of ∈ [2.1, 3.8] designed to simulate five years of the DESI program, to which we added various amplitudes of both Gaussian random peculiar velocities (Fingers of God) and Gaussian errors in the quasar redshift estimates.
The two-point correlation functions exhibited unexpected systematic correlations at separations close to the line of sight when redshift errors were introduced (but which were absent when only large astrophysical FoG values were included). These features decrease in amplitude for increasing transverse separations, similar behaviour to the contamination caused by metals in the Ly forest. We found that these systematic correlations increase when increasing the amplitude of the redshift errors added to mocks. We believe it is the first time that this type of contamination has been observed and studied.
We analysed the impact of such correlations on fits of the baryon acoustic oscillations. We found that BAO parameters are not significantly shifted by the addition of Gaussian random redshift errors, for the three cases: cross, auto, and joint fits of auto+cross-correlations. Redshift errors also cause the uncertainties in the BAO parameters to increase. We believe that if the model accounts for the systematic correlations it could reduce these uncertainties, which is important for current Ly surveys. These results are based on averages of ten realisations of the full 5-year DESI survey.
We derived a simplified model for the contamination to correlations from redshift errors, based on two main ingredients: the quasar auto-correlation function and the systematic bias in the mean continuum. These hypotheses were tested using special sets of mock catalogues, with either no-QSO-clustering or no Ly absorption. Mocks with no-QSO-clustering do not exhibit the features, while mocks with no-forest do contain them, confirming our hypothesis. The amplitudes and shapes of the contamination in these special mocks well describe the systematic correlations in the more realistic mock sets in the case of the cross-correlation between Ly and quasars. Modelling the contamination for the Ly auto-correlation is left for future work.
A detailed study of the effects analysed here for the first time is necessary to optimise the constraining power of the Ly forest sample of DESI. Mock catalogues will need to properly include redshift errors with more realistic models, not necessarily Gaussian, and consider how different emission lines have different velocity shifts (e.g., Hewett & Wild 2010;Shen et al. 2016). The correlations between emission line velocity shifts could produce less smoothing of the forest continuum than implied by our prescription, which assumed random errors. This is because while the quasar redshift may be in error by some amount relative to the systemic redshift, the quasar redshift determined from emission lines redward of Ly may be a better predictor of the locations of the emission lines in the forest region than of the systemic redshift.
We did not study the issue of systematic biases in redshift estimates for quasars. Depending on which broad lines are present in the observed spectrum, systematic shifts may be introduced by spectral templates not accounting for them. The issue of systematic biases, and how they impact BAO constraints has been studied in Font-Ribera et al. (2013) and Glanville et al. (2021).
The methodology developed in this paper will be extremely useful in combination with a better understanding of the DESI data, to help develop fitting templates to account for these contaminations in the BAO analysis.
Finally, the contamination discussed in this paper could be an important systematic for studies that want to extract cosmological information from the full shape of the 3D correlations in the Ly forest (Cuceu et al. 2021). While the BAO measurements seem to be robust against them, it is possible that other cosmological inference might be biased if the impact of redshift errors is not taken into account.