An Overview of CHIME, the Canadian Hydrogen Intensity Mapping Experiment

The Canadian Hydrogen Intensity Mapping Experiment (CHIME) is a drift-scan radio telescope operating across the 400-800 MHz band. CHIME is located at the Dominion Radio Astrophysical Observatory near Penticton, BC, Canada. The instrument is designed to map neutral hydrogen over the redshift range 0.8 to 2.5 to constrain the expansion history of the Universe. This goal drives the design features of the instrument. CHIME consists of four parallel cylindrical reflectors, oriented north-south, each 100 m $\times$ 20 m and outfitted with a 256-element dual-polarization linear feed array. CHIME observes a two-degree-wide stripe covering the entire meridian at any given moment, observing 3/4 of the sky every day due to Earth's rotation. An FX correlator utilizes FPGAs and GPUs to digitize and correlate the signals, with different correlation products generated for cosmological, fast radio burst, pulsar, VLBI, and 21 cm absorber backends. For the cosmology backend, the $N_\mathrm{feed}^2$ correlation matrix is formed for 1024 frequency channels across the band every 31 ms. A data receiver system applies calibration and flagging and, for our primary cosmological data product, stacks redundant baselines and integrates for 10 s. We present an overview of the instrument and its performance metrics based on the first three years of science data, and we describe the current progress in characterizing CHIME's primary beam response. We also present maps of the sky derived from CHIME data; we are using versions of these maps for a cosmological stacking analysis as well as for investigation of Galactic foregrounds.

1. INTRODUCTION
The emergence of cosmic acceleration (the increasingly rapid expansion of the Universe since redshift ∼1.5) has signalled that either a gravitationally repulsive dark energy dominates the energy density of the Universe today, or that Einstein's General Relativity does not correctly describe gravity on cosmological scales. The impact of this discovery on fundamental physics and astrophysics is revolutionary, and decoding the physics of cosmic acceleration requires new, higher-quality measurements of the expansion rate of the Universe as a function of time.
Nature has provided a standard ruler with which to measure the expansion history of the Universe: the baryon acoustic oscillation (BAO) scale (Seo & Eisenstein 2003). Acoustic waves propagated through the primordial plasma in the early Universe for a fixed amount of time (379,000 years) until the plasma cooled and became neutral gas, primarily hydrogen. The distance these waves travelled has been precisely measured in the Cosmic Microwave Background (CMB) radiation (Hinshaw et al. 2013; Planck Collaboration et al. 2020). These waves imparted slight baryonic overdensities on the BAO scale which are imprinted in the large-scale distribution of matter in the Universe. By measuring cosmic structure as a function of time (i.e., redshift), we can deduce the apparent size of the BAO scale as a function of cosmic epoch, and hence the expansion history of the Universe.
The signature of BAO was first detected in large-scale structure, at redshift z ≈ 0.35 (Eisenstein et al. 2005) and z ≈ 0.2 (Cole et al. 2005), using galaxies as tracers. More recently, measurements of the BAO scale at redshifts up to z ∼ 0.8 have been made by observing the distribution of optically-detected galaxies, using either spectroscopic (Percival et al. 2007; Beutler et al. 2011; Blake et al. 2011; Padmanabhan et al. 2012; Anderson et al. 2012; Ross et al. 2015; Alam et al. 2017, 2021) or photometric (Seo et al. 2012; DES Collaboration et al. 2019) catalogs, and at higher redshifts in Lyman-alpha systems (e.g. Busca et al. 2013; Slosar et al. 2013; du Mas des Bourboux et al. 2020) and quasars (Ata et al. 2018; Neveux et al. 2020). All of these efforts produce measurements of the distance-redshift relation that are consistent with the notion that the dark energy is a cosmological constant with an equation of state $p_\mathrm{DE} = -\rho_\mathrm{DE}$ ($w = -1$) (Alam et al. 2021). However, improved precision in the distance-redshift relation is still possible because only a small fraction of the accessible large-scale structure has been mapped to date, especially at redshifts greater than 1. Several efforts are ongoing to map ever-larger volumes of large-scale structure to yield improved precision, particularly by the ground-based experiments DES (Dark Energy Survey Collaboration et al. 2016) and DESI (DESI Collaboration et al. 2016), and the space-based telescopes Roman (Akeson et al. 2019), Euclid (Amendola et al. 2018), and SPHEREx (Doré et al. 2014).
A complementary way to map the large-scale distribution of matter, called hydrogen intensity mapping, has been successfully demonstrated by several analyses (Pen et al. 2009; Chang et al. 2010; Masui et al. 2013; Switzer et al. 2013; Anderson et al. 2018; Wolz et al. 2021). The technique uses modest-angular-resolution observations of redshifted 21 cm emission from the hyperfine transition of neutral hydrogen to trace the distribution of hydrogen gas, and thus matter, in the Universe. Hydrogen intensity mapping allows the apparent angular and radial BAO scale to be measured through cosmic history without the expensive and time-consuming step of resolving individual galaxies.
While the intensity mapping technique was first demonstrated using conventional radio telescopes, a dedicated instrument is needed to make a measurement of cosmic acceleration with the sensitivity required to test dark energy models. In order to reduce power spectrum uncertainties due to sample variance, we need to map cosmic hydrogen over nearly half the sky, which requires a telescope with a much higher mapping speed than previously existed.
As described in this paper, the Canadian Hydrogen Intensity Mapping Experiment (CHIME) consists of an array of four 20 m × 100 m cylindrical telescopes, with no moving parts or cryogenic systems, which can observe the northern sky every day over the frequency range 400-800 MHz. As shown in Fig. 1, CHIME's angular resolution of ∼40 arcmin and frequency resolution of 390 kHz are well suited to measuring the BAO scale in 21 cm emission over the redshift range 0.8 ≤ z ≤ 2.5. This range covers the important epoch in cosmic history when the expansion transitioned from decelerating to accelerating (Riess et al. 2004). CHIME's large-scale structure map will constitute the largest survey of the Universe ever undertaken. In addition to facilitating measurements of the BAO scale, CHIME data will constitute a rich dataset for cross-correlating with other probes of large-scale structure. In a companion paper, we present a CHIME detection of cosmological 21 cm emission in cross-correlation with three separate tracers of large-scale structure extracted from the Sloan Digital Sky Survey (CHIME Collaboration et al. 2022a).
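The correspondence between the observing band, the redshift range, and the scales plotted in Fig. 1 can be checked with a short calculation. The sketch below is illustrative only: it uses the flat cosmology quoted in the Fig. 1 caption and the 1024 channels quoted in the abstract, neglects radiation, and assumes a round comoving BAO scale of 150 Mpc, a value that is not taken from this paper.

```python
import math

HI_REST_MHZ = 1420.405751768            # rest frequency of the 21 cm line
C_KM_S = 299792.458                     # speed of light in km/s
H0, OMEGA_M, OMEGA_L = 70.0, 0.3, 0.7   # cosmology quoted in the Fig. 1 caption
R_BAO_MPC = 150.0                       # ASSUMPTION: round comoving BAO scale


def redshift(freq_mhz):
    """Redshift at which 21 cm emission is observed at freq_mhz."""
    return HI_REST_MHZ / freq_mhz - 1.0


def hubble(z):
    """H(z) in km/s/Mpc for flat LCDM, neglecting radiation."""
    return H0 * math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance(z, steps=2000):
    """Comoving distance in Mpc, by trapezoidal integration of c / H(z)."""
    dz = z / steps
    total = 0.5 * (1.0 / hubble(0.0) + 1.0 / hubble(z))
    total += sum(1.0 / hubble(i * dz) for i in range(1, steps))
    return C_KM_S * total * dz


def bao_angle_deg(z):
    """Angle subtended by the assumed BAO scale (flat geometry)."""
    return math.degrees(R_BAO_MPC / comoving_distance(z))


def bao_freq_sep_mhz(z):
    """Observed frequency interval spanned by the BAO scale along the line of sight."""
    return HI_REST_MHZ * hubble(z) * R_BAO_MPC / (C_KM_S * (1.0 + z) ** 2)


# Channel width if the 400 MHz band is split evenly into 1024 channels.
channel_width_khz = (800.0 - 400.0) / 1024 * 1e3

for freq in (800.0, 600.0, 400.0):
    z = redshift(freq)
    print(f"{freq:.0f} MHz: z = {z:.2f}, BAO angle ~ {bao_angle_deg(z):.2f} deg, "
          f"BAO frequency separation ~ {bao_freq_sep_mhz(z):.1f} MHz")
```

Under these assumptions the band edges map to z ≈ 0.78 and z ≈ 2.55, and the channel width is 390.625 kHz, consistent with the quoted 390 kHz.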
The main challenge associated with 21 cm intensity mapping is the very bright synchrotron foreground emission from the Milky Way and from other nearby galaxies (e.g. Santos et al. 2005; Liu & Tegmark 2012). We are investigating several approaches to foreground filtering and subtraction that rely in various ways on recognizing the difference between the smooth Galactic spectrum and the chaotic BAO spectrum along each line of sight (e.g. Shaw et al. 2015). Separately, we note here that CHIME provides a detailed and high signal-to-noise ratio dataset for probing the interstellar medium. CHIME will map the northern sky in polarization, and we will apply the Faraday synthesis technique (Brentjens & de Bruyn 2005) to obtain three-dimensional information about magnetized interstellar structures in the Galaxy. This dataset will be without precedent in the Northern hemisphere and will form a component of the Global Magneto-Ionic Medium Survey (GMIMS). GMIMS is the first effort to measure the all-sky three-dimensional structure of the Galactic magnetic field, using telescopes around the world to obtain maps with sensitivity to the range of Faraday depth structures we expect in the diffuse medium (Wolleben et al. 2019, 2021); the CHIME frequency range is a critical component of GMIMS. CHIME has the same collecting area as the Green Bank telescope and also has a large fractional bandwidth and large instantaneous field of view. It scans the entire sky visible from Southern Canada at daily cadence with sub-millisecond sampling. The data from CHIME are passed commensally to separate instruments which search for fast radio bursts (FRBs), monitor known pulsars visible from the site, and search at high spectral resolution for 21 cm line absorption systems. Additionally, CHIME supports very long baseline interferometry (VLBI, Cassanelli et al. 2021) observations with other telescopes.
In Section 2, we present an overview of the CHIME instrument, including its mechanical design, analog and digital systems, and low-level data processing. In Section 3, we describe recent progress in characterizing CHIME's primary beam response. Section 4 is devoted to various performance metrics based on the first three years of science data, including sources of data loss, gain stability, thermal noise, excision of radio-frequency interference, and preliminary sky maps. We conclude in Section 5, discussing the outlook for future 21 cm measurements and showing an idealized forecast for the precision with which CHIME could measure the cosmic expansion history in the absence of foregrounds or systematics. (The details of this forecast are included in Appendix A.)

2. INSTRUMENT AND LOW-LEVEL PROCESSING
CHIME is a transit radio telescope. It consists of linear arrays of feeds along the focus of each of four cylindrical parabolic reflectors. The optical system has no moving parts, and CHIME scans the sky as the Earth turns. A photograph of the telescope and surrounding site is shown in Fig. 2.
In this section, we walk through the design of the instrument, showing how its main features have been designed coherently to meet the performance requirements established in Section 1. The signal flow is captured schematically in Fig. 3 and we will follow this same path in our description: from reflectors which define the field of view, through feeds and analogue electronics, to an FX correlator, and the digital back end we call the data receiver.
Figure 1. [...] to CHIME's angular and frequency resolution. Top: The blue solid curve shows the angular scale associated with $r_\mathrm{BAO}$, while the other linestyles show the first few harmonics (corresponding to the peaks of successive BAO "wiggles" in $k_\perp$ Fourier space, located at multiples of $k_\mathrm{BAO} \approx 2\pi/r_\mathrm{BAO}$). The shaded region shows the range of angular scales accessible to CHIME as a function of frequency, for antenna baselines ranging from 0.3 m to 100 m. The grey straight lines show the angular resolution associated with feed separations of 20 m and 100 m. Bottom: The solid curve shows the frequency separation associated with the line-of-sight BAO diameter for 21 cm radiation as a function of redshift. The other linestyles indicate the frequency resolution required to resolve the first two BAO harmonics in $k_\parallel$ Fourier space. CHIME's frequency resolution is 390 kHz. For all curves, we take $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.3$, and $\Omega_\Lambda = 0.7$. In both panels the shaded region denotes the frequency and redshift coverage of CHIME.
As described in the introduction, the frequency coverage of CHIME is chosen to interrogate the epoch when dark energy first emerged in the dynamics of the Universe. A wide observing bandwidth increases the total cosmic signal power and allows interrogation of a wide range in redshift. Limiting the frequency range to cover a factor of two eases the challenges in antenna design and allows digital sampling in the second Nyquist zone, which permits slower sampling and a substantial savings in the cost of electronics. CHIME takes advantage of the historic drop in the cost of low noise amplifiers and digital electronics to fill the aperture of its cylindrical reflectors with radio feeds in one dimension.
In this geometry, every feed scans the full North-South meridian synchronously and simultaneously, and the instrument scans the full overhead sky every day with no moving parts, reducing systematic errors.
Figure 2. [...] Signals are amplified at each feed and brought by low-loss coaxial cables to receiver huts located in commercial RF shielded rooms within customized RF-protective shipping containers, one located between the first and second cylinders, another between the third and fourth. After band-defining amplification, analog-to-digital conversion, a time-to-frequency transform and half of a 'corner-turn', signals are brought from the two receiver huts to additional RF rooms within the white shipping containers seen at the right, where the corner-turn is completed and a spatial transform and other processing are performed. The grey and black structure at the far right is an ambient air heat exchanger associated with the water-cooling system for the X-engine in the adjacent RF rooms. Behind that, also grey, is a 0.5 MW power substation to power the instrument. CHIME is located at the Dominion Radio Astrophysical Observatory, which is protected by law and the adjacent hills from terrestrial radio interference. In the background one can see five dishes of the DRAO Synthesis telescope and a solar radio monitor.

Site
CHIME is built at the Dominion Radio Astrophysical Observatory (DRAO), near Penticton, B.C., Canada. DRAO is operated as a national facility for radio astronomy by the National Research Council Canada. Working at the DRAO has provided the CHIME team with very welcome connections to a community of experienced radio astronomers and engineers.
The site is in the White Lake Basin, within the traditional and unceded territory of the Syilx/Okanagan people. Prior to construction we walked the land with elders, and during initial excavation Okanagan Nation observers were present. The site offers flat land protected from radio frequency interference (RFI) by Federal, Provincial, and local regulation and by surrounding mountains. The climate is semi-arid, with low snowfall levels (relative to other places in Canada), important for a stationary telescope. The DRAO's John A. Galt Telescope, a 26-m steerable single-dish telescope with an equatorial mount, is located 230 m east of the centre of CHIME, and 20 m North. We use the Galt Telescope for holographic beam mapping. The DRAO supports CHIME with roads, AC power, machine shop access, well-equipped electronics laboratories, office space, and staff accommodation.
The mountains around the observatory shield the site from RFI from nearby cities, but a significant portion of the CHIME frequency band is still contaminated by satellites, airplanes, wireless communication, and TV broadcasting bands. [...] UHF repeaters around 450 MHz. These features are clearly visible in the spectrum shown in Fig. 4. Besides cell-phone and TV-station bands that are static in nature, there are many sources of intermittent RFI events such as direct transmission from satellites and airplanes, as well as scattering of distant ground-based sources. One such event is visible in Fig. 4 from 460 MHz to 600 MHz at around 165 s. These scattering events typically appear as 6 MHz wide bursts which last for a few seconds, and are caused by the reflection of distant broadcast TV bands from meteor ionisation trails or aircraft.
Figure 3. Schematic diagram of data flow through CHIME. Signals focused onto a linear array of broad-band dual-polarization antennas are amplified (each polarization separately) using room temperature receivers with a noise performance below 30 K that amplify and filter the signals to 400 MHz to 800 MHz. The correlator is an FX design, where the F-engine digitizes and channelizes the signals from the 2048 analog receivers and also implements the majority of a corner-turn network that rearranges the channelized data for spatial correlation. The X-engine completes the corner turn and performs the cross-multiplications and averaging to compute the $N_\mathrm{feed}^2$ spatial correlation matrix separately at each frequency. The X-engine also performs additional real-time data processing operations to beamform and to increase spectral resolution for the pulsar, FRB, and absorber back end instruments.

Mechanical and Optical Design
The design of CHIME is focused on enabling the measurement of BAO across the redshift range where dark energy begins to impact the dynamics of the Universe. The spectral response, reflector geometry and RF feeds are designed together to form an instrument tuned to perform this measurement in a way that allows control and characterization of systematic errors. Total estimated cost was also a strong design driver.
Measuring BAO in the redshift range from 0.8 to 2.5 covers the region of interest for probing dark energy, and fills in a redshift gap which is sparsely covered by optical measurements. At these wavelengths, sufficient angular resolution to resolve BAO features in the power spectrum of the sky is easily achieved by a 100 m baseline (see Fig. 1).
Figure 5. Measured surface error of CHIME Cylinder A compared to a best-fit parabola plotted against cylinder X, the horizontal distance East of the vertex. Points on the surface are measured with a surveyor's total station tracking a retro-reflector on a small wheeled cart as it moves over the reflector surface. The survey accuracy is nominally 3 mm in 100 m, which has not been subtracted from the scatter seen here. The quantity plotted is half the optical delay error from the sky to the reflector to the focus, equivalent to simple surface error for a flat mirror. There are two main terms in the shape visible here. The mesh which forms the surface appears to sag approximately 1 cm in each of the 1 m gaps between the supporting purlins compared to the desired parabolic shape. Additionally, one can see that the rolled parabolic truss is formed of three segments which also depart from the desired shape by nearly 1 cm. The net surface deviation is 7.2 mm RMS, or λ/50 at CHIME's shortest wavelength. These deviations are clearly coherent over the entire structure of a cylinder. Each of the four cylinders looks similar to this one example in all the key features.
An East-West array of cylindrical, 100 m-long reflectors, each coupled to a linear feed array along its focus, meets these needs. Such a system scans a North-South stripe of the sky interferometrically and observes most of the 3/4 of the celestial sphere visible from our site every day as the Earth turns. Given that each feed in this system requires a feed response of ±1 rad along the cylinder axis, choosing a reflector shape to be an f/0.25 parabola allows the use of feeds with approximately symmetric angular response patterns. At this f-ratio, the focus is level with the edges of the reflector, protecting the feed array from terrestrial radiation.
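The statement that the focus of an f/0.25 parabola sits level with the reflector edges follows directly from the parabola equation z = x²/(4f). A minimal check, assuming the 20 m aperture quoted for each cylinder and an ideal parabolic cross-section:

```python
import math


def parabola_height(x_m, focal_length_m):
    """Height of the parabola z = x^2 / (4 f) above its vertex."""
    return x_m ** 2 / (4.0 * focal_length_m)


aperture_m = 20.0                       # East-West width of one cylinder
f_ratio = 0.25
focal_length_m = f_ratio * aperture_m   # 5 m above the vertex

# Height of the reflector rim, half an aperture away from the vertex.
edge_height_m = parabola_height(aperture_m / 2.0, focal_length_m)

# Angle from the downward optic axis at which the focus sees the rim.
edge_angle_deg = math.degrees(
    math.atan2(aperture_m / 2.0, focal_length_m - edge_height_m)
)

print(f"focal length = {focal_length_m} m, rim height = {edge_height_m} m")
print(f"rim seen from the focus at {edge_angle_deg:.1f} degrees off axis")
```

The rim height equals the focal length, so the feed views the reflector edges at ±90° from the optic axis and has no direct line of sight over the rim to the ground.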
The required East-West separation of feed arrays can be achieved by varying the number of cylinders and the aperture of each. Deploying four 20 m-aperture reflectors was chosen as a reasonable compromise of costs of the reflectors and costs of the electronics to collect and process the signals while still providing massively redundant measurements of the most important (u, v) baselines. This redundancy simultaneously provides lower system noise and protection from minor variations of the response of individual elements of the instrument.
We describe the layout of the telescope in a 3D Cartesian system with +Z pointing to the zenith, +X to the East and +Y North. Thus, the linear feed arrays are oriented along the Y axis with X and Y polarization directions. When we describe the angular response of the telescope we use the orthographic projected angles x and y defined in Section 3.2.
A steerable telescope can be turned to low elevation angles to shed snow, but this is not possible with the CHIME reflectors. Therefore, the reflector surface is formed with wire mesh to allow snow to fall through. Larger gaps in the mesh shed snow with more assurance but also allow thermal radiation from the ground to leak through to the focus, raising the system temperature. Heavy wire gauge lowers the RF leakage. Using tools from Mumford (1961), we evaluated RF leakage across the CHIME band of commercially available sheets of heavy-duty mesh, settling on 19 mm spacing woven mesh made of 2.2 mm diameter galvanized steel. This material is easily available in large flat sheets. The leakage through these sheets adds from 1 K to 2 K to the system temperature across the CHIME band.
The central 78 m of each focal line is instrumented with feeds and low noise amplifiers (LNAs). The 100 m-long reflectors intercept the beams of the end feeds out to a zenith angle of 65°. These end feeds do see more RFI and more thermal loading than typical feeds, and this is accounted for in our analysis pipeline (see Section 4.1).
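The quoted 78 m and 65° figures can be related by simple geometry. The sketch below is a rough estimate, not the authors' calculation: it assumes 256 feeds at the 305 mm spacing given in the Analog System section, a 5 m focal length (f/0.25 with a 20 m aperture), and treats the reflector along its axis as a flat mirror at the vertex line.

```python
import math

FEEDS_PER_CYLINDER = 256
FEED_SPACING_M = 0.305          # spacing quoted in the Analog System section
REFLECTOR_LENGTH_M = 100.0
FOCAL_LENGTH_M = 0.25 * 20.0    # ASSUMPTION: f/0.25 parabola, 20 m aperture

# Length of focal line occupied by feeds.
instrumented_m = FEEDS_PER_CYLINDER * FEED_SPACING_M

# Reflector length extending beyond the instrumented section at each end.
overhang_m = (REFLECTOR_LENGTH_M - instrumented_m) / 2.0

# Zenith angle at which a ray reflected at the end of the vertex line
# still reaches a feed at the end of the array (flat-mirror approximation).
intercept_deg = math.degrees(math.atan(overhang_m / FOCAL_LENGTH_M))

print(f"instrumented length ~ {instrumented_m:.2f} m")
print(f"overhang per end    ~ {overhang_m:.2f} m")
print(f"intercept angle     ~ {intercept_deg:.1f} deg")
```

With these inputs the instrumented length is about 78 m and the intercept angle comes out close to 65°, in line with the values stated above.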
The reflector structure itself was designed in collaboration with Empire Dynamic Systems, Coquitlam BC, a civil engineering firm with substantial experience building astronomical facilities using standard steel fabrication techniques. Each 8 m-long section of the reflector is formed from three panels. These are rolled steel beams connected by 8 m long purlins running parallel to the axis, assembled on site and lifted into place. The mesh reflector surface is bolted to the purlins once the structure of an entire cylinder is complete. The structure is supported on steel legs which stand on cement footings placed deep enough that the base is below the anticipated frost depth.
The surface accuracy, shown in Fig. 5, corresponds to between λ/50 and λ/100 across the CHIME band. The surface errors are dominated by two terms: a consistent imperfect shape formed by the purlins welded to the curved steel frames and by almost 1 cm of sag of the mesh in each of the 1 m gaps between purlins. These perturbations are coherent for the full length of each cylinder in the North-South direction and were measured by tracking a retro-reflector across the full surface using a surveyor's total station.
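As a quick consistency check, the 7.2 mm RMS surface deviation quoted in the Fig. 5 caption can be expressed as a fraction of the observing wavelength at the edges of the 400-800 MHz band:

```python
C_M_S = 299_792_458.0     # speed of light in m/s
SURFACE_RMS_M = 7.2e-3    # RMS surface deviation from the Fig. 5 caption


def wavelength_m(freq_mhz):
    """Free-space wavelength at the given frequency."""
    return C_M_S / (freq_mhz * 1e6)


def wavelengths_per_rms(freq_mhz):
    """N such that the surface RMS equals lambda / N."""
    return wavelength_m(freq_mhz) / SURFACE_RMS_M


for freq in (800.0, 400.0):
    print(f"{freq:.0f} MHz: lambda = {wavelength_m(freq) * 100:.1f} cm, "
          f"RMS = lambda/{wavelengths_per_rms(freq):.0f}")
```

This gives roughly λ/52 at 800 MHz and λ/104 at 400 MHz, matching the stated range of λ/50 to λ/100.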
The ground plane of the linear feed array, at the focus of the cylinder, is just wide enough that it can shield the narrowest building code-compliant walkway placed above it. Removable panels of the walkway facilitate access to amplifiers and cables. Access stairs at the North end of every focal line are in line with the optic axis and the same width as the ground plane.
Observations of bright point sources acquired with CHIME exhibit an unexpected phase error that scales linearly with east-west baseline distance, frequency, and the sine of the source's zenith angle. This can be explained by a clockwise rotation (looking down from the sky) of the telescope structure by 0.071° ± 0.004° with respect to the true astronomical north-south direction. Alternatively, it can be explained by a linear offset in the north-south positions of the feeds from one cylinder to the next of −2.73 ± 0.15 cm per cylinder (from west to east). The quoted values were measured by minimizing the phase of visibilities when beamformed to the location of 24 bright point sources ranging in declination from 5° to 65°. We are currently unable to distinguish between these two explanations due to confusion between this effect and the phase of the beam response as a function of hour angle. We assume an overall rotation of the telescope when constructing the baseline distances that are used in our analyses.
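The two explanations are equivalent when the rotation angle and the per-cylinder offset are related through the cylinder-to-cylinder spacing. The sketch below infers that spacing from the two quoted numbers and evaluates the size of the resulting phase error. The phase expression (a north-south baseline component projected onto a source on the meridian) is a simplified model assumed here for illustration, and only magnitudes are computed; sign conventions are not addressed.

```python
import math

C_M_S = 299_792_458.0
ROTATION_DEG = 0.071              # quoted rotation of the structure
OFFSET_PER_CYLINDER_M = 0.0273    # quoted magnitude of the N-S offset per cylinder

# East-west spacing for which the rotation reproduces the quoted offset.
implied_spacing_m = OFFSET_PER_CYLINDER_M / math.tan(math.radians(ROTATION_DEG))


def phase_error_rad(ew_baseline_m, freq_mhz, zenith_angle_deg,
                    rotation_deg=ROTATION_DEG):
    """Magnitude of the phase error for a source on the meridian (simplified model)."""
    ns_offset_m = ew_baseline_m * math.tan(math.radians(rotation_deg))
    wavelength_m = C_M_S / (freq_mhz * 1e6)
    return (2.0 * math.pi * ns_offset_m
            * math.sin(math.radians(zenith_angle_deg)) / wavelength_m)


print(f"implied cylinder spacing ~ {implied_spacing_m:.1f} m")
print(f"phase error, one-cylinder baseline, 600 MHz, 30 deg zenith angle: "
      f"{math.degrees(phase_error_rad(implied_spacing_m, 600.0, 30.0)):.1f} deg")
```

The function is linear in east-west baseline and frequency and proportional to the sine of the zenith angle, which is the scaling described in the text.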

Analog System
The analog signal path consists of 256 dual-polarized cloverleaf antennas (Deng 2020; Deng & Campbell-Wilson 2014) in a linear feed-array along the focus of each cylinder, with each linear polarization coupled to a low-noise amplifier (LNA), coaxial cables, a band-defining filter and amplifier (FLA) and the input to an analog-to-digital converter (ADC). A single channel is shown in Fig. 6. The system components have been designed together to optimize overall performance for interferometric measurement of the BAO. With 256 dual-polarized antennas per cylinder and four cylinders, there are 1024 antennas and 2048 analog signal chains.
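The element counts above set the size of the correlation problem. The short tally below reproduces the antenna and signal-chain counts and, as an illustration, the size of the correlation matrix per frequency channel. The split into unique products relies on the standard fact that the matrix is Hermitian; it is not a statement about how CHIME stores its data.

```python
CYLINDERS = 4
FEEDS_PER_CYLINDER = 256
POLARIZATIONS = 2

antennas = CYLINDERS * FEEDS_PER_CYLINDER      # dual-polarized feeds
signal_chains = antennas * POLARIZATIONS       # one analog chain per polarization

n = signal_chains
full_matrix = n * n                      # every ordered pair of inputs
unique_products = n * (n + 1) // 2       # Hermitian matrix: upper triangle plus diagonal
cross_products = n * (n - 1) // 2        # excluding auto-correlations

print(f"antennas = {antennas}, signal chains = {signal_chains}")
print(f"N^2 = {full_matrix}, unique = {unique_products}, cross only = {cross_products}")
```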
Each cloverleaf antenna, together with its image antenna in the ground plane, has an effective focus nominally located at the ground plane, independent of frequency. The radiating board, whose current pattern is shown in Fig. 7, is designed to have a smooth petal shape in order to be free of resonances and match to the CHIME LNA over the octave bandwidth from 400 MHz to 800 MHz. Deng (2020) described this optimization. For each linear polarization, pairs of balanced signals from the four petals are combined via a tuned set of microstrip transmission lines (a balun) to form a single-ended signal at the input to the LNA on the base of the antenna. The petals are printed on the top and bottom surfaces of thin (0.031") FR4 PCB material and liberally connected with vias, while the stem and base are printed on low-loss Arlon DiClad 880 (Dk=2.2) material using ordinary printed circuit techniques (Leung 2008).
Figure 7. Top: The simulated current pattern on the petal-shaped radiating board of the cloverleaf antenna at 600 MHz. Feeds are constructed using commercial printed circuit board (PCB) materials and techniques, resulting in precise and economical antennas. Bottom: Measured S11 of the two polarizations of the cloverleaf antenna. The design substantially exceeds the goal to have a return loss of more than 10 dB over the full CHIME band, illustrated by the horizontal dashed line.
Feeds are 305 mm apart along the focal line (the telescope Y axis), and communicate with one another with coupling coefficients that depend on separation, polarization, signal frequency and angle of incidence. Coupling between feeds separated by as much as five times the basic interval is not negligible. The baluns are designed to produce an effective impedance of each element of the linear antenna array, including these coupling terms, which is noise-optimal for our LNA. Balun designs are therefore different for X- and Y-polarized elements because inter-feed coupling is stronger for the X ($\vec{E} \perp$ to separation) polarization than for Y ($\vec{E} \parallel$ to separation). The calculated noise temperature for the central element of a linear array is shown in Fig. 8.
Figure 8. [...] Using measured feed-to-feed coupling parameters (S21, S31, S41, ...), the effective impedance for the central feed in a linear array has been calculated as a function of frequency and incident angle. The noise is calculated using this impedance and a high-fidelity model of our LNA performance. Because of the stronger coupling for X polarization, particularly in the vicinity of the feature near 550 MHz, 15 elements are used in the X-impedance model, and 13 for Y. The sharp feature at 500 MHz in both polarizations is a property of isolated CHIME antennas.

Figure 9. Sections of the modelled angular response of a CHIME feed in the E and H planes for several frequencies across the CHIME band (horizontal axes: angle of incidence in degrees). Although the dual-feed antenna is symmetric with respect to its X and Y axes, each beam is slightly elliptical between its E and H planes, and therefore Y-polarized and X-polarized beams illuminate the reflector differently. The vertical dashed lines in the E and H panels are at ±90°, corresponding to the edges of the reflector for X and Y polarized radiation.
Fig. 9 shows models of the angular response of an individual feed, modelled using CST Studio (Simulia 2022), for several frequencies across the CHIME band. As desired for feeds facing an f/0.25 cylindrical reflector, the beam shape is broad and the beam width is largely independent of frequency over the CHIME band. Notice that the E-plane and H-plane beam widths are slightly different from each other. Therefore the X-polarized and Y-polarized channels have slightly different illumination patterns on the reflector, and slightly different far-field angular response patterns. The consequences of this variation will be discussed in Section 3.
The amplification and phase response of the remaining analog chain are plotted in Fig. 10. The very sharp band edges at 400 MHz and 800 MHz are designed to allow half-Nyquist sampling of the signal. The response is achieved with a custom bandpass filter built for CHIME by Mini-Circuits, model BPF-600-2+, and installed following the first gain stage of the second stage amplifiers (FLA). One sees in Fig. 8 that the LNA noise across the CHIME band is roughly 20 K. The gains of the LNA and FLA are chosen so that all other noise contributions are minor. The FLA contributes 0.6 K at the very top end of the CHIME band. Cable losses and ADC input noise are less than this.
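The reason sharp band edges at 400 MHz and 800 MHz matter is that sampling at 800 MHz (the cadence given in the Fig. 10 caption) places the whole band in the second Nyquist zone, where every input frequency folds onto a unique baseband frequency. The following sketch shows the folding rule for an ideal sampler; the function names are illustrative and not part of any CHIME software.

```python
def nyquist_zone(freq_mhz, sample_rate_mhz=800.0):
    """1-based Nyquist zone containing freq_mhz (zone edges excluded)."""
    return int(freq_mhz // (sample_rate_mhz / 2.0)) + 1


def aliased_frequency(freq_mhz, sample_rate_mhz=800.0):
    """Baseband frequency at which freq_mhz appears after ideal sampling."""
    folded = freq_mhz % sample_rate_mhz
    if folded > sample_rate_mhz / 2.0:
        # Even-numbered zones fold back with the spectrum reversed.
        return sample_rate_mhz - folded
    return folded


for freq in (450.0, 600.0, 750.0):
    print(f"{freq:.0f} MHz -> zone {nyquist_zone(freq)}, "
          f"appears at {aliased_frequency(freq):.0f} MHz")
```

In the second zone the spectrum is reversed (750 MHz appears at 50 MHz, 450 MHz at 350 MHz), so any signal leaking in from outside 400-800 MHz would fold on top of the science band. This is why the filter must reject out-of-band power before digitization.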
The non-linear response coefficients for the CHIME analog chain are plotted in Fig. 11, with all coefficients referred to the LNA input. By design, the system third-order intercept point (IP3) within the CHIME band is dominated by that of the ADC. The LNA and the first stage of the FLA are not protected by the bandpass filter, so in principle strong out-of-band RFI could produce in-band harmonics from non-linear response of the front end. Extreme care has been taken with the non-linearity of the front end electronics to avoid this. RFI at the CHIME site does not reach the levels that would produce a non-linear response in our electronics.
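Referring intercept points to a common location amounts to subtracting the net gain that precedes each measurement point. The sketch below applies this to the peak gains and intercept points quoted in the captions of Figs. 10 and 11. It is a simplified illustration: it uses peak rather than frequency-dependent gains, takes the cable loss between LNA and FLA as a hypothetical parameter (default 0 dB), and ignores the short cable between the FLA and the ADC.

```python
import math

LNA_GAIN_DB = 42.0    # peak gain, Fig. 10 caption
FLA_GAIN_DB = 38.0    # peak gain, Fig. 10 caption
LNA_OP3_DBM = 30.0    # output intercept of the LNA, Fig. 11 caption
FLA_OP3_DBM = 35.0    # output intercept of the FLA, Fig. 11 caption
ADC_IP3_DBM = 13.0    # intercept at the ADC input, Fig. 11 caption


def refer_to_lna_input(level_dbm, gain_before_db):
    """Translate a level at some point in the chain back to the LNA input."""
    return level_dbm - gain_before_db


def stage_limits_dbm(cable_loss_db=0.0):
    """Input-referred IP3 of each stage; cable_loss_db is a HYPOTHETICAL value."""
    gain_to_fla_output = LNA_GAIN_DB - cable_loss_db + FLA_GAIN_DB
    return {
        "LNA": refer_to_lna_input(LNA_OP3_DBM, LNA_GAIN_DB),
        "FLA": refer_to_lna_input(FLA_OP3_DBM, gain_to_fla_output),
        "ADC": refer_to_lna_input(ADC_IP3_DBM, gain_to_fla_output),
    }


def combined_ip3_dbm(limits_dbm):
    """Standard cascade rule: reciprocal intercept powers (in mW) add."""
    total = sum(10.0 ** (-p / 10.0) for p in limits_dbm)
    return -10.0 * math.log10(total)


limits = stage_limits_dbm()
for stage, value in limits.items():
    print(f"{stage}: {value:.1f} dBm referred to the LNA input")
print(f"combined: {combined_ip3_dbm(limits.values()):.1f} dBm")
```

In this simplified picture the ADC has the lowest input-referred intercept by a wide margin, consistent with the design goal that the ADC sets the system limit.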
It is worth a few remarks about the technical details of deploying 4,000 amplifiers and a similar number of cables over a 100 m square. The LNA and FLA are built into folded steel boxes which are soldered shut. A small slab of RF absorber is glued inside the FLA boxes to suppress oscillations of the final stage to which earlier generations of our amplifier were prone. Aluminum segments of the focal line which we call cassettes, consisting of four antennas, eight LNAs and associated 1 m long SMA-to-N type cables, are assembled indoors and carried to the focal line where they are mounted in place and bolted to each other. Thus, the inter-feed spacing is set by digital machining. The 50 m low-loss N-type coaxial cables connecting the LNAs to the FLAs at the receiver hut are cut to be the same length to within 0.1%, and the optical delay of each cable has been measured separately. Excess cable length for the antennas nearest the hut is stored in cable trays running the length of each cylinder in a geometry we call an optical trombone. A full set of S-parameters is measured at the factory for each cable and serial numbers for each are recorded on bar codes. This is the practice for all components of the analog chains. During system assembly, pair-wise connectivity of all analog components is recorded using a hand scanner and an interactive script operating on a mobile device.
Figure 10. Top: Gain of the analog chain from the LNA at the CHIME feed to the ADC input. The vertical dashed lines show the edges of the second Nyquist band for the CHIME ADC sampling cadence of 800 MHz, corresponding to the CHIME bandwidth of digital signals. The chief elements in this analog chain are a low noise amplifier with a peak gain of 42 dB and a gentle roll off above f = 1 GHz, a filter amplifier with a peak gain of 38 dB and a well defined passband provided by a custom filter from Mini-Circuits (BPF-600-2+), 50 m of low loss LMR-400 type coaxial cable and 5 m of higher loss cable located within the receiver huts. Bottom: The sum of measured phase shifts of all components of the analog chain plotted against frequency. A single delay term is subtracted to show a flat phase curve at the centre of the band. Phase shifts associated with the very steep edges of the CHIME band-defining filters are evident.
The FLAs sit within a radio-frequency shielded room with their input connectors protruding through a bulkhead in the wall. DC power is supplied to the LNA from the FLA over the coaxial cable. The amplifiers of each individual signal chain can be powered off by remote command if desired. The RF room provides 100 dB of attenuation and houses the ADC and F-engine. Once installed, physical access to any antenna or LNA is available by lifting the floorboards of a walkway along each focal line. This system is less waterproof than we wish, and in heavy rains water can get to the baseboards of the antennas, causing temporary unacceptable performance. The focal line structure, consisting of an elevated enclosed dry volume mildly heated by the LNAs, is a nearly ideal bird habitat; consequently, we have found it is very important that there are no holes as large as 2 cm diameter anywhere in the structure since these would allow starlings to enter.

FX Correlator
CHIME employs an FX correlator in which the time-domain signal from each feed is transformed to form a frequency spectrum in a part called the F-engine. At each frequency, data from every feed are collected at a single designated computation node and a spatial transform is made of these signals to form visibilities. This spatial transform is performed in a part of the instrument called an X-engine. These two processes are described below. The F-engine consists of eight 16-card electronics crates housed in two separate RF-shielded rooms located in modified, cooled, 20-foot shipping containers between pairs of cylinders. These two containers are connected by optical fibre to the X-engine, which is housed in a pair of RF-rooms enclosed in 40-foot shipping containers, adjacent to the telescope. The X-engine is built from 256 GPU nodes and is water cooled.
Figure 11. Analog chain linearity parameters, referred to the LNA input, are plotted against frequency. The non-linearity parameter IP3 is 13 dBm at the input of the CHIME ADCs, where amplified RF power is highest. The output coefficients, OP3 for the FLA and LNA are measured to be 35 dBm and 30 dBm respectively, nearly independent of frequency. These coefficients are more useful referred to a common point and so we have referred them all to the equivalent coefficients at the input of the LNA, taking account of gains, bandpasses and cable losses in front of each element. We have nearly achieved our design goal that the system limit is set by the ADC at all frequencies. In normal operation RFI signals at CHIME do not reach these levels either in band or nearby out of band.

F-Engine
The F-engine is implemented using the ICE (Bandura et al. 2016a) platform. ICE uses a field programmable gate array (FPGA) and is a general-purpose astrophysics hardware and software framework that is customized to implement the data acquisition, frequency channelization, and corner-turn networking operations of the CHIME correlator.
A schematic diagram of the data flow through the F-engine is shown in Fig. 12. The core of the system is built around ICE motherboards which handle signal processing and networking using Xilinx Kintex-7 FPGAs. Each motherboard supports two custom ADC daughter boards. FPGA firmware and software are customized for the CHIME application. Each ICE motherboard digitizes 16 analog signals into 8 bits at 800 million samples per second (MSPS). Thus, the 400 MHz to 800 MHz sky signals are directly sampled in the second Nyquist zone.
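To make the second-Nyquist-zone statement concrete, the following minimal sketch shows where a sky tone lands after real sampling at 800 MSPS. The function names are hypothetical and only the sampling rate and band edges come from the text; the point is that the 400-800 MHz band folds onto 0-400 MHz with the frequency axis mirrored.

```python
import numpy as np

FS_MHZ = 800.0  # ADC sampling rate quoted in the text


def aliased_frequency(f_mhz, fs_mhz=FS_MHZ):
    """Baseband frequency at which a real tone appears after sampling."""
    f = np.mod(f_mhz, fs_mhz)                        # fold into [0, fs)
    return np.where(f > fs_mhz / 2, fs_mhz - f, f)   # mirror into [0, fs/2]


def nyquist_zone(f_mhz, fs_mhz=FS_MHZ):
    """1-based Nyquist zone index of an input frequency."""
    return (np.floor(np.asarray(f_mhz) / (fs_mhz / 2)) + 1).astype(int)


sky = np.array([450.0, 600.0, 750.0])
print(aliased_frequency(sky))  # 350, 200 and 50 MHz: the band is mirrored
print(nyquist_zone(sky))       # all three tones lie in zone 2
```

Because of the mirroring, the highest sky frequency maps to the lowest digital frequency, so any channel index must be reversed when it is converted back to a sky frequency.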
The data stream from each digitized signal is fed to the FPGA, which implements a polyphase filter bank (PFB) efficiently using a fast Fourier transform (Parsons et al. 2008).
Data are processed in frames of 2048 samples, separately for each stream. A PFB is more compact in frequency than a simple FFT would be, greatly aiding RFI excision by localizing any disturbance. At the cadence of individual data frames, the PFB applies a sinc-Hamming window to 4 consecutive data frames, and outputs a single frame of 1024 complex values, one value per 390 kHz wide frequency channel, in 18+18 bit real and imaginary format. After the PFB, the data are rounded to 1024 4+4 bit complex values per frame. Adjustable scaling factors (complex gains) are applied to each frequency channel before this step in order to optimize the data compression (Mena-Parra et al. 2018).
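The channelization described above can be illustrated with a short NumPy sketch. It is not the FPGA firmware: the window normalisation, the symmetric 4-bit range and the function names are assumptions made for illustration, while the frame length (2048), the number of taps (4) and the 1024 output channels follow the text.

```python
import numpy as np

FRAME = 2048   # real samples per frame (from the text)
TAPS = 4       # consecutive frames spanned by the window (from the text)


def sinc_hamming(frame=FRAME, taps=TAPS):
    """Sinc-Hamming window of length taps*frame (normalisation assumed)."""
    n = np.arange(taps * frame)
    x = (n - 0.5 * (taps * frame - 1)) / frame
    return np.sinc(x) * np.hamming(taps * frame)


def pfb(timestream, frame=FRAME, taps=TAPS):
    """Critically sampled PFB: one 1024-channel spectrum per input frame."""
    window = sinc_hamming(frame, taps).reshape(taps, frame)
    nframes = timestream.size // frame
    blocks = timestream[: nframes * frame].reshape(nframes, frame)
    out = np.empty((nframes - taps + 1, frame // 2), dtype=complex)
    for i in range(nframes - taps + 1):
        # Weight `taps` consecutive frames and fold them onto a single frame.
        folded = (blocks[i : i + taps] * window).sum(axis=0)
        out[i] = np.fft.rfft(folded)[: frame // 2]
    return out


def quantize_4bit(spec, gain=1.0):
    """Scale, then round real and imaginary parts to 4-bit integers.
    The symmetric range [-7, 7] is an assumption of this sketch."""
    scaled = np.asarray(spec) * gain
    re = np.clip(np.rint(scaled.real), -7, 7)
    im = np.clip(np.rint(scaled.imag), -7, 7)
    return re + 1j * im


# A tone centred on channel 100 should appear in channel 100 only.
tone = np.cos(2 * np.pi * 100 * np.arange(8 * FRAME) / FRAME)
spectra = pfb(tone)
print(spectra.shape, np.argmax(np.abs(spectra[0])))
```

With 400 MHz of bandwidth split into 1024 channels, each channel is 390.625 kHz wide, which matches the 390 kHz quoted above. The `gain` argument plays the role of the adjustable per-channel scaling applied before rounding.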
After the frequency channelization, each ICE motherboard holds the data for 1024 frequency channels of signals from 16 analog inputs. However, in the X-engine for each frequency, data from every input must be presented to one processor in order to compute the cross-multiplications and averaging required to form the visibilities. A total of 6.6 Tbit/s of data needs to be re-arranged and transmitted to the X-engine, an operation performed in a four-stage corner-turn network (Bandura et al. 2016b). The first stage is performed in each ICE motherboard, where the frequency-domain data from each input are split into 16 subsets, each containing 1/16 of the frequency channels from all 16 inputs.
Each group of sixteen ICE motherboards is packaged in a crate, and all the boards within a crate are interconnected through a custom backplane that implements a passive high speed full-mesh network. CHIME uses a total of eight crates or 128 ICE motherboards. The second corner-turn stage is a data exchange between the boards in the crate, after which each board has all the data from 256 inputs for 64 of the frequency channels.
The third stage is a data exchange between pairs of ICE motherboards located in adjacent crates using high-speed serial links. After this third stage, the data from 512 inputs are split into 256 subsets distributed through the ICE motherboards of the two crates, and each subset contains four unique frequency channels. Each crate pair contains all the data for one quarter of the CHIME array, both polarizations from one cylinder.
The fourth stage of the corner-turn network takes place inside the GPU nodes of the X-engine. Each ICE motherboard sends its data stream to eight different GPU nodes through two active 100 m multi-mode optical fiber QSFP+ to 4×SFP+ cables. Each GPU node receives one frequency subset from one ICE motherboard in each crate pair and recombines the data to compute the correlation matrix for data from all 2048 inputs in four unique frequency channels.
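The bookkeeping of the four corner-turn stages, and the aggregate data rate, can be checked with a few lines of arithmetic. This is a sketch for orientation only: the `Holding` class and variable names are hypothetical, the intermediate figure of 32 channels per board after stage three is taken from the Figure 12 caption, and packet overhead is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Holding:
    """Number of inputs and frequency channels held by one device."""
    inputs: int
    channels: int

    @property
    def values(self):
        return self.inputs * self.channels


N_INPUTS, N_CHANNELS = 2048, 1024
BOARDS, BOARDS_PER_CRATE, NODES = 128, 16, 256

per_board = Holding(N_INPUTS // BOARDS, N_CHANNELS)                   # 16 x 1024
stage1 = Holding(per_board.inputs, N_CHANNELS // BOARDS_PER_CRATE)    # 16 x 64 per subset
stage2 = Holding(stage1.inputs * BOARDS_PER_CRATE, stage1.channels)   # 256 x 64 per board
stage3 = Holding(stage2.inputs * 2, stage2.channels // 2)             # 512 x 32 per board
subset = Holding(stage3.inputs, stage3.channels // 8)                 # 512 x 4 per GPU link
stage4 = Holding(subset.inputs * 4, subset.channels)                  # 2048 x 4 per GPU node

for name, h in [("board", per_board), ("stage 1", stage1), ("stage 2", stage2),
                ("stage 3", stage3), ("stage 4", stage4)]:
    print(f"{name:8s} {h.inputs:5d} inputs x {h.channels:5d} channels")

# Aggregate rate of the 4+4 bit channelized data.
frames_per_second = 800e6 / 2048                       # one frame per 2.56 us
bits_per_second = N_INPUTS * N_CHANNELS * 8 * frames_per_second
print(f"{bits_per_second / 1e12:.2f} Tbit/s")          # 6.55 Tbit/s
```

The amount of data per board is unchanged through stages one to three (16384 values per frame); only its arrangement changes. The computed 6.55 Tbit/s is consistent with the 6.6 Tbit/s quoted above.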
The four F-engine crate pairs are housed in independent racks distributed between two separate RF-shielded rooms installed within 20 ft modified, RF-shielded shipping containers, known as Receiver Huts. Each receiver hut serves two cylinders and is placed between them at their midpoint. This arrangement minimizes the total length of coaxial cables running from the focal line of the cylinders to the receiver huts.
Figure 12. Data flow through the F-engine. A total of 128 ICE motherboards are required to process 2048 sky signals. These motherboards are installed in eight crates, with each crate handling the signals for one polarization from every antenna on one cylinder. Each motherboard digitizes 16 analog signals into 8 bits at 800 MSPS. The data stream from each digitized signal is fed to a FFT/PFB that splits the 400 MHz bandwidth into 1024 frequency channels. A four-stage corner-turn network re-arranges the data to allow spatial cross-multiplication and integration at each frequency in the X-engine. In stage one, each motherboard creates 16 new data streams, each one having 64 frequency channels from each of the 16 input signals. In stage two, motherboards within a crate exchange data through a high-speed backplane network such that each board holds the data for 64 unique frequency channels from all of the 256 inputs processed by that crate. In stage three, each motherboard sends the data from half of its frequency channels to a sister motherboard in an adjacent crate. With this inter-crate data exchange, each board within a crate pair contains the data for a subset of 32 unique frequency channels and 512 inputs. Stage four is completed within the X-engine GPU nodes. Each ICE motherboard re-orders the data into eight subsets, each containing 4 frequency channels for 512 inputs. Each subset is sent to a different GPU node. Each of the 256 GPU nodes receives data from four different motherboards such that it ends up with the information from all the 1024 polarized antennas for four unique frequency channels.
A GPS-disciplined, oven-controlled crystal oscillator provides the 10 MHz clock for the F-engine system. The GPS receiver also generates the IRIG-B timecode signal used to insert time-stamps in the data. A copy of the clock and absolute time signals is sent to each of the F-engine crates. From there, the signals are distributed to each ICE motherboard and digitizer daughter board through a low-jitter distribution network. A broadband noise source system, which will be described in Section 2.7, is used to monitor and correct for drift between copies of the clock provided to each digitizer daughter board.
The F- and X-engines communicate over 256 optical fibers. Each fiber cable contains four strands that connect one ICE motherboard to four different GPU nodes. These are carried within a waterproof cable tray that goes underneath the cylinders and above the huts. Also within the cable tray are the coaxial cables that distribute a clock and absolute time signals to the F-engine huts. The mapping of which RF frequencies are sent to which nodes in the X-engine is adjustable. This allows, for example, sending the data from frequency channels heavily corrupted by RFI to nodes which are temporarily down for repair, preserving useful bandwidth.

X-engine
The CHIME X-engine performs spatial correlations and other real-time signal processing operations, using 256 nodes, each with 4 GPU chips. Details of the nodes and support infrastructure can be found in Denman et al. (2020). These nodes run a soft real-time pipeline built using the KOTEKAN framework (Renard et al. 2021; Renard et al. In Prep.), which handles the X-engine, RFI flagging, and multiple real-time beamforming operations. The processes performed by each node are shown in Fig. 13.
Data arrive at each of the nodes from the F-engine on four 10 Gbit/s fibre SFP+ links. Each link conveys data from 512 feeds from four frequency bins from each of the four F-engine crate pairs. Packet capture is handled in KOTEKAN using the DPDK library to reduce the UDP packet capture overhead normally associated with using Linux sockets. Once in the system, the packet data from each link are split into 4 different staging memory frames, one for each frequency, which completes the final corner-turn. Following packet capture there are 4 frames, each with data from one frequency channel and from all 2048 feeds, for 49,152 time samples. These frames are transferred to the GPU chips, resulting in each GPU chip processing data for exactly one of the frequency channels.
Figure 13. Processes performed by each X-engine node. Data arrive from the F-engine, the final corner-turn is performed by the CPU in the X-engine, and signals from all 2048 feeds within one single frequency channel are transferred to one of the four GPUs. On each GPU the spectral kurtosis is computed as an estimation of RFI, and contaminated samples are removed. After flagging the data for RFI at ∼0.6 ms cadence, the data are correlated to produce an $N_\mathrm{feed}^2$ visibility matrix and summed over 31 ms. Each 31 ms correlation product is copied off the GPU and tested again, this time for long-duration RFI, which is either removed or processed further (see Fig. 14). The data are also branched off to two distinct beamforming engines, a tracking voltage beamformer with 12 steerable beams and an FFT spatial beamformer which generates 1024 power beams at increased frequency resolution. Those power beams are further split into two combinations of frequency and temporal resolution. The tracking voltage beamformer is used primarily for the CHIME/Pulsar backend, and the FFT beamformer is used for both the CHIME/FRB search backend and a 21 cm narrow-band absorber search backend. A buffer of the most recent 33 seconds of data from the F-engine is updated in RAM. When triggered by the FRB search engine, the raw voltage data in this buffer, corresponding to one event, is transmitted to an archive.
Once the data frames are on the GPU, a number of operations are applied to the data using OPENCL and hand optimized GPU kernels. The primary operation is the creation of the visibility matrix by the correlation kernel, using about 75% of the processing time. For each frequency channel, the complex data from each feed are multiplied by the complex conjugate of the corresponding signal from each other feed to create the visibility matrix. This is a Hermitian matrix, and only the upper triangle is directly computed.
This calculation dominates the computational cost in CHIME, and we worked hard to optimize it. Data are processed independently in blocks of 32×32 feeds, distributed across 64 collaborating computational instances ("work items" in a "work group"). These work items employ Cannon's algorithm (Cannon 1969), collectively loading 8 sequential timesteps for all 32+32 inputs under consideration, and sharing these over high-speed local interconnects. Unsigned 4-bit values can be packed into 32-bit registers, allowing efficient multiplication and in-situ accumulation (Klages et al. 2015). Ultimately, 6 of the 8 arithmetic operations required for a complex multiply-accumulate (cMAC) operation are performed in a single GPU instruction. The remaining two are paired with another cMAC, for a total of 3 instructions per pair of cMAC computations. These intermediate products are accumulated in active registers, with top bits periodically peeled off and accumulated to high-speed local memory to prevent overflow. Products are summed in time over 12288 input time samples, before being unpacked and read out, to produce visibility products with a temporal resolution of roughly 31 ms. To maximize throughput, this kernel was directly implemented in AMD's assembly-level Instruction Set Architecture (ISA), and the resulting high performance both left space for additional processing kernels (e.g. beamforming, RFI), and also allowed for a substantial reduction in observatory power envelope via low-power operation of the GPUs.
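For readers who want the correlation operation in its simplest form, the sketch below states it in NumPy. It is a reference definition only and makes no attempt to reproduce the blocked, packed-integer GPU kernel described above; the function name and array layout are assumptions.

```python
import numpy as np

N_FEED = 2048
N_INT = 12288                                # samples summed per visibility frame
N_PRODUCTS = N_FEED * (N_FEED + 1) // 2      # upper triangle including the diagonal
INTEGRATION_MS = N_INT * 2.56e-3             # 2.56 us per channelized sample


def correlate(voltages, n_int=N_INT):
    """voltages: complex array (n_time, n_feed) for one frequency channel.

    Returns (n_frames, n_feed, n_feed) with V_ij = sum_t x_i(t) conj(x_j(t));
    only the upper triangle (i <= j) is kept, the rest is left as zero.
    """
    n_time, n_feed = voltages.shape
    n_frames = n_time // n_int
    vis = np.zeros((n_frames, n_feed, n_feed), dtype=complex)
    for k in range(n_frames):
        block = voltages[k * n_int : (k + 1) * n_int]
        full = block.T @ block.conj()
        vis[k] = np.triu(full)
    return vis


print(N_PRODUCTS)                  # 2098176 products per frequency channel
print(round(INTEGRATION_MS, 2))    # 31.46 ms per visibility frame
```

The lower triangle can always be recovered as the complex conjugate of the upper one, which is why it need not be computed or stored.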
To excise RFI-contaminated data prior to the correlation operations, a spectral kurtosis value is computed over all inputs and 256 successive time samples (total ∼ 0.66 ms) (Taylor et al. 2018). Each 0.66 ms block of data with a kurtosis value deviating from the expected value by a configurable threshold is given 0 weight. The amount of data which are excised or otherwise lost (for example to lost network packets) is accounted for in the metadata and normalized later in the pipeline. These kurtosis values are extracted from the GPU and used in a second-stage RFI test which can drop entire 31 ms samples after they leave the GPU based on the statistics of the 48 × 0.66 ms spectral kurtosis samples within. This second stage is designed to excise RFI events that are lower in power but longer in duration than those found in the first stage. This second stage excision is turned off during Solar transit.
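The first-stage test can be sketched with the standard spectral kurtosis estimator, which has an expectation of 1 for Gaussian noise. How the estimator is combined across inputs in the CHIME kernel is not specified here, so the simple average over inputs, the threshold value and the function names below are assumptions.

```python
import numpy as np


def spectral_kurtosis(voltages):
    """voltages: complex array (n_time, n_input) for one frequency channel.

    Returns the spectral kurtosis estimate averaged over inputs
    (the averaging step is an assumption of this sketch).
    """
    m = voltages.shape[0]
    power = np.abs(voltages) ** 2
    s1 = power.sum(axis=0)
    s2 = (power ** 2).sum(axis=0)
    sk = (m + 1) / (m - 1) * (m * s2 / s1 ** 2 - 1)
    return sk.mean()


def sk_weight(voltages, threshold=0.1):
    """Weight of 0 if the block deviates from the Gaussian expectation, else 1."""
    return 0.0 if abs(spectral_kurtosis(voltages) - 1.0) > threshold else 1.0


rng = np.random.default_rng(0)
noise = rng.normal(size=(256, 64)) + 1j * rng.normal(size=(256, 64))
tone = np.exp(2j * np.pi * 0.1 * np.arange(256))[:, None] * np.ones((1, 64))
print(sk_weight(noise), sk_weight(tone))
```

A constant-amplitude carrier drives the estimate towards 0, while impulsive interference drives it above 1, so a symmetric threshold around 1 catches both kinds of contamination.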
The 31 ms visibility frames which are not excised are processed in the CPU associated with each node and transmitted to a receiver system running another configuration of KOTEKAN which does further processing. See Fig. 14 and Section 2.5 for more detail.
In addition to the correlation, RFI estimation and flagging, the GPUs perform two kinds of beamforming operations. The first type is a tracking voltage beamformer, which takes right ascension and declination coordinates and generates a set of dynamic phases that are applied to input voltage data and summed over all feeds to generate a single coherent beam used to observe celestial sources while in the CHIME field of view. Currently CHIME forms 12 of these beams simultaneously. The data from these formed beams are scaled to 4+4-bit complex data at full 2.56 µs time resolution and transmitted over the 1 Gigabit Ethernet (GbE) links on the nodes. The data streams from 10 of these beams are sent to 10 CHIME/Pulsar processing nodes. The remaining two beams are used for other operations such as VLBI and calibration.
The second type of beamforming operation is an FFT-based spatial imaging beamformer (Ng et al. 2017) which generates 1024 power beams in fixed terrestrial coordinates for use in the FRB engine and the high-resolution absorber search. A spatial FFT is performed for the data from each cylinder to generate 512 beams for each polarization. Of these, 256 are selected to achieve roughly achromatic pointing. A 4-way transform is computed across all these beams in rows between cylinders. This combination produces 1024 beams at each frequency, and for each of these 128 successive temporal samples are Fourier transformed to extract higher frequency resolution. For the high frequency-resolution absorber search, the data are squared at full 128 sub-frequency spectral resolution (∼ 3 kHz), and integrated to ∼ 120 ms time resolution. After leaving the GPU these high resolution data are integrated again to 10 s, and stored on a backend running a special configuration of KOTEKAN, to enable a search for 21 cm narrow-line absorbers. For the FRB search engine, the data are squared, summed over polarizations, and summed over 16 frequency bins and 384 time samples to produce 1024 power-beams with 16 sub-frequency bins (∼ 24 kHz) per original CHIME channel at ∼1 ms time resolution. This tuning of sampling time and frequency resolution is made to match the data to the conflicting goals in the FRB engine of resolving short pulses and performing dedispersion in a discretely sampled spectrum. These data are sent from each GPU to the FRB search backend in custom UDP packets over a 1 GbE link to be searched in real time for FRBs.
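The frequency and time resolutions quoted for the two power-beam products follow from the channel width and sample period. The short calculation below reproduces them; it assumes that the 384 time samples refer to channelized 2.56 µs samples, i.e. three consecutive 128-point transforms, which is an interpretation rather than a statement from the text.

```python
SAMPLE_US = 2.56               # channelized sample period in microseconds
CHANNEL_KHZ = 400e3 / 1024     # 390.625 kHz per CHIME channel
N_FFT = 128                    # temporal samples per fine transform

# High-resolution absorber search: full 128 sub-channels.
fine_khz = CHANNEL_KHZ / N_FFT           # about 3.05 kHz
fine_us = SAMPLE_US * N_FFT              # 327.68 us per fine spectrum

# FRB search: 16 sub-channels per channel, 384 samples per output.
frb_khz = CHANNEL_KHZ / 16               # about 24.4 kHz
frb_ms = SAMPLE_US * 384 / 1e3           # about 0.983 ms
spectra_per_frb_sample = 384 // N_FFT    # 3 fine spectra summed in time

print(f"absorber: {fine_khz:.2f} kHz, {fine_us:.2f} us")
print(f"FRB:      {frb_khz:.1f} kHz, {frb_ms:.3f} ms")
```

Both products keep the product of frequency and time resolution fixed by the transform length; the FRB stream trades frequency resolution for the shorter sampling time needed to resolve brief pulses.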

Real-Time Processing
The ensemble of the 1024 GPUs generates $N_\mathrm{feed}^2$ correlation products at a 31 ms cadence for each of 1024 frequencies. This amounts to a raw data rate of ∼4.6 Tbit/s. It is not feasible to write out and store such a fire-hose of data. The receiver system is tasked with aggregating and processing the data stream in preparation for archiving. In the process it produces ancillary data products that are tapped for system and data quality monitoring. Fig. 14 provides a schematic representation of the receiver system. The various stages are distributed across multiple computers (aka nodes). The first stages occur on the GPU nodes themselves (executed on the CPU), after which the data are transmitted over the network to the single receiver node, where the remainder of the pipeline occurs. Another computer, the processing node, hosts parallel processing tasks that are not time-critical for subsets of data. Notably this includes deriving the calibration solutions that are fed back into the main receiver node pipeline. The final data products are sent over the network to an archive node. Aside from a few exceptions, all of these stages are built on the KOTEKAN framework.
Accumulation and gating -In order to reduce the data rate, the first stage following the GPU co-adds RFI-cleaned 31 ms frames for 5 s. A later stage co-adds samples further to the final 10 s cadence, but optionally the subset of the data comprised of correlation products with the Galt 26 m telescope are kept at the finer time resolution to avoid smearing due to the faster fringing of the ∼230 m baseline between Galt and CHIME. This is the last chance for any operations on the fast-cadence data. The variance over the 31 ms samples is calculated to estimate the noise level in the accumulated frame and passed along with it. Gated accumulation is also supported, where samples are weighted and binned into on and off gates and the difference of the two is returned at the end of the integration window. Gating can be initiated, or its parameters updated on the fly without interrupting data acquisition. Currently, gating is used for simultaneous observations with the Galt telescope of slow (P > 300 ms) pulsars for beam holography (Section 3.2).
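The accumulation and gating step can be sketched as follows. The function names, the use of the variance of the mean as the noise estimate, and the normalisation of each gate by its total weight are assumptions for illustration; the text specifies only that a variance over the 31 ms samples is computed and that the on and off gates are differenced.

```python
import numpy as np


def accumulate(frames):
    """frames: array (n_frame, ...) of 31 ms visibility frames.

    Returns the co-added (mean) frame and an estimate of its variance
    derived from the scatter of the fast samples.
    """
    frames = np.asarray(frames)
    mean = frames.mean(axis=0)
    var = frames.var(axis=0, ddof=1) / frames.shape[0]
    return mean, var


def gated_accumulate(frames, on_weight):
    """on_weight[k] in [0, 1]: fraction of frame k that falls in the 'on' gate.

    Returns the weighted 'on' average minus the weighted 'off' average.
    """
    frames = np.asarray(frames)
    w_on = np.asarray(on_weight, dtype=float)
    w_off = 1.0 - w_on
    shape = (-1,) + (1,) * (frames.ndim - 1)
    on = (frames * w_on.reshape(shape)).sum(axis=0) / w_on.sum()
    off = (frames * w_off.reshape(shape)).sum(axis=0) / w_off.sum()
    return on - off


# A steady sky level of 2 plus a pulse of amplitude 1 in every fourth frame.
on_w = np.tile([1.0, 0.0, 0.0, 0.0], 4)
frames = 2.0 + on_w[:, None] * np.ones((16, 3))
print(gated_accumulate(frames, on_w))   # the steady sky cancels, leaving the pulse
```

Differencing the two gates removes any signal that is steady over the pulse period, which is what makes gated pulsar observations useful for holography.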
Eigendecomposition -The four leading eigenvalues/vectors of the $N_\mathrm{feed}^2$ visibility matrix are estimated for every time sample and passed on down the pipeline. It is necessary to perform this step in the X-engine in order to distribute the computational load over the 256 CPUs located there. The eigenvectors represent the response of every individual array element to the dominant modes on the sky at that moment, making them a valuable tool for real-time calibration. Importantly, it is not possible to perform this decomposition after the redundant baseline collation step, and the full $N_\mathrm{feed}^2$ visibility matrix is only stored for a small number of frequencies, so these eigen-data are important for offline analysis as well. Since noise-coupling between nearby feeds is significant and will outweigh the sky modes, the diagonal values of up to 30 feed separations are excised from the matrix prior to the decomposition. To avoid biasing the result, an iterative scheme is employed to progressively complete the masked region.
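One way to picture the masked decomposition is an iterative low-rank completion: the excised band is filled with the current low-rank model, the eigenvectors are recomputed, and the process repeats. The sketch below is a generic version of that idea, not the KOTEKAN implementation; in particular it masks by index separation rather than physical feed separation, and the iteration count is arbitrary.

```python
import numpy as np


def masked_eig(vis, n_modes=4, max_sep=30, n_iter=20):
    """Leading eigenpairs of a Hermitian matrix with a diagonal band masked.

    Entries with |i - j| <= max_sep are treated as unknown and are filled
    iteratively with a rank-n_modes model built from the other entries.
    """
    n = vis.shape[0]
    row, col = np.indices((n, n))
    masked = np.abs(row - col) <= max_sep
    work = np.where(masked, 0.0, vis).astype(complex)
    for _ in range(n_iter):
        evals, evecs = np.linalg.eigh(work)      # eigenvalues in ascending order
        lam = evals[-n_modes:][::-1]
        vec = evecs[:, -n_modes:][:, ::-1]
        model = (vec * lam) @ vec.conj().T       # low-rank estimate of the matrix
        work = np.where(masked, model, vis)      # fill only the masked band
    return lam, vec


# A rank-1 "sky" with strong spurious coupling between neighbouring inputs.
rng = np.random.default_rng(1)
g = np.exp(2j * np.pi * rng.random(64))
r, c = np.indices((64, 64))
vis = np.outer(g, g.conj()) + 5.0 * (np.abs(r - c) <= 3)
lam, vec = masked_eig(vis, max_sep=3)
print(abs(np.vdot(vec[:, 0], g)) / np.linalg.norm(g))   # overlap with the true mode
```

Setting the band to zero and decomposing once would bias the leading eigenvalue low; refilling the band from the model is what removes most of that bias.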
Figure 14. A diagram of the data flow through the CHIME post-correlation receiver system. The receiver system processes a full $N_\mathrm{feed}^2$ correlation matrix for each of the 1024 frequency channels every ∼31 ms from the X-engine (see Section 2.4.2). Initially this stream is processed within the CPUs of each GPU node to accumulate the data up to a 10 s cadence, estimate its noise and, if desired, perform gating for pulsar observations. Cross correlations with the Galt 26 m Telescope used for bright source and pulsar holography can be extracted at 5 s cadence to prevent fringe smearing. For use in calibration, we solve for the highest four eigenvalues and vectors of each $N_\mathrm{feed}^2$ frame. The data for all of this is sent over the network to a single receiver node for further processing, including flagging, calibration and baseline stacking before being written to disk. Flags for bad correlator inputs are derived by a broker process running on a separate node that assimilates various sources of data quality information into a mask for each correlator input. Similarly, gain solutions for calibration are derived by a broker that uses eigenvector and noise source timing data from the correlation products as well as environmental data to produce gains that are applied in real time to the $N_\mathrm{feed}^2$ data.
Calibration broker -A daily complex gain calibration for every sky signal is derived from the transit of a bright astronomical point source. The calibration broker is a service running on the processing node that produces gain solutions by fitting the eigenvector data immediately following the transit of a chosen point source. The eigenvectors are continuously provided to the broker via a shared memory ring buffer and the broker can access a timestream spanning the transit by reading the buffer file approximately 20 min after transit. During transit, the bright source is the dominant contribution to the sky signal and the visibility matrix can be approximated as an outer product of the input gain vector (a rank-1 approximation), identified as the leading eigenvector. A complication is that the 2048 sky signals include two polarizations, so there are in fact two near-orthogonal components to the matrix. There is no guarantee that these two vectors neatly divide the inputs by polarization as is required to interpret the eigenvectors as gain solutions. An additional orthogonalisation with respect to the two-dimensional space of polarizations must be performed by the broker to isolate them. The intrinsic flux density of the source across the band is corrected for using the measurements of Perley & Butler (2017). Frequencies affected by RFI are flagged by comparing the ratio of the eigenvalue on- and off-source, and those with anomalous gain amplitudes are also flagged. Gains for the flagged frequencies are recovered by interpolating between the gain solutions for adjacent good frequencies. The four brightest sources are processed in this way at every transit, but only one is used for calibration. The choice of which source is used changes throughout the year to avoid calibrators near the Sun, and any differences in the primary beam patterns are corrected using the average ratio of past gains from the source to past gains from Cygnus A (Cyg A). The calibration procedure therefore normalizes the primary beam pattern at each frequency to unity on meridian at the declination of Cyg A. See Section 2.7 for additional corrections applied later in the pipeline.
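The rank-1 step of the calibration can be illustrated for a single polarization. The sketch below ignores the polarization orthogonalisation and beam corrections described above; the phase reference to input 0, the linear interpolation of real and imaginary parts, and the function names are assumptions made for the example.

```python
import numpy as np


def gains_from_transit(vis, source_flux=1.0):
    """Rank-1 gain estimate for one frequency.

    If vis is approximately S * g g^H, then g is approximately
    sqrt(lambda_1 / S) * v_1, up to an overall phase.
    """
    evals, evecs = np.linalg.eigh(vis)          # ascending eigenvalues
    lam, vec = evals[-1], evecs[:, -1]
    gain = np.sqrt(lam / source_flux) * vec
    # The overall phase is unconstrained; reference it to input 0 (assumed).
    return gain * np.exp(-1j * np.angle(gain[0]))


def interpolate_flagged(freq, gain, flagged):
    """Fill flagged frequencies by interpolating between good neighbours."""
    good = ~flagged
    out = gain.copy()
    out[flagged] = (np.interp(freq[flagged], freq[good], gain[good].real)
                    + 1j * np.interp(freq[flagged], freq[good], gain[good].imag))
    return out


rng = np.random.default_rng(2)
g_true = (1 + 0.1 * rng.normal(size=8)) * np.exp(2j * np.pi * rng.random(8))
vis = 2.0 * np.outer(g_true, g_true.conj())     # source flux of 2 (arbitrary units)
g_est = gains_from_transit(vis, source_flux=2.0)
print(np.allclose(g_est, g_true * np.exp(-1j * np.angle(g_true[0]))))
```

Dividing the leading eigenvalue by the known source flux is what ties the gain amplitudes to an absolute scale, which is the role played by the Perley & Butler (2017) flux densities in the text.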
Flagging broker -The role of the flagging broker is to perform real-time identification of correlator inputs that should be excluded from further analysis. It runs on the processing node and provides regular updates to the relevant stages of the receiver pipeline. It uses a variety of data products and housekeeping metrics to repeatedly evaluate 10 different tests, with each test designed to identify malfunctioning or otherwise anomalous correlator inputs. Below we list its data sources and briefly summarize the corresponding tests.
Note that there can be multiple tests derived from a single data source.
• Layout database: Reject inputs that are not currently connected to an antenna or that have been flagged manually by a user.
• Power server: Reject inputs whose amplifiers are not currently powered.
• ADC data: Reject inputs whose raw ADC data has an outlier RMS, histogram, or spectrum.
• RFI broker: Reject inputs determined to have highly non-gaussian statistics based on a monitoring stage internal to the X-engine.
• Calibration broker: Reject inputs for which the complex gain calibration failed, or whose gain amplitudes exhibit large, broadband changes relative to their median over the past 30 days.
• Autocorrelation data: Reject inputs that have outlier noise or whose autocorrelation shows large, broadband changes relative to past values.
If a correlator input fails any one of the tests, all baselines formed from that input will be given zero weight when averaging over redundant baselines.
Gain/flag application and redundant baseline collation -The $N_\mathrm{feed}^2$ and 26 m streams from the GPU nodes are merged into all-frequency streams as they arrive at the receiver node. The 26 m streams undergo no further processing, nor does a subset of four frequencies from the $N_\mathrm{feed}^2$ visibility calculation that is output at this stage to preserve some of the full array information. Keeping $N_\mathrm{feed}^2$ terms for all frequencies, amounting to a data rate of over 200 TB/d, is not feasible because of storage constraints. A lossy compression is effected by averaging redundant baselines within each cylinder pair together. Baselines are not combined between the six cylinder pairs to maintain the possibility of correcting for any non-redundancy between the cylinders or between the signal paths that are routed to separate receiver huts. The daily rate of data archived is thus reduced to ∼1 TB. Prior to collating visibilities along redundant baselines, the gain calibration and flags generated by their respective brokers are applied to the data. This compression method is lossy due to any non-redundancy that might arise from support structures, edge effects, and imperfections in the reflectors, or non-uniformity in the feed responses, as well as imperfect calibration.
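The collation step amounts to a weighted average over all correlation products that share the same baseline. The sketch below shows that operation on flat arrays; the integer `baseline_id` labelling, the helper names and the rule that a product has zero weight when either of its inputs is flagged are assumptions chosen to mirror the description above.

```python
import numpy as np


def product_weights(input_good, prod_i, prod_j):
    """Weight 1 for a product only if both of its inputs pass the flagging tests."""
    good = np.asarray(input_good, dtype=bool)
    return (good[prod_i] & good[prod_j]).astype(float)


def stack_redundant(vis, weight, baseline_id):
    """Weighted average of visibilities that share a baseline_id.

    vis, weight, baseline_id: 1-D arrays over correlation products.
    Returns the unique ids, the stacked visibilities and the summed weights.
    """
    ids, inverse = np.unique(baseline_id, return_inverse=True)
    wsum = np.bincount(inverse, weights=weight, minlength=ids.size)
    re = np.bincount(inverse, weights=weight * vis.real, minlength=ids.size)
    im = np.bincount(inverse, weights=weight * vis.imag, minlength=ids.size)
    with np.errstate(invalid="ignore", divide="ignore"):
        stacked = np.where(wsum > 0, (re + 1j * im) / wsum, 0.0)
    return ids, stacked, wsum


# Three inputs on a regular line: products (0,1) and (1,2) are redundant.
prod_i = np.array([0, 1, 0])
prod_j = np.array([1, 2, 2])
baseline_id = np.array([1, 1, 2])            # separation in units of the feed spacing
vis = np.array([1 + 1j, 3 + 1j, 5 + 0j])
weight = product_weights([True, True, True], prod_i, prod_j)
print(stack_redundant(vis, weight, baseline_id))
```

Keeping the summed weight alongside each stacked visibility records how many products contributed, which is needed to propagate noise estimates through the stack.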
Real-time map -A subset of 64 frequencies is tapped from the main pipeline following the baseline collation stage and transmitted to the processing node, where a separate pipeline beamforms the visibilities to generate a real-time data stream we call a ringmap. The ringmap is a representation of the data as a timestream of formed beams, visualising the sky as it drifts through the field of view of the cylindrical reflectors (see Section 4.5 for details). The maps for those frequencies are buffered over a period of 24 h and can be displayed using a data monitoring web viewer. They are useful for assessing recent data quality at a glance and we study them every day.
Output datasets -The branching points in the pipeline lead to three main data products. The stack dataset is output by the baseline collation stage and contains the total of CHIME's sensitivity, with all non-flagged baselines contributing over the entire band. The N2 dataset holds complete uncompressed visibility matrices for four frequencies. It is useful for instrument characterisation and understanding the effects of baseline collation. The gated and ungated 26m datasets contain only the cross-products with inputs from the Galt telescope, and at a 5 s cadence, twice that of the other datasets. These are produced only during simultaneous observations of point sources for beam holography (see Section 3.2).
Compression and archiving -The final module of the real-time pipeline is an archiving service that packages the data into a structured archive format, applies another stage of compression, and registers files with the archive database. It takes advantage of the relatively slow rate of change of the measured sky gradually drifting through the field of view by ordering the data with time as the fastest varying index and compressing the redundant information between nearby time samples. All the data are truncated at a specified fraction of the measured noise level to excise the high variability in the (noise dominated) least significant bits and thus further improve the effectiveness of the compression. The BITSHUFFLE algorithm (Masui et al. 2015) compresses these data on a bitwise basis, resulting in a typical size reduction of ∼ 2-5 times for stack files. Data are stored on site for up to six months and indefinitely at archives located at Compute Canada centres. Archive files are tracked in an SQL database and all file operations are mediated by a software daemon that validates the integrity of the data and ensures storage redundancy.

2.6. System Monitoring
CHIME is a complex instrument with 2048 analog signal chains processed by nearly 400 separate computers spread over six physical locations on site. To keep the experiment running 24 hours a day and 7 days a week, it is important to identify and rectify inevitable failures in a timely manner. In this section, we explain how the CHIME operations are monitored in almost real time to assess instrumental and experimental health.
Instrument health monitoring -The instrumental health can be monitored by verifying that various hardware and software subsystems are running, data are written to disk, equipment huts are thermally stable, and there is no failure that is an emergency and needs immediate attention, e.g. a coolant leak or fire. An array of auxiliary sensors is deployed across CHIME to probe various environmental parameters. These include temperature sensors across one cylindrical reflector; ambient temperature, humidity, smoke and leak sensors in equipment huts; and a weather station with wind and rain-accumulation sensors. Data from all these sensors are streamed in real time into a central database. In addition, metrics are collected from various hardware and software components, including but not limited to power supplies, operating systems (OS), and network statistics from switches. Almost every software and firmware component also generates its own set of internal health metrics. CHIME uses PROMETHEUS (Linux Foundation 2022) and GRAFANA (Grafana Labs 2022) for managing and monitoring the housekeeping data in real time. PROMETHEUS is an open-source monitoring system and time-series database. The data collected by PROMETHEUS can be displayed through web-based dashboards in the GRAFANA environment. PROMETHEUS allows defining rules for alert conditions and expressing them as a PROMETHEUS query that can invoke an alert to an external service. Alerts are handled by ALERTMANAGER (Linux Foundation 2022), which sends out notifications through SLACK and email to targeted team members when thresholds set on various metrics are violated.
This combination of PROMETHEUS and GRAFANA environment provides the ability to monitor the operation remotely. As there is only one Telescope Operator on site during working hours, and no one otherwise, the CHIME team provides nearly 24-hour remote monitoring of the operation by taking on shifts on a rotating basis after regular work hours. The person on duty responds to alerts in-situ only if they are critical and causing interruption in the data acquisition. As an example, temperature control in equipment huts is quite sophisticated. As both X-engine and F-engine hardware are cooled by liquid coolant, the greatest attention is paid to detecting any potential leaks in the plumbing. If leaks are detected, valves automatically cut the supply of coolant into huts to minimize any potential damage to the system. Similarly, if smoke or flood sensors are persistently tripped the power is automatically shut to receiver huts. This way the system automatically reacts to catastrophic events ensuring the safety of subsystems.
A subset of housekeeping data stored in PROMETHEUS is exported and written to an HDF5 file on a daily basis. These files are then archived to be used during offline data analysis.
Experimental health and data-integrity monitoring: Considering the amount of data that CHIME generates, it is challenging to check the data quality and integrity in real time. The focus of this operation is to highlight only those data-quality issues that can be addressed and improved by acting swiftly and adjusting certain configurable hardware or software parameters. The timeframe for these assessments can be seconds (e.g. RMS of sky signal); minutes (e.g. spectra waterfall, correlation triangle); or a day (e.g. calibration quality, downward trend in noise integration). Data quality and integrity are monitored by the remote operator(s) on a daily basis through a mix of manual checks and a set of automated quick data analyses. DIAS (https://github.com/chime-experiment/dias) is a software framework for data integrity analysis and generation of daily plots. It runs as a service that schedules the execution of data analyzers. This framework replaces slow on-demand script execution with an automated pre-generation of a set of data products, which are not archived and are only available for a few months. A lightweight package for generating web-based plots, THEREMIN, was developed in house and is used to view these data products.
Using DIAS and THEREMIN we are able to monitor the quality of the data itself in near real-time. This includes estimates of the RFI environment, the full array sensitivity derived from sub-integration variances, bright source spectra, and real-time sky maps derived directly from the saved CHIME data products. This allows the CHIME team to get rapid feedback on the end to end performance of the instrument and to make timely adjustments if needed.

Offline Processing
Post-processing of the CHIME data is done via a Python-based, YAML-configurable, offline pipeline. The basic infrastructure is available in CAPUT (Shaw et al. 2020a) and most of the non-CHIME-specific functionality is available in DRACO (Shaw et al. 2020c). CHIME-specific parts of the pipeline are found in CH_PIPELINE (Shaw et al. 2020b). The pipeline structure is flexible, being used not only for the main data product pipeline, but also for a variety of functions such as instrument simulations, holography and cross-correlation analysis, foreground removal, and power spectrum estimation.
The main data pipeline for CHIME runs on Compute Canada's Cedar cluster, where one of our science data archives is located. The data are processed in units of Local Sidereal Days (LSD). The first step of the pipeline is to locate and load all files pertaining to a particular LSD into memory. A number of calibration and transformation operations are performed in the order presented below.
A timing correction is applied to each file to account for differences in timing between the two receiver huts. The final step of redundant-baseline stacking is performed, in which redundant baselines corresponding to different pairs of cylinders are stacked together (this step is delayed to this point to allow for the timing calibration to occur). At this point, an offline stage of RFI masking is applied to the data, complementary to the real-time RFI excision that takes place in the receiver pipeline (see Section 2.5). This stage derives a figure-of-merit for sensitivity estimates based on the radiometer equation applied to cross-polarization data. This figure-of-merit is fed to a sum-threshold algorithm (Offringa et al. 2010) in frequency-time space which outputs a single mask for all baseline stacks. This stage also includes a specific search for intermittent RFI in 6 MHz-wide bands, characteristic of TV stations.
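The sum-threshold idea can be illustrated with a simplified one-dimensional sketch. This is not the CHIME implementation, which operates on the sensitivity figure-of-merit in frequency-time space; the function name, the base threshold `chi_1`, the decay factor `rho = 1.5`, and the maximum window length are assumptions chosen for illustration only.

```python
import math


def sum_threshold_1d(values, chi_1, max_window=8, rho=1.5):
    """Flag samples with a simplified one-dimensional SumThreshold pass.

    values     : sequence of floats (e.g. one time series of a statistic)
    chi_1      : threshold for a single sample (hypothetical parameter)
    max_window : largest window length tried (windows are 1, 2, 4, ...)
    rho        : factor controlling how the threshold drops with window size
    """
    n = len(values)
    flags = [False] * n
    window = 1
    while window <= max_window and window <= n:
        # The per-sample threshold decreases as the window grows.
        chi = chi_1 / rho ** math.log2(window)
        # Samples flagged at smaller windows are replaced by the current
        # threshold so that they cannot, on their own, trigger new flags.
        clipped = [chi if f else v for v, f in zip(values, flags)]
        new_flags = list(flags)
        for start in range(n - window + 1):
            if sum(clipped[start:start + window]) > window * chi:
                for k in range(start, start + window):
                    new_flags[k] = True
        flags = new_flags
        window *= 2
    return flags


# Toy data: one strong spike and a weaker, longer-lived excess.
toy = [0.0] * 20
toy[3] = 10.0
toy[10:14] = [3.0, 3.0, 3.0, 3.0]
toy_flags = sum_threshold_1d(toy, chi_1=5.0)
```

In this toy example the four samples at 3.0 are each below the single-sample threshold, but their combined sum exceeds the lower threshold used for four-sample windows, which is the behaviour that makes the method sensitive to weak, extended RFI.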
To allow for later stacking of multiple sidereal days, the data are resampled from the original time-of-day basis to right ascension. This regridding is done via an inverse Lanczos interpolation which takes the data from the native resolution of around 10 s to approximately 5 arcmin in right ascension. The regridded data corresponding to a full sidereal day are combined into a sidereal stream, the final data product which is written to disk for analysis and long-term archiving. A few additional products are saved alongside the visibility data of each sidereal stream. These include ringmaps (see Section 4.5), delay power spectra, and bright point source spectra, as well as the sensitivity figure-of-merit and the RFI mask derived from it.
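As a rough illustration of the kernel involved, the sketch below implements a plain forward Lanczos interpolation of a uniformly sampled series. It is not the inverse Lanczos regridding used in the pipeline, which solves for the values on the right-ascension grid; the function names, the kernel width `a = 3`, and the weight normalization are assumptions made for this example.

```python
import math


def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, zero otherwise."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_resample(samples, positions, a=3):
    """Interpolate uniformly spaced `samples` at fractional index `positions`.

    Weights are normalized by their sum, a common variant that keeps a
    constant input exactly constant and handles the array edges gracefully.
    """
    n = len(samples)
    out = []
    for t in positions:
        base = math.floor(t)
        total = 0.0
        norm = 0.0
        for i in range(base - a + 1, base + a + 1):
            if 0 <= i < n:
                w = lanczos_kernel(t - i, a)
                total += w * samples[i]
                norm += w
        out.append(total / norm if norm != 0.0 else 0.0)
    return out


# Toy usage: resample a ramp sampled at integer indices onto a coarser grid.
ramp = [float(i) for i in range(12)]
coarse = lanczos_resample(ramp, [2.0, 4.5, 7.0])
```

In a regridding context, `positions` would be the fractional time-sample indices corresponding to the target right-ascension grid points.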
An independent second stage pipeline exists to combine many sidereal streams into higher sensitivity full sidereal day products called sidereal stacks. Initially, all sidereal streams in a specified time range are selected. These are specified to be times of mostly uninterrupted observation in which the telescope was operating in a stable mode. For instance, we require all the data that goes in a sidereal stack to have been calibrated on the same source (see Section 2.5 for calibration details).
Before stacking multiple days, an extra step of cleaning is applied to each sidereal stream to remove all day-time data as well as any times flagged as potentially corrupted by a range of environmental indicators (rain, excessive site RFI, bad calibration due to instrument restart, etc.). The data are combined into aggressively cleaned, sun-free, sidereal stacks which are the main science-ready data products of the CHIME data pipeline. Corrections for thermally induced phase shifts as described in Section 4.2 can be applied at this point.

BEAMS
The biggest challenge for detecting extragalactic 21 cm emission is filtering out the much brighter foreground emission, dominated by diffuse Galactic emission and extragalactic radio sources (Liu & Tegmark 2011). To do so, it is crucial to have precise knowledge of the instrumental beam response. Estimates by Shaw et al. (2015) indicate that this response must be characterized to roughly a part in $10^4$ in power units, and this has motivated the pursuit of a number of parallel strategies for beam measurement and modelling, as well as efforts to quantify the required precision in more detail. In this section, we first describe how CHIME's instrument design determines the general features of the beam response, and then present the current status of our ongoing work to characterize this response.

General Features of the CHIME Beams
We define the "base" beam to be the illumination on the sky (amplitude, phase, and polarization) that results when a single feed broadcasts with all other feeds along the focal line shorted (Deng & Campbell-Wilson 2014). Although CHIME never operates as a transmitter, this is a useful construct for understanding the beam properties. In the absence of multipath effects, discussed below and in Section 3.3, this base beam produces a nearly elliptical illumination of the sky: ∼120 degrees long in the unfocused North-South (N-S) direction, along the cylinder axis, and a few degrees wide with frequency-dependent diffraction side-lobes in the East-West (E-W) direction, perpendicular to CHIME's cylinder axis.
Multi-path and other coupling effects alter this simple description by as much as 50% at some frequencies. The physical origin of the multi-path interference is radiation interacting with the focal-line assembly, which consists of the linear feed array and a common ground plane. In this environment, a signal broadcast by a feed will reflect off the cylinder and a large fraction of that signal will go directly to the sky, but a small portion strikes the focal plane assembly, where some is absorbed by a neighbouring feed and the rest is reflected and/or re-radiated by the assembly, eventually reaching the sky. The details of this latter interaction are complex and are still actively being characterized. Nonetheless, the "primary" beam is the illumination on the sky one gets when these effects are accounted for. The "synthesized" beam is the illumination produced by coherently combining the signal from multiple feeds, each with their own (nearly identical) primary beams. In this section we focus on characterizing CHIME's primary beam.
Since multi-path propagation is occurring within a 5 m cavity (CHIME's focal length), new interference fringes arise roughly every 30 MHz in frequency, as seen below. In the remaining sections we present the datasets used to calibrate CHIME's primary beam, and discuss approaches to modelling the full response, informed by these data.
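The quoted fringe spacing follows from simple arithmetic if one assumes that the delayed signal travels one extra round trip across the cavity (twice the 5 m focal length); that round-trip assumption, and the function name below, are illustrative rather than taken from the text.

```python
C = 299_792_458.0  # speed of light in m/s


def fringe_period_mhz(cavity_length_m):
    """Frequency spacing of interference fringes for one extra round trip.

    Assumes the delayed copy travels an additional path of twice the cavity
    length, so that fringes repeat every 1 / delay in frequency.
    """
    extra_path_m = 2.0 * cavity_length_m
    delay_s = extra_path_m / C
    return 1.0 / delay_s / 1e6


period = fringe_period_mhz(5.0)  # close to 30 MHz for a 5 m cavity
```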

Datasets for Beam Calibration
Ideally, the CHIME primary beam calibration would be based on direct measurements of the telescope's response to a bright (relative to the sky confusion), polarized point source along every direction in the far field, at every frequency. However, a sufficiently complete population of such sources is not available; instead, we make use of several direct measurements, each of which provides beam information in a different regime. Importantly, these regimes often overlap, which allows for multiple cross-checks on the results. Thus far, the most useful information has been obtained from three datasets: holography of bright point sources, which allows beam amplitude and phase measurements for each feed along a limited number of one-dimensional tracks through the beam; transits of bright point sources, which trace the feed-averaged beam response on meridian; and transits of the Sun, which provide similar information to holography (without the phase information) but with near-continuous sampling over a specific range of declination.
When plotting 2-dimensional beam measurements over a large angular extent, we use an orthographic projection with its origin at zenith. This projection has the advantage of not distorting the apparent beam width at different elevations. Moreover, the projected coordinates x and y in the tangent plane remain parallel to East and North, respectively. For the unit vector pointing to hour angle HA and declination δ, the projected coordinates are

$$x = -\cos\delta \, \sin\mathrm{HA}, \qquad y = \cos\ell \, \sin\delta - \sin\ell \, \cos\delta \, \cos\mathrm{HA},$$

where $\ell$ is the latitude of the observer (+49.3° for CHIME).
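A zenith-centred orthographic projection of this kind can be sketched in a few lines. The function name is hypothetical, and the sign convention (x increasing to the East, hence negative for a source at positive hour angle, west of the meridian) is an assumption of this example.

```python
import math

CHIME_LATITUDE_DEG = 49.3  # observer latitude quoted in the text


def orthographic_xy(hour_angle_deg, dec_deg, latitude_deg=CHIME_LATITUDE_DEG):
    """Project (HA, dec) onto the tangent plane at zenith.

    Returns (x, y) with x toward East and y toward North; both are direction
    cosines and therefore lie in [-1, 1] for sources above the horizon.
    """
    ha = math.radians(hour_angle_deg)
    dec = math.radians(dec_deg)
    lat = math.radians(latitude_deg)
    x = -math.cos(dec) * math.sin(ha)
    y = (math.cos(lat) * math.sin(dec)
         - math.sin(lat) * math.cos(dec) * math.cos(ha))
    return x, y


# A source at zenith maps to the origin; a source on the meridian 10 degrees
# north of zenith maps to y = sin(10 deg).
zenith_xy = orthographic_xy(0.0, CHIME_LATITUDE_DEG)
north_xy = orthographic_xy(0.0, CHIME_LATITUDE_DEG + 10.0)
west_xy = orthographic_xy(10.0, CHIME_LATITUDE_DEG)
```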

Holography
Holography is an established technique for making accurate measurements of the amplitude and phase of antenna beams at radio frequencies (e.g. Bennett et al. 1976; Scott & Ryle 1977; Baars 2007). We use this technique by tracking a celestial source with a nearby moving telescope while the source transits through the stationary CHIME beam. The correlation between the signals from each stationary feed and the tracking reference telescope traces the response of CHIME along the path of the source. For CHIME holography, the John A. Galt 26 m telescope, located 230 m East of CHIME, is used as the tracking system. For these observations, a 400 MHz to 800 MHz dual-polarization modified CHIME receiver is mounted on the Galt telescope (Berger et al. 2016). The resulting cross correlations yield CHIME's co-polar and cross-polar far-field beam response (amplitude and phase) per feed, per frequency, along a track in hour angle at the declination of each observed source.
The data collected to date comprise 1888 tracks of 24 celestial sources since holographic observations began in October 2017, typically spanning ±40° or more in hour angle and −21° to 65° in declination (−70° to 16° in zenith angle). The data are fringe-stopped (phase shifted to account for Earth rotation) and binned to a celestial grid, with the resulting average and variance per bin stored on disk. Data from successive observations of a given source can be combined, reducing measurement noise. A sample holographic measurement of Cyg A is presented in Fig. 15, which shows the amplitude and phase of the co- and cross-polar beams in each CHIME cylinder. For each frequency and co-polar correlation product in the holography data, we fit the sum of a Gaussian profile and a constant offset to the amplitude response as a function of hour angle. The resulting centroid and Gaussian full-width half-max (FWHM) parameters are shown in Figs. 16 and 17, respectively, for all feeds and frequencies.

Figure 16. Per-feed measurements of the CHIME E-W beam centroid obtained from a Gaussian fit to the holographic measurements of a Cyg A track. For each Cylinder A-D (left to right panels), the best-fit centroid is shown as a function of feed position along the cylinder. Multiple points per feed show results for each non-flagged frequency that was processed for that feed. The spread with frequency arises from a small but statistically significant oscillation in the centroid with a periodicity of 30 MHz, indicating a small E-W asymmetry in the signal multi-path. The dominant effect, however, is the position-dependent variation that arises from imperfections in the cylinder surface and primarily from a few mm of E-W position offsets of feeds on the focal line. The Y polarization is shown; the X polarization shows a similar trend with a slightly larger frequency variation.
The centroid parameter shows a small but significant dependence on focal line position which is correlated for nearby feeds (Fig. 16). This suggests that the centroid offsets are due to physical displacements of the focal lines and/or cylinder structures from their design positions. Note that, given the 5 m focal length of CHIME, a 0.2° centroid offset requires an effective position offset of 1.7 cm between the E-W feed position and the symmetry plane of the cylinder. In cylinders A, C, and D, the median centroid offset (taken over feed number) is close to zero, whereas in Cylinder B, all feeds are offset to the east (i.e., towards negative hour angle), implying that the focal line as a whole is offset by ∼1 cm from the symmetry plane of Cylinder B's parabolic figure. Multiple points for each feed on a given cylinder in Fig. 16 show measurements for that feed at different frequencies, and the spread of these points represents a small frequency dependence in the E-W centroid. This variation has a periodicity of ∼30 MHz which arises from an E-W asymmetry in CHIME's signal multi-path. Multi-path effects are discussed in Section 3.3. Fig. 17 shows the FWHM parameter as a function of frequency for both polarizations, with multiple points per frequency representing measurements for all the non-flagged feeds for that frequency. As expected given the dipole illumination pattern of the feed, the FWHM is roughly twice as large at 400 MHz as at 800 MHz and ∼20% higher in the X polarization than in the Y polarization. Multi-path effects cause the ∼30 MHz ripple in the FWHM for both polarizations. There is a larger spread in the FWHM measurements for the X polarization, especially at low frequencies. This difference between polarizations remains after including flags for feeds near structural elements like support struts, so the exact cause of the larger spread in X polarization, and its impact on the cosmology data analysis, remains under investigation.

Figure 17. Measurements of the CHIME X − Z plane (East-West) beam FWHM obtained from a Gaussian fit to the holographic measurements of a Cyg A track, plotted as a function of frequency. Multiple points per frequency show results for each non-flagged feed that was processed for that frequency. The top and bottom panels show results for the Y and X polarizations, respectively. The dominant variation in the FWHM arises from signal multi-path which introduces a 30 MHz periodicity in the beam response. Characterizing this multi-path is the dominant ongoing effort in the CHIME beam calibration program.
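The conversion between a beam centroid offset and an E-W feed displacement quoted above can be checked with a small calculation. The sketch assumes the simple geometric relation offset = focal length × tan(angle); the function names are illustrative.

```python
import math

FOCAL_LENGTH_M = 5.0  # CHIME focal length quoted in the text


def feed_offset_cm(centroid_offset_deg, focal_length_m=FOCAL_LENGTH_M):
    """E-W feed displacement (cm) implied by a beam centroid offset (deg)."""
    return 100.0 * focal_length_m * math.tan(math.radians(centroid_offset_deg))


def centroid_offset_deg(feed_offset_in_cm, focal_length_m=FOCAL_LENGTH_M):
    """Beam centroid offset (deg) implied by an E-W feed displacement (cm)."""
    return math.degrees(math.atan(feed_offset_in_cm / (100.0 * focal_length_m)))


offset_for_0p2_deg = feed_offset_cm(0.2)        # about 1.7 cm
tilt_for_1_cm = centroid_offset_deg(1.0)        # about 0.11 deg
```

Under this relation, the ∼1 cm focal-line offset inferred for Cylinder B corresponds to a centroid shift of roughly a tenth of a degree.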

Celestial Sources Near Transit
There are 37 bright point sources in CHIME's declination range with flux greater than 10 Jy at 600 MHz, which is significantly above our estimated confusion noise of ∼0.1 Jy. These sources span zenith angles of 37° north of zenith to 38.9° south of zenith. We measure the spectra of these sources at transit by phasing the CHIME array to the declination of the source and recording the observed spectrum as a separate dataset. Given our Cyg A calibration strategy, the ratio of the observed spectrum to its spectrum reported in the literature gives the ratio of CHIME's on-meridian beam response at the zenith angle of the source to its on-meridian response at the zenith angle of Cyg A. Examples of these data are shown in Fig. 18, along with a preliminary fit to a "coupling model" described in Section 3.3.2.
This technique can be extended to a much larger number of fainter sources if we restrict attention to inter-cylinder baselines which have a large east-west baseline component and therefore lower confusion noise from diffuse synchrotron emission. For the cross correlation of CHIME data with large scale structure traced by the eBOSS survey (CHIME Collaboration et al. 2022a), we used this technique to produce a model of CHIME's main lobe response from the north to south horizon. A detailed description of the procedure and the model is given there, so we provide only a brief summary here.
Inter-cylinder baselines with a large east-west component are largely insensitive to diffuse sky signals, such as Galactic synchrotron emission. Thus, one can approximate the emission measured by these baselines as solely composed of radio point sources (ignoring the subdominant cosmological signal). We construct a model of this sky using catalogs of source spectra measured by the VLA Low-frequency Sky Survey (VLSS; Cohen et al. 2007), the Westerbork Northern Sky Survey (WENSS; Rengelink et al. 1997), the NRAO VLA Sky Survey (NVSS; Condon et al. 1998), and the Green Bank survey (GB6; Gregory et al. 1996). This sky model is put into a simulation pipeline that produces mock (noise-free) visibilities which have no CHIME beam convolution applied. Then, as described in Appendix A of CHIME Collaboration et al. (2022a), we form beams on the sky using both the simulated and measured visibilities and regress the two data sets to infer the primary beam response in the data. The resulting beams are filtered to remove small-scale features that likely originate from flux errors in the catalog.
At present, the model is only derived for hour angles less than roughly 2°, but in principle it can be extended to cover the dominant east-west sidelobes. Fig. 19 shows the beam response obtained from this method for the Y polarization at 600 MHz. Our interpretation of the main features of this beam is given in Section 3.1 and 3.3.

Solar Response
The Sun provides a complementary dataset to astrophysical point sources for beam mapping. Every six months, the Sun moves between ±23.5° declination, providing quasi-continuous spatial sampling over this declination range. Additionally, the brightness of the Sun (>100 kJy) permits unconfused hour angle coverage comparable to the holographic measurements. The flux of the Sun varies with time, but this can be calibrated at every declination that has a sufficiently bright astrophysical source. Variability between such calibrations limits the accuracy of these data, as does the finite angular size of the Sun, but even this qualitative information is invaluable for guiding beam modelling efforts. Data collected in the Fall of 2019 are shown in Fig. 20. A more detailed description of CHIME's solar data processing is presented in CHIME Collaboration et al. (2022b).

Beam Modelling
Ultimately, we seek to use the datasets described above to construct a single comprehensive beam model. The biggest challenge in this endeavor is accurately accounting for the multi-path and coupling effects that modulate the simple elliptical base beam. In the following, a few complementary approaches to this problem are described: a data-driven approach, where we attempt to extrapolate the datasets described above to the 2π sr above the horizon; and a semi-analytic approach where we model the coupling between separate feeds with a physically-motivated parameterization. Note that this work is ongoing, and further details are deferred to forthcoming papers. The models described below are intended to describe a typical feed's beam response. The response of individual feeds will deviate from this owing, for example, to perturbations in the cylindrical reflector shape (e.g. Fig. 5), and/or to feed position and orientation offsets (e.g. Fig. 16). Additionally, the presence of structural elements in the vicinity of some feeds, e.g. support struts, can scatter radiation and alter the beam response of those feeds (Landecker et al. 1991). Given that CHIME measures numerous redundant visibilities (i.e. correlation products with the same baseline), feed-to-feed variations will average down in the stacked data. The extent to which these variations must be accounted for when filtering foregrounds remains to be quantified.

Data-Driven Extrapolation
We exploit the fact that CHIME's beam response is nearly separable in orthographic (x, y) angular coordinates, and use singular value decomposition (SVD) of the solar data to derive a set of beam modes which can be continued to regions not covered by the solar data. The extrapolations can be guided by additional data, e.g., the holography data (§3.2.1) and/or the celestial source data (§3.2.2); and/or by theory, e.g. the coupling model (§3.3.2). We have been developing a few approaches to this extrapolation problem which we outline below. However, we have yet to settle on a single approach, so we defer the details to a forthcoming paper.
In one approach we form a set of basis functions at a target frequency, derived from the solar data in a small frequency range centred on the target frequency. We use the coupling model to extrapolate these functions to 2π sr and fit them to a combination of the holography and celestial source data described above. The viability of this model rests on the fact that ∼99% of the variance in the solar data can be described by a linear combination of 3 modes which are separable in (x, y) coordinates. However, our ability to accurately extrapolate these modes to the rest of the sky relies on a model that has known limitations. Further, our ability to assess the quality of the model is limited by the available holography and source data, which have limited sky coverage. Fig. 21 shows a current estimate of the 2π sr beam response at 678 MHz.
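The notion of describing a two-dimensional beam map by a few modes that are separable in (x, y) can be illustrated with a generic SVD sketch. This is not the CHIME analysis code: the function name, the toy map, and the use of the squared singular values as the "fraction of variance" are assumptions of this example.

```python
import numpy as np


def separable_modes(beam_map, n_modes=3):
    """Decompose a 2-D map (rows = y, columns = x) into separable modes.

    Each mode is an outer product of a y-profile and an x-profile. The
    returned fractions are the share of the total sum of squares carried
    by each of the leading modes.
    """
    u, s, vt = np.linalg.svd(beam_map, full_matrices=False)
    fractions = s**2 / np.sum(s**2)
    modes = [s[k] * np.outer(u[:, k], vt[k]) for k in range(n_modes)]
    return modes, fractions[:n_modes]


# Toy map: a dominant separable beam plus a weak second separable term.
x = np.linspace(-1.0, 1.0, 50)
y = np.linspace(-1.0, 1.0, 40)
toy_map = (np.outer(np.exp(-y**2), np.exp(-(x / 0.1) ** 2))
           + 0.01 * np.outer(y, x))
toy_modes, toy_fractions = separable_modes(toy_map, n_modes=2)
```

For a map built from two separable terms, as here, two modes reconstruct it to numerical precision; the statement in the text is the analogous empirical finding that three modes capture ∼99% of the solar data.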
In a second approach, we exploit the fact that the sidelobe signal in the solar data, as a function of orthographic x once re-scaled by frequency, i.e., $\tilde{x} \equiv x \cdot (\nu / 600\,\mathrm{MHz})$, is well described by a linear combination of three functions of $\tilde{x}$ over the entire range of (y, ν) measured by the solar data. We fit these three modes to the near-meridian celestial source data depicted in Fig. 19, at each y and ν separately. The result is a 2π sr model which is visually similar to Fig. 21. A detailed description and comparison is deferred to a forthcoming paper.

Coupling Model
This approach is a phenomenological one inspired by physical optics: we form a parameterized model of the base beam and of multi-path effects, and fit those parameters to the data described in Section 3.2. In its simplest form, called the coupling model, the multi-path is attributed entirely to crosstalk between pairs of feeds along the focal line. In the time domain, we may express this as a superposition of base beam profiles, delayed by specific amounts in time,

$$B_i(\hat{n}, t) = A_i(\hat{n}, t) + \sum_{j \neq i} \alpha_{ij} \, A_j(\hat{n}, t + \tau_{ij}),$$

where $B_i(\hat{n}, t)$ is the resulting field for feed $i$, $A_i(\hat{n}, t)$ is the electric field produced by feed $i$ (thought of here as a transmitter) in the absence of neighbouring feeds, $\hat{n}$ is a directional unit vector, $A_j(\hat{n}, t + \tau_{ij})$ is the electric field produced by neighbouring feed $j$, delayed by a time $\tau_{ij}$, and $\alpha_{ij}$ is a coupling coefficient that describes the strength of the coupling. In the frequency domain, the time delay transforms to a phase factor. In the model's simplest form, we assume that all feeds produce the same pattern, $A(\hat{n})$, and that there are two coupling paths between any pair of feeds: a "direct" path via signals propagating parallel to the ground plane with delay $\tau_{ij} = |\Delta y_{ij}|/c$, where $\Delta y_{ij}$ is the north-south separation between feeds $i$ and $j$, and a "1-bounce" path via signals reflecting once off the cylinder as they travel from feed $i$ to feed $j$, with a delay set by analogous geometric arguments. The model is parameterized in terms of coupling coefficients for different coupling paths, and their associated fall in strength as a function of feed separation. An example of this model, fit to the source transit data and evaluated on meridian, is presented in Fig. 18.

Figure 21. Orthographic projection of the modelled CHIME beam response in X (upper panel) and Y (lower panel) polarization, generated at 678 MHz using the data-driven model described in Section 3.3. It is modelled using basis functions derived from solar data measurements, which are fit to independent measurements of the beam.
Typical coupling strengths between adjacent feeds are found to be ∼15% and ∼3% for the direct-path and 1-bounce-path cases, respectively. The coupling strength as a function of antenna separation falls differently for the two cases, and is estimated to be $\sim 1/|\Delta y_{ij}|^{2}$ and $\sim 1/|\Delta y_{ij}|^{1/2}$ for the direct and 1-bounce paths, respectively. Multi-bounce paths couple at less than one percent. Further details about the parameterization and performance of this model will be presented in a forthcoming paper.
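A toy frequency-domain version of the coupling model helps show how delayed copies of the base beam produce a ripple in frequency. The sketch below evaluates the complex factor multiplying a common base beam toward zenith, where the geometric phase between feeds vanishes. Several ingredients are assumptions made only for illustration: the N-S feed spacing of 0.3048 m, the 1-bounce path length (taken as a reflection at the cylinder vertex midway between the two feeds), the sign of the phase factor, the number of neighbours included, and the treatment of neighbours on both sides as identical.

```python
import cmath
import math

C = 299_792_458.0        # speed of light in m/s
FEED_SPACING_M = 0.3048  # assumed N-S feed spacing (not given in this section)
FOCAL_LENGTH_M = 5.0     # focal length quoted in the text


def coupling_modulation(freq_hz, n_neighbours=10,
                        alpha_direct=0.15, alpha_bounce=0.03):
    """Complex factor multiplying the base beam toward zenith (toy model).

    alpha_direct and alpha_bounce are the adjacent-feed coupling strengths;
    they fall as 1/|dy|^2 and 1/|dy|^(1/2) with feed separation, respectively.
    """
    total = 1.0 + 0.0j
    for k in range(1, n_neighbours + 1):
        dy = k * FEED_SPACING_M
        # Direct path along the ground plane: delay |dy| / c.
        tau_direct = dy / C
        a_direct = alpha_direct / k**2
        # 1-bounce path: assumed geometry, feed -> vertex -> feed.
        tau_bounce = math.sqrt(4.0 * FOCAL_LENGTH_M**2 + dy**2) / C
        a_bounce = alpha_bounce / math.sqrt(k)
        pair = (a_direct * cmath.exp(2j * math.pi * freq_hz * tau_direct)
                + a_bounce * cmath.exp(2j * math.pi * freq_hz * tau_bounce))
        # Neighbours to the north and to the south contribute equally here.
        total += 2.0 * pair
    return total


# Evaluate the modulation at two frequencies half a fringe period apart.
mod_600 = coupling_modulation(600e6)
mod_615 = coupling_modulation(615e6)
no_coupling = coupling_modulation(600e6, alpha_direct=0.0, alpha_bounce=0.0)
```

In this toy model the 1-bounce terms all have delays close to 2 × 5 m / c, so their contribution oscillates with a period of roughly 30 MHz, while the direct-path terms vary much more slowly with frequency.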
There are at least two known limitations of the coupling model described above: 1) to date, it has not been able to fully account for the frequency dependence we observe in the source transit data (Fig. 18), especially in the lower half of the frequency band, and 2) it predicts a N-S response modulation that is independent of E-W direction on the sky, which is inconsistent with the solar data (Fig. 20). There are at least two possible explanations for this: 1) the coupled feeds, $j$, (re)radiate a different base beam, $A_j(\hat{n})$, than does the source feed, $i$, and/or 2) in addition to coupled feeds re-radiating the source signal, there is also a reflected signal that bounces directly off the ground plane and back to the cylinder before reaching the sky. This reflected signal could have a slightly different delay parameter than the 1-bounce coupled signal, and is expected to have a different E-W profile than the coupled signal.
We are in the process of developing a richer model that incorporates these effects, parameterized by the electric field distribution in the cylinder aperture, as informed by the commercial software packages CST (Simulia 2022) and GRASP (TICRA 2022). From preliminary studies, it appears that the aperture field can be parameterized relatively compactly, and that the resulting model is qualitatively successful at fitting the features seen in the solar data. Specifically, with ∼20 parameters to describe the aperture field, single-frequency fits to the solar data in Fig. 20 produce a model with residual errors of $\sim 10^{-3}$ in the solar data, but which can be evaluated over the full sky. Future work will involve using the holography and celestial source data in the model fits, so a detailed discussion of this effort will be deferred to a forthcoming paper. Note that the coupling model described above is a special case of this more general multi-path model.

Beam Model Usage
In this section we summarize how various beam models developed for CHIME have been used in scientific analyses to date.
• The celestial source model depicted in Fig. 19 was developed for the stacking analysis presented in CHIME Collaboration et al. (2022a).
• The detection of an exceptionally bright radio burst from a Galactic magnetar (CHIME/FRB Collaboration et al. 2020) occurred when the object was 22 degrees off of CHIME's meridian. Characterization of this rare event requires knowledge of the instrumental beam well off axis. We use the solar data (CHIME Collaboration et al. 2022b) and Taurus A (Tau A) holography data to measure CHIME's beam response there, enabling a measurement of the burst flux/fluence.
• The first CHIME FRB catalog (The CHIME/FRB Collaboration et al. 2021) gives an estimate of flux/fluence of each FRB. A beam model which gives the beam solid angle as a function of forward gain and frequency is required to model the statistical distribution of their brightness. An early version of the 2π sr model depicted in Fig. 21 is used for this work. This enables a measurement of the FRB sky rate, one of the main results from the paper.
• CHIME/FRB is able to perform polarimetry on some events (Mckinven et al. 2021). While polarized beam models are not yet used in these measurements, CHIME's beam data have informed which systematic effects need to be included in the polarization fits as nuisance parameters. The most important of these is the differential response of the X- and Y-polarized beams near their half-power points, seen clearly in all CHIME beam measurements.
• The FRB team is building outrigger cylindrical telescopes to provide a steady stream of sub-arcsecond localizations of FRBs. Data from CHIME holography show a lack of significant beam phase variation within a few degrees of meridian (Fig. 15). This result is crucial input to the design of the CHIME Outriggers, meaning the optical design of the Outriggers could differ somewhat from CHIME's design and not require beam phase re-calibration.

4. PERFORMANCE
In this section, we evaluate the performance of the instrument using data acquired over the first two years of operation and a number of dedicated measurements. This performance evaluation includes an examination of the main sources of data loss, an assessment of the stability of the complex receiver gains, a characterization of the system temperature, an investigation into the effectiveness of the real-time RFI excision algorithm, and finally, a presentation of maps of the radio sky created from the CHIME stack dataset.

Data Loss
CHIME has been operating continuously since its first-light ceremony on 7 September 2017. The first year of operations was dedicated to commissioning the instrument, developing the real-time pipeline, and developing the calibration and flagging strategies. Acquisition of data for the cosmological analysis began on 7 October 2018. Since then, the daily data acquisition rate has averaged ≈1 TB/d. Of this daily total, approximately 600 GB is the stack dataset containing the primary science data. The remainder is calibration, beam holography, housekeeping, and other engineering datasets. Table 1 summarizes the main sources of data loss between 7 October 2018 and 7 October 2020. During this two-year period, the instrument was down for a total of 127 d (17%). The majority of this time (102 d) was due to planned hardware maintenance and software upgrades, which occurred approximately five times per year. A further 25 d were unintended interruptions due to power failures, cooling failures and other accidental outages.
The radio signal from the Sun dominates over the signal from the rest of the sky, even when the Sun is in the far sidelobes. The signal from the Sun can be modelled and subtracted to a large extent; however, feed-to-feed variations in the gain or beam and inaccuracies in the model for the extended emission yield residuals that are significant compared to the noise and signal from the rest of the sky. As a result, data acquired when the Sun is above the horizon are currently excluded from the cosmological analysis.
Precipitation at the telescope site causes deterioration of the detected signal as a result of water pooling around the focal line electronics. This signal deterioration is broadband and characterized by a reduction in gain, an increase in noise, and, occasionally, gain oscillations with periods ranging from seconds to minutes. Accumulation of dry snow does not cause analogue signal deterioration, but snow-melt, which is more difficult to detect using weather data alone, does produce the same signal deterioration as rain. Signals from the 2048 feeds are monitored for broadband, differential increases in their autocorrelations. This signature is used to identify and flag wet feeds before collating redundant baselines. After each rain or snow-melt, roughly 4 % to 12 % (inter-quartile range (IQR)) of the inputs are flagged for 3 h to 21 h (IQR), effectively until they dry. It is not yet clear if data acquired during these wet periods can be used in the science analysis. Excluding it results in a 20% reduction in observing time, preferentially occurring in months when nights are longest and therefore when we have the most useful data.
Steps are being taken to improve focal line waterproofing.
The synchronization procedure implemented by the FPGAs does not guarantee that the phase of a common signal measured by two inputs on different ADC chips will remain constant through an FPGA restart. This change in phase after FPGA re-synchronization is observed in the noise-source data, and the size of the phase change can be large compared to the requirements on instrument stability outlined in Section 4.2.2. As a result, we mask the interval between each FPGA restart and the following point source calibration. This results in an approximately 6 % reduction in observing time. Table 1 also lists the average fraction of the 400 MHz to 800 MHz band that is masked due to RFI (as detailed in Section 4.4) and lost due to non-operational GPU nodes. Note that in June 2020 the correlator software was upgraded to allow for much greater flexibility in the mapping between frequency channels and GPU nodes. This gave us the capability to send frequency channels already contaminated by persistent RFI to the set of GPU nodes that are offline at any given time, which recovers a large fraction of the 10 % of the band that was previously lost due to non-operational GPU nodes.
Finally, Table 1 provides estimates of the fraction of the 2048 correlator inputs that are masked prior to collating redundant baselines. The flagging broker masks approximately 3% of inputs because they fail one or more of the tests described in Section 2.5. In addition, in December 2019 we began applying a static mask that consists of the 8 feeds at the edge of each cylinder because it was determined that these feeds exhibit a highly non-redundant beam pattern.

Stability
The stability of the instrument is assessed using the complex gains measured by the calibration broker (described in more detail in Section 2.5). The broker computes and stores gains using data from four bright source transits every day: Cassiopeia A, Cygnus A, Taurus A, and Virgo A, henceforth referred to as Cas A, Cyg A, Tau A, and Vir A respectively. Fig. 22 shows an example of $N_\mathrm{feed}^2$ visibility data acquired during a Cyg A transit after applying the complex gains derived from the transit. On any given day, one source is chosen as the primary calibrator (typically the brightest source to transit at night), but all of the transits are analyzed offline to assess stability. To help assess and maintain phase stability, a broadband noise source system is also employed, as described below.

Figure 23. Gain amplitude stability. The blue band (raw amplitude) shows the standard deviation of the fractional gain amplitude variations as determined from 259 transits of Cas A, Cyg A and Tau A from June 2018 to July 2019. The rms stability is evaluated input by input; the central curve shows the mean stability across inputs while the band indicates the 1-sigma spread across inputs. The orange band (daily correction) shows the corresponding information with the gains corrected once per day using a previous transit. The green band (daily + thermal correction) shows the result of applying an additional correction to the orange data based on a linear regression to the ambient temperature change since the previous transit. Gaps in the data correspond to known RFI-dominated bands.
We use end-to-end simulations to determine our stability requirements. Simulation of a CHIME-sized telescope is challenging due to computer resource limitations; therefore, we have performed simulations of a scaled-down instrument (with roughly 1/4 of CHIME's collecting area) to investigate these requirements, examining the anticipated accuracy of the 21 cm power spectrum measured after the application of the Karhunen-Loève foreground filter described in Shaw et al. (2015). This work found the requirement for fractional variations in the complex gain to be less than 1 %, which translates into phase errors smaller than 0.007 rad and amplitude errors smaller than 0.7 %. However, these requirements are derived from a simulation whose gain variations are constant across the band and uncorrelated from input to input. Neither of these conditions holds in the observed gain variations presented below, and it is unclear how the requirements scale with the size of the telescope; thus, these requirements serve as a rough guide only. More realistic simulations designed to better reflect some of the observed complex gain variations have since been performed, and the resulting requirements are noted below where applicable.

Amplitude
Fig. 23 shows the fractional gain amplitude variations (standard deviation) for all correlator inputs as derived from the calibration broker gains over a full year from June 2018 to July 2019. These data include 259 gain solutions (94 from Cas A, 89 from Cyg A, and 76 from Tau A), which have been scrubbed of RFI-contaminated transits and anomalous gains mostly related to wetness of the instrument during rainy periods.
The blue curve indicates the intrinsic gain variations after outliers are removed but prior to applying any calibration corrections. It shows a pronounced slope with frequency which ranges from 1 % at 400 MHz to 1.8 % (standard deviation) at 800 MHz. A substantial portion of this variation is due to the thermal susceptibility of the instrument.
The orange curve shows the residual gain amplitude variations for the same data, but after applying a daily correction similar to that which is applied to the archived visibility data. This gives an indication of the gain variations present in the stored data prior to applying any subsequent corrections (see below). To produce this curve, we take the difference between each transit's gain and a solution from the previous 48 h (if available) and compute the standard deviation of the difference. This procedure brings the fluctuations down to a nearly flat 0.9 % to 1 % level.
The green curve shows the residuals after correcting the orange data using the measured thermal susceptibilities and the ambient temperature change since the previous transit. This brings the variations down to ∼0.7 % (standard deviation). The thermal correction flattens the residuals considerably, a consequence of the fact that the temperature susceptibility of the system gain rises with frequency from 0.06 %/K at 400 MHz to 0.2 %/K at 800 MHz. This measured stability achieves the preliminary requirement described above, but with no margin.
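As an illustration of the thermal correction, the following is a minimal sketch of a per-input linear regression of fractional gain change against ambient temperature change. The function name, the choice of a fit through the origin, and all synthetic numbers other than the 0.2 %/K and 0.7 % values quoted above are assumptions made for the example, not a description of the actual pipeline.

```python
import numpy as np

def thermal_regression(frac_gain_change, delta_temp):
    """Least-squares thermal susceptibility and residuals for one input.

    frac_gain_change : fractional gain amplitude change between transits
    delta_temp       : ambient temperature change over the same intervals [K]
    """
    frac_gain_change = np.asarray(frac_gain_change, dtype=float)
    delta_temp = np.asarray(delta_temp, dtype=float)
    # Assumption: fit a line through the origin (no temperature change
    # implies no thermally induced gain change).
    susceptibility = (delta_temp @ frac_gain_change) / (delta_temp @ delta_temp)
    residual = frac_gain_change - susceptibility * delta_temp
    return susceptibility, residual

rng = np.random.default_rng(0)
dT = rng.normal(0.0, 5.0, size=259)        # synthetic temperature changes [K]
true_sus = 0.002                           # 0.2 %/K, the 800 MHz value in the text
dg = true_sus * dT + rng.normal(0.0, 0.007, size=259)  # 0.7 % non-thermal scatter
sus, res = thermal_regression(dg, dT)
print(f"fitted susceptibility: {100 * sus:.3f} %/K")
print(f"std before: {100 * dg.std():.2f} %, after: {100 * res.std():.2f} %")
```

In this toy case the residual scatter falls back to roughly the non-thermal level that was injected, which is the behaviour the green curve is meant to capture.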
By construction, the data tracked by the orange curve remove gain variations slower than about one day due to any source, while the data tracked by the green curve remove variations correlated with ambient temperature on all time scales. We find that thermal regression applied to the raw data (blue curve) and to the daily-corrected data (orange curve) produces similar residuals. This suggests that most of the variation on time scales longer than a day is thermal in origin, and that variations on shorter time scales are not well correlated with ambient temperature.
The analysis discussed above is carried out input by input, assuming nothing about how correlated the gain variations are across inputs. A singular value decomposition (SVD) analysis of the raw gain variations over input and time reveals a single dominant mode followed by a closely packed mode spectrum. The dominant mode accounts for about 60% of the data variance at the lower end of the band and grows to over 80% of the variance halfway to the high end of the band. The singular vector of the dominant mode is highly correlated with the ambient temperature, implying that an ambient-temperature-based correction largely accounts for the common-mode portion of the variance. Thus, the residual variability after thermal regression (the green band in Fig. 23) gives a good estimate of the non-common-mode variations in the system. These residual gain variations show some degree of correlation across inputs. Making use of this to further improve the correction is under study.
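The SVD diagnostic described above can be sketched as follows. This is a toy example on synthetic data: the array sizes, the injected common mode, and the noise level are assumptions chosen only to show how the variance fraction of the dominant mode and its correlation with temperature would be computed.

```python
import numpy as np

def dominant_mode_fraction(gain_var):
    """Variance fraction and time series of the largest singular mode.

    gain_var : array (n_input, n_time) of fractional gain variations
    """
    # Remove each input's time average so the SVD describes variations only.
    centered = gain_var - gain_var.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    frac = s[0] ** 2 / np.sum(s ** 2)
    return frac, vt[0]                      # vt[0]: dominant mode versus time

rng = np.random.default_rng(1)
n_input, n_time = 64, 200
temperature = rng.normal(0.0, 5.0, size=n_time)              # synthetic [K]
coupling = 0.001 * (1.0 + 0.1 * rng.normal(size=n_input))    # per-input [1/K]
gains = np.outer(coupling, temperature)
gains += rng.normal(0.0, 0.003, size=(n_input, n_time))      # uncorrelated part
frac, mode = dominant_mode_fraction(gains)
corr = np.corrcoef(mode, temperature)[0, 1]
print(f"dominant-mode variance fraction: {frac:.2f}")
print(f"|correlation with temperature|: {abs(corr):.2f}")
```

The sign of a singular vector is arbitrary, so only the magnitude of the correlation is meaningful.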
The frequency structure of the gain stability depends on the declination of the source used to derive the gains. This appears to be due to a time and/or thermal dependence of the primary beam response of the instrument. This would be expected if feed-to-feed cross-talk depended on time and/or temperature, which, in turn, could result from thermal expansion and contraction of the CHIME structure. (See Section 4.2.2 and Section 3 for further discussion of these effects.) Efforts to model this dependence are ongoing. The results shown in Fig. 23 are computed from the gains derived from the three brightest sources, so the frequency structure shown there is a weighted average of the response to these three sources.

Phase
Fig. 24 summarizes the phase stability of CHIME as inferred from the response of each correlator input to the two brightest calibration sources (Cyg A and Cas A). The measured phase variations are highly correlated across frequencies and, to first order, can be described by delay-type variations of the form

$$\delta\phi_{ij}(t, \nu) = 2\pi\,\nu\,\delta\tau_{ij}(t) \tag{4}$$

where $\delta\phi_{ij}$ is the change in the relative phase between inputs $i$ and $j$ at time $t$ for radio frequency $\nu$ due to a change in the relative delay $\delta\tau_{ij}$. If we perfectly corrected all delay-type variations, the phase stability of the instrument would improve from the red curve to the black curve in Fig. 24.
The dominant sources of delay variations are: relative drifts between copies of the 10 MHz clock that defines the sampling rate of the ADCs; expansion and contraction of the telescope with ambient temperature; and changes in the electrical length of the 50 m coaxial cables with temperature. We describe these three sources of delay variation in turn, and outline the methods used to partially correct for them, to stabilize the phase. After applying the corrections, the resulting stability is given by the blue curves in Fig. 24.
The dominant source of phase instability on time scales ≲ 20 min is the relative drift between the eight copies of the 10 MHz clock that are separately distributed to each of the eight FPGA crates. Each clock defines the sampling rate of the 256 ADCs within a crate. The drifts are measured and corrected using a single broadband noise source that is distributed to one input on each of the eight FPGA crates through a passive system of coaxial cables and power splitters. The correlator computes the covariance of the noise source inputs over a 10 s integration for each of the 1024 native resolution frequency channels. The largest eigenvector of this covariance matrix is used to estimate the response of the eight inputs to the signal from the noise source. The phase of the response is referenced to the time of the last point-source calibration to remove static ripples caused by reflections in the distribution network. Then, for each 10 s integration, the phase as a function of frequency is fit to Equation (4) to extract the delay as a function of time, $\delta\tau_{ij}(t)$, for each FPGA crate $i$ relative to a reference $j$. This is used as a proxy for the drift in the clock copy provided to that crate relative to the reference ADC input on the reference crate.
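The eigenvector-plus-delay-fit step can be illustrated with a short sketch. It assumes a pure-delay phase model, phase = 2πντ, with that sign convention, a fit through the origin, and delays small enough that the phase does not wrap across the band; the function name and the four-crate synthetic test case are hypothetical and do not reproduce the real-time implementation or its referencing to the last point-source calibration.

```python
import numpy as np

def delay_from_covariance(cov, freq_hz, ref=0):
    """Estimate per-crate delays from noise-source covariance matrices.

    cov     : complex array (n_freq, n_crate, n_crate), Hermitian per channel
    freq_hz : array (n_freq,), channel centre frequencies [Hz]
    ref     : index of the reference crate
    Returns the delay [s] of each crate relative to the reference.
    """
    # eigh sorts eigenvalues in ascending order; keep the last eigenvector.
    _, vecs = np.linalg.eigh(cov)
    response = vecs[..., -1]                          # (n_freq, n_crate)
    # Referencing to one crate also removes the arbitrary eigenvector phase.
    rel = response * np.conj(response[:, ref])[:, np.newaxis]
    phase = np.unwrap(np.angle(rel), axis=0)
    # Least-squares slope of phase versus 2*pi*nu, through the origin.
    x = 2.0 * np.pi * freq_hz
    return (x @ phase) / (x @ x)

freq = np.linspace(400e6, 800e6, 1024)
true_tau = np.array([0.0, 2e-12, -5e-12, 8e-12])      # synthetic delays [s]
v = np.exp(2j * np.pi * freq[:, None] * true_tau[None, :])
cov = v[:, :, None] * np.conj(v[:, None, :]) + 0.01 * np.eye(4)
tau = delay_from_covariance(cov, freq)
print(np.round(tau * 1e12, 2), "ps")
```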
Examining the relative delay variations between the 4 crates within a single receiver hut, we find that the variations exhibit a sawtooth pattern with an 8 min (east receiver hut) or 6 min (west receiver hut) periodicity that mimics the temperature variations in that hut. This periodicity tracks the cooling cycle of the chiller system in each hut. This produces a relative delay variation of 1 ps to 2 ps (standard deviation) between crates in the same hut. Since the temperatures of the two huts cycle at different periods, the relative delay variations between crates in different huts are significantly larger: approximately 6 ps to 8 ps (standard deviation). A suite of simulations is used to estimate the bias in the 21 cm power spectrum due to realistic clock drifts. We find that the clock drifts must have a standard deviation of ≲ 1 ps to ensure negligible bias in the power spectrum. The bottom panel of Fig. 24 shows the improvement in the short-timescale delay noise that is achieved by regressing the delay variations obtained from point-source observations against the delay measured by the broadband noise source. The residual delay variations have standard deviation < 1.5 ps and are thus close to meeting our requirements.

Figure 24. Gain phase stability. Top: the standard deviation over 74 days of CHIME's phase response to Cas A at transit after applying a daily calibration derived from Cyg A. Lines denote the median and bands denote the central 68 % over the 2048 CHIME feeds. Red indicates the raw phase variations. Blue indicates the residual phase variations after correcting for delay variations caused by drift between copies of the 10 MHz clock, thermal expansion of the focal line, and thermal susceptibility of the analog receiver chain as tracked by the ambient temperature (see the text for a discussion of each of these effects). Black indicates the residual phase variations after removing all delay-type variations by fitting and subtracting a model for the phase variations that scales linearly with frequency from each transit. The phases are referenced to the average phase over feeds of a given polarisation. Middle: the standard deviation of the delay variations, shown as a histogram over feeds. The red histogram indicates the raw delay variations while the blue histogram shows the residual delay variations after applying the three corrections listed above. Bottom: same as the middle panel, but showing delay variations on short time scales (≲ 20 min), obtained by examining a window around the transit of Cyg A or Cas A when these sources are in the primary beam. The black curve in the top panel, which corresponds to the perfect removal of all delay-type variations, is by definition equal to zero for all feeds in the bottom two panels.
Thermal expansion and contraction of the focal line introduce a temperature dependence to the north-south baseline distance that manifests as delay variations on timescales ≳ 20 min. We can model this with the following expression

$$\delta\tau_{ij}(t) = \frac{\Delta y_{ij}}{c}\,\epsilon\,\delta T(t)\,\left[\cos\ell\,\sin\delta - \sin\ell\,\cos\delta\,\cos\mathrm{HA}\right] \tag{5}$$

where $\ell$ is the latitude of the telescope, $\delta$ is the declination of the source, HA is the hour angle of the source, $\Delta y_{ij}$ is the nominal north-south baseline separation, $c$ is the speed of light, $\epsilon$ is the linear thermal expansion coefficient of the focal line, and $\delta T(t)$ is the difference between the ambient temperature and the nominal temperature. Fitting the delay variations obtained from the point source transits to Equation (5) yields a thermal expansion coefficient of $\epsilon = 21 \times 10^{-6}$ /K for the focal line. This is approximately equal to the coefficient for aluminum and roughly twice that of steel. The focal line structure itself is made of steel, while the cassettes that hold groups of 4 antennas to the focal line are made of aluminum and are bolted to each of their neighbours. The interplay of these components as the temperature changes is still under study, but our model fits the sky data well, so we adopt the best-fit $\epsilon$ as a description of the instrument. The resulting delay error is the same for all redundant baselines, so the correction for this effect can be done offline, after collating these baselines. However, the correction depends on sky position, so it needs to be implemented at the map-making stage. This work is currently under development.
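To give a sense of the size of this effect, the sketch below evaluates the geometric delay change produced by a fractional stretch of a north-south baseline. The formula used (baseline length times the northward component of the source direction, scaled by the expansion coefficient and the temperature offset), its sign convention, the approximate site latitude of 49.32°, and the example baseline and temperature values are assumptions for illustration; only the coefficient of 21 × 10⁻⁶ /K is taken from the text.

```python
import numpy as np

C = 299_792_458.0  # speed of light [m/s]

def expansion_delay(dy_m, delta_temp_k, dec_deg, ha_deg=0.0,
                    lat_deg=49.32, coeff_per_k=21e-6):
    """Delay change [s] on a north-south baseline due to thermal expansion.

    dy_m         : nominal north-south baseline separation [m]
    delta_temp_k : ambient minus nominal temperature [K]
    dec_deg      : source declination [deg]
    ha_deg       : source hour angle [deg]
    lat_deg      : telescope latitude [deg] (approximate, assumed)
    coeff_per_k  : linear expansion coefficient [1/K]
    """
    lat, dec, ha = np.radians([lat_deg, dec_deg, ha_deg])
    # Northward component of the unit vector pointing at the source.
    north = np.cos(lat) * np.sin(dec) - np.sin(lat) * np.cos(dec) * np.cos(ha)
    return coeff_per_k * delta_temp_k * dy_m * north / C

# Hypothetical case: 50 m baseline, 10 K warmer than nominal.
at_zenith = expansion_delay(50.0, 10.0, dec_deg=49.32)
at_pole = expansion_delay(50.0, 10.0, dec_deg=90.0)
print(f"zenith: {at_zenith * 1e12:.2f} ps, celestial pole: {at_pole * 1e12:.2f} ps")
```

The delay change vanishes for a source at zenith and grows with zenith angle, which is why such a correction depends on sky position.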
After controlling for drift in the clocks and thermal expansion of the focal line, the residual delay variations exhibit a correlation with ambient temperature. Based on thermal chamber measurements of the components of the signal path, changes in the electrical length of the 50 m coaxial cables are expected to be the dominant source of thermally-induced delay variations. These changes in electrical length are the result of changes in the physical length of the cable from expansion of the centre conductor and changes in the dielectric constant due to a reduction in the dielectric density from expansion of the outer conductor. To first order, the observed delay variations can be modeled as

$$\delta\tau_{ij}(t) = (\alpha_i - \alpha_j)\,\delta\bar{T}(t) + \bar{\alpha}\,\left[\delta T_i(t) - \delta T_j(t)\right] \tag{6}$$

where $\alpha$ is the thermal susceptibility of the coaxial cable, $T$ is the temperature of the coaxial cable (with $\delta T$ its change), subscripts $i$ and $j$ refer to specific inputs, and a bar indicates the average over all inputs. The first term is due to differences in the thermal susceptibility between cables while the second term is due to differences in the effective temperature of the cables. In order to gauge the relative importance of the two terms in Equation (6), we have installed three "cable monitors" that consist of two 50 m coaxial cables that are routed to the focal line and then back along the same path, with one end connected to the noise source described above and the other end connected to the correlator. There is one cable monitor routed to each of cylinders A, B, and C. The cable monitor data are processed in the same manner as the noise source data described above. The resulting delays are divided by 2 to account for the fact that the length of coaxial cable in the cable monitors is twice that of the CHIME on-sky inputs. The measured delays are regressed against the ambient temperature in order to measure the thermal susceptibility of the three cable monitors. The average thermal susceptibility over cable monitors is $\bar{\alpha} = 2.93$ ps/K.
The standard deviation over cable monitors is $\sigma_\alpha = 0.04$ ps/K, which will result in relative delay variations with a standard deviation of ∼0.25 ps given the temperature variations on a typical night. Residual delay variations that are not explained by differences in thermal susceptibility are attributed to differences in the effective temperature of the cables. These residuals have a standard deviation of ∼1.0 ps, which implies effective temperature variations with a standard deviation of ∼0.3 K given the value of $\bar{\alpha}$ quoted above.
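The two numbers quoted above follow from simple ratios, as the short check below shows. The variable names are ours, and the "implied nightly temperature variation" is inferred from the quoted values rather than stated in the text.

```python
alpha_bar = 2.93      # mean cable-monitor susceptibility [ps/K], from the text
sigma_alpha = 0.04    # scatter over the three cable monitors [ps/K], from the text
resid_delay = 1.0     # residual delay scatter [ps], from the text
susc_delay = 0.25     # delay scatter from susceptibility differences [ps], from the text

# Effective cable temperature scatter needed to explain the residual delay.
implied_temp_scatter = resid_delay / alpha_bar          # [K]
# Ambient temperature variation implied by the 0.25 ps figure (an inference).
implied_night_variation = susc_delay / sigma_alpha      # [K]

print(f"effective temperature scatter: {implied_temp_scatter:.2f} K")
print(f"implied nightly temperature variation: {implied_night_variation:.2f} K")
```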
We characterize the difference in thermal susceptibility between CHIME correlator inputs by regressing the change in delay between point source transits against the change in ambient temperature. The standard deviation of the thermal susceptibility over inputs is 0.3 ps/K, which is much larger than we would expect from the scatter in the value of α measured for the three cable monitors. If we randomly draw thermal susceptibilities for three inputs from the sample of 2048, the probability they are all within 3 % like the cable monitors is < 2 %. This indicates that the analog receiver chain likely has some other source of susceptibility to the ambient temperature beyond the coaxial cables that is highly dependent on input. Nevertheless, this thermal susceptibility is well characterized using the point source observations; we estimate that our uncertainty on the thermal susceptibility is ∼ 0.05 ps/K using bootstrap resampling methods. Fig. 24 shows in blue the residual delay variations after correcting for clock drift, expansion and contraction of the focal line, and thermal susceptibility of the analog receiver chain. We find a standard deviation of < 1.5 ps on < 20 min time scales and 1 ps to 2 ps on 3 h time scales. The cable monitor data suggest that differences in the temperature of the coaxial cables are a significant contributor (∼ 1 ps) to the residual delay variation on long timescales. We are actively investigating new techniques to measure and correct for the differences in coaxial cable temperature.
We characterize the phase stability of the instrument on longer timescales by examining changes in the phase between night-time transits of other pairs of bright point sources observed between February 2019 and March 2020. The transit times of the four brightest point sources are spaced apart such that their various differences probe timescales ranging from 0 h to 24 h with a roughly 3 h sampling. The worst performance occurs on 18 h timescales where the post-correction delay variations have an RMS of 2.0 ± 0.6 ps (mean ± standard deviation over feeds). This is a small degradation in the 1.6 ± 0.6 ps RMS delay stability observed on 3 h timescales and shown in Fig. 24. If we expand the analysis to also include daytime transits, then we find a significant degradation in the delay stability, with the worst performance occurring on 10 h timescales where the RMS is 3.4 ± 1.1 ps. This is a secondary reason to exclude the daytime data from the cosmology analysis, with the primary reason being contamination from solar radio emission.

Noise
Measuring the system temperature of the CHIME receivers using observations of the radio sky alone is challenging because it requires knowledge of the effective area of the antenna beam pattern. Instead, we perform an in-situ measurement of the system temperature referred to the LNA input of four CHIME receivers (two polarizations on each of two antennas) by temporarily disconnecting the LNA from the antenna under test and connecting it to well-matched cold, ambient temperature, and hot loads at 80 K, ∼300 K and 373 K. We observe each regulated load for approximately 10 minutes, re-connect the LNA to the antenna, and resume normal observations.
The autocorrelations recorded by the CHIME correlator during the measurement are corrected for bias due to quantization to 4 bit real + 4 bit imaginary, which is insignificant for sky measurements but is a significant correction for the hot and ambient temperature measurements. The autocorrelations are converted to units of Jy using the gains obtained from the visibility matrix at the transit of Cyg A occurring approximately 6 hours before the measurements, and regressed against the load temperature. The slope of the regression is used to estimate the Jy/K factor that converts between flux density on the sky and temperature at the input to the LNA. The intercept divided by the slope is used to estimate the receiver temperature, by which we mean the noise temperature of the LNA, FLA, cables and ADC, referred to the LNA input. The Jy/K calibration factor is applied to the autocorrelations collected the night following the measurement to estimate the system temperature. The resulting system temperature measurements, referred to the LNA input for the two polarizations of one of the antennas are shown in Fig. 25. The results for both channels of the other antenna are consistent with these at the 5 % level. The receiver temperature increases from approximately 20 K at 400 MHz to 25 K at 800 MHz. This is in good agreement with measurements of the LNA temperature made in the laboratory and described in Section 2.3, indicating the LNA dominates the receiver noise, as expected from the design. The system temperature when a dim part of the radio sky is transiting overhead is approximately 50 K, but shows significant spectral structure that can be broadly separated into a 150 MHz and 30 MHz ripple. 
The approximately 30 K difference between the system temperature and the receiver temperature includes contributions from the radio sky, loss in the antenna balun, ground spillover, transmission through the mesh, noise coupled from neighboring feeds, and antenna impedance mismatch, in order of most significant to least significant contribution.
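The load-based calibration described above reduces to a straight-line fit of calibrated autocorrelation against load temperature. The sketch below uses the three load temperatures from the text with made-up values for the Jy/K factor, the receiver temperature, and the sky autocorrelation; it omits the quantization-bias correction and is not the analysis code itself.

```python
import numpy as np

def receiver_temperature(load_temp_k, autocorr_jy):
    """Fit autocorr = jy_per_k * (T_load + T_rx).

    Returns (jy_per_k, t_rx): the slope and the intercept divided by the slope.
    """
    slope, intercept = np.polyfit(load_temp_k, autocorr_jy, 1)
    return slope, intercept / slope

loads = np.array([80.0, 300.0, 373.0])      # cold, ambient, hot loads [K]
true_jy_per_k, true_t_rx = 40.0, 22.0       # hypothetical values
power = true_jy_per_k * (loads + true_t_rx) # synthetic noiseless autocorrelations
jy_per_k, t_rx = receiver_temperature(loads, power)

# Converting a later on-sky autocorrelation to a system temperature.
sky_autocorr_jy = 2000.0                    # hypothetical night-time value [Jy]
t_sys = sky_autocorr_jy / jy_per_k
print(f"Jy/K = {jy_per_k:.1f}, T_rx = {t_rx:.1f} K, T_sys = {t_sys:.1f} K")
```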
The radiometer equation can be used to estimate the noise given the system temperature presented above and the number of baselines, integration time, and bandwidth. In what follows, the variance of the data on different time scales is estimated directly and compared to our expectation based on the radiometer equation. On short timescales, the variance of each visibility is estimated by differencing even and odd time samples at 31 ms cadence (see Section 2.5). The radio sky does not change appreciably on these timescales and thus drops out of the difference. This "fast-cadence noise estimate" shows good agreement with our expectation based on the radiometer equation after excluding events that are localized in time and frequency in a manner characteristic of RFI. On longer timescales, the variance can be estimated by differencing visibilities acquired at the same local sidereal time (LST) on different sidereal days. In general, these day-to-day variations are consistent with the fast-cadence noise estimate and integrate down with the number of redundant baselines that are stacked. There are a few exceptions. The residual complex gain instabilities described in Section 4.2 dominate the day-to-day variations when the four brightest point sources are in the primary beam. In addition, visibilities measured by the shortest intra-cylinder baselines, specifically those with a north-south distance less than 10 m, are dominated by variations in sky brightness due to residual complex gain instabilities and also variations in the noise coupled between the feeds that form the baseline.
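A minimal sketch of the even-odd differencing idea, applied to a synthetic total-power time stream, is given below. The channel width of 390.625 kHz (400 MHz divided by 1024 channels), the 50 K system temperature, and the slowly varying sky term are assumptions for the example; real visibilities are complex and the noise would scale with the total power, which is ignored here.

```python
import numpy as np

def even_odd_noise(samples):
    """Per-sample noise standard deviation from even-minus-odd differences.

    samples : real array (..., n_time) at the fast cadence
    """
    n = samples.shape[-1] // 2 * 2                  # drop an unpaired sample
    diff = samples[..., 0:n:2] - samples[..., 1:n:2]
    # For independent noise, Var(even - odd) = 2 * sigma**2.
    return np.sqrt(np.mean(diff ** 2, axis=-1) / 2.0)

def radiometer_sigma(t_sys, bandwidth_hz, integration_s):
    """Radiometer-equation noise for a total-power measurement."""
    return t_sys / np.sqrt(bandwidth_hz * integration_s)

rng = np.random.default_rng(2)
t_sys, bw, dt = 50.0, 390.625e3, 0.031              # K, Hz, s
n = 20000
expected = radiometer_sigma(t_sys, bw, dt)
sky = 5.0 * np.sin(2 * np.pi * np.arange(n) / n)    # slow synthetic sky signal [K]
data = t_sys + sky + rng.normal(0.0, expected, size=n)
measured = even_odd_noise(data)
ratio = measured / expected
print(f"measured / expected noise = {ratio:.3f}")
```

Because the sky term changes very little between adjacent samples, it cancels in the difference and the estimate recovers the injected noise level.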
Similar results are obtained with beamformed data, differencing "ringmaps" of the sky (see Section 4.5) produced on different sidereal days. For maps constructed with inter-cylinder baselines only (thus excluding the shortest intra-cylinder baselines mentioned above), the day-to-day variation over most of the sky is consistent with the fast-cadence noise estimate after accounting for the number of baselines that are used to produce the maps. The exceptions are pixels brighter than a few Jy/beam, for which the noise is dominated by residual complex gain instabilities. The noise can be further reduced by stacking maps produced on multiple days. In an analysis of 38 daily, inter-cylinder ringmaps spanning an interval of 73 days, the noise was observed to integrate down with the number of stacked days.

RFI
The real-time RFI-excision algorithm described in Section 2.4.2 was deployed in October 2019. To evaluate its performance, the Gaussianity of the autocorrelations is compared before and after applying the RFI excision. The Gaussianity test value (GT) for signal $i$ is defined as

$$\mathrm{GT}_i = \Delta\nu\,(1 - f)\,\Delta t\;\frac{\frac{1}{N}\sum_{j=1}^{N}\left[V_{ii}(t_j) - \bar{V}_{ii}\right]^2}{\bar{V}_{ii}^{\,2}} - 1 \tag{7}$$

where $\Delta\nu = 390$ kHz is the channel bandwidth, $V_{ii}$ is the autocorrelation evaluated at $N$ times $t_j$, $\bar{V}_{ii}$ is its mean over those times, $\Delta t$ (∼10 s) is the integration time, and $f$ is the real-time excision fraction, so that $(1 - f)\Delta t$ remains on average after high-speed excision. For a perfect Gaussian distribution the test will return ∼0, and a large deviation from 0 indicates non-Gaussianity of the data. The results of the test for a single input are shown in Fig. 26. Gaussianity of the data improves at all frequencies after applying the RFI excision, particularly in the 600 MHz to 700 MHz band where excising less than 1 % of the samples significantly improves the quality of the data. The algorithm excises 15 % of the data on average.
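One way to build a statistic with the properties described above is to compare the measured fractional variance of the autocorrelation with the radiometer expectation, 1/[Δν (1 − f) Δt]. The sketch below implements that form as an assumption; it is not guaranteed to match the exact definition used in the paper. The synthetic data draw integrated power from a gamma distribution, which is the distribution of an average of independent exponential power samples.

```python
import numpy as np

def gaussianity_test(autocorr, bandwidth_hz, integration_s, excised_fraction=0.0):
    """Radiometer-based Gaussianity statistic (assumed form); ~0 for Gaussian noise."""
    v = np.asarray(autocorr, dtype=float)
    n_indep = bandwidth_hz * (1.0 - excised_fraction) * integration_s
    return n_indep * v.var() / v.mean() ** 2 - 1.0

rng = np.random.default_rng(3)
bw, dt, n_time = 390.625e3, 10.0, 2000
n_indep = bw * dt
# Clean data: mean 1, variance 1 / n_indep.
clean = rng.gamma(shape=n_indep, scale=1.0 / n_indep, size=n_time)
# Contaminated data: 1 % of the integrations carry 5 % excess power.
contaminated = clean.copy()
contaminated[::100] *= 1.05

gt_clean = gaussianity_test(clean, bw, dt)
gt_rfi = gaussianity_test(contaminated, bw, dt)
print(f"GT clean: {gt_clean:+.3f}, GT with RFI: {gt_rfi:+.1f}")
```

Even a small amount of intermittent excess power drives the statistic far from zero, because the radiometer noise on a 10 s integration is so small.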
The offline RFI excision algorithm described in Section 2.7 masks frequencies and times where the measured sub-integration variance averaged over all cross-polar baselines deviates significantly from our expectation for radiometer noise. Over 188 nights in 2019 the average fraction of the band that was masked was 42 %, with little night-to-night variation. During this interval the real-time RFI excision was turned off. Fig. 27 shows as a solid line the cumulative distribution of frequency bins as a function of fraction of time masked over this interval. About 29 % of frequency bins are always masked, corresponding to the persistent sources of RFI discussed in Section 2.1. Fig. 27 also shows as a dashed line this same quantity for 27 nights in mid-2020 when the real-time RFI excision was turned on. The fraction of the band that is always masked increased to 35 % because of a degradation in the RFI environment at DRAO, primarily due to (i) the appearance in early 2020 of the downlink for the Rogers 600 MHz band, which introduced persistent RFI in a part of the spectrum that was previously clean (617 MHz to 627 MHz), and (ii) the transition from partial to complete occupation of the LTE band at 782 MHz to 788 MHz. For the cleanest half of the CHIME band the fraction of time that was masked decreased from almost 15 % to less than 5 % with use of real-time RFI excision.
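The cumulative distribution shown in Fig. 27 can be computed from a boolean frequency-time mask as sketched below. The mask here is synthetic: the number of persistently masked channels and the 3 % sporadic masking rate are arbitrary choices for the example.

```python
import numpy as np

def masked_fraction_cdf(mask, thresholds):
    """Fraction of channels masked no more than each threshold fraction of time.

    mask       : bool array (n_freq, n_time), True where data are masked
    thresholds : iterable of time fractions in [0, 1]
    """
    frac_time = mask.mean(axis=1)                   # per-channel masked fraction
    return np.array([(frac_time <= t).mean() for t in thresholds])

rng = np.random.default_rng(5)
n_freq, n_time = 1024, 500
mask = rng.random((n_freq, n_time)) < 0.03          # sporadic RFI
mask[:300] = True                                   # hypothetical persistent RFI
cdf = masked_fraction_cdf(mask, [0.05, 0.5, 0.999])
always_masked = (mask.mean(axis=1) == 1.0).mean()
print("CDF at 5 %, 50 %, 99.9 % of time:", np.round(cdf, 3))
print(f"always masked: {100 * always_masked:.1f} % of channels")
```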
Since the real-time excision operates on the 0.66 ms and 31 ms frames, it is able to mask transient RFI events while discarding a much smaller fraction of the data than the offline algorithm, which operates on the 10 s data frames. At present, the average fraction of the band that is masked is roughly the same with either method (42 %), but the fraction of the passband that is more than 95 % free of RFI is much higher with rapid excision.

Sky Maps
We generate maps of the sky for data quality assessment, for instrument characterization, and as the starting point for Galactic science with CHIME; all have short-term and long-term goals. Our basic product is the "ringmap". We generate one-dimensional images along the meridian by Fourier transforming the visibilities, one image for every ten-second time sample, and we assemble these into an all-sky image. These maps employ visibilities directly. The process is described in detail in CHIME Collaboration et al. (2022a). The cosmological stacking analysis is based on ringmaps with intra-cylinder baselines excluded in order to filter out diffuse Galactic emission and reduce the impact of noise crosstalk.
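The construction of a ringmap can be illustrated with a direct Fourier sum over north-south baseline separations. This is a minimal sketch under several assumptions: uniform weighting, a single frequency, north-south baselines only, a chosen sign convention for the visibility phase, and a synthetic point source; the function names are ours and the actual procedure is the one described in the cited reference.

```python
import numpy as np

C = 299_792_458.0  # speed of light [m/s]

def beamform_meridian(vis, baseline_ns_m, freq_hz, sin_za):
    """One-dimensional image along the meridian for one time sample.

    vis           : complex array (n_baseline,)
    baseline_ns_m : north-south baseline separations [m]
    sin_za        : array (n_pix,) of sin(zenith angle) pixel centres
    """
    phase = 2.0 * np.pi * freq_hz / C * np.outer(sin_za, baseline_ns_m)
    # Taking the real part accounts for the conjugate (negative) baselines.
    return np.real(np.exp(-1j * phase) @ vis) / len(vis)

def ringmap(vis_time, baseline_ns_m, freq_hz, sin_za):
    """Stack meridian images column by column: output is (n_pix, n_time)."""
    columns = [beamform_meridian(v, baseline_ns_m, freq_hz, sin_za)
               for v in vis_time]
    return np.stack(columns, axis=1)

freq = 450e6                                     # below the aliasing threshold
baselines = 0.30 * np.arange(1, 256)             # 30 cm feed spacing, no autos
sin_za = np.linspace(-1.0, 1.0, 2001)
s0 = 0.2                                         # synthetic source position
vis = np.exp(2j * np.pi * freq / C * baselines * s0)
img = beamform_meridian(vis, baselines, freq, sin_za)
rmap = ringmap(np.tile(vis, (3, 1)), baselines, freq, sin_za)
print(f"peak at sin(za) = {sin_za[np.argmax(img)]:.3f}, map shape {rmap.shape}")
```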

Single-day Maps
We show a ringmap produced from YY visibilities using a single sidereal day of data in Fig. 28. The map is shown in the time-sin(za) coordinate system, where za is the zenith angle, and with corresponding right ascension and declination labels on the top and right. We show 24 sidereal hours of data, with time increasing from left to right: right ascension also increases from left to right, opposite to the astronomical convention for sky images.

Figure 26. Gaussianity test value of Equation (7) for a single input on October 11, 2019 from 0:00-1:30 PDT before and after kurtosis-based RFI excision. The color of dots shows the average excised fraction over these 1.5 hours for each frequency channel. LTE and TV station bands are shown in purple. The Gaussianity of the data has improved in many frequency channels by excising less than 1 % of the samples, i.e., their GT value is getting closer to zero after RFI excision. Notice that almost all the data are automatically excised within the TV and LTE bands. While this heavy excision improves the GT values for what remains, frequency channels in those bands still fail and are excised.

Figure 27. Cumulative distribution of frequency bins as a function of the fraction of time masked. The solid line covers 188 nights in 2019 when kurtosis-based RFI excision was turned off. The dashed line indicates that nearly 40 % of the CHIME band was masked less than 5 % of time over 27 nights between 2020/06/26 and 2020/07/21 when the real-time RFI excision was turned on. The difference between the vertical asymptotes (30 % always masked in 2019, 40 % in 2020) is due to new radio transmitters nearby.
The ringmap of Fig. 28 highlights a number of features of the Galaxy, our observing strategy, and instrumental features and artifacts. The large features of the radio sky dominate the map. The Galactic plane stretches across the sky, and the North Polar Spur rises from it. The Galactic plane and the North Polar Spur appear once at their true declinations and again at the top of the image, the result of aliasing. The spacing of feeds along the focal line is 30 cm, more than half a wavelength for frequencies higher than 500 MHz: the Fourier transform therefore produces an aliased response across much of the band. The bright sources (the Sun, Cas A, Cyg A, Tau A and Vir A) also have aliased versions; all except the Sun are unresolved by the CHIME beam and can be treated as point sources. Cas A is circumpolar, and a lower transit image, and its alias, are also seen. All bright sources are seen both at transit and in the sidelobes for several hours on either side of transit. Away from transit these sources appear to be at higher declination, producing the characteristic "smile" features on the ringmap. The point sources show a bright peak at the source right ascension and fainter peaks before and after transit, produced by the grating lobes; all the smile features have a dotted appearance. The shape of the smile is geometric and therefore frequency independent, but the positions of the grating lobe peaks along the smile are frequency dependent. Each time slice is an interferometric image lacking zero-spacing information, and therefore must average to zero. Consequently, the transit of each of the bright sources produces a vertical dark stripe of negative values across the map. Similar negative regions are evident at the right ascensions of particularly bright Galactic emission.

Figure 28. Map of the northern radio sky at 679 MHz constructed from data collected by CHIME over a single sidereal day (2018-12-21/22), obtained by beamforming all YY visibilities (excluding autocorrelations) for each 10 second integration to a grid of 2048 declinations along the meridian, spanning from horizon to horizon and equally spaced in sin(za). The map has been minimally processed, and no attempt has been made to deconvolve the transfer function of the instrument. The Sun and the four brightest point sources, and their aliases, are identified. The map is shown with time (Pacific Standard Time, UTC−8) increasing from left to right to illustrate the CHIME observing strategy; therefore right ascension increases from left to right, opposite the astronomical convention. The image is plotted in the native time-sin(za) coordinates; declination and right ascension (or, equivalently because all observations are at hour angle zero, local sidereal time) are labeled on the right and top.
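The aliasing condition quoted above can be checked with simple arithmetic. The sketch below assumes a uniformly spaced linear array, for which the Fourier-transformed image repeats in sin(za) with period λ/d, so that copies of a source at sin(za) = s appear at s ± nλ/d whenever those values fall within [−1, 1]. The function names are illustrative and are not taken from the CHIME software.

```python
import math

C = 299_792_458.0        # speed of light, m/s
FEED_SPACING_M = 0.30    # feed spacing along the focal line (from the text)


def alias_onset_mhz(spacing_m=FEED_SPACING_M):
    """Frequency (MHz) above which the spacing exceeds half a wavelength."""
    return C / (2.0 * spacing_m) / 1e6


def alias_positions(sin_za, freq_mhz, spacing_m=FEED_SPACING_M, max_order=3):
    """Return the sin(za) values in [-1, 1] at which aliased copies appear.

    Assumes an idealized, uniformly spaced array: the image is periodic
    in sin(za) with period lambda / d.
    """
    period = C / (freq_mhz * 1e6) / spacing_m
    copies = []
    for order in range(-max_order, max_order + 1):
        if order == 0:
            continue  # order 0 is the true source position
        s = sin_za + order * period
        if -1.0 <= s <= 1.0:
            copies.append(s)
    return copies


if __name__ == "__main__":
    print("aliasing sets in above (MHz):", alias_onset_mhz())
    print("copies of a source at sin(za) = -0.6, 679 MHz:",
          alias_positions(-0.6, 679.0))
```

In this idealized picture a source near zenith has no in-range copy at 679 MHz, whereas a source far from zenith does; the primary beam and the finite array length, which are ignored here, determine how bright each copy appears.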
Crosstalk between adjacent feeds (see Fig. 18) produces ripples in zenith angle which are evident as horizontal stripes in an uncorrected ringmap. We have reduced the striping by subtracting the median at each declination from the image. This process is quite effective at |za| ≲ 25°, but some striping is still evident at larger zenith angles.
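The median subtraction described above can be sketched as follows, assuming the ringmap is stored as a list of rows (one per declination), each row being a list of samples in right ascension. The data layout and function name are assumptions for illustration only. The median is a natural choice here because bright sources occupy a small fraction of each row and therefore barely shift it.

```python
from statistics import median


def subtract_row_median(ringmap):
    """Subtract the median over right ascension from every declination row.

    ringmap: list of rows (one per declination); each row is a list of
    map values ordered in right ascension. Returns a new list of rows.
    """
    cleaned = []
    for row in ringmap:
        offset = median(row)  # constant stripe level at this declination
        cleaned.append([value - offset for value in row])
    return cleaned


if __name__ == "__main__":
    toy_map = [
        [1.0, 1.0, 5.0, 1.0],   # stripe level 1.0 plus one bright sample
        [2.0, 2.5, 2.0, 2.0],   # stripe level 2.0
    ]
    print(subtract_row_median(toy_map))
```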

Stacked Maps
In Fig. 29 we show a stacked map, formed from data from nearly two months of observations. This too is a ringmap, but the data are combined as visibilities before the formation of the map. The stack uses night-time data from 52 24-sidereal hour periods (we call them "days" for brevity), divided into contiguous sets of days from different periods in the year chosen to provide complete coverage of the sidereal day (see Section 2.7 for an overview of the daily processing pipeline). The stacking proceeds in two steps: first, days within a contiguous set are averaged together, and second, all these averages are combined.
In the first step, averaging over contiguous blocks, data deemed bad are masked (arising from the presence of the Sun or Moon, RFI, or data-quality flags) and any day with less than 70% coverage after masking is discarded. Bias due to crosstalk is estimated by calculating the median visibility at each zenith angle in a one-hour region of right ascension where the sky signal is at low intensity. This value is subtracted from each individual day of data before stacking. Ideally, the same right ascension range would be used for all averages, but this is clearly impossible because the part of the sky transiting at night is changing with time of year. We compromised by choosing two right ascension ranges for the estimate of the crosstalk contribution. To ensure consistency we use a set where both these ranges transit at night; we derive an additive correction from that average and apply it to all averages. Daily calibration is based on either Cas A or Cyg A. To account for different beam responses at the locations of these two sources we derive a multiplicative amplitude correction at every frequency and apply it to all the averages prior to stacking. To remove the most prominent "smile" artifacts for display purposes, we subtracted Cas A, Cyg A, and Tau A in visibility space.
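The two-step stacking can be illustrated with a small sketch. It assumes each day is a list of samples with `None` marking masked data, applies the 70% coverage cut, averages the days within each contiguous set, and then combines the sets. The weighting of set averages by the number of contributing days is our assumption for illustration (the text does not specify the weighting), and the crosstalk and calibration corrections are omitted.

```python
def coverage(day):
    """Fraction of samples in a day that are not masked (None)."""
    return sum(value is not None for value in day) / len(day)


def average_days(days, min_coverage=0.7):
    """Step 1: average one contiguous set of days, sample by sample.

    Days with coverage below min_coverage are discarded. Returns
    (means, counts); a mean is None where no day contributed.
    """
    kept = [day for day in days if coverage(day) >= min_coverage]
    n_samples = len(days[0])
    means, counts = [], []
    for i in range(n_samples):
        values = [day[i] for day in kept if day[i] is not None]
        counts.append(len(values))
        means.append(sum(values) / len(values) if values else None)
    return means, counts


def combine_sets(set_results):
    """Step 2: combine set averages, weighted by contributing-day counts.

    The count weighting is an assumption made for this sketch.
    """
    n_samples = len(set_results[0][0])
    stacked = []
    for i in range(n_samples):
        total = sum(m[i] * c[i] for m, c in set_results if c[i] > 0)
        weight = sum(c[i] for _, c in set_results)
        stacked.append(total / weight if weight else None)
    return stacked


if __name__ == "__main__":
    set_a = [[1.0, 2.0, None, 4.0], [3.0, 2.0, None, 4.0],
             [None, None, None, 9.0]]   # last day fails the coverage cut
    set_b = [[5.0, 5.0, 6.0, 7.0]]
    print(combine_sets([average_days(set_a), average_days(set_b)]))
```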
The ringmap at each frequency and declination is approximately given by the 1D convolution of the sky with the east-west profile of the primary beam at the corresponding frequency and declination, as described in detail in CHIME Collaboration et al. (2022a). We therefore obtain an estimate of the true sky at each declination by deconvolving the beam profile from each row of the ringmap.
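To illustrate the idea of row-by-row deconvolution, the sketch below applies a regularized (Wiener-like) Fourier deconvolution to a single toy row. This is a generic technique chosen for illustration; the method actually used for the CHIME maps is described in CHIME Collaboration et al. (2022a) and may differ. The Gaussian beam profile, the regularization constant `eps`, and all function names are assumptions, and a naive discrete Fourier transform is used only to keep the example self-contained.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (adequate for short toy rows)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        total = sum(x[m] * cmath.exp(sign * 2j * math.pi * m * k / n)
                    for m in range(n))
        out.append(total / n if inverse else total)
    return out


def gaussian_beam(n, sigma):
    """Toy east-west beam: Gaussian centred on sample 0, wrapped, unit sum."""
    profile = [math.exp(-0.5 * (min(i, n - i) / sigma) ** 2) for i in range(n)]
    total = sum(profile)
    return [p / total for p in profile]


def circular_convolve(sky, beam):
    """Forward model: circular convolution of one sky row with the beam."""
    product = [s * b for s, b in zip(dft(sky), dft(beam))]
    return [v.real for v in dft(product, inverse=True)]


def deconvolve_row(row, beam, eps=1e-4):
    """Regularized deconvolution: M * conj(B) / (|B|^2 + eps) per mode."""
    estimate = [m * b.conjugate() / (abs(b) ** 2 + eps)
                for m, b in zip(dft(row), dft(beam))]
    return [v.real for v in dft(estimate, inverse=True)]


if __name__ == "__main__":
    n = 64
    sky = [0.0] * n
    sky[10] = 1.0                       # one point source
    beam = gaussian_beam(n, sigma=2.0)
    row = circular_convolve(sky, beam)  # simulated ringmap row
    recovered = deconvolve_row(row, beam)
    print(max(row), max(recovered))
```

The regularization constant trades resolution against noise amplification: modes where the beam response is much smaller than `eps` are suppressed rather than divided by a near-zero number.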
In the 52-day map of Fig. 29 we see all the features that are evident in the one-day map of Fig. 28, illustrating the fact that CHIME achieves a high signal-to-noise ratio even in one day. Both the single-day and stacked maps are confusion-limited; a major benefit of stacked maps for Galactic science is the full sky coverage even with the elimination of daytime data. Within the envelope of the diffuse emission along the Galactic plane we can identify many well-known supernova remnants and H II regions; these are evident in more detail in Fig. 30. The combination of the visibility-space subtraction of the brightest three point sources and the deconvolution removes the grating lobe copies of all point sources and the saturation of the image at the right ascension of the brightest sources.

Figure 29. Map of the northern radio sky constructed from data collected by CHIME over 52 nights and stacked. This is a deconvolved Stokes I = (XX + YY)/2 ringmap obtained from all XX and YY visibilities using the stacked data, plotted in celestial coordinates in a plate carrée projection. The image shows most of the northern sky, oriented in the conventional way for astronomical images with right ascension increasing to the left (unlike Fig. 28).
In Fig. 30, we show the map from Fig. 29 zoomed in on the Galactic plane and compared to a 408 MHz Stokes I map of the Galactic plane from the Canadian Galactic Plane Survey (CGPS; Tung et al. 2017). The CGPS 408 MHz data, obtained with the DRAO Synthesis Telescope, have an angular resolution of ≈ 3′, and cover the area 52° ≤ l ≤ 193°, −6.5° ≤ b ≤ 8.5°. Short spacings for the CGPS map are incorporated from the Haslam et al. (1982) single-antenna data. There is good overall agreement between the CHIME and CGPS maps in the Galactic plane. Discrete objects such as supernova remnants, and more extended objects such as the W3/4/5 H II region and the Cygnus X complex of H II regions and stellar clusters, are distinctly visible in the CHIME data, and are well matched with the CGPS data in terms of structure and relative brightness. Although the CHIME data lack zero-spacings, and thus sensitivity to the largest scale structures, much of the diffuse emission visible in the CGPS map is also clearly discernible in the CHIME map. This is especially true of the bright extended emission at the low-longitude end of the CGPS coverage. The bright radio sources, Cyg A and Cas A, produce artifacts in both the CHIME and CGPS maps, although these are more easily mitigated in the CGPS through mosaicing of fields with a sufficiently dense sampling of pointings in those regions. While CHIME does not match the high angular resolution of the CGPS, its spectral coverage far exceeds that of the CGPS,⁷ allowing for more in-depth exploration of frequency-dependent phenomena in the Galaxy over a larger spatial extent.
Sky maps like these will be the main data product for science involving non-cosmological foregrounds. We will have all-sky images at hundreds of frequencies across an octave obtained with the same telescope, allowing analyses of spectral indices of point sources, extended objects, and diffuse emission. The Galactic signal is dominated by synchrotron emission, linearly polarized at its source, and Faraday rotated by the intervening magneto-ionic medium along virtually every line of sight. A major scientific goal is to apply Faraday synthesis (Brentjens & de Bruyn 2005) to the polarization data. We will derive Stokes Q and U maps, which will provide a valuable dataset for Faraday synthesis across the whole sky; the wavelength-squared range and resolution of the CHIME data allow us to isolate discrete magnetic features, with Faraday depth resolution $\delta\phi \approx 3.8/\Delta(\lambda^2) \approx 9$ rad m$^{-2}$, while retaining sensitivity to extended Faraday depth features, with $\phi_\mathrm{max-scale} \approx \pi\lambda_\mathrm{min}^{-2} \approx 22$ rad m$^{-2}$, in the Galaxy (Schnitzeler et al. 2009). Therefore it will be possible to distinguish between extended structures and multiple narrow features in Faraday depth space. Exploration of this parameter space is only beginning (Thomson et al. 2019). The 400-800 MHz polarization maps with ≈ 40′ angular resolution from CHIME will form a component of the GMIMS survey, which includes a southern sky dataset obtained with the CSIRO Parkes Telescope (Wolleben et al. 2019) and a 1280 to 1750 MHz northern sky dataset observed with the Galt Telescope (Wolleben et al. 2021). If we are able to combine data across the 400 to 1800 MHz range, we will achieve $\delta\phi \approx 7$ rad m$^{-2}$ and $\phi_\mathrm{max-scale} \approx 110$ rad m$^{-2}$, providing sensitivity to an unprecedented range of Faraday depth scales. CHIME is an interferometer: it has coverage of the (u, v) plane down to 30 cm baselines, but not to zero baseline because autocorrelations of the signal from each feed are excluded from the analysis.
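As a quick check of the Faraday depth figures quoted in this paragraph, the following sketch evaluates the two approximate relations for a given frequency range. It assumes continuous coverage between the band edges and ignores gaps between the contributing surveys; the function names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def lambda_squared(freq_mhz):
    """Wavelength squared (m^2) at a given frequency in MHz."""
    return (C / (freq_mhz * 1e6)) ** 2


def faraday_resolution(f_low_mhz, f_high_mhz):
    """Faraday depth resolution, 3.8 / Delta(lambda^2), in rad m^-2."""
    span = lambda_squared(f_low_mhz) - lambda_squared(f_high_mhz)
    return 3.8 / span


def faraday_max_scale(f_high_mhz):
    """Largest recoverable Faraday depth scale, pi / lambda_min^2."""
    return math.pi / lambda_squared(f_high_mhz)


if __name__ == "__main__":
    for low, high in [(400.0, 800.0), (400.0, 1800.0)]:
        print(low, high,
              round(faraday_resolution(low, high), 1),
              round(faraday_max_scale(high), 1))
```

For 400-800 MHz this gives roughly 9 and 22 rad m$^{-2}$, and for 400-1800 MHz roughly 7 and 113 rad m$^{-2}$, consistent with the rounded values in the text.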
To provide information on Galactic structure at the largest angular scales, a companion polarization survey will be made with a 15 m radio telescope at DRAO, covering 350 to 1050 MHz. These data, calibrated to an absolute scale of brightness temperature, will also provide the calibration of CHIME polarization data.

⁷ The CGPS has a bandwidth of 3.5 MHz at 408 MHz, and a bandwidth of 35 MHz at 1420 MHz.
In addition, by observing the entire sky every day, we are sensitive to slow transients. We are cataloguing daily fluxes of 2723 point sources, primarily quasars, to characterize variability.

CONCLUSIONS AND OUTLOOK
We have built and are operating an extremely high mapping-speed instrument designed to measure the three-dimensional distribution of neutral hydrogen over the full Northern Hemisphere and the redshift range 0.8 ≤ z ≤ 2.5 with enough accuracy to provide useful constraints on the expansion history of the Universe.
The instrument has been collecting data for cosmological analysis since late 2018. First results measuring the distribution of neutral hydrogen in three-dimensional correlation with redshift catalogs of quasars and galaxies, using data from 2019, are presented in a companion paper (CHIME Collaboration et al. 2022a). CHIME is also monitoring the variability of 2723 sources with daily cadence, and has produced confusion-limited maps of polarized Galactic emission across the 400-800 MHz band.
To quantify the cosmological constraining power of CHIME under ideal conditions, in Fig. 31 we show an updated forecast for the statistical precision of CHIME in measuring the cosmic expansion history using the BAO feature in the 21 cm power spectrum. Compared to previous forecasts in the literature, these results use a more accurate version of CHIME's feed layout (Section 3.2.1), updated models for the mean 21 cm brightness temperature and linear HI bias, and an empirically-derived estimate of CHIME's total system temperature, based on measurements presented in Section 4.3. We describe the methodology of these forecasts, which mostly follows Bull et al. (2015), in Appendix A. In particular, Table 2 lists the CHIME instrumental and survey characteristics used in these forecasts. Note that these forecasts assume perfect foreground subtraction and the absence of systematic errors. (Persistent RFI bands are indicated in the figure.) We also show the expected precision of a combined galaxy and quasar sample from the Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration et al. 2016), computed within the same forecasting formalism; Lyman-α forest measurements expected from DESI (which we do not recompute, but take from DESI Collaboration et al. 2016); and state-of-the-art measurements by the extended Baryon Oscillation Spectroscopic Survey (Alam et al. 2021; specific measurements taken from Bautista et al. 2021; de Mattia et al. 2021; Hou et al. 2021; du Mas des Bourboux et al. 2020, and summarized in Zhao et al. 2021). Fig. 31 shows that CHIME's intrinsic statistical precision is competitive with DESI, and that CHIME on its own is in principle capable of percent-level BAO measurements over most of its band.
Figure 31. Upper panel: Projected constraints on the cosmic expansion history, parameterized using the spherically-averaged distance measure $D_V$ as a function of redshift, shown relative to a fiducial ΛCDM cosmology. For CHIME, the forecast error bars (orange) were calculated for 1 year of integration time using the Fisher matrix approach of Bull et al. (2015), assuming perfect foreground subtraction and no systematics. Each error bar is statistically independent. We also show projections for the DESI clustering measurements (black), computed using the same formalism and based on combined constraints from the three clustering samples that overlap with CHIME's redshift coverage, and DESI Lyman-α forest measurements (blue), which we take from DESI Collaboration et al. (2016). (See Appendix A for the details of these forecasts.) Shaded grey bands denote regions inaccessible to CHIME due to persistent sources of RFI. Lower panel: Expansion history measurements from the final eBOSS survey, taken from the compilation in Zhao et al. (2021). Comparison to the CHIME forecasts in the upper panel indicates that the intrinsic statistical precision of CHIME is highly competitive with that of existing and near-future expansion history measurements. The challenge is to understand systematic effects well enough that statistical errors dominate.

Efforts in the coming years will be focused on realizing this potential, but we emphasize that we will need to overcome several challenges to do so. Foreground subtraction remains the primary obstacle to producing measurements which exploit CHIME's statistical power, and it is the main focus of our analysis effort. The path to seeing BAO through a haze of Galactic emission many orders of magnitude brighter is to filter out the spectrally smooth Galactic components and keep the spectrally chaotic BAO signal. Any systematic error which produces a rough or poorly understood spectral response mixes the Galactic and BAO signals. Thus, great care in the design has been taken to build a stable instrument with smooth, well characterized response. Very precise measurement of the angular response of CHIME will be necessary to perform component separation at the level required to characterize the BAO, because poorly understood frequency dependence of the angular response would lead to frequency variation of the Galactic contribution along an inferred line of sight. CHIME Collaboration et al. (2022a) describe a set of beam measurements and analysis methods that have allowed an initial detection of the 21 cm signal, and work is underway to improve upon these methods. Other areas requiring further attention include mitigation of noise crosstalk between nearby feeds, RFI mitigation in the lower half of the CHIME frequency band, and development of analysis methods that are robust to residual uncertainties in gain calibration and beam knowledge.
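The principle of separating a spectrally smooth foreground from a rapidly varying signal can be shown with a deliberately simple toy: fit a power law (a straight line in log-log space) along frequency and keep the residual. This is not the filtering used in the CHIME analysis; the power-law foreground, the sinusoidal stand-in for the signal, and all names and amplitudes below are assumptions chosen for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remove_smooth_component(freqs_mhz, spectrum):
    """Fit a power law in frequency and return the residual spectrum.

    Requires strictly positive spectrum values (logs are taken).
    """
    log_f = [math.log(f) for f in freqs_mhz]
    log_t = [math.log(t) for t in spectrum]
    slope, intercept = fit_line(log_f, log_t)
    model = [math.exp(intercept + slope * x) for x in log_f]
    return [t - m for t, m in zip(spectrum, model)]


if __name__ == "__main__":
    freqs = [400.0 + 6.25 * i for i in range(64)]
    foreground = [30.0 * (f / 600.0) ** -2.7 for f in freqs]   # smooth, bright
    ripple = [1e-3 * math.sin(2 * math.pi * f / 20.0) for f in freqs]
    observed = [a + b for a, b in zip(foreground, ripple)]
    residual = remove_smooth_component(freqs, observed)
    print(max(abs(r) for r in residual))
```

In this idealized case the residual is dominated by the injected ripple even though the foreground is four orders of magnitude brighter. Any unmodelled spectral structure introduced by the instrument would break the smoothness assumption, which is the point made in the text.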
Overcoming these challenges has the potential to unlock a rich array of science targets accessible to 21 cm intensity mapping. Beyond BAO, there is potential to constrain the linear growth rate of structures as a way to test general relativity (Obuljen et al. 2018; Chen et al. 2019); constrain models of cosmic inflation through signatures in the primordial power spectrum of fluctuations (Xu et al. 2016; Beutler et al. 2019) or non-Gaussian statistics in large-scale structure (Xu et al. 2015; Karagiannis et al. 2020); and probe the nature of dark matter (Carucci et al. 2015; Bauer et al. 2021). In addition, "tidal reconstruction" techniques, which reconstruct large-scale (foreground-obscured) modes from the correlations they induce between smaller-scale modes (Zhu et al. 2018; Modi et al. 2019; Darwish et al. 2021), can greatly expand the opportunities for cross-correlations with surveys of the cosmic microwave background or photometric galaxy redshifts. Additionally, lower-frequency observations of the 21 cm line are well-suited to probing the era of reionization (Furlanetto et al. 2019a), or more ambitiously, the cosmic "dark ages" up to z ∼ O(100) (Furlanetto et al. 2019b).
The instrument described here also acts as the front-end for several other systems, providing calibrated data to an FRB detector (CHIME/FRB Collaboration et al. 2018), a 10-beam system which monitors all pulsars visible from Canada with up to daily cadence (CHIME/Pulsar Collaboration et al. 2021), a system to search for cold clouds acting as 21 cm absorption-line systems, and a VLBI station (Cassanelli et al. 2021). Among the accomplishments of these new instruments are the discovery of half a dozen Galactic pulsars, the detection of an exceptionally bright radio burst from a Galactic magnetar (CHIME/FRB Collaboration et al. 2020), pointing to possible similarities of magnetars and FRBs, and the publication of the first substantial catalog of FRBs (The CHIME/FRB Collaboration et al. 2021). This broad range of additional scientific impact comes directly from achieving the sensitivity, large fractional bandwidth and enormous field of view that hydrogen intensity mapping requires.
We have shown that CHIME is capable of generating a multitude of scientific results, and have demonstrated that one can build a very powerful instrument for a comparatively small cost when a clear scientific goal drives the design. We expect a steady flow of further results in the years to come.

... angular diameter distance $D_A(z)$ are transformed into the volume distance $D_V$ and Alcock-Paczynski term $F$ through

$$D_V(z) = \left[(1+z)^2 D_A(z)^2 \frac{cz}{H(z)}\right]^{1/3}, \qquad F(z) = (1+z)\, D_A(z)\, \frac{H(z)}{c},$$

and we forecast the fractional errorbars on measurements of $D_V$ in each redshift bin. The amplitude $A(z)$ is defined by decomposing the matter power spectrum $P_m$ into a smooth template $P_\mathrm{smooth}$ and oscillatory BAO factor $f_\mathrm{BAO}$, implemented using the method from Bull et al. (2015). The linear bias $b$, linear growth rate $f$, and fluctuation amplitude $\sigma_8$ have their usual meanings, while $\sigma_\mathrm{NL}$ is the redshift-space damping scale defined in the next section. In each redshift bin, the $D_V$ forecasts marginalize over the other parameters ($F$, $A$, $b\sigma_8$, $f\sigma_8$, and $\sigma_\mathrm{NL}$) with no priors.
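The two distance combinations can be evaluated numerically as in the sketch below. It assumes a flat ΛCDM background with illustrative parameter values ($H_0 = 67.7$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.31$) that are placeholders rather than the fiducial cosmology of the forecasts, and it obtains $D_A$ from a simple Simpson-rule integration; all function names are hypothetical.

```python
import math

C_KM_S = 299_792.458  # speed of light, km/s


def hubble(z, h0=67.7, omega_m=0.31):
    """H(z) in km/s/Mpc for flat LambdaCDM (placeholder parameters)."""
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)


def angular_diameter_distance(z, n_steps=1000):
    """D_A(z) in Mpc via composite Simpson integration of c / H(z)."""
    step = z / n_steps  # n_steps must be even for Simpson's rule
    total = 1.0 / hubble(0.0) + 1.0 / hubble(z)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) / hubble(i * step)
    comoving = C_KM_S * total * step / 3.0
    return comoving / (1.0 + z)


def volume_distance(z, d_a, h_z):
    """D_V = [(1+z)^2 D_A^2 c z / H]^(1/3), in Mpc."""
    return ((1.0 + z) ** 2 * d_a ** 2 * C_KM_S * z / h_z) ** (1.0 / 3.0)


def alcock_paczynski(z, d_a, h_z):
    """F = (1+z) D_A H / c (dimensionless)."""
    return (1.0 + z) * d_a * h_z / C_KM_S


if __name__ == "__main__":
    for z in (0.8, 1.5, 2.5):
        d_a, h_z = angular_diameter_distance(z), hubble(z)
        print(z, volume_distance(z, d_a, h_z), alcock_paczynski(z, d_a, h_z))
```

The two quantities are related by $D_V = (1+z) D_A (z/F)^{1/3}$, which provides a convenient consistency check on any implementation.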

A.2. Signal models
For CHIME, we take the HI signal covariance to be

$$C^S(k, z) = T_b(z)^2 \left[b_\mathrm{HI}(z) + f(z)\mu^2\right]^2 e^{-k^2\mu^2\sigma_\mathrm{NL}^2} P_m(k, z), \tag{A6}$$

where $T_b(z)$ is the HI brightness temperature and $b_\mathrm{HI}(z)$ is the linear bias of HI. For $T_b(z)$, we use the expression from Hall et al. (2013), with the fitting formula for the mean HI density $\Omega_\mathrm{HI}(z)$ from Crighton et al. (2015), and for $b_\mathrm{HI}(z)$, we use the model from Cosmic Visions 21 cm Collaboration et al. (2018), which smoothly interpolates between measurements from the IllustrisTNG simulation at z < 2 (Villaescusa-Navarro et al. 2018) and the analytical approximation from Castorina & Villaescusa-Navarro (2017) at z > 2. The large-scale effect of redshift-space distortions is accounted for in the $f(z)\mu^2$ term in Eq. (A6), where $f(z)$ is the linear growth rate and $\mu$ is the cosine of the angle between the wavevector and the line of sight. At smaller scales, the exponential factor roughly accounts for the "Finger of God" effect that suppresses the observed clustering power beyond the cutoff scale $\sigma_\mathrm{NL}$. The linear matter power spectrum $P_m(k)$ is calculated using CAMB (Lewis et al. 2000). For DESI, we use

$$C^S(k, z) = \left[b_g(z) + f(z, k)\mu^2\right]^2 e^{-k^2\mu^2\sigma_\mathrm{NL}^2} P(k). \tag{A7}$$

Following Bull et al. (2015), we choose the nonlinear dispersion scale to be $\sigma_\mathrm{NL} = 7$ Mpc, corresponding to power being significantly damped at $k \gtrsim 0.14$ Mpc$^{-1}$. This value is higher than recent values from the literature, both for HI and DESI-like galaxies (e.g. CHIME Collaboration et al. 2022a), but is justified here because it limits the sensitivity of our forecasts to nonlinear scales where the assumptions in Eqs. (A6) and (A7) break down. Also, we make use of the BAO information only, instead of the full shape of the HI or galaxy power spectrum (e.g. Sailer et al. 2021). While a full-shape analysis would provide increased constraining power, it is also more likely to be affected by foregrounds and systematics, so we aim to be conservative in that respect by restricting to BAO only.
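The structure of the signal model (a Kaiser redshift-space factor, an exponential damping term, and the matter power spectrum) can be written as a short function. In this sketch the matter power spectrum value `p_m`, the bias, the growth rate, and the brightness temperature are supplied by the caller; in the forecasts they come from CAMB and the cited fitting formulae, which are not reproduced here. The function name and argument order are our own.

```python
import math


def signal_covariance(k, mu, p_m, bias, growth_rate, t_b=1.0, sigma_nl=7.0):
    """Redshift-space signal covariance for one (k, mu) mode.

    k           wavenumber in Mpc^-1
    mu          cosine of the angle between wavevector and line of sight
    p_m         matter power spectrum at this k and redshift (caller-supplied)
    bias        linear tracer bias (HI or galaxies)
    growth_rate linear growth rate f
    t_b         mean brightness temperature; leave at 1.0 for galaxy samples
    sigma_nl    nonlinear dispersion scale in Mpc (7 Mpc in the text)
    """
    kaiser = (bias + growth_rate * mu ** 2) ** 2
    damping = math.exp(-(k * mu * sigma_nl) ** 2)
    return t_b ** 2 * kaiser * damping * p_m


if __name__ == "__main__":
    # Damping relative to the undamped Kaiser term along the line of sight.
    for k in (0.05, 0.14, 0.30):
        damped = signal_covariance(k, 1.0, 1.0, 1.0, 0.0)
        print(k, damped)
```

With $\sigma_\mathrm{NL} = 7$ Mpc, line-of-sight power at $k = 1/7 \approx 0.14$ Mpc$^{-1}$ is suppressed by a factor of $e^{-1}$, which is the sense in which power is "significantly damped" beyond that scale.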

A.3. Noise models
We mainly follow Bull et al. (2015) in approximating the noise covariance for CHIME as

$$C^N(k, z) = \frac{T_\mathrm{sys}(z)^2}{\nu_{21}\, n_\mathrm{pol}\, t_\mathrm{tot}}\, \frac{\lambda^4 S_\mathrm{sky}}{A_e^2 S_\mathrm{FOV}}\, \frac{1}{n(u)},$$

where $\nu_{21}$ is the HI line emission rest frequency, $n_\mathrm{pol}$ is the number of polarizations per antenna, and $\lambda(z)$ is the observing wavelength corresponding to emission from redshift $z$. For the system temperature $T_\mathrm{sys}(z)$, we use a constant 55 K, based on the observations in Section 4.3; note that this includes both instrumental and sky contributions, which are usually modelled separately