
NEW LIMITS ON 21 cm EPOCH OF REIONIZATION FROM PAPER-32 CONSISTENT WITH AN X-RAY HEATED INTERGALACTIC MEDIUM AT z = 7.7


Published 2014 May 28. © 2014 The American Astronomical Society. All rights reserved.

Citation: Aaron R. Parsons et al. 2014 ApJ 788 106. DOI: 10.1088/0004-637X/788/2/106

ABSTRACT

We present new constraints on the 21 cm Epoch of Reionization (EoR) power spectrum derived from three months of observing with a 32 antenna, dual-polarization deployment of the Donald C. Backer Precision Array for Probing the Epoch of Reionization in South Africa. In this paper, we demonstrate the efficacy of the delay-spectrum approach to avoiding foregrounds, achieving over eight orders of magnitude of foreground suppression (in mK²). Combining this approach with a procedure for removing off-diagonal covariances arising from instrumental systematics, we achieve a best 2σ upper limit of (41 mK)² for k = 0.27 h Mpc⁻¹ at z = 7.7. This limit falls within an order of magnitude of the brighter predictions of the expected 21 cm EoR signal level. Using the upper limits set by these measurements, we generate new constraints on the brightness temperature of 21 cm emission in neutral regions for various reionization models. We show that for several ionization scenarios, our measurements are inconsistent with cold reionization. That is, heating of the neutral intergalactic medium (IGM) is necessary to remain consistent with the constraints we report. Hence, we have suggestive evidence that by z = 7.7, the H i has been warmed from its cold primordial state, probably by X-rays from high-mass X-ray binaries or miniquasars. The strength of this evidence depends on the ionization state of the IGM, which we are not yet able to constrain. This result is consistent with standard predictions for how reionization might have proceeded.


1. INTRODUCTION

The Donald C. Backer Precision Array for Probing the Epoch of Reionization (PAPER; Parsons et al. 2010), a dedicated experiment employing nontracking, dual-polarization dipole antennas in a 100–200 MHz band, is one of several radio telescopes aiming to measure the power spectrum of highly redshifted 21 cm emission to inform our understanding of cosmic reionization (Furlanetto et al. 2006; Morales & Wyithe 2010; Pritchard & Loeb 2012). Other such telescopes include the Giant Metrewave Radio Telescope (GMRT; Paciga et al. 2013), the LOw Frequency ARray (LOFAR; Yatawatta et al. 2013), and the Murchison Widefield Array (MWA; Bowman et al. 2013; Tingay et al. 2013). PAPER consists of two distinct arrays: a 32 antenna deployment at the NRAO site near Green Bank, WV, which is used primarily for engineering investigations and field testing, and a 64 antenna deployment in the Square Kilometre Array South Africa (SKA-SA) reserve in the Karoo desert near Carnarvon. PAPER South Africa is used primarily for science observations and provides the data on which this paper is based.

Recent advances in our understanding of how smooth-spectrum foreground emission can be isolated from the 21 cm Epoch of Reionization (EoR) signature via delay-spectrum analysis (Parsons et al. 2012b, hereafter P12b) and how maximum-redundancy array configurations can substantially improve the sensitivity of power spectrum measurements in the low signal-to-noise regime (Parsons et al. 2012a, hereafter P12a) have resulted in dramatic improvements in PAPER's prospects for constraining the power spectrum of 21 cm reionization. In this paper, we provide a first look at the application of delay-spectrum analysis to maximum-redundancy PAPER observations, culminating in a new upper limit on the power spectrum of 21 cm emission from reionization.

This paper is structured as follows. In Section 2, we describe the observations used. Section 3 details the calibration and analysis pipeline. In Section 4, we describe the results of the analysis and the constraints we are able to place on reionization, and we conclude in Section 5 with a discussion of the efficacy of delay-spectrum analysis as well as PAPER's near-term prospects for improving upon these results.

2. OBSERVATIONS

In this section, we summarize the salient features of the observations used in our analysis. For more details regarding the PAPER telescope, its drift-scan observing mode, and the primary beam shape, we refer the reader to Parsons et al. (2010), Jacobs et al. (2011), Pober et al. (2012), and Stefan et al. (2013).

Our analysis centers on the XX and YY linear polarization products of 32 PAPER antennas deployed in the maximum-redundancy configuration shown in the left panel of Figure 1. All baselines within this core were used in the calibration steps described in Section 3. For this power-spectrum analysis, we use only a subset of the baselines: the 28 antenna pairs connecting adjacent north–south columns in the strictly east–west direction (0–16, 1–17, 2–18, 3–19, 8–16, etc.), as well as the 21 slanted just north of east (1–16, 2–17, 3–18, 8–17, etc.) and the 21 just south of east (0–17, 1–18, 2–19, 9–16, etc.). Within each of these groups, baselines are all instantaneously redundant; they sample the same Fourier modes of the sky. Taken together, these 28 + 21 + 21 = 70 baselines represent the majority of the total instantaneous power-spectrum sensitivity of the 32 antenna, maximum-redundancy observations (P12a).


Figure 1. Left: the 32 antenna, dual-polarization, maximum-redundancy array configuration used by the PAPER deployment in the Karoo Radio Observatory site in South Africa for power-spectrum observations. For power-spectrum analysis, we use the baselines connecting antennas between adjacent columns for north–south deflections of zero (0–16, 1–17, 2–18, 3–19, 8–16, 9–17, etc.), one (1–16, 2–17, etc.), and minus one (0–17, 1–18, etc.). Note the expanded scale in the vertical axis. Right: the 64 antenna, single-polarization, minimum-redundancy array configuration on the same site, whose imaging observations were used to characterize the spectrum of Pictor A in Jacobs et al. (2013). The maximum redundancy positions are shown in green for comparison.


For each baseline, a 100 MHz band from 100 to 200 MHz was divided into 2,048 frequency channels, integrated for 10.7 s, and recorded. Observations spanned a 103 day period from 2011 December 7 to 2012 March 19, corresponding to JD2455903 to JD2456006. In this period, observations were recorded for 92 days. Over this period, we selected for analysis observations in the local sidereal time (LST) range of 1:00 to 12:00, corresponding to a colder patch in the synchrotron sky that dominates the system temperature. In summary, the data set on which this analysis is based consists of 92 full days observed with 28 + 21 + 21 east–west oriented baselines of length 30 m, covering a window of 11 sidereal hours.

For the purpose of characterizing the flux density of Pictor A to establish a flux scale for the power spectrum measurements, we also made use of data observed from JD2455747.1 to JD2455748.1 from a 64 antenna single (XX) polarization deployment of the PAPER array in a minimum redundancy configuration (see Figure 1, right panel) more suited for imaging. The details of these observations and their calibration are given in Jacobs et al. (2013) and Pober et al. (2013a). Observation bandwidth, frequency resolution, and integration time match the maximum-redundancy observations reported above.

3. CALIBRATION AND ANALYSIS

All analyses described in this section were implemented using the Astronomical Interferometry in PYthon (AIPY) software toolkit. This package is under revision control; our analysis is based on the version of AIPY committed under the hash dff2f4dba731240ced5cc883895a80cf714fc12e. An overview of the analysis in this section is illustrated in Figure 2.


Figure 2. Data flow for the power-spectrum analysis described in Section 3. Solid black lines indicate data flow. Solid red lines indicate Monte Carlo simulations used to assess signal attenuation resulting from the analysis. Dotted lines indicate information flow for calculating errors via bootstrapping. Yellow indicates analysis steps not included in simulations for which signal loss is calculated to be negligibly small.


3.1. Preprocessing

In preprocessing, we use delay/delay-rate (DDR) filters (Parsons & Backer 2009) to help identify radio-frequency interference (RFI) events and as part of a data compression technique that reduces data volume by over a factor of 40. Further details are supplied in Appendix A. The result is that the 2048 original frequency channels become 203, and the 60 original time samples per 10 minute file become 14, for an effective integration time of 43 s. Our chosen output integration time is slightly shorter than the 45.7 s interval dictated by the maximum fringe rate of a 300 m baseline. The compressed data retain all emission that rotates with the sky and falls in the delay range |τ| < 1000 ns, corresponding to −0.5 < k < 0.5 h Mpc⁻¹ at z = 7.7 (164 MHz).
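As a consistency check on the delay-to-k conversion quoted above, the following sketch (ours, not part of the AIPY pipeline; it assumes a flat ΛCDM cosmology with Ωm ≈ 0.27 and h-scaled distances, and the function name is hypothetical) evaluates the line-of-sight mode k∥ = 2πτ/Y sampled by a delay τ at z = 7.7:

```python
import numpy as np

# Cosmology assumed for illustration (flat LambdaCDM; not necessarily the
# paper's exact parameter values). Distances carry h^-1, so H0 = 100 km/s/Mpc.
H0 = 100.0            # km s^-1 Mpc^-1, h factored out
OMEGA_M, OMEGA_L = 0.27, 0.73
C = 2.998e5           # speed of light, km/s
F21 = 1420.405751e6   # rest frequency of the 21 cm line, Hz

def k_parallel(tau, z):
    """Line-of-sight k mode (h Mpc^-1) sampled by delay tau (s) at redshift z."""
    Ez = np.sqrt(OMEGA_M * (1 + z) ** 3 + OMEGA_L)
    Y = C * (1 + z) ** 2 / (F21 * H0 * Ez)  # h^-1 Mpc per Hz of bandwidth
    return 2 * np.pi * tau / Y

print(k_parallel(1000e-9, 7.7))  # ~0.5 h Mpc^-1, matching the quoted range
```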

3.2. Phase Calibration

Phase calibration is achieved using standard interferometric self-calibration with the addition of constraints derived from the substantial instantaneous redundancy of the array (Wieringa 1992; Liu et al. 2010). GPS surveying and total-station laser ranging located all antennas to within ±10 cm of a perfect grid, based on agreement between the two surveying methods. These antenna coordinates are used to calibrate the electrical delay of each antenna signal path on the basis of redundancy. Because of the high degree of instantaneous redundancy in this array, we are able to derive per-antenna delays for each signal path relative to only two parameters that could not be solved for on the basis of redundancy: the relative electrical delays between the antennas in fiducial east–west and north–south baselines (0–16 and 0–1, respectively). We find per-antenna delays averaged over each day of observation to agree within ±0.1 ns between days. Because this residual delay is a factor of 1000 smaller than the 100 ns width of the delay bins used in the power spectrum analysis in Section 3.6, the signal attenuation that would result from averaging over the phase variation associated with this residual delay is very small (of order 10⁻⁶). On this basis, we adopt a single per-antenna delay solution for the entire 92 day observation. These per-antenna delay solutions for all baselines are incorporated into a global self-calibration step that derives the remaining two unknown phase parameters using the central components of Centaurus A, Fornax A, and Pictor A.

The use of redundancy in calibration allows antenna-based gain and phase parameters to be derived from the ratios of baselines within instantaneously redundant sets. Because these instantaneously redundant baselines measure the same sky signal, the signal within each set of redundant baselines essentially becomes a single degree of freedom that, by taking ratios, is projected out prior to calibration. Hence, there is no possibility of signal loss associated with the redundant aspect of calibration.
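The sky cancellation at the heart of redundant calibration can be seen in a toy numerical example (a minimal sketch under idealized assumptions — a single sky visibility, no noise — with variable names of our own invention):

```python
import numpy as np

# Toy demonstration: two redundant baselines measure the same sky visibility,
# so their ratio depends only on antenna gains and the sky term cancels.
rng = np.random.default_rng(0)
v_sky = rng.normal() + 1j * rng.normal()                         # one true sky visibility
g = np.exp(rng.normal(0, 0.1, 4) + 1j * rng.normal(0, 0.1, 4))  # gains, ants 0-3

v_01 = g[0] * np.conj(g[1]) * v_sky   # baseline (0, 1)
v_23 = g[2] * np.conj(g[3]) * v_sky   # redundant baseline (2, 3)

ratio = v_01 / v_23
expected = g[0] * np.conj(g[1]) / (g[2] * np.conj(g[3]))
assert np.isclose(ratio, expected)    # the sky term is projected out exactly
```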

3.3. Flux Calibration

3.3.1. An Absolute Flux Scale from Pictor A

The first challenge we encounter in flux calibration is establishing an accurate spectrum of a calibration reference. Pictor A is chosen as a calibrator source of necessity; given the grating lobes of the synthesized beam generated by our maximum-redundancy array configuration, side lobes of Pictor A and Fornax A dominate nearly all beam pointings. Of these two sources, Pictor A is the less resolved.

Unfortunately, the spectrum of Pictor A is particularly poorly characterized in the 100–200 MHz band where PAPER operates. Previous measurements imply that the spectrum below 400 MHz deviates substantially from the power law (with spectral index α = −0.85) that holds at higher frequencies (Perley et al. 1997). There are also serious inconsistencies in the measured flux of Pictor A below 1 GHz (e.g., Slee 1995; Kuehr et al. 1981; Large et al. 1981; Burgess & Hunstead 2006) and a dearth of measurements at nearby frequencies with which to establish the spectral slope of Pictor A in this band.

This shortcoming in the literature has recently been addressed with observations from a 64 antenna, single-polarization deployment of the PAPER array in a minimum-redundancy configuration more suited for imaging. The details of these observations and their calibration are given in Jacobs et al. (2013), along with the resulting spectrum of Pictor A that is shown to be consistent with a single power law that is best fit by

$$S(\nu) = S_{150}\left(\frac{\nu}{150\ \mathrm{MHz}}\right)^{\alpha}, \qquad (1)$$

with $S_{150} = 382 \pm 5.4$ Jy and $\alpha = -0.76 \pm 0.01$. Error ranges indicate 76% confidence intervals.

3.3.2. Gain Calibration Augmented by Redundancy

For these maximum-redundancy observations, gain calibration is derived on the basis of both redundancy and self-calibration. First, we correct for the known dependence of amplifier gain and cable attenuation on ambient temperature using measured temperature data and temperature coefficients that were characterized in laboratory measurements (Parashare & Bradley 2009) and confirmed in field measurements (Pober et al. 2012). These corrections correspond to approximately a 5% variation in amplitudes over the course of the observations. This is the only time-dependent parameter used in the gain calibration.

Next, redundant self-calibration is used to derive the amplitude of each antenna signal path relative to a single overall bandpass function for each (XX,YY) polarization. As mentioned in the context of phase calibration, per-antenna gains are derived from the ratios of baselines within instantaneously redundant sets, and because the sky signal cancels in these ratios, there is no possibility of signal loss associated with redundant gain calibration.

This overall bandpass is then calibrated to the flux scale established in Jacobs et al. (2013) by phasing to the position of Pictor A and averaging over all intercolumn baselines. Because Pictor A is resolved at the 20% level on the longest (300 m) east–west PAPER baselines, we use per-baseline estimates of resolution weights to down-weight and correct for resolution effects. These resolution weights are estimated from the 90 cm map of Pictor A presented in Perley et al. (1997). Averaged over all of the baselines in the array, the inclusion of resolution effects changes the measured flux by less than 5%. As in Jacobs et al. (2013), a fringe-rate filter is applied to the resulting beam to suppress side lobes that introduce variations deviating more than ±0.1 mHz from the fringe rate of the source in question.

To derive a source spectrum from a drift-scan source profile, we average time samples using weights derived from a model of PAPER's primary beam response as an effective approximation of inverse-variance weighting (Pober et al. 2012) and average over 92 days of drift-scan observations. A ninth-order polynomial is fit over a 100 MHz bandwidth to bring each measured spectrum into accordance with the unpolarized model of the Pictor A spectrum described in Section 3.3.1. The resulting Pictor A spectrum, plotted for each polarization in Figure 3 along with the reference spectrum derived from minimum-redundancy PAPER observations (Jacobs et al. 2013), characterizes the flux scale used in the subsequent power-spectrum analysis.
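A minimal sketch of this style of bandpass fit, using the Jacobs et al. (2013) power law as the reference model; the stand-in measurement and its ripple are illustrative assumptions, not the actual data:

```python
import numpy as np

# Hypothetical stand-ins: a measured Pictor A spectrum with a sinusoidal
# ripple (mimicking beam side lobes) and the Jacobs et al. (2013) power law.
freqs = np.linspace(100, 200, 203)                # MHz
model = 382.0 * (freqs / 150.0) ** -0.76          # reference spectrum, Jy
meas = model * (1 + 0.05 * np.sin(freqs / 7.0))   # stand-in measurement

# Ninth-order polynomial bandpass bringing the measurement into accordance
# with the model; the abscissa is normalized for a well-conditioned fit.
x = (freqs - 150.0) / 50.0
coeffs = np.polyfit(x, meas / model, deg=9)
bandpass = np.polyval(coeffs, x)
calibrated = meas / bandpass                      # flux-calibrated spectrum
```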


Figure 3. Calibrated source spectrum for Pictor A measured using XX (cyan) and YY (magenta) polarization data from the power-spectrum observations. Solid lines show power-law fits to the data in the range 120–170 MHz. These plots characterize the calibrated flux scale of the measurements used in the power spectrum analysis. These measurements were calibrated to agree with the Pictor A spectrum in Section 3.3.1 (solid black). The ripples in the measured spectra are the result of beam side lobes on Fornax A that are substantially higher for maximum-redundancy observations than for the minimum-redundancy observations (plusses) used in Jacobs et al. (2013).


This polynomial bandpass model uses a limited number of terms to prevent the coupling of beam side lobes, which are the most likely cause of the residual gain variation in Figure 3, into the overall gain calibration. While such side lobes are difficult to distinguish from intrinsic frequency structure, we err on the side of caution (using a limited number of degrees of freedom) to avoid introducing any frequency-dependent gain terms not present in the data. In this cautious approach, it is possible that higher-order frequency structure in the instrumental bandpass may not be calibrated out. To address this possibility, we make the argument that evidence of such structure must be present in the measured Pictor A spectrum after the application of the bandpass calibration and that, to the extent that some higher-order structure is demonstrably present, it can corrupt the measured power spectrum only if it modulates foreground structure to higher-order modes. The fact that, at the upper limits we report in Section 4.2, we do not see this structure argues for its absence. In any case, this conservative modeling of the PAPER bandpass, with no more degrees of freedom than necessary, does not compromise the validity of the upper limits that are reported.

The restricted order of the polynomial used in calibrating the bandpass also serves to insulate against the signal loss that would result from inadvertently including 21 cm EoR fluctuations in the bandpass model. We use simulations with many realizations of low-level white noise added to the polynomial bandpass model we fit using Pictor A to quantify the expected signal loss as a function of k in the power spectra measured in Section 4.2. In each case, we compare the power spectrum derived from the original injected noise with the assumed polynomial bandpass model to the power spectrum of the injected noise after dividing by a new fit of a ninth-order polynomial to the bandpass. We find signal loss in the range −0.06 < k < 0.06 h Mpc⁻¹ to be less than 8% and signal loss outside of this range to be below 0.2% and to decrease rapidly with |k|. This simulation calculates signal loss strictly on the basis of fitting a ninth-order polynomial bandpass model to a spectrum containing the 21 cm EoR signal and ignores the use of multiple baseline lengths and Earth-rotation synthesis in measuring the Pictor A spectrum. This averaging of independent modes attenuates the coupling of the 21 cm EoR signal into the Pictor A spectrum that is used in the bandpass calibration in proportion to the square root of the number of modes averaged. Hence, this simulation establishes an upper bound on signal loss that results from bandpass calibration and shows it to be a small effect for the k modes that we use to set our upper limits in Section 4.2. Because polynomial fitting is a linear process, the fractional signal loss we calculate using a white-noise signal model is independent of the amplitude of the model signal.
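This loss simulation can be sketched as follows (our own schematic, not the paper's exact code: the bandpass is a stand-in, and subtraction of the refit polynomial is used as a first-order proxy for dividing by it):

```python
import numpy as np

# Stand-in bandpass; the ripple is illustrative, not the measured PAPER bandpass.
rng = np.random.default_rng(1)
freqs = np.linspace(100, 200, 203)
x = (freqs - 150.0) / 50.0
bandpass = 1 + 0.02 * np.sin(freqs / 11.0)

num = np.zeros(freqs.size // 2 + 1)
den = np.zeros(freqs.size // 2 + 1)
for _ in range(1000):
    noise = 1e-3 * rng.normal(size=freqs.size)   # white-noise EoR stand-in
    refit = np.polyval(np.polyfit(x, bandpass + noise, 9), x)
    residual = (bandpass + noise) - refit        # what survives the new fit
    num += np.abs(np.fft.rfft(residual)) ** 2
    den += np.abs(np.fft.rfft(noise)) ** 2
recovered = num / den   # ~1 at high k; dips only in the lowest few modes
```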

Following calibration to a consistent flux scale, XX and YY polarizations for each baseline are directly summed to form a coarse estimate of the Stokes I parameter. This simplistic construction of Stokes I neglects differences in the beam responses of each linear polarization and, as a result, contains contributions from Stokes Q away from beam center (Jelić et al. 2010; Moore et al. 2013). Nonetheless, for analysis of a single baseline type, correcting for direction-dependent gains is highly nontrivial at best, and we attempt no such correction here. On the basis of the asymmetry of the PAPER beam model under 90° rotation, we estimate that this construction of Stokes I will contain approximately 4% of Stokes Q as well (Moore et al. 2013). For a more thorough treatment of polarization in the context of drift-scanning arrays with delay-spectrum analysis, we refer the reader to Moore et al. (2013).

3.4. Wide-band Delay-spectrum Foreground Separation

The next major step in our data reduction pipeline is to remove a best estimate of foreground emission arising from galactic synchrotron emission and extragalactic point sources. This is an essential step, as foreground emission exceeds the expected level of EoR emission by nine orders of magnitude (Pober et al. 2013a) and can conceal low-level RFI and crosstalk systematics. Moreover, as described in P12b, smooth-spectrum foreground emission can corrupt higher-order k modes in power-spectrum measurements as a result of side lobes that arise from RFI flagging and the finite (windowed) bandwidth used in the line-of-sight Fourier transform. Although power-spectrum measurements that aim to constrain 21 cm reionization must be limited to ∼10 MHz to avoid significant signal evolution within the band, foreground emission is generally coherent over the entire 100 MHz band of PAPER observations.

A delay-domain filter applied over a wide bandwidth can achieve a much sharper degree of foreground isolation than is possible in any sub-band, and this can be used to effectively model and remove a smooth foreground model from each sub-band of interest. Following the procedure outlined in P12b and implemented in Pober et al. (2013a), we perform a delay transform of each integration over the entire observing band, according to the equation

$$\tilde{V}(\tau) = \int V(\nu)\, W(\nu)\, S(\nu)\, e^{2\pi i \nu \tau}\, d\nu, \qquad (2)$$

where V(ν) is the measured visibility as a function of spectral frequency ν, τ is delay (the Fourier complement to ν), W(ν) is a Blackman–Harris windowing function (Harris 1978) used for minimizing side lobes and band-edge effects, S(ν) encodes the frequency-dependent sample weights that result from RFI flagging, and ∼ denotes the delay transformation. Following the procedure outlined in Parsons & Backer (2009), we deconvolve the delay-domain kernel that results from the product W(ν)S(ν) using a CLEAN-like iterative forward-modeling deconvolution algorithm (Högbom 1974) in the delay domain, restricting model components to fall inside of 15 ns beyond the horizon limit of each baseline. We chose a 15 ns standoff from the horizon to suppress the first level of suprahorizon emission that results from the inherent width of $\tilde{S}(\tau)$. Because the sampling function is known exactly, the restriction of model components to low-order delay modes ensures that the deconvolution process cannot introduce power at higher-order delay modes beyond what was scattered there by side lobes of the sampling function. Moreover, because of the least-squares metric used in evaluating CLEAN models for divergence, any delay-domain residuals associated with the sampling function will be unbiased.
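Schematically, the wide-band delay transform and the restricted delay-domain CLEAN might look as follows (a sketch in the spirit of the AIPY implementation rather than its actual interface; the kernel normalization, gain, and stopping criterion are assumptions of ours):

```python
import numpy as np

def delay_transform(vis, flags, window):
    """Equation (2): transform of the visibility spectrum times the
    Blackman-Harris window W(nu) and the sampling function S(nu)
    (1 = kept, 0 = RFI-flagged)."""
    return np.fft.fft(vis * window * flags)

def delay_clean(vtau, kernel, area, gain=0.1, tol=1e-4, maxiter=10000):
    """Hogbom-style 1D CLEAN in the delay domain. `kernel` is the delay
    transform of W(nu)S(nu), normalized to 1 at bin 0; model components are
    allowed only where `area` is True (|tau| within the horizon + 15 ns)."""
    res = vtau.astype(complex).copy()
    mdl = np.zeros(vtau.size, dtype=complex)
    thresh = tol * np.abs(vtau).max()
    for _ in range(maxiter):
        i = np.argmax(np.where(area, np.abs(res), 0))  # brightest allowed bin
        if np.abs(res[i]) < thresh:
            break
        step = gain * res[i]
        mdl[i] += step
        res -= step * np.roll(kernel, i)  # subtract the shifted kernel response
    return mdl, res
```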

The resultant CLEAN model is subtracted from $\tilde{V}(\tau)$ to effect a delay-domain filter suppressing smooth-spectrum emission. On the 30 m, nearly east–west baselines used in this analysis, in a band centered at 164 MHz, this corresponds to a cutoff at k = ±0.06 h Mpc⁻¹. This delay filter alone accounts for 60 dB of the suppression (in mK² power) of the foreground signal seen between the unfiltered signal (top black curve) and the final residual (lowest black curve) shown in Figure 4. More importantly, in the narrow-band delay transformation that forms the basis of power-spectrum measurements in Section 3.6, this filter heavily suppresses the coupling of delay modes beyond the horizon to the foreground-dominated modes within the horizon. There is a small amount of signal loss associated with this filtering. Using the simulation methodology described in Section 3.7, we compute this signal loss to be 4.8% for the two modes on either side of the horizon limit, 1.3% for the next modes outside of that, and <0.015% elsewhere. The cost of this signal loss is far outweighed by the benefits of suppressing couplings to foreground-dominated modes where, despite the low coupling strength, the sheer magnitude of the foreground signal creates a dominant bias.


Figure 4. rms brightness temperatures of raw (black), frequency-differenced (magenta), and time-differenced (cyan) visibilities in various stages of analysis. The upper triplet of curves corresponds to raw visibilities for individual 30 m east–west baselines, averaged coherently in 43 s LST bins over 20 days with a channel bandwidth of 490 kHz, and averaged incoherently over the set of 28 identical baselines. The second triplet of curves from the top corresponds to visibilities after the application of the wide-band delay filter described in Section 3.4. The third triplet shows the result of averaging delay-filtered visibilities coherently over the set of 28 baselines. The lowest triplet shows the same baseline-averaged visibilities after the application of the fringe-rate filter described in Section 3.5. The lower cyan curve is significantly lower than the others as a result of the near-orthogonality, in fringe-rate domain, of a high-pass time-differencing filter and the low-pass fringe-rate filter; it is not a valid representation of the noise. Scaling the wide-band delay-filtered curves by the effective integration time and bandwidth, we determine a system temperature of 560 K at 160 MHz.


After the removal of the bulk of the foregrounds, the signal remaining for each baseline is visibly noise-dominated. At this point, we apply a final pass of RFI excision, flagging 3σ outliers, and proceed to remove crosstalk systematics that were previously undetectable beneath the bright, smooth-spectrum emission. Crosstalk removal proceeds by subtracting the one hour time average of the visibilities for each baseline from each integration. This process, described in Parsons et al. (2010), distinguishes oscillating fringes associated with celestial emission from the static phase bias associated with crosstalk. The crosstalk-removal filter essentially constitutes a high-pass fringe-rate filter whose stop-band is much narrower than the pass-bands of the low-pass fringe-rate filters described in Section 3.5.

Nighttime data are then averaged in LST over the 92 day observation using 43 s time bins that match the integration interval of the data after the compression step described in Section 3.1. In the LST binning process, sporadic non-Gaussian events that evade the RFI flagging process can easily skew the average value in an LST bin away from the median value. To avoid bias from non-Gaussian tails, we flag the 10% of data in each LST bin, for each frequency channel, that deviate most from the median value, and exclude them from the computed average for that LST bin. This last flagging step suppresses non-Gaussian outlying events without biasing the computed average in an LST bin. We simulate this median filtering process for samples in a normal distribution around an underlying random signal assigned to each LST bin and verify that the computed variance of the underlying signal is not biased by the median filter, regardless of the level of the signal with respect to the noise. We also use this simulation to verify that bootstrapping correctly reproduces the range of variation in the final estimator as expected for Gaussian statistics and that the bootstrap error bars are unaffected by the median filter. These simulations indicate that the LST binning process described above has no associated bias or signal loss for recovering the cosmological EoR signal.
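A minimal sketch of the per-bin median flagging (the function name and the 90th-percentile implementation of the "10% most deviant" cut are our own choices):

```python
import numpy as np

def lst_bin_average(samples):
    """Average the samples landing in one LST bin (one channel, one baseline)
    after flagging the 10% that deviate most from the bin median. For complex
    visibilities this would be applied to real and imaginary parts separately."""
    samples = np.asarray(samples, dtype=float)
    dev = np.abs(samples - np.median(samples))
    keep = dev <= np.percentile(dev, 90)   # drop the most deviant 10%
    return samples[keep].mean()
```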

3.5. Fringe-rate Filtering

In the last step prior to forming power spectra, we apply a fringe-rate filter to effect time-domain integration: over the interval during which a baseline measures a single k mode, samples are integrated coherently (with noise decreasing as $\sqrt{t}$, in units of mK); beyond that interval, measurements at different times represent independent modes that must be squared before further integration (with noise now decreasing as $\sqrt{t}$, in units of mK²).

One way of handling this additional integration is via gridding in the uv plane. Each measured visibility in a wide-field interferometer represents the integral over a kernel in the uv plane that reflects the primary beam of the elements (Bhatnagar et al. 2008; Morales & Matejek 2009) and the w component of the baseline (Cornwell et al. 2003). As noted in Sullivan et al. (2012) and Morales & Matejek (2009), in order to optimally account for the mode-mixing introduced by these kernels, gridding kernels must be used that correctly distribute each measurement among the sampled uv modes, such that, in the ensemble average over many measurements by many baselines, each uv mode becomes an optimally weighted estimator of the actual value given the set of measurements.

However, this approach has a major shortcoming when applied to maximum-redundancy array configurations. In order to maximize sensitivity, such configurations are set up to deliberately sample identical visibilities that reflect the same combinations of modes in the uv plane with few nearby measurements (P12a). As a result, such array configurations tend to lack enough measurements of different combinations of uv modes to permit the ensemble average to converge on the true value. Said differently, maximum-redundancy array configurations tend to produce measurement sets that, when expressed as linear combinations of uv modes of interest, are singular.

Our alternative approach avoids this and many of the difficulties outlined in Hazelton et al. (2013) by applying a carefully tailored fringe-rate filter to each time series of visibility spectra. As outlined in Appendix A in the context of data compression, we take the Fourier transform of the time series in each channel and apply a low-pass filter that preserves fringe rates that geometrically correspond to sources rotating on the celestial sphere. For a planar array with transit observations, fringe rates vary according to declination, reaching a maximum (fmax) at δ = 0°, decreasing to 0 at δ = −90°, and, for an array such as PAPER deployed at a latitude near −30°, reaching a minimum of ≈ −fmax/2 at δ = −60° on the far side of the south celestial pole. In order to avoid introducing undesirable frequency structure, we apply the same filter, tuned to the width set by the highest frequency of the sub-band used in the power spectrum analysis described in Section 3.6, to each channel, even though maximum fringe rates are generally frequency-dependent. In a future paper, we will explore the idea of employing fringe-rate filters that purposely down-weight fringe-rate modes on the sky according to the expected signal-to-noise ratio in each mode. Such filters would essentially correspond to a one-dimensional implementation of the inverse primary beam uv gridding discussed in Morales & Matejek (2009), and they have many features in common with the m-mode synthesis described in Shaw et al. (2014).

Since thermal noise scatters equally into all fringe rate bins, applying a filter that passes only fringe rates corresponding to the celestial emission has the effect of denoising the data. We apply such a filter to the data, choosing the bounds of the filter to match the geometric bounds set by a 30 m east–west baseline, according to the equation

$$f_{\rm max} = \frac{|\vec{b}_{\rm eq}|\,\omega\,\nu}{c}, \qquad (3)$$

where fmax is the maximum fringe rate, beq is the baseline vector projected parallel to the equatorial plane, c is the speed of light, ω is the angular frequency of the Earth's rotation, and ν is the spectral frequency. At 174 MHz (the highest frequency in a 20 MHz window centered on 164 MHz that is used in Section 3.6), fmax = 1.27 mHz, corresponding to a fringe period of 788 s. Hence, the fringe-rate filter that is applied passes fringe rates in the range −0.63 < f < 1.27 mHz. The width of this filter corresponds in sensitivity to an effective integration time of 525 s. We note that this filtering could have been applied during the data compression described in Section 3.1 but was implemented separately to enable the compression to work uniformly over all baselines in the array without additional information about antenna location.
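The following sketch evaluates Equation (3) and applies the corresponding low-pass mask in fringe-rate space (assuming a uniformly sampled time series and our own function names; the AIPY implementation differs in detail):

```python
import numpy as np

OMEGA_E = 2 * np.pi / 86164.1   # Earth's sidereal rotation rate, rad/s
C = 2.998e8                     # speed of light, m/s

def f_max(b_eq, nu):
    """Equation (3): maximum geometric fringe rate (Hz) for a baseline with
    equatorially projected length b_eq (m) at frequency nu (Hz)."""
    return b_eq * OMEGA_E * nu / C

def fringe_rate_filter(vis_t, dt, b_eq=30.0, nu=174e6):
    """Keep only fringe rates in (-f_max/2, f_max) for a visibility time
    series sampled every dt seconds (Section 3.5)."""
    f = np.fft.fftfreq(vis_t.size, dt)
    fm = f_max(b_eq, nu)
    mask = (f > -fm / 2) & (f < fm)
    return np.fft.ifft(np.fft.fft(vis_t) * mask)

print(f_max(30.0, 174e6))  # ~1.27e-3 Hz, as quoted above
```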

After applying this filter, we transform the data back to time domain in preparation for forming power spectra via the delay transform. It should be noted that, in time domain, the data are now heavily oversampled; adjacent samples are no longer statistically independent. Hence, when averaging power spectra versus time, noise will not beat down according to the strict number of samples but rather according to the actual number of statistically independent samples underlying the time series.

3.6. Narrow-band Delay-transformation and Cross-multiplication

In the final stage of our core analysis, we apply the delay transform in Equation (2) to a 20 MHz band centered at 164 MHz, again using a Blackman–Harris window that yields an effective (noise-equivalent) bandwidth of 10 MHz, centered at redshift z = 7.7. In this paper, we limit our analysis to a single cosmological bin where, out of the entire PAPER observing bandwidth, the noise and foreground residual in Figure 4 is lowest. As discussed in P12b, the τ modes that result from the delay transform have a peaked response in k that allows each (u, v, τ) mode to be interpreted as sampling the mode k = 2π(u/X, v/X, τ/Y), with X and Y representing conversion factors from angle and frequency to comoving distance, respectively. As such, we use k and τ modes interchangeably in the following discussion.

In contrast to the wide-band delay filtering described in Section 3.4, where a CLEAN-like deconvolution with a restricted range of model components was used to suppress the side lobes of smooth-spectrum emission resulting from unsmooth sampling functions, we do not deconvolve covariances in $\tilde{V}(\tau)$ introduced by windowing and sampling functions. First, in the band chosen for this analysis, the sampling function S(ν) in Equation (2) is nearly unity at all frequencies, thanks to the exquisite RFI environment at the Karoo site in South Africa. Secondly, the covariances between nearby τ modes introduced by the windowing function go hand-in-hand with the suppression of covariances between modes at larger separations. Hence, we defer the removal of covariances introduced by the windowing and sampling functions to the covariance diagonalization process described in Section 3.7.

From these delay-transformed visibilities, we construct unbiased estimators of the power spectrum, $\widehat{P}({\bf k})$, by cross-multiplying delay spectra between the seven baseline groups constructed in Section 3.4 and summing over all cross-multiples according to the equation

$$\widehat{P}(k_{\tau,t}) \approx \left(\frac{\lambda^{2}}{2 k_{B}}\right)^{2} \frac{X^{2}Y}{\Omega B}\, \left\langle \tilde{V}_{i}(\tau,t)\, \tilde{V}_{j}^{*}(\tau,t) \right\rangle_{i<j}, \qquad (4)$$

which follows from Equation (12) of P12a, where λ is the observing wavelength, $k_B$ is Boltzmann's constant, X²Y is a cosmological scalar with units of h⁻³ Mpc³ sr⁻¹ Hz⁻¹, Ω is the angular area, B is the bandwidth, 〈...〉i < j indicates the ensemble average over instantaneously redundant baseline measurements indexed by i, j, and $\tilde{V}(\tau,t)$ is the delay-transformed visibility, expressed in terms of delay τ and time t. We use t as a subscript on k to denote the different modes sampled by a baseline as the sky rotates and τ to indicate the dependence of k on the delay mode in question.

Equation (4) represents the diagonal-covariance limit of an optimal quadratic estimator (Dillon et al. 2013; Liu & Tegmark 2011), which we review in detail and adapt to delay-transformed visibilities in Appendix C. The diagonal limit generates optimal power spectrum estimators in the simple case of ideal signal covariance among statistically independent τ modes. In theory, the translation-invariance of the cosmological signal (along with the interchangeability of τ and k) ensures that the signal covariance is diagonal. Over narrow frequency bands such as the one used for these measurements, a similar invariance (and thus, diagonality) along the frequency direction should hold for the noise and foregrounds (Dillon et al. 2013). In practice, we encounter substantial systematics whose covariance deviates from this ideal. In Section 3.7, we examine how off-diagonal covariances arising from systematics can be suppressed in a straightforward way and incorporated into Equation (4) to produce power-spectrum estimators that are, to leading order, optimal. Because this later analysis follows the same trajectory as the analysis presented, we will first bring the current analysis to completion before examining how it is modified to incorporate off-diagonal covariances.

For N redundant samples of a k mode, N(N − 1)/2 cross-multiples are constructed. At this stage, we do not use i > j pairings, choosing instead to preserve the imaginary component of the power spectrum estimator, even though P21(k) is real valued, as a diagnostic. In the final power-spectrum estimate, we drop the imaginary component, which is equivalent to including the i > j baseline pairings. Hence, we ultimately have N² − N measurements of P(k), with noise decreasing proportional to N in mK² units, as would be expected for coherently integrating redundant samples before squaring them.
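In code, the cross-multiplication step (up to the cosmological normalization of Equation (4)) reduces to forming the N(N − 1)/2 distinct pairs; a sketch with an assumed data layout of one delay spectrum per redundant baseline:

```python
import numpy as np

def cross_power(vis_tau):
    """Average the N(N - 1)/2 distinct cross-multiples of delay spectra over
    baseline pairs i < j, excluding auto-products to avoid noise bias
    (Equation (4), up to the cosmological normalization)."""
    n = len(vis_tau)
    prods = [vis_tau[i] * np.conj(vis_tau[j])
             for i in range(n) for j in range(i + 1, n)]
    return np.mean(prods, axis=0)  # complex; imaginary part kept as a diagnostic
```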

Next, we average over measurements of independent k modes that statistically reflect the same underlying power spectrum, to produce our best power-spectrum estimate,

$$P(k) = \left\langle \widehat{P}(\mathbf{k}_{\tau,t}) \right\rangle_{|\mathbf{k}_{\tau,t}| = k}, \qquad (5)$$

where the three-dimensional (3D) symmetry of the power spectrum is invoked to average over all independent measurements of modes in a shell of |k| = k, with independent measurements indexed here by t. As described in Section 3.5, the number of independent modes that are averaged (with noise decreasing with the number of modes, M, as $\sqrt{M}$ in mK² units; see P12a) is determined by the overall observing window and the number of fringe-rate bins preserved in the fringe-rate filtering process. Noise levels are estimated using bootstrapping, as described in Section 3.8. Since we have not decimated the number of integrations to the critical sampling rate corresponding to the width of the applied fringe-rate filter, M is not the number of integrations. However, we are free to average the power-spectrum estimates for each integration, even though nearby samples do not have statistically independent noise, understanding that noise will decrease according to the number of underlying independent samples.

We also apply this same process for constructing power-spectrum estimators to data that include smooth-spectrum foreground emission before it was suppressed with the wide-band delay filter described in Section 3.4. While these data are not expected to yield accurate measurements of P(k) at higher k owing to side lobes of data flagging and low-level systematics, as mentioned previously, they do yield effective measurements of the bright foreground emission that falls within the limits of the wide-band delay filter (top right panel, Figure 5). At the baseline length used in this analysis, this emission is dominated by bright point sources such as Pictor A and Fornax A (Parsons et al. 2010; Jacobs et al. 2011). In order to present a complete picture of the power spectrum of emission on the sky that is valid at all k modes, we stitch together the final power spectrum using measurements derived from foreground data at k modes where emission is suppressed by the wide-band delay filter, and using measurements from the foreground-suppressed data elsewhere.


Figure 5. Top left: the covariance matrix of delay modes from two representative baselines generated by applying a narrow-band delay transform (Section 3.6) to data without the application of a wide-band delay filter. Top right: similar to top left, but applied to data after the application of a wide-band delay filter (Section 3.4). Bottom left: similar to top right, but after removing per-baseline off-diagonal covariance terms that deviate from the average (Section 3.7). Bottom right: similar to bottom left, but after removing off-diagonal covariance terms common to all baselines associated with the central seven delay modes. This final step attenuates the expected level of the reionization signal in these central modes by ∼30% but only attenuates other modes by ∼5%.


The result of the analysis up to this point is the power spectrum illustrated by the cyan curve in Figure 6, which exhibits a residual off-diagonal covariance structure that is illustrated in the top right panel of Figure 5.


Figure 6. Power spectra at z = 7.7 derived from the 92 day PAPER observation described in Section 2. In both panels, solid cyan depicts 2σ upper limits derived from PAPER observations without the removal of off-diagonal covariance terms, and black indicates the final measured power spectrum with 2σ confidence intervals. Because of windowing in the delay transform (Section 3.6), adjacent measurements are ∼50% correlated. The measurements in the right panel are weighted averages of positive/negative k contributions. The horizon limit (vertical dashed) illustrates the boundary within which emission has been filtered out in delay space and reinserted after the formation of power spectra. Dashed cyan illustrates the predicted noise power spectrum from Parsons et al. (2012a) for a system temperature of 560 K. The yellow triangles indicate 2σ upper limits reported in Paciga et al. (2013) at z = 8.6. Magenta illustrates a fiducial model at 50% ionization (Lidz et al. 2008). At k ≈ 0.27 h Mpc⁻¹, we report an upper limit on $\Delta_{21}^2(k)$ of 1660 mK² with 2σ confidence.


3.7. Suppression of Off-diagonal Covariances Arising from Systematics

The off-diagonal covariances shown in the upper-right panel of Figure 5, which are estimated empirically from measured data, are indicative of systematics corrupting what should nominally be nearly statistically independent estimators of k modes of reionization. This diagnosis is reinforced by the substantial variation of the covariance structure between different baselines; we expect the sky signal, whether from foregrounds or reionization, to be common to all baseline cross-products (as in the top left panel of Figure 5). Hence, we now turn to investigating techniques for removing these systematics while retaining, to the greatest extent possible, the reionization signal we are after. As we describe briefly here and in detail in Appendix C, this can be achieved by using the quadratic estimator formalism (Dillon et al. 2013; Liu & Tegmark 2011).

As derived in Appendix C, the special case of applying optimal quadratic estimators to delay-domain visibilities can, to leading order, be approximated as a modification of the visibility. In brief, knowledge of the covariances allows off-diagonal leakages between delay bins to be predicted and removed. If we take the output of the off-diagonal covariance removal process described in Equation (C20) as a perturbed visibility $\tilde{V}^\prime(\tau)$, given by Equation (C21), then for this modified visibility, Equation (4) encodes the remainder of the optimal estimator formalism.

There are, however, two subtle issues to address in applying the off-diagonal covariance removal process described in Appendix C. The first problem pertains to our limited ability to empirically deduce the true covariances present in our data. Because we have a limited number of independent samples from which to estimate covariances in the data, we measure residual noise and signal covariances that, in an infinite sample set, would not exist. If we naively subtract these residual covariances, we overfit the noise and suppress the 21 cm signal.

There is a simple prescription for avoiding this problem, given that, as the top right panel of Figure 5 illustrates, the dominant systematics we see in the covariance of k modes exhibit a baseline-dependent structure. Because celestial emission, whether from foregrounds (top left panel, Figure 5) or 21 cm reionization, should have identical structure among redundant baselines, we may subtract the average covariance among many baseline cross-products (the "panels" in each subplot of Figure 5) from each individual cross-product. This removes the residual signal covariance and leaves behind only the covariance of noise and systematics, from which off-diagonal terms can then be removed, even if they are residuals that would not be present in the true covariance matrix, without attenuating the desired celestial signal. This lossless removal of systematics is another substantial benefit of the highly redundant configurations presented in P12a.

We apply this technique for each baseline cross-product, using all baselines except the two being cross-multiplied for estimating the average covariance in order to avoid, to the greatest extent possible, coupling baseline-dependent systematics into each estimate of the common-mode covariance. Iterating this process a modest number of times (two or three) produces an improvement in the results, as baseline-specific systematics are first removed, allowing an improved estimate of the average covariance to be subtracted and the remainder of the baseline-specific systematics to be removed. Further iteration is not necessary, as the process rapidly converges to the result shown in the bottom-left panel of Figure 5.
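The central idea — subtract the baseline-averaged covariance, then remove the leakage predicted by the remaining baseline-specific off-diagonal structure, iterating a few times — can be sketched as follows. This is a loose schematic, not the estimator of Appendix C: it works with per-baseline covariances rather than the paper's baseline cross-products, and the leading-order leakage correction is an assumption of ours:

```python
import numpy as np

def remove_baseline_specific_cov(vis_list, niter=3):
    """Schematic of Section 3.7: estimate each baseline's delay-mode
    covariance, subtract the baseline-averaged covariance (which carries the
    common sky signal), and remove the leakage predicted by the residual
    off-diagonal structure. Each element of vis_list is (ntimes, ndelay)."""
    vis = [v.copy() for v in vis_list]
    for _ in range(niter):
        covs = [np.cov(v.T) for v in vis]  # empirical covariance per baseline
        for k, v in enumerate(vis):
            others = [c for i, c in enumerate(covs) if i != k]
            resid = covs[k] - np.mean(others, axis=0)  # baseline-specific part
            np.fill_diagonal(resid, 0)                 # keep diagonal power
            diag = np.diag(covs[k]).copy()
            diag[diag == 0] = 1
            # leading-order removal of off-diagonal leakage between delay bins
            vis[k] = v - v @ (resid / diag).T
    return vis
```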

This process, though quite effective, introduces a second issue that must be addressed. In Equation (4), we were careful to exclude products between measurements involving the same baseline in order to avoid incurring the noise bias that would result. The benefits of avoiding this bias far outweigh the slight improvement in sensitivity that including such "auto-products" would produce. Unfortunately, by using the average of many baselines to estimate and subtract an average covariance from each baseline cross-product and then subtracting off-diagonal terms in the residual, we couple the baseline-averaged noise into the data from each baseline. The result is a low-level residual noise bias approximately equal to the noise in the power-spectrum estimate.

The most straightforward approach we have found for eliminating this residual noise bias, other than direct subtraction (which introduces additional complications), is to divide baselines into four separate groups of approximately equal size. Within each group, we apply the off-diagonal covariance removal process, including the subtraction of an average covariance, using data only from baselines within that group. Then, to avoid incurring a noise bias, we use only cross-products of baselines between these four groups to estimate P(k). By excluding intragroup cross-products, this approach sacrifices approximately 15% in sensitivity (in mK²), but as before, we find the benefits of avoiding noise bias to outweigh the loss in sensitivity.
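A sketch of the intergroup cross-multiplication (with an assumed data layout: a list of four groups, each a list of per-baseline delay spectra):

```python
import numpy as np

def intergroup_power(groups):
    """Cross-multiply delay spectra only between baseline groups, so that the
    covariance removal performed within a group never meets its own noise in
    a cross-product. `groups` is a list of lists of delay spectra."""
    prods = []
    for gi in range(len(groups)):
        for gj in range(gi + 1, len(groups)):
            for vi in groups[gi]:
                for vj in groups[gj]:
                    prods.append(vi * np.conj(vj))
    return np.mean(prods, axis=0)
```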

The last step in the process of suppressing off-diagonal covariances in our data is to relax, for selected modes, the constraint against subtracting a baseline-averaged covariance, even if doing so results in overfitting the noise and the signal suppression associated with it. In particular, the interior seven k modes that we measure (the five inside of the horizon limit shown in Figure 6 as well as the first modes beyond this limit on either side) are more than an order of magnitude brighter than other modes and are so corrupted by smooth-spectrum foreground emission that, barring a heroic effort aimed at modeling and removing these foregrounds, they are unlikely to be useful for constraining high-redshift 21 cm emission. We find it advantageous to remove all off-diagonal covariances associated with these modes, even if it means overfitting the noise. The result of this process is shown in the bottom right panel of Figure 5.

Because we have overfit the noise in this final step, it now becomes necessary to investigate how we have affected the 21 cm signal that we aim to measure, because failure to account for signal attenuation can lead to erroneous constraints. We use Monte Carlo simulations (see, e.g., Masui et al. 2013; Paciga et al. 2013; Switzer et al. 2013) to estimate the expected signal attenuation through the analysis chain from Section 3.6 onward, as illustrated by the red data-flow path in Figure 2. Because our analysis does not model out modes of arbitrary shape (such as occurs in principal component analysis), and because delay filtering and covariance removal operate in the same space in which the power spectrum is measured, the simulations needed to characterize signal loss for our analysis are relatively straightforward. In our simulations, we apply the analysis pipeline used on the data, including off-diagonal covariance suppression, to a battery of simulated data sets that contain both a randomly generated sky signal and thermal noise. In each data set, the model sky signal consists of a spatially random, flat-spectrum signal common to all baselines that rotates with the celestial sphere, modeling the output of the fringe-rate filtering step that precedes the narrow-band delay transform in Figure 2. The noise that is added to this sky model is a Gaussian random signal unique to each baseline, to which a fringe-rate filter is applied.

We examine two cases in our simulations. In the first, we apply the off-diagonal covariance removal process using empirically determined covariances internal to the simulated data. This case effectively treats the simulated signal as if it were the data and investigates the effects of overfitting noise in the data used to produce the covariance matrices. In the second case, we apply the exact diagonalization matrices used to remove covariance terms in the measured data to the simulated signal. This case examines the effect that the operations applied to the data would have on a simulated signal that is uncorrelated with the data, as would be the case for a faint signal that is subdominant to foregrounds and instrumental systematics in the data. These two cases serve to characterize any combination of signals that are correlated or uncorrelated with the data, and hence they accurately bound the signal loss to which a potential 21 cm reionization signal would be subject in this analysis.

In both simulation cases, the amplitude of each k mode in the output power spectrum of the recovered signal is compared to the known input amplitude of the simulated signal, averaged over 1000 independent simulations. In the first case, results indicate that removing the off-diagonal covariances associated with the central seven modes results in a ∼5% reduction in signal amplitude (in mK²) for other modes and a ∼30% reduction for the seven modes in question. We compensate by dividing by these signal attenuations when reporting the power spectrum limits in Figure 6. In the second case, which tests the effects of our exact analysis on the noise statistics of an uncorrelated signal, we find that noise that is uncorrelated with the data is not significantly affected by this analysis.

We note that none of the analysis described above constitutes formal model subtraction. Because we estimate covariances from data, it is possible to overfit noise and subtract signal, but we take steps to avoid this and to compensate for signal attenuation where subtraction does happen by estimating signal attenuation via Monte Carlo simulations. If the full covariances were known a priori, this analysis would be strictly lossless (Tegmark 1997) and, as shown in Appendix C, optimal to leading order.

3.8. Estimating Residual Noise

One consequence of the process for removing off-diagonal covariances described above is that baseline-specific covariances, including residual covariances from thermal noise, are heavily suppressed. As a result, it is not possible to estimate noise residuals simply from the variation between baseline cross-products at the conclusion of this process. Instead, we use bootstrapping of baseline samples at the input of the off-diagonal covariance removal process to estimate error in the output power-spectrum estimate.

There are, however, a few subtleties in applying bootstrap resampling in this context. One complication is that, as described in Section 3.7, in order to avoid incurring a noise bias, it is vital that the removal of off-diagonal covariances proceed on independent groups of baselines. To keep our power spectrum estimator unbiased, we restrict the resampling with replacement so that data from the same baseline cannot enter more than one of the four independent baseline groups. This is done by first randomly assigning baselines uniquely to one of the four groups, and then applying sampling with replacement only within each of these groups. At each iteration of resampling, the assignment of baselines to groups is randomized.

Another issue in applying bootstrapping to this case pertains to a second bias that can be mistakenly included if we are not careful to keep track of repeated entries of the same baseline data within a group. Here, the problem arises in the averaging used to suppress residual signal covariances: with repeated entries, some baseline cross-products may actually be the product of a baseline's data with itself. The noise bias in these auto-products is coupled into the average of the covariances and biases the result of the off-diagonal covariance removal in each baseline group. In the cross-multiplication of baselines between groups, these biases (which are positive) couple into the result.

The solution in this case is to exclude auto-products from the determination of the average covariance among baseline cross-products. Since only a few independent cross-products are necessary to distinguish baseline-specific features from a residual signal covariance, this restriction is relatively harmless. However, it does require that there be at least two independent baselines in each resampled baseline group. By imposing this restriction, as well as the restriction to not repeat baselines between two independent groups, we limit the sample space that bootstrapping explores. It is therefore important to check that there remain enough data permutations to adequately estimate errors. Even with these restrictions on the resampling of 28 redundant baselines, the number of distinct permutations is much larger than the number of independent samples we have, so the limiting factor in estimating errors will be the number of samples, not the number of bootstrap resamplings.
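One bootstrap draw under these restrictions might look as follows (a sketch with our own function name; the retry loop enforcing at least two distinct baselines per group is one way, not necessarily the paper's way, of imposing the stated requirement):

```python
import numpy as np

def bootstrap_groups(n_baselines, ngroups=4, rng=None):
    """One bootstrap draw: baseline indices are first assigned uniquely to
    groups, then resampled with replacement only within each group, so no
    baseline ever appears in two groups. Draws leaving fewer than two
    distinct baselines in a group are retried, so that auto-products can be
    excluded when estimating the average covariance downstream."""
    rng = rng or np.random.default_rng()
    order = rng.permutation(n_baselines)
    groups = np.array_split(order, ngroups)
    resampled = []
    for g in groups:
        while True:
            draw = rng.choice(g, size=g.size, replace=True)
            if np.unique(draw).size >= 2:
                break
        resampled.append(draw)
    return resampled
```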

The mean and 2σ errors shown in Figure 6 are derived from 100 bootstrap resamplings, with error bars calculated to enclose 95% of the bootstrap samples. The errors we report do not include covariances between k modes, which are not treated in bootstrapping. However, as implied in Dillon et al. (2014), if these modes are not added together in binning, then the variances derived from bootstrapping are sufficient to characterize the errors.

4. RESULTS AND DISCUSSION

4.1. Foreground Suppression and Noise Levels

Figure 7 shows images of a single frequency channel made using Earth-rotation synthesis, including all 496 baselines of PAPER's 32 antenna maximum-redundancy array. As a reminder, the power spectrum we report in Section 4.2 is estimated directly from a restricted set of visibilities; these images are for diagnostic purposes. As indicated by the marked reduction of emission between the upper right and lower left panels, the application of the delay filter suppresses peak emission by at least three orders of magnitude. The further decrease in residual emission with the application of the fringe-rate filter between the lower left and lower right panels indicates that residuals in the lower left panel are noise-dominated, with the lower right panel showing that peak emission from the raw image has been suppressed by 3.3 orders of magnitude.


Figure 7. Synthesized beam and dirty images made with the same LST-averaged, maximum-redundancy PAPER observations used in the power-spectrum analysis, but including all 496 baselines in the array. For imaging, we used every tenth integration (each with an integration time of 43 s) in a single 500 kHz channel centered at 160 MHz in an LST range of 2:00–5:00, phased to α = 3:00, δ = −30:40°. Plotted left to right, top to bottom, are the synthesized beam, an unfiltered dirty image, a dirty image with a wide-band delay filter at 15 ns past the horizon limit, and a dirty image with both the wide-band delay filter and a fringe-rate filter applied. The synthesized beam panel has been normalized to show peak beam response at the maximum of the color scale range.

This result is paralleled in Figure 4, which shows the rms temperature of visibilities as a function of frequency in LST bins between 2:00 and 5:00, before and after the application of the wide-band delay filter, as well as how noise in the filtered data behaves through the various integration steps described in Section 3. As can be seen by the ratio of the unfiltered data to the residual, the wide-band delay filter is responsible for suppressing foreground emission in the middle of the band by nearly three orders of magnitude (in mK), demonstrating the efficacy of this approach for isolating smooth-spectrum foregrounds from k modes that might be used to detect 21 cm reionization. This filter substantially outperforms time-differencing or frequency-differencing the visibilities for foreground suppression because of the sinusoidal nature of the fringes.

The gap in the lower set of curves of Figure 4 between the raw and differenced visibilities indicates the presence of an interfering signal that varies relatively smoothly in time and frequency. Such characteristics are generally indicative of foreground emission and suggest that near the limits of the sensitivity of these observations, foregrounds are not completely suppressed. Examining the set of visibilities in Figure 4 from which foregrounds have been removed via derivatives in time or frequency, we find a ratio of approximately 5 between the single-baseline and baseline-averaged curves in the 140–180 MHz frequency range. This ratio is roughly consistent with the value of 5.3 that would be expected for integrating 28 baselines with independent noise. Below 130 MHz, this ratio decreases substantially, indicating the presence of a correlated signal between different baselines. This correlated signal is shown to be suppressed by time-differencing and frequency-differencing filters, indicating that it is likely to be the result of unsuppressed foregrounds near the minimum delay threshold imposed by the wide-band delay filter. This interpretation is consistent with the power spectrum reported in Figure 6.

Scaling the noise-dominated single-baseline curve by the square root of the number of measurements contributing to each temperature residual, we determine the system temperature at 160 MHz to be 560 K. This value is in excess of the expected level of sky noise arising from galactic synchrotron emission, but it represents a substantial improvement over the system temperature in Parsons et al. (2010).
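For reference, the scaling just described follows the radiometer equation. A minimal sketch (our illustration; the bandwidth, integration time, and sample count below are example values, and exact prefactors depend on correlator conventions):

```python
import numpy as np

def t_sys_from_rms(t_rms_mK, bw_hz=500e3, t_int_s=43.0, n_meas=1):
    # Radiometer equation: T_rms = T_sys / sqrt(bw * t_int * n_meas).
    # Scaling the noise-dominated rms by the square root of the number
    # of contributing measurements recovers T_sys (K, for rms in mK).
    return t_rms_mK * 1e-3 * np.sqrt(bw_hz * t_int_s * n_meas)
```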

Next, we examine the set of visibilities derived from data after the application of a fringe-rate filter, which, although weighted for a sharp cutoff in the fringe-rate domain, effectively corresponds to a ten-fold increase in integration time over the 43 s LST bins used in the upper plot. The ratio between the curves with and without fringe-rate filtering is approximately 3 in the 140–170 MHz frequency range, roughly consistent with the value of 3.2 expected for the ten-fold increase in integration time that the fringe-rate filter represents. The ratio between the filtered and unfiltered curves is substantially lower on either end of this band, indicating a departure from noise-like behavior. As in other cases, the suppression of the systematic with frequency-differencing suggests that it reflects residual foregrounds that are relatively smooth. Nonetheless, the increasing systematics near the edges of the band form one of the arguments for observing with a wide instantaneous bandwidth, and they motivate limiting our power spectrum analysis in this initial investigation to the portion of the spectrum that appears consistent with noise.

4.2. Measured Power Spectrum

In Figure 6, we show the spherically averaged 3D power spectrum derived from our measurements. In this figure, the central five data points in the range −0.06 ⩽ k ⩽ 0.06 h Mpc−1 fall within the horizon limit of the delay spectrum and come from data without the application of the wide-band delay filter. We also illustrate the data before (cyan) and after (black) the application of the off-diagonal covariance removal described in Section 3.7. Averaging over ±k, we place a 2σ upper limit on $\Delta _{21}^2(k)$ of 1660 mK2 for k ≈ 0.27 h Mpc−1. This limit is more than an order of magnitude more stringent than the previous best upper limit, which was recently revised to 61,500 mK2 at k = 0.5 h Mpc−1 at z = 8.6 (Paciga et al. 2013).
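A minimal sketch of this final folding step, assuming inverse-variance weighting of the ±k bandpowers (the weighting scheme is our assumption) and the usual conversion $\Delta^2(k) = k^3 P(k)/2\pi^2$:

```python
import numpy as np

def fold_and_limit(k, p, sig):
    """Fold signed-k bandpowers and form 2-sigma upper limits.
    Assumes k is sorted ascending and symmetric about zero;
    p, sig: bandpowers and 1-sigma errors in mK^2 (h^-1 Mpc)^3."""
    pos, neg = k > 0, k < 0
    w1, w2 = 1 / sig[pos]**2, 1 / sig[neg][::-1]**2   # align -k with +k
    p_avg = (p[pos] * w1 + p[neg][::-1] * w2) / (w1 + w2)
    sig_avg = 1 / np.sqrt(w1 + w2)
    delta2_limit = k[pos]**3 / (2 * np.pi**2) * (p_avg + 2 * sig_avg)
    return k[pos], delta2_limit
```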

As this plot illustrates, the residual foreground signal that we detect in the frequency domain (see Figure 4) is concentrated at low |k| modes in delay space, with the largest excess in a region just beyond the horizon limit described in P12b. In the range −0.22 ⩽ k ⩽ 0.22 h Mpc−1, bins show evidence of a systematic bias that makes them inconsistent with zero at high significance. The source of this biasing signal is likely to be foreground emission, as predicted in P12b, but instrumental systematics are another possibility. We observe a second feature at moderate (3σ) significance at k ≈ ±0.35 h Mpc−1 that appears relatively symmetric in positive and negative k. We presume that this feature is caused by instrumental systematics or foreground emission, such as leakage from a polarized signal with high rotation measure (Moore et al. 2013). The underlying cause is still under investigation and will require substantial follow-up observation and careful analysis to identify.

As a diagnostic, we apply our analysis to random noise injected into the analysis before the narrow-band delay transform (see Figure 2) at the level expected for these observations with a system temperature of 560 K, which was measured at 160 MHz using frequency-domain visibilities after the application of a wide-band delay filter (Figure 4). These noise simulations agree with Equation (27) of Parsons et al. (2012a), which is the basis of the dashed cyan curve in Figure 6. We find good agreement between this expected noise level, the estimated power spectrum, and the errors derived from bootstrapping.

4.3. Constraints on the Mean 21 cm Brightness Temperature

The limits shown in Figure 6 are not yet at the level of the brighter predictions favored by numerical and seminumerical simulations (McQuinn et al. 2007; Zahn et al. 2007; Iliev et al. 2008; Santos et al. 2008; Mesinger et al. 2011) that include, among other things, the predicted effects of X-ray heating on the intergalactic medium (IGM). Nonetheless, we can, for the first time,20 constrain the 21 cm brightness temperature, 〈Tb〉, of the neutral gas at mean density for several reionization models and ionization fractions.

We recall that the brightness temperature contrast of the 21 cm transition relative to the cosmic microwave background (CMB; neglecting velocity gradient terms and assuming low optical depth, τ21) can be written in the form

Equation (6)

$$T_b = T_0\,\left(\frac{T_S - T_{\rm CMB}}{T_S}\right)\,x_{{\rm H}\,\scriptsize{I}}\,\left(1 + \delta\right),$$

where TS is the spin temperature, $x_{{\rm H}\,\scriptsize{I}}$ is the neutral fraction, and δ is the density contrast, all of which are spatially dependent. The pre-factor is then

Equation (7)

(see, e.g., Zaldarriaga et al. 2004; Furlanetto et al. 2006), where we have used the Planck parameters (Planck Collaboration et al. 2013).

Most reionization models assume an early period of UV-photon production from stars that, while insufficient to ionize the IGM, serves to drive the spin temperature toward the gas temperature via the Wouthuysen–Field effect. This is followed by heating of the neutral gas, primarily through X-rays associated with star formation and black hole accretion. In these scenarios, heating drives TS ≫ TCMB during reionization, so ξ ≡ (TSTCMB)/TS ≈ 1. The brightness contrast can be significantly larger, however, if TS < TCMB. In this case, the 21 cm line is seen in absorption against the CMB.

The maximum value of this contrast corresponds physically to a situation in which the spin temperature remains tightly coupled to the gas temperature via the Wouthuysen–Field effect21 (Wouthuysen 1952; Field 1958; Hirata 2006), but in which heating of the neutral gas by any mechanism (X-rays, Lyα, or shocks) is negligible. While a full characterization of this "no-heating" scenario would require a detailed simulation, we note that a reasonable rescaling of power spectra for which ξ was assumed to be one (as in Lidz et al. 2008) can be obtained in the following way. Because reionization tends to rapidly produce either fully neutral or fully ionized regions, $x_{{\rm H}\,\scriptsize{I}}$ can be locally represented as either 0 (for an ionized region) or 1 (for a neutral one). Because we have assumed no heating of the neutral medium, the spatial dependence of ξ is negligible. This turns the product T0ξ into an overall multiplicative factor on the temperature fluctuations in Equation (6) and thus rescales the dimensionless ionization power spectrum, $\Delta ^2_i(k)$ (calculated from analytic or numerical models), as

Equation (8)

$$\Delta^2_{21}(k) = \left(T_0\,\xi\right)^2\,\Delta^2_i(k).$$

We can estimate the maximum 〈Tb〉 produced in the no-heating scenario by noting that residual Compton scattering keeps the IGM gas, and hence the 21 cm spin temperature, in equilibrium with the CMB until about z ∼ 150 (Furlanetto et al. 2006). Maximum contrast occurs if TS tracks the gas temperature thereafter, so we can parameterize the maximum value of ξ as

Equation (9)

$$\xi_{\rm max} = 1 - \frac{1 + z_{\rm dec}}{1 + z},$$

where zdec is the redshift at which the kinetic temperature of the IGM gas and the CMB drop out of equilibrium. Combining Equations (7) and (9) leads to

Equation (10)

To obtain a bound on 〈Tb〉, we treat ξ as a free parameter in the range ξmax ⩽ ξ ⩽ 1 that parameterizes the relative degree of heating of the IGM. We first note that brightness temperature fluctuations in excess of Equation (10) are ruled out a priori at a given redshift, as indicated by the maximum color scale corresponding to 〈Tb〉 > 400 mK in Figure 8. We then examine three different models for $\Delta ^2_i(k)$ at various ionization states and determine for each the maximum 〈Tb〉 beyond which the predicted value of $\Delta ^2_{21}(k)$ is inconsistent at the 2σ level with the measurements shown in Figure 6.
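The no-heating ceiling can be reproduced with a few lines of arithmetic. The sketch below uses the standard 21 cm prefactor from Furlanetto et al. (2006) with approximate Planck parameter values; the exact coefficients in Equations (7) and (10) may differ slightly:

```python
import numpy as np

def t0_mK(z, omega_b_h2=0.0221, omega_m_h2=0.142):
    # Standard prefactor (Furlanetto et al. 2006) with approximate
    # Planck values substituted; coefficients here are indicative.
    return 28.0 * (omega_b_h2 / 0.023) * np.sqrt(0.15 / omega_m_h2
                                                 * (1 + z) / 10.0)

def xi_max(z, z_dec=150.0):
    # Equation (9): T_S tracks the adiabatically cooling gas below
    # z_dec, so T_CMB / T_S = (1 + z_dec) / (1 + z).
    return 1.0 - (1.0 + z_dec) / (1.0 + z)

print(abs(t0_mK(7.7) * xi_max(7.7)))   # ~420 mK, cf. the ~400 mK ceiling
```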

Figure 8. Constraints at z = 7.7 on the absolute value of the 21 cm brightness temperature of the neutral gas (not normalized by ionization fraction), |〈Tb〉|, for the patchy reionization model described in Equations (8) and (11), as a function of the upper (kmax) and lower (kmin) bounds on the scale of fluctuations. Color indicates the maximum |〈Tb〉| (in mK) consistent with PAPER's 2σ upper limits on $\Delta ^2_{21}(k)$ (see Figure 6) for the listed ionization fractions. Because the power spectrum amplitude for patchy reionization is invariant under the interchange of ionized and neutral regions, xi = 0.1 and 0.3 equivalently correspond to xi = 0.9 and 0.7, respectively. The maximum color scale of 400 mK indicates the threshold of brightness temperatures that can be excluded a priori on the basis of the maximum contrast between the 21 cm spin temperature and the CMB (see Equation (10)). The black dot and black triangle indicate the coordinates of a patchy-reionization approximation to the fiducial and high-mass halos models illustrated in Figure 9, respectively.

We examine two models presented in Lidz et al. (2008) based on simulations by McQuinn et al. (2007). Although the ionization power spectra are associated with suggested redshifts in that paper, they apply to any redshift for the indicated ionization fraction, xi. The first model that we examine is the fiducial "S1" model. For the xi = 0.5 and xi = 0.75 ionization power spectra, we obtain limits of 〈Tb〉 < 275 mK and 〈Tb〉 < 291 mK, respectively. These limits are illustrated in the upper two panels in Figure 9.

Figure 9. Scaled 21 cm EoR power spectra predicted for the models described in Section 4.3 (Lidz et al. 2008), reflecting bounds on the 21 cm brightness temperature 〈Tb〉 at z = 7.7, according to Equation (8). Magenta curves illustrate power spectra scaled according to 〈Tb〉 ≈ 30 mK, as predicted by simulations that include the effects of X-ray heating on the IGM. Cyan curves illustrate power spectra scaled by the maximum 〈Tb〉 that is under the 2σ upper limits we measure (black). These are, left to right, top to bottom, 〈Tb〉 = 275, 291, 217, and 264 mK, respectively.

In the second model that we examine, "S3," reionization is dominated by emission from massive halos, and the 21 cm power spectrum peaks at lower k modes in the final stages of reionization because of the increased spacing between the dominant sources of ionizing photons. For this model, we obtain limits of 〈Tb〉 < 217 mK and 〈Tb〉 < 264 mK for xi = 0.5 and 0.75, respectively. These limits are illustrated in the lower two panels of Figure 9.

The third model that we consider is a toy "patchy" reionization model described by Equation (21) in P12a, which approximates $\Delta ^2_{21}(k)$ as being flat between the bounds of a minimum (kmin) and maximum (kmax) cutoff in k, scaled appropriately so that total power is conserved:

Equation (11)

$$\Delta^2_i(k) = \begin{cases} \dfrac{x_i\,x_{{\rm H}\,\scriptsize{I}}}{\ln\left(k_{\rm max}/k_{\rm min}\right)}, & k_{\rm min} \leqslant k \leqslant k_{\rm max}, \\ 0, & {\rm otherwise,} \end{cases}$$

where $x_{{\rm H}\,\scriptsize{I}}=1-x_i$ is the neutral hydrogen fraction. Using this toy model, we tune kmin and kmax and determine a maximum mean temperature above which Equation (8) becomes inconsistent with our measurements, as shown in Figure 8. While our measurements are not yet able to exclude the 〈Tb〉 ≈ 30 mK predicted for an X-ray-heated IGM under the reasonable assumption that kmax/kmin > 10, we are able to rule out 〈Tb〉 ⩾ 370 mK for kmin < 0.1 h Mpc−1 for nearly all physically motivated values of kmax in both the xi = 0.3 and xi = 0.5 models. For xi = 0.1, this result becomes dependent on the value of kmax.
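A sketch of this scan (our implementation of Equations (8) and (11); the comparison rule against the measured limits is our reading of the procedure):

```python
import numpy as np

def delta2_patchy(k, x_i, kmin, kmax):
    # Equation (11): flat ionization power between kmin and kmax,
    # normalized so the total variance x_i * (1 - x_i) is conserved.
    amp = x_i * (1.0 - x_i) / np.log(kmax / kmin)
    return np.where((k >= kmin) & (k <= kmax), amp, 0.0)

def max_tb_mK(k_meas, limits_2sig, x_i, kmin, kmax):
    # Largest <Tb> for which <Tb>^2 * Delta^2_i(k) (Equation (8)) stays
    # below the measured 2-sigma limits at every sampled k.
    d2 = delta2_patchy(k_meas, x_i, kmin, kmax)
    live = d2 > 0
    if not live.any():
        return np.inf
    return np.sqrt((limits_2sig[live] / d2[live]).min())
```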

4.4. Implications for X-Ray Heating

The constraints shown in Figures 8 and 9 indicate that, for several reionization models and ionization fractions, our measurements are inconsistent with the 21 cm brightness temperature from neutral regions that would result from the gas in the IGM cooling strictly according to adiabatic expansion after recombination without additional heating. With the caveats that (1) the models we use may not correspond to the actual ionization power spectrum of reionization and (2) we do not know whether the ionization fraction at z ≈ 8 is near to the ionization fractions we investigate (although models such as Zahn et al. 2007 and Lidz et al. 2008 do predict xi = 0.5 near z ≈ 8), the implication of our measurements is that the IGM may have been heated with respect to adiabatic cooling. X-ray heating from inverse Compton scattering from supernovae, high-mass X-ray binaries, or miniquasars is the most likely culprit (Madau et al. 2004; Ricotti & Ostriker 2004; Mirabel et al. 2011; Tanaka et al. 2012), although shock heating, which is currently disfavored (McQuinn & O'Leary 2012), has not been completely ruled out as a possibility.

We have used the PAPER results to explore some basic models for reionization. Under the simplifying, although perhaps not unreasonable, assumption that 〈Tb〉 is roughly factorable from the growth of structure (e.g., ionization bubbles and/or cosmic density), we constrain 〈Tb〉 in Figures 8 and 9 using limits to brightness temperature fluctuations observed by PAPER. For example, assuming a typical minimum k scale of ionization fluctuations of 0.1 h Mpc−1 and a maximum of 10 h Mpc−1, then as shown in Figure 8, the PAPER results imply upper limits on |〈Tb〉| of 191 and 175 mK for neutral fractions of 30% and 50%, respectively, at z = 7.7. For currently favored reionization models dominated by emission from massive halos (McQuinn & O'Leary 2012), PAPER results imply upper limits between 217 and 264 mK.

In summary, for massive-halo-dominated reionization models and for a range of simple patchy reionization models, PAPER data imply that the value of 〈Tb〉 is less than 400 mK. The value of 400 mK is important because it reflects the expected 〈Tb〉 in the case where the spin temperature of the 21 cm line is coupled to the gas kinetic temperature, but the neutral gas is never heated on large scales by, e.g., X-rays from high-mass X-ray binaries or miniquasars. Physically, this could correspond to a situation where the Lyα photons from the first galaxies are sufficient to couple the spin and kinetic temperatures throughout the neutral IGM via the Wouthuysen–Field effect, but they do not substantially heat the gas (Chen & Miralda-Escudé 2004), nor do supernovae, X-ray binaries, or miniquasars produce enough X-rays to heat the large-scale neutral IGM. We point out that while the X-ray heating constraints we present require a coupling between the spin and kinetic temperatures, this coupling must indeed be occurring at z = 7.7: observed star formation rates at z = 8 (Bouwens et al. 2010; Schenker et al. 2013) exceed by almost an order of magnitude the threshold of 10−3 M☉ yr−1 Mpc−3 required for UV photons to couple these temperatures (McQuinn & O'Leary 2012).

While this no-heating model is considered physically unlikely (Furlanetto et al. 2006), the argument can be turned around: the fact that we constrain fluctuations at the 400 mK level provides, for the first time, suggestive evidence for large-scale heating of the neutral IGM during reionization at z > 7. Such constraints on X-ray heating may soon have repercussions on the global signal (Pritchard & Loeb 2012) and on the morphology of ionization (Mesinger et al. 2013). Note that our measurements pertain to the thermal evolution of the neutral gas during reionization and not to the subsequent thermal evolution of the ionized gas after reionization (Hui & Haiman 2003).

5. CONCLUSION

We have set the most stringent limits to date on the 21 cm EoR power spectrum. This achievement was greatly facilitated by using the delay-spectrum approach to achieve a level of foreground isolation of −80 dB outside of the horizon limit. This is the deepest level of foreground suppression that has been achieved so far by a 21 cm reionization experiment, and with a 2σ upper limit of $\Delta ^2_{21}(k)<(41\, {\rm mK})^2$ at k = 0.27 h Mpc−1, the result is within an order of magnitude (in mK) of predictions of the 21 cm EoR signal level. These constraints are sufficient to begin ruling out several scenarios where X-ray and shock heating fail to heat the IGM prior to reionization, although we cannot rule out the possibility of a cold IGM with low neutral fraction at z = 7.7.

Although PAPER has less collecting area than other competing 21 cm reionization experiments, its smooth instrumental response, combined with the sensitivity boost that arises from the use of a maximum-redundancy array configuration, gives it access to lower k modes than, e.g., Paciga et al. (2011), demonstrating the effectiveness of the approach taken by PAPER. The efficacy of the wide-band delay filter in this analysis shows PAPER's wide operational bandwidth to be a valuable asset for isolating foreground emission in k space. This work represents an important first validation of the delay-spectrum approach to avoiding foregrounds.

At the limit of the sensitivity of these observations, we are beginning to see signs of bias at a range of k modes, presumably arising from foregrounds or other systematics. Additional investigation will be necessary to diagnose and constrain the source of this bias. Using additional data that have been observed but not yet analyzed, along with a novel filtering technique, we expect forthcoming analysis to yield nearly an order of magnitude improvement in sensitivity, in units of mK2, over the results presented here. We also intend to extend this power-spectrum analysis over a wider bandwidth to constrain a broader range of cosmological history. Meanwhile, observations have been completed with a 64 antenna, dual-polarization PAPER deployment in South Africa, and a new observing campaign is just beginning (as of late 2013) with a 128 antenna deployment of PAPER at the same site. Plans are also proceeding for a next-generation instrument, the Hydrogen Epoch of Reionization Array (HERA),22 which, with ∼0.1 km2 of collecting area, will have the capability to characterize the power spectrum of reionization in detail (Pober et al. 2013b).

We thank SKA-SA for the site infrastructure, maintenance, and observing support that has made this work possible, as well as the significant efforts of the staff at NRAO's Green Bank and Charlottesville sites. A.P. thanks M. McQuinn and A. Lidz for helpful discussions and ionization models. The PAPER project is supported by the National Science Foundation (awards 0804508, 1129258, and 1125558), and a generous grant from the Mt. Cuba Astronomical Association.

APPENDIX A: A DATA COMPRESSION TECHNIQUE FOR LOW-FREQUENCY TELESCOPES

Managing the volume of data generated by modern radio telescopes is a technical challenge that is becoming increasingly problematic. For example, the volume of raw data on which this paper's results are based exceeds 10 terabytes. With data volume scaling quadratically with antenna number, PAPER observations in the near future are likely to occupy more than 0.5 petabytes. This problem is common to many current arrays, and several data-reduction schemes (e.g., Ord et al. 2010) have been proposed. These schemes generally place a substantial burden on real-time calibration, a feat that, while possible, often proves elusive, particularly for arrays that are under active development.

In this section, we describe a new method of (lossy) data compression based on delay/delay-rate (DDR) filters (Parsons & Backer 2009) that are tailored to the geometric limits of celestial emission in an interferometer's native observing coordinates, omitting data outside of those limits. In contrast to data-reduction pipelines that require real-time calibration, DDR-based compression requires a minimal amount of information at the time of observation, provided that instrumental responses are smooth in time and frequency. In the minimal limit, substantial compression can proceed simply from knowledge of the maximum baseline length in the array.

To briefly review the principles of DDR filters presented in Parsons & Backer (2009), the geometric delay with which a celestial signal originating in direction $\hat{s}$ enters an interferometric baseline b is given by

Equation (A1)

$$\tau_g = \frac{\mathbf{b}\cdot\hat{s}}{c},$$

where c is the speed of light. Following directly from this definition, the bounds on the geometric delay for this baseline are

Equation (A2)

$$-\frac{|\mathbf{b}|}{c} \;\leqslant\; \tau_g \;\leqslant\; \frac{|\mathbf{b}|}{c}.$$

For the maximum baseline length of 300 m, |b|/c is approximately 1 μs, which corresponds to −0.52 ⩽ k ⩽ 0.52 h Mpc−1 at z ≈ 8. An example of the bounds of a delay filter in DDR space is given by the shaded cyan region in Figure 10. It is worth noting that the range of k modes occupied by foregrounds for the 30 m baselines used in the power spectrum analysis is substantially narrower than the maximum bounds imposed by this data-compression step. This leaves a substantial region for probing the cosmological 21 cm signal that includes the modes to which we are most sensitive.
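For reference, the delay-to-k conversion implied here is easy to check numerically (a flat ΛCDM cosmology with approximate Planck parameters is assumed in this sketch):

```python
import numpy as np

def k_parallel(tau_s, z, omega_m=0.31):
    # k_par = 2*pi*tau / Y with Y = c (1+z)^2 / (nu_21 H(z)), in h/Mpc.
    nu21 = 1420.405751e6                                       # Hz
    H_z = 100.0 * np.sqrt(omega_m * (1 + z)**3 + 1 - omega_m)  # h km/s/Mpc
    Y = 3.0e5 * (1 + z)**2 / (nu21 * H_z)          # h^-1 Mpc per Hz^-1
    return 2 * np.pi * tau_s / Y

print(k_parallel(1e-6, 8.0))   # ~0.5 h/Mpc for a 1 microsecond delay
```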

Figure 10. Delay/delay-rate (DDR) transform of RFI-flagged visibilities from 30 m PAPER baselines. Raw visibilities are integrated for 10.7 s with a channel bandwidth of 50 kHz to facilitate the detection and removal of RFI and other transient events. The regions of DDR space sampled according to these observing parameters far exceed the delays (cyan) and delay rates (magenta) that can be occupied by sky emission that smoothly varies versus time and frequency, as dictated by a maximum baseline length of 300 m in the array. Color scale indicates log10(Jy), with arbitrary scaling. Although DDR filtering is applied to each baseline individually, redundant 30 m baselines were coadded in this figure to suppress noise and better illustrate the regions occupied by sky signal.

Taking the Fourier transform of a visibility along the frequency axis (see Equation (2)) produces a delay-domain expression of the visibility, which we can approximate for a model sky of point sources indexed by n as

Equation (A3)

$$\tilde{V}(\tau, t) \approx \sum_n \tilde{A}\left(\tau, \hat{s}_n(t)\right) \ast \tilde{S}_n(\tau) \ast \delta_D\left(\tau - \tau_{g,n}(t)\right),$$

where $\tilde{A}(\tau,\hat{s}_n(t))$ indicates the Fourier transform of the antenna response in source direction $\hat{s}_n$ along the frequency axis, $\tilde{S}_n$ indicates the Fourier transform of the source flux density along the frequency axis, δD is a delta function relating delay to the geometric delay τg, n in the direction of the source indexed by n, and "∗" indicates a convolution in delay.

Provided that celestial emission and instrumental responses are smooth versus frequency so that $\tilde{A}$ and $\tilde{S}$ are compact in τ, emission in $\tilde{V}$ will be tightly bounded by the geometric constraints on τg given in Equation (A2), and a low-pass filter may be applied to $\tilde{V}$ that preserves this range of τ modes. In doing so, such a filter preserves smooth-spectrum celestial emission and removes modes that are expected to consist mostly of noise. Moreover, with the range of delay modes now being band-limited, visibilities may be decimated in frequency without aliasing, according to the critical Nyquist rate. This decimation is the basis of data compression in the frequency direction.
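A minimal sketch of this filter-and-decimate step for a single unflagged spectrum (flagged data require the CLEAN-style deconvolution discussed below; the function name and the uniform-grid assumption are ours):

```python
import numpy as np

def delay_filter_and_decimate(vis, freqs, tau_max):
    # vis: complex visibility spectrum on a uniform frequency grid (Hz);
    # tau_max: geometric delay bound |b|/c from Equation (A2), in s.
    dnu = freqs[1] - freqs[0]
    tau = np.fft.fftfreq(len(freqs), d=dnu)      # delay axis (s)
    vtilde = np.fft.fft(vis)
    vtilde[np.abs(tau) > tau_max] = 0.0          # keep |tau| <= tau_max
    smooth = np.fft.ifft(vtilde)
    # Band-limited to +/- tau_max in delay, so channels may be thinned
    # to the Nyquist spacing 1/(2*tau_max) without aliasing.
    step = max(1, int(1.0 / (2.0 * tau_max * dnu)))
    return smooth[::step], freqs[::step]
```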

Similar geometric limits apply to the variation of visibilities in the time dimension. As described in Equation (7) of Parsons & Backer (2009), the rate of change of the geometric delay versus time (that is, the delay rate) is given by

Equation (A4)

$$\dot{\tau}_g = -\frac{\omega}{c}\,\cos\delta\,\left(b_x \sin H + b_y \cos H\right),$$

where b = (bx, by, bz) is the baseline vector expressed in equatorial coordinates, ω is the angular frequency of the Earth's rotation, and H, δ are the hour-angle and declination of a point on the celestial sphere, respectively. As a result, there exists a maximum rate of change based on the length of a baseline projected to the z = 0 equatorial plane. For arrays not deployed near the poles, |by| ≫ |bx| (i.e., they are oriented more along the east–west direction than radially from the polar axis), and the maximum rate of change corresponds to H = 0 and δ = 0, where we have

Equation (A5)

$$\dot{\tau}_{g,{\rm max}} = \frac{\omega\,|b_y|}{c}.$$

For a maximum east–west baseline length in the PAPER array of 300 m, ω|by|/c is approximately 0.07 ns s−1. To better elucidate the meaning of this bound, we take the Fourier transform along the time axis (see Equation (8) in Parsons & Backer 2009) for a model visibility consisting of a single point source located at the point of maximum delay rate, which gives us

Equation (A6)

$$\tilde{V}(\nu, f) \approx \tilde{A}(\nu, f) \ast \tilde{S}(\nu)\,\delta_D\!\left(f - \frac{\omega\,b_y}{c}\,\nu\right),$$

where f is the fringe rate of the source,23$\tilde{A}(\nu,f)$ indicates the Fourier transform of the antenna response along the time direction, and approximation is indicated because we assume |by| ≫ |bx| and because the Fourier transform must involve a discrete length of time, during which our assumption that cos H ≈ 1 breaks down at second order. The delta function above gives rise to the expression for the maximum fringe rate in Equation (3).

This example of a source with a maximal fringe rate serves to show that a filter may be applied in the delay-rate domain, using the fact that the maximum delay rate is bounded by the maximum fringe rate within the band (i.e., evaluating Equation (3) at the maximum ν involved in the delay transform), to remove emission that exceeds the variation dictated by array geometry for sources locked to the celestial sphere. As in the delay-filtering case, assuming the geometric bounds on delay rate implicitly assumes that $\tilde{A}$ and $\tilde{S}$ are compact in f, which is to say that instrumental responses and celestial emission must be smooth in time; variable emission from, e.g., fast transients will be heavily suppressed by such delay-rate filters. For PAPER, with a maximum baseline length of 300 m and a maximum observing frequency of 200 MHz, the maximum delay rate has a period of 68.5 s. As described in Section 3.5, at PAPER's latitude, delay rates range from −fmax/2 to fmax. Filtering delay rates to this range corresponds in sensitivity to an effective integration time of 45.2 s. An example of the bounds of a delay filter in DDR space is given by the shaded magenta region in Figure 10. As in the delay filtering case, filtering along the delay-rate axis permits substantial down-sampling of the signal, which is the basis for the reduction in data volume along the time axis. We note that for the analysis in Section 3.1, we choose to use a slightly wider delay-rate filter to be conservative in our first application of this technique, corresponding to an integration time of 43 s.
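These delay-rate numbers are easy to verify (the sidereal rotation rate and the 300 m maximum east-west baseline are assumed in this sketch):

```python
import numpy as np

omega = 2.0 * np.pi / 86164.1     # Earth's sidereal rotation rate (rad/s)
b_ew, c = 300.0, 2.998e8          # max east-west baseline (m), c (m/s)

rate_max = omega * b_ew / c       # Equation (A5)
f_max = rate_max * 200e6          # max fringe rate at 200 MHz (Hz)
print(rate_max * 1e9)             # ~0.073 ns/s
print(1.0 / f_max)                # fringe period ~68.5 s
```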

A serious issue associated with DDR filtering pertains to how data flagging associated with the removal of RFI and systematics violates the assumptions of spectral and temporal smoothness that are implicit in the geometric bounds we derive for delay and delay rate. This issue is addressed in Parsons & Backer (2009) by a CLEAN-like iterative forward modeling deconvolution in delay domain for removing the side lobes created by the unsmooth sampling functions associated with data flagging. We apply the same technique here for DDR-based data compression. We also take advantage of the substantial suppression of celestial emission that results from the removal of low delay and delay-rate modes to improve RFI flagging. Hence, our approach to implementing DDR-based data compression proceeds as follows.

  • 1.  
    Coarse RFI flagging, using time and frequency derivatives to suppress celestial emission.
  • 2.  
    DDR transformation and CLEAN deconvolution of data from a subset of baselines. This culminates in a model of emission for each of these baselines, with model components restricted to fall within specified geometric limits. In order to avoid edge effects when filtering along the time axis, we filter data windows that overlap by a factor of three and retain only the central third of the filtered data.
  • 3.  
    Refined RFI flagging of residuals after subtracting the above model from each baseline in the subset.
  • 4.  
    Application of the RFI flagging to all baselines on the basis of occupancy in time/frequency bins.
  • 5.  
    DDR transformation and CLEAN-like deconvolution of raw data from all baselines, based on the refined RFI flagging.
  • 6.  
Filtering of DDR-domain data into regions that are saved to three separate output files.

Figure 11 shows the result of applying this procedure, which preserves smooth-spectrum, nontransient emission, to data from one of the baselines used in this analysis. As shown, the spectrally and temporally smooth components of the signal are preserved by data compression, while the unsmooth components, which originate predominantly from thermal noise, are removed. This compression algorithm is lossy, but the regions lost in the compression (namely, the unshaded corners in Figure 10) are likely to be of minimal import for many low-frequency radio telescopes.

Figure 11. Dynamic spectrum of RFI-flagged visibilities from a 30 m PAPER baseline prior to (left) and after (right) the application of data compression along the delay and delay-rate axes. Color scale indicates log10(Jy) with arbitrary scaling.

The computational cost of DDR compression is dominated by the deconvolution of the sampling function that is required because of RFI flagging. The CLEAN algorithm is nonlinear, but the computation time generally scales as N2, where N is the number of antennas in an array. A 35 core cluster was sufficient to compress 32 antenna dual-polarization PAPER observations roughly four times faster than real time. For an upcoming 128 antenna deployment of PAPER, we will be deploying a 64 core cluster that is expected to be sufficient to compress data two times faster than real time.

This technique can be easily generalized to other arrays, as it requires minimal information about the geometry of an array. Compression factors could be improved by more carefully tailoring DDR filters to the geometry of each baseline individually. We have not explored this on PAPER because the advantages of a compression system that requires minimal information about the array at run-time and is minimally restrictive with respect to further analysis seem to outweigh the need for additional data compression at this time.

APPENDIX B: ON CALCULATING BEAM AREAS FOR NORMALIZING POWER-SPECTRUM MEASUREMENTS

In this section, we investigate in greater detail how beam area enters into the power-spectrum formalism for 21 cm intensity mapping derived in Morales (2005), McQuinn et al. (2006), Pen et al. (2009), and P12a. In particular, we show that, for the purpose of normalizing power spectrum measurements by the integrated volume, the appropriate beam area is proportional to the integral of the square of the power response. This fact has been obscured in many derivations in the literature by approximations made to render volume integrals analytically tractable. Our derivation generalizes the special case of a Gaussian beam that was correctly derived in Pen et al. (2009).

We begin by examining the integrated volume, $\mathbb {V}$, used to normalize the 3D Fourier transform in Equation (3) of P12a. We express this volume in observing coordinates as

Equation (B1)

$$\mathbb{V} = \Omega\,B\,X^2\,Y,$$

where B is the bandwidth, Ω is the angular area, and X, Y are redshift-dependent scalars relating angle and frequency to spatial scales, respectively. The variable Ω arises from the bounds set by A(l, m, ν), the antenna power response, on the angular extent in the integral

Equation (B2)

$$\tilde{V}(u, v, \eta) = \int A(l, m, \nu)\,T(l, m, \nu)\,e^{-2\pi i\,(ul + vm + \eta\nu)}\,dl\,dm\,d\nu,$$

which is a slightly modified version of Equation (6) of P12a relating the delay-transformed visibility $\tilde{V}$, sampled at wavemodes u, v (the Fourier complements of angular coordinates l, m) and η (the Fourier complement of spectral frequency ν), to a temperature field T. Re-expressing this in real-space cosmological coordinates, we have

Equation (B3)

We may then use the definition of the Fourier transform (following the convention of Equation (1) in P12a) to write

Equation (B4)

with the multiplicative terms in the previous equation now appearing as convolutions, and $\tilde{A}$ indicating the 3D Fourier transform of the antenna power response. Combining the two integrals yields

Equation (B5)

where the second step uses the relationship between an estimate of the power spectrum $\widehat{P}({\bf k}^\prime)$ and the two-point correlation function of a temperature field given in Equations (2) and (3) in P12a. As discussed in P12a and McQuinn et al. (2006), the narrow width of $\tilde{A}$ in k space in comparison to the scales on which $\widehat{P}({\bf k})$ varies allows $\widehat{P}({\bf k})$ to be removed from the integral, giving us

Equation (B6)

Using the coordinate substitution q ≡ k′ − k, we have

Equation (B7)

Finally, we invoke Parseval's theorem to substitute the integral of the variance in real space, picking up a factor of $1/\mathbb {V}$ in accordance with our chosen Fourier convention, so we have

Equation (B8)

We compare this result with the relation of the delay-transformed visibility, ${\tilde{V}}$, to the 3D power spectrum of reionization, P21(k) (P12a):

Equation (B9)

As this shows, the relevant beam area in Equation (B9) is the power-square beam, ΩPP, given by

Equation (B10)

$$\Omega_{\rm PP} \equiv \int A^2(l, m)\,dl\,dm.$$

This contrasts with the standard metric for beam area, the integrated power beam, which we will call ΩP and which is given by

Equation (B11)

$$\Omega_{\rm P} \equiv \int A(l, m)\,dl\,dm.$$

This beam area metric is used to convert visibility measurements from Jy units to mK, but it is incorrect for normalizing power spectra that relate to the two-point correlation function of a temperature field.

In the literature to date, sensitivity derivations have commonly assumed a top-hat beam response or a sinc response in the uv plane (see, e.g., Morales 2005; McQuinn et al. 2006; and P12a). Because ΩP = ΩPP under these assumptions, this has permitted some ambiguity as to which beam area is intended. For equations that relate power-spectrum sensitivity to a system temperature (e.g., Equations (15) and (16) in P12a)

Equation (B12)

$$\Omega^\prime \equiv \frac{\Omega_{\rm P}^2}{\Omega_{\rm PP}}$$

should be used in lieu of Ω because these equations pick up two factors of ΩP in the conversion from Jy2 to mK2, along with the factor of ΩPP in the denominator relating to the integrated volume. For equations that relate a measured visibility (in units of brightness temperature, e.g., Equation (4)) to $\widehat{P}({\bf k})$, the factor of $\Omega _{\rm P}^2$ is already applied in the conversion from units of Jy to mK, and Ω corresponds to the remaining factor of ΩPP.

For PAPER, ΩP ≈ 0.72 sr, whereas ΩPP ≈ 0.31 sr. Following the definition above, Ω′ ≈ 1.69 sr. These beam areas are calculated numerically from a beam model, but typically Ω′ is about a factor of two larger than ΩP.
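These integrals are straightforward to evaluate numerically. The toy Gaussian beam below (our example, in a flat-sky approximation) reproduces the factor-of-two relation, since ΩP = 2πσ2 and ΩPP = πσ2 give Ω′ = 2ΩP:

```python
import numpy as np

def beam_areas(A, d_omega):
    # A: beam power response sampled on a grid; d_omega: solid-angle
    # element per pixel (sr). Returns Omega_P, Omega_PP, Omega_prime.
    omega_p = A.sum() * d_omega            # Equation (B11)
    omega_pp = (A**2).sum() * d_omega      # Equation (B10)
    return omega_p, omega_pp, omega_p**2 / omega_pp

l = np.linspace(-0.5, 0.5, 1001)
L, M = np.meshgrid(l, l)
sigma = 0.1
A = np.exp(-(L**2 + M**2) / (2 * sigma**2))   # toy Gaussian power beam
print(beam_areas(A, (l[1] - l[0])**2))        # Omega_prime = 2 * Omega_P
```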

APPENDIX C: OPTIMALITY OF ANALYSIS FOR REMOVING OFF-DIAGONAL COVARIANCES

In this section, we investigate delay-domain analysis through the lens of the quadratic estimator formalism (Liu & Tegmark 2011; Dillon et al. 2013), ultimately showing that the analysis technique of removing off-diagonal covariances arising from systematics produces optimal power-spectrum estimators, to leading order.

C.1. Review of Quadratic Estimator Formalism

We begin by reviewing the quadratic estimator formalism. In this formalism, the value of the power spectrum pα in the αth k bin (i.e., P(kα), also known as the "bandpower") can be optimally estimated by the quantity24

Equation (C1)

$$\widehat{p}_\alpha = \frac{1}{2}\sum_\beta M_{\alpha\beta}\;\mathbf{x}^\dagger\,\mathbf{C}^{-1}\,\mathbf{Q}^\beta\,\mathbf{C}^{-1}\,\mathbf{x},$$

where the hat signifies that $\widehat{p}_\alpha$ is an estimate of the true bandpower pα, x is a vector containing binned data with ${\bf x}_\alpha \equiv \tilde{V}(\tau _\alpha)$ from Equation (2), C ≡ 〈xx†〉 is the covariance matrix of the data, Mαβ (discussed below) normalizes the estimator, and Qβ is the response of the covariance to the βth bandpower. The family of Q matrices (one for each value of β) maps the vector space of bandpowers to the vector space of data and is constructed to satisfy the relation

Equation (C2)

$$\mathbf{Q}^\beta = \frac{\partial\,\mathbf{C}^{\rm sky}}{\partial p_\beta},$$

where Csky is the contribution of sky emission to the covariance. This contribution includes both the foregrounds and cosmological signal. The foregrounds are included with the signal because we have conservatively avoided the direct modeling (and subtraction) of foregrounds from the data (Dillon et al. 2014). The total covariance is given by

Equation (C3)

$$\mathbf{C} = \mathbf{C}^{\rm sky} + \mathbf{N} + \mathbf{C}^{\rm junk},$$

where N is the instrumental noise covariance, and Cjunk is the contribution of all contaminants except for foregrounds and noise.

Thinking of Mαβ as an element of a matrix M that lives in the vector space of bandpowers, one is free to choose among several different forms for M, all of which are lossless (because M is generally taken to be an invertible matrix and all steps prior to its application are also lossless) but which have different error properties. Consider, for example, the window function matrix W, which relates the expectation value of our bandpower estimators to the true bandpowers:

Equation (C4)

$$\langle \widehat{\mathbf{p}} \rangle = \mathbf{W}\,\mathbf{p},$$

where we have grouped the bandpowers into vectors. The window function matrix can be shown to take the form

Equation (C5)

$$\mathbf{W} = \mathbf{M}\,\mathbf{F},$$

where F is the Fisher information matrix, given by

Equation (C6)

$$F_{\alpha\beta} = \frac{1}{2}\,{\rm tr}\!\left[\mathbf{C}^{-1}\,\mathbf{Q}^\alpha\,\mathbf{C}^{-1}\,\mathbf{Q}^\beta\right].$$

Different choices for M give window functions (rows of W) with different properties, and one is subject only to the restriction that M must be normalized in a way that ensures that each window function integrates to unity to conserve power. We pick M = F−1 without loss of generality, a fact that we will explain below (see Dillon et al. 2014 for other contexts where one might entertain other choices for M, such as taking M to be diagonal).

It is important to note that Equation (C1) estimates power spectra from single-baseline data. The length of x is therefore equal to the number of delay bins, and the covariance denoted here relates different delay bins from the same baseline only. The covariance matrix shown in Figure 5, in contrast, includes correlations between delay bins of different baselines as well. However, its analysis in Section 3.7 serves only to provide a better estimate for C by taking advantage of redundant baselines. In other words, our final result in Appendix C.2 will yield a procedure for removing off-diagonal systematics that operates on a per-baseline basis once C has been estimated, with information from different baselines mixing only in the cross-multiplication to form the power spectrum.

C.2. Application to Delay-transformed Visibilities

We now turn to applying the quadratic estimator formalism to delay-transformed visibilities. As derived in P12b, τ modes are a good approximation to line-of-sight k modes, and their mutual orthogonality makes the simplified analysis in Section 3.6 a good approximation to optimal analysis. As a result, we apply the full quadratic formalism only to leading order in order to safeguard against the mismodeling of covariances, which are estimated empirically from a finite data set and are therefore imperfect representations of the true covariances. In this case, the leading-order optimal estimator reduces to the simple prescription given in Equation (4), provided that we use modified visibilities (as shown below) in place of $\tilde{V}(\tau,t)$.

Working in delay space, the Q matrices become particularly simple because delay modes are to an excellent approximation the same as Fourier modes along the line of sight (P12b). In this approximation, Csky is diagonal, with elements equal to the bandpowers, i.e.,

Equation (C7)

$$C^{\rm sky}_{AB} = p_A\,\delta_{AB},$$

where A and B index delay modes, and δAB is the Kronecker delta function. It follows that in this basis, the Q matrices simply pick out the correct delay/Fourier component:

Equation (C8)

$$\left(Q^\alpha\right)_{AB} = \delta_{A\alpha}\,\delta_{B\alpha}.$$

Like the sky signal, the instrumental noise covariance can be modeled as a diagonal matrix, so that

Equation (C9)

$$N_{AB} = \sigma_A^2\,\delta_{AB},$$

where σA is the rms noise fluctuation in the Ath delay bin.

With ideal data, where Cjunk = 0, the preceding discussion implies that C should be diagonal. However, as we can see from Figure 5, the actual data covariance contains nonideal off-diagonal entries. We model the real covariance matrix as

Equation (C10)

$$C_{AB} = r_A\,r_B\left[\delta_{AB} + \varepsilon_{AB}\left(1 - \delta_{AB}\right)\right],$$

where $r_A \equiv \sqrt{p_A + \sigma _A^2}$ is the total rms power (sky plus noise) in the Ath delay bin, and εAB is the coupling constant between the Ath and Bth bins. Since C is Hermitian, we note that $\varepsilon _{AB} = \varepsilon _{BA}^*$. Note that so far, our parameterization of C is still completely general because we have yet to specify the ε parameters; the fact that the off-diagonal entries are scaled to the diagonals is simply a convenient convention driven by the observation that our empirically derived covariances seem to roughly follow this form. By comparing Equations (C3), (C7), and (C9), we see that25

Equation (C11)

$$C^{\rm junk}_{AB} = \varepsilon_{AB}\,r_A\,r_B\left(1 - \delta_{AB}\right).$$

We will now proceed to compute the various quantities required for our power spectrum estimator, Equation (C1). First, we compute C−1, which will be our primary tool for mitigating the influence of the junk contribution. Instead of performing a brute-force inversion of C, we can make the observation that it is equal to26

Equation (C12)

$$\mathbf{C}^{-1} = \left(\mathbf{N} + \mathbf{C}^{\rm sky}\right)^{-1} - \left(\mathbf{N} + \mathbf{C}^{\rm sky}\right)^{-1}\mathbf{C}^{\rm junk}\left(\mathbf{N} + \mathbf{C}^{\rm sky} + \mathbf{C}^{\rm junk}\right)^{-1}.$$

Taking advantage of the fact that (N + Csky + Cjunk)−1 appears on the right-hand side, we can iterate this expression into a series:

Equation (C13)

$$\mathbf{C}^{-1} = \sum_{n=0}^{\infty}\left[-\left(\mathbf{N} + \mathbf{C}^{\rm sky}\right)^{-1}\mathbf{C}^{\rm junk}\right]^n\left(\mathbf{N} + \mathbf{C}^{\rm sky}\right)^{-1}.$$

Writing C−1 in this form makes our results more robust to "noise" that arises from our empirical determination of the covariance matrix. For example, our series expansion does not require the inversion of Cjunk, which may be unstable under inversion thanks to our imperfect estimation of the covariance. One only needs to invert the combination N + Csky, which is trivial because both N and Csky are diagonal. With a systematic series expansion, we can also henceforth work only to leading order in the ε parameters. We make this choice based on the intuition that uncertainties in our covariance estimation mean that higher order terms cannot be reliably estimated. A leading-order approximation may therefore be preferable to using an overly "noisy" covariance to determine C−1, as that would result in an overfitting of the noise in the data (Tegmark et al. 1998).

Taking only the first two terms of Equation (C13), one obtains

Equation (C14)

$$\left(C^{-1}\right)_{AB} \approx \frac{\delta_{AB}}{r_A^2} - \frac{\varepsilon_{AB}\left(1 - \delta_{AB}\right)}{r_A\,r_B}.$$
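The accuracy of this truncation is easy to verify on a toy covariance of the form of Equation (C10); the residual against a brute-force inverse scales as ε2 (our construction, real-valued for simplicity):

```python
import numpy as np

def approx_inverse(r, eps):
    # Equation (C14): diagonal 1/r_A^2 minus the first-order coupling.
    Cinv = np.diag(1.0 / r**2)
    off = ~np.eye(len(r), dtype=bool)
    Cinv[off] -= (eps / np.outer(r, r))[off]
    return Cinv

rng = np.random.default_rng(1)
n = 8
r = rng.uniform(1.0, 2.0, n)
eps = 0.01 * rng.standard_normal((n, n))
eps = 0.5 * (eps + eps.T)                 # symmetric couplings
np.fill_diagonal(eps, 0.0)
C = np.outer(r, r) * (np.eye(n) + eps)    # Equation (C10)
print(np.abs(approx_inverse(r, eps) - np.linalg.inv(C)).max())  # O(eps^2)
```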

Inserting this into Equation (C6), it is straightforward to show that Cjunk affects the Fisher matrix only at second order in the ε parameters and may therefore be neglected:

Equation (C15)

$$F_{\alpha\beta} \approx \frac{\delta_{\alpha\beta}}{2\,r_\alpha^4} + \mathcal{O}\left(\varepsilon^2\right).$$

The final ingredient that we require before we can estimate power spectra is the normalization matrix M. As we remarked above, we will pick M = F−1. If we had instead picked some popular alternatives such as taking M to be diagonal, it would have made no difference because we have just shown that the Fisher matrix is diagonal to leading order.

With a diagonal Fisher matrix, we can further simplify our power spectrum estimator. We first define a family of matrices

Equation (C16)

$$\mathbf{G}^\alpha \equiv \frac{1}{2}\sum_\beta M_{\alpha\beta}\,\mathbf{C}^{-1}\,\mathbf{Q}^\beta\,\mathbf{C}^{-1},$$

so that our estimator can be written compactly as

Equation (C17)

$$\widehat{p}_\alpha = \mathbf{x}^\dagger\,\mathbf{G}^\alpha\,\mathbf{x}.$$

Inserting M = F−1 as well as Equations (C8) and (C15) into our definition of Gα yields

Equation (C18)

$$G^\alpha_{AB} = r_\alpha^4\left(C^{-1}\right)_{A\alpha}\left(C^{-1}\right)_{\alpha B}.$$

Substituting this and Equation (C14) into Equation (C17), we obtain

Equation (C19)

$$\widehat{p}_\alpha = \left|\,x_\alpha - \sum_{\beta\neq\alpha}\varepsilon_{\alpha\beta}\,\frac{r_\alpha}{r_\beta}\,x_\beta\,\right|^2.$$

This, then, is our leading-order optimal estimator for the bandpower pα in the presence of a "junk" covariance that couples different delay bins. Intuitively, our estimator instructs us to subtract from each delay bin a predicted leakage from all other bins, which is determined by a coupling constant and the ratio of rms power levels between the bins.27 The resulting quantity is then squared to yield a final power estimate.

Written in this way, it is also clear how one would generalize the result to cross-power spectra. The optimal estimator for cross-power spectra between data from baseline i and baseline j, for example, would be

Equation (C20)

$$\widehat{p}^{\,ij}_\alpha = \left(x^i_\alpha - \sum_{\beta\neq\alpha}\varepsilon_{\alpha\beta}\,\frac{r_\alpha}{r_\beta}\,x^i_\beta\right)^{\!*}\left(x^j_\alpha - \sum_{\beta\neq\alpha}\varepsilon_{\alpha\beta}\,\frac{r_\alpha}{r_\beta}\,x^j_\beta\right).$$

In summary, if we define

Equation (C21)

$$\tilde{V}^\prime(\tau_\alpha, t) \equiv \tilde{V}(\tau_\alpha, t) - \sum_{\beta\neq\alpha}\varepsilon_{\alpha\beta}\,\frac{r_\alpha}{r_\beta}\,\tilde{V}(\tau_\beta, t),$$

then Equation (C20) reveals that Equation (4) is to leading order an optimal estimator, provided we use $\tilde{V}^\prime$ in lieu of $\tilde{V}$.
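In code, the per-baseline correction of Equation (C21) and the cross-multiplication of Equation (C20) reduce to a few lines (a sketch under the convention that ε has zero diagonal; the function names are ours):

```python
import numpy as np

def decouple(x, r, eps):
    # Equation (C21): subtract the predicted leakage into each delay
    # bin, set by the coupling eps and the ratio of rms power levels.
    leak = (eps * np.outer(r, 1.0 / r)) @ x   # sum_b eps_ab (r_a/r_b) x_b
    return x - leak                           # eps diagonal assumed zero

def cross_power(x_i, x_j, r, eps):
    # Equation (C20): cross-multiply corrected visibilities from two
    # baselines, so that no noise bias is incurred.
    return np.conj(decouple(x_i, r, eps)) * decouple(x_j, r, eps)
```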

Footnotes

  • 17 

    That the signal loss is relatively small outside the horizon is perhaps unsurprising. To achieve a coherent reduction of the cosmological signal outside the horizon, it is necessary for delay-space side lobes to couple signal from outside the horizon to modes within the horizon, which are then forward modeled to give outside-the-horizon side lobes to subtract from the original signal. This sort of coherent reduction represents two factors of side lobe attenuation on top of an already small cosmological signal and is therefore negligible.

  • 18 

As described in detail in Appendix B, the angular area used to normalize high-redshift 21 cm power spectrum measurements (e.g., Ω in Equation (4)) is proportional to the integral of the squared beam power over angular area (ΩPP; Equation (B10)). This contrasts with the standard beam area (ΩP; Equation (B11)) that is used to relate flux density to a brightness temperature. Because Equation (4) relates a measured visibility in units of brightness temperature to P(k), a factor of $\Omega _{\rm P}^2$ has already been applied to convert Jy to mK. In this case, Ω indicates the remaining factor of ΩPP, which for PAPER is 0.31 sr.

  • 19 

    Because the final result of this paper is a set of upper limits and not a detection of the cosmological power spectrum, it is not necessary to include a cosmological signal at the theoretically expected level in our simulations. The simulations only need to quantify signal loss at the current amplitude levels, such that if there were a signal entering at the level of our upper limits, we would have robustly detected it. Because the upper limits described in Section 4.2 are dominated by residual foregrounds and thermal noise, it is unnecessary to include the completely subdominant theoretical signal. We inject signal amplitudes for P(k) in the range 106–107 mK2, which may be compared with the upper limits shown in the left panel of Figure 6.

  • 20 

    The constraint in Paciga et al. (2011) on the brightness temperature of 21 cm emission for a single-bubble reionization model was later retracted in Paciga et al. (2013).

  • 21 

    Recent measurements (Bouwens et al. 2010; Schenker et al. 2013) show that there are certainly enough UV photons for this to be a reasonable approximation at z ∼ 8.

  • 23 

    Delay rate is equivalent to the frequency-integrated fringe rate.

  • 24 

    Equation (C1) in principle requires an extra bias-removal term in order to be optimal (Liu & Tegmark 2011; Dillon et al. 2013). For simplicity we omit this term in anticipation of Appendix C.2, where we will generalize our results to cross-power spectra between baselines. Because cross-power spectra do not incur noise biases (Dillon et al. 2014), our final result will not require the bias-removal term.

  • 25 

    It is important to emphasize that, in general, systematic contaminants need not be limited to off-diagonal entries. However, diagonal systematics are difficult to distinguish from signal without other distinguishing features, and so here we set Cjunk to be traceless by definition. In effect, any systematics are included as additional signal along the diagonals of Csky or N. An examination of our result will reveal that this amounts to a conservative approach where we leave diagonal systematics untouched in an effort to avoid subtracting cosmological signal.

  • 26 

    This is a special case of the binomial inverse theorem.

  • 27 

    If robust models of diagonal systematics were available, their inclusion in Cjunk would simply result in the elimination of the β ≠ α restriction on the sum in Equation (C19).
