The Number Density Evolution of Extreme Emission Line Galaxies in 3D-HST: Results from a Novel Automated Line Search Technique for Slitless Spectroscopy

The multiplexing capability of slitless spectroscopy is a powerful asset in creating large spectroscopic datasets, but issues such as spectral confusion make the interpretation of the data challenging. Here we present a new method to search for emission lines in the slitless spectroscopic data from the 3D-HST survey utilizing the Wide-Field Camera 3 on board the Hubble Space Telescope. Using a novel statistical technique, we can detect compact (extended) emission lines at 90% completeness down to fluxes of 1.5 (3.0) times 10^{-17} erg/s/cm^2, close to the noise level of the grism exposures, for objects detected in the deep ancillary photometric data. Unlike previous methods, the Bayesian nature allows for probabilistic line identifications, namely redshift estimates, based on secondary emission line detections and/or photometric redshift priors. As a first application, we measure the comoving number density of Extreme Emission Line Galaxies (restframe [O III] 5007 equivalent widths in excess of 500 Angstroms). We find that these galaxies are nearly 10 times more common above z~1.5 than at z<0.5. With upcoming large grism surveys such as Euclid and WFIRST as well as grisms featuring prominently on the NIRISS and NIRCam instruments on James Webb Space Telescope, methods like the one presented here will be crucial for constructing emission line redshift catalogs in an automated and well-understood manner.


Introduction
In recent years, combinations of deep imaging and spectroscopy with the Hubble Space Telescope (HST) have been used to tackle a number of outstanding questions in observational astronomy. The HST has a particular advantage in the near-IR with the Wide-Field Camera 3 (WFC3), due to the lower sky background levels compared to ground-based observatories and the higher spatial resolution. Using the grism, all sources within the ∼2′×2′ WFC3/IR field of view have dispersed spectra, which are essentially a series of monochromatic (overlapping) two-dimensional images shifted on the detector according to their wavelength. The combined spatial and spectral information gives insight into, for example, the growth of disks and bulges at high-redshifts (Hathi et al. 2009;van Dokkum et al. 2013;Patel et al. 2013), the spatial distribution of star formation (Nelson et al. 2012), the regulation star formation in massive galaxies (Ferreras et al. 2012), and the role of environment and mergers in shaping the galaxy population (Schmidt et al. 2013). Additionally, these surveys are very efficient at covering large areas with a superior multiplexing capacity compared to even the most advanced multi-object spectrographs, allowing for complete studies of rare objects such as cold brown dwarfs (Masters et al. 2012) or 4<z<7 Lyman-break and Lyαemitting galaxies (Pirzkal et al. 2004;Malhotra et al. 2005;Rhoads et al. 2009).
That being said, slitless grism spectroscopic data are more complex to interpret than standard spectroscopic data. Contamination from unrelated spectra makes a detailed analysis of individual objects challenging, particularly in crowded fields, and often only sources detected via shallow ancillary imaging are analyzed, limiting the potential for discovery. As emission lines contain so much astrophysically interesting information and are the easiest spectral features to detect in faint sources, their detection tends to be the primary focus of grism surveys. Different methods for their discovery have been developed and tuned to the various strengths of the specific set of observations. Meurer et al. (2007) outline two techniques for finding emission lines in a semi-automated fashion. The first method relies on the detection of sources in the direct image. Each source has its corresponding grism spectrum extracted, and emission lines are detected by visual inspection. This is the preferred method of the WISP survey (Atek et al. 2010), a pureparallel survey using WFC3. There, spectral extractions are performed using the aXe software (Kümmel et al. 2009) developed to analyze HST grism data. Most spectra are taken with both the G102 and G141 grisms, covering an effective wavelength range of 0.8-1.7 μm.
The second method involves searching for emission lines directly in the grism frames. This is done by smoothing the grism frame with a sausage-shaped filter, designed to match the spatial extent of dispersed first-order spectrum, and then subtracting this smoothed image from the original frame. This effectively removes continuum sources from the image while leaving spectrally compact features, such as emission lines, which can be detected using a simple signal-to-noise ratio (S/N) cut. Undispersed zeroth order spectra from the brightest objects also appear as point sources, but their position is wellknown and they are easily masked. For each detected feature in this subtracted frame, a cutout of the direct image is inspected to determine which source could have produced the feature. This is the preferred method of the PEARS survey (Straughn et al. 2008;Pirzkal et al. 2013) using the G L 800 grism of the Advanced Camera for Surveys (ACS) covering 0.5-1.1 μm. That survey has the added advantage of having multiple position angles, such that identification of the source of the emission line in the direct image is simply a matter of identifying the area where the different spectral traces for the same feature intersect.
Both approaches have a serious limitation: while the line candidate identification may be semi-automated, the significance assessment is not, but instead relies on visual inspection, often requiring multiple people to grade the reliability of each line. Such an approach makes a determination of the true completeness difficult and is not feasible for larger data sets. In addition, the first approach only has high fidelity in redshift assignments in cases where more than one line is detected. We jointly analyze photometric redshift information and grism spectra in the determination of redshifts. This differs from the approaches of some of the aforementioned studies that utilize independent photometric redshifts to verify their results. Relying only on multiple line detections introduces problems in the subjective nature of line identification, as well as preferentially selecting objects in certain redshift ranges, typically where both Hα and [O III] λλ4959,5007 are visible. Indeed, while the quoted flux limit for compact emission lines in WISP is´---5 10 erg s cm 17 1 2 , which is based on the WFC3 exposure time calculator, these lines are often only detected in objects that have a second, brighter line.
In this paper, we develop and apply a new method for detecting significant emission features in grism spectroscopic data, using data from the 3D-HST survey 7 (van Dokkum et al. 2011;Brammer et al. 2012b). 3D-HST is a near-infrared spectroscopic Treasury program utilizing WFC3. This program provides WFC3/IR primary and ACS parallel imaging and grism spectroscopy over approximately three-quarters (∼700 square arcminutes) of the CANDELS fields (Grogin et al. 2011;Koekemoer et al. 2011). We focus here on the WFC3 grism data, which utilizes the G141 grism covering 1075-1700 nm.
3D-HST provides several advantages over other existing HST grism surveys. As the observations are dithered in a fourpoint pattern, the processed images offer additional robustness against the effect of hot and bad pixels that a pure-parallel survey cannot. We also have the ability to interlace the frames instead of drizzling them, where the pixels from the input images are alternately placed in the output image according to the position of the pixel centers in the original images: see Figure 3 of Brammer et al. (2012b). Interlacing the frames results in better noise characteristics, which is crucial to consider when pushing toward the faint limits of emission line sensitivity; the interlacing procedure is described fully in Momcheva et al. (2016). This also provides higher spatial resolution than drizzling (each interlaced pixel is 23 Å×0 06), and the ability to more easily identify pointlike emission sources.
The remainder of this paper is organized as follows. In Section 2 we describe the 3D-HST data used. Section 3 lays out our new method to search for significant emission lines in slitless spectroscopic data, utilizing photometric information as a redshift/emission line position prior. Extensive tests of the method to determine the completeness function as well as contamination are performed and discussed in Section 4. We apply our method to the 3D-HST data set in Section 5 to obtain a sample of high-equivalent width (high-EW) emission line galaxies. We then employ a Bayesian analysis to constrain a parameterized model for their luminosity and redshift distribution. Finally, in Section 6, we summarize and compare these results to previous studies of high-EW galaxies, namely the population of extreme emission line galaxies (EELGs). We adopt a ΛCDM cosmology with W = 0.3 m and = H 0 70 km s −1 Mpc −1 throughout.

Data
The spectroscopic data comes from the 3D-HST survey, designed to provide spectroscopy for four well-studied CANDELS extragalactic fields: AEGIS, COSMOS, GOODS-S, and UDS. The data set is supplemented by grism spectroscopy for GOODS-N (PI: B. Weiner). (Undispersed) objects are detected in a combined CANDELS/3D-HST F125W+F140W+F160W image, and multiband photometry is obtained as part of the Skelton et al. (2014) photometric catalog. A thorough description of the 3D-HST spectroscopic release, which includes extracted 2D grism spectra for all ∼250,000 objects in the Skelton et al. (2014) photometric catalog, and data reduction methods are given in Momcheva et al. (2016). We briefly summarize some of the important points here.
A model for the grism spectra of the entire field is created as follows. For a given object, we distribute the light (and consequently the extraction weight in the spatial direction) according to the EAZY ) continuum fit at the photometric redshift estimate, with the spatial extent according to the F125W+F140W+F160W "postage stamp" image of the object. Next, for bright objects ( < m 22 ), we use the the extracted spectrum as a second step to give the continuum the correct shape and to take the brightest emission lines into account. Creating a continuum model individually for all objects allows us to construct a model of the flux distribution for the entire field. This is useful because of spectral confusion due to overlapping unassociated spectra in the grism data. Since we create the full modeled spectra for each pointing, each extracted 2D spectrum has the modeled spectra from surrounding objects ("contamination") subtracted. We also subtract an object's own continuum ("model") in order to search for positive residuals, namely emission features. For the brightest objects, the model does not always subtract cleanly and can lead to spurious detections in neighboring objects, so we also mask any "contamination" regions where the model flux exceeds primarily interested in emission lines with high-equivalent widths in this study (see Section 5), we focus on objects with F140W AB magnitudes fainter than 24.

Simple Model Fitting of Emission Lines in 3D-HST
In this section we set out to devise a straightforward, probabilistic method to detect emission lines and assess their significance algorithmically. It is based on the following elements: to start, we calculate what the likelihood of the data is, if there was an emission line with amplitude A at a spectral position Δx with respect to the undispersed image. To assess the redshift and significance of any detection, A>0, we incorporate three pieces of independent information as prior beliefs. First, we know that in most instances there is either one or no significant emission line in the entire spectrum of an object; conversely, that means the vast majority of pixels must contain no emission line. Second, any emission line flux we detect must not violate the total flux constraints from the corresponding broadband image. Finally, in many instances, we have photometric information about the probable redshift of the object ("photometric redshift priors"), which also informs our assessment of any emission line detection significance and line identification. Taken together, this results in a joint statement about whether any object has a significant emission line, and if so at which redshift. In other words, line detection and identification are an integrated process.

Line Detection Formalism
We presume the following algorithm to operate on continuum-subtracted spectra, which we obtain as follows: every object in the Skelton et al. (2014) photometric catalog has a grism spectrum ( ¢ S ) with an associated noise spectrum (s S ) and a direct F125W+F140W+F160W-combined postage stamp (I, of dimensions x max and y max ). From each grism spectrum, ¢ S , we subtract its continuum flux model and a flux model for all contaminating spectra (see Section 2 and Momcheva et al. 2016) to obtain a spectrum S, in which we search for residual emission features.
As a convolution kernel in the line search, we need to construct an empirical template for the expected spatial distribution of a monochromatic line image at any given wavelength. We construct that from the undispersed image, I, applying a S/N cut of 2σ above the background level. For all sources whose undispersed image has fewer than 20 pixels above this threshold, we instead use the HST F140W Point Spread Function (PSF) scaled to the same flux as the image. This choice is justified, as the area of 20 (interlaced) pixels is approximately the size of a WFC3 PSF.
Presuming there is a line image of shape I, offset by Δx along the spectrum and characterized by an amplitude scale factor A, we calculate the likelihood of the data as where the spectral emission line model is , . We are dealing with a two-dimensional projection of the three-dimensional spectral information ( --S x y , , on sky on sky l rest ). In this context, , represents a correlation between I and S in the dispersion direction for different Dx.
Throughout, we use pixel coordinatesx in the dispersion direction, which of course reflect different wavelengths, l . The maximum value of Δx corresponds to the length of S, here denoted as Dx max . The scale factor A ranges from 0 to 1. At a given position, A=0 implies that there is no signal present in the spectrum. Conversely, A=1 corresponds to a position where the entire flux 8 of the galaxy is contained in a single emission feature with the same spatial extent as the direct image and is unresolved spectrally. (The mean dispersion of the primary spectral order of the G141 grism is 46.5 Å/pixel or R = 130.) We calculate the likelihood at each integer value of Δx, noting that the FWHM of the WFC3 PSF is 1.1 native pixels at 1.4 μm.
However, we want the posterior distribution function posterior of the possible line amplitude A, not the likelihood of the data. The two are related via Bayes's theorem:

prior
More specifically, we want the probability that there is a significant line detection, A>0, at any given position: posterior This is illustrated in Figure 1. Equation (2) requires the prior information on both A and Dx. Absent informative photometric redshift information (see Section 3.2), prior max max , for the case of one emission line in the entire spectrum. This is typically ∼1/300 (i.e., =1). It is this prior D ( ) p x prior that accounts for the fact that we query Dx x max independent positions along the spectrum whether there is a significant line flux. When considering only the likelihood (Equation (1)), we would expect one spurious "3σ-detection" for every ∼200 independent spectral positions, in the absence of any emission line and in the absence of any systematic errors or residuals. The factor D ( ) p x prior prevents such false detections, as the prior probability that there is no We have no external information on the possible line flux in any one object, except that A is bound by [0, 1]. Therefore The two priors on A and Δx can be combined to  . In practice, we expect such regions in Δx to have the extent of the PSF or postage stamp size, x max . This formalism is based on the assumption of a single line in the spectrum. If there are two significant lines (e.g., [O III] and Hα), then we would expect two disjoint regions in Δx with significant , threshold . If a single position Δx meets the threshold, then we simply translate it into a wavelength λ, a single value of A which can be transformed into a line flux in physical units (also taking into account the normalized wavelength-dependent grism sensitivity), and an uncertainty on that flux given the distribution of D ( |{ } ) p A S x , posterior . However, given that we are dealing with three-dimensional spectra, a bright emission line in an object that is not a point source produces significant detections of A at positions near the intrinsic l central . Our best estimate of the central line position is the "significant" pixel responsible for the maximum peak in

Incorporating Photometric Redshift Priors
A strength of this approach is that independent information about the likely redshift of the objects can be folded-in straightforwardly and stringently: it simply gets incorporated as a non-constant D ( ) p x prior in Equation (5). Given the amount of ancillary photometry in the 3D-HST/CANDELS fields, spanning from UV to IR wavelengths, specifically 0.3-8 μm, it is straightforward to estimate the redshifts photometrically for the sample (a full description is given in Brammer et al. 2012b). Briefly, we calculate photometric redshifts by applying EAZY, which calculates model fluxes by convolving linear combinations of high-resolution spectral templates with the filter transmission curves, to the broadband spectral energy distributions (SEDs) in order to estimate a probability distribution for the redshift, P(z). In addition to the default EAZY template sets, we include an additional dusty star-forming template (a Bruzual & Charlot 2003, SSP of 1.5 Gyr and A V =2.5) and an EELG template (the highest sSFR galaxy from Maseda et al. 2014, UDS-6195), as shown in Figure 2. We choose to use this P(z) distribution in cases when the minimum reduced χ 2 value is less than 5, which happens in ∼90% of the cases; otherwise we adopt a flat P(z) prior. This is illustrated in the upper two panels of Figure 3.
For a single line detection in a given spectrum, we do not know which restframe emission feature it corresponds to. The strongest (blended) emission line complexes we expect to typically observe are Paβ λ12820, He I λ10830, λλ3727,3729, and Mg II λ2800. This implies that we are only searching for sources with z5.1. 9 We convolve the redshift prior P(z) with the restframe wavelengths of these emission lines to determine a prior probability of a line detection as a function of observed wavelength. At each wavelength within the G141 grism coverage, we determine the probability of a line being present at the pixel position corresponding that wavelength P(Δx) is the combined value of the individual line PDFs at that position. Examples are shown in Figure 3. As noted in Skelton et al. (2014), there are indeed some cases in which the photometric redshift for a given object varies greatly from its spectroscopic redshift. This (small) percentage varies from field to field and is likely a function of magnitude, so we adopt a floor in our PDF such that only 98% of the total probability is allocated according to the photometric prior and distribute the remaining 2% uniformly across all observed wavelengths. with the color coding ranging from black (zero probability) to white (high probability). By marginalizing this probability distribution over all nonzero values of A, we arrive at the top panel, which is the same as the bottom set of panels in Figure 3. The object's G141 spectrum is shown in the middle panel. Figure 2. Photometric templates used in our application of EAZY  normalized at 6000 Å. The colored templates are the default EAZY set, created following the Blanton & Roweis (2007) algorithm with PÉGASE models (Fioc & Rocca-Volmerange 1997) and a calibration set of synthetic photometry derived from semi-analytic models, while the black and gray templates are an EELG (from Maseda et al. 2014) and a 1.5 Gyr Bruzual & Charlot (2003) SSP with A V =2.5 to fully reproduce the SEDs of the bluest and reddest objects in the sample. All galaxy SEDs are fit with linear combinations of these templates.

Redshifts
For every measured emission line, we must determine the redshift of the galaxy. We iteratively assume that the strongest detected line in the grism is Paβ, He I, [S III], Hα, [O III], [O II], and Mg II, and calculate the detection significance at the predicted positions of the other emission lines. If we have significant detection(s) at the predicted position(s), then we have a secure redshift determination. In the case where we do not find a significant additional emission line, we automatically identify the detected line according to the highest probability for a given line species at that wavelength position according to the photometric redshift information.
As a final check, we visually inspect all detected lines to verify that the detection is not caused by severe contamination ) P x derived from P(z) for lines in the correct observed wavelength range, the direct (undispersed) and grism images for the objects, and the output probability at each wavelength position Δx that A is nonzero (Equation (3)). The colored curves denote the expected positions of Hα, [O III], and [O II] given P(z), while the black curve denotes the overall P(λ) for all emission lines that could fall in the grism coverage. Note that in this case, Mg II, [S III], He I, and Paβ do not appreciably contribute any probability for these objects in this observed wavelength range. In the case of UDS-12359, no significant (>3σ) line detections are found; for COSMOS-19077 (one of the objects studied in Maseda et al. 2013Maseda et al. , 2014, a strong line is discovered despite assuming a flat P(λ) due to a high-χ 2 EAZY fit: this object's redshift cannot be reliably determined and is therefore excluded; for GOODS-N-27310, the P(z) correctly predicts the positions of the emission lines; and for AEGIS-24361, the lines are slightly offset from the predicted position (although we detect them regardless). or artifacts at the very edges of the detector: 10 this occurs in less than ∼5% of spectra (see Section 4.3). We also classify objects based on the agreement between the emission line redshift and the photometric redshift: some objects have significant detections of lines and photometric redshift information that either does not provide strong constraints at the position of the line, and thus the line identification is somewhat dubious, or provides no information due to the reduced-χ 2 cut. Further details on these "unknown" objects are presented in Section 5.2.

Completeness of the Sample
While grism spectroscopy allows us in principle to search for emission lines in an unbiased manner, several important issues affect our search completeness.

Line Detection Limits
The primary test of the method's efficacy is to insert fake emission lines into spectra and attempt to recover them. In order to do this, we identify a control sample of 1425 objects representing a variety of galaxy sizes and morphologies where our search method does not return any spectral positions with a significant detection. We insert a fake emission line at 1.4 μm, which is simply the direct image of the object scaled to a given flux value. We then run our search algorithm, focusing on a±7 pixel region around 1.4 μm (corresponding to the average physical extent of a galaxy in this sample based on the Kron radii from the Skelton et al. 2014 photometric catalog) to see how many lines are recovered as a function of the scaled flux value. In addition, previous work from 3D-HST has shown that typical star-forming galaxies have star formation (as traced by Hα emission) out to ∼30% larger radii than the restframe R-band stellar continuum (Nelson et al. 2012). We also repeat this test, making the artificial emission line 30% larger at the same integrated flux value.
The results of this test are shown in Figure 4. At 90% completeness, we find a flux limit of´- at the same completeness (Brammer et al. 2012b). Backgroundlimited grism line searches such as these are primarily sensitive to surface brightness, which is why the black curve in Figure 4, representing a sample with a larger dispersion in object sizes, rises more slowly than the red PSF curve. When we enlarge all emission lines by 30%, we decrease the surface brightness of all galaxies at a given flux, and hence we become less sensitive. Line sensitivity will also vary by wavelength, according to the throughput of the grism. We have performed all of these tests at 1.4 μm, close to the center of the G141 grism. The true sensitivity of our method at a given wavelength, then, is the above-quoted line sensitivity scaled according to the ratio of the 1.4 μm throughput to the throughput at the observed wavelength (see Figure 5). We can convert this flux completeness limit as a function of observed wavelength into a luminosity completeness limit as a function of redshift and line species, as illustrated in Figure 5.

False Positives
While the above test determines the flux limit at which an emission line is likely to be recovered, it does not inform us how often a noise peak or artifact would be detected significantly. In order to investigate this, we isolate contiguous regions of the grism pointing that do not have continuum model fluxes above  2 , as a function of line species and redshift (dotted lines denote the luminosities for lines we do not consider in constructing our "high-EW" sample; see Section 5). The Hα star formation rate comes from the calibrations of Kennicutt (1998). 10 We do not intend to visually verify the existence of the line, but rather to eliminate cases of obvious contamination. Our Bayesian framework eliminates the need for subjective visual searches.
´---e 5 10 s px 5 1 1 for each field and create artificial 2D extractions, each 284×31 pixels in size. As these regions are unlikely to contain real spectral information, any peaks represent noise, unmodeled contamination, or detector artifacts.
We create 176 spectra in this manner, spread across all fields. To mimic our standard line search as closely as possible, we randomly assign one of these "blank" spectra to each of the 159,536 unique objects in the photometric catalog with > m 24 det that lie in the 3D-HST spectroscopic footprint and perform the standard cross-correlation analysis, also using its photometric redshift prior. Throughout, we utilize the MAG_AUTO value as described in Momcheva et al. (2016). This is important when determining the line fluxes, as this is the magnitude of the object in the same wavelength range as the grism spectrum and within the same segmentation map that we use as the kernel. Overall, 1408 (0.88%) yielded at least one >3σ detection (see Figure 6) for all line species and line fluxes.
The number of false positives varies as a function of line flux, as not all cosmetic features and noise peaks are of the same magnitude. As this number is higher than for purely Gaussian noise (which would correspond to 0.27% for 3σ detections), we conclude that the grism exposures contain significant amounts of correlated noise and artifacts that mimic emission features, also due to unmodeled or underpredicted spectral contamination. Additional criteria are applied to create useful samples (e.g., a cut on EW as in Section 5), and thus true contamination levels are likely lower than this.

Contamination
Due to the slitless nature of grism spectroscopy, some sources are strongly contaminated by overlapping spectra from brighter sources. Chance alignment of sources in the direct image could result in both having "detections" of the same emission line in the grism data, especially if both sources are spatially small. There is no automated way to account for such events, so we must resort to visual inspection: for all objects with detected emission lines, we search for all other objects with detected emission lines that lie in a rectangular aperture with an extent corresponding to the G141 dispersion size (284 interlaced pixels). If multiple sources "produce" the same emission line, we assign the line to a single source based on the overlap with the expected trace of the source (a direct overlap as opposed to a glancing one) and the F140W morphology of the sources.
As described in Section 2 and Momcheva et al. (2016), we have a sophisticated flux model for each object in a pointing. The modeled flux for neighboring sources is subtracted when searching for emission lines in an object's spectrum to avoid potential false detections. Each object's own flux distribution is also subtracted: the flat flux distribution represents the continuum level of an object, which needs to be subtracted in order to discover residual emission lines.
The model, however, occasionally does not subtract perfectly, and we are left with residual flux. This typically scales with the flux of the object, such that brighter regions tend to have larger residuals. Regions of positive residual ( ¢ S -Model) appear like spectral features in that they are areas of "real" flux above the background level. These regions are identified as emission features, both in the (bright) object that created the original spectrum and in spectra of objects that happen to overlap with them. We mask out any pixels that have a flux level in the model higher than a threshold value. We seek to strike a balance between masking as few pixels as possible, maximizing our search area, and minimizing the chance of contamination leading to false detections. We select this masking level by utilizing the same framework as in Section 4.2. We determine the number of line detections at or below - 2 at 1.4 μm. However, once we change to a lower threshold (i.e., the masked region corresponds to brighter fluxes and hence covers a smaller area), we begin to see increasing number counts. We therefore select this as our masking threshold, as it maximizes the usable area while still removing as many potentially problematic regions of the grism frame as possible.
The primary issue for contamination from overlapping spectra, then, comes from the limited area in which we search for lines after applying this masking. The total unusable area depends on the specific pointing in question, but is equal to 18% when averaged over the whole survey area with a standard deviation of 4.1%. There are some specific cases in which the model fails to account for a particularly bright spectrum, typically in higherorders for bright stars, and we are left with "uncontaminated" regions of residual flux. These cases are obvious to identify and are a reason why all objects with detections are visually inspected.

Completeness of the Photometric Catalog
The starting point of this search is the photometric catalog of Skelton et al. (2014). Therefore we do not analyze spectra for sources that are not in the input photometric catalog.
In the left panel of Figure 7, we show the completeness fraction of the photometric catalog as a function of line flux and equivalent width, assuming a single emission line is placed in the H F W 160 filter. This is the emission line version of Figure   2 if entirely concentrated in a single line of infinite equivalent width. Note that for a given line flux, we are more likely to have the object in the photometric catalog if it has a lower equivalent width, as that implies the continuum level is higher.
The requirement that an object must be in the photometric catalog is the single strongest prior we apply to our data. If an emission line source is not in that catalog, by definition we will not be able to detect the line. In the range of fluxes where we can still robustly detect lines, our catalog completeness is approximately 60% for high-EW sources. A full catalog of emission lines for 3D-HST galaxies will be presented in M. V. Maseda et al. (2018, in preparation), utilizing a deeper photometric catalog to overcome the issues mentioned here. A comparison of this method to photometric methods for finding high-EW galaxies, namely the iJH-selection of van der Wel et al. (2011), is given in Appendix A.

A Sample of High-equivalent Width [O III] and Hα Emitters
We apply our method to 93,832 unique objects in 3D-HST with full spectral coverage. For objects with multiple spectra due to the overlapping individual pointings, we adopt the redshift corresponding to the highest individual line detection probability. In total, we find 22,786 objects with at least one emission line. In order to estimate line equivalent widths, we use the Skelton et al. (2014) catalog F125W, F140W, or F160W magnitude (depending on the line position) and the measured line flux from 3.1.
In Figure 8 we show the flux histograms for [O III] and Hα emitters from this method and from that of Momcheva et al. (2016). These methods are quite complimentary, given the optimizations for faint lines in objects without strong continuum flux presented in this work.
Here we present a first application of our approach: to measure the number density evolution of extreme EW galaxies with redshift, specifically from 0.7<z<2.3 with an extreme restframe-optical emission line EW ([O III] in excess of 500 Å and/or Hα in excess of 424 Å; see Section 5.2 for specific information about the cuts used). A full analysis of all There are small variations in the grism sensitivity as a function of wavelength, as well as catalog completeness limits that vary by field, but this information is also taken into account in this analysis. Objects with detected lines are put into three classes: multiple significant line detections, single significant line detections with well-known line identifications and hence redshifts from the photometric redshift information, and single significant line detections that have ambiguous identifications either due to broad photometric priors, namely([ ]) P O III a ( ) P H , or because they are detected far away from the expected position from the photometry, like a more extreme case than AEGIS-24361 in the right panels of Figure 3.

Verification of Redshifts
In order to demonstrate the accuracy of our new grism redshifts, we compare them to existing ground-based spectroscopic redshifts. Such samples are typically derived at optical wavelengths for the brighter objects in the sample. While none of our high-EW [O III] and Hα emitters have existing published ground-based spectroscopic redshifts, 22 galaxies in our GOODS-N sample have spectroscopic redshifts (see z spec in Skelton et al. 2014), as well as robust grism redshifts. Of these 22, eight have values that disagree by more than Δz=0.1. We note that all eight have published grism redshifts from Momcheva et al. (2016) that agree with our grism redshifts to better than Δz=0.03 using an independent analysis of the same data with a different method. These values are summarized in Table 1 and Figure 9.
When considering the z spec values, some care must be taken. The redshifts at which this work is most efficient, particularly at 1.1z2.3 when [O III] is visible in the G141 grism, are generally difficult to confirm from the ground at optical wavelengths. Comparison to a small subset of objects that have both grism redshifts and ground-based spectroscopic redshifts shows this difficulty. Further independent verification of the grism-derived redshifts would require additional observations at near-infrared wavelengths. While large samples of such redshifts are currently being obtained, the multiplexing capabilities of slitless grism spectroscopy using WFC3 are difficult to match.

Stacked Spectra
We need to determine the fraction of objects in this third category that we can positively identify as Hα or [O III] emitters, since uncertain or missing photometry can cause problems in the photometric redshift fitting and cannot be properly taken into account as part of our selection function. Namely, the selection function only contains information from the near-IR imaging, whereas the full galaxy SED is used in the photometric redshift fitting. While we cannot reliably provide line identifications for individual objects in this category, we can determine the relative numbers of Hα and [O III] emitters by comparing spectral stacks with that of known objects.  There is overall good agreement between the two grism-based methods; some disagreements exist with the published spectroscopic redshifts, but we note that those redshifts are sometimes inaccurate, as shown in Figure 10.
From high-EW objects with unambiguous redshifts from multiple significant line detections, we would like to determine the typical line ratios. These ratios are useful to define a flux/ EW cut to isolate the same "extreme" objects, as well as to constrain the relative number of the "unknown" objects that are respectively [O III] and Hα.
To do this, we create a stack of all objects in the first class alone (multiple significant line detections) with restframe a EW H or [ ] EW O III in excess of 500 Å, shown in Figure 11. The stacking is done using boxcar extractions of each spectrum in the dispersion direction and weighted according to the grism noise map of the same region. We consider this spectrum as a template EELG, with line ratios that should be representative of the class. These objects alone are used since we are primarily interested in the ratio of [O III] to Hα and to [O II] and hence need a sample containing objects with multiple lines. The relative line ratios are shown in Table 2.
We use these line ratios to construct an equivalent width limit for the sample. We define "high-EW" as having restframe [O III]λ5007 equivalent width in excess of 500Å. According to the line ratios shown in Table 2 and a continuum slope l µ l -F 2 (van der Wel et al. 2011), this implies a restframe Hα equivalent width limit of 424 Å.
We can then create a similar weighted-mean stack, normalized to the peak line flux, for all 505 of the objects in the third category (single line detections that do not have secure identifications and redshifts), shown in Figure 12. Here, we put the detected emission lines at the same wavelength and apply an observed frame equivalent width limit of 1250 Å(equivalent to a restframe EW of 500 Å at z=1.5). We now use this stack to estimate the fraction of [O III] emitters in this stack by correcting the observed ratio of the peak flux (assuming the line is [O III] λ5007) to the flux at the expected position of Hβ according to the intrinsic ratio from the stack of secure objects. We do not expect any emission line contribution at this spectral position if the primary line is Hα or [O II]. The observed ratio of [O III] to Hβ is 13.3±1.92, consistent with the observed value within 2-σ. Likewise, we can perform the same exercise with the λ4959 peak of [O III], compared to λ5007. This result is 0.50±0.034 of the expected 3:1 ratio of λ5007 to λ4959. We note that uncertainties in the object centering due to the low spectral resolution and variations in the emission line morphology can reduce the measured 4959to-5007 ratio, implying that this 50% could still be a lower limit.
If the primary peak corresponded to Hα, the same test is somewhat more difficult, given that we do not expect to see any other strong emission lines from 0.6z1.1. While He I and [S III] are covered to varying degrees in this redshift range, they are typically not strong enough to confirm Hα in the absence of other information. If we assume the lines are Hα, we do not detect any feature at the expected wavelength of [O III], placing a 3-σ upper-limit on the [O III]/Hα ratio of 0.02. Compared to the [O III]/Hα ratio from the stack of the objects with secure redshifts, we can conclude that less than 1% of the objects in

Number Density Evolution
To estimate for the evolution in the number density of such high-EW objects, we first need to construct a parameterized functional form. For simplicity, we assume a power-law distribution in line luminosities and a power-law dependence in (1+z). This functional form, for luminosities in the range + L L dL , and redshifts in the range ). Details of the calculations are given in Appendix B, but in general we can obtain estimates for α and β as well as the intrinsic number density of sources in our survey volume, f (the normalization; i.e., the total number density of sources that could be observed taking into account the incompleteness per unit volume; see Appendix B), per comoving volume element of the survey using a standard Markov Chain Monte Carlo algorithm. We incorporate the knowledge of our selection function (a numerical combination of the completeness functions described in Sections 4.1 and 4.4 and shown in Figure 7, expressed in terms of line luminosity and redshift) as well. Results of this analysis are shown in Figure 13 and in Table 3. After Note. Values are the median of the distributions shown in Figure 13 for In order to estimate the number density of these objects, we must correct the counts for (flux) incompleteness using their published completeness curves. As both samples are small, we expect Poisson errors in the number counts to dominate the errors, and hence we do not attempt to perform the same MCMC analysis with all parameters free. Instead, we fix β to zero and adopt the α value for Hα from the 3D-HST data, fitting only for the normalization. Figure 14 shows consistency between the [O III] emitters from PEARS and the Hα emitters from this work, given that we do not fit the PEARS sample with a redshift evolution. In the case of Hα, the sample is so small that the result is not wellconstrained and we only plot the upper-limit. If we extrapolate our measured Hα number density evolution down to the redshifts of the PEARS Hα sample and take into account the different survey volumes for these respective redshift ranges (3D-HST=56× PEARS), then we would predict five high-EW Hα emitters at = L L 0 , based on the number counts from 3D-HST, modulo differences in the completeness functions for the two surveys. This expectation value is consistent with the measured counts from the PEARS survey.
We therefore deduce that there is a positive evolution in the number of high-EW objects with redshift, out until at least z∼2.4.

Summary
We present here a new method for detecting emission lines in slitless spectroscopic data. We estimate the likelihood that a given position in the 2D spectrum contains an emission line by cross correlating the position with a kernel corresponding to the   ) and this study. The dashed lines represent the±1-σ uncertainties in the number densities from the width of the MCMC probability distributions for β and N. Error bars in redshift represent the actual redshift range in each bin and not uncertainties in the redshift determination. As the Hα density from PEARS is not well-constrained, the upper-limit is plotted. direct image and transform this into the probability that the position corresponds to an emission feature. This method simultaneously includes prior information on line positions from photometric redshift estimates, which also allow us to determine the redshift of the galaxy even when only a single line is detected. The photometric redshifts are determined using a set of galaxy templates that are appropriate for high-EW objects, which are otherwise poorly fit. Robust tests of the method using real and simulated data reveal low levels of contamination and yield a well-defined selection function, which depends on the continuum magnitude of the source as well as the emission line wavelength and flux.
Applying this method to the full 3D-HST data set, we obtain a sample of 22,786 galaxies with definite redshifts, either from multiple line detections or single line detections combined with photometric redshift information. The median line flux of the confirmed objects is´---2.7 10 erg s cm 17 1 2 . This method can be considered complimentary to the one presented in Momcheva et al. (2016), given the relative differences in the complexities of the methods and their abilities to find emission lines in different regimes of continuum strength. These redshifts will be included in a future 3D-HST data release.
Of this sample, we have 782 high-EW [O III] and/or Hα emitters, where the EW limits are 500 and 424 Å, respectively, due to an intrinsic flux difference between the two species in these objects. Many objects in the survey have detected emission lines but either have a very poorly constrained photometric redshift (i.e., a very high reduced-c 2 value) or no photometric redshift at all (i.e., it was not detected in enough photometric bands to be fit). A stacked spectrum for these objects reveal that a majority of them are plausibly We parameterize the number density evolution function, assumed to be a power law in both luminosity and redshift, and probe the posterior probability distributions using a Markov Chain Monte Carlo algorithm. Our sample shows an increase in the number of these objects with redshift by a factor of 6 from z∼0.6-2.4. These results strongly suggest a factor of 30 or more evolution between z∼2 and present, even though direct comparisons with lower-redshift studies are difficult due to their small volumes and different line luminosity limits.
The observed positive trend in number density with redshift is implied by previous EELG studies, such as Kakazu et al. (2007) and van der Wel et al. (2011), who estimate that [O III] EW500 Ågalaxies are two orders of magnitude more common at z∼1.7 than at z=0, which is in rough agreement with the results found here. Similarly, Maseda et al. (2014) argue that their results, combined with precise alignments of the two strong galaxy-galaxy lensing systems discovered in CANDELS/3D-HST (Brammer et al. 2012a;, show that EELGs must be common at z>1. At even higher redshifts, galaxies with these extreme emission line EWs are abundant and perhaps even ubiquitous (at z>6; e.g., Smit et al. 2014).
This method, while specifically applied to HST grism spectroscopy, is more generally applicable to any spectroscopy with spatial as well as spectral information. Its automated nature can be utilized to construct samples of emission line galaxies in large surveys, and it can also be used to determine the significance of an emission line "detection" for individual objects. Planned future grism surveys, such as Euclid and WFIRST, will require an automated line search to analyze the huge volume of data produced. Grisms also feature prominently on the James Webb Space Telescope, with both NIRISS and NIRCam providing multiple slitless grism spectroscopic modes. Given the variety of photometry that will likely be obtained in parallel, a method such as this one that can combine the spectroscopic and photometric information in a statistical way will be a powerful tool in making the best use of future JWST slitless grism spectroscopic surveys.
We would like to thank Greg Rudnick, Fabian Walter, and David Hogg for interesting discussions about statistics. We would also like to thank Pieter van Dokkum and the rest of the 3D-HST collaboration for their work on the survey and input at various stages of this project. M.V.M. was a member of the International Max Planck Research School for Astronomy and Cosmic Physics at the University of Heidelberg, IMPRS-HD, Germany. B.F.L. acknowledges support from the NSF Astronomy and Astrophysics Fellowship grant AST-1202963.

Appendix A Fidelity of Photometric Searches
The photometric selection technique of van der Wel et al.
(2011) utilizes the I F W 814 -, J F W 125 -, and H F W 160 -bands to preferentially select systems dominated by strong emission lines. By looking for a flux excess in J compared to the continuum as measured in I and H, they claim to select [O III] emitters at 1.6<z<1.8, with perhaps minor contamination by Hα emitters at z∼1.
As we now have more complete photometry from 3D-HST and CANDELS since the study of van der Wel et al. (2011), we can apply the same photometric cuts to a larger sample. Previously, van der Wel et al. (2011) found 69 objects in 279 arcmin 2 . Here, using the same cuts, we discover 312 objects in the full 896 arcmin 2 of the 3D-HST survey: 94 in AEGIS, 48 in COSMOS, 67 in GOODS-N (using the F775W filter in place of the F814W filter), 51 in GOODS-S, and 52 in UDS.
We compare this photometric sample with our spectroscopic sample to test the fidelity of the photometric search. For the following analysis, we ignore objects whenever severe contamination would prevent a line identification and objects without full spectral coverage bluewards of 14000 Å: we are left then with a total of 186 objects, of which 147 (79%) have a strong emission line in the J-band. Many of the objects without a detected emission line are intrinsically faint, and thus their emission line could simply be fainter than the noise level in the grism frames. We thus verify the iJH-selection as an efficient way to select emission line galaxies.
Another photometric selection is given in Cardamone et al. (2009) for lower-redshift emission line galaxies, the so-called "green pea" galaxies. While the same selections could yield a sizable sample in our data set, we would not detect the strongest emission lines (Hα or [O III]) in the NIR for them, given their low redshifts.
Cooke (2009) also developed a selection technique to select emission line galaxies from broadband photometric data, specifically searching for z∼3 Lyα emitters at optical wavelengths. Of a sample of 17 galaxies that were selected using photometric cuts, 8 (47%) were confirmed to have Lyα emission, even though the average EW of the emission is a factor of ∼10 lower than the [O III] emission in the z∼1.7 EELGs.

Appendix B Bayesian Luminosity Function
As described in the text, we would like to determine the evolution in the comoving number density of the high-EW emission line sample. This is a simplified case where all redshifts and luminosities are precisely known, as well as the selection function.
We can parameterize the distribution of sources in the range + + L L dL z z dz , ; , as the combination of a power law in luminosity and in redshift according to where L 0 and z 0 are fiducial values (the medians of the input observations). The probability density of L and z, then, is where f is the total number density of sources N/V (Mpc −3 ) that could be observed, and dV/dz is the comoving volume element of the survey. We can consider f as the (inverse) normalization constant since ò ò º ( ) P L z dLdz , 1 , defining f = N 1 0 . For the entire sample of N objects, the probability of L and z is simply the product of the individual probabilities: However, we do not observe the full sample due to incompleteness in, for example, flux. We do understand our selection function S and have a subsample of n objects; the observations are a binomial process of n drawing from an intrinsic distribution containing N objects. Hence we can write  where n is the number of observations in our sample and we assume a flat prior on N log . The posterior on the parameters α and β given the observed data set is simply the product of the n observed probabilities:   We would like to know the posterior probability of the model parameters given our n observed data points. From Bayes's theorem, Since the first term in Equation (14) will cancel with the second term in Equation (11), we obtain a simplified posterior probability of From here, a standard MCMC analysis can determine the distributions for the model parameters.