Predicting Lyα Emission from Galaxies via Empirical Markers of Production and Escape in the KBSS

Lyα emission is widely used to detect and confirm high-redshift galaxies and characterize the evolution of the intergalactic medium (IGM). However, many galaxies do not display Lyα emission in typical spectroscopic observations, and intrinsic Lyα emitters represent a potentially biased set of high-redshift galaxies. In this work, we analyze a set of 703 galaxies at 2 ≲ z ≲ 3 with both Lyα spectroscopy and measurements of other rest-frame ultraviolet and optical properties in order to develop an empirical model for Lyα emission from galaxies and understand how the probability of Lyα emission depends on other observables. We consider several empirical proxies for the efficiency of Lyα photon production, as well as the subsequent escape of these photons through their local interstellar medium. We find that the equivalent width of metal-line absorption and the O3 ratio of rest-frame optical nebular lines are advantageous empirical proxies for Lyα escape and production, respectively. We develop a new quantity, , that combines these two properties into a single predictor of net Lyα emission, which we find describes ∼90% of the observed variance in Lyα equivalent width when accounting for our observational uncertainties. We also construct conditional probability distributions demonstrating that galaxy selection based on measurements of galaxy properties yield samples of galaxies with widely varying probabilities of net Lyα emission. The application of the empirical models and probability distributions described here may be used to infer the selection biases of current galaxy surveys and evaluate the significance of high-redshift Lyα (non)detections in studies of reionization and the IGM.


Introduction
The Lyα line of hydrogen is a powerful tool for detecting and characterizing high-redshift galaxies. Large samples of galaxies have been selected through Lyα emission via narrowband surveys (e.g., Cowie & Hu 1998;Rhoads et al. 2000;Steidel et al. 2000;Trainor et al. 2015;Ouchi et al. 2018) or IFU spectroscopy (e.g., Bacon et al. 2015). Likewise, its strength in emission or absorption makes Lyα extremely efficient for spectroscopically confirming the redshifts of galaxies selected by other means, including broadband imaging surveys.
Lyα also provides valuable information about the properties of galaxies and their surrounding gas, in both emission and absorption (e.g., Rakic et al. 2012;Rudie et al. 2012). Lyα emission clearly signifies the presence of embedded star formation and/or active galactic nucleus (AGN) activity 5 in a galaxy, and resonant scattering of Lyα photons can cause this light to trace the gas distribution on scales comparable to the virial radius of the galaxy halo (e.g., Steidel et al. 2011;Momose et al. 2014;Wisotzki et al. 2016), or even beyond the halo radius for very luminous quasars (e.g., Cantalupo et al. 2014;Martin et al. 2015). In addition, the apparent Lyαemission from galaxies at the highest redshifts is a useful diagnostic of the intergalactic medium (IGM): the apparent drop-off in the fraction of galaxies exhibiting strong Lyα emission at z6-7 (e.g., Pentericci et al. 2011;Schenker et al. 2012) likely points to the increasing neutral fraction at this epoch, and this evolution thus constrains the tail end of cosmic reionization (Robertson et al. 2015).
However, the same physical processes of emission and scattering that make Lyα such a promising tool also introduce significant challenges to its utility. Because strong Lyα emission facilitates efficient galaxy detection and redshift confirmation-but not all galaxies exhibit strong Lyα emission -there are potential selection biases both in Lyα-selected galaxy samples and in samples of broadband-selected galaxies that are vetted through rest-UV spectra. In particular, star formation is a necessary but insufficient condition for detectable Lyα emission; only ∼50% of L * galaxies at z∼3 show Lyα in net emission in slit spectroscopy Steidel et al. 2011), although this fraction appears to increase toward lower galaxy masses and continuum luminosities (e.g., Stark et al. 2013;Oyarzún et al. 2016). As such, Lyα-selected (or Lyα-confirmed) galaxy samples will lack non-star-forming galaxies, as well as a large fraction of starforming galaxies. Resonant scattering, as well as the "fluorescent" generation of Lyα recombination emission, can be used to map out extended H I illuminated by an external or internal engine, but this same scattering often serves to impede the identification of the size, location, and intrinsic properties of the energizing source.
Finally, our incomplete knowledge of the galaxy-scale determinants of strong Lyα emission impedes our ability to use it as an IGM tracer. Given that many star-forming galaxies do not show strong Lyαemission at z∼3, where the neutral fraction of the IGM is minimal, it is not always clear which galaxies at z6-7 are intrinsic emitters of Lyα photons and whether their lack of apparent emission indicates suppression by the IGM. Recent studies of Lyα as a tracer of IGM opacity have been careful to compare galaxies at similar redshift epochs, where intrinsic galaxy evolution is likely to be minimal (e.g., from z ∼ 7 to z ∼ 6; Mason et al. 2018;Pentericci et al. 2018;Hoag et al. 2019). However, both the overall trend in increasing intrinsic Lyα emission as a function of redshift (e.g., Stark et al. 2010) and individual measurements of strong Lyα emission even at z>7 (Roberts-Borsani et al. 2016;Stark et al. 2017) indicate that the variation of intrinsic Lyα emission among galaxies is important to understand for a full accounting of the reionization process. Similarly, the uncertain connections between Lyα emission and physical properties of galaxies prevent the clear identification of the selection biases intrinsic to Lyαselection and redshift confirmation, even as we expect some such biases to be present.
Advancing the utility of Lyα emission as a tool for detecting and characterizing galaxies therefore requires a model for understanding-and potentially predicting-when this emission is expected based on a galaxy's other properties. Any such model of galaxy-scale Lyα emission must include two disparate sets of processes: (1) the production of Lyα photons in H II regions, and (2) the subsequent transmission (or absorption) of these photons through the surrounding H I gas.
The latter set of processes-those pertaining to Lyα scattering, transmission, and escape from galaxies-have been subject to detailed study for two decades, eased in part by the availability of interstellar medium (ISM) diagnostics near Lyα in the rest-UV spectra of galaxies. In particular, much work has demonstrated that Lyαemission (parameterized by the Lyα equivalent width, EW Lyα ) correlates strongly with the optical depth or covering fraction of Lyman-series lines or low-ionization metal lines (hereafter LIS lines, parameterized by EW LIS ), in the sense that galaxies with stronger Lyα emission exhibit weaker Lymanseries or LIS absorption (Kunth et al. 1998;Shapley et al. 2003;Steidel et al. 2010;Jones et al. 2012;Trainor et al. 2015;Du et al. 2018). Weak absorption lines are likely an indicator that the covering fraction or optical depth of H I gas is relatively small, and in some cases the covering fraction of LIS and/or Lymanseries lines have been shown to be much less than unity, particularly for Lyα-emitting or Lyman-continuum-emitting galaxies (Trainor et al. 2015;Steidel et al. 2018). 6 In a related phenomenon, strong Lyα emission lines are found to be narrow in velocity space and close to the systemic redshift of the emitting galaxy (e.g., Erb et al. 2014;Trainor et al. 2015), as well as spatially compact (e.g., Steidel et al. 2011;Momose et al. 2016; but see Wisotzki et al. 2016). These relationships suggest that net Lyαemission is maximized when these photons can escape their parent galaxy with minimal scattering both in velocity and in physical space.
Common among each of the above observables-LIS and Lyman-series absorption, spatial and spectral scattering-is that they are associated with modulation of the Lyα photons that occurs after these photons have left their original starforming regions. Conversely, more recent studies have identified trends linking EW Lyα to signatures of the starforming regions themselves. Much of this recent work has been enabled by the development of efficient near-infrared spectrometers such as MOSFIRE (McLean et al. 2012) and the Hubble Space Telescope (HST)/WFC3-IR grisms (e.g., Atek et al. 2010;Brammer et al. 2012) that can detect the faint rest-frame optical emission lines used to characterize the gas around young stars in high-z galaxies. McLinden et al. (2011), Finkelstein et al. (2011), and Nakajima et al. (2013) found strong [O III] lines in a total of eight Lyαselected galaxies at z∼2−3, while Song et al. (2014) localized 10 Lyα-selected galaxies in the N2-BPT 7 plane, suggesting that highredshift galaxies with strong Lyα emission have low metallicities and high nebular excitation. Trainor et al. (2016) found similarly extreme N2-BPT line ratios (i.e., high O3 and low N2) for a stacked sample of 60 L∼0.1L * galaxies with strong Lyαemission at z∼2.5, while Erb et al. (2016) and Hagen et al. (2016) demonstrated that galaxies selected for strong [O III] emission and/ or weak N2 ratios are strong Lyα emitters and share many physical properties with Lyα-selected galaxies.
In addition to presenting results for faint Lyα-emitting galaxies, Trainor et al. (2016) also show that galaxies ranging from strong Lyα emitters to Lyα absorbers can be described as a sequence in the N2-BPT plane, a phenomenon that appears to be primarily linked to the variation in the excitation state of gas in star-forming H II regions, rather than to extreme variation in gasphase metallicity. In that paper, we also argue that Lyα emission and high nebular excitation are linked by their association with strong sources of ionizing emission within galaxies, including massive stars with low Fe abundances as discussed at length by Steidel et al. (2016) and Strom et al. (2017). Taken together, the results of these rest-frame optical studies are consistent with the expectation that Lyα production is accompanied by numerous other forms of recombination emission and collisionally excited emission that originate in the same star-forming regions as Lyα, although the subsequent transmission of these non-Lyα photons is much less sensitive to the surrounding H I distribution.
It is therefore clear that the net Lyα emission on galaxy scales depends on both the properties of star-forming regions (the sites of Lyα production) and the distribution of the surrounding H I gas that modulates Lyα escape. Here we propose a holistic, empirical framework for accounting for both of these processes. Using the largest sample of galaxies with simultaneous spectroscopic measurements of Lyα, the rest-UV continuum, and a series of rest-frame optical transitions, we identify empirical discriminants of Lyα production and escape, and we demonstrate that the combination of these observable markers can predict the net Lyα emission of galaxies more reliably than individual galaxy properties.
The paper is organized as follows: Section 2 presents details of our galaxy observations and the assembly of our sample, Section 3 describes the methods used to quantify our empirical markers of the efficiency of Lyα production and escape in a given galaxy and their correlations with EW Lyα , Section 4 presents our combined model for predicting Lyα based on multiple markers, Section 5 presents the conditional probability distribution of detecting net Lyα emission as a function of other galaxy properties, Section 6 provides discussion comparing our results to previous work, and Section 7 summarizes our conclusions.

Galaxy Sample
The galaxies presented here are selected from the Keck Baryonic Structure Survey (KBSS; Rudie et al. 2012) and KBSS-MOSFIRE Strom et al. 2017), which together comprise a set of rest-UV and and rest-optical spectra of more than 1000 galaxies across 15 fields at 1.5z3.5. The galaxies are selected using optical colors in the U n , G, and  bands to identify Lyman break galaxy analogs over the target range of redshifts; a more detailed description of the photometric selection is given by Steidel et al. (2004). The full distributions of KBSS-MOSFIRE galaxies' redshifts, masses, and star formation rates (SFRs) are given in Strom et al. (2017), but they occupy the ranges 10 9 M * /M e 10 11 and 3SFR/(M e yr −1 )300, as inferred from reddened stellar population synthesis models 8 fit to Keck LRIS and MOSFIRE broadband photometry. The galaxies have typical dark matter halo masses M h ≈8× 10 11 M e estimated through galaxy-galaxy clustering measurements of the spectroscopic KBSS sample . The median apparent magnitude of the KBSS-MOSFIRE sample is á ñ =  24.4, which corresponds to an Reddy et al. 2008) at the median redshift of the sample, á ñ = z 2.3. For this work, we select the subset of the full KBSS-MOSFIRE sample that have rest-UV spectroscopic coverage of the Lyα line and surrounding region (1208 Å<λ rest <1227 Å) from Keck/LRIS (Oke et al. 1995;Steidel et al. 2004), as well as a galaxy redshift measured from rest-optical MOSFIRE (McLean et al. 2012) spectroscopy (typically from the [O III] λλ4959, 5008 doublet and/or Hα). Details of the rest-UV and rest-optical spectroscopic data are given in the sections below. The total subset of the KBSS-MOSFIRE sample used in this paper comprises 703 galaxies, although most of the individual empirical parameters described in Section 3 are not measured robustly for every galaxy. The requirements for making each measurement and the number of galaxies for which each is made are given explicitly in Section 3, with numbers also given in Table 1.

LRIS Observations
The rest-UV galaxy spectra were obtained with Keck I/ LRIS-B (Steidel et al. 2004) in multislit mode over a series of observing runs between 2002 and 2016 August. Approximately 2/3 of spectra were taken using the 4000/3400 grism, which produces a resolution R∼800 in the typical seeing conditions of 0 6-0 8. 9 The remaining spectra were taken using the 600/ 4000 grism, which produces a resolution R∼1300 in the same seeing conditions. The blue edge of the LRIS-B observations is typically determined by the atmospheric limit at λ∼3200 Å, while the red edge is determined by a dichroic that splits the light between LRIS-B and LRIS-R; dichroics with transition wavelengths λ dich =5000, 5600, or 6800 Å were used to collect the spectra in this sample. These constraints, combined with the redshift distribution of our sample, allow us to sample the rest-frame spectrum of most objects over the range Å Å l   1000 1700 rest . Each object was typically observed for 1.5 hr of 1800 s integrations, but a subsample of ∼150 galaxies included in the KBSS-LM1 project  were each observed for ∼10-14 hr.
LRIS-B spectra were reduced using a custom suite of IDL and IRAF routines. Raw 2D spectrograms were rectified using a slit-edge tracing algorithm, and the resulting rectilinear spectrograms were flat-fielded, background-subtracted, and subjected to cosmic-ray rejection. Individual 2D spectrograms were stacked after accounting for shifts in the spatial and spectral directions due to instrument flexure between exposures. 10 A 1D spectrogram was extracted for each object, and wavelength calibration was performed using arc-line spectra observed through the slit mask during daytime telescope operations, which were then individually shifted to match the wavelength of the 5577 Å sky line in each science spectrum in order to account for instrument flexure. Finally, spectra are corrected to vacuum, heliocentric wavelengths and rebinned to a common wavelength scale. Further details regarding the software routines used in the KBSS data reduction process are given by Steidel et al. (2003Steidel et al. ( , 2010. Because many objects were observed multiple times over the period since observations of the KBSS galaxy sample began, all extant spectra for a given galaxy were averaged-weighted according to their exposure time and a visual inspection of each spectrum-to produce a final spectrum for each galaxy in our sample. Stacked LRIS spectra are displayed in Figures 1-2.

MOSFIRE Observations
The MOSFIRE observations for the KBSS-MOSFIRE survey are described in detail elsewhere Strom et al. 2017). Briefly, galaxies are observed in the J, H, and/or K bands using 0 7 slits, and the resulting 2D spectrograms are reduced using the MOSFIRE data-reduction pipeline 11 provided by the instrument team. Wavelength calibration is performed by identifying OH sky lines in all spectral regions except in the red end of the K band (λ obs 2 μm), where arc lamp spectra are used owing to the paucity of sky lines. The absolute flux scaling, slit-loss corrections, and cross-band calibration are performed through Figure 1. Stacked spectra of 703 galaxies sorted in quintiles of EW Lyα ; the median EW Lyα for each sample is given by the color bar on the right. The dashed vertical line indicates Lyα, while the solid vertical lines indicate the LIS transitions described in Table 2 and Section 3.2.2. The LIS line strengths vary similarly across all six distinct absorption features, such that strong LIS absorption is associated with weak Lyα emission and vice versa.  Figure 1 according to the process described in Section 3.2.2 including normalization to the local continuum. Note that the stacked profiles show the same trends described in the text for individual objects: increased EW Lyα is strongly associated with decreased (i.e., less redshifted) v Lyα and increased (i.e., weaker and less negative) EW LIS .
observations of a slit star on each mask, as discussed in detail by Strom et al. (2017). As described in that paper, the spatial extent of the KBSS galaxies (which are marginally resolved in typical atmospheric conditions) causes the slit losses for KBSS galaxies to exceed that measured directly from the slit star.
Line measurements are performed using the IDL program MOSPEC (Strom et al. 2017). Typically, galaxy redshifts and line fluxes are fit simultaneously in a single band (J, H, or K ), with all nebular lines in the band constrained to have the same redshift and velocity width. For the [O III] λλ4960, 5008 doublet, the known line flux ratio f 5008 /f 4960 =3 is also enforced. Line widths and redshifts are not forced to match between bands (e.g., for [O III] in the H band and Hα in the K band, as would be observed at z ≈ 2-2.6), but galaxies with redshift measurements in multiple bands are checked for consistency. Each galaxy is then assigned a nebular redshift z neb based on the rest-frame optical redshift with the smallest uncertainty. For galaxies with redshift measurements in multiple bands, the typical agreement is less than Δz=0.0002.

Spectral Energy Distribution (SED) Models and SFRs
Photometry of the KBSS fields and SED modeling of the KBSS galaxy sample are described by Steidel et al. (2014) and Strom et al. (2017) using SED fitting methodology described by Reddy et al. (2012). Models are from the Bruzual & Charlot (2003) library and assume solar metallicity, a Chabrier (2003) initial mass function (IMF), a Calzetti et al. (2000) attenuation relation, and a constant star formation history with a minimum age of 50 Myr. Stellar masses (M * ), SFRs (SFR SED ), and continuum-based reddening (E(B − V ) SED ) estimates are obtained from the SED fitting.
In addition to SED-based SFRs, Hα SFRs are calculated for the majority of the KBSS-MOSFIRE sample as described by Strom et al. (2017) using the MOSFIRE measurements described above. These SFRs and sSFRs (sSFR= SFR Hα /M * ) are calculated assuming a Kennicutt (1998) Hα-SFR relation adjusted for a Chabrier (2003) IMF. Dust corrections for Hα SFRs are calculated as described in Section 3.2.3.

Empirical Galaxy Measurements
As described in Section 1, the primary goals of this paper are (1) to characterize the empirical relationships between Lyα emission and various other galaxy properties and (2) to interpret these relationships in terms of the production and escape of Lyαphotons. Below, we define the various observables used throughout this paper, and we categorize them in terms of whether they are likely to primarily relate to the production or escape of Lyα photons.

EW Lyα
First, we must define a metric of the efficiency of Lyα emission: the Lyα equivalent width (EW Lyα ). We focus on EW Lyα rather than the total Lyα luminosity for multiple reasons. First, we find that the Lyα luminosity is primarily correlated with other descriptors of the total luminosity of a galaxy, so measuring the Lyα luminosity per unit UV continuum luminosity is a more interesting descriptor of the (in)ability of Lyα photons to escape relative to non-Lyα photons at similar wavelengths. Second, characterization of EW Lyα is possible in slit spectra even without precise flux calibration, so our predictions for EW Lyα can be more easily applied to actual observations of galaxies with uncertain slit losses. 12 The value of EW Lyα is measured for each object in our sample directly from the 1D object spectrum in a manner similar to that described in Trainor et al. (2015Trainor et al. ( , 2016. Each rest-UV spectrum is first shifted to the rest frame based on its nebular redshift z neb . In order to account for the wide variety of Lyα line profiles in a systematic manner, the Lyα line flux, F Lyα , was directly integrated by summing the continuumsubtracted line flux over the range 1208 Å<λ rest <1227 Å, roughly the maximum range of wavelengths found to encompass the Lyα line in our spectra. 13 In velocity space, this corresponds to a range −1900 km s −1 <v<2800 km s −1 with respect to Lyα. The local continuum flux f λ,cont used for subtraction is estimated as the median flux in the range The Lyα line flux f Lyα is therefore positive for net Lyα emission lines and negative for net Lyα absorption. The line flux and equivalent width are thus defined by the following expressions: Again, this procedure causes galaxies with net Lyα emission in their 1D slit spectra to have EW Lyα >0 and galaxies exhibiting net Lyα absorption to be assigned EW Lyα <0. Note that, in each case, the assigned value of EW Lyα is likely to be an underestimate of the intrinsic value owing to the spatial scattering of Lyα photons, which preferentially lowers the observed ratio of Lyα photons to continuum photons in a centralized aperture. Typical relative slit losses of Lyα with respect to the nearby continuum in similar samples are found to be 2-3× (see, e.g., Steidel et al. 2011;Trainor et al. 2015). However, we have no way to determine the relative Lyα slit loss for the majority of individual objects in our sample, so we do not attempt to do so here. Furthermore, in this work we are primarily concerned with the factors that determine the net Lyα emission on galaxy scales; variation in the total Lyα emission of the galaxy-plus-halo system with physical and environmental properties of the galaxies will be discussed in future work.
Uncertainty in EW Lyα is determined based on the uncertainty in the Lyα flux (usually a small factor), as well as the uncertainty in the local continuum (usually the dominant factor, particularly for high-EW Lyα sources). In some cases, correlated noise in the continuum spectrum causes the formal uncertainty in the local continuum to be unrealistically low based on a visual inspection of the spectrum. To account for this fact, a separate estimate of the continuum flux l f ,cont is measured for each galaxy over the range 1220 Å<λ rest <1300 Å (i.e., over a ∼3× larger range of wavelengths than the first estimate), and this second continuum estimate is used to recalculate F Lyα and EW Lyα . If the new EW Lyα value differs from the original value by more than the uncertainty on EW Lyα calculated originally, then the uncertainty on EW Lyα is replaced by the absolute 12 However, note that the spatial scattering of Lyα photons (as described in Section 1) causes EW Lyα to be sensitive to the differential slit losses in Lyα vs. the continuum (although EW Lyα is still less sensitive to slit losses than the total Lyα luminosity). 13 This velocity range is also chosen to minimize contaminating absorption due to the Si III λ1206 transition. value of the difference between the two estimates of EW Lyα . This procedure increases the uncertainty on EW Lyα for 59 objects (8% of the total sample), and the total variance in EW Lyα associated with our estimated measurement error among our entire sample increases by 5%.
Furthermore, if the measured continuum value is smaller than 2× the estimated uncertainty on the continuum, then EW Lyα is defined to be a lower limit: where s l,cont is the formal uncertainty in the local Lyα continuum. This correction applies to 13 objects (2% of our total sample), and EW Lyα for these objects is assumed to take the value of their 2σ error in the analysis that follows. In this manner, a value and uncertainty for EW Lyα are estimated for each of the 703 galaxies in our sample. Spearman rank correlation statistics between EW Lyα and a series of other empirical quantities measured among the galaxies in our sample are given in Table 1 below. Definitions and measurement methodologies for each of these quantities are given in the sections that follow.

v Lyα
As discussed in Section 1 above, increasing shifts of the Lyαemission line with respect to the systemic redshift are associated with decreasing EW Lyα , and this trend likely corresponds to the fact that Lyα photons must scatter significantly in redshift and/or physical space to escape regions of high optical depth. For this reason, the difference of the Lyα redshift (z Lyα ) from systemic (z neb ) may be regarded as a proxy for the optical depth experienced by Lyαphotons transiting the galaxy ISM and CGM and is thus related to the probability of Lyα photon escape.
We define the Lyα redshift and offset velocity based on the centroid of the Lyα line flux: where c is the speed of light and the integrals in Equation (4) are evaluated over the range 1208 Å<λ rest <1227 Å, as is done to estimate the total Lyα flux (Equation (1)). Note, however, that in this case we integrate the raw flux over the Lyαregion rather than integrating the continuum-subtracted flux. This choice does not appreciably change the assigned value of v Lyα when EW Lyα is large, but it significantly reduces the noise on v Lyα when EW Lyα →0, in which case the denominator would approach zero for an analogous equation to Equation (4) weighted by continuum-subtracted flux.
Note also that the flux in the Lyα transition need not exceed l f ,cont in order to measure an Lyα velocity; if a galaxy exhibits Lyα absorption that is preferentially blueshifted, the assigned Lyα velocity will be positive and thus similar to a galaxy with redshifted Lyα emission. Velocity uncertainties are determined by a Monte Carlo analysis in which a randomly generated error array consistent with the per-pixel uncertainty is added to the Lyα region of the spectrum and the velocity is measured as above. This process is repeated 1000 times per spectrum, and the estimated velocity uncertainty is 1.5× the median absolute deviation 14 of the Monte Carlo velocity values. While velocity measurements can be made in this way for every spectrum in our sample, we include in our analysis of v Lyα only those spectra with an Lyα velocity uncertainty σ Lyα <750 km s −1 . 15 Because the Lyα velocity also depends on the accuracy of z neb , we also require that Hα and/or [O III] λ5008 are detected with at least 5σ significance. When these cuts are made, 496 galaxies in our sample have reliably measured values of v Lyα . The Spearman rank correlation statistic between v Lyα and EW Lyα is = r Sp -0.56 (p = 1.6 × 10 −42 ), indicating a strong, highly significant correlation. This relationship is consistent with previous work (e.g., Erb et al. 2014), as well as our results from stacked spectra shown in Figure 2 (left panel).
As described in Section 1, our eventual goal is to predict the value of EW Lyα for a galaxy in the absence of its direct measurement (since the Lyα flux is not always directly observable). Unfortunately, v Lyα may be ineffective as such a predictor for two reasons. First, v Lyα cannot be measured in cases where Lyα is not directly measurable (e.g., when the transition is censored by the IGM or contaminated by other emission). Second, even in cases where v Lyα is measurable (e.g., among some galaxies at high redshift), any intergalactic absorption that suppresses EW Lyα is also likely to change the observed value of v Lyα . Because scattering of Lyα by both the ISM and the surrounding IGM will produce degenerate shifts on v Lyα , v Lyα itself cannot be expected to separate between these two effects. With this in mind, we caution against the use of v Lyα to predict the intrinsic value of EW Lyα .

EW LIS
The second proxy we use for the ease of Lyα photon escape is the strength of absorption lines corresponding to lowionization interstellar gas. Unfortunately, even the strongest interstellar absorption features are difficult to measure reliably in individual spectra. In order to increase the significance of these detections, we construct a "mean" LIS absorption profile for each galaxy spectrum as follows.
Seven LIS transitions covered by the majority of our rest-UV spectra are identified in Table 2. O I λ1302 and Si II λ1304 are blended at the typical spectral resolution of our observations, so six distinct absorption features can be individually measured. For each transition, the spectral region within ±5000 km s −1 of the rest-frame wavelength is interpolated onto a grid in velocity space and normalized to its local continuum (defined as the median flux in the region >1000 km s −1 and <5000 km s −1 14 Note that σ≈1.5×MAD is a simple estimator of scale that is insensitive to outliers and recovers the usual standard deviation when applied to a Gaussian distribution (see, e.g., Rousseeuw & Croux 1993). 15 Objects with larger velocity uncertainties typically have relatively low signal-to-noise ratios in the UV continua, as well as minimal Lyα absorption and emission, making the "velocity" of the Lyα line an ill-defined quantity. from the transition wavelength in either direction). The six LIS absorption profiles are then averaged with equal weighting, and an effective rest-frame equivalent width in absorption is measured for the stacked profile via direct integration according to the following expression, which we define as EW LIS : where λ 1 and λ 2 correspond to ±1000 km s −1 from the restframe line center. The uncertainty on EW LIS is defined to be the standard deviation of absorption equivalent widths calculated as above for random 2000 km s −1 intervals in nearby regions of the rest-UV spectrum. Because the reliability of our EW LIS measurement is extremely sensitive to the strength of the far-UV continuum, we only consider measurements of EW LIS for which the local continuum is detected with signal-to-noise ratio (S/N) > 20 in the stacked profile; this sample includes 625 objects.
For 162 objects in this sample, one or more LIS transitions fall above the red edge of the LRIS-B spectrum. For 19 objects, one or more LIS transitions are flagged as discrepant: they either correspond to an equivalent width more than 15σ away from the median equivalent width of the other transitions, or they otherwise lie in a region of the spectrum that appears significantly noisier than average based on a visual inspection. In any of these cases, the missing or flagged transitions are omitted, and the mean EW LIS value and uncertainty are calculated from the remaining transitions. In total, the number of objects for which 6 (5, 4, 3, 2, 1) transitions contribute to EW LIS is 456 (75, 56, 42, 8, 0).
As described above, the vast majority of cases where one or more transitions are omitted occur because of a lack of red spectral coverage, such that the redder transitions in Table 2 are preferentially omitted. Given that the two strongest transitions are also the two bluest (and thus least likely to be omitted), there is potential for our spectral coverage to introduce a trend between EW LIS and the number of included transitions. Separating galaxies by the number of included LIS transitions (N LIS ), the median value of EW LIS for each subset is (1.43, 1.49, 1.27, 1.40, 1.55 Å) for N LIS =(6, 5, 4, 3, 2). The lack of a systematic trend between EW LIS and N LIS suggests that our EW LIS values are not particularly sensitive to the precise subset of LIS transitions included.
The distribution of EW LIS versus v Lyα is shown in Figure 3 for the 479 objects for which both quantities are measured robustly according to the criteria described above. The Spearman correlation coefficient for these two parameters is r Sp =0.28 (p = 3.2 × 10 −10 ; see Table 1), indicating a moderate (although highly statistically significant) correlation between these two proxies for ISM optical depth (or porosity) and the likely ease of Lyα photon escape. The color-coding by EW Lyα in Figure 3 demonstrates that strong Lyα emission is associated with weak LIS absorption and a small shift of the Lyα line with respect to systemic, in agreement with the expectations outlined above. The correlation between EW LIS and EW Lyα is moderately strong and highly significant ( =r 0.35 Sp , p=3×10 −21 ). We note that EW LIS has multiple practical advantages over v Lyα as a predictor of EW Lyα . Unlike v Lyα , EW LIS is likely to be unaffected by IGM absorption, since the metallicity of intergalactic gas will be negligible compared to that of the enriched galactic outflows traced by metal-line absorption. In addition, in the event that the Lyα transition is censored by the IGM, local H I, or contaminating emission, the longerwavelength LIS transitions may still be measurable in many realistic cases at both low and high redshifts. For these reasons, our analysis that follows utilizes EW LIS as our primary proxy for Lyα escape.
Because the escape of Lyα photons depends on the distribution of dust in galaxies as well as H I, we also consider the relationship between EW Lyα and E(B − V ) (see also discussion in Trainor et al. 2016 andTheios et al. 2019).
E(B − V ) SED is measured via SED fitting as described in Section 2.4 for 637 galaxies. Comparing these values to EW Lyα yields a moderate, highly significant correlation ( =r 0.23 We also measure E(B − V ) neb based on the Balmer decrement (Hα/Hβ) as described by Strom et al. (2017). Briefly, the slit-loss-corrected Hα and Hβ fluxes are compared to the canonical ratio Hα/Hβ=2.86 for case B recombination  at T=10 4 K (Osterbrock 1989). Galaxies with Hα/Hβ< 2.86 are assigned E(B − V ) neb =0, while galaxies with Hα/ Hβ > 2.86 are assigned a value of E(B − V ) neb based on a Cardelli et al. (1989) Galactic attenuation relation. The median value of E(B − V ) neb for KBSS galaxies is 0.26, and the interquartile range is 0.06-0.47 (Strom et al. 2017). As discussed by Trainor et al. (2016) and Strom et al. (2017), the Hβand Hα emission lines are measured in separate exposures in KBSS-MOSFIRE observations; at typical redshifts 2.0<z<2.6, the lines fall in the H and K near-IR atmospheric bands, respectively. For this reason, we present values of E(B − V ) neb only for those galaxies with >5σ measurements of Hα/Hβ, including the uncertainties in the individual line fluxes as well as the cross-band calibration. This cut limits our sample of secure E(B − V ) neb measurements to 208 galaxies, which display a weak correlation with EW Lyα ( =r 0.14 Sp , p=0.05).

f esc
The most direct measure of the efficiency of Lyα photon escape is the actual escape fraction of Lyα, hereafter f esc . 16 Any determination of this escape fraction relies on an estimation of the true number of Lyα photons produced in galaxies, which can then be compared to the observed Lyα flux. In practice, the observed Lyα flux is compared to the observed Hα flux, with the latter value scaled by the expected intrinsic flux ratio Ly H int for case B recombination. 17 Hα provides an effective proxy for the intrinsic Lyα luminosity because the former is not significantly scattered by H I; however, it nonetheless suffers extinction by interstellar dust. The intrinsic Hα flux (and thus the intrinsic Lyα flux) can therefore only be determined using the absolute attenuation A Hα , which is typically estimated from the inferred nebular reddening (i.e., E(B − V ) as defined in Section 3.2.3) and the application of an attenuation relation that is appropriate to the galaxy at hand. As discussed by Theios et al. (2019), no single attenuation relation is able to self-consistently describe the host of photometric and spectroscopic properties inferred for KBSS galaxies. Given these ambiguities in the attenuation correction, we present both the relative (i.e., dust-uncorrected) and absolute (i.e., dust-corrected) Lyαescape fractions based on the following definitions:  (1), which is not dust-corrected.
We calculate f esc,rel via Equation (8) for 368 galaxies for which we have a 5σ detection of Hα (we take = f 0 esc,rel where a  F 0 Ly ), as well as a measurement of EW LIS . As described in Section 3.2.3, only 208 galaxies have a robust measurement of E(B − V ) neb ; when combined with the S/N cuts on Hα and EW LIS , this leaves 188 galaxies for which f esc,abs may be calculated with confidence via Equation (9) (although subject to the remaining uncertainty in the attenuation law, as well as differential slit losses in Lyα versus Hα). Figure 4 displays the relationship between EW LIS and f esc in both its absolute and relative forms. In either case, the two quantities have a relatively strong correlation given the measurement uncertainties (r Sp = −0.50, p = 6 × 10 −13 for 188 galaxies for f ; esc,abs Table 1). Again, our analysis that follows is restricted to using EW LIS as a proxy for Lyα escape because f esc , like v Lyα , is not typically measurable in cases where we would like to predict EW Lyα .

SFR, Mass, and Luminosity
We now consider parameters that may be associated with Lyαproduction. As noted above, Hα luminosity should be a fairly direct proxy for the intrinsic luminosity of a galaxy in Lyα. However, it is less related to the efficiency of Lyα production as described by EW Lyα : dust-corrected a L H is uncorrelated with EW Lyα in our sample (r Sp = −0.05, p = 0.4) for the 208 galaxies with robust estimations of E(B − V ) and A Hα (see Section 3.2.3); the same correlation holds between EW Lyα and SFR Hα since SFR Hα is linearly related to L Hα . 18 Our photometry-based estimates of SFR SED display slightly stronger relationships with EW Lyα (r Sp = −0.17, p = 10 −5 ), and the correlation for stellar mass is very similar ( =r 0.15 Sp , p=10 −4 ) for the 637 galaxies with SED fits and EW Lyα measurements. sSFR (SFR Hα /M * ) displays a slightly stronger correlation with EW Lyα ( = r 0.23 Sp , p=10 −3 ), with lower significance due to the smaller sample size of objects with the necessary measurements of both SFR Hα and M * (199 galaxies).
Rest-UV absolute magnitudes M UV are measured from the G-and -band magnitudes. The apparent magnitude corresponding to a rest-frame wavelength λ rest =1450 Å is estimated by taking a weighted average of the G and  based on the redshift of the galaxy. 19 This apparent magnitude m UV is then converted to the absolute magnitude M UV based on the redshift of the source and the luminosity distance calculated assuming a ΛCDM cosmological model with H 0 =70 km s −1 Mpc −1 , Ω m =0.3, and Ω Λ =0.7. In this manner, M UV is measured for each of the 637 galaxies in our SED-fit sample. This parameter shows the weakest relationship with EW Lyα of any quantity we measure, with = -´r 1.5 10 Sp 2 and p=0.71. This lack of association between UV luminosity and EW Lyα is 16 f esc here should not be confused with the escape fraction of Lyman continuum (i.e., ionizing) photons. The f esc defined here is described elsewhere in the literature as a f esc,Ly , but we will omit the Lyα subscript for simplicity in this paper. 17 Note that various values are assumed for ( ) a a

F F
Ly H int in the literature, but the uncertainty on the aperture correction for Lyα in our data dwarfs the uncertainty on the intrinsic flux ratio, and our measured trends between f esc and other parameters are insensitive to the chosen value regardless. The value 8.7 is motivated by the calculations of Dopita & Sutherland (2003) and is consistent with previous studies (Atek et al. 2009;Hayes et al. 2010;Henry et al. 2015;Trainor et al. 2015). 18 This calculation includes 208 galaxies with robust dust corrections as described in Section 3.2.3; the correlation with dust-uncorrected L Hα is similarly weak despite the much larger sample. 19 Note that the Lyα line falls within the G band for z2.45, which includes roughly one-quarter of our galaxy sample. For the galaxies in this redshift interval, we correct the inferred value of M UV based on the spectroscopic measurement of EW Lyα . This correction produces a median change remarkable given the high EW Lyα values associated with faint, high-z galaxies in other recent work; these trends are discussed further in Section 6.

O32
By definition, the intrinsic EW Lyα of a galaxy is the ratio of Lyαphotons to UV continuum photons, where the latter are generated directly by OB stars and the former are generated by the gas excited and ionized by these same stars. It is therefore sensible that EW Lyα would be strongly associated with the excitation and ionization states of the gas in star-forming regions.
The O32 line ratio is one commonly used indicator of nebular ionization (Sanders et al. 2016;Steidel et al. 2016;Strom et al. 2017Strom et al. , 2018 For the ionization-bounded H II regions typically assumed in photoionization models of star-forming galaxies, O32 is approximately proportional to log(U), where U denotes the "ionization parameter," the local number of hydrogen-ionizing photons per hydrogen atom (see discussion by Steidel et al. 2016). Furthermore, O32 has previously been found to correlate strongly with Lyα emission (e.g., Nakajima et al. 2016;Trainor et al. 2016).
Notably, however, recent work has suggested that elevated O32 values may correspond in some cases to density-bounded H II regions, in which the ionized region is not entirely surrounded by neutral gas 20 (Nakajima et al. 2013;Izotov et al. 2016;Trainor et al. 2016). In particular, several recent detections of escaping Ly-continuum (rest-frame H ionizing) photons from galaxies at low and high redshift have been accompanied by elevated O32 ratios (e.g., de Barros et al. 2016;Izotov et al. 2016Izotov et al. , 2018Fletcher et al. 2019; but see Borthakur et al. 2014;Shapley et al. 2016, who find Lycontinuum escape in the absence of extreme O32). In these scenarios, an association of large O32 with high EW Lyα may reflect a combination of both increased ionizing photon production and increased probability of photon escape due to the lack of surrounding neutral gas.
O32 measurements for the KBSS sample are calculated as described in Strom et al. (2017) be detected with S/N > 3, resulting in a sample of 316 galaxies with a measured raw O32 value. Dust-corrected O32 values are measured for a smaller sample of 174 objects that meet both the requirements described above and the cut on the S/N of the Balmer decrement described in Section 3.2.4. For these measurements, each of the [O III] and [O II] emission lines is corrected for extinction using the measured Balmer decrement and a Cardelli et al. (1989) attenuation curve before calculating the line ratio. These two O32 estimators have the highest individual correlations with EW Lyα of any "production"related parameter: = r 0.43 Sp (p = 10 −15 ) for the raw O32 measurements with a slightly higher correlation strength and lower significance for the smaller sample of dust-corrected O32 values (Table 1). However, O32 also displays a strong correlation with EW LIS ( = r 0.47 Sp ), perhaps reinforcing the idea that O32 is not wholly a measure of Lyα production.

O3
The O3 ratio is another indicator of nebular ionization and excitation: ⎛ As discussed by Trainor et al. (2016), the O3 ratio is strongly associated with O32 for the high-excitation galaxies typical at z2: the two quantities are correlated with = r 0.74 Sp in the KBSS sample. Likewise, Strom et al. (2018) demonstrate that O3 is an effective indicator of log(U) through extensive photoionization modeling of the KBSS galaxy sample. O3 therefore has many of the same advantages as O32 for indicating Lyα production.
However, O3 has two significant advantages over O32. First, O3 relies on two emission lines at similar wavelengths, which makes the ratio insensitive to both dust extinction and crossband calibration. Second, O3 is insensitive to the differences between density-bounded and ionization-bounded H II regions,  (8)) for 368 galaxies with detected Lyα and Hα emission. The right panel gives the absolute escape fraction of Lyα photons (Equation (9)) for 188 galaxies that also have robust estimates of the Balmer decrement (used to dust-correct the Hα flux). Bottom panels show the EW LIS distribution for galaxies with F Lyα <0. 20 Essentially, the local ratio of O II to O III increases toward the edge of the Strömgren sphere for an ionization-bounded nebula. For a nebula that is optically thin to ionizing photons, this ionization front (and its associated region of stronger O II emission) is not present. See, e.g., Pellegrini et al. (2012).
so it may indicate nebular excitation (and Lyα production) in a manner more decoupled from the physics of Lyα escape. Based on these advantages, we rely on O3 as our primary metric of Lyα production efficiency for the remainder of this work.
Using the same line-fitting process described above to estimate the line fluxes and uncertainties, we calculate the O3 ratio for every galaxy that has S/N > 3 for both [O III] and Hβ, a total of 395 objects. O3 has a correlation with EW Lyα that is only marginally weaker than the corresponding correlation for O32 (r Sp = 0.40, p = 5 × 10 −16 ).

Summary of Correlations with Lyα
Again, Spearman rank correlation statistics for EW Lyα and EW LIS with other measured quantities are presented in Table 1. Each quantity is calculated for a different number of objects according to the cuts described above, and the p values of every measured correlation (which depend on both the measured r and the number of objects in the sample) are highly significant (p=1 in nearly all cases). However, there is a wide range of r values, indicating that certain parameters explain only a small fraction in the total variation in EW Lyα despite the statistical significance of their correlation.
Note that EW LIS is strongly correlated with the escape fraction of Lyα as expected based on the arguments that both of these quantities are related to the ability of Lyα photons to escape galaxies (see Section 3.2.2 and Figure 4). Conversely, EW LIS has a much weaker 21 correlation with O3 despite the fact that both quantities show relatively strong correlations with EW Lyα . We interpret this relationship in the sections that follow, but it is suggestive of the fact that these two quantities capture different processes (escape and production) related to the observed EW Lyα . Figure 5 visually demonstrates this same result. While O3 and EW LIS are themselves not closely correlated, there is a clear trend toward high EW Lyα in the upper right corner of Figure 5 (i.e., the region of high O3 and/or EW LIS ∼0) and low EW Lyα in the lower left corner (i.e., the region of low O3 and/or strongly negative EW LIS ). Figure 5 also includes measurements from stacked spectra of faint L∼0.1L * Lyα-selected galaxies from the KBSS Lyα survey; these measurements are shown as boxed points. The EW LIS measurements are described by Trainor et al. (2015), while the O3 measurements are described by Trainor et al. (2016). The faint galaxy measurements follow the same trend as the individual measurements from the brighter KBSS galaxies, in that high EW Lyα is associated with the upper right corner of the parameter space.
For comparison, we also include measurements of MS 1523-cB58, a gravitationally lensed galaxy with M * ≈10 9 M e , SFR≈ 50-100 M e yr −1 , and a young age ∼9Myr (Siana et al. 2008). The reported O3 value is based on spectroscopic measurements reported by Teplitz et al. (2000), while the Lyα and LIS equivalent widths are new measurements from the Keck/ESI spectrum presented by Pettini et al. (2002). Despite its young age and large SFR-both of which would predict a high rate of Lyα production -the spectrum of MS 1512-cB58 (hereafter cB58) displays net Lyα absorption, 22 consistent with its deep LIS absorption lines (EW LIS ≈ −3 Å). The galaxy cB58 thus obeys the same association as the KBSS galaxies between Lyα emission and position in the O3-EW LIS parameter space.
The structure of Figure 5 therefore suggests that a linear combination of O3 and EW LIS would better predict EW Lyα than either quantity alone. That is, we could in principle define a single parameter that is maximized when both O3 and EW LIS predict strong EW Lyα , is minimized when both O3 and EW LIS predict weak EW Lyα , and takes intermediate values when O3 and EW Lyα have contradictory implications for the value of EW Lyα . We develop such a model in Section 4 below.

Combined Model
Motivated by the arguments above, we construct a new parameter X LIS O3 with the following definition: This parameter has the behavior described at the end of Section 3.4: X LIS O3 is maximized when both O3 and EW LIS are maximized (i.e., when our proxies for both Lyα production and Lyα escape suggest that EW Lyα should be strong). Likewise, X LIS O3 will take smaller values when either or both of O3 and EW LIS are small (i.e., when EW Lyα is expected to be small according to Figure 5). We therefore may expect any equation with the form of Equation (12) to predict a strong, monotonically increasing relationship between EW Lyα and X LIS O3 . We then tune α 23 to maximize the predictive power of this relationship. Specifically, we choose the value of α that maximizes the rank correlation coefficient between X LIS O3 and EW Lyα , yielding Figure 5. O3 ratio (≡log([O III]/Hβ)) vs. rest-frame equivalent width of LIS absorption for 377 galaxies, with colors denoting Lyαequivalent width. Note that O3 and EW LIS are not strongly correlated with each other, but EW Lyα >0 is strongly associated with both weak LIS absorption (EW LIS ≈0) and strong [O III] emission (O30.5). Square boxes with black borders correspond to stacked measurements of faint L∼0.1L * Lyα-selected galaxies from Trainor et al. (2015Trainor et al. ( , 2016. The large square denotes the full sample, and the smaller squares denote measurements based on splitting about the median value of EW Lyα . The diamond with black border represents MS 1512-cB58 based on measurements from Pettini et al. (2002) and Teplitz et al. (2000). 21 While the correlation is highly significant at p<10 −6 , the low rank correlation coefficient r Sp =0.21 indicates that most of the variation in EW LIS is not associated with variation in O3. 22 Note that the detailed cB58 Lyα profile displays weak Lyα emission superimposed on a much stronger damped Lyα absorption profile, as described by Pettini et al. (2000Pettini et al. ( , 2002. Due to the lower S/N of our KBSS Lyα measurements, we simply describe each galaxy as a net absorber or emitter for the purposes of this paper. 23 Note that α is a dimensionless number that sets the weighting of EW LIS (measured in Å) relative to O3 (measured in dex); this arbitrary parameterization was chosen so that typical values of X LIS O3 would be of order unity for the galaxies in our sample. a maximum correlation of r Sp =0.49 for α≈0.2. 24 Repeating this procedure on 1000 bootstrap samples, we find that the optimum value of α is constrained to 0.19±0.06, and the bootstrap samples are correlated with EW Lyα with r Sp =0.49± 0.04 for fixed α=0.2. The relationship between EW Lyα and X LIS O3 is displayed in Figure 6. As expected from Figure 5, strong Lyα emission is closely associated with large X LIS O3 .

Exponential Model and Variance
Motivated by the distribution of points in Figure 6, we fit an exponential model to the data. Because there are large uncertainties in both axes, we choose model parameters to minimize the 2D distance between the model and data, scaled by the corresponding uncertainties in each dimension. In effect, we define a 2D analog of the traditional χ 2 parameter: , 2 which our model fitting seeks to minimize. In the above equation, x i and y i represent the values of EW Lyα and X LIS O3 for the ith object in our sample, s x i , and s y i , are the associated uncertainties for that object, and ( ) x y , , is the closest point on the model curve to the observed values x i and y i , scaled by their corresponding uncertainties. Our model takes the following form: where the best-fit coefficients and 1D marginalized uncertainties are found to be The uncertainties in the model parameters are calculated by repeating the fit on 500 bootstrap samples of the data, where each bootstrap sample contains 377 points and their corresponding uncertainties selected at random (with replacement) from the true set of 377 points in the sample. Figure 7 displays the best-fit curve along with 100 representative fits to the bootstrap samples to demonstrate the uncertainty in the fit model.
The best-fit curve corresponds to c = 515 . Being above unity, this value indicates that the typical observations differ from the best-fit model by more than their formal uncertainties, such that there is intrinsic scatter in the relationship between EW Lyα and X LIS O3 that is described neither by our model nor by our estimated observational uncertainties.
In order to determine the fraction of the total variance in EW Lyα captured by the combination of our model and our measurement uncertainties, we perform the following exercise. We begin by assigning each object a fiducial value of EW Lyα and X LIS O3 according to the nearest point to the observed values on the best-fit curve (where proximity to the curve is calculated in 2D, weighted by the uncertainty in each dimension). Each point is then perturbed in both dimensions, with the perturbation drawn from a Gaussian distribution f(μ, σ), with μ=0 and σ equal to the estimated uncertainty in EW Lyα or X LIS O3 for that object. The resulting simulated data thus represent a hypothetical sample consistent with the model, with scatter given only by the observational uncertainties.
A new fit to the resulting simulated data set is calculated (i.e., new coefficients are calculated for Equations (15)-(17)), and the following statistics are calculated to assess the variance in the simulated data: (1) the Spearman rank correlation coefficient r s of the simulated data, and (2) the c 2D 2 coefficient assessing the goodness of fit of the simulated data to its own best-fit model. Repeating this process 100 times, we find that the resulting distribution of statistics have á ñ = ), we see that the simulated data have a tighter correlation and Figure 6. Lyα equivalent width vs. X Lyα , a linear combination of O3 and EW LIS that maximizes the Spearman rank correlation with EW Lyα : r Sp =0.49, p=1.5×10 −24 . Approximately half of the ordering in observed EW Lyα is explained by these two variables alone, and 90% of the variance in EW Lyα is accounted for by the combination of an exponential model and the 2D measurement uncertainties (Section 4.1). As in Figure 5, red squares correspond to stacked measurements of faint L∼0.1L * Lyα-selected galaxies from Trainor et al. (2015Trainor et al. ( , 2016, and the red diamond corresponds to measurements of MS 1512-cB58 from Pettini et al. (2002) and Teplitz et al. (2000). Figure 7. Data are the same as in Figure 6, with a best-fit exponential relationship (blue dashed curve) and 100 fits to bootstrap realizations of the data (faint red curves). The faint galaxy stacks and cB58 spectrum (square and diamond symbols) are not used in calculating the best-fit model parameters, but their positions are well described by our exponential model. closer agreement with the fitted relation than do our real data; again, this indicates that the real data have additional sources of intrinsic scatter not described by our model or our estimated measurement errors.
We therefore model the intrinsic scatter in the relationship of EW Lyα and X LIS O3 by assuming an underlying equation of the following form:

O3 int
where f(μ, σ) is a number drawn from the Gaussian distribution with μ=0, and σ int describes the intrinsic scatter Figure 8. Conditional probability distributions for detecting Lyα in net emission (i.e., with EW Lyα > 0) as a function of other galaxy parameters. Yellow (blue) bars represent the fraction of Lyα emitters (absorbers) found within a given bin as shown by the left-hand vertical axis. Bin sizes and boundaries are determined in order to ensure that at least 30 objects are included per bin while allowing the number of bins to vary. Black lines indicate the conditional probability of Lyα detection (according to the right-hand vertical axis) at a given value of the parameter shown on the horizontal axis. Dark-gray (light-gray) shaded regions indicate the 68% (95%) confidence intervals on this conditional probability. Curves are determined based on the unbinned, nonparametric model described in Section 5.1, which depends on the measured parameter values and their uncertainties. a P em Ly represents P(EW Lyα > 0 | X), where X is the parameter given on the horizontal axis. The parameters displayed here are all relatively effective predictors of Lyα emission, with X LIS O3 and O32 being particularly effective.
in EW Lyα at fixed X LIS O3 . 25 In this manner, we can simulate values of EW Lyα and X LIS O3 drawn from the model distribution (including random perturbations corresponding to the estimated measurement uncertainties, as above), but with an additional term corresponding to the assumed intrinsic scatter that can be increased until the simulated data have similar total scatter (parameterized by r s and c 2D 2 ) to the observed data.
In practice, we find that a value σ int =7±1 Å produces the best match to the statistical properties of the observed data, with r s =0.50±0.03 and χ 2D 2 /N dof =1.33±0.06. Adopting this value for σ int implies that the intrinsic variance in EW Lyα not described by our model is , we find that ∼90% of the total variance in EW Lyα is accounted for by our exponential model and the estimated measurement errors.
The apparent success of our two-parameter model for predicting EW Lyα deserves some inspection, particularly in light of the well-known tendency (described in Section 1 and below) for Lyα emission to display substantial scatter with respect to galaxy properties. We address this topic in Section 6.

Conditional Probabilities for Lyα Detection
Despite the apparent success of the model above in selfconsistently describing the behavior of EW Lyα , it has several shortcomings. Specifically, the model described above allows us to predict the net Lyα emission of a given galaxy based on measurements of EW LIS and O3, but Figures 6-7 reveal substantial observational scatter in this relation that is not described by our exponential model. Furthermore, it is not obvious that an exponential model is physically meaningful for describing the dependence of Lyα emission on these properties.
An alternative method of describing the dependence of Lyα emission on galaxy properties would be to relinquish analytical functions for EW Lyα in favor of a nonparametric model for the conditional probability of detecting Lyα, given a value for one or more other galaxy parameters. While this method does not provide an expected numerical value for EW Lyα , it allows us to explicitly describe how the detection fraction (as well as the stochasticity in observed Lyα emission) varies as a function of galaxy properties.

Methodology
For the majority 26 of the empirical parameters listed in Table 1, we construct conditional probability functions in two ways. First, we bin the full set of galaxies for which each indicator is measured into bins with widths that are allowed to vary in order to contain at least 30 galaxies per bin. 27 Within each bin, the fraction of galaxies with detected Lyα in net emission (EW Lyα > 0) is plotted as a yellow bar in the corresponding panel of Figures 8-9; the fraction of galaxies that are net Lyα absorbers (EW Lyα 0) is plotted as a blue bar in the negative direction. This empirical Lyα emitter fraction as a function of an observed parameter X can be interpreted as the conditional probability distribution P(EW Lyα > 0 | X), hereafter ( ) a P X em Ly . Displaying the empirical Lyα emitter fraction in this way has the useful property that every galaxy contributes to the number of absorbers or emitters for a single bin, which means that each bin is independent. However, assigning each galaxy to a specific bin based on the observed value of a given Lyα-predicting parameter neglects the fact that the parameter values that define the horizontal axes of Figures 8-9 have their own observational uncertainties, which inhibits their assignment to a single specific bin. Likewise, the observational uncertainty on EW Lyα prevents a clean separation between observed Lyα emitters and absorbers. For this reason, we construct a second, unbinned estimator of ( ) a P X em Ly that explicitly incorporates both of these uncertainties. Our unbinned estimator of ( ) a P X em Ly represents each galaxy observation as a pair of 1D Gaussian probability distributions of the form ( | ) f m s s = = X X, i Xi , , where X i and s X i , are the observed value and observational uncertainty, respectively, on parameter X for galaxy i. Two distributions of this form are generated for each galaxy, with one distribution being normalized by the probability of the observed galaxy being an Lyα emitter and the second being normalized by the probability of being an absorber. Formally, we calculate the probability of an observed galaxy being an Lyα emitter from the measured value of EW Lyα and its corresponding uncertainty s a EW,Ly : ⎛ EW Ly EW,Ly will have P em ≈1 and P abs ≈0; the "emitter" and "absorber" Gaussian probability distributions are then normalized such that their integrals equal P em and P abs , respectively. The inferred incidence η of Lyαemitters (absorbers) in our sample is then the sum of all the emitter (absorber) distributions: A galaxy that is a clear Lyα emitter (P em ≈ 1) will therefore increase the integral of η em by one over the distribution of X. This contribution to the incidence will be localized to a specific value 28 of X if s X i , is small, whereas the contribution to the total incidence will be spread across a large fraction of the distribution if s X i , is large. Finally, the inferred probability of Lyα emission given the observation of a galaxy with measured parameter value X is taken to be the inferred incidence of Lyα emitters divided by the total incidence of emitters: Ly em em abs 25 Note that σ int is assumed not to vary with X LIS O3 for simplicity. While the observed distribution of EW Lyα shows significantly more scatter at large X LIS O3 than at smaller values, we find that this effect is entirely consistent with the trend of increasing measurement uncertainties on both axes as X LIS O3 and EW Lyα increase. 26 We do not present conditional probability functions for ( ) -E B V ; neb the conditional probability density function is similar to that of ( ) -E B V SED but weaker. 27 Note that the number of bins therefore depends on the total number of galaxies for which a given parameter is measured; see Table 1. 28 To reduce the sample variance of this estimator, we replace s X i , with the distance DX i to its nearest neighbor in the observed distribution of X in cases where s D > X i Xi , . This occurs for ∼2% of objects, typically in the extrema of the distribution.
This conditional probability is displayed as a black curve in each panel of Figures 8-9. Confidence intervals for this curve are calculated by repeating the process above for 100 bootstrap realizations of the data and identifying the central 68% and 95% intervals; these are shown as gray shaded regions in Figures 8-9.

Analysis of Conditional Probability Distributions
The conditional probability distributions displayed in Figures 8-9 reinforce the relationships between EW Lyα and other galaxy properties shown in Table 1. Net Lyα emission is strongly associated with weak LIS absorption (EW LIS  −1) and high ionization/excitation (O30.7; O320.5). In particular, our X LIS O3 metric is able to more clearly discriminate between Lyα emitters and absorbers than either EW LIS or O3 alone; even the most extreme values of the latter metrics only predict EW Lyα  > 0 with ∼60% probability, whereas the highest values of X LIS O3 predict Lyα in net emission in ∼80% of cases. Likewise, 25% of galaxies with < X 0 LIS O3 display Lyα in net emission. The distribution for v Lyα is also shown in Figure 8, which demonstrates the strong dependence of a P Other than v Lyα , O32 is again the most direct predictor of net Lyα emission or absorption: even the dust-uncorrected (i.e., raw) values are approximately as effective at predicting EW Lyα >0 as X LIS O3 , and the dust-corrected measurements predict Lyαemission with >80% probability at the highest O32 values and 20% at the lowest values. As discussed in Sections 3.3.2-3.3.3, this strong correlation may be due in part to the fact that dust-corrected O32 is a quite direct measure of the ionization parameter in ionization-bounded nebulae, but it may also be due to the fact that elevated O32 may indicate density-bounded nebulae that facilitate Lyα escape as well as production. Nonetheless, an observer who merely wishes to predict the net Lyα emission of a galaxy (remaining agnostic to the circumstances that facilitate this emission) will find O32 to be an effective indicator of this emission. However, the caveat to the effectiveness of this indicator remains the observational difficulty of obtaining high-S/N measurements of the [O III] and [O II] emission lines (the latter of which can be extremely faint in the high-ionization galaxies typical at high z), as well as the Balmer lines necessary to correct for differential attenuation by dust. This effect is seen in the small number of bins (which must each contain 30+ galaxies) in each of the O32 plots, as well as in the broad confidence intervals for the corresponding unbinned relations.
Note that in many of the panels of Figures 8-9, the trends apparently reverse toward the extrema of a given parameter value. Given that these regions of parameter space are sparsely populated (as can be seen based on the width of the blue and yellow bars) and the confidence intervals on a P em Ly diverge, we interpret the majority of this apparent aberrant behavior as a regression toward the overall average rate of Lyα emission ( ( ) » á ñ » a a P X P 0.5 em Ly em Ly ) when the parameter uncertainties are large. This effect is particularly noticeable in the panel for EW LIS , where the highest and lowest bins appear particularly dominated by observational error-in this case, we expect that the trend between EW LIS and EW Lyα is monotonically positive over the range in which both parameters are well measured. Conversely, the relatively tight confidence intervals for X LIS O3 and O32 appear to indicate real flattening in their relationships with a P ; em Ly some nonnegligible stochasticity in EW Lyα appears to be present that is not accounted for by these factors even when they are well measured, as demonstrated by the ∼20% of Lyα emitters (absorbers) that are present even at the lowest (highest) values of these parameters. Figure 9 displays the conditional probability distributions for several parameters that are only weakly correlated with EW Lyα , again reinforcing the results shown in Table 1. In particular, SFR Hα and M UV display negligible predictive power related to Lyα emission over the parameter space sampled by our galaxies. E(B − V ) SED is an effective predictor of a P em Ly at the lowest reddenings (E(B − V ) SED 0.2), 29 but Lyα emitters and absorbers appear almost equally common among galaxies with higher reddening values. Surprisingly, the incidence of Lyα emitters appears to grow slightly with increasing E(B − V ) SED for E(B − V ) SED >0.2. This effect may be dominated by the relatively uncertain values of E(B − V ) SED , which is somewhat degenerate with stellar population age, causing a regression toward the population mean. a P em Ly shows a moderate increase as SFR SED decreases or sSFR increases. Stellar mass (M * ) displays perhaps the most interesting relationship with EW Lyα that is not obvious from the Spearman correlation alone: Lyαemission probability increases significantly among both the lowest-and highest-mass galaxies. The low-mass relationship (along with the trends in SFR and sSFR) may reflect the tendency for low-mass, low-SFR galaxies to have relatively porous ISMs and high nebular excitation (see, e.g., Trainor et al. 2015Trainor et al. , 2016. Although galaxies with clear signatures of AGN activity were removed from our sample, the excess of Lyα emitters at high galaxy masses may reflect residual AGNs in our sample. It is also possible that some galaxy masses in our sample are overestimated owing to strong line emission that contaminates the rest-frame optical photometry; SED-fit masses are especially sensitive to this effect at high redshift (see, e.g., Schenker et al. 2013). In this case, the increase in Lyα emission among galaxies with large inferred stellar masses could be caused by the underlying association between Lyα and optical emission line strength (i.e., nebular excitation). The Lyα-emitting behavior of high-mass galaxies will be investigated in future work.
In general, these conditional probability distributions may be used to inform analyses of the Lyα detection fraction of galaxies at the highest redshifts, where the opacity of the neutral IGM may suppress the observed Lyα emission from intrinsic Lyα emitters. By using other observed properties of galaxies as priors input to the distributions above, it will be possible to more accurately characterize the degree to which evolution in both IGM and galaxy properties shapes the distribution of observed Lyα emission.

Comparison to Recent Work
A few recent studies in the low-redshift universe have measured correlations between Lyα emission and other galaxy properties with the goal of predicting the Lyαemission. Hayes et al. (2014) present data from the Lyman-Alpha Reference Sample (LARS; Östlin et al. 2014), comparing Lyαemission with 12 different global galaxy properties derived from imaging and spectroscopy. They find significant correlations in which normalized 30 Lyα emission is highest among galaxies with low SFR, low dust content (inferred by nebular line ratios or the UV slope), low mass, and nebular properties indicative of high excitation and low metallicity. The Hayes et al. (2014) study differs from the work presented here in that their individual measurements have much higher S/Ns but are much fewer in number (12 galaxies in LARS vs. the 703 galaxies in this paper). Furthermore, the original LARS sample did not include rest-UV continuum spectroscopy covering the interstellar absorption lines; while these data were later collected via HST/COS spectroscopy and presented by Rivera-Thorsen et al. (2015), there is currently no simultaneous analysis of the predictive power of combined rest-UV and rest-optical spectroscopic diagnostics of Lyα emission in LARS. 29 See Section 6 for a description of other recent studies of E(B − V ) versus EW Lyα . 30 Hayes et al. (2014) consider several metrics for Lyα emission, including the total Lyα luminosity (L Lyα ), EW Lyα , L Lyα /L Hα , and f esc,abs ; only the latter three quantities show strong correlations with galaxy properties.
Recent results by Yang et al. (2017) are more directly comparable to those presented here: Yang et al. (2017) analyze HST/COS spectra of 43 "green pea" galaxies at z∼0.1-0.3 with Sloan Digital Sky Survey optical spectroscopy. The authors find that the escape fraction of Lyα is anticorrelated with the velocity width of the Lyα line profile, the nebular dust extinction, and the stellar mass, as well as positively correlated with the O32 ratio. Each of these relationships has Spearman rank correlation coefficients of r∼0.5-0.6; while the contributions of observational uncertainties to this scatter are not explicitly calculated, the quoted uncertainties suggest that these contributions are negligible. Furthermore, Yang et al. (2017) fit a linear combination of the nebular extinction E(B − V ) and the velocity offset of the red Lyα peak, finding that the resulting relation fits the observed Lyα escape fraction with a 1σ scatter of 0.3 dex. Notably, this multiparameter relationship is similar to our own work in Section 4, but it differs in that both included parameters (the Lyα velocity offset and inferred dust extinction) fall into the category of empirical parameters we have associated with Lyαescape (Section 3.2), rather than including a proxy for the efficiency of Lyα production (Section 3.3). As with the Hayes et al. (2014) study, Yang et al. (2017) have the advantage over our own work of measuring individual spectroscopic parameters at high S/N, but they include a much smaller sample size. The Yang et al. (2017) sample also only includes galaxies selected to be spatially compact, low mass, and high excitation, whereas the KBSS sample includes 15× more galaxies over a much broader range of properties. Nonetheless, the different selection biases and relative advantages of low-and high-redshift galaxy samples make these z∼0-0.3 surveys (including continued HST/COS spectroscopy) an effective complement to the work presented here.
While not directly focused on predicting Lyα emission, recent work from the HiZELS survey (Geach et al. 2008;Sobral et al. 2009) has pointed to the complex relationships between Lyα and other recombination line emission. In particular, Oteo et al. (2015) demonstrate that galaxies selected on the basis of Hα emission display only weak Lyα emission on average. This result is consistent with the first panel of Figure 9 and the discussion in Section 3.3.1 of this paper, as we also find L Hα to be a poor predictor of Lyαemission. Although the HiZELS galaxies do not have deep rest-UV spectra, we expect that their low Lyα escape fractions would be associated with strong LIS absorption, such that they may lie near cB58 in the O3-EW LIS plane displayed in Figure 5 (i.e., the parameter space associated with high rates of intrinsic Lyα production but low rates of Lyα escape).
In more recent work, Du et al. (2018) present a systematic study of the redshift evolution of rest-UV spectroscopic properties of galaxies over z≈2-4, including the variation of Lyα with other galaxy properties in this epoch. The Du et al. (2018) spectroscopic sample is also drawn from the KBSS and has substantial overlap with the galaxies presented here. Broadly, the authors find that the relationship between EW Lyα and EW LIS is invariant with redshift for 2z4, with a similarly invariant (but weaker) relationship between EW Lyα and E(B − V ) SED . In addition to these indicators of Lyα escape, Du et al. (2018) argue that the association between the equivalent width of C III] and EW Lyα (which they find to be similar at z ∼ 2 and z ∼ 3) represents the dependence of EW Lyα on the intrinsic production rate of Lyα emission. One interesting point of comparison between their results and those presented here regards the relationship between EW Lyα and M UV ; in their z∼2 galaxy bin (which is most similar to the galaxies presented here), Du et al. (2018) find no relationship between EW Lyα and M UV , but they find an increasingly strong relationship with increasing redshift. Likewise, Oyarzún et al. (2017) find a positive correlation between M UV and EW Lyα , but the relationship appears to rely on the inclusion of lowerluminosity galaxies than are included in this sample (although similar to the galaxies described in Trainor et al. 2015Trainor et al. , 2016 see Figures 5 and 6 here) and the extension to higher redshifts. This variation may help explain the high EW Lyα values seen generically in the lowest-luminosity galaxies at z∼2 (e.g., Stark et al. 2013), which are perhaps better analogs for typical galaxies at the highest redshifts than the more luminous z∼2 galaxies described in this paper.
The results of Du et al. (2018) are in general agreement with those presented here, with the exception that Du et al. (2018) limit their analysis to composite rest-UV spectra (i.e., they include no rest-optical data, nor do they analyze the spectra of individual galaxies), and they consider the redshift evolution of these trends. The Du et al. composite spectra achieve higher-S/ N measurements of individual features than the measurements we consider here, but they also smooth over the intrinsic object-to-object variation among galaxies-variation that we highlight in this paper, particularly in Section 5. Together, therefore, these two studies provide a comprehensive view of the average trends among net Lyα emission and the processes of production and escape, while also demonstrating the substantial stochasticity that accompanies these broader trends.
Another complementary aspect of these works is that Du et al. (2018) demonstrate that individual parameters related to Lyα production and escape (i.e., EW LIS , E(B − V ), and proxies for nebular excitation) show similar relationships with EW Lyα across 2<z<4, despite the fact that the ubiquity of EW Lyα emission itself (and its dependence on M UV , SFR, and M * ) evolves significantly over this period. This invariance suggests that the models for Lyαemission developed here-particularly those shown in Figures 6 and 8-may be expected to remain useful even at higher redshifts where the intrinsic Lyαemission of galaxies is more difficult to directly measure.

Conclusions
We have presented an empirical analysis of factors affecting Lyαproduction and escape in a sample of 703 star-forming galaxies from the KBSS at z≈1.5-3.5. Our primary indicators of Lyα escape efficiency include the velocity offset of the Lyα line and the equivalent width in absorption of low-ionization interstellar lines, EW LIS , and we find that these proxies for Lyα escape are strongly associated with the directly measured Lyα escape fraction, f esc . Our indicators of Lyα production include the O3 and O32 ratios, which we argue are effective diagnostics of the ionization conditions within the H II regions from which Lyα photons originate. Several other galaxy parameters, including stellar mass, SFR, luminosity, and reddening, are shown to have much weaker relationships with observed Lyα emission.
We propose that EW LIS and O3 are the most useful predictors of EW Lyα because of their potential for observability in cases when the Lyαline is not directly detectable (or may be strongly affected by IGM absorption) and because of their strong individual correlations with EW Lyα and lack of correlation with each other. We then construct a new quantity, X LIS O3 , which is a linear combination of EW LIS and O3 with coefficients chosen to maximize the association between X LIS O3 and EW Lyα .
We find that the combination of O3 and EW LIS predicts net EW Lyα with less scatter than any single variable captured by our survey that does not require measurement of the Lyα line; ∼50% of the ordering in observed EW Lyα is captured by X LIS O3 . After accounting for measurement uncertainties and fitting an exponential model for EW Lyα as a function of X LIS O3 , we estimate that the combination of our model and observational error account for 90% of the total variance in EW Lyα at fixed X LIS O3 . We also estimate the conditional probability of detecting net Lyαemission or absorption in slit spectroscopy of a galaxy as a function of various galaxy parameters. We find that galaxies with > X 0.6 LIS O3 have an 80% probability of being net Lyα emitters, while those with < X 0 LIS O3 have less than a 25% probability of exhibiting net emission. Similarly strong variation in the probability of net Lyα emission is seen when adopting a prior based on O32, while constraints on photometric or SED-fit parameters or Hα-based SFR have negligible utility as priors over the parameter space probed by our sample.
Given the many factors affecting net Lyα emission, our twoparameter model for X LIS O3 is remarkably successful at describing the variation in Lyα emission across a large, heterogenous set of star-forming galaxies. We suggest that this success indicates that the wide variety of processes affecting Lyα emission can be broadly categorized as relating to Lyα production or escape, and capturing these two different "meta-parameters" is an essential component of any model for Lyα emission from galaxies.