The Lyman Alpha Spectral Database (LASD)

Lyman $\alpha$ (Ly$\alpha$) emission from star-forming galaxies is an important tool to study a large range of astrophysical questions: it has the potential to carry information about the source galaxy, its nearby circumgalactic medium, and also the surrounding intergalactic medium. Much observational and theoretical work has therefore focused on understanding the details of this emission line. These efforts have been hampered, however, by an absence of spectroscopic reference samples that can be used both as comparisons for observational studies and as critical tests for theoretical work. For this reason we have compiled a large sample of Ly$\alpha$ spectra, at both low and high redshift, and created a publicly available online database, at http://lasd.lyman-alpha.com. The Lyman Alpha Spectral Database (LASD) hosts these spectra, as well as a large set of spectral and kinematic quantities that have been homogeneously measured for the entire sample. These can easily be viewed online and downloaded in tabular form. The LASD has the capacity for users to easily upload their own Ly$\alpha$ spectra, and all the same spectral measurements will be made, reported, and ingested into the database. We actively invite the community to do so, and the LASD is intended to be a long-term community resource. In this paper we present the design of the database as well as descriptions of the underlying algorithms and the initial Ly$\alpha$ emitter samples that are in the database.


INTRODUCTION
The Lyman alpha (Lyα) emission at 1215.67 Å originates from the n = 2 → 1 transition of atomic hydrogen, where n is the principal quantum number. Lyα is intrinsically the strongest spectral line of astrophysical nebulae. The line strength combined with the restframe UV wavelength means that it becomes a readily observed beacon from high redshift sources. Indeed Lyα has seen extensive, and very successful, use for detection of high redshift galaxies in both narrowband (e.g., Rhoads et al. 2000; Rauch et al. 2008; Konno et al. 2014) and spectroscopic (Santos et al. 2004; van Breukelen et al. 2005; Drake et al. 2017; Stark et al. 2007; Urrutia et al. 2019) surveys. Additionally, at these high redshifts, the Lyα transition is often the only observable spectral line in the observer-frame optical and is therefore commonly used for spectroscopic confirmation of very high redshift galaxies detected by dropout techniques. Furthermore, since the intergalactic medium (IGM) becomes more neutral and, thus, more opaque to Lyα photons towards higher redshifts, the (non)detection of Lyα emitting galaxies provides us with tight constraints on the progress of reionization (Dijkstra 2014; Mason et al. 2019).
Beyond its use as a pure detection tool, the power of Lyα lies in its resonant nature and, consequently, in its sensitivity to neutral hydrogen within or close to the emitting galaxy, which shapes the emergent Lyα observables. However, interpreting Lyα emission from galaxies in terms of physical properties of the system or even using it for precise redshift determination is not trivial. This is because the Lyα transition is resonant and therefore Lyα photons experience extensive scattering in, and interactions with, the surrounding medium when escaping from virtually any environment (Harrington 1973; Neufeld 1990). This means that the emergent spectral line profile carries with it an imprint of the medium through which it travels, making it very complex but also potentially very informative of the physical conditions in the galaxy.
arXiv:2010.02927v2 [astro-ph.GA] 10 Jan 2022
For this reason there has been extensive work done, both empirical (e.g., Verhamme et al. 2017;Rivera-Thorsen et al. 2015;Runnholm et al. 2020, see Hayes 2015 for a review) and theoretical (e.g., Neufeld 1990;Ahn et al. 2001;Verhamme et al. 2006;Gronke et al. 2017;Lao & Smith 2020), to attempt to decode or model Lyα and determine what the imprint of various physical properties of the galaxy is on the line. Community-wide, however, there are major difficulties in the interpretation of these results: individual observational samples of Lyα emitting galaxies are often small, have been assembled in a piecemeal fashion, and different researchers have made different sets of measurements, using various definitions and methodologies/algorithms. For empirical studies this means that recovered properties may not be comparable, correlations have small statistical significance, and robust conclusions are hard to draw. For theoretical studies on the other hand it means that empirical reference samples, against which the models can be tested, are hard to come by, which complicates sanity checks of model outputs.
The main quantities derived from the Lyα spectra reflect either photometric values (flux, luminosity, EW) or kinematic properties (e.g. velocity shifts). Further quantities such as various asymmetry and skewness measures are weighted combinations of both the wavelength and flux axes of the one-dimensional spectra. For example, simple velocity offsets of the main (usually red) Lyα peak are frequently derived. See for instance Steidel et al. (2010); Hashimoto et al. (2013); McLinden et al. (2014); Rivera-Thorsen et al. (2015), all of which are derived by ascribing a characteristic velocity to the Lyα line. This characteristic velocity may be derived from a Gaussian fit to the line, the velocity of the peak emission, the first moment measured over a certain window, or possibly other definitions. As Lyα is redistributed in velocity space by scattering in galaxy winds and the IGM, asymmetry measurements have often been employed. The 'class' of asymmetry measurements has over the years included parametric fitting of split-Gaussian profiles, non-parametric measurements of flux distributions bluewards and redwards of line centre (when z sys is known, e.g. Erb et al. 2014) or with respect to the maximum flux (when z sys is unknown, e.g. Rhoads et al. 2003), or recast estimates of the skewness statistic (Kashikawa et al. 2006). All of these measurements differ between application, group, and historical precedent and, moreover, will further depend upon what rest-wavelength/velocity window is used for the calculation, inclusion of errors, etc.
In this work we present the 'Lyman Alpha Spectral Database' (LASD), the goal of which is to help resolve some of the issues described above. The database and its associated website http://lasd.lyman-alpha.com allow the community to upload calibrated Lyα spectra which will be processed through a homogeneous analysis pipeline. In this paper, we present the database structure, web interface, and the analysis pipeline in Sec. 2. We describe the initial dataset consisting of ∼ 340 publicly available Lyα spectra in Sec. 3, and present some tests and some example correlations in Sec. 4. We present some concluding remarks and discuss the outlook for the LASD in Sec. 5.

Database & Web Interface
The LASD is built entirely in Python using the Django web framework, both to deliver the user interface and to manage the PostgreSQL database.
The database is structured into the following three primary tables:
1. Observations: This holds all the raw data that were uploaded by the user as well as the unpacked and calibrated spectrum. Note that not all of these data are available to the user (see below) but storing the uploaded spectra allows us to reanalyze them in the future, e.g., to introduce new measurements.
2. Objects: This table holds entries for all the galaxies represented in the database. Galaxies are defined by their coordinates and by name. They are created on the fly during the upload and users can specify the source with RA and DEC and optionally assign it a name. The name field also allows users to search for already defined objects. If the new object is within 2″ of a previously defined object it is instead assigned to that.
3. Measurements: This table holds all the results of the automated analysis: fluxes, kinematic properties, etc. It is separated from the uploaded data so that, in the eventuality that a major error is discovered in the analysis, this table can be safely cleared and reconstructed without endangering the uploaded data.
The first two tables are connected via a many-to-one relation meaning that one object may have multiple associated spectra but not the reverse. We designed this structure to accommodate the possibility that any given galaxy may have multiple observations with different instruments or settings. Each observation is then connected to one set of measurements using an estimated redshift (see Section 2.3), and one set using an independently obtained systemic redshift if it is supplied by the user. Note that this implies that we explicitly allow for several Lyα spectra to be uploaded for a single galaxy. This is useful if, e.g., an object has been observed with several instruments or different extraction routines are used. It is hence the responsibility of the user to compile a statistically relevant sample (for the individual usage case) from the LASD.
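The relations described above can be illustrated with a minimal, framework-free sketch. It is not the LASD's Django schema: the class names, field names, and the `attach` helper are hypothetical, and the matching radius is assumed to be 2 arcseconds.

```python
from dataclasses import dataclass, field
from math import asin, cos, degrees, radians, sin, sqrt
from typing import List, Optional

MATCH_RADIUS_ARCSEC = 2.0  # assumed matching radius (value and unit are an assumption)


@dataclass
class Measurement:
    redshift_source: str  # "estimated" or "systemic" (hypothetical labels)
    values: dict


@dataclass
class Observation:
    filename: str
    measurements: List[Measurement] = field(default_factory=list)


@dataclass
class SkyObject:
    ra_deg: float
    dec_deg: float
    name: Optional[str] = None
    # Many-to-one: several observations may point at the same object.
    observations: List[Observation] = field(default_factory=list)


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation from the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
    a = sin((dec2 - dec1) / 2) ** 2 + cos(dec1) * cos(dec2) * sin((ra2 - ra1) / 2) ** 2
    return degrees(2 * asin(sqrt(a))) * 3600.0


def attach(objects, ra_deg, dec_deg, obs, name=None):
    """Assign obs to an existing object within the match radius, else create one."""
    for obj in objects:
        if separation_arcsec(obj.ra_deg, obj.dec_deg, ra_deg, dec_deg) <= MATCH_RADIUS_ARCSEC:
            obj.observations.append(obs)
            return obj
    new = SkyObject(ra_deg, dec_deg, name, [obs])
    objects.append(new)
    return new
```

Keeping measurements attached to observations, rather than to objects, mirrors the design choice that the measurement table can be rebuilt from the stored uploads.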
The web interface allows for upload of single objects as well as multiple observations at once in tarball format and also allows for the observations and measurements to be downloaded. We have added the possibility for uploaders to mark spectra as non-downloadable if they wish to keep the original source files proprietary, but the LASD automated measurements will nevertheless be included in the downloaded measurements summary. Apart from required parameters such as a redshift estimate and a reference for the spectrum, we also allow the user to provide some additional optional information such as star formation rates and a gravitational lensing magnification estimate.
The full pipeline that a spectrum goes through is visually represented in Figure 1 and each step is described in more detail in the following sections.

Initial filtering
Before a spectrum is added to the database some tests are run to make sure that the spectrum is suitable and that we will be able to make robust measurements. Note that no manual inspection of the spectra is performed, and the integrity of the Lyα spectra, and their identification as actual Lyα emission lines, is left to the user. First the spectrum is converted to standard units of Å for wavelength and erg/s/cm²/Å for flux density, and the lensing magnification factor is divided out, if these parameters are given by the user. Then the following filtering steps are applied:
1. First the spectral file is checked for basic consistency, such as a monotonic wavelength solution, and that no negative errors are present, since negative errors indicate that the data are not trustworthy.
2. Next, the redshift given during upload is used to isolate a 2000 km/s broad region centered on the Lyα line. In this region the error vectors are checked against a criterion for sufficient 'good data': that less than 20 percent of the values are identically 0, since such values can be indicative of problems with the detector and can cause problems for our algorithms.
3. Then the region ±2500 km/s around Lyα is checked to make sure that the spectrum contains a Lyα emission line. We also check that the spectrum is not dominated by Lyα absorption, by requiring that the error-weighted median of the edges of the ±2500 km/s window is smaller than the error-weighted median of the central ±500 km/s.
4. The last filtering step is to check that the Lyα peak has sufficient signal-to-noise for processing to be meaningful. In order to do this we calculate the signal-to-noise ratio of the continuum subtracted (see §2.3 for details on the continuum subtraction) spectrum in a sliding 250 km/s broad window across the full ±2500 km/s spectral range. We require a minimum SNR of 7 for the spectrum to be analyzed and included.

Table 1 (partial).
Quantity | Description | Unit
R_F_valley_max | Ratio of luminosity density in the 'valley' between the peaks and the maximum peak |
R_L_cut_neg | Ratio of blueward luminosity and peak detection threshold |
R_L_cut_pos | Ratio of redward luminosity and peak detection threshold |
R_L_pos_neg | Ratio of redward over blueward luminosity |
W_std | Square-root of second moment of whole spectrum | km/s
W_neg_std | Blue peak width as measured by square-root of second moment | km/s
W_pos_std | Red peak width as measured by square-root of second moment | km/s
neg_peak_fraction | Fraction of times a blue peak was detected |
pos_peak_fraction | Fraction of times a red peak was detected |
skew | Pearson's moment coefficient of skewness of whole spectrum |
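Several of the filtering steps in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the LASD code: the function names are hypothetical, "monotonic" is taken to mean strictly increasing, and the windowed signal-to-noise is assumed to be the summed flux divided by the quadrature sum of the errors.

```python
from math import sqrt

C_KMS = 299792.458  # speed of light in km/s
LYA_REST = 1215.67  # Lyα rest wavelength in Å


def velocity(wave, z):
    """Velocity offset (km/s) of each observed wavelength from Lyα at redshift z."""
    lam0 = LYA_REST * (1.0 + z)
    return [C_KMS * (w - lam0) / lam0 for w in wave]


def passes_basic_checks(wave, err, z, half_width=1000.0, max_zero_frac=0.2):
    """Steps 1 and 2: file consistency and the zero-error criterion."""
    if any(b <= a for a, b in zip(wave, wave[1:])):
        return False  # wavelength solution is not strictly increasing
    if any(e < 0 for e in err):
        return False  # negative errors are treated as untrustworthy data
    # 2000 km/s broad region centred on the line, i.e. |v| <= 1000 km/s.
    window = [e for v, e in zip(velocity(wave, z), err) if abs(v) <= half_width]
    if not window:
        return False
    zero_frac = sum(1 for e in window if e == 0) / len(window)
    return zero_frac < max_zero_frac


def max_window_snr(vel, flux, err, width=250.0):
    """Step 4: highest S/N found in any sliding window of the given width."""
    best = 0.0
    for v0 in vel:
        sel = [(f, e) for v, f, e in zip(vel, flux, err) if v0 <= v < v0 + width]
        noise = sqrt(sum(e * e for _, e in sel))
        if noise > 0:
            best = max(best, sum(f for f, _ in sel) / noise)
    return best
```

A spectrum would then be accepted only if `passes_basic_checks(...)` is true and `max_window_snr(...)` on the continuum-subtracted flux is at least 7; the emission-versus-absorption test of step 3 is omitted here for brevity.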

Analysis
The analysis for each spectrum consists of the following steps: 1. continuum subtraction, 2. redshift estimation, 3. computation of the spectral quantities.
For the continuum removal, we first take an iterative approach: we clip the data points that lie more than 5σ below or above the median flux level, repeat this 20 times, and take the median of the remaining points as the continuum estimate. Due to the presence of a peak, and the resulting skewed flux distribution, this estimate is, however, usually too large. We therefore refine this guess by masking the region around the peak and taking the median flux of the remaining spectrum (weighted by the inverse of the error).
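The two-stage continuum estimate can be sketched as follows. This is a minimal illustration rather than the pipeline's implementation: the function names are hypothetical, σ is assumed to be the standard deviation of the points that currently survive the clipping, and the peak mask is passed in by the caller because the text does not specify its width.

```python
from statistics import median, pstdev


def clipped_median(flux, n_sigma=5.0, n_iter=20):
    """First guess: median after iteratively clipping n_sigma outliers."""
    kept = list(flux)
    for _ in range(n_iter):
        med, sig = median(kept), pstdev(kept)
        new = [f for f in kept if abs(f - med) <= n_sigma * sig]
        if len(new) == len(kept) or not new:
            break  # converged, or nothing sensible left to clip
        kept = new
    return median(kept)


def weighted_median(values, weights):
    """Value at which the cumulative weight first reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(w for _, w in pairs)
    running = 0.0
    for v, w in pairs:
        running += w
        if running >= half:
            return v
    return None  # empty input


def continuum(flux, err, peak_mask):
    """Refined guess: inverse-error weighted median outside the masked peak."""
    kept = [(f, 1.0 / e) for f, e, m in zip(flux, err, peak_mask) if not m and e > 0]
    return weighted_median([f for f, _ in kept], [w for _, w in kept])
```

Pixels with zero error are skipped in the weighted median, since an inverse-error weight is undefined for them.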
Estimating the systemic redshift using only the Lyα profile is a non-trivial problem, and it, as well as its implications, has been discussed in the literature (Adelberger et al. 2005;Steidel et al. 2010;Rudie et al. 2012;Verhamme et al. 2018;Byrohl et al. 2019). This is naturally due to the complicated diffusion in frequency and space Lyα photons undergo. Additional complications include, e.g., spatially varying intrinsic Lyα spectra (as probed by, e.g., Hα) combined with non-isotropic Lyα escape which makes even the definition of systemic redshift not unique.
To circumvent these problems, we chose to apply a simple definition which primarily characterizes a red and a blue peak in double peaked spectra in order to measure their quantities separately (see below). To do so, we choose the systemic redshift to be at the minimum between the two peaks in a double peaked spectrum, and blueward of the peak (thus, defining it to be the red peak) in a single peaked spectrum. This allows us to obtain, for instance, a natural red or blue peak width while at the same time recovering the redshift of Lyα emitting galaxies with known systemic redshift with satisfactory accuracy (cf. Fig. 4 and below).
In detail, the redshift estimate works by first running a peak detection algorithm: we use the method employed in  which is a modified version of a peak detection algorithm in conjunction with a minor Gaussian smoothing with the width of 1 resolution element to reduce high frequency noise. The algorithm flags a peak (a valley) if the following N data points lie below (above) the candidate by at least δ = 2.5 times the error in this region, and the minimum peak width is 7 data points. For our purpose, we executed the algorithm for N = (4, 6, 7, ..., 15) with the final result being the mode of the detected number of peaks. We constrain the separation between the peaks to be larger than 50 km s −1 , and smaller than 1200 km s −1 , and valleys are required to be surrounded by two peaks. If two peaks are detected in the spectrum, we use the valley between the peaks as the v = 0 estimate.
If only a single peak is detected we employ a simple iterative algorithm on the non-smoothed spectrum for finding the estimated systemic velocity. First we assume the highest point in the spectrum to be the red peak. We then use a 120 km s −1 wide sliding window to select the first spectral pixel that is no longer descending as line center. Specifically we select the pixel that is lower than the minimum of all other blueward pixels within the window plus their error.
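One possible reading of this single-peak procedure is sketched below. The function name is hypothetical, and the acceptance test (a pixel is chosen when its flux is below the smallest value of flux plus error among the pixels up to 120 km/s blueward of it) is an interpretation of the description, not a transcription of the LASD code.

```python
def line_center_index(vel, flux, err, window=120.0):
    """Walk blueward from the highest pixel until the profile stops descending."""
    peak = max(range(len(flux)), key=flux.__getitem__)
    for i in range(peak - 1, -1, -1):
        # flux + error of the pixels blueward of i that fall inside the window
        blue = [flux[j] + err[j] for j in range(i) if vel[i] - vel[j] <= window]
        if blue and flux[i] < min(blue):
            return i
    return 0  # fall back to the blue edge of the spectrum
```

On the wing of the line every blueward neighbour is fainter, so the test fails; it first succeeds where the profile has flattened into the noise, which is why the estimate sits at the blue edge of the peak.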
For both the continuum removal and the redshift estimate, we explored a variety of different algorithms and parameter combinations and found that the ones described here work well. Note that if the true systemic redshift of a spectrum is supplied at upload, we still carry out the redshift estimation and subsequent analysis. In these cases the LASD will estimate all the spectral analysis quantities (see below) using both the measured and estimated z sys , and stores the measurements in two tables. This allows for a comparison of the resulting spectral quantities, homogenization of methods, and an evaluation of the applied redshift estimation algorithm.
For each spectrum we compute a range of spectral quantities, summarized in Table 1. They can be grouped in five categories:
1. Global quantities such as the continuum level (F_c), the luminosity density at line center (F_lc), the total luminosity (L_tot) or the equivalent width (EW) of the spectrum. They are given in units of erg s −1 (km/s) −1 , erg s −1 , and Å.
2. Peak positions (starting with the x_ prefix) and the resulting differences (Dx_). We define these positions as the point of the maximum luminosity density on the red/blue side (_max suffix) as well as the first moment of the (continuum removed) flux distribution (_mean suffix). They are given as velocities in km s −1 .
3. Maximum luminosity densities (F_) and luminosities of the blue / red side (L_). If two peaks exist, we also report the luminosity density in the 'valley' between the peaks (F_valley). Apart from the absolute values, we also report some ratios between them (R_) which are useful for comparison. They are given in the same units as the 'global quantities' above.
4. The width of the peaks, for which we use the full width at half maximum (FWHM_) as well as the second moment of the continuum subtracted flux distribution (W_). Again, luminosities are given in erg s −1 and luminosity densities are given in erg s −1 (km/s) −1 .
5. We also compute the skewness of each peak, for which we use Pearson's moment coefficient of skewness, i.e.,
$$\mathrm{skew} = \frac{\sum_i F_i \,(x_i - \bar{x})^3}{\sigma^3 \sum_i F_i},$$
where the sum is taken over the red / blue side, $F_i$ is the continuum-subtracted luminosity density at velocity $x_i$, and $\bar{x}$ ($\sigma$) are the first (square root of the second) moment of that side.
Figure 2 shows a visual representation of some of these measurements. We elected to use purely non-parametric properties (such as moments and weighted luminosity densities) as opposed to parametric fitting for several reasons (see also discussion in Herenz et al. 2020, and references therein). The primary reason is that we require the LASD analysis pipeline to be fully automatic, and ensuring the stability of non-supervised parametric model fits is non-trivial. The second reason is that the large variety of spectral profiles seen in Lyα is difficult to capture in parametric models, especially when model selection and tweaking need to happen in a non-supervised fashion. Additionally, this complexity leads to disagreement about which functional shapes best model the line.
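The non-parametric moments used for the peak positions, widths, and skewness can be written compactly. The sketch below assumes flux-weighted moments of the continuum-subtracted profile on a velocity grid; the function name is hypothetical and the code is not taken from the LASD package.

```python
from math import sqrt


def moments(vel, flux):
    """Flux-weighted mean, standard deviation, and Pearson skewness."""
    total = sum(flux)
    mean = sum(f * v for v, f in zip(vel, flux)) / total
    var = sum(f * (v - mean) ** 2 for v, f in zip(vel, flux)) / total
    sigma = sqrt(var)
    skew = sum(f * (v - mean) ** 3 for v, f in zip(vel, flux)) / (total * sigma ** 3)
    return mean, sigma, skew
```

To measure a single peak, one would pass only the pixels on the red (v > 0) or blue (v < 0) side of the adopted line center. Negative fluxes left over from continuum subtraction can make the weights ill-behaved, so a real implementation needs to decide how to treat them.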
In order to quantify the uncertainty of the computed spectral quantities, we repeat the calculation 100 times and in between 'shuffle' the spectrum. That is, we draw a new flux in each bin from a Gaussian with mean and standard deviation being the reported flux and error, respectively. We then repeat the redshift estimation process, and if the systemic redshift (and uncertainty) is given by the user, draw a new redshift from a Gaussian defined by these values.
Ultimately, this procedure yields (i) a redshift estimate plus uncertainty, (ii) a set of spectral quantities using this computed systemic redshift as well as their uncertainties, and, if an independent systemic redshift has been uploaded by the user, (iii) another set of these quantities plus uncertainties.
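The resampling scheme described above amounts to a simple Monte Carlo loop. A minimal sketch, assuming independent Gaussian errors per bin and summarizing the draws by their mean and sample standard deviation (the summary statistics actually reported by the LASD are not specified here, and the function name is hypothetical):

```python
import random
from statistics import mean, stdev


def monte_carlo(flux, err, quantity, n_draws=100, seed=None):
    """Recompute `quantity` on perturbed spectra and summarize the draws."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_draws):
        # Draw each bin from a Gaussian centred on the reported flux.
        perturbed = [rng.gauss(f, e) for f, e in zip(flux, err)]
        samples.append(quantity(perturbed))
    return mean(samples), stdev(samples)
```

For example, `monte_carlo(flux, err, sum)` returns the mean and scatter of the integrated flux. In the full procedure the redshift estimate would be repeated inside the loop as well, so that its uncertainty propagates into every velocity-dependent quantity.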
The database is specifically designed to hold observational spectra but the same analysis of simulated Lyα spectra will enable simple comparison between observations and simulations. For this reason we have made the analysis pipeline applied by the LASD available as an open source software package.
We initially populate the LASD with a large number of Lyα spectra from two main archival sources, which we describe here. We use two of the largest repositories of publicly available data, with the aim to cover both low and high redshifts with relatively homogeneous data. At the low-z end we use data obtained with the Cosmic Origins Spectrograph (COS; Green et al. 2012) onboard the Hubble Space Telescope, obtained through the Barbara A. Mikulski Archive for Space Telescopes (MAST). For high-z galaxies we use publicly available data obtained with the Multi-unit Spectroscopic Explorer (MUSE; Bacon et al. 2010), mounted at Unit Telescope 4 of ESO's Very Large Telescope (VLT), obtained through the VizieR database (Ochsenbein 1996). These are also the same spectra analyzed in Hayes et al. (2020), for which preliminary versions of the LASD software were also used. We stress that while these samples are large and comprise various selection functions, they are neither complete nor unbiased. We now discuss the HST and VLT spectra in turn.
3.1. HST/COS spectra at z < 0.44
All the low-z galaxies were pre-selected for observation based upon known characteristics, and have the advantage of having well-measured spectroscopic redshifts, usually derived from optical line emission. The COS has targeted hundreds of galaxies with numerous General Observer (GO) and Guaranteed Time Observations (GTO) programs, using various spectral settings. The most common of these settings are the medium resolution gratings G130M and G160M, which span wavelengths of approximately 1150-1450 Å and 1350-1750 Å, depending upon the elected central wavelength setting (CENWAVE). This places an upper limit on the Lyα redshift of 0.44, although there is a natural bias towards lower z that results from various sample-selection and sensitivity issues. As the Earth's upper atmosphere also glows in Lyα (with higher surface brightness than any astrophysical source), all G130M spectra are contaminated by a geocoronal Lyα emission feature at λ = 1215.67 Å. We therefore place a lower limit on the recession velocity of our targets of 2500 km s −1 in order to separate Lyα from the geocoronal feature, although in practice the lowest redshift system included is Haro 11 with z = 0.02 (6000 km s −1 ). Our sample comprises data from the following surveys, in approximately chronological order of observation:
• GO 11522 and 12027 (PI: Green). These galaxies stem from the COS GTO programs to study Lyα in low-z (0.02-0.06) starburst galaxies, from the Kitt Peak International Spectroscopic Survey (Salzer et al. 2001). They were primarily Hα-selected, have star-formation rates of ≈ 0.1 to 10 M⊙ yr −1 , and Lyα is captured by the G130M grating. They were first published in Wofford et al. (2013).
• GO 11727 and 13017 (PI: Heckman). These galaxies were observed in order to understand the UV properties (e.g. stellar continua and interstellar absorption lines and wind/outflows) in low-z objects (0.09 < z < 0.21) with properties analogous to those of Lyman Break Galaxies. They were selected from the GALEX and SDSS surveys to overlap with LBGs in terms of their SFRs (≈ 0.3 to 60 M⊙ yr −1 ), UV compactness, and metallicity. They were observed with both G130M and G160M gratings, and spectra are published in Heckman et al. (2011, 2015).
• GO 12269 (PI: Scarlata). This sample is the only low-z study that was originally selected by Lyα emission, which was obtained using slitless spectra from the GALEX satellite (Cowie et al. 2010, 2011). They were observed with COS in order to study the Lyα emission profiles at higher spectral resolution with the G160M grating, lie at 0.19 < z < 0.34, and have SFRs of ≈ 1 − 100 M⊙ yr −1 . A stack of all these spectra is presented in Figure 8 of Songaila et al. (2018).
• GO 12583 (PI: Hayes). These galaxies were selected in order to study the Lyα morphology with HST imaging, as part of the
• GO 12928 (PI: Henry). These galaxies were selected from the first catalogs of starbursts known as 'Green Peas' (Cardamone et al. 2009) which are particularly compact (hence 'peas') and show exceptionally high equivalent width of optical [O iii]+Hβ emission lines (giving them a green observed color at 0.18 ≲ z ≲ 0.44). They were followed up with COS to study the Lyα profiles and outflows/winds. Because of this selection, they occupy a narrow range in SFRs and metallicities (SFR = 5-25 M⊙ yr −1 ; 12+log(O/H) ≈ 7.9−8.1); spectra (G160M for Lyα) are published in Henry et al. (2015).
• GO 13293 and 14080 (PI: Jaskot). The aim was to study the Lyα emission and proxies for the neutral gas column density (as a proxy for the escape of ionizing radiation) in a sample of green pea galaxies with exceptionally ionizing stellar populations (defined by having very high line ratios in the optical). They have redshifts of 0.027 < z < 0.14 which places Lyα in both the G130M and G160M gratings, depending upon redshift and in turn program ID. These spectra are published in Jaskot & Oey (2014) and Jaskot et al. (2017).
• GO 14201 (PI: Malhotra). These galaxies are also a sub-set of the green peas, and were selected specifically to study the Lyα output of galaxies as a function of various other properties. They have SFRs of 4-40 M⊙ yr −1 and redshifts of 0.18 < z < 0.33, which places Lyα in the G160M grating. Spectra are published in Yang et al. (2017), although note that this paper also compiles spectra from many of the programs mentioned above, including 11727, 12928, and 13293.
• GO 13744 (PI: Thuan), 14635, and 15136 (PI: Izotov). The first two programs were designed to study the ionizing emission from Green Pea galaxies (13744) and GPs with extreme [O iii]/[O ii] ratios (14635). This places them at somewhat higher redshifts, z = 0.29 − 0.43, and redshifts Lyα into the G160M grating. All these galaxies emit a substantial fraction of their Lyman continuum radiation. The final program was designed to study the Lyα emission from similar objects (15136), but concentrated at lower z (0.03-0.07), placing Lyα in G130M. These galaxies have SFRs of 15-40 M⊙ yr −1 and spectra are published in Izotov et al. (2016, 2018, 2020).
We are mainly concerned with the Lyα emission from star-forming galaxies, and do not consider programs targeting active galactic nuclei, AGN (or those where the probability of AGN inclusion is high; e.g. GO 12533 and 13407, PI: Martin). There are also a number of galaxies with Lyα data from COS, but for which only low resolution spectra have been obtained with the G140L setting. We do not consider these spectra for the initial population of the database.
We obtained all these data from the MAST archives, reprocessing everything homogeneously with Version 3.3.7 of the calibration pipeline (CALCOS). We first check the centering of the galaxies in the COS near ultraviolet acquisition images, and the central wavelength of the geocoronal emission lines in the extracted spectra for every integration, to ensure an accurate wavelength solution. We reject a very small number of individual exposures that have anomalously short integration times or shutter failures. We then use a custom script to combine the individual spectra for each system, conservatively rejecting all spectral pixels with data quality (DQ) flags not equal to zero. We examine the error spectrum for each spliced spectrum, and contrast it with the error spectrum expected from the galaxy spectrum and Poisson statistics; we then follow the method outlined in Section 3.3 of Henry et al. (2015) to recompute the error spectrum, which differs significantly from expectation in the cases of poorly exposed spectra. We finally rebin the signal and error spectra to critically sample their native spectral resolution (simply binning by a uniform factor of six spectral pixels), although ultimately this process is only aesthetic and should not affect the quantities derived by the LASD algorithms. The final intrinsic resolving power (R ≡ λ/∆λ) varies between 13,000 and 19,000 depending upon grating, precise wavelength of redshifted Lyα, COS lifetime position, and the size of the Lyα-emitting region with respect to the COS aperture.
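The uniform rebinning step can be illustrated with a short sketch. It assumes that flux densities are averaged within each bin and that pixel errors are independent, so that they combine in quadrature; the function name is hypothetical and this is not the custom script used for the COS data.

```python
from math import sqrt


def rebin(flux, err, factor=6):
    """Average flux in blocks of `factor` pixels and propagate the errors."""
    n_bins = len(flux) // factor  # trailing pixels that do not fill a bin are dropped
    out_flux, out_err = [], []
    for k in range(n_bins):
        fs = flux[k * factor:(k + 1) * factor]
        es = err[k * factor:(k + 1) * factor]
        out_flux.append(sum(fs) / factor)
        out_err.append(sqrt(sum(e * e for e in es)) / factor)
    return out_flux, out_err
```

With equal errors per pixel, the error of each rebinned point drops by a factor of √6, while integrated quantities are unchanged, consistent with the remark that the rebinning should not affect the derived measurements.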
3.2. VLT/MUSE spectra at 2.9 < z < 6.6
MUSE has revolutionized high-redshift surveys for emission line galaxies since its installation at VLT. Because of its very large number of detectors, MUSE simultaneously has a very large field-of-view (60″ × 60″), a small pixel area (0.2″ × 0.2″), and long-baseline optical wavelength coverage (λ = 4800−9300 Å) at 1.25 Å sampling. Consequently, MUSE samples a cosmic volume of ≈ 10,000 Mpc³ for Lyα-emitters in every pointing and, because of its high throughput and the 8.2 m aperture of VLT, MUSE is a very efficient survey instrument.
MUSE has already been used for many Lyα emitter surveys, of various depths ranging from short, 1-hour observing blocks to the stupendously deep field of 190 hours. To initially populate the LASD we take the publicly distributed data from the MUSE-Wide survey (Urrutia et al. 2019), which comprises 44 MUSE datacubes in the CANDELS-Deep region of the GOODS-South field (see also . This data-release (DR1) contains 479 Lyα-emitting galaxies at z ≥ 2.9, compiled into a catalog including emission-line selected galaxies and by the extraction of spectra from photometrically pre-selected objects (e.g. Guo et al. 2013). We obtained all the MUSE-Wide spectra, reduced, identified and extracted by Urrutia et al. (2019), from the CDS/VizieR. For the analysis presented in this paper we further restrict ourselves to galaxies for which the lead-line is Lyα and the integrated SNR exceeds 8.

Ingestion into the LASD database
In principle the spectra could be uploaded to the LASD in the form described so far. However, as the focus is on emission line profiles and kinematic signatures, we restrict our catalogs to galaxies with strong Lyα lines and higher signal-to-noise. Naturally this modifies the selection bias towards more luminous galaxies at a given redshift. Specifically concerning the COS sample at low z, almost none of these galaxies were selected on their Lyα emission (only GO 12269; PI: Scarlata) and some are net absorbers of Lyα or have weak features because of high H i column densities (this is mainly true for the KISSR sample of Wofford et al. 2013). For both COS and MUSE systems, we retain only galaxies with net Lyα emission lines, defined as line flux detected at SNR ≥ 8 in a region of ±2500 km s −1 from the systemic redshift of Lyα. This reduced the number of COS spectra from 145 to 123, as some galaxies are Lyα absorbers. Using the same criterion, the MUSE-Wide sample is reduced from 479 to 234 Lyα-emitters, as many galaxies have SNR lower than quoted.
The LASD can accept spectra with either systemic redshifts (i.e. measured by other emission lines) or more approximate redshifts estimated from the Lyα line (see Section 2.3). For the COS samples we upload the spectra with known z sys , usually based upon nebular lines in the optical, where we compiled the redshifts from the papers listed in Section 3.1. For COS-observed low-z galaxies with SDSS spectra, we re-measure z sys using 20 of the strongest optical emission lines; for the remainder we refer to measurements presented in the papers listed in Section 3. For the MUSE-Wide sample we take the redshift estimates from Urrutia et al. (2019).

VALIDATIONS AND EXAMPLE SCIENCE CASES
Once we had uploaded the above datasets to the database we downloaded the resulting measurements, and in this section we demonstrate some results that can be derived directly from this dataset. In Figure 3 we show the distribution of luminosities of the uploaded Lyα emitters together with some representative spectra from both the COS and MUSE samples. It is directly evident from this figure that there is a large variety of Lyα spectral profiles in the database, ranging from double peaks to P-Cygni type profiles to single peak profiles. Single peak profiles are relatively more frequent in the high redshift samples, which could be due to resolution effects. However, it could also be due to blue peaks being preferentially absorbed in the increasingly neutral IGM at high redshifts (e.g. Hayes et al. 2020).

Redshift detection
One of the most crucial steps in the LASD processing pipeline is redshift determination, since many high redshift galaxies lack independently determined redshifts. In our initial dataset this applies to all MUSE-Wide galaxies. In order to check the accuracy of the automated redshift detection algorithm, we compared the estimated redshifts to the true systemic redshifts for the COS sample, where the redshifts are precisely and independently known from optical spectroscopy. The differences between the estimated and the true redshifts are shown in panel a of Figure 4. The differences show a relatively narrow distribution with a median value of -59 km/s and 25th (75th) percentile at -137 (37) km/s. This indicates a slight shift towards detecting redshifts lower than the true values, which is expected based on how our algorithm operates. Overall, however, the distribution shows no strong indication of any major systematic bias at COS resolutions.
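The comparison above can be reproduced from any table that contains both an estimated and a systemic redshift. The following is a minimal sketch, assuming that the velocity offset is defined as Δv = c (z_est − z_sys) / (1 + z_sys); the exact definition used for Figure 4 is not stated here, and the function names are hypothetical.

```python
import statistics

C_KMS = 299_792.458  # speed of light in km/s


def velocity_offsets(z_est, z_true):
    """Redshift differences expressed as velocities in km/s.

    Assumed definition: dv = c * (z_est - z_true) / (1 + z_true).
    Negative values mean the estimated redshift is too low.
    """
    return [C_KMS * (ze - zt) / (1.0 + zt) for ze, zt in zip(z_est, z_true)]


def offset_summary(dv):
    """Median and quartiles of a list of velocity offsets (needs >= 2 values)."""
    p25, median, p75 = statistics.quantiles(dv, n=4, method="inclusive")
    return {"p25": p25, "median": median, "p75": p75}
```

With this sign convention, a negative median corresponds to the slight shift towards lower redshifts described in the text.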
However, we must also take into account the fact that the redshift determination algorithm is sensitive to the spectral resolution of the spectrograph. We cannot use the actual MUSE spectra to estimate the size of this effect, since they do not have independent redshift determinations. We therefore create artificial low resolution spectra by convolving the COS spectra with a kernel corresponding to R ∼ 4500 and rebinning the spectra to the Nyquist sampling for this resolution. This kernel, combined with the effective resolution of COS for the extended Lyα emission of these low-z galaxies, corresponds roughly to the spectral resolution of MUSE. The resulting convolved spectra were then run through the redshift detection algorithm again, and the differences between the LASD-estimated and true redshifts are shown in panel b of Figure 4.
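The degradation procedure described above can be sketched as follows. This is a minimal sketch and not the code used for Figure 4: we assume Gaussian line spread functions (so that resolutions combine in quadrature), a uniform wavelength grid, a kernel width evaluated at the central wavelength, and Nyquist sampling of two pixels per FWHM. All function names are hypothetical.

```python
import math

FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def combined_resolution(r_a, r_b):
    """Effective R of two Gaussian broadenings applied in sequence."""
    return 1.0 / math.sqrt(1.0 / r_a**2 + 1.0 / r_b**2)


def gaussian_kernel(sigma_pix, n_sigma=4.0):
    """Normalized Gaussian kernel truncated at n_sigma standard deviations."""
    half = max(1, math.ceil(n_sigma * sigma_pix))
    weights = [math.exp(-0.5 * (k / sigma_pix) ** 2) for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def degrade_resolution(wave, flux, r_kernel):
    """Convolve flux with a Gaussian of FWHM = lambda / r_kernel."""
    dw = wave[1] - wave[0]                      # assumes a uniform grid
    fwhm = wave[len(wave) // 2] / r_kernel      # evaluated at the central pixel
    kernel = gaussian_kernel(fwhm * FWHM_TO_SIGMA / dw)
    half, n = len(kernel) // 2, len(flux)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # repeat edge pixels
            acc += weight * flux[j]
        out.append(acc)
    return out


def rebin_nyquist(wave, flux, resolution):
    """Average pixels into bins of roughly FWHM / 2."""
    dw = wave[1] - wave[0]
    fwhm = wave[len(wave) // 2] / resolution
    step = max(1, round(fwhm / 2.0 / dw))
    new_wave, new_flux = [], []
    for start in range(0, len(wave) - step + 1, step):
        new_wave.append(sum(wave[start:start + step]) / step)
        new_flux.append(sum(flux[start:start + step]) / step)
    return new_wave, new_flux
```

In this picture, the effective resolution of a degraded spectrum would be `combined_resolution(R_input, R_kernel)`, where `R_input` is the resolution of the input spectrum, which is why the result is lower than the kernel value alone.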
Panel b shows that at the lower resolution the distribution of differences is no longer entirely symmetric but shows a skew and a small systematic offset on the negative side. This means that for resolutions below R ∼ 5000 we are in general finding redshifts that are slightly too low. This is expected, since for all single peak profiles the algorithm detects the blue edge of the Lyα line, which is shifted towards the blue as the profile is broadened at lower resolution. The distribution is also somewhat broadened compared to the high resolution case, most likely because the large variety of spectral profiles respond differently to the decrease in spectral resolution. For instance, double peak profiles that blend together and become unresolved at the lower spectral resolution will cause the left edge of the profile to move considerably blueward compared to the original valley position. The prevalence of this effect will strongly depend on the properties of the input spectrum, such as the intrinsic peak separation, and on the resolution of the spectrograph. Simple testing on our high resolution sample shows that significant loss of blue peaks starts to occur below spectral resolutions of ∼ 4000, which is in agreement with the results of Verhamme et al. (2015); see Figure 7 in Appendix A.

Distribution of Lyα properties
In this section we present some of the distributions of Lyα properties in our initial sample, as well as some of the correlations present in our homogeneously measured dataset. This is not an exhaustive examination of all the correlations present in the dataset, and we encourage the reader to download the data and explore further. Figure 5 shows the distributions of total Lyα luminosity versus equivalent width (EW). We first note that the EW and the luminosity do correlate for both the low and high redshift galaxy samples. It is also clear that the low-z COS sample spans a wider range in luminosity and EW. This is expected, since the selection functions for the COS galaxies are much more diverse than the Lyα selection of MUSE. It also seems that the slope of the luminosity-EW relation is shallower for the high redshift galaxies, which is likely because of the very different selection functions of the low- and high-z datasets.
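Readers who download the tabulated measurements may wish to quantify such correlations themselves. The following is a minimal sketch of a Spearman rank correlation using only the Python standard library; it is not the analysis behind Figure 5, and the input lists stand in for luminosity and EW columns whose names in the downloaded table we do not assume here.

```python
import math


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = 0.5 * (i + j) + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

A rank-based statistic is convenient here because it is unchanged by taking logarithms of the luminosity or EW, so the result does not depend on how the axes are scaled. Computing it separately for the low- and high-z samples avoids mixing their different selection functions.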
Another illustrative example of the available data is shown in Figure 6, which shows the ratio of the luminosities blueward and redward of line center compared to the width of the red Lyα peak. The figure again illustrates the differences between the two samples, with the COS galaxies (plotted in blue) showing a much larger spread, particularly in FWHM, compared to the MUSE sample. This is partly expected since the spectral resolution, R, of MUSE is much lower than that of COS, causing the line to be broadened. However, the observed FWHMs range from 100 km/s to ∼500 km/s, reaching values much broader than the instrumental resolution of ∼150 km/s. The most probable cause of this difference is that the COS sample contains galaxies that are significantly less luminous than those in the MUSE sample and therefore are likely to have smaller intrinsic velocity dispersions. This is also corroborated by the data, which show that the MUSE galaxies do have comparable FWHM to COS galaxies of similar luminosities.

Limitations
While homogeneous measurements for a large set of galaxies provide opportunities for unique insights into the properties of Lyα radiation, there are some limitations to keep in mind when interpreting measurements and correlations. The first of these is the difficulty of accurately determining redshifts from the Lyα spectral line. While we demonstrated that the redshift detection is robust and shows no major systematic deviations across the whole sample, there are still uncertainties for any single galaxy, and these will propagate into some of the measured quantities, such as the peak positions. The automatic redshift detection will also likely cause the fraction of luminosity on the blue side of Lyα to be systematically underestimated.
There is also an additional bias originating from spectral resolution effects, which directly impact not only the redshift detection but also many of our measured quantities, such as second moments and FWHMs.
We finally note that observations may be obtained with any kind of spectrograph (slits, fibers, integral field, etc.). As apertures can vary in size, and so can the atmospheric seeing, slit losses will differ from observation to observation, especially for the extended Lyα line. Lensed galaxies could be even more affected. Given the number of possible choices to be made, we do not record information pertaining to aperture definition, but caution the community that aperture effects will be at play and will affect the photometry at an uncertain level.

OUTLOOK
The usage of Lyα in astronomy has transitioned from purely theoretical to heavily data driven. New instruments at large telescopes such as MUSE, the Keck Cosmic Web Imager (KCWI; Morrissey et al. 2012, 2018), and XSHOOTER (Vernet et al. 2011) have increased the number of observed Lyα spectra by orders of magnitude in recent years. On the theoretical side there is also steady progress, with new analytic solutions (Dijkstra et al. 2006) and radiative transfer codes (Smith et al. 2017; Michel-Dansac et al. 2020) available to the community. With this progress it becomes increasingly important that the individual pieces of knowledge become better connected, i.e., that newly acquired data are compared to existing data, and that theory is compared to data.
A major hurdle to overcome is the availability of Lyα spectra. While some telescopes do have dedicated archives, the reduced spectra are not easily accessible. Furthermore, different definitions of the same quantities have developed over the years, which complicates comparisons.
In this work, we have presented the Lyman Alpha Spectral Database (LASD). The database consists of analysis software and a web portal, which allow access to homogeneously measured Lyα line quantities and Lyα spectra, as well as the upload of new spectra, which will then be analysed automatically. The database was designed to increase access to comparison samples for both observational and theoretical work and to facilitate the sharing of data across research groups. We have populated the database with a sample of 332 archival spectra, which we also present in this paper.
The LASD is intended to be a tool for the Lyα community to use, and in order for it to be as useful as possible we encourage the reader both to upload new spectra and to explore the LASD dataset. We highlight that when users upload a spectrum they have the choice to share the full spectral data or only the LASD measured quantities. Furthermore, we encourage users to cite the original observational papers when using the LASD, and we provide a BibTeX file containing these references for convenience.
Given acceptance by the community, we plan to expand the LASD to feature more measurements, more built-in data exploration tools, improved links to auxiliary data, broader upload file specifications, and other improvements suggested by the users. Input from the community is both welcome and encouraged.

Illustration of the LASD algorithm's ability to characterize a spectrum as double peaked as a function of the spectral resolution. The blue line shows the fraction of spectra that have a blue peak and red peak detection in more than 95% of Monte Carlo iterations, and the black line shows the fraction with detections in 99% of iterations.