UV/Vis+ photochemistry database: Structure, content and applications

The “science-softCon UV/Vis+ Photochemistry Database” (www.photochemistry.org) is a large and comprehensive collection of EUV-VUV-UV–Vis-NIR spectral data and other photochemical information assembled from published peer-reviewed papers. The database contains photochemical data including absorption, fluorescence, photoelectron, and circular and linear dichroism spectra, as well as quantum yields and photolysis related data that are critically needed in many scientific disciplines. This manuscript gives an outline regarding the structure and content of the “science-softCon UV/Vis+ Photochemistry Database”. The accurate and reliable molecular level information provided in this database is fundamental in nature and helps in proceeding further to understand photon, electron and ion induced chemistry of molecules of interest not only in spectroscopy, astrochemistry, astrophysics, Earth and planetary sciences, environmental chemistry, plasma physics, combustion chemistry but also in applied fields such as medical diagnostics, pharmaceutical sciences, biochemistry, agriculture, and catalysis. In order to illustrate this, we illustrate the use of the UV/Vis+ Photochemistry Database in four different fields of scientific endeavor.


Introduction
Photochemical data and information such as absorption spectra, fluorescence spectra, photoelectron spectra, circular and linear dichroism spectra, quantum yields etc. are important parameters needed in many scientific disciplines. Back in 1999 there was deemed to be a need for publicly accessible on-line databases containing such data and information in digital format (machine-readable). A first "UV/Vis Spectra of Atmospheric Constituents" CD-ROM [1] was published which contained at that time the largest collection of UV/Vis spectral data available free-of-charge. Based on this CD and the motivation to provide spectral data and information in digital format to the scientific community via the World Wide Web, the "UV/Vis Spectra Data Base" went on-line in August 2000 as a non-profit project. In the beginning, the on-line database contained about 1200 spectra/datasheets for 120 substances and the compiled data extended beyond atmospheric research to allow for interdisciplinary application. Although other databases existed at that time, they were mainly focused on infrared studies like HITRAN [2,3] or GEISA [4,5] or were focused on providing UV cross sections of gaseous molecules of atmospheric interest [6].
To enable platform independent usability, both the spectral data as well as the datasheets (meta-data such as publication, authors, source, wavelength range, temperature, pressure, phase etc.), are available as plain ASCII text. To guarantee the high quality standard of the fast growing "UV/Vis + Spectra Data Base", an international "Scientific Advisory Group" (SAG) was established in 2004, and the database was operated in accordance with the "Open Access" definitions and regulations of the CSPR Assessment Panel on Scientific Data and Information (International Council for Science, 2004, ICSU Report of the Committee on Scientific Planning and Review Assessment Panel on Data and Information [7]). Since 2004, in addition to publishing the on-line database every 2 years, a mirror of the on-line database has been published on CD-ROM. The latest edition in the " science-softCon UV/Vis + Spectra Data Base" series was published in 2019 [8]. The on-line database currently (as of January 2020) contains about 14,000 spectra/datasheets as well as 5200 graphical representations for about 3000 substances and is subdivided into 28 substance groups (e.g. hydrocarbons, pharmaceuticals, pesticides, polycyclic aromatic hydrocarbons, etc.). The database is updated weekly. In addition to the inclusion of new data, a main focus of the database is the preservation of data from older publications.
As mentioned by the CSPR Assessment Panel on Scientific Data and Information, database maintenance and management are costly [7]. Collection of data, preparation of metadata, and provision of professional data management expertise and institutional support for data dissemination and permanent archiving will add to the overall expense of specific research projects and maintaining the larger research infrastructure.
"Full and open access" to data implies equitable, non-discriminatory access to all data that are of value for science. It does not necessarily equate 'free of cost' at the point of delivery.
There are several economic models for providing scientists with access to data for research and education [7]. The "UV/Vis + Photochemistry Database" allows free and open access to all meta-data, and cost-recovery pricing for data (or data licenses) in order to support the full data infrastructure. Different charged subscriptions giving full-access to the data are available: for example a yearly campus-wide license provides full access to all data and information for less than 1 USD per day (for universities, governmental organizations, nonprofit organizations) and a "One-time registration" license allows the subscriber to indefinitely have access to all data and information. Both licenses includes a copy of the 12th edition of the "UV/Vis + Spectra Data Base" CD-ROM [8]. In addition, those colleagues who support us in maintaining the database through the provision of new or missing data and information can get personal free-of-charge access to all data and information. More information is available at www.photochemistry.org.

Database structure and content
The database contains spectral information (gas, liquid and solid phase) from extreme ultraviolet to near infrared spectral region (EUV-VUV-UV-Vis-NIR) and related data (e.g. information concerning publications on quantum yield studies or photolysis studies) from published peer-reviewed papers. Besides absorption spectra, which comprise most of the available data, fluorescence spectra, photoelectron spectra, circular and linear dichroism spectra, quantum yields etc. are available. The database is structured into 28 categories which outlines only a rough classification.
The data sheets provide meta-data (substance name, formula and CAS number, data source, full reference, including title, authors, journal and DOI when available, spectral range and resolution, temperature, pressure, phase, etc.), as well as data in various forms obtained and presented in the literature. This includes, for example, absorption data measured over a specific wavelength/energy range in tabulated form. In many applications (e.g. quantum yield studies or photolysis studies), the absorption cross section (σ) or the molar extinction coefficient (ε) at a specific wavelength (λ) is determined, and these single wavelength data are also included in the database. For many substances temperature dependent data are available.
Most of the available data are from published peer-reviewed papers (> 98%), data presented at scientific meetings and conferences are also available (< 1%), as well as data from PhD theses, reports and unpublished material (< 1%). As an example of the database structure and contents, absorption data of carbonyl fluoride (COF 2 ) from three different sources are presented in Fig. 1. The data sets are as provided by the authors or listed in the relevant publications. To enable a platform independent usability, all data are provided as plain ASCII text.
During more recent years, almost 3000 graphical representations, mostly from older publications have been digitized and added to the database. In addition, we have converted more than 100 datasets from "floppy discs" and hence prevented these data from being lost as technology has evolved.
Since 2019 the database has been extended to include circular and linear dichroism spectral data (Fig. 2). The absorbance curves were recorded with the electric vector of the sample beam parallel and perpendicular to the stretching direction of the polyethylene polymer.
In addition to the data relevant to the different species, the following data and information (including software) are available to all interested scientists:

AutoChem
AutoChem is an automatic computer code generator and documentor for chemically reactive systems written and updated by David Lary of NASA Goddard Space Flight Center since 1993. It was designed primarily for modeling atmospheric chemistry, and in particular, for chemical data assimilation (see [13] and references herein).

Daily Solar Irradiances
The solar irradiance data was kindly provided by Judith Lean of the Naval Research Laboratory (NRL). Daily files are available from 1975 to 2004. The solar spectra in 203 bands from 0.121 -0.859 μm are given. MAS was part of NASA's Atmospheric Laboratory for Applications and Science (ATLAS) Spacelab shuttle mission and provides a database for the study of changes in the middle atmosphere. MAS is a remote sensing instrument for passive sounding of the Earth's atmosphere, designed to study the formation and destruction of ozone by measuring emission lines of ozone, water vapor, and chlorine monoxide from the space shuttle. The millimeter-wave radiation emitted by the atmosphere in the altitude range between 10 km and 100 km has been measured at 61, 62, 63, 183, 184 and 204 GHz.
SUMER was a UV telescope and spectrometer designed for high-resolution observations of the Solar atmosphere in the extreme ultraviolet wavelength range from 50 to 160 nm. The instrument was part of the payload on-board the ESA/NASA spacecraft SOHO.

Database applications
The interdisciplinary usability of the "UV/Vis + Photochemistry Database" is shown on the basis of four different applications.

Modeling direct phototransformation of aquatic organic contaminants under environmentally relevant conditions
An increasing number of synthetic organic compounds (SOCs) are being used and discharged to the aquatic environment. Under the influence of natural sunlight, some of these SOCs are likely to transform through photolysis to more toxic products or to be photopersistent. To meet the large data demand for risk assessment of SOCs, a cheminformaticsbased direct photolysis reaction library is being developed as part of the United States Environmental Protection Agency's Chemical Transformation Simulator (CTS) project to predict the direct photolytic transformation products formed from organic contaminants in waters [48]. Both the literature and data service of UV/vis + Photochemistry Database are useful in developing and applying such a predictive tool.
Central to the direct photolysis predictive tool is a large compilation of relevant literature about the direct photo-transformation pathways of SOCs. The science-softCon UV/Vis + Photochemistry Database's literature service has served as a convenient and reliable literature source. Up to Janurary 2020, 109 of the 390 compounds compiled by CTS's direct photolysis reaction library are logged in the UV/Vis + Photochemistry Database. Compared with the larger Reaxys database, which also has UV-Vis spectrum and photolysis information, the UV/Vis + Photochemistry Database is of no or low cost, focused on photochemistry, easier to locate relevant information, and, most importantly, has digitalized spectra for analysis.
When applying the direct photolysis predictive tool, users will be able to predict transformation products. However, direct photolysis rate information is also important to know whether a contaminant is degraded and whether its products can be formed at an environmentally relevant rate. The UV-vis spectrum and quantum yield logged in the science-softCon UV/Vis + Photochemistry Database can be used for such a rate estimation according to the following equation [17,18]: k ≈ 2.303 ∫ I λ ε λ φ λ dλ ≈ 2.303φΣI λ ε λ Δλ (1) Where k is the first-order rate constant for the direct photolysis of the contaminant in pure water, I λ is the solar irradiance at a certain wavelength λ, Ɛ λ is the molar extinction coefficient of the contaminant at wavelength λ, and φ λ is the quantum yield of the photodegradation process at wavelength λ. Δλ is often 1 nm if we use I λ and Ɛ λ at every relevant wavelength. The first "≈" in the equation assumes that the solution is optically thin (absorbance < 0.02) and ignores the effect of reflection on pathlength, and the second "≈" assumes that φ is the same across the corresponding wavelengths and uses finite Δλ for integration. Table 1 shows an example back-of-the-envelope estimation of the direct photolysis rate at water surface of a strongly-absorbing pesticide, trifluralin, a moderately-absorbing explosive compound, RDX (1,3,5-trinitroperhydro-1,3,5-triazine), and a weakly-absorbing pesticide, metolachlor using the information obtained from the science-softCon UV/Vis + Photochemistry Database and a reference solar spectrum of a mid-latitude mid-summer day [18]. The estimation was conducted manually but can also be performed using software such as GCSOLAR [19] and APEX [20]. An example of the detailed spectra is plotted in Fig. 3.a for trifluralin. All three compounds were predicted to photodegrade to products: three 1st-generation products for trifluralin as shown in Fig. 3.b, one for RDX, and three for metolachlor. However, the estimated half-lives suggest that metolachlor is less photolabile compared to trifluralin. Therefore, products of metolachlor are less likely to be photochemically formed in the environment compared to that of trifluralin, providing the ratios of product formation over parent degradation are similar and the photo-lability of 1stgeneration products are similar. As illustrated in this example, the science-softCon UV/Vis + Photochemistry Database provides a literature-based method of estimating direct photolysis rates, adding another piece of information in identifying photolabile contaminants and their product formation. A larger amount of more accurate data, especially in near-UV to visible range and in aquatic phase, would lower the current large uncertainty between rate estimation and experimental data (Table 1 for metolachlor) and eventually allow automation of such calculations.

Application of UV VUV absorption spectra for air quality, photochemistry and climate on Earth
The UV-VIS spectral region is of fundamental importance for understanding air quality, the photochemistry of the troposphere and stratosphere, and cloud coverage, and their implications for climate change. In particular, by using the unique absorption signatures of gas species (see Fig. 4), abundances of a range of air pollutants like ozone (O 3 ), nitrogen dioxide (NO 2 ), formaldehyde (HCHO), and glyoxal (CHOCHO) (see Fig. 5) can be inferred from satellite UV-Vis spectroscopy measurements, as well as, for example, species involved in the destruction of polar ozone (BrO, IO and OClO). Satellites' observations can be used together with climate models to understand the relationship between the atmospheric composition and climate. The combination of UV/Vis radiation allows the retrieval of atmospheric parameters that are key in the research of climate, air quality and polar chemistry: • Relevant chemical species in the troposphere; • Spectral optical density of aerosols in a full spectral range; • Cloud cover; • Cloud height.
The continuous improvement of data provided by the "science-softCon UV/Vis + Photochemistry Database" is fundamental in achieving the required accuracy of the ever more demanding spectral measurements of the atmosphere.

Application of UV and VUV absorption spectra for breath gas analysis
The study of Secondary Organic Aerosols (SOAs) which are the reactive products of gasphase photo-oxidation of both naturally-occurring and man-made volatile organic compounds (VOCs) can be studied in the UV-VUV spectral region. Volatile organic compounds are also released by animal and human organisms with normal metabolic activity or due to pathological disorders because of diseases, infections and/or internal injuries. Breath gas analysis, which works on the principles of understanding changed level of concentrations of VOCs in exhaled breath from human beings or animals, can indicate the onset or progression of a disease [31]. For example in human beings, the presence of acetone in the breath can be observed in the case of untreated or poorly treated diabetes [32], and pentanes and carbon disulfide have been associated in case of schizophrenia [33] etc. However there are many short-comings with existing gas detection techniques such as gas chromatography, mass spectroscopy, and infrared spectroscopy which include complications such as sample pre-concentration, imperfect existing profile recognition, time-consuming, nearly impossible real-time measurements, limited characteristic VOC identification and detectable components, and definite differentiation of isomeric and isobaric molecules [34]. These limitations are compounded by complications as VOCs in breath gas samples are governed by a combination of their concentration in blood, the retention time of the compounds in the lung and airway tissue (inhalation -exhalation cycle), and last but not least, the VOC blood/air partition coefficient [31]. To overcome these limitations for using breath gas analysis as a diagnostic tool, a new concept of using GC-MS technology in conjunction with ultraviolet absorption spectroscopy is being evaluated by developing laboratory equipment [35][36][37]. In particular the use of the vacuum ultraviolet (VUV) region improves the detection of such compounds, because absorption cross sections in this spectral range are usually larger by orders of magnitude than in the UV or in the IR. This is illustrated in Fig. 6, where the VUV-UV cross-section of benzene is shown. The higher sensitivity in the VUV range allows fast and sensitive spectroscopic detection. Moreover it is also possible to derive structural and isomer-related information from the VUV spectra. Especially small oxygenated substances like short chain fatty acids, small n-alkanols (C2-C4), esters, aldehydes and ketones, which are abundantly present in breath gas, show good VUV sensitivity with rich structural information [38]. Isoprene has been reported as a candidate for monitoring cholesterol metabolism and has been successfully measured in breath [39].
The existing ultraviolet absorption spectra and accurate spectroscopic information of molecules available from "science-softCon UV/Vis + Photochemistry Database" play a major role in identifying their presence and finger printing them in a specific isomeric or isobaric phase as a good starting point. Understanding the presence of different molecules and their concentrations in exhalation and inhalation cycles in human beings and animals as a consequence of biochemical changes is an important driver for developing analytical methods such as breath gas analysis with UV-VUV spectroscopy as an analytical tool for medical diagnostics. Spectroscopic investigations of biologically important molecules cataloged in the present database and their spectral responses in mixtures with atmospheric gasses provide important inputs in understanding their role in physiological processes. These inputs in turn generate important data required for understanding the presence of disease, disease management and usefulness of new drugs under trial for a specific disease.

Remote sensing and modeling of Mars and Venus atmospheres
Today more and more complex instruments are used for remote sensing of the atmosphere of the Earth and other planets of our Solar System or even further, with the development of missions and instrumentation, not only to search for exoplanets, but now to deliver spectroscopic survey of their atmospheres. In particular, instruments probing the UV-visible spectral range have been used to characterize the atmospheric composition of a wide variety of planets. For example, the SPICAM (SPectroscopie pour l'Investigation des Caractéristiques Atmosphériques de Mars) instrument on board the ESA's mission Mars Express, is a remote sensing spectrometer observing Mars in the ultraviolet and the near infrared [41]. The UV range covers the absorption due to CO 2 and O 3 (118-320 nm). The instrument probes the Martian atmosphere in nadir geometry (looking down to the surface of the planet) or through stellar and solar occultation. In solar occultation (i.e. looking at the Sun through the atmosphere), the signal received by the instrument is given by: where I Sun represents the light intensity of the source, here the Sun, placed at the starting point of the raypath situated at the distance s obs from the observer, and τ is the optical depth along the line of sight, generally expressed as τ λ, s 1 , s 2 = ∫ s1 s2 σ a (λ, P , T )n(s)ds (3) where σ a is the absorption cross-section and n is the density of the species. The integration is done along the line of sight, i.e. through the atmosphere, considering the temperature (T) dependence of the cross section, as temperature varies substantially from the lower layers to the upper layers. This spectroscopic investigation is based on the use of absorption crosssections measured in the lab and inventoried in databases such as the Spectra Database described in this work. In the case of SPICAM, we considered the temperature dependence of the CO 2 and O 3 absorption cross sections [42]. Fig. 7 illustrates a typical analysis of a set of spectra obtained during one solar occultation observation, in which the signature of ozone is quite easily observed. A similar approach was considered for the analysis of the UVIS channel of the NOMAD instrument on board ExoMars TGO [43].
However, the accurate knowledge of cross-sections is not only required for the direct analysis of the observations, but is also crucial for the modeling of such atmospheres, through 1D photochemical to 3D Global Circulation models. The continuity equation ∂n ∂t = P − nL − ∂Φ ∂z (4) governs the chemical composition within an atmosphere, where z is the altitude, n is the number density of the species, P and L are its production and loss rates, and Ф is its upward transport flux. The production and loss rates are provided from all chemical and photochemical reactions that produce or consume the species. In particular, the photodissociation rate (J) is given by: )σ a (λ, T (z))F (λ, z)dλ (5) where σ a is the absorption cross-section, φ is the quantum yield of the photodissociation process and F is the actinic flux at the altitude z where temperature T prevails. But absorption, and therefore the effect of the cross-section, is also hidden within F (λ,z) which corresponds to the radiation flux reaching the altitude z and is given by if no scattering is considered within the atmosphere. F 0 is the radiation flux at the top of the atmosphere, μ= cos(θ) with θ the solar zenith angle and τ(λ,z) is the optical depth which includes the absorption by all gasses, Rayleigh scattering and aerosols extinction from the top of the atmosphere down to the altitude z.
Available cross-sections are usually derived from ambient temperature measurements, some are provided at low temperatures corresponding to atmospheric conditions encountered on Earth. However, data at temperatures and pressures found on other bodies, in particular at high temperatures and low pressures, are very scarce and often restricted to small spectral ranges. This lack of data has a direct impact both on the retrieval of abundances using UV spectra (Equ.3), but also on the simulation of the photochemical processes within these atmospheres. In particular CO 2 is an important species which has a large impact on the photochemistry of atmospheres of Mars and Venus, controlling the transmissivity of the atmosphere and therefore the photolysis rates of the other species. The impact of the temperature dependence of CO 2 cross-section has been analyzed in several recent papers [44][45][46], showing the importance of considering the temperature dependence on a wide range of temperature as recently imported into the database [46,47]. Indeed, the absorption of the UV flux increases substantially together with temperature since CO 2 absorption cross sections are larger at higher temperatures. Using room temperature reference data to model CO 2 -rich atmospheres, the absorption of the UV flux and therefore the photochemical rates are underestimated.

Outlook
The "science-softCon UV/Vis + Photochemistry Database" is continually evolving and growing. As of January 2020, it includes about 14,000 spectra/datasheets as well as 5200 graphical representations for about 3000 substances and is subdivided into 28 substance groups (e.g. hydrocarbons, pharmaceuticals, pesticides, polycyclic aromatic hydrocarbons etc.) This is a tremendous effort and requires a lot of manpower, not to mention technical infrastructure. We hope that the database proves useful to the scientific community and will facilitate their day-to-day work.
Since the support by the scientific community is crucial for such a photochemistry database, we would like to encourage all colleagues to assist us in maintaining the database and join our initiative "Photochemical Data and Information Sharing Platform -Share Photochemical Data & Information, Find Answers".
This initiative should develop the photochemical database towards a photochemical data sharing platform. The advantage of such a photochemical data sharing platform is that the more scientists provide their data for inclusion into the database the better is the chance for all users to find specific photochemical data within the database. In addition, the platform becomes increasingly beneficial for use across multiple disciplines. UV absorption spectrum of COF 2 obtained by Noelle [9] (blue curve), Noelle et al. [10] (red curve), and Molina and Molina [11] (green squares).  Linear dichroism absorbance spectrum of dibenzo-p-dioxin [12]. The absorbance curves were recorded with the electric vector of the sample beam parallel (blue line) and perpendicular (orange) to the stretching direction of the polyethylene polymer.  Degradation rate estimation (a) and product prediction (b) for direct photolysis of trifluralin.  Left: Absorptions due to molecules that are now commonly measured from space in the nadir geometry; Right: Back scattered albedo spectra from GOME measurements for two extreme examples. The highest albedo scene, corresponding to full coverage by high clouds, is white and quite bright, due to the cloud reflectance; the lowest albedo case is a cloud-free scene over the ocean [30].  Tropospheric NO 2 retrieved from SCIAMACHY measurements in the 425-450 nm spectral region. Pollution on urban scales is readily measured globally [30].  Absorption cross-section of benzene [40]: in the VUV region the cross-section values are between 2 and 3 orders of magnitude larger than in the UV range.  Example of SPICAM transmittance spectra obtained during one solar occultation observation. The color indicates to altitude probed from low altitudes (blue) to high altitudes (red). Black lines are fitted spectra using (Eq. (2)) [Figure from [42]]. a calculated using a reference spectrum of June 21 daily irradiance at 40 °N [18] and UV-vis spectra from the literature. b The selected φ for half-life calculation is underlined. c experiment conducted in spring at 45 °N, intensity approximately similar as the used reference solar spectrum [29].