A simple method to isolate fluorescence spectra from small dissolved organic matter datasets

Dissolved organic matter (DOM) is a complex pool of compounds with a key role in the global carbon cycle. To understand its role in natural and engineered systems, efficient approaches are necessary for tracking DOM quality and quantity. Fluorescence spectroscopy combined with parallel factor analysis (PARAFAC) is very widely used to identify and quantify different fractions of DOM as proxies of DOM source, concentration and biogeochemical processing. A major limitation of the PARAFAC approach is the requirement for a large data set containing many variable samples in which the fractions vary independently. This severely curtails the possibilities to study fluorescence composition and behavior in small or unique datasets. Herein, we present a simple and inexpensive experimental procedure that makes it possible to mathematically decompose a small dataset containing only highly-correlated fluorescent fractions. The approach, which uses widely-available commercial extraction sorbents and previously established protocols to expand the original dataset and inject the missing chemical variability, can be widely implemented at low cost. A demonstration of the procedure shows how a robust six-component PARAFAC model can be extracted from even a river-water dataset with only five bulk samples. Widespread adoption of the procedure for analyzing small fluorescence datasets is needed to confirm the suspected ubiquity of certain DOM fluorescence fractions and to create a shared inventory of ubiquitous components. Such an inventory could greatly simplify and improve the use of fluorescence as a tool to investigate biogeochemical processing of DOM in diverse water sources. © 2020 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )


Introduction
Dissolved organic matter (DOM) is an important component of the carbon cycle in both natural and engineered aquatic environments (Bianchi, 2011; Ridgwell and Arndt, 2014). A variety of approaches identify and quantify DOM with varying degrees of molecular insight and analytical complexity (McCallister et al., 2018; Mopper et al., 2007). Amongst the more rapid and affordable techniques, ultraviolet-visible spectroscopy gives insight into the optically-active fractions termed chromophoric DOM (CDOM, determined via absorbance measurements) and fluorescent DOM (FDOM, characterized through fluorescence measurements). Despite decades of study, the chemical origin of DOM's optical properties and the chemical interpretation of the obtained signals remain poorly constrained (Aiken, 2014; Rosario-Ortiz and Korak, 2017).
* Correspondence to: Sven Hultins Gata 6, 41296 Gothenburg, Sweden.
Fluorescence excitation-emission matrices (EEMs) of DOM consist of broad emission spectra whose maxima tend to increase with increasing excitation wavelength (Coble et al., 1990). In an effort to assign chemical interpretations, fluorescence emission in different wavelength regions is commonly attributed to chemical fractions or else is assigned a primary source such as terrestrial organic matter (Coble, 2007). EEMs can be analyzed quantitatively under the assumption that the observed signals are due to the superposition of distinct fluorescence phenomena that could theoretically be converted to concentrations if the true fluorescence quantum yields were known. However, fluorophores and meaningful fluorescence quantum yields remain largely unknown. Additionally, there is an ongoing debate on the origin and behavior of DOM fluorescence signals, and part of DOM's fluorescence emission has been attributed to charge-transfer interactions between donor and acceptor species (McKay, 2020; McKay et al., 2018; Sharpless and Blough, 2014). Regardless of the underlying principles that cause DOM fluorescence, it has been observed that EEMs dominated by autochthonous or allochthonous material seem to largely consist of  . To appropriately interpret fluorescence EEMs and reliably establish the properties and biogeochemical distribution of fluorescence signals, overlapping fluorescence spectra must be reliably separated. Mathematical approaches are often deployed to separate and distinguish between superimposed fluorescence phenomena in EEMs (Murphy et al., 2014a; Stedmon et al., 2003). The most widely-adopted tool is a chemometric model called parallel factor analysis (PARAFAC), which identifies the underlying fluorescence spectra that best explain the observed patterns in the data set assuming linearly additive signals.
Under ideal conditions when model assumptions are fulfilled (see Murphy et al., 2013 ), all systematic fluorescence emission will be accounted for and the true spectra of underlying fluorophores can be well approximated ( Bro, 1997 ). Most importantly, PARAFAC assumes that fluorescence EEMs are made up of a limited number of fluorescence phenomena with invariant excitation and emission spectra. Frustratingly, DOM fluorescence EEMs are typically very similar and many EEMs are needed so that a PARAFAC analysis produces meaningful components ( Stedmon and Bro, 2008 ).
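The linearly additive structure assumed by PARAFAC can be written out explicitly. The expression below uses the conventional trilinear notation rather than notation taken from this paper, so all symbols are assumptions made for illustration:

```latex
x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}
```

Here x_{ijk} is the fluorescence of sample i at emission wavelength j and excitation wavelength k, a_{if} is the score of component f in sample i, b_{jf} and c_{kf} are the emission and excitation loadings of component f, F is the number of components, and e_{ijk} is the residual. If the scores of two components are proportional across all samples, the model can no longer tell their spectra apart, which is why independent variation between fractions matters.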
There are several scenarios in which the conditions for a successful application of PARAFAC may not be met. In particular, since PARAFAC identifies the spectra of underlying fractions via their change in abundance relative to other fractions, it fails when samples are few and/or the abundances of underlying fractions are too correlated. In that case, PARAFAC will represent multiple discrete fractions as a single component with intermediate spectral properties, with only a small number of fractions being identified in total. Increasing sample size by supplementing a dataset with unrelated samples is not usually a solution to this problem, since at the other extreme, PARAFAC fails if samples are too different and cannot be represented by the linear superposition of a relatively small number of spectra (typically < 10) plus only random error. Numerous studies have attempted to overcome this problem by redeploying models that were previously established using large datasets (Fellman et al., 2009; Miller et al., 2006; Romera-Castillo et al., 2014), but this approach relies on presumptions that are impossible to verify, including that the original model recovered only chemically-meaningful spectra, and that both the underlying fluorescence phenomena and the instrumental measurement error structures have remained constant between developing and redeploying the model. Earlier work demonstrated that numerous processes, including size fractionation, metal quenching, and biodegradation, each introduce compositional variability that can be exploited by PARAFAC to separate underlying fluorescence fractions (Cuss and Guéguen, 2012; Guéguen et al., 2013). Only recently, however, have various approaches been optimised with the aim of establishing reliable methods for extracting spectra from extremely small datasets containing as few as one bulk sample. Photodegradation, size-exclusion chromatography and asymmetrical flow field-flow fractionation have been shown to produce the missing variability needed to overcome the basic limitations of one-sample datasets (Guéguen et al., 2013; Lin and Guo, 2020; Murphy et al., 2018; Wünsch et al., 2017). However, spectral averaging still occurs in some cases, and the abovementioned one-sample methods each involved expensive equipment together with complex analytical workflows, which detracts from the inexpensive and rapid nature of DOM fluorescence measurements.
The goal of this study was to develop a simple, widely applicable method for extracting fluorescence spectra from datasets that cannot be reliably decomposed by traditional PARAFAC methods, due to too few samples and/or the presence of strongly covarying fractions. Recent work highlights the widespread deficiencies caused by ignoring these critical constraints (Wünsch et al., 2017). The basic method involves augmenting the target dataset with fractionations obtained using three inexpensive and widely available solid-phase extraction (SPE) cartridges (Fig. 1). This method is shown to overcome the dual limitations of small sample size and low chemical variability by introducing statistically meaningful variability and reducing covariance between the underlying fluorescence components.
Fig. 1. Schematic of the augmentation approach. Samples are collected, filtered, then absorbance and fluorescence properties are determined (Abs = absorbance, EEM = excitation-emission matrix). One sample is selected for fractionation with three different solid-phase sorbents. The combination of whole-water and permeate absorbance spectra and EEMs is assembled and analyzed to identify fluorescence components. SAX: strong anion exchange sorbent; NH2: weak anion exchange (aminopropyl bonded) sorbent; PPL: reverse-phase styrene-divinylbenzene sorbent.

Samples
Whole-water samples (N = 10) were collected at a drinking water treatment plant in southern Sweden on September 24, October 14, and November 5, 2019 (N = 9), as well as May 25, 2020 (N = 1). The treatment plant is fed directly from the Göta Älv, a 93 km long river on the west coast of Sweden that is sourced from Lake Vänern and drains into the Kattegat at the city of Gothenburg. The 2020 sample was taken at the river intake, whereas the 2019 samples were taken at either the river intake (N = 3) or after primary biofilters (N = 6).
All samples were filled into pre-combusted amber glass bottles and transported directly from the sampling location to the laboratory, where they were filtered with pre-combusted GF/F filters (0.7 μm, 2019 samples) or water-flushed 0.2 μm syringe filters (polysulfonate, 2020 samples). Filters with 0.2 μm pores remove a small additional fraction of absorbing and fluorescing colloidal matter compared with filtering through 0.7 μm pore-size filters (Massicotte et al., 2017; Nimptsch et al., 2014). We thus carried out our study under the assumption that samples filtered through 0.2 and 0.7 μm pores can be compared qualitatively if not quantitatively. After filtration, the samples were stored at 4 °C in the dark. All subsequent measurements (below) were done within five days of sampling. Dissolved organic carbon (DOC) was determined for one whole-water sample taken in 2019 (after the biofilter treatment step) and for the sample taken in 2020 (raw water) using high-temperature catalytic combustion with a TOC-V CPH (Shimadzu).

Augmentation Technique: Solid-Phase Extractions
To supplement the small whole-water dataset, solid-phase extractions (SPE) were carried out on a single sample taken in May 2020. Initially, a range of pre-packed, commercially-available sorbent materials were tested (silica, C2, C8, styrene-divinylbenzene, NH2, SAX) to identify those that produced high spectral variability across all permeates, indicating varying time-profiles for extracting different DOM fractions. Each column was first cleaned by a methanol-soak for a minimum of 24 h, followed by a rinse with three column volumes of fresh methanol and ultrapure water (at circumneutral pH) before extractions were performed. Initial tests led to a shortlist of three well-performing commercial sorbents: SAX (strong anion exchange sorbent, 100 mg, Agilent Technologies), NH2 (aminopropyl-bonded ion exchange sorbent, 200 mg, Agilent Technologies), and PPL (reverse-phase sorbent, 200 mg, Agilent Technologies). Permeates from only these three sorbent materials were combined to produce the dataset analysed in this study ( Fig. 1 ). Note that while the combination of SAX, NH2 and PPL permeates appeared to effectively fractionate fluorescent DOM in the studied riverine samples, other DOM sources may require different or additional sorbents and/or modified experimental protocols to achieve comparable success.
Approximately 45 mL of sample at ambient pH (approx. 6.8) was subsequently applied onto each of the three columns. A Luer-slip three-way valve was installed after the column in order to divert the column permeate through a 5-mL Luer-slip syringe (Fig. S1). The three-way valve and sampling syringe were cleaned with 10% HCl and rinsed thoroughly with ultrapure water before use. Permeate samples (3 mL) were drawn slowly by hand, then the flow of sample was stopped until the next permeate collection. The fluorescence cuvette was rinsed with one mL of the permeate sample and the remaining two mL were used for spectroscopic measurements. This procedure was repeated until approximately 45 mL had passed over each of the columns. Since optical measurements and extractions were performed simultaneously, the flow over the SPE cartridge was stopped regularly for intervals of between five and fifteen minutes. Each extraction took between three and five hours to produce and measure up to 15 subsamples, depending on the fluorometer settings (below). The three extractions were carried out sequentially and all measurements completed within 2 days.

Fluorescence Spectroscopy
Fluorescence and absorbance measurements were obtained with a HORIBA AquaLog fluorometer and a 10 mm quartz cuvette (Helma GmbH). Fluorescence emission was detected in the range of 220-800 nm (increment ~3.3 nm) with an integration time between 3 and 9 s at excitation wavelengths between 240 nm and 450 nm (increment 3 nm) for the samples measured in 2020 and between 241 nm and 700 nm (increment 3 nm) for the 2019 samples. The accuracy of the excitation monochromator and emission detector, as well as the optical cleanliness of the cuvettes, were validated daily following a previously described protocol (Wünsch et al., 2015). Excitation and emission offsets on each axis were corrected by adding three nm to the values reported by the instrument.

Data processing & chemometrics
Spectroscopy data were processed in MATLAB (v9.8, Mathworks Inc.) using the drEEM toolbox, version 0.6.0. Inner filter effects were corrected with the absorbance-based method (Kothawala et al., 2013) and signals in each EEM were normalized using the Raman peak area of ultrapure water. Since excitation wavelength settings differed slightly between samples measured in 2019 and 2020, a 2D gridded linear interpolation was carried out to adjust the excitation axis of the 2019 EEMs to correspond to the EEMs measured in 2020.
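The two correction steps named above can be sketched in a few lines. This is a minimal illustration and not the drEEM implementation: it assumes the widely used absorbance-based correction for a 1 cm cuvette, F_corr = F_obs × 10^(0.5 × (A_ex + A_em)), and the function names, argument names and the emission-by-excitation list layout are hypothetical.

```python
def correct_inner_filter(eem, abs_ex, abs_em):
    """Absorbance-based inner filter correction (assumed 1 cm path length).

    eem[i][j] is the intensity at emission index i and excitation index j;
    abs_em[i] and abs_ex[j] are the absorbances at those wavelengths.
    """
    return [
        [value * 10 ** (0.5 * (abs_ex[j] + abs_em[i])) for j, value in enumerate(row)]
        for i, row in enumerate(eem)
    ]


def to_raman_units(eem, raman_area):
    """Divide every intensity by the Raman peak area of ultrapure water."""
    if raman_area <= 0:
        raise ValueError("Raman peak area must be positive")
    return [[value / raman_area for value in row] for row in eem]


# Hypothetical 2 x 2 EEM with made-up absorbances, for illustration only.
raw = [[1.0, 2.0], [3.0, 4.0]]
corrected = correct_inner_filter(raw, abs_ex=[0.0, 0.2], abs_em=[0.0, 0.2])
normalized = to_raman_units(corrected, raman_area=2.0)
```

The correction grows with the absorbance at both wavelengths, so strongly absorbing samples are adjusted the most, while a position with zero absorbance at both wavelengths is left unchanged.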
The two data sets were subsequently merged and 1st and 2nd order physical scatter was replaced with missing numbers (no interpolation). Fluorescence emission longer than 650 nm or shorter than 310 nm was excluded from further analyses. Moreover, emission scans at 358.1, 312.6, and 519.7 nm were deleted after a preliminary analysis indicated frequent signal instability at these wavelengths. Frequent noisy measurements at excitation 246 nm also necessitated the exclusion of this excitation wavelength from further analyses. Several of the SPE permeate EEMs were then removed from the data set either because of very low fluorescence signals or excessive instrument noise (N = 8).
The underlying components of fluorescent DOM in the samples were isolated with PARAFAC using the drEEM toolbox in conjunction with the N-way toolbox (Andersson and Bro, 2000; Murphy et al., 2013). To avoid high leverages from the most fluorescent samples, all EEMs were scaled by the 3/2th root of the standard deviation of each EEM. This scaling did not amplify measurement noise in low-signal samples as severely as established methods (scaling to unit variance) but increased the weighting of these samples compared to the unscaled data set (Fig. S2). Models with two to seven components were explored. All models were constrained to fit components with positive scores and loadings (nonnegativity constraint). Each model was initialized with random orthogonalized numbers and the best model (with lowest SSE, sum of squared errors) out of 50 random initializations was selected. A maximum of 10^4 iterations was allowed and a relative change in fit of 10^-8 was chosen as the convergence criterion.
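The scaling step can be illustrated as follows. This sketch assumes that "scaled by the 3/2th root of the standard deviation" means dividing each EEM by its standard deviation raised to the power 2/3, and that the population standard deviation over all non-missing values is used; both are interpretations rather than details confirmed by the text, and the function name is hypothetical.

```python
import statistics


def scale_eem(eem):
    """Divide an EEM by the 3/2th root of its standard deviation (sd ** (2/3)).

    Missing values (NaN) are ignored when computing the standard deviation
    and stay missing in the output.
    """
    values = [v for row in eem for v in row if v == v]  # v == v is False for NaN
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("EEM has no variability to scale by")
    factor = sd ** (2.0 / 3.0)
    return [[v / factor for v in row] for row in eem], factor


# Hypothetical EEM: its standard deviation is 8, so the scaling factor is about 4.
scaled, factor = scale_eem([[0.0, 0.0], [16.0, 16.0]])
```

Because the exponent is below one, a bright sample is divided by less than its full standard deviation. Under this reading, differences in intensity between samples are compressed but not removed, which is consistent with the intermediate weighting between unscaled data and unit-variance scaling described above.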

Modeling approach
Parallel factor analysis was applied to two different data sets (Table 1): (1) The small dataset consisting of ten whole-water samples. This first approach represents a conventional PARAFAC analysis with the caveat that sample size and variability were likely insufficient for it to be possible to derive a valid PARAFAC model; (2) the augmented dataset consisting of the ten whole-water samples and 37 SPE permeates.
In line with previous research on extractions of DOM ( Andrew et al., 2016 ;Li et al., 2017 ;Li et al., 2016 ;Rosario-Ortiz et al., 2007 ;Wünsch et al., 2018 ), it was assumed that different chemical fractions would be extracted at different rates. Moreover, we hypothesized that permeates represent sub-samples of the same original sample and not a set of wholly independent samples. These assumptions were tested by viewing model residuals to ensure randomness, investigating the degree of autocorrelation between components representing different FDOM fractions, and through cross-validation to ensure that highly similar models were reached using different subsets of the dataset. The obtained models were split-half validated as follows: The data set was divided into two randomly-sampled halves consisting of approximately half the number of bulk and half the number of available SPE-permeate EEMs, then the models obtained from each half compared visually and statistically.
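The random division used for split-half validation can be sketched as a stratified split, so that each half receives about half of the bulk samples and half of the SPE permeates. The function name, sample identifiers and seed below are hypothetical, and the sketch covers only the assignment of samples to halves, not the PARAFAC fitting or the spectral comparison.

```python
import random


def split_half(bulk_ids, permeate_ids, seed=0):
    """Randomly split bulk and permeate samples into two halves, group by group."""
    rng = random.Random(seed)
    half_a, half_b = [], []
    for group in (bulk_ids, permeate_ids):
        shuffled = list(group)
        rng.shuffle(shuffled)
        middle = len(shuffled) // 2
        half_a.extend(shuffled[:middle])
        half_b.extend(shuffled[middle:])
    return half_a, half_b


# Hypothetical identifiers matching the sample counts of the augmented data set.
bulk = [f"bulk_{i}" for i in range(10)]
permeates = [f"spe_{i}" for i in range(37)]
half_a, half_b = split_half(bulk, permeates, seed=1)
```

Splitting each group separately avoids a half that contains mostly permeates or mostly whole-water samples by chance, which matters when one group is much smaller than the other.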
All internal (within study) comparisons between spectra were based on the Tucker congruence coefficient (TCC) for excitation spectra and the shift- and shape-sensitive congruence (SSC) for emission spectra, where values larger than 0.95 signified indistinguishable spectra (Lorenzo-Seva and ten Berge, 2006; Wünsch et al., 2019). Since the SSC is more sensitive toward differences between spectra, this validation is more stringent than using TCC on both emission and excitation spectra. Inter-study comparisons with 145 fluorescence models describing DOM fluorescence were carried out using the OpenFluor database (Murphy et al., 2014b). All spectra with TCC > 0.97 (excitation) and TCC > 0.98 (emission) were exported and compared further.
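For reference, the Tucker congruence coefficient between two spectra sampled on the same wavelength grid is the uncentred cosine of the angle between them. The sketch below implements only the TCC; the shift- and shape-sensitive congruence is not reproduced here, and the function name and example spectra are hypothetical.

```python
import math


def tucker_congruence(x, y):
    """Tucker congruence coefficient of two spectra on the same wavelength grid."""
    if len(x) != len(y):
        raise ValueError("spectra must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denominator == 0:
        raise ValueError("spectra must not be all zeros")
    return numerator / denominator


# Hypothetical spectra: the second is the first multiplied by two.
spectrum_a = [0.1, 0.5, 1.0, 0.4]
spectrum_b = [0.2, 1.0, 2.0, 0.8]
is_match = tucker_congruence(spectrum_a, spectrum_b) > 0.95
```

The coefficient ignores overall intensity, so two spectra with the same shape but different magnitudes count as congruent; this property makes it suitable for comparing loadings from separately fitted models.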
An attempt to obtain a valid PARAFAC model from only the ten whole-water EEMs was unsuccessful (Fig. S4) because the small data set lacked the variability required by PARAFAC to distinguish between several highly-correlated fractions. This failure is evidenced by models with between two and seven components containing one or more components with atypical features (Fig. S4, highlighted components), including multiple distinct emission peaks and nonsensical excitation spectra lacking absorption between consecutive absorption bands. Due to the small number of samples, only a two-component model could be split-half validated.

The augmentation approach
Augmenting the original whole-water data set with the 37 SPE permeates added the missing chemical and mathematical variability needed to acquire a valid PARAFAC model. The first indication of success was the significantly increased spectral variability in the raw EEMs ( Fig. 2 B, D) which can be seen by the increased variation in predefined fluorescence peaks (definitions above and Fig. 2 ). The coefficient of variation of peaks "C" and "D" increased from 1.6 % and 5.6% in whole-water samples to 6.4% and 14.1 % in the augmented data set, respectively. Similar results were seen for fluorescence indices covering different wavelength regions of the ultraviolet-visible EEM ( Fig. 2 B, D). The average EEM in the augmented data set also had visibly higher contributions of emission at short wavelengths (low end of the visible spectrum, Fig. S3, panel B compared to panel A).
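The coefficient of variation quoted here is the standard deviation of a peak intensity expressed as a percentage of its mean. A minimal sketch follows, assuming the sample standard deviation is used (the text does not say which estimator was applied); the function name and intensities are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return the coefficient of variation in percent (sample standard deviation)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical peak intensities in Raman units, for illustration only.
cv_percent = coefficient_of_variation([9.0, 10.0, 11.0])
```

Applied to a peak ratio or index across all EEMs of a data set, a larger value indicates more compositional variability for PARAFAC to exploit.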
In contrast to the model from only whole-water samples, the components in the augmented data set matched commonly observed spectra in fluorescent DOM. Models with between three and six components described between 99.61 and 99.95% of the augmented data set. A seven-component model was excluded from further consideration since it seemed to overfit the data by using multiple highly similar components and multimodal emission peaks (Fig. S5, bottom panel).
Considering the component loadings and modelling error, the six-component model was the most appropriate representation of the augmented data set. The model consisted of components with emission maxima at 330, 390, 410, 450, and 510 nm (Fig. 3). The components will henceforth be referred to by their longest excitation and sole emission maximum (e.g. C 280/330). Components C 280/330, C 320/390, C 300/410, C 320/450, C 350/450, and C 380/510 had Stokes shifts between 0.64 and 1.14 eV (Table 2). Fluorescence emission of the whole-water samples was dominated by C 300/410, C 350/450, and C 380/510, which each contributed between 22 and 25% to the overall modelled fluorescence on average, followed by C 320/390 and C 320/450 with approx. 10 and 13% on average, respectively. Lastly, an average of 6.5% of the whole-water EEMs was attributable to C 280/330 (Table 2, see table for standard deviations of these averages).
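A Stokes shift in energy units can be estimated from the excitation and emission maxima by converting both wavelengths to photon energies. The sketch below assumes this simple conversion, E = hc/λ with hc ≈ 1239.842 eV nm, and may differ in detail from the calculation referenced in Table 2. It uses the rounded wavelengths from the component names, so the results only approximate the reported range.

```python
HC_EV_NM = 1239.842  # Planck constant times speed of light, in eV nm


def stokes_shift_ev(excitation_nm, emission_nm):
    """Energy difference between excitation and emission maxima in eV."""
    if excitation_nm <= 0 or emission_nm <= 0:
        raise ValueError("wavelengths must be positive")
    return HC_EV_NM * (1.0 / excitation_nm - 1.0 / emission_nm)


# Nominal maxima taken from the component names (excitation, emission) in nm.
components = {
    "C 280/330": (280, 330),
    "C 320/390": (320, 390),
    "C 300/410": (300, 410),
    "C 320/450": (320, 450),
    "C 350/450": (350, 450),
    "C 380/510": (380, 510),
}
shifts = {name: stokes_shift_ev(ex, em) for name, (ex, em) in components.items()}
```

Working in energy rather than wavelength makes shifts comparable across the spectrum, since a fixed wavelength difference corresponds to a larger energy difference in the ultraviolet than in the visible range.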
Each of the three SPE sorbents differed in their removal efficiency for different spectra. This meant that the autocorrelation between components in the entire augmented data set was never severe (Fig. S6). In contrast, whole-water EEMs are often highly autocorrelated because dilution primarily affects all fractions of DOM simultaneously. Since the decomposition of fluorescence EEMs with PARAFAC assumes that no two fluorescence phenomena covary perfectly in their spectra or fluorescence intensities, severe autocorrelations may invalidate a model (Bro, 1997; Murphy et al., 2013). In future applications of EEM-PARAFAC, the augmentation of whole-water EEMs with SPE permeates thus provides a new method to decrease the autocorrelation between EEMs in a data set by introducing independent variability that would not occur naturally.
After the initial model exploration, the six-component model describing the augmented data set was successfully split-half validated. Two models derived from random data set halves (containing half of whole-water samples and SPE permeates each) were highly similar to the model derived from the entire augmented dataset (Fig. 3). It should be noted that a split-half validation is usually conducted by modeling independent samples. Identifying the same model independently subsequently indicates that the model is an appropriate representation of the data set (Bro, 1997). In our study, the augmented data set consisted of ten independent whole-water samples and 37 dependent SPE permeate samples. Random halves thus partly consist of samples that originated from the fractionation of the same whole-water sample. In this case, external comparisons with previously published models should be included to increase confidence in the identified model. Such comparisons are discussed below.
In the augmented data set, the SPE permeates outnumber the whole-water samples. This raises the possibility that the identified model could fit the SPE permeates better than the whole-water samples. The lack-of-fit for permeates and whole-water samples was therefore compared to assess the appropriateness of the model for the whole-water samples. The percentage of unexplained fluorescence was largest for the SAX permeates and approximately equal for the remaining permeates and whole-water samples (Fig. S7). The larger residual for SAX permeates was likely a consequence of the high removal efficiency of the SAX resin, which resulted in low fluorescence intensities and a high proportion of measurement noise. In contrast, the whole-water samples were described well by the model and sample leverages indicated that permeates and whole-water samples were equally important for the model (Fig. S7A). The overall model error was low and spectral residuals were reasonably flat and typical for DOM fluorescence (Fig. S7B-C, Fig. S8). Overall, this confirmed that whole-water samples and permeates were equally well approximated by the model despite differences in sample size. However, future applications of the augmentation approach may require adjustments in data preprocessing, particularly data scaling, to carefully balance model outcome and sample weighting.
The DOM samples from the Göta Älv had relatively high concentrations of DOC along with abundant fluorescence emission. Given the abundance of whole-water fluorescence signals, the monitoring of SPE permeates still recovered reasonable fluorescence signals despite the loss of material through the extraction. In applications where the fractionated whole-water sample has little fluorescence emission to start with (e.g. marine samples), the augmentation approach may require modification. For example, it may not be feasible to obtain reasonable fluorescence signals from SPE permeates. In this case, it may be necessary to focus on the SPE extract instead of the permeate. An elution of the SPE-DOM with an increasing concentration of methanol and/or acetone will likely reveal similar variability in terms of fluorescence composition.
Table 2. Properties of the six validated PARAFAC components. Average contribution to whole-water fluorescence is the average relative component score across the ten whole-water samples (± standard deviation). The Stokes shift was calculated as described elsewhere (Lakowicz, 2006). For the number of OpenFluor matches, a threshold of TCC > 0.97 (excitation) and TCC > 0.98 (emission) was applied. The provided synonyms refer to similar components found in one-sample PARAFAC studies.

Solid-phase extraction performance
In addition to new opportunities for fluorescence decomposition, the monitoring of SPE permeates also allows the characterization of DOM polarity similar to the Polarity Rapid Assessment Method (Rosario-Ortiz et al., 2007). Contrary to usual protocols for PPL and C18 sorbents, but similar to the Polarity Rapid Assessment Method, samples were not acidified in our study. Rather, pH was kept at ambient levels (pH 6.8) to avoid possible pH-induced changes to the fluorescence spectra of DOM (Esteves et al., 1999). After passing 45 mL through each cartridge, the SPE sorbents removed an average of 31% (PPL), 52% (NH2), and 78% (SAX) of whole-water fluorescent DOM from the extracted sample. All sorbents were most efficient at the beginning of the extraction process and performance decreased with increasing permeate volume (Fig. S9A). Among the three tested sorbents, the permeate fluorescence properties changed least for PPL. In contrast, permeate fluorescence properties during NH2 and SAX extractions changed considerably (Fig. S9B).
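Removal percentages of this kind can be derived by comparing permeate fluorescence with the whole-water fluorescence of the extracted sample. The sketch below is one plausible way to do this and is not confirmed as the authors' calculation: it assumes equal-volume permeate fractions and a single total fluorescence value per EEM, and the function names and numbers are hypothetical.

```python
def removal_percent(whole_water, permeate):
    """Percentage of whole-water fluorescence missing from one permeate fraction."""
    if whole_water <= 0:
        raise ValueError("whole-water fluorescence must be positive")
    return 100.0 * (1.0 - permeate / whole_water)


def mean_removal_percent(whole_water, permeates):
    """Average removal across equal-volume permeate fractions of one extraction."""
    if not permeates:
        raise ValueError("at least one permeate fraction is required")
    return sum(removal_percent(whole_water, p) for p in permeates) / len(permeates)


# Hypothetical total fluorescence: removal is highest early and declines later.
per_fraction = [removal_percent(1.0, p) for p in (0.2, 0.4, 0.6)]
average = mean_removal_percent(1.0, [0.2, 0.4, 0.6])
```

Tracking the per-fraction values over permeate volume gives a breakthrough-style profile, while the average summarizes the extraction as a whole.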
The noticeably low efficiency of the PPL sorbent was expected due to the circumneutral pH during extraction. In contrast, the efficiency of the NH2 sorbent was comparable to previously reported values for extractions of Suwannee River NOM at pH 2, while the SAX sorbent appeared more efficient than previously reported (Li et al., 2017). We observed that none of the SAX-extract could be eluted with methanol and that the sorbent remained discolored after the attempted elution (data not shown). Since extraction efficiencies are commonly estimated by comparing whole-water and methanol-eluate DOC, the previously reported low extraction efficiencies of the SAX sorbent may simply be an artefact of the failure to elute extracted DOM instead of a lack of DOM adsorption onto the sorbent. In this context, continuous or sporadic fluorescence-based monitoring of permeates during the extraction process may help to better understand the extraction process of DOM in future studies.
The performance of SPE sorbents depends on the interaction between DOM and sorbent. Analyzing permeates may thus provide some degree of insight into the chemical properties of the extracted fluorescence components and the SPE process itself. Future studies could compare the extractability of FDOM at different pH or the extractability between DOM from different sources at a constant pH. Four of the six identified components (C 280/330 , C 300/410 , C 320/450 , and C 380/510 ) showed significant spectral overlap with a previous study that investigated the SPE performance of the PPL sorbent for marine DOM samples ( Fig. 4 ). However, a comparison in SPE performance between the two studies is beyond the scope of this work, since the two studies deviate with respect to both sample pH and DOM source.

Comparison of fluorescence components with previous studies
Recent studies argue for the ubiquitous occurrence of fluorescence spectra in DOM across a wide range of different aquatic environments (Ishii and Boyer, 2012; Murphy et al., 2018; Wünsch et al., 2017). Quantitative assessments of similarity have been made for all studies where data were available in repositories. The longest emitting component C 380/510 matches reoccurring component C2 identified by Ishii and Boyer (2012), C 530 in Lin and Guo (2020), C 510 in Wünsch et al. (2017), and ubiquitous F 520 in Murphy et al. (2018). Furthermore, C 300/410 and C 320/450 (Table 2) match further components that occur ubiquitously in different environments. C 320/450 and C 320/390 were also found in a single-sample PARAFAC study describing FDOM in the Milwaukee River (Lin and Guo, 2020). These findings provide further evidence for the ubiquitous occurrence of certain fluorescence spectra in DOM and increase confidence in the identified six-component model. Furthermore, the agreement between previous single-sample studies and the approach presented herein shows that intricate photochemistry or chromatography-type (Lin and Guo, 2020; Wünsch et al., 2017) fractionations can be replaced by or amended with a simplified methodology.
In relation to the total number of compared studies, the low rate of matches may suggest that our study identified rare components. However, we previously demonstrated that many published PARAFAC models likely rely on too few components in the visible wavelength range (Wünsch et al., 2019). In contrast, our study identified four components with emission maxima in the visible spectrum. A degree of divergence from many of OpenFluor's reference spectra is therefore to be expected.
The model in our study includes a component with broad emission peaking at 450 nm (C 320/450 ) that found few close matches in the OpenFluor database. This component may occur less frequently (i.e. is specific to the modelled data set) or more likely, it represents the combined signal from multiple unresolved fractions. Future applications of the augmentation approach for different samples should help to better constrain the distribution and spectral properties of this component.
Numerous previous studies have observed that distinguishing signals in the ultraviolet-A emission range is especially difficult (Murphy et al., 2010). In the ultraviolet-A emission range, 1st order Raman and Rayleigh scatter intersect, and the energy output of fluorometer lamps is noticeably low (Cory et al., 2010). Nonetheless, EEM-PARAFAC models often feature at least one ultraviolet-A-emitting component (Wünsch et al., 2019). In our study, the only "protein-like" fluorescence spectrum observed (C 280/330) did not resemble a typical tryptophan-like spectrum but instead emitted at shorter wavelengths. Since the model residuals were relatively flat in the tryptophan-like region, it is most likely that C 280/330 represents a mixture of multiple ultraviolet-A-emitting signals that could not be further separated due to similar extractability with different SPE sorbents and overall low abundance of signals. Future implementations of the augmentation approach should focus on improving the separation in this regard. Emission in the ultraviolet-A range is often associated with labile DOM and capturing the turnover of this material depends on there being models that accurately reflect this material.

Conclusions & Future Directions
It is often claimed that at least 50 environmental samples are required for it to be possible to reliably isolate the underlying fluorescence spectra of DOM. The approach presented herein achieved this using only a few whole-water samples combined with a simple experiment to generate solid-phase extraction permeates (PPL, SAX, and NH2 sorbent). We demonstrated the successful application of this approach by resolving the optical properties of six independently-varying fluorescent fractions in DOM samples from a Swedish drinking water treatment facility.
While there is mounting evidence that certain fluorescence spectra occur ubiquitously in terrestrially and autochthonously derived DOM, most published studies do not include them. Spectral libraries like OpenFluor facilitate comparisons between studies, but a logical next step is to incorporate the growing knowledge about ubiquitous spectra into a next-generation modeling approach. This requires rapid, reliable, and highly-reproducible methods for resolving overlapping signals and isolating independent fluorescence spectra from small DOM datasets. The approaches presented here represent another critical step in this direction.
In addition to aiding the identification of fluorescence phenomena, studying FDOM behavior during extraction offers valuable insights into the chemical properties of these phenomena. Future research should aim to combine this approach with others that deliver insights into e.g. apparent molecular size (Guéguen et al., 2013; Lin and Guo, 2020; Romera-Castillo et al., 2014), photochemistry (Timko et al., 2015), or polarity.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.