Fluorescent Dissolved Organic Matter Components as Surrogates for Disinfection Byproduct Formation in Drinking Water: A Critical Review

Disinfection byproduct (DBP) formation, prediction, and minimization are critical challenges facing the drinking water treatment industry worldwide where chemical disinfection is required to inactivate pathogenic microorganisms. Fluorescence excitation–emission matrices-parallel factor analysis (EEM-PARAFAC) is used to characterize and quantify fluorescent dissolved organic matter (FDOM) components in aquatic systems and may offer considerable promise as a low-cost optical surrogate for DBP formation in treated drinking waters. However, the global utility of this approach for quantification and prediction of specific DBP classes or species has not been widely explored to date. Hence, this critical review aims to elucidate recurring empirical relationships between common environmental fluorophores (identified by PARAFAC) and DBP concentrations produced during water disinfection. From 45 selected peer-reviewed articles, 218 statistically significant linear relationships (R² ≥ 0.5) with one or more DBP classes or species were established. Trihalomethanes (THMs) and haloacetic acids (HAAs), as key regulated classes, were extensively investigated and exhibited strong, recurrent relationships with ubiquitous humic/fulvic-like FDOM components, highlighting their potential as surrogates for carbonaceous DBP formation. Conversely, observed relationships between nitrogenous DBP classes, such as haloacetonitriles (HANs), halonitromethanes (HNMs), and N-nitrosamines (NAs), and PARAFAC fluorophores were more ambiguous, but preferential relationships with protein-like components were noted where FDOM originated from algal/microbial sources. This review highlights the challenges of transposing site-specific or FDOM source-specific empirical relationships between PARAFAC components and DBP formation potential to a global model.


■ INTRODUCTION
The use of chlorine and other chemical disinfection methods (such as ozonation or chloramination) for drinking water disinfection can lead to the unintentional formation of potentially harmful disinfection byproducts (DBPs) through reactions with dissolved organic matter (DOM) precursors present in the raw water source. 1 Consequently, nine organohalide DBPs, comprising four trihalomethanes (THM4) and five haloacetic acids (HAA5), are regulated by drinking water authorities in the European Union (EU) 2 and United States (US). 3 However, in excess of 700 DBPs have been identified to date, the vast majority of which are unregulated and many of which are thought to be potentially carcinogenic. 4 Moreover, a recent study estimated that 32−81% of total organic halogen (TOX) loads produced during chemical disinfection are attributable to as yet unidentified DBPs, highlighting the likely importance of new and emerging classes over the coming years. 5
DOM in freshwaters is composed of a multitude of soluble, reduced organic carbon compounds, which may be derived from autochthonous sources, such as in situ primary production (e.g., algae and microbial biomass), and allochthonous watershed sources, such as leaf litter and soil organic matter leachates, 6 whose hydrological export varies spatially and temporally within river basins around the world. 7 Humic and fulvic acids (typically originating from allochthonous terrestrial sources) are humic substances of high molecular weight and aromaticity, which are thought to be important DOM precursors for carbonaceous DBP (C-DBP) classes. 8,9 In contrast, autochthonous DOM derived from algal and microbial sources, as well as wastewater DOM, may play a role in the formation of nitrogenous DBPs (N-DBPs), 10,11 which are believed to be potentially more harmful to human health than C-DBPs. 12 A global increase in terrestrial DOM export is forecast over the coming decades as a consequence of climate and environmental change. 13,14 Increasing DOM concentrations in raw water sources derived from surface water will pose significant challenges for the safe and sustainable production of drinking water.
Modern advances in fluorescence excitation−emission matrix (EEM) spectroscopy have offered a unique perspective on fluorescent dissolved organic matter (FDOM) characterization and quantification in freshwater environments. Fluorescence spectroscopy is a low-cost, nondestructive, sensitive, and selective technique that can provide critical information on the molecular properties of complex FDOM admixtures. 15 The technique also offers considerable promise for quantification of intrinsic environmental fluorophores, which may be associated with DBP formation. 16−22 Various methods have been developed to extract qualitative and quantitative information from EEM spectra, such as "peak picking" 23 and fluorescence regional integration (FRI) (Figure 1). 24 More recently, principal component analysis (PCA) and parallel factor analysis (PARAFAC) have been applied to reduce the dimensionality of large EEM data sets into a small number of independent components. 25,26 Whereas PCA decomposes EEMs into components that are not physically meaningful, PARAFAC is capable of "unmixing" complex EEM spectra to resolve the underlying independent fluorophores present and has become the EEM decomposition and interpretation tool of choice. Machine learning (ML) approaches, such as artificial neural networks (ANNs) and self-organizing maps (SOMs), are becoming more widely available for applications with fluorescence spectroscopy data. 27−29 ML approaches complement EEM-PARAFAC outputs and offer pathways toward greater automation in classification 30−32 and regression analysis 33 of large and complex EEM data sets.
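The "unmixing" performed by PARAFAC can be written compactly as a trilinear model. The form below is the standard PARAFAC decomposition as commonly stated for EEM data sets; the symbols are introduced here for illustration and do not appear in the selected articles' notation as summarized in this review.

```latex
x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk},
\qquad i = 1,\dots,I;\quad j = 1,\dots,J;\quad k = 1,\dots,K
```

Here x_{ijk} is the fluorescence intensity of sample i at emission wavelength j and excitation wavelength k, F is the number of components, a_{if} is the score of component f in sample i (proportional to its concentration), b_{jf} and c_{kf} are the emission and excitation loadings of component f, and e_{ijk} is the residual not explained by the model. The component scores are the quantities that are regressed against DBP concentrations.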
Scope of the Critical Review. Previous reviews of EEM-PARAFAC have included: a general overview of the technique, 26 applications in drinking water and wastewater treatment plants, 34,35 critical evaluation of commonly used fluorescence metrics, 36 potential pitfalls of oversimplification of EEM interpretations, 37 application of SOMs for EEM data analysis, 28,38 new approaches for similarity metrics, 39 and the practical challenges for continuous, online monitoring applications. 40,41 However, to date, no article has critically reviewed the potential of ubiquitous environmental fluorophores as low-cost optical surrogates for DBP formation potential at a global scale. Hence, the present critical review aims to identify statistically significant and recurrent empirical relationships between FDOM components (identified by PARAFAC) and DBP formation reported in the available literature. Consideration is given to the potential advantages of PARAFAC components for continuous online monitoring applications and early warning detection of DBP formation risk in drinking water, with the overarching goal of protecting consumer health.

■ LITERATURE REVIEW
Out of 378 identified articles, 45 that matched the scope and search criteria of the present study were selected for inclusion (hereafter referred to as the "selected articles"; procedure fully detailed in Supporting Information, Figure S1 and Table S1). Data from the selected articles were extracted into categories such as raw water sources, EEM acquisition procedures, FDOM components, chemical disinfection method, and DBP formation potential parameters (temperature and contact time). Relevant data fields arising from the selected articles are collated in SI, Extracted Data (TXT file).
The 45 selected articles were published between 2009 and 2022 (Figure 2). Significant growth in the awareness of DBPs and potential associations with EEM-PARAFAC components is noteworthy from 2017 onward (Figure 2). Selected articles focus mainly on drinking water treatment plant optimization or upgrades (47%), tracking the spatiotemporal dynamics of DOM in surface water (15%), and evaluating issues of biofilm algae (17%) or species-specific leaf leachates (9%). Additionally, some articles investigated the impact of photoirradiation on DOM (10%) or compared methods of DOM characterization (3%). A large diversity of DOM sources, e.g., surface water (71%), algal/microbial DOM (24%), and leaf leachate (10%), were investigated and are summarized in the SI. Dissolved organic carbon (DOC) concentrations reported within the selected articles ranged from 0.03 to 1,000 mg C L−1.
Figure 1 caption (fragment): Gray boxes refer to local wavelength pair regions at which fluorophore maximum intensities are generally "picked". 23
■ EXPERIMENTAL APPROACHES
EEM-PARAFAC Sample Preparation. PARAFAC methodologies for EEM decomposition have been reviewed elsewhere. 26,53 Sample preparation methods prior to EEM acquisition included filtration, dilution, and/or pH adjustment of various aqueous matrices. EEM spectra were typically recorded at excitation and emission wavelengths above 240 and 300 nm, respectively, as shorter wavelengths are thought to be associated with deteriorating signal-to-noise ratios. 37 Commonly represented PARAFAC toolboxes included DOMFluor 54 and drEEM, 26 used in 62% and 17% of the selected articles, respectively, with other toolboxes also noted (detailed in SI, Extracted Data).
Disinfection Byproduct (DBP) Formation Potential. Two approaches to evaluating DBP formation were described in the selected articles and are fully reported in SI, Extracted Data. The first and most common approach (employed in 84% of the selected articles) was the collection of composite samples from the water treatment plant, with EEM acquisition and DBP formation evaluated under controlled conditions (e.g., pH, temperature, disinfectant dose, and contact time). 55 The second approach, employed in 16% of studies (such as in refs 56 and 57), involved acquisition of EEMs from the raw water source followed by postchlorination sampling for DBPs at points within the distribution network. This second approach may have some limitations because DBP yield is generally a function of residence time within water distribution networks, which varies between 1 and 3 days from chlorination to the consumer tap. 58 All DBP formation reactions were arrested using an appropriate quenching agent, with ascorbic acid being the most popular (SI, Table S2). Ascorbic acid has been shown to have good compatibility with a broad spectrum of DBPs including THMs, HAAs, haloketones (HKs), haloacetaldehydes (HALs), and HANs. 59

COMPONENTS AND DBP FORMATION
From the 45 selected articles, 218 empirical relationships between PARAFAC component intensities and DBP concentrations (Figure 3) were identified, of which 135 were strong linear relationships (e.g., Pearson correlation coefficient, R² ≥ 0.7; SI, Extracted Data) and 83 were moderate linear relationships (e.g., 0.5 ≤ R² < 0.7), hereafter referred to as "established relationships". Overall, a larger proportion of relationships between C-DBP classes (Figure 3) and humic/fulvic-like components compared to other FDOM components were found in the selected articles as follows: 74%, 67%, 71%, and 76% for THMs, HAAs, HKs, and HALs, respectively. In contrast, a similar proportion of relationships between N-DBP classes and humic/fulvic-like versus protein-like, i.e., tyrosine/tryptophan-like, components were observed. Direct comparisons between the selected articles were not always straightforward, as EEM-PARAFAC and DBP formation potential methodologies were not uniform and model performance parameters were rarely reported. Most established relationships were derived from the entire data set using the Pearson correlation coefficient (R²), which is a satisfactory measure of explained variance for a single model but not a transferable metric for comparing two or more models. 60 Error metrics, e.g., root-mean-square error, slope, and intercept, which may aid intercomparison between models and model performance evaluation, were never discussed in the selected articles. In addition, none of the selected articles trained models on a random subset of the data set and used the remaining data to cross-validate model performance. Therefore, some caution is warranted in the transferability of the reported established relationships to predict DBP formation, where occasionally strong relationships may be coincidental 61 or site-specific for a single DOM source. 62,63
Table 1 footnotes: a Follows the traditional assignment and peak labels made elsewhere. 23,24,44,45 Roman numerals delimit the common environmental fluorescence regions where components were identified (Figure 1). Secondary excitation maxima are shown in parentheses. n indicates the number of components described in the selected articles used to calculate the range of excitation maxima, with the secondary maxima in parentheses. b Ranges of excitation and emission wavelengths were not considered below 240 and 300 nm, respectively, due to potentially deteriorating signal-to-noise ratios. c Potential environmental sources are described elsewhere. 23
THM and HAA Classes. THM and HAA classes reported in the selected articles contained 11 regulated DBPs (SI, Table S3) including four THMs (THM4), i.e., trichloro- (TCM), bromodichloro- (BDCM), dibromochloro- (DBCM), and tribromomethane (TBM), and five HAAs (HAA5), i.e., monochloro-, dichloro-, trichloro-, monobromo-, and dibromoacetic acid, with four additional unregulated HAAs, i.e., bromochloro-, bromodichloro-, dibromochloro-, and tribromoacetic acid. 64,65 It is noteworthy that trichloromethane was the dominant DBP, accounting for up to 92% of THM4 and up to 47% of the TOX load 66 following chlorination/chloramination in drinking water treatment plants when bromide concentrations were low. 16,56,67−70 THMs and HAAs were investigated in 88% and 51% of the selected articles (SI, Table S4), respectively, and together accounted for 53% of the total established linear relationships with PARAFAC components (Figure 3).
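The established relationships in this section are simple linear fits of DBP concentration against component intensity. As an illustration of the reporting gap noted above, the sketch below fits such a relationship on a random training subset and reports slope, intercept, R², and root-mean-square error on both the training and the held-out data. It is a minimal example under stated assumptions, not a method taken from any selected article: the function name `fit_and_validate`, the 30% hold-out fraction, and the synthetic intensities and concentrations are all hypothetical.

```python
import numpy as np


def fit_and_validate(fmax, dbp, test_fraction=0.3, seed=0):
    """Fit dbp = slope * fmax + intercept on a random training subset and
    report error metrics on the training and held-out samples."""
    fmax = np.asarray(fmax, dtype=float)
    dbp = np.asarray(dbp, dtype=float)

    # Random split into held-out (test) and training indices.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(fmax.size)
    n_test = max(1, int(round(test_fraction * fmax.size)))
    test, train = idx[:n_test], idx[n_test:]

    # Ordinary least-squares straight line on the training subset only.
    slope, intercept = np.polyfit(fmax[train], dbp[train], deg=1)

    def metrics(x, y):
        resid = y - (slope * x + intercept)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return {
            "r2": float(1.0 - np.sum(resid ** 2) / ss_tot),
            "rmse": float(np.sqrt(np.mean(resid ** 2))),
        }

    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "train": metrics(fmax[train], dbp[train]),
        "test": metrics(fmax[test], dbp[test]),
    }


# Synthetic, made-up data: component intensity (arbitrary units) and a
# DBP concentration that depends linearly on it plus random noise.
rng = np.random.default_rng(42)
fmax_demo = rng.uniform(0.1, 2.0, size=40)
dbp_demo = 35.0 * fmax_demo + 5.0 + rng.normal(0.0, 3.0, size=40)

result = fit_and_validate(fmax_demo, dbp_demo)
print(result)
```

Reporting the held-out RMSE in concentration units alongside R² gives a figure that can be compared between models built on different data sets, which R² alone does not provide.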
Strong, recurring relationships between humic- and fulvic-like FDOM components and THM and HAA formation (e.g., R² ≥ 0.7) indicate these ubiquitous environmental fluorophores may be significantly associated with THM and HAA formation globally. On the other hand, established relationships of THMs and HAAs with UV absorbance at 254 nm (A254), especially specific ultraviolet absorbance at 254 nm (SUVA254), were reported in 35% of cases for both classes. SUVA254 is directly proportional to the aromaticity of DOM, 71 which is consistent with aromatic DOM as a precursor for THM and HAA formation. 8 In addition, the relationships of THMs and HAAs with UV absorbance demonstrate that, although PARAFAC components are a stronger surrogate for prediction of THMs and HAAs, UV absorbance may be a technically simpler approach and should be considered depending on the purpose and extent of prediction desired. 55 Humic and fulvic acid compounds form a complex mixture of aromatic and aliphatic hydrocarbon structures with functional groups including amide, carboxyl, hydroxyl, and ketone, 6 which were postulated within the selected articles as important precursors of THMs and HAAs, 8 originating in surface water from fresh plant or leaf litter leachate (Table 1). This observation is consistent with electrophilic attack on carbonyl functional groups such as aldehydes, ketones, and carboxylic acids, which is thought to be one of the major pathways in the production of THMs and HAAs. 72 In addition, changes in EEM spectra may indicate a change in DOM molecular structure, where a decrease in fluorescence intensity over all components during chlorination was widely observed. 17,22,73,74 Decreases in fluorescence intensity tend to support the hypothesis that degradation of aromatic DOM, accompanied by the release of DBPs, occurs during chlorination. 74,75 Finally, multiple linear regression (MLR) models or the sum of PARAFAC components, which account for the additive and linear contribution of several components, showed substantially improved relationships in comparison to individual PARAFAC component models. 18,73,76−78 This suggests that several DOM compounds with different fluorescence regions may be associated with precursors for THMs and HAAs.
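The comparison between individual-component, summed-component, and MLR models described above can be sketched as follows. This is an illustrative example only: the helper `fit_ols`, the three hypothetical component intensities, and the synthetic THM concentrations are assumptions and are not drawn from the selected articles.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept term.
    X has shape (n_samples,) or (n_samples, n_predictors).
    Returns the coefficients (intercept first) and the in-sample R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, np.newaxis]
    design = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    pred = design @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, float(1.0 - ss_res / ss_tot)


# Synthetic, made-up data: three component intensities per sample and a
# THM concentration to which each component contributes additively.
rng = np.random.default_rng(1)
fmax = rng.uniform(0.1, 2.0, size=(60, 3))
thm = (20.0 * fmax[:, 0] + 12.0 * fmax[:, 1] + 4.0 * fmax[:, 2]
       + rng.normal(0.0, 2.0, size=60))

# One model per individual component.
r2_single = [fit_ols(fmax[:, j], thm)[1] for j in range(3)]
# One model on the sum of all components.
_, r2_sum = fit_ols(fmax.sum(axis=1), thm)
# Multiple linear regression on all components together.
coef_mlr, r2_mlr = fit_ols(fmax, thm)

print("single-component R2:", r2_single)
print("summed-component R2:", r2_sum)
print("MLR R2:", r2_mlr)
```

Note that the in-sample R² of an MLR model can never be lower than that of any model nested within it, so part of the improvement is a consequence of adding predictors. Evaluating the models on held-out data is one way to check whether the gain reflects genuine predictive skill.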
Brominated (Br-DBPs) and Iodinated DBPs (I-DBPs). Br-DBPs and I-DBPs are formed when bromide (Br−) and iodide (I−) ions are present in source waters. These species react quickly with hypochlorous acid (HOCl) to form hypobromous (HOBr) or hypoiodous acid (HOI), which may further react with DOM via the same pathways as HOCl. 79,80 However, HOBr typically reacts up to 3 orders of magnitude faster than HOCl and has a very high reactivity with phenol functional groups. 79 Therefore, the incorporation yield of Br− into THMs is around 50% compared to 5−10% for Cl−. 81 From Table 2, it can be seen that there are a large number of moderate to strong relationships between humic- and fulvic-like fluorophores and Br-DBP formation. Interestingly, the observed ratio between the variation of bromide and PARAFAC components (ΔBr/ΔC_PARAFAC) before and after chlorination exhibits a strong relationship with Br-DBP formation potential, e.g., Br-THMs, Br-HAAs, and Br-HANs. Conversely, weak relationships were observed with an individual PARAFAC component model in the same study. 82 Similar to Br-DBP formation, PARAFAC component indices (Table 2) yield a stronger linear relationship with I-DBP formation. 62 This observation may be consistent to some extent with the variation in HOI reactivity with the nature and location of aromatic substituents on the phenolic moieties. 83
Other C-DBPs. Other C-DBPs such as HKs and HALs were investigated in 33% and 20% of the selected articles, respectively (SI, Table S4). Like THMs and HAAs, they exhibited many moderate and strong relationships with humic/fulvic-like components (Figure 3). PARAFAC components were particularly suitable as a surrogate for these DBPs, as 75% and 63% of the investigated relationships were moderate or strong (Table 2) for HKs and HALs, respectively. Overall, these other C-DBP classes share similar relationships to THMs and HAAs with the humic/fulvic-like fluorophores described above.
N-DBP Classes. N-DBP classes were not as well investigated in the 45 selected articles as C-DBPs. N-DBP classes were represented by HNMs, HANs, NAs, and cyanogen halides (CNX) in 29%, 50%, 5%, and 2% of the selected articles (Figure 3), respectively. Despite the low number of articles for N-DBP classes, noteworthy observations could still be ascertained. Overall, the total numbers of established relationships with humic/fulvic-like PARAFAC components versus protein-like fluorophores were almost identical across the N-DBP classes (Figure 3 and Table 2). In addition, none of the selected articles reported the PARAFAC component associated with wastewater or nutrient-enriched waters identified in a previous study (λex/λem: 350/428 nm). 105 This observation is surprising because humic/fulvic DOM generally has a low organic nitrogen mass ratio (e.g., <5% N/C) compared to wastewaters or algal-derived DOM, which can have up to 20% N/C. 106 In addition, protein-like components, which have a high amino acid and organic nitrogen compound content, 52 may serve preferentially as N-DBP formation precursors. 10 In the context of identified algal/microbial DOM sources, 72% of the successfully established relationships for chlorination were between protein-like components and HNM and HAN formation potential, 73,89,107,108 which is in agreement with the N-DBP formation precursors described above. In contrast, other DOM sources, e.g., leaf leachate and natural soil/water organic matter, showed no preferential relationship between either humic/fulvic-like or protein-like components and the formation of N-DBPs. These observations support the conclusion that proteinaceous DOM is not necessarily the main precursor of N-DBPs and that their formation pathways may not always involve a similar amine precursor. 10,11 Some N-DBP formation pathways may involve inorganic nitrogen species (e.g., NH4+, NO2−, or NO3−) or chloramine as a nitrogen source that reacts with humic/fulvic-like DOM to produce N-DBPs, 10 but this cannot be concluded from PARAFAC modeling alone.
■ DISCUSSION AND RESEARCH OUTLOOK
Environmental Implications. Relationships between THMs and SUVA254 are well established (see section on THM and HAA classes). 16,68,73,77 From the findings of this review, a majority of the strong, recurring empirical relationships were observed between carbonaceous DBPs and humic/fulvic-like fluorophores (Figure 4) across the selected articles (Table 2) (SI, Extracted Data). Given that EEM-PARAFAC components represent the small number of independent underlying fluorophores present in raw water FDOM admixtures, they offer a much more selective surrogate for quantitative prediction of DBP formation potential in comparison to classical optical parameters such as UV absorbance or fluorescence peak picking. 16,17,51,73,85,89,96 However, in some articles PARAFAC components improved the prediction of DBP formation potential only marginally (ΔR² ≤ 0.1). 21,68,77,90 Furthermore, linear models developed with one specific DOM source and/or location with an associated fluorophore intensity are not necessarily transposable to other environments (Figure 4). 62,63 One possible explanation is that DOM originating from different sources may exhibit contrasting chemical composition but yield similar fluorescence intensities. 111 Additional analytical methods, such as high-resolution mass spectrometry (HR-MS), 112 size-exclusion chromatography, 113 and size fractionation, 16 may help to better constrain the molecular basis of FDOM components across different environmental sources. For example, HR-MS undertaken on environmental DOM samples has demonstrated that PARAFAC components can describe up to 59% of nontarget DOM species, 114 which highlights the unique capabilities of EEM-PARAFAC as a low-cost tool in DOM characterization. ML algorithms (such as ANNs) already in use for fluorescence data 33,115 may help to improve the success of DBP formation prediction using PARAFAC components. ML approaches offer distinct advantages by taking into account nonlinear relationships, interactions between variables, and diverse variable types, e.g., continuous and discrete, and do not rely on predetermined physically based rules or assumptions about a given data set. 60 However, the successful deployment of ML relies on having sufficient contrasting data to train and properly validate an algorithm capable of recognizing subtle differences in FDOM composition. 116 This requirement highlights a clear need for community sharing of raw EEM spectra globally, so that unified analysis can be performed under the same workflow, in a similar manner to the online repository "OpenFluor" for PARAFAC models. 117 Within the 45 selected articles, no raw EEM spectra were shared. While it is not generally practical to include such large data volumes in publication supporting information, online repositories such as acs.figshare 118 and Zenodo, 119 which embrace FAIR principles of data sharing, 120 are readily available and are advocated further.
Table 2 footnotes: a This table is a summary of data extracted from the 45 selected articles (complete data fields are provided in SI, Extracted Data as a TXT file). Established (linear) relationships for similar DBP species and disinfection methods are separated by a forward slash. b Reported DBP and chemical disinfection method employed. c Identified PARAFAC components classified by excitation−emission wavelength pairs into common environmental fluorescence regions (reported in Table 1), where hum, ful, m-hum, tyr, and tryp refer to humic-, fulvic-, microbial-humic-, tyrosine-, and tryptophan-like fluorophores, respectively. In addition, "multi" and "index" refer to relationships derived using multiple linear regression models or the sum of PARAFAC components and component indices, e.g., humic-like divided by tryptophan-like, respectively. d Strong linear relationships (R² ≥ 0.7) and moderate linear relationships (0.5 ≤ R² < 0.7) between similar DBP species and PARAFAC components are differentiated by a comma. Numbers of established relationships are expressed in parentheses; * and ** indicate p-values of ≤0.05 and ≤0.01, respectively. In the case of several significant relationships, only the highest p-value is reported.
Implications for Continuous Online Monitoring. Two approaches for continuous online monitoring applications have recently been explored: (i) selection of specific excitation−emission wavelength pairs from PARAFAC analysis and monitoring of their intensities to predict DBP formation online with relatively inexpensive in situ UV light-emitting diode (LED) fluorescence sensors; 93,121,122 (ii) acquisition of full EEM spectra using a laboratory fluorometer connected to the source water via an in situ fiber-optic sensor in combination with SOMs. 123 New, faster PARAFAC algorithms now available 124,125 will allow continuous online data processing where EEMs can be acquired at near-real-time frequencies.
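Approach (i) amounts to reading the fluorescence signal inside a fixed excitation/emission window rather than decomposing the full EEM. The sketch below illustrates the idea on a synthetic EEM. It is a hedged example only: the function `window_intensity`, the wavelength grids, the Gaussian peak shapes, and the window bounds are illustrative assumptions and do not describe any specific commercial sensor.

```python
import numpy as np


def window_intensity(eem, ex_nm, em_nm, ex_range, em_range):
    """Mean intensity of an EEM inside an excitation/emission window.
    eem has shape (len(ex_nm), len(em_nm)); ranges are (low, high) in nm."""
    ex_mask = (ex_nm >= ex_range[0]) & (ex_nm <= ex_range[1])
    em_mask = (em_nm >= em_range[0]) & (em_nm <= em_range[1])
    if not ex_mask.any() or not em_mask.any():
        raise ValueError("window lies outside the measured wavelength range")
    return float(eem[np.ix_(ex_mask, em_mask)].mean())


# Synthetic EEM on a regular wavelength grid (made-up values).
ex_nm = np.arange(240.0, 455.0, 5.0)
em_nm = np.arange(300.0, 602.0, 2.0)
EX, EM = np.meshgrid(ex_nm, em_nm, indexing="ij")

# Two hypothetical Gaussian fluorophores: a humic-like peak near 340/460 nm
# and a weaker tryptophan-like peak near 280/345 nm.
eem = (2.0 * np.exp(-((EX - 340.0) / 25.0) ** 2 - ((EM - 460.0) / 40.0) ** 2)
       + 0.5 * np.exp(-((EX - 280.0) / 15.0) ** 2 - ((EM - 345.0) / 20.0) ** 2))

# Illustrative window bounds for the two regions.
humic = window_intensity(eem, ex_nm, em_nm, (320.0, 370.0), (450.0, 490.0))
tryptophan = window_intensity(eem, ex_nm, em_nm, (270.0, 285.0), (340.0, 350.0))

print("humic-like window:", humic)
print("tryptophan-like window:", tryptophan)
```

A window reading of this kind cannot separate overlapping fluorophores in the way PARAFAC does, so the wavelength pairs would normally be chosen from a prior PARAFAC analysis of the same source water.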
Currently, method (i) is the more cost-effective approach: UV-LEDs offer a narrow spectral bandwidth centered on a specific wavelength pair and are commercially available for humic-like (λex/λem: 320−370 ± 15/450−490 ± 30 nm) and tryptophan-like (λex/λem: 270−285 ± 30/340−350 ± 30 nm) regions. An exhaustive list of current commercially available sensors and an overview of key practical aspects in the use of UV-LEDs are provided elsewhere. 41,126 Custom-made UV-LED systems have also been explored by some authors, 121,127 and the field is expected to grow with the rapid development of new UV-LEDs becoming available on the market. 128 In addition, there are several practical challenges, reviewed by Henderson, 40 for continuous online fluorescence applications in surface water. For example, some distortion of the fluorescence signal will be observed under high suspended sediment loads or rapid changes in temperature, which may require additional in situ instrumentation, postprocessing, and validation with a laboratory fluorometer for continuous monitoring applications. 129,130
■ CONCLUSIONS
To the best of our knowledge, the present study represents the first critical review to collate and extract recurring associations between observed PARAFAC components and the formation of specific DBP classes during chemical disinfection. From the 45 selected articles, we found 218 statistically significant linear relationships between the formation of 10 DBP classes and observed PARAFAC components; of these, 135 were strong (R² ≥ 0.7). From the findings of this review, humic- and fulvic-like fluorophores, typically originating from allochthonous watershed sources, demonstrate considerable potential as low-cost fluorescence surrogates (R² ≥ 0.7) for the formation of multiple C-DBP classes including THMs, HAAs, HALs, and HKs. In contrast, the formation of potentially more harmful N-DBP classes exhibited strong linear relationships across all fluorophore regions. However, where algal or microbial autochthonous DOM sources were present, protein-like fluorophores (e.g., tryptophan-like) alone showed strong relationships with N-DBP formation. Assigning specific components as surrogates for N-DBP formation is therefore more challenging and source-specific than assigning humic- and fulvic-like fluorophores as surrogates for C-DBPs. Relationships derived from multiple linear regression or sums of PARAFAC components tended to show stronger predictive capability compared to an individual component model. Predicting DBP formation from FDOM components during drinking water treatment presents new opportunities for continuous online monitoring applications, where treatment process operation and optimization are informed by source water FDOM composition with a view toward minimizing harmful concentrations of DBPs in consumer tap water.
■ ASSOCIATED CONTENT
Further information on the literature review (article selection criteria, classification of PARAFAC components, scope of selected articles) (PDF)
Extracted data from the 45 selected articles used in this review (TXT file format with a header directly readable in an R environment); metadata file (PDF) explaining the extracted data fields and coding (ZIP)
Figure caption (fragment): It should be noted that there is some uncertainty with the RU value positioning along the x-axis as instrument-specific conversion factors were not available. 110
The manuscript was written with the contributions of all authors. CRediT authorship contribution statement: E.