In-situ fluorescence spectroscopy indicates total bacterial abundance and dissolved organic carbon

We explore in-situ fluorescence spectroscopy as an instantaneous indicator of total bacterial abundance and faecal contamination in drinking water. Eighty-four samples were collected outside of the recharge season from groundwater-derived water sources in Dakar, Senegal. Samples were analysed for tryptophan-like (TLF) and humic-like (HLF) fluorescence in-situ, total bacterial cells by flow cytometry, and potential indicators of faecal contamination such as thermotolerant coliforms (TTCs), nitrate, and in a subset of 22 samples, dissolved organic carbon (DOC). Significant single-predictor linear regression models demonstrated that total bacterial cells were the most effective predictor of TLF, followed by on-site sanitation density; TTCs were not a significant predictor. An optimum multiple-predictor model of TLF incorporated total bacterial cells, nitrate, nitrite, on-site sanitation density, and sulphate (r² 0.68). HLF was similarly related to the same parameters as TLF, with total bacterial cells being the best correlated (ρₛ 0.64). In the subset of 22 sources, DOC clustered with TLF, HLF, and total bacterial cells, and a linear regression model demonstrated HLF was the best predictor of DOC (r² 0.84). The intergranular nature of the aquifer, timing of the study, and/or non-uniqueness of the signal to TTCs can explain the significant associations between TLF/HLF and indicators of faecal contamination such as on-site sanitation density and nutrients but not TTCs. The bacterial population that relates to TLF/HLF is likely to be a subsurface community that develops in-situ based on the availability of organic matter originating from faecal sources. In-situ


Highlights
• Total bacterial cells most related variable to tryptophan-like fluorescence (TLF)
• TLF and humic-like fluorescence (HLF) strongly correlate with dissolved organic carbon.
• Thermotolerant coliforms are not strongly related to other variables.
• TLF and HLF relate to faecal contamination.

Fluorescence spectroscopy has been proposed as a surrogate indicator of many environmental variables in freshwater, including biological oxygen demand, dissolved organic carbon (DOC), and other nutrients (Baker and Inverarity, 2004; Henderson et al., 2009; Hudson et al., 2008). More recently, it has been suggested that the technique could be a rapid indicator of faecal contamination (Stedmon et al., 2011), the most widespread health risk associated with drinking water (WHO, 2011). Tryptophan-like fluorescence (TLF) has been associated with the presence/absence and enumeration of thermotolerant (faecal) coliforms (TTCs), including Escherichia coli, in groundwater and surface water (Frank et al., 2017; Mendoza et al., 2020; Sorensen et al., 2015a, b, 2016, 2018a). Humic-like fluorescence (HLF) has also been similarly or better correlated than TLF to E. coli enumeration in UK public water supply boreholes (Sorensen et al., 2018a), Austrian springs (Frank et al., 2017), and controlled culture experiments (Fox et al., 2017).
Fluorescence at TLF and HLF wavelengths is not unique to TTCs such as E. coli. Tryptophan residues in proteins are ubiquitous in bacteria, and many bacterial cells directly fluoresce at TLF wavelengths (Bronk and Reinisch, 1993; Dalterio et al., 1986, 1987; Dartnell et al., 2013; Fox et al., 2017; Seaver et al., 1998; Sohn et al., 2009). Furthermore, multiple species excrete TLF and HLF fluorophores, including those that are omnipresent in freshwater systems such as Pseudomonas aeruginosa (Elliott et al., 2006; Fox et al., 2017; Nakar et al., 2020). Therefore, elevated TLF/HLF may be indicative of elevated total bacterial cells (TBCs), which are not necessarily an indicator of risk to human health, as opposed to purely TTCs. Previously, Bridgeman et al. (2015) demonstrated a correlation between TLF and TBCs, but not E. coli, in predominantly treated drinking water. Further, Sorensen et al. (2018a) correlated TLF/HLF with TBCs, as well as E. coli, in untreated groundwater from four UK public water supply boreholes.
Here, we investigate both TLF and HLF as indicators of TBCs, as well as of common parameters associated with faecal contamination such as TTCs, nitrate and DOC, across a wide range of drinking water sources. The study area is a highly contaminated aquifer beneath Thiaroye, a suburb of Dakar (Senegal), where on-site sanitation (OSS) comprises the only means of sewage disposal.

Study area
The unconfined Thiaroye aquifer beneath a suburb of Dakar is located on the Cap-Vert Peninsula (Fig. 1). The aquifer comprises Quaternary fine- and medium-grained aeolian sands overlying low-permeability Tertiary marl deposits (Faye et al., 2004, 2019), with no humiferous layers (Fall, 1986; Martin, 1970). The sands are 5 to 75 m thick depending on the morphology of the Tertiary deposits. The aquifer is bounded by the Atlantic Ocean to the North and Southwest, the Tertiary marl deposits that outcrop to the South, a piezometric ridge separating it from the infrabasaltic aquifer to the west, and the Tanma depression and its seasonal lake in the east. The water table is typically within 2 to 15 m of the ground surface and the hydraulic gradient is generally from the southeast towards the Atlantic Ocean.
Monsoonal rainfall occurs between July and October and provides the only annual precipitation, which is typically 450 to 500 mm (Faye et al., 2019). The semi-arid peninsula has an absence of perennial surface water features with the exception of hypersaline Retba Lake located below sea level. However, numerous seasonal lakes are expressions of a rise in the shallow water table. Tritium (³H), stable isotope ratios of O (δ¹⁸O) and H (δ²H), and piezometric data suggest groundwater is predominantly modern (post-1963), and diffuse recharge occurs during the latter part of the monsoon once soil moisture deficits are overcome (Faye et al., 2019).
The Thiaroye aquifer contributed ~50% of Dakar's water supply in the 1980s (Faye et al., 2004, 2019). Groundwater withdrawals have now decreased to ~5% of total municipal supply due to exorbitantly high nitrate (NO₃⁻) concentrations that had increased to an average of 450 mg/L by 2008 (Diédhiou et al., 2012). Stable isotopes of dissolved nitrate (δ¹⁵N and δ¹⁸O) indicate an organic source of contamination (Diédhiou et al., 2012; Re et al., 2011) that is likely to relate to the vast network of septic tanks (Diédhiou et al., 2012), as there is a lack of mains sewerage outside the historic city centre in the far west of the peninsula. Furthermore, Diaw et al. (submitted to journal) demonstrate a significant linear relationship between the density of OSS and NO₃⁻ concentrations. Notwithstanding high levels of contamination, the aquifer remains essential for self-supply due to limited access to piped water.

Groundwater sampling and analysis
Sampling sites
Eighty-four groundwater samples were collected from 70 water sources between 29th May and 3rd July 2018 (Fig. 1). Fourteen of the sources were resampled after 27th June following the only precipitation event (19 mm at Yoff) in the sampling period, as part of another study investigating temporal dynamics in water quality in the aquifer. These 14 resampled data were included following confirmation that the outcomes of the statistical analysis presented in the results remained unchanged. The 70 sources comprise: 41 drilled hand-pumped boreholes, 1 borehole equipped with a submersible pump, 22 dug-wells, which are all used for water supply, and 6 research piezometers. Samples were obtained directly from the source where a pump was installed in-situ; a 12 V submersible WaSP-P5 pump (In-Situ Europe, Redditch, UK) was used at hand dug-wells, and an MP1 pump (Grundfos, Bjerringbro, Denmark) set at around 1 m³/h was used in piezometers. All sources flowed for at least a minute before sampling to ensure all pipework associated with the pump was flushed, but the sources were not purged so the water was representative of what was actually being used for supply.
Sanitary inspection surveys were undertaken at all sources to assess potential risks of contamination observable at the surface (WHO, 1997). These surveys consist of a list of ten prescribed yes-no questions pertaining to the presence of potential sources of contamination (e.g. on-site sanitation (OSS), animal faeces, trash) and pathways for contamination to migrate rapidly into groundwater sources. The sanitary risk score (SRS) is the total number of positive responses to these ten questions. SRS is interpreted as 9-10 (very high), 6-8 (high), 3-5 (medium), and 0-2 (low).
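The scoring described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the survey tool used in the study: the function names are hypothetical, and it assumes that a score of 3 belongs to the medium band, so that the four bands do not overlap.

```python
def sanitary_risk_score(answers):
    """Count the positive ('yes') responses in a ten-question inspection."""
    if len(answers) != 10:
        raise ValueError("expected ten yes/no answers")
    return sum(1 for a in answers if a)


def risk_category(score):
    """Map a sanitary risk score (0-10) to its interpretation band.

    Assumption: a score of 3 is treated as 'medium', so 'low' covers 0-2.
    """
    if not 0 <= score <= 10:
        raise ValueError("score must lie between 0 and 10")
    if score >= 9:
        return "very high"
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


# Hypothetical source with five positive responses
example_answers = [True, True, False, True, False, False, True, False, True, False]
example_score = sanitary_risk_score(example_answers)
print(example_score, risk_category(example_score))
```

A source with five positive responses falls in the medium band, which matches the band reported for the mean score in the results.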
The density of OSS in proximity to each source was retrieved from a 100 × 100 m grid of the entire Thiaroye aquifer previously devised by Diaw et al. (submitted to journal). The grid was produced using both object-oriented classification and photo-interpretation or visual interpretation of high-resolution Quickbird satellite images, which were validated by ground-truthing surveys. Population density was retrieved for the district of each source from the National Census of Senegal in 2016 (ANSD, 2016).

In-situ analysis
TLF was determined on unfiltered samples using a portable UviLux fluorimeter targeting the excitation-emission peak at λex 280 ± 15 nm and λem 360 ± 27.5 nm (Chelsea Technologies Group Ltd., UK). The minimum detection limit for the fluorimeter is 0.17 ± 0.18 μg/L (3σ) dissolved tryptophan (Khamis et al., 2015). HLF was measured using a UviLux fluorimeter configured at an excitation-emission peak at λex 280 ± 15 nm and λem 450 ± 27.5 nm (Chelsea Technologies Group Ltd., UK). This fluorimeter did not target the centres of the HLF peaks but was aligned at the same excitation as TLF because of the optical overlap between TLF and HLF regions. Both TLF and HLF fluorimeters expressed intensity using in-built factory calibrations in quinine sulphate units (QSU), allowing calculation of TLF:HLF ratios. TLF data herein are reported in ppb tryptophan, for comparison to previous work, through calibration in standards of laboratory grade L-tryptophan (Acros Organics, USA) dissolved in ultrapure water (ppb TLF = 2.4461 × QSU TLF − 1.5086, r² 1.00). Fluorescence analysis was conducted in an HDPE beaker placed within a covered black container to prevent interference from sunlight. Specific electric conductivity (SEC), pH, temperature and turbidity were quantified using a multi-parameter Manta-2 sonde (Eureka Water Probes, USA).
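As a worked illustration of the calibration quoted above, the following Python sketch converts a TLF reading in QSU to ppb tryptophan and computes a TLF:HLF ratio from two QSU readings. The slope and intercept are those given in the text; the function names and the example readings are hypothetical.

```python
# Calibration reported in the text: ppb TLF = 2.4461 * QSU TLF - 1.5086
CAL_SLOPE = 2.4461
CAL_INTERCEPT = -1.5086


def tlf_qsu_to_ppb(tlf_qsu):
    """Convert a TLF reading in quinine sulphate units to ppb tryptophan."""
    return CAL_SLOPE * tlf_qsu + CAL_INTERCEPT


def tlf_hlf_ratio(tlf_qsu, hlf_qsu):
    """TLF:HLF ratio, computed with both intensities expressed in QSU."""
    if hlf_qsu <= 0:
        raise ValueError("HLF intensity must be positive")
    return tlf_qsu / hlf_qsu


# Hypothetical readings, for illustration only
print(tlf_qsu_to_ppb(10.0))
print(tlf_hlf_ratio(20.0, 80.0))
```

The ratio is taken on the QSU values rather than on the ppb values, because only TLF is converted to tryptophan equivalents.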

Laboratory analysis
Thermotolerant coliform (TTC) samples were collected in sterile 250 mL polypropylene bottles and stored in a cool box (up to 8 h) before analysis. TTCs were isolated and enumerated using the membrane filtration method with Membrane Lauryl Sulphate Broth (MLSB, Oxoid Ltd., UK) as the selective medium. Between 1 and 100 mL of sample was passed through a 0.45 μm cellulose nitrate membrane (GE Whatman®, UK) to ensure colonies were not too numerous to count, whilst maintaining a limit of detection of 1 cfu/100 mL. The membrane was placed on an absorbent pad (Pall Gelman, Germany) saturated with MLSB broth in an aluminium petri dish and incubated at 44°C for 18-23 h. Subsequently, all cream to yellow colonies >1 mm were counted as TTCs.
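Because the filtered volume varied between 1 and 100 mL, plate counts have to be normalised to a common volume before they can be compared. The text does not spell out this step, so the Python sketch below assumes the usual proportional scaling to 100 mL. The function name and the example counts are hypothetical.

```python
def cfu_per_100ml(colonies, filtered_volume_ml):
    """Scale a membrane-filtration colony count to cfu per 100 mL.

    Assumption: counts scale proportionally with the volume filtered.
    """
    if filtered_volume_ml <= 0:
        raise ValueError("filtered volume must be positive")
    return colonies * 100.0 / filtered_volume_ml


# Hypothetical plate: 23 colonies counted after filtering 10 mL
print(cfu_per_100ml(23, 10))
# A single colony from a full 100 mL aliquot sits at the detection limit
print(cfu_per_100ml(1, 100))
```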
Samples for total (planktonic) bacterial cells (TBCs) were collected in 4.5 mL polypropylene cryovials (Starlab, UK) that were pre-treated with the preservative glutaraldehyde and the surfactant Pluronic F68 (Marie et al., 2014) at final concentrations of 1% and 0.01%, respectively. The samples were frozen at −18°C within 8 h of collection, defrosted overnight during transit to the UK, and analysed the following morning on a BD Accuri C6 flow cytometer equipped with a 488 nm solid state laser (Becton Dickinson UK Ltd., UK). Water samples (500 μL) were stained with a 1:50 v/v solution of SYBR Green I (Sigma-Aldrich, UK) to a final concentration of 1:10,000 v/v for 20 min in the dark at room temperature. Samples were run with the Accuri at a slow flow rate (14 μL/min, 10 μm core) for 5 min and a detection threshold of 1500 on channel FL1. A single manually drawn gate was created to discriminate bacterial cells from particulate background, and cells per mL were calculated using the total cell count in 5 min divided by the reported volume run in μL.
Samples for major anions were filtered through 0.45 μm cellulose nitrate membranes (GE Whatman®, UK) into 30 mL Nalgene bottles (Thermo Fisher Scientific, USA) in the field. Analysis for chloride, nitrite, nitrate and sulphate was conducted by ion chromatography (Dionex AS50, Thermo Fisher Scientific, USA). Samples for dissolved organic carbon (DOC) were filtered through 0.22 μm hydrophilic polyvinylidene fluoride membranes (Sterivex, Merck KGaA, Germany) into 15 mL polypropylene centrifuge tubes (Merck KGaA, Germany). DOC was quantified by thermal oxidation using an Elementar Vario Cube (Elementar Analysensysteme GmbH, Germany). Only a subset of 22 samples was collected and analysed for DOC due to budget constraints. All hydrochemical samples were refrigerated at 4°C between collection and analysis.
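The flow cytometry arithmetic above can be illustrated with a short Python sketch. It assumes the run volume is reported in μL, so that a factor of 1000 converts the result to cells per mL. It also assumes the stain is added as a 1:50 working solution that must be diluted to 1:10,000 in the final mixture. The function names and the example numbers are hypothetical.

```python
def cells_per_ml(gated_events, volume_run_ul):
    """Convert a gated event count to cells per mL.

    Assumption: the cytometer reports the volume run in microlitres.
    """
    if volume_run_ul <= 0:
        raise ValueError("volume run must be positive")
    return gated_events / volume_run_ul * 1000.0


def stain_volume(sample_volume, stock_dilution=50.0, final_dilution=10000.0):
    """Volume of working stain to add, in the same unit as the sample.

    Solves stain / (sample + stain) = stock_dilution / final_dilution.
    """
    return sample_volume / (final_dilution / stock_dilution - 1.0)


# Hypothetical run: 35,000 gated events in a reported 70 units of volume (uL)
print(cells_per_ml(35000, 70.0))
# Stain needed for a sample of 500 volume units
print(round(stain_volume(500.0), 2))
```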

Statistical analyses
Linear regression and correlation analyses were performed in R version 3.4.0 using base commands unless described otherwise. A forward stepwise linear regression algorithm was implemented (train, caret package) using 10-fold cross validation. The algorithm adds one predictor to the model at a time according to whichever predictor will yield the largest decrease in the root mean square error (RMSE), until no further improvement can be achieved. Once an optimal set of predictors was retrieved, standardised beta coefficients (β) were estimated using lm.beta (lm.beta package) to allow quantitative comparisons between predictors. This command multiplies each unstandardised coefficient by the standard deviation of the associated predictor over the standard deviation of the dependent variable; hence a β refers to how many standard deviations a dependent variable changes per standard deviation increase in the predictor. The normality of model residuals was evaluated using Q-Q plots (qqPlot, car package) and Shapiro-Wilk's tests that employ the null hypothesis that the population is of Gaussian distribution (Royston, 1982). Multicollinearity (i.e. correlation between predictors) within multi-predictor models was investigated by calculating variance inflation factors (VIF) (vif, car package) (Alin, 2010). Prior to linear regression modelling, all variables with a skewness (skewness, moments package) greater than one were natural log transformed. In the case of TTCs and NO₂⁻, additions of 1 and 0.1, respectively, were made prior to transformation to ensure the logarithm could be defined. Transforming the data was necessary as initial modelling produced skewed residuals, contrary to the assumptions required by the model. Histograms of all variables used in the linear regression modelling are presented in Fig. S1.
Correlation matrices for variables of interest were produced by rcorr (Hmisc package) using Spearman's Rank incorporating mid-ranks in the case of ties (Myles and Wolfe, 1973), due to the non-Gaussian distribution of some data. These matrices were displayed using corrplot (corrplot package) and ordered by hierarchical clustering (Friendly, 2002; Murdoch and Chow, 1996).
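The forward stepwise procedure described above can be sketched in plain Python. The analysis itself was run in R, so this is an illustration of the idea and not the authors' code. It uses deterministic folds, pools the squared errors across folds, and stops when the cross-validated RMSE no longer decreases by more than a small tolerance. All function names, the tolerance, and the synthetic variables x1 to x3 are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def cv_rmse(data, y, names, k=5):
    """k-fold cross-validated RMSE for a model using the predictors in names."""
    n = len(y)
    sq_err = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        coef = fit_ols([[data[v][i] for v in names] for i in train],
                       [y[i] for i in train])
        for i in test:
            pred = coef[0] + sum(c * data[v][i] for c, v in zip(coef[1:], names))
            sq_err += (y[i] - pred) ** 2
    return math.sqrt(sq_err / n)


def forward_stepwise(data, y, k=5, min_gain=1e-9):
    """Greedily add the predictor that lowers the cross-validated RMSE most."""
    selected = []
    best = cv_rmse(data, y, selected, k)  # intercept-only baseline
    while len(selected) < len(data):
        scores = {v: cv_rmse(data, y, selected + [v], k)
                  for v in data if v not in selected}
        cand = min(scores, key=scores.get)
        if best - scores[cand] <= min_gain:
            break
        selected.append(cand)
        best = scores[cand]
    return selected, best


# Synthetic demonstration: y depends on x1 and x2 only; x3 is a nuisance variable
x1 = [float(i) for i in range(20)]
x2 = [float((i * i) % 7) for i in range(20)]
x3 = [float((5 * i) % 3) for i in range(20)]
y = [3.0 * a - 2.0 * b for a, b in zip(x1, x2)]
selected, rmse = forward_stepwise({"x1": x1, "x2": x2, "x3": x3}, y)
print(selected)
```

In this synthetic case the nuisance variable is never admitted, because adding it cannot reduce the cross-validated RMSE once x1 and x2 are in the model.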
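Spearman's rank correlation with mid-ranks for ties, as described above, amounts to a Pearson correlation computed on averaged ranks. The Python sketch below illustrates this for a single pair of variables. It is not the rcorr implementation, and it omits the significance test and the hierarchical clustering step. The function names are hypothetical.

```python
import math


def midranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation using mid-ranks for ties."""
    return pearson(midranks(x), midranks(y))


print(midranks([10, 20, 20, 30]))
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```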

Water quality and sanitary risks
Groundwater beneath Dakar displays clear evidence of faecal contamination (Fig. 2). TTCs are present in 80% of samples with a median count of 115 cfu/100 mL. The median NO₃⁻ concentration (257 mg/L) is five times the WHO drinking water guideline value and exceeds this value in 90% of samples. A subset of sources also shows that the concentration of dissolved organic carbon is elevated (median 6.8 mg/L). Median TLF (70 ppb) and HLF (86.9 QSU) are similarly high. There is a large variation in TLF:HLF from 0.11 to 0.61 (5-95th percentiles); there is no clear TLF peak as the ratio is <1, with the exception of a single site. Median water temperature was 28.0°C with variations limited to 2.2°C (2σ), indicating minimal influence upon fluorescence data (Baker, 2005; Khamis et al., 2015). Turbidity is also likely to have limited influence on TLF/HLF with a median of 1.5 NTU and 2σ of 20.9 NTU (Khamis et al., 2015; Saraceno et al., 2017). Sanitary risk scores show that these groundwater sources are at moderate risk of contamination from the surface (mean 4.7). In terms of potential sources of faecal contamination within 10 m, only one site had on-site sanitation (OSS), 64% had animal faeces, and 50% had trash.

Predicting thermotolerant coliform counts
There is no relationship between fluorescent OM fractions and TTCs (Table 1). Significant linear regression models cannot be developed using either TLF (p-value 0.793) or HLF (p-value 0.737) as predictors of TTC counts. TLF is far in excess of the 1.3 ppb threshold proposed to indicate faecal contamination and typically (>95%) exceeds the 6.9 ppb threshold to classify high risk sources (≥100 cfu/100 mL TTCs).
Figure caption (box plots): Box boundaries illustrate the 25th and 75th percentiles, the line within the box is the median, the whiskers are the 10th and 90th percentiles, and the circles are the 5th and 95th percentiles (n = 84).
The only significant predictors of TTCs are turbidity (p-value 0.011) and NO₂⁻ (p-value 0.041), though relationships are very weak (r² 0.07 and 0.04, respectively). Potential explanatory variables relating to common sources of TTCs observed at the surface, including OSS within 10 m, density of OSS, and the presence of either animal faeces or trash, are not significant (Table 1). The lack of relationships between TTCs and TBCs is also notable.

Predicting tryptophan-like fluorescence
Significant single-predictor linear regression models can be estimated using either: HLF, TBCs, OSS density, SEC, SO₄²⁻, population density, Cl⁻, or NO₂⁻ with a p-value of <0.001, or NO₃⁻ or SRS with a p-value of <0.05 (Table 2). The superior model uses HLF as the predictor and has an r² of 0.58, with the next best models using TBCs and OSS density obtaining an r² of 0.41 and 0.30, respectively. Many of these ten TLF predictors are significantly correlated to each other with varying strengths (ρₛ 0.29-0.78, Fig. 3), although they can be split into six hierarchical clusters: HLF and TBCs (alongside TLF); OSS and population densities; SO₄²⁻, SEC, and Cl⁻; NO₂⁻; SRS; and NO₃⁻ (Fig. 3).
Implementing the stepwise linear regression algorithm, the model incorporates HLF first, followed by NO₃⁻, then TBCs and, finally, OSS density (p-value < 0.001) (Eq. (1)). HLF is the most important predictor in the model: it is added first and has the greatest β. The additional predictors improve both the RMSE and r² to final values of 0.40 and 0.74, respectively (Table 3).
The stepwise algorithm was also applied excluding HLF as a predictor. TBCs is the most important predictor and is included first, followed by NO₃⁻, OSS density, NO₂⁻ and SO₄²⁻ (p-value < 0.001) (Eq. (2)). The final model has an r² of 0.68 and an RMSE of 0.44 (Table 3, Fig. 4B). Eq. (2) predictions are similar to Eq. (1) at lower TLF, with increasing differences between predictions as TLF increases (Fig. 4C).
Both multiple linear regression models have Gaussian residuals (Fig. S2, Shapiro-Wilk, p-values = 0.103-0.330), no systematic spatial patterns in model residuals (Fig. S3), and no evidence of multicollinearity (VIF <2.06, Table S1). The models are robust to the two highest TLF values; if we consider them outliers, identical predictors enter in the same order, producing marginal improvements in the RMSE and r² (Eqs. (S1) and (S2)). Furthermore, both models are also significant (p-value < 0.001) for the two main types of groundwater source, hand-pumped boreholes and dug wells, individually, where the models have an r² of between 0.67 and 0.87.

Relationships with DOC
Significant predictors of ln(TLF) are generally also predictors of ln(DOC) for the subset of 22 sources (Table S2). The optimal single-predictor models are HLF (r² 0.84) and ln(TLF) (r² 0.71). The predictors and TLF cluster identically to the complete dataset with similar correlations between them (Fig. 5). DOC clusters with TLF, HLF and TBCs and there is a very strong or strong tendency for DOC to increase with each of these variables (Fig. 5). TLF and HLF remain strongly associated with TBCs (ρₛ 0.67-0.68, p-value < 0.001) in this subset of sources. No significant relationships exist between either DOC, NO₃⁻ or OSS/population densities.
The stepwise algorithm was employed using ln(DOC) as the dependent variable. HLF enters the model first, followed by TLF, which improves performance at lower concentrations (<2), and finally OSS density (Fig. 6), with the final model (Eq. (3)) having an RMSE and r² of 0.19 and 0.90, respectively. When ln(TLF) was designated the dependent variable, the optimal model includes only ln(DOC) (Eq. (4)). Both multiple linear regression models (Eqs. (3) and (4)) have Gaussian residuals (Fig. S2, Shapiro-Wilk, p-values = 0.189-0.642) and Eq. (3) has no evidence of multicollinearity (VIF <2.76, Table S1).

Fluorescent OM as an in-situ indicator of TTCs
This is the first groundwater study to demonstrate no relationship between TLF/HLF and TTCs. We consider several potential explanations for this that relate to the environment of the Thiaroye aquifer, the timing of the study, and the uniqueness of TLF to TTCs. Firstly, the Thiaroye aquifer has been subject to high loading of faecal waste from a dense network of OSS facilities since settlement began in the 1970s. In intergranular aquifers such as this, groundwater movement is relatively slow and porosity is high, facilitating the accumulation of pollutants over time if they do not break down as rapidly as they arrive. We estimate a groundwater velocity of around 30 to 200 m/year here, indicating a travel time along the flow path from outcrop in the southeast to the Atlantic Ocean in the North of up to 50 to 120 years (assuming a hydraulic conductivity of 10-60 m/d (Henry, 1972), hydraulic gradient of 0.003 (Faye et al., 2019), and effective porosity of 0.30 (Diedhiou, 2011)). Consequently, median NO₃⁻ is five times the WHO drinking water quality guideline value, median DOC is almost six times the global median (McDonough et al., 2020), and median TLF/HLF are at least an order of magnitude greater than in previous groundwater studies (Sorensen et al., 2015a, 2016, 2018a). In terms of TLF, the groundwater resembles poor quality surface waters in South Africa. Against this elevated historical baseline of fluorescent OM that is potentially spatially heterogeneous across the city, detecting deviations in recent faecal contamination, determined by TTCs, may not be possible.
Figure caption (start truncated): ... Table 2 for the subset of 22 groundwater sources. Only significant (p-value < 0.01) Spearman's ρ are shown. Variables are ordered by hierarchical clustering and black squares enclose six selected clusters.
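The groundwater velocity range quoted in this section can be checked with a short calculation. The text does not state the formula used, so the Python sketch below assumes the standard average linear (seepage) velocity, v = K i / nₑ, where K is hydraulic conductivity, i is hydraulic gradient, and nₑ is effective porosity. The parameter values are those cited in the text; the function name is hypothetical.

```python
def seepage_velocity_m_per_year(k_m_per_day, gradient, effective_porosity):
    """Average linear groundwater velocity, v = K * i / n_e, in m/year.

    Assumption: Darcy flow through a homogeneous intergranular medium.
    """
    if effective_porosity <= 0:
        raise ValueError("effective porosity must be positive")
    return k_m_per_day * gradient / effective_porosity * 365.0


# Hydraulic conductivity bounds of 10 and 60 m/d, gradient 0.003, porosity 0.30
v_low = seepage_velocity_m_per_year(10.0, 0.003, 0.30)
v_high = seepage_velocity_m_per_year(60.0, 0.003, 0.30)
print(round(v_low, 1), round(v_high, 1))
```

Under these assumptions the bounds come out at roughly 37 and 219 m/year, in line with the quoted estimate of around 30 to 200 m/year.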
Additionally, there are also likely to be fluorophores unrelated to faecal contamination in this complex urban environment, which fluoresce at either TLF or HLF wavelengths such as proteins in solid organic waste and xenobiotic compounds (Baker and Curry, 2004;Muller et al., 2011).
The fine- and medium-grained sands of the aquifer are also likely to be effective at straining bacteria during vertical and lateral groundwater flow, whilst predominantly extracellular TLF and HLF fluorophores are expected to be transported more readily. Indeed, we have demonstrated that the majority of TLF (~97%) was extracellular and there was no evidence of intracellular HLF in this aquifer (Sorensen et al., submitted to journal). There are no specific studies investigating bacterial transport in this aquifer, though Weaver et al. (2013) showed bacterial transport was limited to only a few metres in moderate and coarse sandy aquifers.
Our study was conducted outside of the recharge season, following nine months without rainfall; rainfall is often a key driver of the microbiological contamination of groundwater (Hynds et al., 2012; Worthington and Smart, 2017). Furthermore, OSS, which could provide a perennial source of recharging water contaminated with TTCs, was typically absent in close proximity to the water sources. Consequently, any TTCs present during the recharge season may have perished or become non-culturable whilst elevated fluorescent OM remained present. This seasonal pattern was previously reported by Sorensen et al. (2015a), who demonstrated that at some groundwater sources, TLF remained perennially elevated whereas TTCs were only elevated during the recharge season. Future sampling during the recharge season could examine whether seasonal TLF/HLF-TTC relationships exist.
Finally, fluorescence at TLF and HLF wavelengths is not unique to TTCs such as E. coli (Bronk and Reinisch, 1993; Dalterio et al., 1986, 1987; Dartnell et al., 2013; Fox et al., 2017; Seaver et al., 1998; Sohn et al., 2009). Here, we demonstrate, despite the lack of a TLF/HLF-TTC relationship, that TBCs are the best predictor of TLF and are strongly correlated to HLF. Hence, non-TTC bacteria may be the source of TLF/HLF.

Fluorescent OM as an in-situ indicator of total bacterial cells
Relationships between TLF/HLF and TBCs may indicate that we are detecting the fluorescence of autochthonous compounds produced by bacteria, given the associated fluorophores are predominantly extracellular (Sorensen et al., submitted to journal). Therefore, we are enumerating cells indirectly because of their activity, which is consistent with surface water (Cammack et al., 2004; Hudson et al., 2008; Parlanti et al., 2000) and laboratory studies (Fox et al., 2017, 2019) linking fluorescent OM and microbial activity. Despite this, abundance and activity are likely to be broadly interlinked in environmental systems.
Alternatively, we can consider that fluorescent OM is allochthonous and derives mainly from anthropogenic activity at the surface, in addition to a subordinate baseline relating to naturally occurring fluorescent OM. The statistical relationship between TLF/HLF and OSS density supports the theory that septic tanks are a key source of allochthonous OM here. This anthropogenic fluorescent OM is associated with nutrients (C and N) that provide resources to the bacterial community in a habitat that typically has low availability of organic carbon and nutrients (Griebler and Lueders, 2009). It then follows that the greater the OM supply to the system, the higher the bacterial biomass that can be supported. This hypothesis has been confirmed in multiple groundwater studies that have specifically related DOC inputs from anthropogenic sources to increases in bacterial abundance and activity (Findlay et al., 1993; Foulquier et al., 2011; Smith et al., 2015; Sobczak and Findlay, 2002). Therefore, the demonstrated relationship may not be a result of bacteria producing TLF/HLF fluorophores in-situ, but rather of fluorophores being associated with OM inputs from the surface that provide a microbial resource. The abundance of this resource is then quantitatively related to the abundance of bacteria.
A final explanation may be that the bacterial population is transported alongside fluorescent OM from the same source at the surface, because many sources, particularly those that are faecal, contain elevated fluorescent OM and bacteria. We consider this to be the least plausible explanation as bacterial transport through the fine- and medium-grained sand aquifer is likely to be limited. Hence, the bacterial population that relates to TLF/HLF is likely to be a subsurface community that develops in-situ. Furthermore, the study was conducted nine months after the termination of the recharge season, which is likely to provide the greatest influx of organic matter and bacteria; any foreign bacteria from that point in time are expected to have perished.
Future microbiology studies could use high-throughput DNA sequencing to reveal the community structure and function of the groundwater communities to indicate the primary source of these communities. For example, the Bayesian tool SourceTracker (Knights et al., 2011) has been used to identify sources of faecal microbes in a variety of freshwater habitats (Baral et al., 2018; Brown et al., 2017; Henry et al., 2016). In addition, it is possible to use flow cytometry to distinguish what proportion of the community is live versus dead (Berney et al., 2007) and the extent of microbial activity (Léonard et al., 2016). When combined, such analyses could disentangle the relationships between total bacterial cells and TLF/HLF by elucidating the origins of the bacterial community and whether the fluorescent OM is likely to be autochthonous or allochthonous.

Fluorescent OM as an in-situ indicator of DOC
This is the first groundwater study demonstrating a strong relationship between in-situ fluorescent OM and laboratory DOC data. Previous in-situ fluorescence research has demonstrated similarly strong relationships (r² 0.80-1.0) in surface waters (Downing et al., 2012; Khamis et al., 2017; Lee et al., 2015; Snyder et al., 2018; Tunaley et al., 2016). In these studies, researchers were deploying fluorescence sensors targeting the excitation peak of HLF around 350 nm and, with the exception of Khamis et al. (2017), did not investigate DOC-TLF relationships. Our analysis mirrors the linear regression modelling of Khamis et al. (2017): HLF is a slightly better predictor of DOC but the model is marginally improved by the addition of TLF. The inclusion of both HLF and TLF suggests that each wavelength pair results in the fluorescence of slightly different components of the available DOC.

Conclusions
Fluorescent OM is significantly associated with parameters linked to faecal contamination that include on-site sanitation density (OSS), NO₂⁻/NO₃⁻, total bacterial cells (TBCs), and dissolved organic carbon (DOC); but not with thermotolerant coliforms (TTCs), which tend to be indicative of recent contamination. It is unclear whether the lack of TTC association is because of: the intergranular nature of the aquifer, sampling being conducted nine months after the cessation of the recharge season, or the non-uniqueness of tryptophan-like (TLF) and humic-like fluorescence (HLF) to TTCs. In-situ fluorescence spectroscopy instantly indicates a drinking water source is impacted by faecal contamination, although it is unclear how that may relate to microbial risk in this setting. TBCs is the most related environmental parameter to both TLF and HLF across all samples. We consider this relationship is a result of either: (1) OM and its embedded nutrients providing a resource for bacteria in the subsurface; or (2) the in-situ production of fluorescent OM by subsurface bacteria. Irrespective, the relationships support the case of fluorescence spectroscopy as a more practical, cheaper, and robust alternative for indicating TBCs than quantification by flow cytometry. In remote settings, there is the additional advantage that samples can be measured in-situ rather than being preserved for subsequent laboratory analysis using hazardous chemicals (e.g. formaldehyde, glutaraldehyde, ethanol) that can modify the sample and are not always effective (Kamiya et al., 2007).
This groundwater study demonstrates strong relationships between in-situ fluorescent OM and DOC. HLF is the most strongly correlated peak to DOC and is likely to be sufficient in providing high-resolution data that could provide novel insights into DOC and C fluxes within aquifers, an ecosystem that is generally C limited. Furthermore, online fluorimeters could facilitate more cost-effective chlorination of potable groundwater and forewarn of the generation of harmful disinfection by-products within the water industry.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.