Identification of biomarkers in wastewater-based epidemiology: Main approaches and analytical methods

Wastewater-based epidemiology (WBE) has become a popular tool for estimating the use of drugs of abuse and, more recently, for establishing the incidence of COVID-19 in large cities. However, its scope has also expanded to that of a technique able to establish a fingerprint of the characteristics of a city relative to other cities, such as its state of health/disease, healthy/unhealthy living habits and exposure to different types of contaminants. This has been made possible by the identification of human biomarkers and by the fingerprinting and profiling of the characteristics of the wastewater catchment that determine these circumstances. The purpose of this review is to analyze the different methodological schemes that have been developed for biomarker identification, the most characteristic analytical techniques in each scheme, their advantages and disadvantages, and the knowledge gaps identified. We also discuss the future scope for development.


Introduction
The theoretical idea of wastewater-based epidemiology (WBE) was first described by Daughton in 2001 [1]. The concept was first put into practice by estimating the consumption of drugs of abuse from their concentrations in wastewater, and it was later extended to the consumption of other substances and even to exposure to different pollutants, such as pesticides or phthalates [2–4]. The prevalence of some bacterial and viral diseases has also been assessed with this technique [5–7]. The role that WBE has played in the recent, widely publicized monitoring of the spread of SARS-CoV-2 in populations through their wastewater cannot be overlooked [8–12]. Almost 20 years after his first proposal, Daughton [13] again proved to be a true visionary of the possibilities of WBE to establish a fingerprint of a community's exposure to toxic substances, its health status and/or its lifestyle by introducing the concept of Sewage Chemical-Information Mining (SCIM). This concept involves monitoring sewage for the information that resides in the natural and anthropogenic chemicals present in sewers as a result of the everyday actions, activities and behaviors of humans. SCIM implies a broader application of WBE that can establish authentic fingerprints and profiles of cities, including not only consumption but also health and socioeconomic aspects. Currently, the transition of European cities towards sustainability is a reality that must deal with climate neutrality, safe and healthy food systems and water optimization while maintaining an adequate quality of life. WBE can play a pivotal role here, as fingerprinting and profiling allow the habits and needs of citizens to be studied globally and addressed in the most suitable way. In fact, many scientists are already mining the chemical information in urban wastewater [14–20].
To develop fully, WBE needs to evolve and find solutions to certain gaps and shortcomings that remain in the approach [3,4,21–23]. As the previous paragraphs show, WBE is highly multidisciplinary, involving analysts, epidemiologists, microbiologists, sociologists, chemical engineers, economists, health specialists, etc. [24]. However, it also has a very strong analytical component, which we consider relevant and of special interest for the topic covered by this journal [3,4].
In WBE, the ability to move from theory to practice depends on the ability to identify biomarkers that allow the estimation of drug use, leisure habits, health, disease, exposure, etc. [15,25–27]. This analytical field is dominated by liquid chromatography-mass spectrometry (LC-MS) techniques, since the target molecules are excreted human metabolites, which tend to be more polar than their parent compounds [3]. However, given the many different instruments and workflows available, it is crucial to have methodologies (analytical schemes for detection and elucidation) that determine these biomarkers appropriately, together with well-validated analytical methods capable of quantifying them. This is challenging, since wastewater has a complex and highly variable chemical composition, and its many interferences make it difficult to analyze. Many approaches also exist for detecting viruses by DNA- or RNA-based techniques, but we consider this a separate important field that deserves its own review, and we focus here only on human biomarkers.
In recent years there has been a tremendous methodological effort to identify biomarkers in wastewater; in fact, many essential reviews have this sole objective [15,25–27]. However, a critical review that outlines the different types of analytical protocols in a systematic way, including a description of the methods and techniques applied in each case, is still lacking. This review aims to close that gap by classifying the approaches used to identify biomarkers in wastewater and offering a critical discussion of new trends and future directions in the analytical methodologies applied. It also provides information on the major technological advances in the methodologies employed for quantitative biomarker-based WBE research. This is intended to be a critical rather than a comprehensive review, so a selection of the most relevant recently published work in terms of instrumental and methodological aspects and applications is presented. The number of studies in this field is enormous, so representative papers published since 2017 are discussed. A few papers outside that time range that represented milestones within WBE are also included.

Classification and characteristics of the different approaches to identify biomarkers
As with the approaches used to analyze proteins, the approaches used to identify biomarkers can be classified as (i) top-down and (ii) bottom-up (schematized in Fig. 1). The former study the metabolism of substances in humans, mostly through the information reported in human biomonitoring (HBM) studies, and then identify which of the resulting substances can be appropriate biomarkers to determine in wastewater. Contrary to the situation in proteomics, the top-down approach is the most developed in WBE. This approach requires studying human metabolism, identifying the substances excreted in urine, selecting representative substances excreted at a high percentage, developing an analytical method able to determine them, studying their stability in wastewater and, finally, assessing them in wastewater. There are many remarkable and well-conducted reviews on potential biomarkers for different estimations in WBE [15,24–26]. These reviews categorize biomarkers somewhat differently depending on the authors' criteria, but consumption, exposure, health/disease and lifestyle/consumption are always the main drivers of these classifications. The most common studies focus on consumption, mostly of illicit drugs [19,23,28–41] but also of food [42,43], artificial sweeteners [44,45], alcohol [46–51], caffeine [52,53], nicotine [47,51–53] and/or tobacco [54–58], new psychoactive substances [39,59–63], opioids [18,64,65], pharmaceuticals [22,34,53,66–72] and personal care products [34]. These studies are the most elaborate because, especially for drugs of abuse, work has been ongoing since 2005, and many research groups have made a major effort, often collaborating altruistically, to systematize the methodology and address its drawbacks.
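Why the top-down scheme favours substances excreted at a high percentage can be seen in the back-calculation commonly used in WBE to turn a measured concentration into a consumption estimate: the excreted fraction sits in the denominator, so a small fraction magnifies any error in the measurement. The sketch below is a minimal illustration, not a method taken from the cited studies. The `Candidate` fields, the selection threshold and all numbers are hypothetical, and real estimates also need in-sewer stability and sampling corrections.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate urinary biomarker; all fields are illustrative."""
    name: str
    excreted_fraction: float  # molar fraction of the dose excreted as this substance (0-1)
    stable_in_wastewater: bool  # outcome of a (hypothetical) stability study


def select_biomarkers(candidates, min_excreted_fraction=0.1):
    """Keep candidates excreted at a high percentage and stable in wastewater.

    The 0.1 default is a hypothetical threshold, not a literature value.
    """
    return [c for c in candidates
            if c.excreted_fraction >= min_excreted_fraction and c.stable_in_wastewater]


def back_calculate_use(conc_ng_per_l, flow_l_per_day, mw_parent, mw_biomarker,
                       excreted_fraction, population):
    """Estimate parent-compound use in mg/day per 1000 inhabitants.

    load (mg/day) = concentration (ng/L) x flow (L/day) x 1e-6
    use = load x (MW parent / MW biomarker) / excreted fraction, scaled to 1000 people
    """
    if not 0 < excreted_fraction <= 1:
        raise ValueError("excreted_fraction must be in (0, 1]")
    load_mg_per_day = conc_ng_per_l * flow_l_per_day * 1e-6
    parent_mg_per_day = load_mg_per_day * (mw_parent / mw_biomarker) / excreted_fraction
    return parent_mg_per_day / population * 1000


# Hypothetical example: the same measured load gives a tenfold larger estimate
# when the excreted fraction drops from 0.5 to 0.05
for f in (0.5, 0.05):
    print(f, round(back_calculate_use(1000, 1e6, 100, 100, f, 1000)))
```

Because the excreted fraction divides the load, any uncertainty in that pharmacokinetic value propagates directly into the consumption estimate, which is one reason to prefer major urinary products.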
One growing area is the evaluation of human exposure [20,26], reflected in a large number of studies on exposure to pesticides [73–76], mycotoxins [77], bisphenol A and its analogues [78], personal care and household products [26], phthalates [79], and organophosphorus flame retardants and plasticizers [14,80]. Some of these studies are highly useful reviews of the available pharmacokinetic datasets [26,73] that help identify proper biomarkers and highlight a limitation that this type of study shares with consumption studies: many of the biomarkers used are not exclusive to humans and can therefore come from other sources, making exposure difficult to assess.
Other studies focus on health/disease status, such as stress [42,43,57,81,82], hepatitis B [83], diabetes mellitus [84] and gout [85], or simply on endogenous human biomarkers (such as catecholamines [86] and others [87]). Studies on disease have a major flaw, since they are based on determining a drug or its metabolite used in the treatment. For chronic diseases or diseases that are difficult to diagnose, this leaves out the untreated or undiagnosed part of the population, whose identification is one of the objectives of WBE not yet achieved.
The other approach, which can be considered bottom-up, is based on the non-target analysis of wastewater, the identification of substances derived from humans and the study of their suitability as biomarkers. This analytical scheme identifies the different substances present in wastewater (biomarkers, their transformation products and other compounds) after a long and difficult analytical process. Owing to the complexity of wastewater (high organic matter content, different types of human and biota metabolites, degradation products, etc.), this approach has been used much less than the top-down approach. However, it could offer extensive information about still unknown substances that may be pivotal in realizing the possibilities of WBE.
The possibilities of the bottom-up approach for detecting different biomarkers have already been explored to some extent. Bottom-up studies have addressed community-wide exposure to bisphenol A [88], the search for protein [89], cancer [65] and multi-chemical exposure [90] markers, and the investigation of (bio)transformation products [91,92] and new psychoactive substances [93,94]. All these studies highlight the difficulty of applying this scheme to wastewater, mainly because many biomarkers are not major compounds in the matrix. Nevertheless, once the many improvements still needed are made, this analytical strategy could become a successful, fast and economical tool for screening biomarkers in wastewater.
Table 1 summarizes the analytical methods using the top-down approach. Most of them include a sample preparation step. Two classical approaches are the most common: direct injection (after filtration, centrifugation or another mechanical or physical operation) and solid-phase extraction (SPE), with sorbents ranging from classical hydrophilic-lipophilic balance (HLB) phases to ion exchangers, including mixed-mode or home-made cartridges that combine several phases. The choice between them depends on two factors: (i) the ability of the biomarker to be retained on a cartridge and (ii) the concentration of the biomarker in wastewater. For example, illicit drugs and pharmaceuticals are usually extracted by SPE [32,34,37,95], while alcohol (as ethyl sulphate) is determined by direct injection [46,47,49,51] (see Table 1 for more exhaustive examples).
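The two factors that decide between direct injection and SPE can be expressed as a simple rule. In this minimal sketch, which is an assumption rather than a procedure reported in the cited works, the SPE enrichment factor is the ratio of sample volume to final extract volume, so the LOQ referred to the wastewater sample is the instrument LOQ divided by enrichment and recovery. The function names, default volumes and recovery are hypothetical, and matrix effects (often decisive in practice) are ignored.

```python
def method_loq_ng_per_l(instrument_loq_ng_per_l, sample_ml=1.0, extract_ml=1.0, recovery=1.0):
    """LOQ referred to the original wastewater sample.

    Enrichment factor = sample volume / final extract volume; dividing the
    instrument LOQ by (enrichment x recovery) gives the lowest quantifiable
    concentration in the sample.
    """
    enrichment = sample_ml / extract_ml
    return instrument_loq_ng_per_l / (enrichment * recovery)


def choose_sample_preparation(expected_ng_per_l, instrument_loq_ng_per_l,
                              retained_on_sorbent, spe_sample_ml=100.0,
                              spe_extract_ml=1.0, spe_recovery=0.8):
    """Apply the two factors: biomarker concentration and retention on a sorbent.

    All default volumes and the recovery are hypothetical; matrix effects are ignored.
    """
    # Factor (ii): abundant biomarkers can be measured without enrichment
    if expected_ng_per_l >= instrument_loq_ng_per_l:
        return "direct injection"
    # Factor (i): enrichment only helps if the biomarker is retained on the cartridge
    if retained_on_sorbent:
        spe_loq = method_loq_ng_per_l(instrument_loq_ng_per_l, spe_sample_ml,
                                      spe_extract_ml, spe_recovery)
        if expected_ng_per_l >= spe_loq:
            return "SPE"
    return "not quantifiable with these settings"


# Hypothetical cases: an abundant polar marker and a trace-level, retainable marker
print(choose_sample_preparation(5000, 50, retained_on_sorbent=False))
print(choose_sample_preparation(10, 50, retained_on_sorbent=True))
```

A very polar, abundant marker falls into the first branch whether or not it is retained, which mirrors why such compounds are often determined by direct injection while trace-level drugs go through SPE.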

Analytical techniques employed in the top-down approaches
One important development in sampling and extraction deserves mention: passive sampling at sewage treatment plants, which has proven advantageous. McKay et al. [34] developed, calibrated and validated a microporous polyethylene tube (MPT) passive sampler for the quantitative estimation of illicit drugs and pharmaceuticals and personal care products (PPCPs) by in-situ deployment in wastewater influent. Liu et al. [69] evaluated diffusive gradients in thin films (DGT) passive samplers for 15 illicit drugs and 18 antibiotics using three types of resin gels (HLB, XAD 18 and XDA-1). Both types of samplers were able to establish the concentrations of the target biomarkers in wastewater influents, which makes them promising devices for providing essential monitoring data for WBE. However, passive samplers are not without disadvantages, such as retaining only a limited number of compounds, low robustness (they are affected by environmental conditions, biofouling, etc.) and, particularly, the need for in-field calibration, which is why their use is still at an early stage.
The detection, identification and quantification of biomarkers is carried out mostly by LC-MS with highly sensitive and quantitative triple quadrupole (QqQ) mass analyzers (LC-QqQ-MS/MS) (see Table 1 for the large number of references). Hybrid instruments with a linear ion trap are used in some of the references, but always operated as triple quadrupoles [22,34,37,42,52,66,68,70,81,85]. Most separations are reversed-phase, using ultra-high performance liquid chromatography (UHPLC) columns packed with 1.7 μm particles. The stationary phases are mostly C18, but because most of the analytes are very polar, other phases such as biphenyl [14,32,34,40,42,47,63,80,81] or pentafluorophenyl (PFP or F5) [41,51,86] are also frequently used. The mobile phases are mostly water containing formic or acetic acid or volatile salts (ammonium formate, acetate, etc.), combined with methanol or acetonitrile.
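Returning to the passive samplers described above, the in-field calibration they require usually relies on the integrative (linear-uptake) regime, in which the accumulated mass M is proportional to the water concentration, the sampling rate R_s and the deployment time t, so that C_w = M / (R_s · t). The sketch below assumes that regime and a calibration deployment alongside composite samples of known concentration. It is a generic illustration rather than the calibration procedure of McKay et al. [34] or Liu et al. [69] (DGT samplers use an analogous relation based on diffusion through a gel layer), and the function names and numbers are hypothetical.

```python
def sampling_rate_l_per_day(mass_ng, conc_ng_per_l, days):
    """In-field calibration: R_s = M / (C_w * t), with C_w measured in composite samples."""
    return mass_ng / (conc_ng_per_l * days)


def twa_concentration_ng_per_l(mass_ng, rs_l_per_day, days):
    """Time-weighted average concentration: C_w = M / (R_s * t).

    Valid only while uptake remains linear (sampler far from equilibrium).
    """
    if rs_l_per_day <= 0 or days <= 0:
        raise ValueError("sampling rate and deployment time must be positive")
    return mass_ng / (rs_l_per_day * days)


# Hypothetical calibration: 200 ng accumulated over 7 days at 100 ng/L
rs = sampling_rate_l_per_day(200, 100, 7)
# Hypothetical field deployment: 400 ng accumulated over 7 days
print(round(twa_concentration_ng_per_l(400, rs, 7), 1))  # 200.0
```

Because R_s depends on flow, temperature and biofouling, a value calibrated at one site or season may not transfer to another, which is the robustness issue noted above.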
Most biomarkers are determined in positive ionization mode, since basic or neutral molecules are the most common (Table 1). However, a few biomarkers are determined in negative mode [14,22,34,46,47,49,51,57,68,70,79,81,82], so this mode cannot be dismissed.
Enantiomeric analysis has been used to distinguish drugs eliminated by humans from those released directly (when the latter is suspected from daily fluctuations in mass loads), particularly for amphetamine-type compounds [38,95]. This approach is most useful when the parent compound is used as the human biomarker [95], but it is also useful for some pharmaceuticals administered as enantiomerically pure substances [96].
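Enantiomeric profiling is usually expressed as an enantiomeric fraction, EF = A(+)/(A(+) + A(-)), where A are the signals of the two enantiomers, so a racemate gives EF = 0.5. The sketch below combines this with the daily-load criterion mentioned above. It assumes, for illustration only, a drug sold as a racemate whose excretion by humans is enantioselective, so that a load spike with a near-racemic EF points to direct release; the spike factor, EF tolerance and data are hypothetical, not thresholds from [38,95,96].

```python
from statistics import median


def enantiomeric_fraction(area_plus, area_minus):
    """EF = A(+) / (A(+) + A(-)); 0.5 corresponds to a racemate."""
    total = area_plus + area_minus
    if total <= 0:
        raise ValueError("enantiomer signals must sum to a positive value")
    return area_plus / total


def flag_possible_direct_release(daily_loads, daily_efs, spike_factor=3.0, tolerance=0.05):
    """Indices of days whose load exceeds spike_factor x median load AND whose EF is near 0.5.

    Assumes a racemic product and enantioselective human metabolism, so that
    excreted residues are expected to deviate from EF = 0.5.
    """
    baseline = median(daily_loads)
    return [day for day, (load, ef) in enumerate(zip(daily_loads, daily_efs))
            if load > spike_factor * baseline and abs(ef - 0.5) <= tolerance]


# Hypothetical five days of data (loads in g/day)
loads = [10, 12, 11, 60, 10]
efs = [0.70, 0.68, 0.71, enantiomeric_fraction(51, 49), 0.69]
print(flag_possible_direct_release(loads, efs))  # [3]
```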
Interestingly, some references report the determination of illicit drugs, pharmaceuticals, caffeine and metformin by GC-MS after derivatization with trifluoroacetic acid (TFA) [30] or N-methyl-bis(trifluoroacetamide) (MBTFA) [53,84]. GC-MS is not commonly used for biomarker profiling because most metabolites have polar groups, which makes a derivatization step mandatory. However, GC-MS has shown high sensitivity and reproducibility, highlighting its value as an alternative to LC [84], and derivatization could help increase the selectivity and specificity of the analysis. In addition, different types of (bio)sensors have been reported to improve speed and efficiency; this has been the subject of several reviews presenting biosensors as an effective tool for WBE [28,29,97]. More recently, Mao et al. [98] described a colorimetric method based on non-aggregated noble metal nanoparticles (AuNPs and Au@Ag) for determining illicit drugs. The biosensor consisted of DNA reporter probes, capture probes and illicit drug-binding DNA aptamers (see Fig. 2). The absorbance intensity was correlated with illicit drug concentrations, enabling their quantitation. These results show that sensors have potential for estimating illicit drug consumption in WBE. Mao et al. [28] reported a disposable paper sensor based on surface-enhanced Raman spectroscopy (SERS) for the sensitive and selective detection of methamphetamine, built by assembling noble metal core-shell nanoparticles on a bespoke glassy nanofibrous electrospun paper matrix. Innovations like these represent the future of analytical technologies within WBE.
Table 2 outlines the few studies that use bottom-up approaches, which mostly rely on liquid chromatography-high resolution mass spectrometry (LC-HRMS). These studies share many characteristics but also have a number of distinguishing features. The first common feature is that most studies include a prior step of isolation and concentration of the analytes by SPE.
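Returning to the colorimetric aptasensor of Mao et al. [98] described above, quantitation of this kind rests on an external calibration curve relating absorbance to concentration. The following is a generic least-squares sketch, not the authors' data treatment: it fits signal = a + b × concentration, inverts the line for unknowns and estimates a limit of detection as 3.3·s/|b| (the ICH-style convention, here with s taken as the residual standard deviation). Function names and calibration values are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, residual SD)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three calibration points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_res = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return a, b, s_res


def concentration_from_signal(signal, a, b):
    """Invert the calibration line for an unknown sample."""
    return (signal - a) / b


def limit_of_detection(b, s_res):
    """LOD = 3.3 * s / |slope|; abs() also covers signals that decrease with concentration."""
    return 3.3 * s_res / abs(b)


# Hypothetical calibration: concentration (arbitrary units) vs absorbance
conc = [0, 1, 2, 3, 4]
absorbance = [0.100, 0.152, 0.199, 0.251, 0.300]
a, b, s = fit_line(conc, absorbance)
print(concentration_from_signal(0.225, a, b), limit_of_detection(b, s))
```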
As in the top-down approach, the SPE step in these bottom-up studies is performed mostly with HLB-based sorbents [88,90,94] and mixed-mode cartridges containing strong or weak cation exchangers [95]. Interestingly, a home-made cartridge mixing several sorbents (Oasis HLB, Isolute ENV+, Strata-X-AW and Strata-X-CV) has also been proposed [91,99]. The aim was to broaden as much as possible the range of compounds retained, so that as many compounds as possible could be identified. This cartridge was reported to be very advantageous, as it also removed many interferences from a very complex matrix (wastewater).

Analytical techniques in the bottom-up approaches
Regarding instrumentation, most studies use quadrupole time-of-flight (QqTOF) analyzers [88,90,94,100], which have lower resolution than the Orbitrap [101]. There is no obvious reason for this preference, except perhaps that QqTOF instruments have been on the market longer. The number of applications using Orbitrap instruments can be expected to increase soon.
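Why resolution matters can be illustrated with the usual definition of mass resolving power, R = m/Δm, where Δm is the mass difference between two ions that must be told apart. This short sketch computes the resolving power needed to separate two hypothetical near-isobaric ions. It ignores the different peak-width conventions (FWHM versus valley definitions) and the fact that the resolving power of Orbitrap analyzers varies with m/z and scan time; the function name and masses are illustrative only.

```python
def required_resolving_power(mz1, mz2):
    """Resolving power R = m / delta_m needed to separate two ions (m taken as their mean)."""
    delta = abs(mz1 - mz2)
    if delta == 0:
        raise ValueError("identical m/z values cannot be resolved")
    return ((mz1 + mz2) / 2) / delta


# Two hypothetical ions 0.003 Da apart near m/z 300 need R of roughly 100,000
print(required_resolving_power(300.0000, 300.0030))
```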
Another important aspect is how mass spectral information is acquired. Studies are divided between those using data-independent acquisition (DIA) and those using data-dependent acquisition (DDA). DIA acquires mass spectra at different collision energies quasi-simultaneously, in alternating scans: (i) a low-energy (LE) mode, in which a low collision energy is selected to detect protonated or deprotonated molecules, and (ii) a high-energy (HE) mode, with a high collision energy to detect fragment ions [88,90,93,100]. DIA is versatile: instruments can use collision energy ramps (instead of a single value) or acquire more than one HE scan segment in MS mode, alternating the collision energies [92]. The main drawback of this mode is that it does not provide a true MS/MS spectrum of a selected precursor ion, because all ions are fragmented together without prior precursor isolation. Its main advantage is that it provides a spectrum in which all ions are fragmented. To obtain MS/MS spectra of all ions present in the sample, sequential window acquisition of all theoretical fragment-ion spectra (SWATH), or MS/MS ALL, has recently been developed. In this mode, the instrument systematically acquires fragment data from consecutive precursor ion windows chosen to cover the mass range of interest. The second acquisition mode, DDA, records a full-scan (MS) spectrum in which the X (between 5 and 20) most abundant precursor ions in each instrumental cycle are isolated and fragmented, giving their HRMS/MS spectra [93].
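The contrast between SWATH-type DIA and intensity-based DDA can be sketched in a few lines: SWATH windows tile the whole precursor range, whereas top-N DDA only fragments the most intense ions, so a low-abundance biomarker can be skipped. Both functions are simplified illustrations, not vendor acquisition logic; the window width, overlap, value of N and the example spectrum are hypothetical, and real DDA methods add features such as dynamic exclusion and intensity thresholds.

```python
def swath_windows(mz_start, mz_end, width, overlap=1.0):
    """Consecutive precursor isolation windows that together cover [mz_start, mz_end]."""
    if width <= overlap:
        raise ValueError("window width must be larger than the overlap")
    windows, low = [], mz_start
    while True:
        high = min(low + width, mz_end)
        windows.append((low, high))
        if high >= mz_end:
            return windows
        low = high - overlap  # small overlap so no precursor falls between windows


def dda_top_n(survey_scan, n=10):
    """Select the n most intense precursors from (m/z, intensity) pairs of a full scan."""
    ranked = sorted(survey_scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:n]]


# Hypothetical survey scan: a low-abundance biomarker at m/z 304.15 among matrix ions
scan = [(150.10, 1.0e6), (200.20, 5.0e5), (304.15, 2.0e3), (250.00, 8.0e5)]
print(dda_top_n(scan, n=2))              # [150.1, 250.0]
print(len(swath_windows(100, 200, 25)))  # 5
```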
In WBE, DIA methods work better than intensity-based DDA because biomarkers are not always major constituents of wastewater and may therefore never be selected for fragmentation in DDA. Two types of methods have been used to identify biomarkers: suspect screening and non-target analysis. Using suspect screening, Fernando-Climent et al. [100] searched for cancer markers with a suspect list based on the two cancer types considered most frequent in Norway, prostate and breast cancer. The chemotherapy drugs associated with the treatment of these cancers, together with a large group of pharmaceuticals/therapies such as mitotic inhibitors, anti-metabolites, hormones and immunotherapeutic agents, were added to this suspect list. The high number of potential compounds and their low concentrations forced the use of a series of filters, inclusion/exclusion criteria and databases, which makes the analysis long, tedious and complicated compared with the top-down scheme. Fig. 3 illustrates the complexity of the filters and databases selected before analysis and during data processing to achieve proper results with bottom-up approaches, and helps explain why this approach has been used so little compared with the top-down approach. Among many other substances, several antineoplastic hormones were identified (such as medroxyprogesterone, see Fig. 4). Similar suspect screening has been reported for new psychoactive substances (NPSs) and their metabolites [91–95]. Interestingly, Kinyua et al. [91] built a suspect database of potential transformation products (TPs) using two in silico prediction tools, the Eawag-Biocatalysis/Biodegradation Database Pathway Prediction System and the Metabolite Predict software, and a list of metabolites and TPs reported in the literature. Samples were screened using not only filters but also an in-house retention time prediction model. Similarly, Andrés-Costa et al. [94] used metabolite finder software that, after the parent compounds are selected, searches for a list of possible phase I (debenzylation, deethylation, nitroreduction, demethylation, hydroxylation, etc.) and phase II (methylation, different conjugations, etc.) reactions.
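The suspect-screening logic described above, expanding a parent compound into expected phase I and phase II products and matching them against measured accurate masses, can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used by Kinyua et al. [91] or Andrés-Costa et al. [94]: only a handful of monoisotopic mass shifts are included, only [M+H]+ and [M-H]- ions are considered, and the function names, 5 ppm tolerance and parent mass are hypothetical.

```python
PROTON = 1.007276  # proton mass (Da), used for [M+H]+ and [M-H]- ions

# Monoisotopic mass shifts (Da) for an illustrative subset of biotransformations
PHASE_I = {"hydroxylation": 15.994915, "demethylation": -14.015650, "deethylation": -28.031300}
PHASE_II = {"glucuronidation": 176.032088, "sulfation": 79.956815}


def expected_metabolites(parent, parent_mass):
    """Build a suspect list: parent, single reactions and phase I + phase II combinations."""
    suspects = {parent: parent_mass}
    for reaction, shift in {**PHASE_I, **PHASE_II}.items():
        suspects[f"{parent}+{reaction}"] = parent_mass + shift
    for r1, s1 in PHASE_I.items():
        for r2, s2 in PHASE_II.items():
            suspects[f"{parent}+{r1}+{r2}"] = parent_mass + s1 + s2
    return suspects


def ppm_error(measured_mz, theoretical_mz):
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def screen(feature_mzs, suspects, tol_ppm=5.0, mode="positive"):
    """Return (suspect, measured m/z, ppm error) for every match within tol_ppm."""
    if mode not in ("positive", "negative"):
        raise ValueError("mode must be 'positive' or 'negative'")
    hits = []
    for name, mass in suspects.items():
        ion_mz = mass + PROTON if mode == "positive" else mass - PROTON
        for mz in feature_mzs:
            err = ppm_error(mz, ion_mz)
            if abs(err) <= tol_ppm:
                hits.append((name, mz, round(err, 2)))
    return hits


# Hypothetical parent with monoisotopic mass 300.0000 Da
suspects = expected_metabolites("X", 300.0000)
print(screen([317.0030, 400.0000], suspects))
```

In a real workflow each hit would still need retention-time plausibility, isotope pattern and MS/MS evidence before being reported, which is where the filters and databases discussed above come in.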
There are also some examples of non-target screening. Kinyua et al. [92] used the instrument software's 'COmponent Detection Algorithm' (CODA) and 'COMPARE LCMS' algorithm. CODA, a molecular feature detection algorithm, was useful for peak selection, which involved removing noise and background peaks, recovering the mass spectra of pure compounds and separating co-eluting components within data sets. COMPARE LCMS, a differential analysis algorithm, allowed two or more data sets of extracted feature candidates to be compared and showed the differences between them. In contrast, Boogaerts et al. [95] performed non-target screening on the most intense signals visually observed in the Orbitrap chromatogram. It is hoped that such methods will become more common in the future and that the ability of the technique to identify unknowns will be exploited. Within non-target screening, wastewater fingerprinting, which involves non-targeted chemical analysis of wastewater followed by multivariate data analysis based on the principles of metabolomics, has also been applied [88]. In this approach, water samples were analyzed using very generic chromatographic separations that allow many potential compounds to be detected by HRMS. After as many compounds as possible had been identified, retrospective data mining for their characteristic human metabolism markers was performed (the most studied example is bisphenol A sulphate [88,90]). Finally, multivariate statistical analysis was performed to classify the samples into specific groups according to the compounds present.
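The kind of differential comparison performed by COMPARE LCMS can be illustrated generically: features (m/z, retention time, intensity) extracted from two data sets are aligned within m/z and retention-time tolerances, and those found in only one set, or whose intensity ratio exceeds a fold-change threshold, are reported. This sketch is not the algorithm implemented in that software; the function name, tolerances, twofold threshold and feature lists are hypothetical, and intensities are assumed to be positive and comparably normalized.

```python
def compare_feature_lists(features_a, features_b, mz_tol_ppm=5.0, rt_tol_min=0.2, min_fold=2.0):
    """Align (m/z, RT, intensity) features from two data sets and report differences.

    Returns (m/z, RT, label) tuples; matched features within the fold-change
    threshold are considered unchanged and are not reported.
    """
    used_b = set()
    report = []
    for mz, rt, intensity in features_a:
        best = None
        for j, (mz_b, rt_b, _) in enumerate(features_b):
            if j in used_b:
                continue
            within_mz = abs(mz_b - mz) / mz * 1e6 <= mz_tol_ppm
            within_rt = abs(rt_b - rt) <= rt_tol_min
            # keep the closest-in-mass candidate that satisfies both tolerances
            if within_mz and within_rt and (
                    best is None or abs(mz_b - mz) < abs(features_b[best][0] - mz)):
                best = j
        if best is None:
            report.append((mz, rt, "only in A"))
            continue
        used_b.add(best)
        fold = intensity / features_b[best][2]
        if fold >= min_fold:
            report.append((mz, rt, "higher in A"))
        elif fold <= 1 / min_fold:
            report.append((mz, rt, "higher in B"))
    for j, (mz_b, rt_b, _) in enumerate(features_b):
        if j not in used_b:
            report.append((mz_b, rt_b, "only in B"))
    return report


# Hypothetical feature lists, e.g. from two sampling sites (intensities in arbitrary units)
site_a = [(229.0856, 5.10, 1.0e5), (150.0000, 2.00, 1.0e4), (300.1000, 7.00, 5.0e4)]
site_b = [(229.0860, 5.15, 2.0e4), (150.0004, 2.05, 1.1e4), (412.2000, 8.00, 3.0e4)]
for row in compare_feature_lists(site_a, site_b):
    print(row)
```

The greedy, one-to-one matching keeps the sketch short; fingerprinting studies would typically align features across many samples at once and pass the resulting feature table to multivariate analysis.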

Conclusions and future trends
The top-down workflow, based on the a priori identification of metabolites and biomarkers that are then determined in wastewater, is the most widely used in sewage epidemiology. In recent years, it has evolved from focusing only on drugs of abuse to a range of studies related to the habits, lifestyle, consumption, and health and disease of the population, which can provide a fingerprint of the characteristics of a city. From an analytical point of view, the top-down scheme uses conventional target analysis based on SPE or direct injection of the sample into an LC-QqQ-MS/MS system. Although simple, this analysis is not without complications, especially due to the complexity of the matrix and the poor stability of the analytes. For the most studied analytes, such as drugs of abuse, passive samplers and biosensors are becoming the next step that will facilitate their analysis.
For other indicators, such as those related to health/disease or nutrition, there is still a long way to go to select uniquely human metabolites and to identify metabolites linked to different types of diseases that are not related to drug treatment.
The bottom-up approach has been used very little in wastewater, although some work highlights its great potential within WBE. A few interesting papers rely on broad searches for biomarkers directly in the wastewater itself, as well as for degradation products and other metabolites. The future of WBE must take this route, especially to exploit the idea of city profiling. Bottom-up schemes rely on HRMS, which is capable of detecting and identifying a virtually unlimited number of compounds, at least in theory.
In summary, the wide range of biomarkers determinable in wastewater reveals numerous behaviors (drug use or dietary habits, for example), exposures (to pesticides and industrial substances, for example) and health aspects (pathogens or antibiotic resistance, for example) that profile one city against another. Most biomarker studies are still academic and exploratory in nature. Soon, wastewater analysis will be able to provide a wealth of socially relevant information, so that society as a whole can be provided with better services and a more sustainable environment.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.