UvA-DARE (Digital Academic Repository)
Socioeconomic status and public health in Australia: A wastewater-based study

Analysis of untreated municipal wastewater is recognized as an innovative approach to assess population exposure to or consumption of various substances. Currently, there are no published wastewater-based studies investigating the relationships between the social, demographic, and economic characteristics of catchments and chemicals using advanced non-targeted techniques. In this study, fifteen wastewater samples covering 27% of the Australian population were collected during a population Census. The samples were analysed with a workflow employing liquid chromatography high-resolution mass spectrometry and chemometric tools for non-target analysis. Socioeconomic characteristics of catchment areas were generated using Geospatial Information Systems software. Potential correlations were explored between pseudo-mass loads of the identified compounds and socioeconomic and demographic descriptors of the wastewater catchments derived from Census data. Markers of public health (e.g., cardiac arrhythmia, cardiovascular disease, anxiety disorder and type 2 diabetes) were identified in the wastewater samples by the proposed workflow. They were positively correlated with descriptors of disadvantage in education, occupation, marital status and income, and negatively correlated with descriptors of advantage in education and occupation. In addition, markers of polypropylene glycol (PPG) and polyethylene glycol (PEG) related compounds were positively correlated with housing and occupation disadvantage. High positive correlations were found between separated and divorced people and specific drugs used to treat cardiac arrhythmia, cardiovascular disease, and depression. Our robust non-targeted methodology in combination with Census data can identify relationships between biomarkers of public health, human behaviour and lifestyle and socio-demographics of whole populations. Furthermore, it can identify specific areas and socioeconomic groups that may need more assistance than others for public health issues. This approach complements important public health information and enables large-scale national coverage with a relatively small number of samples.


Introduction
The notion that the social, demographic and economic characteristics of communities shape human welfare is the foundation of public health, human behaviour and lifestyle. Indeed, diverse approaches have been used to assess community public health, such as cohort studies, clinical trials, surveys (e.g., self-report questionnaires or in-person interviews), medical data analysis and metadata analysis of various sources (e.g., sales data and household economic data). Positive and negative associations have been determined between several diseases, alcohol consumption and dietary habits and socio-demographic characteristics such as age, gender, ethnicity, education, income and occupation, among others (Allen et al., 2017; Khariton et al., 2018; Lund et al., 2018; Rosengren et al., 2019).
Wastewater analysis is an innovative approach which can deliver information on drug use, chemical exposure, lifestyle habits and aspects of public health (Choi et al., 2019; Gracia-Lor et al., 2017; Rousis et al., 2021; Thomaidis et al., 2016). Untreated wastewater contains a wide range of chemicals, including human excretion products (biomarkers) produced either in the body (endogenous) or through metabolic processes following intentional or unintentional exposure to a substance (exogenous). Wastewater-based epidemiology (WBE) exploits the occurrence of these biomarkers in the wastewater of the population served by a specific wastewater treatment plant (WWTP) (Gracia-Lor et al., 2017).
In recent years, the capabilities of WBE have attracted much attention from public authorities and (inter)national organizations. For instance, the European Union, the USA, Australia and dozens of other countries adopted the WBE approach to monitor the spatial occurrence and temporal progression of SARS-CoV-2 viral fragments in wastewater during the COVID-19 pandemic (Alygizakis et al., 2021; Keshaviah et al., 2021; Lundy et al., 2021). WBE is used by many national and international agencies, such as the Australian Criminal Intelligence Commission (ACIC, 2021) and the European Monitoring Centre for Drugs and Drug Addiction (EMCDDA) (https://www.emcdda.europa.eu/topics/wastewater_en), to assess drug consumption. Typically, WBE is applied as a surveillance tool to estimate the consumption of and exposure to various compounds and to detect spatial and temporal trends (Bade et al., 2022; Gracia-Lor et al., 2017; Rousis et al., 2021).
Few studies have investigated correlations of socioeconomic and sociodemographic characteristics of populations with chemicals measured in untreated wastewater (Choi et al., 2019; Thomaidis et al., 2016). These studies examined correlations among several descriptors and WBE biomarkers of lifestyle, substance use, exposure, and health. This approach was able to detect socioeconomic and sociodemographic patterns or disparities correlated with the consumption of certain substances. However, only pre-selected target compounds have been investigated, and others from the same or different classes have not been explored. New strategies are therefore needed to include more compounds, and non-target analysis using advanced high-resolution mass spectrometry (HRMS) offers one solution to this challenge. The potential of HRMS originates from the acquisition of accurate-mass full-spectrum data at high sensitivity, offering the possibility to assess the presence of thousands of compounds in a sample. A key advantage is that it allows compounds to be investigated after the analysis, without the need for analyte preselection. The occurrence of compounds initially not considered can be studied retrospectively from the acquired data (Hollender et al., 2017).
The aim of the present study was to identify wastewater biomarkers of social, demographic, and economic characteristics of societies by applying a liquid chromatography HRMS method to untreated wastewater sampled in different states and territories in Australia during the 2016 Census period. Georeferenced maps and Census data were processed using Geospatial Information Systems (GIS) software and the socioeconomic variables aggregated to catchment level. The median age, the index of relative socioeconomic advantage and disadvantage (IRSAD) and thirty-nine socioeconomic indexes for areas (SEIFA) descriptors (Table 1) were calculated for each WWTP and examined for potential correlations with chemical pseudo-mass loads in wastewater. The term "pseudo-mass load" was introduced to allow statistical analysis of non-target HRMS data.

Wastewater sampling
We conducted a nationwide wastewater-based study covering 27% of the total Australian population. WWTPs were selected to provide broad coverage in terms of socioeconomic and demographic inequality. Fifteen daily composite influent wastewater samples were collected from fifteen WWTPs in five states and territories of Australia (seven sites from Queensland, one site from the Northern Territory, three sites from Victoria, three sites from New South Wales and one site from the Australian Capital Territory) during the 2016 Census night. WWTP catchments were in both state or territory capitals and regional areas, with WWTP service populations ranging from a few thousand to millions. Wastewater was collected using flow- or time-proportional mode, kept refrigerated (4 °C) during 24-h sampling and frozen once collected (-20 °C) and during shipping to the lab (The University of Queensland). Then, samples were thawed, spiked with internal standards (IS), filtered, and sent frozen to Stockholm University for analysis, which was performed in 2017.

Census data and characterization of catchment population
In 2016, the Australian Bureau of Statistics collected information for every person in Australia and the place they were staying on Census night. Census data describes the economic, social, and cultural make-up of the country, providing information about demographics, education, cultural and language diversity and employment status of people and families, among others. The present study used the median age, the IRSAD and thirty-nine SEIFA descriptors (Table 1) for the analysis.
To generate socioeconomic variables specific to the catchment areas, the catchment boundary maps were obtained from each WWTP as PDF, image, or GIS files and georeferenced into GIS software. Georeferenced Census variables were downloaded from the Australian Bureau of Statistics Census table builder web platform at the Statistical Area level (typically containing 200 to 400 residents and the highest resolution available for the socioeconomic variables). The georeferenced maps and Census data were processed using GIS software and the socioeconomic variables aggregated to catchment level based on the relative areas contained within the catchments. The socioeconomic characteristics were generated as described in detail elsewhere (Tscharke et al., 2019).
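The area-based aggregation described above can be illustrated with a minimal sketch. It assumes that each Statistical Area contributes its counts in proportion to the fraction of its surface lying inside the catchment; the record type, field names and numbers are hypothetical, and the actual procedure is the one described by Tscharke et al. (2019).

```python
from dataclasses import dataclass


@dataclass
class StatisticalArea:
    """One Census Statistical Area overlapping a catchment (hypothetical record)."""
    persons: float            # total residents counted in the area
    persons_in_group: float   # residents falling in the descriptor's category
    fraction_inside: float    # share of the area's surface inside the catchment (0 to 1)


def catchment_descriptor(areas):
    """Area-weighted percentage of the catchment population in a category.

    Assumption: residents are spread evenly over each Statistical Area, so the
    share of the area inside the catchment also gives the share of residents.
    """
    total = sum(a.persons * a.fraction_inside for a in areas)
    in_group = sum(a.persons_in_group * a.fraction_inside for a in areas)
    if total == 0:
        raise ValueError("no population apportioned to the catchment")
    return 100.0 * in_group / total


# Illustrative values only: one area fully inside, one half inside.
areas = [
    StatisticalArea(persons=300, persons_in_group=60, fraction_inside=1.0),
    StatisticalArea(persons=400, persons_in_group=40, fraction_inside=0.5),
]
print(catchment_descriptor(areas))
```

Weighting by area alone is the simplest choice consistent with the text; small Statistical Areas (200 to 400 residents) keep the error from the even-spread assumption limited.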

Analytical procedure
The methods of sample preparation and instrumental analysis are described elsewhere (Li et al., 2018). Briefly, wastewater samples were spiked with IS at a concentration of 10 µg/L and filtered through a 0.45 µm PTFE syringe filter. Analysis (direct injection) was performed by ultrahigh performance liquid chromatography (Dionex UltiMate RSLC; Thermo Fisher Scientific, San Jose, USA) coupled to a Q Exactive™ HF Hybrid Quadrupole-Orbitrap™ mass spectrometer (Orbitrap-MS/MS; Thermo Fisher Scientific, San Jose, USA) using electrospray ionization in positive mode. Chromatographic separation was achieved using a reversed phase Hypersil GOLD™ aQ C18 polar-endcapped (1.9 µm, 2.1 mm × 100 mm) column (Thermo Fisher Scientific, San Jose, USA). Mobile phases consisted of Milli-Q water (A) and acetonitrile (B), both containing 0.1% formic acid, and the gradient programme lasted 14 min. The flow rate was set at 0.4 mL/min, the column temperature at 40 °C and the injection volume was 10 µL. The Q Exactive Orbitrap was operated in the top N data-dependent acquisition mode (five most abundant precursor ions) and the full-scan MS range was 80-1000 Da.

Data processing - InSpectra
The raw HRMS data were processed by the in-house InSpectra package to obtain features of interest, which were then identified using the Universal Library Search Algorithm (ULSA) and the researchers' experience (Fig. 1). Firstly, the HRMS data (*.RAW) were converted to generic open-source formats (*.mzXML) using the ProteoWizard Toolkit (Chambers et al., 2012). Then, the converted files were further processed using the Self-Adjusting Feature Detection (SAFD) algorithm, which produces feature lists considering various input parameters (Table 2). The SAFD algorithm consists of nine steps, considering all the points measured within a feature without using arbitrary parameters and fitting a 3D Gaussian distribution to the data to identify the features (Samanipour et al., 2019). After that, the total number of features was reduced by grouping features belonging to the same compound (e.g., isotopic pattern) and present at the same retention time. A Naive Bayes classification model was applied that can detect isotopologues at MS level without using information on the molecular formula or arbitrary limits (e.g., mass tolerance) (van Herwerden et al., 2022). Finally, the features of interest were run through the ULSA, which contained more than 700,000 entries of experimental and in-silico data (Allen et al., 2015; Samanipour et al., 2018). The identified compounds were classified at different levels of confidence, from Level 1 to Level 5 (Schymanski et al., 2014): Level 1, availability of reference standard with MS, MS/MS and retention time match; Level 2, literature / library spectrum data or diagnostic evidence data on MS and MS/MS; Level 3, evidence on MS and MS/MS data but not a unique structure can be assigned; Level 4, an unequivocal molecular formula is assigned; Level 5, exact mass (m/z) of high interest. Identification of compounds was achieved using mass accuracy at MS and MS/MS level of lower than ppm and retention time of ± 0.2 min compared to the reference standard. The compounds identified at low confidence levels (3 to 5) were not further investigated, as little information was available to give a specific structure to the molecule.
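The matching criteria against a reference standard (mass accuracy in ppm and a retention time window of ± 0.2 min) can be sketched as follows. The function names are hypothetical, the mass tolerance is left as a required parameter because its value is not given here, and the m/z and retention time values in the example are illustrative only.

```python
def ppm_error(measured_mz, reference_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def matches_reference(measured_mz, reference_mz, measured_rt, reference_rt,
                      max_ppm, rt_window=0.2):
    """True when both the mass error and the retention time shift are in tolerance.

    max_ppm   : mass accuracy tolerance in ppm (caller-supplied assumption)
    rt_window : retention time tolerance in minutes (± 0.2 min in the text)
    """
    mass_ok = abs(ppm_error(measured_mz, reference_mz)) <= max_ppm
    rt_ok = abs(measured_rt - reference_rt) <= rt_window
    return mass_ok and rt_ok


# Illustrative numbers, not taken from the study's data.
print(ppm_error(152.0709, 152.0706))
print(matches_reference(152.0709, 152.0706, 3.10, 3.05, max_ppm=5.0))
```

In practice the same check would be applied to the MS/MS fragments as well as to the precursor ion.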

Loads and statistical analysis
No concentrations were available for the features to calculate loads and further assess the findings, as we applied a non-target screening method. Therefore, the response of each feature was transformed to a pseudo-mass load. The following equation was used to estimate the loads:

$$\text{Pseudo-mass load} = \frac{\text{Normalized area} \times \text{flow}}{\text{Population}}$$
The area of each feature at a specific retention time was normalized to the area of one of the injected IS to reduce misestimations among the various samples due to instrumental drift and matrix effects. The chromatogram was divided into eleven segments (1 min each), in each of which a specific IS eluted. This IS was used to normalize all the features present in that segment. Only the most abundant features in all the samples, showing a detection frequency greater than 80%, were examined. Non-detections (zero values) of the most abundant features were replaced with the lowest area found for the feature in all samples divided by two.
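The pre-processing steps above can be sketched in a few lines. This is a minimal illustration, assuming the pseudo-mass load takes the usual WBE form (IS-normalized area multiplied by daily flow and divided by the served population); the function names, units and numbers are hypothetical.

```python
def detection_frequency(areas):
    """Percentage of samples in which the feature was detected (area > 0)."""
    return 100.0 * sum(1 for a in areas if a > 0) / len(areas)


def fill_non_detects(areas):
    """Replace zero areas with half of the lowest non-zero area of the feature."""
    detected = [a for a in areas if a > 0]
    if not detected:
        return list(areas)
    substitute = min(detected) / 2.0
    return [a if a > 0 else substitute for a in areas]


def pseudo_mass_load(feature_area, is_area, flow, population):
    """IS-normalized peak area scaled by flow and divided by population.

    The result is instrument specific and only comparable within one study.
    """
    return (feature_area / is_area) * flow / population


# Illustrative peak areas of one feature across six samples.
areas = [4.0e6, 0.0, 2.0e6, 8.0e6, 6.0e6, 5.0e6]
if detection_frequency(areas) > 80.0:       # keep frequently detected features only
    areas = fill_non_detects(areas)
print(areas)
print(pseudo_mass_load(4.0e6, 2.0e6, flow=50000.0, population=100000))
```

The internal standard passed as `is_area` would be the one eluting in the same 1-min segment as the feature.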
The normality of the data was checked by the Shapiro-Wilk test. Pearson and Spearman correlation tests were then used for parametric and non-parametric assessment, respectively, considering a 95% confidence level. High and very high positive or negative correlations were further investigated (Table 3).
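The choice between the two correlation coefficients can be sketched with the standard library alone. The Shapiro-Wilk test itself is not implemented here; the `normal` flag is a hypothetical stand-in for its outcome, and the data are illustrative.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlate(x, y, normal):
    """Pearson when both variables passed the normality check, else Spearman."""
    return pearson(x, y) if normal else spearman(x, y)


# Illustrative data: a monotonic but non-linear relationship.
loads = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptor = [1.0, 4.0, 9.0, 16.0, 25.0]
print(correlate(loads, descriptor, normal=True))
print(correlate(loads, descriptor, normal=False))
```

The rank-based coefficient is insensitive to the curvature in the example, which is why it suits data that fail the normality check.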

Quality assurance and quality control (QA/QC)
The best practice protocols for WBE were applied to minimize the uncertainties related to wastewater analysis, such as sample collection, storage, and analytical methodology (Castiglioni et al., 2013). Sampling of 24-h composite influent wastewater was performed by the operators of each WWTP following a specific protocol designed for this purpose and completing a questionnaire to ensure reliability, representativeness, and high data quality (O'Brien et al., 2019). Samples were collected in pre-cleaned HDPE bottles (pre-cleaned with methanol and then rinsed with Milli-Q water) to avoid external contamination and according to the most suitable mode (daily composite and not grab sampling) using autosamplers at the highest available sampling frequency. In addition, samples were collected under refrigerated conditions and shipped frozen to reduce possible degradation of the compounds. Instrumental calibration (mass accuracy) was performed regularly using Pierce™ calibration solutions (Thermo Fisher Scientific, San Jose, USA). Analysis was performed using a validated method and all samples were spiked with a mix of IS for instrumental performance assessment (e.g., method reproducibility, sensitivity, injection volume and ionization) and to correct for variations due to matrix effects. Procedural blanks were run throughout the analysis and samples were analysed in triplicate. The analysis was considered highly repeatable, as the retention time drift was <0.04 min for IS, and analysis of blanks showed neither contamination nor carry-over.

Detection of significant features
The prioritization workflow (Fig. 1) extracted 25,980 different features in all samples and after specific filtering 542 features remained. Further statistical analysis demonstrated that twenty-eight features were highly correlated with at least one of the selected descriptors (Table 4). Six features, namely sotalol, metoprolol, atenolol acid (or metoprolol acid), venlafaxine, paracetamol and sitagliptin, were confirmed using authentic reference standards. Five features were identified as polypropylene glycol (PPG) and polyethylene glycol (PEG) related compounds. These compounds showed the characteristic chromatographic profile of peaks and mass losses among them and typical MS/MS spectrum (Thurman et al., 2017). With increasing retention time, a series of chromatographic peaks separated by 58.0419 mass units for PPG and 44.0262 for PEG was observed. The difference of 58.0419 mass units corresponds to a propylene oxide group, [-CH2-CH(CH3)-O-], and the difference of 44.0262 to an ethylene oxide group, [-CH2-CH2-O-]. The MS/MS spectra of PPG presented typical fragments (e.g., m/z 59.0494, 117.0911 and 175.1321) with a series of 58.041 mass unit losses. Similarly, the MS/MS spectra of PEG presented typical fragments (e.g., m/z 89.0597, 133.0860 and 177.1122) with a series of 44.026 mass unit losses. For one feature, different molecular structures were assigned using data from in-silico fragmentation. Various dipeptide isomers were proposed but there was no way to distinguish them since they share common fragments. In addition, it was not feasible to assign a molecular structure for seven features, while for nine features either no fragmentation or only one fragment was observed, so identification was challenging. These features were positively correlated with descriptors showing disadvantage in education, occupation, income, housing, and other groups (e.g., separated or divorced) and negatively correlated with descriptors of advantage or disadvantage in education, housing, and occupation. Our approach demonstrated that feature filtering based on Census data can serve as an advantageous tool for presenting useful information to society, but limitations in feature identification hinder our ability to identify new biomarkers.
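The repeat-unit spacing used to recognise PPG and PEG related compounds lends itself to a simple search for homologue series. The sketch below is an illustration rather than the workflow used in the study: the function name, the 0.002 Da tolerance and the minimum chain length are assumptions, and the input reuses the MS/MS fragment masses quoted above as a convenient test set.

```python
PO_UNIT = 58.0419  # propylene oxide repeat unit (mass difference from the text)
EO_UNIT = 44.0262  # ethylene oxide repeat unit (mass difference from the text)


def find_series(mz_values, unit, tol=0.002, min_length=3):
    """Return chains of m/z values spaced by `unit` within `tol` (both in Da).

    tol and min_length are illustrative defaults, not values from the study.
    """
    candidates = sorted(mz_values)
    used = set()
    series = []
    for start in candidates:
        if start in used:
            continue
        chain = [start]
        while True:
            target = chain[-1] + unit
            # First unused value that sits one repeat unit above the chain end.
            nxt = next((m for m in candidates
                        if m not in used and abs(m - target) <= tol), None)
            if nxt is None:
                break
            chain.append(nxt)
        if len(chain) >= min_length:
            series.append(chain)
            used.update(chain)
    return series


# Fragment masses quoted in the text, mixed together.
mz = [59.0494, 89.0597, 117.0911, 133.0860, 175.1321, 177.1122]
print(find_series(mz, PO_UNIT))
print(find_series(mz, EO_UNIT))
```

On real feature lists the same search would be run on precursor masses ordered by retention time, since the series in the text elute at increasing retention times.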

Sotalol
Sotalol, a medication used to treat and prevent cardiac arrhythmias, was positively correlated with AGE (Table 4). Cardiac arrhythmias could be the consequence of various medical issues and, thus, it was not feasible to link our findings with a specific illness. Nevertheless, most of them are related to heart disease, the incidence of which increases with age (Brembilla-Perrot, 2003). Sotalol was also positively correlated with the social descriptor SEPDIVORCED (Table 4). Separation and divorce have been shown to lead to health effects, such as heart rate variability, mental health problems and respiratory arrhythmia, among others (Bourassa et al., 2016; Hughes and Waite, 2009).

Metoprolol
Metoprolol is a beta-blocker and is used to treat patients with high blood pressure, chest pain and heart failure. Metoprolol negatively correlated with IRSAD and descriptors of advantage for education and positively correlated with descriptors of disadvantage for education, occupation, and other socioeconomic characteristics (Table 4). Inequalities in the treatment of diseases have been observed previously, with specific socioeconomically disadvantaged population groups, such as those with low income, lower levels of education or lower levels of employment, having poorer health status and less access to health care systems, possibly leading to the use of certain drugs (Christensen et al., 2011; Khariton et al., 2018; Rosengren et al., 2019). Communities with low socioeconomic status consumed higher amounts of metoprolol, which is one of the cheapest selective beta blocking agents of the group C (cardiovascular system) on the market. Furthermore, atenolol, one of the most used (PBS, 2017) and inexpensive drugs from the same group, had a moderately negative correlation with IRSAD, as has previously been reported (Choi et al., 2019).

Table 3
Interpretation of the strength of a correlation coefficient (R).

Atenolol acid
Atenolol acid is a (bio)transformation product of atenolol and metoprolol and it positively correlated with two descriptors of disadvantage (Table 4). It is the specific human metabolite of the most widely used beta blocking agents (group C) and, therefore, could provide useful information for the whole group of drugs associated with diseases of the cardiovascular system. Both parent compounds had moderate (INC_LOW: atenolol, R = 0.592, p = 0.020 and metoprolol, R = 0.664, p = 0.0069; SEPDIVORCED: atenolol, R = 0.603, p = 0.017) to high (SEPDIVORCED: metoprolol, R = 0.759, p = 0.0010) correlations for these descriptors. Multi-country longitudinal studies have shown that income, marital status, lifestyle habits and education, among others, influence cardiovascular risk or disease burden on an individual level (Appiah and Capistrant, 2017; Clark et al., 2009). Our findings were in line with these studies, which showed that low socioeconomic status was positively associated with cardiovascular disease (Clark et al., 2009) and that separation or divorce was associated with higher odds of cardiovascular disease (Appiah and Capistrant, 2017).
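As a rough consistency check on the coefficients quoted above, the relation between a correlation coefficient and its p-value can be worked through for the atenolol and INC_LOW pair. This assumes that the correlation was computed across the fifteen catchments (n = 15) and that the usual t-approximation applies; the text does not state which of the two correlation tests produced each value.

$$t = R\sqrt{\frac{n-2}{1-R^{2}}} = 0.592\sqrt{\frac{13}{1-0.592^{2}}} \approx 2.65$$

With 13 degrees of freedom, t ≈ 2.65 corresponds to a two-sided p-value of about 0.02, in line with the reported p = 0.020. The same calculation for metoprolol and SEPDIVORCED (R = 0.759) gives t ≈ 4.20 and a two-sided p-value of about 0.001.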

Venlafaxine
Venlafaxine is mainly used as an antidepressant and sometimes for the treatment of generalized or social anxiety disorder and panic attacks. It was positively correlated with the descriptor SEPDIVORCED (Table 4). Wastewater analysis showed that catchments with higher proportions of those separated or divorced consume higher amounts of venlafaxine compared to those catchments where there is a higher proportion of the population who are married or in a relationship. Other studies have suggested that divorce negatively affects mental health, leading to an increased use of antidepressants (Monden et al., 2015).

Paracetamol
Paracetamol is an analgesic used to treat pain and fever and was negatively correlated with MORTGAGE (Table 4). Housing has been shown to play an important role in human health among other social determinants (Lund et al., 2018). In addition to the physical characteristics of a home (e.g., material housing conditions), social and psychological housing factors are recognised as factors that affect human health (Shaw, 2004). Consequently, the cross-sectional nature of the data, which could affect the association between housing and health, should be considered. Paracetamol is one of the most used analgesics and may also be consumed in combination with other drugs. It can be purchased without a prescription and it is used to treat symptoms of many conditions. Paracetamol did not correlate with any other descriptors and no statistical analysis (correlations) among the various features was performed. Therefore, it is very difficult to provide a sound explanation for this correlation and further research is necessary.

Sitagliptin
Sitagliptin is an anti-diabetic medication for the treatment of adults with type 2 diabetes. This biomarker was more prevalent in communities with lower socioeconomic status. High positive correlations were observed with lower levels of education and high negative correlations were observed with higher levels of education and higher-skilled occupational groups (Table 4). Diabetes is an increasing problem in many countries all over the world and the social, demographic, and economic characteristics of each country and/or smaller groups within a country have affected its prevalence and incidence. Worldwide trends in diabetes have suggested that occupational social class, income, educational level, gender, age and other lifestyle-related risk factors are implicated in the pathogenesis of type 2 diabetes (Agardh et al., 2011; Shirai et al., 2021).

N.I. Rousis et al.

Polypropylene (PPG) and polyethylene (PEG) glycol related compounds
Polypropylene and polyethylene glycol related compounds have many professional, industrial and consumer uses in medicines, cosmetics, cleaning products, the chemical industry, the food industry, and others. High correlations with relative descriptors of housing and occupation and two miscellaneous variables were observed (Table 4). Due to the numerous and multidimensional uses of these compounds, it was not feasible to associate specific products with the descriptors and to suggest explanations for the correlations. For instance, it has been shown that the frequency and type of usage of certain cosmetics is associated with age, occupation and income (Park et al., 2018). In addition, occupation is a crucial factor determining the use of and/or exposure to chemicals (e.g., cleaning products) and the correlation could be related to the type of jobs and/or the specific employment profile of those living in the catchment (Rosenman et al., 2003). This study highlighted the positive correlation of these compounds with housing and occupation disadvantage.

Social, economic, and demographic characteristics
Our non-targeted approach combined with the socioeconomic diversity of wastewater catchments identified wastewater biomarkers of public health, which were positively correlated with descriptors showing disadvantage in education, occupation, marital status and income, and negatively correlated with descriptors of advantage in education and occupation (Fig. 2). In addition, a variety of PPG and PEG compounds were positively correlated with housing and occupation disadvantage.

Aging
Aging is an important factor of human health and it has been shown that older people consume more medicines to treat various health disorders (Brembilla-Perrot, 2003; Moen et al., 2009). Wastewater results revealed that the levels of a drug used in cardiac arrhythmias increase with advancing age (Fig. 2). Older people are more likely to develop cardiac arrhythmias, as many factors, such as genetics, stress, lifestyle habits and other illnesses that occur at an older age, can be the cause (Anstee et al., 2018; Chen et al., 2021).

Index of relative socioeconomic advantage and disadvantage (IRSAD)
The IRSAD, which is an indicator of the social and economic wellbeing of a population, showed that communities with low socioeconomic status were positively associated with markers of cardiovascular disease. The main descriptors that gave this outcome were related to income, education, occupation, and marital status (Fig. 2). Many studies have shown this as well as the inverse correlation; populations with a high socioeconomic status had a lower incidence of cardiovascular disease (Allen et al., 2017). Level of wealth is a critical factor that affects many aspects of human life, which can lead to lifestyle practices that negatively influence human health (e.g., unhealthy diet, physical inactivity and harmful use of tobacco and alcohol) (Allen et al., 2017). Financial comfort is a good predictor of human health, as patients with high income can afford more expensive medication, have better healthcare coverage and access to (private) medical centres, and have adequate resources for self-care (Khariton et al., 2018). In addition, education was found to be a strong marker for cardiovascular disease and a lower level of education is known to be associated with an increased risk of the disease (Rosengren et al., 2019). Certain (socioeconomic) characteristics of neighbourhoods have been shown to affect human health, as limited access to healthcare facilities, healthy food, green space, public transportation, and exposure to outdoor pollution have been associated with many diseases, such as increased risk of diabetes, cardiovascular and cerebrovascular diseases and poor mental health (Kivimäki et al., 2021). Some of the main risk factors for cardiovascular disease are related to diet, tobacco and alcohol use, physical exercise, and environmental pollution. This study identified areas with low socioeconomic status that showed significant correlations with a marker of cardiovascular disease and these results could be used by the authorities to perform certain actions to help these communities. These actions can target a healthier lifestyle (e.g., consumption of fruits, vegetables, and fibres, reduction of smoking and drinking habits and regular physical activity), modifications to neighbourhoods that would minimize pollution, better access to health systems and improved education that could alleviate part of the significant excess burden of the

Education
Education is characterized as a steady socioeconomic indicator from a relatively early stage of life, which affects the entire course of an individual's life. This work revealed that catchments with lower education had higher drug use associated with cardiovascular disease and type 2 diabetes (Fig. 2). It has been suggested that people with lower levels of education are limited in receiving health-related information and communicating with healthcare services, with difficulty in accessing primary and secondary prevention (Agardh et al., 2011; Kivimäki et al., 2018; Rosengren et al., 2019). This could result in a reduced ability to obtain effective health care in general, which could also be related to delayed action after the first symptoms due to a lack of awareness of the importance of seeking help early, resulting in worsening of the disease. In addition, low education is associated with inadequate access to high-quality healthcare systems and poor compliance with disease treatment (Agardh et al., 2011; Allen et al., 2017; Christensen et al., 2011; Kivimäki et al., 2018; Rosengren et al., 2019). This work found that areas with a lower education level had higher consumption of drugs for specific diseases. Therefore, the areas identified in this study could be targeted by public health policies. Campaigns could focus on communities with lower levels of education to reduce inequalities in health behaviours by adapting strategies to improve knowledge about disease risk factors, their symptoms, and the operation of healthcare systems. Community health programmes focusing on primary care practices would help to further reduce health issues.

Occupation
Occupation is an important factor of socioeconomic status, as it is related to the ability to accumulate financial resources and to the general lifestyle. We demonstrated that catchments with a population working in lower skilled occupations had a higher prevalence of markers of cardiovascular disease and type 2 diabetes (Fig. 2). Risk factors of both illnesses include obesity, availability of healthy foods, sedentary behaviour, physical inactivity, lifestyle choices, irregular working hours and psychological stress (Agardh et al., 2011; Carlsson et al., 2020). Adults spend a lot of time at work and since many of the risk factors are related to daily work habits, the workplace could be the first step in primary prevention. Our results were not based on specific types of occupation and, thus, it would be difficult to indicate which risk factors can be improved. However, workplace wellness programmes (Peñalvo et al., 2021) could focus on general guidelines, such as reducing smoking and weight gain, promoting a healthy diet and increasing physical activity. These interventions could benefit both employers and employees by improving health, increasing productivity, and reducing the cost of healthcare.

Descriptor of separated or divorced people
Our findings indicated that in catchments with a higher proportion of separated or divorced people there was a high positive correlation with specific drugs used to treat cardiac arrhythmias, cardiovascular diseases, and depression (Fig. 2). Divorce has shown mild and severe short-term and long-term effects on mental and physical health that are still evident many years later (Monden et al., 2015). Although separation or divorce could be a desirable action, both partners are exposed to stressors. They could experience health issues due to fights and conflicts over housing arrangements and joint assets, reduced income, potential social stigma, contact loss with friends and family, loss of healthcare, loss of companionship and loss of child custody (Hughes and Waite, 2009; Monden et al., 2015). Our analysis showed that certain groups had significant correlations with disease markers. As interventions are essential for their support, actions should be taken to assist divorcees, not only during the breakup (e.g., marriage counselling sessions), but after the divorce as well (e.g., childcare assistance, psychological support, financial support, and government assistance on finding a job).

Strengths, limitations, and future perspectives
Our work has several strengths, particularly that the proposed methodology can identify biomarkers which are associated with social, demographic, and economic characteristics of societies. This approach benefits from the fact that it gives a picture of the "real" situation in a location, as it targets the whole community, and its results are not derived from the analysis of small specific groups. HRMS wastewater analysis presents some advantages for monitoring public health over traditional techniques (e.g., surveys, cohort studies and clinical trials): data are collected in an objective way (e.g., not self-reported data), the approach is cost-effective, and no ethical issues arise since the sample is a composite of the entire population and individuals cannot be identified. We therefore suggest that this approach could be integrated into current frameworks used to identify factors influencing public health.
This work also had limitations, which need to be highlighted to improve future studies. The analytical technique was based on chromatography coupled to mass spectrometry focusing on small molecules and, thus, the identification of some markers (e.g., viruses) was not feasible. It should be emphasized that the use of this technique was related to the aims of the study and no technique can determine all chemicals present in the wastewater. Although this study used a large library (thousands of compounds), only a small number of features were confirmed. This is a universal problem for HRMS analysis and more experimental data are needed in open access forms. No correlations among the various features were investigated and therefore associations among chemicals were not revealed. Our data were generated from samples of a high-income country and their applicability to low-income countries should be further explored, as several factors could be dissimilar among the populations.
We introduced the term "pseudo-mass loads" to manage non-target HRMS data applied for WBE purposes where the use of normalised mass loads is considered necessary. The proposed approach can be used when quantitative or semi-quantitative data are not available. The "pseudo-mass loads" are instrument specific, since they are based on the response (peak area) of features and IS and, thus, cannot be compared with other studies. Rather, the exact values can only be used within a specific context. Pseudo-mass loads are good alternatives when statistical analysis (e.g., correlation tests) is essential to process large data sets. This approach requires the use of IS, but it is not able to correct all variations arising during an analysis for each feature, due to the high number of features in a sample. Selection of "appropriate" IS could be a future investigation, but one has to consider that most features in an analysis are finally characterised as unknowns with no information on the chemical structure available.
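As a minimal sketch of this idea, the Python snippet below computes a pseudo-mass load from the peak area of a feature and the peak area of an internal standard (IS). Only the dependence on the feature and IS responses is stated above; the scaling by daily flow and population, the per-1000-inhabitants convention, the function name and all numbers are assumptions made for illustration.

```python
def pseudo_mass_load(feature_area, is_area, flow_l_per_day, population):
    """Hypothetical pseudo-mass load per 1000 inhabitants.

    Assumed form: (feature area / IS area) * flow / population * 1000.
    The result is in arbitrary, instrument-specific units.
    """
    if is_area <= 0 or population <= 0:
        raise ValueError("IS peak area and population must be positive")
    response_ratio = feature_area / is_area  # unitless, instrument specific
    return response_ratio * flow_l_per_day / population * 1000.0


# Illustrative numbers only (not measured data).
load = pseudo_mass_load(
    feature_area=2.0e6,
    is_area=4.0e6,
    flow_l_per_day=1.0e8,
    population=500_000,
)
print(load)
```

Because the response ratio depends on the instrument and the chosen IS, values computed this way are only meaningful relative to each other within one study.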
The sampling was performed by applying the best practice protocol for WBE and one sample was analysed from each WWTP corresponding to the day of the Census. If multiple samples had been analysed per WWTP, the pseudo-mass loads would have to be averaged in order to perform the correlation analysis. This is because the data distribution and the significance (p-value) of the correlation are strongly affected when the number of pairs (pseudo-mass load vs. value of a descriptor) increases while the number of WWTPs remains the same, meaning that different pairs are attributed to the same WWTP and one specific descriptor. Therefore, we suggest using one representative sample for each WWTP, for instance the daily composite sample of the Census night. Alternatively, when more than one sample is analysed, the mean pseudo-mass load of each feature should be used, leading to a single value. In the case where multiple samples collected on consecutive days including the Census night are analysed, certain days (e.g., weekends and days with special events) need to be avoided, as it has been proven that the consumption of some chemicals (e.g., recreational or illicit drugs) increases on these days.
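The averaging step described above can be sketched as follows. This is an illustrative example, not the code used in the study: the WWTP labels, the pseudo-mass loads and the descriptor values are invented, and Pearson's r is used here only as one possible correlation measure.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def mean_load_per_wwtp(samples):
    """Collapse (wwtp_id, pseudo_mass_load) pairs to one mean value per WWTP."""
    grouped = defaultdict(list)
    for wwtp_id, load in samples:
        grouped[wwtp_id].append(load)
    return {wwtp_id: mean(loads) for wwtp_id, loads in grouped.items()}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Invented data: two samples for WWTPs A and B, one sample for C.
samples = [("A", 10.0), ("A", 14.0), ("B", 20.0), ("B", 24.0), ("C", 33.0)]
descriptor = {"A": 1.0, "B": 2.0, "C": 3.0}  # one Census value per WWTP

means = mean_load_per_wwtp(samples)
wwtps = sorted(means)
x = [descriptor[w] for w in wwtps]
y = [means[w] for w in wwtps]
r = pearson_r(x, y)
print(len(x), r)  # the number of pairs equals the number of WWTPs
```

Averaging first keeps the number of pairs equal to the number of WWTPs, so repeated samples from one plant do not inflate the apparent sample size behind the p-value.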
Our analysis was based on direct injection of the samples and, therefore, it was unlikely to detect compounds found at very low concentration levels. Thus, for further clean-up, extraction and preconcentration, a solid phase extraction procedure with multiple sorbent materials could be used. However, this extraction procedure should not be used in combination with a data-dependent acquisition mode, as only the most abundant precursor ions per scan (here 5) are selected for fragmentation, filtering out everything else. Therefore, the data-independent acquisition approach could be used to overcome this limitation, since it requires no prior information about the precursor ions and all ions are fragmented. However, this approach produces fragments that cannot always be associated with the precursor ion and, therefore, a sample re-injection would be necessary if complex MS/MS spectra are obtained. Furthermore, the analysis could additionally be performed in negative ionization mode, to investigate compounds that are not ionized with positive polarity.
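The filtering effect of data-dependent acquisition can be illustrated with a simplified sketch. It only mimics the "top N most abundant precursors per scan" rule mentioned above (N = 5); real instruments apply further criteria such as intensity thresholds and dynamic exclusion, which are not modelled here. The function name and the ion list are hypothetical.

```python
def select_precursors(scan, top_n=5):
    """Return the top_n most intense (mz, intensity) pairs of one MS1 scan."""
    return sorted(scan, key=lambda ion: ion[1], reverse=True)[:top_n]


# Invented MS1 scan: (m/z, intensity) pairs.
scan = [
    (150.1, 9.0e5),
    (180.2, 4.0e4),
    (233.1, 7.5e5),
    (301.2, 2.0e6),
    (355.0, 1.2e4),
    (412.3, 6.0e5),
    (478.9, 3.1e5),
]

selected = select_precursors(scan)
skipped = [ion for ion in scan if ion not in selected]
print([mz for mz, _ in selected])  # precursors sent for fragmentation
print([mz for mz, _ in skipped])   # low-abundance ions left unfragmented
```

In this toy scan the two least intense ions are never fragmented, which is why low-abundance compounds can be missed in this acquisition mode.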

Conclusions
We developed a robust methodology based on non-targeted analysis of untreated wastewater combined with Census data (socioeconomic characteristics of catchment areas) that can identify biomarkers of public health, human behaviour, and lifestyle and can be applied worldwide. This approach can provide important complementary information compared to other traditional techniques (e.g., human biomonitoring, cohort studies and clinical trials) related to public health. Our results showed that populations presenting disadvantage in income, education, occupation, and marital status were positively correlated with biomarkers and/or proxies of diseases, such as cardiac arrhythmias, cardiovascular disease, anxiety disorder and type 2 diabetes. This study demonstrates that the proposed approach can identify inequalities between areas and socioeconomic groups that affect health status, as well as the potential drivers behind them. This approach can be used to aid policy makers and (inter)national organizations in highlighting these areas of inequality and developing strategies for improving living standards, creating social support for disadvantaged groups, and overcoming barriers to healthcare (e.g., hospital admissions, use of healthcare and health coverage). These actions will not only have a positive impact on public health, but also on a country's economy (e.g., reduced financial health costs and medical expenses). Overall, this approach can determine specific areas that may need more assistance than others on important public health issues.

Fig. 1. Prioritization workflow for the identification of wastewater biomarkers related to social, demographic, and economic characteristics of societies.

Fig. 2. Wastewater biomarkers of public health significantly correlated with socioeconomic indicators of marital status (separated or divorced people), occupation, the IRSAD (index of relative socioeconomic advantage and disadvantage), education and aging.

Table 1
Socioeconomic indexes for areas (SEIFA) descriptors investigated in the present study and their characteristics.

Table 2
Parameters of the self-adjusting feature detection (SAFD) algorithm.

Table 4
Twenty-eight features presented high and very high (cut-off range |R| ≥ 0.700) correlations (p < .05) with the investigated descriptors.