Human fecal contamination of water, soil, and surfaces in households sharing poor-quality sanitation facilities in Maputo, Mozambique.

Identifying the origin of fecal contamination can support more effective interventions to interrupt enteric pathogen transmission. Microbial source tracking (MST) assays may help to identify environmental routes of pathogen transmission, although these assays have performed poorly in highly contaminated domestic settings, highlighting the importance of both diagnostic validation and understanding the context-specific ecological, physical, and sociodemographic factors driving the spread of fecal contamination. We assessed fecal contamination of compounds (clusters of 2–10 households that share sanitation facilities) in low-income neighborhoods of urban Maputo, Mozambique, using a set of MST assays that were validated with animal stool and latrine sludge from study compounds. We sampled five environmental compartments involved in fecal microbe transmission and exposure: compound water source, household stored water and food preparation surfaces, and soil from the entrance to the compound latrine and the entrances to each household. Each sample was analyzed by culture for the general fecal indicator Escherichia coli (cEC) and by real-time PCR for the E. coli molecular marker EC23S857, human-associated markers HF183/BacR287 and Mnif, and GFD, an avian-associated marker. We collected 366 samples from 94 households in 58 compounds. At least one microbial target (indicator organism or marker gene) was detected in 96% of samples (353/366), with both E. coli targets present in the majority of samples (78%). Human targets were frequently detected in soils (59%) and occasionally in stored water (17%) but seldom in source water or on food surfaces. The avian target GFD was rarely detected in any sample type but was most common in soils (4%).
To identify risk factors for fecal contamination, we estimated associations with sociodemographic, meteorological, and physical sample characteristics for each microbial target and sample type combination using Bayesian censored regression for target concentration responses and Bayesian logistic regression for target detection status. Associations with risk factors were generally weak and often differed in direction between different targets and sample types, though relationships were somewhat more consistent for physical sample characteristics. Wet soils were associated with elevated concentrations of cEC and EC23S857 and with higher odds of detecting HF183. Water storage container characteristics that expose the contents to potential contact with hands and other objects were weakly associated with human target detection. Our results describe a setting impacted by pervasive domestic fecal contamination, including from human sources, that was largely disconnected from the observed variation in socioeconomic and sanitary conditions. This pattern suggests that in such highly contaminated settings, transformational changes to the community environment may be required before meaningful impacts on fecal contamination can be realized.


S1. Reference material for qPCR standard curves
Nucleotide Basic Local Alignment Search Tool (BLAST) searches were performed with the published primers and probe sequences (Table S1) for each candidate assay to ensure published sequence accuracy and to obtain the expected amplicon sequence (Agarwala et al., 2016). The matching amplicon sequence and ten additional bases on both ends were extracted from the GenBank database to serve as reference sequences. Because all three assays targeting Bacteroidales 16S rRNA genes matched the same B. dorei gene sequence, a single reference sequence was extracted spanning the entire region targeted by these assays.
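Once the primer binding sites are known, extracting a reference sequence amounts to slicing out the amplicon plus ten flanking bases on each side. The Python sketch below shows only that slicing step; it does not perform a BLAST search, assumes exact primer matches (no degenerate bases or mismatches), and uses toy sequences and hypothetical helper names (`reverse_complement`, `extract_with_flanks`) rather than any of the study's assays.

```python
# Minimal sketch: locate a primer pair in a reference sequence and extract the
# amplicon plus a fixed number of flanking bases on each side.
# Assumes exact matches to unambiguous A/C/G/T primers (hypothetical helpers).

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_with_flanks(reference: str, forward: str, reverse: str, flank: int = 10) -> str:
    """Return the amplicon bounded by `forward` and `reverse` primers, extended
    by `flank` bases on each side (clipped at the ends of `reference`).

    The reverse primer is written 5'->3' on the opposite strand, so its reverse
    complement is what appears on the reference strand.
    """
    start = reference.find(forward)
    if start < 0:
        raise ValueError("forward primer not found")
    rc_reverse = reverse_complement(reverse)
    end = reference.find(rc_reverse, start + len(forward))
    if end < 0:
        raise ValueError("reverse primer binding site not found downstream")
    end += len(rc_reverse)
    return reference[max(0, start - flank): min(len(reference), end + flank)]
```

For a reference built as 12 T's, `GGATCC`, `ACGT`, `AAACCC`, and 12 G's, the primers `GGATCC` and `GGGTTT` yield the 16-base amplicon with ten T's before it and ten G's after it.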
The reference sequences obtained for the avian-associated assays were concatenated to construct a composite reference sequence for both assays. The reference sequences for the remaining (nonavian, non-16S) assays were likewise concatenated. These three composite reference sequences (Table S2) were commercially synthesized as gBlock synthetic linear double-stranded DNA fragments (Integrated DNA Technologies, Skokie, IL, USA) to serve as standard reference material for all candidate assays (Kodani and Winchell, 2012; Liu et al., 2013).

S2. Fecal sludge sampling apparatus
We built a simple apparatus consisting of a lightweight metal broom handle with a ¾ inch PVC "tee" fitting affixed to one end using super glue. A long, sterile plastic sample bag was placed over the "tee" fitting end and secured to the lower portion of the shaft with a cable tie to serve as a barrier against the latrine sludge. A sterile 50 mL conical centrifuge tube was attached to the "tee" fitting with a pair of cable ties, outside the protective sampling bag and perpendicular to the sampler shaft. The apparatus was lowered into the latrine, and a scraping motion was used to fill the open centrifuge tube from multiple locations on the sludge surface. After the sampling device was carefully lifted out of the pit, the tube screw-cap was replaced, and the cable ties attaching the tube to the "tee" fitting were cut with a razor blade so that the tube dropped into a new sterile plastic sample bag. The protective sampling bag was removed from the shaft in a similar manner, and the entire device was immediately sanitized with 10% bleach and 70% ethanol.

S3. DNA isolation from fecal samples for MST assay validation
DNA was extracted from animal feces and latrine sludge in Maputo using the FastPrep SPIN Kit for Soils (MP Biomedicals, Santa Ana, CA, USA). After lysing 500 mg of thawed fecal sample in the supplied bead tubes by vortexing at maximum speed for 15 minutes, we completed the extractions according to the manufacturer's protocol using a final elution volume of 70 µL.
Eluted DNA was treated with 17.5 µL DNAstable Plus and maintained at room temperature for up to 14 days during transport to the United States, after which samples were stored at 4 °C and analyzed within 6 months. Latrine samples were extracted in duplicate, and an extraction blank was processed with each sample batch for a total of four negative extraction controls (NEC).

S4. Assay diagnostic performance calculations
For each candidate assay, we counted the true positive (TP) and false negative (FN) fecal samples from its associated animal host, as well as the true negative (TN) and false positive (FP) samples from non-associated fecal sources. We characterized diagnostic performance as the proportion of host samples correctly identified (sensitivity), the proportion of non-host samples in which the microbial target was not detected (specificity), and the proportion of all samples correctly identified (accuracy), as follows:

$$ \text{Sensitivity} = \frac{TP}{TP + FN} $$

$$ \text{Specificity} = \frac{TN}{TN + FP} $$

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$
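The three performance measures are simple proportions of the four sample counts. The Python sketch below computes them from a hypothetical `ConfusionCounts` record; the counts in the usage example are illustrative only and are not results from this study. It assumes at least one host and one non-host sample, since otherwise a denominator is zero.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # host samples in which the target was detected
    fn: int  # host samples in which the target was not detected
    tn: int  # non-host samples in which the target was not detected
    fp: int  # non-host samples in which the target was detected


def diagnostic_performance(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, and accuracy as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn),
    }
```

With hypothetical counts of 8 TP, 2 FN, 45 TN, and 5 FP, this gives a sensitivity of 0.80, a specificity of 0.90, and an accuracy of 53/60 (about 0.88).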

S5. Processing of environmental samples
Elution of surface swabs was accomplished through vigorous manual shaking of the collection tubes containing swabs and Ringer's solution for 60 seconds (Pickering et al., 2012).
We eluted soil samples by adding 1 g of wet soil to 100 mL of sterile distilled water in a sterile sample bag and shaking vigorously by hand for 60 seconds. After the mixture settled for 15 minutes, the supernatant was used for filtration (Boehm et al., 2009; Pickering et al., 2012).
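If culture results from the soil eluate are expressed per gram of wet soil, the 1 g in 100 mL elution implies a simple scaling of the count per filtered volume. The text does not state how soil results were normalized, so this is an assumption, as is complete release of the target from the soil; the function name is hypothetical.

```python
def cfu_per_gram_wet_soil(colonies: int, volume_filtered_ml: float,
                          eluent_volume_ml: float = 100.0,
                          soil_mass_g: float = 1.0) -> float:
    """Convert a colony count from a filtered volume of soil eluate to CFU per
    gram of wet soil.

    Assumes the target is evenly distributed in the eluate and that no
    correction is made for incomplete elution from the soil.
    """
    cfu_per_ml = colonies / volume_filtered_ml
    return cfu_per_ml * eluent_volume_ml / soil_mass_g
```

For example, 30 colonies from 10 mL of eluate correspond to 3 CFU/mL, or 300 CFU per gram of wet soil.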
To culture E. coli, filters were placed on sterile cellulose pads (Pall, Port Washington, NY, USA) saturated with modified mTEC broth (HiMedia, Mumbai, India) in sterile 50 mm metal plates and incubated at 44.5 ± 0.5 °C for 22–26 hours. Approximately 25 mL of sterile PBS was added to the filter column before adding any sample volume of 10 mL or less.
Following membrane filtration for culture-based E. coli enumeration, we filtered a larger volume of sample through the same column for molecular analysis. The polycarbonate filters were folded into 2 mL cryovials and immediately archived at -80 °C. Filters were transported frozen on dry ice from Maputo to the United States and stored at -80 °C until DNA extraction, with the exception of eight surface swab filters that experienced room temperature conditions for approximately 24 hours before extraction.
For each target, we analyzed each of the three dilution series in triplicate in three separate qPCR instrument runs, for 27 total reactions at each concentration: nine reactions from each dilution series (three on each of the three plates). To relate these separate instrument runs to the analysis of environmental samples, each extracted PC was re-assayed alongside the unextracted dilution series in duplicate reactions corresponding to 10⁵ pre-extraction copies.
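The reaction count follows from crossing three dilution series, three plates, and three replicate reactions. The Python sketch below enumerates that layout for a single concentration of one target; the series and plate labels are hypothetical placeholders.

```python
from collections import Counter
from itertools import product

# Layout described in the text: three dilution series, each run as triplicate
# reactions on each of three separate qPCR instrument runs (plates).
series = ["series_1", "series_2", "series_3"]  # hypothetical labels
plates = ["plate_1", "plate_2", "plate_3"]     # hypothetical labels
replicates = [1, 2, 3]

# One (series, plate, replicate) tuple per reaction at a single concentration.
reactions = list(product(series, plates, replicates))

per_series = Counter(s for s, _, _ in reactions)          # reactions per series
per_plate = Counter(p for _, p, _ in reactions)           # reactions per plate
per_series_plate = Counter((s, p) for s, p, _ in reactions)
```

Each series contributes nine reactions (three per plate), each plate holds nine reactions (three per series), and the total at each concentration is 3 × 3 × 3 = 27.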

S7. Survival modeling to estimate detection limits and extraction efficiency
Theoretical lower limits of detection (tLLoD) were calculated using survival models to estimate the target concentration corresponding to a 95% probability of amplification, a common definition of LLoD for qPCR assays that requires substantial resources to establish empirically (Bustin et al., 2009; Stokdyk et al., 2016). Recognizing that not every copy of the target gene will successfully amplify, but assuming that each copy has an independent and identical probability of doing so, we estimated an exponential dose-response relationship between target concentration and detection in the serial dilution series of standard reference material (Verbyla et al., 2016).
For reaction $i$ containing $x_i$ log10 copies of the target, the detection status $y_i$ follows a Bernoulli distribution with probability $p_i$ given by

$$ y_i \sim \mathrm{Bernoulli}(p_i), \qquad p_i = 1 - \exp\left(-r \cdot 10^{x_i}\right) $$

where the survival coefficient $r$ is the probability that each copy amplifies (Schmidt et al., 2013).
We estimated $r$ and solved for $x_{95}$, the log10 copy number per reaction for which $p = 0.95$,

$$ x_{95} = \log_{10}\left(\frac{-\ln(1 - 0.95)}{r}\right) $$

using Markov chain Monte Carlo (MCMC) implemented in JAGS with a uniform Beta(1,1) prior on $r$ and three chains of 2000 warmup and 4000 sampling iterations each (Plummer, 2003). We characterized tLLoD for each assay as the mean and 95% credible interval (CI) of the posterior distribution of $x_{95}$.
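The survival model can be sketched in Python for illustration. This is not the authors' JAGS model: it swaps in a single-chain random-walk Metropolis sampler on logit(r), keeps a uniform Beta(1,1) prior on r through the logit Jacobian, and assumes the exponential form P(detect) = 1 − exp(−r · 10^x). All function names are hypothetical.

```python
import math
import random
import statistics


def detection_probability(log10_copies, r):
    """Exponential dose-response: P(detect) = 1 - exp(-r * 10**x)."""
    return -math.expm1(-r * 10.0 ** log10_copies)


def tllod(r, p=0.95):
    """Closed-form log10 copies per reaction at which P(detect) = p."""
    return math.log10(-math.log(1.0 - p) / r)


def _expit(z):
    # Numerically stable inverse logit, mapping the real line to (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _log_posterior(z, log10_copies, detected):
    """Log posterior of z = logit(r) with a uniform Beta(1,1) prior on r."""
    r = _expit(z)
    if r <= 0.0 or r >= 1.0:
        return -math.inf
    total = math.log(r) + math.log1p(-r)  # log Jacobian |dr/dz| = r(1 - r)
    for x, y in zip(log10_copies, detected):
        if y:
            p = detection_probability(x, r)
            if p <= 0.0:
                return -math.inf
            total += math.log(p)
        else:
            total -= r * 10.0 ** x  # log(1 - p) = -r * 10**x
    return total


def sample_tllod(log10_copies, detected, n_warmup=2000, n_samples=4000,
                 step=0.5, seed=1):
    """Single-chain random-walk Metropolis; returns posterior draws of tLLoD."""
    rng = random.Random(seed)
    z = 0.0  # start at r = 0.5
    current = _log_posterior(z, log10_copies, detected)
    draws = []
    for i in range(n_warmup + n_samples):
        proposal = z + rng.gauss(0.0, step)
        candidate = _log_posterior(proposal, log10_copies, detected)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            z, current = proposal, candidate
        if i >= n_warmup:
            draws.append(tllod(_expit(z)))
    return draws


def summarize(draws):
    """Posterior mean and central 95% credible interval."""
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.mean(draws), cuts[0], cuts[-1]
```

A JAGS analysis would typically run several chains and check convergence diagnostics. This single chain is only meant to show how the likelihood, prior, and closed-form tLLoD fit together.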
We used the unextracted standard dilution series, the dilution series of PCs extracted with the PowerSoil kit, and the dilution series of GeneRite-extracted PCs to estimate three separate tLLoDs for each target. Unextracted tLLoDs correspond to the minimum target concentration in individual reaction wells for reliable detection, while tLLoDs from the extracted dilution series reflect the number of target copies that must be present on the sample filter for reliable amplification after target loss during extraction. We estimated kit-specific extraction efficiency, the proportion of DNA recovered following extraction, as the ratio of the unextracted to the extracted tLLoD posterior distributions, expressed as copy numbers.
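Because tLLoDs are on the log10 scale, the ratio of copy numbers for an unextracted draw u and an extracted draw e is 10^(u − e). The Python sketch below applies that ratio draw by draw, which assumes the two posteriors are independent so that pairing draws by index is legitimate; the function names are hypothetical. With point values of 3.0 and 3.7 log10 copies, for example, the ratio is 10^(−0.7) ≈ 0.20, or about 20% recovery.

```python
import statistics


def extraction_efficiency(unextracted_tllod, extracted_tllod):
    """Draw-by-draw ratio of unextracted to extracted tLLoD on the copy scale.

    tLLoDs are log10 copies, so the copy-number ratio is 10**(u - e).
    Values near 1 indicate little target loss during extraction.
    """
    return [10.0 ** (u - e) for u, e in zip(unextracted_tllod, extracted_tllod)]


def summarize(draws):
    """Posterior mean and central 95% credible interval of the efficiency."""
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, ..., 97.5%
    return statistics.mean(draws), cuts[0], cuts[-1]
```

Summarizing the full set of ratios, rather than taking the ratio of posterior means, carries the uncertainty in both tLLoDs through to the efficiency estimate.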

S8. Calibration curve and extraction efficiency estimates
Averaged across all batches and plates, the calibration curves derived from extracted positive controls were relatively linear (R² > 0.95), although the amplification efficiency was somewhat poor for some targets, particularly HF183 (Table S3). Reduced linearity and amplification efficiency were both likely related to the use of reference materials that had been subjected to an extraction procedure, which helps account for target loss during processing when quantifying unknown samples but introduces additional variability. Target loss to extraction procedures was substantial, as indicated by the extraction efficiencies implied by the ratio of extracted and unextracted tLLoDs (Table S4). The GeneRite kit consistently recovered a higher proportion of target DNA than the PowerSoil kit, though the recovery estimates for the GeneRite kit were also more variable. While this greater variability was likely partly due to the greater number of PCs extracted with the PowerSoil kit (and hence the smaller number extracted with the GeneRite kit), it may also reflect lower consistency of the GeneRite extraction procedure.

Table S4 (excerpt), mean (95% CI)
tLLoDᵃ, unextracted | tLLoDᵃ, PowerSoilᵇ | tLLoDᵃ, GeneRiteᶜ | Extraction efficiency (%), PowerSoilᵇ | Extraction efficiency (%), GeneRiteᶜ
3.0 (2.8, 3.2) | 3.7 (3.5, 3.9) | 3.2 (3.0, 3.4) | 20 (10, 36) | 57 (28, 100)
ᵃ Theoretical lower limit of detection from survival modeling, as log10 gc per 5 µL reaction.
ᵇ In samples extracted with the Qiagen DNeasy PowerSoil kit.
ᶜ In samples extracted with the GeneRite DNA-EZ ST01 kit.

S9. Variables assessed in risk factor analysis
Household and compound characteristics identified as potential hazards include observed or reported feces, soiled diapers, standing wastewater, domestic animals in the compound yard, previous compound flooding, and disposal of child feces anywhere other than the latrine. We noted amenities including household floor material and onsite access to latrines, source water points, and electricity, as well as physical characteristics of the latrine, if present. We calculated a household wealth index from survey responses using an asset-based scorecard developed for Mozambique, excluding sanitation-related assets (Knee et al., 2018; Schreiner et al., 2013).
Other sociodemographic characteristics assessed include caregiver and household head educational attainment, household size and crowding (> three household members per room), and compound population and density. We represented population density as persons per latrine, per waterpoint, and per 100 m² of compound area.
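The density and crowding variables are simple ratios. The Python sketch below computes them for a hypothetical `Compound` record; returning `None` when a compound has no latrine or water point is an assumption made for illustration, since the text does not say how such compounds were coded.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    population: int    # persons living in the compound
    area_m2: float     # compound area in square meters
    latrines: int      # number of latrines in the compound
    waterpoints: int   # number of onsite water points


def density_measures(c: Compound) -> dict:
    """Population density three ways; None where the denominator is zero."""
    return {
        "persons_per_latrine": c.population / c.latrines if c.latrines else None,
        "persons_per_waterpoint": c.population / c.waterpoints if c.waterpoints else None,
        "persons_per_100_m2": 100.0 * c.population / c.area_m2,
    }


def is_crowded(household_members: int, rooms: int) -> bool:
    """Crowding indicator: more than three household members per room."""
    return household_members / rooms > 3
```

For a hypothetical compound of 24 people on 300 m² sharing one latrine and one water point, this gives 24 persons per latrine, 24 per water point, and 8 per 100 m².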
We obtained daily records for mean, minimum, and maximum temperature, mean wind speed, cumulative precipitation, and an indicator of whether any precipitation events occurred. In the case of insubstantial precipitation events, it was possible to both observe precipitation and report zero accumulation on the same day. Because meteorological variables were available only as daily summaries and sampling was conducted primarily in the mornings, we associated each environmental sample with meteorological values for the day prior to collection. We also calculated cumulative precipitation and the number of days with rain events in the week (seven days) and month (30 days) preceding sample collection.
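The lagged meteorological covariates can be sketched in Python as a lookup by date. The sketch assumes a hypothetical `daily` mapping from dates to records with `precip_mm` and `rain_event` fields, and it excludes the collection day itself from the 7- and 30-day windows, which is one reading of "preceding". A separate rain-event flag is kept because a day can record an event with zero accumulation.

```python
from datetime import date, timedelta


def lagged_weather(daily: dict, sample_day: date) -> dict:
    """Weather covariates for one sample (hypothetical record layout).

    `daily` maps each date to {"precip_mm": float, "rain_event": bool}.
    Uses the day before collection plus the 7 and 30 days before collection;
    the collection day is excluded. Missing dates raise KeyError.
    """
    prior = sample_day - timedelta(days=1)
    out = {"precip_prior_day": daily[prior]["precip_mm"]}
    for window in (7, 30):
        days = [sample_day - timedelta(days=k) for k in range(1, window + 1)]
        out[f"precip_{window}d"] = sum(daily[d]["precip_mm"] for d in days)
        # Count event days separately: trace events can report 0 mm.
        out[f"rain_days_{window}d"] = sum(1 for d in days if daily[d]["rain_event"])
    return out
```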
Physical characteristics of each sample were observed during collection or, in the case of soil moisture, determined during initial laboratory processing. Observed sample characteristics include source water point location, water storage container attributes, food preparation surface attributes, soil sun exposure, and soil surface wetness.
Due to limited numbers of observations, we collapsed categorical variables with multiple possible responses into dichotomous variables for the risk factor analysis (Table S5). Where applicable, responses were combined such that the response expected to have the strongest relationship with fecal contamination was compared with all others (e.g., child feces disposal in the latrine compared with all other disposal locations). Otherwise, the most frequent response was compared with all others (e.g., plastic water storage containers compared with all other materials), or subcategories were combined into their parent category (e.g., ownership of different animal types represented as ownership of any animals compared with no animals). Because precipitation was somewhat infrequent, we represented precipitation variables as cumulative sums over both the seven and 30 days preceding the sampling event to obtain positive values and investigate different temporal scales. All continuous variables were scaled to improve the interpretability of model estimates for meaningful changes in the value of the variable. Nonprecipitation variables were also mean-centered to aid model convergence and provide effect estimates relative to the typical household, compound, or sampling day.

Table S5 (excerpt)
Variable | Type | Coding
reduced sun exposure | dichotomous | 1 if soil was not fully exposed to sunlight (i.e., partially or fully shaded); 0 otherwise (e.g., full sun exposure)
visibly wet | dichotomous | 1 if soil surface was visibly wet at time of collection; 0 otherwise (e.g., soil surface appeared dry by visual inspection)
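The recoding steps can be sketched in Python: continuous covariates are divided by a scaling unit and, except for precipitation, mean-centered, and categorical responses are collapsed to a reference-versus-all-others indicator. The scaling units used in the analysis are not given here, so the units in the usage example (10 mm of rain, two household members) are hypothetical, as are the function names.

```python
import statistics


def scale_variable(values, unit, center=True):
    """Express a continuous covariate in units of `unit`, optionally centered
    on its mean so that effects are relative to a typical observation."""
    mean = statistics.mean(values) if center else 0.0
    return [(v - mean) / unit for v in values]


def dichotomize(responses, reference):
    """1 where the response equals the reference category, 0 otherwise."""
    return [1 if r == reference else 0 for r in responses]
```

For example, 7-day precipitation sums of 0, 10, and 25 mm scaled per 10 mm without centering become 0, 1, and 2.5, while household sizes of 4, 6, and 8 scaled per two members and centered become −1, 0, and 1.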

S10. Tabular summaries of model results
Separate univariable models were fit for each combination of risk factor variable, sample type, and microbial target, with the risk factor variable treated as the exposure variable and the microbial target outcome in a given sample type as the response variable. The association between each risk factor and continuous response variables (cEC and EC23S857 concentrations) was estimated as the expected change in log10 concentration of the microbial target for a one-unit increase in the value of the characteristic. Similarly, associations with binary response variables (detection of HF183 and any human target) were estimated as the odds ratio of detecting the target given a one-unit increase in the value of the characteristic. In addition to the magnitude of the effect estimate, we characterized the strength of each association using the 95% CI of the effect estimate. Variables for which the 95% CI included the corresponding null value (zero for continuous responses and one for binary responses, each signifying no effect) were considered not to be significant risk factors of contamination by the microbial target in the sample type under consideration. Estimated associations between sanitary, sociodemographic, and meteorological characteristics and microbial outcomes for each sample type and microbial target are presented in Table S6, and estimates for sample type-specific characteristics are provided in Table S7.
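The decision rule can be sketched in Python from posterior draws of a regression coefficient: draws from a logistic model are exponentiated to odds ratios with null value 1, while draws from a concentration model stay on the log10 scale with null value 0. Using the posterior mean as the point estimate, and the function name itself, are assumptions for illustration; the draws in the usage example are synthetic.

```python
import math
import statistics


def summarize_effect(beta_draws, logistic):
    """Summarize posterior draws of one coefficient and check the null value.

    Concentration models: change in log10 concentration per unit, null = 0.
    Logistic models: draws are exponentiated to odds ratios, null = 1.
    """
    draws = [math.exp(b) for b in beta_draws] if logistic else list(beta_draws)
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, ..., 97.5%
    lower, upper = cuts[0], cuts[-1]
    null = 1.0 if logistic else 0.0
    return {
        "estimate": statistics.mean(draws),
        "ci95": (lower, upper),
        "excludes_null": not (lower <= null <= upper),
    }
```

Log-odds draws spread evenly between 0.4 and 0.6 give odds ratios between about 1.5 and 1.8, so the interval excludes 1; concentration-change draws centered on zero give an interval containing 0.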