Population density controls on microbial pollution across the Ganga catchment

For millions of people worldwide, sewage-polluted surface waters threaten water security, food security and human health. Yet the extent of the problem and its causes are poorly understood. Given rapid widespread global urbanisation, the impact of urban versus rural populations is particularly important but unknown. Exploiting previously unpublished archival data for the Ganga (Ganges) catchment, we find a strong non-linear relationship between upstream population density and microbial pollution, and predict that these river systems would fail faecal coliform standards for irrigation waters available to 79% of the catchment's 500 million inhabitants. Overall, this work shows that microbial pollution is conditioned by the continental-scale network structure of rivers, compounded by the location of cities whose growing populations contribute c. 100 times more microbial pollutants per capita than their rural counterparts. © 2017


Background
Rising demands on water resources raise concerns about the sustainable provision of clean water worldwide. Unclean water poses significant risks of diarrhoea, opportunistic infections, and consequent malnutrition, accounting for ~1.7 million deaths annually, of which >90% are in developing countries and almost half are children (Prüss-Ustün et al., 2014). These deaths are primarily due to ingestion of faecal pathogens from humans or animals (Ashbolt, 2004; Kotloff et al., 2013; Prüss-Ustün et al., 2014). India's growing population and economy are driving rapid urbanisation (30% of the population now live in urban areas (Census of India, 2011a)) and exerting increased pressure on surface and groundwater availability. In rural areas ~67% of the population defecate in the open (Census of India, 2011b), a practice that poses severe risk to health and safety (Clasen et al., 2010; Mara et al., 2010; Ziegelbauer et al., 2012; Kotloff et al., 2013). In urban areas ~80% of the population have access to a toilet (Census of India, 2011b), but only ~30% are connected to a sewage pipeline and few pipelines are connected to a treatment plant (Narain, 2012). The impact of these sanitation problems on surface water quality has been documented for many years at individual sample locations or river reaches across India (Bhargava, 1983; Mukherjee et al., 1993; Baghel et al., 2005; Mishra et al., 2009; Central Pollution Control Board, 2010). However, there has been no catchment-wide quantification of the problem and limited indication of what is driving it. The former is essential to fully understand the scale of intervention required, while the latter might inform decision-making on 'what to do where'. Urban areas often dominate the microbial pollution signal in rivers (Tchobanoglous et al., 1991; Kay et al., 2008; McGrane et al., 2014) but there is little consensus on the extent to which this reflects an increased impact per capita or simply a larger population and thus source. This difference is important since a higher per capita impact indicates reduced attenuation, perhaps due to more efficient delivery to the river system or less efficient treatment. If the difference can be attributed to per capita contribution, this will define the extent to which urban- or rural-focused interventions will improve surface water quality.
We address this question using archival water quality data from across the Ganga (Ganges) catchment and show the pattern of microbial pollution in its major rivers. We compare instream concentrations of a pollution proxy with upstream densities of the two major sources of faecal pathogens (humans and livestock) at 100 sites spanning an approximate surface area of $10^{6}$ km$^{2}$.
Faecal pathogens are difficult to measure; however, thermotolerant coliforms, which originate in faeces (i.e. faecal coliforms, FC), are easily detectable and routinely monitored as indicator organisms (Ashbolt et al., 2001). FCs are not a perfect predictor of human pathogen presence; rather, they establish connectivity between defecation and some receiving environment to which a pathogen carrier could contribute. New host-specific tracing techniques allow more precise tracking of microbial pollution sources that can help to better assess risks to human health (Harwood et al., 2014; Field and Samadpour, 2007). However, such techniques are not used within routine monitoring in India and thus do not have the spatial coverage required for our analysis. Furthermore, the use of FCs for monitoring pollution is still regarded as a viable measure of drinking and irrigation water quality (WHO, 2017).
Two key issues that must be addressed are: 1) the extent to which the FC signal that we observe reflects human sources; and 2) the potential impact of FC die-off in our pollution tracer. Upstream livestock and human population densities are strongly correlated at the catchment scale, limiting our capacity to identify the source of the pollution signal. To address this, we seek to de-correlate the predictor variables by using a mixing model to estimate contributions from each non-overlapping segment of the catchment (our sub-catchments). To address the impact of die-off in our pollution tracer we adjust the population and livestock densities using a distance decay function, then seek decay parameters that will maximise performance of our statistical model.
In the sections that follow we first introduce our null hypothesis that pollution should be linearly related to source density (both with and without accounting for die-off). We then detail our data sources and methods for their analysis, and introduce the mixing model that we use to calculate effective FC concentrations and source densities for each sub-catchment (the non-overlapping segments of the catchment).

Theory: expected relationship between FC concentration and upstream source density with and without die-off
The FC concentration ($C_{FC}$) at a given location is defined by the ratio of the FC flux ($Q_{FC}$) to the water flux ($Q_w$):

$$C_{FC} = \frac{Q_{FC}}{Q_w} \tag{1}$$

Under the assumption that there is no die-off in FCs over time, the FC flux is calculated from:

$$Q_{FC} = P_h N_h + P_a N_a = \left(P_h \rho_h + P_a \rho_a\right) A \tag{2}$$

where: $P_h$ is the production rate of FCs per human head [MPN #$^{-1}$ T$^{-1}$]; $P_a$ is the production rate per head of livestock [MPN #$^{-1}$ T$^{-1}$]; $N_h$ and $N_a$ are the total upstream populations of humans and livestock respectively [#]; $\rho_h$ and $\rho_a$ are the upstream population densities of humans and livestock respectively [# L$^{-2}$]; and $A$ is the catchment area [L$^{2}$]. Under the assumption of spatially uniform and time invariant runoff $R_w$ [L T$^{-1}$] the water flux $Q_w$ [L$^{3}$ T$^{-1}$] is calculated from:

$$Q_w = R_w A \tag{3}$$

Substituting equations (2) and (3) into equation (1) gives the following equation for FC concentration at each measurement point as a function of upstream population density:

$$C_{FC} = k_h \rho_h + k_a \rho_a \tag{4}$$

where: $k_h = P_h/R_w$ and $k_a = P_a/R_w$. It is clear from this relationship that under these assumptions $C_{FC}$ should be a linear function of upstream population and livestock density with the gradients defined by the ratio of production rate, $P$, to runoff, $R_w$.
The assumption of no FC die-off is unlikely to be true but controls on die-off remain poorly understood. Given the uncertainties, die-off is most often represented using an exponential decay based on first order kinetics (Crane and Moore, 1986; Sadeghi and Arnold, 2002; Cho et al., 2012):

$$Q_{FC}(t) = Q_0 \, e^{-k_1 (t - t_0)} \tag{5}$$

where: $Q_0$ is the FC flux at time $t_0$ (the time of exit from the gut) and $k_1$ is the first order decay coefficient [T$^{-1}$]. Assuming uniform time invariant FC velocity from source to measurement point the FC flux $Q_{FC}$ can be expressed as a function of distance:

$$Q_{FC}(x) = Q_0 \, e^{-k_1 x / v} \tag{6}$$

where: $x$ is the travel distance from source to measurement point [L] and $v$ is the characteristic velocity [L T$^{-1}$]. Changing population (of people or livestock) with distance $x$ upstream of the sampling point can be calculated as the derivative of $N(x)$:

$$\frac{dN(x)}{dx} \tag{7}$$

Assuming that FC production rates are time invariant and incorporating characteristic velocity into the decay coefficient to express decay in terms of distance, the FC flux can be calculated by combining equations (2), (6) and (7) and integrating over the range of travel distances from the measurement point to the furthest point upstream:

$$Q_{FC} = \int_0^{x_{max}} \left(P_h \frac{dN_h}{dx} + P_a \frac{dN_a}{dx}\right) e^{-kx} \, dx = \int_0^{x_{max}} \left(P_h \rho_h(x) + P_a \rho_a(x)\right) \frac{dA}{dx} \, e^{-kx} \, dx \tag{8}$$

where change in population (for both humans and livestock) and area are a function of travel distance, with $dN = \rho(x)\,dA$ for source density $\rho(x)$ at travel distance $x$; $x_{max}$ is the travel distance to the furthest point upstream; and $k = k_1/v$ is the distance decay coefficient [L$^{-1}$]. Substituting equations (3) and (8) into equation (1) gives the following equation for FC concentration:

$$C_{FC} = \frac{1}{R_w A} \int_0^{x_{max}} \left(P_h \frac{dN_h}{dx} + P_a \frac{dN_a}{dx}\right) e^{-kx} \, dx \tag{9}$$

This can be implemented in discrete form by summing over the $n$ cells upslope of the measurement point, where for each cell the flow path lengths and routes are derived from digital elevation data, and human and livestock population data from the sources described below:

$$C_{FC} = \frac{\sum_{i=1}^{n} \left(P_h \rho_{hi} + P_a \rho_{ai}\right) A_i \, e^{-k x_i}}{R_w \sum_{i=1}^{n} A_i} \tag{10}$$

where: $\rho_{hi}$ and $\rho_{ai}$ are the density of human and animal populations respectively in cell $i$; $A_i$ is the area of cell $i$; and $x_i$ is the average flowpath length from cell $i$ to the measurement point. Rearranging and simplifying equation (10) gives:

$$C_{FC} = k_h \frac{\sum_{i=1}^{n} A_i \, \rho_{hi} \, e^{-k x_i}}{\sum_{i=1}^{n} A_i} + k_a \frac{\sum_{i=1}^{n} A_i \, \rho_{ai} \, e^{-k x_i}}{\sum_{i=1}^{n} A_i} \tag{11}$$

where: $k_h = P_h/R_w$ and $k_a = P_a/R_w$. Re-arranged in this form, equation (11) shows that accounting for FC die-off, $C_{FC}$ remains a linear function of population and livestock density transformed to account for flowpath length. As in the no die-off case (equation (4)), the linear coefficients reflect the ratio of (human or livestock) production rate to runoff.
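The discrete, decay-weighted form described above can be illustrated with a minimal Python sketch. The function name, the tuple layout of each cell and the numerical values are hypothetical and chosen only for illustration; the sketch assumes the area-weighted sum implied by the text, with the ratios of production rate to runoff supplied directly as `k_h` and `k_a`.

```python
import math


def fc_concentration(cells, k_h, k_a, k=0.0):
    """Area-weighted, decay-adjusted FC concentration for one measurement point.

    cells: list of (area, human_density, livestock_density, flowpath_length)
           tuples, one per upslope cell (hypothetical input layout).
    k_h, k_a: ratios of production rate to runoff for humans and livestock.
    k: distance decay coefficient [1/length]; k = 0 recovers the no die-off case.
    """
    total_area = sum(area for area, _, _, _ in cells)
    human_term = sum(a * rho_h * math.exp(-k * x) for a, rho_h, _, x in cells)
    animal_term = sum(a * rho_a * math.exp(-k * x) for a, _, rho_a, x in cells)
    return (k_h * human_term + k_a * animal_term) / total_area


# Illustrative values only: two equal-area cells, the second 50 length
# units further upstream than the first.
cells = [(1.0, 100.0, 10.0, 0.0), (1.0, 300.0, 30.0, 50.0)]
no_decay = fc_concentration(cells, k_h=2.0, k_a=1.0)
with_decay = fc_concentration(cells, k_h=2.0, k_a=1.0, k=0.01)
```

With `k = 0` the result reduces to the linear combination of mean upstream densities; any positive `k` down-weights distant cells, so the predicted concentration can only fall.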

Methods
We used water quality samples from 100 locations across the Ganga catchment (Fig. 1), collected and analysed by six agencies following a uniform protocol. Total and faecal coliform concentrations were estimated using the standardised 9221 B and 9221 E multiple tube fermentation techniques (APHA, 1995) to establish the most probable number (MPN) of faecal coliforms per 100 ml. At each site, we collated 10 years of data (2002–2012). The frequency with which these data were sampled varies between sites, from three samples per year at the two most remote Himalayan sites, to quarterly for 24 more Himalayan sites and one or two samples per month at the remaining sites. At ~30 sites, samples were collected at two locations across the river in some years in order to improve representation. These data were quality checked for potential data entry or measurement errors. We removed a total of 63 observations where FC concentrations exceeded Total Coliform (TC) concentrations (since FC is a subset of TC). We also removed two observations at a single site on the same date where FC concentration exceeded $10^{10}$ MPN/100 ml. We consider this to be suspicious given that the concentration is ~100 times the upper end of the range of observed concentrations for sewage influent (Tchobanoglous et al., 1991). Removing suspicious observations results in a loss of <0.5% of the full dataset and <3% at any individual site. The error-checked FC data at each site were poorly approximated by a normal distribution but were generally well approximated by a log-normal distribution, thus we used geometric means to summarise FC concentration for each site throughout our analysis.
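The quality checks and the geometric-mean summary described above can be sketched as follows. This is an illustrative reading of the text, not the processing code used for the study: the function name, the input layout and the sample values are hypothetical, and the exclusion of non-positive FC values is an added assumption needed for the logarithm.

```python
import math


def summarise_site(samples, max_fc=1e10):
    """Quality check one site's samples and return (geometric_mean_fc, n_removed).

    samples: list of (fc, tc) pairs in MPN/100 ml (hypothetical input layout).
    """
    kept = [
        fc for fc, tc in samples
        # FC is a subset of TC, so FC > TC signals an error; values above
        # max_fc are treated as suspicious. Requiring fc > 0 is an added
        # assumption so that the logarithm below is defined.
        if 0 < fc <= tc and fc <= max_fc
    ]
    if not kept:
        raise ValueError("no valid samples left after quality checks")
    mean_log = sum(math.log10(fc) for fc in kept) / len(kept)
    return 10 ** mean_log, len(samples) - len(kept)


# Illustrative values only: the third sample has FC > TC and is dropped.
samples = [(100.0, 1000.0), (10000.0, 50000.0), (500.0, 200.0)]
geo_mean, n_removed = summarise_site(samples)
```

Averaging in log space is what makes the summary a geometric mean, which suits data that are approximately log-normal.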
To estimate upstream population density we used the GPWv3 gridded synthesis of census data from 2000 (Balk and Yetman, 2004; Balk et al., 2010). To estimate livestock density we used the FAO global gridded livestock density data (Wint and Robinson, 2007; Robinson et al., 2014), weighted by estimates of FC production rates for each livestock type (cow and buffalo: $10^{11}$ MPN/# day; goats and sheep: $1.2 \times 10^{10}$ MPN/# day; pigs: $1.1 \times 10^{10}$ MPN/# day; poultry: $1.4 \times 10^{8}$ MPN/# day) (ASAE Standards, 1998). Upstream area, upstream population density (UPD) and upstream livestock density (ULD) for each sample point were calculated using a D8 flow routing algorithm (O'Callaghan and Mark, 1984; Schwanghart and Scherler, 2014) and the hydrologically corrected 90 m SRTM DEM (Farr et al., 2007). To examine the influence of coliform die-off in transit, and thus relax the assumption that coliforms behave as conservative tracers, we introduced an exponential decay in coliform concentration with distance from the source. We sampled the shape parameter that defines the rate of distance decay at 500 logarithmic intervals from $10^{-8}$ to $10^{-1}$ km$^{-1}$, testing model performance in each case using ordinary least squares regression.
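The decay-parameter search described above can be sketched in Python as below. Only the sampling range and the use of ordinary least squares come from the text. As assumptions, the sketch scores each decay coefficient by the $r^2$ of a straight-line fit between log-transformed decay-adjusted density and log FC concentration; all function names, the input layout and the synthetic site data are hypothetical.

```python
import math


def log_spaced(start_exp, stop_exp, count):
    """Return `count` values spaced evenly in log10 from 10**start_exp to 10**stop_exp."""
    step = (stop_exp - start_exp) / (count - 1)
    return [10 ** (start_exp + i * step) for i in range(count)]


def r_squared(x, y):
    """Coefficient of determination for an OLS straight-line fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def adjusted_density(cells, k):
    """Area-weighted density with exponential distance decay.

    cells: list of (area, density, flowpath_km) tuples (hypothetical layout).
    """
    total_area = sum(a for a, _, _ in cells)
    return sum(a * rho * math.exp(-k * x) for a, rho, x in cells) / total_area


def sweep_decay(sites, log_fc, ks):
    """Return a list of (k, r2) pairs, one per candidate decay coefficient."""
    results = []
    for k in ks:
        log_density = [math.log10(adjusted_density(cells, k)) for cells in sites]
        results.append((k, r_squared(log_density, log_fc)))
    return results


# 500 candidate decay coefficients from 1e-8 to 1e-1 per km, as in the text.
ks = log_spaced(-8, -1, 500)

# Synthetic sites for illustration only: each has a near and a far cell.
sites = [
    [(1.0, 10.0, 5.0), (1.0, 20.0, 200.0)],
    [(1.0, 100.0, 5.0), (1.0, 300.0, 400.0)],
    [(1.0, 1000.0, 5.0), (1.0, 500.0, 800.0)],
]
log_fc = [1.0, 3.0, 5.0]
results = sweep_decay(sites, log_fc, ks)
best_k, best_r2 = max(results, key=lambda pair: pair[1])
```

Sampling in log space spreads the candidates evenly across the seven orders of magnitude, so very weak and very strong decay are examined with equal resolution.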

Mixing model
The observation locations form a nested set of catchments where 82% of observation sites have at least one observation site upstream. We deal with this nested sampling in two ways. First, by assessing the results for only non-nested (independent) catchments; however this considerably reduces the number of available observations. Second, by performing the analysis using sub-catchments, where these are defined as the part of the catchment that drains to the current sample site without first draining through any upstream sample site. The result of this definition is segmentation of the entire Ganga catchment into 100 non-overlapping sub-catchments.

Fig. 1 (caption, beginning not recovered). Sites with thick blue outlines pass Indian Government desirable standards of <500 MPN/100 ml; those with thin blue outlines pass the upper limit of <2500 MPN/100 ml (Central Pollution Control Board, 2008). Rivers are labelled in blue; cities are labelled in black, with approximate populations, in millions, in brackets and grey boxes to show approximate extent. Inset shows a location map of the Ganga catchment.
Effective source density and FC concentration are then calculated for each sub-catchment using an approach similar to that of Granger et al. (1996) and Portenga et al. (2015) for effective erosion rates in nested catchments. To do this we assume that catchment area can be used as a proxy for discharge (equation (3)) and use a mixing model to calculate the concentration of the FC input for the sub-catchment ($C_{FCr}$) given the catchment area and FC concentration at the upstream and downstream boundaries:

$$C_{FCr} = \frac{C_{FCd} A_d - \sum_{i=1}^{n} C_{FCui} A_{ui}}{A_d - \sum_{i=1}^{n} A_{ui}} \tag{12}$$

where: $C_{FCui}$ is the FC concentration at upstream boundary $i$; $C_{FCd}$ is the FC concentration at the downstream boundary of the sub-catchment; $A_{ui}$ is the catchment area for upstream boundary $i$; $A_d$ is the catchment area for the downstream boundary of the sub-catchment; and $n$ is the number of upstream boundaries. We repeat the same process to calculate the human and animal population densities ($\rho_r$) within the catchment area that drains into this sub-catchment:

$$\rho_r = \frac{\rho_d A_d - \sum_{i=1}^{n} \rho_{ui} A_{ui}}{A_d - \sum_{i=1}^{n} A_{ui}} \tag{13}$$

where: $\rho_{ui}$ is the upstream population density of upstream boundary $i$; and $\rho_d$ is the upstream population density of the downstream boundary of the sub-catchment.
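The mixing model can be sketched as an area-weighted mass balance. The following Python function is a hedged illustration of that idea, assuming (as the text does) that catchment area is a proxy for discharge; the function name, argument layout and numbers are hypothetical. The same function applies to FC concentration or to source density.

```python
def subcatchment_value(downstream, upstream):
    """Effective value (concentration or density) contributed by a sub-catchment.

    downstream: (value, catchment_area) at the downstream boundary.
    upstream: list of (value, catchment_area) pairs, one per upstream boundary.
    """
    value_d, area_d = downstream
    upstream_flux = sum(value * area for value, area in upstream)
    upstream_area = sum(area for _, area in upstream)
    local_area = area_d - upstream_area
    if local_area <= 0:
        raise ValueError("upstream areas must sum to less than the downstream area")
    # Area-weighted balance: what the local segment must contribute so that
    # the upstream inputs mix to the value observed downstream.
    return (value_d * area_d - upstream_flux) / local_area


# Illustrative values only: a site draining 100 area units with two
# upstream sites draining 40 and 10 units.
effective_fc = subcatchment_value((1000.0, 100.0), [(500.0, 40.0), (2000.0, 10.0)])
headwater_fc = subcatchment_value((300.0, 25.0), [])
```

For a headwater site with no upstream boundaries the function returns the observed value unchanged. Nothing in this sketch prevents a negative result when the upstream flux exceeds the downstream flux; how such cases were treated is not stated in the text.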

Observed pattern of FC concentrations
Our results suggest that high FC concentrations previously reported at the reach and sub-catchment scale (Mukherjee et al., 1993; Baghel et al., 2005; Mishra et al., 2009; Central Pollution Control Board, 2010) do not reflect isolated pockets of poor water quality but extensive pollution across the catchment. Decadal mean FC concentrations at sites across the Ganga catchment range from $3 \times 10^{0}$ to $2.5 \times 10^{6}$ MPN/100 ml. Of these sites, 70% fail Indian Government desirable bathing limits (Central Pollution Control Board, 2008), with those that pass located almost exclusively in the sparsely populated catchment headwaters. On the more populous plains, 70 of the 80 sites fail the desirable limits and 63 of the 80 sites fail the maximum permissible 2500 MPN/100 ml limit (Central Pollution Control Board, 2008). Locally high FC concentrations are generally associated with large population centres (Fig. 1), most markedly for rivers with smaller catchment areas (e.g. the Varuna at Varanasi). FC concentrations are moderately reduced downstream of the Yamuna-Ganga confluence as tributaries with lower FC concentrations dilute the main stem. Further downstream, even large cities (e.g. Patna) have limited influence and many samples on the main stem have very similar FC concentration, reflecting the central tendency of water quality with increasing catchment area for nested catchments.
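The comparison against bathing standards amounts to a simple threshold classification. A minimal sketch follows, using the 500 and 2500 MPN/100 ml limits quoted in the text; the function name, category labels and site values are hypothetical.

```python
DESIRABLE_LIMIT = 500.0   # MPN/100 ml, desirable bathing limit quoted in the text
MAXIMUM_LIMIT = 2500.0    # MPN/100 ml, maximum permissible limit quoted in the text


def classify_site(fc):
    """Classify a decadal mean FC concentration against the two bathing limits."""
    if fc < DESIRABLE_LIMIT:
        return "passes desirable limit"
    if fc < MAXIMUM_LIMIT:
        return "passes maximum permissible limit only"
    return "fails both limits"


# Illustrative concentrations only, not observed site values.
site_means = [3.0, 1200.0, 2.5e6]
classes = [classify_site(fc) for fc in site_means]
fraction_failing_desirable = sum(fc >= DESIRABLE_LIMIT for fc in site_means) / len(site_means)
```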

Catchment scale relationships between FC concentration and upstream source density
Since people and livestock are the primary sources of FCs, we expect FC concentration to increase with the upstream density of these sources. Fig. 2 suggests that the data fit this expectation. If FC production per capita is spatially uniform, delivery to the river is independent of population density, and if water flux is a linear function of catchment area, then we expect FC concentration to be a linear function of upstream source density with the form $y = bx$ (see equations (1)–(4) for a full derivation). Variability in delivery to the river network, or transit time through the river network, that is uncorrelated with population density will introduce scatter to the relationship but should not alter its functional form. However, comparing the data with linear contours in Fig. 2 shows that the data are not a good fit to a linear function ($r^2$ < 0.1). Power functions are a better fit ($r^2$ = 0.69 for UPD and 0.62 for ULD) but over-predict high and low FC values and under-predict central values with both UPD and ULD. Quadratic relationships offer a further improvement ($r^2$ = 0.71 for UPD and 0.68 for ULD), suggesting positive curvature in log-log space, but have a physically unreasonable negative slope at low population densities. Residuals from the quadratic function, fitted by ordinary least squares regression, for both population and livestock show some heteroscedasticity, though White (1980) and BPK (Breusch and Pagan, 1979; Koenker, 1981) tests return p-values that are always below 0.1. Given this moderate heteroscedasticity and the insensitivity of ordinary least squares coefficients to heteroscedasticity we do not pursue more complex variance weighted analyses. UPD alone explains slightly more of the variance in FC concentration than ULD, but there is little difference between the explanatory power of these predictors, and their combination in a multiple quadratic regression offers little improvement ($R^2$ = 0.71). This is consistent with the strong correlation between upstream population and livestock densities (Fig. 2c). A cubic function constrained to monotonic increase over the range of the data gives a similar performance to the quadratic ($r^2$ = 0.71 for UPD and 0.68 for ULD). A linear spline (in log-log space) with a single interior knot (i.e. piecewise power function) is the best-fit for both individual predictors ($r^2$ = 0.73 for UPD and 0.71 for ULD), suggesting a threshold rather than continuous change in power relationship between UPD and FC concentration. Finally, we test one further null hypothesis that there are two ranges of source density (population or livestock) with FC concentrations represented by their average value over each range. This model is important to exclude given the appearance of clustered points within Fig. 2 but has difficult physical implications. It implies a step change in contribution at some source density and a constant contribution independent of source density change (i.e. a declining per-head contribution) within each range. The 'step model' ($r^2$ = 0.69) does not outperform any of the curved functions (quadratic, cubic or linear spline) for UPD, though it is a slight improvement on quadratic and cubic spline functions for ULD ($r^2$ = 0.70). These results demonstrate that there is positive curvature to the FC-UPD relationship independent of the particular functional form (quadratic, cubic or linear spline) under consideration; and that the FC-ULD relationship also contains positive curvature but can be almost as well described as two FC distributions at high and low population density.
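As an illustration of how one of the candidate forms above can be fitted, the sketch below estimates a power function by ordinary least squares on log-transformed values and reports its $r^2$. It covers only the simple power function, not the quadratic, cubic, spline or step models; the function name and the synthetic data are hypothetical.

```python
import math


def fit_power_law(density, concentration):
    """Fit concentration = a * density**b by OLS in log10-log10 space.

    Returns (a, b, r_squared), where b is the gradient in log-log space.
    """
    x = [math.log10(d) for d in density]
    y = [math.log10(c) for c in concentration]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = 10 ** (mean_y - b * mean_x)
    return a, b, sxy ** 2 / (sxx * syy)


# Synthetic data generated from an exact power law, for illustration only.
density = [10.0, 100.0, 1000.0]
concentration = [2.0 * d ** 1.5 for d in density]
a, b, r2 = fit_power_law(density, concentration)
```

A fitted gradient `b` above one corresponds to concentration rising faster than source density, which is the sense in which the text describes increasing per capita impact.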

Sub-catchment relationships between FC concentration and upstream source density
As for catchment analysis, the sub-catchment analysis suggests that people and livestock are the primary sources of FCs, with FC concentration increasing with the upstream density of these sources (Fig. 3). The relationship between source density and FC concentration is not linear for either sub-catchment based or catchment based analysis. Fig. 3 shows that, as in the catchment analysis, the data are not a good fit to any linear model ($r^2$ < 0.2). Power functions are a better fit ($r^2$ = 0.54 for UPD and 0.17 for ULD) but over-predict high and low FC values and under-predict central values for UPD. For ULD the fit is very poor, suggesting that in the sub-catchment based analysis livestock density is only a weak control on FC concentration. Quadratic relationships (in log-log space) offer further improvement for UPD ($r^2$ = 0.63) but not for ULD ($r^2$ = 0.16). UPD alone explains considerably more of the variance in FC concentration than ULD. Their combination in a multiple quadratic regression offers some improvement ($R^2$ = 0.72). This reflects the reduced correlation between UPD and ULD for sub-catchment ($r^2$ = 0.66) rather than catchment ($r^2$ = 0.95) analysis (compare Fig. 2c with 3c). The linear spline with a single knot (i.e. piecewise power function) or cubic function (in log-log space) constrained to monotonic increase result in similar fits relative to a quadratic for both UPD ($r^2$ = 0.63 in both cases) and ULD ($r^2$ = 0.16 and 0.15 respectively). This suggests that there is no clear evidence for a threshold rather than continuous change in power relationship between UPD and FC concentration when examined at the sub-catchment scale. The results from these three (quadratic, cubic and linear spline) approaches demonstrate that there is positive curvature to the FC-UPD relationship independent of the particular functional form under consideration. They also demonstrate that UPD is a far better predictor than ULD for sub-catchment scale analysis and that there is some merit in considering the two in combination. This suggests that most instream FCs are human derived.

Per capita impact on instream FC concentration
Positive curvature to the FC-UPD and FC-ULD relationships indicates that FC concentration increases with upstream source density at an increasing rate per unit increase in upstream source density. This can be interpreted as the change in FC per capita with increasing upstream source density. The gradient of the line in logarithmic space reflects its exponent in linear space, thus: values >1 indicate positive curvature and increasing per capita impact, those <1 indicate negative curvature and decreasing per capita impact with increased source density. At low upstream source densities (<10 people or <6 livestock per km$^{2}$), FC concentrations are low and the gradient of all three best-fit curves is slightly less than one, indicating a slight decline in per capita impact with increasing upstream source density. At source densities from 10 to 60 people or 6–30 livestock per km$^{2}$ the gradient of all three best-fit curves reaches then exceeds unity, indicating that per capita impact reaches a minimum and begins increasing with increasing upstream source density.
For population density, quadratic, cubic and linear spline fits all predict a very similar relationship between UPD and FC concentration for $10^{2}$ < UPD < $10^{3}$ #/km$^{2}$ (Fig. 2a). Over this range the predicted FC concentration increases by three orders of magnitude (from $10^{2}$ to $10^{5}$ MPN/100 ml), indicating a 100-fold increase in per capita impact. Over the same range in population density ($10^{2}$ < UPD < $10^{3}$ #/km$^{2}$) there is considerable variability in the per capita contribution, from no change at the lower limit to a 10,000-fold increase at the upper limit.
A similar comparison can be made for individual sites, with the linear trend lines in Fig. 2a acting as contours for per capita impact. For example, moving downstream from the catchment with lowest population density, UPD increases 10-fold from Badrinath to Srinagar (7–77 #/km$^{2}$) but FC concentration increases only three-fold (3–10 MPN/100 ml), thus per capita impact declines by a factor of 3. Continuing downstream from Srinagar to Kanpur, UPD increases by a factor of 6 (77–450 #/km$^{2}$) while the FC concentration increases by a factor of 1600 (10 to $1.6 \times 10^{4}$ MPN/100 ml), thus impact per capita increases by a factor of 300. Per capita impact increases by a factor of 60,000 from its minimum for the rural Pindar catchment (B) to its maximum for the densely populated Yamuna at Delhi (A). These results indicate that urban populations contribute more sewage to the river per capita than rural populations and that this increase: 1) depends on the difference in population densities, rather than changing sharply at a particular density; 2) is large on average (a factor of 100); and 3) is highly, and asymmetrically, variable (ranging from a factor of 1–10,000).
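The per capita arithmetic in the site comparison can be reproduced with a short sketch. The densities and concentrations are those quoted in the text; the function name is hypothetical, and the calculation assumes per capita impact is simply FC concentration divided by UPD.

```python
def per_capita_change(upd_start, upd_end, fc_start, fc_end):
    """Factor by which FC concentration per unit upstream density changes."""
    return (fc_end / fc_start) / (upd_end / upd_start)


# Badrinath to Srinagar: UPD 7 to 77 #/km^2, FC 3 to 10 MPN/100 ml.
badrinath_to_srinagar = per_capita_change(7, 77, 3, 10)

# Srinagar to Kanpur: UPD 77 to 450 #/km^2, FC 10 to 1.6e4 MPN/100 ml.
srinagar_to_kanpur = per_capita_change(77, 450, 10, 1.6e4)
```

The first ratio is roughly 0.3, a decline of about a factor of 3, and the second is a little under 300, consistent with the rounded factors given in the text.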

The relative importance of human or livestock FC sources
Both UPD and ULD are good predictors of FC concentration based on catchment scale analysis. This may reflect the importance of both sources, but is also very likely due to the strong positive correlation between UPD and ULD in the catchment based analysis (Fig. 2c), which makes it difficult to distinguish between the sources based on these data alone. When calculated over large areas population and livestock density are highly correlated. However, at small scales population and livestock density can become decorrelated (e.g. in cities, where population density is high but livestock density low). Our sub-catchment based analysis breaks the catchment into smaller non-nested segments, disrupting the correlation between UPD and ULD (Fig. 3c). This analysis shows a small reduction in the percentage of variance in FC concentration explained by UPD and a large reduction in that explained by ULD. In the sub-catchment based analysis UPD is a much better predictor of FC concentration than ULD. This is consistent with simple accounting estimates of export coefficients calculated using population and livestock densities with estimated FC production rates for the loading terms and observed FC concentration as the output. Assuming a human production rate of $2 \times 10^{9}$ MPN/# day (Tchobanoglous et al., 1991) and livestock production rates detailed in the methods section, livestock-derived FC loads produced on any given day range from $2 \times 10^{10}$ MPN/km$^{2}$ day (for ULD = 3 #/km$^{2}$) to $1.5 \times 10^{13}$ MPN/km$^{2}$ day (for ULD = 200 #/km$^{2}$) while population derived FC loads range from $1.4 \times 10^{10}$ MPN/km$^{2}$ day (for UPD = 7 #/km$^{2}$) to $2 \times 10^{12}$ MPN/km$^{2}$ day (for UPD = 1000 #/km$^{2}$). Yet over this range of source densities FC concentrations increase from $2 \times 10^{0}$ to $1 \times 10^{5}$ MPN/100 ml on average. This results in export coefficients >100 times larger at high livestock and population densities than at low densities. It is difficult to conceive of a mechanism for such an increase in export coefficient for livestock-derived FCs as a function of source density.
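The accounting behind the population-derived loads can be checked with a short sketch. The production rate, densities and concentrations are those quoted in the text. As an assumption for illustration, the export coefficient is taken to be the observed concentration divided by the produced load; the function names are hypothetical.

```python
HUMAN_PRODUCTION = 2e9  # MPN per person per day, the rate assumed in the text


def human_load(upd):
    """FC load produced per km^2 per day for a given population density [#/km^2]."""
    return HUMAN_PRODUCTION * upd


def export_ratio(fc_low, load_low, fc_high, load_high):
    """Ratio of export coefficients (concentration per unit load), high versus low."""
    return (fc_high / load_high) / (fc_low / load_low)


load_low = human_load(7)      # lowest UPD quoted in the text
load_high = human_load(1000)  # highest UPD quoted in the text

# Average FC concentrations at the two ends of the range: 2e0 and 1e5 MPN/100 ml.
ratio = export_ratio(2e0, load_low, 1e5, load_high)
```

The loads reproduce the $1.4 \times 10^{10}$ and $2 \times 10^{12}$ MPN/km$^{2}$ day figures, and the resulting ratio is a few hundred, in line with the statement that export coefficients are more than 100 times larger at high densities.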

The relative importance of local or non-local FC sources
UPD is a good predictor of instream FC concentrations across the Ganga catchment, explaining 73% of the observed variance in decadal mean FC concentrations from a catchment scale analysis and 63% from a sub-catchment scale analysis (Figs. 2a and 3a). This is consistent with findings from catchments across the world (Tchobanoglous et al., 1991; Kay et al., 2008; McGrane et al., 2014), and with previous reach-scale findings in the Ganga Catchment (Mukherjee et al., 1993; Baghel et al., 2005; Mishra et al., 2009; Central Pollution Control Board, 2010). However, there remains considerable variance in FC concentration unexplained by either UPD or ULD, particularly at high population densities, >100 people/km$^{2}$ (Figs. 2 and 3). Previous reach-scale studies did not account for the upstream boundary condition either in terms of FC flux or upstream population (Mukherjee et al., 1993; Baghel et al., 2005; Mishra et al., 2009). These studies implicitly assumed that point sources proximal to sample sites dominated the FC signal (perhaps due to coliform die-off in transit). However, while many of our sites near larger settlements have high coliform concentrations, these concentrations are better explained by upstream population density ($r^2$ > 0.7) than population of the nearest settlement ($r^2$ = 0.25). Examining paired samples above and below settlements suggests that, in some cases, positive residuals (where FC concentration is greater than predicted) may reflect sites immediately downstream of population centres. However, including a distance-decay function in our analysis did not improve our ability to predict FC concentrations. Fig. 4 shows that model performance is initially stable as the rate at which FCs decay with distance increases, but that the performance is never better than that without distance decay, and that performance declines markedly for decay rates greater than 0.01%/km. This reduction in performance relates to a reduction in decay-adjusted population density, primarily at sites with intermediate or dense populations (Fig. 5). These results suggest that UPD is an important but not singular factor in defining the connectivity between sources and receiving waters that defines the timescales and thus efficiency of delivery. Our approach neglects many processes that should be important in the transport of coliforms from source to the point of measurement (e.g. weather dependent die-off rates, hydrological connectivity, hydraulics at the cross section and reach scale). However, it is encouraging that even our simple empirical model explains a large fraction of the variance in microbial pollution concentrations.

Implications of the FC-UPD relationship
The increase in per capita impact as UPD increases likely reflects an increase in the efficiency of delivery rather than FC production, perhaps due to changes in individual or corporate waste management decisions as population density increases. At low population densities, much of the population defecate in the open or in pit latrines (Census of India, 2011b) where faeces are less likely to be washed into the river and FCs are more likely to die in situ. As population density increases and towns and cities grow, the distance to open fields increases and there is a need for an alternative strategy to manage faeces. This problem has historically confronted communities across the world, leading to degradation of sanitary conditions and construction of sewers (Gandy, 2004; Allen, 2008; Benzerzour et al., 2011). Sewage systems vary in sophistication but generally involve the movement of excreta by water out of the population centre, often made possible by piped domestic water. The faeces have a much shorter residence time in the environment and FCs will be removed primarily by sewage treatment rather than die-off in the environment. In many Indian cities, the flux of sewage that is, and must be, removed from the population centre through a growing network of sewers and storm water drains is many times higher than the capacity of the sewage treatment facilities (Ansari et al., 2000). In this case the predominant impact of the sewage network is to remove the sewage from the population centre and rapidly deliver it to the river untreated. Sewage removal is essential for the public health of the city, but without effective treatment it comes at the cost of accentuated river pollution with associated public health implications for the population downstream. Here we demonstrate, as others have (Central Pollution Control Board, 2010), the severe river pollution that results. The extent to which this can be addressed by following the same trajectory towards centralised 'end-of-pipe' sewage treatment has been called into question for practical and economic reasons (Jha, 2003; Bracken et al., 2007; Katukiza, 2012). However, there is a growing range of innovative, water and energy efficient, on-site alternatives (Jha, 2003; Bracken et al., 2007; Gates Foundation, 2014) as well as a growing recognition that this is a social as well as physical or technical issue (Burra et al., 2003; Sharma and Bhide, 2005; McFarlane, 2008).
It is important to emphasise that our results do not imply that open defecation is a safe approach to sewage management.Water is not the only vector for faecal-oral disease; transmission can also occur through food, insects, and direct contact (Wagner and Lanoix, 1958).Thus safely disposing of faeces involves more than simply ensuring that they do not enter the watercourse.There is good evidence to suggest that open defecation is extremely problematic for public health and safety (Clasen et al., 2010;Mara et al., 2010;Ziegelbauer et al., 2012).

Network structure controls the spatial pattern of microbial pollution
The relationship between upstream population density and FC concentration enables a simple predictive relationship, albeit with considerable scatter. This model predicts that 33–48% of rivers in the Ganga catchment fail the Indian Government's safe bathing standards, depending on the choice of standard (Fig. 6). However, many of those rivers that pass are in sparsely populated headwaters. For 70–85% of the catchment's population, their nearest river fails safe bathing standards (Central Pollution Control Board, 2008); for 79% it should not be used for flood irrigation, irrigation of crops eaten raw or where children are involved in farming (WHO, 1989; Blumenthal et al., 2000); and for 51% it should not be used for irrigation with sprinklers (Blumenthal et al., 2000).
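The two kinds of percentage quoted above (share of the river network versus share of the population) can be illustrated with a minimal sketch. The reach values below are invented for illustration and are not data from this study; the function name `fraction_failing` and the dictionary keys are likewise hypothetical. Only the two limits (500 and 2500 MPN/100 ml) come from the text.

```python
# Illustrative only: reach values are made up, not taken from the study.
INDIAN_DESIRABLE_LIMIT = 500   # MPN/100 ml (Central Pollution Control Board, 2008)
INDIAN_UPPER_LIMIT = 2500      # MPN/100 ml (Central Pollution Control Board, 2008)


def fraction_failing(reaches, limit, weight_key):
    """Share of the total weight (river length or population) that lies on
    reaches whose predicted FC concentration is at or above the limit."""
    total = sum(reach[weight_key] for reach in reaches)
    failing = sum(reach[weight_key] for reach in reaches if reach["fc"] >= limit)
    return failing / total


# Hypothetical reaches: long, clean, sparsely populated headwaters and
# short, polluted, densely populated plains reaches.
reaches = [
    {"fc": 120, "length_km": 400, "population": 1_000_000},
    {"fc": 900, "length_km": 300, "population": 5_000_000},
    {"fc": 4000, "length_km": 200, "population": 20_000_000},
    {"fc": 30000, "length_km": 100, "population": 14_000_000},
]

for limit in (INDIAN_DESIRABLE_LIMIT, INDIAN_UPPER_LIMIT):
    by_length = fraction_failing(reaches, limit, "length_km")
    by_population = fraction_failing(reaches, limit, "population")
    print(f"limit {limit}: {by_length:.0%} of length, {by_population:.0%} of population")
```

In this toy case the population-weighted failure rate is much higher than the length-weighted one, because the reaches that pass are the sparsely populated headwaters, which is the same qualitative effect described above.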
The pattern of predicted FC concentration from this empirical model is strongly influenced by the catchment's network structure (Fig. 6). Sparsely populated Himalayan headwaters produce high discharges of clean water, suppressing FC concentrations far downstream; without this discharge, plains-fed rivers (e.g. Kali) have high FC concentrations throughout. The most polluted reach of the Ganga is predicted to be between Kanpur and Allahabad. Upstream of Kanpur the diluting effect of the headwaters persists, while downstream of Allahabad the Ganga is diluted first by the less polluted Yamuna (strongly influenced by the Chambal) and then by the large left bank tributaries with their headwaters in the Himalaya. This may be the result of not only the topology but also the geometry of the network, since the Ganga at Allahabad is at its furthest point from the mountain front, meaning cleaner Himalayan water must travel over a larger expanse of populated plain to reach that point.

Fig. 5. Scatter plots of faecal coliform (FC) concentration against upstream population density (UPD) adjusted with an exponential distance decay using a range of decay coefficients (k). Panels reflect decay rates of: a) 0%/km, b) 0.01%/km, c) 1%/km and d) 10%/km. Best model performance is for no decay (k = 0); small coefficients (k < 10⁻⁴) have little effect; larger coefficients result in a breakdown in the relationship between UPD and FC concentration.

Fig. 6. Spatial pattern of predicted coliform concentration. Dark blue areas have concentrations below 500 MPN/100 ml, the Indian Government's desirable limit for safe bathing (Central Pollution Control Board, 2008); light blue areas have concentrations below 2500 MPN/100 ml, the upper limit for safe bathing (Central Pollution Control Board, 2008). Inset shows the fraction of the river network (blue) and population (red) for which the nearest river has an FC concentration less than the x-axis value. Letters signify: (a) USA limit for safe bathing (U.S. EPA, 1976); (b) Indian government desirable limit for safe bathing (Central Pollution Control Board, 2008); (c) WHO recommended limit for flood irrigation, or for crops eaten raw, or where children are involved in farming (WHO, 1989; Blumenthal et al., 2000); (d) Indian government upper limit for safe bathing (Central Pollution Control Board, 2008); (e) WHO limit for sprinkler irrigation (Blumenthal et al., 2000). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Interventions high up the river network have the highest potential for impacting FC concentration for a given FC flux reduction because: 1) lower discharge on these rivers means that the same FC flux reduction will lead to a larger concentration reduction; and 2) rivers are directed networks (i.e. they accumulate) thus a reduction in FC flux at a given location will impact only reaches downstream of it.Decisions about what to do where are difficult and necessarily political, with many drivers (Bulkeley and Mol, 2003), but the findings of this study can help guide strategic investment in pollution reduction.
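Both points can be illustrated with a toy mass balance on a small directed network. This is a sketch under stated assumptions, not the model used in this study: FC is treated as conservative (no die-off), concentration is taken as accumulated flux divided by accumulated discharge, and the four-node network, its values, and the function names are all hypothetical, in arbitrary units.

```python
def accumulate(nodes):
    """Add each node's local discharge and FC flux to itself and to every
    node downstream of it (rivers are directed, so nothing moves upstream)."""
    totals = {name: {"q": 0.0, "flux": 0.0} for name in nodes}
    for node in nodes.values():
        current = node["name"]
        while current is not None:
            totals[current]["q"] += node["q"]
            totals[current]["flux"] += node["flux"]
            current = nodes[current]["down"]
    return totals


def concentrations(nodes):
    """Concentration at each node = accumulated flux / accumulated discharge."""
    return {name: t["flux"] / t["q"] for name, t in accumulate(nodes).items()}


def with_flux_reduction(nodes, name, amount):
    """Return a copy of the network with the local FC flux at one node reduced."""
    modified = {key: dict(value) for key, value in nodes.items()}
    modified[name]["flux"] -= amount
    return modified


# Hypothetical network: a small polluted river and a large cleaner tributary
# join at a confluence, then pass a city on the lower reach.
nodes = {
    "upper": {"name": "upper", "down": "confluence", "q": 10.0, "flux": 1000.0},
    "tributary": {"name": "tributary", "down": "confluence", "q": 90.0, "flux": 1000.0},
    "confluence": {"name": "confluence", "down": "lower", "q": 0.0, "flux": 0.0},
    "lower": {"name": "lower", "down": None, "q": 0.0, "flux": 2000.0},
}

baseline = concentrations(nodes)
upstream_fix = concentrations(with_flux_reduction(nodes, "upper", 500.0))
downstream_fix = concentrations(with_flux_reduction(nodes, "lower", 500.0))

for name in nodes:
    print(name, baseline[name], upstream_fix[name], downstream_fix[name])
```

In this toy network the same flux reduction applied at the low-discharge upper node halves the concentration there and also lowers it at the confluence and the lower reach, whereas applying it at the lower node changes only that reach.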

Conclusions
The rivers of the Ganga catchment are subject to widespread and, in places, severe microbial pollution. 52–67% of measured sites fail to meet the Indian Government's upper and desirable limits for safe bathing; and for 61–70% of the population, model results suggest that their nearest river fails these same bathing standards. The network structure of the Ganga catchment preconditions certain rivers to be highly polluted, and others (with large Himalayan headwaters) to be more robust against pollution, despite their location on the densely populated plains. The entire population upstream (not only those nearby) contribute to microbial river pollution, but urban populations contribute more pollution per capita than rural populations. How much more depends on their respective population densities. A person living in an area with 1000 people/km² contributes on average 100 times more pollution to the river than they would in an area with 100 people/km². While this is an average in the presence of considerable (asymmetric) variability, the denser population in this case contribute at least as much pollution per capita at the lower limit and up to 10,000 times more at the upper limit. Densely populated areas dominate surface water pollution in the Ganga catchment not only because they contain many people but because their faeces are more efficiently delivered to the river network. We suggest that this increasing efficiency reflects: the transmission speed of urban sewerage systems, delivering the coliforms to the river more quickly with less die-off; and the limited capacity for sewage treatment within these systems. Addressing this problem requires investment in both sewage removal and treatment, whether by increasing existing sewerage capacity or implementing decentralised treatment solutions.
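The per capita comparison above can be turned into a rough back-of-envelope calculation. The sketch below assumes, purely for illustration, that per capita contribution follows a power law in population density between the two quoted densities (100 and 1000 per km²); the regressions actually fitted in this study (quadratic, cubic and linear spline) need not take that form, and the function names are hypothetical.

```python
import math


def implied_exponent(density_low, density_high, per_capita_ratio):
    """Exponent b such that per capita contribution scales as density**b,
    given the ratio of per capita contributions at two densities."""
    return math.log(per_capita_ratio) / math.log(density_high / density_low)


def per_unit_area_ratio(density_low, density_high, exponent):
    """Ratio of pollution delivered per km2 of land: people per km2 times
    per capita contribution, i.e. density**(1 + exponent)."""
    return (density_high / density_low) ** (1 + exponent)


# Values quoted in the text: a tenfold increase in density (100 to 1000 per km2)
# goes with, on average, a hundredfold increase in per capita contribution.
b = implied_exponent(100, 1000, 100)
area_ratio = per_unit_area_ratio(100, 1000, b)
print(f"implied per capita exponent: {b:.2f}")
print(f"implied ratio of pollution per km2: {area_ratio:.0f}")
```

Under this assumption the denser area delivers more pollution per unit area both because it holds ten times as many people and because each person contributes a hundred times more, which is the distinction drawn in the text between population size and delivery efficiency.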

Fig. 1. Network graph of decadal mean FC concentrations (circle colour) and catchment area (circle size). Large red circles indicate high FC concentration and water discharge (thus high FC flux); smaller green circles indicate lower concentration and discharge (thus low FC flux). Sites with thick blue outlines pass Indian Government desirable standards of <500 MPN/100 ml; those with thin blue outlines pass the upper limit of <2500 MPN/100 ml (Central Pollution Control Board, 2008). Rivers are labelled in blue; cities are labelled in black, with approximate populations, in millions, in brackets and grey boxes to show approximate extent. Inset shows a location map of the Ganga catchment. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. Catchment scale analysis of faecal coliform concentration against: a) upstream population density; b) upstream livestock density adjusted for variable coliform production rates; c) co-variation between upstream population and livestock density. Trend lines show quadratic (solid), cubic (dotted) and linear spline (dashed) regressions for a and b, and linear regression for c. Solid circles show non-nested (i.e. independent) observations, n = 18; crosses show the full dataset, n = 100. Labelled points are: A) Yamuna catchment at Delhi; and B) Pinder catchment at Karanprayag.

Fig. 3. Sub-catchment based faecal coliform concentration against: a) upstream population density and b) upstream livestock density adjusted for variable coliform production rates; c) co-variation between upstream population and livestock density; d) predicted v observed coliform concentrations from multiple cubic regression with upstream population and livestock density. Trend lines show quadratic (solid), cubic (dotted) and linear spline (dashed) regressions for a and b, and linear regression for c. Solid circles show non-nested (i.e. independent) observations, n = 18; crosses show the full dataset, n = 100. Labelled points are: A) Yamuna catchment at Delhi; and B) Pinder catchment at Karanprayag. Contours in c show prediction surface from multiple regression.

Fig. 4. Model performance (adjusted r² for FC concentration v decay-adjusted UPD) with varying distance decay coefficient (k) for the three empirical functions fitted in Fig. 2. Best performance is always for no decay (k = 0); small coefficients (k < 10⁻⁴) have little effect; larger coefficients result in a breakdown in model performance.
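The decay-adjusted UPD referred to in Figs. 4 and 5 can be sketched as follows. The exact formulation is not given in this section, so the version below is an assumption: each upstream population cell is weighted by exp(-k × distance to the sampling site) before dividing by the total upstream area, with k in units of 1/km. The cell values and the function name `decay_adjusted_upd` are hypothetical.

```python
import math


def decay_adjusted_upd(cells, k):
    """Upstream population density with an exponential distance decay.

    cells: iterable of (population, area_km2, distance_km) for each upstream
           cell, where distance_km is the distance to the sampling site.
    k:     decay coefficient per km; k = 0 gives the unadjusted UPD.
    """
    weighted_population = sum(pop * math.exp(-k * dist) for pop, _, dist in cells)
    total_area = sum(area for _, area, _ in cells)
    return weighted_population / total_area


# Hypothetical upstream cells: (population, area in km2, distance in km).
cells = [
    (50_000, 100.0, 10.0),
    (200_000, 100.0, 300.0),
    (10_000, 300.0, 800.0),
]

# Decay coefficients roughly matching the rates in the Fig. 5 caption
# (0, 0.01, 1 and 10 %/km expressed as fractions per km).
for k in (0.0, 1e-4, 1e-2, 1e-1):
    print(f"k = {k:g} per km: adjusted UPD = {decay_adjusted_upd(cells, k):.1f} per km2")
```

With these made-up cells a coefficient of 10⁻⁴ per km changes the adjusted UPD only slightly, while larger coefficients strip out the contribution of distant populations. The finding reported in the captions, that performance is best with no decay, is consistent with the conclusion that the entire upstream population contributes to pollution at a site.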