(Not So) Gently Down The Stream: River Pollution and Health in Indonesia ∗

Waterborne diseases are the leading cause of mortality in developing countries. We emphasize a previously ignored cause of diarrhea: upstream river bathing. Using newly constructed data on upstream-downstream hydrological linkages along with village census panel data in Indonesia, we find that upstream river bathing can explain as many as 7.5% of all diarrheal deaths. Our results, which are net of avoidance behavior, show no effect of trash disposal on diarrheal infections. Furthermore, we find that individuals engage in avoidance behavior in response to trash disposal (visible pollutants) but not river bathing (invisible pollutants). We conduct policy simulations to show that targeting upstream individuals could generate substantial environmental and health savings relative to targeting downstream individuals. This provides a potential roadmap for low- and middle-income countries with limited resources for enforcement of water pollution regulations.


Introduction
Each day, economic agents face a number of harms and irritants from which they seek to protect themselves. These threats are various, including environmental (e.g., air pollution, water pollution), social (e.g., crime, conflict) and incidental (e.g., accidents) in nature. Against each of these potential harms, individuals can self-protect (e.g., drinking bottled water to protect against water pollution, wearing seat belts to protect against car accidents). To the extent that self-protection is costly (Bartik, 1988; Deschenes, Greenstone and Shapiro, 2012), individuals optimize self-protection and protect themselves to varying extents from each pollutant such that the marginal cost of avoidance is less than the marginal benefit from reduced exposure to pollutants. However, this duality between observed choice and experienced utility can break down when exposure to certain pollutants is not salient, resulting in low avoidance behavior or defensive underinvestment, even when such behavior can be welfare improving. This is of particular importance in developing countries, where individuals routinely underinvest in profitable and health-saving technologies (Ashraf, Berry and Shapiro, 2010; Bryan, Chowdhury and Mobarak, 2014; Greenstone and Jack, 2015; Barrett, Garg and McBride, 2016). In such a context, an understanding of the nature and extent of externalities as a result of these "silent killers" can increase the efficacy of regulatory interventions designed to promote public health and environmental improvements.
In this paper, we consider the case of river pollution and resulting waterborne diseases in Indonesia. We focus on diarrhea, which globally accounts for more than 1.5 million deaths each year (WHO, 2014). Freshwater pollution is of particular importance in low- and middle-income countries where untreated river water is routinely consumed, in part due to the low enforcement of policies intended to prevent contamination. 1 Earlier work in the area of freshwater pollution has focused primarily on industrial waste disposal in rivers (Ebenstein, 2012). We emphasize a previously ignored and seemingly benign source of water pollution, in-river bathing, and show that it has a large welfare cost. We present three major findings: (1) upstream river bathing in Indonesia can explain as many as 7.5% of all diarrheal deaths in a given year, which over our four-year sample translates to 865 diarrheal fatalities, (2) individuals exhibit avoidance behavior in response to upstream trash disposal (visible) but not to upstream river bathing (invisible) and (3) targeting upstream villages can reduce diarrheal mortality by 57% more than would targeting downstream villages.
The challenge in causally estimating the impacts of river pollution on public health is finding variation in river pollution that is exogenous to local health outcomes, and also large enough to be economically meaningful. While researchers have previously employed randomized designs in subsidies for provision of clean water (Ahuja, Kremer and Zwane, 2010; Kremer et al., 2011), to the best of our knowledge no one has used experimental or quasi-experimental variation to study the impact of specific river pollutants on local health outcomes. 2 We fill this void in the literature by constructing a novel data set of drainage basins in Indonesia to assign to each of the villages in our sample their respective set of upstream and downstream villages from approximately 5.8 billion possible upstream-downstream hydrological linkages.
1 Greenstone and Hanna (2014) find that while air pollution regulations had a measurable impact on infant mortality in India, water pollution laws had no measurable effect.
2 The closest study to our work is Ebenstein (2012), who uses rainfall as an instrument for water quality (and not polluting behavior or actual pollutant concentrations) at certain sites in China to study the impact of poor water quality on digestive cancers.
Using biennial village census data from 2000 to 2008 and employing village fixed effects, we rely on the identifying assumption that year-to-year changes in upstream polluting behavior are exogenous to downstream health outcomes, which is plausible for two reasons. First, household-level in-river polluting behavior in Indonesia is de facto unregulated, minimizing the regulatory factors that could drive household pollution selectively towards low-income villages. Second, we use data on upstream polluting behavior rather than data on local pollution levels (Lipscomb and Mobarak, 2016) or official quality grades (Ebenstein, 2012).
As a result, we are not relying on correlating local pollution with local health outcomes, which could be spurious for many reasons including but not limited to geographic (Tiebout) sorting. Instead, we rely on upstream behavior that is plausibly exogenous to downstream health outcomes.
We test the validity of our identifying assumption and rule out geographic sorting through a battery of placebo and falsification tests and find that for a given village, bathing by upstream villages increases diarrheal incidence, while bathing by downstream villages has no effect. Furthermore, we show that the effect is specific to diarrhea (consistent with its waterborne nature), with no measurable effect on other, non-waterborne diseases. The absence of an effect on other diseases is inconsistent with the existence of a spurious correlation between upstream polluting and downstream health, as we would expect such a correlation to manifest as a significant association between upstream bathing and at least one of the other major diseases as well. 3 Importantly, while alternative explanations could exist for each of our results, we believe there is no plausible alternative explanation that would rationalize all of our empirical findings.
3 The remaining threat to identification would be if individuals geographically sort across upstream and downstream villages within a province in our time frame in response to time-varying omitted variables correlated with upstream polluting behavior and diarrheal outbreaks, but not upstream polluting behavior and outbreaks of malaria, measles, respiratory infections or dengue. Given that in Indonesia, downstream (coastal) villages tend to be economically better off than upstream villages, this seems very unlikely.
Our results are relevant for policymakers interested in reducing mortality from waterborne diseases. Given the limited enforcement of water pollution regulations in most low- and middle-income nations, we identify sources of pollution that individuals fail to avoid and, crucially, where enforcement resources will be most effective. When large-scale government programs aimed at river basin cleanups are financially or technically infeasible, policymakers could enact cost-effective prevention policies that could substantially reduce diarrheal incidence; we show that a 1% reduction in the most upstream decile's river bathing activity reduces downstream diarrheal incidence by 2.54%.
The policy implications of our work are underscored by contributions to several relevant research areas in economics. First, we build on the literature emphasizing the estimation of causal impacts of environmental quality on human health, particularly in developing countries (see Graff  for an exhaustive review). The construction of all riparian linkages between villages using a combination of geospatial and hydrological techniques enables us to overcome previously identified data limitations (Currie et al., 2014), and extend the research on the linkage between environmental quality and human health to freshwater pollution. Our data construction and identification method can generalize to other settings to understand downstream externalities when the path of the pollutant determines the marginal social cost of pollution.
Second, we are unaware of any paper that isolates pollutants differentially on the basis of avoidance behavior, even though such compensatory behavior has been documented in several instances (Deschenes, Greenstone and Shapiro, 2012).
Third, we relate to a growing literature on the political economy of environmental and natural resources (Burgess et al., 2012; Brollo et al., 2013). Our finding that targeting individuals in upstream areas can generate much larger health savings than targeting individuals in downstream areas is consistent with other work in this literature, such as Lipscomb and Mobarak (2016), who show that within governing jurisdictions, water quality is lowest in the downstream areas.
The rest of the paper is organized as follows. In section 2 we provide an overview of water pollution regulation in Indonesia and the epidemiological evidence on the link between river bathing and diarrhea. In section 3 we describe the health and demographic data used in this paper, as well as the construction of the upstream-downstream village networks. Section 4 details the econometric strategy that we use and in section 5 we discuss the empirical results. In section 6, we simulate different targeted moratoriums on river bathing and associated impacts on health outcomes. Section 7 provides our concluding notes.

Background
In this section, we provide a brief overview of the state of water pollution and associated regulations in Indonesia to demonstrate that river pollution, particularly originating from households, is de facto unregulated. We also provide evidence of the epidemiological foundations on the impact of river bathing on water quality.

Water Pollution Regulation in Indonesia
Indonesia has made recent advances in environmental regulation, including the 2009 Environmental Protection and Management Law that recognizes the "serious problem" of decreasing environmental quality, as well as executive actions designed to reduce emissions and other forms of pollution (Nachmany et al., 2014). Yet, the regulation of water pollution in Indonesia can be characterized as nominally mandated but not regulated for some industries, and fully non-existent for others. Ostensibly, any individual or business that purposely pollutes or otherwise damages water sources can face imprisonment for up to 9 years and a maximum fine of 1.5 billion rupiah (USD 115,000), in accordance with Article 94 of Indonesia's Law found that all 35 rivers that were tested across Indonesia were unsafe sources for drinking water (AECEN, 2008). 4 The most unregulated source of water pollution in Indonesia is household and municipal discarding of sewage. Households routinely dispose of waste directly into rivers, while the improper construction of municipal wastewater facilities leads to the disposal of untreated sewage into river waters (Kerstens et al., 2013). Nearly two-thirds of the Citarum River's biological oxygen demand (BOD) comes from household pollution, as compared to one-third from all industrial and agricultural activities combined (Kerstens et al., 2013). Regulation of water pollution at the household level is non-existent, with households polluting into lakes and rivers with de facto impunity.
Water pollution in Indonesia is also generated from industrial waste and agricultural run-off (GWP, 2013). Industrial polluting causes toxic materials such as heavy metals and mercury to enter and poison drinking water sources. The 1989 Clean Water Program (CWP), a government initiative to curb water pollution, achieved spotty reductions in industrial pollutants with disproportionate success in East Java (Lucas and Djati, 2000). The mixed success of the CWP may be attributable to its enforceability, as the program was designed to be voluntary (Bedner, 2010) and water pollution regulations across Indonesia generally do not apply to small firms and home industries (Braadbaart, 1995). 5

Implications of River Bathing
River bathing poses two major risks to human health, both of which are associated with diarrheal symptoms. First, riparian bathing increases the amount of free carbon dioxide (CO2) and decreases the amount of dissolved oxygen (DO) in rivers (Bhatnagar and Sangwan, 2009; Sharma, Bhadula and Joshi, 2012). Organic and biodegradable waste from the bathers is decomposed by microbes that use oxygen and release carbon dioxide back into river water. This effect is amplified by the use of soaps and detergents that are absorbed by aquatic flora. Higher CO2 levels drive phosphate and alkalinity concentrations, which lead to river eutrophication. Consumption of water from eutrophic rivers has been linked to gastroenteritis (WHO, 2002) and cyanobacterial toxins (Scott et al., 1985; Wu, 1999), both of which can cause symptoms of intestinal pain, nausea, and diarrhea. Second, river bathing can lead to an increased presence of coliform bacteria, which are associated with harmful pathogens known to cause nausea, vomiting, and bloody diarrhea, especially among infants and those with compromised immune systems (Joshi and Sati, 2011; Tyagi et al., 2013). While fecal coliforms do not necessarily cause diarrhea, their presence is correlated with diarrhea pathogens that may arise from removing trace amounts of fecal matter from the body during bathing, the cleaning of infants directly in the river after defecation, and from bathers (particularly infants) defecating while bathing.

Data
In this section, we describe the health and demographic data, the geospatial methods used to conduct the upstream and downstream assignment of villages along Indonesia's hydrological network, and the classification of each village into an identifiable drainage basin.

Health and Demographic Data
The Indonesian statistical agency, Badan Pusat Statistik (BPS), conducts a biennial census of all Indonesian villages known as Podes. Our sample consists of an unbalanced panel of 32,107 villages 6 over four census years (2000, 2003, 2006 and 2008) spread across all major Indonesian islands with the exception of Java. 7 The census is conducted in a short span of 4-6 weeks in October and November and consists of an exhaustive questionnaire to which village heads respond. 8 The census contains village-level information on health, population, river location and other demographic variables.

Disease Data
The Podes is the most spatially and temporally expansive dataset on outbreaks and resulting deaths from five major diseases: dengue, diarrhea, malaria, measles and respiratory infections. In each year of the census, the village head is asked to report if there was an outbreak of each of the different infections in that year. To the best of our knowledge, the village head is not provided any instructions, such as a cut-off or a point of comparison with respect to deaths or infections, in determining whether an outbreak has taken place. As such, it is feasible that there were disease-driven deaths in a given year that were unreported in Podes because the degree of spread was not determined "high" enough to be considered an outbreak. Therefore, we validate the accuracy of this 'outbreak' measure. We use actual mortality data from Podes that is only available for 4,681 village-years. 9 In appendix table A.1, we show the outcomes using log mean death rate per epidemic using the 'outbreak' variable and the log of actual death rate and show that the results are qualitatively and quantitatively similar. With this validation, we defer to using the outbreak variable since we have a larger number of observations over a more geographically diverse area. Use of the outbreak variable generates increased power for better and more nuanced estimates of the effect of upstream bathing on downstream health outcomes.
8 We are aware that some components of this information are verified at the sub-district or district offices but we do not have information on the precise sections of the survey that are verified.
9 We have 19,933 village-years, of which only 4,681 village-years have more than one year of data and provide useful variation for our panel data approach.
Table 1 provides summary statistics on the probability of village-level outbreaks for different diseases. Diarrhea, following only malaria, is the second most prevalent disease in Indonesia, followed closely by respiratory infections. Diarrhea is slightly more prevalent in hilly areas relative to flatter ones, and in rural areas relative to urban settings. This geographic pattern of disease prevalence is consistent across all diseases with the exception of dengue, which is more prevalent in urban areas. The differences between these groups are statistically significant but not meaningfully large (columns 2 and 3).

Population and Demographic Data
We also obtain population and demographic data from Podes. In addition to village population information, the census also contains information on whether a river passes through a village, which we use to ground truth the hydrological river network data. Podes also contains information on a range of socio-economic variables that we use for robustness in our econometric strategy: dominant source of income in village, geography of village, quality of governance (e.g. education of village head), access to medical facilities in the village and political status of the village.
The dominant form of trash disposal (e.g. carry away, burnt, river polluting, other) and bathing activity (e.g. in-river, other) is reported in Podes for each village. These are binary variables and the census does not contain information on the number of individuals in each category. As an alternative, we use village populations to construct our key explanatory variables: the number of individuals polluting in the river through trash disposal and in-river bathing.
To create these variables, we replicate the following exercise for both trash disposal and river bathing for both upstream and downstream aggregate measures, which we illustrate by considering our key independent variable, upstream river bathing. For a given year $t$ and village $v$, with $n_{vt}$ villages upstream along the river that passes through village $v$, we define our key independent variable, the number of individuals engaging in upstream river bathing, as
$$\text{Upstream}_{vt} = \sum_{n=1}^{n_{vt}} \text{population}_{nt} \times \text{bathing}_{nt},$$
where $\text{population}_{nt}$ is the population of the $n$-th upstream village in year $t$ and $\text{bathing}_{nt}$ is a binary variable, which is equal to 1 if the majority of households in the village bathe in the river in year $t$, and 0 otherwise. We repeat this exercise for all downstream villages.
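The construction described above can be illustrated with a minimal Python sketch. The function name `upstream_exposure` and the example numbers are hypothetical and chosen only for illustration; the sketch assumes each upstream village is summarized by its population and the binary majority-practice indicator reported in Podes.

```python
from typing import Iterable, Tuple


def upstream_exposure(upstream_villages: Iterable[Tuple[int, bool]]) -> int:
    """Sum the populations of upstream villages whose majority practice is in-river.

    Each item is (population, majority_in_river). The boolean plays the role of
    the binary indicator (bathing or trash disposal) for that village-year.
    """
    return sum(pop for pop, in_river in upstream_villages if in_river)


# Hypothetical upstream villages for one village-year (illustrative numbers only).
example = [(1200, True), (800, False), (2500, True)]
print(upstream_exposure(example))  # 3700
```

The same function would apply unchanged to the downstream aggregate or to trash disposal, since only the list of linked villages and the indicator differ.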
It is important to note that our independent variables could be either over- or understated. Including all individuals as river bathers, where only the majority of individuals engage in river bathing, likely overstates the number of individuals bathing in rivers. However, excluding any individuals as river bathers in villages where less than a majority of individuals bathe in rivers understates the number of individuals bathing in rivers. The concern of unpredictable measurement error in our key independent variables is addressed in our placebo tests in table 7. 10 Since we construct our upstream and downstream variables in the same way, any bias should be present in both, and given that the downstream effects are negligible we are confident that this is not a source of bias in our results. Cautiously, we may interpret our results as the differential impact of upstream bathing health effects net of downstream bathing health effects. We find equivalence in the two interpretations due to the approximate null effects of downstream river bathing. 11

Construction of Drainage Basins Data and Assignment of Upstream and Downstream Villages
Linking villages along a hydrological network enables us to track the impact of upstream river bathers on downstream river users. Indonesia contains seven major islands with relatively mountainous and high-elevation interiors that create a complex hydrological network of streams and rivers. The official river network for the country fails to identify minor waterways that are being used by villages for bathing, drinking and trash disposal. Conducting a classic hydrological network analysis poses the risk of failing to assign villages located along minor rivers to the river network, which may understate upstream pollutant runoff (figure 2).
Instead of tracing the hydrological network directly, we adopt a watershed approach that identifies all upstream-downstream relationships within each basin using a high resolution digital elevation model (DEM), and then determine on-river status using survey data.
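The idea of deriving upstream-downstream relationships from elevation alone can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the GRASS GIS routines the paper reports using: it applies a simple steepest-descent ("D8"-style) rule on a small grid, ignores void filling and canopy interference, and treats each village as a single cell. The names `d8_downstream` and `upstream_villages` are hypothetical.

```python
import math


def d8_downstream(dem):
    """Map each cell (row, col) to its steepest-descent neighbour, or None if it has no lower neighbour."""
    rows, cols = len(dem), len(dem[0])
    down = {}
    for r in range(rows):
        for c in range(cols):
            best, best_drop = None, 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                        # Elevation drop per unit distance; diagonal neighbours are sqrt(2) away.
                        drop = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                        if drop > best_drop:
                            best, best_drop = (rr, cc), drop
            down[(r, c)] = best
    return down


def upstream_villages(dem, village_cells):
    """Return {village: set of villages whose flow path passes through that village's cell}."""
    down = d8_downstream(dem)
    cell_to_village = {cell: name for name, cell in village_cells.items()}
    upstream = {name: set() for name in village_cells}
    for name, cell in village_cells.items():
        nxt = down[cell]
        # Flow always moves strictly downhill, so this walk terminates.
        while nxt is not None:
            if nxt in cell_to_village:
                upstream[cell_to_village[nxt]].add(name)
            nxt = down[nxt]
    return upstream


# Hypothetical one-row terrain sloping from left to right, with three villages.
dem = [[5.0, 4.0, 3.0, 2.0, 1.0]]
villages = {"A": (0, 0), "B": (0, 2), "C": (0, 4)}
print(upstream_villages(dem, villages))
```

In this toy terrain village A drains through B and then C, so B has A upstream and C has both A and B upstream.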
Proprietary approaches to processing such a DEM are less adept at managing canopy interference, where the presence of tree canopy is mistaken for terrain, which could also lead to an underestimation of the number of upstream villages connected topographically to a given downstream village. The problem of canopy interference is compounded by the approximately 5.8 billion possible village relationships across Indonesia. 12 A high-resolution 30 meter void-filled DEM was used alongside village administrative boundaries from Podes. A pour point was then constructed for each individual village that self-reported being located on a river. 13 The use of self-reported locations in this way provides a means to "ground truth" our data to avoid misclassifying on-river (type 1 error) and off-river (type 2 error) villages. The mapping of all upstream and downstream village relationships was conducted independently for each sample year (2000, 2003, 2006 and 2008) to accommodate the redistricting of administrative units. 14 The product of the hydrological analysis was a list of every Indonesian village and its ordered upstream counterparts across the four sample years (approximately 13.7 million upstream observations). The comprehensive nature of the GIS output enables us to (1)
11 Another way to characterize our key independent variable described in equation (4) is as a scalar multiple of the population weighted average of the number of villages where a plurality of households engage in polluting behavior, with zero weight being assigned to villages where the dominant source of bathing (or trash disposal) is not in the river.
12 We manage canopy interference and computational processing constraints by developing a clustered implementation of the r.watershed and r.water.outlet algorithms in GRASS GIS v7.
13 Each pour point identified the village's maximum upstream catchment, which is bound by its drainage basin.
14 Three sets of verifications were conducted. First, the GIS open source algorithms were compared against the ESRI algorithms. Second, the flow accumulations were consistent with the official Indonesian River Network. Third, the construction of drainage basins was verified by ensuring the official rivers network flowed properly through each basin ensuring that the constructed basins were logically sound.

Estimation and Identification Strategy
The challenge in identifying the effects of water polluting behavior on human health is finding exogenous variation in water pollution that is large enough to capture an economically measurable effect. There are many plausible reasons why exposure to, and consumption of, impure water may be determined endogenously. For instance, poorer individuals who have a lower stock of health may be financially or behaviorally constrained from consuming clean water. Instead of correlating local water pollution with local health outcomes, we focus on the diarrheal incidence in a given village due to individuals who are geographically separated but whose (unregulated or unenforced) polluting behavior may affect downstream villages through river networks. Relying on the identifying assumption that year-to-year changes in upstream polluting behavior are exogenous to downstream diarrheal incidence, we estimate a linear probability model:
While focusing on geographically-separated polluting behavior can avert some endogeneity concerns, to the extent that individuals could geographically sort over time with wealthier individuals ending up in villages with cleaner water, the coefficients on upstream polluting, $\beta_{1j}$, would remain biased. Since we use province-year fixed effects and thereby control for all province-specific changes over time, we will focus on such geographic sorting within a province over time. To overcome these concerns, we test the validity of our identifying assumption with a battery of placebo tests. First, upstream pollution could have an effect on downstream individuals' polluting behavior, but downstream polluting should not have a direct effect on upstream individuals' health. Second, we estimate the effect of polluting behavior on diseases that are not transmitted through ingestion of contaminated water, such as measles, malaria, respiratory infections, and dengue. If we are estimating a spurious geography-health correlation instead of the causal effect of upstream bathing on downstream diarrheal incidence, then we should also see an association with these other diseases that are not waterborne. The absence of such effects would support the identifying assumption. Third, we add a range of time-varying control variables for changing demographic and poverty characteristics.
15 We validate the use of the binary outbreak variable as opposed to data on death rates in the appendix (
As detailed in the results section, each of these tests supports our identifying assumption and thereby increases our confidence that we are estimating the causal effect of polluting behavior on diarrheal incidence. Following equation (2), we also estimate avoidance behavior. In particular we test whether individuals reduce consumption of drinking water from the river in response to upstream polluting. (3) where H(·) is an indicator function equal to 1 if most people in that village drink water from the river.
Three additional econometric issues bear noting. First, there could be potential concerns over the choice of our estimator. We find that our estimates are robust to different choices of estimator. In particular, we use the fixed effects logit estimator and find that our results are qualitatively similar, as reported in appendix table A.2. We also find that fewer than 0.4% of our observations (433 out of 108,991) had predicted values outside the [0,1] range, suggesting that fit is not a concern with the use of a linear probability model. Second, we cluster our standard errors at the drainage basin level to allow for errors to be correlated across villages along the same river segment and over time. Given that the pollutants accumulate along a river segment, clustering at the drainage basin level allows for conservative inference on the effects of upstream polluting behavior. Third, given different populations across villages, we show in appendix table A.1 that our results are robust to population-weighted generalized least squares (GLS) estimation.
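The role of village fixed effects in a linear probability model can be illustrated with a minimal sketch. It is a deliberate simplification and not the paper's specification: it uses a single regressor, omits province-year fixed effects and additional controls, and computes no clustered standard errors. The function name `within_estimator` and the panel values are hypothetical.

```python
from collections import defaultdict


def within_estimator(rows):
    """OLS slope of y on x after demeaning both by village (the fixed-effects 'within' transformation).

    rows: iterable of (village_id, x, y), where y is a 0/1 outbreak indicator
    and x is the upstream polluting measure for that village-year.
    """
    groups = defaultdict(list)
    for village_id, x, y in rows:
        groups[village_id].append((x, y))

    sxy = sxx = 0.0
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    if sxx == 0:
        raise ValueError("no within-village variation in x")
    return sxy / sxx


# Hypothetical panel: x in units of 100,000 upstream bathers, y = outbreak indicator.
panel = [
    ("v1", 0.0, 0), ("v1", 1.0, 1),
    ("v2", 2.0, 0), ("v2", 4.0, 1),
    ("v3", 1.0, 0), ("v3", 1.0, 0),  # no within-village change: contributes nothing
]
print(within_estimator(panel))
```

Because only deviations from each village's own mean enter the estimate, time-invariant village characteristics cannot drive the slope, which is the sense in which identification comes from year-to-year changes.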

Results
We investigate the health effects of two kinds of non-industrial riparian polluting behaviors: bathing and trash disposal. We estimate the effects of these different polluting behaviors simultaneously using equation (2) and find strong evidence that upstream bathing causes increased diarrheal incidence. Specifically, we find that a one sample standard deviation increase in the number of people bathing upstream from a village (182,940 individuals) increases the probability of diarrheal outbreak in that village by 4.59 percentage points (table 3, column 2). Using the within-sample average of 18.29% diarrheal incidence, this corresponds to a 25.10% effect. 16 The result is stable to the choice of specification (appendix table A.2) and limited to villages where the primary source of water is the river (table 3, columns 2-3). The magnitude of the effect of upstream bathing on downstream diarrheal incidence is considerably larger when limiting our sample to only those villages where the primary source of drinking water is the river. Cumulatively yet conservatively, we estimate that upstream bathing can explain 865 deaths, which is 7.5% of all diarrheal deaths in our sample, suggesting that there is a large human cost to seemingly benign polluting behavior.
16 We report the results per 100,000 individuals, which corresponds to a 2.52 percentage point effect or 13.7% effect.
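The conversion from percentage points to a relative effect can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted in the text; the variable names are illustrative.

```python
coef_pp_per_sd = 4.59     # percentage-point effect of a one-SD increase (table 3, column 2)
sd_individuals = 182_940  # one sample standard deviation of upstream bathers
baseline_pp = 18.29       # within-sample average diarrheal incidence, in percent

# Relative effect: percentage-point change divided by baseline incidence.
relative_effect = 100 * coef_pp_per_sd / baseline_pp

# Rescale the same coefficient to an increase of 100,000 upstream bathers.
per_100k_pp = coef_pp_per_sd * 100_000 / sd_individuals

print(round(relative_effect, 1))  # 25.1
print(round(per_100k_pp, 2))      # 2.51
```

Rescaling the rounded coefficient gives about 2.51 percentage points per 100,000 individuals, slightly below the 2.52 reported in the paper's footnote; the gap presumably reflects rounding of the published coefficient.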
Notably, we find no evidence of an impact of upstream trash disposal on downstream diarrheal incidence. It is important to note that these are equilibrium effects, net of avoidance behavior specific to the form of upstream polluting behavior. As a result, the absence of effects of in-river trash disposal could be evidence of avoidance behavior. Later in this section we will separately estimate the avoidance behavior in response to upstream polluting behavior.

Avoidance Behavior
The contrast between the effects of upstream bathing and upstream trash disposal is of interest. Since the effects in table 3 are net of avoidance behavior, these results suggest that downstream populations exhibit avoidance behavior in response to trash and not to pollutants entering the river through bathing in it. While we cannot explicitly or directly test the motivations behind this gap in avoidance behavior between the different sources of pollution, we hypothesize two plausible explanations. First, trash disposal in the river may pose a higher health risk than in-river bathing, and as such individuals are more active in avoiding the risk arising from the former. Yet, given the large health cost of upstream bathing, this seems less likely. Instead we favor the second explanation, that trash disposal results in pollutants that are visible to the naked eye, in contrast to impurities generated from bathing that are less or not visible to individuals. This explanation is consistent with our results in table 4 that individuals stop drinking water from the river in response to upstream trash polluting but not in response to upstream bathing.
We can quantify the extent to which individuals engage in avoidance behavior in response to upstream polluting behavior (table 4). We find some evidence of avoidance behavior in response to upstream in-river trash disposal, but no evidence for avoidance behavior with respect to river bathing. Specifically, a one standard deviation (37,853 individuals) increase in the number of individuals disposing of trash upstream reduces river water consumption by 0.65 percentage points (p-value = 0.06), which corresponds to a 5% effect. 17 While we cannot provide specific behavioral or structural explanations for this gap in avoidance behavior, combined with the evidence on the net effects from table 3, we cautiously conclude that individuals exhibit greater avoidance behavior with respect to visible pollutants such as trash than with respect to less visible pollutants arising from bathing, making the latter potentially more lethal. This is consistent with a negligible net effect of upstream trash disposal on downstream diarrheal incidence.

Topographic and Geographic Factors
Next we isolate effects based on different geographies to deduce whether (1) the results are consistent with the topography of villages (flat versus hilly), which would affect the rate of water flow, and (2) there are systematic differences in the effects of upstream bathing across urban and rural villages in our sample. In table 5 we provide results breaking down the effect of upstream bathing by topography (columns 2-3) and by urbanization (columns 4-5). Consistent with our intuition, we find that the bulk of the human health effect is concentrated in flat rather than hilly villages due to the propensity for eutrophication in stagnant water (Jiménez, 2006). By contrast, we find no evidence of differential impacts in rural versus urban locations, suggesting that avoidance behavior may not be region-specific.
We also address the concern that pollution dissipates over long distances along the river network. In the main specification, the $Upstream_{vt}$ indicator gives equal weight to each individual in a village who partakes in river bathing. However, bathing activity in more distant villages may have little to no impact on disease outcomes if the contaminants dissipate over the course of the river. Given the method of riparian network construction discussed in Section 3, we can address this mechanism by weighting individuals in closer villages more heavily than those who are farther away. To do this, we calculate an alternative Upstream (and Downstream) measure for bathing and trash behavior in which the contribution of each village $n$ is weighted by $f(distance_{nt})$, where $distance_{nt}$ is the distance (in meters) of the centroid of village $n$ from village $v$. The function $f(\cdot)$ weights the relative distance of the villages with the property that $f' \leq 0$.
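As a rough illustration of the distance weighting described above, the following Python sketch computes a weighted upstream bathing exposure for a single village. The exponential form of the weight, the `half_distance_m` parameter, and the village figures are hypothetical choices made only for this example; the text requires only that the weight be non-increasing in distance.

```python
def exponential_decay(distance_m, half_distance_m=50_000.0):
    """Hypothetical weight f(d): equals 1 at d = 0 and halves every half_distance_m meters."""
    return 0.5 ** (distance_m / half_distance_m)


def weighted_upstream_exposure(upstream_villages, weight=exponential_decay):
    """Sum upstream bathers, down-weighting villages that are farther away.

    upstream_villages: iterable of (bathers, distance_in_meters) pairs.
    """
    return sum(bathers * weight(dist) for bathers, dist in upstream_villages)


# Hypothetical upstream villages: (number of bathers, distance in meters).
villages = [(1_000, 0.0), (2_000, 50_000.0), (4_000, 100_000.0)]

unweighted = sum(bathers for bathers, _ in villages)  # equal weight per bather
weighted = weighted_upstream_exposure(villages)       # distant bathers count for less

print(unweighted, weighted)
```

With equal weights the exposure is the simple head count; with the decaying weight, the two more distant villages contribute only as much as the nearest one despite having more bathers.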

Placebo Tests
In this section we provide the results from a battery of placebo and falsification tests to demonstrate the strength of our identification strategy, and consequently the validity of our identifying assumption. Table 7 shows that the effect we find is (1) specific to diarrheal incidence and (2) specific to upstream polluting behavior. As shown in the table, we find no evidence of an effect on diarrheal incidence in a given village in response to downstream bathing. Not only is the effect of downstream bathing statistically indistinguishable from zero, it is also an order of magnitude smaller than the effect of upstream polluting. Therefore, concerns over drainage basin specific, time-varying factors that are correlated with both polluting behavior and diarrheal outcomes may be overstated.
Additionally, we follow Garg (2015) and provide placebo tests on other diseases (table 7, columns 2-5). Any spurious correlation between downstream health (driven by demographic or political economy factors) and upstream polluting behavior should also appear when predicting the impact of upstream pollutants on outbreaks of other diseases. For instance, if individuals were geographically sorting over time in response to poor health conditions (but not to upstream bathing), then we should expect to see a correlation with the incidence of at least one of the other diseases. The absence of any noticeable or meaningful effect on any of the other diseases adds support to the validity of our identification strategy.
We test for non-linearities in appendix table A.3 and find no evidence to support a non-linear relationship between exposure to upstream river bathing and downstream diarrheal incidence.

Proxy for River Access
Another possible threat to identification is that the river bathing metric is acting as a proxy for river access. Villages in which the river is more accessible (e.g. not flowing through a deep gorge) may promote increased river use for bathing as well as for other purposes. While the placebo tests presented in table 7 cast doubt upon the presence of a non-waterborne transmission mechanism, they do not rule out other pollution sources besides bathing, such as agricultural run-off, that may be driving the main result.
We address this by using a question from Podes regarding the village's river use. The question provides a binary indicator for whether the river is used for a host of economic activities: agriculture, industry, transportation, and "other" miscellaneous activities. We replicate the calculation from equation (2) using these respective indicators in place of the river bathing indicator. In this way, we focus on the total upstream population with substantial, unfettered river access. Table 8 presents the results of using these accessibility-to-river population indicators as predictors of diarrheal outbreak. None of these activities alone (columns 1-3) has significant predictive power for the probability of a downstream diarrheal outbreak. This absence of a significant relationship between upstream river accessibility and downstream human health outcomes remains when considering all river access indicators jointly (columns 4-5). Even with the inclusion of trash dumping in column 6 we find no significant effects, and the coefficient estimates are an order of magnitude smaller than the estimated effect of river bathing. As column 6 should display any predictive power related to river access but not to river bathing, we can be cautiously optimistic that the effect we find in table 3 is driven by upstream river pollution related to bathing activities and not by river access in general.

Policy Simulations
The impact of upstream bathing on downstream health suggests that policy responses to river pollution should be targeted with consideration to geography. Therefore, we conduct a set of policy simulations by imposing increasingly stringent moratoriums on river bathing. We show that the geography of targeting is essential to cost-effective policy. We categorize all sample villages into deciles based on total downriver population. Villages located near a river's headwaters with a large downstream population are grouped into the first decile while the most downstream villages are grouped into the tenth decile (table 9). Targeting upstream villages generates the largest benefit: targeting the most upstream villages is an order of magnitude more effective than targeting downstream villages. Specifically, avoiding a single diarrheal death requires preventing 971,000 individuals in the most downstream decile from bathing but only 82,000 individuals in the most upstream decile.
19 Simulations are cumulative across all four sample years (2000, 2003, 2006 and 2008) and the exposure variable (upstream bathing population) is updated by recalculating the sample mean with a moratorium imposed on villages that fall under the targeting rule. Epidemics are predicted using the rate of 0.0252 per 100,000 upstream individuals and deaths are predicted using the sample average of 0.58 diarrhea-related fatalities occurring within a village per diarrhea outbreak. Elasticity measures are calculated as the percentage change in diarrhea-related deaths divided by the percentage change in bathing individuals for each policy increment. Note that the marginal elasticities are not forced to be decreasing as they are calculated for incremental adjustments to the policy rather than as percentage changes from the no-regulation baseline (which would be necessarily decreasing in magnitude).
Our findings are therefore consistent with recent work on the political economy of water pollution (Lipscomb and Mobarak, 2016).
The baseline case, which most closely resembles the current state of affairs in Indonesia, has no regulation on river bathing activity. Population deciles with the largest downstream populations are then targeted incrementally until a complete moratorium on river bathing is achieved. Column 2 in table 9 shows the number of individuals bathing in the river in each decile. 20 In our two extreme cases, the absence of regulation on river bathing allows the 865 deaths attributable to river bathing to persist while a strict moratorium on river bathing prevents all of these deaths (table 9, column 3).
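The prediction rule behind these simulations can be sketched in a few lines of Python. The rate of 0.0252 additional epidemics per 100,000 upstream bathers and the average of 0.58 deaths per outbreak are the figures reported for the simulations; the function name and the example exposure of 200,000 upstream bathers are assumptions made purely for illustration, and the sketch ignores the recalculation of sample means under each targeting rule.

```python
EPIDEMIC_RATE_PER_100K = 0.0252  # change in outbreak probability per 100,000 upstream bathers
DEATHS_PER_OUTBREAK = 0.58       # sample-average diarrheal deaths per outbreak


def predicted_deaths(upstream_bathers,
                     rate_per_100k=EPIDEMIC_RATE_PER_100K,
                     deaths_per_outbreak=DEATHS_PER_OUTBREAK):
    """Expected diarrheal deaths in one village-year attributable to upstream bathing."""
    outbreak_probability = rate_per_100k * upstream_bathers / 100_000
    return outbreak_probability * deaths_per_outbreak


# Hypothetical village with 200,000 upstream bathers.
baseline = predicted_deaths(200_000)   # no regulation
full_moratorium = predicted_deaths(0)  # all upstream bathing removed

print(baseline, full_moratorium)
```

Summing such predictions over villages and years under a given targeting rule, and comparing the total with the no-regulation baseline, gives the deaths avoided by that rule.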
However, the number of deaths avoided by a decile-level moratorium is not comparable across deciles, which have varying numbers of bathers. We generate two measures that allow us to compare moratoriums on different deciles: the average and marginal number of individuals who must stop bathing to avoid a single instance of diarrheal mortality (table 9, columns 3 and 4 respectively). These calculations are akin to the average and marginal costs of the policies per unit of benefit (figure 3). Columns (3) and (4) show that a policymaker interested in reducing diarrheal deaths would have to inconvenience (or compensate) the fewest individuals per avoided death in the most upstream decile: 82,000 individuals who bathe in a river, versus 971,000 individuals in the most downstream decile.
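The distinction between the average and marginal measures can be illustrated with a short Python sketch. The decile figures below are hypothetical and are not taken from table 9; the function simply accumulates bathers and avoided deaths as successive deciles are placed under a moratorium.

```python
def bathers_per_avoided_death(bathers_by_decile, deaths_avoided_by_decile):
    """Return a list of (average, marginal) bathers per avoided death.

    Deciles are added to the moratorium in the order given. The average uses
    cumulative totals over all deciles under moratorium so far; the marginal
    uses only the decile added at that step.
    """
    results = []
    cumulative_bathers = 0.0
    cumulative_deaths = 0.0
    for bathers, deaths in zip(bathers_by_decile, deaths_avoided_by_decile):
        cumulative_bathers += bathers
        cumulative_deaths += deaths
        average = cumulative_bathers / cumulative_deaths
        marginal = bathers / deaths
        results.append((average, marginal))
    return results


# Hypothetical three-step policy, most upstream decile first.
bathers = [100_000, 300_000, 600_000]
deaths_avoided = [2.0, 3.0, 1.0]

steps = bathers_per_avoided_death(bathers, deaths_avoided)
for average, marginal in steps:
    print(round(average), round(marginal))
```

In this toy example the marginal figure rises faster than the average as less effective deciles are added, which is the pattern a policymaker would weigh when deciding how far downstream to extend a moratorium.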
Conversely, in columns 5 and 6 (table 9), we develop an elasticity measure that shows reductions in mortality from a 1% reduction in decile-specific river bathers (figure 4). Reducing top-decile bathers by 1% reduces marginal downstream diarrhea-related mortalities by 2.54% but reducing the lowest-decile bathers by 1% reduces marginal mortalities by only 1.61%.
20 Deciles are constructed based on total population, not bathing-in-river populations.
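The elasticity measure, defined as the percentage change in diarrhea-related deaths divided by the percentage change in bathing individuals for a policy increment, can be written as a small Python function. The numbers in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def incremental_elasticity(bathers_before, bathers_after, deaths_before, deaths_after):
    """Percentage change in deaths divided by percentage change in bathers.

    Both changes are measured relative to the state just before the policy
    increment, not relative to the no-regulation baseline.
    """
    pct_change_bathers = (bathers_after - bathers_before) / bathers_before
    pct_change_deaths = (deaths_after - deaths_before) / deaths_before
    return pct_change_deaths / pct_change_bathers


# Hypothetical increment: bathers fall by 10% and predicted deaths fall by 20%.
elasticity = incremental_elasticity(
    bathers_before=1_000_000, bathers_after=900_000,
    deaths_before=100.0, deaths_after=80.0,
)
print(elasticity)
```

An elasticity above one, as in this example, means that deaths fall proportionally more than bathing does for that increment.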

Conclusion
In this paper, we construct and employ a novel data set on Indonesia's drainage basins to provide the first causal evidence that household-level polluting behavior, and in particular upstream in-river bathing, generates large downstream health externalities. Our results have particular relevance for policymakers for several reasons. First, we uncover a previously ignored source of household-level pollution: in-river bathing. We find that upstream river bathing can explain as many as 865 deaths over four years, representing 7.5% of all diarrheal deaths in our sample. This represents a large human cost from a source of river pollution that the literature has almost entirely ignored. Second, we find that targeting based on the geographic location of the source of pollution can result in substantial health savings. In particular, a 1% decrease in in-river bathers in the most upstream decile reduces fatalities by 2.54%. By contrast, a 1% decrease in in-river bathers in the most downstream decile reduces fatalities by only 1.62%. Third, we find suggestive evidence that individuals exhibit avoidance behavior in response to visible but not invisible pollutants. If salience drives a wedge between the marginal benefit and marginal cost of avoidance behavior, investment in prevention of these "silent" killers may yield considerable health savings. Instead of large-scale government programs aimed at river basin cleanup, which may be financially infeasible, policymakers could attempt to enact policies aimed at preventing polluting behavior. Future work could explore the health impacts of avoidance behavior in response to previously understudied point and non-point sources of pollution.
The mean incidences of the different diseases are statistically different from each other using Fisher's LSD as well as the more conservative Tukey's test.
Cluster robust standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01. All specifications include village and province-year fixed effects.
Standard errors are clustered at the river basin levels. All specifications include additional controls for total village population, total upstream population and total downstream population.
Cluster robust standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01. All specifications include village and province-year fixed effects. Standard errors are clustered at the river basin levels. The specifications include additional controls for total village population, total upstream population and total downstream population. The sample is limited to villages that self-report proximity to a river.
Cluster robust standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01. All specifications include village and province-year fixed effects. Standard errors are clustered at the river basin levels. All specifications include additional controls for total village population, total upstream population, total downstream population, dominant source of income in village, geography of village, quality of governance (education of village head), access to medical facilities in the village and political status of the village.
Simulations are cumulative across all four sample years (2000, 2003, 2006 and 2008). Each 100,000 upstream bathing individuals increase epidemic rates by 0.0251 while each epidemic yields 0.58 deaths, which is the sample average for all on-river villages across all years (excluding Java). The targeting rule deciles are based on total downstream population. For example, the top decile includes only the top 10% of villages with the largest downstream populations. Column 1 shows that increasingly stringent moratoriums on river bathing increase the number of bathing individuals affected, which corresponds directly with a reduction in predicted mortality displayed in column 2. Columns 3 and 4 capture the number of bathers that must be removed to prevent a death, where column 3 is the average effect across all deciles under the moratorium and column 4 is the marginal effect of the most downstream moratorium decile. Columns 5 and 6 present estimates of the elasticity of placing a moratorium on river bathing.
Column 5 captures the change in mortality per 1% decrease in bathers, averaged over the deciles under the moratorium. Column 6 presents the change in mortality per 1% decrease in bathers for the marginal decile placed under moratorium. Here, elasticities are generally decreasing, reflecting the impact of targeting the most upstream and most detrimental individuals. However, because the policy rule is based on cumulative downstream population along the river, not the downstream bathing population, it is possible for the marginal elasticity measure to increase with the addition of a new decile.

A.1 Validation of Outbreak Variable
In this section, we validate the outbreak variable with data on deaths. We do not use deaths in our primary specification since we have mortality data for only 4,681 village-years that have more than one time period of data. In practice we have over 19,000 village-year observations, but over 15,000 villages have just a single year of data on deaths. In table A.1 we estimate two sets of regressions. In column 1 we use, as the dependent variable, the log of the interaction of the outbreak variable with the ratio of average diarrheal deaths per unit population. This is equivalent to computing the effect on diarrheal outbreaks and then transforming the outbreak variable into the log of the mortality rate. In column 2 we use the log of the actual death rate. The coefficient of interest and the corresponding placebo coefficient are qualitatively the same across the two columns. To the extent that columns 1 and 2 are comparable, by using outbreak data instead of deaths data, we are likely underestimating the impact of upstream bathing on downstream health outcomes.
Cluster robust standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01. All specifications include village and province-year fixed effects. Standard errors are clustered at the river basin levels. The sample is limited to villages that self-report proximity to a river. Column (1) is defined as the log of the average number of diarrheal deaths divided by the average population interacted with the indicator variable denoting an outbreak in a village in a given year. Column (2) is the log of the death rate. Both regressions are weighted by population to account for differential populations across villages and to compute the average effect per person rather than per village.

A.2 Robustness to Choice of Estimator
In this section, we validate the choice of estimator. Column 1 presents the coefficients of a linear probability model (LPM) predicting a diarrheal epidemic in a village in a given year. Using an LPM generates only 0.3% of predicted values that are outside the [0,1] range. Column 2 performs a similar estimation using a panel logit model instead of an LPM. The results of this estimation are qualitatively similar to the LPM regression, and maintain both the sign and the level of significance for the bathing estimator.
Cluster robust standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01. All specifications include village and province-year fixed effects. Standard errors are clustered at the river basin levels. Both specifications include additional controls for total village population, total upstream population and total downstream population. Column (1) presents estimation using an OLS linear probability model, with approximately 0.3% of predicted values lying outside the [0,1] range. Column (2) presents the results of the fixed effects logit model. The R-squared presented in the table is the pseudo-R2 calculated by the logit estimation. The number of observations using the FE logit model is smaller than in column (1) because villages with all-positive or all-negative outcomes across all 4 years of the panel are dropped. The fixed effects logit model does not allow for the computation of the marginal effect at the mean value for the variables of interest; for a more detailed discussion, see Kitazawa (2012).
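The diagnostic of counting LPM fitted values that fall outside the [0,1] range can be illustrated with a toy Python example. This is only a sketch under strong simplifications: it uses a single regressor, no fixed effects or controls, and made-up data, so it does not reproduce the estimation in this appendix.

```python
def ols_fit(x, y):
    """Ordinary least squares with one regressor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def share_outside_unit_interval(x, y):
    """Share of linear-probability fitted values below 0 or above 1."""
    intercept, slope = ols_fit(x, y)
    fitted = [intercept + slope * xi for xi in x]
    outside = sum(1 for p in fitted if p < 0.0 or p > 1.0)
    return outside / len(fitted)


# Made-up data: a binary outcome that switches on at higher exposure levels.
exposure = list(range(10))
outbreak = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

share = share_outside_unit_interval(exposure, outbreak)
print(share)
```

A small share, such as the 0.3% reported for the main specification, suggests that the linear approximation rarely produces nonsensical probabilities; the toy data here are deliberately extreme and give a much larger share.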

A.3 Testing for Non-Linearities
In this section, we test whether there exists a non-linear relationship between upstream bathing behavior and diarrheal incidence in downstream villages. Column 1 presents the main result of the paper using the OLS estimator, and is identical to the corresponding column in table 3. Column 2 applies a quadratic fit to the bathing estimator, and finds no significant relationship between diarrheal outbreak and the square of upstream bathing populations. The third column runs an OLS regression using the log of upstream bathing values. This monotonic transform of the explanatory variable yields results that are qualitatively similar to the main specification, and although the estimate is an order of magnitude smaller, it still indicates a positive and significant relationship. Thus we find no evidence to support a non-linear relationship between exposure to upstream river bathing and downstream diarrheal incidence.
Cluster robust standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01. All specifications include village and province-year fixed effects. Standard errors are clustered at the river basin levels. All specifications include additional controls for total village population, total upstream population and total downstream population. The sample is limited to villages that self-report proximity to a river. The mean number of upstream bathing households is 54,840.