Community matters: Heterogeneous impacts of a sanitation intervention

Highlights
• The main pillar of the Government of Nigeria (GoN)'s 'National Strategy for Scaling up Sanitation' reduces open defecation in low-wealth areas.
• Our results can be used by the GoN to target the Community-Led Total Sanitation (CLTS) intervention more effectively.
• We show that these findings may also be useful for the targeting of CLTS beyond the Nigerian context.
• Different or complementary approaches are needed for areas characterised by higher aggregate wealth.


Introduction
Public health interventions are often promoted as drivers of behavioral change and catalysts in the investment and adoption of health technologies. The introduction of comprehensive water and sanitation programs in developed nations in the early 1900s has been dubbed the most effective public health intervention of the last century (Alsan & Goldin, 2019), partly because of the significant cost of poor sanitation and the disease environment it creates. These costs include negative impacts on child health (Alzúa, Djebbari, & Pickering, 2020;Augsburg & Rodriguez-Lesmes, 2018;Baird, Hicks, Kremer, & Miguel, 2016), morbidity (Prüss-Ustün et al., 2014), human capital other than health (Adukia, 2017;Spears & Lamba, 2016;Orgill-Meyer & Pattanayak, 2020), and psycho-social stress (Sahoo et al., 2015). Such programs can provide the foundation for economic growth. Currently 3.6 billion people still lack access to safely managed sanitation (WHO/UNICEF, 2021), most of whom live in low and middle income countries (LMICs). Improving access to sanitation has thus been recognized as a key goal towards sustainable development by the United Nations.
However, the effectiveness of public health interventions, including sanitation ones, in LMICs is little understood, especially when implemented at scale (Banerjee et al., 2017;Dupas, 2011;Cohen & Dupas, 2010). We address this knowledge gap by evaluating a participatory community-level intervention, Community-Led Total Sanitation (CLTS), which aims at improving access to safe sanitation and has been rolled out by more than 25 governments around the world (Zuin et al., 2019). CLTS entails community meetings and the provision of information with the aim of eradicating open defecation (OD) by triggering collective behavioral change and encouraging communities to construct and use toilets. While the approach has been evaluated in the past, divergent and inconclusive results suggest that the circumstances under which CLTS is an effective tool to improve sanitation in LMICs are not well understood (Radin, Jeuland, Wang, & Whittington, 2020), preventing efficient targeting (Brown, Albert, & Whittington, 2019).
In this paper, we use a cluster randomized controlled trial (RCT) to study the effectiveness of CLTS implemented at scale in Nigeria. CLTS is the main pillar of the Government of Nigeria's 'National Strategy for Scaling up Sanitation and Hygiene'. The study comprises 247 study clusters of rural communities. We randomly selected 20 households per cluster for interview at baseline in 2014 and conducted three follow-up surveys 8, 24 and 32 months after CLTS implementation to study its short- and longer-run impacts on OD and sanitation investments. Low attrition allows us to work with a balanced panel of more than 4,500 households, located in the Nigerian states of Ekiti and Enugu.
Nigeria is a particularly apt context to study sanitation policies. Accounting for about half of West Africa's population, it is a key player in the region. Yet, the country is facing massive developmental challenges, evidenced for example by its ranking of 152 out of 157 countries in the World Bank's 2018 Human Capital Index (World Bank, 2019). In 2020, almost 60% of its 206 million strong population had no (18.7%), only limited (19.5%) or unimproved (19.1%) access to sanitation, and these rates have only marginally improved since the early 2000s. In consequence, Nigeria is a top contributor to the share of the global population without access to adequate sanitation (WHO/UNICEF, 2021).
We show that, on average, CLTS led to small and temporary reductions in OD among households in treated communities, i.e. a reduction of 3 percentage points (ppt) that was not sustained 32 months after the baseline. However, these estimates hide strongly varying impacts across population subgroups: we find that intervention impacts are considerably stronger among, and restricted to, the asset-poorest half of the studied communities. In these less wealthy communities, households constructed toilets, leading to a 9 ppt drop in OD, three times as much as on average. This effect was first measured after 8 months, and was sustained over the whole 32-month study period. It is substantial and comparable to interventions that also include a financial component. The average impacts of sanitation interventions that combine CLTS with subsidies or financial incentives for toilet construction range between 9 and 19 ppts (Guiteras, Levinsohn, & Mobarak, 2015;Patil et al., 2014;Andres, Deb, Joseph, Larenas, & Grabinsky Zabludovsky, 2020). In addition, our results for CLTS in less wealthy communities are comparable to the impact of the provision of sanitation-tagged microcredit in India (Augsburg et al., 2023), which resulted in a 10 ppt reduction of OD.
The finding that CLTS interacts with community characteristics links our study to the theory of geographical poverty traps, which posits that different neighborhood endowments (of physical and human capital) may lead to different outcomes for otherwise identical households. Jalan and Ravallion (2002) argue that, in consequence, interventions have a trajectory that depends on the neighborhood even when all household characteristics are accounted for. They further argue that targeting of geographical areas may be easier and more effective given unconstrained mobility and limited data to identify poor households. We show (in the spirit of Ravallion & Wodon (1999)) that CLTS induces wealthier and less wealthy households to invest in sanitation when they live in less wealthy communities, while we do not find CLTS impacts on asset-poor or asset-rich households living in wealthier communities. We further show that community wealth encompasses a number of distinctive community characteristics that are already known to correlate positively with sanitation investments (such as pre-treatment differences in toilet ownership, measures of social cohesion, or leader characteristics). None of these characteristics could independently account for the strong impacts of CLTS in less wealthy communities, but they may interact to produce them. This leaves us with one very robust predictor of CLTS effectiveness: community wealth.
We use this result to propose a simple strategy to target CLTS. Measures of wealth are readily available in standard household surveys. Alternatively, wealth proxies such as nightlight intensity indices can be obtained from open access satellite data. The Government of Nigeria can use these data to develop a national targeting strategy. We illustrate this based on the 2013 Nigerian Demographic and Health Survey (DHS), and show that the data can replicate the classification of geographical areas into less wealthy and (modestly) wealthier ones with a precision that is similar to the (more detailed) measures obtained from our primary study data.
Finally, we conduct an exploratory analysis to gain insights into whether community, or area, wealth could also be used as a basis for targeting CLTS implementation in other contexts. We pool microdata from Indonesia, Nigeria and Tanzania and estimate average and heterogeneous CLTS impacts on toilet ownership along nightlight intensity, a globally available proxy for community wealth that is comparable across contexts. We find an inverse relationship between community wealth and CLTS impacts, further supporting the conjecture that area-level wealth is a plausible underlying factor of CLTS program effectiveness beyond our Nigerian RCT. The results rationalize the wide range of CLTS impacts documented in the literature.
In summary, our findings allow us to draw policy-relevant conclusions regarding a wider rollout (Muralidharan & Niehaus, 2017), especially since they are based on implementation at scale. CLTS can be an effective policy tool if appropriately targeted. We also show it closes the sanitation gap between less and more wealthy communities. Yet, despite its popularity and wide implementation, CLTS needs to be complemented with other interventions to close the large sanitation gap remaining overall, as more than a third of households in wealthier communities and more than half of households in less wealthy communities continue to openly defecate after implementation of the intervention.
The remainder of the paper is structured as follows. The next section describes the intervention. Section 3 presents the experimental design and Section 4 describes the data collection and balance in the randomisation. Section 5 presents the empirical method and Section 6 our impact estimates. Section 7 lays out our proposed targeting strategy and compares the results of our study with those of other CLTS interventions. Section 8 concludes.

The intervention
Our study focuses on two of the three states where the international non-governmental organization (NGO) WaterAid worked closely with local governments and local NGOs in implementing CLTS with a view to improving toilet coverage and reducing open defecation. In particular, WaterAid Nigeria and two local NGOs active in the study states Ekiti and Enugu 1 trained local government authority (LGA) 2 staff in water, sanitation and hygiene (WASH) units, which are part of Nigeria's public service. WASH officials then took responsibility for CLTS delivery and implementation. In the context of the evaluation study, mobilization and triggering activities took place over six months, between January and June 2015.
The implementation followed the three-step CLTS approach. First, community leaders are engaged in a discussion about the negative health implications of OD, 3 as well as the potential benefits of CLTS in achieving behavioral change within their communities. Community leaders then arrange a community meeting, the so-called 'triggering meeting', the main component of CLTS. The meeting starts once, and only if, a significant number of community members have gathered in a predefined public space on the appointed day. The first activity is a community mapping exercise, in which each attending community member marks their household's location and toilet ownership status on a stylized map on the ground. Community members next identify and mark regular OD sites. In many cases, facilitators follow up with graphic images showing that the community lives in an environment contaminated by feces.
1 The 2014 National Nutrition and Health Survey revealed that just 29% of households in Enugu and 46% of households in Ekiti had access to improved sanitation. This resulted in rates of OD of 51% and 44% respectively, according to the same survey.
2 LGAs are Nigeria's second administrative sub-division, led by a local government
Facilitators of the meetings further use the map to trace the community's contamination paths of human feces into water supplies and food. 4 As a closing task, attendees are asked to draw up a community action plan to achieve community-level open-defecation-free (ODF) status to foster collective action and collaboration. The action plan includes discussions of how vulnerable or less wealthy households can be supported to achieve ODF status. It is posted in a public spot. Volunteers (so-called 'natural leaders') are chosen to follow up regularly on each attendee's commitment towards implementing the plan. They hence carry the main responsibility for follow-up, but without any formal authority to push for action. CLTS facilitators were asked to conduct one follow-up visit to communities, checking in with natural leaders on communities' advances to become open defecation free. Eventually, the community can be certified for its achievements by the national Rural Water Supply and Sanitation Agency (RUASSA) and the National Task Group on Sanitation (NTGS).
CLTS does not offer any monetary incentives, subsidies or credit to finance toilet construction or reward OD reductions or ODF achievement. It also does not provide technical assistance or hardware nor does it promote a particular toilet technology. CLTS is expected to drive a change in sanitation practices purely by altering the perceived costs of unsafe sanitation and the perceived benefits of toilet use.

Research design
The research design is a two-stage cluster randomized controlled trial. In the first stage, the unit of interest is communities, in line with CLTS being a community-level intervention. In the second stage, the unit of interest is residents of these communities. Our sample is representative of the study area, consisting of 246 communities in nine LGAs in Ekiti and Enugu, which contrasts with sanitation evaluations that often focus on households with children as, for example, in Cameron, Olivia, and Shah (2019) and Pickering, Djebbari, Lopez, Coulibaly, and Alzua (2015).
Communities (clusters) in this study do not match Nigeria's administrative units. Rather, they were defined closely with local implementing partners to capture adequate implementation clusters and reduce information spillover, i.e. as self-contained units so that information about triggering activities would not spread to the next cluster, for example via shared markets or large public areas. To further safeguard the validity of the stable unit treatment value assumption (SUTVA), 'buffer' areas were introduced to ensure that no two clusters were located in close geographic proximity. A community comprises on average 1.7 villages or quarters, 5 where CLTS was implemented at the same time.
In total, 246 communities were randomized with equal probability into either receiving CLTS (treatment) or not receiving it during the course of the study (control). Randomization was stratified by LGA. 6 The distribution of treatment and control clusters is presented in Online Appendix Table A1 and the location of study communities is indicated in Online Appendix Fig. A1.
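The stratified assignment described above can be illustrated with a minimal sketch. It assumes clusters are shuffled within each LGA and split in half, with a coin flip deciding where the odd cluster goes. The function name, the cluster labels and the handling of odd-sized strata are illustrative assumptions, not the procedure actually used in the study.

```python
import random
from collections import defaultdict


def stratified_assignment(clusters, seed=0):
    """Assign clusters to treatment (1) or control (0) within each stratum.

    `clusters` is a list of (cluster_id, stratum) pairs; the stratum stands
    in for the LGA. Within each stratum half of the clusters are treated.
    If the stratum has an odd number of clusters, a coin flip decides
    whether the extra cluster is treated (an assumption of this sketch).
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for cluster_id, stratum in clusters:
        by_stratum[stratum].append(cluster_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])
        rng.shuffle(ids)
        n_treated = len(ids) // 2
        if len(ids) % 2 == 1 and rng.random() < 0.5:
            n_treated += 1
        for position, cluster_id in enumerate(ids):
            assignment[cluster_id] = 1 if position < n_treated else 0
    return assignment


# Hypothetical example: three strata with made-up cluster counts.
example_clusters = [
    (f"{lga}-{k}", lga)
    for lga, n in [("LGA_A", 6), ("LGA_B", 4), ("LGA_C", 5)]
    for k in range(n)
]
example_assignment = stratified_assignment(example_clusters, seed=42)
```

Stratifying in this way keeps the treatment share close to one half inside every LGA, which is why LGA fixed effects are a natural control in the later regressions.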
Within these communities, we conducted a resident census during October 2014. The census covered basic information from 50,333 households (27,888 in Enugu and 22,445 in Ekiti) in the participating LGAs, and served as our household sampling frame. More details on the research design are provided in Abramovsky, Augsburg, and Oteiza (2015).

Data
We collected panel information on communities and households within these communities at four points in time over a period of 32 months. 7 The baseline survey, administered to 4,540 households, took place during December 2014 and January 2015.
To follow the behavior of community residents over time, three follow-up surveys were conducted: after 8 months (FU1: Dec 2015 to Feb 2016), 24 months (FU2: March to April 2017) and 32 months (FU3 or 'endline survey': Nov 2017 to Jan 2018); see timeline in Online Appendix Fig. A2. The three post-intervention surveys allow us to study the sustainability of CLTS impacts over time up to about three years. Household attrition rates over the three follow-up survey rounds are low: 2.53% at FU1, 8.81% at FU2 and 11.58% at FU3. There is no differential attrition across experimental groups (see bottom panel of Table 1 and Online Appendix C).
4 Other activities may be added at the discretion of the facilitator. Examples include medical expense calculations related to illnesses likely induced by OD practices; transect walks through the community (often referred to as 'walks of shame'), pointing out visible feces in the environment to evoke disgust and shame; or graphic exercises, where facilitators might add feces to drinking water, illustrating that these are not necessarily visible to the naked eye. In the context of our study, about 20% of triggering meetings included at least one such additional exercise. For example, 14% of triggering meetings included graphic illustration and 7% conducted expense calculations.
5 Village is the term used in the state of Enugu and quarter in Ekiti. The median and modal number of villages or quarters within a cluster are both 1. The maximum number of villages in a cluster is 7, occurring only once.
6 Study LGAs in Enugu are Igbo Eze North, Igbo Eze South, Nkanu East and Udenu. In Ekiti, Ido Osi, Ikole, Moba, Irepodun Ifelodun and Ekiti South West are part of the study.
7 Data collection was carried out by an independent data collection company, blinded to treatment status. The survey instruments were developed jointly with local implementing partners knowledgeable about the study areas, were piloted in the field and were further adjusted after interviewer training. These processes were followed to ensure respondents would understand, and feel comfortable with, all questions asked.
8 In an F-test of joint significance of all characteristics, we reject the null hypothesis at the 5% level (p-value = 0.038). Yet, once we remove household size, the explanatory power of the remaining variables falls markedly (p-value = 0.27), supporting the validity of our randomization strategy across all other dimensions of household and community measures. Abramovsky et al. (2015) report balancedness of our randomization on an even wider set of characteristics.
Households in study communities are typically headed by a male (64.5%) with at least primary education (67.5%), who is employed (78.1%), mostly in farming (47.1%) and on average 56 years old. Households consist of 4 members on average, and almost a third have at least one child under the age of 6 years (Table 1). These characteristics are balanced across experimental arms along a set of 22 indicators, except for a small (0.27) difference in the number of household members, which we hence include as a covariate throughout our analysis. 8 Almost two thirds of households (62.8%) have at least one member above the age of 4 years defecating in the open. A similar percentage of main respondents (62.4%) report to openly defecate themselves. 9 Prevalence of open defecation in our study population is closely aligned to that of our study states Ekiti and Enugu, as reported in the 2015 Nigeria Malaria Indicator Survey (NMIS) at 64.75%. In line with these behavioral measures, only 36.9% of households own a toilet, 36.1% own a functioning toilet of any type, and 32.4% own a functioning and improved toilet at baseline; 98% of households that have a toilet use it at baseline. These toilet ownership measures, which integrate results from interviewer inspections, capture different dimensions of interest. The first records ownership, but ignores functionality. The second (functional toilet ownership) additionally captures whether maintenance investments into the existing stock of toilets are made. The third measure (functional improved toilet) accounts for quality beyond functionality, satisfying the stricter criteria set by the WHO/UNICEF Joint Monitoring Program regarding improved sanitation. Characteristics of the study communities in which these households reside are shown in the third panel of Table 1. They include an aggregated indicator of households' wealth to capture community wealth.
The index is measured as the first factor of a principal component analysis based on a series of questions regarding asset ownership. 10 While its numerical value is not meaningful in itself, comparing it with the distribution of community wealth across Nigeria using DHS 2013 data (see Online Appendix F for details), we find that communities in our sample are typically located towards the middle (4th to 7th deciles) of the Nigerian wealth distribution, rather than in the tails. Hence, our sample communities are neither very asset-rich nor very asset-poor in terms of wealth relative to the Nigerian distribution. Our analysis will particularly focus on heterogeneity in CLTS effectiveness by this community-level wealth measure, as well as an alternative proxy, pre-intervention nightlight intensity within a 5 km radius. 11 We find that average night light intensity in our study area is very low with a mean of 2 relative to the global night light range of 0 to 63. The next set of community characteristics listed in Table 1 relate to social interactions within the community, suggested as accelerators of the effectiveness of CLTS (see, for example, Cameron et al. (2019)): a community's level of (i) trust, (ii) community participation, and (iii) religious fragmentation. Trust is the average community score of the degree to which its members trust their neighbors. Community participation is constructed similarly, based on households' participation in community events. Religious fragmentation is adapted from measures used in studies of ethnolinguistic fragmentation (ELF), as our study sample is homogeneous along ethnic lines but very diverse in terms of religion. Finally, mean toilet ownership rate per community is 36.2%. Detailed definitions of these measures and their distribution can be found in Online Appendix Section B.2.
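To make the construction of the wealth index concrete, the sketch below computes first-principal-component scores from a small matrix of asset-ownership indicators, using power iteration on the correlation matrix. The asset data, the function name and the use of power iteration are illustrative assumptions; the study's actual asset list and factor loadings are those reported in its Online Appendix. The sketch also assumes every asset varies across households and that assets are positively correlated.

```python
import math


def first_component_scores(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    `rows` holds one list of asset indicators per household. Columns are
    standardized, the leading eigenvector of their correlation matrix is
    found by power iteration, and scores are rescaled to mean 0 and
    standard deviation 1.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
           for j in range(k)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

    # Correlation matrix of the standardized asset indicators.
    corr = [[sum(zr[a] * zr[b] for zr in z) / n for b in range(k)]
            for a in range(k)]

    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # orient the component so that more assets mean a higher score
        v = [-x for x in v]

    raw = [sum(zr[j] * v[j] for j in range(k)) for zr in z]
    mean_raw = sum(raw) / n
    sd_raw = math.sqrt(sum((s - mean_raw) ** 2 for s in raw) / n)
    return v, [(s - mean_raw) / sd_raw for s in raw]


# Hypothetical data: six households, three binary asset indicators.
assets = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0], [1, 1, 1]]
loadings, scores = first_component_scores(assets)
```

As the text notes, the numerical value of such a score has no meaning on its own; only the ranking of households or communities along it is informative.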

Estimation approach
Following successful randomization evidenced in Table 1, we estimate the impact of CLTS on our primary outcome, open defecation practices, using an intent-to-treat (ITT) design based on cluster randomized assignment to treatment. 12 We compare open defecation practices $y_{ict}$ in household $i$ living in community (cluster) $c$ in period $t$ by treatment assignment:

\[ y_{ict} = \alpha + \gamma T_c + X_{ic0}\beta + \theta y_{ic0} + \omega_g + \delta_t + \varepsilon_{ict} \tag{1} \]

where community-level CLTS treatment status is defined by $T_c$ and $\varepsilon_{ict}$ is an error term. Baseline characteristics of households and their heads, $X_{ic0}$, are included alongside LGA and survey wave fixed effects, $\omega_g$ and $\delta_t$, to control for unbalanced household size, unobserved area effects and contemporaneous shocks and to increase the precision of our estimates. The parameter of interest, $\gamma$, captures the average impact of CLTS. Our preferred analysis of covariance (ANCOVA) specification further conditions on the baseline value of the outcome variable, $y_{ic0}$. These estimates are more efficient than difference-in-differences and simple difference estimators in experimental contexts, when pre-treatment information is available and the outcome is strongly correlated over time (McKenzie, 2012). Alongside, we present conventional difference-in-differences (DiD) estimates. We investigate heterogeneous impacts, primarily along community characteristics (CCs), in the expanded specification:

\[ y_{ict} = \alpha + \gamma_r T_c + \gamma_d (T_c \times CC_c) + \lambda CC_c + X_{ic0}\beta + \theta y_{ic0} + \omega_g + \delta_t + \varepsilon_{ict} \tag{2} \]

where we introduce a binary variable $CC_c$, and include the interaction term $T_c \times CC_c$. In our main results, we define $CC_c$ as low and high community wealth, split along the sample median. The $\gamma_r$ parameter captures the average CLTS treatment effect in the less wealthy half of communities (for which $CC_c = 0$), and $\gamma_d$ is the difference in treatment effects between communities with above and below median wealth.
Notes: [...] (2021). For a detailed description of household and community-level covariates, see Online Appendix B. *p < 0.10, ** p < 0.05, *** p < 0.01.
10 Details of its components and their factor loadings are provided in Online Appendix B1.
11 Michalopoulos (2013) presents evidence that wealth and night light intensity are strongly correlated.
12 In a successfully randomized scenario, as is our study (see Table 1), ITT designs yield unbiased estimates of the average impact of the intervention on the sample assigned to treatment. Additionally, we follow Imbens and Angrist (1994) and Angrist and Imbens (1995) and instrument triggered treatment with treatment assignment in Section 6.2. The results are very similar to the ITT estimates. In Online Appendix C we show that there is no evidence of selective triggering in our study.
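The contrast between the ANCOVA and DiD estimators can be illustrated on a toy data set. The sketch below is deliberately stripped down: it omits household covariates, fixed effects and clustered standard errors, and all data and function names are made up for illustration. ANCOVA regresses the follow-up outcome on treatment and the baseline outcome; DiD compares mean changes between arms.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(design, outcome):
    """Ordinary least squares coefficients via the normal equations."""
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)]
           for a in range(k)]
    xty = [sum(row[a] * y for row, y in zip(design, outcome))
           for a in range(k)]
    return solve_linear_system(xtx, xty)


def ancova_effect(treat, y0, y1):
    """Treatment coefficient from regressing y1 on a constant, T and y0."""
    design = [[1.0, float(t), float(b)] for t, b in zip(treat, y0)]
    return ols(design, y1)[1]


def did_effect(treat, y0, y1):
    """Difference between arms in the mean change from baseline."""
    def mean(values):
        return sum(values) / len(values)
    change_t = mean([b - a for t, a, b in zip(treat, y0, y1) if t == 1])
    change_c = mean([b - a for t, a, b in zip(treat, y0, y1) if t == 0])
    return change_t - change_c


# Toy data generated as y1 = 0.2 + 0.5 * y0 - 0.1 * T (no noise).
treat = [1, 1, 1, 1, 0, 0, 0, 0]
y0 = [0, 1, 0, 1, 1, 0, 1, 0]
y1 = [0.1, 0.6, 0.1, 0.6, 0.7, 0.2, 0.7, 0.2]
ancova_estimate = ancova_effect(treat, y0, y1)
did_estimate = did_effect(treat, y0, y1)
```

With a balanced baseline and no noise both estimators recover the same effect here. The efficiency argument of McKenzie (2012) cited in the text concerns noisy data, where the two estimators differ in precision.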
Since we are testing multiple hypotheses simultaneously in our analysis of heterogeneous impacts, we report both unadjusted (or naive) p-values and, in brackets, p-values that are adjusted for the family-wise error rate. We compute the latter using the methodology proposed by Romano and Wolf (2005), calculated by drawing 1,000 clustered bootstrapped samples.
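The resampling step behind such adjusted p-values can be sketched as follows. This is only the cluster bootstrap, not the Romano and Wolf (2005) step-down procedure itself: whole communities are drawn with replacement so that within-community correlation is preserved. Resampling separately within each treatment arm, the toy data and the simple difference-in-means statistic are assumptions of this illustration.

```python
import random
import statistics


def mean(values):
    return sum(values) / len(values)


def effect(clusters):
    """Difference in mean outcome between treated and control households.

    `clusters` is a list of (treatment_status, household_outcomes) pairs.
    """
    treated = [y for treat, ys in clusters if treat == 1 for y in ys]
    control = [y for treat, ys in clusters if treat == 0 for y in ys]
    return mean(treated) - mean(control)


def cluster_bootstrap(clusters, n_draws=1000, seed=0):
    """Re-estimate the effect on samples of whole clusters drawn with replacement."""
    rng = random.Random(seed)
    treated = [c for c in clusters if c[0] == 1]
    control = [c for c in clusters if c[0] == 0]
    draws = []
    for _ in range(n_draws):
        # Resample within each arm so every draw contains both arms.
        sample = ([rng.choice(treated) for _ in treated]
                  + [rng.choice(control) for _ in control])
        draws.append(effect(sample))
    return draws


# Hypothetical data: outcome 1 = household practices OD, 0 = it does not.
toy_clusters = [
    (1, [0, 0, 1]), (1, [0, 1, 0, 0]), (1, [1, 0, 0]),
    (0, [1, 1, 0]), (0, [1, 0, 1, 1]), (0, [0, 1, 1]),
]
point_estimate = effect(toy_clusters)
bootstrap_draws = cluster_bootstrap(toy_clusters, n_draws=1000, seed=1)
bootstrap_se = statistics.stdev(bootstrap_draws)
```

A family-wise adjustment would then compare the observed test statistics with the distribution of the maximum statistic across hypotheses in these bootstrap draws; that step is omitted here.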

Results
Table 2 presents estimates of average treatment effects on open defecation behavior, defined as a dummy equal to 1 if the main respondent performs OD, 0 otherwise. Columns 1 and 2 show the simple DiD estimates with and without covariates, and Column 3 presents ANCOVA results. We find that CLTS reduced OD consistently across all specifications when we pool observations across the three follow-up surveys (Panel A), with ANCOVA providing the highest precision (p-value < 0.05). However, the magnitude of behavioral change is small: exposure to CLTS resulted in a reduction in OD of 4 ppts eight months after CLTS. This reduction is sustained for two years after intervention implementation, but then fades out (Panel B). We find no evidence of systematic measurement error in the treatment group as a result of CLTS, for example due to over-reporting of 'desirable' outcomes: only 0.5% of households in the control and 0.9% in the treatment group report non-existing toilets at endline, and the difference is not statistically significant (p-value = 0.115). Our results are also robust to using an alternative measure of OD, namely whether at least one household member above the age of 4 performs OD (Online Appendix D).
This effect size is towards the lower bound of the range of impact estimates found in other evaluations of CLTS. CLTS evaluations in Indonesia (Cameron et al., 2019) and Bangladesh (Guiteras et al., 2015) found no statistically detectable reductions in OD. On the other hand, other studies have shown that CLTS can change behavior, including in Tanzania (Briceno, Coville, Gertler, & Martinez, 2017) and India (Pattanayak et al., 2009), with reductions in OD as high as 30 ppts in Mali (Pickering et al., 2015). 13 A recent cross-country study concludes that '[t]he impact of CLTS and subsequent sustained latrine use varied more by region than by intervention, indicating that context may be as or more important than the implementation approach in determining effectiveness' (Crocker, Saywell, & Bartram, 2017). In consequence, we focus on the estimation of heterogeneous impacts across communities in the remainder of the paper.

Heterogeneous impacts across communities
CLTS is designed and implemented as a participatory intervention at the community level, with the aim of bringing about collective change. In spite of its current popularity, the available evidence does not provide clear guidance for successful targeting, which requires a better understanding of the characteristics that best predict the effectiveness of CLTS.
The pioneers of the approach argue that the impact of CLTS on sanitation outcomes may vary by the socio-economic status of treated communities (Kar & Chambers, 2008). Following this hypothesis, we use community wealth as a widely available, comprehensive proxy for local socio-economic status (SES) and investigate heterogeneous CLTS impacts along this dimension. 14 We discretize community wealth along the sample median by ranking communities according to their wealth score. Communities with wealth scores equal to or above the median are defined as 'high-wealth' communities ($CC_c = 1$), while the rest are classified as 'low-wealth' communities ($CC_c = 0$). Comparison with the Nigerian wealth distribution suggests that a more accurate labeling would be lower- and upper-middle wealth groups (see Online Appendix Fig. F3).
Notes: Estimates based on OLS regression using Eq. 1. Panel A reports estimates using data pooled across all three post-intervention survey waves, while Panel B shows estimates by follow-up survey waves where FU 1, 2 and 3 denote measurements from waves conducted 8 (FU1), 24 (FU2) and 32 months (FU3) after baseline. DiD (ANCOVA) refers to estimates obtained using a difference-in-differences (ANCOVA) estimator. Household controls are: age, gender, education attainment and employment status of the household head; household size; whether the household has at least one child below age 6; household wealth asset score; and whether farming is the household's main economic activity. Standard errors are clustered at the community level. p-values are shown in parentheses. *p < 0.10, ** p < 0.05, *** p < 0.01.
13 Pattanayak et al. (2009) was a community-led intervention in the context of subsidy provision, although the authors state that the subsidies were hardly taken up. Other RCT-based evidence is available for CLTS, but these studies evaluate CLTS bundled with other interventions. E.g. Guiteras et al. (2015) include a treatment arm where the CLTS-like intervention is coupled with subsidies, and another with subsidies and a supply side technical assistance intervention. The World Bank carried out two further RCTs on the effectiveness of CLTS in combination with subsidy provision in India (Patil et al., 2014;Andres et al., 2020).
14 Detailed lists of household asset items are frequently elicited in household surveys in developing countries, as they are often more precise than measures of household income. The aggregated index has mean 0 and a standard deviation of 1. See details in Online Appendix Section B.
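The median split of communities can be sketched in a few lines. The sketch assumes that community wealth is the mean of household wealth scores within each community, and it follows the convention of Eq. 2 in which the indicator equals 0 for the less wealthy half. The community labels, scores and function name are made up for illustration.

```python
from statistics import mean, median


def community_wealth_indicator(household_scores):
    """Map each community to 1 (at or above median wealth) or 0 (below).

    `household_scores` maps a community id to the wealth scores of its
    sampled households. Community wealth is taken to be the mean
    household score, which is an assumption of this sketch.
    """
    community_means = {c: mean(s) for c, s in household_scores.items()}
    cutoff = median(community_means.values())
    return {c: 1 if m >= cutoff else 0 for c, m in community_means.items()}


# Hypothetical wealth scores for four communities.
toy_scores = {
    "A": [-1.0, -0.5],
    "B": [0.2, 0.4],
    "C": [1.0, 2.0],
    "D": [-0.2, 0.0],
}
toy_indicator = community_wealth_indicator(toy_scores)
```

The same cutoff logic could be applied to an external wealth proxy, such as a survey-based wealth index or nightlight intensity, to classify areas before an intervention is rolled out.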
Using the pooled sample, we indeed find strong heterogeneity in impact estimates by community wealth. Table 3 shows that CLTS reduced OD prevalence by 9 ppts in less wealthy communities (c r from Eq. 2). The difference from wealthier communities is also highly significant and almost equal in magnitude, implying statistically insignificant impact estimates close to zero in wealthier communities.
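To see why the two coefficients imply an effect close to zero in wealthier communities, write $\gamma_r$ for the treatment coefficient and $\gamma_d$ for the coefficient on the treatment-by-wealth interaction in Eq. 2, with $CC_c = 0$ denoting less wealthy communities. The treatment effect for each group is then the following sum of coefficients (the approximate values restate the magnitudes reported in the text):

```latex
\begin{align*}
\text{CLTS effect if } CC_c = 0 \text{ (less wealthy)} &= \gamma_r \approx -0.09, \\
\text{CLTS effect if } CC_c = 1 \text{ (wealthier)} &= \gamma_r + \gamma_d \approx 0 .
\end{align*}
```

Because $\gamma_d$ is reported to be almost equal in magnitude to $\gamma_r$ and of opposite sign, the two terms nearly cancel for wealthier communities.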
A split of communities by wealth quartiles confirms this heterogeneity: Fig. 1 shows that CLTS impacts are largest in the first quartile, statistically significantly different from zero up to median wealth, and not different from zero among higher quartiles. 15 This supports the median split presented in Table 3 to identify communities with positive CLTS impacts, but additionally points to non-linear, even stronger impacts in the first quartile, i.e. among the least wealthy communities.
The OD reductions in less wealthy communities, achieved through CLTS implementation, are driven by increased sanitation infrastructure investment (Columns 2 to 5 of Table 3). OD reductions are almost identically matched by an increase in toilet ownership of 8 ppts (Column 2). Ownership of functioning toilets (Column 3) increased by 10 ppts, suggesting that 2% of existing toilets were kept functional due to CLTS. In contrast, we do not find that CLTS triggered higher usage of existing toilets, shared or otherwise (Columns 4 and 5). This is likely driven by the fact that reported (and interviewer-observed) usage of owned toilets at baseline was already close to universal (Table 1), and usage rates of newly constructed toilets remain high. 16 These findings are in line with ownership and use of private toilets being the most frequently discussed channels to reduce OD in CLTS community meetings (Kar, 2003). Fig. 2 shows that the impacts of CLTS on OD (left panel) and functioning toilet ownership (right panel) in less wealthy communities are achieved within 8 months, and are sustained across the three follow-up periods spanning 32 months in total, suggesting that the initial triggering worked akin to a one-shot policy in the Nigerian context. Orgill-Meyer et al. (2019) find similarly persistent effects on toilet ownership in the context of India even over a 10-year period but, contrary to our setting, toilet ownership and OD practice diverged early on, and OD reductions could only be established one year after the intervention.

Exploring alternative margins for CLTS impacts
In this section we explore whether other community and individual characteristics that might be correlated with community wealth may be more policy-relevant margins for CLTS impacts or better predictors of its effectiveness.

Is impact heterogeneity driven by community differences in compliance?
An immediate concern could be that CLTS may be ineffective in wealthier communities due to lower compliance, i.e. a lower likelihood of triggering meetings taking place. Yet, we do not find evidence of statistically significant differences in compliance (p-value = 0.301). We nonetheless present treatment-on-the-treated estimates in Table 4, obtained through an instrumental variable (IV) strategy. Our treatment indicator T c becomes 1 if triggering activities actually took place, and the instrumental variable is the assignment to triggering, following Imbens and Angrist (1994) and Angrist and Imbens (1995). We find qualitatively and quantitatively very similar results compared with the ITT estimates: CLTS is only effective in triggered communities with wealth below the median, where OD is reduced by 10 ppts. It is ineffective in triggered communities that are wealthier. 17
Notes: 'Difference' is the coefficient c d from estimating Eq. 2, indicating the difference in treatment effects between communities with above and below median wealth. Control means are calculated using endline data. Household controls are: age, gender, education attainment and employment status of the household head; household size; whether the household has at least one child below age 6; household wealth asset score; and whether farming is the household's main economic activity. Standard errors are clustered at the community level. Naive (unadjusted) p-values are shown in parentheses. In brackets we present p-values adjusted by family-wise error (FWE) rate following Romano and Wolf (2005), using 1,000 cluster bootstrap samples. *p < 0.10, ** p < 0.05, *** p < 0.01.
Fig. 1. CLTS impacts on OD by community wealth quartile. Notes: The figure displays estimated confidence intervals for CLTS treatment effects by community wealth quartiles. Household controls are: age, gender, education attainment and employment status of the household head; household size; whether the household has at least one child below age 6; household wealth asset score; and whether farming is the household's main economic activity. Standard errors are clustered at the community level.
15 Our results are also qualitatively and quantitatively robust to using a continuous specification of community wealth (see Online Appendix Section E.3).
16 The percentage of (functioning) toilet owners who report that their household does not practice OD remains very similar after the intervention (95%), and does not differ between households in treatment and control groups (p-value = 0.345). This is different from OD habits in India, for example, where toilet ownership and usage do not necessarily go hand in hand (see, for example, Gupta et al., 2020).
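The link between the ITT and the treatment-on-the-treated estimates can be illustrated with a minimal sketch. With a binary instrument (assignment to triggering) and no covariates, the IV estimate reduces to the Wald ratio: the ITT effect on OD divided by the difference in triggering rates between assigned and non-assigned communities. The data below are invented for illustration and the function name `wald_tot` is hypothetical; the estimates in Table 4 additionally include household controls and standard errors clustered at the community level, which this sketch omits.

```python
def mean(values):
    return sum(values) / len(values)


def wald_tot(assigned, triggered, outcome):
    """Return (ITT, first stage, TOT) for a binary instrument without covariates.

    assigned:  1 if the community was randomly assigned to triggering, else 0
    triggered: 1 if triggering activities actually took place, else 0
    outcome:   community OD rate (share of households practising OD)
    """
    y_assigned = mean([y for z, y in zip(assigned, outcome) if z == 1])
    y_control = mean([y for z, y in zip(assigned, outcome) if z == 0])
    d_assigned = mean([d for z, d in zip(assigned, triggered) if z == 1])
    d_control = mean([d for z, d in zip(assigned, triggered) if z == 0])
    itt = y_assigned - y_control            # reduced form
    first_stage = d_assigned - d_control    # compliance difference
    return itt, first_stage, itt / first_stage


# Invented example: 10 assigned communities (8 actually triggered), 10 controls.
assigned = [1] * 10 + [0] * 10
triggered = [1] * 8 + [0] * 2 + [0] * 10
outcome = [0.60] * 8 + [0.70] * 2 + [0.70] * 10

itt, first_stage, tot = wald_tot(assigned, triggered, outcome)
print(f"ITT = {itt:.2f}, first stage = {first_stage:.2f}, TOT = {tot:.2f}")
```

Because compliance is high, the rescaling from ITT to TOT is modest, which is consistent with the two sets of estimates being quantitatively close.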

Is impact heterogeneity driven by lower baseline toilet ownership in less wealthy communities?
Furthermore, wealth is positively correlated with baseline toilet coverage, and therefore less wealthy communities may have a higher adjustment margin to react to CLTS. To understand whether wealth simply picks up low initial toilet ownership, we estimate heterogeneous impacts by baseline toilet coverage instead of community wealth. CLTS impacts on OD are 4 ppts larger in areas with low initial toilet coverage (Column 1 in Table 5), but this difference is not significantly different from zero and is considerably smaller in economic magnitude than the heterogeneous community wealth impacts. This suggests that community wealth is a more informative measure for CLTS effectiveness than toilet coverage. On the other hand, when considering night light intensity, which also correlates strongly with wealth and has been used in the past to proxy for GDP per capita, a measure of income, at the sub-national level in African countries (Michalopoulos, 2013), we find similar patterns as with asset wealth. Areas characterised by low night light intensity show significant reductions in OD due to CLTS, and these are significantly different from impacts in areas that experience high night light intensity. The difference becomes, however, insignificant when considering adjusted p-values.

Community-or household-level heterogeneity?
Unsurprisingly, wealthier communities are composed of a higher fraction of households with higher wealth (rather than few extremely rich individuals), and vice versa, implying that community wealth estimates may be picking up heterogeneous household-level impacts of CLTS. We thus estimate heterogeneous impacts by household instead of community wealth. CLTS is more effective among less wealthy households compared to wealthier ones, but the estimated difference is about half as large as the community wealth impacts and not statistically significant under multiple hypothesis testing (Column 3 in Table 5). 18 Despite the positive correlation, by construction, between individual and community wealth, significant heterogeneity remains: 31% (34%) of households living in less (more) wealthy clusters have higher (lower) wealth than the median. Splitting the sample into four cells along median community and household wealth, we further investigate CLTS impacts by household and community wealth. While both less and more wealthy households reduce OD in less wealthy communities, there is no discernible effect nor a difference between them in wealthier communities (see Table 5, Columns 4 and 5). We conclude that CLTS is more effective in less wealthy communities, regardless of the household's position in the wealth distribution. This result is in line with Ravallion and Wodon (1999), supporting the idea of geographical targeting of development interventions, which we discuss further in Section 7.
Fig. 2. Dynamics of CLTS treatment effects on OD and toilet ownership. Notes: Graphs plot the point estimates and 95% confidence intervals for CLTS impacts on OD (left) and toilet ownership (right) by follow-up survey wave and community wealth. FU 1, 2 and 3 refer to measurements obtained from follow-up survey waves conducted 8, 24 and 32 months after baseline. Household controls are: age, gender, education attainment and employment status of the household head; household size; whether the household has at least one child below age 6; household wealth asset score; and whether farming is the household's main economic activity. Robust standard errors are clustered at the community level.
Notes: The instrumental variable is the initial, randomized treatment assignment of the community based on the census. 'Triggering' denotes whether the triggering activities defined in Section 2 actually took place. Household controls are: age, gender, education attainment and employment status of the household head; household size; whether the household has at least one child below age 6; household wealth asset score; and whether farming is the household's main economic activity. Standard errors are clustered at the community level. Naive (unadjusted) p-values are shown in parentheses. *p < 0.10, ** p < 0.05, *** p < 0.01.
17 The shares of communities assigned to CLTS in which triggering meetings were successfully run were similar (83% in less and 75% in more wealthy communities), and not significantly different (p-value = 0.301). Attendance rates at the triggering meeting, measured as the number of attendees recorded by CLTS facilitators over village population, were also not significantly different between more (34%) and less wealthy communities (42%). A community level regression of attendance rates on community level wealth group (i.e. a dummy equal to 1 if the community is less wealthy) and LGA fixed effects yields a point estimate of just 3 ppts and a p-value of 0.662.
18 In Online Appendix Section E.2, we show that other household characteristics (some of which correlated with wealth), such as household composition or education of the household head, do not explain the impact of CLTS either.
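The four-cell comparison described in this subsection can be sketched as a difference in mean OD between treated and control households within each combination of community and household wealth. This is only a schematic illustration on invented data: the function name `cell_effects` and the tuple layout are hypothetical, and the estimates in Table 5 come from regressions with household controls and clustered standard errors rather than raw differences in means.

```python
from collections import defaultdict


def cell_effects(rows):
    """Treated-minus-control mean OD for each wealth cell.

    rows: iterable of (community_low, household_low, treated, od) tuples, where
    the first three entries are booleans and od is 1 if the household practises
    OD. Every cell is assumed to contain treated and control households.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (cell, arm) -> [sum of od, count]
    for community_low, household_low, treated, od in rows:
        entry = totals[(community_low, household_low, treated)]
        entry[0] += od
        entry[1] += 1

    effects = {}
    for community_low in (True, False):
        for household_low in (True, False):
            t_sum, t_n = totals[(community_low, household_low, True)]
            c_sum, c_n = totals[(community_low, household_low, False)]
            effects[(community_low, household_low)] = t_sum / t_n - c_sum / c_n
    return effects


# Invented data: OD falls in less wealthy communities for both household types,
# and is unchanged in wealthier communities.
rows = []
for community_low in (True, False):
    for household_low in (True, False):
        control_od = [1, 1, 1, 0] if community_low else [1, 1, 0, 0]
        treated_od = [1, 1, 0, 0]
        rows += [(community_low, household_low, False, od) for od in control_od]
        rows += [(community_low, household_low, True, od) for od in treated_od]

effects = cell_effects(rows)
for cell, effect in effects.items():
    print(cell, effect)
```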

Characteristics of less and more wealthy communities
Community wealth also correlates with a number of other community characteristics, many of which have been identified as driving factors behind intervention effectiveness, particularly in the domain of public health and health-related infrastructure (Bulthuis, Kok, Raven, & Dieleman, 2020;Cameron et al., 2019;Augsburg, Bancalari, Durrani, Vaidyanathan, & White, 2022;Deserranno, Stryjan, & Sulaiman, 2019). We thus examine whether rural communities' (i) perceived benefits and risks of sanitation infrastructure, (ii) social cohesion and interactions, (iii) local public infrastructure and (iv) leaders' characteristics are driving our estimated wealth-specific CLTS impacts. 19 Column 1 in Table 6 presents estimates of impact heterogeneity by communities' perceptions of sanitation benefits. We find weak evidence that CLTS impacts are high in areas where residents perceive sanitation benefits to be low at baseline (Column 1), but only when considering naive p-values, and we do not find evidence for differential CLTS impacts between communities with high and low benefit perceptions.
Using a wide array of measurements to capture social cohesion and social interactions (community participation, trust, religious fragmentation and wealth inequality), we find slightly stronger reductions in OD in treated communities with lower baseline social capital, fragmentation and inequality, that are, however, not statistically significant (Columns 2 to 5).
Our main results could also be explained by lower access to infrastructure in less wealthy communities, which we establish along the existence of local schools, hospitals and paved internal roads. For example, road infrastructure may proxy for transport costs, while health infrastructure may proxy for local hygiene knowledge and education levels. In contrast to our findings, both channels would lead us to expect higher CLTS impacts in wealthier communities. In addition, Augsburg et al. (2022) argue that poor public infrastructure may hamper the maintenance of sanitation investments, and hence the sustainability of intervention impacts. Yet, we find no heterogeneous CLTS impacts along any public infrastructure domain (Table 7, Columns 1 to 3).
Finally, recent evidence regarding implementing interventions at scale emphasizes the importance of political leaders and implementers for their effectiveness (Deserranno et al., 2019;Cameron et al., 2019;Jack & Recalde, 2015). Particularly since leaders are the initial point of contact for the implementing WASH officials and help organize the CLTS meeting in their village, their characteristics may affect CLTS impacts. Communities with wealth below the median in our sample have less experienced and less educated leaders than those above the median, while there is no discernible difference in their political ideology (see Online Appendix E.1). CLTS, which first seeks contact with village leaders to convince them of the benefits of sanitation, may close the knowledge gap between less and more experienced and educated leaders regarding sanitation, and hence render CLTS more effective in less wealthy communities. Yet, we find no evidence of heterogeneous CLTS impacts by leaders' tenure or education to support this hypothesis (Table 7, Columns 4 and 5).
Taken together, our results suggest that community wealth encompasses a number of distinctive community characteristics that made CLTS more effective. Yet, none of these characteristics (such as toilet coverage, implementation, measures of social cohesion, local public infrastructure or the characteristics of their leaders) could independently account for the strong differential impacts of CLTS in less wealthy communities, leaving us with one very robust predictor of CLTS effectiveness: community wealth.
Notes: 'Difference' is the coefficient c d from estimating Eq. 2. Column 1 shows impact estimates for communities with baseline toilet coverage above and below the median. In Column 2, we present estimates using night light intensity above and below the median as an alternative measure of community wealth. Columns 3 to 5 present CLTS impact estimates for households with wealth above and below the median household in all communities (Column 3), as well as separately estimated in wealthier (Column 4) and less wealthy (Column 5) communities. Values below the median for each respective variable are labelled as 'Low'. Control means are calculated using endline data. Household controls are: age, gender, education attainment and employment status of the household head; household size; whether the household has at least one child below age 6; household wealth asset score; and whether farming is the household's main economic activity. Standard errors are clustered at the community level. Naive (unadjusted) p-values are shown in parentheses. In brackets we present p-values adjusted by family-wise error rate following Romano and Wolf (2005), using 1,000 cluster bootstrap samples. *p < 0.10, ** p < 0.05, *** p < 0.01.
19 See Online Appendix Table E1 for an overview of systematic differences between less and more wealthy communities, and Online Appendix Section B.2 for a detailed discussion of the measurements used in this section.

Assessing external validity and the scope for wealth-based targeting of CLTS
Our finding that CLTS works in some but not in other contexts adds to the conflicting evidence on CLTS effectiveness, discussed previously. Importantly though, we identify one factor, community wealth, that can serve as guidance on which Nigerian settings CLTS can be effective in for reducing open defecation.
We argue that our estimates can be used by the Government of Nigeria to target CLTS implementation, and thereby make more effective use of constrained government funds. Readily available population survey data, such as the 2013 Demographic and Health Survey (DHS), contain geographic location information and similar (if less detailed) asset lists to elicit wealth. We show in support of this argument that our findings are robust to using the less comprehensive wealth definition used in the DHS (see Online Appendix F for a detailed discussion). We use the DHS wealth index to classify the country's regions 20 into 'low wealth' and 'high wealth' areas to obtain the targeting map shown in Fig. 3. It highlights priority areas for targeting, i.e. less wealthy areas, in darker shades. 21 We further conduct an exploratory analysis to gain insight into whether community or area wealth could also be used as a basis for targeting CLTS implementation in other contexts.
In a simple cross-study analysis, we add available microdata from RCTs conducted in Indonesia and Tanzania 22 to our study data, and re-estimate impacts on this pooled sample. 23 The exercise necessitates a consistent measure of community wealth across contexts. Since durable items underlying asset wealth indices are highly country- and context-specific (Filmer & Pritchett, 2001), so that the asset lists used in data collection vary across studies 24, we base our analysis on night light intensity, which is readily available, consistently measured globally, and has been used otherwise as a proxy for wealth or, more generally, economic activity (Michalopoulos, 2013). The CLTS impact estimates in our RCT are robust to using local night light intensity as a proxy for community wealth (see Column 2, Table 5), yet this measure does not dominate community wealth as a predictor of CLTS impacts in a horserace exercise (see Online Appendix Table G2). We compute night light intensity at cluster level d in the baseline survey year for the pooled sample and estimate average and heterogeneous CLTS impacts on toilet ownership by night light intensity, akin to impacts presented for Nigeria in Appendix Table G3. Columns 1 and 2 in Table 8 show average impact estimates of CLTS in the pooled dataset. Country fixed effects are included in Column 2 to pick up sampling variation across RCT sites, since we cannot rely on strict exogeneity by randomization for identification when using pooled estimates. The estimated average CLTS impact is a 5 ppt increase in toilet ownership that is statistically significant and robust to the inclusion of country fixed effects.
(2), community participation (3), fragmentation (4), and wealth inequality (5). Values below the median for each respective variable are labelled as 'Low'. Control means are calculated using endline data. Household controls are: age, gender, education attainment and employment status of the household head; household size; whether the household has at least one child below age 6; household wealth asset score; and whether farming is the household's main economic activity. Standard errors are clustered at the community level. Naive (unadjusted) p-values shown in parentheses. In brackets we present p-values adjusted by family-wise error rate following Romano and Wolf (2005), using 1,000 cluster bootstrap samples. *p < 0.10, ** p < 0.05, *** p < 0.01.
Notes: 'Difference' is the coefficient c d from estimating Eq. 2. Heterogeneity dimensions considered are communities' public good infrastructure (Columns 1 to 3) and leader traits (4 and 5). Values below the median of the respective community or leader characteristics are labelled as 'Low'. Control means are calculated using endline data. Household controls are: age, gender, education attainment and employment status of the household head; household size; whether the household has at least one child below age 6; household wealth asset score; and whether farming is the household's main economic activity. Standard errors are clustered at the community level. Naive (unadjusted) p-values shown in parentheses. In brackets we present p-values adjusted by family-wise error rate following Romano and Wolf (2005), using 1,000 cluster bootstrap samples. *p < 0.10, ** p < 0.05, *** p < 0.01.
20 DHS is designed to be representative at the region level. Within regions, a random sample of communities is surveyed. A more detailed targeting map would require an alternative data source which covers all communities.
21 While we conduct this analysis at a very high geographic level, the same logic can be applied with data available for lower-level geographical asset information.
22 The datasets can be found in: Water and Sanitation Program (2008) for Indonesia and Briceno, Coville, and Martinez (2012) for Tanzania.
23 We acknowledge that each of these study contexts will be additionally characterized by context or location-specific factors (and measurement error), emphasized in Wang, Moss, and Hiller (2006) and Meager (2019), and that community wealth may not be the unique source of heterogeneous impacts. A more detailed analysis would require access to the exact cluster locations in a larger number of studies than currently available to identify cross-study variation separately from genuine observational differences across households as, for example, proposed by Meager (2019). Unfortunately, microdata from the CLTS RCT in Mali are not publicly available, and data from the RCT in Bangladesh do not provide cluster level identifiers (Guiteras, Levinsohn, & Mobarak, 2017). We use these data in a coarser analysis, described in Online Appendix Section G.2. We estimate heterogeneous CLTS impacts by night light intensity separately in all five RCTs, using the lowest available geographical observation unit to map in night light intensity. We consistently find that CLTS is on average more effective in less wealthy settings, i.e. those with lower night light intensity.
Columns 3 to 5 present heterogeneous impact estimates for the pooled sample using three alternative functional forms to capture night light variation. First, we split geographic units according to whether they display zero or positive night light intensity to reflect the strong right skew of its distribution (Appendix Fig. B1). Second, we use the results from our Nigerian RCT as reference and define the Nigerian night light median at the community level as a split point, as we found no CLTS impacts beyond the median level of wealth in our RCT (Fig. 1 and Table 3). Third, we estimate heterogeneous impacts by night light intensity using a more flexible split into tertiles. We replicate our finding that CLTS impacts vary by communities' wealth status, using this pooled sample and night light intensity as a proxy. Impact estimates are substantially larger in areas with low night light intensity, of magnitude 9 ppts in areas with zero night light, i.e. the lowest tertile. 25 These results get even stronger when we split areas along the Nigerian night light median (Column 4): in areas of low night light intensity CLTS increases toilet ownership by 12 percentage points (significant at the 1% level). Conversely, we never find statistically significant CLTS impact estimates in areas of high night light intensity in any specification. Furthermore, impacts are declining across tertiles of increasing night light (Column 5), similar to the results in our RCT presented previously (Fig. 1). We test whether the difference between the estimated coefficients in 'low wealth' and 'high wealth' areas differs from zero and reject the hypothesis of a zero difference in all specifications but one. The exception is the specification in Column 3 where we split the sample into zero and positive night light areas. It is likely that this split is too coarse, since it puts areas with very low but positive night lights into the 'high' category.
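The three ways of grouping areas by night light intensity can be sketched as follows. This is a minimal illustration on invented values; the function names, the tie-handling conventions and the toy numbers are assumptions made here, not the procedure used to produce Table 8.

```python
import statistics


def zero_split(values):
    """First split: zero versus positive night light intensity."""
    return ["low" if v == 0 else "high" for v in values]


def reference_median_split(values, reference_values):
    """Second split: below versus at or above the median of a reference sample
    (in the text, the Nigerian community-level night light median)."""
    cutoff = statistics.median(reference_values)
    return ["low" if v < cutoff else "high" for v in values]


def tertiles(values):
    """Third split: tertile 1 (darkest) to 3 (brightest).

    Cut points are the values at the one-third and two-thirds ranks, so tied
    values always share a tertile. Other quantile conventions are possible.
    """
    ordered = sorted(values)
    n = len(ordered)
    first_cut = ordered[(n + 2) // 3 - 1]
    second_cut = ordered[(2 * n + 2) // 3 - 1]
    return [1 if v <= first_cut else 2 if v <= second_cut else 3 for v in values]


# Invented night light values for nine pooled clusters and a reference sample.
pooled = [0, 0, 0, 0.5, 1.2, 3.0, 4.0, 7.5, 9.0]
reference = [0.0, 0.2, 0.8, 2.0, 5.0]

by_zero = zero_split(pooled)
by_median = reference_median_split(pooled, reference)
by_tertile = tertiles(pooled)
print(by_zero, by_median, by_tertile, sep="\n")
```

In this toy example the cluster with intensity 0.5 is labelled 'high' under the zero versus positive split but 'low' under the reference-median split, which mirrors the concern that the first split may be too coarse.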
26 We supplement this evidence by plotting point estimates of impacts on toilet ownership and open defecation observed in our study, Tanzania and Indonesia (Briceno et al., 2017;Cameron et al., 2019), as well as two further RCT evaluations, in Bangladesh and Mali (Pickering et al., 2015;Guiteras et al., 2015) by their corresponding satellite night light intensity. The exercise yields an inverse relationship between community wealth and CLTS impacts (see Online Appendix, Fig. G1).
In summary, the pooled cross-context estimates closely replicate our CLTS impact estimates from the Nigerian setting, and suggest that our findings may also be useful for the targeting of CLTS beyond the Nigerian context.

Conclusion
The design of effective policies to address the urgent sanitation concerns in the developing world requires a nuanced understanding of households' investment choices and drivers of behavioral change. In this paper we provide evidence on the effectiveness of Community-Led Total Sanitation (CLTS), a participatory information intervention without financial components, as implemented at scale in a collaboration between NGOs and the Nigerian government.
Our study uses a large cluster randomized experiment in Nigeria for which we collected data up to three years after treatment. Implementation of CLTS in this context was conducted at scale, i.e. by WASH civil servants trained by local NGOs. We show that the intervention had strong heterogeneous impacts by community wealth, with significant and lasting effects on open defecation habits in less wealthy communities, reducing OD rates by 9 percentage points from a baseline level of 75%. We find no effect of CLTS in wealthier communities. The OD reduction in less wealthy communities is achieved mainly through increased toilet ownership (+8 ppts from a baseline level of 24%). This result, which is robust across alternative measures of community socio-economic status, is not driven by baseline differences in toilet coverage, and can be replicated across other settings, which we show by pooling data from our study and a limited set of RCTs of similar interventions.
We provide an example of how our results could be used to develop a potentially more effective targeting strategy for CLTS in Nigeria as well as other contexts.
Our results have two further implications. First, they provide an additional reason why scale-up of interventions is not trivial (Ravallion, 2012;Bold, Kimenyi, Mwabu, Ng'ang'a, & Sandefur, 2018;Banerjee et al., 2017;Deaton & Cartwright, 2018). Discussions on why interventions may not scale up successfully in a national roll-out have focused on general equilibrium and spillover effects, and recently on aspects of implementation and delivery. We show, in line with the literature on geographical poverty traps, that community-specific, heterogeneous treatment impacts are an additional impediment to successful scale-up in terms of effectiveness of interventions.
Fig. 3. CLTS targeting in Nigeria. Notes: The map depicts the targeting map based on quartiles of the percentage of rural communities per region with wealth below the Nigerian median of community wealth. Community wealth is constructed using the DHS asset ownership list discussed in detail in Online Appendix F. Areas marked in darker shades depict regions in the lowest quartile of the Nigerian distribution of rural community wealth, i.e. those with the largest fraction of less wealthy rural communities. Based on our CLTS impact estimates by community wealth, CLTS should first prioritize these regions (where 97% or more communities have below median wealth), then those in the second quartile (where 68 to 97% of rural communities are less wealthy), and so forth. Source: Own calculations based on DHS Nigeria 2013.
Second, we show that interventions relying on information and collective action mechanisms can have substantial impacts on households' health investments and behavior, specifically relating to sanitation. Yet, there is an important caveat for policymakers working towards meeting the sanitation-related sustainable development goals. CLTS achieves convergence between asset-poor and asset-rich communities in terms of OD and toilet coverage in our study, and thus levels the playing field. However, it is not a silver bullet that closes the large sanitation gap towards achieving open-defecation-free status in less wealthy communities as a standalone intervention. Hence, more research on alternative or supplementary interventions to close the sanitation gap in low-income countries is needed. These may seek to magnify CLTS impacts achieved through geographical targeting by complementing with individually targeted interventions (Elbers, Fuji, Lamjouw, Ozler, & Yin, 2007). We know from the literature that the alleviation of liquidity constraints (through financial incentives, loans or subsidies) is important (see, for example, Yishay et al., 2017;Guiteras et al., 2015;Patil et al., 2014;Andres et al., 2020), as are implementation design choices, such as more intensive follow-up (Venkataramanan, Crocker, Karon, & Bartram, 2018;Augsburg et al., 2022).
In addition, there should be a focus on alternative approaches in wealthier communities where CLTS is ineffective, for example, via infrastructure investment and supply-side interventions.

Data availability
I have shared the data and code in the Attach Files step.
Notes: Pooled regression results using the Indonesian, Nigerian and Tanzanian samples. All specifications control for gender, age and age squared of the household head, as well as whether farming is the main economic activity of the household. District fixed effects are also included and errors are clustered at the level of the randomization unit.