Clean environments as a social norm: a field experiment on cigarette littering

Cigarette littering in public spaces is an environmental and aesthetic problem. Broken windows theory posits that visible signs of anti-social behavior such as littering create the perception of a social norm in built environments. Cigarette butts on the ground then encourage people to drop theirs as well. We test this theory on benches of a university campus in a field experiment with two treatments: (1) a clean environment with no cigarette butts on the ground and (2) a dirty environment with 25 cigarette butts on the ground. Our outcome variable is the number of additional cigarette butts on the ground after two hours. We find a small effect of approximately 0.5 fewer butts per 2-hour period on clean grounds. Increased cleaning efforts can thus reduce littering, but the effect is probably too small to justify additional cleaning costs.


Introduction
Human behavior in public spaces is at the core of many sustainability challenges. Among these challenges, littering is an aesthetic and an ecological problem (Dur and Vollaard 2015, Veitch et al 2017). Cigarette butt littering in particular has adverse environmental and health effects (Barnes 2011, Harris 2011, Healton et al 2011, Slaughter et al 2011). Several measures to reduce littering, including cigarette littering in public spaces, such as bans, stricter laws, regulation, and nudging, have not or only partly succeeded (Cingolani et al 2016, Reiter and Samuel 1980, Schneider et al 2011, Smith and Novotny 2011). Hospitality and educational venues, playgrounds, and bus stops in particular are hot spots of cigarette littering, creating a demand for more applied research (Valiente et al 2020).
Social norms are a strong driver of human behavior, and they can explain extensive and continued cigarette butt littering. People's perceptions of what is socially appropriate or inappropriate are highly predictive of actual behavior in abstract (Krupka and Weber 2013) and contextualized experiments (Loft et al 2019, Vesely and Klöckner 2018). In a built urban environment, the impact of social norms on behavior has been mostly discussed in relation to broken windows theory (Harcourt and Ludwig 2006, Volker 2017, Wilson and Kelling 1982). Broken windows theory posits that small cues of neighborhood deterioration can lead to anti-social behaviors, because the negative status quo is perceived as a dominant social norm. The theory thus suggests that a person is less likely to litter in a green and tidy environment (Dur and Vollaard 2015, Joo and Kwon 2015, Weaver 2015). Increased public cleaning efforts could then reduce littering by breaking the downward spiral of littering. In the long run, cleaning costs would decrease. Several studies in different contexts (Cialdini et al 1990, Crump et al 1977, Dur and Vollaard 2015, Finnie 1973, Geller et al 1977, Krauss et al 1978, Ramos and Torgler 2012, Reiter and Samuel 1980, Reno et al 1993) have investigated this issue in field experiments, most of which have found statistically significant and large effects (littering was cut in half in many instances). For example, Ramos and Torgler (2012) investigated the littering behavior of academics in a clean versus a messy shared indoor space, finding that more than 50% of subjects litter in the messy environment compared to only 18% in a clean environment. Cialdini et al (1990) find similar effect sizes in three different field experiments in outdoor environments. Only Crump et al (1977) and Reno et al (1993) find no or opposite effects. In a littered picnic area, Crump et al (1977) find less additional litter than in a clean picnic area used as a benchmark, and Reno et al (1993) find no effect of a clean environment at the parking lot of a public library. As summarized in Dur and Vollaard (2015), most studies recommend increased cleaning efforts to reduce public littering.
While the evidence for social norm effects on littering in general is overwhelming, there is no study that experimentally investigates this effect for cigarette littering. Cigarette littering is different from other kinds of littering, e.g. plastic bags or food leftovers. First, cigarette butts are small. Their visual and aesthetic effects could be perceived as less severe, as cigarette butts are not as disturbing as larger litter. Second, placing a cigarette butt into a proper bin requires more effort than disposing of other litter. A smoker has to put out the cigarette first, which is often done on the ground, and then has to pick up the butt from the ground. Third, smoking is a habitual behavior, with smokers being constantly confronted with the disposal of cigarette butts, implying a certain fatigue to environmental stimuli. To identify mechanisms to reduce cigarette littering, a clear understanding of the effect of a clean versus a littered environment is required.
We contribute to the empirical literature on littering in clean versus dirty environments with a field experiment on cigarette butt littering. We introduce treatments of a clean and a dirty environment around public benches on a university campus, a hot spot of cigarette butt littering (Valiente et al 2020). Our outcome variable is the number of additional cigarette butts on the ground. Our hypothesis is that a clean environment leads to less cigarette littering.
Our contribution is twofold. First, we augment the behavioral science literature on social norms by providing additional evidence from the field in a new context. It is important to replicate experiments in diverse contexts and to build a body of studies that can be used for meta-analyses (Christensen et al 2019). Second, our results are useful for planners and policy makers to identify strategies to reduce littering.

Experimental design
The experiment took place at benches at the urban campuses of the Technische Universität Berlin (approximately 33,000 students) and the adjacent Universität der Künste (approximately 4,000 students) in Berlin, Germany. Both campuses are accessible to the general public. Two treatments were applied. In both treatments, we first removed all cigarette butts from the ground. In treatment 1 (henceforth CLEAN), we left the ground clean. In treatment 2 (henceforth DIRTY), we placed exactly 25 cigarette butts within a 1.5 m radius in front of the bench. Our outcome variable was the number of additional cigarette butts on the ground after two hours. Every two hours, starting from 8:45 am, the ground in front of the benches was cleaned, and for the DIRTY treatment 25 cigarette butts were placed in front of the bench. After cleaning, the field assistants left the area to avoid experimenter demand effects. After two hours, the field assistants returned, counted the number of cigarette butts on the ground, and prepared the bench for the next round of data collection.
Our sample size allowed us to detect rather small effect sizes. For instance, for a difference of one cigarette and an assumed standard deviation of 2, the required sample size per treatment was 63 (for a two-sided independent t-test with α=0.05 and a power of 0.8). As we did not have any priors and could not make any distributional assumptions, we opted for a sample size of 100 per treatment. To achieve 200 observations, we planned to use ten benches with four observations per bench and day for a duration of five days. We used a balanced design to achieve orthogonality between treatments, benches, time of the day, and day of the week. Each bench was alternately assigned to one of the treatments. For example, bench 1 on day 1 was assigned to CLEAN in the first run (9-11 hours), to DIRTY in the second run, to CLEAN in the third run, and to DIRTY in the fourth run. On day 2, the order was reversed.
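The sample-size figure and the alternating assignment described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the sample size uses the textbook normal approximation (an exact t-based calculation may give a slightly larger number), and the rule that the starting treatment also alternates with the bench number is an assumption, since the text spells out the sequence for bench 1 only.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group, two-sided two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


def schedule(n_benches=10, n_days=5, n_runs=4):
    """Alternating CLEAN/DIRTY plan as (bench, day, run, treatment) tuples."""
    plan = []
    for bench in range(1, n_benches + 1):
        for day in range(1, n_days + 1):
            for run in range(1, n_runs + 1):
                # Bench 1, day 1, run 1 is CLEAN (as in the text). The
                # treatment flips with every run and the order reverses
                # each day. Flipping the start with the bench number as
                # well is an assumption made for this sketch.
                odd = (bench + day + run) % 2 == 1
                plan.append((bench, day, run, "CLEAN" if odd else "DIRTY"))
    return plan


if __name__ == "__main__":
    print(n_per_group(delta=1, sd=2))
    plan = schedule()
    print(len(plan), sum(t == "CLEAN" for *_, t in plan))
```

Under these assumptions the plan yields ten benches times five days times four runs, split evenly between the two treatments within every bench, day, and time slot.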
The experiment was carried out from Monday to Friday in the second week of June 2018. Initially, we selected ten benches, based on a combination of the 'diverse cases' and 'similar cases' principles (Seawright and Gerring 2008). The aim of the selection process was to cover as much heterogeneity in benches as possible while having enough observations for each type of bench to get statistically significant results. To fulfill the first criterion (diverse cases principle), we selected five benches that are as different from each other as possible in terms of location, congestion, and type of people visiting them. Note that there could be a tradeoff between aiming for heterogeneity, which serves broader coverage and validity, and the efficiency goal of achieving a small standard deviation. For the second criterion (similar cases principle), we looked for a similar bench for each bench that we had included. We only selected benches that were frequently used and feasible for data collection. For example, benches with a sandy ground were excluded, as cigarette butts easily disappear in the sand (see figure A1 for a map and pictures of selected benches).
During the experiment, some data points were lost. First, some benches were cleaned by the university staff during the data collection. We could not record the number of cigarette butts in these instances. Second, a children's festival took place at two benches for two hours, and people were not allowed to smoke during that time. Third, on windy days, the wind blew away the cigarette butts. The last issue was problematic as we could detect it only for the DIRTY treatment (if there were fewer than 25 cigarette butts when we recorded the data). Removing these cases would have led to an unbalanced design and an upward bias of the treatment effect. Instead of deleting these observations, we recoded them as zero, i.e., no additional cigarette butts. By doing so, we maintain orthogonality and avoid an upward bias of the treatment effect estimate (at the risk of a downward bias). In other words, the results are a lower bound and a conservative estimate of the treatment effect. During field work, it became apparent that the targeted number of 200 observations could not be achieved. To compensate for that, two benches (benches 11 and 12) were added during the field phase, resulting in a total of 12 benches. The two additional benches were included from day 3 onwards, and treatments were assigned with the same procedure. In total, 206 data points were observed and used for the final analysis.
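The recoding rule for wind-affected runs can be written down compactly. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the raw field record is the total number of butts found on the ground at the end of a run.

```python
PLACED_BUTTS = 25  # butts laid out at the start of each DIRTY run


def additional_butts(treatment, counted):
    """Outcome for one run: butts added during the two hours."""
    if treatment == "CLEAN":
        # The ground started empty, so every butt found is additional.
        return counted
    extra = counted - PLACED_BUTTS
    # Fewer butts than were placed signals wind loss. Following the text,
    # such runs are kept and recoded as zero additional butts.
    return max(extra, 0)


if __name__ == "__main__":
    for treatment, counted in [("CLEAN", 3), ("DIRTY", 27), ("DIRTY", 22)]:
        print(treatment, counted, additional_butts(treatment, counted))
```

Because wind loss is only detectable in DIRTY runs, the recoding can only lower the recorded DIRTY outcomes, which is why the text describes the resulting treatment effect as conservative.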

Results
Due to the loss of data points, the final outcomes are not fully balanced across benches, time of the day, and day of the week. Yet, we do not use balance tests on bench covariates, as they are a form of data-driven analysis (Ali et al 2015, Linden 2014). Due to the initially balanced and fully orthogonal design, we have no reason to believe that missing data points introduce any bias. Even if data points were missing systematically, the design would be very robust to this. In figure A2, we present graphs of the number of observations per treatment by bench, time of the day, and day. Figure 1 shows a bar chart with the number of cigarette butts on the ground by treatment. The frequency of zero butts is higher in the CLEAN treatment. Yet, in the CLEAN treatment there were three observations with 9 to 12 additional cigarette butts. Table 1 shows that the mean and median are lower, and the standard deviation is higher, in the CLEAN treatment. A two-sided t-test and a non-parametric Wilcoxon test reject the null hypothesis of equal means and equal distributions, respectively, at a 1% significance level. The difference between the means is approximately 0.6. Plots distinguishing between benches, day of the week, and time-of-day are presented in figure A2.
We estimate Poisson (count data) regression models with various fixed effects to adjust for bench, day, and time-of-day effects (Cameron and Trivedi 2013). The treatment variable (DIRTY) is 0 for the CLEAN treatment and 1 for the DIRTY treatment. The coefficient of DIRTY shows the effect of the treatment on the number of cigarette butts on the ground. In total, we estimate five models with different fixed effects (table 2). The first model (column 1 in table 2) includes all fixed effects (bench, day of the week, time-of-day). Models 2 to 4 include bench, day of the week, and time-of-day fixed effects, respectively. Model 5 has no fixed effects.
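For the model without fixed effects, the Poisson estimate has a simple closed form, which may help in reading the DIRTY coefficient. The sketch below assumes the usual log link (the text does not state the link function), uses a hypothetical function name, and runs on made-up counts rather than the study data. With an intercept and a single 0/1 regressor, the fitted means equal the two group means, so the coefficient is the log of their ratio.

```python
import math


def poisson_binary_mle(clean_counts, dirty_counts):
    """Closed-form Poisson MLE for log E[y] = b0 + b1 * DIRTY.

    Requires a positive mean in both groups.
    """
    mean_clean = sum(clean_counts) / len(clean_counts)
    mean_dirty = sum(dirty_counts) / len(dirty_counts)
    b0 = math.log(mean_clean)               # log expected count under CLEAN
    b1 = math.log(mean_dirty / mean_clean)  # log ratio DIRTY vs CLEAN
    return b0, b1


# Hypothetical counts for illustration only, NOT the study data.
clean = [0, 0, 1, 0, 2, 1, 0, 0]  # mean 0.5
dirty = [1, 0, 2, 1, 0, 3, 0, 1]  # mean 1.0
b0, b1 = poisson_binary_mle(clean, dirty)

if __name__ == "__main__":
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Models with bench, day, or time-of-day fixed effects have no such closed form and would be fitted numerically with a statistics package.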
The results of the regressions are in line with the comparisons of means and distributions. The estimates of the DIRTY coefficient range from 0.472 to 0.522 and are significantly different from zero at the 1% level. Most of the fixed effects are not statistically significantly different from zero. Day 2 and bench number 10 have a significant negative effect (5% level) of −0.47 and −2.0 on the number of cigarette butts on the ground, respectively. Benches 4 and 6 display a significant positive effect of 0.7 and 0.5, respectively. We have no explanation for these effects, and they may just indicate the idiosyncratic popularity of a bench among smokers or a higher frequency of use. Controlling for fixed effects does not change the treatment effect. We can reject the hypothesis that there is no effect of a CLEAN environment. The full model results are reported in table A1. As an additional robustness test, we estimate ordinary least squares (OLS) regression models with the same variables. The results are similar to those of the Poisson models and are reported in table A2.
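As a reading aid for the reported coefficients: if the Poisson models use the standard log link (an assumption, as the link function is not stated), exponentiating a coefficient gives the ratio of expected counts between DIRTY and CLEAN. The short sketch below applies this to the reported range only; the variable names are illustrative.

```python
import math

# Range of DIRTY coefficients reported in the text (table 2).
coef_low, coef_high = 0.472, 0.522

# Under a log link, exp(coefficient) is the ratio of expected counts
# (DIRTY relative to CLEAN).
ratio_low = math.exp(coef_low)
ratio_high = math.exp(coef_high)

if __name__ == "__main__":
    print(f"rate ratio between {ratio_low:.2f} and {ratio_high:.2f}")
```

On this reading, the expected number of additional butts would be roughly 1.6 to 1.7 times as high at a DIRTY bench as at a CLEAN one. This multiplicative view complements the absolute difference in means of approximately 0.6 butts per run.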

Discussion and conclusion
In this paper, we investigated the effect of a clean environment on cigarette butt littering. Based on broken windows theory, we argued that a clean environment can be perceived as a social norm. We conducted a field experiment at 12 benches of a university campus and randomly manipulated the number of cigarette butts on the ground. A clean environment reduced the number of cigarette butts on the ground by approximately 0.5 every two hours.
In contrast to previous studies, the positive effect of a clean environment was rather small. In other contexts, effects of a clean environment showed decreases in littering of more than 50%. Our results suggest that cigarette littering is substantially different from other forms of littering. Hence, implications differ in the specific case. To better understand the conditions under which broken windows theory holds, more studies in various contexts and further replications of experiments are required. A meta-analysis of existing results can help to identify the factors influencing broken windows theory.
The results of other papers often imply that more cleaning leads to a 'double dividend' and reduces littering substantially (Dur and Vollaard 2015). These papers end with the policy recommendation that cities should increase their budgets for cleaning. In our case, the effect of about 0.5 cigarette butts per two hours appears too small to justify additional cleaning. Cheap measures, such as signs (Krauss et al 1978, Reiter and Samuel 1980), more and better suited litter cans (Finnie 1973), and anti-littering and anti-smoking campaigns and information (Finnie 1973, Geller et al 1977), may be more effective. Increased cleaning should focus on the 'hot spots' of littering (Valiente et al 2020). In some contexts, small nudges have been powerful tools to increase pro-environmental behavior (e.g. Rommel et al 2015). In many countries, cigarette packages display deterring pictures to limit smoking (Hammond 2011). One could test the effect of normative messages on the negative environmental consequences of littering on the cigarette package or test the impact of messages next to the benches.
Our study is limited in several respects. First, we used only two treatments, with zero and with 25 cigarette butts on the ground. To better understand the broken windows effect, one would need to know more about intermediate levels and possible tipping points, such as the number of cigarette butts on the ground that would lead to a qualitative change in smokers' perception of the dominant social norm (Cialdini et al 1990). Second, we used a specific context (university campuses and benches). In other contexts, the effects may be very different. For example, it is likely that littering behavior depends on socio-demographic characteristics (e.g. Krauss et al 1978) and the type of area (Valiente et al 2020). In our study we cannot control for such variables, and it is likely that our sample is not representative of the city of Berlin. Third, we did not observe people directly. A direct observation of the people who litter would have allowed for a more extensive analysis of the determinants of littering. Several studies have found that littering behavior differs between socio-demographic groups. In contrast to those studies, we observed only the number of cigarette butts after two hours, without records of the people using the bench. Fourth, our results could be threatened by experimenter demand effects (Zizzo 2010). Before each run, field assistants cleaned the ground. We cannot fully rule out that potential smokers observed this, which might have affected their behavior. However, we believe that the effect, if any, is likely small, as we tried to avoid attention, selected time slots during teaching hours, and needed only approximately three minutes for cleaning and counting. Finally, our study lasted only one week. Potential long-term effects are not covered. It would be important to investigate how people change their behavior in the long run. Only one major study (Dur and Vollaard 2015) has so far investigated long-term effects.
Our experiment may serve as an inspiration for further experimentation. Extending the combination of sociological theory and behavioral economics in urban and landscape planning can help to address sustainability challenges (e.g. Lilley 2009). Interdisciplinary collaboration of experimental and behavioral economists with architects and planners is still rare, and there should be more of it (Klotz et al 2019). Small field studies such as ours are also well-suited to integrate students into interdisciplinary research activities early in their curriculum.