Introduction

Nearly three billion people, primarily in low-income countries, burn highly polluting solid fuels (biomass and coal) in inefficient and typically unvented traditional stoves for cooking and space heating [1]. The resulting household air pollution is the third leading risk factor for mortality and morbidity globally, responsible for an estimated 3.9 million premature deaths and 110 million disability-adjusted life-years each year [2••]. Household air pollution is the leading environmental risk factor worldwide, causing more premature deaths and morbidity than unimproved water and sanitation, radon, and lead combined. The household air pollution burden currently includes impacts due to ischemic heart disease, stroke, acute lower respiratory infections in children, chronic obstructive pulmonary disease, lung cancer, and cataracts [3]. Evidence for additional health impacts, such as tuberculosis, adverse birth outcomes, asthma, acute lower respiratory infections in adults, and non-lung cancers, is accumulating, but these outcomes have not yet been included in the Global Burden of Disease estimates [3]. Several recent reviews have summarized the literature on the health impacts of household air pollution [3–12].

There are two general lines of inquiry for epidemiologic studies of household air pollution. One line of inquiry attempts to further characterize the association between exposure to household air pollution and specific health endpoints. The main goals of this line of inquiry often include a better understanding of the contribution of household air pollution exposure to the pathophysiology of disease, characterizing the exposure-response function, identifying the most important components or constituents of household air pollution leading to these health effects, and identifying subgroups of the population that may be especially vulnerable to the effects of household air pollution. A second line of inquiry is to evaluate the potential health benefits associated with a specific intervention, such as deployment of new cookstoves, a specific fuel (e.g., liquefied petroleum gas (LPG)), or promoting change in user behavior.

Important gaps remain within both lines of inquiry. For example, with respect to the first line of inquiry, diseases such as tuberculosis, asthma, and adverse birth outcomes were not included in the 2010 Global Burden of Disease estimates due to a lack of consistent evidence [3]. Furthermore, although cardiovascular disease was included in the 2010 Global Burden of Disease for household air pollution, the estimates were based primarily on evidence from other sources of combustion-related pollution (e.g., ambient air pollution, secondhand tobacco smoke, and active smoking) rather than household air pollution from cookstoves [13]. Existing evidence specifically linking household air pollution with cardiovascular diseases is almost exclusively limited to indicators of risk (e.g., blood pressure) rather than risk of overt clinical events such as myocardial infarction, stroke, or cardiovascular mortality [3]. In regard to the second line of inquiry, cookstove intervention programs have had limited success. Few studies to date have demonstrated that household energy interventions can have a meaningful impact on household air pollution exposure and population health, often due to incomplete adoption of the intervention or the use of an intervention stove and/or fuel that does not reduce emissions to a sufficient degree to produce meaningful health benefits.

Funding agencies, policy makers, and stove dissemination programs often emphasize the need for randomized controlled trials in household air pollution research. For example, Allen et al. (2015) recently called for increased use of randomized studies in environmental health research [14•], and the National Institute of Environmental Health Sciences (NIEHS) recently released a concept clearance calling for large randomized trials of household air pollution interventions [15]. There is a common belief that causality cannot be established without evidence from robust randomized trials. Although a randomized trial is considered the strongest design in epidemiology because of its ability to reduce the impact of confounding, the optimal design for studies of household air pollution may vary depending on the research question being investigated, as described above. Indeed, in the context of household air pollution, randomized trials may frequently be impractical, unaffordable, and/or infeasible for certain health endpoints. Furthermore, randomized trials have important limitations that may outweigh their benefits, particularly for studies of household energy interventions, and may even lead to improper causal inference. Here, we discuss the advantages and limitations of randomized trials in the context of household air pollution and draw lessons from the literature on household water treatment. We then discuss alternative study designs and methods of analysis for household air pollution research. Finally, we emphasize that high-quality, quantitative exposure assessment leading to precise exposure-response characterization (i.e., information that is more transferable across locations and stove types) is an important contribution to the field regardless of study design, and that this consideration may be more important than the study design itself.

The Randomized Controlled Trial Design

The randomized controlled trial, typically considered the strongest study design in epidemiology, randomly allocates participants to different treatment groups. In studies of household air pollution, participants are often randomized into a “new stove/fuel” group and a control group that continues using traditional cooking methods. If the randomization is successful, the treatment groups will, on average, be similar with respect to all measured and unmeasured factors other than the assigned treatment. This random allocation allows the treatment(s) to be evaluated with less potential for residual confounding than in studies where treatments or exposures are not randomized (i.e., observational studies). The close relationships between household air pollution exposure, socioeconomic factors, and health are often difficult to disentangle completely via traditional statistical methods of adjusting for confounding; thus, randomized trials have become particularly appealing for studies of household air pollution. Random allocation of treatments can be done at the individual level or by cluster randomization (e.g., at the community level). The primary analysis of a randomized trial assesses the average impact of the intervention on the individual, household, or community units assigned to a particular treatment (i.e., intention-to-treat analysis) compared with the control group. A common secondary analysis evaluates the average impact among those who actually adhered to the treatment (i.e., as-treated analysis) compared with those who did not receive the treatment; this secondary analysis does not share the advantages of the intention-to-treat analysis with regard to confounding.

Because randomized trials evaluate the impact of specific interventions, they can provide direct information about the efficacy of these interventions, as opposed to the indirect evidence provided by observational studies evaluating the adverse health effects of exposure. Because of a range of constraints, however, the use of randomized trials for intervention evaluation is far less common in environmental epidemiology compared with other fields such as clinical medicine, pharmacology, and development economics [14•].
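To make the contrast between intention-to-treat and as-treated analyses concrete, the following Python sketch computes expected (not simulated) risks in a hypothetical trial where poorer households, which have a higher baseline risk, are less likely to adopt the new stove. All numbers, as well as the names `STRATA` and `TRUE_RR`, are illustrative assumptions rather than estimates from any study. The sketch also assumes no spillover to control households.

```python
# Hypothetical two-stratum population; every number here is illustrative.
STRATA = {
    # name: (population share, baseline risk with traditional stove, adherence if assigned)
    "poorer": (0.5, 0.30, 0.4),
    "less_poor": (0.5, 0.10, 0.8),
}
TRUE_RR = 0.6  # assumed risk ratio for households that actually use the new stove


def expected_risk_ratios():
    """Return (intention-to-treat RR, as-treated RR) from expected counts.

    Both arms have the same (unit) size, and nobody in the control arm
    uses the new stove (no spillover is assumed).
    """
    interv_cases = control_cases = 0.0
    user_mass = user_cases = 0.0
    nonuser_mass = nonuser_cases = 0.0
    for share, risk0, adherence in STRATA.values():
        treated_risk = risk0 * TRUE_RR
        # Intervention arm: adherers get the reduced risk, the rest keep baseline risk.
        interv_cases += share * (adherence * treated_risk + (1 - adherence) * risk0)
        control_cases += share * risk0
        # As-treated groups: users = adherers; non-users = non-adherers + all controls.
        user_mass += share * adherence
        user_cases += share * adherence * treated_risk
        nonuser_mass += share * (1 - adherence) + share
        nonuser_cases += share * (1 - adherence) * risk0 + share * risk0
    itt = interv_cases / control_cases  # equal arm sizes, so case totals give the risk ratio
    as_treated = (user_cases / user_mass) / (nonuser_cases / nonuser_mass)
    return itt, as_treated


itt_rr, as_treated_rr = expected_risk_ratios()
print(f"true RR among users:   {TRUE_RR:.2f}")        # 0.60
print(f"intention-to-treat RR: {itt_rr:.2f}")         # 0.80, diluted toward 1
print(f"as-treated RR:         {as_treated_rr:.2f}")  # 0.47, distorted by who adheres
```

Under these assumptions, non-adherence dilutes the intention-to-treat estimate toward the null. The as-treated comparison exaggerates the benefit because adherers are disproportionately the lower-risk households. Neither estimate recovers the assumed effect among users.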

Limitations of Randomized Trials in Household Air Pollution Research

Many of the known limitations of randomized trials are especially pertinent to research on household air pollution. Specifically, randomized trials can be subject to bias, raise ethical dilemmas, be difficult to interpret, and fail to provide the information needed for policy decisions. Important limitations of randomized trials, in household air pollution research and beyond, include the following:

  • Randomized trials are not well suited for evaluating the effects of exposure on the risk of diseases with long latencies or diseases that are relatively rare (e.g., lung cancer, tuberculosis, chronic obstructive pulmonary disease, cardiovascular diseases). In this setting, a randomized controlled trial would have to be impossibly long, large, and/or expensive to have sufficient statistical power.

  • Early biomarkers are available for some chronic diseases (e.g., blood pressure for cardiovascular diseases), which may decrease the time and sample size needed for randomized trials. However, most biomarkers are intrinsically more difficult to interpret than clinical disease outcomes and thus may have less direct policy relevance. There are also examples from the clinical literature in which findings from randomized studies of intermediate endpoints were not confirmed by subsequent studies of clinical events (e.g., [16, 17]).

  • Given the nature of the interventions, blinding participants to treatment assignment is nearly impossible, potentially causing problems with study retention, adherence to the assigned treatment, and information bias.

  • Blinding of investigators to treatment status, another goal among high-quality randomized trials, can also be difficult or impossible. This limitation is particularly important for subjective health measures (e.g., self-reported symptoms, quality of life) or for health outcomes that require referral for further diagnostics from the investigative team (e.g., a field team member recommends that a child obtain further evaluation from a physician for pneumonia at a clinic).

  • For various reasons, intervention studies in household air pollution are often plagued by low adoption and poor sustained use of the intervention stove or fuel. In the absence of complete adherence to the assigned treatment, the observed effects of the intervention may be attenuated in the intention-to-treat analysis, and alternative analytic approaches (e.g., the as-treated analysis) are susceptible to bias.

  • Spillover effects, or contamination effects, are often of concern. A randomized trial assumes that control homes are not affected by the intervention. This is a doubtful assumption for many studies of fuel and energy interventions given the shared access to markets, government aid, and other resources among households in the same village and villages in the same region. Furthermore, household randomization may be particularly problematic due to the typically close proximity of intervention and control households; air pollution emitted from a control household can penetrate into an intervention household and attenuate air pollution reductions achieved in the intervention households by the use of the new stove. Cluster randomization may be more realistic than individual-level randomization (household randomization), but this design is typically not as effective as individual-level randomization at reducing the impact of confounding, requires statistical methods to account for potential non-independence within the clusters, and is substantially more expensive [18].

  • There are important ethical considerations for randomized trials of household air pollution. Randomized trials in this field typically assign an intervention aimed at removing or reducing exposure rather than randomizing participants to a potentially harmful exposure or treatment. However, it is generally accepted that randomized trials are ethical only when there is remaining uncertainty about the effects of the intervention [14•, 19]. This standard may be met for some, but not all, health outcomes.

  • Most randomized trials by design evaluate efficacy—the impact of the intervention under ideal conditions—rather than effectiveness—the impact of the intervention under realistic conditions. For example, participants in a randomized trial may be provided with incentives to use the intervention stove as well as financial or educational support to use and maintain the stove. This may ultimately lead to different behaviors than in real-world scenarios and limit the study’s external validity. In addition to estimating the average health impact of the intervention, of great interest to policy makers are effectiveness trials that can also answer questions such as “Does the intervention work as intended? Who is likely to benefit most and who is likely to benefit least? How might the intervention be designed differently to improve its health impact?”

  • The results of a randomized trial in a specific location and population may not be generalizable to other geographic locations or populations. For example, a stove that shows meaningful health benefits in a specific village in Malawi would not likely be appropriate for populations in Central America, India, or China (or even in other villages in Africa or in Malawi itself). Furthermore, as with randomized trials in other fields, participants in trials often differ from the general population, and it is not clear that all populations will benefit similarly from household air pollution interventions [20].

  • Given the advances in stove and fuel technology and community infrastructure over time (e.g., the availability of natural gas or electricity), by the time the results of a randomized trial are published, the situation and set of available interventions may have already changed.
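Several of these limitations can be quantified with standard sample-size arithmetic. The Python sketch below uses three pieces. The first is the usual normal-approximation formula for comparing two proportions. The second is a simple dilution model for partial adherence. The third is the conventional design effect, 1 + (m - 1) × ICC, for clusters of size m with intraclass correlation ICC. The risks, adherence level, cluster size, and ICC are hypothetical values chosen for illustration. The helper names `n_per_arm`, `itt_risk`, and `design_effect` are also illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_intervention, alpha=0.05, power=0.80):
    """Individuals per arm to detect a difference in risk between two arms
    (normal-approximation formula, two-sided test, unpooled variance)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_intervention * (1 - p_intervention)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_intervention) ** 2)


def itt_risk(p_control, p_users, adherence):
    """Expected intervention-arm risk when only a fraction `adherence` actually
    uses the new stove (assumes no spillover and adherence unrelated to risk)."""
    return adherence * p_users + (1 - adherence) * p_control


def design_effect(cluster_size, icc):
    """Variance inflation for cluster randomization: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical scenarios, each targeting a 20 % relative reduction among users.
n_common = n_per_arm(0.20, 0.16)        # frequent outcome (20 % risk)
n_rare = n_per_arm(0.002, 0.0016)       # rare outcome (0.2 % risk)
n_half_adherence = n_per_arm(0.20, itt_risk(0.20, 0.16, adherence=0.5))
n_clustered = ceil(n_common * design_effect(cluster_size=20, icc=0.05))

print("common outcome:", n_common)
print("rare outcome:", n_rare)
print("common outcome, 50 % adherence:", n_half_adherence)
print("common outcome, cluster randomized:", n_clustered)
```

With these assumed inputs, the rare outcome requires more than 100 times as many participants per arm as the common one. Halving adherence roughly quadruples the required sample size, because the detectable difference shrinks in proportion to adherence and sample size scales with its inverse square.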

For further illustration of the potential limitations of randomized trials in household air pollution research, consider the only randomized trial of household air pollution published to date, the Randomized Exposure Study of Pollution Indoors and Respiratory Effects (RESPIRE) trial in Guatemala [21••]. In the intention-to-treat analysis, the intervention was a well-liked and well-used chimney woodstove, compared with open wood cookstoves. It produced a 22 % reduction in the primary outcome of all physician-diagnosed childhood pneumonia that fell just short of statistical significance (relative risk [RR] = 0.78; 95 % confidence interval [CI] 0.59–1.06). It also produced stronger, statistically significant reductions (33–46 %) in three secondary outcomes of severe childhood pneumonia.
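As a small worked check on the RESPIRE figures, a risk ratio converts to a percent change in risk as (RR - 1) × 100. The point estimate of 0.78 therefore corresponds to a 22 % reduction. Because the 95 % CI spans 1, the result is not statistically significant at the conventional 5 % level. The helper name `percent_change` is illustrative.

```python
def percent_change(rr):
    """Percent change in risk implied by a risk ratio (negative = reduction)."""
    return (rr - 1.0) * 100.0


rr, ci_low, ci_high = 0.78, 0.59, 1.06  # RESPIRE intention-to-treat estimate as quoted
print(f"point estimate: {percent_change(rr):+.0f}%")                                  # -22%
print(f"95% CI: {percent_change(ci_low):+.0f}% to {percent_change(ci_high):+.0f}%")  # -41% to +6%
print("CI includes RR = 1:", ci_low <= 1.0 <= ci_high)                               # True
```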

Because RESPIRE was an efficacy trial, households were encouraged to use the intervention stoves, which were repaired and maintained for the participants as needed. As a result, the intention-to-treat analysis by itself does not indicate whether this stove would work as a widespread intervention against childhood pneumonia even in this specific population in Guatemala. Still less does it indicate whether the stove would transfer successfully to a different population in Central America or elsewhere in the world. At best, the stove was idiosyncratic to this population; at worst, it was idiosyncratic to this research project. Even if the RESPIRE chimney stove had performed exceedingly well in its population, one could not know whether it would be used, or be as successful, anywhere else.

If the causal link between household air pollution and childhood pneumonia were still in doubt, such a study might have some benefit. Indeed, this was the initial justification behind RESPIRE, which was planned some 17 years before it was funded in 2001, during a period in which there were still doubts about the health risks of household air pollution. By the time of the major RESPIRE intention-to-treat publication [21••], however, this point was no longer in much contention. Instead, the question had become how much health protection could be achieved by feasible interventions for broad populations. A small randomized trial of a local intervention in a particular population inherently cannot address that question.

Fortunately, RESPIRE also conducted extensive personal quantitative exposure assessment of the children and women participating in the study, producing striking exposure-response relationships for several types of acute lower respiratory infection [21••]. It is this exposure-response relationship that was combined with the results of studies of ambient air pollution and secondhand tobacco smoke to create the exposure-response for acute lower respiratory infection used in the Global Burden of Disease estimates [2••, 3, 13]. Furthermore, the exposure-response analysis from the RESPIRE study suggested that the greatest benefits may occur at the lower end of the exposure spectrum (below the average exposure achieved by the intervention stove) [21••].
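The observation that benefits may be greatest at the lower end of the exposure range is a statement about the shape of the exposure-response curve. The Python sketch below contrasts two hypothetical functional forms. Under a log-linear model, a given absolute reduction in exposure yields the same risk ratio anywhere on the curve. Under a concave power model, the same reduction yields a larger benefit at low exposure. The coefficients `BETA` and `GAMMA` and the exposure values are arbitrary illustrative assumptions, not the RESPIRE or Global Burden of Disease estimates.

```python
import math


def rr_log_linear(x_new, x_old, beta):
    """Log-linear model: RR = exp(beta * (x_new - x_old)).
    Depends only on the absolute change in exposure."""
    return math.exp(beta * (x_new - x_old))


def rr_power(x_new, x_old, gamma):
    """Power (log-log) model: RR = (x_new / x_old) ** gamma.
    Concave in exposure for 0 < gamma < 1, so it is steepest at low exposure."""
    return (x_new / x_old) ** gamma


BETA, GAMMA = 0.002, 0.3  # hypothetical coefficients
# The same 50-unit reduction applied at the high and at the low end of the range.
for label, (x_old, x_new) in {"high end": (300, 250), "low end": (100, 50)}.items():
    print(f"{label}: log-linear RR = {rr_log_linear(x_new, x_old, BETA):.3f}, "
          f"power RR = {rr_power(x_new, x_old, GAMMA):.3f}")
```

Under the log-linear form, both reductions give a risk ratio of about 0.90. Under the power form, the low-end reduction gives about 0.81 versus about 0.95 at the high end. Only exposure-response data, not a single intervention contrast, can distinguish the two forms.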

In the context of randomized trials for drugs or vaccines, a potential intervention does not reach the randomized trial stage unless a highly structured process has shown that it has a high probability of working in large populations. The idea is that if the expensive randomized trial works well, the drug or vaccine can then be rolled out to millions of people with some confidence. In contrast, many household air pollution randomized trials and intervention studies use technologies that have not been shown in any structured way to accomplish much in the study population, let alone more broadly.

Based on the RESPIRE experience, the most useful approach, then, would be to incorporate measurements of air pollution exposure into randomized trials of household energy interventions (see Clark et al. [22•] for further discussion of exposure assessment in household air pollution studies). Because any intervention requires behavioral changes, which occur at different rates within a population, it is rare for everyone to adopt the intervention at the same moment. In household air pollution, this phenomenon is termed “stacking”; i.e., some households continue to use traditional polluting fuels even while partly taking up the intervention [23]. In theory, this scenario is no different from people failing to take a drug properly (non-compliance), which must be factored into any assessment of drug effectiveness. In the case of household air pollution, however, it is probably a larger problem and thus must be addressed more robustly in any household air pollution study, randomized or not.
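Why stacking matters for exposure can be illustrated with a simple time-weighted mixing model. In the Python sketch below, a household's daily exposure is a background level plus contributions from cooking on the traditional and the intervention stove, weighted by the fraction of cooking done on each. The function name `daily_exposure` and all concentration values are hypothetical and serve only to show the arithmetic.

```python
def daily_exposure(frac_traditional, background=30.0,
                   traditional_increment=300.0, clean_increment=20.0):
    """Hypothetical daily mean exposure (arbitrary units) when a fraction
    `frac_traditional` of cooking is still done on the traditional stove."""
    if not 0.0 <= frac_traditional <= 1.0:
        raise ValueError("frac_traditional must be between 0 and 1")
    return (background
            + frac_traditional * traditional_increment
            + (1.0 - frac_traditional) * clean_increment)


for frac in (1.0, 0.5, 0.25, 0.0):
    print(f"{frac:.0%} traditional cooking -> exposure {daily_exposure(frac):.0f}")
```

With these assumed values, full adoption gives an exposure of 50. Doing a quarter of the cooking on the traditional stove still leaves exposure at 120, more than twice the full-adoption level. Exposure therefore needs to be measured rather than inferred from stove assignment.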

It is important to note that exposure assessment for household air pollution also has important gaps that will need to be addressed regardless of the study design used. More specifically, there are currently no well-developed long-term metrics of exposure. Simple exposure indicators, such as self-reported years of cooking with a fuel, have provided valuable information in household air pollution studies. However, they cannot account for the high degree of uncertainty and variability in exposures, which limits their ability to detect changes in health associated with exposure differences. A quantitative exposure metric analogous to pack-years in smoking that could be applied in these settings would be a valuable tool. Given the wide variation in stoves, fuels, cooking patterns, and household arrangements, however, development and validation of such a metric seem unlikely. Also, unlike pack-years, even a metric validated in one population likely could not be transferred to another population without extensive additional validation.
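For comparison, pack-years are simply packs smoked per day multiplied by years smoked. A household air pollution analogue would replace packs per day with an average exposure concentration for each period of a person's cooking history, as in the hypothetical Python sketch below. The metric, the name `exposure_years`, and the numbers are illustrative only. The arithmetic is trivial; the difficulty described above lies in obtaining credible, transferable concentrations for each period.

```python
def pack_years(packs_per_day, years):
    """Standard smoking metric: packs smoked per day multiplied by years smoked."""
    return packs_per_day * years


def exposure_years(periods):
    """Hypothetical household air pollution analogue: the sum over cooking
    periods of (mean exposure concentration x years spent in that period)."""
    return sum(concentration * years for concentration, years in periods)


print(pack_years(0.5, 30))  # 15.0

# Hypothetical cooking history as (mean concentration, years): open fire,
# then a partially adopted improved stove, then mostly clean fuel.
history = [(300.0, 10), (120.0, 5), (50.0, 15)]
print(exposure_years(history))  # 4350.0
```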

Lessons from Studies of Household Water Treatment

Various household water treatment methods have been promoted to reduce the microbial content of drinking water and prevent diarrhea in low-income settings. A number of comparisons can be drawn between household energy and household water treatment interventions: both aim to reduce pervasive environmental risks, and both are usually disseminated to homes in low-income countries. Laboratory studies of efficacy suggested that these interventions could improve health. Despite this promise, none has achieved global scale-up or successfully reduced environmental exposures at the population level, owing to a complex set of technological, behavioral, and economic barriers [24, 25]. This failed promise has led to calls for more rigorous, systematic assessment of both water quality and energy interventions under conditions of actual use, including the use of randomized trials [26–28]. In sharp contrast to the one completed randomized trial for household air pollution [21••], at least 30 randomized trials have assessed the effectiveness of different household water treatment methods in improving water quality and health [29]. These trials are diverse in their geographic regions and study design choices (e.g., blinded vs. unblinded, household vs. village randomization). They thus provide a rich literature base from which to evaluate the use of different randomized trial designs for assessing environmental interventions. Key lessons from this literature that apply to the setting of household energy include the challenges of blinding participants to interventions, the importance of objective outcomes, and the potential importance of spillover of the intervention and placebo from one household to another.

Thus, the extensive experience in the field of household water treatment suggests that randomized trials of pervasive environmental exposures provide important information but that the limitations of randomized trials may compel the use of additional, complementary study designs and research questions to fully characterize both efficacy and effectiveness of the interventions.

Alternative Approaches

Given the above challenges to the design, conduct, and interpretation of randomized trials in the setting of household air pollution, it is perhaps surprising that recent calls for more research have focused almost solely on conducting additional randomized trials, sometimes to the exclusion of other epidemiologic approaches. Nonetheless, a growing body of evidence from observational studies is helping to clarify and quantify the adverse health effects of household air pollution. Among observational studies, cross-sectional studies are perhaps the most abundant. In this design, the outcome and exposure are assessed at the same time. Although such studies are typically vulnerable to reverse causation, they have been useful as an early step in the research field. They have provided preliminary evidence for a number of important hypotheses that need to be evaluated in more detail subsequently. For example, a number of studies have considered the cross-sectional association between household air pollution exposure and either disease biomarkers (e.g., C-reactive protein) or prevalent disease (e.g., existing cardiovascular disease). In addition to the potential for reverse causation, these studies are particularly susceptible to confounding because inferences are drawn by comparing exposure and outcome across individuals.

Repeated-measures studies examine the change in outcome over time in association with changes in exposure over time. They are typically not susceptible to reverse causation and may also be less susceptible to confounding by between-person differences, since the focus of the inference is on within-person changes. In the setting of household air pollution, one challenge with repeated-measures studies is ensuring sufficient within-person differences in exposure levels over the period of observation. One approach to fostering larger within-person differences in exposure over time has been to provide participants with a non-randomized intervention aimed at lowering household air pollution exposures. These studies are often referred to as “pre-post” studies. Importantly, the aim of this (non-randomized) intervention in a repeated-measures study is fundamentally different from the purpose of the intervention in a typical randomized trial. In the former, the intervention alters exposure levels to enhance within-person differences in exposure over time; in the latter, the study specifically tests the hypothesized benefits of the intervention. Additionally, these types of studies are often limited to endpoints that can change meaningfully over the proposed time period of the study.
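The logic of within-person comparison can be shown with a small deterministic example. In the Python sketch below, each person has a hypothetical baseline outcome level that rises with exposure. This baseline acts as a between-person confounder, for example one related to socioeconomic status. The outcome also depends on exposure with an assumed true slope, `TRUE_SLOPE`. All data are invented for illustration, and the helper `ols_slope` is a plain least-squares slope.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (model with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


TRUE_SLOPE = 0.05                    # assumed effect of exposure on the outcome
baseline = [110, 115, 120, 125]      # person-specific level, rising with exposure (confounder)
exposure_v1 = [100, 200, 300, 400]   # visit 1 exposure
exposure_v2 = [60, 150, 180, 300]    # visit 2, after a non-randomized intervention

outcome_v1 = [b + TRUE_SLOPE * x for b, x in zip(baseline, exposure_v1)]
outcome_v2 = [b + TRUE_SLOPE * x for b, x in zip(baseline, exposure_v2)]

# Between-person (cross-sectional) comparison at visit 1.
cross_sectional = ols_slope(exposure_v1, outcome_v1)

# Within-person comparison: change in outcome vs change in exposure.
d_exposure = [x2 - x1 for x1, x2 in zip(exposure_v1, exposure_v2)]
d_outcome = [y2 - y1 for y1, y2 in zip(outcome_v1, outcome_v2)]
within_person = ols_slope(d_exposure, d_outcome)

print(f"cross-sectional slope: {cross_sectional:.3f}")  # 0.100
print(f"within-person slope:   {within_person:.3f}")    # 0.050
```

At a single visit, the slope absorbs the confounder and is doubled in this constructed example. Differencing removes the time-invariant baseline and recovers the assumed slope, but only if the confounder does not itself change between visits.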

Studies of continuous biomarkers of disease are useful for elucidating the pathophysiological and often subclinical effects of exposure. However, their results are generally not well suited to quantifying the local or global burden of disease or to motivating local policy changes. Those purposes require clinical events such as disease incidence and/or mortality, and for them the gold standard observational design is the prospective cohort study. In this design, individuals who are free of the disease of interest at enrollment and who have different exposure patterns are followed longitudinally for new onset of disease. The major challenges of this study design are twofold. Investigators must carefully assess between-person differences that may be associated with both exposure and risk of the outcome (i.e., confounding). They may also need to observe a large population to accumulate a sufficient number of disease events in a reasonable period of time. Notwithstanding these challenges, well-conducted prospective cohort studies can provide a wealth of information about the relationship between exposure and disease outcomes. Such information is key (and currently largely unavailable for many important disease outcomes) to quantifying the burden of disease from household air pollution, characterizing the exposure-response relationship, and identifying subgroups of the population that might be particularly susceptible to these effects.
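In a prospective cohort, the basic measure of association for new disease onset is often the incidence rate ratio, computed from events and person-time in each exposure group. The Python sketch below uses hypothetical counts and the standard Wald confidence interval on the log scale. It ignores confounder adjustment, which in practice would be handled with, for example, Poisson or Cox regression.

```python
import math


def rate_ratio(events_exp, py_exp, events_unexp, py_unexp, z=1.96):
    """Incidence rate ratio with a Wald 95 % CI computed on the log scale."""
    irr = (events_exp / py_exp) / (events_unexp / py_unexp)
    se_log = math.sqrt(1 / events_exp + 1 / events_unexp)
    return irr, irr * math.exp(-z * se_log), irr * math.exp(z * se_log)


# Hypothetical cohort: events and person-years in higher vs lower exposure groups.
irr, lo, hi = rate_ratio(events_exp=60, py_exp=12_000, events_unexp=30, py_unexp=15_000)
print(f"IRR = {irr:.2f} (95 % CI {lo:.2f} to {hi:.2f})")
```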

Case-control studies offer an efficient alternative to traditional prospective cohort studies. In the absence of bias, the associational estimate derived from a case-control study will be, on average, equivalent to the estimate that would have been obtained from a cohort study in the same population, but with wider CIs. The challenge in case-control studies is to sample the controls in such a way that they provide an unbiased estimate of the distribution of exposure among the study base (the population or person-time that gave rise to the cases). Notwithstanding this challenge, a case-control study that is well designed and executed can provide substantial and valuable new information about the association between exposure and disease incidence at a fraction of the time and cost of a cohort study.
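In a case-control study, the corresponding measure is the odds ratio, computed from the exposure distributions of cases and of controls sampled from the study base. The Python sketch below computes it with a Woolf (log-scale) confidence interval from a hypothetical 2 × 2 table. The counts are invented, and a real analysis would also adjust for confounders, for example with logistic regression.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95 % CI for a 2 x 2 table:
    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, or_ * math.exp(-z * se_log), or_ * math.exp(z * se_log)


# Hypothetical counts of cases and of controls sampled from the study base.
or_, lo, hi = odds_ratio(a=80, b=40, c=100, d=140)
print(f"OR = {or_:.2f} (95 % CI {lo:.2f} to {hi:.2f})")
```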

Observational epidemiologic studies are often criticized as vulnerable to confounding and selection bias. In the case of household air pollution, populations using various types of stoves or fuels are often inherently different with regard to poverty-related characteristics and other factors. The close links between socioeconomic status, fuel and energy use patterns, and health outcomes often make the confounding nearly intractable. For example, users of higher-priced fuels tend to be of higher socioeconomic status and often more urbanized than users of traditional biomass and coal fuels. Similarly, stove distribution programs from governmental or non-governmental organizations will not likely result in random allocation to households (e.g., development projects may distribute stoves preferentially in poorer households). This type of non-random distribution can bias estimates of the relationship between stove use (or household air pollution exposure) and health.

These scenarios of non-random intervention allocation often lead to calls for randomized stove and/or fuel trials. However, methods not traditionally used in household air pollution epidemiology, such as propensity score analysis, may serve as a viable alternative for deriving causal inference from non-randomized observational studies [30]. Propensity score analysis allows for a better understanding of how the various stove/fuel groups differed before stove allocation was conducted, and this information can then be used to estimate the relationship between stove/fuel use and health. Other analytic approaches may also serve to approximate evidence from randomized trials in scenarios typically encountered in household air pollution research. Marginal structural models can be used when time-varying confounding is likely to have occurred [31]. Difference-in-differences matching may be appropriate when certain characteristics are correlated with covariates but unobservable to the investigator [32]. However, meeting the assumptions necessary to interpret an observational study estimate as a causal effect may prove difficult within the household air pollution field and will need further evaluation.
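To illustrate how a propensity score can be used, the Python sketch below applies inverse probability of treatment weighting, one of several propensity-score methods alongside matching and stratification. The data are hypothetical aggregated counts in which stoves were distributed preferentially to poorer households. With a single binary confounder, the propensity score is simply the proportion of households in each stratum that received a stove. The counts and names are invented. The approach relies on the usual assumptions of no unmeasured confounding and positivity, meaning every stratum has both recipients and non-recipients; otherwise the weights are undefined.

```python
# Hypothetical aggregated data: (stratum, received_stove, households, cases).
# Stoves went preferentially to poorer households, who also have higher risk.
CELLS = [
    ("poorer", 1, 400, 72),
    ("poorer", 0, 200, 60),
    ("less_poor", 1, 100, 6),
    ("less_poor", 0, 300, 30),
]


def propensity_scores(cells):
    """Saturated propensity model: P(stove | stratum) = share of the stratum given a stove."""
    totals, treated = {}, {}
    for stratum, stove, n, _ in cells:
        totals[stratum] = totals.get(stratum, 0) + n
        if stove:
            treated[stratum] = treated.get(stratum, 0) + n
    return {s: treated.get(s, 0) / totals[s] for s in totals}


def risk_ratio(cells, weighted):
    """Risk ratio (stove vs no stove), optionally with inverse probability weights."""
    ps = propensity_scores(cells)
    weight_sum = {0: 0.0, 1: 0.0}
    weighted_cases = {0: 0.0, 1: 0.0}
    for stratum, stove, n, cases in cells:
        if weighted:
            # Stove recipients are weighted by 1/e, non-recipients by 1/(1 - e).
            w = 1 / ps[stratum] if stove else 1 / (1 - ps[stratum])
        else:
            w = 1.0
        weight_sum[stove] += w * n
        weighted_cases[stove] += w * cases
    return (weighted_cases[1] / weight_sum[1]) / (weighted_cases[0] / weight_sum[0])


print(f"crude RR: {risk_ratio(CELLS, weighted=False):.2f}")  # 0.87
print(f"IPW RR:   {risk_ratio(CELLS, weighted=True):.2f}")   # 0.60
```

In this constructed example, the crude risk ratio understates the assumed stove effect of 0.6 because stove recipients were poorer. The weighted estimate recovers the assumed effect by reweighting both groups to the covariate distribution of the whole population.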

These alternative approaches can lead to estimates that are more easily generalizable, more relevant to policy makers, and potentially less biased than estimates from randomized trials. Suppose a large percentage of participants assigned to a specific stove/fuel treatment do not comply, in ways unknown to the investigators. Such a randomized trial may lead to improper causal inference compared with a well-conducted observational study in which the reasons for the initial allocation of the stove or fuel are well understood [33], particularly when linked to exposure assessment. Sustained adoption is increasingly recognized in the field as critical to improving health. Nevertheless, we have limited information regarding the reasons for non-adoption of household energy interventions and for stove stacking [34]. We also have limited information regarding approaches that can improve adoption and result in sustained, appropriate use of a cookstove or fuel intervention [34]. Therefore, until our knowledge advances in this area, randomized trials will remain subject to these limitations resulting from non-compliance.

Conclusions

We are not the first investigators to question an emphasis on randomized trials for household air pollution interventions [35] or the use of randomized trials more generally (e.g., [36–38]). However, the global importance of household air pollution, together with potential future funding plans from the National Institutes of Health [15], warrants revisiting this issue. Randomized trials can provide strong evidence of causality and of the efficacy of a specific intervention applied to a selected population. Indeed, strategies can be employed to at least partially alleviate some of the limitations of randomized trials. Examples include incorporating quantitative exposure assessment and making design choices such as objective measurements, staggered intervention rollout, and interventions that have already demonstrated an acceptable level of exposure reduction. Nonetheless, the limitations of the randomized trial should be weighed against other study design options and the questions of interest to policy makers.

As the examples from the household water treatment literature show, the limitations of randomized trials are not unique to household air pollution. Compared with contaminated water, however, contaminated air produces a much larger share of its effects as chronic diseases, which, by their nature, are not suited to randomized trials. Therefore, as with other major categories of combustion smoke exposure (i.e., ambient air pollution, secondhand tobacco smoke, and active smoking), other research designs must be used. It is important to note that no randomized trials have been conducted for ambient air pollution, secondhand smoke, or active smoking, and probably none ever will be. Robust observational epidemiology has nonetheless provided evidence more than sufficient to drive policy for improving health.

Careful study design, high-quality measurements, and appropriate analysis methods can greatly strengthen the validity of results from observational designs. A variety of study designs may be able to provide information to fill the existing knowledge gaps. To do so, they must include quantitative, high-quality exposure and health endpoint measurements and must appropriately track and measure stove use and other potentially important confounders over time. These recommendations hold true even for randomized trials. Relying on a randomized trial alone (i.e., an intention-to-treat analysis) essentially forgoes the opportunity to conduct an even more valuable analysis: exposure-response. For many other important risk factors, including environmental ones such as unclean water and poor sanitation, investigators have been unable to conduct exposure-response studies because exposure metrics are lacking. Air pollution, by contrast, has well-developed, standardized, and validated exposure metrics that can be translated across the world. The household air pollution community should exploit this considerable advantage at every opportunity and encourage even better technologies and protocols for doing so in developing-country household settings.

The advantages of including exposure assessment in randomized trials were clearly demonstrated in the RESPIRE study. Unlike a particular stove at a particular place and time, exposure measurements are transferable from one place to another, assuming that susceptibility is similar across locations. Indeed, we have not conducted health effect studies of outdoor air pollution in every location in the USA, for example. We nevertheless control outdoor air pollution based on standards derived from exposure-response studies done in other locations. Given the enormity of the public health burden due to household air pollution and the limited resources available, it is important not to prioritize study design ahead of the other aspects of design that will allow us to answer the most policy-relevant questions.