Does Deforestation Increase Malaria Prevalence? Evidence from Satellite Data and Health Surveys

Deforestation has been found to increase malaria risk in some settings, and a growing number of studies have found that deforestation increases malaria prevalence in humans, suggesting that in some cases forest conservation might belong in a portfolio of anti-malarial interventions. However, previous studies of deforestation and malaria prevalence were based on a small number of countries and observations, commonly using cross-sectional analyses of less-than-ideal forest data at the aggregate jurisdictional level. In this paper we combine 14 years of high-resolution satellite data on forest loss with individual-level survey data on malaria in more than 60,000 rural children in 17 countries in Africa, and fever in more than 470,000 rural children in 41 countries in Latin America, Africa, and Asia. Adhering to methods that we pre-specified in a pre-analysis plan, we tested ex-ante hypotheses derived from previous literature. We did not find that deforestation increased malaria prevalence, nor that intermediate levels of forest cover had higher malaria prevalence. Our findings differ from most previous empirical studies, which found deforestation to be associated with greater malaria prevalence in other contexts. We speculate that this difference may be because deforestation in Africa is largely driven by the slow expansion of subsistence or smallholder agriculture for domestic use by long-time residents in stable socio-economic settings rather than by rapid clearing for market-driven agricultural exports by new frontier migrants as in Latin America and Asia. Our results imply that at least in Africa anti-malarial efforts should focus on other proven interventions such as bed nets, spraying, and housing improvements. Forest conservation efforts should focus on securing other benefits of forests, including carbon storage, biodiversity habitat, clean water provision, and other goods and services.

Sebastian Bauhoff and Jonah Busch


Introduction
Malaria incidence fell by 41 percent globally between 2000 and 2015 (WHO, 2016). Yet malaria remains a substantial health burden in many low- and middle-income countries, with 212 million cases and 429,000 deaths worldwide in 2015 (WHO, 2016). This public health burden motivates the need for research on effective and cost-effective anti-malarial interventions. Proven anti-malarial interventions include insecticide-treated bed nets, indoor residual spraying, and prompt clinical treatment (Bhatt et al., 2015), as well as environmental management (e.g., drainage and canal lining) and modified human habitation (e.g., house siting and design) (Keiser et al., 2006).
The loss of natural forest cover ("deforestation") has been found to increase the density of malaria-transmitting mosquitoes and increase malaria risk in some settings (e.g., Vittor et al., 2009), suggesting that in some circumstances forest conservation might belong in a portfolio of anti-malaria interventions. However, research is needed to establish whether and when deforestation increases the proportion of people with malaria ("prevalence"), and whether and when forest conservation would be a cost-effective anti-malarial strategy relative to other interventions.
Deforestation has been found to lead to heightened malaria risk through both ecological and socio-economic mechanisms. Relative to forests, deforested lands have higher temperatures (Lindblade, 2000), more sunlight (Yasuoka and Levins, 2007), and more standing water (Patz et al., 2000), resulting in accelerated life cycles (Afrane et al., 2005), faster pupation and growth rates (Munga et al., 2006), longer survival time (Zhong et al., 2016), and higher biting rates (Petney, 2001; Vittor et al., 2006) of malaria-transmitting mosquitoes. Relative to forests, cleared lands also have fewer insectivores, more species competing for ecological niche, and arguably fewer "dead-end hosts" to dilute malaria (LaPorta et al., 2013; Wood et al., 2014). Additionally, "frontier malaria" can result from unstable socio-economic conditions commonly associated with deforestation, including rapid in-migration, new human exposure (e.g., Friedrich, 2016) and low immunity, poor housing quality, and sparse availability of health services (de Castro et al., 2006). Singer and de Castro (2006) suggest that these "frontier malaria" effects last for 6-8 years.
There is heterogeneity in the strength and direction of these effects. Different regions have different dominant Anopheles mosquito species (Kiszewski et al., 2004) and deforested areas may be favored by some mosquito species but not others (Kar et al., 2014; Burkett-Cadena and Vittor, 2017). Deforestation is canonically considered to lead to an increase in malaria-transmitting mosquitoes in Africa and Latin America but a decrease in Asia (Guerra et al., 2006). Small clearings can increase malaria by producing land cover more hospitable to larvae, while large clear cuts can decrease larvae (Singer and de Castro, 2006). While primary forest might suppress mosquitoes, secondary regrowth might encourage them (Vittor et al., 2009). And early stages of frontier settlement can have larger effects than later stages (de Castro et al., 2006; Baeza et al., 2017). All of which is to say, "the linkage between deforestation…and malaria transmission is a subtle process requiring analysis at several temporal and spatial scales" (Singer and de Castro, 2006). A recent systematic review of studies of deforestation and malaria risk in the Brazilian Amazon "failed to find overwhelming evidence supporting a consistent simple and straightforward relationship between forests, deforestation rate, and malaria" (Tucker Lima et al., 2017).
Furthermore, many other factors besides deforestation affect malaria prevalence in humans. Climatic variables that influence malaria include temperature, precipitation (Beck-Johnson et al., 2013; Mordecai et al., 2013; Parham and Michael, 2010), and seasonality (Hay et al., 1998). Demographically, children under five are more vulnerable to malaria than adults and suffer more than two-thirds of deaths from malaria (WHO, 2016). Differential access to health facilities means that malaria prevalence may be higher in areas that are poor or remote, two factors that more often than not are also correlated with higher forest cover and lower deforestation rates (Busch and Ferretti-Gallon, 2017). And people can engage in avoidance behaviors such as installing bed nets and window screens, reducing standing water, and spraying to repel mosquitoes. Thus, even though there are established ecological and socio-economic links between deforestation and malaria mosquito incidence and transmission, ecological nuance and human mediating factors mean that lower rates of deforestation might not always lead to lower malaria prevalence in humans in practice. We present a conceptual model of the coupled human and natural system of deforestation and malaria in Figure 1.
A recent strand of empirical literature has tested the hypothesis that deforestation increases malaria prevalence in humans using multivariate econometric tests (Wayant et al., 2010; Pattanayak et al., 2010; Olson et al., 2010; Valle and Clark, 2013; Hahn et al., 2014a; Garg, 2014; Terrazas et al., 2015; Fornace et al., 2016; Austin et al., 2017). Seven of these studies (including two studies not yet published in peer-reviewed literature) found a positive relationship, while one found no relationship (Hahn et al., 2014a) and one found a negative relationship (Valle and Clark, 2013; but see also Hahn et al., 2014b, and Valle, 2014) (Table 1). However, these previous empirical studies of deforestation and malaria prevalence had several limitations. First, their geographic evidence base is narrow. Six of the nine studies were from either Brazil (n=4) or Indonesia (n=2), with one from Paraguay, one from Malaysia, and one cross-national study that compared national-level data across 67 countries. We are not aware of any empirical study of deforestation and malaria prevalence from Africa, where 88 percent of malaria cases occur (WHO, 2016) and where the pattern of patchy clearing by long-established smallholders differs from the large-scale clearing for industrial agriculture by recent migrants found elsewhere (Fisher, 2010). Second, the measures of forest loss used in many of these studies are less than ideal.
Other than the four studies from Brazil that used high-resolution PRODES-Landsat data (Olson et al., 2010; Hahn et al., 2014a; Valle and Clark, 2013; Terrazas et al., 2015), the remaining studies relied on MODIS data (which relative to Landsat data has coarser spatial resolution (500 m) and is better for rapidly monitoring the occurrence of deforestation events than measuring the area of deforestation (Goetz et al., 2014)), NDVI (which measures vegetation greenness rather than forest cover), or self-reported national rates of deforestation (which are error-prone (Grainger, 2008)). Third, all but one study (Pattanayak et al., 2010) used data aggregated to the level of jurisdictions, rather than grid-cell-level data on deforestation and individual-level data on malaria. Spatially aggregated data is potentially prone to the ecological fallacy (Piantadosi, 1988) or unmeasured confounding factors. One study of aggregated malaria survey data concluded that "it is not clear…that meaningful results can be obtained when survey data are highly aggregated; researchers…should assess the feasibility of disaggregating existing survey data" (Comfort et al., 2017). Fourth, the number of observations in previous studies is small, with fewer than 1000 observations in all but one study and fewer than 100 observations in three studies. Fifth, all but one previous study (Garg, 2014) is cross-sectional, which has the limitation of being unable to fully control for confounding unobserved time-invariant geographic variables that might be correlated with both deforestation and malaria.
The primary goal of this paper was to test ex ante hypotheses derived from previous literature: that malaria prevalence in humans is higher where deforestation is highest, and where forest cover is intermediate (de Castro et al., 2006), controlling for other factors using a multivariate regression analysis. In doing so we make a number of advances on previous empirical studies of deforestation and malaria prevalence. Our study explores previously unstudied geographic regions by examining data from malaria tests from 17 countries in Africa and data on fever from 41 countries across Latin America, Africa, and Asia. We used the most accurate data set for measuring forest cover and deforestation over large scales: 30-meter resolution Landsat. We used granular data on malaria in individual children and deforestation at the local grid-cell level (cells roughly 5.5 kilometers on a side at the equator), rather than examining effects at the aggregate jurisdictional level. Our sample size (n~60,000 individuals for malaria and n~470,000 individuals for fever) was considerably larger than most previous studies, allowing more degrees of freedom to conduct more sophisticated tests. And because our data contained some locations with repeat surveys in the same communities, we were able to supplement a core cross-sectional analysis with a panel analysis with geographic fixed effects.
Secondarily, we tested several other ex ante hypotheses derived from the literature related to disaggregation: that the effect of deforestation on malaria is greater in Latin America and Africa than Asia (Guerra et al., 2006); greater for smaller than larger cut sizes (Singer and de Castro, 2006); and greater at earlier stages of a forest transition (de Castro et al., 2006). We also tested the ex ante hypothesis that the effect of deforestation on malaria diminishes after about seven years (Singer and de Castro, 2006).
Initially, a third goal of this paper was to assess the cost-effectiveness of forest conservation as an anti-malarial intervention relative to other widely used interventions. We had planned to use the results of the multivariate regression analysis to perform back-of-envelope calculations of avoided loss of disability-adjusted life-years (DALY) per dollar spent on forest conservation. We then planned to compare this estimate of cost-effectiveness to other common malaria interventions such as intermittent preventive treatment ($24/DALY), insecticide-treated bed nets ($27/DALY) and indoor residual spraying ($143/DALY) (White et al., 2011). If forest conservation were comparable in cost-effectiveness to bed nets and spraying, this would suggest that the public health community should actively promote including forest conservation within a portfolio of interventions. If forest conservation were effective but not cost-effective, the public health community might still understand its anti-malarial value but view it only as a positive co-benefit of actions taken by others for other reasons. However, we were unable to carry out cost-effectiveness analyses for reasons explained below. We had also planned to test whether the effect of deforestation on malaria is mediated through housing quality, water source, or access to health services, but were also unable to do so for reasons explained below.
We contribute to the trend toward ensuring the credibility of empirical results in economics by writing and adhering to a pre-analysis plan, as described further below.

Data
We obtained three indicators of malaria in children from the United States Agency for International Development's Demographic and Health Survey Program, which includes both Demographic and Health Surveys and Malaria Indicator Surveys, all of which followed the same sampling protocol. Our primary dependent variable was the binary presence or absence of malaria in individual children under the age of five. This measure was obtained from rapid diagnostic tests ("rapid tests"), which detect parasite antigens rather than actual parasites (Florey, 2014). They are relatively simple, requiring only a single drop of blood and neither skilled technicians nor access to laboratory equipment (Florey, 2014). However, rapid tests are prone to false positives (Kiemde et al., 2017), for example several weeks after the treatment of recent infections (Florey, 2014).
As a secondary dependent variable we used lab microscopy tests ("lab tests") from the same suite of surveys. In this process blood smears are taken to laboratories to test for the presence or absence of malaria parasites; this process is often complicated under field conditions and can be more expensive than rapid tests (Florey, 2014). Given that it is not clear that the results of either rapid tests or lab tests are better than the other, we prespecified rapid tests as our primary dependent variable rather than lab tests because the number of surveys that included this indicator was higher. Rapid tests were more frequently positive than lab microscopy tests in our data as they were in Florey (2014). Furthermore the two malaria tests were relatively well correlated in our data (r=0.58; Table 3) as they were in Florey (2014).
As a third dependent variable we used binary self-reported fever ("fever") in the last two weeks. Results from malaria tests and fever may differ because there is a lag of up to two weeks between the fever onset and the blood tests, because many cases of fever are caused by something other than malaria (Mayxay et al., 2013; Kiemde et al., 2017), and because recall may be flawed. There is probably a positive and not-by-chance-alone correlation between reported fever and rapid test results (Okiro and Snow, 2010), but this correlation is noisy and varies by country (e.g., Mayxay et al., 2013). Indeed, in our data fever is only weakly correlated with rapid tests (r=0.15) and lab tests (r=0.09). While fever is likely not a highly reliable proxy for malaria, it is still weakly correlated with malaria prevalence and was surveyed much more broadly, so we report results on this dependent variable as well.
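The correlations reported between the binary indicators (rapid test, lab test, fever) can be illustrated with a Pearson correlation computed on 0/1 vectors. The sketch below is only an illustration under assumptions: the function name `pearson_r` and the toy vectors are hypothetical, and the snippet ignores the survey weights that an actual analysis might apply.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences.

    For two 0/1 indicators this equals the phi coefficient.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Toy data (hypothetical): 1 = positive result, 0 = negative result.
rapid_test = [1, 1, 0, 0]
lab_test = [1, 0, 0, 0]
r = pearson_r(rapid_test, lab_test)
```

In this toy example the rapid test flags one child that the lab test does not, which mirrors the direction of disagreement described in the text without reproducing any actual figure.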
Finally, in a single sensitivity analysis we included as a dependent variable mortality in children under one year of age. This variable does not distinguish the cause of death, but some substantial fraction are likely from malaria. Young children in particular are at high risk of malaria and more than two-thirds of malaria deaths occur in this group (WHO, 2016). We tested the relationship between deforestation and mortality to gain insight into whether ignoring malaria-related deaths that occurred before the interview date may have caused us to inadvertently underestimate the effect of deforestation on malaria prevalence.
Our candidate pool was all national surveys conducted under the auspices of the Demographic and Health Surveys Program of the United States Agency for International Development (n~400; DHS, 2017b). These included both Demographic and Health Surveys, which asked about many development indicators, and Malaria Indicator Surveys, which were specific to malaria. These surveys were executed by an implementing agency within a host country, typically a national statistical agency. The countries and years for which surveys were conducted on malaria were not randomly selected. Rather, our sample of surveys was probably biased toward both places with higher levels of malaria (since Malaria Indicator Surveys were targeted to malaria-infected countries by design) and times with higher levels of malaria (since "the [Malaria Indicator Surveys are] usually timed to correspond with the high malaria transmission season" (DHS, 2017c)).
We restricted the scope of our study to surveys from countries in Latin America/Caribbean, Sub-Saharan Africa, and South and Southeast Asia. We then limited the study to surveys that gathered observations between 2001 and 2014, the period for which temperature and precipitation data were available. We dropped those 2014 surveys that gathered some observations in 2015 for which we did not have temperature data. We further restricted the scope to rural areas, following the rural/urban binary coding used in the surveys. These scope restrictions resulted in 60,305 respondents from 23 surveys in 17 countries for rapid tests; 56,883 respondents from 22 surveys in 17 countries for lab tests; and 469,539 respondents from 90 surveys in 41 countries for fever (Table 4), of which 79 surveys were Demographic and Health Surveys and 11 were Malaria Indicator Surveys. A subset of the surveys constituted a panel; that is, they collected data from within the same geographic cell at multiple points in time.
Respondents' locations were geo-located by the original surveys. In order to preserve respondent confidentiality, the Demographic and Health Surveys administrators added random positional errors of 0-5 kilometers to 99 percent of rural clusters and random positional errors of 0-10 kilometers to 1 percent of rural clusters, within the country and survey region (DHS, 2017a). Surveys were designed to be representative of the national population or particular subsets including rural populations, after appropriate sample weights are applied; see more below on weighting.
Our independent variables of interest were forest cover and deforestation near the interview location. Both variables were obtained from a recent data set that was derived from satellite measurements (Hansen et al., 2013) and extended through 2015 (GFW, 2015). The data was spatially explicit at 30-meter resolution, applied spatially consistent methods across the entire globe, and applied temporally consistent methods from 2001-2014. The deforestation measure includes all loss of tree cover, including natural forests as well as plantations and gardens, and both anthropogenic clearing as well as natural causes such as fires. The measure of deforestation is gross rather than net; that is, it does not include forest gain. The Hansen et al. (2013) study also included a measure of forest gain, but this data set is unreliable for our purposes, as cautioned by Tyukavina et al. (2014). We aggregated forest cover and forest-cover loss to cells that are 0.05 degrees on a side (~5.5 km at the equator) by updating the 2001-2012 data prepared by Busch and Engelmann (2017) through 2014 using the same methods. Both forest measures were fractional: deforestation was measured as the area of forest loss in the year as a fraction of cell area; forest cover was measured as the area of forest cover in the year as a fraction of cell area. Although aggregating forest cover and forest-cover loss to the grid-cell scale has the drawback of losing hyper-local information on deforestation associated with 30-meter satellite data, it has several advantages too. The aggregated scale is arguably more relevant for malaria transmission and forest cover change: the maximum flight distance of Anopheles mosquitoes is 3-10 km (Kaufmann and Briegel, 2004) while the flight range of a mosquito is typically 2-5 kilometers; thus deforestation in the local neighborhood may be a better indicator of malaria risk than deforestation at just the immediate point.
To the extent that nearby observations may be spatially autocorrelated due to mosquitoes transmitting malaria between children included in surveys, these effects would be subsumed within the cell. The aggregated scale is also appropriate given the random positional errors of 0-5 km that were deliberately added to survey locations.
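The aggregation of pixel-level forest data to fractional measures per 0.05-degree cell can be sketched as follows. This is a minimal illustration, not the procedure of Busch and Engelmann (2017): the input format (one tuple per pixel), the function names, and the simplification that all pixels in a cell have equal area are assumptions made here.

```python
import math
from collections import defaultdict

CELL_DEG = 0.05  # grid-cell side length in degrees, as stated in the text

def cell_index(lat, lon, cell_deg=CELL_DEG):
    """Return the (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def aggregate_to_cells(pixels, cell_deg=CELL_DEG):
    """Aggregate pixel records to fractional forest cover and loss per cell.

    pixels: iterable of (lat, lon, forested, lost) tuples, where `forested`
    and `lost` are 0/1 flags for one year (hypothetical input format).
    Assumes equal-area pixels within a cell, so fractions are pixel shares.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [total, forested, lost]
    for lat, lon, forested, lost in pixels:
        tally = counts[cell_index(lat, lon, cell_deg)]
        tally[0] += 1
        tally[1] += forested
        tally[2] += lost
    return {
        cell: {"forest_cover": f / n, "deforestation": l / n}
        for cell, (n, f, l) in counts.items()
    }

# Toy pixels (hypothetical): four fall in cell (0, 0), one in cell (1, 0).
toy_pixels = [
    (0.01, 0.01, 1, 0),
    (0.02, 0.02, 1, 1),
    (0.03, 0.03, 0, 0),
    (0.04, 0.04, 0, 0),
    (0.06, 0.01, 1, 0),
]
cells = aggregate_to_cells(toy_pixels)
```

Both outputs are fractions of cell area, matching the definitions in the text; a child's record would then be linked to the cell containing the (displaced) survey cluster location.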
We included both forest cover and forest-cover squared as independent variables in our model, as we sought to differentiate between two distinct, though related, hypotheses derived from previous literature: first, that the highest risk of malaria is at intermediate levels of forest cover (e.g., de Castro et al., 2006), and second, that deforestation increases the risk of malaria (e.g., Vittor et al., 2009). As sensitivity analyses we examined models that included only deforestation; only deforestation and forest cover; and only forest cover and forest-cover squared. Four previous studies included both forest cover and deforestation as explanatory variables (Valle and Clark, 2013; Hahn et al., 2014a; Terrazas et al., 2015; Fornace et al., 2016), though none also included forest-cover squared.
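The role of the squared term can be made explicit with a short derivation. The notation is assumed here for illustration: $F$ is fractional forest cover, $D$ is deforestation, and $\beta_1$, $\beta_2$, $\beta_3$ are the coefficients on forest cover, forest-cover squared, and deforestation in the linear index $\eta$ of the model.

```latex
\begin{aligned}
\eta &= \beta_1 F + \beta_2 F^2 + \beta_3 D + \dots \\
\frac{\partial \eta}{\partial F} &= \beta_1 + 2\beta_2 F = 0
  \quad\Longrightarrow\quad F^{*} = -\frac{\beta_1}{2\beta_2} \\
\text{interior maximum at } 0 < F^{*} < 1
  &\iff \beta_1 > 0,\ \beta_2 < 0,\ \beta_1 < -2\beta_2
\end{aligned}
```

Because the logistic function is monotonically increasing, the predicted probability peaks at the same $F^{*}$ as the index. The first hypothesis therefore corresponds to $\beta_1 > 0$ and $\beta_2 < 0$ with joint significance, and the second to $\beta_3 > 0$.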
Both deforestation and forest cover were negatively correlated with illness-related dependent variables (malaria in rapid tests; malaria in lab tests; fever), though weakly so (-0.10<r<0) (Table 3). Heat maps of the relationship between forest cover, deforestation, and illness-related dependent variables did not show any clear pattern (Figure 2). However, these simple first-order analyses could be biased if they do not control for the influence of other confounding variables that are correlated with deforestation and influence malaria prevalence. So, we turned next to multivariate regressions. We partially addressed potential omitted variables bias by including observable confounding variables in cross-sectional regressions. However, this could still leave out unobservable confounding variables. We addressed time-invariant, spatially variant unobservable variables in panel regressions, though this comes at a cost to sample size. We considered an instrumental variable approach using variables that were plausibly correlated with deforestation but not with malaria (e.g., agricultural prices), but we decided to reserve such an approach for future work.
We sought to control for the influence of other variables known to have a direct effect on the likelihood a subject would have malaria (Figure 1). Because malaria risk varies with weather, we included temperature and precipitation during the month of the interview as control variables. We considered one month to be a reasonable time interval as this corresponds roughly to the 2-4 week life cycle of an Anopheles mosquito (CDC, 2015). We included squared terms for both temperature and precipitation, as literature suggests malaria risk is highest at intermediate values of these variables (Beck-Johnson et al., 2013; Mordecai et al., 2013; Parham and Michael, 2010). Malaria also varies based on season, as does the timing of surveys: DHS surveys that collect biomarkers are generally fielded shortly after the rainy season when malaria risk is highest, whereas non-biomarker surveys are fielded in the dry season for logistical reasons (Measure Evaluation, 2013). We did not attempt to code season directly across many countries, though temperature and precipitation variables capture some aspects of seasonality.
Housing quality also affects malaria exposure (Tusting et al., 2015). As a proxy for housing quality we included floor type as a control variable. We constructed a binary code of whether the floor type of the house was unfinished (e.g., clay, mud, or sand) or finished (e.g., brick, cement, or tile). We selected floor type as the proxy for housing quality because all surveys asked about floor type whereas more than 20 surveys did not ask about wall or roof type. As a sensitivity analysis we replaced floor type as a control variable with an index of housing quality constructed by summing three binary measures of whether the floor, walls, and roof of the house were unfinished or finished. For example, we coded bamboo, thatch, and wood walls as unfinished and concrete, metal, and stone walls as finished; we coded canvas, palm, and straw roofs as unfinished and asbestos, shingles, and tin roofs as finished. For a list of how we coded all constructed variables see Table 2 of our pre-analysis plan.
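The housing-quality index used in the sensitivity analysis can be sketched as a sum of three binary flags. The material lists below contain only the examples named in the text; the full coding is in Table 2 of the pre-analysis plan and is not reproduced here. The function names and the choice to return `None` for unlisted materials are assumptions for this illustration.

```python
# Example materials from the text only; the complete coding lives in the
# pre-analysis plan and is not reproduced here.
FINISHED = {
    "floor": {"brick", "cement", "tile"},
    "wall": {"concrete", "metal", "stone"},
    "roof": {"asbestos", "shingles", "tin"},
}
UNFINISHED = {
    "floor": {"clay", "mud", "sand"},
    "wall": {"bamboo", "thatch", "wood"},
    "roof": {"canvas", "palm", "straw"},
}

def is_finished(component, material):
    """Return 1 if finished, 0 if unfinished, None if the material is unlisted."""
    if material in FINISHED[component]:
        return 1
    if material in UNFINISHED[component]:
        return 0
    return None

def housing_index(floor, wall, roof):
    """Sum of three finished/unfinished flags (0 to 3), or None if any is unknown."""
    flags = [
        is_finished("floor", floor),
        is_finished("wall", wall),
        is_finished("roof", roof),
    ]
    if any(flag is None for flag in flags):
        return None
    return sum(flags)
```

Under this sketch, `is_finished("floor", ...)` alone corresponds to the floor-type control in the primary specification, while `housing_index` corresponds to the index used in the sensitivity analysis.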
Proximity to standing water affects exposure to malaria-transmitting mosquitoes (Patz et al., 2001). Thus we constructed and included a binary variable for whether the household's water source was open (e.g., well, spring, pond) or piped or delivered (e.g., tap, bottled, tanker truck). A child's age affects their exposure and risk of malaria; for example older children had greater malaria prevalence in Malawi (Zgambo et al., 2017), which that study's authors attributed to older children's greater independence and time spent outdoors during evening hours. We included as a control variable the child's age as a binary variable for each year between zero and four to allow for potential non-linear effects. Malaria biomarkers can linger in the bloodstream and malaria test results could potentially reflect exposure to malaria in months or years prior to the survey. Thus, to look only at malaria prevalence during the period of deforestation, we conducted a sensitivity analysis by including only children of age zero.
Households may adapt to increased environmental health risks by engaging in avoidance behavior (e.g., Moretti and Neidell, 2011). In the case of malaria, households may reduce their exposure to malaria-transmitting mosquitoes by installing insecticide-treated bed nets, which are considered an effective malaria prevention measure (Lengeler, 2014;Bhatt et al., 2015). We constructed a binary variable for whether the survey respondents stated that "some or all children slept under a bed net last night." Because not all surveys asked about bed net usage, we included bed net usage only in a sensitivity analysis. Households at higher risk of malaria may be more likely to undertake avoidance behavior by installing bed nets; thus without considering avoidance behavior the coefficient on the effect of forest loss on malaria may underestimate the full effect.
Access to local health services affects malaria. We constructed a binary indicator with the value of 1 if the child was delivered in a health facility (e.g., private, government, or NGO) and 0 if delivered at home or with a traditional birth attendant or midwife. We considered this to be a good proxy indicator for the availability of local health services because birth has been universally experienced by children under 5 and we assumed that birth in a facility universally indicated better access to health services. Alternative proxy indicators such as "child has received other vaccinations," were potentially less useful because recommended vaccines vary by country, some children may be too young to have received vaccines, and there could be a selection effect to vaccination campaigns, meaning that having been vaccinated might indicate either having better access to health services or living in an area with higher health risk. Because not all surveys asked about place of delivery, we included access to health services only in a sensitivity analysis.
We considered but did not include several control variables (e.g., wealth, education, remoteness) that, while likely correlated with malaria prevalence (e.g., Austin et al., 2017), should influence malaria transmission only through one of the direct channels above rather than directly (see Figure 1). There were also several variables hypothesized to directly affect malaria that we would have liked to include but for which data was not available. These included indoor spraying (e.g., Over et al., 2004) and population influx (see Figure 1).
We restricted the scope of our analysis to those observations for which data was available for all variables; that is we dropped observations lacking data for one or more variables.

Econometric model
Our primary econometric model was a spatially explicit cross-sectional logit model. That is:

$$\Pr(Y_{ict} = 1) = \Lambda\left(\beta_0 + \beta_1 \mathit{Forest}_{ct} + \beta_2 \mathit{Forest}_{ct}^{2} + \beta_3 \mathit{Deforestation}_{ct} + \gamma' \mathit{Covariates}_{ict} + \delta_s\right)$$

Here, $Y_{ict}$ is the binary health outcome for child i in grid cell c at interview time t, and $\Lambda$ is the logistic function. Forest is the forest area in the year of time t and Deforestation is forest loss between the year of time t-1 and the year of time t. Covariates include weather, household, and child covariates: temperature and temperature squared in the month of time t; precipitation and precipitation squared in the month of time t; floor type; water source; and child age. In sensitivity analyses, additional covariates included place of delivery, bed net usage, wall type, and roof type. We included survey-specific dummy variables ($\delta_s$), e.g., for "Liberia 2009" and "Liberia 2011". We clustered standard errors at the level of the forest grid cell because the exposures (forest cover and forest-cover change) were common to all children within a grid cell. In one sensitivity analysis we clustered standard errors at the level of DHS primary sampling units. In another sensitivity analysis we used an ordinary least squares (OLS) model rather than logit.
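As a rough illustration of how a logit model of this form maps forest cover, deforestation, and covariates to a probability, the sketch below evaluates the logistic function of a linear index. Everything specific in it is an assumption: the function and key names are hypothetical, and the coefficient values are invented for illustration and are not estimates from this study. Estimation, weighting, and clustered standard errors are outside the scope of the snippet.

```python
import math

def logistic(z):
    """Logistic (inverse-logit) function mapping a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predicted_probability(coef, forest, deforestation, covariates, survey_effect=0.0):
    """Probability of a positive outcome for one child under a logit model.

    coef: dict of coefficients keyed by term name (hypothetical naming).
    forest, deforestation: fractional cell-level measures in [0, 1].
    covariates: dict mapping covariate name to its value for this child.
    survey_effect: the survey-specific dummy coefficient for this child's survey.
    """
    z = coef["intercept"] + survey_effect
    z += coef["forest"] * forest + coef["forest_sq"] * forest ** 2
    z += coef["deforestation"] * deforestation
    z += sum(coef[name] * value for name, value in covariates.items())
    return logistic(z)

# Hypothetical coefficients invented for illustration only; NOT results
# from this paper or any other study.
example_coef = {
    "intercept": -1.0,
    "forest": 0.8,
    "forest_sq": -1.0,
    "deforestation": 0.5,
    "finished_floor": -0.3,
    "open_water": 0.2,
}

p = predicted_probability(
    example_coef,
    forest=0.4,
    deforestation=0.01,
    covariates={"finished_floor": 1, "open_water": 0},
)
```

The survey-specific dummies enter as an additive shift in the index, so two otherwise identical children interviewed in different surveys can have different predicted probabilities.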
We weighted the observations in our sample to be representative of the rural population across the set of countries and years for which surveys were conducted. In one sensitivity analysis we rescaled the weights so that each survey counted equally; in another we used unweighted observations so that each observation counted equally.
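The sensitivity analysis in which each survey counts equally can be sketched as a rescaling of weights within each survey. This is one plausible way to implement such a rescaling, assumed here for illustration; the function name and input format are hypothetical and the text does not specify the exact procedure.

```python
from collections import defaultdict

def rescale_weights_by_survey(records):
    """Rescale weights so that the weights within each survey sum to 1.

    records: list of (survey_id, weight) pairs (hypothetical input format).
    Returns the rescaled weights in the same order as the input.
    """
    totals = defaultdict(float)
    for survey_id, weight in records:
        totals[survey_id] += weight
    for survey_id, total in totals.items():
        if total <= 0:
            raise ValueError(f"survey {survey_id!r} has non-positive total weight")
    return [weight / totals[survey_id] for survey_id, weight in records]

# Toy example (hypothetical): survey A has two respondents, survey B has one.
toy_records = [("A", 2.0), ("A", 6.0), ("B", 1.0)]
rescaled = rescale_weights_by_survey(toy_records)
```

After rescaling, each survey contributes the same total weight regardless of how many respondents it contains; the unweighted variant described in the text would instead assign every observation a weight of 1.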
Cross-sectional analyses of deforestation and malaria, including ours, face the challenge that observed variables may not fully control for all geographical differences that may be correlated with deforestation and affect malaria risk across sites, potentially introducing bias to estimates. In an attempt to address this issue we supplemented our cross-sectional analysis with a panel analysis that controlled for both observable and unobservable differences in malaria risk across sites. In the panel analyses we applied grid-cell level fixed effects in addition to survey-specific fixed effects in an OLS model. The panel specification would in theory be preferable to the pooled cross-sectional specification were it not for its small sample size. Out of 90 surveys, only two surveys collected repeat data from sites in the same grid cells for malaria rapid tests; two surveys collected repeat data from the same grid cells for malaria lab tests; and 19 surveys collected repeat data from the same grid cells for fever.
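Grid-cell fixed effects in an OLS model are equivalent to demeaning each variable within its grid cell before regressing (the "within" transformation). The sketch below shows this for a single regressor; it is a simplified illustration that omits the survey-specific fixed effects, covariates, weights, and clustered standard errors, and its function names are hypothetical.

```python
from collections import defaultdict

def demean_by_group(values, groups):
    """Subtract each group's mean from its members (within transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return [value - sums[group] / counts[group] for value, group in zip(values, groups)]

def within_slope(y, x, groups):
    """OLS slope of y on x after removing group (grid-cell) means from both."""
    y_dm = demean_by_group(y, groups)
    x_dm = demean_by_group(x, groups)
    sxx = sum(a * a for a in x_dm)
    if sxx == 0:
        raise ValueError("x has no within-group variation")
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sxx

# Toy data (hypothetical): two cells with different baseline levels
# but the same within-cell relationship between x and y.
cell_ids = ["A", "A", "B", "B"]
x_vals = [0.0, 1.0, 2.0, 3.0]
y_vals = [10.0, 12.0, -5.0, -3.0]
slope = within_slope(y_vals, x_vals, cell_ids)
```

The toy data also shows why sample size suffers: only cells observed more than once contribute within-cell variation, which is why the small number of repeat surveys limits the panel analysis.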

Pre-analysis plan
Like many other analyses, this analysis required choosing among many possible defensible model specifications and variables to include. In principle, we could have tested many permutations and chosen to present a combination of model specifications and included variables that yielded a result that was statistically significant, or had a particular effect direction, while not reporting on many other tests conducted ("data mining"; "p-hacking") (Olken, 2015). Ensuring the credibility of the analysis is especially relevant when testing a hypothesis prone to controversy, as in the case of deforestation and malaria (see for example Valle and Clark, 2013; Hahn et al., 2014b; Valle, 2014). We sought to enhance the credibility of our analysis in two ways. First, prior to conducting any analyses, we pre-selected model specification and included variables based on explicit ex ante hypotheses grounded in previous literature and our understanding of the coupled human-natural system rather than based on an exploration of our data. Second, we wrote and adhered to a pre-analysis plan, in which all analyses to be undertaken in the study were specified in writing in advance. Pre-analysis plans are common and even required in some clinical research, but are uncommon and new in social science research including economics (see Miguel, 2014; Coffman and Niederle, 2015; Olken, 2015). Pre-analysis plans impose methodological discipline and prevent mid-stream revisions to methods and variables, and can help avoid the perception of inadvertently or deliberately placing a thumb on the scale to achieve desired results. On the downside, full specification in advance is "close to impossible," while a pre-specified analysis "may miss the nuance that categorizes social science research" (Olken, 2015).
We employed a two-stage "split sample strategy" as suggested by Anderson and Magruder (2017). That is, in a first-stage pre-analysis plan we constructed and tested the software code on a subset of just two surveys: Liberia 2009 and Liberia 2011. These two surveys were chosen at random from the set of countries that 1) had data for both fever and malaria; 2) had more than one year of data; and 3) were in Africa. Then we produced a second-stage pre-analysis plan for the full sample. We made and noted deviations from the original pre-analysis plan only to ensure that our code was correct, not because the small sample yielded one result or another. Our pre-analysis plan describes in detail the methods for both the analyses that we undertook (e.g., correlations; heatmaps; cross-sectional regressions; panel regressions) and those that we were unable to undertake (e.g., mediation analyses; cost-effectiveness analysis).
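The selection of a pilot subset for the first stage can be sketched as follows. This is a hypothetical illustration rather than the procedure actually run: the survey records are placeholders (not the real survey list), the field names are invented, and the way the three eligibility criteria are encoded is one possible reading of the description above.

```python
import random

# Placeholder survey metadata; countries, years, and flags are invented.
surveys = [
    {"country": "Country A", "year": 2009, "region": "Africa", "has_malaria": True, "has_fever": True},
    {"country": "Country A", "year": 2011, "region": "Africa", "has_malaria": True, "has_fever": True},
    {"country": "Country B", "year": 2010, "region": "Africa", "has_malaria": True, "has_fever": True},
    {"country": "Country C", "year": 2008, "region": "Africa", "has_malaria": False, "has_fever": True},
    {"country": "Country C", "year": 2012, "region": "Africa", "has_malaria": False, "has_fever": True},
    {"country": "Country D", "year": 2007, "region": "Asia", "has_malaria": True, "has_fever": True},
    {"country": "Country D", "year": 2012, "region": "Asia", "has_malaria": True, "has_fever": True},
    {"country": "Country E", "year": 2010, "region": "Africa", "has_malaria": True, "has_fever": True},
    {"country": "Country E", "year": 2013, "region": "Africa", "has_malaria": True, "has_fever": True},
]

def eligible_countries(records):
    """Countries in Africa with malaria and fever data in more than one year."""
    by_country = {}
    for rec in records:
        by_country.setdefault(rec["country"], []).append(rec)
    eligible = []
    for country, rows in sorted(by_country.items()):
        in_africa = all(r["region"] == "Africa" for r in rows)
        years = {r["year"] for r in rows if r["has_malaria"] and r["has_fever"]}
        if in_africa and len(years) > 1:
            eligible.append(country)
    return eligible

def pick_pilot(records, seed):
    """Draw one eligible country at random and return its (country, year) surveys."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    country = rng.choice(eligible_countries(records))
    return sorted((r["country"], r["year"]) for r in records if r["country"] == country)

pilot = pick_pilot(surveys, seed=0)
print("eligible:", eligible_countries(surveys))
print("pilot surveys:", pilot)
```

Recording the seed alongside the plan is one way to let readers verify that the pilot subset was drawn at random rather than chosen after looking at results.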

Results
The results of our primary specification were not consistent with our ex ante hypotheses that malaria prevalence is higher at intermediate levels of forest cover, nor higher at greater levels of deforestation, controlling for other factors (Table 5). That is, in a pooled, weighted cross-sectional multivariate logit regression, we did not find a positive coefficient on forest cover, a negative coefficient on forest-cover squared, and joint significance; nor did we find a positive and significant coefficient on deforestation. This was the case for the primary dependent variable, malaria rapid test results, as well as for both secondary dependent variables: malaria lab test results and fever. As expected, we found malaria prevalence to be significantly higher at intermediate temperature, lower with a finished floor, and lower with a pumped or piped water source (Table 5). We also found malaria prevalence to be higher in older children, as found in Malawi by Zgambo et al. (2017). The signs of the coefficients on precipitation and precipitation squared were consistent with malaria prevalence being higher at intermediate precipitation, but the coefficients were not jointly significant.
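The three-part criterion for the forest-cover hypothesis (positive linear term, negative squared term, joint significance) can be written out as a short check. This is a sketch under stated assumptions, not the authors' procedure: it assumes the joint test is a Wald test on the two coefficients, it assumes a 0.05 threshold, and the function name `inverted_u_check` and the numbers in the usage lines are invented for illustration.

```python
import math

def inverted_u_check(b_cover, b_cover_sq, cov, alpha=0.05):
    """Check whether two logit coefficients support an inverted-U relationship.

    b_cover     coefficient on forest cover
    b_cover_sq  coefficient on forest cover squared
    cov         2x2 covariance matrix of the two estimates (positive definite)
    alpha       significance threshold (assumed, not taken from the paper)
    """
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    # Wald statistic b' V^{-1} b, written out for the 2x2 case.
    wald = (b_cover ** 2 * v22
            - b_cover * b_cover_sq * (v12 + v21)
            + b_cover_sq ** 2 * v11) / det
    # For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
    p_joint = math.exp(-wald / 2.0)
    signs_ok = b_cover > 0 and b_cover_sq < 0
    # Forest cover at which the fitted log-odds peak, if the signs fit.
    peak = -b_cover / (2.0 * b_cover_sq) if signs_ok else None
    return {
        "wald": wald,
        "p_joint": p_joint,
        "signs_ok": signs_ok,
        "peak": peak,
        "supported": signs_ok and p_joint < alpha,
    }

# Invented coefficients, with forest cover measured in percent.
example = inverted_u_check(0.04, -0.0004, [[1e-4, 0.0], [0.0, 1e-8]])
print(example)
```

A result counts as supporting the hypothesis only when both the sign pattern and the joint test hold. A significant pair of coefficients with the opposite curvature, or the expected signs without joint significance, does not.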
The results of multiple pre-specified sensitivity analyses were also not consistent with our ex ante hypotheses related to forest cover and deforestation. These sensitivity analyses included alternative weightings (Tables 6-7); use of OLS (Table 8); reduction to the subset of observations for which data on all three dependent variables were collected (Table 9); clustering standard errors at the level of the DHS primary sampling unit (Table 10); the inclusion of a bed-net variable and place of delivery variable (Tables 11-12); and the inclusion of the housing quality index in place of floor type (Table 13). (In these analyses bed-net usage and delivery in a facility were both associated with significantly less malaria, and higher housing quality was associated with significantly less malaria and fever, as expected.) Nor were the hypothesized relationships with deforestation and forest cover borne out after limiting the sample to children of age zero only (Table 14), nor when using the alternative dependent variable of mortality in age-zero children (Table 15).
Alternative combinations of independent variables of interest also did not produce results consistent with our ex ante hypotheses. These included removing deforestation as an independent variable (Table 16), removing forest-cover squared as an independent variable (Table 17), and removing both forest cover and forest-cover squared as independent variables (Table 18).
Furthermore, for most individual surveys the results of our cross-sectional logit analyses also did not support our ex ante hypotheses (Table 19). The hypothesis that malaria is higher at greater levels of deforestation was borne out in only 1 out of 23 surveys for malaria rapid tests and only 1 out of 22 surveys for malaria lab tests. The hypothesis that malaria is higher at intermediate levels of forest cover was borne out in 4 out of 23 surveys for malaria rapid tests and 5 out of 23 surveys for malaria lab tests. For fever, the deforestation hypothesis was borne out in only 13 out of 90 surveys (8 out of 51 in Africa; 3 out of 10 in Asia; 2 out of 8 in Latin America), while the forest cover hypothesis was borne out in 23 out of 90 surveys (18 out of 51 in Africa; 2 out of 10 in Asia; 3 out of 8 in Latin America).
Our ex-ante hypotheses related to disaggregations were also not borne out. For instance, we did not find that the effect of deforestation on fever was significantly greater in Latin America and Africa than in Asia (Table 20). We did not find that the effect of deforestation on malaria or fever was significantly higher at earlier stages of a forest transition, i.e., at higher forest cover (Table 21). Nor did we find that the effect of deforestation on fever was disproportionately greater for smaller amounts of deforestation (Table 22). In no lagged period prior to the survey year were our ex-ante hypotheses related to forest cover and deforestation borne out (Table 23).
In our panel analysis, we did find, in accordance with our ex ante hypothesis, that malaria as measured in lab tests was significantly higher at intermediate levels of forest cover (Table 24). However, neither malaria as measured in rapid tests nor fever was significantly higher at intermediate levels of forest cover. None of the three dependent variables was significantly higher at higher levels of deforestation.
Because we did not find a significant effect of deforestation on malaria, we did not perform tests to see if the effect was mediated through housing quality, water source, or access to health services. Nor did we perform post-estimation analyses to calculate the total impact of deforestation on malaria, nor the marginal cost-effectiveness of forest conservation as an anti-malarial intervention.

Discussion
Neither of our ex ante hypotheses (that malaria prevalence would be higher at intermediate levels of forest cover, and that malaria prevalence would be higher at higher levels of deforestation) was borne out in our primary analysis, nor in many secondary sensitivity analyses. Out of 98 tests of pre-specified hypotheses run on the full pooled sample, in only a single case were our results consistent with our ex ante hypothesis: in the panel analysis, malaria as measured by lab tests was significantly higher at intermediate levels of forest cover. Furthermore, only 11 of the 90 pre-specified hypothesis tests run on individual surveys related to malaria showed results consistent with our ex ante hypothesis. A slightly greater share of pre-specified hypothesis tests run on individual surveys related to fever showed results consistent with our ex ante hypothesis (n=36/180). Note that with so many tests some number of positive findings should be expected even in the absence of a real effect.
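The point about chance findings can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the 0.05 significance threshold is an assumption, and the tests are treated as independent, which the paper's overlapping specifications are not. Because the hypotheses also require a particular sign pattern, the chance rate of "consistent" results is likely lower than this simple bound suggests.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of significant results if no effect exists anywhere."""
    return n_tests * alpha

def prob_at_least_one(n_tests, alpha):
    """Chance of one or more significant results, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

N_POOLED_TESTS = 98  # pooled-sample tests reported in the text
ALPHA = 0.05         # assumed threshold, not stated in the text

expected = expected_false_positives(N_POOLED_TESTS, ALPHA)
at_least_one = prob_at_least_one(N_POOLED_TESTS, ALPHA)
print(f"expected chance findings: {expected:.1f}")
print(f"probability of at least one: {at_least_one:.3f}")
```

Under these assumptions roughly five of the 98 pooled tests would be expected to come out significant by chance alone, so a single consistent result is well within what chance would produce.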
Our findings come with caveats related to the geographic distribution and selection of interviews. First, our malaria analysis was limited to 17 African countries (which together comprised 51 percent of the population and 65 percent of the deforestation of Sub-Saharan Africa circa 2010). Our fever analysis was geographically broader, sampling from 41 tropical countries (including 30 African countries which together comprised 93 percent of the population and 87 percent of the deforestation of Sub-Saharan Africa circa 2010). This sample included one survey from Indonesia, but no surveys from any of the other three countries for which previous studies found an effect of deforestation on malaria: Brazil, Malaysia, and Paraguay. Furthermore, the countries and years for which surveys were conducted on malaria were not randomly selected. Rather, our sample of surveys is probably biased toward both places and times with higher levels of malaria.
The limited geographic scope of surveyed countries and years means that our results are difficult to extrapolate beyond the set of countries included in the sample. Even within the countries and years of analysis, our pooled sample represented an artificial super-national aggregation of countries and years rather than a cohesive and intuitive bloc. It is important to note however that survey observations were designed to be, once properly weighted, representative of the particular countries and years in which they took place. Since few survey-specific results were consistent with our ex ante hypothesis, we are confident that our main findings are not an artifact of the artificial super-national aggregate of countries nor the weighting scheme employed.
Our finding that forest cover and deforestation affected neither malaria nor fever prompts three questions. First, how do we reconcile our finding with previous studies that found that deforestation increases malaria risk factors, including in Africa (e.g., Lindblade et al., 2000; Cohuet et al., 2004; Afrane et al., 2005; Munga et al., 2006)? The relationship between deforestation and malaria risk is nuanced to begin with, as discussed in the Introduction, and any increase in malaria risk or exposure from deforestation could be swamped by other factors, e.g., related to economic growth or public health campaigns (Figure 1). This apparent discrepancy between an effect on malaria risk and a non-effect on malaria prevalence may be analogous to the so-called "paddies paradox" (Ijumba and Lindsay, 2001), in which communities near irrigation projects had less rather than more malaria in spite of increases in Anopheles mosquitoes, perhaps because increased wealth from irrigation led to increased use of bed nets and better access to improved healthcare. We were able to partially control for investments in health through the inclusion of housing quality, bed-net usage, and place of delivery, but admit that such factors are difficult to fully control for in cross-sectional analyses.
Second, how do we reconcile our fever findings with previous studies that found deforestation to be associated with higher malaria prevalence in countries in South America and Southeast Asia? Fever is only a weak proxy for malaria, as discussed in the Introduction, so the absence of a positive relationship between deforestation and fever could be either because the effect on malaria is absent or because an effect is swamped by other causes of fever.
And third, how do we reconcile our malaria findings with the findings of previous studies that found a positive relationship between deforestation and malaria prevalence in Brazil, Indonesia, Malaysia, and Paraguay? We can think of several possible explanations for why our results might differ from what previous studies found. First, it could be that geographic variation in malaria-transmitting mosquito types and ecological conditions across the tropics means that deforestation has less of an effect on malaria in Africa than elsewhere. However, this would be contrary to the literature suggesting that it is Asia rather than Africa where deforestation has a lower effect on malaria risk (Guerra et al., 2006).
Second, there could be relevant differences in the phenomenon of deforestation between Africa and other tropical continents. Fisher (2010) describes an "African exception to drivers of deforestation," in which deforestation in Africa is largely driven by the slow expansion of subsistence or smallholder agriculture for domestic use rather than market-driven agricultural exports as in Latin America and Asia. It could be that in Latin America and Asia relatively more deforestation is undertaken by new frontier migrants with associated unstable socio-economic conditions of poor housing stock, unimproved water sources, poor access to health services, and so forth. Meanwhile in Africa relatively more deforestation is by longtime residents for whom these conditions are less likely to be changed by deforestation.
Third, we used different data sets from previous studies. Our Landsat-derived data on forests are comparable to the Landsat-derived data used in the Brazil studies, but should be more accurate than the MODIS, NDVI-based, or self-reported data used in other studies. Our DHS-survey data on malaria at the individual level should be at least as good as, if not better than, jurisdictional-level data, which necessarily rely on modeling assumptions to estimate population-level malaria rates. While more granular data are conceptually superior to aggregated data (e.g., Comfort et al., 2017), the difference in approach could account for the difference in findings.
Fourth, the contrast in our findings could be the result of methodological differences with previous studies. Our model specification differed from those of previous studies (as indeed previous studies' specifications differed from each other; Table 1). We did not attempt to match others' specifications; rather, we derived our own specification based on our conceptual model of the coupled human and natural system (Figure 1).
Ideally, we could have isolated and tested the third and fourth potential explanations above by attempting to replicate previous studies' methodologies as closely as possible using our data. However, our data set, the Demographic and Health Surveys, did not collect data on malaria in three of the four countries where previous studies were conducted (Brazil, Malaysia, and Paraguay), and collected data on fever only in Indonesia. Furthermore, aggregating our individual-level data on malaria and other household characteristics to the jurisdictional level would require making a number of difficult assumptions. Given the multiple differences described above between our study and others' studies, we prefer to consider how our findings and findings in the literature could both be correct. We speculate that the most likely explanation is an African exception in the drivers of deforestation, and thus in the channels through which deforestation leads to changes in malaria. We suggest this is a fruitful topic to explore in future research.
Our findings can help prioritize actions in both the health sector and the forest sector. For anti-malarial efforts, our findings do not support the inclusion of forest conservation within a portfolio of anti-malarial interventions, at least in Sub-Saharan Africa where 88 percent of malaria cases occur (WHO, 2016). It would be more effective to prioritize proven anti-malarial interventions such as bed nets, spraying, and housing improvements.
For forest conservation efforts in Africa, it makes sense to focus management interventions on securing the many other values of standing forests, including carbon storage, biodiversity habitat, clean water provision, food provision, and other aspects of health. The total social value of standing forests, and the case for conserving them, is considerable even without an effect on malaria.

Conclusion
Using by far the largest and most granular data set to date, we find little evidence to support the hypothesis that deforestation increases malaria prevalence in Africa. Our findings contrast with evidence that deforestation increases malaria risk factors in some settings, including in Africa. However, many other factors besides deforestation also affect malaria prevalence in humans, including climate, community demographics, access to health facilities, and people's behaviors to avoid malaria, meaning that increases in risk might not always translate to increases in prevalence. Our findings also contrast with the findings of most previous studies from Brazil, Indonesia, Malaysia, and Paraguay, as well as a cross-national study, which found that deforestation is associated with higher malaria prevalence. Although we cannot fully explain within this study why our findings differ from those of previous studies, we speculate that the explanation may be related to an African exception to drivers of deforestation, in which deforestation in Africa is undertaken relatively more by long-time residents living in stable socio-economic conditions rather than by frontier settlers. Our results imply that, at least in Africa, forest conservation does not appear to be an effective anti-malarial intervention, and that anti-malarial efforts there should focus on other proven interventions such as bed nets, spraying, and housing improvements. Forest conservation efforts in Africa should focus on securing other benefits of forests, including carbon storage, biodiversity habitat, clean water provision, and the provision of other goods and services.

Table notes: Unweighted averages and standard deviations in parentheses. Panel based on interview year and forest cell. Sample sizes for bed-net and facility deliveries are smaller as they are not recorded in all surveys; these variables are used only for sensitivity analyses.
[row label missing]                     0.12***    0.19***    0.01**
                                        (0.04)     (0.05)     (0.00)
Temperature squared                     -0.00**    -0.00***   -0.00
                                        (0.00)     (0.00)     (0.00)
Total monthly precipitation (0.01 mm)   0.00       0.00       0.00
                                        (0.00)     (0.00)     (0.00)
Precipitation squared                   -0.00*     -0.00*     0.00
                                        (0.00)     (0.00)     (0.00)