Geostatistical analysis of Malawi’s changing malaria transmission from 2010 to 2017

Background: The prevalence of malaria infection in time and space provides important information on the likely sub-national epidemiology of malaria burdens and how this has changed following intervention. Model-based geostatitics (MBG) allow national malaria control programmes to leverage multiple data sources to provide predictions of malaria prevalance by district over time. These methods are used to explore the possible changes in malaria prevalance in Malawi from 2010 to 2017. Methods: Plasmodium falciparum parasite prevalence ( PfPR) surveys undertaken in Malawi between 2000 and 2017 were assembled. A spatio-temporal geostatistical model was fitted to predict annual malaria risk for children aged 2–10 years ( PfPR 2–10) at 1×1 km spatial resolutions. Parameter estimation was carried out using the Monte Carlo maximum likelihood methods. Population-adjusted prevalence and populations at risk by district were calculated for 2010 and 2017 to inform malaria control program priority setting. Results: 2,237 surveys at 1,834 communities undertaken between 2000 and 2017 were identified, geo-coded and used within the MBG framework to predict district malaria prevalence properties for 2010 and 2017. Nationally, there was a 47.2% reduction in the mean modelled PfPR 2-10 from 29.4% (95% confidence interval (CI) 26.6 to 32.3%) in 2010 to 15.2% (95% CI 13.3 to 18.0%) in 2017. Declining prevalence was not equal across the country, 25 of 27 districts showed a substantial decline ranging from a 3.3% reduction to 79% reduction. By 2017, 16% of Malawi’s population still lived in areas that support PfPR 2-10 ≥ 25%. Conclusions: Malawi has made substantial progress in reducing the prevalence of malaria over the last seven years. However, Malawi remains in meso-endemic malaria transmission risk. To sustain the gains made and continue reducing the transmission further, universal control interventions need to be maintained at a national level.


Introduction
Malaria-endemic countries are increasingly encouraged to define their sub-national epidemiology to understand the likely rational allocation of intervention mixes and the changing malaria landscape [WHO, 2010;WHO, 2015]. Within a country, variations in vector ecology, environment, and intervention coverage all determine the patterns of malaria risk and incipient disease burden. Consequently, maps of malaria risk are required by national malaria control programmes to guide decision making in heterogenous settings.
National cartographies of malaria risk were common during the 1950s and 1960s [Snow & Noor, 2015]. Often these maps were based on accepted definitions of malaria endemicity [Lysenko & Semashko, 1968;Metselaar & van Thiel, 1959], derived from community-based surveys of malaria infection prevalence and climate ecologies [Snow & Noor, 2015]. The importance of malaria risk maps began to re-emerge during the 1990s, coincident with the pan-African malaria resurgent epidemic [Snow et al., 1996], resulting in a new generation of national malaria strategic plans that articulated the sub-national disparities in malaria ecology, transmission and disease burden [Omumbo et al., 2013].
In Malawi, as part of the 2001-2005 national malaria strategic plan [NMCP, 2001], the epidemiology of malaria was described based on a World Health Organization (WHO) mission to the country 30 years earlier [Cheyabejara et al., 1974] "In 1973, the WHO mission determined malaria to be meso-to hyperendemic in Malawi, except in isolated higher altitudes mountainous regions" [NMCP, 2001]. However, during the second national malaria strategic plan 2005-2010[NMCP, 2005, no references to the epidemiological patterns of transmission, dominant vector species or disease burden were provided.
It was not until 2006 that empirical data on malaria infection prevalence was used with model-based geostatistical (MBG) methods to provide predictive quantities of risk across Malawi [Kazembe et al., 2006]. This work used data on malaria prevalence from 73 survey locations, where children aged 1-10 years that had been sampled between 1970 and 2001. Temperature, rainfall, potential evapotranspiration, and elevation were all used as covariates to help predict infection prevalence at un-sampled locations using information and correlates with sampled locations. This map was used in the malaria programme review in 2010 [NMCP, 2010] and the national strategic plan 2011-2015 to highlight the hyper-endemic nature of malaria transmission in the country, with variations in higher altitude areas [NMCP, 2011].
In 2013, the application of MBG methods was extended to include 1057 surveys of malaria infection undertaken between 2000 to 2010, employing covariates related to urbanization and temperature suitability for malaria transmission to provide a 1×1 km posterior prediction of malaria risk in 2000, 2005and 2010[Bennet et al., 2013. These analyses showed significant subnational variations in malaria prevalence; however, there was little change across the prediction time-periods.
Since 2010, there have been significant additional communitybased malaria surveys undertaken, sampled both at district and national levels. This paper describes the re-assembly of malaria infection data in Malawi and their use within a MBG framework to understand changing sub-national malaria endemicity between 2010 and 2017 at a time of increased malaria intervention coverage throughout the national strategic plan 2011-2015[NMCP, 2011.

Geography
Malawi is a landlocked country located in the South Eastern region of Africa along the Great Rift Valley, bordered by Tanzania, Mozambique and Zambia (Figure 1). The country has an estimated population of 17.6 million people in 2017 [NSO, 2018], and is one of the poorest countries in Africa with a per capita GDP of US$350 [World Bank, 2018]. Altitude varies considerably from 37 metres above mean sea level (MASL) to 3003 MASL. Both proximity to Lake Malawi and altitude define the variable climate patterns [Vincent et al., 2014]. The country is divided into three regions, namely Northern, Central, and Southern regions, which encompass 28 districts ( Figure 1). This includes the small islands of Likoma and Chizumulu that form Likoma district, with circa 9,000 people, in Mozambique waters ( Figure 1). For the purposes of the present analysis Likoma district is excluded as it is situated more than 69 km from the mainland.

Malaria control 20 years on
Malawi launched a national malaria strategic plan in 2001 in line with recommendations made by the Roll Back Malaria initiative [NMCP, 2001]. With rapidly emerging sulphadoxinepyrimethamine (SP) resistance, Malawi proposed a change in its first-line treatment policy from SP to Artemether-Lumefantrine (AL) in 2004, but the policy was not implemented until 2007-2008[Malenga et al., 2009. A programme of socially marketed, subsidized insecticide-treated net (ITN) distributions was launched in 1998 [Mathanga et al., 2012;MoH, 2005], and was largely the sole source of ITN access nationwide until 2007 when the free delivery of ITN through public health facilities to children attending immunization and pregnant mothers was launched [Mathanga et al., 2012]. Between 2007 and 2010, approximately 4 million nets were distributed free of charge to vulnerable populations, including 1.1 million nets provided during the first nationwide

Amendments from Version 1
We would like to thank the reviewers for their helpful feedback. The manuscript has been modified to address the comments made by the three reviewers and improve on clarity. The main modifications in this version are related to the methods and results sections. We have changed statistical "significance" to "substantial" district level prevalence changes in the results. We have clarified the number of unique and repeat survey locations. We also include maps showing exceedance and non-exceedance probabilities to quantify decrease in prevalence between 2010 and 2017. A new reference to climate changes over the study period has been added. We have uploaded a dataset to figshare repository which now includes minimum and maximum ages of subjects. Figure 1. The Geography, population density, districts and unsuitability for malaria transmission in Malawi. Population density ranges from zero (yellow) to 37,332 person per 1 km grid (dark blue). Grey areas represent a temperature suitability index (TSI) of zero which indicates a temperature range that cannot support malaria parasite development cycles in the mosquito [Gething et al., 2011], and all correspond to unpopulated areas in the Nyika Plateau in the north and Mulange Massif range in the south.
In 2011, the National Malaria Strategy 2011-2015 was launched with a vision that all people in Malawi are free from the burden of malaria and an ambition to halve the malaria disease and mortality burden by 2016 [NMCP, 2011]. Substantial increases in long-lasting insecticide-treated net (LLIN) distribution were achieved from 2010 with over 1 million nets being delivered through public health facilities each year since 2011, and 5.4, 7.0 and 8.6 million nets distributed during a mass campaigns in 2012, 2014 and 2016, respectively. Between July 2010 and 2011, IRS using pyrethroids was undertaken in seven districts (Nkhotakota, Salima, Karonga, Nkhata Bay, Mangochi, Chikwakwa and Nsanje) protecting approximately 3 million people [NMCP, 2012]. Fuel shortages, funding interruptions and increasing Lambda-cyhalothrin and carbamate resistance [Mzilahowa et al., 2016;Wondji et al., 2012] led to a significant reduction in IRS activities, with Salima district only under IRS in 2013 where after IRS was suspended nationwide [Chanda et al., 2015]. In 2011, the introduction of rapid diagnostic tests (RDTs) was launched nationwide with supplies to peripheral health facilities and training of health workers in 2012 and 2013 [PMI, 2016], which were included for health workers in hard to reach areas as part of integrated community case management (iCCM) in 2015 [Phiri et al., 2016].

Malaria prevalence survey data assembly
The process of identifying community-based malaria survey data is described in detail elsewhere [Bennett et al., 2013;Snow et al., 2017]. Of importance have been national household sample surveys that have included malaria infection prevalence among children in national micro-nutrient surveys conducted in 2001, 2006, 2009and 2016national demographic and/or malaria indicator surveys undertaken in 2010national demographic and/or malaria indicator surveys undertaken in , 2012national demographic and/or malaria indicator surveys undertaken in , 2014national demographic and/or malaria indicator surveys undertaken in and 2017national demographic and/or malaria indicator surveys undertaken in [NMCP, 2017; sub-national surveys undertaken at district levels by the College of Medicine Malaria Alert Centre between 2005-2009[Mathanga et al., 2010, repeat surveys in three districts between 2012 and 2014 [Walldorf et al., 2015], and district-wide surveys in Chikwawa 2010-2016[Buchwald et al., 2016Kabaghe et al., 2017;Kabaghe et al., 2018a;McCann et al., 2017;Roca-Feltrer et al., 2012b]. Other data were derived from published sources and the generous help with unpublished data from national malaria scientists and collaborators, listed at the end of the paper. Data were restricted to surveys undertaken between January 2000 and December 2017.
For each of the identified data sources, information on the month and year of the survey, age range of the surveyed population, the numbers tested versus numbers identified as harbouring Plasmodium falciparum and the methods of parasite detection were all extracted. The location (longitude and latitude) of each surveyed village, school or enumeration cluster was checked mass-campaign in 2008. Before 2010, only pilot indoor-residual house spraying (IRS) using pyrethroids were undertaken in small community studies in Ntchisi, Mzimba, Kayelekera uranium mine and the Nchalo and Dwangwa Estates of the Illovo sugar using national statistical office high-resolution global positioning system (GPS) databases and other publicly available, online digital gazetteers. The raw information is provided as underlying data [Chipeta et al., 2019;Snow, 2017].
Geostatistical spatio-temporal analysis MBG [Diggle et al., 1998;Diggle & Giorgi, 2019] is a likelihoodbased approach that allows prediction of a health outcome of interest using sparsely sampled data. This modelling framework has also been extended to interpolate both the spatial and temporal variation of disease prevalence through the analysis of repeated cross-sectional data [Giorgi et al., 2018]. MBG has become a well-established tool in statistics for modelling the spatio-temporal correlation induced by unmeasured risk factors to predict prevalence at any desired place and time.
To model changes in PfPR 2−10 by borrowing strength of information across time and space, an MBG model was used. Unlike previous MBG approaches [ Bennett et al., 2013] a decision was made not to include human settlement, climate or other environmental covariates during the modelling exercise. The inclusion of covariates (climate, land use, social economic status and intervention), when used to assist predictions at locations without data, presume a clearly defined a priori biological relationship with prevalence and are only valuable when predictions must be made without large volumes of input empirical prevalence data, which themselves represent the product of all the possible covariate influences [Macharia et al., 2018].
The model is described as follows. Let x be the location of a surveyed community in year t. Define a spatio-temporal Gaussian process, S(x, t), and unstructured random effects, Z(x, t), to account for the unexplained variation between and within communities, respectively. Conditionally on S(x, t) and Z(x, t), the counts of positive tests for P. falciparum were assumed to follow mutually independent binomial distributions with number of trials N, corresponding to number of sampled individuals, and probability of a positive outcome p(x, t) at location x (n=surveyed locations) and year t (2000-2017) given by where mA and MA are the minimum and maximum age among the sampled individuals at a location x and time t. In carrying the spatio-temporal predictions, mA and MA were set to 2 and 10 respectively to standardise to the age group 2-10 years. A stationary and isotropic Gaussian process for the spatio-temporal random effects is assumed S(x, t), with an exponential correlation function given as where ϕ and ψ are scale parameters which regulate the rate of decay of the spatial and temporal correlation for the increasing distance and time separation, respectively; u = ||x − x′|| is the distance in space between the location of any two communities, one at x and the other at x′; ν = |t − t′| is the time separation in years between any two surveys.

Model validation
The model was validated using two methods. First by testing evidence against the residual spatio-temporal correlation in the data through the following variogram-based validation algorithm [Giorgi et al., 2018]: 1) Generate a point estimate Z(x i ,t i ) i.e. Z (x i ,t i ) from a non-spatio-temporal model, for each observed location x i and time t i ; 2) Permute the order of the data, including Z(x i ,t i ), while holding (x i ,t i ) fixed; 3) Compute the empirical semi-variogram for Z(x i ,t i ); 4) Repeat steps (1) and (2) a large number of times, say B; 4) Using the resulting B empirical variograms to generate 95% confidence intervals at each of the pre-defined distance bins. To conclude that there is no evidence against the adopted spatio-temporal model correlation the empirical semi-variogram from the original data must fall within the generated 95% confidence intervals. Second, validation statistics based on a 10% hold-out dataset or correlation against observed and predicted estimates of PfPR 2-10 , bias and mean absolute error was done.

Population-adjusted risk
Neither human settlement nor malaria risk are evenly distributed, and therefore to ensure that malaria risk maps converge with human population density, similar gridded surfaces of population are required. Dasymetric modelling techniques for the reallocation of populations within census units have been developed to overcome the difficulties caused by input census data of coarse spatial resolution [Linard et al., 2012;Stevens et al., 2015]. In brief, the 2008 Malawi national census, organized by 12,666 enumeration areas (lowest available level of aggregation), was reallocated using a Random Forest model in combination with land cover specific weightings, protected areas, night-time lights, roads, rivers, altitude and settlement data to adjust and re-allocate population densities within each enumeration area (available here). United Nations rural and urban growth rates were used to project population's forward to 2010 and 2017 (available here) to provide a gridded dataset of population distribution (counts) at 0.1×0.1 km resolution. The population maps were resampled to 1×1 km grids ( Figure 1) to match the malaria risk mapped outputs.

Description of survey data
A total of 2,276 independent survey data points at 1,874 unique locations were identified during the data assembly process. Two survey locations could not be geo-coded and 37 surveys were undertaken on the islands of Likoma district and were excluded. Of the remaining 2,237 surveys, 403 (18%) were repeat surveys, taken at the same geo-location but in different years and 1,834 were unique locations. The data covered the entire period 2000-2017 and represented the examination of 59,920 individuals. The majority of surveys employed microscopy (83%), rather than RDTs (17%) for parasite detection. Most survey locations (86%) were positioned using local GPS sources. The location of the age-corrected survey data is shown in Figure 2.

Spatio-temporal variation in malaria risk
The assembled data were used in the spatio-temporal model Equation 1 to generate the 1×1 km grids of mean predictions of PfPR 2-10 in 2010 and 2017 ( Figure 3). The validity of the spatio-temporal model indicated that the empirical semi-variogram falls within the 95% confidence intervals (Figure 4), indicating that there is no evidence against the adopted spatio-temporal model. The predictive performance of the model using cross validation, holding 10% of the total sample (224 surveys), indicated a high correlation between observed and predicted PfPR 2−10 with ρ = 0.72, a bias of 0.28% and mean absolute error of 15%.
The national mean predicted PfPR 2-10 in 2010 was 29.4% (95% confidence interval (CI) 26.6-32.3%) compared to 15.6% (95% CI 13.3-18.0%) in 2017. When combined with population density in each year this corresponds to a reduction of 94.7% in the numbers of people living in areas where PfPR 2-10 is greater than 40% and a corresponding 216.3% increase in populations living under areas of Pf PR 2-10 <20% ( Figure 5). As shown in Figure 3 and Figure 6, declines in PfPR 2-10 were witnessed nationwide. However, the largest declines (≥60% reductions)  Figure 7). However, the confidence intervals for prevalence changes in Machinga, Mulanje and Phalombe districts contain a zero. To emphasise on prevalence decline, we categorised prevalence into two thresholds. In Figure 6, we show areas with low Pf PR 2-10 prevalence where prevalence lies below 20% (non-exceedance probability-NEP, 80 or 90% sure) and areas with very high prevalence where PfPR 2-10 prevalence is above 30% (exceedance probability-EP, 80 or 90% sure), i.e. areas where intensive and sustained control is required.   Showing areas where predicted PfPR 2-10 is less (non-exceedance probability) than 20% which were > 80% confidently predicted (light green and dark green) or > 90% confidently predicted (dark green); and areas where PfPR 2-10 is greater (exceedance probability) than 30% which were > 80% confidently predicted (light red and dark red) or > 90% confidently predicted (dark red). Areas which do not support malaria transmission are shown in grey (see Figure 1); all other areas where transmission can occur are shown in yellow.
control coverage and improved malaria case-management, including expanded community-based care. This has corresponded with a dramatic decline in infection prevalence nationwide. Compared to 2010, the national mean PfPR 2-10 has declined by 47.2% (Figure 3, Figure 6 and Figure 7; Table 1). In 2017, 6% of the population lived under conditions of intense malaria transmission (PfPR 2-10 >40%) compared to 25% in 2010 ( Figure 5). It is not possible to directly attribute the reduction in malaria transmission to any specific intervention or combination of interventions.   The present analysis focuses on malaria infection prevalence, a frequently used malaria metric and used for over 100 years across Africa [Snow et al., 2017]. Prevalence data for Malawi have been assembled from multiple sources including districtlevel research platforms, school surveys and nutritional surveys. The leveraging of multiple national survey data from diverse research and health constituents improves the precision of predictions over sparse data collected during single cross-sectional national malaria or health surveys powered to provide information on variables other than prevalence at low spatial resolutions. Malawi has several sentinel research districts which provide platforms to investigate specific intervention access, attribution and impact questions [Buchwald et al., 2016;Escamilla et al., 2017;Kabaghe et al., 2018a;Kabaghe et al., 2018b;Roca-Feltrer et al., 2012a]; these serial, repeat observational data significantly contribute to informing the changing national profile of malaria risk in the MBG models. A key function of the NMCP remains curating and updating these data from all partners in-country. However, the analysis of sub-national variations in risk and epidemiological transitions should be triangulated with additional routine data from health information systems and malaria hospitalization. Through a process of data triangulation, a more granular understanding of the epidemiological transition is possible. The use of MBG methods to interpolate information on malaria prevalence at community levels, is less perfect when compared to complete, reliable routine data on the monthly presentation of parasitologically diagnosed fevers to health facilities. However, in the absence of complete routine data, both routine and survey data provide opportunities to understand the impact of scaled intervention on the malaria burden sub-nationally.

Conclusion
Malawi has made substantial progress in reducing the prevalence of malaria over the last seven years. It seems plausible that this transition has been a direct result of substantial investment in improving the scale and range of intervention coverage. More detailed interrogation, and triangulation, of intervention and routine data, is required to understand the sub-national impact of control. Malawi remains a high burden country. To accelerate future progress will require further prioritization of existing interventions and increasing their reach for several years before sub-national targeting of resources becomes a priority.  (203077).

Data availability
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Version 1
Are sufficient details of methods and analysis provided to allow replication by others?/ Are all the source data underlying the results available to ensure full reproducibility? I could not find the source data in the links the authors provided. For example, the figshare reference (Chipeta , 2019) does not contain the minimum and maximum age or the month of the survey. et al.
Could the authors provide R code for their models and data preparation? In particular, the population re-adjustment and age standardisation are not described in enough details to allow replication by others.

Are the conclusions drawn adequately supported by the results?
Some of the conclusions in the discussion do not seem fully supported by the statistical results presented. The authors state, on page 9, 'the populations living in districts along the shore of Lake Malawi and central region remain under highest risk.' It seems to me (a statistician with limited expertise in the application) that Phalombe and Zomba would be at highest risk because these are the two regions with potentially increasing prevalence (especially Zomba). Perhaps the authors could elaborate on how they determined 'highest risk'?
On the same page, the authors recommend, 'Malawi should be considered as a country that should, at present, maintain a single,national strategic approach to malaria control'. How is this recommendation supported by the modelling?

Some smaller questions
Are the underlying survey data based on random samples from each location? Are there potential design factors or biases that should to be taken into account (as in Giorgi ., 2014 )? et al Is month used in the temporal part of the model or year of the survey used? Only year is in the shared data.  We thank Dr Theresa Smith for thoughtful and helpful comments made on the first version. We have improved the clarity of the paper by responding to the points raised as follows:

Summary
In this paper, Chipeta and co-authors use model based geostatistics to analyse malaria prevalence

Summary
In this paper, Chipeta and co-authors use model based geostatistics to analyse malaria prevalence data collected in a large number of surveys throughout Malawi and across several years. There is a particular emphasis on estimating changes in prevalence from 2010 to 2017 during which several malaria control strategies were implemented. Using a spatio-temporal model taking into account geographic and time correlations between surveys as well as unstructured heterogeneity, the authors produce estimates of prevalence in children aged 2-10 Plasmodium falciparum (PfPR_2-10) at a spatial resolution of 1km x 1km in 2010 to 2017 along with regional estimates in the change in prevalence for 27 districts.
Overall, this is a well-written, technically sound paper. I have some small questions and suggestions: 1. I could not find the source data in the links the authors provided. For example, the figshare reference (Chipeta , 2019) does not contain the minimum and maximum age or the month of et al. the survey.
The data contains the year of the survey only as described in the data section of the Response: manuscript. We have now updated the dataset on figshare repository ( ), that contains here minimum and maximum ages We have also added this link in the updated manuscript. .

2.
Could the authors provide R code for their models and data preparation? In particular, the population re-adjustment and age standardisation are not described in enough details to allow replication by others. The assembled surveys contained a variety of sampled age groups and it is necessary Response: to have a single age-group in comparing data in time and space. The theory behind changing age profiles of infection prevalence depending on the intensity of transmission in a given location is provided in the reference by Smith . (2007). To standardise age to a single age range of 2 -10 et al years (P PR ), we incorporate this process in the model by setting minimum ( and maximum f mA) ( ages to 2 and 10 respectively, in Equation (1)  3. Some of the conclusions in the discussion do not seem fully supported by the statistical results presented. The authors state, on page 9, 'the populations living in districts along the shore of Lake Malawi and central region remain under highest risk.' It seems to me (a statistician with limited expertise in the application) that Phalombe and Zomba would be at highest risk because these are the two regions with potentially increasing prevalence (especially Zomba). Perhaps the authors could elaborate on how they determined 'highest risk'? Highest risk is a factor of the change between the two timepoints and the current risk Response: (in 2017) We agree with the reviewer, Phalombe and Zomba districts have the high prevalence, in addition, the transmission risk along the lake shore and central region remains high because of the enabling environment for the vectors. At the same time, the districts along the lake shore and central region have experienced least changes despite the interventions and control efforts. 4. On the same page, the authors recommend, 'Malawi should be considered as a country that should, at present, maintain a single, national strategic approach to malaria control'. How is this [2][3][4][5][6][7][8][9][10] should, at present, maintain a single, national strategic approach to malaria control'. How is this recommendation supported by the modelling? This is based on WHO recommendations on subnational stratification for malaria Response: control Malawi remains a meso-endemic country, with very few with extremely low pre-elimination . settings (< 1% of the population). This in our view, still calls for universal coverage control effort. Targeted interventions should be applied in pre-elimination and highly heterogeneous areas with substantial areas covering high and others covering low risk.
5. Are the underlying survey data based on random samples from each location? Are there potential design factors or biases that should to be taken into account (as in Giorgi ., 2014 )? et al The underlying surveys are based on a collection of multiple random surveys. For Response: national household surveys, these are provided in the links to national household survey designs, mostly two-stage random surveys. For other published works, these are often provided in detail in the peer reviewed publications. Overall, our only requirement was that a community or a school was sampled as a single cross-section and that information was available on the date, number examined and numbers positive. It is beyond the scope of these large data assembly approaches, levering all possible national survey data, to build a detailed sampling strategy per data point. This is possible when ONLY using multi-stage, household sample surveys. And some investigators only use these for mapping work, building in survey weights. However, it is our view that for NMCPs, these are often limited in the information they can provide on malaria prevalence and ignore the vast amounts of other unpublished works that exist within a country. Here we have levered all possible data from all possible sources to be able to provide the most complete source of data on malaria infection prevalence in Malawi.
6. Is month used in the temporal part of the model or year of the survey used? Only year is in the shared data.
Only year of the survey is used in the modelling of the data. Response: 7. Fig 3: How much uncertainty is there in these estimates?
Uncertainty in the estimates was quantified using the standard errors. The standard Response: errors ranged from 0.0005 to 0.3 across space and time and were largely influenced with availability of data across space and time. Areas with more data were seen to have low standard errors and narrow Confidence Intervals. Standard error maps are available upon request to the first author (MGC). 8. Fig 5: How much uncertainty is there in the classification? A different figure type (perhaps a stacked bar chart) would help emphasize the decrease in prevalence between the two years.
To emphasise the decrease in prevalence between 2010 and 2017, we now provide a Response: new Figure for exceedance and non-exceedance probabilities maps, showing areas where predicted P PR is less (non-exceedance probability) than 20% which were > 80% confidently f predicted or > 90% confidently predicted; and areas where P PR is greater (exceedance f probability) than 30% which were > 80% confidently predicted or > 90% confidently predicted, for both 2010 and 2017. 9. Table 1: Could the authors add the number of surveys per district as a column in this table?
We have added this information in Table 1 as suggested.

Response:
Once again, we thank the reviewer and offer the improved manuscript based on this and two other reviews. , which permits unrestricted use, distribution, and reproduction in any medium, provided the original Attribution Licence work is properly cited.

Tobias Chirwa
Division of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa Summary I have read the manuscript report and, subject to addressing my comments, should be considered for indexing. The paper looks at prevalence of malaria in space and time, taking into account multiple data sources to counter sparse data. The setting is Malawi and the authors utilize various data sources to estimate prevalence in 2010 and 2017. They argue that this change can be attributed to interventions implemented during the period. Generally, it is an important paper showing decline in prevalence over time.
General comment: The authors need to justify why they only considered changes or trends based on two time points (i.e. 2010 and 2017). They had an opportunity to optimally use the different multiple sources over time. One would have assumed the estimation would be enriched with more time points considering surveys were also conducted in the intermediate period. Can the method provide estimates for several time points? For the two time points, I therefore feel the authors should be cautious of use of statistical significance.
Specific comments: Abstract: Under results, it is not clear whether the 1834 communities are repeat communities and whether this was taken into account during analysis i.e. weighting. On declining prevalence, it is not clear why the authors say the decline was significant, and how this was determined. Was this by district, 2017 vs 2010? Introduction: In paragraph 5, the authors should clarify whether the 1057 surveys are in addition to the 73 mentioned in the previous paragraph.

Method:
The authors should give an exclusion and inclusion criteria. At the moment, it is not clear why Likoma District was excluded, although distance is mentioned.
On page 4 -the authors show that there were different interventions in selected districts. Did these overlap and how were they handled in the analysis? Were the mass campaigns a national effort? And how do we explain the differences in prevalence decline? how do we explain the differences in prevalence decline?
Under section on malaria prevalence the dates are from 2000 to 2017 while the title states 2010 to 2017this has been repeated elsewhere. Can the authors also explain how the age standardization was done?
Under the geostatistical spatio-temporal analysis, can the authors explain the model and how this fits in with sparse data scenario considering national surveys were used? What would also be useful is to be able to compute inter-cross sectional survey estimates besides 2010 and 2017 since we have direct estimates at the time of cross-sectional surveys. I am looking for added value of the model than classical analysis.
On Page 5 (population-adjusted risk), can the authors explain whether they adjusted for repeat surveys?
Results: There are varying case detection methods from survey to survey. How were these used to estimate prevalence. Sensitivity, specificity issues? Under Table 1, I wish there were estimates for intermediate years. I think the authors have not optimally utilized the data. A trend analysis would be useful as well. Can the authors explain the small changes in some districts; and also why there is such a wide variation in the changes by district?
Discussion: Can the authors discuss the design variation for the different multiple sources of data; One of the references has same title as the manuscript.
Do we have any reference to climatic changes within the period of analysis?

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
No competing interests were disclosed.

Competing Interests:
Reviewer Expertise: Biostatistics and Epidemiology.

I confirm that I have read this submission and believe that I have an appropriate level of I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 19 Jun 2019 , Malawi-Liverpool Wellcome Trust Research Programme, Blantyre, Malawi

Michael Give Chipeta
We thank Prof Tobias Chirwa for thoughtful and helpful comments made on the first version. We have improved the clarity of the paper by responding to the points raised as follows:

Summary
I have read the manuscript report and, subject to addressing my comments, should be considered for indexing. The paper looks at prevalence of malaria in space and time, taking into account multiple data sources to counter sparse data. The setting is Malawi and the authors utilize various data sources to estimate prevalence in 2010 and 2017. They argue that this change can be attributed to interventions implemented during the period. Generally, it is an important paper showing decline in prevalence over time.

A) General comment:
The authors need to justify why they only considered changes or trends based on two-time points (i.e. 2010 and 2017). They had an opportunity to optimally use the different multiple sources over time. One would have assumed the estimation would be enriched with more time points considering surveys were also conducted in the intermediate period. Can the method provide estimates for several time points? For the two time points, I therefore feel the authors should be cautious of use of statistical significance.
We used all the available malaria survey data in Malawi, which included national

et al
The current work updated data through to 2017. Our aim was to explore changes in malaria prevalence in 2017 compared to 2010. It is always tempting to estimate malaria every year, but we argue that predictions should be made to years of maximal data, and years that are important for national malaria programme planning. We used data pre-2010 simply to improve the 2010 prediction.
We have taken note and revised accordingly to replace the word "significance" with "substantial" to better present our findings on the changes between estimates in 2010 and 2017. Additionally, we have added a map of exceedance and non-exceedance probabilities to quantify with 80% or 90% certainty that a location is above or below a given prevalence threshold. this was taken into account during analysis i.e. weighting.
The 1834 are unique communities, among the 2237 communities. That's 403 (18%) Response: were repeat survey communities, taken at the same geo-location but in different years.
Weighting was directly taken into account by the spatio-temporal correlation structure Response: as defined in equation (2). More specifically, locations that are repeatedly sampled will have a spatial correlation of 1 (first factor on the right-hand side of equation (2)) but the model will still allow for variation over time as resulting from the temporal correlation (second factor on the right-hand side of the equation (2)) and from the unstructured random effects in equation (1). 2. On declining prevalence, it is not clear why the authors say the decline was significant, and how this was determined. Was this by district, 2017 vs 2010?
The prevalence decline was determined by the percentage change per district, from Response: Table 1. We have reworded "significance" to "substantial", indicating the change between prevalence in 2010 and 2017.

Introduction:
3. In paragraph 5, the authors should clarify whether the 1057 surveys are in addition to the 73 mentioned in the previous paragraph.
The 1057 surveys include the 73 surveys mentioned in the previous paragraph.

Response:
4. The authors should give an exclusion and inclusion criteria. At the moment, it is not clear why Likoma District was excluded, although distance is mentioned.
The model-based geostatistical methodology used for analysis in the current paper Response: models prevalence by borrowing strength of information across space and time. Considering that the islands are at a distance of more than 60 kilometres from the mainland, it was deemed necessary to exclude them from the analysis. Spatial dependence of information tends to diminish with increasing separation distance and this island setting may have a different ecology, by its island status, to the contiguous areas of mainland Malawi. 5. On page 4 -the authors show that there were different interventions in selected districts. Did these overlap and how were they handled in the analysis? Were the mass campaigns a national effort? And how do we explain the differences in prevalence decline?
Indeed, the interventions were overlapping, including insecticide treated-bed nets and Response: indoor residual spraying. Net distribution campaigns were a nation-wide effort for years 2008, 2011, 2012, 2014 and 2016. IRS, however, was applied in a select number of districts as outlined in the paper. We intentionally did not include interventions as a variable in the analysis. The aim was to describe parasite prevalence without adjusting for, or testing for, interventions or other variables that might influence the prevalence of infection. We feel that climate, intervention, ecological changes would be reflected in the parasite rate in a given year. Therefore, the spatial predictions in 2010 and 2017 reflect the effects of intervention at these times. We have equally avoided specific statistical testing of intervention. Rather we have taken a plausibility approach [Habbict ., 1999] and used in previous works looking at changing malaria infection prevalence et al in Africa [Snow ., 2017;Macharia ., 2018;Giorgi ., 2018]. Habicht JP, Victora CG, Vaughan JP. Evaluation designs for adequacy, plausibility and probability of public health programme performance and impact. Int J Epidemiol. 1999;28:10-8. Macharia PM, Giorgi E, Noor AM, Waqo E, Kiptui R, Okiro EA, Snow RW (2018). Spatio-temporal analysis of Plasmodium falciparum prevalence to understand the past and chart the future of malaria control in Kenya. Malaria Journal, 17: 340. Snow RW, Sartorius B, Kyalo D, Maina J, Amratia P, Mundia CW, Bejon B, Noor AM (2017). The prevalence of Plasmodium falciparum in sub Saharan Africa since 1900. Nature, 550: 515-518.
6. Under section on malaria prevalence the dates are from 2000 to 2017 while the title states 2010 to 2017 -this has been repeated elsewhere. Can the authors also explain how the age standardization was done?
The estimated prevalence was at two-time points (2010 and 2017). However, we used Response: data pre-2010 (between 2000 and 2010) to provide stable estimates in 2010 by leveraging on the temporal autocorrelation.
The assembled surveys contained a variety of sampled age groups and it is necessary Response: to have a single age-group in comparing data in time and space. The theory behind changing age profiles of infection prevalence depending on the intensity of transmission in a given location is provided in the reference by Smith . (2007). To standardise age to a single age range of 2 -10 et al years (P PR ), we incorporate this process in the model by setting minimum ( and maximum f mA) ( ages to 2 and 10 respectively, in Equation (1).

MA)
Smith DL, Guerra CA, Snow RW, Hay SI. Standardizing estimates of the Plasmodium falciparum parasite rate. Malar J. 2007;6:131. 7. Under the geostatistical spatio-temporal analysis, can the authors explain the model and how this fits in with sparse data scenario considering national surveys were used? What would also be useful is to be able to compute inter-cross sectional survey estimates besides 2010 and 2017 since we have direct estimates at the time of cross-sectional surveys. I am looking for added value of the model than classical analysis.
The model implemented is explained under this said section. A binomial model was Response: implemented to model probability of having a positive outcome(infected) (Equation 1). The model had a stationary and isotropic Gaussian process, to account for the spatio-temporal random effects modelled with an exponential correlation function (equation 2 on the manuscript). Is the Gaussian noise/unexplained variation within communities or the unstructured random effects. The parameters were estimated using Monte Carlo maximum likelihood. All the data were considered as an individual survey point surveyed at a certain location and specific time irrespective of whether it was a cross section national survey or several data points from a village. Our rationale for selecting two time-periods is discussed earlier.
8. On Page 5 (population-adjusted risk), can the authors explain whether they adjusted for repeat surveys?
Population adjusted risk in this case means adjusting for uneven distribution of Response: population when calculating malaria risk. The use of repeat surveys in the same location is discussed above.

Results:
2-10 9. There are varying case detection methods from survey to survey. How were these used to estimate prevalence. Sensitivity, specificity issues?
As detailed in the description of survey data: Most surveys employed microscopy Response: (83%), rather than RDTs (17%) for parasite detection. The only gold standard is the PCR, however, they are too few to model and compare. Standardizing between microscopy and RDTs is fraught with difficulties because there are often very few details per survey, where both used, to describe the quality of microscopy (e.g. quality of staining, slide storage, how many high-power fields examined, and magnification of microscopy) nor any reliable algorithm to standardize between different RDTs. Where surveys have compared high quality slide reading and RDTs, for example in Kenyan school children, there has always been a close correlation between methods [Gitonga . The trend between 2010 and 2017 is unlikely to have been linear and would also be Response: hampered by large confidence intervals due to few data points by year. Notwithstanding that, we aimed to provide only the overall difference between 2010 and 2017 in the current work. The predictions are available between the principle prediction years and the data can be obtained for those interested from the primary author (MGC).
11. Can the authors explain the small changes in some districts; and also, why there is such a wide variation in the changes by district?
The changes observed by district are a function of covariates such as interventions, Response: urbanization, land use etc. We did not assemble all possible covariates by district to allow direct comparison by district in a plausibility framework due to data limitations. The wide variation is likely due to sparse data and prevalence within a district spanning different endemicity levels (heterogeneity within districts) and this will be reflected in the Confidence Bounds of each prediction.

Discussion:
12. Can the authors discuss the design variation for the different multiple sources of data; For national household surveys, these are provided in the links to national household Response: survey designs. For other published works, these are often provided in detail in the peer reviewed publications. This often lead to contacts with the authors for finer spatial and temporal level data not provided in the publication. Other data has been provided without the context of a publication by malariologists working in Malawi, and all acknowledged in the paper. Overall, our only requirement was that a community or a school was sampled as a single cross-section and that requirement was that a community or a school was sampled as a single cross-section and that information was available on the date, number examined and numbers positive. It is beyond the scope of these large data assembly approaches, levering all possible national survey data, to build a detailed sampling strategy per data point. This is possible when ONLY using multi-stage, household sample surveys. And some investigators only use these for mapping work, building in survey weights. However, it is our view that for NMCPs, these are often limited in the information they can provide on malaria prevalence and ignore the vast amounts of other unpublished works that exist within a country. Here we have levered all possible data from all possible sources to be able to provide the most complete source of data on malaria infection prevalence in Malawi.
13. One of the references has same title as the manuscript.
The reference similar to the title of the manuscript references the dataset used in the Response: analysis for future citations where the data provided will be used elsewhere.
14. Do we have any reference to climatic changes within the period of analysis?
We have added a reference on the climatic changes within the study period.

Response:
Once again, we thank the reviewer and offer the improved manuscript based on this and two other reviews.
No competing interests were disclosed. , which permits unrestricted use, distribution, and reproduction in any medium, provided the original Attribution Licence work is properly cited.

Rocco Panciera
United Nation Children's Fund (UNICEF), New York, NY, USA The article presents a geostatistical analysis of the temporal and spatial variation of malaria incidence estimated by assembling a national and multi-temporal dataset of malaria prevalence surveys in Malawi between 2010 and 2017. Geostatistical spatio-temporal models are applied to provide granular estimates of malaria prevalence standardized to age 2-10, which are then weighted by population density to estimate changes in malaria prevalence within the period and for all districts in Malawi. Such changes are attributed to Malaria-specific interventions undertaken in the same period under the National Malaria Strategy 2011-2015.
The paper is well written, scientifically sound and presents conclusions of relevance to inform the success of intervention and their variability across the country. Some comments are provided below in relation to some weaknesses in specific areas.
? Are sufficient details of methods and analysis provided to allow replication by others Authors should clarify whether the 10% cross-validation exercise was repeated for multiple 1.

5.
6. I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above. We thank Dr Rocco Panciera for thoughtful and helpful comments made on the first version. We have improved the clarity of the paper by responding to the points raised as follows: Authors should clarify whether the 10% cross-validation exercise was repeated for multiple extractions. : The 10% cross-validation exercise was repeated for multiple extraction Response simulations to ensure that Monte-Carlo error was reduced as much as possible.
Could the author clarify whether the model produces an estimate uncertainty surface as usual for geostatistical models, and whether this was seen to vary significantly across space (and time), and how would this might affect the conclusion drawn on sub-national variation in rates of decrease in prevalence for particular districts.
: The model produces standard error maps, and the 2.5 and 97.5 quintiles. The Response standard error ranged from 0.0005 to 0.3 across space and time and was largely influenced with availability of data across space and time. Areas with more data were seen to have low standard errors and narrow CIs, Thus, for any policy relevant work, the, managers would be confident in areas with more data and add more data collection initiatives where data were sparse. Additionally, we have added a map of exceedance and non-exceedance probabilities to quantify with 80% or 90% certainty that a location is above or below a given prevalence threshold.
Is it possible that model performance as measured based on 10% hold-out dataset through might vary throughout the temporal, window of analysis 2010-2017?
: While temporal variation would be expected, the amount of data by year is not Response enough for the exercise.
When discussing district-level temporal changes in prevalence, authors might want to indicate when the CI includes the null (i.e., in the case of Phalombe district, a decrease of incidence is also compatible with the data (i.e., lower CI of -12%)).
: We have now indicated explicitly in the discussion that changes in the CIs in Response Machinga, Mulanje and Phalombe districts included the null value.
Authors should comment on how the MAE of 15% might impact the district level changes observed in Table 1.
: The mean absolute error (MAE) represents the average magnitude of the errors Response (the absolute differences between the predictions and actual the observations). As this is an average measure, some data points with larger values in MAE may have contributed to the value obtained. However, if the interpretation is made at district level, changes ranging between 0 and 15 % would be assumed to be within the MAE bounds and thus null. The correlation was 0.72 indicating a good correspondence between observed and predicted values.
The Authors claim that no significant climate anomalies where observed during the analysis