The changing risk of Plasmodium falciparum malaria infection in Africa: 2000–10: a spatial and temporal analysis of transmission intensity

Summary Background Over a decade ago, the Roll Back Malaria Partnership was launched, and since then there has been unprecedented investment in malaria control. We examined the change in malaria transmission intensity during the period 2000–10 in Africa. Methods We assembled a geocoded and community Plasmodium falciparum parasite rate standardised to the age group 2–10 years (PfPR2–10) database from across 49 endemic countries and territories in Africa from surveys undertaken since 1980. The data were used within a Bayesian space–time geostatistical framework to predict PfPR2–10 in 2000 and 2010 at a 1 × 1 km spatial resolution. Population distribution maps at the same spatial resolution were used to compute populations at risk by endemicity class and estimate population-adjusted PfPR2–10 (PAPfPR2–10) for each of the 44 countries for which predictions were possible for each year. Findings Between 2000 and 2010, the population in hyperendemic (>50% to 75% PfPR2–10) or holoendemic (>75% PfPR2–10) areas decreased from 218·6 million (34·4%) of 635·7 million to 183·5 million (22·5%) of 815·7 million across 44 malaria-endemic countries. 280·1 million (34·3%) people lived in areas of mesoendemic transmission (>10% to 50% PfPR2–10) in 2010 compared with 178·6 million (28·1%) in 2000. Population in areas of unstable or very low transmission (<5% PfPR2–10) increased from 131·7 million people (20·7%) in 2000 to 219·0 million (26·8%) in 2010. An estimated 217·6 million people, or 26·7% of the 2010 population, lived in areas where transmission had reduced by at least one PfPR2–10 endemicity class. 40 countries showed a reduction in national mean PAPfPR2–10. Only ten countries contributed 87·1% of the population living in areas of hyperendemic or holoendemic transmission in 2010. Interpretation Substantial reductions in malaria transmission have been achieved in endemic countries in Africa over the period 2000–10. However, 57% of the population in 2010 continued to live in areas where transmission remains moderate to intense and global support to sustain and accelerate the reduction of transmission must remain a priority. Funding Wellcome Trust.


Table of Contents
The stability of malaria transmission is defined quantitatively by the average number of feeds that a mosquito takes on man during its life 1 . The stability index, however, demands detailed entomological data that are rarely available. Qualitatively, stable malaria refers to situations that are relatively insensitive to natural and man-made changes. Unstable malaria includes areas very sensitive to climatic aberrations and very amenable to elimination. Critical to the planning of elimination during the Global Malaria Eradication Programme, fifty years ago, was a quantitative description of risk for planning control and monitoring progress. During the preparatory phase, large-scale parasite prevalence surveys were undertaken to examine feasibility of elimination. It was suggested that when infection prevalence became very low, national programmes should invest in combinations of passive, active and mass-blood survey surveillance of new infections, expressed as an annual parasite incidence (API) per 1000 people resident in a reporting administrative area [2][3][4] .
These epidemiological definitions have recently been reviewed to identify transitional points from sustained malaria control or a pathway toward elimination [5][6][7] , and adapted here. We have used the reported absence of transmission or absence of clinical events to define areas free of P. falciparum risk in combination with a temperature mask to exclude the possibility of parasite transmission from vectors to man. We used caseincidence data, where these exist, to define unstable malaria < 1 P. falciparum case per 10,000 population per annum and stable malaria ≥ 1 P. falciparum case per 10,000 population per annum. In addition, we have used a biological constraint on the ability to support stable transmission and classified all extremely arid areas as those more likely to support unstable transmission.

W1.2 Medical intelligence to define the absence and stability of P. falciparum transmission
An assembly of reports, case-incidence data and communications with national malaria control programme managers was undertaken to define the extent of active P. falciparum transmission 8 . Information has been digitized using administrative boundary units in ArcGIS (version 10.1, ESRI Inc. USA).

W1.2.1 Countries deemed P. falciparum free
Plasmodium falciparum transmission probably reached its natural extent in Africa around 1900 9 and how these historical limits of transmission have changed over the last 100 years is described in detail elsewhere 8 . The Kingdom of Lesotho; the Islands of the Seychelles archipelago (Tromelin, Cargados Carajos and Agalega); Rodrigues, Saint Brandon and Chagos in the Mascarene archipelago; and the island of Saint Helena, in the Atlantic Ocean, are African territories that have never supported malaria transmission 10 . The Western Sahara was reported in 1955 by the Spanish governing authorities to be completely free from malaria transmission 11 and the neighbouring Northern areas of Mauritania, Adrar, Inchiri, Dakhlet-Nouadhibou and Tiris Zemmour, north of 20°N are also regarded as malaria free 12 .
By 2000, almost all of the North African territories were P. falciparum free. P. falciparum free status was achieved in Libya in 1973, the Kingdom of Morocco in 1974, Tunisia by 1979, and the extent of falciparum transmission dramatically reduced by 1978 in Algeria with the exception of small area on the border with Mali, where P. falciparum transmission remains uncertain by 2010 13 . The last possible case from a residual focus of transmission in Fayoum Governorate in the United Arab Republic of Egypt was in 1998. We treat both Algeria and Egypt as P. falciparum free in 2000.
In Reunion, the elimination strategy was launched in 1949 14 and eventually led to the elimination of both P. vivax and P. falciparum by 1973 15 . Mauritius was a highly endemic island at the turn of the last century 16,17 , the first elimination campaign was launched in 1948. The WHO certified Mauritius malaria-free in 1973. P. falciparum transmission has yet to re-establish itself on the island despite a resurgence of P. vivax transmission in 1975.
Between 1910 and 1973 transmission of malaria was essentially absent from the Republic of Djibouti, where after it is thought that both parasites and vectors arrived by train from neighbouring Ethiopia leading to several epidemics in the early 1980s and sustained stable transmission in populated areas until 2000. By 2010, however, case incidence had declined significantly and the majority of Djibouti is now best described as unstable 18 .

W1.2.2 The changing extent and stability of P. falciparum transmission across Africa's islands
The Republic of Cape Verde launched a malaria elimination campaign in 1948, starting on the island of Sal and extended to other Islands throughout the 1950s. The campaign was successful and malaria was felt to have been eliminated in Sal (1950), Sao Vicente (1954), Boavista and Maio (1962) and Santiago (1968) 19,20 . With the exception of Santiago, no autochthonous cases were detected for many years on any of the islands since they were declared malaria free. However, malaria transmission was re-established on the island of Santiago in 1973. In 1979, a further national elimination programme was launched and the focus was on Santiago and the entire archipelago was returned to zero incidence between 1983 and 1986 21 . In 1995-1996 an epidemic occurred in St. Catarina district on Santiago 22 . However, in 2000 and 2010 locally acquired case incidence was below 1 per 10,000 people on Santiago (mainly Santa Catarina and Santa Cruz) and therefore regarded as unstable. On Boavista malaria had been absent since 1962, including 2000, however four possible autochthonous cases were detected in 2003, 10 cases in 2009 and three in 2010 rendering Boa Vista an area of unstable transmission in 2010.
In Mayotte (Mahore and Pamanzi islands), located within the Comorian Archipelago P. falciparum transmission has been intense and stable for many years 23 . Through integrated vector and disease control in the 1980s, parasite prevalence declined to less than 0.3% and only eight clinical cases were reported by 1988 24 . However, resurgent waves of transmission continued through the early 1990s. By 2000 malaria transmission had returned to a stable state across the islands 25 . By 2010 case incidence had declined again and using detailed case data it is possible to delineate areas of unstable and stable case incidence across the islands of Mayotte 26,27 .
The two volcanic islands that make up the Democratic Republic of São Tomé and Príncipe have never been free of malaria and have maintained stable transmission despite attempts at elimination [28][29][30][31][32][33] . There were two concerted elimination efforts on the island of Zanzibar prior to the launch of RBM, both without success 34,35 . Despite a significant reduction in parasite transmission on the island since 2000, stable endemic transmission continues 36 . Its sister island of Pemba has always supported stable transmission. The three islands that form the Federal Islamic Republic of Comoros (Grand Comore, Anjouan, and Moheli) have had high P. falciprum disease rates since the 1920s 37-39 and despite significant control efforts from 2007, prevalence and partial clinical data suggest that the three islands remain at stable, endemic transmission 40 .

W1.2.3 The changing extent and stability of P. falciparum transmission in southern Africa
A careful assembly of historical evidence of risk in South Africa pre-1940 suggests that malaria was absent from large parts of the western region and areas bordering the southern reaches of Namibia and Botswana [41][42][43] . Sustained efforts to eliminate malaria since the Second World War in the Transvaal and KwaZulu-Natal provinces rendered increasing areas unstable and reduced the spatial extent of risk [43][44][45][46] . Stable endemic risks in South Africa were constrained in 2000 to areas located along the Kruger national park, borders with Zimbabwe in the Limpopo, Mpumalanga provinces and the two northern districts of Ingwavuma and Ubombo in KwaZulu-Natal province. By 2010, case incidence in South Africa had dropped dramatically and case incidence by district has been used to delineate unstable and stable areas risks in Limpopo and Mpumalanga and unstable risks in Ingwavuma, KwaZulu-Natal Province.
In 1950, a national malaria reconnaissance was undertaken across Namibia; the most southerly areas from Grootfontein and Franzfontein to the Orange River were regarded as free of malaria transmission or very focal pockets of occasional transmission 47 . Similarly today the southern provinces of Karas and Hardap are regarded as malaria-free 48,49 . Districts located immediately north of Windhoek had some clinical cases between 2000 and 2010 and these have been mapped by health administration unit to reflect areas where annual incidence was < 1 per 10,000 population per annum 50  In Zimbabwe, since the 1920s, autochthonous transmission of malaria in Harare and Bulawayo has not been reported [55][56][57] . The various efforts to control and eradicate malaria over the years probably led to constrained areas of unstable transmission since the 1950s; by 1979, the central districts were regarded as malaria free. However, malaria case incidence rose significantly across the country from the mid-1980s to a point in 2000 when the country had returned to a stable endemic state. The investment in control since 2000 has rendered 12 districts in central highlands malaria free (Goromonzi, Seke, Marondera, Hwedza, Chikomba, Chirumanzu, Gutu, Gweru, Shurugwi, ZviShavane, Chivi and Umguza) that now form part of a consolidation phase of elimination 58 .
In the Kingdom of Swaziland, following the Second World War, successive efforts at malaria control led to an infant malaria parasite rate of 0% by 1956 59 and IRS was stopped in 1959. By 1970, it is stated that the only cases were those imported from outside the country 60 and malaria operations were drastically scaled down. During the early 1980s a resurgence of malaria risk was witnessed, briefly abated, and rising again through to a peak in the late 1990s. By 2000 the disease burden was constrained to Lubombo and Hhohho region 60 ; the highveld has continued to be malaria free 61 . Following a renewed malaria control effort, malaria case incidence has begun to decline from 2000 62 , and for the three consecutive years 2006-2008, incidence was below 1 per 10,000 population to a point in 2010 where transmission is unstable risk across the regions of Lubombo and Hhohho, with one stable endemic district of Mhlangatane, and the remaining areas the Kingdom ostensibly malaria free.

W1.3.1 Temperature:
Temperature plays a key role in determining the transmission of human malaria 63 . To provide a plausible mask to eliminate the possibility of transmission across Africa, we have used a recently developed temperature suitability index (TSI) 64 . The TSI model uses a biological framework based on the survival of vectors and the fluctuating monthly ambient temperature effects on the duration of sporogony that must be completed within the lifetime of a single generation of Anophelines. This was used to generate at each 1 x 1 km pixel, periods of an average year when a vector's lifespan would exceed the time required for sporogony, and hence when transmission was not precluded by temperature. If this time exceeded the maximum feasible vector lifespan, then the cohort was deemed unable to support transmission and the area classified as being at zero risk 64

W1.3.2 Aridity:
Arid conditions effect anopheline development and survival 65 . Limited surface water reduces the availability of sites suitable for oviposition and reduces the survival of vectors at all stages of their development through the process of desiccation 66 . We have defined extreme aridity using the enhanced vegetation index (EVI) derived from Moderate Resolution Imaging Spectrometer (MODIS) satellite imagery 67 . Monthly EVI datasets archived over 11 years were averaged to a synoptic year and used to classify into areas likely to support transmission, defined by an EVI of greater than 0.1 for any two consecutive months and areas without two or more consecutive months of an EVI > 0.1 as unable to support transmission 68,69 . This aridity mask identifies small foci of risk across the Sahara that are likely to support stable transmission because of their proximity to oases and seasonal rivers, while retaining a plausible mask of almost zero transmission across the Sahara, the arid, barren areas of Eritrea (including the Danakil depression), Djibouti and Somalia and the deserts in southern Africa including the Sossusvlei and Skeleton Coast. This mask we have used conservatively as representing unstable transmission, rather than malaria free, as transmission might be possible in the presence of man-made water bodies such as dams or underground water storage ( Figure W1.1d).

W1.4 Population distribution 2000 and 2010
Where disease risks are heterogeneous in space, population distributions and counts should be resolved to higher levels of spatial detail than large regional estimates. For many low income countries of the World, however, spatially detailed, contemporary census data do not exist. This is especially true for much of Africa, where currently available census data are often over a decade old, and at administrative boundary levels just below national-level.
Modelling techniques for the spatial reallocation of populations within census units have been developed in an attempt to (a) disaggregate population count data to a finer spatial detail and (b) convert population count data from irregular administrative units to regular raster layers [70][71][72] . In brief, a dasymetric modelling technique 73 was used to redistribute census population counts within administrative unit boundaries based on land cover data sets derived from satellite imagery to produce a gridded dataset of population distribution (counts) at 0.1 x 0.1 km resolutions. Human population census data, official population size estimates and corresponding administrative unit boundaries at the highest level available from the most recent available censuses were acquired for each African country. Typical regional per-land cover class population densities were estimated from African countries for which very fine resolution population data were available, following approaches outlined elsewhere 71 . These typical population densities were then applied as weightings to redistribute census counts according to the land cover and to map human population distributions at a finer spatial resolution. The population datasets were projected to 2000 and 2010 using national rural and urban growth rates estimated by the UN Population Division 74 .
The urban extents produced by the Global Rural-Urban Mapping Project (GRUMP) 75 were used to distinguish between urban and rural areas. Finally, datasets were adjusted to ensure that national population totals matched those reported by the UN 76 . Population distribution was re-sampled to 5 x 5km grids. Figure   50 -99people per km 2 ; 100 -499 people per km 2 ; 500 -1000 people per km 2 ; > 1000 people per km 2 W2 Parasite prevalence data search, assembly, pre-processing and summaries

W2.1 Background
There are a variety of measures of the intensity of malaria transmission derived from field investigations of human populations or vectors. The most ubiquitous measure, used for over 100 years in Africa, is the parasite rate -the proportion of individuals on a single cross-sectional survey among an entire or sampled community who have evidence of a peripheral blood stage malaria infection. This data are often expressed as infection prevalence among children aged 2-10 years (PfPR 2-10 ) and used since the 1950s to define categories of endemic risk designed to guide and monitor progress toward malaria elimination targets [77][78][79] .

W2.2.1 Electronic data searches:
Online electronic databases were used as one means of identifying peer-reviewed, published data on malaria infection prevalence including: PubMed 80 ; the Armed Forces Pest Management Board -Literature Retrieval System 81 ; the World Health Organization Library Database 82 ; and the Institute de Recherché pour le Development on-line digital library service 83 . Regional journals, most notably the large number of national medical, public health and parasitological journals, were not identified readily from the above sources but titles and abstracts were available on African Journals Online (AJOL) 84 . In all digital electronic database searches for published work, the free text keywords "malaria" and "country-name" were used. We avoided using specialised Medical Subject Headings (MeSH) terms in digital archive searches to ensure as wide as possible search inclusion. Database searches were undertaken at least once per year from 2005 and the last complete digital library search was undertaken in March 2013. Searches were supplemented through routine weekly notifications from Malaria World 85 , the Roll Back Malaria news alert service 86 , the Environmental Health at USAID malaria bulletin 87 and Santé Tropicale for Francophone country national and regional journals including Medecine D'Afrique Noire 88 .
Titles and abstracts were used to identify possible parasite cross-sectional survey data undertaken since January 1980 in a variety of forms: either as community surveys, school surveys, other parasite screening methods or intervention trials. We also investigated studies of the prevalence of conditions associated with malaria when presented as part of investigations of anaemia, haemoglobinopathies, blood transfusion or nutritional status to identify coincidental reporting of malaria prevalence. In addition, it was common practice during early antimalarial drug sensitivity protocols to screen community members or school attendees to recruit infected individuals into posttreatment follow-up surveys, often data from the survey sites selected present the numbers screened and positive. Surveys of febrile populations or those attending clinics were excluded.
Publications with titles or abstracts suggestive of possible parasite data were either downloaded from journal archives where these have been made Open Access (OA) or sourced from HINARI 89 . If publications were not available OA from HINARI we visited UK library archives at the London School of Hygiene and Tropical Medicine, the Liverpool School of Tropical Medicine, the Bodleian library at the University of Oxford, Wellcome Trust and British library or the Library at the Institute Pasteur, Paris to obtain copies. References not found following these searches were requested using world catalogue searches through the Oxford libraries at a per-page cost. All publications from which data were extracted were cross-referenced using the bibliographies for additional sources that may have been missed or that may correspond to unpublished or 'grey' literature, not controlled by commercial publishers. electronic copies (Adobe PDF) were made of all original survey reports. Finally we used the digital archive of survey data developed under the Mapping Malaria Risk in Africa (MARA/ARMA) initiative, established by one of us (RWS) in collaboration with colleagues from South Africa and other regional centres in 1996 90,91 .

W2.2.3 Unpublished community survey data post-2005:
As part of the Roll Back Malaria (RBM) monitoring and evaluation national, household surveys were resurrected as a means to monitor country-level progress 92,93 . These surveys were initially embedded in the Demographic and Health Surveys (DHS) as a malaria module and were largely focussed on intervention coverage measures until 2005 when it was agreed to include malaria infection prevalence into survey protocols. These two-stage random cluster selected national surveys included combinations of methods parasite detection in different age groups depending on the national survey. Where these surveys were co-managed by the US based MEASURE-ICF programme, data from these national surveys have been made available on-line 94  Tropical Medicine and malaria meeting abstract books were identified from as many sources as possible produced as part of national and international conferences and congresses. These were used to signal possible data that were followed up through correspondence with abstract authors. Our regional presence and connections to the wider African malaria research community has created an awareness of the purpose and ambition of malaria mapping research first started in 1996 under MARA 90 . This regional connectivity of research scientists was used to contact directly colleagues working on the epidemiology of malaria to seek disaggregated site-specific and often unpublished data. The willingness to share unpublished parasite survey data has been unprecedented, 651 individual scientists and public health specialists have contributed disaggregated information on published data, unpublished data or have assisted in the location of sampled communities across Africa, all are acknowledged at our webpage 103 .

W2.2.4 Search completeness:
Our data searches have not used systematic, traditional evidence review strategies. These would have missed many unpublished sources of information. Rather our strategy has used a cascaded, opportunistic approach. Authors of peer-reviewed papers were often asked about additional information within their paper and directions to other possible unpublished work in their geographic area or from their institution. Searches of European tropical Institute's libraries and archives focussed only on the major established tropical centres of those who had colonial ties to Africa. We have not visited every country in Africa to search for national Ministry of Health data nor had access to every national parasitology research conference proceedings, it is notable that our archived data is greatest for Kenya, Tanzania, Uganda, Senegal and Namibia where we have had access and searched archived national data. . None of these datasets have been released by the MEASURE-ICF program or national collaborating partners. Similarly we have not been able to access known national and sub-national surveys of nutritional status undertaken in Ghana (1999) and Kenya (2010) that included investigations of malaria infection prevalence. We have tried to identify as many post-graduate research theses as we could through on-line resources in the UK, however obtaining digital copies of older theses was not possible and thus remain a possible source of unpublished data not archived by us. More importantly are the many post-graduate theses undertaken by students of the faculties of parasitology, public health and medicine at the many universities across Africa that have not been adequately searched.

W2.3 Data abstraction
The minimum required data fields for each record were: description of the study area (name, administrative divisions and geographical coordinates, if available), the dates of start and end of the survey (month and year) and information about blood examination (number of individuals tested, number positive for Plasmodium infections by species), the methods used to detect infection (microscopy, Rapid Diagnostic Tests (RDTs), Polymerase Chain Reaction (PCR) or combinations) and the lowest and highest age in the surveyed population. Given its ubiquity as a means for malaria diagnosis, the preferred parasite detection method was microscopy. No differentiation was made between light and fluorescent microscopy. The quality of slide reading 104,105 , variations in sensitivity/specificity between RDTs 106 or the ability of field teams to reliably read RDTs 107,108 and selection of primers for PCR 109 all influence descriptions of prevalence and will have intrinsic variance between surveys included in the database. RDTs have been shown to yield higher false positive rates than microscopy 106 but seem to stratify both the lowest (<1% parasite rate) and highest (> 50% parasite rate) more accurately compared to routine microscopy 105 . An analysis of a large collection of community parasite rate data have shown that there was minimal difference in estimates of overall mean P. falciparum prevalence in matched paired analysis of community survey data that used microscopy and RDT for parasite examination 110 .
For data derived from randomized controlled intervention trials, data were only selected when described for baseline, pre-intervention and subsequent follow-up cross-sectional surveys among control populations. When cohorts of individuals were surveyed repeatedly in time we endeavoured to include only the first survey and subsequent surveys if these were separated by at least five months from the initial survey to avoid dependence between observations based on treatment of preceding infected individuals. If it was not possible to disaggregate repeat surveys these were finally excluded from the analysis. Where age was not specified in the report for each survey but stated that the entire village or primary school children examined we assumed age ranges to be 0-99 years or 5-14 years respectively. Occasionally, reports presented the total numbers of people examined across a number of villages and only the percentage positive per village; here we assumed the denominator per village to be equivalent to the total examined divided by the total number of villages. It was possible to establish the year of every survey; however, the month of survey was occasionally not possible to define from the survey report. Here we used descriptions of "wet" and "dry" season, first or second school term or other information to make an approximation of the month of survey and included a record of this assumption. Some survey results were reported as an aggregate in space (e.g. a single PfPR for a group of villages) or time (e.g. a mean PfPR estimated from four different surveys conducted over time). In such cases, we either sought additional reports of the same surveys with higher spatial or temporal resolution. Where this was not possible and where clusters of villages exceeded 5 km 2 we excluded the record from the analysis (see below). Where additional information to provide unique time, village specific data was necessary we contacted authors to provide any missing information.

W2.4 Data geo-coding
Data geo-coding, defining a decimal longitude and latitude for each survey location, was a particularly demanding task. According to their spatial representation, data were classified as individual villages, communities or schools or a collection of communities within a definable area, corresponding to an area within a 5 km grid or approximately 0.04 decimal degrees at the equator. Where possible we aimed to retain disaggregated village, "point" level data rather than data across a "wide-area". Where data were reported across communities that exceeded at 5 km grid we regarded these as too low a spatial resolution, with significant possible variation within the polygon of information to be excluded within the modeling phase. In practice this was a difficult criterion to audit as most survey reports did not provide enough detail on the size of the area surveyed. More recent use of Global Positioning Systems (GPS) during survey work does enable a re-aggregation of household survey data with greater precision and useful in maintaining 5 km grid criteria while combining clusters of small sample sizes in space. To position each survey location where GPS coordinates were not available in space we used a variety of digital resources, amongst which the most useful were Microsoft Encarta Encyclopedia (Microsoft Inc., 2004) and Google Earth (Google Inc., 2009). Other sources of digital place name archives routinely used included GEOnet Names Server of the National Geospatial-Intelligence Agency, USA 111 ; Falling Rain Genomics' Global Gazetteer 112 ; and Alexandria Digital Library prepared by University of California, USA 113 . Across Africa a number of national digital, GPS confirmed, place-name gazetteers exist for populated places, health facilities or schools. These are increasing in number, precision and coverage. These were obtained on request from national census bureau's, ministries of education and health and NGO partners and proved to be valuable locating communities in Kenya, Uganda, Burkina Faso, The Gambia, Mauritania, Zambia, Mozambique, Madagascar, Somalia, Malawi, Ghana, Burkina Faso, Niger, Namibia, South Africa, Tanzania and Zanzibar. Although standard nomenclatures and unique naming strategies are attempted in digital gazetteers 114 , these are difficult to achieve at national levels where spellings change between authors, overtime and where the same place names are replicated across a country. As such, during the data extraction, each data point was recorded with as much geographic information from the source as possible and this was used during the geo-positioning, for example checking the geo-coding placed the survey location in the administrative units described in the report or corresponded to other details in the report on distance to rivers or towns when displayed on Google Earth. While in theory GPS coordinates should represent an unambiguous spatial location, these required careful re-checking to ensure that the survey location matched the GPS coordinates. As routine we therefore rechecked all GPS data from all sources using place names and/or Google Earth to ensure coordinates were located on communities.
All coordinates were subject to a final check using second level administrative boundary Global Administrative Units Layers (GAUL) spatial database developed and revised in 2008 by Food and Agriculture Organization (FAO) of the United Nations 115 . The spatial selection tool in ArcGIS 10.1 (ESRI, USA) was used to verify points along the coastline were within land area as defined by GAUL 2008. The Global lakes and Wetlands (GLWD) database developed by the World Wildlife Fund 116 was used to ensure inland points were within defined land area. Here we aimed to identify survey coordinates that fell slightly off the coastline, located on the river or in incorrect administrative units, every anomaly was re-checked and re-positioned using small shifts in combination with Google Earth.

W2.5. Database fidelity checks and exclusions
Data checks: The entire database was first checked with a series of simple range-check constraint queries to identify potential errors that could have occurred during data entry. These queries assessed all data fields relevant to modelling for missing or inconsistent information. The final objective was to check for any duplicates introduced during the iterative data assembly process. Pairs of survey sites found within 1 km or within five months at the same location were identified. These may have entered erroneously into the data assembly where multiple reports reviewed describing similar data. These were listed, checked and duplicates removed. Location details: Despite repeated efforts and multiple on-line digital gazetteers, national resources and personal communications we were unable to identify with sufficient precision the geo-coordinates for 540 (1.9%) survey data points at 500 unique locations.
Polygon and multiple repeat surveys: We were unable to obtain higher spatial resolution data from 23 survey reports from 13 locations that described prevalence across areas exceeding 5 km 2 in Sudan (13), Comoros (3), Tanzania (2), Bioko Island of Equatorial Guinea (1), Gabon (1), Ghana (1) and Zambia (1). Sixteen unique locations were surveyed more than once and it was not possible from the original survey reports or attempts to communicate with investigators to separate out the first from subsequent surveys in Mali (10), Senegal (2), Sudan (2), Côte d'Ivoire (1) and Cameroon (1).
Ensuring sample precision: Sample size is inversely related to prevalence where, at low sample sizes, biases in prevalence estimates are introduced, dependent on the true prevalence of the population and translates into large 14 standard errors 117 . There is a critical threshold of between 10 and 20 individuals sampled below which the standard error increases exponentially in most surveys of parasitic infections and the curve starts to flatten at a sample size of about 50 and reaches an asymptote at about 100 118 . The sample size of individual survey samples is also important in the derivation of correlations with covariates of endemicity, in testing plausible associations between say rainfall and prevalence during covariate selection where small, imprecise samples can lead to over-fitting (Table W3.4). We aimed to combine communities in close proximity where any village had less than 15 people sampled and where communities were within 5 km of each other, sampled at exactly the same time by the same investigators. Using these criteria we were still unable to merge data from 1,191 (4.2%) time-space locations and these were excluded from the final analysis (see Table W2.1 for country distribution). Coastal and lake islands or malaria free locations: Off of Africa's coastline and the shores of its major inland lakes several small islands exist. These pose problems for approaches of matching environmental covariate grids (seeW3.3) as the sizes of the islands are often less than 5 km 2 and not captured by existing climate and ecology datasets. We have identified all off-shore, small islands from within the data set and have excluded from the analysis 304 (1%) data points where surveys were undertaken on these islands. These island exclusions included 137 survey data points from the islands of Bukooli, Bujumba, Bunya, Busiro, Buvuma, Kyamusua and Mukono parishes of Uganda located in Lake Victoria; 66 data points from Wasini, Lamu, Pate and Shella islands in the Indian Ocean and islands belonging to Suba District in Lake Victoria in Kenya; 34 survey data points from the Islands of Likoma in Lake Malawi; 26 survey points from the archipelago of Bolama/Bijagos islands in the Atlantic, off the coast of Guinea Bissau; seven survey data points on Lagos Island, Nigeria; five data points from Nosy Be island in the north of Madagascar; two survey data points from Lagunes area of Côte D'Ivoire; four data points located on islands in Lake Mweru, Zambia; one from the island of Fernan-vaz, Gabon; two from the island of Mafia, Tanzania; and one from Bol located in Lake Chad, Chad. In addition, 56 time-space survey locations were sampled in areas of South Africa (51), Harare, Zimbabwe (1), Kingdom of Swaziland (1) and Mauritania (3) that by 2000 were classified through medical intelligence (W1.2) as malaria free. These were removed from the final analysis data set.

W2.6 Data summaries
The eight year data search identified 26,746 temporally unique data points at 21,341 survey locations undertaken since January 1980 using the inclusion criteria described above (see Table W2

W2.7 Age standardization
There was a large diversity in the age ranges of sampled populations between studies. To make any meaningful comparisons in time and space a single standardized age range is required. Correction to a standard age for Plasmodium falciparum is possible based on the observation and theory of infectious diseases where immunity is acquired following repeated exposure from birth. We have retained the classical age range of 2-10 years as this best describes the exposure to infection among semi-immune hosts at any given location and conforms to classifications established in the 1950s 77 . We have adapted catalytic conversion Muench models, first used in malaria by 121 , into static equations in R-script that uses the lower and upper range of the sample and the overall prevalence to transform into a predicted estimate in children aged 2-10 years, PfPR 2-10 122 .In brief, this algorithm computes the probability that an individual of a known age will be positive for P. falciparum infection as a function of the observed presence of infection, the age-dependent sensitivity of detecting infections and the age distribution of sampled populations 122 .

W3.1 The problem of over-fitting in variable selection
In statistical modelling, a set of independent covariates of the main outcome measure is often used to improve the model fit and increase the precision of predicted estimates [123][124][125][126] . The inclusion of these covariates increase model complexity and, if not carefully selected, risk over-fitting (using up too many degrees of freedom), which occurs when more terms or covariates than is necessary are used in the model fitting process 123,125,126 . Overfitting can lead to poor quality predictions because coefficients fitted to these covariates add random variations to subsequent predictions and make replication of findings difficult 123 . Where too many covariates are used, the model tends to produce highly fluctuating regression coefficients increasing the chances of large covariate coefficients and an overly optimistic fit, especially with small sample sizes 123 . This problem can be particularly pronounced when data assembled are from observational studies based on different study designs, sampling considerations and sample sizes which are then combined to describe a random process 125 . The considerations and procedures for the selection of covariates have therefore been described as one of the most important stages of the statistical modelling process 124,125,127 .
The choice of these covariates should be underpinned by the principle of parsimony (few strong and easily interpretable covariates) and plausibility (a clearly understood mechanism by which the covariate influences the outcome) [123][124][125] . In disease mapping there must be a pre-determined aetiological explanation of the relationship of the disease and the covariate under consideration. Including in the prediction process covariates whose relationship with the outcome cannot be explained apriori may still lead to a 'good' model fit but this may largely be an artefact. To model the risks of P. falciparum malaria transmission in time and space, we have implemented a process of assembling a minimum, important set of covariates that have a biologically plausible relationship with malaria transmission and subjected them to a stringent selection process for inclusion into subsequent predictive models.

W3.2 Assembling plausible covariates of PfPR 2-10
The determinants of malaria transmission are climatic (rainfall and temperature), ecological (potential breeding sites and urbanisation) and control interventions (anti-vector and ant-parasitic measures) 128,129 . These factors affect the development and survival of the P. falciparum parasite and the malaria-transmitting Anopheles vector thereby reducing the risks of infection. These covariates must be spatially and temporally matched with the observed data on transmission to account for their effect in time and space. However, they are rarely available at time points that correspond with the date of surveys. This is mainly because most of the covariate information is derived from long-term processed remotely sensed satellite imagery, which have become available at sufficiently high resolution only in the last decade 130 or modelled climatic data generated as synoptic estimates that do not represent a specific year 131 . Even where available, the inclusion of time-varying covariates, while statistically attractive, increases the complexity of MBG methods substantially to the extent that computations may be intractable where large datasets are involved. Regarding malaria control interventions, in addition to poor availability of space-time matching data, their inclusion creates an intrinsic circularity when analysing the role of intervention coverage against changing transmission, an area we hope to address in the future. For these reasons, the assembled covariates were long-term annual average raster grid representations of the climatic and environmental determinants of transmission and described in more detail below.

Temperature:
Laboratory experiments have shown that high temperatures (>34 o C) lead to almost 100% larval mortality and at lower temperatures (<16 o C) the larvae were unable to produce viable adults 132,133 . The mortality of the anopheles mosquitoes also increase sharply at ambient temperatures approaching 40 o C 134,135 .
Temperatures of between 25°C and 30°C are considered optimum for P. falciparum sporogony 128 .
It is on the basis of these biological relationships that we have assembled two temperature metrics in order to test their statistical relationships with PfPR 2-10 . These were: annual mean temperatures; and a biologically modeled temperature suitability index (TSI). The annual mean temperature surface was developed from monthly average temperature raster surfaces at 1 × 1 km resolution which were downloaded from the WorldClim website 136 . These surfaces were produced from global weather station temperature records gathered from a variety of sources for the period 1950-2000 and interpolated using a thin-plate smoothing spline algorithm, with altitude as a covariate, to produce a continuous global surface 131 (Figure W3.1a). TSI, as explained in W1.3.2, was developed as a quantitative measure of optimal P. falciparum sporozoite development at 1 × 1 km spatial resolution 64 . The TSI model uses a biological framework based on the survival of vectors and the fluctuating monthly ambient temperature effects on the duration of sporogony that must be completed within the lifetime of a single generation of Anophelines. The TSI is constructed using long-term monthly temperature time series 131 and represented on a scale of increasing transmission suitability, from 0 (unsuitable) to 1 (most suitable) ( Figure  W3.1b).

Proxies of suitable conditions for larval development (precipitation and vegetation):
Rainfall, combined with suitable ambient temperatures, provides potential breeding environments for Anopheles vectors while humidity is associated with vector longevity 137,138 . Normally, proxies of rainfall such as precipitation and vegetation are used in malaria risk predictions in Africa 131,125,130,139 . This is because actual rainfall data, typically collected from weather stations, are sparse throughout the continent 9 .
Monthly mean precipitation raster surfaces at 1 × 1 km resolution were downloaded from the WorldClim website 136 and used as a proxy for rainfall compiled from weather stations over a similar period as the mean temperature surfaces 131 . These monthly surfaces were summed to generate a synoptic annual mean precipitation surface ( Figure W3.1c). For vegetation, Fourier-processed enhanced vegetation index (EVI), derived from the Moderate-resolution Imaging Spectroradiometer (MODIS) sensor imagery and available at approximately 1 × 1 km spatial resolution 130 was used to develop an annual mean EVI surface. EVI is an index of intensity of photosynthetic activity and ranges from 0 (no vegetation) to 1 (complete vegetation) ( Figure W3.1d). EVI, compared to the more commonly used Normalised Difference Vegetation Index (NDVI), is developed from satellite imagery of higher spatial and spectral resolution and corrects for some distortions in the reflected light caused by the particles in the air as well as the ground cover below the vegetation 140 .

Urbanization:
The availability of optimum environments for the development of the malaria transmitting anopheline populations become limited in urban areas resulting in reduced vector density, biting rates and transmission intensity [141][142][143] . Overall malaria infection rates are therefore lower in urban compared to rural areas of Africa 144,145 .
Countries have different definitions of urbanisation that are rarely comparable across regions making their use in predictive modelling limited 145,71 . To develop a consistently defined surface of urbanisation, information from the Global Rural Urban Mapping Project (GRUMP) 75 146,75 . To define urban extents, a border was defined around each set of contiguous lighted pixels whose total population count was greater than 5,000 persons. Because not all urban settlements are 'well-light' to be detected by satellite sensors, a buffer was drawn around settlement points to estimate spatial extents of the settlements. The GRUMP urban extent was further refined to produce a 'peri-urban' classification constrained by population density using the Afripop data 72 . Urban areas were defined as locations with a density of more than 1000 persons per km 2 with the rest of the GRUMP urban extent defined as peri-urban ( Figure W3.1e).

Figure W3.1 Raster grid surfaces at 5 × 5 km of: a) Annual mean temperature; b) Temperature suitability index (TSI) for malaria transmission; c) Annual mean precipitation; d) Annual mean enhanced vegetation index (EVI); e) Urbanisation in 2010.
The urbanisation surface derived from the Africa population surface was used as a template grid on which all other environmental grid were spatially matched. The light grey areas are those that were malaria free and the dark grey those that supported unstable transmission in 2000.

W3.3 Statistical selection of covariates
There are several approaches to selecting the best-fitting covariates in a statistical model 123,125,126,[148][149][150] . For this study, we used a strict covariate selection approach in which a total-sets analysis based on a generalized linear regression model was implemented in bestglm package in R [151][152] . Under this approach the best combination of covariates, which were those with the lowest value of the BIC statistic when regressed against PfPR 2-10 , were selected. The BIC penalty increases if the number of predictors is >7 so that compared to other information criterion, it selects the most parsimonious model 153 .
To begin the selection process the values of the assembled covariates were extracted to each country-specific PfPR 2-10 survey location using ArcGIS 10 Spatial Analyst (ESRI Inc. NY, USA) tool. The data from the Islands were excluded in this analysis because several of the covariate surfaces did not have information for these islands (Cape Verde, São Tomé and Príncipe, Comoros and Mayotte) due to their small geographic size. For these islands, the data only was used to predict PfPR 2-10 . For each country, the total-set analysis was implemented on the full data from the period 1980-2012. The coefficients and 95% confidence intervals of the best-fit covariates from the total-set analysis are shown in Table W3.1. None of the selected covariates were selected in the best-fit model for Guinea Bissau, Eritrea and South Africa, Swaziland.

W3.4 Model-based geostatistics (MBG)
The application of MBG methods in disease mapping is an established norm 154 . Where spatial and temporal data are available, MBG methods fit the data, in a way that the spatial (and temporal) covariance is used to generate samples of the predicted posterior distribution from which point estimates and the uncertainty around these estimates are computed simultaneously [154][155][156] . Normally, Bayesian inference is done using Markov Chain Monte Carlo (MCMC) algorithms 157 . MCMC approaches, although used widely, suffer from problems of convergence and dense covariance matrices which increase the computational time and cost significantly, especially where there are large data points spatially and temporally 158

W3.5 Model specification
A Bayesian hierarchical spatial-temporal model was implemented through SPDE approach using R-INLA library [R-INLA] to produce continuous maps of PfPR 2-10 at 1 × 1 km spatial resolution using data ranging from 1980-2012. The continuous indexed GF with covariance function was represented as a discretely indexed random process, that is, as a Gaussian Markov Random Field (GMRF) [159][160][161] . This is where an explicit link between Gaussian Field (GF) and GMRF formulated as a basis function is provided through Stochastic Partial Differential Equations (SPDE) approach [161][162][163][164] . The solution for SPDE can be expressed as whereW , is the spatial Gaussian white noise, Δ is the Laplacian,α controls the smoothness of the realization and τ controls the variance. The link between Matérn smoothness v and variance 2 σ is , where d is the spatial dimension 165 . An approximation of this SPDE can be solved using a finite element method (FEM), which is a numerical technique for solving partial differential equations 43 . In this case, the spatio-temporal covariance function and dense covariance matrix of the GF are replaced by a neighbourhood structure and a sparse precision matrix respectively and together define a GMRF. A GMRF can be described as a spatial process that models spatial dependence of data observed at a spatial unit like grid or geographical region and it can be expressed as . This is an n-dimensional GMRF with mean µ and a symmetrical positive definite precision matrix Q computed as the inverse of the covariance matrix 159 . Thus the density of u is given by The sparse precision matrix Q offers computational advantage when making inference with GMRF. This is because the linear algebra operations can be performed using numerical methods for the sparse matrices which results in a considerable computational gain and this is further enhanced by using INLA algorithm for Bayesian inference [159][160] . The infinite-dimensional GRF is replaced with a finite-dimensional basis function representation where i w represents the weights and i Ψ are piece-wise linear basis functions defined on a triangulation of the domain with n nodes which are defined as mesh in the code 161 . The basis functions are deterministic and are defined by each node in the triangulation while the stochastic property of the process is determined by the weights. The model used in this paper assumed non-stationary GRFs because environmental phenomena which are known to influence PfPR 2-10 are non-stationary in nature and therefore the distribution of PfPR 2-10 is nonstationary 166 . This non-stationary model was made possible by the flexible nature of SPDE models which allows modification of the SPDE rather than the covariance function to obtain the GFRs with other dependence structures other than the stationary Matérn covariance. The stationary isotropic Matérn covariance function, between locations u and v in d° is expressed as Where V K is the modified Bessel function of the second kind, ⋅ denotes the Euclidean distance and order Year instead of month was used as the metric for time due to uncertainty around actual survey is several of the historical data and the added substantial complexity for the model. The prior for the SPDE model by default are Gaussian. These priors are chosen heuristically to match the spatial scale of the MESH domain. The user can override the defaults by supplying a "hyper" parameter 165 . This is normally suitable when the dataset lacks enough information for the likelihood to fully identify the parameters for the prior distribution. In this paper the SPDE default priors were sufficient for the model. The mesh function (inla.mesh.create) in INLA is used to create a Constrained Refined Delaunay Triangulation (CRDT). The overall effect of the triangulation construction is that, if desired, one can have smaller triangles, and hence higher accuracy of the field representation. However, this will have an effect on the computation of the model. There is therefore a need to balance the number of triangles and the computation time required. If the data points (cluster coordinates) are used to construct the mesh, a cut-off value is specified in the function represents which is the maximum distance in which data points are represented by a single vertex. If the boundary of the area domain is used to construct the mesh, (i.e using the function points.domain=border), then the mesh is constructed to cover the border of the domain using restrictions provided in other arguments. But if both data points and area domain (boundary) are used the restrictions are combined. In this model, the mesh was constructed using the boundary of the area domain. This method produces a mesh with regular size of triangles. A cut-off value was specified to avoid building many small triangles around PfPR 2-10 input locations. A reasonable offset value was used to specify the size of the inner and outer extensions around the data locations. The maximum edge value was used to specify the maximum allowed triangle edge lengths in the inner domain and in the outer extension. The inner maximum edge value was made small enough to allow the triangulation to support representing functions with small enough features, and typically smaller than the spatial correlation range of the model. Therefore this value was adjusted to fit the range of the area domain in the model 165 .

W3.5.1 Constructing a suitable MESH
A matrix was then constructed to link the PfPR 2-10 input locations to the triangles on the mesh defined by ŋ* as = ( + 1) and in the inla code in the following inla.spde.make.A function. This makes each row in the matrix to have three non-zero elements since every data point is inside a triangle and the corresponding columns are expected to have non-zero elements. In order to obtain a square matrix for the model, the response was linked to the index of the random field, where the length of the index vector was the same as the length of the projection matrix. In order to estimate the intercept, the stack function introduces a vector of ones in the matrix and this is later removed in the formula 165 .

W3.5.3 Model Validation
As a first step to understanding the uncertainty around the predictions of PfPR 2-10 using the Bayesian geostatistical model, the continuous mean map and the endemicity class map for each year were accompanied by estimates of the posterior standard deviation ( Figure W3.3).The percentage of population by standard deviation to posterior mean PfPR 2-10 is shown in Table W3. 3. In addition, a spatially and temporally representative validation set of PfPR 2-10 survey data were selected using a sampling algorithm 167 which declusters over space and time. This algorithm defined the Thiessen polygons around each survey location with each data point having a probability of selection proportional to the area of its Thiessen polygon and taking time factor into place. This allowed for data located in densely surveyed regions to have a lower probability of selection than those in sparsely surveyed regions setting a high threshold for model performance. The annual predictions were then repeated in full using the remaining data points to predict PfPR 2-10 at the validation locations. The ability of the model to predict point-values of PfPR 2-10 at unsampled locations was quantified using four simple summary statistics: the correlation coefficient between the predicted and actual set; the mean prediction error (MPE); the mean absolute prediction error (MAE); and the root mean square error (RMSE).
The correlation coefficient provides a simple measure of linear association between the data and prediction sets, the ME provides a measure of the model bias, and the MAE and RMSE is a measure of the average accuracy of individual predictions. The results of the model validation by country are provided in Table W3.4.

W3.6 Rationale for not selecting data from borders with neighbouring countries
We paid a particular attention to the issue of borrowing border data. Conceptually it makes sense to borrow data from borders on the assumption that transmission on one side of the border is likely to be a function of transmission on the other side. However, what is clear from the published literature 168, 169 is the large disparity in the extent of malaria control and socio-economic progress and therefore the likely reductions in transmission, between neighbouring countries in Africa.
To reliably understand the effect of borrowing data from borders, the best test case is using border data to predict risk in a country where large reductions of malaria have been achieved and where evidence exists that transmission is extremely low, but borders a country of relatively high transmission and low levels of control. We used Mozambique (high transmission, low control) and Swaziland (very low transmission, high level of control) for this illustrative experiment. In Swaziland the MIS of 2010 showed that out 5567 people who were tested, only 1 was positive for malaria.
When we made predictions without using border data the risk map largely corresponded with the reported data. When additional data from a 100 km buffer into Mozambique was used, the northeastern part of Swaziland showed highly elevated levels of risk (see images below), which is contrary to any available evidence from the country (Kunene et al 2011). Therefore, making predictions for a country using only data from that country was a more parsimonious approach that will prevent over or under estimation of risk along the borders of a country.