Estimating Air Temperature and Its Influence on Malaria Transmission across Africa

Malaria transmission is strongly influenced by climatic conditions, which determine the abundance and seasonal dynamics of the Anopheles vector. In particular, water temperature influences larval development rates, whereas air temperature determines adult longevity as well as the rate of parasite development within the adult mosquito. Although data on land surface temperature exist at a spatial resolution of approximately 1 km globally with four time steps per day, comparable data are not currently available for air temperature. To address this gap and demonstrate the importance of using the right type of temperature data, we fitted simple models of the relationship between land surface and air temperature at lower resolution to obtain a high resolution estimate of air temperature across Africa. We then used these estimates to calculate some crucial malaria transmission parameters that depend strongly on air temperature. Our results demonstrate substantial differences between air and surface temperatures that affect temperature-based maps of areas suitable for transmission. We present high resolution maps of the malaria transmission parameters driven by air temperature and their seasonal variation. The fitted air temperature datasets are made publicly available alongside this publication.


Fourier transforms
The Fast Fourier transform [1] employed in this study requires input data at … We averaged the weight of each mode over the 1113 locations across Africa on the 1.5° grid, for each of the time series variables used in this study (Figure S1). In general, low frequencies are more important than fast fluctuations. By far the most important mode is the annual mode, followed by the biannual mode for all datasets except the EVI and MIR time series, where even lower frequencies take the second and third ranks. These slower-than-annual frequencies describe fluctuations between years and are therefore not of interest if results are to be generalised to years outside the range of the input time series. Overall, frequencies that are integer multiples of the annual frequency tend to be more important than other frequencies of similar magnitude, particularly in the low-frequency regime of up to around 5 oscillations per year, reflecting the fact that the seasonality is driven by annual cycles.
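The ranking of modes described above can be illustrated with a minimal sketch. It assumes that the "weight" of a mode is its squared Fourier amplitude normalised over all non-constant modes, and it uses a synthetic series of 4 years at 64 points per year; the weight definition, the series and the function name `mode_weights` are illustrative assumptions, not the definitions used in the study.

```python
import numpy as np

def mode_weights(series):
    """Relative weight of each Fourier mode (assumed here: normalised squared amplitude)."""
    series = np.asarray(series, dtype=float)
    # Subtract the mean so that the constant term does not dominate the ranking
    coeffs = np.fft.rfft(series - series.mean())
    power = np.abs(coeffs) ** 2
    power[0] = 0.0  # discard rounding residue left in the constant term
    return power / power.sum()

# Synthetic example: annual plus weaker biannual oscillation (values are assumptions)
points_per_year, n_years = 64, 4
t = np.arange(points_per_year * n_years) / points_per_year  # time in years
series = 25 + 5 * np.cos(2 * np.pi * t) + 1.5 * np.cos(4 * np.pi * t + 0.3)

weights = mode_weights(series)
# Frequency of each mode in cycles per year
cycles_per_year = np.fft.rfftfreq(series.size, d=1 / points_per_year)
ranking = np.argsort(weights)[::-1]          # most important mode first
top_two = cycles_per_year[ranking[:2]]
print(top_two)  # [1. 2.]: annual mode first, biannual mode second
```

For several locations, the same function could be applied to each time series and the resulting weights averaged per mode.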

Reconstructing smoothed time series from the Fourier transforms
We reconstructed the time series from the Fourier transforms using only the most important modes. We started with the constant term H_0 only, which yields values that vary across locations but are constant throughout the year; then included the constant term and the annual mode, which results in sinusoidal oscillations around the mean value; and finally added successively higher modes at integer multiples of the annual frequency. We calculated correlations between the time series aggregated to 64 points per year, prior to Fourier transforms, and the Fourier reconstructions of these time series evaluated at the same time points (Figure S2). The correlations obtained with the constant term only vary considerably between the datasets and are surprisingly high for some datasets, indicating a low level of seasonal variation compared to the spatial variation. The correlations increase steeply with inclusion of the first two annual modes but then reach a plateau, so including further frequencies in the reconstruction does not improve the capture of the general seasonal patterns. Plots of the original time series compared to the Fourier reconstructions based on the first few annual modes show that the annual mode captures the majority of the seasonal oscillations, whereas the biannual mode refines the shape to follow the observed patterns more closely, and in most places the triannual mode contributes little to the overall visual match (Figure S3 to Figure S5), confirming the results of the correlation comparison. We therefore use the reconstructed time series based on the constant term H_0 and the annual and biannual modes, H_1 and H_2, respectively, to describe the overall seasonality. This reduces the number of parameters to be stored from 256 to 5 per location.
The observed values vary around the smoothed curves with a standard deviation of the difference of 1.56°C, 1.57°C and 2.04°C for air, modelled air and surface night temperatures; 2.03°C, 1.66°C and 3.40°C for air, modelled air and surface day temperatures; 0.030 and 0.023 for EVI and MIR; and 2.13 mm per day for rainfall.
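A minimal sketch of such a truncated reconstruction is given below, assuming a series that covers a whole number of years at a fixed number of points per year. The synthetic data, the noise level and the function name `reconstruct` are assumptions for illustration. Keeping the constant term and two modes corresponds to one real and two complex coefficients, which is consistent with the five stored parameters per location.

```python
import numpy as np

def reconstruct(series, points_per_year, n_modes=2):
    """Rebuild a series from the constant term and the first n_modes annual harmonics."""
    series = np.asarray(series, dtype=float)
    if series.size % points_per_year != 0:
        raise ValueError("series must cover a whole number of years")
    n_years = series.size // points_per_year
    coeffs = np.fft.rfft(series)
    kept = np.zeros_like(coeffs)
    # Index 0 is the constant term; index k * n_years is k cycles per year
    idx = [k * n_years for k in range(n_modes + 1)]
    kept[idx] = coeffs[idx]
    return np.fft.irfft(kept, n=series.size)

# Synthetic seasonal signal with measurement noise (all values are assumptions)
points_per_year, n_years = 64, 4
t = np.arange(points_per_year * n_years) / points_per_year
clean = 25 + 5 * np.cos(2 * np.pi * t) + 1.5 * np.cos(4 * np.pi * t + 0.3)
rng = np.random.default_rng(0)
observed = clean + rng.normal(0.0, 1.5, t.size)

smooth = reconstruct(observed, points_per_year, n_modes=2)
correlation = np.corrcoef(observed, smooth)[0, 1]   # agreement with the observations
residual_sd = np.std(observed - smooth)             # spread around the smoothed curve
print(round(correlation, 2), round(residual_sd, 2))
```

Calling `reconstruct` with `n_modes=0`, `1`, `2`, `3` in turn gives the sequence of reconstructions whose correlations with the observations can then be compared.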

Missing data
Some of the time series datasets contain a substantial amount of missing data (due to reasons such as cloud cover). In order to analyse the proportion of missing data we restricted our analysis to a grid of 1.5° resolution, see Figure S6. The largest amount of missing data is found for the night time temperatures, with over 30% of data points missing in some locations. For the day time surface temperatures, in most locations we had complete time series, but a few locations showed up to 17% of missing data. The EVI and MIR datasets shared the missing data patterns. For these, locations had between 0 and 4 of the 184 data points missing (up to 2.2%), with more missing data in the north than the south. In the rainfall dataset, only two of the 1113 locations on the grid had one data point missing; all other locations had complete daily time series. For some locations we found lengthy periods without any night or day time temperature measurements, which we interpolated linearly between the neighbouring data points. This could potentially affect the Fourier transforms and render the reverse transformations unreliable. However, the locations with the most missing data were in areas where the annual variation in temperature was low (Figure S7), and the time series reconstructed from the Fourier transforms appeared to be reasonable approximations of the seasonal patterns in the original measurements even for the locations with the largest proportion of data missing (Figure S8).

… Figure S6A: longitude 9°, 27° and 27° and latitude 6°, -1.5° and -3° for A, B and C, respectively. Black circles are the actual data points, red crosses mark the missing time points, and black lines show the reconstruction via Fourier transforms.
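The linear interpolation of gaps can be sketched as follows, assuming missing values are coded as NaN and the time points are equally spaced; the helper name `fill_gaps_linear` and the example values are hypothetical.

```python
import numpy as np

def fill_gaps_linear(values):
    """Fill NaN gaps by linear interpolation; also return the fraction missing."""
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    t = np.arange(values.size)  # equally spaced time index (assumption)
    filled = values.copy()
    # np.interp holds the first/last observed value for gaps at the series ends
    filled[missing] = np.interp(t[missing], t[~missing], values[~missing])
    return filled, missing.mean()

# Example series with a two-point gap and a single-point gap
series = np.array([20.0, np.nan, np.nan, 23.0, 22.0, np.nan, 20.0])
filled, fraction_missing = fill_gaps_linear(series)
print(filled)            # [20. 21. 22. 23. 22. 21. 20.]
print(fraction_missing)  # 3 of 7 points missing
```

Long gaps filled in this way become straight line segments, which is why their possible effect on the Fourier transforms needs to be checked.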

Sensitivity analysis of the ring radius
We aggregated the environmental data within a radius of 5 km in order to represent the surrounding area on a length scale relevant for local malaria transmission dynamics, rather than use data from single point locations. As the radius of 5 km was chosen fairly arbitrarily, we investigated the effect of different radii on the datasets, using the night and day time surface temperatures as examples. Results for the other datasets were very similar.
Because spatial aggregation within a ring provides more potential input data points the larger the ring radius, the proportion of missing data was smaller for larger radii (Figure S9). Datasets with a smaller ring radius contained slightly more extreme values, but the differences between the datasets were small, with all pairwise correlations between the 2 km, 5 km and 10 km ring radius datasets above 0.98 for both night and day surface temperatures (Figure S10).
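The effect of the radius on missing data can be illustrated with a small sketch. It assumes a regular grid of 1 km pixels, planar distances, missing pixels coded as NaN and aggregation by the mean of the available pixels; these choices, the synthetic grid and the function names are assumptions and not necessarily the procedure used in the study.

```python
import numpy as np

def disc_mean(grid, row, col, radius_km, pixel_km=1.0):
    """Mean of the non-missing pixels within radius_km of pixel (row, col); NaN if none."""
    rows, cols = np.indices(grid.shape)
    dist = np.hypot(rows - row, cols - col) * pixel_km  # planar distance (assumption)
    values = grid[dist <= radius_km]
    valid = values[~np.isnan(values)]
    return valid.mean() if valid.size else np.nan

# Synthetic temperature grid with roughly 30% of pixels missing
rng = np.random.default_rng(1)
grid = rng.normal(25.0, 3.0, size=(20, 20))
grid[rng.random(grid.shape) < 0.3] = np.nan

def missing_fraction(radius_km):
    """Fraction of centre pixels for which no data fall within the radius."""
    means = [disc_mean(grid, r, c, radius_km)
             for r in range(grid.shape[0]) for c in range(grid.shape[1])]
    return np.mean(np.isnan(means))

# Radius 0 uses the centre pixel only; larger radii draw on more pixels
fractions = [missing_fraction(radius) for radius in (0, 2, 5)]
print(fractions)
```

Since a larger disc always contains the smaller one, the missing fraction cannot increase with the radius in this setup.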

Figure S10: Density of the surface temperatures for a 2 km radius (panels A and C) and a 10 km radius (panels B and D) against the surface temperatures for a 5 km radius for night (A and B) and day time temperatures (C and D).
Temperatures are shown in degrees Celsius, density on a logarithmic scale. Black lines indicate the identity.

Model validation and extrapolation
The approach of extrapolating the random effects to locations the model was not fitted to via ordinary kriging [2] only makes sense if there are spatial correlations between the random effects. Analysis of the random effects included in the model revealed considerable spatial correlations as described by the variograms shown in Figure S11 and Figure S12. We fitted functional forms of the exponential type to the observed variograms; the values of the fitted variogram parameters are given in Table S1, alongside the parameter values of variograms fitted to the random effects obtained from the validation model fit (fitted to 90% of the data), which are unsurprisingly rather similar.
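As an illustration of the variogram step, the sketch below computes a binned empirical semivariogram and fits an exponential model of the form gamma(h) = nugget + partial_sill * (1 - exp(-h / range)). This parametrisation, the fitting method (a grid search over the range combined with linear least squares for the other two parameters) and all function names are assumptions; the source does not state how the variograms were fitted.

```python
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    """Mean semivariance of all point pairs, binned by separation distance."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    semivar = 0.5 * (values[:, None] - values[None, :]) ** 2
    upper = np.triu_indices(len(values), k=1)       # count each pair once
    dist, semivar = dist[upper], semivar[upper]
    which = np.digitize(dist, bin_edges) - 1
    centres, gamma = [], []
    for b in range(len(bin_edges) - 1):
        in_bin = which == b
        if in_bin.any():
            centres.append(dist[in_bin].mean())
            gamma.append(semivar[in_bin].mean())
    return np.array(centres), np.array(gamma)

def exponential_variogram(h, nugget, partial_sill, range_):
    """Exponential variogram model (assumed parametrisation)."""
    return nugget + partial_sill * (1.0 - np.exp(-h / range_))

def fit_exponential(h, gamma, candidate_ranges):
    """Grid search over the range; nugget and partial sill by unconstrained least squares."""
    best = None
    for r in candidate_ranges:
        design = np.column_stack([np.ones_like(h), 1.0 - np.exp(-h / r)])
        coef, *_ = np.linalg.lstsq(design, gamma, rcond=None)
        sse = ((design @ coef - gamma) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], r)
    return best[1:]

# Check the fit on values generated from a known exponential variogram
h = np.linspace(0.5, 10.0, 20)
gamma = exponential_variogram(h, nugget=0.1, partial_sill=0.9, range_=3.0)
nugget, partial_sill, range_ = fit_exponential(h, gamma, np.linspace(0.5, 10.0, 20))
print(nugget, partial_sill, range_)
```

In practice the inputs to `fit_exponential` would be the bin centres and semivariances returned by `empirical_variogram` for the random effects at the fitted locations.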
For model validation, we used the random effects from the validation model to estimate the random effects at the locations excluded in the validation model, and compared these to the random effects obtained for these locations in the full model fit. The correlations between these were positive throughout, but somewhat variable in magnitude (Figure S13 and Figure S14), indicating that there was some spatial component to the random effects which could be captured in the extrapolation, but also some additional unexplained variation.
We then extrapolated the random effects from the full model fit using the fitted variograms. The random effects surfaces (Figure S15 and Figure S16) show that the extrapolated random effects had a lower variation than the random effects at the locations of the model fit, as the spatial correlations explained only part of the overall pattern of random effects; the remainder of the information inherent in the random effects was lost in the extrapolation.

… Figure S15, but for the model fitted to day time air temperatures.
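The smoothing effect of the extrapolation can be reproduced with a minimal ordinary kriging sketch. It assumes an exponential variogram with illustrative parameter values, planar coordinates and synthetic random effects; none of these are taken from the study, and the function names are hypothetical.

```python
import numpy as np

def exp_variogram(h, nugget, partial_sill, range_):
    """Exponential variogram; zero at zero distance, nugget jump for h > 0."""
    return np.where(h > 0, nugget + partial_sill * (1.0 - np.exp(-h / range_)), 0.0)

def ordinary_kriging(coords, values, targets, nugget, partial_sill, range_):
    """Predict values at targets as weighted sums of observations, weights summing to one."""
    n = len(values)
    d_obs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_tgt = np.linalg.norm(coords[:, None, :] - targets[None, :, :], axis=-1)
    # Kriging system with a Lagrange multiplier for the sum-to-one constraint
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d_obs, nugget, partial_sill, range_)
    A[n, n] = 0.0
    b = np.ones((n + 1, targets.shape[0]))
    b[:n, :] = exp_variogram(d_tgt, nugget, partial_sill, range_)
    weights = np.linalg.solve(A, b)[:n, :]
    return weights.T @ values

# Synthetic random effects at 30 fitted locations (values are assumptions)
rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 10.0, size=(30, 2))
effects = rng.normal(0.0, 1.0, size=30)

# Extrapolate to a regular grid of locations that were not part of the fit
gx, gy = np.meshgrid(np.linspace(0.0, 10.0, 15), np.linspace(0.0, 10.0, 15))
targets = np.column_stack([gx.ravel(), gy.ravel()])
predicted = ordinary_kriging(coords, effects, targets,
                             nugget=0.5, partial_sill=0.5, range_=2.0)
# Compare the spread of the kriged surface with that of the fitted random effects
print(np.std(effects), np.std(predicted))
```

With a non-zero nugget, predictions away from the data are pulled towards a weighted mean of the neighbouring values, so the kriged surface is typically smoother than the values at the fitted locations.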