Alert Threshold Algorithms and Malaria Epidemic Detection

We describe a method for comparing the ability of different alert threshold algorithms to detect malaria epidemics and use it with a dataset consisting of weekly malaria cases collected from health facilities in 10 districts of Ethiopia from 1990 to 2000. Four types of alert threshold algorithms are compared: weekly percentile, weekly mean with standard deviation (simple, moving average, and log-transformed case numbers), slide positivity proportion, and slope of weekly cases on log scale. To compare dissimilar alert types on a single scale, a curve was plotted for each type of alert, which showed potentially prevented cases versus number of alerts triggered over 10 years. Simple weekly percentile cutoffs appear to be as good as more complex algorithms for detecting malaria epidemics in Ethiopia. The comparative method developed here may be useful for testing other proposed alert thresholds and for application in other populations.

Accurate, well-validated systems to predict unusual increases in malaria cases are needed to enable timely action by public health officials to control such epidemics and mitigate their impact on human health. Such systems are particularly needed in epidemic-prone regions, such as the East African highlands. In such places, transmission is typically highly seasonal, with considerable variation from year to year, and immunity in the population is often incomplete. Consequently, epidemics, when they occur, often cause high illness and death rates, even in adults (1,2). The value of timely interventions (such as larviciding, residual house spraying, and mass drug administration) to control malaria epidemics has been documented (3), but much less evidence exists about how to identify appropriate times to take such action when resources are limited (4). Ideally, public health and vector control workers would have access to a system that provides alerts when substantial numbers of excess cases are expected. Such alerts should be sensitive (so that alerts are reliably generated when excess cases are imminent), specific (so that there are few false alarms or alerts that do not precede significant excess cases), and timely (so that, despite some inevitable delays between sounding the alert and completing interventions, adequate lead time exists to take actions that will reduce cases before they decline "naturally").
A number of such systems have been proposed or implemented, but the comparative utility of these systems for applied public health purposes has not been rigorously established. For example, the World Health Organization has advocated the use of alerts when weekly cases exceed the 75th percentile of cases from the same week in previous years (5), and other methods, based on smoothing or parametric assumptions, have also been considered (6-8). Such methods, known as early detection systems because they detect epidemics once they have begun, can correctly identify periods that are defined by expert observers as epidemic, albeit with varying specificity. However, the ability of early detection systems to generate timely alerts that prospectively identify periods of ongoing excess transmission has not, to our knowledge, been evaluated. A detection algorithm is useful for guiding interventions only if it identifies epidemics at an early phase (9), and detection (as opposed to prediction) will work only to the extent that epidemics persist (and indeed grow) over time. Only then will detecting unusual cases at one time point be a reliable indicator that an epidemic is under way (and will remain so for long enough that action taken after the warning can still have an effect).
Another approach, known as early warning, attempts to predict epidemics before unusual transmission activity begins, usually by the use of local weather or global climatic variables that are predictors of vector abundance and efficiency, and therefore of transmission potential (10-14). Such systems have the advantage of providing more advance warning than systems that rely on case counts, but climate- and weather-based systems require data not widely available to local malaria control officials in Africa in real time. Such systems also depend on relatively complex prediction algorithms that may be difficult to implement in the field. Studies of the forecasting ability of such systems are beginning to emerge (15); initial studies have focused on the sensitivity rather than on the specificity or timeliness of the alerts.
We describe a method for evaluating the public health value of a system to detect malaria epidemics. We use this method to evaluate several simple early detection systems for their ability to provide timely, sensitive, and specific alerts in a data series of weekly case counts from 10 locations in Ethiopia for approximately 10 years. The fundamental question we address is whether detecting excess cases for 2 weeks in a row, under a variety of working definitions of "excess," can be the basis for a system that anticipates ongoing excess malaria cases in time for action to be taken.

Study Area and Data
We collected datasets consisting of weekly parasitologically confirmed malaria cases over an average of 10 years from health facilities in 10 districts of Ethiopia (online Appendix Figure 1; available from http://www.cdc.gov/ncidod/EID/vol10no7/03-0722_app.G1.htm). The data arise from passive surveillance systems in selected districts for the years 1990-2000. Original data collected on the basis of Ethiopian weeks (which range from 5 to 9 days) were normalized to obtain mean daily cases for each Ethiopian week, and normalized data were used for all analyses. Data are summarized in Table 1.
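The normalization step can be illustrated with a minimal sketch. The function name `mean_daily_cases` and the example counts are hypothetical and are not taken from the dataset; the sketch only assumes that each weekly total is divided by the number of days in that Ethiopian week.

```python
def mean_daily_cases(weekly_cases, days_in_week):
    """Convert a weekly case total into mean cases per day for that week."""
    if days_in_week <= 0:
        raise ValueError("days_in_week must be positive")
    return weekly_cases / days_in_week


# Hypothetical counts: a 9-day week and a 5-day week become comparable
# once each is expressed as mean daily cases.
print(mean_daily_cases(45, 9))
print(mean_daily_cases(35, 5))
```

Without this step, a 9-day week would appear to have more cases than a 5-day week even when daily incidence is lower.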

Epidemic Detection Algorithms To Be Tested
We investigated four classes of algorithms for triggering alert thresholds. In each case, an alert was triggered if the defined threshold was exceeded for 2 consecutive weeks. (This choice is intended to improve the specificity of the alert system for any given threshold.) If another alert was triggered within 6 months, it was ignored, on the assumption that intervening after the first alert would prevent another epidemic within the next 6 months. For the historically based thresholds (the weekly percentile and the weekly mean with SD, described below), the thresholds for each year were calculated on the basis of all other years in the dataset for a given health facility, excluding the year under consideration.
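The triggering rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the name `trigger_alerts` and its parameters are hypothetical, "exceeded" is read as strictly greater than the threshold, and the 6-month exclusion period is approximated as 24 weeks, following the week t + 24 convention mentioned in the Discussion.

```python
def trigger_alerts(cases, thresholds, run_length=2, refractory_weeks=24):
    """Return the week indices at which alerts fire.

    An alert fires on the last week of `run_length` consecutive weeks
    in which cases exceed the threshold. Any alert that would fire less
    than `refractory_weeks` after the previous alert is ignored.
    """
    alerts = []
    run = 0            # current count of consecutive weeks above threshold
    last_alert = None  # week index of the most recent accepted alert
    for week, (observed, limit) in enumerate(zip(cases, thresholds)):
        run = run + 1 if observed > limit else 0
        if run >= run_length:
            if last_alert is None or week - last_alert >= refractory_weeks:
                alerts.append(week)
                last_alert = week
    return alerts
```

The same function can serve every threshold type considered here, because only the `thresholds` series changes from one algorithm to another.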

Weekly Percentile
The threshold was defined as a given percentile of the case numbers obtained in the same week of all years other than the one under consideration. The use of percentile as alert threshold is straightforward, and the method is relatively insensitive to extreme observations.
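A minimal sketch of the leave-one-year-out percentile threshold follows. The data layout (a dictionary mapping each year to a list of weekly values) and the function names are assumptions made for illustration. The text does not state which percentile definition was used, so linear interpolation between order statistics is assumed here.

```python
def percentile(values, p):
    """p-th percentile (0-100), linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def percentile_thresholds(cases_by_year, target_year, p):
    """Weekly thresholds for `target_year`, computed from all other years."""
    other_years = [y for y in cases_by_year if y != target_year]
    n_weeks = len(cases_by_year[target_year])
    return [
        percentile([cases_by_year[y][w] for y in other_years], p)
        for w in range(n_weeks)
    ]
```

Because the target year is excluded, an unusually large count in that year cannot raise its own threshold.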

Weekly Mean with Standard Deviation (SD)
We defined the threshold as the weekly mean plus a defined number of SDs. Mean and SD were calculated from case counts, smoothed case counts, or log-transformed case counts.
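The mean plus SD threshold can be sketched in the same leave-one-year-out form. The function name, the data layout, and the optional `transform` argument are hypothetical. The sample SD (n - 1 denominator) is assumed because the text does not say which estimator was used, and at least three years of data are needed so that two remain after the target year is excluded.

```python
import math
import statistics


def mean_sd_thresholds(cases_by_year, target_year, k, transform=None):
    """Weekly mean + k * SD for `target_year`, computed from all other years.

    If `transform` is given (for example math.log for positive counts),
    the threshold is returned on the transformed scale, so observed cases
    must be transformed in the same way before comparison.
    """
    f = transform if transform is not None else (lambda x: x)
    other_years = [y for y in cases_by_year if y != target_year]
    n_weeks = len(cases_by_year[target_year])
    thresholds = []
    for w in range(n_weeks):
        values = [f(cases_by_year[y][w]) for y in other_years]
        thresholds.append(statistics.mean(values) + k * statistics.stdev(values))
    return thresholds
```

A smoothed variant would apply a moving average to each year's series before calling the function; the smoothing window is not specified in the text and is therefore left out of this sketch.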

Slide Positivity Proportion
Some studies have indicated that the proportions of positive slides were significantly higher than the usual rate during epidemics (16,17), but whether the rise in proportion of positive slides occurs early enough to serve as a useful early detection system is not known. Slide positivity proportion was calculated from the number of blood slides tested and positive slides for malaria parasites.
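As a minimal sketch, the slide positivity proportion and a fixed cutoff can be computed as below. The function names and the example cutoff are hypothetical, and weeks in which no slides were examined are treated as undefined, which is an assumption rather than a rule stated in the text.

```python
def slide_positivity(positive_slides, slides_examined):
    """Percentage of examined blood slides that were positive for malaria."""
    if slides_examined <= 0:
        return None  # undefined when no slides were examined
    return 100.0 * positive_slides / slides_examined


def weeks_exceeding(positives, examined, cutoff_percent):
    """Indices of weeks whose slide positivity exceeds the cutoff."""
    flagged = []
    for week, (pos, total) in enumerate(zip(positives, examined)):
        spp = slide_positivity(pos, total)
        if spp is not None and spp > cutoff_percent:
            flagged.append(week)
    return flagged
```

Unlike the historically based thresholds, this cutoff is a single value applied to every week, so no same-week history is needed to compute it.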

Slope of Weekly Cases on Log Scale
We hypothesized that rapid multiplication of the number of normalized cases from week to week might signal onset of an epidemic. To test this hypothesis and the usefulness of detecting such changes as a predictor of epidemics, we defined a set of alert thresholds on the basis of the slope of the natural logarithm of the number of normalized cases. An advantage of the slide positivity and log slope methods over the others is that they can, in principle, be used to construct alert thresholds in the absence of retrospective data.
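The log-scale slope can be sketched as a week-to-week difference in the natural logarithm of normalized cases. This is one plausible reading, since the text does not state how many weeks the slope was computed over. The function name is hypothetical, and weeks with zero cases are treated as undefined.

```python
import math


def log_slopes(cases):
    """Week-to-week change in ln(cases); None where a count is not positive."""
    slopes = [None]  # no slope is defined for the first week
    for previous, current in zip(cases, cases[1:]):
        if previous > 0 and current > 0:
            slopes.append(math.log(current) - math.log(previous))
        else:
            slopes.append(None)
    return slopes


# A doubling gives the same slope (ln 2) at any level of incidence.
print(log_slopes([1, 2]))
print(log_slopes([100, 200]))
```

The two printed slopes are equal up to floating-point rounding, which shows why this measure reacts as strongly to small fluctuations at low case numbers as to large absolute increases.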

Comparison of Alert Thresholds
To circumvent the difficulties inherent in defining a "true" epidemic and to compare the properties of these thresholds on a scale that reflects the potential, operational uses of alert thresholds, we evaluated each alert threshold algorithm for the number of alerts triggered and the number of cases that could be anticipated and prevented ("potentially prevented cases") if that alert threshold were in place. Potentially prevented cases (PPC) for each alert were defined as a function of the number of cases in a defined window starting 2 weeks after each alert (to allow for time to implement control measures). The window of effectiveness was assumed to last either 8 or 24 weeks (to account for control measures whose effects are of different durations). Since no control measure would be expected to abrogate malaria cases completely, we considered two possibilities for the number of cases in each week of the window that could be prevented: 1) cases in excess of the seasonal mean and 2) cases in excess of the seasonal mean minus 1 SD. When the observed number of cases in a week is less than the seasonal mean or the seasonal mean minus the SD, PPC is set to a minimum value of zero for that week. Figure 1 depicts graphically how the PPC was calculated. For each value of each type of threshold at each health facility, the number of PPC was transformed into a proportion (percentage), by adding the number of PPC for the alerts obtained and dividing this sum by the sum of the number of potentially prevented cases, over all weeks in the dataset.
To compare the performance of dissimilar alert types on a single scale, a curve was plotted for each type of algorithm that showed mean percent of PPC (%PPC) over all districts versus average number of alerts triggered per year, with each point representing a particular threshold value. "Better" threshold types and values are those that potentially prevent higher numbers of malaria cases with smaller numbers of alerts.
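The PPC calculation described in this section can be sketched as follows. The function names and argument layout are hypothetical. The `baseline` series stands for either the seasonal mean or the seasonal mean minus 1 SD, and the window is assumed to begin exactly 2 weeks after the alert week. Counting each week at most once when windows overlap is also an assumption.

```python
def ppc_for_alert(cases, baseline, alert_week, lag=2, window=8):
    """Excess cases in the window that starts `lag` weeks after an alert."""
    start = alert_week + lag
    end = min(start + window, len(cases))
    return sum(max(cases[w] - baseline[w], 0) for w in range(start, end))


def percent_ppc(cases, baseline, alert_weeks, lag=2, window=8):
    """PPC over all alerts as a percentage of excess cases in all weeks."""
    covered = set()
    for alert in alert_weeks:
        start = alert + lag
        covered.update(range(start, min(start + window, len(cases))))
    prevented = sum(max(cases[w] - baseline[w], 0) for w in covered)
    total = sum(max(c - b, 0) for c, b in zip(cases, baseline))
    return 100.0 * prevented / total if total else 0.0
```

Plotting `percent_ppc` against the number of alerts per year for a range of threshold values would give one curve per algorithm, in the manner described above.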

Random, Annual, and Optimally Timed Alerts
To evaluate the improvement in timing of alerts provided by each of these algorithms, we calculated PPC for alerts chosen on random weeks during the sampling period. We also made comparisons to two alert-generating policies that could not have been implemented but are in some sense optimal in hindsight. First, we evaluated a policy of triggering one alert each year on the "optimal" week, i.e., the week with the maximum value of PPC. The value of PPC corresponding to the optimal week simulated an "optimally timed" policy of annual interventions; thus, it represents one alert every year. Second, we retrospectively went through data for each site to identify the optimal timing of alerts if one had perfect predictive ability; namely, we compared PPC for a single alert generated on every week of the dataset and chose the optimal week for one alert; then we went through the remaining weeks and chose the optimal week for a second alert, and so on. This system allowed us to plot an upper bound curve for the best choice of alert times, given a defined alert frequency.
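The stepwise selection of optimally timed alerts can be sketched as a greedy search. This is an illustration under assumptions, not the procedure as implemented in the study: `excess` is a hypothetical precomputed list of weekly cases above the chosen baseline (floored at zero), and each later alert is credited only with weeks not already covered by an earlier one.

```python
def greedy_alerts(excess, n_alerts, lag=2, window=8):
    """Pick alert weeks one at a time, each maximizing newly covered excess."""
    covered = set()
    chosen = []
    for _ in range(n_alerts):
        best_week, best_gain = None, 0
        for alert in range(len(excess)):
            if alert in chosen:
                continue
            weeks = range(alert + lag, min(alert + lag + window, len(excess)))
            gain = sum(excess[w] for w in weeks if w not in covered)
            if gain > best_gain:  # ties keep the earliest week
                best_week, best_gain = alert, gain
        if best_week is None:
            break  # no remaining week adds any prevented cases
        chosen.append(best_week)
        start = best_week + lag
        covered.update(range(start, min(start + window, len(excess))))
    return chosen
```

Running the function for increasing values of `n_alerts` traces an upper-bound curve of this kind, although a greedy search is not guaranteed to find the best possible set of alert weeks for every alert frequency.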

Results
The dataset consists of a total of 687,903 microscopically confirmed malaria cases from a health facility in each of 10 districts over an average of 10 years. On average, each of the 10 health facilities treated 11-39 malaria cases daily and >300 cases per day during the peak transmission season (Table 1). In most districts, including Awasa, Zeway, Nazareth, Jimma, Diredawa, Debrezeit, and Wolayita, the number of cases showed clear seasonal fluctuation over time. Alaba, Bahirdar, and Hosana showed longer term variation, with an increasing trend in Alaba and more complex patterns in Hosana and Bahirdar. The number of cases in all districts shows a clear year-to-year variation.
The number of alerts triggered and %PPC obtained for each level of a threshold by type of algorithm varied in the 10 districts (online Appendix Figure 2). Number of alerts triggered and %PPC for a single alert threshold level are represented by a point. These points are summarized in Figure 2, which compares the performance of all algorithms on a single scale and explores the sensitivity of results to the choice of function for determining PPC [reducing cases to weekly mean, (a) and (b), or weekly mean minus 1 SD, (c) and (d)] and the choice of window of effectiveness [8 weeks, (a) and (c); 24 weeks, (b) and (d)]. All alert threshold algorithms potentially prevented a larger number of cases than random alerts, whose performance is shown as a straight line with cases increasing in proportion to the number of alerts.
The alert threshold algorithm based on percentile performed as well as or better than the other algorithms over the range of number of alerts triggered that we examined. For a given number of alerts triggered, it achieved a greater %PPC than the other methods. Relative to optimally timed alerts, the percentile algorithm performed well, within 10% to 20% of the best achievable performance. The slope on log scale algorithm performed slightly better than random alerts but much worse than the other algorithms.
Threshold algorithms defined as the weekly mean plus SDs based on different forms of the data (normalized case counts, smoothed case counts, or log-transformed case counts) performed similarly, except that the algorithms based on the smoothed cases and log-transformed cases triggered fewer alerts at a given threshold value compared to the algorithm based on normalized cases.
For highly specific threshold values (triggering relatively few alerts), the slide positivity proportion showed a lower %PPC than any other algorithm except the log slope. This pattern was reversed at more sensitive threshold values; slide positivity thresholds of <65% showed a higher %PPC than the other threshold methods for a given number of alerts per year.
The annual alert, which corresponds to intervening every year during a fixed optimal week (generally just before the high transmission season), prevented 28.4% of PPC. However, an equivalent %PPC was prevented by the weekly mean and percentile algorithms with only 0.5 alerts per year.
The preceding numbers refer to the weekly mean with 8-week window assessment (Figure 2a). Comparative performance of the different alert thresholds was insensitive to the length of the window and the choice of function to define potentially prevented cases (Figures 2a-d). In all cases, the percentile algorithm performed best overall, although the difference became smaller for the 24-week window.
For all alert threshold algorithms, the %PPC rises with increasing number of alerts and then levels off approximately at 0.4 to 0.6 alerts per year. The interrelationship between levels of percentile used, number of alerts triggered, and %PPC is presented in detail to illustrate the factors that would contribute to choosing a cost-effective threshold value. Figure 3 shows that alert threshold methods based on weekly data perform much better than those based on monthly data.

Discussion
We have described a novel method for evaluating the performance of malaria early detection systems for their ability to trigger alerts of unusually high malaria case numbers with sufficient notice so that control measures can be implemented in time to have an effect on the epidemic. By defining the performance of an algorithm in terms of the potentially prevented cases falling in a given time window after the alerts are generated, we attempted to capture the public health value of an alert system, which is its ability to predict excess malaria cases. Given the same number of alerts triggered by different potential detection algorithms, the objective is to identify an alert threshold algorithm that triggers alerts at the beginning of unusually high transmission periods, on the assumption that such periods are the ones in which interventions are likely to prevent the most cases.
Given the wide variations in malaria transmission, no standard expectation exists about what proportion of cases can be averted with what intervention. With the assumption that the magnitude of the effect of an intervention would be related to the difference between the observed number of cases and size of the long-term seasonal mean and SD, we calculated PPC. In other words, we assumed that an intervention would lower the number of cases towards the underlying seasonal mean or, if very effective, to 1 SD below the underlying mean. The sensitivity of the relative performance of the different algorithms was tested by using different window periods (8 or 24 weeks) of effectiveness of possible intervention methods. These window periods are based on the duration of effects of common interventions, such as insecticide spraying, which have residual activity of 8 to 24 weeks (18-20), and other emergency malaria epidemic control measures such as mass drug administration that could lower the incidence of malaria within an 8- to 24-week range (21). Unlike the complex detection algorithms tested for other diseases (22-26), the algorithms compared in this study are simple to implement without the use of computers, which are currently unavailable to malaria control efforts in most parts of Africa.
At relatively small numbers of alerts triggered, threshold algorithms based on percentile anticipated the highest percentage of the potentially preventable malaria cases of all approaches. The percentile algorithm's good performance relative to the optimally timed alerts indicates that it triggers alerts at the beginning of epidemics rather than in the middle of ongoing epidemics. Given the attractive characteristics of the percentile algorithm, a further question is what percentile level one should use. Beyond 0.4 to 0.6 alerts/year, the %PPC leveled off because most of the peaks with higher numbers of cases, possibly epidemic periods, were detected with fewer alerts by using 85th to 90th percentiles. The leveling off of %PPC occurs because we assume that an alert triggered at week t, which leads to application of intervention measures, will prevent another alert until week t + 24. In practical terms, an intervention initiated after an alert was triggered by a less-specific alert threshold during relatively lower transmission might provide little benefit for a community in reducing malaria transmission, especially if it consumed scarce resources that would then be unavailable during periods of higher transmission.
In situations in which cost is not an issue and yearly application of preventive measures is possible, slide positivity proportion could be recommended. It performed as well as or better than all other types of algorithms when all algorithms were set to trigger an average of one alert per year. During malaria epidemics, the slide positivity proportion becomes very high (16,17), and the rise in the proportion of positive slides may begin at the onset of the epidemic to give an early warning, as our data showed. The interannual variation in the time and intensity of the peak of malaria transmission impacts the effectiveness of the annual alert with interventions at a fixed week every year; using the slide positivity proportion would identify the right time for intervention. The limitation for using slide positivity proportion is that it requires evaluating the cut-off level in individual health facilities and revising the baseline with a change of health personnel because the baseline slide positivity proportion may vary due to differences in epidemiologic patterns of malaria and other causes of fever. Thus, although slide positivity proportion thresholds could be defined in the absence of historical data, our results suggest that such data would be required to calibrate the threshold properly for any given locality. The slope on log scale algorithm performed poorly because the largest proportional rate-of-change for the number of cases tended to occur during periods of very low case numbers (perhaps reflecting chance fluctuations).
Comparative performance of different alert thresholds was insensitive to the length of the window and the choice of function to define potentially prevented cases. This study indicated that using weekly rather than monthly data, both in constructing threshold methods and in follow-up, prevented more cases, consistent with the World Health Organization's recommendations (5).
A key limitation of our study was that the use of a long-term measure of disease frequency from a retrospective dataset assumes that the long-term trend did not change significantly and that the method of data collection remained the same. Factors such as a change of laboratory technician affect the number of slides that are judged positive for malaria parasites. Such changes should be considered, and revising the threshold values frequently with the most recent data and standardized training of laboratory technicians are advisable. Moreover, existing interventions (which may, in some places, have been based in part on algorithms of the sort we considered) could also interfere with the trend. In this analysis, we did not exclude epidemic years from the data since, on the one hand, we do not have a standard definition of malaria epidemics and, on the other hand, all possible data points should be used to calculate measures of disease frequency and scatter to come up with potential threshold levels unless the data points were considered as outliers.
We deliberately chose to evaluate only simple, early detection algorithms, rather than more complex ones that might require climate or weather data or complicated statistical models. In the dataset we considered, the best of these simple algorithms performed quite well relative to the best possible algorithm, which suggests that they may be adequate for many purposes. In principle, the method we propose could easily be applied to evaluate more complex, early warning algorithms and to test whether their added complexity results in substantially better performance. It is an open question whether the same methods would work as well in localities (or for diseases) with different patterns of variation in incidence, for example, in those with less pronounced seasonal peaks in incidence.
In conclusion, we have shown that simple weekly percentile cutoffs appear to perform well for detecting malaria epidemics in Ethiopia. The ability to identify periods with a higher number of malaria cases by using an early detection method will enable the more rational application of malaria control methods. The comparative technique developed in this study may be useful for testing other proposed alert threshold methods and for application in other populations and other diseases.