Short-term Forecasting of the Prevalence of Trachoma: Expert Opinion, Statistical Regression, versus Transmission Models

Background Trachoma programs rely on guidelines made in large part using expert opinion of what will happen with and without intervention. Large community-randomized trials offer an opportunity to actually compare forecasting methods in a masked fashion. Methods The Program for the Rapid Elimination of Trachoma trials estimated longitudinal prevalence of ocular chlamydial infection from 24 communities treated annually with mass azithromycin. Given antibiotic coverage and biannual assessments from baseline through 30 months, forecasts of the prevalence of infection in each of the 24 communities at 36 months were made by three methods: the sum of 15 experts’ opinion, statistical regression of the square-root-transformed prevalence, and a stochastic hidden Markov model of infection transmission (Susceptible-Infectious-Susceptible, or SIS model). All forecasters were masked to the 36-month results and to the other forecasts. Forecasts of the 24 communities were scored by the likelihood of the observed results and compared using Wilcoxon’s signed-rank statistic. Findings Regression and SIS hidden Markov models had significantly better likelihood than community expert opinion (p = 0.004 and p = 0.01, respectively). All forecasts scored better when perturbed to decrease Fisher’s information. Each individual expert’s forecast was poorer than the sum of experts. Interpretation Regression and SIS models performed significantly better than expert opinion, although all forecasts were overly confident. Further model refinements may score better, although would need to be tested and compared in new masked studies. Construction of guidelines that rely on forecasting future prevalence could consider use of mathematical and statistical models.


Introduction
The World Health Organization (WHO), the International Trachoma Initiative, Ministries of Health, and their partners aim to control blinding trachoma by 2020, implementing surgical campaigns, antibiotic distributions, hygiene initiatives, and environmental improvements [1]. Trachoma control is a massive undertaking: 50 million doses of antibiotics are now distributed annually, in 30 countries [2]. The Global Trachoma Mapping Project alone will complete population-based surveys in more than 1400 districts worldwide by the end of 2015 [3,4]. Surveys and treatment histories are now available for the vast majority of trachoma-endemic districts worldwide [5]. However, we do not know where WHO goals will likely be and not be achieved. Decisions on when to start and stop treatments are still based on guidelines dependent in large part on expert opinion [6].
Mathematical models have provided insight into the transmission of infectious diseases including trachoma [7][8][9][10][11][12][13][14][15]. However, they have rarely been used to make falsifiable predictions. As a candidate for prediction, trachoma may have some advantages over other infectious diseases. Trachoma has no nonhuman reservoirs, no long-lasting latent stage, and as yet no clinically important drug resistance, simplifying modeling greatly compared to diseases such as cholera, onchocerciasis, and tuberculosis [1]. With many infectious diseases such as SARS and Ebola [16,17], epidemics occur sporadically in time and place; forecasting can be made in a predictable time frame with post-treatment trachoma. Community-randomized trials have provided longitudinal assessment of multiple communities after mass antibiotic distributions. In a sense, after each mass treatment has brought infection to a low level, infection returns in a synchronized manner in a number of communities, offering results somewhat analogous to a repeated experiment.
Accurate forecasts could inform stakeholders of realistic goals, define trouble spots to focus resources, and suggest areas headed towards control even in the absence of intervention. Prediction has scientific value as well. The ability to predict the prevalence of an infectious disease is a test of our understanding of the epidemiology. Here, we use recent clinical trial data to forecast the prevalence of ocular chlamydial infection in children in 24 endemic communities in Niger. We compare model forecasts to expert opinions, and to a statistical regression that uses no special knowledge of the infectious process.

Data collection
Forty-eight communities were followed as part of the Niger arm of the Partnership for the Rapid Elimination of Trachoma (PRET) study. Communities were randomized to either mass antibiotics of the entire community, or antibiotics targeted just to children 12 years and younger. The 24 communities included in this study received annual antibiotic treatment of all ages. Communities were assessed at baseline and then biannually for 3 years. All individuals were offered antibiotic treatment annually, within two weeks of the assessment: children under 6 months, those allergic to macrolides, and pregnant women were offered topical tetracycline, and all others were offered a single dose of oral azithromycin (20 mg/kg for children and 1 gram for adults).
A random sample of 100 children 0-5 years old were selected from each community. If a community had less than 100 0-5 year-old children, then all were offered assessment. Each participating child had their upper right tarsal conjunctiva swabbed, and processed for PCR as previously described [18].

Ethics statement
This study of de-identified data received ethical approval from the Committee on Human Research of the University of California San Francisco and was carried out in accordance with the Declaration of Helsinki. All adult subjects provided informed consent, and a parent or guardian of any child participant provided informed consent on their behalf. The informed consent given was oral: (a) we chose verbal consent because of the low literacy rates in the study area, (b) the IRB (10.00812) approved the use of oral consent, and (c) oral consent was documented on the registration form for each study participant prior to examination in the field.

Survey methods
The WHO NTD-STAG Monitoring and Evaluation Working Group had a sub-group meeting to discuss trachoma surveillance on September 11-12, 2014 in Atlanta, GA, USA at the Task Force for Global Health, co-sponsored by WHO and the NTD Support Center. Fifteen trachoma experts were asked to forecast the 36 month prevalence of infection in the 24 communities of the PRET-Niger study described above, and were provided, for each community, the biannual prevalence estimates from 0-30 months, the antibiotic coverage at 0, 12, and 24 months, the estimated population of 0-5 years olds at baseline, and the number of children sampled at baseline (Table 1). For each of the 24 communities, the experts were asked to provide their median estimate at 36 months, as well as the lower and upper bounds of their centralized 95% credible interval (the 2.5 th , 50 th , and 97.5 th percentile of their belief). The community expert opinion was constructed by estimating each of the 15 individual's distribution for each village (see Scoring below) and then taking the arithmetic average assuming equal weights, and was used as the primary survey forecast, although each individual's forecast was also scored separately. The number of trachoma publications by each of the 15 experts was assessed by a PubMED (National Library of Medicine) search on December 1, 2014 (expert name as author AND "trachoma" as keyword).

Statistical methods
Linear mixed effects regression was used to model the prevalence at 12 months and 24 months based on observations at 6 months and at 18 months, respectively. A random intercept was used for each village. To improve normality and homoskedasticity, the square root transform was applied to the prevalence fractions. The fitted model was then used to predict the prevalence at time 36 based on observations at 30 months. Standard errors were obtained using clustered bootstrap. All calculations were conducted using R (R Foundation for Statistical Computing, Vienna, Austria, v.3.1 for Macintosh, package lme4). While the primary regression was of square-root transformed regression with a community-level random effect, we also included a linear regression model without a community-level random effect.

Modeling methods
We constructed a stochastic transmission model of transmission of Chlamydia trachomatis infection over time. For village j (j = 1, . . . 24), we assumed a population of size N j , taken from the number of children aged 0-5 years found in the census at the time of treatment k (k = 1, 2, 3 corresponding to baseline, 12 and 24 months). We assumed a classical SIS (susceptible-infectious-susceptible) model structure, assuming that the force of infection is proportional to the prevalence of infection in the population of children aged 0-5 years with proportionality constant β, and a constant per-capita recovery rate γ [19]. Between periods of treatment, we assumed that the probability p ðkÞ i;j ðtÞ that there are i infectives in village j at time t after treatment time point k obeys the following equations [20,21]: To model treatment, we assumed that each child aged 0-5 years in village j has probability c ðkÞ j of receiving treatment with the antibiotic efficacy e k for treatment period k. We modeled each treatment according to p ðkÞ i;j ðt ¼ 0Þ ¼ A . We assumed a standard beta-binomial prior (the binomial distribution in which the probability of success at each trial follows the beta distribution) Bðm;rÞ (where the shape parameters μ and ρ for each treatment were computed from the observed distribution of infection of 24 villages at baseline, 12 and 24 months, B(z 1 , z 2 ) is the beta function) [22]. The pre-treatment prevalence distribution was then computed for each village by applying Bayes' theorem: For each village j, the initial condition is determined from Eq (2), and the system numerically integrated for six or twelve months according to Eq (1). Specifically, for each village j, the pre-treatment distributions of kth treatment is p ðk;preÞ We assumed independent villages, so that the total loglikelihood at time τ months after each treatment k may be computed by summing over all 24 villages P 24 j¼1 logð The transmission coefficient and antibiotic efficacy in the model were optimized by using the Metropolis algorithm with the total likelihood of three treatment periods to fit the model to the observed numbers of PCR-positive individuals of children aged 0-5 years in each village at 6, 12, 18, 24 and 30 months [23]. Forecasting the distribution of the observed number of PCRpositive individuals of children aged 0-5 years in a village at 36 months, conditionally on the observed numbers of PCR-positive individuals of children aged 0-5 years at baseline, 6, 12, 18, 24 and 30 months from the same village, was done by using a hidden Markov model according to the equation of forecast distribution [24].
The primary modeling forecast was pre-specified as the SIS process model with a random effect, although the SIS model without a random effect was included as a sensitivity analysis. In addition, the forecast of each model as a distribution over 101 discrete units was included as a comparison to the distribution estimated by minimizing the Fisher's information (which allows a symmetric credible interval to approach a normal distribution, as well as the flexibility of asymmetric credible intervals to represent skewed distributions). Sensitivity analyses included changing the fixed mean infection duration assumed in the model to be 6 months, to 3 months or to 12 months.

Scoring
To ensure a fair comparison, all forecasts were scored from the proposed median and 95% CrI. Given the denominator of the sample for each village at 36 months, the discrete distribution which minimized the Fisher's information while constrained to that expert's median and 95% CrI was estimated (Mathematica 10.0). As a sensitivity analysis, the SIS model forecasts were also presented as a distribution from 0 to 100%, with the score compared to the score derived from the median and 95% CrI. The modeler, statistician, and each of the 15 experts surveyed were all masked to the 36-month results, as well as to the forecasts made by others. Different forecasts were pairwise compared using Wilcoxon's signed-rank test (Mathematica 10.0), using the Holm-Bonferroni multiple comparison correction, assuming 3 tests.
As a sensitivity analyses, we assessed whether the likelihood of the observed data would be greater (or lesser) had each forecast been more (or less) certain. Specifically, we perturbed each forecast by taking the density at each possible prevalence to the 1+ power, normalizing, and determining the likelihood of the observed data. Note that this maintains the support of a forecast, maintains the ordering of the outcomes, and increases the Fisher's information proportionally by (or decreases information proportionally for <0).

Results
At the baseline census, communities had a mean of 146 children (95% CI 137 to 155) aged 0 to 5 years. The mean antibiotic coverage of children was 92.3% at baseline, 89.0% at 12 months, and 89.8% at 24 months. At baseline, the estimated prevalence of infection in the 24 communities ranged from 2% to 58% with a mean prevalence of 21.1% (95% CI 19.8% to 22.5%) [18]. The community prevalence of infection at each biannual visit is displayed in Table 1. The observed prevalence of infection at 36 month which was to be forecasted ranged from ranged from 0% to 22.5% with a mean prevalence of 5.8% (95% CI 5.2% to 6.4%).
The 15 experts provided forecasts for each of the 24 communities, with the mean taken as the community forecast (Fig 1). Fig 2 shows the forecast distributions for the community of experts, regression, and the SIS model, and Table 2 ranks the likelihood of the observed 36-month prevalence for each (S1 Fig and S1 Table in Supporting Information show the difference between observed and forecast prevalence). The estimated parameters of the SIS model with random effect are shown in Table 3. The SIS model and the square root-transformed regression had significantly better likelihood than the experts (p = 0.004 and p = 0.01, respectively), and than the linear regression (p = 0.01 and p = 0.02, respectively). All forecasts were positively biased, on average estimating a greater prevalence than was observed. All forecasts had a lower (worse) likelihood if their Fisher's information was marginally increased. No individual expert forecast was better than the community forecast (the mean of the 15 experts).
A priori, the SIS model assumed a mean duration of infection of 6 months, obtaining a loglikelihood of the observed 36 month data of -41.03. Had we assumed the mean duration of infection was 3 months or 12 months, the loglikelihood would have been -41.47 or -39.91, respectively. If we had assumed the 6 month duration of infection, but did not use a community-level random effect, the likelihood score would have been -41.57. To fairly compare the different methods, the distribution of each forecast was estimated by minimizing the Fisher's information given the estimated median and 95% CrI. For the SIS model, we also expressed each full distribution, obtaining a loglikelihood score of -40.90, or nearly the same as the -41.03 obtained from minimizing the Fisher's information.
The mean number of trachoma citations on PubMED by the experts was 42 (range 0 to 133). The likelihood score and number of publications was actually inversely correlated (Spearman's correlation -0.33, p = 0.24), thus we were unable to demonstrate that this measure of expertise was associated with better forecasting.
We performed logistic regression, assuming the individual PCR results most likely to have obtained the observed pooled results, but this performed no better than linear regression of the square-root transformed regression.

Discussion
An SIS hidden Markov model and a regression model both produced forecasts with significantly higher likelihood of the observed data than a community of experts. The SIS model, which attempted to utilize an understanding of the infectious process and mass treatment, performed significantly better than linear regression, but only slightly (and not significantly) better than regression of the square root-transformed prevalence.
In general, more uncertainty resulted in better scoring forecasts. For every forecast a mathematical perturbation which reduced the Fisher's information resulted in a higher likelihood of the observed data. The inclusion of a community-level random effect in the SIS hidden Markov model improved forecasting, perhaps by increasing uncertainty. The composite survey contained less information than any individual survey, and did better than any single individual forecast. The benefit of adding uncertainty could suggest that forecasts are inherently over-confident, or that additional variance components of the data were not considered by any of the methods.
Even though the SIS hidden Markov model and regression model had significantly higher likelihood than the community experts, the forecasted distributions of prevalence (as shown in Fig 2) by all models were very similar and did not show which model was significantly better than other models. With more available data, models could improve forecasting. The SIS hidden Markov model did not include infection from outside the population of children aged 0-5 years in each community. Our previous model [13] used a simple constant exogenous infection rate to represent infection from older children or adults to children aged 0-5 years within the same community, and did not find significant differences between the estimated transmission coefficients with and without the exogenous infection rate for different durations of infection.
Of course, such models could be further refined to reflect age structured transmission dynamics. In this setting, the other age groups (older children and adults) were being treated as well, and other studies have shown consistently higher prevalence in small children than in other age groups (e.g. [25]). The prevalence of infection in different communities is clearly correlated visit-to-visit, with visits 6-months apart having a higher correlation than visits further apart. However, there may be a fundamental limit to the predictability at the community level, simply due to the vagaries of who infects whom and when they do so. Mathematical models and cross-sectional empirical studies have suggested that as disease is disappearing, the prevalence of infection should form an exponential distribution (or its discrete analog, a geometric distribution), whether the disappearance is due to mass antibiotics, environmental improvements, or a secular trend [26,27]. This exponential distribution has a much heavier tail than, for example, the normal distribution, so outliers are to be expected even when all communities are assumed to have identical transmission characteristics. Six-months is a relatively short period in trachoma control-programs typically reassess endemicity every 1-5 years. If predictability decreases as time increases between visits, than we would expect that apparent hotspots at one visit may not be the most affected areas at a subsequent visit. This has been termed chasing ghosts by trachoma programs (personal communication, PME).
Forecasts, whether made by experts, statistics, or mathematical transmission models, are rarely done in a falsifiable manner. Here, all participants were presented with identical information and masked to the results and to the other forecasters. Forecasts described the distribution of all possible outcomes, not a prediction of the single most favorable, and were scored in a pre-specified manner. The availability of results from 24 communities allowed a statistical comparison between forecasts, reducing the chance that the overall score would be dependent on a single fortunate guess.
Current WHO guidelines for starting mass drug administration are based on the district prevalence of the clinical signs of disease rather than infection, and future studies could assess forecasting at that level. In this study, we forecasted the community-level prevalence of ocular chlamydial infection. WHO guidelines currently include sub-district level intervention, at least for hypo-endemic districts with 5-10% prevalence of clinical activity in children. Individual community-level forecasting may become important for surveillance after mass antibiotic administrations have been discontinued.
Programs currently make decisions based on recommendations offered by the WHO [1]. Guidelines have relied heavily on extrapolation of existing evidence and expert opinion, since not all scenarios have been, or likely will ever be, tested in community-randomized trials. Forecasting at the individual community level has not been particularly successful. While forecasting at the district level may be more feasible than forecasting at the individual community level, statistical and transmission model forecasts should be evaluated. If proven more effective, as they were in this setting, then it may be reasonable for programmatic decisions to be based on statistical or modeling forecasts rather than just expert opinion.
Supporting Information S1 Fig. Different forecast methods versus observed result. Regressions (linear regression, green; square root-transformed, blue), SIS hidden Markov Model (red), and community of experts (black), with mean (solid circle) and 95% CI (circle). (TIFF) S1 Table. Difference between observed result and forecast by SIS, linear regression, square root transformed regression, and community opinion. (DOCX)