HIV incidence in men who have sex with men in England and Wales 2001–10: a nationwide population study

Summary

Background
Control of HIV transmission could be achievable through an expansion of HIV testing of at-risk populations together with ready access and adherence to antiretroviral therapy. To examine whether increases in testing rates and antiretroviral therapy coverage correspond to the control of HIV transmission, we estimated HIV incidence in men who have sex with men (MSM) in England and Wales since 2001.

Methods
A CD4-staged back-calculation model of HIV incidence was used to disentangle the competing contributions of time-varying rates of diagnosis and HIV incidence to observed HIV diagnoses. Estimated trends in time to diagnosis, incidence, and undiagnosed infection in MSM were interpreted against a backdrop of increased HIV testing rates and antiretroviral therapy coverage over the period 2001–10.

Findings
The observed 3·7-fold expansion in HIV testing in MSM was mirrored by a decline in the estimated mean time-to-diagnosis interval from 4·0 years (95% credible interval [CrI] 3·8–4·2) in 2001 to 3·2 years (2·6–3·8) by the end of 2010. However, neither HIV incidence (2300–2500 annual infections) nor the number of undiagnosed HIV infections (7370, 95% CrI 6990–7800, in 2001, and 7690, 5460–10 580, in 2010) changed throughout the decade, despite an increase in antiretroviral uptake from 69% in 2001 to 80% in 2010.

Interpretation
CD4 cell counts at HIV diagnosis are fundamental to the production of robust estimates of incidence based on HIV diagnosis data. Improved frequency and targeting of HIV testing, as well as the introduction of antiretroviral therapy at higher CD4 counts than is currently recommended, could begin a decline in HIV transmission among MSM in England and Wales.

Funding
UK Medical Research Council, UK Health Protection Agency.


Trends in HIV incidence among men who have sex with men in England and Wales in the era of increased HIV testing and treatment: a nationwide population study, 2001 to 2010
Technical Appendix
Results presented in the main text were obtained from the implementation of a Bayesian back-calculation model based on counts of new diagnoses of HIV and/or AIDS, stratified by CD4 count.

Back-Calculation
Back-calculation permits infection incidence to be estimated from time series of observed counts of disease endpoints and information on the distribution of the time from infection to the endpoint of interest. If time is divided into intervals $(t_{i-1}, t_i]$, $i = 1, \ldots, I$, in the discrete case, the method is based on the equation:
$$\mu_i = \sum_{j=1}^{i} h_j f_{i-j},$$
where $\mu_i$ is the expected number of endpoint events in the $i$-th time interval, $h_j$ is the expected number of new infections in the $j$-th time interval and $f_{i-j}$ represents the probability that the time between the infection and the endpoint is equal to $i - j$ time intervals.
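As an illustration only, the convolution above can be written as a short forward calculation. The function name and the toy inputs are hypothetical and are not taken from the study; indices are 0-based, so `h[0]` corresponds to the first interval and `f[0]` to a delay of zero intervals.

```python
def expected_endpoints(h, f):
    """Forward back-calculation equation: mu[i] = sum_{j <= i} h[j] * f[i - j].

    h: expected new infections per interval (illustrative values).
    f: probability that the infection-to-endpoint delay is 0, 1, 2, ... intervals.
    Delays longer than len(f) - 1 are treated as having probability zero.
    """
    mu = []
    for i in range(len(h)):
        total = 0.0
        for j in range(i + 1):
            delay = i - j
            if delay < len(f):
                total += h[j] * f[delay]
        mu.append(total)
    return mu


# Toy example: every endpoint occurs one or two intervals after infection.
h = [10.0, 20.0, 30.0]
f = [0.0, 0.5, 0.5]
mu = expected_endpoints(h, f)
```

Back-calculation proper runs this relationship in reverse: the endpoint counts are observed and the incidence values are the unknowns to be estimated.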
Back-calculation was first used to estimate the incidence of HIV using AIDS diagnoses as an endpoint, 1 and the method has been subsequently extended, replacing AIDS diagnoses with HIV diagnoses. 2;3 However, the occurrence of an HIV diagnosis, unlike AIDS, is not simply a function of infection progression. Diagnosis of HIV infection follows the processes of infection transmission, infection progression, and presentation for, and acceptance of, testing. These processes interact in a complex way. Only by relating the observed data through a suitable back-calculation framework is it possible to unravel the effects of changes over calendar time in HIV testing patterns and in underlying HIV incidence. In the absence of direct knowledge of this shifting time-to-HIV-diagnosis distribution, additional information is required to estimate both this distribution and infection incidence. Such information could be an indicator of recent infection or levels of some prognostic marker at diagnosis. 4;5
An earlier multi-state model was adapted and developed to represent the processes that underlie the observed HIV diagnosis data. 5 The model incorporates data on CD4 count at HIV diagnosis and allows the time to diagnosis to vary as infection progresses between disease stages defined by CD4 count (Figure 1).
New infections occur according to a (non-homogeneous) Poisson process with $h_i$ representing the expected number of new infections arriving in stage 1 during the interval $(t_{i-1}, t_i]$. At time $t_i$, the expected number of individuals in each CD4 stage, $k$, $k = 1, \ldots, 4$, is given by $E_{ki}$. Of these, a proportion $d_{k(i+1)}$ will be diagnosed in the next time interval $(t_i, t_{i+1}]$. Of those remaining undiagnosed, a proportion $q_{k(k+1)}$ will progress to the next CD4 stage in the subsequent time period. The time-to-diagnosis distribution is, therefore, a complex function of the diagnosis probabilities and the disease progression parameters.
The disease progression parameters are assumed fixed and known from an analysis of CASCADE data. 6 As a result of the assumptions on the infection process, the numbers of arrivals into stages 6-9 (HIV diagnoses) and 5 (AIDS diagnoses) during the $i$-th interval are Poisson distributed with means $\mu_i^{\mathrm{HIV}}$ and $\mu_i^{\mathrm{AIDS}}$ respectively. These means are evaluated through a recursive process, equation (1); further technical detail can be found in a complementary paper [7]. The expected proportions of the HIV diagnoses in the $i$-th interval that are attributable to individuals from CD4 state $k = 1, \ldots, 4$, say $r_{ki}$, can then easily be found by
$$r_{ki} = \frac{d_{ki} E_{k(i-1)}}{\sum_{l=1}^{4} d_{li} E_{l(i-1)}}. \qquad (2)$$
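The verbal description of the multi-state process can be turned into a small forward recursion. The sketch below is one reading of that description and not the authors' implementation. It assumes that infections arising in an interval enter stage 1 at the end of that interval (so they cannot be diagnosed or progress until the next one), that diagnosis is applied before progression within each interval, and that undiagnosed individuals leaving CD4 state 4 arrive in the AIDS-diagnosis state. All function and variable names are hypothetical.

```python
def run_recursion(h, d, q):
    """Propagate expected undiagnosed counts through four CD4 states.

    h: expected new infections per interval, length I.
    d: d[i][k] = probability of diagnosis from state k (0..3) in interval i.
    q: q[k] = probability that an undiagnosed individual in state k moves on
       in one interval (q[3] is taken as progression from state 4 to AIDS).
    Returns (E, mu_hiv, mu_aids, r): final undiagnosed counts by state,
    expected HIV diagnoses, expected AIDS diagnoses, and per-interval
    proportions of HIV diagnoses coming from each CD4 state.
    """
    n_states = 4
    E = [0.0] * n_states
    mu_hiv, mu_aids, r = [], [], []
    for i in range(len(h)):
        # Diagnoses in this interval are drawn from those undiagnosed at its start.
        diagnosed = [d[i][k] * E[k] for k in range(n_states)]
        remaining = [E[k] - diagnosed[k] for k in range(n_states)]
        # Progression applies only to those who remain undiagnosed.
        progressed = [q[k] * remaining[k] for k in range(n_states)]
        new_E = [remaining[k] - progressed[k] for k in range(n_states)]
        for k in range(1, n_states):
            new_E[k] += progressed[k - 1]
        new_E[0] += h[i]  # assumption: new infections join stage 1 at interval end
        total = sum(diagnosed)
        mu_hiv.append(total)
        mu_aids.append(progressed[n_states - 1])
        r.append([x / total if total > 0 else 0.0 for x in diagnosed])
        E = new_E
    return E, mu_hiv, mu_aids, r
```

A useful sanity check on any such recursion is conservation: total infections should equal the undiagnosed total plus cumulative HIV and AIDS diagnoses.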

Data
Data are available on both the total number of HIV diagnoses and AIDS diagnoses for the entire history of the HIV epidemic in England & Wales. The first diagnosis occurred in mid-1979, so $t_0$ was set to be the beginning of 1978, to allow sufficient incubation time for this initial diagnosis. Each time-step corresponds to a quarter year (three months). If an AIDS diagnosis occurred within one quarter of an initial HIV diagnosis, it was assumed that the diagnosis was of AIDS and not of HIV (and the individual contributed to the count at state 5 and not at state 9). Similarly, CD4 counts were assumed to be "at diagnosis" if they were measured within one quarter of the initial HIV diagnosis. These CD4 count data were available from 1991 onwards, with the proportion of diagnoses having CD4 data rising from 25% in 1991 to 91% in 2010.
Although both types of diagnosis (AIDS and HIV) were available throughout the entire history of the HIV epidemic, a diagnostic test for HIV became widely available only in 1984. Very few diagnoses were made before this year, and these were mostly due to the retrospective testing of archived specimens.

Bayesian Inference
A Bayesian framework for statistical inference was adopted for the ease with which it incorporates multiple datasets and can augment data with additional information on model parameters through prior probability distributions. Bayesian inference combines the information held in the priors with that held in the data, via the likelihood function, to produce posterior distributions for model parameters. These distributions form the basis of our inferences.

Likelihood
The overall likelihood arises as the product of the component likelihoods for the observed quarterly HIV diagnosis counts, $X_i^{\mathrm{HIV}}$, the quarterly AIDS diagnosis counts, $X_i^{\mathrm{AIDS}}$, and the quarterly CD4 count at diagnosis data, $W_i$.
The likelihood for the diagnosis counts arises from the fact that these are independent Poisson random variables, i.e.
$$X_i^{\mathrm{HIV}} \sim \mathrm{Poisson}(\mu_i^{\mathrm{HIV}}), \qquad X_i^{\mathrm{AIDS}} \sim \mathrm{Poisson}(\mu_i^{\mathrm{AIDS}}).$$
The CD4 count data are based on a sub-sample of the diagnoses of known size, $N_i \le X_i$, giving the number of these diagnoses that fall into each of the CD4 count strata. If $W_{ki}$ is the number of diagnoses in the $i$-th interval with a CD4 count in stratum $k$, then
$$W_i = (W_{1i}, \ldots, W_{4i}) \sim \mathrm{Multinomial}(N_i, r_i),$$
where $r_i$ is the vector of probabilities $(r_{1i}, \ldots, r_{4i})$ as defined in (2).
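To make the structure of the likelihood concrete, the sketch below evaluates a log-likelihood built from independent Poisson terms for the diagnosis counts and, as an assumption about the form of the CD4 component, a multinomial term for the CD4 strata counts given the sub-sample size. The function names are hypothetical, and the expected counts are assumed to be strictly positive.

```python
import math


def poisson_logpmf(x, mu):
    """log P(X = x) for X ~ Poisson(mu), with mu > 0."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def multinomial_logpmf(w, r):
    """log P(W = w) for W ~ Multinomial(sum(w), r)."""
    out = math.lgamma(sum(w) + 1)
    for w_k, r_k in zip(w, r):
        out -= math.lgamma(w_k + 1)
        if w_k > 0:  # skip empty strata so that a zero probability is harmless
            out += w_k * math.log(r_k)
    return out


def log_likelihood(x_hiv, x_aids, w, mu_hiv, mu_aids, r):
    """Sum the three component log-likelihoods over all intervals.

    x_hiv, x_aids: observed quarterly counts; mu_hiv, mu_aids: their means.
    w[i]: CD4 strata counts in interval i; r[i]: strata probabilities.
    """
    total = 0.0
    for i in range(len(x_hiv)):
        total += poisson_logpmf(x_hiv[i], mu_hiv[i])
        total += poisson_logpmf(x_aids[i], mu_aids[i])
        total += multinomial_logpmf(w[i], r[i])
    return total
```

Working on the log scale turns the product of component likelihoods into a sum, which avoids numerical underflow over a long series of quarters.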

Priors
If both the infection incidence process, $\{h_i : i = 1, \ldots, I\}$, and the diagnosis process, $\{d_{ki} : i = 1, \ldots, I;\ k = 1, \ldots, 4\}$, are allowed to vary freely at each time, then the model is over-parameterised and estimation is difficult. Therefore, in the interests of model parsimony and identifiability, some smoothing of these processes is employed. This is done by assuming that they both vary according to a random walk, on the log and the logistic scales respectively. The random walks ensure that sudden jumps do not occur and that, for example, incidence in one quarter is correlated with incidence in the next quarter. For example, in the case of incidence, if $\gamma_i = \log(h_i)$, then
$$\gamma_i \mid \gamma_{i-1} \sim \mathrm{N}(\gamma_{i-1}, \sigma_{\gamma,i}^2).$$
These random walks introduce variance parameters that control the range of likely values for the step size, e.g. $\sigma_{\gamma,i}^2$ above.
For the random walk on the diagnosis probabilities, this variance is held fixed over time whereas for the expected incidence, there is a breakpoint early on in the epidemic (in 1986) to represent the end of a period of initial early growth. These random-walk specifications for both the time-varying rates of infection and diagnosis describe the prior probability distributions for the majority of model parameters. All that is required beyond this is the specification of prior probability distributions for the variance parameters and the starting point for each random walk. All variance parameters are chosen to have reasonably uninformative priors, whereas the prior for the initial level of incidence is focused on very small values.
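The smoothing effect of these priors can be seen by simulating from them. The following is a minimal sketch assuming Gaussian steps, with incidence walked on the log scale and a diagnosis probability walked on the logistic scale. The step standard deviations, starting values and the position of the breakpoint are made-up illustrative numbers, not the priors used in the study.

```python
import math
import random


def simulate_log_random_walk(gamma0, sigmas, rng):
    """Draw incidence h_i = exp(gamma_i), where gamma_i ~ N(gamma_{i-1}, sigma_i^2).

    sigmas: one step standard deviation per transition, which allows a
    breakpoint (a different variance before and after a chosen quarter).
    """
    gamma = [gamma0]
    for sigma in sigmas:
        gamma.append(rng.gauss(gamma[-1], sigma))
    return [math.exp(g) for g in gamma]


def simulate_logit_random_walk(delta0, sigma, n_steps, rng):
    """Draw diagnosis probabilities d_i = 1 / (1 + exp(-delta_i)) with a
    fixed step standard deviation, as described for the diagnosis process."""
    delta = [delta0]
    for _ in range(n_steps):
        delta.append(rng.gauss(delta[-1], sigma))
    return [1.0 / (1.0 + math.exp(-x)) for x in delta]


rng = random.Random(1)
# Hypothetical breakpoint: larger steps for the first 8 quarters, smaller afterwards.
sigmas = [0.5] * 8 + [0.1] * 24
incidence_path = simulate_log_random_walk(math.log(0.1), sigmas, rng)
diagnosis_path = simulate_logit_random_walk(-3.0, 0.1, 32, rng)
```

Because the walks are defined on transformed scales, every simulated incidence is positive and every simulated diagnosis probability lies strictly between 0 and 1, without any explicit constraint.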

Estimation
Evaluation of the posterior distributions used for inference was problematic due to their non-standard form, which makes them algebraically intractable. They were therefore estimated using Markov chain Monte Carlo (MCMC) simulation methods, implemented via the JAGS software, 9 called from the statistical package R 10 using the rjags package. 11

Derived Quantities
Together with the disease progression probabilities, the estimated diagnosis probabilities at a particular time $t$ specify a distribution for what we define to be a "snapshot" of the time-to-diagnosis interval, i.e. the interval that would be observed were the probabilities of diagnosis held fixed at their values at the time of the snapshot.
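One way to picture the snapshot calculation is to follow a notional cohort of newly infected individuals forward in time with the diagnosis probabilities frozen. The sketch below does this under assumptions that are not stated in the text: diagnosis precedes progression within each interval, progression out of CD4 state 4 among the undiagnosed counts as a diagnosis (of AIDS), and time is counted in whole intervals. The function name and inputs are hypothetical.

```python
def snapshot_mean_time_to_diagnosis(d, q, max_steps=10000, tol=1e-12):
    """Mean number of intervals from infection to diagnosis with fixed probabilities.

    d: d[k] = per-interval diagnosis probability in CD4 state k (0..3).
    q: q[k] = per-interval progression probability out of state k among the
       undiagnosed (q[3] leads to an AIDS diagnosis).
    """
    state = [1.0, 0.0, 0.0, 0.0]  # the whole cohort starts undiagnosed in stage 1
    weighted_time = 0.0
    diagnosed_mass = 0.0
    for t in range(1, max_steps + 1):
        diagnosed = [d[k] * state[k] for k in range(4)]
        remaining = [state[k] - diagnosed[k] for k in range(4)]
        progressed = [q[k] * remaining[k] for k in range(4)]
        exits = sum(diagnosed) + progressed[3]
        weighted_time += t * exits
        diagnosed_mass += exits
        state = [remaining[k] - progressed[k] for k in range(4)]
        for k in range(1, 4):
            state[k] += progressed[k - 1]
        if sum(state) < tol:
            break
    return weighted_time / diagnosed_mass


# With quarterly intervals, dividing by 4 converts the result to years.
mean_years = snapshot_mean_time_to_diagnosis([0.05, 0.05, 0.08, 0.15], [0.1] * 4) / 4.0
```

Repeating the calculation with the diagnosis probabilities estimated at different calendar times gives a sequence of snapshots that can be compared across years.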
Furthermore, in equation (1), we calculate $E_{ki}$, the expected numbers of people in each of the CD4 count states $k$ at the end of the $i$-th time interval. These quantities allow us to estimate the undiagnosed prevalence over time and the time-varying distribution of the undiagnosed prevalence across the CD4 count states. Further detail on how to derive these posterior distributions, as well as how to calculate the snapshot time-to-diagnosis distributions, can be found in [7].
To distinguish between the levels of incidence in each of the years, and, therefore, to provide evidence of any possible trend, we use posterior probabilities, $p_{ij}$, that incidence in year $i$ is greater than incidence in year $j$. These can be readily estimated from the MCMC simulation. Values of $p_{ij}$ that lie close to 0 or 1 provide evidence that there is a significant difference between the incidence in the two years.
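Estimating such a posterior probability from MCMC output amounts to counting. A minimal sketch, assuming the annual incidence draws for the two years are stored as equal-length lists in which position `s` of each list comes from the same MCMC iteration (the function name and the toy numbers are hypothetical):

```python
def posterior_prob_greater(samples_i, samples_j):
    """Estimate p_ij = P(incidence in year i > incidence in year j | data)
    as the fraction of paired MCMC draws in which year i exceeds year j."""
    if len(samples_i) != len(samples_j) or not samples_i:
        raise ValueError("need two non-empty sample lists of equal length")
    wins = sum(1 for a, b in zip(samples_i, samples_j) if a > b)
    return wins / len(samples_i)


# Toy draws: year i exceeds year j in three of the four iterations.
p_ij = posterior_prob_greater([3.0, 1.0, 5.0, 2.0], [2.0, 2.0, 4.0, 1.0])
```

Pairing draws by iteration matters because incidence in neighbouring years is correlated in the posterior; comparing independently shuffled draws would misstate the probability.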

HIV Incidence
For the years under study, the most extreme of these posterior probabilities was the one corresponding to the years 2001 and 2004, which indicated that incidence was greater in 2004 with probability 0.89. This is not considered to be a particularly significant deviation in incidence between the two years. For the last three years, there is much weaker evidence of any difference from any of the other years, as can be expected from the widening credible intervals attached to the incidence over

Goodness-of-fit
The plots in Figure 2 show the ability of the model to predict the last ten years' worth of quarterly counts of HIV diagnoses and of AIDS at HIV diagnosis, as well as the observed distribution of CD4 counts at diagnosis, conditional upon the size of the sample.
The dashed lines represent 95% predictive intervals under the model. As can be seen, these contain almost all (>99%) of the data points, highlighting the model's suitability for handling such data.
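A coverage check of this kind can be sketched as follows. This simplified version uses plug-in Poisson intervals around fixed expected counts, whereas the intervals in Figure 2 are predictive intervals under the fitted model and would also reflect parameter uncertainty. The function names and the toy numbers are hypothetical.

```python
import math


def poisson_quantile(mu, p, max_x=10**6):
    """Smallest x such that P(X <= x) >= p for X ~ Poisson(mu), with mu > 0.

    Each probability is computed on the log scale so that large means,
    for which exp(-mu) underflows, are handled safely.
    """
    cdf = 0.0
    for x in range(max_x + 1):
        cdf += math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))
        if cdf >= p:
            return x
    return max_x


def interval_coverage(observed, expected, level=0.95):
    """Fraction of observed counts inside the central Poisson interval."""
    lower_p = (1.0 - level) / 2.0
    upper_p = 1.0 - lower_p
    inside = 0
    for x, mu in zip(observed, expected):
        lo, hi = poisson_quantile(mu, lower_p), poisson_quantile(mu, upper_p)
        if lo <= x <= hi:
            inside += 1
    return inside / len(observed)


# Toy data: the third observation is far outside its interval.
coverage = interval_coverage([10, 12, 50], [10.0, 10.0, 10.0])
```

Coverage well below the nominal level would point to a poorly fitting model, while coverage well above it, as reported here, suggests the intervals are, if anything, conservative.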

Sensitivity to the Choice of the Incubation Period Distribution
An underlying assumption in the model is that the mean incubation period to AIDS is 8·6 years, as estimated from a longitudinal analysis of individual CD4 count histories from the CASCADE collaboration. 12 Longer incubation periods have been quoted in the literature on the basis of similar types of data, 13,14 with the discrepancy attributable to differences in the assumed distribution of CD4 counts at, or shortly after, seroconversion. If a slower infection progression is assumed, such as the 11 years obtained by a survival analysis, 12 this would simply serve to further smooth the infection incidence curve and would not lead to any changes to the trend in HIV incidence, our primary focus. Preliminary work has also suggested that our conclusions would not be substantially altered if the model were extended to account for the known differences 6 in HIV progression associated with age at infection.