Hepatitis E Virus Seroprevalence among Adults, Germany

We assessed hepatitis E virus (HEV) antibody seroprevalence in a sample of the adult population in Germany. Overall HEV IgG prevalence was 16.8% (95% CI 15.6%–17.9%) and increased with age, leveling off at >60 years of age. HEV is endemic in Germany, and the lifetime risk for exposure is high.

The incidence in the manuscript was computed using a so-called simple catalytic epidemic model (Griffiths 1974; Farrington 2005), in which the force of infection (FOI) is assumed to be constant over time. The dependent variable in the fitted models was a binary variable indicating whether a person in the study had seroconverted. Catalytic models assume that infection induces lifelong immunity and does not affect the mortality rate of infected individuals. A consequence of the constant FOI is that the population is assumed to be homogeneous with respect to both susceptibility and exposure to infection. Furthermore, infection is assumed to be in an equilibrium state, i.e., the level of incidence is assumed to remain constant over time.

Notation
Denote the available data by $\{(y_i, a_i),\ i = 1, \ldots, n\}$, where $y_i$ is a binary variable indicating whether the $i$th individual has seroconverted (0 = no, 1 = yes) and $a_i$ is the age of the individual in years (taken as a continuous variable). Let $p(a)$ be the probability that an individual of age $a$ has seroconverted. Inference about the parameters in a parametric model for $p(a)$ can now be performed using the binomial likelihood:

$$L = \prod_{i=1}^{n} p(a_i)^{y_i} \, \{1 - p(a_i)\}^{1 - y_i}.$$

In the constant FOI model, i.e., $\lambda(a) = \alpha$ for $a \geq 0$, the probability of having seroconverted by age $a$ is $p(a) = 1 - \exp(-\alpha a)$. One can show (see, e.g., Becker 1989 or Farrington 2005) that the estimation problem for $\alpha$ reduces to fitting a generalized linear model with complementary log-log (cloglog) link function and the following linear predictor:

$$\log\{-\log(1 - p(a))\} = \log(\alpha) + \log(a).$$

This model can be fitted with any GLM software (e.g., function glm in Stata or R) by specifying a binomial model with cloglog link function and using $\log(a)$ as an offset in the linear predictor. The exponential of the intercept estimate in such a model is the desired estimate of $\alpha$. The annual incidence, denoted Î in the Results, is then computed from this estimate.
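The fitting step can be illustrated without GLM software. The following is a minimal sketch, not the authors' code: it maximizes the binomial likelihood of the constant FOI model by Fisher scoring on log(α), which corresponds to the intercept-only cloglog GLM with offset log(a). The function name fit_constant_foi, the simulated ages, and the value true_alpha are all hypothetical and chosen only for illustration.

```python
import math
import random


def fit_constant_foi(ages, y, tol=1e-10, max_iter=100):
    """Fit p(a) = 1 - exp(-alpha * a) by Fisher scoring on theta = log(alpha).

    Equivalent to an intercept-only binomial GLM with cloglog link and
    offset log(a). Returns (alpha_hat, se_theta), where se_theta is the
    standard error of theta. Assumes at least one positive and one negative.
    """
    # Crude starting value: overall prevalence divided by mean age.
    theta = math.log(sum(y) / sum(ages))
    info = 0.0
    for _ in range(max_iter):
        score = 0.0
        info = 0.0
        for a, yi in zip(ages, y):
            mu = math.exp(theta) * a         # exp(linear predictor) = alpha * a
            p = -math.expm1(-mu)             # 1 - exp(-mu), stable for small mu
            score += mu * (yi - p) / p       # derivative of the log-likelihood
            info += mu * mu * (1.0 - p) / p  # expected (Fisher) information
        step = score / info
        theta += step
        if abs(step) < tol:
            break
    return math.exp(theta), 1.0 / math.sqrt(info)


# Simulated data (hypothetical, not the study data).
rng = random.Random(1)
true_alpha = 0.004
ages = [rng.uniform(18, 79) for _ in range(4000)]
y = [1 if rng.random() < -math.expm1(-true_alpha * a) else 0 for a in ages]

alpha_hat, se_theta = fit_constant_foi(ages, y)
# Wald interval on the log scale, back-transformed to the alpha scale.
ci_low = alpha_hat * math.exp(-1.96 * se_theta)
ci_high = alpha_hat * math.exp(1.96 * se_theta)
print(f"alpha_hat = {alpha_hat:.5f} (95% CI {ci_low:.5f} to {ci_high:.5f})")
```

Working on the log scale keeps the estimate of α positive and mirrors how the GLM intercept is exponentiated to obtain it.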

Results
Performing the above calculations for our n = 4,352 individuals, we obtain Î = 0.00398, i.e., an annual incidence of 398 per 100,000 population (95% CI 372–428). To account for post-stratification in the analysis, we additionally fitted the above GLM with each observation weighted by its post-stratification sampling weight (e.g., function svyglm in Stata or R). The survey-weighted estimate of the annual incidence is 392 per 100,000 population (95% CI 364–423). We report this weighted estimate in the manuscript together with its confidence interval.
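To illustrate how sampling weights enter the estimation, the sketch below solves the weighted score equation for log(α) and attaches a simple sandwich standard error. It is an assumption-laden illustration rather than a reimplementation of svyglm: the function name fit_weighted_foi and the toy data are hypothetical, and the variance formula treats observations as independent and ignores strata and clusters, so it will generally differ from a full design-based variance.

```python
import math


def fit_weighted_foi(ages, y, weights, tol=1e-10, max_iter=100):
    """Weighted pseudo-likelihood fit of p(a) = 1 - exp(-alpha * a).

    Returns (alpha_hat, se_theta) with theta = log(alpha). The standard
    error is a simple sandwich estimate that ignores strata and clusters
    (an assumption of this sketch).
    """
    total_w = sum(weights)
    prevalence = sum(w * yi for w, yi in zip(weights, y)) / total_w
    mean_age = sum(w * a for w, a in zip(weights, ages)) / total_w
    theta = math.log(prevalence / mean_age)  # crude starting value
    bread = meat = 0.0
    for _ in range(max_iter):
        score = bread = meat = 0.0
        for a, yi, w in zip(ages, y, weights):
            mu = math.exp(theta) * a
            p = -math.expm1(-mu)              # 1 - exp(-mu)
            s = mu * (yi - p) / p             # unweighted score contribution
            score += w * s
            bread += w * mu * mu * (1.0 - p) / p
            meat += (w * s) ** 2
        step = score / bread
        theta += step
        if abs(step) < tol:
            break
    return math.exp(theta), math.sqrt(meat) / bread


# Toy data (hypothetical): ages, serostatus, and post-stratification weights.
ages = [25.0, 35.0, 45.0, 55.0, 65.0, 75.0]
y = [0, 0, 1, 0, 1, 1]
weights = [1.2, 0.8, 1.0, 1.5, 0.7, 0.9]

alpha_w, se_w = fit_weighted_foi(ages, y, weights)
print(f"weighted alpha_hat = {alpha_w:.5f}, SE of log(alpha) = {se_w:.3f}")
```

Multiplying all weights by a common constant leaves both the point estimate and the sandwich standard error unchanged, which is why only the relative sizes of the weights matter here.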

Model Checking
Graphical analysis of the residuals from a model with binary response is difficult because of the extremely discrete nature of the problem. Such model checking is further complicated by the complex survey setup of our sample. Instead, we perform an alternative examination of the model fit. As a first qualitative assessment, we investigated the model's pointwise predictive performance. Based on the asymptotic normality of the GLM [...] incidence, is age specific. However, age-specific incidence rates would be harder to interpret and report, especially because our aim was to calculate an overall incidence rate. Another reason is that even this model is not fully sufficient to address all aspects of the data: the green line shows the fit of a fully flexible survey-weighted kernel smoother for p(a). In concordance with Figure 1 of the manuscript, we observe a slight decrease in seroprevalence at the highest ages, which is mentioned in the manuscript discussion. Reporting age-specific annual incidence rates based on such a flexible model would require a table containing a number for each year of age, which is not useful for our purpose.
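A cruder way to see where a constant FOI fit departs from the data is to compare observed and predicted seroprevalence within age bins. The sketch below is offered only as an illustration of that idea; it is not the predictive assessment or the kernel smoother described here, and the function name binned_fit_check, the bin edges, the toy data, and the value of alpha are all hypothetical.

```python
import math


def binned_fit_check(ages, y, alpha, edges):
    """Compare observed prevalence with the constant-FOI prediction per age bin.

    edges are ascending bin boundaries; bin k covers [edges[k], edges[k + 1]).
    Returns a list of (lower, upper, n, observed, predicted) tuples,
    skipping empty bins.
    """
    rows = []
    for lower, upper in zip(edges[:-1], edges[1:]):
        idx = [i for i, a in enumerate(ages) if lower <= a < upper]
        if not idx:
            continue
        observed = sum(y[i] for i in idx) / len(idx)
        # Mean model probability 1 - exp(-alpha * a) over individuals in the bin.
        predicted = sum(-math.expm1(-alpha * ages[i]) for i in idx) / len(idx)
        rows.append((lower, upper, len(idx), observed, predicted))
    return rows


# Toy data (hypothetical), with an assumed FOI of 0.01 per year.
ages = [20.0, 25.0, 40.0, 45.0, 70.0, 75.0]
y = [0, 1, 0, 1, 1, 0]
rows = binned_fit_check(ages, y, alpha=0.01, edges=[18, 30, 50, 80])
for lower, upper, n, observed, predicted in rows:
    print(f"{lower}-{upper}: n={n}, observed={observed:.3f}, predicted={predicted:.3f}")
```

A predicted value that systematically exceeds the observed one in the oldest bins would be in line with a decrease in seroprevalence at high ages, although this unweighted comparison does not account for the survey design.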
Within the cloglog GLM framework, it is possible to allow for heterogeneity in the FOI by adjusting for variables other than age. We investigated additional dependence on sex and residence in the linear predictor, but neither variable was significant at the 5% significance level.
Thus, our reported annual incidence estimate remains a nicely interpretable and communicable result which allows for comparison with Faramawi et al., while the above model checking indicates that the assumptions of the constant rate model are reasonable.