Main

Infectious disease epidemiology and chronic disease epidemiology have both contributed enormously to the understanding and prevention of human disease but have largely developed as separate disciplines, notably in terms of research methods. In both fields, statistical modelling has been extensively developed to analyse data from observational studies and capture associations between postulated risk factors and disease. In infectious disease epidemiology, mathematical models have also been used for more than a century to gain insight into the natural history of infection, transmission dynamics (Anderson and May, 1991) and the design and evaluation of prevention programmes (Garnett et al, 2011).

In the last three decades, the appreciation of the role of infections in cancer aetiology has greatly expanded. Among the 13 million new cancer cases that occurred worldwide in 2008, around 2 million (16%) were attributable to infections (De Martel et al, 2012). The use of mathematical models of infection transmission has, therefore, entered the field of infection-related cancer epidemiology, notably in the study of hepatitis B virus (HBV), human papillomavirus (HPV), human immunodeficiency virus (HIV), and their related cancers. The latent period between acquisition of carcinogenic infection and cancer incidence can last decades and requires the transition through intermediate steps, that is, persistent infection and pre-malignant lesions. This long latency adds substantial complexity to the evaluation of causality and assessment of infection control strategies at both a population and individual level.

Methods relying upon the simulation of transition between disease states have been used to gain insight into the natural history of cancer, such as progression of pre-cancerous cervical lesions (van Oortmarssen and Habbema, 1991; Kim et al, 2007a). However, infection transmission models, which are dynamic representations of infection natural history within a hypothetical population, explicitly account for the ability of transmission patterns and immune response to shape infection-related cancer epidemiology. They are useful to understand infection transmission processes, to estimate the key parameters that govern the spread of infection, and to project the potential impact of different preventive and therapeutic measures (Grassly and Fraser, 2008). Infection transmission and chronic-disease modelling methods are increasingly combined (Kim et al, 2007b; Garnett et al, 2011). However, the concepts, terminology, and methods used to study infection transmission dynamics are not yet well known in the domain of cancer epidemiology. This review aims to concisely illustrate the use of these models. We also briefly summarise models of carcinogenesis and discuss their specificities and possible integration with models of the natural history of cancer-associated infections.

Infection transmission models

Infection transmission models are mathematical models designed to capture the circulation of infectious agents at a population level (Keeling and Rohani, 2008). The population is subdivided into mutually exclusive ‘compartments’, representing the different phases of the natural history of the infection of interest. Figure 1 shows several alternative compartmental models, named after the compartments and the possible transitions between them. The simplest models have compartments named ‘susceptible’ (S), ‘infected’ (I), and ‘recovered’ (R). In the field of cancer-associated infections, R corresponds to infection clearance according to the findings of ad hoc tests rather than clinical recovery from an infectious disease, such as rubella or measles.

Figure 1

Transmission models represented as flow diagrams. Some examples of alternative compartmental models for modelling infectious diseases. The total population is distributed into mutually exclusive epidemiological compartments. The models are defined by these compartments and the possible transitions between them. In the simplest models, there are two or three states: susceptible, infectious, and recovered. More complex models can also account for a latent infection, and carrier status. Finally, transmission and carcinogenic phases of the natural history of infection-related cancers can be combined into a single model.


Several variants of these models have been used to represent HPV infection: in the SIS model, infected individuals return to the susceptible state after clearing an infection, so that they can be re-infected (Taira et al, 2004). Conversely, in the SIR model, an individual moves to the recovered compartment after clearing an infection and is assumed to be immune to re-infection (Hughes et al, 2002). In the hybrid SIS/SIR model, susceptibility to re-infection is decreased, but not totally eliminated (Baussano et al, 2013). Infections with more complex natural histories require a correspondingly more complex system of compartments. Models of HBV infection may add a compartment for latent infections, in which an individual is infected but not infectious, and another for chronic HBV carriers, that is, those at increased risk of hepatocellular carcinoma (Figure 1) (Kretzschmar et al, 2002). Furthermore, models of carcinogenic infections may also incorporate key steps of cancer natural history, such as precancerous lesions and invasive cancers (Figure 1).

The course of the infection in the population is modelled by a set of equations that describe the rates at which individuals move from one compartment to another (Box 1). The per-capita rate at which susceptible individuals acquire infection, known as the force of infection, is the product of three factors:

  • The contact rates between individuals in the population: ‘Contact’ means an opportunity to transmit the infection. Its exact definition depends on the route of transmission (e.g., air-borne, blood-borne, food-borne, or sexually transmitted) and is affected by population-specific behaviour (e.g., mixing patterns) and social contacts and conditions (e.g., overcrowding) (Mossong et al, 2008).

  • The probability of infection per contact: not all contacts result in transmission. The potential for a contact to result in transmission is an important determinant of the spread of the infection in the population. For instance, the probability of HPV transmission per heterosexual contact (Burchell et al, 2006) is estimated to be much higher than the same probability for HIV (Boily et al, 2009).

  • The proportion of infected individuals in the population at a given time: a product of the entire history of infection in the population up to this point.
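In symbols, the three factors can be written as follows. The notation is assumed here for illustration and is not taken from the source: c is the contact rate, p the probability of transmission per contact, I(t)/N(t) the proportion of infected individuals at time t, and γ the rate at which infected individuals clear the infection. The equations are the standard SIR form for a population without births, deaths, or migration.

```latex
\lambda(t) = c \, p \, \frac{I(t)}{N(t)}, \qquad
\frac{dS}{dt} = -\lambda(t)\, S, \qquad
\frac{dI}{dt} = \lambda(t)\, S - \gamma I, \qquad
\frac{dR}{dt} = \gamma I
```

Here λ(t) is the force of infection, and S, I, and R are the numbers of susceptible, infected, and recovered individuals, with N = S + I + R.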

If a new infection is introduced into a closed population, that is, a population without births and deaths or migrations, the SIR model generates the epidemic curve shown in Figure 2. This curve is characterised by an initial exponential increase of infected individuals followed by gradual saturation, and then a decline. The number of susceptible individuals declines as they become infected and reaches a plateau as the epidemic burns out. In open populations, periodic epidemics can occur among new susceptible individuals, as seen in many childhood illnesses, or the infection may progressively reach an endemic steady state with constant fractions of the population in each compartment.

Figure 2

Epidemic dynamics of the SIR model in a closed population. The evolution of a SIR model over time is depicted by the curves representing the proportion of susceptible, infectious, and recovered individuals of the population. In this example, the epidemic peaks with 15% of the population in the infectious state at 37 days. By the end of the epidemic, 80% of the population has recovered and 20% is still susceptible, never having been infected.
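The epidemic curve described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function name `simulate_sir`, the reproductive number of 2, the 10-day infectious period, and the initial seeding are all assumptions chosen for the example. With a reproductive number of 2 at the start of the epidemic, the standard SIR equations give a peak of roughly 15% infectious and a final recovered fraction of roughly 80%, in line with the figure; the timing of the peak depends on the assumed infectious period and seeding, which the source does not report.

```python
def simulate_sir(r0=2.0, infectious_days=10.0, i0=1e-4, days=300.0, dt=0.05):
    """Integrate the closed-population SIR equations with fixed-step RK4.

    s, i and r are proportions of the population. All default values are
    illustrative assumptions, not parameters taken from the source.
    Returns a list of (time, s, i, r) tuples.
    """
    gamma = 1.0 / infectious_days  # recovery (clearance) rate per day
    beta = r0 * gamma              # contact rate x transmission probability

    def deriv(s, i):
        new_infections = beta * s * i  # force of infection (beta * i) times s
        return -new_infections, new_infections - gamma * i

    s, i = 1.0 - i0, i0
    trajectory = [(0.0, s, i, 0.0)]
    for k in range(int(round(days / dt))):
        k1s, k1i = deriv(s, i)
        k2s, k2i = deriv(s + 0.5 * dt * k1s, i + 0.5 * dt * k1i)
        k3s, k3i = deriv(s + 0.5 * dt * k2s, i + 0.5 * dt * k2i)
        k4s, k4i = deriv(s + dt * k3s, i + dt * k3i)
        s += dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6.0
        i += dt * (k1i + 2 * k2i + 2 * k3i + k4i) / 6.0
        trajectory.append(((k + 1) * dt, s, i, 1.0 - s - i))
    return trajectory


if __name__ == "__main__":
    traj = simulate_sir()
    peak_time, _, peak_i, _ = max(traj, key=lambda point: point[2])
    print(f"peak infectious fraction: {peak_i:.3f} at day {peak_time:.0f}")
    print(f"final recovered fraction: {traj[-1][3]:.3f}")
```

Changing `r0` shows how strongly both the peak and the final size depend on the reproductive number.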


Infection transmission dynamics

A key quantity that determines the population-level dynamics of infections is the effective reproductive number Rt, defined as the mean number of secondary infections produced by an infected individual at any time t. Rt is the product of four quantities: the proportion of susceptible individuals in the population, contact rates between individuals, the mean duration of infectiousness, and the probability of transmission per contact (Garnett, 2005). During the initial, growing phase of an epidemic, Rt is greater than 1, whereas in the final declining phase, Rt is less than 1. In an endemic situation, Rt is equal to 1. Changes in Rt can make an infection shift between epidemic, endemic, and infection-free phases. Prophylactic vaccination decreases the fraction of susceptible individuals in the population, and thereby decreases Rt. Sufficiently high vaccine coverage can reduce Rt below 1 and lead to infection elimination, even without vaccinating 100% of the population. This represents one of the most important strengths of mass immunisation programmes, that is, the ‘herd immunity’ phenomenon by which vaccination also protects non-vaccinated individuals in the same population (Garnett, 2005). Conversely, the advent of HIV and the related immunodeficiency in sub-Saharan Africa greatly increased the incidence of Kaposi Sarcoma (KS) by increasing the infectiousness, and hence Rt, of the herpesvirus causing KS and the probability of neoplastic transformation (Mesri et al, 2010).
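The arithmetic behind Rt and herd immunity can be made concrete with a short sketch. The function names and the numerical values below are hypothetical. The coverage threshold 1 - 1/R0 (where R0 is the value of Rt in a fully susceptible population) is the standard result under homogeneous mixing and all-or-nothing vaccine protection; it is not stated in the source and should be read as an assumption of the example.

```python
def effective_reproductive_number(susceptible_fraction, contact_rate,
                                  infectious_duration, transmission_prob):
    """Rt as the product of the four quantities named in the text."""
    return (susceptible_fraction * contact_rate
            * infectious_duration * transmission_prob)


def critical_coverage(r0, vaccine_efficacy=1.0):
    """Smallest coverage v for which (1 - efficacy * v) * r0 <= 1.

    Assumes homogeneous mixing and all-or-nothing protection. A result
    above 1 means elimination is not achievable with that efficacy.
    """
    if r0 <= 1.0:
        return 0.0  # the infection cannot sustain itself even unvaccinated
    return (1.0 - 1.0 / r0) / vaccine_efficacy


if __name__ == "__main__":
    # Hypothetical values: 2 contacts per unit time, 5 time units of
    # infectiousness, 40% transmission probability per contact.
    r0 = effective_reproductive_number(1.0, 2.0, 5.0, 0.4)
    print(r0, critical_coverage(r0))
```

In this hypothetical setting, vaccinating three quarters of the population with a fully protective vaccine would bring Rt to 1, which is why elimination does not require 100% coverage.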

Although infection transmission is a random process, the population-level behaviour becomes more regular as the size of the population increases. In a sufficiently large population, the dynamics of the infection can be approximated by a deterministic process (Keeling and Rohani, 2008). Deterministic models avoid the need to model each individual separately. Instead, the proportion of individuals in each compartment defines the state of the population and is sufficient to determine the future course of the infection. Alternatively, infection transmission can be modelled as a random or stochastic process. Stochastic models typically model individuals as they move through the various compartments according to a random process (Grassly and Fraser, 2008). Unlike deterministic models, which always produce the same results given the same parameters and initial conditions, stochastic models can generate a distribution of possible outcomes from a given set of parameters. This can be particularly useful when dealing with the earliest and latest phases of an epidemic in which small numbers of individuals are typically involved. In such a case, process variability (i.e., the innate variability in outcomes due to randomness) can determine the course of an epidemic (i.e., invasion or extinction). By definition, such variability cannot be captured by deterministic models.
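The contrast between deterministic and stochastic behaviour can be illustrated with a small event-driven simulation. This is a sketch under stated assumptions: a closed population of 200, a single initial infected individual, a reproductive number of 2, and hypothetical function names. A deterministic SIR model with these parameters always produces a major epidemic, whereas in the stochastic version a substantial share of introductions die out after only a few cases.

```python
import random


def stochastic_sir(n=200, r0=2.0, gamma=1.0, rng=None):
    """Event-driven (Gillespie-type) SIR simulation in a closed population.

    Starts from one infected individual and runs until no infected
    individuals remain. Returns (final_size, duration), where final_size
    counts everyone ever infected. All defaults are illustrative.
    """
    rng = rng or random.Random()
    beta = r0 * gamma
    s, i, t = n - 1, 1, 0.0
    while i > 0:
        rate_infection = beta * s * i / n
        rate_recovery = gamma * i
        total_rate = rate_infection + rate_recovery
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if rng.random() < rate_infection / total_rate:
            s, i = s - 1, i + 1           # one new infection
        else:
            i -= 1                        # one recovery
    return n - s, t


def early_extinction_fraction(runs=500, n=200, r0=2.0, seed=1, cutoff=0.25):
    """Share of runs in which fewer than cutoff * n individuals are infected."""
    rng = random.Random(seed)
    minor = sum(stochastic_sir(n, r0, rng=rng)[0] < cutoff * n
                for _ in range(runs))
    return minor / runs


if __name__ == "__main__":
    print(early_extinction_fraction())
```

With these settings, roughly half of the simulated introductions fade out early, consistent with the branching-process approximation that a single introduction goes extinct with probability about 1/R0 when the infectious period is exponentially distributed.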

Combining models with empirical data

Infection transmission models can be made more complex by describing more accurately the contacts between individuals and by making the transition rates between compartments dependent on time or demographic variables. These modifications can be used to tailor the model to a specific population, rather than a generic hypothetical one. The contact network through which an infection is transmitted can be represented with different levels of complexity. The simplest approach assumes that infectious and susceptible individuals interact at random, depending on either the proportion or absolute number of infectious individuals (Grassly and Fraser, 2008). More realistic approaches explicitly model the network and duration of contacts (Grassly and Fraser, 2008). Sexual partnership formation in a population, for instance, takes place according to preferences based on individual characteristics, such as age, sexual behaviour, and socio-cultural level. Furthermore, partnership duration and the concurrence of multiple partnerships are likely to affect the circulation patterns of sexually transmitted infections (Liljeros et al, 2003). Unfortunately, the data that are necessary to accurately parameterise the network and duration of contacts are rarely available. Most available models have adopted intermediate approaches, in which rates and probabilities of contacts within a population depend on more or less easily measurable characteristics, such as age, sexual habits, or intravenous drug use (Garnett and Anderson, 1994).

Transition rates between compartments govern the average time an individual spends in each compartment. In the simplest models, transition rates may be assumed to be constant. Alternatively, transition rates may be a function of time-dependent variables, such as age or time already spent in a compartment. For example, Kim et al (2007a) assumed that the probability of HPV16 clearance did not change according to a woman’s age and infection duration. Conversely, Barnabas et al (2006) assumed that the probability of HPV16 clearance was not constant but decreased with age. Baussano et al (2010) modelled the clearance rate as a decreasing function of time elapsed since infection, irrespective of the woman’s age, as strongly supported by more recent data (Rodriguez et al, 2010).
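The difference between a constant clearance rate and one that declines with time since infection can be sketched as follows. A constant rate implies an exponentially distributed time in the compartment, with mean equal to the reciprocal of the rate. The Weibull form used for the declining rate is an assumption made for this illustration only; it is not the functional form used by any of the cited models, and the function names are hypothetical.

```python
import math


def prob_still_infected_constant(t, rate):
    """Probability of not having cleared by time t under a constant rate."""
    return math.exp(-rate * t)


def mean_time_in_compartment(rate):
    """Mean sojourn time implied by a constant transition rate."""
    return 1.0 / rate


def clearance_hazard_weibull(t, scale, shape):
    """Instantaneous clearance rate at time t since infection (t > 0).

    shape < 1 gives a rate that declines with time since infection;
    shape == 1 reduces to a constant rate of 1 / scale.
    """
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def prob_still_infected_weibull(t, scale, shape):
    """Probability of not having cleared by time t under the Weibull hazard."""
    return math.exp(-((t / scale) ** shape))


if __name__ == "__main__":
    # Hypothetical parameters, time in years since infection.
    for years in (0.5, 1.0, 2.0, 5.0):
        print(years,
              round(prob_still_infected_constant(years, 1.0), 3),
              round(prob_still_infected_weibull(years, 1.0, 0.5), 3))
```

Under the declining hazard, infections that have already persisted for several years are much less likely to clear in the next interval than recent ones, which is the qualitative pattern the time-since-infection models are designed to capture.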

The values of demographic, behavioural, and biological parameters that govern the behaviour of an infection can either be plugged into the model, if estimates are available from observational studies, or inferred by calibrating the model so that its outputs conform to empirical data. In the last decade, fitting methods have evolved. Early HPV models mainly aimed at visually assessing how well model outputs fitted empirical data (Taira et al, 2004), whereas more recent models rely to a greater extent on the systematic exploration of parameters and formal selection of best fitting sets of values (Hoare et al, 2008). Markov Chain Monte Carlo techniques and Bayesian analysis are increasingly used to infer parameter values (Toni et al, 2009).
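A toy version of calibration by systematic parameter exploration is sketched below. It is deliberately simple and is not the method of any cited study: the model is reduced to the standard SIR final-size relation z = 1 - exp(-R0 z) for a fully susceptible closed population, the "observed" attack rates are invented for the example, and the single unknown parameter is chosen by minimising the sum of squared errors over a grid. Function names are hypothetical.

```python
import math


def final_attack_rate(r0, iterations=500):
    """Solve z = 1 - exp(-r0 * z) by fixed-point iteration.

    z is the fraction ever infected in a closed, initially fully
    susceptible population; the iteration converges to 0 when r0 <= 1.
    """
    z = 0.5
    for _ in range(iterations):
        z = 1.0 - math.exp(-r0 * z)
    return z


def calibrate_r0(observed_attack_rates, grid):
    """Return the grid value of r0 with the smallest sum of squared errors."""
    def sum_squared_errors(r0):
        predicted = final_attack_rate(r0)
        return sum((predicted - observed) ** 2
                   for observed in observed_attack_rates)
    return min(grid, key=sum_squared_errors)


if __name__ == "__main__":
    observed = [0.78, 0.80, 0.82]  # invented data, for illustration only
    grid = [round(1.1 + 0.01 * k, 2) for k in range(291)]  # 1.10 to 4.00
    print(calibrate_r0(observed, grid))
```

Realistic calibrations explore many parameters at once and compare model outputs with several data sources, which is where sampling-based approaches such as Markov Chain Monte Carlo become useful; the grid search above only conveys the basic idea of formally scoring candidate parameter values.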

Modelling cancer natural history

Many infection-associated cancers are the result of an asymptomatic persistent infection. Many relevant cancer-associated infections, for example, HPV, HBV, and Epstein-Barr virus, have a role at both early and late steps of the carcinogenic process, initiating and then sustaining the malignant genotype and phenotype.

The carcinogenic process can be measured in terms of subclinical markers: for example, morphologically distinguishable lesions in cells (cervico-vaginal smears) or tissue (biopsies); serological markers (HBV antigens and antibodies); or, increasingly often, imaging techniques (mammography) and molecular markers (PCR-detected DNA or RNA of the pathogen of interest). Subclinical disease states may regress or eventually progress to cancer. The highest-quality epidemiological data come from prospective studies, with repeated visits from participants to determine their current infection or disease status. A key feature of this type of cohort study, known as the panel study, is that the disease status is only known at fixed points in time, and must be inferred at time points between two visits. In statistical terms, the transitions between disease states are interval censored. The analysis of such studies is typically based on multi-state Markov models (Bureau et al, 2003). Markov models closely resemble transmission dynamic models, as individuals move through compartments representing different disease states. However, unlike infection transmission models, in which interaction between individuals is crucial, Markov models assume no interaction between individuals, so their disease states are independent of each other. Markov models can be applied to any disease with a multi-state natural history and are not limited to diseases associated with infections. The term ‘Markov’ refers to an assumption that the disease progression depends only on the current state, and not on the length of time spent in that state. This may be unrealistic, but semi-Markov models that relax this assumption may be difficult to fit to epidemiological data because the disease process is only partly observed (Titman and Sharples, 2010).
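A multi-state Markov model of this kind can be sketched in a few lines. The example below is a discrete-time simplification with annual steps; the four states and all transition probabilities are hypothetical and chosen only to show the mechanics. Analyses of panel data would usually estimate transition intensities in continuous time to accommodate interval censoring, which this sketch does not attempt.

```python
STATES = ["uninfected", "infected", "precancer", "cancer"]

# Hypothetical annual transition probabilities; each row sums to 1.
# Rows are the current state, columns the state one year later.
TRANSITIONS = [
    [0.90, 0.10, 0.00, 0.00],  # uninfected: may acquire infection
    [0.50, 0.45, 0.05, 0.00],  # infected: may clear, persist, or progress
    [0.00, 0.20, 0.78, 0.02],  # precancer: may regress, persist, or progress
    [0.00, 0.00, 0.00, 1.00],  # cancer: absorbing state
]


def step(distribution, matrix):
    """Advance the state distribution by one time step."""
    n = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(n))
            for j in range(n)]


def distribution_after(distribution, matrix, years):
    """State distribution after a number of annual steps.

    Markov assumption: the next state depends only on the current state,
    not on how long an individual has already spent in it.
    """
    for _ in range(years):
        distribution = step(distribution, matrix)
    return distribution


if __name__ == "__main__":
    start = [1.0, 0.0, 0.0, 0.0]  # everyone starts uninfected
    for years in (5, 10, 20):
        dist = distribution_after(start, TRANSITIONS, years)
        print(years, [round(p, 4) for p in dist])
```

Because each individual is followed independently of the others, the probability of acquiring infection in the first row is a fixed number here, whereas in a transmission model it would depend on the proportion of the population currently infected.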

Prospective studies can provide insights into the natural history of infection that cannot be obtained from other study designs. For example, they show that the clearance of incident HPV infection is independent of age, and that the apparently lower clearance of infection among older unscreened women is due to a higher prevalence of long-duration persistent infection (Rodriguez et al, 2010). However, the detection and early treatment of pre-cancerous lesions interfere with the disease process. Ideally, a combination of short-term intensive follow-up and long-term watch-and-wait investigations is required to fully understand the natural history of cancer-associated infections, but the latter investigations are seldom possible. In a population-based study of cervical neoplasias in rural Costa Rica, the frequency of follow-up visits varied by estimated risk of cervical cancer at enrolment, from 6-month or 12-month visits for the highest risk women, to passive follow-up with screening after 5–7 years for the lowest risk women (Rodriguez et al, 2010). An egregious example of long-term follow-up without intervention occurred in New Zealand between 1965 and 1974, when a gynaecologist mistakenly believed that he could distinguish cervical intraepithelial neoplasia grade 3 (CIN3) lesions that would not progress to cancer, and left some women with CIN3 untreated. This was the subject of a judicial enquiry in 1988. Retrospective analysis of cancer incidence among women whose treatment had been reviewed by the judicial enquiry provided the most valid direct estimates of the progression rate of CIN3 (McCredie et al, 2008).

Epidemiological designs other than prospective studies have contributed substantially to the understanding of the association between chronic infections and cancer. For example, case–series and case–control studies on cervical cancer and cross-sectional studies on HPV infection in female populations have been essential to show: 1) the virtually constant presence of HPV DNA in cancerous and severe pre-cancerous cervical lesions; and 2) the different carcinogenic potential of different HPV types. Non-prospective studies have allowed us to establish the so-called ‘enrichment’ phenomenon, that is, the gradual rise of the relative prevalence of certain HPV types (high-risk types, notably HPV16 and 18) across precancerous lesions of increasing severity (Guan et al, 2012), and the remarkably consistent distribution of HPV types in cervical cancer worldwide (Li et al, 2011). In fact, cancer-associated pathogens (e.g., HPV, Helicobacter pylori) often belong to families of serotypes and genotypes, of varying size, that differ markedly in carcinogenic potency. It is, therefore, difficult to pinpoint cancer causality or design effective screening tests and vaccines without a good understanding of the heterogeneity of a pathogen family.

Statistical models used in cancer epidemiology are often descriptive in nature. They summarise the salient features of the data while discounting patterns that may be due to chance or the effects of confounding factors that may also act on cancer risk independently or in synergy with the exposure of interest. In contrast, dynamic models of a disease process are mainly designed to show how the data are generated, or at least provide plausible mechanisms. They can provide a framework in which epidemiological evidence from diverse sources can be combined consistently and in which, ideally, counter-factual questions can be addressed, for example, the impact of interventions against infection or precancerous or cancerous lesions for which no control group is available (Uhry et al, 2010) or the predicted impact of interventions that have not yet taken place (e.g., mass immunisation against HPV; Baussano et al, 2013). Such models can also tackle fundamental natural history questions that cannot otherwise (or not yet) be adequately addressed using empirical data.

Conclusions

Infection transmission models are increasingly used to study the natural history of infection-associated cancers and to project the impact of different control strategies (Garnett et al, 2011). Models of infection transmission and cancer natural history need not always be integrated in a single model, and some authors have concentrated on the natural history of cancer-associated infections (Baussano et al, 2013). Nevertheless, integrated models of infection-related cancers are ultimately necessary to assess or project the reduction of cancer achievable through the combination of vaccination and screening (Jit et al, 2011). In fact, the possibility of combining infection transmission models and cancer models is an interesting new development of traditional models of acute infections.

As in any other application of epidemiology, these models face a number of challenges and dangers. They are necessarily simplifications of real-world mechanisms. They often include a large number of inter-dependent parameters that cannot be accurately estimated from available data, as well as uncertainties regarding the natural history of the infection and disease of interest. The need to clearly describe background assumptions, statistical methods, and computational solutions cannot, therefore, be overemphasised. Models can make uncertainties and inconsistencies explicit and generate hypotheses that can be explored in silico and tested using empirical data. A range of scenarios can be investigated using the same model, so that the sensitivity of conclusions to different assumptions can be explored.

The words of Box and Draper (1987) a quarter of a century ago still capture very well the need for humility but also courage when using increasingly complex biological knowledge and computing tools in the study of infections and cancer: ‘Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.’