HPVsim: An agent-based model of HPV transmission and cervical disease

In 2020, the WHO launched its first global strategy to accelerate the elimination of cervical cancer, outlining an ambitious set of targets for countries to achieve over the next decade. At the same time, new tools, technologies, and strategies are in the pipeline that may improve screening performance, expand the reach of prophylactic vaccines, and prevent the acquisition, persistence and progression of oncogenic HPV. Detailed mechanistic modelling can help identify the combinations of current and future strategies to combat cervical cancer. Open-source modelling tools are needed to shift the capacity for such evaluations in-country. Here, we introduce the Human papillomavirus simulator (HPVsim), a new open-source software package for creating flexible agent-based models parameterised with country-specific vital dynamics, structured sexual networks, and co-transmitting HPV genotypes. HPVsim includes a novel methodology for modelling cervical disease progression, designed to be readily adaptable to new forms of screening. The software itself is implemented in Python, has built-in tools for simulating commonly-used interventions, includes a comprehensive set of tests and documentation, and runs quickly (seconds to minutes) on a laptop. Performance is greatly enhanced by HPVsim’s multiscale modelling functionality. HPVsim is open source under the MIT License and available via both the Python Package Index (via pip install) and GitHub (hpvsim.org).


Introduction
In August of 2020, the World Health Organization adopted its first ever strategy to permanently end cervical cancer as a global public health problem [1].The Global Strategy to Accelerate the Elimination of Cervical Cancer proposes targets of 90% of girls vaccinated against human papillomavirus (HPV, the virus that causes cervical cancer) by age 15, 70% of women screened by age 35 and again by 45, and 90% compliance with treatment recommendations for pre-cancer and management of invasive cancer.The ability for countries to achieve these ambitious targets was bolstered in April of 2022, when the WHO Strategic Advisory Group of Experts on Immunization (SAGE) completed a review of HPV vaccine dosing schedules and concluded that one dose of the vaccine provides comparable efficacy to two or three doses [2].With more doses now available, the pathway towards cervical cancer elimination has become more achievable for many countries, including those with the highest burden.
The formulation of the goal to eliminate cervical cancer and the strategies that might get us there has been informed by the results of a well-established set of mathematical models [3][4][5][6][7][8].Models have also been instrumental in evaluating and incorporating new technologies into guidelines and health policies worldwide [9][10][11][12][13].These models continue to provide valuable insight to guide cervical cancer elimination.The software that we introduce in this paper differs from the existing suite of models in several ways, two of which are particularly important for the kinds of questions that can be asked of the model.Firstly, shifts in our understanding of HPV epidemiology and the emergence of new screening tools have prompted calls for models that are less reliant on cytologic, colposcopic, or histologic diagnoses [14].Secondly, demand for open-source software in general, and open-source scientific models in particular, has been continually increasing in response to tighter standards of transparency, reproducibility, and accessibility [15][16][17][18].Agent-based models in particular offer advantages arising from their ability to capture correlated heterogeneities, for example between socio-economic status, geographical location, and access to healthcare.
In this paper, we introduce software for creating, calibrating, and analysing agent-based models of HPV and cervical disease, called the Human papillomavirus simulator (HPVsim).HPVsim is built upon the architecture and design of the Starsim modelling platform, a flexible Python-based framework for epidemiological modelling which thus far includes Covasim for modelling COVID-19 [19], FPsim for family planning [20], and Synthpops for population modelling [21].Because this core software architecture has been described in detail elsewhere, in this paper we mainly focus on the two key innovations of HPVsim: (1) on the science side, HPVsim is designed to complement the existing suite of HPV models by offering a novel methodological approach to modelling HPV's natural history; and (2) on the software side, we introduce a multiscale modelling technique that substantially decreases the variability of agent-based simulations.To test the scientific realism of HPVsim, we created a model of HPV transmission in India in collaboration with the National Disease Modelling Consortium of India.The details of this model are interwoven throughout the following section on design and implementation, and additional examples of HPVsim's use are contained in the results section along with validation of the multi-scale approach.

Populations and networks
2.1.1Overview.HPVsim models agents (i.e., people) over time, simulating the transmission of HPV within sexual networks and the progression of HPV to cervical dysplasia and cervical cancer within individuals.When a simulation is initialised, it creates a population of agents with an age pyramid based on the desired country [22].Demographic updates are made annually to account for births, deaths, and migration, using UN data (Table A in S1 Text).HPV is transmitted between sexually active agents over sexual networks, which can be configured by the user by adjusting various input parameters (Table B in S1 Text).The sexual networks are generated by an algorithm described in detail in Table C within the S1 Text, but which can be briefly summarised as a female-driven partner selection algorithm that assigns males to female partners based on age and other preferences (e.g., type of relationship, concurrency, and geospatial location), and then determines the relationship duration, propensity for condom use and frequency of sex for each relationship based on the age and preferences of both parties.Similar kinds of agent-preference algorithms have been used in other agent-based models [23], and we chose to follow this approach here because it allows the network to be more easily parameterised with respect to available sexual behaviour data (as compared to using a mechanistic topology, for example).

Example:
Parameterising a sexual network for India.To create a sexual network model for India, we used data from the Demographic and Health Surveys (DHS) Program to inform the distribution of the ages of sexual debut for both males and females, and adjusted the marriage incidence rates to match reported DHS data on the proportion of women married by age (Fig A in S1 Text).According to the DHS, very few women in India have extra-or pre-marital partners.To capture this, we used a negative binomial distribution for the degree distribution of female's casual contacts, with parameter values chosen so that ~90% of women have no non-marital partners.

HPV transmission
HPVsim models HPV transmission within partnerships with per-act probability β g for genotype g.By default, HPVsim captures three genotype groups: types 16, 18, and a pooled group of all other oncogenic types.This latter group can also be disaggregated into the five types preventable by the 9-valent vaccine ("Hi5", consisting of 31, 33, 45, 52, 58) and other high-risk types (OHR, consisting of 35,39,51,56,59), and we use these two genotype groups along with types 16 and 18 for the examples shown throughout this paper.For simplicity, we assume that individuals can only have a single infection with a given genotype at a time, i.e. a person currently infected with genotype g cannot get a second infection with the same genotype.We do, however, allow multiple concurrent infections with different genotypes.In the absence of clear data to inform within-host genotype competition and given the delay in seroconversion and antibody development to provide own-and cross-immune protection [24,25], we assume HPV infections are independent; agents are no more or less likely to acquire an infection of a different genotype if they are currently infected.The probability of a person i infected with genotype g transmitting to a susceptible person j at time step t can be written as: where β g is the per-act transmission probability of genotype g, eff_cond is the assumed efficacy of condoms (set to 50% by default), cond is the probability of condom use within this partnership, and n(t) is the number of sexual acts within this partnership during time step t.The parameter inf_imm j,g (t) describes person j's protective immunity to infection against genotype g, which can be conferred by either infection (Table D in S1 Text) or vaccination (Table E in S1 Text).We assume men do not acquire neutralising immune protection after clearing infection, based upon a meta-analysis of natural acquired immunity [26].

Disease natural history
2.3.1 Motivation and overview.Historically, cervical disease was classified into histological grades depending on how much of the overall thickness of the epithelium contained neoplastic cells (CIN1 = basal third, CIN2 = up to two-thirds, CIN3 = more than two-thirds).This classification system meant that data from HPV screening programs could be used to estimate transition probabilities between these states, and these transition probabilities form the basis of the most frequently used set of existing published HPV models [27].
When designing a natural history model for HPVsim, a core aim was to decouple the model's representation of disease progression from the particular type of screening method or clinical classification system used, since both of these have evolved substantially over the past decade (e.g., the above-mentioned histological grades are no longer recommended) and continue to evolve as new methods are made available (e.g., AI-based visual evaluation tools [28,29]).The fundamental idea behind our method is driven by the fact that the longer an infection persists, the more likely it is to progress to cancer.This could theoretically be modelled via a single function that would link the duration of infection to the probability of cancer (which was the approach taken in [30]), but then we would not be able to make use of cervical screening data on the prevalence and genotype distribution of precancer.To incorporate screening data, we create a continuous function that defines the relationship between the duration of infection and the onset of precancer.The presence of high-grade squamous intraepithelial lesions (HSILs) is the most frequently-used marker of precancer, although other diagnostic signals may also be used.
In brief, the algorithm behind our natural history model is as follows.First, we sample the duration of HPV infection prior to either clearance or the onset of precancer (Fig 1A).Longer infections are more likely to result in the development of precancer, so we define a relationship linking these (Fig 1B).We then sample the duration of precancer prior to clearance or cancer (Fig 1C).The longer that a woman sustains precancer, the more likely it is that she will develop cancer; this is reflected in Fig 1D, which should be interpreted as the probability that a woman with precancer that persists for more than X years developing cancer, noting that this does not specify at what point the cancer will commence since this is dictated by the durations given in

Choice of distributions.
For all the components of our natural history model -the distribution of infection duration, the duration-to-precancer function, and the probability of cancer beginning -we need a functional form and default parameters.In this section we describe the functional forms (same for all genotypes), while the next section describes the parameter values (which vary by genotype).
For the distribution of the duration of infection, we use a log-normal distribution, since parameterising this with respect to available data is straightforward (see next section).For the duration-to-precancer function, we first consider the properties that this function should have.Available evidence suggests that women tend to spend relatively less time with no or clinically insignificant lesions, and longer with higher-grade precancerous lesions [31][32][33].This can be parsimoniously modeled using the concave part of a logistic function, such that the probability of precancer with genotype g after t years in individual i is: where ν i is an individual effect that accounts for the fact that not all infections will progress at the same pace, and is drawn from a normal distribution with mean of 1, k g is a genotype-specific growth parameter that determines how quickly infections progress, and t is the number of years since infection.We again use a log-normal distribution for the distribution of the duration of precancer, with parameterisation described in the next section.Finally, we need to specify a model of the probability of precancer progressing to invasive cervical cancer.This is a binary outcome and depends on the extent and persistence of dysplasia in the cervical cells, so we use a standard geometric distribution and model the probability of cancer with genotype g after t years in individual i as: probðcancer begins after t yearsjt; g; iÞ The parameter p g represents the probability of a given cell becoming cancerous at any point in time, while the exponent PPC t,g,i represents the cumulative probability of precancer (i.e., the integral of Eq 2).The exact form of PPC t,g,i is given by: The function in Eq 4, while complex at first glance, has three features to recommend it.Firstly, because it's a smooth approximation to a ramp function, it implies that there is an almost-zero probability of cancer beginning within the first few years of an infection, in line with the consensus understanding of the timeline to the onset of cervical cancer [31][32][33].Secondly, it has an intuitive interpretation as the cumulative precancerous time of an infection (i.e., the integral of Eq 2).Finally, it does not rely on any additional parameters aside from those already used in Eq 2.

2.3.3
Parameterising the natural history model.All in all, our method for modelling HPV's natural history has 6 parameters per genotype: the mean and standard deviation of the duration of infection, the growth rate k g in the infection-to-precancer function, the mean and standard deviation of the duration of precancer, and the per-cell probability of becoming cancerous at any given point in time.To parameterise this, we define precancer as the presence of high-grade lesions (HSIL), since this corresponds to the data we have.
For the distribution of type 16 infection durations, we use default values which imply that ~50% of infections clear within 12 months and 60% within 24 months, in line with the findings from the Guanacaste cohort study [33].Values for remaining genotypes are listed in Table F in S1 Text.
To find values for k g , p g , and the mean and standard deviation of the duration of HSIL, we used the sexual network for India described in Section 2.1.2and estimated the natural history parameters by calibrating to cervical cancer cases by age, the distribution of HPV types found within women with high-grade precancerous lesions and the distribution of HPV types found in women with invasive cervical cancer.All data were taken from Globocan 2020 estimates and HPV Information Centre compiled meta-analyses.We ran 10,000 trials searching for parameters that minimised the sum of the normalised absolute differences between the model and these data.This resulted in parameter values documented in Table F in [27].The share of women who would be observed in a given health state X years after their initial infection is plotted in Fig 2C and 2D, and is in line with available observational data [31].In separate work, we further validated our model by applying these estimates to 30 additional countries [30], under the expectation that the natural history should not vary substantially across settings.

Other implementation details. Performance:
To optimise performance, we determine all prognoses upon infection, including the precise time points on which a woman will either clear infection or eventually progress to invasive cervical cancer.These prognoses can be updated at later dates in response to interventions (e.g.screening and treatment) or changes in a state of immune compromise, such as untreated HIV infection.Pre-specifying prognoses in this way means that we only need to change the health state of a small subset of women in each timestep.
Men: For men, a duration of infection is sampled from a log-normal distribution and date of clearance assigned at the time of infection.We do not model persistent infections in males or any associated precancer/cancer outcomes.
Immunity: There is some evidence to suggest that women who clear an HPV infection and are reinfected have a reduced probability of persistent infection or high-grade lesions [34].We capture this using an immunity parameter that reduces the average duration of re-infections of the same type, and also confers some cross-immunity (Table D

in S1 Text).
Age: There is some debate about the extent to which age plays a role in exacerbating the likelihood of an infection persisting or progressing to high-grade lesions [35].In HPVsim, we include an age-modifying effect that increases the likelihood of persistence in women over the age of 30.
HIV: HPVsim can also capture the impact of HIV infection on HPV acquisition and natural history.At each time step, individuals in the simulation can be exogenously infected with HIV based upon their age, sex and the year of the simulation.Upon infection, the ART adherence of that agent is determined, which will then be used to modify any future risk of HPV acquisition and progression.While individuals adherent to ART can fail to suppress HIV or experience immune reconstitution, for simplicity we assume an individual's immune state is directly proportional to adherence.At the point of HIV infection, for individuals with prevalent HPV infection, the prognoses are modified using relative risk estimates derived from a meta-analysis (Table I in S1 Text).

Interventions
The central pillars of the global public health response to HPV consist of prophylactic vaccination programs, cervical screening programs, and treatment of both precancerous lesions and cancer.HPVsim allows users to model these standard interventions (Table 1), and additionally to define their own custom ones that might reflect country-specific strategies for prevention and treatment.The general design of the standard interventions (screening, triage, treatment, and prophylactic vaccination) is intended to allow users to flexibly specify both the product and the details of its delivery.

Multiscale modelling
In agent-based modelling, it is common for one "agent" to represent more than one person (or more precisely, for agents to represent a statistical sample of the population, much like in a survey).However, a problem arises when some events being simulated are very frequent (such as HPV infection, affecting a majority of people), and other events are very rare (such as developing cervical cancer, affecting roughly 1 in 150 women).Under these circumstances, agentbased modelling can produce considerable stochastic variation, especially with smaller simulated population sizes.To illustrate this, consider using a binomial distribution to estimate the

Intervention
Core assumptions, all user-modifiable

Prophylactic vaccination
Efficacy: a single dose of a prophylactic vaccine is assumed to confer neutralising immunity to the genotypes that it is specifically targeted at, with efficacy of 97.5% [36], as well as some cross-immunity to other genotypes (

Treatments for precancer
Treatment products: Thermal ablation is assumed to be 93% effective at removing lesions [40,41]; excisional treatments are assumed to be 95% effective at removing lesions [42]; both are assumed to be 80% effective at clearing infection [43,44].Delivery: Flexibly defined by the user; see examples at docs.hpvsim.org.
Treatments for cancer Radiation therapy is included with HPVsim and extends the lifespan of people with invasive cancer.Chemotherapy and surgery are not included by default, but can be added as custom interventions.

Therapeutic vaccination
Therapeutic vaccine products: There are currently no therapeutic vaccine products on the market, although at least one clinical trial is underway [45,46].HPVsim includes options for modelling therapeutic vaccination.The default assumptions are that the therapeutic vaccine will be delivered as a two-dose regimen targeted at types 16 and 18, with the two doses in combination having 50% efficacy at regressing high-grade lesions and 90% efficacy at virological clearance.Different assumptions can be explored by changing these parameters.Delivery: Flexibly defined by the user. https://doi.org/10.1371/journal.pcbi.1012181.t001 expected number of cancer cases in a given year.With a population size of 10k agents, our best estimate would be 66 cases, with a 95% prediction interval of 48-86 cases.From year to year and simulation to simulation, we would observe a high degree of variability, with a coefficient of variation (CoV) of 0.12.Under this simplified framework, a tenfold decrease in the simulated population size translates to a p 10-fold increase in the coefficient of variation in the estimated number of cases.Avoiding large variability would therefore typically require large population sizes.To avoid this, HPVsim uses a novel approach called multiscale modelling to optimise computational efficiency.
In a multiscale modelling approach, different types of agents are modelled with different levels of "resolution" or "sampling."For example, for a population of 10 million people being modelled, specifying 100,000 agents is roughly the number of agents required to provide a good tradeoff between statistical accuracy and simulation speed.This means there is a scale factor (or "weight") of 100 for each agent (or put another way, people are downsampled by 100).We call these "Level 0" agents since they are at the lowest (default) level of resolution.These are the agents who participate in the sexual mixing network, who are able to contract and transmit HPV, and who can give birth to new agents."Level 1" agents have a lower scale factor (higher resolution, lower weight per agent), and correspond to agents who will (without intervention) develop cancer.These Level 1 agents all have HPV (we adjust to prevent doublecounting), but do not participate in the disease transmission network; instead, only their disease progression is tracked.
When an agent becomes infected with HPV, their disease duration and outcomes are precalculated (although these can be later modified through screening, treatment, and other interventions).This is when multiscale modelling is applied.A schematic illustration of the algorithm is shown in Fig 3, with the complete algorithm outlined in Table H in S1 Text.

Variance reduction from multiscale modelling
In Fig 4, we compare two key simulation features -variability and simulation time -as a function of the multiscale agent ratio and the number of agents in the simulation.To produce this figure, we ran ten simulations for each combination of the multiscale agent ratio and the number of agents, and computed the coefficient of variability (CoV) in deaths and the overall average time per simulation.As an example, suppose we were running a simulation with 50k agents without multiscale (the second largest dark blue dot on Fig 4).This would take ~15 seconds to run and produce a CoV of 0.028 across 10 simulations.To decrease variability, we could either increase the number of agents or increase the multiscale ratio.If we increased the number of agents to 100k, our simulation would take almost twice as long (~28 seconds) and would produce a CoV of 0.023 (the largest dark blue dot on Fig 4).By contrast, if we increased the multiscale agent ratio to 100, we would see a slightly smaller increase in simulation time (to 26 seconds), but the CoV would decrease by much more, to 0.007 (the second largest yellow dot on Fig 4).

Representing diverse sexual networks
For this example, we created two different sexual networks to highlight how patterns of sexual behaviour affect HPV epidemiology and outcomes.The first represents a setting where premarital sex is relatively common, which we model by assuming a Poisson distribution for women's preferred number of partners, with 80% of women preferring to have at least one casual partner.The second represents a setting where premarital sex is very rare, and 90% of women have no non-marital partners, i.e. similar to the model for India that we developed throughout Section 2. In order for HPV to transmit in this second setting, we assume significant overdispersion in the degree distribution for females, i.e. most women have no partners and a small number have many partners.We capture this using a negative binomial distribution for women's preferred number of partners, parameterised with n = 0.025 and probability of success p = 0.0125.In each setting, we will assume that 25% of married males and 2.5% of married females have extra-marital relationships, and in each setting we assume the age of sexual debut is log-normally distributed with a mean of 17 for females and 20 for males, and a standard deviation of 2 for both sexes.
In Fig 5, we plot the degree distribution for each setting, and alongside this we plot the distribution of ages at which key health events in the lead-up to invasive cancer take place.The median age at which women who eventually proceed to invasive cervical cancer acquire their casual HPV infection is 25 years in the setting where premarital sex is common, and 30 years in the setting where premarital sex is rare.Without widespread premarital sex, a small share of sexually active women acquire HPV at a young age, but the majority of women acquire HPV later, after their spouse transmits an infection acquired from an extra-marital relationship.This would have implications for the ideal age to target for cervical screening programs, as In this illustration, there are N = 3 agents, the probability of developing precancer is 67%, the probability of developing cancer given precancer is 50%, the scale factor ("weight") for level 0 agents (non-cancer, green) is S0 = 4, the scale factor for level 1 agents (cancer, red) is S1 = 1, and therefore the multiscale agent ratio is S0/S1 = 4. On the initial step of the algorithm, there are three agents, each with a scale factor of 4, representing 12 people without cancer (S(scales) = 4+4+4 = 12).On the second step, two level 0 agents (with UIDs a and c) develop precancer; one of these agents then develops cancer, and has her scale factor changed from S0 = 4 to S1 = 1.Both precancer agents then generate three new potential level 1 agents; of these six new potential cancer agents, three develop cancer and become level 1 agents (with UIDs d, e, and f assigned sequentially when created).On the third step, the three original level 0 agents (including one with precancer and one with cancer) are combined with the three new level 1 agents (all with cancer) to form the new list of N = 6 agents.Note that although there were N = 3 agents on step 1 and N = 6 agents on step 3, the sum of the scale factors (weights) of the agents remains the same, i.e. 4+4+4 = 4+4+1+1+1+1 = 12.https://doi.org/10.1371/journal.pcbi.1012181.g003well as delivery of other interventions like a therapeutic vaccine.In real world settings, the age of causal infection is also affected by other aspects of the sexual network dynamics, including the age of sexual debut, the incidence of marriage by age, and the typical age differences between partners.

State-level variability in cervical cancer incidence and screening in India
There are large differences in reported age-standardised rates (ASR) of cervical cancer incidence across the regions of India, ranging from 4.8 cases per 100,000 women in 2020 in Dibrugarh district, Assam, up to 27.7 cases per 100,000 women in Papumpare district, Arunachal Pradesh [47].These differences cannot be attributed to HPV vaccination programs, since these only commenced in 2018 and have thus far been limited to two districts of Punjab, opportunistic vaccination in Delhi, and one state-wide program in Sikkim [48].In September of 2023, we convened a training workshop at the Indian Institute of Technology, Bombay, where members of the National Disease Modelling Consortium speculated on possible causes for the state-level variation.Among the many possible contributing factors (including wide variations in demographic, behavioural and reproductive risk factors), one additional factor raised as a possible explanation for this phenomenon was variation in screening uptake.Although all states report relatively low lifetime screening coverage [47], these data are based on usage of the public system, and in some states utilisation of the private health system may be higher, meaning that screening rates could be higher than observed in the public data.
For this illustrative case study, we investigated how much cervical cancer incidence would vary as a function of screening coverage levels alone, holding all else equal.We simulate a population with the demographics and sexual behaviour of India, as presented throughout Section 2. We then assume that screening would have been available as an out-of-pocket service starting from the year 2000, with uptake scaling up linearly from 0% to reach 5, 10, 15, or 20% over the subsequent 20 years.For each scenario, we run 20 simulations and take the average and the 90% range to account for stochastic differences.We find that prolonged screening programs, even with relatively low levels of coverage, may result in lower levels of age-standardised cancer incidence.We estimate an ASR of cancer incidence without screening of 17.17 [90% range: 15.86, 19.45] in 2020; if we assume 20% lifetime screening coverage by 2020 this falls to 12.39 [90% range: 11.73, 14.03] (Fig 6).Given that 20% would represent extremely high uptake of the private system, we conclude that while some of the state-level differences may be attributable to private sector screening, it is unlikely to be the main driver of state-level variation.

Availability
The source code for HPVsim, including documentation and tutorials, is freely available via GitHub.The repository containing the India model calibration is here.The scripts to produce the figures and analyses used in this paper are in a separate GitHub repository available here.This manuscript refers to all features and parameters of version 2.0.1 of HPVsim.HPVsim follows best practices for version and dependency management, including a CI/CD/CT pipeline with comprehensive code coverage to ensure that no regressions are introduced as dependencies are updated.

Future directions
The HPVsim software can be used to rapidly produce models of HPV and cervical cancer, but before a model can be used within a given setting, it must first be calibrated, i.e. contextualised by identifying parameters that generate model outputs resembling observed data.The calibration process represents an important step for future users and developers of HPVsim.Within this paper we presented a use case demonstrating how this can be done, and example scripts for setting up and running a calibration can be found in the Documentation and Tutorials.However, an important future direction for HPVsim research is to demonstrate its ability to fit to more diverse data sets, and to evaluate post-hoc out-of-sample fit.In addition, the model parameters could be further improved with access to better HPV burden and cervical cancer mortality estimates over time and by region.
The sexual network is an important area for future development.First, HPVsim currently only models heterosexual partnerships and cisgender individuals; expanding this could potentially increase the accuracy of model outputs.Second, implementing assortative mixing by geographical location or demographic factors other than age groups would help understand the impact of network topology on disease transmission.Furthermore, heterogeneities in individuals' risk of cervical disease and/or uptake of interventions may also correlate with the assortative mixing pattern in sexual networks.Incorporating functionalities to customise and evaluate such heterogeneities and their correlation with network structure would be key for identifying core groups for intervention.
The calibrated model for India that was presented throughout this paper is being further developed by members of the National Disease Modelling Consortium of India, with a particular focus on deeper investigation of the state-level variation in cervical cancer incidence, and how this variation may influence optimal vaccination policies for each state.There have been some studies on the cost effectiveness of vaccination and screening in India [49], but none have captured the variation by state, a detail that is likely to be vital for motivating programmatic implementation.
The calibration method that we used to parameterise HPVsim can be modified to produce estimates of the posterior distributions of parameters, which allows for better quantification of uncertainty in the parameter values.Throughout this paper we have focused on the best estimates, mostly because agent-based models already contain considerable stochasticity so it is difficult to precisely quantify the additional uncertainty around parameter values.Given these many sources of uncertainty, an approach that we commonly take for interrogating the impact of uncertainty in parameter values is to conduct sensitivity analyses.For example, rather than fixing the efficacy of condoms at 50%, one could run simulations with different values to understand the impact of this assumption.However, improving the methodology for quantifying parametric uncertainty in agent-based models is another direction for ongoing research.While HPVsim's structure is generic enough that theoretically the model can be applied to any country, we have designed it with lower-and middle-income countries in mind, and as such it may not capture characteristics specific to higher-income countries (e.g.we do not yet model cancer progression or treatment in detail).We do not capture diseases caused by HPV apart from cervical cancer (e.g.genital warts, and cancers of the vulva, vagina, penis, anus, and oropharynx), although these could be integrated into future versions by adapting the existing framework.Although we do model heterogeneities in individuals' risk of cervical disease, we do not specifically link these varying risk levels to known confounders such as parity, tobacco use, and other behavioural factors.We also do not capture differences in health-seeking behaviour and uptake of interventions arising from age, race, ethnicity, gender identity, sexuality, or income.All of these current limitations of HPVsim represent possible avenues for future development contributions.

Concluding statement
In response to the continued evolution of policy recommendations around HPV vaccination, screening, and treatment, we have created a model which, over time and as it is further applied, will enable researchers and policy makers to analyse new technologies, evaluate different strategies, and feed into decision-making pipelines.Models like HPVsim are often used to evaluate the impact of interventions, information which can then be incorporated into a cost-effectiveness analysis if cost data are available.We are committed to ongoing development and improvement of HPVsim in partnership with stakeholders, collaborators, and users of the model, not only because such collaborations strengthen the model itself, but more importantly, because they provide an opportunity to work together towards the fundamental goal of cervical cancer elimination.
Fig 1C.The core algorithm as depicted in Fig 1 uses HSILs as the indicator of precancer, but HSILs could be replaced by another biomarker if desired.

Fig 1 .
Fig 1. Infection and progression dynamics for types 16, 18, Hi5 (a pooled group consisting of types 31, 33, 45, 52, and 58), and OHR (a pooled group consisting of 35, 39, 51, 56, and 59).(A) The duration of HPV infection prior to either clearance or the onset of high-grade squamous intraepithelial lesions (HSIL) follows a log-normal distribution, with parameter values varying by genotype.(B) The mean relationship between the duration of infection, genotype, and the probability of developing precancer.(C) The duration of precancer prior to either clearance or the onset of cancer follows a log-normal distribution, with parameter values varying by genotype.(D) The probability that precancer lasting for at least X years will lead to cancer at the end of the duration drawn from the distribution in panel C. https://doi.org/10.1371/journal.pcbi.1012181.g001 S1 Text, with model fit shown in Fig B in S1 Text, and comparison to Globocan estimates in Fig C in S1 Text.Using these estimated parameter values, we obtain the natural history model shown in Fig 1, including the infection-to-precancer function in Fig 1B, the probability of cancer shown in Fig 1C.
Fig 2 then shows some additional validation plots for our natural history model of India.The dwell times in each state shown in Fig 2B, and are in line with other models

Fig 2 .
Fig 2. Implied outcomes from HPVsim's natural history model for India.(A) Age distribution of key health events in the lead-up to cervical cancer, including the age at which women acquire the HPV infection that will eventually lead to cancer (causal infection), the age at which high-grade lesions develop, and the age of onset of cancer, (B) Dwelltimes for no or low-grade lesions and high-grade lesions, for women who progress to cancer.(C+D) Estimated distribution of outcomes for all HPV infections (C), and conditional on persistence (D).https://doi.org/10.1371/journal.pcbi.1012181.g002

Fig 3 .
Fig 3. Schematic example of multiscale modelling implementation.In this illustration, there are N = 3 agents, the probability of developing precancer is 67%, the probability of developing cancer given precancer is 50%, the scale factor ("weight") for level 0 agents (non-cancer, green) is S0 = 4, the scale factor for level 1 agents (cancer, red) is S1 = 1, and therefore the multiscale agent ratio is S0/S1 = 4. On the initial step of the algorithm, there are three agents, each with a scale factor of 4, representing 12 people without cancer (S(scales) = 4+4+4 = 12).On the second step, two level 0 agents (with UIDs a and c) develop precancer; one of these agents then develops cancer, and has her scale factor changed from S0 = 4 to S1 = 1.Both precancer agents then generate three new potential level 1 agents; of these six new potential cancer agents, three develop cancer and become level 1 agents (with UIDs d, e, and f assigned sequentially when created).On the third step, the three original level 0 agents (including one with precancer and one with cancer) are combined with the three new level 1 agents (all with cancer) to form the new list of N = 6 agents.Note that although there were N = 3 agents on step 1 and N = 6 agents on step 3, the sum of the scale factors (weights) of the agents remains the same, i.e. 4+4+4 = 4+4+1+1+1+1 = 12.
Fig A in S1 Text shows how the model can be fit to these additional metrics.

Fig 4 .Fig 5 .
Fig 4. Trade-off between variability (measured by the coefficient of variation in deaths) and simulation time.Simulation times are calculated as the average across 10 simulations.MS ratio = multiscale ratio, and represents the ratio between the number of people represented by agents without cancer and the number represented by agents with cancer (see Fig 3).https://doi.org/10.1371/journal.pcbi.1012181.g004

Fig 6 .
Fig 6.Median modelled age-standardised rate (ASR) of cervical cancer incidence in 2020 across 20 simulations (blue bars) and 90% range (black bars).Modelled results are based on a setting representative of India but with varying hypothetical levels of historical screening coverage, increasing linearly from 2000 to reach the values displayed on the x axis in 2020.https://doi.org/10.1371/journal.pcbi.1012181.g006