Estimating the epidemiology of chronic Hepatitis B Virus (HBV) infection in the UK: what do we know and what are we missing?

Background: HBV is the leading global cause of cirrhosis and primary liver cancer. However, the UK HBV population has not been well characterised, and estimates of UK HBV prevalence and/or incidence vary widely between sources. We aimed to i) extract and summarise existing national HBV prevalence estimates, ii) add a new estimate based on primary care data, and iii) critique the data sources from which estimates were derived. Methods: We undertook a narrative review, searching for national estimates of CHB case numbers in the UK (incorporating incidence, prevalence and/or test positivity data) across a range of overlapping sources, including governmental body reports, publications from independent bodies (including medical charities and non-governmental organisations) and articles in peer-reviewed scientific journals. An alternative proxy for population prevalence was obtained via the UK antenatal screening programme, which achieves over 95% coverage of pregnant women. We also searched for diagnoses of HBV in the QResearch primary care database based on laboratory tests and standardised coding. Results: We identified six CHB case number estimates, of which three reported information concerning population subgroups, including number of infected individuals across age, sex and ethnicity categories. Estimates among sources reporting prevalence varied from 0.27% to 0.73%, congruent with an estimated antenatal CHB prevalence of <0.5%. Our estimate, based on QResearch data, suggests a population prevalence of ~0.05%, reflecting a substantial underestimation based on primary care records. Discussion: Differences between estimates reflect varied sources of error, bias and missingness, problems with data linkage, and “blind spots” in the testing for, and registration of, HBV diagnoses. The UK HBV burden is likely to be concentrated in vulnerable populations who may not be well represented in existing datasets, including those experiencing socioeconomic deprivation and/or homelessness, ethnic minorities and people born in high-prevalence countries. This could lead to under- or over-estimation of population prevalence. Multi-agency collaboration is required to fill evidence gaps.


Introduction
Hepatitis B virus (HBV) is the leading global cause of cirrhosis, and of primary liver cancer incidence and mortality 1,2 . Nearly 300 million individuals worldwide are estimated to be living with chronic HBV (CHB) infection. Risks of complications and death are mitigated by screening to detect cases of infection, clinical monitoring of chronic infection (including liver cancer surveillance in high-risk cases), and antiviral therapy in those who meet treatment criteria 3 .
The United Kingdom (UK) is regarded as a low prevalence setting for CHB 3,4 . However, the attributable disease burden may be substantial in specific population subgroups including people who inject drugs, the prison population, people experiencing homelessness, and individuals belonging to minority ethnic groups and born in countries where the prevalence of CHB is higher 5 . Thus, CHB is concentrated in potentially under-served, vulnerable and/or disadvantaged population subgroups 6 .
Epidemiological characterisation of the UK CHB population has been limited, with no central registry of infected persons. Existing data may primarily reflect new diagnoses (a combination of incident acute infection and new diagnoses of chronic infection), but caution is needed in making inferences about prevalence. Accurate estimation of prevalence is challenging, because complete HBV data are not likely to be well captured by large-scale electronic health record (EHR) databases for either primary or secondary care 7 , as many CHB cases remain untested and therefore undiagnosed. Addressing the lack of HBV summary data in the UK would aid researchers and policy makers in addressing further evidence gaps.
The World Health Organization (WHO) has set targets for viral hepatitis elimination within its Sustainable Development Goals for 2030. The Global Health Sector Strategy on Viral Hepatitis 8 identifies specific goals, including diagnosis in 90% of chronic infections, 90% reduction in incidence of chronic infection, and 80% treatment coverage in those eligible. High quality epidemiological data are therefore crucial to focus and measure progress, inform policy and interventions, reduce inequities and underpin resource allocation. Accordingly, we aimed to i) extract and summarise existing national HBV prevalence estimates; ii) present new data from the QResearch primary care database; and iii) summarise and critique the data sources from which these estimates were derived.

Methods
We searched for estimates of CHB case numbers in the UK (incorporating incidence and/or prevalence-like data) across a range of available sources in a narrative (rather than systematic) review approach. We included UK-wide reports from government bodies, publications from independent bodies (including medical charities and non-governmental organisations) and articles in peer-reviewed scientific journals, excluding smaller epidemiological studies when there was overlap of data sources with larger samples from other studies. We present positivity rates from each respective data source, but caution that these estimates are not representative of the true UK-wide population prevalence. Details of study samples/denominator are provided. The Office for National Statistics (ONS) provides UK population estimates as a point of reference for the overall denominator 9 .
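As an illustration of how a positivity rate and its uncertainty can be derived from a reported numerator and denominator, the following minimal sketch computes the proportion together with a Wilson 95% score interval. The function name and any counts passed to it are hypothetical and are not taken from the sources reviewed; those sources may report uncertainty differently, or not at all.

```python
import math


def positivity_with_wilson_ci(positives, tested, z=1.96):
    """Return (proportion, lower, upper) for a test positivity rate.

    Uses the Wilson score interval, which behaves better than the simple
    normal approximation when the proportion is close to zero, as is the
    case for HBsAg positivity in a low-prevalence setting.
    """
    if tested <= 0:
        raise ValueError("tested must be a positive integer")
    if not 0 <= positives <= tested:
        raise ValueError("positives must lie between 0 and tested")

    p = positives / tested
    z2 = z * z
    denom = 1 + z2 / tested
    centre = (p + z2 / (2 * tested)) / denom
    half_width = z * math.sqrt(p * (1 - p) / tested + z2 / (4 * tested ** 2)) / denom

    # Clamp to [0, 1] to guard against floating-point overshoot.
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)
```

An interval of this kind describes sampling uncertainty within the tested group only; it does not correct for the selection biases discussed in this article, so a narrow interval should not be read as evidence that a source is representative.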
We also generated a new prevalence estimate, utilising data from the UK primary care database QResearch, which contains over 35 million patient records from more than 1800 individual practices 10 . QResearch was established in 2002 and contains anonymised individual-level patient EHR. Data are collected prospectively and are linked to hospital episode statistics (HES), National Cancer Registration Analysis Service (NCRAS) and ONS mortality data. QResearch ethics approval is with East Midlands-Derby Research Ethics Committee (reference 18/EM/0400).
We identified individuals in the QResearch (version 44) database who had a record of a diagnostic Systematized Nomenclature of Medicine (SNOMED)/Read or International Classification of Diseases (ICD) code indicative of CHB, or who had a history of ≥1 hepatitis B surface antigen (HBsAg) or viral load (VL) measurement. From this sample we identified individuals aged ≥18 years, between 01 January 1999 and 31 December 2019, with CHB, defined as: i) record of a diagnostic SNOMED/Read code indicating CHB; or ii) record of a diagnostic ICD-9 or -10 code indicating CHB; and/or iii) presence of HBsAg or VL on ≥2 recordings ≥6 months apart. The characteristics of HBV infection in the cases we identified are further described elsewhere 9 .
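The case definition above can be sketched in code as follows. This is an illustrative sketch only and is not the extraction code used for QResearch: the `PatientRecord` fields and helper names are hypothetical, the approximation of "≥6 months" as 183 days is an assumption, and applying the study window to the marker dates is one possible reading of the definition. Real SNOMED/Read and ICD code lists would be needed to populate the diagnostic flags.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

STUDY_START = date(1999, 1, 1)
STUDY_END = date(2019, 12, 31)
MIN_GAP_DAYS = 183  # assumption: ">=6 months" approximated as 183 days


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's relevant data."""
    age_years: int
    has_snomed_read_chb_code: bool = False  # criterion i
    has_icd_chb_code: bool = False          # criterion ii
    # Dates on which HBsAg or viral load was recorded as present (criterion iii).
    marker_dates: List[date] = field(default_factory=list)


def has_persistent_marker(marker_dates):
    """Criterion iii: at least two recordings that are >=6 months apart."""
    in_window = [d for d in marker_dates if STUDY_START <= d <= STUDY_END]
    if len(in_window) < 2:
        return False
    # If any two recordings are far enough apart, the earliest and latest are.
    return (max(in_window) - min(in_window)).days >= MIN_GAP_DAYS


def meets_chb_definition(record):
    """Apply the adult age restriction, then any one of criteria i to iii."""
    if record.age_years < 18:
        return False
    return (
        record.has_snomed_read_chb_code
        or record.has_icd_chb_code
        or has_persistent_marker(record.marker_dates)
    )
```

Under this reading, a single positive HBsAg result without a diagnostic code does not count as CHB, which is one way in which cases tested once and never followed up could be missed from a primary care estimate.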
We have also drawn on findings from a similar investigation previously undertaken in the Clinical Practice Research Datalink (CPRD) 11 , which is another UK primary care database containing EHRs for over 16 million patients. This previous investigation identified CHB individuals from patients registered in the database between 2000 and 2015.
This article can be found on medRxiv 12 .

Results and discussion
UK data for CHB epidemiology are summarised in Table 1. Three of six estimates report information concerning population demographics, including number of infected individuals across age, sex and ethnicity categories. Among sources setting out to report prevalence, estimates varied from 0.27% (British

Amendments from Version 1
We have modified our manuscript to emphasise that we undertook a narrative literature review to summarise population-wide chronic HBV infection prevalence estimates produced to date. We have also considered additional data sources suggested by reviewers, and updated our discussion accordingly to reflect this. Any further responses from the reviewers can be found at the end of the article.

An alternative proxy for population prevalence is obtained via the UK antenatal screening programme, which achieves over 95% coverage of pregnant women annually (approx. 700,000 women in the UK), with a CHB prevalence of <0.5% 13 . Differences between sources highlight varied sources of error, bias and missingness, problems with data linkage, and substantial "blind spots" in consistent testing and registration of HBV diagnoses.
As HBV is a notifiable disease in the UK, the UK Health Security Agency, UKHSA (previously Public Health England, PHE), has a comprehensive surveillance system for monitoring the burden of CHB through testing and diagnoses across the care pathway. This incorporates data from diagnoses through to outcomes (including end-stage liver disease, transplantation, liver cancer and deaths), using laboratory testing surveillance (sentinel surveillance of blood-borne virus (BBV) testing in primary and secondary care settings), new laboratory diagnoses, hospital activity datasets and registries (NHS Digital hospital episode statistics, ONS cancer and deaths registries, NHS Blood and Transplant registry). However, these data have not yet been combined and incorporated in a statistical model to estimate prevalence. Sentinel surveillance captures testing in community, primary care and secondary care settings across a network of laboratories covering approximately 40% of the general population of England 13 . This likely gives the best estimate of diagnosed prevalence among a tested population, but because it combines acute incident infections and new diagnoses of pre-existing chronic infection, incidence and prevalence cannot be disaggregated.
The majority of diagnostic data are generated through testing individuals with risk factors for HBV infection or evidence of liver disease, so positivity among those accepting risk-based testing (as captured in laboratory testing surveillance) likely overestimates the overall population prevalence. Alternative estimates can be generated from screening blood donors 22 , but donors are self-selected low-risk individuals who have undergone deferral discussions 23-25 and are predominantly UK born; they are therefore not representative of the general population, leading to an under-estimate of prevalence 23 . No existing estimates factor in the undiagnosed burden, which represents the majority of people living with HBV infection (the WHO estimates that only 10.5% of people with CHB are aware of their infection status 3 ). Furthermore, the highest prevalence of CHB is in groups for whom provision of healthcare is inadequate and/or access to healthcare is challenging (including migrants, sex workers, prisoners, and people experiencing homelessness). Overall, many gaps in the data remain, and it is most likely that estimates using primary care datasets considerably underestimate the true burden 6 . There is a need for more evidence to delineate epidemiology in these populations.
Previous investigation has reported an increased burden of HBV infection in young, male, socioeconomically deprived individuals belonging to ethnic minorities within the UK 26 . These individuals comprise a larger percentage of the London city population as compared to other regions in the UK, and therefore regional differences must be considered when deriving a population-wide prevalence estimate, which can be achieved through statistical models.
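One simple building block of such models is a stratified (population-weighted) estimate, in which stratum-specific prevalence values are combined according to the size of each stratum. The sketch below illustrates the arithmetic only; the function name, the stratum labels and all numbers are hypothetical and do not correspond to any UK data, and a full statistical model would additionally need to handle uncertainty and bias within each stratum.

```python
def weighted_prevalence(strata):
    """Combine stratum-level prevalence into one population-wide value.

    `strata` maps a stratum label (e.g. a region) to a tuple of
    (population_size, prevalence_as_fraction).
    """
    total_population = sum(pop for pop, _ in strata.values())
    if total_population <= 0:
        raise ValueError("total population must be positive")
    expected_cases = sum(pop * prev for pop, prev in strata.values())
    return expected_cases / total_population


# Hypothetical example: a smaller, higher-prevalence region and a larger,
# lower-prevalence region. These figures are invented for illustration.
example_strata = {
    "region_a": (1_000_000, 0.010),
    "region_b": (9_000_000, 0.002),
}
overall = weighted_prevalence(example_strata)
print(f"Overall prevalence: {overall:.2%}")
```

In this invented example the overall value sits much closer to the lower-prevalence stratum because that stratum is larger, which shows why an estimate drawn mainly from one region or subgroup cannot simply be extrapolated to the whole population.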
Less biased estimates are achieved by non-targeted testing programmes such as universal antenatal screening. Opt-out screening of blood drawn from patients admitted to a selection of Emergency Departments (EDs) for HBV, HCV and HIV is being piloted across 34 sites across the UK, with new HBV diagnoses outnumbering HIV or HCV by >2:1 in the first 100 days 27 . Over time, expansion of this programme should provide an additional valuable source of epidemiological data.
While UKHSA surveillance data may include some demographic characteristics (age, sex, postcode for deprivation), unless linked to other healthcare datasets, they typically lack more detailed clinical and demographic indicators (for example, measures of deprivation, lifestyle factors, assessment of liver disease, and HBV treatment coverage) which are needed to characterise the infected population. In contrast, EHR databases (such as CPRD and QResearch) have the advantage of collecting relevant demographic and clinical metadata which are not captured by UKHSA. However, linkage across data sources is incomplete, and each EHR-based estimate therefore misses a portion of the infected population. For example, primary care data may not reflect testing conducted in secondary care 28 , blood safety (transfusion/transplantation) and laboratory data generated by other services, while secondary care data are typically only reliable for the sub-population enrolled in consistent hospital follow-up. Poor data flow between diagnostic testing and EHR reflects a low clinical follow-up rate following a positive HBsAg test. This limited linkage to care reflects how services may not provide well for the CHB population, with gaps in referral pathways, inadequate communication and education (including translation services), and failures to deliver services to marginalised communities. Therefore, EHR databases offer the potential to characterise a subset of those infected with HBV, but do not currently generate a picture that is generalisable to the wider infected population, and cannot on their own be used to estimate prevalence.
Prevalence estimates for Hepatitis C virus (HCV) 29 and human immunodeficiency virus (HIV) 30 have recently been generated using multi-parameter evidence synthesis and back-calculation models. Similar modelling approaches to produce estimates of HBV incidence and prevalence in the UK are warranted.
Enhanced investment is needed to support the establishment of national registries with robust centralised data linkage between sources including national laboratory surveillance systems of BBV testing and new diagnoses, and thus determine which population subgroups are bearing the majority of the HBV disease burden. This will inform prevalence modelling and provide an evidence base for delivery of appropriate resources and interventions, and to benchmark progress towards elimination targets.

Ethics approval
QResearch ethics approval is with East Midlands-Derby Research Ethics Committee (reference 18/EM/0400).

Data availability
Only CC, TW, RB and JH-C have access to the QResearch individual-level patient data in order to ensure confidentiality of personal and health information, in accordance with the relevant licence agreements. QResearch data access is according to the information on the QResearch website (www.qresearch.org).

| Recommendation | Evidence gap addressed |
| --- | --- |
| Expansion of systematic screening, including opportunistic approaches (sexual health, antenatal, emergency medicine, people born in high-prevalence settings) to better understand HBV epidemiology in the UK. | Lack of knowledge of UK HBV epidemiology and distribution. |
| Improved centralised data linkage between services, including laboratory records, blood and transplant services, primary and secondary care, supported by collection of metadata. | Inefficient use of administrative health data for research and to inform policy and service delivery. |
| Disaggregation of incidence/prevalence data where possible at source. | Lack of clarity regarding existing and incident burden of HBV. |
| Establishment of regional and/or national registries to collate linked data for HBV infection at a population level and within high risk groups. | Lack of comprehensive identification of infections at a regional/national level, and poor representation of high risk groups. |
| Mathematical modelling to optimise use of existing data to generate incidence, prevalence and caseload estimates (accounting for variations between geographic centers and populations). | Lack of a robust national-level prevalence estimate, and refinement to identify differences between population groups. |
| Efficient use of existing data sources to identify systematic data gaps, refine allocation of resources and predict progress towards elimination targets. | Insufficient progress towards elimination targets required to achieve elimination of viral hepatitis as a public health threat by 2030. |
Author contributions
PM and CC conceptualised the study. CC conducted the literature search for estimates, and drafted the manuscript with PM. JH-C, RB, SM, TW and EB provided methodological input. JH-C, SM, TW and EB revised the manuscript.

I appreciate that this is a narrative review, yet the information on what was searched (i.e., which keywords), in which locations (i.e., with which search engines/sources) and when (i.e., years in which databases were allowed) is unclear. It would also be helpful to include some notion of which databases were included and excluded. For instance, it seems that many small epidemiological studies intended to find prevalence or incidence in very specific demographic populations could be missed (which could help address some of the gaps stated by the authors). Including this information would help guide the reader as to how the authors arrived at their selection.
I also very much appreciate the summary box, which includes recommendations on how to achieve national CHB caseloads. But given the low prevalence of HBV and higher prevalence in specific key populations, would it make sense to aim for a national CHB caseload estimate? The aim of these recommendations should more reflect what the authors stated in their title, towards understanding the epidemiology of HBV in the United Kingdom. It would also be helpful to include a column in this box with the gap(s) in knowledge corresponding to the given recommendation, so the reader can identify the limitations of existing data (and hopefully avoid it in the future).
Finally, it is unclear why prevalence estimates were given in the results, or even why this was examined in the first place if the aim was to discuss data sources to understand HBV epidemiology (i.e., data sources that have an estimate of HBV prevalence). I would suggest that the authors make it clear why this statistic was included, possibly by adding it as a secondary objective.

Lung-Yi Mak
Department of Medicine, Queen Mary Hospital, The University of Hong Kong, Hong Kong, China

This is an extremely important piece of work to estimate the epidemiology of HBsAg+ seropositivity in the UK. The authors summarized data from 6 sources, leading to a crude estimate for HBsAg seropositivity ranging from 0.27-0.73%. The following points should be addressed to further enrich the scientific contents and provide some directions for further research:

1. Ongoing pilot programs of universal HBV testing at Emergency Departments are being conducted in some NHS Trusts, especially after the COVID-19 pandemic. Would there be any chance that updated data from these pilot programs might have been recently reported and can be included in this piece?
2. Regional differences within the country, or even to a smaller scale, within a city, should also be appreciated. For instance, East London is likely having a much higher prevalence of HBsAg+ than the West/North side of the city. While population-based data is very important, targeting high-risk groups (as the authors also addressed) is equally crucial to inform healthcare strategies such as resource allocation. This is also relevant to UK in view of the low overall prevalence of HBsAg (<2%, according to data presented), and population-based screening may not be as cost-effective as other highly endemic regions. Although the CDC has updated the recommendations since 2022 to screen for HBV infection for at least once in the lifetime for all adults, the actual implementation of such approach will highly depend on the resources available. Therefore, it would be helpful if there is data for HBsAg seroprevalence in the high-risk groups; and in addition to that addressed in this article (sexual health, antenatal, emergency medicine, people born in high-prevalence setting), regions that are traditionally considered to be impoverished areas or known to harbour a relatively high proportion of immigrants, should be the target groups for such opportunistic approaches.
3. Would there be any overlap between the laboratory-identified HBsAg+ cases from UKHSA with other data sources? (i.e., the primary care database QResearch, CPRD primary care database and the antenatal screening programme)
4. Risk of bias and sampling error is well acknowledged: the authors may also include data from the NHS Blood and Transplant study published in Transfusion 2021 which reported a seroprevalence of HBsAg+ being 6.9/100,000 donors between 2009-2018.

Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others?