Hepitopes: A live interactive database of HLA class I epitopes in hepatitis B virus

Increased clinical and scientific scrutiny is being applied to hepatitis B virus (HBV), with focus on the development of new therapeutic approaches, ultimately aiming for cure. Defining the optimum natural CD8+ T cell immune responses that arise in HBV, mediated by HLA class I epitope presentation, may help to inform novel immunotherapeutic strategies. Therefore, we have set out to develop a comprehensive database of these epitopes in HBV, coined ‘Hepitopes’. This undertaking has its foundations in a systematic literature review to identify the sites and sequences of all published class I epitopes in HBV. We also collected information regarding the methods used to define each epitope, and any reported associations between an immune response to this epitope and disease outcome. The results of this search have been collated into a new open-access interactive database that is available at http://www.expmedndm.ox.ac.uk/hepitopes. Over time, we will continue to refine and update this resource, as well as inviting contributions from others in the field to support its development. This unique new database is an important foundation for ongoing investigations into the nature and impact of the CD8+ T cell response to HBV.


Hepatitis B virus (HBV) is the prototype virus in the Hepadnaviridae family. It has unique features that make it interesting and distinct from other related viruses. Its unusual, partially double-stranded circular genomic structure, comprising several overlapping reading frames 1, represents a potential barrier to variability: any nonsynonymous nucleotide substitution potentially has to be tolerated in the resulting sequences of more than one protein product 2. However, its error-prone reverse transcriptase enzyme, responsible for the generation of nucleic acid intermediates, is a source of diversity that is unusual in a DNA virus 3-5.
HBV is estimated to infect 240 million people globally, and to cause over half a million deaths each year (http://www.who.int/mediacentre/factsheets/fs204/en/). At present, therapy consists of peginterferon alfa-2a, tenofovir disoproxil or entecavir (https://www.nice.org.uk/guidance/cg165). Although these drugs are successful in mediating viral suppression, either alone or in combination they rarely achieve cure, and many patients are committed to life-long therapy. However, the striking recent success of direct-acting antiviral therapy for HCV has sparked new enthusiasm for the vision of an HBV cure 6.
To optimize our chances of HBV eradication, there are a number of important public health interventions that must be pursued in parallel, including diagnostics, treatment and prevention. A crucial component of this strategy is the development and testing of new drug agents, and the investigation of immunotherapeutic interventions 7,8 . There is growing interest in the idea of a therapeutic vaccine that could boost or mimic the immune responses that correlate best with HBV clearance in natural infection 6,9,10 .
There are several strands of evidence for an important role of the CD8+ T cell response in HBV. Associations have been documented between HLA genotype and disease outcome 11-13, including responses to, and recovery from, acute infection [14][15][16]. Therefore, we have begun the process of collating HBV class I epitopes. Our primary aim was to provide a detailed and comprehensive database of epitopes that have been identified to date, with an ongoing remit to develop this into a sustainable open-access online research resource that evolves over time.

Table 1. We also identified three additional references by searching the bibliographies of articles that had been identified in the primary search. These can be identified within our dataset by searching the column entitled 'database', which specifies the origin of the citation.

Materials and methods
Using this approach, we identified 447 papers in total. We reviewed each paper to ensure it met our search criteria, excluding duplicates (n=113), papers that were not relevant or were missing essential data (n=177), publications not written in English (n=26) and papers that we were unable to access (n=19). This left 112 papers whose data were included. From each of these, we recorded the following details: citation, epitope location and sequence, HLA restriction, experimental approach used to confirm the epitope, and any documented association with stage of HBV disease. From the 112 selected manuscripts, we used Google Sheets (https://docs.google.com/spreadsheets) to collate a database of HBV epitopes, each with a documented optimal and/or overlapping peptide sequence and a defined HLA class I restriction.
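The screening counts above can be checked with a simple tally in the style of a systematic-review flow diagram. This is a minimal Python sketch. The dictionary layout and the function name `remaining_after` are illustrative and are not part of the Hepitopes workflow. Only the numbers come from the text.

```python
# Exclusion counts reported in the Materials and methods text.
identified = 447
exclusions = {
    "duplicates": 113,
    "not relevant or missing essential data": 177,
    "not written in English": 26,
    "unable to access": 19,
}

def remaining_after(total, excluded):
    """Return the number of records left after subtracting every exclusion category."""
    return total - sum(excluded.values())

included = remaining_after(identified, exclusions)
print(included)  # 112
```

Keeping each exclusion category as a named entry makes the arithmetic auditable when new searches add papers.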

Sequence numbering
In order to unify our approach to numbering amino acid positions within HBV proteins, we used a reference sequence published by Liu et al. 29. To support consistent alignment, and to identify potential genotype-specific differences that may affect epitope presentation or recognition, we have also defined consensus sequences for each protein based on all available sequences downloaded from https://hbvdb.ibcp.fr/HBVdb/ 2. The reference sequence and aligned genotype-specific consensus sequences are available online (DOI: http://dx.doi.org/10.6084/m9.figshare.4040700.v1 30).

Interactive database
The results of the literature search, which will form the basis of an expanding data resource, can be viewed at http://www.expmedndm.ox.ac.uk/hepitopes. They are presented as an interactive web element that helps other researchers explore and query the dataset (Figure 1A). This web element is built on the widely used JavaScript library DataTables (SpryMedia Limited v. 1.10.12; datatables.net), which allows the data to be filtered and sorted interactively. Hosting is provided by the RStudio web service (shinyapps.io), and the web element was developed using the R framework Shiny (Shiny: Web Application Framework for R, v. 0.14.1; https://CRAN.R-project.org/package=shiny) 31.
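The sequence-numbering convention described under Sequence numbering reports each epitope by the 1-based positions of its first and last residues in a reference protein. This is a minimal Python sketch that assumes an exact substring match. The short core fragment used here is an illustrative N-terminal stretch assumed for this example. It is not the Liu et al. reference sequence, and the function name `epitope_span` is hypothetical.

```python
def epitope_span(protein: str, epitope: str) -> tuple[int, int]:
    """Return 1-based (start, end) positions of an exact epitope match in a protein.

    Raises ValueError if the epitope is absent, e.g. because of a
    genotype-specific substitution relative to the chosen reference.
    """
    index = protein.find(epitope)  # 0-based index, or -1 if not found
    if index == -1:
        raise ValueError(f"{epitope} not found in reference")
    return index + 1, index + len(epitope)

# Illustrative N-terminal fragment of HBV core (assumed for this example;
# not the Liu et al. reference sequence used by Hepitopes).
core_fragment = "MDIDPYKEFGATVELLSFLPSDFFPSVRDL"

print(epitope_span(core_fragment, "FLPSDFFPSV"))  # (18, 27)
```

An exact-match lookup fails whenever an epitope carries a substitution relative to the reference. This is one reason why aligned genotype-specific consensus sequences are useful alongside a single reference.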
The interactive web element was developed as a case study of the Live Data project, which was led by Research Support Services and funded by IT Services at the University of Oxford. The project was initiated to investigate whether a central interactive visualization service would increase engagement with datasets underlying research projects and foster additional research impact. Hepitopes provided a template case study for researchers interested in creating interactive interfaces to databases and flat files.
We will continue to curate the Hepitopes online database, adding amendments and updates over time. To facilitate wider contributions from the scientific community, our website includes a contact portal (Figure 1B). We welcome contributions of new data and citations, corrections to existing information, and general feedback about the site and resource. Our aims are to refine the quality of the dataset, encourage dialogue and collaboration, and add new tools, including links to other relevant resources.

Dataset validation
The results of our primary literature search are limited by our ability to identify all reports that may contain pertinent data. We may have overlooked relevant citations because the data were presented in manuscripts that did not contain our specific search terms. Furthermore, we were unable to access the complete report for 19 citations that may contain relevant data (e.g. conference abstracts, for which the entire dataset has not been published electronically). We recognize that the nature and quality of the data presented in different reports are heterogeneous. Studies examined different HBV genotypes, presented them with varying alignments, and used a diverse range of methodological approaches, from in silico prediction to detailed in vitro elucidation of epitopes in natural infection or in animal models.
It is important to recognize that the predominance of any particular HLA class I molecule may not reflect a true biological immunodominance hierarchy. Rather, it may result from a bias towards the identification, reporting and further investigation of epitopes restricted by the HLA class I alleles that occur at the highest phenotypic frequency in the majority of human populations. This is illustrated by HLA-A*02 epitopes, which, at the point of releasing the database, account for 42% of the total and are referenced in 107 of the cited manuscripts.
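The 42% figure is a simple proportion of database entries whose HLA restriction falls in the A*02 group. This is a minimal Python sketch of that calculation. It uses a small hypothetical set of restriction labels rather than the real dataset. The helper name `share_with_prefix` is an assumption, as is the rule that labels are already normalised to "A*02:01"-style names so that prefix matching works.

```python
from collections import Counter

def share_with_prefix(restrictions, prefix):
    """Fraction of entries whose (normalised) HLA restriction starts with prefix."""
    if not restrictions:
        return 0.0
    matches = sum(1 for allele in restrictions if allele.startswith(prefix))
    return matches / len(restrictions)

# Hypothetical restriction labels, assumed normalised to 'A*02:01'-style names.
example = ["A*02:01", "A*02:01", "A*24:02", "A*11:01", "B*35:01"]
print(f"{share_with_prefix(example, 'A*02'):.0%}")  # 40%

# Counts per allele group (the part before the first ':').
group_counts = Counter(allele.split(":")[0] for allele in example)
print(group_counts.most_common(1))  # [('A*02', 2)]
```

As the text cautions, a share computed this way reflects where reporting effort has been concentrated as much as it reflects biology.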
We also acknowledge that the HBV literature is skewed towards investigation of certain populations, most notably in the Far East, where HBV genotypes B and C are endemic 32, and potentially also in Western Europe and North America, where better resources are available for these studies. On these grounds, the results of our literature search over-represent certain host/virus interactions, specifically between HBV genotypes B/C and the host HLA class I alleles represented at the highest phenotypic frequency in these populations. A further source of bias in the existing data is the peptide sets used for in vitro confirmation of optimal epitopes, among which genotype D is anecdotally over-represented (although this methodological bias is difficult to quantify systematically).
The aim is that, as the resource develops over time, we will build

Access to citations represented in the Hepitopes database
The DOI (digital object identifier, issued for articles published after the year 2000) listed for each citation within the Hepitopes database is in the public domain. Access to the abstract and full text is subject to institutional or individual subscription to the journal, or open access availability.

Ethics
Patient consent was not required for this project, and no specific ethics approval was required. The original manuscripts cited in our database should be consulted individually for details of specific ethics approvals and consent, if required.
Author contributions
SL, MA and PM conceived the study. LC advised on the bibliographic search to identify relevant citations. SL, AM, JS, NG, OD, YYC, SR, and AK performed the original review of all the citations and collected relevant data. PM re-reviewed all the references in the final dataset. HN and MH developed the software to support online visualization of the database in consultation with PM. SL and PM wrote the manuscript, with methods contributed by HN, MH and LC, and expert input from CNH, RT, PK and EB. All authors were involved in the revision of the draft manuscript and have agreed to the final content.

Competing interests
No competing interests were disclosed.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
F1000Research
database is limited to CD8+ T cell epitopes?

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
No competing interests were disclosed.

Cataloguing efforts
This retrospective collation and cataloguing of the sites and sequences of all published class I epitopes in HBV (Hepitopes) is a useful contribution to the knowledge base in this field. This systematic literature review has clearly been a heavily manual effort, as the information is so dispersed. The two online bibliographic databases selected for the targeted searches, Medline and Embase, may not offer 100% coverage. What about other resources, including those in other languages? The 'universe' of the search is not explicitly stated in the paper and needs to be added.

Technology and sustainability
I found the website easy to use but at times unstable, with multiple crashes and disconnections from the server. It is good to see the use of open-source tools, for example R Shiny. The interactivity is quite basic, based on column sorting (effectively an online spreadsheet), but nevertheless useful. I found the feedback and contributor process to be manual and quite primitive, relying on what is essentially a minimally structured paper-based form. It would be nice to see a more automated upload process with structured fields that can be populated and validated on the fly, and some local vetting once an entry is submitted.
In terms of future-proofing and keeping data up to date, the team note that they have an ongoing remit to develop this into a sustainable open-access online research resource that evolves over time. While the code for the datatable is usefully available on GitHub, it is not clear how maintenance of the GitHub project might be supported, or under what budget. The team have promised to continue to refine and update this resource, yet what happens if key staff leave?
Future-proofing is the main concern for so many local resources like this. This is why the Research Councils typically try not to fund ongoing, longer-term information resources under the auspices of smaller one-off research projects, given the lack of future accessibility. ESRC have set up a centralised web-based ReStore to ensure that their web-based research methods resources created under one-time research awards are maintained in the longer term, either as maintained resources or as static snapshot web-based resources. So many resources created with the very best intentions are not kept up to date, quickly become out of date, and thus become relics. I really embrace the idea of crowd-sourcing information to maintain the resource in future. In my opinion, however, the physical resource/database (the catalogue) should be looked after by an official, long-standing, trusted repository, e.g. GenBank, and not by an individual university.

The authors mention another almost identical online database of epitopes for HIV (https://www.hiv.lanl.gov/content/immunology/ctl_search). They also mention other online repositories in the field of T cell immunology and sequence data, including HBV-specific databases (https://hbvdb.ibcp.fr/HBVdb/ 2 and http://www.hepseq.org) and broader sequence repositories such as http://www.ncbi.nlm.nih.gov. This makes me wonder why a unified database with a single interface has not been proposed! The last thing the community needs is a proliferation of locally managed lists, so why not combine them into one infrastructure? Professionalised future-proofing of resources like this really needs to be addressed in the paper, even if the plans are aspirational.
Finally, should the web resource/front end not be retained in the future, this snapshot database is usefully backed up in Figshare. The paper says the team will keep this updated, but it is not clear how future dumps of the dataset might be formally archived, persistently cited, and referenced in Figshare.

This manuscript describes the creation of a database cataloging epitopes derived solely from hepatitis B virus. The authors are to be congratulated on their effort to set up a clear and user-friendly resource that provides all the scientific information currently known on HBV epitopes.
What I found particularly nice is that the database details how the different epitopes were experimentally determined. This allows the user to understand whether the epitope was able to induce a T cell response in a subject with natural infection or was only the result of in silico determination and binding assays. Such a resource will allow easy progression of future research in T cell immunology. Researchers will have access to the information on relevant HBV epitopes in one dataset, which will be both convenient and time-saving. In addition, the experimental technique used to determine each epitope is recorded, which can further direct experimental design.
As noted by the authors, the resource will hopefully continue to grow. Considering the number of people who are screening HBV patients, this might not be a trivial task. It is also valuable that readers can add comments, which allows the authors to update the resource as appropriate. Importantly, the resource will therefore evolve over time and become an invaluable asset in HBV research.
The authors of the manuscript/resource have clearly outlined the methods they used for the literature search when developing the interactive database, and these methods are robust. I also note the limitations of dataset validation, whereby citations could be overlooked because they do not contain the specific search terms. Accepting this, I have a few suggestions that I hope will improve the utility of the site:
The data gathered do not appear to be consolidated, and as such they do not give a "final" state of knowledge on the HBV epitope landscape. The website provides all the information, but it is presented as a list of papers. This can create confusion. For example, suppose you want to know whether the core 18-27 FLPSDFFPSV epitope contains a polymorphism (which it does). You find this information only when reading the paper in which the polymorphism was demonstrated. If you ask the same question starting from a paper where the polymorphism was not analyzed, the apparent answer is that there is no polymorphism in this region. I think this aspect could be improved, since it might lead to misinterpretation.
A similar problem is also present in relation to the impact of HLA-A02 subtypes. Some epitopes are presented as pan-A2 based on older papers, but a more recent publication (Tan AT et al. 2008) has shown that some epitopes are restricted exclusively by A0201. For example, the envelope epitopes FLLTRILTI or FLLTKILTI were demonstrated exclusively in HLA-A0201 subjects and not in HLA-A0203, 0206 or 0207 subjects. The progressive evolution of information is somewhat lost. Wherever the term 'pan' is used, it may therefore need to be verified, as new data may not corroborate it. Having said that, I appreciate the limitations noted in the manuscript regarding data validation.
When querying, for example, HBV epitopes within the core region, the website lists all the papers in which core epitopes were demonstrated, but there is a lot of redundancy. There are 6 pages listing all the papers published using the core 18-27 epitope. I wonder whether the site could first list all the distinct epitopes, indicating the number of publications in which each epitope was reported as a measure of the evidence supporting its validity. This may improve the resource.
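The suggestion of listing each distinct epitope once, with the number of supporting publications, is a group-and-count operation. This is a minimal Python sketch. It assumes each row carries a protein name, a start position, the epitope sequence and a citation identifier. These field names and the rows themselves are hypothetical and are not the Hepitopes schema. Citations are counted as distinct, so a paper listed twice is counted once.

```python
from collections import defaultdict

# Hypothetical rows: (protein, start position, epitope sequence, citation id).
rows = [
    ("core", 18, "FLPSDFFPSV", "paper-1"),
    ("core", 18, "FLPSDFFPSV", "paper-2"),
    ("core", 18, "FLPSDFFPSV", "paper-2"),  # same paper listed twice
    ("envelope", 183, "FLLTRILTI", "paper-3"),
]

def consolidate(rows):
    """Map each distinct (protein, start, sequence) to its number of distinct citations."""
    citations = defaultdict(set)
    for protein, start, sequence, citation in rows:
        citations[(protein, start, sequence)].add(citation)
    # Most-cited epitopes first.
    return sorted(
        ((key, len(cites)) for key, cites in citations.items()),
        key=lambda item: item[1],
        reverse=True,
    )

for (protein, start, sequence), n in consolidate(rows):
    print(protein, start, sequence, n)
```

The same grouping key could also collect reported variant sequences per epitope. That would address the polymorphism concern raised above, because the information would no longer depend on which paper a reader starts from.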
The database's description of the method of epitope characterization is very detailed. What may not be immediately clear, however, is whether the method used is accurate enough to identify genuine HBV epitopes. For example, if the epitopes were characterized using T cells from naturally infected patients and targets in which HBV proteins were endogenously processed, that would be ideal. If the epitope was only characterized by predictive algorithms, then it would be less so. Perhaps a scale (Patient T cells with endogenously processed epitopes > Patient T cells with peptide-pulsed targets > binding assays > predictive algorithms) indicating the validity of the consolidated epitopes might be clearer for the users of the database.
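The proposed evidence scale can be encoded as an ordered ranking, so that each consolidated epitope is tagged with the strongest method reported for it. This is a minimal Python sketch. The four tiers follow the order suggested above. The enum and function names are hypothetical, as is any mapping from the database's free-text method descriptions onto these tiers.

```python
from enum import IntEnum

class Evidence(IntEnum):
    """Proposed evidence tiers; a higher value means stronger evidence."""
    PREDICTIVE_ALGORITHM = 1
    BINDING_ASSAY = 2
    PATIENT_T_CELLS_PEPTIDE_PULSED = 3
    PATIENT_T_CELLS_ENDOGENOUS = 4

def strongest_evidence(methods):
    """Return the highest tier among the methods reported for one epitope."""
    if not methods:
        raise ValueError("no methods reported")
    return max(methods)

# Hypothetical methods reported across papers for a single epitope.
reported = [Evidence.PREDICTIVE_ALGORITHM, Evidence.PATIENT_T_CELLS_PEPTIDE_PULSED]
print(strongest_evidence(reported).name)  # PATIENT_T_CELLS_PEPTIDE_PULSED
```

An ordered enum keeps the ranking explicit and easy to revise if the community settles on a different ordering of methods.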
As some citations may have been missed because of the search criteria, it may be possible to widen the search. This may be more feasible once readers have provided feedback, and it could extend the search to include all the relevant data for the resource.
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Competing Interests: No competing interests were disclosed.