HRRD: a manually-curated database about the regulatory relationship between HPV and host RNA

HPV (human papillomavirus) is a small double-stranded DNA virus that is strongly associated with several cancers. The roles HPV plays in the host have gradually been identified through the interactions between the virus (including its early genes) and host RNA. In recent years, an increasing number of studies on HPV-related cancers have reported relationships between HPV and host RNA. Here, we present a database named HRRD, which contains the regulatory relationships between HPV and host RNA (mRNA, miRNA and lncRNA). The information was extracted from 10,761 papers in PubMed (up to December 1st, 2019). In addition, the sequence maps of HPV (198 genotypes) are included. HRRD was designed with a user-friendly web-based interface for data retrieval. It integrates information on the interactions between HPV and RNA, which reflect the relationship between HPV and its host. We hope HRRD will support a more comprehensive understanding of HPV in carcinogenesis and prognosis. HRRD is freely accessible at www.hmuhrrd.com/HRRD.


Scientific Reports
| (2020) 10:19586 | https://doi.org/10.1038/s41598-020-76719-6 www.nature.com/scientificreports/

Despite numerous experimental studies in the field, no computational resource focuses specifically on the relationship between HPV and host RNA. Hence, it is important and necessary to establish a suitable multi-comparative platform to show this mutual relationship. Here, we present HRRD, a database of the relationship between HPV and host RNA that collects the existing studies on the topic. It provides a comprehensive web-based resource on HPV and its host, as well as data support for HPV-related research.

Methods
Implementation. HRRD (Fig. 2). The information on the relationship between HPV and host RNA was extracted from articles and comprises the PMID, cancer type, HPV genotype, HPV oncogene, RNA type and name, method, material, and the regulatory relationship between HPV and host RNA (action direction and action type). The information was obtained by reading the literature manually. The collected data were checked, unified, and finally compiled into an Excel spreadsheet. In total, 784 records were extracted from the articles: 13 for lncRNA, 195 for miRNA, and the remainder for mRNA.
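The curated fields listed above can be pictured as one record per extracted relationship. The following is a minimal sketch only, not the HRRD implementation: the class name, field names, and helper functions are hypothetical, and the example values are placeholders rather than entries from the database. The controlled vocabularies are those named in this paper (three RNA types, two action directions, two action types).

```python
from collections import Counter
from dataclasses import dataclass

# Controlled vocabularies as described in the paper.
RNA_TYPES = {"mRNA", "miRNA", "lncRNA"}
DIRECTIONS = {"HPV to host", "host to HPV"}
ACTION_TYPES = {"Activation", "Suppression"}


@dataclass(frozen=True)
class RelationRecord:
    """One curated HPV-host RNA relationship (hypothetical field names)."""
    pmid: str
    cancer_type: str
    hpv_genotype: str
    hpv_gene: str
    rna_type: str
    rna_name: str
    method: str
    material: str
    action_direction: str
    action_type: str


def validate(record):
    """Return a list of problems found in the controlled-vocabulary fields."""
    problems = []
    if record.rna_type not in RNA_TYPES:
        problems.append(f"unknown RNA type: {record.rna_type!r}")
    if record.action_direction not in DIRECTIONS:
        problems.append(f"unknown action direction: {record.action_direction!r}")
    if record.action_type not in ACTION_TYPES:
        problems.append(f"unknown action type: {record.action_type!r}")
    return problems


def count_by_rna_type(records):
    """Tally records per RNA type, e.g. to reproduce a per-type summary."""
    return Counter(r.rna_type for r in records)


# Placeholder values for illustration only; not taken from HRRD.
example = RelationRecord(
    pmid="00000000",
    cancer_type="example cancer",
    hpv_genotype="HPV16",
    hpv_gene="E6",
    rna_type="miRNA",
    rna_name="example-miR",
    method="experimental verification",
    material="example cell line",
    action_direction="HPV to host",
    action_type="Suppression",
)
```

A check of this kind is one way the "checked and unified" step could be supported during manual curation; the actual procedure used for HRRD is not described beyond the text above.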
The sequence information for the different HPV genotypes was extracted from GenBank files, including the full-length HPV genome and the locations of its oncogenes. The HPV sequence information is stored in the database in TXT format.
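As an illustration of this extraction step, the sketch below reads gene locations and the genome sequence from the text of a GenBank flat file using only the Python standard library. It is a simplified, assumed approach rather than the procedure used for HRRD: the function name and returned structure are hypothetical, only simple `start..end` and `complement(start..end)` locations on `gene` features are handled, and compound locations such as `join(...)` are skipped. A full GenBank parser would be preferable for production use.

```python
import re

# A "gene" feature line: five spaces, the key, then a simple location.
GENE_LINE = re.compile(r"^ {5}gene\s+(complement\()?<?(\d+)\.\.>?(\d+)")
# Any feature key line (five spaces followed by a non-space character).
FEATURE_KEY = re.compile(r"^ {5}\S")
# The /gene="..." qualifier that names the feature.
GENE_QUALIFIER = re.compile(r'^\s+/gene="([^"]+)"')


def parse_genbank(text):
    """Return gene locations and the full sequence from one GenBank record."""
    genes = []
    seq_parts = []
    pending = None      # location of the gene feature awaiting its name
    in_origin = False   # True once the ORIGIN (sequence) block has started
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            break  # end of record
        if in_origin:
            # Sequence lines mix position numbers, spaces and bases.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
            continue
        match = GENE_LINE.match(line)
        if match:
            strand = "-" if match.group(1) else "+"
            pending = (int(match.group(2)), int(match.group(3)), strand)
            continue
        if FEATURE_KEY.match(line):
            pending = None  # a different feature started; drop unnamed gene
            continue
        qualifier = GENE_QUALIFIER.match(line)
        if qualifier and pending:
            start, end, strand = pending
            genes.append({"gene": qualifier.group(1), "start": start,
                          "end": end, "strand": strand})
            pending = None
    return {"genes": genes, "sequence": "".join(seq_parts).upper()}
```

The returned sequence string could then be written to a plain-text file, in line with the TXT storage mentioned above.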

Results
Relationship page. The principal function of HRRD is querying the stored information about the relationship between HPV and host RNA. The Relationship page provides three input boxes, each with an example, for searching. Users can search by one to three conditions (HPV genotype, HPV gene or RNA name) to obtain a list of matching records. The list contains the following information: HPV genotype, HPV gene, RNA type and name, action direction and type, cancer type, PMID, method and material. Clicking a PMID opens the article summary in PubMed. "HPV to host" means that HPV affects host RNA expression, while "host to HPV" means that host RNA influences HPV expression. "Activation" refers to up-regulated expression, while "Suppression" refers to down-regulated expression. The method indicates whether the finding was derived from experimental verification or from data analysis (Fig. 3C). Results are sorted by RNA name. The export button exports the search results in XLS format.
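The query behaviour described above (one to three conditions, results sorted by RNA name, then export) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code behind the HRRD website: the function names and dictionary keys are hypothetical, matching is assumed to be exact and case-insensitive, and the export writes tab-separated text instead of the XLS format offered by the site, so that only the standard library is needed.

```python
import csv
import io


def search(records, hpv_genotype=None, hpv_gene=None, rna_name=None):
    """Return records matching every supplied condition, sorted by RNA name."""
    conditions = {
        "hpv_genotype": hpv_genotype,
        "hpv_gene": hpv_gene,
        "rna_name": rna_name,
    }
    # Keep only the conditions the user actually filled in.
    active = {key: value.strip().lower()
              for key, value in conditions.items()
              if value and value.strip()}
    if not active:
        raise ValueError("supply at least one search condition")
    hits = [record for record in records
            if all(record[key].lower() == value
                   for key, value in active.items())]
    return sorted(hits, key=lambda record: record["rna_name"].lower())


def export_tsv(rows, columns):
    """Serialise the selected columns of the rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Combining the conditions with a logical AND is the natural reading of "one to three conditions", but the paper does not state how the site combines them or whether partial matches are returned.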
Sequence page. The genome information of HPVs is stored in the database. On the Sequence page, a table lists all HPV types stored in HRRD. Users can enter an HPV genotype in the search box and click the search button to find it in the table below. The result page is reached through the plot button or by clicking the HPV ID in the table (Fig. 3B). The result page is divided into two parts: the upper part displays a linear viewer of the corresponding HPV sequence, and the lower part shows the complete HPV sequence (Fig. 3D). The picture and sequence of the HPV genome can be accessed directly.
Other pages. The remaining pages provide ancillary functions. A simple Home page (Fig. 3A) gives a brief introduction to HRRD. The scrolling picture on the Home page shows HPV, the process of HPV infection and the effect of HPV on host cells. Frequently asked questions about HRRD and their answers are given on the FAQ page. The Links page provides hyperlinks to, and brief introductions of, HPV-related databases such as PaVE, HPVdb, HPVbase and NCBI, so that users can visit the corresponding websites if necessary. The Download page links to a software tool named DisV-HPV16 (version I), which was developed for detecting the expression of HPV16 and its oncogenes in RNA-sequencing data.

Discussion and future development
To our knowledge, HRRD is the first database designed specifically for the relationship between HPV and RNA.
To ensure the accuracy of the data, the information was manually extracted from the relevant literature. HRRD describes the sequence maps of HPV and incorporates links to HPV-related databases so that users can find relevant information quickly and easily. We chose RNA as our research object because RNA is unstable and susceptible to change and alteration by viruses, and because RNA production is the first step toward protein synthesis. For example, if researchers detect HPV infection in a cancer, HRRD can help determine whether HPV affects key RNAs and thereby contributes to carcinogenesis. In addition, if researchers want to find key RNAs in an HPV-related cancer, HRRD can support preliminary screening and querying. With its abundant resources, it can provide candidate biomarkers for HPV-related carcinogenesis and prognosis and help identify potential targets for molecular therapy. We anticipate that it will facilitate insight into HPV and its host.
Although the database contains data up to December 1st, 2019, a rapidly growing body of relevant research has emerged since then. HRRD will be updated regularly by scanning newly published literature. To keep the data timely, we plan to use data mining and machine learning to obtain information from the literature. At the same time, we will update the software with modified functions and an expanded scope of application. These updates will be implemented in the near future.

Data availability
HRRD is freely accessible at www.hmuhrrd.com/HRRD.