Lnc2Cancer v2.0: updated database of experimentally supported long non-coding RNAs in human cancers

Abstract
Lnc2Cancer 2.0 (http://www.bio-bigdata.net/lnc2cancer) is an updated database that provides comprehensive, experimentally supported associations between lncRNAs and human cancers. In Lnc2Cancer 2.0, we have added more data and several new features, including (i) a more than 4-fold increase over the previous version, with 4989 lncRNA-cancer associations between 1614 lncRNAs and 165 cancer subtypes; (ii) about 800 newly added experimentally supported circulating, drug-resistant and prognostic-related lncRNAs in various cancers; (iii) the regulatory mechanisms of lncRNAs in cancer, including microRNA (miRNA), transcription factor (TF), variant and methylation regulation; (iv) more than 70 additional high-throughput experiments (microarray and next-generation sequencing) of lncRNAs in cancers; (v) a score for each lncRNA-cancer association that evaluates its level of support; and (vi) updated lncRNA annotation (GENCODE version 28) and more detailed descriptions of lncRNAs and cancers. A newly designed, user-friendly interface was also developed to provide a convenient platform for users. In particular, it enhances browsing by primary cancer organ, biomarker type and regulatory mechanism; advanced search across multiple fields; and filtering by LncRNA-Cancer score. Lnc2Cancer 2.0 should be a useful resource for further understanding the associations between lncRNAs and human cancers.


INTRODUCTION
Cancer is a leading cause of morbidity and mortality worldwide, and its complexity makes it difficult to treat (1,2). The discovery of numerous long noncoding RNA (lncRNA) transcripts in humans has dramatically altered our understanding of cancer (3). LncRNAs play pivotal roles in mediating crosstalk between cellular components involved in cancer processes, including proteins, RNAs and lipids (4). To facilitate studies of lncRNA-cancer associations, the first version of the Lnc2Cancer database (Lnc2Cancer 1.0) was reported to allow users to search all known experimentally supported lncRNAs associated with various human cancers (5).
With increasing interest in human lncRNAs and the availability of high-throughput technologies, the number of known lncRNA-cancer associations has grown rapidly (6)(7)(8). In addition, regulatory mechanisms of lncRNAs in cancers, such as genetic variants, microRNA (miRNA) interactions, transcription factor (TF) binding and methylation, have been widely studied (9)(10)(11). An update of Lnc2Cancer with more resources and improved tools is therefore urgently needed. More importantly, several novel research directions have emerged in the field of cancer-related lncRNAs. In recent years, increasing evidence suggests that lncRNAs could serve as non-invasive diagnostic (12), drug-resistance (13) and prognostic (14) biomarkers in various cancers. For example, several studies have investigated circulating lncRNAs in serum, plasma and other body fluids as cancer biomarkers (circulating lncRNAs) (15,16). Aberrant expression of lncRNAs has been reported to be responsible for drug resistance in human cancers (drug-resistant lncRNAs) (17). Many lncRNAs have also been reported to have prognostic value for predicting the survival of cancer patients (prognostic-related lncRNAs) (18). However, there is no specialized resource devoted to collecting, storing and distributing these data. Existing resources such as LNCipedia (19), LncRNADisease (20), LNCediting (21) and lncRNAdb (22) collect only basic annotation and functional information on lncRNAs. Other studies have focused on the expression and genomic characterization of lncRNAs across cancers (23,24). A comprehensive, high-quality database focused on lncRNAs as cancer biomarkers is still lacking.
To meet these needs, we updated Lnc2Cancer 1.0 to version 2.0 (Lnc2Cancer 2.0) (Figure 1 and Table 1). Lnc2Cancer 2.0 documents 4989 associations between 1614 human lncRNAs and 165 human cancer subtypes, curated by reviewing >6500 published papers. For the first time, experimentally supported circulating, drug-resistant and prognostic-related lncRNAs in human cancers are included. LncRNAs regulated by miRNAs, TFs, variants and methylation are also included in the updated database. Furthermore, Lnc2Cancer 2.0 contains high-throughput experiments of lncRNAs in cancers, and a LncRNA-Cancer score was developed to evaluate the associations between lncRNAs and cancers. We hope that Lnc2Cancer 2.0 will serve as an important resource for future research on lncRNAs and human cancers.

Data expansion and pre-processing
Lnc2Cancer 2.0 includes many more associations between lncRNAs and cancer subtypes than the previous version (Table 1). First, we screened approximately 4600 studies in the PubMed database (25) (mainly published from 2015 to 2018) using keyword combinations similar to those of Lnc2Cancer 1.0. In addition, we re-screened >2000 studies in PubMed (mainly published before 2015) that had been included in Lnc2Cancer 1.0 to obtain more information and details.
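The literature screen can be scripted against NCBI's E-utilities. The Python sketch below only builds an ESearch URL for PubMed restricted to a publication-date window; the keyword lists are hypothetical placeholders, since the actual Lnc2Cancer keyword combinations are not given here, and no network request is made.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical keyword groups (placeholders, not the Lnc2Cancer keywords).
LNCRNA_TERMS = ["lncRNA", "long non-coding RNA", "long noncoding RNA"]
CANCER_TERMS = ["cancer", "tumor", "carcinoma"]


def build_query(lncrna_terms, cancer_terms):
    """Combine two keyword groups as (A1 OR A2 ...) AND (B1 OR B2 ...)."""
    left = " OR ".join(f'"{t}"' for t in lncrna_terms)
    right = " OR ".join(f'"{t}"' for t in cancer_terms)
    return f"({left}) AND ({right})"


def esearch_url(term, min_year, max_year, retmax=100):
    """Build a PubMed ESearch URL limited to a publication-date (pdat) window."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",
        "mindate": str(min_year),
        "maxdate": str(max_year),
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    query = build_query(LNCRNA_TERMS, CANCER_TERMS)
    print(esearch_url(query, 2015, 2018))
```

Fetching the URL returns the matching PubMed IDs, which would still need manual review as described in the next step.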
Second, we extracted lncRNA-cancer associations supported by strong experimental evidence, such as RNAi, in vitro knockdown, western blot, qRT-PCR and luciferase reporter assays. If a lncRNA had been verified as a circulating, drug-resistant or prognostic-related biomarker in cancer, we extracted this information. Similarly, we recorded the information when regulation of the lncRNA by miRNAs, variants, TFs or methylation in cancer had been confirmed. In this step, we recorded detailed information including lncRNA and cancer names, sequence and positional information of the lncRNAs, experimental techniques (e.g. microarray, northern blot, qRT-PCR), experimental samples (cell line, blood and/or tissue), expression patterns (up-regulated, down-regulated or differentially expressed), publication information from PubMed (PubMed ID, year of publication and title) and a brief functional description of the lncRNA-cancer association from the original study. Moreover, we extracted high-quality high-throughput experiments (microarray and next-generation sequencing) of lncRNAs in cancers, covering cancer versus normal samples, different cancer subtypes and cancer with drug treatment.
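The fields recorded in this step map naturally onto a typed record. A minimal Python sketch, assuming hypothetical field names and restricting expression patterns and sample types to the values listed above; the real database schema may differ.

```python
from dataclasses import dataclass
from typing import List, Optional

# Controlled vocabularies taken from the curation description above.
ALLOWED_PATTERNS = {"up-regulated", "down-regulated", "differentially expressed"}
ALLOWED_SAMPLES = {"cell line", "blood", "tissue"}


@dataclass
class CuratedEntry:
    """One curated lncRNA-cancer association (hypothetical field names)."""
    lncrna: str
    cancer: str
    methods: List[str]              # e.g. ["qRT-PCR", "northern blot"]
    samples: List[str]              # subset of ALLOWED_SAMPLES
    expression_pattern: str         # one of ALLOWED_PATTERNS
    pmid: int
    year: int
    title: str
    description: str = ""           # brief functional description
    sequence: Optional[str] = None
    location: Optional[str] = None  # hypothetical "chrom:start-end" string

    def __post_init__(self):
        # Reject values outside the controlled vocabularies.
        if self.expression_pattern not in ALLOWED_PATTERNS:
            raise ValueError(f"unknown expression pattern: {self.expression_pattern}")
        unknown = set(self.samples) - ALLOWED_SAMPLES
        if unknown:
            raise ValueError(f"unknown sample types: {sorted(unknown)}")
```

Validating at construction time keeps free-text variants (for example "upregulated") from silently fragmenting later searches by expression pattern or sample type.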
Third, we collected alternative names of lncRNAs, including aliases, synonyms, gene IDs, HGNC names (26), Ensembl IDs (27), GENCODE names (28), GenBank IDs (29) and RefSeq IDs (30). We used these names to merge synonyms so that each lncRNA has consistent information. We also updated the genomic locations of lncRNAs to GENCODE version 28. A standardized classification scheme, the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3), was then used to annotate each cancer type. In total, >6500 published papers were systematically reviewed during data expansion and preprocessing. The current version of Lnc2Cancer includes 4989 associations between 1614 human lncRNAs and 165 human cancer subtypes.
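Merging synonyms across sources is a connected-components problem: if two identifiers ever appear on the same record, they name the same lncRNA. The Python sketch below uses a union-find structure to illustrate one way to do this; it is an assumed approach, not necessarily the procedure used for Lnc2Cancer, and the identifiers in the usage note are placeholders.

```python
class SynonymMerger:
    """Union-find over lncRNA identifiers; names seen on one record are merged."""

    def __init__(self):
        self.parent = {}

    def find(self, name):
        """Return the representative name, registering unseen names."""
        self.parent.setdefault(name, name)
        root = name
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[name] != root:
            nxt = self.parent[name]
            self.parent[name] = root
            name = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def add_record(self, names):
        """Merge all identifiers listed on one source record (alias, IDs, ...)."""
        names = list(names)
        if not names:
            return
        self.find(names[0])  # register single-name records too
        for other in names[1:]:
            self.union(names[0], other)

    def groups(self):
        """Return the merged synonym sets."""
        out = {}
        for name in self.parent:
            out.setdefault(self.find(name), set()).add(name)
        return list(out.values())
```

For example, records `["LNC-A", "ALIAS-1"]` and `["ALIAS-1", "ENSG-X"]` end up in one group even though `LNC-A` and `ENSG-X` never appear together, which is exactly the transitive linking that per-pair lookups would miss.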

Experimentally supported cancer biomarkers and regulatory mechanisms for lncRNAs
To provide a comprehensive resource for associations between lncRNAs and cancers, we manually curated lncRNAs that can serve as cancer biomarkers. We collected three kinds of cancer biomarkers: circulating, drug-resistant and prognostic-related lncRNAs. For circulating lncRNAs, we extracted only lncRNAs whose expression could be detected in blood, plasma or serum, or that the original researchers defined as circulating lncRNAs. For drug-resistance-related (or drug-sensitivity-related) lncRNAs, the drug names were also recorded. For prognostic-related lncRNAs, we collected lncRNAs verified to have a clear relationship with survival. In total, Lnc2Cancer 2.0 contains 366 circulating, 593 drug-resistant and 1928 prognostic-related lncRNA-cancer associations that have been experimentally supported.
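The three inclusion criteria can be expressed as a small rule function. This Python sketch assumes a hypothetical evidence-record layout (the keys `samples`, `author_says_circulating`, `drugs` and `survival_association` are illustrative, not Lnc2Cancer fields) and encodes the criteria exactly as worded above.

```python
# Body fluids that qualify an lncRNA as circulating.
CIRCULATING_SAMPLES = {"blood", "plasma", "serum"}


def biomarker_types(evidence):
    """Return the biomarker categories an evidence record qualifies for.

    Hypothetical keys:
      samples                 - sample types in which expression was detected
      author_says_circulating - authors explicitly call it a circulating lncRNA
      drugs                   - drugs for which resistance/sensitivity was shown
      survival_association    - a clear relationship with survival was verified
    """
    types = set()
    samples = {s.lower() for s in evidence.get("samples", [])}
    if samples & CIRCULATING_SAMPLES or evidence.get("author_says_circulating", False):
        types.add("circulating")
    if evidence.get("drugs"):
        # Drug names are kept on the record; here they only trigger the category.
        types.add("drug-resistant")
    if evidence.get("survival_association", False):
        types.add("prognostic")
    return types
```

A single record may fall into several categories at once, which is why the function returns a set rather than one label.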
The regulatory mechanisms of lncRNAs in cancer are complex; we collected four main types: regulation by miRNAs, variants, TFs and methylation. For miRNA and TF regulation, we collected only relationships verified by high-quality experiments. For variant regulation, an entry was collected if a somatic or genetic variant located on or near the lncRNA influenced its expression or structure. For methylation regulation, we applied a similar criterion. In total, Lnc2Cancer 2.0 includes 1139, 211, 225 and 319 entries for miRNA, variant, TF and methylation regulation, respectively.
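The "on or near" rule for variants needs a concrete distance. The Python sketch below assumes a hypothetical 10 kb window and 1-based inclusive coordinates; the text does not state the window actually used, and the functional-effect flag must still come from the curated study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interval:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def distance(var_chrom: str, var_pos: int, lnc: Interval) -> Optional[int]:
    """Distance from a single-base variant to a lncRNA (0 if it lies inside)."""
    if var_chrom != lnc.chrom:
        return None
    if var_pos < lnc.start:
        return lnc.start - var_pos
    if var_pos > lnc.end:
        return var_pos - lnc.end
    return 0


def keep_variant_entry(var_chrom, var_pos, lnc, affects_expression_or_structure,
                       window=10_000):
    """Keep an entry if the variant is on/near the lncRNA AND has a shown effect.

    `window` is a hypothetical cutoff for "near".
    """
    d = distance(var_chrom, var_pos, lnc)
    return d is not None and d <= window and affects_expression_or_structure
```

The same check can be reused for methylation entries by passing the position of the methylated site instead of a variant position.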

High-throughput experiments of lncRNAs in cancer
In recent years, high-throughput microarray and sequencing data have been generated at an unprecedented rate in cancer genomics. There is a strong need to collect high-quality lncRNA profiles in cancers from high-throughput experiments, which will help explore the functions and mechanisms of lncRNAs in cancer on a genome-wide scale. Lnc2Cancer 2.0 contains 77 high-throughput experiments across 38 cancer subtypes, covering cancer versus normal samples, different cancer subtypes and cancer with drug treatment.

LncRNA-cancer score for filtering associations of interest
In Lnc2Cancer 2.0, all lncRNA-cancer associations were verified by strong experimental evidence. To evaluate the level of experimental support and highlight research hotspots, we developed a LncRNA-cancer scoring system. The score is based on the number of publications verifying an association and on the sample types (tissue, cell line or blood) used in the experiments. For each lncRNA-cancer association, we calculated the confidence score as follows:
where N is the number of studies that verified the specific lncRNA-cancer association, and C, B and T indicate whether the association had been verified in cancer cell lines, blood or tissues, respectively (each is set to 1 if so, and 0 otherwise). Researchers can use the score to filter lncRNA-cancer associations of interest.
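The score inputs N, C, B and T can be derived directly from curated per-study records. The Python sketch below assumes a hypothetical record layout (lncRNA, cancer, PubMed ID, sample types) and counts N as the number of distinct PubMed IDs; it computes only the inputs and deliberately does not implement the published score formula.

```python
from collections import defaultdict

# Sample type -> indicator name used in the text.
SAMPLE_FLAGS = {"cell line": "C", "blood": "B", "tissue": "T"}


def score_inputs(studies):
    """Aggregate per-study evidence into N, C, B, T for each (lncRNA, cancer) pair.

    `studies` is an iterable of (lncrna, cancer, pmid, samples) tuples
    (a hypothetical layout). N counts distinct PubMed IDs.
    """
    pmids = defaultdict(set)
    flags = defaultdict(lambda: {"C": 0, "B": 0, "T": 0})
    for lncrna, cancer, pmid, samples in studies:
        key = (lncrna, cancer)
        pmids[key].add(pmid)
        pair_flags = flags[key]
        for sample in samples:
            if sample in SAMPLE_FLAGS:
                pair_flags[SAMPLE_FLAGS[sample]] = 1
    return {key: {"N": len(pmids[key]), **flags[key]} for key in pmids}


def filter_pairs(inputs, min_studies=1, min_sample_types=1):
    """Keep pairs with enough studies and enough distinct sample types."""
    return sorted(
        key for key, v in inputs.items()
        if v["N"] >= min_studies and v["C"] + v["B"] + v["T"] >= min_sample_types
    )
```

Counting distinct PubMed IDs rather than rows prevents one paper that reports several experiments from inflating N.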

DATABASE CONSTRUCTION AND IMPROVED USER INTERFACE
All data in Lnc2Cancer 2.0 were stored and managed using MySQL (version 5.7.18). The web interfaces were built with JSP on a Linux and Apache platform. Lnc2Cancer 2.0 is freely available at http://www.bio-bigdata.net/lnc2cancer and http://www.bio-bigdata.com/lnc2cancer. The previous version, Lnc2Cancer 1.0, is still in service; users can access it from the Lnc2Cancer 2.0 homepage or directly at http://www.bio-bigdata.net/lnc2cancer1.0/. We provide a user-friendly web interface (Figure 1) that enables users to query the database in a few steps.
(i) On the 'Browse' page, users can browse all experimentally supported associations in 'LncRNA-Centric' or 'Cancer-Centric' mode (Figure 2A-C). On the LncRNA-Centric page, users can browse by biomarker type and regulatory mechanism. On the Cancer-Centric page, users can browse and search the data in three ways: by anatomical classification on a human body map, from a list of cancer names, or by entering a cancer name. (ii) The 'Search' page provides 'general search' and 'advanced search' (Figure 2D). In general search, users can search by lncRNA name and cancer name. Advanced search allows more detailed, systematic queries by restricting results by dysregulated expression pattern, sample, method, biomarker type and regulatory mechanism.
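Both search modes amount to filtered queries over an association table. The sketch below uses Python's built-in sqlite3 module as a stand-in for the MySQL backend, with a hypothetical single-table schema; column names are whitelisted because SQL placeholders can bind values but not identifiers.

```python
import sqlite3

# Hypothetical single-table schema (not the actual Lnc2Cancer schema).
SCHEMA = """
CREATE TABLE association (
    lncrna TEXT NOT NULL,
    cancer TEXT NOT NULL,
    expression_pattern TEXT,
    sample TEXT,
    method TEXT,
    biomarker_type TEXT,
    regulation TEXT
)
"""

# General search uses the first two fields; advanced search may use any of them.
SEARCH_FIELDS = ("lncrna", "cancer", "expression_pattern", "sample",
                 "method", "biomarker_type", "regulation")


def search(conn, **filters):
    """Return (lncrna, cancer) rows matching all given equality filters."""
    unknown = set(filters) - set(SEARCH_FIELDS)
    if unknown:
        raise ValueError(f"unsupported search fields: {sorted(unknown)}")
    # Column names come only from the whitelist; values are bound as parameters.
    clauses = [f"{col} = ?" for col in filters]
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT lncrna, cancer FROM association WHERE {where} ORDER BY lncrna, cancer"
    return conn.execute(sql, list(filters.values())).fetchall()
```

Usage: create an in-memory database with `sqlite3.connect(":memory:")`, run `SCHEMA`, insert rows, then call `search(conn, lncrna="...")` for a general search or add keyword filters such as `biomarker_type="circulating"` for an advanced search.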

CONCLUSIONS AND FUTURE EXTENSIONS
With the growing number of experimentally supported lncRNA-cancer associations, we updated the Lnc2Cancer database with the latest data and new features. Lnc2Cancer 2.0 provides a comprehensive resource for associations between lncRNAs and cancers. As studies of lncRNAs in cancer accumulate, more cancer-related lncRNAs are being identified and characterized, and we believe that further roles of lncRNAs in cancer will be revealed. We will continue to update and improve the database to keep pace with this research. Lnc2Cancer will serve as a valuable resource for researchers interested in the roles of lncRNAs in human cancers.