Abstract
The major outcomes and insights of scientific research and clinical studies end up as publications or clinical records in unstructured text format. Owing to advances in biomedical research, the volume of published literature has grown tremendously in recent years. Scientists and clinical researchers face a major challenge in staying current with this knowledge and in extracting hidden information from millions of published biomedical articles. A potential one-stop automated solution to this problem is biomedical literature mining. One of the long-standing goals in biology is to discover disease-causing genes and their specific roles in personalized precision medicine and drug repurposing. However, empirical approaches and clinical validation are expensive and time-consuming. An in silico approach that uses text mining to identify disease-causing genes can contribute to biomarker discovery. This chapter presents a protocol that combines literature mining and machine learning to predict biomedical discoveries, with special emphasis on gene–disease relations. The protocol is presented as a literature-based discovery (LBD) pipeline and includes our web-based tools: (1) DNER (Disease Named Entity Recognizer) for disease entity recognition, (2) BCCNER (Bidirectional, Contextual clues Named Entity Tagger) for gene/protein entity recognition, (3) DisGeReExT (Disease-Gene Relation Extractor) for statistically validated results and visualization, and (4) a newly introduced deep learning-based method for association discovery. The proposed deep learning-based method can be generalized and applied to other important biomedical discoveries that focus on entities such as drugs/chemicals or miRNAs.
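To make the shape of such a pipeline concrete, the following is a minimal sketch and not the chapter's implementation. It assumes toy dictionary lookups (`GENES`, `DISEASES`, `tag_entities`, all hypothetical) in place of DNER and BCCNER, and it scores sentence-level gene–disease co-occurrence with pointwise mutual information (PMI). The abstract does not say which statistic DisGeReExT uses, so PMI is a stand-in chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy lexicons; real entity recognizers would replace these.
GENES = {"BRCA1", "TP53", "APOE"}
DISEASES = {"breast cancer", "alzheimer's disease"}


def tag_entities(sentence):
    """Return (genes, diseases) found in one sentence by substring lookup."""
    lowered = sentence.lower()
    genes = {g for g in GENES if g.lower() in lowered}
    diseases = {d for d in DISEASES if d in lowered}
    return genes, diseases


def cooccurrence_pmi(sentences):
    """Score each co-occurring gene-disease pair by PMI over sentences."""
    gene_counts, disease_counts, pair_counts = Counter(), Counter(), Counter()
    for sentence in sentences:
        genes, diseases = tag_entities(sentence)
        gene_counts.update(genes)
        disease_counts.update(diseases)
        pair_counts.update(product(genes, diseases))
    n = len(sentences)
    scores = {}
    for (gene, disease), count in pair_counts.items():
        # PMI = log2( P(gene, disease) / (P(gene) * P(disease)) )
        expected = gene_counts[gene] * disease_counts[disease]
        scores[(gene, disease)] = math.log2((count * n) / expected)
    return scores


# Illustrative sentences written for this sketch, not taken from any corpus.
sentences = [
    "BRCA1 mutations are linked to breast cancer.",
    "Loss of BRCA1 function increases breast cancer risk.",
    "APOE variants are associated with Alzheimer's disease.",
    "TP53 is a tumor suppressor gene.",
]
scores = cooccurrence_pmi(sentences)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Only pairs that share at least one sentence receive a score, so a gene mentioned without any disease (TP53 here) yields no candidate association. A realistic pipeline would swap in trained recognizers, normalize entity names to identifiers, and apply a proper significance test before ranking candidates.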
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Bhasuran, B. (2022). Combining Literature Mining and Machine Learning for Predicting Biomedical Discoveries. In: Raja, K. (eds) Biomedical Text Mining. Methods in Molecular Biology, vol 2496. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2305-3_7
DOI: https://doi.org/10.1007/978-1-0716-2305-3_7
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2304-6
Online ISBN: 978-1-0716-2305-3
eBook Packages: Springer Protocols