Combining Literature Mining and Machine Learning for Predicting Biomedical Discoveries

Bhasuran, Balu

doi:10.1007/978-1-0716-2305-3_7

Part of the book series: Methods in Molecular Biology (MIMB, volume 2496)

Abstract

The major outcomes and insights of scientific research and clinical studies end up as publications or clinical records in unstructured text format. Owing to advances in biomedical research, the volume of published literature has grown tremendously in recent years. Scientists and clinical researchers face a major challenge in staying current with this knowledge and in extracting hidden information from millions of published biomedical articles. A potential one-stop automated solution to this problem is biomedical literature mining. One of the long-standing goals in biology is to discover disease-causing genes and their specific roles in personalized precision medicine and drug repurposing. However, empirical approaches and clinical confirmation are expensive and time-consuming. An in silico approach that uses text mining to identify disease-causing genes can contribute to biomarker discovery. This chapter presents a protocol for combining literature mining and machine learning to predict biomedical discoveries, with special emphasis on gene–disease relation-based discovery. The protocol is presented as a literature-based discovery (LBD) pipeline and includes our web-based tools: (1) DNER (Disease Named Entity Recognizer) for disease entity recognition, (2) BCCNER (Bidirectional, Contextual Clues Named Entity Tagger) for gene/protein entity recognition, (3) DisGeReExT (Disease-Gene Relation Extractor) for statistically validated results and visualization, and (4) a newly introduced deep learning-based method for association discovery. The proposed deep learning-based method can be generalized and applied to other important biomedical discoveries focusing on entities such as drugs/chemicals or miRNAs.
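The first stages of such a pipeline (entity recognition followed by relation candidates) can be illustrated with a deliberately simple sketch. It is not the chapter's method: DNER, BCCNER, and DisGeReExT are trained systems, whereas the code below substitutes a toy dictionary lookup and sentence-level co-occurrence counting. The lexicons `DISEASES` and `GENES` and the function names `tag_entities` and `cooccurring_pairs` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical toy lexicons (assumption); real systems use trained NER models.
DISEASES = {"migraine", "asthma"}
GENES = {"BRCA1", "TNF", "IL6"}


def tag_entities(sentence, lexicon):
    """Return the lexicon terms that occur as whole tokens in the sentence.

    Matching is case-insensitive and ignores surrounding punctuation.
    """
    tokens = {tok.strip(".,;:()").lower() for tok in sentence.split()}
    return {term for term in lexicon if term.lower() in tokens}


def cooccurring_pairs(sentences):
    """Count (gene, disease) pairs that are mentioned in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        genes = tag_entities(sentence, GENES)
        diseases = tag_entities(sentence, DISEASES)
        # Every gene found in the sentence is paired with every disease found.
        for pair in product(sorted(genes), sorted(diseases)):
            counts[pair] += 1
    return counts
```

Calling `cooccurring_pairs` on a list of sentences returns a `Counter` keyed by `(gene, disease)` tuples. Raw co-occurrence counts of this kind are only candidate associations; a real pipeline would add statistical validation and a learned relation classifier on top.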

Author information

Authors and Affiliations

DRDO-BU Center for Life Sciences, Bharathiar University Campus, Coimbatore, Tamilnadu, India
Balu Bhasuran
Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
Balu Bhasuran

Editor information

Editors and Affiliations

Morgridge Institute for Research, University of Wisconsin, Madison, WI, USA
Kalpana Raja

About this protocol

Cite this protocol

Bhasuran, B. (2022). Combining Literature Mining and Machine Learning for Predicting Biomedical Discoveries. In: Raja, K. (eds) Biomedical Text Mining. Methods in Molecular Biology, vol 2496. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2305-3_7

DOI: https://doi.org/10.1007/978-1-0716-2305-3_7
Published: 18 June 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2304-6
Online ISBN: 978-1-0716-2305-3
eBook Packages: Springer Protocols
