Combining Literature Mining and Machine Learning for Predicting Biomedical Discoveries

  • Protocol
Biomedical Text Mining

Part of the book series: Methods in Molecular Biology (MIMB, volume 2496)

Abstract

The major outcomes and insights of scientific research and clinical studies end up as publications or clinical records in unstructured text. Driven by advances in biomedical research, the volume of published literature has grown enormously in recent years. Scientists and clinical researchers face a major challenge in staying current with this knowledge and in extracting hidden information from millions of published biomedical articles. Biomedical literature mining is a potential one-stop automated solution to this problem. One of the long-standing goals in biology is to discover disease-causing genes and their specific roles in personalized precision medicine and drug repurposing. However, empirical approaches and clinical confirmation are expensive and time-consuming. An in silico approach that uses text mining to identify disease-causing genes can contribute to biomarker discovery. This chapter presents a protocol that combines literature mining and machine learning to predict biomedical discoveries, with special emphasis on gene–disease relations. The protocol is organized as a literature-based discovery (LBD) pipeline for gene–disease discovery. It includes our web-based tools: (1) DNER (Disease Named Entity Recognizer) for disease entity recognition, (2) BCCNER (Bidirectional, Contextual clues Named Entity Tagger) for gene/protein entity recognition, (3) DisGeReExT (Disease-Gene Relation Extractor) for statistically validated results and visualization, and (4) a newly introduced deep-learning-based method for association discovery. The proposed deep learning method can be generalized to other important biomedical discovery tasks involving entities such as drugs/chemicals or miRNAs.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Bhasuran, B. (2022). Combining Literature Mining and Machine Learning for Predicting Biomedical Discoveries. In: Raja, K. (eds) Biomedical Text Mining. Methods in Molecular Biology, vol 2496. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2305-3_7

  • DOI: https://doi.org/10.1007/978-1-0716-2305-3_7

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2304-6

  • Online ISBN: 978-1-0716-2305-3

  • eBook Packages: Springer Protocols
