short-paper

Challenges in adapting text mining for full text articles to assist pathway curation

Authors:
K. E. Ravikumar, Mayo Clinic, Rochester, MN
K. B. Wagholikar, Mayo Clinic, Rochester, MN
Hongfang Liu, Mayo Clinic, Rochester, MN

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, September 2014, Pages 551–558. https://doi.org/10.1145/2649387.2649444

Published: 20 September 2014

ABSTRACT

Annotation of biological pathway databases is largely driven by manual effort, with little assistance from text mining. Keeping up with the pace of the ever-growing literature is a great challenge for pathway curators. Recent efforts have tried to fill this gap through text mining, by identifying the relevant papers and the textual evidence pertaining to pathway information. In the current work, we evaluated the performance of a text mining system that extracts events describing molecular pathways from full text articles, and its potential role in assisting manual curation of pathway databases. We specifically investigated the merits of mining full text articles for extracting pathway events by comparing the performance of our system on both full text articles and biomedical abstracts. Preliminary results show that processing full text articles improved the performance of the system by nearly 22%, at the cost of a small drop of 5% in precision, compared with extractions from PubMed abstracts. Preliminary analysis of the text mining results for selected pathways from PharmGKB suggests that pathway curators use their biological knowledge to infer new information that goes beyond what is expressed in either the full text articles or the abstracts. This study is an attempt to identify the magnitude of the gaps that exist between what text mining delivers and what pathway curation demands.

Index Terms

1. Applied computing
  1. Life and medical sciences
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Thesauri

Published in
BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014
851 pages
ISBN: 9781450328944
DOI: 10.1145/2649387
General Chairs:
Pierre Baldi, University of California, Irvine
Wei Wang, University of California, Los Angeles
Copyright © 2014 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
PharmGKB
pathway curation
pathway extraction
text mining
Acceptance Rates
Overall Acceptance Rate: 254 of 885 submissions, 29%