Automatic Extraction of HLA-Disease Interaction Information from Biomedical Literature

Chae, JeongMin; Chae, JiEun; Lee, Taemin; Jung, YoungHee; Oh, HeungBum; Jung, SoonYoung

doi:10.1007/978-3-642-10238-7_18

Part of the book series: Communications in Computer and Information Science (CCIS, volume 28)

Included in the following conference series:

International Conference on Future Generation Communication and Networking

Abstract

HLA controls a variety of functions involved in the immune response and influences susceptibility to more than 40 diseases. It is therefore important to find out how HLA causes a disease or modifies susceptibility to it or its course. In this paper, we develop an automatic HLA-disease information extraction procedure that uses biomedical publications. First, HLA and disease mentions are recognized in the literature using built-in regular languages and the disease categories of MeSH. Second, we generate a parse tree for each sentence from PubMed using the Collins parser. Third, we apply our own information extraction algorithm, which searches the parse trees and extracts relation information from the sentences. Using HaDextract, we automatically collected 10,184 sentences from 66,785 PubMed abstracts. The precision of the extracted relations was 89.6% on 144 randomly selected sentences.
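
The recognition step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expression `HLA_PATTERN`, the term list `DISEASE_TERMS`, and all function names are assumptions made for illustration. The pattern covers only a few common spellings of HLA names, and the term list stands in for a real vocabulary such as the MeSH disease categories. The sketch pairs HLA and disease mentions that co-occur in a sentence; the parse-tree search the paper uses to extract the actual relation is not reproduced here.

```python
import re

# Assumed pattern for HLA gene/allele mentions (e.g. HLA-B27, HLA-DRB1*0401).
# This is an illustrative guess, not the regular language used in the paper.
HLA_PATTERN = re.compile(
    r"\bHLA-(?P<gene>D[PQR](?:[AB]\d)?|Cw|[ABCEFG])"
    r"(?:\*(?P<allele>\d{2,4}(?::\d{2,3})*)|(?P<serotype>\d{1,2}))?"
)

# Hypothetical stand-in for a disease vocabulary such as the MeSH disease categories.
DISEASE_TERMS = ["ankylosing spondylitis", "rheumatoid arthritis", "celiac disease"]


def find_hla_mentions(sentence):
    """Return every substring of the sentence that matches the assumed HLA pattern."""
    return [m.group(0) for m in HLA_PATTERN.finditer(sentence)]


def find_disease_mentions(sentence, terms=DISEASE_TERMS):
    """Return the vocabulary terms that occur in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return [term for term in terms if term in lowered]


def candidate_pairs(sentence):
    """Pair each HLA mention with each disease mention found in the same sentence."""
    hla_mentions = find_hla_mentions(sentence)
    diseases = find_disease_mentions(sentence)
    return [(hla, disease) for hla in hla_mentions for disease in diseases]


sentence = "HLA-B27 is strongly associated with ankylosing spondylitis."
print(candidate_pairs(sentence))
```

Sentence-level co-occurrence of this kind only yields candidate pairs. Deciding whether a sentence actually asserts a relation between the two entities is the job of the later parsing and extraction steps.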

Author information

Authors and Affiliations

Department of Computer Science Education, Korea University, Korea (email: bluesky@comedu.korea.ac.kr)
JeongMin Chae, Taemin Lee, YoungHee Jung & SoonYoung Jung
Department of Computer and Information Science, University of Pennsylvania, USA
JiEun Chae
Department of Laboratory Medicine, Asan Medical Center and University of Ulsan College of Medicine, Korea
HeungBum Oh

Editor information

Editors and Affiliations

Hannam University, 306-791, Daejeon, South Korea
Tai-hoon Kim
Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada
Laurence T. Yang
Department of Computer Science and Engineering, Masan, Kyungnam University, Kyungnam, South Korea
Jong Hyuk Park
National Chung Cheng University, Chiayi County, Taiwan
Alan Chin-Chen Chang
University of Western Macedonia, West Macedonia, Greece
Thanos Vasilakos
University of Western Sydney, Penrith South, NSW, Australia
Yan Zhang
University of Limoges/CNRS, Site Jidé, 83 rue d’Isle, 87000, Limoges, France
Damien Sauveron
University of Plymouth, Plymouth, UK
Xingang Wang
Wonkwang University, Iksan Chonbuk, South Korea
Young-Sik Jeong

About this paper

Cite this paper

Chae, J., Chae, J., Lee, T., Jung, Y., Oh, H., Jung, S. (2009). Automatic Extraction of HLA-Disease Interaction Information from Biomedical Literature. In: Kim, Th., et al. Advances in Computational Science and Engineering. FGCN 2008. Communications in Computer and Information Science, vol 28. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10238-7_18

DOI: https://doi.org/10.1007/978-3-642-10238-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10237-0
Online ISBN: 978-3-642-10238-7
eBook Packages: Computer Science, Computer Science (R0)
