Abstract
Human leukocyte antigens (HLA) control a variety of functions involved in the immune response and influence susceptibility to over 40 diseases. It is important to find out how HLA cause disease or modify susceptibility to it or its course. In this paper, we developed an automatic HLA-disease information extraction procedure that uses biomedical publications. First, HLA and diseases were recognized in the literature using built-in regular languages and the disease categories of MeSH. Second, we generated parse trees for each sentence from PubMed using the Collins parser. Third, we built our own information extraction algorithm, which searched the parse trees and extracted relation information from the sentences. Using HaDextract, we automatically collected 10,184 sentences from 66,785 PubMed abstracts. The precision of the extracted relations was 89.6% on 144 randomly selected sentences.
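The first step of the pipeline, recognizing HLA and disease mentions, can be illustrated with a minimal sketch. This is not the authors' implementation: the pattern `HLA_PATTERN`, the toy lexicon `DISEASE_TERMS`, and the function names are all hypothetical, and plain sentence-level co-occurrence stands in for the parse-tree search that the paper actually uses to extract relations.

```python
import re

# Hypothetical pattern for HLA mentions such as "HLA-B27" or "HLA-DRB1*0401".
# The paper's actual regular languages are not given in the abstract.
HLA_PATTERN = re.compile(r"\bHLA-[A-Z]{1,3}[0-9]*(?:\*[0-9]{2,4}(?::[0-9]{2})*)?")

# Toy lexicon standing in for the MeSH disease categories (assumption).
DISEASE_TERMS = ["ankylosing spondylitis", "rheumatoid arthritis", "celiac disease"]


def find_hla(sentence):
    """Return all HLA mentions matched by the hypothetical pattern."""
    return HLA_PATTERN.findall(sentence)


def find_diseases(sentence):
    """Return lexicon terms that occur in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return [term for term in DISEASE_TERMS if term in lowered]


def candidate_pairs(sentence):
    """Pair every HLA mention with every disease mention in one sentence.

    Co-occurrence only proposes candidates; deciding whether a relation
    is really stated would need the syntactic analysis described above.
    """
    diseases = find_diseases(sentence)
    return [(hla, disease) for hla in find_hla(sentence) for disease in diseases]
```

Calling `candidate_pairs` on a sentence that mentions one HLA allele and one listed disease yields a single (HLA, disease) tuple; a sentence with neither yields an empty list. A real system would replace the toy lexicon with the MeSH vocabulary and filter the candidates using the parse tree.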
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chae, J., Chae, J., Lee, T., Jung, Y., Oh, H., Jung, S. (2009). Automatic Extraction of HLA-Disease Interaction Information from Biomedical Literature. In: Kim, Th., et al. Advances in Computational Science and Engineering. FGCN 2008. Communications in Computer and Information Science, vol 28. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10238-7_18
DOI: https://doi.org/10.1007/978-3-642-10238-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10237-0
Online ISBN: 978-3-642-10238-7
eBook Packages: Computer Science (R0)