Corpus Construction for Extracting Disease-Gene Relations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7661)

Abstract

Many corpus-based statistical methods have been used to extract disease-gene relations (DGRs) from the literature. The corpus-based approach has two limitations: first, the corpora available for training a system are insufficient; second, most previous research has dealt only with a binary relation rather than with various types of DGRs. In other words, the common issue has been detecting the presence of a relation itself. However, a binary relation is not enough to explain DGRs in practice. One solution is to construct a corpus that supports analysis of various types of relations between diseases and their related genes.

This article describes a corpus construction process for DGRs. Eleven topics of relations were defined by biologists. Four annotators participated in the corpus annotation task, and their inter-annotator agreement was calculated to show the reliability of the annotation results.

The gold standard data produced by the proposed approach can be used to improve performance on many research tasks, for example recognition of gene and disease names and extraction of fine-grained DGRs. The corpus will be released through the GENIA project home page.
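
The abstract does not say which agreement measure was used for the four annotators. As an illustration only, the sketch below computes Fleiss' kappa, one common choice when more than two annotators assign categorical labels. The function name, the topic labels `T1` to `T3`, and the sample ratings are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labelled by the same number of raters.

    ratings    -- list of items; each item is a list of labels, one per rater
    categories -- all labels that may occur
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    # counts[i][c] = number of raters who assigned category c to item i
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the per-item agreement P_i
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category
    p_cat = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        # Degenerate case: only one category is ever used
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical data: five candidate relations, four annotators, three topic labels
topics = ["T1", "T2", "T3"]
sample = [
    ["T1", "T1", "T1", "T1"],
    ["T2", "T2", "T2", "T1"],
    ["T3", "T3", "T3", "T3"],
    ["T1", "T2", "T2", "T2"],
    ["T3", "T3", "T1", "T3"],
]
print(round(fleiss_kappa(sample, topics), 3))
```

A value of 1 means perfect agreement, and values near 0 mean agreement no better than chance. For a real corpus, `topics` would hold the eleven relation topics defined by the biologists.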

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chun, HW., Song, SK., Choi, SP., Jung, H. (2012). Corpus Construction for Extracting Disease-Gene Relations. In: Chen, L., Felfernig, A., Liu, J., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2012. Lecture Notes in Computer Science, vol 7661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34624-8_33

  • DOI: https://doi.org/10.1007/978-3-642-34624-8_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34623-1

  • Online ISBN: 978-3-642-34624-8

  • eBook Packages: Computer Science, Computer Science (R0)
