
Developing Position Structure-Based Framework for Chinese Entity Relation Extraction

Published: 01 September 2011

Abstract

Relation extraction is the task of finding semantic relations between two entities in text, and it is often cast as a classification problem. In contrast to the significant achievements for English, progress in Chinese relation extraction has been relatively limited. In this article, we present a novel Chinese relation extraction framework that is based mainly on a 9-position structure. The design of this structure is motivated by the observation that there are clear connections between relation types/subtypes and the position structures of the two entities. The 9-position structure can be captured with less effort than deep natural language processing requires, and it effectively relieves the class imbalance problem, which often hurts classification performance. None of the features in our framework requires Chinese word segmentation, which has long limited the performance of Chinese language processing. We also use correction and inference mechanisms to further improve the classification results. Experiments on the ACE 2005 Chinese data set show that the 9-position structure feature provides strong support for Chinese relation extraction, and that the other strategies further improve performance.


    • Published in

      ACM Transactions on Asian Language Information Processing, Volume 10, Issue 3
      September 2011
      114 pages
      ISSN: 1530-0226
      EISSN: 1558-3430
      DOI: 10.1145/2002980

      Copyright © 2011 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 2011
      • Accepted: 1 April 2011
      • Revised: 1 February 2011
      • Received: 1 November 2010

      Qualifiers

      • research-article
      • Research
      • Refereed
