Abstract
Relation extraction is the task of identifying semantic relations between two entities in text, and it is often cast as a classification problem. Compared with the significant progress made for English, research on Chinese relation extraction remains relatively limited. In this article, we present a novel Chinese relation extraction framework built mainly on a 9-position structure. The design of this structure is motivated by the observation that relation types and subtypes are clearly connected to the position structures of the two entities. The 9-position structure can be captured with less effort than deep natural language processing requires, and it is effective in relieving the class imbalance problem, which often hurts classification performance. None of the features in our framework requires Chinese word segmentation, which has long limited the performance of Chinese language processing. We also apply correction and inference mechanisms to further improve the classification results. Experiments on the ACE 2005 Chinese data set show that the 9-position structure feature provides strong support for Chinese relation extraction, and that the other strategies further improve performance.
Index Terms
- Developing Position Structure-Based Framework for Chinese Entity Relation Extraction