Abstract
This paper addresses Chinese word segmentation on judgements. The main challenge for this task is the lack of an annotated corpus. To alleviate this challenge, the paper proposes an active learning approach. Specifically, starting from a few initial annotated samples, the approach selects informative characters and then selects the context around these characters for annotation. When selecting informative characters, it considers not only the uncertainty of a sample but also its redundancy. Furthermore, the paper adopts a local annotation strategy, which selects substrings around the informative characters rather than whole sentences and thus further reduces the annotation effort. The empirical study demonstrates that the proposed approach effectively reduces the annotation cost and performs better than baseline sample selection strategies under the same scale of annotation.
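The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how uncertainty or redundancy is measured, so the sketch assumes entropy over per-character tag probabilities for uncertainty and a "context already selected" check for redundancy. The names `uncertainty`, `select_substrings`, `window`, and `seen` are hypothetical, as is the choice of a three-character context.

```python
import math
from collections import Counter


def uncertainty(probs):
    """Entropy of one character's tag distribution (assumed uncertainty measure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_substrings(sentence, tag_probs, budget, window=2, seen=None):
    """Pick up to `budget` informative characters and return local spans.

    sentence  : string of characters
    tag_probs : one probability list per character, from any segmenter
    window    : characters kept on each side of a selected character
    seen      : Counter of contexts already chosen (assumed redundancy proxy)
    Returns a list of (start, end, substring) tuples to hand to an annotator.
    """
    seen = Counter() if seen is None else seen
    # Most uncertain positions first; ties keep their original order.
    ranked = sorted(
        range(len(sentence)),
        key=lambda i: uncertainty(tag_probs[i]),
        reverse=True,
    )
    spans = []
    for i in ranked:
        if len(spans) >= budget:
            break
        # Context = the character plus its immediate neighbours.
        context = sentence[max(0, i - 1): i + 2]
        if seen[context] > 0:
            continue  # skip a context that was already selected
        seen[context] += 1
        start = max(0, i - window)
        end = min(len(sentence), i + window + 1)
        # Local annotation: only the substring is annotated, not the sentence.
        spans.append((start, end, sentence[start:end]))
    return spans
```

In this sketch, passing the same `seen` counter across sentences lets the redundancy check span the whole unlabeled pool, and `window` controls how much context an annotator sees around each selected character.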
Acknowledgments
This research work has been partially supported by three NSFC grants, No. 61375073, No. 61672366 and No. 61331011.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Yan, Q., Wang, L., Li, S., Liu, H., Zhou, G. (2018). Active Learning for Chinese Word Segmentation on Judgements. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_73
DOI: https://doi.org/10.1007/978-3-319-73618-1_73
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science, Computer Science (R0)