Abstract
Previous work has shown that jointly modeling two Natural Language Processing (NLP) tasks is effective for improving the performance of both, and many task-specific joint models have been proposed. This paper proposes a Hierarchical Long Short-Term Memory (HLSTM) model, together with several variants, for modeling two tasks jointly. The models are flexible enough to handle different types of task combinations and avoid task-specific feature engineering. Besides exploiting the correlation between the two tasks, our models also take their hierarchical relation into consideration, which has not been discussed in previous work. Experimental results show that our models outperform strong baselines on three different types of task combinations. While both the correlation and the hierarchical relation between the two tasks help improve performance on both, the models especially boost the performance of the task at the top of the hierarchy.
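Since the full model description is behind the paywall, the following is only a minimal sketch of the architecture the abstract suggests: two stacked LSTM layers, where the lower layer tags the lower-level task and the upper layer consumes the lower layer's hidden states to tag the higher-level task, so the top task can use lower-level information. It is written in PyTorch rather than the authors' Theano setup, and all names, wiring details, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HierarchicalLSTM(nn.Module):
    """Sketch of a two-level hierarchical LSTM for joint sequence labeling.

    The lower LSTM tags the lower-level task (e.g. word segmentation or
    slot filling); the upper LSTM stacks on the lower LSTM's hidden states
    and tags the higher-level task (e.g. POS tagging or intent detection).
    """

    def __init__(self, vocab_size, emb_dim, hidden_dim, n_low_tags, n_high_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Lower LSTM reads word embeddings and feeds the lower task's tagger.
        self.lstm_low = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out_low = nn.Linear(hidden_dim, n_low_tags)
        # Upper LSTM consumes the lower layer's hidden states, so the
        # higher-level task sits on top of the hierarchy.
        self.lstm_high = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out_high = nn.Linear(hidden_dim, n_high_tags)

    def forward(self, tokens):
        h_low, _ = self.lstm_low(self.embed(tokens))
        h_high, _ = self.lstm_high(h_low)
        # One tag score vector per token, for each of the two tasks.
        return self.out_low(h_low), self.out_high(h_high)

# Usage: a batch of 2 sentences of 20 token ids (all sizes hypothetical).
model = HierarchicalLSTM(vocab_size=10000, emb_dim=64, hidden_dim=128,
                         n_low_tags=5, n_high_tags=12)
low_scores, high_scores = model(torch.randint(0, 10000, (2, 20)))
```

Training would sum the two tasks' tagging losses; this is only one plausible reading of "hierarchical", with the upper task stacked directly on the lower task's representations.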
Notes
- 2. Sentences with ‘request’ intent are not included, since those sentences never contain slot values.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (No. 61273365), the discipline building plan in 111 base (No. B08004), the Engineering Research Center of Information Networks of MOE, and the Co-construction Program with the Beijing Municipal Commission of Education.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Zhou, Q., Wen, L., Wang, X., Ma, L., Wang, Y. (2016). A Hierarchical LSTM Model for Joint Tasks. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_27
DOI: https://doi.org/10.1007/978-3-319-47674-2_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47673-5
Online ISBN: 978-3-319-47674-2