Abstract
Named Entity Recognition (NER) is a well-studied area of Natural Language Processing. Traditional NER systems, such as the Stanford NER system, achieve high performance on formal, grammatically well-structured text. However, when these systems are applied to informal and noisy text, which mixes language with emoticons and abbreviations, their results degrade significantly. We attempt to fill this gap by developing an NER system that uses novel term features, including Word2vec-based features, together with a machine learning based classifier. We describe the features and the Word2vec implementation used in our solution and report the results obtained by our system. The system is efficient and scalable in terms of classification time complexity and shows promising results, which could potentially be improved with larger training sets or with semi-supervised classifiers.
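As a rough illustration of combining term-level features with Word2vec-based features, the sketch below builds one feature vector per token by concatenating a few surface cues with an embedding lookup. The specific features, the toy embedding table, and the function names are assumptions made for illustration only; they are not the feature set or the Word2vec model described in the paper.

```python
from typing import Dict, List

# Hypothetical toy embedding table. In practice these vectors would come
# from a Word2vec model trained on a large corpus of tweets.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "london": [0.9, 0.1, 0.0],
    "obama": [0.2, 0.8, 0.1],
    "lol": [0.0, 0.1, 0.9],
}
EMBEDDING_DIM = 3


def token_features(
    token: str,
    embeddings: Dict[str, List[float]] = TOY_EMBEDDINGS,
    dim: int = EMBEDDING_DIM,
) -> List[float]:
    """Concatenate simple surface features with the token's embedding."""
    surface = [
        1.0 if token[:1].isupper() else 0.0,                 # starts with a capital
        1.0 if token.isupper() and len(token) > 1 else 0.0,  # written in all caps
        1.0 if token.startswith("#") else 0.0,               # hashtag
        1.0 if token.startswith("@") else 0.0,               # user mention
        1.0 if any(ch.isdigit() for ch in token) else 0.0,   # contains a digit
    ]
    # Look up the lowercased token without leading '#' or '@' characters;
    # out-of-vocabulary tokens fall back to a zero vector.
    key = token.lstrip("#@").lower()
    vector = embeddings.get(key, [0.0] * dim)
    return surface + list(vector)


def tweet_features(tweet: str) -> List[List[float]]:
    """Whitespace-tokenize a tweet and build one feature vector per token."""
    return [token_features(tok) for tok in tweet.split()]
```

Each vector returned by `tweet_features` could then be passed, together with a per-token entity label, to any standard supervised classifier. Because each token needs only one dictionary lookup and a handful of string checks, feature extraction stays linear in the number of tokens.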
Acknowledgements
The co-authors Mete Taşpınar and Murat Can Ganiz would like to thank Buğse Erdoğan and Fahriye Gün from Marmara University @BIGDaTA_Lab for their help. This work is supported in part by Marmara University BAP D type project.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Taşpınar, M., Ganiz, M.C., Acarman, T. (2017). A Feature Based Simple Machine Learning Approach with Word Embeddings to Named Entity Recognition on Tweets. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_30
DOI: https://doi.org/10.1007/978-3-319-59569-6_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59568-9
Online ISBN: 978-3-319-59569-6
eBook Packages: Computer Science, Computer Science (R0)