A Feature Based Simple Machine Learning Approach with Word Embeddings to Named Entity Recognition on Tweets

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10260)

Abstract

Named Entity Recognition (NER) is a well-studied area of Natural Language Processing. Traditional NER systems, such as the Stanford NER system, achieve high performance on formal, grammatically well-structured text. However, when these systems are applied to informal and noisy text that mixes language with emoticons or abbreviations, results degrade significantly. We attempt to fill this gap by developing an NER system that uses novel term features, including Word2vec-based features, together with a machine learning based classifier. We describe the features and the Word2vec implementation used in our solution and report the results obtained by our system. The system is efficient and scalable in terms of classification time complexity and shows promising results, which could potentially be improved with larger training sets or with semi-supervised classifiers.

Acknowledgements

The co-authors Mete Taşpınar and Murat Can Ganiz would like to thank Buğse Erdoğan and Fahriye Gün from Marmara University @BIGDaTA_Lab for their help. This work is supported in part by Marmara University BAP D type project.

Author information

Corresponding author

Correspondence to Mete Taşpınar.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Taşpınar, M., Ganiz, M.C., Acarman, T. (2017). A Feature Based Simple Machine Learning Approach with Word Embeddings to Named Entity Recognition on Tweets. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_30

  • DOI: https://doi.org/10.1007/978-3-319-59569-6_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59568-9

  • Online ISBN: 978-3-319-59569-6

  • eBook Packages: Computer Science, Computer Science (R0)
