DOI: 10.1145/2824864.2824882
research-article

AMRITA_CEN@FIRE-2014: Named Entity Recognition for Indian Languages using Rich Features

Published: 05 December 2014

ABSTRACT

This paper implements Named Entity Recognition (NER) for four languages: English, Tamil, Hindi and Malayalam. The results of this work were submitted to the research evaluation workshop Forum for Information Retrieval Evaluation (FIRE 2014). The system detects three levels of named entity tags, which are referred to as nested named entities. This is a multi-label problem, which we solve using the chain classifier method. Conditional Random Fields (CRF) and Support Vector Machines (SVM) are used to implement the NER system. For FIRE 2014, we developed an English NER system using CRF, while the NER systems for Tamil, Hindi and Malayalam were based on SVM. FIRE estimated the average precision across all four languages as 41.93 for the outermost level and 33.25 for the inner level. To improve performance on the Indian languages, we implemented a CRF-based NER system for the same corpus in Tamil, Hindi and Malayalam. The average precision for these languages is 42.87 for the outer level and 36.31 for the inner level. The overall performance of the NER system thus improved, in relative terms, by 2.24% for the outer level and 9.20% for the inner level.
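The reported improvements are consistent with relative gains rather than absolute differences in precision points. The short sketch below reproduces that arithmetic from the four precision figures quoted in the abstract. The function name `relative_improvement` is an illustrative choice, not something taken from the paper, and the calculation assumes the two sets of averages are directly comparable.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage of `baseline`."""
    return (improved - baseline) / baseline * 100.0


# Average precision figures quoted in the abstract:
# FIRE 2014 submission versus the later CRF-based systems.
outer_gain = relative_improvement(41.93, 42.87)
inner_gain = relative_improvement(33.25, 36.31)

print(f"outer level: {outer_gain:.2f}%")
print(f"inner level: {inner_gain:.2f}%")
```

In absolute terms the gains are 0.94 precision points for the outer level and 3.06 points for the inner level.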

    Published in

      FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation
      December 2014
      151 pages
      ISBN:9781450337557
      DOI:10.1145/2824864
      Editors: Prasenjit Majumder, Mandar Mitra, Sukomal Pal, Madhulika Agrawal, Parth Mehta

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 19 of 64 submissions, 30%
