ABSTRACT
This paper implements Named Entity Recognition (NER) for four languages: English, Tamil, Hindi and Malayalam. The results of this work were submitted to the research evaluation workshop of the Forum for Information Retrieval Evaluation (FIRE 2014). The system detects named entity tags at three levels, which are referred to as nested named entities. Assigning tags at these levels is a multi-label problem, which we solve using the chain classifier method. Conditional Random Fields (CRF) and Support Vector Machines (SVM) are used to implement the NER systems. For FIRE 2014, we developed the English NER system using CRF and the Tamil, Hindi and Malayalam NER systems using SVM. FIRE reported an average precision over all four languages of 41.93 at the outermost level and 33.25 at the inner level. To improve performance on the Indian languages, we then implemented a CRF-based NER system on the same Tamil, Hindi and Malayalam corpus. Its average precision for these three languages is 42.87 at the outer level and 36.31 at the inner level. Relative to the FIRE 2014 figures, overall performance improved by 2.24% at the outer level and 9.20% at the inner level.
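The chain classifier method for nested tags can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system described in the paper. `MajorityClassifier` is a hypothetical stand-in for the CRF or SVM models: it memorizes the most frequent tag for each exact feature tuple and, unlike a CRF, looks at each token in isolation. The token features, tag names and toy sentence are also invented for illustration. What the sketch does show is the chaining itself. The classifier for level k receives the base token features plus the tags of levels 0 to k-1. During training these are the gold tags, and at prediction time they are the chain's own earlier predictions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Features = Tuple[str, ...]  # base features of one token, e.g. (word,)
Tags = Tuple[str, ...]      # one tag per nesting level, outermost first


class MajorityClassifier:
    """Hypothetical stand-in for a CRF/SVM model.

    Predicts the tag seen most often with an exact feature tuple during
    training, falling back to the overall most frequent tag. Unlike a CRF,
    it looks at each token in isolation.
    """

    def fit(self, X: Sequence[Features], y: Sequence[str]) -> "MajorityClassifier":
        counts: Dict[Features, Counter] = defaultdict(Counter)
        for x, tag in zip(X, y):
            counts[x][tag] += 1
        self.table = {x: c.most_common(1)[0][0] for x, c in counts.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence[Features]) -> List[str]:
        return [self.table.get(x, self.default) for x in X]


class ChainTagger:
    """Classifier chain over nested tag levels (level 0 = outermost)."""

    def __init__(self, n_levels: int = 3) -> None:
        self.n_levels = n_levels
        self.models: List[MajorityClassifier] = []

    def fit(self, X: Sequence[Features], Y: Sequence[Tags]) -> "ChainTagger":
        self.models = []
        for k in range(self.n_levels):
            # Training: level k sees base features plus gold tags of levels 0..k-1.
            Xk = [x + tuple(tags[:k]) for x, tags in zip(X, Y)]
            yk = [tags[k] for tags in Y]
            self.models.append(MajorityClassifier().fit(Xk, yk))
        return self

    def predict(self, X: Sequence[Features]) -> List[Tags]:
        preds: List[List[str]] = [[] for _ in X]
        for model in self.models:
            # Prediction: level k sees the tags this chain already predicted
            # for levels 0..k-1, then appends its own tag for each token.
            Xk = [x + tuple(p) for x, p in zip(X, preds)]
            for p, tag in zip(preds, model.predict(Xk)):
                p.append(tag)
        return [tuple(p) for p in preds]


# Hypothetical toy data: an outer ORG span containing an inner LOC span.
X = [("Madras",), ("University",), ("is",), ("in",), ("Chennai",)]
Y = [("B-ORG", "B-LOC", "O"), ("I-ORG", "O", "O"),
     ("O", "O", "O"), ("O", "O", "O"), ("B-LOC", "O", "O")]

tagger = ChainTagger(n_levels=3).fit(X, Y)
print(tagger.predict(X))
print(tagger.predict([("Delhi",)]))
```

In a real system the stand-in would be replaced by a sequence model such as a CRF, with the earlier-level tags supplied as extra feature columns. Training each level on gold tags follows the usual classifier-chain recipe. Training on predicted tags is an alternative worth considering if errors at the outer level propagate down the chain.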
Index Terms
- AMRITA_CEN@FIRE-2014: Named Entity Recognition for Indian Languages using Rich Features