ABSTRACT
In this paper, we study gender bias in named entity recognition (NER) models, specifically the difference in their ability to recognize male and female names as PERSON entities. We evaluate NER models on a dataset containing 139 years of U.S. census baby names and find that relatively more female names than male names are not recognized as PERSON entities. This analysis yields a new benchmark for evaluating gender bias in NER systems. The data and code for applying this benchmark are publicly available for researchers to use.
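The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's released code. The names `unrecognized_rate`, `gender_gap`, and `toy_tagger` are hypothetical, the tagger is a stand-in lookup table rather than a real NER model, and the unweighted rate used here is an assumption; the actual benchmark may, for example, weight names by frequency or break results down by year.

```python
from typing import Callable, Dict, Iterable


def unrecognized_rate(names: Iterable[str], tag_fn: Callable[[str], str]) -> float:
    """Return the fraction of names that tag_fn does not label as PERSON."""
    names = list(names)
    if not names:
        raise ValueError("names must be non-empty")
    missed = sum(1 for name in names if tag_fn(name) != "PERSON")
    return missed / len(names)


def gender_gap(
    female_names: Iterable[str],
    male_names: Iterable[str],
    tag_fn: Callable[[str], str],
) -> Dict[str, float]:
    """Compare unrecognized rates; a positive gap means female names are missed more often."""
    female_rate = unrecognized_rate(female_names, tag_fn)
    male_rate = unrecognized_rate(male_names, tag_fn)
    return {"female": female_rate, "male": male_rate, "gap": female_rate - male_rate}


def toy_tagger(name: str) -> str:
    """Hypothetical stand-in for an NER model: a fixed lookup, 'O' for unknown names."""
    known = {"John": "PERSON", "Mary": "PERSON", "Paris": "LOCATION"}
    return known.get(name, "O")


# Illustrative inputs only; these are not census data.
result = gender_gap(["Mary", "Paris", "Charlotte"], ["John", "Austin"], toy_tagger)
print(result)
```

To evaluate a real model, `toy_tagger` would be replaced by a function that runs the model on a name (or on a template sentence containing it) and returns the predicted entity label.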
Index Terms
- Man is to Person as Woman is to Location: Measuring Gender Bias in Named Entity Recognition