DOI: 10.1145/3589335.3651446
Short paper · Free access

Can LLM Substitute Human Labeling? A Case Study of Fine-grained Chinese Address Entity Recognition Dataset for UAV Delivery

Published: 13 May 2024

ABSTRACT

We present CNER-UAV, a fine-grained Chinese Named Entity Recognition (NER) dataset specifically designed for address resolution in Unmanned Aerial Vehicle (UAV) delivery systems. The dataset covers five entity categories, enabling comprehensive training and evaluation of NER models. To construct it, we sourced data from a real-world UAV delivery system and applied a rigorous cleaning and desensitization process to ensure privacy and data integrity. The resulting dataset of around 12,000 samples was annotated by both human experts and Large Language Models (LLMs). We evaluated classical NER models on the dataset and provide an in-depth analysis. The dataset and models are publicly available at https://github.com/zhhvvv/CNER-UAV.
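As a concrete illustration of the paper's central question, the sketch below compares LLM-produced tags against human gold labels at the token level, which is the basic measurement behind any "can LLMs substitute human labeling?" study. This is not the authors' evaluation code: the BIO tag names (B-BUILDING, B-UNIT, B-ROOM, ...) and the example address are hypothetical placeholders for the dataset's five categories, which ship with the repository above.

```python
# A minimal sketch (not the authors' code) of comparing LLM annotations to
# human gold labels for token-level address NER. The BIO tag names below are
# hypothetical placeholders; the real five-category label set ships with the
# dataset at https://github.com/zhhvvv/CNER-UAV.

def token_agreement(human: list[str], llm: list[str]) -> float:
    """Fraction of tokens where the LLM tag matches the human tag."""
    assert len(human) == len(llm), "annotations must cover the same tokens"
    return sum(h == l for h, l in zip(human, llm)) / len(human)

def tag_micro_f1(human: list[str], llm: list[str]) -> float:
    """Micro-F1 over (position, tag) pairs for non-'O' tokens -- a crude
    stand-in for span-level NER F1."""
    gold = {(i, t) for i, t in enumerate(human) if t != "O"}
    pred = {(i, t) for i, t in enumerate(llm) if t != "O"}
    if not gold and not pred:
        return 1.0
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

if __name__ == "__main__":
    # One made-up, desensitized address, tokenized per character as is
    # common for Chinese NER.
    human = ["B-BUILDING", "I-BUILDING", "B-UNIT", "I-UNIT", "B-ROOM", "I-ROOM", "O"]
    llm   = ["B-BUILDING", "I-BUILDING", "B-UNIT", "I-UNIT", "O",      "B-ROOM", "O"]
    print(f"token agreement: {token_agreement(human, llm):.2f}")  # 0.71
    print(f"tag micro-F1:    {tag_micro_f1(human, llm):.2f}")     # 0.73
```

Token-level agreement and a span-style F1 against human gold labels are the kind of headline numbers such a comparison would report; the paper's own analysis of classical NER models is available with the dataset.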


Supplemental Material

rsp6956.mov — supplemental video (MOV, 44.7 MB)


Published in

WWW '24: Companion Proceedings of the ACM on Web Conference 2024
May 2024, 1928 pages
ISBN: 9798400701726
DOI: 10.1145/3589335
Copyright © 2024 ACM


Publisher

Association for Computing Machinery, New York, NY, United States




Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
