Abstract
Few-shot sequence labeling aims to identify novel classes from only a few labeled samples. Existing methods address the data-scarcity problem mainly by designing token-level or span-level labeling models based on metric learning. However, each of these methods is trained at a single granularity (i.e., either token level or span level) and inherits the weaknesses of that granularity. In this article, we first unify token- and span-level supervision and propose a Consistent Dual Adaptive Prototypical (CDAP) network for few-shot sequence labeling. CDAP contains a token-level network and a span-level network, jointly trained at their respective granularities. To align the outputs of the two networks, we further propose a consistent loss that enables them to learn from each other. During inference, we propose a consistent greedy inference algorithm that first adjusts the predicted probabilities and then greedily selects non-overlapping spans with maximum probability. Extensive experiments show that our model achieves new state-of-the-art results on three benchmark datasets. All the code and data of this work will be released at https://github.com/zifengcheng/CDAP.
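The second stage of the inference procedure described above (greedily keeping non-overlapping spans in order of probability) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the probability-adjustment step that combines token- and span-level outputs is omitted, and the `span_scores` interface (a mapping from inclusive `(start, end)` spans to a best non-O label and its probability) is a hypothetical simplification.

```python
def greedy_decode(span_scores, seq_len):
    """Greedily select non-overlapping spans by descending probability.

    span_scores: dict mapping inclusive (start, end) spans to a
    (label, probability) pair -- a hypothetical interface for this sketch.
    Returns a list of (start, end, label) triples sorted by position.
    """
    chosen = []
    occupied = [False] * seq_len  # tokens already covered by a kept span
    # Visit candidate spans from highest to lowest probability.
    for (start, end), (label, prob) in sorted(
            span_scores.items(), key=lambda kv: kv[1][1], reverse=True):
        if any(occupied[start:end + 1]):
            continue  # overlaps a higher-probability span already kept
        chosen.append((start, end, label))
        for i in range(start, end + 1):
            occupied[i] = True
    return sorted(chosen)


# Toy usage: (1, 2) overlaps the higher-probability span (0, 1), so it
# is discarded, while the disjoint span (3, 3) is kept.
scores = {(0, 1): ("PER", 0.9), (1, 2): ("LOC", 0.8), (3, 3): ("ORG", 0.7)}
print(greedy_decode(scores, seq_len=4))  # → [(0, 1, 'PER'), (3, 3, 'ORG')]
```

The greedy order guarantees that whenever two candidate spans conflict, the one the model is more confident about survives, which is the property the abstract's "non-overlapping spans with maximum probability" criterion asks for.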
Index Terms
- Unifying Token- and Span-level Supervisions for Few-shot Sequence Labeling