Abstract
The COVID-19 pandemic has presented significant challenges to the healthcare industry and society as a whole. With the rapid development of COVID-19 vaccines, social media platforms have become a popular medium for discussions of vaccine-related topics. Identifying and analyzing vaccine-related tweets can provide valuable insights for public health researchers and policymakers, but manually annotating a large number of tweets is time-consuming and expensive. In this study, we evaluate the use of a large language model, GPT-4 (March 23 version), together with weak supervision, to identify COVID-19 vaccine-related tweets, with the goal of comparing its performance against that of human annotators. We leveraged a manually curated gold-standard dataset and used GPT-4 to assign labels without any additional fine-tuning or instruction, in a single-shot mode (no additional prompting).
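The single-shot labeling setup described above can be sketched as follows. This is a minimal illustration using the 2023-era OpenAI chat completions API; the prompt wording, the `label_tweet` helper, and the yes/no parsing are assumptions for illustration, not the authors' exact protocol.

```python
# Hypothetical sketch: single-shot GPT-4 labeling of tweets as
# vaccine-related or not. One request per tweet, no examples in the
# prompt, no fine-tuning. Prompt text and parsing are assumptions.

PROMPT_TEMPLATE = (
    "Does the following tweet report that the author received a "
    "COVID-19 vaccine? Answer with exactly one word: yes or no.\n\n"
    "Tweet: {tweet}"
)

def build_prompt(tweet: str) -> str:
    """Fill the single-shot prompt template with one tweet."""
    return PROMPT_TEMPLATE.format(tweet=tweet)

def parse_label(model_reply: str) -> int:
    """Map the model's free-text reply to a binary label (1 = vaccine-related)."""
    return 1 if model_reply.strip().lower().startswith("yes") else 0

def label_tweet(tweet: str, client) -> int:
    """Send one tweet per request to GPT-4 and return its binary label."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": build_prompt(tweet)}],
        temperature=0,  # reduce label variability across runs
    )
    return parse_label(resp.choices[0].message.content)
```

The resulting labels can then be compared against the gold-standard annotations with standard agreement measures such as Cohen's kappa.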
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tekumalla, R., Banda, J.M. (2023). Leveraging Large Language Models and Weak Supervision for Social Media Data Annotation: An Evaluation Using COVID-19 Self-reported Vaccination Tweets. In: Mori, H., Asahi, Y., Coman, A., Vasilache, S., Rauterberg, M. (eds) HCI International 2023 – Late Breaking Papers. HCII 2023. Lecture Notes in Computer Science, vol 14056. Springer, Cham. https://doi.org/10.1007/978-3-031-48044-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48043-0
Online ISBN: 978-3-031-48044-7