
Leveraging Large Language Models and Weak Supervision for Social Media Data Annotation: An Evaluation Using COVID-19 Self-reported Vaccination Tweets

  • Conference paper
  • First Online:
HCI International 2023 – Late Breaking Papers (HCII 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14056)


Abstract

The COVID-19 pandemic has presented significant challenges to the healthcare industry and society as a whole. With the rapid development of COVID-19 vaccines, social media platforms have become a popular medium for discussions on vaccine-related topics. Identifying and analyzing vaccine-related tweets can provide valuable insights for public health researchers and policymakers. However, manual annotation of a large number of tweets is time-consuming and expensive. In this study, we evaluate the use of Large Language Models, in this case GPT-4 (March 23 version), and weak supervision to identify COVID-19 vaccine-related tweets, with the goal of comparing their performance against that of human annotators. We leveraged a manually curated gold-standard dataset and used GPT-4 to provide labels without any additional fine-tuning or instruction, in a single-shot mode (no additional prompting).
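To make the labeling setup concrete, the listing below is a minimal sketch of how such single-shot GPT-4 annotation could be implemented with the OpenAI Python client (pre-1.0 interface). The prompt wording, label set, and helper name are illustrative assumptions, not the authors' actual prompt or code.

    # Hypothetical sketch: single-shot labeling of tweets with GPT-4 via the
    # OpenAI chat API. Prompt and labels are illustrative assumptions only.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    SYSTEM_PROMPT = (
        "You are a data annotator. Reply with a single word: 'yes' if the tweet "
        "is a self-report of receiving a COVID-19 vaccine, 'no' otherwise."
    )

    def label_tweet(tweet_text: str) -> str:
        """Request a single yes/no label for one tweet from GPT-4."""
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": tweet_text},
            ],
            temperature=0,  # keep labels as deterministic as possible
        )
        return response["choices"][0]["message"]["content"].strip().lower()

    # Example usage:
    # label_tweet("Just got my second dose today, arm is sore but relieved!")

In such a setup, the returned yes/no labels can then be compared against the gold-standard human annotations to compute agreement and classification metrics.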



Author information


Corresponding author

Correspondence to Juan M. Banda.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Tekumalla, R., Banda, J.M. (2023). Leveraging Large Language Models and Weak Supervision for Social Media Data Annotation: An Evaluation Using COVID-19 Self-reported Vaccination Tweets. In: Mori, H., Asahi, Y., Coman, A., Vasilache, S., Rauterberg, M. (eds) HCI International 2023 – Late Breaking Papers. HCII 2023. Lecture Notes in Computer Science, vol 14056. Springer, Cham. https://doi.org/10.1007/978-3-031-48044-7_26


  • DOI: https://doi.org/10.1007/978-3-031-48044-7_26

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48043-0

  • Online ISBN: 978-3-031-48044-7

  • eBook Packages: Computer Science, Computer Science (R0)
