Abstract
Since the outset of the COVID-19 pandemic, social media has provided a platform for sharing and discussing experiences in real time. This rich source of information may also help researchers uncover evolving insights into post-acute sequelae of SARS-CoV-2 (PASC), commonly referred to as Long COVID. To leverage social media data, we propose using entity-extraction methods to provide clinical insights before defining downstream tasks. In this work, we address the gap between state-of-the-art entity recognition models and the extraction of clinically relevant entities, which may help provide explanations when deriving insights from Twitter data. We then propose an approach to bridge this gap by using existing configurable tools and datasets to enhance the capabilities of these models. Code for this work is available at: https://github.com/VectorInstitute/ProjectLongCovid-NER.
Acknowledgements
The authors would like to thank the Vector Institute for making this collaboration possible and for providing academic infrastructure and computing support during all phases of this work. We would also like to thank Antoaneta Vladimirova and Celine Leng from Roche and Esmat Sahak from the University of Toronto for their support throughout this project, as well as Dr. Angela Cheung from the University Health Network for her expertise and guidance.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Bhambhoria, R. et al. (2023). Towards Providing Clinical Insights on Long Covid from Twitter Data. In: Shaban-Nejad, A., Michalowski, M., Bianco, S. (eds) Multimodal AI in Healthcare. Studies in Computational Intelligence, vol 1060. Springer, Cham. https://doi.org/10.1007/978-3-031-14771-5_19
DOI: https://doi.org/10.1007/978-3-031-14771-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14770-8
Online ISBN: 978-3-031-14771-5
eBook Packages: Intelligent Technologies and Robotics (R0)