Exploring the Latest Highlights in Medical Natural Language Processing across Multiple Languages: A Survey

Anastassia Shaitarova; Jamil Zaghir; Alberto Lavelli; Michael Krauthammer; Fabio Rinaldi

doi:10.1055/s-0043-1768726

CC BY-NC-ND 4.0 · Yearb Med Inform 2023; 32(01): 230-243

Section 10: Natural Language Processing

Survey

Anastassia Shaitarova‡*
¹Department of Computational Linguistics, University of Zurich, Zurich, Switzerland

Jamil Zaghir‡*
²Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
³Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland

Alberto Lavelli
⁴Natural Language Processing Research Unit, Center for Digital Health and Wellbeing, Fondazione Bruno Kessler, Trento, Italy

Michael Krauthammer
⁵Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
⁶Biomedical Informatics, University Hospital Zurich, Zurich, Switzerland

Fabio Rinaldi
⁴Natural Language Processing Research Unit, Center for Digital Health and Wellbeing, Fondazione Bruno Kessler, Trento, Italy
⁵Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
⁷Dalle Molle Institute for Artificial Intelligence Research, Lugano, Switzerland
⁸Swiss Institute of Bioinformatics

Summary

Objectives: This survey aims to provide an overview of the current state of biomedical and clinical Natural Language Processing (NLP) research and practice in languages other than English (LoE). We pay special attention to data resources, language models, and popular downstream NLP tasks.

Methods: We explore the literature on clinical and biomedical NLP published from 2020 to 2022, focusing on the challenges of multilinguality and LoE. We query online databases and manually select relevant publications. We also use recent NLP review papers to identify possible information lacunae.

Results: Our work confirms the recent trend towards the use of transformer-based language models for a variety of NLP tasks in medical domains. In addition, the availability of annotated datasets for clinical NLP in LoE has increased, particularly for European languages such as Spanish, German, and French. Common NLP tasks addressed in medical NLP research in LoE include information extraction, named entity recognition, normalization, linking, and negation detection. However, there is still a need for annotated datasets and models tailored to the characteristics and challenges of medical text in some of these languages, especially low-resource ones. Lastly, this survey highlights the progress of medical NLP in LoE and helps identify opportunities for future research and development in this field.

Keywords

Multilingualism - natural language processing - datasets as topic - language models - shared tasks

* These authors contributed equally to this work.

⁶ We use the standardized nomenclature ISO 639-3 for the language codes (https://iso639-3.sil.org/code_tables/639/data).

Publication History

Article published online:
26 December 2023

© 2023. IMIA and Thieme. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

References
1 Zhou B, Yang G, Shi Z, Ma S. Natural Language Processing for Smart Healthcare. IEEE Rev Biomed Eng 2022;1–17. doi:10.1109/RBME.2022.3210270.

PubMed
2 Aramaki E, Wakamiya S, Yada S, Nakamura Y. Natural Language Processing: from Bedside to Everywhere. Yearb Med Inform 2022 Jun 2; doi:10.1055/s-0042-1742510.

PubMed
3 Li I, Pan J, Goldwasser J, Verma N, Wong WP, Nuzumlalı MY, et al. Neural Natural Language Processing for unstructured data in electronic health records: A review. Comput Sci Rev 2022 Nov 1;46:100511. doi:10.1016/j.cosrev.2022.100511.

PubMed
4 Névéol A, Dalianis H, Velupillai S, Savova G, Zweigenbaum P. Clinical Natural Language Processing in languages other than English: opportunities and challenges. J Biomed Semant 2018 Mar 30;9(1):12. doi:10.1186/s13326-018-0179-8.

PubMed
5 Walpole SC. Including papers in languages other than English in systematic reviews: important, feasible, yet often omitted. J Clin Epidemiol 2019 Jul 1;111:127–34. doi:10.1016/j.jclinepi.2019.03.004.

PubMed
6 Dalianis H. Characteristics of Patient Records and Clinical Corpora. Dalianis H, editor. Clinical Text Mining: Secondary Use of Electronic Patient Records. Cham: Springer International Publishing; 2018. p. 21–34. doi:10.1007/978-3-319-78503-5_4.

PubMed
7 Soares F, Yamashita GH. On the crucial role of multilingual biomedical databases in epidemic events (SARS-CoV-2 analysis). Int J Infect Dis 2020 Jul;96:352–4. doi:10.1016/j.ijid.2020.05.023.

PubMed
8 Grabar N, Grouin C. Year 2020 (with COVID): Observation of Scientific Literature on Clinical Natural Language Processing. Yearb Med Inform 2021 Aug;30(1):257–63. doi:10.1055/s-0041-1726528.

PubMed
9 Laparra E, Mascio A, Velupillai S, Miller T. A Review of Recent Work in Transfer Learning and Domain Adaptation for Natural Language Processing of Electronic Health Records. Yearb Med Inform 2021 Aug;30(1):239–44. doi:10.1055/s-0041-1726522.

PubMed
10 Yang F, Wang X, Ma H, Li J. Transformers-sklearn: a toolkit for medical language understanding with transformer-based models. BMC Med Inform Decis Mak 2021 Jul 30;21(Suppl 2):90. doi:10.1186/s12911-021-01459-0.

PubMed
11 Jati BS, Widyawan S, Muhammad Nur Rizal ST. Multilingual Named Entity Recognition Model for Indonesian Health Insurance Question Answering System. Proceedings of the 3rd International Conference on Information and Communications Technology (ICOIACT). 2020. p. 180–4. doi:10.1109/ICOIACT50329.2020.9332027.

PubMed
12 Gérardin C, Wajsbürt P, Vaillant P, Bellamine A, Carrat F, Tannier X. Multilabel classification of medical concepts for patient clinical profile identification. Artif Intell Med 2022 Jun;128:102311. doi:10.1016/j.artmed.2022.102311.

PubMed
13 Wang B, Xie Q, Pei J, Tiwari P, Li Z, Fu J. Pre-trained Language Models in Biomedical Domain: A Systematic Survey. arXiv; 2021. Available at: http://arxiv.org/abs/2110.05006.

PubMed
14 AlShuweihi M, Salloum SA, Shaalan K. Biomedical Corpora and Natural Language Processing on Clinical Text in Languages Other Than English: A Systematic Review. In: Al-Emran M, Shaalan K, Hassanien AE, editors. Recent Advances in Intelligent Systems and Smart Applications. Cham: Springer International Publishing; 2021. p. 491–509. (Studies in Systems, Decision and Control). doi:10.1007/978-3-030-47411-9_27.

PubMed
15 Ge Y, Guo Y, Yang YC, Al-Garadi MA, Sarker A. Few-shot learning for medical text: A systematic review. arXiv; 2022. Available at: https://arxiv.org/abs/2204.14081.

PubMed
16 Magnini B, Altuna B, Lavelli A, Speranza M, Zanoli R. The E3C Project: Collection and Annotation of a Multilingual Corpus of Clinical Cases. Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 : Bologna, Italy, March 1-3, 2021. Torino: Accademia University Press; 2021. p. 258–64. (Collana dell'Associazione Italiana di Linguistica Computazionale). doi:10.4000/books.aaccademia.8663.

PubMed
17 Miranda-Escalada A, Farre-Maduell E, Lima-Lopez S, Estrada D, Gasco L, Krallinger M. Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources. Procesamiento del Lenguaje Natural 2022;69:241–53. doi:10.26342/2022-69-21.

PubMed
18 Miranda-Escalada, Antonio, Eulàlia Farré, Gasco L, Lima S, Krallinger M. DisTEMIST corpus: detection and normalization of disease mentions in Spanish clinical cases. Zenodo; 2022. doi:10.5281/ZENODO.6408476.

PubMed
19 Blinov P, Nesterov A, Zubkova G, Reshetnikova A, Kokh V, Shivade C. RuMedNLI: A Russian Natural Language Inference Dataset For The Clinical Domain. PhysioNet; 2022. doi:10.13026/gxzd-cf80.

PubMed
20 Shivade C. MedNLI - A Natural Language Inference Dataset For The Clinical Domain. PhysioNet; 2017. doi:10.13026/C2RS98

PubMed
21 Frei J, Kramer F. GERNERMED -- An Open German Medical NER Model. arXiv; 2021. Available at: http://arxiv.org/abs/2109.12104.

PubMed
22 Frei J, Kramer F. Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP. arXiv; 2022. Available at: http://arxiv.org/abs/2208.14493.

PubMed
23 Modersohn L, Schulz S, Lohr C, Hahn U. GraSCCo-The First Publicly Shareable, Multiply-Alienated German Clinical Text Corpus. Stud Health Technol Inform 2022;296:66–72. doi:10.3233/SHTI220805.

PubMed
24 Borchert F, Lohr C, Modersohn L. GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines. arXiv; 2020. Available at: https://arxiv.org/abs/2007.06400.

PubMed
25 Borchert F, Lohr C, Modersohn L, Witt J, Langer T, Follmann M, et al. GGPONC 2.0-the German clinical guideline corpus for oncology: Curation workflow, annotation policy, baseline NER taggers. Proceedings of the Thirteenth Language Resources and Evaluation Conference. 2022. p. 3650–60. Available at: https://aclanthology.org/2022.lrec-1.389/.

PubMed
26 Kittner M, Lamping M, Rieke DT, Götze J, Bajwa B, Jelas I, et al. Annotation and initial evaluation of a large annotated German oncological corpus. JAMIA Open 2021 Apr 1;4(2):ooab025. doi:10.1093/jamiaopen/ooab025.

PubMed
27 Grabar N, Dalloux C, Claveau V. CAS: corpus of clinical cases in French. J Biomed Semant 2020 Aug 6;11(1):7. doi:10.1186/s13326-020-00225-x.

PubMed
28 Hiebel N, Ferret O, Fort K, Névéol A. CLISTER : A Corpus for Semantic Textual Similarity in French Clinical Narratives. Proceedings of the Thirteenth Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association; 2022. p. 4306–15. Available at: https://aclanthology.org/2022.lrec-1.459.

PubMed
29 Yada S, Nakamura Y, Wakamiya S, Aramaki E. Real-MedNLP: Overview of REAL document-based MEDical Natural Language processing Task. Proceedings of the 16^th NTCIR Conference on Evaluation of Information Access Technologies NII. 2022.

PubMed
30 Kim YM, Lee TH. Korean clinical entity recognition from diagnosis text using BERT. BMC Med Inform Decis Mak 2020 Sep 30;20(7):242. doi:10.1186/s12911-020-01241-8.

PubMed
31 Kim YM, Lee TH, Na SO. Constructing novel datasets for intent detection and ner in a korean healthcare advice system: guidelines and empirical results. Appl Intell 2022;53(1):1–21. doi:10.1007/s10489-022-03400-y.

PubMed
32 Sazzed S. BanglaBioMed: A Biomedical Named-Entity Annotated Corpus for Bangla (Bengali). Proceedings of the 21^st Workshop on Biomedical Language Processing. Dublin, Ireland: Association for Computational Linguistics; 2022. p. 323–9. doi:10.18653/v1/2022.bionlp-1.31.

PubMed
33 Van Nguyen K, Van Huynh T, Nguyen DV, Nguyen AGT, Nguyen NLT. New Vietnamese Corpus for Machine Reading Comprehension of Health News Articles. ACM Trans Asian Low-Resour Lang Inf Process 2022 Sep 23;21(5):105:1-105:28. doi:10.1145/3527631.

PubMed
34 Boudjellal N, Zhang H, Khan A, Ahmad A, Naseem R, Dai L. A Silver Standard Biomedical Corpus for Arabic Language. Complexity 2020 Oct 9;2020:e8896659. doi:10.1155/2020/8896659.

PubMed
35 Cherednichenko O, Kanishcheva O, Yakovleva O, Arkatov D. Collection and Processing of a Medical Corpus in Ukrainian. Proceedings of the 4^th International Conference on Computational Linguistics and Intelligent Systems. Lviv: CEUR; 2020. Available at: https://ceur-ws.org/Vol-2604/paper21.pdf.

PubMed
36 Hammoud J, Vatian A, Dobrenko N, Vedernikov N, Shalyto A, Gusarova N. New Arabic Medical Dataset for Diseases Classification. arXiv; 2021. Available at: http://arxiv.org/abs/2106.15236.

PubMed
37 Zhuoma C, Cairang J, Sangjie D, Yangmao Z, Zhuoma Z. Tibetan Medical Named Entity Recognition Study for Tibetan Clinical Electronic Medical Records. SSRN; 2022 Feb 22; doi: 10.2139/ssrn.4040676.

PubMed
38 Zaghir J, Goldman JP, Bjelogrlic M, Keszthelyi D, Gaudet-Blavignac C, Turbé H, et al. Performance of Machine Learning Methods to Classify French Medical Publications. Stud Health Technol Inform 2022;294:874–5. doi:10.3233/SHTI220613.

PubMed
39 Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res 2004 Jan 1;32(Database issue):D267-70. doi: 10.1093/nar/gkh061.

PubMed
40 Lewis P, Ott M, Du J, Stoyanov V. Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art. Proceedings of the 3rd Clinical Natural Language Processing Workshop. Online: Association for Computational Linguistics; 2020. p. 146–57. doi:10.18653/v1/2020.clinicalnlp-1.17.

PubMed
41 Boudjellal N, Zhang H, Khan A, Ahmad A, Naseem R, Shang J, et al. ABioNER: a BERT-based model for Arabic biomedical named-entity recognition. Complexity 2021;2021.

PubMed
42 Kim Y, Kim JH, Lee JM, Jang MJ, Yum YJ, Kim S, et al. A pre-trained BERT for Korean medical natural language processing. Sci Rep 2022;12(1):1–10.

PubMed
43 Bressem KK, Adams LC, Gaudin RA, Tröltzsch D, Hamm B, Makowski MR, et al. Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports. Bioinformatics 2020 Nov 1;36(21):5255–61. doi:10.1093/bioinformatics/btaa668.

PubMed
44 Turkmen H, Dikenelli O, Eraslan C, Callı MC. Bioberturk: Exploring Turkish Biomedical Language Model Development Strategies in Low Resource Setting. Research Square; 2022. doi:10.21203/rs.3.rs-2165226/v1.

PubMed
45 Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, et al. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. ACM Trans Comput Healthc 2022 Jan 31;3(1):1–23. doi:10.1145/3458754.

PubMed
46 Carrino CP, Llop J, Pàmies M, Gutiérrez-Fandiño A, Armengol-Estapé J, Silveira-Ocampo J, et al. Pretrained Biomedical Language Models for Clinical NLP in Spanish. Proceedings of the 21st Workshop on Biomedical Language Processing. Dublin, Ireland: Association for Computational Linguistics; 2022. p. 193–9. doi:10.18653/v1/2022.bionlp-1.19.

PubMed
47 Li X, Zhang H, Zhou XH. Chinese clinical named entity recognition with variant neural structures based on BERT methods. J Biomed Inform 2020;107:103422. doi:10.1016/j.jbi.2020.103422.

PubMed
48 Carrino CP, Armengol-Estapé J, Gutiérrez-Fandiño A, Llop-Palao J, Pàmies M, Gonzalez-Agirre A, et al. Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario. arXiv; 2021. Available at: http://arxiv.org/abs/2109.03570.

PubMed
49 de Vries W, van Cranenburgh A, Bisazza A, Caselli T, van Noord G, Nissim M. BERTje: A Dutch BERT Model. arXiv; 2019. Available at: https://arxiv.org/abs/1912.09582.

PubMed
50 Tanvir H, Kittask C, Eiche S, Sirts K. EstBERT: A Pretrained Language-Specific BERT for Estonian. Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). Reykjavik, Iceland (Online): Linköping University Electronic Press, Sweden; 2021. p. 11–9. Available at: https://aclanthology.org/2021.nodalida-main.2.

PubMed
51 Olthof AW, van Ooijen PM, Cornelissen LJ. The natural language processing of radiology requests and reports of chest imaging: Comparing five transformer models' multilabel classification and a proof-of-concept study. Health Informatics J 2022 Dec;28(4):14604582221131198. doi:10.1177/14604582221131198.

PubMed
52 Grancharova M, Dalianis H. Applying and sharing pre-trained BERT-models for named entity recognition and classification in Swedish electronic patient records. Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). 2021. p. 231–9. Available at: https://aclanthology.org/2021.nodalida-main.23/.

PubMed
53 Bailly A, Blanc C, Guillotin T. Classification multi-label de cas cliniques avec CamemBERT (Multi-label classification of clinical cases with CamemBERT). Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles Atelier DÉfi Fouille de Textes (DEFT). 2021. p. 14–20. Available at: https://aclanthology.org/2021.jeptalnrecital-deft.2/.

PubMed
54 Schneider ETR, de Souza JVA, Knafou J, Silva e Oliveira LE, Copara J, Gumiel YB, et al. BioBERTpt - A Portuguese Neural Language Model for Clinical Named Entity Recognition. Proceedings of the 3rd Clinical Natural Language Processing Workshop. Online: Association for Computational Linguistics; 2020. p. 65–72. doi:10.18653/v1/2020.clinicalnlp-1.7.

PubMed
55 Bitton Y, Cohen R, Schifter T, Bachmat E, Elhadad M, Elhadad N. Cross-lingual Unified Medical Language System entity linking in online health communities. J Am Med Inform Assoc 2020 Sep 10;27(10):1585–92. doi:10.1093/jamia/ocaa150.

PubMed
56 Johnson A, Karanasou P, Gaspers J, Klakow D. Cross-lingual Transfer Learning for Japanese Named Entity Recognition. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers). 2019. p. 182–9. doi:10.18653/v1/N19-2023.

PubMed
57 Wang C, Wang H, Zhuang H, Li W, Han S, Zhang H, et al. Chinese medical named entity recognition based on multi-granularity semantic dictionary and multimodal tree. J Biomed Inform 2020 Nov;111:103583. doi:10.1016/j.jbi.2020.103583.

PubMed
58 Shi B, Zhang L, Huang J, Zheng H, Wan J, Zhang L. MDA: An Intelligent Medical Data Augmentation Scheme Based on Medical Knowledge Graph for Chinese Medical Tasks. Appl Sci 2022;12(20):10655. doi: 10.3390/app122010655.

PubMed
59 Kim YM, Lee TH. Korean clinical entity recognition from diagnosis text using BERT. BMC Med Inform Decis Mak 2020 Sep 30;20(7):242. doi:10.1186/s12911-020-01241-8.

PubMed
60 Kawazoe Y, Shibata D, Shinohara E, Aramaki E, Ohe K. A clinical specific BERT developed using a huge Japanese clinical text corpus. PLoS One 2021 Nov 9;16(11):e0259763. doi:10.1371/journal.pone.0259763.

PubMed
61 de Souza JVA, Schneider ETR, Cezar JO, Silva LE, Gumiel YB, Paraiso EC, et al. A multilabel approach to Portuguese clinical named entity recognition. J Health Inform 2020;12.

PubMed
62 Mitrofan M, Pais V. Improving Romanian BioNER Using a Biologically Inspired System. Proceedings of the 21st Workshop on Biomedical Language Processing. Association for Computational Linguistics; 2022. p. 316–22. doi:10.18653/v1/2022.bionlp-1.30.

PubMed
63 Kaplar A, Stošović M, Kaplar A, Brković V, Naumović R, Kovačević A. Evaluation of clinical named entity recognition methods for Serbian electronic health records. Int J Med Inf 2022 Aug;164:104805. doi:10.1016/j.ijmedinf.2022.104805.

PubMed
64 Frei J, Frei-Stuber L, Kramer F. GERNERMED++: Transfer Learning in German Medical NLP. ArXiv; 2022. Available at: https://arxiv.org/abs/2206.14504.

PubMed
65 Wajsbürt P, Sarfati A, Tannier X. Medical concept normalization in French using multilingual terminologies and contextual embeddings. J Biomed Inform 2021 Feb 1;114:103684. doi:10.1016/j.jbi.2021.103684.

PubMed
66 Budiarti RPN, Sukaridhoto S, Al-Hafidz IA, Satrio NA. Symptoms identification of ICD-11 based on clinical NLP mobile apps for diagnosing the disease (ICD-11). Bali Med J 2022;11(3):1162–7.

PubMed
67 French E, McInnes BT. An overview of biomedical entity linking throughout the years. J Biomed Inform 2022 Dec 2;104252. doi:10.1016/j.jbi.2022.104252.

PubMed
68 Névéol A, Grouin C, Leixa J, Rosset S, Zweigenbaum P. The Quaero French Medical Corpus: A Resource for Medical Entity Recognition and Normalization. Proceedings of the Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing. 2014. p. 24–30.

PubMed
69 Gonzalez-Agirre A, Marimon M, Intxaurrondo A, Rabal O, Villegas M, Krallinger M. PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity Recognition track. Proceedings of the 5th Workshop on BioNLP Open Shared Tasks. Hong Kong, China: Association for Computational Linguistics; 2019. p. 1–10. doi:10.18653/v1/D19-5701.

PubMed
70 Miranda-Escalada A, Farré E, Krallinger M. Named Entity Recognition, Concept Normalization and Clinical Coding: Overview of the Cantemist Track for Cancer Text Mining in Spanish, Corpus, Guidelines, Methods and Results. Proceedings of the Iberian Languages Evaluation Forum. 2020;303–23.

PubMed
71 Liu F, Vulić I, Korhonen A, Collier N. Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking. Proceedings of the 59^th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2021. p. 565–74. doi:10.18653/v1/2021.acl-short.72.

PubMed
72 Shaitarova A, Rinaldi F. Negation typology and general representation models for cross-lingual zero-shot negation scope resolution in Russian, French, and Spanish. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop. 2021. p. 15–23. doi:10.18653/v1/2021.naacl-srw.3.

PubMed
73 Jiménez-Zafra SM, Morante R, Martín-Valdivia MT, Ureña-López LA. Corpora Annotated with Negation: An Overview. Comput Linguist 2020;46(1):1–52. doi:10.1162/coli_a_00371.

PubMed
74 Mahany A, Khaled H, Elmitwally NS, Aljohani N, Ghoniemy S. Negation and Speculation in NLP: A Survey, Corpora, Methods, and Applications. Appl Sci 2022 Jan;12(10):5209. doi:10.3390/app12105209.

PubMed
75 Marimon M, Vivaldi J, Bel N. Annotation of negation in the IULA Spanish Clinical Record Corpus. Proceedings of the Workshop Computational Semantics Beyond Events and Roles. Association for Computational Linguistics; 2017. p. 43–52. doi:10.18653/v1/W17-1807.

PubMed
76 Dalloux C, Claveau V, Grabar N. Speculation and Negation detection in French biomedical corpora. Proceedings of the International Conference on Recent Advances in Natural Language Processing. INCOMA Ltd.; 2019. p. 223–32. doi:10.26615/978-954-452-056-4_026.

PubMed
77 Pabón OS, Montenegro O, Torrente M, González AR, Provencio M, Menasalvas E. Negation and uncertainty detection in clinical texts written in Spanish: a deep learning-based approach. PeerJ Comput Sci 2022 Mar 7;8:e913. doi:10.7717/peerj-cs.913.

PubMed
78 Lima Lopez S, Perez N, Cuadros M, Rigau G. NUBes: A Corpus of Negation and Uncertainty in Spanish Clinical Texts. Proceedings of the Twelfth Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association; 2020. p. 5772–81. Available at: https://aclanthology.org/2020.lrec-1.708.

PubMed
79 Oliveira LESE, Peters AC, da Silva AMP, Gebeluca CP, Gumiel YB, Cintho LMM, et al. SemClinBr - a multi-institutional and multi-specialty semantically annotated corpus for Portuguese clinical NLP tasks. J Biomed Semant 2022 May 8;13(1):13. doi:10.1186/s13326-022-00269-1.

PubMed
80 Shaitarova A, Furrer L, Rinaldi F. Cross-lingual transfer-learning approach to negation scope resolution. Proceedings of the Swiss Text Analytics Conference & Conference on Natural Language Processing. 2020 Jun 25; doi:10.5167/UZH-197355.

PubMed
81 Mirzapour M, Abdaoui A, Tchechmedjiev A, Digan W, Bringay S, Jonquet C. French FastContext: A publicly accessible system for detecting negation, temporality and experiencer in French clinical notes. J Biomed Inform 2021 May 1;117:103733. doi:10.1016/j.jbi.2021.103733.

PubMed
82 Santiso S, Pérez A, Casillas A, Oronoz M. Neural negated entity recognition in Spanish electronic health records. J Biomed Inform 2020 May;105:103419. doi:10.1016/j.jbi.2020.103419.

PubMed
83 Dalloux C, Claveau V, Grabar N, Oliveira LES, Moro CMC, Gumiel YB, et al. Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora. Nat Lang Eng 2020;1–21. doi:10.1017/S1351324920000352.

PubMed
84 Funkner A, Balabaeva K, Kovalchuk S. Negation Detection for Clinical Text Mining in Russian. Digit Pers Health Med 2020;342–6. doi:10.3233/SHTI200179.

PubMed
85 Hartmann M, Søgaard A. Multilingual Negation Scope Resolution for Clinical Text. Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis. Association for Computational Linguistics; 2021. p. 7–18. Available at: https://aclanthology.org/2021.louhi-1.2.

PubMed
86 Rivera Zavala R, Martinez P. The Impact of Pretrained Language Models on Negation and Speculation Detection in Cross-Lingual Medical Text: Comparative Study. JMIR Med Inform 2020 Dec 3;8(12):e18953. doi:10.2196/18953.

PubMed
87 Gasco L, Estrada-Zavala D, Farré-Maduell E, Lima-López S, Miranda-Escalada A, Krallinger M. Overview of the SocialDisNER shared task on detection of diseases mentions from healthcare related and patient generated social media content: methods, evaluation and corpora. Proceedings of the Seventh Social Media Mining for Health (# SMM4H) Workshop and Shared Task. 2022. Available at: https://aclanthology.org/2022.smm4h-1.48/.

PubMed
88 Lima-López S, Farré-Maduell E, Miranda-Escalada A, Brivá-Iglesias V, Krallinger M. NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts. Procesamiento del Lenguaje Natural 2021;67:243–56. doi:10.26342/2021-67-21.

PubMed
89 Chapman WW, Nadkarni PM, Hirschman L, D'avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc 2011 Sep-Oct;18(5):540-3. doi:10.1136/amiajnl-2011-000465.

PubMed
90 Deleger L, Li Q, Lingren T, Kaiser M, Molnar K, Stoutenborough L, et al. Building gold standard corpora for medical natural language processing tasks. AMIA Annu Symp Proc 2012:144-53.

PubMed
91 Wissler L, Almashraee M, Díaz DM, Paschke A. The Gold Standard in Corpus Annotation. Proceedings of the 5^th IEEE Germany Student Conference. 2014;21. doi:10.13140/2.1.4316.3523.

PubMed
92 Miranda-Escalada A, Gonzalez-Agirre A, Armengol-Estapé J, Krallinger M. Overview of Automatic Clinical Coding: Annotations, Guidelines, and Solutions for non-English Clinical Cases at CodiEsp Track of CLEF eHealth 2020. CLEF (Working Notes). 2020.

PubMed
93 Costa J, Lopes I, Carreiro AV, Ribeiro D, Soares C. Fraunhofer AICOS at CLEF eHealth 2020 Task 1: Clinical Code Extraction From Textual Data Using Fine-Tuned BERT Models. CLEF (Working Notes). 2020.

PubMed
94 Perea-Ortega JM, López-Úbeda P, Díaz-Galiano MC, Valdivia MTM, López LAU. SINAI at CLEF eHealth 2020: Testing Different pre-trained Word Embeddings for Clinical Coding in Spanish. CLEF (Working Notes). 2020.

PubMed
95 Cossin S, Jouhet V. IAM at CLEF eHealth 2020: Concept Annotation in Spanish Electronic Health Records. CLEF (Working Notes). 2020.

PubMed
96 Mayya V, Kamath SS, Sugumaran V. LAT A − Label Attention Transformer Architectures for ICD-10 Coding of Unstructured Clinical Notes. Proceedings of the IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology. 2021. p. 1–7.

PubMed
97 García-Santa N, Cetina K, Cappellato L, Eickhoff C, Ferro N, Nevéol A. FLE at CLEF eHealth 2020: Text Mining and Semantic Knowledge for Automated Clinical Encoding. CLEF (Working Notes). 2020.

PubMed
98 de la Iglesia I, Martínez-Puente M, Platas A, San Miguel I, Atutxa A, Gojenola K. Media team: Clef-2020 ehealth task 1: Multilingual information extraction-codiesp. CLEF (Working Notes). 2020.

PubMed
99 Cotik V, Alemany LA, Filippo D, Luque F, Roller R, Vivaldi J, et al. Overview of CLEF eHealth Task 1-SpRadIE: A challenge on information extraction from Spanish Radiology Reports. CLEF (Working Notes). 2021. p. 732–50.

PubMed
100 Solarte-Pabón O, Montenegro O, Blazquez-Herranz A, Saputro H, Rodriguez-González A, Menasalvas E. Information extraction from Spanish radiology reports using multilingual BERT. CLEF Ehealth. 2021. p. 834–45.

PubMed
101 Fabregat H, Duque A, Araujo L, Martínez-Romo J. LSI_UNED at CLEF eHealth2021: Exploring the effects of transfer learning in negation detection and entity recognition in clinical texts. CLEF (Working Notes). 2021. p. 780–93.

PubMed
102 Ruas P, Neves A, Andrade VD, Couto FM, Aragón ME. LasigeBioTM at CANTEMIST: Named Entity Recognition and Normalization of Tumour Morphology Entities and Clinical Coding of Spanish Health-related Documents. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 422–37.

PubMed
103 Osborne JD, O'Leary T, Del Monte J, Sasse K. Identification of Cancer Entities in Clinical Text Combining Transformers with Dictionary Features. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 458–67.

PubMed
104 Han JC, Tsai RTH. NCU-IISR: Pre-trained Language Model for CANTEMIST Named Entity Recognition. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 347–51.

PubMed
105 López-Úbeda P, Díaz-Galiano MC, Martín-Valdivia MT, López LAU. Extracting Neoplasms Morphology Mentions in Spanish Clinical Cases through Word Embeddings. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 324–34.

PubMed
106 de Vargas Romero G, Segura-Bedmar I. Exploring Deep Learning for Named Entity Recognition of Tumor Morphology Mentions. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 396–411.

PubMed
107 Vunikili R. Clinical NER using Spanish BERT Embeddings. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 505–11

PubMed
108 García-Pablos A, Perez N, Cuadros M, Zotova E. Vicomtech at eHealth-KD Challenge 2020: Deep End-to-End Model for Entity and Relation Extraction in Medical Text. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 102–11.

PubMed
109 Pavanelli L, Schneider ETR, Gumiel YB, Ferreira TC, de Oliveira LFA, de Souza JVA, et al. PUCRJ-PUCPR-UFMG at eHealth-KD Challenge 2021: A Multilingual BERT-based System for Joint Entity Recognition and Relation Extraction. Proceedings of the Iberian Languages Evaluation Forum. 2021. p. 683–91.

PubMed
110 Balouchzahi F, Sidorov G, Shashirekha HL. ADOP FERT-Automatic Detection of Occupations and Profession in Medical Texts using Flair and BERT. Proceedings of the Iberian Languages Evaluation Forum. 2021. p. 747–57.

PubMed
111 Harkawat J, Vaidhya T. Spanish Pre-Trained Language Models for HealthCare Industry. Proceedings of the Iberian Languages Evaluation Forum. 2021. p. 796–802.

PubMed
112 Schwarz M, Chapman K, Häussler B. Multilingual Medical Entity Recognition and Cross-lingual Zero-Shot Linking with Facebook AI Similarity Search. Proceedings of the Iberian Languages Evaluation Forum. 2022.

PubMed
113 Avram AM, Mitrofan M, P is V. Species Entity Recognition Using a Neural Inhibitory Mechanism. Proceedings of the Iberian Languages Evaluation Forum. 2022.

PubMed
114 Tamayo A, Burgos D, Gelbukh A. ParTNER: Paragraph Tuning for Named Entity Recognition on Clinical Cases in Spanish using mBERT+ Rules. Proceedings of the Iberian Languages Evaluation Forum. 2022.

PubMed
115 Piad-Morffis A, Gutiérrez Y, Canizares-Diaz H, Estevez-Velarde S, Muñoz R, Montoyo A, et al. Overview of the ehealth knowledge discovery challenge at IberLEF 2020. Proceedings of the Iberian Languages Evaluation Forum. 2020. p. 71–84.

116 Miranda-Escalada A, Gascó L, Lima-Lopez S, Farré-Maduell E, Estrada D, Nentidis A, et al. Overview of DisTEMIST at BioASQ: Automatic detection and normalization of diseases from clinical texts: results, methods, evaluation and multilingual resources. CLEF (Working Notes). 2022. p. 179–203.

117 Chizhikova M, Collado-Montañez J, López-Úbeda P, Díaz-Galiano MC, Ureña-López LA, Martín-Valdivia MT. SINAI at CLEF 2022: Leveraging biomedical transformers to detect and normalize disease mentions. CLEF (Working Notes). 2022. p. 265–73.

118 Neves A. Unicage at DisTEMIST: Named Entity Recognition system using only Bash and Unicage tools. CLEF (Working Notes). 2022. p. 325–34.

119 Borchert F, Schapranow MP. HPI-DHC@ BioASQ DisTEMIST: Spanish Biomedical Entity Linking with Pre-trained Transformers and Cross-lingual Candidate Retrieval. CLEF (Working Notes). 2022. p. 244–58.

120 Cardon R, Grabar N, Grouin C, Hamon T. Présentation de la campagne d'évaluation DEFT 2020: similarité textuelle en domaine ouvert et extraction d'information précise dans des cas cliniques. DEFT; 2020. Available at: https://aclanthology.org/2020.jeptalnrecital-deft.1/.

121 Copara Zea JL, Knafou JDM, Naderi N, Moro C, Ruch P, Teodoro D. Contextualized French Language Models for Biomedical Named Entity Recognition. Actes de la 6e conférence conjointe Journées d'Études sur la Parole, Traitement Automatique des Langues Naturelles, Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, Atelier DÉfi Fouille de Textes (DEFT). 2020. p. 36–48. Available at: https://aclanthology.org/2020.jeptalnrecital-deft.4/.

122 Naderi N, Knafou J, Copara J, Ruch P, Teodoro D. Ensemble of deep masked language models for effective named entity recognition in health and life science corpora. Front Res Metr Anal 2021;6. doi:10.3389/frma.2021.689803.

123 Grouin C, Grabar N, Illouz G. Classification de cas cliniques et évaluation automatique de réponses d'étudiants : présentation de la campagne DEFT 2021. Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles, Atelier DÉfi Fouille de Textes (DEFT). 2021. p. 1–13. Available at: https://aclanthology.org/2021.jeptalnrecital-deft.1.

124 Mannion A, Chevalier T, Schwab D, Goeuriot L. Identification de profil clinique du patient : Une approche de classification de séquences utilisant des modèles de langage français contextualisés. Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. 2021. p. 54–62. Available at: https://aclanthology.org/2021.jeptalnrecital-deft.6/.

125 Dou Z, Yamamoto T. Overview of NTCIR-16. Proceedings of the 16th NTCIR Conference on Evaluation of Information Access Technologies. 2022. p. 3–7.

126 Tamayo A, Gelbukh A, Burgos DA. NLP-CIC-WFU at SocialDisNER: Disease mention extraction in Spanish tweets using transfer learning and search by propagation. Proceedings of The Seventh Workshop on Social Media Mining for Health Applications. 2022. p. 19–22. Available at: https://aclanthology.org/2022.smm4h-1.6.

127 Cetina K, García-Santa N. FRE at SocialDisNER: Joint learning of language models for named entity recognition. Proceedings of The Seventh Workshop on Social Media Mining for Health Applications. 2022. p. 68–70. Available at: https://aclanthology.org/2022.smm4h-1.20.

128 Verma H, Bagherzadeh P, Bergler S. CLaCLab at SocialDisNER: Using medical gazetteers for named-entity recognition of disease mentions in Spanish tweets. Proceedings of The Seventh Workshop on Social Media Mining for Health Applications. 2022. p. 68–70. Available at: https://aclanthology.org/2022.smm4h-1.16.

129 Montañés-Salas R, López-Bosque I, García-Garcés L, del-Hoyo-Alonso R. ITAINNOVA at SocialDisNER: A Transformers cocktail for disease identification in social media in Spanish. Proceedings of The Seventh Workshop on Social Media Mining for Health Applications. 2022. p. 71–4. Available at: https://aclanthology.org/2022.smm4h-1.21.

130 Fu J, Li S, Yuan HM, Li Z, Gan Z, Chen Y, et al. CASIA@SMM4H'22: A uniform health information mining system for multilingual social media texts. Proceedings of The Seventh Workshop on Social Media Mining for Health Applications. 2022. p. 143–7. Available at: https://aclanthology.org/2022.smm4h-1.39.

131 Conneau A, Khandelwal K, Goyal N, Chaudhary V, Wenzek G, Guzmán F, et al. Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics; 2020. p. 8440–51. doi:10.18653/v1/2020.acl-main.747.

132 Canete J, Chaperon G, Fuentes R, Ho JH, Kang H, Pérez J. Spanish pre-trained BERT model and evaluation data. Practical ML for Developing Countries Workshop. 2020:1–10.

133 López-García G, Jerez JM, Ribelles N, Alba E, Veredas FJ. Detection of Tumor Morphology Mentions in Clinical Reports in Spanish Using Transformers. Proceedings of the International Work-Conference on Artificial Neural Networks. Springer; 2021. p. 24–35.

134 Martin L, Muller B, Suárez PJO, Dupont Y, Romary L, de la Clergerie ÉV, et al. CamemBERT: a Tasty French Language Model. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics; 2020. p. 7203–19. doi:10.18653/v1/2020.acl-main.645.

135 Le H, Vial L, Frej J, Segonne V, Coavoux M, Lecouteux B, et al. FlauBERT: Unsupervised Language Model Pre-training for French. arXiv; 2020. Available at: http://arxiv.org/abs/1912.05372.

136 Otegi A, Agirre A, Campos JA, Soroa A, Agirre E. Conversational question answering in low resource scenarios: A dataset and case study for Basque. Proceedings of the 12th Language Resources and Evaluation Conference. 2020. p. 436–42. Available at: https://aclanthology.org/2020.lrec-1.55/.

137 Gutiérrez Fandiño A, Armengol Estapé J, Pàmies M, Llop Palao J, Silveira Ocampo J, Pio Carrino C, et al. MarIA: Spanish language models. Procesamiento del Lenguaje Natural 2022;68:39–60. doi:10.26342/2022-68-3.

138 López-García G, Jerez JM, Ribelles N, Alba E, Veredas FJ. Transformers for Clinical Coding in Spanish. IEEE Access 2021;9:72387–97. doi:10.1109/ACCESS.2021.3080085.

139 Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language Models are Few-Shot Learners. Proceedings of the Advances in Neural Information Processing Systems. Curran Associates, Inc.; 2020. p. 1877–901. Available at: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.

140 Biswas S. ChatGPT and the Future of Medical Writing. Radiology 2023 Apr;307(2):e223312. doi:10.1148/radiol.223312.

141 Jin D, Pan E, Oufattole N, Weng WH, Fang H, Szolovits P. What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams. Appl Sci 2021 Jan;11(14):6421. doi:10.3390/app11146421.

142 Andrade VDT, Ruas P, Couto FM. Named Entity Recognition and Linking: a Portuguese and Spanish Oncological Parallel Corpus. bioRxiv; 2021. p. 2021.09.16.460605. doi:10.1101/2021.09.16.460605.

143 Ma H, Yang F, Ren J, Li N, Dai M, Wang X, et al. ECCParaCorp: a cross-lingual parallel corpus towards cancer education, dissemination and application. BMC Med Inform Decis Mak 2020 Jul 9;20(3):122. doi:10.1186/s12911-020-1116-1.

144 Mititelu VB, Mitrofan M. The Romanian medical treebank-SiMoNERo. Proceedings of the 15th Edition of the International Conference on Linguistic Resources and Tools for Natural Language Processing–ConsILR. 2020. p. 7–16.

145 Blinov P, Reshetnikova A, Nesterov A, Zubkova G, Kokh V. RuMedBench: A Russian Medical Language Understanding Benchmark. Proceedings of the International Conference on Artificial Intelligence in Medicine. 2022. p. 383–92.

146 Campillos-Llanos L, Valverde-Mateos A, Capllonch-Carrión A. A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine. BMC Med Inform Decis Mak 2021;21(69). doi:10.1186/s12911-021-01395-z.
