In this paper, we compared two approaches for extracting medical concepts from clinical notes: a native French approach based on a French language model, and a translated English approach in which we compare two state-of-the-art English biomedical language models after a translation step. The main advantages of our experiment are that it is reproducible and that we were able to analyze the performance of each step of the algorithm (NER, normalization, and translation) and to test several models for each step.
5.1.1. The quality of the translation is not sufficient
We show that the native French approach outperforms the two translated English approaches, even with a small French training dataset. This analysis confirms that, when possible, an annotated dataset improves feature extraction. The evaluation of each intermediate step shows that the performance of each module is similar in French and in English. We can therefore conclude that it is the translation step itself whose quality is insufficient to allow English to be used as a proxy without a loss of performance. This is confirmed by the translation quality measurements: the computed BLEU scores are relatively low, although they improve after a fine-tuning step.
In conclusion, although translation is commonly used for entity extraction or term normalization in languages other than English [20, 40, 41, 42, 5], owing to the availability of turnkey models that require no additional annotation by a clinician, we show that it induces a significant performance loss.
Commercial API-based translation services could not be used for our task due to data privacy issues. However, the opus-mt model is considered state of the art; it can be fine-tuned on domain-specific data, and the translation results presented in Table 4 confirm the lack of a performance difference between this model and the Google Translate model.
Even though our experiments were performed on only one language, the French-English pair is one of the best performing in recent translation benchmarks [43]. It is therefore unlikely that other languages would lead to significantly better results.
5.1.2. Error Analysis
In these experiments, the overall results may appear low, but the task remains complex, in particular because the UMLS® [1] contains many synonyms with distinct CUIs. To better understand this, we performed an error analysis on the normalization task only, as shown in Supplementary Table 3, with a physician's evaluation of a sample of 100 errors for each model. We calculated that 24% and 39% of the terms found by the deep normalization algorithm [9] and by CODER [10], respectively, were actually synonyms of the expected terms but with different UMLS CUIs. For example, cardiac ultrasound has CUI C1655737 while echocardiography has CUI C0013516; similarly, H/O: thromboembolism has CUI C0455533 while history of thromboembolism has CUI C1997787. In addition, as shown in Supplementary Table 3, abbreviations and misspelled words also induce many errors and are difficult to manage, even though some abbreviations are already built into the UMLS. Another limitation comes from the ever-changing versions of the UMLS®. In any case, it is the relative differences between the results that matter for our purposes, not the absolute values.
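To make the "synonym but different CUI" error category concrete, the sketch below illustrates the kind of check performed during the manual review. The tiny concept index and equivalence classes are illustrative only (built from the examples above, not extracted from the UMLS); a real analysis would query the full UMLS Metathesaurus.

```python
# Hypothetical mini concept index: surface form -> CUI.
# In the UMLS, clinically equivalent terms may carry distinct CUIs,
# which a strict CUI-match evaluation counts as errors.
TERM_TO_CUI = {
    "cardiac ultrasound": "C1655737",
    "echocardiography": "C0013516",
    "h/o: thromboembolism": "C0455533",
    "history of thromboembolism": "C1997787",
}

# Illustrative clinical-equivalence classes (assumed, not a UMLS extract).
SYNONYM_SETS = [
    {"cardiac ultrasound", "echocardiography"},
    {"h/o: thromboembolism", "history of thromboembolism"},
]

def classify_error(predicted_term, gold_term):
    """Label a normalization outcome: a true CUI match, a 'synonym'
    error where the terms are clinically equivalent but map to
    different CUIs, or any other error."""
    if TERM_TO_CUI.get(predicted_term) == TERM_TO_CUI.get(gold_term):
        return "correct"
    if any(predicted_term in s and gold_term in s for s in SYNONYM_SETS):
        return "synonym, different CUI"
    return "other error"

print(classify_error("cardiac ultrasound", "echocardiography"))
# → synonym, different CUI
```

Under a strict CUI-match metric, the middle category is scored as wrong even though a clinician would accept the prediction, which is why the absolute scores understate the practical quality of the normalization.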
5.1.3. Limitations
This work has several limitations. First, the real-life French clinical notes contained very few terms attached to the “Devices” semantic group, which prevented the NER algorithm from finding them in the test dataset. However, this drawback, which penalizes the native French approach, does not prevent us from drawing conclusions from the results. Moreover, in this study we did not take into account the attributes of the extracted terms, such as negation, the hypothetical attribute, or attribution to a person other than the patient. This choice was made for comparison purposes, since the QUAERO [25] and n2c2 2019 [24] datasets do not have this information labeled.