Improving Medical Data Annotation Including Humans in the Machine Learning Loop

Bobes-Bascarán, José; Mosqueira-Rey, Eduardo; Alonso-Ríos, David

doi:10.3390/engproc2021007039


Improving Medical Data Annotation Including Humans in the Machine Learning Loop †

by José Bobes-Bascarán *, Eduardo Mosqueira-Rey and David Alonso-Ríos

Centro de Investigación en TIC (CITIC), Universidade da Coruña, Elviña, 15071 A Coruña, Spain

* Author to whom correspondence should be addressed.

† Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.

Eng. Proc. 2021, 7(1), 39; https://doi.org/10.3390/engproc2021007039

Published: 19 October 2021

(This article belongs to the Proceedings of The 4th XoveTIC Conference)


Abstract

At present, the great majority of Artificial Intelligence (AI) systems require the participation of humans in their development, tuning, and maintenance. In particular, Machine Learning (ML) systems could benefit greatly from human expertise and knowledge. Thus, there is growing interest in how humans interact with these systems to obtain the best performance for both the AI system and the humans involved. Several approaches have been studied and proposed in the literature that can be gathered under the umbrella term of Human-in-the-Loop Machine Learning. Applying these techniques in health informatics could provide great value in prognosis and diagnosis tasks, contributing to a better health service for cancer-related diseases.

Keywords:

Human-in-the-Loop Machine Learning; Interactive Machine Learning; Machine Teaching; Iterative Machine Teaching; Active Learning

1. Introduction

The majority of Machine Learning (ML) systems require the participation of humans at several steps of the AI pipeline. With this requirement in mind, new types of interaction between humans and machine learning algorithms are being defined, which can be grouped under the term Human-in-the-Loop Machine Learning (HITL-ML) [1]. The goal is to make machine learning models more accurate, to reach the desired accuracy faster, and to make humans more efficient when training or using an ML model.

In the health domain (and others), few datasets are available, so traditional ML approaches suffer from insufficient training samples [2]. Techniques such as those described in this proposal could help improve both the training process and the performance of the final user.

2. Materials and Methods

Hybrid Intelligence Systems include several strategies aimed at enhancing the capabilities of the human, the machine, or both. A taxonomy of these systems has been proposed based on task characteristics, the learning paradigm, AI-human interaction, and human-AI interaction [3].

Below we describe several techniques that can be grouped under the term Human-in-the-Loop Machine Learning (HITL-ML) and that can be applied, among other scenarios, to cancer prognosis and diagnosis, where training samples are scarce and domain expert knowledge is expensive.

2.1. Human-in-the-Loop Techniques

Human-in-the-Loop ML aims to increase the accuracy of an ML model, to reach the target performance faster, to combine human and machine intelligence to maximize accuracy, and to assist human tasks with machine learning to increase efficiency [1]. The most relevant tasks mentioned in [1] are:

Annotating unlabeled data to create training, validation, and evaluation data.
Sampling the most important unlabeled data items.
Incorporating Human-Computer Interaction principles into annotation.

Depending on who controls the learning process, we can identify three approaches: Active Learning, Interactive Machine Learning, and Machine Teaching.

2.2. Active Learning

One of the first techniques is Active Learning (AL) [4], in which the system remains in control of the learning process and treats humans as oracles that label relevant unlabeled data. It is particularly useful when labeling examples is expensive or time-consuming, and it also applies when examples are scarce (e.g., in cancer). Unlike passive or classical learning, where the data are provided in advance, AL obtains its training data through an interactive, iterative process: the learner selects, according to a query strategy, which unlabeled examples to submit to the oracle for labeling.
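
To make the query loop concrete, the following is a minimal sketch of pool-based active learning with margin-based uncertainty sampling, one of the query strategies surveyed in [4]. It rests on assumptions that are not from this paper: a one-dimensional nearest-centroid learner, a synthetic pool, and a hypothetical `oracle` function that stands in for the human expert by returning the true label.

```python
import random

def fit_centroids(labeled):
    """Nearest-centroid learner: the mean feature value of each class (0 or 1)."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in labeled:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in (0, 1) if counts[c] > 0}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def margin(centroids, x):
    """Difference of the distances to the two centroids; small means uncertain."""
    return abs(abs(x - centroids[0]) - abs(x - centroids[1]))

def active_learning(pool, oracle, seed, budget):
    """Pool-based AL with margin sampling.

    `seed` must contain at least one example of each class so both centroids exist.
    """
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(budget):
        centroids = fit_centroids(labeled)
        # Query strategy: ask about the item the current model is least sure of.
        query = min(unlabeled, key=lambda x: margin(centroids, x))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # the human acts as the oracle
    return fit_centroids(labeled), labeled

# Usage with a synthetic pool; the oracle simulates an expert whose
# (hypothetical) decision boundary is at 6.0.
random.seed(0)
pool = [random.uniform(0.0, 10.0) for _ in range(200)]
oracle = lambda x: int(x > 6.0)
model, labeled = active_learning(pool, oracle, seed=[1.0, 9.0], budget=20)
accuracy = sum(predict(model, x) == oracle(x) for x in pool) / len(pool)
print(f"{len(labeled)} labels requested, pool accuracy {accuracy:.2f}")
```

Because each query is the pool item closest to the current decision boundary, the labels requested from the oracle concentrate where the learner is most confused, which is why AL can reach a useful model with few annotations. Another query strategy from [4] could be tried by replacing `margin`.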

2.3. Interactive Machine Learning

Another approach is Interactive Machine Learning (IML), in which users and learning systems interact more closely: people supply information iteratively, in a more focused, frequent, and incremental way than in traditional machine learning [5,6]. In IML, control of the learning process is shared between the system and its users, who work closely together so that each benefits from the other.
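
The shared control described above can be sketched as a loop in which the model proposes a label for each incoming item and the user confirms or corrects it, with every correction absorbed immediately. This is an illustrative sketch only: the online nearest-centroid learner, the `user_feedback` function (a hypothetical stand-in for a person), and the toy data are all assumptions.

```python
class OnlineCentroidModel:
    """Keeps a running mean per class, so each correction takes effect at once."""

    def __init__(self):
        self.means = {}   # label -> running mean feature vector
        self.counts = {}  # label -> number of examples absorbed

    def predict(self, x):
        if not self.means:
            return None  # nothing learned yet
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.means[label]))
        return min(self.means, key=sq_dist)

    def update(self, x, label):
        n = self.counts.get(label, 0) + 1
        old = self.means.get(label, [0.0] * len(x))
        # Incremental mean: m_n = m_(n-1) + (x - m_(n-1)) / n
        self.means[label] = [m + (xi - m) / n for m, xi in zip(old, x)]
        self.counts[label] = n

def interactive_session(model, stream, user_feedback):
    """The model proposes a label for each item; the user confirms or corrects it."""
    corrections = 0
    for x in stream:
        proposed = model.predict(x)
        answer = user_feedback(x, proposed)  # the human in the loop
        if answer != proposed:
            corrections += 1
            model.update(x, answer)  # learn from the correction immediately
    return corrections

# Hypothetical user who labels points by their first coordinate.
user_feedback = lambda x, proposed: "A" if x[0] < 0.5 else "B"
stream = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.15, 0.25), (0.85, 0.75)]
model = OnlineCentroidModel()
corrections = interactive_session(model, stream, user_feedback)
print(f"user corrected {corrections} of {len(stream)} proposals")
```

In this sketch the model learns only from corrections, which keeps each interaction cheap for the user; a variant could also learn from confirmed proposals.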

2.4. Machine Teaching

Finally, in Machine Teaching (MT) [7,8] the focus is on the role a human can play as a teacher who creates useful information from the available data. Building new models nowadays requires practitioners with deep knowledge of machine learning; to make this easier, MT proposes to decouple knowledge about machine learning algorithms from the process of teaching. The human acts as a teacher who guides the learning process [9].

A particular variant of MT is Iterative Machine Teaching (iMT) [10], whose goal is to obtain the optimal training set given a machine learning algorithm and a target model. The idea is to learn a target concept in as few iterations as possible, using the smallest possible dataset.
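
As a rough illustration of iMT, the sketch below follows the idea of the omniscient teacher in [10]: the teacher knows the target model `w_star` and the learner's update rule (here, SGD on a linear least-squares model), and at each iteration hands the learner the pool example whose update moves it closest to the target. The learner, learning rate, synthetic pool, and random-teacher baseline are assumptions chosen for illustration, not the setup of [10] or [12].

```python
import random

def sgd_step(w, x, y, lr):
    """One SGD step of the learner on the squared loss 0.5 * (w.x - y)^2."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def teach(pool, w_star, w0, lr, iterations, omniscient=True, rng=None):
    """Feed the learner one example per iteration.

    Returns the final weights and the squared distance to w_star after each step.
    """
    rng = rng or random.Random()
    w, history = list(w0), []
    for _ in range(iterations):
        if omniscient:
            # The teacher knows w_star and the learner's update rule, so it picks
            # the example whose SGD step brings the learner closest to the target.
            x, y = min(pool, key=lambda ex: sq_dist(sgd_step(w, ex[0], ex[1], lr), w_star))
        else:
            x, y = rng.choice(pool)  # baseline: examples chosen at random
        w = sgd_step(w, x, y, lr)
        history.append(sq_dist(w, w_star))
    return w, history

rng = random.Random(0)
w_star = [2.0, -1.0]  # hypothetical target model
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
pool = [(x, sum(a * b for a, b in zip(w_star, x))) for x in xs]
_, greedy = teach(pool, w_star, [0.0, 0.0], lr=0.5, iterations=10)
_, baseline = teach(pool, w_star, [0.0, 0.0], lr=0.5, iterations=10,
                    omniscient=False, rng=rng)
print("omniscient:", [round(d, 3) for d in greedy])
print("random:    ", [round(d, 3) for d in baseline])
```

With `lr=0.5` and features in [-1, 1], each SGD step on these noise-free labels can only shrink (or keep) the learner's distance to `w_star`, so both curves are non-increasing; one would expect the omniscient curve to fall faster, which mirrors the teacher-versus-random comparison discussed in the Results.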

2.5. Applying and Interpreting the Results

Once the model is deployed and used in a production environment, Explainable AI (XAI) [11] could be used to make the results of AI systems more understandable to humans.
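
XAI covers a broad family of methods [11]. As a minimal sketch of one of the simplest cases, assuming the deployed model is linear, each feature's contribution to a prediction can be reported as its weight times its deviation from a reference input; these contributions add up exactly to the difference between the model's score for the input and its score for the reference. The feature names, weights, and baseline below are hypothetical.

```python
def explain_linear(weights, bias, x, baseline, names):
    """Additive explanation of a linear score f(x) = bias + sum(w_i * x_i).

    Each feature's contribution is w_i * (x_i - baseline_i); the contributions
    sum exactly to f(x) - f(baseline), so nothing is left unexplained.
    """
    score = bias + sum(w * v for w, v in zip(weights, x))
    base_score = bias + sum(w * b for w, b in zip(weights, baseline))
    contributions = {n: w * (v - b) for n, w, v, b in zip(names, weights, x, baseline)}
    # Most influential features first, regardless of sign.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, base_score, ranked

names = ["feature_a", "feature_b", "feature_c"]  # hypothetical features
weights = [0.8, -0.5, 0.1]                       # hypothetical model weights
score, base, ranked = explain_linear(weights, bias=-0.2, x=[2.0, 2.0, 4.0],
                                     baseline=[1.0, 1.0, 1.0], names=names)
for name, c in ranked:
    print(f"{name}: {c:+.2f}")
```

For non-linear models no such exact decomposition exists in general, which is where the model-agnostic techniques reviewed in [11] come in.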

There are specific domains in which the aforementioned methods could help a model meet its targets. For example, ML approaches can be of particular interest in Health Informatics, where large datasets are lacking, complex data and/or rare events must be handled, and traditional learning algorithms suffer from insufficient training samples [2].

3. Results

To date, we have explored two of the techniques described: Iterative Machine Teaching (iMT) and Active Learning (AL). We have analyzed how to integrate them into the learning process using common datasets: Gaussian, MNIST, and Vehicles.

Our proposal for incorporating iMT and AL into the machine learning loop is to use iMT to obtain the “Minimum Viable Data (MVD)” for training a learning model, that is, a dataset that speeds up and simplifies the learning process by allowing early prototypes to be built.
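
The MVD idea can be illustrated with a small sketch. Under our own assumptions (a 1-nearest-neighbour learner, a toy labelled pool, and a hypothetical `target_acc` threshold), a teacher that knows every label greedily adds the example that most improves the learner's accuracy on the pool and stops as soon as the target is reached; the selected subset plays the role of the MVD. This greedy procedure is a stand-in for illustration, not the iMT teachers evaluated in [12].

```python
def nn_predict(train, x):
    """1-nearest-neighbour prediction for a 1-D feature."""
    return min(train, key=lambda ex: abs(ex[0] - x))[1]

def accuracy(train, data):
    """Fraction of `data` that a 1-NN learner trained on `train` gets right."""
    return sum(nn_predict(train, x) == y for x, y in data) / len(data)

def minimum_viable_data(pool, target_acc):
    """Greedily grow a training subset until the learner reaches target_acc on the pool.

    The teacher knows every label in the pool and, at each step, adds the example
    that raises accuracy the most. Being greedy, the subset is small but not
    guaranteed to be the smallest possible.
    """
    selected, remaining = [], list(pool)
    while remaining:
        best = max(remaining, key=lambda ex: accuracy(selected + [ex], pool))
        selected.append(best)
        remaining.remove(best)
        if accuracy(selected, pool) >= target_acc:
            break
    return selected

# Hypothetical toy pool: 100 labelled points on [0, 9.9], class 1 from 4.5 upwards.
pool = [(i / 10, int(i >= 45)) for i in range(100)]
mvd = minimum_viable_data(pool, target_acc=0.99)
print(f"{len(mvd)} of {len(pool)} examples reach accuracy {accuracy(mvd, pool):.2f}")
```

On this pool, two well-placed examples on either side of the class boundary are enough for the 1-NN learner, which is the kind of early prototype the MVD idea targets; real data would of course need more.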

The results of applying iMT and AL to well-known datasets are reported in [12]. In the iMT experiment, both in the example problems and in the real-world problem, the algorithms trained by any of the proposed teachers obtain better results than those trained on randomly chosen examples. In the AL experiment, we found that the greatest advantage of this approach lies in the continuous improvement of the model, which enhances resilience and prevents obsolescence.

4. Discussion

Data quality is a key factor that can make a model fail in certain scenarios: the better our data, the better our algorithms will generalize. This is the idea behind the so-called data-centric approach, which underlies some of the techniques explored (e.g., Machine Teaching).

The methods described in this paper are not mutually exclusive, so they can be combined to obtain better results. Some of the techniques apply at different stages of the ML pipeline, and they can be implemented incrementally, improving the model at every step.

The outcomes of the experiments were obtained using common datasets as inputs. Although they are promising, we plan to apply these techniques to relevant medical databases such as The Cancer Genome Atlas Program (TCGA).

As future work, we are interested in applying these techniques to multi-class problems using the TCGA datasets.

5. Conclusions

The techniques described, individually or combined, can be applied to a specific domain (cancer diagnosis and prognosis), making Machine Learning (ML) methods accessible to subject-matter experts, improving the performance of both the system and the human (i.e., HITL-ML), and yielding semantic and interpretable ML models (i.e., Explainable AI).

Funding

This work has been supported by the State Research Agency of the Spanish Government, grant (PID2019-107194GB-I00/AEI/10.13039/501100011033) and by the Xunta de Galicia, grant (ED431C 2018/34) with the European Union ERDF funds. We wish to acknowledge the support received from the Centro de Investigación de Galicia “CITIC”, funded by Xunta de Galicia and the European Union (European Regional Development Fund-Galicia 2014-2020 Program), by grant ED431G 2019/01.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] Munro, R. Human-in-the-Loop Machine Learning; Manning Publications: Shelter Island, NY, USA, 2020.
[2] Holzinger, A. Interactive machine learning for health informatics: When do we need the human-in-the-loop? Brain Inform. 2016, 3, 119–131.
[3] Dellermann, D.; Calma, A.; Lipusch, N.; Weber, T.; Weigel, S.; Ebel, P. The future of human-AI collaboration: A taxonomy of design knowledge for hybrid intelligence systems. In Proceedings of the 52nd Hawaii International Conference on System Sciences, Honolulu, HI, USA, 8–11 January 2019.
[4] Settles, B. Active Learning Literature Survey; Technical Report; Department of Computer Sciences, University of Wisconsin-Madison: Madison, WI, USA, 2009.
[5] Fails, J.A.; Olsen, D.R. Interactive Machine Learning. In Proceedings of the 8th International Conference on Intelligent User Interfaces (IUI ’03), Miami, FL, USA, 12–15 January 2003; Association for Computing Machinery: New York, NY, USA, 2003; pp. 39–45.
[6] Amershi, S.; Cakmak, M.; Knox, W.B.; Kulesza, T. Power to the People: The Role of Humans in Interactive Machine Learning. AI Mag. 2014, 35, 105–120.
[7] Simard, P.Y.; Amershi, S.; Chickering, D.M.; Pelton, A.E.; Ghorashi, S.; Meek, C.; Ramos, G.; Suh, J.; Verwey, J.; Wang, M.; et al. Machine Teaching: A New Paradigm for Building Machine Learning Systems. arXiv 2017, arXiv:1707.06742.
[8] Lindvall, M.; Molin, J.; Löwgren, J. From Machine Learning to Machine Teaching: The Importance of UX. Interactions 2018, 25, 52–57.
[9] Ramos, G.; Meek, C.; Simard, P.; Suh, J.; Ghorashi, S. Interactive machine teaching: A human-centered approach to building machine-learned models. Hum.-Comput. Interact. 2020, 35, 1–39.
[10] Liu, W.; Dai, B.; Humayun, A.; Tay, C.; Yu, C.; Smith, L.B.; Rehg, J.M.; Song, L. Iterative Machine Teaching. In Proceedings of the 34th International Conference on Machine Learning (ICML-2017), Sydney, Australia, 6–11 August 2017; Volume 70, pp. 2149–2158. Available online: http://xxx.lanl.gov/abs/1705.10470 (accessed on 15 July 2021).
[11] Adadi, A.; Berrada, M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
[12] Mosqueira-Rey, E.; Alonso-Ríos, D.; Baamonde-Lozano, A. Integrating Iterative Machine Teaching and Active Learning into the Machine Learning Loop. In Proceedings of the 25th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES-2021), Szczecin, Poland, 8–10 September 2021; Volume 192, pp. 553–562.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

