
Ada-WHIPS: explaining AdaBoost classification with applications in the health sciences

Abstract

Background

Computer Aided Diagnostics (CAD) can support medical practitioners to make critical decisions about their patients’ disease conditions. Practitioners require access to the chain of reasoning behind CAD to build trust in the CAD advice and to supplement their own expertise. Yet, CAD systems might be based on black box machine learning models and high dimensional data sources such as electronic health records, magnetic resonance imaging scans, cardiotocograms, etc. These foundations make interpretation and explanation of the CAD advice very challenging. This challenge is recognised throughout the machine learning research community. eXplainable Artificial Intelligence (XAI) is emerging as one of the most important research areas of recent years because it addresses the interpretability and trust concerns of critical decision makers, including those in clinical and medical practice.

Methods

In this work, we focus on AdaBoost, a black box model that has been widely adopted in the CAD literature. We address the challenge – to explain AdaBoost classification – with a novel algorithm that extracts simple, logical rules from AdaBoost models. Our algorithm, Adaptive-Weighted High Importance Path Snippets (Ada-WHIPS), makes use of AdaBoost’s adaptive classifier weights. Using a novel formulation, Ada-WHIPS uniquely redistributes the weights among individual decision nodes of the internal decision trees of the AdaBoost model. Then, a simple heuristic search of the weighted nodes finds a single rule that dominated the model’s decision. We compare the explanations generated by our novel approach with the state of the art in an experimental study. We evaluate the derived explanations with simple statistical tests of the well-known quality measures, precision and coverage, and a novel measure, stability, that is better suited to the XAI setting.

Results

Experiments on 9 CAD-related data sets showed that Ada-WHIPS explanations consistently generalise better (mean coverage 15%-68%) than the state of the art while remaining competitive for specificity (mean precision 80%-99%). A very small trade-off in specificity is shown to guard against over-fitting, which is a known problem in the state of the art methods.

Conclusions

The experimental results demonstrate the benefits of using our novel algorithm for explaining CAD AdaBoost classifiers widely found in the literature. Our tightly coupled, AdaBoost-specific approach outperforms model-agnostic explanation methods and should be considered by practitioners looking for an XAI solution for this class of models.


Background

Introduction

Medical diagnosis is a complex, knowledge intensive process. A medical expert must consider the symptoms of a patient, along with their medical and family history including complications and co-morbidities [1]. The expert may carry out physical examinations and order laboratory tests and combine the results with their prior knowledge. These activities are time intensive and, increasingly, considered sources of Big Data [2, 3]. Suitably experienced, available practitioners and experts are needed to orchestrate and interpret the results, yet these experts are a scarce resource in many healthcare settings. As healthcare needs grow and the sources of medical data increase in size and complexity, the diagnostic process must scale to meet these growing demands.

State of the art machine learning (ML) methods underpin many computer aided diagnostics (CAD) systems. CAD can address the aforementioned scalability challenges and may improve patient outcomes [4-6]. These ML methods demonstrate exceptional predictive and classification accuracy and can handle high dimensional data sets that often have very high rates of missing values. Examples of such challenging data sets include high throughput bioinformatics, magnetic resonance imaging scans, microarray experiments, and complex electronic health records (EHR) [7, 8], as well as unstructured, user-generated content (e.g. from social media feeds) that has been used to learn individuals’ sub-health and mental health status outside of a clinical setting [9, 10]. Unfortunately, however, many state of the art ML models are so-called “black boxes” because they defy explanation. The complexity of black box models renders them opaque to human reasoning. Consequently, experts and medical practitioners are reluctant to accept black box models in practice since they need to reason about, verify and approve the model’s output before making a final decision. In the clinical setting, the model’s output should facilitate professional decision-making alongside the practitioner’s expert clinical training and experience. A standalone classification from a black box model does not serve this purpose well, if at all. This barrier to adoption is evident even when the black box models are demonstrably more accurate [1, 11-17]. There is also a legal right to explanation for high stakes decisions, which includes medical diagnosis and treatment recommendations [5, 18].

Some might argue that a black box model is no less transparent than a doctor [19]. Nevertheless, a doctor can be asked to justify their diagnosis and will do so from a position of domain understanding. In contrast, providing explanations for black box models is a very complex challenge. These models find patterns in data without domain understanding. Yet we wish to communicate explanations to a variety of levels of domain expertise: patient, practitioner, healthcare administrators and regulators. Additionally, we set higher standards of statistical rigour before granting our trust to ML derived decisions and explanations [20, 21].

Recent studies found that classification is the most widely implemented ML task in the medical sector, and solutions using the AdaBoost algorithm [22] form a significant subset of the available research. Clinical applications include the diagnosis of Alzheimer’s disease, diabetes, hypertension and various cancers [23-26]. There are also non-clinical assessments of self-reported mental health and sub-health status. The latter is characterised by chronic fatigue and infirmity that often leads to future ill-health. These non-clinical approaches used unstructured, user generated content from online health communities [9, 10]. AdaBoost has also been used as a preprocessing tool to automatically select the most important features from high dimensional data [27, 28]. Yet, AdaBoost is considered a typical black box as a consequence of its internal structure: an ensemble of typically 100s to 1000s of shallow decision trees. The ensemble uses a weighted majority vote to classify data instances; a system that is difficult to analyse mathematically. The widespread adoption of AdaBoost in medical applications, coupled with its black box nature, leads to the challenge: to make AdaBoost explainable.

We present Adaptive-Weighted High Importance Path Snippets (Ada-WHIPS), a novel method for explaining multi-class AdaBoost classification through inspection of the model internals; a collection of adaptive weighted, shallow decision trees. The method proceeds by extracting the decision path from each tree that is specific to the data instance requiring an explanation (the explanandum). Only the paths that agree with the weighted majority vote are retained. These paths are disaggregated into individual decision nodes (which we call path snippets), and the weights are reassigned according to depth within the tree and frequency within the ensemble. The most important snippets are filtered and sorted by the newly applied weights. These adaptive-weighted, high importance path snippets are then greedily added to a classification rule. The final rule is tested for quality metrics and counterfactual conditions against the training (or historical) data.

To demonstrate our contribution, we now present four illustrative examples of Ada-WHIPS explanations. These examples have been drawn at random from the data sets used in our experiments, which are all CAD or medically relevant ML problems. An Ada-WHIPS explanation is a simple, conjunctive classification rule, presented alongside confidence and counterfactual (contrast) information. This includes: generality (coverage), specificity (precision), and how much precision decreases (% points) when any single rule term is violated. The end user can immediately determine the essential attributes (the features and decision boundary) that led to the model’s confident classification:

In Table 1, statistical features computed from foetal cardiotocograms are used to diagnose heart abnormalities. In Table 2, an online health community (self-selecting) responded to a twenty-four question survey on their mental health. The classification model identifies those individuals who have actually sought treatment. The individual shown in the examples has responded that they are experiencing problems at work and that there may be a family history of mental illness. Table 3 shows attributes from an EHR that were critical in determining the risk of readmission for one particular patient. Table 4 shows the results of a classifier for abnormal thyroid conditions. Full details of the data sets used can be found in Table 6.

Table 1 Explanation of a classifier for foetal heart abnormalities
Table 2 Explanation of a non-clinical mental health assessment classifier
Table 3 Explanation of automated 30-day hospital readmission risk assessment
Table 4 Explanation of a classifier for thyroid condition

We proceed with a walk-through of the interpretation of Table 1: The model has classified the instance as “Normal”. This is on a prior of 79.0% Normal in the training (historical) data. However, the given instance has a set of readings that raises the precision to 98.2%. If an almost identical instance were found with a point change in any one of the features listed (taking the instance outside the decision boundary), precision would decrease by the amount shown in the adjacent Contrast column. The new values would be worse than a random guess on this prior, with a raised number of prolonged decelerations per second returning a different outcome code altogether. These conditions hold on 60% of the historical data, making this a high quality rule that can inform the clinician’s decision on whether any intervention is necessary – most likely not, in this case.

The rest of this paper is organised as follows: We continue this Background section with an in-depth review of the current state of the art in XAI, related work in CAD and a recap of the Multi-Class AdaBoost algorithm. We introduce our novel algorithm and describe our experimental setup in the Method section. We report our results and elaborate on their significance in the Results section. Further important points are presented in the Discussion section. The article finishes with a section on Conclusion & future work.

XAI and interpretable models - current state of the art

Medical practitioners making safety critical decisions need explanations of ML classification results that provide the required level of accountability. The current research seeks to address the challenge posed by the use of AdaBoost models in healthcare applications. In contrast to model-agnostic methods that operate on input sensitivity to synthetic data, our approach is to “open the black box” of an already trained and well performing AdaBoost model. This approach provides explanations that directly relate to the model internals. In the following paragraphs, we outline the state of the art and the novelty of our approach.

The decompositional approach [29] to interpretability is well established. “Decompositional” refers to the process of querying directly the smallest information unit of a model, e.g. the set of all decision nodes within each decision tree of an ensemble. Examples in the literature include: DefragTrees [30], Forex++ [31], RF+HC [32], inTrees [33], RuleFit [34], Brute [35]. All these methods generate a cascading rule list (CRL) as a simpler, surrogate of the original classification model. The prevalence of CRL as interpretable models indicates the importance of logical rules for explainability. Logical rules are intuitive to understand, being the standard language of reasoning [20, 36] and are the paradigm that we have adopted in our method.

The above mentioned methods are examples of globally interpretable proxy models; they allow the user to infer some understanding of the black box model’s overall behaviour. However, with such proxy models there is always a trade-off: increased interpretability comes with increased classification error and no guarantees of fidelity with the original model. Anything less than perfect fidelity means that, for some instances, proxy and model do not agree. Explanations that refer to a different class than the model’s predicted class are of no use in a safety-critical setting, such as CAD. Ada-WHIPS uses logical rules and is a decompositional method but, unlike the above mentioned methods, Ada-WHIPS explains one classification instance at a time rather than the global model behaviour described above. The method is local and post-hoc [37]. Ada-WHIPS also has perfect fidelity by design. That is, the explanation generating process begins with the black box model’s classification as its starting point and is, therefore, guaranteed to match.

Several post-hoc, per instance explanation methods have been proposed as model-agnostic frameworks (also known as didactic methods [29]). The model-agnostic assumption is that any model’s behaviour can be explained given unfettered access only to the model inputs and outputs (that is, to make an unlimited number of calls) but no access to the training data nor the model internals. Model-agnostic methods probe the model’s behaviour by generating a large, synthetic input sample. Each explanation is inferred from the effect of different input attributes on the outputs. Local Interpretable Model-agnostic Explanations (LIME) [21] generates a sparse linear model, while SHapley Additive exPlanations (SHAP) [38] uses a game theoretic approach to a similar end: a set of non-zero coefficients for the input attributes. The coefficients are additive and their magnitude is proportional to the importance, in the classification, of the attributes they represent. As a result, these methods are categorised as Additive Feature Attribution Methods (AFAM) [38]. The main disadvantage of AFAM is that it is difficult to know when to apply an AFAM explanation to another previously unseen instance that does not share all of the same attribute values associated with the coefficients. Anchors [36] and LOcal Rule-based Explanations (LORE) [39] also use synthetic samples but generate a single classification rule (CR) as an explanation (as opposed to the many rules in a CRL). A CR-based explanation resolves the main disadvantage of AFAM because it is trivial to generalise a CR to another instance; the rule either covers or does not. Anchors, developed by the same research team to overcome this shortcoming of AFAM, uses the same synthetic sampling technique as LIME. LORE uses a genetic algorithm to generate the synthetic sample, but this requires a very large number of calls to the black box model and is computationally expensive to run in its own right.

Model-agnostic techniques, while effective in image and text classification, have disadvantages on tabular data sets. For one thing, they require additional checks: variance in the sampling process can cause variance in the resulting explanations over repeated trials [40, 41]. Furthermore, for tabular data, a realistic synthetic distribution must be estimated from the training data set or a large i.i.d. sample. This requirement violates the model-agnostic assumption of accessing only the inputs and outputs of the black box model. LIME, Anchors and SHAP sample from the marginal training distributions, while LORE explores the marginal input domains. Clearly, such synthetic samples have no guarantee of representing the underlying population because they do not use the joint distribution. In most real-world problems, the joint distribution is unknown or intractable. Yet, although these methods explicitly access the training data, no rationale is given in the relevant articles for not using the empirical distribution, for example via the bootstrapping method used in Brute [35]. Consequently, these model-agnostic methods are thought to put too much weight on unlikely or impossible examples. Moreover, LIME and Anchors require all features of tabular data to be categorical. Continuous features must be discretised in advance of training the classification model. To this end, quartile binning [36] is proposed by the authors (illustrated in the sketch below). This is an arbitrary procedure and a significant compromise that puts constraints on the model of choice and potentially loses important information from the continuous features.
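For illustration, the following minimal sketch shows quartile binning of a single continuous feature with pandas; the column name and data are hypothetical stand-ins rather than values from the data sets used later in this work.

```python
import numpy as np
import pandas as pd

# Hypothetical continuous feature; in practice this would be one column of a
# tabular data set, discretised before the classifier that Anchors explains is trained.
rng = np.random.default_rng(0)
df = pd.DataFrame({"baseline_heart_rate": rng.normal(133, 10, 500)})

# Quartile binning: each value is replaced by its quartile index (0-3).
df["baseline_heart_rate_q"] = pd.qcut(df["baseline_heart_rate"], q=4, labels=False)
```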

Ada-WHIPS, in contrast, assumes access to both the model internals and the training data. By decomposing the internals, using the adaptive weights and executing a greedy heuristic against the bootstrapped training data, Ada-WHIPS is an open-the-box method that uses the empirical distribution instead of a synthetic distribution. Furthermore, Ada-WHIPS exploits the information-theoretic discretisation of the continuous features that occurs when the individual decision trees are induced during AdaBoost model training. This information-preserving approach is an advantage over the methods that require discretisation as a preprocessing step. Model-agnostic methods can also be slow to compute. For example, computing Shapley Values entails solving a large combinatorial problem, which limits scalability [42], while LORE’s synthetic samples are generated by a genetic algorithm that is not parallelisable in the currently available version (Footnote 1). Ada-WHIPS is fast, as our experimental study shows.

We suggest that the model-agnostic assumption should be taken with caution. There is a prevailing view in the XAI research community that model-agnostic methods are a very active research area while model-specific methods may be in decline. Yet, in a recent, comprehensive literature review [43] the following methods were categorised as model-agnostic when, in fact, they are model-specific: Saliency Maps, Activation Maximisation, Layer-wise Relevance Propagation. These methods all require access to the internal neurons of an Artificial Neural Network, and their categorisation as model-agnostic may be a sign of confirmation bias in the research community. We also argue that model-agnostic methods are only required for a subset of ML problems, such as model auditing by an external third party. This scenario does not apply in CAD system development, where the capability to add explanations would come from the owners of the model and data themselves. With access to both the training data and the model, decompositional methods should always be considered since they do not rely on synthetic data and can deliver explanations that are more representative of the model’s internals [43]. Treeinterpreter [44] is possibly the earliest model-specific explanation method, applicable to regression problems with Random Forest models. TreeSHAP [42], based on the SHAP method, assumes an underlying XGBoost model and queries the internal decision nodes. This model-specific design provides faster and more consistent results than the original SHAP algorithm for XGBoost models. Thus, model-specific methods are and should remain an active and relevant research area.

Finally, very few XAI methods have so far implemented counterfactuals, which are “what if” scenarios that indicate minimal changes to the inputs that would yield a different classification. To the best of our knowledge, LORE is the only well-cited example; it applies a strict change-of-class counterfactual paradigm and only works for binary classification. Ada-WHIPS provides a more flexible counterfactual solution that shows how the confidence (specificity) of a classification changes, as opposed to a discrete change of class. This novel, probabilistic approach allows the expert user to control and interpret the results, since a decreasing confidence has ramifications even if the outcome code does not change. For example, CAD may involve rare conditions in very unbalanced data sets, so simply decreasing the probability that the individual is disease free may be enough to suggest an intervention. The method works just as well for multi-class problems.

As a minor contribution, we also provide a novel method to avoid over-fitting explanations that could potentially be applied elsewhere.

Related work

CAD is an active research area. Yet, the safety critical nature suggests that it is unethical to make diagnoses without human intervention [45, 46]. XAI in healthcare offers the paradigm to assist rather than replace the medical expert. Hence, we present recent research that aligns to this paradigm. We focus on methods that predict or classify from non-image based clinical data. Table 5 summarises our review.

Table 5 Summary of related work
Table 6 Data sets used in the experiments

Lamy et al. [47] uses a case-based reasoning (CBR) approach to recommend treatments for breast cancer patients. Using a combination of weighted k-nearest neighbours (WkNN) and multidimensional scaling (MDS), the user is presented with a visual interface making recommendations based on similarities/differences with historical cases. CBR provides the medical expert with several comparison instances/cases to evaluate, while Ada-WHIPS presents one classification rule directly extracted from the model internals that must be true of the explanandum instance while coverage statistics measure the rule’s generalisation to other instances.

Kwon et al. [48] presents RetainVis, a visual analytics application for predicting health status from health insurance data. Feature attribution values and t-SNE clustering are used to provide an interactive interface. The paper demonstrates the benefits and deeper insights available from tight coupling to a specific model; a recurrent neural network (RNN), in this case.

Adnan and Islam [31] use a novel algorithm to simplify an existing tree ensemble. The compact, surrogate model is a rule list that can be used for classifying unseen instances. The authors claim that the global behaviour of the compact model is easier to interpret than the black box ensemble, but the rule list can itself be long and time consuming to interpret. In contrast, our method is concerned with generating a single rule to explain a single instance at a time.

Jalali and Pfeifer [8] use an ensemble of linear support vector machines (L1-SVM) to predict cancer diagnosis and identify important patterns of gene expression. This novel approach is tightly coupled to the data domain (genetic biomarkers) whereas Ada-WHIPS could feasibly be applied to any tabular data including those not related to medicine or healthcare.

Turgeman and May [12] propose a simple ensemble of a C5.0 decision tree and a support vector machine (SVM). The easiest to classify instances can be explained by traversing the tree, while hard to classify instances are left to the SVM which remains a black box. Consequently, this method cannot produce a straightforward explanation for all instances, unlike our method.

Jovanovic et al. [11] implement a Tree-Lasso system for introducing domain knowledge about serious disease conditions into a sparse logistic regression model that is easy to interpret. Lasso based methods discover a small set of important features using L1-norm regularisation, but the tree-lasso requires domain knowledge to be provided a priori. Ada-WHIPS rule conditions are discovered by information theoretic tree induction during the AdaBoost model training, and no a priori inputs are required.

Letham et al. [13] proposes a novel interpretable model, the Bayesian Rule List (BRL). The model is used in stroke prediction. The predictive results are competitive with state of the art, but in common with cascading rule lists, interpretability decreases with rule depth as all previous rules must be considered and excluded. Ada-WHIPS generates one rule for one instance from a pre-trained AdaBoost model.

Caruana et al. [6] use generalised additive models (GAM) allowing second order interactions (GA2M) to predict pneumonia risk and hospital readmission. GAMs inherently provide partial dependence (PD) plots, giving insight into the global model behaviour, and excellent predictive results. Domain knowledge was required a priori to discretise several features and to determine which second order interactions to include. However, interpretation of the non-linear components remains a challenge. Our method is a completely different approach that provides an explanation for individual cases and requires no a priori domain expertise.

Kästner et al. [49] integrate expert knowledge into a neural gas. Interpretability arises from the activation of the explicitly incorporated fuzzy rules. The outputs of this novel method include scored rule conditions, but the fuzzy rules must be introduced a priori, again in contrast to Ada-WHIPS, which requires no a priori domain knowledge.

Multi-Class adaBoost

In this section, we describe multi-class AdaBoost, with which our method is tightly coupled. Boosting is a method for generating a strong classifier by sequentially combining weak, base classifiers. It is one of the most significant developments in Machine Learning [50, 51]. AdaBoost [52] was the first widely used implementation of boosting and is still favoured for its accuracy, ease of deployment and fast training time [53-55]. It uses shallow decision trees as the base classifiers. On each iteration, the training sample is re-weighted such that the next decision tree focuses on examples that were previously misclassified, while previously generated classifiers remain unchanged (the details of this iterative re-weighting are not central to this research, so we refer the interested reader to [52, 56]). AdaBoost also adaptively updates its base classifier weights based on their individual performance, which we discuss now in further detail. Two algorithms, Stagewise Additive Modeling using a Multi-class Exponential loss function (SAMME) and real-valued SAMME (SAMME.R) [56], have emerged as the standard [57] for extending the original AdaBoost algorithm from binary classification to multi-class problems. The following formulations are based on [56].

Let \(f : \mathcal {X} \longmapsto \mathcal {Y}\) be an unknown classification function that we would like to approximate, where \(\mathcal {X}\) is an \(\mathbb {R}^{d}\) input space and \(\mathcal {Y} = \{C_{1},\ \dots,\ C_{K} \}\) is the set of possible classes. Let X be an input data set and our multi-class AdaBoost model be g(X)≈f(X). To classify an instance x, the output of a SAMME model is the weighted majority vote of all the base classifiers.

$$\begin{aligned} g(\mathbf{x}) &= C_{k},\quad k = \underset{k \in \{1,\dots,K\}}{\arg\max} \sum_{m=1}^{M} \alpha^{(m)} \cdot T^{(m)}(\mathbf{x}),\\ T^{(m)}(\mathbf{x}) &= [c_{1},\ \dots,\ c_{K}],\quad \sum T^{(m)}(\mathbf{x}) = 1 \end{aligned}$$
(1)

where \([c_{1},\ \dots,\ c_{K}]\) is a one dimensional (1D) vector indicating the position of the output class and is the output of a single tree \(T^{(m)}\) at iteration m. Within this 1D vector, \(c_{k}=1,\ c_{j}=0,\ j \neq k\) indicates that \(C_{k}\) is the predicted class. The whole model \(g = \left \{\left \{T^{(1)},\ \dots,\ T^{(M)}\right \},\ \left \{ \alpha ^{(1)},\ \dots,\ \alpha ^{(M)} \right \} \right \}\) is the combination of a set of M base decision tree classifiers and a set of M classifier weights. These weights are calculated during the training phase as:

$$ \alpha^{(m)} = \log \frac{1 - {err}^{(m)}}{{err}^{(m)}} + \log(K - 1),\ 0 < {err}^{(m)} \leq 1 - \frac{1}{K} $$
(2)

where \({err}^{(m)}\) is the error rate at iteration m.
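As an illustration of Eqs. (1) and (2), the sketch below computes a SAMME classifier weight from an error rate and forms the weighted majority vote. It assumes scikit-learn-style base trees whose class labels are the indices 0 to K-1, and it is illustrative only, not the implementation used in this work.

```python
import numpy as np

def samme_alpha(err, K):
    """Classifier weight for one boosting iteration (Eq. 2).
    err is the weighted error rate of the base tree, 0 < err <= 1 - 1/K."""
    return np.log((1.0 - err) / err) + np.log(K - 1)

def samme_vote(trees, alphas, x, K):
    """Weighted majority vote over M base trees (Eq. 1). Each tree's one-hot
    vote is scaled by its classifier weight; classes are assumed to be 0..K-1."""
    scores = np.zeros(K)
    for tree, alpha in zip(trees, alphas):
        k = int(tree.predict(x.reshape(1, -1))[0])   # class voted for by this tree
        scores[k] += alpha                            # add the tree's weight to that class
    return int(np.argmax(scores))                     # class with the largest total weight
```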

To classify an instance x with SAMME.R, each base classifier returns a vector of the conditional probabilities that the class of x is \(C_{k}\). This is the distribution of training instance weights in the terminal node of the decision path taken by x through each tree:

$$ {\begin{aligned} T^{(m)}(\mathbf{x}) = [ \mathbb{P}_{T^{(m)}}(C_{1}|x),\ \dots,\ \mathbb{P}_{T^{(m)}}(C_{K}|x) ],\\ \sum T^{(m)}(\mathbf{x}) = 1,\ y \in \mathcal{Y} \end{aligned}} $$
(3)

and confidence weights are calculated at run time as:

$$ {}\alpha^{(m)}_{k}|x = (K-1)\big(\log \mathbb{P}_{T^{(m)}}(C_{k}|x) - \frac{1}{K} \sum^{K}_{j=1} \log \mathbb{P}_{T^{(m)}}(C_{j}|x)\big). $$
(4)

The output of the whole model is the majority vote based on the additive contribution of these confidence weights per class:

$$ g(\mathbf{x}) = C_{k},\quad k = \underset{k}{\arg\max} \sum_{m=1}^{M} \alpha^{(m)}_{k}|x. $$
(5)

where \(g = \left \{T^{(1)},\ \dots,\ T^{(M)}\right \}\) (weights \(\alpha ^{(m)}_{k}\) evaluated at run time).
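As an illustration of Eqs. (3)-(5), the following sketch computes the SAMME.R per-class confidence weights and the resulting vote. It assumes scikit-learn-style base trees exposing predict_proba, with classes indexed 0 to K-1; the clipping constant is a numerical safeguard added for the illustration and is not part of the formulation.

```python
import numpy as np

def samme_r_confidence(tree, x, K, eps=1e-12):
    """Per-class confidence weights alpha_k^(m)|x for one tree (Eq. 4),
    computed from the terminal-node class probabilities (Eq. 3)."""
    p = tree.predict_proba(x.reshape(1, -1))[0]       # P_T(C_k | x) from the leaf
    log_p = np.log(np.clip(p, eps, None))             # clip guards against log(0)
    return (K - 1) * (log_p - log_p.mean())

def samme_r_vote(trees, x, K):
    """Majority vote from the additive per-class confidences (Eq. 5)."""
    scores = np.zeros(K)
    for tree in trees:
        scores += samme_r_confidence(tree, x, K)
    return int(np.argmax(scores))
```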

Method

Ada-WHIPS

We now present Ada-WHIPS, our algorithm for generating a CR based explanation for the classification of an explanandum instance x by a previously trained AdaBoost model g. The algorithm begins by initialising a rule as an empty antecedent and the classification outcome g(x) as the consequent. Thus, the CR always agrees with the black box, by design. The algorithm then proceeds through the steps shown in Fig. 1, to identify a small set of antecedent terms, or logical conditions. These conditions must be true of x and must exert the most influence on the classification result. The source of these logical conditions is the ensemble of decision trees that make up g. The influence is determined by the classifier weights within the internals of g, which themselves are derived from the error rates (weights increase as errors decrease).

Fig. 1 Conceptual diagram of Ada-WHIPS

Extract decision paths

An AdaBoost model typically comprises 100s to 1000s of shallow decision trees, potentially resulting in a very large search space. For a given \(\mathbf{x} \in X\), we can reduce this space logarithmically by considering only the decision path taken by x in each decision tree and ignoring all other branches. The paths retain all the information about how g(x) was determined. A conceptual example of extracting the decision path is shown in Fig. 2, and a code sketch of this step follows the figure. Here, \(\mathbf {x} = \{\dots,\ x_{i} = 0.1,\ x_{j} = 10,\ \dots \}\), where \(x_{i}\) is the attribute value of the ith feature. The decision path starts from the root node Q1 and follows the binary split conditions down to a leaf node. The decision path contains node detail triples of the form \((j,\ \nu,\ \tau)\), where j is a feature index, \(\nu \in \mathbb {R}\) is the threshold for the inequality \(x_{j} < \nu\), and \(\tau \in \{0,1\}\) is the binary truth of evaluating the inequality. Note that for this instance, all other nodes are irrelevant. For example, even though the condition at Q7 \((x_{i} < 1.0)\) is true of x, that node cannot be reached by x because of the evaluation at Q5.

Fig. 2 Conceptual diagram of a decision path for one instance through one tree
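The following minimal sketch illustrates this extraction for a single fitted scikit-learn DecisionTreeClassifier, converting the visited internal nodes into (j, ν, τ) triples. Note that scikit-learn routes an instance to the left child when \(x_{j} \leq \nu\), so the sketch tests a non-strict inequality; it is illustrative only.

```python
import numpy as np

def extract_path_snippets(tree, x):
    """Return the decision path of instance x through one fitted sklearn tree
    as (feature_index, threshold, truth) triples; leaf nodes are skipped."""
    t = tree.tree_
    node_ids = tree.decision_path(x.reshape(1, -1)).indices   # nodes visited by x
    snippets = []
    for node in node_ids:
        j = t.feature[node]
        if j < 0:                       # negative feature index marks a leaf
            continue
        nu = t.threshold[node]
        tau = bool(x[j] <= nu)          # truth of the split condition for x
        snippets.append((int(j), float(nu), tau))
    return snippets
```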

The search space can be further reduced by considering only those trees that agreed with the weighted majority vote. The rationale for this is based on the application of maximum margin theory to boosting [58]. If x is an unseen instance, the margin in SAMME is:

$$ \begin{aligned} {margin} &= \frac{a^{+} - a^{-}}{\sum^{M}_{m=1} \alpha^{(m)}},\quad a^{+} = \sum^{|\mathcal{T}^{+}|}_{n=1} \alpha^{(n)},\quad a^{-} = \frac{1}{K-1} \sum^{K}_{k = 1} \sum^{|\mathcal{T}^{-}|}_{u=1} \alpha^{(u)}, \\ \mathcal{T}^{+} &= \left\{T : g(\mathbf{x})= C_{k},\ k = \underset{k \in \{1,\dots,K\}}{\arg\max}\ T(\mathbf{x}) \right\},\\ \mathcal{T}^{-} &= \left\{T : g(\mathbf{x}) = C_{k},\ k \neq \underset{j \in \{1,\dots,K\}}{\arg\max}\ T(\mathbf{x}) \right\},\quad T^{(m)}, \alpha^{(m)} \in g. \end{aligned} $$
(6)

The quantity \(a^{+}\) represents the sum of weights from the classifiers that voted for the majority class, and \(a^{+} > a^{-}\) always holds for the majority class. The set \(\mathcal {T}^{+}\) contains the base classifiers that voted with the majority and thus contributed their weight to \(a^{+}\), and \(\mathcal {T}^{-}\) contains the remaining classifiers. \(\mathcal {T}^{+}\) completely determines the ensemble’s output for a given instance because an ensemble classifier formed from the union of \(\mathcal {T}^{+}\) and any subset of \(\mathcal {T}^{-}\) would return the same classification with a larger margin, since \(a^{-}_{*} < a^{-},\ \mathcal {T}^{-}_{*} \subset \mathcal {T}^{-}\). We found no margin formalisation for SAMME.R in the literature, but we can define \(\mathcal {T}^{+} := \left \{ (T^{(m)}, \alpha ^{(m)}_{k}) : \alpha ^{(m)}_{k} \geq \alpha ^{(m)}_{j},\ k,j \in \{1,\ \dots,\ K\} \right \}\) and, as a convenience, substitute the α terms in Eq. (6) with the following Kullback-Leibler (KL) Divergence. The KL-Divergence (also known as “relative entropy”) measures the information lost when a distribution P′ is used, instead of the true distribution P, to encode a random variable. It is defined as:

$$ D_{KL}(P \parallel P') = \sum_{x \in \mathcal{X}} P(x) \log \left(\frac{P(x)}{P'(x)} \right) $$
(7)

and we set P and P′ as the posterior class distribution of each \(T^{(m)}(\mathbf{x})\) given in Eq. (3), and the prior class distribution in the training data, respectively. The KL-Divergence will be larger for trees that classify with greater accuracy, relative to the prior distribution. The \(D_{KL}\) emulates the classifier weights yielded by Eq. (2), which allows the rest of the algorithm to proceed in an identical manner for SAMME and SAMME.R.
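A minimal sketch of Eq. (7) follows; it is reused below when scoring individual decision nodes. The clipping constant is a numerical safeguard added for the illustration.

```python
import numpy as np

def kl_divergence(p, p_prior, eps=1e-12):
    """Relative entropy D_KL(P || P') of a class distribution p against a
    reference distribution p_prior (Eq. 7)."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    p_prior = np.clip(np.asarray(p_prior, dtype=float), eps, None)
    return float(np.sum(p * np.log(p / p_prior)))
```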

Redistribute adaptive weights

To avoid a combinatorial search of all the available decision nodes, we sort them, prior to rule merging, according to their ability to separate the classes. To do this, we disaggregate the entire set of decision paths into individual decision nodes and redistribute the classifier weights onto the nodes. This procedure is illustrated in Algorithm 1. The contribution of each node is conditional on the previous nodes in the path, so the sorting must take into account the node order in the originating tree. To do this, we apply Eq. (7) to determine the relative entropies at each point in a path. For each root node, we set P and P′ as the class distribution after applying that decision to the training data and the prior class distribution, respectively. For subsequent nodes, P is the class distribution after applying all previous decision nodes including the current node, and P′ is the distribution up to but not including the current node. The relative entropy scores for nodes in a single path are normalised such that their total equals the classifier weight \(\alpha^{(m)}\). The scores are grouped and summed for nodes that appear in multiple paths. We filter the nodes, keeping only those with the largest weights (e.g. the top 20%). Finally, all nodes from all paths are sorted by this score in descending order.
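The sketch below outlines this redistribution in the spirit of Algorithm 1 (not reproduced here). It reuses kl_divergence from the previous sketch; class_dist_fn is a hypothetical helper that returns the training-set class distribution after applying a sequence of decision nodes.

```python
from collections import defaultdict

def redistribute_weights(paths, alphas, class_dist_fn, prior, keep_frac=0.2):
    """Sketch of the redistribution step. Each path is a list of
    (feature, threshold, truth) snippets from one majority-voting tree;
    alphas holds the corresponding classifier weights; class_dist_fn is a
    hypothetical helper returning the training-set class distribution after
    applying a sequence of snippets; prior is the prior class distribution."""
    node_weights = defaultdict(float)
    for snippets, alpha in zip(paths, alphas):
        scores, previous = [], prior
        for i in range(len(snippets)):
            current = class_dist_fn(snippets[:i + 1])        # distribution incl. this node
            scores.append(kl_divergence(current, previous))  # relative entropy vs. parent
            previous = current
        total = sum(scores) or 1.0
        for snippet, score in zip(snippets, scores):
            node_weights[snippet] += alpha * score / total   # path scores sum to alpha
    ranked = sorted(node_weights.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:max(1, int(keep_frac * len(ranked)))]     # keep the heaviest snippets
```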

Generate classification rule

It is trivial to convert the node detail triples (j,ν,τ) into antecedent terms of a CR [59]. We use nodes and terms interchangeably from here on. The objective is to find a minimal set of terms that maximises both precision and coverage while mitigating the problem of over-fitting. Over-fitting can occur if we maximise precision as an objective function. We risk converging on “tautological” rules that provide no generalisation. This is because precision is trivially maximised by single instances. A tautological rule contains enough terms to identify a single instance uniquely. In a noisy data set, there could be many such local maxima. Therefore, we propose stability as a novel objective function, defined as:

$$ \zeta(\mathbf{x}, g, \mathbf{Z}) = \frac{|\{ \mathbf{z} : g(\mathbf{z}) = g(\mathbf{x}),\ \mathbf{z} \in \mathbf{Z} \}|}{|\mathbf{Z}| + K} $$
(8)

where Z is the set of instances covered by the current rule and K is the number of classes. The maximum achievable ζ is \(\frac {1}{1+K}\) for a single covered instance but approaches precision asymptotically as \(|\mathbf{Z}| \rightarrow \infty \). Stability, therefore, acts as a brake on adding too many terms and over-fitting. We proceed with a breadth-first search, iteratively adding terms to an initially empty rule. We always add the first term in the sorted list. Then, we work down the list, greedily adding further terms if they increase stability and discarding them if they do not. The algorithm stops when a threshold stability (e.g. 0.95) is reached or the list is exhausted. These steps are illustrated in Algorithm 2.
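The following sketch illustrates the stability objective (Eq. 8) and the greedy merge in the spirit of Algorithm 2. Here, rule_covers is a hypothetical helper that tests whether an instance satisfies every term of a candidate rule, instances stands for the training (or bootstrapped) data, and predict stands for the black box classification function g.

```python
def stability(rule, target_class, instances, predict, K):
    """Stability (Eq. 8): Z is the subset of instances covered by the rule;
    agreeing classifications are divided by |Z| + K."""
    Z = [z for z in instances if rule_covers(rule, z)]        # hypothetical helper
    agree = sum(1 for z in Z if predict(z) == target_class)
    return agree / (len(Z) + K)

def greedy_merge(sorted_terms, target_class, instances, predict, K, threshold=0.95):
    """Greedily add terms from the weight-sorted snippet list while stability improves."""
    rule = [sorted_terms[0]]                                  # always take the top term
    best = stability(rule, target_class, instances, predict, K)
    for term in sorted_terms[1:]:
        if best >= threshold:                                 # stop at the stability target
            break
        score = stability(rule + [term], target_class, instances, predict, K)
        if score > best:                                      # keep only improving terms
            rule.append(term)
            best = score
    return rule, best
```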

Generate counterfactuals

Counterfactuals answer the question “what would have happened if...?” They illustrate minimal changes in the inputs that would give different results. Some authors define counterfactual (sometimes called contrastive) explanations as a minimal change set on the inputs that would return a different result [5, 15, 39, 60]. However, discrete change-of-classification counterfactuals do not allow any uncertainty. We suggest a fuzzy definition is better suited here; namely, whether precision (specificity) decreases beyond a user-defined tolerance. The expert can better exercise their judgement with this approach. For example, decreasing from high to low confidence in a CAD or risk score can lead to requests for additional tests, a less aggressive clinical intervention and so on. Since the definition of counterfactuals is a minimal change set, it is not necessary (nor even practical) to provide every possible input scenario. It suffices to show the effect of each point change, and this is easy to do with a CR simply by changing each of the rule terms, one at a time. Any point change that does not decrease the precision beyond the user-defined tolerance represents a non-counterfactual change and can be removed from the rule. This procedure provides an intuitive pruning mechanism for removing redundant terms that might have been added during the greedy rule merge algorithm. We illustrate this concept visually in Fig. 3. Here, a model with a complex decision boundary is trained on a synthetic data set (a Gaussian mixture model) which has two classes, shown as triangles and circles. The model classifies an explanandum instance x as a triangle. The explanation found is the following CR: \(\{\mathbf {z} : a \leq z_{1} \leq b,\ c \leq z_{2} \leq d,\ \mathbf {z} \in \mathcal {X}\} \implies \text {triangle}\). The counterfactual spaces are those immediately adjacent to the four rule boundaries, derived by reversing one inequality at a time:

$$ \begin{aligned} \big\{ &\{\mathbf{z} : z_{1} \leq a, c \leq z_{2} \leq d\},\ \{\mathbf{z} : b \leq z_{1}, c \leq z_{2} \leq d\},\\ &\{\mathbf{z} : a \leq z_{1} \leq b, z_{2} \leq c\},\ \{\mathbf{z} : a \leq z_{1} \leq b, d \leq z_{2}\},\ \mathbf{z} \in \mathcal{X} \big\} \end{aligned} $$
(9)
Fig. 3 Counterfactual spaces - conceptual diagram

Even though the triangle class is still predicted for parts of these spaces, the expected precision decreases drastically for a CR that is formed from any one of these counterfactual spaces for the antecedent and the same consequent. Thus, the original rule provides a crisp boundary where the maximal precision holds. The counterfactual rules communicate how much precision decreases when the rule is violated in any one dimension.
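The contrast values reported in the worked examples can be produced by flipping one rule term at a time and measuring the resulting drop in precision, as in the illustrative sketch below; rule_covers and flip_term (which reverses the inequality of a single term) are hypothetical helpers.

```python
def counterfactual_contrasts(rule, target_class, instances, predict, flip_term):
    """For each rule term, reverse its inequality (flip_term is a hypothetical
    helper) and report the drop in precision over the covering instances."""
    def precision(r):
        covered = [z for z in instances if rule_covers(r, z)]  # hypothetical helper
        if not covered:
            return 0.0
        return sum(1 for z in covered if predict(z) == target_class) / len(covered)

    base = precision(rule)
    contrasts = []
    for i, term in enumerate(rule):
        flipped = rule[:i] + [flip_term(term)] + rule[i + 1:]  # violate one boundary
        contrasts.append((term, base - precision(flipped)))    # contrast: precision drop
    return contrasts
```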

Experimental design

We compared Ada-WHIPS in an experimental study with the state of the art. Three metrics are used to measure effectiveness, namely, coverage, precision and our new measure of stability. Efficiency, in terms of computing performance, is measured using the average time to generate an explanation. Comparisons are made against two other CR-based, per instance explanation methods: Anchors [36] and LORE [39]. Both methods are model-agnostic. Readers who are familiar with XAI research may question the omission of LIME [21] and SHAP [38], which are the most discussed per instance explanation methods. LIME and SHAP fall into a different class of methods, described as additive feature attribution methods (AFAM). AFAM are, effectively, local linear models (LM) whose coefficients relate the importance of various attributes to the original model’s classification of the explanandum. There is no obvious way to apply the local LM for one instance to any other instances in order to calculate the quality measures such as precision and coverage, and comparison with CR-based methods is of limited value [36]. Fortunately, Anchors has been developed by the same research group that contributed LIME and uses the same synthetic sampling technique. Anchors can be viewed as a rule-based extension of LIME and its inclusion into this experimental study provides a useful comparison to best in class AFAM research.

Hardware setup

The experiments were conducted using Python 3.6.x running on a standalone Lenovo ThinkCentre with Intel i7-7600 CPU @ 3.4GHz and 32GB RAM using the Windows 10 operating system.

Data sets

We used the nine data sets described in Table 6. These were sourced from the UCI Machine Learning repository [61] and represent specific disease diagnoses from clinical test results, with three exceptions: the mental health surveys (Kaggle), which represent case studies in detecting mental health conditions from non-clinical online health community data; the hospital readmission data (Kaggle), which represents a large EHR; and understanding society [62], which is from the General Population Sample of the UK Household Longitudinal Study and is used under license. We use the file from waves 2 and 3, where participants had a health visit carried out by a qualified nurse. At least one study [63] has shown that the biomarkers measured in the survey may be associated with the results from self-completion instruments measuring mental health. We run a classification task for the SF-12 Mental Component Summary (MCS), which has been discretised into the nominal values “poor”, “neutral” and “good”.

Limitations of the study

Unfortunately, after finalising our experimental design we discovered that LORE was not scalable. The time cost of generating a synthetic distribution by means of a genetic algorithm rendered the method unusable on some of the data sets. The time per instance was, on average, twenty-five to thirty minutes for the hospital readmission data set and more than two hours for the understanding society data set. The method generated system errors on the mental health survey ’14 data set and was not runnable at all. We thoroughly examined the source code to look for opportunities to parallelise the operation, but the presence of a dynamically generated, non-serialisable distance function rendered this impossible. We have included the results where the method did run to completion.

AdaBoost model training and testing

Each data set was split into training and test sets (70%, 30%) by random sampling, without stratification or other class imbalance correction. We trained AdaBoost models using ten-fold cross-validation of the training set over the number of trees \({ntrees} \in \{200,\ 400,\ \dots,\ 1600 \}\); the maximum tree depth parameter maxdepth was always 4. We used the ntrees setting that delivered the highest classification accuracy to train a final model on the whole training set.

As mentioned in the section on related work, Anchors requires all features of the data to be categorical [36]. For our experiments, we generated a copy of each data set and discretised it using Anchors’ provided quartile binning function. A second AdaBoost model was generated from this discretised data set for Anchors to explain. Training and test splits used the same indices as the undiscretised versions. Each test set was then used as the pool of unseen instances to be classified by the AdaBoost model and explained by Ada-WHIPS, Anchors and LORE. Thus, there are three comparable explanations for each test instance. Generating explanations is done instance by instance, not batch-wise as in classification. So, owing to time constraints, the number of instances (test units) was limited to either the whole test set or the first one thousand test instances, whichever was smaller. For each explanation, all the remaining instances from the entire test set were used to assess the standard quality measures, precision and coverage, along with the novel quality measure, stability (8), which is more sensitive to over-fitting. This leave-one-out procedure ensures that test scores are not biased by leakage of information from the explanation-generating instance. The entire procedure is repeated for SAMME and SAMME.R AdaBoost models.
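The sketch below is an illustrative reconstruction of this protocol using scikit-learn and a synthetic stand-in data set; the grid values follow the text, but the code is not the exact experimental harness (in newer scikit-learn releases, base_estimator has been renamed estimator).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one of the nine tabular data sets.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Ten-fold CV over ntrees in {200, 400, ..., 1600}; maxdepth fixed at 4.
grid = GridSearchCV(
    AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=4),
                       algorithm="SAMME.R"),
    param_grid={"n_estimators": list(range(200, 1601, 200))},
    cv=10, scoring="accuracy")
grid.fit(X_train, y_train)
model = grid.best_estimator_   # GridSearchCV refits the best setting on all training data
```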

We present the performance scores of the trained models in Table 7. It is important to note that the model training is part of the experimental setup and not to be taken as results per se. These training scores simply reflect the performance of AdaBoost; critiquing the performance of AdaBoost itself is not the objective of this work. We provide this level of detail only to demonstrate that the trained AdaBoost models reasonably approximate the underlying data sets and are very accurate. However, a true explanation by definition must stay faithful to the trained model regardless of whether the model is accurate or not (though a poor model would never be used in clinical practice). We show generalisation accuracy scores and Cohen’s κ for the two models (discretised and undiscretised data set variants). Cohen’s κ is a useful measure in multi-class problems and class imbalanced data because this statistic corrects for chance agreement, which can be high in such cases. Values close to zero indicate a high degree of chance agreement. See Appendix for further details on Cohen’s κ.

Table 7 Final AdaBoost model scores
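For reference, Cohen's κ is available directly in scikit-learn; the labels below are purely illustrative.

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative predictions on a class-imbalanced, multi-class task.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 2, 0]
print(cohen_kappa_score(y_true, y_pred))   # corrects accuracy for chance agreement
```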

Significance testing

Our approach for the experimental study is based on the simulated user study implemented in [36]. In that study, coverage represents the fraction of previously unseen instances a user could attempt to classify after seeing an explanation and thence how generally the rule applies to the whole population. Similarly, precision represents the fraction of those classifications that would be correct if a user applied the explanation correctly, indicating the specificity of the rule. Real users who were shown high coverage and precision rule-based explanations demonstrated significantly improved task completion scores over those who were shown AFAM explanations.

To determine statistical significance, we report differences between precision, stability and coverage among the algorithms using non-parametric hypothesis tests. The reason for using these tests is that the measures are proportions on the interval [0,1] and are very right-skewed by design, since each method tries to generate very high precision explanations. We use the paired samples Wilcoxon signed rank test where we have results for just Ada-WHIPS and Anchors. The null hypothesis of this test is that the medians of the two samples are equal and the alternative is that the medians are unequal. We use the Friedman test where we have results for all three methods. The Friedman test is a non-parametric equivalent to ANOVA and an extension of the rank sum test for multiple comparisons. The null hypothesis of this test is that there is no significant difference between the mean ranks of all the groups and the alternative is that at least two mean ranks are different. For all our three-way comparisons using the Friedman test, p-values were vanishingly small (≈0). So, in our report that follows, we proceed directly to the recommended pairwise, post-hoc comparison test with the Bonferroni correction (for three pairwise comparisons) proposed in [64]. It is sufficient for this study to demonstrate whether the top scoring algorithm was significantly greater than the second place algorithm on our quality measures of interest. The critical value for a two-tailed test with the Bonferroni correction is \(\frac {0.025}{3} = 0.00833\). See Appendix for further details on the Friedman test applied here.
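Both tests are available in SciPy. The sketch below applies them to hypothetical per-instance scores purely to illustrate the testing procedure; it does not reproduce the study's results.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical per-instance scores for three explanation methods (stand-ins only).
rng = np.random.default_rng(1)
ada_whips = rng.beta(8, 4, size=200)
anchors = rng.beta(4, 8, size=200)
lore = rng.beta(6, 6, size=200)

# Two-way comparison: paired Wilcoxon signed-rank test (H0: equal medians).
w_stat, w_p = wilcoxon(ada_whips, anchors)

# Three-way comparison: Friedman test (H0: equal mean ranks), followed in the
# study by pairwise post-hoc tests at the Bonferroni-corrected level 0.025/3.
f_stat, f_p = friedmanchisquare(ada_whips, anchors, lore)
print(w_p, f_p)
```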

The three-way post-hoc tests and the two-way comparisons are shown in separate tables to avoid drawing invalid comparisons. The mean rank, rather than the mean, is given in the tables, as this is the statistic compared between groups by the chosen tests. A significant result is indicated by ** and the winning algorithm is formatted in boldface only if the results are significant.

Results

We begin by presenting the four worked examples from the introduction. Then, we assess the aggregated quality measures for the test samples. For each measure, we present a dot chart showing the mean score (with standard errors) aggregated over all the test instances. In several cases, the results are close, resulting in over-plotting that could lead to confusion as to whether two or three results are returned for a given data set. To assist the reader in distinguishing the scores, a guide line has been added. However, each data set should still be viewed as a separate experiment.

Worked examples

Tables 8, 9, 10 and 11 present the worked examples from our introduction. Readers are reminded that the paths taken by a single instance in a pre-trained AdaBoost model are disaggregated into individual decision nodes. The most important of these nodes are recombined into a high quality rule for explaining the model’s classification. Note that the models had different numbers of iterations, and trees can grow to any depth up to the maximum of 4. It is also interesting to note a detail about the paths from trees that disagreed with the majority classification: while they cover the instance (as they must), their boundary attributes are very distant from the instance attributes in the input space. We suggest that this is in keeping with the theoretical principles of AdaBoost – each iteration focuses on misclassified instances of the previous iteration, leading to a very different decision boundary in the next tree.

Table 8 Worked example for foetal heart abnormalities data set
Table 9 Worked example for non-clinical mental health assessment data set
Table 10 Worked example for automated 30-day hospital readmission risk assessment data set
Table 11 Worked example for thyroid condition data set

Coverage analysis

We present a visual analysis of the raw data (see Appendix for results tables) and tabulate the results of our statistical tests. A cursory inspection of the mean coverage charts shown in Figs. 4-5 indicates that Anchors has the lowest mean coverage over all the data sets but the comparison between Ada-WHIPS and LORE is less clear cut. The results of the hypothesis tests are given in Tables 12-13. The Wilcoxon tests showed that Ada-WHIPS always has significantly higher coverage than Anchors. Ada-WHIPS was the top algorithm in all but three of the post-hoc tests for three-way comparisons and in the top two alongside LORE with no significant difference for the remaining tests.

Fig. 4 Mean Coverage for SAMME model explanations. Guide lines are added to mitigate over-plotting

Fig. 5 Mean Coverage for SAMME.R model explanations. Guide lines are added to mitigate over-plotting

Table 12 Coverage: Top two by mean rank (mrnk) for three-way comparisons
Table 13 Coverage: Mean rank (mrnk) for two-way comparisons

Precision analysis

The mean precision charts (Figs. 6-7) show that LORE has the lowest precision in all but one of the data sets where LORE results are available. It is harder to see whether there is a definitive lead between Ada-WHIPS and Anchors.

Fig. 6 Mean Precision SAMME. Guide lines are added to mitigate over-plotting

Fig. 7 Mean Precision SAMME.R. Guide lines are added to mitigate over-plotting

However, the complete picture – and the cost to Anchors of implementing a precision guarantee – can be seen in the distribution charts in Figs. 8-9. Here we see that a certain proportion of explanations have a precision of 0.0. The result shows that Anchors (and LORE to a lesser extent) is over-fitting. Some explanations are so specific that they only explain the explanandum and do not generalise to other instances in the test set. We present the proportion of 0.0 precision explanations that were returned by each algorithm in Table 14.

Fig. 8 Distributions of Precision SAMME

Fig. 9 Distributions of Precision SAMME.R

Table 14 Proportion of over-fitting, 0.0 precision explanations

The proportions vary from around 0.5% to 28%. There are important consequences for methods that suffer this level of over-fitting. The most important consequence is that 0.0 precision rules are so specific that they uniquely identify the explanandum but cover no other instance. A unique identifier does not provide any useful new information to explain the model’s classification. For the person requiring the explanation, this outcome represents a failure of the system. The lowest failure rates (0.5%) may be tolerable, depending on the criticality or compliance requirements of the application. However, we do not foresee any circumstances where a failure rate at the upper end of this range (28%) would ever be acceptable. Secondly, such over-fitting is symptomatic of an algorithm that generates rules that are overly long, having too many terms in the antecedent to be easily interpretable. To show the link between over-fitting and rule length, we present the rule length distribution in Fig. 10.

Fig. 10 Distributions of Rule Length. Note the y-axis is log10 scaled

We present the results of the hypothesis tests in Tables 15-16. Of the three algorithms, Anchors clearly dominates on a statistical test of median differences. However, we have shown that these results should be taken with caution. To begin with, Anchors required us to discretise the data as a preprocessing step, which resulted in alternative models that were less accurate classifiers. The difference was two or more percentage points for 7/9 data sets with SAMME models and 5/9 with SAMME.R models. Moreover, Anchors has a long-tailed distribution of rule length, and sometimes a high proportion of critically over-fitting explanations. The tabulated means of precision do not show a clear difference between Ada-WHIPS and Anchors (see Appendix). Furthermore, precision (specificity) is in a trade-off with coverage (generality). Rules that are too specific only apply to a small fraction of other instances. Ada-WHIPS makes a very small trade-off (just a percentage point or two in most cases), and delivers much more generalisable rules that rarely, if ever, over-fit. This behaviour is the result of optimising the novel stability function (Eq. 8).

Table 15 Precision: Top two by mean rank (mrnk) for three-way comparisons
Table 16 Precision: Mean rank (mrnk) for two-way comparisons

Stability analysis

Stability can also be used as a quality measure in the XAI setting. A precision of 0.0 for an explanation on a held-out test set can be caused by sampling artefacts (i.e. the ground-truth probability of finding certain attribute values may be non-zero, and they are simply under-represented in the data set). For this reason, it can be argued that a precision of 0.0 is a harsh penalty against the aggregate score. Yet, if the rule covers and is correct for just a single instance in the held-out set, the precision will be 1.0. This circumstance creates a discontinuity and gives a huge advantage to undesirable, over-fitting explanations. Instead of precision, we can measure stability while including the explanandum in the held-out set. This condition results in the formulation \(\frac {n + 1}{m + K}\), where n is the number of covered and correct instances, m is the number of covered instances and K is the number of classes. See Eq. (8). Thus, stability is very similar to the classical additive smoothing function (precision with Laplace correction [65]). The minimum/maximum are both \(\frac {1}{1 + K}\) for N=1 but approach 0/1 asymptotically as \(N \rightarrow \infty \). We present the visual analysis of stability in Figs. 11-12 and the results of the hypothesis tests in Tables 17-18. The post-hoc tests for the three-way comparisons show that Ada-WHIPS is the top algorithm, or in the top two with no statistical difference, in all except mental health survey ’16 for the SAMME model. For the two-way comparisons, Ada-WHIPS has a significantly higher rank for hospital readmission (SAMME) and thyroid (SAMME.R) but lower for the remaining results.

Fig. 11 Mean Stability SAMME. Guide lines are added to mitigate over-plotting

Fig. 12 Mean Stability SAMME.R. Guide lines are added to mitigate over-plotting

Table 17 Stability: Top two by mean rank (mrnk) for three-way comparisons
Table 18 Stability: Mean rank (mrnk) for two-way comparisons

Efficiency analysis

Finally, we show the distribution of computation time per explanation in Fig. 13. A brief visual inspection shows that Ada-WHIPS and Anchors are roughly comparable across all data sets; the shortest run-times are fractions of a second and the longest are two to three minutes. LORE's run-times are several orders of magnitude longer. As discussed in previous sections, it was prohibitive to run LORE on the mental health survey ’14, hospital readmission, thyroid and understanding society data sets, with a single explanation taking over two hours to generate. We performed both static and dynamic analysis of the LORE source code and found that the bottleneck was a non-parallelisable, genetic-algorithmic step.

Fig. 13 Distributions of Computation Time per Explanation. Note the y-axis is log10 scaled

Discussion

Advantages of Ada-WHIPS

Our method improves on prior research in that it delivers explanations that have high mean coverage (15%-68%). Ada-WHIPS explanations generalise well while making only a very small trade-off to keep precision/specificity competitive (80%-99%). At the same time, Ada-WHIPS is guarded against over-fitting, while competing methods have a tendency to present critically over-fitting explanations in 0.5%-28% of cases. A critically over-fitting explanation is defined as an explanation that uniquely identifies the explanandum and covers no other instances. Ada-WHIPS does not make any assumptions about the underlying data distribution, while some competing methods require continuous features to be discretised prior to model training. This treatment of the data can result in a less accurate model, detracting from the main benefit of using AdaBoost at the outset. By design, Ada-WHIPS rules extract discrete, logical conditions from the base decision tree classifiers of the AdaBoost model. These logical conditions have an information-theoretic derivation, and we speculate that this is what leads to Ada-WHIPS's favourable trade-off between precision and coverage. Ada-WHIPS is also efficient: at its fastest, explanations are generated in fractions of a second, and on high-dimensional data sets we recorded times of up to three minutes per explanation. This is in line with competing methods and could still be considered real-time in the context of a medical consultation. As a minor contribution, we presented stability, a novel measure that is a regularised version of precision. It gives more informative results in the XAI setting because it penalises low coverage while correcting for sampling artefacts.

Limitations of Ada-WHIPS

By design, Ada-WHIPS is a companion method for AdaBoost models, and the algorithm is not transferable to other models without adaptation. In contrast, model-agnostic methods such as Anchors and LORE can be applied to any black box model with few restrictions. It is up to the end user to determine which approach best suits their specific scenario. Ada-WHIPS is a heuristic method for finding a short rule with high coverage and precision. Consequently, it does not provide a feature attribution value for each attribute with theoretical guarantees. If such guaranteed values are required, then the combinatorial calculation of Shapley Values is the recommended method.

Challenges

Experimental studies of XAI are challenging in terms of their time cost. Each explanation must be generated individually and, for all currently well-cited methods, generating explanations is a much more time-consuming process than the classification step. Furthermore, each explanation must be evaluated individually rather than batchwise; a simple confusion matrix or AUC-ROC test, for example, is not appropriate. We calculated scores for each explanation and then used the means, medians and mean ranks to compare methods. Any experimental design for evaluating XAI must allow for this time cost, and must also consider how instances used to generate explanations are separated from instances used to evaluate explanations. Such designs may require three data partitions (training, explanation generation, explanation evaluation). We opted for a leave-one-out procedure: training a model on a training set, then generating explanations one at a time and evaluating each on the remaining instances of a held-out set, as sketched below.
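
The following sketch outlines this leave-one-out evaluation protocol. The `explain` callable is a hypothetical stand-in for any of the explanation algorithms, and the `precision_coverage` helper comes from the earlier sketch; this is an illustration of the evaluation loop, not the exact experimental harness used in the study.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

def evaluate_explanations(X, y, explain, n_explanations=100):
    """Leave-one-out style evaluation: explain one held-out instance at a
    time, then score the rule on the remaining held-out instances.

    `explain(model, x)` is a user-supplied callable returning (rule, target_class).
    """
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = AdaBoostClassifier().fit(X_train, y_train)
    # Explanations are scored against the black box's own predictions.
    y_pred_test = model.predict(X_test)

    scores = []
    for i in range(min(n_explanations, len(X_test))):
        rule, target_class = explain(model, X_test[i])   # explain the explanandum
        rest = np.arange(len(X_test)) != i               # exclude the explanandum
        prec, cov, overfit = precision_coverage(
            rule, target_class, X_test[rest], y_pred_test[rest])
        scores.append({"precision": prec, "coverage": cov, "critical": overfit})
    return scores
```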

Conclusion & future work

Our main contribution is the novel algorithm Ada-WHIPS for explaining the classifications of AdaBoost models with simple classification rules. AdaBoost models are widely adopted as computer aided diagnostic tools and for the non-clinical identification of sub-health and mental health conditions from unconventional data sources such as online health communities. As a minor contribution, we propose stability, a novel objective function for explanation algorithms that explicitly avoids over-fitting and can also be used as a quality metric in XAI experimental research.

Directions for future work include developing the method for Gradient Boosting Machines, such as XGBoost, that use decision trees as the base classifiers, and applying the proposed method to a wider variety of healthcare and medical data sets.

Appendix

Supplementary

Cohen’s κ

Cohen’s κ is calculated as:

$$ \kappa = \frac{N \sum^{K}_{i=1} N_{ii} - \sum^{K}_{i=1} N_{i+} N_{+i}}{N^{2} - \sum^{K}_{i=1} N_{i+} N_{+i}}, \quad \left[\begin{array}{cccc} N_{11} & N_{12} & \dots & N_{1K} \\ N_{21} & N_{22} & \dots & N_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ N_{K1} & N_{K2} & \dots & N_{KK} \end{array}\right] $$
(10)

where K is the number of classes, N is the total number of instances, \(N_{ij}\) is the number of instances in cell ij of the confusion matrix of true vs. predicted class counts, and \(N_{i+}\), \(N_{+j}\) are the ith row and jth column marginal totals, respectively.
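
For reference, a direct transcription of Eq. (10) into Python (a sketch using only NumPy; the example confusion matrix is hypothetical):

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a K x K confusion matrix of true vs. predicted counts (Eq. 10)."""
    C = np.asarray(confusion, dtype=float)
    N = C.sum()                                        # total instances
    observed = np.trace(C)                             # sum_i N_ii
    expected = (C.sum(axis=1) * C.sum(axis=0)).sum()   # sum_i N_i+ * N_+i
    return (N * observed - expected) / (N ** 2 - expected)

# e.g. cohens_kappa([[50, 10], [5, 35]]) gives approximately 0.69
# for this hypothetical 2-class confusion matrix.
```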

Table 19 Coverage of explanations of AdaBoost SAMME

Friedman test

The original Friedman test produces an approximately \(\chi^2\)-distributed statistic, but this is known to be very conservative. Therefore, we use the modified F-test given in [64], because we have very large values for N, i.e. the count of instances in the test set. The null hypothesis of this test is that there is no significant difference between the mean ranks \(R_j\) of all the groups, and the alternative is that at least two mean ranks differ. The null hypothesis is rejected when \(F_F\) exceeds the critical value of an F-distributed random variable with degrees of freedom \(df_1 = k-1\) and \(df_2 = (k-1)(N-1)\), where k is the number of algorithms:

$$ \begin{aligned} F_{F} = \frac{(N - 1) \chi^{2}_{F}}{N(k-1) - \chi^{2}_{F}},\ \ \chi^{2}_{F} = \frac{12N}{k(k+1)}\left[ \sum^{k}_{j=1}{R_{j}^{2} - \frac{k(k+1)^{2}}{4}}\right] \end{aligned} $$
(11)
Table 20 Coverage of explanations of AdaBoost SAMME.R

The recommended pairwise post-hoc comparison test, with the Bonferroni correction for three pairwise comparisons, proposed in [64], is:

$$ z = \text{diff}_{ij} \bigg/ \sqrt{\frac{k(k+1)}{6N}},\ \text{diff}_{ij} = R_{i} - R_{j} $$
(12)
Table 21 Precision of explanations of AdaBoost SAMME
Table 22 Precision of explanations of AdaBoost SAMME.R

where \(R_i\) and \(R_j\) are the mean ranks of two algorithms and z is distributed as a standard normal under the null hypothesis that the pair of ranks is not significantly different. The significance threshold for a two-tailed test with the Bonferroni correction is \(\frac{0.025}{3} = 0.00833\) per tail.
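
A sketch of Eqs. (11) and (12) in Python, assuming SciPy is available for the reference distributions; the mean ranks \(R_j\) are assumed to have been computed from ranking the per-instance scores of the k algorithms:

```python
import numpy as np
from scipy import stats

def friedman_modified_F(mean_ranks, N):
    """Modified Friedman statistic F_F of [64] (Eq. 11) and its p-value under
    an F distribution with df1 = k - 1 and df2 = (k - 1)(N - 1)."""
    R = np.asarray(mean_ranks, dtype=float)
    k = len(R)
    chi2_F = 12 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4)
    F_F = (N - 1) * chi2_F / (N * (k - 1) - chi2_F)
    return F_F, stats.f.sf(F_F, k - 1, (k - 1) * (N - 1))

def posthoc_z(R_i, R_j, k, N):
    """Pairwise post-hoc statistic (Eq. 12); the two-sided p-value is compared
    against the Bonferroni-corrected threshold for three comparisons."""
    z = (R_i - R_j) / np.sqrt(k * (k + 1) / (6 * N))
    return z, 2 * stats.norm.sf(abs(z))
```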

Table 23 Stability of explanations of AdaBoost SAMME
Table 24 Stability of explanations of AdaBoost SAMME.R

Availability of data and materials

The source code and data sets analysed during the current study are available in our repository: https://tinyurl.com/yxuhfh4e.

Notes

  1. https://tinyurl.com/qlyxzlv

Abbreviations

Ada-WHIPS: Adaptive-weighted high importance path snippets
AFAM: Additive feature attribution methods
BRL: Bayesian rule list
CAD: Computer aided diagnostics
CBR: Case-based reasoning
CR: Classification rule
CRL: Cascading rule list
DT: Decision tree(s)
EHR: Electronic health record(s)
GAM: Generalised additive model(s)
GA2M: Generalised additive model(s) with second order interactions
KL: Kullback-Leibler (divergence)
LIME: Local interpretable model-agnostic explanations
LORE: LOcal rule-based explanations
MDS: Multi-dimensional scaling
ML: Machine learning
PI: Partial independence (plots)
RNN: Recurrent neural network
SAMME: Stagewise additive modeling using a multi-class exponential loss function
SAMME.R: Real-valued SAMME
SHAP: SHapley additive exPlanations
SVM: Support vector machine(s)
TSH: Thyroid stimulating hormone
WkNN: Weighted k-nearest neighbours
XAI: eXplainable artificial intelligence

References

  1. El-Sappagh S, Alonso JM, Ali F, Ali A, Jang J-H, Kwak K-S. An ontology-based interpretable fuzzy decision support system for diabetes diagnosis. IEEE Access. 2018; 6:37371–94.

  2. Mahdi MA, Al Janabi S. A Novel Software to Improve Healthcare Base on Predictive Analytics and Mobile Services for Cloud Data Centers. In: International Conference on Big Data and Networks Technologies. Leuven: Springer: 2019. p. 320–39.

  3. Al-Janabi S, Patel A, Fatlawi H, Kalajdzic K, Al Shourbaji I. Empirical rapid and accurate prediction model for data mining tasks in cloud computing environments. In: International Congress on Technology, Communication and Knowledge (ICTCK). Mashhad: IEEE: 2014. p. 1–8.

  4. Al-Janabi S, Mahdi MA. Evaluation prediction techniques to achievement an optimal biomedical analysis. Int J Grid Util Comput. 2019; 10(5):512–27.

  5. Wachter S, Mittelstadt B, Russell C. Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR. Harv J Law Technol. 2017; 31(2). https://doi.org/10.2139/ssrn.3063289.

  6. Caruana R, Lou Y, Gehrke J, Koch P, Sturm M, Elhadad N. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’15. Sydney: ACM Press: 2015. p. 1721–30.

  7. Kavakiotis I, Tsave O, Salifoglou A, Maglaveras N, Vlahavas I, Chouvarda I. Machine Learning and Data Mining Methods in Diabetes Research. Comput Struct Biotechnol J. 2017; 15:104–16.

  8. Jalali A, Pfeifer N. Interpretable per case weighted ensemble method for cancer associations. BMC Genomics. 2016; 17(1). https://doi.org/10.1186/s12864-016-2647-9.

  9. Yin Z, Sulieman LM, Malin BA. A systematic literature review of machine learning in online personal health data. J Am Med Informat Assoc. 2019; 26(6):561–76.

  10. Sun S, Zuo Z, Li GZ, Yang X. Subhealth state classification with AdaBoost learner. Int J Funct Informat Personalised Med. 2013; 4(2):167.

  11. Jovanovic M, Radovanovic S, Vukicevic M, Van Poucke S, Delibasic B. Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression. Artif Intell Med. 2016; 72:12–21.

  12. Turgeman L, May JH. A mixed-ensemble model for hospital readmission. Artif Intell Med. 2016; 72:72–82.

  13. Letham B, Rudin C, McCormick TH, Madigan D. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Ann Appl Stat. 2015; 9(3):1350–71.

  14. Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI. Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J. 2015; 13:8–17.

  15. Subianto M, Siebes A. Understanding Discrete Classifiers with a Case Study in Gene Prediction. Omaha: IEEE: 2007. p. 661–6.

  16. Huysmans J, Baesens B, Vanthienen J. Using Rule Extraction to Improve the Comprehensibility of Predictive Models. SSRN Electron J. 2006. Accessed 16 Nov 2018.

  17. Pazzani MJ, Mani S, Shankle WR. Acceptance of Rules Generated by Machine Learning among Medical Experts. Methods Inf Med. 2001; 40(05):380–5.

  18. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). 2018.

  19. Pande V. Artificial Intelligence’s ’Black Box’ Is Nothing to Fear. The New York Times. 2019. Accessed 14 Aug 2019.

  20. Pedreschi D, Giannotti F, Guidotti R, Monreale A, Pappalardo L, Ruggieri S, Turini F. Open the Black Box Data-Driven Explanation of Black Box Decision Systems. 2018. arXiv:1806.09936 [cs].

  21. Ribeiro MT, Singh S, Guestrin C. Why Should I Trust You?: Explaining the Predictions of Any Classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery And Data Mining. San Francisco: ACM Press: 2016. p. 1135–44.

  22. Freund Y. An adaptive version of the boost by majority algorithm. In: Proceedings of the Twelfth Annual Conference on Computational Learning Theory - COLT ’99. Santa Cruz: ACM Press: 1999. p. 102–13.

  23. Asgari S, Scalzo F, Kasprowicz M. Pattern Recognition in Medical Decision Support. BioMed Res Int. 2019; 2019:1–2.

  24. Rajendra Acharya U, Vidya KS, Ghista DN, Lim WJE, Molinari F, Sankaranarayanan M. Computer-aided diagnosis of diabetic subjects by heart rate variability signals using discrete wavelet transform method. Knowl-Based Syst. 2015; 81:56–64.

  25. Yoo I, Alafaireet P, Marinov M, Pena-Hernandez K, Gopidi R, Chang J-F, Hua L. Data Mining in Healthcare and Biomedicine: A Survey of the Literature. J Med Syst. 2012; 36(4):2431–48.

  26. Dolejsi M, Kybic J, Tuma S, Polovincak M. Reducing false positive responses in lung nodule detector system by asymmetric adaboost. In: 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro. Paris: IEEE: 2008. p. 656–9.

  27. Shakeel PM, Tolba A, Al-Makhadmeh Z, Jaber MM. Automatic detection of lung cancer from biomedical data set using discrete AdaBoost optimized ensemble learning generalized neural networks. Neural Comput Appl. 2019.

  28. Rangini M, Jiji DGW. Identification of Alzheimer’s Disease Using Adaboost Classifier. In: Proceedings of the International Conference on Applied Mathematics and Theoretical Computer Science: 2013. p. 229–34.

  29. Andrews R, Diederich J, Tickle AB. Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowl-Based Syst. 1995; 8(6):373–89.

  30. Hara S, Hayashi K. Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach. 2016. arXiv:1606.09066 [stat].

  31. Adnan MN, Islam MZ. ForEx++: A New Framework for Knowledge Discovery from Decision Forests. Australas J Inf Syst. 2017; 21.

  32. Mashayekhi M, Gras R. Rule Extraction from Random Forest: the RF+HC Methods. In: Advances in Artificial Intelligence 2015. Lecture notes in computer science Artificial intelligence, vol. 9091. Halifax: Springer: 2015. p. 223–37.

  33. Deng H. Interpreting tree ensembles with intrees. Int J Data Sci Anal. 2014; 7(4):277–87.

  34. Friedman J, Popescu BE. Predictive Learning via Rule Ensembles. Ann Appl Stat. 2008; 2(3):916–54.

  35. Waitman LR, Fisher DH, King PH. Bootstrapping rule induction to achieve rule stability and reduction. J Intell Inf Syst. 2006; 27(1):49–77.

  36. Ribeiro MT, Singh S, Guestrin C. Anchors: High-Precision Model-Agnostic Explanations. In: AAAI. vol. 18. New Orleans: 2018. p. 1527–1535.

  37. Lipton ZC. The mythos of model interpretability: 2016. arXiv Preprint arXiv:1606.03490.

  38. Lundberg SM, Lee S-I. A Unified Approach to Interpreting Model Predictions. Adv Neural Inf Process Syst. 2017; 30:4768–77.

  39. Guidotti R, Monreale A, Ruggieri S, Pedreschi D, Turini F, Giannotti F. Local Rule-Based Explanations of Black Box Decision Systems. 2018. arXiv:1805.10820.

  40. Michal F. "Please, explain." Interpretability of black-box machine learning models. 2019. https://tinyurl.com/y5qruqgf. Accessed 19 April 2019.

  41. Fen H, Tan, Song K, Udell M, Sun Y, Zhang Y. Why should you trust my interpretation? Understanding uncertainty in LIME predictions. 2019. arXiv:1904.12991.

  42. Lundberg SM, Lee S-I. Consistent feature attribution for tree ensembles. Sydney: 2017. arXiv:1706.06060 [cs, Stat].

  43. Adadi A, Berrada M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access. 2018; 6:52138–60.

  44. Sabaas A. Interpreting Random Forests. 2014. http://blog.datadive.net/interpreting-random-forests/. Accessed 11 Oct 2017.

  45. Tjoa E, Guan C. A Survey on Explainable Artificial Intelligence (XAI): towards Medical XAI. 2019:21. arXiv preprint arXiv:1907.07374.

  46. Mencar C. Interpretability of Fuzzy Systems. In: Fuzzy Logic and Applications: 10th International Workshop. Genoa: Springer: 2013. p. 22–35.

  47. Lamy J-B, Sekar B, Guezennec G, Bouaud J, Séroussi B. Explainable artificial intelligence for breast cancer: A visual case-based reasoning approach. Artif Intell Med. 2019; 94:42–53.

  48. Kwon BC, Choi M-J, Kim JT, Choi E, Kim YB, Kwon S, Sun J, Choo J. RetainVis: Visual Analytics with Interpretable and Interactive Recurrent Neural Networks on Electronic Medical Records. IEEE Trans Vis Comput Graph. 2018; 25(1):255–309.

  49. Kästner M, Hermann W, Villmann T. Integration of Structural Expert Knowledge about Classes for Classification Using the Fuzzy Supervised Neural Gas. Comput Intell. 2012.

  50. Appel R, Fuchs T, Dollár P, Perona P. Quickly Boosting Decision Trees–Pruning Underachieving Features Early. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13): 2013. p. 594–602.

  51. Friedman J, Hastie T, Tibshirani R. Additive Logistic Regression A Statistical View of Boosting. Ann Stat. 2000; 28(2):337–407.

  52. Freund Y, Schapire RE. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J Comput Syst Sci. 1997; 55(1):119–39.

  53. Walker KW, Jiang Z. Application of adaptive boosting (AdaBoost) in demand-driven acquisition (DDA) prediction: A machine-learning approach. J Acad Librariansh. 2019; 45(3):203–12.

  54. Aravindh K, Moorthy S, Kumaresh R, Sekar K. A Novel Data Mining approach for Personal Health Assistance,. Int J Pure Appl Math. 2018; 119(15):415–26.

  55. Jaree T, Guangdong X, Yanchun Z, Fuchun H. Breast cancer survivability via AdaBoost algorithms. In: Proceedings of the Second Australasian Workshop on Health Data and Knowledge Management, vol. 80. Wollongong: Australian Computer Society: 2008. p. 55–64.

  56. Hastie T, Rosset S, Zhu J, Zou H. Multi-class AdaBoost. Stat Interface. 2009; 2(3):349–60.

  57. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011; 12:2825–30.

  58. Freund Y, Schapire RE. A Short Introduction to Boosting. J Japan Soc Artif Intell. 1999; 14(5):771–80.

  59. Quinlan JR. Generating Production Rules From Decision Trees. In: Proceedings of the Tenth International Joint Conference on Artificial Intelligence. Milan, Italy, August 23-28, 1987. Morgan Kaufmann: 1987. p. 304–307. http://ijcai.org/proceedings/1987-1.

  60. Dhurandhar A, Chen P-Y, Luss R, Tu C-C, Ting P, Shanmugam K, Das P. Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives. 2018. arXiv:1802.07623 [cs].

  61. Dheeru D, Karra Taniskidou E. UCI Machine Learning Repository. Irvine: University of California, Irvine, School of Information and Computer Sciences; 2017. https://archive.ics.uci.edu/ml/datasets/. Accessed 31 Mar 2019.

  62. Understanding Society: Waves 2-3 Nurse Health Assessment, 2010-2012 [data Collection]. vol. 7251, 3rd edn: UK Data Service, University of Essex, Institute for Social and Economic Research and National Centre for Social Research; 2019.

  63. Davillas A, Benzeval M, Kumari M. Association of Adiposity and Mental Health Functioning across the Lifespan: Findings from Understanding Society (The UK Household Longitudinal Study). PLoS ONE. 2016;11(2). https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0148561. Accessed 18 Aug 2019.

  64. Demsar J. Statistical Comparisons of Classifiers over Multiple Data Sets. J Mach Learn Res. 2006; 7:1–30.

  65. Clark P, Boswell R. Rule induction with CN2: some recent improvements. Mach Learn. 1991; 482:151–63.

Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

Original concept was by JH and MMG. JH was the major contributor in developing the software, designing and executing the experiments, analysing the data and writing the manuscript. JH, MMG and RMAA read and approved the final manuscript.

Corresponding author

Correspondence to Julian Hatwell.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Hatwell, J., Gaber, M.M. & Atif Azad, R.M. Ada-WHIPS: explaining AdaBoost classification with applications in the health sciences. BMC Med Inform Decis Mak 20, 250 (2020). https://doi.org/10.1186/s12911-020-01201-2

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12911-020-01201-2

Keywords