An interpretable machine learning model for diagnosis of Alzheimer's disease

We present an interpretable machine learning model for medical diagnosis called sparse high-order interaction model with rejection option (SHIMR). A decision tree explains to a patient the diagnosis with a long rule (i.e., conjunction of many intervals), while SHIMR employs a weighted sum of short rules. Using proteomics data of 151 subjects in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, SHIMR is shown to be as accurate as other non-interpretable methods (Sensitivity, SN = 0.84 ± 0.1, Specificity, SP = 0.69 ± 0.15 and Area Under the Curve, AUC = 0.86 ± 0.09). For clinical usage, SHIMR has a function to abstain from making any diagnosis when it is not confident enough, so that a medical doctor can choose more accurate but invasive and/or more costly pathologies. The incorporation of a rejection option complements SHIMR in designing a multistage cost-effective diagnosis framework. Using a baseline concentration of cerebrospinal fluid (CSF) and plasma proteins from a common cohort of 141 subjects, SHIMR is shown to be effective in designing a patient-specific cost-effective Alzheimer’s disease (AD) pathology. Thus, interpretability, reliability and having the potential to design a patient-specific multistage cost-effective diagnosis framework can make SHIMR serve as an indispensable tool in the era of precision medicine that can cater to the demand of both doctors and patients, and reduce the overwhelming financial burden of medical diagnosis.

) are examples of state-of-the-art CAD methods. However, a medical practitioner often cannot rely on state-of-the-art CAD methods despite their high accuracy, because most of them are opaque and cannot answer the basic questions: why or how was such a decision reached, and why or how is it biologically relevant? (Freitas et al., 2010; Freitas, 2006; Burrell, 2016; Ribeiro et al., 2016b). Recently, the European Union issued the "General Data Protection Regulation (GDPR)" on algorithmic decision-making and a "right to explanation" (Goodman and Flaxman, 2016), which mandate that a data subject has the right to "meaningful information about the logic involved in the decision making". In other words, the GDPR requires that communication with data subjects be made in a "concise, intelligible, and easily accessible form". Therefore, to cater to the demands of both a medical practitioner (doctor) and a subject (patient), the most effective approach would be the design of [...]

[...], which are generated by SHIMR (B), are described as an intersection matrix (C). Each row in the intersection matrix (C) represents an individual protein and each column represents an interaction among proteins constituting a rule. Proteins selected by a rule are represented by a 'light red gauge' (semicircle), whereas the unselected ones are represented by a 'light green gauge'. The exact selected range of a particular protein is highlighted by a 'dark red wedge'. The blue rectangular box surrounding a set of proteins highlights the selected protein combination (or feature) for a subject. Blue bars above each column show the importance of each rule contributing to the overall "model score". The generated model score for a subject (patient) is also highlighted by an 'orange pointer' over a color bar at the top of each plot. This color bar describes the overall range of model scores. Construction of the intersection matrix from different concentrations of individual proteins is also shown (D). Each gauge represents the concentration range of a particular protein, where the left end represents the minimum and the right end represents the maximum. The 'orange pointer' over each protein gauge marks the exact protein concentration of a particular subject. Abbreviations: NC, normal control; AD, Alzheimer's disease; R, rejected; R1, rule 1; R2, rule 2; R3, rule 3; f1, feature 1 of rule 1; f2, feature 2 of rule 1.
The incorporation of a rejection option complements SHIMR in designing a multistage cost-effective framework (Fig. 1) [...]

Let $H = \{h_1, \ldots, h_M\}$ denote the set of all possible conjunction rules, where each feature $x_i$ is divided into a fixed number of intervals. SHIMR learns the following decision function from data,

$$f(x) = \sum_{j=1}^{M} a_j h_j(x) + b,$$

where $x$ is the feature vector of a patient, $a_j$ is a weight associated with $h_j$, and $b$ is the bias term. In learning from data $\{x_i, y_i\}_{i=1}^{n}$, the following objective function is minimized with respect to $a_j$ and $b$,

$$\min_{a, b} \; \|a\|_1 + C_+ \sum_{i:\, y_i = 1} \phi\big(y_i f(x_i)\big) + C_- \sum_{i:\, y_i = -1} \phi\big(y_i f(x_i)\big),$$

where $\phi$ is a loss function explained in the next section, and $C_+$ and $C_-$ are the regularization parameters for the positive and negative classes, respectively. This is an extremely high-dimensional problem, but at the optimal solution only a limited number of weights are non-zero due to the L1-norm regularization. See Article S1 for details about the learning procedure.
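As an illustration of the model described above, the following minimal Python sketch evaluates a weighted sum of conjunction rules and an L1-regularized, class-weighted objective. It is not the authors' implementation: the rule encoding (a mapping from feature index to a half-open interval), the plain hinge loss used as a stand-in for the loss function, the exact form of the objective (L1 penalty plus class-weighted losses), and all names (`rule_fires`, `decision_function`, `objective`) are assumptions made for this example.

```python
from typing import Callable, Dict, Sequence, Tuple

# Assumed encoding: a rule maps a feature index to an interval (low, high).
Rule = Dict[int, Tuple[float, float]]


def rule_fires(rule: Rule, x: Sequence[float]) -> int:
    """Return 1 if every selected feature lies in its interval [low, high), else 0."""
    return int(all(low <= x[k] < high for k, (low, high) in rule.items()))


def decision_function(x: Sequence[float], rules: Sequence[Rule],
                      weights: Sequence[float], bias: float) -> float:
    """Weighted sum of rule indicators plus a bias term."""
    return sum(a * rule_fires(h, x) for h, a in zip(rules, weights)) + bias


def hinge(z: float) -> float:
    """Plain hinge loss, used here only as a placeholder for the loss function."""
    return max(0.0, 1.0 - z)


def objective(X: Sequence[Sequence[float]], y: Sequence[int],
              rules: Sequence[Rule], weights: Sequence[float], bias: float,
              c_pos: float, c_neg: float,
              loss: Callable[[float], float] = hinge) -> float:
    """L1 norm of the weights plus class-weighted losses (labels are +1 or -1)."""
    l1_penalty = sum(abs(a) for a in weights)
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        c = c_pos if y_i == 1 else c_neg
        data_term += c * loss(y_i * decision_function(x_i, rules, weights, bias))
    return l1_penalty + data_term


# Toy example with two made-up rules over two features.
rules = [{0: (0.0, 1.0), 1: (2.0, 3.0)}, {1: (0.0, 2.5)}]
weights = [1.5, -0.5]
bias = 0.1
print(decision_function([0.5, 2.2], rules, weights, bias))
```

Because only rules with non-zero weight contribute to the sum, a sparse weight vector yields a short, readable explanation for each subject.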

Decision making from $f(x)$ is affected by the cost of rejection. Rejection is not as bad as misclassification, but it incurs some cost because a different means is needed for the final decision. Assuming that the cost of misclassification is one, let us define $0 \le d \le 0.5$ as the cost of rejection. Bartlett and Wegkamp (2008) showed that, if $\eta(x) = P(y = 1 \mid x)$ is the posterior probability of $x$ being classified to the positive class, the following decision rule achieves the smallest expected cost,

$$\hat{y}(x) = \begin{cases} -1 & \text{if } \eta(x) < d, \\ \text{reject} & \text{if } d \le \eta(x) \le 1 - d, \\ +1 & \text{if } \eta(x) > 1 - d. \end{cases}$$

We use the above rule after converting $f(x)$ to the posterior probability via isotonic calibration.

Loss function

If the cost of rejection is known, it is reasonable to incorporate it in the loss function $\phi$ in the learning procedure. We use the following double hinge function proposed by Bartlett and Wegkamp (2008),

$$\phi(z) = \begin{cases} 1 - \dfrac{1 - d}{d}\, z & \text{if } z < 0, \\ 1 - z & \text{if } 0 \le z < 1, \\ 0 & \text{otherwise.} \end{cases}$$
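The decision rule and the double hinge loss of Bartlett and Wegkamp (2008) can be sketched in a few lines of Python. The rule predicts the negative class when the posterior is below d, the positive class when it is above 1 - d, and rejects otherwise. The loss has slope -(1 - d)/d for negative margins, slope -1 between 0 and 1, and is zero beyond 1. The function names and the encoding of rejection as 0 are assumptions of this example, and the loss additionally requires d > 0 to avoid division by zero.

```python
def decide_with_rejection(eta: float, d: float) -> int:
    """Return +1 or -1 for a confident prediction and 0 for rejection.

    eta is the calibrated posterior P(y = 1 | x); d is the rejection cost,
    assumed to satisfy 0 <= d <= 0.5.
    """
    if not 0.0 <= d <= 0.5:
        raise ValueError("rejection cost d must lie in [0, 0.5]")
    if eta < d:
        return -1
    if eta > 1.0 - d:
        return 1
    return 0  # ambiguous zone: abstain


def double_hinge(z: float, d: float) -> float:
    """Double hinge loss evaluated at margin z = y * f(x)."""
    if not 0.0 < d <= 0.5:
        raise ValueError("double hinge needs 0 < d <= 0.5")
    if z < 0.0:
        return 1.0 - ((1.0 - d) / d) * z  # steeper slope for misclassified points
    if z < 1.0:
        return 1.0 - z                    # ordinary hinge segment
    return 0.0


# With d = 0.2, only posteriors outside [0.2, 0.8] lead to a decision.
for eta in (0.05, 0.5, 0.9):
    print(eta, decide_with_rejection(eta, d=0.2))
```

With d = 0.5 the slope (1 - d)/d equals 1, so the loss reduces to the ordinary hinge loss and the rejection zone shrinks to the single point where the posterior equals 0.5.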

PeerJ reviewing PDF | (2018:10:31754:2:1:NEW 28 Jan 2019)

[...] proteins as the starting set of analytes. This is a collection of proteins responsible for AD pathology as [...] plasma proteins can be found in Table S1. We also considered the baseline concentrations of cerebrospinal fluid (CSF) and plasma proteins from a common cohort of 141 subjects to demonstrate a cost-effective diagnosis framework. Out of these 141 subjects, 88 were diagnosed as AD and 53 as NC. For CSF data we used tau, amyloid-β (Aβ) and phosphorylated tau (p-tau) proteins. We mainly used the ratios tau/Aβ and p-tau/Aβ as the features for CSF analysis. Baseline demographic information of all 151 subjects used in the current study is shown in [...]

In this section we show that our method can achieve accuracy comparable to that of other [...] decision tree methods. We also compared the classification performance of SHIMR with that of CORELS using ADNI plasma data. Comparing the results of SHIMR (Table S5) with those of CORELS (Table S3, Table S4), it can be observed that CORELS can produce a more interpretable model by generating a smaller number of [...] Table S3 and Table S4 can be found in "Supplemental Results" of Article S1.

The visual representation of SHIMR can also help to explain a specific medical condition and its associated [...]

[...] Here, we display only the top 10 rules because of space constraints. Therefore, the sum of the weights of the displayed selected rules may not match the model score as in the case of (B). It will match exactly if the full model is displayed.

Treatment of AD is often hampered by the lack of easily accessible and cost-effective biomarkers with reliable diagnostic accuracy, as highlighted in the introduction. In this section we evaluate how our method (SHIMR) can lead to an interpretable, cost-effective multistage framework for clinical diagnosis by exploiting the notion of "classification with rejection option". Here we propose a cost-effective framework in the context of precision medicine. Figure 6A describes the effect of rejection on the classification of NC vs AD using plasma and CSF data and how it can be exploited to design a cost-effective pathology for AD treatment. As the rejection rate increases, the classification accuracy improves along with the prediction reliability (a higher rejection rate implies a higher decision threshold). Starting with an ACC of 0.74 at no rejection (RR=0), it is possible to achieve ACC=0.9 at a higher rejection rate (RR=0.38) using plasma data (Fig. 6A: Plasma). On the other hand, if CSF data are used for the same classification task, a more reliable prediction (ACC=0.87) can be achieved with no rejection (Fig. 6A: CSF). Therefore, it can be argued that those 12 subjects (RR=0.26) who are rejected using low-cost and easily accessible plasma biomarkers can now be recommended for a more sophisticated screening (e.g., CSF biomarkers). It can be observed from Table S2 that [...]

To understand how SHIMR works internally, the accuracy vs rejection rate trade-off for the classification of NC vs AD using plasma data is depicted in Fig. 6C. As the rejection rate increases, more and more data points that are close to the decision boundary, and hence hard to classify, are rejected, which results in improved accuracy after rejection.
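The accuracy vs rejection rate trade-off can be reproduced on any set of calibrated probabilities. The sketch below is illustrative only: the probabilities and labels are made-up toy values, not ADNI data, and the function name `accuracy_with_rejection` is an assumption. It applies a rejection threshold d to each predicted probability, then reports the rejection rate (RR) and the accuracy (ACC) over the accepted samples.

```python
from typing import Sequence, Tuple


def accuracy_with_rejection(probs: Sequence[float], labels: Sequence[int],
                            d: float) -> Tuple[float, float]:
    """Return (rejection rate, accuracy on accepted samples) for threshold d.

    probs are positive-class probabilities; labels are +1 or -1.
    A sample is rejected when d <= prob <= 1 - d.
    """
    accepted = 0
    correct = 0
    for eta, y in zip(probs, labels):
        if eta < d:
            pred = -1
        elif eta > 1.0 - d:
            pred = 1
        else:
            continue  # rejected: no decision is made
        accepted += 1
        correct += int(pred == y)
    n = len(probs)
    rejection_rate = (n - accepted) / n
    accuracy = correct / accepted if accepted else float("nan")
    return rejection_rate, accuracy


# Made-up toy data: points near 0.5 are the ones that get misclassified.
probs = [0.05, 0.3, 0.45, 0.55, 0.7, 0.95, 0.4, 0.6]
labels = [-1, -1, 1, -1, 1, 1, -1, 1]
for d in (0.5, 0.35):
    rr, acc = accuracy_with_rejection(probs, labels, d)
    print(f"d={d}: RR={rr:.2f}, ACC={acc:.2f}")
```

Lowering d widens the rejection zone around 0.5, so the samples that remain are those far from the decision boundary.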
In order to understand how CSF data can be used to classify those rejected data points, a 2D decision boundary generated by an SVM classifier for the same classification task has been plotted (Fig. 6D). The data points rejected by SHIMR using plasma are highlighted by circles drawn around them on the same plot. In Fig. 6D, different decision confidence zones based on predicted probabilities are highlighted using different colors. The dark red region represents the high-confidence zone for AD, with a positive predicted probability above 80%, and the dark blue region represents the high-confidence zone for NC, with a positive predicted probability below 20%. Regions of intermediate positive predicted probability are highlighted with light shades of the respective colors. It is important to mention that SHIMR is able to identify the ambiguous low-confidence zones (light red or blue) and refrain from making any decision (reject) for data points falling in those zones. Therefore, a high rejection rate corresponds to a high prediction probability for the classified samples and hence more reliable predictions. In a sense, SHIMR makes decisions only for those data points for which it is highly confident (high positive predictive probability) and can thus serve as a highly reliable CAD model for a medical practitioner (e.g., a doctor).
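The multistage idea, in which a low-cost plasma model decides when it is confident and defers to CSF otherwise, can be sketched as follows. This is a hypothetical illustration rather than the authors' pipeline: the function names, the lazy `get_csf_eta` callback (so that the invasive test is requested only when needed), the 0.5 threshold in the second stage, and the unit costs are all assumptions of this example.

```python
from typing import Callable, Sequence, Tuple


def two_stage_diagnosis(plasma_eta: float, get_csf_eta: Callable[[], float],
                        d: float) -> Tuple[int, str]:
    """Return (label, stage): label is +1 or -1, stage names the deciding test.

    Stage 1 uses the plasma-based probability with rejection threshold d.
    Stage 2 is called only for rejected subjects and decides without rejection.
    """
    if plasma_eta < d:
        return -1, "plasma"
    if plasma_eta > 1.0 - d:
        return 1, "plasma"
    csf_eta = get_csf_eta()  # the costly test is ordered only here
    return (1 if csf_eta >= 0.5 else -1), "csf"


def cohort_cost(stages: Sequence[str], plasma_cost: float, csf_cost: float) -> float:
    """Total cost when everyone gets plasma and only deferred subjects get CSF."""
    n_csf = sum(1 for s in stages if s == "csf")
    return len(stages) * plasma_cost + n_csf * csf_cost


# Made-up probabilities for three subjects: (plasma, CSF).
subjects = [(0.9, 0.8), (0.5, 0.1), (0.1, 0.3)]
results = [two_stage_diagnosis(p, lambda c=c: c, d=0.2) for p, c in subjects]
print(results)
print(cohort_cost([stage for _, stage in results], plasma_cost=1.0, csf_cost=10.0))
```

Under these toy costs, testing all three subjects with CSF would cost 30 units, whereas the cascade costs 13 because only the one ambiguous subject is referred to the second stage.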

To summarize, we have presented a highly accurate, interpretable and cost-effective machine learning framework in the context of precision medicine. We have formulated a sparse high-order interaction model with an embedded rejection option and solved it using a simplex-based column generation method. The learning objective function is linear and convex, and hence it is possible to find globally optimum [...] Alzheimer's disease from normal control using the ADNI dataset and presented a cost-effective AD pathology.

This iterative simultaneous forward feature and sample selection method can lead to very high accuracy: ACC=0.9 (AUC=0.82) at RR=0.38 for AD vs NC classification using plasma data. This potentially leads to a highly confident prediction model, a much desired property in clinical diagnosis. We have shown that it is possible to design a patient-specific, systematic, multistage cost-effective AD pathology using a low-cost plasma profile followed by more advanced screening such as CSF. Large-scale preventive care is possible by exploiting such a patient-specific machine learning framework, which leverages low-cost and easily accessible plasma pathology as an early predictor of AD and subsequently recommends advanced pathology only to those patients for whom the desired level of accuracy cannot be reached using low-cost pathology. However, over-reliance on machine learning based automated diagnosis is a matter of concern, because a single "False Negative" is highly expensive: it is associated with the life of the person being treated. An obvious social implication is the question of who or what would be held responsible for such a misdiagnosis. Another concern relates to the privacy and security of the patient data used for automated diagnosis. Automated diagnosis relies on electronic health records, the very construction of which may induce large and systematic mismeasurement, resulting in bias in automated diagnosis. Interpretable models such as SHIMR alleviate such concerns associated with automated diagnosis, but they do not completely eliminate the role of a medical practitioner in medical diagnosis. A consensus between machine-derived diagnosis and diagnosis based on human expert knowledge is desirable. Therefore, human intervention is inescapable in medical diagnosis, where a doctor who is an expert in the domain can validate the automated diagnosis and use CAD as a helping hand and not as an entity of complete reliance.

We would like to thank David duVerle and Aika Terada for their fruitful discussions. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.