
Efficient Interpretation of Deep Learning Models Using Graph Structure and Cooperative Game Theory: Application to ASD Biomarker Discovery

  • Conference paper

Information Processing in Medical Imaging (IPMI 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11492)

Abstract

Discovering imaging biomarkers for autism spectrum disorder (ASD) is critical to help explain ASD and predict or monitor treatment outcomes. Toward this end, deep learning classifiers have recently been used for identifying ASD from functional magnetic resonance imaging (fMRI) with higher accuracy than traditional learning strategies. However, a key challenge with deep learning models is understanding just what image features the network is using, which can in turn be used to define the biomarkers. Current methods extract biomarkers, i.e., important features, by looking at how the prediction changes if “ignoring” one feature at a time. However, this can lead to serious errors if the features are conditionally dependent. In this work, we go beyond looking at only individual features by using Shapley value explanation (SVE) from cooperative game theory. Cooperative game theory is advantageous here because it directly considers the interaction between features and can be applied to any machine learning method, making it a novel, more accurate way of determining instance-wise biomarker importance from deep learning models. A barrier to using SVE is its computational complexity: \(2^N\) given N features. We explicitly reduce the complexity of SVE computation by two approaches based on the underlying graph structure of the input data: (1) only consider the centralized coalition of each feature; (2) a hierarchical pipeline which first clusters features into small communities, then applies SVE in each community. Monte Carlo approximation can be used for large permutation sets. We first validate our methods on the MNIST dataset and compare to human perception. Next, to ensure plausibility of our biomarker results, we train a Random Forest (RF) to classify ASD/control subjects from fMRI and compare SVE results to standard RF-based feature importance. Finally, we show initial results on ranked fMRI biomarkers using SVE on a deep learning classifier for the ASD/control dataset.
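The Shapley value explanation (SVE) at the core of the abstract attributes a prediction to each feature by averaging that feature's marginal contribution over coalitions of the remaining features; computing this exactly costs \(2^N\) model evaluations, which is what motivates the graph-based restrictions and the Monte Carlo approximation mentioned above. The following is a minimal generic sketch, not the authors' implementation: the toy value function `v` stands in for the model-based value function, and the permutation-sampling estimator is the standard Monte Carlo scheme the abstract alludes to.

```python
import itertools
import random
from math import comb

def shapley_exact(value, n):
    """Exact Shapley values: requires O(2^n) coalition evaluations."""
    phi = [0.0] * n
    for r in range(n):
        others = [i for i in range(n) if i != r]
        for k in range(n):
            for combo in itertools.combinations(others, k):
                A = set(combo)
                # Shapley weight for a coalition of size |A|
                w = 1.0 / (n * comb(n - 1, len(A)))
                phi[r] += w * (value(A | {r}) - value(A))
    return phi

def shapley_monte_carlo(value, n, n_perm=2000, seed=0):
    """Monte Carlo approximation over random feature permutations."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        A = set()
        for r in perm:  # marginal contribution of r given its predecessors
            phi[r] += value(A | {r}) - value(A)
            A.add(r)
    return [p / n_perm for p in phi]

# Toy value function: additive payoffs plus one pairwise interaction,
# mimicking the feature interactions that one-at-a-time methods miss.
def v(S):
    return sum(S) + (2.0 if {0, 1} <= S else 0.0)

exact = shapley_exact(v, 3)         # ≈ [1.0, 2.0, 2.0]
approx = shapley_monte_carlo(v, 3)  # close to exact
```

The interaction term is split equally between features 0 and 1 by symmetry, which is exactly the behavior that distinguishes SVE from "ignore one feature at a time" importance scores.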



Author information

Correspondence to Xiaoxiao Li.

A Appendix: Proof of Theorem 2

For any subset \(A\subseteq \mathcal {N}\setminus \{r\}\), we use the short notation \(U_{r}(A)=A\cap \mathcal {N}_{r}\) and \(V_{r}(A)=A\setminus \mathcal {N}_{r}\), noting that \(A=U_{r}(A)\cup V_{r}(A)\). Rewriting Eq. (2) as

$$ \varPhi _{r}(v_{\varvec{X}})=\frac{1}{|\mathcal {N}|}\sum _{U\subseteq \mathcal {N}_{r}\setminus \{r\}}\sum _{A\subseteq \mathcal {N},U_{r}(A)=U}\left( \begin{array}{c} |\mathcal {N}|-1\\ |A| \end{array}\right) ^{-1}(v_{\varvec{X}}(A\cup \{r\})-v_{\varvec{X}}(A)), $$

and using

$$ \sum _{A\subseteq \mathcal {N},U_{r}(A)=U}\left( \begin{array}{c} |\mathcal {N}|-1\\ |A| \end{array}\right) ^{-1}=\frac{|\mathcal {N}|}{|\mathcal {N}_{r}|}\left( \begin{array}{c} |\mathcal {N}_{r}|-1\\ |U| \end{array}\right) ^{-1}, $$

the expected error between \(\hat{\varPhi }_{r}^{C}(v_{\varvec{X}})\) and \(\varPhi _{r}(v_{\varvec{X}})\) is

$$ \mathbb {E}[\vert \hat{\varPhi }_{r}^{C}(v_{\varvec{X}})\ -\ \varPhi _{r}(v_{\varvec{X}})\vert ]\le \frac{1}{|\mathcal {N}|}\sum _{U\subseteq \mathcal {N}_{r}\setminus \{r\}}\sum _{A\subseteq \mathcal {N},U_{r}(A)=U}\left( \begin{array}{c} |\mathcal {N}|-1\\ |A| \end{array}\right) ^{-1}\mathbb {E}[\vert \varDelta _{r}^{{\varvec{X}}}(U,A)\vert ] $$

where

$$\begin{aligned} \varDelta _{r}^{{\varvec{X}}}(U,A)&=(v_{\varvec{X}}(U\cup \{r\})-v_{\varvec{X}}(U))-(v_{\varvec{X}}(A\cup \{r\})-v_{\varvec{X}}(A))\\&=\log \frac{p(Y\vert X_{\mathcal {N}\setminus U})}{p(Y\vert X_{\mathcal {N}\setminus (U\cup \{r\})})}-\log \frac{p(Y\vert X_{\mathcal {N}\setminus (U\cup V)})}{p(Y\vert X_{\mathcal {N}\setminus (U\cup V\cup \{r\})})}, \end{aligned}$$

with V short for \(V_{r}(A)\). Let \(W=\mathcal {N}\setminus (\mathcal {N}_{r}\cup V)\), \(Z=\mathcal {N}_{r}\setminus (\{r\}\cup U)\). Then

$$\begin{aligned} \varDelta _{r}^{{\varvec{X}}}(U,A)&=\log \frac{p(Y\vert X_{W\cup V\cup Z\cup \{r\}})p(Y\vert X_{W\cup Z})}{p(Y\vert X_{W\cup V\cup Z})p(Y\vert X_{W\cup Z\cup \{r\}})}. \end{aligned}$$
(9)

Since \(X_{r}\perp X_{V}\vert X_{Z}\), we have \(p(X_{V}\vert X_{W\cup Z\cup \{r\}})=p(X_{V}\vert X_{W\cup Z})\), and

$$ (\star )=\frac{p(X_{W\cup V\cup Z\cup \{r\}})p(X_{W\cup Z})}{p(X_{W\cup V\cup Z})p(X_{W\cup Z\cup \{r\}})}=\frac{p(X_{W\cup Z\cup \{r\}})p(X_{V}\vert X_{W\cup Z\cup \{r\}})p(X_{W\cup Z})}{p(X_{W\cup Z})p(X_{V}\vert X_{W\cup Z})p(X_{W\cup Z\cup \{r\}})}=1. $$

We can multiply the quotient in Eq. (9) by \((\star )\),

$$\begin{aligned} \varDelta _{r}^{{\varvec{X}}}(U,A)&=\log \frac{p(Y\vert X_{W\cup V\cup Z\cup \{r\}})p(Y\vert X_{W\cup Z})}{p(Y\vert X_{W\cup V\cup Z})p(Y\vert X_{W\cup Z\cup \{r\}})}\frac{p(X_{W\cup V\cup Z\cup \{r\}})p(X_{W\cup Z})}{p(X_{W\cup V\cup Z})p(X_{W\cup Z\cup \{r\}})}\\&=\log \frac{p(X_{W\cup V\cup \{r\}}\vert Y,X_{Z})p(Y,X_{Z})p(Y,X_{Z})p(X_{W}\vert Y,X_{Z})}{p(Y,X_{Z})p(X_{W\cup V}\vert Y,X_{Z})p(Y,X_{Z})p(X_{W\cup \{r\}}\vert Y,X_{Z})}. \end{aligned}$$

We have \(p(X_{W\cup V\cup \{r\}}\vert Y,X_{Z})=p(X_{W\cup V}\vert Y,X_{Z})p(X_{r}\vert Y,X_{Z})\), since \(X_{W\cup V}\perp X_{r}\vert Y,X_{Z}\). So

$$ \varDelta _{r}^{{\varvec{X}}}(U,A)=\log \frac{p(X_{W\cup V}\vert Y,X_{Z})p(X_{r}\vert Y,X_{Z})p(X_{W}\vert Y,X_{Z})}{p(X_{W\cup V}\vert Y,X_{Z})p(X_{W\cup \{r\}}\vert Y,X_{Z})}. $$

Since \(X_{W}\perp X_{r}\vert Y,X_{Z}\), we have \(p(X_{W\cup \{r\}}\vert Y,X_{Z})=p(X_{W}\vert Y,X_{Z})p(X_{r}\vert Y,X_{Z})\). Hence \(\varDelta _{r}^{{\varvec{X}}}(U,A)=\log 1=0\). Therefore we have \(\mathbb {E}[\vert \hat{\varPhi }_{r}^{C}(v_{\varvec{X}})-\varPhi _{r}(v_{\varvec{X}})\vert ]=0\).
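The coalition-weight identity used in the rewrite above, summing the inverse Shapley binomials over all \(A\) with \(U_{r}(A)=U\), can also be checked numerically for small sizes. The sketch below uses helper names of our own, with \(n=|\mathcal {N}|\), \(m=|\mathcal {N}_{r}|\), \(u=|U|\); the right-hand side uses the lower index \(u\), which is the form the left-hand sums satisfy (every admissible \(A\) decomposes as \(U\cup V\) with \(V\subseteq \mathcal {N}\setminus \mathcal {N}_{r}\)):

```python
from math import comb

def lhs(n, m, u):
    # Sum of inverse Shapley binomials over all A with U_r(A) = U:
    # A = U ∪ V with V ⊆ N \ N_r, so |A| = u + |V|.
    return sum(comb(n - m, k) / comb(n - 1, u + k) for k in range(n - m + 1))

def rhs(n, m, u):
    # (|N| / |N_r|) * C(|N_r| - 1, |U|)^{-1}
    return (n / m) / comb(m - 1, u)

ok = all(
    abs(lhs(n, m, u) - rhs(n, m, u)) < 1e-9
    for n in range(3, 9)
    for m in range(2, n + 1)
    for u in range(m)  # U ⊆ N_r \ {r}, so 0 <= u <= m - 1
)
print(ok)  # True
```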

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Li, X., Dvornek, N.C., Zhou, Y., Zhuang, J., Ventola, P., Duncan, J.S. (2019). Efficient Interpretation of Deep Learning Models Using Graph Structure and Cooperative Game Theory: Application to ASD Biomarker Discovery. In: Chung, A., Gee, J., Yushkevich, P., Bao, S. (eds) Information Processing in Medical Imaging. IPMI 2019. Lecture Notes in Computer Science(), vol 11492. Springer, Cham. https://doi.org/10.1007/978-3-030-20351-1_56

  • DOI: https://doi.org/10.1007/978-3-030-20351-1_56

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20350-4

  • Online ISBN: 978-3-030-20351-1
