Research Article

Human Factors in Model Interpretability: Industry Practices, Challenges, and Needs

Published: 29 May 2020

Abstract

As the use of machine learning (ML) models in product development and data-driven decision-making has become pervasive across domains, practitioners' focus has increasingly shifted from building a well-performing model to understanding how their model works. While scholarly interest in model interpretability has grown rapidly in research communities such as HCI and ML, little is known about how practitioners perceive and aim to provide interpretability in the context of their existing workflows. This lack of understanding of interpretability as practiced may prevent interpretability research from addressing important needs, or lead to unrealistic solutions. To bridge this gap, we conducted 22 semi-structured interviews with industry practitioners to understand how they conceive of and design for interpretability while they plan, build, and use their models. Based on a qualitative analysis of our results, we differentiate interpretability roles, processes, goals, and strategies as they exist within organizations making heavy use of ML models. The characterization of interpretability work that emerges from our analysis suggests that model interpretability frequently involves cooperation and mental model comparison between people in different roles, often aimed at building trust not only between people and models but also between people within the organization. We present implications for design that discuss gaps between the interpretability challenges practitioners face and the approaches proposed in the literature, highlighting possible research directions that can better address real-world needs.

