ABSTRACT
Discussion of the “right to an explanation” has become increasingly relevant because of its potential utility for auditing automated decision systems, as well as for objecting to such decisions. However, most existing work on explanations focuses on collaborative environments, where designers are motivated to implement good-faith explanations that reveal potential weaknesses of a decision system. This motivation may not hold in an auditing environment. Thus, we ask: how much could explanations be used maliciously to defend a decision system? In this paper, we demonstrate how a black-box explanation system developed to defend a black-box decision system could manipulate decision recipients or auditors into accepting an intentionally discriminatory decision model. In a case-by-case scenario, where decision recipients are unable to share their cases and explanations, we find that most individual decision recipients could receive a verifiable justification, even if the decision system is intentionally discriminatory. In a system-wide scenario, where every decision is shared, we find that while justifications frequently contradict each other, there is no intuitive threshold for determining whether these contradictions arise from malicious justifications or from the simplicity requirements of the justifications conflicting with model behavior. We conclude by discussing how system-wide metrics may be more useful than explanation systems for evaluating overall decision fairness, while explanations could still be useful outside of fairness auditing.
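The case-by-case scenario above can be illustrated with a toy sketch (not the authors' actual method; the feature names, thresholds, and rule family are invented for illustration): a hidden model discriminates on a protected attribute, while a "justification system" searches, for each individual separately, for an innocuous single-feature rule that agrees with that individual's outcome.

```python
# Toy sketch of a malicious per-case justification system.
# Assumptions (not from the paper): loan-style applicants with features
# "income" and "debt", a protected attribute "group", and justifications
# drawn from single-feature threshold rules "approve iff feature > t".

def hidden_model(x):
    # Intentionally discriminatory: only group "A" applicants can be approved.
    return x["group"] == "A" and x["income"] > 50

def per_case_rule(x, decision):
    # Search for an innocuous rule that agrees with this ONE applicant's
    # outcome. The rule never mentions the protected attribute, so each
    # recipient, seen in isolation, receives a verifiable justification.
    for feature in ("income", "debt"):
        for t in range(0, 101, 10):
            if (x[feature] > t) == decision:
                return (feature, t)
    return None  # no innocuous rule fits this single case

applicants = [
    {"income": 60, "debt": 10, "group": "A"},  # approved
    {"income": 60, "debt": 10, "group": "B"},  # rejected, same finances
]

rules = [per_case_rule(x, hidden_model(x)) for x in applicants]
```

Each applicant gets a justification consistent with their own case, yet the two rules contradict each other system-wide: applying applicant 0's rule to applicant 1 (who has identical finances) predicts approval, while the hidden model rejects them. Only by sharing cases could the recipients detect this.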
How to Explain and Justify Almost Any Decision: Potential Pitfalls for Accountability in AI Decision-Making