
Contextualizing the “Why”: The Potential of Using Visual Map As a Novel XAI Method for Users with Low AI-literacy


Abstract

The surge of Artificial Intelligence (AI) for automatic decision-making raises concerns about transparency and interpretability of AI models. Explainable AI (XAI) addresses this by providing insights into AI predictions. Despite the availability of various methods for explaining decisions based on tabular data, there is no consensus on their effectiveness for different types of users. This paper introduces a novel XAI method, the Visual Map, and presents a human-grounded evaluation study comparing it with three common XAI methods. In an online experiment (N = 49), participants with either high or low AI-literacy evaluated all four methods in terms of explanation satisfaction, cognitive load, and overall evaluation in the same classification task environment. High AI-literacy participants were largely indifferent to the four methods, whereas low AI-literacy participants favoured the Visual Map, perceiving it as the least cognitively demanding. Our findings contribute to the evaluation and development of XAI methods for different types of end-users.


1 INTRODUCTION

With the integration of Artificial Intelligence (AI) into more and more products and services, and its frequent use in decision-making, the number of end-users will also increase. Many of these end-users lack AI expertise, yet they are required to assess the trustworthiness and acceptability of AI-driven decisions. This assessment is difficult as AI models, particularly those rooted in deep learning, are often characterized as "black boxes," owing to their inherent complexity and opacity [29]. To address the challenge of making these black-box AI models comprehensible to end-users, a range of explainable AI (XAI) methods has been developed [26, 29], including, for instance, counterfactuals, decision trees, and SHapley Additive exPlanations (SHAP) [16, 22].

When aiming at democratizing the use and benefits of AI [30], it is important to develop XAI methods for users with low AI-literacy. Existing methods fall short in this regard, as they frequently articulate their explanations at an abstract, technical level, which is challenging for users with low AI-literacy and induces high cognitive load [14]. This is in line with findings from [8, 18], which showed that end-user characteristics, such as AI-literacy, influence how explanations are perceived.

While this influence has been demonstrated, little research has been conducted on how to develop alternative XAI methods for low AI-literate users. In this paper, we design and evaluate a novel XAI method, the Visual Map, which aims at contextualizing the explanation [24] by embedding the explanations in visuals that correspond with the context of the Titanic dataset. We compared this method to three existing XAI methods for tabular data - SHAP, counterfactuals, and Decision Trees - using a human-grounded evaluation [6], involving users with varying levels of AI-literacy.

An online experiment was conducted in which participants (N = 49) were asked to interact with and assess the four XAI methods in terms of explanation satisfaction [15], cognitive load required to process their explanations [28, 34], and a final evaluation score (i.e., overall evaluation). To investigate if the AI-literacy level of participants influenced their evaluations, we measured the participants’ AI-literacy level objectively using the scale developed by Weber, Pinski and Baum [37]. Our results showed that the Visual Map received the most positive evaluation overall and required the least cognitive load among the participants in the low AI-literacy group. In contrast, the overall evaluation for the four methods did not differ significantly for the participant group with a higher AI-literacy, although the Visual Map also required less cognitive load than the decision tree.

Our work contributes to a more human-centered approach to XAI and the democratization of AI by (i) introducing a novel XAI method based on contextualized visual explanation and (ii) highlighting the influence of AI-literacy on the preference and end-user satisfaction for XAI methods.


2 BACKGROUND AND RELATED WORK

2.1 Explainable AI for Tabular Data

In recent years, various methods have been developed to explain decisions made by AI [1, 3, 16, 22, 23, 32]. These methods provide either global explanations, explaining the model's overall behaviour, or local explanations, clarifying predictions for specific instances. This paper focuses on local explanations, as they can explain to a user in more detail why they received a given prediction.

Regarding local explanations for tabular data, this paper focuses on three popular methods. The first method, SHapley Additive exPlanations (SHAP) [22], attempts to assign contributions to features, revealing their impact on the final prediction through Shapley values [13]. The second method, counterfactuals, focuses on what would need to change to obtain the opposite prediction. The third method, the decision tree, shows the sequence of decisions made based on the values of certain variables.
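To make the idea of a local explanation concrete, the snippet below is a minimal, hypothetical sketch of producing a SHAP explanation for a single passenger with a tree-based classifier. The tiny feature matrix, the RandomForestClassifier, and the column names are illustrative assumptions, not the setup used in this study.

```python
# Minimal sketch (illustrative, not the study's code): a local SHAP
# explanation for one passenger from a toy Titanic-like feature matrix.
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

# Assumed, pre-processed toy data: numeric features X and survival labels y.
X = pd.DataFrame({
    "pclass": [3, 1, 2],
    "sex":    [0, 1, 0],
    "age":    [22, 38, 30],
    "fare":   [7.25, 71.28, 13.00],
})
y = [0, 1, 0]  # 0 = did not survive, 1 = survived

clf = RandomForestClassifier(random_state=0).fit(X, y)

# TreeExplainer attributes the prediction for a single passenger to the
# features via Shapley values; positive values push towards "survived".
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X.iloc[[0]])
print(shap_values)
```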

2.2 AI-literacy

With the introduction of AI into society, more attention is being paid to the public's understanding of this technology, i.e. their AI-literacy. Long and Magerko defined AI-literacy as "a set of competencies that enables individuals to critically evaluate AI technologies; communicate and collaborate effectively with AI; and use AI as a tool online, at home, and in the workplace" [21]. Results from previous studies showed that the AI-literacy level of end-users can influence their level of curiosity about AI system details [18] and how they perceive XAI [8]. For instance, Ehsan et al. [8] showed that both people with and without an AI background over-trust numbers, but for different reasons. Users with an AI background assumed that the numbers possessed all the information, while those with a non-AI background could not understand the numbers and therefore associated them with higher-order intelligence. Other literature has also pointed out the importance of considering both the social and technical elements when developing and evaluating XAI methods [7, 9, 24]. Therefore, the prior knowledge and competencies required to evaluate AI, for instance the AI-literacy level, should be thoroughly considered during the assessment of XAI methods.

2.3 Human-Centered XAI

End-users often have neither an AI background nor the required domain knowledge. Hence, designing an XAI method whose explanations are easily understandable and do not require prior AI knowledge calls for the participation of end-users in its development, as shown in the example of analogy-based explanations [14]. In addition, inspired by the notion that explanations should be contextual [24], we developed a novel XAI method, the Visual Map, which embeds the explanation in the context of the dataset.

The differences between end-users also urge researchers to evaluate XAI methods using application- or human-grounded evaluations instead of functionality-grounded evaluations [6] or evaluations based simply on a researcher's intuition [24]. Application-grounded evaluations, i.e. evaluations with domain experts and real tasks, are most valuable when the tasks and methods are well established. For instance, Jesus et al. [17] adopted an application-grounded evaluation and compared three XAI methods for a fraud detection task. However, in this paper we set out to evaluate a novel XAI method against existing ones. Therefore, it is valuable to first test the method on a simple task with real humans with different levels of AI-literacy, i.e. a human-grounded evaluation. This type of evaluation has previously been used to evaluate different XAI methods for text classification [20] and to study how SHAP can be used for alert processing [38].


Figure 1: The Visual Map with the attributes scaled and coloured according to their SHAP values. Red indicates a negative SHAP value, i.e. a contribution towards not surviving, and green a positive SHAP value. The larger the icon or the bolder the text, the higher the SHAP value for that attribute and the more it contributed to the final decision. When hovering over a coloured element, the user can see the exact SHAP value. Icons by: Eucalyp, Vectordoodle via SVGrepo; Wahid Ilham M Rifai and Angelina Fara from Noun Project; MAPSVG.


3 DESIGNING A NEW XAI METHOD

In this section, we present a novel XAI method that attempts to combine the context of the prediction problem and the associated dataset with the information given by SHAP values [22].

3.1 The Context: the Titanic Dataset

Data, machine learning (ML) predictions and hence XAI do not exist in a vacuum. Instead, predictions and features of a dataset are often linked to a specific context. For our design, we focussed on the Titanic dataset [5], as the context is well-known to most and is interesting to visualize. In the discussion, we will reflect on how the developed method could be used for other datasets.

The Titanic dataset describes the survival status of the passengers on the Titanic, the ship that famously sank on April 15th, 1912. Out of the 2224 passengers, only 722 survived the collision with the iceberg. The dataset not only records whether a passenger survived; it also includes features that describe the passengers and their passage (ticket class, gender, age, number of relatives, number of siblings and/or spouses, number of parents and/or children, port of embarkation, fare price, deck, title).
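As a minimal sketch of working with this data (assuming the train.csv file from the Kaggle competition [5]), the outcome and features can be loaded and inspected as follows; the file name and column names follow the Kaggle version and are not taken from the paper itself.

```python
# Minimal sketch (assumed Kaggle file layout): load the Titanic training
# data and inspect the survival outcome and the available features.
import pandas as pd

df = pd.read_csv("train.csv")          # Kaggle Titanic training split
print(df["Survived"].value_counts())   # survival outcome per passenger
print(df.columns.tolist())             # passenger and passage features
```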

3.2 The Visual Map: Contextualizing and Visualizing Explanations

The objective when designing the Visual Map was to provide a clear explanation intelligible to every end-user regardless of their background. Existing XAI methods were taken as a starting point and the authors discussed which elements they expected to be easy or hard to understand. Elements such as colour, size, and icons were expected to be easily understandable, while visualizations that spread information across different locations were expected to be harder to process, as they divide visual attention [10]. As explanations should also be contextual [24], we started by visualizing the different features in the Titanic dataset using icons in a map. The importance of each feature was calculated using the values derived from the SHAP method [22] and communicated using different visual channels. The colour indicates whether a feature contributed towards survival or not, and the size indicates the magnitude of its importance. According to the graphical perception model developed by Cleveland and McGill [4], encoding by the size of a visual element is less accurate than bars along a common scale. To still provide users with sufficiently accurate information about the magnitude of feature importance, we included a hover interaction that shows the exact SHAP value. An animation of a moving boat was added to increase awareness of the context and create an immersive and interactive experience. The final version of the Visual Map was created using D3.js and can be found in Figure 1 and online.
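The actual Visual Map was implemented in D3.js; the following is only a hedged Python sketch of the encoding idea described above, in which the sign of a SHAP value selects the colour and its magnitude scales the icon size. The scaling formula, the base size, and the example SHAP values are assumptions for illustration.

```python
# Hypothetical sketch of the visual encoding: colour encodes the sign of a
# SHAP value, size its relative magnitude (the real map was built in D3.js).
def visual_encoding(shap_value, max_abs, base_size=24):
    """Return (colour, icon_size) for one feature's SHAP value."""
    colour = "green" if shap_value > 0 else "red"   # towards / against survival
    scale = 1.0 + abs(shap_value) / max_abs          # larger = more important
    return colour, base_size * scale

# Example SHAP values for one passenger (illustrative numbers only).
shap_values = {"sex": 0.31, "pclass": -0.12, "age": -0.05, "fare": 0.08}
max_abs = max(abs(v) for v in shap_values.values())
for feature, value in shap_values.items():
    print(feature, visual_encoding(value, max_abs))
```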


4 EVALUATION STUDY

This study set out to evaluate our Visual Map method by comparing it to three commonly used XAI methods (see Figure  2) - SHAP, decision tree, and counterfactual - in terms of cognitive load, explanation satisfaction, and overall user evaluation. We followed a human-grounded evaluation approach [6] and included users with both high and low AI-literacy to analyse potential similarities and differences between the two user groups.

4.1 Method

4.1.1 Participants.

An a priori power analysis was performed using G*Power [11] to calculate the minimal sample size required to achieve a power of 90% with a repeated measures ANOVA (α = 0.05, r = 0.5). A pilot study (N = 25) was conducted to determine the mean and standard deviation for each XAI method. Thus, with a sample size of N = 27 for each group, a minimum power of 90% would be achieved. Sixty participants were then recruited via a recruitment platform, with two different sets of criteria applied to obtain participants with low and high AI-literacy. All participants had to be fluent in English, have at least a bachelor's degree and be between 18 and 35 years old. For the first group, participants were not allowed to have any programming experience, use AI on a weekly basis, or have obtained a degree in computing (IT), computer science or mathematics. To be in the second group, participants had to have programming experience and use AI at least twice a week. These criteria were chosen because AI-literacy itself was not available as a filter on the recruitment platform, and it was assumed that they would yield participants with low and high AI-literacy, respectively. The 60 participants were predominantly students (68.3%), evenly distributed between genders, from 12 different countries and with an average age of 25.8 (± 2.9) years.


Figure 2: The three other XAI methods as included in the study


Figure 3: The web interface for the counterfactuals and SHAP values

4.1.2 XAI Evaluation Interface.

To evaluate the Visual Map and the other three XAI methods, we developed a web interface (see Figure 3). This interface showed a brief description of the XAI method on that page, the name of the currently selected passenger alongside their characteristics, the prediction of the model together with its confidence score, and one of the four types of explanations (see Figures 1 and 2). The Python libraries SHAP, dtreeviz and DiCE were used to realize the three other XAI methods. To allow a fair comparison, the output of the existing methods was modified to show the names of the categorical values instead of their encoded values (e.g. 1 = Mr), as the current Python libraries do not offer this functionality yet.
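As a hedged sketch of how such explanations could be generated with the libraries named above (not the study's actual interface code), the fragment below fits a small classifier on toy Titanic-like data and produces the three baseline explanation types; the toy data, the model, and the exact call signatures are assumptions and may differ across library versions.

```python
# Hedged sketch: producing SHAP, decision tree, and counterfactual
# explanations with the named libraries on toy, Titanic-like data.
import pandas as pd
import shap
import dtreeviz
import dice_ml
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "pclass":   [3, 1, 2, 3, 1, 2],
    "sex":      [0, 1, 1, 0, 0, 1],
    "age":      [22, 38, 26, 35, 54, 27],
    "fare":     [7.3, 71.3, 7.9, 8.1, 51.9, 21.0],
    "survived": [0, 1, 1, 0, 0, 1],
})
X, y = df.drop(columns="survived"), df["survived"]
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# 1. SHAP: local feature contributions for one passenger.
shap_values = shap.TreeExplainer(clf).shap_values(X.iloc[[0]])

# 2. Decision tree: visualize the decision path for that passenger.
viz = dtreeviz.model(clf, X_train=X, y_train=y,
                     feature_names=list(X.columns),
                     target_name="survived", class_names=["no", "yes"])
tree_view = viz.view(x=X.iloc[0])

# 3. Counterfactuals: what would need to change for the opposite prediction.
data = dice_ml.Data(dataframe=df, continuous_features=["age", "fare"],
                    outcome_name="survived")
model = dice_ml.Model(model=clf, backend="sklearn")
cf = dice_ml.Dice(data, model, method="random").generate_counterfactuals(
    X.iloc[[0]], total_CFs=2, desired_class="opposite")
```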

4.1.3 Study Design and Procedure.

The study was published on the online research platform Prolific and included a link to an external questionnaire hosted on Streamlit. After giving informed consent, participants received an introduction to the study and the Titanic case, along with an overview of the dataset features. Next, participants completed the tasks of evaluating the four XAI methods in a random order. Participants saw four passengers for each XAI method; all participants saw the same four passengers for a given method, but the passengers differed between methods. After having seen all four passengers, participants rated their cognitive load on an 11-point Likert scale [28], followed by the explanation satisfaction scale [15]. This scale covers different aspects of satisfaction such as understanding, level of detail and trustworthiness. After completion, participants advanced to the next XAI method until all methods were seen. The study included two attention checks to verify whether participants had read the first explanation and viewed all four passengers for the Visual Map.

After seeing all four XAI methods, participants were asked to give an overall evaluation score to each of the four methods, select their favourite type of XAI method, and answer some open-ended questions. These open questions were designed to find out what they liked about their favourite method, how it helped them understand the prediction, and what they did not like about the other three XAI methods. At the end, participants' demographic data (age, gender) was collected and they completed an AI proficiency scale. Two of the subscales of the objective measurement scale developed by Weber, Pinski and Baum [37] were used to measure AI competence. The subscales Socio User AI-literacy and Socio Creator/Evaluator AI-literacy were omitted, as the focus of this study was on the technical literacy of the participants. The shortened scale consists of eight multiple-choice questions, each with four options and one correct answer. This method was preferred over the subjective assessment of AI skills provided by Prolific's participant filters, as people tend to overestimate their competencies [27]. To further increase the reliability of the scale and to reduce the effect of guessing, an "I don't know" (IDK) option was added to each question [12, 33, 39]. A correct answer was awarded +1 point, an incorrect answer -1 point and the IDK option 0 points. The separation between the two AI-literacy groups was based on the total score (low AI-literacy: total score ≤ 0; high AI-literacy: total score > 0). A Mann-Whitney U test confirmed a significant difference in total score between the two groups (z = 138, p < 0.001).
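The scoring rule and group split described above can be summarised in a short sketch; the question and answer representation is a hypothetical structure, and only the +1/-1/0 scoring and the ≤ 0 / > 0 cut-off come from the text.

```python
# Minimal sketch (assumed structure) of the scoring rule described above:
# +1 for a correct answer, -1 for an incorrect one, 0 for "I don't know".
def ai_literacy_score(answers, correct):
    """answers/correct: lists of option labels; 'IDK' marks "I don't know"."""
    score = 0
    for given, right in zip(answers, correct):
        if given == "IDK":
            score += 0          # guessing discouraged, no penalty
        elif given == right:
            score += 1
        else:
            score -= 1
    return score

def literacy_group(total_score):
    # Group split used in the study: total score <= 0 -> low, > 0 -> high.
    return "low" if total_score <= 0 else "high"

print(literacy_group(ai_literacy_score(["A", "IDK", "C", "B"],
                                       ["A", "B",   "D", "B"])))
```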

The study protocol complied with the university Ethical Review Board procedures and all data were managed in accordance with GDPR regulations.


Figure 4: The different scores for each of the four XAI methods, split by AI-literacy level (* p < 0.05, ** p < 0.01, *** p < 0.001).

4.1.4 Data Analysis.

The questionnaire data was cleaned by removing entries from participants who did not successfully complete both attention checks or did not answer all questions. Thus, eleven participants were removed, resulting in a final sample of 49 participants. Next, participants were subdivided into a low AI-literacy group (total score ≤ 0, N = 21) or a high AI-literacy group (total score > 0, N = 28).

To compare the Visual Map with the other three XAI methods among participants with both high and low AI-literacy, we built a linear mixed model for each of the dependent variables (i.e., explanation satisfaction, cognitive load, and overall evaluation), with the XAI method, the AI-literacy level, and their interaction term as fixed-effect predictors and participant ID as a grouping variable to account for individual-specific variations in the intercepts. Both predictors were dummy coded, with the Visual Map as the reference level for the XAI method and low AI-literacy as the reference level for AI-literacy. This way, the estimated regression coefficients could be interpreted as the differences between the evaluations of the Visual Map and those of the other three methods for the low AI-literacy group. The coefficients for the interaction terms show whether the two participant groups responded differently to the Visual Map as compared with the other methods. To also examine the evaluations from the high AI-literacy participants, we repeated the modelling procedure with the high-literacy group as the reference level for AI-literacy. All models were fitted using the lmerTest package in R [19].
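The models were fitted with the lmerTest package in R; as a hedged illustration of the same specification in Python, the sketch below uses the mixed-effects formula interface from statsmodels with treatment-coded predictors. The data file and column names (score, method, literacy, pid) are assumptions for this sketch, not the study's actual variables.

```python
# Hedged illustration of the model specification (the study used lmerTest in R).
# The long-format file and its column names are assumptions for this sketch.
import pandas as pd
import statsmodels.formula.api as smf

long = pd.read_csv("evaluations_long.csv")  # one row per participant x method

# Treatment coding with Visual Map and low AI-literacy as reference levels,
# so each method coefficient compares that method against the Visual Map
# within the low AI-literacy group; interactions capture group differences.
model = smf.mixedlm(
    "score ~ C(method, Treatment('VisualMap')) * C(literacy, Treatment('low'))",
    data=long,
    groups=long["pid"],          # random intercept per participant
)
print(model.fit().summary())
```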

4.2 Results

4.2.1 Explanation Satisfaction.

As shown in Figure 4a, participants' satisfaction with the explanations, as measured by the Explanation Satisfaction Scale (ESS), did not vary much across the four XAI methods. All ESS scores were around the mid-point of the 5-point scale (between 2.94 and 3.41). The linear mixed modelling results indicated only one significant difference for the low AI-literacy participants: they were significantly more satisfied with the Visual Map than with the Decision Tree, B = -0.35, SE = 0.17, p = .046. All other differences were non-significant for both AI-literacy groups and there was no XAI method by AI-literacy interaction effect (all ps > .1).

4.2.2 Cognitive Load.

Figure 4b shows the perceived cognitive load for the four XAI methods. For the low AI-literacy group, results from the linear mixed model showed that the Visual Map induced less cognitive load (M = 3.90, SD = 2.48) than the three other XAI methods, namely SHAP (M = 5.24, SD = 2.47; B = 1.34, SE = 0.42, p = .002), Decision Tree (M = 6.50, SD = 1.61; B = 2.65, SE = 0.42, p < .001), and Counterfactual (M = 5.41, SD = 2.45; B = 1.55, SE = 0.43, p < .001). For the high AI-literacy group, the only statistically significant difference was that the Decision Tree (M = 5.44, SD = 1.87) led to higher cognitive load than the Visual Map (M = 4.12, SD = 1.79), B = 1.32, SE = 0.45, p = .004. The model also suggested a significant method-by-AI-literacy interaction effect: the cognitive load difference between the Visual Map and the Decision Tree was larger for the low AI-literacy group than for the high AI-literacy group (B = -1.33, SE = 0.62, p = .032).

4.2.3 Overall Evaluation.

As shown in Figure 4c, participants from the low AI-literacy group seemed to have a clear preference for the Visual Map method in their overall evaluation. As suggested by the linear mixed model, the average evaluation score for the Visual Map (M = 7.23, SD = 2.41) was significantly higher than the scores for SHAP (M = 5.57, SD = 3.06; B = -1.70, SE = 0.67, p = .012) and for Decision Tree (M = 4.47, SD = 2.90; B = -2.80, SE = 0.67, p < .001), but was not significantly different from the score for Counterfactual (M = 6.10, SD = 2.70; B = -1.12, SE = 0.67, p = .083). In contrast, for the participants with high AI-literacy, the differences in the overall evaluations of the four methods were much smaller numerically (Visual Map: M = 6.60, SD = 2.90; SHAP: M = 6.72, SD = 2.62; Decision Tree: M = 6.08, SD = 2.83; Counterfactual: M = 5.64, SD = 2.56) and were not statistically significant (all ps > .1). Finally, the model suggested one method-by-AI-literacy interaction effect for overall evaluation: the difference in the evaluation scores of the Visual Map and the Decision Tree was significantly larger for the low AI-literacy group than for the high AI-literacy group (B = 2.28, SE = 0.99, p = .023).


5 DISCUSSION

In this study, we developed a new XAI method called the Visual Map and compared it to three commonly used XAI methods in a human-grounded evaluation study. By recruiting participants with both high and low AI-literacy, our online experiment clearly demonstrated that the comparison among the XAI methods depended on AI-literacy level. For participants with low AI-literacy, the Visual Map was overall the preferred method, and it induced less cognitive load when they processed its explanations. For participants who knew more about AI and machine learning, the Visual Map also induced relatively low cognitive load, but the new method was not evaluated more positively than the three existing XAI methods. In terms of satisfaction with the explanations, neither group of participants seemed to strongly favour any particular method.

Our findings suggest that the Visual Map should be considered as a potential option for real-world applications, especially when targeting end-users with limited AI-literacy, owing to the combined benefits of low cognitive load and overall preference. If the Visual Map is successfully implemented in real-world applications, it could enhance user experience and contribute to the broader goal of democratizing AI by making complex technologies more accessible to diverse user populations.

The preference for the Visual Map and its lower cognitive load can be attributed to at least three aspects. First, the Visual Map presented the explanations in the context of the Titanic dataset, while all other methods presented them in an abstract manner. The context helps the user understand the case and what is being predicted; in our presentation, the animation even showed the Titanic colliding with an iceberg before showing the prediction and how each feature contributed. In addition, the Visual Map provides context to data features that otherwise remain more abstract, for example the port of embarkation. Second, the Visual Map requires less cognitive load as it breaks down complex information into more manageable components, i.e. it provides a scaffold for users [34, 36]. The icons, colours, and scaling act as cues, guiding users through the interpretation of the AI's decision-making process and directing the users' visual attention [10]. Finally, unlike the SHAP method, where feature importance and case-specific feature values are separated spatially, the Visual Map integrates both pieces of information in a single location (i.e., feature values with colour coding), minimizing the need for users to divide their attention between the two information sources. According to cognitive load theory [35] and the split-attention principle [2], avoiding materials that require learners to divide their attention reduces cognitive load and facilitates understanding.

While the Visual Map in this paper is specific to the Titanic dataset, the general method can be transferred to other tabular datasets and classification problems. Datasets with features that can be visualized with, for example, icons are particularly suitable. An example is predicting someone's risk of diabetes [25] based on, among other things, their BMI (a person icon of varying height and width), whether they smoke (a smoke icon), and whether they have high cholesterol (a bar that fills up). Transferring the proposed method to other types of data, such as image and text data, is less straightforward, but there is also a diminished need: in contrast to tabular data, explanations for image and text data are often presented within their context and require less familiarity with the domain of the problem to be understood [31].

A curious and theoretically interesting aspect of our results is that the overall evaluation score and cognitive load showed differences while the explanation satisfaction score did not. For future evaluations, it is important to develop a better theoretical understanding of what drives people's overall evaluation.

A limitation of our work is that we used only one instance of each of the commonly used XAI methods. For instance, the SHAP library offers multiple types of visualizations, and multiple libraries exist for visualizing decision trees. We are therefore cautious about generalizing our findings to these methods in general. Secondly, this paper is limited in that it only analyses the quantitative data and does not investigate why end-users preferred a certain method. In future work, we aim to answer these questions by analysing the collected qualitative data.


6 CONCLUSION

This study introduced a new explainable AI method based on contextualized visual explanation, the Visual Map. We present a human-grounded evaluation in which we compared the Visual Map to three common XAI methods in terms of explanation satisfaction, cognitive load and overall evaluation for end-users with different AI-literacy levels. Our results show that low AI-literacy participants preferred the novel method, as it received significantly higher final scores and induced significantly lower cognitive load. This difference in preference was not present among the high AI-literacy participants, although the Visual Map also required less cognitive load compared to the Decision Tree. While context-specific, the general approach of creating a Visual Map can be transferred to other tabular datasets with features that lend themselves to visualization. If this approach is successfully implemented in real-world applications, it can contribute to the democratization of AI by making explanations and predictions more understandable for users with low AI-literacy.


ACKNOWLEDGMENTS

This paper is partly supported by the European Union’s HORIZON Research and Innovation Programme under grant agreement No 101120657, project ENFIELD (European Lighthouse to Manifest Trustworthy and Green AI).



References

  1. Anna Markella Antoniadi, Yuhan Du, Yasmine Guendouz, Lan Wei, Claudia Mazo, Brett A Becker, and Catherine Mooney. 2021. Current challenges and future opportunities for XAI in machine learning-based clinical decision support systems: a systematic review. Applied Sciences 11, 11 (2021), 5088.
  2. Paul Ayres and John Sweller. 2005. The split-attention principle in multimedia learning. The Cambridge Handbook of Multimedia Learning 2 (2005), 135–146.
  3. Shuvro Chakrobartty and Omar El-Gayar. 2021. Explainable artificial intelligence in the medical domain: a systematic review. AMCIS 2021 Proceedings 1 (2021), 10.
  4. William S Cleveland and Robert McGill. 1984. Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association 79, 387 (1984), 531–554.
  5. Will Cukierski. 2012. Titanic - Machine Learning from Disaster. https://kaggle.com/competitions/titanic
  6. Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017), 1–13.
  7. Upol Ehsan, Q. Vera Liao, Michael Muller, Mark O. Riedl, and Justin D. Weisz. 2021. Expanding Explainability: Towards Social Transparency in AI systems. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI '21). Association for Computing Machinery, New York, NY, USA, Article 82, 19 pages. https://doi.org/10.1145/3411764.3445188
  8. Upol Ehsan, Samir Passi, Q. Vera Liao, Larry Chan, I-Hsiang Lee, Michael Muller, and Mark O. Riedl. 2021. The Who in Explainable AI: How AI Background Shapes Perceptions of AI Explanations. arXiv preprint arXiv:2107.13509 (2021), 1–43. http://arxiv.org/abs/2107.13509v1
  9. Upol Ehsan and Mark O Riedl. 2020. Human-centered explainable AI: Towards a reflective sociotechnical approach. In HCI International 2020 - Late Breaking Papers: Multimodality and Intelligence: 22nd HCI International Conference, HCII 2020, Copenhagen, Denmark, July 19–24, 2020, Proceedings 22. Springer, Cham, 449–466.
  10. Karla K Evans, Todd S Horowitz, Piers Howe, Riccardo Pedersini, Ester Reijnen, Yair Pinto, Yoana Kuzmova, and Jeremy M Wolfe. 2011. Visual attention. Wiley Interdisciplinary Reviews: Cognitive Science 2, 5 (2011), 503–514.
  11. Franz Faul, Edgar Erdfelder, Albert-Georg Lang, and Axel Buchner. 2007. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods 39 (2007), 175–191. https://doi.org/10.3758/BF03193146
  12. R. McG. Harden, R. A. Brown, L. A. Biran, W. P. Dallas Ross, and R. E. Wakeford. 1976. Multiple choice questions: to guess or not to guess. Medical Education 10, 1 (1976), 27–32. https://doi.org/10.1111/J.1365-2923.1976.TB00527.X
  13. S. Hart. 1989. Shapley value. In Game Theory.
  14. Gaole He, Agathe Balayn, Stefan Buijsman, Jie Yang, and Ujwal Gadiraju. 2022. It Is Like Finding a Polar Bear in the Savannah! Concept-Level AI Explanations with Analogical Inference from Commonsense Knowledge. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Vol. 10. AAAI Press, Palo Alto, CA, USA, 89–101.
  15. Robert R Hoffman, Shane T Mueller, Gary Klein, and Jordan Litman. 2018. Metrics for explainable AI: Challenges and prospects. arXiv preprint arXiv:1812.04608 (2018), 1–50.
  16. Sheikh Rabiul Islam, William Eberle, Sheikh Khaled Ghafoor, and Mohiuddin Ahmed. 2021. Explainable artificial intelligence approaches: A survey. arXiv preprint arXiv:2101.09429 (2021), 1–14.
  17. Sérgio Jesus, Catarina Belém, Vladimir Balayan, João Bento, Pedro Saleiro, Pedro Bizarro, and João Gama. 2021. How Can I Choose an Explainer? An Application-Grounded Evaluation of Post-Hoc Explanations. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Virtual Event, Canada) (FAccT '21). Association for Computing Machinery, New York, NY, USA, 805–815. https://doi.org/10.1145/3442188.3445941
  18. Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, and Andrés Monroy-Hernández. 2023. "Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI '23). Association for Computing Machinery, New York, NY, USA, Article 250, 17 pages. https://doi.org/10.1145/3544548.3581001
  19. Alexandra Kuznetsova, Per B. Brockhoff, and Rune H. B. Christensen. 2017. lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software 82, 13 (2017), 1–26. https://doi.org/10.18637/jss.v082.i13
  20. Piyawat Lertvittayakumjorn and Francesca Toni. 2019. Human-grounded Evaluations of Explanation Methods for Text Classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), 1–17. https://doi.org/10.18653/v1/d19-1523
  21. Duri Long and Brian Magerko. 2020. What is AI literacy? Competencies and design considerations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 1–16.
  22. Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30 (2017), 1–10.
  23. Hasan Mahmud, AKM Najmul Islam, Ranjan Kumar Mitra, and Ahmed Rizvan Hasan. 2022. The Impact of Functional and Psychological Barriers on Algorithm Aversion - An IRT Perspective. In Conference on e-Business, e-Services and e-Society. Springer, Cham, 95–108.
  24. Tim Miller. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267 (2019), 1–38.
  25. Farida Mohsen, Hamada RH Al-Absi, Noha A Yousri, Nady El Hajj, and Zubair Shah. 2023. A scoping review of artificial intelligence-based methods for diabetes risk prediction. npj Digital Medicine 6, 1 (2023), 197.
  26. Christoph Molnar. 2022. Interpretable Machine Learning (2nd ed.). Independently published, Munich, Germany. https://christophm.github.io/interpretable-ml-book
  27. Don A. Moore and Paul J. Healy. 2008. The Trouble With Overconfidence. Psychological Review 115, 2 (2008), 502–517. https://doi.org/10.1037/0033-295X.115.2.502
  28. Fred G.W.C. Paas. 1992. Training Strategies for Attaining Transfer of Problem-Solving Skill in Statistics: A Cognitive-Load Approach. Journal of Educational Psychology 84, 4 (1992), 429–434. https://doi.org/10.1037/0022-0663.84.4.429
  29. Arun Rai. 2020. Explainable AI: From black box to glass box. Journal of the Academy of Marketing Science 48 (2020), 137–141.
  30. Elizabeth Seger, Aviv Ovadya, Divya Siddarth, Ben Garfinkel, and Allan Dafoe. 2023. Democratising AI: Multiple Meanings, Goals, and Methods. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (Montréal, QC, Canada) (AIES '23). Association for Computing Machinery, New York, NY, USA, 715–722. https://doi.org/10.1145/3600211.3604693
  31. Kacper Sokol and Peter Flach. 2020. Explainability fact sheets: a framework for systematic assessment of explainable approaches. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Barcelona, Spain) (FAT* '20). Association for Computing Machinery, New York, NY, USA, 56–67. https://doi.org/10.1145/3351095.3372870
  32. Bernd Carsten Stahl, Andreas Andreou, Philip Brey, Tally Hatzakis, Alexey Kirichenko, Kevin Macnish, S Laulhé Shaelou, Andrew Patel, Mark Ryan, and David Wright. 2021. Artificial intelligence for human flourishing - Beyond principles for machine learning. Journal of Business Research 124 (2021), 374–388.
  33. Tim Stoeckel, Phil Bennett, and Stuart McLean. 2016. Is "I Don't Know" a Viable Answer Choice on the Vocabulary Size Test? TESOL Quarterly 50, 4 (2016), 965–975. https://doi.org/10.1002/TESQ.325
  34. John Sweller. 2011. Cognitive load theory. In Psychology of Learning and Motivation, Vol. 55. Elsevier, Amsterdam, Netherlands, 37–76.
  35. John Sweller, Jeroen JG van Merriënboer, and Fred Paas. 2019. Cognitive architecture and instructional design: 20 years later. Educational Psychology Review 31 (2019), 261–292.
  36. Rachel R Van Der Stuyf. 2002. Scaffolding as a teaching strategy. Adolescent Learning and Development 52, 3 (2002), 5–18.
  37. Patrick Weber, Marc Pinski, and Lorenz Baum. 2023. Toward an Objective Measurement of AI Literacy. In PACIS 2023 Proceedings, 60, 1–10. https://aisel.aisnet.org/pacis2023
  38. Hilde J. P. Weerts, Werner van Ipenburg, and Mykola Pechenizkiy. 2019. A Human-Grounded Evaluation of SHAP for Alert Processing. arXiv preprint arXiv:1907.03324 (2019), 1–6. https://arxiv.org/abs/1907.03324v1
  39. Xian Zhang. 2013. The I Don't Know Option in the Vocabulary Size Test. TESOL Quarterly 47, 4 (2013), 790–811. https://doi.org/10.1002/TESQ.98
