Explain Yourself: Factors that Impact Trust in AI Recommendations
Trusting AI
Recent advances in machine learning and AI have made it possible to support decision-making in many industries. These tools are embedded in software applications and draw on large databases of internal and external information to generate recommendations, along with explanations, at the point where a human needs to make a decision. Often, these are high-stakes decisions: healthcare, criminal justice, and finance are among the industries using AI in such situations.
In this post, I dig into a March 2023 study published in the International Journal of Human–Computer Studies titled, “How the different explanation classes impact trust calibration: The case of clinical decision support systems.” The premise of the paper is that collaborative decision-making with artificial intelligence (AI) leads to better decisions, but only if humans have a significant degree of trust in AI.
But, what is trust? The human-computer trust (HCT) model holds that trust in machines is formed along two dimensions: cognition-based trust and affect-based trust. Simply put, trust is built on intellectual and emotional bases. In this model, cognition-based trust comes down to three primary factors:
- Perceived understandability
- Perceived reliability
- Perceived technical competence
Building on the emerging field of Explainable Artificial Intelligence (XAI), the paper’s authors explore how different types of explanations (XAI models) affect trust in an AI recommendation. Specifically, they developed a tool that helps medical practitioners screen chemotherapy prescriptions. In this simulated high-stakes setting, an AI provided prescription recommendations, each with a supporting explanation, which practitioners could either accept or reject.
There are many types, or classes, of XAI. Yet, their goal is always the same: to provide context and rationale for the AI recommendation. The four XAI types tested in this study were:
- Local explanations: these justify the AI reasoning at the level of a single recommendation by quantifying how much each data source (or input feature) contributed to it. (A toy sketch of this class and the counterfactual class follows the list.)
- Example-based: these justify the AI decision by providing examples from the dataset with similar characteristics (e.g., reject the prescription because patient A looks like patient B).
- Counterfactual: these justify the AI decision by answering users’ typical “what-if” questions (e.g., reject the prescription because the platelet count is 60; however, if the platelet count were greater than 75, the prescription would be confirmed).
- Global explanations: these justify the AI decision by attempting to make clear the overall logic of the black-box model.
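To make the local and counterfactual classes more concrete, here is a minimal, hypothetical sketch in Python. It is not the authors’ actual system: the toy linear scoring model, lab values, weights, baselines, and decision threshold are all invented for illustration.

```python
# Hypothetical sketch only: a toy linear "model" over invented lab values.
FEATURE_WEIGHTS = {
    "platelet_count": 0.04,   # score change per unit deviation from baseline
    "neutrophil_count": 0.5,
    "creatinine": -0.8,
}
BASELINES = {"platelet_count": 75, "neutrophil_count": 1.5, "creatinine": 1.2}
DECISION_THRESHOLD = 0.0      # score >= threshold -> confirm the prescription

def score(patient):
    """Toy decision score: weighted deviation of each lab value from its baseline."""
    return sum(w * (patient[f] - BASELINES[f]) for f, w in FEATURE_WEIGHTS.items())

def local_explanation(patient):
    """Local explanation: each feature's contribution to this one recommendation."""
    return {f: w * (patient[f] - BASELINES[f]) for f, w in FEATURE_WEIGHTS.items()}

def counterfactual(patient, feature):
    """Counterfactual: the value of `feature` at which the decision would flip."""
    others = sum(w * (patient[f] - BASELINES[f])
                 for f, w in FEATURE_WEIGHTS.items() if f != feature)
    # Solve w * (x - baseline) + others = DECISION_THRESHOLD for x.
    return BASELINES[feature] + (DECISION_THRESHOLD - others) / FEATURE_WEIGHTS[feature]

patient = {"platelet_count": 60, "neutrophil_count": 1.6, "creatinine": 1.3}
decision = "confirm" if score(patient) >= DECISION_THRESHOLD else "reject"
print(decision)                                    # reject (score is below threshold)
print(local_explanation(patient))                  # per-feature contributions
print(counterfactual(patient, "platelet_count"))   # ~75.8: the flip point
```

With these made-up numbers, the counterfactual mirrors the paper’s platelet example: the decision flips from reject to confirm once the platelet count rises to roughly 76.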
The researchers recruited 41 medical practitioners who performed 410 Human-AI team tasks. They were presented with a recommendation to either accept or reject a chemotherapy prescription for a patient, along with an explanation of the recommendation. The medical practitioners used a clickable prototype for this research, not real-world software.
Notably, some recommendations were correct and others were intentionally incorrect. The purpose was to evaluate whether the type of explanation (XAI class) paired with each recommendation affected the practitioners' trust in it.
The researchers used both quantitative and qualitative methods to assess the participants' trust in the recommendation and XAI type. If you’re interested in a detailed overview of the methodology, I suggest reading the paper. It’s quite interesting.
Key Insight
The majority of medical practitioners were likely to trust the AI recommendation when it included any of the XAI classes, even when the recommendation was wrong. The mere fact that there was an explanation gave these practitioners a sense of trust in the AI.
Designers of XAI interfaces often incorrectly assume that users will engage deeply with the explanation. In reality, humans prefer heuristics and shortcuts when making decisions (System 1 thinking), whereas high-stakes decision-making calls for a slower pace, critical thinking, and more effort (System 2 thinking). This study indicates that AI recommendations with an explanation can lull people into System 1 thinking. Spoiler alert: humans like convenience and speed, even when the stakes of the decision are high.
Additional Insights & Learnings
While all classes or types of XAI models increased trust, some performed better than others. AI recommendations that included example-based and counterfactual explanations were rated as more understandable. Humans are more willing to engage with explanations when they are familiar, simple, and causally relevant (Keil, 2006).
However, participants did not rate example-based explanations highly on perceived technical competence. One participant commented in that context, “Examples are beneficial and similar to what we do in the clinic, but it is not a proper explanation … I mean, it could be supportive of other explanations … I would expect a more causal or correlation relationship between the patient and the AI decision”.
For the design of recommendations and their explanations, the researchers identified several factors that influence trust and usefulness. These include:
- The ability to see the line of reasoning and how the AI arrived at a conclusion.
- Too much additional information creates a burden on an already time-consuming workflow.
- Long or redundant explanations led participants to skip them and decreased their satisfaction with the explanation.
- Domain and task awareness are critical. Without them, AI explanations and recommendations will be ignored. In fields with niche contexts such as healthcare, there is often not enough depth in the data to train models to a precise level of context awareness.
- Additionally, practitioners question how up-to-date the AI model is. For example, participants in the study wanted to be assured that the latest treatments, medical sources, and knowledge were in the AI model.
- For example-based explanations, they wondered whether similar cases were stable enough (would the same explanation hold across comparable patients?) and representative enough (how many cases does a given explanation actually cover?).
Wrap up
The authors of the paper suggest future research should consider the trade-off between effectiveness and usability of explanations to optimize Human-AI team performance. They also note, “Explainability is a social and interactive process between the explainer and the explainee.” Real trust is built through dialog. AI developers should consider how to build feedback channels between humans and their systems that surface the model’s reasoning and inform the model’s evolution.
A previous study cited in the paper (Cai et al., 2019) used an onboarding technique to guide users’ understanding of the AI’s actual capabilities, limitations, and ways of using it. Education before use is important to avoid misinterpretation of the AI’s abilities. Co-design and substantial user testing will lead to more engagement and understandability for specific user groups and domains.
I hope you found this interesting and useful. I’m already working on the next research breakdown within the field of AI. It’ll be out in the next week or so.
Until next time,
Andy