VisML #5: Beyond Input and Output: Introducing Model Explanations
Short overview of some basic methods to understand how machine learning models do what they do
In this section of the series on Visualization for Machine Learning, we focus on model explanations. Model explanations are a class of ML methods that extract information from models to shed light on the logic they use to produce their predictions. For example, many explanation methods produce “feature weights,” numerical scores associated with the features that indicate how relevant each feature is for a given decision and whether its influence pushes the output toward higher or lower values. While the data-based approaches we covered earlier provide information about “what” the model does, explanations aim to provide information about “how” models make their decisions.
Let’s explore a specific example to clarify this idea. Below, you can see an explanation taken from the SHAP library, one of the most popular explanation methods. The model tries to predict median house value for California districts based on a number of features describing the district, such as median income, location (longitude and latitude), average number of rooms, etc.
For an individual district like the one depicted above, the explanation returns a ranked list of features with attached values that “explain” the effect they have on the output. A positive effect means the feature “pushes” the output towards higher values. Conversely, a negative effect pushes the output towards a lower value.
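To make this more concrete, here is a minimal sketch of how such an explanation could be produced with the shap package and scikit-learn’s California housing data. The choice of model and of the waterfall plot is illustrative, not necessarily what produced the figure above.

```python
# A minimal sketch of producing a SHAP explanation for one district.
# Assumes the shap and scikit-learn packages; the model choice is illustrative.
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor

data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target  # district features, median house value

model = GradientBoostingRegressor().fit(X, y)

# Compute feature weights; the Explainer picks a suitable algorithm for the model.
explainer = shap.Explainer(model, X)
shap_values = explainer(X.iloc[:100])

# Explain a single district: each feature gets a signed contribution that
# pushes the prediction above or below the model's average output.
shap.plots.waterfall(shap_values[0])

# SHAP weights are additive: base value + sum of weights ≈ the model's prediction.
print(shap_values.base_values[0] + shap_values.values[0].sum(),
      model.predict(X.iloc[[0]])[0])
```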
Explanations come in many shapes and forms and are constantly being developed. This area of research has been so active in recent years that it’s hard to organize the existing techniques into a coherent framework, and it’s entirely possible that new approaches will appear soon.
In this introductory post, I focus on providing a big picture of some of the most common techniques without delving too deep into the data visualization problems. In future posts, I will explore the visualization part more fully.
When we talk about explanations, we need to consider two separate aspects: the explanation type and the scope. Type pertains to the information extracted from the model and its form. Scope pertains to whether the explanations refer to individual decisions, subgroups, or the whole model.
Explanation Types
I like to group explanations into three broad types: feature weights, counterfactuals, and rules and trees. This categorization does not cover all possible methods, but it covers the most common ones.
Feature Weights
Feature weights are numeric scores associated with each input feature the model uses to make a decision. The weights can be positive or negative, and their absolute value represents how much influence a feature has on the output. The image above, which I described before, is an example of this type of explanation.
One way to think about weights is that they push the output in a given direction (positive or negative) with respect to the average output of the model, with a strength proportional to their magnitude. So, when we see a big positive weight and a high output value, we can attribute that value to that weight. This is a somewhat simplistic mental model, but it’s good enough as a first approximation. As I mentioned in previous posts, features can be quantities or categories in a table, but also individual pixels in an image or words in a text, and in these situations the mental model breaks down a little. Another way to think about feature weights is as a kind of “emphasis”: if you want to understand the output, look at those features.
A special case of feature weights is a “saliency map.” Saliency maps are used with images to highlight regions of an image that are discriminative for the model to make decisions. Technically, they are often just the same feature weight mechanisms used for other data types (e.g., you can use general-purpose methods like SHAP or LIME to create saliency maps), but there are also some specialized ones designed specifically for image data.
The image below shows an example, once again, based on the same SHAP method I mentioned above.
Here, the features are individual pixels: the red areas increase the probability that the image is classified as the animal on the left, while the blue areas decrease it.
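For readers who want to experiment, the sketch below computes a simple gradient-based saliency map with PyTorch rather than the SHAP-based method used in the figure above; the pretrained model and preprocessing are illustrative assumptions.

```python
# A minimal sketch of a gradient-based saliency map (a simpler alternative to
# the SHAP-based image explanation shown above). Assumes torchvision >= 0.13.
import torch
from torchvision.models import resnet18, ResNet18_Weights

weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights).eval()
preprocess = weights.transforms()

def saliency_map(image):
    """Return per-pixel importance for the model's top predicted class."""
    x = preprocess(image).unsqueeze(0)   # shape: (1, 3, H, W)
    x.requires_grad_(True)
    scores = model(x)
    top_class = scores.argmax(dim=1).item()
    scores[0, top_class].backward()      # gradient of the top score w.r.t. pixels
    # Aggregate over color channels: large gradients mark pixels the score is sensitive to.
    return x.grad.abs().max(dim=1).values.squeeze(0)
```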
Counterfactuals
Another commonly used type of explanation is the counterfactual explanation. A counterfactual explanation of a single instance is the minimal set of changes one has to apply to that instance in order to change the output of the model. For example, imagine a bank uses a model to decide whether a customer should be given a loan. If the model predicts that the customer should not be given a loan, a counterfactual explanation provides information about what kind of changes would flip the prediction from denied to accepted (there are a lot of ethical issues with this kind of decision that I am not going to discuss here). The image below shows an example (taken from a paper we published a while back).
The green arrows represent the changes necessary to flip the outcome from the current prediction to a different one.
While feature weights give information about which features drive the decisions, counterfactual explanations give information about what minimal set of changes produces a different outcome.
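As a toy illustration (not the method from the paper above), the sketch below performs a greedy search for a counterfactual. It assumes a fitted scikit-learn-style classifier exposing predict, predict_proba, and classes_, and purely numeric features.

```python
# Toy counterfactual search: nudge one numeric feature at a time in the direction
# that most reduces the probability of the original class, until the prediction flips.
import numpy as np

def find_counterfactual(model, x, step=0.1, max_iters=100):
    """Return a minimally changed copy of x that receives a different prediction."""
    x = np.asarray(x, dtype=float)
    original_class = model.predict(x.reshape(1, -1))[0]
    class_idx = list(model.classes_).index(original_class)
    candidate = x.copy()
    for _ in range(max_iters):
        if model.predict(candidate.reshape(1, -1))[0] != original_class:
            return candidate  # prediction flipped: counterfactual found
        best_trial = None
        best_prob = model.predict_proba(candidate.reshape(1, -1))[0, class_idx]
        for i in range(len(candidate)):
            for delta in (-step, step):
                trial = candidate.copy()
                trial[i] += delta
                prob = model.predict_proba(trial.reshape(1, -1))[0, class_idx]
                if prob < best_prob:
                    best_trial, best_prob = trial, prob
        if best_trial is None:
            return None  # no single-feature change helps; give up
        candidate = best_trial
    return None  # budget exhausted without flipping the prediction
```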
Rules and Trees
Rules and trees provide explanations in the form of logical structures such as “IF X is true AND Y is true, THEN the predicted outcome is A.” The biggest advantage of this form of explanation is its highly interpretable format. Their meaning is self-explanatory: when the logical predicates in the antecedent part of the rule (X and Y) are true, then the consequent part is true (with a given probability and level of confidence, two quantities that are often associated with rules). Trees are very similar, except that they organize the predicates into the branches of a tree. The figure below makes the concept easier to grasp.
Each node is a logical predicate, and each path from the root to a leaf node represents a rule: the predicates crossed along the path form the antecedent, and the outcome at the leaf forms the consequent.
These logical structures are better suited to describing the behavior of the entire model, or parts of its input space, than individual instances, although they can also explain a single instance by picking the rule (or rules) that covers it.
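One common way to obtain this kind of explanation is to fit a shallow “surrogate” tree to a black-box model’s predictions and read its root-to-leaf paths as rules. The sketch below shows the idea with scikit-learn; the dataset and models are illustrative choices.

```python
# A minimal sketch of a global surrogate: fit a shallow decision tree to the
# *predictions* of a black-box model and print its branches as rules.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The surrogate is trained on the black box's outputs, not the true labels,
# so its rules approximate the model's logic rather than the data.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, black_box.predict(X))

# Each root-to-leaf path printed below is one IF ... THEN ... rule.
print(export_text(surrogate, feature_names=list(X.columns)))
```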
Explanation Scope (Global, Local, or Subgroups)
One relevant aspect of explanations is the level of granularity at which they operate. Some types of explanations are designed to provide information about individual instances, whereas others are designed to provide information about the whole model. For example, SHAP produces information at the level of individual instances, whereas surrogate rules and trees normally describe subsets of the input space or even the whole model.
This is particularly relevant for our purposes because different visualization approaches are necessary for entities representing individual instances versus whole models. As a general rule, any method focusing on individual instances will need a way to navigate the space of instances to decide which one to inspect. Common interaction and visualization strategies here include (visual) query languages (e.g., dynamic filters), visualization techniques to explore large sets of instances, and ways to aggregate the data to generate higher-level patterns. This is all a bit vague at this stage, but I will clarify it in future posts. For now, take this as a preview of what is possible to do in this space.
For methods that capture information about subsets or even the whole model, the data visualization challenge is typically more about visualizing the structures generated by the methods (e.g., how to visualize rules and decision trees) and navigating these data structures when they become large and complex. This is also a preview of something I will describe in more detail in a future post. For now, it suffices to know what kind of visualization challenges exist in this space.
Analytical Questions
When considering these classes of methods, it’s important to keep in mind that they tend to help answer different types of questions and are often complementary. I insist on knowing what types of questions each method can answer because this aspect is often overlooked, and I think it’s crucial when deciding which method to use. The methods I outlined above answer these main questions.
Feature weights: “What features drive this specific decision? Does this feature make this outcome more or less likely?”
Counterfactual: “How can I change the values of this instance to obtain a different outcome?”
Rules and trees: “What is the overall logic of the model? What features and values lead to specific types of outcomes?”
In reality, these methods can answer more questions. For example, when one aggregates feature weights from individual instances, it is possible to infer something about the whole model or parts of it. With rules and trees, it is also possible to focus on error analysis and look for specific subsets where the error rate is higher than expected.
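For instance, continuing the SHAP sketch from the feature-weights section (and reusing the names defined there), averaging the absolute weights across instances gives a rough global ranking of features.

```python
# Aggregate per-instance feature weights into a global picture
# (reuses X, shap_values from the earlier SHAP sketch).
import numpy as np

mean_abs_weight = np.abs(shap_values.values).mean(axis=0)
for name, weight in sorted(zip(X.columns, mean_abs_weight), key=lambda p: -p[1]):
    print(f"{name:>12s}  {weight:.3f}")

# shap also provides this aggregation directly, e.g. shap.plots.bar(shap_values)
```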
Limitations and Misinterpretations
I want to conclude this post with a warning. Model explanations are models of models, and as such, they are not perfect! There are many ways in which model explanations can lead people to make incorrect inferences; therefore, it is very important to 1) understand what one can and cannot infer from the output of these methods and 2) remain vigilant and always verify the correctness of the information extracted.
A group of researchers at Microsoft Research ran a very relevant study in 2020 examining how data scientists use some of these methods, and their findings are quite alarming.
Kaur, Harmanpreet, et al. “Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning.” Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020.
Even though the participants were experienced with machine learning, the researchers found that these methods are often misused. This means that when using these methods, it is important to be extra careful in deriving correct interpretations of their results, especially when visualization is involved.
Excellent writing, Enrico, thank you!
One comment: trees and rules look to be interpretable by design, but they are NOT understandable in many real-world applications. The reasons are the depth of the trees, the number of distinct variables involved, and the mismatch between high-level user concepts and the low-level features used in explanations. We published a couple of papers on this matter recently:
- Re-interpreting Rules Interpretability, https://doi.org/10.1007/s41060-023-00398-5
- Visual Analytics for Human-Centered Machine Learning, https://doi.org/10.1109/MCG.2021.3130314
Let's discuss at IEEE VIS this year!
Best wishes,
Gennady