Inferential Steps in Data Visualization
How do we build plausible models of the world in our heads from charts?
Once you start seeing something through a new lens, a new world of wonder opens up before you. This happened to me a while back with regard to how we talk about, study, and present data visualization.
Visualization research is mainly centered on effectiveness and accuracy. How quickly can you perform a task? How accurately can you extract a value from a chart? But this is not really what matters. At least not exclusively. If you think about it, reading a value more accurately from one chart than from another is not that important. Even reading something faster is not that relevant (except that speed can serve as a proxy for mental effort, which is relevant). What matters is whether a given visualization helps you think effectively about the reality it represents. It’s more about the cognitive impact of logical and spatial arrangements than about reading quantities faster or better.
What strikes me as a significant gap is that we do not have a good mental model to help us think about how we use visualization to draw inferences. Drawing inferences means reaching conclusions about the real world from the analysis of a description of it, which in our case is data and its visual representation. Drawing inferences is what we do (mostly implicitly) all the time with visualization, yet we do not seem to talk much about it. At best, we talk about whether a visualization can be comprehended or interpreted correctly, but the ultimate test is whether what one draws out of a visualization is a correct, or at least plausible, representation of reality.
Think about it. When you observe a visualization, you try to infer something about the reality it represents. We are not interested in data per se but in understanding the reality the data represents. That’s the only thing that matters!
One useful exercise I’d like to do in the future is to categorize charts according to what kinds of inferences we typically make from them and then define the ways those inferences can go wrong. Doing this exhaustively seems almost impossible, but maybe there is a way to create an initial set of categories. In a way, this is related to one of my previous posts on what I called “implied causality,” the causal implications we often draw from charts. As I write this, I realize that implied causality is probably only a subset of the inferences we draw from data visualizations and that mapping more of them could be useful. Whenever we look at a chart, we build in our heads a model of the world that is more or less plausible according to 1) how the chart has been designed and 2) our ability to critically absorb and judge the information the chart contains. Having a better sense of what kinds of inferences exist could help us think more critically about what we do when we read charts and also predict what our readers may do when they read ours.
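To make this concrete, here is a minimal sketch, in Python, of the shape such a categorization could take. Every chart type, inference name, and failure mode below is a hypothetical placeholder of my own invention, not an established taxonomy; the point is only to illustrate the kind of mapping the exercise would produce.

```python
# A purely illustrative sketch of a chart-to-inference taxonomy.
# All chart types, inference names, and failure modes below are
# hypothetical placeholders, not an established classification.

CHART_INFERENCES = {
    "line chart (time series)": {
        "typical_inferences": ["trend", "change point", "forecast"],
        "failure_modes": [
            "extrapolating beyond the observed range",
            "reading noise as a meaningful pattern",
        ],
    },
    "scatter plot": {
        "typical_inferences": ["correlation", "clusters", "outliers"],
        "failure_modes": [
            "inferring causation from correlation",
            "ignoring confounding variables",
        ],
    },
    "bar chart (grouped)": {
        "typical_inferences": ["ranking", "magnitude comparison"],
        "failure_modes": [
            "treating a sample difference as a population fact",
        ],
    },
}

# Walk the taxonomy: for each chart type, list what readers tend to
# infer from it and where those inferences can break down.
for chart, entry in CHART_INFERENCES.items():
    print(chart)
    print(f"  typical inferences: {', '.join(entry['typical_inferences'])}")
    print(f"  failure modes: {', '.join(entry['failure_modes'])}")
```

Even a toy structure like this suggests the two failure points mentioned above: the design side (which inferences a chart invites) and the reader side (which failure modes a reader falls into).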
We often draw inexact inferences from the charts we produce or are shown, and we do not understand where things break down. While visualization research has a long history of documenting how graphical representations can mislead readers (and ourselves!), we do not have a good sense of the larger set of reasons why a gap between our inferences and reality may exist.
I came to think about this in even more depth recently when I read an intriguing paper titled “Misleading Beyond Visual Tricks: How People Actually Lie with Charts.” The paper develops this very same idea: that visualization can mislead in ways that go well beyond the graphical design elements it uses. I strongly suggest you take a look at it! The authors studied a large set of COVID-19 charts taken from Twitter/X and categorized the reasoning errors they found. While I am confused by some of the authors' choices (which I hope to describe in a later post), I am excited by the fundamentally new notion they present: that people can draw wrong conclusions from charts that are perfectly fine from the graphical representation standpoint (e.g., no truncated or flipped axes).
In future writings, I hope to develop a more precise understanding of this problem. I don’t have many answers yet, but I hope to generate ideas that help both me and you think about it more systematically.