Implied Causality in Line Charts
Exploring ways in which line charts may communicate causation when there is none!
Have you ever noticed? When reading a chart, it's easy to jump to conclusions. Take the first line chart from the left below. Something happened after the event, and from that point on, the curve went down. The event caused the curve to bend down!
When we look at graphs, we often produce (implicitly or explicitly) a causal inference of the type: A caused B.
It's surprising how little we talk about this aspect of data visualization. In all the years I have spent as a data visualization researcher and an active disseminator of knowledge in this field, I have seen surprisingly little written about the role of visualization in causal thinking (a notable exception is this paper on "the illusion of causality in visualization").
Think about it. Isn't it essential? When we look at graphs, understanding whether it is reasonable to think that something is the leading cause of a given phenomenon or outcome matters a great deal. Just think for a moment how important this has been (and still is) during the recent pandemic we experienced. So many relevant questions of a causal nature!
Many people interpret graphs as demonstrating that a causal link exists between a given factor or event and an outcome of interest when the phenomenon’s reality is way more complicated.
So, how can we make it easier for people to think about implied causality in visualization? More precisely, how do we equip people with simple skills to prevent them from drawing far-fetched conclusions when other equally valid explanations are possible?
One small step is to start looking at different chart types and to reason about (1) how these charts can be used (or are currently used) to imply causality and (2) how faulty reasoning may stem from them.
Here I’ll start this process by focusing on line charts, maybe the type of graph that most lends itself to this type of analysis.
Types of implied causation
I have identified three main cases for line charts: event, factor, and covariation.
In depicting these cases, I will use fictional examples of potential relationships between the flu and things that may or may not affect the flu. You can imagine the same logic applied to many other cases.
Event/Intervention
This is when the main trend of a line chart changes after an event (typically a direct intervention) takes place: after the event, the trend line changes course. But the apparent effect may be an illusion in many ways. Imagine a line chart showing body temperature improving over a few days during the flu, and trying to attribute the improvement to a specific drug you are taking.
The drug may make you sleepy, and more sleep may be the actual cause of the improvement. Or maybe, besides taking the medication, you also drink ginger tea, and that is what makes you improve. Or, even more interesting, your improvement pattern would be the same with or without the drug, so the effect is just an illusion; you are only experiencing the normal evolution of the disease.
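To make this concrete, here is a tiny, purely illustrative simulation (invented numbers, sketched in Python with matplotlib, not part of any real study): the fever subsides on its own right after the day the drug happens to be taken, so the resulting line chart "shows" an effect that is not there.

```python
# Purely illustrative sketch (invented numbers): the fever peaks around day 4,
# the drug happens to be taken on day 4, and the curve bends down afterwards
# exactly as it would have without any drug at all.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
days = np.arange(0, 10)

# Natural course of the illness: temperature rises, peaks around day 4, subsides.
natural_course = 36.6 + 2.5 * np.exp(-((days - 4) ** 2) / 8)
temperature = natural_course + rng.normal(0, 0.1, len(days))

plt.plot(days, temperature, marker="o")
plt.axvline(x=4, linestyle="--", color="gray", label="drug taken (day 4)")
plt.xlabel("day")
plt.ylabel("body temperature (°C)")
plt.legend()
plt.show()
```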
Factor
This is when multiple timelines are separated into two (or more) groups that differ according to a factor. The trends are consistent within each group and different between the groups. It's easy to conclude that the thing that characterizes the two groups is the cause behind the different temporal trends one observes in the chart. Still, the two groups may differ according to other characteristics that are the actual cause behind the difference.
Imagine analyzing data about two groups of people who experience flu and have different temporal patterns. One group uses aspirin, and the other uses ginger. The group using ginger improves much faster than the group using aspirin. Ginger is so much better! But it turns out that the two groups differ in many ways other than their treatment of choice. The group taking aspirin is much older and does not have a healthy lifestyle, while the group using ginger is mostly hipsters who are much younger, like yoga, drink green smoothies, and have an overall healthier lifestyle.
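Again, a small illustrative sketch with made-up numbers: in the simulation below, recovery speed depends only on age, yet plotting the average trajectory per treatment group makes ginger look like the better cure simply because the ginger group happens to be much younger.

```python
# Purely illustrative sketch (invented numbers): recovery speed depends only on
# age, not on the treatment, yet a per-treatment line chart makes ginger look
# like the better cure because the ginger group happens to be much younger.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
days = np.arange(0, 8)

def recovery_curve(age):
    # Older patients recover more slowly; the treatment plays no role at all.
    rate = 0.6 - 0.004 * age
    return 39.0 - (39.0 - 36.6) * (1 - np.exp(-rate * days))

aspirin_ages = rng.normal(65, 5, 50)  # the aspirin group happens to be older
ginger_ages = rng.normal(28, 5, 50)   # the ginger group happens to be younger

aspirin_mean = np.mean([recovery_curve(a) for a in aspirin_ages], axis=0)
ginger_mean = np.mean([recovery_curve(a) for a in ginger_ages], axis=0)

plt.plot(days, aspirin_mean, label="aspirin group")
plt.plot(days, ginger_mean, label="ginger group")
plt.xlabel("day")
plt.ylabel("mean body temperature (°C)")
plt.legend()
plt.show()
```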
Covariation
This is when two line charts display a strong temporal correlation, direct or inverse. A classic example is when more of one quantity corresponds to more (or less) of another. Often these effects are depicted in a double-axis line chart, which is itself problematic.
Now, imagine a plot that shows the temporal evolution of flu cases and the number of bike rides in your city. As bike rides go up, flu cases go down, and vice-versa. Bike rides make you so healthy! Or it may simply be that people take more rides when the weather is nicer, and when it is nicer, there are fewer flu cases.
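Here is a minimal sketch of how such a chart can arise from a common cause (again, the numbers are invented): the weather drives both series, and the double-axis chart displays a striking inverse correlation even though neither measure causes the other.

```python
# Purely illustrative sketch (invented numbers): the weather is a common cause
# that pushes bike rides up and flu cases down, producing a striking inverse
# correlation between two series that have no causal link to each other.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
months = np.arange(24)

# "Niceness" of the weather oscillates over the year: the hidden common cause.
weather = np.sin(2 * np.pi * months / 12)

bike_rides = 1000 + 600 * weather + rng.normal(0, 50, len(months))
flu_cases = 300 - 200 * weather + rng.normal(0, 20, len(months))

fig, ax1 = plt.subplots()
ax1.plot(months, bike_rides, color="tab:blue")
ax1.set_xlabel("month")
ax1.set_ylabel("bike rides", color="tab:blue")

ax2 = ax1.twinx()  # the (problematic) double-axis chart mentioned above
ax2.plot(months, flu_cases, color="tab:red")
ax2.set_ylabel("flu cases", color="tab:red")

plt.show()
```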
How about the absence of an effect?
In the examples above, I have assumed graphs are only used as a rhetorical device to demonstrate that a given causal effect exists. But graphs can be equally useful to persuade someone that a given effect does not exist. Imagine the three graphs above, from left to right, where (a) an event is supposed to have an effect, but there is no discernible change in the line chart, or the change is in the direction opposite to what is expected; (b) the two groups of entities are supposed to have different temporal trajectories, but they look the same or show no discernible pattern; (c) the two measures are supposed to be correlated, but the two line charts do not show any temporal correlation. Reasoning about cases where such graphs may be misleading seems a bit harder, but still possible. Imagine using the event graph type to argue that an effect does not exist because the line is flat after the intervention. It may still be possible that the intervention is, in principle, effective but is nullified by something else that happens at the same time. That said, I find the absence of an effect, in general, more persuasive than the presence of an effect.
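A tiny sketch of this last scenario (hypothetical numbers, for illustration only): the intervention has a real, beneficial effect, but something else happening at the same time cancels it out, so the observed line stays flat after the event.

```python
# Purely illustrative sketch (invented numbers): the intervention has a real,
# beneficial effect, but a confounder appearing at the same time cancels it out,
# so the observed line stays flat and the chart "shows" no effect.
import numpy as np
import matplotlib.pyplot as plt

days = np.arange(0, 20)
baseline = np.full(len(days), 100.0)
intervention_effect = np.where(days >= 10, -15.0, 0.0)  # real, beneficial effect
confounder_effect = np.where(days >= 10, 15.0, 0.0)     # something else at day 10

observed = baseline + intervention_effect + confounder_effect

plt.plot(days, observed, marker="o")
plt.axvline(x=10, linestyle="--", color="gray", label="intervention (day 10)")
plt.xlabel("day")
plt.ylabel("outcome")
plt.legend()
plt.show()
```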
Why does this matter?
I am convinced that it matters a LOT. Helping people (and myself) reason better with data is crucial for our society. In the last 10-15 years, our access to data and tools to work with data has increased enormously. What has not increased (in my estimate) is our ability to reason with data effectively. Without that knowledge, it's so easy to fool ourselves and, even more, to get fooled by others. I want people to think for themselves and feel they have the tools to do that on their own. I am uncomfortable with the idea that we must rely exclusively on others to have an informed opinion. Everyone equipped with computer and data skills is a valuable resource for our society. While I think experts and expertise are important, I believe that individual citizens equipped with their skills and knowledge can contribute enormously to the discourse based on evidence. It's up to us to bring the right tools and skills to build a society based more on open discussion and critical thinking.
Beyond toy examples (what can we do?)
I presented a tiny exercise here, but I am convinced it could be done at a much larger scale and deeper level. Understanding how visualizations may lead people to (wrongly) infer causality and equipping them with the necessary skills to evaluate their reasoning seems very relevant to me.
I imagine at least four relevant activities.
Taxonomy of implied causality by chart types. Can we create a catalog of visualization types and common ways in which they may imply causality? That would basically be just an extension of what I have done here. The same exercise could be done with many other chart types (bar charts, scatter plots, choropleth/symbol maps, etc.).
Analysis of existing charts (in the wild). What kind of charts do people use when they try to convey causality? What kind of language do authors use that may imply causality when they describe what one can see in a graph?
Study people’s ability to reason with graphs. What do we know about people’s ability to reason causally with graphs? Are they able to spot problems, if present?
Ideate and study educational and technological interventions. Can we improve people’s ability to reason causally with graphs? Are there technologies we can develop to train people or help people reason more effectively?
Each of these activities informs the next one. The taxonomy helps us get a sense of what the possibilities are. The analysis in the wild grounds the taxonomy in real-world usage, allowing us to prioritize interventions. The study of how people reason with these graphs gives us a better sense of how big the problem actually is. The educational and technological interventions lay the ground for solutions to help people reason more effectively with graphs.
Conclusion
I tried to reason about how graphs can imply causality, focusing specifically on line charts as an example. I am convinced the same analysis could be done with other chart types and at a much deeper level. This is just a first step. I am curious to hear what you think. Did I miss anything relevant? What else could be done in this space that I did not mention? Are there works I should be aware of?
—
Thanks for reading!
Agreed, this is an important topic and much more attention is needed! Our lab has a closely related paper on causal assumptions and ways we can better contextualize visualizations using counterfactuals. I think it's important to keep in mind the types of conclusions we expect people to make when we design visualizations. https://vaclab.unc.edu/publication/tvcg_2022_kaul/tvcg_2022_kaul.pdf