I was listening to Cole Nussbaumer Knaflic interviewing Steven Franconeri a few days ago (excellent episode!) and I was struck by something they mentioned towards the end regarding the illusion of being neutral when visualizing data. Many seem reluctant to editorialize their data visualizations for fear of being perceived as biased. Cole mentioned strategies she advocates that bring more clarity and focus but necessarily steer the reader’s attention more directly (highlighting, annotations, simplification, etc.).
This is an interesting problem. Should we provide as much information as possible, open to the widest possible interpretations, or guide the readers in absorbing the information we want them to absorb?
I am not sure I have a definitive answer to this question, but I do think it is important to realize that most of the steering/biasing comes from steps that happen well before one decides how to visualize something. While egregious visual tricks exist to fool people with data visualization (see Alberto Cairo’s book on the topic), the real and more subtle issues arise in the preliminary stages of the data visualization process. Let’s try to analyze them together.
1) Developing a data analysis goal. Most data endeavors start with a goal: to understand something with data. Some preliminary data analysis is always carried out in pursuit of that goal. While in an ideal world this phase would include the highest level of skepticism and the widest possible context, the truth of the matter is that in many cases (the majority?) we start the analysis with a preconceived notion of what we would like to find. In turn, this influences what type of data and metrics we will look into, what type of analyses we will perform, etc. Journalists want to develop a specific story. Scientists want to demonstrate a specific phenomenon or theory. Business analysts want to advocate for a specific course of action. And so on. The way we decide to pursue the analysis is already influenced by what we would like to achieve with the data.
2) Data generation and collection. For any given data analysis goal there is a myriad of potential data sets one can use, and different data sets can paint very different pictures of the same phenomenon. When choosing or producing a data set one can decide which elements to focus on, at what level of granularity, over which time spans, and with which metrics.
3) Data transformations and statistics. Another powerful degree of freedom is what kind of transformations and metrics one decides to implement with the data they have collected. This is where one can decide to include or exclude certain elements; aggregate objects and metrics at different levels of granularity; compute certain specific metrics or statistics; focus on specific geographical regions or time spans; decide what to compare to what; etc. The combinations are endless.
4) Data visualization. The final step is where one decides how to visually communicate the information gathered through the previous steps. There is still some leeway here in choosing which graphical properties to use: bars or lines, color or size, different scaling factors, etc.
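To make step 3 a little more concrete, here is a minimal sketch of how the choice of time span alone can flip the story a data set tells. The numbers are invented purely for illustration, and the `percent_change` helper is hypothetical, not taken from any of the works mentioned in this post.

```python
# Hypothetical yearly values, invented purely for illustration.
values = {2015: 100, 2016: 120, 2017: 140, 2018: 130, 2019: 115, 2020: 110}


def percent_change(series, start, end):
    """Percent change between two years of the same series."""
    return (series[end] - series[start]) / series[start] * 100


# Same data, two defensible time spans, two opposite stories.
full_span = percent_change(values, 2015, 2020)    # growth over the whole period
recent_span = percent_change(values, 2017, 2020)  # decline since the 2017 peak

print(f"2015-2020: {full_span:+.1f}%")
print(f"2017-2020: {recent_span:+.1f}%")
```

Both numbers are "true" to the data, yet one reads as growth and the other as decline, and that choice is made before any chart is drawn.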
—
When observed from this point of view, it seems to me that the problem of guidance/steering in visualization is less pertinent than it may seem at first sight. There is so much leeway in steps 1 to 3 that worrying about what one decides to do in step 4 seems marginal in comparison. If guidance can help with interpretation and at the same time has little detrimental effect compared to other steps of the process, maybe we should be more open to the idea of direct guidance than we are used to.
We keep insisting on this idea of depicting data without giving enough weight to the idea of communicating messages. The two things go hand in hand. Of course, it would simply be silly to try to communicate a message that is not grounded in the data. But what is often overlooked is that it’s equally important to have a clear idea of what kind of message one wants to communicate. The reason is that by making the message explicit at the design stage of visualization, there are many choices one can make to facilitate the communication of that message (Cole has some great ones in her book). Without a specific communicative intent, we are left with this abstract idea of using the “right” visualization for the data we have, which is not particularly effective.
Anecdotally, I ran into this problem when we worked with climate scientists who wanted to learn more about how to communicate their data. One of the major findings of our work (see the paper here) was that scientists do not really distinguish between the kinds of visualizations needed for analysis and those needed for communication. Because of that, they do not know how to do the editorial work of communicating data effectively, even once they know what they want to communicate.
There is an interesting scientific experiment that supports what I am writing here. Younghoon Kim and Jeffrey Heer studied the effectiveness of different representations of the same data (see the paper here) using different tasks (and different data distributions) and found that different encodings work best for different tasks. In other words, you can’t really design visualizations considering only the data; you also have to consider what readers are expected to do with your visualization.
In conclusion, what I am suggesting here is not that we should give free rein to visualization practices that are highly questionable and that give too much power to designers. What I am suggesting is that visualization design can be more effective if we accept the idea of guiding the reader and realize that most of the biasing problem originates from the steps that take place before any piece of information is visualized.
As for how to deal with the amount of leeway in the steps that precede visualization, there is a lot more to think and write about, much more than I can fit in this post. In the meantime, I think it’s important to talk more about the mindset we use while designing visualizations and to be more open to the idea of designing visualizations that effectively convey a message and not only “data”. Similarly, we have to talk more about the value of guiding the reader and how to do it in a way that makes visualization more effective while avoiding detrimental biasing.