What Matters, Really, is Effective Data Thinking
Reflections on my frustration teaching data visualization without focusing on effective data-driven thinking
There’s a casual tweet I wrote over the weekend that gained way more traction than I expected.
The more proximal reason for the tweet is the frustration I felt realizing how little it matters to teach visualization to students who have not yet acquired the skills of data thinking, that is, thinking properly and effectively with data. As part of my data visualization course I assign a group project that requires students to do quite a bit of preliminary work before starting to visualize any data. The project requires students to formulate a data analysis and presentation goal and to generate (1) a series of “data questions”; (2) corresponding data transformations and models; and (3) a set of initial sketches of the associated visual representations. This is done on purpose. One of my mottos in the course is that data visualization is not only about visual representation, but also about asking good questions and generating appropriate information from the data. I tell my students there is no way to do proper data visualization without acquiring these skills because all of them are needed to generate an effective data visualization.
And inevitably when I get to this stage of the course I start seeing a lot of problems. Students have a very hard time coming up with a coherent set of questions. They seem to solve the problem mechanically. There’s no holistic sense of what makes a question interesting and how to tie questions together in a coherent narrative. For a while I thought it was just a matter of explaining how a good sequence of questions could be designed, but the problems are way more basic than that. Students don’t know how to formulate a question properly. The questions are often vague and sometimes even meaningless or abstruse. The data transformations and aggregations do not match the questions they have, and often they reveal a poor understanding of the actual meaning and organization of the original data.
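As a hypothetical illustration of what a mismatch between a question and an aggregation can look like (the dataset, field names, and questions below are invented for this sketch and are not taken from any student project), consider a table with one row per bike trip. Two questions that sound almost identical call for different aggregations and give different answers:

```python
from collections import defaultdict

# Hypothetical data: one row per trip. All values are invented for illustration.
trips = [
    {"city": "A", "rider": "r1", "minutes": 10},
    {"city": "A", "rider": "r1", "minutes": 30},
    {"city": "A", "rider": "r2", "minutes": 20},
    {"city": "B", "rider": "r3", "minutes": 60},
]


def mean_minutes_per_trip(rows):
    """Question: how long does a typical trip last in each city?

    The unit of analysis is the trip, so we average over rows directly.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row["city"]] += row["minutes"]
        counts[row["city"]] += 1
    return {city: totals[city] / counts[city] for city in totals}


def mean_minutes_per_rider(rows):
    """Question: how much time does a typical rider spend riding in each city?

    The unit of analysis is the rider, so we first sum each rider's trips
    and only then average over riders.
    """
    per_rider = defaultdict(float)
    for row in rows:
        per_rider[(row["city"], row["rider"])] += row["minutes"]

    totals, counts = defaultdict(float), defaultdict(int)
    for (city, _rider), minutes in per_rider.items():
        totals[city] += minutes
        counts[city] += 1
    return {city: totals[city] / counts[city] for city in totals}


print(mean_minutes_per_trip(trips))
print(mean_minutes_per_rider(trips))
```

In this toy example the two aggregations disagree for city A because one rider took two trips. Knowing what a row in the original data represents is what tells you which aggregation actually answers the question you asked.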
Now … I am not writing this to shame or blame my students! Not at all! If anything, I am to blame if I am not providing the right content and learning methods for them. My reflection is about something more troubling: if a very large proportion of highly educated students who were admitted to a top-tier American university have so many basic problems in thinking properly with data, how can we expect the public at large to be able to think properly when they are presented with news and information based on data? It’s almost comically tragic to think about how excited we have been for a long time about concepts like “data journalism”. Almost as if we will all magically become smarter because articles in newspapers contain more “data”.
I must confess I am also troubled by these observations because I wonder whether my own teaching could even be detrimental. Is it possible to teach data visualization to students who don’t know how to think properly with data? Are there any unexpected consequences associated with that? And should I actually spend more time teaching data thinking before I move on to teaching actual data visualization? Or maybe data visualization is the best vehicle to teach data thinking, if only I knew how to frame it properly?
I do not have a definite answer to these questions, but I can’t help but notice that we, as a community, may be at fault here. With our obsession with whether a pie chart is “better” than a bar chart, is it possible that we have lost sight of what really matters? Let me be blunt: it does not really matter how pretty a visualization is and whether it uses the latest technology or the fanciest graph. These are nice things to have, for sure. But at the end of the day, what really matters is thinking. Appropriate and effective thinking.
Now, it is certainly true that the way information is represented plays an important role in effective thinking. I do not deny that. But in reality what really matters is not how you visualize something, but what questions you ask and what information you decide to visualize in the first place.
I wish I had a more systematic way to talk about this problem (and in all fairness there are elements of my visualization course that try to address these gaps already) but many of the thoughts I have are still very much in flux. I plan to write more about this topic in the future because I really feel we need to make progress in this direction.
From my side there are a couple of ideas I am working on that I’ll try to communicate in more detail in future posts. The first one is about modules of my course I have been designing to address some of the issues I mentioned above. The modules clearly have not solved the problem yet, but I suspect the gap has more to do with providing better hands-on training than with the actual concepts I cover in the modules. The second one is an idea I have about developing a new course on Data Thinking. I have been mulling it over for quite a while and I suspect I might be able to make real progress sometime next year.
That’s all for now. Thanks for reading and let me know what you think!
"Data Thinking" - I love that idea for a course. I've also been struggling with the same gap in knowledge I see in students in a visualization course. Like what you're trying, I'm adding in a module that has no visualization, just asking questions of data first. But there's only so much time of a course that is supposed to be on visualization and communication that I feel I can realistically devote to "data thinking," especially given where the course is supposed to fit within student's overall program/curriculum. Hoping to work in some more of it via feedback on their projects during the course. Eager to see what you come up with, and how you sell it to others/convince others of the need for such a course. Whenever we want to add something, we have to let go of something else in the program.
Hi Enrico, I agree with you.
A few months ago I wrote this article: https://medium.com/@ciemme.25/complex-visualizations-and-visualized-complexity-how-can-we-interpret-the-world-around-us-122a76de807a
In my opinion there are two points connected with data representation. On one side there is “how we think”: what method we use to create knowledge through hypotheses and to confirm them through data.
On the other side, there is the “what we look at”: scientific thinking may not be enough if the reference frame is a linear and reductionist worldview. The effort required is therefore double: an awareness of which tools we use to think, and of the ones used by those who present us with a thesis; and the comprehension of the basic characteristics of complex thinking, necessary to interpret a complex world.
We need technical skills for data visualization, but the framework should be an understanding of the complexity paradigm, so that we first think of many “right questions” and then place our results in a wider systemic view.