"Data Thinking" - I love that idea for a course. I've also been struggling with the same gap in knowledge I see in students in a visualization course. Like what you're trying, I'm adding in a module that has no visualization, just asking questions of data first. But there's only so much time of a course that is supposed to be on visualization and communication that I feel I can realistically devote to "data thinking," especially given where the course is supposed to fit within student's overall program/curriculum. Hoping to work in some more of it via feedback on their projects during the course. Eager to see what you come up with, and how you sell it to others/convince others of the need for such a course. Whenever we want to add something, we have to let go of something else in the program.
Thanks for sharing your thoughts! I am toying with the idea of having a course that works as a prerequisite for data visualization. Otherwise split data visualization into two courses and make sure basic data thinking is acquired in Data Visualization I.
In my opinion the points, connected with data representation, are two: on one side there is “how we think”, what method we use to create knowledge through hypotheses and confirm them through data.
On the other side, there is the “what we look at”: scientific thinking may not be enough if the reference frame is a linear and reductionist worldview. The effort required is therefore double: an awareness of which tools we use to think, and the ones used by those who present us a thesis; and the comprehension of the basic characteristics of complex thinking, necessary to interpret a complex world.
We need technical skills for data visualization, but the framework should be the understanding of the complexity paradigm, to think before many “right questions” and to place then our results in a wider systemic view.
None is to blame. The idea I get reading this and reflecting on my experience is that people see data visualization as a mere information design problem. Also, there is the other extreme: people doing dataviz without any background whatsoever in information design. Couldn't it be nice to meet in the middle?
I am curious about the first steps in the process you outlined: 1) formulate a data analysis and presentation goal and 2) generate a series of “data questions." How do they arrive at that goal? Are they thinking of the audience and their information needs? This may be part of step 1, but I think it could also be an initial step in itself, maybe Step 0: Identify the audience, their information needs (what questions do they want/need answered), the problem/challenge they are facing, their data proficiency, etc. I think that can help clarify the goal(s) and question(s) and subsequent visualizations. From this perspective, I would also say that much of the time the intended audience's question(s) ARE the data questions. However, the audience may not know how to formulate these well and the visualization designer or analyst will need to prompt them for clarity and precision and revision. This is an iterative, time consuming process.
I agree that people don't intuitively know how to formulate good data questions, and this is not something that is taught (well or at all) in K-12 education. Students' and working professionals' initial data questions are often vague, not measurable, and/or not able to be answered. I think people need to be taught the characteristics of good data questions (e.g., well-defined, answerable, actionable, unbiased, relevant, etc.), with examples, and given opportunities to critique and improve sample data questions, as well as their own. They should also be taught how to dialogue with the audience/stakeholder so they can refine a bad data question into a good one.
I have found in my own teaching, as you have, that a lot of the time there is a lack of alignment between the audience needs/stated purpose, questions, and visualization/analysis. I am a stickler for alignment. I think it's useful to give students samples (descriptions of audience, statements of purpose, questions, data visualizations), some that are aligned, and some that lack alignment, along with a rubric or set of criteria, and ask them to critique the samples first (and justify their evaluations), and then use the same rubric/criteria to evaluate their own work. This can help them internalize what this looks like and apply it their work.
These are just a few of my ideas and some things that I have done in my own work. Not sure if there's anything new or different here, but I thought I'd share them.
You are raising a number of excellent points Susan. Re: understanding the audience and Step 0. I think it can vary a lot. Many data visualizations, whether we like it or not, are created to either answer questions the analyst has or "stories" (not a big fan of the term) to convey to an audience (I have data journalism mostly in mind). In these cases the questions are mostly generated by the author with minimal engagement regarding what questions the readers will want to answer. This does not mean that what you are suggesting here is not relevant for many other contexts! In fact, the information visualization research literature has a lot on "task characterization," which is fundamentally about what questions the end users want to answer.
So yes, there is a big need to understand what questions people have. However, as you point out, most people don't know what questions they have and even professionals have a lot of implicit knowledge, so it takes some effort and skills to really uncover what the driving questions are. And there is a real need for translating these high-level domain questions into something that can be answered with the data (Tamara Munzner at UBC has done some great work in this space (especially her Nested Model of Visualization Design and Evaluation: https://www.cs.ubc.ca/labs/imager/tr/2009/NestedModel/).
I fully agree with what you wrote above and it matches with my experience. The challenge I see is what kind of pedagogical approaches work to develop these skills. I have tried a few things over the years but felt that I don't understand the problem well enough in order to design a solution. Hence, my other post on a research agenda for understanding the role of questions (https://filwd.substack.com/p/the-data-questions-data-answers-model). Given your background in education, I'd be happy to hear if you have ideas in this space.
I think the lack of data thinking is really a lack of (or limited) data literacy. Data literacy is not just a technical skill; it’s also a way of thinking about and with data. It’s key to successfully working successfully with data (including visualizing it).
Thanks for your comment Susan! I completely agree with you. I especially agree that what is lacking the most is not technical skills but way of thinking about data. I am trying to contribute to this gap by creating new courses in this space. If you are curious take a look at what working on here: https://filwd.substack.com/p/rhetvis-1-introduction-to-rhetorical. In any case, what is your experience with Data Literacy? Do you have any recommendations or ideas on how to improve skills in this space and how to make these skills more common?
I can't say I'm too surprised to read of the challenges students have thinking first about data in an analytical way - that's a graduate-level skill that is practiced usually when students start generating their own data and exploring it (at least in a research context). We can quickly take for granted how difficult this non-technical part of the skillset is.
What sounds like is missing is the more 'exploratory' aspect of the analytical approach, where one 'probes' a dataset using a combination of scatterplots, bar charts and frequency histograms.
There's an analogy to draw with academic science papers here, often scientists (especially of an older generation) don't have much training in graphic design, but are yet able to craft compelling and convincing stories, building narratives from a limited number of graph types.
Maybe there's an exercise here .. before trying to design a perfect visualization, the student can show that they've exhaustively asked the right questions using only scatterplots and histograms, before any design takes place?
The process I describe does include EDA if you wish. In fact I see EDA as just a sequence of questions you formulate and try to refine as you progress with your analysis. The problem is: if you don't even know how to formulate those questions there is no sense if trying to do the rest. The challenge is: how do you actually teach that kind of skill? My sense from struggling with this problem for a while is that this is one of those things where an apprenticeship model is crucial. I suspect that what students need is to observe a skilled person do the work a number of times and then start "copying / simulating" the same process. Maybe this is similar to what you describe in your comment?
"Data Thinking" - I love that idea for a course. I've also been struggling with the same gap in knowledge I see in students in a visualization course. Like what you're trying, I'm adding in a module that has no visualization, just asking questions of data first. But there's only so much time of a course that is supposed to be on visualization and communication that I feel I can realistically devote to "data thinking," especially given where the course is supposed to fit within student's overall program/curriculum. Hoping to work in some more of it via feedback on their projects during the course. Eager to see what you come up with, and how you sell it to others/convince others of the need for such a course. Whenever we want to add something, we have to let go of something else in the program.
Thanks for sharing your thoughts! I am toying with the idea of a course that works as a prerequisite for data visualization. Alternatively, I could split data visualization into two courses and make sure basic data thinking is acquired in Data Visualization I.
Hi Enrico, I agree with you.
A few months ago I wrote this article: https://medium.com/@ciemme.25/complex-visualizations-and-visualized-complexity-how-can-we-interpret-the-world-around-us-122a76de807a
In my opinion there are two issues connected with data representation. On one side there is “how we think”: what method we use to create knowledge through hypotheses and to confirm them with data.
On the other side there is “what we look at”: scientific thinking may not be enough if the frame of reference is a linear, reductionist worldview. The effort required is therefore double: awareness of the tools we use to think, and of those used by whoever presents us with a thesis; and comprehension of the basic characteristics of complex thinking, which is necessary to interpret a complex world.
We need technical skills for data visualization, but the framework should be an understanding of the complexity paradigm, so that we first ask the many “right questions” and then place our results in a wider, systemic view.
No one is to blame. The idea I get reading this, and reflecting on my own experience, is that people see data visualization as a mere information design problem. There is also the other extreme: people doing dataviz without any background whatsoever in information design. Wouldn't it be nice to meet in the middle?
I am curious about the first steps in the process you outlined: 1) formulate a data analysis and presentation goal and 2) generate a series of “data questions.” How do they arrive at that goal? Are they thinking of the audience and their information needs? This may be part of step 1, but I think it could also be an initial step in itself, maybe Step 0: Identify the audience, their information needs (what questions do they want/need answered), the problem/challenge they are facing, their data proficiency, etc. I think that can help clarify the goal(s) and question(s) and subsequent visualizations. From this perspective, I would also say that much of the time the intended audience's question(s) ARE the data questions. However, the audience may not know how to formulate these well, and the visualization designer or analyst will need to prompt them for clarity, precision, and revision. This is an iterative, time-consuming process.
I agree that people don't intuitively know how to formulate good data questions, and this is not something that is taught (well or at all) in K-12 education. Students' and working professionals' initial data questions are often vague, not measurable, and/or not able to be answered. I think people need to be taught the characteristics of good data questions (e.g., well-defined, answerable, actionable, unbiased, relevant, etc.), with examples, and given opportunities to critique and improve sample data questions, as well as their own. They should also be taught how to dialogue with the audience/stakeholder so they can refine a bad data question into a good one.
I have found in my own teaching, as you have, that a lot of the time there is a lack of alignment between the audience needs/stated purpose, questions, and visualization/analysis. I am a stickler for alignment. I think it's useful to give students samples (descriptions of audience, statements of purpose, questions, data visualizations), some that are aligned and some that lack alignment, along with a rubric or set of criteria, and ask them to critique the samples first (and justify their evaluations), and then use the same rubric/criteria to evaluate their own work. This can help them internalize what this looks like and apply it to their work.
These are just a few of my ideas and some things that I have done in my own work. Not sure if there's anything new or different here, but I thought I'd share them.
You are raising a number of excellent points, Susan. Re: understanding the audience and Step 0. I think it can vary a lot. Many data visualizations, whether we like it or not, are created either to answer questions the analyst has or to convey "stories" (not a big fan of the term) to an audience (I have data journalism mostly in mind). In these cases the questions are mostly generated by the author, with minimal engagement regarding what questions the readers will want to answer. This does not mean that what you are suggesting here is not relevant for many other contexts! In fact, the information visualization research literature has a lot on "task characterization," which is fundamentally about what questions the end users want to answer.
So yes, there is a big need to understand what questions people have. However, as you point out, most people don't know what questions they have, and even professionals have a lot of implicit knowledge, so it takes some effort and skill to really uncover what the driving questions are. And there is a real need for translating these high-level domain questions into something that can be answered with the data (Tamara Munzner at UBC has done some great work in this space, especially her Nested Model of Visualization Design and Evaluation: https://www.cs.ubc.ca/labs/imager/tr/2009/NestedModel/).
I fully agree with what you wrote above, and it matches my experience. The challenge I see is what kind of pedagogical approaches work to develop these skills. I have tried a few things over the years but felt that I don't understand the problem well enough to design a solution. Hence, my other post on a research agenda for understanding the role of questions (https://filwd.substack.com/p/the-data-questions-data-answers-model). Given your background in education, I'd be happy to hear if you have ideas in this space.
Thanks for writing!!!
I think the lack of data thinking is really a lack of (or limited) data literacy. Data literacy is not just a technical skill; it’s also a way of thinking about and with data. It’s key to working successfully with data (including visualizing it).
Thanks for your comment, Susan! I completely agree with you. I especially agree that what is lacking the most is not technical skills but a way of thinking about data. I am trying to help close this gap by creating new courses in this space. If you are curious, take a look at what I am working on here: https://filwd.substack.com/p/rhetvis-1-introduction-to-rhetorical. In any case, what is your experience with data literacy? Do you have any recommendations or ideas on how to improve skills in this space and how to make these skills more common?
Hi Enrico, thanks for this thoughtful discussion.
I can't say I'm too surprised to read of the challenges students have thinking first about data in an analytical way - that's a graduate-level skill that is usually practiced when students start generating their own data and exploring it (at least in a research context). We can easily take for granted how difficult this non-technical part of the skillset is.
What seems to be missing is the more 'exploratory' aspect of the analytical approach, where one 'probes' a dataset using a combination of scatterplots, bar charts, and frequency histograms.
There's an analogy to draw with academic science papers here: scientists (especially of an older generation) often don't have much training in graphic design, yet they are able to craft compelling and convincing stories, building narratives from a limited number of graph types.
Maybe there's an exercise here: before trying to design a perfect visualization, the student shows that they've exhaustively asked the right questions using only scatterplots and histograms, before any design takes place?
The process I describe does include EDA, if you wish. In fact, I see EDA as just a sequence of questions you formulate and try to refine as you progress with your analysis. The problem is: if you don't even know how to formulate those questions, there is no sense in trying to do the rest. The challenge is: how do you actually teach that kind of skill? My sense, from struggling with this problem for a while, is that this is one of those things where an apprenticeship model is crucial. I suspect that what students need is to observe a skilled person do the work a number of times and then start "copying/simulating" the same process. Maybe this is similar to what you describe in your comment?
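To make the "EDA as a sequence of refined questions" idea concrete, here is a minimal sketch in plain Python. The dataset and the questions are invented for illustration; the point is only the shape of the process: a vague question gets refined into an answerable one, and each answer suggests the next, sharper question.

```python
# Toy dataset (invented for illustration): monthly revenue per region.
sales = [
    {"region": "north", "month": 1, "revenue": 120},
    {"region": "north", "month": 2, "revenue": 90},
    {"region": "south", "month": 1, "revenue": 200},
    {"region": "south", "month": 2, "revenue": 210},
]

# Q1 (vague): "How are sales doing?" -> not answerable as stated; refine it.

# Q2 (refined): "What is the average revenue per region?"
by_region = {}
for row in sales:
    by_region.setdefault(row["region"], []).append(row["revenue"])
avg = {region: sum(vals) / len(vals) for region, vals in by_region.items()}

# Q3 (prompted by the answer to Q2): "Is the north's lower average
# explained by a month-to-month decline?"
north = sorted((r["month"], r["revenue"]) for r in sales if r["region"] == "north")
north_declining = north[-1][1] < north[0][1]

print(avg)              # {'north': 105.0, 'south': 205.0}
print(north_declining)  # True
```

In a classroom version of this, each question/answer pair would also get a chart (a bar chart for Q2, a line or scatterplot for Q3), but the skill being practiced is the refinement chain itself, not the plotting.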