What Can AI Do for Data Visualization?
A thought experiment on how AI can provide support to data visualization
AI is all the rage these days. Well, not AI in general; it’s actually large language models and generative AI, but we see AI everywhere, so I’ll take it.
As someone whose bread and butter is data visualization, I’ll venture into a brief thought experiment to imagine what AI could do for visualization in the near future. The list is roughly organized around the data processing pipeline, from data acquisition to representation and interpretation.
1. Finding, summarizing, and integrating data
Isn’t it amazing that we are still “wasting” time looking for an appropriate data set for a given problem? What if we could just ask an AI to find one or more data sets for a problem and receive a report on the available options, along with some considerations about which one we might want to use? The AI could find the data we need, explain where they come from and how reliable they are, and summarize the main data attributes and values. It could also find data sets that augment the ones we already have. Imagine, for example, that I want to analyze the NYC vehicle collision data set (a data set I often use in my courses). The version on the NYC Open Data portal does not contain weather information, which could be useful for finding associations between weather conditions and vehicle collisions. Why can’t I ask an AI to “Find a data set about precipitation in NYC and add information about precipitation as a column to the collisions data table”? It seems completely feasible, especially given the recent advancements in LLM tools for Google Sheets and Excel.
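The precipitation request boils down to a join on date. As a minimal sketch, assuming hypothetical column names (crash_date, precip_in) and made-up values rather than the real NYC Open Data schema, this is roughly the operation an assistant would need to get right:

```python
# Hypothetical sketch of "add precipitation as a column to the collisions table".
# Column names (crash_date, zip_code, precip_in) and all values are illustrative,
# not the real NYC Open Data schema or real measurements.
from datetime import date

collisions = [
    {"crash_date": date(2023, 1, 3), "zip_code": "10001"},
    {"crash_date": date(2023, 1, 3), "zip_code": "11201"},
    {"crash_date": date(2023, 1, 4), "zip_code": "10001"},
    {"crash_date": date(2023, 1, 5), "zip_code": "10453"},
]

# Daily precipitation in inches, keyed by date (one value per day).
precipitation = {
    date(2023, 1, 3): 0.42,
    date(2023, 1, 4): 0.0,
}


def add_precipitation(rows, precip_by_day):
    """Left join on date: every collision row is kept, and days without a
    weather record get None instead of being silently dropped."""
    joined = []
    for row in rows:
        enriched = dict(row)  # copy so the input table is not mutated
        enriched["precip_in"] = precip_by_day.get(row["crash_date"])
        joined.append(enriched)
    return joined


table = add_precipitation(collisions, precipitation)
for r in table:
    print(r["crash_date"], r["zip_code"], r["precip_in"])
```

A left join is the safer choice here: collisions on days with no weather record are kept with an empty value instead of disappearing, which is exactly the kind of silent decision worth checking when an AI performs the merge.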
2. Language-based data wrangling
If you have spent even a few hours transforming and cleaning data for a project, you know it’s one of those activities that can make you pull your hair out. Sometimes you need to write a very complex set of procedures with intricate transformations (often with mysterious regular expressions). Why bother with that if you can just ask an AI to do it for you? “Add a column to the table extracting only the year from the date column.” “Aggregate the table using the zip codes and compute the total number of collisions for each zip code.” As a matter of fact, many tools already do that. The AI assistants built into Excel and Google Sheets are pretty much there already, and they will only get better. Will people soon forget regular expressions and Excel functions? (Side note: Is it a loss for humanity when a skill is no longer needed? Most people can’t ride a horse anymore. Is that a bad thing? I am not sure where I stand.)
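Behind each of those two sentences sits a small program. The following sketch, assuming a hypothetical MM/DD/YYYY date format and invented rows, shows roughly what the two requests translate to in plain Python:

```python
# Hypothetical sketch of the two plain-language wrangling requests.
# The MM/DD/YYYY date format and the rows themselves are assumptions.
from collections import Counter
from datetime import datetime

rows = [
    {"crash_date": "01/03/2023", "zip_code": "10001"},
    {"crash_date": "12/30/2022", "zip_code": "11201"},
    {"crash_date": "07/14/2023", "zip_code": "10001"},
    {"crash_date": "02/02/2023", "zip_code": ""},  # missing zip code
]

# Request 1: "Add a column to the table extracting only the year from the date column."
for row in rows:
    row["year"] = datetime.strptime(row["crash_date"], "%m/%d/%Y").year

# Request 2: "Aggregate the table using the zip codes and compute the total
# number of collisions for each zip code." Rows without a zip code are counted
# under an explicit "UNKNOWN" label instead of being dropped.
collisions_per_zip = Counter(row["zip_code"] or "UNKNOWN" for row in rows)

print([row["year"] for row in rows])
print(collisions_per_zip.most_common())
```

Note the decisions the prompts leave unstated: how dates are encoded and what to do with rows lacking a zip code. This sketch counts missing zip codes under an explicit label; a generated script could just as plausibly drop them.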
3. Language-based visual data analysis
This is also something companies are developing fast. Why bother learning to transform data and create an appropriate graphical representation if you can just ask an LLM a question? “Create a map showing how vehicle collisions are distributed in NYC. Depict the data at the level of zip codes.” “Is there a correlation between the level of precipitation and the number of collisions?” “No, don’t use a scatter plot; show me the results using a bar chart with binned precipitation values on the x-axis and the number of collisions on the y-axis.”
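The last two requests hide a computation that is easy to get subtly wrong. Here is a minimal sketch, with invented daily values and bin edges chosen purely for illustration, of the correlation and the binned bar-chart data:

```python
# Hypothetical sketch of the computation behind the correlation question and
# the binned bar chart. Values and bin edges are invented for illustration.
import math

daily = [  # (precipitation in inches, number of collisions that day)
    (0.0, 510), (0.0, 495), (0.05, 530), (0.2, 560),
    (0.35, 575), (0.6, 610), (1.1, 640), (0.0, 500),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Bin edges in inches (an assumption); each bin is [lo, hi), the last is open-ended.
edges = [0.0, 0.01, 0.25, 0.5, float("inf")]
labels = ["none", "0.01-0.25", "0.25-0.5", ">=0.5"]


def bin_label(p):
    for lo, hi, label in zip(edges, edges[1:], labels):
        if lo <= p < hi:
            return label
    raise ValueError(f"precipitation out of range: {p}")


# Average collisions per day in each bin (see the note below on why not totals).
totals = {label: [0, 0] for label in labels}  # label -> [collision sum, day count]
for precip, count in daily:
    t = totals[bin_label(precip)]
    t[0] += count
    t[1] += 1
bars = {label: s / d for label, (s, d) in totals.items() if d > 0}

r = pearson([p for p, _ in daily], [c for _, c in daily])
print(f"r = {r:.2f}")
for label, value in bars.items():
    print(f"{label:>10}: {value:.1f}")
```

One design choice is worth flagging: this sketch shows the average number of collisions per day in each bin rather than the total. Because dry days typically outnumber rainy ones, totals would make the “no precipitation” bar tower over the others for reasons unrelated to weather. The request as phrased does not say which one it wants, so an AI has to pick.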
I find this use case particularly interesting for two reasons. First, it can go terribly wrong. Second, it helps us better understand the value we humans bring to the table. Let me clarify. The biggest problem with using AI for data analysis is that we can never be sure all the calculations it performs are correct. In our lab, we have started running some initial experiments, and we were surprised by how wrong certain results were and, at the same time, by how hard those errors would be to detect. I will report more on the topic as our research develops (I am very excited about some new ideas we have in this space).
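The experiments mentioned here are not described in this post, so the following is not their method. It is only a minimal sketch of one cheap safeguard against silent errors: recomputing an AI-reported aggregate independently and checking a simple invariant. The “reported” numbers are invented stand-ins for AI output.

```python
# Hypothetical sketch: independently re-checking an aggregate returned by an
# AI assistant. The "reported" totals are invented stand-ins for AI output.
from collections import Counter

rows = [{"zip_code": z} for z in ["10001", "11201", "10001", "10453", "10001"]]

# Pretend these per-zip totals came back from an LLM-generated analysis.
reported = {"10001": 3, "11201": 1, "10453": 2}


def check_zip_totals(rows, reported):
    """Return human-readable problems; an empty list means no discrepancy
    was found, which is not proof that the analysis is right."""
    problems = []
    expected = Counter(row["zip_code"] for row in rows)
    # Invariant 1: per-group totals must add up to the number of rows.
    if sum(reported.values()) != len(rows):
        problems.append(
            f"totals sum to {sum(reported.values())}, table has {len(rows)} rows"
        )
    # Invariant 2: every group must match an independent recount.
    for zip_code in sorted(set(expected) | set(reported)):
        got, want = reported.get(zip_code, 0), expected.get(zip_code, 0)
        if got != want:
            problems.append(f"{zip_code}: reported {got}, recount {want}")
    return problems


for problem in check_zip_totals(rows, reported):
    print(problem)
```

Checks like these catch arithmetic slips, but not a wrongly framed analysis: a correct count of the wrong thing passes every invariant.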
At the same time, looking at how LLMs could transform data analysis and visualization, one develops a better appreciation for what is uniquely human. The quality of an analysis will depend on the quality of the questions the user asks and on how the results are synthesized and interpreted into something meaningful. I have a personal obsession with the value of developing good questions in visualization, and maybe LLMs will help us focus more on this problem. If you want to know more about my ideas on “data questions,” I describe the idea in more depth in a separate post.
There’s another aspect of data analysis with LLMs that I find fascinating: the refinement step. When we ask a machine to do something for us, we often realize there are better ways to ask for the same thing, or that what we want to know is slightly different from how we first formulated it. This behavior is ubiquitous; we often don’t even realize we do it. In the early days of the web, researchers studying how people search identified this behavior and often called it “query reformulation.” Prof. Marti Hearst has a great paragraph on it in her classic book, Search User Interfaces:
Examination of search engine query logs suggests a high frequency of query reformulation. One study by Jansen et al., 2005 analyzed 3 million records from a 24 hour snapshot of Web logs taken in 2002 from the AltaVista search engine. […] The analysis found that the proportion of users who modified queries was 52%, with 32% issuing 3 or more queries within the session. Other studies show similar proportions of refinements, thus supporting the assertion that query reformulation is a common part of the search process.
I am sure the same happens all the time with LLMs in general and, in particular, when using LLMs for data analysis.
4. Automated “data tours”
When you receive a data set for the first time, it would be useful to have a “data tour” that familiarizes you with its content. Similarly, it would be interesting to see whether an AI could build compelling “stories” from a data set. The analogy that comes to mind is how photo applications automatically create nice stories from your photo library. Occasionally, I receive a notification on my phone from Apple Photos or Google Photos telling me that a new story is available. I wonder whether one could do something similar with data visualization. Can I feed a data set to an AI and let it build interesting visual narratives from it? Would that be useful? One area where I think this could be really useful is personal data: applications that collect your data could present automatically generated data stories to their users. Something similar already happens in apps like Spotify, which produces a nice year-in-review story showing what you listened to during the year. Similarly, sleep or exercise tracking software can produce (and some already do) little data stories to help you review your data.
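At their simplest, such stories can start from a handful of aggregations over a personal log. The sketch below is hypothetical: it makes no claim about how Spotify or any tracking app actually builds its summaries, and the listening log is invented.

```python
# Hypothetical sketch: turning an invented personal listening log into a few
# year-in-review "story" sentences with simple, rule-based aggregations.
from collections import Counter, defaultdict
from datetime import date

log = [  # (day played, artist, minutes listened)
    (date(2023, 1, 9), "Artist A", 42),
    (date(2023, 3, 2), "Artist B", 15),
    (date(2023, 3, 18), "Artist A", 30),
    (date(2023, 7, 4), "Artist C", 55),
    (date(2023, 7, 21), "Artist A", 20),
]


def year_in_review(log):
    minutes_by_artist = Counter()
    minutes_by_month = defaultdict(int)
    for day, artist, minutes in log:
        minutes_by_artist[artist] += minutes
        # Month name depends on the current locale (English under the default C locale).
        minutes_by_month[day.strftime("%B")] += minutes
    top_artist, top_minutes = minutes_by_artist.most_common(1)[0]
    busiest_month = max(minutes_by_month, key=minutes_by_month.get)
    total = sum(minutes_by_artist.values())
    # Each "slide" of the story is one fact, ordered from general to specific.
    return [
        f"You listened for {total} minutes this year.",
        f"Your top artist was {top_artist}, with {top_minutes} minutes.",
        f"{busiest_month} was your busiest month.",
    ]


for slide in year_in_review(log):
    print(slide)
```

The hard part an AI might help with is not computing these facts but choosing which ones are worth telling, and in what order; a fixed list of rules like this one quickly becomes repetitive.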
5. Creativity booster (design space exploration)
How often have you heard that people are tired of bar charts? I am not tired of bar charts, but I am intrigued by the idea that a powerful AI could help designers explore alternative designs and take inspiration from “wild” experiments. Maybe generative AI can help people explore more solutions than the ones they already have in mind. When I teach visualization, I often mention that visualization designers need to master two fundamental skills: generative power and evaluative power. Generative power is about imagining the ways in which data can be visualized, that is, exploring the design space of solutions. Evaluative power is about deciding what works best for a given problem. An AI can help people explore more design solutions, or at least guide them toward a broader range of solutions. There are some initial attempts at using generative AI in visualization, but the existing ideas seem to concentrate more on ways to “decorate” visualizations than on exploring different encodings. Nonetheless, I think this kind of exploration is also interesting.
Some students in our lab, together with colleagues from other institutions, published an interesting paper in 2023, “Doom or deliciousness: challenges and opportunities for visualization in the age of generative models,” which explores the application of generative AI to data visualization. Check out the teaser figure from the paper. Isn’t that fascinating already?
These are just five possible applications of AI to data visualization that came to mind. Admittedly, they are not particularly original, but at the same time, they may very well happen at a large scale sometime soon, and, possibly, they may revolutionize the way we do data visualization. Or maybe not?
As the landscape of data visualization changes with advances in AI, an interesting question arises: How will the skills needed to do data visualization change? As someone with a strong stake in education, I feel a responsibility to answer this question. My gut reaction is that as long as we teach fundamental skills, there is nothing special about these tools that makes data visualization skills and knowledge obsolete. I have always made a point of not teaching data visualization around specific technologies; I strive to teach visualization in a way that does not depend on any particular tool, while recognizing that developing specific technical skills is important. As AI brings remarkable innovations, this approach matters even more. After all, the basics almost never become obsolete. So I feel confident that in the future we will still teach pretty much the same basic ideas, and hopefully some new ones derived from intellectual advances in our field.
—
What do you think? Do you have other ideas about what AI could do for data visualization? Please share your ideas by leaving a comment below!