Doing Data Work Assisted by an "AI Buddy" with Tyler Sloan

Tyler walks us through recreating visualizations of the Iris Dataset with ChatGPT

Let me start from the end. I’m completely blown away by the way chatbots can support reasoning if/when used creatively and properly. In this video interview and tutorial, I chat with data scientist and visualization expert Tyler Sloan. I connected with Tyler a while back to talk about the use of LLMs in constructing data pipelines, and he mentioned that he regularly uses chatbots such as ChatGPT to assist him with his data science work. Driven by curiosity to discover how similar his workflow was to mine, we agreed to record a video walkthrough and share it with the world. The video above is the result of our chat.

You can go ahead and watch the whole thing or part of it. The video is long and, at times, a little slow because we get involved in a few trials to obtain a given result. I decided to keep it as it is and avoid editing to give you a sense of how one can develop data pipelines with an AI buddy. In the walkthrough, Tyler attempts to recreate two visualizations from the original paper in which botanist Edgar Anderson developed the classic iris dataset, including these beautiful ideographs, a visualization technique I had never seen before.
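To give a flavor of the kind of chart being recreated, here is a minimal sketch (not Tyler’s actual code from the video) that loads the iris dataset and plots mean measurements per species; it assumes scikit-learn’s bundled copy of the data and its column names:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Load the classic iris dataset as a DataFrame
iris = load_iris(as_frame=True)
df = iris.frame
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Mean of each measurement, grouped by species
means = df.groupby("species")[iris.feature_names].mean()

# Grouped bar chart comparing the three species
ax = means.plot(kind="bar", rot=0, figsize=(8, 4))
ax.set_ylabel("mean length / width (cm)")
ax.set_title("Iris measurements by species")
plt.tight_layout()
plt.savefig("iris_bars.png")
```

This is exactly the sort of boilerplate one can delegate to a chatbot and then refine with follow-up prompts.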

Ideographs from the original Iris Dataset comparing sepal to petal width and length.

This is what we covered in our conversation:

  1. Intro (00:00)

  2. Tyler’s neuroscience journey (01:12)

  3. How Tyler started using ChatGPT (05:36)

  4. The fascinating story of the iris dataset (09:46)

  5. Setting the stage (15:56)

  6. Recreating the bar charts (28:15)

  7. Recreating the ideographs (42:21)

  8. Reflections on the experience (59:58)

  9. A call to action: try yourself! (01:07:16)

Personal Reflections

Here are a few personal observations about the experience.

  1. Offloading boring tasks. ChatGPT works great for offloading boring data wrangling tasks, and most of the time it performs these kinds of tasks without a single problem.

  2. Mistakes are fine. ChatGPT makes mistakes all the time, but that’s not a problem as long as you can detect them. Interaction with LLMs works like this: you try something, check the results, and provide more feedback when the output isn’t perfect. The art is in creating progressively refined prompts.

  3. AI needs context. Most of the art of working with LLMs is providing rich context. However, you never know what type of context is going to be helpful. In other words, there is no direct association you can predict between the information you provide and the output you receive, but over time, you develop a sense of what context might steer your AI in the desired direction. This is a unique new way of interacting with machines. It’s a little weird, but it feels fine after a while.

  4. Using images in prompts. I was completely blown away by the way Tyler used images to give hints to ChatGPT. It turns out that if you want a specific type of plot, giving it an example seems to actually help! This is a modality I had not considered before, but it makes total sense.

  5. Integration in data workflows. How is AI going to be integrated into data workflows? It seems evident to me that we are going to see way more integration in the coming years (months?). Visualization and data analysis tools will probably be very different from what we have now. The interaction is pretty rudimentary for now (copy and paste). However, I suspect we will see a lot of solutions on how to integrate LLMs in data workflows.

  6. Verification as a bottleneck. I have written about this idea several times when discussing LLMs and data work. Verification is the bottleneck. As long as one can observe a mistake, it’s not a problem, but mistakes can be hidden, and it is not evident how to catch them other than reading the code. We will need new tools to help people review the output and identify hidden errors.
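One lightweight way to attack the verification bottleneck is to wrap LLM-generated wrangling code in explicit sanity checks. Here is a sketch of what that might look like for iris-style data; the column names and invariants are hypothetical, not taken from the video:

```python
import pandas as pd

def check_iris_frame(df: pd.DataFrame) -> None:
    """Cheap invariants that catch common silent errors
    in generated wrangling code (hypothetical checks)."""
    expected = {"sepal_length", "sepal_width",
                "petal_length", "petal_width", "species"}
    assert expected.issubset(df.columns), "missing columns"
    numeric = df[sorted(expected - {"species"})]
    assert not numeric.isna().any().any(), "unexpected NaNs"
    assert (numeric >= 0).all().all(), "negative measurements"
    assert df["species"].nunique() == 3, "species count changed"

# Example: a well-formed frame passes silently
ok = pd.DataFrame({
    "sepal_length": [5.1, 6.2, 5.9],
    "sepal_width": [3.5, 2.9, 3.0],
    "petal_length": [1.4, 4.3, 5.1],
    "petal_width": [0.2, 1.3, 1.8],
    "species": ["setosa", "versicolor", "virginica"],
})
check_iris_frame(ok)
```

Checks like these don’t prove the code is right, but they turn some classes of hidden mistakes into loud failures you can feed back into the next prompt.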

That’s all for now. I hope this video will inspire you to create your own visualizations. If you create something on your own, please share it here in a comment or reach out to us. It’s fun!