Analyzing (My Workouts 🏋️‍♂️) Data with ChatGPT
I recorded myself doing data analysis with ChatGPT. Here is a summary and a few reflections on the experience.
I am convinced that the best way to understand a technology is to use it. There is a lot of talk about ChatGPT and its use for data analysis, but I can only form an opinion by performing some data analysis on my own and seeing what I get.
For this reason, I decided to go through a data analysis session while recording it and make it publicly available.
I don't know if anyone has the patience to go through it, really, but I think it's good to have it documented. What I was hoping to capture by recording a live video was natural occurrences of issues or ways in which ChatGPT surprised me. Writing this right after the analysis session, I can attest that I was not disappointed. In the video, there are moments when I am pretty excited and blown away, as well as moments when I failed to obtain what I wanted.
My workout data
For this initial analysis, I thought I would use something personal. This is because 1) I have an emotional attachment to it, 2) I have some specific curiosities, and 3) I can more easily get a sense of whether the results of the analysis are plausible because the data is literally about me.
I obtained the data as an export from an application I have been using for the longest time to track my habits. When I started using it, I focused on a number of different habits, but eventually, I ended up using it only to track my workouts. For this reason, the whole analysis is exclusively about workout data and not habits in general.
The data table contains a bunch of fields, but only a small subset is relevant. This is the way ChatGPT described the data right after I uploaded the file and asked it to tell me "what's in the data."
What is remarkable is how good ChatGPT is at guessing what the dataset represents. See the first sentence of its response, where it correctly states that the dataset is about habits that have been tracked with the Coach.me app.
As you can see, many fields are uninteresting. For the analysis, I focused only on the dates, the type of habit (exercise), and the notes field, where I record what type of workout I did, with specific exercises and repetitions, as well as where I performed the workout (at home, at the gym, at a specific park outside, etc.).
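If you want to follow along outside of ChatGPT, loading the export and keeping only these fields is a few lines of pandas. A minimal sketch, where the file name and the column names (Date, Habit, Notes) are my assumptions rather than the actual labels of a Coach.me export:

```python
import pandas as pd

# Load the habit-tracker export. File and column names are hypothetical;
# adjust them to match your actual export.
df = pd.read_csv("coachme_export.csv", parse_dates=["Date"])

# Keep only the fields the analysis uses: dates, habit type, and notes.
workouts = df.loc[df["Habit"] == "Exercise", ["Date", "Habit", "Notes"]]

print(workouts.head())
print("Time span:", workouts["Date"].min(), "to", workouts["Date"].max())
```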
Temporal analysis
The first thing I asked was, "Show me a graph with the frequency of habits over time." It returned the graph below with frequency for every single day.
This is not what I wanted, so I asked ChatGPT to aggregate the data to display frequency on a monthly basis, and it complied without a single problem.
I tried a few more things, and eventually, I managed to have a plot depicting how many workouts I did in a month for the entire time span of the dataset.
As you can see, I was pretty good between 2019 and 2020 and again more recently in 2023!
One thing to notice before moving on is how ChatGPT adds explanatory titles on the top right of the chart (see the explanation of the Y and X axes) and an explanation at the bottom. I like the explanation quite a lot because it allows me to verify, to some extent, that what ChatGPT has done corresponds to what I wanted.
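For reference, the monthly aggregation ChatGPT performed boils down to a resample in pandas. This is my reconstruction, not the code it actually generated, and it builds on the `workouts` frame from the sketch above:

```python
import matplotlib.pyplot as plt

# Count workouts per calendar month ("MS" = month-start frequency).
monthly = workouts.set_index("Date").resample("MS").size()

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(monthly.index, monthly.values)
ax.set_xlabel("Month")
ax.set_ylabel("Workouts per month")
ax.set_title("Monthly workout frequency")
plt.tight_layout()
plt.show()
```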
After this, I challenged ChatGPT to create a "small multiples" version of the timeline, with each plot representing an entire year. If you look at the video, you'll see that it took me a while to obtain what I wanted, but overall, I was impressed by what it could do. It really surprised me that it understood immediately what I meant by small multiples and what kind of small multiples I wanted (even though, in its first formulation, the plots were arranged in a grid instead of being stacked vertically).
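Here, too, is a rough reconstruction of the stacked, one-plot-per-year layout (again my own sketch, not ChatGPT's code). Note the `marker` argument: swapping it is the one-line change behind the "circles instead of crosses" request I describe next.

```python
import matplotlib.pyplot as plt

# One small multiple per year, stacked vertically, sharing the y scale.
years = sorted(monthly.index.year.unique())
fig, axes = plt.subplots(len(years), 1, figsize=(8, 1.8 * len(years)),
                         sharey=True, squeeze=False)

for ax, year in zip(axes.ravel(), years):
    m = monthly[monthly.index.year == year]
    ax.plot(m.index.month, m.values, marker="o")  # "o" = circles, "x" = crosses
    ax.set_xticks(range(1, 13))
    ax.set_title(str(year), loc="left")
    ax.set_ylabel("Workouts")

axes.ravel()[-1].set_xlabel("Month")
plt.tight_layout()
plt.show()
```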
One remarkable thing is that ChatGPT can easily interpret statements that ask it to change some aspects of the chart. Here is an example where I simply asked, "Use circles instead of crosses as symbols."
I like the way ChatGPT seamlessly interprets statements that are basically reformulations of the same intent and small modifications of the plot's graphical aspect.
Keyword analysis
As a next step, I switched to analyzing the field called "Notes." I write a few notes after each workout to keep track of what kind of workout I did, sometimes with full details on repetitions and exercises and where I performed the workout. My notes are very messy and inconsistent, so I was curious to see what I could do with ChatGPT.
First, I tried to get an overall sense of what was there, so I asked, "Show me the top 50 terms in a word cloud where the terms are colored by frequency." This is what I received as an output.
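A word cloud like this can be reproduced with the `wordcloud` package. The sketch below is my approximation, with a deliberately crude tokenizer (which, as you'll see shortly, matters); it continues from the `workouts` frame above:

```python
import re
from collections import Counter

import matplotlib.pyplot as plt
from matplotlib import colormaps
from wordcloud import WordCloud

# Crude tokenization of all notes into lowercase terms.
text = " ".join(workouts["Notes"].dropna().str.lower())
freqs = Counter(re.findall(r"[a-z][a-z\-]+", text))

cmap = colormaps["viridis"]
top = max(freqs.values())

def color_by_freq(word, **kwargs):
    # Map each term's frequency onto a colormap.
    r, g, b, _ = cmap(freqs[word] / top)
    return f"rgb({int(r * 255)}, {int(g * 255)}, {int(b * 255)})"

wc = WordCloud(width=800, height=400, max_words=50,
               background_color="white", color_func=color_by_freq)
wc.generate_from_frequencies(freqs)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```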
Here, I could recognize some important terms referring to specific exercises and locations, so I went on analyzing exercises first by asking, "Keep only terms that represent exercises."
This looks more like a summary of the most frequent types of exercises I do. It looks like squats are a staple of mine!¹ 💪
However, if you look carefully, you realize that some words here are really parts of a single term. Push-ups and pull-ups are each a single term, not separate words as in the word cloud. This is where I wanted to test the real "magic" of LLMs. I was pretty sure ChatGPT could handle this without a problem and asked, "When appropriate, stitch together terms that represent an exercise. For example, 'push' and 'ups' should go together to form the single term 'push-ups.'" The results are better, but some terms like "bench press" are still disconnected.
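The conventional fix, for comparison, is a hand-made alias table applied to the notes before counting. The patterns below are illustrative, not what ChatGPT ran:

```python
# Map spelling variants and split tokens to a single canonical term.
# These patterns are illustrative; extend the table as variants show up.
aliases = {
    r"push[\s\-]?ups?": "push-ups",
    r"pull[\s\-]?ups?": "pull-ups",
    r"bench\s*press": "bench-press",
}

notes = workouts["Notes"].fillna("").str.lower()
for pattern, canonical in aliases.items():
    notes = notes.str.replace(pattern, canonical, regex=True)
```

With the notes normalized this way, "pushups," "push ups," and "push-ups" all count as one term, which is exactly what my prompt was asking ChatGPT to do.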
However, it's important to pause for a moment and consider the pain of doing this kind of analysis with a traditional tool. Analyzing text data this way would be much slower and more effortful! I find that unstructured data analysis is where LLMs shine, providing a big boost in reduced effort and novel capabilities.
Once I got what I wanted, my next question was about the temporal distribution of these terms. I was curious about that because I know I have changed my habits over time, and I wanted to see if I could detect these habit changes in the data. To achieve that, I typed, "I want to look at the frequency of these terms over time." Then I typed, "Place each exercise type in its own plot. Use a set of stacked line charts."
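In pandas terms, this amounts to counting term mentions per month and drawing one panel per term. A sketch with an assumed list of terms, building on the normalized `notes` series above:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Terms of interest (assumed; use whatever your word cloud surfaced).
terms = ["push-ups", "pull-ups", "squats", "bench-press"]

# One row per workout, one column per term: does the note mention it?
counts = pd.DataFrame(
    {t: notes.str.contains(t, regex=False).astype(int).values for t in terms},
    index=pd.DatetimeIndex(workouts["Date"]),
)
monthly_terms = counts.resample("MS").sum()

# Stacked small multiples: one line chart per exercise, shared time axis.
fig, axes = plt.subplots(len(terms), 1, figsize=(8, 1.8 * len(terms)),
                         sharex=True, squeeze=False)
for ax, term in zip(axes.ravel(), terms):
    ax.plot(monthly_terms.index, monthly_terms[term])
    ax.set_title(term, loc="left")
plt.tight_layout()
plt.show()
```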
Initially, the result looked like what I wanted (even though the plots were too small to see the trends, and I had to download the file and open it with an external app), but upon inspection, I realized the results did not add up. This is an interesting aspect of doing data analysis with ChatGPT: when things do not add up, it is not evident how to verify what happened. One way is to sift through the code. Another is to open the dataset with an external tool to verify the results. This is what I did for some of the terms that represent specific exercises, and I realized there were some relevant discrepancies. One was due to spelling pushups in two different ways ("push-ups" and "pushups"), but even after rectifying that, there were still some discrepancies.
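This kind of cross-check is fortunately cheap to do in code against the raw, unnormalized notes:

```python
# Compare variant spellings in the raw notes to see where counts diverge.
raw = workouts["Notes"].fillna("").str.lower()
print("push-ups:", raw.str.contains("push-ups", regex=False).sum())
print("pushups: ", raw.str.contains(r"\bpushups\b").sum())
print("either:  ", raw.str.contains(r"push[\s\-]?ups").sum())
```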
After that, I tried to extract information about "where" I do my workouts, again by analyzing my notes, but I had way more trouble than I expected. My notes contain information about places such as "gym" or "home," but often I'll just type the name of the gym or the name of the park instead of "gym" or "park." My expectation was that ChatGPT would do some magic with my requests and could figure out the names of the places somehow, but I had to give up after several attempts (you can see this in the last part of the video). Maybe I have not yet learned how to ask for what I want (this is a general issue I have noticed: learning how to ask for what you want is really important), but I can say this was the only task that was a complete failure.
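In hindsight, one way to make this work might have been to hand the mapping over explicitly, since only I know which proper names stand for which kind of place. A sketch of the idea, with entirely made-up place names:

```python
# Map specific place names from my notes to generic locations.
# Every name below is made up for illustration.
place_aliases = {
    "iron temple": "gym",   # hypothetical gym name
    "riverside": "park",    # hypothetical park name
    "living room": "home",
}

def infer_place(note: str) -> str:
    note = note.lower()
    for name, generic in place_aliases.items():
        if name in note:
            return generic
    for generic in ("gym", "park", "home"):
        if generic in note:
            return generic
    return "unknown"

workouts["Place"] = workouts["Notes"].fillna("").map(infer_place)
print(workouts["Place"].value_counts())
```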
Reflections
I am really glad I spent almost an hour analyzing a single dataset. And I am glad I decided to use a personal dataset. This allowed me to verify the results more directly because I could assess their plausibility.
The "magic." I am sure I am not the first to state that LLMs often feel like magic. What feels like magic, specifically when doing data analysis, is how seamlessly ChatGPT interprets my instructions. Even though I could not verify how accurate it is, one thing that really impressed me is its ability to analyze and summarize the information displayed in a plot. You can ask it something like, "Tell me what the major trends in this plot are," and it will return something.
Different types of questions. Looking back at how I interacted with ChatGPT, I realize that I asked questions that are fundamentally different in nature. Some questions are about an analysis or result I want and do not specify what kind of plot I want. For example, I can ask, "Show me how the frequency of exercises changed over time." For this type of question, ChatGPT has to guess the best representation. Sometimes (or often?), it does not return the type of chart I want, but it's not a big deal because I can always ask for a different type of plot. In this data analysis session, I was almost always able to get the type of plot I wanted quickly. I was really impressed when I asked for a small multiples version of a chart, and it did it without a problem. Other questions include specifications of what kind of plot I want for my specific question, which generally worked quite well. Another situation is when I want some reformulation. This can refer to different ways to process the data (e.g., aggregating monthly) or adjustments I need for the plot (e.g., showing only labels for the month and year rather than for each single day). Finally, sometimes I asked questions about interpreting the results, and as mentioned above, I was really impressed by the results.
Verification. The real bottleneck. The elephant in the room for data analysis with LLMs is verification. I already knew that before starting, but now I am even more convinced. When you ask ChatGPT to perform some data analysis task, you don't have many ways to verify that what it did is "correct." One way is plausibility, and this is why domain knowledge is relevant. But maybe relying on plausibility is too risky. After all, the worst mistakes are those where the machine does something that looks plausible, no? ChatGPT allows you to look at the code it produces, but it is evident this can't work for a very large segment of the population. I have to say that even as someone who is able to understand what code does, I did not feel compelled to use it as a verification mechanism. ChatGPT also provides a summary statement after each plot to describe how the plot was generated, but I am not sure one can rely on that. In some cases, getting access to the raw data is the only way to verify that what the LLM has done is potentially correct, but I think this is also not a scalable solution. This is an area where more solutions are urgently needed.
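Until such solutions exist, the best workaround I know is to spot-check individual numbers straight from the raw file, bypassing everything the LLM did. For example (the month is arbitrary, and the file and column names are the same assumptions as before):

```python
import pandas as pd

# Recompute one plotted value directly from the raw export, independently
# of any ChatGPT-generated code.
raw = pd.read_csv("coachme_export.csv", parse_dates=["Date"])
mask = (
    (raw["Date"].dt.year == 2023)
    & (raw["Date"].dt.month == 5)
    & (raw["Habit"] == "Exercise")
)
print("Workouts logged in May 2023:", mask.sum())
```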
If you try data analysis on your own after reading this, please let me know. I'd be curious to learn about your experience. Or maybe you have already done it? In that case, please leave a comment below to share it.
If you think somebody you know may be interested in this article and the associated video, please share it! This will help me grow the audience and reach more people who might benefit from my work. Thanks! 🙏
¹ Later on, I realized the calculations here are inaccurate and other exercises are similarly frequent.
Hi Enrico, thanks for documenting and sharing your experience. Happy to see that it's been largely positive, as has been my experience too.
I've been using ChatGPT in a professional coding capacity for visualization for the past year or so, best decision ever. I know people throw terms like 10x dev around... but in terms of productivity I don't think it's an exaggeration. You've really nailed it with this article: especially the ultra-mundane and gritty tasks like parsing text, counting occurrences, and matching dictionaries, it handles beautifully, and it actually teaches more efficient/Pythonic ways of coding. But I think the true power comes when you already have the expertise of knowing how to control all the different aspects of a plot, and know the specific language to refer to what you want; it makes far fewer errors when you remove ambiguity.
All in all I think it's changed the way I work from being a developer to more of a project manager. I have far more mental energy remaining to verify something it has written in 20 seconds based on a prompt, than I do to test functions I've spent 2 hours writing/debugging the artisan way. For that (more holistic/pragmatic) reason, I believe the quality of code is actually superior at the end of the day. Most of my visualization is done in Blender, a notoriously unintuitive programming interface with Python. I'm always blown away, so much that I rarely bother looking through my own codebase for what I've done before, always quicker and more performant to just regenerate as needed.
If you've ever seen 'The Expanse', they have a natural-language 3D holographic visualization computer, where the user prompts the computer to simulate celestial orbits and trajectories based on situation X. We're not *quite* there yet from high-level prompts to seamless visual output, but I think it's a valuable analogy for the role that LLMs can play in modern visualization practice.
"The artisan way" -- great term!