Analyzing (My Workouts 🏋️‍♂️) Data with ChatGPT
I recorded myself doing data analysis with ChatGPT. Here is a summary and a few reflections on the experience.
I am convinced that the best way to understand a technology is to use it. There is a lot of talk about ChatGPT and its use for data analysis, but I can only form an opinion by performing some data analysis on my own and seeing what I get.
For this reason, I decided to go through a data analysis session while recording it and make it publicly available.
I don't know if anyone has the patience to go through it, really, but I think it's good to have it documented. What I was hoping to capture by recording a live video was natural occurrences of issues or ways in which ChatGPT surprised me. Writing this right after the analysis session, I can attest that I was not disappointed. In the video, there are moments when I am pretty excited and blown away, as well as moments when I failed to obtain what I wanted.
My workout data
For this initial analysis, I thought I would use something personal. This is because 1) I have an emotional attachment to it, 2) I have some specific curiosities, and 3) I can more easily get a sense of whether the results of the analysis are plausible because the data is literally about me.
I obtained the data as an export from an application I have been using for the longest time to track my habits. When I started using it, I focused on a number of different habits, but eventually, I ended up using it only to track my workouts. For this reason, the whole analysis is exclusively about workout data and not habits in general.
The data table contains a bunch of fields, but only a small subset is relevant. This is the way ChatGPT described the data right after I uploaded the file and asked it to tell me "what's in the data."
What is remarkable is how good ChatGPT is at guessing what the dataset represents. See the first sentence of its response, where it correctly states that the dataset is about habits that have been tracked with the Coach.me app.
As you can see, many fields are uninteresting. For the analysis, I focused only on the dates, the type of habit (exercise), and the notes field, where I record what type of workout I did, with specific exercises and repetitions, as well as where I performed the workout (at home, at the gym, at a specific park outside, etc.).
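If you want to follow along outside of ChatGPT, loading the export and keeping only these fields is a few lines of pandas. A minimal sketch, where the file name and the column names (Date, Habit, Notes) are my assumptions rather than the actual labels of a Coach.me export:

```python
import pandas as pd

# Load the habit-tracker export. File and column names are hypothetical;
# adjust them to match your actual export.
df = pd.read_csv("coachme_export.csv", parse_dates=["Date"])

# Keep only the fields the analysis uses: dates, habit type, and notes.
workouts = df.loc[df["Habit"] == "Exercise", ["Date", "Habit", "Notes"]]

print(workouts.head())
print("Time span:", workouts["Date"].min(), "to", workouts["Date"].max())
```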
Temporal analysis
The first thing I asked was, "Show me a graph with the frequency of habits over time." It returned the graph below with frequency for every single day.
This is not what I wanted, so I asked ChatGPT to aggregate the data to display frequency on a monthly basis, and it complied without a single problem.
I tried a few more things, and eventually, I managed to have a plot depicting how many workouts I did in a month for the entire time span of the dataset.
As you can see, I was pretty good between 2019 and 2020 and again more recently in 2023!
One thing to notice before moving on is how ChatGPT adds explanatory titles on the top right of the chart (see the explanation of the Y and X axes) and an explanation at the bottom. I like the explanation quite a lot because it allows me to verify, to some extent, that what ChatGPT has done corresponds to what I wanted.
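For reference, the monthly aggregation ChatGPT performed boils down to a resample in pandas. This is my reconstruction, not the code it actually generated, and it builds on the `workouts` frame from the sketch above:

```python
import matplotlib.pyplot as plt

# Count workouts per calendar month ("MS" = month-start frequency).
monthly = workouts.set_index("Date").resample("MS").size()

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(monthly.index, monthly.values)
ax.set_xlabel("Month")
ax.set_ylabel("Workouts per month")
ax.set_title("Monthly workout frequency")
plt.tight_layout()
plt.show()
```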
After this, I challenged ChatGPT to create a "small multiples" version of the timeline, with each plot representing an entire year. If you look at the video, you'll see that it took me a while to obtain what I wanted, but overall, I was impressed by what it could do. It really surprised me that it understood immediately what I meant by small multiples and what kind of small multiples I wanted (even though, in its first formulation, the plots were arranged in a grid instead of being stacked vertically).
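Here, too, is a rough reconstruction of the stacked, one-plot-per-year layout (again my own sketch, not ChatGPT's code). Note the `marker` argument: swapping it is the one-line change behind the "circles instead of crosses" request I describe next.

```python
import matplotlib.pyplot as plt

# One small multiple per year, stacked vertically, sharing the y scale.
years = sorted(monthly.index.year.unique())
fig, axes = plt.subplots(len(years), 1, figsize=(8, 1.8 * len(years)),
                         sharey=True, squeeze=False)

for ax, year in zip(axes.ravel(), years):
    m = monthly[monthly.index.year == year]
    ax.plot(m.index.month, m.values, marker="o")  # "o" = circles, "x" = crosses
    ax.set_xticks(range(1, 13))
    ax.set_title(str(year), loc="left")
    ax.set_ylabel("Workouts")

axes.ravel()[-1].set_xlabel("Month")
plt.tight_layout()
plt.show()
```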
One remarkable thing is that ChatGPT can easily interpret statements that ask it to change some aspects of the chart. Here is an example where I simply asked, "Use circles instead of crosses as symbols."
I like the way ChatGPT seamlessly interprets statements that are basically reformulations of the same intent and small modifications of the plot's graphical aspect.
Keyword analysis
As a next step, I switched to analyzing the field called "Notes." I write a few notes after each workout to keep track of what kind of workout I did, sometimes with full details on repetitions and exercises and where I performed the workout. My notes are very messy and inconsistent, so I was curious to see what I could do with ChatGPT.
First, I tried to get an overall sense of what was there, so I asked, "Show me the top 50 terms in a word cloud where the terms are colored by frequency." This is what I received as an output.
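A word cloud like this can be reproduced with the `wordcloud` package. The sketch below is my approximation, with a deliberately crude tokenizer (which, as you'll see shortly, matters); it continues from the `workouts` frame above:

```python
import re
from collections import Counter

import matplotlib.pyplot as plt
from matplotlib import colormaps
from wordcloud import WordCloud

# Crude tokenization of all notes into lowercase terms.
text = " ".join(workouts["Notes"].dropna().str.lower())
freqs = Counter(re.findall(r"[a-z][a-z\-]+", text))

cmap = colormaps["viridis"]
top = max(freqs.values())

def color_by_freq(word, **kwargs):
    # Map each term's frequency onto a colormap.
    r, g, b, _ = cmap(freqs[word] / top)
    return f"rgb({int(r * 255)}, {int(g * 255)}, {int(b * 255)})"

wc = WordCloud(width=800, height=400, max_words=50,
               background_color="white", color_func=color_by_freq)
wc.generate_from_frequencies(freqs)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```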
Here, I could recognize some important terms referring to specific exercises and locations, so I went on analyzing exercises first by asking, "Keep only terms that represent exercises."
This looks more like a summary of the most frequent types of exercises I do. It looks like squats are a staple of mine!¹ 💪
However, if you look carefully, you realize that some words here are really parts of a single term. Push-ups and pull-ups are each a single term, not separate words as in the word cloud. This is where I wanted to test the real "magic" of LLMs. I was pretty sure ChatGPT could handle this without a problem and asked, "When appropriate, stitch together terms that represent an exercise. For example, 'push' and 'ups' should go together to form the single term 'push-ups.'" The results are better, but some terms like "bench press" are still disconnected.
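The conventional fix, for comparison, is a hand-made alias table applied to the notes before counting. The patterns below are illustrative, not what ChatGPT ran:

```python
# Map spelling variants and split tokens to a single canonical term.
# These patterns are illustrative; extend the table as variants show up.
aliases = {
    r"push[\s\-]?ups?": "push-ups",
    r"pull[\s\-]?ups?": "pull-ups",
    r"bench\s*press": "bench-press",
}

notes = workouts["Notes"].fillna("").str.lower()
for pattern, canonical in aliases.items():
    notes = notes.str.replace(pattern, canonical, regex=True)
```

With the notes normalized this way, "pushups," "push ups," and "push-ups" all count as one term, which is exactly what my prompt was asking ChatGPT to do.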
However, it's important to pause for a moment and consider the pain of doing this kind of analysis with a traditional tool. Analyzing text data this way would be much slower and more effortful! I find that unstructured data analysis is where LLMs shine, providing a big boost in reduced effort and novel capabilities.
Once I got what I wanted, my next question was about the temporal distribution of these terms. I was curious about that because I know I have changed my habits over time, and I wanted to see if I could detect these habit changes in the data. To achieve that, I typed, "I want to look at the frequency of these terms over time." Then I typed, "Place each exercise type in its own plot. Use a set of stacked line charts."
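In pandas terms, this amounts to counting term mentions per month and drawing one panel per term. A sketch with an assumed list of terms, building on the normalized `notes` series above:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Terms of interest (assumed; use whatever your word cloud surfaced).
terms = ["push-ups", "pull-ups", "squats", "bench-press"]

# One row per workout, one column per term: does the note mention it?
counts = pd.DataFrame(
    {t: notes.str.contains(t, regex=False).astype(int).values for t in terms},
    index=pd.DatetimeIndex(workouts["Date"]),
)
monthly_terms = counts.resample("MS").sum()

# Stacked small multiples: one line chart per exercise, shared time axis.
fig, axes = plt.subplots(len(terms), 1, figsize=(8, 1.8 * len(terms)),
                         sharex=True, squeeze=False)
for ax, term in zip(axes.ravel(), terms):
    ax.plot(monthly_terms.index, monthly_terms[term])
    ax.set_title(term, loc="left")
plt.tight_layout()
plt.show()
```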
Initially, the result looked like what I wanted (even though the plots were too small to see the trends, and I had to download the file and open it with an external app), but upon inspection, I realized the results did not add up. This is an interesting aspect of doing data analysis with ChatGPT: when things do not add up, it is not evident how to verify what happened. One way is to sift through the code. Another is to open the dataset with an external tool to verify the results. This is what I did for some of the terms that represent specific exercises, and I realized there were some relevant discrepancies. One was due to spelling pushups in two different ways ("push-ups" and "pushups"), but even after rectifying that, there were still some discrepancies.
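This kind of cross-check is fortunately cheap to do in code against the raw, unnormalized notes:

```python
# Compare variant spellings in the raw notes to see where counts diverge.
raw = workouts["Notes"].fillna("").str.lower()
print("push-ups:", raw.str.contains("push-ups", regex=False).sum())
print("pushups: ", raw.str.contains(r"\bpushups\b").sum())
print("either:  ", raw.str.contains(r"push[\s\-]?ups").sum())
```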
After that, I tried to extract information about "where" I do my workouts, again by analyzing my notes, but I had way more trouble than I expected. My notes contain information about places such as "gym" or "home," but often I'll just type the name of the gym or the name of the park instead of "gym" or "park." My expectation was that ChatGPT would do some magic with my requests and could figure out the names of the places somehow, but I had to give up after several attempts (you can see this in the last part of the video). Maybe I have not yet learned how to ask for what I want (this is a general issue I have noticed: learning how to ask for what you want is really important), but I can say this was the only task that was a complete failure.
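In hindsight, one way to make this work might have been to hand the mapping over explicitly, since only I know which proper names stand for which kind of place. A sketch of the idea, with entirely made-up place names:

```python
# Map specific place names from my notes to generic locations.
# Every name below is made up for illustration.
place_aliases = {
    "iron temple": "gym",   # hypothetical gym name
    "riverside": "park",    # hypothetical park name
    "living room": "home",
}

def infer_place(note: str) -> str:
    note = note.lower()
    for name, generic in place_aliases.items():
        if name in note:
            return generic
    for generic in ("gym", "park", "home"):
        if generic in note:
            return generic
    return "unknown"

workouts["Place"] = workouts["Notes"].fillna("").map(infer_place)
print(workouts["Place"].value_counts())
```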
Reflections
I am really glad I spent almost an hour analyzing a single dataset. And I am glad I decided to use a personal dataset. This allowed me to verify the results more directly because I could assess their plausibility.
The "magic." I am sure I am not the first to state that LLMs often feel like magic. What feels like magic, specifically when doing data analysis, is how seamlessly ChatGPT interprets my instructions. Even though I could not verify how accurate it is, one thing that really impressed me is its ability to analyze and summarize the information displayed in a plot. You can ask it something like, "Tell me what the major trends in this plot are," and it will return something.
Different types of questions. Looking back at how I interacted with ChatGPT, I realize that I asked questions that are fundamentally different in nature. Some questions are about an analysis or result I want and do not specify what kind of plot I want. For example, I can ask, "Show me how the frequency of exercises changed over time." For this type of question, ChatGPT has to guess the best representation. Sometimes (or often?), it does not return the type of chart I want, but it's not a big deal because I can always ask for a different type of plot. In this data analysis session, I was almost always able to get the type of plot I wanted quickly. I was really impressed when I asked for a small multiples version of a chart, and it did it without a problem. Other questions include specifications of what kind of plot I want for my specific question, which generally worked quite well. Another situation is when I want some reformulation. This can refer to different ways to process the data (e.g., aggregating monthly) or adjustments I need for the plot (e.g., showing only labels for the month and year rather than for each single day). Finally, sometimes I asked questions about interpreting the results, and as mentioned above, I was really impressed by the results.
Verification. The real bottleneck. The elephant in the room for data analysis with LLMs is verification. I already knew that before starting, but now I am even more convinced. When you ask ChatGPT to perform some data analysis task, you don't have many ways to verify that what it did is "correct." One way is plausibility, and this is why domain knowledge is relevant. But maybe relying on plausibility is too risky. After all, the worst mistakes are those where the machine does something that looks plausible, no? ChatGPT allows you to look at the code it produces, but it is evident this can't work for a very large segment of the population. I have to say that even as someone who is able to understand what code does, I did not feel compelled to use it as a verification mechanism. ChatGPT also provides a summary statement after each plot to describe how the plot was generated, but I am not sure one can rely on that. In some cases, getting access to the raw data is the only way to verify that what the LLM has done is potentially correct, but I think this is also not a scalable solution. This is an area where more solutions are urgently needed.
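Until such solutions exist, the best workaround I know is to spot-check individual numbers straight from the raw file, bypassing everything the LLM did. For example (the month is arbitrary, and the file and column names are the same assumptions as before):

```python
import pandas as pd

# Recompute one plotted value directly from the raw export, independently
# of any ChatGPT-generated code.
raw = pd.read_csv("coachme_export.csv", parse_dates=["Date"])
mask = (
    (raw["Date"].dt.year == 2023)
    & (raw["Date"].dt.month == 5)
    & (raw["Habit"] == "Exercise")
)
print("Workouts logged in May 2023:", mask.sum())
```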
If you try data analysis on your own after reading this, please let me know. I'd be curious to learn about your experience. Or maybe you have already done it? In that case, please leave a comment below to share it.
If you think somebody you know may be interested in this article and the associated video, please share it! This will help me grow the audience and reach more people who might benefit from my work. Thanks! 🙏
¹ Later on, I realized the calculations here are inaccurate and other exercises are similarly frequent.
Hi Enrico, thanks for documenting and sharing your experience. Happy to see that it's been largely positive, as has been my experience too.
I've been using ChatGPT in a professional coding capacity for visualization for the past year or so, best decision ever. I know people throw terms like 10x dev around... but in terms of productivity I don't think it's an exaggeration. You've really nailed it with this article: especially the ultra-mundane and gritty tasks like parsing text, counting occurrences, and matching dictionaries, it handles beautifully, and it actually teaches more efficient/Pythonic ways of coding. But I think the true power comes when you already have the expertise of knowing how to control all the different aspects of a plot, and know the specific language to refer to what you want; it makes far fewer errors when you remove ambiguity.
All in all I think it's changed the way I work from being a developer to more of a project manager. I have far more mental energy remaining to verify something it has written in 20 seconds based on a prompt, than I do to test functions I've spent 2 hours writing/debugging the artisan way. For that (more holistic/pragmatic) reason, I believe the quality of code is actually superior at the end of the day. Most of my visualization is done in Blender, a notoriously unintuitive programming interface with Python. I'm always blown away, so much that I rarely bother looking through my own codebase for what I've done before, always quicker and more performant to just regenerate as needed.
If you've ever seen 'The Expanse', they have a natural-language 3D holographic visualization computer, where the user prompts the computer to simulate celestial orbits and trajectories based on situation X. We're not *quite* there yet from high-level prompts to seamless visual output, but I think it's a valuable analogy for the role that LLMs can play in modern visualization practice.
"The artisan way" -- great term!