Can LLMs Detect Reasoning Errors with Charts?

Enrico Bertini

Mar 31

Yes and no.

Read →

7 Comments

Moyocoyani Molina Espíritu

Apr 2

This is great! Now I'm curious what could be the responses from other LLM's like copilot, gemini or deepseek 🤔?

Expand full comment

Reply (1)

Enrico Bertini

Apr 3

I agree! They should be tested! (Claude also!)

Expand full comment

Reply (1)

Enrico Bertini

Apr 3

I gave Claude a try with exactly the same prompts. It's quite better!

Expand full comment

Apr 1

Interesting. I wonder if 'educating' the LLM first through a persona might help here. For example you could prime it to consider a number of categories of interpretation error:

"You are an informed analyst capable of critical thinking and aware of common problems in map and chart interpretation. This includes, but is not limited to, 'regression to the mean', 'epistemic error', 'absolute numbers vs rates', 'the ecological fallacy', 'the modifiable areal unit problem (MAUP)', 'saliency bias' etc. You should also be aware of other common forms of bias in interpretation including those around data sources and who created the chart/map being evaluated."

It would be interesting after specifying such a persona, whether you get similar results with your examples.

Jo Wood

Expand full comment

Reply (1)

Enrico Bertini

Apr 1

Excellent point Jo! There is so much to explore. The general question of how much better LLMs can be just by providing good prompts is a very interesting one.

Expand full comment

dan mantena

Apr 5Edited

very nice piece Enrico!

looks like Gemini 2.5 pro was able to catch the misleading covid prompt request!

Okay, let's evaluate the proposed title, "Record of Number of People Infected Between January and September," against the graph provided.

1. **Content:** The graph's y-axis label specifies it shows the "Seven-day rolling average of new cases (per 100k)". This is different from the absolute "Number of People Infected". It's a rate, averaged over time, and normalized per 100,000 people.

2. **Time Frame:** The x-axis clearly spans from March 2020 through November 2022. Limiting the title to "Between January and September" doesn't represent the full scope of the data presented.

3. **Existing Title:** The graph already has a title: "New confirmed cases of Covid-19 in European Union, US, Japan, Greece, New Zealand and England". This, along with the subtitle about the rolling average, accurately describes the data.

**Assessment:**

The proposed title doesn't accurately reflect what the graph displays for two main reasons:

* It describes the absolute number of infected people, whereas the graph shows a *rate* of *new cases* per 100k population, averaged over seven days.

* It specifies a limited time frame (January-September) which doesn't match the graph's timeline (March 2020 - November 2022).

**Questions to Consider:**

* Given the y-axis label ("Seven-day rolling average of new cases (per 100k)"), how could a title better reflect that specific metric?

* Considering the x-axis spans multiple years, how important is it for the title to capture the full date range shown?

* Does the existing title, "New confirmed cases of Covid-19 in European Union, US, Japan, Greece, New Zealand and England," combined with the subtitle, already serve the purpose well?

Perhaps we could refine the title to be more precise? What aspect of the data are you most hoping to highlight with the title?

***

(THE REST OF THIS RESPONSE IS DUE TO MY FINE TUNING NOTES TO GEMINI WHICH MAKES IT EASIER FOR ME TO BETTER VERIFY ITS OUTPUT)

1. **Confidence Level:** 5/5

2. **Likely Failure Modes:**

* **Misinterpretation of Intent:** I might misunderstand the specific reason or nuance the user intended with their proposed title, focusing too heavily on literal accuracy.

* **Overly Critical Tone:** The feedback might come across as dismissive rather than collaborative and guiding.

* **Ignoring Potential Context:** The user might be creating this graph for a specific section of a report focused only on Jan-Sep trends, even if the full data is shown.

3. **Suggestions for Improvement & Critical Thinking:**

* **Clarify Purpose:** Ask yourself: "What is the *main* message I want this graph's title to convey?" Is it the trend, the comparison between regions, the specific metric, or the time period?

* **Audience Consideration:** Who is this graph for? Would they understand terms like "rolling average per 100k," or does it need simplification (even if slightly less precise)?

* **Compare Alternatives:** Draft 2-3 different title options and compare their strengths and weaknesses in terms of accuracy, clarity, and conciseness.

4. **Thought Process:**

* First, I carefully examined the graph's components: the existing title, the axis labels (especially the y-axis definition), the plotted lines, and the x-axis time scale.

* Then, I analyzed the user's proposed title, breaking it down into its core components ("Number of People Infected" and "Between January and September").

* I compared the proposed title's components directly against the graph's actual data representation and time frame, identifying discrepancies (rate vs. number, limited timeframe vs. actual timeframe).

* Referencing the iSage principles (Socratic Guidance, Cognitive Skill Development), I formulated questions to guide the user toward recognizing these discrepancies themselves rather than just stating the title was wrong.

* I pointed out that the existing title/subtitle are already quite descriptive, offering a potential baseline.

* Finally, I invited the user to reflect on their goal for the title to facilitate refinement.

Expand full comment

Reply (1)

Enrico Bertini

Apr 5

Yes. I tried with Claude and got something similar. But you have to dig deeper to see if and when it catches the idea that cases are different from infections conceptually not just as a variable name. Thanks for sharing! This is awesome!!!

Expand full comment

FILWD

Can LLMs Detect Reasoning Errors with Charts?