3 Comments

In my summer course for master's students I ran an in-class activity to demonstrate this: I gave students a somewhat messy dataset and a question, then had them compete to be the first to use gen AI to get the correct answer. Everyone gets it wrong, again and again, until they learn that they have to dissect what the LLM is doing analytically: how it's pulling variables, whether its analytic definitions are clear, etc. I do think there are some nice UI designs you could come up with to make analytic output faster / easier to verify...

Yep! Working on it! 😎

Good post! I mainly use Claude 3 Sonnet via Perplexity AI, and the growing capability of these models is definitely creating an automation cognitive bias in me, leading me to spend less time verifying the outputs because of how impressive they are most of the time. Most of my use cases are not analytical; they're more research around texts.

1. The following prompt for my Perplexity AI account does a fairly good job at AI explainability. There are other technical methods as well, like SHAP and decision trees. https://www.perplexity.ai/search/what-are-common-technical-meth-XjlMH5PyRrunNlZ_hqEKVg

"AI explainability (What were the most important variables and factors impacting this prompt, and what percentage weights were assigned to the variables and factors used in driving your answer?)

Also summarize the attention weights into a simple table and show which parts of the input the model is focusing on when generating the output."
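To make the feature-importance idea behind methods like SHAP concrete, here is a minimal "mean-ablation" sketch in plain Python: replace one feature with its column mean and measure how far the model's output moves on average. The model, its weights, and the feature names are all invented for illustration; SHAP computes these attributions far more rigorously.

```python
# Mean-ablation feature importance: a toy cousin of SHAP. Replace one
# feature with its column mean and measure the average absolute change
# in the model's output. Weights and feature names are made up.

def model(row):
    # Hypothetical scoring model with invented weights.
    return 3.0 * row["income"] + 1.0 * row["age"] + 0.1 * row["zip_noise"]

def ablation_importance(model, rows, feature_names):
    baseline = [model(r) for r in rows]
    importances = {}
    for name in feature_names:
        mean_val = sum(r[name] for r in rows) / len(rows)
        # Score every row with this one feature replaced by its mean.
        perturbed = [model({**r, name: mean_val}) for r in rows]
        importances[name] = sum(
            abs(b - p) for b, p in zip(baseline, perturbed)
        ) / len(rows)
    return importances

rows = [{"income": i, "age": 50 - i, "zip_noise": i % 3} for i in range(10)]
imps = ablation_importance(model, rows, ["income", "age", "zip_noise"])
# "income" dominates because its (invented) weight is largest
```

A prompt like the one above asks the model to self-report this kind of attribution; a computed version like this is what you'd verify it against.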

2. Another useful approach for verification is to feed the results of one LLM into another LLM. LLMs have different model weights based on their training data, parameter size, and the fine-tuning work done on them.
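That cross-model check can be sketched as a tiny routine: a second, independently trained model re-answers the question, and the first model's answer is only accepted if they agree. The `ask_model_a` / `ask_model_b` functions below are hypothetical stubs standing in for real API calls to two different providers.

```python
# Cross-model verification sketch. ask_model_a / ask_model_b are
# hypothetical stubs standing in for calls to two independent models.

def ask_model_a(question):
    """Stub: primary model produces a candidate answer."""
    return "4"

def ask_model_b(question, candidate):
    """Stub: second model answers independently and compares."""
    own_answer = "4"
    return own_answer == candidate

def cross_verified_answer(question):
    answer = ask_model_a(question)
    if ask_model_b(question, answer):
        return answer, "verified"
    return answer, "flagged for human review"

result = cross_verified_answer("What is 2 + 2?")
```

Agreement between independently trained models isn't proof of correctness (they can share training-data blind spots), but it cheaply filters a lot of one-off errors.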

3. Lastly, today's release of Mistral Agents shows the rise of agents built specifically to verify the output of other agents, which might help lessen the verification bottleneck you described above.

https://docs.mistral.ai/capabilities/agents/

Use case 4: Data analytical multi-agent workflow
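The generator/verifier agent pattern behind that use case might look roughly like the loop below. Both "agents" are stubbed out as plain functions here; this is a sketch of the pattern, not the actual Mistral Agents API, and the draft texts are invented for illustration.

```python
# Generator/verifier agent loop sketch. In a real workflow each stub
# would be an agent/LLM API call; here they are plain functions.

def generator_agent(task, feedback=None):
    """Stub analyst agent: first draft has an error, fixed after feedback."""
    return "revenue grew 12%" if feedback else "revenue grew 21%"

def verifier_agent(task, draft):
    """Stub checker agent: validates the draft against ground truth."""
    if "12%" in draft:
        return True, None
    return False, "recheck the growth figure against the source table"

def run_with_verification(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        draft = generator_agent(task, feedback)
        ok, feedback = verifier_agent(task, draft)
        if ok:
            return draft
    return None  # escalate to a human after repeated failures

final = run_with_verification("summarize Q3 revenue")
```

The key design choice is the cap on rounds: a verifier agent reduces how often a human has to check, but unresolved disagreements still get escalated rather than silently accepted.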
