Stress-testing research with AI, now super easy and fully automated
- Tomas Havranek

- May 31
Last December we shared a protocol for stress-testing (meta-)research by making AI models argue and keeping what survives. But running it by hand (opening several models, copying outputs back and forth) is a chore, and our original automation via a GPT agent was not reliable.
So we automated the protocol using Claude Code. After a one-time setup, running it takes a single sentence: you describe the task, and the skill (built with Zuzana Irsova) has Claude call OpenAI's Codex, runs the critique and synthesis rounds, and gives you a memo with the full trail of the debate.
We built it with meta-analysis in mind, but it works on any paper, proposal, or task. More generally, it is a simple way to have one AI check another's work. A worked example (WAIVE vs. MAIVE) is included so you can see what it produces.
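For readers who want a feel for the critique-and-synthesis loop, here is a minimal sketch in Python. It is not the skill's actual code: the function name `run_debate`, the prompt wording, and the number of rounds are all assumptions made for illustration, and the two models are plain callables standing in for Claude and Codex rather than real API calls.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: takes a prompt, returns text.
Model = Callable[[str], str]


def run_debate(task: str, author: Model, critic: Model,
               rounds: int = 2) -> Tuple[str, List[str]]:
    """Alternate critique and synthesis, recording every step in a trail."""
    trail: List[str] = []

    # The author model produces the first draft.
    draft = author(f"Task: {task}")
    trail.append(f"[draft] {draft}")

    for i in range(1, rounds + 1):
        # The critic model attacks the current draft.
        critique = critic(f"Critique this answer to '{task}':\n{draft}")
        trail.append(f"[critique {i}] {critique}")

        # The author model revises, keeping only what survives the critique.
        draft = author(
            f"Revise the answer, keeping what survives.\n"
            f"Answer:\n{draft}\nCritique:\n{critique}"
        )
        trail.append(f"[synthesis {i}] {draft}")

    # The memo is the final draft followed by the full debate trail.
    memo = draft + "\n\n--- Debate trail ---\n" + "\n".join(trail)
    return memo, trail
```

To try it, pass any two functions that map a prompt string to a response string. Swapping in real model calls is left to the reader, since how the skill invokes each model is not described here.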
The manual protocol (including a four-model version with Gemini and Grok in addition to Claude and GPT) is still here if you prefer copy-paste:
Try it on something you are working on and tell us how we could improve it!



I wholeheartedly recommend this new AI tool from Tomas and Zuzana. For those who, like me, are not used to working with the terminal versions of Claude Code and Codex, the setup may require some help; I got mine from ChatGPT. It was well worth it. I used the mad-research skill. The comments I received were good and led me to make some changes to my paper. I also compared it to two proprietary AI review sites: https://www.refine.ink/sign-up and https://reviewer3.com/. mad-research was comparable, if not superior. And, of course, it is free. mad-research will be part of the standard toolkit for writing papers in the future.