When AI fails, don't just retry. Have it reflect on what went wrong, remember the lesson, and try again with that insight. Learning from mistakes without retraining.
When a student gets an exam question wrong, a good teacher doesn't just say "try again." They ask "what went wrong?" and "what would you do differently?" That reflection is what makes the next attempt better, not just another random guess.
Reflexion applies this to AI agents. After a failed attempt, the AI generates a natural-language reflection — a verbal analysis of what went wrong and what to change. That reflection is stored in memory. On the next attempt, the AI reads its past reflections before acting, so each retry is informed by specific lessons learned. This simple addition took GPT-4's code generation accuracy from 67% to 91%.
This composition builds on:
ReAct and Check Your Work.

Reflexion wraps the ReAct agent loop in an outer learning cycle: attempt, evaluate, reflect on failure, store the lesson, and retry with that context. The self-critique becomes persistent memory.
The Actor: a ReAct-style agent that attempts the task. On retries, it receives past reflections as additional context — essentially reading its own "lessons learned" before trying again.
The Evaluator: scores the attempt. For code, this is running tests. For games, it's the environment outcome. For reasoning, it could be an LLM judge. The key: clear pass/fail signals.
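For code tasks, the evaluator can be a plain test runner. A minimal sketch, assuming the candidate arrives as source text; the function name `run_tests` and the test-case shape are illustrative, not from the Reflexion paper:

```python
# Test-based evaluator sketch: define the candidate by exec-ing its
# source, run it against unit tests, and return a pass/fail flag plus
# verbal feedback that the reflection step can use.
def run_tests(candidate_src, fn_name, cases):
    namespace = {}
    try:
        exec(candidate_src, namespace)       # define the candidate function
        fn = namespace[fn_name]
        for args, expected in cases:
            got = fn(*args)
            if got != expected:
                return False, f"{fn_name}{args!r} returned {got!r}, expected {expected!r}"
        return True, "all tests passed"
    except Exception as exc:                 # crashes count as failures too
        return False, f"{type(exc).__name__}: {exc}"
```

The feedback string matters as much as the boolean: it is the raw material the self-reflection step turns into a lesson.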
The Self-Reflection step: when the evaluator says "fail," it generates a verbal analysis: what went wrong, why, and what to do differently next time. Concrete, actionable insights — not vague "try harder."
The Memory: stores reflections across attempts. Kept small (1–3 reflections) so it fits in context. Each retry reads the full memory, carrying forward all lessons learned so far.
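Wired together, the four components form a short loop. A minimal sketch with the model calls stubbed out: `act`, `evaluate`, and `reflect` are hypothetical stand-ins for the actor, the evaluator, and the reflection prompt, and the parameter names are assumptions:

```python
# Reflexion outer loop: attempt, evaluate, reflect on failure,
# store the lesson, retry with the lessons in context.
def reflexion_loop(task, act, evaluate, reflect, max_attempts=3, memory_size=3):
    memory = []                                  # episodic memory of verbal lessons
    for _ in range(max_attempts):
        result = act(task, memory)               # actor reads past reflections first
        ok, feedback = evaluate(result)          # clear pass/fail signal
        if ok:
            return result
        lesson = reflect(task, result, feedback)  # verbal analysis of the failure
        memory = (memory + [lesson])[-memory_size:]  # keep only the last few lessons
    return None                                  # all attempts exhausted
```

Capping the memory at a few reflections is a deliberate trade-off: it keeps the lessons inside the context window while still carrying the most recent insights forward.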
Task: "Write a function that finds the second-largest number in a list."
The second attempt wasn't just another guess — it was guided by a specific analysis of what went wrong.
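To make that concrete, here is one hypothetical way the two attempts could look. Both functions are illustrative (the source doesn't give the actual code), and this reading of "second-largest" treats duplicate values as one:

```python
# Hypothetical first attempt: plausible, but wrong on duplicates and
# it crashes on lists with fewer than two elements.
def second_largest_v1(xs):
    return sorted(xs)[-2]        # [5, 5, 3] gives 5, not 3

# After a reflection like "I returned a duplicate of the maximum and
# never checked the input length," the retry handles both cases:
def second_largest_v2(xs):
    distinct = sorted(set(xs))   # collapse duplicates first
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values")
    return distinct[-2]
```

The difference between the two versions maps directly onto the reflection: each named failure becomes a specific fix.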
Adding self-reflection and memory to the same model, with no retraining, is what produces the jump from 67% to 91%.
A naive retry loop repeats the same mistakes because it has no memory of what went wrong. Reflexion gives the agent episodic memory — specific, verbal lessons from past failures that shape future attempts. Each retry starts from a better understanding.
The verbal format is key. Instead of opaque numerical signals, the AI writes reflections in plain language: "I forgot to handle edge cases" or "I searched for the wrong keyword." These are exactly the kind of insights that make the next attempt meaningfully different from the last.
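One way to elicit that verbal format is a dedicated reflection prompt. The wording below is an assumption, not the paper's actual prompt; the point is that it demands concrete, actionable language:

```python
# Hypothetical reflection prompt: asks for a specific lesson in plain
# language rather than a vague resolution to do better.
REFLECTION_PROMPT = """\
You attempted the following task and failed.

Task: {task}
Your attempt: {attempt}
Failure signal: {feedback}

In 2-3 sentences, explain what went wrong and what you will do
differently next time. Be concrete; do not just say "be more careful".
"""

def build_reflection_prompt(task, attempt, feedback):
    return REFLECTION_PROMPT.format(task=task, attempt=attempt, feedback=feedback)
```

The filled-in prompt would be sent to the model, and its answer stored verbatim in memory as the lesson for the next attempt.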
Try. Evaluate. If it failed, reflect on why and remember the lesson. Retry with that memory. Each attempt is informed by specific insights from past failures — not just another blind guess.
Reflexion extends ReAct with an outer learning loop. While ReAct handles a single attempt (think-act-observe), Reflexion wraps multiple attempts with evaluation and reflection between them. It's also a more sophisticated version of Check Your Work — instead of just reviewing and fixing in one pass, it generates lasting insights stored in memory.
More advanced systems build on Reflexion: LATS adds tree search over multiple reasoning paths, evaluating and reflecting across an entire search tree rather than just sequential retries. If Reflexion is learning from your mistakes, LATS is exploring all the paths you could have taken.