Teaching Code LLMs to Learn from Mistakes

This research introduces RLEF (Reinforcement Learning with Execution Feedback), an approach that trains code-generating LLMs to improve their solutions autonomously based on execution results.

Trains LLMs to effectively leverage execution feedback through reinforcement learning
Achieves 40% improvement in complex coding tasks like SQL challenges
Demonstrates self-correction capabilities without human intervention
Outperforms previous methods in generating code that works on the first try
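
The generate, execute, and retry loop behind these points can be illustrated with a minimal sketch. This is not the authors' implementation: the names `run_tests`, `solve_with_feedback`, and `toy_generate` are hypothetical, the "model" is a hard-coded stub, and the binary end-of-episode reward is an assumption made for illustration. In a real RL setup, the returned reward would drive a policy update of the LLM.

```python
import subprocess
import sys
from typing import Callable, List, Tuple

Tests = List[Tuple[str, str]]  # (stdin text, expected stdout) pairs


def run_tests(code: str, tests: Tests, timeout: float = 5.0) -> Tuple[bool, str]:
    """Execute candidate code on each test; return (passed, textual feedback)."""
    for stdin_text, expected in tests:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, f"Timed out on input {stdin_text!r}"
        if proc.returncode != 0:
            return False, f"Runtime error on input {stdin_text!r}: {proc.stderr.strip()}"
        actual = proc.stdout.strip()
        if actual != expected.strip():
            return False, f"Wrong answer on input {stdin_text!r}: expected {expected!r}, got {actual!r}"
    return True, "All tests passed"


def solve_with_feedback(
    generate: Callable[[str, List[str]], str],
    task: str,
    tests: Tests,
    max_turns: int = 3,
) -> Tuple[str, float, int]:
    """Ask the generator for code, run it, and feed failures back as context.

    Returns (last code, reward, turns used). The reward is 1.0 if the final
    attempt passes all tests and 0.0 otherwise (an assumed, simplified scheme).
    """
    history: List[str] = []  # execution feedback from earlier attempts
    code = ""
    for turn in range(1, max_turns + 1):
        code = generate(task, history)
        passed, feedback = run_tests(code, tests)
        if passed:
            return code, 1.0, turn
        history.append(feedback)
    return code, 0.0, max_turns


def toy_generate(task: str, history: List[str]) -> str:
    """Stub standing in for an LLM: wrong on the first try, fixed after feedback."""
    if not history:
        return "print(int(input()) + 1)"  # buggy first attempt
    return "print(int(input()) * 2)"  # corrected attempt
```

Calling `solve_with_feedback(toy_generate, "Double the input integer.", [("3\n", "6"), ("5\n", "10")])` runs the buggy attempt, records the failure message, and then accepts the corrected attempt on the second turn.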

This work is a step toward more reliable AI coding assistants that learn from their mistakes and iteratively refine solutions, which matters for real-world engineering applications where code must be robust and functional.

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning