
LLMs and Buggy Code: Testing Implications
How incorrect code affects test generation by language models
This research investigates how code correctness influences language models' ability to generate effective test cases, a question with direct consequences for software engineering practice.
- LLMs generate more accurate test cases when prompted with correct code than with buggy code
- Study evaluated 11 language models (5 open-source, 6 closed-source) across 3 benchmark datasets
- Findings quantify the gap in test quality that arises when models work with flawed code
- Results highlight important considerations for automated testing workflows and developer tools
For software engineers, this research provides practical guidance on when and how to integrate LLM-based test generation into development processes, especially when dealing with potentially buggy code.
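The core pitfall is easy to reproduce: a test derived from a buggy implementation tends to assert the bug's output as the expected value, so it passes on the broken code and would reject a correct fix. The sketch below is a minimal, hypothetical illustration of this effect; the function names and "generated" tests are invented for this example and are not taken from the study.

```python
# Hypothetical illustration (not from the study): tests generated from buggy
# code can encode the bug itself as the expected behaviour.

def average_fixed(values):
    """Correct implementation: arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def average_buggy(values):
    """Buggy implementation: off-by-one in the divisor."""
    return sum(values) / (len(values) + 1)


def test_generated_from_buggy_code():
    # The expected value 3.0 mirrors the buggy output (12 / 4),
    # not the true mean (4.0), so this test "validates" the bug.
    assert average_buggy([2, 4, 6]) == 3.0


def test_generated_from_correct_code():
    # The expected value reflects the intended behaviour.
    assert average_fixed([2, 4, 6]) == 4.0


if __name__ == "__main__":
    test_generated_from_buggy_code()
    test_generated_from_correct_code()
    # Cross-check: the bug-derived expectation fails against the fixed code,
    # i.e. that test would reject the correct implementation.
    try:
        assert average_fixed([2, 4, 6]) == 3.0
    except AssertionError:
        print("Bug-derived test rejects the correct implementation, as expected.")
```

In practice, this suggests validating the code under test (for example, against a specification or a trusted reference implementation) before treating LLM-generated assertions as ground truth.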
Measuring the Influence of Incorrect Code on Test Generation