
LLMs and Buggy Code: Testing Implications
How incorrect code affects test generation by language models
This research investigates how code correctness influences language models' ability to generate effective test cases, a question with direct consequences for software engineering practice.
- LLMs generate more accurate test cases when prompted with correct code than with buggy code
- Study evaluated 11 language models (5 open-source, 6 closed-source) across 3 benchmark datasets
- Findings quantify the gap in test quality that arises when models work with flawed code
- Results highlight important considerations for automated testing workflows and developer tools
For software engineers, this research provides practical guidance on when and how to integrate LLM-based test generation into development processes, especially when dealing with potentially buggy code.
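The core pitfall is easy to reproduce: a test derived from a buggy implementation tends to assert the bug's output as the expected value, so it passes on the broken code and would reject a correct fix. The sketch below is a minimal, hypothetical illustration of this effect; the function names and "generated" tests are invented for this example and are not taken from the study.

```python
# Hypothetical illustration (not from the study): tests generated from buggy
# code can encode the bug itself as the expected behaviour.

def average_fixed(values):
    """Correct implementation: arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def average_buggy(values):
    """Buggy implementation: off-by-one in the divisor."""
    return sum(values) / (len(values) + 1)


def test_generated_from_buggy_code():
    # The expected value 3.0 mirrors the buggy output (12 / 4),
    # not the true mean (4.0), so this test "validates" the bug.
    assert average_buggy([2, 4, 6]) == 3.0


def test_generated_from_correct_code():
    # The expected value reflects the intended behaviour.
    assert average_fixed([2, 4, 6]) == 4.0


if __name__ == "__main__":
    test_generated_from_buggy_code()
    test_generated_from_correct_code()
    # Cross-check: the bug-derived expectation fails against the fixed code,
    # i.e. that test would reject the correct implementation.
    try:
        assert average_fixed([2, 4, 6]) == 3.0
    except AssertionError:
        print("Bug-derived test rejects the correct implementation, as expected.")
```

In practice, this suggests validating the code under test (for example, against a specification or a trusted reference implementation) before treating LLM-generated assertions as ground truth.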
Measuring the Influence of Incorrect Code on Test Generation