LLMs and Buggy Code: Testing Implications

How incorrect code affects test generation by language models

This research investigates how code correctness influences language models' ability to generate effective test cases, a question with direct implications for software engineering practice.

  • LLMs generate more accurate test cases when prompted with correct code than with buggy code
  • Study evaluated 11 language models (5 open-source, 6 closed-source) across 3 benchmark datasets
  • Findings quantify the gap in test quality when models work with flawed code
  • Results highlight important considerations for automated testing workflows and developer tools

For software engineers, this research provides practical guidance on when and how to integrate LLM-based test generation into development processes, especially when dealing with potentially buggy code.
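As a hedged illustration of one such integration point (not the study's own pipeline), the sketch below prompts a model with a function's signature and docstring rather than its possibly buggy body, which is one way to work around the reported quality gap. The helper names build_prompt and generate_tests, the prompt wording, and the stand-in llm callable are all assumptions made for illustration.

import inspect
from typing import Callable


def build_prompt(spec: str) -> str:
    """Ask for pytest tests from a specification (signature + docstring only).

    Prompting on the spec rather than the implementation is one way to avoid
    the quality drop reported when models are shown buggy code.
    """
    return (
        "Write pytest unit tests for the Python function specified below. "
        "Return only code.\n\n" + spec
    )


def generate_tests(func: Callable, llm: Callable[[str], str]) -> str:
    """Generate test code for `func` without exposing its (possibly buggy) body."""
    doc = inspect.getdoc(func) or "No docstring available."
    spec = f'def {func.__name__}{inspect.signature(func)}:\n    """{doc}"""'
    return llm(build_prompt(spec))


if __name__ == "__main__":
    def median(xs: list) -> float:
        """Return the median of a non-empty list of numbers."""
        xs = sorted(xs)
        return xs[len(xs) // 2]  # subtly wrong for even-length inputs

    # `llm` stands in for whatever model client the team uses (hypothetical);
    # here it simply echoes the prompt so the sketch runs without an API key.
    print(generate_tests(median, llm=lambda prompt: prompt))

In practice the echo lambda would be replaced by a call to whichever model API the team uses, and the returned test code would be written to a file and executed with pytest against the implementation under test.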

Measuring the Influence of Incorrect Code on Test Generation