LLMs as Bug Replicators

How language models perpetuate coding errors during code completion

This study reveals that Large Language Models frequently reproduce bugs when completing code in bug-prone contexts, raising significant concerns for software development and security.

  • LLMs show a strong tendency to replicate bugs that appear in their training data (see the sketch after this list)
  • The study evaluated seven language models on bug-prone code completion tasks
  • Models perform significantly worse on bug-prone code compared to non-buggy contexts
  • Different model architectures and sizes demonstrated varying susceptibility to reproducing bugs

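To make the replication failure mode concrete, here is a minimal hypothetical sketch (the function and the buggy variant are invented for illustration, not drawn from the paper's benchmark): when a prompt ends just before a conditional that is frequently written incorrectly in public code, a completion model is statistically drawn toward the common-but-wrong form.

```python
# Hypothetical sketch of a bug-prone completion context; the function
# and the defect are invented for illustration, not taken from the
# study's benchmark.

def days_in_february(year: int) -> int:
    """Return 28 or 29 depending on whether `year` is a leap year."""
    # Suppose the prompt ends just before the condition below. A model
    # that has seen many simplified (buggy) leap-year checks may complete
    # it as `if year % 4 == 0:`, which misses the century rule, instead
    # of the correct Gregorian test:
    if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
        return 29
    return 28
```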
For engineering teams, this underscores the need for robust testing and validation when using AI code assistants in production, since automated completions can silently reintroduce known defects and security vulnerabilities; a minimal validation gate is sketched below.
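One simple guardrail is to gate model-generated changes behind the project's existing test suite before accepting them. The sketch below assumes a Python project tested with pytest; the repository path is a placeholder, and the gate is an illustrative pattern, not a method described in the paper.

```python
# Minimal sketch of a pre-merge check for model-generated code:
# accept a completion only if the repository's pytest suite still
# passes. The repo path below is a hypothetical placeholder.
import subprocess
import sys

def tests_pass(repo_dir: str) -> bool:
    """Return True iff the pytest suite in repo_dir passes."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-q"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

# Usage: reject the suggested edit if it breaks the suite.
# if not tests_pass("/path/to/repo"):
#     print("Rejecting completion: regression detected")
```

Static analysis or linting can be added to the same gate; the point is that no AI-suggested change lands without an automated check.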

LLMs are Bug Replicators: An Empirical Study on LLMs' Capability in Completing Bug-prone Code
