LLMs as Security Guards

Assessing AI's effectiveness in multi-language vulnerability detection

This research provides the first comprehensive benchmark of large language models (LLMs) for detecting software vulnerabilities across multiple programming languages.

  • LLMs can effectively detect vulnerabilities across seven programming languages, with GPT-4 achieving the best overall performance
  • A combined approach using chain-of-thought reasoning and multiple programming contexts significantly enhances detection accuracy
  • Performance varies notably across different vulnerability types and programming languages
  • Models demonstrate strong zero-shot transfer capabilities when fine-tuned on one language and tested on others
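The chain-of-thought approach in the second bullet can be sketched as a prompt that asks the model to reason through the code before committing to a verdict. The prompt wording, step structure, and the helper names below are illustrative assumptions for this summary, not the paper's actual protocol or prompts:

```python
# Illustrative sketch of a chain-of-thought (CoT) vulnerability-detection
# prompt. The wording, steps, and function names are assumptions; the
# benchmark's exact prompts and model calls are not reproduced here.

def build_cot_prompt(code: str, language: str) -> str:
    """Assemble a CoT prompt asking the model to reason step by step
    before giving a final vulnerable/safe verdict."""
    return (
        f"You are a security auditor reviewing {language} code.\n"
        "Think step by step:\n"
        "1. Identify the inputs the code receives and how they are used.\n"
        "2. Check each use against common vulnerability classes\n"
        "   (e.g. injection, buffer overflow, path traversal).\n"
        "3. End with a single line: VERDICT: VULNERABLE or VERDICT: SAFE.\n\n"
        f"Code:\n```{language}\n{code}\n```"
    )

def parse_verdict(model_output: str) -> bool:
    """Map the model's final verdict line to a boolean (True = vulnerable)."""
    for line in reversed(model_output.strip().splitlines()):
        if line.startswith("VERDICT:"):
            return "VULNERABLE" in line
    raise ValueError("no verdict line found in model output")

snippet = "strcpy(buf, user_input);  /* fixed-size buf */"
prompt = build_cot_prompt(snippet, "c")
# `prompt` would be sent to an LLM; parse_verdict() reads its reply.
```

Structuring the reply around a fixed `VERDICT:` line keeps the free-form reasoning while making the final classification machine-parseable, which is what lets such prompts be scored automatically across many languages and vulnerability types.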

These findings highlight LLMs' potential to transform software security practices by providing automated, cross-language vulnerability detection capabilities that can be integrated into development workflows.

Benchmarking Large Language Models for Multi-Language Software Vulnerability Detection
