Code Length Matters in Vulnerability Detection

How input size affects LLM security performance

This research evaluates how the length of tokenized code impacts LLM accuracy in detecting security vulnerabilities.

Key findings:

  • Models like GPT-4, Mistral, and Mixtral demonstrated robust performance regardless of input length
  • Other LLMs showed significant accuracy variations as input length changed
  • Chi-square tests confirmed that performance varied inconsistently with length across models
  • Researchers recommend that future LLM development focus on reducing sensitivity to input length
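
The chi-square analysis mentioned above can be sketched as follows. This is a minimal, hypothetical illustration (the data, bucket boundaries, and token counts are invented, not taken from the paper): bucket samples by tokenized length, build a contingency table of correct vs. incorrect detections per bucket, and compute the Pearson chi-square statistic to test whether detection accuracy depends on input length.

```python
def chi_square(table):
    """Pearson chi-square statistic for a 2-D contingency table."""
    rows = [sum(r) for r in table]            # row totals
    cols = [sum(c) for c in zip(*table)]      # column totals
    total = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / total   # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat

# Hypothetical per-bucket results: [correct, incorrect] detections for
# short (<256 tokens), medium (256-1024 tokens), and long (>1024 tokens) inputs.
table = [
    [90, 10],   # short
    [80, 20],   # medium
    [55, 45],   # long
]

stat = chi_square(table)
# A large statistic (compared against the chi-square distribution with
# 2 degrees of freedom) would indicate accuracy varies with input length.
print(f"chi-square statistic: {stat:.2f}")
```

A model whose accuracy is stable across length buckets would produce a near-zero statistic; in this invented example the long-input bucket drags accuracy down, yielding a large one.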

Security Implications: As organizations increasingly rely on LLMs for code security, understanding these performance variations is critical for developing reliable vulnerability detection workflows and choosing appropriate models.

Evaluating Large Language Models in Vulnerability Detection Under Variable Context Windows