Code Length Matters in Vulnerability Detection

How input size affects LLM security performance

This research evaluates how the length of tokenized code impacts LLM accuracy in detecting security vulnerabilities.

Key findings:

  • Models like GPT-4, Mistral, and Mixtral demonstrated robust performance regardless of input length
  • Other LLMs showed significant accuracy variations as input length changed
  • Chi-square tests confirmed that performance varied inconsistently with length across models
  • Researchers recommend that future LLM development focus on reducing sensitivity to input length
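
The chi-square analysis mentioned above can be sketched as follows. This is a minimal, hypothetical illustration (the data, bucket boundaries, and token counts are invented, not taken from the paper): bucket samples by tokenized length, build a contingency table of correct vs. incorrect detections per bucket, and compute the Pearson chi-square statistic to test whether detection accuracy depends on input length.

```python
def chi_square(table):
    """Pearson chi-square statistic for a 2-D contingency table."""
    rows = [sum(r) for r in table]            # row totals
    cols = [sum(c) for c in zip(*table)]      # column totals
    total = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / total   # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat

# Hypothetical per-bucket results: [correct, incorrect] detections for
# short (<256 tokens), medium (256-1024 tokens), and long (>1024 tokens) inputs.
table = [
    [90, 10],   # short
    [80, 20],   # medium
    [55, 45],   # long
]

stat = chi_square(table)
# A large statistic (compared against the chi-square distribution with
# 2 degrees of freedom) would indicate accuracy varies with input length.
print(f"chi-square statistic: {stat:.2f}")
```

A model whose accuracy is stable across length buckets would produce a near-zero statistic; in this invented example the long-input bucket drags accuracy down, yielding a large one.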

Security Implications: As organizations increasingly rely on LLMs for code security, understanding these performance variations is critical for developing reliable vulnerability detection workflows and choosing appropriate models.

Evaluating Large Language Models in Vulnerability Detection Under Variable Context Windows