Securing Code LLMs Against Harmful Content

This research introduces a specialized framework for testing and improving content moderation in Code Large Language Models to prevent harmful naming patterns in generated code.

Identifies critical security vulnerabilities in current code generation moderation systems
Presents CHT (Code Harmfulness Testing) framework that automatically detects harmful code naming patterns
Demonstrates effectiveness through evaluation of popular Code LLMs
Provides actionable insights for strengthening security guardrails in generative AI for developers

This research is vital for organizations implementing Code LLMs, as it addresses specific security risks posed by malicious naming patterns that could bypass conventional content filters, helping to build safer AI coding assistants.

Automated Harmfulness Testing for Code Large Language Models