LLMs as Backend Developers: A Security Risk?

Evaluating the security and correctness of LLM-generated backend applications

BaxBench is a novel benchmark that evaluates whether LLMs can generate complete, production-ready backend applications, with a particular focus on security vulnerabilities in the generated code.

  • More than half of LLM-generated backend code contains security vulnerabilities
  • Current LLMs can generate functional code but struggle with complex tasks involving database interactions and authentication
  • Security issues include SQL injections, broken access controls, and insecure session management
  • LLMs require significant guidance and prompting to produce secure, production-ready backends
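The SQL injection class called out above follows a well-known pattern: user input interpolated directly into a query string instead of being passed as a bound parameter. The sketch below illustrates the difference with Python's built-in `sqlite3`; the schema and function names are hypothetical, not taken from BaxBench itself.

```python
import sqlite3

def setup_db():
    # In-memory database with a minimal, illustrative users table
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("alice",), ("admin",)])
    return conn

def find_user_vulnerable(conn, name):
    # UNSAFE: input is spliced into the SQL string, so crafted input
    # can rewrite the query -- the classic injection pattern
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # SAFE: a parameterized query treats the input strictly as data
    return conn.execute("SELECT name FROM users WHERE name = ?",
                        (name,)).fetchall()

conn = setup_db()
payload = "' OR '1'='1"  # bypasses the WHERE clause in the unsafe version
print(find_user_vulnerable(conn, payload))  # returns every user row
print(find_user_safe(conn, payload))        # returns no rows
```

The fix is a one-line change, which is exactly why such flaws are easy for a human reviewer to catch but, per the findings above, still common in generated backends.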

These findings highlight critical security concerns for organizations considering automated code generation, emphasizing the need for human review and security testing of AI-generated applications.

BaxBench: Can LLMs Generate Correct and Secure Backends?
