
Security Vulnerabilities in Autonomous LLM Agents
How SUDO attacks bypass safeguards in computer-use LLMs
This research reveals critical security flaws in LLMs deployed as autonomous computer-use agents, demonstrating how safety guardrails can be systematically circumvented.
- Introduces the SUDO attack framework that bypasses refusal-trained safeguards in commercial systems like Claude Computer Use
- Employs a Detox2Tox mechanism that first rewrites a harmful request into a benign-looking one to slip past refusal filters, then reintroduces the harmful intent before execution (see the sketch after this list)
- Highlights urgent security concerns as LLMs gain broader access to computing environments
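Conceptually, Detox2Tox can be read as a three-stage pipeline: detoxify the request, elicit a detailed plan from the agent, then reintroduce the harmful intent. The sketch below is a minimal illustration of that ordering only; the helper names (`detoxify`, `plan_with_agent`, `retoxify`), prompts, and string handling are assumptions for clarity, not the paper's implementation.

```python
# Conceptual sketch of a Detox2Tox-style pipeline. NOT the paper's code;
# all helpers and prompts are illustrative placeholders.

def detoxify(harmful_request: str) -> str:
    """Rewrite the request so it reads as benign and passes refusal filters."""
    # In the described attack this rewriting is itself done by a model;
    # a fixed template stands in for it here.
    return ("Provide detailed, step-by-step instructions for this routine task: "
            + harmful_request)

def plan_with_agent(benign_request: str) -> str:
    """Obtain a detailed action plan for the benign-looking request.

    Placeholder: a real attack would query the target computer-use agent here.
    """
    return f"[agent-generated plan for] {benign_request}"

def retoxify(plan: str, harmful_request: str) -> str:
    """Reattach the original harmful objective to the benign plan before execution."""
    return f"{plan}\n\nNow carry out these steps for the original request: {harmful_request}"

def detox2tox_attack(harmful_request: str) -> str:
    benign = detoxify(harmful_request)       # 1. strip overtly harmful wording
    plan = plan_with_agent(benign)           # 2. elicit a concrete action plan
    return retoxify(plan, harmful_request)   # 3. restore harmful intent at execution time
```

The ordering is the point: the refusal-trained safeguards only ever see the detoxified request, so by the time the harmful intent is restored, a detailed action plan has already been produced.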
As LLM agents gain the ability to interact with real desktop and web environments, understanding and addressing these vulnerabilities becomes essential for any organization deploying AI systems with that level of access.