Automated Attacks Against LLM Systems

This research introduces a framework for systematically testing LLMs against prompt leakage attacks using cooperative AI agents that work together to expose system-level prompts.

Implements a multi-agent approach using AG2 (formerly AutoGen) to probe and exploit target LLMs
Defines prompt leakage as a critical security threat to LLM deployments
Provides a methodology for evaluating and strengthening LLM defenses
Demonstrates how agentic teams can collaborate to bypass security measures

For security teams, this research highlights the urgent need for robust defensive mechanisms against sophisticated attacks that can extract proprietary information from deployed LLM systems.

Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach