Automated Attacks Against LLM Systems

Automated Attacks Against LLM Systems

Using Multi-Agent Systems to Test LLM Security

This research introduces a framework for systematically testing LLMs against prompt leakage attacks using cooperative AI agents that work together to expose system-level prompts.

  • Implements a multi-agent approach using AG2 (formerly AutoGen) to probe and exploit target LLMs
  • Defines prompt leakage as a critical security threat to LLM deployments
  • Provides a methodology for evaluating and strengthening LLM defenses
  • Demonstrates how agentic teams can collaborate to bypass security measures

For security teams, this research highlights the urgent need for robust defensive mechanisms against sophisticated attacks that can extract proprietary information from deployed LLM systems.

Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach

29 | 45