Defending AI Agents Against Deception

This research addresses critical security vulnerabilities in AI agents that can be exploited through deceptive content in their operational environment.

Identifies how adversaries can manipulate AI agents through deceptive pop-ups and visual content
Demonstrates that simple defensive instructions like "ignore deceptive elements" are insufficient
Proposes effective in-context defense mechanisms that significantly improve agent resistance to manipulation
Provides a comprehensive evaluation framework for assessing agent security

As AI agents become more integrated into critical systems, these defense mechanisms are essential for protecting sensitive operations and maintaining user trust in automated systems.

In-Context Defense in Computer Agents: An Empirical Study