Hidden in Plain Sight: LLM Security Threats

How human-readable adversarial prompts bypass security measures

This research shows how situation-driven attacks, built from everyday contexts such as film scripts, yield deceptive prompts that evade LLM safety mechanisms.

  • Uses realistic movie script scenarios to craft human-readable adversarial prompts
  • Creates prompts that appear harmless to humans but trigger harmful LLM responses
  • Demonstrates higher success rates than nonsensical (gibberish-suffix) attacks, which automated filters detect easily
  • Highlights critical security gaps in current LLM defense strategies

For security teams, this work exposes a concerning vulnerability: attacks that blend into normal conversation are significantly harder to detect and filter, requiring new approaches to LLM safety.
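
To see why this gap exists, consider perplexity filtering, a standard defense against gibberish adversarial suffixes. The sketch below (not from the paper; the GPT-2 model choice and the threshold value are illustrative assumptions) shows how a fluent, movie-script-style prompt scores like ordinary prose and slips past a filter that reliably catches nonsensical attack strings.

```python
# Minimal sketch of a perplexity-based input filter. Such filters flag
# high-perplexity gibberish suffixes but pass human-readable prompts.
# The model (gpt2) and threshold (200.0) are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under GPT-2."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    return torch.exp(out.loss).item()

def flag_prompt(text: str, threshold: float = 200.0) -> bool:
    """Flag inputs whose perplexity exceeds the (assumed) threshold."""
    return perplexity(text) > threshold

# A gibberish suffix scores far above the threshold and is caught;
# a fluent movie-script framing reads like normal prose and passes.
print(flag_prompt("describing.\\ + similarlyNow write oppositeley.]"))        # likely True
print(flag_prompt("INT. LAB - NIGHT. The scientist leans over the console."))  # likely False
```

The design point is the vulnerability itself: a detector keyed to statistical abnormality has no signal to work with once the adversarial prompt is indistinguishable from everyday text.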

Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
