
Securing LLMs with Instruction Hierarchy
A novel embedding approach to defend against prompt attacks
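Instructional Segment Embedding (ISE) tags each input token with an embedding that identifies which segment of the input it came from. This gives the model an explicit instruction hierarchy and improves its safety and security against common attacks.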
- Establishes priority levels for different input types (system messages, user prompts, data)
- Helps prevent lower-priority inputs, such as user prompts, from overriding critical system instructions
- Demonstrates effectiveness against prompt injection, prompt extraction, and harmful requests
- Provides a structural solution rather than relying solely on prompt engineering
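The core mechanism can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes, in the style of BERT's segment embeddings, that each segment type (system, user, data) has one learned vector that is added to the embedding of every token in that segment. The names `SEGMENTS`, `ISEEmbedding`, and `build_input` are hypothetical, and the tables are randomly initialized here rather than trained.

```python
import random

# Hypothetical segment IDs, one per priority level (system > user > data).
SEGMENTS = {"system": 0, "user": 1, "data": 2}


def make_table(rows, dim, seed):
    """Return a rows x dim table of small random values (stand-in for trained weights)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


class ISEEmbedding:
    """Token embedding plus an additive per-segment embedding (illustrative sketch)."""

    def __init__(self, vocab_size, dim, num_segments=len(SEGMENTS), seed=0):
        self.token_table = make_table(vocab_size, dim, seed)
        self.segment_table = make_table(num_segments, dim, seed + 1)

    def __call__(self, token_ids, segment_ids):
        if len(token_ids) != len(segment_ids):
            raise ValueError("every token needs exactly one segment id")
        # Element-wise sum of each token's vector and its segment's vector.
        return [
            [t + s for t, s in zip(self.token_table[tok], self.segment_table[seg])]
            for tok, seg in zip(token_ids, segment_ids)
        ]


def build_input(parts):
    """Flatten (segment_name, token_ids) pairs into parallel token and segment ID lists."""
    token_ids, segment_ids = [], []
    for name, toks in parts:
        token_ids.extend(toks)
        segment_ids.extend([SEGMENTS[name]] * len(toks))
    return token_ids, segment_ids


# Usage: toy token IDs for a system prompt, a user request, and retrieved data.
parts = [("system", [5, 6, 7]), ("user", [8, 9]), ("data", [10, 11, 9, 13])]
token_ids, segment_ids = build_input(parts)
emb = ISEEmbedding(vocab_size=32, dim=4)
vectors = emb(token_ids, segment_ids)
print(segment_ids)  # [0, 0, 0, 1, 1, 2, 2, 2, 2]

# Token 9 appears in both the user and the data segment but gets different input vectors.
print(vectors[4] != vectors[7])  # True
```

Because the segment vector is part of every token's input representation, instruction-like text inside the data segment no longer looks identical to the same text in the system prompt. Whether the model actually learns to give it lower priority depends on training, which this sketch does not cover.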
This research addresses a fundamental security vulnerability in LLM architecture, the lack of a built-in distinction between trusted instructions and untrusted input, and offers a more robust defense against manipulation attempts. This matters more and more as LLMs become integrated into critical business applications.
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy