Securing LLMs with Instruction Hierarchy

Securing LLMs with Instruction Hierarchy

A novel embedding approach to prevent prompt attacks

Instructional Segment Embedding creates a hierarchical structure for LLM inputs, significantly improving safety and security against common attacks.

  • Establishes priority levels for different instruction types (system messages, user prompts, data)
  • Prevents lower-priority user prompts from overriding critical system instructions
  • Demonstrates effectiveness against prompt injection, extraction, and harmful requests
  • Provides a structural solution rather than relying solely on prompt engineering

This research addresses a fundamental security vulnerability in LLM architecture, offering a more robust defense against manipulation attempts - essential as LLMs become more integrated into critical business applications.

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

15 | 45