Prompt Theft Detection

Prompt Detective is a statistical method for determining whether a given system prompt has been used by a third-party LLM, addressing a gap in prompt privacy protection.

Creates a membership inference framework to detect unauthorized prompt usage
Achieves high detection accuracy across various LLM architectures and contexts
Introduces techniques that withstand evasion attempts by adversaries
Establishes a foundation for prompt copyright protection in the emerging AI economy
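
As a rough illustration of how a membership test of this kind could be framed, the sketch below runs a permutation test on two groups of response embeddings: one group from the third-party model and one from a reference model running the known system prompt. The test statistic (cosine similarity between group means), the function names, and the assumption that responses have already been converted to numeric vectors are all illustrative choices made here, not details taken from this summary.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def permutation_test(group_a, group_b, n_permutations=1000, seed=0):
    """Hypothetical helper: p-value for the null hypothesis that both
    groups of embeddings come from the same distribution (same prompt).

    The statistic is the cosine similarity between the two group means.
    A small p-value means the observed similarity is unusually low
    compared with random relabelings of the pooled vectors.
    """
    rng = random.Random(seed)
    observed = cosine_similarity(mean_vector(group_a), mean_vector(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        sim = cosine_similarity(
            mean_vector(pooled[:n_a]), mean_vector(pooled[n_a:])
        )
        if sim <= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

Under these assumptions, a small p-value suggests the two groups of responses were not produced under the same system prompt, while a large one means the test found no evidence of a difference. How the responses are embedded is left open here; any text encoder that maps a response to a fixed-length vector would fit this sketch.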

Why it matters: As proprietary system prompts become valuable intellectual property in AI development, this research provides security professionals with concrete methods to detect theft, enabling companies to protect their prompt engineering investments.

Has My System Prompt Been Used? Large Language Model Prompt Membership Inference