Invisible Fingerprints: Black-Box Watermarking for LLMs

Detecting AI-generated text without access to model internals

This work presents a black-box watermarking technique: AI-generated text can be watermarked and later detected using only the model's sampled outputs, without access to its internal probability distributions.

  • Creates distortion-free watermarks by manipulating only the sampling process
  • Enables nested watermarking where multiple watermarks can be applied sequentially
  • Achieves strong statistical detection while maintaining text quality
  • Works with any API-based LLM access where only text outputs are available
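
To make the black-box idea concrete, here is a minimal sketch of one way a watermark can be embedded using text outputs alone: draw several candidate completions, then keep the one whose n-grams score highest under a secret key. This is an illustration under stated assumptions, not the paper's algorithm, and it makes no claim to the distortion-free property. All names (`ngram_values`, `keyed_score`, `watermarked_sample`, `detection_z`, `sample_fn`, `secret`) are hypothetical.

```python
import hashlib
import hmac
import math


def ngram_values(text, secret, n=2):
    """Map each distinct word n-gram to a keyed pseudo-random value in about [0, 1)."""
    words = text.split()
    grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    values = []
    for gram in sorted(grams):
        digest = hmac.new(secret, gram.encode("utf-8"), hashlib.sha256).digest()
        values.append(int.from_bytes(digest[:8], "big") / 2**64)
    return values


def keyed_score(text, secret, n=2):
    """Mean keyed value over the text's n-grams (0.5 if there are none)."""
    values = ngram_values(text, secret, n)
    return sum(values) / len(values) if values else 0.5


def watermarked_sample(sample_fn, secret, m=8):
    """Draw m candidates from a black-box sampler and keep the top-scoring one.

    sample_fn is an assumed zero-argument callable returning one completion,
    for example a thin wrapper around a text-only API call.
    """
    candidates = [sample_fn() for _ in range(m)]
    return max(candidates, key=lambda c: keyed_score(c, secret))


def detection_z(text, secret, n=2):
    """z-score of the mean keyed value against the no-watermark expectation.

    Without a watermark the values behave roughly like independent uniform
    draws (mean 0.5, variance 1/12), so a large positive z suggests a watermark.
    """
    values = ngram_values(text, secret, n)
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return (mean - 0.5) * math.sqrt(12 * len(values))
```

Detection in this sketch needs only the text and the secret key, never the model. In practice one would compare `detection_z` against a threshold chosen for an acceptable false-positive rate; longer texts and larger `m` should make the signal easier to separate from chance.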

This research supports content authentication: organizations can verify text provenance and detect AI-generated content in practical deployments where model internals are inaccessible.

A Watermark for Black-Box Language Models
