Defeating Watermarking in LLMs

This research reveals critical vulnerabilities in current LLM watermarking techniques, showing how watermarked text can be manipulated to bypass detection.

Watermarking without input repetition masking is susceptible to evasion through content modification
Human-like language adaptation patterns can be exploited to remove watermarks
Attackers can manipulate watermarked text by repeating inputs in specific ways
The study proposes improved watermarking techniques that resist these evasion methods
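
To make the idea of repetition masking concrete, here is a minimal sketch of a green-list style detector that skips any token pair already present in the input, as well as pairs it has already scored. It is an illustration under stated assumptions, not the method from the study: the bigram hashing scheme, the `GAMMA` value, and the names `is_green` and `z_score` are all hypothetical.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary placed on the "green" list


def is_green(prev_token, token, gamma=GAMMA):
    """Toy green-list test: hash the (previous token, token) pair into [0, 1)."""
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def z_score(tokens, prompt_tokens=(), gamma=GAMMA):
    """Return (z, scored), where scored is the number of bigrams counted.

    Bigrams that also occur in the prompt, or that were already scored,
    are masked out so that repeated input cannot move the statistic.
    """
    prompt_bigrams = set(zip(prompt_tokens, prompt_tokens[1:]))
    seen = set()
    green = 0
    scored = 0
    for bigram in zip(tokens, tokens[1:]):
        if bigram in prompt_bigrams or bigram in seen:
            continue  # masked: copied from the input or a repeat
        seen.add(bigram)
        scored += 1
        green += is_green(bigram[0], bigram[1], gamma)
    if scored == 0:
        return 0.0, 0  # nothing left to score after masking
    z = (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
    return z, scored
```

With masking on, text that only echoes its input contributes no evidence: `z_score(tokens, prompt_tokens=tokens)` scores zero bigrams. A real detector would work on model token IDs and a keyed hash rather than on strings and plain SHA-256.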

For security professionals, this research highlights the ongoing cat-and-mouse game in synthetic content detection, demonstrating the need for more robust watermarking solutions that account for adaptive text manipulation.

Watermarking Needs Input Repetition Masking