Stealing PII Through Model Merging

This research reveals how malicious actors can extract personally identifiable information (PII) from aligned language models through seemingly legitimate model merging techniques.

- Introduces the Merger-as-a-Stealer attack framework, which exploits model merging procedures
- Demonstrates that targeted PII can be extracted from safety-aligned LLMs
- Shows how attackers can pose as legitimate merging participants while conducting unauthorized data extraction
- Highlights critical security gaps in current model merging practices
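
For readers unfamiliar with model merging, the sketch below shows one generic form of it: linear interpolation of two models' parameters. It is only an illustration of why the provenance of every contributed model matters, since each contributor's weights flow directly into the merged result. It is not the paper's attack or its merging algorithm; the function name `merge_weights`, the `alpha` parameter, and the dictionary-of-lists weight format are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical weight format for illustration: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def merge_weights(base: Weights, other: Weights, alpha: float = 0.5) -> Weights:
    """Merge two models by linear interpolation of their parameters.

    Each merged parameter is (1 - alpha) * base + alpha * other.
    Both models must share the same parameter names and shapes.
    """
    if base.keys() != other.keys():
        raise ValueError("models must share the same parameter names")

    merged: Weights = {}
    for name in base:
        if len(base[name]) != len(other[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise interpolation between the two contributors' weights.
        merged[name] = [
            (1 - alpha) * b + alpha * o
            for b, o in zip(base[name], other[name])
        ]
    return merged


if __name__ == "__main__":
    aligned = {"layer.weight": [0.0, 2.0]}
    contributed = {"layer.weight": [4.0, 2.0]}
    print(merge_weights(aligned, contributed, alpha=0.5))
```

With `alpha=0.5` this is plain weight averaging. Whatever a contributor has encoded in its parameters is carried into the merged model in proportion to `alpha`, which is why a merge that accepts models from unverified parties is a trust boundary.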

This work exposes a significant security vulnerability in an increasingly common LLM enhancement technique and underscores the need for robust defenses that protect sensitive information during model merging.

Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging