Stealing PII Through Model Merging

A novel security vulnerability in LLM integration processes

This research reveals how malicious actors can extract personally identifiable information (PII) from aligned language models through seemingly legitimate model merging techniques.

  • Introduces the Merger-as-a-Stealer attack framework that exploits model merging procedures
  • Demonstrates the ability to extract targeted PII from safety-aligned LLMs
  • Shows how attackers can pose as legitimate mergers while conducting unauthorized data extraction
  • Highlights critical security gaps in current model merging practices
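The summary does not say which merging algorithm the attack targets. The sketch below shows what a model merging procedure does in general, using task arithmetic, which is one common technique and an assumption here. Each fine-tuned model contributes a "task vector" (its weights minus a shared base model's weights). The merged model is the base plus the scaled sum of those vectors. The `Params` representation and the `task_arithmetic_merge` function are hypothetical illustrations, not the paper's code.

```python
from typing import Dict, List

# Hypothetical model representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def task_arithmetic_merge(base: Params, finetuned: List[Params], scale: float = 1.0) -> Params:
    """Merge fine-tuned models into a base model via task arithmetic.

    Each fine-tuned model contributes a task vector (finetuned - base);
    the merged weights are base + scale * sum(task vectors).
    """
    merged: Params = {}
    for name, base_values in base.items():
        merged_values = []
        for i, b in enumerate(base_values):
            # Sum every input model's delta for this parameter entry.
            delta = sum(model[name][i] - b for model in finetuned)
            merged_values.append(b + scale * delta)
        merged[name] = merged_values
    return merged
```

In this example, merging the base `{"w": [1.0, 2.0]}` with `{"w": [1.5, 2.0]}` and `{"w": [1.0, 3.0]}` at `scale=1.0` yields `{"w": [1.5, 3.0]}`. Each input model's delta flows directly into the merged weights, so any party that supplies an input model shapes the merged model's behavior.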

This work exposes significant security vulnerabilities in model merging, an increasingly common technique for enhancing LLMs. It points to an urgent need for robust defenses that protect sensitive information when models are merged.

Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging

5 | 14