
Detecting Copyright Infringement in AI Models
A Novel Approach to Verify if Vision-Language Models Used Copyrighted Content
DIS-CO is a technique for inferring whether copyrighted material was included in the training data of vision-language models (VLMs), without requiring direct access to the training datasets.
- Leverages the hypothesis that VLMs can recognize images from their training corpus
- Extracts the content's identity by repeatedly querying the model with specific frames from the target work
- Demonstrates effective identification of copyrighted content across various media types
- Raises important implications for intellectual property rights in AI development
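The querying idea in the points above can be sketched roughly as follows. This is a minimal illustration, not the DIS-CO implementation: `query_model` stands in for a call to whatever VLM is being examined, `fake_model` is a hypothetical stub used only so the example runs, and the `min_share` threshold is an assumed parameter rather than a value from the research.

```python
from collections import Counter
from typing import Callable, Iterable, Optional, Tuple


def normalize(title: str) -> str:
    """Lowercase and drop non-alphanumerics so answers compare loosely."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def identify_work(
    frames: Iterable[bytes],
    query_model: Callable[[bytes], str],
    min_share: float = 0.5,  # assumed threshold, not taken from the paper
) -> Tuple[Optional[str], float]:
    """Query the model once per frame and aggregate its free-form answers.

    Returns the most common normalized answer and the share of frames that
    produced it, or (None, share) if that share is below `min_share`.
    """
    answers = Counter(normalize(query_model(frame)) for frame in frames)
    total = sum(answers.values())
    if total == 0:
        return None, 0.0
    top, count = answers.most_common(1)[0]
    share = count / total
    return (top if share >= min_share else None), share


# Hypothetical stand-in for a real VLM call: it names the work for
# every frame except one.
def fake_model(frame: bytes) -> str:
    return "Example Movie" if frame != b"frame-3" else "unknown"


guess, share = identify_work([b"frame-1", b"frame-2", b"frame-3"], fake_model)
print(guess, round(share, 2))
```

In this sketch a high share only suggests that the model can recognize the work. Telling training-data exposure apart from general knowledge about a well-known title would need additional controls that are not shown here.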
This research addresses critical security concerns in the AI industry, providing rights holders with tools to verify potential copyright infringement and helping model developers demonstrate compliance with intellectual property laws.
DIS-CO: Discovering Copyrighted Content in VLMs Training Data