
Cross-Modal Safety Vulnerabilities in AI
When safe inputs still produce unsafe outputs in vision-language models
This research introduces the Safe Inputs but Unsafe Output (SIUO) challenge, exposing critical safety alignment gaps in large vision-language models.
- Models can generate unsafe content even when individual inputs appear harmless
- Cross-modal interactions create new vulnerability patterns that bypass traditional safety guardrails
- Researchers developed a comprehensive benchmark spanning security, ethical, legal, and medical domains
- Current models demonstrate concerning safety gaps when processing multi-modal inputs
For security professionals, this work highlights urgent safety alignment needs in deployed AI systems that handle both visual and textual information, as seemingly harmless inputs can trigger harmful outputs when combined.