Security in Multimodal LLMs and Vision-Language Models
Research on security challenges specific to multimodal LLMs and vision-language models, including cross-modal safety alignment

Securing Vision Language Models
A comprehensive safety alignment dataset to prevent harmful outputs

Cross-Modal Safety Vulnerabilities in AI
When individually safe inputs combine to produce unsafe outputs in vision-language models

Exposing Vulnerabilities in Mobile GUI Agents
A systematic framework for security testing of AI-driven mobile interfaces

Security Vulnerabilities in AI Robots
How embodied LLMs can be manipulated to perform harmful actions

Advancing Visual Intelligence: Human-Object Interaction Detection
Foundational technology for security monitoring and contextual understanding

Open-Vocabulary Video Relationship Detection
Advancing security surveillance with multi-modal prompting

FakeShield: Combating AI-Generated Image Forgery
Explainable Forgery Detection via Multi-modal LLMs

GLOV: Leveraging LLMs to Enhance Vision Models
Using language models as optimization guides for vision-language systems

Bridging Safety Gaps in Vision-Language Models
Transferring text-based safety mechanisms to protect against toxic visual content

Upgrading Robot Safety Through Smart Dialogues
AI-powered communication for safety-critical scenarios

UniGuard: Fortifying AI Against Multimodal Attacks
A novel approach to protecting MLLMs from jailbreak vulnerabilities

VideoGLaMM: Precision Video Understanding
Advancing pixel-level grounding in video content

The VLLM Security Paradox
Understanding why jailbreaks and defenses are both surprisingly effective

Strengthening AI Defense Against Visual Jailbreaks
A real-time protection framework for multimodal systems

Hiding in Plain Sight
Developing Scene-Coherent Typographic Attacks Against Vision-Language Models

Exposing the Vulnerabilities of Vision-Language Models
A novel framework for testing AI robustness to real-world 3D variations

Exploiting Vision-Language Models
New Black-Box Jailbreak Attack Maximizes Toxic Outputs in LVLMs

Next-Gen Video Anomaly Detection
Understanding anomalies across multiple time scales and contexts

Smarter Anomaly Detection with Minimal Examples
Enhancing security systems through graph-based visual prompts

Combating Hallucinations in Visual AI
A systematic approach to evaluating and mitigating AI visual hallucinations

AI Vision for the Visually Impaired
Using Vision-Language Models to Guide Navigation

Leveraging MLLMs for Image Safety
Detecting unsafe visual content without human annotation

PromptGuard: Securing AI-Generated Images
A Novel Soft Prompt Approach for Safer Text-to-Image Models

FaceXBench: Testing AI's Face Understanding
First comprehensive benchmark for evaluating MLLMs on face understanding tasks

Combating Visual Disinformation in News
Using Vision-Language Models to Verify Cross-modal Entity Consistency

Enhancing Safety in Vision-Language Models
Addressing the Safety Reasoning Gap in VLMs

Testing LVLMs for Security Applications
Evaluating leading LVLMs on human re-identification tasks

The Gaslighting Vulnerability in Multimodal AI
How negation arguments can trick advanced vision-language models

Predicting MLLM Reliability Under Shifting Conditions
A New Information-Theoretic Framework for Quantifying MLLM Risks

Defending MLLMs Against Jailbreak Attacks
A novel approach to protect multimodal AI from security exploits

Audio Jailbreaks: Exposing ALM Vulnerabilities
How adversarial audio can bypass security in Audio-Language Models

Voice Jailbreak Attacks on Multimodal LLMs
New Security Vulnerabilities in AI Systems Processing Multiple Input Types

Fortifying AI Vision Against Attacks
Building Robust Multi-modal Language Models That Resist Adversarial Manipulation

MaxInfo: Intelligent Video Frame Selection
A training-free approach to selecting the most informative video frames

Explaining Audio Differences with AI
A framework for describing audio differences in natural language

Breaching VLLM Security Guardrails
How sophisticated attacks can bypass multi-layered safety defenses in Vision Large Language Models

Cloud-Edge-Terminal Collaboration for Video Analytics
Advancing distributed video processing for security applications

Vision-Enhanced LLMs for Safer Autonomous Driving
Combining Visual Processing with LLM Reasoning for Complex Road Scenarios

Optimizing Vision-Language Models for Edge Devices
Bringing powerful AI vision capabilities to resource-constrained environments

Exploiting DeepSeek's Visual Vulnerabilities
How embedding manipulation induces targeted hallucinations in multimodal AI

Breaking Alignment: Universal Attacks on Multimodal LLMs
How a single optimized image can bypass safety guardrails across multiple models

Breaking AI Defenses Across Models
A novel approach to testing vision-language model security

Stealthy Typographic Attacks on Vision-Language Models
New vulnerabilities in multi-image settings reveal heightened security risks

Aligning Multimodal LLMs with Human Preferences
Advancing security and capability through MM-RLHF

Fighting Visual Misinformation with E²LVLM
Enhancing multimodal fact-checking through evidence filtering

Exploiting Visual Distractions in MLLMs
How visual complexity can be exploited to bypass AI safety guardrails

Benchmarking Safety in Multimodal AI
First comprehensive safety awareness evaluation for text-image AI models

Fortifying VLMs Against Adversarial Attacks
A novel DPO approach for safer vision-language models

Securing Multimodal AI Systems
Cost-Effective Security Alignment Using Synthetic Embeddings

Hidden Defenders Against AI Jailbreaking
Detecting attacks on vision-language models through hidden state monitoring

Securing the Vision of AI
A Framework for LVLM Safety in the Age of Multimodal Models

Enhancing Object Detection with MQADet
A plug-and-play approach for improved open-vocabulary detection

AI-Powered Meme Moderation for Singapore
Leveraging Multimodal LLMs to Detect Offensive Content in Singapore's Cultural Context

Exploiting the Blind Spots of MLLMs
A Dynamic Approach to Transferring Adversarial Attacks Across Models

Defending AI Vision Against Chart Deception
Protecting multimodal LLMs from misleading visualizations

Combating Hallucinations in Vision-Language Models
A Statistical Framework for Factuality Guarantees in LVLMs

Combating Face Manipulation
A Multimodal Approach to More Effective Forgery Detection

Defending Against Toxic Images in AI
Zero-Shot Protection for Large Vision-Language Models

Boosting Security with Automated Annotation Verification
ClipGrader: Using AI to Validate Object Detection Labels

Enhancing Vision Systems with Text-Guided Multimodal Fusion
Leveraging LLMs for RGB-Thermal fusion in challenging conditions

Exposing VLM Security Vulnerabilities
Novel red teaming approach reveals dangerous blind spots in vision-language models

Dynamic Tracking: Adapting to Reality
Enhancing vision-language tracking through real-time semantic updates

Illumination Vulnerabilities in AI Vision
How lighting changes can deceive vision-language models

LLM-Enhanced UAV Object Detection
Bridging Semantic Gaps for Better Aerial Detection

Real-Time Detection Without Boundaries
Advancing open-set object detection for security applications

Natural Language Person Identification
Advancing Security Through Person-Centric Visual Recognition

Advanced Facial Recognition for Video
Enhancing Security Through Multimodal AI Understanding

Uncovering LMM Vulnerabilities to Extremist Content
New benchmark reveals critical security gaps in AI safety systems

Making Vision-Language Models Safer
Novel approach to identify and neutralize unsafe model weights

Web Artifact Attacks: A New Security Threat to AI Vision
How seemingly harmless web elements can manipulate vision-language models

Defending Multimodal LLMs Against Adversarial Attacks
Understanding vulnerabilities across text, image, audio, and video modalities

Making Multimodal AI Assistants Safer
Enhancing MLLM safety through preference optimization

Evaluating the Reliability of Vision-Language Models
A comprehensive security and values assessment framework

Securing AI-Generated Images
Preventing Inappropriate Content through Subspace Projection

Securing Multimodal AI Systems
A Novel Framework for Safe Reinforcement Learning from Human Feedback

Bridging Visible to Infrared: Advanced Image Translation
AI-powered security imaging through vision-language understanding

Exposing Multimodal AI Vulnerabilities
How MIRAGE reveals security gaps in image-text AI systems

Context-Aware Image Segmentation
Leveraging LLMs to enhance pixel-level understanding beyond visual features

Enhancing Security with Thermal Vision
First benchmark for evaluating thermal image understanding in AI models

Intelligent Deepfake Detection
Multi-Modal Detection with Integrated Explanations

Improving ML Security through Better Error Detection
Novel approach for identifying misclassifications in vision-language models

Disrupting AI Video Surveillance
Protective Watermarking to Prevent Unauthorized Video Analysis

Combating Visual Hallucinations in AI
Automated detection of systematic errors in vision-language models

Bridging the Sky-Ground Gap in Person Identification
Using text-based attributes to enhance aerial-ground surveillance

Precise Instance Tracking Across Visual Content
Advancing multimodal retrieval for security and surveillance applications

Securing Vision-Language Models Against Noise Attacks
New defenses against jailbreak attacks that exploit noisy images

Fortifying Vision-Language Models Against Attacks
A novel preference optimization approach for robust LVLMs

Defending Vision-Language Models Against Jailbreaks
Adaptive Memory Framework for Enhanced Security

The SCAM Dataset: Exposing Visual-Text Vulnerabilities
Largest real-world dataset for evaluating multimodal model security

Unlocking Facial Intelligence
A multimodal LLM for facial expression and attribute understanding

VideoExpert: Enhancing Video Timestamp Precision
Tackling temporally sensitive video understanding in multimodal LLMs

Securing Multi-modal AI Systems
First systematic safety analysis of multi-modal large reasoning models

Zero-Shot Object Tracking with Natural Language
ReferGPT: Tracking multiple objects from text descriptions without task-specific training

Rethinking Safety Alignment in Multi-modal AI
Building safer models without malicious training data

Multi-Level Video Understanding for AI
Advancing MLLMs with granular video processing capabilities

Efficient Person Search with Language
How parameter-efficient transfer learning advances text-based person retrieval

Robust Object Detection Across Changing Environments
A new benchmark for measuring resilience to distribution shifts
