Open-Vocabulary Video Relationship Detection

This research introduces an end-to-end approach for detecting visual relationships between objects in videos without being limited to predefined categories.

Eliminates the dependency on pre-trained trajectory detectors through novel multi-modal prompting
Detects relationships between both seen and unseen objects in video content
Achieves superior performance while reducing computational complexity
Enables more accurate security monitoring by understanding object interactions
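
As a rough illustration of how open-vocabulary labeling can cover unseen categories, the sketch below scores a visual embedding against text embeddings of candidate relationship names and picks the closest one. This is a generic embedding-similarity sketch, not the paper's model: the function names, the toy 3-dimensional vectors, and the label set are all assumptions made up for the example. A real system would obtain these embeddings from trained vision and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_labels(visual_embedding, text_embeddings):
    """Return (label, score) pairs sorted by similarity, best match first."""
    scores = {
        label: cosine(visual_embedding, vec)
        for label, vec in text_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical text embeddings for relationship names (toy values).
# Because labels are represented as text embeddings, a new name such as
# "push" can be added at inference time without retraining a classifier head.
PREDICATE_TEXT_EMBEDDINGS = {
    "chase": [0.9, 0.1, 0.0],
    "sit on": [0.0, 1.0, 0.1],
    "push": [0.1, 0.0, 1.0],  # assumed unseen during training
}

# Hypothetical visual embedding for one subject-object pair in a video clip.
pair_embedding = [0.2, 0.1, 0.9]

ranked = rank_labels(pair_embedding, PREDICATE_TEXT_EMBEDDINGS)
best_label = ranked[0][0]
print(best_label)
```

The design point is that the label set lives in a plain dictionary of text embeddings, so extending the vocabulary amounts to adding an entry rather than changing the model.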

For security applications, this advancement could significantly improve video surveillance by detecting suspicious interactions between objects and people in real time, strengthening threat detection without requiring extensive labeled training data.

End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting