
Open-Vocabulary Video Relationship Detection
Advancing security surveillance with multi-modal prompting
This research introduces an end-to-end approach for detecting visual relationships between objects in videos without being limited to predefined categories.
- Eliminates dependency on pre-trained trajectory detectors through novel multi-modal prompting
- Detects relationships between both seen and unseen objects in video content
- Outperforms prior approaches while reducing computational complexity
- Enables more accurate security monitoring by understanding object interactions
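
To make the open-vocabulary idea concrete, here is a minimal sketch of how a relationship can be scored against an arbitrary list of predicate names, including ones never seen in training. It assumes a CLIP-style setup in which a visual feature for a subject-object pair and the text embeddings of candidate predicates share one vector space. The function names, the toy 3-dimensional vectors, and the predicate list are all hypothetical illustrations; this is not the paper's architecture or its prompting scheme.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_predicates(pair_feature, text_embeddings):
    """Rank candidate predicate names by similarity to a visual pair feature.

    pair_feature: visual embedding of one subject-object pair (assumed given).
    text_embeddings: dict mapping predicate name -> text embedding (assumed
    to come from a text encoder aligned with the visual encoder).
    """
    scores = {
        name: cosine(pair_feature, emb)
        for name, emb in text_embeddings.items()
    }
    # Highest similarity first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hand-made toy embeddings, for illustration only.
text_embeddings = {
    "chases": [0.9, 0.1, 0.0],
    "sits on": [0.0, 1.0, 0.1],
    "hands over": [0.1, 0.0, 1.0],  # treated as unseen during training
}
pair_feature = [0.2, 0.1, 0.9]

ranking = rank_predicates(pair_feature, text_embeddings)
print(ranking[0][0])  # best-matching predicate name
```

Because the candidate set is just a dictionary of names and embeddings, adding a new relationship category in this sketch means adding one more text embedding rather than retraining a fixed classifier.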
For security applications, this advancement can improve video surveillance by detecting suspicious interactions between objects and people in real time, supporting threat detection without requiring extensive labeled training data.
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting