Open-Vocabulary Video Relationship Detection

Advancing security surveillance with multi-modal prompting

This research introduces an end-to-end approach for detecting visual relationships between objects in videos without being limited to predefined categories.

  • Eliminates dependency on pre-trained trajectory detectors through novel multi-modal prompting
  • Detects relationships between both seen and unseen objects in video content
  • Reports improved performance alongside reduced computational complexity
  • Enables more accurate security monitoring by understanding object interactions

For security applications, this advance can improve video surveillance by detecting suspicious interactions between people and objects in real time, strengthening threat detection without requiring extensive labeled training data.

End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
