Zero-Shot Object Tracking with Natural Language

ReferGPT: Tracking multiple objects using only text descriptions without training data

ReferGPT introduces a zero-shot framework that can track multiple objects based solely on natural language descriptions without requiring labeled training data.

  • Combines large language models with visual object tracking
  • Enables tracking of any described object across video frames without prior training
  • Performs competitively with supervised methods that require extensive training
  • Offers flexibility to track open-set objects based on textual queries
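
To make the idea of tracking by text query concrete, here is a minimal Python sketch. It is not ReferGPT's actual matching procedure. It assumes each tracked object already has a text caption (hard-coded here, where a real system would get it from a language model), and the names `query_coverage` and `select_tracks` are hypothetical.

```python
def query_coverage(query: str, caption: str) -> float:
    """Fraction of the query's words that also appear in the caption."""
    query_words = set(query.lower().split())
    caption_words = set(caption.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & caption_words) / len(query_words)


def select_tracks(query: str, captions: dict, threshold: float = 0.8) -> list:
    """Return the sorted IDs of tracks whose caption covers enough of the query."""
    return sorted(
        track_id
        for track_id, caption in captions.items()
        if query_coverage(query, caption) >= threshold
    )


# Assumed example data: one caption per track ID, invented for illustration.
captions = {
    1: "a red car moving in the same direction",
    2: "a pedestrian walking on the left",
    3: "a red car parked on the right",
}

print(select_tracks("red car moving", captions))  # [1]
print(select_tracks("red car", captions))         # [1, 3]
```

Because the query is compared with captions at inference time, a new description needs no retraining, only a new string. Word overlap is a deliberately crude stand-in for whatever semantic matching a real system would use.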

This advance has significant security implications for autonomous driving and surveillance systems: natural language descriptions enable more adaptable and robust tracking of vehicles, pedestrians, and potential threats.

ReferGPT: Towards Zero-Shot Referring Multi-Object Tracking
