Zero-Shot Object Tracking with Natural Language

ReferGPT introduces a zero-shot framework that can track multiple objects based solely on natural language descriptions without requiring labeled training data.

Combines the power of large language models with visual tracking technology
Enables tracking of any described object across video frames without task-specific training
Performs competitively with supervised methods that require extensive training
Offers flexibility to track open-set objects based on textual queries

This advance has significant security implications for autonomous driving and surveillance systems: vehicles, pedestrians, and potential threats can be tracked from natural language descriptions alone, making tracking more adaptable and robust.

ReferGPT: Towards Zero-Shot Referring Multi-Object Tracking