3D Vision Meets Natural Language

NuGrounding introduces a framework that lets autonomous vehicles interpret natural language instructions and precisely localize the referred objects in 3D across multiple camera views.

The first large-scale benchmark for multi-view 3D visual grounding designed specifically for autonomous driving
Implements a Hierarchy of Grounding (HoG) approach that integrates 3D geometric reasoning with language comprehension
Addresses limitations of existing datasets by providing more detailed, fine-grained language instructions
Significantly improves vehicles' ability to interpret complex commands in real-world driving environments

This work is a step toward more intuitive human-machine interaction in autonomous vehicles. Letting people communicate with self-driving systems in natural language could improve both safety and user experience.

NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving