
3D Vision Meets Natural Language
Advancing Autonomous Vehicles' Understanding of Human Instructions
NuGrounding introduces a framework that lets autonomous vehicles interpret natural language instructions and localize the referenced objects in 3D across multiple camera views.
- First large-scale benchmark for multi-view 3D visual grounding specifically designed for autonomous driving
- Uses a Hierarchy of Grounding (HoG) method to generate hierarchical, multi-level instructions, alongside a model that integrates 3D geometric reasoning with language comprehension
- Addresses the coarse-grained instructions of existing datasets by providing fine-grained language instructions
- Significantly improves vehicles' ability to interpret complex commands in real-world driving environments
This work is a step toward intuitive human-machine interaction in autonomous vehicles: letting people communicate naturally with self-driving systems could improve both safety and user experience.
NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving