3D Vision Meets Natural Language

Advancing Autonomous Vehicles' Understanding of Human Instructions

NuGrounding introduces a framework that enables autonomous vehicles to interpret natural language instructions and localize the referenced objects in 3D across multiple camera views.

  • Presents the first large-scale benchmark for multi-view 3D visual grounding designed specifically for autonomous driving
  • Implements a Hierarchy of Grounding (HoG) approach that integrates 3D geometric reasoning with language comprehension
  • Addresses limitations of existing datasets by providing more detailed, fine-grained language instructions
  • Improves vehicles' ability to interpret complex commands in real-world driving environments

This work is a step toward more intuitive human-machine interaction in autonomous vehicles: allowing natural communication with self-driving systems could improve both safety and user experience.

NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving
