Multi-View Driving Scene Understanding for AI

Advancing MLLMs for Complex Autonomous Vehicle Environments

NuPlanQA introduces a large-scale dataset and evaluation benchmark for assessing how well Multi-Modal Large Language Models (MLLMs) comprehend complex driving scenarios across multiple camera views.

  • Introduces NuPlanQA-Eval, a first-of-its-kind multi-view evaluation benchmark for driving scene understanding (an illustrative sample layout follows this list)
  • Proposes BEV-LLM, an architecture that integrates Bird's-Eye-View (BEV) features for improved spatial reasoning in driving contexts (see the sketch after this list)
  • Addresses critical engineering challenges in autonomous vehicle development by enabling MLLMs to process and interpret multi-view information
  • Supports safer operation by giving models a more complete understanding of the surrounding scene
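To make the multi-view setting concrete, the snippet below sketches what a single benchmark sample might look like. The field names and camera labels are assumptions for illustration (the camera names follow common nuScenes-style conventions), not the dataset's actual schema.

```python
# Illustrative layout of one multi-view QA sample; all field names
# and camera labels are hypothetical, not NuPlanQA's actual schema.
sample = {
    "question": "Is the pedestrian on the left likely to cross in front of the ego vehicle?",
    "views": {  # one synchronized frame per surround-view camera
        "CAM_FRONT": "front.jpg",
        "CAM_FRONT_LEFT": "front_left.jpg",
        "CAM_FRONT_RIGHT": "front_right.jpg",
        "CAM_BACK": "back.jpg",
        "CAM_BACK_LEFT": "back_left.jpg",
        "CAM_BACK_RIGHT": "back_right.jpg",
    },
    "choices": ["Yes", "No"],
    "answer": "Yes",
}
```

A model under evaluation receives the question together with all views at once, so answering correctly requires reasoning across cameras rather than from a single image.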
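The following is a minimal sketch of how BEV features could be fused with per-view image tokens before reaching a language model. It assumes a PyTorch setting; the module name `BEVFusionAdapter`, the dimensions, and the simple concatenation strategy are all assumptions for illustration, not the paper's actual BEV-LLM design.

```python
import torch
import torch.nn as nn


class BEVFusionAdapter(nn.Module):
    """Hypothetical adapter: projects flattened BEV grid features and
    per-view image patch features into a shared LLM token space.
    Names and dimensions are illustrative, not from the paper."""

    def __init__(self, bev_dim=256, img_dim=1024, llm_dim=4096):
        super().__init__()
        self.bev_proj = nn.Linear(bev_dim, llm_dim)  # BEV grid cells -> LLM tokens
        self.img_proj = nn.Linear(img_dim, llm_dim)  # per-view patches -> LLM tokens

    def forward(self, bev_feats, view_feats):
        # bev_feats:  (B, H*W, bev_dim)   flattened BEV grid from a view-fusion encoder
        # view_feats: (B, V, N, img_dim)  N patch features from each of V camera views
        bev_tokens = self.bev_proj(bev_feats)
        img_tokens = self.img_proj(view_feats.flatten(1, 2))  # (B, V*N, llm_dim)
        # Concatenate so the LLM can attend jointly to the ego-centric
        # camera views and the top-down BEV representation.
        return torch.cat([bev_tokens, img_tokens], dim=1)
```

In such a setup, the combined token sequence would be prepended to the tokenized question, letting the language model ground its spatial reasoning in both the top-down map and the raw camera views.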

This research bridges a critical gap in autonomous driving technology by enabling AI systems to better understand complex driving environments from multiple perspectives, accelerating the path toward safer self-driving vehicles.

NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
