
ARCON: Next-Generation Video Prediction
Auto-Regressive Continuation for Enhanced Driving Video Generation
ARCON is a video continuation scheme for Large Vision Models (LVMs) that alternates between generating semantic tokens and RGB tokens, yielding more consistent and accurate predictions.
- Implements an alternating semantic-RGB token generation scheme to improve structural consistency
- Enhances video prediction accuracy through specialized optical flow-based texture stitching
- Demonstrates particular effectiveness in driving scenarios where accurate prediction is safety-critical
- Carries the auto-regressive, token-by-token generation style of large language models over to visual prediction tasks
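The two mechanisms named in the list above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each future frame is emitted as a block of semantic tokens followed by a block of RGB tokens, and that texture is carried forward by backward-warping the previous frame along a per-pixel flow field. The names `continue_video`, `toy_predict_next`, and `warp_by_flow` are hypothetical, and the toy predictor merely stands in for an LVM.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, int]  # (kind, token id); kind is "sem" or "rgb"


def continue_video(
    context: List[Token],
    predict_next: Callable[[List[Token], str], int],
    num_frames: int,
    tokens_per_frame: int,
) -> List[Token]:
    """Auto-regressively append frames, alternating semantic and RGB blocks.

    Assumption: semantic tokens come first within each frame, so the RGB
    tokens of that frame are predicted with its semantic layout in context.
    """
    seq = list(context)
    for _ in range(num_frames):
        for kind in ("sem", "rgb"):
            for _ in range(tokens_per_frame):
                # Each new token is conditioned on everything generated so far.
                seq.append((kind, predict_next(seq, kind)))
    return seq[len(context):]


def toy_predict_next(seq: List[Token], kind: str) -> int:
    """Stand-in for a model call: a deterministic function of the history."""
    return len(seq) % 7


def warp_by_flow(prev_frame, flow):
    """Backward-warp prev_frame with nearest-neighbour sampling.

    Assumption: flow[y][x] = (dx, dy) means output pixel (x, y) takes its
    value from prev_frame at (x + dx, y + dy), clamped to the image border.
    """
    h, w = len(prev_frame), len(prev_frame[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            row.append(prev_frame[sy][sx])
        out.append(row)
    return out
```

Calling `continue_video([("rgb", 0)], toy_predict_next, num_frames=2, tokens_per_frame=2)` returns eight tokens whose kinds run sem, sem, rgb, rgb, sem, sem, rgb, rgb. How the warped texture is then merged with the generated frame is not specified in this summary, so the sketch stops at the warp.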
By improving how AI models predict future frames in dynamic environments, this research supports two engineering needs: safe autonomous driving systems and realistic driving simulators.
Original Paper: ARCON: Advancing Auto-Regressive Continuation for Driving Videos