Teaching Robots Through Video Observation

Using Latent Motion Tokens to Bridge Human Motion and Robot Action

This research introduces Moto, an approach that lets robots learn manipulation skills from human demonstration videos that carry no explicit action labels.

  • Creates a unified representation of motion that works across different embodiments
  • Leverages abundant video data instead of expensive labeled demonstrations
  • Transfers motion knowledge learned from videos to robot actions through fine-tuning
  • Demonstrates improved performance on manipulation tasks with minimal training

For manufacturing automation, this could let robots learn complex assembly tasks more efficiently and reduce the programming burden in industrial applications.

Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos