
Teaching Robots Through Video Observation
Using Latent Motion Tokens to Bridge Human Motion and Robot Action
This research introduces Moto, a novel approach that allows robots to learn manipulation skills by observing human demonstration videos without explicit action labeling.
- Creates a unified representation of motion that works across different embodiments
- Leverages abundant unlabeled video data instead of expensive action-labeled demonstrations
- Transfers motion knowledge learned from videos to robot actions through fine-tuning on robot demonstrations
- Demonstrates improved performance on manipulation tasks with limited action-labeled training data
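To make the idea of a discrete motion token concrete, here is a toy sketch in Python. It is not Moto's tokenizer: Moto learns its tokenizer from video data, whereas this sketch assumes each frame has already been reduced to a 2-D feature vector, uses the frame-to-frame difference as the motion feature, and quantizes it against a small hand-written codebook. The names motion_feature, nearest_code, tokenize_video and CODEBOOK are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

# Hypothetical 2-D codebook (illustration only, not learned):
# token 0 = stay still, token 1 = move right, token 2 = move up.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


def motion_feature(frame_t: Vector, frame_next: Vector) -> List[float]:
    """Toy stand-in for a learned motion encoder: element-wise change between frames."""
    return [b - a for a, b in zip(frame_t, frame_next)]


def nearest_code(feature: Vector, codebook: Sequence[Vector]) -> int:
    """Vector quantization: index of the codebook entry closest in Euclidean distance."""
    return min(range(len(codebook)), key=lambda i: math.dist(feature, codebook[i]))


def tokenize_video(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each consecutive pair of frames to one discrete motion token id."""
    return [nearest_code(motion_feature(a, b), codebook) for a, b in zip(frames, frames[1:])]


if __name__ == "__main__":
    # Toy "human hand" trajectory: moves right twice, then up once, then stays still.
    human = [(0.0, 0.0), (1.0, 0.0), (2.1, 0.0), (2.1, 0.9), (2.1, 0.9)]
    # Toy "robot gripper" trajectory: the same motion pattern from a different start point.
    robot = [(5.0, 3.0), (6.0, 3.0), (7.0, 3.0), (7.0, 4.0), (7.0, 4.0)]
    print(tokenize_video(human, CODEBOOK))  # [1, 1, 2, 0]
    print(tokenize_video(robot, CODEBOOK))  # [1, 1, 2, 0]
```

Because the tokens in this toy depend only on how the features change between frames, not on where the trajectory starts, the two trajectories map to the same token sequence. This is only a loose analogy for the cross-embodiment motion representation described above.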
This approach could have significant implications for manufacturing automation, potentially helping robots learn complex assembly tasks more efficiently and reducing the programming burden in industrial applications.
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos