
Smart Robots That Understand Instructions
Teaching Robots to Process Language, Images, and Maps Together
LIAM is an end-to-end multimodal model that enables domestic service robots to interpret natural language instructions together with visual and spatial information.
- Integrates language instructions, images, action sequences, and semantic maps into a unified transformer architecture (a minimal sketch of this kind of fusion follows the list below)
- Eliminates the need for task-specific programming by accepting flexible, free-form task descriptions
- Builds on large language models and open-vocabulary perception to broaden what domestic robots can perceive and act on
- Addresses the high variability of household tasks through multimodal understanding
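
To make the fusion idea in the first bullet concrete: one common way to combine several modalities in a single transformer is to project each one into a shared token space, tag the tokens with per-modality type embeddings, and run a joint encoder over the concatenated sequence. The PyTorch sketch below illustrates that pattern only; the dimensions, the `MultimodalFusion` class, and the pooling choice are assumptions for illustration, not LIAM's published architecture.

```python
# Illustrative sketch (assumed design, not LIAM's exact architecture):
# fuse language, image, action, and semantic-map features in one
# transformer encoder and predict an action vector.
import torch
import torch.nn as nn


class MultimodalFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        # Project each modality into a shared embedding space.
        # Input feature sizes are placeholder assumptions.
        self.lang_proj = nn.Linear(768, d_model)  # text-encoder features
        self.img_proj = nn.Linear(512, d_model)   # image-encoder features
        self.act_proj = nn.Linear(32, d_model)    # past-action tokens
        self.map_proj = nn.Linear(512, d_model)   # semantic-map patches
        # Type embeddings let the encoder distinguish modalities.
        self.type_emb = nn.Embedding(4, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.action_head = nn.Linear(d_model, 32)  # next-action prediction

    def forward(self, lang, img, act, smap):
        # Project, tag with a type embedding, concatenate along the
        # sequence axis, then encode jointly.
        parts = [(lang, self.lang_proj), (img, self.img_proj),
                 (act, self.act_proj), (smap, self.map_proj)]
        tokens = [proj(x) + self.type_emb.weight[i]
                  for i, (x, proj) in enumerate(parts)]
        fused = self.encoder(torch.cat(tokens, dim=1))
        # Mean-pool across all tokens before the action head.
        return self.action_head(fused.mean(dim=1))


# Dummy batch: 12 instruction tokens, 49 image patches,
# 5 past actions, 64 semantic-map patches.
model = MultimodalFusion()
out = model(
    torch.randn(1, 12, 768), torch.randn(1, 49, 512),
    torch.randn(1, 5, 32), torch.randn(1, 64, 512),
)
print(out.shape)  # torch.Size([1, 32])
```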
This work is a step toward more adaptable home robots that understand context and follow instructions naturally.
LIAM: Multimodal Transformer for Language Instructions, Images, Actions and Semantic Maps