Smart Robots That Understand Instructions

Teaching Robots to Process Language, Images, and Maps Together

LIAM is a groundbreaking end-to-end model that enables domestic service robots to understand natural language instructions alongside visual and spatial information.

  • Integrates language instructions, images, action sequences, and semantic maps into a unified transformer architecture
  • Eliminates the need for task-specific programming by accepting flexible, free-form task descriptions
  • Leverages large language models and open-vocabulary perception for improved domestic robot capabilities
  • Addresses the high variability of household tasks through multimodal understanding
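The fusion idea behind the bullets above can be sketched in a few lines: embed tokens from each modality, tag them with a modality-type embedding, concatenate them into one sequence, and let self-attention mix them. This is a minimal, hypothetical illustration of early-fusion multimodal transformers in general, not LIAM's actual architecture; the dimensions, token counts, and single unprojected attention pass are all assumptions for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # embedding dimension (assumed, for illustration)

# Hypothetical token counts per modality: language words, image patches,
# past actions, and semantic-map cells (all random stand-ins here).
tokens = {
    "language": rng.normal(size=(5, d)),
    "image": rng.normal(size=(8, d)),
    "action": rng.normal(size=(3, d)),
    "map": rng.normal(size=(6, d)),
}

# Modality-type embeddings (normally learned; random here) tell the
# transformer which modality each token came from.
type_emb = {m: rng.normal(size=(d,)) for m in tokens}

# Early fusion: one sequence containing every modality's tokens.
seq = np.concatenate([tokens[m] + type_emb[m] for m in tokens])  # (22, d)

# A single self-attention pass over the fused sequence
# (one head, no learned projections, to keep the sketch short).
scores = seq @ seq.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ seq

print(out.shape)  # → (22, 16): every token now attends across all modalities
```

Because all 22 tokens share one attention map, a language token can attend directly to map cells or image patches, which is what lets one model replace separate task-specific pipelines.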

This engineering advancement represents a significant step toward more adaptable and useful home robots that can understand context and follow instructions naturally in domestic environments.

LIAM: Multimodal Transformer for Language Instructions, Images, Actions and Semantic Maps