
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
By Kei Katsumata, Motonari Kambara...
Abstract:
We consider the problem of generating free-form mobile manipulation instructions based on a target object image and a receptacle image. Conventional image captioning models cannot generate appropriate instructions because their architectures are typically optimized for single-image inputs. In this s...
Key points:
- Research on large language models
- Engineering application
Source: Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement