Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement

Abstract:

We consider the problem of generating free-form mobile manipulation instructions based on a target object image and a receptacle image. Conventional image captioning models cannot generate appropriate instructions because their architectures are typically optimized for single-image input. In this s...

Key points:

Research on large language models
Engineering application

Source: Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement