Mobile Manipulation Instruction Generation from Multiple Ima...

By Kei Katsumata, Motonari Kambara...

Abstract:

We consider the problem of generating free-form mobile manipulation instructions based on a target object image and a receptacle image. Conventional image captioning models cannot generate appropriate instructions because their architectures are typically optimized for single-image input. In this s...

Key points:

  • Research on large language models
  • Engineering application

Source: Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
