Teaching Robots to See Possibilities

Guiding Reinforcement Learning with Visual Cues for Better Robotics

This research introduces KAGI (Knowledge-Guided Affordance for General Interactions), an approach that uses visual prompting to help robots learn manipulation tasks more effectively.

  • Leverages vision-language foundation models to identify affordances (action possibilities) in visual scenes
  • Transforms these affordances into dense reward signals for reinforcement learning (see the sketch after this list)
  • Requires no task-specific training or human demonstrations
  • Enables robots to learn general manipulation skills through visual understanding
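
To make the reward-shaping idea concrete, the sketch below shows one way affordance keypoints returned by a vision-language model could be turned into a dense reward: the agent is penalized in proportion to its distance from the next keypoint and receives a bonus when it reaches it. This is a minimal illustration under assumed conventions, not the paper's implementation; the function name, the end-effector variable, and the example waypoint values are hypothetical.

```python
# Minimal sketch (assumptions, not the authors' code): converting affordance
# keypoints proposed by a vision-language model into a dense RL reward.
import numpy as np

def dense_affordance_reward(ee_position: np.ndarray,
                            waypoints: list[np.ndarray],
                            reached_tol: float = 0.02) -> tuple[float, list[np.ndarray]]:
    """Reward the agent for approaching the next affordance waypoint.

    ee_position : 3D position of the robot end-effector.
    waypoints   : ordered 3D keypoints (e.g., a grasp point, then a goal point)
                  extracted from the VLM's visual-prompting output.
    Returns a shaped reward and the (possibly shortened) waypoint list.
    """
    if not waypoints:
        return 0.0, waypoints                  # all waypoints already reached
    target = waypoints[0]
    dist = float(np.linalg.norm(ee_position - target))
    if dist < reached_tol:                     # waypoint reached: pay a bonus
        return 1.0, waypoints[1:]              # and advance to the next one
    return -dist, waypoints                    # otherwise: dense distance shaping


# Example with two hypothetical waypoints proposed by the VLM
waypoints = [np.array([0.40, 0.10, 0.05]), np.array([0.40, 0.30, 0.05])]
reward, waypoints = dense_affordance_reward(np.array([0.35, 0.08, 0.10]), waypoints)
print(reward)  # negative distance to the first waypoint
```

In this framing, the VLM supplies only the keypoints; the reward itself is ordinary distance-based shaping, which is what lets standard reinforcement learning algorithms consume the visual affordance signal without task-specific reward engineering.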

This approach could accelerate robotic deployment in manufacturing environments by simplifying how robots learn to interact with objects, reducing the engineering effort needed to train specialized systems.

Affordance-Guided Reinforcement Learning via Visual Prompting
