
Making Robots Understand Human Intent Naturally
Combining Voice Commands with Pointing Gestures through LLM Integration
This research introduces a multimodal interaction framework that enables more intuitive human-robot communication by combining verbal commands with natural pointing gestures.
- Addresses the difficulties elderly users face with complex command syntax and with traditional gesture-based systems
- Integrates voice commands with deictic gesture information (pointing)
- Leverages Large Language Models to interpret the combined multimodal input (see the sketch after this list)
- Creates a more accessible interface for service robots in healthcare and elderly care settings
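The summary does not include the authors' implementation, so the following is only a minimal sketch of the general fusion idea: a speech transcript and a resolved pointing target are combined into a single prompt that an LLM can use to ground words like "this" or "that". The `PointingGesture` structure, function names, and prompt wording are all assumptions for illustration, not the paper's actual pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PointingGesture:
    # Hypothetical output of a gesture-recognition module: the label of the
    # object the user's pointing ray intersects, plus a confidence score.
    target_label: str
    confidence: float


def build_intent_prompt(transcript: str, gesture: Optional[PointingGesture]) -> str:
    """Fuse a spoken command with deictic context into one LLM prompt.

    Illustrative only; the real system's prompt format and fusion logic
    are not described in this summary.
    """
    if gesture is not None and gesture.confidence > 0.5:
        gesture_context = (
            f'The user is pointing at: "{gesture.target_label}" '
            f"(confidence {gesture.confidence:.2f})."
        )
    else:
        gesture_context = "No reliable pointing gesture was detected."

    return (
        "You are the intent interpreter of a service robot.\n"
        f'Spoken command: "{transcript}"\n'
        f"{gesture_context}\n"
        "Resolve demonstratives such as 'this' or 'that' using the pointing "
        "target, then reply with a single JSON object of the form "
        '{"action": ..., "object": ..., "location": ...}.'
    )


if __name__ == "__main__":
    prompt = build_intent_prompt(
        "Please bring me that cup",
        PointingGesture(target_label="blue mug on the kitchen table", confidence=0.87),
    )
    print(prompt)  # this string would then be sent to an LLM for interpretation
```

In a sketch like this, the LLM only sees text: the gesture module is responsible for turning a pointing posture into an object label, which keeps the fusion step simple and lets the same prompt template work with any chat-style model.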
Business Impact: As populations age globally, this technology could significantly improve elderly care by making service robots more accessible to users with limited technological familiarity or physical capabilities.