
Making Robots Understand Human Intent Naturally
Combining Voice Commands with Pointing Gestures through LLM Integration
This research introduces a multimodal interaction framework that enables more intuitive human-robot communication by combining verbal commands with natural pointing gestures.
- Addresses the difficulties elderly users face with complex command syntax and with traditional gesture-based systems
- Integrates voice commands with deictic gesture information (pointing)
- Leverages Large Language Models to interpret the combined multimodal input (see the sketch after this list)
- Creates a more accessible interface for service robots in healthcare and elderly care settings
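The summary does not include the authors' implementation, so the following is only a minimal sketch of the general fusion idea: a speech transcript and a resolved pointing target are combined into a single prompt that an LLM can use to ground words like "this" or "that". The `PointingGesture` structure, function names, and prompt wording are all assumptions for illustration, not the paper's actual pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PointingGesture:
    # Hypothetical output of a gesture-recognition module: the label of the
    # object the user's pointing ray intersects, plus a confidence score.
    target_label: str
    confidence: float


def build_intent_prompt(transcript: str, gesture: Optional[PointingGesture]) -> str:
    """Fuse a spoken command with deictic context into one LLM prompt.

    Illustrative only; the real system's prompt format and fusion logic
    are not described in this summary.
    """
    if gesture is not None and gesture.confidence > 0.5:
        gesture_context = (
            f'The user is pointing at: "{gesture.target_label}" '
            f"(confidence {gesture.confidence:.2f})."
        )
    else:
        gesture_context = "No reliable pointing gesture was detected."

    return (
        "You are the intent interpreter of a service robot.\n"
        f'Spoken command: "{transcript}"\n'
        f"{gesture_context}\n"
        "Resolve demonstratives such as 'this' or 'that' using the pointing "
        "target, then reply with a single JSON object of the form "
        '{"action": ..., "object": ..., "location": ...}.'
    )


if __name__ == "__main__":
    prompt = build_intent_prompt(
        "Please bring me that cup",
        PointingGesture(target_label="blue mug on the kitchen table", confidence=0.87),
    )
    print(prompt)  # this string would then be sent to an LLM for interpretation
```

In a sketch like this, the LLM only sees text: the gesture module is responsible for turning a pointing posture into an object label, which keeps the fusion step simple and lets the same prompt template work with any chat-style model.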
Business Impact: As populations age globally, this technology could significantly improve elderly care by making service robots more accessible to users with limited technological familiarity or physical capabilities.