LLaMo: Preserving Motion in Native Form

LLaMo is a multimodal framework that keeps human motion data in its native form during instruction tuning. This preserves fine-grained details that conventional tokenization approaches tend to lose.

Key innovations:

Processes motion data in its native format rather than converting it into language tokens
Preserves motion-specific details critical for nuanced understanding
Improves the model's ability to interpret complex human movements
Enables more accurate behavioral prediction and analysis
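The difference between tokenizing motion and keeping it in native form can be illustrated with a toy sketch. This is not LLaMo's implementation. The codebook, the projection weights, and the function names `tokenize` and `project` are hypothetical. The discretizer is a generic nearest-neighbor (VQ-style) quantizer that stands in for "traditional tokenization". In the example, two slightly different frames collapse to the same token id, while a linear projection of the raw continuous features keeps them distinct.

```python
import math
from typing import List, Sequence

Vector = List[float]


def nearest_code(frame: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest (Euclidean) to the frame."""
    return min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))


def tokenize(motion: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical tokenization path: each frame is replaced by one discrete token id.

    Any detail that does not change the nearest code is discarded.
    """
    return [nearest_code(frame, codebook) for frame in motion]


def project(motion: Sequence[Sequence[float]], weights: Sequence[Sequence[float]]) -> List[Vector]:
    """Hypothetical native-form path: linearly map continuous features into an embedding space.

    `weights` has shape (embed_dim, feature_dim); no quantization step is applied.
    """
    return [[sum(w * x for w, x in zip(row, frame)) for row in weights] for frame in motion]


if __name__ == "__main__":
    # Two frames of a toy 2-D joint feature that differ only slightly.
    motion = [[0.10, 0.00], [0.14, 0.02]]
    # Toy codebook standing in for a learned motion vocabulary (assumption).
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    # Toy projection into a 3-D "LLM embedding" space (assumption).
    weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

    print("tokens:", tokenize(motion, codebook))  # both frames map to token 0
    print("embeddings:", project(motion, weights))  # the two frames stay distinct
```

In a real system the projection would be learned and the features would be far higher-dimensional. The point of the sketch is only that a continuous mapping does not force nearby but different poses onto the same symbol.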

Security implications: By improving motion analysis and behavioral prediction, LLaMo could support security applications such as anomaly detection and surveillance systems that better interpret and anticipate human behavior.

Human Motion Instruction Tuning