
How Well Can AI 'See' Through Words?
Evaluating multimodal perception capabilities in advanced LLMs
This research assesses how accurately large language models like GPT-4 interpret sensory experiences described in language, comparing the models' perceptual strength ratings with human judgments.
- GPT-4 and GPT-4o demonstrated the strongest alignment with human perceptual ratings across sensory modalities
- Multimodal inputs significantly enhanced the models' ability to ground language in sensory experience
- Perceptual ratings provide an effective benchmark for evaluating LLMs' sensory understanding
- Models showed varying strengths across different sensory domains (visual, auditory, tactile)
For linguists, this research offers valuable insights into how AI systems process and interpret sensory descriptions in language, revealing both their capabilities and limitations in bridging linguistic expressions with perceptual experiences.
Exploring Multimodal Perception in Large Language Models Through Perceptual Strength Ratings
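To make the evaluation approach concrete, the sketch below shows one way model-generated perceptual strength ratings could be compared against human norms, modality by modality. The words, rating values, and the six Lancaster-style modalities used here are illustrative placeholders, not data or code from the study.

```python
from scipy.stats import spearmanr

# Hypothetical perceptual strength ratings (0-5 scale), one value per sensory
# modality in the style of the Lancaster Sensorimotor Norms. Both the "human"
# and "model" ratings below are invented placeholders for illustration only.
MODALITIES = ["visual", "auditory", "haptic", "gustatory", "olfactory", "interoceptive"]

human_ratings = {
    "thunder":   [2.1, 4.8, 0.9, 0.1, 0.2, 1.0],
    "lemon":     [4.2, 0.3, 2.5, 4.6, 3.8, 0.4],
    "velvet":    [3.9, 0.2, 4.7, 0.1, 0.3, 0.5],
    "heartbeat": [1.5, 3.2, 2.8, 0.0, 0.0, 4.1],
}

model_ratings = {
    "thunder":   [2.5, 4.6, 1.2, 0.0, 0.3, 0.8],
    "lemon":     [4.0, 0.5, 2.0, 4.8, 4.1, 0.2],
    "velvet":    [4.1, 0.4, 4.5, 0.2, 0.4, 0.3],
    "heartbeat": [1.8, 3.0, 2.5, 0.1, 0.1, 3.9],
}

# Per-modality alignment: correlate model and human ratings across words.
for i, modality in enumerate(MODALITIES):
    human = [human_ratings[w][i] for w in human_ratings]
    model = [model_ratings[w][i] for w in human_ratings]
    rho, p = spearmanr(human, model)
    print(f"{modality:>13}: Spearman rho = {rho:.2f} (p = {p:.3f})")
```

A high rank correlation in a given modality would indicate that the model orders words by sensory strength much as human raters do; the study's actual comparisons may use different metrics and far larger word sets.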