
How Well Can AI 'See' Through Words?
Evaluating multimodal perception capabilities in advanced LLMs
This research assesses how accurately large language models like GPT-4 interpret sensory experiences described in language, comparing the models' perceptual strength ratings with human judgments.
- GPT-4 and GPT-4o demonstrated the strongest alignment with human perceptual ratings across sensory modalities
- Multimodal inputs significantly enhanced the models' ability to ground language in sensory experience
- Perceptual ratings provide an effective benchmark for evaluating LLMs' sensory understanding
- Models showed varying strengths across different sensory domains (visual, auditory, tactile)
For linguists, this research offers valuable insights into how AI systems process and interpret sensory descriptions in language, revealing both their capabilities and limitations in bridging linguistic expressions with perceptual experiences.
Exploring Multimodal Perception in Large Language Models Through Perceptual Strength Ratings
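To make the evaluation approach concrete, the sketch below shows one way model-generated perceptual strength ratings could be compared against human norms, modality by modality. The words, rating values, and the six Lancaster-style modalities used here are illustrative placeholders, not data or code from the study.

```python
from scipy.stats import spearmanr

# Hypothetical perceptual strength ratings (0-5 scale), one value per sensory
# modality in the style of the Lancaster Sensorimotor Norms. Both the "human"
# and "model" ratings below are invented placeholders for illustration only.
MODALITIES = ["visual", "auditory", "haptic", "gustatory", "olfactory", "interoceptive"]

human_ratings = {
    "thunder":   [2.1, 4.8, 0.9, 0.1, 0.2, 1.0],
    "lemon":     [4.2, 0.3, 2.5, 4.6, 3.8, 0.4],
    "velvet":    [3.9, 0.2, 4.7, 0.1, 0.3, 0.5],
    "heartbeat": [1.5, 3.2, 2.8, 0.0, 0.0, 4.1],
}

model_ratings = {
    "thunder":   [2.5, 4.6, 1.2, 0.0, 0.3, 0.8],
    "lemon":     [4.0, 0.5, 2.0, 4.8, 4.1, 0.2],
    "velvet":    [4.1, 0.4, 4.5, 0.2, 0.4, 0.3],
    "heartbeat": [1.8, 3.0, 2.5, 0.1, 0.1, 3.9],
}

# Per-modality alignment: correlate model and human ratings across words.
for i, modality in enumerate(MODALITIES):
    human = [human_ratings[w][i] for w in human_ratings]
    model = [model_ratings[w][i] for w in human_ratings]
    rho, p = spearmanr(human, model)
    print(f"{modality:>13}: Spearman rho = {rho:.2f} (p = {p:.3f})")
```

A high rank correlation in a given modality would indicate that the model orders words by sensory strength much as human raters do; the study's actual comparisons may use different metrics and far larger word sets.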