
LLMs vs. Human Experts in Psychological Assessment
AI tools show promise in evaluating personality test validity
This research evaluates how Large Language Models (LLMs) perform relative to human experts when assessing content validity in personality tests such as the Big Five Questionnaire (BFQ) and the Big Five Inventory (BFI).
- Compares LLMs and psychology graduate students on semantic item-construct alignment, i.e., judging how well each test item reflects its intended construct
- Demonstrates potential for AI to support psychology professionals in test validation
- Provides a methodological framework for using embeddings in psychometric instrument evaluation (see the sketch after this list)
- Explores implications for more efficient, objective content validity assessment
This research matters for clinical psychology: technological tools that help verify a measure assesses its intended constructs could ultimately improve diagnostic accuracy and treatment planning.