Improving LLM Reliability in Social Sciences

Applying survey methodology to enhance AI text annotation

This research adapts established survey methodology principles to systematically assess and improve the reliability of Large Language Model (LLM) annotations in social science research.

  • Implements three key interventions: option randomization, position randomization, and reverse validation (sketched after this list)
  • Reveals how traditional accuracy metrics can mask model instabilities, especially in edge cases
  • Demonstrates framework effectiveness using the F1000 biomedical dataset
  • Provides a structured approach to evaluate LLM annotation reliability beyond simple accuracy metrics
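
The summary does not spell out how the three interventions are implemented, so the following is a minimal sketch of one plausible reading: each intervention perturbs the annotation prompt so that order- or position-dependent answers become visible. The label set, prompt wording, and the `annotate` placeholder are all hypothetical, not the authors' code.

```python
import random

# Hypothetical binary label set; substitute your own annotation scheme.
LABELS = ["relevant", "not relevant"]

def annotate(prompt: str) -> str:
    """Placeholder for an LLM call returning the model's raw answer.
    Swap in your own API client here."""
    raise NotImplementedError

def option_randomized_prompt(text: str, labels: list[str]) -> tuple[str, list[str]]:
    """Option randomization: shuffle the answer options so the model
    cannot exploit a fixed option order (e.g., always picking 'A')."""
    shuffled = random.sample(labels, k=len(labels))
    options = "\n".join(f"{chr(65 + i)}. {lab}" for i, lab in enumerate(shuffled))
    prompt = (f"Annotate the text below.\n\nText: {text}\n\n"
              f"Options:\n{options}\nAnswer with a single option letter.")
    return prompt, shuffled

def position_randomized_prompt(text: str, labels: list[str]) -> str:
    """Position randomization: vary where the text appears relative to
    the instructions, to detect position-dependent answers."""
    options = ", ".join(labels)
    if random.random() < 0.5:
        return f"Text: {text}\n\nLabel it as one of: {options}."
    return f"Label the following text as one of: {options}.\n\nText: {text}"

def reverse_validation_prompt(text: str, label: str) -> str:
    """Reverse validation: ask the model to confirm its own label with an
    inverted yes/no question; a reliable model should not contradict itself."""
    return f"Text: {text}\n\nIs '{label}' the correct label? Answer yes or no."
```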

For medical research, this framework enables more reliable LLM-based analysis of biomedical literature and clinical notes by identifying and mitigating biases that traditional evaluation methods might miss.
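
One simple diagnostic for the instabilities that pooled accuracy can hide is to annotate each item several times under randomized prompts and record per-item agreement: items whose labels flip across runs are exactly the edge cases the summary mentions. This sketch reuses the hypothetical `annotate` and `option_randomized_prompt` helpers from the block above; it is an illustration, not the paper's metric.

```python
from collections import Counter

def stability_report(items: list[str], n_runs: int = 5) -> list[dict]:
    """Annotate each item n_runs times under option-randomized prompts and
    report the modal label plus its agreement rate; low agreement flags
    unstable items even when aggregate accuracy looks acceptable."""
    report = []
    for text in items:
        labels = []
        for _ in range(n_runs):
            prompt, shuffled = option_randomized_prompt(text, LABELS)
            letter = annotate(prompt).strip().upper()[0]      # e.g. "B"
            labels.append(shuffled[ord(letter) - ord("A")])   # letter -> label
        top_label, top_count = Counter(labels).most_common(1)[0]
        report.append({"text": text,
                       "label": top_label,
                       "agreement": top_count / n_runs})
    return report
```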

Old Experience Helps: Leveraging Survey Methodology to Improve AI Text Annotation Reliability in Social Sciences
