
Improving LLM Reliability in Social Sciences
Applying survey methodology to enhance AI text annotation
This work adapts established survey methodology principles to systematically assess and improve the reliability of Large Language Model (LLM) annotations in social science research.
- Implements three key interventions: option randomization, position randomization, and reverse validation (see the sketch after this list)
- Reveals how traditional accuracy metrics can mask model instabilities, especially in edge cases
- Demonstrates framework effectiveness using the F1000 biomedical dataset
- Provides a structured approach to evaluate LLM annotation reliability beyond simple accuracy metrics
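Below is a minimal Python sketch of what the three interventions could look like in practice. The function names and prompt wording are illustrative assumptions, not the authors' actual implementation, and `query_llm` is a placeholder for whatever LLM API call is used.

```python
import random

def query_llm(prompt: str) -> str:
    """Placeholder for an actual LLM API call (assumption, not a real client)."""
    raise NotImplementedError

def option_randomization(text: str, options: list[str]) -> str:
    """Shuffle the answer options so the model cannot exploit a fixed option order."""
    shuffled = random.sample(options, k=len(options))
    prompt = f"Annotate the text below.\nOptions: {', '.join(shuffled)}\nText: {text}"
    return query_llm(prompt)

def position_randomization(item_a: str, item_b: str, question: str) -> str:
    """Randomize which item appears first in a pairwise prompt to surface position bias."""
    first, second = random.sample([item_a, item_b], k=2)
    prompt = f"{question}\nItem 1: {first}\nItem 2: {second}"
    return query_llm(prompt)

def reverse_validation(text: str, label: str) -> bool:
    """Ask the same question in forward and reversed form; agreement across
    both phrasings signals a more stable (reliable) annotation."""
    forward = query_llm(f"Does the label '{label}' apply to this text? {text}")
    reverse = query_llm(f"Is the label '{label}' inapplicable to this text? {text}")
    return (forward.strip().lower().startswith("yes")
            and reverse.strip().lower().startswith("no"))
```

Repeating each annotation under these perturbations and measuring how often the model's answer changes gives a stability estimate that plain accuracy on a single prompt ordering would not reveal.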
For medical research, this framework enables more reliable LLM-based analysis of biomedical literature and clinical notes by identifying and mitigating biases that traditional evaluation methods might miss.