
LLMs as Software Engineering Evaluators
Using AI to replace human annotation in software engineering research
This research explores whether Large Language Models (LLMs) can effectively replace human participants in software engineering evaluation studies.
- LLMs show promise in annotating software engineering artifacts, potentially reducing the cost and difficulty of human-subject studies
- The study evaluates LLMs' ability to assess code quality, identify bugs, and provide feedback comparable to that of human experts (a minimal sketch of such an annotation setup follows this list)
- Results suggest LLMs could serve as preliminary evaluators before engaging human subjects in software engineering research
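To make the setup concrete, below is a minimal sketch of what LLM-based annotation of a software engineering artifact might look like: prompting a model to rate a code snippet's readability in place of a human rater. The OpenAI Python SDK, the "gpt-4o" model, and the 1-5 readability scale are illustrative assumptions, not details taken from the study.

```python
# Hypothetical sketch: using an LLM to annotate a code snippet with a quality
# rating, standing in for a human study participant. Assumes the OpenAI
# Python SDK (>=1.0) and the "gpt-4o" model; the study does not specify a
# provider, model, or rating scheme.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ANNOTATION_PROMPT = """You are acting as a participant in a software engineering study.
Rate the following code snippet for readability on a 1-5 Likert scale
(1 = very poor, 5 = excellent). Reply with the number on the first line,
then a one-sentence justification on the next line.

Code:
{code}
"""


def annotate_snippet(code: str) -> tuple[int, str]:
    """Ask the LLM for a readability rating (1-5) and a short justification."""
    response = client.chat.completions.create(
        model="gpt-4o",   # assumed model name; swap for any chat-capable model
        temperature=0,    # keep ratings as reproducible as possible
        messages=[{"role": "user", "content": ANNOTATION_PROMPT.format(code=code)}],
    )
    text = response.choices[0].message.content.strip()
    rating_line, _, justification = text.partition("\n")
    return int(rating_line.strip()), justification.strip()


if __name__ == "__main__":
    snippet = "def squares(n):\n    return [i * i for i in range(n)]"
    rating, why = annotate_snippet(snippet)
    print(f"Readability: {rating}/5 -- {why}")
```

In a real study, the same prompt would be applied across many artifacts and the LLM's ratings compared against human annotations (for example, via inter-rater agreement) before treating the model as a substitute evaluator.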
For engineering teams, an approach like this could streamline software testing workflows, shorten feedback cycles, and reduce dependence on scarce expert evaluators for code reviews.
Can LLMs Replace Manual Annotation of Software Engineering Artifacts?