
LLMs as Software Engineering Evaluators
Using AI to replace human annotation in software engineering research
This research explores whether Large Language Models (LLMs) can effectively replace human participants in software engineering evaluation studies.
- LLMs show promise in annotating software engineering artifacts, potentially reducing the cost and difficulty of human-subject studies
- The study evaluates LLMs' ability to assess code quality, identify bugs, and provide feedback comparable to that of human experts (a minimal sketch of such an annotation setup follows this list)
- Results suggest LLMs could serve as preliminary evaluators before engaging human subjects in software engineering research
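To make the setup concrete, below is a minimal sketch of what LLM-based annotation of a software engineering artifact might look like: prompting a model to rate a code snippet's readability in place of a human rater. The OpenAI Python SDK, the "gpt-4o" model, and the 1-5 readability scale are illustrative assumptions, not details taken from the study.

```python
# Hypothetical sketch: using an LLM to annotate a code snippet with a quality
# rating, standing in for a human study participant. Assumes the OpenAI
# Python SDK (>=1.0) and the "gpt-4o" model; the study does not specify a
# provider, model, or rating scheme.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ANNOTATION_PROMPT = """You are acting as a participant in a software engineering study.
Rate the following code snippet for readability on a 1-5 Likert scale
(1 = very poor, 5 = excellent). Reply with the number on the first line,
then a one-sentence justification on the next line.

Code:
{code}
"""


def annotate_snippet(code: str) -> tuple[int, str]:
    """Ask the LLM for a readability rating (1-5) and a short justification."""
    response = client.chat.completions.create(
        model="gpt-4o",   # assumed model name; swap for any chat-capable model
        temperature=0,    # keep ratings as reproducible as possible
        messages=[{"role": "user", "content": ANNOTATION_PROMPT.format(code=code)}],
    )
    text = response.choices[0].message.content.strip()
    rating_line, _, justification = text.partition("\n")
    return int(rating_line.strip()), justification.strip()


if __name__ == "__main__":
    snippet = "def squares(n):\n    return [i * i for i in range(n)]"
    rating, why = annotate_snippet(snippet)
    print(f"Readability: {rating}/5 -- {why}")
```

In a real study, the same prompt would be applied across many artifacts and the LLM's ratings compared against human annotations (for example, via inter-rater agreement) before treating the model as a substitute evaluator.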
For engineering teams, an approach like this could streamline software testing workflows, shorten feedback cycles, and reduce dependence on scarce expert evaluators for code reviews.
Can LLMs Replace Manual Annotation of Software Engineering Artifacts?