Evaluating LLMs in Software Engineering

A systematic framework for rigorous empirical research

New guidelines for conducting empirical studies involving Large Language Models in software engineering research, addressing the current lack of standardized evaluation approaches.

  • Establishes methodological standards for LLM-based software engineering research
  • Provides a structured framework to ensure validity and reproducibility (see the sketch after this list)
  • Addresses unique challenges of using LLMs in empirical studies
  • Promotes scientific rigor in a rapidly evolving research landscape
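As a concrete illustration of the reproducibility point above, the following is a minimal sketch, not taken from the paper, of the kind of run metadata an LLM-based evaluation might record so that results can be rerun and audited: the exact model identifier, decoding settings, seed, prompt template, and dataset version. All names here (EvalRunRecord, record_run, the model identifier) are hypothetical placeholders.

```python
import json
import hashlib
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class EvalRunRecord:
    """Metadata logged for one LLM evaluation run so it can be replicated later."""
    model: str            # exact, versioned model identifier used for the run
    temperature: float    # sampling temperature
    seed: int             # random seed, where the provider or framework supports one
    prompt_template: str  # full prompt template sent to the model
    dataset_id: str       # identifier or hash of the benchmark/dataset version
    timestamp: str        # when the run was executed (UTC, ISO 8601)

def record_run(model: str, temperature: float, seed: int,
               prompt_template: str, dataset_id: str) -> EvalRunRecord:
    """Create a reproducibility record for a single evaluation run."""
    return EvalRunRecord(
        model=model,
        temperature=temperature,
        seed=seed,
        prompt_template=prompt_template,
        dataset_id=dataset_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

if __name__ == "__main__":
    run = record_run(
        model="example-llm-2024-06-01",   # hypothetical model identifier
        temperature=0.0,                  # low-variance decoding for evaluation
        seed=42,
        prompt_template="Fix the bug in the following function:\n{code}",
        dataset_id=hashlib.sha256(b"bugfix-benchmark-v1").hexdigest()[:12],
    )
    print(json.dumps(asdict(run), indent=2))
```

Logging such a record alongside every result makes it possible to repeat or audit an evaluation even as models, prompts, and benchmarks evolve.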

This work matters for engineering teams because more rigorous evaluation of LLM capabilities lets implementation decisions rest on trustworthy evidence rather than hype.

Towards Evaluation Guidelines for Empirical Studies involving LLMs
