
Evaluating Code Generation LLMs
A comprehensive survey of methods for assessing AI code generation capabilities
This survey provides a systematic review of how Large Language Models are evaluated on code generation tasks, addressing the growing demand for automated software development tools.
- Establishes standardized metrics for assessing code quality, functional correctness, and efficiency (see the metric sketch after this list)
- Reviews the historical evolution of LLMs for code generation
- Identifies current evaluation challenges and methodological gaps
- Proposes improved evaluation frameworks for engineering applications
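The survey itself covers a range of metrics, but functional correctness in code generation is most commonly reported as pass@k: the probability that at least one of k sampled completions passes the unit tests. As a minimal, illustrative sketch (not the paper's own implementation), the snippet below computes the standard unbiased pass@k estimator popularized by the HumanEval benchmark; the sample counts in the example are assumptions chosen for illustration.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for a single problem.

    n: total completions sampled for the problem
    c: completions that pass all unit tests
    k: budget of samples considered
    """
    if n - c < k:
        # Fewer failing samples than k: every k-subset contains a correct one.
        return 1.0
    # pass@k = 1 - C(n - c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 200 samples per problem, 12 correct, budget k = 10.
print(pass_at_k(n=200, c=12, k=10))  # ≈ 0.47
```

Benchmark-level scores are then typically the mean of this per-problem estimate over the whole problem set; other dimensions the survey discusses, such as code quality and efficiency, require separate metrics.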
For engineering teams, the survey offers guidance on selecting appropriate LLMs for development workflows and on understanding their capabilities and limitations in real-world coding scenarios.
A Survey on Evaluating Large Language Models in Code Generation Tasks