
Bridging the Gap: LLMs and Code Review Comprehension
Evaluating how well AI understands human code feedback
This research introduces CodeReviewQA, a framework for assessing how well large language models comprehend and respond to real-world code review comments.
- Identifies a critical gap between LLMs' code generation capabilities and their ability to understand ambiguous, colloquial human feedback
- Evaluates LLMs on their comprehension of code review comments, a skill central to practical software engineering applications
- Provides a systematic assessment methodology that connects technical code understanding with the conversational context of reviewer feedback (see the sketch after this list)
- Highlights challenges that must be overcome for LLMs to become effective collaborators in software development workflows
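As a rough illustration of what such an assessment can look like in practice, the sketch below frames comprehension as a multiple-choice question over a review comment and scores a model's answers against labelled ground truth. This is a minimal, hypothetical harness, not the benchmark's actual interface; names such as `ComprehensionItem`, `build_prompt`, and `query_model` are assumptions introduced for illustration.

```python
# Hypothetical sketch: score a model on multiple-choice comprehension of
# code review comments. Field names and the query_model callable are
# illustrative, not the paper's actual API.
from dataclasses import dataclass


@dataclass
class ComprehensionItem:
    code_snippet: str      # the code under review
    review_comment: str    # the human reviewer's feedback
    choices: list[str]     # candidate interpretations of the requested change
    answer_index: int      # index of the correct choice


def build_prompt(item: ComprehensionItem) -> str:
    # Present the snippet, the comment, and lettered answer options.
    options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(item.choices))
    return (
        "Code under review:\n"
        f"{item.code_snippet}\n\n"
        f"Reviewer comment: {item.review_comment}\n\n"
        "Which change is the reviewer asking for?\n"
        f"{options}\n"
        "Answer with a single letter."
    )


def evaluate(items: list[ComprehensionItem], query_model) -> float:
    """Return accuracy of query_model (a callable prompt -> str) on the items."""
    correct = 0
    for item in items:
        reply = query_model(build_prompt(item)).strip().upper()
        predicted = ord(reply[0]) - 65 if reply else -1
        correct += int(predicted == item.answer_index)
    return correct / len(items) if items else 0.0
```

A harness in this shape makes comprehension measurable independently of code generation: a model can be scored on whether it understood the reviewer's intent before it is asked to produce a revised patch.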
For engineering teams, this research clarifies the current limitations of AI assistants in code review and points to ways of making human-AI collaboration in software development more effective.
CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models