Improving AI Code Review Quality

This research addresses the problem of noisy training data in AI-powered code review automation, with the goal of producing more accurate and useful review comments.

Identifies persistent noise issues in code review datasets that compromise model quality
Develops data-cleaning techniques that go beyond traditional heuristics
Demonstrates that higher-quality training data produces more actionable and specific AI code review comments
Provides a framework for identifying and filtering low-value comments from training datasets
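
As a rough illustration of the filtering idea in the last point, the sketch below drops low-value comments from a set of (diff, comment) pairs before training. It is a minimal sketch, not the paper's method: `ReviewExample`, `toy_is_valuable`, and `filter_examples` are hypothetical names, and the toy rule is only a placeholder for whatever classifier actually judges comment quality.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ReviewExample:
    """One training pair: a code change and the reviewer's comment on it."""
    diff: str
    comment: str


def toy_is_valuable(example: ReviewExample) -> bool:
    """Hypothetical placeholder for a real comment-quality classifier.

    Rejects a few vague one-word or stock comments and anything
    shorter than four words. A real pipeline would swap this out.
    """
    text = example.comment.strip().lower()
    vague = {"lgtm", "why?", "nit", "??", "same here", "ditto"}
    if text in vague:
        return False
    return len(text.split()) >= 4


def filter_examples(
    examples: Iterable[ReviewExample],
    is_valuable: Callable[[ReviewExample], bool],
) -> List[ReviewExample]:
    """Keep only the examples the supplied predicate accepts."""
    return [ex for ex in examples if is_valuable(ex)]


raw = [
    ReviewExample("+ x = parse(s)", "LGTM"),
    ReviewExample("+ x = parse(s)", "Why?"),
    ReviewExample("+ x = parse(s)", "Consider checking for None before calling parse() here."),
]
cleaned = filter_examples(raw, toy_is_valuable)
```

Passing the predicate as an argument keeps the filtering step independent of how comment quality is judged, so a stronger classifier can replace the toy rule without changing the rest of the pipeline.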

For engineering teams, this research enables more effective automated code review systems that can provide genuinely helpful feedback, potentially reducing review time while maintaining quality standards.

Too Noisy To Learn: Enhancing Data Quality for Code Review Comment Generation