Can LLMs Fix Real-World Code Maintenance Issues?

Evaluating Copilot Chat and Llama 3.1 on actual GitHub maintainability problems

This research systematically evaluates how effectively Large Language Models resolve code maintainability issues found in real-world GitHub projects.

Key Findings:

  • Tested 127 maintainability issues across 10 GitHub repositories
  • Compared zero-shot prompting (Copilot Chat, Llama 3.1) vs. few-shot prompting (Llama 3.1); a prompt-construction sketch follows this list
  • Evaluated solutions for compilation errors, test failures, and introduction of new issues; a validation sketch follows the Engineering Impact note
  • Few-shot prompting with Llama 3.1 demonstrated the best overall performance
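
To make the comparison concrete, here is a minimal sketch of how the two prompting setups differ: zero-shot sends only the issue description and the affected code, while few-shot prepends worked before/after examples. The template, example content, and function names (build_zero_shot_prompt, build_few_shot_prompt) are illustrative assumptions, not the paper's actual prompts.

```python
# Hypothetical sketch of the two prompting strategies compared in the study.
# Template and example content are illustrative, not the paper's actual prompts.

FIX_REQUEST = (
    "The following code has a maintainability issue: {issue}\n"
    "Rewrite the code to fix the issue without changing its behavior.\n\n"
    "{code}"
)

# Few-shot examples as (issue, code before, code after) triples, ideally drawn
# from previously fixed issues in the same project. This sample is invented.
FEW_SHOT_EXAMPLES = [
    (
        "Method is too long and mixes parsing with validation.",
        "def load(path):\n    ...  # one 80-line method\n",
        "def load(path):\n    data = _parse(path)\n    _validate(data)\n    return data\n",
    ),
]


def build_zero_shot_prompt(issue: str, code: str) -> str:
    """Zero-shot: the bare fix request, with no worked examples."""
    return FIX_REQUEST.format(issue=issue, code=code)


def build_few_shot_prompt(issue: str, code: str) -> str:
    """Few-shot: prepend worked examples so the model sees the expected fix style."""
    shots = [
        FIX_REQUEST.format(issue=ex_issue, code=before) + "\nFixed code:\n" + after
        for ex_issue, before, after in FEW_SHOT_EXAMPLES
    ]
    return "\n\n---\n\n".join(shots + [FIX_REQUEST.format(issue=issue, code=code)])
```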

Engineering Impact: The results give development teams practical guidance on which LLM prompting approaches are most effective for code maintenance and technical debt reduction in real-world projects.
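
The three acceptance checks listed above suggest a validation loop along the following lines. This is a minimal sketch assuming a Maven-built Java project and a static analyzer that emits a JSON list of findings; the function names, commands, and report format are assumptions, not the paper's actual harness.

```python
# Hypothetical validation harness mirroring the study's three acceptance
# checks: does the patched project compile, do its tests pass, and does
# static analysis report new issues? The Maven commands and the JSON report
# format are assumptions about the setup, not the paper's actual harness.
import json
import subprocess


def run(cmd: list[str], cwd: str) -> bool:
    """Run a build/test command in the project directory; True on exit code 0."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True).returncode == 0


def count_issues(report_path: str) -> int:
    """Count findings in an analyzer report, assumed to be a JSON list."""
    with open(report_path) as f:
        return len(json.load(f))


def validate_patch(project_dir: str, report_before: str, report_after: str) -> dict:
    compiles = run(["mvn", "-q", "compile"], project_dir)
    tests_pass = compiles and run(["mvn", "-q", "test"], project_dir)
    new_issues = tests_pass and count_issues(report_after) > count_issues(report_before)
    return {
        "compiles": compiles,
        "tests_pass": tests_pass,
        "introduces_new_issues": new_issues,
    }
```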

Paper: Evaluating the Effectiveness of LLMs in Fixing Maintainability Issues in Real-World Projects
