
Evaluating AI Coding Assistants
Bringing Human-Centered Design to Automated LLM Evaluation
This research proposes a hybrid approach that combines HCI and AI methods to evaluate LLM-powered conversational coding assistants at scale while upholding human-centered design principles.
- Addresses the limitations of traditional human evaluation methods for LLM-based developer tools
- Advocates for automatic evaluation techniques informed by human-centered design (illustrated in the sketch after this list)
- Creates a framework to ensure AI coding assistants align with developers' actual needs
- Bridges the gap between qualitative human insights and quantitative AI evaluation
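One way to picture this bridge is an automated LLM-as-judge harness whose scoring rubric is derived from human-centered research rather than from purely technical metrics. The sketch below is only illustrative and is not the paper's framework: the rubric criteria, the prompt format, and the `evaluate_conversation` helper are hypothetical placeholders, and the judge model is injected as a generic callable so that no particular LLM API is assumed.

```python
# Illustrative sketch (not the paper's implementation): score a coding
# assistant's answer against rubric criteria distilled from human-centered
# research (e.g., interviews or user studies), using an LLM judge.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RubricCriterion:
    """One human-derived evaluation criterion: a name and what 'good' looks like."""
    name: str
    description: str


# Hypothetical criteria; a real framework would derive these from its own
# qualitative findings about developer needs and workflows.
HUMAN_CENTERED_RUBRIC: List[RubricCriterion] = [
    RubricCriterion("relevance", "Addresses the developer's actual question and context."),
    RubricCriterion("actionability", "Gives concrete next steps or code the developer can apply."),
    RubricCriterion("workflow_fit", "Respects the surrounding task and does not derail the developer."),
]


def build_judge_prompt(question: str, answer: str, criterion: RubricCriterion) -> str:
    """Format a grading prompt for a judge LLM from one rubric criterion."""
    return (
        f"Rate the assistant answer from 1 (poor) to 5 (excellent) on "
        f"'{criterion.name}': {criterion.description}\n\n"
        f"Developer question:\n{question}\n\n"
        f"Assistant answer:\n{answer}\n\n"
        f"Reply with a single integer."
    )


def evaluate_conversation(
    question: str,
    answer: str,
    judge: Callable[[str], str],  # any text-in/text-out LLM call
) -> Dict[str, int]:
    """Score one question/answer pair on every human-centered criterion."""
    scores: Dict[str, int] = {}
    for criterion in HUMAN_CENTERED_RUBRIC:
        raw = judge(build_judge_prompt(question, answer, criterion))
        try:
            scores[criterion.name] = max(1, min(5, int(raw.strip())))
        except ValueError:
            scores[criterion.name] = 1  # treat unparseable judge output as a failure
    return scores


if __name__ == "__main__":
    # Stub judge so the sketch runs without any model; swap in a real LLM call.
    demo_scores = evaluate_conversation(
        "How do I profile a slow SQL query?",
        "Use EXPLAIN ANALYZE to inspect the query plan, then index the filtered column.",
        judge=lambda prompt: "4",
    )
    print(demo_scores)
```

The design point is that the quantitative loop (automated judging at scale) stays cheap and repeatable, while the qualitative, human-centered work determines what the loop actually measures.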
This work is significant for engineering teams because it provides a practical pathway to evaluating AI coding assistants not just on technical metrics but on how well they serve actual developer workflows and expectations.
Bridging HCI and AI Research for the Evaluation of Conversational SE Assistants