
Engineering Better LLM Agents
A systematic approach to evaluating and developing autonomous AI systems
This research introduces a comprehensive evaluation-driven development process model for LLM agents, addressing the challenges posed by their dynamic, adaptive nature.
- Proposes a reference architecture specifically designed for LLM agent evaluation
- Establishes systematic methods to assess performance and safety of autonomous AI systems
- Addresses the challenge of evaluating systems whose behavior can change after deployment without explicit code changes
- Creates a foundation for engineering practices that ensure AI system reliability
For engineering teams, this framework offers a structured approach to developing more reliable and safer LLM agents while maintaining alignment with governance requirements throughout the development lifecycle.
Evaluation-Driven Development of LLM Agents: A Process Model and Reference Architecture