
AI-Driven Root Cause Analysis
Solving Distributed System Failures with Code-Enhanced LLMs
COCA combines code knowledge with LLMs to automatically identify the root causes of distributed system failures, using only user-reported issues as input.
- Integrates code context knowledge with generative AI to diagnose failures without comprehensive monitoring data
- Works effectively with the limited information found in GitHub or Jira issue reports
- Significantly outperforms existing techniques in failure diagnosis accuracy
- Provides actionable insights for rapid problem resolution in complex distributed systems
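To make the general idea concrete, the following is a minimal sketch of how an issue report might be paired with related code before being handed to an LLM. It is not COCA's actual pipeline: the `IssueReport` type, the `CODE_INDEX` dictionary, the symbol-matching regular expression, and the prompt wording are all hypothetical and chosen only for illustration. The sketch stops at building the prompt and does not call any model.

```python
import re
from dataclasses import dataclass


@dataclass
class IssueReport:
    """A user-reported issue, e.g. from GitHub or Jira (hypothetical shape)."""
    title: str
    body: str


# Hypothetical stand-in for a code index: maps "Class.method" to its source.
CODE_INDEX = {
    "LeaderElector.elect": (
        "def elect(self):\n"
        "    if self.lease_expired():\n"
        "        self.become_leader()"
    ),
    "LogStore.append": (
        "def append(self, entry):\n"
        "    self.entries.append(entry)"
    ),
}

# Assumed heuristic: code symbols look like "ClassName.method_name".
SYMBOL_PATTERN = re.compile(r"\b[A-Z][A-Za-z0-9]*\.[a-z_][A-Za-z0-9_]*\b")


def extract_symbols(report: IssueReport) -> list[str]:
    """Return code symbols mentioned in the report, in order, without repeats."""
    text = f"{report.title}\n{report.body}"
    seen: list[str] = []
    for symbol in SYMBOL_PATTERN.findall(text):
        if symbol not in seen:
            seen.append(symbol)
    return seen


def build_prompt(report: IssueReport, code_index: dict[str, str]) -> str:
    """Combine the issue text with any indexed code it mentions."""
    sections = [f"Issue: {report.title}", report.body, "Related code:"]
    for symbol in extract_symbols(report):
        source = code_index.get(symbol)
        if source is not None:  # skip symbols the index does not know
            sections.append(f"# {symbol}\n{source}")
    sections.append("Question: what is the most likely root cause?")
    return "\n\n".join(sections)
```

Symbols that appear in the report but not in the index are silently skipped, so the prompt only ever contains code that was actually found. A real system would need a far more robust way of locating relevant code than a single regular expression.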
This research addresses a critical engineering challenge: reducing mean time to resolution for distributed system failures, which could save companies substantial development time and reduce system downtime.
COCA: Generative Root Cause Analysis for Distributed Systems with Code Knowledge