AI-Driven Root Cause Analysis

Solving Distributed System Failures with Code-Enhanced LLMs

COCA is a novel approach that leverages code knowledge and LLMs to automatically identify root causes of distributed system failures using only user-reported issues.

  • Integrates code context knowledge with generative AI to diagnose failures without comprehensive monitoring data
  • Works effectively with limited information from GitHub or JIRA issue reports
  • Significantly outperforms existing techniques in failure diagnosis accuracy
  • Provides actionable insights for rapid problem resolution in complex distributed systems
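As a rough illustration of the first two points, the sketch below pairs an issue report with the code snippets that appear most relevant to it and assembles a prompt for an LLM. It is not COCA's actual pipeline: the `CodeSnippet` type, the `rank_snippets` and `build_prompt` helpers, and the keyword-overlap ranking are all hypothetical stand-ins chosen for brevity, and no model is called.

```python
import re
from dataclasses import dataclass


@dataclass
class CodeSnippet:
    """Hypothetical container for a piece of source code and its location."""
    path: str
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercased identifier-like tokens of at least three characters.
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]{2,}", text.lower()))


def rank_snippets(issue: str, snippets: list[CodeSnippet], top_k: int = 2) -> list[CodeSnippet]:
    # Assumption: plain keyword overlap stands in for real code-knowledge retrieval.
    issue_tokens = tokenize(issue)
    scored = [(len(issue_tokens & tokenize(s.text)), s) for s in snippets]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated code
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]


def build_prompt(issue: str, snippets: list[CodeSnippet]) -> str:
    # Combine the user-reported issue with the selected code context.
    context = "\n\n".join(f"# {s.path}\n{s.text}" for s in snippets)
    return (
        "You are diagnosing a distributed system failure.\n\n"
        f"Issue report:\n{issue}\n\n"
        f"Relevant code:\n{context}\n\n"
        "State the most likely root cause."
    )


if __name__ == "__main__":
    issue = "Leader election times out with SessionExpiredException after network partition"
    repo = [
        CodeSnippet("Session.java", "throw new SessionExpiredException(sessionId);"),
        CodeSnippet("Math.java", "int add(int x, int y) { return x + y; }"),
    ]
    print(build_prompt(issue, rank_snippets(issue, repo)))
```

The point of the sketch is only the shape of the input: the issue text alone carries little signal, so code that mentions the same exceptions or identifiers is attached before the model is asked for a diagnosis.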

This research addresses a critical engineering challenge: reducing mean time to resolution for distributed system failures, potentially saving companies thousands of development hours and reducing system downtime.

COCA: Generative Root Cause Analysis for Distributed Systems with Code Knowledge
