Taming Flaky Tests with LLMs

Leveraging AI to detect and classify non-deterministic tests

This research explores how Large Language Models can improve software testing reliability by identifying flaky tests: tests that nondeterministically pass or fail even though neither the test nor the code under test has changed.

  • Evaluates both fine-tuning and few-shot learning approaches for flaky test detection (see the sketch after this list)
  • Demonstrates that LLMs can classify flaky tests by category of flakiness, not merely flag them
  • Offers a more efficient alternative to traditional detection methods, which rely on rerunning each test many times
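
The few-shot approach can be illustrated with a short sketch: a handful of labeled (test, category) examples are placed in the prompt, and the model is asked to label a new test. This is a minimal illustration assuming the `openai` Python client (v1+); the category labels, example tests, helper names, and model choice are placeholders, not the paper's actual taxonomy or prompts.

```python
# A minimal few-shot flaky-test classifier (sketch, not the paper's setup).
# Assumes: openai>=1.0 installed and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# Illustrative categories drawn from the flaky-test literature; the paper's
# taxonomy may differ.
CATEGORIES = ["async wait", "concurrency", "test order dependency",
              "time", "network", "not flaky"]

# A few labeled (test source, category) examples shown to the model.
FEW_SHOT_EXAMPLES = [
    ("def test_fetch():\n"
     "    start_server()\n"
     "    time.sleep(1)  # hope the server is up by now\n"
     "    assert ping() == 'ok'", "async wait"),
    ("def test_add():\n"
     "    assert add(2, 2) == 4", "not flaky"),
]

def classify_test(test_source: str) -> str:
    """Return one category label for the given test's source code."""
    messages = [{
        "role": "system",
        "content": ("You classify unit tests for flakiness. Reply with "
                    "exactly one label from: " + ", ".join(CATEGORIES) + "."),
    }]
    # Interleave the few-shot examples as prior user/assistant turns.
    for source, label in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": source})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": test_source})

    response = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder; any chat model works
        messages=messages,
        temperature=0,         # deterministic labels for classification
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    suspect = ("def test_job():\n"
               "    submit_job()\n"
               "    time.sleep(0.5)\n"
               "    assert job_done()")
    print(classify_test(suspect))  # expected: "async wait"
```

Setting temperature to 0 makes the labels reproducible, which matters when the classifier itself must not be flaky; the fine-tuning approach replaces the in-prompt examples with a model trained on a larger labeled corpus.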

For engineering teams, this approach reduces debugging time and improves continuous integration reliability by flagging flaky tests before they disrupt development workflows.

An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification
