Taming Flaky Tests with LLMs

Leveraging AI to detect and classify non-deterministic tests

This research explores how Large Language Models can improve software testing reliability by identifying flaky tests: tests that nondeterministically pass or fail even though neither the test nor the code under test has changed.

  • Evaluates both fine-tuning and few-shot learning approaches for flaky test detection (see the sketch after this list)
  • Demonstrates that LLMs can classify flaky tests by category of flakiness, not merely flag them
  • Offers a more efficient alternative to traditional detection methods, which rely on rerunning each test many times
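
The few-shot approach can be illustrated with a short sketch: a handful of labeled (test, category) examples are placed in the prompt, and the model is asked to label a new test. This is a minimal illustration assuming the `openai` Python client (v1+); the category labels, example tests, helper names, and model choice are placeholders, not the paper's actual taxonomy or prompts.

```python
# A minimal few-shot flaky-test classifier (sketch, not the paper's setup).
# Assumes: openai>=1.0 installed and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# Illustrative categories drawn from the flaky-test literature; the paper's
# taxonomy may differ.
CATEGORIES = ["async wait", "concurrency", "test order dependency",
              "time", "network", "not flaky"]

# A few labeled (test source, category) examples shown to the model.
FEW_SHOT_EXAMPLES = [
    ("def test_fetch():\n"
     "    start_server()\n"
     "    time.sleep(1)  # hope the server is up by now\n"
     "    assert ping() == 'ok'", "async wait"),
    ("def test_add():\n"
     "    assert add(2, 2) == 4", "not flaky"),
]

def classify_test(test_source: str) -> str:
    """Return one category label for the given test's source code."""
    messages = [{
        "role": "system",
        "content": ("You classify unit tests for flakiness. Reply with "
                    "exactly one label from: " + ", ".join(CATEGORIES) + "."),
    }]
    # Interleave the few-shot examples as prior user/assistant turns.
    for source, label in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": source})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": test_source})

    response = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder; any chat model works
        messages=messages,
        temperature=0,         # deterministic labels for classification
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    suspect = ("def test_job():\n"
               "    submit_job()\n"
               "    time.sleep(0.5)\n"
               "    assert job_done()")
    print(classify_test(suspect))  # expected: "async wait"
```

Setting temperature to 0 makes the labels reproducible, which matters when the classifier itself must not be flaky; the fine-tuning approach replaces the in-prompt examples with a model trained on a larger labeled corpus.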

For engineering teams, this approach reduces debugging time and improves continuous integration reliability by flagging flaky tests before they disrupt development workflows.

An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification
