The LLM Reliability Gap in Cybersecurity

Evaluating LLMs for Cyber Threat Intelligence: Warning Signs Ahead

This research presents an evaluation methodology for testing how reliably Large Language Models (LLMs) can automate Cyber Threat Intelligence (CTI) tasks.

  • LLMs show significant inconsistency in CTI tasks across zero-shot, few-shot, and fine-tuned approaches
  • Research introduces a novel framework to quantify confidence levels in LLM-generated cyber threat analysis (a rough sketch of the idea follows this list)
  • Findings reveal critical reliability concerns when deploying LLMs for security operations
  • Results suggest caution is needed before integrating LLMs into mission-critical cybersecurity workflows
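
As a rough illustration of what "quantifying confidence" can mean in practice, the sketch below queries a model repeatedly on the same CTI prompt and compares its self-reported confidence against how stable its answers actually are. This is a minimal sketch of the general idea, not the paper's actual framework: `query_llm`, `n_runs`, and the returned metric names are hypothetical.

```python
import statistics
from collections import Counter

def consistency_score(answers):
    """Fraction of runs agreeing with the most common answer (1.0 = fully consistent)."""
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / len(answers)

def evaluate_reliability(query_llm, prompt, n_runs=5):
    """Query the model repeatedly on one CTI prompt and compare
    self-reported confidence against actual answer stability."""
    answers, confidences = [], []
    for _ in range(n_runs):
        # query_llm is a hypothetical callable returning
        # (answer_label, confidence in [0, 1]) for the given prompt.
        answer, confidence = query_llm(prompt)
        answers.append(answer)
        confidences.append(confidence)
    consistency = consistency_score(answers)
    mean_confidence = statistics.mean(confidences)
    return {
        "consistency": consistency,
        "mean_confidence": mean_confidence,
        # A large positive gap (high stated confidence, low answer
        # stability) is the overconfidence pattern that makes LLM
        # output risky in security operations.
        "overconfidence_gap": mean_confidence - consistency,
    }
```

For example, five runs of a threat-attribution prompt that return the modal answer only three times, at a mean stated confidence of 0.9, would score a consistency of 0.6 and an overconfidence gap of 0.3.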

For security professionals, this research highlights the importance of rigorous evaluation before adopting LLM-powered solutions in threat intelligence pipelines where accuracy and reliability are paramount.

Paper: Large Language Models are Unreliable for Cyber Threat Intelligence
