The Limits of Trust: LLMs in Systems Engineering

Why Expert Human Knowledge Still Outperforms AI in Complex Engineering Tasks

This research systematically evaluates how well Large Language Models (LLMs) generate systems engineering artifacts compared to human experts.

Key findings:

  • LLMs consistently failed to match expert quality when creating engineering artifacts, despite initially appearing convincing
  • The researchers identified and characterized specific failure modes in which LLMs underperform on complex systems engineering tasks
  • The study employed a mixed-methods approach combining qualitative and quantitative analysis to characterize LLM limitations
  • Results strongly caution against over-reliance on current LLMs for critical systems engineering work

This research matters because it provides empirical evidence of the practical limitations of current LLMs in engineering contexts, helping organizations make informed decisions about where and how to adopt AI.

Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes
