The Limits of Trust: LLMs in Systems Engineering

Why Expert Human Knowledge Still Outperforms AI in Complex Engineering Tasks

This research systematically evaluates how well Large Language Models (LLMs) generate systems engineering artifacts compared to human experts.

Key findings:

  • LLMs consistently failed to match expert quality when creating engineering artifacts, despite initially appearing convincing
  • The researchers identified and characterized specific failure modes in which LLMs underperform on complex systems engineering tasks
  • The study employed a mixed-methods approach combining qualitative and quantitative analysis to characterize LLM limitations
  • Results strongly caution against over-reliance on current LLMs for critical systems engineering work

This research matters because it provides empirical evidence of the practical limitations of current LLMs in engineering contexts, helping organizations make informed decisions about where and how to adopt AI.

Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes
