
The Limits of Trust: LLMs in Systems Engineering
Why Expert Human Knowledge Still Outperforms AI in Complex Engineering Tasks
This research systematically evaluates how well Large Language Models (LLMs) generate systems engineering artifacts compared to human experts.
Key findings:
- LLMs consistently failed to match expert quality when creating engineering artifacts, even though their output initially appeared convincing
- Researchers identified specific failure modes where LLMs underperform in complex systems engineering tasks
- The study employed a mixed-methods approach, combining qualitative and quantitative analysis, to characterize LLM limitations
- Results strongly caution against over-reliance on current LLMs for critical systems engineering work
This research matters because it provides empirical evidence of AI's practical limitations in engineering contexts, helping organizations make informed decisions about where and how to adopt these tools.