LLMs in Materials Science: Promise vs. Performance

Evaluating AI robustness for engineering applications

This research evaluates Large Language Models (LLMs) for materials science applications, assessing how reliably they perform under real-world and adversarial conditions.

  • Tested LLMs on domain-specific question answering and materials property prediction tasks
  • Evaluated models across diverse datasets, including multiple-choice question sets (a scoring sketch follows this list)
  • Assessed predictive capabilities for steel compositions and yield strengths (see the second sketch below)
  • Identified key limitations and robustness challenges in engineering contexts
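To make the Q&A side of the evaluation concrete, here is a minimal sketch of a multiple-choice scoring harness of the kind such a study might use. The query_llm stub and the sample question are hypothetical placeholders, not the authors' actual model interface or dataset.

```python
# Minimal sketch of a multiple-choice evaluation harness for materials
# science Q&A. query_llm and the sample item are illustrative placeholders,
# not the paper's code or data.

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM API call; expected to return one option letter."""
    return "B"  # stub answer for demonstration

def score_mcq(questions: list[dict]) -> float:
    """Return the model's accuracy over a list of multiple-choice items."""
    correct = 0
    for q in questions:
        options = "\n".join(f"{k}. {v}" for k, v in q["options"].items())
        prompt = f"{q['question']}\n{options}\nAnswer with one letter."
        answer = query_llm(prompt).strip().upper()[:1]
        correct += answer == q["answer"]
    return correct / len(questions)

if __name__ == "__main__":
    sample = [{
        "question": "Which element is the primary interstitial strengthener in plain carbon steel?",
        "options": {"A": "Nickel", "B": "Carbon", "C": "Chromium", "D": "Aluminium"},
        "answer": "B",
    }]
    print(f"Accuracy: {score_mcq(sample):.2%}")
```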
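The property-prediction side can be sketched in a similar hedged way: prompt for a numeric yield strength given a steel composition, parse the response, and score with mean absolute error (MAE). The predict_strength stub, the parsing rule, and the sample alloy are assumptions for illustration only.

```python
# Illustrative MAE scoring of LLM yield-strength predictions against
# reference values. predict_strength and the sample composition are
# placeholders, not the study's prompts or data.
import re

def predict_strength(composition: str) -> str:
    """Placeholder for an LLM call returning a yield strength in MPa."""
    return "Approximately 350 MPa"  # stub response for demonstration

def parse_mpa(text: str) -> float:
    """Extract the first numeric value (assumed to be MPa) from a response."""
    match = re.search(r"(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"No numeric value found in: {text!r}")
    return float(match.group(1))

def mae(pairs: list[tuple[str, float]]) -> float:
    """Mean absolute error between parsed predictions and reference values."""
    errors = [abs(parse_mpa(predict_strength(c)) - y) for c, y in pairs]
    return sum(errors) / len(errors)

if __name__ == "__main__":
    samples = [("Fe-0.2C-1.0Mn (wt%)", 370.0)]  # illustrative composition/label
    print(f"MAE: {mae(samples):.1f} MPa")
```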

For engineers, this research provides critical insights into where LLMs can be trusted in materials science workflows and highlights areas requiring human expertise or specialized approaches.

Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions
