LLMs in Materials Science: Promise vs. Performance

Evaluating AI robustness for engineering applications

This research evaluates Large Language Models (LLMs) for materials science applications, assessing how reliably they perform under real-world and adversarial conditions.

  • Tested LLMs on domain-specific question answering and materials property prediction tasks
  • Evaluated models across diverse datasets, including multiple-choice question sets (a scoring sketch follows this list)
  • Assessed predictive capabilities for steel compositions and yield strengths (see the second sketch below)
  • Identified key limitations and robustness challenges in engineering contexts
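To make the Q&A side of the evaluation concrete, here is a minimal sketch of a multiple-choice scoring harness of the kind such a study might use. The query_llm stub and the sample question are hypothetical placeholders, not the authors' actual model interface or dataset.

```python
# Minimal sketch of a multiple-choice evaluation harness for materials
# science Q&A. query_llm and the sample item are illustrative placeholders,
# not the paper's code or data.

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM API call; expected to return one option letter."""
    return "B"  # stub answer for demonstration

def score_mcq(questions: list[dict]) -> float:
    """Return the model's accuracy over a list of multiple-choice items."""
    correct = 0
    for q in questions:
        options = "\n".join(f"{k}. {v}" for k, v in q["options"].items())
        prompt = f"{q['question']}\n{options}\nAnswer with one letter."
        answer = query_llm(prompt).strip().upper()[:1]
        correct += answer == q["answer"]
    return correct / len(questions)

if __name__ == "__main__":
    sample = [{
        "question": "Which element is the primary interstitial strengthener in plain carbon steel?",
        "options": {"A": "Nickel", "B": "Carbon", "C": "Chromium", "D": "Aluminium"},
        "answer": "B",
    }]
    print(f"Accuracy: {score_mcq(sample):.2%}")
```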
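The property-prediction side can be sketched in a similar hedged way: prompt for a numeric yield strength given a steel composition, parse the response, and score with mean absolute error (MAE). The predict_strength stub, the parsing rule, and the sample alloy are assumptions for illustration only.

```python
# Illustrative MAE scoring of LLM yield-strength predictions against
# reference values. predict_strength and the sample composition are
# placeholders, not the study's prompts or data.
import re

def predict_strength(composition: str) -> str:
    """Placeholder for an LLM call returning a yield strength in MPa."""
    return "Approximately 350 MPa"  # stub response for demonstration

def parse_mpa(text: str) -> float:
    """Extract the first numeric value (assumed to be MPa) from a response."""
    match = re.search(r"(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"No numeric value found in: {text!r}")
    return float(match.group(1))

def mae(pairs: list[tuple[str, float]]) -> float:
    """Mean absolute error between parsed predictions and reference values."""
    errors = [abs(parse_mpa(predict_strength(c)) - y) for c, y in pairs]
    return sum(errors) / len(errors)

if __name__ == "__main__":
    samples = [("Fe-0.2C-1.0Mn (wt%)", 370.0)]  # illustrative composition/label
    print(f"MAE: {mae(samples):.1f} MPa")
```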

For engineers, this research provides critical insights into where LLMs can be trusted in materials science workflows and highlights areas requiring human expertise or specialized approaches.

Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions
