
AI Safety Blindspots in Scientific Labs
Evaluating LLMs for Laboratory Safety Knowledge
This research introduces LabSafety Bench, the first comprehensive benchmark for evaluating LLMs' ability to identify and respond to laboratory safety risks.
- Tests LLMs across 8 safety domains with 2,100+ questions derived from OSHA standards
- Reveals significant performance gaps even in advanced models like GPT-4 and Claude
- Demonstrates concerning model tendencies to provide dangerously incomplete safety advice
- Highlights urgent need for specialized safety training in scientific AI systems
For security professionals, this research exposes critical vulnerabilities in AI-guided laboratory workflows that could lead to physical harm, highlighting the need for rigorous safety evaluation before deployment in high-stakes scientific environments.
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs