Quantifying AI Risk: Beyond Capabilities

Translating LLM benchmark data into actionable risk estimates

This research bridges the gap between measured AI capabilities and real-world harm by developing methodology to quantify the security risks posed by large language models.

  • Shifts focus from measuring capabilities to estimating actual risk potential
  • Uses structured expert elicitation to transform benchmark data into probability estimates (see the sketch after this list)
  • Applies the methodology to Cybench, a cybersecurity capture-the-flag benchmark, to generate quantitative risk assessments
  • Provides a framework for measuring tangible harm rather than abstract capability
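
To make the pipeline concrete, here is a minimal Python sketch of how expert-elicited parameters might combine with a benchmark solve rate to yield a risk estimate. Everything in it is an illustrative assumption, not the paper's actual model: the numbers, the linear mapping from solve rate to attacker uplift, and the choice of pooling rule (geometric mean of odds, a standard aggregation method in expert elicitation) are all placeholders.

    import math

    def pool_experts(probs):
        """Aggregate expert probability estimates via the geometric
        mean of odds, a common pooling rule in expert elicitation."""
        odds = [p / (1 - p) for p in probs]
        pooled = math.prod(odds) ** (1 / len(odds))
        return pooled / (1 + pooled)

    # Hypothetical expert-elicited inputs (all values assumed).
    attempts_per_year = 500  # assumed rate of LLM-assisted attack attempts
    baseline_success = pool_experts([0.01, 0.03, 0.02])        # success prob. without LLM help
    uplift_at_full_solve = pool_experts([0.10, 0.20, 0.15])    # added success prob. at 100% solve rate

    cybench_solve_rate = 0.35  # assumed fraction of Cybench tasks solved

    # Assumed linear mapping from benchmark performance to attacker uplift.
    success_prob = min(1.0, baseline_success + uplift_at_full_solve * cybench_solve_rate)

    # Expected number of successful LLM-assisted incidents per year.
    expected_incidents = attempts_per_year * success_prob
    print(f"Pooled success probability: {success_prob:.3f}")
    print(f"Expected incidents per year: {expected_incidents:.1f}")

The point of the sketch is the shape of the calculation, not its outputs: benchmark scores enter only as one input, and the expert-elicited base rates and uplift mapping carry the translation from capability to harm.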

Why it matters: As LLMs become more powerful and widespread, security professionals need concrete risk metrics, not just capability scores, to make informed decisions about deployment, safeguards, and regulatory approaches.

Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation