Computing the Hessian Matrix for LLMs

A practical approach to second-order derivatives in large language models

This technical guide presents a practical approach to computing the Hessian matrix (the matrix of second-order derivatives of the loss) for large language models using PyTorch autograd.
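As a minimal sketch of the autograd pattern the guide builds on, the snippet below computes an exact Hessian for a tiny linear model via two differentiation passes (double backward). The model, data, and loss here are illustrative placeholders, not the guide's actual LLM setup; the per-entry loop is quadratic in parameter count, which is precisely the constraint that restricts LLMs to a portion of the Hessian.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)          # toy stand-in for an LLM
x, y = torch.randn(8, 4), torch.randn(8, 1)

loss = torch.nn.functional.mse_loss(model(x), y)
params = [p for p in model.parameters() if p.requires_grad]

# First pass: gradient with create_graph=True so it can be
# differentiated a second time.
grads = torch.autograd.grad(loss, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])

# Second pass: one backward per gradient entry yields one Hessian row.
n = flat_grad.numel()                  # 5 entries: 4 weights + 1 bias
hessian = torch.zeros(n, n)
for i in range(n):
    row = torch.autograd.grad(flat_grad[i], params, retain_graph=True)
    hessian[i] = torch.cat([r.reshape(-1) for r in row])
```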

Key Contributions:

  • Demonstrates how to compute a portion of the Hessian matrix for LLMs despite computational constraints
  • Provides techniques for computing the full diagonal of the Hessian using vector-Hessian products (VHPs); see the sketch after this list
  • Delivers an educational resource with practical implementation guidance
  • Addresses a fundamental engineering challenge in LLM optimization
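The diagonal technique can be sketched with the same double-backward machinery. Because the Hessian is symmetric, the vector-Hessian product vᵀH and the Hessian-vector product Hv coincide up to transposition, so differentiating the scalar g·v a second time yields the product directly. Each product against a unit basis vector recovers one column of the Hessian, of which only the diagonal entry is kept; the toy model below is again an assumed placeholder, and at LLM scale one product per parameter entry is the dominant cost.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)          # toy stand-in for an LLM
x, y = torch.randn(8, 4), torch.randn(8, 1)

loss = torch.nn.functional.mse_loss(model(x), y)
params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])
n = flat_grad.numel()

diag = torch.zeros(n)
for i in range(n):
    e = torch.zeros(n)
    e[i] = 1.0                         # unit basis vector e_i
    # Differentiating g @ e once more gives the product H @ e_i,
    # i.e. column i of the Hessian; keep only its diagonal entry.
    hv = torch.autograd.grad(flat_grad @ e, params, retain_graph=True)
    diag[i] = torch.cat([h.reshape(-1) for h in hv])[i]
```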

Why It Matters: Understanding second-order derivatives in LLMs enables better optimization strategies, more efficient training, and improved model performance. This work bridges a critical technical gap for ML engineers working with large-scale models.

Hessian of Perplexity for Large Language Models by PyTorch autograd
