
Computing the Hessian Matrix for LLMs
A practical approach to second-order derivatives in large language models
This technical guide presents a novel approach to computing the Hessian matrix (the matrix of second-order derivatives of the loss) for large language models using PyTorch autograd.
Key Contributions:
- Demonstrates how to compute a portion of the Hessian matrix for LLMs despite computational constraints (see the first sketch after this list)
- Provides techniques for computing the full diagonal of the Hessian using Hessian-vector products (HVPs; see the second sketch after this list)
- Delivers an educational resource with practical implementation guidance
- Addresses a fundamental engineering challenge in LLM optimization
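The first sketch below illustrates the double-backward pattern that underlies computing one block of the Hessian with PyTorch autograd: take the gradient with `create_graph=True`, then differentiate each gradient entry a second time. The toy embedding-plus-linear model, the fake token tensors, and the choice of the output bias as the target parameter are illustrative assumptions, not the guide's actual setup; a real LLM would only make the chosen block larger.

```python
# Minimal sketch: one Hessian block via double backward in PyTorch autograd.
# The tiny model and random tokens stand in for a real LLM and its data.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Embedding(16, 8),   # toy vocab of 16, hidden size 8
    torch.nn.Linear(8, 16),      # projection back to vocab logits
)
tokens = torch.randint(0, 16, (4,))    # fake input sequence
targets = torch.randint(0, 16, (4,))   # fake next-token targets

logits = model(tokens)
loss = torch.nn.functional.cross_entropy(logits, targets)

# Pick one small parameter tensor whose Hessian block we want.
param = model[1].bias                  # shape (16,)
(grad,) = torch.autograd.grad(loss, param, create_graph=True)

# Second backward pass, one row of the Hessian block at a time.
rows = []
for g in grad:                         # differentiate each gradient entry
    (row,) = torch.autograd.grad(g, param, retain_graph=True)
    rows.append(row)
hessian_block = torch.stack(rows)      # (16, 16) symmetric block
print(hessian_block.shape)
```

Each row costs one extra backward pass, which is why only a portion of the full Hessian is tractable for models with billions of parameters.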
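The second sketch shows one way the exact Hessian diagonal can be recovered from HVPs: for each basis vector e_i, a single HVP yields the column H e_i, whose i-th entry is the diagonal element. The toy nonlinear least-squares loss and the parameter vector `w` are assumptions chosen to keep the example self-contained; the same loop applies per parameter tensor of an LLM.

```python
# Minimal sketch: exact Hessian diagonal via Hessian-vector products,
# using diag(H)_i = (H e_i)_i with unit basis vectors e_i.
import torch

torch.manual_seed(0)
x = torch.randn(8, 5)
y = torch.randn(8)

def loss_fn(w):
    # Toy nonlinear loss so the Hessian is non-trivial.
    return ((x @ w).tanh() - y).pow(2).mean()

w = torch.randn(5)                     # toy parameter vector
diag = torch.zeros(5)
for i in range(5):
    e = torch.zeros(5)
    e[i] = 1.0                         # i-th unit basis vector
    _, hv = torch.autograd.functional.hvp(loss_fn, w, e)
    diag[i] = hv[i]                    # keep only the diagonal entry
print(diag)
```

One HVP per parameter is still expensive at LLM scale, but each HVP needs only gradient-sized memory, which is what makes the full diagonal reachable when the full matrix is not.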
Why It Matters: Understanding second-order derivatives in LLMs enables better optimization strategies, more efficient training, and improved model performance. This work bridges a critical technical gap for ML engineers working with large-scale models.
Paper: Hessian of Perplexity for Large Language Models by PyTorch autograd