
Faster Circuit Discovery in Transformers
Accelerating Mechanistic Interpretability with Contextual Decomposition
This research introduces a novel approach for efficiently uncovering the internal mechanisms of large language models.
- Presents contextual decomposition for circuit discovery, running roughly 10x faster than previous methods (see the sketch after this list)
- Overcomes limitations of existing techniques, such as slow runtime and approximation error
- Enables explanations of neural network internals that scale to more complex models
- Improves security: better model interpretability makes potential vulnerabilities easier to identify
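
As background, contextual decomposition splits each hidden activation h into a "relevant" part β, traced from the inputs or components under study, and an "irrelevant" remainder γ, maintaining β + γ = h at every layer; scoring a candidate circuit then needs only a forward pass of this pair, with no repeated ablation runs. The NumPy sketch below illustrates the two core propagation rules from the original contextual decomposition line of work; it is a minimal illustration, not the paper's method, and the paper's transformer-specific rules (e.g., for attention heads) may differ. The function names and the proportional bias split in `cd_linear` are assumptions chosen for clarity.

```python
import numpy as np

def cd_linear(beta, gamma, W, b):
    """Propagate the decomposition h = beta + gamma through y = W @ h + b.

    The bias is shared in proportion to each part's magnitude -- one common
    convention; other CD variants assign the bias differently.
    """
    beta_out, gamma_out = W @ beta, W @ gamma
    share = np.abs(beta_out) / (np.abs(beta_out) + np.abs(gamma_out) + 1e-12)
    return beta_out + share * b, gamma_out + (1.0 - share) * b

def cd_nonlinear(beta, gamma, f=np.tanh):
    """Split a pointwise nonlinearity f(beta + gamma) between the two parts.

    beta's share averages its marginal contribution with and without gamma
    (a Shapley-style split); the parts still sum exactly to f(beta + gamma).
    """
    beta_out = 0.5 * ((f(beta + gamma) - f(gamma)) + f(beta))
    return beta_out, f(beta + gamma) - beta_out

# Example: trace how much of a one-layer MLP's output is attributable to beta.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
beta, gamma = rng.normal(size=4), rng.normal(size=4)  # beta: part under study
z_beta, z_gamma = cd_linear(beta, gamma, W1, b1)
a_beta, a_gamma = cd_nonlinear(z_beta, z_gamma)
# The decomposition is exact: the parts always sum to the true activation.
assert np.allclose(a_beta + a_gamma, np.tanh(W1 @ (beta + gamma) + b1))
```

Because the split is exact at every layer, the magnitude of the β stream at the output gives a direct importance score for the components it was traced from, which is what makes decomposition-based circuit discovery fast relative to ablation-based search.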
Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition