Interpretability for LLM Security
Research on understanding and explaining the internal states and mechanisms of LLMs in order to improve security, detect vulnerabilities, and enable safer steering of model behavior.
This presentation covers 12 research papers on interpretability techniques for LLM security.