Model Tampering Attacks and Detection
Research on understanding, performing, and defending against targeted modifications to LLM weights and behavior
This presentation covers 12 research papers on model tampering attacks against large language models and on the detection of such attacks.