Model Tampering Attacks and Detection

Research on understanding, performing, and defending against model tampering: targeted modifications to an LLM's weights that alter its behavior
