Model Tampering Attacks and Detection

Research on understanding, performing, and defending against targeted tampering with LLM weights and behavior

This presentation covers 12 research papers on model tampering attacks and detection in large language models.
