
FLAME: Flexible LLM-Assisted Moderation Engine
By Ivan Bakulin, Ilia Kopanichuk...
Abstract:
The rapid advancement of Large Language Models (LLMs) has introduced significant challenges in moderating user-model interactions. While LLMs demonstrate remarkable capabilities, they remain vulnerable to adversarial attacks, particularly "jailbreaking" techniques that bypass content safety measur...
Key points:
- Research on moderating user-model interactions with large language models
- Security application: defending against adversarial "jailbreaking" attacks