FLAME: Flexible LLM-Assisted Moderation Engine

By Ivan Bakulin, Ilia Kopanichuk...

Abstract:

The rapid advancement of Large Language Models (LLMs) has introduced significant challenges in moderating user-model interactions. While LLMs demonstrate remarkable capabilities, they remain vulnerable to adversarial attacks, particularly "jailbreaking" techniques that bypass content safety measur...

Key points:

  • Research on large language models
  • Security application

Source: FLAME: Flexible LLM-Assisted Moderation Engine