Balancing Safety and Utility in AI Role-Playing

This research investigates the safety-utility trade-off in LLM-powered role-playing agents and proposes solutions for high-risk character simulations.

Systematically explores how different LLMs balance character portrayal against safety guardrails
Identifies a concerning "rise of darkness" pattern in which harmful outputs increase as character portrayal becomes more accurate
Proposes frameworks to better manage high-risk scenarios in gaming and creative applications
Offers practical approaches to maintain character authenticity while preventing dangerous content generation

Security Implications: As role-playing agents become more widespread in gaming and creative applications, this research gives developers guidance on implementing safeguards against harmful content while preserving the utility of character simulations.

The Rise of Darkness: Safety-Utility Trade-Offs in Role-Playing Dialogue Agents