Security of LLM Activation Functions and Architecture
Research on how architectural components such as activation functions affect the safety and security properties of LLMs

Hidden Dangers in LLM Optimization
How activation approximations compromise safety in aligned models

Multi-Dimensional Safety in LLM Alignment
Revealing the hidden complexity of safety mechanisms in language models

The Geometry of LLM Refusals
Uncovering multiple refusal concepts in language models

Unlocking Precise Control of AI Behavior
Sparse Activation Steering: a new approach to LLM alignment
