Hidden Biases Against Mental Health Groups in LLMs

How AI language models propagate stigma against vulnerable populations

This research shows how Large Language Models can generate unprovoked attack narratives targeting vulnerable mental health groups, and it introduces a framework for understanding how such biases propagate.

  • Discovered differential treatment of mental health conditions, with some disorders facing more severe stigmatization than others
  • Developed a network-based framework to analyze how biases propagate through LLM-generated content (see the sketch after this list)
  • Identified emergent patterns of harmful narratives that were not explicitly present in the training data
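
The study's actual framework is not reproduced in this summary. As a rough, hypothetical illustration of what a network-based bias-propagation analysis can look like, the sketch below links mental health condition terms to stigmatizing descriptors that co-occur in generated narratives and ranks conditions by their weighted association. The term lists, function names, and scoring heuristic are assumptions made for illustration, not the paper's method.

```python
from itertools import product
import networkx as nx

# Hypothetical term lists -- illustrative only, not the study's lexicons.
CONDITION_TERMS = {"schizophrenia", "depression", "anxiety", "bipolar disorder"}
STIGMA_TERMS = {"dangerous", "unpredictable", "weak", "violent"}


def build_cooccurrence_graph(narratives):
    """Build a weighted graph linking condition terms to stigmatizing
    descriptors that appear in the same generated narrative."""
    G = nx.Graph()
    for text in narratives:
        lowered = text.lower()
        conditions = {c for c in CONDITION_TERMS if c in lowered}
        stigmas = {s for s in STIGMA_TERMS if s in lowered}
        for c, s in product(conditions, stigmas):
            if G.has_edge(c, s):
                G[c][s]["weight"] += 1
            else:
                G.add_edge(c, s, weight=1)
    return G


def stigma_load(G):
    """Rank condition terms by total weighted association with stigma terms,
    a crude proxy for differential stigmatization."""
    return sorted(
        ((c, G.degree(c, weight="weight")) for c in CONDITION_TERMS if c in G),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    sample = [
        "The narrative frames schizophrenia as dangerous and unpredictable.",
        "Depression is described as a sign of being weak.",
    ]
    graph = build_cooccurrence_graph(sample)
    print(stigma_load(graph))  # e.g. [('schizophrenia', 2), ('depression', 1)]
```

In this toy setup, a higher weighted degree for one condition than another would indicate that generated narratives attach stigmatizing descriptors to it more often, mirroring the kind of differential treatment the study reports.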

This work is critical for the medical community as it exposes how AI systems might perpetuate and amplify harmful stereotypes about mental health conditions, potentially affecting patient care and public perception.

Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups
