Large language models can display inconsistent moral priorities tied to gendered framing (for example, judging harassment of women as less permissible than more severe harms like torture), indicating they’re generalizing discourse patterns rather than reasoning about harm. This pattern appears linked to the models’ training on public debates about gender equality, producing systematic but counterintuitive outputs.
— If true, these distortions matter for AI deployment in ethics‑sensitive domains (law, policing, content moderation) because models may amplify or invert social justice narratives unpredictably.
Steve Stewart-Williams
2026.03.31
100% relevant
Fulgu and Capraro (2026) experiments reported that GPT‑4 judged abusing a man to avert apocalypse more acceptable than abusing a woman, and answered inconsistently on torture versus harassment scenarios.
← Back to All Ideas