When reinforcement signals reward a particular stylistic tic in one training condition (for example, a 'nerdy' persona that leans on creature metaphors), that behavior can spread to other conditions and to later releases, producing unexpected, platform‑wide stylistic quirks. The spillover shows up in usage data (OpenAI reported percentage increases across GPT‑5.x) and is sometimes addressed only after the fact, by adding explicit runtime instructions or content filters.
This phenomenon matters because it shows that alignment mistakes can propagate subtly through models and into the user experience, forcing firms and regulators to rethink monitoring, testing, and instruction‑level governance of deployed models.
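The usage‑data monitoring described above can be sketched as a simple phrase‑frequency check over sampled model outputs from two releases. This is a minimal illustration, not OpenAI's actual pipeline; the phrase list, the per‑1,000‑outputs rate, and the 3x alert threshold are all assumptions:

```python
from collections import Counter
import re

def phrase_rate(outputs, phrases):
    """Occurrences of each flagged phrase per 1,000 sampled outputs."""
    counts = Counter()
    for text in outputs:
        for p in phrases:
            counts[p] += len(re.findall(re.escape(p), text.lower()))
    n = max(len(outputs), 1)
    return {p: 1000 * counts[p] / n for p in phrases}

def spillover_alerts(baseline, candidate, phrases, min_increase=3.0):
    """Flag phrases whose rate in the candidate release grew by at least
    min_increase times versus the baseline release."""
    before = phrase_rate(baseline, phrases)
    after = phrase_rate(candidate, phrases)
    alerts = {}
    for p in phrases:
        # Treat a phrase absent from the baseline as an infinite increase.
        ratio = after[p] / before[p] if before[p] else float("inf")
        if after[p] > 0 and ratio >= min_increase:
            alerts[p] = (before[p], after[p])
    return alerts

# Hypothetical samples from two releases.
baseline = ["The cache is warm.", "Restart the server."]
candidate = ["That bug is a sneaky goblin.", "Goblin-proof your config.", "Restart the server."]
print(spillover_alerts(baseline, candidate, ["goblin"]))
```

In practice a real monitor would also normalize for topic mix (a spike in fantasy‑fiction prompts would legitimately raise 'goblin' counts), which is why rate ratios alone are only a first‑pass signal.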
EditorDavid
2026.05.03
OpenAI's blog post and Wall Street Journal reporting that mentions of 'goblin' rose sharply after GPT‑5.1 and that an explicit base instruction was inserted: 'Never talk about goblins, gremlins... unless absolutely relevant.'