Subgroup Mining Signals P‑Hacking

Researchers often split a sample into subgroups and keep searching until some outcome reaches statistical significance. Each additional split is another chance of a false positive, and because detecting an interaction effect requires a much larger sample than detecting a main effect, subgroup discoveries are especially likely to be flukes that fail replication. Fields or papers with an unusually high share of results that are significant only within a subgroup therefore offer a scalable signal of p‑hacking and compromised evidence.

Flagging subgroup‑only findings would help journalists, policymakers, and funders distinguish robust results from likely data‑dredged artifacts, and would help shape norms (preregistration, reporting) that reduce false‑positive science.
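The multiple-comparisons logic behind this signal can be illustrated with a small calculation. The sketch below is not from the source: the function name `familywise_error_rate` is hypothetical, and it assumes every subgroup test is independent and that no true effect exists in any subgroup. Real subgroup cuts often overlap, so the actual rate will differ; the point is only how quickly the chance of at least one spurious "significant" subgroup grows with the number of splits tried.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one p < alpha across independent tests
    when every null hypothesis is true (illustrative assumption)."""
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    # P(no false positive in any test) = (1 - alpha) ** num_tests
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # Hypothetical counts of subgroup splits a researcher might try.
    for k in (1, 2, 5, 10, 14, 20):
        rate = familywise_error_rate(k)
        print(f"{k:>2} subgroup tests: {rate:.3f} chance of a spurious hit")
```

Under these assumptions, a researcher who tries 14 independent subgroup tests at alpha = 0.05 is more likely than not to find at least one significant result by chance alone.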

Sources

One Weird Trick to Get Significant Results

Cremieux 2026.03.13 100% relevant

Cremieux's example of splitting a drug trial by sex, together with the claim that detecting an interaction needs about eight times the sample size at an effect size of d = 0.25, is the concrete case that motivates this diagnostic.