Goldman Sachs’ data chief says the open web is 'already' exhausted for training large models, so builders are pivoting to synthetic data and proprietary enterprise datasets. He argues there’s still 'a lot of juice' in corporate data, but only if firms can contextualize and normalize it well.
— If proprietary data becomes the key AI input, competition, privacy, and antitrust policy will hinge on who controls and can safely share these datasets.
msmash
2025.10.02
100% relevant
Neema Raphael on Goldman’s podcast: 'We’ve already run out of data,' citing DeepSeek’s use of model outputs and the need to mine enterprise data.
← Back to All Ideas