1. Proprietary Data (YouTube, Docs, Gmail, cloud logs, Waymo, website analytics,...

est31 · 2026-03-08T20:01:26 1773000086

You can't train LLMs on proprietary data, at least not if you want to make that LLM as accessible as Gemini. Otherwise random people can ask it your home address.

So it matters less than one would think. Also, ChatGPT can already do 'internet search' as a tool, so it already has access to, say, the Google Maps POI database of SMBs.

And ChatGPT gets a lot of proprietary data of its own as well. People use it as a Google replacement.

sillyfluke · 2026-03-08T21:54:45 1773006885

>You can't train LLMs on proprietary data, at least not if you want to make that LLM as accessible as Gemini. Otherwise random people can ask it your home address.

If this is your only criterion, I think you misunderstand what proprietary data is and the ways companies can mitigate this problem at the inference stage.