Ask HN: AI Pipelines have a “Customer 360” problem?
2 points by jgraettinger1 on May 13, 2023 | hide | past | favorite
I’m in no way an AI expert. I’m a real-time data plumber [0] and see much of the world through that lens. But:

ChatGPT and emerging AI tools require context - lots of it - to give great outcomes. Identifying potential churn, classification and recommendations, and detecting emerging fraud are obvious, latency-sensitive applications of AI. But ChatGPT and similar tech is essentially a pure function — ask a question, get an answer — and its performance can't surpass the quality of the question you ask.

Questions thus need to include context, and often that context is “here’s the history of interactions with this customer across services, APIs, perhaps HubSpot/Zendesk etc, as well as internal conversations and support tickets”. To ask whether this customer is a current churn risk, or is likely to buy a product, you first need to have all of this context available and up-to-date in your prompt.
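To make "include the context in the question" concrete, here's a minimal sketch of folding a customer's cross-service history into a prompt. The `Interaction` shape, `build_prompt` helper, and sample events are all hypothetical, not any particular tool's API:

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    source: str      # e.g. "zendesk", "hubspot", "app" (illustrative sources)
    timestamp: str
    summary: str

def build_prompt(customer_id: str, history: list[Interaction], question: str) -> str:
    """Fold a customer's cross-service history into a single prompt string."""
    lines = [f"Customer {customer_id} interaction history:"]
    for event in sorted(history, key=lambda e: e.timestamp):
        lines.append(f"- [{event.timestamp}] ({event.source}) {event.summary}")
    lines.append(question)
    return "\n".join(lines)

history = [
    Interaction("zendesk", "2023-05-01", "Filed ticket: billing overcharge"),
    Interaction("app", "2023-05-10", "Downgraded plan from Pro to Free"),
]
prompt = build_prompt("cust-42", history, "Is this customer a current churn risk?")
print(prompt)
```

The hard part isn't the string assembly, of course; it's having `history` complete and up-to-date at the moment you ask.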

Building this context is possible but very poorly served by the modern data stack, given latency and cost concerns (though no doubt Snowflake would love you to attempt it). Yet businesses need to own the details and easily experiment and evolve these pipelines. So what do they do? What are you doing?

Carving out two sub-topics of interest:

  - Maintaining external memory in an up-to-date vector DB: This technique seems to require that each document has useful stand-alone context, or that you can quickly map a match back to a broader context. Throwing stand-alone chat messages or interactions into a vector DB seems easy but… not useful.

  - Online AI transformations like “Given X & Y, is the user likely to churn? Answer Yes or No” require that you have the full context of X & Y already.

Both problems have the same shape? You need to build up contexts, and you need to do it continuously if you want low latency.
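On the first sub-topic, one way to make an isolated chat message useful is to store enough metadata to expand a match back to its surrounding conversation. A minimal sketch, with the nearest-neighbour lookup faked by a keyword match so it stays runnable (the message shape and `expand_match` helper are assumptions):

```python
# Each message carries its conversation id and position, so a vector-DB hit
# on one message can be mapped back to a broader context window.
messages = [
    {"conv": "c1", "seq": 0, "text": "Hi, my invoice looks wrong."},
    {"conv": "c1", "seq": 1, "text": "Sorry about that, can you share the invoice number?"},
    {"conv": "c1", "seq": 2, "text": "It's INV-991. I was charged twice."},
    {"conv": "c2", "seq": 0, "text": "How do I export my data?"},
]

def expand_match(match: dict, window: int = 1) -> list[dict]:
    """Map a single matched message to its surrounding conversation."""
    return [
        m for m in messages
        if m["conv"] == match["conv"] and abs(m["seq"] - match["seq"]) <= window
    ]

# Stand-in for a nearest-neighbour lookup against a real vector DB:
hit = next(m for m in messages if "charged twice" in m["text"])
context = expand_match(hit)
print([m["text"] for m in context])
```

The embedding still indexes individual messages, but what you hand to the model is the expanded window, not the lone match.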

Thesis: I suspect that AI pipelines are a killer application for incremental, deeply stateful stream processing. But I’m very curious to hear the community’s perspective. Thanks.
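To sketch what "incremental, deeply stateful" means here: each event folds into a bounded per-customer context as it arrives, so the context is already materialized when a question comes in, rather than rebuilt with a batch query at ask time. The event shape and `MAX_EVENTS` bound are assumptions for illustration:

```python
from collections import defaultdict, deque

MAX_EVENTS = 50  # assumed bound on per-customer rolling state

# Per-customer rolling context, updated incrementally as events stream in.
state: dict[str, deque] = defaultdict(lambda: deque(maxlen=MAX_EVENTS))

def on_event(event: dict) -> None:
    """Fold one event into the customer's context; O(1) per event."""
    state[event["customer_id"]].append(event["summary"])

def current_context(customer_id: str) -> str:
    """Read the already-materialized context with no upstream query."""
    return "\n".join(state[customer_id])

for e in [
    {"customer_id": "cust-42", "summary": "Opened support ticket"},
    {"customer_id": "cust-42", "summary": "Downgraded plan"},
]:
    on_event(e)

print(current_context("cust-42"))
```

A real pipeline would also need durable state, rewind/replay, and joins across sources, but the shape is the same: the expensive context-building work is amortized over the stream instead of paid per question.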

[0]: Co-founder of streaming data ops company https://estuary.dev


