Ok. That sort of lines up with "lack of docs" and "missing features" but maybe the bigger thing is the general impression and expectations.
Having spent much of last year pointing kafka at differential dataflow and watching kafka fall over or fail to start, I definitely feel like I would trust differential dataflow more. But I agree that nothing about the presentation of the project gives that impression.
Discovery is another problem. "Differential dataflow" and "timely dataflow" don't quite convey clearly what problem they're solving or what machine characteristics they rely on, or even where to expect more performance from them and how. Not saying that they aren't performant, but we need to say how and where pretty clearly.
For example, Spark makes it clear that its performance comes from exclusively in-memory compute across a cluster.
Spark may also be "good enough" from a performance standpoint for many use cases. New tools can get adoption among small players only if they are radically easier to use and deploy.
I have used neither kafka nor differential dataflow, but I would like to offer a personal anecdote as an illustration of the importance of the greater system:
I once needed to set up a webapp written in Python. I did this by running the code in a WSGI instance and exposing it via nginx. Setting up all the activation files, locked-down permissions, and secure sockets was pretty finicky to get right, and took a non-trivial amount of time.
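For concreteness, the nginx side of such a setup often looks roughly like this. This is a hypothetical sketch, not the original poster's config: the paths, domain, and the assumption of a WSGI server (e.g. gunicorn) listening on a unix socket are all illustrative.

```nginx
server {
    listen 443 ssl;
    server_name example.com;

    # TLS material -- one of the "secure sockets" pieces to get right.
    ssl_certificate     /etc/ssl/certs/example.com.pem;
    ssl_certificate_key /etc/ssl/private/example.com.key;

    # Large-file offload: nginx serves static files itself,
    # so the Python process never touches them.
    location /static/ {
        alias /srv/webapp/static/;
    }

    # Everything else is proxied to the WSGI server over a unix
    # socket, whose file permissions must also be locked down.
    location / {
        proxy_pass http://unix:/run/webapp/gunicorn.sock;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Each of those blocks is a separate thing that can be misconfigured, which is where the finicky setup time goes.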
It would have been much easier to use Python's built-in web server and expose it to the internet directly. It has fewer moving parts and is generally more predictable.
I still went with the more complex solution, because I needed logging, security, and large-file offload. And I used the built-in web server for development.
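The development option really is minimal: the whole thing fits in one file with no config. A hedged sketch, assuming the stdlib `wsgiref` server (the app and its response here are illustrative):

```python
# Minimal WSGI app for development use: one process, no config files,
# no sockets or permissions to lock down.
from wsgiref.simple_server import make_server


def app(environ, start_response):
    # A trivial WSGI callable: every request gets a plain-text reply.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from the built-in server\n"]


# To serve it during development (blocks forever, so left commented):
#   make_server("127.0.0.1", 8000, app).serve_forever()
```

The same `app` callable is what a production WSGI server (gunicorn, uWSGI, etc.) would load behind nginx, so nothing in the application code changes between the two setups; only the surrounding system does.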
The requirements of a "production" system are pretty different from those of a "development" one. Sometimes people are willing to install a bigger (and therefore more fragile) system when they need more features.
That's not the issue here. Kafka is far more fragile than it should be, partly because companies have always approached it as cluster-first and partly because it's enterprise-first software where high setup costs just aren't important. A lot of JVM software ends up like this - there's a big chunk of fiddly O(1) work in getting it all going, just because no-one ever bothers to make it all easy to get started with.
I say this as a huge fan of Kafka, but things like MySQL have better defaults and are easier to get running out of the box, and there's no reason Kafka's starting experience couldn't be the same if someone cared enough to put the time and effort in. And ultimately it's a shame, because it leads people to ignore something that's a much better model and platform in the long term.
> there's no reason Kafka's starting experience couldn't be the same if someone cared enough to put the time and effort in
There's always a perverse incentive when the software provider's model is to monetize through support, consulting, and offering the software as a managed service. If it's too easy to run, then why would one pay for any of these services?
AIUI it's pretty expensive compared to, say, RDS or their managed Redis service? Which makes perfect sense relative to how much of a pain running your own Kafka cluster is.
100% worth it IMO, but it's a lot of upfront cost and you only start to see the benefits when a given flow is Kafka end-to-end and you learn how to use it, so I absolutely get why people are skeptical.