This looks really promising, and thanks for sharing! I've actually been looking into this exact type of functionality lately, so for anyone else interested I'll drop some other projects in this area (and if anyone, including OP, wants to chime in on how they compare, please do; I know there are all sorts of trade-offs, e.g. the similarity algorithm chosen, the fact that one of these relies on WASM, etc.):
Putting myself in the shoes of the founders or executive team, I wonder how many parameters, dialed differently, could have shaped the launch for the better (or worse). I'm imagining all the stakeholders' arguments:
"the laser projector adds X more development months, X less hours of battery life, and X more costs to manufacture due to higher power needs therefore bigger battery, more heat management.. should we launch v1 without it? We can just push-notify your phone if we detect they need a display and they're reaching for their phone, they swipe notification and we take them to exactly what we were trying to show them. Or we can show it on their watch and it would serve as a forcing function for us to build out the voice UX to try and do more with voice."
"I get that we're introducing a new ux modality but shouldn't we give user's a pathway that blends between already heavily embedded habits (smartphone, watch) to ours? User's should be able to carry on existing txt conversations from their phones, call from their existing phone numbers, hear notifications from their phone, etc. In fact- these could be great hooks for us to get them used to using our natural language UX more frequently."
"I get that the device needs to look like beautiful jewelry but metal is heavier- can't we find a way to make a fully plastic enclosure look high-end?"
"Our software stack isn't ready! We can put compute and llms closer to the edge to reduce latency, train 'micro-llms' for very specific tasks that will run faster and cheaper, etc. etc."
For all of these hypothetical contentions I can equally imagine very valid reasons for choosing the paths Humane chose. Letting such objections always drive product development could have resulted in a much worse outcome. There's a plethora of anecdotes about Steve Jobs refusing to compromise on the many facets of the original iPhone's design, but if I had to guess, far fewer stories get repeated about the many compromises he was OK with.
It's notoriously difficult to introduce new consumer product categories. If it were easy or obvious, the Apple Watch would already be at parity with 80-90% of the Pin's features (which one would assume will happen in short order anyway, with the Watch and AirPods serving as proxies for an embedded, on-device LLM on the iPhone).
I can understand the viewpoints of both those who are trashing the Humane AI Pin and those excited about it, but one thing I'm happy to see is a team of this caliber making bold moves (and an investment ecosystem to support it). With v1 out (remember, a lot of hardware startups never make it this far) and the community responding, it will be interesting to see how deftly they can navigate rapidly iterating on and improving the UX.
With v1 being basically DOA -- and not just for software reasons but fundamental design and cost reasons -- I doubt we will ever see a version 2 from Humane.
What we will see, eventually, is similar devices from other companies that are not trying to build their own ecosystem but are happy to be a peripheral of an existing one. They will also probably compromise more on design to make it smaller, lighter, and cheaper. And even then it will be niche.
Until this demo, the most impressive conversational experiences I'd seen were Pi and LiveKit's KITT demo (https://livekit.io/kitt). I don't think KITT was quite as fast in response time as Retell, but it's incredibly impressive for being fully open source and open to any choice of APIs (imagine KITT with the Groq API plus Deepgram's Aura for super-low latency).
Retell focusing on all of the other weird/unpredictable aspects of human conversation sounds super interesting, and the demo's incredible.
Vercel Edge Functions are Cloudflare Workers under the hood. I'm pretty sure Supabase is using Deno Deploy/Subhosting. There's a lot of nuance to comparing these two offerings, but in short: if you like the Supabase platform with all its tightly integrated goodies, then you will likely benefit way more from going with Supabase functions.
Vercel is increasingly trying to offer these integrated goodies too, but IMO right now most of them are off-the-shelf things like "a Postgres DB" (powered by the Neon platform) and "a KV Redis store" (powered by Upstash), and they seem to be betting their users want to use their own pg/redis CLIs and SDKs to manage and access the data from edge functions (versus Supabase's SDKs, which unify auth, DB, functions, etc.).
Both options have their benefits, but if you want everything tightly integrated with a friendly management UI, SDK, and auth, Supabase can save you tons of time!
In terms of speed I still think nothing beats Cloudflare Workers (way more geographically distributed points of presence, so even closer to more users), though I believe Vercel's Edge Functions only use a subset of the total Cloudflare Workers PoPs. But I highly doubt the average use case or user would ever notice the latency difference between Cloudflare Workers and Supabase Edge Functions, so I don't know if that should be your determining factor. (And let's not get started on the debate of "edge doesn't matter if it's far from your DB." If Supabase is doing any magic to keep their edge functions "close" to the DB, that's a huge win!)
Another option is to go full Cloudflare ecosystem (they too have a KV store and now a limited SQL DB), but, again, I don't think their ecosystem gives you the all-in-one goodness that Supabase does (no auth offering, and definitely no way to tie together DB/KV/auth with one SDK and management dashboard).
Yet another option is fly.io + Turso. Like Deno and Cloudflare Workers, they have their own geographically distributed PoPs running lightweight, optimized Firecracker VMs that will run any of your Dockerfile desires. With this you get even more freedom and control, but you also need to get your hands a bit dirty with fly.io's CLI and platform magic. Also, while Turso is super interesting, it's very new, and they seem to be pushing more of a "database per user" model than "host one giant multi-tenant DB."
I have compared all these offerings and weighed the pros and cons. In the end, don't let it paralyze you! Nearly all of them have enough standardized pathways in and out (SQL, the Postgres wire protocol, data exports; the Cloudflare runtime even has limited Node/npm support, for instance) that lock-in shouldn't be too big a fear in the beginning. Worst case, write thin abstractions for DB access, auth hooks, KV access, and maybe runtime-specific logic, so that if you ever need to migrate to another runtime/DB you can adapt the abstraction once and benefit app-wide.
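To make the "thin abstraction" idea concrete, here's a rough sketch, assuming a made-up `KVStore` interface (hypothetical names, not any vendor's SDK): app code depends only on the interface, so swapping Upstash, Cloudflare KV, or an in-memory map for tests touches one adapter, not the whole app.

```typescript
// Hypothetical minimal KV abstraction; not any provider's real API.
interface KVStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// One adapter per provider; this in-memory one doubles as a test double.
class MemoryKV implements KVStore {
  private data = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }
  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// App code never imports a concrete provider, only the interface.
async function cacheUserName(kv: KVStore, id: string, name: string): Promise<void> {
  await kv.set(`user:${id}:name`, name);
}
```

Migrating runtimes then means writing one new `KVStore` adapter rather than hunting down every call site.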
We run a cluster of Edge Runtime instances in each region, with multiple edge functions per instance using V8 isolates (Deno's core runtime is a wrapper around V8 isolates). This setup provides a good balance between cost and security; other edge hosting providers like Cloudflare follow a similar setup.
I immediately recognized this as either inspired by or actually from the creators of the Mercury OS concept (https://www.mercuryos.com/). Turns out it's the latter.
I love the Mercury OS concept and think its design both elegantly and somewhat subversively packs in a myriad of potentially breakthrough ideas.
I have been stewing on ideas around the same vision for years: a new type of UI that seems to dematerialize, where you directly manipulate the object in your current context (like multi-touch's direct manipulation but at a higher layer of abstraction, powered by deep API integrations, intelligent self-assembling relational graphs, and of course AI). For over a decade I've had this thought that "the data becomes the UI": a UI that emerges from whatever data, task, or context you are currently in. When I came across the Mercury OS concept I immediately smiled.
Conceptually, strategically, and technically there are so many challenges to introducing such a new UX paradigm, but I'm very happy to see that the Mercury OS concept seems to have evolved into New Computer's Dot, and I wish them the best!
For those immediately turning to negative sentiment over privacy or "it's just a GPT-4 wrapper," I can see why that would be the knee-jerk reaction, but I wouldn't underestimate a sturdy design-philosophy approach like this one. I'd go as far as to compare it to NeXT's NeXTSTEP. NeXTSTEP introduced so many groundbreaking UX concepts, and to a large extent I think its personal-computing contemporaries underestimated the potential it packed. And yes, I know the business model and many other factors played into an inevitable doom for NeXT, but there's a belief that Steve Jobs may never have intended NeXT to become a dominant computing player, and instead knew it'd be an irresistible acquisition target in a latent space of UX innovation. It's possible he saw the next evolution of personal-computing UX and hedged his bet by not compromising on it. Yet another comparison: NeXTSTEP needed more CPU, graphics, and connectivity power to truly show off its heightened level of UX, much in the same way something like Dot or Mercury OS inherently needs to leverage cutting-edge computing (LLMs, vector DBs, etc.) to truly enable its vision.
I don't think it "failed" in the sense of never shipping, because shipping was out of scope.
It was never meant to be an end-product. It was a concept/case-study. Happy they're finally realising that vision.
The "self-healing" sounds very interesting. I've tried to think, myself, how to approach this in a chrome extension running dom selectors in automations. Curious if you have any high-level thoughts/findings in this area?
We're just getting started on it ourselves but it's a really fun problem. I think the useful thing from our findings so far is that simplifying the DOM representation really helps the model reason about state.
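For illustration, one way "simplifying the DOM representation" could look, as a sketch with hypothetical types (not the authors' actual code): collapse the tree into a compact indented outline, keeping only the handful of attributes the model needs to reason about state.

```typescript
// Hypothetical simplified node shape; a real version would walk document.body.
type UiNode = {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: UiNode[];
};

// Only keep attributes that help identify/interpret an element.
const KEEP_ATTRS = ["id", "role", "aria-label", "placeholder"];

// Serialize the tree into an indented outline, dropping markup noise.
function simplify(node: UiNode, depth = 0): string {
  const attrs = Object.entries(node.attrs ?? {})
    .filter(([k]) => KEEP_ATTRS.includes(k))
    .map(([k, v]) => `${k}=${v}`)
    .join(" ");
  const label = [node.tag, attrs, node.text?.trim()].filter(Boolean).join(" ");
  const lines = [`${"  ".repeat(depth)}${label}`];
  for (const child of node.children ?? []) {
    lines.push(simplify(child, depth + 1));
  }
  return lines.join("\n");
}
```

Feeding the model this outline (instead of raw HTML) keeps the prompt small and makes "which element did you mean?" questions much easier for it to answer.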
This looks really promising; I'll definitely look into using this for a project I'm working on! BTW, I've used both Datadog and New Relic in large-scale production apps, and for the cost I'm still not very impressed by the DX/UX. If HyperDX can undercut their price and deliver feature/DX parity (or better), I can easily see this doing well in the market. Good luck!
Thank you! Absolutely agree on the Datadog/New Relic DX. The funny thing we learned is that most of their customers mention how few developers on their team actually engage comfortably with either tool; most of the time they end up relying on someone else to help get the data they need!
Definitely striving to be the opposite of that - and would love to hear how it goes and any place we can improve!
Datadog feels like they've used a shotgun to shoot functionality all over the place. New Relic felt a bit more focused, but even then I had to go attend a New Relic seminar to properly learn how to use the bloody thing.
This is wild. A blog post breaking down the tech would be awesome. I wonder if you could train another AI on screenshots of great web UI design and have that agent automatically critique and guide it?
This is awesome! I've been looking for exactly this type of solution: a more intuitive (yet robust) UI with a Node/TypeScript API (Rivet's Node SDK is even better!).
Quick question: how does it know which plugin to choose? I think you add a plugin node to the graph, but is there a way to describe what a plugin does so that an LLM can dynamically choose among plugins? Or is it just building out a graph with nodes that explicitly choose among plugins via prompting?
Thanks so much for open sourcing this! Great work!
Thanks! Really glad to hear it, and excited to hear what you think!
Rivet doesn't have a built-in way of choosing between different ChatGPT plugins, so you have to explicitly build out the graph and choose via prompting. We published an example app that actually does this, although the "plugin choosing" part is intentionally simplistic: https://github.com/Ironclad/rivet-example
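As a rough illustration of what "plugin choosing via prompting" can look like in general (a sketch with a stubbed `callLLM`, not Rivet's API or the linked example's code): give the model a short catalog of plugin descriptions, ask it to answer with one name, then dispatch on that name.

```typescript
// Hypothetical plugin shape; descriptions are what the LLM chooses from.
type Plugin = {
  name: string;
  description: string;
  run: (input: string) => string;
};

function buildRouterPrompt(plugins: Plugin[], userInput: string): string {
  const catalog = plugins.map((p) => `- ${p.name}: ${p.description}`).join("\n");
  return (
    `Pick exactly one plugin name for the request below.\n` +
    `Plugins:\n${catalog}\n` +
    `Request: ${userInput}\n` +
    `Answer with the plugin name only.`
  );
}

function route(
  plugins: Plugin[],
  userInput: string,
  callLLM: (prompt: string) => string // stand-in for your chat node/provider
): string {
  const choice = callLLM(buildRouterPrompt(plugins, userInput)).trim();
  // Fall back to the first plugin if the model answers with an unknown name.
  const plugin = plugins.find((p) => p.name === choice) ?? plugins[0];
  return plugin.run(userInput);
}
```

The fallback matters in practice: models occasionally answer with extra words or an unlisted name, so the dispatch step should never trust the raw completion blindly.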
And yes... the docs and a decent amount of the code was heavily assisted by LLMs (all Andy, not me). Apparently if you look in the commit history, you can kind of see how Andy used Rivet to build Rivet!
https://github.com/tantaraio/voy
https://github.com/nitaiaharoni1/vector-storage
https://github.com/danielivanovz/indexed-vector-store
https://github.com/yusufhilmi/client-vector-search
Another area I've briefly looked into is generating vector embeddings fully in-browser. There will likely be tradeoffs in dimensionality and overall accuracy/performance, but having a semi-decent way to do this in-browser would be awesome.
This project only runs in Node (but there's an issue discussing how it might be modified to run in-browser):
https://github.com/Anush008/fastembed-js
It looks like transformers.js supports embeddings in-browser:
https://github.com/xenova/transformers.js/releases/tag/2.1.0
https://github.com/pinecone-io/semantic-search-example/blob/...
It also appears that client-vector-search supports both embeddings and vector indexing:
https://github.com/yusufhilmi/client-vector-search
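To make the "vector indexing" half concrete, here's a minimal brute-force cosine-similarity search of the kind these small in-browser libraries can do (a generic sketch, not any of the listed projects' actual code):

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-k search: fine for the few thousand vectors
// a browser-side store typically holds.
function topK(
  query: number[],
  docs: { id: string; vec: number[] }[],
  k = 3
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

At in-browser scale the linear scan is usually fast enough that fancier index structures (HNSW, etc.) only start paying off with much larger collections.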
Hope this helps others looking into this stuff!