codingjaguar's comments

Well, it's apples and oranges. Why do people buy an F150 instead of fitting things into the trunk of a Corolla? Cuz they've got a lot of stuff.

For people who run thousands of QPS on billions of vectors, Milvus is a solid choice. For someone playing with a Twitter demo with a few thousand vectors, any vector DB can do the job well. In fact, there is a fun project, Milvus Lite, designed for exactly that case :)
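If you want to try that path, here's a minimal sketch (assumes pip install pymilvus, which bundles Milvus Lite; the collection name and 384-dim vectors are made up for illustration):

    from pymilvus import MilvusClient

    # Milvus Lite: the URI is just a local file, no server to run.
    client = MilvusClient("./demo.db")
    client.create_collection(collection_name="tweets", dimension=384)

    # Each row bundles the vector with its metadata.
    client.insert(collection_name="tweets",
                  data=[{"id": 1, "vector": [0.1] * 384, "text": "hello world"}])

    hits = client.search(collection_name="tweets", data=[[0.1] * 384], limit=3)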

I've seen many builders migrate from pgvector to Milvus as their apps scale. But perhaps they wish they had considered scalability earlier.

(I'm from Milvus, so I could be biased.)


We regularly do tens of thousands of QPS on pgvector just fine, on massive data stores.

We dropped Milvus after they started trying to force their Zilliz garbage SaaS down our throats.


People buy F150s because they find them cool, not because they actually need the space. Your Corolla could make deliveries around town in roughly the same time, while being cheaper and easier than bringing in a new, expensive vehicle. In the situations where you need more space (which most of us won't hit), you can add a trailer instead.

Interesting, I guess we're on the same page ;)


This aligns well with our observations at Milvus. Recently, we helped several users migrate from pgvector as their workloads grew substantially.

It’s worth recognising the strengths of pgvector:

• For small-to-medium scale workloads (e.g., up to millions of vectors, relatively static data), embedding storage and similarity queries inside Postgres can be a simple, familiar architecture.

• If you already use Postgres and your vector workloads are light (low QPS, few dimensions, little metadata filtering, low concurrency), then piggy-backing vector search on Postgres is attractive: minimal added infrastructure (see the sketch after this list).

• For teams that don’t want to introduce a separate vector service, or want to keep things within an existing RDBMS, pgvector is a compelling choice.
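To make the "minimal added infrastructure" point concrete, here is a rough sketch of that path with psycopg and pgvector (assumes the pgvector extension is available; the table, column names, and 384-dim embeddings are illustrative):

    import psycopg

    conn = psycopg.connect("dbname=app")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
        cur.execute("""
            CREATE TABLE IF NOT EXISTS items (
                id        bigserial PRIMARY KEY,
                body      text,
                embedding vector(384)
            )
        """)
        # <-> is pgvector's L2-distance operator; <=> would be cosine distance.
        query = "[" + ",".join(["0.1"] * 384) + "]"  # stand-in for a real embedding
        cur.execute(
            "SELECT id, body FROM items ORDER BY embedding <-> %s::vector LIMIT 10",
            (query,),
        )
        print(cur.fetchall())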

In our experience helping users scale vector search in production, several pain points emerge when running vector workloads inside a general-purpose RDBMS like Postgres:

1. Index build / update overhead

• Postgres isn’t built from the ground up for high-velocity vector insertion plus large-scale approximate nearest neighbour (ANN) index maintenance; for example, it lacks the RaBitQ binary quantization supported in purpose-built vector DBs like Milvus.

• For large datasets (tens or hundreds of millions of vectors, or beyond), building or rebuilding HNSW/IVF indices inside Postgres can be memory- and time-intensive.

• In production systems where vectors are continuously ingested, updated, and deleted, this becomes operationally tricky.

2. Filtered search

• Many use cases require combining vector similarity with scalar/metadata filters (e.g., “give me the top 10 similar embeddings where user_status = ‘active’ AND time > X”).

• You need to understand the low-level planner to juggle pre-filtering and post-filtering, and the planner’s cost model wasn’t built for vector similarity search. For a system not designed primarily as a vector DB, this gets complex. Users shouldn't have to worry about such low-level details (see the sketch after this list).

3. Lack of support for full-text search / hybrid search

• Purpose-built vector DBs such as Milvus have mature full-text search / BM25 / sparse-vector support.
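To illustrate point 2 from the other side: in a purpose-built engine, the scalar predicate is just an argument of the ANN query, and the engine decides pre- vs post-filtering internally. A rough pymilvus sketch (the server URI, collection, and field names are all made up for illustration):

    from pymilvus import MilvusClient

    client = MilvusClient(uri="http://localhost:19530")

    query_vector = [0.1] * 384  # stand-in for a real query embedding

    # The filter rides along with the vector query; no planner juggling.
    hits = client.search(
        collection_name="docs",
        data=[query_vector],
        filter='user_status == "active" and publish_time > 1700000000',
        limit=10,
        output_fields=["title", "author"],
    )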


Well said! We demo'd Milvus (or Zilliz, I should say), and while we didn't ultimately go with it, it seems like a great option.


Curious, are those ColBERT-like ones?


The complex aggregations are indeed an important feature for sophisticated search products like e-commerce search with interactive filtering, and there is probably no easy way for a vector DB to catch up quickly. But:

- Most RAG and enterprise apps don't really need that level of sophistication in the UX; what they need instead is simple and reliable infrastructure.

- Give vector DBs some time to catch up. Google Spanner didn't have any SQL capability in its early days, but it caught up after a few years and now has full SQL support. And SQL is probably more complex than Elasticsearch's aggregations.


This article depicts a perfect world and then links to a solution which is fairly distant from it. I understand the wishful thinking of having a "magic box" for search infrastructure, but as someone who worked on web-scale search at Google for years, I'd say the reality isn't that simple.

1. The real problem in embedding data lifecycle management is changing the embedding model, which involves a migration process. You can't really solve that by simply streamlining vectorization and suddenly using a new model for newly ingested data. You need the non-fancy migration process: create a new collection, batch-generate new vectors with the new model, port all of them there, meanwhile dual-writing all newly ingested documents, and switch search traffic to the new collection once the batch ingestion is done. Streamlining vectorization as part of the ingestion call doesn't solve that. It is still an interesting feature for lowering mental complexity, which is why our product at Zilliz (a vector db startup) supports it (https://zilliz.com/zilliz-cloud-pipelines) and our open-source Milvus plans to support streamlined API calls to embedding services in version 3.0: https://milvus.io/docs/roadmap.md. That said, I must stress that changing the embedding model is harder than the article makes it feel. We provide tools like bulk import to batch-port a whole dataset of vector embeddings along with other metadata like the original text or image URLs. But solving the problem with one "magic box" sounds unrealistic to me, at least for production use cases.
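To make that concrete, here is roughly what the migration flow above looks like as a pymilvus sketch (iter_source_documents, embed_v1, and embed_v2 are hypothetical stand-ins for your source reader and the old/new embedding models):

    from pymilvus import MilvusClient

    client = MilvusClient(uri="http://localhost:19530")

    # 1. New collection sized for the new model's output dimension.
    client.create_collection(collection_name="docs_v2", dimension=1024)

    # 2. Batch backfill: re-embed the whole corpus with the new model.
    for batch in iter_source_documents(batch_size=1000):  # hypothetical reader
        client.insert(collection_name="docs_v2", data=[
            {"id": d.id, "vector": embed_v2(d.text), "text": d.text}
            for d in batch
        ])

    # 3. Dual write while the backfill runs: new docs go to both collections.
    def on_new_document(d):
        for name, embed in (("docs_v1", embed_v1), ("docs_v2", embed_v2)):
            client.insert(collection_name=name,
                          data=[{"id": d.id, "vector": embed(d.text), "text": d.text}])

    # 4. Once the backfill catches up, switch search traffic to docs_v2
    #    and retire docs_v1.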

2. The article links to an implementation that does naive doc processing like chunking, but in reality people need more flexibility in parsing, doc chunking, and the choice of embedding models. That's why people need tools like LlamaIndex and unstructured.io, and write a doc processing pipeline around them.

3. Most vector DBs support storing the original unstructured data alongside the vector embedding. For example, in Milvus users usually ingest the text, the vector of the text, and other labels like author, title, chunk id, and publish_time. Ingestion of that data is naturally atomic, since it's a single row of data. The claim that data and embeddings get out of sync is simply false. When you update a document, you remove the old rows and add new rows with the new text and new vector bundled together; I'm not sure how they could drift apart. The real problem is #1, the migration problem when you want to change the embedding model: you have to wipe out all existing vectors, since they aren't compatible with the new model, and you can't blend docs with old embeddings and docs with new ones. You have to migrate the whole dataset to a new collection and decide when to start serving queries from it.
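For instance, with pymilvus an update is a delete of the old rows plus an insert of fresh bundled rows; the text, vector, and labels travel together in one row (names are illustrative, and embed stands in for your embedding call):

    from pymilvus import MilvusClient

    client = MilvusClient(uri="http://localhost:19530")

    new_text = "revised paragraph of the document"
    row = {
        "id": 42,
        "text": new_text,
        "vector": embed(new_text),  # hypothetical embedding call
        "author": "alice",
        "chunk_id": 7,
        "publish_time": 1700000000,
    }
    client.delete(collection_name="docs", ids=[42])    # drop the stale version
    client.insert(collection_name="docs", data=[row])  # write the fresh one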

4. Lastly, the consistency/freshness problem in search usually lives between the source data, say files on S3 or a Zendesk table, and the serving stack, say the vector DB. To build production-ready search, you need a sophisticated syncing mechanism that detects data changes at the source (S3, business apps, or even the world wide web) and pushes them through the search indexing pipeline to the serving stack. Tools like https://www.fivetran.com/blog/unlock-ai-powered-search-with-... can help avoid the engineering complexity of implementing that in house.


The developer community of the Milvus vector database benefits a lot from the Inkeep ask-AI button in Discord and on the milvus.io website. As users, we are happy with Inkeep's rich feature set, like the GitHub/Discord integrations and the admin tool for studying users' questions to identify issues in the product or documentation. These features are often overlooked when people talk about RAG solutions, but they turned out to be very important in our experience using RAG in a real-world scenario. This agentic workflow from Inkeep feels like a great addition to the existing core RAG functionality.


"By the end of 2024, we’re aiming to continue to grow our infrastructure build-out that will include 350,000 NVIDIA H100 GPUs as part of a portfolio that will feature compute power equivalent to nearly 600,000 H100s." This AI game is getting into a GPU war. Heard that Meta is pushing a lot of CPU wordloads to GPU to co-locate with model inference for infra simplicity.


Interesting that standardization comes along with disassembly, and performance is no longer the main differentiator. In addition to the UI/UX features emphasized in the post, ecosystem integration will also be a major differentiating factor. A data stack with a stronger developer community is more likely to be integrated, and thus has better connectivity. The community also helps it adapt quickly to new requirements; Facebook's TAO (a graph DB over MySQL) and, more recently, pgvector are good examples.


SQL was introduced in the 1970s. Considering that vector search has only been widely adopted in the last 5 years, I'm not surprised by the lack of standards for vector APIs. At Google, embedding-based retrieval became popular in 2019-2020.

This is the new kid in town, so you'll soon see all major SQL DBs support vectors. However, any serious user, O(10M) vectors or above, would still need a dedicated vector DB for performance reasons.


The dataset probably ranges from hundreds to tens of thousands of queries. The exact number is confidential, and frankly it changes over time and differs from product to product, so the order of magnitude is more indicative. This also matches most public datasets. https://github.com/beir-cellar/beir?tab=readme-ov-file#beers...

I guess the intuition is: if the dataset has fewer than 100 cases, it's arguably not diverse enough to cover all situations. On the other hand, the marginal gain of cases beyond 10,000 shrinks quickly. So O(1000) is probably the sweet spot if there is a way to collect queries automatically, e.g. from online traffic. If the dataset is hand-curated, it probably only makes sense to stay at O(100).

It's also important to note that at Google there is ML-trained automatic rating in addition to human raters. Rating a query is a heavy job; the rating guideline alone runs 36 pages: https://services.google.com/fh/files/misc/hsw-sqrg.pdf. Reportedly, Google hires 16,000 external human raters. If all of the 800,000 experiments in a year were rated by humans, that would mean:

800,000 experiments * 10,000 queries per experiment / 250 working days per year / 16,000 raters = 2,000 queries per rater per day (i.e., assuming an 8-hour day, a rater would need to finish rating a query in about 14 seconds)

Considering that rating a query requires comprehending the results and making comparisons, this is unlikely to be achievable. So either the dataset is smaller than 10k queries, or a large portion of the rating is done by machines.

