ifcologne · on April 7, 2021

This has the potential to accelerate the blocking of spam and troll accounts.

Is anyone already implementing such a service?

ifcologne · on Feb 11, 2021

I often hear the argument that one should use only the standard or, in the cloud, only the features of the (open source) tool itself, so that you can move anywhere else at any time.

No use of AWS specifics that would make developers' lives easier but don't exist in Azure. Standard SQL instead of effective new data types or procedures that are vendor-specific.

My favorite quote of the article:

> „Data has gravity“. Moving data can be both time-consuming and costly.

This is a major lock-in that is often ignored.

ifcologne · on Nov 14, 2019

Ingo from ArangoDB here:

If you need to scale out, the ArangoDB packages are more affordable than MongoDB Atlas, as you don't need to spin up a whole 3-node replica set to add another shard to your cluster. The smaller instances are cheaper in Atlas; here you benefit from the established cloud service, which can negotiate better conditions with the large cloud providers. However, we will pass on lower cloud costs to customers, so there is hope that we will move closer over time. But I don't see ArangoDB in direct competition with smaller, pure document use cases. Most users need the multi-model capabilities and use graphs in combination with document operations.

ifcologne · on Nov 14, 2019

Yes. Now I need to refine my backlog. A lot of great ideas I hadn't thought about yet.

A lot of users don't care and are used to these tactics. Is there a chance that this will change in the future? I hope so.

ifcologne · on Sept 25, 2019

What I miss in the conclusion: Multi-model databases

More and more products support multiple data models today.

This reduces the number of technologies in your tech stack and allows you to combine different access patterns without the need to duplicate and sync data between systems.

ifcologne · on April 10, 2019

Good explanation of TF-IDF. Now I can verify if my colleagues have implemented it in the same way.

ifcologne · on March 20, 2019

Okay, Stephen O’Grady - here’s the obvious one you’ve asked for:

CSS is not a programming language. ;-)

Despite that, the list is quite complete and feels reasonable. Did you try to research how languages are used in certain use cases? Which languages compete in a certain domain?

sogrady · on March 20, 2019

Ha! We get asked about CSS every time. Our general answer is that we try very hard not to editorialize, and let GitHub’s Linguist make determinations. We do make decisions, but to date, CSS has continued to make the cut.

As for how languages are used, we spend a lot of time trying to understand that broadly, and where the rankings reveal anomalous patterns (e.g. Kotlin a year or two ago) we do more targeted research to understand those.

ifcologne · on March 14, 2019

Imagine you would go to your preferred online marketplace and search for a generic product.

You get 1000+ results.

So you filter by avg.star-rating > 4.0

Still 500+ results.

Those with just one 5 star rating in front of the one with 300 reviews and a 4.8 avg. Annoying.

What I really want:

I would like to filter for products that have at least 5 (relatively long) reviews, an average rating of 4.0 and at least 2 of these review comments mentioning the use case for which I would like to use this product. Maybe I just want the verified purchases to be counted or the reviews of friends and friends of friends...

Using a native multi-model approach you can do both. Simply retrieve all category X products ranked by product rating, limit 50/page or perform advanced lookups - without having to synchronize data from a document or relational model with an additional graph or search engine.

Combining full text search with scorers, graph traversals and/or join operations you could do an ad-hoc query in AQL to get the most relevant products & reviews with a single query.

Multi-model provides choice. In data modeling and querying.
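To make the filter described above concrete, here is a minimal in-memory sketch in Python. It is not AQL and not how any particular database would run it; the `Review` class, the `matches_filter` function, the 200-character threshold for a "relatively long" review, and the plain substring match for the use case are all assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Review:
    stars: float
    text: str
    verified: bool = True  # hypothetical flag for "verified purchase"


def matches_filter(reviews, use_case, min_reviews=5, min_chars=200,
                   min_avg=4.0, min_mentions=2, verified_only=True):
    """Return True if a product's reviews satisfy the filter from the comment.

    Assumed reading: the average is taken over all counted reviews, while the
    "at least 5" and "at least 2 mentions" conditions apply to long reviews.
    """
    counted = [r for r in reviews if r.verified or not verified_only]
    if not counted:
        return False
    long_reviews = [r for r in counted if len(r.text) >= min_chars]
    if len(long_reviews) < min_reviews:
        return False
    if mean(r.stars for r in counted) < min_avg:
        return False
    # Naive substring match; a real system would use full-text search here.
    mentions = sum(use_case.lower() in r.text.lower() for r in long_reviews)
    return mentions >= min_mentions
```

Restricting `counted` to reviews by friends and friends of friends would be the graph part of the query; that step is left out of this sketch.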

ddebernardy · on March 14, 2019

> Those with just one 5 star rating in front of the one with 300 reviews and a 4.8 avg. Annoying.

That's easy enough to solve with a Bayesian average. The problem is that many developers and product managers know little or nothing about stats.

dorgo · on March 14, 2019

How exactly? By adding a number (C) of average (2.5 stars) "virtual" reviews to all products?

https://en.wikipedia.org/wiki/Bayesian_average

ddebernardy · on March 14, 2019

Like so:

http://www.evanmiller.org/how-not-to-sort-by-average-rating....

in9 · on March 14, 2019

You can also use a shrunken average. If the group has a lot of data, it keeps its own average; otherwise it deviates only slightly from the global mean.
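A minimal sketch of the idea discussed in this thread, in Python: add C "virtual" reviews at a prior mean m to every product, so that a product with few reviews stays close to m while a product with many reviews keeps roughly its own average. The function name `bayesian_average`, the values C = 10 and m = 4.0, and the sample ratings are assumptions for illustration only.

```python
def bayesian_average(ratings, prior_mean, prior_weight):
    """Average of the real ratings plus `prior_weight` virtual ratings
    that all equal `prior_mean`."""
    total = prior_weight * prior_mean + sum(ratings)
    return total / (prior_weight + len(ratings))


# Hypothetical products matching the example in the thread.
one_review = [5.0]                        # a single 5-star rating
many_reviews = [5.0] * 240 + [4.0] * 60   # 300 reviews, plain average 4.8

C = 10   # assumed number of virtual reviews
m = 4.0  # assumed prior, e.g. the global mean rating

score_one = bayesian_average(one_review, m, C)
score_many = bayesian_average(many_reviews, m, C)

# The product with 300 reviews now ranks ahead of the single 5-star one.
print(score_one < score_many)
```

The choice of m matters: a prior at the global mean, as in the shrunken-average comment, pulls sparse products toward "typical", while a fixed 2.5 would penalize them more heavily.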

ifcologne · on March 14, 2019

The art is to combine all data models using one query language without duplicating data.

ifcologne · on Jan 17, 2019

Yes, we had cluster stability issues 1.5 years ago. That has changed: cluster stability and performance have top priority, and we invest a lot to improve the developer and DevOps experience with every release. Now, e.g. with K8s deployments or the arangodb starter, it's much easier to run and maintain clusters. Hope you find the time to give it a second try.