
It's not really surprising given the implementations. The C# stdlib just exposes more low-level levers here (quick look, correct me if I'm wrong):

For one, the C# code explicitly uses SIMD (System.Numerics.Vector) to process blocks, whereas Go does it scalar. It also uses a read-only FrozenDictionary, which is heavily optimized for fast lookups compared to a standard map. Parallel.For effectively maps to OS threads, avoiding the Go scheduler's overhead (such as preemption every few ms), which is small but still unnecessary for pure number crunching. But a bigger bottleneck is probably synchronization: the Go version writes to a channel in every iteration. Even buffered, that implies internal locking/mutex contention. C# just writes to disjoint, pre-allocated memory regions, so there's no synchronization at all.
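To illustrate the last point: when each parallel task owns a disjoint range of a pre-allocated output array, no channel, lock, or queue is needed to collect results. A minimal sketch in Java (the names here are illustrative, not from the benchmark):

```java
import java.util.stream.IntStream;

public class DisjointWrites {
    public static void main(String[] args) {
        int n = 1_000_000;
        long[] results = new long[n]; // pre-allocated output, one slot per input

        // Each index is owned by exactly one task, so no synchronization
        // is needed to collect results -- unlike funneling them through a channel.
        IntStream.range(0, n).parallel()
                 .forEach(i -> results[i] = (long) i * i);

        System.out.println(results[1000]);
    }
}
```

The join at the end of the parallel stream is the only synchronization point, amortized over the whole run instead of paid per element.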


In other words the benchmark doesn't even use the same hardware for each run?


If you're referring to the SIMD aspect (I assume the other points don't apply here): It depends on your perspective.

You could say yes, because the C# benchmark code is utilizing vector extensions on the CPU while Go's isn't. But you could also say no: Both are running on the same hardware (CPU and RAM). C# is simply using that hardware more efficiently here because the capabilities are exposed via the standard library. There is no magic trick involved. Even cheap consumer CPUs have had vector units for decades.


> So, in theory, one can create 100K threads on one machine, but in practice that's going to keep burning processor for GC cycles.

The focus on "100k threads" and GC overhead is a red herring. The real win isn't spawning a massive number of threads, but automatically yielding on network I/O, as goroutines do. In an I/O-bound web application, you'd have a single virtual thread handling the whole request, just like a goroutine does. The GC overhead caused by the virtual thread is minuscule compared to the heap allocations caused by everything else going on in the request. And if you really have a scenario for 100k virtual threads, they would not be short-lived.
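A minimal sketch of that model, assuming JDK 21+ (the sleep stands in for a blocking database or network call):

```java
import java.time.Duration;

public class VirtualThreadRequest {
    public static void main(String[] args) throws InterruptedException {
        // One virtual thread per "request". Blocking calls suspend only
        // the virtual thread; the carrier OS thread is freed to run others.
        Thread handler = Thread.ofVirtual().start(() -> {
            try {
                Thread.sleep(Duration.ofMillis(50)); // stand-in for network I/O
                System.out.println("request handled");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handler.join();
    }
}
```

The handler is written as plain sequential blocking code; the yielding happens inside the runtime, not in the programming model.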

> But if they access limited resources (database, another HTTP service), etc. in real application you face the standard issue: you cannot hit the targeted system with any data you want

Then why would you do it? That sounds like an architectural problem, not a virtual thread problem. In an actor system, for example, you wouldn't hit the database directly from 100k different actors.

> The good thing in reactive programming is that it does not try to pretend that above problem does not exist.

This compares a high-level programming paradigm, complete with its own libraries and frameworks, to a single, low-level concurrency construct. The former is a layer of abstraction that hides complexity, while the latter is a fundamental building block that, by design, does not and cannot hide anything.

> It forces to handle errors, to handle backpressure, as those problems will not magically disappear when we switch to green threads, lightweight threads, etc.

Synchronous code handles errors in the most time-tested and understandable way there is. It is easy to reason about and easy to debug. Reactive programming requires explicit backpressure handling because its asynchronous nature creates the problem in the first place. The simplest form of "backpressure" in synchronous code with a limited amount of threads is the act of blocking. For anything more than that, there are the classic tools (blocking queues, semaphores...) or higher-level libraries built on top of them.
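As a sketch of "blocking as backpressure" with one of those classic tools, a semaphore capping the number of in-flight downstream calls (a toy example, not from the discussion):

```java
import java.util.concurrent.Semaphore;

public class BlockingBackpressure {
    public static void main(String[] args) throws InterruptedException {
        // At most 4 calls in flight; producers simply block in acquire()
        // when the limit is reached -- that blocking IS the backpressure.
        Semaphore inFlight = new Semaphore(4);
        int[] completed = {0};

        Thread[] workers = new Thread[16];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    inFlight.acquire();   // blocks while 4 calls are busy
                    Thread.sleep(10);     // simulated downstream call
                    synchronized (completed) { completed[0]++; }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.release();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(completed[0]);
    }
}
```

No explicit backpressure protocol is needed: upstream slows down automatically because its threads are parked in `acquire()`.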


> The real win isn't spawning a massive number of threads, but automatically yielding on network I/O

This is of course what normal OS threads do as well, they get suspended when blocking on IO. Which is why 100k OS threads doing IO works fine too.


Yes. What I was trying to imply is that there is now a lightweight processing unit that can still suspend on I/O (independently, without involvement from the OS scheduler), but without relying on async/reactive patterns at the code level. This required significant changes to the standard library and runtime.


Don't they charge for every TB exceeding the included limit? (website says "For each additional TB, we charge € 1.19 in the EU and US, and € 8.81 in Singapore.")


They do, but the risk of having to pay $1.44/TB after the first 20TB is easier to swallow than say, CloudFront's ~$100/TB after 1TB.


> CloudFront's ~$100/TB after 1TB.

I had to double-check because that sounds hilariously wrong. I can't find it anywhere in the pricing. It's at most 0.08/TB.

Am I missing something?


You're missing the unit, it's $0.085 per GB, not TB, and that's only for NA/EU traffic. I rounded up a bit from that number because other regions cost more, plus you get billed a flat amount for each request as well.

They do offer progressively cheaper rates as you use more bandwidth each month, but that doesn't have much impact until you're already spending eye watering amounts of money.
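The unit mix-up above is easy to sanity-check with the arithmetic spelled out (using the $0.085/GB NA/EU list price mentioned earlier):

```java
public class CdnMath {
    public static void main(String[] args) {
        double perGB = 0.085;        // CloudFront NA/EU list price, per GB
        double perTB = perGB * 1000; // roughly $85 per TB, before per-request fees
        System.out.printf("$%.2f per TB%n", perTB);
    }
}
```

Which is where the "~$100/TB" ballpark comes from once pricier regions and per-request billing are folded in.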


Oh, yeah, egg on my face. They only put the unit of measurement at the top, and then talk about TB, so it's a bit deceptive. In retrospect, I was stupid to imagine 0.085/TB made any sense.


0.085/TB would make a lot of sense if they sold at just a 50 to 100% markup. But instead they sell at a markup of tens of thousands of percent over the real cost.


I'd argue that the severity varies between languages, despite the core problem being universal. Languages with comprehensive standard libraries have an advantage over those with minimal built-in functionality, where people rely on external dependencies even for the most basic things (e.g. see Java/.NET vs JS/Node). Lightweight is not always better.


> Languages with comprehensive standard libraries have an advantage

I don't see the advantage. Just a different axis of disadvantage. Take Python for example: it has a crazy big standard library full of stuff I will never use. Some people want C++ to go in that direction too, even though developers are fully capable of rolling their own. Similar problem with kitchen-sink libraries like Qt. "Batteries included" languages lead to a higher maintenance burden for the core team, and hence various costs that all users pay: dollars, slow evolution, design overhead, lowest-common-denominator non-specialised implementations, loss of core mission focus, etc.


It's a tradeoff. Those languages also have a very difficult time evolving anything in that standard library, because the entire ecosystem relies on it and expects non-breaking changes. I think Rust gets sort of the best of both worlds: dependencies are so easy to install that it's almost as good as native, but there's a diversity of options and design choices, evolution is easy, and winners naturally emerge. These become as high quality as a stdlib component because they attract people/money to work on them, but with more flexibility to change or be replaced.


> Imagine we have immutable records that hold just data and static classes as function containers, and those functions just act on the records, return some new ones and change no state

Or imagine those functions are part of the immutable record and create new instances. The aspect of (im)mutability is orthogonal to where you place your logic. In the context of domain models, if the logic is an inherent part of the type and its domain, then there are good reasons to model the logic as part of the type, and those have nothing to do with Java or the typical OOP dogma (Rust chrono: `let age = today.years_since(birthday)` - Yes, you could argue that the logic is part of the trait implementation, but the struct's data is still encapsulated and correct usage of the type is enforced. There is only one way to achieve this in Java.)
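A minimal Java sketch of that idea (a hypothetical `Money` type, purely illustrative): the record stays immutable, and the domain logic lives on the type itself, returning new instances instead of mutating state.

```java
public class RecordDemo {
    // Logic is part of the immutable record; operations return new instances.
    record Money(long cents) {
        Money plus(Money other) { return new Money(cents + other.cents); }
    }

    public static void main(String[] args) {
        Money total = new Money(150).plus(new Money(250));
        System.out.println(total.cents());
    }
}
```

Nothing about placing `plus` on the record makes it mutable; (im)mutability and logic placement really are orthogonal.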


In this case, API does not refer to client/server. The API of the aforementioned static class is the set of its methods and their signatures.


> ...entirely missing the lightweight threading...

They deliberately took the longer route, aiming to integrate lightweight threads in a way that doesn't force developers to change their existing programming model: no need for callbacks, futures, coroutines, async/await, or the like. This required a massive effort under the hood and a rework of many core APIs. Even code compiled with decade-old Java versions can run on virtual threads and benefit, without any refactoring or recompilation.
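That "no change to the programming model" point can be sketched with the JDK 21 virtual-thread executor: the blocking code below (`fetch` is a made-up stand-in for an old synchronous call) is unchanged, only the executor differs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class VirtualExecutor {
    // Plain blocking code, unchanged: no async/await, no callbacks.
    static String fetch(int id) {
        try { Thread.sleep(20); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "result-" + id;
    }

    public static void main(String[] args) throws Exception {
        // One cheap virtual thread per task; blocking in fetch() parks
        // the virtual thread instead of pinning an OS thread.
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            var futures = IntStream.range(0, 1000)
                    .mapToObj(i -> pool.submit(() -> fetch(i)))
                    .toList();
            System.out.println(futures.get(0).get());
        }
    }
}
```

Swapping in a classic fixed thread pool would run the exact same `fetch`, which is the point: the migration path is the executor, not the code.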

> ...and/or async/await revolution of the last decade

async/await is largely syntactic sugar. Java has had the core building blocks for asynchronous programming for years, with CompletableFuture (2014, replacing the less flexible Future introduced in 2004) and NIO.2 (2011, building on the original NIO from 2002) for non-blocking I/O, along with numerous mature libraries that have been developed on top of them over time.
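Roughly, what async/await desugars to is a continuation chained onto an asynchronously completed value, which CompletableFuture expresses directly (a trivial sketch):

```java
import java.util.concurrent.CompletableFuture;

public class FutureDemo {
    public static void main(String[] args) {
        // supplyAsync ~ the awaited call; thenApply ~ the code "after the await".
        CompletableFuture<String> greeting =
            CompletableFuture.supplyAsync(() -> "hello")
                             .thenApply(s -> s + ", world");
        System.out.println(greeting.join());
    }
}
```

The sugar mostly buys you writing that chain as straight-line code; the underlying machinery has been in the JDK for a decade.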


> Right. It's not a very widespread use case, to be honest. You'd find that most would be N actors for M threads (where N <= M

What makes you think that? Having a large number of actors per thread is by far the most important use case. The Actor model is commonly used in communication systems where there are hundreds of thousands of actors per machine (often one for every single user). In this context, Actors are typically extremely lightweight and not CPU-bound. Instead, they mostly focus on network I/O and are often idle, waiting for messages to arrive or be sent.
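The N >> M shape is easy to sketch: many actors are just mailboxes, multiplexed over a small shared pool. A toy illustration (not a real actor library; `Actor` here is a made-up minimal type):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyActors {
    // A minimal "actor": only a mailbox, drained on a shared pool,
    // so thousands of actors can share a handful of threads.
    static class Actor {
        final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    }

    public static void main(String[] args) throws Exception {
        int actorCount = 10_000;                               // N actors...
        ExecutorService pool = Executors.newFixedThreadPool(4); // ...on M threads
        AtomicInteger delivered = new AtomicInteger();

        Actor[] actors = new Actor[actorCount];
        for (int i = 0; i < actorCount; i++) {
            actors[i] = new Actor();
            actors[i].mailbox.add(delivered::incrementAndGet);
        }
        // Dispatch each actor's pending message onto the shared pool.
        for (Actor a : actors) {
            pool.submit(() -> { Runnable m = a.mailbox.poll(); if (m != null) m.run(); });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(delivered.get());
    }
}
```

Since each actor is just a queue plus a bit of state, an idle actor costs almost nothing, which is what makes one-actor-per-user feasible.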


I think you misread:

- 2 actors on 1 thread = OK

- 1 actor on 2 threads = you are probably doing it wrong.

As for the rest, whether or not they are used in communication systems and whether or not they are CPU-bound, consider the ones that are, and run the handler on a separate loop from the main message dispatching. Otherwise you _will_ delay messaging if handlers don't await.


I don't get it :(. Why could the same task be executed more than once? From my understanding, if the UPDATE is atomic, only one worker will be able to set `used = 1`. If the update statement is not successful (affected != 1), then the worker should drop the task and do another select.


With a transaction isolation level below SERIALIZABLE you can have two transactions that both read the old row (with `used = 0`) at the time they perform the update (but before they commit the transaction). In that case, both transactions will have performed an update (rows affected = 1).

Why would both transactions see `used = 0`? The DB server tries to isolate transactions and actively hides effects of other transactions that have not committed yet.


This is not true in Postgres. When the second transaction tries to update the row, it will wait for the first transaction to commit, and then recheck the WHERE clause.

https://www.postgresql.org/docs/current/transaction-iso.html...


Zoom charges VAT based on the local country (or based on the VAT number if you are a company and provide one)

