I think speed is the wrong word here. A better word is throughput.
The underlying issue with Python is that it doesn't support threading well (due to the global interpreter lock) and mostly handles concurrency by forking processes instead. The traditional way to improve throughput is to run more processes, which is expensive (e.g. you need more memory). This is a common pattern in other languages like Ruby, PHP, etc.
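For example, the classic pre-fork deployment looks something like this (a sketch; "myapp:app" and the worker count are placeholders):

    # each sync worker is a separate forked Python interpreter,
    # so memory use scales roughly linearly with --workers
    gunicorn myapp:app --workers 16 --worker-class sync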
Other languages use green threads / coroutines to implement async behavior and let a single thread handle multiple connections. On paper this should work in Python as well, except it has a few bottlenecks (which the article outlines) that leave its throughput somewhat worse than the multi-process, synchronous versions.
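For reference, the "one thread, many connections" coroutine model looks roughly like this with stdlib asyncio (a minimal echo-style sketch, not code from the article):

    import asyncio

    async def handle(reader, writer):
        # while this coroutine awaits I/O, the event loop
        # services other connections on the same thread
        data = await reader.readline()
        writer.write(data)              # echo the line back
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    async def main():
        server = await asyncio.start_server(handle, "127.0.0.1", 8000)
        async with server:
            await server.serve_forever()

    asyncio.run(main())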
Memory is cheap; the cost is in constant de/serialization. Same with "just rewrite the hotspots in C!"-style advice; de/serialization can easily eat anything you saved by multiprocessing/rewriting. Python is a deceptively hard language, and a lot of this is a direct result of the "all of CPython is the public C-extension interface!" design decision (significant limitations on optimizations => heavy dependency on C extensions for anything remotely performance-sensitive => package management has to deal extensively with the nightmare that is C packaging => no meaningful cross-platform artifacts or cross compilation => etc).
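To make the de/serialization point concrete: everything crossing a process boundary gets pickled both ways, and that round trip can dwarf the work you offloaded. A rough sketch (payload size and the trivial helper are arbitrary):

    import pickle, time
    from multiprocessing import Pool

    def count_bytes(rows):
        # trivial "hotspot": the interesting cost is shipping the
        # argument and result between processes, not this loop
        return sum(len(r) for r in rows)

    if __name__ == "__main__":
        rows = [b"x" * 1000 for _ in range(100_000)]   # ~100 MB of payload

        t = time.perf_counter()
        pickle.loads(pickle.dumps(rows))               # the hidden round trip
        print("pickle round-trip:", time.perf_counter() - t)

        t = time.perf_counter()
        with Pool(processes=2) as pool:
            pool.apply(count_bytes, (rows,))           # pays that cost, plus IPC
        print("via worker process:", time.perf_counter() - t)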
Memory is not cheap when you're dealing with the real-world cost of deploying a production system. The pre-fork worker model used in many sync setups is very resource intensive, and depending on the number of workers you're probably paying a lot more for the box it's running on. Of course this is different if you're running on your own metal, but I have other issues with that.
> Memory is not cheap when you're dealing with the real-world cost of deploying a production system.
What? What makes you say that? What did you think I was talking about if not a production system? To be clear, we're talking about the overhead of single-digit additional python interpreters unless I'm misunderstanding something...
Observed costs from companies running the pre-fork worker model vs alternative deployment methods. And just in this benchmark they're running double-digit interpreters, which in my experience is the more common (and more expensive) setup.
Double-digit interpreters per host? Where is the expense? Interpreters have a relatively small memory overhead (<10 MB). If you're running 100 interpreters per host (you shouldn't be), that's an extra $50/host/year. But you should be running <10/host, so an extra $5/host/year. Not ideal, but not "expensive", and if you care about costs your biggest mistake was using Python in the first place.
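Spelled out, the arithmetic I'm assuming (~10 MB per interpreter and roughly $50 per GB-year of memory):

    100 interpreters x 10 MB = ~1 GB   -> ~$50/host/year
     10 interpreters x 10 MB = ~0.1 GB -> ~$5/host/year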
I don't know where you're seeing the <10 MB; in the situation I saw they were easily consuming 30 MB per interpreter. Even a cursory search now shows them at roughly 15-20 MB, so even assuming the 30 MB Gunicorn was just misconfigured, that's still an extra $100 per host using your estimate and what I'm finding by Googling around. Across a setup with multiple public APIs, that adds up pretty quickly.
Another Google search suggests that Gunicorn, for instance, using a lot of memory per forked worker isn't exactly uncommon either.
Edit: I reworded some stuff up there and tried to make my point more clear.
Yes, C dependency management is awful, and because Python is only practical with C extensions for performance critical code, it ends up being a nightmare as well.
In our use case, switching to asyncio is like moving from 12 cores to 3... (and I'm pretty sure we're handling more concurrency: from 24-30 req/s to 150 req/s).
But our workload is mostly network related (db, external services...)
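Roughly why that helps for us: the waits on the db and the external services overlap instead of stacking up. A toy sketch (fetch_db / fetch_api are made-up stand-ins, not our real code):

    import asyncio

    async def fetch_db():
        await asyncio.sleep(0.05)        # stand-in for an awaited db query
        return {"rows": []}

    async def fetch_api():
        await asyncio.sleep(0.10)        # stand-in for an external service call
        return {"status": "ok"}

    async def handle_request():
        # the two waits overlap instead of adding up, which is where
        # most of the req/s improvement comes from on I/O-bound work
        db, api = await asyncio.gather(fetch_db(), fetch_api())
        return db, api

    print(asyncio.run(handle_request()))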
maybe the author is concerned that many people are jumping the gun on async/await before we all fully understand why we need it at all. and that's true. but that paradigm was introduced (borrowed) to solve a completely different issue.
i would love to see how many concurrent connections those sync processes handle.
Hi - not sure what you mean by this. The sync workers handle one request (to completion) per worker. So 16 workers means 16 concurrent requests. For the async workers it's different - they do more concurrently - but as discussed their throughput is not better (and latency much worse).
Maybe what you're getting at is cases where there are a large number of (fairly sleepy) open connections? E.g. for push updates and other websockety things. I didn't test that, I'm afraid. The state of the art there seems to be using async, and I think that's a broadly appropriate usage, though it's generally not very performance-sensitive code, except that you try to do as little as possible in your connection manager code.
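For a sense of scale, a single event loop can hold a lot of idle connections cheaply; a toy sketch (idle tasks standing in for open sockets, numbers arbitrary):

    import asyncio

    async def idle_client(i):
        # a "sleepy" connection: mostly waiting, occasionally pushed to
        await asyncio.sleep(3600)

    async def main():
        # one thread, one event loop, ten thousand open-but-idle tasks;
        # each costs a small task/coroutine object rather than a worker process
        tasks = [asyncio.create_task(idle_client(i)) for i in range(10_000)]
        await asyncio.sleep(1)           # they all just sit there
        for t in tasks:
            t.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)

    asyncio.run(main())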
When everything works smoothly, that model may play out fine. But if you get a client that times out, or worse, a slow connection, then it ties up one of your workers for a long time in a synchronous model. In the async model this has less of a footprint, as you are still accepting other connections despite the slow progress on one of them.
yes many open connections is what i meant (suggested by other people as well). by the way, i really liked the writing, it's refreshing. and i agree with you that people aren't using async for the right reasons.
Thanks :), really appreciate that. I think all technology goes through a period of wild over-application early on. My country is full of (hand-dug) canals, for example.
Therefore, I would have liked to see how much memory all those workers use, and how many concurrent connections they can handle.
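If anyone wants to measure the memory side, one rough approach (assumes the psutil package; note that RSS counts copy-on-write pages shared between forked workers, so the sum overstates real usage):

    import psutil

    def worker_memory_mb(master_pid):
        # resident set size of a master process (e.g. a gunicorn
        # master) and each of its forked worker children, in MB
        master = psutil.Process(master_pid)
        procs = [master] + master.children(recursive=True)
        return [(p.pid, p.memory_info().rss / 1024 / 1024) for p in procs]

    # usage: print(worker_memory_mb(<gunicorn master pid>))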