At the very moment the person in charge says "ok, this works, now make it not slow". Python is the modern-age BASIC: easy to write and good for prototypes, scripting, gluing libraries together, and fast iteration. If you want performance and heavy data processing, anything else will be better. PHP, Java, even JavaScript.
For example, Python struggles to reach real-time performance decoding RLL/MFM data off ancient 40-year-old hard drives (https://github.com/raszpl/sigrok-disk). On a 4GHz CPU I can't break 500KB/s in a simple loop:
To optimize that code snippet, use temporary variables instead of member lookups to avoid slow getattr and setattr calls. It still won't beat a compiled language; number crunching is the worst sport for Python.
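A minimal sketch of that suggestion, applied to a 16-bit de-interleave step like the one discussed in this thread. The `Decoder` class and the method names `step_attrs` / `step_locals` are hypothetical, made up for illustration; only the attribute names and the mask sequence come from the posted snippet. Both variants compute the same byte. The second one keeps intermediates in locals and touches `self` only at the start and the end.

```python
class Decoder:
    """Hypothetical stand-in for the decoder state (not the real sigrok-disk class)."""
    __slots__ = ("shift", "shift_index", "shift_byte")

    def __init__(self, shift, shift_index):
        self.shift = shift
        self.shift_index = shift_index
        self.shift_byte = 0

    def step_attrs(self):
        # Every intermediate value is stored to and loaded from an attribute.
        self.shift_index -= 16
        self.shift_byte = (self.shift >> self.shift_index) & 0x5555
        self.shift_byte = (self.shift_byte + (self.shift_byte >> 1)) & 0x3333
        self.shift_byte = (self.shift_byte + (self.shift_byte >> 2)) & 0x0F0F
        self.shift_byte = (self.shift_byte + (self.shift_byte >> 4)) & 0x00FF

    def step_locals(self):
        # Read attributes once, work on locals, write back once.
        idx = self.shift_index - 16
        b = (self.shift >> idx) & 0x5555      # keep the even-position bits
        b = (b + (b >> 1)) & 0x3333           # pack them into pairs
        b = (b + (b >> 2)) & 0x0F0F           # pack pairs into nibbles
        self.shift_index = idx
        self.shift_byte = (b + (b >> 4)) & 0x00FF  # pack nibbles into one byte


a = Decoder(0x4411, 16)
b = Decoder(0x4411, 16)
a.step_attrs()
b.step_locals()
print(hex(a.shift_byte), hex(b.shift_byte))
```

Whether the local-variable version is measurably faster depends on the interpreter version and on how much of the loop body the attribute traffic accounts for, so it is worth timing on the real workload.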
Which is why, in practice, in Python you pay the cost of moving your data into a native module (numpy/pandas/polars), do all your number crunching over there, and then pull the result back.
Not saying it's ideal, but it's a solved problem and Python is eating good in terms of quality dataframe libraries.
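As a rough illustration of that pattern, here is the same even-bit packing done over a whole array of 16-bit words in numpy rather than one word at a time. This is a sketch under assumptions: it presumes the raw words have already been collected into an array, which the real decoder may not be able to do up front, and `compact_even_bits` is a made-up name.

```python
import numpy as np


def compact_even_bits(words):
    """Pack the 8 even-position bits of each 16-bit word into one byte."""
    x = np.asarray(words, dtype=np.uint16) & np.uint16(0x5555)
    x = (x + (x >> 1)) & np.uint16(0x3333)   # bits into pairs
    x = (x + (x >> 2)) & np.uint16(0x0F0F)   # pairs into nibbles
    x = (x + (x >> 4)) & np.uint16(0x00FF)   # nibbles into a byte
    return x.astype(np.uint8)


decoded = compact_even_bits([0x5555, 0xAAAA, 0x4411])
print(decoded.tolist())
```

The per-word Python overhead is paid once per array operation instead of once per word, which is where the speedup would come from. The cost is the copy in and out of the array.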
All those class variables are already in __slots__, so in theory it shouldn't matter. Your advice is good
self.shift_index -= 16
shift_byte = (self.shift >> self.shift_index) & 0x5555
shift_byte = (shift_byte + (shift_byte >> 1)) & 0x3333
shift_byte = (shift_byte + (shift_byte >> 2)) & 0x0F0F
self.shift_byte = (shift_byte + (shift_byte >> 4)) & 0x00FF
but only for exactly 2-4 milliseconds per 1 million pulses :) Declaring a local variable in a tight loop forces Python into a cycle of memory allocations and garbage collection, negating potential gains :(
SWAR : 0.288 seconds -> 0.33 MiB/s
SWAR local : 0.284 seconds -> 0.33 MiB/s
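For anyone wanting to reproduce figures in this form, a timing harness could look roughly like the sketch below. The helper names `throughput_mib_s` and `time_decode` are hypothetical, and `decode` stands for whichever loop is being measured; none of this is taken from sigrok-disk.

```python
import time


def throughput_mib_s(n_bytes, seconds):
    """Convert a byte count and a wall-clock duration into MiB/s."""
    return n_bytes / (1024 * 1024) / seconds


def time_decode(decode, words):
    """Run decode(word) over all words; return (seconds, bytes produced)."""
    start = time.perf_counter()
    produced = 0
    for w in words:
        decode(w)
        produced += 1          # this sketch assumes one output byte per word
    return time.perf_counter() - start, produced


def swar(word):
    # Same mask sequence as the snippet, on a single 16-bit word.
    b = word & 0x5555
    b = (b + (b >> 1)) & 0x3333
    b = (b + (b >> 2)) & 0x0F0F
    return (b + (b >> 4)) & 0x00FF


seconds, n = time_decode(swar, range(100_000))
print(f"SWAR : {seconds:.3f} seconds -> {throughput_mib_s(n, seconds):.2f} MiB/s")
```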
This whole snippet is maybe what, 50-100 x86 opcodes? Native code runs at >100MB/s while Python 3.14 struggles at around 300KB/s. Python 3.4 (a hardcoded Sigrok requirement) is even worse:
This is very subjective. Using Python influences your architecture in ways you would not encounter with other languages.
I maintain a critical service written in Python and hosted in AWS; with about 40 containers it can do 1K requests/sec with good reliability. But we see issues with http libraries and systemic pressure within the service.
1k requests/sec over 40 containers means 25 RPS per container. Are you using synchronous threads by any chance (meaning if you're waiting on IO or a network call you are blocked yet your CPU is actually idle)? If so, you might benefit from moving to gevent and handling that load with just a handful of containers.
The scale where throwing more hardware at your CPU-intensive Python part (and not the part that just waits on a DB, IO or other networked service, since that won't change with Golang) starts costing more than paying developers to write it in a new language and incurring the downsides of introducing another language into the stack, throwing away all the "tribal knowledge" of the existing app and so on.
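To make the blocked-on-IO point concrete, here is a small sketch using the standard library's asyncio instead of gevent (a deliberate swap so the example needs no third-party install; gevent reaches a similar effect with green threads). The 50 ms delay and the function names are assumptions for illustration, and `asyncio.sleep` stands in for a DB or HTTP call.

```python
import asyncio
import time


async def handle_request(io_delay):
    # Stand-in for a DB or HTTP call: the task yields while it waits.
    await asyncio.sleep(io_delay)
    return "ok"


async def serve_batch(n_requests, io_delay):
    # All requests wait on their simulated IO concurrently in one thread.
    tasks = [handle_request(io_delay) for _ in range(n_requests)]
    return await asyncio.gather(*tasks)


def timed_batch(n_requests=100, io_delay=0.05):
    start = time.perf_counter()
    results = asyncio.run(serve_batch(n_requests, io_delay))
    return len(results), time.perf_counter() - start


count, elapsed = timed_batch()
print(f"{count} requests in {elapsed:.2f} s")
```

Handled one at a time by a blocking worker, 100 requests at 50 ms each would take about 5 seconds. Here they overlap, so the batch finishes in roughly the time of a single wait. This only helps when the time is spent waiting, not when it is spent on CPU.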
Modern hardware is incredibly fast, so if you wait for said scale it may never actually happen. It's likely someone will win the push for a rewrite based on politics rather than an actual engineering constraint, which I suspect happened here.