This feels a bit ironic given how much the for loop has been villainized in numerical computing, by none other than languages like Python (and Matlab and R), where for loops are so awfully slow that you can only get performance if you avoid them like the plague and write (sometimes awkward) vectorized code that pushes all the for loops down into some C library (or C++ or Fortran). Compare that with, say, Julia, where people are encouraged to "just write yourself a dang for loop", because it's not only effective and simple but also fast. I guess what I'm saying is that even though Python may embrace the for loop as syntax, that affinity seems superficial at best: the moment you care about performance, Python rejects the for loop.
In Python numeric computing it's common for your outer loops to be for loops and your inner loops to be vectorized PyTorch/whatever.
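For example, a typical training loop looks something like this (a minimal sketch; the model, data, and hyperparameters are made up for illustration):

```python
import torch

# Hypothetical toy setup: shapes and values are made up for illustration.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(100)]

for epoch in range(5):       # outer loop: plain Python for
    for x, y in batches:     # outer loop: plain Python for
        loss = ((model(x) - y) ** 2).mean()  # inner "loop": vectorized tensor ops
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The outer loops run a few hundred Python iterations in total, which costs almost nothing; all the work that matters happens inside the vectorized tensor operations.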
I personally like being able to easily comprehend and control what's being vectorized. Maybe it would be nice if my compiler could automatically replace any inefficient loops with vectorized equivalents, and I could think in whichever idiom came more naturally to the problem at hand. But I don't think there's anything too illogical about looping over epochs and batches, and then computing your loss function with matrices. Maybe I'm just used to a suboptimal way of doing things :)
> Maybe it would be nice if my compiler could automatically replace any inefficient loops with vectorized equivalents
The trouble is that a for loop is much more expressive than vectorized operations, so most for loops cannot be transformed into vectorized equivalents. The reason convincing people to write vectorized code works for performance is that you're constraining what they can express to a small set of operations that you already have fast code for (written in C). Instead of relying on compiler cleverness, this approach relies on human cleverness to express complex computations with that restricted set of vectorized primitives. Which is why it can feel like such a puzzle to write vectorized code that does what you want—because it is! So even if a compiler could spot some simple patterns and vectorize them for you, it would be incredibly brittle in the sense that as soon as you change the code just a little, it would immediately fall off a massive performance cliff.
I guess that's actually the second problem—the first problem is that there isn't any compiler in CPython to do this for you.
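To make the expressiveness point concrete, here's a toy NumPy sketch (array names and sizes are made up for illustration). The first loop is elementwise, so it maps directly onto primitives NumPy already has fast C code for; introduce a loop-carried dependency and there's no off-the-shelf primitive for the whole recurrence anymore:

```python
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Vectorizable: every element is independent of the others.
out = np.empty_like(a)
for i in range(len(a)):
    out[i] = 3.0 * a[i] + b[i]
out_fast = 3.0 * a + b  # same result, orders of magnitude faster

# Change it just a little -- each element now depends on the previous
# one -- and no single NumPy primitive computes this recurrence.
y = np.empty_like(a)
y[0] = b[0]
for i in range(1, len(a)):
    y[i] = y[i - 1] * a[i] + b[i]
```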
With hindsight, these languages turned out to be a bit "too dynamic" for their own good. Very few programs change a variable's type often enough for that flexibility to be useful, and the downside is that it makes type bugs possible (or more likely) and slows down every access. Par for the course; I'd say slow loops are a symptom, not a cause.
MATLAB has been JIT-compiled for years now; the "for loops are slow" dogma is over.
Numerical computing in Python is kind of weird in that it wasn't the language's original purpose and the fast math libraries were bolted on as an afterthought. But even then, tools like Numba do the same thing in Python, although there's a bunch of nuance in writing simple enough Python and hinting at the correct types for the variables in order to get it to compile something reasonable.
Julia's "let's use strict types, JIT-compile everything from day one, and avoid lock-in" approach is nice, though.
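For what it's worth, here's a minimal sketch of what Numba looks like in the easy case (reusing the toy recurrence from upthread; whether real code compiles this cleanly depends on staying inside the NumPy subset Numba supports):

```python
import numpy as np
import numba

@numba.njit  # nopython mode: compilation fails if anything falls outside Numba's supported subset
def recurrence(a, b):
    y = np.empty_like(a)
    y[0] = b[0]
    for i in range(1, len(a)):
        y[i] = y[i - 1] * a[i] + b[i]
    return y

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)
y = recurrence(a, b)  # first call triggers JIT compilation; later calls run at compiled speed
```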
After Numba'ing several nontrivial pieces of NumPy code in my life: unless it's very trivial stuff, you might as well skip nopython mode and just rewrite it in Cython. Numba's errors and its partial coverage of NumPy are a huge time sink in my experience.