The Paul Cooper production is great. The Rest Is History also just finished a long series (spread across three seasons, starting at episode 421) on the Punic Wars, similarly well done.
That's wonderful, thanks. I've just about finished all of The Fall of Civilisations and am already feeling a sense of loss. I'm sure I'll listen to them again, but am happy to hear recommendations of other great history content.
Presenting information theory as a series of independent equations like this does a disservice to the learning process. Cross-entropy and KL-divergence are directly derived from information entropy, where InformationEntropy(P) represents the baseline number of bits needed to encode events from the true distribution P, CrossEntropy(P, Q) represents the (average) number of bits needed for encoding P with a suboptimal distribution Q, and KL-Divergence (better referred to as relative entropy) is the difference between these two values (how many more bits are needed to encode P with Q, i.e. quantifying the inefficiency):
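A minimal numeric sketch of that relationship (function and variable names are mine, not from any particular library):

```python
import math

def entropy(p):
    # H(P): average bits to encode events drawn from P with an optimal code
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(P, Q): average bits when encoding events from P with a code optimized for Q
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # D_KL(P || Q): the difference, i.e. the extra bits paid for using Q
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.25, 0.25]   # true distribution
q = [1/3, 1/3, 1/3]     # suboptimal coding distribution
print(entropy(p))            # 1.5 bits
print(cross_entropy(p, q))   # ~1.585 bits
print(kl_divergence(p, q))   # ~0.085 extra bits per event
```

Seeing the subtraction spelled out makes it obvious why KL-divergence is zero exactly when Q matches P.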
Information theory is some of the most accessible and approachable math for ML practitioners, and it shows up everywhere. In my experience, it's worthwhile to dig into the foundations as opposed to just memorizing the formulas.
I think Shannon's A Mathematical Theory of Communication is so incredibly well written and accessible that anyone interested in information theory should just start with the real foundational work rather than lists of equations. It really is worth the time to dig into it.
Agree 100% with this. It gives the illusion of understanding, like when a precocious 6-year-old learns the word "precocious" and feels smart because they can say it. Or any movie about tech or science that's full of <technical speak>.
While I can share the sentiment, my small experience teaching (and studying the same area for over a decade) suggests that giving students a trivial formula to play with "as is" helps motivate its future usage well. It is difficult to teach everything important about X in one go, knowledge is accumulated in layers.
> One of Bill Atkinson’s amazing feats (which we are so accustomed to nowadays that we rarely marvel at it) was to allow the windows on a screen to overlap so that the “top” one clipped into the ones “below” it. Atkinson made it possible to move these windows around, just like shuffling papers on a desk, with those below becoming visible or hidden as you moved the top ones. Of course, on a computer screen there are no layers of pixels underneath the pixels that you see, so there are no windows actually lurking underneath the ones that appear to be on top. To create the illusion of overlapping windows requires complex coding that involves what are called “regions.” Atkinson pushed himself to make this trick work because he thought he had seen this capability during his visit to Xerox PARC. In fact the folks at PARC had never accomplished it, and they later told him they were amazed that he had done so. “I got a feeling for the empowering aspect of naïveté”, Atkinson said. “Because I didn’t know it couldn’t be done, I was enabled to do it.” He was working so hard that one morning, in a daze, he drove his Corvette into a parked truck and nearly killed himself. Jobs immediately drove to the hospital to see him. “We were pretty worried about you”, he said when Atkinson regained consciousness. Atkinson gave him a pained smile and replied, “Don’t worry, I still remember regions.”
With overlapping rectangular windows (slightly simpler case than ones with rounded corners) you can expect visible regions of windows that are not foremost to be, for example, perhaps "L" shaped, perhaps "T" shaped (if there are many windows and they overlap left and right edges). Bill's region structure was, as I understand it, more or less a RLE (run-length encoded) representation of the visible rows of a window's bounds. The region for the topmost window (not occluded in any way) would indicate the top row as running from 0 to width-of-window (or right edge of the display if clipped by the display). I believe too there was a shortcut to indicate "oh, and the following rows are identical" so that an un-occluded rectangular window would have a pretty compact region representation.
Windows partly obscured would have rows that may not begin at 0, may not continue to width-of-window. Window regions could even have holes if a skinnier window was on top and within the width of the larger background window.
The cleverness, I think, was then to write fast routines to add, subtract, intersect, and union regions, and rectangles of this structure. Never mind quickly traversing them, clipping to them, etc.
The QuickDraw source code refers to the contents of the Region structure as an "unpacked array of sorted inversion points". It's a little short on details, but you can sort of get a sense of how it works by looking at the implementation of PtInRgn(Point, RegionHandle):
As far as I can tell, it's a bounding box (in typical L/T/R/B format), followed by a sequence of the X/Y coordinates of every "corner" inside the region. It's fairly compact for most region shapes which arise from overlapping rectangular windows, and very fast to perform hit tests on.
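A toy sketch of the visible-spans idea described above (this is my own simplified model, not the actual QuickDraw layout, which packs things far more cleverly):

```python
# Model a region as {y: flat sorted list of x coordinates where
# inside/outside flips}, i.e. [begin0, end0, begin1, end1, ...].

def pt_in_row(spans, x):
    # Walk the sorted inversion coordinates; each one at or left of x
    # toggles the inside/outside state.
    inside = False
    for coord in spans:
        if coord > x:
            break
        inside = not inside
    return inside

def pt_in_region(region, x, y):
    return y in region and pt_in_row(region[y], x)

# An "L"-shaped visible region: two full-width rows on top,
# then two rows whose right side is occluded by another window.
region = {
    0: [0, 100],
    1: [0, 100],
    2: [0, 40],
    3: [0, 40],
}

print(pt_in_region(region, 90, 1))  # True  (top full-width row)
print(pt_in_region(region, 90, 3))  # False (occluded right part)
```

The hit test never touches pixels at all, which is why it is fast regardless of how wide the region is.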
The key seems to have been recognizing the utility of the region concept and making it fundamental to the QuickDraw API (and the clever representation that made finding the main rectangular portions easy). This insulated QuickDraw from the complexity of windowing system operations. Once you start implementing region operations, you probably find that it's fairly efficient to work out the major rectangular regions so you can use normal graphics operations on them, leaving small areas that can just be done inefficiently as a bunch of tiny rectangles. All this work for clipped graphics was applicable to far more than just redrawing obscured window content, so it could justify more engineering time for polishing it. Given how easy they were to use, more things could leverage the optimization (e.g. using them to redraw only the dirty region when a window was uncovered).
I think the difference between the Apple and Xerox approach may be more complicated than the people at PARC not knowing how to do this. The Alto doesn't have a framebuffer, each window has its own buffer and the microcode walks the windows to work out what to put on each scanline.
Not doubting that, but what is the substantive difference here? Does the fact that there is a screen buffer on the Mac facilitate clipping that is otherwise not possible on the Alto?
It allows the Mac to use far less RAM to display overlapping windows, and doesn't require any extra hardware. Individual regions are refreshed independently of the rest of the screen, with occlusion, updates, and clipping managed automatically.
Yeah, it seems like the hard part of this problem isn't merely coming up with a solution that technically is correct, but one that also is efficient enough to be actually useful. Throwing specialized or more expensive hardware at something is a valid approach for problems like this, but all else being equal, having a lower hardware requirement is better.
I was just watching an interview with Andy Hertzfeld earlier today and he said this was the main challenge of the Macintosh project. How to take a $10k system (Lisa) and run it on a $3k system (Macintosh).
He said they drew a lot of inspiration from Woz on the hardware side. Woz was well known for employing lots of little hacks to make things more efficient, and the Macintosh team had to apply the same approach to software.
So when the OS needs to refresh a portion of the screen (e.g. everything behind a top window that was closed), what happens?
My guess is it asks each application that overlapped those areas to redraw only those areas (in case the app is able to be smart about redrawing incrementally), and also clips the following redraw so that any draw operations issued by the app can be "culled". If an app isn't smart and just redraws everything, the clipping can still eliminate a lot of the draw calls.
Displaying graphics (of any kind) without a framebuffer is called "racing the beam" and is technically quite difficult and involves managing the real world speed of the electron beam with the cpu clock speed ... as in, if you tax the cpu too much the beam goes by and you missed it ...
The very characteristic horizontally stretched graphics of the Atari 2600 are due to this - the CPU was actually too slow, in a sense, for the electron beam which means your horizontal graphic elements had a fairly large minimum width - you couldn't change the output fast enough.
It definitely makes it simpler. You can do a per-screen window sort, rather than per-pixel :).
Per-pixel sorting while racing the beam is tricky, game consoles usually did it by limiting the number of objects (sprites) per-line, and fetching+caching them before the line is reached.
I remember coding games for the C64 with an 8 sprite limit, and having to swap sprites in and out for the top and bottom half of the screen to get more than 8.
Frame buffer memory was still incredibly expensive in 1980. Our lab's 512 x 512 x 8-bit table-lookup color buffer cost $30,000 in 1980. The Mac's 512 x 342 x 1-bit buffer in 1984 had to fit the Mac's $2,500 price. The Xerox Alto was earlier than these two devices and would have cost even more if it had a full frame buffer.
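The arithmetic makes the constraint concrete (using the original 128K Mac's 512 x 342 x 1-bit display, which is a well-documented spec):

```python
# Framebuffer sizes in bytes.
lab_buffer = 512 * 512 * 1    # 8 bits/pixel = 1 byte/pixel
mac_buffer = 512 * 342 // 8   # 1 bit/pixel, 8 pixels per byte

print(lab_buffer)  # 262144 -- 256 KiB, a big chunk of a $30k system
print(mac_buffer)  # 21888  -- ~21 KiB, still about 1/6 of the 128K Mac's RAM
```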
The Alto created the image from a display list, like the Atari 800 or the Amiga. So you could have a wider rectangle on most of the screen for pictures and a narrower rectangle at the bottom for displaying status. It was not up to showing overlapping windows. Nearly all applications just set things to one rectangle, which in practice amounted to a frame buffer. This was the case for Smalltalk, which is where Bill saw the overlapping windows. One problem was that filling the whole screen (606x808) used up half of the memory and slowed down user code, so Smalltalk-72 reduced this to 512x684 to get back some memory and performance.
The Smalltalk-76 MVC user interface that the Apple people saw only ever updated the topmost window which, by definition, was not clipped by any other window. If you brought some other window to the front it would only then be updated. But since nothing ran in the background it was easy to get the wrong impression that the partially visible windows were being handled.
Bill's solution had two parts: one was regions, as several other people have explained. It allowed drawing to a background window even while clipping to any overlapping windows that are closer. But the second was PICTs, where applications did not directly draw to their windows but instead created a structure (could be a file) with a list of drawing commands which was then passed to the operating system for the actual drawing. You could do something like "open PICT, fill background with grey pattern, draw white oval, draw black rectangle, close PICT".
Now if the window was moved the OS could recalculate all the regions of the new configuration and re-execute all the PICTs to update any newly exposed areas. If the application chose to instead draw its own pixels (a game, for example) then the OS would insert a warning into the app's event queue that it should fix its window contents.
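The record-and-replay idea can be sketched roughly like this (class and method names are mine, not the QuickDraw API):

```python
# An application records drawing commands into a "picture" instead of
# touching pixels directly; the system can replay the list later, e.g.
# to repaint a newly exposed area, clipping each primitive to the
# window's visible region.

class Picture:
    def __init__(self):
        self.commands = []  # list of (op, args) tuples

    def fill_background(self, pattern):
        self.commands.append(("fill", pattern))

    def draw_oval(self, rect, color):
        self.commands.append(("oval", (rect, color)))

    def draw_rect(self, rect, color):
        self.commands.append(("rect", (rect, color)))

def replay(picture, draw_op):
    # The OS walks the command list; draw_op would rasterize each
    # primitive through the current clip region.
    for op, args in picture.commands:
        draw_op(op, args)

pict = Picture()
pict.fill_background("grey")
pict.draw_oval((10, 10, 50, 50), "white")
pict.draw_rect((20, 20, 40, 40), "black")
replay(pict, lambda op, args: print(op, args))
```

Because the commands are data, the OS can re-execute them any time a region is exposed, without asking the application to do anything.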
In parallel with Bill's work (perhaps a little before it) we had Rob Pike's Blit terminal (commercially released in 1982), which added windows to Unix machines. It had the equivalent of regions (less compact, however) but used a per-window buffer, so the terminal had somewhere to copy newly exposed pixels from.
Reminds me of a GPU's general workflow. (like the sibling comment, 'isn't that the obvious way this is done'? Different drawing areas being hit by 'firmware' / 'software' renderers?)
Would someone mind explaining the technical aspect here? I feel with modern compute and OS paradigms I can’t appreciate this. But even now I know that feeling when you crack it, and the thrill of getting the impossible to work.
It’s on all of us to keep the history of this field alive and honor the people who made it all possible. So if anyone would nerd out on this, I’d love to be able to remember him that way.
There were far fewer abstraction layers than today. Today when your desktop application draws something, it gets drawn into a context (a "buffer") which holds the picture of the whole window. Then the window manager / compositor simply paints all the windows on the screen, one on top of the other, in the correct priority (I'm simplifying a lot, but just to get the idea). So when you are programming your application, you don't care about other applications on the screen; you just draw the contents of your window and that's done.
Back at the time, there wouldn't be enough memory to hold a copy of the full contents of all possible windows. In fact, there were actually zero abstraction layers: each application was responsible for drawing itself directly into the framebuffer (array of pixels), at its correct position. So how to handle overlapping windows? How could each application draw itself on the screen, but only on the pixels not covered by other windows?
QuickDraw (the graphics API written by Atkinson) contained a data structure called "region" which basically represents a "set of pixels", like a mask. And QuickDraw drawing primitives (e.g. text) supported clipping to a region. So each application had a region instance representing all visible pixels of the window at any given time; the application would then clip all its drawing to the region, so that only the visible pixels would get updated.
But how was the region implemented? Obviously it could not have been a mask of pixels (as in, a bitmask), as that would use too much RAM and would be slow to update. Consider also that the region data structure had to be quick at operations like intersection and union, since the operating system had to update the regions for each window as windows were dragged around with the mouse.
So the region was implemented as a bounding box plus a list of visible horizontal spans (I think, I don't know exactly the details). When you represent a list of spans, a common hack is to simply use a list of coordinates at which the "state" switches between "inside the span" and "outside the span". This approach makes for some nice tricks when doing operations like intersections.
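One of those nice tricks, sketched in Python (a toy model of the inversion-coordinate idea, not the actual QuickDraw code): with rows stored as sorted inversion coordinates, intersecting two rows is a single merge pass over both lists.

```python
def intersect_row(a, b):
    # a and b are sorted lists of x coordinates where the inside/outside
    # state flips. Walk both lists in order, track each row's state, and
    # emit a coordinate whenever the combined (inside_a AND inside_b)
    # state flips.
    out = []
    inside_a = inside_b = combined = False
    i = j = 0
    while i < len(a) or j < len(b):
        if j >= len(b) or (i < len(a) and a[i] <= b[j]):
            x, inside_a, i = a[i], not inside_a, i + 1
        else:
            x, inside_b, j = b[j], not inside_b, j + 1
        now = inside_a and inside_b
        if now != combined:
            combined = now
            out.append(x)
    return out

# Spans [0,50) and [30,80) overlap in [30,50):
print(intersect_row([0, 50], [30, 80]))  # [30, 50]
# Touching but disjoint spans intersect to nothing:
print(intersect_row([0, 50], [50, 80]))  # []
```

Union and difference work the same way, just with a different combining rule on the two state bits, which is presumably why the flat-array form is so convenient.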
Hope this answers the question. I'm fuzzy on many details so there might be several mistakes in this comment (and I apologize in advance), but the overall answer should be good enough to highlight the differences compared to what computers do today.
It's a good description, but I'm going to add a couple of details, since things that are obvious to someone who lived through that era may not be obvious to those who came after.
> Obviously it could have not been a mask of pixels
To be more specific about your explanation of too much memory: many early GUIs were 1 bit-per-pixel, so the bitmask would use the same amount of memory as the window contents.
There was another advantage to the complexity of only drawing regions: the OS could tell the application when a region was exposed, so you only had to redraw a region when it was newly exposed or needed an update. Unless you were doing something complex and could justify buffering the results, you were probably re-rendering it. (At least that's my recollection from writing a Mandelbrot fractal program for a compact Mac, several decades back.)
And even ignoring memory requirements, an uncompressed bitmap mask would have taken a lot of time to process (especially when combining regions where one was shifted by a non-multiple of 8 pixels with respect to the other). With just the horizontal coordinates of inversions, it takes the same amount of time for a region 8 pixels wide as for one 800 pixels wide, given the same shape complexity.
Yeah those are the horizontal spans I was referring to.
It’s a sorted list of X coordinates (left to right). If you group them in couples, they are begin/end intervals of pixels within region (visibles), but it’s actually more useful to manipulate them as a flat array, as I described.
I studied the code a bit: each scanline is prefixed by its Y coordinate, and uses an out-of-bounds terminator (32767).
It's a bit more than that. The list of X coordinates is cumulative - once an X coordinate has been marked as an inversion, it continues to be treated as an inversion on all Y coordinates below that, not just until the next Y coordinate shows up. (This manifests in the code as D3 never being reset within the NOTRECT loop.) This makes it easier to perform operations like taking the union of two disjoint regions - the sets of points are simply sorted and combined.
Uhm can you better explain that? I don’t get it. D3 doesn’t get reset because it’s guaranteed to be 0 at the beginning of each scanline, and the code needs to go through all “scanline blocks” until it finds the one whose Y contains the one specified as argument. It seems to me that each scanline is still self contained and begins logically at X=0 in the “outside” state?
> D3 doesn’t get reset because it’s guaranteed to be 0 at the beginning of each scanline
There's no such guarantee. The NEXTHOR loop only inverts for points which are to the absolute left of the point being tested ("IS HORIZ <= PT.H ? \\ NO, IGNORE THIS POINT").
Imagine that, for every point, there's a line of inversion that goes all the way down to the bottom of the bounding box. For a typical rectangular region, there's going to be four inversion points - one for each corner of the rectangle. The ones on the bottom cancel out the ones on the top. To add a second disjoint rectangle to the region, you'd simply include its four points as well; so long as the regions don't actually overlap, there's no need to keep track of whether they share any scan lines.
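That scheme reduces to a parity test (again a toy model of the idea, not Atkinson's code): a point is inside the region iff an odd number of inversion points lie at-or-above and at-or-left of it.

```python
def pt_in_region(inversion_points, x, y):
    # Each point (px, py) flips inside/outside along the vertical line
    # x = px for every scanline at or below py. A query point is inside
    # iff it sees an odd number of such flips to its left.
    count = sum(1 for px, py in inversion_points if py <= y and px <= x)
    return count % 2 == 1

# A single rectangle [10,30) x [5,20) needs just its four corners;
# the bottom pair cancels the top pair below the rectangle.
rect = [(10, 5), (30, 5), (10, 20), (30, 20)]
print(pt_in_region(rect, 15, 10))  # True  (inside)
print(pt_in_region(rect, 35, 10))  # False (right of the rectangle)

# A second disjoint rectangle: just add its four corners too.
two_rects = rect + [(50, 5), (70, 5), (50, 20), (70, 20)]
print(pt_in_region(two_rects, 60, 10))  # True
```

That last step is the payoff being described: disjoint regions combine by simply merging their sorted point lists, with no per-scanline bookkeeping.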
Pretty awesome story, but also with a bit of a dark lining. Of course any owner, and triple that for Jobs, loves over-competent guys who work themselves to death, here almost literally.
But that's not a recipe for personal happiness for most people, and most of us would not end up contributing revolutionary improvements even if we did. The world needs awesome workers, but it also needs awesome parents, or just happy, balanced, content people (or at least some mix of those).
Pretty much. Most of us have creative itches to scratch that make us a bit miserable if we never get to pursue them, even if given a comfortable life. It’s circumstantial whether we get to pursue them as entrepreneurs or employees. The users or enjoyers of our work benefit either way.
Just to add on, some of us have creative itches that are not directly monetizable, and for which there may be no users or enjoyers of our work at all (if there are, all the better!).
Naturally I don’t expect to do such things for a living.
Survivorship bias. The guys going home at 5 went home at 5 and their companies are not written about. It’s dark but we’ve been competing for a while as life forms and this is “dark-lite” compared to what our previous generations had to do.
Some people are competing, and need to make things happen that can’t be done when you check out at 5. Or more generally: the behaviour that achieves the best outcome for a given time and place, is what succeeds and forms the legends of those companies.
If you choose one path, know your competitors are testing the other paths. You succeed or fail partly based on what your most extreme competitors are willing to do, sometimes with some filters for legality and morality. (I.e. not universally true for all countries or times.)
Edit: I currently go home at 5, but have also been the person who actually won the has-no-life award. It’s a continuum, and is context specific. Both are right and sometimes one is necessary.
> In fact the folks at PARC had never accomplished it, and they later told him they were amazed that he had done so.
Reminds me of the story where some company was making a new VGA card, and it was rumored a rival company had implemented a buffer of some sort in their card. When both cards came out the rival had either not actually implemented it or implemented a far simpler solution
An infamous Starcraft example also contains notes of a similar story where they were so humbled by a competitor's demo (and criticism that their own game was simply "Warcraft in space") that they went back and significantly overhauled their game.
Former Ion Storm employees later revealed that Dominion’s E3 1996 demo was pre-rendered, with actors pretending to play, not live gameplay.
I got a look at an early version of StarCraft source code as a reference for the sound library for Diablo 2 and curiosity made me do a quick analysis of the other stuff - they used a very naive approach to C++ and object inheritance to which first time C++ programmers often fall victim. It might have been their first C++ project so they probably needed to start over again anyways. We had an edict on Diablo 2 to make the C++ look like recognizable C for Dave Brevik's benefit which turned out pretty well I think (it was a year late but we shipped).
Michael Abrash's black book of graphics programming. They heard about a "buffer", so implemented the only non-stupid thing - a write FIFO. Turns out the competition had done the most stupid thing and built a read buffer.
I teach this lesson to my mentees. Knowing that something is possible gives you significant information. Also, don't brag - It gives away significant information.
Just knowing something is possible makes it much, much easier to achieve.
> Turns out the competition had done the most stupid thing and built a read buffer
This isn't really stupid though as explained in the pdf
> Paradise had stuck a read FIFO between display memory and the video output stage of the VGA, allowing the video output to read ahead, so that when the CPU wanted to access display memory, pixels could come from the FIFO while the CPU was serviced immediately. That did indeed help performance--but not as much as Tom’s write FIFO.
VRAM accesses are contended, so during the visual display period the VGA circuitry has priority.
CPU accesses result in wait states - a FIFO between the VRAM and the VGA means less contention and more cycles for CPU accesses
Why improve read performance though? Games accessing VRAM I presume would be 99% write.
Perhaps it was to improve performance in GUIs like Windows?
Pinterest | Hybrid @ {San Francisco, New York, or Seattle} | Full-time + internships
Pinterest’s Advanced Technologies Group (ATG) is an ML applied research organization within the company, focusing on large-scale foundation models (e.g. multimodal encoders, graph representation models, content embeddings, generative models, computer vision signals, etc.) that are deployed throughout the company. ATG is composed primarily of ML engineers and researchers, backed by a strong infrastructure team, and a small product prototyping + design team for deploying new AI/ML features in Pinterest. The organization is highly collaborative, research-driven, and delivers deep impact. The team is hiring for several engineering positions:
- iOS engineer for generative AI products: we are looking for senior or staff iOS engineers who have a track record of building fast prototyping work in the AI space — no deep machine learning domain expertise is required, but the ideal candidate would be comfortable interfacing with our ATG’s ML teams daily. An engineer in this role would be building entirely new features for Pinterest leveraging emerging technologies across LLMs, visual models, recommendation systems, and more.
- Computer vision domain specialist: we are looking for researchers or applied engineers with industry experience in the computer vision / visual-language modeling field (e.g. multimodal representation learning, visual diffusion models, visual encoders/decoders, etc.) We encourage the team to regularly publish, and the team works in a highly collaborative, research-driven environment, with full access to the Pinterest image-board-style graph for large-scale pre-training.
Please reach out to me directly (dkislyuk@pinterest.com) if you’re interested in either of these roles.
Additionally, the team is currently hiring for fall 2025 ML research internships for Master’s / PhD students, with opportunities to publish or to work on frontier models in the visual understanding and multimodal representation learning space: https://grnh.se/dad7c60e1us
This is a great characterization of self-information. I would add that the `log` term doesn't just conveniently appear to satisfy the additivity axiom, but instead is the exact historical reason why it was invented in the first place. As in, the log function was specifically defined to find a family of functions that satisfied f(xy) = f(x) + f(y).
So, self-information is uniquely defined by (1) assuming that information is a function transform of probability, (2) that no information is transmitted for an event that certainly happens (i.e. f(1) = 0), and (3) independent information is additive. h(x) = -log p(x) is the only set of functions that satisfies all of these properties.
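Spelled out as formulas, those three properties and their unique solution (the base b only sets the unit: bits for b = 2, nats for b = e):

```latex
\begin{aligned}
&(1)\quad h(x) = f\big(p(x)\big) \\
&(2)\quad f(1) = 0 \\
&(3)\quad f\big(p(x)\,p(y)\big) = f\big(p(x)\big) + f\big(p(y)\big)
  \quad \text{for independent } x, y \\
&\Rightarrow\quad h(x) = -\log_b p(x)
\end{aligned}
```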
I think commodification is directly tied to a perceived drop in quality. For example, if the barriers to making a video game keep going down, there will be far more attempts, and per Sturgeon's law, the majority will be of low quality. And we have a recency bias where we over-index on the last few releases that we've seen, and we only remember the good stuff from a generation or two ago. But for every multitude of low-effort, AI-generated video games out there, we still get gems like Factorio and Valheim.
Sturgeon's law was true in the '90s and is true in the '20s. There's not much point in comparing the crap to the crap. The only big difference is that it is easier to see the bottom of the barrel in your most popular storefronts with a click (even on "curated" ones these days, like PSN and the eShop) instead of going out of your way to find some shareware on a GeoCities page that barely functioned.
Thing is those high profile disasters are still supposedly the "cream of the crop". That's why they get compared to the cream of before.
Popular examples are also easier to cite, instead of taking the time to explain what Blinx the Cat or Midnight Club are (examples of good but not genre-defining entries).
I've found that looking at the original motivation of logarithms is more elucidating than the way the topic is presented in grade school. Thinking through the functional form that solves the multiplication problem Napier was facing (how to simplify multiplying large astronomical observations), f(ab) = f(a) + f(b), and why that leads to a unique family of functions, resonates much better with me as an explanation for why logarithms show up everywhere. This is in contrast to teaching them as the inverse of the exponential function, which was not how the concept was discussed until Euler. In fact, I think learning about mathematics in this way is more fun: what original problem was the author trying to solve, and what tools were available to them at the time?
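A quick demonstration of the trick Napier was after: turning an expensive multiplication into an addition plus table lookups (using the modern base-10 log rather than Napier's actual construction; the numbers are just illustrative):

```python
import math

a, b = 384_400, 149_600_000   # two large "astronomical" magnitudes

# Look up the logs, add them, then invert with one final lookup:
product_via_logs = 10 ** (math.log10(a) + math.log10(b))

print(product_via_logs)  # ~5.75e13, matching the direct product
print(a * b)             # 57506240000000
```

With printed log tables, the two lookups and one addition were far cheaper by hand than long multiplication of many-digit numbers.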
Toeplitz wrote "Calculus: The Genetic Approach" and his approach of explaining math via its historical development is apparently more widely used: https://en.wikipedia.org/wiki/Genetic_method . Felix Klein remarked: "on a small scale, a learner naturally and always has to repeat the same developments that the sciences went through on a large scale"
We could really take a page from this style for teaching advanced computing. We tend to present architectures as if they just came out of nowhere. Starting with mechanical computing and unit record equipment makes so much more sense.
There is Mathematics for the Million by Lancelot Hogben, which not only covers math, but the history of math and why it was developed over the centuries. It starts with numbers, then geometry, arithmetic, trig, algebra, logarithms and calculus, in that order. It's a very cool book.
I was going to say the same! I got it years ago, it's hard to top a math book with a quote from a certain Al Einstein on the back cover singing its praises! Morris Kline's "Mathematics for the Nonmathematician" takes a similar approach, as I believe other books by the author do. Can also recommend "Code" by Charles Petzold and "The Information" by James Gleick, while not comprehensive they do cover the development of key mathematical insights over time.
I'm sympathetic, but there's no clear historic chronology. For instance, the ancient Egyptians dealt with both algebra and calculus (at least in part) long before Pythagoras. And that's not starting on China and India, which had very different chronologies.
Choose a chronology that makes sense. We can see how Western ideas build, we have less clarity on how the ancient Egyptians or Chinese ideas developed, and therefore it's harder to explain to a learner.
If you're sensitive to that singular world view warping the learner's perspective, you could at each point explain similar ideas from other cultures that pre-date that chronology.
For example, once you've introduced calculus and helped a student understand it, you can then jump back and point out that ancient Egyptians seemed to have a take on it, explain it, ask the student to reason did they get there in the same way as the Western school of ideas did, is there an interesting insight to that way of thinking about the World?
Another idea is to trace how ideas evolved. We know Newton and Leibniz couldn't have had direct access to Egyptian sources (hieroglyphs were a lost language in their lifetimes), but Greek ideas would have been rolling around in their heads.
Here's one that starts with the concept of a straight line and builds all the way to string theory. It's a monumental book, and it still challenges me.
Roger Penrose's The Road To Reality.
A book that doesn't expect any knowledge of mathematical notation would be a good start.
I've bought 3 math books to get into it and quit all of them within the first chapter.
Could you give a concrete example of what sort of notation caused you difficulty in the past? Asking because it seems odd to me that you feel you need to learn "all" the notation to get started.
Starting in elementary school you slowly build up topics, mathematical intuition and notation more or less in unity. E.g. starting with whole numbers, plus and minus signs before multiplication, then fractions and decimal notation. By the end of high school you may have reached integrals and matrices to work with concepts from calculus and linear algebra…
It makes little sense to confront people with notation before the corresponding concepts are being taught. So it feels like you may, as a layperson, have difficulties with notation that are no longer obvious to more advanced learners.
I want to learn the notation, just not everything at once. I need to be able to see real-world use cases, otherwise I won't be able to remember and apply the notation.
What I meant is learning the notation step by step, topic related.
Although the math in the book is relatively basic I enjoyed it tremendously because it gives the historical development for everything and even describes the characters of different mathematicians, etc. The historical context helps so much with understanding.
If you like this approach, I highly recommend Mathematics: Its Content, Methods, and Meaning by Kolmogorov. He uses this same approach, but applies it to many more concepts in math (about 1,000 pages!). In fact, I think I actually heard about that book on this site, so I guess I'm paying it forward.
This approach was to align with the Soviet philosophy of dialectical materialism, which claims that all things arise from a material need. Not sure I'm fully onboard with the philosophy as a whole, but Kolmogorov's book was really eye opening.
I think this should be front and center. To that end I propose "magnitude notation"[0] (and I don't think we should use the word logarithm, which sounds like advanced math and turns people away from the basic concept, which does make math easier and more fun).
> I think this should be front and center. To that end I propose "magnitude notation"[0] (and I don't think we should use the word logarithm, which sounds like advanced math and turns people away from the basic concept, which does make math easier and more fun).
The only reason that "logarithm" sounds like advanced math is because it was so useful that mathematicians, well, used it. Since this terminology is just logarithms without saying the word, if it is more useful it, too, will probably be used by mathematicians, and then it will similarly come to sound like advanced math. So what's the point of running away from a name for what we're doing that fits with what it's actually called, if eventually we'll just have to make up a new, even less threatening name for it?
(I'd argue that "logarithm" is frightening less because it sounds like advanced math than because it's an unfamiliar and old-fashioned-sounding word. I'm not completely sure that "magnitude" avoids both these issues, but it's at least arguable that it suffers less from them.)
It's written like ^6 and said like "mag 6", which sounds like an earthquake (and this is basically the Richter scale writ large). One syllable, sounds cool, easy to type/spell, evokes largeness. "Logarithm" is 3-4 syllables, hard to pronounce, hard to spell, sounds jargon-y.
People virtually never say “logarithm” in use though. They either say “log” or they say “lun” for natural log. Notice that both log and lun are one syllable, easy to pronounce etc.
Magnitude is an existing and important concept in maths - it would be extremely confusing to just overload it to mean something else.
The log of 3.1m is 6.5. How do you say "10^6.5"? I say "mag 6.5" and it is clear. The Richter scale famously uses "mag 6.5" exactly like this. If that was ever confusing, then we've managed to work past it, and this just expands the Richter scale to cover basically everything.
There's nothing particularly special about the Richter scale in that respect. All logarithmic scales (e.g. dB) work that way. Both the Richter scale and decibels (and other logarithmic scales) are also famous, like other nonlinear scales[1], for being widely misunderstood, so I'm not sure a lot of people would find your way clearer than the current usage, which is just to say "3.1m" if that's what you mean. That said, I like log scales and logarithms in general, so if you want to campaign for this scale, knock yourself out. I don't like that you're calling it magnitude, though, because magnitude means a specific thing (the first coordinate of a vector in polar or spherical form).
I have been writing the same thing by (ab)using the existing unit of measurement known as a bel (B), which is most commonly seen with the SI prefix “deci” (d) as dB or decibel. I write the speed of light as 8.5 Bm/s (“8.5 bel meters per second”), which resembles the expression 20 dBV (“20 decibel volts”) shown at https://en.wikipedia.org/wiki/Decibel.
Mag is the inverse of log10: mag 6 = 10^6, so log10(mag 6) = 6. We have no current shorthand for the inverse of log10 except "tentothe", which might be serviceable but is not as punchy.
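To make the proposed notation concrete, here is a minimal Python sketch (the `mag` helper is hypothetical, just naming the idea from this thread) showing that mag and log10 undo each other:

```python
import math

def mag(x):
    # "mag x" in the proposed notation: the inverse of log10
    return 10 ** x

print(mag(6))                   # 1000000, i.e. "mag 6"
print(math.log10(mag(6)))       # 6.0, log10 undoes mag
print(math.log10(3.1e6))        # ~6.49, so 3.1 million is "mag 6.5"
```

This is just `10 ** x` under a friendlier name; the Richter-style reading "mag 6.5" is the rounded log10 of the raw value.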
I often wonder about this. I also believe that mathematical pedagogy strives to attract people who are very smart and think in the abstract, like Euler, rather than operationally, meaning they will get it intuitively.
For other people, you need to swim in the original problem for a while to see the light.
I think it is a combination of factors. Mathematical pedagogy is legitimate if the end goal is to train mathematicians, so yes it is geared towards those who think in the abstract. (I'm going to ignore the comment about very smart, since I don't think mathematical ability should be used as a proxy for intelligence.)
On the other side, I don't think those who are involved in curriculum development are very skilled in the applications of mathematics. I am often reminded of an old FoxTrot comic where Jason calculated the area of a farmer's field using calculus.
Frankly I wish I had known integral calculus going into geometry, I could tell there was a pattern behind formulas for areas and volumes but I couldn't for the life of me figure it out. There are worse ways to remember the formula for the volume of a sphere than banging out a quick integral!
I had known it. Thanks, Dr. Steven Giavat. The geometric shapes gave the patterns meaning. I read 'Mathematics and the Imagination' and 'Mathematics: A Human Endeavor' while I was starting algebra, along with the Time-Life book on math. All very brilliant, because they used the methods that were used to investigate the subject to show how it was discovered. These allowed me to fly ahead in math until I got to trig, which took a long year to get facile with, until I was able to finish my degree.
I had brilliant teachers.
Napier's bones were for adding exponents, hence multiplication. Brilliant, and necessary for the development of the slide rule, the foundation of modern engineering until the pocket calculator.
I was recently struggling to model a financial process and solved it with Units. Once I started talking about colors of money as units, it became much easier to reason about which operations were valid.
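A minimal Python sketch of that idea, treating the "color" of money as a unit tag so that only like-colored amounts can be combined (the `Money` class and color names here are hypothetical illustrations, not from any real library):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:
    amount: float
    color: str  # e.g. "restricted" vs "unrestricted" funds

    def __add__(self, other):
        # Adding money of different colors is an invalid operation
        if self.color != other.color:
            raise TypeError(f"cannot add {self.color} money to {other.color} money")
        return Money(self.amount + other.amount, self.color)

grants = Money(100.0, "restricted")
fees = Money(50.0, "restricted")
print(grants + fees)   # Money(amount=150.0, color='restricted')
```

The type system (or here, a runtime check) then does the reasoning for you: any expression that mixes colors fails loudly instead of silently producing a meaningless number.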
I really disagree with the straightforward reduction of engineering to 'math but practical', but I'm finding it hard to express exactly why I feel this way.
The history of mathematical advancement is full of very grounded and practical motivations, and I don't believe that math can be separated from these motivations. That is because math itself is "just" a language for precise description, and it is made and used exactly to fit our descriptive needs.
Yes, there is the study of math for its own sake, seemingly detached from some practical concern. But even then, the relationships that comprise this study are still those that came about because we needed to describe something practical.
So I suppose my feeling is that teaching math without a use case is like teaching English by only teaching sentence-construction rules. It's not that there's nothing to glean from that, but it is very divorced from its real use.
As someone who is studying maths at the moment I don’t recognise this picture at all. Every resource I learn from stresses the practical motivation for things. My book of odes is full of problems involving liquids mixing, pollution dispersing through lakes, etc, my analysis book has a whole big thing about heat diffusion to justify Fourier analysis, the course I’m following online uses differential equations in population dynamics to justify eigenvalues etc.
Agreed, and it's such a shame! A kid goes to math class and learns, say, derivatives as this weird set of transformations that have to be memorized, and it's only later in physics class that they start to see why the transformations are useful.
I mean, imagine a programming course where students spend the whole first year studying OpenGL, and then in the second year they learn that those APIs they've been memorizing can be used to draw pictures :D
I actually prefer the straightforward "log is the inverse of exponentiation." It's more intuitive that way because I automatically understand that 10^2 * 10^3 = 10^5. Hence, if you are using log tables, addition makes sense. I didn't need an essay to explain that.
Take logs, add 2 + 3 = 5, and then raise the result back to get 10^5.
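The log-table workflow described above can be sketched in a few lines of Python, using `math.log10` in place of a printed table:

```python
import math

a, b = 100, 1000

# Step 1: take logs (historically, a table lookup)
la, lb = math.log10(a), math.log10(b)   # 2.0 and 3.0

# Step 2: add the logs
s = la + lb                             # 5.0

# Step 3: raise back (a reverse table lookup)
product = 10 ** s
print(product)   # 100000.0, i.e. 100 * 1000
```

Three cheap operations (two lookups and an addition) replace one expensive multiplication, which was the whole point of the tables.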
This is how I've always taught logarithms to students I've tutored. I photocopy a table of various powers of ten, we use it in all sorts of ways to solve problems, and then I sneakily present an "inverse power" problem where they need to make the lookup backwards.
Almost every student gets it right away, and then I tell them looking up things backwards in the power table is called taking a logarithm.
That's how I mentally processed them when first learning them years ago. Doing operations on x and y with log(x) = y in the background somehow felt far less intuitive than thinking about 10^y = x.
I really enjoyed this author's work, BTW. Just spent several hours reading the entire first five chapters or so. What an excellent refresher for high school math in general.
This would be an interesting thing to study: How many different ways people learned about logarithms, and how they generally fared in math. I learned about logarithms by seeing my dad use his slide rule, and studying stock charts, which tended to be semi-logarithmic.
It gives the history/motivation behind logarithms, and suddenly it became so much clearer to me. Pretty much multiplying huge numbers by adding exponents, well, I think I've understood that correctly?
I think the reason I'm so interested in programming and computing is that I am fascinated by the history of it all. It somehow acts as a motivation to understand it.
Normally, a slide rule has the value x written at distance log(x), which allows doing multiplications by sliding along the rule, since log(ab) = log(a) + log(b).
Now imagine a slide rule on which the value x is written at distance x^2/2. This also lets you multiply two numbers, because ab = (a+b)^2/2 - (a^2/2 + b^2/2).
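That quarter-square identity is easy to check numerically; a quick Python sketch:

```python
def quarter_square_product(a, b):
    # ab = (a+b)^2/2 - (a^2/2 + b^2/2): multiplication reduces to
    # additions, subtractions, and lookups in a table of x^2/2
    return (a + b) ** 2 / 2 - (a ** 2 / 2 + b ** 2 / 2)

print(quarter_square_product(7, 13))   # 91.0 == 7 * 13
```

Like logarithms, this trades one multiplication for additions plus table lookups, which is why quarter-square tables predate log tables as a multiplication aid.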
Yes, but such a property was not available to Napier, and from a teaching perspective, it requires understanding exponentials and their characterizations first. Starting from the original problem of how to simplify large multiplications seems like a more grounded way to introduce the concept.
From a teaching perspective it goes like this: first we learn additions, and to undo additions we have subtractions; then we learn repeated additions i.e. multiplications, and to undo multiplications we have divisions; finally we learn repeated multiplications, i.e. exponentiation, and to undo exponentiation we have logarithms and roots.
Right, I'm not saying it's for no reason, but the asymmetry makes it harder to keep track of which undoes exponentiation in which way.
And logs are frankly more confusing than the other operations because, more than anything else, they feel like an algebraic expression in the form of an operation. Other operations intuitively feel like a process, whereas logs feel more like a question.
Maybe that's just because I never learned them super well though, maybe they're not actually that inherently different ¯\_(ツ)_/¯
Presumably the book from this thread by Charles Petzold will be a great canonical resource, but originally there was a quote by Howard Eves that I came across that got me curious:
> One of the anomalies in the history of mathematics is the fact that logarithms were discovered before exponents were in use.
One can treat the discovery of logarithms as the search for a computational tool to turn multiplication (which was difficult in the 17th century) into addition. There were earlier approaches for simplifying multiplication dating back to antiquity (quarter-square multiplication, prosthaphaeresis), and A Brief History of Logarithms by R. C. Pierce covers this, framing it as establishing correspondences between geometric and arithmetic sequences. Playing around with functions that could possibly fit the functional equation f(ab) = f(a) + f(b) is a good, if manual, way to convince oneself that such functions do exist and that this is the defining characteristic of the logarithm (and not just a convenient property). For example, log probability is central to information theory and thus many ML topics, and the fundamental reason is that Claude Shannon wanted a transformation on top of probability (self-information) that would turn the probability of multiple events into an addition; the aforementioned f is the transformation that fits this additive property (and a few others), hence log() everywhere.
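The additive property is easy to see with log probabilities; a small Python sketch:

```python
import math

# Probabilities of three independent events
probs = [0.5, 0.25, 0.125]

# Joint probability: a product, which underflows for many small factors
joint = 1.0
for p in probs:
    joint *= p

# Log probabilities: the product becomes a sum, f(ab) = f(a) + f(b)
log_joint = sum(math.log(p) for p in probs)

print(joint)                # 0.015625
print(math.exp(log_joint))  # ~0.015625, the same value via addition
```

This is exactly why ML code sums log-likelihoods rather than multiplying probabilities: the sum is numerically stable where the raw product would underflow.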
Interestingly, the logarithm “algorithm” was considered quite groundbreaking at the time; Johannes Kepler, a primary beneficiary of the breakthrough, dedicated one of his books to Napier. R. C. Pierce wrote:
> Indeed, it has been postulated that logarithms literally lengthened the life spans of astronomers, who had formerly been sorely bent and often broken early by the masses of calculations their art required.
I had a slide rule in high school. It was more of a novelty item by that point in time, only one of my math teachers even knew what a slide rule was, but that didn't stop me from figuring out how it was used and how it works. It didn't take much to figure out that the sliding action was solving problems by addition, and the funky scales were logarithmic. In other words: it performed multiplication by adding logs.
That said, I did encounter references to its original applications in other places. I studied astronomy and had an interest in the history of computation.
Pinterest | San Francisco, New York, or hybrid/remote (US-only) | ML Engineer / Applied Research Scientist | Full-time
Pinterest’s Advanced Technologies Group (ATG) is hiring for an engineering position on our visual modeling team developing Pinterest Canvas. Canvas is a foundation text-to-image model developed internally to power various visualization, inpainting, and outpainting products. In this role, you’ll get to work with Pinterest’s rich visual-text dataset to build large-scale generative models which are continuously being shipped to production. The core Canvas pod is a small group (~6 engineers) inside of ATG, which focuses on a broad variety of AI/ML initiatives, such as core computer vision, multimodal representation learning, heterogeneous graph neural networks, recommender systems, etc.
New grads are welcome to apply (preferably with a master's or PhD). Candidates should have diffusion modeling experience (e.g. diffusion transformers, LoRA fine-tuning, complex {text, image} conditioning, style transfer, etc.) and some form of industry experience. Engineers within ATG have a lot of leeway in terms of product contribution, so both ML engineers and research scientists are welcome to apply. We encourage the team to regularly publish, and the role can be either in person (SF, NY) or hybrid.
Please reach out to me directly (dkislyuk@pinterest.com) if you’re interested.
Pinterest Advanced Technologies Group | Staff Engineer, iOS and applied ML | US remote or hybrid in SF/NY | Full-time
We’re looking for strong engineers to help us build consumer AI products within Pinterest’s Advanced Technologies Group (ATG), our in-house ML research division. You’d be working with a full-stack team of ML researchers and product engineers on projects that bring LLMs, diffusion models, and other core models in the generative multimodal ML and computer vision space to life inside the Pinterest product. Projects include assistants, new ways to search, restyling of boards / pins / rooms, and many other new applications. Your work will directly impact how millions of users experience Pinterest.
Tracks:
*iOS engineer*: You’ll craft beautiful and intuitive user experiences for our new AI products. Strong command of iOS and UI/UX craftsmanship required. Bonus points if you’re an opinionated product thinker with 0-1 mentality or have experience working with ML models. Please apply here: https://www.pinterestcareers.com/jobs/5426324/staff-ios-soft...
*Applied ML*: If you think you’d be a better fit as an applied ML or research engineer with an interest in directly translating research into user-facing products, feel free to contact me directly (@dkislyuk everywhere).
The ML and product engineering teams on ATG work directly together, along with design. The team consists of long-tenured employees who care deeply about both the quality of the Pinterest experience, and taking full advantage of the new capabilities emerging in the ML space over the last two years. ATG more broadly has spent the past decade+ bringing various ML technologies into the Pinterest ecosystem, and values publishing our work, building long-term infrastructure, and a collaborative and remote-friendly culture (though we do expect everyone to join company onsites a few times a year).
Yes, exactly. ViTs need O(100M)-O(1B) images to overcome the lack of spatial priors. In that regime and beyond, they begin to generalize better than ConvNets.
Unfortunately, ImageNet hasn't been a useful benchmark for a while now, since pre-training is so important for production visual foundation models.