I'd like bigger GPUs. A trillion-parameter model at 16 bits needs 2,000 GB+ just for inference, and more for training. There are all kinds of ways to spread a model across multiple GPUs, quantize it to fewer bits, and so on, but it's a lot easier to just shove the whole model onto one GPU.
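That 2,000 GB figure is just parameters times bytes per parameter. A quick sketch of the arithmetic (weights only, ignoring KV cache and activations, which only make it worse):

```python
# Rough weight-memory estimate for dense-model inference.
# Assumes weights only; KV cache and activations add more on top.
def weight_memory_gb(params: float, bits_per_param: int) -> float:
    return params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

print(weight_memory_gb(1e12, 16))  # 1T params at 16 bits -> 2000.0 GB
print(weight_memory_gb(1e12, 4))   # quantized to 4 bits  -> 500.0 GB
```

Even aggressive 4-bit quantization leaves a 1T-parameter model far beyond any single card shipping today.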
We'll likely see more efficiency from bigger GPUs and hopefully more availability as a result.
My question on the very slow growth of available memory: are there technical reasons they cannot trivially build a card with 100GB of RAM (even with lower performance) or has it been a business decision to milk the market for every penny?
High-speed I/O pins cost a lot, and GDDR generally has 32 data pins per chip with no way to attach multiple chips to the same pins. With 2 GB chips, a 256-bit bus means 8 chips and 16 GB, which is hard to exceed by much on that tech. The high end is 384 bits and 24 GB.
There is also a clamshell mode that attaches only 16 data pins to each GDDR chip, so with some extra effort you could probably double that to 48 GB, or at least 32 GB. Maybe that's a valid niche, or maybe there isn't enough demand.
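The bus-width arithmetic above can be sketched out directly. This assumes 2 GB (16 Gb) GDDR chips, which is a typical density, not something stated in the thread:

```python
# Capacity implied by bus width, assuming 2 GB (16 Gb) GDDR chips.
# pins_per_chip = 32 is the normal mode; 16 is clamshell mode.
def gddr_capacity_gb(bus_width_bits: int, pins_per_chip: int, chip_gb: int = 2) -> int:
    chips = bus_width_bits // pins_per_chip
    return chips * chip_gb

print(gddr_capacity_gb(256, 32))  # 8 chips  -> 16 GB
print(gddr_capacity_gb(384, 32))  # 12 chips -> 24 GB
print(gddr_capacity_gb(384, 16))  # clamshell: 24 chips -> 48 GB
```

So even with clamshell mode maxed out, a GDDR card tops out around 48 GB without denser chips or a wider (more expensive) bus.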
The alternative is HBM, which can stack dies for much larger capacities, but it's a lot more expensive.
I don't disagree with Dylan, but I'm more than willing to bet that the only reason Nvidia's cards (and that's who we're talking about; CUDA is a hell of a moat) are RAM-starved is that they haven't felt the pressure to do otherwise. AMD has an institutional aversion to good software. Intel isn't even an also-ran yet.
Apple and their unified memory architecture may be the prod needed to get larger amounts of RAM onto single-card solutions. We'll see.