AES-specific instructions were already a bit questionable, but this is highly questionable. Algorithm-specific instructions are basically going to become dated cruft within a few years, a decade tops, and are probably a bad thing to add to any instruction set. It would be better to add instructions that speed up all sorts of crypto algorithms.
AES-specific instructions are also the only sane way to implement AES without timing attacks.
I'm curious about this "instructions for speeding up all sorts of crypto algorithms" proposal -- how would you do that, given that crypto algorithms tend to have a wide variety of implementations and a wide variety of mathematical underpinnings? Do you want instructions that speed up all sorts of math?
I'm not sure that's true. Maybe you're thinking about implementing GCM without lookup tables; GCM is most often considered in the context of AES, and is infamously tricky to implement in software in constant time at a reasonable speed.
I didn't think it was obvious, but I also didn't think the performance hit was that bad. (We're getting out of my comfort zone; I know how constant-time AES implementations work, but not what the current speed records are for them.)
The best timings I'm aware of are ~7cpb (cycles per byte) for AES-CTR and ~14cpb for GHASH on Nehalem [2]. It's a bitsliced implementation, so it makes sense to compare it to counter-mode AES-NI. A recent AES-NI implementation on Sandy Bridge [1, pg. 25-26] achieves 0.79cpb for AES-CTR and 1.68cpb for GHASH.
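The point: the ratios 14/1.68 (about 8.3x) and 7/0.79 (about 8.9x) are quite similar, so constant-time software pays roughly the same penalty for GHASH as it does for AES.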
PS: The performance of PCLMULQDQ was vastly improved in Haswell, and I believe AES-GCM runs at something like 1.5cpb there. However, Haswell also doubles the vector size to 256 bits, which would also speed up a hypothetical bitsliced AES-GCM implementation. It's hard to say what that speed would be, so I won't try to compare things on Haswell.
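For anyone unfamiliar with what PCLMULQDQ actually computes: it's a carry-less multiply (polynomial multiplication over GF(2)) of two 64-bit values, which is the core operation in GHASH. Below is a minimal Python sketch of the operation itself, not of how any real implementation does it. `clmul64` is a name made up for illustration, and the loop branches on operand bits, so it is neither fast nor constant time.

```python
MASK64 = (1 << 64) - 1

def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values (result fits in 127 bits).

    Same shift-and-add structure as schoolbook multiplication, except that
    partial products are combined with XOR, so no carries propagate.
    """
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:          # data-dependent branch: NOT constant time
            result ^= a
        a <<= 1
        b >>= 1
    return result

# (x + 1) * (x + 1) = x^2 + 1 over GF(2), because the two x terms cancel.
print(bin(clmul64(0b11, 0b11)))
```

A full GHASH multiply also needs a reduction modulo the GCM field polynomial, which is left out here.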
There are definitely AES cache timing issues! I had it in my head that GHASH was harder to make constant time than AES, probably because of Adam Langley, but 'pbsd points out that it's subtler than that. Both GCM and AES are tricky to do in constant time in software.
AES-NI instructions have their own bit in CPUID (not bundled with any SSE bit), so future chips could omit them and software would fall back to its regular AES code paths.
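To make the fallback idea concrete, here is a rough Python sketch of the dispatch logic. It assumes you already have the ECX value returned by CPUID leaf 1 (Python can't execute CPUID itself, so the example values are made up). AES-NI is reported in ECX bit 25 and PCLMULQDQ in ECX bit 1; the function names and backend labels are hypothetical.

```python
AESNI_BIT = 25      # CPUID leaf 1, ECX bit 25: AES-NI
PCLMULQDQ_BIT = 1   # CPUID leaf 1, ECX bit 1: PCLMULQDQ

def has_feature(ecx: int, bit: int) -> bool:
    """Return True if the given feature bit is set in the ECX value."""
    return bool((ecx >> bit) & 1)

def pick_aes_backend(ecx: int) -> str:
    """Choose a code path: hardware AES if advertised, else software."""
    if has_feature(ecx, AESNI_BIT):
        return "aes-ni"
    return "software"

# Made-up ECX values: one chip that advertises both bits, one with neither.
with_aesni = (1 << AESNI_BIT) | (1 << PCLMULQDQ_BIT)
without_aesni = 0

print(pick_aes_backend(with_aesni))
print(pick_aes_backend(without_aesni))
```

Because the check is on a dedicated bit, a chip that dropped AES-NI would simply land on the software path without anything else changing.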