Is Intel still going to add SHA-1 specific instructions to their next chip? I'm ...

api · on Nov 15, 2013

AES-specific instructions were already a bit questionable... but this is highly questionable. Algorithm-specific instructions are basically going to become dated cruft within a few years, a decade tops, and are probably a bad thing to add to any instruction set. It would be better to add instructions that speed up all sorts of crypto algorithms.

geofft · on Nov 15, 2013

AES-specific instructions are also the only sane way to implement AES without timing attacks.

I'm curious about this "instructions for speeding up all sorts of crypto algorithms" proposal -- how would you do that, given that crypto algorithms tend to have a wide variety of implementations and a wide variety of mathematical underpinnings? Do you want instructions that speed up all sorts of math?

tptacek · on Nov 15, 2013

I'm not sure that's true. Maybe you're thinking about implementing GCM without lookup tables; GCM is most often considered in the context of AES, and is infamously tricky to do in software in constant time at reasonable speed.

pbsd · on Nov 16, 2013

He's right; implementing plain constant-time AES can be done without AES-NI, but is by no means easy or obvious.

tptacek · on Nov 16, 2013

I didn't think it was obvious, but also didn't think the performance hit was as bad. (We're getting out of my comfort zone; I know how constant time AES implementations work, but not what the current speed records are for them.)

pbsd · on Nov 16, 2013

The best timings I'm aware of are ~7cpb for AES-CTR, and ~14cpb for GHASH on Nehalem [2]. It's a bitsliced implementation, so it makes sense to compare it to counter-mode AES-NI. A recent AES-NI implementation on Sandy Bridge [1, pg. 25-26] achieves 0.79cpb for AES-CTR, and 1.68cpb for GHASH.

The point: the ratios 14/1.68 (about 8.3) and 7/0.79 (about 8.9) are quite similar.

PS: The performance of PCLMULQDQ was vastly improved in Haswell, and I believe AES-GCM there runs at something like 1.5cpb. However, the vector size in Haswell also doubles to 256 bits, which would also improve a hypothetical bitsliced AES-GCM implementation. Hard to say what that speed would be, so I won't try to compare things on Haswell.

[1] https://crypto.stanford.edu/RealWorldCrypto/slides/gueron.pd...

[2] http://eprint.iacr.org/2009/129
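For readers unfamiliar with the term "bitsliced" used above, here is a minimal sketch of the idea in C. It is not AES and is not taken from either reference: the circuit `(a & b) ^ c` and the names `circuit_scalar`, `circuit_bitsliced`, and `pack_slice` are hypothetical, chosen only for illustration. Bit i of each 64-bit word belongs to independent instance i, so a handful of plain logic instructions evaluates 64 instances at once with no table lookups and no secret-dependent branches. A bitsliced AES applies the same trick to a Boolean-circuit form of the cipher.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: one instance of a toy Boolean circuit on single bits. */
unsigned circuit_scalar(unsigned a, unsigned b, unsigned c)
{
    return ((a & b) ^ c) & 1u;
}

/* Bitsliced version: bit i of every word is instance i, so the same two
 * logic operations evaluate 64 independent instances in parallel. */
uint64_t circuit_bitsliced(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & b) ^ c;
}

/* Transpose step: gather bit number `bit` of up to 64 input bytes into one
 * slice word, with input i landing in bit position i. */
uint64_t pack_slice(const uint8_t *inputs, size_t n, unsigned bit)
{
    uint64_t slice = 0;
    for (size_t i = 0; i < n && i < 64; i++) {
        slice |= (uint64_t)((inputs[i] >> bit) & 1u) << i;
    }
    return slice;
}
```

The cost of this layout is the transposition in and out of sliced form, which is one reason it pairs naturally with counter mode, where many blocks are available to process at the same time.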

tptacek · on Nov 16, 2013

This is a cool paper, but I don't see 14cpb GHASH in it; their best timings for large packets in constant time are over 20cpb.

pbsd · on Nov 16, 2013

Yes, that's the aggregate time 7 + 14 (plus some small overhead). The 14cpb figure is mentioned at the end of page 10.

tptacek · on Nov 16, 2013

Neat. Thanks again!

geofft · on Nov 15, 2013

I might have it confused? I thought there were cache-timing issues with doing lookups in the S-box in RAM, for instance.

I certainly also buy that the Intel GCM acceleration functions are useful.

tptacek · on Nov 16, 2013

There are definitely AES cache timing issues! I had it in my head that GHASH was harder to make constant time than AES, probably because of Adam Langley, but 'pbsd points out that it's subtler than that. Both GCM and AES are tricky to do in constant time in software.
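To make the cache-timing concern concrete, here is a small C sketch contrasting a secret-indexed table lookup with a full-table scan. The function names `lookup_leaky` and `lookup_scan` are hypothetical, and this is only an illustration of the access pattern, not a vetted implementation: a compiler is free to transform the loop, and real constant-time code should be checked at the assembly level.

```c
#include <stddef.h>
#include <stdint.h>

/* Leaky: the address that gets loaded depends on the secret index, so the
 * cache line that is touched can reveal information about that index. */
uint8_t lookup_leaky(const uint8_t table[256], uint8_t secret_index)
{
    return table[secret_index];
}

/* Scan-based sketch: read every entry in the same order regardless of the
 * secret, and keep only the wanted one using a branch-free mask. */
uint8_t lookup_scan(const uint8_t table[256], uint8_t secret_index)
{
    uint8_t result = 0;
    for (size_t i = 0; i < 256; i++) {
        /* diff is 0 only when i equals the secret index. */
        uint32_t diff = (uint32_t)i ^ (uint32_t)secret_index;
        /* diff - 1 wraps to 0xFFFFFFFF when diff == 0, otherwise it stays
         * below 0x100, so the top byte is 0xFF or 0x00 respectively. */
        uint8_t mask = (uint8_t)((diff - 1u) >> 24);
        result |= (uint8_t)(table[i] & mask);
    }
    return result;
}
```

Scanning all 256 entries for every S-box access is far slower than a direct lookup, which is part of why software constant-time AES tends toward bitslicing instead.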

sillysaurus2 · on Nov 15, 2013

If you want to use AES without exposing yourself to timing attacks, the correct course of action is to use NaCl.

steve19 · on Nov 15, 2013

They can just implement the cruft in microcode rather than silicon. In the meantime everyone using AES for full disk encryption gets a nice speedup.

jevinskie · on Nov 15, 2013

AES-NI instructions have their own bit in CPUID (not bundled with any SSE bit), so future chips could omit them and software would fall back to regular AES code paths.
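A minimal sketch of that check in C, assuming GCC or Clang on x86 (the `<cpuid.h>` header and `__get_cpuid` are compiler-provided helpers; other toolchains need a different intrinsic). AES-NI is reported in CPUID leaf 1, ECX bit 25. The names `ecx_has_aesni`, `cpu_has_aesni`, `aes_path`, and `choose_aes_path` are hypothetical and only illustrate the dispatch.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper for the CPUID instruction */
#endif

/* CPUID leaf 1, ECX bit 25 is the AES-NI feature flag. */
#define CPUID1_ECX_AESNI (1u << 25)

/* Pure decoding step, separated out so it can be tested without hardware. */
bool ecx_has_aesni(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_AESNI) != 0;
}

/* Query the running CPU; reports false on non-x86 builds. */
bool cpu_has_aesni(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false; /* leaf 1 not available */
    }
    return ecx_has_aesni(ecx);
#else
    return false;
#endif
}

typedef enum { AES_PATH_AESNI, AES_PATH_SOFTWARE } aes_path;

/* Pick the hardware path only when the feature bit is set. */
aes_path choose_aes_path(void)
{
    return cpu_has_aesni() ? AES_PATH_AESNI : AES_PATH_SOFTWARE;
}
```

A library would typically run this once at startup and cache the result, so the software fallback keeps working on chips that lack the instructions.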

yuhong · on Nov 16, 2013

I don't think all current Intel chips have it, either.