
Is Intel still going to add SHA-1 specific instructions to their next chip? I'm hoping for that to be reversed.


AES-specific instructions were even a bit questionable... but this is highly questionable. Algorithm-specific instructions are basically going to become very dated cruft within a few years to a decade tops and are probably a bad thing to add to any instruction set. Better would be to add instructions for speeding up all sorts of crypto algorithms.


AES-specific instructions are also the only sane way to implement AES without timing attacks.

I'm curious about this "instructions for speeding up all sorts of crypto algorithms" proposal -- how would you do that, given that crypto algorithms tend to have a wide variety of implementations and a wide variety of mathematical underpinnings? Do you want instructions that speed up all sorts of math?


I'm not sure that's true. Maybe you're thinking about implementing GCM without lookup tables; GCM is most often considered in the context of AES, and is infamously tricky to do in software in constant time at a reasonable speed.


He's right; implementing plain constant-time AES can be done without AES-NI, but is by no means easy or obvious.


I didn't think it was obvious, but I also didn't think the performance hit was that bad. (We're getting out of my comfort zone; I know how constant-time AES implementations work, but not what the current speed records are for them.)


The best timings I'm aware of are ~7cpb for AES-CTR, and ~14cpb for GHASH on Nehalem [2]. It's a bitsliced implementation, so it makes sense to compare it to counter-mode AES-NI. A recent AES-NI implementation on Sandy Bridge [1, pg. 25-26] achieves 0.79cpb for AES-CTR, and 1.68cpb for GHASH.

The point: the ratios 14/1.68 and 7/0.79 are quite similar.

PS: The performance of PCLMULQDQ was vastly improved in Haswell, and I believe AES-GCM runs at something like 1.5cpb there. However, the vector size also doubles to 256 bits in Haswell, which would improve a hypothetical bitsliced AES-GCM implementation as well. It's hard to say what that speed would be, so I won't try to compare things on Haswell.

[1] https://crypto.stanford.edu/RealWorldCrypto/slides/gueron.pd...

[2] http://eprint.iacr.org/2009/129
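
For anyone who wants the two ratios spelled out, here is a minimal sketch that just divides the figures quoted in the comment above. The constant and function names are made up for illustration; the numbers are the ones cited from [1] and [2]. It works out to roughly 8.9x for AES-CTR and 8.3x for GHASH.

```c
#include <stdio.h>

/* Cycles-per-byte figures as quoted in the comment above. */
const double BITSLICED_CTR_CPB   = 7.0;   /* bitsliced AES-CTR, Nehalem [2] */
const double BITSLICED_GHASH_CPB = 14.0;  /* constant-time GHASH, Nehalem [2] */
const double AESNI_CTR_CPB       = 0.79;  /* AES-NI AES-CTR, Sandy Bridge [1] */
const double PCLMUL_GHASH_CPB    = 1.68;  /* GHASH, Sandy Bridge [1] */

/* Speedup factor: software cost divided by hardware-assisted cost. */
double speedup(double software_cpb, double hardware_cpb) {
    return software_cpb / hardware_cpb;
}

/* Hypothetical helper that prints both ratios side by side. */
void print_speedups(void) {
    printf("AES-CTR: %.2fx\n", speedup(BITSLICED_CTR_CPB, AESNI_CTR_CPB));
    printf("GHASH:   %.2fx\n", speedup(BITSLICED_GHASH_CPB, PCLMUL_GHASH_CPB));
}
```

Calling `print_speedups()` shows the two factors next to each other, which is all the "ratios are quite similar" remark relies on.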


This is a cool paper, but I don't see 14cpb GHASH in it; their best timings for large packets in constant time are over 20cpb.


Yes, that's the aggregate time 7 + 14 (plus some small overhead). The 14cpb figure is mentioned at the end of page 10.


Neat. Thanks again!


I might have it confused? I thought there were cache-timing issues with doing lookups in the S-box in RAM, for instance.

I certainly also buy that the Intel GCM acceleration functions are useful.


There are definitely AES cache timing issues! I had it in my head that GHASH was harder to make constant time than AES, probably because of Adam Langley, but 'pbsd points out that it's subtler than that. Both GCM and AES are tricky to do in constant time in software.
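
To make the S-box cache-timing concern above concrete: a plain `table[index]` read touches a secret-dependent cache line. One simple (and slow) way around that is to read every entry and select the wanted one with a mask. The sketch below shows only that full-scan technique, not the bitsliced approach from [2]; the function name is hypothetical, and a compiler could in principle still rewrite the loop, so real code would need to check the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return table[index] while reading all 256 entries, so the memory
 * access pattern does not depend on the (secret) index.
 */
uint8_t ct_lookup_u8(const uint8_t table[256], uint8_t index) {
    uint8_t result = 0;
    for (size_t i = 0; i < 256; i++) {
        /* diff is zero only when i equals index; otherwise it is 1..255. */
        uint32_t diff = (uint32_t)i ^ (uint32_t)index;
        /* diff - 1 wraps to 0xFFFFFFFF only for diff == 0, so the top bit
           is 1 exactly when i == index. */
        uint32_t is_equal = (diff - 1u) >> 31;
        /* Expand the single bit to a byte mask: 0xFF or 0x00. */
        uint8_t mask = (uint8_t)(0u - is_equal);
        /* Branch-free select: only the matching entry survives the mask. */
        result |= (uint8_t)(table[i] & mask);
    }
    return result;
}
```

The cost is 256 reads per lookup instead of one, which gives a feel for why table-based AES is hard to make both constant-time and fast in software.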


If you want to use AES without exposing yourself to timing attacks, the correct course of action is to use NaCl.


They can just implement cruft in microcode rather than silicon. In the meantime everyone using AES for full disk encryption gets a nice speedup.


AES-NI instructions have their own bit in CPUID (not bundled with any SSE bit), so future chips could omit them and software would fall back to regular AES code paths.
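
A rough sketch of what that fallback check looks like, assuming you already have the ECX value from CPUID leaf 1 (on GCC/Clang, `__get_cpuid` from `<cpuid.h>` is one way to get it). AES-NI is reported in ECX bit 25 and PCLMULQDQ in ECX bit 1; the function and enum names below are hypothetical, for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in ECX after executing CPUID with EAX = 1. */
#define CPUID1_ECX_PCLMULQDQ (1u << 1)
#define CPUID1_ECX_AESNI     (1u << 25)

/* True if the CPU advertises the AES-NI instructions. */
bool has_aesni(uint32_t cpuid1_ecx) {
    return (cpuid1_ecx & CPUID1_ECX_AESNI) != 0;
}

/* True if the CPU advertises carry-less multiply (used for GHASH). */
bool has_pclmulqdq(uint32_t cpuid1_ecx) {
    return (cpuid1_ecx & CPUID1_ECX_PCLMULQDQ) != 0;
}

/* Hypothetical dispatch: which AES code path a library would select. */
typedef enum { AES_PATH_SOFTWARE, AES_PATH_AESNI } aes_path;

aes_path choose_aes_path(uint32_t cpuid1_ecx) {
    return has_aesni(cpuid1_ecx) ? AES_PATH_AESNI : AES_PATH_SOFTWARE;
}
```

Taking the register value as a parameter keeps the sketch portable; the actual CPUID call is compiler- and platform-specific.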


I don't think all current Intel chips have it, either.



