It is funny how poorly designed AMD's SYSCALL seems next to Intel's SYSENTER. As a side note, I have been thinking about the removal of segmentation from x86 for a while now.
IMHO the whole of amd64 was poorly designed. It was a misguided attempt at "cleaning up" the architecture by removing features that turn out to be quite useful and in active use, so much so that some of them had to be put back in later revisions (behind feature bits, creating a further mess):
Especially in the case of LAHF/SAHF, it's not like they reused the opcodes for something else. Since amd64 processors must be able to execute 16- and 32-bit code, the circuitry required to execute those instructions is there and perfectly functional; they just inexplicably become invalid in 64-bit mode. AMD could've made it far more seamless and compatible; instead they seemingly ripped stuff out without much consideration of what could be depending on it, hence the strange invalidation of certain instructions and the absurd semi-presence of segmentation.
For example, they could've made segmentation disappear completely in 64-bit mode (and certainly broken more applications...), reassigning all the opcodes related to its use, etc.; or they could've fit 64-bit segments into the existing segmentation model. The current "segmentation half-working" state is just bizarre.
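To make the feature-bit point concrete, here is a rough sketch of how code can check whether LAHF/SAHF are usable in 64-bit mode. As far as I know the bit is CPUID leaf 0x80000001, ECX bit 0. The helper names `ecx_has_lahf_sahf_64` and `cpu_has_lahf_sahf_64` are made up for this example, and it assumes a GCC or Clang toolchain that ships `<cpuid.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header; assumed to be available */
#endif

/* CPUID leaf 0x80000001, ECX bit 0: LAHF/SAHF are valid in 64-bit mode. */
#define CPUID_EXT_FEATURES_LEAF 0x80000001u
#define ECX_LAHF_SAHF_64        (1u << 0)

/* Hypothetical helper: decode the bit from an ECX value that was already read. */
static bool ecx_has_lahf_sahf_64(uint32_t ecx)
{
    return (ecx & ECX_LAHF_SAHF_64) != 0;
}

/* Hypothetical helper: query the running CPU.
 * Returns false if the extended leaf is not supported or this is not x86. */
static bool cpu_has_lahf_sahf_64(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(CPUID_EXT_FEATURES_LEAF, &eax, &ebx, &ecx, &edx))
        return false;
    return ecx_has_lahf_sahf_64(ecx);
#else
    return false;
#endif
}
```

A 64-bit program that wants to use LAHF/SAHF would call `cpu_has_lahf_sahf_64()` once at startup and fall back to another sequence (PUSHF/POP, for instance) when it returns false, which is exactly the kind of extra mess the feature bit creates.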
My favorites are the features from IA-64 that had to be ported to AMD64 after the fact. For example, IA-64 had two stacks. Before SYSCALL was fixed for AMD64, it was so poorly designed that the only OS I know of that supported it was XP pre-SP2.
> features from IA-64 that had to be ported to AMD64 after the fact
I'm not sure what you mean by this. IA-64 was Itanium, a separate architecture from x86-64. While it did have two stacks, that feature never existed in either vendor's implementation of x86-64.