
I'm not aware of any efficient transformer training code for AMD cards.

Also, most training is done in bfloat16, not single precision (which is usually used only for accumulators).
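For concreteness, a minimal sketch of what bf16 mixed-precision training looks like in PyTorch, assuming a GPU with bfloat16 support; the model, sizes, and data here are placeholders, not anything from this thread. The matmuls run in bfloat16 under autocast, while the parameters and the optimizer's accumulators stay in single precision.

    import torch
    from torch import nn

    device = "cuda"  # works for both CUDA and ROCm builds of PyTorch
    model = nn.Linear(1024, 1024).to(device)       # master weights stay in fp32
    opt = torch.optim.AdamW(model.parameters())    # optimizer accumulators in fp32

    x = torch.randn(32, 1024, device=device)
    target = torch.randn(32, 1024, device=device)

    # Matmuls run in bfloat16 under autocast; parameters and optimizer
    # state remain in single precision.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = nn.functional.mse_loss(model(x), target)

    loss.backward()   # gradients for fp32 parameters come out in fp32
    opt.step()

Unlike fp16, bf16 has the same exponent range as fp32, so no loss scaler is needed.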



Sure, you would need to rewrite the training code for AMD's ecosystem. If you're using mixed-precision training, I suppose you're right about BF16. That puts the A100 at roughly 2.5x the relative performance of the Radeon RX 7900 XT. It may be better to go with the Nvidia GeForce RTX 4090 at its $1,600 retail price.


It all works with PyTorch and Hugging Face's transformers library out of the box with ROCm.
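As a rough illustration of what "out of the box" means here: on a ROCm build of PyTorch, HIP devices are exposed through the torch.cuda API, so the same device strings and training code used on NVIDIA cards run unchanged. A minimal sketch, assuming a ROCm PyTorch build and the transformers library are installed; the checkpoint name is just a small example, not anything specific to this thread.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # On ROCm builds of PyTorch, the AMD GPU shows up through the torch.cuda
    # namespace, so the usual "cuda" device string targets it directly.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    name = "distilbert-base-uncased"  # small example checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name).to(device)

    batch = tokenizer(["a ROCm smoke test"], return_tensors="pt").to(device)
    labels = torch.tensor([0], device=device)

    # One ordinary training step; nothing in this script is AMD-specific.
    loss = model(**batch, labels=labels).loss
    loss.backward()

The only difference from an NVIDIA setup is which PyTorch wheel is installed (the ROCm build instead of the CUDA one).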


You would need to compile a few components from source for Navi 31 if you were to try it today, so out-of-the-box is perhaps an overstatement, but it's certainly doable.



