Hacker News
cma | 10 months ago | on: How has DeepSeek improved the Transformer architec...
Flash attention was also built from techniques already common in other areas of optimized software (tiling, kernel fusion), yet the big labs weren't applying those optimizations when it came out, and it significantly improved everything.
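To make the point concrete, here's a minimal NumPy sketch of the core idea behind flash attention: process the key/value matrix in tiles with an "online softmax", keeping only a running row-max and normalizer, so the full N×N score matrix is never materialized. This is an illustrative toy (single head, no masking, not the actual CUDA kernel), and the block size and helper names are my own choices, not anything from the thread.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference implementation: materializes the full score matrix.
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Flash-attention-style loop: visit K/V in blocks, maintaining a
    # running max (m) and softmax denominator (l) per query row, and
    # rescaling the partial output whenever the max grows.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q, dtype=float)
    m = np.full(n, -np.inf)   # running row max of scores seen so far
    l = np.zeros(n)           # running softmax normalizer
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T * scale                   # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)              # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        O = O * alpha[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

The numerical trick (rescaling by `exp(m_old - m_new)` when a larger max appears) is what lets the tiled loop match the one-shot softmax exactly while using O(N) memory for the scores instead of O(N²).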
whimsicalism | 10 months ago
Yes, I agree that low-level and infra work is where a lot of DeepSeek's improvement came from.