
Fast inference can change the entire dynamic of working with these tools. At typical speeds, I usually try to do something else while the model works. When the model is really fast, I can simply wait for it to finish.

So the total difference includes the cost of context switching, which is significant.

Speed potentially matters less in scenarios focused on more autonomous agents running in the background. However, I think most usage these days is still highly interactive.


