Fast inference can change the entire dynamic or working with these tools. At the typical speeds, I usually try to do something else while the model works. When the model works really fast, I can easily wait for it to finish.
So the total difference includes the cost of context switching, which is big.
Potentially speed matters less in a scenario that is focused on more autonomous agents running in the background. However I think most usage is still highly interactive these days.
So the total difference includes the cost of context switching, which is big.
Potentially speed matters less in a scenario that is focused on more autonomous agents running in the background. However I think most usage is still highly interactive these days.