
Update: I tried it out. It took about 8 seconds per token and didn't seem to be using much of my GPU, but it was using a lot of RAM. Not a model that I could use practically on my machine.
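For anyone who wants to reproduce the timing, here's a rough sketch using Ollama's local REST API (it assumes Ollama is serving on its default port 11434 and the model has already been pulled; the prompt is just a placeholder):

  import requests

  # Request a completion without streaming; the final response
  # includes timing fields we can turn into tokens/sec.
  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={"model": "gpt-oss:20b", "prompt": "Why is the sky blue?", "stream": False},
      timeout=600,
  )
  stats = resp.json()

  # eval_count = generated tokens; eval_duration = nanoseconds spent generating.
  print(stats["eval_count"] / (stats["eval_duration"] / 1e9), "tokens/sec")

(ollama run gpt-oss:20b --verbose prints similar eval-rate stats directly in the terminal.)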


Did you run it the best way possible? I'm no expert, but I understand that the format/engine used can greatly affect inference time.
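One concrete thing worth checking is which format and quantization the Ollama build actually ships; a rough sketch against Ollama's /api/show endpoint (the request field was "name" in older Ollama versions, and the model tag here matches the one used downthread):

  import requests

  # Ask Ollama for the model's metadata; the "details" block reports
  # the file format, family, and quantization level the runner uses.
  info = requests.post(
      "http://localhost:11434/api/show",
      json={"model": "gpt-oss:20b"},
  ).json()
  print(info["details"])  # e.g. format, family, quantization_level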


I ran it via Ollama, which I assumed would use the best available backend. Screenshot in my post here: https://bsky.app/profile/pamelafox.bsky.social/post/3lvobol3...

I'm still wondering why my GPU usage was so low... maybe Ollama isn't optimized for running it yet?
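If anyone wants to check the same thing on their machine, Ollama's /api/ps endpoint reports how much of a loaded model made it into GPU memory; a rough sketch (it assumes the model is currently loaded and Ollama is on its default port):

  import requests

  # List currently loaded models; comparing size_vram to size shows
  # what fraction of the model Ollama offloaded to the GPU.
  for m in requests.get("http://localhost:11434/api/ps").json()["models"]:
      frac = m["size_vram"] / m["size"] if m["size"] else 0.0
      print(m["name"], f"{100 * frac:.0f}% in VRAM")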


Might need to wait for MLX support.
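If/when an MLX conversion appears, the mlx-lm package is the usual way to try it; a minimal sketch (the Hugging Face repo name below is hypothetical; substitute whichever conversion actually lands on the mlx-community org):

  from mlx_lm import load, generate

  # Load an MLX-format model from the Hugging Face hub (hypothetical
  # repo name) and generate a short completion on the Apple Silicon GPU.
  model, tokenizer = load("mlx-community/gpt-oss-20b-4bit")
  print(generate(model, tokenizer, prompt="Why is the sky blue?", max_tokens=100))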


To clarify, this was the 20B model?


Yep, the 20B model, via Ollama:

  ollama run gpt-oss:20b

Screenshot here with Ollama running and asitop in another terminal:

https://bsky.app/profile/pamelafox.bsky.social/post/3lvobol3...



