
I have a Radeon 7900 XTX 24GB and have been using deepseek-r1:14b for a couple of days. It achieves about 45 tokens/s. Only after reading this article did I realize that the 32B model would also fit entirely in VRAM (23GB used). And since Ollama [0] was already installed, it was as easy as running: ollama run deepseek-r1:32b
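
If you want to confirm how much of the model is actually resident on the card, here's a quick sketch (assuming the ROCm userspace tools are installed; rocm-smi ships with the ROCm stack):

    # Show VRAM usage on the Radeon card while the model is loaded
    $ rocm-smi --showmeminfo vram

Newer Ollama builds also have "ollama ps", which reports the loaded model's size and whether it is running fully on the GPU.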

The 32B model achieves about 25 tokens/s, which is faster than I can read. However, the "thinking" phase is mostly lower-quality overhead, taking roughly 1-4 minutes before the final solution/answer appears.

You can view the model's performance stats within Ollama using the command: /set verbose
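
For reference, a minimal session sketch (the exact stat labels can vary by Ollama version, and the numbers here are illustrative, not measured):

    $ ollama run deepseek-r1:32b
    >>> /set verbose
    Set 'verbose' mode.
    >>> why is the sky blue?
    ...
    eval count:       412 token(s)
    eval duration:    16.5s
    eval rate:        25.0 tokens/s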

[0] https://github.com/ollama/ollama



Yup, this is what DeepSeek does.

The good thing about the 32B model is that it is about as good as the 70B on many benchmarks, according to DeepSeek's documentation:

https://huggingface.co/deepseek-ai/DeepSeek-R1#distilled-mod...


I've been running 32b as well.

But I cannot find it in LM Studio; what am I doing wrong that I only find distilled models?


The 32B is a distilled model. Only the 671B is not.



