I have a Radeon 7900 XTX 24GB and have been using the deepseek-r1:14b model for a couple of days. It achieves about 45 tokens/s. Only after reading this article did I realize that the 32B model would also fit entirely in VRAM (23GB used). And since Ollama [0] was already installed, it was as easy as running: ollama run deepseek-r1:32b
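For anyone trying this, a rough sketch of the steps (the first run downloads the model weights before dropping into the chat prompt; ollama ps is the stock subcommand for checking that the model actually loaded onto the GPU):

  # first run pulls the weights, then opens an interactive chat
  ollama run deepseek-r1:32b

  # in a second terminal: lists loaded models, their size in memory,
  # and whether they are running fully on the GPU
  ollama ps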
The 32B model achieves about 25 tokens/s, which is faster than I can read. However, the "thinking" phase is mostly low-quality overhead, taking ~1-4 minutes before the actual Solution/Answer appears.
You can view the model's performance stats within Ollama using the command: /set verbose
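Note that /set verbose is entered at the interactive >>> prompt, not the shell. Roughly, a session looks like this (exact stat labels may vary between Ollama versions):

  $ ollama run deepseek-r1:32b
  >>> /set verbose
  >>> <your prompt here>
  ... model's answer ...
  # after each reply, timing stats are printed, including an
  # "eval rate" line, which is the tokens/s figure quoted above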
[0] https://github.com/ollama/ollama