Hacker News

It takes a few hours to compute the imatrix on some calibration dataset, since we use 1-3 million or more tokens of high-quality data. Then we have to decide which layers to quantize to higher bits, which takes more time. Creating the quantizations themselves also takes some hours, and uploading takes time as well! Overall, maybe 8 hours minimum?
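For reference, the imatrix-then-quantize pipeline described above looks roughly like this with llama.cpp's stock tools (file names here are placeholders, not Unsloth's actual setup, and exact flags can vary by llama.cpp version):

```shell
# 1. Compute the importance matrix over a calibration text file.
#    This is the multi-hour step when the file holds millions of tokens.
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize, using the imatrix to pick better per-tensor scales.
#    The target type (Q4_K_M here) is just an example.
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M

# 3. Upload the result, e.g. to Hugging Face (repo name is hypothetical):
# huggingface-cli upload my-org/model-GGUF model-Q4_K_M.gguf
```

Deciding which layers get higher-bit treatment is a separate, model-specific step on top of this; the commands above only show the generic flow.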


What cluster do you have to do the quantizing? I'm guessing you're not using a single machine with a 3090 in your garage.


Oh definitely not! I use some spot cloud instances!


But you can get one of these quantized models to run effectively on a 3090?

If so, I'd love detailed instructions.

The guide you posted earlier goes over my (and likely many others') head!


Oh yes, definitely! Oh wait, is the guide too long / wordy? This section shows how to run it on a 3090: https://docs.unsloth.ai/basics/qwen3-coder-how-to-run-locall...
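The short version of running a quantized GGUF on a single 24 GB card looks something like this (a minimal sketch, assuming llama.cpp; the model filename is a placeholder, and you'd pick a quant small enough to fit in VRAM):

```shell
# -ngl 99 offloads as many layers as possible to the GPU;
# lower --ctx-size if you run out of VRAM on a 3090.
./llama-cli -m Qwen3-Coder-Q4_K_M.gguf -ngl 99 --ctx-size 8192 \
  -p "Write hello world in C"
```

If the model is larger than VRAM, llama.cpp keeps the remaining layers on the CPU, so it still runs, just slower.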


Kind of you to respond! Thanks!

I have pretty bad ADHD. And I've only run models locally using Kobold; I'm a dilettante at DIY AI.

So, yeah, I'm a bit lost in it.


Oh sorry - for Kobold - I think it uses llama.cpp under the hood? I think Kobold has some guides on using custom GGUFs
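Since KoboldCpp wraps llama.cpp, pointing it at a custom GGUF is usually just a matter of passing the file path. A hedged sketch (paths are placeholders; check KoboldCpp's own docs for current flag names):

```shell
# Load a custom GGUF in KoboldCpp, offloading layers to the GPU
# and serving the usual local web UI on port 5001.
python koboldcpp.py --model Qwen3-Coder-Q4_K_M.gguf \
  --gpulayers 99 --contextsize 8192 --port 5001
```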



