
32GB should be fine. I went a little overboard and got a new MBP with an M2 Max and 96GB, but at this point the hardware is really best suited to a 30B model. I can and do play around with 65B models, but at that size you're trading a fairly big drop in generation speed for an incremental increase in quality.
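
Back-of-envelope, the RAM for a quantized model's weights is roughly params × bits-per-weight / 8. A minimal sketch, assuming ~5.5 bits/weight (in the ballpark of llama.cpp's q5_K_M quantization; that figure isn't stated in this thread):

    # Rough weight-memory estimate for a quantized model.
    # 5.5 bits/weight is an assumption (~q5_K_M); KV cache and
    # runtime overhead add a few more GB on top of this.
    def weights_gb(params_billion, bits_per_weight=5.5):
        return params_billion * 1e9 * bits_per_weight / 8 / 2**30

    for b in (30, 65, 70):
        print(f"{b}B: ~{weights_gb(b):.0f} GB of weights")
    # 30B: ~19 GB, 65B: ~42 GB, 70B: ~45 GB -- consistent with the
    # 23.44 GB and 44 GB figures below once context overhead is added.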

As a datapoint, I have a 30B model [0] loaded right now and it's using 23.44GB of RAM. Getting around 9 tokens/sec, which is very usable. I also have the 65B version of the same model [1] and it's good for around 3.6 tokens/second, but it uses 44GB of RAM. Not unusably slow, but more often than not I opt for the 30B because it's good enough and a lot faster.
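
If you want to reproduce the tokens/sec numbers, here's a minimal sketch using the llama-cpp-python bindings (an assumption; the thread doesn't say which runtime is in use, and the model path is a placeholder):

    import time
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Placeholder path -- point this at whatever quantized file you downloaded.
    llm = Llama(model_path="./llama-30b-instruct.q5_K_M.bin", n_ctx=2048)

    start = time.time()
    out = llm("Explain the tradeoff between model size and speed.", max_tokens=256)
    elapsed = time.time() - start

    n = out["usage"]["completion_tokens"]
    print(f"{n} tokens in {elapsed:.1f}s -> {n / elapsed:.1f} tokens/sec")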

Haven't tried the Llama 2 70B yet.

[0] https://huggingface.co/TheBloke/upstage-llama-30b-instruct-2...

[1] https://huggingface.co/TheBloke/Upstage-Llama1-65B-Instruct-...



What's your use case for running models locally, if you don't mind me asking?


Thank you, that's really helpful! The configure-to-order (CTO) lead times on Macs are huge here, so it's either the Pro with 16GB or the Max with 32GB. Ideally I'd go for the Pro with 64GB.



