
Oh, 8x H200 is nice. For llama.cpp, definitely look at https://docs.unsloth.ai/basics/qwen3-coder-how-to-run-locall... - llama.cpp has a high-throughput mode which should be helpful.

You should be able to get 40 to 50 tokens/s at a minimum. High-throughput mode plus a small draft model might get you to 100 tokens/s generation.
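A rough sketch of what the draft-model setup looks like with llama-server's speculative decoding flags. The flag names below match recent llama.cpp builds (check `llama-server --help` on your version), and both model filenames are placeholders, not actual release names:

```shell
# Serve the main model with a small draft model for speculative decoding.
# Model paths are placeholders; substitute your actual GGUF files.
llama-server \
  -m Qwen3-Coder-Q4_K_M.gguf \
  -md Qwen3-Coder-draft-small.gguf \
  --draft-max 16 \
  --draft-min 1 \
  -ngl 99 \
  -c 32768 \
  --parallel 8
```

`-md` points at the draft model, `--draft-max`/`--draft-min` bound how many tokens are speculated per step, `-ngl 99` offloads all layers to the GPUs, and `--parallel` enables batched concurrent request slots for throughput.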


