cubefox 3 days ago, on: Zebra-Llama – Towards efficient hybrid models

DeepSeek-V3.2 is a sparse attention architecture, while Zebra-Llama is a hybrid attention/SSM architecture. The outcome might be similar in some ways (close to linear complexity in sequence length), but I think they are otherwise quite different.
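A toy sketch (pure standard-library Python) may make the difference concrete. This is not code from DeepSeek-V3.2 or Zebra-Llama. `causal_attention`, `diagonal_ssm_scan` and `hybrid_stack` are hypothetical names. The SSM is a plain fixed-coefficient diagonal recurrence rather than a selective one. The sparse variant picks keys by simply reusing the attention scores, which is an assumption made for brevity. The sparse path is still attention, just over at most `top_k` keys per query. The SSM path never looks back at past tokens and carries a fixed-size state forward instead.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_attention(q, k, v, top_k=None):
    """Toy causal softmax attention. With top_k set, query i keeps only its
    top_k highest-scoring past keys (sparse); with top_k=None it uses all."""
    out = []
    for i, qi in enumerate(q):
        # Score every past key j <= i. This toy reuses these scores for
        # selection, so only the softmax and weighted sum become sparse.
        scored = sorted(((dot(qi, k[j]), j) for j in range(i + 1)), reverse=True)
        selected = scored if top_k is None else scored[:top_k]
        # Numerically stable softmax over the selected keys only.
        m = max(s for s, _ in selected)
        weights = [(math.exp(s - m), j) for s, j in selected]
        z = sum(w for w, _ in weights)
        out.append([sum(w * v[j][d] for w, j in weights) / z
                    for d in range(len(v[0]))])
    return out


def diagonal_ssm_scan(x, a, b, c):
    """Toy diagonal linear SSM: a fixed-size state per channel is updated
    once per token, h = a*h + b*x_t, and read out as y_t = c*h."""
    h = [0.0] * len(a)
    out = []
    for xt in x:
        h = [a[d] * h[d] + b[d] * xt[d] for d in range(len(a))]
        out.append([c[d] * h[d] for d in range(len(a))])
    return out


def hybrid_stack(x, layer_kinds):
    """Hypothetical hybrid: a mix of dense attention layers and SSM layers,
    each added residually. Identity projections keep the toy small."""
    n = len(x[0])
    for kind in layer_kinds:
        if kind == "attn":
            y = causal_attention(x, x, x)  # dense: work grows with position i
        else:
            y = diagonal_ssm_scan(x, [0.9] * n, [1.0] * n, [1.0] * n)  # fixed-size state
        x = [[xi + yi for xi, yi in zip(xr, yr)] for xr, yr in zip(x, y)]
    return x


if __name__ == "__main__":
    seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    print(causal_attention(seq, seq, seq, top_k=2))   # sparse attention only
    print(hybrid_stack(seq, ["ssm", "ssm", "attn"]))  # attention/SSM hybrid
```

Per token, the sparse path reads up to `top_k` cached keys and values, plus whatever its selection step costs. The SSM layer reads only its fixed-size state. The hybrid keeps some full attention layers and swaps the rest for SSM layers, so it reaches lower cost by a different route.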