The fact that "scaling laws" didn't scale? Go open your favorite LLM in a hex editor; oftentimes half of the larger tensors are just null bytes.
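If you'd rather measure than eyeball it in a hex editor, here is a minimal sketch that counts 0x00 bytes in a file. The function name `null_byte_fraction` and the path in the usage note are made up for illustration. It scans the whole file, headers and all, and does not split anything out per tensor. A 0x00 byte can also sit inside a nonzero value, so treat the result as a rough check, not proof of anything.

```python
def null_byte_fraction(path, chunk_size=1 << 20):
    """Return the fraction of bytes in the file at `path` that are 0x00.

    Hypothetical helper: it treats the file as an opaque byte stream and
    knows nothing about tensor boundaries or the checkpoint format.
    """
    total = 0
    zeros = 0
    with open(path, "rb") as f:
        while True:
            # Read in chunks so a multi-gigabyte checkpoint is never
            # loaded into memory all at once.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            zeros += chunk.count(b"\x00")
    # An empty file has no bytes at all, so report 0.0 instead of dividing by zero.
    return zeros / total if total else 0.0
```

Usage would be something like `print(null_byte_fraction("model.bin"))`, where `model.bin` is a placeholder for whatever weights file you have on disk.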