The whole point of Llama is to go beyond Chinchilla optimal:

> The objective of the scaling laws from Hoffmann et al. (2022) is to determine how to best scale the dataset and model sizes for a particular training compute budget. However, this objective disregards the inference budget, which becomes critical when serving a language model at scale. In this context, given a target level of performance, the preferred model is not the fastest to train but the fastest at inference, and although it may be cheaper to train a large model to reach a certain level of performance, a smaller one trained longer will ultimately be cheaper at inference.
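To make the tradeoff concrete, here is a rough back-of-the-envelope sketch using the standard FLOPs approximations (training ~6ND, inference ~2N per token). The model sizes and token counts below are illustrative assumptions, not figures from the Llama paper:

  # Rough FLOPs accounting (standard approximations):
  #   training cost  ~ 6 * N * D      (N = params, D = training tokens)
  #   inference cost ~ 2 * N per generated token
  # Numbers below are hypothetical, for illustration only.

  def train_flops(n_params, n_tokens):
      return 6 * n_params * n_tokens

  def inference_flops_per_token(n_params):
      return 2 * n_params

  # Hypothetical comparison: a 70B model vs a 13B model, both trained
  # on the same 1.4T tokens (the smaller one well past "optimal").
  big   = train_flops(70e9, 1.4e12)   # ~5.9e23 FLOPs to train
  small = train_flops(13e9, 1.4e12)   # ~1.1e23 FLOPs to train

  # Per token served, the smaller model stays ~5x cheaper forever:
  print(inference_flops_per_token(70e9) / inference_flops_per_token(13e9))  # ~5.4

The smaller model may need extra training tokens to close the quality gap, but that is a one-time cost, while the inference savings apply to every token ever served.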


