
I do the following: www.yandex.com MOVIENAME direct streaming

And that is it... I have Prime + Netflix + Hulu and such, but I use yandex.com as it does not have ads. Even if it sometimes takes a bit to load, and rarely gets stuck for a second, that is still less time than the stupid ads.


Any results on FrontierMath or ARC?

How many epochs did you train for? 100k hours is not a lot for an LLM; feels like the bitter lesson.


I train for 1M steps (batch size 64, block size 2048), which is enough for the model to more-or-less converge.

It's also a tiny model by LLM standards, with 150M parameters. The goal wasn't really to reach state of the art but to show how different the performance of a single language model architecture can be when you just change the tokenizer.
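
To put those numbers in one place, here is a minimal sketch (plain Python, not the author's code) of what such a run implies. Only the step count, batch size, block size, and rough parameter count come from the comments above; the layer/width/vocab values are illustrative guesses for a GPT-style decoder:

    from dataclasses import dataclass

    @dataclass
    class TrainConfig:
        # figures stated in the comments above
        max_steps: int = 1_000_000   # 1M optimizer steps
        batch_size: int = 64
        block_size: int = 2048       # context length in tokens
        # illustrative guesses for a ~150M-parameter GPT-style decoder;
        # the actual architecture is not given in the comments
        n_layer: int = 16
        d_model: int = 768
        vocab_size: int = 50_257

    def approx_params(cfg: TrainConfig) -> int:
        # rough count: ~12 * n_layer * d_model^2 for the attention + MLP blocks,
        # plus the token-embedding matrix
        return 12 * cfg.n_layer * cfg.d_model ** 2 + cfg.vocab_size * cfg.d_model

    def tokens_seen(cfg: TrainConfig) -> int:
        # total tokens processed over the whole run
        return cfg.max_steps * cfg.batch_size * cfg.block_size

    if __name__ == "__main__":
        cfg = TrainConfig()
        print(f"~{approx_params(cfg) / 1e6:.0f}M parameters")    # ~152M
        print(f"~{tokens_seen(cfg) / 1e9:.0f}B training tokens")  # ~131B

At those figures the model sees on the order of 131B tokens, far more than the ~20 tokens per parameter often cited as compute-optimal, which is consistent with the claim that it more-or-less converges.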


To get close to state of the art, how many parameters would be needed with your approach?


Although it is very cool, I lack the emotional context to understand why people work on it. What is the motivation that drives them?

Not trying to offend, just trying to understand


I’ll hazard some guesses, since no one has replied.

- Big fish small pond. Because so few people are tackling problems like this, it’s much easier to get noticed and appreciated.

- Nostalgia. I think most people have a soft spot for the computers they grew up with. For me, it’s the TI-99/4A that I learned how to program on.

- Technical challenge. Making a decades-obsolete computer work with the modern world is not trivial.

More than 25 years ago I worked for BBN, and they had a warehouse of old equipment, including the IMPs that made the ARPANET work, a pallet of early Macs, etc. I grabbed a hard drive to use for a routing project; it had 1GB of disk and was the size of two rack units. Working with very old computers can be fascinating as well as frustrating.


Q (Cue) | Boston, MA | Onsite | Full-time | AI/ML – Speech & Audio

We’re building a revolutionary new way for people to communicate.

Our work is deeply rooted in large-scale deep learning. We train massive models across many GPUs using proprietary data, pushing the boundaries of what we can do with deep learning.

We’re looking for someone who is both highly research-oriented and technically strong. Training large models over long durations demands not only cutting-edge research thinking but also solid engineering skills and deep experience in scaling DL systems.

https://q.ai/open-position/?gh_jid=4569618101


I asked Claude to explain what you meant: https://claude.ai/share/391160c5-d74d-47e9-a963-0c19a9c7489a


I’m not sure outsourcing even the comprehension of HN comments to an LLM is going to work out well for your mind.


I’m not sure lacking comprehension of a comment and choosing to ignore that lack is better. Or worse: asking everyone to manually explain every reference they make. The LLM seems a good choice when comprehension is lacking.


So no bitter lesson?


The only reason Israel can do it is that the general population in Lebanon hates Hezbollah; it is basically a proxy power controlled by Iran.

Look at this to understand how a big part of Lebanon sees Hezbollah: https://www.instagram.com/brigitte_gabriel/reel/CyT_RanK1gi/


Page 57 in Real Analysis, the definition of a limit...

The first time in my life that I did not understand something on the spot (it actually took me a few months to REALLY understand it).
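
For anyone wondering which definition that is: presumably the standard epsilon-delta definition of the limit of a function at a point (written out in LaTeX here, not quoted from that particular book):

    \lim_{x \to a} f(x) = L
    \iff
    \forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x :\;
    0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon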


The style is not very different from NeMo (NVIDIA), fairseq (Facebook), ESPnet (OSS), etc.

