
I do the following: www.yandex.com MOVIENAME direct streaming

And that is it... I have Prime + Netflix + Hulu and such, but I use yandex.com as it does not have ads. Even if it sometimes takes a bit to load, and rarely gets stuck for a second, that is still less time than the stupid ads.


Any results on FrontierMath or ARC?

How many epochs did you train for? 100k hours is not a lot for an LLM; feels like the bitter lesson.


I train for 1M steps (batch size 64, block size 2048), which is enough for the model to more-or-less converge.

It's also a tiny model by LLM standards, with 150M parameters. The goal wasn't really to reach state of the art but to show how different the performance of a single language model architecture can be when you just change the tokenizer.
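
To put those numbers in one place, here is a minimal sketch (plain Python, not the author's code) of what such a run implies. Only the step count, batch size, block size, and rough parameter count come from the comments above; the layer/width/vocab values are illustrative guesses for a GPT-style decoder:

    from dataclasses import dataclass

    @dataclass
    class TrainConfig:
        # figures stated in the comments above
        max_steps: int = 1_000_000   # 1M optimizer steps
        batch_size: int = 64
        block_size: int = 2048       # context length in tokens
        # illustrative guesses for a ~150M-parameter GPT-style decoder;
        # the actual architecture is not given in the comments
        n_layer: int = 16
        d_model: int = 768
        vocab_size: int = 50_257

    def approx_params(cfg: TrainConfig) -> int:
        # rough count: ~12 * n_layer * d_model^2 for the attention + MLP blocks,
        # plus the token-embedding matrix
        return 12 * cfg.n_layer * cfg.d_model ** 2 + cfg.vocab_size * cfg.d_model

    def tokens_seen(cfg: TrainConfig) -> int:
        # total tokens processed over the whole run
        return cfg.max_steps * cfg.batch_size * cfg.block_size

    if __name__ == "__main__":
        cfg = TrainConfig()
        print(f"~{approx_params(cfg) / 1e6:.0f}M parameters")    # ~152M
        print(f"~{tokens_seen(cfg) / 1e9:.0f}B training tokens")  # ~131B

At those figures the model sees on the order of 131B tokens, far more than the ~20 tokens per parameter often cited as compute-optimal, which is consistent with the claim that it more-or-less converges.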


To get close to state of the art, how many parameters would be needed with your approach?


Although it is very cool, I lack the emotional context to understand why people work on it. What is the motivation that drives them?

Not trying to offend, just trying to understand


I’ll hazard some guesses, since no one has replied.

- Big fish small pond. Because so few people are tackling problems like this, it’s much easier to get noticed and appreciated.

- Nostalgia. I think most people have a soft spot for the computers they grew up with. For me, it’s the TI-99/4A that I learned how to program on.

- Technical challenge. Making a decades-obsolete computer work with the modern world is not trivial.

More than 25 years ago I worked for BBN, and they had a warehouse of old equipment, including the IMPs that made the ARPANET work, a pallet of early Macs, etc. I grabbed a hard drive to use for a routing project; it had 1GB of disk and was the size of two rack units. Working with very old computers can be fascinating as well as frustrating.


Q (Cue) | Boston, MA | Onsite | Full-time | AI/ML – Speech & Audio

We’re building a revolutionary new way for people to communicate.

Our work is deeply rooted in large-scale deep learning. We train massive models across many GPUs using proprietary data, pushing the boundaries of what we can do with deep learning.

We’re looking for someone who is both highly research-oriented and technically strong. Training large models over long durations demands not only cutting-edge research thinking but also solid engineering skills and deep experience in scaling DL systems.

https://q.ai/open-position/?gh_jid=4569618101


I asked Claude to explain what you meant: https://claude.ai/share/391160c5-d74d-47e9-a963-0c19a9c7489a


I’m not sure outsourcing even the comprehension of HN comments to an LLM is going to work out well for your mind.


I’m not sure lacking comprehension of a comment and choosing to ignore that lack is better. Or worse: asking everyone to manually explain every reference they make. The LLM seems a good choice when comprehension is lacking.


So no bitter lesson?


The only reason Israel can do it is that the general population in Lebanon hates Hezbollah; it is basically a proxy power controlled by Iran.

Look at this to understand how a big part of Lebanon sees Hezbollah: https://www.instagram.com/brigitte_gabriel/reel/CyT_RanK1gi/


Page 57 in Real Analysis, the definition of a limit...

The first time in my life that I did not understand something on the spot (it actually took me a few months to REALLY understand it).
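
For anyone wondering which definition that is: presumably the standard epsilon-delta definition of the limit of a function at a point (written out in LaTeX here, not quoted from that particular book):

    \lim_{x \to a} f(x) = L
    \iff
    \forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x :\;
    0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon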


The style is not very different from NeMo (NVIDIA), fairseq (Facebook), ESPnet (OSS), etc.

