alcinos's comments | Hacker News

> We've just only started RL training LLMs

That's just factually wrong. Even the original ChatGPT model (based on GPT-3.5, released in 2022) was trained with RL (specifically RLHF).


RLHF is not the "RL" the parent is posting about. RLHF is specifically human-driven reward (subjective, doesn't scale, doesn't improve the model's "intelligence", just tweaks behavior), which is why the labs have started calling it post-training rather than RLHF.

True RL is where you set up an environment in which an agent can "discover" solutions to problems by iterating against some kind of verifiable reward, AND the entire space of outcomes is theoretically largely explorable by the agent. Math and coding have proven amenable to this type of RL so far.
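For what it's worth, here is a toy sketch of what "verifiable reward" means in practice. The sample_answer/update_policy callables are hypothetical stand-ins for a real policy, not any lab's actual pipeline:

  import random

  # Tiny toy problems whose answers can be checked automatically.
  PROBLEMS = [("2+2", 4), ("3*5", 15), ("10-7", 3)]

  def verifiable_reward(answer, expected):
      # The reward comes from an automatic check, not from a human rater.
      return 1.0 if answer == expected else 0.0

  def train_step(sample_answer, update_policy):
      # sample_answer / update_policy: hypothetical policy interface.
      prompt, expected = random.choice(PROBLEMS)
      answer = sample_answer(prompt)                # the agent "discovers" an attempt
      reward = verifiable_reward(answer, expected)  # graded automatically
      update_policy(prompt, answer, reward)         # reinforce rewarded behaviour

  # Example with a dummy random policy: over many steps, only attempts
  # that pass the check receive reward.
  train_step(lambda p: random.randint(0, 20), lambda *args: None)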


a) 2022 is not that long ago. b) This was an important first step toward usable AI, but not a scalable one. I'd say "RL training" is not the same as RLHF.


The original ChatGPT was like 3 years after the first usable transformer models.


It is already possible to tell whether a particular image has been used in training (see e.g. https://arxiv.org/abs/1809.06396 by the same authors), but this new work also provides a p-value, giving you a confidence level for the result.

Also note that being proactive in watermarking the dataset can be desirable in some cases. For example, many datasets have large overlaps in the base images they use (but sometimes different labels), so it can be interesting to know whether a model was trained on "your" version of the dataset.
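As a rough illustration (not the method of the linked paper), here is how a p-value can turn "does the model score my watermarked images suspiciously well?" into a confidence statement. The loss values are made up:

  import numpy as np
  from scipy.stats import mannwhitneyu

  def membership_p_value(losses_on_suspect, losses_on_fresh):
      # If the suspect (e.g. watermarked) images were in the training set,
      # the model typically assigns them lower loss than fresh, unseen images.
      _, p = mannwhitneyu(losses_on_suspect, losses_on_fresh, alternative="less")
      return p  # a small p-value is strong evidence the suspect set was used

  # Hypothetical loss values, for illustration only.
  print(membership_p_value(np.array([0.10, 0.20, 0.15, 0.12]),
                           np.array([0.90, 1.10, 0.95, 1.05])))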


Well, it's deterministic once you know the random seed, which is stored in the replay file. An agent doesn't know the seed, hence cannot predict the exact outcome of its actions, only a probability distribution over the outcomes. So, from the agent's perspective, the game is indeed random.
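A toy sketch of the point, assuming nothing about the actual game beyond a stored seed:

  import random

  def play(seed, n_events=5):
      rng = random.Random(seed)   # the seed is what gets stored in the replay file
      return [rng.randint(1, 6) for _ in range(n_events)]

  # Deterministic once the seed is known: the same replay every time.
  assert play(42) == play(42)

  # Without the seed, an agent can only reason about the distribution of
  # outcomes (each event uniform over 1..6), not the exact sequence.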


To me, the main problem with papers in their current form is that they are required to be more or less self-contained. When someone wants to state a result that slightly improves the knowledge in a well-established field, they have to waste time and space restating the definitions and preliminary results necessary to understand it. This is counterproductive both for the author and for the reader who is only interested in the small new piece of information.

If papers were collaborative, one could simply propose improvements directly where they fit in the reference paper, without having to write a new one from scratch, and readers would be immediately aware of these follow-up results without having to search through dozens of papers.


I don't know. I think the act of building a paper up from scratch is an essential element of the research process. It really isn't a waste of time and space, in the sense that proper research requires these definitions and preliminary results to be carefully reviewed by the author anyway. Why not have them type it out for us? Or even for their own sake.

A wiki-style database of information could be an exciting resource for kicking off new ideas. When scientists conduct research within this database, though, the findings and methodology must be held to a high enough standard. Marks of high-quality research include well-defined terms and contextualized prior results.

If you want to get straight to the new information, read the abstract up top, skim the middle and read the results and discussion.


I completely agree that serious researchers should review the definitions and results they are basing their findings upon. But then, I'd consider an editable reference paper, corrected and improved by a significant number of people working in the field, much more reliable than an old-style paper published decades ago after a (botched?) review by a couple of anonymous reviewers and not revised since.

This would also partly address the problem of notation: if everybody agrees on and works on the same piece of work, they are likely to adopt the same notation, which gives the reader a nice coherence.

A wiki-style database has many advantages when it comes to organizing and searching information. In my opinion, though, the editing process should be closer to GitHub's pull requests (as advocated by the article), to ensure that everything is properly reviewed before publication.

Finally, a publication scheme like the one I described also addresses a recurring issue with traditional citation-based papers: citations are one-sided. It is easy to see which papers an article depends upon, but the converse is hard (unless you use specific tools, at least). With collaborative editing and Wikipedia-style links between articles, the reader is immediately aware of the latest findings in the field, which tremendously simplifies bibliographic research.
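To make the one-sidedness concrete, a tiny sketch with made-up paper names: the forward direction is explicit in each paper, while the "cited-by" direction has to be reconstructed by scanning everything:

  from collections import defaultdict

  # Hypothetical forward-citation data: each paper lists what it cites.
  cites = {
      "paper_C": ["paper_A", "paper_B"],
      "paper_D": ["paper_A"],
  }

  # The "cited-by" direction has to be rebuilt from the full corpus.
  cited_by = defaultdict(list)
  for paper, references in cites.items():
      for ref in references:
          cited_by[ref].append(paper)

  print(cited_by["paper_A"])   # ['paper_C', 'paper_D'] -- the follow-up work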


There is a tradeoff here. If you want terse, field-specific papers without an introduction and/or definitions of terms, they will be essentially incomprehensible to those not knowledgeable about the field. If you want things to be more generally accessible, a brief (!) explanation of the point and basis of the paper is helpful.

That said, I would like to see papers that actually provide a return for the time spent reading them. Teach me something that I can use if I'm familiar with the discipline. Make sure it's actually generally applicable rather than an artifact of the data. Do proper statistics and/or testing of the idea. Show it's not just an LPU (least-publishable unit) that resulted from running tutorials from the vendor.

There's a tremendous push to publish as it's the currency of academic and much professional life. I've done methodology all my working life, and the good papers are joys to read. They provide insight for new techniques, things which I can understand and apply and build upon. There's test data, so I can check that the paper and any work I do based on the paper is robust.

They are unfortunately quite rare. I think editors use me as a hatchet man for me-too papers, or maybe that's all people write anymore. Yeah, the goal is to get one's students trained and employed and the grants obtained, but please, please write things that are worth the time spent reading them.

