If you go back through my Hacker News comments, I believe you'll see this. Perhaps look for keywords "GPT-2", "prediction", and "agent". (I don't know how to search HN comments efficiently.) I was talking about this sort of thing in 2018, though I don't think I published anything that's still accessible, and I'd hardly call myself an expert: it's just obviously how the system works.
The "author:" part was just being treated as a keyword, and was restricting the results too much. I haven't found the comments I was looking for, but I have found Chain of Thought prompting, 11 months before it was cool (https://news.ycombinator.com/item?id=26063189):
> Instruction: When considering the sizes of objects, you will calculate the sizes before attempting a comparison.
> GPT-2 doesn't have a concept of self, so it constructs plausible in-character excuses instead.
> Corollary: you can't patch broken FAI designs. Reinforcement learning (underlying basically all of our best AI) is known to be broken; it'll game the system. Even if they were powerful enough to understand our goals, they simply wouldn't care; they'd care less than a dolphin. https://vkrakovna.wordpress.com/2018/04/02/specification-gam...
> And there are far too many people in academia who don't understand this, after years of writing papers on the subject.
This criticism applies to RLHF, so it counts, imo. Not as explicit as my (probably unpublished, almost certainly embarrassing) wild ravings from 2018, but it's before 2024.
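For anyone who wants to try the trick in that quoted instruction, here's a rough sketch of what I mean by Chain of Thought prompting with GPT-2, using the Hugging Face transformers library. The few-shot wording is my own illustration, not the exact prompt from the linked comment:

```python
# Sketch of Chain-of-Thought-style prompting with GPT-2 via Hugging Face transformers.
# The instruction tells the model to compute sizes before comparing; the worked
# example shows it the expected format (this wording is illustrative, not the original).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Instruction: When considering the sizes of objects, you will calculate "
    "the sizes before attempting a comparison.\n"
    "Q: Which is bigger, a housecat or a horse?\n"
    "A: A housecat is roughly 0.5 m long; a horse is roughly 2.4 m long. "
    "2.4 m > 0.5 m, so a horse is bigger.\n"
    "Q: Which is bigger, a beetle or a bus?\n"
    "A:"
)

# Greedy decoding so the continuation is deterministic; print only the new text.
out = generator(prompt, max_new_tokens=40, do_sample=False)
print(out[0]["generated_text"][len(prompt):])
```

With a model as small as GPT-2 the answers are hit or miss, but the point stands: the intermediate "calculate, then compare" step is coaxed out purely by the prompt format.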