> your suggestion is to treat it like a person but (surprise surprise) you don't have any specific ideas of how and why that works. your idea just sounds like marketing
> A circle of competence is the subject area which matches a person's skills or expertise. The concept was developed by Warren Buffett and Charlie Munger as what they call a mental model, a codified form of business acumen, concerning the investment strategy of limiting one's financial investments in areas where an individual may have limited understanding or experience, while concentrating in areas where one has the greatest familiarity. -Wikipedia
> I try to temper my tendency to believe the Halo effect with Warren Buffett's notion of the Circle of Competence; there is often a very narrow domain where any person can be significantly knowledgeable. (commenter above)
Putting aside Buffett in particular, I'm wary of claims like "there is often a very narrow domain where any person can be significantly knowledgeable". How often? How narrow a domain? Doesn't it depend on arbitrary definitions of what qualifies as a category? Is this a testable theory? Is it a predictive theory? What do empirical research and careful analysis show?
Setting those questions aside, there are useful mathematical ways to get at some of the underlying concepts without making assumptions about people, culture, education, etc. I'll cook one up now...
Start with 70K balls split evenly across seven colors: red, orange, yellow, green, blue, indigo, and violet. 1,000 people show up demanding balls, so we mix them up and randomly hand 10 balls to each person. What does the distribution tend to look like? What particulars would you tune, and what definitions would you choose, to make this problem "sort of" map onto something like assessing the diversity of human competence across different areas?
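Here is a minimal simulation sketch of that setup; the pool size, color count, and hand size are the obvious knobs to tune:

    import random
    from collections import Counter

    COLORS = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
    pool = [c for c in COLORS for _ in range(10_000)]  # 70K balls, 10K per color
    random.shuffle(pool)

    # Deal 10 balls to each of 1,000 people, without replacement.
    hands = [pool[i * 10:(i + 1) * 10] for i in range(1_000)]

    # How many distinct colors does each person end up holding?
    spread = Counter(len(set(hand)) for hand in hands)
    for k in sorted(spread):
        print(f"{k} distinct colors: {spread[k]} people")

If I've done the arithmetic right, most hands land around 5-6 distinct colors; shrink the hand size or grow the color count and each person's holdings start to look "narrow".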
Note the colored balls example assumes independence between colors (subjects or skills or something). But in real life, there are often causally significant links between skills. For example, general reasoning ability improves performance in a lot of other subjects.
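To make that concrete, here is one toy way to break the independence assumption: a shared latent factor feeding every skill. The weights are mine, purely illustrative:

    import random

    def person_skills(n_skills=7):
        g = random.gauss(0, 1)  # shared "general reasoning" factor
        # Each skill mixes the shared factor with skill-specific noise.
        return [0.6 * g + 0.8 * random.gauss(0, 1) for _ in range(n_skills)]

    # With these weights each skill has unit variance (0.36 + 0.64), and any
    # two skills within a person correlate at about 0.6^2 = 0.36 -- unlike
    # the balls, where each color is drawn independently.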
Then a goat exploded, because I don't know how to end this comment gracefully.
When the world is complicated, entangled, and rapidly changing, would you expect there to be one centralized official guide?*
At the risk of sounding glib or paternalistic -- and I'm going to say it anyway, because once you "see it" it won't feel like a foreign idea being imposed on you -- there are ways to lower, and even drop, such expectations.
How? To mention just one: good reading. Read "Be a new homunculus" [1]. To summarize: visualize yourself as the "thing that lives in your brain". Yes, this is nonsense, but try it anyway.
Maybe you've found ways to accept that "the world is changing faster than ever before", and it still feels like too much. Maybe you are pissed off or anxious about AI. Maybe AI is being "heavily encouraged" for you (on you?) at work. Maybe you feel like we're living in an unsustainable state of affairs. Don't deny it: dig into that feeling, talk about it, and see where it leads you. Burying these things isn't a viable long-term strategy.**
* There is an "awesome-*" GitHub repository collecting recommended resources to help with Claude Code [2], but it still requires a lot of curation and end-user experimentation. There are few easy answers in a dynamic, uncertain world.
** Yes, I'm intentionally cracking the door open to "Job loss is scary. It is time to get real on this, including political activism."
I get that. I probably could have been much more succinct by saying this: We can consciously act in ways that reduce the frustration level even if the environment itself doesn't change. It usually takes time and patience, but not always. Sometimes a particular mindset shift is sufficient to make a frustration completely vanish almost immediately.
Some examples from my experience: (1) Many particular frustrations with LLMs vanish the more I learn about their internals. (2) Frustration with the cacophony of RAG/graph-database tooling vanishes once I realize that an entire slice of VC money is chasing these problems precisely because the outcome is uncertain: the victors are not pre-ordained and ... [insert bad joke about vectors here]
> I know that "X is destroying democracy, vote for Y" has been a prevalent narrative lately, but is there any evidence that it's true? I get that it's death by a thousand cuts, or "one step at a time" as they say.
I suggest reading [1], [2], and [3]. From there, you'll probably have lots of background to pose your own research questions. According to [4], until you write about something, your thinking will be incomplete, and I tend to agree nearly all of the time.
[4]: "Neuroscientists, psychologists and other experts on thinking have very different ideas about how our brains work, but, as Levy writes: “no matter how internal processes are implemented, (you) need to understand the extent to which the mind is reliant upon external scaffolding.” (2011, 270) If there is one thing the experts agree on, then it is this: You have to externalise your ideas, you have to write. Richard Feynman stresses it as much as Benjamin Franklin. If we write, it is more likely that we understand what we read, remember what we learn and that our thoughts make sense." - Sönke Ahrens. How to Take Smart Notes_ - Sonke Ahrens (p. 30)
Here is one sentence from the referenced prediction:
> I don't think there will be any more AI winters.
This isn't enough to qualify as a testable prediction, in the eyes of people who care about such things, because there is no good way to formulate resolution criteria for a claim that extends indefinitely into the future. See [1] for a great introduction.
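For contrast, here is a hypothetical sketch of what a resolvable version might look like. The deadline and threshold are mine, purely for illustration; the original claim contains neither:

    # Illustrative only: one way to pin the claim down so it can resolve.
    prediction = {
        "claim": "No more AI winters",
        "resolution_date": "2030-01-01",  # hypothetical deadline
        "resolves_no_if": ("global AI R&D investment falls more than 50% "
                           "year-over-year before the resolution date"),
    }

With a deadline and a measurable trigger, reasonable people can agree on whether it came true.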
Many people are impressed by this, and I can see why. Still, this much isn't surprising: the Karpathy + LLM combo can deliver quickly. But there are downsides to blazing speed.
If you dig in, there are substantial flaws in the project's analysis and framing: how a prediction is defined, how comments are assessed, overall data quality, and more. Go spelunking through the comments here and notice people asking about the methodology and checking the results.
Social science research isn't easy; it requires training, effort, and patience. I would be very happy if Karpathy added a Big Flashing Red Sign to this effect. It would raise awareness and focus community attention on what I think are the hardest and most important aspects of this kind of project: methodology, rigor, criticism, feedback, and correction.
I appreciate your intent, but this tool needs a lot of work -- maybe an entire redesign -- before it would be suitable for the purpose you seek. See discussion at [1].
Besides, in my experience, only a tiny fraction of HN comments can be interpreted as falsifiable predictions.
Instead I would recommend learning about calibration [2] and ways to improve one's calibration, which will likely lead you into literature reviews of cognitive biases and what we can do about them. Also, jumping into some prediction markets (as long as they don't become too much of a distraction) is good practice.
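If you want to see what measuring calibration looks like in practice, here is a minimal sketch using made-up forecasts: a Brier score plus simple probability buckets.

    from collections import defaultdict

    # Made-up probabilistic forecasts for yes/no events, and realized outcomes.
    forecasts = [0.9, 0.7, 0.7, 0.3, 0.8, 0.2, 0.6, 0.9]
    outcomes = [1, 1, 0, 0, 1, 0, 1, 1]

    # Brier score: mean squared error of the probabilities.
    # Lower is better; always guessing 0.5 scores 0.25.
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)
    print(f"Brier score: {brier:.3f}")

    # Bucketed calibration: of the times you said ~70%, did the event
    # actually happen ~70% of the time?
    buckets = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        buckets[round(f, 1)].append(o)
    for p in sorted(buckets):
        hits = buckets[p]
        print(f"forecast {p:.1f}: observed {sum(hits) / len(hits):.2f} (n={len(hits)})")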
Good points. To summarize: for a given set of comments, one presumably must downselect to those that can reasonably be interpreted as forecasts. I see some indicators that the creator of the project (despite his amazing reputation) skated over this part.
> The conclusion was that superforecasters' ability to filter out "noise" played a more significant role in improving accuracy than bias reduction or the efficient extraction of information.
> In February 2023, Superforecasters made better forecasts than readers of the Financial Times on eight out of nine questions that were resolved at the end of the year.[19] In July 2024, the Financial Times reported that Superforecasters "have consistently outperformed financial markets in predicting the Fed's next move"

> In particular, a 2015 study found that key predictors of forecasting accuracy were "cognitive ability [IQ], political knowledge, and open-mindedness".[23] Superforecasters "were better at inductive reasoning, pattern detection, cognitive flexibility, and open-mindedness".
I'm really not sure what you want me to take from this article? Do you contend that everyone has the same competency at forecasting stock movements?
> I'm really not sure what you want me to take from this article?
I linked to the Wikipedia page as a way of pointing to the book Superforecasting by Tetlock and Gardner. If forecasting interests you, I recommend using it as a jumping-off point.
> Do you contend that everyone has the same competency at forecasting stock movements?
No, and I'm not sure why you are asking me this. Superforecasting does not make that claim.
> I'm really not sure what you want me to take from this article?
If you read the book and properly internalize its lessons, I predict you will view what you wrote above in a different light:
> Gotta auto grade every HN comment for how good it is at predicting stock market movement then check what the "most frequently correct" user is saying about the next 6 months.
Namely, you would have many reasons to doubt such a project from the outset and would pursue other more fruitful directions.
Until someone publishes a systematic quality assessment, we're grasping at anecdotes.
It is unfortunate that the questions of "how well did the LLM do?" and "how does 'grading' work in this app?" seem to have gone out the window when HN readers see something shiny.
Yes. And the article is a perfect example of the dangerous sort of automation bias that people will increasingly slide into when it comes to LLMs. I realize Karpathy is somewhat incentivized toward this bias given his career, but he doesn't spend even a single sentence suggesting that the results need further inspection, or that they might be inaccurate.
The LLM is consulted like a perfect oracle, flawless in its ability to perform a task, and it's left at that. Its results are presented totally uncritically.
For this project, of course, the stakes are nil. But how long until this unfounded trust in LLMs works its way into high-stakes problems? The reign of deterministic machines over the past few centuries has ingrained in us a trust in the reliability of machines, a trust that should be suspended when dealing with an inherently stochastic device like an LLM.
This is unnecessarily mean. Please review https://news.ycombinator.com/newsguidelines.html
> Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.