cornel_io's comments | Hacker News

Asking questions on SO was an exercise in frustration, not "interacting with peers". I've never once had a productive interaction there; everything I've ever asked was either closed for dumb reasons or not answered at all. The library of past answers was more useful, but fell off hard for more recent tech, I assume because everyone was having the same frustrations I was and just stopped going there to ask anything.

I have plenty of real peers I interact with, I do not need that noise when I just need a quick answer to a technical question. LLMs are fantastic for this use case.


This right here: it wasn't just overmoderated, the mods were wrong-headed from the start, believing it was more important to protect some sacred archive than for users to have a good experience.

SO was so elite it basically committed suicide rather than let the influx of noobs and their noob questions and noob answers kill the site

this nails it: https://www.tiktok.com/@techroastshow/video/7518116912623045...


Yahoo Answers died a lot faster, and that heavily informed SO policy.

It's funny, because I had a similar question but wanted to be able to materialize a view in Microsoft SQL Server, and ChatGPT went around in circles suggesting invalid solutions.

There were about four possibilities I had tried before going to ChatGPT; it went through all four, and when the fourth one failed it gave me the first one again.


You can't use the free chat client for questions like that in my experience. Almost guaranteed to waste your time. Try the big-3 thinking models (ChatGPT 5.2 Pro, Gemini 3 Pro, and Claude Opus 4.5).

> this nails it

I assume you're talking about the ending where gippity tells you how awesome you are and then spits out a wrong answer?


I had the opposite experience. I learned so much from the helpful people on StackExchange sites, in computer science, programming, geology, and biology.

Me too. I learned a lot from people on SO. Sometimes the tone was rude, but overall, I was and am grateful for it and sad to see this chart.

This article claims that's false: that 8-12 reps at higher weight leads to the same result as 20+ reps at lower weight.

The research studied young, untrained men. Everyone puts on muscle at mach chicken when untrained.

And at the end of the day it's not really a tradeoff we'll need to make, anyways: my experience with e.g. Claude Code is that every model iteration gets much better at avoiding balls of mud, even without tons of manual guidance and pleading.

I get that even now it's very easy to let stuff get out of hand if you aren't paying close attention to the actual code yourself, so people assume it's some fundamental limitation of all LLMs. But it's not, much like six-fingered hands were just a temporary state, not anything deep or necessary enforced by the diffusion architecture.


Theoretical "proofs" of limitations like this are always unhelpful because they're too broad, and apply just as well to humans as they do to LLMs. The result is true, but it doesn't actually impose any limitation that matters.


You're confused about what applies to people & what applies to formal systems. You will continue to be confused as long as you keep thinking formal results can be applied in informal contexts.


When a tool call completes, the result is sent back to the LLM to decide what to do next; that's where it can decide to go do other stuff before returning a final answer. Sometimes people use structured outputs or tool calls to explicitly have the LLM decide when it's done, or allow it to send intermediate messages for logging to the user. But the simple loop there lets the LLM do plenty if it has good tools.
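For concreteness, here's a minimal sketch of that kind of loop in Python. `call_llm` and `run_tool` are hypothetical stand-ins for whatever model client and tool layer you actually use, and the message format is just illustrative:

    # Minimal sketch of the agent loop described above. `call_llm` and
    # `run_tool` are hypothetical placeholders, not a real SDK.

    def call_llm(messages):
        """Hypothetical: send the conversation to the model and return its
        reply as a dict with either a 'tool_call' ({'name': ..., 'args': ...})
        or a final 'content' string."""
        raise NotImplementedError

    def run_tool(name, args):
        """Hypothetical: execute the named tool and return its result as text."""
        raise NotImplementedError

    def agent_loop(user_prompt, max_steps=20):
        messages = [{"role": "user", "content": user_prompt}]
        for _ in range(max_steps):
            reply = call_llm(messages)
            tool_call = reply.get("tool_call")
            if tool_call is None:
                # No tool requested: the model has decided it's done.
                return reply["content"]
            # Otherwise run the tool and feed the result back in, so the
            # model can decide what to do next before answering.
            result = run_tool(tool_call["name"], tool_call["args"])
            messages.append({"role": "assistant", "tool_call": tool_call})
            messages.append({"role": "tool", "content": result})
        return "(gave up after max_steps)"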


So it returns a tool call for "continue" every time it wants to continue working? Do people implement this in different ways? It would be nice to know what method it has been trained on, if any.


The model will quickly stop tool calling on its own; in fact, I've had more trouble getting GPT5 to tool call enough. The "real" loop is driven, at each iteration, by a prompt from the "user" (which might be human or might be human-mediated code that keeps supplying new prompts).

In my personal agent, I have a system prompt that tells the model to generate responses (after absorbing tool responses) with <1>...</1> <2>...</2> <3>...</3> delimited suggestions for next steps; my TUI presents those, parsed out of the output, as a selector, which is how I drive it.
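As a rough illustration of that parsing step (not the actual TUI code; the helper name is made up), pulling the numbered suggestions out of a reply can be as simple as:

    import re

    # Pull <1>...</1>, <2>...</2>, <3>...</3> suggestions out of the model's
    # reply so a TUI can offer them as a selector. The tag convention is the
    # one set by the system prompt described above.

    def extract_suggestions(reply_text):
        """Return a list of (index, suggestion) pairs found in the reply."""
        matches = re.findall(r"<(\d+)>(.*?)</\1>", reply_text, flags=re.DOTALL)
        return [(int(i), text.strip()) for i, text in matches]

    reply = "Done. <1>Run the tests</1> <2>Refactor the parser</2> <3>Stop here</3>"
    for i, suggestion in extract_suggestions(reply):
        print(f"[{i}] {suggestion}")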


There are thousands of projects out there that use mocks for various reasons, some good, some bad, some ugly. But it doesn't matter: most engineers on those projects do not have the option to go another direction, they have to push forward.


In this context, why not refactor, and have your LLM of choice write and optimize the integration tests for you? If the crux of the argument for LLMs is that they're capable of producing sufficient-quality software at dramatically reduced cost, why not have them rewrite the tests?


Of course. But I've screened far more out because I was in a rush and got 40 resumes in that day and they just didn't pique my interest as much as the next one over.


There may be a ceiling, sure. It's overwhelmingly unlikely that it's just about where humans ended up, though.


I'm all for benchmarks that push the field forward, but ARC problems seem to be difficult for reasons that have less to do with intelligence and more to do with having a text system that works reliably with rasterized pixel data presented line by line. Most people would score 0 on it if they were shown the data the way an LLM sees it; these problems only seem easy to us because there are visualizers slapped on top.
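To make the serialization point concrete, here's a toy example (the grid and the text format are made up for illustration, not actual ARC data):

    # An ARC-style grid looks trivial rendered as colored cells, but a
    # text-only model receives it as a flat stream of digits, row by row.

    grid = [
        [0, 0, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]

    # What a human sees (via a visualizer): a vertical bar with a bump.
    # What the model sees: the string printed below, one row per line.
    serialized = "\n".join(" ".join(str(cell) for cell in row) for row in grid)
    print(serialized)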


What is a visualisation?

Our rod and cone cells could just as well be wired up in any other configuration you care to imagine. And yet, an organisation or mapping that preserves spatial relationships has been strongly preferred over billions of years of evolution, allowing us most easily to make sense of the world. Put another way, spatial feature detectors have emerged as an incredibly versatile substrate for ‘live-action’ generation of world models.

What do we do when we visualise, then? We take abstract relationships (in data, in a conceptual framework, whatever) and map them in a structure-preserving way to an embodiment (ink on paper, pixels on screen) that can wind its way through our perceptual machinery that evolved to detect spatial relationships. That is, we leverage our highly developed capability for pattern matching in the visual domain to detect patterns that are not necessarily visual at all, but which nevertheless have some inherent structure that is readily revealed that way.

What does any of this entail for machine intelligence?

On the one hand, if a problem has an inherent spatial logic to it, then it ought to have good learning gradients in the direction of a spatial organisation of the raw input. So, if specifically training for such a problem, the serialisation probably doesn’t much matter.

On the other hand: expecting a language model to generalise to inherently spatial reasoning? I’m totally with you. Why should we expect good performance?

No clue how the unification might be achieved, but I’d wager that language + action-prediction models will be far more capable than models grounded in language alone. After all, what does ‘cat’ mean to a language model that’s never seen one pounce and purr and so on? (Pictures don’t really count.)


If ChatGPT claims arsenic to be a tasty snack, OpenAI adds a p0 eval and snuffs that behavior out of all future generations of ChatGPT. Viewed vaguely in faux genetic terms, the "tasty arsenic gene" has been quickly wiped out of the population, never to return.

Evolution is much less brutal and efficient. To you death matters a lot more than being trained to avoid a response does to ChatGPT, but from the point of view of the "tasty arsenic" behavior, it's the same.

