I don't quite understand the purpose. Yes, it's clearly stated, but, what do you mean "a reasonable subset of Python code" while "cannot use the standard library"? 99.9% of Python I write for anything ever uses standard library and then some (requests?). What do you expect your LLM-agent to write without that? A pseudo-code sorting algorithm sketch? Why would you even want to run that?
They plan to use it for "Code Mode", which means the LLM will use this to run Python code that it writes to call tools, instead of having to load the tools up front into the LLM context window.
The idea is that in “traditional” LLM tool calling, the entire (MCP) tool result is sent back to the LLM, even if it just needs a few fields, or is going to pass the return value into another tool without needing to see the intermediate value. Every step that depends on results from an earlier step also requires a new LLM turn, limiting parallelism and adding a lot of overhead.
With code mode, the LLM can chain tool calls, pull out specific fields, and run entire algorithms using tools with only the necessary parts of the result (or errors) going back to the LLM.
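A minimal sketch of what that looks like in practice. The tool names here (`search_orders`, `get_customer`) are hypothetical stand-ins for whatever MCP tools are exposed, stubbed out so the snippet runs on its own:

```python
# Illustrative sketch of "code mode": the model writes a short script that
# chains tool calls, so only the final small value returns to the LLM context.
# `search_orders` and `get_customer` are hypothetical stand-ins for real tools.

def search_orders(status):
    # Stub: a real tool would return a large JSON payload.
    return [{"id": 7, "customer_id": 42, "total": 99.5, "items": ["..."]}]

def get_customer(customer_id):
    # Stub: again, the real result could be much bigger than what we need.
    return {"id": customer_id, "email": "ada@example.com", "history": ["..."]}

# The LLM-authored script: chain the calls and keep only the field it needs.
orders = search_orders(status="open")
emails = [get_customer(o["customer_id"])["email"] for o in orders]
print(emails)  # only this small result goes back into the model's context
```

The intermediate payloads (full order and customer records) never re-enter the context window; only `emails` does.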
I like your effort. Time savings and strict security are real and important. In modern orchestration flows, however, a subagent handles the extra processing of tool results, so the context of the main agent is not polluted.
It's pydantic; they're verifying types and syntax, and those don't require the stdlib. Type hints, syntax checks, likely logical issues, etc. Static type checking is good at that, but LLMs can take it to the next level, where they analyze the intended data flow and find logical bugs, or code with valid syntax and typing that doesn't match the intended logic.
For example, incorrect levels of indentation. Let me use dots instead of spaces because of HN formatting:
for key,val in mydict.items():
..if key == "operation":
....logging.info("Executing operation %s",val)
..if val == "drop_table":
....self.drop_table()
This uses valid syntax, and since logging is part of the stdlib (which isn't available), I assume it would ignore that call or replace it with dummy code. That shouldn't prevent it from analyzing the loop and determining that the second if-block was intended to be nested under the first, and that the way it is written now, the key check isn't done.
In other words, if you don't want to validate proper stdlib/module usage, but proper __Python__ usage, this makes sense. Although I'm speculating on exactly what they're trying to do.
EDIT: I think my speculation was wrong; it looks like they might have developed this to write code for pydantic-ai: https://github.com/pydantic/pydantic-ai . I'll leave the comment above as-is though, since I think it would still be cool to have that capability in pydantic.
No, seriously, people need to be punished for submitting LLM-generated garbage without specifying that it's LLM-generated garbage. 400+ points, oh my god, people, what's wrong with you...
We buried the post for seeming obviously-LLM-generated. But please email us about these (hn@ycombinator.com) rather than posting public accusations.
There are two reasons why emailing us is better:
First, the negative consequences of a false allegation outweigh the benefits of a valid accusation.
Second and more important: we'll likely see an email sooner than we'll see a comment, so we can nip it in the bud quickly, rather than leaving it sitting on the front page for hours.
Hey Tom! Earnest question here - I am seeing on the order of one AI post a day on HN, sometimes more than that. It's good to know we can email in about these things, but I think most users don't understand that - certainly I didn't for the last few months as this has been going on. It would be nice if there was an affordance on the site to flag these, similar to the existing flag function.
Just flagging them is fine. Emailing us with the link is even better. Part of me wonders if we should have a new, specific-purpose flag for generated content, but it’s not the HN way to add new features for actions that can already be satisfied by existing UI features.
I would suggest a new feature here. I have been hesitating on flagging them because I feel 'flag' is for things which are obvious rule violations. I feel a bit bad flagging a submission off the front page when I only have a vague feeling that it was AI-written. (And sometimes AI generated content isn't that bad, i.e. if an author just used AI to translate their writing into English.)
You buried a popular post because of the public accusation or just your "hunch"?
Why not let your audience decide what it wants to read?
I say this as a long time HN reader, who feels like the community has become grumpier over the years. Which I feel like is a shame. But maybe that's just me.
It's my job to read HN posts and comments all day, every day, and these days that means spending a lot of time evaluating whether a post seems LLM-generated. In this case the post seems LLM-generated or heavily LLM-edited.
We have been asking the community not to publicly call out posts for being LLM-generated, for the reasons I explained in the latest edit of the comment you replied to. But if we're going to ask the community that, we also need to ask submitters to not post obviously-LLM-influenced articles. We've been asking that ever since LLMs became commonplace.
> I say this as a long time HN reader, who feels like the community has become grumpier over the years. Which I feel like is a shame. But maybe that's just me.
We've recently added this line to the guidelines: Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative.
HN has become grumpier, and we don't like that. But a lot of it is in reaction to the HN audience being disappointed at a lot of what modern tech companies are serving up, both in terms of products and content, and it doesn't work for us to tell them they're wrong to feel that way. We can try, but we can't force anyone to feel differently. It's just as much up to product creators and content creators to keep working to raise the standards of what they offer the audience.
Thanks Tom, I appreciate the openness. You are seemingly overriding the wishes of the community, but it's your community and you have the right to do so. I still think it's a shame, but that's my problem.
> You are seemingly overriding the wishes of the community
That's false. The overwhelming sentiment of the community is that HN should be free of LLM-generated content or content that has obvious AI fingerprints. Sometimes people don't immediately realize that an article or comment has a heavy LLM influence, but once they realize it does, they expect us to act (this is especially true if they didn't realize it initially, as they feel deceived). This is clear from the comments and emails we get about this topic.
If you can publish a new version of the post that is human-authored, we'd happily re-up it.
>> You are seemingly overriding the wishes of the community
> That's false. The overwhelming sentiment of the community is that HN should be free of LLM-generated content or content that has obvious AI fingerprints.
Yeah it is indeed, and for good reason: why would I spend time reading something the author didn't spend time thinking through and writing?
It's not that people don't like Postgres articles (otherwise, the upvotes would be much lower), but once you read a bit of the article, the LLM stench it gives off is characteristic. You know: Standard. LLM. Style. It's tiresome. Irksome. Off-putting.
What I'm wondering is, if LLMs are trained on "our" (in the wider sense of the word) writing style, and spew it back at us, what data set was it that overused this superficial emphatic style to such a degree, that it's now overwhelmingly the bog-standard generative output style?
I’d be grumpy over wasting my time on an HN post that’s LLM generated which doesn’t state that it is. If I wanted this, I could be prompting N number of chat models available to me instead of meandering over here.
I just pasted the first paragraph in an "AI detector" app and it indeed came back as 100% AI. But I heard those things are unreliable. How did you determine this was LLM-generated? The same way?
Apart from the style of the prose, which is my subjective evaluation: This blog post is "a view from nowhere." Tiger Data is a company that sells postgres in some way (don't know, but it doesn't matter for the following): they could speak as themselves, and compare themselves to companies that sell other open source databases. Or they could showcase benchmarks _they ran_.
Them saying: "What you get: pgvectorscale uses the DiskANN algorithm (from Microsoft Research), achieving 28x lower p95 latency and 16x higher throughput than Pinecone at 99% recall" is marketing unless they give how you'd replicate those numbers.
Point being: this could have been written by an LLM, because it doesn't represent any work-done by Tiger Data.
For what it's worth, TigerData is the company that develops TimescaleDB, a very popular and performant time series database provided as a Postgres extension. I'm surprised that the fact that TigerData is behind it is not mentioned anywhere in the blog post. (Though, TimescaleDB is mentioned 14 times on the page).
In terms of that example: they should link to how they got those numbers, and it should state the benchmark used, the machines used, what they controlled for etc.
Just using LLMs enough I've developed a sense for the flavor of writing. Surely it could be hidden with enough work, but most of the time it's pretty blatant.
It's got that LLM flow to it. Also liberal use of formatting. It's like it cannot possibly emphasize enough. Tries to make every word hit as hard as possible. There's no filler, nothing slightly tangential or off topic to add color. Just many vapid points rapid fire, as if they're the hardest hitting truth of the decade lol
ChatGPT has a pretty obvious writing style at the moment. It's not a question of some nebulous "AI detector" gotcha, it's more just basic human pattern matching. The abundant bullet points, copious bold text, pithy one line summarizing assertions ("In the AI era, simplicity isn’t just elegant. It’s essential."). There are so many more in just how it structures its writing (eg "Let’s address this head-on."). Hard to enumerate everything, frankly.
I know everybody just wants to talk about Postgres but it’s still sad to see any sort of engagement with slop. Even though the actual article is essentially irrelevant lol
Is there some well established independent benchmark where I can easily (looking at a couple of graphs) compare all popular (especially self-hosted) transcription models?
To me, the most worrying part of the whole discussion is that your comment is pretty much the most "daring", if you can call it that, attempt to question if there even is a crime. Everyone else is worried about raids (which are normal whenever there is an ongoing investigation, unfortunate as it may be to the one being investigated). And no one dares to say, that, uh, perhaps making pictures on GPU should not be considered a crime in the same sense as human-trafficking or production of weapons are... Oh, wait. The latter is legal, right.
Not universally true. I had a couple of cards without a chip because, reasons. I still walked to the counter myself, because handing somebody my card feels weird.
Honestly, it is funny that people even seriously discuss this. I mean, now. It is such blatant bullshit that you'd expect this time people would say "okay, now Musk went too far, nobody would buy into this". Yeah, I wish.
Let me help you. Words / [days since account creation] would be a far better metric to decide if you have an HN problem, and by that metric you easily surpass current account #1000.
In fact, if I make a proper leaderboard by that metric, you are #914. And #731 if we only consider accounts older than 60 days.
Would you be interested in measuring me? Because I do indeed have an almost certified HN problem (I contacted the moderators once and they confirmed that I was posting quite a lot of comments)
Not sure if I would be happy or sad to have an HN problem tho. LOL. So maybe let's just leave it a mystery :)
Alright, time to really sleep now..... (I've been saying this for an hour, but the comments here had me hooked haha, replying to many/(almost all?) of them)
I feel like there's a lot of confusion, so not sure if anyone can clear up mine, but I'm really struggling to see a significance of this.
Obviously, like all ignorant people do, I am going to oversimplify things here. But still, to me, the "platonic idea" of Anki seems like a dead simple thing. All I care about when using Anki is what's on the two sides of a card: a question + answer, which can only be some visual image (possibly encoded as text, possibly just a JPEG, I really don't care as long as it fits in my mobile device's memory) + optional sound. That's really it. Whether a card should be bi-directional or uni-directional is a detail of how the deck is generated/encoded, and the spaced repetition algorithm is a detail of the app that I use to study (so, usually AnkiDroid, I imagine: an unaffiliated 3rd party; who even uses desktop apps nowadays?).
So, I imagine there can exist (and do exist) some minor additional features, like an ability to require a typed answer for a card, but it seems pretty minor, and I really don't see a lot of room for the app to evolve.
So, ultimately people need only a common .apkg format, which exists and is relatively simple (although I suppose it could've been even simpler), and a place like AnkiWeb, where people can share their decks, so the Spanish top-2000 or basic integrals deck isn't re-invented over and over again. It's a pity that AnkiWeb isn't more open and will be even less open from now on, but as long as someone is willing to just host it (which is ultimately just paying for download traffic) it's easy to replicate, so no super-valuable IP here.
Of course, a primary use-case for the Anki app is as a tool to make decks, but you could really get by with a pretty simple Python script + a YAML/JSON/CSV/whatever metadata file converted to an AnkiDroid-compatible .apkg file.
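The "simple script + metadata file" idea can be sketched with just the stdlib. This only covers the parsing half; actually writing a valid .apkg (a zip holding an SQLite collection) is what a packaging library such as genanki handles, and the column names here are just an assumed convention:

```python
import csv
import io

# Minimal sketch: parse front/back card pairs from CSV metadata.
# Writing the actual .apkg file (a zip containing an SQLite collection)
# would be handed off to a library such as genanki.

def load_cards(csv_text):
    # Assumes a header row with "front" and "back" columns.
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["front"], row["back"]) for row in reader]

deck_csv = """front,back
hola,hello
adios,goodbye
"""

cards = load_cards(deck_csv)
```

From here, each `(front, back)` tuple becomes one note in the generated deck.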
IMO, the add-on value is the repository of decks that exist (and may or may not be free).
So an app-store of sorts.
As others have said, there are some provisions in place that allegedly make it harder to do a hard landgrab and keep people from freely sharing decks, but to me, even if it were so, I would not be too concerned.
In my opinion, the very act of creating one's deck is a key part of the learning. Maybe it's different for learning vocabulary, but as you said, it will be very hard to make those hard to share.
Learning a deck generated by someone else has never been as effective for me, so I think using those gives a false sense of time saving.
> All what I care about when using Anki is what's on the 2 sides of a Card, a question + answer
This is damn near the least effective way to use Anki. Cloze deletion alone surpasses this.
Also, Anki is SRS. The value of Anki is in the rescheduling, not in the fact it's flashcards. And Anki has implemented the FSRS rescheduling algorithm, which is just one more feature not all flashcard apps do.
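For readers unfamiliar with cloze deletion: the idea can be sketched as a simple transform over Anki-style `{{cN::...}}` markup. This is only an illustration of the concept, not Anki's actual renderer (real Anki generates one card per `cN` number, hiding only that span):

```python
import re

# Minimal sketch of cloze deletion using Anki-style {{cN::answer}} markup.
# Illustration only: real Anki makes a separate card for each cN index.

CLOZE = re.compile(r"\{\{c\d+::(.*?)\}\}")

def cloze_question(text):
    # Replace each cloze span with a blank to form the prompt side.
    return CLOZE.sub("[...]", text)

def cloze_answers(text):
    # The hidden spans are the answers to recall.
    return CLOZE.findall(text)

note = "The {{c1::mitochondria}} is the {{c2::powerhouse}} of the cell."
question = cloze_question(note)
answers = cloze_answers(note)
```

The appeal over plain front/back cards is that the prompt keeps its surrounding context, so one sentence can yield several recall targets.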