krick's comments | Hacker News

I don't quite understand the purpose. Yes, it's clearly stated, but what do you mean "a reasonable subset of Python code" while it "cannot use the standard library"? 99.9% of the Python I write for anything ever uses the standard library and then some (requests?). What do you expect your LLM agent to write without that? A pseudo-code sorting-algorithm sketch? Why would you even want to run that?

They plan to use it for "Code Mode", which means the LLM will use this to run Python code that it writes to call tools, instead of having to load the tools up front into the LLM context window.

(Pydantic AI lead here) We’re implementing Code Mode in https://github.com/pydantic/pydantic-ai/pull/4153 with support for Monty and abstractions to use other runtimes / sandboxes.

The idea is that in “traditional” LLM tool calling, the entire (MCP) tool result is sent back to the LLM, even if it just needs a few fields, or is going to pass the return value into another tool without needing to see the intermediate value. Every step that depends on results from an earlier step also requires a new LLM turn, limiting parallelism and adding a lot of overhead.

With code mode, the LLM can chain tool calls, pull out specific fields, and run entire algorithms using tools with only the necessary parts of the result (or errors) going back to the LLM.
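For example, a code-mode script might look something like this (a rough sketch only; the tool names get_user and send_email are made-up placeholders, not anything from our actual implementation):

    # Hypothetical code-mode script; get_user/send_email stand in for MCP
    # tools exposed to the sandbox as plain Python functions.
    user = get_user(user_id=42)                     # full result stays in the sandbox
    addresses = [m["email"] for m in user["team"]]  # extract only the needed field
    for address in addresses:
        send_email(to=address, subject="Weekly report")
    # Only this summary goes back into the model's context:
    print(f"sent {len(addresses)} emails")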

These posts by Cloudflare: https://blog.cloudflare.com/code-mode/ and Anthropic: https://platform.claude.com/docs/en/agents-and-tools/tool-us... explain the concept and its advantages in more detail.


I like your effort. Time savings and strict security are real and important. In modern orchestration flows, however, a subagent handles the extra processing of tool results, so the context of the main agent is not polluted.

It's Pydantic: they're verifying types and syntax, and those don't require the stdlib. Type hints, syntax checks, likely logical issues, etc. Static type checking is good at that, but LLMs can take it to the next level, analyzing the intended data flow and finding logical bugs, or code with valid syntax and typing that still isn't what was intended.

For example, incorrect levels of indentation:

    for key, val in mydict.items():
        if key == "operation":
            logging.info("Executing operation %s", val)
        if val == "drop_table":
            self.drop_table()

This uses valid syntax, and the logging module is in the stdlib (which Monty can't use), so I assume it would ignore it or replace it with dummy code? That shouldn't prevent it from analyzing that loop and determining that the second if-block was intended to be nested under the first; the way it is written now, the key check doesn't guard the drop_table call.

In other words, if you don't want to validate proper stdlib/module usage, but proper __Python__ usage, this makes sense. Although I'm speculating on exactly what they're trying to do.
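To illustrate the kind of check I mean, here's a minimal sketch using plain CPython's ast module (again, pure speculation on my part; I don't know what Monty actually does internally):

    import ast
    import textwrap

    source = textwrap.dedent("""
        for key, val in mydict.items():
            if key == "operation":
                logging.info("Executing operation %s", val)
            if val == "drop_table":
                self.drop_table()
    """)

    # Parsing never executes the code, so no stdlib is needed at runtime.
    tree = ast.parse(source)  # raises SyntaxError on invalid code
    loop = tree.body[0]
    # Both `if` statements are siblings in the loop body; a checker could
    # flag that the second check is not guarded by the first.
    print([type(node).__name__ for node in loop.body])  # ['If', 'If']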

EDIT: I think my speculation was wrong; it looks like they might have developed this to write code for pydantic-ai: https://github.com/pydantic/pydantic-ai . I'll leave the comment above as-is though, since I think it would still be cool to have that capability in pydantic.


No, seriously, people need to be punished for submitting LLM-generated garbage without specifying that it's LLM-generated garbage. 400+ points, oh my god, people, what's wrong with you...

We buried the post for seeming obviously-LLM-generated. But please email us about these (hn@ycombinator.com) rather than posting public accusations.

There are two reasons why emailing us is better:

First, the negative consequences of a false allegation outweigh the benefits of a valid accusation.

Second and more important: we'll likely see an email sooner than we'll see a comment, so we can nip it in the bud quickly, rather than leaving it sitting on the front page for hours.


Not sure if I should email this question, but is there any chance those 400+ votes are artificially inflated?

There’s no evidence of this, but a title that’s easy to agree with can often attract upvotes from people who don’t read the article.

Hey Tom! Earnest question here - I am seeing on the order of one AI post a day on HN, sometimes more than that. It's good to know we can email in about these things, but I think most users don't understand that - certainly I didn't for the last few months as this has been going on. It would be nice if there was an affordance on the site to flag these, similar to the existing flag function.

Thanks!


Just flagging them is fine. Emailing us with the link is even better. Part of me wonders if we should have a new, specific-purpose flag for generated content, but it’s not the HN way to add new features for actions that can already be satisfied by existing UI features.

I would suggest a new feature here. I have been hesitating to flag them because I feel 'flag' is for things which are obvious rule violations. I feel a bit bad flagging a submission off the front page when I only have a vague feeling that it was AI-written. (And sometimes AI-generated content isn't that bad, e.g. if an author just used AI to translate their writing into English.)

You buried a popular post because of the public accusation or just your "hunch"?

Why not let your audience decide what it wants to read?

I say this as a long time HN reader, who feels like the community has become grumpier over the years. Which I feel like is a shame. But maybe that's just me.


You're welcome to email us about this.

It's my job to read HN posts and comments all day, every day, and these days that means spending a lot of time evaluating whether a post seems LLM-generated. In this case the post seems LLM-generated or heavily LLM-edited.

We have been asking the community not to publicly call out posts for being LLM-generated, for the reasons I explained in the latest edit of the comment you replied to. But if we're going to ask the community that, we also need to ask submitters to not post obviously-LLM-influenced articles. We've been asking that ever since LLMs became commonplace.

> I say this as a long time HN reader, who feels like the community has become grumpier over the years. Which I feel like is a shame. But maybe that's just me.

We've recently added this line to the guidelines: Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative.

HN has become grumpier, and we don't like that. But a lot of it is in reaction to the HN audience being disappointed at a lot of what modern tech companies are serving up, both in terms of products and content, and it doesn't work for us to tell them they're wrong to feel that way. We can try, but we can't force anyone to feel differently. It's just as much up to product creators and content creators to keep working to raise the standards of what they offer the audience.


Thanks Tom, I appreciate the openness. You are seemingly overriding the wishes of the community, but it's your community and you have the right to do so. I still think it's a shame, but that's my problem.

> You are seemingly overriding the wishes of the community

That's false. The overwhelming sentiment of the community is that HN should be free of LLM-generated content or content that has obvious AI fingerprints. Sometimes people don't immediately realize that an article or comment has a heavy LLM influence, but once they realize it does, they expect us to act (this is especially true if they didn't realize it initially, as they feel deceived). This is clear from the comments and emails we get about this topic.

If you can publish a new version of the post that is human-authored, we'd happily re-up it.


>> You are seemingly overriding the wishes of the community

> That's false. The overwhelming sentiment of the community is that HN should be free of LLM-generated content or content that has obvious AI fingerprints.

Yeah it is indeed, and for good reason: why would I spend time reading something the author didn't spend time thinking through and writing?

It's not that people don't like Postgres articles (otherwise, the upvotes would be much lower), but once you read a bit of the article, the LLM stench it gives off is characteristic. You know: Standard. LLM. Style. It's tiresome. Irksome. Off-putting.

What I'm wondering is: if LLMs are trained on "our" (in the wider sense of the word) writing style, and spew it back at us, what data set was it that overused this superficial emphatic style to such a degree that it's now overwhelmingly the bog-standard generative output style?


Likely a lot of Medium posts? That's my theory anyway.

I'm just sharing my thoughts as a long-time reader. Again, it's your show. You don't have to defend your actions. Thanks for all that you do.

I'd be grumpy over wasting my time on an HN post that's LLM-generated and doesn't state that it is. If I wanted this, I could be prompting any number of chat models available to me instead of meandering over here.

There are also 200+ comments on here and a good discussion IMO which is now unfortunately buried.

Feels like a net negative for the HN community.


They’re upvoting because they agree with the sentiment in the title.

That’s largely how these voting sites work.


Exposing the age-old truth of "commenters and voters don't read articles" I see

I just pasted the first paragraph into an "AI detector" app and it indeed came back as 100% AI. But I heard those things are unreliable. How did you determine this was LLM-generated? The same way?

Apart from the style of the prose, which is my subjective evaluation: this blog post is "a view from nowhere." Tiger Data is a company that sells Postgres in some way (I don't know exactly how, but it doesn't matter for the following): they could speak as themselves, and compare themselves to companies that sell other open-source databases. Or they could showcase benchmarks _they ran_.

Them saying "What you get: pgvectorscale uses the DiskANN algorithm (from Microsoft Research), achieving 28x lower p95 latency and 16x higher throughput than Pinecone at 99% recall" is marketing unless they explain how you'd replicate those numbers.

Point being: this could have been written by an LLM, because it doesn't represent any work done by Tiger Data.


For what it's worth, TigerData is the company that develops TimescaleDB, a very popular and performant time series database provided as a Postgres extension. I'm surprised that the fact that TigerData is behind it is not mentioned anywhere in the blog post. (Though, TimescaleDB is mentioned 14 times on the page).

The cynical take is: the AI doesn't know you-the-blog-post-author made TimescaleDB unless you tell it!

I don't understand your example: pgvectorscale was built and is maintained by Tiger Data

In terms of that example: they should link to how they got those numbers, and it should state the benchmark used, the machines used, what they controlled for, etc.

Just using LLMs enough I've developed a sense for the flavor of writing. Surely it could be hidden with enough work, but most of the time it's pretty blatant.

Sometimes I get an "uncanny valley" vibe when reading AI-generated text. It can be pretty unnerving.

It's got that LLM flow to it. Also liberal use of formatting. It's like it cannot possibly emphasize enough. Tries to make every word hit as hard as possible. There's no filler, nothing slightly tangential or off-topic to add color. Just many vapid points, rapid fire, as if they're the hardest-hitting truth of the decade lol

ChatGPT has a pretty obvious writing style at the moment. It's not a question of some nebulous "AI detector" gotcha, it's more just basic human pattern matching. The abundant bullet points, copious bold text, pithy one-line summarizing assertions ("In the AI era, simplicity isn't just elegant. It's essential."). There are many more tells in just how it structures its writing (e.g. "Let's address this head-on."). Hard to enumerate everything, frankly.

Was it leading with a bad analogy that gave it away?

I know everybody just wants to talk about Postgres but it’s still sad to see any sort of engagement with slop. Even though the actual article is essentially irrelevant lol

And it's upvoted 400+ on HN. This place has truly lost its way.

I mean to be fair, "Just use Postgres" will get 400 votes here without people even clicking TFA.

Is there some well established independent benchmark where I can easily (looking at a couple of graphs) compare all popular (especially self-hosted) transcription models?

Not that I am aware of unfortunately

To me, the most worrying part of the whole discussion is that your comment is pretty much the most "daring" attempt, if you can call it that, to question whether there even is a crime. Everyone else is worried about the raids (which are normal whenever there is an ongoing investigation, unfortunate as that may be for the one being investigated). And no one dares to say that, uh, perhaps making pictures on a GPU should not be considered a crime in the same sense as human trafficking or production of weapons are... Oh, wait. The latter is legal, right.

Not universally true. I had a couple of cards without a chip because, reasons. I still walked to the counter myself, because giving somebody my card feels weird.

Honestly, it is funny that people even seriously discuss this. I mean, now. It is such blatant bullshit that you'd expect this time people would say "okay, now Musk went too far, nobody would buy into this". Yeah, I wish.

You are largely right. In fact, only 444,710 of those accounts have more than 1 post, and only 229,920 have more than 5.

Let me help you. Words / [days since account creation] would be a far better metric to decide if you have an HN problem, and by that metric you easily surpass current account #1000.

In fact, if I make a proper leaderboard by that metric, you are #914. And #731 if we only consider accounts older than 60 days.

Here's the data: https://pastes.io/name-words
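For reference, the metric is just this (a toy sketch; the accounts below are made up, not taken from the linked data):

    from datetime import datetime, timezone

    # Toy illustration of the words/day metric; numbers are invented.
    accounts = [
        ("alice", 120_000, datetime(2015, 3, 1, tzinfo=timezone.utc)),
        ("bob", 90_000, datetime(2024, 6, 1, tzinfo=timezone.utc)),
    ]

    def words_per_day(total_words, created):
        age_days = max((datetime.now(timezone.utc) - created).days, 1)
        return total_words / age_days

    # Rank accounts by the metric, highest first.
    for name, words, created in sorted(
        accounts, key=lambda a: words_per_day(a[1], a[2]), reverse=True
    ):
        print(name, round(words_per_day(words, created), 2))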


Would you be interested in measuring me? Because I do indeed have an almost certified HN problem (I contacted the moderators once asking if I might be posting too much, and they said that I was indeed posting quite a lot of comments).

Not sure if I would be happy or sad if I had an HN problem tho. LOL. So maybe let's just leave it a mystery :)

Alright, time to really sleep now..... (I have been saying this for an hour, but the comments here had me hooked haha, commenting on many / (almost all?) of them)

Good night!


You are in the data I attached. #43.

Oh LOL, #43rd, nice. I would've preferred #42 tho, just because of the meaning of life, but I will take #43rd :D

thank you *closes tab*

Yeah, that was my reaction as well. Not sure what other people are so happy about. (But of course I did.)

I feel like there's a lot of confusion, so I'm not sure if anyone can clear up mine, but I'm really struggling to see the significance of this.

Obviously, like all ignorant people do, I am going to oversimplify things here. But still, to me, the "platonic idea" of Anki seems a dead simple thing. All I care about when using Anki is what's on the 2 sides of a card: a question + answer, which can only be some visual image (possibly encoded as text, possibly just a JPEG, I really don't care as long as it fits in my mobile device's memory) + optional sound. That's really it. Whether a card should be bi-directional or uni-directional is a detail of how the deck is generated/encoded, and the spaced-repetition algorithm is a detail of the app that I use to study (usually AnkiDroid, I imagine, an unaffiliated 3rd party; who even uses desktop apps nowadays?).

So, I imagine there can exist (and do exist) some additional features, like the ability to require a typed answer for a card, but those seem pretty minor, and I really don't see a lot of room for the app to evolve.

So, ultimately people need only a common .apkg format, which exists and is relatively simple (although I suppose it could've been even simpler), and a place like AnkiWeb, where people can share their decks, so the Spanish top-2000 or basic-integrals deck isn't re-invented over and over again. It's a pity that AnkiWeb isn't more open and will be even less open from now on, but as long as someone is willing to just host it (which is ultimately just paying for download traffic), it's easy to replicate, so no super-valuable IP here.

Of course, a primary use case for the Anki app is as a tool to make decks, but you could really make do with a pretty simple Python script plus a YAML/JSON/CSV/whatever metadata file to convert it into an AnkiDroid-compatible .apkg file, something like the sketch below.
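For instance, something like this with the (unofficial, third-party) genanki library; the IDs and file names here are arbitrary examples:

    import csv
    import genanki  # unofficial third-party library: pip install genanki

    # Arbitrary but fixed IDs, as genanki recommends.
    model = genanki.Model(
        1607392319,
        "Simple Q/A",
        fields=[{"name": "Question"}, {"name": "Answer"}],
        templates=[{
            "name": "Card 1",
            "qfmt": "{{Question}}",
            "afmt": "{{FrontSide}}<hr id='answer'>{{Answer}}",
        }],
    )
    deck = genanki.Deck(2059400110, "Spanish top-2000")

    # One CSV row per card: question,answer
    with open("cards.csv", newline="", encoding="utf-8") as f:
        for question, answer in csv.reader(f):
            deck.add_note(genanki.Note(model=model, fields=[question, answer]))

    genanki.Package(deck).write_to_file("spanish-top-2000.apkg")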

So, basically, who cares? What is to "own" there?


IMO, the add-on value is the repository of decks that exist (and may or may not be free).

So an app-store of sorts.

As others have said, there are some provisions in place that allegedly make it harder to do a hard land-grab and keep people from freely sharing decks, but to me, even if it were so, I would not be too concerned.

In my opinion, the very act of creating one's deck is a key part of the learning. Maybe it's different for learning vocabulary, but as you said, it will be very hard to make those hard to share.

Learning from a deck generated by someone else has never been as effective for me, so I think using those gives a false sense of time saving.


> All what I care about when using Anki is what's on the 2 sides of a Card, a question + answer

This is damn near the least effective way to use Anki. Cloze deletion alone surpasses this.

Also, Anki is an SRS. The value of Anki is in the rescheduling, not in the fact that it's flashcards. And Anki has implemented the FSRS rescheduling algorithm, which is just one more feature that not all flashcard apps offer.

