Your best guess is that the true rate is 20x higher than the observed rate? This seems unlikely to me given the number of samples (outside of systematic biases towards certain types of memory safety bugs that probably apply to C++ code too). 10 per hundred MLOC is closer to what I would have guessed too, but that is because I've historically been very conservative with my assumptions about the memory unsafety rate per unsafe LOC being similar to that of C++. The evidence here suggests that the true rate is probably much lower than that.
I'm making a conservative guess, which is why I said 10 or less (10 or fewer??). So the improvement is at least a hundredfold. I might say 5 or less instead. I think the exact rate is not so important; either way, it's clear that Rust is a boon.
That's not what happened at all. Graydon voluntarily stepped down because he didn't want to be in the role of BDFL. It's true that he wasn't a huge fan of the "C++-ification" of the language, but he wasn't pushed out or anything, and definitely could have stayed on steering the project as long as he wanted to. I think there are a number of other languages that would have benefited from a similar approach, actually.
Is it a usual thing you do that when you're given data about a literal thousandfold improvement, in a context where there are well-understood theoretical and practical reasons why you might have expected to see such an improvement, you make up reasons why it is actually not an improvement at all, without either investigating to see whether those reasons are actually true or demonstrating that even if they were true, they could possibly explain more than a tiny fraction of the improvement?
I usually am skeptical about a literal thousandfold improvement, yes. And I'm not saying it's impossible, but rather that the data and the way it's presented have inherent biases. It's on the people making grandiose claims to prove them.
That the safe subset of Rust is indeed safe, and that unsafe code can be encapsulated like this, has already been formally verified. This is an empirical demonstration that matches these formal results across a large company with a large amount of code (technically, it exceeds them, but only under the assumption that memory safety issues per line of unsafe Rust occur at the same rate as per line of C, which is really an unwarranted simplifying assumption).
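To make "encapsulated" concrete, here is a toy sketch of my own (not code from the Google work, and the type is purely illustrative): the unsafe block is buried behind an API that safe callers can't misuse, so the audit surface is this one module rather than every call site.

```rust
/// A fixed-capacity buffer that uses unsafe code internally but exposes
/// only a safe API. Callers cannot cause out-of-bounds writes no matter
/// how they use it; only this module needs a memory-safety audit.
pub struct FixedBuf {
    data: Box<[u8]>,
    len: usize,
}

impl FixedBuf {
    pub fn with_capacity(cap: usize) -> Self {
        FixedBuf { data: vec![0u8; cap].into_boxed_slice(), len: 0 }
    }

    /// Appends a byte, returning false if the buffer is full.
    pub fn push(&mut self, byte: u8) -> bool {
        if self.len == self.data.len() {
            return false; // the bounds check that keeps the unsafe block sound
        }
        // SAFETY: `self.len < self.data.len()` was checked above, so the
        // pointer write stays inside the allocation.
        unsafe {
            *self.data.as_mut_ptr().add(self.len) = byte;
        }
        self.len += 1;
        true
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.data[..self.len]
    }
}

fn main() {
    let mut buf = FixedBuf::with_capacity(2);
    assert!(buf.push(1));
    assert!(buf.push(2));
    assert!(!buf.push(3)); // rejected safely instead of overflowing
    assert_eq!(buf.as_slice(), &[1u8, 2u8]);
}
```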
Do you really believe that "they're not actively looking for memory safety issues in Rust" (1) is true (at least outside of Google, there is actually a ton of security work done specifically targeting just the unsafe blocks, since those are obviously where the memory safety issues lie) or (2) could possibly be responsible for a literal thousandfold reduction in memory safety issues? Remember that the Rust code is often integrated with C++ code--there is not necessarily a way to just test the C++ part even if you wanted to. Additionally, Google has explicitly prioritized code that interacts with untrusted data (like parsers and networking code), meaning it's likely to be easier to fuzz most of this Rust code than most new C++ code.
Also remember that, again, there are mechanized proofs of memory safety for a large subset of the safe portion of Rust, which constitutes 96% of the code under consideration here. The rate of memory safety bugs would have to be 25x as high per LOC in unsafe Rust code as in C code for the number of vulnerabilities to match. It would be far more shocking if we didn't see a dramatic reduction. Google is empirically demonstrating that the observed rate of memory safety bugs per line of unsafe Rust is actually far lower than per line of C, but my point is that even if you think that is the result of bias or of them not applying the same scrutiny to Rust code (something that is certainly not true of Rust vs. C code in the wild), the effect of this underrepresentation cannot possibly explain most of the reduction they observe.
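To spell out the arithmetic behind that 25x figure: say C code has some baseline rate r of memory safety bugs per LOC. If 96% of the Rust code is in the safe subset (call its rate ~0, per the proofs above) and 4% is unsafe with rate k*r, the blended Rust rate is 0.04*k*r. That only equals the C baseline r when k = 1/0.04 = 25, i.e. unsafe Rust would need 25 times the per-line bug rate of C just to break even with C overall.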
Google deployed numerous state-of-the-art mitigations prior to adopting Rust and still found that 70% of their CVEs were due to memory safety issues--your assertion that they are engaged in motivated reasoning and just wanted Rust to work out is pretty ill-founded. In fact, when I worked at Google prior to Rust's release, they were strongly averse to adopting any new language and believed that good engineering practices, automation, and a rigorous review process always outweighed the benefits of adopting a new language beyond their core ones, whatever its purported reliability or performance benefits. Security researchers are, as a rule, highly skeptical of these sorts of claims and have a lot of say at Google. They themselves changed their minds based on this sort of internal evidence.
I agree that skepticism is warranted, because we are constantly being sold things by industry. At a certain point, though, when the effect size is massive and persistent and the mechanism extremely clear, that skepticism (not in general, but of a particular claim) becomes the unscientific position. We are well past that point with Rust wrt memory safety.
It wasn't really possible. We had neither the PL techniques nor the computational power to make something like Rust work at the time. All the answers people are throwing around showing it would have been possible rely on a garbage collector and require a runtime, or have many other unacceptable compromises (e.g. no use after free because you aren't allowed to free).
The actual best response would be to run any "unsupported" codecs in a WASM sandbox. That way you are not throwing away work, Google can stop running fuzzers against random formats from 1995, and you can legitimately say that the worst that can happen with these formats is a process crash. Everybody wins.
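For concreteness, here is a minimal sketch of what that could look like with the wasmtime crate (plus anyhow), assuming the legacy codec has already been compiled ahead of time to a WASM module exposing a hypothetical `decode(ptr, len) -> status` export. This is my own illustration of the approach, not how ffmpeg or any browser actually structures it; a real integration would also need shared memory or WASI plumbing for the media buffers.

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // Hypothetical module path: the 1995-era codec compiled to WASM.
    let module = Module::from_file(&engine, "legacy_codec.wasm")?;
    let mut store = Store::new(&engine, ());

    // No imports are provided, so the sandboxed codec gets no ambient
    // access to the filesystem, network, or host memory.
    let instance = Instance::new(&mut store, &module, &[])?;

    // A memory corruption bug inside the codec stays confined to the guest's
    // linear memory; an out-of-bounds access traps and surfaces here as an
    // Err rather than compromising the host process.
    let decode = instance.get_typed_func::<(i32, i32), i32>(&mut store, "decode")?;
    let status = decode.call(&mut store, (0, 0))?;
    println!("decode returned {status}");
    Ok(())
}
```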
It feels like maybe people do not realize that Google is not the only company that can run fuzzers against ffmpeg? Attackers are also highly incentivized to do so and they will not do you the courtesy of filing bug reports.
I would never claim that we can reliably detect all AI generated text. There are many ways to write text with LLM assistance that is indistinguishable from human output. Moreover, models themselves are extremely bad at detecting AI-generated text, and it is relatively easy to edit these tells out if you know what to look for (one can try to prompt them out too, though success is more limited there). I am happy to make a much narrower claim, however: each particular set of models, when not heavily prompted to do otherwise, has a "house style" that's pretty easily identifiable by humans in long-form writing samples, and content written with that house style has a very high chance of being generated by AI. When text is written in this house style, it is often a sign that not only were LLMs used in its generation, but the person doing the generation did not bother to do much editing or use a more sophisticated prompt that wouldn't result in such obvious tells, which is why the style is commonly associated with "slop."
I find it interesting that you believe this claim is wildly conspiratorial, or that you think the difficulty of reliably detecting AI-generated text at scale is evidence that humans can't do pretty well at this much more limited task. Do you also find claims that AIs are frequently sycophantic in ways that humans are not, or that they will use phrases like "you're absolutely right!" far more than a human would unless prompted otherwise (which are the exact same type of narrow claim), similarly conspiratorial? i.e., is your assertion that people would have difficulty differentiating between a real human's response to a prompt and Claude's response to a prompt when there was no specific pre-prompt trying to control the writing style of the response?
On the other fork where I responded to your claims with a direct and detailed response, you insisted that my comment “isn't really that interesting” and just disengaged. I’m not going to write another detailed explanation of why your “slop === AI” premise is flawed. Go reread the other fork if you’ve decided you’re interested.
> I find it interesting that you believe this claim is wildly conspiratorial
I don’t believe it’s wildly conspiratorial. I believe it’s foolishly conspiratorial. There’s some weird hubris in believing that you (and whatever group you identify as “us”) are able to deterministically identify AI text when experts can’t do it. If you could actually do it you’d probably sell it as a product.
> believing that you (and whatever group you identify as “us”) are able to deterministically identify AI text
I think you will find the OP said no such thing. They instead said they identified a mixture of writing styles consistent with a human author and an LLM. The OP says nothing about deterministically identifying LLMs, only that the style of specific sections is consistent with LLMs leading to the conclusion.
Thanks for adding the quote; that is a different part of the post than I was focusing on.
I still think that's a far cry from deterministically recognizing LLM-generated text. At least the way I would understand that would be an algorithmic test with very low rates of both false positives and false negatives. Instead I understood the OP to be saying that people have an intuitive sense of LLM generated text with a relatively low false negative rate.
I am certain that the skill varies widely between individuals, but in principle there is no reason to suspect that with training humans could not become quite good at recognizing low effort (no attempt at altering style) LLM generated content from the major models. In principle it is no different than authorship analysis used in digital forensics, a field that shows fairly high accuracy under similar conditions.
I am pretty much certain that parts of it were LLM-written, yes. This doesn't imply that the entire blog post is LLM-generated. If you're a good Bayesian and object to my use of "100%" feel free to pretend that I said something like "95%" instead. I cannot rule out possibilities like, for example, a human deliberately writing in the style of an LLM to trick people, or a human who uses LLMs so frequently that their writing style has become very close to LLM writing (something I mentioned as a possibility in an earlier reply; for various reasons, including the uneven distribution of the LLM-isms, I think that's unlikely here).
Human experts can reliably detect some kinds of long-form, AI-generated text using exactly the same sorts of cues I've outlined: https://arxiv.org/html/2501.15654v1. You may take issue with the quality of the paper, but there have been very few studies like this and this one found an extremely strong effect.
I am making an even more limited claim than the article, which is only that it's possible for "experts" (i.e. people who frequently interact with LLMs as part of their day jobs) to identify AI generated text in long-form passages in a way that has very few false positives, not classify it perfectly. I've also introduced the caveat that this only applies to AI generated text that has received minimal or no prompting to "humanize" the writing style, not AI generated text in general.
If you would like to perform a higher-quality study with more recent models, feel free (it's only fair that I ask you to do an unreasonable amount of work here given that your argument appears to be that if I don't quit my lucrative programming job and go manually classify text for pennies on the dollar, it proves that it can't be done).
The reason this isn't offered as a service is that it makes no economic sense to do so using humans, not because it's impossible, as you claim. This kind of "human" detection mechanism does not scale the way generation does. The cues that I rely on are also pretty easy to eliminate if you know someone is looking for them. This means that heuristics do not work reliably against someone actively trying to avoid human detection, or a human deliberately trying to sound like an LLM (I feel the need to reiterate this as many of the counterarguments to what I'm saying are to claims of this form).
> I’m not going to write another detailed explanation of why your “slop === AI” premise is flawed.
This isn't a claim that I made. I believe that text written with LLM assistance is not necessarily slop, and that slop is not necessarily AI generated. The only assertion I made regarding slop is that being written with LLM assistance with minimal prompting or editing is a strong predictor of slop, and that the heuristics I'm using (if present in large quantities) are a strong predictor of an article being written with LLM assistance with minimal prompting or editing. i.e., I am asserting that these kinds of heuristics work pretty well on articles generated by people who don't realize (or care) that there are LLM "tells" all over their work. The fact that many of the articles posted to HN are being accused of being LLM generated could certainly indicate that this is all just a massive witch hunt, but given the acknowledged popularity of ChatGPT among the general population and the fact that experts can pretty easily identify non-humanized articles, I think "a lot of people are using LLMs in the process of generating their blog posts, and some sizable fraction of those people didn't edit the output very much" is an equally compelling hypothesis.
That’s a really interesting study. Thanks for sharing that.
This seems like the kind of thing to share when making a bold claim about being able to detect AI with high confidence. This is a lot more weighty than not so subtly asserting that I’m too dumb to recognize AI.
> a human deliberately trying to sound like an LLM (I feel the need to reiterate this as many of the counterarguments to what I'm saying are to claims of this form).
I assume this is a reference to me. To be clear, I was never referring to humans specifically attempting to sound like AI. I was saying that a lot of formulaic stuff people attribute to AI is simply following the same patterns humans started, and while it might be slop, it’s not necessarily AI slop. Hence the AITA rage bait example.
Thanks for engaging thoughtfully! FWIW I actually looked this article up because I was interested in your claim that even experts couldn't perform these tasks, something I hadn't heard before--I'm not actually ignoring what you're saying. It's actually very nice to have a productive conversation on HN :)
Parts of it were 100% LLM written. Like it or not, people can recognize LLM-generated text pretty easily, and if they see it they are going to make the assumption that the rest of the article is slop too.
I can point to individual sentences that were clearly generated by AI (for example, numerous instances of this parallel construction, "No warning. No error. Just different methods that make no sense.", "Not corrupted. Not misaligned. Not reading wrong offsets.", "Not a segfault. Not the T_NONE error from #1079. There it is, the exact error from production"). The style is list-heavy, including lists used for conditionals, and full of random bolding, both characteristic of AI-generated text. And there are a number of other tells as well.
The reason I don't usually bother to bring these specific things up is that I already know the response, which is just going to be you arguing that a human could have written this way, too. Which is true. The point is that if you read the collective whole of the article, it is very clear that it was composed with the aid of AI, regardless of whether any single part of it could be defensibly written by a human. I'd add that sometimes, the writing of people who interact heavily with LLMs all day starts to resemble LLM writing (a phenomenon I don't think people talk enough about), but usually not to this extent.
This doesn't mean that the entire article was written by an LLM, nor does it mean that there's not useful information in it. Regardless, given the amount of low effort LLM-generated spam that makes it onto HN, I think it is fairly defensible to use "this was written with the help of an LLM, and the person posting it did not even bother to edit the article to make that less obvious" as a heuristic to not bother wasting more time on an article.
“not A, not B, not C” and “not A, not B, but C” are extremely common constructions in general. So common in fact that you did it in this exact reply.
“This doesn't mean that the entire article was written by an LLM, nor does it mean that there's not useful information in it. Regardless, given the amount of low effort LLM-generated spam that makes it onto HN, I think it is fairly defensible”
> The style is list-heavy, including lists used for conditionals, and full of random bolding, both characteristic of AI-generated text
This is just blogspam-style writing. Short snippets that are easy to digest, with lists to break it up and bold keywords to grab attention. This style was around for years before ChatGPT showed up. LLMs probably do this so much specifically because they were trained on so much blog content. Hell, I've given feedback to multiple humans to cut out the distracting bold stuff in their communications because it becomes a distraction.
Blog spam doesn’t intersperse the drivel with literary narrative beats and subsection titles that sound like sci-fi novels. The greasy mixture of superficially polished but substantively vacuous is much more pronounced in LLM output than even the most egregious human-generated content marketing, in part because the cognitive entity in the latter case is either too smart, or too stupid, to leave such a starkly evident gap.
Again, this is why I don't bother explaining why it's very obvious to us. People like you immediately claim that human writing is like this all the time, which it's not. Suffice it to say that if a large number of people are immediately flagging something as AI, it is probably for a reason.
My reply wasn't an instance of this syntactic pattern, and the fact that you think it's the same thing shows that you are probably not capable of recognizing the particular way in which LLMs write.
> Again, this is why I don't bother explaining why it's very obvious to us.
The thing is, your premise is that you can identify certain patterns as being indicative of AI. However, those exact same patterns are commonly used by humans. So what you're actually claiming is some additional insight that you can't share, because your premise does not hold up on its own. What you're really claiming is "I know it when I see it".
Let me give you a related example. If you go to any of the “am I the asshole” subreddits, you will encounter the exact same story format over and over: “Other person engages in obviously unacceptable behavior. I do something reasonable to stop the unacceptable behavior. People who should support me support other person instead. Am I the asshole?” The comments will be filled with people either enraged on behalf of the author or who call it AI.
The problem with claiming that it's AI is that the sub was full of the exact same garbage before AI showed up. The stories have always been the same bullshit rage bait. So it's not technically wrong to say it looks like AI, because it certainly could be. But it could also be human-generated rage bait because it's indistinguishable. My guess is that some of the sub is totally AI. And a chunk of it is from humans engaged in shitty creative writing.
When you look at generic click-bait/blogspam patterns that humans have been using for decades now and call it AI, all you’re doing is calling annoying blog writing AI. Which it could be, but it could also not be. Humans absolutely write blogs like this and have for longer than LLMs have been widely available.
> My reply wasn't an instance of this syntactic pattern, and the fact that you think it's the same thing shows that you are probably not capable of recognizing the particular way in which LLMs write.
It was absolutely an example of the pattern, just more wordy. Spare me the ad hominem.
Your “you couldn’t understand” and “obvious to us” stuff is leaning into conspiracy theory type territory. When you believe you have some special knowledge, but you don’t know how to share it with others, you should question whether that knowledge is actually real.
> It was absolutely an example of the pattern, just more wordy. Spare me the ad hominem.
LLMs simply don't generate the syntactic pattern I used consistently, but they do generate the pattern in the article. I'm not really sure what else to tell you.
The rest of your post isn't really that interesting to me. You asked why nobody was giving specific examples of why it was generated. I told you some of the specific reasons we believe this article was generated with the assistance of an LLM (not all--there are many other sentences that are more borderline which only slightly increase the probability of LLM generation in isolation, which aren't worth cataloguing except in a context where people genuinely want to know why humans think a post reads as AI-generated and are not just using this as an excuse to deliver a pre-prepared rant), mentioned that the reason people don't typically bother to bring it up is that we know people who demand this sort of thing tend to claim without evidence that humans write in the exact same way all the time, and you proceeded to do exactly that. Next time you don't get a response when you ask for evidence, consider that it might be because we don't particularly want to waste time responding to someone who isn't interested in the answer.
This is what really scares me about people using AI. It will confidently hallucinate studies and quotes that have absolutely no basis in reality, and even in your own field you're not going to know whether what it's saying is real or not without following up on absolutely every assertion. But people are happy to completely buy its diagnoses of rare medical conditions based on what, exactly?
GPT-5 thinking is one of the biggest offenders and it's quite incredible to me that you think it doesn't hallucinate. It makes me strongly suspect your own judgment is impaired. Also, what do you mean asking for "reproducible examples"? Is it somehow not a valid example if it only sometimes makes up citations?