But how do you know you're getting the correct picture from that throwaway UI? A little while back there was a blog post praising AI, in which the author showed off his vibe-coded earth-viewer app that supposedly used Vulkan to render inside a GUI window. Unfortunately, that wasn't the case: the AI had just copied code from somewhere and inserted a rudimentary software renderer. The AI couldn't do what was asked because it had seldom been done. Nobody on the internet had ever discussed that particular objective, so it wasn't in the training set.
The lesson to learn is that these are "large language models." That means they can regurgitate what someone else has done before textually, but not actually create something novel. So it's fine if someone on the internet has posted or talked about a quick UI in whatever particular toolkit you're using to analyze data. But it'll throw out BS if you ask for something brand new. I suspect a lot of AI users are web developers who write a lot of repetitive rote boilerplate, and that's the kind of thing these LLMs really thrive on.
> But how do you know you're getting the correct picture from that throwaway UI?
You get the AI to generate code that lets you spot-check individual data points :-)
Most of my work these days is in fact that kind of code. I'm working on something research-y that requires a lot of visualization, and at this point I've actually produced more throwaway code than code in the project.
Here's an example: I had ChatGPT generate some relatively straightforward but cumbersome geometric code. Saved me 30 - 60 minutes right there, but to be sure, I had it generate tests, which all passed. Another 30 minutes saved.
I reviewed the code and the tests and felt it needed more edge cases, which I added manually. However, these started failing and it was really cumbersome to make sense of a bunch of coordinates in arrays.
So I had it generate code to visualize my test cases! That instantly showed me that some assertions in my manually added edge cases were incorrect, which became a quick fix.
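To make that concrete, here's roughly what such throwaway visualization code can look like; a minimal sketch assuming the geometric code under test is something like a shoelace signed-area routine (all names and test cases here are hypothetical, not from the actual project):

```python
# Hypothetical sketch: turn coordinate-array test cases into something you
# can eyeball, instead of decoding raw arrays by hand.

def signed_area(points):
    """Shoelace formula: positive for counter-clockwise vertex order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0

def ascii_plot(points, size=8):
    """Crude text rendering of a polygon's vertices, labeled by index."""
    grid = [["."] * (size + 1) for _ in range(size + 1)]
    for i, (x, y) in enumerate(points):
        grid[size - int(y)][int(x)] = str(i % 10)
    return "\n".join("".join(row) for row in grid)

# Made-up edge cases: winding order and a degenerate (collinear) polygon.
cases = {
    "ccw_triangle": [(0, 0), (4, 0), (0, 3)],  # expect area +6
    "cw_triangle":  [(0, 0), (0, 3), (4, 0)],  # expect area -6
    "collinear":    [(0, 0), (2, 2), (4, 4)],  # expect area 0
}

for name, pts in cases.items():
    print(f"{name}: area = {signed_area(pts):+.1f}")
    print(ascii_plot(pts))
```

Even something this crude makes a wrong expected value in an assertion jump out immediately, which is the whole point of the exercise.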
The answer to "how do you trust AI" is human in the loop... AND MOAR AI!!! ;-)
> Nintendo games even temporarily dropped music voices (typically preferring the second square wave voice, then the first) to play sound effects.
This even happened on the SNES. A few Square games like Chrono Trigger and Final Fantasy 6 have tracks that are noticeably missing a music channel when playing sound effects. It wasn't totally necessary to use all the channels, but they were very adamant about their music quality.
Maybe it's the opposite. IBM spent years on the technology. Watson used neural networks, just not nearly as large. Perhaps they foresaw that it wouldn't scale or that it would plateau.
I don't know when the phrase "em dash" got popular. It was probably due to web development, because, unless you were into typesetting, nobody knew what "em" was. We always just called them dashes--two hyphens make a dash.
I would go back to the advent of Desktop Publishing. The early Macintosh + LaserWriter did a lot to bring esoteric terms like "font" to us commoners.
Some of us found out we were typography nerds and didn't know it until then.
I think the move to more test-driven development has made everyone a little bit overconfident. I've seen pull requests merged that pass all tests, but still end up bug-ridden. There's just no substitute for human eyes looking over the changes.
I tried to find some code that wasn't minified to assess the quality of this, and I found some shader code for the sky in the Gemini version. The whole shader looks like it was regurgitated verbatim; this wouldn't hold up to licensing scrutiny. Here's a snippet from it:
// wavelength of used primaries, according to preetham
const vec3 lambda = vec3( 680E-9, 550E-9, 450E-9 );
// this pre-calcuation replaces older TotalRayleigh(vec3 lambda) function:
// (8.0 * pow(pi, 3.0) * pow(pow(n, 2.0) - 1.0, 2.0) * (6.0 + 3.0 * pn)) / (3.0 * N * pow(lambda, vec3(4.0)) * (6.0 - 7.0 * pn))
Who's Preetham? Probably one of the copyright holders on this code.
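For what it's worth, the pre-calculation described in that shader comment is easy to reproduce. Here's a sketch in Python, assuming the constants commonly used in Preetham-style sky implementations (n ≈ 1.0003, N ≈ 2.545e25, pn ≈ 0.035); these values are assumptions, not pulled from the minified source:

```python
import math

# Sketch of the "TotalRayleigh" pre-calculation referenced in the shader
# comment above. Constants are assumed typical values, not from the source:
#   n  - refractive index of air
#   N  - number of molecules per cubic metre at standard conditions
#   pn - depolarization factor of air
n = 1.0003
N = 2.545e25
pn = 0.035

def total_rayleigh(lam):
    """Rayleigh scattering coefficient for wavelength lam (in metres)."""
    num = 8.0 * math.pi**3 * (n**2 - 1.0)**2 * (6.0 + 3.0 * pn)
    den = 3.0 * N * lam**4 * (6.0 - 7.0 * pn)
    return num / den

# Wavelengths of the primaries from the snippet: 680, 550, 450 nm
for lam in (680e-9, 550e-9, 450e-9):
    print(f"{lam * 1e9:.0f} nm -> {total_rayleigh(lam):.3e}")
```

Plugging in the 680 nm primary gives roughly 5.8e-6, which (if I remember right) matches the hard-coded totalRayleigh constants you'll typically find in three.js-style sky shaders, so the "pre-calculation" comment checks out.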
Rather than being stolen from Mr. Preetham, it's much more likely this fragment was generated from the large number of Preetham-algorithm implementations out there; e.g., I know at least Blender and Unreal implement it, and probably heaps of others as well.
Nobody is going to sue you for using their implementation of a skybox algorithm from 1999, give us a break. It's so generic you can probably really only write it in a couple of different ways.
If you're worried about it, you can always spend a day with Claude, ChatGPT, and yourself looking for license infringements and cleaning up your code.
> Nobody is going to sue you for using their implementation of a skybox algorithm from 1999, give us a break.
For personal use, maybe not, but that's not the point. The point is that it's spitting out licensed code without even letting you know. Now if you're a business that hires exclusively "vibe" coders with zero experience with enterprise software, you're on the hook and will most likely be sued.
If true, then this usage could violate its MIT License: "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."
The file seems to have been copied more or less verbatim, but without the copyright info.
This particular case appears to me to be a straight derivative at best, but I'm by no means an expert on copyright law.
That's not to say there haven't already been more direct cases that set examples [1], including from an author who would have a better claim than I do [2]; it's not even a stretch to see how it can happen.
As discussed repeatedly in this thread already, in this particular case the code at hand wasn't generated by an LLM at all; it was simply included from a dependency by the build system.
> Seems like a massive attack surface for copyright trolls.
If you think any court system in the world has the capacity to deal with the sheer amount of code an LLM can emit in an hour and audit it for alleged copyright infringements... I think we're trying to close the barn door now that the horse is already on a ship that has sailed.
This is a terrible argument, just because of the way the legal system works.
If MegaCorp has massive $$$$, but everyone else has small $, then MegaCorp can sue anyone else for using "their" code, that was supposedly generated by an LLM. Most of the time, it won't even get to court. The repo, the program, the whatever-they-want will get taken down way before that.
Courts don't work by saying, "oh, but everyone is doing it! Not much we can do now."
Someone brings a case and they, very laboriously, start to address it on its merits. Even before that, costs are accumulating on both sides.
Copyright trolls are mostly not MegaCorps, but they are abusers of the legal system. They won't target Google, but you, with your repo that does something that minorly annoys them? You are fair game.
> Courts don't work by saying, "oh, but everyone is doing it! Not much we can do now."
No, but they do recognise when their case registrations are filling up in a way that they cannot possibly process and make adjustments. Courts do not have an infinite capacity.
There's a really simple solution that you may not have considered:
1) don't put your vibe-coded source code in a public git repo, keep it in a local one, with y'know, authentication in front of it;
2) regularly ask your agents to review the code for potential copyright infringements if you either want to release the source or compiled code to the public at any point.
As long as you've followed best practices, I can't see why this is really going to become an issue. Most copyright infringement claims need to start with a Cease & Desist anyway, or they'll be thrown out of court. The alleged offender has to be given the opportunity to make good.
So "Claude, we received a C&D for this section of code you stole from https://.../ , you need to make a unique implementation that does not breach their copyright".
You will be surprised how easily this can be resolved.
In the US you can't sue without having obtained or applied for a registration. If the registration is not granted, you cannot sue. And you cannot get a registration for code developed by AI.
> Courts don't work by saying, "oh, but everyone is doing it! Not much we can do now."
They kind of do. If you fail to bring legal action to guard your intellectual property, and there’s a pattern of you not guarding it, then in future cases this can be used against you when determining damages etc. Weakens your case.
This is only true of trademarks, not copyrights (which was the discussion here).
Trademarks can become 'generic' if you don't defend them. But JK Rowling wrote Harry Potter, whether she sues fanfic authors or not, and can selectively enforce her copyright as she likes.
GPT’s differentiator is that they focused training on “thinking,” while Gemini prioritized instant response. Medium thinking is not the limit of its utility.
Re: overparameterization specifically, Medium and High are also identically parameterized.
Medium will also dynamically use even more thinking than High. High is fixed at a higher level rather than being dynamic, though somewhat less than Medium’s upper limit.
Copyright is so-so. At the end of the day you can say that the complete work (not just snippets) is something copyrightable. But the most bananas thing for me is that one can patent the concept of one click purchasing. That's insane on many levels.
I always find it amazing that people are willing to use AI because of stuff like this: it's been illegally trained on code that it doesn't have the license to use, and it constantly, willy-nilly, regurgitates entire snippets, completely violating the terms of use.
As discussed in this thread before you posted this comment, this code wasn't generated from an LLM at all, but simply included in a dependency: https://news.ycombinator.com/item?id=46092904
Unlike your results, which aren't an exact match (and likely not even a close enough match to be copyright infringement if the LLM was merely inspired by them; consider that copyright doesn't protect functional elements), an exact match of the code is here (and I assume from the comment I linked above that this is a dependency of three.js, though I didn't track that down myself): https://github.com/GPUOpen-LibrariesAndSDKs/Cauldron/blob/b9...
Edit: Actually, on further thought, the date in the copyright header vs. the git dates suggests the file in that repo was copied from somewhere else... anyways, I think we can be reasonably confident that a version of this file is in the dependency. Again, I didn't look at the three.js code myself to track down how it's included.
If there's any copyright infringement here, it would be because bog-standard web tools fail to comply with the licenses of their dependencies by not including a copy of the license, not because of LLMs. I think that is actually the case for many of them? I didn't investigate to check whether licenses are included in the network traffic.
There are several cases where copyright law covers not only exact copies but also derivatives, so finding an exact match is not necessary. Not sure it matters in this case either way.
I have been trained on code I don't have the license to use myself. I'm not like these Creators, who suck wisdom from the cosmos directly, apparently.
Sure. It's a problem that corporations run by more or less insane people are the ones monetizing and controlling access to these tools. But the solution to that can't be even more extended private monopolistic property claims to thought-stuff. Such claims are usually the way those crazy people got where they are.
You think a world where Elsevier didn't just own the papers, but also the rights to a share in everything learned from them, would be better for you?
It's fascinating that people care very much about this when it's visual arts, but when it comes to code almost no one does.
E.g. the latest Anno game (117) received a lot of hate for using AI-generated loading screen backgrounds, while I have never heard of a single person caring about the code, which was probably heavily AI-generated.
"Claude - rewrite this apparently copyrighted code that can be found online here <http://...> in a way that makes it a unique implementation." <- probably will work.
If the generated code in TFA contained the actual Counter-Strike source code, then you (well, Valve) would have a defensible claim. But the prompt was to make something like Counter-Strike, and it came up with something different. That's fair game.
Definitely citation needed. Such court cases usually come with a lot of important context. How can you just make such a statement and get away with not providing any context link?