OP here, I added a sample PDF output in the project assets and put screenshots in the ReadMe. The text is selectable after rehydration. Would this work with your app?
> Try building something new in claude code (or codex etc) using a programming language you have not used before. Your opinion might change drastically.
Try changing something old in claude code (or codex etc) using a programming language you have used before. Your opinion might change drastically.
I did just that and I ended up horribly regretting it.
The project had to be coded in Rust, which I kind of understand but have never worked with. Drunk on AI hype, I gave it step-by-step tasks and watched it produce the code. The first warning sign was that the code never compiled on the first attempt, but I ignored this, mesmerized by the magic of the experience.
Long story short, it gave me quick initial results despite my language handicap. But the project quickly turned into an overly complex, hard-to-navigate, brittle mess. I ended up reading the Rust in Action book and spending two weeks cleaning and simplifying the code. I had to learn how to configure the entire toolchain, understand various cargo deps and the ecosystem, set up CI/CD from scratch, and so on. There is no way around that.
It was Claude Code Opus 4.1 instead of Codex but IMO the differences are negligible.
AI can be quite impressive if the conditions are right for it. But it still fails at so many common things for me that I'm not sure if it's actually saving me time overall.
I just tried earlier today to get Copilot to make a simple refactor across ~30-40 files: essentially changing one constructor parameter in all derived classes of a common base class and adding an import statement. In the end it managed ~80% of the job, but only after messing it up entirely first (after a few minutes of waiting), then asking again after another 5 minutes of waiting whether it really should do the thing, and then missing a bunch of classes and randomly removing about five parentheses from the files it edited.
Just one anecdote, but my experiences so far have been that the results vary dramatically and that AI is mostly useless in many of the situations I've tried to use it.
This is exactly my experience. We wanted to modernize a Java codebase by removing JNDI global variables. This is a simple though tedious task. We tried Claude Code and Gemini, and both sets of results were hilarious.
One thing I like for this type of refactoring scenario is asking it to write a codemod (which you can of course do yourself but there's a learning curve). Faster result that takes advantage of a deterministic tool.
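To illustrate the idea, here's a deliberately crude sketch of what such a codemod can look like in Python (the base class name, parameter, and import line are all hypothetical; a real codemod should use a syntax-aware tool like libcst or OpenRewrite rather than regexes):

```python
import re

# Hypothetical import that the refactor is supposed to add to each file.
IMPORT_LINE = "from config import DEFAULT_TIMEOUT\n"

def transform(source: str) -> str:
    """Add a `timeout` parameter to __init__ in subclasses of a
    hypothetical BaseHandler, plus the import the new default needs."""
    if "class " not in source or "(BaseHandler)" not in source:
        return source  # leave unrelated files untouched
    # Add the new parameter to every __init__ signature in the file.
    new = re.sub(
        r"def __init__\(self",
        "def __init__(self, timeout=DEFAULT_TIMEOUT",
        source,
    )
    # Prepend the import once, if it's missing.
    if IMPORT_LINE not in new:
        new = IMPORT_LINE + new
    return new

before = "class Foo(BaseHandler):\n    def __init__(self):\n        pass\n"
after = transform(before)
```

The point is that once the LLM (or you) has produced this script, applying it across 30-40 files is deterministic and reviewable as a single diff, instead of hoping the model edits each file consistently.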
Yeah I've used it for personal projects and it's 50/50 for me.
Some of the stuff generated I can't believe is actually good to work with long term, and I wonder about the economics of it. It's fun to get something vaguely workable quickly though.
Things like deepwiki are useful too for open source work.
For me though, the core problem I have with AI programming tools is that they target a problem that doesn't really exist outside of startups (not writing enough code) instead of the real source of inefficiency in any reasonably sized org: coordination problems.
Of course if you tried to solve coordination problems, then it would probably be a lot harder to sell to management because we'd have to do some collective introspection as to where they come from.
> Of course if you tried to solve coordination problems, then it would probably be a lot harder to sell to management because we'd have to do some collective introspection as to where they come from.
Sad but true. Better to sell to management and tagline it as "you don't need a whole team anymore", or go so far as "you can do this all by yourself now!".
Sadly managers usually have more money to spend than the workers too, so it's more profitable.
> Try building something new in claude code (or codex etc) using a programming language you have not used before. Your opinion might change drastically.
So it looks best when the user isn't qualified to judge the quality of the results?
> using a programming language you have not used before
Haven't we established that if you are a layman in an area, AI can seem magical? Try doing something in your established area and you might get frustrated. It will give you the right answer with caveats: code that is too verbose, performance-intensive, or sometimes ignores best security practices.
Average programmers do not produce average software: the former is about implementing code, while the latter is the full picture, more about what to build than how to build it. You don't get a better "what to build" by having above-average developers.
Anyway, we don't need more efficient average programmers; time-to-market is rarely down to coding speed or efficiency and more down to "what to build". I don't think AI will make "average" software development work faster or better, case in point being decades of improvements in languages, frameworks, and tools that all intended to speed up this process.
Yes. The "true" average software quality is far, far lower than the average person perceives it to be. ChatGPT and other LLM tools have contributed massively to lowering average software quality.
I don’t understand how your three sentences mesh with each other. In any case, making the development of average software more efficient doesn’t by itself change anything about its quality. You just get more of it faster. I do agree that average software quality isn’t great, though I wouldn’t attribute it to LLMs (yet).
> using a programming language you have not used before
But why would I do that? Either I'm learning a new language in which case I want to be as hands-on as possible and the goal is to learn, not to produce. Or I want to produce something new in which case, obviously, I'd use a toolset I'm experienced in.
There are plenty of scenarios where you want to work with a new language but don't want to dedicate months or years of your life to becoming an expert in it, because you are only going to use it for a one-time project.
For example, perhaps I want to use a particular library which is only available in language X. Or maybe I'm writing an add-on for a piece of software that I use frequently. I don't necessarily want to become an expert in Elisp just to make a few tweaks to my Emacs setup, or in JavaScript etc. to write a Firefox add-on. Or maybe I need to put up a quick website as a one-off but I know nothing about web technologies.
In none of these cases can I "use a toolset I'm experienced in" because that isn't available as an option, nor is it a worthwhile investment of time to become an expert in the toolset if I can avoid that.
The same way you assess results in a programming language you have used before. In a more complicated project that might mean test suites. For a simple project (e.g. a Bash script) you might just run it and see if it does what you expect.
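For the "just run it and see" case, even a throwaway check can be automated rather than eyeballed. A minimal sketch in Python, where `sort` stands in for the generated script (any command would do):

```python
import subprocess

# Hypothetical smoke test for a generated command-line tool: feed it a
# known input and verify the output, instead of inspecting it by hand
# each time. Here the standard `sort` utility plays the role of the
# script under test.
result = subprocess.run(
    ["sort"], input="b\na\n", capture_output=True, text=True, check=True
)
assert result.stdout == "a\nb\n", result.stdout
print("smoke test passed")
```

Even one such check makes regressions visible when you ask the model to "fix" or extend the script later.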
The way I assess results in a familiar programming language is by reviewing and reasoning through the code. Testing is necessary, but not sufficient by any means.
Out of curiosity, how do you assess software that you didn't write and just use, and that is closed source? Don't you just... use it? And see if it works?
You are correct that this is indeed a mostly unsolved problem. In chapter 15 of "The Mythical Man-Month", Fred Brooks called for all program documentation to cover not only how to use a program, but also how to modify a program [1] and, relevant to this discussion, how to believe a program. This was before automated tests and CI/CD were a thing, so he advocated for shipping test cases with the program that the user could review and execute at any time. It's now 50 years later, and this is one of the many lessons in that book that we've collectively not picked up on enough.
[1] Side-note: This was written at a time when selling software as a standalone product was not really a thing, so everything was open-source and the "how to modify" part was more about how to read and understand the code, e.g. architecture diagrams.
It is not worth it and it is not even that impressive considering the cost.
If you told me that you would spend half a trillion and put the best minds on reading the whole internet, then with some statistical innovation try to guess the probable output for a given input, the way it works now would seem about right, probably a bit disappointing even.
I would also say: it seems cool and you could do that, but why would you? At least when the training is done it's cheap to use, right? No!? What the actual fuck!
You would be surprised to know Apple started this in the App Store before Google did on the Play Store. I assume it is because Google wanted to be safe from antitrust lawsuits (follow Apple rather than going there first).
Seems to be failing at API calls right now with "You exceeded your current quota, please check your plan and billing details. For more information on this error, ..."
Exactly this. My startup count is zero, but even I’m well enough informed to know that casting such a wide net diverts attention into too many tangents, too quickly, and will drain your balance sheet dry before you can capitalize on any singular success to build a moat or secure another funding round.
They were banking on AI coding being better than it was, and the snowball effect happening faster than it ever could. And now, they’re toast.
My text-to-speech app uses bounding boxes to display which text in the PDF is being read, and it would not work well with PDFs from this project.