I think they are completely screwing up the AI integration.
After years of JetBrains PyCharm Pro I'm seriously considering switching to Cursor.
Before Supermaven was acquired, PyCharm + Supermaven felt like having superpowers ... I really hope they manage to catch up somehow, otherwise the path is written: crisis, acquisition by some big corp, enshittification.
I'm biased (work at Cognition) but I think it's worth giving the Windsurf JetBrains plugin a try. We're working harder on polish these days, so happy to hear any feedback.
I have active subscriptions to both Claude and Codex. They are good but, at least for me, they don't fully replace the coding part. Plus I tend to lose focus because of the basically random response times.
In the engineering team velocity section, the most important metric is missing: the change rate of new code, i.e. how many times it is changed before being fully consolidated.
I would consider feature-complete with robust testing to be a great proxy for code quality. Specifically: if a chunk of code is feature-complete, well tested, and now changing slowly, it means -- as far as I can tell -- that the abstractions it contains are at least OK at modeling the problem domain.
I would expect that code which continually changes, deprecates, and creates new features is still looking for a good problem-domain fit.
Most of our customers are enterprises, so I feel relatively comfortable assuming they have some decent testing and QA in place. Perhaps I am too optimistic?
That sounds like an opportunity for some inspection: coverage, linting (type checking?), and a by-hand spot check to assess the quality of testing. You might also inspect the QA process (ride along with folks from QA).
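If you want a rough automated first pass before the by-hand spot check, something like this works (a sketch, assuming pytest, coverage.py and mypy are installed, and that the code lives under a src/ directory -- that path is just a placeholder):

    # Rough inspection pass: run the tests under coverage, then type-check.
    # The numbers only tell you where to aim the by-hand spot check.
    import json
    import subprocess

    subprocess.run(["coverage", "run", "-m", "pytest", "-q"], check=True)
    subprocess.run(["coverage", "json", "-o", "coverage.json"], check=True)

    with open("coverage.json") as f:
        percent = json.load(f)["totals"]["percent_covered"]
    print(f"line coverage: {percent:.1f}%")

    # A non-zero exit code means mypy found type errors; keep the output for review.
    mypy = subprocess.run(["mypy", "src/"], capture_output=True, text=True)
    print(mypy.stdout or "mypy: no issues found")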
It's tricky, but one can assume that code written once and not touched in a while is good code (it didn't cause any issues, performance is good enough, etc.).
I guess you can already derive this value by summing the total lines changed across all PRs and dividing by (SLOC at end - SLOC at start). Ideally the value should be only slightly greater than 1.
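As a minimal sketch (the PR fields here are made up for illustration; the real numbers would come from your Git host's API):

    # Churn ratio: total lines touched across all PRs divided by net SLOC growth.
    # Close to 1 means most new code stuck; much higher means heavy rework.
    def churn_ratio(prs, sloc_start, sloc_end):
        total_changed = sum(pr["additions"] + pr["deletions"] for pr in prs)
        net_growth = sloc_end - sloc_start
        if net_growth <= 0:
            raise ValueError("net SLOC growth must be positive for this ratio")
        return total_changed / net_growth

    prs = [
        {"additions": 120, "deletions": 10},
        {"additions": 45, "deletions": 5},
    ]
    print(churn_ratio(prs, sloc_start=10_000, sloc_end=10_150))  # 180 / 150 = 1.2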
FYI: you headline with "cross-industry", lead with fancy engineering-productivity graphics, then caption them with small print saying it's from your internal team data. Unless I'm completely missing something, it comes off as a little misleading and disingenuous. Maybe intro with what your company does and your data-collection approach.
Apologies, that is poor wording on our part. It's internal data from engineers who use Greptile -- tens of thousands of people from a variety of industries -- as opposed to external, public data, which is where some of the charts come from.
Just out of curiosity: if I'm located in Spain and I set up an EC2 or DigitalOcean instance in Germany and use it as a SOCKS proxy over SSH, will you detect me?
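For context, this is the kind of setup I mean (a sketch, assuming OpenSSH on my machine and requests[socks] installed; the hostname is a placeholder and ifconfig.me is just one public IP-echo service):

    # Open a SOCKS tunnel to the German instance, then check which IP
    # a remote service sees when requests go through it.
    import subprocess, time
    import requests

    tunnel = subprocess.Popen(["ssh", "-N", "-D", "1080", "user@my-box.example.de"])
    time.sleep(3)  # crude wait for the tunnel to come up

    proxies = {"http": "socks5h://127.0.0.1:1080",
               "https": "socks5h://127.0.0.1:1080"}
    print(requests.get("https://ifconfig.me/ip", proxies=proxies, timeout=10).text)

    tunnel.terminate()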
No. But most software products are nowhere near that sensitive and very few of them are developed with the level of caution and rigor appropriate for a safety-critical component.
In teams of high performers who have built a lot of mutual trust, code reviews are mostly a formality and a stopgap against the big, obvious accidental blunders. "LGTM!"
I do not know or trust the agents that are putting out all this code, and the code review process is very different.
Watching the Copilot code review plugin complain about Agent code on top of it all has been quite an experience.
About a year ago, I happily got rid of a legacy application I had inherited as a somewhat technically savvy person (we lost the pitch, so another agency now has to deal with the shit).
It was built by real people. Not a single line of AI slop in it. It was the most fragile crap I have ever had the misfortune to witness. Even in my wildest vibe-coding-a-prototype moments I was not able to get the AI to produce that amount of anti-patterns, bad shit and code that would have had Hitchcock running.
I think we would be shocked to see what kind of human slop is out there running in production. The scale might change, but at least in this example, if I had rebuilt the app purely by vibe coding, the code quality and the security of the code would actually have improved. Even with the lowest vibe-coding effort thinkable.
I am not in any way condoning (is this the right word?) bad practices, or shipping vibe code into prod without very, very thorough review. Far from it. I am just trying to provide a counterpoint to the narrative: at least in the medium-sized businesses I got to know in my time consulting/working in agencies, I have seen quite a metric ton of slop that would make coding agents shiver.
DigitalOcean version 1 was a duct-taped-together mash of bash, cron jobs and Perl; 2 people out of 12 understood it, and 1 knew how to operate it. It worked, but it was insane, like really, really insane. 0% chance the original ChatGPT would have written something as bad as DO v1.
To me, built and written are not the same. Built: OK, maybe that's an exaggeration. But could an early "this is pretty good at code" LLM have written DigitalOcean v1? I think it could, yes (no offense, Jeff). In terms of volume of code and size of architecture, yeah, it was big and complex, but it was literally a bunch of relatively simple cron, bash and Perl, and the whole thing was very... sloppy (because we were moving very quickly). DigitalOcean, as I last knew it (a very long time ago), transformed into a very well-written modern Go shop. (Source: I am part of the "founding team" or whatever.)
AI doesn't overcome the limits of whoever is giving the input; just like with pre-AI software, if the input sucks, the output sucks.
What changed is the speed: AI and vibe coding just gave a turbo boost to everything you described. The amount of code will go parabolic (maybe it already is) and, in the mid-term, we will need even more SWEs/SREs/DevOps/security/etc. to keep up.