I've tried both, and I'm still not sure. Claude Code steers more towards a hands-off, vibe coding approach, which I often regret later. With Copilot I'm more involved, which feels less 'magical' and takes me more time, but generally does not end in misery.
I'm more curious how Gemini 3 Flash Lite performs and is priced when it comes out, because for most non-coding tasks the real distinction may not be between Pro and Flash but between Flash and Flash Lite.
Token usage also needs to be factored in, especially when thinking is enabled: these newer models find difficult problems easier and use fewer tokens to solve them.
Thanks, that was a great breakdown of the costs. I had just assumed the pricing was the same. The premium probably comes from the confidence and buzz around Gemini 3.0 as one of the best-performing models. But competition in this area is hot, and it won't be long before we get similarly performing models at a cheaper price.
The price increase sucks, but you really do get a whole lot more. There's also the "Flash Lite" series; 2.5 Flash Lite is $0.10/M, so hopefully we see something like a 3.0 Flash Lite for $0.20-0.25.
Mostly at the time of release, except for 1.5 Flash, which got a price drop in Aug 2024.
Google has been discontinuing older models after a transition period of several months, so I would expect the same for the 2.5 models. But that process only starts once the release versions of the 3.0 models are out (Pro and Flash are in preview right now).
There are plenty. But it's not the comparison you want to be making. There is too much variability in the number of tokens used for a single response, especially once reasoning models became a thing. And it gets even worse when you put the models into a variable-length output loop.
You really need to look at the cost per task. artificialanalysis.ai has a good composite score, measures the cost of running all its benchmarks, and has a 2D intelligence-vs-cost graph.
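Back-of-envelope, the difference looks something like this (made-up prices and token counts, purely illustrative; thinking tokens are billed at the output rate on most APIs):

    # Cost per task, not cost per token: a model that is cheap per
    # token but thinks a lot can cost more per task than a pricier
    # model that answers directly.
    def cost_per_task(in_tok, out_tok, thinking_tok,
                      in_price_per_m, out_price_per_m):
        billed_out = out_tok + thinking_tok  # thinking billed as output
        return (in_tok * in_price_per_m + billed_out * out_price_per_m) / 1e6

    print(cost_per_task(5_000, 800, 20_000, 0.30, 2.50))   # ~$0.054, "cheap" heavy thinker
    print(cost_per_task(5_000, 800, 2_000, 1.25, 10.00))   # ~$0.034, "pricey" light thinker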
Claude is just so good. Every time I try moving to ChatGPT or Gemini, they end up making concerning decisions. Trust is earned, and Claude has earned a lot of trust from me.
Honestly, Google models have this mix of smart and dumb that is scary. If the universe ever gets turned into paperclips, it'll probably be a Google model doing it.
Well, it depends. Just recently I had Opus 4.1 spend 1.5 hours looking at 600+ sources while doing deep research, only to get back to me with a report consisting of a single sentence: "Full text as above - the comprehensive summary I wrote". Anthropic acknowledged that it was a problem on their side but refused to do anything to make it right, even though all I asked them to do was to adjust the counter so that this attempt doesn't count against their incredibly low limit.
Same here. They have been aggressively increasing prices with each iteration (maybe because they started so low). I still hope that's not the case this time. GPT-5.1 is priced pretty aggressively, so maybe that's an incentive to keep the current Gemini API prices.
The prompt caching change is awesome for any agent. Claude is far behind, with increased costs for caching and manual cache checkpoints. It certainly depends on your application, but prompt caching is also ignored in a lot of cost comparisons.
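For anyone who hasn't used it, here's roughly what the manual checkpoint looks like with the Anthropic Python SDK (a sketch; the model id is just an example, and note that cache writes are billed at a premium over base input):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

    long_system_prompt = "..."  # imagine thousands of tokens of tools/instructions

    response = client.messages.create(
        model="claude-sonnet-4-5",  # example model id
        max_tokens=1024,
        system=[{
            "type": "text",
            "text": long_system_prompt,
            # manual checkpoint: the prefix up to and including this
            # block is written to the cache and reused on later calls
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{"role": "user", "content": "First user turn"}],
    )

Gemini's implicit caching needs none of this markup; repeated prefixes are detected and discounted automatically.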
Though to be fair, thinking tokens are also ignored in a lot of cost comparisons, and in my experience Claude generally uses fewer thinking tokens for the same intelligence.
Since we have Cursor people joining the thread, let me bring up my constant problems with applying code changes. For background, I mostly work with "chat":
1. The apply button does not appear. This used to be mostly a problem with Gemini 2.5 Pro and GPT-5, but now it sometimes happens with all models. Very annoying, because I have to apply the changes manually.
2. Cursor doesn't recognize which file the changes should apply to and just uses the currently open file. Also very annoying, and it's impossible to redirect the changes to the file I actually wanted once they've been applied to the wrong one.
Both of these seem to happen when the context window is getting full and the conversation is summarized. Responding with the right file usually works, e.g. "great, let's apply those changes in @path/to/file". It may also be a good time to rewind to an earlier point in the conversation by editing one of your previous messages: edit the one that produced the response with changes not linked to a specific file, and including the file path in that prompt will usually get you back on track.
Voyage models are great in my experience, and I'm planning to test 3.5. Almost more interested in 3.5-lite, though. Great price.
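The client is minimal if anyone wants to kick the tires; a sketch, assuming the model ids are "voyage-3.5" / "voyage-3.5-lite" (check their docs):

    import voyageai

    vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

    result = vo.embed(
        ["The quick brown fox"],
        model="voyage-3.5-lite",  # assumed model id
        input_type="document",    # use "query" at retrieval time
    )
    print(len(result.embeddings[0]))  # embedding dimension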
My concern: the Voyage API has been unreliable. And they were bought by MongoDB, which makes me a little uneasy.
Gemini embeddings look like a great model, but it's in preview and there haven't been any updates for a while (including at I/O). I'm also not sure how committed Google is to embedding models.