
The motivation for AutoThink came from watching how current reasoning models waste computation - they spend the same amount of "thinking time" on "what's 2+2?" as they do on complex mathematical proofs. This seemed obviously inefficient.

The breakthrough was combining two techniques I'd been working on separately: adaptive classification (which can learn new categories without retraining) and an open source implementation of Pivotal Token Search from Microsoft's Phi-4 paper. When I put them together with dynamic token budgeting, the performance gains were much better than expected.

What surprised me most was that the technique actually uses fewer tokens on average while improving performance. The adaptive allocation means simple queries finish faster, offsetting the extra computation on complex ones.
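To give a rough sense of the shape of the budgeting step, here's a minimal sketch (labels and budget numbers are illustrative, not the tuned values AutoThink actually uses):

    # Map a query to a max thinking-token budget via a pluggable classifier.
    COMPLEXITY_BUDGETS = {
        "trivial": 128,    # "what's 2+2?"
        "moderate": 1024,  # multi-step word problems
        "complex": 8192,   # proofs, hard coding questions
    }

    def pick_thinking_budget(query, classify):
        # `classify` can be any callable returning one of the labels above,
        # e.g. an adaptive classifier trained on query difficulty.
        label = classify(query)
        return COMPLEXITY_BUDGETS.get(label, COMPLEXITY_BUDGETS["moderate"])

    # Toy usage with a crude length heuristic standing in for the classifier:
    toy_classify = lambda q: "trivial" if len(q) < 20 else "complex"
    print(pick_thinking_budget("what's 2+2?", toy_classify))  # -> 128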

A few technical notes:

- The steering vectors are small (typically <1MB per pattern) and add minimal memory overhead

- Classification adds about 10ms latency, which is negligible

- Target layer selection matters - I found middle layers (15-20) work best for most models
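For the curious, the core of steering at a layer looks roughly like this (a simplified sketch, not the exact implementation; it assumes a HuggingFace-style decoder and an untuned scale):

    import torch

    def attach_steering_hook(model, steering_vector, layer_idx=17, scale=4.0):
        # Add `scale * steering_vector` to the hidden states leaving one
        # decoder layer. Assumes layers live at model.model.layers and each
        # layer returns a tuple whose first element is the hidden states.
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            steered = hidden + scale * steering_vector.to(hidden.device, hidden.dtype)
            if isinstance(output, tuple):
                return (steered,) + output[1:]
            return steered

        # The returned handle can be .remove()'d after generation.
        return model.model.layers[layer_idx].register_forward_hook(hook)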

I'd love feedback on:

- Have you tried similar adaptive approaches with your models?

- What other reasoning patterns would be useful to steer toward?

- Ideas for automatically detecting the optimal target layer?

Thanks for checking it out! Happy to answer any questions about the implementation or results.



> they spend the same amount of "thinking time" on "what's 2+2?" as they do on complex mathematical proofs.

Not anymore. Have you seen Gemini 2.5 Pro? Ask it simple questions and it almost doesn't "think". Ask it a coding question and it'll write a long reasoning article. I think the same goes for o3.


The original o1 also didn't do this. Neither did the actual DeepSeek R1. You could even get it to answer immediately without any reasoning tokens. These highly distilled versions just lost most of their common sense for this.


Well, it does overthink quite a bit. If this can reduce overthinking, it's going to be useful.


Overthinking is subjective. It really depends on how much you value the answer.

"how long break distance does a train need if going in 100 km/hour?"

Do you just need a quick reply and don't care that much (maybe a shower thought)? Or does life and death depend on the answer?

The same question can need different amounts of thinking.
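For a rough sense of the numbers (the deceleration figure here is just an assumption; real trains vary widely):

    # Back-of-the-envelope: d = v^2 / (2a)
    v = 100 / 3.6         # 100 km/h in m/s, ~27.8
    a = 1.0               # assumed service-braking deceleration in m/s^2
    d = v ** 2 / (2 * a)  # ~386 m, before any reaction time is added
    print(round(d))       # 386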


> is life and death depending on the answer?

In this situation I suspect you'd still want the answer quickly.


That's a huge assumption; there is a wide range of parameters that go into how accurate a response needs to be, depending on context. Just as there are questions that need a 100% accurate response regardless of response time, I'm sure there are questions at the other extreme.


In this situation you would have someone with actual knowledge of the mechanics involved do the computation using the actual data (e.g., what's the mass of the train? Which kind of brakes does it have?) instead of asking an LLM and trusting it to give the correct answer without checking.


Assuming you could find an expert like that in time, and that they would then be able to understand and solve the problem fast enough to still be helpful.

If you need the answer within a couple of hours, you can probably get it from an expert; if you need an actionable answer within minutes, based on some back-of-the-envelope calculations, then a SOTA LLM is a much safer bet than flagging down whoever seems the smartest in the room and asking them for help.


I assumed we already did such calculations in advance, as that's needed for proper safety measures.


Why? Let's say you are designing a railway system. It doesn't matter whether the answer takes a second or an hour when the planning process is months long.


What I really don't like is that I can't manually decide how much thinking Gemini should allocate to a prompt. You're right that sometimes it doesn't think, but for me this also happens on complex queries where I WOULD want it to think. Even things like "super think about this" don't help; it just refuses to.


Gemini 2.5 Pro is getting thinking budgets when it GAs in June (at least that's the promise).


This is already available for Flash.


Yes, we started with the idea of trying to replicate similar control over the thinking process for open reasoning models. They also announced the Deep Think approach at I/O, which goes even further and combines parallel CoTs at inference time.


> I think the same goes for o3.

Definitely, in my experience. Elsewhere in the thread, OP says that open models/systems don't do this, in which case this seems like important work toward making open alternatives competitive.


Is that not just caching? If you have the same query just return the same response.

You could even put a simpler AI in front to decide if it was effectively the same query.
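Something like an embedding-similarity cache, roughly (model name and threshold here are picked arbitrarily):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    cache = []  # list of (embedding, response) pairs

    def cached_answer(query, answer_fn, threshold=0.9):
        q = encoder.encode(query, normalize_embeddings=True)
        for emb, response in cache:
            if float(np.dot(q, emb)) >= threshold:  # cosine sim (normalized)
                return response
        response = answer_fn(query)
        cache.append((q, response))
        return response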


Has Gemini or OpenAI put out any articles on this, or is this just something you noticed?


Congratulations! Any work to optimise efficiency w.r.t. LLMs is much appreciated.

So far I’ve taken only a lazy approach to optimising local LLMs: sending small queries to my M4 Mac Mini running MLX models and larger queries to my Nvidia 4090. It’s remarkable how efficient the M4 is compared to Nvidia, and I think Apple is heading in the right direction with MLX.

I'll read about AutoThink and try to integrate it into my workflow.


I have thought it might be worth seeding responses with the output of non-reasoning models: after the user prompt, inject a block like "a non-reasoning model thought this: ... stuff ... Was that what the user wanted?" For the instances where the non-reasoning version was sufficient, it might help the reasoning model get to the point earlier.
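Roughly something like this (the wrapper wording is made up for illustration):

    def build_seeded_prompt(user_prompt, draft_answer):
        # Wrap a cheap draft so the reasoning model can accept or redo it.
        return (
            f"{user_prompt}\n\n"
            "A non-reasoning model thought this:\n"
            f"{draft_answer}\n\n"
            "Was that what the user wanted? If so, confirm it briefly; "
            "otherwise reason it through from scratch."
        )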


This is an interesting idea; I hadn't thought of it. It's worth experimenting with, and I'm not aware of anyone else trying it yet.


Claude Sonnet 3.5 (not even the latest iterations: 3.7 or 4) clearly adapts processing time to query complexity -- processing time is dynamic.



