the halving of error rates for image inputs is pretty awesome; it makes the model far more practical for issues where it isn't easy to input all the needed context. when I get lazy I'll just shift+win+s the problem and ask one of the chatbots to solve it.
it feels incredibly dumb now, getting some really basic questions wrong and just throwing nuance to the wind. for claiming to be more human, it understands far less. for example: if I start at a negative net worth, how long until I am a millionaire if I consistently grow 2.5% each month? Anyone here would have a basic understanding of the premise and be able to start answering. 5.1 says it's impossible; with hand-holding it will insist you can only reach 0, but that growth isn't the same as a source of income. further hand-holding gets it to the point of insisting it cannot continue without making assumptions. goading it will have it arrive at the incorrect value of 72 months, and further goading gets 240 months; it took the lazy way out and assumed a static inflation rate from 2024, then a static income.
o3 gets it no problem, first try, with a simple and reasonable answer: 101 months.
claude (opus 4.1) does as well, arriving at 88-92 months, though it uses target inflation numbers instead of something more realistic.
Your question doesn’t make sense to me as stated. I interpret “consistently grow at 2.5% per month” as: every month, your net worth is multiplied by 1.025, in which case it will indeed never change sign. If there is some other positive “income” term, then that needs to be explicitly stated, otherwise the premise is contradicted.
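To make the two readings concrete, here's a toy sketch in Python; the -$50k starting point and the $2,500/month contribution are invented for illustration, not anything any of the models actually assumed:

```python
# toy sketch of the two readings of the question; the starting value and the
# monthly contribution are made-up numbers for illustration only
start, rate = -50_000.0, 0.025           # 2.5% growth per month

# reading 1: pure multiplicative growth -- the sign can never change
x = start
for month in range(1200):                # 100 years of months
    x *= 1 + rate
print(x)                                  # still negative, just enormously so

# reading 2: growth plus an explicit monthly income/contribution term
x, contribution, months = start, 2_500.0, 0
while x < 1_000_000:
    x = x * (1 + rate) + contribution
    months += 1
print(months)                             # finite (about 126 with these toy numbers)
```

The point is only that once an income term is stated explicitly the question becomes answerable; which term each model silently assumed is presumably what produced the scatter of 72, 88-92, 101, and 240 months above.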
> A lack of determinism comes from many places, but primarily: 1) The models change 2) The models are not deterministic...
models themselves are deterministic. this is a huge pet peeve of mine, so excuse the tangent, but the appearance of nondeterminism comes from a few sources, and imho can be largely attributed to the probabilistic methods used to get appropriate context and enable timely responses. here's an example of what I mean: a 52-card deck. The deck order is fixed once you shuffle it. Drawing "at random" is a probabilistic procedure on top of that fixed state. We do not call the deck probabilistic; we call the draw probabilistic. Another example: a pot of water heating on a stove. Its temperature follows deterministic physics. A cheap thermometer adds noisy, random error to each reading. We do not call the water probabilistic; we call the measurement probabilistic.
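The deck analogy as a literal toy sketch (nothing LLM-specific here, just the fixed-state-vs-random-draw distinction):

```python
import random

# the shuffled deck is a fixed, deterministic state; only the draw on top of it is random
random.seed(0)
deck = list(range(52))
random.shuffle(deck)                # after this the order is fixed
fixed_order = tuple(deck)           # nothing about the deck changes from here on

draw = random.choice(fixed_order)   # the randomness lives entirely in the draw
print(fixed_order[:5], draw)
```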
Theoretical physicists run into such problems, albeit far more complicated ones, and the concept they use to deal with them is called ergodicity. The models at the root of LLMs do exhibit ergodic behavior: the time average and the ensemble average of an observable are identical, i.e. the average response of a single model over a long duration and the average of many similar models at a fixed moment are equivalent.
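A toy illustration of that equivalence (i.i.d. draws from a fixed distribution are trivially ergodic; the four-token vocabulary, probabilities, and observable below are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.1, 0.2, 0.3, 0.4])     # fixed distribution over a 4-"token" vocabulary
values = np.array([0.0, 1.0, 2.0, 3.0])    # some observable attached to each token

# time average: one "model" sampled for a long time
time_avg = rng.choice(values, size=100_000, p=probs).mean()

# ensemble average: many "models" each sampled once, at a fixed moment
ensemble_avg = rng.choice(values, size=100_000, p=probs).mean()

print(time_avg, ensemble_avg)   # both converge to 0.1*0 + 0.2*1 + 0.3*2 + 0.4*3 = 2.0
```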
The previous poster is correct for a very slightly different definition of the word "model". In context, I would even say their definition is the more correct one.
They are including the random sampler at the end of the LLM that chooses the next token. You are talking about up to, but not including, that point. But that just gives you a list of possible output tokens with values ("probabilities"), not a single choice. You can always just choose the best one, or you could add some randomness that does a weighted sample of the next token based on those values. From the user's perspective, that final sampling step is part of the overall black box that is running to give an output, and it's fair to define "the model" to include that final random step.
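Roughly, that final step looks like this (the logits are invented, and temperature/top-k knobs are omitted); everything up to `probs` is the deterministic part, and only the last draw introduces randomness:

```python
import numpy as np

rng = np.random.default_rng()
logits = np.array([4.1, 3.9, 1.0, -2.0])        # scores for 4 candidate next tokens (made up)
z = logits - logits.max()
probs = np.exp(z) / np.exp(z).sum()              # softmax -> the "probabilities"

greedy = int(np.argmax(probs))                   # "choose the best one": deterministic
sampled = int(rng.choice(len(probs), p=probs))   # weighted sample: varies from call to call
print(probs.round(3), greedy, sampled)
```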
but, to be fair, simply calling the sampler random is what gives people the impression OP is complaining about, which isn't entirely accurate; it's actually fairly bounded.
this plays back into my original comment, which you have to understand to see that the sampler, for all its "randomness", should only be seeing and picking from a variety of correct answers, i.e. the sample pool should only contain acceptable answers to "randomly" pick from. so when there are bad or nonsensical answers that are different every time, it's not because the models are too random, it's because they're dumb and need more training. tweaking your architecture isn't going to fully prevent that.
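For what "fairly bounded" means mechanically, here's a generic top-p (nucleus) truncation sketch; the logits and the 0.9 cutoff are arbitrary example values, and real stacks layer this with top-k, temperature, repetition penalties, etc.:

```python
import numpy as np

rng = np.random.default_rng()

def sample_top_p(logits, p=0.9):
    """Draw from the smallest set of tokens whose probabilities sum to at least p."""
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    order = np.argsort(probs)[::-1]                        # tokens, most to least likely
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
    pool = order[:cutoff]                                  # the bounded candidate pool
    return int(rng.choice(pool, p=probs[pool] / probs[pool].sum()))

# low-probability tokens simply never enter the pool, no matter how many times you call this
print(sample_top_p(np.array([4.0, 3.5, 0.5, -3.0, -5.0])))
```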
The stove keeps burning me because I can't tell how hot it is; it feels random, and the indicator light is broken.
You:
The most rigorous definition of temperature is that it is equal to the inverse of the rate of change of entropy with respect to internal energy, with the volume V and particle number N held constant.
All accessible microstates are equiprobable over a long period of time; this is the very definition of ergodicity! Yet, because of the flow of entropy the observed macrostates will remain stable. Thus, we can say the responses of a given LLM are...
The User:
I'm calling the doctor, and getting a new stove with an indicator light.
Well really, the reason I gripe about it, to use your example, is that people then believe the malfunctioning indicator light is an intrinsic feature of stoves, so they throw their stove out and start cooking over campfires instead: tried and true, predictable, whatever that means.
I think my deck of cards example still holds.
You could argue I'm being uselessly pedantic, that could totally be the case, but personally I think that's cope to avoid having to think very hard.