It's basically continual learning. This is beyond a hard problem; it's currently an impossible one. I know of no system that solves CL even at small scale, let alone in large models.
Annoyingly, they have SOME inherent capability to do it. It's really easy to get sucked down this path because of that glimmer of hope, but the longer you play with it, the more annoying it becomes.
SSI seems to be focused on this problem directly, so maybe they'll discover something?
So, surprisingly, that is not completely true - I know of 2 HFT finance firms that do CL at scale, and it works - but in a relatively narrow context of predicting profitable actions. It is still very surprising that it works, and the compute required is impressively large - but it does work. I do have some hope of it translating to the wider energy landscapes we want AI to work over…
During covid almost every prediction model like that exploded; everything went out of distribution really fast. In the sense you mean, we've been doing "CL" for a decade or more. It can also be cheap if you use smaller models.
But true CL is the ability to learn out-of-distribution information on the fly.
The only true solution I know of to continual learning is to completely retrain the model from scratch on every new example you encounter. That is technically achievable now, but it is also effectively useless.
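A toy sketch of what that looks like, using a scikit-learn LogisticRegression purely for illustration (the model choice is my assumption, nothing specific): every new example triggers a full fit over the entire history, so the cost grows with everything seen so far.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X_seen, y_seen = [], []

    def learn_one(x, y):
        """'Continual learning' by brute force: retrain on the full history each time."""
        X_seen.append(x)
        y_seen.append(y)
        if len(set(y_seen)) < 2:      # need at least two classes before fitting
            return None
        # Cost of this call grows with everything seen so far.
        return LogisticRegression().fit(np.array(X_seen), np.array(y_seen))

    learn_one([0.0, 0.0], 0)
    model = learn_one([1.0, 1.0], 1)  # one new example -> full retrain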
Yes and no - the ones that exploded - and there were many - got shut down by the orchestrator model, and within 2 weeks there was a new ensemble of winners, with some overlap with the prior winners. To your point, it did in fact take 2-3 weeks, so one could claim this is retraining...
Ehhh KNN doesn’t have a training phase, so it’s really more that the concept of continual learning doesn’t apply. You have to store your entire dataset and recalculate everything from scratch every time anyway.
Yes, that's basically the point. You get 'free' continual learning just by throwing the new data into the pool. Needing an explicit training step is a weakness that makes CL hard to make work for many other approaches.
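A minimal sketch of that 'free' CL with a brute-force NumPy k-NN (a toy example of my own, not anyone's production system): learning a new point is literally just appending it to the pool.

    import numpy as np

    class OnlineKNN:
        """Brute-force k-NN where 'continual learning' is just appending data."""
        def __init__(self, k=3):
            self.k = k
            self.X, self.y = [], []

        def add(self, x, label):      # the entire "training" step
            self.X.append(np.asarray(x, dtype=float))
            self.y.append(label)

        def predict(self, x):
            dists = np.linalg.norm(np.stack(self.X) - np.asarray(x, dtype=float), axis=1)
            nearest = np.argsort(dists)[:self.k]
            labels, counts = np.unique([self.y[i] for i in nearest], return_counts=True)
            return labels[np.argmax(counts)]

    knn = OnlineKNN(k=3)
    knn.add([0.0, 0.0], "a"); knn.add([0.1, 0.2], "a"); knn.add([5.0, 5.0], "b")
    print(knn.predict([0.2, 0.1]))    # -> "a"
    knn.add([6.0, 5.5], "b")          # a new, possibly out-of-distribution point: just append it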
For any practical application, KNN will need some kind of accelerated search structure (e.g. a k-d tree for < ~7 dimensions), which then requires support for dynamic insertions. But this is an engineering problem, not a data science problem; it works and is practical. For example, this has been used by the top systems in Robocode for 15+ years at this point; it's just that academia doesn't find the approach novel enough to bother pursuing.
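One common engineering workaround, sketched here with scipy's cKDTree (which is static, so this amortized-rebuild scheme is my assumption, not the Robocode approach): buffer new points, search both the tree and the buffer, and rebuild once the buffer gets big.

    import numpy as np
    from scipy.spatial import cKDTree

    class InsertableKNN:
        """Static k-d tree plus a brute-force buffer; rebuild when the buffer grows."""
        def __init__(self, points, rebuild_at=1024):
            self.points = np.asarray(points, dtype=float)
            self.tree = cKDTree(self.points)
            self.buffer = []                  # recent insertions, searched linearly
            self.rebuild_at = rebuild_at

        def insert(self, p):
            self.buffer.append(np.asarray(p, dtype=float))
            if len(self.buffer) >= self.rebuild_at:    # amortize the rebuild cost
                self.points = np.vstack([self.points, np.stack(self.buffer)])
                self.tree = cKDTree(self.points)
                self.buffer = []

        def nearest(self, q):
            q = np.asarray(q, dtype=float)
            dist, idx = self.tree.query(q)             # best candidate in the tree
            best = (dist, self.points[idx])
            for p in self.buffer:                      # also check recent inserts
                d = np.linalg.norm(p - q)
                if d < best[0]:
                    best = (d, p)
            return best

    knn = InsertableKNN(np.random.rand(1000, 3))
    knn.insert([0.5, 0.5, 0.5])
    print(knn.nearest([0.5, 0.5, 0.5]))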
>Needing an explicit training step is a weakness that makes CL hard to make work for many other approaches.
On the other hand, not having an explicit training step is a huge weakness of KNN.
Training-based methods scale better because the storage and runtime requirements are independent of dataset size. You can compress 100TB of training data down into a 70GB LLM.
A KNN on the same data would require keeping around the full 100TB, and it would be intractably slow.
Feature engineering is a thing; you don't need the full data source for KNN to search in. This is already used extensively in RAG-type lookup systems, for example.
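A rough sketch of that idea (the hashed bag-of-words embed() is just a stand-in for a real embedding model, so the sketch runs on its own): store compact feature vectors instead of the raw data, do the nearest-neighbour search over those, and 'learn' a new document by appending its vector.

    import numpy as np

    def embed(text):
        # Stand-in feature extractor: hashed bag-of-words, L2-normalized.
        v = np.zeros(64)
        for tok in text.lower().split():
            v[hash(tok) % 64] += 1.0
        return v / (np.linalg.norm(v) + 1e-9)

    corpus = ["continual learning is hard", "HFT models retrain weekly",
              "KNN just appends new data"]
    index = np.stack([embed(d) for d in corpus])   # compact features, not the raw 100TB

    def lookup(query, k=2):
        sims = index @ embed(query)                # cosine similarity on unit vectors
        return [corpus[i] for i in np.argsort(-sims)[:k]]

    print(lookup("how does KNN learn new data"))
    corpus.append("a new document arrives")        # 'learning' = append its vector
    index = np.vstack([index, embed("a new document arrives")])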