A paper from a week ago found that models trained on multiple data modalities perform an order of magnitude better than text-only models of the same or even larger size.
Some of these large models can do zero-shot learning and perform tasks they weren't explicitly trained on, since the training objective is very general.
Being able to perform more advanced kinds of zero-shot tasks would make these models directly comparable, and accuracy on those tasks can be evaluated.
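For a concrete picture of what zero-shot use looks like, here's a minimal sketch using Hugging Face's zero-shot-classification pipeline (the input text and candidate labels are made-up examples; the underlying model was trained on natural language inference, not on this classification task):

```python
from transformers import pipeline

# A model trained on a general objective (entailment) can be repurposed
# at inference time to classify arbitrary labels it never saw in training.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "The new GPU doubles throughput on transformer inference.",
    candidate_labels=["hardware", "cooking", "politics"],
)
# Labels come back sorted by score, so the top one is the prediction.
print(result["labels"][0], round(result["scores"][0], 3))
```

Because the task is posed at inference time rather than baked into training, you can swap in any label set and measure accuracy on it directly.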
The next big step for coding LLMs will be context window increases. Leaked docs have OpenAI pricing for up to 16K tokens, I believe, which is 4x the current maximum. At that size you're talking "write a class" instead of "write this line, and maybe sometimes a method."
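To make those window sizes concrete, here's a rough sketch of checking whether a prompt fits, using OpenAI's tiktoken tokenizer. The 16K limit and the reply budget are assumptions for illustration, not confirmed specs:

```python
import tiktoken

# cl100k_base is the encoding used by recent OpenAI chat models.
enc = tiktoken.get_encoding("cl100k_base")

def fits_in_context(prompt: str, limit: int = 16_384) -> bool:
    """Check whether a prompt fits in a hypothetical 16K-token window,
    reserving room for the model's reply."""
    reply_budget = 1_024  # tokens kept free for the completion (assumption)
    return len(enc.encode(prompt)) + reply_budget <= limit

# A fake "source file" long enough to matter; a real one would be read from disk.
prompt = "Refactor this into a class:\n" + "\n".join(["def f(): pass"] * 500)
print(fits_in_context(prompt))
```

The practical point: at 4K tokens a whole module rarely fits alongside instructions, while at 16K you can hand the model an entire class plus context.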
Are you referring to PaLM-E? It didn't show any positive transfer to NLP tasks; in fact, the unfrozen model performed slightly worse after the fine-tune.
That being said, PaLM-E wasn't really a multimodal model from the start; it's still basically a text model with a visual encoder glued on top. Whether a truly multimodal model will be better at reasoning and data efficiency is still an open question, though.
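For what "glued on top" means in practice: roughly, features from a pretrained vision encoder get projected into the language model's token embedding space and prepended to the text tokens. A toy PyTorch sketch of that pattern, with all names and dimensions made up for illustration (this is the general shape, not PaLM-E's actual code):

```python
import torch
import torch.nn as nn

class GluedMultimodalLM(nn.Module):
    """Toy 'vision model glued onto a text model': the only new trained
    piece is a linear projection into the LM's embedding space."""

    def __init__(self, vision_encoder, language_model, vis_dim=768, lm_dim=4096):
        super().__init__()
        self.vision_encoder = vision_encoder  # pretrained, often frozen
        self.language_model = language_model  # pretrained text-only LM
        # The "glue": map vision features to pseudo-token embeddings.
        self.projector = nn.Linear(vis_dim, lm_dim)

    def forward(self, image, text_embeds):
        # (batch, num_patches, vis_dim) -> (batch, num_patches, lm_dim)
        image_tokens = self.projector(self.vision_encoder(image))
        # Prepend image "tokens"; the LM treats them like ordinary
        # token embeddings (HF-style inputs_embeds kwarg assumed here).
        inputs = torch.cat([image_tokens, text_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs)
```

The text model's weights were never trained jointly with vision from scratch, which is part of why transfer back to pure NLP tasks isn't guaranteed.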