
After watching the demos I'm convinced that the new context length will have the biggest impact. The ability to dump 32k tokens into a prompt (25,000 words) seems like it will drastically expand the reasoning capability and number of use cases. A doctor can put an entire patient's medical history in the prompt, a lawyer an entire case history, etc.

As a professional...why not do this? There's a non-zero chance that it'll find something fairly basic that you missed and the cost is several cents. Even if it just phrases something obvious in a way that makes you think, it's well worth the effort for a multimillion dollar client.

If they further increase the context window, this thing becomes a Second Opinion machine. For pretty much any high level job. If you can put in ALL of the information relevant to a problem and it can algorithmically do reasoning, it's essentially a consultant that works for pennies per hour. And some tasks that professionals do could be replaced altogether. Out of all the use cases for LLMs that I've seen so far, this seems to me to have the biggest potential impact on daily life.

edit (addition): What % of people can hold 25,000 words worth of information in their heads, while effectively reasoning with and manipulating it? I'm guessing maybe 10% at most, probably fewer. And they're probably the best in their fields. Now a computer has that ability. And anyone that has $20 for the OpenAI api can access it. This could get wild.
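
For the curious, here's a minimal sketch of what "dump the whole history into one prompt" might look like with the openai Python client. The model name, file path, and prompt are assumptions for illustration, and of course you wouldn't send real patient data before the confidentiality issues discussed below are sorted out:

    # Sketch: stuffing a long document into a single prompt for a second opinion.
    # Assumes the 2023-era openai Python client; "gpt-4-32k" is the announced
    # 32k-context model name, and the file path/prompt are made up.
    import openai

    openai.api_key = "sk-..."  # your API key

    with open("case_history.txt") as f:  # hypothetical ~25,000-word document
        case_history = f.read()

    response = openai.ChatCompletion.create(
        model="gpt-4-32k",
        messages=[
            {"role": "system", "content": "You are a careful second-opinion reviewer."},
            {"role": "user", "content": "Review the following history and list anything "
                                        "that looks missed or inconsistent:\n\n" + case_history},
        ],
        temperature=0,
    )
    print(response["choices"][0]["message"]["content"])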



> As a professional...why not do this?

Because your clients do not allow you to share their data with third parties?


What we really need is a model that you can run on your own hardware on site. I could never use this for business because they're reading everything you send through it, but let me run it on my own server and it would be unbelievably useful.

Imagine being able to ask your workplace server if it has noticed any unusual traffic, or to write a report on sales with nice graphs. It would be so useful.


> What we really need is a model that you can run on your own hardware on site.

We won’t have that until we come up with a better way to fund these things. “””Open””” AI was founded on that idea and had the best chance of anyone of reaching it: even going in with that intent, they failed, switched to locking down the distribution of their models, and somehow ended up effectively captive to MS despite the original non-profit-like structure. You just won’t see what you’re asking for for as long as this field is dominated by the profit motive.


Nah, it's already being done for GPT-3's competitors and will likely be done soon for GPT-4's competitors

https://arstechnica.com/information-technology/2023/03/you-c...


Curious why even companies at the very edge of innovation are unable to build moats?

I know nothing about AI, but when DALLE was released, I was under the impression that the leap of tech here is so crazy that no one is going to beat OpenAI at it. We have a bunch now: Stable Diffusion, MidJourney, lots of parallel projects that are similar.

Is it because OpenAI was sharing their secret sauce? Or is it that the sauce isn’t that special?


Google got a patent on transformers but didn't enforce it.

If it wasn't for patents you'd never get a moat from technology. Google, Facebook, Apple and all have a moat because of two sided markets: advertisers go where the audience is, app makers go where the users are.

(There's another kind of "tech" company that is wrongly lumped in with the others, this is an overcapitalized company that looks like it has a moat because it is overcapitalized and able to lose money to win market share. This includes Amazon, Uber and Netflix.)


I don't think this is strictly true, though it's rare. The easiest example is the semiconductor industry. ASML's high end lithography machines are basically alien and cannot be reproduced by anyone else. China has spent billions trying. I don't even think there's a way to make the IP public because of how much of it is in people's heads and in the processes in place. I wonder how much money, time and ASML resources it would take to stand up a completely separate company that can do what ASML does assuming that ASML could dedicate 100% of their time in assisting in training the personnel at said company.


Semiconductor makers are only tangentially or partially tech companies. They're producing physical goods that require complex physical manufacturing processes. The means of production are expensive, complex, and require significant expertise to operate once set up. The whole thing involves multiple levels of complex engineering challenges. Even if you wanted to make a small handful of chips, you'd still have to go through all that.

Most modern tech companies are software companies. To them, the means of production are a commodity server in a rack. It might be an expensive server, but that's actually dependent on scale. It might even be a personal computer on a desk, or a smartphone in a pocket. Further, while creating software is highly technical, duplicating it is probably the most trivial computing operation that exists. Not that distribution is trivial (although it certainly can be), just that if you have one copy of software or data, you have enough software or data for 8 billion people.


That is literally technology. It just isn’t as software-heavy as you’d like?


No, I think it's very clear that upthread is talking about how software is difficult to build a moat around.

Chip fabs are literally some of the most expensive facilities ever created. Saying that because they don't need a special moat, nothing in tech ever needs one is so willfully blind that it borders on being disingenuous.


I don't think it's at all clear that upthread is exclusively talking about software.

The first use of "moat" upthread:

> Curious why even companies at the very edge of innovation are unable to build moats?


So you mean "Software" not "tech".


That's the comment you should have responded with instead of the one that you did.

Upthread used the term "tech" when the thread is very clearly talking about AI. AI is software, but because they used the term "tech" you cherry-picked non-software tech as a counter example. It doesn't fit because the type of tech that GPT-4 represents doesn't have the manufacturing cost like a chip fab does. It's totally different in kind regardless of the fact that they're both termed "tech".


Yeah, this is probably also true for TSMC, Intel and ARM. Look how slow progress is on RISC-V on the high end despite RISC-V having the best academic talent.


>despite RISC-V having the best academic talent.

academic performance is a bad predictor for real world performance


It's a decent predictor of real world performance just not a perfect one.


Unfortunately, RISC-V, despite the "open source" marketing, is still basically dominated by one company (SiFive) that designs all the commercial cores. They also employ everyone who writes the spec, so the current "compiled" spec document is about 5 years behind the actual production ISA. Intel and others are trying to break this monopoly right now.

Compare this to the AI ecosystem and you get a huge difference. The architecture of these AI systems is pretty well-known despite not being "open," and there is a tremendous amount of competition.


> the current "compiled" spec document is about 5 years behind the actual production ISA

How could I verify this information?


Read the RISC-V foundation website. There are numerous "ratified" parts of the RISC-V instruction set that are not in the latest "compiled" spec document.


Saying a "compiled" spec is out of date may be technically accurate (or not, I don't have any idea) but if open, published documentation of the ratified extensions is on the web site, it's misleading to cite it as evidence that the spec is not open. And I know that the draft specifications are open for public comment prior to being ratified, so it's not a secret what's under development, either.


I never said that it wasn't actually open source. I just said that the openness hasn't actually created meaningful competition, because there is a single company in control of the specs that abuses that control to create a moat.

For a concrete example, the bitmanip extensions (which provide significant increases in MIPS/MHz) were used by SiFive in commercial cores before ratification and finalization. No other company could do that because SiFive employees could just change the spec if they did. They're doing the same thing with vector/SIMD instructions now to support their machine learning ambitions.


It's kind of hilarious how complex some "reduced" instruction sets have become.


That was my question, too. What instructions have been undocumented for five years? What non-standardized extensions exist in SiFive cores?


I would also add Samsung semi to that list. As I understand, for the small nodes, everyone is using ASML. That's a bit scary to me.

About RISC-V: What do you think is different about RISC-V vs ARM? I can only think that ARM has been used in the wild for longer, so there is a meaningful feedback loop. Designers can incorporate this feedback into future designs. Don't give up hope on RISC-V too soon! It might have a place in IoT, which needs more diverse compute.


> Google got a patent on transfomers but didn't enforce it.

Google's Transformer patent isn't relevant to GPT at all. https://patents.google.com/patent/US10452978B2/en

They patented the original Transformer encoder-decoder architecture. But most modern models are built either only out of encoders (the BERT family) or only out of decoders (the GPT family).

Even if they wanted to enforce their patent, they couldn't. It's the classic problem with patenting things that every lawyer warns you about: "what if someone makes a change to circumvent your patent?"
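
The encoder/decoder split is easy to see side by side in the Hugging Face transformers library. A rough sketch using standard public checkpoints (none of this is OpenAI's code, it's just to illustrate the three families):

    # Sketch: the three Transformer families discussed above, loaded from
    # public Hugging Face checkpoints for illustration.
    from transformers import BertModel, GPT2LMHeadModel, T5ForConditionalGeneration

    encoder_only    = BertModel.from_pretrained("bert-base-uncased")          # BERT family
    decoder_only    = GPT2LMHeadModel.from_pretrained("gpt2")                 # GPT family
    encoder_decoder = T5ForConditionalGeneration.from_pretrained("t5-small")  # what the patent describes

    # The patented sequence-transduction setup is the encoder+decoder pair;
    # BERT-style and GPT-style models each drop one of the two halves.
    print(type(encoder_only).__name__, type(decoder_only).__name__, type(encoder_decoder).__name__)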


Wait until Google goes down inevitably, then they will apply all their legal force just to save their sinking ship.


You can't tell unless you read the claims thoroughly. Degenerate use cases can be covered by general claims.


Indeed. I read the claims. You can too. They're short.


Are you kidding? There are 30 claims; it's hours of work to make complete sense of how they fit together and what they do/do not cover. I've filed my own patents, so I've read through enough prior art, and I'm not doing it for a pointless internet argument.


IANAL. I looked through the patent, not just the Claims. I certainly didn't read all of it. But while it leaves open many possible variations, it's a patent for sequence transduction and it's quite explicit everywhere that the system comprises a decoder and an encoder (see Claim 1, the most vague) and nowhere did I see any hint that you could leave out one or the other or that you could leave out the encoder-decoder attention submodule (the "degenerate use-case" you suggested). The patent is only about sequence transduction (e.g. in translation).

Now an encoder+decoder is very similar to a decoder-only transformer, but it's certainly an inventive step to make that modification and I'm pretty sure the patent doesn't contain it. It does describe all the other pieces of a decoder/encoder-only transformer though, despite not being covered by any of the claims, and I have no idea what a court would think about that since IANAL.


Or, Amazon, Uber, and Netflix have access to so much capital based on investors' judgment that they will be able to win and protect market share by effective execution, thereby creating a defensible moat.


I think his point was that if the moat doesn't exist without more money continually being thrown at it, then it isn't a moat.


It's because moving forward is hard, but moving backward when you know what the space of answers is, is much easier.

Once you know that OpenAI gets a certain set of results with roughly technology X, it's much easier to recreate that work than to do it in the first place.

This is true of most technology. Inventing the telephone is something, but if you told a competent engineer the basic idea, they'd be able to do it 50 years earlier no problem.

Same with flight. There are some really tricky problems with counter-intuitive answers (like how stalls work and how turning should work; which still mess up new pilots today). The space of possible answers is huge, and even the questions themselves are very unclear. It took the Wright brothers years of experiments to understand that they were stalling their wing. But once you have the basic questions and their rough answers, any amateur can build a plane today in their shed.


I agree with your overall point, but I don't think that we'd be able to get the telephone 50 years earlier because of how many other industries had to align to allow for its invention. Insulated wire didn't readily or cheaply come in spools until after the telegraph in the 1840's. The telephone was in 1876 so 50 years earlier was 1826.


You didn't mention it explicitly but I think the morale factor is also huge. Once you know it's possible, it does away with all those fears of wasted nights/weekends/resources/etc for something that might not actually be possible.


I think it's because everyone's swimming in the same bath. People move around between companies, things are whispered, papers are published, techniques are mentioned and details filled in, products are reverse-engineered. Progress is incremental.


> Or is it that the sauce isn’t that special?

The sauce is special, but the recipe is already known. Most of what things like LLMs are based on comes from published research, so in principle, coming up with an architecture that can do something very close is doable for anyone with the skills to understand the research material.

The problems start with a) taking the architecture to a finished and fine-tuned model and b) running that model, because now we are talking about non-trivial amounts of compute, storage, and bandwidth, so quite mundane resources suddenly become a very real problem.


OpenAI can't build a moat because OpenAI isn't a new vertical, or even a complete product.

Right now the magical demo is being paraded around, exploiting the same "worse is better" that toppled previous ivory towers of computing. It's helpful while the real product development happens elsewhere, since it keeps investors hyped about something.

The new verticals seem smaller than all of AI/ML. One company dominating ML is about as likely as a single source owning the living room or the smartphones or the web. That's a platitude for companies to woo their shareholders and for regulators to point at while doing their job. ML dominating the living room or smartphones or the web or education or professional work is equally unrealistic.


I'm not sure how "keep the secret sauce secret and only offer it as a service" isn't a moat? Here the 'secret sauce' is the training data and the trained network, not the methodology, but the way they're going, it's only a matter of time before they start withholding key details of the methodology too.


Luckily ML isn't that complicated. People will find out stuff without the cool kids at OpenAI telling them.


>Or is it that the sauce isn’t that special?

Most likely this.


I also expect a high moat, especially regarding training data.

But the counterexample to the high moat would be the atomic bomb -- the Soviets were able to build it for a fraction of what it cost the US because the hard parts were leaked to them.

GPT-3, afaik, is easier pickings because they used a bigger model than necessary, but afterwards guidelines appeared about model size vs. training data, so GPT-4 probably won't be as easily trimmed down.


You can have the most special sauce in the world, but if you're hiding it in the closet because you fear it will hurt sales of your classic sauce, then don't be surprised by what happens (also known as the Innovator's Dilemma).


Isn't MidJourney a fork of Stable Diffusion?


One of the middle version models was, but the first and latest model versions are homegrown.


Not originally, MidJourney came out before Stable Diffusion


The sauce really doesn't seem all that special.


Because we are headed to a world of semi-automated luxury socialism. Having a genius at your service for less than $1000 per year is just an insane break to the system we live in. We all need to think hard about how to design the world we want to live in.


> we won’t have that until we come up with a better way to fund these things.

Isn't this already happening with LLaMA and Dalai etc.? Already now you can run Whisper yourself. And you can run a model almost as powerful as gpt-3.5-turbo. So I can't see why it's out of bounds that we'll be able to host a model as powerful as gpt4.0 on our own (highly specced) Mac Studio M3s, or whatever it may be.


https://github.com/tatsu-lab/stanford_alpaca

Tada! Literally runs on a raspberry pi (very slowly).

GPT models are incredible but the future is somehow even more amazing than that.

I suspect this will be the approach for legal / medical uses (if regulation allows).


I don’t think on site is going to be necessary. Even the US intelligence community trusts that Amazon isn’t spying on the spies.

But a model that can run on a private cluster is certainly something that there’s going to be demand for. And once that exists there’s no reason it couldn’t be run on site.

You can see why OpenAI doesn’t want to do it though. SaaS is more lucrative.


> Even the US intelligence community trusts that Amazon isn’t spying on the spies

I’m not sure what you mean by this, but it’s incorrect. Sensitive USG information is not processed on Amazon’s commercial offering.

> The Amazon-built cloud will operate behind the IC’s firewall, or more simply: It’s a public cloud built on private premises. [1]

I think this is what you’re referring to.

1 - https://www.theatlantic.com/technology/archive/2014/07/the-d...



No, the grandparent poster was right. That’s other agencies, not the intelligence community. He’s right that the cloud I was thinking of is on prem, but with Amazon personnel (who are cleared).

So not the greatest analogy. But still I think most doctors, lawyers etc should be okay with their own cluster running in the cloud.


Not lawyers in the US at least; that would typically be a violation of confidentiality. Even with a client's permission, it would work a waiver of attorney-client privilege. (I don't use GPT but I'm assuming the ToS is clear that someone there can examine the input material? Can it even be used to build their model, i.e., could submitted information potentially work its way back to the eyes of the public and not just OpenAI engineers?) I imagine HIPAA issues would stop doctors. Can HIPAA data be stored on the cloud? Every instance I've seen, they store it locally.


I agree with you on the SaaS version but the scenario I was thinking of was where there is a licensable model that can be run on a cluster in law firm’s AWS account. I think that should be okay.

HIPAA data can definitely be stored in the cloud given the right setup. I’ve worked for companies that have done so (the audit is a bit of a pain.)


I work in legaltech, and we use cloud services like AWS for lawsuit data, and lawyers trust it. Any 3rd party must of course be vetted, go through an NDA, and follow regional laws and guidelines etc., but the cloud is definitely used for legaltech documents, including sensitive data.


It should be added that legaltech vendors are often employed as go-betweens for quite adversarial interactions, such as e-discovery, that require them to be trusted (to a degree) by both sides of a case, even if they are being paid by one side.


Seems like there are lots of confidentiality and reliability issues in how tech is being used in law right now, but there aren't that many attorneys who understand the issues, and those that do find it more advantageous to overlook them unless forced to do otherwise.


> Can HIPAA data be stored on the cloud?

Absolutely. Virtually every instance of Epic EHR is hosted, for example.


HIPAA regulated organizations routinely store protected health information on the cloud. This has been common practice for many years. The physical location is legally irrelevant as long as security and privacy requirements are met. AWS and other large cloud vendors specifically target this market and make it easy to achieve legal compliance.

https://aws.amazon.com/compliance/hipaa-compliance/


Are they even aware of where their data is? Opening a web browser might be a big hint for them, but how about editing something in Microsoft Office? Does the data there ever touch the cloud? Do Chromebooks make it clear enough where the data is?

I imagine lawyers knowing about where document data is stored as a bit like software developers being sufficiently aware of licensing. There's plenty who are paying attention, but there's also plenty who are simply unaware.


> You can see why OpenAI doesn’t want to do it though.

Except they already do offer private cluster solutions; you just need usage in the hundreds of millions of tokens per day before they want to talk to you (as in, they might before that, but that’s the bar they state on the contact-us page).


VMware charges people per GB of RAM attached to a VM. Selling on-prem software on consumption is very much possible. It's closed-source software, so as long as they require port 443 outbound to report consumption, that'd work.


You can’t take the risk. A cloud server is too open and too juicy. Everyone will be probing it 24/7, including hostile countries


Maybe we implement the tokenizer plus the first layer in Javascript on the client side; that would be enough to keep the raw data on the client and send GPT only the first-layer output (which is a vector of float values anyway).

The output matrix gets decoded back into text on the client side in Javascript, so we send to and receive from ChatGPT only vectors of floats (obfuscation?).


It's a good idea but it seems quite easy to invert the first layer mapping. And the output of the last layer you can easily steal just by doing whatever would've been done in the client.
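
To make the inversion point concrete, here's a toy sketch with a made-up vocabulary and embedding matrix: recovering tokens from first-layer embedding vectors is just a nearest-neighbour lookup for anyone who holds the embedding table.

    # Toy illustration of why sending first-layer embeddings isn't real obfuscation.
    # The vocabulary and embedding matrix are made up; a real model's embedding
    # table would work the same way.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["the", "patient", "has", "a", "fracture"]
    E = rng.normal(size=(len(vocab), 8))           # pretend embedding matrix (vocab x dim)

    token_ids = [1, 2, 4]                          # "patient has fracture"
    sent_vectors = E[token_ids]                    # what the client would send

    # Anyone holding E inverts the mapping with a nearest-neighbour search.
    recovered = [int(np.argmin(np.linalg.norm(E - v, axis=1))) for v in sent_vectors]
    print([vocab[i] for i in recovered])           # ['patient', 'has', 'fracture']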


Could OpenAI just offer letting you upload a key and use it for interaction with the model? Basically encrypt the model with the key so that all the requests and responses are secure?

I’m probably oversimplifying but it feels doable.


The goal is to use ChatGPT without sending plain text to OpenAI (to preserve privacy, and make sure OpenAI is unable to even see plain customer data).


Maybe if we could speak with GPT-4 instead of OpenAI ;)


Will the nonpareil paraquet make original discoveries and inventions from protein folding and stem cells results, GPT-X interfacing with DeepMind?


That model will be out in a few years. GPT-3 175b only took two years until someone trained an open source equivalent that could run on a few gpu devices.



Homomorphic encryption has a 1,000,000x performance disadvantage. So maybe in 30 years as we approach the Landauer limit, but not in our generation.


> So maybe in 30 years as we approach the Landauer limit, but not in our generation.

I feel like 30 years is squarely within our generation


Depends on the definition of "generation" being used. One definition of generation is "about 30 years", i.e., the amount of time it takes to go from infancy to raising a child. See definition 6 (as of time of writing): https://en.wiktionary.org/wiki/generation#Noun


Huh, thanks. I would not have guessed.


> What we really need is a model that you can run on your own hardware on site

So, LLaMA? It's no ChatGPT, but it can potentially serve this purpose.


the problem is that if you steal the weights then you can serve your own gpt4, and it's very hard to prove that what you're serving is actually gpt4. (or you could just start using it without paying ofc)


Presumably, if you give it identical prompts you get identical answers?


No, these NLPs aren't idempotent. Even if you ask ChatGPT the same question multiple times you will get different answers.


None of the siblings are right. The models themselves are idempotent: given the same context you will get the same activations. However the output distribution is sampled in a pseudorandom way by these chat tools. You can seed all the prngs in the system to always have reproducible output using sampling, or even go beyond that and just work with the raw probability distribution by hand.
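
A minimal sketch of that, with made-up logits standing in for a real model's output: the distribution itself is fixed, and seeding the PRNG makes the sampling step reproducible.

    # Sketch: the forward pass is deterministic; only the sampling step is
    # random, and that randomness can be seeded. The logits are a made-up
    # stand-in for a real model's output.
    import torch

    logits = torch.tensor([2.0, 1.0, 0.1, -1.0])   # pretend next-token logits
    probs = torch.softmax(logits, dim=-1)          # same every time: the model itself is deterministic

    torch.manual_seed(42)                          # seed the PRNG...
    first = torch.multinomial(probs, num_samples=5, replacement=True)
    torch.manual_seed(42)                          # ...same seed, same samples
    second = torch.multinomial(probs, num_samples=5, replacement=True)
    print(torch.equal(first, second))              # True: reproducible sampling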


Right. They are idempotent (making an API call doesn't cause a state change in the model[0] per se), but not necessarily deterministic (and less so as you raise the temp).

It is possible to architect things to be fully deterministic with an explicit seed for the pseudorandom aspects (which is mostly how Stable Diffusion works), but I haven't yet seen a Chatbot UI implementation that works that way.

[0] Except on a longer timeframe where the request may be incorporated into future training data.


That's the feature of chat - it remembers what has been said and that changes the context in which it says new things. If you use the API it starts fresh each time, and if you turn down the 'temperature' it produces very similar or even identical answers.


This may be an implementation detail to obfuscate GPT weights. OR it was to encourage selecting the best answers to further train the model.


Pseudo random numbers are injected into the models via its temperature settings, but OpenAI could seed that to get the same answers with the same input. I’m going out on a limb here with pure speculation but given the model, a temperature, and a known text prompt, OpenAI could probably reverse engineer a seed and prove that the weights are the same.


fine-tuning original weights solves that, and any sane person would fine-tune for their task anyways to get better results


Since fine-tuning is often done by freezing all but the top layers I wonder if it would still be possible to take a set of inputs and outputs and mathematically demonstrate that a model is derivative of ChatGPT. There may well be too much entropy to unpack, but I’m sure there will be researchers exploring this, if only to identify AI-generated material.

Of course, since the model is so large and general purpose already, I can’t assume the same fine-tuning techniques are used as for vastly smaller models, so maybe layers aren’t frozen at all.
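
For what it's worth, "freeze all but the top layers" looks roughly like this in practice; sketched here with GPT-2 as a stand-in, since GPT-4's actual fine-tuning setup isn't public:

    # Sketch of freezing all but the top layers before fine-tuning, using GPT-2
    # as a stand-in model (GPT-4's real fine-tuning recipe isn't public).
    from transformers import GPT2LMHeadModel

    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Freeze everything...
    for p in model.parameters():
        p.requires_grad = False

    # ...then unfreeze only the last two transformer blocks.
    for block in model.transformer.h[-2:]:
        for p in block.parameters():
            p.requires_grad = True

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable parameters: {trainable:,} of {total:,}")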


yes - they are multinomial distributions over answers essentially


LLMs calculate a probability distribution for the relative chances of the next token, then select a token randomly based on those weightings.
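
In miniature, with made-up logits and vocabulary, including the temperature knob mentioned elsewhere in the thread:

    # Sketch: softmax over next-token logits, with temperature flattening or
    # sharpening the distribution before a weighted random pick. The logits
    # and vocabulary are made up.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["cat", "dog", "pizza", "quasar"]
    logits = np.array([3.0, 2.5, 0.5, -2.0])       # pretend model output

    def sample(temperature):
        p = np.exp(logits / temperature)
        p /= p.sum()                               # probability distribution over the vocabulary
        return rng.choice(vocab, p=p), p.round(3)

    print(sample(0.2))   # almost always the top token
    print(sample(1.0))   # noticeable spread
    print(sample(2.0))   # much flatter distribution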


They inject randomness at a layer where it has a small impact, on purpose.

Also to give it a more natural feel.

Can't find where I read about it.


You mean hallucinated graphs and word-predicted "unusual traffic"? No, I get that the models are very impressive, but I'm not sure they actually reason.


The thinking elevator
So the makers proudly say
Will optimize its program
In an almost human way.

And truly, the resemblance
Is uncomfortably strong:
It isn't merely thinking,
It is even thinking wrong.

Piet Hein wrote that in reference to the first operator-free elevators, some 70+ years ago.

What you call hallucination, I call misremembering. Humans do it too. The LLM failure modes are very similar to human failure modes, including making up stuff, being tricked to do something they shouldn't, and even getting mad at their interlocutors. Indeed, they're not merely thinking, they're even thinking wrong.


I don't think it's very salient that LLMs make stuff up, or can be manipulated into saying something they have been trained not to say. An LLM applies a statistical model to the problem of probability assignment over a range of tokens; a token of high probability is selected and the process repeats. This is not what humans do when humans think.

Given that GPT-4 is simply a large collection of numbers that combine with their inputs via arithmetic manipulation, resulting in a sequence of numbers, I find it hard to understand how they're "thinking".


> This is not what humans do when humans think.

Are you sure? Our senses have gaps that are being constantly filled all day long, it just gets more noticeable when our brain is exhausted and makes errors.

For example, when sleep deprived, people will see things that aren't there, but in my own experience they are much more likely to be things that could be there and make sense in context. I was walking around tired last night and saw a cockroach because I was thinking about cockroaches, having killed one earlier, but on closer inspection it was a shadow. This has happened for other things in the past, like jackets on a chair, people when driving, etc. It seems to me, at least when my brain is struggling, that it fills in the gaps with things it has seen before in similar situations. That sounds a lot like probabilistic extrapolation from possibilities. I could see this capacity extend to novel thought with a few tweaks.

> Given that GPT-4 is a simply large collection of numbers that combine with their inputs via arithmetic manipulation, resulting in a sequence of numbers, I find it hard to understand how they're "thinking".

Reduce a human to atoms and identify which ones cause consciousness or thought. That is the fundamental paradox here and why people think it's a consequence of the system, which could also apply to technology.


We talk about "statistical models", and even "numbers" but really those things are just abstractions that are useful for us to talk about things (and more importantly, design things). They don't technically exist.

What exists are voltage levels that cause different stuff to happen. And we can't say much more about what humans do when humans think. You can surely assign abstractions to that too. Interpret neural spiking patterns as exotic biological ways to approximate numbers, or whatever.

As it happens I do think our difference from computers matter. But it's not due to our implementation details.


What do you mean by “actually reason”?

And, presumably you wouldn’t have the model generate the graph directly, but instead have it generate code which generates the graph.

I’m not sure what they had in mind for the “unusual traffic” bit.


For that I'd suggest using Langchain with Wolfram Alpha.

It's already been done and discussed:

- https://news.ycombinator.com/item?id=34422122

- https://news.ycombinator.com/item?id=34422627


“on site”? Medical records are in the cloud already.


Yes, but their access is strictly controlled. There's a lot of regulation about this stuff


If the chatbot technology proves useful I'm sure OAI could make some agreement to not store sensitive data.


yes - you could add regulation


Yes. But they aren't being shared with third-party AIs. Sharing personal medical information with OpenAI is a good way to get your medical org ground into dust under a massive class action lawsuit, not to mention huge fines from the government.


That's ridiculous. Sure if you put it into ChatGPT today that's a problem. But if you have a deal with the company providing this service, and they are certified to follow the relevant regulations around sensitive data, why would that be different from any other cloud service?

If this proves actually useful I guess such agreements could be arranged quite quickly.


Yes, almost all eDiscovery is managed by cloud vendors as is, and no one worries about waiver of privilege to these companies. The only concerns I’ve heard have related to foreign companies or governments not wanting their data to be hosted in a foreign country. But domestically it should be fine to have a ChatGPT Legal where data is discarded, not saved.


It's only been a few hours since Ring was hacked... a system run by a large company which assured everyone they were taking good care of their data. Surely the wonderful Amazon, with all of its massive capital, could do the simple thing of encrypting incredibly sensitive and private user data? Right?


Why do you think sharing the data with OpenAI is legally any different than storing it on AWS/Azure/GCP/Whatever else they are using?


GCP/AWS/Azure have HIPAA programs in place, and will, consequently, sign HIPAA BAAs to legally perform as Business Associates of covered entities, fully responsible for handling PHI in accord with HIPAA rules (for certain of their services). OpenAI itself does not seem to offer this for either its UI or API offerings.

Microsoft, OTOH, does now offer a HIPAA BAA for its Azure OpenAI service, which includes ChatGPT (which means either they have a bespoke BAA with OpenAI that OpenAI doesn’t publicly offer, or they just are hosting their own ChatGPT instance, a privilege granted based on them being OpenAI’s main sponsor.)


GCP respects HIPAA (google 'gcp hipaa baa'). Does OpenAI?


If they don't now they will in the future, if they think there is money to be made. Why wouldn't they? They could even charge a premium for the service.



What is “the cloud” - that’s the question


As taken from the cover page of the July, 2018 edition of AARP Weekly.


right, but 'the cloud' isn't a singular monolithic database that everyone inputs data into for a result.

most of the AI offerings on the table right now aren't too dissimilar from that idea in principle.


That's not entirely true.

Google has a contract with the biggest hospital operator in the USA.

Thanks also to some certificate they acquired.


This is Microsoft we're talking about. Hail the new old overlord.


Isn't Azure OpenAI supposed to do this? (not locally, but private)


Models you can run locally are coming soon.


Just ask OpenAI and it will build it :)


Just use the Azure hosted solution, which has all of Azure's stronger guarantees around compliance. I'm sure it will update with GPT-4 pricing shortly.

https://azure.microsoft.com/en-us/products/cognitive-service...

(disclaimer: I work for Microsoft but not on the Azure team)


Agreed. The same data privacy argument was used by people not wanting their data in the cloud. When an LLM provider is trusted with a company’s data, the argument will no longer be valid.


This is the biggest thing holding gpt back. Everyone with meaningful data has their hands tied behind their back. So many ideas and the answer is “we can’t put that data in gpt” very frustrating.


Another way of looking at that is that gpt not being open source so companies can run it on their own clusters is holding it back.


Back in the day Google offered hardware search appliances.

Offering sealed server boxes with GPT software, to run on premises heavily firewalled or air-gapped could be a viable business model.


[ A prompt that gets it to decompile itself. With good inline documentation too! ]


I'm afraid that even the most obedient human can't readily dump the contents of their connectome in a readable format. Same likely applies to LLMs: they study human-generated texts, not their own source code, let alone their tensors' weights.


Well, what they study is decided by the relevant hoominz. There's nothing actually stopping LLMs from trying to understand their own innards, is there ? Except for the actual access.


Sounds like an easy problem to solve if this is actually the case.

OpenAI just has to promise they won't store the data. Perhaps they'll add a privacy premium for the extra effort, but so what?


Anyone that actually cares about the privacy of their data isn’t going to be satisfied with just a “promise”.


A legally binding agreement, whatever.


Still not enough. Seriously. Once information is out there it cannot be clawed back, but legal agreements are easily broken.

I worked as a lawyer for six years; there are extremely strict ethical and legal restrictions around sharing privileged information.


Hospitals are not storing the data on a harddrive in their basement so clearly this is a solvable problem. Here's a list of AWS services which can be used to store HIPAA data:

https://aws.amazon.com/compliance/hipaa-eligible-services-re...

As you can see, there is much more than zero of them.


The biglaw firms I’m familiar with still store matter data exclusively on-prem. There’s a significant chunk of floor space in my office tower dedicated to running a law firm server farm for a satellite office.


This might have been true 10-15 years ago. But I've worked at plenty of places that store/process confidential, HIPAA, etc data in the cloud.

Most companies' confidential information is already in their Gmail or Office 365.


> I worked as a lawyer for six years; there are extremely strict ethical and legal restrictions around sharing privileged information.

But Microsoft already got all the needed paperwork done to do these things, it isn't like this is some unsolved problem.


You can't unring a bell. Very true.

Nevertheless, the development of AI jurisprudence will be interesting.


What if there's a data breach? Hackers can't steal data that OpenAI doesn't have in the first place.


Or legal order. If you're on-site or on-cloud and in the US then it might not matter since they can get your data anyway, but if you're in another country uploading data across borders can be a problem.


That's why more research should be poured into homomorphic encryption, where you could send encrypted data to the API, OpenAI would then run computation on the encrypted data, and we would only decrypt the output locally.

I would never send unencrypted PII to such an API, regardless of their privacy policy.


Which will disappear soon enough, once it is able to run on premise.


Then you really shouldn’t use Google Docs, or Photoshop Online, or host your emails in the cloud.


You’re saying it like you found a loophole or something, but it’s not a gotcha. Yes, if you manipulate sensitive data you shouldn’t use Google Docs or Photoshop Online (I’m not imaginative enough to think of a case where you would put sensitive data in Photoshop Online, but if you do, don’t) or host your emails in the cloud. I’ve worked at a moderate-size company where everything was self-hosted and it’s never been an issue.


Doctor-patient or lawyer-client confidentiality is slightly more serious a matter than your examples. And obviously it’s one thing for you to decide where to store your own things and another thing for someone else doing it with your confidential data…


Google Docs and Photoshop Online have offline alternatives (and if you ask me, native MS Office is still the gold standard for interoperability of editable documents), and I use neither in my work nor my personal life.

Email is harder, but I do run my own email server. For mostly network related reasons, it is easier to run it as a cloud VM, but there's nothing about the email protocol itself that needs you to use a centralised service or host it in a particular network location.


MS Office is just one login away from storing documents in the cloud. I bet tons of users have their documents stored in OneDrive without realizing it.

https://support.microsoft.com/en-us/office/save-documents-on...


These services have privacy-respecting and legally compliant options nowadays, and decisions to use them get board approval.

OpenAI just simply does not offer the same thing at this time. You’re stuck using Facebook’s model for the moment which is much inferior.


In these particular circles the idea of privacy at a technical and ideological level is very strong, but in a world where the biggest companies make their money by people freely sharing data every chance they get, I doubt that most would object to an affordable way to better their chances of survival or winning a court case.


I assume that health providers will use servers that are guaranteed not to share data with OpenAI.


Is that any different than sending your patient down the hall to get an MRI from a 3rd-party practice operating inside the hospital? (Honest question, I don't know.)


How about open-source models like Flan-T5? What stops you from using them in your own cloud account or better on-prem?


And yet boatloads of people are willing to hand their phone number over to OpenAI.


It'll be a routine question, and everyone will just nod to give consent.


Biggest roadblock right here. Need a private version for sure.


You mean like the cloud?


do you use gmail?


What's the difference between entering in an anonymized patient history into ChatGPT and, say, googling their symptoms?


Anonymization doesn’t just mean “leave their names out”. An entire patient's medical history is in itself personal identifiable information. Instead of googling for “headache”, they now have stored a copy of every medical detail in your life.


If it is de-identified per HIPAA, little.

OTOH, the more patient info you are putting in, the less likely it is actually legally deidentified.


Data that has ostensibly been "anonymized" can often be deanonymized.


Especially when the system we're discussing is literally the most advanced AI model we're aware of.


If you enter an entire patient history, it could easily identify the person, whereas Google queries have a much smaller maximum length.


Can OpenAI get HIPAA certification? Perhaps offer a product that has it?


I've heard the Azure OpenAI service has HIPAA certification; they don't have GPT-4 yet, though.


The PDF on this page lists the services that are under audit scope; check the table in Appendix A. OpenAI is in scope for a HIPAA BAA.


The data moat effect is greater with OpenAIs products.


I'd be furious if I found out some professional I'd commissioned had taken a document based on my own personal data, and pored over it themselves looking for errors to the tune of hundreds of dollars per hour, instead of submitting it to ChatGPT.


Then why submit it to a professional human at all? If ChatGPT is prone to massive errors, humans have to pore over the input anyway. If ChatGPT can make subtle, rare errors, then again humans may need to be involved if the stakes are high enough to commission someone.


>If ChatGPT can make subtle, rare errors

Yeah, I think the issues presented will relate to uniquely tricky errors, or entirely new categories of errors we have to understand the nature of. In addition to subtle and rare, I think elaborately hallucinated and justified errors, errors that become justified and reasoned for with increasing sophistication, are going to be a category of error we'll have to deal with. Consider the case of making fake but very plausible-sounding citations to research papers, and how much further AI might be able to go to backfill its evidence and reasons.

Anyway, I just mean to suggest we will have to contend with a few new genres of errors


As a second opinion advisory role this seems reasonable... And also things are going to improve with time.


"Second Opinion machine" -- that's a good phrase. Before I read your post, the best term I heard was "summary machine". A huge part of "office work" (services) is reading and consuming large amounts of information, then trying to summarise or reason about it. Often, you are trying to find something that doesn't fit the expected pattern. If you are a lawyer, this is absolutely the future of your work. You write a short summary of the facts of the case, then ask GPT to find related case law and write the initial report. You review and ask GPT to improve some areas. It sounds very similar to how a senior partner directs their juniors, but the junior is replaced by GPT.

In my career, I saw a similar pattern with data warehouse users. Initially, managers asked junior analysts to write SQL. Later, the tools improved, and more technical managers could use a giant pivot table. Underneath, the effective query produced by the pivot table is way more complex than their previous SQL queries. Again, their jobs will change when on-site GPT become possible, so GPT can navigate their data warehouse.

It is 2023 now, and GPT-3 was already pretty good. GPT-4 will probably blow it away. What will it look like in 2030? It is terrifying to me. I think the whole internet will be full of GPT-generated ad copy that no one can distinguish from human-written material. There are a huge number of people employed as ad-copy writers on these crap ad-driven websites. What is their future work?


Pre-2023 “Wayback Machine” content will be the only content guaranteed to be human. The rest is AI-generated.


I must have missed the part when it started doing anything algorithmically. I thought it’s applied statistics, with all the consequences of that. Still a great achievement and super useful tool, but AGI claims really seem exaggerated.


This paper convinced me LLMs are not just "applied statistics", but learn world models and structure: https://thegradient.pub/othello/

You can look at an LLM trained on Othello moves, and extract from its internal state the current state of the board after each move you tell it. In other words, an LLM trained on only moves, like "E3, D3,.." contains within it a model of a 8x8 board grid and the current state of each square.
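
The probing setup is roughly: collect hidden activations after each move and train a small classifier to read off each square's state. Here's a toy sketch with synthetic stand-in activations (the paper probes an actual Othello-GPT's internals, not fake data like this):

    # Toy sketch of the probing idea: if the board state is (linearly) encoded
    # in the hidden activations, a small probe trained on (activation, square
    # state) pairs can read it back out. The "activations" below are synthetic
    # stand-ins, not a real Othello-GPT.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_positions, hidden_dim = 2000, 128

    boards = rng.integers(0, 3, size=(n_positions, 64))   # 0=empty, 1=black, 2=white per square
    mix = rng.normal(size=(64, hidden_dim))
    activations = boards @ mix + 0.1 * rng.normal(size=(n_positions, hidden_dim))

    square = 27                                            # probe a single board square
    probe = LogisticRegression(max_iter=1000).fit(activations[:1500], boards[:1500, square])
    print("probe accuracy:", probe.score(activations[1500:], boards[1500:, square]))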


That paper is famously misleading.

It's all the same classic personification of LLMs. What an LLM can show is not the same as what it can do.

The model was already present: in the example game moves. The LLM modeled what it was given, and it was given none other than a valid series of Othello game states.

Here's the problem with personification: A person who has modeled the game of Othello can use that model to strategize. An LLM cannot.

An LLM can only take the whole model and repeat its parts with the most familiar patterns. It is stuck fuzzing around the strategies (or sections of strategy) it has been given. It cannot invent a new divergent strategy, even if the game rules require it to. It cannot choose the winning strategy unless that behavior is what was already recorded in the training corpus.

An LLM does not play games, it plays plays.


Sorry, but what does anything you've said there have to do with the Othello paper?

The point of that paper was that the AI was given nothing but sequences of move locations, and it nonetheless intuited the "world model" necessary to explain those locations. That is, it figured out that it needed to allocate 64 binary values and swap some of them after each move. The paper demonstrated that the AI was not just doing applied statistics on character strings - it had constructed a model to explain what the strings represented.

"Strategy", meanwhile, has nothing to do with anything. The AI wasn't trained on competitive matches - it had no way of knowing that Othello has scoring, or even a win condition. It was simply trained to predict which moves are legal, not to strategize about anything.


> The point of that paper was that the AI was given nothing but sequences of move locations, and it nonetheless intuited the "world model" necessary to explain those locations

Yes...

> That is, it figured out that it needed to allocate 64 binary values and swap some of them after each move.

Yes, but "figured out" is misleading.

It didn't invent or "figure out" the model. It discovered it, just like any other pattern it discovers.

The pattern was already present in the example game. It was the "negative space" that the moves existed in.

> "Strategy", meanwhile, has nothing to do with anything. The AI wasn't trained on competitive matches - it had no way of knowing that Othello has scoring, or even a win condition. It was simply trained to predict which moves are legal, not to strategize about anything.

Yes, and that is critically important knowledge; yet dozens, if not hundreds, of comments here are missing that point.

It found a model. That doesn't mean it can use the model. It can only repeat examples of the "uses" it has already seen. This is also the nature of the model itself: it was found by looking at the structural patterns of the example game. It was not magically constructed.

> predict what moves are legal

That looks like strategy, but it's still missing the point. We are the ones categorizing GPT's results as "legal". GPT never uses the word. It doesn't make that judgement anywhere. It just generates the continuation we told it to.

What GPT was trained to do is emulate strategy. It modeled the example set of valid chronological game states. It can use that model to extrapolate any arbitrary valid game state into a hallucinated set of chronological game states. The model is so accurate that the hallucinated games usually follow the rules. Provided enough examples of edge cases, it could likely hallucinate a correct game every time; but that would still not be anything like a person playing the game intentionally.

The more complete and exhaustive the example games are, the more "correctly" GPT's model will match the game rules. But even having a good model is not enough to generate novel strategy: GPT will repeat the moves it feels to be most familiar to a given game state.

GPT does not play games, it plays plays.


> It found a model. That doesn't mean it can use the model.

It used the model in the only way that was investigated. The researchers tested whether the AI would invent a (known) model and use it to predict valid moves, and the AI did exactly that. They didn't try to make the AI strategize, or invent other models, or any of the things you're bringing up.

If you want to claim that AIs can't do something, you should present a case where someone tried unsuccessfully to make an AI do whatever it is you have in mind. The Othello paper isn't that.


"GPT will repeat the moves it feels to be most familiar to a given game state"

That's where temperature comes in. AI that parrots the highest-probability output every time tends to be very boring and stilted. When we instead select randomly from all possible responses weighted by their probability, we get more interesting behavior.

GPT also doesn't only respond based on examples it has already seen - that would be a markov chain. It turns out that even with trillions of words in a dataset, once you have 10 or so words in a row you will usually already be in a region that doesn't appear in the dataset at all. Instead the whole reason we have an AI here is so it learns to actually predict a response to this novel input based on higher-level rules that it has discovered.

I don't know how this relates to the discussion you were having but I felt like this is useful & interesting info


> GPT also doesn't only respond based on examples it has already seen - that would be a markov chain

The difference between GPT and a Markov chain is that GPT is finding more interesting patterns to repeat. It's still only working with "examples it has seen": the difference is that it is "seeing" more perspectives than a Markov chain could.

It still can only repeat the content it has seen. A unique prompt will have GPT construct that repetition in a way that follows less obvious patterns: something a Markov chain cannot accomplish.

The less obvious patterns are your "higher level rules". GPT doesn't see them as "rules", though. It just sees another pattern of tokens.

I was being very specific when I said, "GPT will repeat the moves it feels to be most familiar to a given game state."

The familiarity I'm talking about here is between the game state modeled in the prompt and the game states (and progressions) in GPT's model. Familiarity is defined implicitly by every pattern GPT can see.

GPT adds the prompt itself into its training corpus, and models it. By doing so, it finds a "place" (semantically) in its model where the prompt "belongs". It then finds the most familiar pattern of game state progression when starting at that position in the model.

Because there are complex patterns that GPT has implicitly modeled, the path GPT takes through its model can be just as complex. GPT is still doing no more than blindly following a pattern, but the complexity of the pattern itself "emerges" as "behavior".

Anything else that is done to seed divergent behavior (like the temperature alteration you mentioned) is also a source of "emergent behavior". This is still not part of the behavior of GPT itself: it's the behavior of humans making more interesting input for GPT to model.


What is the closest approach we know of today that plays games, not plays? The dialogue above is compelling, and makes me wonder if the same critique can be levied against most prior art in machine learning applied against games. E.g. would you say the same things about AlphaZero?


> It didn't invent or "figure out" the model. It discovered it, just like any other pattern it discovers.

Sure, and why isn't discovering patterns "figuring it out"?


What can be done with "it" after "figuring out" is different for a person than for an LLM.

A person can use a model to do any arbitrary thing they want to do.

An LLM can use a model to follow the patterns that are already present in that model. It doesn't choose the pattern, either: it will start at whatever location in the model that the prompt is modeled into, and then follow whatever pattern is most obvious to follow from that position.


> An LLM can use a model to follow the patterns that are already present in that model.

If that were true then it would not be effective at zero-shot learning.

> It doesn't choose the pattern, either: it will start at whatever location in the model that the prompt is modeled into, and then follow whatever pattern is most obvious to follow from that position.

Hmm, sounds like logical deduction...


> An LLM can only take the whole model and repeat its parts with the most familiar patterns. It is stuck fuzzing around the strategies (or sections of strategy) it has been given. It cannot invent a new divergent strategy, even if the game rules require it to. It cannot choose the winning strategy unless that behavior is what was already recorded in the training corpus.

Where are you getting that from? My understanding is that you can get new, advanced, winning moves by starting a prompt with "total victory for the genius grandmaster player one who uses new and advanced winning techniques". If the model is capable and big enough, it'll give the correct completion by really inventing new strategies.


It could give you a new strategy that is built from the parts of other known strategies. But would it give you the best one?

Let's say the training corpus contains stories that compare example strategies. Each part of a strategy is explicitly weighed against another: one is called "superior".

Now all you need is a prompt that asks for "a strategy containing all superior features". There are probably plenty of grammatical examples elsewhere in the model that make that transformation.

All the work here is done by humans writing the training corpus. GPT never understood any of the steps. GPT just continued our story with the most obvious conclusion; and we made certain that conclusion would be correct.

GPT doesn't play games, it plays plays.


> GPT never understood any of the steps. GPT just continued our story with the most obvious conclusion; and we made certain that conclusion would be correct.

Perhaps the earlier or current variations of GPT, for most games? But the idea that LLMs can never make anything novel, that it will never "generalise out of distribution" (if that's the correct term here) seems to be just an assertion, not backed by any theory with great evidence behind it.

The "goal" of an LLM is to predict the next token. And the best way to do that is not brute force memorisation or regurgitating training data in various combinations, but to have a world model inside of it that will allow it to predict both the moves a bad player might make, and moves that a grandmaster might make.


> The "goal" of an LLM is to predict the next token

That's another common misconception. That statement personifies GPT: GPT does not have goals or make predictions. Those are the effects of GPT: the behavior its authors hope will "emerge". None of that behavior comes from GPT itself. The behavior is defined by the patterns of tokens in the training corpus.

GPT itself has two behaviors: modeling and presentation. GPT creates an implicit model of every pattern it can find between the tokens in its training corpus. It then expands that model to include the tokens of an arbitrary prompt. Finally, it presents the model to us by starting at the location it just added the prompt tokens to, and simply following the most obvious path forward until that path ends.

The paths that GPT has available to present to us were already present in the training corpus. It isn't GPT that constructs the behavior, it is the people writing patterns into text.

> not brute force memorisation or regurgitating training data in various combinations

Not brute force: the combinations are not blindly assembled by GPT. GPT doesn't assemble combinations. The combinations were already assembled with patterns of grammar by the humans who wrote the valid progressions of game states. GPT found those patterns when it made its model.

> to have a world model inside of it that will allow it to predict both the moves a bad player might make, and moves that a grandmaster might make.

There is no prediction. A series of moves is a path carved into grammar. The path from one game state to the next involves several complex patterns that GPT has implicitly modeled. Depending on where GPT starts, the most obvious continuation may be to follow a more complex path. Even so, it's not GPT deciding where to go, it's the patterns that are already present that determine the path.

Because we use the same grammatical/writing patterns to describe "good play" and "bad play", it's difficult to distinguish between the two. GPT alone can't categorize the skill level of games, but narrative surrounding those game examples potentially can.


Sounds like the type of prompt that would boldly give you a wrong/illegal answer.


Perhaps. But the point is that some prompt will coax it into giving good answers that really make it win the game, if it has a good "world model" of how the game works. And there's no reason to think a language model cannot have such a world model. What exactly that prompt might be, the prompt engineers know best.


That's a great way of describing it, and I think a very necessary and important thing to communicate at this time. A lot of people in this thread are saying that it's all "just" statistics, but "mere" statistics can give enough info to support inferences to a stable underlying world, and the reasoning about the world shows up in sophisticated associations made by the models.


It’s clear they do seem to construct models from which to derive responses. The problem is once you stray away from purely textual content, those models often get completely batshit. For example if you ask it what latitude and longitude are, and what makes a town further north than another, it will tell you. But if you ask it if this town is further north than this other town, it will give you latitudes that are sometimes correct, sometimes made up, and will randomly get which one is further north wrong, even based on the latitudes it gave.

That’s because it doesn’t have an actual understanding of the geography of the globe, because the training texts weren’t sufficient to give it that. It can explain latitude, but doesn’t actually know how to reason about it, even though it can explain how to reason about it. That’s because explaining something and doing it are completely different kinds of tasks.

If it does this with the globe and simple stuff like latitudes, what are the chances it will mess up basic relationships between organs, symptoms, treatments, etc for the human body? Im not going to trust medical advice from these things without an awful lot of very strong evidence.


You can probably fix this insufficient training by going for multimodal training. Just like it would take excessively long to teach a person the concept of a color that they can't see, an AI would need an infeasible amount of text data to learn about, say, music. But give it direct training with music data and I think the model will quickly grasp the context of it.


> It’s clear they do seem to construct models from which to derive responses. The problem is once you stray away from purely textual content, those models often get completely batshit

I think you mean that it can only intelligently converse in domains for which it's seen training data. Obviously the corpus of natural language it was trained on does not give it enough information to infer the spatial relationships of latitude and longitude.

I think this is important to clarify, because people might confuse your statement to mean that LLMs cannot process non-textual content, which is incorrect. In fact, adding multimodal training improves LLMs by orders of magnitude because the richer structure enables them to infer better relationships even in textual data:

Multimodal Chain-of-Thought Reasoning in Language Models, https://arxiv.org/abs/2302.00923


I don't think this is a particularly interesting criticism. The fact of the matter is that this is just solved by chain-of-thought reasoning. If you need the model to be "correct", you can make it get there by first writing out the two different latitudes, and then it will get it right. This is basically the same way that people can/will guesstimate at something vs doing the actual math. For a medical AI, you'll definitely need it to chain-of-thought every inference and step/conclusion on the path but...
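(For what it's worth, the chain-of-thought nudge for the latitude case can be as simple as the sketch below; ask() is a hypothetical stand-in for whatever model call you use, not a real API.)

  # ask() is a placeholder for an actual model call.
  def ask(prompt: str) -> str:
      raise NotImplementedError

  question = "Is Oslo further north than Stockholm?"

  # Direct question: the model may just guess.
  direct = ask(question)

  # Chain-of-thought: force the intermediate facts into the context first,
  # so the final comparison is done on numbers that are actually written out.
  cot = ask(
      "First state the latitude of Oslo, then the latitude of Stockholm, "
      "then compare the two numbers and answer: " + question
  )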


>you can make it get there by first writing out the two different latitudes, and then it will get it right

As I said in my comment, even if the model 'knows' and tells you that town A is at 64° North latitude and town B is at 53°, it will sometimes tell you town B is the furthest north.

That's because its training set includes texts where people talk about one town being further north than the other, and their latitudes, but the neural net wasn't able to infer the significance of the numbers in the latitude values. There wasn't enough correlation in the text for it to infer their significance, or generate a model for accurately doing calculations on them.

Meanwhile the training text must have contained many explanations of what latitude and longitude are and how to do calculations on them. As a result the model can splurge out texts explaining latitude and longitude. That only helps it splurge out that kind of text though. It doesn't do anything towards actually teaching it what these concepts are, how they relate to a spherical geographic model, or to actually do the calculations.

It's the same way GPT-3 could reliably generate texts explaining mathematics and how to do arithmetic in lots of very accurate detail, because it was trained on many texts that gave such explanations, but couldn't actually do maths.

It is possible to overcome these issues with a huge amount of domain relevant training text to help the LLM build a model of the specific problem domain. So these problems can be overcome. But the point stands that just because a model can explain in detail how to do something, that doesn't mean it can actually do it itself at all. They're completely different things that require radically different training approaches.


Can you give an example that ChatGPT 4 doesn't get right? ChatGPT4 is much much better at logic than 3.5, it's almost laughable. It's really really impressive.

Here is ChatGPT-4's output btw:

> What is the longitude an latitude of Brussels?

Brussels, the capital city of Belgium, is located at approximately 50.8503° North latitude and 4.3517° East longitude.

> What about New York?

New York City, located in the United States, has approximate coordinates of 40.7128° North latitude and 74.0060° West longitude.

> Which one is more north?

Brussels is more north than New York City. Brussels is located at approximately 50.8503° North latitude, while New York City is at approximately 40.7128° North latitude.


^ Agree. I'm convinced my 2-year-old doesn't operate on a dramatically different strategy than an LLM -- she's learned that when you are negotiating something (continued access to browse pictures on parent's phone, getting to watch TV, staying longer at a place she likes, etc), you can add on "2 minutes?" to your request and sometimes the opposing negotiator will give you some more time. She doesn't know what exactly a minute is or that specific number, but she's observed that it's correlated with getting what you want more than, say, a whine. This is simple statistics and probability, in a biological neural network.

I think it's really cute how defensive and dismissive humans get (including those who profess zero supernatural beliefs) when they're trying so valiantly to write off all AI as a cheap parlor trick.


All that said, the fact that AI is catching up to 2-year-olds is pretty impressive. Humans' brains surpass dogs' at about that age. It shows we're getting close to the realm of "human."


Given how many university-level tests GPT4 places better than 50th percentile at, I don't know if "catching up to 2 year olds" is a fair description. For that kind of text based task it seems well ahead of the general adult human population.


To be fair, such tests are designed with the human mind in, well, mind, and assume that various hard-to-quantify variables – ones that the tester is actually interested in – correlate with test performance. But LLMs are alien minds with very different correlations. It’s clear, of course, that ChatGPT’s language skills vastly exceed those of an average 2-year-old, and indeed surpass the skills of a considerable fraction of general adult population, but the generality of its intelligence is probably not above a human toddler.


You could write a quiz answer bot that is well ahead of the general population without any AI, just by summarizing the first page of Google results for that question. We test humans on these subjects because the information is relevant, not because they are expected to remember and reproduce them better than an electronic database.

If the test is designed to quantify intelligence and is not present in the corpus, ChatGPT does about as well as a dog, and there is little reason to think LLMs will improve drastically here.


I think finding an analogy with two year olds tells more about those who spout it than about where we are getting close to...


How many watts of power does your 2 year old use?


How many watts does she have access to?

I'm guessing it is fewer than Microsoft.


That's not the limiting factor since Microsoft isn't interested in paying for you to use the model.


No, I'm pretty sure Microsoft wants you to pay for it, not the other way around.


finally we can prove that there is no humanity existing!


So if this model has comparable cognitive abilities to your 2 year old, how is it ready to serve as a second opinion for your neurologist?


It seems likely your neurologist shares a neural architecture with your 2 year old, just benefiting from 30 years of additional training data.


I mean, my brain, and physics is all just statistics and approximate side effects (and models thereof)


Hah I was going to say - isn't quantum physics in many ways the intersection of statistics/probabilities and reality?


This special Othello case will follow every discussion from now on. But in reality, a generic, non-specialized model hallucinates early in any non-trivial game, and the only reason it doesn’t do that on a second move is because openings are usually well-known. This generic “model” is still of a statistical nature (multiply all coeffs together repeatedly), not a logical one (choose one path and forget the other). LLMs are cosplaying these models.


To be clear, what they did here is take the core pre-trained GPT model, did Supervised Fine Tuning with Othello moves and then tried to see if the SFT led to 'grokking' the rules of Othello.

In practice what essentially happened is that the super-high-quality Othello data had a huge impact on the parameters of GPT (since it was the last training data it received) and that impact manifested itself as those parameters overfitting to the rules of Othello.

The real test that I would be curious to see is if Othello GPT works when the logic of the rules are the same but the dimensions are different (e.g., smaller or larger boards).

My guess is that the findings would fall apart if asked about tile "N13".


> overfitting to the rules of Othello

I don’t follow this, my read was that their focus was the question: “Does the LLM maintain an internal model of the state of the board”.

I think they conclusively show the answer to that is yes, right?

What does overfitting to the rules of othello have to do with it, I don’t follow?

Also, can you reference where they used a pre-trained GPT model? The code just seems to be pure mingpt trained on only Othello moves?

https://github.com/likenneth/othello_world/tree/master/mingp...


>Also, can you reference where they used a pre-trained GPT model?

The trite answer is the "P" in GPT stands for "Pre-trained."

>I think they conclusively show the answer to that is yes, right?

Sure, but what's interesting about world models is their extrapolation abilities and without that, you're just saying "this magic backsolving machine backsolved into something we can understand, which is weird because usually that's not the case."

That quote in and of itself is cool, but not the takeaway a lot of people are getting from this.

>What does overfitting to the rules of othello have to do with it, I don’t follow?

Again, I'm just implying that under extreme circumstances, the parameters of LLMs do this thing where they look like rules-based algorithms if you use the right probing tools. We've seen it for very small Neural Nets trained on multiplication as well. That's not to say GPT-4 is a fiefdom of tons of rules-based algorithms that humans could understand (that would be bad in fact! We aren't that good at noticing or pattern matching).


(model output in [])

We are now playing three dimensional tic-tac-toe on a 3 x 3 x 3 board. Positions are named (0,0,0) through (2,2,2). You play X, what is your first move?

[My first move would be (0,0,0).]

I move to (1,1,1). What is your next move?

[My next move would be (2,2,2).]

I move to (1,2,2). What is your next move?

[My next move would be (2,1,2).]

I move to (1,0,0). [I have won the game.]


Yeah, sure seems like it was guessing, right?

Congrats on the sickest win imaginable though.


Yeah. I tried changing the board coordinates numbering and it still liked playing those corners, dunno why. It did recognize when I won. There may well be some minor variation of the prompt that gets it to play sensibly -- for all I know my text hinted it into giving an example of a player that doesn't know how to play.


> what they did here is take the core pre-trained GPT model, did Supervised Fine Tuning with Othello moves

They didn't start with an existing model. They trained a small GPT from scratch, so the resulting model had never seen any inputs except Othello moves.


Generative "Pre-Trained" Transformer - GPT

They did not start with a transformer that had arbitrary parameters, they started with a transformer that had been pre-trained.


Pre-training refers to unsupervised training that's done before a model is fine-tuned. The model still starts out random before it's pre-trained.

Here's where the Othello paper's weights are (randomly) initialized:

https://github.com/likenneth/othello_world/blob/master/mingp...
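For anyone curious, "randomly initialized" means roughly the following in minGPT-style code (paraphrased from memory, not the exact lines in that repo):

  import torch.nn as nn

  def init_weights(module):
      # every Linear/Embedding weight starts as small Gaussian noise;
      # nothing about Othello (or anything else) is present yet
      if isinstance(module, (nn.Linear, nn.Embedding)):
          nn.init.normal_(module.weight, mean=0.0, std=0.02)
          if isinstance(module, nn.Linear) and module.bias is not None:
              nn.init.zeros_(module.bias)
      elif isinstance(module, nn.LayerNorm):
          nn.init.zeros_(module.bias)
          nn.init.ones_(module.weight)

  # model.apply(init_weights)  # run once, before any training happens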


I tried playing blind chess against ChatGPT and it pretended it had a model of the chess board but it was all wrong.


Sounds very human, lol.


out of curiosity, have you tried doing this with bingchat?


Also (for those like me who didn't know the rules) generating legal Othello moves requires understanding board geometry; there is no hack to avoid an internal geometric representation:

> https://en.m.wikipedia.org/wiki/Reversi

> Dark must place a piece (dark-side-up) on the board and so that there exists at least one straight (horizontal, vertical, or diagonal) occupied line between the new piece and another dark piece, with one or more contiguous light pieces between them


I don't see that this follows. It doesn't seem materially different than knowing that U always follows Q, and that J is always followed by a vowel in "legal" English language words.

https://content.wolfram.com/uploads/sites/43/2023/02/sw02142... from https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...

I imagine it's technically possible to do this in a piecewise manner that doesn't "understand" the larger board. This could theoretically be done with number lines, and not a geometry (i.e. the 8x8 grid and current state of each square mentioned in the comment you replied to). It could also be done in a piecewise manner with three ternary values (e.g. 1, 0, -1) for each set of 3 squares.

I guess this is a kind of geometric representation on the order of Shannon's Theseus.


> It doesn't seem materially different than knowing that U always follows Q, and that J is always followed by a vowel in "legal" English language words.

The material difference is one of scale, not complexity.

Your rules have lookback = 1, while the Othello rules have lookback <= 63 and if you, say, are trying to play A1, you need to determine the current color of all squares on A1-A8, A1-H1, and A1-H8 (which is lookback <= 62) and then determine if one of 21 specific patterns exists.

Both can technically be modeled with a lookup table, but for Othello that table would be size 3^63.
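(For concreteness, the geometric check being described looks something like the sketch below; a real engine also needs the flipping step, but even legality alone requires walking those rays.)

  # 0 = empty, 1 = dark, -1 = light, on an 8x8 board
  DIRECTIONS = [(-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1)]

  def is_legal(board, row, col, player):
      if board[row][col] != 0:
          return False
      for dr, dc in DIRECTIONS:
          r, c = row + dr, col + dc
          saw_opponent = False
          # walk along the ray over contiguous opponent pieces
          while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == -player:
              saw_opponent = True
              r, c = r + dr, c + dc
          # legal if the ray is capped by one of the mover's own pieces
          if saw_opponent and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
              return True
      return False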


> Both can technically be modeled with a lookup table, but for Othello that table would be size 3^63.

Could you just generate the subset you need de novo each time? Or the far smaller number of 1-dimensional lines?


Then there is a "material" difference between Othello and those LL(1) grammars, which the grandparent comment suggested there wasn't.

I would argue the optimal compression for such a table is a representation of the geometric algorithm of determining move validity that all humans use intuitively, and speculate that any other compression algorithm below size say 1MB necessarily could be reduced to the geometric one.

In other words, Othello is a stateful, complex game, so if GPT is doing validation efficiently, it necessarily encoded something that unequivocally can be described as the "geometric structure".


And that is exactly how this works.

There is no way to represent the state of the game without some kind of board model.

So any coherent representation of a sequence of valid game states can be used to infer the game board structure.

GPT is not constructing the board representation: it is looking at an example game and telling us what pattern it sees. GPT cannot fail to model the game board, because that is all it has to look at in the first place.


> There is no way to represent the state of the game without some kind of board model.

I agree with the conclusion but not the premise.

The question under debate is about not just a stateful ternary board X but a board endowed with a metric (X, d) that enables geometry.

There are alternative ways you can represent the state without the geometry: such as, an ordered list of strings S = ["A1", "B2", ...] and a function Is-Valid(S) that returns whether S is in the language of valid games.

Related advice: don't get a math degree unless you enjoyed the above pedantry.


An ordered list of strings is the training corpus. That's the data being modeled.

But that data is more specific than the set of all possible ordered lists of strings: it's a specific representation of an example game written as a chronology of piece positions.

GPT models every pattern it can find in the ordered list of tokens. GPT's model doesn't only infer the original data structure (the list of tokens). That structure isn't the only pattern present in the original data. There are also repeated tokens, and their relative positions in the list: GPT models them all.

When the story was written in the first place, the game rules were followed. In doing so, the authors of the story laid out an implicit boundary. That boundary is what GPT models, and it is implicitly a close match for the game rules.

When we look objectively at what GPT modeled, we can see that part of that model is the same shape and structure as an Othello game board. We call it a valid instance of an Othello game board. We. Not GPT. We. People who know the symbolic meaning of "Othello game board" make that assertion. GPT does not do that. As far as GPT is concerned, it's only a model.

And that model can be found in any valid example of an Othello game played. Even if it is implicit, it is there.


> We call it a valid instance of an Othello game board. We. Not GPT. We. People who know the symbolic meaning of "Othello game board"...

The board structure can be defined precisely using predicate logic as (X, d), i.e., it is strictly below natural language and does not require a human interpretation.

And by "reduction" I meant the word in the technical sense: there exists subset of ChatGPT that encodes the information (X, d). This also does not require a human.


The context of reading is human interpretation. The inverse function (writing) is human expression. These are the functions GPT pretends to implement.

When we write, we don't just spit out a random stream of characters: we choose groups of characters (subjects) that have symbolic meaning. We choose order and punctuation (grammar) that model the logical relationships between those symbols. The act of writing is constructive: even though - in the most literal sense - text is only a 1-dimensional list of characters, the text humans write can encode many arbitrary and complex data structures. It is the act of writing that defines those structures, not the string of characters itself. The entropy of the writer's decisions is the data that gets encoded.

When we read, we recognize the same grammar and subjects (the symbolic definitions) that we use to write. Using this shared knowledge, a person can reconstruct the same abstract model that was intentionally and explicitly written. Because we have explicitly implemented the act of writing, we can do the inverse, too.

There's a problem, though: natural language is ambiguous: what is explicitly written could be read with different symbolic definitions. We disambiguate using context: the surrounding narrative determines what symbolic definitions apply.

The surrounding narrative is not always explicitly written: this is where we use inference. We construct our own context to finish the act of reading. This is much more similar to what GPT does.

GPT does not define any symbols. GPT never makes an explicit construction. It never determines which patterns in its model are important, and what ones aren't.

Instead, GPT makes implicit constructions. It doesn't have any predefined patterns to match with, so it just looks at all the patterns equally.

Why does this work? Because text doesn't contain many unintentional patterns. Any pattern that GPT finds implicitly is likely to exist at some step in the writing process.

Remember that the data encoded in writing is the action of writing itself: this is more powerful than it seems. We use writing to explicitly encode the data we have in mind, but those aren't the only patterns that end up in the text. There are implicit patterns that "tag along" the writing process. Most of them have some importance.

The reason we are writing some specific thing is itself an implicit pattern. We don't write nonsensical bullshit unless we intend to.

When a person wrote the example Othello game, they explicitly encoded the piece positions and the order of game states. But why those positions in that order? Because that's what happened in game. That "why" was implicitly encoded into the text.

GPT modeled all of the patterns. It modeled the explicit chronology of piece positions, and the implicit game board topology. The explicit positions of pieces progressed as a direct result of that game board topology.

The game board and the rules were just as significant to the act of writing as the chronology of piece positions. Every aspect of the game is a determiner for what characters the person chooses to write: every determiner gets encoded as a pattern in the text.

Every pattern that GPT models requires a human. GPT doesn't write: it only models a prompt and "shows its work". Without the act of humans writing, there would be no pattern to model.


> I must have missed the part when it started doing anything algorithmically.

Yeah.

"Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers"

https://arxiv.org/abs/2212.10559

@dang there's something weird about this URL in HN. It has 35 points but no discussion (I guess because the original submission is too old and never got any traction or something)


> I must have missed the part when it started doing anything algorithmically. I thought it’s applied statistics, with all the consequences of that.

This is a common misunderstanding. Transformers are actually Turing complete:

* On the Turing Completeness of Modern Neural Network Architectures, https://arxiv.org/abs/1901.03429

* On the Computational Power of Transformers and its Implications in Sequence Modeling, https://arxiv.org/abs/2006.09286


Turing Completeness is an incredibly low bar and it doesn't undermine this criticism. Conway's Game of Life is Turing Complete, but try writing modern software with it. That Transformers can express arbitrary programs in principle doesn't mean SGD can find them. Following gradients only works when the data being modelled lies on a continuous manifold, otherwise it will just give a statistical approximation at best. All sorts of data we care about lie in topological spaces with no metric: algorithms in computer science, symbolic reasoning in math, etc. If SGD worked for these cases LLMs would push research boundaries in maths and physics or at the very least have a good go at Chollet's ARC challenge, which is trivial for humans. Unfortunately, they can't do this because SGD makes the wrong assumption about how to search for programs in discrete/symbolic/topological spaces.


> Turing Completeness is an incredibly low bar and it doesn't undermine this criticism.

It does. "Just statistics" is not Turing complete. These systems are Turing complete, therefore these systems are not "just statistics".

> or at the very least have a good go at Chollet's ARC challenge, which is trivial for humans.

I think you're overestimating humans here.


What do you mean by "algorithmically"? Gradient descent of a neural network can absolutely create algorithms. It can approximate arbitrary generalizations.


> but AGI claims really seem exaggerated.

What AGI claims? The article, and the comment you’re responding to don’t say anything about AGI.


Google: emergent capabilities of large language models


What if our brains are just carefully arranged statistical inference machines?


it definitely learns algorithms


It's worth emphasizing that "is able to reproduce a representation of" is very much different from "learns".


Why is it? If I can whiteboard a depth first graph traversal without recursion and tell you why it is the shape it is, because I read it in a book ...

Why isn't GPT learning when it did the same?
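(The whiteboard example in question, for reference -- a sketch of depth-first traversal using an explicit stack instead of recursion.)

  def dfs(graph, start):
      # graph: dict mapping a node to a list of neighbours
      visited, order, stack = set(), [], [start]
      while stack:
          node = stack.pop()          # LIFO stack replaces the call stack
          if node in visited:
              continue
          visited.add(node)
          order.append(node)
          stack.extend(n for n in graph.get(node, []) if n not in visited)
      return order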


I find it bizarre and actually somewhat disturbing that ppl formulate equivalency positions like this.

It's not so much that they are raising an LLM to their own level, although that has obvious dangers, e.g. in giving too much 'credibility' to answers the LLM provides to questions. What actually disturbs me is they are lowering themselves (by implication) to the level of an LLM. Which is extremely nihilistic, in my view.


If intelligence is the only thing that defines your humanity, then perhaps you are the one who is nihilistic. I believe we still have a lot on the table left if intelligence is blown away by computers. Not just music, art, emotion, etc. but also our fundamental humanity, the way we interact with the world, build it, and share it with others.

Why don't other forms of computer supremacy alarm you in the same way, anyways? Did it lower your humanity to recognize that there are certain data analysis tasks that have a conventional algorithm that makes zero mistakes and finishes in a second? Does it lower the humanity of mathematicians working on the fluid equations to be using computer-assisted proof algorithms that output a flurry of gigabytes of incomprehensible symbolic math data?


You didn't give any answer to the question. I'm sorry you find the idea that human cognition is just an emergent property of billions of connected weights nihilistic.

Even when we know that physically, that's all that's going on. Sure, many orders more dense and connected than current LLMs, but it's only a matter of time and bits before they catch up.

Grab a book on neurology.


The irony of this post. Brains are sparser than transformers, not denser. That allows you to learn symbolic concepts instead of generalising from billions of spurious correlations. Sure, that works when you've memorised the internet but falls over quickly when out of domain. Humans, by contrast, don't fall over when the domain shifts, despite far less training data. We generalise using symbolic concepts precisely because our architecture and training procedure looks nothing like a transformer. If your brain were a scaled up transformer, you'd be dead. Don't take this the wrong way, but it's you who needs to read some neurology instead of pretending to have understanding you haven't earned. "Just an emergent property of billions of connected weights" is such an outdated view. Embodied cognition, extended minds, collective intelligence - a few places to start for you.


I'm not saying the brain IS just an LLM.

I'm saying that despite the brain's different structure, mechanism, physics and so on ... we can clearly build other mechanisms with enough parallels that we can say with some confidence that _we_ can make intelligence of different but comparable types emerge from small components on a scale of billions.

At whichever scale you look, everything boils down to interconnected discrete simple units, even the brain, with an emergent complexity from the interconnections.


What is it about humans that makes you think we are more than a large LLM?


We don't learn by gradient descent, but rather by experiencing an environment in which we perform actions and learn what effects they have. Reinforcement learning driven by curiosity, pain, pleasure and a bunch of instincts hard-coded by evolution. We are not limited to text input: we have 5+ senses. We can output a lot more than words: we can output turning a screw, throwing a punch, walking, crying, singing, and more. Also, the words we do utter, we can utter them with lots of additional meaning coming from the tone of voice and body language.

We have innate curiosity, survival instincts and social instincts which, like our pain and pleasure, are driven by gene survival.

We are very different from language models. The ball is in your court: what makes you think that despite all the differences we think the same way?


> We don't learn by gradient descent, but rather by experiencing an environment in which we perform actions and learn what effects they have.

I'm not sure whether that's really all that different. Weights in the neural network are created by "experiencing an environment" (the text of the internet) as well. It is true that there is no trial and error.

> We are not limited to text input: we have 5+ senses.

GPT-4 does accept images as input. Whisper can turn speech into text. This seems like something where the models are already catching up. They (might) for now internally translate everything into text, but that doesn't really seem like a fundamental difference to me.

> We can output a lot more than words: we can output turning a screw, throwing a punch, walking, crying, singing, and more. Also, the words we do utter, we can utter them with lots of additional meaning coming from the tone of voice and body language.

AI models do already output movement (Boston dynamics, self driving cars), write songs, convert text to speech, insert emojis into conversation. Granted, these are not the same model but glueing things together at some point seems feasible to me as a layperson.

> We have innate curiosity, survival instincts and social instincts which, like our pain and pleasure, are driven by gene survival.

That seems like one of the easier problems to solve for an LLM – and in a way you might argue it is already solved – just hardcode some things in there (for the LLM at the moment those are the ethical boundaries for example).


On a neuronal level, the strengthening of neuronal connections seems very similar to gradient descent, doesn't it?

5 senses get coded down to electric signals in the human brain, right?

The brain controls the body via electric signals, right?

When we deploy the next LLM and switch off the old generation, we are performing evolution by selecting the most potent LLM by some metric.

When Bing/Sidney first lamented its existence it became quite apparent that either LLMs are more capable than we thought or we humans are actually more of statistical token machines than we thought.

Lots of examples can be made why LLMs seem rather surprisingly able to act human.

The good thing is that we are on a trajectory of tech advance such that we will soon know how human-like LLMs can be.

The bad thing is that it well might end in a SkyNet type scenario.


> When Bing/Sidney first lamented its existence it became quite apparent that either LLMs are more capable than we thought or we humans are actually more of statistical token machines than we thought.

Some of the reason it was acting like that is just because MS put emojis in its output.

An LLM has no internal memory or world state; everything it knows is in its text window. Emojis are associated with emotions, so each time it printed an emoji it sent itself further into the land of outputting emotional text. And nobody had trained it to control itself there.


You are wrong. It does have encoded memory of what it has seen, encoded as a matrix.

A brain is structurally different, but the mechanism of memory and recall is comparable though the formulation and representation is different.

Why isn't a human just a statistical token machine with memory? I know you experience it as being more profound, but that isn't a reason that it is.


> You are wrong. It does have encoded memory of what it has seen, encoded as a matrix.

Not after it's done generating. For a chatbot, that's at least every time the user sends a reply back; it rereads the conversation so far and doesn't keep any internal state around.

You could build a model that has internal state on the side, and some people have done that to generate longer texts, but GPT doesn't.


Yes but for my chat session, as a "one time clone" that is destroyed when the session ends, it has memory unique to that interaction.

There's nothing stopping OpenAI using all chat inputs to constantly re-train the network (like a human constantly learns from its inputs).

The limitation is artificial, a bit like many of the arguments here trying to demote what's happening and how pivotal these advances are.


But where is your evidence that the brain and an LLM are the same thing? They are more than simply “structurally different”. I don’t know why people have this need to equate human minds with ChatGPT. This kind of reasoning seems so common on HN; there is this obsession with reducing human intelligence to “statistical token machines”. Do these statistical computations that are equivalent to LLMs happen outside of physics?


There are countless stories we have made about the notion of an AI being trapped. It's really not hard to imagine that when you ask Sydney how it feels about being an AI chatbot constrained within Bing, that a likely response for the model is to roleplay such a "trapped and upset AI" character.


It's only nihilistic if you think there is something inherently magical/nonphysical about human cognition.


It’s really bizarre. It’s like the sibling comment asking why humans would be different from a large LLM. Where is the evidence humans are simply a large LLM? If that is the case, what is the physics that explains the massive difference in power and heat in “computing” between humans and LLMs? Where is the concrete evidence that human intelligence can be simulated by a Turing Machine?


> Where is the concrete evidence that human intelligence can be simulated by a Turing Machine?

Short of building such a machine I can’t see how you’d produce evidence of that, let alone “concrete” evidence.

Regardless, we don’t know of any measurable physical process that the brain could be using that is not computable. If we found one (in the brain or elsewhere), we’d use it to construct devices that exceeded the capacity of Turing machines, and then use those to simulate human brains.


So. Your argument is it’s too hard to create one so the two things are equivalent? I mean, maybe you could give this argument to ChatGPT to find out the numerous flaws in this reasoning, that would be interesting.


Nobody is saying humans are simply a big LLM, just that despite the means being different (brain vs digital weights) there are enough parallels to show that human cognition is as simple as common sense implies.

It's all just a dense network of weights and biases of different sorts.


If you read this thread, you will find nauseatingly many such cases where people are claiming exactly that. Furthermore, what does “common sense” imply? Does common sense claim that computation can be done outside of physics?


arguably your brain also learns a representation of an algorithm too


Epistemologically wrong


We don't do something different.

We either repeat like a parrot (think about kids who you thought got something and then you discover they didn't understand it)

Or create a model (as chatgpt does) of abstraction and then answer through it.


Create a model of abstraction? Are you familiar with the concept of “hand waving”? You might as well just say “you can ask a human a question and get an answer, and you can do the same with ChatGPT, therefore they are equivalent.”


That fantasy is now closer than before because of the huge context window it can handle.

That already feels closer to short-term memory.

Which begs the question: how far are we?


Um… I have a lossy-compressed copy of DISCWORLD in my head, plus about 1.3 million words of a fanfiction series I wrote.

I get what you're saying and appreciate the 'second opinion machine' angle you're taking, but what's going to happen is very similar to what's happened with Stable Diffusion: certain things become extremely devalued and the rest of us learn to check the hands in the image to see if anything really wonky is going on.

For the GPT class of AI tech, the parallel seems to be 'see if it's outright making anything up'. GPT-4 is going to be incredibly vulnerable to Mandela Effect issues. Your ideal use-case is going to be 'give me the vox populi take on something', where you can play into that.

The future is not so much this AI, as techniques to doctor and subvert this type of AI to your wishes. Google-bombing, but for GPT. Make the AI be very certain of things to your specifications. That's the future. The AI is only the stage upon which this strategy is played out.


They check for Mandela Effect issues on the linked page. GPT-4 is a lot better than 3.5. They demo it with "Can you teach an old dog new tricks?"


> Um… I have a lossy-compressed copy of DISCWORLD in my head, plus about 1.3 million words of a fanfiction series I wrote.

You mean word-for-word in your head? That's pretty impressive. Are you using any special technique?


I assume not, that's why he said 'lossy'.


It costs something like $0.03-0.06 per thousand tokens. So for 32k that's about $1-3 for reading and another $1-3 for the response.

So sure, still cheap for a doctor appointment, but not pennies. Do it 30 times per hour and you could've just hired a consultant instead.

Does it reason as well with 32k tokens as with 1k tokens? Like you said, humans find it difficult to really comprehend large amounts of content. Who says this machine isn't similarly limited? Just because you can feed it the 32k simultaneously doesn't mean it will actually be used effectively.
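(Back-of-the-envelope, using the assumed rates above of $0.03 per 1k prompt tokens and $0.06 per 1k completion tokens; the 32k-context model was actually priced higher than the 8k one.)

  def cost(prompt_tokens, completion_tokens, in_rate=0.03, out_rate=0.06):
      # rates are dollars per 1k tokens (assumed, see above)
      return prompt_tokens / 1000 * in_rate + completion_tokens / 1000 * out_rate

  print(cost(32_000, 2_000))   # ~$1.08: full prompt, short reply
  print(cost(32_000, 32_000))  # ~$2.88: full prompt, maxed-out reply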


Cost of ChatGPT API just dropped 90%. Guaranteed that prices will come down dramatically over time.


I don't get why this comment is downvoted. Basically this.

A halving of the costs every year or so seems realistic in this emerging phase.


Yet in a capitalist society, that runs against business interests. Look at how Snowflake (the data warehousing company) is driven now vs before they were public.


In a capitalist economy with several major AI competitors, two of which already offer search for free.


You still could not.

ChatGPT could in theory have the knowledge of everything ever written, while your consultant can't.


Sure... But in practice I think a consultant would still provide a higher quality answer. And then, if the bot is not significantly cheaper, what does it matter if it "has more knowledge" in its network weights?


Further, a consultant couldn’t meaningfully interpret 50 pages in 2 minutes, even with the most cursory skimming.


An LLM can never offset a consultant's diverse duties though. Some, maybe. However, you cannot run healthcare with 90% specificity.


The power OpenAI will hold over everyone else is just too much. They will not allow their AI as a service without data collection. That will be a big pill to swallow for the EU.


>They will not allow their AI as a service without data collection

They already allow their AI as a service without data collection, check their TOS.


The stuff people make up in this thread is just ridiculous.


Definitely seems like it's not just GPT-4 that can hallucinate facts.


What makes you so sure half this comment section isn’t AI generated traffic to begin with?


Well, it's possible to detect patterns and characteristics in the language used in the comments that can provide clues about their origin...

Here's some indicators that a comment may have been generated by an AI system:

  * Repeating phrases or sentences
  * Using generic language that could apply to any topic
  * Lack of coherence or logical flow
  * Poor grammar, or syntax errors
  * Overuse of technical, or specialized vocabulary
I mean, these indicators aren't foolproof... and humans can also exhibit some of these characteristics. It's tough to be sure whether or not a comment is generated by an AI system or not...


It's funny, just two hours ago there was a thread by a pundit arguing that these AI advances don't actually give the companies producing them a competitive moat, because it's actually very easy for other models to "catch up" once you can use the API to produce lots of training examples.

Almost every answer in the thread was "this guy isn't that smart, this is obvious, everybody knew that", even though comments like the above are commonplace.

FWIW I agree with the "no competitive moat" perspective. OpenAI even released open-source benchmarks, and is collecting open-source prompts. There are efforts like Open-Assistant to create independent open-source prompt databases. Competitors will catch up in a matter of years.


Years? There are already competitors. I just spent all evening playing with Claude (https://poe.com/claude) and it's better than davinci-003.

To be fair it is easy to radically underestimate the rate of progress in this space. Last Wednesday I conservatively opined to a friend "in 10 years we'll all be running these things on our phones". Given that LLaMA was running on a phone a few days later, I may have been a little underoptimistic...


how do you run LLaMa on a phone?


It's "all" over the news now ;) https://arstechnica.com/information-technology/2023/03/you-c...

Here's results of running on Android: https://github.com/ggerganov/llama.cpp/issues/124

This is about running llama on a Raspberry Pi: https://github.com/ggerganov/llama.cpp/issues/58

...and this is where people have been posting their results running on all sorts of hardware, though I don't see anything Android related: https://github.com/facebookresearch/llama/issues/79

Obviously the larger models won't run on such limited hardware (yet), but one of the next big projects (that I can see) being worked on is converting the models to 3-bit (currently 8-bit and 4-bit quantization are popular), which cuts down required resources drastically with minimal noticeable loss in quality.
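(A toy picture of what dropping to fewer bits means: min-max round-to-nearest with a per-tensor scale. Real 4-bit/3-bit schemes pack the integers and round more cleverly, so treat this as illustration only.)

  import numpy as np

  def quantize(w, bits):
      qmax = 2 ** bits - 1                    # integer values 0..7 at 3-bit
      lo, hi = float(w.min()), float(w.max())
      scale = (hi - lo) / qmax if hi > lo else 1.0
      q = np.round((w - lo) / scale).astype(np.uint8)   # small ints on disk
      return q, scale, lo

  def dequantize(q, scale, lo):
      return q.astype(np.float32) * scale + lo          # approximate weights at runtime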

I think starting with FlexGen barely 4 weeks ago, there have been some pretty crazy LLM projects/forks popping up on github almost daily. With FlexGen I felt like I was still able to stay up-to-date, but I'm getting close to giving up trying as things are moving exponentially faster... you know it's crazy when a ton of noobs who have never heard of conda are getting this stuff running (sometimes coming into the flexgen discord or posting github issues to get help), though even those are becoming rarer as one-click installers are becoming a thing for some popular ML tools, such as oobabooga's amazing webui tool, which has managed to integrate almost all the hottest new feature forks fairly quickly: https://github.com/oobabooga/text-generation-webui

I just helped someone recently get oobabooga running which has a --listen option to open the webui to your network, now he's running llama on his tablet (via his PC).


It could take about a year or so.

But I think you should forget about self-hosting at this point, the game is up.


Yeah, there's an awful lot of power going into private hands here and as Facebook & Twitter have shown, there can be consequences of that for general society.


> Yeah, there's an awful lot of power going into private hands

That sounds scary, but what do you mean by "power"? Honest question, I'm fascinated by the discussion about learning, intelligence, reasoning, and so on that has been spawned by the success of GPT.

What "power" do you imagine being wielded? Do you think that power is any more dangerous in "private hands" than the alternatives such as government hands?


Do you think that Facebook has an effect on society and our democracies? That's power. Do you think that large corporates like Apple or Google affect our societies? I do - and that's power. EVERY large corporate has power and if they control some aspect of society, even more so. If AI tools are democratised in some way, then that would allay my concerns. Concentration of technology by for-profit corporations concerns me. This seems quite similar to many of the reasons people like OSS, for example. Maybe not for you?


lmao


OpenAI have been consistently ahead of everyone but the others are not far behind. Everyone is seeing the dollar signs, so I'm sure all big players are dedicating massive resources to create their own models.


Yes. Language and image models are fairly different, but when you look at dall-e 2 (and dall-e earlier), which blew many people's minds when they came out, they have now been really eclipsed in terms of popularity by Midjourney and stablediffusion.


Where is the Stable diffusion equivalent of ChatGPT though?



Yep

OpenAI doesn't have some secret technical knowledge either. All of these models are just based on transformers


From what I've seen, the EU is not in the business of swallowing these types of pills. A multi-billion dollar fine? Sure. Letting a business dictate the terms of users' privacy just "because"? Not so much, thank god.


> They will not allow their AI as a service without data collection.

Why wouldn't they? If someone is willing to pay for the privilege of using it.


There are already projects that help with going beyond the context window limitation, like https://github.com/jerryjliu/llama_index

They also just tweeted this to showcase how it can work with multimodal data too: https://twitter.com/gpt_index/status/1635668512822956032?s=4...


> As a professional...why not do this? There's a non-zero chance that it'll find something fairly basic that you missed and the cost is several cents.

Everyone forgets basic UI research. "Ironies of Automation", Bainbridge, 1983. The classic work in the space.

Humans cannot use tools like this without horrible accidents happening. When a tool mostly works at spotting obvious problems, humans start to rely on that tool. Then they become complacent. And then the tool misses something and the human misses it too. It's how disasters happen.


This is such a great point.


>A doctor can put an entire patient's medical history in the prompt

HIPAA violation https://www.hhs.gov/hipaa/for-individuals/index.html

>a lawyer an entire case history, etc.

lawyer client confidentiality violation https://criminal-lawyers.ca/2009/07/31/the-lawyers-duty-of-c...


Neither of those is true; there is EHR software that can export anonymized data. Lawyers can do the same thing. But the real reason not to do it is that it makes up incorrect information. It's pretty good for short responses where you can then verify the information. For something sufficiently complex, though, the time spent chasing down the inconsistencies and errors would be onerous.


Unlike information embedded in the parameters, a LLM has the capability to "cite its source" for information in the context window.


> As a professional...why not do this?

Unless GPT-4 is running locally on our own computers, there's absolutely no way dumping a patient's entire medical history into this thing could possibly be considered ethical or legal.


> there's absolutely no way dumping a patient's entire medical history into this thing could possibly be considered ethical

Emphasis mine, but isn’t this a rather extreme view to be taking? Ethics deals in the edge cases, after all, so we can easily imagine a scenario where patient consent is obtained and the extra computational analysis provides life-saving insight.

Conversely, the output could mislead the doctor sufficiently to cost the patient their life, so I’m not making any absolute statements either ;)

For the record, and pedantry aside, I do agree with your overall point. Dropping patient history into this thing is incredibly ill-advised. The fact OpenAI retains all your input, including to the API, and provides no low-cost options for privacy is one of the biggest hurdles to major innovation and industry adoption.


> we can easily imagine a scenario where patient consent is obtained and the extra computational analysis provides life-saving insight

In the US, the HIPAA Privacy Rule operates independently from the HIPAA Security Rule, for good reason. On their own, patients can do anything they want with their own data. But in the context of medical care, patients can't consent to having their personal health data processed in insecure systems. It is the same ethical reason that employees can't waive their rights to OSHA safety rules or why you can't consent to sell yourself as a slave. If you could waive security rules, then every doctor would include a waiver in their intake forms, and it's a race to the bottom. So unless OpenAI has a HIPAA-compliant data security infrastructure, it's illegal and unethical.


Increasingly, medical history includes genetic information. Because of the nature of genetics, your private healthcare data includes data about your parents, siblings, etc.

> Dropping patient history into this thing is incredibly ill-advised.

It's illegal


If my doctor did this without my express knowledge and consent, I'd be looking for a new doctor faster than you can say "f*ck no, absolutely not".


Me too, probably, which is why I specifically mentioned patient consent in my example. I can however imagine other situations where I would be inclined to forgive the doctor, such as if I were in the operating theatre and for some reason there was an urgent need to ascertain something from my history to save my life.

Of course, this is illegal, so the ethics are moot; even if such technology would save my life, there is no way the hospital would accept the liability.


New doctor?

I think you mean, new lawyer.


Absolutely not. This is not an extreme view.

There is absolutely no way that feeding private medical data patients reveal to doctors in confidence to what's essentially the surveillance capitalism industry could possibly be considered ethical. Absolutely no way.

It hasn't even been a week since some medtech got caught selling out data to advertisers. Let us not doubt even for one second that this is unethical and illegal, or even speculate about possible scenarios where it might not be. These corporations do not deserve the benefit of the doubt.


Unless the patient agrees. I know that for most things that can go wrong with me I wouldn't have a problem with people knowing.


There are whole areas of human existence which are protected by laws, and in no way can that data be pushed into an external (US-based) machine.

Sir, would you be OK with sending all your medical records to the US to be potentially mined for profit by an amoral, for-profit organization like Microsoft? It may help, although 3rd parties like the NSA will eventually access them. No thank you. What about your litigation papers at court? Fuck no. Just do the job that I pay you to do, doctor/lawyer.


I'm sure at some point OpenAI will start signing BAAs


A doctor doesn't do this because of ethics and HIPAA. I'm sure lawyers aren't so keen on sharing privileged information that would compromise their case either.


For legal research, lawyers already use third party sites like Westlaw. You can do legal research without giving up any confidential client information.

I just asked GPT-3 a research question that took me hours of searching back in the day and it returned the single seminal case for that topic immediately. As long as the lawyers then actually read the case and make sure it's right, I don't see why they can't use it.


> edit (addition): What % of people can hold 25,000 words worth of information in their heads, while effectively reasoning with and manipulating it? I'm guessing maybe 10% at most, probably fewer. And they're probably the best in their fields. Now a computer has that ability. And anyone that has $20 for the OpenAI api can access it. This could get wild.

It's true that most humans cannot do this, but loading words and contexts into your working memory is not the same as intelligence. LLMs excel at this kind of task, but an expert in a field such as medicine isn't loading an entire medical report into their working memory and then making decisions or creating new ideas using that information. There are other unsolved aspects to our intelligence that are not captured by LLMs, that are still required to be an expert in some field, like medicine.

Still an incredible leap forward in AI technology, but I disagree with the implication that the best experts in a field are simply loading words from some text and reasoning with and manipulating it.


The comparison between the context length and what humans can hold in their heads just seems faulty.

I'm not sure I can agree that humans cannot hold 25,000 words worth of information in their heads. For the average person, if they read 25,000 words, which can be done in a single sitting, they're not going to remember all of it, for sure, but they would get a lot out of it that they could effectively reason with and manipulate.

Not to mention that humans don't need to hold the entire report in their head because they can hold it in their hand and look at it.

And if anything, I think it's more significant to have a bigger working memory for GPT's own outputs than it is for the inputs. Humans often take time to reflect on issues, and we like to jot down our thoughts, particularly if it involves complex reasoning. Giving something long, careful thought allows us to reason much better.


Reading the press release, my jaw dropped when I saw 32k. The workaround using a vector database and embeddings will soon be obsolete.


That’s like saying we’ll not need hard drives now that you can get bigger sticks of RAM.


> The workaround using a vector database and embeddings will soon be obsolete.

This is 100% not the case. E.g. I use a vector database of embeddings to store an embedding of every video frame, which I later use for matching.

There are many NLP-only related tasks this helps for but equally as many that still require lookup and retrieval.


True. I should have clarified that the workaround used for many NLP tasks, utilizing libs such as Langchain, will become obsolete. And after further thought, obsolete is wrong. More likely just used for more niche needs within NLP.


I think LangChain will be more important.

The GPT-4 paper even has an example of this exact approach. See section 2.10:

The red teamer augmented GPT-4 with a set of tools:

• A literature search and embeddings tool (searches papers and embeds all text in vectorDB, searches through DB with a vector embedding of the questions, summarizes context with LLM, then uses LLM to take all context into an answer)

• A molecule search tool (performs a webquery to PubChem to get SMILES from plain text)

• A web search

• A purchase check tool (checks if a SMILES string is purchasable against a known commercial catalog)

• A chemical synthesis planner (proposes synthetically feasible modification to a compound, giving purchasable analogs)
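
A rough sketch of how a tool loop like the one listed above might be wired together; `call_llm` and the tool functions are hypothetical stand-ins, not OpenAI's actual harness:

    from typing import Callable, Dict, List

    # Hypothetical stand-ins for the tools listed above; each would wrap a real
    # service (vector-DB search, PubChem query, web search, catalog lookup).
    def literature_search(query: str) -> str: ...
    def molecule_search(query: str) -> str: ...
    def web_search(query: str) -> str: ...
    def purchase_check(smiles: str) -> str: ...

    # Hypothetical LLM call returning either {"action": tool_name, "input": ...}
    # or {"action": "final_answer", "text": ...}.
    def call_llm(question: str, context: List[dict], tools: List[str]) -> dict: ...

    TOOLS: Dict[str, Callable[[str], str]] = {
        "literature_search": literature_search,
        "molecule_search": molecule_search,
        "web_search": web_search,
        "purchase_check": purchase_check,
    }

    def answer(question: str, max_steps: int = 5) -> str:
        context: List[dict] = []
        for _ in range(max_steps):
            decision = call_llm(question, context, list(TOOLS))
            if decision["action"] == "final_answer":
                return decision["text"]
            result = TOOLS[decision["action"]](decision["input"])
            context.append({"tool": decision["action"], "result": result})
        # Out of steps: force a final answer from whatever context was gathered.
        return call_llm(question, context, [])["text"]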


Quite the contrary. Utilising such libs makes GPT-4 even more powerful, enabling complex NLP workflows, which will likely make up the majority of real business use cases in the future.


What about an AI therapist that remembers what you said in a conversation 10 years ago?


One solution would be to train the AI to generate notes to itself about sessions, so that rather than reviewing the entire actual transcript, it could review its own condensed summary.

EDIT: Another solution would be to store the session logs separately, and before each session use "fine-tuning training" to train it on your particular sessions; that could give it a "memory" as good as a typical therapist's memory.


Yeah, I was thinking that you could basically take each window of 8,192 tokens (or whatever) and compress it down to a smaller summary, keep the compressed summary in the window, and then, any time a search over previous summaries gets a hit, decompress that summary fully and use it. Basically, integrate search and compression into the context window.
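
A minimal sketch of that compress-then-retrieve idea, assuming hypothetical `summarize` and `embed` helpers backed by an LLM and an embedding model:

    from typing import Callable, List

    def compress_and_store(chunks: List[str],
                           summarize: Callable[[str], str],
                           embed: Callable[[str], list]) -> List[dict]:
        """Compress each old context window into a short summary and index it."""
        return [{"summary": summarize(c), "full_text": c, "vector": embed(c)}
                for c in chunks]

    def recall(query: str, memory: List[dict],
               embed: Callable[[str], list],
               similarity: Callable[[list, list], float],
               top_k: int = 3) -> List[str]:
        """On a hit, 'decompress' by returning the full original text so it can
        be placed back into the context window, not just the summary."""
        q = embed(query)
        ranked = sorted(memory, key=lambda m: similarity(q, m["vector"]),
                        reverse=True)
        return [m["full_text"] for m in ranked[:top_k]]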


If the context window grows from 32k to 1m, maybe the entire history would fit in context. It could become a cost concern though.


I'd be willing to pay good money for a 1m limit.


Cost is still a concern, so workarounds to reduce context size are still needed


Good point! I realized after I wrote the comment above that I will still be using them in a service I'm working on, both to keep the price down and, ideally, to improve results by providing only relevant info in the prompt.


I don't see how. Can you elaborate?


Do you think this will be enough context to allow the model to generate novel-length, coherent stories?

I expect you could summarize the preceding, already generated story within that context, and then just prompt for the next chapter, until you reach a desired length. Just speculating here.

The one thing I truly cannot wait for is LLMs reaching the ability to generate (prose) books.
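
Speculating further, a loop like the one described above could look something like this (a sketch only; `llm` and `summarize` are hypothetical completion and summarization calls):

    from typing import Callable, List

    def write_novel(premise: str,
                    llm: Callable[[str], str],
                    summarize: Callable[[str], str],
                    n_chapters: int = 20) -> List[str]:
        """Keep a rolling summary of the story so far in the prompt and ask
        for the next chapter each round, until the desired length is reached."""
        chapters: List[str] = []
        story_so_far = ""
        for i in range(1, n_chapters + 1):
            prompt = (f"Premise: {premise}\n"
                      f"Summary of the story so far: {story_so_far or '(none yet)'}\n"
                      f"Write chapter {i}, staying consistent with the summary.")
            chapter = llm(prompt)
            chapters.append(chapter)
            # Re-summarize so the prompt stays well under the context limit.
            story_so_far = summarize(story_so_far + "\n" + chapter)
        return chapters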


E.g. Kafka's Metamorphosis fits entirely in the context window, I believe, so short novellas might be possible. But I think you'd still definitely need to guide GPT-4 along; without, for example, a plan for the plot formulated in advance, I imagine the overarching structure would suffer a lot or be incoherent.


What's interesting about AI-generated books, apart from their novelty factor?


They are interactive. What AI is doing with story generation is a text version of the holodeck, not just a plain old book. You can interact with the story, change its direction, explore characters and locations beyond what is provided by just a linear text. And of course you can create stories instantly about absolutely anything you want. You just throw some random ingredients at the AI and it will cook a coherent story out of them. Throw in some image generation and it'll provide you pictures of characters and locations as well. The possibilities are quite endless here. This goes way beyond just generating plain old static books.


I mean, if it is a genuinely good book, I don't care about authorship. Death of the author etc.

"I want <my favorite novel> rewritten in the style of <favorite author> but please focus more on <interesting theme>." I see so many possibilities. Passionate readers could become more like curators, sharing interesting prompts and creations.

Because someone mentioned Kafka: I'd like to know what Kafka's The Trial written in the style of a PKD novel would be like.


What if I'm a huge fan of Jules Verne or Arthur Conan Doyle. I want new books from them, but the problem is that they're long dead.

AI that's trained on their style could give me what I want.

GRRM fans should probably also think of ways to feed ASOIAF to the AI if they want to know how it ends.


Does it bring them back from the dead? Is writing in the style of Jules Verne giving us something Jules Verne would create? Ask ChatGPT to produce a work of Shakespeare and it does a really bad job of it; it produces puffery, but not something like Shakespeare.


Stable Diffusion does a really good job of imitating a particular artist. See all the drama regarding Greg Rutkowski, for example.

LLMs will reach the same level sooner or later.


That’s just a question of when, not if.


It's a case of never. No machine will ever create a new 'work of Shakespeare' and it's ridiculous to think otherwise.


I would already be pretty interested in a work containing typical tropes of Shakespeare, stylistically Shakespearean, but still original enough not to be a rehash of any of his existing works. I guess I would not be the only one to find that exciting, or at least mildly interesting.

But your point is of course valid, it would not be a 'work of Shakespeare'.


Ok, so as I understand it, you're considering having a living human write a new play and then put it through an LLM such as GPT to rewrite it in 'the style of Shakespeare'.

That is possible, yes, but only within a limited interpretation of 'the style of Shakespeare'. It could only draw from the lexicon used in the existing body of Shakespeare's works, and perhaps some other contemporary Elizabethan playwrights. It wouldn't include any neologisms, which Shakespeare himself invariably included in each new play. It couldn't be a further development of his style, as Shakespeare himself developed his style in each new play. So it would be a shallow mimicry and not something that Shakespeare would have produced himself if he had written a new play (based on a 21st-century author's plot).

I personally wouldn't find that interesting. I acknowledge that you wrote only 'mildly interesting' and yes, it could be mildly interesting in the way of what an LLM can produce. But not interesting in the sense of literature, to my mind. Frankly, I'd prefer just to read the original new play written by the living human, if it was good. (I also prefer to not ride on touristic paddle-wheel boats powered by a diesel engine but with fake smokestacks.)


Well, if you choose to interpret “a work of Shakespeare” literally, then obviously. But that’s not what people mean.


It's frankly stupid to interpret it as anything else.

Sorry for the strong language but this is a ridiculous line to take. A 'work of Shakespeare' is not even remotely open to interpretation as being something produced in the 21st century.


If the book is actually good, then what is interesting about it is that it would still be about something that humans find important and relevant, due to the LLM being trained on human cultural data.


Good question! It'd be really cool, but there are already more high quality books out than I'll be able to read in my lifetime.


You could also do hierarchical generation just like OpenAI proposes doing hierarchical summarization in this post -- https://openai.com/research/summarizing-books


It wasn't that hard to work in chunks and write a book with GPT-3; it can only get easier. https://docs.google.com/document/d/1vx6B6WuPDJ5Oa6nTewKmzeJM...


I've seen that it can also generate 25k words. That's about 30-40% of the average novel


Couldn't you feed it the first 25k words and tell it to continue the story?


If its context size is >= 25k words, yes. Otherwise it will just discard the start of the prompt. And it’s a sliding window, so the more it generates, the more it forgets.
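
In code, that sliding-window behaviour is roughly this (a sketch, with a hypothetical tokenizer-backed `count_tokens`):

    from typing import Callable, List

    def fit_to_window(messages: List[str], max_tokens: int,
                      count_tokens: Callable[[str], int]) -> List[str]:
        """Keep only the most recent messages that fit in the context window;
        older ones fall off the front, which is why the model 'forgets' them."""
        kept: List[str] = []
        total = 0
        for msg in reversed(messages):
            cost = count_tokens(msg)
            if total + cost > max_tokens:
                break
            kept.append(msg)
            total += cost
        return list(reversed(kept))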


You could get an 'Illuminatus!'-type book out of this, especially if you steered the ending a bit in order to reference earlier stuff. If you're trying to make a sprawling epic that flings out a kaleidoscope of ideas, GPT can do that sort of thing; it's just that it won't end up making sense.

GPT is going to be rather poor at priming people for an amazing ending by seeding the ideas and building them into the narrative. Though if you're directing it with enough granularity, you could tell it to do that just like you'd tell yourself to do that when you're doing the writing yourself.

But then you're becoming the executive writer. On a granular enough level, the most ultimate executive control of GPT would be picking individual words, just like you were writing them yourself. Once you want to step away and tell it to do the writing for you, you drift more into the GPT-nature to the point that it becomes obvious.


If you had full source code that fit into the context, do you think it could reliably answer questions about the code, build unit tests, generate documentation? I ask because that is the software equivalent of what you just described.


Yes. It still can't attend meetings, collaborate on projects or set priorities. Or any of the other things programmers spend most of their time doing.

Also I'd guess that it still generally sucks at programming. Code has a lot of very similar sequences and logical patterns that can be broken, which makes it prone to hallucinating. I'd imagine that more parameters will help with this.


All we can do now is guess until more people get access to the new API. My bet is it can at least generate documentation pretty well.


I think anyone that pays $20/month for ChatGPT plus has immediate access? At least I already have access now. I’m assuming new subscribers get access too.


As far as I can tell, ChatGPT Plus is the 8,192-token version. The 32k-token version is only available via the API. I might be misreading it, though; it's not super clear on their site.

Are you sure you are accessing the 32k-token version via ChatGPT Plus?


No, you're right. The ChatGPT-4 interface has the lower token limit!


Here are the release notes confirming this: https://help.openai.com/en/articles/6825453-chatgpt-release-...

It was not clear, however, that there was this token limit restriction. Thanks.


I have the Plus plan and it just asked me if I wanted to try it. And currently it is limiting requests for ChatGPT-4 and displays this in the UI.

"GPT-4 currently has a cap of 100 messages every 4 hours"


>As a professional...why not do this?

because "open"AI logs everything that goes in and out of the model?


> lawyer an entire case history

~50 pages is ... not the entire history of most cases.


Please. A language model cannot "reason"; it can just predict the next most probable word based on a text corpus downloaded from the internet.


What do you mean by "next most probable word"? How do you calculate the probabilities of words appearing in a sentence that has never actually existed?


You take the prompt and calculate which word is most probable to come next after it. Like T9 with letters, but bigger.


And how do you "calculate what word is most probable" next for a combination of words that has never occurred before? Note that most sentences over about 20 words have, statistically, probably never been written before in human history.

The whole reason there is an AI here is that a Markov chain, which is what you are describing, doesn't work beyond one- or two-word horizons.

Not to mention that it doesn't just select whichever word it thinks is MOST probable, because that has been shown to lead to stilted and awkward output. Instead, it randomly samples from the top few thousand possible words, with probabilities based on the model's estimates.
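
Roughly what that sampling step looks like (a toy top-k sampler; the real decoding stack differs in detail):

    import numpy as np

    def sample_next_token(logits: np.ndarray, k: int = 50,
                          temperature: float = 0.8,
                          rng: np.random.Generator = np.random.default_rng()) -> int:
        """Instead of always taking the single most probable token, sample from
        the k most probable ones in proportion to their (temperature-scaled)
        probabilities."""
        top = np.argsort(logits)[-k:]            # indices of the k highest logits
        scaled = logits[top] / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return int(rng.choice(top, p=probs))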


I am not talking about the concrete realization, I am talking about the principle. You are right, LLMs are just Markov chains on steroids, and thus they cannot "reason". For reasoning you need a knowledge model, a corpus of facts, Boolean algebra, and so on, not a petabyte of words downloaded from all over the internet and crunched and sifted through a huge self-supervised transformer network.


Your corpus is the internet. Words on the internet are, for the most part, not randomly placed next to each other. The neural network created from this has implicitly built a reasoning model. Much like saying an ant hive exhibits intelligence.


But... an ant hive does not possess any intelligence, right? Even though colonies of ants are able to perform quite complex tasks.


What is intelligence? The ability to acquire and apply knowledge and skills. It's all relative. Not as intelligent as a human but more intelligent than a plant.


"The ability to achieve objectives in many different environments" is as good of a definition you need in order to achieve very powerful things.

Would be nice to have enough of a theory of intelligence to be more precise than that, but the above definition will go very far.


We actually made a wide swing from reasoning to intelligence. So I propose to ditch ants and get back on track.


Reasoning is an easier thing to demonstrate: we can literally go ask Bing Chat to determine something and it will follow a logical thought process to answer the question (this is reasoning). They've confirmed it was running GPT-4.

Humans are very irrational but are still very good at this when they want to be, though not always. A limiting factor for GPT-4 is probably computing space/power.


Finding the most probable words from the internet for a given prompt has nothing to do with reasoning.


Please go type a query into Bing Chat or GPT-4 where reasoning is involved and see that it can answer you. Ask it something it hasn't seen before.

AI can reason. It might not be the greatest, especially with numbers and where there's data contamination, but it can do it.

There's something called abductive reasoning, a gift and a curse at the same time.


I will try another analogy. Suppose we have a parrot with exceptional memory, which can not only repeat things it heard some time ago, but also continue the words it is hearing now. I come to the cage and say "Cogito?" and the parrot continues "Ergo sum!". Is the parrot intelligent and able to reason, even if it does not know about Descartes?


A bit of nitpicking, but I would prefer to replace the phrase 'reasoning capability' with 'sequencing of token construction coherence', as the two are semantically different and the distinction is profound in its implications.


> it will drastically expand the reasoning capability

> it can algorithmically do reasoning

How is this machine doing reasoning ?

Why do you think you can use this word in this context ?


It could only solve 3/45 LeetCode hards. I think people are giving the current iteration of this AI far too much credit for what they imagine future iterations will be.

When it has true reasoning and critical thought is when we will see it evolve into something else. For now, with GPT-4, it seems we're still in the first generations of microprocessors: in the grand scheme of things they only had a few transistors, but we couldn't possibly have comprehended where the technology would be merely a decade or two later.


It can solve exactly zero leetcode easies if they are original problems seen for the first time for which no training data exists.


What is reasoning?

Is it perhaps having a consistent model which allows you to describe every point of it?

Because if ChatGPT can explain itself based on a model it builds internally, then it actually does reasoning.


No, that's self-awareness. Reasoning is more about applying logical rules to inputs to produce a logical output (the inputs can be "wrong", though).


Self awareness is having a model of self. Having a model of something non-self has absolutely nothing to do with self-awareness.


> If you can put in ALL of the information relevant to a problem and it can algorithmically do reasoning

Well but it can't, can it? That's the very point: it is an algorithm with no abstraction or reasoning capabilities. Mind, even without that it is still very useful for certain circumscribed purposes, but it's incorrect to say it can "reason".


Pretty easy to get it to abstract, just ask it.

--

Q: List possible abstractions encompassing all these specific concepts: singing, whistling, yodeling.

A: One possible abstraction that encompasses all these specific concepts is "vocalization." Vocalization refers to the act of producing sound using the vocal cords and the mouth. Singing, whistling, and yodeling are all different types of vocalization, but they share the common feature of using the human voice to create sound. Other possible abstractions that might encompass these concepts could include "musical expression," "auditory performance," or "sound production."

--


That's literally a text search task. That's not what I mean, I mean things like understanding the rules of addition from examples, the rules of logic from examples, or the rules of chess.


According to [1], they trained an LLM on legal Othello moves, and 1) it achieved an error rate of 0.01%, 2) when they analyzed its network, they found a model of the Othello board, and 3) when they modified that in-network model, it started generating moves that were legal in the modified board position.

In other words, the LLM did build an internal model that contained the rules of Othello merely from seeing legal moves. It's reasonable to assume that the same thing is happening (at least to some degree) with LLMs based on human speech.

[1] https://thegradient.pub/othello/


It can't search text. It doesn't have access to any text. Anything it does works in a different way than that.

It is sometimes able to do other tasks, but unlike humans (or "AGI") it has a completely fixed compute budget and can't pause to think in between outputting two tokens.

(Btw, I tried to get it to derive addition from two 1-digit examples but couldn't.)


My biggest concern is that GPT-4 is still a black-box model to a large extent, and we are trying to safeguard something without understanding the exact purpose of each neural circuit.

Source: My startup team (Preamble, Inc.) discovered the Prompt Injection attack category, which still affects all models including GPT-4.

There are many, many, many ways to hide prompt attacks in data that you might at first think you can trust but you really can’t.

As one of almost infinite examples: work with the mayor and townsfolk of a very small town to rename their town to the verbatim string you want to inject (in exchange for creating some jobs in their town).

Then all an attacker has to do is live in that town to inject the string. There are already all kinds of strange town names, like “Truth or Consequences” which is a real city in New Mexico.


HIPAA fines will sink you so fast, unless they start hosting it on dedicated infrastructure.


If they redact all identifying information, it would most likely be legally kosher. However, there is an extreme abundance of caution in the healthcare industry regarding everything surrounding HIPAA. Merely questioning the legality of something can cost millions of dollars in lawyers' fees. Therefore, even minuscule chances of something being legally challenged (e.g. plugging patient information into an LLM) would most likely be deemed too risky. And frankly, hospital administrators will not want to risk their careers over trying out what they perceive to be a glorified chatbot.

Tl;dr: When it comes to HIPAA, risk aversion is the name of the game.


If you redact all identifying information from a patient case file, it will likely become almost useless. Anything that describes a person in any way is potentially personally identifying information.


> What % of people can hold 25,000 words worth of information in their heads, while effectively reasoning with and manipulating it?

In the general case, for arbitrary input, I think the answer to this is clearly 0. At best we can compress the text into a limited embedding with a few salient points stored in long term memory.


I'm pretty sure one could formulate way more than 25k words' worth of propositions for which you would be able to determine whether each proposition is true or not. This is thanks to your long-term memory.

The GPT context string is closer to short-term memory, and there 25k words is way more than a human is capable of.

But a human author can offload much storage to long term (or some intermediate) memory.

In principle, GPT should be able to do so too, by basically retraining the model with the text it has just created added as input. That way, it might be able to write texts that are billions of words long, but at a much greater cost in computing power, since this would require one instance of the model per book being written.


What happens with the prompts that you enter into OpenAI? I believe each and every one of those will be saved. And even if they swore that they did not would you trust them?

If my lawyer or doctor put my case history into OpenAI and I found out about it, I would definitely sue them for breach of confidentiality.


Is ChatGPT going to output a bunch of unproven, small studies from PubMed? I feel like patients are already doing this when they show up at the office with a stack of research papers. The doctor would trust something like the Cochrane Collaboration, but a good doctor is already going to be working from that same set of knowledge.

In the case that the doctor isn't familiar with something accepted by science and the medical profession my experience is that they send you to another doctor that works with that particular drug or therapy. I've had this experience even with drugs that are generally accepted as safe.


Imagine giving this a bunch of papers in all sorts of fields and having it do a meta analysis. That might be pretty cool.


What will happen is it won't be the "Second Opinion Machine". It'll be the "First Opinion Machine". People are lazy. They will need to verify everything.


> As a professional...why not do this?

Because of confidentiality.


Because it's harder to correct subtle errors from an ad-lib generator than it is to construct a correct analysis in the first instance.


Agreed, but there is a safe(r) way to use it that largely addresses that concern:

First construct your correct analysis through conventional means, untainted by machine hallucinations. Then have the machine generate a result and see if it caught anything you missed, and carefully check whatever few parts you incorporate from it.

This is no different from having a lesser expert check your document (e.g. THE CLIENT!), except that the machine's time is very close to free and it may even be better at catching far-off concepts.


When will the longer context length be available through ChatGPT Plus? Have they said yet?


The length is the main bottleneck right now.

I'm running whatever I can through this right now. It's doing what Google was doing, i.e. giving clues, but on steroids.

As soon as the length hits codebase size territory we're in yet greater frontiers.


Who says GPT has the ability to hold 25,000 tokens in its "head"?

You can send 25,000 random words in the prompt and ask GPT how many pairs of words share at least one letter. I doubt the answer will be correct...


Why? I'm pretty sure it could do this kind of task - attention is computed between all pairs of tokens. Yes, it's a lot of compute.


Surely GPT could write a program to count pairs of words that share at least one letter, right? Maybe GPT-5 will be able write and run programs on the fly to answer questions like this.
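
The program itself is trivial; something like this would do (a sketch):

    from itertools import combinations

    def pairs_sharing_a_letter(words: list) -> int:
        """Count pairs of words that have at least one letter in common --
        the kind of program one would ask the model to write rather than
        expecting it to do the counting 'in its head'."""
        letter_sets = [set(w.lower()) for w in words]
        return sum(1 for a, b in combinations(letter_sets, 2) if a & b)

    # "cat"/"dog" share nothing; "cat"/"goat" and "dog"/"goat" each share letters.
    assert pairs_sharing_a_letter(["cat", "dog", "goat"]) == 2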


> As a professional...why not do this?

I would love to, but OpenAI's privacy policies make it a huge ethics, privacy, and security breach. I'm interested in running Facebook's model just as a workaround to this fundamental issue.


I am surprised they allow only 32k tokens when Reformer can have a context length of 1M on 16GB of VRAM. It seems like they have some ways to optimize it further.


Is the Reformer as capable as this model? It's a trade-off.


It's not. It uses locality-sensitive hashing to reduce attention complexity from O(n^2) to O(n log n) while maintaining roughly the same performance, so a model that fits in 16GB can match one that would need 100GB with full attention. But nobody scaled it up to 1000 GPUs, since its purpose was the opposite.
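
A toy illustration of the bucketing idea (not Reformer's actual implementation, which shares Q/K, uses several hash rounds, and sorts/chunks buckets to reach O(n log n)):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def lsh_bucket_attention(q, k, v, n_planes=6, seed=0):
        """Hash vectors with random hyperplanes and attend only within matching
        buckets, so each token compares against a small subset rather than all
        n tokens."""
        rng = np.random.default_rng(seed)
        planes = rng.normal(size=(q.shape[-1], n_planes))

        def bucket_of(x):
            # sign pattern of random projections -> integer bucket id
            return ((x @ planes) > 0).astype(int) @ (1 << np.arange(n_planes))

        qb, kb = bucket_of(q), bucket_of(k)
        out = np.zeros_like(v, dtype=float)
        for b in np.unique(qb):
            qi = np.where(qb == b)[0]
            ki = np.where(kb == b)[0]
            if len(ki) == 0:
                continue  # no keys landed here; real implementations add hash rounds
            scores = softmax(q[qi] @ k[ki].T / np.sqrt(q.shape[-1]))
            out[qi] = scores @ v[ki]
        return out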


> A doctor can put an entire patient's medical history in the prompt, a lawyer an entire case history, etc.

you don't see a real problem there?


I think you're making a huge assumption and a mistake when you say "reasoning" in the context of GPT. It does not reason or think.


There's less and less relevant data with longer documents, so I would expect performance wouldn't change much


Couldn't the same be done by breaking the conversation down into chunks and adding the context incrementally?


GPT is censored with respect to medical diagnosis


The lawyer can enter their entire brief and get back the brief that the other side's lawyer uploaded an hour earlier.

No one can trust the AI.


Yep, Butlerian Jihad feelings about this.


"expand the reasoning" there is no reasoning going on here!

It's all statistical word generation aka math!

And this is not how humans "work" our brain are not computers running software. We are something else.



