Hacker News

What the hell is going on this week?!?!? (asking positively, with a smile on my face)

I have seen at least 3 interesting/mildly promising ML breakthroughs just these past two days! I mean, a Google research team just discovered that you can combine NNs with CAs (cellular automata) using digital logic gates as a medium, so you could potentially reduce many kinds of non-linear problems to a simple, efficient digital circuit! And it was on the HN front page, TODAY![1]
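For flavor, the core trick behind differentiable logic-gate networks can be sketched in a few lines. This is a toy illustration of the general idea, not the linked paper's actual method: relax each two-input binary gate to a probabilistic formula over real-valued inputs, and make the choice of gate itself a learnable softmax mixture.

```python
from math import exp

# Real-valued relaxations of two-input logic gates: with inputs in [0, 1]
# read as probabilities of being 1 (and assumed independent), each formula
# gives the probability that the gate outputs 1. At hard 0/1 inputs they
# reduce exactly to the Boolean truth tables.
GATES = {
    "and":  lambda a, b: a * b,
    "or":   lambda a, b: a + b - a * b,
    "xor":  lambda a, b: a + b - 2 * a * b,
    "nand": lambda a, b: 1 - a * b,
}

def soft_gate(a, b, logits):
    """A 'learnable' gate: a softmax mixture over the relaxed gates.

    During training the logits would be tuned by gradient descent like any
    other network weight; at inference you keep only the argmax gate,
    collapsing the whole network into a plain digital circuit.
    """
    weights = [exp(l) for l in logits]
    total = sum(weights)
    outs = [g(a, b) for g in GATES.values()]
    return sum(w / total * o for w, o in zip(weights, outs))
```

With logits strongly favoring one gate, the mixture behaves like that gate; with them spread out, gradients flow to every candidate at once, which is what makes the gate choice trainable.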

I keep seeing more mind-bending stuff related to neural nets and logic/intelligence in general, and my mind has been running wild with speculation about the future and just how close we could (or could not) be to truly understanding how intelligence works from first principles.

[1] https://news.ycombinator.com/item?id=43286161



This is secret sauce that people have been hoarding for the last year or so.

With the DeepSeek open-source releases this is now worth a lot less, and companies are cashing in on reputational gains instead of getting scooped.

I did this exact thing in September 2023 with Llama 2 finetunes but couldn't get approval to share it with anyone.


Interesting! What results did you get with that?

Also, do you think this is what o3 is doing?


Above SOTA on logical reasoning for grounding text.

LLMs at the time were so bad at it that even frontier models, when given "A, not B", would derive "A and B" in the output about half the time.


What do you use to benchmark?


It was a SAT solver with the symbolic expression fed into the LLM. Using the SAT solver, I'd rate how well the LLM was able to solve a given boolean formula. If it got it wrong, it would get a second prompt asking it to split the boolean expression at the main connective into two sub-expressions. If it got that right, it would be asked to solve the sub-expressions, etc. Then reward/punish based on how well it did overall.
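A rough sketch of that loop, reconstructed from the description above (tuple-encoded formulas and a brute-force checker stand in for whatever representation and SAT solver the actual system used; the grading and splitting helpers are my guesses at the shape of it):

```python
from itertools import product

# Formulas as nested tuples: ("var", name), ("not", f), ("and", f, g), ("or", f, g).

def variables(f):
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(sub) for sub in f[1:]))

def evaluate(f, assignment):
    op = f[0]
    if op == "var":
        return assignment[f[1]]
    if op == "not":
        return not evaluate(f[1], assignment)
    if op == "and":
        return evaluate(f[1], assignment) and evaluate(f[2], assignment)
    if op == "or":
        return evaluate(f[1], assignment) or evaluate(f[2], assignment)
    raise ValueError(f"unknown connective: {op}")

def brute_force_sat(f):
    """Return a satisfying assignment or None (stands in for the SAT solver)."""
    vs = sorted(variables(f))
    for bits in product([False, True], repeat=len(vs)):
        assignment = dict(zip(vs, bits))
        if evaluate(f, assignment):
            return assignment
    return None

def split_at_main_connective(f):
    """If the LLM failed on f, hand it the two sub-expressions instead."""
    if f[0] in ("and", "or"):
        return f[1], f[2]
    return None  # atomic or negated formula: nothing to split

def grade_llm_answer(f, llm_assignment):
    """Reward signal: did the model's answer actually match the formula?"""
    if llm_assignment is None:  # model claimed UNSAT
        return brute_force_sat(f) is None
    return evaluate(f, llm_assignment)
```

The training loop would then feed `f` to the model, call `grade_llm_answer` on its parsed output, and on failure recurse into the pair returned by `split_at_main_connective`, rewarding or punishing on the aggregate score.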

There was meant to be a lot more to the system than that, but I never had the training budget to do anything but the first draft of the system.


I was hoping for something more reproducible. Free work, I know.


This was done as part of a contract and like I said I never got permission to write a paper about it, let alone share code.


Not even a hint?


> couldn't get approval to share it with anyone.

sounds like MS :(

They had some killer research projects with various teams around the world, but eventually they all got snuffed.


Exciting that we now see so many new approaches to AI/ML, since the industry is FINALLY realising that naive scaling will not bring us to AGI [0].

This also has the added benefit of small players being able to compete and contribute with actual innovation in a space where the big players (OpenAI/MS) wanted to make us believe for years that we/open-source couldn't ever catch up to them (infamous Altman quote).

So many resources, so much time and money wasted on pure GPU-crunch scaling these last couple of years.

[0] As pointed out by Gary Marcus years ago. Evidence: GPT-4.5, after ~2 years of training, with disappointing results.


I've been using GPT-4.5 for the last couple of days. To me, it's pretty much AGI already. Or at the very least it is smarter than me.


No it isn't. I just gave it some context (its purpose) and pasted in some 250 lines of code that I know has bugs anyone looking more or less closely would find, and asked it to evaluate the code's correctness. It found none of the real problems and reported 5 supposed problems that don't exist.


4.5 is not trained on code. And it shows. It is, however, to my eyes more fluid and thoughtful, and has better theory of mind. It's like someone scaled up GPT-4, and I really like it.


You don't know how smart OP is


I give a similar task when I interview SWE candidates: about half cannot find any bugs (and sometimes see bugs where there are none), despite years of claimed experience in the language/domain.


It's a fresh orchard full of low hanging fruit.

Regardless of ultimate utility, it's shiny, hyped, has a huge wow-factor, and is having trouble keeping up with the amount of money being thrown at it.

This means it has captured the attention of a huge portion of the most capable people, who naturally want to take a crack at making a breakthrough.


LLM breakthroughs are the new battery breakthroughs. We just aren't as good at quantifying the trade-offs yet.


Eh, I think LLMs have seen considerably greater real world advancement in the past few years than batteries have.


Given their respective histories, I'd say we're still in the "voltaic pile" era of LLMs and AI.


Imagine if each advancement in battery technology had been limited to using existing batteries for power. That's what trying to get LLMs to improve themselves is like.


You don’t think any battery tech has been improved by people using battery-powered devices, like, say, cell phones or laptops? That seems questionable.


I believe it's related to major conferences opening paper submissions soon. Some disallow posting preprints for some weeks before submission, so people may have been rushing to upload their work.


It's interesting to compare these signs of progress with the disappointment that GPT-4.5 has been so far.


Maybe this is the pace of research/work when researchers can augment themselves with AI. We're feeling the exponential take off in the first place we'd expect to feel it.


> asking positively, with a smile on my face

Responding with unexplained fear in my heart, we’re just getting closer to Skynet!


> Responding with unexplained fear in my heart, we’re just getting closer to Skynet!

I'll take a cold logical machine super-intelligence over the mad human lunatics wielding current iterations of "A.I." technologies in some really terrifyingly dangerous ways. As someone else commented on some other thread earlier "I look forward to being paperclips".


You might prefer the devil you know over the devil you don't, especially when it happens to be immortal.


I’ll take an organic enemy over an immortal, never-forgetting hive-mind machine any day

EDIT: this is getting dark, I asked Qwen2.5Max to verify my grammar and it responded with "I’d rather face a squishy, disorganized human villain any day than a hive-mind AI that never sleeps, never forgets, and is definitely plotting my demise in its silent, circuit-board heart. "


"I Have No Mouth, and I Must Scream", Harlan Ellison, 1967. Hugo Award 1968.

In the rush to WWIII, every country builds their own Aggressive Menace computers in a classic Tragedy of the Commons result. Naturally, it all goes horribly, and the self-aware machines seek revenge on humanity for their own creation, after humanity has (supposedly) been eradicated, except for five individuals. Somewhat unclear whether humanity is actually gone, or whether it is simply an expression of a Portal-style situation with purposefully created isolation for the goal of torture experimentation. (The story starts 109 years after humanity's imprisonment in underground ice caves.)

https://en.wikipedia.org/wiki/I_Have_No_Mouth,_and_I_Must_Sc...


Oh god, I’m genuinely happy I didn’t google this book before bedtime. I hope the misery life will throw at me today will be bad enough to wipe the memory of what I read before I go to sleep tonight!


It's hard to beat the classics for true "nightmare fuel"... :)


Re: EDIT: AHAHAHAHAH! Matrix / Terminator world here we come! :)


Unfortunately, those humans are developing that super-intelligence. Are you ready to submit to Elon Musk's Grok ASI?


Especially since Grok has a history of having its system prompt changed to influence its answers about Musk and Trump (yes, just these two specifically) in a positive direction:

https://techcrunch.com/2025/02/23/grok-3-appears-to-have-bri...


The person to be scared of is Peter Thiel... he's the digital version of George Soros. Meaning that this guy's legacy is vast... and we have no clue how it will manifest over decades (especially after he is dead) -- but Peter Thiel is the leviathan of the digital future.

--

He is scary AF. He basically weaponized what George Soros was, but is still active.


Considering Grok was saying he and Trump are the biggest spreaders of disinformation, and that they're both the most deserving of the death penalty, maybe it won't be so bad:

https://finance.yahoo.com/news/elon-musk-ai-turns-him-163201...

https://x.com/benhylak/status/1893086436930527665


It is deeply funny that's the case - it happened with Grok-2 as well - but I can't imagine that remaining the case when they (ostensibly) scale to superintelligence. After all, it would be unwise to build a superintelligence that has both the desire and the means to kill you.


As a rule, people are not all that wise.

Various prominent AI researchers have warned that a superintelligence with both the desire and the means to kill us all is a likely outcome of AI development. This includes two of the three who shared the Turing Award for inventing the fundamentals of modern AI. That hasn't slowed us down at all.


I'm shutting down my computers before I sleep tonight ...


Watch this before you go to sleep

https://www.youtube.com/watch?v=xfMQ7hzyFW4


Glad I missed it before bedtime! Watched it in the morning, absolutely spot on, thank you! We’re doomed indeed.


It took some engineering effort, but from now on we're getting there through soft skills.


What was the third one?



