
If it was still around, it would probably still be stuck on M2, just like the Mac Pro.

Outside of YouTube influencers, I doubt many home users are buying a 512GB RAM Mac Studio.

I'm neither and have two. 24/7 async inference against GitHub issues. Free (once you buy the Macs, that is).

I'm not sure who 'home users' are, but I doubt they're buying two $9,499 computers.

Peanuts for people who make their living with computers.

So, not a home user then. If you make your living with computers in that manner you are by definition a professional, and just happen to have your work hardware at home.

I wonder what the actual lifetime amortized cost will be.

Every time I'm tempted to get one of these beefy Mac Studios, I just calculate how much inference I can buy for that amount, and it's never a good deal.
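For what it's worth, the comparison is just a few lines of arithmetic. Every number below is a made-up placeholder (hardware price, lifetime, power, workload, hosted token price), so swap in your own before drawing any conclusions:

    # back-of-envelope only; all figures are assumptions, not quotes
    hardware_cost = 9_499 * 2        # e.g. two maxed-out Mac Studios
    lifetime_years = 4               # assumed useful life
    power_per_year = 200             # assumed electricity cost, USD/year

    tokens_per_day = 50_000_000      # assumed daily async workload
    api_price_per_mtok = 1.50        # assumed blended $/1M tokens, hosted model

    local_total = hardware_cost + power_per_year * lifetime_years
    api_total = tokens_per_day * 365 * lifetime_years * api_price_per_mtok / 1_000_000

    print(f"local: ${local_total:,.0f}  vs  API: ${api_total:,.0f}")

For light or occasional use the API side usually wins; the heavy 24/7 async workloads described upthread are where local starts to pencil out.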

Every time someone brings that up, it brings back memories of frantically trying to finish stuff as quickly as possible while either my quota slowly went down with each API request or the pay-as-you-go bill ticked up 0.1% with each request.

Nowadays I fire off async jobs that involve thousands of requests and billions of tokens, yet it costs basically the same as if I didn't.

Maybe it takes a different type of person than the one I am, but all these pay-as-you-go/tokens/credits platforms make me nervous to use. I end up either not using them or spending time trying to "optimize", whereas hardware and infrastructure I can run at home is something my head has no problem just rolling with.


But the downside is that you are stuck with inferior LLMs. None of the best models have open weights: Gemini 3.5, Claude Sonnet/Opus 4.5, ChatGPT 5.2. The best model with open weights performs an order of magnitude worse than those.

The best weights are the weights you can train yourself for specific use cases. As long as you have the data and the infrastructure to train/fine-tune your own small models, you'll get drastically better results.

And just because you're mostly using local models doesn't mean you can't use API hosted models in specific contexts. Of course, then the same dread sets in, but if you can do 90% of the tokens with local models and 10% with pay-per-usage API hosted models, you get the best of both worlds.


Anyone buying these is usually more concerned with just being able to run stuff on their own terms without handing their data off. Otherwise it's probably always cheaper to rent compute for intense stuff like this.

For now, while everything you can rent is sold at a loss.

Nevermind the fact that there are a lot of high quality (the highest quality?) models that are not released as open source.

Are the inference providers profitable yet? Might be nice to be ready for the day when we see the real price of their services.

Isn't it then even better to enjoy cheap inference thanks to techbro philanthropy while it lasts? You can always buy the hardware once the free money runs out.

Probably depends on what you are interested in. IMO, setting up local programs is more fun anyway. Plus, any project I’d do with LLMs would just be for fun and learning at this point, so I figure it is better to learn skills that will be useful in the long run.

Interesting. Answering them? Solving them? Looking for ones to solve?

Heh. I'm jealous. I'm still running a first gen Mac Studio (M1 Max, 64 gigs RAM.) It seemed like a beast only 3 years ago.

I did. Admittedly it was for video processing at 8K, which uses more than 128GB of RAM, but I am NOT a YouTuber.

I doubt many of them are, either.

When the 2019 Mac Pro came out, it was "amazing" how many still photography YouTubers all got launch day deliveries of the same BTO Mac Pro, with exactly the same spec:

18-core CPU, 384GB of memory, Vega II Duo GPU, and an 8TB SSD.

Or, more likely, Apple worked with them and made sure each of them had this Mac on launch day, while they waited for the model they actually ordered. Because they sure as hell didn't need an $18,000 computer for Lightroom.


Still rocking a 2019 Mac Pro with 192GB RAM for audio work, because I need the slots and I can't justify the expense of a new one. But I'm sure an M4 Mini is faster.

How crazy do you have to get with # of tracks or plugins before it starts to struggle? I was under the impression that most studios would be fine with an Intel Mac Mini + external storage.

Of course they're not. Everybody is waiting for the next generation that will run LLMs faster before they start buying.

Every generation runs LLMs faster than the previous one.

That product can still steal fab slots from cheaper, more prosumer products.

For $50K, you could buy 25 Framework desktop motherboards (128GB VRAM each w/Strix Halo, so over 3TB total). Not sure how you'll cluster all of them, but it might be fun to try. ;)

There is no way to achieve a high-throughput, low-latency connection between 25 Strix Halo systems. After accounting for storage and network, there are barely any PCIe lanes left to link two of them together.

You might be able to use USB4, but I'm unsure how the latency is for that.


In general I agree with you, the IO options exposed by Strix Halo are pretty limited, but if we're getting technical you can tunnel PCIe over USB4v2 by the spec in a way that's functionally similar to Thunderbolt 5. That gives you essentially 3 sets of native PCIe4x4 from the chipset and an additional 2 sets tunnelled over USB4v2. TB5 and USB4 controllers are not made equal, so in practice YMMV. Regardless of USB4v2 or TB5, you'll take a minor latency hit.

Strix Halo IO topology: https://www.techpowerup.com/cpu-specs/ryzen-ai-max-395.c3994

Framework's mainboard implements 2 of those PCIe4x4 GPP interfaces as M.2 PHYs, which a passive adapter lets you connect a standard PCIe AIC (like a NIC or DPU) to. Interestingly, it also exposes the 3rd x4 GPP as a standard x4-length PCIe CEM slot, though the system/case isn't compatible with actually installing a standard PCIe add-in card there without getting hacky with it, especially as it's not an open-ended slot.

You absolutely could slap one SSD in there for local storage and attach up to four RDMA-capable NICs to a RoCE-enabled switch (or InfiniBand if you're feeling special) to build out a Strix Halo cluster (and you could do similar with Mac Studios, to be fair). You could get really extra by using a DPU/SmartNIC that lets you boot from an NVMeoF SAN and use all 5 sets of PCIe4x4 for connectivity without any local storage, but that hits a complexity/cost threshold I doubt most people want to cross. And anyone willing to cross it would probably be looking at other solutions better suited to the job that don't require as many workarounds.

Apple's solution is better for a small cluster, both in pure connectivity terms and with respect to its memory advantages, but Strix Halo is doable. However, in both cases, scaling up beyond 3 or especially 4 nodes you rapidly enter complexity and cost territory that is better served by less restrictive nodes, unless you have some very niche reason to use either Macs (especially non-Pro) or Strix Halo specifically.


Do they need fast storage, in this application? Their OS could be on some old SATA drive or whatever. The whole goal is to get them on a fast network together; the models could be stored on some network filesystem as well, right?

It's more than just the model weights. During inference there would be a lot of cross-talk as each node broadcasts its results and gathers up what it needs from the others for the next step.

I figured, but it's good to have confirmation.

You could use llama.cpp's RPC mode over a "network" via a USB4/Thunderbolt connection.
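Something like this, roughly (from memory, so check the llama.cpp RPC docs; the hostnames and ports here are made up):

    # on each worker node, with llama.cpp built with -DGGML_RPC=ON
    rpc-server -H 0.0.0.0 -p 50052

    # on the head node, spread layers across the workers over the USB4/TB link
    llama-cli -m model.gguf -ngl 99 --rpc 10.0.0.2:50052,10.0.0.3:50052

Expect the tunnelled link to add latency versus a single box, as mentioned upthread.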

I have found value for one-off tasks. I forget the exact situation, but I wanted to do some data transformation, something that would normally take me a half hour of awk/sed/bash or Python scripting. AI spit it out right away.

My experience is the productivity gains are negative to neutral. Someone else basically wrote that the total "work" was simply being moved from one bucket to another. (I can't find the original link.)

Example: you might spend less time on initial development, but more time on code review and rework. That has been my personal experience.


I have, but these were generally founder types that accidentally became managers. They weren't "career managers." The career managers delegate that work.

At least they're running the test suite? I'm working with guys who don't even do that! I've also heard "I've fixed the tests" only to discover, yes, the tests pass now, but the behavior is no longer correct...

Yep, the value isn't there. I'm on a very lopsided team, about 5 juniors to 1 senior. Almost all of the senior time is being consumed in "mentorship", mostly slogging through AI slop laden code reviews. There have been improvements, but it's taking a long time.

Have you considered regulating AI use, or is it just easier to be mad at the workers and do nothing?

Yes, we are working on some guidelines, but there are layers of bureaucracy...

That's fair. I'm sorry for being snippy. It just feels weird: my junior years always felt like I was one misstep away from being fired for not working "fast enough". Then I hear stories of this vibe-coded slop and everyone seems to be shrugging in confusion.

It's even more frustrating knowing those people went through an overly long gauntlet and prevailed over hundreds of other qualified would-be engineers. It's so weird seeing an entire pipeline built around minimizing exactly this situation utterly fail.


There are some definite signs of over-reliance on AI: emojis in comments, updates completely unrelated to the task at hand, and when you ask "why did you make this change?", you typically get no answer.

I don't mind if AI is used as a tool, but the output needs to be vetted.


What is wrong with emojis in comments? I see no issue with it. Do I do it myself? No. Would I pushback if a young person added emojis to comments? No. I am looking at "the content, not the colour".

I think GP may be thinking that emojis in PR comments (plus the other red flags they mentioned) are the result of copy/paste from LLM output, which might imply that the person who does mindless copy/pasting is not adding anything and could be replaced by LLM automation.

The point is that heavy emoji use means AI was likely used to produce a changeset, not that emojis are inherently bad.

The emojis are not a problem themselves. They're a warning sign: slop is (probably) present, look deeper.

Exactly. Use LLMs as a tutor, a tool, and make sure you understand the output.

My favorite prompt is "your goal is to retire yourself"

Some people want the projects they're involved with to actually be successful?
