You lose atomic deployment and have a distributed system the moment you ship JavaScript to a browser.
Hell, you lose "atomic" assets the moment you serve HTML that has URLs in it.
Consider switching from <img src=kitty.jpg> to <img src=puppy.jpg>. If you, for example, delete kitty.jpg from the server, upload puppy.jpg, and then change the HTML, a client can still be holding a URL to kitty.jpg after it's gone. Generally, anything you've published needs to stay alive long enough to "flush out the stragglers".
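One common mitigation (a sketch, not something the parent prescribes): content-hash asset filenames at publish time, so old HTML keeps resolving old assets that you simply leave on the server until the stragglers are gone. The file names and the dist/ layout here are illustrative:

```typescript
// Sketch: content-addressed asset names so stale HTML keeps working.
// Paths and the "dist/" layout are illustrative.
import { createHash } from "crypto";
import { readFileSync, copyFileSync } from "fs";
import { basename, extname } from "path";

function publishAsset(srcPath: string, distDir: string): string {
  const hash = createHash("sha256")
    .update(readFileSync(srcPath))
    .digest("hex")
    .slice(0, 8);
  const ext = extname(srcPath);                      // ".jpg"
  const name = basename(srcPath, ext);               // "puppy"
  const hashedName = `${name}.${hash}${ext}`;        // e.g. "puppy.3fa1c2de.jpg"
  copyFileSync(srcPath, `${distDir}/${hashedName}`); // old kitty.<hash>.jpg stays put
  return hashedName; // reference this name in the newly published HTML
}

// New HTML points at puppy.<hash>.jpg; clients holding old HTML still resolve
// kitty.<hash>.jpg until you garbage-collect it once the stragglers are flushed.
console.log(publishAsset("puppy.jpg", "dist"));
```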
They just refresh the page; it's not a big deal. It'll happen on form submission or any navigation anyway. Some people might be caught in a weird invalid state for, like, a couple of minutes at the absolute maximum.
Right, there are levels of solutions. You can't sit here and say that a few seconds of invalid state on the front end, for mayyyyybe 0.01% of your users, is enough to justify a sprawling distributed system because "well, deployments aren't atomic anyway!1!".
IMO, monorepos are much easier to handle. Monoliths are also easier to handle. A monorepo monolith is pretty much as good as it gets for a web application. Doing anything else will only make your life harder, for benefits that are so small and so rare that nobody cares.
Monorepo vs. not isn't the relevant criterion. The difference is simply whether you plan your rollout to have no (or minimal) downtime, or not. Consider an SQL schema migration that adds a non-NULL column on a system that does continuous inserts.
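For that non-NULL-column case specifically, the usual zero-downtime pattern is expand/backfill/contract spread across several deploys rather than one atomic step. A minimal sketch, assuming Postgres via node-postgres; the table and column names ("orders", "region") are made up:

```typescript
// Sketch of a zero-downtime "expand / backfill / contract" migration for
// adding a NOT NULL column while inserts keep flowing.
import { Client } from "pg";

async function migrate(): Promise<void> {
  const db = new Client(); // connection settings come from the environment
  await db.connect();

  // 1. Expand: add the column as nullable. Old app versions keep inserting
  //    rows without it; nothing breaks.
  await db.query(`ALTER TABLE orders ADD COLUMN region text`);

  // (Separately, deploy app code that writes `region` on every new insert.
  //  That deploy is a distinct step in time, which is exactly why the whole
  //  change can't be atomic.)

  // 2. Backfill existing rows in small batches to avoid long-held locks.
  let updated = 1;
  while (updated > 0) {
    const res = await db.query(
      `UPDATE orders SET region = 'unknown'
         WHERE id IN (SELECT id FROM orders WHERE region IS NULL LIMIT 1000)`
    );
    updated = res.rowCount ?? 0;
  }

  // 3. Contract: once no running version writes NULLs, enforce the constraint
  //    (typically in a later deploy).
  await db.query(`ALTER TABLE orders ALTER COLUMN region SET NOT NULL`);

  await db.end();
}

migrate().catch(console.error);
```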
Again, that's trivial if you use up and down servers. No downtime, and to your users, instant deployment across the entire application.
If you have a bajillion services and they're all doing their own thing with their own DB, and you have to reconcile versions across all of them, and you don't have active/passive deployments, then yes, that will be a huge pain in the ass.
So just don't do that. There, problem solved. People need to stop doing microservices, or even medium-sized services. Make it one big ole monolith, maybe two monoliths for long-running tasks.
Magical thinking about monorepos isn't going to make SQL migrations with backfill instantaneous, nor make them happen within the downtime window you have while switching software versions. You're just not familiar with the topic, I guess. That's okay. Please just don't claim the problem doesn't exist.
And yes, it's often okay to ignore the problem for small sites that can tolerate the downtime.
Not sure what GP had in mind, but I have a few reasons:
Cherry-picks are useful for fixing releases or adding changes without having to cut an entirely new release. This is especially true for large monorepos, which may have all sorts of unrelated changes in between. They're a much safer way to "patch" a release, especially if the release process itself is long and you want a limited-scope "emergency" one.
Atomic changes - assuming this is about releases as well: the release processes for the various systems might not be in sync. If a frontend change that uses a new backend feature is released alongside the backend feature itself, you can get version-drift issues unless everything happens in lock-step and you have strong regional isolation. Cherry-picks are a way to work around this, but it's better not to make these changes "atomic" in the first place.
Do you take down all of your projects and then bring them back up at the new version? If not, then you have times at which the change is only partially complete.
I'd accept a somewhat more liberal use of "atomic": if the repo state reflects the totality of what I need both to get to the new version AND to return to the current one, then I have everything I need from a reproducibility perspective. Human actions could be allowed in this, as long as they're fully documented. I'm not a purist, obviously.
Blue/green might allow you to do (approximately) atomic deploys for one service, but it doesn't allow you to do an atomic deploy of the clients of that service as well.
Why is that? In a very simple case, all services of a monorepo run on a single VM: spin up a new VM, deploy the new code, verify, switch routing over. Obviously this doesn't work for humongous systems, but the idea can be expanded upon: make sure components only communicate with compatible versions of other components, and don't break the database schema in a backward-incompatible way.
So yes, in theory you can always deploy sets of compatible services, but it's not really workable in practice: you either need to deploy the world on every change, or you need complicated logic to determine which services are compatible with which deployment sets of other services.
There's a bigger problem though: in practice there's almost always a client that you don't control, and can't switch along with your services, e.g. an old frontend loaded by a user's browser.
The notion of external clients is a smell. If that's the case, you need a compat layer between that client and your entrypoints, otherwise you'll have a very hard time evolving anything. In practice, this can include providing frontend assets under previously cached endpoints; a version endpoint that triggers cache busting; a load balancer routing to a legacy version for a grace period… sadly, there's no free lunch here.
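As a rough sketch of the "version endpoint that triggers cache busting" idea: the client bakes in its build id and forces a reload when the server reports a newer one. The /version path and BUILD_ID constant are assumptions for illustration, not anything from the thread:

```typescript
// Browser-side sketch of a version check that busts stale frontends.
// The /version endpoint and BUILD_ID constant are illustrative.
const BUILD_ID = "2024-06-01T12:00:00Z"; // injected at build time

async function checkForNewDeploy(): Promise<void> {
  try {
    const res = await fetch("/version", { cache: "no-store" });
    const { buildId } = (await res.json()) as { buildId: string };
    if (buildId !== BUILD_ID) {
      // A newer deploy is live: reload to pick up fresh assets.
      window.location.reload();
    }
  } catch {
    // Network hiccup: try again on the next interval rather than failing hard.
  }
}

// Poll occasionally; a real setup might instead check on navigation, or on
// specific API error codes that signal "client too old".
setInterval(checkForNewDeploy, 60_000);
```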
The only way I could read their answer as being close to correct is if the clients they're referring to are not managed by the deployment.
But (in my mind) even a front end is going to be told it's out of date/unusable and needs to be upgraded the next time it interacts with the service. That means it will have to upgrade, which isn't "atomic" in the strictest sense of the word, but it's as close as you're going to get.
If your monorepo compiles to one binary on one host then fine, but what do you do when one webserver runs vN, another runs v(N-1), and half the DB cluster is stuck on v(N-17)?
A monorepo only allows you to reason about the entire product as it should be. The details of how to migrate a live service atomically have little to do with how the codebase migrates atomically.
That's why I mention having real stable APIs for cross-service interaction, as you can't guarantee that all teams deploy the exact same commit everywhere at once. It is possible but I'd argue that's beyond what a monorepo provides. You can't exactly atomically update your postgres schema and JavaScript backend in one step, regardless of your repo arrangement.
Adding new APIs is always easy. Removing them not so much since other teams may not want to do a new release just to update to your new API schema.
But isn't that a self-inflicted wound then? I mean is there some reason your devs decided not to fix the DB cluster? Or did management tell you "Eh, we have other things we want to prioritize this month/quarter/year?"
This seems like simply not following the rules of having a monorepo, because the DB cluster is not running the version in the repo.
Maybe the database upgrade from v(N-17) to v(N-16) simply takes a while, and hasn't completed yet? Or the responsible team is looking at it, but it doesn't warrant the whole company to stop shipping?
Being 17 versions behind is an extreme example, but always having everything run the latest version in the repo is impossible, if only because deployments across nodes aren't perfectly synchronised.
This is why you have active/passive setup and you don't run half-deployed code in production. Using API contracts is a weak solution, because eventually you will write a bug. It's simpler to just say "everything is running the same version" and make that happen.
Each deployment is a separate "atomic change". So if a one-file commit downstream affects 2 databases, 3 websites, and 4 APIs (made-up numbers), then that is actually 9 different independent atomic changes.
Spent the last three months building a competitor/lookalike ML model + API. Started using plain embedding similarity and quickly realized you end up with results as noisy as ocean.io's. Ended up using similarity learning, which works quite well with little data. Launched this as an API and a small web app. The hardest part right now is fending off scrapers, honestly.
I don't understand why so many VCs fall for "not invented here". Vibe-coded or not, this is just another in-house solution, inferior and more expensive than most out-of-the-box products already out there.
(Author here) Maybe you missed the point of what I wrote. I thought the disclaimer made it clear this is just a tiny project for 3 users only and not something meant to scale :) Is my product inferior to Notion, Slack, etc.? OF COURSE. Do I use Notion extensively? Fuck no. I'm more of a Bear (now Craft!) user, but I needed Notion for a handful of tiny features that Tiptap now gives me. So should I pay $60 per seat for the little I need, and miss out on the fun of building my own tool? I think not. But hey, that's just me :)
It's in Hongdae ("Hongdae T Stay" on Booking). I paid ₩39,000 a night (~23 euros) for a 25-day stay.
I don't think people stay long there, it was mostly foreigners. The bed is like a plank, there is no window. But it was cool, I enjoyed it :)
Why GPT-based then? There are libraries that do this: you give examples, they generate the rules for you, and they give you a scraper object that takes any HTML and returns the scraped data.
Great projects, thank you for the links.
On a brief scan, neither covers paging/loops, or JS frameworks where one would need to use a headless browser and wait for content to load - which is where a low-code/lazy solution might provide the most added value.
Has it? Can you give me an example of a site that is hard to scrape by a motivated attacker?
I'm curious, because I've seen stuff like the above, but of course it only fools a few off-the-shelf tools; it does nothing if the attacker is willing to write a few lines of node.js.
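For instance, a sketch along these lines (the URL and usage are placeholders) renders the page in headless Chromium via Puppeteer and reads the visible text, which sidesteps most markup-level obfuscation:

```typescript
// "A few lines of node.js": render the page in a headless browser and read
// the visible text, which ignores most CSS/markup obfuscation tricks.
import puppeteer from "puppeteer";

async function scrapeVisibleText(url: string): Promise<string> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle0" }); // wait for JS-rendered content
  const text = await page.evaluate(() => document.body.innerText); // visible text only
  await browser.close();
  return text;
}

scrapeVisibleText("https://example.com/listing").then(console.log);
```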
I guess the lazy way to prevent this in a foolproof manner is to add an OCR step somewhere in the pipeline and use actual images rendered from the websites. Although maybe then you'll get #010101 text on a #000000 background.
Personally, this feels like the direction scraping should move into. From defining how to extract, to defining what to extract. But we're nowhere near that (yet).
A few other thoughts from someone who did his best to implement something similar:
1) I'm afraid this is not even close to cost-effective yet. One CSS rule vs. a whole LLM. A first step could be moving the LLM to the client side, reducing costs and latency.
2) As with every other LLM-based approach so far, this will just hallucinate results if it's not able to scrape the desired information.
3) I feel that providing the model with a few examples could be highly beneficial, e.g. /person1.html -> name: Peter, /person2.html -> name: Janet. When doing this, I tried my best at defining meaningful interfaces.
4) Scraping has more edge-cases than one can imagine. One example being nested lists or dicts or mixes thereof. See the test cases in my repo. This is where many libraries/services already fail.
If anyone wants to check out my (statistical) attempt to automatically build a scraper by defining just the desired results:
https://github.com/lorey/mlscraper
I was most worried about #2 but surprised how much temperature seems to have gotten that under control in my cases. The author added a HallucinationChecker for this but said on Mastodon he hasn't found many real-world cases to test it with yet.
Regarding 3 & 4:
Definitely take a look at the existing examples in the docs, I was particularly surprised at how well it handled nested dicts/etc. (not to say that there aren't tons of cases it won't handle, GPT-4 is just astonishingly good at this task)
Your project looks very cool too btw! I'll have to give it a shot.
This seems like part of the problem we're always complaining about where hardware is getting better and better but software is getting more and more bloated so the performance actually goes down.
Yeah, seems like it would make way more sense to have an LLM output the CSS rules. Or maybe output something slightly more powerful, but still cheap to compute.
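A hedged sketch of that split: ask the model once for plain CSS selectors from an example page, then apply them cheaply with an ordinary HTML parser on every later page. How the selectors come back from the model is left abstract here; the field/selector pairs are invented, and cheerio handles the cheap part:

```typescript
// Sketch: pay for the LLM once (to derive selectors from one example page),
// then scrape every later page with just an HTML parse + selector lookups.
import * as cheerio from "cheerio";

type SelectorMap = Record<string, string>; // field name -> CSS selector

// Build a cheap scraper from selectors the model emitted once.
function buildCheapScraper(selectors: SelectorMap) {
  return (html: string): Record<string, string> => {
    const $ = cheerio.load(html);
    const out: Record<string, string> = {};
    for (const [field, selector] of Object.entries(selectors)) {
      out[field] = $(selector).first().text().trim();
    }
    return out;
  };
}

// Hypothetical selectors an LLM might return for a product page.
const scrape = buildCheapScraper({
  name: "h1.product-title",
  price: "span.price",
});
console.log(
  scrape("<h1 class='product-title'>Kitty Mug</h1><span class='price'>$9</span>")
);
```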
I organised many events when I was in school. When you're doing things for fun and with a sense of community, it takes a couple of stings like this before you start taking contracts more seriously.
I'm using a monorepo for my company across 3+ products and so far we're deploying from stable release to stable release without any issues.