
Agreed. Website operators should take a hard look at why their unoptimized crap can't manage such low request rates before contributing to the enshittification of the web by deploying crapware like Anubis or buttflare.


I've been blocking a few scrapers from my gitea service, not because it's overloaded, but more just to see what happens. They're not getting good data from <repo>/commit/<every commit hash in the repo>/<every file path in the repo> anyway. If they actually wanted the data they could run "git clone", as sketched below.
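To make the "git clone" point concrete, here's a rough sketch (Python, placeholder repo URL) of how a single clone yields every commit and file path these scrapers are crawling page by page. It assumes git is installed; it's an illustration, not what any particular scraper actually does.

    # Clone once, then enumerate every commit and every file path locally,
    # instead of requesting <repo>/commit/<hash>/<path> pages one by one.
    import subprocess

    repo_url = "https://git.example.com/someone/somerepo.git"  # placeholder URL

    subprocess.run(["git", "clone", "--quiet", repo_url, "repo"], check=True)

    # Every commit hash reachable from any ref.
    commits = subprocess.run(
        ["git", "-C", "repo", "rev-list", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout.split()

    for sha in commits:
        # Every file path present in that commit's tree.
        paths = subprocess.run(
            ["git", "-C", "repo", "ls-tree", "-r", "--name-only", sha],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        # File contents are available via `git show <sha>:<path>` if needed.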

I just checked, since someone was talking about scraping in IRC earlier. Facebook is sending me about 3 requests per second. I blocked their user-agent. Someone with a Googlebot user-agent is doing the same stupid scraping pattern, and I'm not blocking it. Someone else is sending a request every 5 seconds with
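
The block itself is nothing fancy. A minimal sketch as a plain WSGI middleware, with illustrative agent substrings rather than the exact strings I matched on:

    # Rough sketch: return 403 for requests whose User-Agent matches a
    # blocklist; wraps any WSGI app. Substrings below are illustrative.
    BLOCKED_AGENT_SUBSTRINGS = ("facebookexternalhit", "meta-externalagent")

    class BlockUserAgents:
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            agent = environ.get("HTTP_USER_AGENT", "").lower()
            if any(s in agent for s in BLOCKED_AGENT_SUBSTRINGS):
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"blocked\n"]
            return self.app(environ, start_response)

In practice this kind of rule usually lives in the reverse proxy rather than in the app, but the idea is the same.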

One thing that's interesting about the current web is that sites are expected to make themselves scrapeable. It's supposed to be my job to organize the site in such a way that scrapers don't try to scrape every combination of commit and file path.
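The polite version of that is publishing crawl rules and hoping they're honored. A rough sketch using Python's stdlib robotparser against a hypothetical robots.txt (repo paths and delay made up) shows what a well-behaved crawler is supposed to check:

    # Hypothetical robots.txt telling crawlers to skip per-commit views
    # and to slow down; whether scrapers respect it is another matter.
    import urllib.robotparser

    ROBOTS_LINES = [
        "User-agent: *",
        "Crawl-delay: 10",
        "Disallow: /someone/somerepo/commit/",
        "Disallow: /someone/somerepo/blame/",
    ]

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(ROBOTS_LINES)

    # A well-behaved crawler checks before fetching and honors the delay.
    print(rp.can_fetch("ExampleBot", "/someone/somerepo/commit/abc123/README"))  # False
    print(rp.can_fetch("ExampleBot", "/someone/somerepo/releases"))              # True
    print(rp.crawl_delay("ExampleBot"))                                          # 10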



