
Agreed. Website operators should take a hard look at why their unoptimized crap can't manage such low request rates before contributing to the enshittification of the web by deploying crapware like Anubis or buttflare.


I've been blocking a few scrapers from my gitea service, not because it's overloaded, but more just to see what happens. They're not getting good data from <repo>/commit/<every commit hash in the repo>/<every file path in the repo> anyway. If they actually wanted the data they could run "git clone", as sketched below.
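To make the "git clone" point concrete, here's a rough sketch (Python, placeholder repo URL) of how a single clone yields every commit and file path these scrapers are crawling page by page. It assumes git is installed; it's an illustration, not what any particular scraper actually does.

    # Clone once, then enumerate every commit and every file path locally,
    # instead of requesting <repo>/commit/<hash>/<path> pages one by one.
    import subprocess

    repo_url = "https://git.example.com/someone/somerepo.git"  # placeholder URL

    subprocess.run(["git", "clone", "--quiet", repo_url, "repo"], check=True)

    # Every commit hash reachable from any ref.
    commits = subprocess.run(
        ["git", "-C", "repo", "rev-list", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout.split()

    for sha in commits:
        # Every file path present in that commit's tree.
        paths = subprocess.run(
            ["git", "-C", "repo", "ls-tree", "-r", "--name-only", sha],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        # File contents are available via `git show <sha>:<path>` if needed.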

I just checked, since someone was talking about scraping in IRC earlier. Facebook is sending me about 3 requests per second. I blocked their user-agent. Someone with a Googlebot user-agent is doing the same stupid scraping pattern, and I'm not blocking it. Someone else is sending a request every 5 seconds with
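
The block itself is nothing fancy. A minimal sketch as a plain WSGI middleware, with illustrative agent substrings rather than the exact strings I matched on:

    # Rough sketch: return 403 for requests whose User-Agent matches a
    # blocklist; wraps any WSGI app. Substrings below are illustrative.
    BLOCKED_AGENT_SUBSTRINGS = ("facebookexternalhit", "meta-externalagent")

    class BlockUserAgents:
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            agent = environ.get("HTTP_USER_AGENT", "").lower()
            if any(s in agent for s in BLOCKED_AGENT_SUBSTRINGS):
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"blocked\n"]
            return self.app(environ, start_response)

In practice this kind of rule usually lives in the reverse proxy rather than in the app, but the idea is the same.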

One thing that's interesting about the current web is that sites are expected to make themselves scrapeable. It's supposed to be my job to organize the site in such a way that scrapers don't try to scrape every combination of commit and file path.
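The polite version of that is publishing crawl rules and hoping they're honored. A rough sketch using Python's stdlib robotparser against a hypothetical robots.txt (repo paths and delay made up) shows what a well-behaved crawler is supposed to check:

    # Hypothetical robots.txt telling crawlers to skip per-commit views
    # and to slow down; whether scrapers respect it is another matter.
    import urllib.robotparser

    ROBOTS_LINES = [
        "User-agent: *",
        "Crawl-delay: 10",
        "Disallow: /someone/somerepo/commit/",
        "Disallow: /someone/somerepo/blame/",
    ]

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(ROBOTS_LINES)

    # A well-behaved crawler checks before fetching and honors the delay.
    print(rp.can_fetch("ExampleBot", "/someone/somerepo/commit/abc123/README"))  # False
    print(rp.can_fetch("ExampleBot", "/someone/somerepo/releases"))              # True
    print(rp.crawl_delay("ExampleBot"))                                          # 10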



