Hacker News: snehesht's comments

Kokuyo Stapleless Stapler

In the US, you can find it on Amazon for $15 to $20.


Why not simply blacklist or rate limit those bot IPs?


If you have real traffic and bot traffic, you still need to identify which is which. On top of that, bots very likely don’t reuse the same IPs over and over again. If we knew all the IPs used only by bots ahead of time, then yes, it would be simple to blacklist them. But although it’s simple in theory, identifying what to blacklist in the first place is the part that isn’t so simple.


You wouldn’t permanently block them, it’s more like a rolling window.

You can use security challenges as a mechanism to identify false positives.

Sure, bots can get tons of proxies for cheap, but that doesn’t mean you can’t block them, similar to how SSH honeypots or the Spamhaus SBL work, albeit temporarily.
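A rolling-window block like that can be sketched in a few lines. This is a minimal in-memory version; the window size, hit limit, and block length are made-up numbers, and a real deployment would keep this state in something like Redis rather than process-local dicts:

```python
import time
from collections import defaultdict, deque

WINDOW = 60       # seconds of history kept per IP
MAX_HITS = 120    # requests allowed inside the window
BLOCK_FOR = 600   # temporary block length; nothing is permanent

hits = defaultdict(deque)   # ip -> timestamps of recent requests
blocked_until = {}          # ip -> time the rolling block lapses

def allow(ip, now=None):
    """Return True if this request should pass, False if blocked."""
    now = time.time() if now is None else now
    if blocked_until.get(ip, 0.0) > now:
        return False                    # still inside the temporary block
    q = hits[ip]
    q.append(now)
    while q and q[0] <= now - WINDOW:   # slide the window forward
        q.popleft()
    if len(q) > MAX_HITS:
        blocked_until[ip] = now + BLOCK_FOR
        return False
    return True
```

Since the block expires after `BLOCK_FOR` seconds, a false positive (real user behind a busy NAT) recovers on its own, which is where a security challenge could shorten the wait.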


Because punishment for breaking the robots.txt rules is a social good.


Why not simply blacklist or rate limit those bot IPs?

Many bots cycle through short DHCP leases on LTE WiFi devices. One would have to accept blocking all cell phones, which I have done for my personal hobby crap, but most businesses will not do this. Another big swath of bots comes from Amazon EC2 and Google Cloud, which I will also happily block on my hobby crap, but most businesses will not.

Some bots are easier to block: they do not use real web clients and are missing some TCP/IP headers, making them ultra easy to block. Some also do not spoof the User-Agent and are easy to block. Some will attempt to access URLs not visible to real humans, thus blocking themselves. Many bots cannot do HTTP/2, so they are also trivial to block. Pretty much anything not using headless Chrome is easy to block.
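Those heuristics could be sketched as a simple check over request metadata. The field names and trap paths below are hypothetical, and the HTTP-version test just mirrors the "cannot do HTTP/2" observation above (plenty of legitimate clients still speak 1.1, so treat each check as one signal, not proof):

```python
# Hypothetical per-request metadata as a dict; each check mirrors one
# of the heuristics above.
HONEYPOT_PATHS = {"/wp-admin.bak", "/.hidden-trap"}   # made-up trap URLs

def looks_like_bot(req):
    if not req.get("user_agent"):              # missing/blank User-Agent
        return True
    if req.get("http_version", "1.1") < "2":   # can't speak HTTP/2
        return True
    if req.get("path") in HONEYPOT_PATHS:      # fetched a URL no human sees
        return True
    if not req.get("accept_language"):         # real browsers send this
        return True
    return False
```

Headless Chrome passes all of these, which is exactly the commenter's point about it being the hard case.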


There are way too many to do that.


True, most of the blacklist systems today aren’t realtime the way AWS WAF or Cloudflare are.

We need a crawler blacklist that can stream list deltas in realtime to a centralized list, with local DBs pulling the changes.

Verified domains could push suspected bot IPs, and this engine would run heuristics to see if there is a pattern across data sources, then issue a temporary block with an exponentially growing TTL.

There are many problems to solve here, but like any OSS project, it will evolve over time if there is enough interest in it.

The costs of running this system would be huge, though. Corporate sponsors may not work out, but individual sponsors may be incentivized, since it helps them reduce the bandwidth and compute costs that come with bot traffic.
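The exponential-TTL part of the proposal is easy to sketch. This assumes a report has already been confirmed by the heuristics engine; the base TTL and cap are arbitrary numbers, not anything from a real system:

```python
BASE_TTL = 300     # first offence: 5 minutes
MAX_TTL = 86400    # cap the block length at one day

strikes = {}       # ip -> number of confirmed reports
expires = {}       # ip -> time the current block lapses

def report(ip, now):
    """Record a confirmed bot report; return the new block TTL.
    Each repeat offence doubles the TTL, up to MAX_TTL."""
    strikes[ip] = strikes.get(ip, 0) + 1
    ttl = min(BASE_TTL * 2 ** (strikes[ip] - 1), MAX_TTL)
    expires[ip] = now + ttl
    return ttl

def is_blocked(ip, now):
    return expires.get(ip, 0.0) > now
```

Blocks that lapse on their own keep the shared list from permanently poisoning an IP that was only briefly abused.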


In the real-time spam market, the lists worked well with honest groups for a bit, but started falling apart when once-good lists got taken over by actors who realized they could use their position to make more money. It’s a really difficult trap to avoid.


The AI companies are using virtually unlimited "clean" residential IPs so this is not a valid strategy.


How? They run their scraping and training infrastructure - and models themselves - from within those “AI datacenters”[1] we hear about in the news - and not proxying through end-users’ own pipes.

[1]: in quotes, because I dislike the term, because it’s immaterial whether or not an ugly block of concrete out in the sticks is housing LLM hardware - or good ol’ fashioned colo racks.


Residential proxy networks.


Point is to kill or at least hinder AI progress


For the lulz


50/200 GB free plus $0.50/GB egress seems expensive when scaling out.


Wow, the whole thing (website, github repo) is down.


After the first five minutes of using it on Ubuntu, it crashed with an error saying I didn't have enough free memory; a quick look at system stats proved that wasn't the case.

Anyway, not a great first impression. I guess I'll try again in a few months.


Isn’t this easy for LLMs to avoid by passing an instruction to ignore any hidden links?


Companies doing mass crawling don't use LLMs for the crawling itself; that would be too expensive.


Makes sense, but it doesn't necessarily have to be an LLM; a regular DOM parser will be able to tell whether an element is visible or hidden.
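As a rough illustration of the DOM-parser point, here is a stdlib-only sketch that separates links by crude inline "hidden" signals. Real pages usually hide trap links via CSS classes or external stylesheets, which this deliberately ignores, so it only catches the most naive honeypots:

```python
from html.parser import HTMLParser

class LinkVisibility(HTMLParser):
    """Collect <a href> targets, flagging cheap 'hidden' signals a
    crawler can check without an LLM: the hidden attribute, or an
    inline style containing display:none / visibility:hidden."""
    def __init__(self):
        super().__init__()
        self.visible, self.hidden = [], []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        href = a.get("href")
        if href is None:
            return
        style = (a.get("style") or "").replace(" ", "").lower()
        is_hidden = ("hidden" in a
                     or "display:none" in style
                     or "visibility:hidden" in style)
        (self.hidden if is_hidden else self.visible).append(href)

p = LinkVisibility()
p.feed('<a href="/real">hi</a>'
       '<a href="/trap" style="display: none">x</a>'
       '<a href="/trap2" hidden>y</a>')
```

After feeding the sample markup, `/real` ends up in `p.visible` and the two trap links in `p.hidden`, all without rendering anything.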


Thanks for pointing it out.



I'm not a big Excel user, but I see errors when I type English function names while using a non-English version of Excel. Is it correct that functions (and thus this xlsb file) are not portable to other language versions of Excel?


It's been my experience that Excel spreadsheets are not transferable from one locale to another. Maybe there is a "culture-invariant" version of a spreadsheet but I haven't found it.


Oh man. Was not aware. That's a bummer. Have logged it as something to look into https://github.com/ianand/spreadsheets-are-all-you-need/issu...

It's another reason to potentially port this thing to the browser one day... https://github.com/ianand/spreadsheets-are-all-you-need/issu...


Even the CSV export of Excel sets the separator based on locale, rendering the files hard to use in an international setting. I work at a European statistics office, and although there's SDMX, it's not under Save As in Excel.
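For example, a German-locale Excel "CSV" typically uses semicolons as separators and commas as decimal marks, so a portable reader has to undo both (the sample data here is made up):

```python
import csv
import io

# A German-locale Excel "CSV": semicolon separators, decimal commas.
raw = "name;value\nwidget;3,14\ngadget;2,50\n"

rows = list(csv.reader(io.StringIO(raw), delimiter=";"))
# Convert the decimal comma back to a dot before parsing as float.
values = [float(v.replace(",", ".")) for _, v in rows[1:]]
```

A US-locale export of the same sheet would use `,` as the separator and `.` as the decimal mark, which is why round-tripping the files between locales breaks.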


They are portable, but the displayed function names are translated depending on the locale. Same as commas vs. dots.



