
Are there any large publicly available archives of usenet one can download?

I need it for an information retrieval research project. I am aware of gmane.org, but I do not think they allow bulk downloads.





Please use the BitTorrent download option if possible. It reduces load on archive.org.


If there are any seeds, sure. Torrents are most useful for new, big, popular items.


IA torrent files use the archive as web seeds if they have to, so the download still works when there are no seeds. And if there's a spike in interest, like right now apparently, the peers take some of the load off the servers.

Edit: this is all just to be polite, since the archive is not worried about using a ton of bandwidth.
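
For anyone wondering what "web seeds" look like inside a torrent file, here is a rough sketch. It assumes the GetRight-style scheme from BEP 19, where the metainfo dictionary carries a `url-list` key; the thread doesn't say which mechanism the Archive actually uses. The tracker URL, web seed URL, file name and sizes below are all made-up placeholders, and the `pieces` field is left empty, so this is not a usable torrent.

```python
def bencode(value):
    """Minimal bencode encoder (ints, strings/bytes, lists, dicts)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencoded dictionaries store their keys in sorted byte order.
        items = sorted(
            ((k.encode("utf-8"), v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


# Hypothetical metainfo: every URL and number here is a placeholder.
metainfo = {
    "announce": "http://tracker.example.org/announce",
    "info": {
        "name": "usenet-sample.zip",
        "length": 1024,
        "piece length": 262144,
        "pieces": b"",  # real torrents put the SHA-1 piece hashes here
    },
    # BEP 19: HTTP locations a client may fetch pieces from
    # when no peers are available.
    "url-list": ["https://downloads.example.org/items/"],
}

torrent_bytes = bencode(metainfo)
```

A client that understands `url-list` can pull pieces over plain HTTP from those URLs when the swarm is empty, which is the "still works if there are no seeds" behaviour described above.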


Is the code that runs the frontend of the IA open source? I'd be interested in contributing to that, so when requests for certain objects are creating excessive load, a response status code is provided to indicate so, and the alternate URI returned is a magnet link for the object.

EDIT: It appears an HTTP 303 (See Other) status code accomplishes this.


503, we return the standard 503 code. Remember that most of our users don't know what BitTorrent is, and would prefer that the archive Just Worked.


Right! I'm not interested in breaking the Internet Archive, and I'd expect it to move to IPFS [1] eventually (content-addressable web) [2]. If/when/how that happens, I'd expect traditional HTTP tooling to still work (curl, wget, etc.), which is why I went looking for a status code that indicates an alternate path for the resource/content.

That's why my above comment kept getting edited as I did some more research. 503 is an ugly failure. 429 tells the client to back off, but it doesn't provide a fallback to still get the content. 303 does.
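
To make the comparison concrete, here's a minimal sketch of the decision I have in mind. It is not how the Archive's frontend works; `choose_response`, its parameters, and the magnet link are all hypothetical names and values made up for illustration.

```python
from http import HTTPStatus
from typing import Dict, Optional, Tuple


def choose_response(
    overloaded: bool,
    magnet_uri: Optional[str] = None,
    retry_after_s: int = 120,
) -> Tuple[HTTPStatus, Dict[str, str]]:
    """Pick a status code and headers for a download request.

    Hypothetical policy: serve normally when healthy; under load,
    point at a torrent if one exists, otherwise fail with 503.
    """
    if not overloaded:
        return HTTPStatus.OK, {}
    if magnet_uri is not None:
        # 303 See Other: the content is available at another URI.
        return HTTPStatus.SEE_OTHER, {"Location": magnet_uri}
    # No fallback available: a plain 503 with a retry hint.
    return HTTPStatus.SERVICE_UNAVAILABLE, {"Retry-After": str(retry_after_s)}


# Placeholder info-hash, not a real torrent.
EXAMPLE_MAGNET = "magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"

status, headers = choose_response(overloaded=True, magnet_uri=EXAMPLE_MAGNET)
```

One caveat with this sketch: a plain HTTP client is unlikely to be able to follow a `magnet:` Location by itself, so the redirect only helps clients that know how to hand the link to a BitTorrent client.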

I thought this train of thought was in line with Brewster's blog post [2]. Apologies for the confusion!

[1] https://ipfs.io/

[2] http://brewster.kahle.org/2015/08/11/locking-the-web-open-a-...



