I am working on a website where GitHub users can see how they rank compared to other users.
The site analyses GitHub profiles and commit history to make a more extensive summary than GitHub does. I made it mostly to learn more about ranking algorithms and automatic text generation.
For some background: this is the "Social Security gun rule". Under this regulation, the Social Security Administration will submit information on recipients of disability insurance to the National Instant Criminal Background Check System if they meet certain “mental impairment” criteria[0].
As I understand it, most of the people with disability insurance for mental impairment will be military veterans.
> The part that confused me is when they claim to have obtained MethBot source code, but never mention how.
>
On page 19 of The Methbot Operation report, they state that ‘White Ops detection technology was able to use a JavaScript language feature called “reflection” to gather extensive, detailed information about its inner workings.’
I have personally never heard of JavaScript reflection before, but it appears to be a debugging mechanism that lets one object dump information or data about another object.
Maybe the White Ops software loaded some JavaScript that was able to dump much of its environment and send it back to White Ops?
I don't know more than anyone else about this particular situation, but I can imagine how JS reflection works. Something like:
  let test = function() { return "hello";}
  test.toString()

returns

  'function() { return "hello";}'
It's not too difficult to imagine that pairing that with some JS parsing would allow you to slowly crawl your way around an app and gather the app structure. A crazy and fascinating idea.
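To make that a bit more concrete, here is a rough sketch of the idea (all names here are made up for illustration; this is not the actual White Ops code): start from some object the environment exposes, enumerate its properties with Object.getOwnPropertyNames, record the type of each one, and keep the source text of any functions via toString().

  // Hypothetical sketch: walk an object graph via reflection and
  // record what we find. "rootObj" is whatever object the ad code
  // happens to expose, e.g. a global it created.
  function dumpStructure(obj, path = "root", seen = new Set(), out = []) {
    if (obj === null) return out;
    if (typeof obj !== "object" && typeof obj !== "function") return out;
    if (seen.has(obj)) return out; // avoid cycles
    seen.add(obj);
    for (const key of Object.getOwnPropertyNames(obj)) {
      let value;
      try { value = obj[key]; } catch (e) { continue; } // some getters throw
      const kind = typeof value;
      out.push({
        path: path + "." + key,
        kind,
        // toString() on a function usually yields its source text
        source: kind === "function" ? value.toString() : undefined,
      });
      dumpStructure(value, path + "." + key, seen, out);
    }
    return out;
  }

  // Something like JSON.stringify(dumpStructure(rootObj)) could then
  // be sent back to a collection server for offline analysis.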
To be fair, they did not just publish a negative blog post about WeWork, as the title may suggest, but somehow obtained proprietary data WeWork probably sees as a trade secret.
As I understand it, it is currently unknown where the author got the data. If WeWork published this in a publicly available API, it is of course reasonable to chart it, but if the author somehow misused the access he got as a tenant there, this is not (in my view) just an issue of the author publishing a negative blog post, as the title indicates.
It's my understanding that the API they used was undocumented, so while it's WeWork's fault for not doing a better job of securing an API that seems sensitive to them, it's also disrespectful to do something the other party pretty clearly doesn't want you doing.
IANAL, but you can be in trouble even if the data is openly available, if the party managing the data deems it restricted in any way. I believe this is the reason security researchers get in trouble: they access data that is considered restricted and get prosecuted for "stealing" it.
I think the logic to that is something akin to: it's illegal for you to take my car even if I leave it unlocked with the keys in it.
The company has to communicate that the data is restricted; they can't just deem it restricted after the fact. Usually when companies or individuals (e.g. 3Taps and Aaron Swartz) have gotten in trouble under the CFAA, it's been because they were served a C&D or IP-blocked and then persisted in accessing the data, which the courts have upheld as "knowingly and intentionally accessing a computer without authorization".
In this case, WeWork is within their rights to terminate ThinkNum's membership for the ToS violation, but there's no legal case unless ThinkNum persists in scraping WeWork's data after the termination, or there's evidence that ThinkNum knew the API was restricted at the time they accessed it. Hence the founder's repeated insistence that he did nothing wrong, and his coyness in discussing the source of the data.
Found one source that indicates that the payment may be substantial: "Financial Times reports that one digital media company (which asked not to be named) was told that it would cost 30 percent of its advertising revenue to be whitelisted by Eyeo and AdBlock Plus." - http://arstechnica.com/business/2015/02/over-300-businesses-...
This is going to be a hurdle for me, as I need a search feed that can be parsed server-side. Their alternative, YPA (Yahoo Partner Ads), is client-side only.
Will most people migrate to Microsoft Bing? The results are the same anyway, but the API format differs.
IA torrent files use the archive itself as a web seed if they have to. But if there's a spike in interest - like right now, apparently - peers would reduce the load. So torrents still work if there are no other seeds, and they reduce load on the servers when that's possible.
Edit: this is all just to be polite, since the archive is not worried about using a ton of bandwidth.
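For reference, the web seed is just an extra field in the torrent: the metainfo file carries a url-list key (BEP 19) pointing at an HTTP mirror, and I believe magnet links can carry the same thing in a ws parameter. A made-up example (the hash and URL are placeholders, not a real item):

  magnet:?xt=urn:btih:INFOHASH&dn=some-item&ws=https://archive.org/download/some-item/

Clients that understand web seeds fetch from the archive's servers when there are no peers, and from the swarm when there are.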
Is the code that runs the frontend of the IA open source? I'd be interested in contributing to that, so that when requests for certain objects are creating excessive load, a response status code indicates this and the alternate URI returned is a magnet link for the object.
EDIT: It appears an HTTP 303 status code accomplishes this
Right! I'm not interested in breaking the Internet Archive, and I'd expect it to move to IPFS [1] eventually (content addressable web) [2]. If/when/how that happens, I'd expect traditional http tooling to still work (curl, wget, etc), which is why I went looking for a status code that indicates an alternate path for the resource/content.
That's why my above comment kept getting edited as I did some more research. 503 is an ugly failure. 429 tells the client to back off, but it doesn't provide a fallback to still get the content. 303 does.
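For what it's worth, a minimal sketch of what that could look like on the server side (purely illustrative, not the IA's actual stack; the path and magnet link are made up):

  // When an item is under heavy load, answer with 303 See Other
  // pointing at a magnet link, so clients that understand it can
  // fall back to the swarm instead of hammering the origin.
  const http = require("http");

  const overloaded = new Set(["/download/some-item"]); // hypothetical
  const magnetFor = (path) =>
    "magnet:?xt=urn:btih:INFOHASH-FOR-" + path;        // placeholder

  http.createServer((req, res) => {
    if (overloaded.has(req.url)) {
      res.writeHead(303, { Location: magnetFor(req.url) });
      res.end();
    } else {
      res.writeHead(200, { "Content-Type": "text/plain" });
      res.end("item bytes would be served from here\n");
    }
  }).listen(8080);

Only clients that actually know what to do with a magnet URI would benefit, of course; everything else still needs the plain HTTP path.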
I thought this train of thought was in line with Brewster's blog post [2]. Apologies for the confusion!
I'm not sure why you're downvoted, the AOL search log case was a huge mistake on the part of AOL and I'm quite surprised that Yahoo! would take a risk like this. The real risk is never in just this data but in combining it with other public datasets. I haven't looked at what is in this particular dump in detail but if there is data that had to be anonymized (as they claim) then you can bet that there will be people already busy trying to reverse that.
Besides China, there are other sources North Korea could have used to acquire this technology. North Korea, Libya, Iran, and China all got help developing nuclear technology from a network set up by Abdul Qadeer Khan, a Pakistani nuclear physicist often regarded as the founder of the Pakistani nuclear enrichment program.
He in turn acquired the technology in Europe while working for Urenco, a nuclear fuel company.
The whole A.Q. Khan affair is quite a fascinating story, actually.
I wouldn't be surprised if the nuclear physicist in The Dictator was based on Khan. He sounds like he's proud of his work and I admire that, moral judgements on nuclear weapons aside.
Maybe web scraping then? It would not be so hard to make a focused scraper that scrapes the friends of anyone using the app.
Edit: I did a quick test and it looks like I can see the friend lists of many users who are not in my immediate network, as long as I am logged in to Facebook. It should then be easy to use something like Perl's WWW::Mechanize to make a scraper that logs in and scrapes the profiles you want, as long as one does not need so many that Facebook detects and bans you.
No, they probably do not have the users' Facebook passwords, but they do not need them for scraping, because they can just use their own account for that.
I have looked around on Facebook and it looks like one can see other users' friend lists, even if you are not in their immediate network.
Even if Facebook has a limitation, like only being able to see the friend lists of friends of friends, the company behind this app could probably make some fake Facebook users and befriend someone at each university to get decent coverage.
It depends on the privacy settings. Though the several iterations of privacy scaremongering and Facebook changing defaults resulted in people locking up their accounts like crazy, friend lists seem to still be visible semi-publicly for quite a lot of people. With more news like that, this will probably change too, though.
It's totally a POV issue IMO, that's why I phrased it that way :).
For me, half of Facebook's utility was the ability to check people out without having to commit to a relationship with them first. A publishing platform, a little bit like the personal pages of old, but much more streamlined and accessible to the mainstream. But it turned out there are enough bad actors around (stalkers, marketers) that people voted against this, and so Facebook is now a very locked-down place. I think most of those fears people have are overblown, but well, that's only my opinion and it seems that most people disagree.
Though those concerns may be real, the privacy issue being discussed is still a trick employed by Facebook. By redirecting people's fear towards the amount of information that the public can see, they were able to keep them from talking about the original issue -- what Facebook tracks, saves, and uses for advertising.
If you have a GitHub profile you can look yourself up here: https://www.findsosial.com/search