
I'm happy to answer any questions! Nice to see this here again :)

This is great! Especially the DB sync part, because that happens before a user interaction, so you actually have to wait for it (the update itself can run in the background).

It always felt like such a waste to me how the DB always downloads tens of megabytes of data when likely only 1kB has changed. I mean I also really appreciate the beauty of how simple it is. But I'd bet even a delta against a monthly baseline file would reduce the data by >90%.

Also, it would be interesting to see how zstd --patch-from compares to the delta library used. That is very fast (as fast as normal zstd) and the code is already there within pacman.

For the recompression issue, there are some hard-to-find libraries that can decompress and then recompress byte-exactly (https://github.com/microsoft/preflate-rs), but I don't know of any that work for zstd.


There's an extension to ISO8601 that fixes this and is starting to become supported in libraries:

    2019-12-23T12:00:00-02:00[America/Sao_Paulo]
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
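
For example, the Temporal API behind that MDN link can parse and emit the bracketed form directly. A minimal sketch, assuming a runtime (or polyfill) that ships Temporal:

    // The bracketed zone is enough on its own; an explicit offset like -02:00 can
    // also be included, and from() will check that it is consistent with the zone.
    const zdt = Temporal.ZonedDateTime.from("2019-12-23T12:00:00[America/Sao_Paulo]");
    zdt.timeZoneId;  // "America/Sao_Paulo" -- the zone survives parsing
    zdt.toString();  // serializes back to the bracketed format, offset included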


That seems a bit excessive to sandbox a command that really just downloads arbitrary code you are going to execute immediately afterwards anyways?

Also I can recommend pnpm, it has stopped executing lifecycle scripts by default so you can whitelist which ones to run.
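
For example, with recent pnpm the whitelist can live in package.json (the package names below are just examples of dependencies that legitimately need install scripts):

    {
      "pnpm": {
        "onlyBuiltDependencies": ["esbuild", "sharp"]
      }
    }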


At work we're currently looking into firejail and bubblewrap a lot, though. Within the ops team, we're looking at ways to run as much as possible, if not everything, through these tools tbh.

Because the counter-question could be: Why would anything but ssh or ansible need access to my ssh keys? Why would anything but firefox need access to the local firefox profiles? All of those can be mapped out with mount namespaces from the execution environment of most applications.

And sure, this is a blacklist approach, and a whitelist approach would be even stronger, but the blacklist approach to secure at least the keys to the kingdom is quicker to get off the ground.


firejail, bubblewrap, direct chroot, sandbox-run ... all have been mentioned in this thread.

There is a gazillion-long list of tools that can give someone analysis paralysis. Here's my simple suggestion: all of your backend team already knows (or should learn) Docker for production deployments.

So, why not rely on the same? It might not be the most efficient, but then dev machines are mostly underutilized anyway.


> Also I can recommend pnpm, it has stopped executing lifecycle scripts by default so you can whitelist which ones to run.

Imagine you are in a 50-person team that maintains 10 JavaScript projects: which one is easier?

  - Switch all projects to `pnpm`? That means switching CI and deployment processes as well
  - Change the way *you* run `npm` on your machine and let your colleagues know to do the same
I find the second to be a lot easier.


I don’t get your argument here. 10 isn’t a huge number in my book, but of course I don’t know what else that entails. I would opt for a secure process change over a soft local workflow restriction that may or may not be followed by all individuals. And I would definitely protect my CI system in the same way as local machines. Depending on the nature of CI, these machines can have far-reaching access rights. This really depends on how you do CI and how lax security is.


I'll do soft local workflow restriction right away.

The secure process change might take anywhere from a day to months.


There are a great many extra perks to switching to pnpm though. We switched our projects over a while back and haven’t looked back.


Yeah, I'd just take the time to convert the 10 projects rather than try to get 50 people to change their working habits, plus new staff coming in etc.

Switch your projects once, done for all.


So, switching to pnpm does not entail any work habit changes?


Am I missing something? Don't you also need to change how CI and deployment processes call npm? If my CI server and then also my deployment scripts are calling npm the old insecure way, and running infected install scripts/whatever, haven't I just still fucked myself, just on my CI server and whatever deployment system(s) are involved? That seems bad.


Your machine has more projects, data, and credentials than your CI machine, as you normally don't log into Gmail on your CI. So, just protecting your machine is great.

Further, you are welcome to use this alias on your CI as well to enhance the protection.


Attacking your CI machines means poisoning the artifacts you ship and the systems they get deployed to, and getting access to all the source the CI builds and can access (often more than you have locally) and all the infrastructure it can reach.

CI machines are very much high-value targets of interest.


> Further, you are welcome to use this alias on your CI as well to enhance the protection.

Yes, but if I've got to configure that across the CI fleet as well as in my deploy system(s) in order to not get, and also not distribute, malware, what's the difference between having to do that vs switching to pnpm in all the same places?

Or more explicitly, your first point is invalid. Whether you ultimately choose to use Docker to run npm or switch to pnpm, it doesn't count to half-ass the fix and only tell your one friend on the team to switch: you have to get all developers to switch AND fix your CI system, AND also your deployment system(s) (if they are exposed).

This comment offers no opinion on which of the two solutions should be preferred, just that the fix needs to be made everywhere.


You have the logic backwards here. I would rather have a single person deal with the pnpm migration and CI than instruct the other 10 and hope everyone does the right thing. And think about what happens when the next person comes in... so I'd go for the first option for sure.

And npm can be configured to prevent install scripts from being run anyway:

> Consider adding ignore-scripts to your .npmrc project file, or to your global npm configuration.
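
i.e. a single line in the project's .npmrc (or `npm config set ignore-scripts true` for the global configuration):

    ignore-scripts=true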

But I do like your option to isolate npm for local development purposes.


> which one is easier?

> Switch all projects to `pnpm`?

Sorry; I am out of touch. Does pnpm not have these security problems? Do they only exist for npm?


pnpm doesn't execute lifecycle scripts by default, so it avoids the particular attack vector of "simply downloading and installing an NPM package allows it to execute malicious code."

As phiresky points out, you're still "download[ing] arbitrary code you are going to execute immediately afterwards" (in many/most cases), so it's far from foolproof, but it's sufficient to stop many of the attacks seen in the wild. For example, it's my understanding that last month's Shai-Hulud worm depended on postinstall scripts, so pnpm's restriction of postinstall scripts would have stopped it (unless you whitelist the scripts). But last month's attack on chalk, debug, et al. only involved runtime code, so measures like pnpm's would not have helped.


Exactly, so you should still execute all JS code in a container.


> That seems a bit excessive to sandbox a command that really just downloads arbitrary code you are going to execute immediately afterwards anyways?

I won't execute that code directly on my machine. I will always execute it inside the Docker container. Why do you want to run commands like `vite` or `eslint` directly on your machine? Why do they need access to anything outside the current directory?


I get this but then in practice the only actually valuable stuff on my computer is... the code and data in my dev containers. Everything else I can download off the Internet for free at any time.


No.

The most valuable data on your system for a malware author is login cookies and saved auth tokens for various services.


Maybe keylogging for online services.

But it is true that work and personal machines have different threat vectors.


Yes, but I'm willing to bet most workers don't follow strict digital life hygiene and cross contaminate all the time.


You don't have any stored passwords? Any private keys in your `.ssh/`? DB credentials in some config files? And the list goes on and on.


I don't store passwords (that always struck me as defeating the purpose) and my SSH keys are encrypted.


This kind of mentality, and "seems a bit excessive to sandbox a command that really just downloads arbitrary code", is why the JS ecosystem is so prone to credential theft. It's actually insane to read stuff like that said out loud.


Right, but the opposite mentality winds up putting so many of the eggs in the basket of the container that it defeats a lot of the purpose of the container.


It's weird that it's downvoted because this is the way


maybe i'm misunderstanding the "why run anything on my machine" part. is the container on the machine? isn't that running things on your machine?

is he just saying always run your code in a container?


> is the container on the machine?

> is he just saying always run your code in a container?

yes

> isn't that running things on your machine?

in this context where they're explicitly contrasted, it isn't running things "directly on my machine"


It annoys me that people fully automate things like type checkers and linting into post-commit steps or, worse, outsource them entirely to CI.

Because it means the hygiene is thrown over the fence in a post-commit manner.

AI makes this worse because they also run them "over the fence".

However you run it, I want a human to hold accountability for the mainline committed code.


I run linters like eslint on my machine inside a container. This reduces attack surface.

How does this throw hygiene over the fence?


Yes, in a sibling reply I was able to better understand your comment to mean "run stuff on my machine in a container".


pnpm has lots of other good attributes: it is much faster, and also keeps a central store of your dependencies, reducing disk usage and download time, similar to what java/mvn does.


> command that really just downloads arbitrary code you are going to execute immediately afterwards anyways?

By default it directly runs code as part of the download.

With isolation there is at least a chance to do some form of review/inspection.


I've tried using pnpm to replace npm in my project. It really speeds up installing dependencies on the host machine, but it's much slower in the CI containers, even after configuring the cache volume. That made me come back to npm.


> That seems a bit excessive to sandbox a command that really just downloads arbitrary code you are going to execute immediately afterwards anyways?

I don't want to stereotype, but this logic is exactly why the JavaScript supply chain is in the mess it's in.


I disagree; you can always spend one or two sentences at the top to immediately bring everyone to a good starting point, regardless of how much technical depth the rest of the article has.

For example in this case: "eBPF is a method for user space to add code to the running Linux kernel without compromising security. They have been tied [...]. The GNU toolchain, the historical system to build Linux and still preferred by many, currently has no support."

The description of what LWN and Linux are would be in the about page linked in the article.

It costs almost nothing for an expert to skim/skip two sentences while saving loads of time for everyone else.

The article is also completely missing motivation (why do we care whether BPF is supported in the second toolchain?), which would be helpful for almost everyone, including people who think it is obvious.

Edit: To be clear though, I love LWN. But the articles are very often missing important context that would be easy to add that I suspect would help a large portion of the reader base.


If your cache fits in Redis then it fits in RAM, and if your cache fits in RAM then Postgres will serve it from RAM just as well.

Writes will go to RAM as well if you have synchronous_commit=off.


Not necessarily true. If you're sharing the database with your transaction workload, your cache will be paged out eventually.


This was my take as well, but I'm a MySQL / Redis shop. I really have no idea what tables MySQL has in RAM at any given moment, but with Redis I know what's in RAM.


I think relational databases are great, but this is my biggest problem with SQL, even before the ridiculous syntax.

A query like

    select users.*, orders.* from users left join orders on orders.user_id = users.id
Should always have returned a structure like:

    type SingleReturnRow = { users: {id: ..., }[], orders: {user_id: ..., id: ..., }[]}
    type Return = SingleReturnRow[]
Mangling the columns together and _removing_ groupings that naturally appear is just so unnecessary.

I don't think a larger change in the query language would even be needed.

Even better of course would be a return value like

    type SingleReturnRow = { user: User, orders: Order[] }
But I see how that would require a fundamental change in the language.

Of course in PG now you can use

    select users.*, json_agg(orders.*) as orders from users left join orders on orders.user_id = users.id group by users.id
but using JSON as an intermediate step just feels unnatural.


It follows the relational algebra model. Relations (aka Tables) go in, relations (aka Tables) come out. This makes some things really nice.

However, I proposed a hierarchical result for such cases for our database a long time ago, but couldn't convince enough people. json_agg came later; with that, all the machinery is there, so it would "just" require exposing this to the protocol and adapting all clients to understand that data format ...


The world deserves something better than json_agg. For XTDB we coined NEST_ONE and NEST_MANY: https://xtdb.com/blog/the-missing-sql-subqueries

The output format is either raw Arrow DenseUnions (e.g. via FlightSQL) or Transit via a pgwire protocol extension type.


There's nothing in the relational model that suggests a field (cell) can't take the value of a relation. Only SQL makes that difficult.


There is nothing forbidding it, but then you can't process it further with the same algebra; that value is then a single opaque value. (Which for many uses is fine, as that should be one of the final steps of processing.)


See https://arxiv.org/abs/2312.00638 for one proposal to address this


TypeScript does support all of these - `C implements I` is not necessary but gives compile errors if not fulfilled.

You can use `o satisfies T` wherever you want to ensure that any object/instance o implements T structurally.

To verify a type implements/extends another type from any third-party context (as your third point), you could use `(null! as T1) satisfies T2;`, though usually you'd find a more idiomatic way depending on the context.

Of course it's all type-level - if you are getting untrusted data you'll need a library for verification. And the immutable story in TS (readonly modifier) is not amazing.
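
A compact sketch of all three checks (the interface, class, and type names are made up for illustration):

    interface HasName {
      name: string;
    }
    // 1. Optional `implements`: compile error if the class doesn't fulfill the shape.
    class User implements HasName {
      constructor(public name: string) {}
    }
    // 2. `satisfies` checks a value against a type without widening its inferred type.
    const config = { name: "prod" } satisfies HasName;
    // 3. From any third-party context, assert that one type is assignable to another.
    type UserLike = { name: string; id: number };
    (null! as UserLike) satisfies HasName;  // compile error if UserLike doesn't match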


The problem with TypeScript is that it is ridiculously verbose. It is everything people used to complain Java was back in the 90s.


If you replace `sh:link` with `sh:clone` instead, it will.

> clone: reflink-capable filesystems only. Try to clone both files with the FIDEDUPERANGE ioctl(3p) (or BTRFS_IOC_FILE_EXTENT_SAME on older kernels). This will free up duplicate extents while preserving the metadata of both. Needs at least kernel 4.2.


Wow, amazing how you managed to convert that into a compact shader, including a representation of the visualization!

