How to get the most out of Postgres memory settings (tembo.io)
272 points by samaysharma on June 12, 2024 | hide | past | favorite | 82 comments


Aside from these (most of which have reasonably sane defaults with RDS / Aurora), the biggest thing you can do to help the existing RAM you have is to stop making stupid indexing choices.

You probably don’t need (or want) an index on every column.

You shouldn’t index a UUIDv4 column if you can ever help it (and by extension, you shouldn’t use them as a PK).

Finally, you almost certainly need to REINDEX what you have periodically. I’ve seen indices shrink to a quarter of their former size, giving back GBs of RAM. You want a quick win for speed with zero cost? Gain back enough RAM to have your entire working set cached.


What's the problem using UUID as a PK? I really like 128bit numbers for PK, more than 64bit-int or anything text. I'm generally putting a ULID or UUIDV7/8 in there.


UUIDv4 specifically (though ULID [EDIT: this is incorrect, I was thinking of NanoID] has the same problem). Anything non-k-sortable will bloat a B+tree index, causing far more reads than should be required.

MySQL (technically InnoDB) and SQL Server have a huge problem with them due to their design as a clustering index – since the tuples are physically located, or clustered, around the PK, the random nature means that doing a simple query like "SELECT id FROM customer WHERE date_created > NOW() - INTERVAL 1 WEEK" means that the hits are going to be on tons of different pages. Even if everything is cached, that's still a lot of extra work.

Postgres has its own problems with them as well. In the above query, it's possible that it could be an index-only scan. Great, you don't have to dereference the pointer to find the tuple in the heap! Except you still have to go through the Visibility Map, and it's going to be extremely bloated with a UUIDv4 PK. Additionally, due to the combination of how Postgres manages the A in ACID (via `full_page_writes`) and its MVCC implementation, a change to any indexed column in a table results in the entire row being re-written. While logically sequential tuples may be physically co-located (which would be ideal from a write perspective), the index pages are highly unlikely to be if there are non-k-sortable keys, so now you've bloated the WAL as well.
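A quick way to see the Visibility Map cost in practice. This is only a sketch (table, column, and index names are illustrative, mirroring the query above); EXPLAIN ANALYZE reports "Heap Fetches" for every row the VM couldn't vouch for:

```sql
-- Hypothetical table with a random (v4) UUID PK.
CREATE TABLE customer (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    date_created timestamptz NOT NULL DEFAULT now()
);

-- INCLUDE (id) so the query below can be satisfied index-only (PG11+).
CREATE INDEX customer_date_created_idx
    ON customer (date_created) INCLUDE (id);

-- VACUUM sets all-visible bits in the Visibility Map; without it,
-- every "index-only" hit still has to visit the heap.
VACUUM customer;

-- Look for "Heap Fetches: N" in the output -- those are rows where
-- the index-only scan had to chase into the heap anyway.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM customer
WHERE date_created > now() - interval '1 week';
```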

If you must use a UUID (I maintain you generally don't – PlanetScale uses integers internally [0], so clearly it can be done at scale), UUIDv7 is probably the best from a DB performance perspective.

[0]: https://github.com/planetscale/discussion/discussions/366


PostgreSQL also supports a hash index type, which should be able to handle random (v4) UUIDs pretty well. Since there's no meaning to the sort order on random UUIDs anyway, the limitation to exact comparisons (=) shouldn't be a problem.

It doesn't seem like you can specify what index type is used to implement the PRIMARY KEY constraint, though, so this is really only useful for certain situations.
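A minimal sketch of that (names are illustrative): the hash index works for equality lookups on a random UUID, but the PRIMARY KEY constraint itself still gets a btree:

```sql
CREATE TABLE session (
    token   uuid   NOT NULL,
    user_id bigint NOT NULL
);

-- Hash indexes support only equality (=), which is all the sort
-- order of a random UUID is good for anyway.
CREATE INDEX session_token_hash ON session USING hash (token);

-- Equality lookups can use the hash index:
SELECT user_id FROM session
WHERE token = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11';
```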


Postgres hash indexes were sorta unusable until a few years ago, and in my experience they weren't any faster back then. Haven't tried again recently, but if I did, I would run benchmarks before trusting it. Also, giving credit where it's due, someone on HN showed me why even this option is less scalable in theory than a btree with semi-ordered keys: https://news.ycombinator.com/item?id=40590760

Still, UUID4 is a good default for publicly visible IDs since anything else leaks info. But there's almost no need to have one as a PK.


UUID7 basic information/examples - https://uuid7.com


>though ULID has the same problem

I thought that ULID was meant to solve these issues by being sortable, is that not right?


You’re correct; I was thinking of NanoID, and updated my post to reflect this.


> MySQL (technically InnoDB) and SQL Server have a huge problem with them due to their design as a clustering index – since the tuples are physically located, or clustered, around the PK, the random nature means that doing a simple query like "SELECT id FROM customer WHERE date_created > NOW() - INTERVAL 1 WEEK" means that the hits are going to be on tons of different pages. Even if everything is cached, that's still a lot of extra work.

How do you fix this problem in Postgres?


The postgres query planner has a correlation statistic for how well indexes match the order of the table data on disk, and if it's bad and the query is expected to return enough rows, it'll skip the index entirely. Instead it'll do a sequential scan and sort, to work with the disk cache instead of against it.

Postgres provides a CLUSTER command to reorder the table data on disk to match a given index, resulting in a high correlation and allowing it to do an index scan without the penalty for random disk access. If you're on an SSD or know you have enough memory to have all the data cached you can also disable the random access penalty in the configuration; it exists mainly for spinning disks.
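A sketch of both steps, assuming the hypothetical `customer` table and index from upthread — check the correlation statistic first, then rewrite the table in index order:

```sql
-- Planner's view of physical ordering: 1.0 / -1.0 means heap order
-- tracks the column; near 0 means rows are scattered randomly.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'customer';

-- Rewrite the table in index order. Caveat: CLUSTER takes an
-- ACCESS EXCLUSIVE lock for the duration of the rewrite.
CLUSTER customer USING customer_date_created_idx;

-- Refresh stats so the planner sees the new correlation.
ANALYZE customer;
```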


IMO CLUSTER should help, although pg_repack is even better, since it doesn't hold an exclusive lock while it copies the entire table. pg_repack is still very slow compared to the consistent B-tree rebalancing in MySQL.

The problem isn't fixed with SSD or RAM or the settings in Postgres. Fast random access is great, but if your data isn't localized you are leaving a 100x optimization on the table.

The original comment I replied to explained the data localization problem as if it were a MySQL problem and Postgres doesn't have the issue. That's like saying that Rust allows unsafe and this can cause memory corruption, so let's use C++. It's hard to overstate how little sense it makes.

Postgres is basically the only DB which doesn't offer a decent solution for data localization. Every other DB lets you control the layout of rows automatically with minimal resource overhead; postgres tools are extremely bad in comparison.


That presumably explains why some of Microsoft's own products use sequentially assigned GUIDs in SQL Server rather than random ones - using NEWSEQUENTIALID() rather than NEWID()?


I don’t think the visibility map suffers from bloat. It’s a bitmap with 1 bit per page.

You might have more pages due to the extra size of each tuple, but it’s not likely to be a big difference.


> 1 bit per page

Close, it’s 2 bits/page :-)

It’s not that it bloats, it’s that the hits are scattered among many pages. My wording wasn’t great on that point.

Some tests demonstrating this are here [0].

[0]: https://www.cybertec-postgresql.com/en/unexpected-downsides-...


Ah nice. Thanks for the extra info. I’m weirdly fascinated by the inner workings of Postgres.


You might like: ‘PostgreSQL 14 Internals’ by Egor Rogov


128-bit numbers are absolutely humongous, and by virtue of being a primary key they get duplicated for every single relation. So if you have your user ID, for instance, as a 128-bit UUID, any table and column that references the user ID is now also a 128-bit UUID. It's not a problem if you have very small amounts of data. But once your tables start to grow it will become misery to work with.

In my opinion, the best pattern is to have just a regular bigserial for primary keys to use for internal database relations since postgres is extremely good at generating and optimizing PK sequences (You're taking away all that optimization just because you want an enormous random number to be the primary key).

Then just have a uuid field for application-level identifiers and natural keys. You'll save yourself a whole lot of headache with this pattern.
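A minimal sketch of that pattern (table and column names are made up for illustration): the 8-byte identity key carries all the joins, and the 16-byte UUID is indexed exactly once.

```sql
-- Internal bigint PK for all relations; the UUID exists only as an
-- application-facing identifier.
CREATE TABLE app_user (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    public_id uuid   NOT NULL DEFAULT gen_random_uuid() UNIQUE,
    email     text   NOT NULL
);

-- Foreign keys reference the bigint, so referencing tables (and
-- their indexes) never carry the UUID:
CREATE TABLE user_order (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id bigint NOT NULL REFERENCES app_user (id)
);

-- API lookups resolve the UUID once, then join on integers:
SELECT o.*
FROM app_user u
JOIN user_order o ON o.user_id = u.id
WHERE u.public_id = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11';
```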


They're not "absolutely humongous" but merely two times as large as a 64-bit number. That's nothing. They're not even that bad for performance, assuming you have random writes regardless. UUIDs work incredibly well.

Where they don't work well is when they're stored in textual encoding (base16 with dashes) which I've encountered far more than I'd like. They also don't work (random) for append-only insertions like events, for which ULID (or newer UUID variants) are more appropriate.

But for primary keys? Totally fine.

> extremely good at generating and optimizing PK sequences (You're taking away all that optimization just because you want an enormous random number to be the primary key).

Please share the benchmarks to back this up. UUIDs optimize down perfectly fine.

If you're trying to squeeze every last ounce of performance out of the database then certainly this is a yak you can shave. But most Postgres users have 1001 problems that are much bigger than UUIDs. If you've managed to avoid folks writing shitty queries and this is the bottleneck you're facing then I salute you.

Otherwise, don't worry. UUIDs are fine.


> but merely two times as large as a 64-bit number. That's nothing.

It's literally 2x as large. This starts to matter quite a bit at the millions/billions of rows scale.

> They're not even that bad for performance, assuming you have random writes regardless.

> Please share the benchmarks to back this up.

I've done [0] benchmarks, with [1] source to recreate it. This was for loading data, but it demonstrates quite nicely the problem of massive B+tree splits. An integer PK loads data ~2x as quickly as a UUIDv4 PK stored as `uuid` type, and ~3x as fast as one stored as `text` type.

> But most Postgres users have 1001 problems that are much bigger than UUIDs.

All of which become compounded by the use of non-k-sortable keys.

> If you've managed to avoid folks writing shitty queries and this is the bottleneck you're facing then I salute you.

Using UUIDs as a PK without having an extremely good reason to do so demonstrates a lack of understanding of how the underlying technology works. Worse, it demonstrates a lack of understanding of a basic data structure, something that devs should have near and dear to their heart.

With Postgres specifically, the use of UUIDv4 (or ULID) PKs will also cause an enormous amount of WAL bloat [2], unless you happen to be rolling your own DB and are using ZFS for the file system, and you've disabled `full_page_writes` (which is the only safe way to do so). I have personally seen this take down prod, because the other nodes couldn't keep up with the changes being issued (fun fact, most AWS instances under `.4xlarge` in size have a baseline network speed that's much lower than rated).

[0]: https://gist.github.com/stephanGarland/ee38c699a9bb999894d76...

[1]: https://gist.github.com/stephanGarland/fe0788cf2332d6e241ff3...

[2]: https://www.2ndquadrant.com/en/blog/on-the-impact-of-full-pa...


> This starts to matter quite a bit at the millions/billions of rows scale.

Only if you have no other data in the database? Our UTF8 encoded strings easily beat all the UUIDs in terms of storage size.


I doubt those are also duplicated repeatedly in indexes throughout the database.


> I've done [0] benchmarks, with [1] source to recreate it.

Where's the source for the data files? Without that it's not possible to make sense of these benchmarks. Are the keys sorted?

Comparing random-insert to insert-at-end, if that is what you are indeed doing, is silly and not representative of real-world scenarios.

Even then, it's _less than_ a 2x overhead, which is far from an order-of-magnitude difference and squarely within the realm of "I don't care."

> All of which become compounded by the use of non-k-sortable keys.

But that's the thing -- they don't! Bad queries, like missing an index, improperly pruning via predicate pushdown and recursive queries will all dominate performance regardless of what you do here. We're not talking _quadratic_ behavior here. The compounding effect you are worried about is negligible for the vast, vast majority of workloads.

> With Postgres specifically, the use of UUIDv4 (or ULID) PKs will also cause an enormous amount of WAL bloat [2],

Again, only for _random insertions_, which is not a problem specific to UUIDs. Yes I agree that if you are rewriting all of your tables all the time you're going to have a bad time. However UUIDs are not inherently problematic as UUIDv7 or their predecessor ULIDs facilitate temporal ordering, which nearly eliminates the disadvantages discussed here.

I fail to see how ULIDs inherently cause WAL bloat, and checking my notes (read: databases) shows they do not. Why do _you_ believe otherwise?

> Using UUIDs as a PK without having an extremely good reason to do so demonstrates a lack of understanding of how the underlying technology works. Worse, it demonstrates a lack of understanding of a basic data structure, something that devs should have near and dear to their heart.

What is problematic is not understanding the basic data structures and the underlying technology, which is the root of most database issues, which is my point: you almost always have bigger fish to fry. Whether or not you opt to wield UUIDs will not save you here.

In fact, 2ndquadrant agrees with me[0]:

> Chances are your database schema is more complicated and uses various other indexes, so maybe the issues due to UUIDs are fairly negligible in the bigger picture.

[0]: https://www.2ndquadrant.com/en/blog/sequential-uuid-generato...


> Where's the source for the data files? Without that it's not possible to make sense of these benchmarks. Are the keys sorted?

The names were randomly generated, as were the UUIDs. Serial / Identity were of course handled by the DB itself. Nothing was pre-sorted.

> 2ndquadrant agrees with me

I wouldn’t say a summary “maybe the issues are fairly negligible” counts as agreeing. Do terrible queries matter more? Often, yes. However, I’ve found there is a strong correlation between devs using UUIDv4s (even after having been warned of the consequences) and poor schema and query design.

I’m not going to argue any points further as I don’t see it being fruitful. My day job is DBRE; I have run both MySQL and Postgres DBs at the TB scale, managed and self-hosted. I’m sure there are many who have run far larger clusters, but I believe I have enough experience seeing the impact first-hand to be able to speak about them.


If your argument boils down to "bad engineers use them" then I don't know what to tell you. There's nothing intrinsically wrong or harmful with UUIDs, and they're not really any worse of a footgun than any other feature.

I don't particularly care about your credentials, but it's harmful to preach extremes. UUIDs don't deserve the hate they're getting here.

Credentials: Postgres (and begrudgingly MySQL) for thousands of developers at the TB scale, 100TB scale, as well as PB scale (we don't talk about that one), and all (yes, all) the Postgres-related OLAP DBs, as well as maintaining a fork of Postgres for some time.

The FUD here is unreal.


UUIDs are an inefficient bloated way to store PKs and only make sense in scenarios where distributed clients need a way to independently generate conflict-free identifiers.


To be honest, it sounds like you've made some of the terrible mistakes with uuid PKs described in this thread and now you're just very aggressively defending the poor decision making.

There's no FUD. These are facts.


I haven't! I literally never think about these things because they are literally never problems. It's frustrating to see people suggest that such things are intrinsically problematic because they have misused, or witnessed them be misused, without recognizing that they are not intrinsically worse or better than alternatives.

I've witnessed severe, horrifying problems, absolute nightmares, due to misuse of serial/bigserial. I'm not here trying to convince anyone not to use them. Right tool for the right job and all.

The argument here reads to me akin to suggesting that a baseball bat is not very good at cutting loaves of bread! Don't use baseball bats! You should really stick to knives!

Can we perhaps just agree that it's both possible to hold knives without cutting yourself and possess and wield a baseball bat without ruining your sandwich?

Anyways, at this point you're just posting flamebait and I don't particularly enjoy the direction the thread has taken by pulling credentials and implying inadequacy. It's not constructive. I hope at least some readers are convinced that they need not fear the UUID.


I can understand not caring at first if this option is easier, but it's not. Bigserials are already the easiest thing to use.


> But once your tables start to grow it will become misery to work with.

I buy the data layout argument for databases with clustered tables without any semi-sequential uuid support. But the storage argument looks vanishingly applicable to me: if someone needs to add a column to one of these tables, it basically offsets a 4 byte optimization already.


8 bytes (assuming 128-bit instead of 64-bit), but yeah.

It's not quite as simple as saving 8 bytes per row though. It's 8 bytes for the UUID, plus 8 for at least the PK, plus 8 more for any other keys the UUID is in.

Then you need to do that for any foreign-key-fields and keys those are in as well.

However, unless your table has very few non-uuid columns, the rest of the table size will dwarf the extra n*8 bytes here. And if you are storing any blobs (like pictures, documents etc) then frankly all the uuids won't add up to anything.

In summary, whether using uuids or not is right for you depends a Lot on your context. An Insert Only log table, with a billion rows, is a very different use case to say a list of employees or customers.

Generally I'm very comfortable with uuids for tables of say less than 100m rows. Very high inserting though suggests tables bigger than that, and perhaps benefits from different strategies.

When it comes to data stored in multiple places (think on-phone first, syncd to cloud, exported from multiple places to consolidated BI systems), uuids solve a lot of problems.

Context is everything.


Quick question. So you’re saying use bigserial for PK (instead of uuid) and have another column where I set a uuid in case of application level identifier.

But wouldn’t that require a index on it if it’s going to be an identifier that’s used to query?


Yes but you only have a single index containing the UUIDs instead of duplicated everywhere that is using it as a foreign key relation. Those indexes quickly become explosive in relations like e.g. m2m tables.


UUIDs are for when an auto-incrementing ID can't be used, usually due to scale (if you will be inserting millions of records per second from millions of clients).

However, postgres isn't suited to such scale (something like spanner IS).

Therefore, you probably shouldn't be using UUIDs with postgres.

The other reason for UUIDs is because the system is badly architected and, for example, the client gets to choose their own ID which is written into a DB. Such systems are generally best avoided.


Also think about systems that are "beyond one database". Like data that lives in multiple places at the same time (on phone local storage, in database, consolidated and syncd either other databases etc.)

In other words once you move beyond '1 sql database' they become important.

(I'm not downvoting you, just disagreeing with you.)


I've used auto-incrementing integers across multiple systems.

There was one system responsible for allocating the integers (what should have been a relational database that held our user accounts & their metadata); once that write was committed¹’² then other systems (e.g., one that kept bulk/frequent per-user metrics (lots of data), but which could shard it better) used it.

I think I'd opt for UUIDs today (the size difference is minimal, and they distribute more naturally if you shard on them), but integers are doable in this manner too.

¹which … this is its own battle. It turned out later our database was less than good here. ("So call me, maybe.")

²you need a lock here. That could matter. But "user creation" is not a frequent enough event for lock contention to matter. We only did this for user IDs.


This is why I asked at the start of the thread. We *need* 128bit numbers to make the uniqueness work across loads of systems (like a distributed/federated type thing)


Maybe, but they don't need to be the PKs. And you only need some 128bit number at all if you want one generator not to be able to guess the others' numbers, like if clients are generating them.


Are you mixing up primary key with clustered key?

Primary Key serves to uniquely identify the row. This is used to reference the row in other tables. (So the Invoice table contains the Primary Key value for the customer.)

The Clustered key describes the physical sort order on the disk. Clustering on your most common search improves performance for that search. Searching on other criteria becomes equally slow.

So, it's really common to search through invoices by date. So a date key is ideal for a clustered index on that table. Looking up all invoices for a date range is faster than all invoices by, say, customer.

Clustering the customer or employee table by date, though, isn't great; I don't often want to see my employees in take-on order.

We teach Primary Keys at Uni, but in my era anyway they didn't put the same emphasis on selecting a correct clustered key. So I feel like people tend to assume the Primary Key is also the clustered key.


I'm used to Postgres, which doesn't have clustered key. If this is MySQL and you're using a clustered key, pkey is still how you'd identify rows in foreign keys etc, so I think the same advice of bigserial pkey applies.

We learned using natural pkeys in college, which in practice burned me enough that I'll never do it anymore. I just set some uniqueness constraints.


Even in these situations, you probably want bigserial PKs on the shards. I think Citus does this by starting each shard's sequence at a different value.
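One common sketch of that idea (not necessarily Citus's exact mechanism): interleaved sequences, so each shard generates from a disjoint set of bigints with no coordination.

```sql
-- On shard 1:
CREATE SEQUENCE user_id_seq START WITH 1 INCREMENT BY 16;

-- On shard 2:
CREATE SEQUENCE user_id_seq START WITH 2 INCREMENT BY 16;

-- ...and so on, supporting up to 16 shards with no overlapping IDs.
```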


What's the problem with bigserial as a PK?


I've got some queries like this saved, which I find pretty handy for working out when things are getting bloated and need some maintenance: https://wiki.postgresql.org/wiki/Show_database_bloat



You should never use uuidv4 (the builtin gen_random_uuid()) for identifiers anyway, since it clusters horribly and complete randomness is the absolute enemy of anything you want to "look up".

uuid_generate_v1mc() from uuid-ossp is a much better choice if you really want a uuid identifier, since it will at least cluster much better.
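For reference, it's in the contrib extension, not core:

```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- v1mc: timestamp-based with a random multicast MAC, so (per the
-- point above) consecutive values tend to land near each other in
-- an index far more often than gen_random_uuid() (v4) does.
SELECT uuid_generate_v1mc();
```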

And yes, absolutely _never_ make it your primary key. I've seen that mistake far too many times and it's always a disaster that is a nightmare to fix once there's a non-trivial amount of data and the database is miserable to work with.


>randomness is the absolute enemy of anything you want to "look up".

Absolutely true that having data ordered on a column that's one of the search predicates makes lookups faster. But aren't there many situations where the PK is essentially random even if it's an automatically incrementing integer?

Like in a customer's table, is the order someone became your customer ever really relevant for most operations? It's essentially a random id, especially if you're only looking up a single customer (or things related to that single customer).

Insert performance I could see being an issue. I'm not super familiar with postgres performance tuning, but if the engine handles a hot last page better than inserts spread across all pages (which I think is the case due to how the WAL operates), I can see that being a compelling reason for going with a sequential PK.


You're correct that for simple one-off cases, in a new-ish table, the performance difference is tiny if you can measure it at all.

As tables and indices age, they'll bloat, which can cause additional latency. Again, you may not notice this for small queries, but it can start becoming obvious in more complex queries.

The main issue I've seen is that since software and hardware is so absurdly fast, in the beginning none of this is noticed. It isn't until you've had to upsize the hardware multiple times that someone stops to think about examining schema / query performance, and at that point, they're often unwilling to do a large refactor. Then it becomes my problem as a DBRE, and I sigh and explain the technical reasons why UUIDs suck in a DB, and inevitably get told to make it work as best I can.

> Insert performance I could see being an issue.

And UPDATE, since Postgres doesn't actually do that (it does an INSERT + DELETE).



I think that is putting it rather strongly. I use UUID7 (for its better clustering) for anything that 1) may end up having a lot of rows (1M+), and 2) where leaking the creation time in the UUID is acceptable.

Otherwise I use UUID4.


> where leaking the creation time in the UUID is acceptable.

I've seen tons of people cite this as a problem for various reasons, but not a single person has ever given an actual example in the wild of it causing problems. This isn't war; the German Tank Problem is not nearly as applicable as people think.

Worse (from a DB perspective), it's often used as an excuse for poor API design. "I have to have a random ID, because the API is /api/user/<user-id>" and the like. OK, so don't do that? Put the user ID into a JWT (encrypted first if you'd like). Or use an external ID that is random, and an internal ID that's a sequential ID which maps to the external. Use the internal ID as the PK.


One year later, in a News Article:

"$person_name's lawyer could not be reached for comment, but it is noted that the userid mentioned in the lawsuit was created on 2022-12-03, the same day as the photos were uploaded, and the same day as the arrest warrant was issued..."

Using the id directly as the pk admittedly just makes things a bit simpler, one less bit of information to track down. I know bridge tables are not hard, but it's just one extra step that the developer has to be aware of, plan ahead for (and possibly all your teams and customer support people need to be aware of when they are tracking down an issue with an account)

UUIDv4 lets you move fast now and pay your performance-piper next year rather than now.


PK should be bigserial no matter what. "Use an external ID that is random" has the same clustering problem mentioned above since you will need an index on it, so it's not always the right answer, but I would default to it.


> same clustering problem

Not necessarily. Depending on your secrecy desire, you could include the internal ID in JWT or cookie, and then locate the row (including the external UUID) using that. Another option would be to build an index with `INCLUDE external_id`. That way, it doesn’t impact the B+tree (kind of) but it still gets pulled along for query results. The leaf nodes would become larger, but the overall ordering wouldn’t change (I think – never actually tried this).
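A sketch of the INCLUDE idea, using hypothetical names (and PG11+); I haven't benchmarked this variant either:

```sql
-- Covering index: external_id is stored only in leaf pages, so it
-- doesn't participate in key ordering, but it lets this lookup be
-- an index-only scan without touching the heap.
CREATE INDEX app_user_id_ext ON app_user (id) INCLUDE (external_id);

SELECT external_id FROM app_user WHERE id = 42;
```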


If you mean to encrypt your internal IDs when sending to clients and decrypt when receiving, it'd work. I've never seen it done this way, maybe because with standard 2048-bit RSA you get a 2048-bit output at least, which would bloat up the API responses.


I like it as a primary key because you don't have to worry about clashes if you rejig data. But I don't work at scale with data, so performance hasn't been a blocker for this approach.


If you're not working at scale, IMO this is also not a difficult problem to overcome with integer IDs. Dump the DB into a CSV and increment the ID column by N, then reload. Or copy the table to a new table, add a trigger to copy from the original to the new with an ID increment, then do `UPDATE <temp_table> SET id = id + N WHERE id <= $MAX_ID_ORIGINALLY_SEEN`.


Yeah, but that requires basically a special favor from the DBA to create the trigger for what feels like a basic, common-enough data-load task...

What I normally see though is that the auto-incrementing sequence (at least in MS SQL Server or Oracle) isn't clever enough to say "wow, that id already exists on your table somehow? Here let me bump the sequence again and you try again with a higher ID..."

Instead you get a 2am alarm because the generated sequence somehow ran into the block of IDs that you inserted into the table and crashed with a PK unique constraint violation.

Hence UUIDv7 or ULIDs being easy to insert from the temp table into the main table.


It's easier if you have a non-PK UUID column you can use as a crutch.


What is rejigging data? Combining tables into one but keeping original PKs?


Yes. I found it often necessary at my last job, either due to loading client data from various datasets and staging tables (ideally in an idempotent way to prevent mistakes or double-entries), or because someone asked "Hey, if we already have the data in system A, can we just copy the data to system B?"


> And yes, absolutely _never_ make it your primary key.

If I dont have lots of range queries, why not then? Only because of bloating (I think fragmentation is a more precise term here)?


Because there's a good chance down the line you will need to do some sort of range query. Let's say you want to add and backfill a column. Not too bad: you create a partial index where the column is null and use that for backfilling data.

But at a certain scale that starts taking too long and a bigint column would be quicker. Or you decide you need to periodically scan the table in batches for some reason. Perhaps to export the contents to a data warehouse as part of an initial snapshot.

You can skip enumerating these possibilities by having a bigint surrogate key from the get go. There's other advantages as well like better joins and temporal locality when the bigint index can be used rather than the uuid.


I'm not against using UUIDs, but it has to be done in a balanced manner. Normal numeric IDs for all surrogates, and then a UUID as an external lookup attribute. You push that externally for API calls, tokens, etc. so you don't have to worry about leaking sequential values, so it should be indexed. Otherwise, it's purely a lookup to key you into the JOIN chain for everything else.

That said, now that UUIDv7 is available, switching to that should be a priority.


How do you know when you need to reindex?


If you have pgstattuple [0], you can check the bloat of indices. Otherwise, you can just make it a cron on a monthly / quarterly / semi-annually / whatever basis. Since PG12 you can do `REINDEX INDEX CONCURRENTLY` with zero downtime, so it really doesn't hurt to do it more often than necessary. Even before PG12, you can do an atomic version of it:

`CREATE INDEX CONCURRENTLY new_<index_name> ON <table> (<columns>);`

`ALTER INDEX <index_name> RENAME TO old_<index_name>;`

`ALTER INDEX new_<index_name> RENAME TO <index_name>;`

`DROP INDEX CONCURRENTLY old_<index_name>;`

[0]: https://www.postgresql.org/docs/current/pgstattuple.html


I find average leaf density to be the best metric of them all. Most btree indexes with default settings (fill factor 90%) will converge to 67.5% leaf density over time. So anything below that is bloated and a candidate for reindexing.
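For anyone wanting to check that metric, pgstattuple's pgstatindex() reports it directly (index name here is illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- With the default fillfactor of 90, avg_leaf_density well under
-- ~67.5% marks the index as a reindex candidate by this heuristic.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('customer_pkey');
```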


You can measure "bloat" in the index. It's essentially the wasted space in pages.

You can also have bloat in the heap for the same reasons.

You may also want to cluster if your pg_stat.correlation is low since that indicates your heap isn't in the same order as your index anymore. pg_repack can do all of this without blocking, but you can reindex just an index concurrently on version >= 12.

https://wiki.postgresql.org/wiki/Show_database_bloat


For our production PGSQL databases, we use a combination of PGTuner[0] to help estimate RAM requirements and PGHero[1] to get a live view of the running DB. Furthermore, we use ZFS with the built-in compression to save disk space. Together, these three utilities help keep our DBs running very well.

[0] https://pgtune.leopard.in.ua

[1] https://github.com/ankane/pghero


Why did you choose to run PG on ZFS? DBs on CoW FS aren't usually ideal.


We were running very large storage volumes in Azure (+2TB) and wanted to leverage ZFS compression to save money. After running some performance testing, we landed on a good balance of PGSQL and ZFS options that worked well for us.


Is the old ZFS block size trick still necessary (since DBs internalise so much that OSs tend to provide)?


It is - depending on the read-vs-write workload. For our workload, we landed on a record size (blocksize) of 128K which gives us 3x-5x compression. Contrary to the 8KB/16KB suggestions on the internet, our testing indicated 128K was the best option. And, using compression allows us to run much smaller storage volume sizes in Azure (thus, saving money).

We did an exhaustive test of our use-cases, and the best ZFS tuning options with Postgres we found (again, for our workload):

  * Enable ZFS on-disk compression

  * Disable ZFS in-memory compression (enabling this option costs us 30% perf penalty)

  * Enable primary caching

  * Limit read-ahead caching

Edit: Forgot to add, here are the required PGSQL options when using ZFS:

  * full_page_writes = off

  * wal_compression = off

Once the above options were set, we were getting close to EXT4 read/write speeds with the benefit of compression.


Do not forget huge pages configuration in kernel - Postgres will use them and it will save a lot of RAM for large databases, esp with many connections.

One of the ways to configure is to continue adding nr_hugepages to kernel, restarting PG and running your workload until you see HugePages_Free growing.

E.g.:

sysctl.d/10-hugepages.conf:vm.nr_hugepages=20000

See how these are used:

    [root@potato ~]# cat /proc/meminfo | grep -i huge
    AnonHugePages:   4411392 kB
    ShmemHugePages:        0 kB
    FileHugePages:         0 kB
    HugePages_Total:   20000
    HugePages_Free:     3103
    HugePages_Rsvd:      143
    HugePages_Surp:        0
    Hugepagesize:       2048 kB
    Hugetlb:        40960000 kB


Also, for multiprocess database engines like Postgres (and Oracle), starting up and tearing down connections will be much faster with hugepages too, as the shared memory attach and detach system calls have less pagetable setup work to do in the kernel.


I've always found that the config generator below gets you 90% of the optimizations you'll likely needed.

https://pgtune.leopard.in.ua


I find "pg connections take a lot of memory" to be mostly a myth. The number floating around is 10 - 20MB per connection, but it looks like it's more like 2MB per connection: https://blog.anarazel.de/2020/10/07/measuring-the-memory-ove...


Why doesn't Postgres autotune those settings? Determining this experimentally seems to be very fragile


There is a certain appeal for Javaheads.

    java -Xms$FOO -Xmx$BAR
I've worked on things where we had to know the correct heap incantation to run a service.


The exact settings for those params is always more


With a caveat: If you cross 32G heap size, the JVM will start using more and you effectively have less memory available until you get to 48G.

https://confluence.atlassian.com/jirakb/do-not-use-heap-size...

We had this problem for a while in one of our (non-Atlassian) systems and were running it with Xmx at 31.5G


So what you're saying is your problem would have been solved if you just increased the number...


One other sneaky way to do this is to get the parameters from an AWS RDS instance of Postgres that matches your server and use those. It's general-purpose, but it's a good starting point...and probably much better than the default Postgres settings.


Well.. that was very detailed. Will bookmark for when I'm ready to optimize.



