At stake is the question of whether the key locations should be saved as successive offsets, or as lengths. The advantage of the former is that it speeds up extraction by making it only two lookups to locate a top-level key regardless of the number of keys you have. The disadvantage is that a series of increasing offsets isn't very compressible.
For JSONB fields which consist of a few top-level keys and large values, this question makes no difference at all. However, for the worst case ... 150+ top-level keys with short (under 10 bytes) values, the difference is quite dramatic. For example, I constructed a test with 183 keys, of which 175 were NUMERIC. I checked both table size and time to extract key #160 from 100,000 rows:
This is a "worst case" scenario for the difference between these two designs. Note that the extraction slowdown affects only retrieving the value to the client; it does not affect index lookups of JSONB rows, which are speedy no matter which patch is employed.
However, we're undecided on this "fix" because we don't know a couple things:
- How likely are users to have 150+ top-level keys in one field (or keys on any single level together) with short values?
- Is up to 60% space savings in return for up to 80% extraction slowdown a good tradeoff? Or a bad one?