PostgreSQL's Multi-Version Concurrency Control (MVCC) sidesteps the challenge of updating rows in place within fixed-size blocks by avoiding it altogether: an update is processed as a delete plus an insert, prioritizing simplicity of implementation over write performance. This makes updating fields in a JSONB document problematic, because of significant write amplification.
What are Heap Only Tuple (HOT) updates?
When a table row is updated, the entire row is marked for deletion by setting its xmax value, indicating the end of its visibility period. A new version of the row is then created with a fresh xmin value to signify the start of its visibility. Write amplification arises not only from copying the entire row but also from the need to update all indexes associated with the table. PostgreSQL indexes reference rows using their physical location (ctid), meaning that any change in the row's physical location requires new index entries to find the latest version of the row, even if the indexed column values remain unchanged. Over time, when older versions of rows are no longer visible to any active transaction—having passed the xmin horizon—they are eligible for garbage collection by the vacuum process, which removes outdated row versions and their associated index entries.
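To make this concrete, the system columns involved can be inspected directly. Here is a minimal sketch on a throwaway table (not the one used later in this post):

-- Throwaway table to observe row versioning
create table t (id int primary key, val int);
insert into t values (1, 0);
update t set val = val + 1 where id = 1;

-- ctid is the physical location, xmin the inserting transaction, xmax the
-- deleting/updating one; only the latest visible version is returned here
select ctid, xmin, xmax, id, val from t;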
Given that many SQL applications have multiple indexes on their tables, frequent updates can exacerbate write amplification, with detrimental consequences for checkpoints and replication, especially when every index must be updated regardless of whether the indexed values changed. To mitigate this, PostgreSQL introduces an optimization called Heap-Only Tuple (HOT) updates, which avoids adding new index entries for keys that didn't change, provided the new version of the row fits in the same block as the previous version. If a row is frequently updated and old versions are promptly vacuumed, free space remains available in the block for new versions (and can be reserved upfront with a lower fillfactor), allowing the HOT optimization to kick in.
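For example, here is a minimal sketch of reserving free space with the fillfactor storage parameter (the table is hypothetical):

-- Keep 30% of each heap block free, so a new row version can stay in the
-- same block as the old one and qualify for a HOT update
create table counters (
  id bigint primary key,
  hits bigint not null default 0
) with (fillfactor = 70);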
This blog post series is about using PostgreSQL as a document database, with all data in JSONB, but there's no Heap-Only Tuple optimization for indexes on JSONB fields.
Test it with EXPLAIN (ANALYZE, WAL, BUFFERS)
I create a table similar to the one in the previous post, storing user profiles, and add a login sub-object to record the last login date and a login counter:
create table users (
  id bigserial primary key,
  data jsonb not null
);

insert into users (data) values (
  jsonb_build_object(
    'name', 'Homer Simpson',
    'login', jsonb_build_object(
      'last', to_char(current_timestamp, 'YYYY-MM-DD HH24:MI:SS'),
      'count', 0
    ),
    'email', jsonb_build_array(
      '[email protected]',
      '[email protected]',
      '[email protected]'
    )
  )
);
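To check what was stored, jsonb_pretty() renders the document in a readable form:

select jsonb_pretty(data) from users where id = 1;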
This is the PostgreSQL equivalent of the following MongoDB call to insert a document:
// MongoDB equivalent query
db.users.insertOne({
  "_id": 1,
  name: "Homer Simpson",
  login: {
    last: new Date(),
    count: 0
  },
  email: [
    "[email protected]",
    "[email protected]",
    "[email protected]"
  ]
});
My use case is the equivalent of the following MongoDB call, which increments the login counter and updates the last login date:
// MongoDB equivalent query
db.users.updateOne(
  { _id: 1 },
  {
    $set: { "login.last": new Date() },
    $inc: { "login.count": 1 }
  }
);
SQL has no in-place increment operation; an update sets new values. When the data is stored as a JSONB field in PostgreSQL, we must build a replacement document with jsonb_set() to modify the fields.
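As a standalone illustration of the jsonb_set() semantics (the path is a text array and the new value must be a jsonb):

-- Returns a new document with the value at the given path replaced
select jsonb_set(
  '{"login": {"last": "2024-01-01", "count": 0}}'::jsonb,
  '{login,count}',
  '1'::jsonb
);
-- {"login": {"last": "2024-01-01", "count": 1}}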
I run this update repeatedly, incrementing the login counter and refreshing the last login date, and show the execution plan with statistics:
explain (analyze, verbose, buffers, wal, serialize text, costs off)
UPDATE users
SET data = jsonb_set(
  data,
  '{login}',
  jsonb_build_object(
    'last', to_char(current_timestamp, 'YYYY-MM-DD'),
    'count', (COALESCE((data->'login'->>'count')::int, 0) + 1)
  )
)
where id = 1
\watch
Here is the execution plan, showing two buffer hits to find the row via the primary key index, and one Write-Ahead Logging (WAL) record for the row update (71 bytes):
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on public.users (actual time=0.057..0.057 rows=0 loops=1)
Buffers: shared hit=4
WAL: records=1 bytes=71
-> Index Scan using users_pkey on public.users (actual time=0.040..0.041 rows=1 loops=1)
Output: jsonb_set(data, '{login}'::text[], jsonb_build_object('last', to_char(CURRENT_TIMESTAMP, 'YYYY-MM-DD'::text), 'count', (COALESCE((((data -> 'login'::text) ->> 'count'::text))::integer, 0) + 1)), true), ctid
Index Cond: (users.id = 1)
Buffers: shared hit=2
Planning Time: 0.063 ms
Serialization: time=0.000 ms output=0kB format=text
Execution Time: 0.077 ms
You can run this for a while, even on a large table, and observe the same numbers. Although it writes more than necessary, because the whole row and its JSON document are rewritten, the performance is predictable.
Note that some executions may show one extra WAL record generated by the Index Scan, as reads can perform some deferred cleanup:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on public.users (actual time=0.062..0.063 rows=0 loops=1)
Buffers: shared hit=4
WAL: records=2 bytes=157
-> Index Scan using users_pkey on public.users (actual time=0.047..0.048 rows=1 loops=1)
Output: jsonb_set(data, '{login}'::text[], jsonb_build_object('last', to_char(CURRENT_TIMESTAMP, 'YYYY-MM-DD'::text), 'count', (COALESCE((((data -> 'login'::text) ->> 'count'::text))::integer, 0) + 1)), true), ctid
Index Cond: (users.id = 1)
Buffers: shared hit=2
WAL: records=1 bytes=86
Planning Time: 0.063 ms
Serialization: time=0.000 ms output=0kB format=text
Execution Time: 0.083 ms
While storing all data in JSONB, as in a document database, may seem appealing, this table has no secondary indexes yet. In a real-world application, documents contain more fields and sub-documents, and require multiple indexes that evolve as the application develops.
Adding indexes
During the lifecycle of an application, more indexes are created. I add an index on the user name:
create index on users(
(data->>'name')
);
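For reference, this expression index can serve equality predicates written with the same expression, for example:

-- The planner can match this predicate to the expression index
select id from users where data->>'name' = 'Homer Simpson';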
In PostgreSQL, unlike many other databases, adding an index on fields that are never updated still impacts updates. My login update now generates two additional WAL records, a total WAL size about three times larger, and more buffer reads:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on public.users (actual time=0.091..0.092 rows=0 loops=1)
Buffers: shared hit=9
WAL: records=3 bytes=207
-> Index Scan using users_pkey on public.users (actual time=0.059..0.060 rows=1 loops=1)
Output: jsonb_set(data, '{login}'::text[], jsonb_build_object('last', to_char(CURRENT_TIMESTAMP, 'YYYY-MM-DD'::text), 'count', (COALESCE((((data -> 'login'::text) ->> 'count'::text))::integer, 0) + 1)), true), ctid
Index Cond: (users.id = 1)
Buffers: shared hit=3
Planning Time: 0.068 ms
Serialization: time=0.000 ms output=0kB format=text
Execution Time: 0.113 ms
PostgreSQL requires an expression index to index JSON fields. We have seen one limitation of expression indexes in a previous post (No Index Only Scan on JSONB Fields) and here is another one: PostgreSQL doesn't detect when the indexed value has not changed. This prevents it from applying HOT optimization, even if the new row fits within the same block.
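One way to verify this is to watch the cumulative statistics: once the expression index exists, these updates should no longer count as HOT, so n_tup_hot_upd stops increasing while n_tup_upd keeps growing:

-- n_tup_upd counts all updates, n_tup_hot_upd those done as Heap-Only Tuples
select n_tup_upd, n_tup_hot_upd
from pg_stat_user_tables
where relname = 'users';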
This was with an expression index on a scalar value (no array in the JSON path), but the same problem exists with GIN indexes. I create the same index as in the previous post (No Index for LIKE on JSONB):
CREATE INDEX idx_users_data_email ON users USING GIN (
(data->'email') jsonb_path_ops
);
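For reference, this jsonb_path_ops index serves containment (@>) queries on the email array, along these lines (the email value is a placeholder):

-- A jsonb array contains a primitive value when one of its elements equals it
select id from users
where data->'email' @> '"homer@example.com"'::jsonb;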
My update, which doesn't touch this field, shows one more WAL record, a larger WAL size, and more buffer reads:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on public.users (actual time=0.080..0.080 rows=0 loops=1)
Buffers: shared hit=11
WAL: records=4 bytes=397
-> Index Scan using users_pkey on public.users (actual time=0.039..0.041 rows=1 loops=1)
Output: jsonb_set(data, '{login}'::text[], jsonb_build_object('last', to_char(CURRENT_TIMESTAMP, 'YYYY-MM-DD'::text), 'count', (COALESCE((((data -> 'login'::text) ->> 'count'::text))::integer, 0) + 1)), true), ctid
Index Cond: (users.id = 1)
Buffers: shared hit=3
Planning Time: 0.070 ms
Serialization: time=0.000 ms output=0kB format=text
Execution Time: 0.100 ms
The issue at hand: you might prefer document data modeling over relational modeling because it maps simply to your domain's access patterns, and you may have come across "Just use PostgreSQL" advocacy claiming that JSONB turns PostgreSQL into a document database. If you start such a design with all data in a single JSONB field, your initial performance metrics might be met, but as the application grows and more indexes are added, critical use cases may struggle to scale.
I emphasized WAL records and size because they are a significant scalability bottleneck in PostgreSQL, where WAL is replayed by a single process on replicas. Write amplification also brings other complications, including more checkpoint work and higher pressure on vacuum. Scaling up with more CPUs won't resolve this, and adding read replicas won't help either, since all indexes must be maintained on the primary database.
PostgreSQL is a relational database that incorporates JSONB for added flexibility, but that doesn't turn it into a document database. In a SQL RDBMS, frequently updated or indexed fields belong in their own columns, possibly their own tables, while JSONB can hold additional flexible data that is read and written as a whole. If a document model is preferred, consider a document database like MongoDB, which performs in-place updates to documents in memory and updates only the relevant indexes (FAQ: Indexes), and is not limited by fixed-size block storage (documents are stored in a B-Tree with variable leaf size, and secondary indexes reference them by their key in that B-Tree).
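As a sketch of that recommendation (a hypothetical layout, not a schema from this series), the frequently updated login fields can be promoted to plain columns, so the counter update touches no indexed value and becomes HOT-eligible again:

-- Hybrid layout: hot fields as columns, flexible data as JSONB
create table users_hybrid (
  id bigserial primary key,
  login_last timestamptz,
  login_count bigint not null default 0,
  data jsonb not null
);

-- A plain column update that doesn't modify any indexed value can be HOT
update users_hybrid
set login_last = current_timestamp,
    login_count = login_count + 1
where id = 1;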