Skip to main content

Useful queries to analyze PostgreSQL lock trees (a.k.a. lock queues)

· 6 min read

Example output of the query for 'lock trees' analysis

For OLTP workloads (such as web and mobile applications), it is important to understand object-level and row-level locks in PostgreSQL. There are several good materials that I can recommend reading:

When it comes to the lock monitoring and troubleshooting, you can start with basic queries collected at these PostgreSQL Wiki pages:

Example output of the query for 'lock trees' analysisFor more convenient (hence faster, in many cases) troubleshooting, you might want to use some advanced query, presenting results in a form that allows you to quickly:
  • find the "offending" queries – those that are the "roots" of each blocking tree (a.k.a. "lock queues", "wait queues", or "blocking chains"; in a previous post, we've discussed and demonstrated how queries requiring to acquire locks may organize multiple lock queues, see "Zero-downtime Postgres schema migrations need this: lock_timeout and retries. Problem demonstration"), and
  • decide what to do to fix it – either understand the source of the query (application or human) or just grab the PID and use pg_cancel_backend(..) / pg_terminate_backend(..) to interrupt it and unblock other sessions.

Here are two examples of other people's work that you might find helpful:

  • "Active Session History in PostgreSQL: blocker and wait chain" by Bertrand Drouvot – this post describes the recursive CTE query pg_ash_wait_chain.sql that is useful for those who use the pgsentinel extension. The query is inspired by Tanel Poder's script for Oracle.
  • locktree.sql – query to display a tree of blocking sessions based on the information from pg_locks and pg_stat_activity, by Victor Yegorov.

I've experimented with the latter for some time and eventually started to add some additional bits of information to it – first of all, based on pg_blocking_pids(..) introduced in PostgreSQL 9.6. At some point, I rewrote the query from scratch, so consider it as the third option for the "advanced" lock issue troubleshooting.

The function pg_blocking_pids(..), per documentation, should not be used often:

Frequent calls to this function could have some impact on database performance, because it needs exclusive access to the lock manager's shared state for a short time.

Still, it is very useful, so I recommend using it as an ad hoc tool, with additional protective measures:

  • do not run it frequently and avoid using it in monitoring,
  • use low values for statement_timeout to minimize the possible impact on other sessions.

Another improvement is pg_locks.waitstart added in PostgreSQL 14, which shows how much time the processes are waiting for the lock.

It is now time to discuss the query itself. As the first step, let's enabling timing reporting and set a low value for statement_timeout:

\timing onset statement_timeout to '100ms';

And now run the query:

with recursive activity as (  select    pg_blocking_pids(pid) blocked_by,    *,    age(clock_timestamp(), xact_start)::interval(0) as tx_age,    -- "pg_locks.waitstart" – PG14+ only; for older versions:  age(clock_timestamp(), state_change) as wait_age    age(clock_timestamp(), (select max(l.waitstart) from pg_locks l where a.pid = l.pid))::interval(0) as wait_age  from pg_stat_activity a  where state is distinct from 'idle'), blockers as (  select    array_agg(distinct c order by c) as pids  from (    select unnest(blocked_by)    from activity  ) as dt(c)), tree as (  select    activity.*,    1 as level,    activity.pid as top_blocker_pid,    array[activity.pid] as path,    array[activity.pid]::int[] as all_blockers_above  from activity, blockers  where    array[pid] <@ blockers.pids    and blocked_by = '{}'::int[]  union all  select    activity.*,    tree.level + 1 as level,    tree.top_blocker_pid,    path || array[activity.pid] as path,    tree.all_blockers_above || array_agg(activity.pid) over () as all_blockers_above  from activity, tree  where    not array[activity.pid] <@ tree.all_blockers_above    and activity.blocked_by <> '{}'::int[]    and activity.blocked_by <@ tree.all_blockers_above)select  pid,  blocked_by,  case when wait_event_type <> 'Lock' then replace(state, 'idle in transaction', 'idletx') else 'waiting' end as state,  wait_event_type || ':' || wait_event as wait,  wait_age,  tx_age,  to_char(age(backend_xid), 'FM999,999,999,990') as xid_age,  to_char(2147483647 - age(backend_xmin), 'FM999,999,999,990') as xmin_ttf,  datname,  usename,  (select count(distinct t1.pid) from tree t1 where array[tree.pid] <@ t1.path and t1.pid <> tree.pid) as blkd,  format(    '%s %s%s',    lpad('[' || pid::text || ']', 9, ' '),    repeat('.', level - 1) || case when level > 1 then ' ' end,    left(query, 1000)  ) as queryfrom treeorder by top_blocker_pid, level, pid
\watch 10

Note that I've used:

  • a low statement_timeout value, as discussed, and
  • \watch 10 instead of ; – this tells psql to run it in an infinite loop, with 10-second pauses (you can always interrupt it using Ctrl-C).

Here is an example output:

   pid   |   blocked_by    |  state  |        wait        | wait_age |  tx_age  | xid_age |   xmin_ttf    | datname  | usename  | blkd |                          query---------+-----------------+---------+--------------------+----------+----------+---------+---------------+----------+----------+------+---------------------------------------------------------  641449 | {}              | idletx  | Client:ClientRead  |          | 00:01:23 | 4       |               |     test |      nik |    4 |   [641449] update table1 set id = id;  641586 | {641449}        | waiting | Lock:transactionid | 00:01:12 | 00:01:12 | 3       | 2,147,483,637 |     test |      nik |    3 |   [641586] . delete from table1 ;  641594 | {641586,641449} | waiting | Lock:relation      | 00:00:53 | 00:00:53 | 2       | 2,147,483,637 |     test |      nik |    2 |   [641594] .. alter table table1 add column data jsonb;  641588 | {641594}        | waiting | Lock:relation      | 00:00:49 | 00:00:49 |         | 2,147,483,637 |     test |      nik |    0 |   [641588] ... select * from table1 where id = 1;  641590 | {641594}        | waiting | Lock:relation      | 00:00:45 | 00:00:45 |         | 2,147,483,637 |     test |      nik |    0 |   [641590] ... select * from table1;  641667 | {}              | idletx  | Client:ClientRead  |          | 00:00:39 | 1       |               |     test |      nik |    1 |   [641667] drop table table2;  641669 | {641667}        | waiting | Lock:relation      | 00:00:23 | 00:00:23 |         | 2,147,483,637 |     test |      nik |    0 |   [641669] . select * from table2;(7 rows)

Update 2022-05-15: Many thanks to Aleksey Lesovsky for reviewing this article, script fixes, and corrections.

Share this blog post:

Nikolay Samokhvalov
Nikolay Samokhvalov

CEO & Founder of Postgres.ai

Working on tools to balance Dev with Ops in DevOps

Database Lab
Database Lab by Postgres.ai

An open-source experimentation platform for PostgreSQL databases. Instantly create full-size clones of your production database and use them to test your database migrations, optimize SQL, or deploy full-size staging apps.