Joe Bot demo and example

Demonstration

Demo

If you have Slack app installed, try the live demonstration:

  1. Add the Database Lab community to your Slack: https://database-lab-team-slack-invite.herokuapp.com/.
  2. Go to the #joe-bot-demo public channel.
  3. Start with the help command.
  4. Explore the schema with \d, analyze queries using explain.
  5. Try dropping a column or a whole table using exec.
  6. Ask your colleague to try talking to Joe in parallel, to ensure that your sessions are independent (so your colleague doesn't observe your changes in her/his copy of the database).
  7. Perform reset to quickly revert your changes in your own session.

The live demonstration is based on a 200 GiB database generated by pgbench.

An example

The goal

Suppose we need to optimize the following query:

select count(*)
from posts
where likes > 10 and created > '2019-10-01';

Use the explain command

Use case - Initial EXPLAIN

Read the summary

  • Execution Time: 2.5 minutes
  • Joe Bot provides recommendations:
    • Query processes too much data to return a relatively small number of rows
    • Specialized index needed
  • Shared buffers reads: 165789 (~1.30 GiB) from the OS file cache, including disk I/O

Go deeper: analyze the details provided in the execution plan

  • An Index Scan is used.

  • Rows Returned: 142.

  • Rows Removed by Filter: 170975.

    Aggregate (cost=276857.32..276857.33 rows=1 width=8) (actual time=148536.494..148536.495 rows=1 loops=1)
    Buffers: shared hit=4803 read=165789
    -> Index Scan using iiii on public.posts (cost=0.56..276856.00 rows=528 width=0) (actual time=25230.526..148536.091 rows=142 loops=1)
    Index Cond: (posts.likes > 10)
    Filter: (posts.created > '2019-10-01 00:00:00'::timestamp without time zone)
    Rows Removed by Filter: 170975
    Buffers: shared hit=4803 read=165789
  • The query condition partially does not match the index condition.

    Index Cond: (posts.likes > 10)
    Filter: (posts.created > '2019-10-01 00:00:00'::timestamp without time zone)
  • The conclustion: a specialized index is needed.

Optimize: build a new index

  • Using the exec command, we are going to create the following new index in the current session:

    create index improved_ix_posts on posts(likes, created desc);

    Use case - Create index

Check the results: explain the query again

  • Run the explain command once again:

    explain select count(*) from posts where likes > 10 and created > '2019-10-01'
  • The result:

    Time: 157.236 ms
    - planning: 1.185 ms
    - execution: 156.051 ms
    Shared buffers:
    - hits: 69 (~552.00 KiB) from the buffer pool
    - reads: 778 (~6.10 MiB) from the OS file cache, including disk I/O
    - dirtied: 0
    - writes: 0
    Aggregate (cost=4147.01..4147.02 rows=1 width=8) (actual time=155.973..155.974 rows=1 loops=1)
    Buffers: shared hit=69 read=778
    -> Index Only Scan using improved_ix_posts on public.posts (cost=0.56..4145.69 rows=528 width=0) (actual time=3.981..155.870 rows=142 loops=1)
    Index Cond: ((posts.likes > 10) AND (posts.created > '2019-10-01 00:00:00'::timestamp without time zone))
    Heap Fetches: 142
    Buffers: shared hit=69 read=778
  • Comparing to the previous analysis:

    • Index Only Scan instead of Index Scan.
    • Execution Time: ~ 150 ms1000x faster.
    • Shared buffers reads: 778 (~6.10 MiB) from the OS file cache, including disk I/O – 218x fewer data.