Three cases against IF NOT EXISTS / IF EXISTS in Postgres DDL

November 3, 2021 · 8 min read

Nikolay Samokhvalov

EXISTS OR NOT EXISTS (meme)

What is this about?

Many DDL statements in PostgreSQL support modifiers IF EXISTS / IF NOT EXISTS. For example:

test=# create table if not exists mytable();
CREATE TABLE
test=# drop table if exists mytable;
DROP TABLE

I recommend using IF EXISTS / IF NOT EXISTS in DDL only when necessary. Here are three examples that demonstrate how the overuse of these words may lead to negative consequences.

Example 1: CREATE TABLE IF NOT EXISTS

Let's assume we have the following to be deployed using some DB schema versioning tool (such as Flyway, Sqitch, or something embedded to ORM/framework like Ruby on Rails Active Record migrations):

create table if not exists mytable(
  id int8 primary key,
  title text not null,
  created_at timestamptz not null default now()
);

The problem here is that strictly speaking, we cannot guarantee the result – there might be an existing table with the same name and a different structure. It might be created by a previous version of the same DB schema change attempt deployed by the same tool or something that was executed by another tool or manually:

test=# -- something from the past
test=# create table if not exists mytable(
  id int8 primary key
);
CREATE TABLE
test=#
test=# -- our attempt to create a table
test=# create table if not exists mytable(
  id int8 primary key,
  title text not null,
  created_at timestamptz not null default now()
);
NOTICE:  relation "mytable" already exists, skipping
CREATE TABLE
test=#
test=# -- what do we have?
test=# \d mytable
              Table "public.mytable"
 Column |  Type  | Collation | Nullable | Default
--------+--------+-----------+----------+---------
 id     | bigint |           | not null |
Indexes:
    "mytable_pkey" PRIMARY KEY, btree (id)

Why do engineers add IF NOT EXISTS in such cases? Because they are uncertain if their change was already deployed to lower environments: dev, QA, staging, and so on. IF NOT EXISTS suppresses errors and allows to "deploy" the change many times. However, this approach just hides the imperfection of the management of those environments – instead of improving the workflow, we now have increased risks of having various schema change anomalies and even production incidents. Further, we will discuss two more cases similar in nature.

DBLab Engine 4.0

Instant database branching with O(1) economics.

Example 2: DROP TABLE IF EXISTS

Most of the modern schema version control tools allow having both "do" and "undo" steps (they can be called somehow else – e.g., "deploy" and "revert"). There are opinions that the "undo" steps are useless – they may seem more or less valuable depending on circumstances, deployment workflow, and DevOps culture. However, if they are used, in some cases, we can find the following:

drop table if exists mytable;

What's bad with this approach? It can hide logical issues in the schema change scripts. For instance, let's assume we develop some software to be deployed in thousands of places, and our schema change scripts – both "do" and "undo" parts of them – should be reliably executed everywhere. We rely on the "undo" steps for the cases when our clients need to revert the upgrade; it may happen because of various issues not necessarily related to the database.

We have well-developed CI/CD pipelines testing our changes in advanced form to ensure that logic is correct and detect anomalies earlier – they always get tested using the chain "do -> undo -> do" in CI/CD pipelines.

Now let's consider this:

-- "do"
create table table2(
  id int8 primary key
);

-- "undo"
drop table if exists table2;
drop table if exists mytable; -- someone left it here just by mistake

What's wrong here? The CI/CD pipelines will not see any problems – the chain "do -> undo -> do" will work pretty well. The attempt to drop mytable will not fail; we'll just get a notice:

test=# drop table if exists mytable;
NOTICE:  table "mytable" does not exist, skipping
DROP TABLE

But what if mytable was created by another DB schema migration long ago? If the "undo" step needs to be executed somewhere, we are going to drop that table. This is not what we expected! For such time of mistakes, we do need to have an error in CI tests - bug IF EXISTS is going to "mask" the problem. As a result, automated testing is not going to catch the problem, and this wrong change has risks to be released.

Example 3: CREATE INDEX CONCURRENTLY IF NOT EXISTS

For large tables under load, it is recommended to use CREATE INDEX CONCURRENTLY – it is going to work longer that CREATE INEDEX but it won't cause downtime.

It is not uncommon to see how DBAs try various index ideas right on the production database, trying to find a band-aid for some suboptimal query, and then, once found, they suggest developers include the index to migrations – just to "register" it. In such cases, it is tempting to use CREATE INDEX CONCURRENTLY IF NOT EXISTS.

However, the thing is that CREATE INDEX CONCURRENTLY is not atomic: if an attempt to deploy it fails, then the index is going to remain to be defined – in the INVALID state. Here is a simple example:

test=# create table mytable(
  id int8 PRIMARY KEY,
  title text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE

test=# insert into mytable(title)
  select random()::text
  from generate_series(1, 10000000) i;
INSERT 0 10000000

test=# set statement_timeout to '1ms';
SET
test=# create index concurrently if not exists mytable_title_idx
  on mytable
  using btree (title);
ERROR:  canceling statement due to statement timeout
test=# reset statement_timeout;
RESET

test=# \d mytable
                         Table "public.mytable"
   Column   |           Type           | Collation | Nullable | Default
------------+--------------------------+-----------+----------+---------
 id         | bigint                   |           | not null |
 title      | text                     |           | not null |
 created_at | timestamp with time zone |           | not null | now()
Indexes:
    "mytable_pkey" PRIMARY KEY, btree (id)
    "mytable_title_idx" btree (title) INVALID

In this case, another attempt to deploy CREATE INDEX CONCURRENTLY IF NOT EXISTS is going to do nothing and report "success":

test=# create index concurrently if not exists mytable_title_idx
  on mytable
  using btree (title);
NOTICE:  relation "mytable_title_idx" already exists, skipping
CREATE INDEX

Interestingly, in this case – actually, we talk about the creation of any index on a large table under significant load in general – the IF EXISTS can be helpful. We can use it in the "do" step to support cleanup – DROP INDEX CONCURRENTLY IF EXISTS, leaving CREATE INDEX CONCURRENTLY "clean":

test=# drop index concurrently if exists mytable_title_idx;
DROP INDEX
test=#
test=# create index concurrently if not exists mytable_title_idx
  on mytable
  using btree (title);
CREATE INDEX

This may be enough – only if we forget about DBA's actions that left behind a VALID index! (Of course, it is not a good idea to test index ideas on production - for that, thin clones provided by Database Lan Engine are better in terms of risks for production and freedom for testing.)

How to drop the index but only if it's marked as INVALID? In this case, we could implement some logic to run DROP INDEX CONCURRENTLY only if pg_index.indisvalid for the index is false:

test=# select indisvalid
from pg_index
where indexrelid = 'mytable_title_idx'::regclass::oid;
 indisvalid
------------
 f
(1 row)

Unfortunately, this logic cannot be scripted in the form of anonymous DO block of PL/pgSQL code because DROP INDEX CONCURRENTLY cannot be executed inside explicitly defined transactions, functions, or anonymous DO blocks – so the logic needs to be coded externally. Running DROP INDEX in such a block would be possible, but this is not suitable for zero-downtime changes.

When to use IF NOT EXISTS / IF EXISTS

If we aim to have predictable results, it is better to avoid IF NOT EXISTS / IF EXISTS in DB schema migrations. They need to be very rare – one of the examples was demonstrated above (DROP INDEX CONCURRENTLY IF EXISTS).

In some other cases, such as scripts used for benchmarks, they can be really useful. For example, we can write bootstrap scripts that create DB schema used for benchmarking in an idempotent manner and use it in multiple benchmark iterations – so it does not matter if it is the first iteration of the 100th, the result will always be predictable:

drop table if exists test_table;
create table test_table(
    ...
);

Or even dropping the whole "benchmarking" schema before re-creating it:

drop schema if exists benchmark cascade;
create schema ...
create ...

In other cases, especially dealing with production, multiple systems, multiple people – it is better to reduce the use of IF NOT EXISTS / IF EXISTS to a minimum.

And what do you think? Find me on Twitter (@samokhvalov) and let me know!

Nikolay Samokhvalov

CEO & Founder of PostgresAI

Working on tools to balance Dev with Ops in DevOps

DBLab Engine 4.0

Instant database branching with O(1) economics.

What is this about?​

Example 1: CREATE TABLE IF NOT EXISTS​

DBLab Engine 4.0

Example 2: DROP TABLE IF EXISTS​

Example 3: CREATE INDEX CONCURRENTLY IF NOT EXISTS​

When to use IF NOT EXISTS / IF EXISTS​

Nikolay Samokhvalov

DBLab Engine 4.0

What is this about?

Example 1: CREATE TABLE IF NOT EXISTS

Example 2: DROP TABLE IF EXISTS

Example 3: CREATE INDEX CONCURRENTLY IF NOT EXISTS

When to use IF NOT EXISTS / IF EXISTS