Skip to main content

Comprehensive Testing Covers Code AND Data

· 2 min read

Over its relatively short history, the discipline of Software Engineering has made rapid advances in the sophistication of its development processes and tools. In the past 15 years alone, the popularization of CI/CD tools has drastically improved software quality and reliability.

However, a large gap remains on the landscape of software tooling. For many engineers, it's a gap they are so accustomed to, they can no longer even see it.

The Elements of Application Behavior Code AND Data

A software application of even modest complexity is composed of two major elements: Code and Data. Application behavior is the result of the combination of these two elements.

Despite this, nearly ALL software testing focuses on code, not data. When an engineer deploys code, her visibility into application behavior looks more like this:

Testing Code Only

Why is this? Many reasons. Some are legitimate and some are worth re-examining:

  1. Code, not data, is the engineer's product
  2. Code can be tested in small, isolated, pieces
  3. Test Data is time-consuming and tedious to maintain
  4. There aren't many Data Testing tools

Taking a Fresh Look#

At Postgres.ai, we're taking a fresh look at the landscape of software testing. We're building next-generation tools to ensure engineers have a complete understanding of application behavior before their code hits production.

Here are just a few of the things we can imagine doing when application data can be easily tested:

  • Easily add database migration tests to your CI/CD pipeline
  • Add Data Validation tests pre and post migrations to ensure your expectations are met
  • Eaily add realistic load testing to your release pipeline
  • Automatically identify SQL query performance degradation as a part of your automated CI/CD pipeline

And these are just the start...

DLE 2.4: realistic DB testing in GitHub Actions; Terraform module

· 6 min read

DLE 2.4: DB Migration Checker and Terraform module

Database Lab Engine 2.4 is out#

The Database Lab Engine (DLE) is an open-source technology to enable thin cloning for PostgreSQL. Thin clones are exceptionally useful when you need to scale the development process. DLE can manage dozens of an independent clones of your database on a single machine, so each engineer or automation process works with their own database provisioned in seconds without extra costs.

DLE 2.4 brings two major capabilities to those who are interested in working with PostgreSQL thin clones:

Additionally, this release has a lot of improvements and fixes.

🚀 Add DB change testing to your CI/CD pipelines#

DB migrations – database schema and data changes, usually controlled by a special tool that tracks the changes in Git. There are many such tools: Flyway, Liquibase, and Active Record Migrations, to name a few. These tools are necessary for keeping DB schema changes sane, reliable, and predictable.

However, in most cases, testing of the changes in CI/CD is very weak because it is done using either an empty or some tiny, mocked database. As a result, with growing databases and workloads, deployments of DB changes fail more often. This problem may be so annoying for some people that they might even think about switching to some NoSQL, schemaless databases, to forget about such issues – but to meet, eventually, a bunch of others: data inconsistency or update difficulties caused by lack of normalization.

With DLE 2.4 and its DB Migration Checker component, it becomes easy to get realistic testing using thin clones of PostgreSQL databases of any size right in CI/CD pipelines. With this technology, you can drastically decrease the risk of deploying harmful DB schema changes and continue using PostgreSQL, with its excellent feature set and reliability, not compromising the development speed.

An example#

To have a basic demonstration of realistic testing of DB migrations in CI/CD pipelines, let's build an index on a table that has a significant number of rows. You can see the details of this testing in this GitHub PR: https://github.com/postgres-ai/green-zone/pull/4.

If we use CREATE INDEX to build an index on a table with data, forgetting to add CONCURRENTLY, this will block all queries to the table while CREATE INDEX is running. The larger our table is, the more noticeable the negative effect on our production workload will be. In the case of large tables, such mistakes cause partial downtime resulting in direct income and/or reputation losses, depending on the type of business.

To demonstrate how Database Lab Engine catches this problem during automated testing in CI/CD, I'm going to use a GitHub repository with some example DB migrations (managed by Sqitch) for our Demo database that contains random data. As you can see in commit 839be90, I commented out the word CONCURRENTLY. Once git push is done and our unique GitHub Action finished, we can see that our change was marked as failed by DLE's DB Migration Checker:


DB Migration Checker capturing dangerous CREATE INDEX (without CONCURRENTLY)

Let's open this job and see the details:


DB Migration Checker capturing dangerous CREATE INDEX (without CONCURRENTLY)

What happened here? Behind the schenes, a pre-installed DLE server (in AWS) quickly provisioned a thin clone of the Demo database. Next, the DB change was applied in this clone, and DB Migration Checker collected telemetry, and it becomes clear that such change is going to hold an AccessExclusiveLockё blocking other queries for a significant time (according to the settings, longer than for 10 seconds). Therefore, this change marked as failed in CI/CD. This is exactly what we need to be protected to avoid deploying such changes to production.

Of course, if we get the word CONCURRENTLY back (as I did in commit 6059bf4), we'll have our "green light":


DB Migration Checker capturing dangerous CREATE INDEX CONCURRENTLY

Key features of DLE's DB Migration Checker#

  • Automated: DB migration testing in CI/CD pipelines
  • Realistic: test results are realistic because real or close-to-real (the same size but no personal data) databases are used, thin-cloned in seconds, and destroyed after testing is done
  • Fast and inexpensive: a single machine with a single disk can operate dozens of independent thin clones
  • Well-tested DB changes to avoid deployment failures: DB Migration Checker automatically detects (and prevents!) long-lasting dangerous locks that could put your production systems down
  • Secure: DB Migration Checker runs all tests in a secure environment: data cannot be copied outside the secure container
  • Lots of helpful data points: Collect useful artifacts (such as pg_stat_*** system views) and use them to empower your DB changes review process

Currently supported tools and platforms#

Currently, full automation is supported for the DB migrations tracked in GitHub repositories using one of the following tools:

It is also supposed that the automated testing is done using GitHub Actions. However, the list of supported Git platforms, CI/CD tools, and DB migration version control systems is quite easy to extend – you can do it (please publish an MR if you do!) or open an issue to ask about it in the DLE & DB Migration Checker issue tracker.

🔷 Terraform module to deploy DLE and its components in AWS#

Terraform module for Database Lab helps you deploy the Database Lab Engine in clouds. You can find the code and detailed README here: https://gitlab.com/postgres-ai/database-lab-infrastructure.

Supported platforms and limitations of this Terraform module:

  • Your source PostgreSQL database can be located anywhere
  • DLE with its components will be deployed in AWS under your AWS account.
  • Currently, only the "logical" mode of data retrieval (dump/restore) is supported – the only available method for most so-called managed PostgreSQL cloud platforms such as RDS Postgres, RDS Aurora Postgres, Azure Postgres, Heroku. "Physical" mode is not yet supported.

Feedback and contributions are very welcome.

Useful links#

Request for feedback and contributions#

Feedback and contributions would be greatly appreciated:

Database Lab Engine 2.2 and Joe Bot 0.9

· 5 min read

DLE 2.2 and Joe 0.9

About Database Lab Engine#

The Database Lab Engine (DLE) is an open-source experimentation platform for PostgreSQL databases. The DLE instantly creates full-size thin clones of your production database which you can use to:

  1. Test database migrations
  2. Optimize SQL queries
  3. Deploy full-size staging applications

The Database Lab Engine can generate thin clones for any size database, eliminating the hours (or days!) required to create “thick” database copies using conventional methods. Thin clones are independent, fully writable, and will behave identically to production: they will have the same data and will generate the same query plans.

Learn more about the Database Lab Engine and sign up for an account at https://postgres.ai/.

Database Lab Engine 2.2.0#

Database Lab Engine (DLE) 2.2.0 further improves support for both types of PostgreSQL data directory initialization and synchronization: “physical” and “logical”. Particularly, for the “logical” type (which is useful for managed cloud PostgreSQL such as Amazon RDS users), it is now possible to setup multiple disks or disk arrays and automate data retrieval on a schedule. This gracefully cleans up the oldest versions of data, without downtime or interruptions in the lifecycle of clones.

Other improvements include:

  • Auto completion for the client CLI (“dblab”)
  • Clone container configuration — Docker parameters now can be defined in DLE config (such as --shm--size that is needed to avoid errors in newer versions of Postgres when parallel workers are used to process queries)
  • Allow requesting a clone with non-superuser access — This appears as a new option in the API and CLI called “restricted”

Database Lab Engine links:

Joe Bot 0.9.0 - A Virtual DBA for SQL Optimization#

“Joe Bot”, a virtual DBA for SQL optimization, is a revolutionary new way to troubleshoot and optimize PostgreSQL query performance. Instead of running EXPLAIN or EXPLAIN (ANALYZE, BUFFERS) directly in production, users send queries for troubleshooting to Joe Bot. Joe Bot uses the Database Lab Engine (DLE) to:

  • Generate a fresh thin clone
  • Execute the query on the clone
  • Return the resulting execution plan to the user

The returned plan is identical to production in terms of structure and data volumes – this is achieved thanks to two factors:

  • thin clones have the same data and statistics as production (at a specified point in time), and
  • the PostgreSQL planner configuration on clones matches the production configuration.

Joe Bot users not only get reliable and risk-free information on how a query will be executed on production but also they can easily apply any changes to their own thin clones and see how query behavior is affected. For example, it is possible to add a new index and see if it actually helps to speed up the query.

One key aspect of Joe Bot, is the fact that users do not see the data directly, they only work with metadata. Therefore, teams without access to production data can be granted permissions to use this tool [1]

The main change in Joe Bot 0.9.0 is improved security: in past versions, DB superuser was used. Now a non-superuser is used for all requests. This makes it impossible to use plpythonu, COPY TO PROGRAM, FDW, or dblink to perform a massive copy of data outside infrastructructure which is not well protected by a strict firewall. All users are strongly recommended to upgrade as soon as possible.

Another major new feature is the production duration estimator, currently in an “experimental” state. This feature is intended to help users understand how long a specific operation - for example, an index creation operation - will actually take on the production database, which is likely to have a different physical infrastructure (for example a different filesystem, more RAM, and/or more CPU cores) than the thin clone running on the DLE. Read more: “Query duration difference between Database Lab and production environments”.

SQL Optimization Chatbot “Joe Bot” links:


(1) Although only metadata is returned from Joe Bot, it is possible to probe data for specific values using EXPLAIN ANALYZE. Please consult security experts in your organization before providing Joe Bot to people without production-level access.


Both Joe Bot and Database Lab Engine are distributed based on OSI-approved license (AGPLv3).


Your feedback is highly appreciated!