8 posts tagged with "postgres.ai"


Database Lab Engine 2.0


Database Lab Engine 2.0 for PostgreSQL released#

The Postgres.ai team is proud to announce version 2.0 of Database Lab Engine (DLE) for PostgreSQL, a modern database tool for building powerful development and testing environments based on thin cloning. Using the Database Lab API or CLI (or the GUI, if you are using Database Lab SaaS), on a single machine with, say, a 1 TiB disk, you can easily create and destroy dozens of database copies, each 1 TiB in size. All these copies are independently modifiable and are created and destroyed in just a few seconds. This can become a game-changer in your development and testing workflow, improving time-to-market and reducing the costs of your non-production infrastructure.

This release continues our strategy of automating all routine tasks, such as initialization of the PostgreSQL data directory, data transformation, and snapshot management. In DLE 2.0, all these tasks can be flexibly configured in a single configuration file. As a result, building development and testing environments for projects with many databases (such as those that have adopted a microservice architecture) becomes much easier.

Previous versions of Database Lab introduced the core technology: thin clone provisioning based on either ZFS (the default) or LVM. It was already possible to provision full-size, multi-terabyte database clones in just a few seconds and use them for a broad spectrum of tasks, such as verification of database schema changes, SQL query analysis, or general application testing.

Version 2.0 speeds up and simplifies the initialization of DLE itself. Instead of using custom scripts for initial and continuous data retrieval, you can now configure everything declaratively to get the data and be up and running (see the sketch after the list below).

Updates in DLE 2.0#

  • Automated data retrieval: specify the source, the method of initializing the data directory, and how it is to be updated
  • Both physical (pg_basebackup, WAL-G, and more) and logical (dump/restore, Amazon RDS, Heroku Postgres, and more) methods are supported (see the guide Database Lab Engine data sources)
  • Any managed cloud PostgreSQL offering is now supported, with additional features for Amazon RDS (see the DLE tutorial for Amazon RDS and the guide Data source: AWS RDS)
  • For a continuously updated, physically initialized data directory (which effectively makes your DLE a specialized replica), snapshot management is fully automated: snapshots are created and destroyed based on the schedule defined in the configuration file (see the reference Job physicalSnapshot)
  • Basic data transformation and masking are supported: specify any custom script to be applied each time a new snapshot is prepared (the preprocessingScript option in both the logicalSnapshot and physicalSnapshot jobs; see the Configuration reference)
  • The license has changed to AGPLv3
  • The documentation has been significantly extended: 3 tutorials, 26 user guides, 6 references, and counting: http://postgres.ai/docs
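
To give a feel for the declarative approach, here is a minimal, abridged sketch of the retrieval part of the configuration file for a logical (dump/restore) workflow. The option names below are approximations based on the descriptions above, not a verbatim copy of the Configuration reference, so consult the docs for the exact format:

retrieval:
  jobs:
    - logicalDump        # dump the source database (e.g., a managed cloud Postgres)
    - logicalRestore     # restore the dump into the DLE data directory
    - logicalSnapshot    # prepare the snapshot that clones will be based on
  spec:
    logicalDump:
      options:
        source:
          type: remote                     # assumed value; RDS has dedicated options
          connection:
            host: source-db.example.com    # hypothetical source instance
            dbname: postgres
            username: dle_retrieval
    logicalSnapshot:
      options:
        # optional transformation/masking hook applied before each snapshot
        preprocessingScript: "/path/to/mask_data.sh"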

What's next#

Check out:

Please send us any feedback you have; it is hard to overestimate how valuable it is for such a young project:

Database Lab Engine allows cloning PostgreSQL databases of any size in just a few seconds. This can save a lot of money on development and testing infrastructure and, at the same time, drastically improve development quality and time-to-market. Database Lab Engine is open-source software distributed under the OSI-approved AGPLv3 license.

Database Lab Engine is equipped with an API and CLI. Additionally, we at Postgres.ai continue developing the Enterprise version, which offers a GUI, authentication flexibility, user management for the Database Lab Engine API and CLI, and more. The Enterprise version is in "private beta" mode; we encourage you to sign up and request a demo.


Database Lab Engine 2.0 beta: one config to rule them all; support for Amazon RDS#

This summer, we were super busy achieving two goals that define version 2.0 of Database Lab Engine:

  1. Make all the things in Database Lab configurable in a unified manner (a single configuration file): first of all, data initialization and snapshot management.
  2. Support both physical and logical types of initialization. In particular, allow working with an RDS database as a source.

Both targets turned out to be quite challenging, but it is finally done, and now we are happy to see that all the pieces of Database Lab Engine run in containers, the whole workflow is described in a single YAML configuration file, and, last but not least, it works with RDS Postgres databases. Yay!

Check out Database Lab Engine release notes, Tutorial for RDS users, and Database Lab Engine configuration reference.

As usual, please send us any feedback you have; it is hard to overestimate how valuable it is for such a young project:

Database Lab Engine is open-source software distributed under the OSI-approved AGPLv3 license. Database Lab Engine allows you to clone PostgreSQL databases of any size in just a few seconds. This can save you a lot of money on development and testing infrastructure and, at the same time, drastically improve development quality and time-to-market.

The open-source Database Lab Engine is equipped with a convenient API and CLI. Additionally, we continue developing the Enterprise version, which offers a GUI, authentication flexibility, user management for the Database Lab Engine API and CLI, and more. The Enterprise version is in "private beta" mode; we encourage you to sign up and request a demo.

Joe 0.7.0 released! New in this release: Web UI, Channel Mapping, and new commands

In addition to Slack integration, Joe Bot can now be integrated with the Postgres.ai Platform, providing a convenient Web UI for all developers who want to troubleshoot and optimize SQL efficiently. The secure and performant Web UI works in any modern browser (even on mobile!) and brings more flexibility, 1:1 communication, and visualization options.

What's new in version 0.7.0#

  • [EE] Support for Web UI integration with the Postgres.ai Platform (see our updated Joe Bot Tutorial to integrate)
  • Extendable communication types: implement support for your favorite messenger
  • Channel Mapping: plug in as many databases as you want in one Database Lab instance
  • [EE] Support for multiple Database Lab instances in parallel
  • New commands to monitor current activity and terminate long-running queries
  • Flexible bot configuration: various convenient options are available in one place
  • Permalinks: when integrated with the Postgres.ai Platform, Joe's responses contain links to a detailed analysis of SQL execution plans, with three visualization options (FlameGraphs, PEV2 by Dalibo, and the good old explain.depesz.com, all embedded into the Platform)

The full list of changes can be found in the Changelog. We can't wait for you to try it!

Web UI communication type#

Originally, only the Slack version of Joe Bot was publicly available. Today, we are excited to announce that there are now two available types of communication with Joe: Slack and the Web UI on the Postgres.ai Platform.

The good news is that you can use both of them in parallel.

Thanks to a recent refactoring of the Joe codebase, and the fact that this codebase is open source, you can develop and add support for any messenger. Feel free to open issues to discuss the implementation and merge requests to include the code in the main Joe Bot repository. See also: the communication channels issues and discussions in our Community Slack.

Check the Platform Overview to discover all the advantages of using the Web UI on the Postgres.ai Platform.

Postgres.ai Console

The Joe Bot Tutorial has been adjusted and now explains setting up both the Slack and Web UI versions: https://postgres.ai/docs/tutorials/joe-setup.

🚀 Note that currently the Postgres.ai Platform is working in "Closed Beta" mode. During this period, we activate accounts on Postgres.ai only after a 30-minute video call with a demo session and screen sharing. Feel free to join https://postgres.ai using your Google/GitLab/GitHub/LinkedIn account, but allow some time for us to process your registration and reach out to organize a demo session.

Channel Mapping#

Often an infrastructure is not limited to a single database. In addition, we may want to use different kinds of communication types: some people are comfortable with Slack, whereas others prefer the Web UI.

Does this mean that we have to run multiple Joe instances? Starting with version 0.7.0, the answer is no.

Thanks to Channel Mapping, you can easily use multiple databases and communication types. Moreover, in the Enterprise Edition you can configure multiple Database Lab instances.

Check all the configuration options in the docs to see how channel mapping can be implemented; a rough sketch follows.
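
As an illustration of the idea (the key names here are simplified assumptions, not the exact schema; see the Joe configuration reference for the real format), a single Joe instance can map channels of different communication types to different databases:

channelMapping:
  communicationTypes:
    slack:
      - project: my-team
        channels:
          - channelID: C0123456789        # Slack channel
            dblabServer: dle-main         # alias of a Database Lab instance
            dblabParams:
              dbname: app_db
    webui:
      - project: my-team
        channels:
          - channelID: app-db-webui       # Web UI channel on the Platform
            dblabServer: dle-main
            dblabParams:
              dbname: analytics_db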

New commands: activity and terminate#

Imagine we have a PostgreSQL database; \d+ shows the following:

List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------------+-------+----------+---------+-------------
public | pgbench_accounts | table | postgres | 171 GB |
public | pgbench_branches | table | postgres | 624 kB |
public | pgbench_history | table | postgres | 512 kB |
public | pgbench_tellers | table | postgres | 6616 kB |
(4 rows)

We run a query and realize that it will take a long time:

explain select from pgbench_accounts where bid = 100;
Plan without execution:
Gather (cost=1000.00..29605361.74 rows=118321 width=0)
Workers Planned: 2
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..29592529.64 rows=49300 width=0)
Filter: (bid = 100)
JIT:
Functions: 3
Options: Inlining true, Optimization true, Expressions true, Deforming true

What can we do if we don't want to waste our time? Version 0.7.0 adds new commands to control running queries.

The activity command shows sessions currently running in Postgres in the following states: active, idle in transaction, disabled. We can easily discover the current clone activity using this command:

Activity response:
PID | QUERY | STATE | BACKEND TYPE | WAIT EVENT | WAIT EVENT TYPE | QUERY DURATION | STATE CHANGED AGO
------+--------------------------------+--------+-----------------+--------------+-----------------+-----------------+--------------------
20 | EXPLAIN (ANALYZE, COSTS, | active | client backend | DataFileRead | IO | 00:10:06.738739 | 00:10:06.738783
| VERBOSE, BUFFERS, FORMAT JSON) | | | | | |
| select from pgbench_accounts | | | | | |
| where bid = 100... | | | | | |
29 | EXPLAIN (ANALYZE, COSTS, | active | parallel worker | DataFileRead | IO | 00:10:06.738798 | 00:10:06.698513
| VERBOSE, BUFFERS, FORMAT JSON) | | | | | |
| select from pgbench_accounts | | | | | |
| where bid = 100... | | | | | |
28 | EXPLAIN (ANALYZE, COSTS, | active | parallel worker | DataFileRead | IO | 00:10:06.738807 | 00:10:06.705874
| VERBOSE, BUFFERS, FORMAT JSON) | | | | | |
| select from pgbench_accounts | | | | | |
| where bid = 100... | | |

It shows that we have a long-running query. So, in case we don't want to wait for it to finish, the new terminate command will help us. The command terminates the Postgres backend with the specified PID. As we can see above, the client backend process has PID 20, so let's run terminate 20:

Terminate response:
PG TERMINATE BACKEND
------------------------
true

Success. Now repeat the activity command:

Activity response:
No results.

We can also see that the previously running query has been stopped:

explain select from pgbench_accounts where bid = 100;
ERROR: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)

That's a very powerful tool!
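
For reference, these commands build on standard Postgres facilities, so roughly the same effect can be achieved directly on a clone with plain SQL (PID 20 below is the one reported by activity):

-- roughly what "activity" reports, based on pg_stat_activity
SELECT pid, state, backend_type, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state IN ('active', 'idle in transaction', 'disabled');

-- roughly what "terminate 20" does
SELECT pg_terminate_backend(20);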

See the full list of Joe's commands in the docs.

Links:#

Joe 0.6.0 supports hypothetical indexes

Joe's new hypo command further boosts the development process#

Building indexes on large tables may take a long time. The new release of Joe bot includes the ability to get a sneak peek at the SQL query plan, using hypothetical indexes, before proceeding to actually build large indexes.

A hypothetical index is an index that doesn't exist on disk, so it costs no I/O, CPU, or other resources to create. This means that such indexes are created almost instantly.

With the brand-new hypo command, you can create hypothetical indexes with Joe and check that PostgreSQL would use them. Once that's done, you can use exec to build the actual indexes (in some cases, you'll need to wait several hours for this) and see the actual plan in action.

Note that since the command works on top of the HypoPG extension, your Database Lab setup has to use a Docker image for Postgres that contains HypoPG, because this extension is not part of the core PostgreSQL distribution. For convenience, we have prepared images with HypoPG (and some other extensions) included for Postgres versions 9.6, 10, 11, and 12. Of course, you can always use your own custom image.
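
For context, here is roughly what HypoPG provides when used directly via SQL; Joe's hypo subcommands are a convenience layer on top of it (a sketch, not Joe's exact implementation):

-- requires a Postgres image with HypoPG installed
CREATE EXTENSION IF NOT EXISTS hypopg;

-- create a hypothetical index: metadata only, nothing is written to disk
SELECT * FROM hypopg_create_index('CREATE INDEX ON pgbench_accounts (bid)');

-- plain EXPLAIN (without ANALYZE) takes hypothetical indexes into account
EXPLAIN SELECT * FROM pgbench_accounts WHERE bid = 1;

-- list and drop hypothetical indexes
SELECT * FROM hypopg_list_indexes;
SELECT hypopg_reset();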

To make it possible to see a plan without actual execution, we have added one more new command: plan. It is aware of hypothetical indexes, so if one is detected in the plan, it presents two versions of the plan: with and without HypoPG involved.

What's new in version 0.6.0#

Version 0.6.0 adds new commands to work with hypothetical indexes and to get a query plan without execution, major improvements in message processing, and more. The full list of changes can be found in the Changelog. Stay tuned!

Demo#

Joe demo

First, we need a running Database Lab instance that uses a Docker image with the HypoPG extension. Choose a custom Docker image in the Database Lab Engine configuration by specifying dockerImage in the config.yml of your Database Lab instance:

...
dockerImage: "postgresai/extended-postgres:12"
...

Let's see how to use hypothetical indexes with Joe. Generate a synthetic database using the standard PostgreSQL tool pgbench:

$ pgbench -i -s 10000 test

Check the size of the tables with \d+:

List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------------+-------+----------+---------+-------------
public | pgbench_accounts | table | postgres | 171 GB |
public | pgbench_branches | table | postgres | 520 kB |
public | pgbench_history | table | postgres | 0 bytes |
public | pgbench_tellers | table | postgres | 5960 kB |
(4 rows)

Then, get a query plan that should benefit from an index that doesn't exist yet:

explain select * from pgbench_accounts where bid = 1;

The result is:

Plan with execution:
Gather (cost=1000.00..29605106.00 rows=118320 width=97) (actual time=770.623..3673842.642 rows=100000 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=64 read=22457314
-> Parallel Seq Scan on public.pgbench_accounts (cost=0.00..29592274.00 rows=49300 width=97) (actual time=748.869..3673654.643 rows=33333 loops=3)
Filter: (pgbench_accounts.b
Recommendations:
SeqScan is used – Consider adding an index Show details
Query processes too much data to return a relatively small number of rows. – Reduce data cardinality as early as possible during the execution, using one or several of the following techniques: new indexes, partitioning, query rewriting, denormalization. See the visualization of the plan to understand which plan nodes are the main bottlenecks. Show details
Add LIMIT – The number of rows in the result set is too big. Limit number of rows. Show details
Summary:
Time: 61.231 min
- planning: 0.079 ms
- execution: 61.231 min
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 64 (~512.00 KiB) from the buffer pool
- reads: 22457314 (~171.30 GiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0

This query takes an enormously long time. The recommendations suggest adding an index. Before building a real index, let's verify our index idea by instantly creating the corresponding hypothetical index with hypo create index on pgbench_accounts (bid):

HypoPG response:
INDEXRELID | INDEXNAME
-------------+------------------------------------
24588 | <24588>btree_pgbench_accounts_bid

Check that the index has been created with hypo desc:

HypoPG response:
INDEXRELID | INDEXNAME | NSPNAME | RELNAME | AMNAME
-------------+-----------------------------------+---------+------------------+---------
24588 | <24588>btree_pgbench_accounts_bid | public | pgbench_accounts | btree

Get more details about the index, such as the estimated size and the index definition, with hypo desc 24588:

HypoPG response:
INDEXRELID | INDEXNAME | HYPOPG GET INDEXDEF | PG SIZE PRETTY
-------------+-----------------------------------+--------------------------------+-----------------
24588 | <24588>btree_pgbench_accounts_bid | CREATE INDEX ON | 1366 MB
| | public.pgbench_accounts USING |
| | btree (bid) |

Waiting seconds (or even minutes) for actual execution is annoying and not really useful when we are only checking hypothetical indexes, so let's save even more time by using the plan command: plan select * from pgbench_accounts where bid = 1;

Joe's response will be:

Plan (HypoPG involved 👻):
Index Scan using <24588>btree_pgbench_accounts_bid on pgbench_accounts (cost=0.08..5525.68 rows=118320 width=97)
Index Cond: (bid = 1)
Plan without HypoPG indexes:
Gather (cost=1000.00..29605106.00 rows=118320 width=97)
Workers Planned: 2
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..29592274.00 rows=49300 width=97)
Filter: (bid = 1)
JIT:
Functions: 2
Options: Inlining true, Optimization true, Expressions true, Deforming true

Perfect! The index works! This means we can drop the hypothetical index with hypo reset and create the real one with exec create index pgbench_accounts_bid on pgbench_accounts (bid):

exec create index pgbench_accounts_bid on pgbench_accounts (bid);
Session: joe-bps8quk2n8ejes08vnhg
The query has been executed. Duration: 126.975 min

Clearly, hypo and plan save developers a lot of time!

See the full list of Joe's commands in the docs: https://postgres.ai/docs/joe-bot/commands-reference.

Links:#

Database Lab Engine 0.3, supports both ZFS and LVM

Database Lab 0.3: users can choose which "thin clone manager" to use, ZFS or LVM#

Update: see the discussion on Hacker News!

Version 0.3 of Database Lab Engine (with a minor update to 0.3.1) adds support for LVM as an alternative to ZFS for thin cloning of large databases. This was one of the most requested features after the initial launch of the public Database Lab version a month ago.

Database Lab Engine is an open source technology that helps you clone non-production databases in seconds.

LVM can be chosen as the "thin-clone manager" instead of ZFS by those who do not want to use ZFS and prefer to stay on more popular file systems (ext4, xfs) in non-production environments. It is worth noting that ZFS remains the default and recommended option: the Postgres.ai team is very satisfied with the experience of using it to run Database Labs for multi-terabyte, heavily loaded databases.

Compared to ZFS, the LVM module has a certain restriction: it cannot keep multiple snapshots and let you choose a snapshot when requesting a new clone. With LVM, new clones are always based on the latest state of the database.

We invite everyone to test both modules, join our Community Slack, and provide your feedback! See the links below.

Links:

Joe bot, an SQL query optimization assistant, updated to version 0.5.0

Meet Joe#

Update: this post reached the HN top, see the discussion of Joe bot at Hacker News!

Joe is a Postgres query optimization assistant. Joe helps boost the development process by:

  • eliminating the annoying waiting time needed to provision copies of large databases for development and testing purposes,
  • helping engineers understand the details of SQL query performance.

Joe works on top of Database Lab. Every time an engineer starts communicating with Joe, a new full-size copy of the database is provisioned.

This process is fully automated and takes only a few seconds, even for multi-terabyte databases. Such database copies are called "thin clones" because multiple clones share the same data blocks, so provisioning is super fast, and disk space consumption is very low. The clones are fully independent, so developers can modify databases. Finally, SQL execution plans are identical to production, which makes it possible to troubleshoot and optimize queries reliably without involving production databases.

Currently, Joe is provided only in the form of a Slack chatbot. Slack was chosen to improve the level of collaboration between developers and DBAs. Alternative communication channels (including the beloved psql) are planned for future versions.

You can find more about Joe's features in "What Is Joe Bot?".

Demo#

Joe demo

If you have the Slack app installed, try the live demonstration:

  • Add the Database Lab community to your Slack: https://database-lab-team-slack-invite.herokuapp.com/.
  • Go to the #joe-bot-demo public channel.
  • Start with the help command, explain any queries, apply any changes to the database schema using exec, and quickly roll back changes with reset.

What's new in version 0.5.0#

Version 0.5.0 adds support for Slack API signing secrets, automated notifications when long-running jobs are done, various improvements in EXPLAIN plan processing, and more. The full list of changes can be found in the Changelog.

Links:#

Database Lab Engine 0.2

Database Lab Engine updated to 0.2: everything in containers, better API and CLI#

Update: see the discussion on Hacker News!

We have released version 0.2 of Database Lab Engine, an open source technology that helps you clone non-production databases in seconds.

Now all of its components run in containers, so installation and use are much easier. Additionally, various improvements were made, including in the API and client CLI.

Work on documentation continues: we have reworked the Tutorial and added new texts, one of which is What is Database Lab. This picture should help compare Database Lab to traditional methods of development and testing involving large databases:

Comparison Matrix

Links:

The first public release of Database Lab Engine

The Postgres.ai team is proud to announce the very first public release of Database Lab Engine#

Update: see the discussion on Hacker News!

Database Lab Engine helps you build non-production environments for projects that use multi-terabyte Postgres databases. The Postgres data directory, initially obtained using standard "thick" copying (such as pg_basebackup, restoration from an archive, or dump/restore), then gets cloned on request. Such cloning takes just a couple of seconds. Developers, DBAs, and QA engineers can quickly get fully independent copies, perform testing, and verify ideas, obtaining reliable (close-to-production) results. As a result, development speed and quality increase significantly.

Database Lab Engine is open source, you can find the code, ongoing work, and the Issue tracker here: https://gitlab.com/postgres-ai/database-lab.

Here is a list of some tasks that Database Lab Engine can help solve:

  1. Troubleshoot an SQL query (run EXPLAIN, EXPLAIN (BUFFERS, ANALYZE)): with query planner settings matching production, one can check any query, including UPDATE, DELETE, INSERT, and TRUNCATE, without putting the production master at any risk. See also: Joe bot.
  2. Verify an index idea: it is easy to create an index and check if it helps optimize your queries.
  3. Check database migrations (DB schema changes) or massive data modifications and highlight potentially dangerous steps, to avoid performance degradation and downtime on production.

A single Database Lab instance can provide multiple thin Postgres clones (full-size and fully independent) simultaneously. This is possible thanks to copy-on-write (CoW) technology. The only option supported in version 0.1 is ZFS; however, there are plans to support other technologies in the future.

Database Lab can be installed either on a physical machine or in a VM. Both on-premises and cloud setups are possible. Users communicate with Database Lab using either the REST API or the client CLI (an illustrative CLI sketch follows the list of limitations below). The first version of Database Lab has certain limitations:

  • it works on Ubuntu 18.04 only,
  • only Postgres versions 9.6, 10, 11, and 12 are supported,
  • in addition to ZFS, the installation of Postgres and Golang is required (we plan to get rid of this requirement in version 0.2 by fully switching to containers).
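
As an illustration of the CLI workflow (the command and flag names below follow the general shape of the dblab client in later versions and are shown as an assumption; consult the documentation of your version for the exact syntax):

# point the client CLI at a running Database Lab instance (illustrative flags)
dblab init --url http://localhost:2345 --token secret_token --environment-id demo

# request a thin clone and list existing clones
dblab clone create --username dblab_user --password secret_password
dblab clone list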

Links: