Database Lab tutorial for any PostgreSQL database
tip
Working with Amazon RDS? See Database Lab tutorial for Amazon RDS.
Database Lab is used to boost software development and testing processes by enabling ultra-fast provisioning of databases of any size.
In this tutorial, we are going to create a "demo" PostgreSQL database and then set up a Database Lab Engine for it. The Database Lab Engine will be installed on an AWS EC2 instance (alternatively, it can be an instance on another cloud platform such as GCP, or a bare-metal machine) running Ubuntu 18.04 or 20.04, with an additional EBS volume attached to store the PostgreSQL data directory. Optionally, you'll be able to skip creating the "demo" database and use an existing database instead.
note
Database Lab Engine is hosted and developed on GitLab.com. Why? GitLab Inc. is our (Postgres.ai) long-term client and an early adopter (see GitLab Development Docs). GitLab has an open-source version. Last but not least: GitLab uses PostgreSQL.
However, not many open-source projects are hosted on GitLab.com nowadays, unfortunately. ⭐️ Please support the project by giving it a star on GitLab! The star button is on the main page of the Database Lab Engine repository, in the upper right corner.
Our steps:
- Prepare a virtual machine (we will use AWS EC2) with an additional disk to store data, install Docker to run containers, and ZFS to enable copy-on-write for thin cloning
- Configure and launch the Database Lab Engine
- Start using Database Lab API and client CLI to clone Postgres database in seconds
tip
Not using AWS? Not a problem: if you want to use another cloud provider (like Google Cloud), run Database Lab on VMware, or use bare metal, only the first step is going to be slightly different. In general, the overall procedure is pretty much the same.
info
LVM support as an alternative to ZFS first appeared in version 0.3.0 of Database Lab. Note that LVM, unlike ZFS, does not support multiple snapshots and related features.
Step 1. Prepare a machine with disk, Docker, and ZFS
Prepare a machine
Create an EC2 instance with Ubuntu 18.04 or 20.04, and add an EBS volume to store data. You can find detailed instructions on how to create an AWS EC2 instance here (if you want to use Google Cloud, see the GCP documentation).
(optional) Ports that need to be open in the Security Group being used
You will need to open the following ports (inbound rules in your Security Group):
- 22: to connect to the instance using SSH
- 2345: to work with the Database Lab Engine API (can be changed in the Database Lab Engine configuration file)
- 6000-6100: to connect to PostgreSQL clones (this is the default port range used in the Database Lab Engine configuration file, and can be changed if needed)
caution
For real-life use, it is not a good idea to open ports to the public. Instead, it is recommended to use VPN or SSH port forwarding to access both the Database Lab API and PostgreSQL clones, or to enforce encryption for all connections by using NGINX with SSL and configuring SSL in the PostgreSQL configuration.
Additionally, to be able to install software, allow access to external resources using HTTP/HTTPS (edit the outbound rules in your Security Group):
- 80: for HTTP
- 443: for HTTPS
Here is what the inbound and outbound rules in your Security Group may look like:
Install Docker
If needed, you can find the detailed installation guides for Docker here.
Install dependencies:
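The exact package list depends on your OS; here is a minimal sketch for Ubuntu, showing the packages commonly required before adding the Docker repository:

```bash
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release
```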
Install Docker:
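A sketch that follows Docker's official installation guide for Ubuntu; verify the commands against the guide linked above before running them:

```bash
# Add Docker's GPG key and apt repository, then install Docker Engine
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
```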
Set $DBLAB_DISK
Further on, we will need the environment variable $DBLAB_DISK. It must contain the device name that corresponds to the disk where all the Database Lab Engine data will be stored.
To understand what needs to be specified in $DBLAB_DISK in your case, check the output of lsblk:
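For example:

```bash
# List block devices to identify the disk that will store Database Lab data
sudo lsblk
```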
Some examples (a combined sketch is shown below):
- AWS local ephemeral NVMe disks; EBS volumes for instances built on the Nitro system
- AWS EBS volumes for older (pre-Nitro) EC2 instances
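The device names below are purely illustrative; use the name you saw in your own lsblk output:

```bash
# Nitro-based instances (local NVMe disks or EBS volumes): names look like /dev/nvme1n1
export DBLAB_DISK="/dev/nvme1n1"

# Older (pre-Nitro) instances with EBS volumes: names look like /dev/xvdb
export DBLAB_DISK="/dev/xvdb"
```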
Set up either ZFS or LVM to enable thin cloning
ZFS is the recommended way to enable thin cloning in Database Lab. LVM is also available, but it has certain limitations:
- much less flexible disk space consumption, with the risk that a clone gets destroyed during massive operations in it
- inability to work with multiple snapshots ("time travel"); cloning always happens based on the most recent version of the data
Install ZFS:
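On Ubuntu 18.04/20.04, ZFS can be installed from the standard repositories (the package name may differ on other distributions):

```bash
sudo apt-get install -y zfsutils-linux
```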
Create a new ZFS storage pool (make sure $DBLAB_DISK has the correct value, see the previous step!):
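A sketch of the pool creation. The pool name dblab_pool and the mountpoint /var/lib/dblab/dblab_pool are assumptions derived from the paths used later in this tutorial, and the tuning options are illustrative:

```bash
sudo zpool create -f \
  -O compression=on \
  -O atime=off \
  -O recordsize=8k \
  -O logbias=throughput \
  -m /var/lib/dblab/dblab_pool \
  dblab_pool \
  "${DBLAB_DISK}"
```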
Check the result using zfs list and lsblk:
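For example:

```bash
sudo zfs list
sudo lsblk
```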
Install LVM2:
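On Ubuntu:

```bash
sudo apt-get install -y lvm2
```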
Create an LVM volume (make sure $DBLAB_DISK has the correct value, see the previous step!):
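A sketch, assuming a volume group named dblab_vg and a logical volume named pool mounted at /var/lib/dblab/dblab_pool; these names are assumptions, so align them with your Database Lab configuration:

```bash
# Initialize the disk for LVM and create a volume group
sudo pvcreate "${DBLAB_DISK}"
sudo vgcreate dblab_vg "${DBLAB_DISK}"

# Create a logical volume, format it, and mount it
sudo lvcreate -l 90%FREE -n pool dblab_vg
sudo mkfs.ext4 /dev/dblab_vg/pool
sudo mkdir -p /var/lib/dblab/dblab_pool
sudo mount /dev/dblab_vg/pool /var/lib/dblab/dblab_pool
```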
info
The logical volume size needs to be defined at volume creation time. By default, we allocate 10% of the available space. If usage exceeds the allocated size, the volume will be destroyed, potentially leading to data loss. To prevent volumes from being destroyed, consider enabling the LVM auto-extend feature.
To enable the auto-extend feature, the following LVM configuration options need to be updated:
- snapshot_autoextend_threshold: auto-extend a "snapshot" volume when its usage exceeds the specified percentage
- snapshot_autoextend_percent: auto-extend a "snapshot" volume by the specified percentage of the available space once the usage exceeds the threshold
Update the LVM configuration (located in /etc/lvm/lvm.conf by default):
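A sketch of the relevant settings; the 70/20 values are illustrative and should be tuned for your environment:

```bash
# Edit /etc/lvm/lvm.conf and, inside the "activation" section, set:
#
#   snapshot_autoextend_threshold = 70
#   snapshot_autoextend_percent = 20
#
sudo nano /etc/lvm/lvm.conf
```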
Step 2. Configure and launch the Database Lab Engine
caution
To make your work with the Database Lab API secure, do not open the Database Lab API and Postgres clone ports to the public; instead, use VPN or SSH port forwarding. It is also a good idea to encrypt all the traffic: for Postgres clones, set up SSL in the configuration files; for the Database Lab API, install and configure NGINX with a self-signed SSL certificate. See How to Secure Database Lab Engine.
Prepare database data directory
Next, we need to get the data to the Database Lab Engine server. For our testing needs, we have 3 options:
- "Generated database": generate a synthetic database for testing purposes
- "Physical copy" (
pg_basebackup
): copy an existing database (perform "think cloning" once) using a "physical" method such aspg_basebackup
- "Logical copy" (dump/restore): copy an existing database using the "logical" method (dump/restore)
Option 1: Generated database
If you don't have an existing database for testing, let's generate a synthetic database in the data directory ("PGDATA") located at /var/lib/dblab/dblab_pool/data. A simple way of doing this is to use the standard PostgreSQL benchmarking tool, pgbench. With scale factor -s 100, the database size will be ~1.4 GiB; feel free to adjust the scale factor according to your needs.
To generate PGDATA with pgbench, we are going to temporarily run a regular Docker container with Postgres. We will use POSTGRES_HOST_AUTH_METHOD=trust to allow a connection without authentication (not suitable for real-life use).
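A sketch of such a temporary container. The container name and image tag are assumptions; pick the Postgres major version you plan to use with Database Lab. The official postgres image keeps PGDATA in /var/lib/postgresql/data, so mounting the host directory there places the data where we need it:

```bash
sudo docker run \
  --name dblab_pg_init \
  --detach \
  --env POSTGRES_HOST_AUTH_METHOD=trust \
  --volume /var/lib/dblab/dblab_pool/data:/var/lib/postgresql/data \
  postgres:13
```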
Create the test database:
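For example, using psql inside the temporary container (container name as assumed above):

```bash
sudo docker exec -it dblab_pg_init psql -U postgres -c 'create database test'
```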
Generate data in the test database using pgbench:
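For example (pgbench ships with the official postgres image):

```bash
sudo docker exec -it dblab_pg_init pgbench -U postgres -i -s 100 test
```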
The PostgreSQL data directory is ready. Now let's stop and remove the container:
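Assuming the container name used above:

```bash
sudo docker stop dblab_pg_init
sudo docker rm dblab_pg_init
```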
Now, we need to take care of the Database Lab Engine configuration. Copy the contents of the configuration example config.example.logical_generic.yml from the Database Lab repository to ~/.dblab/server.yml:
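One way to do it; the raw URL below assumes the example config lives in the configs/ directory of the repository's default branch, so verify the path and pin it to the Database Lab Engine version you are installing:

```bash
mkdir -p ~/.dblab
curl -fsSL \
  https://gitlab.com/postgres-ai/database-lab/-/raw/master/configs/config.example.logical_generic.yml \
  --output ~/.dblab/server.yml
```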
Open ~/.dblab/server.yml and edit the following options:
- Set a secure server:verificationToken; it will be used to authorize API requests to the Database Lab Engine
- Remove the logicalDump section completely
- Remove the logicalRestore section completely
- Leave logicalSnapshot as is
- If your Postgres major version is not 13 (the default), set the proper version in the Postgres Docker image tags:
  - provision:options:dockerImage
  - retrieval:spec:logicalRestore:options:dockerImage
  - retrieval:spec:logicalDump:options:dockerImage
Option 2: Physical copy (pg_basebackup)
If you want to try Database Lab for an existing database, you need to copy the data to the PostgreSQL data directory on the Database Lab server, to the directory /var/lib/dblab/dblab_pool/data. This step is called "thick cloning". It only needs to be completed once. There are several options to physically copy the data directory. Here we will use the standard PostgreSQL tool, pg_basebackup. However, we are not going to use it directly (although it is possible); instead, we will specify its options in the Database Lab Engine configuration file.
First, copy the contents of the configuration example config.example.physical_generic.yml from the Database Lab repository to ~/.dblab/server.yml:
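Same approach as before (the URL path is an assumption; verify it in the repository and pin the version you use):

```bash
mkdir -p ~/.dblab
curl -fsSL \
  https://gitlab.com/postgres-ai/database-lab/-/raw/master/configs/config.example.physical_generic.yml \
  --output ~/.dblab/server.yml
```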
Next, open ~/.dblab/server.yml and edit the following options:
- Set a secure server:verificationToken; it will be used to authorize API requests to the Database Lab Engine
- In retrieval:spec:physicalRestore:options:envs, specify how to reach the source Postgres database to run pg_basebackup: PGUSER, PGPASSWORD, PGHOST, and PGPORT
- If your Postgres major version is not 13 (the default), set the proper version in the Postgres Docker image tags:
  - provision:options:dockerImage
  - retrieval:spec:physicalRestore:options:dockerImage
  - retrieval:spec:physicalSnapshot:options:promotion:dockerImage
tip
Optionally, you might want to keep PGDATA continuously updated with changes from the source. The good news is that this is supported if you chose the "physical" method of initializing the data directory. To have PGDATA updated continuously, configure the retrieval:spec:physicalRestore:restore_command option, specifying the value normally used as restore_command on PostgreSQL replicas based on WAL shipping.
Option 3: Logical copy (dump/restore)
If you want to try Database Lab for an existing database, you need to copy the data to the PostgreSQL data directory on the Database Lab server, to the directory /var/lib/dblab/dblab_pool/data. This step is called "thick cloning". It only needs to be completed once.
Here we will configure Database Lab Engine to use a "logical" method of thick cloning: dump/restore.
First, copy the contents of the configuration example config.example.logical_generic.yml from the Database Lab repository to ~/.dblab/server.yml:
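As in the other options (the URL path is an assumption; verify it in the repository):

```bash
mkdir -p ~/.dblab
curl -fsSL \
  https://gitlab.com/postgres-ai/database-lab/-/raw/master/configs/config.example.logical_generic.yml \
  --output ~/.dblab/server.yml
```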
Now open ~/.dblab/server.yml and edit the following options:
- Set a secure server:verificationToken; it will be used to authorize API requests to the Database Lab Engine
- Set connection options in retrieval:spec:logicalDump:options:source:connection:
  - dbname: database name to connect to
  - host: database server host
  - port: database server port
  - username: database user name
  - password: database master password (can also be set as the PGPASSWORD environment variable and passed to the container using the --env option of docker run)
- If your Postgres major version is not 13 (the default), set the proper version in the Postgres Docker image tags:
  - provision:options:dockerImage
  - retrieval:spec:logicalRestore:options:dockerImage
  - retrieval:spec:logicalDump:options:dockerImage
Launch Database Lab server
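A sketch of launching the Engine as a Docker container. The image tag, container name, and the in-container config path are assumptions; check the Database Lab Engine documentation for the exact run command matching your version:

```bash
sudo docker run \
  --name dblab_server \
  --label dblab_control \
  --detach \
  --restart on-failure \
  --publish 2345:2345 \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  --volume /var/lib/dblab:/var/lib/dblab/:rshared \
  --volume ~/.dblab/server.yml:/home/dblab/configs/config.yml \
  postgresai/dblab-server:2.5
```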
How to check the Database Lab Engine logs
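Assuming the container name used above:

```bash
sudo docker logs dblab_server -f
```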
Need to start over? Here is how to clean up
If something went south and you need to make another attempt at the steps in this tutorial, use the following steps to clean up:
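A sketch, assuming the container and pool names used above; double-check before running, since zpool destroy irreversibly removes all data in the pool:

```bash
# Stop and remove the Database Lab Engine container
sudo docker rm -f dblab_server

# ZFS variant: destroy the storage pool
sudo zpool destroy dblab_pool

# Remove leftover data and configuration if you want a completely clean start
sudo rm -rf /var/lib/dblab/dblab_pool/data
rm -f ~/.dblab/server.yml
```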
Step 3. Start cloning!
CLI
Install Database Lab client CLI
The CLI can be used on any machine; you just need to be able to reach the Database Lab Engine API (port 2345 by default). In this tutorial, we will install and use the CLI locally on the EC2 instance.
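One common way is the install script published by Postgres.ai; the exact command is an assumption here, so check the client CLI reference for the current instructions:

```bash
curl -sSL dblab.sh | bash
dblab --version
```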
Initialize CLI configuration:
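A sketch; the environment ID is arbitrary, and the token must match server:verificationToken from your ~/.dblab/server.yml. The --insecure flag is used only because we skipped setting up TLS in this tutorial:

```bash
dblab init \
  --environment-id=tutorial \
  --url=http://localhost:2345 \
  --token=secret_token \
  --insecure
```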
Check the configuration by fetching the status of the instance:
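For example:

```bash
dblab instance status
```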
Create a clone
Create your first clone using the dblab clone create command (a sketch follows below). After a second or two, if everything is configured correctly, you will see that the clone is ready to be used.
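A hedged example; the username, password, and clone ID are illustrative values:

```bash
dblab clone create \
  --username dblab_user_1 \
  --password secret_password \
  --id my_first_clone
```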
Connect to a clone
Install psql:
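On Ubuntu, for example:

```bash
sudo apt-get install -y postgresql-client
```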
Now you can work with this clone using any PostgreSQL client, for example, psql. Use the connection info from the db section of the response of the dblab clone create command:
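A hypothetical example; substitute the host, port, username, and password from your own dblab clone create response:

```bash
PGPASSWORD=secret_password psql \
  --host localhost \
  --port 6000 \
  --username dblab_user_1 \
  --dbname test
```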
Check the available table:
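For example, using the same connection parameters non-interactively (pgbench_accounts is one of the tables generated earlier by pgbench):

```bash
PGPASSWORD=secret_password psql --host localhost --port 6000 \
  --username dblab_user_1 --dbname test \
  -c '\dt+' -c 'select count(*) from pgbench_accounts;'
```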
Now let's see how quickly we can reset the state of the clone. Delete some data or drop some table.
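For example (this only affects the clone, not the source data):

```bash
PGPASSWORD=secret_password psql --host localhost --port 6000 \
  --username dblab_user_1 --dbname test \
  -c 'drop table pgbench_accounts;'
```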
To reset, use the clone reset command (replace my_first_clone with the ID of your clone if you changed it). You can do it without leaving psql; for that, use the \! command:
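A sketch (my_first_clone is the clone ID used above):

```bash
# From inside psql, shell out with \! :
#   \! dblab clone reset my_first_clone
# or run the same command directly in another terminal:
dblab clone reset my_first_clone
```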
Check the status of the clone:
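For example:

```bash
dblab clone status my_first_clone
```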
Notice how fast the resetting was, just a few seconds! 💥
Reconnect to the clone:
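Using the same illustrative connection parameters as before:

```bash
PGPASSWORD=secret_password psql --host localhost --port 6000 \
  --username dblab_user_1 --dbname test
```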
Now check the database objects you've dropped or partially deleted – everything should be the same as when we started.
For more, see the full client CLI reference.
GUI (Database Lab Platform)
To use the GUI, you need to sign up for Database Lab Platform.
How to get the Database Lab GUI
Currently, the Database Lab GUI is in "private beta" mode. The onboarding consists of two steps. Step 1: sign up using a Google, LinkedIn, GitLab, or GitHub account. Step 2: the Postgres.ai team will contact you and schedule a demo, during which your account will be activated.
Add Database Lab Engine to the Platform
- On the Database Lab instances page of your organization, click the Add instance button.
- On the Add instance page, fill in the following:
  - Project: choose any project name, it will be created automatically
  - Verification token: specify the same verification token that you've used in the Database Lab Engine configuration file
  - URL: Database Lab API server (EC2 instance public IP or hostname; specify the port if needed, e.g. https://my-domain.com/dblab-engine/ or http://30.100.100.1:2345)
- Click the Verify URL button to check the availability of the Engine. Ignore the warning about an insecure connection; in this tutorial, we have skipped some security-related steps.
- Click the Add button to add the instance to the Platform.
Create a clone
- Go to the Database Lab instance page.
- Click the Create clone button.
- Fill the ID field with a meaningful name.
- (optional) By default, the latest data snapshot (closest to the production state) will be used to provision a clone. You can choose another snapshot if any are available.
- Fill in the database credentials. Remember the password (it will not be available later; Database Lab Platform does not store it!); you will need it to connect to the clone.
- Click the Create clone button and wait for a clone to be provisioned. The process should take only a few seconds.
- You will be redirected to the Database Lab clone page.
Connect to a clone
- From the Database Lab clone page, under the Connection info section, copy the psql connection string field contents by clicking the Copy button.
- Here we assume that you have psql installed on your working machine. In the terminal, type psql and paste the psql connection string field contents. Change the database name (DBNAME) parameter; you can always use postgres for the initial connection.
for the initial connection. - Run the command and type the password you've set during the clone creation.
- Test the established connection by listing the tables in the database using \d.
Have questions?
Reach out to our team, we'll be happy to help! Use the Intercom widget located in the bottom right corner.
👋 Database Lab "Private Beta" program
Database Lab Platform (SaaS) is currently in "private beta" mode, being tested by several hundred engineers. Want to become an early adopter? Join the Database Lab by Postgres.ai "Private Beta" program today: https://postgres.ai/console/.