Database Lab Engine configuration reference
Overview
The behavior of Database Lab Engine (DLE) is controlled by a main configuration file in YAML format. This reference describes the available configuration options.
Database Lab Engine supports YAML 1.2, including anchors, aliases, tags, and map merging.
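Anchors (&) and aliases (*) let you declare a value once and reuse it. The merge key (<<) copies an anchored mapping into another mapping and still lets individual keys be overridden.

The fragment below is a minimal sketch of this. It shares one Postgres image and one container configuration between provision and the logicalRestore job. Some parts are assumptions:
- The image tag and the shm-size values are placeholders.
- The retrieval.spec.logicalRestore.options nesting is taken from the published example configs.

```yaml
provision:
  # Declare values once and label them with anchors (&).
  dockerImage: &pg_image "postgresai/extended-postgres:14"  # placeholder image
  containerConfig: &pg_container_config
    "shm-size": 1gb

retrieval:
  spec:
    logicalRestore:
      options:
        # Reuse the scalar through an alias (*).
        dockerImage: *pg_image
        # Merge (<<) the anchored mapping; keys listed here override merged ones.
        containerConfig:
          <<: *pg_container_config
          "shm-size": 2gb
```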
Example config files can be found here: https://gitlab.com/postgres-ai/database-lab/-/tree/v3.0.0/configs.
You may store configuration files in any suitable location. The recommended location of configuration files for Database Lab Engine is ~/.dblab/engine/configs.
In addition, Database Lab Engine stores information about current sessions and the state of the instance in metadata files. The recommended location of metadata files is ~/.dblab/engine/meta. Note that the metadata folder must be writable.
Make sure that the configuration file is named server.yml and that its directory is mounted to /home/dblab/configs inside the DLE container.
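The configuration of Database Lab Engine can be reloaded without downtime. Send SIGHUP to the engine process (PID 1 inside the dblab_server container), then check the recent logs: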
docker exec -it dblab_server kill -SIGHUP 1
docker logs --since 1m dblab_server
The list of configuration sections
Here is how the configuration file is structured:
Section | Description |
---|---|
global | Contains global parameters, such as the database engine, debugging mode, or default database options. |
server | Configures the DLE API server. |
embeddedUI | Configures the embedded DLE UI. |
poolManager | Manages filesystem pools or volume groups. |
provision | Describes how thin cloning and database branching are organized. |
retrieval | Defines the data flow: a series of "jobs" for the initial data retrieval and, optionally, continuous data synchronization with the source, snapshot creation, and retention policies. The initial retrieval may be either "logical" (dump/restore) or "physical" (based on replication or restoration from an archive). |
cloning | Thin cloning policies. |
platform | Postgres.ai Platform integration (provides a GUI and advanced features such as user management and logs). |
observer | CI Observer configuration. CI Observer helps verify database schema changes (database migrations) automatically in CI/CD pipelines. Available on the Postgres.ai Platform. |
diagnostic | Configuration for collecting diagnostic logs: container output and Postgres logs. |
estimator | (removed in DLE 3.4.0) Estimator configuration. The estimator estimates the timing of queries on the production database. |
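Put together, a server.yml built from these sections has roughly the top-level shape sketched below. Each key's main options are listed as comments, and estimator is omitted because it was removed in DLE 3.4.0.

This is a skeleton only. It parses as YAML but is not a working configuration, because every section is left empty.

```yaml
# Skeleton of server.yml: top-level sections only, with their options as comments.
global:       # engine, debug, database
server:       # verificationToken, port
embeddedUI:   # enabled, dockerImage, host, port
poolManager:  # mountDir, dataSubDir, clonesMountSubDir, socketSubDir, preSnapshotSuffix, selectedPool
provision:    # portPool, dockerImage, useSudo, keepUserPasswords, containerConfig, cloneAccessAddresses
retrieval:    # refresh, jobs, spec
cloning:      # accessHost, maxIdleMinutes
platform:     # url, accessToken, enablePersonalTokens
observer:     # replacementRules
diagnostic:   # logsRetentionDays
```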
Section global: global parameters
- engine - defines the database engine. Supported engines: postgres
- debug - allows seeing more in the Database Lab Engine logs; WARNING: in this mode, sensitive data (such as passwords) can be printed to logs
- database (key-value, optional) - contains default configuration options of the restored database:
  - username (string, optional, default: "postgres") - a default username for logical/physical restore jobs
  - dbname (string, optional, default: "postgres") - a default database name for logical/physical restore jobs
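A global section that spells out the documented defaults for the database block might look like the sketch below. Debug is switched off here because debug logs can expose sensitive data.

```yaml
global:
  engine: postgres
  # Debug logs may include sensitive data such as passwords.
  debug: false
  database:
    username: postgres   # documented default
    dbname: postgres     # documented default
```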
Section server: Database Lab Engine API server
- verificationToken (string, required) - the token used to work with the Database Lab API
- port (string, required, default: 2345) - HTTP server port
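A minimal server section could look like this. The token value is a placeholder and should be replaced with a long random secret.

```yaml
server:
  verificationToken: "replace-with-a-long-random-secret"  # placeholder
  port: 2345
```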
Section embeddedUI: Database Lab Engine user interface
- enabled (boolean, optional, default: true) - enables or disables the UI container
- dockerImage (string, required) - a Docker image of the UI application
- host (string, required, default: "127.0.0.1") - the address on which the embedded UI container accepts HTTP connections; the default loopback address accepts only local connections
- port (integer, required, default: 2346) - an HTTP port of the UI application
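The sketch below keeps the UI on its documented defaults. The image reference is a placeholder and an assumption, not a pinned release; use the UI image that matches your DLE version.

```yaml
embeddedUI:
  enabled: true
  dockerImage: "postgresai/ce-ui:latest"  # placeholder image reference
  host: "127.0.0.1"  # loopback only: the UI is reachable from this machine
  port: 2346
```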
Section poolManager: filesystem pools or volume groups management
- mountDir (string, required) - specifies the location of the pools mount directory (can contain multiple pool directories)
- dataSubDir (string, optional, default: "") - specifies the location of the data restored by Database Lab Engine, relative to the pool directory placed inside the mount directory (mountDir)
- clonesMountSubDir (string, required) - the directory used to mount clones
- socketSubDir (string, required) - the UNIX socket directory used to establish local connections to cloned databases
- preSnapshotSuffix (string, required) - the suffix that denotes preliminary snapshots
- selectedPool (string, optional, default: "") - enforces selection of the working pool (or dataset) inside the mountDir directory. If this option is specified, it disables the automatic rotation of multiple pools. This may be useful when multiple DLEs are running on the same machine and share the same set of pools. An empty string turns off this feature and enables the standard pool selection and rotation mechanism (default behavior).
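As an illustration, the poolManager section below uses hypothetical directory names and suffix. Suppose a pool directory named dblab_pool exists inside mountDir (also hypothetical). Because dataSubDir is resolved relative to the pool, restored data would land in /var/lib/dblab/dblab_pool/data.

```yaml
poolManager:
  mountDir: /var/lib/dblab        # hypothetical; holds one or more pool directories
  dataSubDir: data                # relative to each pool
  clonesMountSubDir: clones       # hypothetical name
  socketSubDir: sockets           # hypothetical name
  preSnapshotSuffix: "_pre"       # hypothetical suffix
  selectedPool: ""                # empty: automatic pool selection and rotation
```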
Section provision: thin cloning environment settings
- portPool (key-value, required) - defines a pool of ports for Postgres clones:
  - from (integer, required) - the lowest port value in the pool
  - to (integer, required) - the highest port value in the pool
- dockerImage (string, required) - the Postgres Docker image used for cloning. IMPORTANT: the Postgres version of this image should match the source's Postgres version. For logical mode, this is a recommendation; for physical mode, it is a requirement.
- useSudo (boolean, optional, default: false) - use sudo for ZFS/LVM and Docker commands if the Database Lab server is running outside a container
- keepUserPasswords (boolean, optional, default: false) - by default, in addition to creating a new user with administrative privileges, Database Lab Engine resets the passwords of all existing users for security reasons. If this behavior is undesirable and you want existing users to keep authenticating with their unchanged passwords, set this option to true.
- containerConfig (key-value, optional) - options to pass custom parameters to clone containers
- cloneAccessAddresses (string, optional, default: "127.0.0.1") - IP addresses that can be used to access clones. By default, the loopback address is used, so only local connections are accepted. An empty string means "all available addresses". The option supports multiple IPs (comma-separated) and IPv6 addresses (for example, [::1]).
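A provision section might look like the sketch below. The port range is inclusive at both ends (from is the lowest and to the highest port in the pool), so 6000 to 6099 gives 100 ports.

The image tag and the shm-size value are placeholders. The comma-separated address list follows the format described above.

```yaml
provision:
  portPool:
    from: 6000
    to: 6099   # inclusive: 100 ports in total
  # Placeholder image: its Postgres major version should match the source.
  dockerImage: "postgresai/extended-postgres:14"
  useSudo: false
  keepUserPasswords: false
  containerConfig:
    "shm-size": 1gb   # passed to clone containers; value is an example
  # Accept connections to clones only on the IPv4 and IPv6 loopback addresses.
  cloneAccessAddresses: "127.0.0.1,[::1]"
```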
Section retrieval: data retrieval
- refresh (key-value, optional) - describes the configuration for a full refresh:
  - timetable (string, optional, default: "") - defines a timetable in crontab format: https://en.wikipedia.org/wiki/Cron#Overview
  - skipStartRefresh (boolean, optional, default: false) - skips running retrieval jobs while the DLE instance starts; supported since DLE 3.4
- jobs (list, optional) - declares the set of jobs to run. The jobs (stages) must be defined in the spec section
- spec (key-value, optional) - contains a configuration spec for each job
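The sketch below shows how the three top-level retrieval keys fit together for a logical setup. It assumes, as in the published example configs, that each job's settings sit under spec.<jobName>.options. The job options themselves are left empty here because the job sections cover them.

```yaml
retrieval:
  refresh:
    # crontab fields: minute hour day-of-month month day-of-week
    # -> a full refresh every Monday at 00:00
    timetable: "0 0 * * 1"
    skipStartRefresh: false
  jobs:
    - logicalDump
    - logicalRestore
    - logicalSnapshot
  spec:
    logicalDump:
      options:      # see "Job logicalDump"
    logicalRestore:
      options:      # see "Job logicalRestore"
    logicalSnapshot:
      options:      # see "Job logicalSnapshot"
```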
Data retrieval jobs
Available job names:
- logicalDump
- logicalRestore
- logicalSnapshot
- physicalRestore
- physicalSnapshot
You need to choose either the "logical" or the "physical" set of jobs; mixing them is not allowed.
Note that all jobs are optional. For example, all of the following approaches to the initial data retrieval are allowed:
- You may use both logicalDump and logicalRestore to make a dump to a file and then restore from it
- You may use only logicalRestore and restore from an already prepared dump file
- You may use only logicalDump, without logicalRestore. However, this approach makes sense only if you define the immediateRestore option in the logicalDump job, which performs dump and restore on the fly without saving the dump to a file.
Job logicalDump
Dumps a PostgreSQL database from a provided source to an archive or to the Database Lab Engine instance.
Options:
- dumpLocation (string, required) - specifies the location where dump files (or directories, for directory-format archives) are stored; it is created automatically on the host machine. DLE deletes all files and directories in this location before creating new dumps.
- dockerImage (string, required) - specifies the Docker image containing the tools required for dumping
- containerConfig (key-value, optional) - options to pass custom parameters to the logicalDump container
- source (key-value, required) - describes the source of data:
  - type (string, required) - defines the location type of the dumped database. Available values: local, remote, rdsIam
  - connection (key-value, required) - defines connection parameters of the source:
    - dbname (string, required) - database name used for connection purposes; see also logicalDump.databases
    - host (string, required) - defines the hostname of the database
    - port (integer, optional, default: 5432) - defines the port of the database
    - username (string, optional, default: postgres) - defines the username used to connect to the database
    - password (string, optional, default: "") - defines the user's password used to connect to the database. The environment variable PGPASSWORD can be used instead of this option and has a higher priority.
  - rdsIam (key-value, optional) - contains options specific to the RDS IAM source type:
    - awsRegion (string, required) - AWS Region where the RDS instance is located
    - dbInstanceIdentifier (string, required) - RDS instance identifier
    - sslRootCert (string, required) - path on the host machine to the SSL root certificate. You can download it from https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
- parallelJobs (integer, optional, default: 1) - defines the number of concurrent jobs, using the pg_dump option --jobs. This option can dramatically reduce the time needed to dump a large database
- databases (key-value, optional) - defines the list of databases to be copied. By default, DLE dumps and restores all available databases; do not specify the databases section to take all databases. Available options for each database:
  - tables (list of strings, optional) - dumps the definition and/or data of only the listed tables. Do not specify the tables section to dump all available tables
  - excludeTables (list of strings, optional) - excludes all tables matching any of the patterns from the dump. Accepts specific schemas and tables, and allows wildcards (*) for more flexibility.
- customOptions (list of strings, optional) - defines one or more pg_dump options. See the available options in the official PostgreSQL documentation.
- immediateRestore (key-value, optional) - provides options for direct restore to a Database Lab Engine instance:
  - enabled (boolean, optional, default: false) - enables immediate restore
  - forceInit (deprecated, boolean, optional, default: false) - initializes data even if the Postgres directory (see the configuration options poolManager.mountDir and poolManager.dataSubDir) is not empty; note that existing data might be overwritten; deprecated since DLE 3.4.0
  - customOptions (list of strings, optional) - defines one or more pg_restore options. See the available options in the official PostgreSQL documentation
- ignoreErrors (boolean, optional, default: false) - ignore errors that occurred during the logical data dump; supported since DLE 3.4
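The logicalDump sketch below dumps from a remote source. It uses these placeholders and assumptions:
- Host, database and table names, paths, and the image tag are placeholders.
- The options nesting is assumed from the example configs.
- The password is left out so that it can be supplied through PGPASSWORD, which has a higher priority than the password option.

```yaml
retrieval:
  spec:
    logicalDump:
      options:
        # Hypothetical path; DLE empties it before creating new dumps.
        dumpLocation: /var/lib/dblab/dblab_pool/dump
        dockerImage: "postgresai/extended-postgres:14"  # placeholder
        source:
          type: remote
          connection:
            dbname: postgres
            host: 203.0.113.10   # placeholder (documentation address range)
            port: 5432
            username: postgres
        parallelJobs: 4
        databases:
          app_db:                # hypothetical database: only one table
            tables:
              - public.orders
          reporting_db:          # hypothetical database: everything except one schema
            excludeTables:
              - "audit.*"
        customOptions:
          - "--no-publications"
        ignoreErrors: false
```

For an RDS source authenticated via IAM, type would be rdsIam instead. In that case, awsRegion, dbInstanceIdentifier, and sslRootCert go in a source.rdsIam block.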
Job logicalRestore
Restores a PostgreSQL database from an archive created by pg_dump in one of the non-plain-text formats.
Options:
- dumpLocation (string, required) - specifies the location of the archive files (or directories, for directory-format archives) on the host machine to be restored
- dockerImage (string, required) - specifies the Docker image containing the tools required for restoring
- containerConfig (key-value, optional) - options to pass custom parameters to the logicalRestore container
- forceInit (deprecated, boolean, optional, default: false) - initializes data even if the Postgres directory (see the configuration options poolManager.mountDir and poolManager.dataSubDir) is not empty; note that existing data might be overwritten; deprecated since DLE 3.4.0
- parallelJobs (integer, optional, default: 1) - defines the number of concurrent jobs, using the pg_restore option --jobs. This option can dramatically reduce the time needed to restore a large database to a server running on a multiprocessor machine
- databases (key-value, optional) - defines the list of databases to be restored. By default, DLE restores all available databases; do not specify the databases section to restore all databases. Available options for each database:
  - format (string, optional, default: "") - defines the dump format. Available formats: directory, custom, plain. Default format: directory. See the description of each format in the official PostgreSQL documentation.
  - compression (string, optional, default: "no") - defines the compression type for plain-text dumps. Available compression types: gzip, bzip2, no.
  - tables (list of strings, optional) - restores the definition and/or data of only the listed tables. Do not specify the tables section to restore all available tables
- customOptions (list of strings, optional) - defines one or more pg_restore options. See the available options in the official PostgreSQL documentation
- queryPreprocessing (key-value, optional) - defines pre-processing parameters; supported since DLE 3.2:
  - queryPath (string, optional, default: "") - specifies the path to SQL pre-processing queries; an empty string means that no pre-processing is defined
  - maxParallelWorkers (integer, optional, default: 2) - defines the worker limit for parallel queries
  - inline (string, optional, default: "") - inline SQL queries to execute; if specified, queries from queryPath are executed before inline
- ignoreErrors (boolean, optional, default: false) - ignore errors that occurred during the logical data restore; supported since DLE 3.4
- skipPolicies (boolean, optional, default: true) - do not restore row-level security policies (CREATE POLICY); supported since DLE 3.4
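A logicalRestore sketch that restores two hypothetical databases in different formats might look like this. Paths, names, and the image tag are placeholders, and the options nesting is assumed from the example configs.

```yaml
retrieval:
  spec:
    logicalRestore:
      options:
        dumpLocation: /var/lib/dblab/dblab_pool/dump    # hypothetical
        dockerImage: "postgresai/extended-postgres:14"  # placeholder
        parallelJobs: 4
        databases:
          app_db:                # hypothetical; directory-format archive
            format: directory
          legacy_db:             # hypothetical; gzip-compressed plain-text dump
            format: plain
            compression: "gzip"
        customOptions:
          - "--no-owner"
          - "--no-privileges"
        queryPreprocessing:
          queryPath: "/tmp/scripts/sql"  # hypothetical; runs before inline
          maxParallelWorkers: 2
          inline: |
            ANALYZE;
        ignoreErrors: false
        skipPolicies: true
```

If compression is set to no, writing it quoted as "no" keeps it a string even under YAML 1.1 parsing rules, where a bare no is read as a boolean.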
Job logicalSnapshot
Prepares a snapshot of the logically restored PostgreSQL database.
Options:
- dataPatching (key-value, optional) - defines SQL queries for data patching:
  - dockerImage (string, optional) - specifies the Docker image used to run the data patching container
  - containerConfig (key-value, optional) - options to pass custom parameters to the data patching container
  - queryPreprocessing (key-value, optional) - defines pre-processing parameters:
    - queryPath (string, optional, default: "") - specifies the path to SQL pre-processing queries; an empty string means that no pre-processing is defined
    - maxParallelWorkers (integer, optional, default: 2) - defines the worker limit for parallel queries
    - inline (string, optional, default: "") - inline SQL queries to execute; if specified, queries from queryPath are executed before inline
- preprocessingScript (string, optional) - path on the host machine to a pre-processing script
- configs (key-value, optional) - applies PostgreSQL configuration parameters when preparing a working snapshot. These parameters are inherited by all clones. See also: How to configure PostgreSQL used by Database Lab Engine
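A logicalSnapshot sketch combining the three kinds of options is shown below. The script path, SQL, table name, and image tag are hypothetical, and the options nesting is assumed from the example configs.

The inline SQL is a YAML block scalar, so a # line inside it would become part of the SQL rather than a YAML comment.

```yaml
retrieval:
  spec:
    logicalSnapshot:
      options:
        preprocessingScript: "/tmp/scripts/custom.sh"  # hypothetical script
        dataPatching:
          dockerImage: "postgresai/extended-postgres:14"  # placeholder
          queryPreprocessing:
            queryPath: ""          # no query files
            maxParallelWorkers: 2
            # Hypothetical masking query on a hypothetical "users" table.
            inline: |
              UPDATE users SET email = 'user' || id || '@example.com';
        # Inherited by every clone created from this snapshot.
        configs:
          shared_buffers: 1GB
```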
Job physicalRestore
Restores data from a physical backup.
Supported restore tools:
- WAL-G (walg) - an archival restoration tool for PostgreSQL; it uses LZ4, LZMA, or Brotli compression, multiple processors, and non-exclusive base backups for Postgres (GitHub)
- pgBackRest (pgbackrest) - a reliable, easy-to-use backup and restore solution that can seamlessly scale up to the largest databases and workloads by utilizing algorithms that are optimized for database-specific requirements (GitHub); supported since DLE 3.1
- Custom (custom) - allows defining your own command to restore data
Options:
- tool (string, required) - defines the tool used to restore data. See the list of available restore tools above
- dockerImage (string, required) - specifies the Docker image containing the restore tool
- containerConfig (key-value, optional) - options to pass custom parameters to the physicalRestore container
- sync (key-value, optional) - keeps PGDATA up to date after the initial data fetching by replaying new WALs from the source:
  - enabled (boolean, optional, default: false) - runs a separate container to keep Database Lab data up to date
  - healthCheck (key-value, optional) - describes health check options for the sync container:
    - interval (integer, optional, default: 5) - health check interval for the data sync container (in seconds)
    - maxRetries (integer, optional, default: 200) - maximum number of health check retries
  - configs (key-value, optional) - applies PostgreSQL configuration parameters to the sync instance
- envs (key-value, optional) - passes custom environment variables to the Docker container with the restore tool
- walg (key-value, optional) - defines WAL-G configuration options:
  - backupName (string, required) - defines the backup name to restore
- pgbackrest (key-value, optional) - defines pgBackRest configuration options:
  - stanza (string, required) - defines the stanza name to restore (pgBackRest docs)
  - delta (boolean, optional, default: false) - enables the --delta option for restore, which uses checksums (pgBackRest docs)
- customTool (key-value, optional) - defines configuration options for a custom restore tool:
  - command (string, required) - defines the command to restore data using a custom tool
  - restore_command (string, optional) - defines the PostgreSQL restore_command configuration option to keep the data up to date. Database Lab Engine automatically propagates the specified value to the proper location depending on the PostgreSQL version: in versions 11 and older, it is stored in recovery.conf, while in 12 and newer, it is part of the main configuration file, postgresql.conf
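A physicalRestore sketch using WAL-G with continuous sync could look like the following. WAL-G takes its storage settings from environment variables, which are passed here through envs. LATEST asks WAL-G for the most recent base backup.

The bucket path, credentials, and image tag are placeholders, and the options nesting is assumed from the example configs.

```yaml
retrieval:
  jobs:
    - physicalRestore
    - physicalSnapshot
  spec:
    physicalRestore:
      options:
        tool: walg
        dockerImage: "postgresai/extended-postgres:14"  # placeholder
        sync:
          enabled: true        # keep replaying new WALs after the initial fetch
          healthCheck:
            interval: 5        # seconds
            maxRetries: 200
          configs:
            shared_buffers: 2GB
        envs:
          WALG_S3_PREFIX: "s3://example-bucket/example-cluster"  # placeholder
          AWS_ACCESS_KEY_ID: "<access-key-id>"                  # placeholder
          AWS_SECRET_ACCESS_KEY: "<secret-access-key>"          # placeholder
        walg:
          backupName: LATEST
```

With pgBackRest instead, tool would be pgbackrest, and the walg block would be replaced by a pgbackrest block containing stanza (and, optionally, delta: true).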
Job physicalSnapshot
Prepares a snapshot of the physically restored PostgreSQL database.
Options:
- skipStartSnapshot (boolean, optional, default: false) - skips taking a snapshot while the retrieval starts
- promotion (key-value, optional) - promotes PGDATA after data fetching:
  - enabled (boolean, optional, default: false) - enables PGDATA promotion
  - dockerImage (string, optional) - specifies the Docker image containing the promotion-compatible PostgreSQL instance
  - containerConfig (key-value, optional) - options to pass custom parameters to the physicalSnapshot container
  - healthCheck (key-value, optional) - describes health check options for the data promotion container:
    - interval (integer, optional, default: 5) - health check interval for the data promotion container (in seconds)
    - maxRetries (integer, optional, default: 200) - maximum number of health check retries
  - queryPreprocessing (key-value, optional) - defines pre-processing SQL queries:
    - queryPath (string, optional, default: "") - specifies the path to SQL pre-processing queries; an empty string means that no pre-processing is defined
    - maxParallelWorkers (integer, optional, default: 2) - defines the worker limit for parallel queries
    - inline (string, optional, default: "") - inline SQL queries to execute; if specified, queries from queryPath are executed before inline
  - configs (key-value, optional) - applies PostgreSQL configuration parameters to the promotion instance
- sysctls (key-value, optional) - configures namespaced kernel parameters (sysctls) of the Docker container used in the promotion stage of taking a snapshot. See the supported parameters: https://docs.docker.com/engine/reference/commandline/run/#configure-namespaced-kernel-parameters-sysctls-at-runtime
- preprocessingScript (string, optional) - path on the host machine to a pre-processing script
- configs (key-value, optional) - applies PostgreSQL configuration parameters to the snapshot. These parameters are inherited by all clones. See also: How to configure PostgreSQL used by Database Lab Engine
- envs (key-value, optional) - passes custom environment variables to the promotion Docker container
- scheduler (key-value, required) - contains tasks that run on a schedule:
  - snapshot (key-value, optional) - defines rules to create a new snapshot on a schedule:
    - timetable (string, required) - defines a timetable in crontab format: https://en.wikipedia.org/wiki/Cron#Overview
  - retention (key-value, optional) - defines rules to clean up old snapshots on a schedule:
    - timetable (string, required) - defines a timetable in crontab format: https://en.wikipedia.org/wiki/Cron#Overview
    - limit (integer, required) - defines how many snapshots should be kept
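A physicalSnapshot sketch with promotion and a snapshot schedule might look like this. The image tag and schedules are placeholders, and the options nesting is assumed from the example configs.

```yaml
retrieval:
  spec:
    physicalSnapshot:
      options:
        skipStartSnapshot: false
        promotion:
          enabled: true
          dockerImage: "postgresai/extended-postgres:14"  # placeholder
          healthCheck:
            interval: 5       # seconds
            maxRetries: 200
        # Inherited by all clones created from the snapshots.
        configs:
          shared_buffers: 1GB
        scheduler:
          snapshot:
            timetable: "0 */6 * * *"   # every 6 hours, at minute 0
          retention:
            timetable: "30 * * * *"    # every hour, at minute 30
            limit: 4                   # keep at most 4 snapshots
```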
Section cloning: thin cloning policies
- accessHost (string, required) - the host specified in the database connection string that tells users how to connect to database clones
- maxIdleMinutes (integer, optional, default: 0) - automatically deletes clones after the specified number of minutes of inactivity; 0 disables this feature
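For example, the cloning section below advertises a hypothetical hostname in connection strings and deletes clones after two hours of inactivity.

```yaml
cloning:
  accessHost: "dblab.example.com"  # hypothetical host shown in connection strings
  maxIdleMinutes: 120              # 0 disables automatic deletion
```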
Section platform: Postgres.ai Platform integration
- url (string, optional, default: "https://postgres.ai/api/general") - Platform API URL
- accessToken (string, required) - the token for authorization in the Platform API. This token can be obtained in the Postgres.ai Console
- enablePersonalTokens (boolean, optional, default: false) - enables authorization with personal tokens of the organization's members
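A platform section might look like this. The token is a placeholder for the value obtained in the Postgres.ai Console, and url is shown with its documented default.

```yaml
platform:
  url: "https://postgres.ai/api/general"             # documented default
  accessToken: "replace-with-platform-access-token"  # placeholder
  enablePersonalTokens: true
```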
Section observer: CI Observer configuration
- replacementRules (key-value, optional) - sets up rules for Postgres logs that are sent to the Platform when running Observed Sessions. The rules are based on regular expressions, given as pairs of values "regexp":"replace" (to check the syntax, use this document). This helps ensure that sensitive data is masked properly and does not leave the origin. Replacement rules apply to the following log fields: message, detail, hint, internal_query, query
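The sketch below defines two replacement rules. It assumes Go (RE2) regular expression syntax with $1-style group references in the replacement string. Both are assumptions, so verify the rules against the syntax document mentioned above before relying on them.

Single-quoted YAML strings are used so that backslashes reach the regular expression unchanged.

```yaml
observer:
  replacementRules:
    # Mask the local part of email addresses, keeping the domain (group 1).
    '[a-z0-9._%+-]+(@[a-z0-9.-]+\.[a-z]{2,})': '***$1'
    # Replace every run of digits.
    '\d+': '***'
```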
Section estimator: Estimator configuration
The section has been removed in DLE 3.4.0.
- readRatio (float, optional, default: 1) - the ratio evaluating the timing difference for operations involving IO Read between Database Lab and production environments
- writeRatio (float, optional, default: 1) - the ratio evaluating the timing difference for operations involving IO Write between Database Lab and production environments
- profilingInterval (string, optional, default: 10ms) - the time interval between samples taken by the profiler
- sampleThreshold (integer, optional, default: 20) - the minimum number of samples sufficient to display the estimation results
Section diagnostic: Diagnostic collection configuration
- logsRetentionDays (integer, optional, default: 7) - the number of days after which collected container logs are discarded
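For example, to keep collected container logs for two weeks instead of the default seven days:

```yaml
diagnostic:
  logsRetentionDays: 14
```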