DBLab Engine development roadmap
DBLab Engine (DLE) Roadmap​
We are working hard to develop the Database Lab Platform (DLP) and its open-source core component, Database Lab Engine (DLE). Below you can find the main ideas we are working on now or planning to work soon.
[DLE] Physical provisioning​
Physical provisioning: native support of DB provisioning from archives created by a specific backup solution or based on an existing Postgres database
- Support various sources
- Generic (anything: pg_basebackup, rsync, any backup tools)
- Native support for specific backup tools
- WAL-E/WAL-G
- pgBackRest
- Barman
- pg_probackup
- Continuously updated state (physical replication based on WAL shipping)
- Snapshot management (schedule, retention policy)
[DLE] Logical provisioning​
Logical provisioning: native support of DB provisioning for managed PostgreSQL databases
- Support various sources
- Any PostgreSQL DB via dump/restore: Amazon RDS, Heroku Postgres, Azure PostgreSQL, GCP CloudSQL, Digital Ocean Postgres, etc.
- AWS: RDS IAM
- GCP: CloudSQL IAM
- Azure IAM
- Restore from backups stored on AWS S3
- uncompressed
- compressed (gzip, bzip2)
- Continuously updated state (logical replication)
- Dump/restore on the fly (without need to save dumps on disk)
- Multiple pools, rotation/refresh on schedule
- "Selected pool": allow to specify pool for the case when multiple DLEs are running on the same machine
- Advanced refresh policies: force refresh on pools being in use, warning period, number of retries before forcing
- Partial data retrieval
- specific databases
- specific tables
- arbitrary filtering (column and row filtering)
[DLE] Engine features​
- Persisting clones when the engine restarts (added in DLE 3.0.0)
- Point-in-time recovery (PITR) (Can be used for ultra-fast recovery of accidentally deleted data)
- Troubleshoot/test standby behavior
- Creating a clone running in read-only mode to allow troubleshooting hot standby issues
- Support for launching N replicas for a clone
- For the "physical" mode: creating clone from "pre" snapshot (read-only, unpromoted; admins only)
- Duplicate DLE (create a new DLE based on existing one)
- Clone observability
- "Temporary" system- and Postgres-level monitoring for clones/sessions
- Clone analytics
- Advanced audit
- perf/FlameGraphs
- Utilization of DLE instance and alerts
- Usage and estimated savings reports
- SSH port forwarding for API and Postgres connections
- Tags
- Framework for macro database experiments (work with thick/regular clones)
- Auto-register DLE in Platform
- Resource usage quotas for clones: CPU, RAM (container quotas, supported by Docker)
- User quotas
- Disk quotas (
zfs set quota=xx
) - GUI with key features (added in DLE 3.0.0)
- Fast connection to clone's DB via CLI
- Advanced snapshot management
- API handle to create/destroy snapshots (for continuously updated state)
- User-defined snapshots for clones
- Snapshot export/import (S3)
- Advanced schema management
- schema diff
- zero-downtime DDL auto-generation
- Reset clone's state to particular database version – keeping DB creds the same (including port)
- physical: allow choosing
dataStateAt
- logical: allow "jumping" between DB versions (pools)
- physical: allow choosing
[DLP] Platform features​
- Support working with multiple DLEs
- Backups, PITR
- User management: basic permissions
- User management: advanced permissions
- SSO
- Clone (Postgres, Postgres over SSH / port forwarding) connection options
- LDAP
- SSH key management
- Security
- Security overview: software used, incidents, code analysis
- Basic audit logging
- Advanced audit logging and alerting
- Export audit logs from GUI
- Send events to SIEM
- Usage stats
- Monitoring (Netdata)
- Notifications
- Notification management – turn on/off all or specific ones
- Non-deletable clone is abandoned / not used for too long
- Clone and snapshot are using too much disk space / out-of-disk-space risks
- CPU saturation
- Disk space saturation
- Disk IO saturation
- Refresh cannot be done because all pools are busy and policy doesn't allow forced refresh
- Full refresh started
- Full refresh finished
- Lag value is too high ("sync" container)
- Initial data retrieval started
- Initial data retrieval finished
- Snapshot created
- Snapshot deleted
- Pricing, billing
- report usage to postgres.ai
- flexible pricing options
- AWS: instance type and size based
[DLP] Automated verification of database schema and complex data changes a.k.a. DB migrations​
- History and logging for clones/sessions
- Automated detection of locking issues
- Setting custom
statement_timeout
- PostgreSQL logs for the migration
- Report in CI and Platform
- Integration with CI tools – advanced integration
- GitHub Actions
- Bitbucket CI/CD
- CircleCI
- Jenkins
- GitLab CI/CD
- Bamboo
- TravisCI
- Support various database migration tools + demo
- Sqitch
- Flyway
- Liquibase
- Ruby on Rails Active Record
- Django migrations
- "Production timing" estimator (experimental feature, added in DLE 2.3.0, removed in DLE 3.4.0)
- More artifacts to support decisions: pgstat*, system usage, WAL, checkpoints, etc.
[DLE] Cloning (CoW technology)​
- ZFS
- LVM
- Storage-based CoW
[DLP] Automation, clouds, Kubernetes​
- Simplify setup for major Cloud Service Providers, automation of installation in clients' accounts
- Basic Terraform templates
- One-click setup on AWS. AWS Marketplace
- One-click setup on GCP. GCP Marketplace
- One-click setup on Azure. Azure Marketplace
- Cloud DLP ("DLP SaaS"): cloud offering (fully managed DLE and DLP)
- AWS
- GCP
- Azure
- Self-managed DLP ("DLP Enterprise"): work with multiple DLEs and all DLP features in customer's account
- Cost optimization
- AWS spot instances
- GCP preemptible instances (24h max)
- Azure spot instances
- AWS/GCP/Azure: Self-stopping instances for cost savings, keeping disk present, and refreshing when needed
- Kubernetes support
- DLE operator
- Integration with StackGres
- PoC (logical, physical: WAL-E/G)
- integration
- Support CSI Volume Cloning (GA: k8s 1.18)
[DLP/Joe] SQL optimization chatbot​
- Web UI version
- Slack chatbot
- Telegram chatbot
- History with Search and Share options
- Visualizations
- explain.depesz
- explain.dalibo (PEV2)
- pgMustard (WebUI/SaaS only)
- FlameGraphs
- Better optimization recommendations
- Macroanalysis insights (suggestions based on postgres-checkup / pgss)
- Hypothetical indexes
- Hypothetical partitioning
- Index advisor
- Utilization control
- Restore user sessions after Joe container restarts
- Chatbot security
- Do not use DB superuser
- Quotas (rate limits)
- Alerting admins when a quota is reached
- Runtime execution plan observability: pg_query_state
- Reset to specific
dataStateAt
- perf/FlameGraphs
- Wait event sampling
- Heavylock analysis
[SaaS] Data masking and anonymization​
- Basic support for masking and obfuscation
- custom scripts
- parallel execution of custom scripts
- postgres_anonymizer
- kitchen-sync
- pgsync
- Hybrid setup: raw and obfuscated/masked clones on the same DLE instance
- Dump/restore with runtime anonymization, parallelized, via GitOps
- Simplified setup for anonymization - GUI
- Automated masking / anonymization
[Docs] Better documentation and demo​
- Tutorials
- Basic
- RDS
- SQL optimization using Joe bot
- DB migration testing in CI/CD pipelines
- Katacoda
- User Guides
- DLE setup and administration
- Cloning
- SQL optimization with Joe bot
- Building non-production environments
- References
- DLE API
- CLI
- DLE configuration
- Joe configuration
- Explanations and concepts
- SQL optimization workflow with Database Lab
- Configuration details
- Why and how SQL optimization is possible on thin clones
- Security aspects
- Secure and robust test/staging environments
- Interactive demo
- Basic DLE features
- SaaS features
- SQL optimization using Joe bot (WebUI)
- DB migration testing (GitHub Actions)
- Advanced examples, use cases
- Video demonstrations