Database Lab data masking and obfuscation
#
Data masking: "pre-masking"Allows setting up dynamic masking rules without actual data changes. According to these rules, PII (personally identifiable information) data is masked only when certain users access it.
#
Pre-masking, option 1а: dump without PII directly from a production serverDatabase Lab retrieves the data from production in the form of an anonymized logical dump. This approach requires additional masking set up on the primary production database. Dumping may happen on a replica.
#
Pros- Very secure: PII is stored only on production and does not reach the Database Lab Engine
- In some cases, easier to set up: does not require an additional Database Lab Engine
#
Cons- May require installation of additional extensions on production servers.
- Anonymization affects production servers' performance: increased disk IO, bloat growth and/or replication lags.
- Data structure is not identical to production, making SQL troubleshooting and optimization less reliable.
#
Pre-masking, option 1b: anonymized dump using additional Database Lab EngineDatabase Lab Engine in the production infrastructure is used to create an anonymized dump. Database Lab Engine in the test/dev/staging environment retrieves the data from the production in the form of an anonymized dump.
#
Pros- Very secure: PII is stored only on production and does not reach the Database Lab Engine
- Data anonymization is not affecting production servers
- Data recovery and heavy analytics queries are not affecting production servers (executed on the "production" Database Lab Engine server)
#
Cons- Requires an additional Database Lab Engine
- Data structure is not identical to production, making SQL troubleshooting and optimization less reliable
#
Data masking: "post-masking"Allows to set up dynamic masking rules, without actual data changing. Data is dynamically masked on the side of the Database Lab Engine.
#
Post-masking, option 2a: Database Lab Engine on productionDatabase Lab Engine is deployed only on production infrastructure, and it physically stores PII. Depending on the access level, developers may or may not have access to the unmasked data.
#
Pros- Very secure: PII is stored only on the production environment (the Database Lab server also has it, but it also resides in the production environment)
- Physical data layout is identical to production, best conditions for SQL troubleshooting, and optimization
- Data anonymization is not affecting production servers
#
Cons- High requirements for security administration
- More difficult to configure and maintain access (developers need to deal with the production environment; automation scripts need to reach the Database Lab Engine, which may cause complications)
#
Post-masking, option 2b: Database Lab Engine in Test/Dev/StagingDatabase Lab Engine is deployed only on test/dev/staging infrastructure, and it physically stores PII. Developers work with masked data.
#
Pros- Identical data structure for development and optimization
- Physical data layout is identical to production, best conditions for SQL troubleshooting, and optimization
- Data anonymization is not affecting production servers
- Data recovery and heavy analytics queries are not affecting production servers
#
Cons- PII is physically copied from the production infrastructure and stored non-production (but cannot be accessed by developers).
- High requirements for security administration of test/dev/staging environments.
#
Data obfuscationInstead of masking, the data can be deleted permanently, e.g., during snapshot creation. Usually, this is a massive change requiring significant time and disk space.
Database Lab Engine supports obfuscation of any type via injecting an SQL transformation to the process of preparation of snapshots for thin clones.
Options:
- Use custom obfuscation script (define it using
preprocessingScript
option oflogicalSnapshot
orphysicalSnapshot
jobs) - Use PostgreSQL Anonymizer: Permanently remove sensitive data
#
Pros- PII data anonymization not affecting production servers
- Data recovery and heavy analytics queries are not affecting production servers
- Moderately difficult to set up
#
Cons- Data structure is not identical to production, making SQL troubleshooting and optimization less reliable
- PII is physically copied from the production infrastructure (but cannot be accessed by developers)
- High requirements for security administration of test/dev/staging environments