April 29, 2024
Trustworthy database environments for CI/CD pipelines
This is one section of our complete 12-part guide to CI/CD for databases - get your copy.
To ensure a flawless production deployment, we must verify that the “lower” or “non-production” environments being used for batch testing and process rehearsals are trustworthy and reliable.
If the lower environments are not trustworthy, they become a source of inconsistency when the team inspects the results of a batch and the process that applied it. Any such inconsistencies cast doubt on the process and mean that the team must invest excessive time and effort to verify that:
- The only changes present are the expected ones
- The batches being applied were the sole source of the changes appearing in the database
- Nothing is missing from the database
What is trust?
A trustworthy database in a CI/CD pipeline environment is dependable, reliable, and provides confidence in the integrity of its data, structure, and availability. For a database in a validation (pre-production) environment, all three things need to be true:
- Know your state: The state of the database must be precisely knowable BEFORE new changes are applied
- One path for changes: There are no sources of change to the database outside of the inbound batch
- Validation = Production: The state of the database is guaranteed to be truly representative of what Production will look like at the same point/version
If these points are always true, then we can minimize or eliminate environmental issues when inspecting the database after the batch has been applied. Let’s dive into each.
Know your state
There are two parts to making sure you always know the state of the database you are working with.
- Ensuring that the schema version is easily knowable by both people and automation scripts
- Having the ability to reset the database to a known state at will
Both people and automation scripts know the state
The first part is ensuring that the state of the database — effectively its schema version — is easily knowable by both people and automation scripts. This requires a machine-readable shorthand structure in the managed database that can be checked and compared against incoming batches to determine which batches are already present, which changes (if any) need to be backfilled to bring the database to a current version, which batches should be executed, and so on.
This shorthand tracking structure exists because it is impractical to fully scan a database every time: full scans take too long and can be fooled by environment-specific variations. This is why migration-based tools such as Liquibase maintain change tracking tables in their target databases, giving them an instant, ever-present way to reconcile inbound changes against the list of changes already in the database.
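As a rough illustration, here is a minimal sketch of how an automation script might read Liquibase's DATABASECHANGELOG tracking table to see which changesets are already applied. The connection string, driver choice, and helper name are assumptions for the example; in practice, `liquibase status` performs this reconciliation for you.

```python
# Minimal sketch: read Liquibase's DATABASECHANGELOG tracking table to see
# which changesets are already applied. The connection string is a placeholder.
import psycopg2  # assumes a PostgreSQL target; swap in your own driver


def applied_changesets(conn_str: str) -> set[tuple[str, str, str]]:
    """Return (id, author, filename) for every changeset already in the database."""
    with psycopg2.connect(conn_str) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, author, filename FROM databasechangelog ORDER BY orderexecuted"
        )
        return {(row[0], row[1], row[2]) for row in cur.fetchall()}


# A CI job can compare this set against the changesets in the inbound changelog
# to decide what still needs to run -- the same check `liquibase status` gives you.
conn_str = "dbname=validation user=ci_bot host=db.example.internal"  # hypothetical
print(f"{len(applied_changesets(conn_str))} changesets already applied")
```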
Reset at will
Once basic version tracking is handled, the second part is the ability to reset the database to a known state at will. The obvious benefit is being able to get back to a known state quickly when something goes wrong, but that is only the most basic use.
The more advanced use is to apply a good reset capability proactively, not just reactively, and avoid a whole class of problems in the first place. Frequent, proactive resets are especially valuable in the validation stage because they ensure that every deployment starts from a known state. They minimize or eliminate the chance of “drift” - whether untracked changes, something ‘leftover’ from earlier testing, or similar - tripping you up. This is why so many DevOps practices lean on fast resets of entire environments; some go so far as to completely replace the environment for each test cycle.
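A proactive reset step can be as simple as wiping the managed schema and replaying the changelog at the start of every validation cycle. The sketch below assumes the Liquibase CLI is on the path and that connection details live in a properties file; the file names are placeholders.

```python
# Sketch of a proactive reset step run at the start of every validation cycle:
# wipe the managed schema, then rebuild it from the changelog so the deployment
# always starts from a known state. Property and changelog files are placeholders.
import subprocess


def reset_validation_db(changelog: str = "changelog/db.changelog-master.xml") -> None:
    # `drop-all` removes the objects Liquibase manages in the target schema.
    subprocess.run(
        ["liquibase", "--defaults-file=liquibase.validation.properties", "drop-all"],
        check=True,
    )
    # Replaying the changelog brings the empty schema back to the current version.
    subprocess.run(
        [
            "liquibase",
            "--defaults-file=liquibase.validation.properties",
            f"--changelog-file={changelog}",
            "update",
        ],
        check=True,
    )


if __name__ == "__main__":
    reset_validation_db()
```

Teams with snapshot-based tooling would swap the drop-and-replay step for a restore from a known-good snapshot, which is discussed under the second pattern below.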
One path for changes
Once you have constant visibility into the condition of your databases, you must ensure there is only one closely guarded path for changes to get into the database in any environment. As noted above, you cannot truly know the state of an environment if someone or something can invisibly change it and cause it to “drift”. These are called ‘out-of-band’ changes. Depending on your technical capabilities, there are generally two patterns for dealing with them.
Path 1: If fast refreshes are not possible
The first pattern is for the case where you do not, or cannot, have fast refreshes of your environment. In this case, you must restrict access to the validation environments so that, outside of true emergencies, only the automation tools can log in and make changes.
The analogy here is breaking the glass on a fire alarm: it is permitted when there is a true need, but if you do it, everyone is going to know about it.
When teams use this approach, the emergency process starts with someone retrieving the credentials from a secure location - such as a password vault - where the act of retrieval cannot be hidden. That person then makes whatever fixes and adjustments are required.
Once the event is over and the problem is corrected, two things must happen to reset the system. First, the credentials should be changed to reset the ‘fire alarm’. Then, the tools that automatically process the change batches must be updated to be aware of whatever adjustments were made during the incident.
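As a rough sketch of what that cleanup can look like: the example below assumes a generic secrets-manager CLI (the `vault-cli` command and its arguments are hypothetical) and uses Liquibase's `diff-changelog` command to capture any manual adjustments as changesets, with connection and reference details assumed to live in liquibase.properties.

```python
# Sketch of the post-incident cleanup. The vault command is hypothetical; the
# Liquibase target and reference URLs are assumed to come from liquibase.properties.
import subprocess


def close_break_glass_incident() -> None:
    # 1. Rotate the emergency credentials so the "fire alarm" is reset.
    #    `vault-cli rotate` stands in for whatever your secrets manager provides.
    subprocess.run(["vault-cli", "rotate", "--secret", "validation-db-admin"], check=True)

    # 2. Capture whatever was changed by hand during the incident as changesets,
    #    so the automated pipeline knows about them. `diff-changelog` compares the
    #    touched database against the reference and writes the differences out.
    subprocess.run(
        [
            "liquibase",
            "--changelog-file=changelog/incident-fixes.xml",  # hypothetical file
            "diff-changelog",
        ],
        check=True,
    )
    # The generated changesets are then reviewed and folded back into the main
    # changelog, keeping a single path for changes into every environment.


if __name__ == "__main__":
    close_break_glass_incident()
```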
Path 2: If you have advanced reset capabilities
The second, more ideal, pattern is for when you do have advanced reset capabilities. This pattern still prefers that only tools can access environments, but adds the ability to automatically re-create environments at a known point in time from a known-good snapshot. The snapshot can be schema-only or it may include data, depending on the situation - though, as we will see, data brings other considerations.
Authoritative schema snapshots are usually based on Production, which is generally well secured, so the chances of unexpected changes start low. As long as you can ensure the integrity of Production, you can use an automated process to generate snapshots of it at will.
While this second path is more difficult to set up, there are many benefits to this approach:
- Cost savings from not keeping test databases around in between cycles
- No chance of drift between tests (assuming a reset every cycle)
- Added security from easily verified checksums of snapshot files at rest (see the sketch after this list)
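As a rough illustration of the snapshot-and-checksum idea, here is a minimal sketch assuming a PostgreSQL Production database, `pg_dump` as the snapshot mechanism, and placeholder file and database names. Liquibase's own `snapshot` command or a storage-level clone could serve the same purpose.

```python
# Sketch: dump the Production schema (no data) as a known-good snapshot and keep
# a checksum so the file can be verified before any validation environment is
# rebuilt from it. File and database names are placeholders.
import hashlib
import subprocess

SNAPSHOT_FILE = "snapshots/prod-schema.sql"


def _sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def snapshot_production_schema() -> str:
    """Capture the Production schema and record its checksum."""
    with open(SNAPSHOT_FILE, "w") as out:
        subprocess.run(
            ["pg_dump", "--schema-only", "--no-owner", "--dbname=prod_db"],
            stdout=out,
            check=True,
        )
    digest = _sha256(SNAPSHOT_FILE)
    with open(SNAPSHOT_FILE + ".sha256", "w") as f:
        f.write(digest + "\n")
    return digest


def verify_snapshot() -> bool:
    """Before rebuilding a validation environment, confirm the snapshot is untouched."""
    with open(SNAPSHOT_FILE + ".sha256") as f:
        expected = f.read().strip()
    return expected == _sha256(SNAPSHOT_FILE)
```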
Validation database must be representative of Production
The third dimension of trustworthiness is that the databases in validation environments must be truly representative of Production as it will look when a new batch of changes is applied. This implies both structure and data.
If the validation environments are not representative of Production, then they are of little value for testing and even less value as a rehearsal for deploying the new changes in Production.
Test data management
The first two points about state and ensuring a single path for changes do not really address data as anything more than part of the configuration. However, data must be considered as an intrinsic part of a validation environment if it is to serve as a true representative of Production.
For example, while data-centric practices such as Test Data Management (TDM) are arguably separate disciplines from CI/CD, most application test practices used in CI/CD rely on good test data. So, while TDM itself is well beyond our discussion here, we do need to address what we should expect from it in order to have a trustworthy test database.
There are two primary requirements for TDM within CI/CD environment management. First, the datasets need to be the smallest required to accomplish the testing task and no larger. Smaller datasets make environments easier and faster to reset, which supports the overall initiative.
Second, the data must be secure and compliant with all applicable privacy standards. This is more than the typical security reminder: the more snapshots and copies of data there are, the higher the security risk. Keeping the data masked, encrypted, and so on is an important consideration, lest our use of that data in a ‘lower’, and potentially less secure, environment become a breach.
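Here is a minimal sketch of what "smallest and masked" can look like, with a hypothetical `customers` table and column names. Real TDM tooling would replace this, but the shape is the same: export only what the tests need and substitute stable fakes for anything sensitive.

```python
# Minimal sketch of producing a small, masked test dataset. The table and column
# names are hypothetical; dedicated TDM tooling would normally do this work.
import csv
import hashlib

import psycopg2


def export_masked_customers(conn_str: str, limit: int = 1000) -> None:
    """Export only the rows needed for testing, with PII replaced by stable fakes."""
    with psycopg2.connect(conn_str) as conn, conn.cursor() as cur:
        cur.execute("SELECT id, email, full_name, plan FROM customers LIMIT %s", (limit,))
        with open("testdata/customers.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "email", "full_name", "plan"])
            for cid, email, name, plan in cur.fetchall():
                # Deterministic masking keeps rows distinguishable without exposing PII.
                token = hashlib.sha256(email.encode()).hexdigest()[:8]
                writer.writerow([cid, f"user_{token}@example.com", f"Customer {token}", plan])
```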
Having validation environments you can trust is crucial to a low-friction, automated CI/CD process. No matter how well the changes and deployment processes are defined and executed, if the target environments are unstable, the results will be unpredictable. Being able to track and control changes into a Production-representative environment reduces friction and the time spent flowing changes through the pipeline. It also helps people put more faith in increasingly automated testing.
We have now discussed creating batches of changes, ensuring that they are ready, and reliably deploying those batches. This covers the basic activities required for a CI/CD flow. With that out of the way, we are going to shift focus to optimization and management activities around CI/CD — starting with the next section on sandbox databases.
Ready to dive into all 12 parts of the complete CI/CD for databases guide? It covers:
- The goals & challenges of designing CI/CD pipelines
- Batching database changes
- Automating database “builds”
- Ensuring quality database changes
- Reliable and predictable database deployments
- Trustworthy database environments
- Sandbox databases
- Database governance for consistency, quality, and security
- Observability for database CI/CD
- Measuring database CI/CD pipeline performance
- Handling and avoiding problems with database changes
- Bring CI/CD to the database with Liquibase