April 6, 2024
Sandbox databases in CI/CD pipelines
See Liquibase in Action
Accelerate database changes, reduce failures, and enforce governance across your pipelines.
This is one section of our complete 12-part guide to CI/CD for databases - get your copy.
Once we have an automated CI/CD pipeline, it puts pressure on the database change creators to get things right the first time. In this section, we go all the way back to the creation of the change itself.
No one wants to be the person who “broke” the CI build or to cause a disruption in the flow of changes from other change creators. That means that we must equip them with as much tooling as possible (or at least practical) to enable them to make good decisions as they create the database changes.
Hence, the sandbox database concept. Like a developer’s sandbox, it’s a place to safely experiment and understand impacts of changes without disrupting the work of others.
A sandbox is just a copy of a ‘real’ pipeline database that:
- Runs on the same database engine version
- Contains an object structure that accurately reflects the pipeline (including stored logic)
- Contains just enough data to run tests to verify any new changes
The concept sounds simple, but reliably making those things available to a change creator, such as a developer, can be a bit more challenging. While some of those problems, such as the license economics of some database platforms, are to be solved elsewhere, many of the other problems become much easier once you have the environment management portion of the CI/CD pipeline sorted.
Beyond the simple conceptual aspects of a sandbox database, there are three key capabilities for the sandbox to be truly effective.
On-demand refresh
The individual change creator must be able to refresh it themselves when they need to do so. Consider that they are deliberately experimenting to figure out what any given change should look like. The nature of experiments means that they fail often. Further, the failure is, by definition, unpredictable.
This means that the need for a refresh will happen at an unpredictable time. Given that a change creator’s productivity is related to their ability to experiment, they cannot simply wait for a daily, weekly, monthly, or some other long-interval refresh. It must happen as needed.
This is one reason why desktop-based container management is so popular. But even with easy access to the database engine, you immediately hit the question of schema version and what data will be present for the developer as they begin work.
At a known point
The sandbox must be current to a known point every time it is refreshed. While you can implement from a snapshot or similar construct, the nuance arises that snapshots age and cannot always be refreshed immediately. There will always be a number of changes that precede the one the creator is working on but they are not yet present in the snapshot.
So, your system for refreshing sandboxes starts to look a lot like your system for refreshing the CI or test environments. The main difference is that because of the experimental nature of the sandbox environment, you may need to give the change creators some control over which of the changes are applied in a refresh.
Speed of provisioning and refreshing
The more quickly a sandbox can be set up, experimented in, and reset as needed, the better. This means less time for a change creator to wait to verify their work.
This has a direct positive impact on:
- The number of experiments they can perform
- Their ability to iterate toward a solution
- General productivity
- Overall morale and satisfaction
Keeping sandboxes in perspective
The most important thing to remember is that no sandbox database can be perfectly representative of the real database. A perfect sandbox would be a full copy of PRODUCTION with all the latest ‘under test’ changes from the pipeline associated with the codestream the developer is working on. That’s clearly impractical due to:
- Size
- Data privacy requirements
- Network and load balancing
- Capacity costs
A sandbox is never authoritative — it is a local productivity tool to help the change creator make good decisions about the changes they are creating.
It is the integration environment at the CI point, where changes from multiple developers come together, that must always take precedence when deciding what is ‘correct’ - because it is the first truly representative and front-line defender of the pipeline. If that sounds like a bit of a hard line, that is because the pipeline includes PRODUCTION.
This is not really so different from the principle that a developer saying “it works on my machine”. That simply does not matter if it does not work in the build or QA environments. If a change works in one environment and not in another, there is a serious problem with environment management that must be fixed lest there be a serious productivity impact for the whole team.
Well-managed sandbox databases are a productivity boon for a development team. A key part of getting the most from them is ensuring that you have matching capabilities for the database layer of your system.
Ready to dive into all 12 parts of the complete CI/CD for databases guide? It covers:
- The goals & challenges of designing CI/CD pipelines
- Batching database changes
- Automating database “builds”
- Ensuring quality database changes
- Reliable and predictable database deployments
- Trustworthy database environments
- Sandbox databases
- Database governance for consistency, quality, and security
- Observability for database CI/CD
- Measuring database CI/CD pipeline performance
- Handling and avoiding problems with database changes
- Bring CI/CD to the database with Liquibase