April 25, 2024
Handling and minimizing problems with database changes
This is one section of our complete 12-part guide to CI/CD for databases - get your copy.
CI/CD is an engineering practice geared toward change management. With any such process, no matter how automated, you must address what to do when something goes wrong.
This is particularly important because modern software systems are made of many parts. That complexity means a problem with just one part can force changes to any or all of the others before the system runs properly again. So each piece of the puzzle, including the database, needs a clear plan for handling problems.
Traditional thinking about database changes is rooted in two options: rolling back or fixing forward. In practice, it is more pragmatic to build a layered defense strategy for dealing with problematic database changes efficiently. Working from the beginning of the pipeline forward to production, the following elements help avoid and handle problems in the database CI/CD pipeline:
- Shift left to avoid problems
- Build fixes into changes – and test them
- Handle hotfixes consistently
- Deliberately handle exceptions
- Rolling back and fixing forward
Shift left to avoid problems
It is a well-understood principle in software engineering that it is more efficient to catch bugs early, before they have had time to affect the work of other developers or the system as a whole. In CI/CD, this principle is captured in the mantra of “shift left”: the entire software delivery pipeline is geared toward identifying problems as close as possible to their point of creation, the developer.
The premise is simple:
The sooner you detect a problem, the less impact it has and the less rework it requires.
This is why previous sections in this guide focused on ensuring change quality and using an automated set of quality checks in a CI cycle right as the changes enter the pipeline.
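As one concrete illustration of shifting a check left, Liquibase’s SQL-formatted changelogs let you embed a precondition that halts a change before it applies if the database is not in the expected state. This is a minimal sketch; the table and column names are hypothetical:

```sql
--liquibase formatted sql

--changeset appteam:add-order-status
--preconditions onFail:HALT onError:HALT
--precondition-sql-check expectedResult:0 SELECT COUNT(*) FROM information_schema.columns WHERE table_name = 'orders' AND column_name = 'status'
-- The precondition halts the deployment early if the column already
-- exists, surfacing the conflict at CI time rather than in Production.
ALTER TABLE orders ADD status VARCHAR(20);
--rollback ALTER TABLE orders DROP COLUMN status;
```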
Build fixes into changes – and test them
Inevitably, problems will occur when changing a system. Even when the database change is not the cause, backing out an application change may require removing related database changes as well. Therefore, it is crucial to know in advance how changes will be removed and to have confidence that the removal procedure actually works.
When defining your database changes in CI/CD, you should use small, well-defined changes and associate a specific ‘fixing’ script with each one. Note that we are deliberately not using the term ‘rollback’ here. While the script will often simply reverse the change, that is not always how problems are actually corrected in databases.
Independent of how the ‘fixing’ script actually operates, it should be programmatically tested every time the change is deployed to an environment. The goal is to exercise the ‘fixing’ script in lower environments before it is ever needed in Production. Don’t wait until an emergency to test these fixes; build the testing into the pipeline so you have confidence they will work in a real situation.
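To make this concrete, here is a minimal sketch of pairing a change with its ‘fixing’ script using Liquibase’s SQL-formatted changelog syntax, where a `--rollback` comment stores the fix with the changeset it corrects (table and column names are hypothetical):

```sql
--liquibase formatted sql

--changeset appteam:add-customer-email
-- The change itself: one small, well-defined modification.
ALTER TABLE customer ADD email VARCHAR(255);
-- The paired 'fixing' script, versioned alongside the change.
--rollback ALTER TABLE customer DROP COLUMN email;
```

Because the fix travels with the change, a CI job can exercise it on every lower-environment deployment, for example with Liquibase’s `update-testing-rollback` command, which applies pending changes, rolls them back, and re-applies them.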
Handle hotfixes consistently
Eventually, a worst-case scenario will arise: an emergency change is needed because something unforeseen has happened. This requires a rapid response and a way to get a database change through the pipeline quickly. Deliberately designing a ‘hotfix’ path into your pipeline ensures you can handle the situation in an organized way, with no need for heroic effort or high-risk manual hacks.
The hotfix path should answer two questions:
- What is the minimum bar for acceptable checks we must have in place for a database change to bypass the pipeline and go to Production?
- How are we capturing the hotfix change so that it gets applied to the rest of the pipeline to avoid regression and ensure quality?
At minimum, these two questions should be answerable by the ‘shift left’ checks mentioned above. Those checks define the minimum standard for a database change to enter the pipeline, so they should also serve as the minimum safety standard for an emergency or hotfix change. Additional checks are certainly possible, but needing them may indicate that the shift-left process itself should be strengthened.
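One way to address the second question is to capture the emergency change in the same changelog format as every other change, so that after it is applied out of band to Production, the rest of the pipeline picks it up on the next normal deployment. A hedged sketch using a hypothetical `hotfix` context in Liquibase’s SQL-formatted changelog:

```sql
--liquibase formatted sql

--changeset dba:emergency-orders-index context:hotfix
-- Applied through the fast path, but recorded in the shared changelog
-- so lower environments receive the same change on their next update.
CREATE INDEX idx_orders_status ON orders (status);
--rollback DROP INDEX idx_orders_status;
```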
Rolling back and fixing forward
When the topic of dealing with problematic database changes in a software delivery pipeline comes up, one of the first ideas raised is the classic rollback: if you set something back the way it was before, it will return to a previously known good configuration.
While that is usually true for hardware, operating system settings, and versions of application binaries, databases are more complicated and not always so easy to ‘reset’ to a previous state. That is especially true when the reset puts data at risk or affects the production environment.
So, while “rollback” is the most common way for IT shops to handle bad changes, undoing a database change is not the same as undoing an application binary. An application binary can simply be replaced with the old, unchanged one; two fundamental truths about databases make database rollback more complicated.
First, database changes are cumulative
Databases are stateful. Any action that adds or removes changes is, in effect, a forward change. This raises the slightly academic question of whether a database can ever truly be rolled “back” or whether it is really rolling “forward” to a new state that resembles an older one. Academic or not, the distinction has real technical ramifications for how problem-handling processes must be designed.
Second, database changes may not be directly reversible
“Rollback” or “undo” implies the direct reversal of changes. On the surface, that might seem straightforward. However, because databases contain data, not just structure and logic, a sequence of changes may not be reversible in the opposite order. Even when changes are directly reversible, reversing them can be less efficient than simply applying a new fixing change.
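A destructive change illustrates the point. Dropping a column has no true mirror image, because reversing the DDL can restore the structure but not the data it removed. A sketch with hypothetical names:

```sql
--liquibase formatted sql

--changeset dba:drop-legacy-status
-- Destructive change: the column's contents are gone once this runs.
ALTER TABLE orders DROP COLUMN legacy_status;
-- A mirror-image 'rollback' recreates the column but not its data,
-- so this is a reversal in name only.
--rollback ALTER TABLE orders ADD legacy_status VARCHAR(20);
```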
But, you still may need a rollback strategy
Classic IT management approaches rely on rollback as the safety net for when a database change goes wrong. Many organizations remain inflexibly tied to those older frameworks and therefore force a rollback-centered solution even while admitting it is technically dubious for a database. This leads teams to force-fit reversal scripts or to create remediations that actually go ‘forward’ while documenting them as rollback plans. Neither practice is particularly healthy.
The better, if sometimes culturally difficult, modern best practice is to bundle every database change with its corresponding fix script, as discussed above. It may take some effort at the beginning to get people comfortable with the approach, but the benefits quickly become apparent: problems genuinely get handled faster.
What about fixing forward?
A popular contrast to rolling back is to roll forward, or fix forward. A forward approach prioritizes getting the new business value of changes and features into the hands of users; it assumes the opportunity cost of features waiting for the next release cycle is unacceptable. From a database perspective, “fix forward” better accounts for the cumulative nature of database changes and for the fact that the forward sequence of changes is not necessarily reversible in the same order.
Therefore, rather than reverting to an older configuration and waiting for the ‘next release’ to deal with problems, the team diagnoses the problem and, using techniques such as those discussed in this guide, pushes a fix to the problematic system very quickly.
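As a hypothetical example of fixing forward: suppose an earlier changeset added a NOT NULL column with no default, so application inserts that omit the column now fail. Rather than reversing that change, the team ships a new changeset that makes it safe (names are illustrative):

```sql
--liquibase formatted sql

--changeset appteam:fix-loyalty-tier-default
-- Forward fix: keep the new column, but give it a default so existing
-- insert statements that do not reference it stop failing.
ALTER TABLE customer ALTER COLUMN loyalty_tier SET DEFAULT 'standard';
--rollback ALTER TABLE customer ALTER COLUMN loyalty_tier DROP DEFAULT;
```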
Whether your organization uses a ‘forward’ or a ‘backward’ approach, the most important thing is to proactively manage individual changes and have an automated and tested fix packaged with each one.
Understanding the common approaches organizations take to problem changes is important. These expectations shape the standards your organization will hold a CI/CD pipeline to and will influence your solution’s design. They also help you design a pipeline that handles problems efficiently end to end.
Ready to dive into all 12 parts of the complete CI/CD for databases guide? It covers:
- The goals & challenges of designing CI/CD pipelines
- Batching database changes
- Automating database “builds”
- Ensuring quality database changes
- Reliable and predictable database deployments
- Trustworthy database environments
- Sandbox databases
- Database governance for consistency, quality, and security
- Observability for database CI/CD
- Measuring database CI/CD pipeline performance
- Handling and avoiding problems with database changes
- Bring CI/CD to the database with Liquibase