August 30, 2024
Overcoming the challenges of large-scale database change management
When one of the world’s largest financial institutions experienced a sudden performance issue with a critical Liquibase command, our team sprang into action. What should have taken the customer seconds was dragging on for 15 minutes or more, revealing a significant challenge in handling massive deployments with tens of thousands of ChangeSets.
This unexpected slowdown highlighted a crucial pain point in rapidly scaling database pipelines. Although Liquibase is known for its ability to bring DevOps automation, governance, and observability to database pipelines of any type, architecture, and scale, this issue showed us there was room to improve.
Our engineering team quickly replicated the issue, identified the root cause, and delivered a solution to every customer. Here’s how we tackled the problem to ensure Liquibase continues to support ever-expanding data stores with speed and reliability.
Replicating the issue
To replicate our Liquibase Pro users’ issue – slower performance on larger datasets – we could run a simple test:
- Deploy a large number of ChangeSets to the database
- Run a simple command like liquibase release-locks
However, deploying tens of thousands of ChangeSets from scratch would be time-consuming and unnecessary. Instead, we used the Structured Logs provided by the customer to identify the underlying issue: querying the DATABASECHANGELOG (DBCL) table.
So, to replicate the customer’s performance issue, we didn’t create 10,000+ ChangeSets – we just filled the DATABASECHANGELOG table with test entries:
-- Populate the DATABASECHANGELOG table with 40,000 synthetic entries to
-- simulate a large deployment history without running any ChangeSets.
BEGIN
  FOR i IN 1..40000 LOOP
    INSERT INTO DATABASECHANGELOG(ID, AUTHOR, FILENAME, DATEEXECUTED, ORDEREXECUTED, EXECTYPE, MD5SUM, DESCRIPTION, COMMENTS, TAG, LIQUIBASE, CONTEXTS, LABELS, DEPLOYMENT_ID)
    VALUES (TO_CHAR(i), 'your.name', 'big.sql', TIMESTAMP '2024-04-24 16:12:31.507774', 100, 'EXECUTED', '9:22921d38f361c5f294e6a5092bb1b654', 'sql', 'example comment', NULL, 'DEV', 'example-context', 'example-label', '3975120367');
  END LOOP;
  COMMIT; -- persist the test rows so Liquibase sees them from its own session
END;
/
Now, with our DBCL table populated with generated entries, we can run our command: liquibase release-locks.
$ time liquibase release-locks
Starting Liquibase at 11:55:08 (version [local build] #0 built at 2024-04-24 16:46+0000)
Liquibase Version: [local build]
Liquibase Pro [local build] by Liquibase licensed to LB Internal Engineering License until Tue Apr 30 19:00:00 CDT 2024
Successfully released all database change log locks for '***@jdbc:oracle:thin:@***.rds.amazonaws.com:***'
Liquibase command 'releaseLocks' was executed successfully.
real 3m20.455s
And there we have it – more than three minutes to release locks on a database with 40,000 ChangeSets. So how do we fix it and bring these minutes back down to seconds?
Fixing change command delays on high-volume databases
To find a solution, we turned to the logs.
The logs pointed us to a query we had modified during the CheckSum version upgrade. The change had seemed harmless at the time: to answer the question “have all of the CheckSums in the database been updated?”, the old code checked only the first row of the table, which looked like an obvious error. Surely modifying the query to check all of the rows would be a simple fix to an obvious bug!
We realized that loading the entire contents of the DBCL table into memory just to check the MD5SUM column introduced a big performance issue. Fortunately, this big issue also needed only a simple fix: properly filtering our query to return only the data we’re interested in.
We didn’t actually care about the entire table, only the rows that did not match the current checksum version. So we added a filter to the query, as sketched below.
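To illustrate the shape of the change – this is a minimal JDBC sketch, not Liquibase’s actual internal code, and findOutdatedChecksums is a hypothetical name – note that MD5SUM values carry a checksum version prefix (like the 9: in our test data above), so the database itself can exclude rows that are already current:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ChecksumScanSketch {
    // Hypothetical sketch: instead of SELECTing all of DATABASECHANGELOG and
    // checking MD5SUM in memory, push the filter into SQL so only rows with
    // an outdated checksum version ever leave the database.
    static void findOutdatedChecksums(Connection conn) throws SQLException {
        String sql = "SELECT ID, AUTHOR, FILENAME, MD5SUM "
                   + "FROM DATABASECHANGELOG "
                   + "WHERE MD5SUM IS NOT NULL AND MD5SUM NOT LIKE ?";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, "9:%"); // current checksum version prefix
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // Only ChangeSets still needing a checksum upgrade arrive here.
                }
            }
        }
    }
}
With the filter in place, the liquibase release-locks command performs much better: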
$ time liquibase release-locks
Starting Liquibase at 12:18:38 (version [local build] #0 built at 2024-04-24 17:15+0000)
Liquibase Version: [local build]
Liquibase Pro [local build] by Liquibase licensed to LB Internal Engineering License until Tue Apr 30 19:00:00 CDT 2024
Successfully released all database change log locks for '***@jdbc:oracle:thin:@***.rds.amazonaws.com:***'
Liquibase command 'releaseLocks' was executed successfully.
real 0m4.632s
Under five seconds, down from over three minutes – fantastic!
Now that the problem was fixed, we dug deeper to ask: why was it so slow to begin with? Loading 40,000 rows into memory might not be fast, but it definitely shouldn’t take three minutes.
This, too, was a simple fix: increase the fetch size on our queries. The fetch size tells the JDBC driver how many rows to return per round trip to the database. The issues were being reported on Oracle, and Oracle’s driver defaults to a fetch size of 10 – meaning every 10 rows requires a new communication with the database, so reading our 40,000-row table took roughly 4,000 round trips.
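In plain JDBC terms – again a minimal sketch rather than Liquibase’s internal code, with countChangeSets as a hypothetical helper – the fix amounts to a single setFetchSize call before iterating the results:
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FetchSizeSketch {
    static int countChangeSets(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // Ask the driver for 1000 rows per round trip instead of Oracle's
            // default of 10: 40,000 rows now means ~40 trips, not ~4,000.
            stmt.setFetchSize(1000);
            int rows = 0;
            try (ResultSet rs = stmt.executeQuery("SELECT ID FROM DATABASECHANGELOG")) {
                while (rs.next()) {
                    rows++; // iterating transparently triggers each batched fetch
                }
            }
            return rows;
        }
    }
}
The fetch size is only a hint to the driver, so a larger value simply trades a modest amount of client-side memory for far fewer network round trips.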
We opted to set the fetch size to a default of 1000 (for all databases, not just Oracle), and immediately observed notable performance improvements on all Liquibase commands. The liquibase history command, for instance, saw a 77% decrease in the time it took to run. The complete set of commands and improvements includes:
- release-locks: 65% faster
- validate: 77% faster
- update-sql: 42% faster
- update: 37% faster
- history: 77% faster
- status: 78% faster
These speed and efficiency gains add up to significant time and resources saved for enterprises managing thousands – or tens of thousands – of ChangeSets.
Supporting speed & agility for large-scale database pipelines
Since solving the issue that caused performance degradation on massive deployments – 10,000+ ChangeSets – our engineering team has been looking for other ways to support and enhance database scalability. As we innovate, we keep in mind what we learned from bug fixes like this one:
- The importance of efficiency at scale: Small inefficiencies, like querying unnecessary data, can cause major performance issues in large-scale systems, so always optimize your code for scalability.
- Intentional, disciplined bug-fixing: Fixing bugs outside a structured process can introduce new problems, underscoring the importance of sticking to documented tickets and thorough testing.
- Continuous improvement: Every performance issue is an opportunity for refinement, so embrace iterative improvement to make your systems more robust over time.
- A constant scalability mindset: As your systems grow, regularly reassess your tools and processes to ensure they scale effectively and continue to meet evolving demands.
Engineering teams can take this advice for their own scalability initiatives – and have confidence that Liquibase can handle their rapidly increasing database change management needs.
Download the latest version of Liquibase to manage high-velocity change workflows across 60+ supported databases.