July 26, 2023
What is data integrity in DBMS? Protecting data integrity in database change management
Change is the only constant in the world of databases and pipelines.
Perhaps add “sprawl” to that short list of database constants, because we're capturing and creating more data every second.
But if you want data you can rely on, extract value from, and share effectively, you can't make database changes without keeping a pulse on data integrity in the database. And as data sprawl accelerates, any manual process for managing change and ensuring integrity will quickly implode, or at the very least bottleneck your development workflows.
First, a recap on data integrity in general.
What is data integrity? Why is it important?
Data integrity makes sure every person, team, and platform that wants to interact with your data and development ecosystem can do so with speed, accuracy, and efficacy. When your data is properly created, stored, shared, and leveraged while maintaining critical structure, accuracy, value, and form, it has integrity.
Trust, quality, reliability, and security all factor into and hinge on data integrity. Maintaining data integrity ensures that information remains unaltered and trustworthy, protecting against unauthorized access, accidental modifications, and data corruption. Without data integrity, the foundation of data-driven decision-making is compromised, leading to potential errors, inefficiencies, and security vulnerabilities.
By ensuring data integrity, organizations can confidently base decisions on reliable data, maintain compliance with industry regulations, and protect sensitive information from unauthorized access or tampering. As systems grow more complex and interconnected, the importance of data integrity only increases, making it a critical component of modern data management practices.
In short, it's about keeping data accurate, useful, and trustworthy throughout its journey.
As software systems evolve and scale, the risk of data corruption increases, especially during database changes or when integrating new components. Ensuring data integrity requires the automation of strong validation mechanisms, consistent data management practices, and regular audits to detect and rectify any inconsistencies. For DevOps teams, data integrity is not just a technical requirement but a key component of delivering reliable, high-quality applications.
Maintaining data integrity requires a combination of best practices, tools, and processes. These include data validation, error-checking, encryption, and secure access controls, all designed to protect data from tampering or loss. By prioritizing data integrity, organizations can build systems that not only function correctly but also maintain the trust of users and stakeholders by ensuring that their data is accurate, consistent, and secure throughout its journey.
Without data integrity, organizations risk data corruption, security breaches, and significant operational inefficiencies, which can lead to costly errors, loss of customer trust, and legal consequences. For application development and DevOps teams, prioritizing data integrity is not just about safeguarding information — it’s about building robust, dependable systems that support business goals and enhance the user experience.
Before covering data integrity at the database layer, there's still more to break down with this general definition of data integrity.
Physical integrity
Physical integrity refers to the tangible state of the database: not whether the 1s and 0s are in the right place, but whether the server is plugged in. Much of this burden is now offloaded to cloud database vendors, but it still means preparing for physical damage such as hardware failures, natural disasters, and other environmental factors. Ensuring physical integrity involves implementing backups, redundant systems, and disaster recovery plans to prevent data loss or security breaches caused by physical damage.
Logical integrity
Is data accurate and consistent within its database? Is it logically sound and conforming to the rules and constraints defined within the database schema? This breaks down further into four categories:
- Entity integrity, which ensures that each table has a unique primary key, preventing duplicate or null values in those key fields
- Referential integrity, which maintains consistent relationships between tables, ensuring that foreign keys correctly reference valid records in related tables
- Domain integrity, which enforces valid entries for a given column by restricting the type, format, or range of values that can be entered
- User-defined integrity, which allows for the creation of additional business rules or constraints that ensure data meets specific organizational requirements
It's easy to see how more frequent schema changes put all four types of logical integrity at risk.
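To make those four categories concrete, here's a minimal SQL sketch (table and column names are hypothetical) showing how each one is typically enforced in a schema:

```sql
CREATE TABLE orders (
    -- Entity integrity: the primary key must be unique and non-null
    order_id      INT PRIMARY KEY,
    -- Referential integrity: the foreign key must reference an existing customer
    customer_id   INT NOT NULL REFERENCES customers (customer_id),
    -- Domain integrity: only values of an allowed type, format, or range
    status        VARCHAR(20) NOT NULL
        CHECK (status IN ('PENDING', 'SHIPPED', 'DELIVERED')),
    -- User-defined integrity: a business rule specific to the organization
    order_total   NUMERIC(10, 2) NOT NULL
        CHECK (order_total >= 0)
);
```

A schema change that drops or loosens any of these constraints quietly removes a layer of integrity protection, which is exactly why change review matters.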
Data integrity in transit
Data integrity in transit becomes a bigger concern as data is aggregated, transformed, and connected across more systems and its journey expands. Data needs to maintain accuracy and consistency as it is transferred between systems or networks. Techniques like encryption, checksums, and error detection/correction are used to ensure that data arrives in its original condition.
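For example, one common checksum technique is comparing row-level hashes on the source and target after a transfer. A minimal sketch using PostgreSQL's md5() function (table and column names are hypothetical):

```sql
-- Run on both the source and the target, then compare the results:
-- any differing or missing hash indicates a row changed in transit.
SELECT customer_id,
       md5(row(c.*)::text) AS row_hash
FROM customers c
ORDER BY customer_id;
```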
Data integrity at rest
While logical integrity concerns the data's alignment with the database's constraints and definitions, and physical integrity concerns protecting the tangible hardware, data integrity at rest covers the data's digital safety: the accuracy and consistency of data stored in databases, files, or other storage systems. Techniques such as encryption, access controls, and regular integrity checks protect stored data from unauthorized access, corruption, or alteration.
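Access controls are often the first line of defense for data at rest. A minimal sketch using standard SQL grants (role and table names are hypothetical):

```sql
-- Remove blanket access, then grant each role only what it needs
REVOKE ALL ON customers FROM PUBLIC;
GRANT SELECT ON customers TO reporting_role;
GRANT SELECT, INSERT, UPDATE ON customers TO app_service_role;
```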
Bringing these concepts to database change management creates a more secure, flexible, and scalable data pipeline.
What is data integrity for databases (DBMS)?
These same concepts extend to your databases, their structures, and the database management system (DBMS) where it all happens.
Database professionals need to consider both the logical and physical integrity of their data. Logical integrity includes:
- Ensuring unique entries
- Building context and relationships between datasets
- Outlining the spectrum of acceptable values to define and limit data
- Enforcing the unique business requirements of your organization's specific sets of data
Meanwhile, physical integrity refers to actual data storage hardware as well as pipelines, APIs, and other touchpoints. It also means protecting your database from the unexpected — hackers, natural disasters, power outages, and internal malicious activity.
Throughout database DevOps, management, deployment, CI/CD, and the entire IT/cloud infrastructure, data integrity is a make-or-break characteristic. With data integrity ensured, database teams see gains in consistency, security, and speed for better, faster, stronger releases.
In the bigger picture, data integrity maintained through database changes means the organization can expand at the speed of rapid application development while making the best data-driven decisions possible. Think of all the big-dollar decisions made within your organization based on dashboards, insights, and other forms of data analysis. If data were damaged or silently transformed while the database underwent updates and changes, the company's future could suddenly hang in the balance.
Where does database data integrity begin?
In simplest terms: at the start of database design. The guidelines and specific workflows, processes, and procedures constructed by database developers are the foundation of data integrity throughout the entire data pipeline, and thus throughout any database changes.
Those database changes are also a key component, or potential weak point, in data integrity. Watertight database change procedures feed directly into a data integrity strategy. By owning data integrity within the database change protocol, database developers can deliver data to their teammates that is accurate, reliable, accessible, up-to-date, and complete.
Imagine a database rollback performed without data integrity safeguards. If data gets lost, altered, or disorganized along the way, the rollback could undo months or years of progress and upend the entire initiative.
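This is where versioned, scripted changes pay off. As a minimal sketch, a Liquibase formatted SQL changeset can carry its own rollback instructions, so a reversal is deliberate and repeatable rather than improvised (the author, ID, and table names here are hypothetical):

```sql
--liquibase formatted sql

--changeset jane.dev:add-loyalty-tier
ALTER TABLE customers ADD COLUMN loyalty_tier VARCHAR(20) DEFAULT 'STANDARD';
--rollback ALTER TABLE customers DROP COLUMN loyalty_tier;
```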
Database change opens the door to data integrity threats
Since data transfer happens at massive volume and scale during CI/CD and general database changes, errors in this process are a leading cause of data integrity degradation. Typically, they stem from human mistakes, mismatches between seemingly similar data tables, or bugs within the database change management procedure.
It's easy to conclude that data integrity should be the top priority for DevOps. But while it may rank high philosophically, the functions and tactics that actually ensure data integrity aren't always top of mind. That's why database change management systems need to streamline and automate crucial aspects like access and permissions, validation, changelogs, audits, and rollbacks.
While a smart, experienced human’s guidance is critical to setting up this kind of data integrity protection program, the ongoing execution quickly overwhelms teams when handled manually. And with so many aspects to consider in the CI/CD and other database deployment schedules, it should ideally be a given that all data — before and after deployments — maintains its integrity.
Automated Quality Checks support data integrity in the database change process
So, how can data integrity be prioritized within database management and deployment?
One method involves Liquibase Quality Checks. They can help by adding an automated layer of governance that enables quality, safety, and consistency.
Quality Checks bring data integrity to the foreground of your deployment by allowing you and other database developers to check your code against the rules, rollback scripts, and taxonomies assigned by DBAs. Additionally, these Quality Checks can confirm details required by security teams to ensure critical alignment before anything is released.
This confidence-boosting data integrity layer empowers developers, reduces DBA review times, supports scalability, and obliterates procedural bottlenecks.
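For instance, a check that requires rollback scripts or warns on destructive statements would flag a changeset like the one below before it ever reaches a shared environment (the specific check names and thresholds depend on your Liquibase edition and your DBAs' configuration; this example is purely illustrative):

```sql
--liquibase formatted sql

--changeset sam.dev:drop-legacy-audit
-- A destructive statement with no rollback defined: the kind of change
-- a quality check is designed to catch during review, not after deployment.
DROP TABLE legacy_audit_log;
```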
Quality Checks support data integrity by:
- Auto-reviewing every line of SQL code for accuracy, completeness, security, and more
- Instantly providing feedback for quicker turnaround
- Eliminating manual processes for review and removing the risk of human errors
- Enforcing DBA requirements in every aspect of data structure
- Bringing database change reviews to earlier stages of the process
With database release automation that includes Quality Checks, many failure scenarios can be detected before any shared database is changed, reducing the need for rollbacks and containing the scope of rework required further down the pipeline. Checking changes before they get close to production environments ensures that dependencies aren't built atop faulty releases, which could otherwise trigger a domino effect of breakdowns.
More ways Liquibase supports data integrity
Quality Checks are a specific Liquibase capability aimed at protecting data integrity, but they're not the only way the database DevOps solution supports it. At its core, Liquibase equips teams to script and version control all database changes, which inherently feeds into integrity efforts. Tracking those change scripts, reusing them easily and consistently, and centrally managing versions sets a foundation of data integrity for every evolution of a company's network of databases.
By integrating the database change management process into the CI/CD pipeline, Liquibase also prioritizes data integrity by putting every deployment through the wringer of staging and pre-production environments. The ability to test and review database changes before they're pushed live, just as software development teams do with application code, means nothing goes live that isn't functioning properly. Issues are found earlier, preventing problems down the line.
Bring data integrity into the spotlight for your database teams
The best and fastest way to bring data integrity into focus for database teams throughout an organization is to start with database release automation. You can dive into those topics with our robust Database Automation Guide, which lays out the critical elements for CI/CD, compliance, security, and general database management professionals.
You can also venture into the Liquibase product, a database change management solution that handles things like data integrity throughout database deployments, so you and your teams can focus on full-speed coding and continuous delivery.