July 29, 2024
What is a database schema? Defining & changing data structures for SQL & NoSQL data stores
A database schema is often described as the blueprint for how data is stored in the environment – and that’s an apt comparison. The schema is the set of definitions and constraints that keeps all of a database’s data aligned and compatible.
Defined and managed by database administrators (DBAs), the schema serves the needs of application development, DevOps, and data teams – as well as end users and downstream stakeholders – who rely on the database.
Changing the schema – the underlying structure of the database – can be daunting and fraught with risk. It’s also typically a manual process that takes substantial time and resources to test and deploy properly. A database DevOps automation tool like Liquibase automates schema definition, change, and deployment.
Database schema definition
A database schema is the structural definition of the database – exactly how data is organized, constrained, and related. The schema drives consistency and alignment for database management, making for easier discoverability and analysis while ensuring data isn’t duplicated or lost as it changes. It’s often visualized as an entity-relationship diagram, which depicts exactly these objects and the relationships between them.
A schema represents the structure around data so that it can be understood by humans and machines alike. Schemas are also iterative: while initially developed with complete stakeholder input, they evolve as the data, applications, and analytics they serve change in nature and need.
The most basic elements of traditional relational databases – tables and their fields, rows, and columns – need limits and guidelines: a map, key, or blueprint of how to read the table and its data. Which data goes where, and what does it mean? How does it relate to other data? The schema is a database diagram that answers all these questions. It can also contain the keys, rules, permissions, and other governance that protect and control data access and use. Let’s clarify some of the essential components:
- Tables
- Fields
- Relationships
- Constraints
- Indexes
- Views
A database schema tells the database management system where to go to answer a certain query. It gives a clear outline of where that data would be found and how it's organized. By explaining data storage attributes, it points the way to whatever data is being input or queried.
Building schemas: DDL (Data Definition Language)
A special subset of SQL, DDL (data definition language) is the code used to define and manage database schemas. DDL includes specific commands to specify database structure and organization – to create, alter, delete, or otherwise update database objects.
DDL provides a standard way to define and manage database structures, fitting neatly into automation platforms to facilitate schema management with consistency and integrity across multiple environments. It’s similar in nature to Data Manipulation Language (DML), another standardized language for databases but one that focuses on the data, not the structure.
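The DDL commands described above can be sketched with Python’s built-in sqlite3 module. This is a minimal illustration, not any particular production schema – the `customer` table and its columns are hypothetical.

```python
import sqlite3

# A minimal sketch of DDL at work, using an in-memory SQLite database.
# The "customer" table and its columns are hypothetical, for illustration.
conn = sqlite3.connect(":memory:")

# CREATE defines a new database object; constraints are part of the definition.
conn.execute("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        name  TEXT NOT NULL
    )
""")

# ALTER changes the structure of an existing object without touching its data.
conn.execute("ALTER TABLE customer ADD COLUMN signup_date TEXT")

# The schema now reflects both statements.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(columns)  # ['id', 'email', 'name', 'signup_date']

# DROP removes the object (and any data in it) entirely.
conn.execute("DROP TABLE customer")
```

The same CREATE/ALTER/DROP vocabulary applies across relational platforms, which is what makes DDL such a natural fit for automation.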
Typically, database administrators (DBAs) and data architects work with application development teams, data analysts, product engineers, and infrastructure teams on data modeling, the process of designing and defining the database schema. The data modeling process involves discussing the logical and physical constraints of data storage from the perspective of every team.
The database schema doesn’t actually hold the data; it just describes how the data is to be held. Once the schema is finalized and the database is provisioned, data is loaded and the team can take a snapshot of the database’s state – a database instance – that should show all the data properly populated and organized as outlined by the schema. When it’s all working as it should, the database can best serve the users and applications that query, analyze, and maintain the data within it.
Types of database schema
The kind of database schema used comes down to the needs of the teams targeting the database environment. They are also constrained by the database platform being used. At its most elemental, a schema might be hierarchical: a simple tree-like structure with binary relationships, like an org chart. Slightly more complex, like a hierarchical schema with more than one relationship, a network schema might describe a logistics pipeline or a transportation system – think of a subway map.
In application development and data pipelines, the most commonly used schemas include relational, star, and snowflake. However, data can also be stored as raw or loosely defined format in NoSQL databases. No matter how precise or broad the database’s models, keeping it structurally aligned with its users and applications – and the other databases in the integrated pipeline – is critical to support availability, efficiency, security, and users.
Relational
The most common and straightforward, a relational schema defines a database as a collection of tables with rows and columns. Each table represents an entity, and relationships between tables are established using primary and foreign keys.
Relational schemas are primarily used for complex queries and transactional (OLTP) databases. Given their rigid, clearly defined structure and opportunities for normalization, they’re the de facto standard for master databases/systems of record because they support guaranteed data consistency.
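A primary key / foreign key relationship can be sketched in a few lines of SQL – here via Python’s sqlite3, with hypothetical `author` and `book` tables. The point is that the schema itself, not application code, rejects inconsistent data.

```python
import sqlite3

# A minimal relational schema: two hypothetical tables linked by a
# primary key / foreign key pair.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)
    )
""")

conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO book (id, title, author_id) VALUES (1, 'Notes', 1)")

# The schema guarantees consistency: a book cannot reference a missing author.
try:
    conn.execute("INSERT INTO book (id, title, author_id) VALUES (2, 'Ghost', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```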
Star schema
A star schema is composed of a central fact table surrounded by dimensional tables. While the central fact table stores quantitative data, the connected dimensional tables store attributes that describe the facts. Like basic relational schemas, star schemas are useful for complex query use cases. Yet because their denormalized dimensions keep analytical queries to a single join per dimension, making aggregations over large datasets fast and simple, they’re also foundational to data warehousing.
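A toy star schema makes the shape concrete – one fact table joined to denormalized dimension tables. The sales example below is hypothetical, and SQLite stands in for a real warehouse purely for illustration.

```python
import sqlite3

# A star schema sketch: a central fact table (fact_sales) surrounded by
# dimension tables (dim_product, dim_date). All names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (10, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 20.0), (1, 10, 5.0), (2, 10, 30.0);
""")

# An analytical rollup needs only one join per dimension it touches.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)  # [('books', 25.0), ('games', 30.0)]
```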
Snowflake schema
Take a star schema and break down its large dimensional tables into smaller, related tables – this tessellated star pattern becomes the snowflake schema.
By organizing – normalizing, to be precise – dimensional tables into more granular related tables, snowflake schemas reduce redundancy and duplicate data, since more minute descriptions can be applied consistently across fact tables. This boosts efficiency by ensuring that each piece of information is stored in only one place. A structure like this supports better data integrity and easier maintenance. Yet with more joined tables, querying becomes more difficult.
In data warehouses with high volumes of complex data aggregated from multiple sources, this kind of normalization is beneficial to minimize redundancy and maximize integrity. As such, performance and scalability improve, which enables more powerful analysis.
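Snowflaking can be sketched by normalizing the product dimension from the star example into a separate category table, so each category name lives in exactly one row. Again, the table names are hypothetical and SQLite is used only for illustration.

```python
import sqlite3

# Snowflake schema sketch: the product dimension is normalized into a
# separate dim_category table, trading an extra join for less redundancy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY,
                               category_id INTEGER REFERENCES dim_category(category_id));
    CREATE TABLE fact_sales   (product_id INTEGER REFERENCES dim_product(product_id),
                               amount REAL);
    INSERT INTO dim_category VALUES (1, 'books');
    INSERT INTO dim_product VALUES (1, 1), (2, 1);  -- two products, one shared category row
    INSERT INTO fact_sales VALUES (1, 20.0), (2, 5.0);
""")

# Queries now traverse two joins to reach the category name -- the cost
# of storing each attribute in exactly one place.
rows = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON f.product_id  = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.name
""").fetchall()
print(rows)  # [('books', 25.0)]
```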
NoSQL database structure
A rapidly growing category of the database landscape, NoSQL databases forgo the rigid relational schemas of the traditional table-oriented databases detailed above. They either don’t use SQL – Structured Query Language – or they don’t only use SQL. Instead, they store data in a defined data format, such as a JSON or XML document, an object, or a graph with nodes and edges, among many others.
NoSQL databases don’t have schemas, per se, but they do have defined formats and can be organized into collections (analogous to tables in relational databases). A NoSQL database could actually store relational data in tables and columns, if it were defined to do so. Yet it’s better suited to documents, objects, IoT data, and other specialized and high-volume data scenarios.
Just land it (now, quickly) – convert the data later.
Flexible, scalable, and ready for massive streams of raw data, NoSQL database definitions, or formats, include:
- Document, used by databases like MongoDB or DocumentDB, treats each JSON, BSON, or XML document as a unit of data. It’s useful for content management systems, catalogs, and other data sets with rich hierarchical representations.
- Key-value, used by databases like Redis, structures data as key-value pairs. Keys are unique and values can be any type of data, making key-value schemas useful for real-time data processing.
- Wide-column, used by databases like Apache Cassandra, evolves the standard table-based relational schema by using dynamic columns. This makes it useful for large-scale, high-write applications like IoT monitoring.
- Graph store, used by databases like Neo4j, stores data in nodes, edges, and properties, focusing on relationships and connections between data points. This schema is useful for social networks, recommendation engines, and other sets with traversing relationships.
- Time series, used by databases like InfluxDB, is oriented around time-stamped data. It’s useful for analytics, monitoring, and other time-bound practices.
- Object-oriented, used by databases like db4o, stores data as objects, allowing for complex data relationships inherently useful for object-oriented programming.
Many variations of these core NoSQL schema types exist, including combinations or multimodal schemas.
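The document model above can be sketched without any database at all: each record is a self-contained, hierarchical document rather than a flat row. The catalog entry below is hypothetical, with illustrative field names.

```python
import json

# A document-model sketch: one record carries nested structure and lists,
# with no table schema declared up front. Fields are hypothetical.
product = {
    "sku": "CAM-100",
    "name": "Trail Camera",
    "specs": {"resolution": "4K", "battery_hours": 12},
    "tags": ["outdoor", "wildlife"],
}

# Documents serialize to JSON as-is; two documents in the same collection
# may legitimately differ in shape.
doc = json.dumps(product)
restored = json.loads(doc)
print(restored["specs"]["resolution"])  # 4K
```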
Explore all 60+ databases supported by Liquibase, including innovative NoSQL platforms.
NoSQL allows for much better performance and elastic scaling. While data models do apply to NoSQL databases, they don’t support the same range of object definitions and data-rule enforcement that relational schemas do. The data models are also typically highly denormalized to enable scalability, flexibility, and speed.
NoSQL databases often act as ephemeral data stores – the data isn’t meant to persist long-term, and they won’t be the originator of a dataset. But they do capture data exceptionally fast, storing it just as it comes in. For instance, a NoSQL database may capture time-series data from IoT sensors every millisecond, storing it in raw format.
That data might end up in another, likely relational, database for analysis later on, but capturing it in an ephemeral NoSQL store enables fast data ingestion, high resiliency, and dynamic scalability. This “schema on read” situation means that the relational database’s schema won’t be applied to the data until it’s queried by the analyst. In contrast, data stored to a relational database undergoes “schema on write” – adopting the database structure (schema) as the data is created and stored.
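The schema-on-read idea can be sketched in a few lines: raw payloads are landed exactly as they arrive, and a structure is imposed only when the data is queried. The sensor readings and field names below are hypothetical.

```python
import json

# Schema on read, sketched: raw JSON payloads are stored untouched at
# ingestion time. Fields and values are hypothetical IoT readings.
raw_store = [
    '{"sensor": "t1", "temp_c": 21.5, "ts": 1}',
    '{"sensor": "t1", "temp_c": 22.0, "ts": 2, "battery": 0.9}',  # extra field is fine
]

# "Read time": parse and project only the columns this analysis needs --
# the structure lives in the query, not in the store.
readings = [
    (rec["sensor"], rec["temp_c"])
    for rec in (json.loads(r) for r in raw_store)
]
print(readings)  # [('t1', 21.5), ('t1', 22.0)]
```

Under schema on write, by contrast, the second record’s extra `battery` field would have to match a declared column before it could be stored at all.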
NoSQL databases will likely never be the database of record or master source of truth, but they boost capabilities for specific use cases as part of an integrated pipeline. Example use cases include:
- Ecommerce inventory management, in which an OLTP (relational) database is the system of record while NoSQL databases handle customer experiences like search and recommendations
- Flight reservations, in which a master flight schedule is created on a master SQL database, duplicated onto a NoSQL platform for greater search and retrieval speed, and then flight bookings are committed to a relational database for higher data resilience and consistency
- Real-time analytics, in which a relational database stores transactional data while a NoSQL database processes and analyzes large volumes of data in real time to provide immediate insights and reporting
- AI/ML applications, in which relational databases store structured training data, while NoSQL databases manage large volumes of unstructured data for high-speed processing of machine learning models and real-time user interactions
So if NoSQL databases enable freedom of data organization, yet schema is all about structure, how are they handled in the context of relational schemas and continuous change management?
While they don’t have rigid structures, NoSQL databases still abide by definitions and may also be grouped into constructs like collections, indexes, views, and others. These must all be updated consistently with the needs of their relevant applications, pipelines, and master (relational) databases.
Changing structures: Database schema migration
Collaborative data modeling and schema creation align and define a database’s structure, but soon enough, that structure will need to change to suit the evolving needs of the applications and pipelines it supports. This structural change is handled through database schema migration.
Database schema migration is the process of making iterative changes to the database schema to reflect new requirements. It involves modifying tables, indexes, or constraints while ensuring data integrity and consistency, typically using automated tools like Liquibase to streamline and manage the changes efficiently.
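The core mechanics of a migration – ordered change scripts applied exactly once and tracked in a version table – can be sketched in a few lines. This is a deliberately bare-bones illustration of the pattern that tools like Liquibase formalize (with changelogs, auditing, and rollback); the table and column names are hypothetical.

```python
import sqlite3

# A bare-bones migration sketch: ordered DDL changes, each recorded in a
# version table so re-running the migration is safe. Names are hypothetical.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)")

def migrate(conn):
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_version")}
    for version, ddl in MIGRATIONS:
        if version not in applied:  # apply each change exactly once
            conn.execute(ddl)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))

migrate(conn)
migrate(conn)  # idempotent: already-applied changes are skipped

columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(columns)  # ['id', 'name', 'email']
```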
Dive into database schema migration with Liquibase’s comprehensive guide.
NoSQL change management
The same general approach applies to NoSQL databases, except the opportunities for change are more limited, given their inherently broad guidelines. Still, data definitions within these NoSQL databases will need to evolve and sync with future targets, and those will still need careful review to avoid production disruptions.
Relational, star, and snowflake schemas can be applied to NoSQL databases in whole or in part, and any changes to the relational database will need to be aligned with NoSQL definitions if and where necessary. While these specialized databases have only loosely defined schemas, they’re no less important and equally critical to keep up to date and consistent.
Automating schema creation & management
Creating schemas takes strategy, expertise, and automation. Teams looking for self-service database deployments turn to Liquibase for database schema creation and change management. It enables instant feedback on database-as-code change scripts to quickly refine critical structural enhancements across pipelines that include both traditional and NoSQL databases.
Liquibase supports schema management by automating the deployment of changes across various environments, ensuring consistency and reducing errors. It integrates seamlessly with CI/CD pipelines, provides version control for database changes, and allows teams to track, audit, and roll back changes as needed, facilitating efficient and reliable database schema evolution.
Learn more about how Liquibase works or watch an on-demand demo to see schema creation and change in action.