NoSQL, which stands for “Non-SQL,” refers to a class of database management systems that differ from traditional relational databases’ approach to storing and managing data. It focuses on providing a mechanism for storing and retrieving data modeled in means other than tabular relations.
The history of NoSQL is tied to the evolving needs of data storage and processing, particularly with the rise of the internet and large-scale web applications. NoSQL first emerged in the early 2000s, driven by the evolving needs of Web 2.0 applications.
Here’s an overview of the key milestones in the history of NoSQL:
1960-2000 –
The concept of non-relational databases predates NoSQL. Early database models, like hierarchical (e.g., IBM’s IMS) and network databases (e.g., CODASYL), did not use the relational model but were used for large-scale enterprise systems.
The rise of the internet and the web led to new types of applications that required databases to handle large volumes of unstructured data, high read/write loads, and distributed architectures. Relational databases struggled with these requirements, particularly in terms of scalability and flexibility.
2000s –
Carlo Strozzi coined the term NoSQL to describe his open-source, lightweight, non-relational database, which was not widely adopted. In the mid-2000s, Google and Amazon faced scalability challenges, prompting the development of their non-relational data stores. Google introduced Bigtable, a distributed storage system for managing structured data, while Amazon developed Dynamo, a distributed key-value store.
2009 –
2009 marked an important milestone for NoSQL as it marked the popularisation and beginning of the modern NoSQL movement. Johan Oskarsson reintroduced the term NoSQL in early 2009 when he organized an event to discuss open-source distributed, non-relational databases. The name attempted to label the emergence of an increasing number of non-relational, distributed data stores, including open-source clones of Google’s Bigtable and Amazon’s DynamoDB.
It was the year when the famous MongoDB and Redis were adopted. These 2 are famously used in most large-scale applications.
2010s –
The NoSQL ecosystem grew rapidly, with databases like Neo4j (a graph database), Elasticsearch (a search engine and analytics database), and HBase (a distributed database modeled after Google’s Bigtable) gaining popularity. Companies like Netflix, LinkedIn, and Twitter adopted NoSQL databases to handle their massive user bases and data streams.
The CAP Theorem adoption –
The definition of the CAP Theorem, by Eric Brewer, is an important part of non-relational databases. It states that a distributed data store cannot simultaneously offer more than two or three established guarantees. The guarantees are –
- Consistency: The data within the database remains consistent, even after an operation has been executed. For instance, after updating a system, all clients will see the same data.
- Availability: The system is constantly on, with no downtime.
- Partition Tolerance: Even if communication among the servers is no longer reliable, the system will continue to function. This is because the servers can be partitioned off, into multiple groups which can’t communicate with each other.
2020s –
The distinction between NoSQL and SQL began to blur as traditional relational databases started incorporating features inspired by NoSQL systems, such as support for JSON documents and horizontal scalability. Conversely, NoSQL databases added more traditional features like ACID transactions and SQL-like query languages. The rise of multi-model databases that support multiple data models (e.g., document, graph, key-value) within a single system reflected the trend towards flexible data storage solutions.
Non-Relational Data Storage Design –
Non-relational data storage is often open source, non-relational, schema-less, and horizontally scalable. The term elasticity is used for data storage that is scalable, schema-free, and allows for rapid changes and rapid replication.
NoSQL uses data stores optimized for specific purposes. Normally, NoSQL stores data in one of the four categories:
- Key-value storage – It is a data storage system designed for storing, retrieving, and managing associative arrays. Unlike relational databases, which use predefined data structures with tables and fields of specific data types, a Key-Value Store operates differently. It offers flexibility by allowing a variety of optimal options for classifying data types. Examples – Redis, DynamoDB
- Document storage – It is designed for storing, retrieving, and managing document-oriented information, often referred to as semi-structured data. While similar to Key-Value Stores, Document Stores differ in how they process data. They utilize the internal structure of documents for identification and storage. In a Document Store, all information related to a specific item is stored as a single instance within the database, rather than being distributed across multiple tables like in relational systems. This approach simplifies the mapping of items into the database. Examples – MongoDB, MarkLogic, RavenDB
- Wide Column storage – this utilizes tables, rows, and columns, but unlike relational databases, the names and formats of columns can vary from row to row within the same table, offering greater flexibility. These stores often support column families, which are stored separately and typically consist of multiple columns that are used together. In a column family, data is stored row-by-row, with all columns for a specific row stored together, rather than each column being stored separately. Wide Column Stores that support column families are also referred to as column family databases. Examples – ScyllaDB, Google BigTable, HBase
- Graph database – It is essentially a collection of relationships. Each node represents an entity, such as a business, person, or object, and these nodes are interconnected. The connections between nodes, known as edges, represent the relationships between them. In a Graph Database, each node has a unique identifier, a set of incoming and/or outgoing edges, and attributes stored as key-value pairs. Similarly, each edge has its unique identifier, a start and/or end node, and a set of properties. Examples – Neo4, Azure Cosmos DB, TigerGraph
About Euphoric Thought Technologies –
Euphoric is an IT firm providing end-to-end product development services with its deep technical expertise and industry experience. The offerings include DevOps, Cloud Computing, Application Development, Data Science and Analytics, AI/ML, and ServiceNow consulting. You can contact us here to learn more about Euphoric. Reach out here to connect with the team.