A substantial number of database types are now offered in the cloud to suit a variety of business use cases. This adds flexibility to data and application solution design but increases the difficulty of deciding which type is best suited to a given scenario. In a previous post, we looked at the relational databases available on AWS and described how they can best be leveraged for a variety of solutions. This post explores the opposite set of databases, those that fall into the non-relational category, and serves to disentangle the non-relational services offered on AWS.
Non-Relational Databases
A non-relational database is the opposite of a relational database: the data is not stored in a tabular structure (columns and rows), and no predefined relationships exist among tables. Instead, the storage model is based on the type of data it’s storing, which makes it very flexible and adaptable.
TYPES OF NON-RELATIONAL DATABASES: Key-Value, Document, In-Memory, Graph, Search, Time-Series, Ledger
1) Key-Value
A key-value database stores data as a key-value pair. An example is a dictionary, where the key is the unique identifier, and the value holds the associated attributes. The two major Key-value databases available on AWS are Amazon DynamoDB and Amazon Keyspaces.
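To make the dictionary analogy concrete, here is a minimal sketch in Python. It uses a plain dict as a stand-in for a key-value store; the `put` and `get` helpers and the sample keys are made up for illustration and are not the API of DynamoDB or Keyspaces.

```python
# Illustrative only: a plain dict standing in for a key-value store.
store = {}

def put(key, value):
    """Insert or overwrite the value stored under a unique key."""
    store[key] = value

def get(key):
    """Return the value for a key, or None if the key is absent."""
    return store.get(key)

# The key is the unique identifier; the value holds the associated attributes.
put("user#1001", {"name": "Ada", "plan": "premium"})
put("user#1002", {"name": "Grace", "plan": "free"})

print(get("user#1001"))
print(get("user#9999"))
```

Every lookup goes through the key, which is why this model is fast for direct access but offers no way to ask questions that span many values.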
Amazon DynamoDB is a fully managed, serverless NoSQL database on AWS. It is one of the most commonly used NoSQL databases because it supports ACID transactions that update items across multiple tables and allows in-memory caching with DAX. These are some of its features:
- Global Tables: Multi-region, multi-master database
- Supports backups and point-in-time recovery
- Single-digit millisecond performance at any scale
- Supports CRUD (Create/Read/Update/Delete) operations through APIs
- No direct analytical queries (joins are not allowed)
- Access patterns must be known ahead of time for efficient design and performance
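The last point, that access patterns must be known ahead of time, can be sketched in Python as follows. This is an in-memory illustration only, assuming a table addressed by a partition key plus a sort key (a detail the list above does not spell out). The `put_item` and `query` helpers here are hypothetical and are not boto3 calls.

```python
from collections import defaultdict

# Hypothetical in-memory model: partition key -> {sort key -> item}.
table = defaultdict(dict)

def put_item(partition_key, sort_key, item):
    """Store an item under its (partition key, sort key) pair."""
    table[partition_key][sort_key] = item

def query(partition_key, sort_key_prefix=""):
    """Return the items of one partition whose sort key starts with the
    given prefix, in sort-key order. No other partition is touched."""
    partition = table.get(partition_key, {})
    return [partition[sk] for sk in sorted(partition) if sk.startswith(sort_key_prefix)]

# The keys are designed around a known access pattern:
# "fetch all orders for one customer, oldest first".
put_item("customer#42", "order#2023-01-15", {"total": 30})
put_item("customer#42", "order#2023-03-02", {"total": 75})
put_item("customer#42", "profile", {"name": "Ada"})
put_item("customer#7", "order#2023-02-11", {"total": 12})

print(query("customer#42", "order#"))
```

A question the keys were not designed for, such as "all orders above a certain total across every customer", would require reading every partition, which is the cost of not planning the access pattern up front.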
Amazon Keyspaces is a fully managed serverless database that is used to execute Cassandra workloads on AWS. Cassandra is an open-source NoSQL database. Keyspaces is available in both On-demand and Provisioned mode.
C) Amazon S3
Amazon S3 is an object store that can also be considered a key-value database, used for storing huge volumes of semi-structured or unstructured data. For each uploaded file, the key is the object's name, which is unique within its bucket, and the value is the content of the file.
2) Document
A document database is a NoSQL database used for storing and managing JSON-like documents. This data model is commonly used by developers because it has the same data format used in their application code. Documents store data in field-value pairs. There is only one document database on AWS: Amazon DocumentDB.
Amazon DocumentDB is a fully managed NoSQL document database for running MongoDB workloads. Its JSON documents are stored in collections; a collection is a group of documents, similar to a table. It uses the same architecture as Aurora.
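As a rough illustration of documents and collections, the Python sketch below models a collection as a list of JSON-like dicts and filters it by field. The `get_path` and `find` helpers and the sample data are assumptions made for this example; they are not the MongoDB or DocumentDB query API.

```python
import json

# A "collection" is a group of documents; each document is a set of field-value pairs.
collection = [
    {"_id": 1, "name": "Ada", "skills": ["python", "sql"], "address": {"city": "Toronto"}},
    {"_id": 2, "name": "Grace", "skills": ["cobol"]},  # no address: documents need not share a schema
]

def get_path(document, path):
    """Follow a dotted path such as 'address.city'; return None if any part is missing."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(documents, path, value):
    """Return the documents whose field at the given path equals value."""
    return [d for d in documents if get_path(d, path) == value]

print(find(collection, "address.city", "Toronto"))

# The documents serialize to JSON unchanged, the same shape application code works with.
print(json.dumps(collection[1]))
```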
3) In-Memory
An in-memory database is used mainly in situations where accessing data on disk would be an expensive operation. Keeping the data in memory saves time and improves performance compared with pulling the same information from disk.
Amazon ElastiCache is the main fully managed in-memory service available on AWS. It runs on its own dedicated caching instances (a remote cache). ElastiCache supports two in-memory engines, Redis and Memcached. Redis is suitable for complex applications, including message queues, session caching, leaderboards etc. Memcached, on the other hand, is suitable for simple caches and is also useful when working with a multithreaded architecture.
DAX is an in-memory caching service that is used with DynamoDB. It allows for faster in-memory operations and better performance when working with DynamoDB. There are two types of DAX caches: the item cache and the query cache. The item cache stores the results of individual item reads (GetItem and BatchGetItem operations), while the query cache stores the results of Query and Scan operations. A typical DAX use case is when users access a small number of items more frequently than others.
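The idea behind an item cache can be sketched in Python as a read-through cache in front of a slower store. This is a simplified illustration, not the DAX or ElastiCache API. The `SlowStore` and `ReadThroughCache` classes are hypothetical, and a real cache would also expire and evict entries.

```python
class SlowStore:
    """Stand-in for a disk-backed database; counts how often it is read."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)


class ReadThroughCache:
    """Serve repeated reads from memory; go to the backing store only on a miss."""

    def __init__(self, backing_store):
        self.backing_store = backing_store
        self.items = {}

    def get(self, key):
        if key in self.items:          # cache hit: no trip to the backing store
            return self.items[key]
        value = self.backing_store.get(key)  # cache miss: read once, then remember
        self.items[key] = value
        return value


database = SlowStore({"item#1": "sword", "item#2": "shield"})
cache = ReadThroughCache(database)

for _ in range(5):
    cache.get("item#1")

print(database.reads)  # the backing store was read only on the first request
```

The benefit grows with how skewed the traffic is: the more often the same few items are requested, the more reads the cache absorbs.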
Useful Scenario
If you have an application that is accessed very often for the same information, e.g. a gaming leaderboard, the best solution is an in-memory database (Amazon ElastiCache for Redis), since it lets us keep frequently accessed data in memory, mainly to serve read operations.
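A leaderboard of this kind can be sketched in a few lines of Python. The example keeps scores in a dict in process memory, with made-up player names. It only illustrates the read-heavy access pattern and is not how Redis stores a leaderboard internally.

```python
import heapq

# Player name -> accumulated score, held entirely in memory.
scores = {}

def add_score(player, points):
    """Add points to a player's running total."""
    scores[player] = scores.get(player, 0) + points

def top(n):
    """Return the n best (player, score) pairs: highest score first, ties broken by name."""
    return heapq.nsmallest(n, scores.items(), key=lambda pair: (-pair[1], pair[0]))

add_score("ada", 120)
add_score("grace", 200)
add_score("linus", 90)
add_score("ada", 30)

print(top(2))
```

Writes are small and infrequent compared with the number of times the ranking is read, which is the situation where keeping the data in memory pays off.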
4) Graph
A graph database shows how data is interconnected. It captures the relationships between the data in a database in detail using nodes (which store data entities) and edges (which store the relationship information between nodes).
Amazon Neptune is the fully managed graph database service available on AWS. This database makes it easy to quickly query complex relationships between connected datasets. It supports Apache TinkerPop Gremlin and SPARQL (for RDF) as graph query languages.
Useful Scenario
Fraud detection and recommendation engines. For fraud detection, transactions are stored as a graph, which helps identify related pieces in a dataset. Once the patterns are detected, it becomes easier to find the fraudulent ones.
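To show what "identifying related pieces" can look like, here is a small Python sketch. It builds an undirected graph from made-up accounts, cards, and devices, then walks the graph to find every account linked to a suspicious one. The data and the `connected_accounts` helper are assumptions for illustration; a real graph database would express this as a Gremlin or SPARQL query instead.

```python
from collections import defaultdict, deque

# Edges link an account node to something it used (a card or a device).
edges = [
    ("account:A", "card:111"),
    ("account:B", "card:111"),
    ("account:B", "device:x9"),
    ("account:C", "device:x9"),
    ("account:D", "card:222"),
]

# Adjacency sets: node -> neighbouring nodes.
graph = defaultdict(set)
for left, right in edges:
    graph[left].add(right)
    graph[right].add(left)

def connected_accounts(start):
    """Breadth-first search from one node; return the other accounts it can reach."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(n for n in seen if n.startswith("account:") and n != start)

# If account A is flagged, B shares its card and C shares a device with B.
print(connected_accounts("account:A"))
```

Account D never shares a card or device with the others, so it stays outside the suspicious cluster.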
5) Search
A search database makes it easy to search for any kind of information in your data and to provide near real-time visualizations and analytics of that data (this includes log files, text files, messages etc.).
Amazon OpenSearch Service is the fully managed search service available on AWS. It is based on OpenSearch, an open-source fork of Elasticsearch and Kibana, and was recently renamed from Amazon Elasticsearch Service to Amazon OpenSearch Service.
Useful Scenario
This is mostly used by developers, and it can be used for full-text search and log analytics. An example is searching documents for a particular word: the search returns an aggregate count of the word and summarizes the data.
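The word-count example can be sketched with a tiny inverted index in Python. This is a simplified illustration of the idea behind full-text search, with made-up documents and a hypothetical `search` helper. It does not use the OpenSearch API, and real engines add analysis steps such as stemming and relevance scoring.

```python
import re
from collections import Counter, defaultdict

documents = {
    "doc1": "Error connecting to database. Retrying database connection.",
    "doc2": "User login succeeded.",
    "doc3": "Database backup completed without error.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Inverted index: word -> how many times it occurs in each document.
index = defaultdict(Counter)
for doc_id, text in documents.items():
    for token in tokenize(text):
        index[token][doc_id] += 1

def search(word):
    """Return the aggregate count of a word and its per-document breakdown."""
    postings = index.get(word.lower(), Counter())
    return {"total": sum(postings.values()), "per_document": dict(postings)}

print(search("database"))
```

Because the index is built ahead of time, a search looks up one entry instead of rescanning every document.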
6) Time-Series
A time-series database is used to efficiently store and retrieve trillions of events in real time. The data is stored as pairs of a timestamp and an associated value. This makes it easy to analyze time series, since we are working with data points in time.
Amazon Timestream is a fully managed, serverless time-series database service. It is used to process huge volumes of data over time.
Useful Scenario
Stock market and high-volume IoT device data, where time is the most important dimension for analyzing trends and patterns. Amazon Timestream has in-memory capabilities, which make real-time use cases (analyzing the most recent data) extremely performant.
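The "pair of time and associated value" model and a typical time-centred analysis can be sketched in Python as follows. The readings and the `average_per_window` helper are made up for illustration and do not reflect Timestream's query language.

```python
from collections import defaultdict

# Each data point is a (seconds since start, temperature) pair from a hypothetical sensor.
readings = [(0, 20.0), (30, 21.0), (60, 22.0), (90, 24.0), (120, 30.0)]

def average_per_window(points, window_seconds):
    """Group points into fixed-size time windows and average the values in each."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp // window_seconds * window_seconds
        buckets[window_start].append(value)
    return {start: sum(values) / len(values) for start, values in sorted(buckets.items())}

# One average per minute, keyed by the start of each window.
print(average_per_window(readings, 60))
```

Time is the only dimension used to group the data, which is what distinguishes this workload from a general-purpose query.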
7) Ledger
A ledger database is an append-only NoSQL database, i.e. it is an immutable, transparent, and cryptographically verifiable ledger.
Amazon QLDB is a fully managed, serverless ledger database that uses PartiQL as the query language and stores data in the Amazon Ion format. The three main concepts in QLDB are the ledger, the journal, and tables.
- A ledger contains a journal and a list of tables
- The journal holds the ordered, cryptographically verifiable history of every change made to the tables
- Tables are sets of documents, i.e. the actual data, stored in the Amazon Ion format
Useful Scenario
A government agency requires a method to track the history of vehicle ownership. In this case, an immutable record-based system such as Amazon QLDB, which stores the history of vehicle ownership over time, could be a perfect fit.
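The append-only, verifiable journal idea can be sketched in Python with a simple hash chain, using the vehicle ownership scenario. This is only an illustration of the concept: the `append_entry`, `verify`, and `history` helpers and the sample VIN are assumptions, and QLDB's actual verification mechanism is more elaborate than this.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry
journal = []             # entries are only ever appended, never edited

def entry_hash(previous_hash, data):
    """Hash an entry's data together with the hash of the entry before it."""
    payload = json.dumps(data, sort_keys=True)
    return hashlib.sha256((previous_hash + payload).encode("utf-8")).hexdigest()

def append_entry(data):
    """Append a change to the journal, chaining it to the previous entry."""
    previous_hash = journal[-1]["hash"] if journal else GENESIS_HASH
    journal.append({
        "data": data,
        "previous_hash": previous_hash,
        "hash": entry_hash(previous_hash, data),
    })

def verify():
    """Recompute every hash in order; any edited entry breaks the chain."""
    previous_hash = GENESIS_HASH
    for entry in journal:
        if entry["previous_hash"] != previous_hash:
            return False
        if entry["hash"] != entry_hash(previous_hash, entry["data"]):
            return False
        previous_hash = entry["hash"]
    return True

def history(vin):
    """Return the ordered list of owners recorded for one vehicle."""
    return [e["data"]["owner"] for e in journal if e["data"]["vin"] == vin]

append_entry({"vin": "VIN123", "owner": "Ada"})
append_entry({"vin": "VIN123", "owner": "Grace"})

print(history("VIN123"))
print(verify())
```

If someone rewrote an earlier owner in place, `verify()` would return False, because the stored hash would no longer match the recomputed one.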
Lastly, I have highlighted the differences between the NoSQL databases in the table below:
Database | Data Type | Workloads | Data Size | Performance |
Amazon DynamoDB | Semi-structured | Transactional Key-Value / Document Store | High TB range | Ultra-high throughput, low latency (ultra-low latency with DAX) |
Amazon Keyspaces | Semi-structured | Cassandra | N/A | Low latency |
Amazon DocumentDB | Semi-structured | MongoDB | Up to 64 TB | High throughput, low latency |
Amazon ElastiCache | Semi-structured/ Unstructured | In-memory caching | Low TB Range | High throughput, ultra-low latency |
Amazon Neptune | Graph-Structured | Highly connected graph datasets | Mid TB Range | High throughput, low latency |
Amazon QLDB | Structured/ Semi-structured | Transactional | N/A | High throughput, low latency |
In Conclusion
The explosion of non-relational database services has simplified and optimized backend architectures for a variety of old and new use cases. These non-relational databases cater to specific use cases, and AWS has recently been bringing all of them onto its platform, making it easy to access them in one environment. For example, Amazon QLDB is used for developing ledger databases and Amazon Keyspaces is used to run Cassandra workloads. If you are looking for a cloud solution that fits your business, you can reach out to us directly.
Indellient takes a customer-first approach to help you build a modern cloud strategy on Amazon Web Services, Microsoft Azure, and Google Cloud Platform. Our team can help you build, replatform, migrate and integrate applications, so you can benefit from the scalability, agility, and performance available through cloud technologies.
Indellient is an IT Professional Services Company that specializes in Data Analytics, Cloud Services, DevOps Services, and Business Process Management.