Comparing DynamoDB and MongoDB

Quick Comparison Table

|  | MongoDB | DynamoDB |
| --- | --- | --- |
| Freedom to Run Anywhere | Runs anywhere – on a laptop, in an on-premises data center, or on any public cloud – and is available as a fully managed service with MongoDB Atlas | Only available on AWS. No support for on-premises deployments. Locked in to a single cloud provider |
| Data Model | Document model with a wide spectrum of data types, including dates, timestamps, 64-bit integers, and Decimal128. Documents up to 16 MB; larger assets with GridFS | Limited key-value store with JSON support. Maximum 400 KB record size. Limited data type support (number, string, and binary only) increases application complexity |
| Querying | Rich queries: single keys, ranges, faceted search, graph traversals, JOINs, geospatial queries, and complex aggregations, all executed natively in the database | Key-value queries only. Primary key can have at most two attributes, limiting query flexibility. Analytic queries require replicating data to another AWS service, increasing cost and complexity |
| Indexing | Secondary indexes on any field, down to individual array values. Compound, unique, array, partial, TTL, geospatial, sparse, hash, wildcard, and text indexes. Strongly consistent with the underlying data | Limited and complex to manage. Indexes are sized, billed, and provisioned separately from data. Hash or hash-range indexes only. Global secondary indexes (GSIs) are eventually consistent with the underlying data, forcing applications to handle stale data. Local secondary indexes (LSIs) can be strongly consistent, but must be defined when a table is created. GSIs can only be declared on top-level item elements; sub-documents and arrays cannot be indexed, making complex queries impossible. Maximum of 20 GSIs and 5 LSIs per table |
| Data Integrity | Strongly consistent by default. Built-in schema validation. Multi-document ACID transactions | Eventually consistent. Complex – stale data must be handled in the application. No data validation – must be handled in the application. ACID transactions apply to table data only, not to indexes or backups. Maximum of 25 writes per transaction |
| Monitoring and Performance Tuning | Comprehensive monitoring tracking 100+ metrics, with tooling to visualize schema and optimize performance | Black box. Fewer than 20 metrics limit visibility into database behavior. No tools to visualize schema or recommend indexes |
| Backup | Continuous backups with point-in-time recovery | On-demand or continuous backups. No queryable backup; additional charge to restore backups; many configurations are not backed up and must be recreated manually |
| Pricing | Straightforward: instance size, number of replicas and shards, backups, and region(s) | Highly variable. Throughput-based pricing. A wide range of inputs may affect price. See Pricing and Commercial Considerations |

What is DynamoDB?

DynamoDB is a proprietary NoSQL database service built by Amazon and offered as part of the Amazon Web Services (AWS) portfolio.

The name comes from Dynamo, a highly available key-value store developed in response to holiday outages on the Amazon e-commerce platform in 2004. Initially, however, few teams within Amazon adopted Dynamo due to its high operational complexity and the trade-offs that needed to be made between performance, reliability, query flexibility, and data consistency.

Around the same time, Amazon found that its developers enjoyed using SimpleDB, its primary NoSQL database service at the time, which allowed users to offload database administration work. But SimpleDB, which is no longer being updated by Amazon, had severe limitations when it came to scale; its strict storage limit of 10 GB and the limited number of operations it could support per second made it viable only for small workloads.

DynamoDB, which was launched as a database service on AWS in 2012, was built to address the limitations of both SimpleDB and Dynamo.

What is MongoDB?

MongoDB is an open, non-tabular database built by MongoDB, Inc. The company was established in 2007 by former executives and engineers from DoubleClick, which Google acquired and now uses as the backbone of its advertising products. The founders originally focused on building a platform as a service using entirely open source components, but when they struggled to find an existing database that could meet their requirements for building a service in the cloud, they began work on their own database system. After realizing the potential of the database software on its own, the team shifted their focus to what is now MongoDB. The company released MongoDB in 2009.

MongoDB was designed to create a technology foundation that enables development teams through:

  1. The document data model – presenting them with the best way to work with data.
  2. A distributed systems design – allowing them to intelligently put data where they want it.
  3. A unified experience that gives them the freedom to run anywhere – allowing them to future-proof their work and eliminate vendor lock-in.

MongoDB stores data in flexible, JSON-like records called documents, meaning fields can vary from document to document and the data structure can be changed over time. This model maps to objects in application code, making data easy to work with for developers. Related information is typically stored together for fast query access through the MongoDB query language. MongoDB uses dynamic schemas, allowing users to create records without first defining the structure, such as the fields or the types of their values. Users can change the structure of documents simply by adding new fields or deleting existing ones. This flexible data model makes it easy for developers to represent hierarchical relationships and other more complex structures. Documents in a collection need not have an identical set of fields, and denormalization of data is common.
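
To make that flexibility concrete, below is a minimal PyMongo sketch, assuming a local mongod on the default port; the "inventory" collection and its fields are hypothetical.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
products = client.shop.inventory

# Two documents with different shapes in the same collection -- no
# schema migration is needed before inserting either one.
products.insert_one({"sku": "A-100", "name": "Kettle", "price": 24.99})
products.insert_one({
    "sku": "B-200",
    "name": "Teapot",
    "price": 31.50,
    "variants": [{"color": "blue", "stock": 4}, {"color": "red", "stock": 0}],
})

# Evolving the structure later is a plain update, not a schema change.
products.update_one({"sku": "A-100"}, {"$set": {"discontinued": False}})
```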

In the summer of 2016, MongoDB announced MongoDB Atlas, its fully managed cloud database service. Atlas offers genuine MongoDB under the hood, allowing users to offload operational tasks, and features built-in best practices for running the database with all the power and freedom developers are used to with MongoDB.

Terminology and Concepts

Many concepts in DynamoDB have close analogs in MongoDB. The table below outlines some of the common concepts across DynamoDB and MongoDB.

| DynamoDB | MongoDB |
| --- | --- |
| Table | Collection |
| Item | Document |
| Attribute | Field |
| Secondary Index | Secondary Index |

Deployment Environments

MongoDB can be run anywhere – from a developer’s laptop to an on-premises data center to any of the public cloud platforms. As mentioned above, MongoDB is also available as a fully managed cloud database with MongoDB Atlas; this model is most similar to how DynamoDB is delivered.

In contrast, DynamoDB is a proprietary database only available on Amazon Web Services. While a downloadable version of the database is available for prototyping on a local machine, the database can only be run in production in AWS. Organizations looking into DynamoDB should consider the implications of building on a data layer that is locked in to a single cloud vendor.
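
For prototyping, that downloadable version (DynamoDB Local) can be targeted by pointing the AWS SDK at a local endpoint. A minimal boto3 sketch, assuming DynamoDB Local is already running on its default port 8000:

```python
import boto3

# Point the SDK at the local endpoint; production code would target the
# AWS-hosted service instead. DynamoDB Local accepts any credentials.
dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",
    region_name="us-east-1",       # required by the SDK, unused locally
    aws_access_key_id="dummy",
    aws_secret_access_key="dummy",
)
print(list(dynamodb.tables.all()))  # lists tables created locally
```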

Comparethemarket.com, the UK’s leading price comparison service, completed a transition from on-prem deployments with Microsoft SQL Server to AWS and MongoDB. When asked why they hadn’t selected DynamoDB, a company representative was quoted as saying “DynamoDB was eschewed to help avoid AWS vendor lock-in.”

Data Model

MongoDB stores data in a JSON-like format called BSON, which allows the database to support a wide spectrum of data types, including dates, timestamps, 64-bit integers, and Decimal128. MongoDB documents can be up to 16 MB in size; with GridFS, even larger assets can be natively stored within the database.

Unlike some NoSQL databases that push enforcement of data quality controls back into the application code, MongoDB provides built-in schema validation. Users can enforce checks on document structure, data types, data ranges and the presence of mandatory fields. As a result, DBAs can apply data governance standards, while developers maintain the benefits of a flexible document model.
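
As an illustration, here is a minimal PyMongo sketch of schema validation using a $jsonSchema validator; the "users" collection and its rules are hypothetical.

```python
from pymongo import MongoClient
from pymongo.errors import WriteError

db = MongoClient("mongodb://localhost:27017").app

# Reject documents that lack mandatory fields or contain out-of-range values.
db.create_collection("users", validator={
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["email", "age"],
        "properties": {
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int", "minimum": 0, "maximum": 150},
        },
    }
})

db.users.insert_one({"email": "a@example.com", "age": 42})  # accepted
try:
    db.users.insert_one({"age": -5})  # missing email, age out of range
except WriteError:
    print("rejected by the collection validator")
```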

DynamoDB is a key-value store with added support for JSON to provide document-like data structures that better match objects in application code. An item or record cannot exceed 400 KB. Compared to MongoDB, DynamoDB has limited support for different data types. For example, it supports only one numeric type and does not support dates. As a result, developers must preserve data types on the client, which adds application complexity and reduces data re-use across different applications. DynamoDB has no native data validation capabilities.
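
The sketch below illustrates that client-side burden with boto3, assuming a hypothetical pre-created "orders" table: dates must be serialized by the application, and numbers must be passed as Decimal.

```python
from datetime import datetime, timezone
from decimal import Decimal

import boto3

orders = boto3.resource("dynamodb").Table("orders")

orders.put_item(Item={
    "customer_id": "c-1",
    "order_id": "o-1001",
    # No native date type: the application must serialize dates itself
    # (here as an ISO-8601 string) and parse them back on every read.
    "created_at": datetime.now(timezone.utc).isoformat(),
    # A single numeric type: boto3 requires Decimal rather than float.
    "total": Decimal("19.99"),
})
```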

Queries and Indexes

MongoDB’s API enables developers to build applications that can query and analyze their data in multiple ways – by single keys, ranges, faceted search, graph traversals, JOINs and geospatial queries through to complex aggregations, returning responses in milliseconds. Complex queries are executed natively in the database without having to use additional analytics frameworks or tools. This helps users avoid the latency that comes from syncing data between operational and analytical engines.
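
For example, here is a minimal PyMongo sketch of an analytic query expressed as an aggregation pipeline and run natively in the database; the "orders" collection and its fields are hypothetical.

```python
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017").shop.orders

# Top ten customers by revenue, computed inside the database -- no
# separate analytics engine or data sync required.
pipeline = [
    {"$match": {"status": "complete"}},
    {"$group": {"_id": "$customer_id", "revenue": {"$sum": "$total"}}},
    {"$sort": {"revenue": -1}},
    {"$limit": 10},
]
for row in orders.aggregate(pipeline):
    print(row)
```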

MongoDB ensures fast access to data by any field with full support for secondary indexes. Indexes can be applied to any field in a document, down to individual values in arrays.

MongoDB supports multi-document transactions, making it the only database to combine the ACID guarantees of traditional relational databases; the speed, flexibility, and power of the document model; and the intelligent distributed systems design to scale-out and place data where you need it.

Multi-document transactions feel just like the transactions developers are familiar with from relational databases – multi-statement, similar syntax, and easy to add to any application. Through snapshot isolation, transactions provide a globally consistent view of data and enforce all-or-nothing execution. MongoDB allows reads and writes against the same documents and fields within the transaction. For example, users can check the status of an item before updating it. MongoDB best practices advise up to 1,000 operations in a single transaction.
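
A minimal PyMongo sketch of such a transaction, assuming a replica set (required for transactions) and hypothetical account documents:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
accounts = client.bank.accounts

def transfer(session):
    # Read-your-own-writes inside the transaction: check, then update.
    src = accounts.find_one({"_id": "alice"}, session=session)
    if src["balance"] < 100:
        raise ValueError("insufficient funds")  # aborts the transaction
    accounts.update_one({"_id": "alice"}, {"$inc": {"balance": -100}},
                        session=session)
    accounts.update_one({"_id": "bob"}, {"$inc": {"balance": 100}},
                        session=session)

with client.start_session() as session:
    # with_transaction commits on success and retries transient errors.
    session.with_transaction(transfer)
```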

Supported indexing strategies such as compound, unique, array, partial, TTL, geospatial, sparse, hash, wildcard and text ensure optimal performance for multiple query patterns, data types, and application requirements. Indexes are strongly consistent with the underlying data.
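
A minimal PyMongo sketch of a few of these strategies; the collection and field names are hypothetical.

```python
from pymongo import MongoClient, ASCENDING, TEXT

products = MongoClient("mongodb://localhost:27017").shop.products

# Compound index covering category-then-price queries.
products.create_index([("category", ASCENDING), ("price", ASCENDING)])

# Multikey index: indexes each individual value in the "tags" array.
products.create_index("tags")

# Partial index: only documents matching the filter are indexed.
products.create_index("sku",
                      partialFilterExpression={"discontinued": False})

# TTL index: documents are removed automatically one hour after createdAt.
products.create_index("createdAt", expireAfterSeconds=3600)

# Text index for keyword search on descriptions.
products.create_index([("description", TEXT)])
```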

DynamoDB supports key-value queries only. For queries requiring aggregations, graph traversals, or search, data must be copied into additional AWS technologies, such as Elastic MapReduce or Redshift, increasing latency, cost, and developer work. The database supports two types of indexes: Global secondary indexes (GSIs) and local secondary indexes (LSIs). Users can define up to 5 LSIs and 20 GSIs per table. Indexes can be defined as hash or hash-range indexes; more advanced indexing strategies are not supported.

GSIs, which are eventually consistent with the underlying data, do not support ad-hoc queries, and using them requires knowledge of data access patterns in advance. GSIs also cannot index any element below the top-level record structure, so sub-documents and arrays cannot be indexed. LSIs can be queried to return strongly consistent data, but must be defined when the table is created. They cannot be added to existing tables, and they cannot be removed without dropping the table.
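
The sketch below illustrates this constraint with boto3: both index types are declared in the create_table call itself (the table, index, and attribute names are hypothetical).

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="orders",
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "status", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},
        {"AttributeName": "order_id", "KeyType": "RANGE"},
    ],
    # An LSI can only be declared here, at creation time, and must share
    # the table's hash key.
    LocalSecondaryIndexes=[{
        "IndexName": "by_date",
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    # A GSI may index only top-level attributes and is provisioned
    # (and billed) separately from the table.
    GlobalSecondaryIndexes=[{
        "IndexName": "by_status",
        "KeySchema": [{"AttributeName": "status", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
        "ProvisionedThroughput": {"ReadCapacityUnits": 5,
                                  "WriteCapacityUnits": 5},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)
```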

DynamoDB indexes are sized and provisioned separately from the underlying tables, which may result in unforeseen issues at runtime. The DynamoDB documentation explains,

“In order for a table write to succeed, the provisioned throughput settings for the table and all of its global secondary indexes must have enough write capacity to accommodate the write; otherwise, the write to the table will be throttled.”

DynamoDB also supports multi-record ACID transactions. Unlike MongoDB transactions, each DynamoDB transaction is limited to just 25 write operations, and the same item cannot be targeted by multiple operations within a single transaction. As a result, complex business logic may require multiple, independent transactions, which adds more code and overhead to the application while also increasing the likelihood of conflicts and transaction failures. Only base data in a DynamoDB table is transactional. Secondary indexes, backups and streams are updated “eventually”. This can lead to “silent data loss”: subsequent queries against indexes can return data that has not yet been updated from the base tables, breaking transactional semantics. Similarly, data restored from backups may not be transactionally consistent with the original table.
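
A minimal boto3 sketch of a DynamoDB transaction through the low-level client; the table and attribute names are hypothetical.

```python
import boto3

client = boto3.client("dynamodb")

# TransactItems is capped at 25 operations, and each item may appear in
# the transaction only once.
client.transact_write_items(TransactItems=[
    {"Update": {
        "TableName": "accounts",
        "Key": {"account_id": {"S": "alice"}},
        "UpdateExpression": "SET balance = balance - :amt",
        "ConditionExpression": "balance >= :amt",  # all-or-nothing guard
        "ExpressionAttributeValues": {":amt": {"N": "100"}},
    }},
    {"Update": {
        "TableName": "accounts",
        "Key": {"account_id": {"S": "bob"}},
        "UpdateExpression": "SET balance = balance + :amt",
        "ExpressionAttributeValues": {":amt": {"N": "100"}},
    }},
])
# Only the base table is covered: any GSI on "accounts" is updated
# eventually, outside these transactional guarantees.
```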

Consistency

MongoDB is strongly consistent by default, as all reads and writes go to the primary in a MongoDB replica set, scaled across multiple partitions (shards). If desired, consistency requirements for read operations can be relaxed. Through secondary consistency controls, read queries can be routed only to secondary replicas that fall within acceptable consistency limits with the primary server.
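
A minimal PyMongo sketch of those controls, routing reads to secondaries bounded by a staleness window; the connection string and collection are hypothetical.

```python
from pymongo import MongoClient

# Route reads to secondaries, but only those lagging the primary by
# less than ~90 seconds (the minimum allowed staleness bound).
client = MongoClient(
    "mongodb://localhost:27017/?replicaSet=rs0",
    readPreference="secondaryPreferred",
    maxStalenessSeconds=90,
)
doc = client.shop.inventory.find_one({"sku": "A-100"})
```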

DynamoDB is eventually consistent by default. Users can configure read operations to return only strongly consistent data, but this doubles the cost of the read (see Pricing and Commercial Considerations) and adds latency. There is also no way to guarantee read consistency when querying against DynamoDB’s global secondary indexes (GSIs); any operation performed against a GSI will be eventually consistent, returning potentially stale or deleted data, and therefore increasing application complexity.
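
A minimal boto3 sketch of opting into a strongly consistent read, reusing the hypothetical "orders" table from earlier:

```python
import boto3

orders = boto3.resource("dynamodb").Table("orders")

resp = orders.get_item(
    Key={"customer_id": "c-1", "order_id": "o-1001"},
    ConsistentRead=True,  # default is False: eventually consistent, half cost
)
item = resp.get("Item")
# Queries against a GSI cannot set ConsistentRead=True at all; they are
# always eventually consistent.
```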

Operational Maturity

MongoDB Atlas allows users to deploy, manage, and scale their MongoDB clusters using built-in operational and security best practices, such as end-to-end encryption, network isolation, role-based access control, VPC peering, and more. Atlas deployments are guaranteed to be available and durable with distributed and auto-healing replica set members and continuous backups with point-in-time recovery to protect against data corruption. MongoDB Atlas is fully elastic with zero-downtime configuration changes and auto-scaling of both storage and compute capacity. Atlas also grants organizations deep insights into how their databases are performing with a comprehensive monitoring dashboard, a real-time performance panel, and customizable alerting.

For organizations that would prefer to run MongoDB on their own infrastructure, MongoDB, Inc. offers advanced operational tooling to handle the automation of the entire database lifecycle, comprehensive monitoring (tracking 100+ metrics that could impact performance), and continuous backup. Product packages like MongoDB Enterprise Advanced bundle operational tooling and visualization and performance optimization platforms with end-to-end security controls for applications managing sensitive data.

MongoDB’s deployment flexibility allows single clusters to span racks, data centers and continents. With replica sets supporting up to 50 members and geography-aware sharding across regions, administrators can provision clusters that support global deployments, with write-local/read-global access patterns and data locality. Using Atlas Global Clusters, developers can deploy fully managed “write anywhere” active-active clusters, allowing data to be localized to any region. With each region acting as primary for its own data, the risks of data loss and eventual consistency imposed by the multi-primary approach used by DynamoDB are eliminated, and customers can meet the data sovereignty demands of new privacy regulations. Finally, multi-cloud clusters enable users to provision clusters that span AWS, Azure, and Google Cloud, giving maximum resilience and flexibility in terms of data distribution.

Offered only as a managed service on AWS, DynamoDB abstracts away its underlying partitioning and replication schemes. While provisioning is simple, other key operational tasks are lacking when compared to MongoDB:

  • Fewer than 20 database metrics are reported by AWS CloudWatch, which limits visibility into real-time database behavior
  • AWS CloudTrail can be used to create audit trails, but it only tracks a small subset of DDL (administrative) actions to the database, not all user access to individual tables or records
  • DynamoDB has limited tooling to allow developers and/or DBAs to optimize performance by visualizing schema or graphically profiling query performance
  • DynamoDB supports cross-region replication with multi-primary global tables; however, these add further application complexity and cost, with eventual consistency, risks of data loss due to write conflicts between regions, and no automatic client failover

Pricing & Commercial Considerations

In this section we will again compare DynamoDB with its closest analog from MongoDB, Inc., MongoDB Atlas.

DynamoDB’s pricing model is based on throughput. Users pay for a certain capacity on a given table, and AWS automatically throttles any reads or writes that exceed that capacity.

This sounds simple in theory, but the reality is that correctly provisioning throughput and estimating pricing is far more nuanced.

Below is a list of all the factors that could impact the cost of running DynamoDB:

  • Size of the data set per month
  • Size of each object
  • Number of reads per second (pricing is based on “read capacity units”, which are equivalent to reading a 4KB object) and whether those reads need to be strongly consistent or eventually consistent (the former is twice as expensive)
    • If accessing a JSON object, the entire document must be retrieved, even if the application needs to read only a single element
  • Number of writes per second (pricing is based on “write capacity units”, which are the equivalent of writing a 1KB object)
  • Whether transactions will be used. Transactions double the cost of read and write operations
  • Whether clusters will be replicated across multiple regions. This increases write capacity costs by 50%.
  • Size and throughput requirements for each index created against the table
  • Costs for backup and restore. AWS offers on-demand and continuous backups – both are charged separately, at different rates for both the backup and restore operation
  • Data transferred by DynamoDB Streams per month
  • Data transfers both in and out of the database per month
  • Cross-regional data transfers, EC2 instances, and SQS queues needed for cross-regional deployments
  • The use of additional AWS services to address what is missing from DynamoDB’s limited key value query model
  • Use of on-demand or reserved instances
  • Number of metrics pushed into CloudWatch for monitoring
  • Number of events pushed into CloudTrail for database auditing

Two items from the list above deserve particular emphasis: indexes affect pricing, and strongly consistent reads are twice as expensive.
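
As a rough illustration of the capacity-unit arithmetic behind those costs (the workload numbers are hypothetical):

```python
import math

item_kb = 6          # average item size in KB (hypothetical workload)
reads_per_sec = 80
writes_per_sec = 40

# One RCU = one strongly consistent read of up to 4 KB per second;
# eventually consistent reads cost half as much.
rcu_per_read = math.ceil(item_kb / 4)          # 2 RCUs per 6 KB read
strong_rcus = reads_per_sec * rcu_per_read     # 160 RCUs
eventual_rcus = math.ceil(strong_rcus / 2)     # 80 RCUs -- half the price

# One WCU = one write of up to 1 KB per second.
wcus = writes_per_sec * math.ceil(item_kb / 1)  # 240 WCUs

print(strong_rcus, eventual_rcus, wcus)
```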

With DynamoDB, throughput pricing actually dictates the number of partitions, not total throughput. Since users don’t have precise control over partitioning, if any individual partition is saturated, one may have to dramatically increase capacity by splitting partitions rather than scaling linearly. Very careful design of the data model is essential to ensure that provisioned throughput can be realized.

AWS has introduced the concept of Adaptive Capacity, which automatically increases the available resources for a single partition when it becomes saturated, but it is not without limitations. Total read and write volume to a single partition cannot exceed 3,000 read capacity units and 1,000 write capacity units per second. The required throughput increase cannot exceed the total provisioned capacity for the table. Adaptive capacity doesn’t grant more resources so much as borrow them from less utilized partitions. And finally, DynamoDB may take up to 15 minutes to provision additional capacity.

For customers frustrated with capacity planning exercises for DynamoDB, AWS recently introduced DynamoDB On-Demand, which allows the platform to provision additional resources automatically based on workload demand. On-demand is suitable for low-volume workloads with short spikes in demand. However, it can get expensive quickly: when the database’s utilization rate exceeds 14% of the equivalent provisioned capacity, DynamoDB On-Demand becomes more expensive than provisioning throughput.
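
A rough sketch of where that break-even figure comes from, using illustrative per-unit prices (assumed, not current AWS quotes):

```python
# Illustrative prices, roughly in line with published us-east-1 rates;
# check current AWS pricing before relying on these numbers.
provisioned_wcu_hour = 0.00065   # $ per provisioned WCU-hour
on_demand_per_write = 1.25e-6    # $ per on-demand write request unit

# A provisioned WCU can serve up to 3,600 writes in an hour; doing the
# same work on demand costs:
on_demand_equivalent = 3600 * on_demand_per_write   # ~$0.0045

break_even = provisioned_wcu_hour / on_demand_equivalent
print(f"{break_even:.0%}")   # ~14%: above this utilization, on-demand costs more
```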

Compared to DynamoDB, pricing for MongoDB Atlas is relatively straightforward; users select just:

  • The instance size with enough RAM to accommodate the portion of your data (including indexes) that clients access most often
  • The number of replicas and shards that will make up the cluster
  • Whether to include fully managed backups
  • The region(s) the cluster needs to run in

Users can adjust any of these parameters on demand. The only additional charge is for data transfer costs.

When to use DynamoDB vs. MongoDB

DynamoDB may work for organizations that are:

  • Looking for a database to support relatively simple key-value workloads
  • Heavily invested in AWS with no plans to change their deployment environment in the future

For organizations that need their database to support a wider range of use cases with more deployment flexibility and no platform lock-in, MongoDB would likely be a better fit.

For example, biotechnology giant Thermo Fisher migrated from DynamoDB to MongoDB for their Instrument Connect IoT app, citing that while both databases were easy to deploy, MongoDB Atlas allowed for richer queries and much simpler schema evolution.

Want to Learn More?

MongoDB Atlas Best Practices

This guide describes the best practices to help you get the most out of the MongoDB Atlas service, including: schema design, capacity planning, security, and performance optimization.

MongoDB Atlas Security Controls

This document will provide you with an understanding of MongoDB Atlas’ Security Controls and Features as well as a view into how many of the underlying mechanisms work.
