Comparing DynamoDB and MongoDB

Quick Comparison Table

|  | MongoDB | DynamoDB |
| --- | --- | --- |
| Freedom to Run Anywhere | Runs anywhere – on a laptop, in an on-premises data center, or on any public cloud – and is available as a fully managed service with MongoDB Atlas | Only available on AWS. No support for on-premises deployments. Locked in to a single cloud provider |
| Data Model | Document model with a wide spectrum of data types, including dates, timestamps, 64-bit integers, and Decimal128. Documents up to 16 MB; larger assets with GridFS | Limited key-value store with JSON support. Maximum 400 KB record size. Limited data type support (number, string, and binary only) increases application complexity |
| Querying | Rich queries: single keys, ranges, faceted search, graph traversals, JOINs, geospatial queries, and complex aggregations, all executed natively in the database | Key-value queries only. Primary key can have at most two attributes, limiting query flexibility. Analytic queries require replicating data to another AWS service, increasing cost and complexity |
| Indexing | Secondary indexes on any field, down to individual array values. Compound, unique, array, partial, TTL, geospatial, sparse, hash, wildcard, and text indexes. Strongly consistent with the underlying data | Limited and complex to manage. Indexes are sized, billed, and provisioned separately from data. Hash or hash-range indexes only. Global secondary indexes (GSIs) are eventually consistent with the underlying data, forcing applications to handle stale data. Local secondary indexes (LSIs) can be strongly consistent, but must be defined when a table is created. GSIs can only be declared on top-level item elements; sub-documents and arrays cannot be indexed, making complex queries impossible. Maximum of 20 GSIs and 5 LSIs per table |
| Data Integrity | Strongly consistent by default. Built-in schema validation. Multi-document ACID transactions | Eventually consistent. Complex – stale data must be handled in the application. No data validation – must be handled in the application. ACID transactions apply to table data only, not to indexes or backups. Maximum of 25 writes per transaction |
| Monitoring and Performance Tuning | Comprehensive monitoring tracking 100+ metrics, with tooling to visualize schema and optimize performance | Black box. Fewer than 20 metrics limit visibility into database behavior. No tools to visualize schema or recommend indexes |
| Backup | Continuous backups with point-in-time recovery | On-demand or continuous backups. No queryable backup; additional charge to restore backups; many configurations are not backed up and must be recreated manually |
| Pricing | Straightforward: instance size, number of replicas and shards, backups, and region(s) | Highly variable. Throughput-based pricing. A wide range of inputs may affect price. See Pricing and Commercial Considerations |

What is DynamoDB?

DynamoDB is a proprietary NoSQL database service built by Amazon and offered as part of the Amazon Web Services (AWS) portfolio.

The name comes from Dynamo, a highly available key-value store developed in response to holiday outages on the Amazon e-commerce platform in 2004. Initially, however, few teams within Amazon adopted Dynamo due to its high operational complexity and the trade-offs that needed to be made between performance, reliability, query flexibility, and data consistency.

Around the same time, Amazon found that its developers enjoyed using SimpleDB, its primary NoSQL database service at the time, which allowed users to offload database administration work. But SimpleDB, which is no longer being updated by Amazon, had severe limitations when it came to scale; its strict storage limit of 10 GB and the limited number of operations it could support per second made it viable only for small workloads.

DynamoDB, which was launched as a database service on AWS in 2012, was built to address the limitations of both SimpleDB and Dynamo.

What is MongoDB?

MongoDB is an open, non-tabular database built by MongoDB, Inc. The company was established in 2007 by former executives and engineers from DoubleClick, which Google acquired and now uses as the backbone of its advertising products. The founders originally focused on building a platform as a service using entirely open source components, but when they struggled to find an existing database that could meet their requirements for building a service in the cloud, they began work on their own database system. After realizing the potential of the database software on its own, the team shifted their focus to what is now MongoDB. The company released MongoDB in 2009.

MongoDB was designed to create a technology foundation that enables development teams through:

  1. The document data model – presenting them with the best way to work with data.
  2. A distributed systems design – allowing them to intelligently put data where they want it.
  3. A unified experience that gives them the freedom to run anywhere – allowing them to future-proof their work and eliminate vendor lock-in.

MongoDB stores data in flexible, JSON-like records called documents, meaning fields can vary from document to document and the data structure can be changed over time. This model maps to objects in application code, making data easy to work with for developers. Related information is typically stored together for fast query access through the MongoDB query language. MongoDB uses dynamic schemas, allowing users to create records without first defining the structure, such as the fields or the types of their values. Users can change the structure of documents simply by adding new fields or deleting existing ones. This flexible data model makes it easy for developers to represent hierarchical relationships and other more complex structures. Documents in a collection need not have an identical set of fields, and denormalization of data is common.
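
To make that flexibility concrete, below is a minimal PyMongo sketch, assuming a local mongod on the default port; the "inventory" collection and its fields are hypothetical.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
products = client.shop.inventory

# Two documents with different shapes in the same collection -- no
# schema migration is needed before inserting either one.
products.insert_one({"sku": "A-100", "name": "Kettle", "price": 24.99})
products.insert_one({
    "sku": "B-200",
    "name": "Teapot",
    "price": 31.50,
    "variants": [{"color": "blue", "stock": 4}, {"color": "red", "stock": 0}],
})

# Evolving the structure later is a plain update, not a schema change.
products.update_one({"sku": "A-100"}, {"$set": {"discontinued": False}})
```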

In the summer of 2016, MongoDB announced MongoDB Atlas, its fully managed cloud database service. Atlas offers genuine MongoDB under the hood, allowing users to offload operational tasks, and features built-in best practices for running the database with all the power and freedom developers are used to with MongoDB.

Terminology and Concepts

Many concepts in DynamoDB have close analogs in MongoDB. The table below outlines some of the common concepts across DynamoDB and MongoDB.

| DynamoDB | MongoDB |
| --- | --- |
| Table | Collection |
| Item | Document |
| Attribute | Field |
| Secondary Index | Secondary Index |

Deployment Environments

MongoDB can be run anywhere – from a developer’s laptop to an on-premises data center to any of the public cloud platforms. As mentioned above, MongoDB is also available as a fully managed cloud database with MongoDB Atlas; this model is most similar to how DynamoDB is delivered.

In contrast, DynamoDB is a proprietary database only available on Amazon Web Services. While a downloadable version of the database is available for prototyping on a local machine, the database can only be run in production in AWS. Organizations looking into DynamoDB should consider the implications of building on a data layer that is locked in to a single cloud vendor.
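
For prototyping, that downloadable version (DynamoDB Local) can be targeted by pointing the AWS SDK at a local endpoint. A minimal boto3 sketch, assuming DynamoDB Local is already running on its default port 8000:

```python
import boto3

# Point the SDK at the local endpoint; production code would target the
# AWS-hosted service instead. DynamoDB Local accepts any credentials.
dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",
    region_name="us-east-1",       # required by the SDK, unused locally
    aws_access_key_id="dummy",
    aws_secret_access_key="dummy",
)
print(list(dynamodb.tables.all()))  # lists tables created locally
```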

Comparethemarket.com, the UK’s leading price comparison service, completed a transition from on-prem deployments with Microsoft SQL Server to AWS and MongoDB. When asked why they hadn’t selected DynamoDB, a company representative was quoted as saying “DynamoDB was eschewed to help avoid AWS vendor lock-in.”

Data Model

MongoDB stores data in a JSON-like format called BSON, which allows the database to support a wide spectrum of data types, including dates, timestamps, 64-bit integers, and Decimal128. MongoDB documents can be up to 16 MB in size; with GridFS, even larger assets can be natively stored within the database.

Unlike some NoSQL databases that push enforcement of data quality controls back into the application code, MongoDB provides built-in schema validation. Users can enforce checks on document structure, data types, data ranges and the presence of mandatory fields. As a result, DBAs can apply data governance standards, while developers maintain the benefits of a flexible document model.
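
As an illustration, here is a minimal PyMongo sketch of schema validation using a $jsonSchema validator; the "users" collection and its rules are hypothetical.

```python
from pymongo import MongoClient
from pymongo.errors import WriteError

db = MongoClient("mongodb://localhost:27017").app

# Reject documents that lack mandatory fields or contain out-of-range values.
db.create_collection("users", validator={
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["email", "age"],
        "properties": {
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int", "minimum": 0, "maximum": 150},
        },
    }
})

db.users.insert_one({"email": "a@example.com", "age": 42})  # accepted
try:
    db.users.insert_one({"age": -5})  # missing email, age out of range
except WriteError:
    print("rejected by the collection validator")
```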

DynamoDB is a key-value store with added support for JSON to provide document-like data structures that better match objects in application code. An item or record cannot exceed 400 KB. Compared to MongoDB, DynamoDB has limited support for different data types. For example, it supports only one numeric type and does not support dates. As a result, developers must preserve data types on the client, which adds application complexity and reduces data re-use across different applications. DynamoDB has no native data validation capabilities.
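
The sketch below illustrates that client-side burden with boto3, assuming a hypothetical pre-created "orders" table: dates must be serialized by the application, and numbers must be passed as Decimal.

```python
from datetime import datetime, timezone
from decimal import Decimal

import boto3

orders = boto3.resource("dynamodb").Table("orders")

orders.put_item(Item={
    "customer_id": "c-1",
    "order_id": "o-1001",
    # No native date type: the application must serialize dates itself
    # (here as an ISO-8601 string) and parse them back on every read.
    "created_at": datetime.now(timezone.utc).isoformat(),
    # A single numeric type: boto3 requires Decimal rather than float.
    "total": Decimal("19.99"),
})
```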

Queries and Indexes

MongoDB’s API enables developers to build applications that can query and analyze their data in multiple ways – by single keys, ranges, faceted search, graph traversals, JOINs and geospatial queries through to complex aggregations, returning responses in milliseconds. Complex queries are executed natively in the database without having to use additional analytics frameworks or tools. This helps users avoid the latency that comes from syncing data between operational and analytical engines.
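
For example, here is a minimal PyMongo sketch of an analytic query expressed as an aggregation pipeline and run natively in the database; the "orders" collection and its fields are hypothetical.

```python
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017").shop.orders

# Top ten customers by revenue, computed inside the database -- no
# separate analytics engine or data sync required.
pipeline = [
    {"$match": {"status": "complete"}},
    {"$group": {"_id": "$customer_id", "revenue": {"$sum": "$total"}}},
    {"$sort": {"revenue": -1}},
    {"$limit": 10},
]
for row in orders.aggregate(pipeline):
    print(row)
```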

MongoDB ensures fast access to data by any field with full support for secondary indexes. Indexes can be applied to any field in a document, down to individual values in arrays.

MongoDB supports multi-document transactions, making it the only database to combine the ACID guarantees of traditional relational databases; the speed, flexibility, and power of the document model; and the intelligent distributed systems design to scale-out and place data where you need it.

Multi-document transactions feel just like the transactions developers are familiar with from relational databases – multi-statement, similar syntax, and easy to add to any application. Through snapshot isolation, transactions provide a globally consistent view of data and enforce all-or-nothing execution. MongoDB allows reads and writes against the same documents and fields within the transaction. For example, users can check the status of an item before updating it. MongoDB best practices advise up to 1,000 operations in a single transaction.
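
A minimal PyMongo sketch of such a transaction, assuming a replica set (required for transactions) and hypothetical account documents:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
accounts = client.bank.accounts

def transfer(session):
    # Read-your-own-writes inside the transaction: check, then update.
    src = accounts.find_one({"_id": "alice"}, session=session)
    if src["balance"] < 100:
        raise ValueError("insufficient funds")  # aborts the transaction
    accounts.update_one({"_id": "alice"}, {"$inc": {"balance": -100}},
                        session=session)
    accounts.update_one({"_id": "bob"}, {"$inc": {"balance": 100}},
                        session=session)

with client.start_session() as session:
    # with_transaction commits on success and retries transient errors.
    session.with_transaction(transfer)
```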

Supported indexing strategies such as compound, unique, array, partial, TTL, geospatial, sparse, hash, wildcard and text ensure optimal performance for multiple query patterns, data types, and application requirements. Indexes are strongly consistent with the underlying data.
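
A minimal PyMongo sketch of a few of these strategies; the collection and field names are hypothetical.

```python
from pymongo import MongoClient, ASCENDING, TEXT

products = MongoClient("mongodb://localhost:27017").shop.products

# Compound index covering category-then-price queries.
products.create_index([("category", ASCENDING), ("price", ASCENDING)])

# Multikey index: indexes each individual value in the "tags" array.
products.create_index("tags")

# Partial index: only documents matching the filter are indexed.
products.create_index("sku",
                      partialFilterExpression={"discontinued": False})

# TTL index: documents are removed automatically one hour after createdAt.
products.create_index("createdAt", expireAfterSeconds=3600)

# Text index for keyword search on descriptions.
products.create_index([("description", TEXT)])
```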

DynamoDB supports key-value queries only. For queries requiring aggregations, graph traversals, or search, data must be copied into additional AWS technologies, such as Elastic MapReduce or Redshift, increasing latency, cost, and developer work. The database supports two types of indexes: Global secondary indexes (GSIs) and local secondary indexes (LSIs). Users can define up to 5 LSIs and 20 GSIs per table. Indexes can be defined as hash or hash-range indexes; more advanced indexing strategies are not supported.

GSIs, which are eventually consistent with the underlying data, do not support ad-hoc queries, and using them requires knowledge of data access patterns in advance. GSIs also cannot index any element below the top-level record structure, so sub-documents and arrays cannot be indexed. LSIs can be queried to return strongly consistent data, but must be defined when the table is created. They cannot be added to existing tables, and they cannot be removed without dropping the table.
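
The sketch below illustrates this constraint with boto3: both index types are declared in the create_table call itself (the table, index, and attribute names are hypothetical).

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="orders",
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "status", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},
        {"AttributeName": "order_id", "KeyType": "RANGE"},
    ],
    # An LSI can only be declared here, at creation time, and must share
    # the table's hash key.
    LocalSecondaryIndexes=[{
        "IndexName": "by_date",
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    # A GSI may index only top-level attributes and is provisioned
    # (and billed) separately from the table.
    GlobalSecondaryIndexes=[{
        "IndexName": "by_status",
        "KeySchema": [{"AttributeName": "status", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
        "ProvisionedThroughput": {"ReadCapacityUnits": 5,
                                  "WriteCapacityUnits": 5},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)
```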

DynamoDB indexes are sized and provisioned separately from the underlying tables, which may result in unforeseen issues at runtime. The DynamoDB documentation explains,

“In order for a table write to succeed, the provisioned throughput settings for the table and all of its global secondary indexes must have enough write capacity to accommodate the write; otherwise, the write to the table will be throttled.”

DynamoDB also supports multi-record ACID transactions. Unlike MongoDB transactions, each DynamoDB transaction is limited to just 25 write operations, and the same item cannot be targeted by multiple operations within a single transaction. As a result, complex business logic may require multiple, independent transactions, which adds more code and overhead to the application while also increasing the likelihood of conflicts and transaction failures. Only base data in a DynamoDB table is transactional. Secondary indexes, backups and streams are updated “eventually”. This can lead to “silent data loss”: subsequent queries against indexes can return data that has not yet been updated from the base tables, breaking transactional semantics. Similarly, data restored from backups may not be transactionally consistent with the original table.
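
A minimal boto3 sketch of a DynamoDB transaction through the low-level client; the table and attribute names are hypothetical.

```python
import boto3

client = boto3.client("dynamodb")

# TransactItems is capped at 25 operations, and each item may appear in
# the transaction only once.
client.transact_write_items(TransactItems=[
    {"Update": {
        "TableName": "accounts",
        "Key": {"account_id": {"S": "alice"}},
        "UpdateExpression": "SET balance = balance - :amt",
        "ConditionExpression": "balance >= :amt",  # all-or-nothing guard
        "ExpressionAttributeValues": {":amt": {"N": "100"}},
    }},
    {"Update": {
        "TableName": "accounts",
        "Key": {"account_id": {"S": "bob"}},
        "UpdateExpression": "SET balance = balance + :amt",
        "ExpressionAttributeValues": {":amt": {"N": "100"}},
    }},
])
# Only the base table is covered: any GSI on "accounts" is updated
# eventually, outside these transactional guarantees.
```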

Consistency

MongoDB is strongly consistent by default, as all reads and writes go to the primary in a MongoDB replica set, scaled across multiple partitions (shards). If desired, consistency requirements for read operations can be relaxed. Through secondary consistency controls, read queries can be routed only to secondary replicas that fall within acceptable consistency limits with the primary server.
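
A minimal PyMongo sketch of those controls, routing reads to secondaries bounded by a staleness window; the connection string and collection are hypothetical.

```python
from pymongo import MongoClient

# Route reads to secondaries, but only those lagging the primary by
# less than ~90 seconds (the minimum allowed staleness bound).
client = MongoClient(
    "mongodb://localhost:27017/?replicaSet=rs0",
    readPreference="secondaryPreferred",
    maxStalenessSeconds=90,
)
doc = client.shop.inventory.find_one({"sku": "A-100"})
```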

DynamoDB is eventually consistent by default. Users can configure read operations to return only strongly consistent data, but this doubles the cost of the read (see Pricing and Commercial Considerations) and adds latency. There is also no way to guarantee read consistency when querying against DynamoDB’s global secondary indexes (GSIs); any operation performed against a GSI will be eventually consistent, returning potentially stale or deleted data, and therefore increasing application complexity.
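
A minimal boto3 sketch of opting into a strongly consistent read, reusing the hypothetical "orders" table from earlier:

```python
import boto3

orders = boto3.resource("dynamodb").Table("orders")

resp = orders.get_item(
    Key={"customer_id": "c-1", "order_id": "o-1001"},
    ConsistentRead=True,  # default is False: eventually consistent, half cost
)
item = resp.get("Item")
# Queries against a GSI cannot set ConsistentRead=True at all; they are
# always eventually consistent.
```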

Operational Maturity

MongoDB Atlas allows users to deploy, manage, and scale their MongoDB clusters using built-in operational and security best practices, such as end-to-end encryption, network isolation, role-based access control, VPC peering, and more. Atlas deployments are guaranteed to be available and durable with distributed and auto-healing replica set members and continuous backups with point-in-time recovery to protect against data corruption. MongoDB Atlas is fully elastic with zero-downtime configuration changes and auto-scaling of both storage and compute capacity. Atlas also grants organizations deep insights into how their databases are performing with a comprehensive monitoring dashboard, a real-time performance panel, and customizable alerting.

For organizations that would prefer to run MongoDB on their own infrastructure, MongoDB, Inc. offers advanced operational tooling to handle the automation of the entire database lifecycle, comprehensive monitoring (tracking 100+ metrics that could impact performance), and continuous backup. Product packages like MongoDB Enterprise Advanced bundle operational tooling and visualization and performance optimization platforms with end-to-end security controls for applications managing sensitive data.

MongoDB’s deployment flexibility allows single clusters to span racks, data centers and continents. With replica sets supporting up to 50 members and geography-aware sharding across regions, administrators can provision clusters that support global deployments, with write-local/read-global access patterns and data locality. Using Atlas Global Clusters, developers can deploy fully managed “write anywhere” active-active clusters, allowing data to be localized to any region. With each region acting as primary for its own data, the risks of data loss and eventual consistency imposed by the multi-primary approach used by DynamoDB are eliminated, and customers can meet the data sovereignty demands of new privacy regulations. Finally, multi-cloud clusters enable users to provision clusters that span AWS, Azure, and Google Cloud, giving maximum resilience and flexibility in terms of data distribution.

Offered only as a managed service on AWS, DynamoDB abstracts away its underlying partitioning and replication schemes. While provisioning is simple, other key operational tasks are lacking when compared to MongoDB:

  • Fewer than 20 database metrics are reported by AWS CloudWatch, which limits visibility into real-time database behavior
  • AWS CloudTrail can be used to create audit trails, but it only tracks a small subset of DDL (administrative) actions to the database, not all user access to individual tables or records
  • DynamoDB has limited tooling to allow developers and/or DBAs to optimize performance by visualizing schema or graphically profiling query performance
  • DynamoDB supports cross-region replication with multi-primary global tables; however, these add further application complexity and cost, with eventual consistency, risks of data loss due to write conflicts between regions, and no automatic client failover

Pricing & Commercial Considerations

In this section we will again compare DynamoDB with its closest analog from MongoDB, Inc., MongoDB Atlas.

DynamoDB’s pricing model is based on throughput. Users pay for a certain capacity on a given table, and AWS automatically throttles any reads or writes that exceed that capacity.

This sounds simple in theory, but the reality is that correctly provisioning throughput and estimating pricing is far more nuanced.

Below is a list of all the factors that could impact the cost of running DynamoDB:

  • Size of the data set per month
  • Size of each object
  • Number of reads per second (pricing is based on “read capacity units”, which are equivalent to reading a 4KB object) and whether those reads need to be strongly consistent or eventually consistent (the former is twice as expensive)
    • If accessing a JSON object, the entire document must be retrieved, even if the application needs to read only a single element
  • Number of writes per second (pricing is based on “write capacity units”, which are the equivalent of writing a 1KB object)
  • Whether transactions will be used. Transactions double the cost of read and write operations
  • Whether clusters will be replicated across multiple regions. This increases write capacity costs by 50%.
  • Size and throughput requirements for each index created against the table
  • Costs for backup and restore. AWS offers on-demand and continuous backups – both are charged separately, at different rates for both the backup and restore operation
  • Data transferred by DynamoDB Streams per month
  • Data transfers both in and out of the database per month
  • Cross-regional data transfers, EC2 instances, and SQS queues needed for cross-regional deployments
  • The use of additional AWS services to address what is missing from DynamoDB’s limited key value query model
  • Use of on-demand or reserved instances
  • Number of metrics pushed into CloudWatch for monitoring
  • Number of events pushed into CloudTrail for database auditing

Two items from the list above deserve particular emphasis: indexes affect pricing, and strongly consistent reads are twice as expensive.
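
As a rough illustration of the capacity-unit arithmetic behind those costs (the workload numbers are hypothetical):

```python
import math

item_kb = 6          # average item size in KB (hypothetical workload)
reads_per_sec = 80
writes_per_sec = 40

# One RCU = one strongly consistent read of up to 4 KB per second;
# eventually consistent reads cost half as much.
rcu_per_read = math.ceil(item_kb / 4)          # 2 RCUs per 6 KB read
strong_rcus = reads_per_sec * rcu_per_read     # 160 RCUs
eventual_rcus = math.ceil(strong_rcus / 2)     # 80 RCUs -- half the price

# One WCU = one write of up to 1 KB per second.
wcus = writes_per_sec * math.ceil(item_kb / 1)  # 240 WCUs

print(strong_rcus, eventual_rcus, wcus)
```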

With DynamoDB, throughput pricing actually dictates the number of partitions, not total throughput. Since users don’t have precise control over partitioning, if any individual partition is saturated, one may have to dramatically increase capacity by splitting partitions rather than scaling linearly. Very careful design of the data model is essential to ensure that provisioned throughput can be realized.

AWS has introduced the concept of Adaptive Capacity, which automatically increases the available resources for a single partition when it becomes saturated, but it is not without limitations. Total read and write volume to a single partition cannot exceed 3,000 read capacity units and 1,000 write capacity units per second. The required throughput increase cannot exceed the total provisioned capacity for the table. Adaptive capacity doesn’t grant more resources so much as borrow them from less utilized partitions. And finally, DynamoDB may take up to 15 minutes to provision additional capacity.

For customers frustrated with capacity planning exercises for DynamoDB, AWS recently introduced DynamoDB On-Demand, which allows the platform to provision additional resources automatically based on workload demand. On-demand is suitable for low-volume workloads with short spikes in demand. However, it can get expensive quickly: when the database’s utilization rate exceeds 14% of the equivalent provisioned capacity, DynamoDB On-Demand becomes more expensive than provisioning throughput.
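
A rough sketch of where that break-even figure comes from, using illustrative per-unit prices (assumed, not current AWS quotes):

```python
# Illustrative prices, roughly in line with published us-east-1 rates;
# check current AWS pricing before relying on these numbers.
provisioned_wcu_hour = 0.00065   # $ per provisioned WCU-hour
on_demand_per_write = 1.25e-6    # $ per on-demand write request unit

# A provisioned WCU can serve up to 3,600 writes in an hour; doing the
# same work on demand costs:
on_demand_equivalent = 3600 * on_demand_per_write   # ~$0.0045

break_even = provisioned_wcu_hour / on_demand_equivalent
print(f"{break_even:.0%}")   # ~14%: above this utilization, on-demand costs more
```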

Compared to DynamoDB, pricing for MongoDB Atlas is relatively straightforward; users select just:

  • The instance size with enough RAM to accommodate the portion of your data (including indexes) that clients access most often
  • The number of replicas and shards that will make up the cluster
  • Whether to include fully managed backups
  • The region(s) the cluster needs to run in

Users can adjust any of these parameters on demand. The only additional charge is for data transfer costs.

When to use DynamoDB vs. MongoDB

DynamoDB may work for organizations that are:

  • Looking for a database to support relatively simple key-value workloads
  • Heavily invested in AWS with no plans to change their deployment environment in the future

For organizations that need their database to support a wider range of use cases with more deployment flexibility and no platform lock-in, MongoDB would likely be a better fit.

For example, biotechnology giant Thermo Fisher migrated from DynamoDB to MongoDB for their Instrument Connect IoT app, citing that while both databases were easy to deploy, MongoDB Atlas allowed for richer queries and much simpler schema evolution.

Want to Learn More?

MongoDB Atlas Best Practices

This guide describes the best practices to help you get the most out of the MongoDB Atlas service, including: schema design, capacity planning, security, and performance optimization.

MongoDB Atlas Security Controls

This document will provide you with an understanding of MongoDB Atlas’ Security Controls and Features as well as a view into how many of the underlying mechanisms work.
