[melkjug-dev] MongoDB

Joshua Bronson

2009-04-13 18:48:13 UTC

one of the mongodb people wrote up a couch-vs-mongo page I found helpful.
from http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB:

Comparing Mongo DB and Couch DB
<http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB>

Added by Dwight Merriman <http://www.mongodb.org/display/~dwight>, last
edited by Dwight Merriman <http://www.mongodb.org/display/~dwight> on Mar
30, 2009

We get a lot of questions asking "how are Mongo DB and Couch different?"
It's a good question: both are document-oriented databases with schemaless
JSON-style object data storage. Both products have their place -- we are
big believers that databases are specializing and "one size fits all" no
longer applies.

We are not Couch DB gurus, so please let us know in the forums
<http://groups.google.com/group/mongodb-user/browse_thread/thread/757d7f1e5f1765e8>
if we have something wrong.

MVCC

One big difference is that Couch is based on MVCC
<http://en.wikipedia.org/wiki/Multiversion_concurrency_control>, and
Mongo DB is more of a traditional update-in-place store. MVCC is
very good for certain classes of problems: problems which need intense
versioning; problems with offline databases that resync later; problems
where you want a large amount of master-master replication happening. MVCC
also brings some work with it. First, the database must be compacted
periodically if there are many updates. Second, when conflicts occur on
transactions, they must be handled by the programmer manually (unless the db
also does conventional locking).
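
As a rough illustration of the two costs named above (manual conflict
handling and periodic compaction), here is a toy sketch in Python. The names
RevisionedStore and ConflictError and all of their methods are hypothetical,
invented for this example. This is not Couch's API or storage format, only
the general idea of keeping old versions and rejecting a write that was
based on a stale revision.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class RevisionedStore:
    """Toy MVCC-style store (hypothetical; not Couch's real API)."""

    def __init__(self):
        self._versions = {}  # doc_id -> list of (rev, doc), oldest first

    def latest_rev(self, doc_id):
        versions = self._versions.get(doc_id)
        return versions[-1][0] if versions else 0

    def get(self, doc_id):
        """Return (rev, copy of the latest document)."""
        rev, doc = self._versions[doc_id][-1]
        return rev, dict(doc)

    def put(self, doc_id, doc, base_rev=0):
        """Append a new version only if the writer saw the latest revision."""
        current = self.latest_rev(doc_id)
        if base_rev != current:
            raise ConflictError(
                f"{doc_id}: write based on rev {base_rev}, latest is {current}"
            )
        self._versions.setdefault(doc_id, []).append((current + 1, dict(doc)))
        return current + 1

    def version_count(self, doc_id):
        return len(self._versions[doc_id])

    def compact(self):
        """Drop every version except the latest one."""
        for doc_id in self._versions:
            self._versions[doc_id] = self._versions[doc_id][-1:]


store = RevisionedStore()
store.put("page", {"hits": 1})

# Two writers read the same revision.
rev_a, doc_a = store.get("page")
rev_b, doc_b = store.get("page")

doc_a["hits"] += 1
store.put("page", doc_a, base_rev=rev_a)  # succeeds

doc_b["hits"] += 10
try:
    store.put("page", doc_b, base_rev=rev_b)  # stale revision
except ConflictError:
    # The programmer resolves the conflict: re-read, reapply, retry.
    rev_b, latest = store.get("page")
    latest["hits"] += 10
    store.put("page", latest, base_rev=rev_b)

store.compact()  # old versions pile up until something removes them
```

The second writer's first attempt is rejected rather than silently
overwriting the first writer's change, and the retry logic lives in
application code.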

Mongo DB updates an object in-place when possible. Problems that require high
update rates on objects are a great fit; compaction is not really necessary.
Mongo's replication works great, but without the MVCC model it is more
oriented towards master/slave and auto failover configurations than to
complex master-master setups.
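
To make the update-in-place idea concrete, here is a minimal Python sketch.
InPlaceStore and its methods are hypothetical names made up for this
example and are not the Mongo driver API. The point is only that each
update mutates the single stored copy, so storage does not grow with the
number of updates and there is nothing to compact.

```python
class InPlaceStore:
    """Toy store holding exactly one copy of each document (hypothetical)."""

    def __init__(self):
        self._docs = {}  # doc_id -> the one and only stored document

    def insert(self, doc_id, doc):
        self._docs[doc_id] = dict(doc)

    def increment(self, doc_id, field, amount=1):
        """Mutate the stored document directly; no new version is created."""
        doc = self._docs[doc_id]
        doc[field] = doc.get(field, 0) + amount
        return doc[field]

    def find_one(self, doc_id):
        return dict(self._docs[doc_id])

    def stored_copies(self):
        return len(self._docs)


counters = InPlaceStore()
counters.insert("/index.html", {"views": 0})
for _ in range(1000):
    counters.increment("/index.html", "views")
```

After a thousand increments the store still holds a single copy of the
document, which is why a high update rate is cheap in this model.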

Horizontal Scalability

One fundamental difference is that a number of Couch users use replication
as a way to scale. With Mongo, we tend to think of replication as a way to
gain reliability/failover rather than scalability. Mongo uses (auto)
sharding as our path to scalability (the first sharding release is in April).
In this sense Mongo DB is more like Google BigTable. (We hear that Couch
might one day add partitioning too.)
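
The difference between the two approaches can be sketched in a few lines of
Python. This is a generic illustration of partitioning by key versus
copying everything everywhere; it is not a description of how Mongo's
sharding or Couch's replication is actually implemented. ShardedStore,
ReplicatedStore, and shard_for are hypothetical names, and the hash-based
routing is an assumption chosen for simplicity.

```python
import zlib


def shard_for(key, num_shards):
    """Pick a shard from a stable hash of the key (illustrative scheme only)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class ShardedStore:
    """Each document lives on exactly one shard, so capacity adds up."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, doc):
        self.shards[shard_for(key, len(self.shards))][key] = doc

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def shard_sizes(self):
        return [len(shard) for shard in self.shards]


class ReplicatedStore:
    """Every node holds every document, so any node can serve any read."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key, doc):
        for node in self.nodes:
            node[key] = doc

    def node_sizes(self):
        return [len(node) for node in self.nodes]


sharded = ShardedStore(4)
replicated = ReplicatedStore(4)
for i in range(100):
    sharded.put(f"user{i}", {"n": i})
    replicated.put(f"user{i}", {"n": i})
```

With sharding the 100 documents are split across the four shards; with
replication each of the four nodes stores all 100. Replication gives
redundancy and spreads reads, while sharding spreads both data and writes.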

Query Expression

Couch uses a clever index building scheme to generate indexes which support
particular queries. There is an elegance to the approach, although one must
predeclare these structures for each query one wants to execute.

Mongo's approach is more traditional: like, say, MySQL, we can do queries
where an index does not exist, or where an index is helpful but only
partially so. Mongo includes a query optimizer which makes these
determinations. We find this is very nice for inspecting the data
administratively. And when an index corresponds perfectly to the query, the
Couch and Mongo approaches are then conceptually similar.

Couch's index building method, using JavaScript functions, does provide an
extra layer of flexibility in how one can create keys.
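
A rough Python analogue of the two query styles follows. Couch's real map
functions are written in JavaScript and Mongo has its own query syntax and
optimizer; build_view, view_lookup, ad_hoc_find, and the sample documents
below are all hypothetical, made up to show the shape of the idea: a
predeclared function that emits keys into a sorted index, versus a query
that can run with no index at all by scanning.

```python
import bisect

docs = [
    {"_id": "a", "type": "post", "author": "joe", "year": 2009},
    {"_id": "b", "type": "post", "author": "ann", "year": 2008},
    {"_id": "c", "type": "post", "author": "joe", "year": 2009},
    {"_id": "d", "type": "comment", "author": "joe", "year": 2009},
]


def by_author_year(doc):
    """Predeclared map function: decides which docs get a key, and what key."""
    if doc.get("type") == "post":
        yield (doc["author"], doc["year"])


def build_view(docs, map_fn):
    """Run map_fn over every document; return (key, doc_id) rows sorted by key."""
    rows = [(key, doc["_id"]) for doc in docs for key in map_fn(doc)]
    rows.sort()
    return rows


def view_lookup(rows, key):
    """Binary-search the sorted rows for an exact key match."""
    keys = [k for k, _ in rows]
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return [doc_id for _, doc_id in rows[lo:hi]]


def ad_hoc_find(docs, **criteria):
    """No index: scan every document and test each field."""
    return [
        d["_id"]
        for d in docs
        if all(d.get(field) == value for field, value in criteria.items())
    ]


view = build_view(docs, by_author_year)
indexed_hits = view_lookup(view, ("joe", 2009))
scanned_hits = ad_hoc_find(docs, type="post", author="joe")
```

The map function is free to compute any key it likes (here a compound key
restricted to posts), which is the flexibility mentioned above. The ad hoc
query needs no advance declaration but pays for a full scan when no index
helps.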

Performance, ACID, REST

Philosophically, Mongo is very oriented toward performance, at the expense
of features that would impede performance. We see Mongo DB being useful for
many problems where databases have not been used in the past because
databases are too "heavy".

To ensure high speed, Mongo forgoes full ACID compliance -- in some ways it
is similar to the MySQL MyISAM storage engine. Couch has full ACID
compliance.

Couch uses REST as its interface to the database. This is very nice but is
not as fast as native drivers. With its focus on performance, Mongo DB
relies on language-specific database drivers for access to the database. A
REST API could be added in the future, however -- that would be a great open
source project -- one could for example build one as an Apache module using
the existing Mongo C++ driver.

Use Cases

It may be helpful to look at some particular problems and consider how we
could solve them.

- if we were building Lotus Notes, we would use Couch as the MVCC model
fits perfectly. Any problem where data is offline for hours then back
online would fit this.
- if we need several eventually consistent master-master replica
databases, geographically distributed, we would use Couch.
- if we had very high performance requirements, and less concern about
ACID compliance, we would use Mongo. For example, web site user profile
object storage and caching of data from other sources. Any problem with
"high volume, low value" data.
- if we were building a system with very critical transactions, such as
financial transactions, we would not use Mongo for those transactions --
although we might use it in a hybrid setup for other data elements of the
system. For something like this, though, we would likely choose a
traditional RDBMS.
- for a problem with very high update rates, we would use Mongo as it is
good at that. For example, updating real time analytics counters for a web
site (page views, visits, etc.).