In Part 1 of this series we saw how to create and run a simple MongoDB instance based on CentOS. This is good for basic dev and test use, but not much beyond that, as it does not address a number of performance and fault tolerance challenges. In this post, we take a closer look at Docker’s disk storage options and the associated considerations for running a database such as MongoDB on Docker.

File system layering

Docker’s root file system layering.

One of Docker’s key features (and my personal favourite) is the layering of the root file system. Each of the underlying layers is read-only, and they stack up to form the actual file system, with only the top layer writable. Layers can then be easily versioned, compared to see exactly what changed, and cached so that we don’t need to rebuild them from scratch each time.

This is a huge improvement over the traditional golden image approach, whereby entire file system images or Virtual Machine (VM) templates are manually built – it’s often unclear what exactly is in them and why. More recent approaches involve Configuration Management (CM) tools such as Puppet, Chef, and Ansible, but building a complex image on demand from scratch can take a long time. Docker’s layering approach makes this blazingly fast by rebuilding only the layers that have changed.

It is, however, not without downsides: the run-time performance of such layered file systems can be woefully slow. How slow depends on the storage driver used, with the original AUFS being deprecated in favour of other backends like OverlayFS, Btrfs, and device mapper. Regardless, I/O-heavy workloads should be moved to Docker data volumes for optimal performance. These live outside the container’s layered root file system and thus bypass it. There are two main data volume types: host directories and data-only containers.

Data Volumes: Host directory

Using a Docker host directory data volume.


A host directory data volume is simply a directory that is mounted into the original container. Building upon our previous example in Part 1, create a directory on our Docker host and use it for MongoDB’s dbpath (which contains the data and journal files). For example:
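As a rough sketch, this could look like the function below. It assumes the image from Part 1 is tagged mongod and that MongoDB uses its default dbpath of /data/db inside the container; the container name and the helper name start_mongod_hostdir are hypothetical.

```shell
# Sketch: start mongod with a host directory mounted as its dbpath.
# Assumptions: image tag "mongod" (from Part 1), dbpath /data/db in the container.
start_mongod_hostdir() {
  host_dir="${1:-$HOME/db}"
  mkdir -p "$host_dir"
  # -d: run detached, -P: publish exposed ports, -v: mount host_dir at /data/db
  docker run -d -P --name mongod -v "$host_dir":/data/db mongod
}
```

Calling start_mongod_hostdir ~/db creates the directory if needed and starts the container.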

Check that the MongoDB container has started successfully by inspecting the log files:
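A minimal way to do this, assuming the container was named mongod (a hypothetical name, as is the helper show_mongod_logs), is docker logs:

```shell
# Sketch: print everything the container has written to stdout/stderr.
# The default container name "mongod" is an assumption.
show_mongod_logs() {
  docker logs "${1:-mongod}"
}
```

A healthy start typically ends with mongod reporting that it is waiting for connections on port 27017.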

Ensure that the data files have been created in the specified host directory ~/db:
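A plain directory listing on the Docker host is enough for this check. The helper name list_db_files is hypothetical, and the default path simply mirrors the ~/db directory mentioned above.

```shell
# Sketch: list the files mongod created in the host directory.
list_db_files() {
  ls -l "${1:-$HOME/db}"
}
```

If the mount worked, the listing should no longer be empty; what exactly appears depends on the MongoDB version and storage engine in use.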

Quick benchmarking

How much faster are host directory data volumes than the default layered root file system? This of course depends on your environment, and proper performance testing is beyond the scope of this blog post, but here’s a quick way to get a rough benchmark with mongoperf.

First let’s create a mongoperf Docker image with the following Dockerfile:
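The sketch below writes one possible Dockerfile into the current directory. Everything in it is an assumption rather than the exact file used for the results in this post: the centos base image, the mongodb-org-tools package as the source of mongoperf, the /data working directory, and the mongoperf settings (chosen to match the 32 read-write threads discussed later).

```shell
# Sketch: generate a Dockerfile for a mongoperf image (all values are assumptions).
cat > Dockerfile <<'EOF'
FROM centos
# Add the MongoDB yum repository definition
COPY mongodb.repo /etc/yum.repos.d/
# Assumption: mongoperf is shipped in the mongodb-org-tools package
RUN yum install -y mongodb-org-tools
# mongoperf writes its test file to the working directory
WORKDIR /data
# Ramp up to 32 threads doing reads and writes against a 1000 MB file
CMD echo "{nThreads:32,fileSizeMB:1000,r:true,w:true}" | mongoperf
EOF
```

Because the benchmark file is written to /data, mounting a volume at that path later moves the I/O off the layered file system without changing the image.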

Use the same mongodb.repo as the previous example in Part 1, reproduced here for your convenience:
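If you don’t have the file from Part 1 to hand, the sketch below writes a typical yum repository definition for MongoDB packages of that era. Treat the section name and baseurl as assumptions and prefer the file from Part 1 if it differs.

```shell
# Sketch: write a yum repo definition for MongoDB (contents are assumed, not copied from Part 1).
cat > mongodb.repo <<'EOF'
[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
EOF
```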

With the above two files in your current directory, build the image by running:
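A small sketch of the build step, assuming the image should be tagged mongoperf (the name used in the rest of this post); the helper name build_mongoperf_image is hypothetical.

```shell
# Sketch: build the mongoperf image from the current directory.
build_mongoperf_image() {
  # Fail early if either input file is missing from the build context.
  for f in Dockerfile mongodb.repo; do
    [ -f "$f" ] || { echo "missing $f" >&2; return 1; }
  done
  docker build -t mongoperf .
}
```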

Now benchmark the layered root file system by running:
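One way to do this, assuming the image’s default command starts mongoperf and that the container is named mongoperf (both assumptions), is to start it in the background and follow its log output:

```shell
# Sketch: run the benchmark with no volume, so all I/O hits the layered file system.
run_mongoperf_layered() {
  docker run -d --name mongoperf mongoperf
  # Follow the benchmark output; Ctrl-C stops following, not the container.
  docker logs -f mongoperf
}
```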

You should see output similar to the following:

mongoperf will keep running so press CTRL-c to get back to the terminal. The container is still running in the background, so let’s terminate it:
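Stopping and removing the container can be sketched as follows. The helper name remove_container is hypothetical, and the container name passed to it is whatever was used when starting the benchmark.

```shell
# Sketch: stop a running container, then remove it so the name can be reused.
remove_container() {
  docker stop "$1" && docker rm "$1"
}
```

For example, remove_container mongoperf cleans up a benchmark container of that name.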

Now re-run the benchmark with a host directory data volume instead:
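A sketch of the same run with a host directory mounted over the directory that mongoperf writes to. The container path /data and the default host path ~/tmp are assumptions; adjust /data to wherever your image runs mongoperf.

```shell
# Sketch: run the benchmark with a host directory mounted at /data (assumed path).
run_mongoperf_hostdir() {
  host_dir="${1:-$HOME/tmp}"
  mkdir -p "$host_dir"
  docker run -d --name mongoperf -v "$host_dir":/data mongoperf
  docker logs -f mongoperf
}
```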

Here’s the corresponding output from my setup:

Terminate and remove the container as before.

Comparing the last set of results with 32 concurrent read-write threads, we see a 180% improvement in the number of operations per second, from 1211 to 3385 ops/sec. There’s also a 225% increase in throughput from 4 to 13 MB/sec.
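The percentages follow from (new - old) / old × 100. A quick check of the arithmetic, using only the figures quoted above (the helper name pct_increase is hypothetical):

```shell
# Percentage increase from an old value to a new value, rounded to a whole number.
pct_increase() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (new - old) / old * 100 }'
}

pct_increase 1211 3385   # ops/sec: prints 180
pct_increase 4 13        # MB/sec:  prints 225
```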

Container portability

These performance gains come at the cost of container portability: our mongod container now requires a directory on the Docker host that is not managed by Docker, so we can’t easily run it on, or move it to, another Docker host. The solution is to use data-only containers, as described in the next section.

Data Volumes: Data-only containers

Using a Docker data volume container.


Data-only containers are the recommended pattern for storing data in Docker, as they avoid the tight coupling to host directories.

To create the data-only container for our benchmark, we re-use the existing mongoperf image:
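A minimal sketch using docker create, which sets up a container and its volume without starting it. The volume path /data is an assumption and should match the directory mongoperf writes to in your image; the helper name create_data_container is hypothetical.

```shell
# Sketch: create (but do not start) a container whose only job is to own a volume.
create_data_container() {
  docker create -v /data --name mongoperf-data mongoperf
}
```

The container never needs to run; it only has to exist so that other containers can reference its volume.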

Now re-run the benchmark with the --volumes-from mongoperf-data parameter to use our data-only container:
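As a sketch, assuming the benchmark container is again named mongoperf (a hypothetical name, like the helper run_mongoperf_datavol), the flag Docker expects is spelled --volumes-from:

```shell
# Sketch: run the benchmark using the volume owned by the mongoperf-data container.
run_mongoperf_datavol() {
  docker run -d --name mongoperf --volumes-from mongoperf-data mongoperf
  docker logs -f mongoperf
}
```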

This produces the following output in my setup:

Performance-wise, it is similar to host directory data volumes. The data-only container and its volume persist even if the referencing container is removed; the volume is only deleted when the -v option is passed to docker rm for the last container that references it. We see this by running:
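A sketch of such a check, assuming the benchmark container is named mongoperf (the helper name show_data_container_persists is hypothetical). Note the -a flag: a data-only container that was created but never started does not appear in a plain docker ps.

```shell
# Sketch: remove the benchmark container, then list all containers.
show_data_container_persists() {
  docker stop mongoperf
  docker rm mongoperf
  # -a also lists containers that are not running, such as mongoperf-data
  docker ps -a
}
```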

Wrapping up

Coming back to our mongod container, we can now run it with a data-only container for better performance:
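Putting the pieces together could look like the sketch below. The image tag mongod, the dbpath /data/db, and the container names mongod-data and mongod are all assumptions carried over from the earlier examples, and run_mongod_datavol is a hypothetical helper name.

```shell
# Sketch: give mongod a data-only container for its dbpath.
run_mongod_datavol() {
  # Data-only container that owns the /data/db volume
  docker create -v /data/db --name mongod-data mongod
  # mongod itself, using that volume and publishing its exposed ports
  docker run -d -P --name mongod --volumes-from mongod-data mongod
}
```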

Remember, you can see the mapped local port number by running docker ps. For example:
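A short sketch, assuming the container is named mongod and exposes MongoDB’s default port 27017 (the helper name show_mapped_ports is hypothetical). docker ps shows the mapping in its PORTS column, and docker port prints just the host address for one container port.

```shell
# Sketch: show running containers, then the host mapping for port 27017.
show_mapped_ports() {
  docker ps
  docker port "${1:-mongod}" 27017
}
```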

Volumes will eventually become first-class citizens in Docker. Meanwhile, consider using community tools like docker-volume to manage them more easily.

What’s next

In the next part of this series, we will investigate the various Docker networking options and see how they fit in with a multi-host MongoDB replica set. Stay tuned!