Docker Discoveries

Stumbling blocks & realisations

James Booth


My approach to new technology is to treat it like Lego. Start small, build the pieces together, miss a few blocks, watch it topple over and rebuild with the new knowledge in mind. This has been especially true with Docker, as the barrier to entry is very low, but there are a lot of specific features that are great when you need them, but they often come with big gotchas.

After a little while of using Docker at a small scale to run some services, I’ve hit some roadblocks along the way and wish to share some of the changes I’ve made to my workflow and discoveries I’ve uncovered. These might be obvious to you, but I would have been happy to know about them when I first started!

Stay composed!

Firstly, docker-compose is a key part of my Docker workflow and I think it should be one of the first things people invest a little bit of time into when learning Docker.

When I was starting with Docker, it seemed like a lot of examples for running containers just used the docker run command, along with some unwieldy, arcane list of arguments. This is great for one-off containers, but when I started building small sets of containers (eg. a web server and its MySQL instance), or had container configs that I wanted to re-use/persist, I quickly found that the docker command alone didn't scale.

This is where docker-compose excels. It allows us to define exactly what we want in our Docker project - containers, networks, volumes, etc. - and store this in a simple yaml file. Run docker-compose up and we have a self-documenting, reproducible container or set of containers!

Take a look at the docs as docker-compose is extremely powerful and can even be used for managing Docker Swarm.

Compose version

Originally, I just stuck to version 2 compose files… because that’s all I knew of. But by defining the newer compose file versions - the latest is version: '3.1' at the time of writing - we get more features available to us, and are guided towards the newer way of doing things.

One of the bigger changes that affected my workflow is the move to declaring volumes and networks at the root level of the yaml file, for example:

version: "3.1"

services:
  myservice:
    image: myimage
    volumes:
      - "somenamedvolume:/etc/destination"

volumes:
  somenamedvolume:
    external: true
    # This means it has been declared outside of the compose file

Now that they’re declared outside of any services, we can re-use them across a number of services (more on this below).

Using multiple compose files

Additionally, we can even specify multiple compose files, which allows us to take a modular approach to building our final product.

For example, I could have a service defined in my docker-compose.yml with environmental variables, and then inside the container, my application will use these variables to connect to a database:

# Example docker-compose.yml
version: '2'
services:
  myapp:
    image: myapp:latest
    ports:
      - 9999:9999
    environment:
      DATABASE_USER: productionuser
      DATABASE_PASS: productionpass

Simply running docker-compose up in the same directory as this file will spin up my container.

If I wanted to connect my app to a different database for testing, instead of editing the docker-compose.yml I could specify another compose file that overrides only these options:

# Example dev.yml
version: '2'
services:
  myapp:
    environment:
      DATABASE_USER: devuser
      DATABASE_PASS: devpass

And then we simply launch the project, specifying the paths to these files, with files that take precedence coming later in the chain, eg:

docker-compose -f docker-compose.yml -f dev.yml up
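To sanity-check the result of merging multiple files, docker-compose can print out the final, effective configuration - handy for debugging override files:

docker-compose -f docker-compose.yml -f dev.yml config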

Data persistence using volumes

Containers are ephemeral by nature, but once you move past the ‘let’s give this a whirl’ stage, the need for persistent data becomes apparent. This is where volumes come into play. They are a way to persist data from a container by storing it somewhere on the host’s filesystem, not just inside the container. This allows the data to survive deletion and recreation of the container, and as long as we re-use the same volume arguments (more below), the data will be right where we left it.

There are two ways I map data in (and out of) my containers, each with their own caveats: named data volumes and host volumes.

Data volumes

We can define a named data volume by simply appending -v <volume_name>:/location/to/map/in/container to our docker run command, or more appropriately, by defining it in our compose file like so:

services:
  myservice:
    image: abc:latest
    volumes:
      - "mynamedvolume:/path/in/container"

  anotherservice:
    image: xyz:latest
    volumes:
      - "mynamedvolume:/a/different/path/perhaps"

volumes:
  mynamedvolume:
    # options here if required

As you can see, the volume is defined outside of any services, and this allows multiple services to map it into their filesystems. It also means that even if we docker-compose down, the volumes will persist on the host’s filesystem under /var/lib/docker/volumes/<projectname>_<volumename>.

Additionally, using this method we get the extra benefit that any existing data in the container path we are mapping gets copied into the volume, and is therefore persistent & accessible on the host filesystem. This is great for making adjustments to an image’s files without actually altering the image. For example, if we spin up an nginx container with docker run -v nginx_config:/etc/nginx nginx:latest and then kill the container, we can now access that data at /var/lib/docker/volumes/nginx_config/_data (requires sudo):

├── conf.d
│   └── default.conf
├── fastcgi_params
├── koi-utf
├── koi-win
├── mime.types
├── modules -> /usr/lib/nginx/modules
├── nginx.conf
├── scgi_params
├── uwsgi_params
└── win-utf

Now, edit any configs as necessary and relaunch the container with this named volume attached.
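For example, to relaunch nginx with the edited config attached (publishing port 80 to the host is an assumption here):

docker run -v nginx_config:/etc/nginx -p 80:80 nginx:latest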

I use this method most for data I don't need to access often and don't want the permissions altered, or for volumes I want to share amongst multiple containers.

Host volumes

Another option is to define a host volume. This differs in that we choose where the volume lives on the host filesystem (specified as either a relative or full path).

For example, using the docker command, we can just supply -v /path/on/host:/path/in/container, or if we wanted to supply a relative path, we have to expand the variable $PWD like so: docker run -v ${PWD}/volumes/nginx_config:/etc/nginx nginx:latest. And when using docker-compose, we just scrap the volumes: definition in the above example and directly specify the paths like so:

services:
  myservice:
    image: abc:latest
    volumes:
      - "./volumes/nginx_config:/etc/nginx"

Using this method provides us with the huge benefit of being able to use any storage backend, as we’re simply mapping files from the host’s filesystem into the container. On the other hand, it does come with some caveats…

Files & folders

Obviously we already know that we can mount a folder into a container, but this method lets us map in a single file too. This is great for mapping in individual config files, for example the Kerberos config file /etc/krb5.conf - here we definitely don’t want to map the entire /etc/ directory into the container.

Another thing to be aware of when mapping just a single file is that it is copied in at container launch. This means that changes on either the host or container will not be reflected on the other until the container is relaunched, at which point it will use the host’s copy again.
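As a sketch, mapping in just that one file might look like this (the image name is hypothetical, and :ro makes the mount read-only):

docker run -v /etc/krb5.conf:/etc/krb5.conf:ro myimage:latest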


Pre-existing files

When using host volumes, the host’s folder is mounted directly on top of the container destination, meaning that you will only see files from the host path, not any files that may have already been there. Unlike named volumes, no pre-existing files are copied out onto the host’s filesystem, ensuring nothing on the host is overwritten. Any files present in the host folder when you launch the container, and any new files created afterwards, will be accessible on the host’s filesystem.

If you want to use this method but also want the container’s initial files, you can launch the container without the volume mount and copy the files/folders out to the host filesystem with docker cp my_container-name:/path/to/files ./relative/path/on/host/filesystem.


Permissions

This is the big one. Because we are taking files from the host’s filesystem and placing them directly into the container’s filesystem, permissions are maintained from the host. This is normally fine when your container process runs as root, but as soon as you throw in other users (eg. www-data for web server processes), they will quickly hit access denied errors.

There are a few ways to get around this, and it’s best to plan ahead as it will become a source of frustration (trust me). The priority here is to ensure that the user inside the container has the correct access to the files.

  • In your Docker entrypoint/command, chown the folders in question. This means that whenever the container is launched, it will take ownership of the files.
  • Use relaxed permissions on the host filesystem (777, 666). Please be aware that this has security implications of its own.
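The first option can be sketched as a tiny entrypoint script - the path and user here are hypothetical and depend entirely on your image:

#!/bin/sh
# Take ownership of the mounted path, then hand off to the real command
chown -R www-data:www-data /var/www/html
exec "$@"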

I use this method most of the time, as it helps keep all of the Docker config & container files together on the host's filesystem.

Sidenote: When using this method, I originally settled on the format ./volumes/<servicename>/etc/<application>:/etc/<application>, trying to best mirror the container filesystem. However, since I use a mixture of host and named volumes, I now try to use a more general name for either, so they all look alike - eg. nginx_config:/etc/nginx for a named volume, or ./volumes/nginx_config:/etc/nginx if using a host volume.

Using full disks for volumes

When we need to store a lot of data, it might be a good idea to map a volume directly to an entire disk. And because my Docker hosts are virtual, that just means adding a VHDx to the VM.

To do this, define the following in the compose file:

volumes:
  mydatavolume:
    driver: local
    driver_opts:
      device: /dev/sdb
      type: ext4

Please be aware that this does not copy existing data out the way a named volume does.

External volumes

As well as defining volumes in this way, we can also specify that a volume is external, meaning that it has been created outside of the compose file (docker volume create myvolume) and it will be used instead of a new volume being created with the project’s name as a prefix. This means we can manually create a volume that we wish to share between multiple Docker projects, and just refer to it in every project that uses it.

When referencing an external volume in a compose file, the syntax looks like:

volumes:
  myvolume:
    external: true

This external volume must already have been manually created prior to launching the project!


Networking

This was another big one, where I started working with something but then had to revisit it to ensure I was doing it properly and in a scalable fashion (hint: I wasn’t).

To begin with, I was using the default network_mode: bridge, which simply attaches the containers to the Docker bridged network. On top of this, I was making use of the ‘links’ feature to connect containers securely. But this approach doesn’t scale (and ‘links’ are deprecated now), as you have to explicitly specify every connected container.

Now, I’m working towards using Docker networks, which act like standard layer 2 broadcast domains, with Docker managing some extras like DNS resolution between containers. So, for a Docker project with a web service and a database, the compose file could look like:

services:
  web:
    image: nginx
    ports:
      - 80:80
      - 443:443
    networks:
      - default
      - database

  mariadb:
    image: mariadb
    networks:
      database:
        aliases:
          - "db"
        # This makes the container resolvable by the hostname 'db' too

networks:
  database:
    internal: true
    # Internal means it isn't available to the host

Using the above example, we can access our webserver instance externally, but only the webserver container can connect to the mariadb container (via the database network).

External networks

Similar to external volumes, we can also define Docker networks outside of a compose file (via docker network create mynetworkname) and reference them inside compose files, across multiple projects. So, we can define networks inside our project’s compose file for networks that are limited to that specific project (they will be prefixed with the project’s name), and have a few static networks that are used by all projects.

For example, I make use of this by having a single private network that any containers with a webserver component are connected to. This private network was created with docker network create --internal private, and the --internal flag means that it is not bridged to the host’s network, so nobody external to the host can access these containers directly.

Next, I have a single proxy container that is on both the public network (exposing ports 80 & 443 on the host) and the private network, connecting it to all of these web server containers. This means that requests going to the Docker host are sent to the proxy container (via the public network), then, depending upon the hostname requested, proxied to another container (on the private network). This way, the containers on this internal network are never directly exposed to outside hosts, and it allows us to host multiple web services on the same host without having to assign different ports to each.


In each project’s compose file, this shared network is then referenced as external:

networks:
  private:
    external: true
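As a sketch, the proxy project’s compose file might then look something like this (the image is an assumption - any reverse proxy would do):

services:
  proxy:
    image: nginx
    ports:
      - 80:80
      - 443:443
    networks:
      - default
      - private

networks:
  private:
    external: true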

Sidenote: Proxying

I initially made heavy use of nginx’s proxying features, but as I added more containers, it became hard to scale successfully. I’ve moved towards using Traefik, as it does a lot of the configuration automatically and reloads on-the-fly to accommodate any new containers that are added. More info to come on this, but it warrants its own post!

Building and running

Just a few points here, as this is an area where I’m still actively improving my workflow, but so far I’ve found the following to be really helpful:

  • Build containers using a familiar, non-restrictive base first (ubuntu:16.04) then move to a tiny base when happy (alpine:3.5). Obviously, the Dockerfile will have to be converted to use Alpine Linux’s package manager, etc. but it makes for a tiny base image!
  • Know when to keep instructions in a single Docker RUN command and when to split them, to control the layers in a Docker image - this is especially helpful when troubleshooting a failed build, as you can launch a new container from the most recent image layer and manually troubleshoot from there.
  • Build from git! This is a handy extra for those times when the image is in a git repo, but not yet published to a Docker registry (Basically all of my images!). Check out the docs for more info, as you can build from specific tags, folders, etc.
  • Use the --rm flag to automatically delete a container once it has quit (Now available with docker-compose run too!).
  • It’s possible to keep a container running in the background, even if there’s no active service. For example, I like to quickly jump into my Ansible container with docker exec -it ansible /bin/bash, but this assumes it’s still running. The workaround I use is to keep a ‘fake’ process running and make it require a tty in my compose file:
  command: "tail -f /dev/null"
  tty: yes
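Put together, such a keep-alive service definition might look like this (the image name is hypothetical; container_name matches the docker exec example above):

services:
  ansible:
    image: my-ansible:latest
    container_name: ansible
    command: "tail -f /dev/null"
    tty: true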

Other tidbits

A final footnote on this post was to just say how useful I’ve found Docker to be - not for the unicorn, floaty, cloudy microservices kind of situation you may hear about, but just as a way to run multiple applications on a single Ubuntu VM.

This lets me make the most out of the ‘infrastructure as code’ philosophies, by version controlling & iterating upon my containers/images, using Docker as a pseudo-hypervisor. I might not follow the ‘one service per container’ mantra all the time, but I use Docker as a tool when it fits me.
