Docker 101


Joseph Gefroh

Docker is a great tool, but it’s difficult to get into if you’re not familiar with the deployment side of software engineering. This post should hopefully help you decipher some of it.

Docker is a container engine. You can think of a container as a lightweight virtual machine, though unlike a true VM it shares the host’s kernel instead of emulating hardware.

You use Docker by first writing a text file with a set of instructions that sets up your server environment appropriately to run your program. This instruction file, the Dockerfile, is a step-by-step list of instructions that details how Docker should go about installing programs, copying files, changing environment settings, etc. to make sure your program runs.

Docker can use this Dockerfile to build an image: a fully-contained executable version of your program that you can run along with the OS and other dependencies necessary to run it.

Docker can run this image, creating a container out of it.
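As a quick preview before we dig in, the whole workflow boils down to two commands (my-app here is just a placeholder name for your image):

docker build -t my-app .   # build an image from the Dockerfile in the current directory
docker run my-app          # create and start a container from that image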

The benefits are tremendous!

You get reproducible builds across multiple environments. Because the Dockerfile is a text file, it can be run by any other computer with Docker and ultimately produce the same resulting image. No more having to set up a build environment on a teammate’s machine and not having it work for various reasons.

You get consistent runtime environments. No more having bugs because different servers have slightly different configurations or settings. Docker’s images ensure that what you run is the same regardless of when or where you run it.

You get isolation. Because your programs run in containers, which are essentially self-contained environments, you can run multiple programs on a single host without affecting each other outside of basic resource usage.

It’s lightweight. Traditional virtual machine images are huge, often running into the gigabytes. Containers are much more lightweight: you can have a production-ready container running in as little as 25 megabytes.

Convinced? Let’s get started.

The Dockerfile is your starting point. It contains the set of instructions you need to start using Docker.

Most Dockerfiles have a base image, specified by the FROM directive. The FROM directive tells Docker to run the commands in the Dockerfile on top of another image. The base image is often an install of a Linux OS — Alpine is a popular choice due to its incredibly compact footprint.

Let’s examine the following Dockerfile, line-by-line:

FROM alpine:3.5
COPY script.sh /script.sh
RUN echo 'Built!'
CMD ./script.sh

FROM alpine:3.5

This line tells Docker to look up and download the alpine base image tagged as version 3.5 from the Docker repository. There are many different base images you can choose from, and you can even make and publish your own.

COPY script.sh /script.sh

This line tells Docker to COPY the file script.sh from the host machine and place it in the image at the root path /. This is a primary way you get files from your build machine into your final image.

RUN echo 'Built!'

This line tells Docker to RUN the given command, in this case printing Built! to the console. RUN is the primary way you execute programs to install or configure your image.

CMD ./script.sh

This final line tells Docker what command to run when the container is launched, in this case running the script.sh file. Note that the CMD directive does not run during the build; it runs only when you actually execute the image via docker run.
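As an aside, CMD also supports an “exec form” that runs the program directly instead of wrapping it in a shell; the example above uses the shell form. A minimal sketch, assuming script.sh is executable and has a shebang line:

CMD ["./script.sh"]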

There are a lot of other directives available when building your image: EXPOSE declares a port your container listens on, ENV lets you specify an environment variable, etc. Read the full Dockerfile reference for more information.
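For instance, here are a couple of those directives in use; the port number and variable are purely illustrative:

EXPOSE 8080              # declare that the app inside listens on port 8080
ENV NODE_ENV production  # set an environment variable inside the image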

Once you have a Dockerfile, you would run the docker build command to create an executable image. Docker would run through the commands in order, moving files and doing whatever it is you wrote in the Dockerfile to create an image containing your program and anything needed to run it.
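Assuming your Dockerfile sits in the current directory, the build might look like this; the -t flag tags the image with a human-readable name (my-app:1.0 is a placeholder):

docker build -t my-app:1.0 .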

The end result is an image — run docker images to see a list of all completed images available to you.

Docker runs through the instructions in the Dockerfile, setting up an environment.

Once you have the image, you can tell Docker to run it by creating containers.

All images are assigned a unique identifier, a random hexadecimal string, which you’ll see in that docker images listing.

Use the docker run <image> [optional-command] command to actually execute your image and create a container instance of it.

If you had a CMD instruction in your Dockerfile, the container will automatically execute that command when it starts. Passing an optional command to docker run overrides it.

You can run the container in the background with the -d flag, which specifies that it is detached, e.g. docker run -d <image>.
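Putting a few common flags together in one sketch; the names and port here are placeholders:

docker run -d -p 8080:8080 --name web my-app   # detached, host port 8080 mapped to container port 8080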

A lot of the power of Docker comes from actually running it in a server environment. You have several options to get your image uploaded:

  • You can save it as a .tar file locally, scp it into the server, extract it, then run it. Check out docker export, docker save, docker import, and docker load, depending on what your needs are.
  • You can upload it to a repository like Docker Hub or Amazon ECR, then have the server pull the image and run it. Check out docker push and docker pull (see the sketch after this list).
  • You can build the image on your server directly by uploading your source code and Dockerfile, and then running the various commands there.
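For instance, the repository route might look like this; myuser/my-app is a placeholder repository name:

docker tag my-app myuser/my-app:1.0
docker push myuser/my-app:1.0
# ...then, on the server:
docker pull myuser/my-app:1.0
docker run -d myuser/my-app:1.0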

If you’re using Alpine Linux you have to manually install everything.

Alpine is small for a reason: it is missing almost everything. You’ll need to manually install things you take for granted, even bash or curl. My Dockerfiles often include a line that installs multiple packages via its package manager, apk:

RUN apk add --update \
  bash \
  build-base \
  curl \
  file \
  git \
  nodejs \
  openssl-dev \
  readline-dev \
  vim \
  wget \
  zlib-dev

Node may break on Alpine, especially if you use native libraries

Some Node packages include native libraries. For various reasons, many of these packages do not come with precompiled binaries that work with Alpine.

If you try to npm install or use such a library, you’ll likely get an error and will need to build the binaries yourself to match Alpine. To do that, you’ll need to install the tools these packages use to build their libraries, which vary depending on the package.

For example, many of my projects use the build tool Brunch, which in turn uses a library called node-sass. node-sass does not come with precompiled binaries that work with Alpine. In order to compile the binaries natively, I also have to include an instruction in my Dockerfile to install gcc, python, and g++ via apk.
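That instruction might look something like this; the package names match Alpine’s repositories from that era (newer Alpine versions ship python3 instead):

RUN apk add --update gcc g++ python   # build tools node-sass needs to compile natively
RUN npm install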

Take advantage of the layer caches

Docker will cache the results of every instruction in a Dockerfile, ensuring that if that portion of the image doesn’t change, you don’t have to go through it all over again.

This is a fantastic speed increase, but it requires careful planning for web applications that download dependencies, such as those using package.json or the Ruby Gemfile. For example, the following example would cause the bundle install command to run on every build, regardless of whether the Gemfile actually changed; this is slow and inefficient.

COPY . /app
WORKDIR /app
RUN bundle install

A good way to utilize caching is to not copy the entirety of the application into the Docker image initially, but rather just the dependency listing:

COPY Gemfile /app/
COPY Gemfile.lock /app/
WORKDIR /app
RUN bundle install
COPY . /app

This approach ensures that if only the source code changes, just the final instruction, COPY . /app, needs to be re-run. The time-consuming bundle install runs only when the Gemfile (or other package manifest) changes.
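The same pattern applies to Node projects; a minimal sketch, assuming npm and a package-lock.json:

COPY package.json package-lock.json /app/
WORKDIR /app
RUN npm install
COPY . /app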

Multi-stage builds can greatly reduce the size of your final image

Let’s face it: a lot of the dependencies required to build the final runnable output of a project, such as a static site generator or JavaScript concatenation tools, aren’t of any use at runtime. They take up space, might cause conflicts, and may even be a security issue if left in the image.

You can use multi-stage builds in Docker to ensure that those dependencies aren’t included in the final build. By having multiple FROM statements in your Dockerfile, you can use instructions to copy specific files into the final output image:

FROM mhart/alpine-node:6.11.3 AS builder

COPY . /
RUN ./build.sh # This outputs compiled files to /output

FROM alpine:3.4
COPY --from=builder /output /app/public

Explore the state of your image if you have any errors

Docker creates an intermediate image layer for every instruction in your Dockerfile and uses it as a cache.

This means that if any of your instructions fail, you can view, manipulate, and explore the state of the in-progress image right before the failure. You can use this to figure out what exactly went wrong and even run commands as if you were Docker yourself.

To do this, you can use docker run -it <image-id> /bin/bash to enter the image (use /bin/sh on a bare Alpine image, since bash isn’t installed by default). For example, if the build output listed an intermediate image id of b08b377f004f, I could use it to enter the image in the state it was left in right after the COPY command was run.
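Putting that together, a debugging session might look like this; b08b377f004f stands in for whatever intermediate id your build output printed:

docker build .                        # each completed step prints an intermediate image id
docker run -it b08b377f004f /bin/sh   # open a shell in the layer just before the failure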

Did you find this story helpful? Please Clap to show your support! If you didn’t find it helpful, please let me know why with a comment!
