What is Docker and why is it so popular?
Since its release in 2013, Docker has gained massive traction amongst software companies, as it makes the deployment of containerized microservices extremely convenient. Today, Docker is practically synonymous with containers and containerized microservices.
Needless to say, even for deploying Machine Learning models, Docker has been the go-to container platform, as it boasts the following characteristics and benefits:
- Operating System level virtualization, enabling portability and consistency. As a result of this layer of abstraction, Docker containers are often much faster than virtual machines, as there is no need to lumber through the process of spinning up an entire operating system each time a container launches.
- Lightweight (i.e., containing only the required dependencies). As a result of (1), Docker containers also consume fewer system resources. Moreover, ML models are more often than not deployed as API microservices and hence require only bare-minimum dependencies. The arduous process of matching UAT and production environments is often a lot more complicated than it should be, and Docker containers have hence become the de facto microservice architecture for deploying ML models.
- Less expensive — this is simply the product of the above two points.
Yet with all the benefits that Docker has to offer, it is tempting to simply use Docker without understanding how it works and the security issues it may present when shipping an API microservice to a client.
To understand how Docker works, we need to understand two main features of the Linux kernel which Docker takes advantage of.
Namespaces
In short, namespaces allow the partitioning of kernel resources and processes. They form the building blocks of containers on Linux: resources and processes that share a namespace can only see and access the resources within that namespace, creating a layer of isolation.
When a Docker container is spun up, Docker creates a set of namespaces for that particular container, and its access is limited to those namespaces. The namespaces Docker typically uses manage processes (pid), networking (net), volume mounting (mnt), access to interprocess communication resources (ipc), and kernel/version identifiers (uts, for Unix Time-Sharing). This isolates the execution environment, as the container cannot see other namespaces, containers, or applications on the system.
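As a quick sketch of this isolation (assuming Docker is installed and an alpine image is available), the pid namespace means a container sees only its own processes:

```shell
# On the host, many processes are visible:
ps aux | wc -l

# Inside a fresh container, only the container's own processes exist;
# here, `ps aux` itself runs as PID 1 and sees nothing from the host:
docker run --rm alpine ps aux
```

The same principle applies to the other namespaces: the container gets its own network interfaces, mount table, and hostname, independent of the host's.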
Control Groups (cgroups)
Control groups, also known as cgroups, are a feature of the Linux kernel that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes. This essentially allows the Docker engine to share the host's available hardware resources with the containers and optionally enforce limits and constraints, ensuring that each containerized microservice will not use more resources than it should. From a security perspective, this helps prevent a distributed denial-of-service (DDoS) attack from exhausting resources and inadvertently "crashing" the container and, consequently, the host system.
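To make this concrete, here is a minimal sketch of enforcing cgroup limits at docker run time (the flag values are illustrative, not recommendations):

```shell
# Cap the container at 256 MB of memory and half a CPU core;
# Docker translates these flags into cgroup settings on the host.
docker run --rm --memory=256m --cpus=0.5 alpine echo "running under cgroup limits"
```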
How are namespaces different from cgroups?
Namespaces limit what you can see (and therefore use) while cgroups limit how much you can use.
Consequently, when we execute the command docker run, behind the scenes Docker creates a set of namespaces and cgroups for the container. From the container's perspective, it does not know that it is in fact a "machine within a machine".
Now that we know how Docker works, what are the potential concerns when deploying a containerized microservice? Docker's official page on Docker security lists four major areas to consider:
- The intrinsic security of the Linux kernel and its support of namespaces and cgroups
- The attack surface of the Docker daemon itself
- The loopholes in the container configuration profile, and
- The “hardening” security features of the kernel and how they interact with containers
Before we deep dive into each of these potential concerns, let’s take a step back and ponder upon the setup of a Docker container.
Fundamentally, a Docker container sits on a host machine and shares its resources. These, as described above, can be configured by any user with access to the Docker engine. How, then, can an attacker gain access or potentially inject malware into the container and/or host?
If a container service is exposed, an attacker can potentially gain access to it and carry out an attack, typically via a shell. If a read-write volume is mounted on the container, the attacker can then write malicious files out to the host system. These potential problems arise from within the container.
Conversely, there may also be potential loopholes from the host's perspective, prior to spinning up the container. Running containers and applications with Docker requires the Docker daemon, and this daemon requires root privileges. As such, the user setting up the container has to be trusted, and the container configuration is consequently very important. In other words, other problems may arise from outside the container (i.e., from the host/kernel).
Let’s take a look at some of the ways to increase the security of our containers by reducing the attack surface of the daemon and container themselves.
Hardening the Host
#1: Use a container image scanner
The purpose of an image scanner is to analyze the contents and build process of a container image in order to detect security issues, vulnerabilities, or bad practices. Some of the tools on the market include Docker Bench, Clair, and Anchore. I have personally only tried out Docker Bench, and I've included a screenshot below to show the kind of output it produces.
In my opinion, it is a good place to get acquainted with security best practices once one has gotten the hang of building a basic Docker image, as it flags potential security issues related to:
- Linux host specific configuration
- Docker daemon configuration & configuration files
- Container images & build file
- Container runtime
- Docker swarm configuration (if any)
- Docker enterprise configuration (if any)
These checks correspond to best practices for building Docker images and are inspired by the CIS Docker Benchmark v1.2.0. It may also be worthwhile to investigate how to implement these checks in your CI/CD pipeline.
Examples of checks performed by Docker Bench
Comments: I had issues running the macOS docker run command on my local machine, and resorted to cloning the repository and running the sudo sh docker-bench-security.sh script directly.
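For reference, that workaround amounts to the following (assuming the standard docker/docker-bench-security repository on GitHub; the script needs root to inspect the daemon configuration):

```shell
git clone https://github.com/docker/docker-bench-security.git
cd docker-bench-security
sudo sh docker-bench-security.sh
```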
#2: Enhance the kernel security using modern Linux kernels like SELinux, AppArmor, etc
AppArmor is a Linux kernel security module that restricts the capabilities of processes running on the host operating system. By default, a deployed Docker container is secured through an auto-generated profile named docker-default, generated from this template, which provides moderate security at the application level. To increase security, an AppArmor profile can be associated with the container using the --security-opt option during docker run, enforcing a security profile that works at the process/program level of an application.
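A minimal sketch of attaching a custom profile follows; the profile name and path here are hypothetical, and the profile must first be loaded into the kernel with apparmor_parser:

```shell
# Load (or reload) the hypothetical profile into the kernel:
sudo apparmor_parser -r -W /etc/apparmor.d/containers/my-api-profile

# Attach it to the container at run time:
docker run --rm --security-opt apparmor=my-api-profile alpine echo "confined"
```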
Like AppArmor, Security-Enhanced Linux (SELinux) is a Linux kernel security module that provides a mechanism for supporting access control security policies, including mandatory access controls (MAC). It provides an out-of-the-box additional layer of security by giving administrators more control over who can access the system. This is especially important since an attacker who can run a Docker container with root privileges can do pretty much anything to the host. (Do note that using SELinux with containers is only supported on CentOS and Red Hat Enterprise Linux.)
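On an SELinux-enforcing host, one place this shows up immediately is volume mounts: bind-mounted content must carry the right label to be readable from inside the container. A hedged sketch, with an illustrative host path:

```shell
# Without a relabel, reads from /data would be denied by SELinux;
# the :Z suffix applies a private, container-specific label to the host directory.
docker run --rm -v /srv/model-data:/data:Z alpine ls /data
```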
Hardening the Container
#1: Do not execute the container as the root user (if possible). After doing the necessary installations in the initial steps of the Dockerfile, switch to an unprivileged user with something like:

```dockerfile
# Create an unprivileged group and user, then switch to that user;
# every instruction after USER (and the running container) drops root privileges.
RUN groupadd -r myuser && useradd -r -g myuser myuser
USER myuser
```
#2: Create ephemeral containers
As a rule of thumb, grant access to only what you need and grant minimal level of access. This can be accomplished by:
- Limiting Linux kernel capabilities by dropping all capabilities and adding back only what is needed (e.g., docker run --cap-drop all --cap-add CHOWN alpine)
- Preventing privilege escalation within the container by using the --security-opt=no-new-privileges option
- Disabling inter-container communication by starting the Docker daemon with the --icc=false option
- Limiting the resources available to the container (e.g., via --memory and --cpus)
- Setting the filesystem (e.g., docker run --read-only alpine sh) and volumes (e.g., docker run -v volume-name:/path/in/container:ro alpine) to read-only (unless write access is necessary)
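Putting these points together, a locked-down docker run for a hypothetical ML API image might look like this (the image name, volume name, and port are illustrative):

```shell
# Drop all capabilities and add back only CHOWN; forbid privilege escalation;
# keep the root filesystem read-only with a tmpfs for scratch space; mount the
# model volume read-only; and enforce cgroup memory/CPU limits.
docker run --rm \
  --cap-drop all --cap-add CHOWN \
  --security-opt=no-new-privileges \
  --read-only --tmpfs /tmp \
  -v model-store:/models:ro \
  --memory=512m --cpus=1 \
  -p 8000:8000 \
  my-ml-api:latest
```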
#3: Use a distroless image in a multi-stage Docker build
Distroless images are base images that do not contain package managers, shells, or any other programs you would expect to find in a standard Linux distribution. A distroless image is essentially a lightweight version of the base image you use in the first stage of your build. To use one, install dependencies and build the necessary artifacts on a normal base image in the first stage, then copy the artifacts over to a distroless version of that base image in the final stage.
Doing so minimizes the attack surface if an attacker gains access to your container, as attackers typically make use of the shell to perform broader attacks. Examples of how to use distroless images can be found here.
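As a sketch of the multi-stage pattern (assuming a Python API service; the image tags, file names, and distroless variant are illustrative):

```dockerfile
# --- Stage 1: full base image, with a shell and package manager ---
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Install dependencies into a self-contained directory we can copy later.
RUN pip install --no-cache-dir --target=/app/deps -r requirements.txt
COPY . .

# --- Stage 2: distroless runtime, no shell or package manager ---
FROM gcr.io/distroless/python3
WORKDIR /app
COPY --from=builder /app /app
ENV PYTHONPATH=/app/deps
# Distroless images have no shell, so use the exec (JSON) form of ENTRYPOINT.
ENTRYPOINT ["python", "main.py"]
```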
Besides understanding the security concerns with regards to Docker containers, it is also important to first understand the best practices for writing Dockerfiles and see how they interplay with the security issues listed above.
Understanding the vulnerabilities involved in deploying a Docker container is important for ML/AI engineers, as we are the ones who build these Dockerfiles and ship them off to clients. It is thus also important to understand the environment in which the client will eventually deploy the container, in order to make the necessary security configurations.