Docker Made Easy for Data Scientists


Gagandeep Singh

“Google runs all software in containers and they run around 2 billion containers every week.”

source

Before going any further, two questions should come to mind: what is Docker, and why do you need it?

So, what is Docker? Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. Its great advantage is that it makes deployment easy: the same container runs wherever Docker is installed, so when the time comes you can deploy it on multiple machines with almost no extra effort.

You can also use Docker with Kubernetes, which can automatically balance the workload by distributing it among different Docker containers. On top of that, it notices when a container goes offline and restarts it automatically, among many other things.

Here, we are going to build four Flask apps and run them with Docker.

  1. Simple Hello World app using Flask
  2. Passing arguments in Docker Container
  3. Simple ML app in Docker Container
  4. Passing Image as an argument in Docker Container

$ sudo apt update
$ sudo apt install docker.io

This will install Docker. To check whether the Docker service is running, type

$ sudo service docker status

If Docker is not shown as active, start it with

$ sudo service docker start

  1. Docker Image: In very simple terms, a Docker image is like the ISO image file used to install an OS. It is a read-only template from which containers are started. Publicly available images can be browsed on Docker Hub.
  2. Docker Container: When a Docker image is run, the running instance is a Docker container. Run the same image again and a different container is created.
  3. Dockerfile: A Dockerfile contains all the instructions for building a Docker image, from pulling the base image to setting up the environment.

I’ve put the GitHub link at the bottom, where all the code is available. Docker is very easy to learn once you have understood its flow.

This is a very simple app that prints ‘hello world’ on the browser’s screen.

Let’s build the flask app
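
A minimal sketch of such an app could look like the following. It assumes the file is saved as app.py, the name that the Dockerfile's CMD passes to python3; the function names hello_world and main are illustrative choices, not taken from the original code.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello_world() -> str:
    # The returned text is what the browser shows at http://localhost:5000/
    return "Hello World"


def main() -> None:
    # 0.0.0.0 makes Flask listen on all interfaces, so the port published
    # by Docker can reach it; 5000 is Flask's default port.
    app.run(host="0.0.0.0", port=5000)
```

Saved as app.py, the file also needs the usual `if __name__ == "__main__": main()` guard at the bottom so that python3 app.py actually starts the server.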

Also, create a requirements.txt file with

$ pip3 freeze > requirements.txt

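One thing to remember while writing the Flask app is that the host should be set to ‘0.0.0.0’ because we are running it inside Docker. With the default of 127.0.0.1, Flask only accepts connections that come from inside the container, so requests arriving through the published port would never reach it.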

Let’s write the Dockerfile

  1. The command below pulls the Ubuntu base image. The part after the colon is a tag that specifies the version. Here we are pulling the ‘latest’ image.

FROM ubuntu:latest

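2. Now, imagine you have freshly installed a Linux OS: the first thing you’ll do is update the local database that contains software package info. The RUN instruction executes a command while the image is being built. A ‘-y’ flag is also added to make sure it doesn’t expect any user input. There is no need for sudo, because build commands already run as root and the base image does not ship sudo.

RUN apt update -y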

3. The next task is to install Python and pip. There is no need to set up a virtual environment, since the container holds only one application.

RUN apt install -y python3-pip

4. The next task is to copy all files from the current directory into the Docker image.

COPY . /app

where ‘.’ is the current directory and ‘/app’ is where we wish to copy the files. You can choose any folder you like.

5. Set the working directory to /app.

WORKDIR /app

6. We have pulled the Ubuntu image, installed Python and copied all the files. The next step is to set up the Python environment by installing all the required packages. We’ve already created the requirements.txt file, so now we just have to install from it.

RUN pip3 install -r requirements.txt

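7. Now, set ENTRYPOINT to ‘python3’. In this bracketed (exec) form the values must be written in double quotes.

ENTRYPOINT ["python3"]

8. Lastly, run the app.py file. CMD supplies the default arguments to ENTRYPOINT, so the container ends up running python3 app.py.

CMD ["app.py"]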

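The final file will look like this.

FROM ubuntu:latest
RUN apt update -y
RUN apt install -y python3-pip
COPY . /app
WORKDIR /app
RUN pip3 install -r requirements.txt
ENTRYPOINT ["python3"]
CMD ["app.py"]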

This file will be saved as Dockerfile with no extension.

The final directory will contain three files: app.py, requirements.txt and Dockerfile.

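Finally, build the Docker image from the Dockerfile. Here -t gives the image a name (simpleflask) and the trailing ‘.’ uses the current directory as the build context.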

$ docker build -t simpleflask .

All this may sound confusing, but please bear with me. We are going to see more examples to learn better. Let’s recap what we have learned so far.

  1. You need to create a Dockerfile that describes how the Docker container will be set up. It includes a base image, Python, the Python environment and then the command that runs the app itself.
  2. Then you’ll need to build an image from the Dockerfile.

Now, please pay attention here

You can view all Docker containers with

$ docker ps -a

You will not see any container yet, because you have only built the image and not run it. Next, we’ll run that image.

Type

$ docker images

You can view all your Docker images with this command. The next task is to run the Docker image that you just built.

$ docker run -d -p 5000:5000 simpleflask

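  • -p 5000:5000 publishes port 5000 of the container on port 5000 of your system. The first number is the port on your machine and the second is the port inside the container, where Flask listens. By default, Flask runs on port 5000.
  • The -d flag means you want to run it in detached mode (in the background).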

If the command succeeds, it prints the ID of the new container. To check whether your container is running, type

$ docker ps -a

If it is running successfully, then in your web browser type localhost:5000. You’ll see “Hello World”.

This isn’t a machine learning prediction model; rather, we are going to learn how to send input to a Flask app via the POST method and print it.

Let’s create the Flask app first

  • We have used the POST method by explicitly declaring it.
  • To receive the input, use the request.get_json method.
  • Then simply print the value and return its square.
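
The three points above can be sketched as follows. This is an illustration rather than the original code: the /predict route matches the curl call used later, while force=True and the function names are assumptions made here.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # POST is declared explicitly
def predict() -> str:
    # force=True (an assumption) parses the body as JSON even when the
    # client sends no JSON Content-Type header.
    number = request.get_json(force=True)
    print(number)  # shows up in the container logs
    return str(number ** 2)


def main() -> None:
    # Listen on all interfaces so the published Docker port can reach Flask.
    app.run(host="0.0.0.0", port=5000)
```

As with any app.py started by python3, a `if __name__ == "__main__": main()` guard at the bottom of the file starts the server. A body that is not a number would raise an error in this sketch, so real code should validate the input first.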

Let’s create the requirements.txt file

$ pip3 freeze > requirements.txt

The Dockerfile is going to be exactly the same as the previous one

To build the Docker image use

$ docker build -t passingarguments .

We’ll follow all the steps we followed above.

$ docker images
$ docker run -d -p 5000:5000 passingarguments

Now, pay attention here

There are various ways to call the API we just created.

  1. Using requests in python.

2. Using POSTMAN

3. Using Curl

curl -X POST http://127.0.0.1:5000/predict -d 5

The result will be the square of the number passed.
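
For the first option, calling the API with requests, a minimal sketch might look like this. The function name square_via_api is hypothetical, and the explicit Content-Type header is an addition that should be harmless whether or not the server insists on it.

```python
import json

import requests

URL = "http://127.0.0.1:5000/predict"  # same endpoint as the curl call


def square_via_api(number, url: str = URL) -> str:
    # json.dumps(5) produces the body "5", just like `curl -d 5`.
    response = requests.post(
        url,
        data=json.dumps(number),
        headers={"Content-Type": "application/json"},
        timeout=5,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx answers
    return response.text
```

With the container running, print(square_via_api(5)) would show the value returned by the app.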

In this example, we are going to see how to pass multiple arguments at the same time. We are going to pass the arguments in the form of JSON.

For now, you need not worry about model training. The training step is part of the Dockerfile, so the model is trained while the Docker image is being built.

Ideally, the model should be trained on the local system and then copied into the Docker image.

Let’s look at the Flask application

This time the Flask application fetches the parameters from the JSON it receives. JSON has a data format similar to a Python dictionary.

In practice we should also include error handling here, for example for parameters that are not numbers or that are missing.
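
One way such error handling could look is sketched here. The helper parse_features is hypothetical and not part of the original app; the parameter names f1, f2 and f3 are the ones used in the curl call for this example.

```python
FEATURE_NAMES = ("f1", "f2", "f3")


def parse_features(payload) -> list:
    """Return [f1, f2, f3] as floats or raise ValueError with a readable message."""
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    values = []
    for name in FEATURE_NAMES:
        value = payload.get(name)
        if value is None:  # key missing or explicitly null
            raise ValueError(f"missing parameter: {name}")
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"parameter {name} must be a number")
        values.append(float(value))
    return values
```

Inside the Flask route, the ValueError could be caught and turned into a 400 response, so the client sees what was wrong instead of a generic server error.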

Also, create a requirements.txt file

$ pip3 freeze > requirements.txt

Now, let’s look at the Docker file

Notice that we are training the model during the Docker build. You could also train the model on your local system and copy it into the image. In that case, you’ll just have to remove the line that trains the model.

Now, there are multiple ways to get a prediction.

  1. Using requests in python

2. Using curl

curl -X POST http://127.0.0.1:5000/predict -d '{"f1":4, "f2":5, "f3":10}'

Notice how we are passing the data in JSON format. When using requests in Python, you’ll have to apply json.dumps to the data to convert it into JSON form.
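
The requests variant with json.dumps could be sketched like this. The function name predict_via_api is hypothetical, and the response is returned as plain text because the exact response format of the app is not shown here.

```python
import json

import requests

URL = "http://127.0.0.1:5000/predict"  # endpoint from the curl call


def predict_via_api(features: dict, url: str = URL) -> str:
    # json.dumps turns the Python dict into the JSON text sent as the body.
    body = json.dumps(features)
    response = requests.post(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        timeout=5,
    )
    response.raise_for_status()
    return response.text
```

A call would then be predict_via_api({"f1": 4, "f2": 5, "f3": 10}). As an alternative, requests can serialize the dict itself when it is passed as json=features instead of data=body.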

For simplicity, I’m using the MNIST dataset for the classification task.

Let’s look at the Flask app

  • We are importing all necessary packages
  • We are loading the model once, when the app starts, rather than on every request, to make inference faster
  • We are converting the image to greyscale by using convert(‘L’).
  • The image is resized to 28×28 pixels.
  • For prediction, Keras needs the image in the format (batch_size, height, width, colour_channels). So, we are converting it to (1, 28, 28, 1).
  • Next, we are using its predict method with np.argmax to find the class which has the highest probability.
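
The preprocessing steps listed above can be sketched without the model itself. The function names preprocess and predicted_class are hypothetical, and any pixel scaling is left out because it has to match whatever was used during training.

```python
import io

import numpy as np
from PIL import Image


def preprocess(image_bytes: bytes) -> np.ndarray:
    """Turn raw image bytes into a (1, 28, 28, 1) batch."""
    image = Image.open(io.BytesIO(image_bytes)).convert("L")  # greyscale
    image = image.resize((28, 28))
    pixels = np.asarray(image, dtype="float32")  # shape (28, 28)
    # Apply the same scaling here that was used in training, if any.
    return pixels.reshape(1, 28, 28, 1)


def predicted_class(probabilities: np.ndarray) -> int:
    """Pick the class with the highest probability from a (1, n) array."""
    return int(np.argmax(probabilities, axis=-1)[0])
```

In the app, the batch returned by preprocess would be passed to the loaded model's predict method and the result handed to predicted_class.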

Let’s create the requirements.txt file

$ pip3 freeze > requirements.txt

Let’s look at its docker file

Note: Here I’ve already trained the model. You could also add a training step that runs while the Docker image is built.

We’ll follow the steps we did above

$ docker build -t imageclassification .
$ docker run -d -p 5000:5000 imageclassification

Before building the Docker image, make sure to train the model using the train.py file included in the folder.

Now, there are multiple ways to get a prediction.

  1. Using curl

curl -F "file=@pic_name.extension" http://localhost:5000/predict

2. Using requests
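
A minimal sketch of the requests variant, assuming the app reads the upload from the same form field name, file, that the curl call uses. The function name classify_image is hypothetical.

```python
import requests

URL = "http://localhost:5000/predict"  # endpoint from the curl call


def classify_image(path: str, url: str = URL) -> str:
    # files= sends a multipart/form-data upload, like curl's -F option.
    with open(path, "rb") as handle:
        response = requests.post(url, files={"file": handle}, timeout=10)
    response.raise_for_status()
    return response.text
```

With the container running, print(classify_image("pic_name.extension")) would show the app's answer for that picture.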

That’s enough for now.

If you have understood even the basics, congratulations on making it this far. You’ll learn more when you try new things and don’t hesitate to experiment. Every master was once a beginner.

GitHub repo where all the code is present.

Happy Learning!
