Introduction

When I containerized my blog search service, I went through several iterations of the Dockerfile while learning how to build an image. Getting the service into a container was easy, but producing a small, efficient image was more involved than I expected. Here I want to go over what I learned about writing a Dockerfile that prioritizes final image size.

I’m going to show a comparison of various Dockerfiles and at the end I’ll provide a table of the image sizes produced from each one. That should make it easy to see the impact of each optimization.

Example Service

The Dockerfile examples need something to put in the image. Instead of the search service, I’m going to use a small example Python service so the focus stays on the Dockerfile iterations. Like the search service, it’s a gunicorn-powered web service built on Falcon; to keep it simple, it just says “hi”.

Dependencies

Starting off we have the Python dependencies that we’ll need.

requirements.txt

Plain-Text-Markdown-Extention @ git+https://github.com/kostyachum/python-markdown-plain-text.git#egg=plain-text-markdown-extention
falcon
gunicorn

I’m going to pretend the “Plain Text Markdown Extension” is required for the service even though it isn’t. Installing it makes pip fetch a dependency from GitHub, which forces git into the image as a build dependency. That simulates a real-world situation where building the image needs tools that running the service does not.

Service

Then there is the super simple demonstration service.

ex_serv.py

#!/usr/bin/env python

import falcon

class ExResource:
    """Responds to GET requests with "hi"."""

    def on_get(self, req, resp):
        resp.status = falcon.HTTP_200
        resp.text = "hi"

# WSGI callable that gunicorn loads as "ex_serv:app"
app = falcon.App()
exr = ExResource()
app.add_route('/', exr)

“Hi” service!
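Before putting the service in an image, it can help to check that the app responds at all. The following is a minimal sketch, not part of the original service: `wsgi_get` is a hypothetical helper that calls any WSGI callable in-process using only the standard library, so neither gunicorn nor a container is needed.

```python
from wsgiref.util import setup_testing_defaults


def wsgi_get(app, path="/"):
    """Send one GET request straight to a WSGI callable.

    Returns a (status_line, body_text) tuple.
    """
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys the WSGI spec requires
    environ["REQUEST_METHOD"] = "GET"
    environ["PATH_INFO"] = path

    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the app reports instead of writing to a socket.
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks).decode("utf-8")
    finally:
        # WSGI iterables may define close(); call it if present.
        close = getattr(chunks, "close", None)
        if close is not None:
            close()
    return captured["status"], body


# Usage, assuming ex_serv.py is importable and falcon is installed:
#   from ex_serv import app
#   print(wsgi_get(app, "/"))
```

With the example service, the call in the usage comment should report a 200 status and the body "hi".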

Optimizations

The most important factor is base image selection, because the base image has the largest impact on the final image size.

Second, and somewhat less important, is splitting build and release into two Dockerfile stages. Keeping build-only tools out of the release stage still makes a big difference.

Base Image Selection

When I first started building the service I used “python:3.11” as the base image because it made the most sense. This is a Python application so the main Python image should be perfect. However, this image is a full Debian install and quite large.

Next, I tried the “python:3.11-slim” image which is a slimmer Debian base. It’s considerably smaller but still larger than I think is reasonable.

Next up was “python:3.11-alpine”, which was quite good and produced a small final image. Yet I felt like it could be even smaller.

So I tried “alpine:latest” and installed Python myself. Interestingly, without splitting the build and release steps, this ended up larger than the “python:3.11-alpine” image. With build and release split, however, it was smaller.

Single vs Build + Release Images

Let’s look at what goes into building a single and a split image.

Single Image

When creating a single image, everything is installed into, and left in, that one image. This includes build dependencies like git. It makes for a very simple Dockerfile but leaves things in the image that aren’t needed to run the service.

While this is a very simple example, the image size section shows how much of an impact leaving something like git installed makes. This is mainly due to how many dependencies git itself pulls in.

Split Image

The split image first creates a “build” image that installs all dependencies, including build dependencies. Then the application is “built” within this image. However, this being a simple Python application, there really isn’t a “build” step per se; here it amounts to installing the Python dependencies into a virtual environment.

Once the application is “built”, a second “release” image is created and the dependencies are copied from the “build” image into it. The service is copied into the “release” image too. The “release” image defines all the configuration we want to expose, such as the default port the service listens on and the command to start gunicorn.

Finally, the “build” image is left out of the result, because only the last image defined in the Dockerfile becomes the final image.

The split image is a bit more complex but doesn’t pull git, its dependencies, or the requirements.txt file into the “release” image. This makes a bigger difference than you’d think, even with a project this small and simple.

Dockerfiles

python:3.11

Single
FROM python:3.11
EXPOSE 80
WORKDIR /app

COPY ./requirements.txt .

RUN apt-get install -y git
RUN pip3 install --no-cache-dir -r requirements.txt

COPY ./ex_serv.py .

CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app"]

Split
FROM python:3.11 AS build
WORKDIR /app

COPY ./requirements.txt .

RUN apt-get install -y git

RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

RUN pip3 install --no-cache-dir -r requirements.txt
RUN pip3 uninstall -y pip setuptools packaging

FROM python:3.11 AS release
EXPOSE 80
WORKDIR /app

COPY ./ex_serv.py .

COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]

python:3.11-slim

For the slim image we have to run apt-get update first to populate the package manager’s package index. Otherwise, we’ll get an error when trying to install git.

Single
FROM python:3.11-slim
EXPOSE 80
WORKDIR /app

COPY ./requirements.txt .

RUN apt-get update
RUN apt-get install -y git
RUN pip3 install --no-cache-dir -r requirements.txt

COPY ./ex_serv.py .

CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app"]

Split
FROM python:3.11-slim AS build
WORKDIR /app

COPY ./requirements.txt .

RUN apt-get update
RUN apt-get install -y git

RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

RUN pip3 install --no-cache-dir -r requirements.txt
RUN pip3 uninstall -y pip setuptools packaging

FROM python:3.11-slim AS release
EXPOSE 80
WORKDIR /app

COPY ./ex_serv.py .

COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]

python:3.11-alpine

Single
FROM python:3.11-alpine
EXPOSE 80
WORKDIR /app

COPY ./requirements.txt .

RUN apk add --no-cache git
RUN pip3 install --no-cache-dir -r requirements.txt

COPY ./ex_serv.py .

CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app"]

Split
FROM python:3.11-alpine AS build
WORKDIR /app

COPY ./requirements.txt .

RUN apk add --no-cache git

RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

RUN pip3 install --no-cache-dir -r requirements.txt
RUN pip3 uninstall -y pip setuptools packaging

FROM python:3.11-alpine AS release
EXPOSE 80
WORKDIR /app

COPY ./ex_serv.py .

COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]

alpine:latest

Single
FROM alpine:latest
EXPOSE 80
WORKDIR /app

COPY ./requirements.txt .

RUN apk add --no-cache git python3
RUN python3 -m ensurepip
RUN pip3 install --no-cache-dir -r requirements.txt

COPY ./ex_serv.py .

CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app"]

Split
FROM alpine:latest AS build
WORKDIR /app

COPY ./requirements.txt .

RUN apk add --no-cache git python3
RUN python3 -m ensurepip

RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

RUN pip3 install --no-cache-dir -r requirements.txt
RUN pip3 uninstall -y pip setuptools packaging

FROM alpine:latest AS release
EXPOSE 80
WORKDIR /app

RUN apk add --no-cache python3

COPY ./ex_serv.py .

COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

VOLUME /data

CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]

Image Sizes
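As a sketch of how sizes like the ones in this section can be collected: the docker CLI reports an image's size in bytes through `docker image inspect --format '{{.Size}}'`. The helper names and the image tag below are hypothetical, and the conversion assumes decimal megabytes (1 MB = 1,000,000 bytes), which may not match how the figures here were rounded.

```python
import subprocess


def bytes_to_mb(size_bytes):
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes), 2 decimals."""
    return round(size_bytes / 1_000_000, 2)


def image_size_mb(image):
    """Ask the docker CLI for a local image's size and return it in MB."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        check=True,           # raise if docker fails or the image is unknown
        capture_output=True,
        text=True,
    )
    return bytes_to_mb(int(result.stdout.strip()))


# Usage (hypothetical tag; needs docker on PATH and the image built locally):
#   print(image_size_mb("ex-serv:alpine-split"))
```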

Base Image            Size (MB) Single    Size (MB) Build + Release
python:3.11           1060                1040
python:3.11-slim      328.90              191.37
python:3.11-alpine    90.65               60.34
alpine:latest         95.25               56.57
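To make the effect of splitting easier to compare across bases, here is a small sketch that turns the sizes above into absolute and relative savings. The numbers are copied from the table; `savings` is a hypothetical helper name.

```python
# Sizes in MB, copied from the table above: (single, build + release).
SIZES_MB = {
    "python:3.11": (1060.0, 1040.0),
    "python:3.11-slim": (328.90, 191.37),
    "python:3.11-alpine": (90.65, 60.34),
    "alpine:latest": (95.25, 56.57),
}


def savings(single_mb, split_mb):
    """Return (MB saved, percent saved) when going from single to split."""
    saved = single_mb - split_mb
    return saved, 100.0 * saved / single_mb


for base, (single, split) in SIZES_MB.items():
    saved_mb, saved_pct = savings(single, split)
    print(f"{base:<20} saved {saved_mb:7.2f} MB ({saved_pct:4.1f}%)")
```

By this calculation the split saves roughly 2% on python:3.11, 42% on python:3.11-slim, 33% on python:3.11-alpine and 41% on alpine:latest.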

Conclusion

Both Alpine bases are by far the smallest, considerably smaller than the Debian-based images. Unless something doesn’t work with Alpine Linux, use Alpine as your base.

Also, use the two-step build and release process in the Dockerfile. While the difference in size isn’t huge in this example, it grows quickly as more build dependencies are added. For example, *-dev packages for libraries and an entire clang toolchain would quickly inflate a single image.