Introduction
When I was containerizing my blog search service, I went through multiple iterations of the Dockerfile as I was learning how to create an image. Containerizing was easy, but I wanted to create a small and efficient image, and that process was a bit more involved than I expected. I want to go into a little detail about what I learned about writing a Dockerfile that prioritizes final image size.
I’m going to show a comparison of various Dockerfiles and at the end I’ll provide a table of the image sizes produced from each one. That should make it easy to see the impact of each optimization.
Example Service
For the Dockerfile examples, I need something to go into the image. Instead of the search service, I’m going to use a small example Python service to make it easier to follow the Dockerfile iterations. This service is still a gunicorn-powered, Falcon-based web service, and to keep it simple it just says “hi”.
Dependencies
Starting off we have the Python dependencies that we’ll need.
requirements.txt
Plain-Text-Markdown-Extention @ git+https://github.com/kostyachum/python-markdown-plain-text.git#egg=plain-text-markdown-extention
falcon
gunicorn
I’m going to pretend the “Plain Text Markdown Extension” is required for the service even though it isn’t. This demonstrates needing pip to install a dependency from GitHub, which forces git to be pulled into the image as a build dependency. Just pretend this is necessary, because I want to simulate a real-world situation.
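As a quick illustration of why this matters: on an image without git, installing the requirements fails before anything is downloaded. A rough transcript of what that looks like (exact error wording varies by pip version):

```shell
# Inside a container that does not have git installed:
pip3 install -r requirements.txt
# Fails with an error along the lines of:
#   Cannot find command 'git' - do you have 'git' installed and on your PATH?
```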
Service
Then there is the super simple demonstration service.
ex_serv.py
#!/usr/bin/env python
import falcon


class ExResource:
    def on_get(self, req, resp):
        resp.status = falcon.HTTP_200
        resp.text = "hi"


app = falcon.App()
exr = ExResource()
app.add_route('/', exr)
“Hi” service!
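For reference, the Falcon app above is just an ordinary WSGI application, which is why gunicorn can serve it. Here’s a dependency-free sketch of the same behavior using only the standard library (the file name `wsgi_hi.py` is my own, not part of the original service):

```python
# wsgi_hi.py -- a stdlib-only sketch of the same "hi" endpoint.
# No Falcon or gunicorn required to define it; any WSGI server can host it.

def app(environ, start_response):
    """Minimal WSGI callable: answer every request with the text 'hi'."""
    body = b"hi"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

gunicorn serves this exactly like the Falcon version: `gunicorn --bind 0.0.0.0:80 wsgi_hi:app`.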
Optimizations
The single most important factor is base image selection, because the base image has the largest impact on the final image size.
Second, and somewhat less important, is splitting the Dockerfile into separate build and release stages. Having distinct build and release images within one Dockerfile still makes a big difference.
Base Image Selection
When I first started building the service I used “python:3.11” as the base image because it made the most sense. This is a Python application so the main Python image should be perfect. However, this image is a full Debian install and quite large.
Next, I tried the “python:3.11-slim” image which is a slimmer Debian base. It’s considerably smaller but still larger than I think is reasonable.
Next up was “python:3.11-alpine”, which was quite good and produced a small final image. Yet I felt like it could be even smaller.
So I tried “alpine:latest” and installing Python myself. This was interesting because without splitting the build and release steps this ended up larger than the “python:3.11-alpine” image. However, it was smaller when splitting build and release.
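If you want to compare the bases for yourself before building anything, you can pull them and list their sizes (sizes change from release to release, so I won’t quote numbers here):

```shell
docker pull python:3.11
docker pull python:3.11-slim
docker pull python:3.11-alpine
docker pull alpine:latest
docker images --format 'table {{.Repository}}:{{.Tag}}\t{{.Size}}'
```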
Single vs Build + Release Images
Let’s look at what goes into building a single and a split image.
Single Image
When creating a single image, everything is installed into, and left in, one image, including build dependencies like git. This makes for a very simple Dockerfile but leaves things in the image that aren’t needed to run the service.
While this is a very simple example, the image size section below shows how much of an impact leaving something like git installed makes. This is mainly due to how many dependencies git itself pulls in.
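You can measure what the git layer costs with `docker history`, which shows the size each Dockerfile instruction added (the tag `ex-serv:single` here is my own naming, assuming you tagged the single-stage build that way):

```shell
# Show per-layer sizes for a single-stage image; the RUN that installs
# git will show up as one of the larger layers.
docker history --format 'table {{.CreatedBy}}\t{{.Size}}' ex-serv:single
```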
Split Image
The split image first creates a “build” image that installs all dependencies, including build dependencies. Then the application is “built” within this image, though for a simple Python application there isn’t really a “build” step per se.
Once the application is “built”, a second “release” image is created, and the dependencies are copied from the “build” image into it. The service is copied into the “release” image too. The “release” image defines all the configuration we want to expose, such as the default port the service listens on and the command that starts gunicorn.
Finally, the “build” stage is discarded by the Docker build process because only the last stage defined in the Dockerfile is tagged as the final image.
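Although only the final stage is tagged, you can still build and inspect the intermediate stage directly with `--target` (the tag names are my own):

```shell
# Build only the "build" stage, for debugging or inspection.
docker build --target build -t ex-serv:build .
# A normal build produces the final "release" stage.
docker build -t ex-serv:release .
```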
The split image is a bit more complex, but it doesn’t pull git, its dependencies, or the requirements.txt file into the “release” image. This makes a bigger difference than you’d think, even with a project this small and simple.
Dockerfiles
python:3.11
Single
FROM python:3.11
EXPOSE 80
WORKDIR /app
COPY ./requirements.txt .
RUN apt-get install -y git
RUN pip3 install --no-cache-dir -r requirements.txt
COPY ./ex_serv.py .
CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]
Split
FROM python:3.11 AS build
WORKDIR /app
COPY ./requirements.txt .
RUN apt-get install -y git
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip3 install --no-cache-dir -r requirements.txt
RUN pip3 uninstall -y pip setuptools packaging
FROM python:3.11 AS release
EXPOSE 80
WORKDIR /app
COPY ./ex_serv.py .
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]
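Either variant is built and run the same way; only the last stage ends up in the tagged image. A quick usage sketch (the tag and host port are my choice):

```shell
docker build -t ex-serv .
docker run --rm -p 8080:80 ex-serv
# In another terminal, the service should answer with "hi":
curl http://localhost:8080/
```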
python:3.11-slim
For the slim image we have to run apt-get update in order to populate the package manager’s package lists. Otherwise, we’ll get an error when trying to install git.
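A common further refinement, which I’m not using in the Dockerfiles below so the comparison stays simple, is to chain the update, install, and cleanup into one RUN so the downloaded package lists never persist in a layer:

```dockerfile
# One layer: refresh lists, install git without recommended extras,
# then delete the lists so they don't add to the image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*
```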
Single
FROM python:3.11-slim
EXPOSE 80
WORKDIR /app
COPY ./requirements.txt .
RUN apt-get update
RUN apt-get install -y git
RUN pip3 install --no-cache-dir -r requirements.txt
COPY ./ex_serv.py .
CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]
Split
FROM python:3.11-slim AS build
WORKDIR /app
COPY ./requirements.txt .
RUN apt-get update
RUN apt-get install -y git
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip3 install --no-cache-dir -r requirements.txt
RUN pip3 uninstall -y pip setuptools packaging
FROM python:3.11-slim AS release
EXPOSE 80
WORKDIR /app
COPY ./ex_serv.py .
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]
python:3.11-alpine
Single
FROM python:3.11-alpine
EXPOSE 80
WORKDIR /app
COPY ./requirements.txt .
RUN apk add --no-cache git
RUN pip3 install --no-cache-dir -r requirements.txt
COPY ./ex_serv.py .
CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]
Split
FROM python:3.11-alpine AS build
WORKDIR /app
COPY ./requirements.txt .
RUN apk add --no-cache git
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip3 install --no-cache-dir -r requirements.txt
RUN pip3 uninstall -y pip setuptools packaging
FROM python:3.11-alpine AS release
EXPOSE 80
WORKDIR /app
COPY ./ex_serv.py .
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]
alpine:latest
Single
FROM alpine:latest
EXPOSE 80
WORKDIR /app
COPY ./requirements.txt .
RUN apk add --no-cache git python3
RUN python3 -m ensurepip
RUN pip3 install --no-cache-dir -r requirements.txt
COPY ./ex_serv.py .
CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]
Split
FROM alpine:latest AS build
WORKDIR /app
COPY ./requirements.txt .
RUN apk add --no-cache git python3
RUN python3 -m ensurepip
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip3 install --no-cache-dir -r requirements.txt
RUN pip3 uninstall -y pip setuptools packaging
FROM alpine:latest AS release
EXPOSE 80
WORKDIR /app
RUN apk add --no-cache python3
COPY ./ex_serv.py .
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
VOLUME /data
CMD ["gunicorn", "--bind", "0.0.0.0:80", "ex_serv:app" ]
Image Sizes
| Base Image | Size (MB) Single | Size (MB) Build + Release |
|---|---|---|
| python:3.11 | 1060 | 1040 |
| python:3.11-slim | 328.90 | 191.37 |
| python:3.11-alpine | 90.65 | 60.34 |
| alpine:latest | 95.25 | 56.57 |
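Sizes like these can be read straight from `docker images` after building each variant (the `ex-serv` repository name is my own naming convention):

```shell
# List the built images with their sizes.
docker images --format 'table {{.Repository}}:{{.Tag}}\t{{.Size}}' | grep ex-serv
```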
Conclusion
Both Alpine-based images are by far the smallest, considerably smaller than the Debian-based ones. Unless something in your stack doesn’t work on Alpine Linux, use Alpine as your base.
Also, use the two-stage build and release process in the Dockerfile. While there isn’t a huge difference in size in this example, the savings scale up quickly as build dependencies grow. For example, *-dev packages for libraries and an entire clang toolchain would quickly inflate the image size.