Docker Image Optimization
- Large images are slower to push, pull, and deploy, and they carry more vulnerabilities.
- The default base images on Docker Hub (e.g., `node`, `python`, `golang`) include compilers, package managers, debugging tools, and OS utilities needed to build software, but none of that belongs in a production image.
- Goal: ship only what's needed to run the application.
Why Base Images Are Bloated
A default `node:20` image is ~1GB. The Node.js runtime itself is ~80MB. The other ~900MB is:
- The full Debian base OS
- `apt-get` and system package tools
- Build utilities (make, gcc, g++, etc.)
- Development headers
None of these are needed at runtime. They increase attack surface and image pull time for zero benefit in production.
Multi-Stage Builds
The standard solution: use separate stages for building and running. Only the final stage ships.

```dockerfile
# Stage 1: Build environment (large, temporary)
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies, including the dev tooling the build step needs
RUN npm ci
COPY . .
# Compile, bundle, etc.
RUN npm run build
# Drop devDependencies so only production modules reach the runtime stage
RUN npm prune --omit=dev

# Stage 2: Runtime image (minimal)
FROM node:20-alpine
WORKDIR /app
# Copy ONLY the built output and production node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
# Don't run as root
USER node
CMD ["node", "dist/server.js"]
```

- The builder stage (with all dev tooling) is discarded. Only the final `alpine` stage is the output image.
- Alpine-based: ~150MB. Full Debian equivalent: ~1GB+.
- Caveat: native addons compiled in the Debian builder are linked against glibc and will not load on Alpine (musl). If your dependencies include native modules, build on `node:20-alpine` as well.
Distroless Images
- Distroless (maintained by Google) takes this further: images contain only the language runtime and application, with no shell, package manager, or OS utilities at all.
- This significantly reduces attack surface: an attacker who exploits your app can't run `bash`, `curl`, or `apt-get`.

```dockerfile
FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o server .

# Distroless static: no shell, no package manager.
# The Go runtime is compiled into the static binary itself.
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]
```

- For debugging, distroless publishes `:debug` image variants that add a BusyBox shell:

```sh
docker run --rm -it --entrypoint=/busybox/sh gcr.io/distroless/static:debug
```
Build Cache Ordering
Once an instruction misses the cache, Docker rebuilds that layer and every layer after it. Layer order matters.

```dockerfile
# BAD - source code copied before dependencies are installed
# Every code change invalidates the npm install layer
COPY . /app
RUN npm install

# GOOD - dependency files copied first, so npm install is cached
# unless package*.json actually changes
# Rarely changes
COPY package*.json ./
# Cached unless package*.json changes
RUN npm install
# Changes frequently - goes last
COPY . .
```

Rule: Order layers from least-likely-to-change to most-likely-to-change.
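
The ordering rule can be illustrated with a toy model. This is a minimal sketch, not Docker's actual cache algorithm: it assumes each instruction simply declares which inputs it reads, and the input names (`package.json`, `src`) and the helper `layers_to_rebuild` are hypothetical.

```python
def layers_to_rebuild(instructions, changed_inputs):
    """Return the instructions that re-run in this simplified cache model.

    instructions: list of (instruction_text, set of input names it reads)
    changed_inputs: set of input names modified since the last build
    """
    rebuilt = []
    cache_valid = True
    for text, inputs in instructions:
        # The first instruction whose inputs changed is the first cache miss.
        if cache_valid and inputs & changed_inputs:
            cache_valid = False
        # That instruction and everything after it is rebuilt.
        if not cache_valid:
            rebuilt.append(text)
    return rebuilt


# Hypothetical layer lists mirroring the BAD and GOOD orderings.
bad = [
    ("COPY . /app", {"package.json", "src"}),
    ("RUN npm install", set()),
]
good = [
    ("COPY package*.json ./", {"package.json"}),
    ("RUN npm install", set()),
    ("COPY . .", {"package.json", "src"}),
]

changed = {"src"}  # only application source was edited
print(layers_to_rebuild(bad, changed))   # ['COPY . /app', 'RUN npm install']
print(layers_to_rebuild(good, changed))  # ['COPY . .']
```

In this model, editing only source files re-runs `npm install` under the BAD ordering but not under the GOOD one.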
Build Cache Mounts (--mount=type=cache)
For package managers that maintain a local cache (pip, npm, cargo, apt), `RUN --mount=type=cache` persists the cache directory across builds, outside of image layers, so you get cache hits without the cache bloating the image:

```dockerfile
# syntax=docker/dockerfile:1

# pip cache persists across builds, never lands in the image
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Same for npm
RUN --mount=type=cache,target=/root/.npm \
    npm ci
```

This can be faster than layer caching alone because the package manager cache survives even when `requirements.txt` changes. See Art of Writing a Dockerfile for more.
PID 1 and Signal Handling
- The first process in a container gets PID 1. This matters for container shutdown.
- The kernel does not apply default signal actions to PID 1: a signal only takes effect if the process has installed a handler for it. If your app doesn't handle `SIGTERM` explicitly, `docker stop` will wait 10 seconds and then send `SIGKILL`, potentially causing data loss or incomplete requests.

```dockerfile
# BAD - shell form: the shell becomes PID 1, not your app
# Signals from Docker are sent to the shell, not python
CMD python3 app.py

# GOOD - exec form: your app is PID 1 directly
CMD ["python3", "app.py"]
```

- For multi-process containers, use `--init` to inject a minimal init process (based on tini) as PID 1. It properly forwards signals and reaps zombie processes. `--init` is built into Docker and requires no additional packages:

```sh
docker run --init my-image
```
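
Registering `SIGTERM` explicitly, as described above, might look like the following in the `app.py` from the exec-form example. This is a minimal sketch under assumptions: the names `install_sigterm_handler`, `serve`, and `shutdown_requested` are hypothetical, and the loop body stands in for real request handling.

```python
import signal
import threading

# Set by the signal handler; the main loop checks it to exit cleanly.
shutdown_requested = threading.Event()


def _handle_sigterm(signum, frame):
    """Record the shutdown request instead of being killed mid-request."""
    shutdown_requested.set()


def install_sigterm_handler():
    """Register an explicit SIGTERM handler (must run in the main thread)."""
    signal.signal(signal.SIGTERM, _handle_sigterm)


def serve(poll_interval=1.0):
    """Run until SIGTERM arrives, then return so cleanup code can execute."""
    while not shutdown_requested.is_set():
        # Placeholder for one unit of work (hypothetical).
        # wait() doubles as a sleep that ends early once the event is set.
        shutdown_requested.wait(timeout=poll_interval)
    print("SIGTERM received, shutting down cleanly")
```

An entry point would call `install_sigterm_handler()` and then `serve()`. With the handler in place, `docker stop` can end the process within the grace period instead of escalating to `SIGKILL`.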
Security Best Practices
- Run as non-root. Root inside a container is root on the host if container namespaces are escaped. Always add and switch to a dedicated user (Alpine/BusyBox syntax shown):

  ```dockerfile
  RUN addgroup -S app && adduser -S app -G app
  USER app
  ```

- Use read-only filesystems where possible: `docker run --read-only my-image`
- Scan images for vulnerabilities. Tools: `docker scout cves my-image`, Trivy, Grype.
- Pin digest, not just tag. Tags are mutable. For CI/CD: `FROM node:20-alpine@sha256:abc123...`
- Use Chainguard images for hardened, minimal base images with provenance attestations: chainguard.dev/chainguard-images
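
The digest-pinning rule lends itself to a simple CI check. The sketch below is an illustration, not an existing tool: the function `unpinned_base_images` is hypothetical, and it only handles plain `FROM` lines (no build-arg substitution or line continuations).

```python
import re

# A pinned reference ends with "@sha256:" followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile_text):
    """Return image references in FROM lines that lack a sha256 digest."""
    stage_names = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Ignore flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # Remember "AS <name>" so a later "FROM <name>" is not flagged.
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2].lower())
        # scratch and earlier build stages are not registry images.
        if image == "scratch" or image.lower() in stage_names:
            continue
        if not DIGEST_RE.search(image):
            unpinned.append(image)
    return unpinned
```

A CI job could fail the build when the returned list is non-empty.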
Quick Size Comparison
| Base Image | Approx Size | Shell | Package Manager |
|---|---|---|---|
| `ubuntu:22.04` | ~77MB | bash | apt |
| `debian:bookworm-slim` | ~75MB | bash | apt |
| `alpine:3.20` | ~7MB | sh | apk |
| `distroless/static` | ~2MB | None | None |
| `scratch` | 0MB | None | None |

`scratch` is a Docker built-in empty image, used for statically compiled binaries (Go, Rust) that need no runtime libraries.