
Docker Image Optimization

  • Large images are slower to push, pull, and deploy - and every extra package they carry is another potential source of vulnerabilities.
  • The default base images on Docker Hub (e.g., node, python, golang) include compilers, package managers, debugging tools, and OS utilities needed to build software - but none of that belongs in a production image.
  • Goal: Ship only what’s needed to run the application.

A default node:20 image is ~1GB. The Node.js runtime itself is ~80MB. The other ~900MB is:

  • A full Debian base OS
  • apt-get and system package tools
  • Build utilities (make, gcc, g++, etc.)
  • Development headers

None of these are needed at runtime. They increase attack surface and image pull time for zero benefit in production.

The standard solution is a multi-stage build: use separate stages for building and running. Only the final stage ships.

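# Stage 1: Build environment (temporary, includes dev dependencies)
# Same Alpine (musl) base as the runtime stage, so any native modules in
# node_modules are compiled against the libc they will run on.
# If native addons need compiling, add: RUN apk add --no-cache python3 make g++
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install ALL deps: build tools (TypeScript, bundlers) are usually devDependencies
RUN npm ci
COPY . .
# Compile, bundle, etc.
RUN npm run build
# Drop devDependencies so only production deps are copied forward
RUN npm prune --omit=dev
# Stage 2: Runtime image (minimal)
FROM node:20-alpine
WORKDIR /app
# Copy ONLY the built output and production node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
# Don't run as root (the official node images include a "node" user).
# Dockerfile comments must be on their own line, not after an instruction.
USER node
CMD ["node", "dist/server.js"]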
  • The builder stage (with devDependencies and build tooling) is discarded. Only the final stage becomes the output image.
  • Alpine-based result: ~150MB, versus ~1GB+ for an image that ships the full node:20 base.
  • Distroless (maintained by Google) takes this further - images contain only the application and its runtime dependencies (plus basics such as CA certificates and timezone data), with no shell, package manager, or OS utilities at all.

  • Significantly reduces attack surface: an attacker who exploits your app can’t run bash, curl, or apt-get.

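    FROM golang:1.22 AS builder
    WORKDIR /app
    COPY . .
    # CGO_ENABLED=0 produces a statically linked binary with no libc dependency
    RUN CGO_ENABLED=0 go build -o server .
    # distroless/static: no shell, no package manager, no libc - the static
    # binary carries its own Go runtime
    FROM gcr.io/distroless/static-debian12
    COPY --from=builder /app/server /server
    # Distroless images include a "nonroot" user (UID 65532)
    USER nonroot:nonroot
    ENTRYPOINT ["/server"]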
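  • For debugging, distroless publishes :debug tags that add a BusyBox shell: docker run --rm -it --entrypoint=/busybox/sh gcr.io/distroless/static-debian12:debug. In Kubernetes, kubectl debug can instead attach an ephemeral debug container to a running pod.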

Docker rebuilds the layer where the first cache miss occurs and every layer after it, so layer order matters.

# BAD - source code copied before dependencies are installed
# Every code change invalidates the npm install layer
WORKDIR /app
COPY . .
RUN npm install
# GOOD - dependency files copied first, so npm install is cached
# unless package.json actually changes
WORKDIR /app
# Rarely changes
COPY package*.json ./
# Cached unless package*.json changes
RUN npm install
# Changes frequently - goes last
COPY . .

Rule: Order layers from least-likely-to-change to most-likely-to-change.
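A toy Python model makes the ordering rule easy to see. It assumes a simplified cache key: each layer's key hashes the parent layer's key, the instruction text, and the contents of any files the layer copies. BuildKit's real keys carry more detail, but the model shows the same effect described above: a changed key propagates to every later layer. layer_keys and rebuilt are hypothetical names.

```python
import hashlib

# Simplified model (assumption): key = hash(parent key, instruction, copied file contents).
# A RUN step has no file inputs; its key depends only on the command text and its parent.


def layer_keys(steps: list[tuple[str, list[str]]], files: dict[str, str]) -> list[str]:
    keys: list[str] = []
    parent = ""
    for instruction, inputs in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        for path in sorted(inputs):
            h.update(path.encode())
            h.update(files[path].encode())
        parent = h.hexdigest()  # chaining: a changed parent changes every later key
        keys.append(parent)
    return keys


def rebuilt(steps, old_files, new_files) -> list[str]:
    """Instructions whose cache key changed, i.e. layers that must be rebuilt."""
    old = layer_keys(steps, old_files)
    new = layer_keys(steps, new_files)
    return [instr for (instr, _), a, b in zip(steps, old, new) if a != b]


SOURCES = ["package.json", "package-lock.json", "server.js"]
BAD = [("COPY . .", SOURCES), ("RUN npm install", [])]
GOOD = [
    ("COPY package*.json ./", ["package.json", "package-lock.json"]),
    ("RUN npm install", []),
    ("COPY . .", SOURCES),
]

before = {"package.json": '{"v": 1}', "package-lock.json": "lock-v1", "server.js": "v1"}
after = {**before, "server.js": "v2"}  # a code-only change

print(rebuilt(BAD, before, after))   # ['COPY . .', 'RUN npm install']
print(rebuilt(GOOD, before, after))  # ['COPY . .']
```

With the GOOD ordering, a code-only edit reruns just the final COPY. Changing package.json still invalidates everything from the first COPY onward, as it should.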

For package managers that maintain a local cache (pip, npm, cargo, apt), RUN --mount=type=cache (a BuildKit feature) persists the cache directory across builds outside of image layers, so you get cache hits without the cache bloating the image:

# syntax=docker/dockerfile:1
# pip cache persists across builds, never lands in the image
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt
# Same for npm
RUN --mount=type=cache,target=/root/.npm \
npm ci

This can be faster than layer caching alone. When requirements.txt changes, the install layer still reruns, but packages already in the cache are reused instead of downloaded again. See Art of Writing a Dockerfile for more.

  • The first process in a container gets PID 1. This matters for container shutdown.

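  • The kernel treats PID 1 specially: any signal left at its default action is ignored, so SIGTERM does not terminate PID 1 the way it would terminate any other process (SIGKILL sent from outside the container still works). If your app doesn’t register a SIGTERM handler explicitly, docker stop sends SIGTERM, waits 10 seconds, and then sends SIGKILL - potentially causing data loss or incomplete requests.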

    # BAD - shell form: the shell becomes PID 1, not your app
    # Signals from Docker are sent to the shell, not python
    CMD python3 app.py
    # GOOD - exec form: your app is PID 1 directly
    CMD ["python3", "app.py"]
  • For multi-process containers, or apps that don’t handle signals themselves, use --init to inject a minimal init process (based on tini) as PID 1. It forwards signals to your app and reaps zombie processes. --init is built into Docker and requires no additional packages: docker run --init my-image

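  • Run as non-root. Without user namespace remapping, root (UID 0) inside a container is UID 0 on the host, so an attacker who escapes the container’s namespaces has root on the host. Always add and switch to a dedicated user.
    # Alpine/BusyBox syntax; on Debian-based images use:
    # groupadd --system app && useradd --system --gid app app
    RUN addgroup -S app && adduser -S app -G app
    USER app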
  • Use read-only filesystems where possible, and mount a tmpfs for any scratch space the app needs: docker run --read-only --tmpfs /tmp my-image
  • Scan images for vulnerabilities. Tools: docker scout cves my-image, Trivy (trivy image my-image), Grype (grype my-image).
  • Pin the digest, not just the tag. Tags are mutable, so node:20-alpine can point to a different image tomorrow. For CI/CD: FROM node:20-alpine@sha256:abc123...
  • Use Chainguard images for hardened, minimal base images with provenance attestations: chainguard.dev/chainguard-images
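Digest pinning is easy to enforce in CI. Below is a rough Python sketch that assumes one FROM per line. unpinned_base_images is a hypothetical helper, and its regexes ignore ARG substitution and line continuations. It reports every base image that lacks an @sha256: digest. It skips scratch and references to earlier build stages, since neither is pulled from a registry.

```python
import re

# Hypothetical CI check: flag FROM lines whose base image is not pinned by digest.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile: str) -> list[str]:
    stage_aliases: set[str] = set()
    unpinned: list[str] = []
    for line in dockerfile.splitlines():
        m = FROM_RE.match(line)
        if not m:
            continue
        image, alias = m.group(1), m.group(2)
        # "scratch" and earlier stage aliases are not registry images: nothing to pin.
        is_local = image.lower() == "scratch" or image.lower() in stage_aliases
        if not is_local and not DIGEST_RE.search(image):
            unpinned.append(image)
        if alias:
            stage_aliases.add(alias.lower())
    return unpinned


DUMMY_DIGEST = "0" * 64  # placeholder, not a real image digest
example = f"""\
FROM golang:1.22 AS builder
FROM builder AS test
FROM gcr.io/distroless/static-debian12@sha256:{DUMMY_DIGEST}
FROM scratch
"""
print(unpinned_base_images(example))  # ['golang:1.22']
```

A CI job could fail the build whenever this list is non-empty.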
Base Image            Approx. Size  Shell  Package Manager
ubuntu:22.04          ~77MB         bash   apt
debian:bookworm-slim  ~75MB         bash   apt
alpine:3.20           ~7MB          sh     apk
distroless/static     ~2MB          None   None
scratch               0MB           None   None
  • scratch is a Docker built-in empty image - used for statically compiled binaries (Go, Rust) that need no runtime libraries.