Docker Image Optimization

Large images are slower to push, pull, and deploy - and carry more vulnerabilities.
The default base images on Docker Hub (e.g., node, python, golang) include compilers, package managers, debugging tools, and OS utilities needed to build software - but none of that belongs in a production image.
Goal: Ship only what’s needed to run the application.

Why Base Images Are Bloated

A default node:20 image is ~1GB. The Node.js runtime itself is ~80MB. The other ~900MB is:

The full Debian base OS (the official node images are Debian-based)
apt-get and system package tools
Build utilities (make, gcc, g++, etc.)
Development headers

None of these are needed at runtime. They increase attack surface and image pull time for zero benefit in production.

Multi-Stage Builds

The standard solution: use separate stages for building and running. Only the final stage ships.

# Stage 1: Build environment (large, temporary)
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci                     # Install all deps; the build step needs devDependencies
COPY . .
RUN npm run build              # Compile, bundle, etc.
RUN npm prune --omit=dev       # Drop devDependencies so only production deps are copied

# Stage 2: Runtime image (minimal)
FROM node:20-alpine
WORKDIR /app
# Copy ONLY the built output and production node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
# Don't run as root (the official node images ship a "node" user)
USER node
CMD ["node", "dist/server.js"]

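The builder stage (with all dev tooling) is discarded. Only the final Alpine stage becomes the output image.
Alpine-based: ~150MB. Full Debian equivalent: ~1GB+.
Caveat: Alpine uses musl libc rather than glibc, so native addons compiled in a Debian-based builder may not load there. If the project has native dependencies, build on node:20-alpine or run on node:20-slim.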

Distroless Images

Distroless (maintained by Google) takes this further - images contain only the language runtime and application, with no shell, package manager, or OS utilities at all.

Significantly reduces attack surface: an attacker who exploits your app can’t run bash, curl, or apt-get.

FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o server .

# Distroless static: no shell, no package manager; the Go runtime is compiled into the binary
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]

For debugging distroless containers, use the :debug image variants, which add a BusyBox shell: docker run --rm -it --entrypoint=/busybox/sh gcr.io/distroless/static:debug
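
The same pattern applies to interpreted runtimes. A minimal sketch for the Node.js app from the multi-stage example, assuming the project has a build script that emits dist/server.js; the distroless Node.js image already uses the node binary as its entrypoint, so CMD carries only the script path:

```dockerfile
# syntax=docker/dockerfile:1
# Debian 12 builder, so native addons match the Debian 12 distroless runtime
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# Build, then drop devDependencies before node_modules is copied out
RUN npm run build && npm prune --omit=dev

FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
# The entrypoint is already node, so only the script path is given
CMD ["dist/server.js"]
```

Because there is no shell in the final image, only exec-form CMD and ENTRYPOINT work here.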

Build Cache Ordering

Docker rebuilds every layer after the first cache miss. Layer order matters.

# BAD - source code copied before dependencies are installed
# Every code change invalidates the npm install layer
COPY . /app
RUN npm install

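# GOOD - dependency files copied first, so npm install is cached
# unless package*.json changes
# (Dockerfile comments must sit on their own line after COPY; a trailing # is read as an argument)
# Dependency manifests rarely change
COPY package*.json ./
RUN npm install          # Cached unless package*.json changes
# Source changes frequently, so it goes last
COPY . .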

Rule: Order layers from least-likely-to-change to most-likely-to-change.

Build Cache Mounts (`--mount=type=cache`)

For package managers that maintain a local cache (pip, npm, cargo, apt), RUN --mount=type=cache (a BuildKit feature) persists the cache directory across builds, outside of image layers, so you get cache hits without the cache bloating the image:

# syntax=docker/dockerfile:1

# pip cache persists across builds, never lands in the image
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Same for npm
RUN --mount=type=cache,target=/root/.npm \
    npm ci

This can be faster than layer caching alone because the package manager cache survives even when requirements.txt changes. See Art of Writing a Dockerfile for more.
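
Cache mounts combine well with the layer-ordering rule. A sketch for a Go build, assuming the project has go.mod and go.sum and the build runs as root in the official golang image (where the module cache defaults to /go/pkg/mod and the build cache to /root/.cache/go-build):

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.22 AS builder
WORKDIR /app
# Module manifests first: this layer is reused until dependencies change
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download
COPY . .
# The module and compiler caches persist across builds but never enter a layer
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 go build -o server .
```

Even when a source change invalidates the final RUN layer, the compiler cache lets go build recompile only the packages that changed.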

PID 1 and Signal Handling

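The first process in a container gets PID 1. This matters for container shutdown.
The kernel treats PID 1 specially: a signal is delivered to it only if the process has installed a handler for it, so a SIGTERM that would terminate any other process is ignored by default. If your app runs as PID 1 and doesn’t handle SIGTERM explicitly, docker stop sends SIGTERM, waits 10 seconds (the default timeout), and then sends SIGKILL - potentially causing data loss or dropped requests.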
```
# BAD - shell form: the shell becomes PID 1, not your app
# Signals from Docker are sent to the shell, not python
CMD python3 app.py

# GOOD - exec form: your app is PID 1 directly
CMD ["python3", "app.py"]
```
For multi-process containers, use --init to inject a minimal init process (based on tini) as PID 1. It forwards signals to the child process and reaps zombie processes. --init is built into Docker and requires no additional packages: docker run --init my-image
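
When runtime flags are not under your control (for example, when an orchestrator starts the container), an init process can be baked into the image instead. A minimal sketch, assuming an Alpine-based Python image and the app.py from the example above; the tini package and its /sbin/tini path are those of Alpine's package:

```dockerfile
FROM python:3.12-alpine
# Alpine's tini package installs the binary at /sbin/tini
RUN apk add --no-cache tini
WORKDIR /app
COPY app.py .
# tini runs as PID 1, forwards signals to the app, and reaps zombies
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["python3", "app.py"]
```

With this layout the app runs as a child of tini rather than as PID 1, so SIGTERM terminates it even if it registers no handler.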

Security Best Practices

Run as non-root. Unless user namespaces are remapped, root inside a container is root on the host once the container's isolation is escaped. Always add and switch to a dedicated user.
```
RUN addgroup -S app && adduser -S app -G app
USER app
```
Use read-only filesystems where possible: docker run --read-only my-image
Scan images for vulnerabilities. Tools: docker scout cves my-image, Trivy, Grype.
Pin by digest, not just by tag. Tags are mutable, so the same tag can point to different image content over time. For CI/CD: FROM node:20-alpine@sha256:abc123...
Use Chainguard images for hardened, minimal base images with provenance attestations: chainguard.dev/chainguard-images
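
The addgroup/adduser flags in the non-root snippet above are BusyBox (Alpine) syntax. On Debian- or Ubuntu-based images the equivalent uses groupadd and useradd. A sketch, where the user name app and the /app directory are illustrative choices:

```dockerfile
FROM debian:bookworm-slim
# Create a system group and user with no home directory and no login shell
RUN groupadd --system app \
 && useradd --system --gid app --no-create-home --shell /usr/sbin/nologin app
WORKDIR /app
# Hand ownership of the application files to the unprivileged user
COPY --chown=app:app . .
# Everything after this line, including the running container, uses this user
USER app
```

Placing USER after the instructions that need root (package installs, user creation) keeps the build working while the running container stays unprivileged.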

Quick Size Comparison

Base Image	Approx Size	Shell	Package Manager
`ubuntu:22.04`	~77MB	bash	apt
`debian:bookworm-slim`	~75MB	bash	apt
`alpine:3.20`	~7MB	sh	apk
`distroless/static`	~2MB	None	None
`scratch`	0MB	None	None

scratch is a Docker built-in empty image - used for statically compiled binaries (Go, Rust) that need no runtime libraries.
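
A minimal sketch of a scratch-based image, reusing the Go build stage from the distroless example. The CA-certificate copy is only needed if the binary makes outbound TLS calls, and the numeric ID 65532 is an arbitrary unprivileged choice:

```dockerfile
FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
# CGO_ENABLED=0 yields a statically linked binary with no libc dependency
RUN CGO_ENABLED=0 go build -o server .

FROM scratch
# scratch has no CA bundle; copy one in if the app makes outbound TLS calls
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/server /server
# No /etc/passwd exists here, so the user is given as a numeric UID:GID
USER 65532:65532
ENTRYPOINT ["/server"]
```

Compared with distroless/static, scratch also omits timezone data and the passwd file, so anything the binary expects from the filesystem has to be copied in explicitly.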