
Container Orchestration

  • Container orchestration automates the deployment, scaling, networking, and lifecycle management of containers across a fleet of machines.
  • A single Docker host with docker run is sufficient for development and simple workloads - but production quickly exposes its limits.
  • Orchestrators solve this by abstracting the cluster into a single logical unit and continuously reconciling desired state with actual state.
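A toy control loop in Python can illustrate the reconcile idea. This is a minimal sketch that assumes a workload is described only by its replica count. `reconcile`, `apply_actions` and the workload names are hypothetical. A real orchestrator watches its API for changes and acts on real containers, not on a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str       # "start" or "stop"
    workload: str   # hypothetical workload name
    count: int      # how many replicas to start or stop


def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[Action]:
    """Diff desired replica counts against observed ones and return the actions that close the gap."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(name, 0), actual.get(name, 0)
        if want > have:
            actions.append(Action("start", name, want - have))
        elif want < have:
            actions.append(Action("stop", name, have - want))
    return actions


def apply_actions(actual: dict[str, int], actions: list[Action]) -> dict[str, int]:
    """Pretend every action succeeds and return the new observed state."""
    state = dict(actual)
    for a in actions:
        state[a.workload] = state.get(a.workload, 0) + (a.count if a.kind == "start" else -a.count)
        if state[a.workload] == 0:
            del state[a.workload]
    return state


desired = {"web": 3, "worker": 2}
observed = {"web": 1, "legacy": 1}  # two web replicas crashed; "legacy" is no longer wanted

first_pass = reconcile(desired, observed)
observed = apply_actions(observed, first_pass)
second_pass = reconcile(desired, observed)  # converged: nothing left to do
print(first_pass)
print(second_pass)
```

Each pass only compares desired state with observed state, so running it repeatedly is safe. Once the two match, it produces no actions. This idempotence lets an orchestrator run the loop continuously instead of as a one-off script.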
| Problem | Why Docker Alone Doesn’t Solve It |
| --- | --- |
| Container crashes | docker run starts a container on a single host. A restart policy (--restart) can bring it back if the process exits, but a hung or unhealthy container is not replaced, and nothing reschedules it elsewhere. |
| Scaling | You can only add containers on one machine. When the host is full, you’re out of capacity. |
| Deployments | Updating an image requires stopping the old container and starting a new one - there is no built-in rolling update or health gate. |
| Multi-host networking | Containers on different hosts can’t reach each other without manual networking setup. |
| Secrets & config at scale | Distributing secrets across many hosts via env files or manual injection doesn’t scale. |
| Node failure | If the host goes down, every container on it goes down with it. |
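The Deployments row describes what orchestrators add: a rolling update. In a Kubernetes Deployment, two settings bound the update, maxSurge and maxUnavailable. Both default to 25%. A maxSurge percentage rounds up and a maxUnavailable percentage rounds down. The Python sketch below computes those bounds and then steps through a simplified update. It assumes new pods become ready instantly, so it is not the Deployment controller's actual algorithm. `resolve`, `rolling_update_bounds` and `simulate` are hypothetical helper names.

```python
def resolve(value: str, replicas: int, round_up: bool) -> int:
    """Turn an absolute count ("2") or a percentage ("25%") into a pod count."""
    if value.endswith("%"):
        scaled = replicas * int(value[:-1])
        return -(-scaled // 100) if round_up else scaled // 100  # integer ceil / floor
    return int(value)


def rolling_update_bounds(replicas: int, max_surge: str = "25%", max_unavailable: str = "25%") -> tuple[int, int]:
    """Return (max total pods, min available pods) allowed during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)               # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return replicas + surge, replicas - unavailable


def simulate(replicas: int, max_total: int, min_available: int) -> list[tuple[int, int]]:
    """Step through an update, returning (old, new) pod counts; ASSUMES new pods are ready instantly."""
    old, new = replicas, 0
    steps = []
    while old > 0 or new < replicas:
        # Start as many new-version pods as the surge limit allows.
        new += min(replicas - new, max_total - (old + new))
        # Stop old-version pods while keeping at least min_available running.
        old -= min(old, (old + new) - min_available)
        steps.append((old, new))
    return steps


max_total, min_available = rolling_update_bounds(10)
print(max_total, min_available)  # 13 8
for old, new in simulate(10, max_total, min_available):
    print(f"old={old} new={new}")
```

In a real update, readiness probes act as the health gate. The controller only counts new pods as available once they pass their probes. That is why a broken image stalls the rollout instead of replacing every healthy pod.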

An orchestrator treats the cluster as a single compute pool and manages:

  • Scheduling - deciding which node a container runs on based on available resources and constraints
  • Self-healing - restarting containers that crash, replacing them on failed nodes
  • Scaling - adding or removing instances in response to load (manually or automatically)
  • Rolling updates - deploying new image versions without downtime, with automatic rollback on failure
  • Service discovery - containers find each other by name, not by IP address
  • Load balancing - distributing traffic across healthy instances automatically
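  • Secrets & config - distributing sensitive data and configuration to the containers that need them, without embedding them in images (whether secrets are encrypted at rest depends on the orchestrator and how it is configured)
  • Leader election - ensuring at most one active instance of a stateful component at any given time, with automatic failover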
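Scheduling is the most algorithmic of these responsibilities. The default Kubernetes scheduler works in two phases. It first filters out nodes that cannot run a pod, then scores the remaining nodes and picks the best one. The Python sketch below follows that shape under simplifying assumptions:

- It only checks CPU and memory requests plus node-selector labels.
- Its scoring rule is loosely modelled on a "least allocated" strategy.
- It does not record the placement.

All class, function, node and pod names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    cpu_capacity: int   # millicores
    cpu_allocated: int  # millicores already requested by running pods
    mem_capacity: int   # MiB
    mem_allocated: int  # MiB
    labels: dict[str, str] = field(default_factory=dict)


@dataclass
class Pod:
    name: str
    cpu_request: int    # millicores
    mem_request: int    # MiB
    node_selector: dict[str, str] = field(default_factory=dict)


def fits(node: Node, pod: Pod) -> bool:
    """Filter step: enough unrequested CPU and memory, and every selector label matches."""
    cpu_ok = node.cpu_capacity - node.cpu_allocated >= pod.cpu_request
    mem_ok = node.mem_capacity - node.mem_allocated >= pod.mem_request
    labels_ok = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    return cpu_ok and mem_ok and labels_ok


def score(node: Node, pod: Pod) -> float:
    """Score step: average fraction of capacity left free after placement (higher spreads load)."""
    cpu_left = (node.cpu_capacity - node.cpu_allocated - pod.cpu_request) / node.cpu_capacity
    mem_left = (node.mem_capacity - node.mem_allocated - pod.mem_request) / node.mem_capacity
    return 100 * (cpu_left + mem_left) / 2


def schedule(pod: Pod, nodes: list[Node]) -> Optional[str]:
    """Return the chosen node name, or None if no node fits (the pod would stay Pending)."""
    candidates = [n for n in nodes if fits(n, pod)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(n, pod)).name  # ties: first node in the list wins


nodes = [
    Node("node-a", 4000, 3000, 8192, 4096, labels={"disk": "ssd"}),
    Node("node-b", 4000, 1000, 8192, 2048),
    Node("node-c", 2000, 0, 4096, 0),
]
for pod in [
    Pod("api", 500, 1024),                                # fits everywhere; most headroom left on node-c
    Pod("batch", 2500, 1024),                             # only node-b has 2500m CPU free
    Pod("db", 500, 1024, node_selector={"disk": "ssd"}),  # selector restricts it to node-a
    Pod("huge", 8000, 1024),                              # fits nowhere
]:
    print(pod.name, "->", schedule(pod, nodes))
```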

A useful mental model: Kubernetes does for an entire cloud environment what a traditional OS does for a single computer.

| Layer | Traditional OS | Kubernetes |
| --- | --- | --- |
| Abstracts | CPU cores, memory, storage devices | Compute nodes, failure zones, storage volumes |
| Schedules | Application processes | Container workloads across nodes |
| Manages | Process lifecycle, file I/O, networking | Pod lifecycle, service networking, persistent storage |

This abstraction makes Kubernetes largely infrastructure-agnostic. The same manifest can be deployed whether the cluster runs on AWS, Azure, GCP, bare metal, or an on-premises data centre. The main exceptions are provider-specific pieces such as storage classes and load-balancer settings. This portability is what enables hybrid-cloud and multi-cloud strategies.

Beyond the technical capabilities, Kubernetes changes how teams operate:

  • Self-service deployment - all worker nodes are presented as a single unified pool. Developers submit a manifest and Kubernetes decides where to run it. There is no bottleneck waiting for a sysadmin to manually assign a server - developers ship independently.
  • Developer focus on business logic - by hiding infrastructure details behind a standard API, Kubernetes removes the need to write infrastructure-specific code (service discovery wiring, health-check polling, failover scripting). Developers interact with the cluster the same way regardless of the underlying cloud or hardware.
  • Hardware cost reduction - Kubernetes packs workloads onto machines more tightly than manual placement typically achieves. It matches each workload's declared resource requests against the capacity of every node. This reduces idle capacity, so the same workloads run on fewer servers and infrastructure spend drops.
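Packing workloads onto machines is the classic bin-packing problem. The sketch below applies the first-fit-decreasing heuristic to CPU requests only, as an illustrative assumption. It is not kube-scheduler's algorithm. The scheduler's default scoring tends to spread pods across nodes, and its MostAllocated scoring strategy is the option that favours packing. The request values are made up.

```python
def first_fit_decreasing(requests: list[int], capacity: int) -> list[list[int]]:
    """Pack CPU requests (millicores) onto nodes of equal capacity; returns the requests placed on each node."""
    nodes: list[list[int]] = []  # requests placed on each node
    free: list[int] = []         # remaining capacity of each node
    for req in sorted(requests, reverse=True):  # biggest workloads first
        if req > capacity:
            raise ValueError(f"request {req}m exceeds node capacity {capacity}m")
        for i, room in enumerate(free):
            if req <= room:      # first existing node with enough room
                nodes[i].append(req)
                free[i] -= req
                break
        else:                    # no existing node fits: add a new one
            nodes.append([req])
            free.append(capacity - req)
    return nodes


requests = [300, 1500, 200, 800, 1200, 400, 700, 500]  # hypothetical workloads, 5600m in total
packed = first_fit_decreasing(requests, capacity=2000)
print(f"{len(requests)} workloads on {len(packed)} nodes instead of {len(requests)}")
for i, placed in enumerate(packed):
    print(f"node {i}: {placed} ({sum(placed)}m of 2000m)")
```

Giving each workload its own machine would need eight nodes here. The packed layout needs three, which is also the minimum for 5600m of requests on 2000m nodes.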

Kubernetes (K8s) is the dominant container orchestration platform. Originally developed at Google, it is now a CNCF project maintained by a broad open-source community and backed by every major cloud provider.

  • Runs on any infrastructure: on-premises bare metal, VMs, or cloud
  • Supports any OCI-compliant container image - images built with Docker run on Kubernetes without modification
  • Has a vast ecosystem: Helm, ArgoCD, Prometheus, Istio, Cilium, Kyverno, and hundreds of CNCF projects are built around it
  • All three major cloud providers offer managed Kubernetes - where the control plane is operated, updated, and charged for by the provider:
| Provider | Service | Notes |
| --- | --- | --- |
| AWS | EKS (Elastic Kubernetes Service) | Deep IAM and VPC integration |
| Google Cloud | GKE (Google Kubernetes Engine) | Autopilot mode for fully managed node pools |
| Azure | AKS (Azure Kubernetes Service) | Tight integration with Microsoft Entra ID (formerly Azure AD) and Azure Policy |
  • Vanilla Kubernetes - the upstream open-source release. Represents the cutting edge but ships with minimal security defaults and requires significant hardening before it is production-ready.
  • Enterprise distributions (Red Hat OpenShift, Rancher, Canonical MicroK8s) - may lag upstream by a release or two but ship with hardened security defaults, built-in cluster user management, and tooling for deploying third-party applications out of the box.

For most teams, a managed cloud service or an enterprise distribution is a better starting point than self-managing vanilla Kubernetes.

The rest of this section covers Kubernetes in depth: architecture, the API and object model, kubectl, workloads, networking, storage, security, and cluster operations.

Early Kubernetes was tightly coupled to Docker as its primary container runtime. As the ecosystem matured, it became clear that Docker bundled far more than Kubernetes needed (image builds, its own CLI, networking and volume management). The community standardised the Container Runtime Interface (CRI) - a pluggable layer that lets any compliant runtime be swapped in or out.

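Kubernetes removed dockershim, its built-in integration with Docker Engine, in v1.24. Today most clusters ship with containerd as the default runtime. containerd is the same core runtime that Docker Engine uses internally, without Docker's higher-level tooling such as image builds and the docker CLI.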

| Runtime | Notes |
| --- | --- |
| containerd | Default on most clusters; lightweight, battle-tested |
| CRI-O | Minimal runtime purpose-built for Kubernetes, used in OpenShift |
| Kata Containers | VM-backed isolation for workloads requiring stronger security boundaries |
| gVisor (runsc) | Google’s user-space kernel that intercepts syscalls; stronger isolation than runc without a full VM; used in GKE Sandbox |
| WasmEdge / Spin | Wasm runtimes; see Wasm on Kubernetes |

It is fully supported (and increasingly common) to run multiple runtimes simultaneously across the nodes of a single cluster - for example, containerd for most workloads and a Wasm runtime on a subset of nodes.
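Kubernetes mixes runtimes through the RuntimeClass resource. Each RuntimeClass maps a name to a runtime handler configured on the nodes, and a pod opts in by setting spec.runtimeClassName. The sketch below builds the manifests as Python dictionaries and prints them as JSON, which kubectl also accepts, for example piped to `kubectl apply -f -`. The following values are assumptions for illustration:

- the `gvisor` class name
- the `runsc` handler
- the `sandbox=gvisor` node label
- the pod names

The handler must match a runtime handler actually configured in containerd or CRI-O on the labelled nodes.

```python
import json
from typing import Optional


def runtime_class(name: str, handler: str, node_selector: dict[str, str]) -> dict:
    """A node.k8s.io/v1 RuntimeClass that maps a class name to a CRI runtime handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,  # must match a handler configured in the nodes' CRI runtime
        # Pods using this class are only scheduled onto nodes carrying these labels.
        "scheduling": {"nodeSelector": node_selector},
    }


def pod(name: str, image: str, runtime_class_name: Optional[str] = None) -> dict:
    """A minimal single-container Pod; without a runtime class it uses the node's default runtime."""
    spec: dict = {"containers": [{"name": name, "image": image}]}
    if runtime_class_name:
        spec["runtimeClassName"] = runtime_class_name
    return {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": name}, "spec": spec}


manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        runtime_class("gvisor", handler="runsc", node_selector={"sandbox": "gvisor"}),
        pod("trusted-api", "nginx"),                                 # default runtime, e.g. containerd
        pod("untrusted-job", "nginx", runtime_class_name="gvisor"),  # sandboxed with gVisor
    ],
}

if __name__ == "__main__":
    print(json.dumps(manifests, indent=2))
```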

Docker Swarm is Docker’s built-in clustering and orchestration solution. A standalone Swarm project appeared around 2015, and swarm mode was built directly into Docker Engine in version 1.12 (2016). Its appeal was simplicity - a single docker swarm init command turns a standalone Docker host into a swarm manager node.

Where you’ll still see Swarm:

  • Legacy production systems that adopted Swarm before Kubernetes became the clear standard (~2018–2020)
  • Small industrial or IoT deployments that adopted Swarm and haven’t migrated
  • Teams running Portainer on top of Swarm for a simple self-hosted container platform
| Dimension | Docker Swarm | Kubernetes |
| --- | --- | --- |
| Setup complexity | Low - one command | High - many components |
| Learning curve | Low | High |
| Scale | Thousands of containers | Up to 5,000 nodes and 150,000 pods per cluster (upstream tested limits) |
| Auto-scaling | Manual only | HPA built in; VPA and KEDA as add-ons (automatic) |
| Rolling updates | Basic | Fine-grained (maxSurge / maxUnavailable); canary, blue/green and progressive delivery via add-on tooling |
| Networking | Overlay (built-in) | Plugin-based (Calico, Cilium, Flannel) |
| Storage | Basic volume support | Rich CSI driver ecosystem |
| Managed cloud options | None | AKS, EKS, GKE |
| Ecosystem | Small, Docker Inc. | Massive CNCF ecosystem |
| Active development | Maintenance mode | Actively developed |
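The auto-scaling row's HPA follows a documented scaling rule. The target replica count is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). If the ratio is within a tolerance of 1.0 (0.1 by default), the HPA makes no change. The sketch below implements that rule plus min/max clamping. It is a simplification that ignores stabilisation windows, pods that are not yet ready, and missing metrics. `desired_replicas` and its bounds are hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA rule: ceil(current * current_metric / target), skipped within tolerance, then clamped."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: don't scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Target: 60% average CPU utilisation.
print(desired_replicas(4, 90, 60))   # 6  (4 * 1.5)
print(desired_replicas(3, 63, 60))   # 3  (ratio 1.05 is within the 0.1 tolerance)
print(desired_replicas(8, 15, 60))   # 2  (8 * 0.25)
print(desired_replicas(4, 300, 60))  # 10 (20 clamped to max_replicas)
```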

When Swarm still makes sense: A small team already deep in Docker Compose tooling, managing a handful of hosts, with no appetite for Kubernetes complexity and no need for the ecosystem around it. That said, Docker Compose on a single node with a reverse proxy (Traefik, Caddy) is often simpler and sufficient at the same scale.

When to use Kubernetes: Production at any meaningful scale, need for autoscaling, access to the CNCF ecosystem, deploying to cloud infrastructure, or any environment where operational maturity and long-term support matter.

Kubernetes is the right tool for many situations - but not all of them.

  • Monolithic applications - a single large process gains little from a platform designed to schedule, scale, and connect many independent services.
  • Very small microservice footprints - as a rough rule of thumb, if your system consists of fewer than ~5 separate services, the operational overhead of Kubernetes is unlikely to pay off. The platform tends to deliver the most value once a team manages 20 or more microservices.
  • Early-stage projects - initial adoption carries hidden costs: training, tooling, additional hardware for the cluster itself, and significant developer distraction while the team learns the system.