Container Orchestration

Container orchestration automates the deployment, scaling, networking, and lifecycle management of containers across a fleet of machines.
A single Docker host driven by `docker run` is sufficient for development and simple workloads, but production quickly exposes its limits.
Orchestrators solve this by abstracting the cluster into a single logical unit and continuously reconciling desired state with actual state.
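The reconciliation idea can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular orchestrator is implemented: the `Cluster` class, the `reconcile` function, and the service names are all hypothetical, and the "cluster" is just an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy in-memory stand-in for a cluster; not a real orchestrator API."""
    running: dict[str, int] = field(default_factory=dict)  # service -> live replicas

    def start(self, service: str) -> None:
        self.running[service] = self.running.get(service, 0) + 1

    def stop(self, service: str) -> None:
        self.running[service] -= 1
        if self.running[service] == 0:
            del self.running[service]


def reconcile(desired: dict[str, int], cluster: Cluster) -> list[str]:
    """Run one reconciliation pass and return the actions taken."""
    actions = []
    for service in sorted(set(desired) | set(cluster.running)):
        want = desired.get(service, 0)
        have = cluster.running.get(service, 0)
        for _ in range(want - have):      # too few replicas: start replacements
            cluster.start(service)
            actions.append(f"start {service}")
        for _ in range(have - want):      # too many replicas: scale down
            cluster.stop(service)
            actions.append(f"stop {service}")
    return actions


desired_state = {"web": 3, "worker": 1}
cluster = Cluster(running={"web": 1, "cache": 1})   # two web replicas have crashed
first_pass = reconcile(desired_state, cluster)
second_pass = reconcile(desired_state, cluster)     # already converged: no actions
print(first_pass)
print(cluster.running)
```

A real orchestrator runs a loop like this continuously, so a crashed container simply shows up as a difference between desired and actual state on the next pass.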

The Limits of Single-Host Docker

Problem	Why Docker Alone Doesn’t Solve It
Container crashes	By default, `docker run` starts a container once; if it dies, it stays dead. A `--restart` policy can restart it on the same host, but nothing reschedules it onto another machine.
Scaling	You can only add containers on one machine. When the host is full, you’re out of capacity.
Deployments	Updating an image requires stopping the old container and starting a new one - there is no built-in rolling update or health gate.
Multi-host networking	Containers on different hosts can’t reach each other without manual networking setup.
Secrets & config at scale	Distributing secrets across many hosts via env files or manual injection doesn’t scale.
Node failure	If the host goes down, every container on it goes down with it.

What Orchestrators Add

An orchestrator treats the cluster as a single compute pool and manages:

Scheduling - deciding which node a container runs on based on available resources and constraints
Self-healing - restarting containers that crash, replacing them on failed nodes
Scaling - adding or removing instances in response to load (manually or automatically)
Rolling updates - deploying new image versions without downtime, halting or rolling back when the new version fails its health checks
Service discovery - containers find each other by name, not by IP address
Load balancing - distributing traffic across healthy instances automatically
Secrets & config - distributing sensitive data encrypted, without embedding it in images
Leader election - ensuring exactly one active instance of a stateful component at any given time, with automatic failover
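As an illustration of the rolling-update item above, the following sketch replaces instances one at a time and gates each step on a health check. It is a simplified model, not the behaviour of any specific orchestrator: `rolling_update`, the version strings, and the `is_healthy` callback are hypothetical, and real systems differ in whether they pause or roll back on failure.

```python
from typing import Callable


def rolling_update(
    replicas: list[str],
    new_version: str,
    is_healthy: Callable[[str], bool],
) -> tuple[list[str], bool]:
    """Replace replicas one at a time; restore the old set if a new one is unhealthy.

    Returns (final replica versions, whether the update succeeded).
    """
    original = list(replicas)
    current = list(replicas)
    for i in range(len(current)):
        current[i] = new_version           # replace a single instance
        if not is_healthy(current[i]):     # health gate before touching the next one
            return original, False         # roll back to the previous versions
    return current, True


good, ok = rolling_update(["v1", "v1", "v1"], "v2", is_healthy=lambda v: True)
bad, bad_ok = rolling_update(["v1", "v1", "v1"], "v3", is_healthy=lambda v: v != "v3")
print(good, ok)
print(bad, bad_ok)
```

Because only one instance is replaced per step, the remaining instances keep serving traffic throughout the update.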

Kubernetes as the Cloud OS

A useful mental model: Kubernetes does for an entire cloud environment what a traditional OS does for a single computer.

Layer	Traditional OS	Kubernetes
Abstracts	CPU cores, memory, storage devices	Compute nodes, failure zones, storage volumes
Schedules	Application processes	Container workloads across nodes
Manages	Process lifecycle, file I/O, networking	Pod lifecycle, service networking, persistent storage

This abstraction makes Kubernetes largely infrastructure-agnostic - you deploy essentially the same manifest whether the cluster runs on AWS, Azure, GCP, bare metal, or a local data centre. This is what enables hybrid-cloud and multi-cloud strategies: workloads are portable because most of the underlying infrastructure is abstracted, although provider-specific details such as storage classes and load balancer settings can still differ.
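To make the portability point concrete, here is a minimal sketch that builds an `apps/v1` Deployment as plain data and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The helper `deployment_manifest`, the application name, and the image reference are hypothetical; the point is only that nothing in the manifest names a cloud provider.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build a minimal apps/v1 Deployment as a plain dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},     # must match the template labels
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical application name and image reference, for illustration only.
manifest = deployment_manifest("web", "registry.example.com/web:1.0", replicas=3)
print(json.dumps(manifest, indent=2))
```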

Organisational Impact

Beyond the technical capabilities, Kubernetes changes how teams operate:

Self-service deployment - all worker nodes are presented as a single unified pool. Developers submit a manifest and Kubernetes decides where to run it. There is no bottleneck waiting for a sysadmin to manually assign a server - developers ship independently.
Developer focus on business logic - by hiding infrastructure details behind a standard API, Kubernetes removes the need to write infrastructure-specific code (service discovery wiring, health-check polling, failover scripting). Developers interact with the cluster the same way regardless of the underlying cloud or hardware.
Hardware cost reduction - Kubernetes can pack workloads onto machines more tightly and more consistently than manual placement. By continuously matching declared resource requests against available node capacity, it reduces idle capacity - running more workloads on fewer servers and lowering infrastructure spend.
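The packing argument can be illustrated with a classic bin-packing heuristic, first-fit decreasing. This is not the algorithm the Kubernetes scheduler uses (it filters and scores nodes for each Pod); it is a sketch, with hypothetical CPU requests and node size, of why automated placement tends to need fewer machines than one workload per server.

```python
def first_fit_decreasing(requests: list[float], node_capacity: float) -> list[list[float]]:
    """Pack CPU requests onto as few equal-sized nodes as this heuristic finds."""
    nodes: list[list[float]] = []
    for request in sorted(requests, reverse=True):
        if request > node_capacity:
            raise ValueError(f"request {request} exceeds node capacity {node_capacity}")
        for node in nodes:
            if sum(node) + request <= node_capacity:   # first node with room
                node.append(request)
                break
        else:
            nodes.append([request])                    # no room anywhere: add a node
    return nodes


cpu_requests = [2, 1, 3, 2, 1, 1, 2]      # hypothetical CPU cores requested per workload
packed = first_fit_decreasing(cpu_requests, node_capacity=4)
print(len(cpu_requests), "workloads on", len(packed), "nodes:", packed)
```

With these numbers, seven workloads that would occupy seven dedicated servers fit on three four-core nodes.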

The Landscape in 2026

Kubernetes

Kubernetes (K8s) is the dominant container orchestration platform. Originally developed at Google, it is now maintained by the CNCF and backed by every major cloud provider.

Runs on any infrastructure: on-premises bare metal, VMs, or cloud
Supports any OCI-compliant container image - images built with Docker run on Kubernetes without modification
Has a vast ecosystem: Helm, ArgoCD, Prometheus, Istio, Cilium, Kyverno, and hundreds of CNCF projects are built around it
All three major cloud providers offer managed Kubernetes, where the control plane is operated, updated, and billed by the provider:

Provider	Service	Notes
AWS	EKS (Elastic Kubernetes Service)	Deep IAM and VPC integration
Google Cloud	GKE (Google Kubernetes Engine)	Autopilot mode for fully managed node pools
Azure	AKS (Azure Kubernetes Service)	Tight integration with Microsoft Entra ID (formerly Azure AD) and Azure Policy

Vanilla vs Enterprise Distributions

Vanilla Kubernetes - the upstream open-source release. Represents the cutting edge but ships with minimal security defaults and requires significant hardening before it is production-ready.
Enterprise distributions (Red Hat OpenShift, Rancher, Canonical MicroK8s) - typically one or two versions behind upstream but ship with hardened security defaults, built-in cluster user management, and tooling for deploying third-party applications out of the box.

For most teams, a managed cloud service or an enterprise distribution is a better starting point than self-managing vanilla Kubernetes.

The rest of this section covers Kubernetes in depth: architecture, the API and object model, kubectl, workloads, networking, storage, security, and cluster operations.

The Container Runtime Layer

Early Kubernetes was tightly coupled to Docker as its container runtime. As the ecosystem matured, it became clear that Docker bundles far more than Kubernetes needs (image building, its own CLI, networking, and volume management), and the community standardised a Container Runtime Interface (CRI) - a pluggable layer that lets any compliant runtime be swapped in or out.

Kubernetes removed dockershim, its built-in adapter for Docker Engine, in v1.24. Today most clusters ship with containerd as the default runtime - the same core runtime that Docker Engine itself is built on, without the extra layers Kubernetes does not need.

Runtime	Notes
containerd	Default on most clusters; lightweight, battle-tested
CRI-O	Minimal runtime purpose-built for Kubernetes, used in OpenShift
Kata Containers	VM-backed isolation for workloads requiring stronger security boundaries
gVisor (runsc)	Google’s user-space kernel that intercepts syscalls; stronger isolation than runc without a full VM; used in GKE Sandbox
WasmEdge / Spin	Wasm runtimes; see Wasm on Kubernetes

It is fully supported (and increasingly common) to run multiple runtimes simultaneously across the nodes of a single cluster - for example, containerd for most workloads and a Wasm runtime on a subset of nodes.
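Kubernetes exposes this choice through the `RuntimeClass` resource and the Pod field `runtimeClassName`. The sketch below builds both objects as plain data and prints them as JSON. The object names, the node label, the image reference, and the handler name `runsc` are assumptions for illustration; the handler in particular must match whatever is configured in the CRI runtime on the target nodes.

```python
import json

# RuntimeClass maps a cluster-wide name to a handler configured on the nodes.
runtime_class = {
    "apiVersion": "node.k8s.io/v1",
    "kind": "RuntimeClass",
    "metadata": {"name": "sandboxed"},      # hypothetical name
    "handler": "runsc",                     # assumed handler name on the nodes
    # Hypothetical node label: only nodes carrying it offer this runtime.
    "scheduling": {"nodeSelector": {"example.com/runtime": "gvisor"}},
}

# A Pod opts in by name; Pods without runtimeClassName use the node's default runtime.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "untrusted-job"},
    "spec": {
        "runtimeClassName": runtime_class["metadata"]["name"],
        "containers": [{"name": "job", "image": "registry.example.com/job:1.0"}],
    },
}

for manifest in (runtime_class, pod):
    print(json.dumps(manifest, indent=2))
```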

Docker Swarm (Historical Context)

Docker Swarm is Docker’s built-in clustering and orchestration solution, introduced as a standalone tool around 2015 and integrated into Docker Engine as swarm mode in version 1.12 (2016). Its appeal was simplicity - a single `docker swarm init` command turns a standalone Docker host into a Swarm manager node.

Where you’ll still see Swarm:

Legacy production systems built on Swarm before Kubernetes became the clear standard (~2018–2020) and never migrated
Small industrial or IoT deployments that adopted Swarm and haven’t migrated
Teams running Portainer on top of Swarm for a simple self-hosted container platform

Swarm vs Kubernetes

Dimension	Docker Swarm	Kubernetes
Setup complexity	Low - one command	High - many components
Learning curve	Low	High
Scale	Thousands of containers	Up to 5,000 nodes and 150,000 pods per cluster (upstream tested limits)
Auto-scaling	Manual only	HPA / VPA / KEDA (automatic)
Rolling updates	Basic	Built-in rolling updates; canary, blue/green, and progressive delivery via add-ons such as Argo Rollouts or Flagger
Networking	Overlay (built-in)	Plugin-based (Calico, Cilium, Flannel)
Storage	Basic volume support	Rich CSI driver ecosystem
Managed cloud options	None	AKS, EKS, GKE
Ecosystem	Small, Docker Inc.	Massive CNCF ecosystem
Active development	Maintenance mode	Actively developed
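To show what "automatic" means for the HPA entry in the table, the Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). A minimal sketch of that rule follows; the function name and sample metric values are hypothetical, and the real controller also applies min/max replica bounds and stabilisation windows that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Core HPA scaling rule: scale in proportion to metric / target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:      # close enough to target: leave as is
        return current_replicas
    return math.ceil(current_replicas * ratio)


scale_up = desired_replicas(4, current_metric=90, target_metric=60)    # load above target
scale_down = desired_replicas(4, current_metric=30, target_metric=60)  # load below target
no_change = desired_replicas(4, current_metric=63, target_metric=60)   # within tolerance
print(scale_up, scale_down, no_change)
```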

When Swarm still makes sense: A small team already deep in Docker Compose tooling, managing a handful of hosts, with no appetite for Kubernetes complexity and no need for the ecosystem around it. That said, Docker Compose on a single node with a reverse proxy (Traefik, Caddy) is often simpler and sufficient at the same scale.

When to use Kubernetes: Production at any meaningful scale, need for autoscaling, access to the CNCF ecosystem, deploying to cloud infrastructure, or any environment where operational maturity and long-term support matter.

When Not to Use Kubernetes

Kubernetes is the right tool for many situations - but not all of them.

Monolithic applications - a single large process that is deployed and scaled as one unit gains little from the scheduling, service discovery, and per-service scaling that an orchestrator provides.
Very small microservice footprints - as a rough rule of thumb, if your system consists of fewer than ~5 separate services, the operational overhead of Kubernetes is unlikely to pay off. The platform tends to deliver the most value when managing 20 or more microservices.
Early-stage projects - initial adoption carries hidden costs: training, tooling, additional hardware for the cluster itself, and significant developer distraction while the team learns the system.