Container Orchestration
- Container orchestration automates the deployment, scaling, networking, and lifecycle management of containers across a fleet of machines.
- A single Docker host with `docker run` is sufficient for development and simple workloads, but production quickly exposes its limits.
- Orchestrators solve this by abstracting the cluster into a single logical unit and continuously reconciling desired state with actual state.
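The idea of reconciling desired state with actual state can be sketched in a few lines. This is a toy model, not how any real orchestrator is implemented: the `Cluster` class and `reconcile` function are hypothetical names introduced for illustration, and a container is reduced to a name in a set.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a cluster: just the names of running containers."""
    running: set = field(default_factory=set)


def reconcile(desired, cluster):
    """Run one reconciliation pass and return the actions taken."""
    actions = []
    # Start anything that should be running but is not.
    for name in sorted(desired - cluster.running):
        cluster.running.add(name)
        actions.append(f"start {name}")
    # Stop anything that is running but is no longer desired.
    for name in sorted(cluster.running - desired):
        cluster.running.remove(name)
        actions.append(f"stop {name}")
    return actions


cluster = Cluster(running={"web-1", "old-job"})
desired = {"web-1", "web-2"}
print(reconcile(desired, cluster))  # ['start web-2', 'stop old-job']

cluster.running.discard("web-1")    # simulate a crash
print(reconcile(desired, cluster))  # ['start web-1']
```

A real orchestrator runs a pass like this continuously, so a crash is repaired on the next pass rather than waiting for an operator to notice.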
The Limits of Single-Host Docker
| Problem | Why Docker Alone Doesn’t Solve It |
|---|---|
| Container crashes | `docker run` starts a container once. If it dies, it stays dead. Manual intervention required. |
| Scaling | You can only add containers on one machine. When the host is full, you’re out of capacity. |
| Deployments | Updating an image requires stopping the old container and starting a new one - there is no built-in rolling update or health gate. |
| Multi-host networking | Containers on different hosts can’t reach each other without manual networking setup. |
| Secrets & config at scale | Distributing secrets across many hosts via env files or manual injection doesn’t scale. |
| Node failure | If the host goes down, every container on it goes down with it. |
What Orchestrators Add
An orchestrator treats the cluster as a single compute pool and manages:
- Scheduling - deciding which node a container runs on based on available resources and constraints
- Self-healing - restarting containers that crash, replacing them on failed nodes
- Scaling - adding or removing instances in response to load (manually or automatically)
- Rolling updates - deploying new image versions without downtime, with the rollout halted or rolled back when new instances fail their health checks
- Service discovery - containers find each other by name, not by IP address
- Load balancing - distributing traffic across healthy instances automatically
- Secrets & config - distributing sensitive data encrypted, without embedding it in images
- Leader election - ensuring exactly one active instance of a stateful component at any given time, with automatic failover
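As an illustration of the scheduling item above, the sketch below picks a node in two steps: filter out nodes that cannot run the container, then score the rest. It loosely mirrors the filter-then-score shape of real schedulers but is heavily simplified. The `Node` class, the `schedule` function, the label names, and the "most free CPU wins" scoring rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: float   # cores
    free_mem: int     # MiB
    labels: dict      # e.g. {"disk": "ssd"}


def schedule(nodes, cpu, mem, required_labels=None):
    """Return the name of the chosen node, or None if nothing fits."""
    required = required_labels or {}
    # Filter: keep nodes with enough free resources and matching labels.
    feasible = [
        n for n in nodes
        if n.free_cpu >= cpu
        and n.free_mem >= mem
        and all(n.labels.get(k) == v for k, v in required.items())
    ]
    if not feasible:
        return None
    # Score: prefer the node that keeps the most free CPU (spreads load).
    best = max(feasible, key=lambda n: n.free_cpu - cpu)
    # Reserve the resources so later placements see the updated state.
    best.free_cpu -= cpu
    best.free_mem -= mem
    return best.name


nodes = [
    Node("node-a", free_cpu=2.0, free_mem=4096, labels={"disk": "ssd"}),
    Node("node-b", free_cpu=8.0, free_mem=16384, labels={"disk": "hdd"}),
]
print(schedule(nodes, cpu=1.0, mem=1024))                   # node-b
print(schedule(nodes, cpu=1.0, mem=1024,
               required_labels={"disk": "ssd"}))            # node-a
print(schedule(nodes, cpu=16.0, mem=1024))                  # None
```

Returning `None` corresponds to a container that stays pending until capacity appears; a different scoring rule (for example, preferring the fullest node) would pack workloads instead of spreading them.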
Kubernetes as the Cloud OS
A useful mental model: Kubernetes does for an entire cloud environment what a traditional OS does for a single computer.
| Layer | Traditional OS | Kubernetes |
|---|---|---|
| Abstracts | CPU cores, memory, storage devices | Compute nodes, failure zones, storage volumes |
| Schedules | Application processes | Container workloads across nodes |
| Manages | Process lifecycle, file I/O, networking | Pod lifecycle, service networking, persistent storage |
This abstraction makes Kubernetes largely infrastructure-agnostic - you deploy essentially the same manifest whether the cluster runs on AWS, Azure, GCP, bare metal, or a local data centre. This is what enables hybrid-cloud and multi-cloud strategies: workloads are portable because most of the underlying infrastructure is abstracted, although provider-specific pieces such as storage classes and load balancer integration still vary.
Organisational Impact
Beyond the technical capabilities, Kubernetes changes how teams operate:
- Self-service deployment - all worker nodes are presented as a single unified pool. Developers submit a manifest and Kubernetes decides where to run it. There is no bottleneck waiting for a sysadmin to manually assign a server - developers ship independently.
- Developer focus on business logic - by hiding infrastructure details behind a standard API, Kubernetes removes the need to write infrastructure-specific code (service discovery wiring, health-check polling, failover scripting). Developers interact with the cluster the same way regardless of the underlying cloud or hardware.
- Hardware cost reduction - Kubernetes can pack workloads onto machines more tightly than manual server assignment usually achieves. By matching the resources each application requests against what each node has free, it reduces idle capacity - running more workloads on fewer servers and reducing infrastructure spend.
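To make the packing argument concrete, here is a minimal sketch using the classic first-fit-decreasing heuristic. Kubernetes does not schedule this way; the heuristic is swapped in only to show why packing by resource request needs fewer machines than one workload per server. The function name and the request sizes (in millicores) are made up for the example.

```python
def first_fit_decreasing(requests, node_capacity):
    """Pack CPU requests onto equal-sized nodes using first-fit decreasing.

    Returns a list of nodes, each a list of the requests placed on it.
    """
    nodes = []  # each entry: [remaining_capacity, [placed requests]]
    for req in sorted(requests, reverse=True):
        if req > node_capacity:
            raise ValueError(f"request {req} exceeds node capacity {node_capacity}")
        for node in nodes:
            if node[0] >= req:
                # The request fits on an existing node.
                node[0] -= req
                node[1].append(req)
                break
        else:
            # No existing node has room, so open a new one.
            nodes.append([node_capacity - req, [req]])
    return [placed for _, placed in nodes]


# Seven workloads, CPU requests in millicores, on 4000m nodes.
requests = [500, 2000, 1500, 1000, 3000, 500, 1500]
packed = first_fit_decreasing(requests, node_capacity=4000)
print(packed)       # [[3000, 1000], [2000, 1500, 500], [1500, 500]]
print(len(packed))  # 3
```

In this example seven workloads that would occupy seven servers under a one-workload-per-server policy fit on three nodes.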
The Landscape in 2026
Kubernetes
Kubernetes (K8s) is the dominant container orchestration platform. Originally developed at Google, it is now maintained by the CNCF and backed by every major cloud provider.
- Runs on any infrastructure: on-premises bare metal, VMs, or cloud
- Supports any OCI-compliant container image - images built with Docker run on Kubernetes without modification
- Has a vast ecosystem: Helm, ArgoCD, Prometheus, Istio, Cilium, Kyverno, and hundreds of CNCF projects are built around it
- All three major cloud providers offer managed Kubernetes - where the control plane is operated, updated, and charged for by the provider:
| Provider | Service | Notes |
|---|---|---|
| AWS | EKS (Elastic Kubernetes Service) | Deep IAM and VPC integration |
| Google Cloud | GKE (Google Kubernetes Engine) | Autopilot mode for fully managed node pools |
| Azure | AKS (Azure Kubernetes Service) | Tight integration with Microsoft Entra ID (formerly Azure AD) and Azure Policy |
Vanilla vs Enterprise Distributions
- Vanilla Kubernetes - the upstream open-source release. Represents the cutting edge but ships with minimal security defaults and requires significant hardening before it is production-ready.
- Enterprise distributions (Red Hat OpenShift, Rancher, Canonical MicroK8s) - typically one or two versions behind upstream but ship with hardened security defaults, built-in cluster user management, and tooling for deploying third-party applications out of the box.
For most teams, a managed cloud service or an enterprise distribution is a better starting point than self-managing vanilla Kubernetes.
The rest of this section covers Kubernetes in depth: architecture, the API and object model, kubectl, workloads, networking, storage, security, and cluster operations.
The Container Runtime Layer
Early Kubernetes tightly coupled itself to Docker as its sole container runtime. As the ecosystem matured, it became clear that Docker bundled far more than Kubernetes needed (image building, a CLI, its own networking and volume management), and the community standardised a Container Runtime Interface (CRI) - a pluggable layer that lets any compliant runtime be swapped in or out.
Kubernetes removed dockershim, its built-in Docker Engine integration, in v1.24. Today most clusters ship with containerd as the default runtime - the same core runtime that Docker Engine itself uses, without the extra tooling layered on top.
| Runtime | Notes |
|---|---|
| containerd | Default on most clusters; lightweight, battle-tested |
| CRI-O | Minimal runtime purpose-built for Kubernetes, used in OpenShift |
| Kata Containers | VM-backed isolation for workloads requiring stronger security boundaries |
| gVisor (runsc) | Google’s user-space kernel that intercepts syscalls; stronger isolation than runc without a full VM; used in GKE Sandbox |
| WasmEdge / Spin | Wasm runtimes; see Wasm on Kubernetes |
It is fully supported (and increasingly common) to run multiple runtimes simultaneously across the nodes of a single cluster - for example, containerd for most workloads and a Wasm runtime on a subset of nodes.
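In Kubernetes, a pod selects a non-default runtime through a RuntimeClass object and the pod's `runtimeClassName` field. The sketch below builds both manifests as plain Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. The object names, the node label, and the image are hypothetical, and the `runsc` handler only works if the nodes' CRI configuration defines a handler with that name.

```python
import json


def runtime_class(name, handler, node_selector=None):
    """Build a RuntimeClass manifest (node.k8s.io/v1) as a plain dict."""
    manifest = {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        # Must match a handler configured in the node's CRI runtime.
        "handler": handler,
    }
    if node_selector:
        # Restrict pods using this class to nodes that carry these labels.
        manifest["scheduling"] = {"nodeSelector": node_selector}
    return manifest


def pod(name, image, runtime_class_name=None):
    """Build a single-container Pod manifest, optionally pinned to a runtime."""
    spec = {"containers": [{"name": name, "image": image}]}
    if runtime_class_name:
        spec["runtimeClassName"] = runtime_class_name
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


# Hypothetical names: "sandboxed", the "runtime: gvisor" label, and the image.
sandboxed = runtime_class("sandboxed", "runsc", {"runtime": "gvisor"})
untrusted = pod("untrusted-job", "registry.example.com/job:1.0", "sandboxed")

for manifest in (sandboxed, untrusted):
    print(json.dumps(manifest, indent=2))
```

Pods that omit `runtimeClassName` keep using the node's default runtime, which is how a single cluster can mix runtimes across workloads.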
Docker Swarm (Historical Context)
Docker Swarm was Docker’s built-in clustering and orchestration solution, introduced around 2015 and built into Docker Engine as swarm mode in version 1.12 (2016). Its appeal was simplicity - a single `docker swarm init` command turned a standalone Docker host into an orchestrator node.
Where you’ll still see Swarm:
- Legacy production systems built on Swarm before Kubernetes became the clear standard (~2018–2020)
- Small industrial or IoT deployments that adopted Swarm and haven’t migrated
- Teams running Portainer on top of Swarm for a simple self-hosted container platform
Swarm vs Kubernetes
| Dimension | Docker Swarm | Kubernetes |
|---|---|---|
| Setup complexity | Low - one command | High - many components |
| Learning curve | Low | High |
| Scale | Thousands of containers | Up to 5,000 nodes and 150,000 pods per cluster (upstream tested limits) |
| Auto-scaling | Manual only | HPA / VPA / KEDA (automatic) |
| Rolling updates | Basic | Configurable rolling updates built in; canary, blue/green, and progressive delivery via ecosystem tools |
| Networking | Overlay (built-in) | Plugin-based (Calico, Cilium, Flannel) |
| Storage | Basic volume support | Rich CSI driver ecosystem |
| Managed cloud options | None | AKS, EKS, GKE |
| Ecosystem | Small, Docker Inc. | Massive CNCF ecosystem |
| Active development | Maintenance mode | Actively developed |
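The HPA entry in the table refers to the Horizontal Pod Autoscaler. Its documented core calculation is `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below implements only that formula plus min/max clamping; it ignores stabilisation windows, pod readiness, and multiple metrics. The function name and the replica bounds are assumptions for the example.

```python
import math


def desired_replicas(current, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA core formula for a single metric."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    wanted = math.ceil(current * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# 4 replicas averaging 90% CPU against a 60% target -> scale out.
print(desired_replicas(4, 90, 60))    # 6
# 63% against 60% is within the 10% tolerance -> no change.
print(desired_replicas(4, 63, 60))    # 4
# 15% against 60% -> scale in.
print(desired_replicas(4, 15, 60))    # 1
# 300% against 60% would want 20 replicas, clamped to max_replicas.
print(desired_replicas(4, 300, 60))   # 10
```

With Swarm, the equivalent step is an operator (or an external script) deciding on a number and running a manual scale command.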
When Swarm still makes sense: A small team already deep in Docker Compose tooling, managing a handful of hosts, with no appetite for Kubernetes complexity and no need for the ecosystem around it. That said, Docker Compose on a single node with a reverse proxy (Traefik, Caddy) is often simpler and sufficient at the same scale.
When to use Kubernetes: Production at any meaningful scale, need for autoscaling, access to the CNCF ecosystem, deploying to cloud infrastructure, or any environment where operational maturity and long-term support matter.
When Not to Use Kubernetes
Kubernetes is the right tool for many situations - but not all of them.
- Monolithic applications - a single large process gains little from container orchestration at scale.
- Very small microservice footprints - if your system consists of fewer than ~5 separate services, the operational overhead of Kubernetes is unlikely to pay off. The platform delivers the most value when managing 20 or more microservices.
- Early-stage projects - initial adoption carries hidden costs: training, tooling, additional hardware for the cluster itself, and significant developer distraction while the team learns the system.