Infrastructure as Code
Infrastructure as Code (IaC) is the practice of provisioning and managing infrastructure using code rather than manual command-line tools or ClickOps GUIs. Beyond the mechanics of provisioning, IaC is fundamentally about applying the principles, practices, and tools of software engineering to infrastructure - version control, code review, automated testing, and continuous delivery, all applied to cloud resources.
The result is infrastructure that is reproducible, auditable, and safe to change at speed.
From Iron Age to Cloud Age
Modern infrastructure management has evolved through distinct phases. Understanding where we came from explains why IaC exists:
| Era | Technology | Operations | Governance mindset |
|---|---|---|---|
| Iron Age | Physical servers, monolithic architectures | Manual runbooks, hand-configured servers | Change = risk. CABs throttle change to prevent mistakes |
| Shadow IT | Early cloud, skunkworks DevOps teams | Bypassing formal IT to avoid restrictive policy | “Move fast and break things” |
| Age of Sprawl | Multiple disconnected cloud initiatives | Rushed adoption, accumulating technical debt | Speed at all costs - multiple vendors, varying stacks |
| Age of Sustainable Growth | Rationalised, automated infrastructure | Selective investment, cost-conscious | Efficient, sustainable growth with less waste |
| Cloud Age (target) | Virtualised resources, microservices, containers | Code-driven automation | Frequent, small changes are a stability mechanism - not a risk |
The Cloud Age reframes change: stability comes from making changes, not from preventing them. A system that cannot be patched quickly remains vulnerable. A system that cannot be rebuilt quickly cannot recover from failure.
Why IaC?
Before IaC, infrastructure was managed manually. Servers were configured by hand, environments were documented in wikis that fell out of date, and the difference between staging and production was tribal knowledge. Reproducing a broken environment took days. Recovering from a failure took longer.
| Benefit | What it means in practice |
|---|---|
| Repeatability | The same code, run twice, produces identical infrastructure. No snowflake servers or “works on my machine” environments. |
| Reusability | Modular building blocks shared across teams. One team’s battle-tested VPC module becomes everyone’s VPC module. |
| Shareability | Code lives in Git - reviewable, forkable, versioned. Infrastructure decisions are documented by the commit that made them. |
| Auditability | Every change has a commit, a PR, and a deployment record. Compliance questions become a git log. |
| Recovery speed | Environments recreated from scratch in minutes. Disaster recovery becomes a pipeline run, not a multi-day manual operation. |
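The repeatability row can be illustrated with a small idempotent "converge" loop. This is a minimal sketch, not how any particular IaC tool is implemented: `FakeCloud` and `converge` are hypothetical names, and the in-memory dictionary stands in for a real vendor API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud API (an assumption for illustration)."""
    resources: dict[str, dict] = field(default_factory=dict)
    api_calls: int = 0

    def put(self, name: str, spec: dict) -> None:
        # Count every mutating call so we can see when nothing needed to change.
        self.api_calls += 1
        self.resources[name] = dict(spec)


def converge(cloud: FakeCloud, desired: dict[str, dict]) -> list[str]:
    """Make the cloud match `desired` and return the names that had to change."""
    changed = []
    for name, spec in desired.items():
        if cloud.resources.get(name) != spec:
            cloud.put(name, spec)
            changed.append(name)
    return changed


desired = {
    "vpc-main": {"cidr": "10.0.0.0/16"},
    "db-primary": {"tier": "small", "ha": True},
}

cloud = FakeCloud()
first_run = converge(cloud, desired)   # creates both resources
second_run = converge(cloud, desired)  # finds nothing to do
print(first_run, second_run, cloud.api_calls)
```

The second run makes no API calls because the declared and actual state already match, which is the property that makes "run it again" a safe operation.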
Key Concepts at a Glance
The philosophical foundations of IaC - myths, metrics, and design principles - are covered in depth in IaC Principles. Here’s the summary:
| Concept | Key insight |
|---|---|
| Three myths | Infrastructure changes constantly; you can’t automate later; speed and quality reinforce each other |
| DORA metrics | Delivery lead time, deployment frequency, change failure rate, MTTR (mean time to restore) - the four measures DORA research correlates with delivery performance |
| Cloud principles | Reproducible, disposable, variation-free, no snowflakes - design for unreliable hardware |
| Automation fear spiral | Drift → fear → manual changes → more drift. Break it with incremental, scheduled enforcement |
| Strategic alignment | Customer value → org strategy → product strategy → tech strategy → infra strategy |
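To make the DORA metrics row concrete, here is a rough sketch of how the four numbers could be derived from deployment records. The `Deployment` fields and the `dora_summary` function are assumptions made for illustration, and the sample data is invented; real pipelines would pull these timestamps from their CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set only when a failure was fixed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deployments: list[Deployment], window_days: int) -> dict[str, float]:
    """Summarise a non-empty list of deployments observed over `window_days`."""
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    restores = [
        hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "median_lead_time_hours": median(lead_times),
        "deployments_per_day": len(deployments) / window_days,
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_restore_hours": mean(restores) if restores else 0.0,
    }


t0 = datetime(2024, 1, 1, 9, 0)
day = timedelta(days=1)
hour = timedelta(hours=1)
sample = [
    Deployment(t0, t0 + 2 * hour),
    Deployment(t0 + day, t0 + day + 4 * hour, failed=True,
               restored_at=t0 + day + 5 * hour),
    Deployment(t0 + 2 * day, t0 + 2 * day + 6 * hour),
    Deployment(t0 + 3 * day, t0 + 3 * day + 8 * hour),
]
summary = dora_summary(sample, window_days=4)
print(summary)
```

Tracking the median rather than the mean for lead time is a design choice here: it keeps one unusually slow change from dominating the figure.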
The Three Core Practices
IaC rests on three foundational practices. These aren’t recommendations - they’re what separates IaC from “just using an IaC tool”:
- Define everything as code - configuration, versions, dependencies, secrets injection. If it’s not in code, it’s tribal knowledge.
- Continually test and deliver all work in progress - build quality in, don’t test at the end.
- Build small, simple pieces with clear interfaces - so each can be tested, deployed, and changed independently.
Full detail in IaC Principles.
The IaC Stack
IaC is not a single tool - it’s a stack of concerns, each layer handled by different tooling:
| Layer | What it manages | Common tools |
|---|---|---|
| Provisioning | Creating and maintaining cloud resources (VMs, networks, databases, IAM) | Terraform, Pulumi, Cloud Foundation Toolkit |
| Configuration management | What runs on existing servers after provisioning | Ansible, OS Config, cloud-init |
| Container orchestration | Deploying workloads to clusters | GKE + Argo CD, Helm, Kustomize |
| CI/CD | Automating IaC changes through a delivery pipeline | Cloud Build, GitHub Actions, Atlantis |
| Policy enforcement | Validating that all resources meet security and compliance standards | Checkov, OPA/Conftest, GCP Org Policy |
| Observability | Detecting when real infrastructure drifts from declared state | terraform plan -refresh-only, Security Command Center |
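The policy enforcement layer boils down to running rules over declared resources before they are deployed. The sketch below shows that idea in plain Python; it is not the Checkov or OPA API, and the resource shape (`type`, `name`, `labels`, `public`) and the two rules are hypothetical.

```python
from typing import Callable, Optional

Resource = dict
# A policy inspects one resource and returns a violation message, or None if it passes.
Policy = Callable[[Resource], Optional[str]]


def require_owner_label(resource: Resource) -> Optional[str]:
    if "owner" not in resource.get("labels", {}):
        return f"{resource['name']}: missing 'owner' label"
    return None


def no_public_buckets(resource: Resource) -> Optional[str]:
    if resource["type"] == "bucket" and resource.get("public", False):
        return f"{resource['name']}: bucket must not be public"
    return None


def evaluate(resources: list[Resource], policies: list[Policy]) -> list[str]:
    """Run every policy against every resource and collect the violations."""
    violations = []
    for resource in resources:
        for policy in policies:
            message = policy(resource)
            if message is not None:
                violations.append(message)
    return violations


resources = [
    {"type": "bucket", "name": "logs", "labels": {"owner": "platform"}, "public": False},
    {"type": "bucket", "name": "assets", "labels": {}, "public": True},
]
violations = evaluate(resources, [require_owner_label, no_public_buckets])
for line in violations:
    print(line)
```

In a pipeline, a non-empty violation list would typically fail the build so that non-compliant changes never reach apply.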
Terraform Overview
Terraform is the dominant IaC provisioning tool. It uses HCL (HashiCorp Configuration Language) - a declarative language where you define the desired end state and Terraform figures out how to get there, including the correct order of operations via a Directed Acyclic Graph (DAG).
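The ordering idea can be sketched with Python's standard-library `graphlib`. This is an illustration of dependency ordering in general, not Terraform's internal graph code; the resource names and the `creation_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key depends on the resources in its set (hypothetical example graph).
depends_on = {
    "network": set(),
    "subnet": {"network"},
    "firewall": {"network"},
    "database": {"subnet"},
    "vm": {"subnet", "firewall"},
}


def creation_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group resources into batches; everything in a batch could be created in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make output stable
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = creation_batches(depends_on)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Each batch contains only resources whose dependencies were satisfied by earlier batches, which is why independent resources such as the subnet and firewall can be handled side by side.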
Terraform has four core components:
| Component | Role |
|---|---|
| HCL | Declarative language for defining resources, variables, outputs, and modules |
| CLI / Core | The engine - reads HCL, communicates with providers, manages the plan/apply lifecycle |
| Providers | Plugins wrapping vendor APIs (AWS, GCP, Azure, and 3,280+ others in the Terraform Registry) |
| State & Backends | Tracks what Terraform manages; remote backends (GCS, S3) enable team collaboration |
The deployment workflow is: write → terraform init → terraform plan → terraform apply. During plan, Terraform refreshes real-world state from the vendor API, diffs it against your code, and outputs exactly what will change. During apply, it executes the DAG concurrently where possible.
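Conceptually, the plan step is a diff between declared and observed resources. The sketch below shows that comparison under simplifying assumptions: resources are plain dictionaries keyed by name, and the `plan` function here is a hypothetical helper, not Terraform's implementation, which also accounts for its state file and provider schemas.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare declared resources with observed ones and list the required actions."""
    create = sorted(desired.keys() - actual.keys())
    delete = sorted(actual.keys() - desired.keys())
    update = sorted(
        name
        for name in desired.keys() & actual.keys()
        if desired[name] != actual[name]
    )
    return {"create": create, "update": update, "delete": delete}


# What the code declares (hypothetical resources).
desired = {
    "network": {"cidr": "10.0.0.0/16"},
    "vm": {"size": "medium"},
    "bucket": {"versioning": True},
}

# What a refresh against the vendor API might report.
actual = {
    "network": {"cidr": "10.0.0.0/16"},
    "vm": {"size": "small"},
    "old-disk": {"size_gb": 50},
}

changes = plan(desired, actual)
print(changes)
```

Because the diff is computed before anything is modified, it can be reviewed in a pull request in the same way as any other code change.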
See the Terraform section for full coverage.