Infrastructure as Code

Infrastructure as Code (IaC) is the practice of provisioning and managing infrastructure using code rather than manual command-line tools or ClickOps GUIs. Beyond the mechanics of provisioning, IaC is fundamentally about applying the principles, practices, and tools of software engineering to infrastructure - version control, code review, automated testing, and continuous delivery, all applied to cloud resources.

The result is infrastructure that is reproducible, auditable, and safe to change at speed.


Modern infrastructure management has evolved through distinct phases. Understanding where we came from explains why IaC exists:

| Era | Technology | Operations | Governance mindset |
| --- | --- | --- | --- |
| Iron Age | Physical servers, monolithic architectures | Manual runbooks, hand-configured servers | Change = risk. CABs throttle change to prevent mistakes |
| Shadow IT | Early cloud, skunkworks DevOps teams | Bypassing formal IT to avoid restrictive policy | “Move fast and break things” |
| Age of Sprawl | Multiple disconnected cloud initiatives | Rushed adoption, accumulating technical debt | Speed at all costs - multiple vendors, varying stacks |
| Age of Sustainable Growth | Rationalised, automated infrastructure | Selective investment, cost-conscious | Efficient, sustainable growth with less waste |
| Cloud Age (target) | Virtualised resources, microservices, containers | Code-driven automation | Frequent, small changes are a stability mechanism - not a risk |

The Cloud Age reframes change: stability comes from making changes, not from preventing them. A system that cannot be patched quickly remains vulnerable. A system that cannot be rebuilt quickly cannot recover from failure.


Before IaC, infrastructure was managed manually. Servers were configured by hand, environments were documented in wikis that fell out of date, and the difference between staging and production was tribal knowledge. Reproducing a broken environment took days. Recovering from a failure took longer.

| Benefit | What it means in practice |
| --- | --- |
| Repeatability | The same code, run twice, produces identical infrastructure. No snowflake servers or “works on my machine” environments. |
| Reusability | Modular building blocks shared across teams. One team’s battle-tested VPC module becomes everyone’s VPC module. |
| Shareability | Code lives in Git - reviewable, forkable, versioned. Infrastructure decisions are documented by the commit that made them. |
| Auditability | Every change has a commit, a PR, and a deployment record. Compliance questions become a git log. |
| Recovery speed | Environments recreated from scratch in minutes. Disaster recovery becomes a pipeline run, not a multi-day manual operation. |

The philosophical foundations of IaC - myths, metrics, and design principles - are covered in depth in IaC Principles. Here’s the summary:

| Concept | Key insight |
| --- | --- |
| Three myths | Infrastructure changes constantly; you can’t automate later; speed and quality reinforce each other |
| DORA metrics | Delivery lead time, deployment frequency, change fail rate, MTTR - proven correlates of success |
| Cloud principles | Reproducible, disposable, variation-free, no snowflakes - design for unreliable hardware |
| Automation fear spiral | Drift → fear → manual changes → more drift. Break it with incremental, scheduled enforcement |
| Strategic alignment | Customer value → org strategy → product strategy → tech strategy → infra strategy |
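
As a rough illustration of the four DORA metrics named in the table, the sketch below computes them from a handful of deployment records. The record fields (`committed`, `deployed`, `failed`, `restored`), the sample data, and the function name `dora_metrics` are all hypothetical, and the definitions used here are deliberately simplified; treat it as a sketch under those assumptions, not a reference implementation.

```python
from datetime import datetime
from statistics import mean

# Hypothetical deployment records; field names are assumptions for this sketch.
deployments = [
    {"committed": datetime(2024, 1, 1, 9), "deployed": datetime(2024, 1, 1, 11),
     "failed": False, "restored": None},
    {"committed": datetime(2024, 1, 2, 9), "deployed": datetime(2024, 1, 2, 15),
     "failed": True, "restored": datetime(2024, 1, 2, 16)},
    {"committed": datetime(2024, 1, 3, 9), "deployed": datetime(2024, 1, 3, 10),
     "failed": False, "restored": None},
    {"committed": datetime(2024, 1, 4, 9), "deployed": datetime(2024, 1, 4, 12),
     "failed": False, "restored": None},
]


def hours(delta):
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600


def dora_metrics(records, period_days):
    """Compute simplified versions of the four DORA metrics."""
    # Delivery lead time: commit to running in production.
    lead_times = [hours(r["deployed"] - r["committed"]) for r in records]
    failures = [r for r in records if r["failed"]]
    # Time to restore: failed deployment to service restored.
    restore_times = [hours(r["restored"] - r["deployed"]) for r in failures]
    return {
        "lead_time_hours": mean(lead_times),
        "deploys_per_day": len(records) / period_days,
        "change_fail_rate": len(failures) / len(records),
        "mttr_hours": mean(restore_times) if restore_times else 0.0,
    }


metrics = dora_metrics(deployments, period_days=4)
print(metrics)
```

In practice these figures would come from the delivery pipeline and incident tooling, not hand-written records.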

IaC rests on three foundational practices. These aren’t recommendations - they’re what separates IaC from “just using an IaC tool”:

  1. Define everything as code - configuration, versions, dependencies, secrets injection. If it’s not in code, it’s tribal knowledge.
  2. Continually test and deliver all work in progress - build quality in, don’t test at the end.
  3. Build small, simple pieces with clear interfaces - so each can be tested, deployed, and changed independently.

Full detail in IaC Principles.


IaC is not a single tool - it’s a stack of concerns, each layer handled by different tooling:

| Layer | What it manages | Common tools |
| --- | --- | --- |
| Provisioning | Creating and maintaining cloud resources (VMs, networks, databases, IAM) | Terraform, Pulumi, Cloud Foundation Toolkit |
| Configuration management | What runs on existing servers after provisioning | Ansible, OS Config, Cloud Init |
| Container orchestration | Deploying workloads to clusters | GKE + Argo CD, Helm, Kustomize |
| CI/CD | Automating IaC changes through a delivery pipeline | Cloud Build, GitHub Actions, Atlantis |
| Policy enforcement | Validating that all resources meet security and compliance standards | Checkov, OPA/Conftest, GCP Org Policy |
| Observability | Detecting when real infrastructure drifts from declared state | terraform plan -refresh-only, Security Command Center |

Terraform is the dominant IaC provisioning tool. It uses HCL (HashiCorp Configuration Language) - a declarative language where you define the desired end state and Terraform works out how to get there. It determines the correct order of operations by building a Directed Acyclic Graph (DAG) of the dependencies between resources.

Its four core components:

| Component | Role |
| --- | --- |
| HCL | Declarative language for defining resources, variables, outputs, and modules |
| CLI / Core | The engine - reads HCL, communicates with providers, manages the plan/apply lifecycle |
| Providers | Plugins wrapping vendor APIs (AWS, GCP, Azure, and 3,280+ others in the Terraform Registry) |
| State & Backends | Tracks what Terraform manages; remote backends (GCS, S3) enable team collaboration |
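
To make the dependency-graph idea more concrete, here is a minimal Python sketch of ordering resources with a DAG. It is not Terraform's implementation: the resource names and the `creation_batches` helper are hypothetical, and Python's standard-library `graphlib` stands in for Terraform's own graph engine. Each batch holds resources with no outstanding dependencies, which is roughly what allows independent resources to be created concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical resources, each mapped to the set of resources it depends on.
dependencies = {
    "network": set(),
    "subnet": {"network"},
    "firewall_rule": {"network"},
    "vm": {"subnet", "firewall_rule"},
}


def creation_batches(deps):
    """Group resources into batches; members of a batch do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies already done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = creation_batches(dependencies)
print(batches)  # [['network'], ['firewall_rule', 'subnet'], ['vm']]
```

The network comes first because everything else depends on it; the subnet and firewall rule depend only on the network, so they share a batch; the VM waits for both.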

The deployment workflow is: write → terraform init → terraform plan → terraform apply. During plan, Terraform refreshes real-world state from the vendor API, diffs it against your code, and outputs exactly what will change. During apply, it executes the DAG concurrently where possible.
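
The plan step can be pictured as a diff between declared and observed state. The sketch below is a simplified Python model under stated assumptions: the functions `plan` and `apply`, the resource names, and the attribute dictionaries are all hypothetical, and real Terraform plans are richer (for example, some changes force a resource to be replaced instead of updated in place).

```python
def plan(desired, actual):
    """Compare declared resources with observed ones and list the required changes."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


def apply(desired, actual):
    """Return the observed state after carrying out the plan."""
    new_state = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del new_state[name]
        else:  # create or update: adopt the declared attributes
            new_state[name] = desired[name]
    return new_state


# Hypothetical declared state (the code) and observed state (the real world).
desired = {
    "bucket": {"location": "EU"},
    "vm": {"machine_type": "e2-small"},
}
actual = {
    "vm": {"machine_type": "e2-micro"},
    "old_disk": {"size_gb": 10},
}

changes = plan(desired, actual)
print(changes)  # [('create', 'bucket'), ('delete', 'old_disk'), ('update', 'vm')]

converged = apply(desired, actual)
print(plan(desired, converged))  # []
```

Planning again after apply yields no changes, which is the repeatability property in miniature: the same code applied twice converges on the same infrastructure.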

See the Terraform section for full coverage.