Infrastructure as Code

Infrastructure as Code (IaC) is the practice of provisioning and managing infrastructure using code rather than manual command-line tools or ClickOps GUIs. Beyond the mechanics of provisioning, IaC is fundamentally about applying the principles, practices, and tools of software engineering to infrastructure - version control, code review, automated testing, and continuous delivery, all applied to cloud resources.

The result is infrastructure that is reproducible, auditable, and safe to change at speed.

From Iron Age to Cloud Age

Modern infrastructure management has evolved through distinct phases. Understanding where we came from explains why IaC exists:

Era	Technology	Operations	Governance mindset
Iron Age	Physical servers, monolithic architectures	Manual runbooks, hand-configured servers	Change = risk. Change advisory boards (CABs) throttle change to prevent mistakes
Shadow IT	Early cloud, skunkworks DevOps teams	Bypassing formal IT to avoid restrictive policy	“Move fast and break things”
Age of Sprawl	Multiple disconnected cloud initiatives	Rushed adoption, accumulating technical debt	Speed at all costs - multiple vendors, varying stacks
Age of Sustainable Growth	Rationalised, automated infrastructure	Selective investment, cost-conscious	Efficient, sustainable growth with less waste
Cloud Age (target)	Virtualised resources, microservices, containers	Code-driven automation	Frequent, small changes are a stability mechanism - not a risk

The Cloud Age reframes change: stability comes from making changes, not from preventing them. A system that cannot be patched quickly remains vulnerable. A system that cannot be rebuilt quickly cannot recover from failure.

Why IaC?

Before IaC, infrastructure was managed manually. Servers were configured by hand, environments were documented in wikis that fell out of date, and the difference between staging and production was tribal knowledge. Reproducing a broken environment took days. Recovering from a failure took longer.

Benefit	What it means in practice
Repeatability	The same code, run twice, produces identical infrastructure. No snowflake servers or “works on my machine” environments.
Reusability	Modular building blocks shared across teams. One team’s battle-tested VPC module becomes everyone’s VPC module.
Shareability	Code lives in Git - reviewable, forkable, versioned. Infrastructure decisions are documented by the commit that made them.
Auditability	Every change has a commit, a PR, and a deployment record. Compliance questions become a `git log`.
Recovery speed	Environments recreated from scratch in minutes. Disaster recovery becomes a pipeline run, not a multi-day manual operation.

Key Concepts at a Glance

The philosophical foundations of IaC - myths, metrics, and design principles - are covered in depth in IaC Principles. Here’s the summary:

Concept	Key insight
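Three myths	Each myth is false: infrastructure does change often; automation can’t wait until later; speed and quality reinforce each other rather than trade off
DORA metrics	Delivery lead time, deployment frequency, change fail rate, mean time to restore (MTTR) - the four measures DORA research correlates with delivery and organisational performance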
Cloud principles	Reproducible, disposable, variation-free, no snowflakes - design for unreliable hardware
Automation fear spiral	Drift → fear → manual changes → more drift. Break it with incremental, scheduled enforcement
Strategic alignment	Customer value → org strategy → product strategy → tech strategy → infra strategy

The Three Core Practices

IaC rests on three foundational practices. These aren’t recommendations - they’re what separates IaC from “just using an IaC tool”:

Define everything as code - configuration, versions, dependencies, secrets injection. If it’s not in code, it’s tribal knowledge.
Continually test and deliver all work in progress - build quality in rather than testing at the end.
Build small, simple pieces with clear interfaces - so each can be tested, deployed, and changed independently.

Full detail in IaC Principles.

The IaC Stack

IaC is not a single tool - it’s a stack of concerns, each layer handled by different tooling:

Layer	What it manages	Common tools
Provisioning	Creating and maintaining cloud resources (VMs, networks, databases, IAM)	Terraform, Pulumi, Cloud Foundation Toolkit
Configuration management	What runs on existing servers after provisioning	Ansible, OS Config, cloud-init
Container orchestration	Deploying workloads to clusters	GKE + Argo CD, Helm, Kustomize
CI/CD	Automating IaC changes through a delivery pipeline	Cloud Build, GitHub Actions, Atlantis
Policy enforcement	Validating that all resources meet security and compliance standards	Checkov, OPA/Conftest, GCP Org Policy
Observability	Detecting when real infrastructure drifts from declared state	`terraform plan -refresh-only`, Security Command Center

Terraform Overview

Terraform is one of the most widely used IaC provisioning tools. It uses HCL (HashiCorp Configuration Language) - a declarative language where you define the desired end state and Terraform works out how to get there. It derives the correct order of operations from a Directed Acyclic Graph (DAG) of resource dependencies.
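To make the DAG idea concrete, here is a minimal Python sketch of dependency ordering. It is not how Terraform is implemented; it only illustrates how an order of operations falls out of a dependency graph. The resource names and the `creation_batches` helper are hypothetical, and the sketch assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical resources, each mapped to the resources it depends on.
DEPENDENCIES = {
    "network": set(),
    "subnet": {"network"},
    "firewall_rule": {"network"},
    "vm": {"subnet", "firewall_rule"},
}


def creation_batches(dependencies):
    """Group resources into batches; each batch depends only on earlier ones."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies satisfied,
        # so the members of one batch could be created concurrently.
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(creation_batches(DEPENDENCIES))
# [['network'], ['firewall_rule', 'subnet'], ['vm']]
```

The network has to exist before anything else, the subnet and firewall rule are independent of each other, and the VM waits for both. A declarative tool can infer this ordering from references between resources instead of asking you to script it.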

Terraform has four core components:

Component	Role
HCL	Declarative language for defining resources, variables, outputs, and modules
CLI / Core	The engine - reads HCL, communicates with providers, manages the plan/apply lifecycle
Providers	Plugins wrapping vendor APIs (AWS, GCP, Azure, and 3,280+ others in the Terraform Registry)
State & Backends	Tracks what Terraform manages; remote backends (GCS, S3) enable team collaboration

The deployment workflow is: write → `terraform init` → `terraform plan` → `terraform apply`. During plan, Terraform refreshes real-world state from the vendor API, diffs it against your code, and outputs exactly what will change. During apply, it walks the DAG and runs independent operations concurrently where possible.
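The plan step can be pictured as a diff between what the code declares and what actually exists. The Python sketch below is a deliberately simplified illustration, not Terraform's algorithm: real planning also consults the stored state file, and some changes force a resource to be replaced rather than updated in place. The `plan` function, the bucket names, and their attributes are all hypothetical.

```python
def plan(desired, actual):
    """Compare declared resources with refreshed real-world resources.

    Both arguments map a resource name to a dict of attributes.
    Returns a dict of resource name -> planned action; unchanged
    resources are omitted.
    """
    actions = {}
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions[name] = "create"   # declared in code, missing in reality
        elif name not in desired:
            actions[name] = "delete"   # exists in reality, no longer declared
        elif desired[name] != actual[name]:
            actions[name] = "update"   # exists, but attributes have drifted
    return actions


# Hypothetical desired state: what the code declares.
desired = {
    "bucket_logs": {"location": "EU", "versioning": True},
    "bucket_assets": {"location": "EU", "versioning": False},
}

# Hypothetical actual state: what the vendor API reports after a refresh.
actual = {
    "bucket_logs": {"location": "EU", "versioning": False},
    "bucket_tmp": {"location": "US", "versioning": False},
}

for name, action in plan(desired, actual).items():
    print(f"{action:7} {name}")
```

Running the same comparison when nothing has changed yields an empty plan, which is the property that makes a plan safe to run repeatedly and useful for spotting drift.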

See the Terraform section for full coverage.

Topics in This Section

Foundations

IaC Principles: What IaC is, the Cloud Age philosophy, the core practices, and why infrastructure management had to change.

Tools & Platforms: IaC tool landscape, declarative vs. imperative, GCP tooling options, and Terraform's place in the ecosystem.

OpenTofu: What OpenTofu is, why it was forked from Terraform, compatibility, and who it's for.

Refactoring IaC: Internal and external refactoring patterns - reorganizing projects, `moved`/`import` blocks, breaking changes.

Design

Modules & Design: CUPID design principles, stack size patterns, module structures, and infrastructure library patterns.

Stack Configuration: Patterns for parameterizing stack instances, secrets management, and resource discovery between stacks.

Stack Integration: Cross-stack resource discovery, remote state references, dependency injection, and integration registries.

Servers & Environments: Immutable server patterns, multi-environment architecture, cluster topologies, and serverless infrastructure.

Terraform

Terraform HCL: Block syntax, variables, type system, expressions, `for_each`, dynamic blocks, and meta-arguments.

Plan & Apply: DAGs, planning modes, apply lifecycle, resource targeting, and common pitfalls.

State Management: How state works, remote backends, drift detection, state manipulation, and cross-project access.

Terraform Deep Dives: Advanced topics, alternative interfaces (CDKTF, JSON configuration syntax), and writing custom providers.

Delivery

Testing IaC: Progressive testing, static analysis, Terratest, the native Terraform test framework, and mocks.

IaC & CI/CD: Delivery pipelines, GitHub Actions, Workload Identity Federation, GitOps workflows, and CD platforms.

Deploying Infrastructure: Deployment strategies, push/pull/GitOps patterns, infrastructure as data, and deployment scripts.

Changing Infrastructure: Safely changing live infrastructure - expand-and-contract, blue-green, rolling upgrades, data migrations.

Governance & Compliance: Shift-left compliance, OPA/Checkov, GCP Org Policy as code, and audit automation.