IaC Tools & Platforms
Every IaC project sits inside a stack of platforms and tooling. Understanding the layers - infrastructure platform, engineering platform, delivery toolchain - makes it clear what Terraform (or any IaC tool) is actually managing and why language and tool choices matter.
The Three System Layers
A typical IT system divides into three distinct layers:
| Layer | What it contains | Examples |
|---|---|---|
| User Services | Applications that deliver business value directly to customers | Web apps, APIs, microservices |
| Engineering Platform | Hosting and operational services that support applications | Container clusters, monitoring, databases |
| Infrastructure Platform | Primitive compute, storage, and networking resources | VMs, VPCs, object storage, bare metal |
Infrastructure as Code primarily targets the infrastructure platform layer - provisioning and wiring together the primitives that the engineering platform assembles into services.
Infrastructure Platforms (IaaS)
The infrastructure platform must expose a programmable API for IaC tools to work against - this is the definition of Infrastructure as a Service (IaaS). IaC tools like Terraform talk to these APIs to create, modify, and destroy resources.
Types of IaaS Platforms
| Category | Examples |
|---|---|
| Public cloud providers | AWS, Google Cloud, Microsoft Azure, Alibaba Cloud, DigitalOcean |
| Private IaaS software | OpenStack, Apache CloudStack, VMware vCloud |
| Bare-metal provisioning | RackN Digital Rebar, Cobbler, Foreman, Crowbar |
| Vendor-managed on-premises | AWS Outposts, Azure Local, Google Cloud Anthos |
Infrastructure Resources: Primitive vs. Composite
IaaS platforms expose resources in two forms:
- Primitive resources - fundamental building blocks: a single subnet, a virtual disk volume, a firewall rule.
- Composite resources - combinations of primitives packaged as a single API call. Requesting a GKE cluster provisions a pool of VMs, networking, and stateful storage in one operation.
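As a rough illustration of the distinction, the sketch below models a composite resource as a single call that fans out into primitives. The names `Primitive` and `expand_cluster`, and the particular primitives chosen, are hypothetical and do not correspond to any platform's real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """One fundamental building block, e.g. a VM, a subnet, or a disk."""
    kind: str
    name: str


def expand_cluster(name: str, node_count: int) -> list[Primitive]:
    """Hypothetical composite: one request expands into several primitives."""
    resources = [Primitive("subnet", f"{name}-subnet")]
    resources += [Primitive("vm", f"{name}-node-{i}") for i in range(node_count)]
    resources.append(Primitive("disk", f"{name}-data"))
    return resources


cluster = expand_cluster("demo", node_count=3)
for resource in cluster:
    print(resource.kind, resource.name)
```

The caller makes one request and never names the individual VMs or the subnet, which is the convenience a composite resource offers.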
Resources fall into three functional categories:
Compute - executes code:
- VM instances, bare-metal servers (BMaaS)
- Server clusters (AWS Auto Scaling groups, Google Managed Instance Groups)
- Container clusters as a service - Amazon EKS, Google GKE (CaaS / CCaaS)
- Serverless runtimes - AWS Lambda, Google Cloud Run functions (FaaS)
Storage - persists data:
- Block storage - virtual drives attached to instances (Amazon EBS, Azure Managed Disks)
- Object storage - multi-location file access via API (Amazon S3, Google Cloud Storage). Typically cheaper per gigabyte and more durable than block storage, but with higher latency.
- Networked filesystems - shared volumes via NFS, AFS, SMB/CIFS
- Structured databases - RDBMS, key-value stores, document stores
- Secrets management - structured storage with access controls and rotation
Networking - connects everything:
- Network address blocks (VPCs, subnets, VLANs), DNS
- Traffic routing - gateways, proxies, load balancers
- Security constructs - VPNs, firewall rules
- Messaging and caching - queues, service meshes
Multicloud Strategies
| Strategy | Definition | Honest assessment |
|---|---|---|
| Hybrid cloud | Private infra + public cloud | Legitimate for legacy systems or strict data residency |
| Polycloud | Different workloads on different public clouds | Often organic (post-acquisition), not a deliberate choice |
| Cloud-agnostic | Workloads portable across any cloud vendor | True portability can cost on the order of 10× a single-cloud approach. A migration plan is almost always sufficient instead |
Engineering Platforms
The engineering platform sits between raw infrastructure and user applications. It assembles infrastructure primitives into services that developers actually consume.
Platform Service Categories
| Category | What it covers | Examples |
|---|---|---|
| Application runtime | Hosting compute, data, and networking for apps | Container clusters, managed databases, event queues |
| Application operations | Reliability and compliance services | Monitoring, observability, disaster recovery, secrets |
Organizations deliver platform service functionality in three ways:
- Packaged software - self-hosted third-party tools. Prometheus, HashiCorp Vault, Kong. IaC provisions the hosting and wires it into the network.
- Cloud-managed services - higher-level vendor offerings. Azure Monitor, AWS Secrets Manager, GKE. IaC defines and configures these directly.
- Externally hosted SaaS - vendor-managed capabilities. Datadog, Okta. IaC configures and integrates them via their APIs.
Platform Delivery Services (Control Planes)
Control planes are the “meta-capabilities” used to build and manage the system. They generally map to three areas:
| Control plane | Purpose | Tools |
|---|---|---|
| Application delivery | Build, test, deploy application code | Git, CI/CD pipelines, artifact registries |
| Infrastructure delivery | Manage the IaC lifecycle | Terraform, Pulumi, AWS CDK, Chef, Spacelift, env0 |
| Platform management | Orchestrate and self-service platform services | Red Hat OpenShift, Kratix, Humanitec, Backstage |
From Manual to Code: The IaC Shift
Traditional Methods and Their Limits
| Method | How it works | Key failure mode |
|---|---|---|
| ClickOps | Point-and-click through a cloud console | No audit trail, inconsistent results |
| CLIOps | Manual CLI commands | Same as ClickOps, slightly faster |
| Task scripts | Bash/Python scripts for individual operations | Requires human judgment on when to run; becomes a tangle of conditional logic |
The core problem with procedural scripts: run a “create server” script three times and you get three servers. Scripts accumulate messy conditional logic (if server exists, skip; else create) that degrades over time.
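A minimal sketch of that failure mode follows. The `servers` dictionary is a hypothetical in-memory stand-in for a cloud account, and `create_server` is an illustrative function, not a real SDK call.

```python
import itertools

# Hypothetical in-memory stand-in for a cloud account; not a real API.
servers: dict[str, dict] = {}
_ids = itertools.count(1)


def create_server(name: str) -> str:
    """Procedural step: always creates a new server, whatever already exists."""
    server_id = f"srv-{next(_ids)}"
    servers[server_id] = {"name": name}
    return server_id


# Running the same "create server" script three times...
for _ in range(3):
    create_server("web")

print(len(servers))  # 3: three separate servers, all named "web"
```

Nothing in the script knows what already exists, so the only safeguard is a person remembering not to run it again.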
Infrastructure as Code: The Shift
IaC pulls decision-making out of human heads and into code ahead of time:
- Embedded knowledge - definitions, specifications, and tests encode the desired state; no human decides what to run at deploy time.
- Idempotency - the code produces the same result on every run. Server missing? Create it. Already exists with the right config? Leave it.
- Hands-off automation - provisioning, configuration, and deployment become machine-driven.
What this unlocks: automated scaling, automated recovery, self-service platform environments, compliance-by-audit-trail, and team-wide knowledge of how systems are built.
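The idempotency property described above can be sketched as an "ensure" operation that converges on a desired state instead of blindly creating. The `servers` dictionary and `ensure_server` function are hypothetical stand-ins for illustration only.

```python
# Hypothetical in-memory stand-in for a cloud account, keyed by server name.
servers: dict[str, dict] = {}


def ensure_server(name: str, size: str) -> str:
    """Converge on the desired state and report what was done."""
    current = servers.get(name)
    if current is None:
        servers[name] = {"size": size}
        return "created"
    if current["size"] != size:
        current["size"] = size
        return "updated"
    return "unchanged"


results = [ensure_server("web", "small") for _ in range(3)]
print(results)       # ['created', 'unchanged', 'unchanged']
print(len(servers))  # 1
```

The decision about whether to create, update, or leave the server alone lives in the code, so running it repeatedly is safe.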
What Can Be Defined as Code
| Category | Examples |
|---|---|
| IaaS resources | VMs, VPCs, GKE clusters, Cloud SQL - the core target of tools like Terraform |
| Server configuration | OS setup, packages, runtime config - target of Chef, Puppet, Ansible |
| Hardware provisioning | Bare-metal servers, SDN-configured network hardware (firewalls, routers) |
| Application deployments | Kubernetes manifests, Docker Compose, container definitions |
| Delivery pipelines | CI/CD pipeline definitions (GitHub Actions workflows, Jenkinsfiles) |
| Platform service config | Monitoring rules, identity policies, log aggregation config |
| Tests | Automated infrastructure tests, monitoring checks, validation scripts |
Code vs. Configuration
| Concept | What it captures |
|---|---|
| Code | Aspects that are consistent across all deployed instances |
| Configuration | Aspects that are specific to one instance (e.g., memory allocation per environment) |
In practice, these categories blur. An Ansible playbook that generates a JBoss config file is “code,” but the file it produces is “configuration.” The key point: both must be version-controlled and treated as first-class engineering artifacts, regardless of whether they’re written in Python or YAML.
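One way to picture the split is a shared definition parameterized by per-environment values. In the sketch below, the function name, field names, and memory figures are all made up for illustration.

```python
# "Code": the shape every deployed instance shares. All names are illustrative.
def server_definition(environment: str, config: dict) -> dict:
    return {
        "name": f"app-{environment}",
        "image": "app-server",             # consistent across all instances
        "memory_mb": config["memory_mb"],  # specific to one instance
    }


# "Configuration": values specific to one instance (here, one environment).
ENVIRONMENTS = {
    "staging": {"memory_mb": 2048},
    "production": {"memory_mb": 8192},
}

definitions = {
    env: server_definition(env, cfg) for env, cfg in ENVIRONMENTS.items()
}
print(definitions["production"])
```

Both the function and the `ENVIRONMENTS` mapping would live in version control; only the second changes when an environment needs different sizing.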
IaC Tool Landscape
Historical Evolution
| Era | Tools | Approach |
|---|---|---|
| 1990s | CFEngine | First declarative, idempotent DSL for server config |
| 2000s to early 2010s | Puppet (2005), Chef (2009), Salt (2011), Ansible (2012) | Config management for VMs; rode the virtualization wave |
| Early 2010s | boto, fog, raw SDK scripts | Procedural IaaS API wrappers - a temporary regression |
| 2010s onward | CloudFormation (2011), Terraform (2014), GCP Deployment Manager, Azure Bicep (2020) | Declarative stack-oriented tools; IaaS-native |
| Late 2010s+ | Pulumi, AWS CDK | GPL-based tools - write infrastructure in TypeScript, Python, Go, C# |
| Emerging | IaSQL, StackQL, Crossplane, Ampt, Winglang | SQL-based, Kubernetes-native Infrastructure as Data (IaD), Infrastructure from Code |
Terraform’s Place in the Ecosystem
Terraform (and its open-source fork OpenTofu) sits in the declarative, stack-oriented, external DSL column - the most widely adopted category for cross-cloud IaaS provisioning. See Terraform Overview for a full treatment.
IaC Language Characteristics
IaC language choices come down to four axes:
| Axis | Option A | Option B |
|---|---|---|
| Execution style | Procedural - step-by-step, requires human to decide what to run | Idempotent - same output every run, decision-making embedded in code |
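| Specification style | Imperative - code spells out how to reach a state, using loops, conditionals, and function calls (AWS CDK, Pulumi; both still compile that code into a declarative desired-state model) | Declarative - describes what the state should be (Terraform, CloudFormation) |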
| Language scope | DSL - tightly scoped to infrastructure concepts; easier to read (HCL, Puppet DSL) | GPL - general-purpose language with full ecosystem (TypeScript, Python) |
| Abstraction level | Low-level - thin wrappers over IaaS APIs; aws_instance → run_instances | High-level - modules and libraries that hide wiring complexity |
Real tools take a position on each axis: Terraform is declarative and idempotent, uses an external DSL, and is low-level by default but supports high-level abstractions through modules.
Infrastructure Code Processing
Infrastructure code behaves fundamentally differently from application code - understanding this prevents debugging confusion and dangerous refactoring mistakes.
When Code Executes
| Code type | When it runs |
|---|---|
| Application code | After deployment, in a runtime - processing events and requests |
| Infrastructure code | During deployment - executing the code is the act of changing infrastructure |
The Three Substeps of Deployment
Every terraform apply (or equivalent) runs three substeps under the hood:
- Assemble - collate all code files, dependencies, provider plugins into a build.
- Compile - run the assembled code to generate a desired state model in memory: exactly what the infrastructure should look like when done.
- Execute - use the IaaS API to compare current real-world state against the desired state model, then make the necessary changes.
The terraform plan command runs Assemble and Compile, then only the read-only half of Execute: it compares the current real-world state against the desired state model and prints a diff of what would change, without touching anything.
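The substeps can be sketched with plain dictionaries. This is a toy model under stated assumptions, not how Terraform is implemented: the function names, the file names, and the `live` dictionary standing in for the IaaS platform are all hypothetical.

```python
def assemble(files: dict[str, list[dict]]) -> list[dict]:
    """Assemble: collate resource declarations from every source file."""
    return [decl for name in sorted(files) for decl in files[name]]


def compile_model(declarations: list[dict]) -> dict[str, dict]:
    """Compile: build the in-memory desired-state model, keyed by address."""
    return {d["address"]: d["attrs"] for d in declarations}


def diff(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Read-only comparison of the desired model against actual state."""
    shared = desired.keys() & actual.keys()
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(k for k in shared if desired[k] != actual[k]),
        "delete": sorted(actual.keys() - desired.keys()),
    }


def execute(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Execute: apply the diff so that actual state matches the model."""
    changes = diff(desired, actual)
    for address in changes["create"] + changes["update"]:
        actual[address] = dict(desired[address])
    for address in changes["delete"]:
        del actual[address]
    return changes


files = {
    "network.tf": [{"address": "subnet.main", "attrs": {"cidr": "10.0.0.0/24"}}],
    "compute.tf": [{"address": "vm.web", "attrs": {"size": "small"}}],
}
live = {"vm.web": {"size": "large"}, "vm.old": {"size": "small"}}

desired = compile_model(assemble(files))
plan = diff(desired, live)  # plan-like: report changes, apply nothing
print(plan)
execute(desired, live)      # apply-like: live now matches desired
```

Calling `diff` alone corresponds to a plan-style preview, while `execute` is the only function that mutates `live`.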
Testing, Debugging, and Refactoring Gotchas
| Practice | How infrastructure code differs from app code |
|---|---|
| Debugger / profiler | Traces the Compile substep (desired state generation), not the actual API calls |
| Unit tests | Can validate the desired-state data structures generated; cannot prove runtime infrastructure behavior |
| Refactoring | Refactored infra code can trigger destructive changes on apply - deleting storage with live data. Plan before every apply. |
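The refactoring risk can be shown with a toy planner that matches resources purely by their address in code. The `plan` function and the resource addresses are hypothetical; real tools offer ways to record a rename (Terraform has moved blocks, for instance), which this sketch deliberately does not model.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Toy planner: resources are matched only by their address in code."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "delete": sorted(actual.keys() - desired.keys()),
    }


# Live infrastructure: a bucket that holds real data.
live = {"bucket.logs": {"versioning": True}}

# Refactor: the same bucket, renamed in code from "logs" to "audit_logs".
refactored = {"bucket.audit_logs": {"versioning": True}}

changes = plan(refactored, live)
print(changes)  # {'create': ['bucket.audit_logs'], 'delete': ['bucket.logs']}
```

A check on `changes["delete"]` is the kind of assertion a unit test can make against the desired-state model, even though it says nothing about how the real bucket behaves at runtime.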
State Management
IaC tools must maintain a mapping between the resources declared in code and the actual instances running on the IaaS platform.
Platform-Native vs. Third-Party
| Tool type | How state is managed |
|---|---|
| Native IaaS tools (CloudFormation, Azure Resource Manager) | State tracked internally by the vendor’s own API. Pass a stack ID; the platform finds its resources. |
| Third-party tools (Terraform, OpenTofu, Pulumi) | Must maintain their own state files mapping code definitions to real-world resource IDs. |
State Files
Third-party tools store state in a state file - one per workspace. Storage options:
- Local - stored on the developer’s machine. Fine for solo development; breaks for teams.
- Remote backend - GCS bucket, AWS S3, Azure Blob, or a TACOS (Terraform Automation and Collaboration Software) platform. Required for team collaboration.
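At its simplest, a state file is a persisted mapping from addresses in code to resource IDs on the platform. The sketch below uses a local JSON file with a made-up layout and a made-up resource ID; real state formats carry far more detail, and this is not Terraform's format.

```python
import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict[str, str]:
    """Return the address -> resource ID mapping, or an empty one."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_state(path: Path, state: dict[str, str]) -> None:
    """Persist the mapping so the next run can find existing resources."""
    path.write_text(json.dumps(state, indent=2, sort_keys=True))


state_path = Path(tempfile.mkdtemp()) / "example.state.json"

state = load_state(state_path)           # first run: nothing tracked yet
state["vm.web"] = "i-0123456789abcdef0"  # made-up ID returned by the platform
save_state(state_path, state)

reloaded = load_state(state_path)
print(reloaded["vm.web"])
```

Because this file lives on one machine, a second engineer would start from an empty mapping, which is the collaboration problem a remote backend addresses.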
Emerging: Post-Code Infrastructure Automation
Current file-based IaC models have limits - feedback loops are slow, and code-based interfaces are not always the right tool. Post-code automation experiments address this by representing infrastructure state as a live dynamic graph rather than static files.
Tools like System Initiative and ConfigHub represent infrastructure in a real-time data model. Engineers interact with it via code, APIs, or a GUI - and critically, the GUI is not legacy ClickOps:
- Changes are prepared as change sets, not instant mutations
- Automated tests and approvals run before anything applies
- Multiple engineers can collaborate without overwriting each other’s work
- The model stays continuously synchronized with live infrastructure, shrinking the feedback loop
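The change-set idea in the list above can be sketched as staged edits that only reach the model after checks and approval pass. Everything here is hypothetical: `ChangeSet`, `apply_change_set`, and the `no_public_buckets` check are illustrative and do not reflect the API of System Initiative, ConfigHub, or any other product.

```python
from dataclasses import dataclass, field
from typing import Callable

Model = dict[str, dict]


@dataclass
class ChangeSet:
    """Hypothetical change set: edits are staged, not applied immediately."""
    changes: Model = field(default_factory=dict)
    approved: bool = False

    def stage(self, address: str, attrs: dict) -> None:
        self.changes[address] = attrs


def apply_change_set(
    model: Model, change_set: ChangeSet, checks: list[Callable[[Model], bool]]
) -> bool:
    """Apply only if the change set is approved and every check passes."""
    proposed = {**model, **change_set.changes}
    if not change_set.approved or not all(check(proposed) for check in checks):
        return False
    model.update(change_set.changes)
    return True


def no_public_buckets(model: Model) -> bool:
    """Example automated check run against the proposed model."""
    return not any(attrs.get("public") for attrs in model.values())


model: Model = {"bucket.logs": {"public": False}}

risky = ChangeSet()
risky.stage("bucket.assets", {"public": True})
risky.approved = True
applied = apply_change_set(model, risky, [no_public_buckets])
print(applied)  # False: the check rejects the proposed model
```

The check runs against the proposed model, so a rejected change set leaves the existing model untouched.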