IaC Tools & Platforms

Every IaC project sits inside a stack of platforms and tooling. Understanding the layers - infrastructure platform, engineering platform, delivery toolchain - makes it clear what Terraform (or any IaC tool) is actually managing and why language and tool choices matter.


A typical IT system divides into three distinct layers:

| Layer | What it contains | Examples |
| --- | --- | --- |
| User Services | Applications that deliver business value directly to customers | Web apps, APIs, microservices |
| Engineering Platform | Hosting and operational services that support applications | Container clusters, monitoring, databases |
| Infrastructure Platform | Primitive compute, storage, and networking resources | VMs, VPCs, object storage, bare metal |

Infrastructure as Code primarily targets the infrastructure platform layer - provisioning and wiring together the primitives that the engineering platform assembles into services.


The infrastructure platform must expose a programmable API for IaC tools to work against - this is the definition of Infrastructure as a Service (IaaS). IaC tools like Terraform talk to these APIs to create, modify, and destroy resources.

[Figure: IaC Provisioning Loop]

| Category | Examples |
| --- | --- |
| Public hyperscalers | AWS, Google Cloud, Microsoft Azure, Alibaba Cloud, DigitalOcean |
| Private IaaS software | OpenStack, Apache CloudStack, VMware vCloud |
| Bare-metal provisioning | RackN Digital Rebar, Cobbler, Foreman, Crowbar |
| Vendor-managed on-premises | AWS Outposts, Azure Local, Google Cloud Anthos |

Infrastructure Resources: Primitive vs. Composite

IaaS platforms expose resources in two forms:

  • Primitive resources - fundamental building blocks: a single subnet, a virtual disk volume, a firewall rule.
  • Composite resources - combinations of primitives packaged as a single API call. Requesting a GKE cluster provisions a pool of VMs, networking, and stateful storage in one operation.
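The primitive/composite distinction can be sketched as a small data model. This is an illustration only: the class names, the `request_cluster` helper, and the particular parts it expands into are assumptions, not any platform's real API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Primitive:
    """One fundamental building block, e.g. a subnet or a disk volume."""
    kind: str
    name: str


@dataclass
class Composite:
    """A single request that the platform expands into several primitives."""
    name: str
    parts: list[Primitive] = field(default_factory=list)


def request_cluster(name: str, node_count: int) -> Composite:
    # Hypothetical expansion: a real platform decides the exact parts itself.
    parts = [Primitive("network", f"{name}-net")]
    parts += [Primitive("vm", f"{name}-node-{i}") for i in range(node_count)]
    parts.append(Primitive("disk", f"{name}-data"))
    return Composite(name, parts)
```

One call to `request_cluster("demo", 3)` yields a composite holding a network, three VMs, and a disk, which is the "one operation, many primitives" idea in miniature.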

Resources fall into three functional categories:

Compute - executes code:

  • VM instances, bare-metal servers (BMaaS)
  • Server clusters (AWS Auto Scaling groups, Google Managed Instance Groups)
  • Container clusters as a service - Amazon EKS, Google GKE (CaaS / CCaaS)
  • Serverless runtimes - AWS Lambda, Google Cloud Run (FaaS)

Storage - persists data:

  • Block storage - virtual drives attached to instances (Amazon EBS, Azure Managed Disks)
  • Object storage - multi-location file access via API (Amazon S3, Google Cloud Storage). Typically cheaper and more durable than block storage, at the cost of higher latency.
  • Networked filesystems - shared volumes via NFS, AFS, SMB/CIFS
  • Structured databases - RDBMS, key-value stores, document stores
  • Secrets management - structured storage with access controls and rotation

Networking - connects everything:

  • Network address blocks (VPCs, subnets, VLANs), DNS
  • Traffic routing - gateways, proxies, load balancers
  • Security constructs - VPNs, firewall rules
  • Messaging and caching - queues, service meshes

| Strategy | Definition | Honest assessment |
| --- | --- | --- |
| Hybrid cloud | Private infra + public cloud | Legitimate for legacy systems or strict data residency |
| Polycloud | Different workloads on different public clouds | Often organic (post-acquisition), not a deliberate choice |
| Cloud-agnostic | Workloads portable across any cloud vendor | True portability can cost roughly 10× as much as a single-cloud approach. A migration plan is almost always sufficient instead |

The engineering platform sits between raw infrastructure and user applications. It assembles infrastructure primitives into services that developers actually consume.

| Category | What it covers | Examples |
| --- | --- | --- |
| Application runtime | Hosting compute, data, and networking for apps | Container clusters, managed databases, event queues |
| Application operations | Reliability and compliance services | Monitoring, observability, disaster recovery, secrets |

Organizations deliver platform service functionality in three ways:

  1. Packaged software - self-hosted third-party tools. Prometheus, HashiCorp Vault, Kong. IaC provisions the hosting and wires it into the network.
  2. Cloud-managed services - higher-level vendor offerings. Azure Monitor, AWS Secrets Manager, GKE. IaC defines and configures these directly.
  3. Externally hosted SaaS - vendor-managed capabilities. Datadog, Okta. IaC configures and integrates them via their APIs.

Platform Delivery Services (Control Planes)

Control planes are the “meta-capabilities” used to build and manage the system. They generally map to three areas:

| Control plane | Purpose | Tools |
| --- | --- | --- |
| Application delivery | Build, test, deploy application code | Git, CI/CD pipelines, artifact registries |
| Infrastructure delivery | Manage the IaC lifecycle | Terraform, Pulumi, AWS CDK, Chef, Spacelift, env0 |
| Platform management | Orchestrate and self-service platform services | Red Hat OpenShift, Kratix, Humanitec, Backstage |

| Method | How it works | Key failure mode |
| --- | --- | --- |
| ClickOps | Point-and-click through a cloud console | No audit trail, inconsistent results |
| CLIOps | Manual CLI commands | Same as ClickOps, slightly faster |
| Task scripts | Bash/Python scripts for individual operations | Requires human judgment on when to run; becomes a tangle of conditional logic |

The core problem with procedural scripts: run a “create server” script three times and you get three servers. Scripts accumulate messy conditional logic (if server exists, skip; else create) that degrades over time.

IaC pulls decision-making out of human heads and into code ahead of time:

  • Embedded knowledge - definitions, specifications, and tests encode the desired state; no human decides what to run at deploy time.
  • Idempotency - the code produces the same result on every run. Server missing? Create it. Already exists with the right config? Leave it.
  • Hands-off automation - provisioning, configuration, and deployment become machine-driven.
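The contrast between a procedural "create server" script and an idempotent definition can be sketched against an in-memory stand-in for an IaaS API. `FakeCloud`, `procedural_create`, and `ensure_server` are hypothetical names for illustration, not part of any real SDK.

```python
from itertools import count


class FakeCloud:
    """In-memory stand-in for an IaaS API (hypothetical, not a real SDK)."""

    def __init__(self) -> None:
        self.servers: dict[str, dict] = {}  # server id -> config
        self._ids = count(1)

    def create_server(self, name: str, size: str) -> str:
        server_id = f"srv-{next(self._ids)}"
        self.servers[server_id] = {"name": name, "size": size}
        return server_id


def procedural_create(cloud: FakeCloud, name: str, size: str) -> str:
    """Task-script style: always creates, whatever already exists."""
    return cloud.create_server(name, size)


def ensure_server(cloud: FakeCloud, name: str, size: str) -> str:
    """Idempotent style: converge on exactly one server with the desired config."""
    for server_id, config in cloud.servers.items():
        if config["name"] == name:
            config["size"] = size  # corrects drift; a no-op if already right
            return server_id
    return cloud.create_server(name, size)
```

Calling `procedural_create` three times leaves three servers behind; calling `ensure_server` three times with the same arguments leaves one, and returns the same ID each time.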

What this unlocks: automated scaling, automated recovery, self-service platform environments, compliance-by-audit-trail, and team-wide knowledge of how systems are built.

[Figure: IaC Stack]

| Category | Examples |
| --- | --- |
| IaaS resources | VMs, VPCs, GKE clusters, Cloud SQL - the core target of tools like Terraform |
| Server configuration | OS setup, packages, runtime config - target of Chef, Puppet, Ansible |
| Hardware provisioning | Bare-metal servers, SDN-configured network hardware (firewalls, routers) |
| Application deployments | Kubernetes manifests, Docker Compose, container definitions |
| Delivery pipelines | CI/CD pipeline definitions (GitHub Actions workflows, Jenkinsfiles) |
| Platform service config | Monitoring rules, identity policies, log aggregation config |
| Tests | Automated infrastructure tests, monitoring checks, validation scripts |

| Concept | What it captures |
| --- | --- |
| Code | Aspects that are consistent across all deployed instances |
| Configuration | Aspects that are specific to one instance (e.g., memory allocation per environment) |

In practice, these categories blur. Ansible scripts to generate a JBoss config file are “code,” but the file they produce is “configuration.” The key point: both must be version-controlled and treated as first-class engineering artifacts, regardless of whether they’re written in Python or YAML.
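One way to picture the split is a definition function (code) applied to per-environment values (configuration). The function name, the image name, and the memory figures below are made up for illustration.

```python
def web_server_definition(env: str, config: dict) -> dict:
    """Code: the shape shared by every deployed instance."""
    return {
        "name": f"web-{env}",
        "image": "base-linux",             # hypothetical image name
        "memory_mb": config["memory_mb"],  # the part that varies per instance
        "open_ports": [443],
    }


# Configuration: values specific to one instance of the definition.
ENVIRONMENTS = {
    "staging": {"memory_mb": 2048},
    "production": {"memory_mb": 8192},
}

definitions = {
    env: web_server_definition(env, cfg) for env, cfg in ENVIRONMENTS.items()
}
```

Both halves would live in version control: changing `ENVIRONMENTS` alters a single instance, while changing `web_server_definition` alters every instance at once.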


| Era | Tools | Approach |
| --- | --- | --- |
| 1990s | CFEngine | First declarative, idempotent DSL for server config |
| 2000s to early 2010s | Puppet, Chef, Ansible, Salt | Config management for VMs; rode the virtualization wave |
| Early 2010s | boto, fog, raw SDK scripts | Procedural IaaS API wrappers - a temporary regression |
| Mid 2010s | Terraform, CloudFormation, Azure Resource Manager templates (later Bicep), GCP Deployment Manager | Declarative stack-oriented tools; IaaS-native |
| Late 2010s+ | Pulumi, AWS CDK | Tools based on general-purpose languages (GPLs) - write infrastructure in TypeScript, Python, Go, C# |
| Emerging | IaSQL, StackQL, Crossplane, Ampt, Winglang | SQL-based, Kubernetes-native Infrastructure as Data (IaD), Infrastructure from Code |

Terraform (and its open-source fork OpenTofu) sits in the declarative, stack-oriented, external DSL column - the most widely adopted category for cross-cloud IaaS provisioning. See Terraform Overview for a full treatment.

IaC language choices come down to four axes:

| Axis | Option A | Option B |
| --- | --- | --- |
| Execution style | Procedural - step-by-step, requires human to decide what to run | Idempotent - same output every run, decision-making embedded in code |
| Specification style | Imperative - tells the tool how to achieve a state (AWS CDK, Pulumi) | Declarative - describes what the state should be (Terraform, CloudFormation) |
| Language scope | DSL - tightly scoped to infrastructure concepts; easier to read (HCL, Puppet DSL) | GPL - general-purpose language with full ecosystem (TypeScript, Python) |
| Abstraction level | Low-level - thin wrappers over IaaS APIs; aws_instance maps to run_instances | High-level - modules and libraries that hide wiring complexity |

Most teams combine axes: Terraform uses a declarative, idempotent, external DSL that is low-level by default but supports high-level abstractions through modules.


Infrastructure code behaves fundamentally differently from application code - understanding this prevents debugging confusion and dangerous refactoring mistakes.

[Figure: App vs Infra Execution]

| Code type | When it runs |
| --- | --- |
| Application code | After deployment, in a runtime - processing events and requests |
| Infrastructure code | During deployment - executing the code is the act of changing infrastructure |

Every terraform apply (or equivalent) runs three substeps under the hood:

  1. Assemble - collate all code files, dependencies, provider plugins into a build.
  2. Compile - run the assembled code to generate a desired state model in memory: exactly what the infrastructure should look like when done.
  3. Execute - use the IaaS API to compare current real-world state against the desired state model, then make the necessary changes.

[Figure: IaC Compile and Apply]

The terraform plan command runs steps 1 and 2, then only the read-only half of step 3: it compares the desired state model against current state and prints a diff of what would change, without modifying anything.
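A toy model of the three substeps, with plain dictionaries standing in for both the desired state model and the live infrastructure. The function names are hypothetical, and this is not how Terraform is implemented internally; it only shows why a plan can be read-only while an apply is not.

```python
def compile_desired(sources: list[dict]) -> dict:
    """Assemble + compile: merge source fragments into one desired-state model."""
    desired: dict = {}
    for fragment in sources:
        desired.update(fragment)
    return desired


def plan(desired: dict, current: dict) -> dict:
    """Diff desired against current state. Read-only: nothing is changed."""
    shared = desired.keys() & current.keys()
    return {
        "create": sorted(desired.keys() - current.keys()),
        "update": sorted(k for k in shared if desired[k] != current[k]),
        "delete": sorted(current.keys() - desired.keys()),
    }


def apply(desired: dict, current: dict) -> dict:
    """Execute: mutate `current` (standing in for live infrastructure) to match."""
    changes = plan(desired, current)
    for key in changes["delete"]:
        del current[key]
    for key in changes["create"] + changes["update"]:
        current[key] = desired[key]
    return changes
```

Running `plan` again after `apply` returns three empty lists, which is the idempotency property seen from the tool's side.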

Testing, Debugging, and Refactoring Gotchas

| Practice | How infrastructure code differs from app code |
| --- | --- |
| Debugger / profiler | Traces the Compile substep (desired state generation), not the actual API calls |
| Unit tests | Can validate the desired-state data structures generated; cannot prove runtime infrastructure behavior |
| Refactoring | Refactored infra code can trigger destructive changes on apply - deleting storage with live data. Plan before every apply. |
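The refactoring hazard is easy to reproduce in a toy diff keyed on resource addresses. The addresses and the `plan_by_address` helper are hypothetical; the point is that a tool matching resources by their name in code reads a pure rename as "destroy one, create another".

```python
def plan_by_address(desired: dict, state: dict) -> dict:
    """Diff keyed on resource address, i.e. the name each resource has in code."""
    return {
        "create": sorted(desired.keys() - state.keys()),
        "destroy": sorted(state.keys() - desired.keys()),
    }


# Existing volume that holds live data, as recorded by the tool.
state = {"disk.data": {"size_gb": 100}}

# The same volume after a "harmless" rename in the code.
refactored = {"disk.customer_data": {"size_gb": 100}}

changes = plan_by_address(refactored, state)
```

Here `changes` lists `disk.data` under `destroy` even though nothing about the volume itself changed. A unit test can catch this at the desired-state level, but only a plan against real state shows what would actually be deleted. Terraform, for example, provides `moved` blocks and `terraform state mv` so that a rename can be recorded without replacing the resource.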

IaC tools must maintain a mapping between the resources declared in code and the actual instances running on the IaaS platform.

| Tool type | How state is managed |
| --- | --- |
| Native IaaS tools (CloudFormation, Azure Resource Manager) | State tracked internally by the vendor’s own API. Pass a stack ID; the platform finds its resources. |
| Third-party tools (Terraform, OpenTofu, Pulumi) | Must maintain their own state files mapping code definitions to real-world resource IDs. |

Third-party tools store state in a state file - one per workspace. Storage options:

  • Local - stored on the developer’s machine. Fine for solo development; breaks for teams.
  • Remote backend - GCS bucket, AWS S3, Azure Blob, or a TACOS (Terraform Automation and Collaboration Software) platform. Required for team collaboration.
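At its simplest, a state file is a persisted mapping from addresses in code to the IDs the platform assigned. The sketch below uses a made-up JSON layout and hypothetical helper names; real state formats carry far more (attributes, dependencies, versions, locking metadata).

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the address -> resource ID mapping, or an empty one on first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {"resources": {}}


def record_resource(state: dict, address: str, resource_id: str) -> None:
    """Remember which real-world resource a code definition corresponds to."""
    state["resources"][address] = resource_id


def save_state(path: Path, state: dict) -> None:
    """Persist the mapping so the next run can find existing resources."""
    path.write_text(json.dumps(state, indent=2, sort_keys=True))
```

With a local path, only one machine ever sees this file, which is why teams move it to a shared backend: every engineer and pipeline has to read and write the same mapping.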

Emerging: Post-Code Infrastructure Automation

Current file-based IaC models have limits - feedback loops are slow, and code-based interfaces are not always the right tool. Post-code automation experiments address this by representing infrastructure state as a live dynamic graph rather than static files.

Tools like System Initiative and ConfigHub represent infrastructure in a real-time data model. Engineers interact with it via code, APIs, or a GUI - and critically, the GUI is not legacy ClickOps:

  • Changes are prepared as change sets, not instant mutations
  • Automated tests and approvals run before anything applies
  • Multiple engineers can collaborate without overwriting each other’s work
  • The model stays continuously synchronized with live infrastructure, shrinking the feedback loop
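The change-set idea can be sketched generically: edits accumulate in a proposal, and nothing reaches the model until the proposal is approved and its checks pass. This is an illustration under assumed names (`ChangeSet`, `apply_change_set`), not the API of System Initiative, ConfigHub, or any other product.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ChangeSet:
    """Proposed edits, held apart from the live model until applied."""
    changes: dict = field(default_factory=dict)
    approved: bool = False

    def propose(self, key: str, value: object) -> None:
        self.changes[key] = value


def apply_change_set(
    model: dict,
    change_set: ChangeSet,
    checks: list[Callable[[dict], bool]],
) -> bool:
    """Apply only if approved and every check passes on the would-be result."""
    candidate = {**model, **change_set.changes}
    if not change_set.approved:
        return False
    if not all(check(candidate) for check in checks):
        return False
    model.update(change_set.changes)
    return True
```

Proposing a change leaves `model` untouched, and an unapproved or failing change set is rejected as a whole, which is what separates this workflow from instant console mutations.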