IaC & CI/CD

Infrastructure as Code reaches its full potential only when changes flow through an automated pipeline - built, tested, and delivered with the same rigour as application software. This page covers the principles, workflows, and tooling that make that possible.

Continuous Delivery Principles for IaC

CD for infrastructure builds on a set of foundational rules:

Principle	What it means in practice
Automate the full process	Pipelines orchestrate every step without human intervention; tests, monitoring, and developer setups are all codified
Use only the automated process	No manual fixes in staging or production - every fix goes through the pipeline from the start
Keep environments consistent	Use reusable stacks and effective workflows to prevent configuration drift between dev, test, and production
Deliver changes comprehensively	Apply code to all relevant infrastructure within a short timeframe; measure time to reach the last system, not just the first
Keep delivery cycles short	If automation is slow, developers bypass it; optimise continuously so engineers prefer the pipeline over manual changes
Keep all code production-ready	Validate incrementally rather than batching testing to the end of a sprint
Ensure code and deployed resources match	Use control loops (GitOps, Puppet, Chef) to continuously reconcile the codebase with live infrastructure
Minimise disruption	Track downtime metrics; deliver small, frequent, incremental changes instead of large, infrequent batches
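The reconciliation behind "Ensure code and deployed resources match" comes down to one control-loop pass. The pass compares the desired state defined in code with the actual state, then acts on the differences. The Python below is a minimal sketch under assumed conventions: resources are dictionaries keyed by a hypothetical address such as `vm.web`. Real controllers such as GitOps tools, Puppet, or Chef read live state and apply changes through their own APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical model: resource address -> attributes of that resource.
State = Dict[str, Dict[str, object]]


def plan_reconciliation(desired: State, actual: State) -> List[Tuple[str, str]]:
    """List the (action, address) pairs needed to make actual state match the code."""
    actions: List[Tuple[str, str]] = []
    for address in sorted(desired):
        if address not in actual:
            actions.append(("create", address))
        elif actual[address] != desired[address]:
            actions.append(("update", address))  # drift, or a new change in code
    for address in sorted(actual):
        if address not in desired:
            actions.append(("delete", address))  # resource no longer defined in code
    return actions


def reconcile_once(desired: State, actual: State) -> State:
    """One control-loop pass: apply the planned actions to a copy of the actual state."""
    result = {address: dict(attrs) for address, attrs in actual.items()}
    for action, address in plan_reconciliation(desired, actual):
        if action == "delete":
            del result[address]
        else:  # create and update both converge on the desired attributes
            result[address] = dict(desired[address])
    return result


if __name__ == "__main__":
    desired = {"vm.web": {"size": "small"}, "db.main": {"backups": True}}
    actual = {"vm.web": {"size": "large"}, "vm.orphan": {"size": "small"}}
    print(plan_reconciliation(desired, actual))
    # [('create', 'db.main'), ('update', 'vm.web'), ('delete', 'vm.orphan')]
```

Running the pass repeatedly is safe. Once the actual state has converged, `plan_reconciliation` returns an empty list.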

Core Infrastructure Delivery Workflow

The workflow processes small, incremental changes through a repeating cycle:

Workflow Stages

Stage	What happens
Development	Developer edits code in a personal workspace (local emulator or cloud sandbox), runs initial tests, then pushes
Build	Automated system downloads dependencies and packages code into a versioned, deployable artifact (e.g., a container image or a tagged commit)
Test	Build is deployed to a series of test environments - automated checks + manual validations (exploratory testing, code reviews, UAT)
Release	After passing all tests, the build is deployed to production (called “release” because the code has already been deployed multiple times to test environments)
Run	Infrastructure actively hosts workloads; the versioned build can be reapplied to recover from failures or correct drift

Operational Principles

Small, frequent pushes - isolate issues immediately; if an increment breaks the build, the developer can pinpoint the exact change
Selective production deployment - not every commit reaches production; teams may batch several passing increments and release once or twice a day
Fix forward - if a change fails, the fix must be made in source code and pushed through the pipeline from the beginning, never patched directly in a downstream environment

Building and Distributing IaC

The Three Steps of Code Processing

Step	Purpose	Output
Assemble	Gather code and build-time dependencies (modules, plugins, providers)	A build - not specific to any target instance
Compile	Add deployment-specific configuration (variables, state config, auth tokens)	A desired-state model for a specific instance
Execute	Compare desired state against actual resources via the IaaS API and apply changes	Modified infrastructure
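A small Python model can illustrate the boundary between the three steps. This is a sketch rather than any tool's real interface. `Build`, `assemble`, and `compile_for` are hypothetical names, and Execute is left to the IaC tool itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Build:
    """Assemble output: code plus pinned dependencies, not specific to any instance."""
    version: str
    source_files: Tuple[str, ...]
    dependencies: Tuple[Tuple[str, str], ...]  # (name, pinned version) pairs


def assemble(version: str, source_files: Iterable[str], dependencies: Dict[str, str]) -> Build:
    """Gather code and build-time dependencies once; sorting keeps the result deterministic."""
    return Build(
        version=version,
        source_files=tuple(sorted(source_files)),
        dependencies=tuple(sorted(dependencies.items())),
    )


def compile_for(build: Build, instance: str, config: Dict[str, object]) -> Dict[str, object]:
    """Compile output: a desired-state model for one specific instance."""
    return {
        "build_version": build.version,
        "instance": instance,
        "dependencies": dict(build.dependencies),
        "settings": dict(config),  # deployment-specific values only; the code is untouched
    }


# The same build is compiled for two instances. Execute would then apply each model
# through the IaC tool, which compares it with the resources reported by the IaaS API.
build = assemble("1.4.0", ["network.tf", "main.tf"], {"vpc-module": "2.1.0"})
staging = compile_for(build, "staging", {"instance_count": 1})
production = compile_for(build, "production", {"instance_count": 3})
```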

Build-on-Deploy vs. Build-Once

Build-on-Deploy:

Many tools resolve dependencies during every deployment. If a dependency version changes between deploying to staging and production, the environments silently diverge - creating bugs that are extremely difficult to trace.

Build Once, Deploy Many:

Assemble exactly once. Deploy the same build to every environment, running only the Compile and Execute steps. Lock dependencies by bundling them or using a lock file.
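One way to enforce "assemble exactly once" is to identify each build by a digest of its contents. A build is then refused for an environment unless every earlier environment is already running that same digest. The Python sketch below assumes a fixed `test` → `staging` → `production` order and an in-memory record (`DeploymentRecord` is hypothetical). A real pipeline would keep this information in its artifact repository or CD platform.

```python
import hashlib
from typing import Dict

ENVIRONMENTS = ["test", "staging", "production"]  # assumed promotion order


def build_digest(files: Dict[str, bytes]) -> str:
    """Identify an assembled build by the names and contents of its files."""
    manifest = "".join(
        f"{name}:{hashlib.sha256(files[name]).hexdigest()}\n" for name in sorted(files)
    )
    return hashlib.sha256(manifest.encode("utf-8")).hexdigest()


class DeploymentRecord:
    """Hypothetical record of which build digest each environment currently runs."""

    def __init__(self) -> None:
        self.deployed: Dict[str, str] = {}

    def deploy(self, environment: str, digest: str) -> None:
        position = ENVIRONMENTS.index(environment)
        for earlier in ENVIRONMENTS[:position]:
            if self.deployed.get(earlier) != digest:
                raise ValueError(f"build {digest[:12]} is not running in {earlier}")
        # Only Compile and Execute would run here; the build itself is reused unchanged.
        self.deployed[environment] = digest
```

Suppose the dependencies or their lock file are among the hashed files. A rebuild that resolves a different dependency version then yields a new digest, so it must pass through test and staging again before it can reach production.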

Pre-Build Development Workflows

Workflow	How it works	Feedback speed
Pull Requests	Feature branches → human review → merge to main	Slower - waits for reviewers
Trunk-Based Development	Frequent, small commits directly to main; fast automated tests catch errors	Faster - release candidate published immediately on green tests

Distribution Methods

Method	Mechanism	Best for
Code branches	Pull from SCM by commit ID, tag, or environment branch	Simple setups; environment branches work well with controllers that sync branch → environment
Stack packages	Bundle code into a ZIP, TGZ, or RPM package or a container image; store in Artifactory, Nexus, S3, or GitHub Releases	Teams needing strict artifact immutability
Libraries (wrapper stacks)	Publish core logic as a versioned Terraform module; each environment has a thin stack that consumes it	Leverages existing module registries for versioning and distribution

Immutability rule: every build must be treated as immutable. Never edit stack code to customise it for an environment - use configuration parameters instead.

Integration Workflows

When infrastructure is split into independently deployable stacks, teams need a strategy to integrate and test them:

Pattern	How it works	Best for
Fan-in	Build and test each component separately, then deploy and test all related stacks together before production	Components owned by a single team
Federation	Each component is delivered and released independently; dependencies treated like APIs with contract testing	Components owned by different teams
Monorepo	All components and shared code in one repository; build tools (Bazel, Buck) limit builds to changed paths	Large codebases needing guaranteed consistency of shared code
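The monorepo idea of building only what changed can be sketched without Bazel or Buck. The Python example below assumes a hypothetical `PROJECTS` map from each project directory to the directories it depends on. It returns every project whose own files or transitive dependencies were touched. Real build tools derive this graph from their build files.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Set

# Hypothetical monorepo layout: project directory -> directories it depends on.
PROJECTS: Dict[str, Set[str]] = {
    "stacks/network": {"modules/vpc"},
    "stacks/database": {"modules/postgres", "stacks/network"},
    "stacks/monitoring": set(),
}
KNOWN_DIRS = set(PROJECTS).union(*PROJECTS.values())


def projects_to_build(changed_files: Iterable[str]) -> Set[str]:
    """Return the projects whose own files, or whose dependencies, have changed."""
    changed_dirs = {
        directory
        for path in changed_files
        for directory in KNOWN_DIRS
        if PurePosixPath(directory) in PurePosixPath(path).parents
    }
    affected: Set[str] = set()
    while True:  # repeat so that dependents of affected projects are picked up too
        newly = {
            project
            for project, deps in PROJECTS.items()
            if project not in affected
            and (project in changed_dirs or deps & (changed_dirs | affected))
        }
        if not newly:
            return affected
        affected |= newly
```

The function compares path components rather than string prefixes. As a result, a change under `modules/vpc-extra` does not trigger builds that depend on `modules/vpc`.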

CI Automation

Source Control and Branching

SCM platforms (GitHub, GitLab) provide the foundation: code storage, action runners for automated testing, issue trackers, and security scanners. Branches let developers build and test features in isolation; PRs provide a gate where automated tests run and team members review changes.

What Humans Should Review vs. What Should Be Automated

Automated (CI system)	Human (code review)
Formatting, syntax, linting	Security and operational risks (wrong subnet, missing backups)
Test execution	Adherence to team best practices and consistency
Dependency checks	Quality of comments and documentation

Automating Chores

Consolidate repetitive maintenance into a single command (e.g., make chores):

Generating documentation (terraform-docs):

terraform-docs markdown table --output-file README.md --output-mode inject .

Place <!-- BEGIN_TF_DOCS --> / <!-- END_TF_DOCS --> markers in the README; inject mode replaces only the content between them. Use --output-check in CI to verify docs are current.

Standardising formatting (terraform fmt):

terraform fmt -recursive

Resolves formatting discrepancies project-wide. Rudimentary but effective for keeping code uniform.

Auto-fixing lint errors (tflint --fix):

tflint --fix
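The three chores above can sit behind a single entry point. The text suggests a `make chores` target; the Python script below is a hypothetical equivalent that uses the same commands. It adds a CI mode that checks instead of rewriting files: `--output-check` for terraform-docs, `-check` for `terraform fmt`, and plain `tflint` without `--fix`. The pipeline therefore fails when the chores were not run locally.

```python
import subprocess
import sys
from typing import List


def chore_commands(ci: bool) -> List[List[str]]:
    """Return the chore commands; in CI mode they check rather than modify files."""
    docs = ["terraform-docs", "markdown", "table",
            "--output-file", "README.md", "--output-mode", "inject"]
    fmt = ["terraform", "fmt", "-recursive"]
    lint = ["tflint"]
    if ci:
        docs.append("--output-check")  # fail if README.md is out of date
        fmt.append("-check")           # report unformatted files instead of fixing them
    else:
        lint.append("--fix")           # apply available lint auto-fixes locally
    docs.append(".")                   # module directory to document
    return [docs, fmt, lint]


def run_chores(ci: bool) -> int:
    """Run each chore in order, stopping at the first non-zero exit code."""
    for command in chore_commands(ci):
        print("$", " ".join(command))
        result = subprocess.run(command)
        if result.returncode != 0:
            return result.returncode
    return 0


if __name__ == "__main__":
    # Hypothetical usage: `python chores.py` locally, `python chores.py --ci` in the pipeline.
    sys.exit(run_chores(ci="--ci" in sys.argv[1:]))
```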

Organising Codebases

Key Definitions

Term	Meaning
Build project	Code used to build a discrete component (library, stack, application)
Codebase	One or more interrelated build projects
Repository	One or more build projects in a source control system; branches/tags/commits apply to all files

Repository Strategies

Strategy	Strengths	Weaknesses
Monorepo	Simplifies integration; code is versioned and branched together	Project boundaries blur; tangled cross-folder imports
Microrepo	Clean separation; change triggers only its own pipeline	Impractical for build-time integration across repos
Hybrid	Group tightly integrated projects; separate loosely coupled ones	Requires deliberate design decisions

Forces that shape the choice: team ownership and access controls, reducing friction from conflicting changelogs, and enforcing architectural boundaries.

Organise by Domain, Not Technology

Organising by technology (all databases in one file, all firewalls in another) emphasises implementation over use and forces developers to sift through unrelated workloads. Instead, organise by domain or workload:

infrastructure/
├── customer_service.infra   # DB, networking, security for this service
├── search_service.infra
├── shared_network.infra     # Categorised by domain, not dumped in "shared"
└── monitoring.infra

Project Support Files

Keep support files alongside the primary source code to guarantee version alignment:

my-stack/
├── src/           # Core infrastructure code
├── tests/         # Offline and online test suites
├── environments/  # Per-instance configuration values
├── pipeline/      # Delivery configuration
├── build.sh       # Build orchestration
└── deploy.sh      # Deployment orchestration

Local Development Practices

Consistent Development Environments

Standardise tools, versions, and configuration across the team. Automate setup with containers (e.g., via Batect or Dojo), local VMs (Vagrant), or server configuration tools (Ansible, Chef, Puppet). This accelerates onboarding and eliminates “works on my machine” debugging.

Local IaaS Emulators

Emulators (LocalStack, Moto, Azurite) provide fast feedback by mocking cloud APIs locally. However, they don’t provision real resources and lack useful UIs - they’re best for running automated tests, not interactive exploration.

Personal Cloud Environments

High-performing teams let every developer provision a personal cloud environment on demand and tear it down when finished. Deploy from a branch via hosted pipelines (not local workstations) so the team can clean up orphaned environments if someone goes on holiday.

Just Enough Environment

Full environments may be too expensive or slow. Provision partial environments with only the dependencies you need, using test fixtures to replace heavy upstream stacks.

Delivery Pipeline Architecture

The Anatomy of a Pipeline Stage

Every stage has three elements:

Content (Inputs → Outputs):

Inputs: source code, libraries, test files, configuration values, or a completed build
Scope: the stage proves its component works with its dependencies - it doesn’t validate the dependencies themselves
Outputs: distributable code/package, version numbers, tags, test reports, logs

Actions (Triggers → Promotion):

Automated stages run on every input change; manual stages wait for a human
Never mix automated and manual activities in the same stage
Use passive triggers - consumer pipelines auto-detect when a provider pipeline publishes a new build
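With a passive trigger, the consumer pipeline looks for new provider builds; the provider pipeline does not call its consumers. The Python sketch below assumes the consumer can list the provider's published build versions as `MAJOR.MINOR.PATCH` strings. How it obtains that list depends on the artifact repository or CD platform. The function returns the builds the consumer has not yet processed.

```python
from typing import List, Optional, Tuple


def _key(version: str) -> Tuple[int, ...]:
    """Sort key so that 1.10.0 ranks after 1.9.0 (plain string sorting would not)."""
    return tuple(int(part) for part in version.split("."))


def new_provider_builds(published: List[str], last_seen: Optional[str]) -> List[str]:
    """Return published builds newer than the last one processed, oldest first."""
    ordered = sorted(published, key=_key)
    if last_seen is None:
        return ordered[-1:]  # first run: start from the latest build only
    return [version for version in ordered if _key(version) > _key(last_seen)]
```

On its first run the sketch starts from the latest build only, rather than replaying the provider's whole history. That is a design choice, not a requirement.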

Context (Progressive Realism):

Stage	Environment	Dependencies
Offline	Pipeline agent / emulator	Test fixtures and mocks
IaaS with mocks	Real cloud platform	Test fixtures replace real dependencies - fast, isolated
Production-like	Real cloud + real integrations	Full dependencies - reserved for issues that only emerge under realistic conditions

Automated vs. Manual Stages

Place automated stages first - catch machine-detectable errors before humans invest time
Manual stages (exploratory testing, code review, UAT) come later
Automation doesn’t mean surrendering control over when things deploy - it eliminates the manual, error-prone execution of repetitive tasks
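The ordering rules can be expressed as a tiny stage runner. Automated stages run in sequence and stop at the first failure; a manual stage proceeds only once a human has approved it. `Stage` and `run_pipeline` are hypothetical names in this Python sketch, not any CI platform's configuration format.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Stage:
    name: str
    manual: bool
    check: Callable[[], bool] = lambda: True  # automated stages supply a real test


def run_pipeline(stages: List[Stage], approvals: Set[str]) -> List[str]:
    """Run stages in order and return the names of those that completed."""
    completed: List[str] = []
    for stage in stages:
        if stage.manual:
            if stage.name not in approvals:
                break  # wait for a human; later stages do not run yet
        elif not stage.check():
            break  # fail fast, before anyone spends time on manual stages
        completed.append(stage.name)
    return completed
```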

Delivery Orchestration Scripts

Wrap build, deployment, and testing logic in standalone scripts (Bash, Python, Make) rather than embedding it in the CI platform’s configuration:

Activity	What the script manages
Building	Resolve dependencies, assemble files, generate code
Testing	Set up fixtures/emulators, execute tests, compile results
Deployment	Assemble config parameters, apply code to stacks, orchestrate multi-stack deployments
Delivery	Upload, download, and promote packages

Best practices:

Keep scripts small and focused on a single activity - don’t build a monolith
Separate multi-stack orchestration from single-stack deployment details
Write automated tests for your scripts (e.g., Bats for shell scripts)
Use the same scripts locally and in CI for consistency
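Here is an example of a small, single-purpose script that behaves the same on a workstation and in CI: the Python sketch below deploys one stack to one environment. It assumes the `my-stack/` layout shown earlier, with code in `src/` and one `environments/<name>.tfvars` file per instance. The helper names are hypothetical. `-input=false`, `-auto-approve`, and `-var-file` are standard Terraform CLI options.

```python
import argparse
import subprocess
import sys
from pathlib import Path
from typing import List


def var_file_for(stack_root: Path, environment: str) -> Path:
    """Absolute path to the per-instance configuration (assumed naming convention)."""
    return (stack_root / "environments" / f"{environment}.tfvars").resolve()


def build_apply_command(var_file: Path) -> List[str]:
    """Non-interactive apply, so the same command works locally and in CI."""
    return ["terraform", "apply", "-input=false", "-auto-approve", f"-var-file={var_file}"]


def deploy(stack_root: Path, environment: str) -> int:
    """Initialise and apply a single stack; multi-stack ordering lives elsewhere."""
    var_file = var_file_for(stack_root, environment)
    if not var_file.is_file():
        print(f"no configuration for environment {environment!r}: {var_file}", file=sys.stderr)
        return 2
    for command in (["terraform", "init", "-input=false"], build_apply_command(var_file)):
        result = subprocess.run(command, cwd=stack_root / "src")
        if result.returncode != 0:
            return result.returncode
    return 0


def main() -> int:
    parser = argparse.ArgumentParser(description="Deploy one stack to one environment.")
    parser.add_argument("environment")
    parser.add_argument("--stack-root", type=Path, default=Path.cwd())
    args = parser.parse_args()
    return deploy(args.stack_root, args.environment)


if __name__ == "__main__":
    sys.exit(main())
```

A separate orchestration script would call this one per stack, in dependency order. That keeps multi-stack sequencing out of the single-stack details.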

Team Topologies for IaC Delivery

Foundational Team Types

Type	Role
Stream-aligned	5–9 people focused on long-term design, build, and run of a service
Enabling	Experts who mentor and facilitate - don’t own components themselves
Platform	Provides non-differentiating infrastructure “as a service”
Complicated subsystem	Dedicated to a specific complex domain requiring deep expertise

Infrastructure Delivery Models

Model	Structure	Trade-off
Split ownership	Separate software and infrastructure teams	Handoffs cause delays and rework; fragmented workflow
Full-stack	One team owns both software and infrastructure	No handoffs; treats delivery as a single stream
Enablement	Software team owns infrastructure; enablement team mentors them	Interim step before scaling to dedicated service/component teams

Infrastructure Service Models

As organisations scale, infrastructure teams shift from instance management to service provision:

Model	How it works
Shared infrastructure (multi-tenancy)	Multiple teams deploy onto shared infrastructure (e.g., a shared cluster); four self-service journeys: onboarding, configuring, troubleshooting, deploying
On-demand provisioning (single-tenancy)	Teams provision dedicated instances via API; automated policy checks enforce compliance
Deployable components	Teams publish versioned infrastructure components to a repository; consumers deploy via a portal without writing IaC

Measuring Effectiveness

DORA Metrics:

Metric	What it measures
Delivery lead time	Time from commit to production
Deployment frequency	How often changes reach production
Change fail percentage	Percentage of changes that cause impairment or require rollback
Mean time to restore	Time to recover from an unplanned outage

Additional IaC metrics: effort (expert time per change), toil (repetitive manual work), version spread (how many versions are deployed), utilisation (how often environments are actually used).
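The four DORA metrics can be derived from a log of production deployments. The Python sketch below assumes a hypothetical record format and makes two simplifications. First, lead time is averaged, though many teams report the median. Second, an outage is assumed to start when the failing change was deployed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False                    # caused impairment or needed a rollback
    restored_at: Optional[datetime] = None  # when service was restored after a failure


def dora_metrics(deployments: List[Deployment], period_days: float) -> Dict[str, object]:
    if not deployments:
        raise ValueError("need at least one production deployment")
    count = len(deployments)
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Simplification: the outage is assumed to start when the failed change was deployed.
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "delivery_lead_time": sum(lead_times, timedelta()) / count,
        "deployments_per_day": count / period_days,
        "change_fail_percentage": 100 * len(failures) / count,
        "mean_time_to_restore": sum(restores, timedelta()) / len(restores) if restores else None,
    }
```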

Value Stream Mapping

Measure the total time for every activity - including queue time. Often the biggest bottleneck isn’t the automated step (e.g., 8-hour provisioning reduced to 10 minutes) but the waiting time (e.g., an 8-day approval queue). Optimise the wait, not just the automation.
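A quick calculation shows why. For illustration only, assume the change passes through just the two steps mentioned: an 8-day approval queue and the provisioning step. The Python below compares total lead time before and after the automation improvement.

```python
from datetime import timedelta
from typing import List, Tuple

# Each step: (name, active work time, time spent queued before the work starts).
Step = Tuple[str, timedelta, timedelta]


def summarise(steps: List[Step]) -> Tuple[timedelta, float]:
    """Return total lead time and the fraction of it spent waiting."""
    work = sum((step[1] for step in steps), timedelta())
    wait = sum((step[2] for step in steps), timedelta())
    total = work + wait
    return total, wait / total


before = [("approval", timedelta(0), timedelta(days=8)),
          ("provisioning", timedelta(hours=8), timedelta(0))]
after = [("approval", timedelta(0), timedelta(days=8)),
         ("provisioning", timedelta(minutes=10), timedelta(0))]

total_before, _ = summarise(before)
total_after, waiting_share = summarise(after)
print(f"lead time: {total_before} -> {total_after}")
print(f"saving: {1 - total_after / total_before:.1%}, waiting share: {waiting_share:.1%}")
```

Total lead time drops from 200 hours to 192 hours 10 minutes, a saving of under 4%. Waiting still accounts for more than 99% of the elapsed time.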

Delivering Modules

Semantic Versioning

Terraform assumes published modules follow Semantic Versioning 2.0 (vMajor.Minor.Patch):

Level	Meaning	Example
Patch	Bug fix, no interface change	`v1.2.3` → `v1.2.4`
Minor	New feature, backward compatible	`v1.2.4` → `v1.3.0`
Major	Breaking change	`v1.3.0` → `v2.0.0`

Use the pessimistic constraint operator (~>) to allow safe upgrades:

module "vpc" {
  source  = "registry.example.com/networking/vpc/aws"   # <hostname>/<namespace>/<name>/<provider>
  version = "~> 1.1"   # allows >= 1.1.0 and < 2.0.0 (1.1.x, 1.2.x, 1.3.x, ...); blocks 2.0.0
}
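The `~>` operator lets only the rightmost version component written in the constraint increase. `~> 1.1` therefore accepts any release from 1.1.0 up to (but excluding) 2.0.0, whereas `~> 1.1.0` accepts only 1.1.x patches. The Python function below is a simplified re-implementation for illustration. It ignores pre-release tags and Terraform's other constraint operators, and Terraform's own resolver remains authoritative.

```python
from typing import Tuple


def _parse(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.strip().lstrip("v").split("."))


def _pad(parts: Tuple[int, ...]) -> Tuple[int, ...]:
    return parts + (0,) * (3 - len(parts))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check a plain MAJOR.MINOR[.PATCH] version against `~> X.Y` or `~> X.Y.Z`."""
    base = _parse(constraint.replace("~>", "", 1))
    if not 2 <= len(base) <= 3:
        raise ValueError("this sketch only handles ~> X.Y and ~> X.Y.Z")
    lower = _pad(base)
    # Upper bound: drop the rightmost component given and increment the one before it.
    kept = base[:-1]
    upper = _pad(kept[:-1] + (kept[-1] + 1,))
    return lower <= _pad(_parse(version)) < upper
```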

SCM-Based Delivery

Pulling modules directly from Git (pinning a commit or tag with the `?ref=` query parameter in the source URL) works for testing branches but doesn’t scale - the `version` argument only works with registry sources, so Git-sourced modules can’t use Terraform’s version constraint logic.

Registries

Type	Details
Public (HashiCorp / OpenTofu)	Index pointing to public GitHub repos; automatically tracks semantic version tags
Private	For proprietary code; authenticate with `terraform login`; self-host with Terrareg or use a commercial CD platform’s built-in registry
Artifactory	Enterprise registry; requires explicit pushes via `jf` CLI; automate via GitHub Actions triggered on release tags; authenticate with OIDC

Managing Secrets

Authentication Hierarchy

Method	Security	Recommendation
OIDC	✅ No static secrets; temporary credentials	Preferred - eliminates secret sprawl entirely
Secret managers	✅ Centralised, RBAC-controlled	Use when OIDC isn’t available; authenticate to the manager itself via OIDC
Orchestrator settings	⚠️ Write-only; scales poorly	Last resort - updating an expired key across hundreds of projects is painful

OIDC Workflow

Register the Identity Provider URL (GitHub Actions, Spacelift, etc.) with the cloud vendor
Map the IdP to a specific identity (AWS IAM role, Azure Service Principal)
Enforce conditions - restrict the assumed role to specific repositories and workflows
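Most of the protection comes from step 3. For GitHub Actions, the token's `sub` claim encodes the repository and trigger, for example `repo:my-org/infra:ref:refs/heads/main` or `repo:my-org/infra:environment:production`. The cloud trust policy matches this claim against allowed patterns. The Python sketch below only simulates that matching, on claims whose signature has already been verified. The organisation, repository, and patterns are hypothetical. `fnmatchcase` stands in for the provider's wildcard conditions (such as AWS `StringLike`) and is not an exact equivalent.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable

# Hypothetical trust conditions for a production deployment role.
ALLOWED_AUDIENCE = "sts.amazonaws.com"
ALLOWED_SUBJECTS = (
    "repo:my-org/infra:ref:refs/heads/main",
    "repo:my-org/infra:environment:production",
)


def may_assume_role(claims: Dict[str, str],
                    audience: str = ALLOWED_AUDIENCE,
                    subjects: Iterable[str] = ALLOWED_SUBJECTS) -> bool:
    """Simulate trust-policy conditions on claims from an already-verified token."""
    if claims.get("aud") != audience:
        return False  # token was issued for a different relying party
    subject = claims.get("sub", "")
    # fnmatchcase: case-sensitive, supports * and ? (and [...] sets, unlike StringLike).
    return any(fnmatchcase(subject, pattern) for pattern in subjects)
```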

Secret Managers

Fetch secrets dynamically with a Terraform data source. But beware: values read through a data source are recorded in the state file, so anyone with access to the state can read them (marking them sensitive only hides them from CLI output). Where possible, pass the secret’s identifier (e.g., an ARN) directly to the resource instead of pulling the plaintext value into Terraform.

Deployments

Requirement	Why it matters
Access and credentials	The system needs correct network access and cloud credentials
Time	Some resources, such as databases, can take up to an hour to launch; deployment tools must handle long-running jobs without interruption
Consistency and queuing	Never run concurrent deployments to the same environment - use job queuing to enforce sequential execution
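Job queuing per environment can be sketched with one lock per environment. Deployments to the same environment run one at a time, while different environments can proceed in parallel. This Python example is an in-process illustration with a hypothetical `apply` callback. A real pipeline needs the guarantee across machines: CD platforms provide it through job queues, and Terraform backends partly provide it through state locking.

```python
import threading
from collections import defaultdict
from typing import Callable, DefaultDict


class DeploymentQueue:
    """Serialise deployments per environment; different environments may run in parallel."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: DefaultDict[str, threading.Lock] = defaultdict(threading.Lock)

    def _lock_for(self, environment: str) -> threading.Lock:
        with self._guard:  # stop two threads creating separate locks for one environment
            return self._locks[environment]

    def deploy(self, environment: str, apply: Callable[[], None]) -> None:
        with self._lock_for(environment):  # blocks until earlier jobs here have finished
            apply()
```

Python's `threading.Lock` does not guarantee first-come, first-served ordering, so a production-grade queue would also record arrival order.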

CD Platform Features

TACOS (Terraform Automation and Collaboration Software)

Platforms that bundle delivery, state management, and private module registries. They manage the state backend transparently and provide web UIs to review previous state versions.

Key Differentiators

Feature	Details
Drift detection	Automatically detect when live infrastructure diverges from code; many teams enable alerts (e.g., Slack) without automatic correction to avoid unreviewed changes
Multi-IaC support	Some platforms support Helm, Pulumi, Ansible alongside Terraform - avoids maintaining separate deployment systems as you scale
Policy enforcement	Enforce rules at the deployment level rather than inside modules, where callers can change the outcome through the variable values they pass; most platforms standardise on OPA / Rego
Cost estimation	Built-in (HCP Terraform) or via Infracost; limited to major cloud providers; estimates only - cannot predict consumption-based spikes
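Most platforms express such policies in OPA/Rego. The Python sketch below shows what a deployment-level policy actually inspects, without introducing Rego here. It applies two hypothetical rules to the JSON plan produced by `terraform show -json`, reading the `resource_changes` list. Each entry carries an `address`, a `type`, and a `change` object with `actions` and `after` values. The allowed instance types, the database rule, and the script name `check_policy.py` are assumptions.

```python
import json
import sys
from typing import Any, Dict, List

ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small"}  # hypothetical organisational rule


def policy_violations(plan: Dict[str, Any]) -> List[str]:
    """Evaluate example rules against the resource_changes section of a JSON plan."""
    violations: List[str] = []
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        after = change["change"].get("after") or {}
        address = change["address"]
        if change["type"] == "aws_instance" and {"create", "update"} & set(actions):
            instance_type = after.get("instance_type")
            if instance_type not in ALLOWED_INSTANCE_TYPES:
                violations.append(f"{address}: instance_type {instance_type!r} is not allowed")
        if change["type"] == "aws_db_instance" and "delete" in actions:
            violations.append(f"{address}: plan deletes or replaces a database")
    return violations


if __name__ == "__main__":
    # Usage: terraform plan -out=plan.out && terraform show -json plan.out > plan.json
    #        python check_policy.py plan.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        problems = policy_violations(json.load(handle))
    for problem in problems:
        print(problem, file=sys.stderr)
    sys.exit(1 if problems else 0)
```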

Platform Comparison

Platform	Type	Key characteristics
HCP Terraform	Managed TACOS	Deep CLI integration, built-in cost estimation; Terraform-only (no OpenTofu/Terragrunt); per-resource pricing can inflate costs
Env0 / Spacelift	Managed TACOS	OpenTofu sponsors; multi-framework (Terragrunt, Helm, Ansible, Pulumi); recommended for polished multi-tool experience
Scalr	Managed TACOS	OpenTofu sponsor; Terraform/OpenTofu-only; native CLI-driven workflows; excellent migration target from HCP Terraform
Digger / Terrateam	GitOps Plus	Deployment-focused; no state backend or registry; PR-comment-driven workflow tightly integrated with GitHub
Harness / Octopus Deploy	Enterprise CD	Broad platforms for mixed environments (IaC + legacy + hardware); no built-in registries or state management
Atlantis / Terrakube	Self-hosted OSS	Terrakube = traditional TACOS; Atlantis = PR-comment workflow; saves money but introduces administrative burden and security responsibility