IaC & CI/CD
Infrastructure as Code reaches its full potential only when changes flow through an automated pipeline - built, tested, and delivered with the same rigour as application software. This page covers the principles, workflows, and tooling that make that possible.
Continuous Delivery Principles for IaC
CD for infrastructure builds on a set of foundational rules:
| Principle | What it means in practice |
|---|---|
| Automate the full process | Pipelines orchestrate every step without human intervention; tests, monitoring, and developer setups are all codified |
| Use only the automated process | No manual fixes in staging or production - every fix goes through the pipeline from the start |
| Keep environments consistent | Use reusable stacks and effective workflows to prevent configuration drift between dev, test, and production |
| Deliver changes comprehensively | Apply code to all relevant infrastructure within a short timeframe; measure time to reach the last system, not just the first |
| Keep delivery cycles short | If automation is slow, developers bypass it; optimise continuously so engineers prefer the pipeline over manual changes |
| Keep all code production-ready | Validate incrementally rather than batching testing to the end of a sprint |
| Ensure code and deployed resources match | Use control loops (GitOps, Puppet, Chef) to continuously reconcile the codebase with live infrastructure |
| Minimise disruption | Track downtime metrics; deliver small, frequent, incremental changes instead of large, infrequent batches |
Core Infrastructure Delivery Workflow
The workflow processes small, incremental changes through a repeating cycle:
Workflow Stages
| Stage | What happens |
|---|---|
| Development | Developer edits code in a personal workspace (local emulator or cloud sandbox), runs initial tests, then pushes |
| Build | Automated system downloads dependencies and packages code into a versioned, deployable artifact (container image, tagged branch) |
| Test | Build is deployed to a series of test environments - automated checks + manual validations (exploratory testing, code reviews, UAT) |
| Release | After passing all tests, the build is deployed to production (called “release” because the code has already been deployed multiple times to test environments) |
| Run | Infrastructure actively hosts workloads; the versioned build can be reapplied to recover from failures or correct drift |
Operational Principles
- Small, frequent pushes - isolate issues immediately; if an increment breaks the build, the developer can pinpoint the exact change
- Selective production deployment - not every commit reaches production; teams may batch several passing increments and release once or twice a day
- Fix forward - if a change fails, the fix must be made in source code and pushed through the pipeline from the beginning, never patched directly in a downstream environment
Building and Distributing IaC
The Three Steps of Code Processing
| Step | Purpose | Output |
|---|---|---|
| Assemble | Gather code and build-time dependencies (modules, plugins, providers) | A build - not specific to any target instance |
| Compile | Add deployment-specific configuration (variables, state config, auth tokens) | A desired-state model for a specific instance |
| Execute | Compare desired state against actual resources via the IaaS API and apply changes | Modified infrastructure |
Build-on-Deploy vs. Build-Once
Build-on-Deploy:
Many tools resolve dependencies during every deployment. If a dependency version changes between deploying to staging and production, the environments silently diverge - creating bugs that are extremely difficult to trace.
Build Once, Deploy Many:
Assemble exactly once. Deploy the same build to every environment, running only the Compile and Execute steps. Lock dependencies by bundling them or using a lock file.
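A minimal sketch of the build-once idea, assuming GNU tar and sha256sum are available. The function names (assemble_build, verify_build) and the stack-<version>.tgz naming are hypothetical, chosen only for illustration:

```shell
#!/bin/sh
# Sketch only: function names and artifact naming are hypothetical.

# Assemble step: package the stack source exactly once into a versioned
# artifact and record its checksum next to it.
# usage: assemble_build <src_dir> <version> <out_dir>   (prints the artifact path)
assemble_build() (
  src_dir=$1 version=$2 out_dir=$3
  artifact="$out_dir/stack-$version.tgz"
  mkdir -p "$out_dir" || exit 1
  # out_dir is assumed to live outside src_dir, so the archive never contains itself.
  tar -czf "$artifact" -C "$src_dir" . || exit 1
  # Store only the hash so the check does not depend on where the artifact lives.
  sha256sum "$artifact" | cut -d' ' -f1 > "$artifact.sha256" || exit 1
  printf '%s\n' "$artifact"
)

# Run before every deployment: succeed only if the artifact is byte-for-byte
# the one that was assembled.
# usage: verify_build <artifact>
verify_build() (
  artifact=$1
  expected=$(cat "$artifact.sha256") || exit 1
  actual=$(sha256sum "$artifact" | cut -d' ' -f1) || exit 1
  [ "$expected" = "$actual" ]
)
```

Each environment's deployment would then call verify_build on the same artifact before running its own Compile and Execute steps, so staging and production cannot silently pick up different dependency versions.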
Pre-Build Development Workflows
| Workflow | How it works | Feedback speed |
|---|---|---|
| Pull Requests | Feature branches → human review → merge to main | Slower - waits for reviewers |
| Trunk-Based Development | Frequent, small commits directly to main; fast automated tests catch errors | Faster - release candidate published immediately on green tests |
Distribution Methods
| Method | Mechanism | Best for |
|---|---|---|
| Code branches | Pull from SCM by commit ID, tag, or environment branch | Simple setups; environment branches work well with controllers that sync branch → environment |
| Stack packages | Bundle code into ZIP, TGZ, RPM, container image; store in Artifactory, Nexus, S3, or GitHub Releases | Teams needing strict artifact immutability |
| Libraries (wrapper stacks) | Publish core logic as a versioned Terraform module; each environment has a thin stack that consumes it | Leverages existing module registries for versioning and distribution |
Immutability rule: every build must be treated as immutable. Never edit stack code to customise it for an environment - use configuration parameters instead.
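One way to picture the immutability rule is a deploy command in which only the configuration file varies between environments. The sketch below assumes a build directory containing src/ and environments/<name>.tfvars; that layout, the .tfvars naming, and the deploy_command function are assumptions made for illustration:

```shell
#!/bin/sh
# Sketch only: the environments/<name>.tfvars layout and the function name
# are assumptions, not a prescribed structure.

# Print the deploy command for one environment. The stack code in src/ is
# identical for every environment; only the -var-file argument changes.
# usage: deploy_command <build_dir> <environment>
deploy_command() (
  build_dir=$1 env_name=$2
  var_file="$build_dir/environments/$env_name.tfvars"
  if [ ! -f "$var_file" ]; then
    echo "no configuration for environment '$env_name'" >&2
    exit 1
  fi
  # -chdir switches Terraform's working directory to src/, so the var file
  # is referenced relative to that directory.
  printf 'terraform -chdir=%s/src apply -var-file=../environments/%s.tfvars\n' \
    "$build_dir" "$env_name"
)
```

Printing the command rather than running it keeps the sketch side-effect free; a real script would execute it after the build has been verified.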
Integration Workflows
When infrastructure is split into independently deployable stacks, teams need a strategy to integrate and test them:
| Pattern | How it works | Best for |
|---|---|---|
| Fan-in | Build and test each component separately, then deploy and test all related stacks together before production | Components owned by a single team |
| Federation | Each component is delivered and released independently; dependencies treated like APIs with contract testing | Components owned by different teams |
| Monorepo | All components and shared code in one repository; build tools (Bazel, Buck) limit builds to changed paths | Large codebases needing guaranteed consistency of shared code |
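As a rough illustration of limiting monorepo builds to changed paths, the sketch below maps a list of changed files to the stacks that need rebuilding. It is a much simpler stand-in for what tools such as Bazel or Buck do with a full dependency graph: it assumes every stack lives under a hypothetical stacks/<name>/ directory and ignores shared code entirely.

```shell
#!/bin/sh
# Sketch only: the stacks/<name>/ layout and the function name are assumptions.

# Read changed file paths on stdin and print each affected stack once.
# usage: git diff --name-only origin/main...HEAD | changed_stacks
changed_stacks() {
  # Keep only files inside stacks/<name>/, print <name>, de-duplicate.
  awk -F/ '$1 == "stacks" && NF >= 3 { print $2 }' | sort -u
}
```

A pipeline could loop over this output and trigger only the listed stacks' build stages.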
CI Automation
Source Control and Branching
SCM platforms (GitHub, GitLab) provide the foundation: code storage, action runners for automated testing, issue trackers, and security scanners. Branches let developers build and test features in isolation; PRs provide a gate where automated tests run and team members review changes.
What Humans Should Review vs. What Should Be Automated
| Automated (CI system) | Human (code review) |
|---|---|
| Formatting, syntax, linting | Security implications (wrong subnet, missing backups) |
| Test execution | Adherence to team best practices and consistency |
| Dependency checks | Quality of comments and documentation |
Automating Chores
Consolidate repetitive maintenance into a single command (e.g., make chores):
Generating documentation (terraform-docs):

    terraform-docs markdown table --output-file README.md --output-mode inject .

Place <!-- BEGIN_TF_DOCS --> / <!-- END_TF_DOCS --> markers in the README. Use --output-check in CI to verify docs are current.
Standardising formatting (terraform fmt):

    terraform fmt -recursive

Resolves formatting discrepancies project-wide. Rudimentary but effective for keeping code uniform.
Auto-fixing lint errors (tflint --fix):

    tflint --fix

Organising Codebases
Section titled “Organising Codebases”Key Definitions
| Term | Meaning |
|---|---|
| Build project | Code used to build a discrete component (library, stack, application) |
| Codebase | One or more interrelated build projects |
| Repository | One or more build projects in a source control system; branches/tags/commits apply to all files |
Repository Strategies
| Strategy | Strengths | Weaknesses |
|---|---|---|
| Monorepo | Simplifies integration; code is versioned and branched together | Project boundaries blur; tangled cross-folder imports |
| Microrepo | Clean separation; change triggers only its own pipeline | Impractical for build-time integration across repos |
| Hybrid | Group tightly integrated projects; separate loosely coupled ones | Requires deliberate design decisions |
Design forces: team ownership and access controls, reducing friction from conflicting changelogs, enforcing architectural boundaries.
Organise by Domain, Not Technology
Organising by technology (all databases in one file, all firewalls in another) emphasises implementation over use and forces developers to sift through unrelated workloads. Instead, organise by domain or workload:

    infrastructure/
    ├── customer_service.infra    # DB, networking, security for this service
    ├── search_service.infra
    ├── shared_network.infra      # Categorised by domain, not dumped in "shared"
    └── monitoring.infra

Project Support Files
Keep support files alongside the primary source code to guarantee version alignment:

    my-stack/
    ├── src/             # Core infrastructure code
    ├── tests/           # Offline and online test suites
    ├── environments/    # Per-instance configuration values
    ├── pipeline/        # Delivery configuration
    ├── build.sh         # Build orchestration
    └── deploy.sh        # Deployment orchestration

Local Development Practices
Section titled “Local Development Practices”Consistent Development Environments
Standardise tools, versions, and configuration across the team. Automate setup with local VMs (Vagrant), container-based tooling (Batect, Dojo), or server configuration tools (Ansible, Chef, Puppet). This accelerates onboarding and eliminates “works on my machine” debugging.
Local IaaS Emulators
Emulators (LocalStack, Moto, Azurite) provide fast feedback by mocking cloud APIs locally. However, they don’t provision real resources and lack useful UIs - they’re best for running automated tests, not interactive exploration.
Personal Cloud Environments
High-performing teams let every developer provision a personal cloud environment on demand and tear it down when finished. Deploy from a branch via hosted pipelines (not local workstations) so the team can clean up orphaned environments if someone goes on holiday.
Just Enough Environment
Full environments may be too expensive or slow. Provision partial environments with only the dependencies you need, using test fixtures to replace heavy upstream stacks.
Delivery Pipeline Architecture
The Anatomy of a Pipeline Stage
Every stage has three elements:
Content (Inputs → Outputs):
- Inputs: source code, libraries, test files, configuration values, or a completed build
- Scope: the stage proves its component works with its dependencies - it doesn’t validate the dependencies themselves
- Outputs: distributable code/package, version numbers, tags, test reports, logs
Actions (Triggers → Promotion):
- Automated stages run on every input change; manual stages wait for a human
- Never mix automated and manual activities in the same stage
- Use passive triggers - consumer pipelines auto-detect when a provider pipeline publishes a new build
Context (Progressive Realism):
| Stage | Environment | Dependencies |
|---|---|---|
| Offline | Pipeline agent / emulator | Test fixtures and mocks |
| IaaS with mocks | Real cloud platform | Test fixtures replace real dependencies - fast, isolated |
| Production-like | Real cloud + real integrations | Full dependencies - catches issues that only emerge in realistic conditions |
Automated vs. Manual Stages
- Place automated stages first - catch machine-detectable errors before humans invest time
- Manual stages (exploratory testing, code review, UAT) come later
- Automation doesn’t mean surrendering control over when things deploy - it eliminates the manual, error-prone execution of repetitive tasks
Delivery Orchestration Scripts
Wrap build, deployment, and testing logic in standalone scripts (Bash, Python, Make) rather than embedding it in the CI platform’s configuration:
| Activity | What the script manages |
|---|---|
| Building | Resolve dependencies, assemble files, generate code |
| Testing | Set up fixtures/emulators, execute tests, compile results |
| Deployment | Assemble config parameters, apply code to stacks, orchestrate multi-stack deployments |
| Delivery | Upload, download, and promote packages |
Best practices:
- Keep scripts small and focused on a single activity - don’t build a monolith
- Separate multi-stack orchestration from single-stack deployment details
- Write automated tests for your scripts (e.g., Bats for shell scripts)
- Use the same scripts locally and in CI for consistency
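The practices above might look like this in a small deployment script. It is a sketch under stated assumptions: the stacks/<name>/ and environments/<env>.tfvars layout, the function names, and the DRY_RUN switch are all hypothetical. The point is the split between single-stack details (deploy_stack) and multi-stack ordering (deploy_all):

```shell
#!/bin/sh
# Sketch only: directory layout, function names and DRY_RUN are hypothetical.

# Single-stack detail: how one stack is applied to one environment.
# usage: deploy_stack <stack> <environment>
deploy_stack() (
  stack=$1 env_name=$2
  set -- terraform -chdir="stacks/$stack" apply -auto-approve \
    -var-file="../../environments/$env_name.tfvars"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"      # print the command instead of running it (useful in tests)
  else
    "$@"
  fi
)

# Multi-stack orchestration: knows only the order, not the deployment details.
# usage: deploy_all <environment> <stack>...
deploy_all() (
  env_name=$1; shift
  for stack in "$@"; do
    deploy_stack "$stack" "$env_name" || {
      echo "deploy of '$stack' failed; stopping" >&2
      exit 1
    }
  done
)
```

Because the dry-run path prints the exact commands, the same script can be exercised by a shell test framework such as Bats, on a workstation, and in CI without touching real infrastructure.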
Team Topologies for IaC Delivery
Foundational Team Types
| Type | Role |
|---|---|
| Stream-aligned | 5–9 people focused on long-term design, build, and run of a service |
| Enabling | Experts who mentor and facilitate - don’t own components themselves |
| Platform | Provides non-differentiating infrastructure “as a service” |
| Complicated subsystem | Dedicated to a specific complex domain requiring deep expertise |
Infrastructure Delivery Models
| Model | Structure | Trade-off |
|---|---|---|
| Split ownership | Separate software and infrastructure teams | Handoffs cause delays and rework; fragmented workflow |
| Full-stack | One team owns both software and infrastructure | No handoffs; treats delivery as a single stream |
| Enablement | Software team owns infrastructure; enablement team mentors them | Interim step before scaling to dedicated service/component teams |
Infrastructure Service Models
As organisations scale, infrastructure teams shift from instance management to service provision:
| Model | How it works |
|---|---|
| Shared infrastructure (multi-tenancy) | Multiple teams deploy onto shared infrastructure (e.g., a shared cluster); four self-service journeys: onboarding, configuring, troubleshooting, deploying |
| On-demand provisioning (single-tenancy) | Teams provision dedicated instances via API; automated policy checks enforce compliance |
| Deployable components | Teams publish versioned infrastructure components to a repository; consumers deploy via a portal without writing IaC |
Measuring Effectiveness
DORA Metrics:
| Metric | What it measures |
|---|---|
| Delivery lead time | Time from commit to production |
| Deployment frequency | How often changes reach production |
| Change fail percentage | Percentage of changes that cause impairment or require rollback |
| Mean time to restore | Time to recover from an unplanned outage |
Additional IaC metrics: effort (expert time per change), toil (repetitive manual work), version spread (how many versions are deployed), utilisation (how often environments are actually used).
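Two of the DORA metrics can be computed from a simple deployment log, as a rough sketch. The input format (one line per production deployment: commit time and deploy time as Unix epochs, then ok or failed) and the function name dora_summary are assumptions made for this example:

```shell
#!/bin/sh
# Sketch only: the log format and the function name are hypothetical.

# Summarise delivery lead time and change fail percentage.
# Each stdin line: <commit_epoch> <deploy_epoch> <ok|failed>
# usage: dora_summary < deployments.txt
dora_summary() {
  awk '
    { lead += $2 - $1; n++ }          # lead time = deploy time - commit time
    $3 == "failed" { failed++ }
    END {
      if (n == 0) { print "no deployments"; exit }
      printf "deployments=%d\n", n
      printf "mean_lead_time_s=%d\n", lead / n
      printf "change_fail_pct=%.1f\n", 100 * failed / n
    }'
}
```

Deployment frequency would additionally need the length of the observation window, and mean time to restore needs incident data rather than deployment data, so both are left out here.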
Value Stream Mapping
Measure the total time for every activity - including queue time. Often the biggest bottleneck isn’t the automated step (e.g., 8-hour provisioning reduced to 10 minutes) but the waiting time (e.g., an 8-day approval queue). Optimise the wait, not just the automation.
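The arithmetic behind that observation can be sketched with the figures from the example: 8 hours of provisioning is 480 minutes, and an 8-day approval queue is 11,520 minutes. The function name wait_share is hypothetical:

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.

# Print the percentage of end-to-end time that is spent waiting.
# usage: wait_share <work_minutes> <queue_minutes>
wait_share() {
  awk -v work="$1" -v queue="$2" \
    'BEGIN { printf "%.1f\n", 100 * queue / (work + queue) }'
}
```

With these figures, wait_share 480 11520 prints 96.0 and wait_share 10 11520 prints 99.9. Automating provisioning shortens the end-to-end time only from 12,000 to 11,530 minutes, roughly 4%, because almost all of it is queue time.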
Delivering Modules
Semantic Versioning
Terraform assumes published modules follow Semantic Versioning 2.0 (vMajor.Minor.Patch):
| Level | Meaning | Example |
|---|---|---|
| Patch | Bug fix, no interface change | v1.2.3 → v1.2.4 |
| Minor | New feature, backward compatible | v1.2.4 → v1.3.0 |
| Major | Breaking change | v1.3.0 → v2.0.0 |
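The bump rules in the table can be sketched as a small helper that a release script might call. The function name bump_version is hypothetical, and the sketch handles only plain vMAJOR.MINOR.PATCH versions (no pre-release or build metadata):

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical; pre-release tags are not handled.

# Print the next version for the given change level.
# usage: bump_version <vMAJOR.MINOR.PATCH> <major|minor|patch>
bump_version() (
  version=${1#v} level=$2
  IFS=.
  set -- $version                     # split "1.2.3" into 1, 2, 3
  [ $# -eq 3 ] || { echo "expected MAJOR.MINOR.PATCH" >&2; exit 1; }
  major=$1 minor=$2 patch=$3
  case $level in
    major) major=$((major + 1)); minor=0; patch=0 ;;  # breaking change resets the rest
    minor) minor=$((minor + 1)); patch=0 ;;           # new feature resets patch
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown level: $level" >&2; exit 1 ;;
  esac
  printf 'v%s.%s.%s\n' "$major" "$minor" "$patch"
)
```

For the rows above, bump_version v1.2.3 patch prints v1.2.4, bump_version v1.2.4 minor prints v1.3.0, and bump_version v1.3.0 major prints v2.0.0.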
Use the pessimistic constraint operator (~>) to allow safe upgrades:

    module "vpc" {
      # Private registry sources take the form <HOST>/<NAMESPACE>/<NAME>/<PROVIDER>;
      # the trailing "aws" segment is an illustrative assumption.
      source  = "registry.example.com/networking/vpc/aws"
      version = "~> 1.1" # allows >= 1.1.0 and < 2.0.0 (1.1.x, 1.2.x, ...); blocks 2.0.0
    }

SCM-Based Delivery
Pulling modules directly from Git (using the ref argument in the source URL to pin a commit, tag, or branch) works for testing branches but doesn’t scale - Git sources don’t support Terraform’s version constraint logic.
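To make concrete what is given up with Git sources, the check behind a pessimistic constraint can be approximated as below. This is an illustration of the rule (only the rightmost stated component may increase), not Terraform's implementation; the function name is hypothetical, and only two- and three-component constraints on plain MAJOR.MINOR.PATCH versions are handled:

```shell
#!/bin/sh
# Sketch only: approximates "~>" for illustration; not Terraform's own code.

# Succeed if <version> satisfies "~> <constraint>".
# usage: satisfies_pessimistic <version> <constraint>    e.g. 1.2.5 1.1
satisfies_pessimistic() (
  version=${1#v} constraint=$2
  IFS=.
  set -- $version
  [ $# -eq 3 ] || exit 2
  v_major=$1 v_minor=$2 v_patch=$3
  set -- $constraint
  case $# in
    # ~> A.B   : same major, minor may increase
    2) [ "$v_major" -eq "$1" ] && [ "$v_minor" -ge "$2" ] ;;
    # ~> A.B.C : same major and minor, patch may increase
    3) [ "$v_major" -eq "$1" ] && [ "$v_minor" -eq "$2" ] && [ "$v_patch" -ge "$3" ] ;;
    *) exit 2 ;;
  esac
)
```

With a registry source, Terraform applies this kind of selection itself across all published versions; with a Git source, the pinned ref is all there is.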
Registries
| Type | Details |
|---|---|
| Public (HashiCorp / OpenTofu) | Index pointing to public GitHub repos; automatically tracks semantic version tags |
| Private | For proprietary code; authenticate with terraform login; self-host with Terrareg or use a commercial CD platform’s built-in registry |
| Artifactory | Enterprise registry; requires explicit pushes via jf CLI; automate via GitHub Actions triggered on release tags; authenticate with OIDC |
Managing Secrets
Authentication Hierarchy
| Method | Security | Recommendation |
|---|---|---|
| OIDC | ✅ No static secrets; temporary credentials | Preferred - eliminates secret sprawl entirely |
| Secret managers | ✅ Centralised, RBAC-controlled | Use when OIDC isn’t available; authenticate to the manager itself via OIDC |
| Orchestrator settings | ⚠️ Write-only; scales poorly | Last resort - updating an expired key across hundreds of projects is painful |
OIDC Workflow
- Register the Identity Provider URL (GitHub Actions, Spacelift, etc.) with the cloud vendor
- Map the IdP to a specific identity (AWS IAM role, Azure Service Principal)
- Enforce conditions - restrict the assumed role to specific repositories and workflows
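The condition in the last step is typically a pattern match on the token's subject claim, evaluated by the cloud vendor. The sketch below only mimics that match locally to show the idea; it assumes GitHub Actions' subject format (repo:<org>/<repo>:ref:refs/heads/<branch>), and the function name and the example-org/infra repository are hypothetical:

```shell
#!/bin/sh
# Sketch only: mimics a wildcard condition on the OIDC "sub" claim.
# The real check is performed by the cloud vendor, not by your scripts.

# Succeed if the subject claim matches the allowed glob pattern.
# usage: subject_allowed <sub_claim> <pattern>
subject_allowed() {
  # $2 is deliberately unquoted so that case treats it as a glob pattern.
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}
```

A pattern such as repo:example-org/infra:ref:refs/heads/main admits only the main branch of one repository, whereas repo:example-org/* would admit every repository and branch in the organisation, which is usually broader than intended.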
Secret Managers
Fetch secrets dynamically with a Terraform data source. But beware: retrieved values may be exposed in the state file. Where possible, pass the secret’s identifier (e.g., an ARN) directly to the resource instead of pulling the plaintext value into Terraform.
Deployments
| Requirement | Why it matters |
|---|---|
| Access and credentials | The system needs correct network access and cloud credentials |
| Time | Some resources (databases) take up to an hour to launch; deployment tools must handle long-running jobs without interruption |
| Consistency and queuing | Never run concurrent deployments to the same environment - use job queuing to enforce sequential execution |
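Where the delivery platform offers no job queue, the "one deployment per environment at a time" rule can be approximated with a lock. The sketch below rejects a second deployment rather than queuing it, so the caller would need to retry; the function name, the LOCK_ROOT variable, and the lock directory naming are hypothetical, and the lock only protects jobs that share the same filesystem:

```shell
#!/bin/sh
# Sketch only: a fail-fast lock, not a real queue; names are hypothetical.

# Run a command while holding an exclusive per-environment lock.
# usage: with_env_lock <environment> <command> [args...]
with_env_lock() (
  env_name=$1; shift
  lock_dir="${LOCK_ROOT:-/tmp}/deploy-$env_name.lock"
  # mkdir is atomic: exactly one caller can create the directory.
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "deployment to '$env_name' already in progress" >&2
    exit 75
  fi
  # Release the lock when this subshell exits, whether the command succeeded or not.
  trap 'rmdir "$lock_dir"' EXIT
  "$@"
)
```

A deployment step would be wrapped as with_env_lock staging ./deploy.sh staging; locks for different environments do not block each other.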
CD Platform Features
TACOS (Terraform Automation and Collaboration Software)
Platforms that bundle delivery, state management, and private module registries. They manage the state backend transparently and provide web UIs to review previous state versions.
Key Differentiators
| Feature | Details |
|---|---|
| Drift detection | Automatically detect when live infrastructure diverges from code; many teams enable alerts (e.g., Slack) without automatic correction to avoid unreviewed changes |
| Multi-IaC support | Some platforms support Helm, Pulumi, Ansible alongside Terraform - avoids maintaining separate deployment systems as you scale |
| Policy enforcement | Enforce rules at the deployment level (not module level, where users inject values via variables); most platforms standardise on OPA / Rego |
| Cost estimation | Built-in (HCP Terraform) or via Infracost; limited to major cloud providers; estimates only - cannot predict consumption-based spikes |
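Without a platform feature, a scheduled job can approximate the drift detection row using terraform plan -detailed-exitcode, which exits with 0 when there is nothing to change, 1 on error, and 2 when the plan contains changes. The helper name drift_status is hypothetical, and the sketch assumes the job plans the commit that is currently deployed, so that any difference is drift rather than a pending change:

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.

# Translate the exit code of `terraform plan -detailed-exitcode` into a status.
# usage: terraform plan -detailed-exitcode >/dev/null; drift_status $?
drift_status() {
  case $1 in
    0) echo "in-sync" ;;
    2) echo "drift" ;;            # plan succeeded and found differences
    *) echo "error"; return 1 ;;  # the plan itself failed
  esac
}
```

In line with the alert-only approach described in the table, the job would send a notification when the status is drift and leave the correction to a reviewed pipeline run.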
Platform Comparison
| Platform | Type | Key characteristics |
|---|---|---|
| HCP Terraform | Managed TACOS | Deep CLI integration, built-in cost estimation; Terraform-only (no OpenTofu/Terragrunt); per-resource pricing can inflate costs |
| Env0 / Spacelift | Managed TACOS | OpenTofu sponsors; multi-framework (Terragrunt, Helm, Ansible, Pulumi); recommended for polished multi-tool experience |
| Scalr | Managed TACOS | OpenTofu sponsor; Terraform/OpenTofu-only; native CLI-driven workflows; excellent migration target from HCP Terraform |
| Digger / Terrateam | GitOps Plus | Deployment-focused; no state backend or registry; PR-comment-driven workflow tightly integrated with GitHub |
| Harness / Octopus Deploy | Enterprise CD | Broad platforms for mixed environments (IaC + legacy + hardware); no built-in registries or state management |
| Atlantis / Terrakube | Self-hosted OSS | Terrakube = traditional TACOS; Atlantis = PR-comment workflow; saves money but introduces administrative burden and security responsibility |