Testing IaC

Infrastructure code powers the systems your users depend on. Unlike testing application code, testing infrastructure code usually means provisioning real cloud resources, which is slow, costs money, and requires careful orchestration. This page covers the theory, tools, and workflows that make IaC testing practical.


Building infrastructure is rarely a one-off task. Systems require continuous patching, upgrading, and improvement. Integrating testing directly into the daily development cycle - not deferring it to a “QA phase” - delivers three key advantages:

  • Painless go-lives - the first production deployment uses the same automation that ran throughout development, removing deployment drama
  • Avoiding technical debt - catching problems immediately is cheaper than investigating and rewriting broken code later
  • Tight feedback loops - validating each change as early as possible is the single strongest differentiator of high-performing teams

A comprehensive IaC testing strategy reaches well beyond “does the resource get created”:

| Validation target | Example |
| --- | --- |
| Code quality | Syntax, formatting, complexity |
| Functionality | An HTTPS route reaches the web servers end-to-end |
| Security | Ports, permissions, exposed secrets |
| Compliance | Regulatory, contractual, and internal policy adherence |
| Provenance | Supply-chain checks - vulnerabilities, license compatibility, SBOM |
| Performance | Network latency, connection throughput |
| Scalability | Autoscaling triggers and the desired outcome (capacity actually increases) |
| Availability | Destroy a node → verify automatic replacement |
| Operability | Logging, monitoring, scheduled maintenance tasks |

| Challenge | Why it matters | Mitigation |
| --- | --- | --- |
| Declarative code is low-value to unit-test | A test that restates a hardcoded value is redundant | Test variable logic, combinations of declarations, and functional outcomes instead |
| Testing is slow | Provisioning takes minutes to hours | Divide into small stacks, shift tests offline, use progressive pipelines |
| Testing costs money | Real cloud resources incur charges | Destroy immediately, use isolated test accounts, automate cleanup |
| Complex dependencies | Components depend on upstream/downstream stacks | Use test fixtures, mocks, and local emulators |

Terraform’s entire purpose is configuring external systems, which makes pure unit testing inherently difficult. The vast majority of IaC testing is integration testing - verifying how resources, modules, and systems interact. Newer features like native mocks (v1.7+) are beginning to close this gap, but integration tests remain the backbone.


Progressive testing runs suites in sequence - fast and narrow first, slow and broad later - optimising for fast, accurate feedback. Only after the quick checks pass does the pipeline invest in expensive cloud provisioning.

The Test Pyramid

A large base of fast, offline checks; a middle layer of integration tests; a small top of full-system tests. Higher levels test emergent behaviour, not individual components.

The Infrastructure Test Diamond

For declarative codebases the pyramid becomes a diamond: low-level unit tests are less valuable (they restate the code), so the bulk of testing sits in the middle layer - stack tests. The diamond tapers to fewer offline checks at the bottom and fewer system-wide tests at the top.

The Swiss Cheese Model

No single layer catches everything. Stack multiple layers so that the “holes” in one are covered by the next - just like stacking Swiss cheese slices eventually blocks every gap.

| Stage | Environment | Scope | Speed |
| --- | --- | --- | --- |
| Build test (offline) | Local / CI agent | Syntax, linting, supply-chain, policy | Seconds |
| Stack test (online) | IaaS - isolated stack only | The stack itself, using test fixtures for dependencies | Minutes |
| Integrated infra test | IaaS - multiple stacks | Cross-stack integration points | Minutes–hours |
| Product test | IaaS - full stack + workload | Application behaviour on top of infrastructure | Hours |
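
The stage ordering can be sketched as a small runner that stops at the first failing stage, so the slow cloud stages only run once the fast checks have passed. This is a minimal illustration, not a prescribed implementation: `run_progressive` and the stage names are hypothetical, and each check is a plain callable standing in for a real command.

```python
from typing import Callable, List, Tuple

# Each stage is (name, check); the check returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_progressive(stages: List[Stage]) -> List[Tuple[str, bool]]:
    """Run stages in order and stop at the first failure."""
    results: List[Tuple[str, bool]] = []
    for name, check in stages:
        passed = check()
        results.append((name, passed))
        if not passed:
            break  # skip the slower, more expensive stages
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins for real commands.
    demo = [
        ("build test", lambda: True),
        ("stack test", lambda: False),
        ("integrated infra test", lambda: True),
    ]
    for name, passed in run_progressive(demo):
        print(f"{name}: {'passed' if passed else 'FAILED'}")
```

In the demo, the integrated infra test never runs because the stack test fails first.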

Pre-release testing covers known unknowns. Production environments have characteristics that cannot be replicated:

  • Data - real-world volume and variety
  • Users - creative, unpredictable interactions
  • Traffic - long-term compounding effects (e.g. logs filling storage)
  • Concurrency - rare timing-dependent interactions

Safe production testing techniques: monitoring and observability, zero-downtime deployment, progressive rollout, dedicated test data records, and chaos engineering.


Offline tests run on a developer workstation or CI agent - no cloud provisioning required. They execute in seconds.

terraform init -backend=false # initialise without a backend
terraform validate # parse code, catch typos and naming errors

terraform validate does not need variables set, but it does require an initialised workspace. Use -backend=false for automated pipelines without a state backend.
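
In a pipeline it can help to turn the machine-readable output of terraform validate -json into short, file-and-line messages. The sketch below only parses that JSON; it assumes the output has already been captured as a string, and the helper name `validation_errors` is hypothetical.

```python
import json
from typing import List


def validation_errors(validate_json: str) -> List[str]:
    """Extract 'file:line: summary' messages from `terraform validate -json` output."""
    report = json.loads(validate_json)
    if report.get("valid", False):
        return []
    messages: List[str] = []
    for diag in report.get("diagnostics", []):
        if diag.get("severity") != "error":
            continue  # warnings do not fail validation
        location = diag.get("range") or {}
        filename = location.get("filename", "?")
        line = location.get("start", {}).get("line", "?")
        messages.append(f"{filename}:{line}: {diag.get('summary', '')}")
    return messages
```

A CI script could fail the build whenever the returned list is non-empty and print the messages as annotations.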

TFLint evaluates code for bugs, errors, and style violations without running it:

.tflint.hcl
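plugin "terraform" {
  enabled = true
  preset  = "all" # "recommended" is the default; "all" adds rules like
                  # requiring descriptions on variables and outputs
}

plugin "aws" {
  enabled = true
  # Also set `version` and `source` (github.com/terraform-linters/tflint-ruleset-aws)
  # if `tflint --init` should install this plugin automatically.
}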

| Feature | Detail |
| --- | --- |
| Plugin system | Core terraform plugin + cloud-specific plugins (AWS, GCP, Azure) + OPA plugin |
| Exceptions | Disable rules globally in .tflint.hcl or per-line with # tflint-ignore: rule_name |
| Autofix | tflint --fix auto-corrects simple issues (e.g. comment style) - review findings first |

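Some tools (for example, the optional deep checking mode of the TFLint AWS ruleset) connect to the cloud API to verify that referenced resources (AMIs, VM sizes) actually exist, catching errors that pure offline analysis misses. These checks need credentials and network access, so they are no longer strictly offline.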

Infrastructure code imports third-party modules, base images, and libraries. Supply-chain tools:

  • Check components against public vulnerability databases
  • Verify license compatibility with organisational policy
  • Generate a Software Bill of Materials (SBOM) for future vulnerability tracking
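
Local emulators can stand in for cloud APIs so that some tests run without a real cloud account:

| Platform | Emulator |
| --- | --- |
| AWS | LocalStack, Moto |
| Azure | Azurite (Azure Storage only) |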

Security validation lets teams catch vulnerabilities before infrastructure is deployed.

| Tool | Notes |
| --- | --- |
| Checkov | Evaluates Terraform code and plans, Helm, CloudFormation. CLI-only, no central service required. |
| Trivy | Successor to TFSec. Broader provider coverage. Minimal configuration to start. |

Both are free and run locally. Running them simultaneously adds redundancy - Trivy covers more providers, but Checkov contains unique rules Trivy may miss.

When a finding is intentional, create an explicit, documented exception:

Checkov:

resource "aws_s3_bucket" "public_assets" {
  # CKV_AWS_20 flags public-read ACLs; the skip comment must sit inside the resource block
  #checkov:skip=CKV_AWS_20:Intentionally public - serves static marketing assets
  bucket = "acme-public-assets"
}

Trivy:

# Public bucket for static marketing assets
#trivy:ignore:AVD-AWS-0086
resource "aws_s3_bucket" "public_assets" {
  bucket = "acme-public-assets"
}

Platforms like Snyk, Checkmarx, and Mend offer centralised dashboards, cross-project issue tracking, and SBOM generation. Even when using a commercial scanner, keep Checkov running as a free, extra layer.

Checkov (YAML) - the recommended approach for most teams:

metadata:
  id: "CUSTOM_001"
  name: "Block GPU instance families"
  category: "FINANCE"
definition:
  cond_type: "attribute"
  resource_types:
    - "aws_instance"
  attribute: "instance_type"
  operator: "not_starting_with"
  value: "p3."

Store YAML policies in a central Git repository. Pull them at runtime with --external-checks-git. Checkov also supports Python for policies requiring API calls.

OPA with TFLint - uses the Rego policy language via TFLint’s OPA plugin. Powerful but Rego has a steep learning curve; prefer Checkov YAML unless your organisation already uses OPA elsewhere.


Some tools can preview changes before applying (Terraform plan, Pulumi preview). You can write automated assertions against the preview output:

  • Fail if the preview would destroy a database
  • Fail if a deprecated resource type would be created
  • Fail if a security-sensitive change is detected
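
For Terraform, one way to write such assertions is to export the plan as JSON (terraform plan -out=tfplan, then terraform show -json tfplan) and inspect its resource_changes list. The sketch below covers the first two rules; the helper name `plan_violations` and the two type sets are assumptions for illustration, not a recommended policy.

```python
from typing import Any, Dict, List

# Assumptions for this sketch: adjust both sets to your own policy.
DATABASE_TYPES = {"aws_db_instance", "aws_rds_cluster"}
BLOCKED_TYPES = {"aws_launch_configuration"}  # hypothetical "deprecated" list


def plan_violations(plan: Dict[str, Any]) -> List[str]:
    """Return human-readable violations found in a JSON plan document."""
    violations: List[str] = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        rtype = rc.get("type", "")
        address = rc.get("address", "<unknown>")
        # A replacement shows up as ["delete", "create"], so it is caught too.
        if "delete" in actions and rtype in DATABASE_TYPES:
            violations.append(f"{address}: plan would destroy a database")
        if "create" in actions and rtype in BLOCKED_TYPES:
            violations.append(f"{address}: plan would create blocked type {rtype}")
    return violations
```

A pipeline step could load the JSON file with json.load, call plan_violations, and exit non-zero when the list is not empty.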

Terratest (by Gruntwork, since 2018) is the long-standing standard for IaC testing. Tests are written in Go.

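package test

import (
	"testing"
	"time"

	http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
	"github.com/gruntwork-io/terratest/modules/random"
	"github.com/gruntwork-io/terratest/modules/terraform"
)

func TestWebServer(t *testing.T) {
	opts := &terraform.Options{
		TerraformDir: "../examples/web-server",
		Vars: map[string]interface{}{
			"environment": "test",
			"name_suffix": random.UniqueId(), // avoid name collisions
		},
	}
	defer terraform.Destroy(t, opts) // clean up even if an assertion fails
	terraform.InitAndApply(t, opts)
	url := terraform.Output(t, opts, "endpoint_url")
	// Retry up to 10 times, 5 seconds apart, until the endpoint returns 200 "OK"
	http_helper.HttpGetWithRetry(t, url, nil, 200, "OK", 10, 5*time.Second)
}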

Strengths:

  • Maturity - 20+ helper packages for AWS, Azure, DNS, HTTP, SSH, Docker, Kubernetes, etc.
  • AI support - LLMs generate Terratest/Go code effectively because of abundant training data
  • Full language power - loops, error handling, retries, custom assertions

Limitations:

  • Requires learning Go
  • Can only validate via Terraform outputs - cannot access internal module state directly

Introduced in Terraform/OpenTofu v1.6, the native framework lets you write tests in HCL - no second language required.

tests/web_server.tftest.hcl
variables {
  environment = "test"
}

run "create_web_server" {
  command = apply

  assert {
    condition     = aws_instance.web.tags["Environment"] == "test"
    error_message = "Instance was not tagged with the correct environment."
  }

  assert {
    condition     = output.endpoint_url != ""
    error_message = "Endpoint URL output must not be empty."
  }
}

Strengths:

  • No Go required - tests are pure HCL
  • Internal module access - can reference local values, internal resource attributes, and data sources directly (no need to expose them as outputs)
  • Mocks (v1.7+) - replace real providers with fakes; computed attributes get generated defaults (0 for numbers, false for bools, a random string for strings) unless you inject specific values via override_resource / override_data blocks

mock_provider "aws" {
  override_resource {
    target = aws_instance.web
    values = {
      id        = "i-mock123"
      public_ip = "10.0.0.1"
    }
  }
}

run "plan_only" {
  command = plan

  assert {
    condition     = aws_instance.web.instance_type == "t3.micro"
    error_message = "Expected t3.micro instance type."
  }
}

Mocks avoid provisioning real infrastructure entirely - tests run in seconds and cost nothing.

Limitations:

  • Version-locked - cannot test code targeting older Terraform versions (e.g., you cannot use mocks with v1.6, and the framework itself doesn’t exist before v1.6)
  • AI support is weak - LLMs currently struggle with the native framework due to insufficient training data. Teams relying heavily on AI-generated tests should use Terratest until models catch up.

How you manage the cloud environments where tests run has a major impact on speed, reliability, and cost.

| Pattern | How it works | Speed | Reliability | Trade-off |
| --- | --- | --- | --- | --- |
| Persistent stack | Always running; pipeline updates it | ✅ Fast | ⚠️ Can get “wedged” | Failed updates may block the entire team |
| Ephemeral stack | Created from scratch, destroyed after every run | ❌ Slow | ✅ Clean every time | Too slow for rapid development; destroy can fail, requiring cleanup tools |
| Dual persistent + ephemeral | Both run in parallel | Mixed | Mixed | Antipattern - combines the worst of both; teams still unwedge persistent and wait for ephemeral |
| Periodic rebuild | Persistent during the day; destroyed and rebuilt nightly | ✅ Fast (daytime) | ⚠️ Masks design issues | Hides accumulated state problems that could cause production outages |
| Continuous reset | After each successful test, a background job destroys and rebuilds using the last production version | ✅ Fast | ✅ Clean | If the background rebuild fails silently, the next developer discovers a broken environment |

When provisioning a full dependency graph is too slow or introduces unrelated failures, use test fixtures - lightweight stand-ins:

Replacing upstream providers:

Instead of deploying a production-grade VPC with security controls and logging, deploy a minimal fixture with just a VPC and two subnets.

Replacing downstream consumers:

Deploy a lightweight serverless function and an external client. Assert that the client can connect through the infrastructure you’re testing (gateway, routing rules) and receive a 200 OK.
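
The external client in that check can be very small. Below is a minimal sketch of a retrying HTTP probe, assuming the fixture exposes a plain HTTP(S) URL; `http_status` and `wait_for_status` are hypothetical helper names, and the retry count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if the endpoint is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, ...


def wait_for_status(
    url: str,
    expected: int = 200,
    retries: int = 10,
    delay: float = 5.0,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the URL until it returns the expected status or retries run out."""
    for attempt in range(retries):
        if fetch(url) == expected:
            return True
        if attempt < retries - 1:
            sleep(delay)  # newly provisioned routes often need time to converge
    return False
```

The `fetch` and `sleep` parameters exist so the retry logic itself can be tested without a network.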

Orchestrate the full lifecycle: create fixtures → provision stack → run validations → consolidate results → destroy everything.
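
A minimal sketch of that lifecycle as a standalone script, with cleanup guaranteed by try/finally. Everything here is an assumption for illustration: `run_lifecycle` is a hypothetical helper, and the steps are passed in as callables that would wrap your real provisioning and validation commands.

```python
from typing import Callable, Dict, List


def run_lifecycle(
    setup_steps: List[Callable[[], None]],
    validations: Dict[str, Callable[[], None]],
    destroy: Callable[[], None],
) -> Dict[str, str]:
    """Create fixtures and stack, run validations, then always destroy."""
    results: Dict[str, str] = {}
    try:
        for step in setup_steps:  # e.g. create fixtures, provision stack
            step()
        for name, check in validations.items():
            try:
                check()
                results[name] = "passed"
            except AssertionError as exc:
                # Record the failure and keep going so results are consolidated.
                results[name] = f"failed: {exc}"
    finally:
        destroy()  # runs even if a setup step raised
    return results
```

Because the function takes plain callables, the same script can be called from a workstation or from a single CI step.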

Guidelines:

  1. Support local testing - developers must be able to run the same test scripts on their workstation before pushing. Use the exact same orchestration scripts locally and in CI.
  2. Don’t couple to the CI tool - write orchestration in standalone scripts (Bash, Python, Make). The CI stage should do nothing but call the script. This keeps tests portable across CI platforms and runnable locally.

Tests interact with real cloud APIs and need credentials.

| Approach | Recommendation |
| --- | --- |
| OIDC | ✅ Preferred - short-lived, automatically rotated, no secrets to leak |
| Static credentials | ⚠️ Avoid in CI - use only for local development if OIDC is impractical |

Multiple PRs running tests simultaneously will collide if they create identically-named resources. Inject randomness:

resource "random_string" "suffix" {
  length  = 6
  special = false
  upper   = false
}

For resources whose names stay reserved after deletion (e.g., AWS Secrets Manager secrets, which sit in a recovery window of 7 to 30 days unless force-deleted), add the random suffix directly in the module, not just in tests.

Tests will inevitably crash before cleanup runs. Protect your budget:

  • Never test in production accounts - use isolated test accounts
  • Schedule nuke jobs - run tools like aws-nuke or azure-nuke nightly via a scheduled CI job to erase everything in test accounts
  • Use resource groups (Azure) or dedicated projects (GCP) - group test infrastructure and delete the entire group or project for guaranteed cleanup

Test suites are code. Apply the same standards:

  • Clear variable names and extensive comments explaining each test’s goal
  • Review test code in PRs with the same rigour as infrastructure code
  • Refactor test suites when they become harder to maintain than the infrastructure they validate