State Management

Terraform is a stateful tool. It maintains a record of the infrastructure it manages, stored locally by default or in a remote backend, so it can compare your configuration against reality and produce accurate, minimal-change plans. This page covers why state exists, how it is structured, how to store it safely, how to manipulate it, and how to share it across projects.

Purpose of State

Terraform’s decision to use state rather than querying providers on every run is a deliberate design choice that delivers four concrete advantages:

1 · Real-World Linkage

State maps Terraform code to real-world infrastructure by recording each resource’s true vendor identifier (e.g. an AWS ARN or GCP resource ID). Without state, Terraform would have to discover infrastructure by querying vendor APIs using heuristics like resource tags - but not all resources support tags, manual tag changes make this brittle, and every vendor has a different tag-search API. State solves this cleanly.

2 · Reduced Engine Complexity

Removing state would force Terraform to implement multiple, vendor-specific lookup strategies, making the codebase harder to debug and extend. By keeping state, HashiCorp can keep providers simple and well-defined, lowering the barrier for third-party provider authors.

3 · Performance

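Because state records each resource’s identifier and last-known attributes, Terraform can address every resource directly instead of searching vendor APIs for it. By default terraform plan still refreshes each resource from the provider, but on large configurations you can pass -refresh=false and plan against the cached values alone, which is orders of magnitude faster. Fast plans keep developers in flow; slow ones break concentration and stall debugging cycles. State also accelerates subcommands like terraform graph that can reference state instead of performing a full refresh.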

4 · State-Only Resources

Some resources produce no cloud infrastructure at all - they exist only inside the state file, generating data that feeds into other resources. See State-Only Resources below.

State Security, Resiliency, and Availability

Choosing and configuring a state backend requires evaluating three properties:

Resiliency (Preventing Data Loss)

Losing or corrupting a state file is one of the worst outcomes for a Terraform team. Without state, Terraform cannot produce an accurate plan: it will either error or attempt to create all infrastructure again from scratch. Recovery is tedious: you must either manually delete cloud resources or painstakingly import them back.

Choose a backend with a proven durability track record (AWS S3 advertises 99.999999999% - “eleven nines” - object durability)
Always configure and regularly test backups - even a durable backend cannot protect against an engineer accidentally deleting the storage bucket
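As a rough sketch of both points for an S3 backend, the bucket that holds state can have object versioning enabled (so an overwritten or deleted state object can be recovered) and a prevent_destroy guard. The bucket name reuses the example used elsewhere on this page; the resource labels are illustrative, and this would normally live in a small bootstrap configuration separate from the projects that store state in the bucket.

```hcl
# Bucket that stores Terraform state (name is illustrative)
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-tf-state"

  lifecycle {
    # Make Terraform reject any plan that would destroy this bucket
    prevent_destroy = true
  }
}

# Keep prior versions of every state object so they can be restored
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}
```

Note that prevent_destroy only guards against deletion through Terraform itself; it does nothing about someone deleting the bucket in the console, which is why tested backups still matter.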

Security (Protecting Data)

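State files often contain secrets (database passwords, private keys, tokens) in plain text, so anyone who can read state can read them. Common vulnerabilities: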

Not enforcing MFA (leaves accounts vulnerable to brute-force / credential-stuffing attacks)
Accidentally misconfiguring a storage backend to allow public access

Mitigation: use a secrets manager (Vault, AWS Secrets Manager) to keep sensitive values out of your configuration in the first place. Note that some values must still flow through Terraform and will appear in state regardless.
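For the public-access risk in particular, a minimal sketch assuming an S3 state bucket named my-tf-state (the example name used elsewhere on this page) might block all public access and turn on default encryption at the bucket level. The resource labels are illustrative.

```hcl
# Block every form of public access to the state bucket
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket = "my-tf-state"

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Encrypt state objects at rest by default
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = "my-tf-state"

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
```

Encryption at rest does not hide secrets from anyone who is allowed to read the bucket, so access control remains the primary defence.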

Availability (Ensuring Access)

If the state backend is unreachable, Terraform is completely blocked - including during live incidents when engineers need to deploy emergency fixes. Measure availability in “nines”:

Uptime	Downtime per month (730 h)
99%	~7 h 18 min
99.9%	~44 min
99.99% (“four nines”)	~4 min 23 sec

Expect at least four nines from any production state backend. Avoid vendors that do not publish SLAs with financial penalties for breaches.

Remote Backends

By default Terraform stores state locally in terraform.tfstate. This is fine for experimentation but fails all production requirements - it cannot be shared across a team, cannot lock against runs on other machines, and is not backed up.

Choosing a Backend

Backends are built into Terraform and cannot be added as third-party extensions. The primary differentiator between them is storage location and authentication mechanism:

Cloud	Recommended Backend
AWS	`s3` (+ DynamoDB for locking)
Azure	`azurerm`
Google Cloud	`gcs`
Multi-cloud / SaaS	Terraform Cloud / TACOS (see below)

TACOS (Terraform Automation and Collaboration Software) - such as HashiCorp Terraform Cloud (since renamed HCP Terraform), Scalr, and env0 - provide specialized backends with additional features like audit logs, policy enforcement, and remote execution.

Backend Configuration Best Practices

Restrict access - only authorised users and CI/CD pipelines should be able to reach the backend
Enable logging and encryption explicitly if the vendor does not do so by default
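Configure state locking - some backends need extra setup to enforce the locks that prevent concurrent applies from corrupting state. S3 has traditionally relied on a separate DynamoDB table; recent Terraform releases also offer an S3-native lock file (use_lockfile), so check the backend documentation for your CLI version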
Test backups regularly - untested backups are functionally worthless during a disaster
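For the DynamoDB-based locking mentioned above, the S3 backend expects a table whose partition key is a string attribute named LockID. A minimal sketch follows; the table name matches the backend example on this page, and on-demand billing is an assumption made here because lock traffic is tiny.

```hcl
# Lock table for the S3 backend; the key must be named exactly "LockID"
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```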

The `backend` Block and Partial Configurations

terraform {
  backend "s3" {
    bucket         = "my-tf-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# Supply credentials separately at init time
terraform init -backend-config="access_key=..." -backend-config="secret_key=..."

# Or point to a separate file
terraform init -backend-config=backend.hcl
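To make the file-based variant concrete, here is one possible split, assuming the same illustrative values as the backend block above. The backend block in the configuration is left empty, and backend.hcl holds plain key/value pairs that terraform init merges in.

```hcl
# main.tf - the backend type is declared, the settings are not
terraform {
  backend "s3" {}
}

# backend.hcl - a separate file, passed with -backend-config=backend.hcl
# (shown here in the same listing for brevity; values are illustrative)
bucket         = "my-tf-state"
key            = "prod/terraform.tfstate"
region         = "us-east-1"
dynamodb_table = "terraform-locks"
encrypt        = true
```

Keeping one such file per environment lets the same configuration be initialised against different state locations without editing the code.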

The `cloud` Block (Terraform Cloud / Enterprise)

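The cloud block connects a configuration to Terraform Cloud or Terraform Enterprise. State is stored there, and by default plan and apply operations run remotely rather than on your machine (the workspace’s execution mode controls this):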

terraform {
  cloud {
    organization = "my-org"
    workspaces {
      name = "production"
    }
  }
}

Authentication is handled via terraform login, which obtains an API token and saves it in the local CLI credentials file (in plain text, so protect that file) - no hardcoded credentials required. Use either backend or cloud - not both.
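The cloud block accepts a couple of other settings worth knowing about. The sketch below is a variation on the example above, not a recommendation: the hostname and the tag are made-up values. It shows hostname, which points the CLI at a Terraform Enterprise installation, and workspaces.tags, which maps the configuration to every workspace carrying a tag instead of one named workspace.

```hcl
terraform {
  cloud {
    # Only needed for Terraform Enterprise; the default is app.terraform.io
    hostname     = "tfe.example.com" # hypothetical hostname
    organization = "my-org"

    workspaces {
      # Match every workspace with this tag instead of a single name
      tags = ["networking"] # hypothetical tag
    }
  }
}
```

With tags, terraform workspace select chooses which of the matching workspaces the current directory operates on.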

Workspaces

Workspaces let a single configuration manage multiple independent state files:

terraform workspace new staging
terraform workspace select staging
terraform workspace list

Inside configuration code, reference the current workspace name:

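resource "aws_instance" "web" {
  ami = var.ami_id # required argument; var.ami_id is a hypothetical input variable
  # Use the larger instance type in the prod workspace only
  instance_type = terraform.workspace == "prod" ? "t3.medium" : "t3.micro"
}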
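Once more than two workspaces are involved, chained conditionals become hard to read. One common alternative, sketched here with illustrative workspace names and sizes, is a map keyed by workspace name with a fallback.

```hcl
locals {
  # Per-workspace settings; keys are workspace names (illustrative)
  instance_types = {
    prod    = "t3.medium"
    staging = "t3.small"
  }

  # Fall back to t3.micro for any workspace not listed above
  instance_type = lookup(local.instance_types, terraform.workspace, "t3.micro")
}
```

Resources can then use local.instance_type without knowing which workspace is selected.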

Migrating Between Backends

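When you change the backend block, the next terraform init detects the change and refuses to continue until you choose how to handle existing state. Pass -migrate-state to copy it to the new backend (Terraform asks for confirmation first), or -reconfigure to ignore the existing state: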

terraform init -migrate-state

Always check Terraform upgrade notes - backend parameters occasionally change or are deprecated between CLI versions.

Dissecting State Structure

State is stored as JSON (terraform.tfstate for the local backend). Understanding the schema helps with troubleshooting, backup management, and writing automation that interacts with state.

Top-Level Fields

Field	Purpose
`version`	Schema version of the state format. Terraform uses this to maintain backward compatibility when the format changes.
`terraform_version`	The CLI version (e.g. `"1.7.2"`) that last wrote this state.
`lineage`	A UUID assigned when the state is first created; it stays the same for the life of that state. Backends use it to ensure one project’s state is never accidentally overwritten by another’s.
`serial`	Integer that increments by 1 on every save. Backends use it to reject pushes of older state over newer state.
`resources`	Array of every resource and data source under Terraform’s control - see below.
`outputs`	Root-module output values. Only root outputs are saved here; child-module outputs must be explicitly re-exported.
`check_results`	Pass/fail results of `check` block assertions from both root and child modules.

The `resources` Array

Each entry contains:

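mode, module, type, name - together form the unique resource address (mode distinguishes managed resources from data sources; module is absent for root-module resources)
provider - the provider that created it; Terraform uses this to detect provider switches
instances - one entry per resource instance (several when count or for_each is used), each holding attributes: every argument and computed attribute for that instance, stored in plain text, sensitive values included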

Why `outputs` Matter

Root-level outputs saved in state are what:

terraform output and terraform show display
terraform_remote_state data sources in other projects can read
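Because only root-module outputs reach state, a value produced inside a child module has to be passed up explicitly. A minimal sketch, assuming a hypothetical local module at ./modules/network that itself declares an output named private_subnet_id:

```hcl
module "network" {
  source = "./modules/network" # hypothetical module path
}

# Re-export the child module's output from the root module so it is
# written to the outputs section of state
output "private_subnet_id" {
  description = "Private subnet ID, re-exported for other projects"
  value       = module.network.private_subnet_id
}
```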

Manipulating State

Some operations - renaming a resource, moving it into a module, or stopping Terraform from managing it - require modifying state without touching real infrastructure. Take a backup before any of them, and keep it until the change has been verified:

# Backup
terraform state pull > my_backup.tfstate

# Restore (bypass serial protection with -force if reverting)
terraform state push -force my_backup.tfstate

Method 1 · Code-Driven Changes (Recommended)

Changes recorded in version control - repeatable, reviewable, safe for module authors.

moved block (Terraform v1.1+) - tell Terraform a resource was renamed or re-scoped:

moved {
  from = aws_instance.old_name
  to   = aws_instance.new_name
}

Safe to leave in permanently - in fresh environments where the old address never existed, Terraform simply ignores it.
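The same block covers re-scoping. As a sketch, assuming a resource that used to be declared at the root as aws_instance.web and has since been moved into a module call named compute (both names are illustrative):

```hcl
# Old address: root module. New address: inside module "compute".
moved {
  from = aws_instance.web
  to   = module.compute.aws_instance.web
}
```

On the next plan Terraform should report the resource as moved rather than proposing to destroy and recreate it.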

removed block (Terraform v1.7+) - stop managing a resource without destroying it:

removed {
  from = aws_instance.legacy
  lifecycle {
    destroy = false
  }
}

Method 2 · CLI-Driven Changes

Use when code blocks are insufficient.

Command	Purpose
`terraform state list`	List all resource addresses in current state
`terraform state rm <address>`	Remove a resource from state without destroying it. ⚠️ Also remove it from config or Terraform will recreate it next run.
`terraform state replace-provider <old> <new>`	Update the recorded provider for resources (e.g. when switching to a fork or dev build)

Method 3 · Manually Editing State (Last Resort)

Only attempt if state is corrupted and all other tools have failed.

Always have a backup first
Run edits through a JSON validator - state format is unforgiving
Manually increment the serial field by 1 so terraform state push accepts it without -force
Diff the file before and after to confirm only the intended changes were made

State Drift

State drift occurs when infrastructure changes outside of Terraform, causing the real world to diverge from saved state. Terraform detects drift during the refresh phase at the start of every terraform plan.

Before applying a plan that resolves drift, always ask: What changed, and why?

Categories of Drift

1 · Accidental Manual Changes (Human Error)

Cause: An engineer modified the wrong resource - wrong account, wrong command.

Fix: It is usually safe to run terraform plan, review the output, and apply - Terraform will revert the resource to the desired state.

Prevention: Restrict direct production access; enforce all changes through CI/CD pipelines.

2 · Intentional Manual Changes

Cause: Emergency intervention - an on-call engineer manually fixes an outage.

Danger: If Terraform runs before the code is updated, it will revert the emergency fix.

Fix: Update the Terraform code to reflect the emergency changes before running Terraform again. Establish a policy that all changes - including emergency ones - are eventually codified.

3 · Conflicting Automated Changes

Cause: External systems change managed resources as designed - an autoscaling group adjusting instance counts, a cloud vendor applying a minor database version upgrade, an orchestrator adding tags.

Fix: Since these are intentional, use the ignore_changes lifecycle rule to tell Terraform to disregard specific attributes:

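lifecycle {
  # Attribute names depend on the resource type; desired_count is what
  # an aws_ecs_service exposes, for example
  ignore_changes = [desired_count, tags["LastApplied"]]
}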

For expected automated upgrades (minor DB versions, etc.), a terraform apply -refresh-only harmlessly updates state to match reality without touching the infrastructure.
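To show ignore_changes in the context of a whole resource, here is a sketch for the autoscaling case. The two input variables and all sizes are assumptions made for illustration; on an aws_autoscaling_group the attribute that scaling policies adjust is desired_capacity.

```hcl
variable "launch_template_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

resource "aws_autoscaling_group" "web" {
  name                = "web"
  min_size            = 1
  max_size            = 10
  desired_capacity    = 2 # initial value only; scaling policies change it later
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  lifecycle {
    # Do not treat capacity changes made by autoscaling as drift
    ignore_changes = [desired_capacity]
  }
}
```

Terraform still sets desired_capacity when the group is first created; it only stops planning updates to that attribute afterwards.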

4 · Terraform Errors (Failed State Write)

Cause: Terraform successfully created or modified infrastructure but crashed or lost backend connectivity before writing the result to state.

Danger: On the next run, Terraform has no record of the new resources and will attempt to create them again - producing duplicate infrastructure or blocked deployments.

Fix:

Review logs carefully to identify what Terraform created before the failure
Import orphaned resources: terraform import <address> <cloud-resource-id>
Or manually delete the cloud resources and let Terraform recreate them cleanly
If the state file is fully corrupted, restore from a tested backup
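On Terraform v1.5 and later, the import step can also be written in configuration rather than run as a one-off command, which makes it reviewable alongside the code. A minimal sketch; the instance ID and the ami_id variable are hypothetical.

```hcl
variable "ami_id" {
  type = string
}

# Adopt an instance that exists in the cloud but is missing from state
import {
  to = aws_instance.web
  id = "i-0123456789abcdef0" # hypothetical instance ID
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"
}
```

The next terraform plan shows the resource as an import, and terraform apply writes it into state. The import block can be deleted afterwards.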

Accessing State Across Projects

As configurations grow, splitting into multiple projects improves plan performance, isolates long-running resources, and aligns with team ownership (Conway’s Law). Cross-project state access lets independent projects share data without duplicating code.

The `terraform_remote_state` Data Source

data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "my-tf-state"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }

  defaults = {
    # Soft dependency - project still runs if networking isn't ready yet.
    # The key must match the output that is actually read below.
    private_subnet_id = null
  }
}

resource "aws_instance" "app" {
  subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_id
}

Key characteristics:

Built into Terraform - not an external provider
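Read-only - cannot accidentally modify the remote project’s infrastructure. The reader does need access to the entire remote state snapshot, though, including any sensitive values stored in it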
Root outputs only - can only read outputs defined in the remote project’s root module; child-module outputs must be explicitly re-exported

Alternatives to Remote State

Before reaching for terraform_remote_state, evaluate these safer options:

Alternative	How	Trade-off
Native data sources	Look up physical resources via vendor API (e.g. AWS tags, GCP labels)	Requires strict tagging; doesn’t expose full state
Input variables	Pass required values as module inputs	Safer; more manual work to keep values in sync
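As an illustration of the native data source alternative on AWS, the lookup below finds a VPC and its private subnets by tag instead of reading another project’s state. The tag keys and values are assumptions; they only work if the networking project applies them consistently.

```hcl
# Find the VPC by tag (tag value is illustrative)
data "aws_vpc" "core" {
  tags = {
    Name = "core"
  }
}

# Find subnets in that VPC that carry a Tier = private tag
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.core.id]
  }

  tags = {
    Tier = "private"
  }
}

output "private_subnet_ids" {
  value = data.aws_subnets.private.ids
}
```

This needs only read access to the relevant cloud APIs, not to the other project’s state.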

Structuring Remote State Access Cleanly

Top-level injection - place all terraform_remote_state calls in the root module only; pass looked-up values down as input variables to child modules. Keeps core modules reusable and free of remote-state logic.
Dedicated lookup modules - for internal organisations, build a module whose sole purpose is to perform a specific remote-state lookup (e.g. a network-lookup module). Encapsulates the access pattern.
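A sketch of the top-level injection pattern, reusing the remote-state configuration from earlier on this page. The module path and its subnet_id input are hypothetical.

```hcl
# Root module: the only place that knows remote state exists
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "my-tf-state"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

# The child module receives a plain string and stays reusable
module "app" {
  source    = "./modules/app" # hypothetical module path
  subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_id
}
```

Inside ./modules/app, subnet_id is an ordinary input variable, so the module can be tested or reused with any subnet ID, whatever its source.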

State-Only Resources

Some Terraform resources generate no cloud infrastructure. They calculate values, persist them in state between runs, and act as stable, plan-time-known data sources for other resources.

Why State-Only Resources Exist

Native Terraform functions like timestamp() and uuid() are impure - they return a different value every time they are called. Using them inside resource arguments causes a perpetual diff: Terraform always detects a change, even immediately after a successful apply. State-only resources solve this by generating the value once, saving it to state, and keeping it stable across plans.

The Random Provider

Generates random passwords, UUIDs, integers, and readable names:

resource "random_password" "db" {
  length  = 24
  special = true

  keepers = {
    # Regenerate the password only when the DB instance name changes
    db_instance = var.db_instance_name
  }
}

resource "aws_db_instance" "main" {
  # Other required arguments (engine, instance_class, ...) omitted for brevity
  password = random_password.db.result
}
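Another frequent use of the provider is generating a stable suffix for names that must be globally unique. A minimal sketch with an illustrative bucket name prefix:

```hcl
# Generated once, stored in state, and reused on every later plan
resource "random_id" "bucket_suffix" {
  byte_length = 4
}

resource "aws_s3_bucket" "artifacts" {
  # 4 random bytes are rendered as 8 hex characters
  bucket = "artifacts-${random_id.bucket_suffix.hex}"
}
```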

The Time Provider

Avoids drift from the impure timestamp() function:

Resource	Purpose
`time_static`	Records the timestamp of first creation; never changes
`time_offset`	Records a time offset (e.g. 2 hours in the future)
`time_rotating`	Holds a timestamp that automatically regenerates after a configured lifespan
`time_sleep`	Introduces a deliberate delay in the execution graph - useful for waiting for a provisioning script on one machine to finish before launching a dependent resource
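A short sketch of the first and third resources in the table. The 30-day lifespan and the way the values are consumed are assumptions chosen for illustration; the random_password resource comes from the Random provider described above.

```hcl
# Captured on first apply and never changed afterwards
resource "time_static" "created" {}

# Replaced automatically once 30 days have passed
resource "time_rotating" "key_rotation" {
  rotation_days = 30
}

locals {
  # A stable creation timestamp, unlike timestamp()
  common_tags = {
    CreatedAt = time_static.created.rfc3339
  }
}

# Regenerate the password whenever the rotation timestamp changes
resource "random_password" "api_key" {
  length = 32

  keepers = {
    rotation = time_rotating.key_rotation.rfc3339
  }
}
```

Rotation is only noticed when Terraform runs, so a scheduled plan and apply is still needed for the new value to be generated.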

The Null Provider (`null_resource`)

null_resource fully implements the Terraform lifecycle (create/update/delete) but does absolutely nothing. Primary uses:

Testing CI/CD pipelines and Terraform automation without spinning up real infrastructure
Historically: running custom provisioner blocks

`terraform_data` (Terraform v1.4+)

The modern replacement for null_resource, built directly into Terraform - no provider download required.

# Work around a replace_triggered_by limitation: it can reference only
# resources, not local values, so wrap the local in a terraform_data resource
resource "terraform_data" "trigger" {
  input = local.config_hash
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  lifecycle {
    replace_triggered_by = [terraform_data.trigger]
  }
}

Prefer terraform_data over null_resource in all new code.
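For the provisioner use case, the two resources look very similar side by side. This is a sketch, with a hypothetical script_version variable and a placeholder command; in practice only one of the two resources would be kept.

```hcl
variable "script_version" {
  type    = string
  default = "1"
}

# Older pattern: requires the hashicorp/null provider
resource "null_resource" "bootstrap" {
  triggers = {
    script_version = var.script_version
  }

  provisioner "local-exec" {
    command = "echo bootstrapping"
  }
}

# Built-in equivalent: triggers_replace takes the place of triggers
resource "terraform_data" "bootstrap" {
  triggers_replace = [var.script_version]

  provisioner "local-exec" {
    command = "echo bootstrapping"
  }
}
```

In both cases, changing script_version replaces the resource, which reruns the provisioner.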