State Management

Terraform is a stateful tool. It maintains a record of the infrastructure it manages (stored locally by default) so it can compare your configuration against reality and produce accurate, minimal-change plans. This page covers why state exists, how it is structured, how to store it safely, how to manipulate it, and how to share it across projects.


Terraform’s decision to use state rather than querying providers on every run is a deliberate design choice that delivers four concrete advantages:

State maps Terraform code to real-world infrastructure by recording each resource’s true vendor identifier (e.g. an AWS ARN or GCP resource ID). Without state, Terraform would have to discover infrastructure by querying vendor APIs using heuristics like resource tags - but not all resources support tags, manual tag changes make this brittle, and every vendor has a different tag-search API. State solves this cleanly.

Removing state would force Terraform to implement multiple, vendor-specific lookup strategies, making the codebase harder to debug and extend. By keeping state, HashiCorp can keep providers simple and well-defined, lowering the barrier for third-party provider authors.

Looking up a resource from a saved identifier in the state file is orders of magnitude faster than querying external vendor APIs on every plan. Fast plans keep developers in flow; slow ones break concentration and stall debugging cycles. State also accelerates subcommands like terraform graph that can reference state instead of performing a full refresh.

Some resources produce no cloud infrastructure at all - they exist only inside the state file, generating data that feeds into other resources. See State-Only Resources below.


State Security, Resiliency, and Availability

Choosing and configuring a state backend requires evaluating three properties:

Losing or corrupting a state file is one of the worst outcomes for a Terraform team. Without state, Terraform cannot produce a meaningful plan and will either error or attempt to re-create all infrastructure from scratch. Recovery is tedious: you must either manually delete the cloud resources or painstakingly import them back into state.

  • Choose a backend with a proven durability track record (AWS S3 advertises 99.999999999% - “eleven nines” - object durability)
  • Always configure and regularly test backups - even a durable backend cannot protect against an engineer accidentally deleting the storage bucket

Common vulnerabilities:

  • Not enforcing MFA (leaves accounts vulnerable to brute-force / credential-stuffing attacks)
  • Accidentally misconfiguring a storage backend to allow public access

Mitigation: use a secrets manager (Vault, AWS Secrets Manager) to keep sensitive values out of your configuration in the first place. Note that some values must still flow through Terraform and will appear in state regardless.
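As a minimal sketch of this mitigation, assuming the AWS provider and a secret that already exists in AWS Secrets Manager under the hypothetical name prod/db/password, the configuration can look the value up rather than hardcode it. The looked-up value still passes through Terraform and is written to state, so this keeps the secret out of version control but does not remove the need to protect the backend.

```hcl
# Hypothetical secret name; the secret is assumed to be created outside Terraform.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db/password"
}

resource "aws_db_instance" "main" {
  # Other required arguments (engine, instance_class, ...) omitted for brevity.
  username = "app"

  # The value never appears in the configuration, but it is still saved in state.
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```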

If the state backend is unreachable, Terraform is completely blocked - including during live incidents when engineers need to deploy emergency fixes. Measure availability in “nines”:

| Uptime | Downtime per month |
| --- | --- |
| 99% | ~7 h 18 min |
| 99.9% | ~44 min |
| 99.99% (“four nines”) | ~4 min 23 sec |

Expect at least four nines from any production state backend. Avoid vendors that do not publish SLAs with financial penalties for breaches.


By default Terraform stores state locally in terraform.tfstate. This is fine for experimentation but fails all production requirements - it cannot be shared across a team, has no locking, and is not backed up.

Backends are built into Terraform and cannot be added as third-party extensions. The primary differentiator between them is storage location and authentication mechanism:

| Cloud | Recommended Backend |
| --- | --- |
| AWS | s3 (+ DynamoDB for locking) |
| Azure | azurerm |
| Google Cloud | gcs |
| Multi-cloud / SaaS | Terraform Cloud / TACOS (see below) |

TACOS (Terraform Automation and Collaboration Software) - such as HashiCorp Terraform Cloud, Scalr, and Env0 - provide specialized backends with additional features like audit logs, policy enforcement, and remote execution.

  • Restrict access - only authorised users and CI/CD pipelines should be able to reach the backend
  • Enable logging and encryption explicitly if the vendor does not do so by default
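  • Configure state locking - some backends need extra setup; the S3 backend has traditionally relied on a separate resource (a DynamoDB table) to enforce locks that prevent concurrent applies from corrupting state, and newer Terraform releases also offer an S3-native lock file (use_lockfile), so check the backend documentation for your CLI version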
  • Test backups regularly - untested backups are functionally worthless during a disaster
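The checklist above could translate into something like the following for an S3 backend. This is a sketch under stated assumptions, not a complete hardening guide: it assumes the AWS provider, the bucket and table names are illustrative, and access restriction through IAM policies is left out.

```hcl
# Illustrative names; state buckets are usually created in a separate bootstrap project.
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-tf-state"
}

# Versioning keeps earlier copies of the state object for rollback.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest explicitly.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Guard against accidentally exposing the bucket publicly.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table for the S3 backend; the hash key must be named LockID.
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Versioning helps with rollback, but it does not replace tested backups: deleting the bucket removes every version with it.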

The backend Block and Partial Configurations

The backend block holds the non-secret settings. Anything left out of it (such as credentials) can be supplied when terraform init runs, which is known as a partial configuration:

terraform {
  backend "s3" {
    bucket         = "my-tf-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# Supply credentials separately at init time
terraform init -backend-config="access_key=..." -backend-config="secret_key=..."
# Or point to a separate file
terraform init -backend-config=backend.hcl
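To illustrate the file-based form, here is a sketch of how the split might look when every setting is moved out of the code. The file name backend.hcl comes from the command above; the values are illustrative, and the main.tf file name is an assumption.

```hcl
# main.tf: only the backend type is fixed in code; all settings arrive at init time.
terraform {
  backend "s3" {}
}
```

```hcl
# backend.hcl: plain key/value arguments for the backend, passed with
# terraform init -backend-config=backend.hcl
bucket         = "my-tf-state"
key            = "prod/terraform.tfstate"
region         = "us-east-1"
dynamodb_table = "terraform-locks"
encrypt        = true
```

Keeping one such file per environment lets the same configuration initialise against different buckets or keys without editing the code.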

The cloud Block (Terraform Cloud / Enterprise)

The cloud block is a special backend variant that stores state in Terraform Cloud or Terraform Enterprise and, by default, executes plan and apply operations remotely rather than locally:

terraform {
  cloud {
    organization = "my-org"

    workspaces {
      name = "production"
    }
  }
}

Authentication is handled via terraform login (stores an API token in the local CLI configuration) - no hardcoded credentials required. Use either backend or cloud - not both.

Workspaces let a single configuration manage multiple independent state files:

terraform workspace new staging
terraform workspace select staging
terraform workspace list

Inside configuration code, reference the current workspace name:

resource "aws_instance" "web" {
  instance_type = terraform.workspace == "prod" ? "t3.medium" : "t3.micro"
}
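When more than two workspaces are involved, a conditional becomes awkward. One possible alternative, sketched here with hypothetical workspace names and an assumed ami_id input variable, is a map keyed by workspace name with a fallback:

```hcl
variable "ami_id" {
  type        = string
  description = "Hypothetical input: AMI to launch"
}

locals {
  # Hypothetical per-workspace settings; keys must match workspace names.
  instance_types = {
    prod    = "t3.medium"
    staging = "t3.small"
  }

  # Fall back to a small instance for any workspace not listed above.
  instance_type = lookup(local.instance_types, terraform.workspace, "t3.micro")
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = local.instance_type
}
```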

When you change the backend block, running terraform init detects the change and prompts you to migrate existing state:

terraform init -migrate-state

Always check Terraform upgrade notes - backend parameters occasionally change or are deprecated between CLI versions.


State is stored as JSON (terraform.tfstate for the local backend). Understanding the schema helps with troubleshooting, backup management, and writing automation that interacts with state.

| Field | Purpose |
| --- | --- |
| version | Schema version of the state format. Terraform uses this to maintain backward compatibility when the format changes. |
| terraform_version | The CLI version (e.g. "1.7.2") that last wrote this state. |
| lineage | A unique ID assigned when the state is first created; it never changes afterward. Backends use it to ensure one project’s state is never accidentally overwritten by another’s. |
| serial | Integer that increments by 1 on every save. Backends use it to reject pushes of older state over newer state. |
| resources | Array of every resource and data source under Terraform’s control - see below. |
| outputs | Root-module output values. Only root outputs are saved here; child-module outputs must be explicitly re-exported. |
| check_results | Pass/fail results of check block assertions from both root and child modules. |

Each entry in the resources array contains:

  • module, type, name - together form the unique resource address
  • provider - the provider that created it; Terraform uses this to detect provider switches
  • attributes - every argument and computed attribute for that resource, stored in plain text

Root-level outputs saved in state are what:

  • terraform output and terraform show display
  • terraform_remote_state data sources in other projects can read
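Because only root-module outputs are saved, a value produced inside a child module has to be re-exported from the root before other projects can read it. A minimal sketch, assuming a hypothetical local module at ./modules/network that declares a vpc_id output:

```hcl
module "network" {
  # Hypothetical module path; assumed to declare an output named vpc_id.
  source = "./modules/network"
}

# Re-export the child-module output at the root so it is written to the
# outputs section of state and becomes readable by other projects.
output "vpc_id" {
  description = "ID of the VPC created by the network module"
  value       = module.network.vpc_id
}
```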

Some operations - renaming a resource, moving it into a module, or stopping Terraform from managing it - require modifying state without touching real infrastructure.

# Backup
terraform state pull > my_backup.tfstate
# Restore (bypass serial protection with -force if reverting)
terraform state push -force my_backup.tfstate

Method 1 · Code-Driven Changes (Recommended)

Changes recorded in version control - repeatable, reviewable, safe for module authors.

moved block - tell Terraform a resource was renamed or re-scoped:

moved {
  from = aws_instance.old_name
  to   = aws_instance.new_name
}

Safe to leave in permanently - in fresh environments where the old address never existed, Terraform simply ignores it.
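The same block covers re-scoping as well as renaming. As a sketch, assuming a resource that used to live in the root module has been moved into a hypothetical child module called compute:

```hcl
module "compute" {
  # Hypothetical module that now declares resource "aws_instance" "web".
  source = "./modules/compute"
}

# Tell Terraform the existing instance now lives at the module address,
# so the plan shows a move rather than a destroy and re-create.
moved {
  from = aws_instance.web
  to   = module.compute.aws_instance.web
}
```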

removed block - stop managing a resource without destroying it:

removed {
  from = aws_instance.legacy

  lifecycle {
    destroy = false
  }
}

Method 2 · State CLI Commands

Use the terraform state subcommands when the moved and removed blocks are insufficient.

| Command | Purpose |
| --- | --- |
| terraform state list | List all resource addresses in current state |
| terraform state rm <address> | Remove a resource from state without destroying it. ⚠️ Also remove it from config or Terraform will recreate it next run. |
| terraform state replace-provider <old> <new> | Update the recorded provider for resources (e.g. when switching to a fork or dev build) |

Method 3 · Manually Editing State (Last Resort)

Only attempt if state is corrupted and all other tools have failed.

  • Always have a backup first
  • Run edits through a JSON validator - state format is unforgiving
  • Manually increment the serial field by 1 so terraform state push accepts it without -force
  • Diff the file before and after to confirm only the intended changes were made

State drift occurs when infrastructure changes outside of Terraform, causing the real world to diverge from saved state. Terraform detects drift during the refresh phase at the start of every terraform plan.

Before applying a plan that resolves drift, always ask: What changed, and why?

1 · Accidental Manual Changes (Human Error)

Cause: An engineer modified the wrong resource - wrong account, wrong command.

Fix: It is usually safe to simply run terraform plan and terraform apply - Terraform will revert the resource to the desired state.

Prevention: Restrict direct production access; enforce all changes through CI/CD pipelines.

2 · Intentional Manual Changes (Emergency Fixes)

Cause: Emergency intervention - an on-call engineer manually fixes an outage.

Danger: If Terraform runs before the code is updated, it will revert the emergency fix.

Fix: Update the Terraform code to reflect the emergency changes before running Terraform again. Establish a policy that all changes - including emergency ones - are eventually codified.

3 · Automated Changes by External Systems

Cause: External systems change managed resources as designed - an autoscaling group adjusting instance counts, a cloud vendor applying a minor database version upgrade, an orchestrator adding tags.

Fix: Since these are intentional, use the ignore_changes lifecycle rule to tell Terraform to disregard specific attributes:

lifecycle {
  ignore_changes = [desired_count, tags["LastApplied"]]
}
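For context, here is a sketch of where such a rule sits inside a full resource, using the autoscaling case mentioned above. It assumes the AWS provider; the two input variables are hypothetical stand-ins for values defined elsewhere.

```hcl
variable "subnet_ids" {
  type = list(string) # hypothetical input
}

variable "launch_template_id" {
  type = string # hypothetical input
}

resource "aws_autoscaling_group" "web" {
  min_size            = 2
  max_size            = 10
  desired_capacity    = 2 # initial value only; the autoscaler owns it afterwards
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  lifecycle {
    # Scaling activity changes desired_capacity by design, so it is not drift.
    ignore_changes = [desired_capacity]
  }
}
```

Listing only the specific attributes that external systems own keeps Terraform in charge of everything else on the resource.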

For expected automated upgrades (minor DB versions, etc.), a terraform apply -refresh-only harmlessly updates state to match reality without touching the infrastructure.

4 · Terraform Errors (Failed State Write)

Cause: Terraform successfully created or modified infrastructure but crashed or lost backend connectivity before writing the result to state.

Danger: On the next run, Terraform has no record of the new resources and will attempt to create them again - producing duplicate infrastructure or blocked deployments.

Fix:

  • Review logs carefully to identify what Terraform created before the failure
  • Import orphaned resources: terraform import <address> <cloud-resource-id>
  • Or manually delete the cloud resources and let Terraform recreate them cleanly
  • If the state file is fully corrupted, restore from a tested backup
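As an alternative to the terraform import command, Terraform 1.5 and later also accept an import block, which records the import in code so it can be reviewed like any other change. A minimal sketch, with a hypothetical instance ID and input variable:

```hcl
variable "ami_id" {
  type = string # hypothetical input; must match the orphaned instance's AMI
}

# Adopt the orphaned instance into state on the next plan/apply.
import {
  to = aws_instance.web
  id = "i-0123456789abcdef0" # hypothetical cloud resource ID taken from the logs
}

# The configuration should describe the resource as it actually exists,
# otherwise the plan will propose changes right after the import.
resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"
}
```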

As configurations grow, splitting into multiple projects improves plan performance, isolates long-running resources, and aligns with team ownership (Conway’s Law). Cross-project state access lets independent projects share data without duplicating code.

data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "my-tf-state"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }

  defaults = {
    # Soft dependency - project still runs if networking isn't ready yet
    private_subnet_id = null
  }
}

resource "aws_instance" "app" {
  # Other required arguments (ami, instance_type) omitted for brevity
  subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_id
}

Key characteristics:

  • Built into Terraform - not an external provider
  • Read-only - cannot accidentally modify the remote project’s infrastructure
  • Root outputs only - can only read outputs defined in the remote project’s root module; child-module outputs must be explicitly re-exported

Before reaching for terraform_remote_state, evaluate these safer options:

| Alternative | How | Trade-off |
| --- | --- | --- |
| Native data sources | Look up physical resources via vendor API (e.g. AWS tags, GCP labels) | Requires strict tagging; doesn’t expose full state |
| Input variables | Pass required values as module inputs | Safer; more manual work to keep values in sync |

  • Top-level injection - place all terraform_remote_state calls in the root module only; pass looked-up values down as input variables to child modules. Keeps core modules reusable and free of remote-state logic.
  • Dedicated lookup modules - for internal organisations, build a module whose sole purpose is to perform a specific remote-state lookup (e.g. a network-lookup module). Encapsulates the access pattern.
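The following sketch combines the native data source alternative with top-level injection: the root module performs the lookups and hands plain values to a child module. It assumes the AWS provider; the tag values, module path, and module input names are hypothetical.

```hcl
# Root module: find the shared VPC by tag instead of reading another project's state.
data "aws_vpc" "shared" {
  tags = {
    Name = "shared-network" # hypothetical tag applied by the networking project
  }
}

# Find that VPC's private subnets, again by tag.
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }

  tags = {
    Tier = "private" # hypothetical tag
  }
}

# The child module only sees input variables, so it stays reusable and
# contains no lookup logic of its own.
module "app" {
  source     = "./modules/app" # hypothetical module with vpc_id and subnet_ids inputs
  vpc_id     = data.aws_vpc.shared.id
  subnet_ids = data.aws_subnets.private.ids
}
```

Swapping the two data sources for a terraform_remote_state lookup later would not require touching the child module.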

Some Terraform resources generate no cloud infrastructure. They calculate values, persist them in state between runs, and act as stable, plan-time-known data sources for other resources.

Native Terraform functions like timestamp() and uuid() are impure - they return a different value every time they are called. Using them inside resource arguments causes eternal drift: Terraform always detects a change, even immediately after a successful apply. State-only resources solve this by generating the value once, saving it to state, and keeping it stable across plans.

The random provider generates random passwords, UUIDs, integers, and readable names:

resource "random_password" "db" {
  length  = 24
  special = true

  keepers = {
    # Regenerate the password only when the DB instance name changes
    db_instance = var.db_instance_name
  }
}

resource "aws_db_instance" "main" {
  password = random_password.db.result
}

The time provider avoids drift from the impure timestamp() function:

| Resource | Purpose |
| --- | --- |
| time_static | Records the timestamp of first creation; never changes |
| time_offset | Records a time offset (e.g. 2 hours in the future) |
| time_rotating | Holds a timestamp that automatically regenerates after a configured lifespan |
| time_sleep | Introduces a deliberate delay in the execution graph - useful for waiting for a provisioning script on one machine to finish before launching a dependent resource |
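A short sketch of two of these resources in use, assuming the hashicorp/time, hashicorp/random, and AWS providers; the bucket name and the 90-day lifespan are illustrative.

```hcl
# Captured once at creation and then read back from state on every later plan.
resource "time_static" "created" {}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket" # hypothetical name

  tags = {
    # Stable value; using timestamp() here would produce a diff on every plan.
    CreatedAt = time_static.created.rfc3339
  }
}

# Regenerates once the configured lifespan has passed.
resource "time_rotating" "key_rotation" {
  rotation_days = 90
}

resource "random_password" "api_key" {
  length = 32

  keepers = {
    # A new rotation timestamp forces a new password.
    rotation = time_rotating.key_rotation.id
  }
}
```

Rotation only takes effect when Terraform actually runs, so a scheduled plan and apply is still needed for the new value to be generated.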

null_resource fully implements the Terraform lifecycle (create/update/delete) but does absolutely nothing. Primary uses:

  • Testing CI/CD pipelines and Terraform automation without spinning up real infrastructure
  • Historically: running custom provisioner blocks

terraform_data is the modern replacement for null_resource, built directly into Terraform - no provider download required.

# Solve the replace_triggered_by limitation: lifecycle rules cannot
# natively reference local variables, but they can reference resources
resource "terraform_data" "trigger" {
  input = local.config_hash
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  lifecycle {
    replace_triggered_by = [terraform_data.trigger]
  }
}

Prefer terraform_data over null_resource in all new code.
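terraform_data can also take over the provisioner use case historically served by null_resource. A minimal sketch, assuming a hypothetical app_version variable and a harmless echo command standing in for a real script:

```hcl
variable "app_version" {
  type    = string
  default = "1.0.0" # hypothetical value
}

resource "terraform_data" "announce_deploy" {
  # Replacing the resource re-runs its provisioners, so the command below
  # executes again whenever app_version changes.
  triggers_replace = [var.app_version]

  provisioner "local-exec" {
    command = "echo Deployed version ${var.app_version}"
  }
}
```

Here triggers_replace plays the role that the triggers map played on null_resource.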