State Management
Terraform is a stateful tool. It maintains a local record of the infrastructure it manages so it can compare your configuration against reality and produce accurate, minimal-change plans. This page covers why state exists, how it is structured, how to store it safely, how to manipulate it, and how to share it across projects.
Purpose of State
Terraform’s decision to use state rather than querying providers on every run is a deliberate design choice that delivers four concrete advantages:
1 · Real-World Linkage
State maps Terraform code to real-world infrastructure by recording each resource’s true vendor identifier (e.g. an AWS ARN or GCP resource ID). Without state, Terraform would have to discover infrastructure by querying vendor APIs using heuristics like resource tags - but not all resources support tags, manual tag changes make this brittle, and every vendor has a different tag-search API. State solves this cleanly.
2 · Reduced Engine Complexity
Removing state would force Terraform to implement multiple, vendor-specific lookup strategies, making the codebase harder to debug and extend. By keeping state, HashiCorp can keep providers simple and well-defined, lowering the barrier for third-party provider authors.
3 · Performance
Looking up a resource from a saved identifier in the state file is orders of magnitude faster than querying external vendor APIs on every plan. Fast plans keep developers in flow; slow ones break concentration and stall debugging cycles. State also accelerates subcommands like terraform graph that can reference state instead of performing a full refresh.
4 · State-Only Resources
Some resources produce no cloud infrastructure at all - they exist only inside the state file, generating data that feeds into other resources. See State-Only Resources below.
State Security, Resiliency, and Availability
Choosing and configuring a state backend requires evaluating three properties:
Resiliency (Preventing Data Loss)
Losing or corrupting a state file is one of the worst outcomes for a Terraform team. Without state, Terraform no longer knows which resources it manages, so it will either error or attempt to re-create all infrastructure from scratch. Recovery is tedious: you must either manually delete cloud resources or painstakingly import them back.
- Choose a backend with a proven durability track record (AWS S3 advertises 99.999999999% - “eleven nines” - object durability)
- Always configure and regularly test backups - even a durable backend cannot protect against an engineer accidentally deleting the storage bucket
Security (Protecting Data)
Common vulnerabilities:
- Not enforcing MFA (leaves accounts vulnerable to brute-force / credential-stuffing attacks)
- Accidentally misconfiguring a storage backend to allow public access
Mitigation: use a secrets manager (Vault, AWS Secrets Manager) to keep sensitive values out of your configuration in the first place. Note that some values must still flow through Terraform and will appear in state regardless.
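As a minimal sketch of that mitigation, the following reads a database password from AWS Secrets Manager instead of hardcoding it. The secret name, database identifier, and sizing values are hypothetical, and the secret is assumed to exist already.

```hcl
# Assumption: a secret with this name already exists in AWS Secrets Manager.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db/password" # hypothetical secret name
}

resource "aws_db_instance" "main" {
  identifier          = "app-db" # hypothetical
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app"
  skip_final_snapshot = true

  # The value never appears in the configuration files, but Terraform
  # still records it in state as an attribute of this resource.
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```

The password stays out of version control, yet it is still written to state, which is why the backend itself needs access controls and encryption.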
Availability (Ensuring Access)
If the state backend is unreachable, Terraform is completely blocked - including during live incidents when engineers need to deploy emergency fixes. Measure availability in “nines”:
| Uptime | Downtime per month |
|---|---|
| 99% | ~7 h 18 min |
| 99.9% | ~43 min |
| 99.99% (“four nines”) | ~4 min 23 sec |
Expect at least four nines from any production state backend. Avoid vendors that do not publish SLAs with financial penalties for breaches.
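The downtime figures follow from simple arithmetic. The sketch below assumes an average month of 730 hours (8,760 hours per year divided by 12); a different month length shifts the results slightly.

```latex
\begin{aligned}
\text{downtime per month} &= (1 - A) \times 730\ \text{h} \\
A = 0.99   &\;\Rightarrow\; 0.01 \times 730\ \text{h} = 7.3\ \text{h} = 7\ \text{h}\ 18\ \text{min} \\
A = 0.999  &\;\Rightarrow\; 0.001 \times 730\ \text{h} = 0.73\ \text{h} \approx 43.8\ \text{min} \\
A = 0.9999 &\;\Rightarrow\; 0.0001 \times 730\ \text{h} = 0.073\ \text{h} \approx 4.4\ \text{min}
\end{aligned}
```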
Remote Backends
By default Terraform stores state locally in terraform.tfstate. This is fine for experimentation but fails all production requirements - it cannot be shared across a team, offers no locking between machines, and is not backed up.
Choosing a Backend
Backends are built into Terraform and cannot be added as third-party extensions. The primary differentiator between them is storage location and authentication mechanism:
| Cloud | Recommended Backend |
|---|---|
| AWS | s3 (+ DynamoDB for locking) |
| Azure | azurerm |
| Google Cloud | gcs |
| Multi-cloud / SaaS | Terraform Cloud / TACOS (see below) |
TACOS (Terraform Automation and Collaboration Software) - such as HashiCorp Terraform Cloud, Scalr, and Env0 - provide specialized backends with additional features like audit logs, policy enforcement, and remote execution.
Backend Configuration Best Practices
- Restrict access - only authorised users and CI/CD pipelines should be able to reach the backend
- Enable logging and encryption explicitly if the vendor does not do so by default
- Configure state locking - some backends (like S3) require a separate resource (DynamoDB) to enforce locks that prevent concurrent applies from corrupting state
- Test backups regularly - untested backups are functionally worthless during a disaster
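The practices above can be sketched for the S3 backend roughly as follows. All resource names are hypothetical, and the sketch assumes the bucket and lock table are created from a separate bootstrap project, since a backend cannot store the state of its own creation before it exists.

```hcl
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-tf-state" # hypothetical bucket name
}

# Keep old versions of the state object so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest explicitly rather than relying on defaults.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Block every form of public access to the bucket.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table: the S3 backend expects a string hash key named "LockID".
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Bucket versioning helps with accidental overwrites, but it does not replace tested backups: deleting the bucket removes the versions with it.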
The backend Block and Partial Configurations

    terraform {
      backend "s3" {
        bucket         = "my-tf-state"
        key            = "prod/terraform.tfstate"
        region         = "us-east-1"
        dynamodb_table = "terraform-locks"
        encrypt        = true
      }
    }

    # Supply credentials separately at init time
    terraform init -backend-config="access_key=..." -backend-config="secret_key=..."

    # Or point to a separate file
    terraform init -backend-config=backend.hcl

The cloud Block (Terraform Cloud / Enterprise)
The cloud block is a special backend variant that, by default, executes plan and apply operations remotely rather than locally:

    terraform {
      cloud {
        organization = "my-org"

        workspaces {
          name = "production"
        }
      }
    }

Authentication is handled via terraform login (stores an API token locally) - no hardcoded credentials required. Use either backend or cloud - not both.
Workspaces
Workspaces let a single configuration manage multiple independent state files:

    terraform workspace new staging
    terraform workspace select staging
    terraform workspace list

Inside configuration code, reference the current workspace name:

    resource "aws_instance" "web" {
      instance_type = terraform.workspace == "prod" ? "t3.medium" : "t3.micro"
    }

Migrating Between Backends
When you change the backend block, running terraform init detects the change and prompts you to migrate existing state:

    terraform init -migrate-state

Always check Terraform upgrade notes - backend parameters occasionally change or are deprecated between CLI versions.
Dissecting State Structure
State is stored as JSON (terraform.tfstate for the local backend). Understanding the schema helps with troubleshooting, backup management, and writing automation that interacts with state.
Top-Level Fields
| Field | Purpose |
|---|---|
| version | Schema version of the state format. Terraform uses this to maintain backward compatibility when the format changes. |
| terraform_version | The CLI version (e.g. "1.7.2") that last wrote this state. |
| lineage | A UUID generated when the state is first created; it never changes afterwards. Backends use it to ensure one project’s state is never accidentally overwritten by another’s. |
| serial | Integer that increments by 1 on every save. Backends use it to reject pushes of older state over newer state. |
| resources | Array of every resource and data source under Terraform’s control - see below. |
| outputs | Root-module output values. Only root outputs are saved here; child-module outputs must be explicitly re-exported. |
| check_results | Pass/fail results of check block assertions from both root and child modules. |
The resources Array
Each entry contains:
- module, type, name - together form the unique resource address
- provider - the provider that created it; Terraform uses this to detect provider switches
- attributes - every argument and computed attribute for that resource, stored in plain text
Why outputs Matter
Root-level outputs saved in state are what:
- terraform show displays
- terraform_remote_state data sources in other projects can read
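Because only root outputs are persisted, a value produced inside a child module has to be re-exported from the root module before other projects can read it. A minimal sketch, assuming a hypothetical child module at ./modules/network that declares its own vpc_id output:

```hcl
# Hypothetical child module that declares an output named "vpc_id".
module "network" {
  source = "./modules/network"
}

# Re-export the child module's value from the root module so that it is
# saved in the outputs section of state.
output "vpc_id" {
  description = "ID of the VPC created by the network module"
  value       = module.network.vpc_id
}
```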
Manipulating State
Some operations - renaming a resource, moving it into a module, or stopping Terraform from managing it - require modifying state without touching real infrastructure.

    # Backup
    terraform state pull > my_backup.tfstate

    # Restore (bypass serial protection with -force if reverting)
    terraform state push -force my_backup.tfstate

Method 1 · Code-Driven Changes (Recommended)
Changes recorded in version control - repeatable, reviewable, safe for module authors.
moved block - tell Terraform a resource was renamed or re-scoped:

    moved {
      from = aws_instance.old_name
      to   = aws_instance.new_name
    }

Safe to leave in permanently - in fresh environments where the old address never existed, Terraform simply ignores it.
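The same block also covers re-scoping a resource into a module. A minimal sketch, assuming a hypothetical child module named compute that now contains the instance previously declared in the root module:

```hcl
# Hypothetical: aws_instance.web used to be declared in the root module
# and is now declared inside the child module "compute".
module "compute" {
  source = "./modules/compute"
}

# Tells Terraform the existing object has a new address, so the plan
# shows a move instead of a destroy followed by a create.
moved {
  from = aws_instance.web
  to   = module.compute.aws_instance.web
}
```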
removed block - stop managing a resource without destroying it:

    removed {
      from = aws_instance.legacy

      lifecycle {
        destroy = false
      }
    }

Method 2 · CLI-Driven Changes
Use when code blocks are insufficient.
| Command | Purpose |
|---|---|
| terraform state list | List all resource addresses in current state |
| terraform state rm <address> | Remove a resource from state without destroying it. ⚠️ Also remove it from config or Terraform will recreate it next run. |
| terraform state replace-provider <old> <new> | Update the recorded provider for resources (e.g. when switching to a fork or dev build) |
Method 3 · Manually Editing State (Last Resort)
Only attempt if state is corrupted and all other tools have failed.
- Always have a backup first
- Run edits through a JSON validator - state format is unforgiving
- Manually increment the serial field by 1 so terraform state push accepts it without -force
- Diff the file before and after to confirm only the intended changes were made
State Drift
State drift occurs when infrastructure changes outside of Terraform, causing the real world to diverge from saved state. Terraform detects drift during the refresh phase at the start of every terraform plan.
Before applying a plan that resolves drift, always ask: What changed, and why?
Categories of Drift
1 · Accidental Manual Changes (Human Error)
Cause: An engineer modified the wrong resource - wrong account, wrong command.
Fix: Usually safe to simply run terraform plan and then terraform apply - Terraform will revert the resource back to the desired state.
Prevention: Restrict direct production access; enforce all changes through CI/CD pipelines.
2 · Intentional Manual Changes
Cause: Emergency intervention - an on-call engineer manually fixes an outage.
Danger: If Terraform runs before the code is updated, it will revert the emergency fix.
Fix: Update the Terraform code to reflect the emergency changes before running Terraform again. Establish a policy that all changes - including emergency ones - are eventually codified.
3 · Conflicting Automated Changes
Cause: External systems change managed resources as designed - an autoscaling group adjusting instance counts, a cloud vendor applying a minor database version upgrade, an orchestrator adding tags.
Fix: Since these are intentional, use the ignore_changes lifecycle rule to tell Terraform to disregard specific attributes:

    lifecycle {
      ignore_changes = [desired_count, tags["LastApplied"]]
    }

For expected automated upgrades (minor DB versions, etc.), a terraform apply -refresh-only harmlessly updates state to match reality without touching the infrastructure.
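In context, the ignore_changes rule sits inside the resource whose attributes are managed externally. A minimal sketch, assuming an ECS service whose task count is adjusted by autoscaling; the variable names, service name, and tag values are hypothetical:

```hcl
variable "ecs_cluster_id" {
  type = string
}

variable "task_definition_arn" {
  type = string
}

resource "aws_ecs_service" "api" {
  name            = "api" # hypothetical service name
  cluster         = var.ecs_cluster_id
  task_definition = var.task_definition_arn
  desired_count   = 2 # initial value only; autoscaling adjusts it afterwards

  tags = {
    Team = "platform"
  }

  lifecycle {
    # Terraform still manages every other argument on this resource;
    # only these two attributes are excluded from drift detection.
    ignore_changes = [
      desired_count,
      tags["LastApplied"],
    ]
  }
}
```

Keeping the list as narrow as possible matters: anything named here will also ignore deliberate changes made in code.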
4 · Terraform Errors (Failed State Write)
Cause: Terraform successfully created or modified infrastructure but crashed or lost backend connectivity before writing the result to state.
Danger: On the next run, Terraform has no record of the new resources and will attempt to create them again - producing duplicate infrastructure or blocked deployments.
Fix:
- Review logs carefully to identify what Terraform created before the failure
- Import orphaned resources: terraform import <address> <cloud-resource-id>
- Or manually delete the cloud resources and let Terraform recreate them cleanly
- If the state file is fully corrupted, restore from a tested backup
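On Terraform v1.5 and later, the import step can also be expressed in code with an import block, which keeps the recovery reviewable in version control. A minimal sketch; the instance ID is a placeholder and the variable name is hypothetical:

```hcl
variable "ami_id" {
  type = string
}

# Requires Terraform v1.5 or later. The ID below is a placeholder for the
# identifier of the orphaned resource found in the logs or cloud console.
import {
  to = aws_instance.web
  id = "i-0123456789abcdef0"
}

# The configuration the imported object will be bound to.
resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"
}
```

The next plan should then show the resource being imported rather than created, which is worth confirming before applying.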
Accessing State Across Projects
As configurations grow, splitting into multiple projects improves plan performance, isolates long-running resources, and aligns with team ownership (Conway’s Law). Cross-project state access lets independent projects share data without duplicating code.
The terraform_remote_state Data Source

    data "terraform_remote_state" "networking" {
      backend = "s3"

      config = {
        bucket = "my-tf-state"
        key    = "networking/terraform.tfstate"
        region = "us-east-1"
      }

      defaults = {
        # soft dependency - project still runs if networking isn't ready yet
        private_subnet_id = null
      }
    }

    resource "aws_instance" "app" {
      subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_id
    }

Key characteristics:
- Built into Terraform - not an external provider
- Read-only - cannot accidentally modify the remote project’s infrastructure
- Root outputs only - can only read outputs defined in the remote project’s root module; child-module outputs must be explicitly re-exported
Alternatives to Remote State
Before reaching for terraform_remote_state, evaluate these safer options:
| Alternative | How | Trade-off |
|---|---|---|
| Native data sources | Look up physical resources via vendor API (e.g. AWS tags, GCP labels) | Requires strict tagging; doesn’t expose full state |
| Input variables | Pass required values as module inputs | Safer; more manual work to keep values in sync |
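The native data source option might look like the following. This is a sketch under the assumption that the networking team tags its VPC and subnets consistently; the tag keys and values and the variable name are hypothetical.

```hcl
variable "ami_id" {
  type = string
}

# Look the VPC up by tag instead of reading another project's state.
data "aws_vpc" "shared" {
  tags = {
    Name = "shared-network" # hypothetical tag value
  }
}

# Find the private subnets inside that VPC.
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }

  tags = {
    Tier = "private" # hypothetical tag
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"
  subnet_id     = data.aws_subnets.private.ids[0]
}
```

This approach needs no access to the other project's state backend, only read permissions on the vendor API.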
Structuring Remote State Access Cleanly
- Top-level injection - place all terraform_remote_state calls in the root module only; pass looked-up values down as input variables to child modules. Keeps core modules reusable and free of remote-state logic.
- Dedicated lookup modules - for internal organisations, build a module whose sole purpose is to perform a specific remote-state lookup (e.g. a network-lookup module). Encapsulates the access pattern.
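Top-level injection can be sketched as follows. The module path and variable name are hypothetical; the point is that only the root module references remote state, while the child module sees an ordinary input variable.

```hcl
# Root module (e.g. main.tf): the only place that knows about remote state.
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "my-tf-state"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

module "app" {
  source = "./modules/app" # hypothetical child module

  # The looked-up value is handed down as a plain input.
  subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_id
}

# Child module (e.g. modules/app/variables.tf): no remote-state logic,
# so the module can be reused with a value from any source.
variable "subnet_id" {
  type        = string
  description = "Subnet in which to launch the application instance"
}
```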
State-Only Resources
Some Terraform resources generate no cloud infrastructure. They calculate values, persist them in state between runs, and act as stable, plan-time-known data sources for other resources.
Why State-Only Resources Exist
Native Terraform functions like timestamp() and uuid() are impure - they return a different value every time they are called. Using them inside resource arguments causes eternal drift: Terraform always detects a change, even immediately after a successful apply. State-only resources solve this by generating the value once, saving it to state, and keeping it stable across plans.
The Random Provider
Generates random passwords, UUIDs, integers, and readable names:

    resource "random_password" "db" {
      length  = 24
      special = true

      keepers = {
        # Regenerate the password only when the DB instance name changes
        db_instance = var.db_instance_name
      }
    }

    resource "aws_db_instance" "main" {
      password = random_password.db.result
    }

The Time Provider
Avoids drift from the impure timestamp() function:
| Resource | Purpose |
|---|---|
| time_static | Records the timestamp of first creation; never changes |
| time_offset | Records a time offset (e.g. 2 hours in the future) |
| time_rotating | Holds a timestamp that automatically regenerates after a configured lifespan |
| time_sleep | Introduces a deliberate delay in the execution graph - useful for waiting for a provisioning script on one machine to finish before launching a dependent resource |
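A minimal sketch of two of these resources in use, assuming the hashicorp/time provider is available; the bucket name and the 30-day rotation period are hypothetical:

```hcl
# Captured once on first apply and read back from state on every later run.
resource "time_static" "created" {}

# Yields a new timestamp, and therefore a planned change, every 30 days.
resource "time_rotating" "key_rotation" {
  rotation_days = 30
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket" # hypothetical

  tags = {
    # Stable across plans, unlike a direct call to timestamp()
    CreatedAt = time_static.created.rfc3339
  }
}

output "next_rotation" {
  value = time_rotating.key_rotation.rotation_rfc3339
}
```

Writing CreatedAt = timestamp() directly in the tags would instead produce a diff on every plan.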
The Null Provider (null_resource)
null_resource fully implements the Terraform lifecycle (create/update/delete) but does absolutely nothing. Primary uses:
- Testing CI/CD pipelines and Terraform automation without spinning up real infrastructure
- Historically: running custom provisioner blocks
terraform_data (Terraform v1.4+)
The modern replacement for null_resource, built directly into Terraform - no provider download required.

    # Solve the replace_triggered_by limitation: lifecycle rules cannot
    # natively reference local variables, but they can reference resources
    resource "terraform_data" "trigger" {
      input = local.config_hash
    }

    resource "aws_instance" "web" {
      ami           = data.aws_ami.ubuntu.id
      instance_type = "t3.micro"

      lifecycle {
        replace_triggered_by = [terraform_data.trigger]
      }
    }

Prefer terraform_data over null_resource in all new code.
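For the historical null_resource use case of running provisioners, terraform_data offers a triggers_replace argument. A minimal sketch, assuming a hypothetical app.conf file next to the configuration whose hash decides when the command re-runs:

```hcl
locals {
  # Hypothetical input file; any value that changes when work is needed will do.
  config_hash = sha256(file("${path.module}/app.conf"))
}

resource "terraform_data" "bootstrap" {
  # A change to this value replaces the resource, which re-runs the
  # creation-time provisioner below.
  triggers_replace = [local.config_hash]

  provisioner "local-exec" {
    command = "echo 'configuration changed, re-running bootstrap'"
  }
}
```

This plays the same role as the triggers map on null_resource, without requiring the null provider.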