Terraform Remote State — Why It Matters and What Breaks Without It

5 min read
Terraform, IaC, DevOps, AWS, Azure, State Management

Local Terraform state works fine until it doesn't. Three failure modes of keeping state on your machine, and what remote state with locking actually gives you.

When I set up the GitHub Actions CI pipeline for the Azure Terraform project, the obvious next step was to run terraform plan in CI — validate changes before they merge, catch drift early. Then I hit the question that remote state answers: where does CI get the state file?

The state file on my machine was the entire record of what Terraform had built. Without it, terraform plan in CI would conclude that nothing existed and plan to create everything from scratch. Merge that plan unreviewed, and terraform apply against production would attempt to build a complete duplicate environment, or fail on naming conflicts, while the real infrastructure sat there, unrecognised.

That is when local state stops being good enough.

What state actually is

Terraform does not scan your cloud account to discover what it built. It tracks every managed resource in a state file: resource IDs, attribute values, metadata, and dependencies. When you run terraform plan, Terraform reads state to learn which resources it manages, refreshes those resources against the provider APIs, and compares the result with your configuration to calculate what needs to change.

The state file is the source of truth. If it says a resource does not exist, Terraform will try to create it. If it says a resource exists but your configuration no longer includes it, Terraform will destroy it. State being wrong is operationally indistinguishable from the infrastructure being wrong.

Three things that break with local state

The state file disappears. Your machine is replaced, the disk corrupts, or the wrong rm runs in the project directory. The state file is gone. Terraform no longer knows about the VPC, subnets, EC2 instances, or RDS cluster it provisioned. Your next terraform plan shows a blank slate — all creates, nothing to update.

From here you have two options. Import every managed resource back into state manually using terraform import, once per resource, looking up each cloud resource ID as you go. Or tear down the environment entirely and rebuild. Neither is fast. A three-tier AWS stack with eight or nine resource types is an afternoon of recovery work, minimum.
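On Terraform 1.5 or later, the import route can also be written declaratively with import blocks instead of one CLI call per resource. The sketch below is illustrative only: the resource address aws_vpc.main, the VPC ID, and the CIDR range are hypothetical placeholders, not values from the project described here.

```hcl
# Hypothetical example: re-adopt an existing VPC into a fresh state file.
# Requires Terraform 1.5+; the ID below is a placeholder you would look up
# in the AWS console or CLI.
import {
  to = aws_vpc.main
  id = "vpc-0123456789abcdef0"
}

# The resource block must describe the real VPC, otherwise the plan
# will propose changes right after the import.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # assumed value; use the actual CIDR
}
```

Running terraform plan then shows the resource as "will be imported" rather than "will be created", and terraform apply records it in state. You still need one import block and one cloud resource ID per resource, so the lookup work does not go away.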

State drifts across machines. You apply changes from your laptop. A colleague applies from theirs. Both have local state files. Neither is the same file.

Each terraform apply reads the local state and writes back to it after the run. Your state records what you built. Theirs records what they built. Neither file holds the full picture, and each one goes stale the moment the other person applies. The next terraform plan on either machine will show changes that do not exist, or miss changes that do. Eventually someone applies on stale state: duplicates appear, or resources that were added get flagged for destruction.

Concurrent applies without a lock. Two terraform apply operations running simultaneously against the same infrastructure — from two developer machines, or from a CI pipeline and a local terminal — will both read state at the same moment. Each makes its changes. Each writes its version of state at the end. Last writer wins. The first apply's state changes are overwritten. Resources it built are now live in the infrastructure but absent from the state file that the second apply wrote.

That is the failure mode that destroys things without an obvious error: not the apply that crashes, but the apply that succeeds and silently orphans resources in the process.

What remote state gives you

Remote state moves the state file to a shared backend — S3 on AWS, Azure Blob Storage on Azure — so every Terraform operation reads and writes from the same source, regardless of which machine or pipeline runs it.

S3 with DynamoDB state locking is the long-established AWS pattern (Terraform 1.10 and later can also lock natively in S3 with the use_lockfile setting):

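terraform {
  backend "s3" {
    bucket         = "my-project-tfstate"           # must already exist; the backend does not create it
    key            = "production/terraform.tfstate" # object path of this state within the bucket
    region         = "eu-west-1"
    encrypt        = true                           # server-side encryption of the state object
    dynamodb_table = "terraform-state-lock"         # lock table; needs a string partition key named LockID
  }
}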

Before any operation that could write state begins, including terraform plan and terraform apply, Terraform writes a lock entry to the DynamoDB table. Any other operation against the same state fails to acquire the lock until it is released. If a CI job crashes mid-apply, the lock persists: you clear it manually with terraform force-unlock and investigate, rather than discovering the consequences of two overlapping state writes later.
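The lock table is itself a piece of infrastructure that has to exist before terraform init can use it. A minimal sketch, assuming it is created from a separate bootstrap configuration (one that keeps its own state elsewhere) and that on-demand billing is acceptable:

```hcl
# Sketch of the lock table referenced by dynamodb_table in the backend block.
# Assumption: this lives in a separate bootstrap configuration, not in the
# configuration whose state it protects.
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST" # assumed; lock traffic is tiny
  hash_key     = "LockID"          # the S3 backend expects exactly this key name

  attribute {
    name = "LockID"
    type = "S" # string
  }
}
```

The only hard requirement is the string partition key named LockID; the resource label and billing mode are choices made for this illustration.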

The state file itself is encrypted at rest in S3. Terraform records sensitive values (database passwords, connection strings, generated private keys) in state in plain text, so in the bucket they stay encrypted and behind IAM access controls, rather than sitting in a local file that gets excluded from git but otherwise has no access controls.
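How the bucket is configured determines how much protection the state actually gets. The following is a sketch of one reasonable setup, assuming AWS provider v4 or later and a separate bootstrap configuration; the KMS default key, versioning, and public access block are choices made for illustration, not something the backend requires.

```hcl
# Sketch of a state bucket. Assumption: created outside the configuration
# whose state it stores.
resource "aws_s3_bucket" "tfstate" {
  bucket = "my-project-tfstate"
}

# Encrypt every object by default (here with the AWS-managed KMS key).
resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Keep earlier versions of the state object so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}

# State contains secrets; make sure the bucket can never be made public.
resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```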

The Azure equivalent uses Azure Blob Storage with blob lease locking:

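terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"         # resource group that holds the storage account
    storage_account_name = "myprojecttfstate"   # must already exist
    container_name       = "tfstate"            # blob container for state files
    key                  = "production.tfstate" # blob name for this state
  }
}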

Terraform acquires a lease on the state blob automatically when an operation that writes state begins, and releases it when the operation finishes. No additional locking resource is needed.
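As on AWS, the storage the backend points at has to exist before terraform init runs. A minimal sketch of those resources, assuming a separate bootstrap configuration with the azurerm provider already configured; the location, tier, and replication type are assumptions for illustration.

```hcl
# Sketch of the storage behind the azurerm backend block.
# Assumes a configured provider "azurerm" { features {} } elsewhere.
resource "azurerm_resource_group" "tfstate" {
  name     = "rg-tfstate"
  location = "westeurope" # assumed region
}

resource "azurerm_storage_account" "tfstate" {
  name                     = "myprojecttfstate"
  resource_group_name      = azurerm_resource_group.tfstate.name
  location                 = azurerm_resource_group.tfstate.location
  account_tier             = "Standard"
  account_replication_type = "LRS" # assumed; pick the redundancy you need
}

resource "azurerm_storage_container" "tfstate" {
  name                  = "tfstate"
  # Newer azurerm releases may prefer storage_account_id here instead.
  storage_account_name  = azurerm_storage_account.tfstate.name
  container_access_type = "private"
}
```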

The takeaway

Local state is fine for a proof of concept running on one machine. The moment CI touches your infrastructure — or a second person runs terraform apply — local state has already failed. You just have not seen the consequences yet. Remote state with locking is what makes terraform apply safe to use in any workflow beyond solo experimentation.