Case study: Modernising an AWS estate with Claude Code

Active period: 2026  |  Role: Architect / engineer  |  Stack: Claude Code, Terraform, AWS, WordPress

An estate modernisation carried out on my own infrastructure – including this site – using Claude Code as the working tool throughout. A useful demonstration because it is real, end-to-end, and entirely client-safe: the only estate involved is mine.

The problem

A three-repository Terraform platform – templates, reusable modules, and environment parameters – managing AWS infrastructure across two accounts had drifted behind current versions across the board: an ageing operating system, an end-of-life language runtime, an out-of-date CMS, and Terraform state files still in a format written by Terraform 0.11 back in 2019. The work was to bring every layer up to current, supported versions without losing state, breaking infrastructure, or taking services down, and to do it methodically rather than as a risky big-bang upgrade.

In addition, six separate WordPress deployments ran on a single host, each in its own deployment path. All of them had to stay functional throughout while receiving the necessary upgrades and configuration changes.

One architectural detail shaped the whole job: each template stack’s provider configuration is a symlink back to a single canonical file, so the provider and Terraform version constraints live in exactly one place and update everywhere at once.
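
As a minimal sketch of that layout, the Python below builds the symlinks in a temporary directory and shows that one edit to the canonical file is visible from every stack. The file name `providers.tf`, the `stacks/` directory and the stack names are assumptions for illustration, not the real repository layout.

```python
import os
import tempfile
from pathlib import Path

# Placeholder file contents, used only to show the edit propagating.
OLD = 'terraform {\n  required_version = ">= 1.9.0"\n}\n'
NEW = 'terraform {\n  required_version = "~> 1.15.0"\n}\n'


def link_provider_config(root: Path, stacks: list[str],
                         canonical: str = "providers.tf") -> list[Path]:
    """Create a relative symlink to the canonical provider file in each stack."""
    links = []
    for name in stacks:
        stack_dir = root / "stacks" / name  # hypothetical layout
        stack_dir.mkdir(parents=True, exist_ok=True)
        link = stack_dir / canonical
        if link.is_symlink() or link.exists():
            link.unlink()
        # A relative target keeps the link valid wherever the repo is cloned.
        link.symlink_to(os.path.relpath(root / canonical, stack_dir))
        links.append(link)
    return links


def demo() -> set[str]:
    """Edit the canonical file once and read it back through every symlink."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        canonical = root / "providers.tf"
        canonical.write_text(OLD)
        links = link_provider_config(root, ["network", "web"])
        canonical.write_text(NEW)  # the single edit
        return {link.read_text() for link in links}
```

Every stack reads the same content after the single edit, which is what made the later constraint changes a one-file job for the templates.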

What was upgraded

  • Terraform state – format version 3 to version 4 across all state files
  • AWS provider – 5.x to 6.0
  • Terraform CLI constraint – an open-ended >= 1.9.0 to a pinned ~> 1.15.0
  • Operating system – Amazon Linux 2 to Amazon Linux 2023
  • Language runtime – PHP 7.3 to 8.3
  • CMS – WordPress 6 to 7, including themes and plugins

How Claude Code was used

Claude Code did the heavy lifting of reading the existing Terraform, identifying deprecated patterns, and proposing the upgrade path, with me reviewing and directing every change. The value was less in writing code from scratch and more in working through a large, interdependent estate quickly and consistently – catching the knock-on effects of a runtime bump on the OS, the CMS, and the IaC at the same time. The honest account of where that worked and where it did not is at the end of this post.

Sequencing

  • State and IaC first, to bring the deployment process onto the latest tools. The older OS and WordPress installs were still functional, so the IaC could be modernised while everything stayed running.
  • AMI creation with Packer next, to bake in the tooling and the newer PHP version – covered in detail in Building a Packer AMI for use with an Auto Scaling Group.
  • Spin the new AMI into the working site to confirm that the websites stayed functional.
  • Upgrade the WordPress installs to the latest core, themes and plugins.

The hard part: migrating Terraform state from v3 to v4

This was the most technically involved part of the whole exercise. All state files were still at format version 3, originally created by Terraform 0.11 in 2019-2020 – 53 of them in S3 across the accounts. Everything had to migrate to format version 4 before the modern provider stack would accept it.

A Python and boto3 script migrated the state files in place and was extended to batch-update the DynamoDB digest entries in the same pass. The migration surfaced a sequence of distinct failure modes, each of which had to be understood rather than worked around:

  • DynamoDB checksum errors. The state-lock table stores an MD5 digest of each state file; patching a file on S3 invalidates the stored digest, so every terraform plan failed with “state data in S3 does not have the expected content”. The fix pattern was: patch state, recalculate MD5, update the DynamoDB digest – for every altered file.
  • Output type corruption. The migration script had hardcoded every output as a string type. Two outputs were actually lists, which surfaced as “invalid value saved in state” at plan time. The fix corrected the type metadata and unwrapped a double-nested value, followed by another digest update. (This was a bug in the generated script – see the honest assessment.)
  • DynamoDB locking to S3 native locking. Terraform 1.10 introduced S3-native locking through use_lockfile = true, and 1.11 onwards deprecates dynamodb_table in the S3 backend in its favour. Changing it triggered “Backend configuration changed” on every initialised workspace. The correct remedy was terraform init -reconfigure, not -migrate-state – the state was not moving, only the lock mechanism.
  • Strict backend schema validation. A role_arn argument at the top level of two terraform_remote_state blocks had been silently ignored by older Terraform; 1.10+ validates strictly and rejected it. Removal was the fix. Separately, a deprecated acl argument in the S3 backend had to go.
  • The moved-block dead end. A prior rename of aws_alb* resources to aws_lb* had been handled with moved blocks. These turned out to be invalid: the AWS provider blocks cross-type moved blocks for these aliases, treating them as genuinely different resource types. Rather than attempt state surgery on production, the rename was reverted entirely.
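
The digest step from the first failure mode can be sketched as follows. This is not the migration script itself: it assumes the S3 backend's usual digest row, with a `LockID` of `<bucket>/<key>-md5` and a `Digest` attribute, so check it against an existing row in your own lock table before relying on it. The bucket and key names in the usage note are hypothetical.

```python
import hashlib


def digest_item(bucket: str, key: str, state_bytes: bytes) -> dict:
    """Build the DynamoDB item holding the MD5 digest for one state object.

    Assumed layout: LockID is "<bucket>/<key>-md5" and Digest is the hex MD5
    of the exact bytes stored in S3.
    """
    return {
        "LockID": {"S": f"{bucket}/{key}-md5"},
        "Digest": {"S": hashlib.md5(state_bytes).hexdigest()},
    }


def digest_items(bucket: str, states: dict[str, bytes]) -> list[dict]:
    """Build digest items for every patched state file in one pass."""
    return [digest_item(bucket, key, body) for key, body in sorted(states.items())]


# Writing an item needs boto3 and real credentials, so it is left as a comment:
#   boto3.client("dynamodb").put_item(TableName="<lock-table>", Item=item)
```

For example, `digest_items("example-state-bucket", {"web/terraform.tfstate": patched_bytes})` returns one item per file. The digest must be computed over the same bytes that were uploaded, so re-serialising the JSON between upload and hashing would reintroduce the mismatch.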

AWS Provider 6.0, Terraform 1.15 pin

The AWS provider jump from 5.x to 6.0 was, pleasingly, almost entirely mechanical. An upfront audit of every resource against the v6 upgrade guide found the codebase already clean: the breaking changes (the aws_ami owners requirement, the removed aws_eip vpc argument, dropped OpsWorks/SimpleDB resources, and so on) either were not used or had been addressed in earlier work. Two patterns the scan flagged turned out to be false alarms on closer reading. So the change came down to version constraints: one edit to the canonical provider file via the symlink, and a single find | xargs sed pass across all 34 modules.
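
A Python equivalent of that find | xargs sed pass might look like the sketch below. It assumes each module keeps its constraints in a file named `versions.tf` and that the old constraint has the form `~> 5.x`. Both are illustrative guesses, not details taken from the real modules.

```python
import re
from pathlib import Path

# Matches a provider line such as: version = "~> 5.0"
# The \b stops it matching required_version. Note that it would also rewrite
# any non-AWS provider pinned to "~> 5.x", so review the diff before committing.
VERSION_LINE = re.compile(r'(\bversion\s*=\s*")~> 5\.[^"]*(")')


def bump_provider_constraint(modules_root: Path, new: str = "~> 6.0",
                             filename: str = "versions.tf") -> list[Path]:
    """Rewrite the provider constraint in every matching file; return changed paths."""
    changed = []
    for path in sorted(modules_root.rglob(filename)):
        text = path.read_text()
        updated = VERSION_LINE.sub(
            lambda m: f"{m.group(1)}{new}{m.group(2)}", text
        )
        if updated != text:
            path.write_text(updated)
            changed.append(path)
    return changed
```

Returning the list of changed paths makes the pass easy to audit, and running it a second time should return an empty list.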

The Terraform CLI constraint was tightened from an open-ended >= 1.9.0 to a pinned ~> 1.15.0. The deliberate choice was the patch-level form over the looser ~> 1.15: ~> 1.15.0 admits only 1.15.x patch releases, whereas ~> 1.15 would also admit 1.16 and any later 1.x release. If the toolchain runs 1.15.4, the constraint should keep it on the 1.15 line, making it a statement of intent rather than an open door to silent drift.
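
The difference between the two forms can be checked with a small sketch of the pessimistic operator. It is a simplified model that handles plain numeric versions only and ignores pre-release suffixes. It is not Terraform's own implementation.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '1.15.4' into (1, 15, 4)."""
    return tuple(int(part) for part in version.strip().split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """True if version satisfies a '~> X.Y' or '~> X.Y.Z' constraint.

    Only the rightmost component named in the constraint may increase.
    """
    base = parse(constraint.removeprefix("~>"))
    if len(base) < 2:
        raise ValueError("expected at least two components, e.g. '~> 1.15'")
    candidate = parse(version)
    # Pad so that '1.15' compares as '1.15.0' against a three-part base.
    candidate += (0,) * (len(base) - len(candidate))
    # Upper bound: drop the last component and bump the one before it.
    upper = base[:-2] + (base[-2] + 1,)
    return base <= candidate and candidate[: len(upper)] < upper
```

With this model, `satisfies_pessimistic("1.16.0", "~> 1.15.0")` is false while `satisfies_pessimistic("1.16.0", "~> 1.15")` is true, which is the silent drift the patch-level form rules out.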

Three things bit during these otherwise mechanical changes, all of them pre-existing problems the upgrade simply exposed:

  • A variable "depends_on" block – dead, unreachable code, since depends_on is a reserved meta-argument – that had survived undetected because terraform validate was not previously enforced at commit time. The pre-commit hook caught it.
  • A stale .terraform.lock.hcl pinned to an old provider version, force-added to git long ago against the gitignore rule. Invisible locally, fully visible to CI on a clean checkout, where it broke the pipeline. Deleting it fixed it.
  • The terraform-docs pre-commit hook regenerates each module README on a constraint change and rejects the commit, so the regenerated files have to be staged and the commit repeated. Across 34 modules, that is a predictable two-pass cycle, not a fault – worth knowing as the standard workflow.
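
The stale lock file is the kind of problem a quick check can surface before CI does. The sketch below lists any `.terraform.lock.hcl` that git is tracking. It assumes, as in this estate, that lock files are meant to be ignored, and it is an illustration, not part of the original tooling.

```python
import subprocess
from pathlib import PurePosixPath

LOCK_FILE = ".terraform.lock.hcl"


def tracked_lock_files(tracked_paths: list[str]) -> list[str]:
    """Return the tracked paths whose file name is the Terraform lock file."""
    return sorted(p for p in tracked_paths if PurePosixPath(p).name == LOCK_FILE)


def git_tracked_paths(repo_dir: str = ".") -> list[str]:
    """List every path git tracks in repo_dir (requires git on PATH)."""
    result = subprocess.run(
        ["git", "ls-files"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

Calling `tracked_lock_files(git_tracked_paths())` from the repository root returns an empty list on a clean repository. Anything it does return can be removed from the index with `git rm --cached`, which leaves the local file in place.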

PHP 7.3 to 8.3: WordPress compatibility

The OS and runtime move together: Amazon Linux 2023 does not ship PHP 7.x, so upgrading one means upgrading the other. The mechanics of installing and baking the new runtime belong in the separate post on converting an existing EC2 instance into an AMI with Packer. What matters here is the application impact, which was reassuringly small. WordPress 6.x supports PHP 8.3, and the site configurations carried across cleanly once the web server was pointed at the new PHP-FPM socket. The one caveat worth flagging for anyone doing the same: PHP 8.x removes functions that 7.x had only deprecated and turns several former warnings into errors, so plugin compatibility is worth checking before cutover, particularly for older or less-maintained plugins.

WordPress upgrades via wp-cli

The web-based updater proved unreliable due to performance and timeout issues, so the WordPress 6 to 7 upgrades were driven through wp-cli. One wrinkle: a version check reported 6.5.8 as the latest available when it was not, so the upgrade needed wp core update --version=7.0.0 to bypass the stale check.

The approach was to upgrade one of the less important sites to WordPress 7.0.0 first, then run the database updates, and finally update its themes and plugins. That test installation upgraded cleanly, and all themes and plugins updated without issue.

That gave me confidence to move on to the next site, and the next, until they were all complete. The whole run took only a few hours, and each site was tested post-upgrade to confirm there were no serious issues.
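
The per-site sequence can be written down as data, which makes the order explicit and easy to review before anything runs. The sketch below only builds the wp-cli command lines and does not execute them. The site paths are hypothetical, and the `--path` flag assumes each deployment lives in its own directory as described earlier.

```python
import shlex


def upgrade_commands(site_path: str, version: str = "7.0.0") -> list[list[str]]:
    """wp-cli commands for one site: core, then database, then themes and plugins."""
    path_flag = f"--path={site_path}"
    return [
        ["wp", "core", "update", f"--version={version}", path_flag],
        ["wp", "core", "update-db", path_flag],
        ["wp", "theme", "update", "--all", path_flag],
        ["wp", "plugin", "update", "--all", path_flag],
    ]


def rollout_order(sites: list[str], canary: str) -> list[str]:
    """Put the less important test site first, then the rest in their given order."""
    if canary not in sites:
        raise ValueError(f"{canary} is not one of the sites")
    return [canary] + [s for s in sites if s != canary]


def render_plan(sites: list[str], canary: str) -> list[str]:
    """Shell-quoted command lines for the whole rollout, ready for review."""
    return [
        shlex.join(cmd)
        for site in rollout_order(sites, canary)
        for cmd in upgrade_commands(site)
    ]
```

Printing `render_plan(...)` before running anything gives a checklist to work through one site at a time, with a manual test between sites.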

Validation at each step

  • Terraform – all infrastructure is managed by Terraform, so each plan had to come back as “no change”, or with only minor, non-functional differences.
  • New AMI – add to the build, roll out, and confirm that all existing websites remain functional before proceeding.
  • WordPress – upgrade one less-critical deployment first, review the process, then upgrade the rest.
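
The Terraform check in the first bullet can be automated with the `-detailed-exitcode` flag of terraform plan, which exits 0 when there are no changes, 1 on error and 2 when changes are pending. The sketch below is one way to wrap that, not the loop used in the sessions, and the stack directory argument is a placeholder.

```python
import subprocess
from typing import Any, Callable


def plan_status(exit_code: int) -> str:
    """Map a 'terraform plan -detailed-exitcode' exit code to a label."""
    if exit_code == 0:
        return "no-changes"
    if exit_code == 2:
        return "changes"
    return "error"


def check_stack(stack_dir: str, run: Callable[..., Any] = subprocess.run) -> str:
    """Run a plan in stack_dir and classify the result.

    The runner is injectable so the logic can be exercised without Terraform.
    """
    result = run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=stack_dir, capture_output=True, text=True,
    )
    return plan_status(result.returncode)


def stacks_needing_review(stack_dirs: list[str],
                          run: Callable[..., Any] = subprocess.run) -> dict[str, str]:
    """Return only the stacks whose plan is not a clean 'no-changes'."""
    statuses = {d: check_stack(d, run) for d in stack_dirs}
    return {d: s for d, s in statuses.items() if s != "no-changes"}
```

A stack that reports "changes" is not automatically a failure. It is a prompt to read the plan and decide whether the difference is the minor, non-functional kind.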

A performance win along the way

Moving the backing EFS volume from bursting to elastic throughput delivered a substantial performance improvement across all the sites. The new AMI also took the opportunity to move from x86_64 to arm64 (AWS Graviton), which, on Amazon Linux 2023, is a clean swap for a PHP-FPM stack and brought more memory headroom for OPcache at a marginal cost difference.

The performance gain from elastic throughput came at a cost, though. Elastic mode is significantly more expensive than bursting mode, so I decided to move as much content off the EFS volume as possible and revert to bursting mode – more about that in a later post, as it deserves to be looked at in its own right.

A security review pass

While the estate was open, an autonomous security review read across 70+ files and produced two dozen findings spanning several severity levels – CI/CD configuration, IAM scoping, encryption defaults and a few legacy script habits. Surfacing that breadth by hand would have taken hours; several of the findings were the non-obvious kind that are easy to miss on a manual pass. The findings are tracked and being remediated separately, and are not detailed here.

By the numbers

  • 53 state files migrated from format v3 to v4
  • 27 resource stacks cycled through init, plan and apply across both accounts
  • 34 modules updated for the provider and Terraform constraints; roughly 69 files touched per upgrade in a tight two-commit pattern
  • A conservative 2 to 3 days of focused, Claude-assisted sessions for the full arc, from 0.11-era state through provider 6.0 and Terraform 1.15

Honest assessment: where Claude Code helped, and where it did not

The point of writing this up is not to claim the tool did the work. It is to be precise about where it added value and where judgement and correction were still required.

Where it clearly accelerated things: batch operations across many files (the 34-module constraint update and the stage-and-recommit cycle for regenerated docs); the breadth of the security review; writing, debugging and extending the state-migration script; inferring the symlink architecture without being told; and running the init/plan loop across all 27 stacks, identifying each distinct failure mode and fixing it without hand-holding between stacks.

Where corrections were needed: it tried to infer credentials and the right execution profile when I was already handling that, overcomplicating something simple. It suggested -migrate-state when the lock-mechanism change only needed -reconfigure. Its own migration script introduced the output-type bug that only surfaced a step later at plan time. It wrote moved blocks for the ALB aliases before discovering they were invalid. And it occasionally misread a missing local binary, or expected a mid-session settings change to take effect without a restart.

The pattern is consistent: the corrections cluster around the tool trying to solve things it should have deferred to me (credentials, environment state, tool availability), and around bugs in its own generated code that only surface one step downstream. The acceleration is clearest on repetitive multi-file operations and on autonomous research where breadth matters more than judgement. Used with that understanding – and with every change reviewed before it lands – it is a genuine force multiplier on exactly this kind of large, stateful, interdependent migration.

Outcome

Every layer of the estate is now on current, supported versions, with the Terraform codebase clean against current conventions, state migrated to v4, and no loss of state along the way. The deployment runs on a reproducible, Packer-built arm64 image, and a backlog of latent issues – dead code, a stale lock file, schema violations older Terraform had ignored – was cleared out as a side effect of doing the job properly.

The transferable lesson: an AI coding tool earns its place on a real, stateful production estate not by running unsupervised, but by accelerating the breadth-heavy and repetitive work while you keep judgement, sequencing and the final review firmly in human hands.
