
Claude Code for DevOps: Write Terraform, GitHub Actions, and Docker Configs in Minutes

DevOps is YAML, shell scripts, and config files — Claude Code's native language. Real examples: Terraform modules, GitHub Actions pipelines, Docker multi-stage builds, and incident runbooks.

AI Builder Club · April 12, 2026 · 4 min read

DevOps work is mostly config files, scripts, and YAML. Claude Code is a natural fit for this.

The pattern is always the same: you know what you want the system to do, but translating that into the exact Terraform syntax, the right GitHub Actions workflow, or the correct Docker multi-stage build takes 30 minutes of reading docs. Claude Code reads the docs faster than you and has seen thousands of production configs. Each use case below is a prompt you can paste into Claude Code and adapt to your own stack.


Use Case 1: Terraform Modules from Scratch

Create a Terraform module for our production infrastructure on AWS:

1. VPC with public and private subnets across 3 AZs
2. ECS Fargate cluster for our Next.js app
   - Service with auto-scaling (min 2, max 10 tasks, scale on CPU > 70%)
   - ALB with HTTPS listener (ACM certificate)
   - Health check on /api/health
3. RDS PostgreSQL 15 in private subnet
   - Multi-AZ, db.t3.medium, 100GB gp3 storage
   - Automated backups, 7-day retention
4. ElastiCache Redis for session storage
5. S3 bucket for file uploads with CloudFront CDN

Use separate files: main.tf, variables.tf, outputs.tf for each module.
Modules in modules/ directory (vpc, ecs, rds, redis, cdn).
Tag everything with: Environment, Project, ManagedBy=terraform.

Time saved: A full production Terraform setup is a 1-2 day task. Claude Code generates it in minutes.
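Generated Terraform still deserves a review before you apply it, and the tagging rule in the prompt is easy to check mechanically. The sketch below is one way to do that in TypeScript, assuming you have saved the plan as JSON with `terraform show -json`. It reads only the `resource_changes[].change.after.tags` field of that output. The function name `findTagProblems` is hypothetical, and the check is deliberately narrow: it skips resource types with no `tags` attribute and does not look at `tags_all`, so it will not see tags supplied through the AWS provider's `default_tags`.

```typescript
// Only the parts of the `terraform show -json` plan output this sketch reads.
interface ResourceChange {
  address: string;
  change: {
    actions: string[];
    after: { tags?: Record<string, string> | null; [attr: string]: unknown } | null;
  };
}

interface TerraformPlan {
  resource_changes?: ResourceChange[];
}

// Tag keys required by the prompt above.
const REQUIRED_TAG_KEYS = ["Environment", "Project", "ManagedBy"];

// Returns a map of resource address -> list of tagging problems.
// (Hypothetical helper for illustration, not part of any Terraform tooling.)
function findTagProblems(plan: TerraformPlan): Record<string, string[]> {
  const problems: Record<string, string[]> = {};
  for (const rc of plan.resource_changes ?? []) {
    const after = rc.change.after;
    // Destroyed resources have no planned state, and resource types that
    // cannot be tagged have no "tags" attribute at all: skip both.
    if (after === null || !("tags" in after)) continue;
    const tags: Record<string, string> = after.tags ?? {};
    const found = REQUIRED_TAG_KEYS
      .filter((key) => !(key in tags))
      .map((key) => `missing ${key}`);
    if ("ManagedBy" in tags && tags["ManagedBy"] !== "terraform") {
      found.push("ManagedBy is not terraform");
    }
    if (found.length > 0) problems[rc.address] = found;
  }
  return problems;
}
```

An empty result means every taggable resource in the plan carries all three tags. Anything else is a list of addresses to send back to Claude Code with a request to fix them.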


Use Case 2: GitHub Actions CI/CD Pipeline

Create a GitHub Actions workflow at .github/workflows/deploy.yml:

On push to main:
1. Run TypeScript type checking
2. Run ESLint
3. Run the test suite (Jest)
4. Build the Next.js app
5. If all pass, deploy to Vercel production
6. After deploy, run a smoke test (curl the /api/health endpoint, expect 200)
7. If smoke test fails, automatically roll back the Vercel deployment

On pull request:
1. Run steps 1-4 (no production deploy)
2. Post a comment on the PR with the build status and test coverage
3. Deploy a Vercel preview and post the preview URL as a PR comment

Use caching for node_modules, concurrency groups so multiple pushes
don't deploy simultaneously, and environment secrets for VERCEL_TOKEN.

Use Case 3: Docker Multi-Stage Build Optimization

Our Dockerfile builds a Next.js app but the image is 1.2GB and
takes 8 minutes to build. Optimize it.

Goals:
- Final image under 200MB
- Build time under 3 minutes with warm cache
- Use multi-stage build (deps stage, build stage, runner stage)
- Runner stage should use node:20-alpine
- Only copy production artifacts to the final stage
- Add proper .dockerignore and health check instruction
- Pin all base image versions by SHA digest for reproducibility
- Don't run as root in the final stage
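
Three of the goals above (digest pinning, a non-root final stage, and a health check) can be verified on whatever Dockerfile comes back. Here is a minimal TypeScript sketch of such a check. The name `lintDockerfile` and its messages are made up for illustration, and the parsing is a heuristic rather than a real Dockerfile parser: it ignores line continuations, `ARG` substitution in `FROM`, and any `USER` or `HEALTHCHECK` inherited from the base image.

```typescript
// Heuristic check of a Dockerfile against three goals from the prompt:
// base images pinned by digest, a non-root final stage, and a HEALTHCHECK.
// (Hypothetical helper for illustration only.)
function lintDockerfile(source: string): string[] {
  const issues: string[] = [];
  const stageNames: string[] = []; // aliases declared with "FROM ... AS <name>"
  let lastUser: string | null = null;
  let hasHealthcheck = false;

  for (const raw of source.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.charAt(0) === "#") continue; // blank line or comment
    const tokens = line.split(/\s+/);
    const instruction = (tokens[0] ?? "").toUpperCase();

    if (instruction === "FROM") {
      // New stage: start tracking USER and HEALTHCHECK afresh. Settings
      // inherited from the base image are not detected by this sketch.
      lastUser = null;
      hasHealthcheck = false;
      // Drop flags such as --platform=... so args[0] is the image reference.
      const args = tokens.slice(1).filter((t) => t.indexOf("--") !== 0);
      const image = args[0] ?? "";
      const isEarlierStage = stageNames.indexOf(image.toLowerCase()) !== -1;
      if (!isEarlierStage && image !== "scratch" && image.indexOf("@sha256:") === -1) {
        issues.push(`base image not pinned by digest: ${image}`);
      }
      const alias = args[2];
      if ((args[1] ?? "").toUpperCase() === "AS" && alias !== undefined) {
        stageNames.push(alias.toLowerCase());
      }
    } else if (instruction === "USER") {
      // Keep only the user part of "user:group".
      lastUser = (tokens[1] ?? "").split(":")[0] ?? null;
    } else if (instruction === "HEALTHCHECK") {
      hasHealthcheck = true;
    }
  }

  // After the loop, lastUser and hasHealthcheck describe the final stage.
  if (lastUser === null || lastUser === "" || lastUser === "root" || lastUser === "0") {
    issues.push("final stage sets no non-root USER");
  }
  if (!hasHealthcheck) issues.push("final stage has no HEALTHCHECK");
  return issues;
}
```

Image size and build time are not covered here. Those two goals you still have to measure by building the image.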

Use Case 4: Monitoring and Alerting Setup

Set up monitoring for our Next.js app deployed on Vercel:

1. Create a lib/monitoring.ts module that wraps logging and metrics:
   - Structured JSON logging (not console.log)
   - Request duration tracking for all API routes
   - Error rate tracking with stack traces
   - Custom business metrics (signups, purchases, api_calls)

2. Create an API route app/api/health/route.ts:
   - Check database connectivity (Supabase query)
   - Check Stripe API reachability
   - Return 200 with status of each dependency, or 503 if any is down

3. Create alert rules:
   - Error rate > 5% for 5 minutes → Slack alert
   - P99 latency > 3s for 10 minutes → Slack alert
   - Health check down for 2 minutes → PagerDuty alert
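
To make items 2 and 3 concrete, here is a hedged sketch of the two pieces of logic in plain TypeScript. `summarizeHealth` and `breachedFor` are hypothetical names, not part of Next.js or any monitoring product. The real route would run the Supabase and Stripe checks and pass their results in, and a real alerting system would evaluate its rules on its own side. The sketch also assumes that "for 5 minutes" means an unbroken run of breaching samples lasting at least that long.

```typescript
type DependencyStatus = "up" | "down";

interface HealthReport {
  httpStatus: 200 | 503;
  dependencies: Record<string, DependencyStatus>;
}

// Item 2: 200 only when every dependency check passed, 503 if any failed.
function summarizeHealth(checks: Record<string, boolean>): HealthReport {
  const dependencies: Record<string, DependencyStatus> = {};
  let allUp = true;
  for (const name of Object.keys(checks)) {
    const ok = checks[name] === true;
    dependencies[name] = ok ? "up" : "down";
    if (!ok) allUp = false;
  }
  return { httpStatus: allUp ? 200 : 503, dependencies };
}

interface Sample {
  timestampMs: number; // when the metric was sampled
  value: number;       // e.g. error rate as a fraction, or P99 latency in seconds
}

// Item 3: true when the most recent samples form an unbroken run above the
// threshold and that run started at least windowMs before nowMs.
function breachedFor(
  samples: Sample[],
  threshold: number,
  windowMs: number,
  nowMs: number,
): boolean {
  // Walk from the newest sample backwards until the streak is broken.
  const newestFirst = samples.slice().sort((a, b) => b.timestampMs - a.timestampMs);
  let breachStart: number | null = null;
  for (const sample of newestFirst) {
    if (sample.value <= threshold) break; // streak ends at the first healthy sample
    breachStart = sample.timestampMs;
  }
  return breachStart !== null && nowMs - breachStart >= windowMs;
}
```

For the first rule in the prompt, you would call `breachedFor(errorRateSamples, 0.05, 5 * 60 * 1000, Date.now())` and send the Slack alert when it returns true.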

Use Case 5: Incident Runbooks

Create incident runbooks in docs/runbooks/ for our most common incidents:

1. database-connection-exhausted.md
   - Symptoms, diagnosis steps, resolution, prevention

2. stripe-webhook-failures.md
   - Symptoms, diagnosis, resolution, prevention

3. deployment-rollback.md
   - When to roll back, steps, post-rollback actions

4. high-latency.md
   - Symptoms, diagnosis, resolution by cause

Each runbook should follow the same template: Severity, Symptoms,
Diagnosis, Resolution, Prevention, Escalation contacts.

Why this works: Nobody writes runbooks until after an incident. Claude Code generates thorough, well-structured runbooks from your architecture description.
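
Because the prompt asks for one shared template, it is cheap to confirm that each generated runbook actually follows it. The sketch below assumes the runbooks are Markdown files whose section headings use `#`-style headings with exactly the names from the template. `missingSections` is a hypothetical helper, not something Claude Code provides.

```typescript
// Section names from the template in the prompt above.
const RUNBOOK_SECTIONS = [
  "Severity",
  "Symptoms",
  "Diagnosis",
  "Resolution",
  "Prevention",
  "Escalation contacts",
];

// Returns the template sections that have no matching heading in the runbook.
// Matching is case-insensitive and accepts any heading level from # to ######.
function missingSections(markdown: string): string[] {
  const headings = markdown
    .split("\n")
    .filter((line) => /^#{1,6}\s/.test(line))
    .map((line) => line.replace(/^#{1,6}\s+/, "").trim().toLowerCase());
  return RUNBOOK_SECTIONS.filter(
    (section) => headings.indexOf(section.toLowerCase()) === -1,
  );
}
```

Run it over every file in docs/runbooks/ and ask Claude Code to fill in whatever sections come back as missing.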


DevOps CLAUDE.md Template

# CLAUDE.md

## Infrastructure
AWS: ECS Fargate, RDS PostgreSQL, ElastiCache, S3/CloudFront.
Deployment: Vercel (app), Terraform (infra).
CI/CD: GitHub Actions. Monitoring: Grafana Cloud + structured logging.

## Conventions
- Terraform: modules in modules/, environments in envs/
- Docker: multi-stage builds, alpine base, non-root user
- GitHub Actions: reusable workflows in .github/workflows/
- Scripts: bash with set -euo pipefail, shellcheck compliant
- Secrets: never in code, always in environment variables

## Don'ts
- Never hardcode AWS credentials or API keys
- Don't modify production Terraform state manually
- No latest tags for Docker images — always pin versions

If you're using Claude Code for infrastructure and want to share patterns with other DevOps engineers, join AI Builder Club. We discuss IaC patterns, CI/CD optimization, and real production setups.
