GuideDevOps
Lesson 14 of 14

Terraform Best Practices

Part of the Terraform tutorial series.

Foundational Practices

1. Remote State is Mandatory

DON'T: Store state locally in production

# ❌ BAD: Local state only
# terraform.tfstate in working directory

DO: Use remote, shared backends

# ✅ GOOD: Remote S3 backend
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

Why: A shared backend gives the team a single source of truth, locks the state during writes, and can be encrypted, versioned, and access-logged
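
The backend block above assumes that the bucket and lock table already exist. A minimal sketch of how they might be created from a separate bootstrap configuration is shown here. The resource labels are illustrative, the names simply mirror the backend block, and the lock table uses the string partition key LockID that the S3 backend expects.

```hcl
# Bootstrap sketch: assumed to live in its own configuration, applied once.
resource "aws_s3_bucket" "state" {
  bucket = "company-terraform-state"

  lifecycle {
    prevent_destroy = true # guard against accidental deletion of the state bucket
  }
}

# Keep earlier state versions so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# State files can contain secrets, so the bucket should never be public.
resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table referenced by dynamodb_table in the backend block.
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```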

2. Version Everything

DON'T: Leave versions unconstrained

# ❌ BAD: Unpinned versions
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

DO: Constrain the Terraform and provider versions

# ✅ GOOD: Locked versions
terraform {
  required_version = "~> 1.5.0"
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.20"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }
  }
}

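Why: Prevents an unreviewed upgrade from introducing breaking changes in production. Commit .terraform.lock.hcl as well, so every run installs the same provider builds.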
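
The ~> operator lets only the rightmost version component that was specified increase. As a sketch, two of the constraints above could be written out as explicit ranges like this:

```hcl
terraform {
  # "~> 1.5.0" is equivalent to ">= 1.5.0, < 1.6.0" (patch releases only)
  required_version = ">= 1.5.0, < 1.6.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 5.20" is equivalent to ">= 5.20.0, < 6.0.0" (minor and patch releases)
      version = ">= 5.20.0, < 6.0.0"
    }
  }
}
```

To hold the minor version fixed as well, a three-part constraint such as "~> 5.20.0" would be needed.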

3. Separate Concerns

DON'T: Build everything in one folder

Networking/ + Compute/ + Database/ = terraform/
(Monolithic root module - HARD TO MANAGE)

DO: Organize by logical layers

terraform/
  ├─ core/              # Networking, VPC, subnets
  ├─ database/          # RDS, data layer
  ├─ compute/           # EC2, load balancers
  └─ monitoring/        # CloudWatch, alerts

Why: Each layer can be modified, tested, and deployed independently
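
Once the layers are split, one layer often needs values from another. One possible approach, sketched here, is the terraform_remote_state data source. The state key and the output name private_subnet_ids are assumptions made for illustration, not part of the layout above.

```hcl
# In compute/: read outputs published by the core layer.
# Assumes core/ stores its state under the hypothetical key "core/terraform.tfstate"
# and declares an output named "private_subnet_ids".
data "terraform_remote_state" "core" {
  backend = "s3"

  config = {
    bucket = "company-terraform-state"
    key    = "core/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  private_subnet_ids = data.terraform_remote_state.core.outputs.private_subnet_ids
}
```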

4. Use Meaningful Naming

Item          Naming                  Example
Resources     descriptive_type        aws_security_group_web
Variables     snake_case              instance_type
Outputs       noun_describing_value   alb_dns_name
Local names   use_type                security_group_web
Modules       domain_functionality    networking_vpc

Code Organization

5. Follow Standard File Structure

Recommended layout:

project/
  ├─ main.tf           # Primary resources
  ├─ variables.tf      # Variable declarations
  ├─ outputs.tf        # Output definitions
  ├─ locals.tf         # Local values
  ├─ data.tf           # Data sources
  ├─ terraform.tf      # Provider and backend
  ├─ locals_override.tf # Local overrides (git-ignored)
  └─ modules/          # Reusable modules
      └─ vpc/
          ├─ main.tf
          ├─ variables.tf
          └─ outputs.tf

Why: Team members immediately know where to find things

6. Use Modules for Reusability

DON'T: Copy-paste resource definitions

# ❌ BAD: Duplicated in dev, staging, prod
resource "aws_security_group" "web_dev" {
  # 50 lines of config
}
resource "aws_security_group" "web_staging" {
  # 50 lines identical config
}
resource "aws_security_group" "web_prod" {
  # 50 lines identical config
}

DO: Create a reusable module

# ✅ GOOD: Single module for all environments
module "web_sg" {
  source = "./modules/security_group"
  
  for_each = toset(["dev", "staging", "prod"])
  
  environment   = each.value
  vpc_id        = var.vpc_ids[each.value]
  ingress_ports = var.ingress_ports
}
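
For context, here is a rough sketch of what the module behind that call might contain. The file split, the resource label, and the allowed_cidrs variable with its placeholder default are all assumptions; only environment, vpc_id, and ingress_ports come from the call above.

```hcl
# modules/security_group/variables.tf (hypothetical)
variable "environment" {
  description = "Environment name used in the group name and tags"
  type        = string
}

variable "vpc_id" {
  description = "VPC in which to create the security group"
  type        = string
}

variable "ingress_ports" {
  description = "TCP ports to open for inbound traffic"
  type        = list(number)
}

variable "allowed_cidrs" {
  description = "CIDR blocks allowed to reach the ingress ports (assumed input)"
  type        = list(string)
  default     = ["10.0.0.0/8"] # placeholder: private address space only
}

# modules/security_group/main.tf (hypothetical)
resource "aws_security_group" "this" {
  name   = "web-${var.environment}"
  vpc_id = var.vpc_id

  # One ingress rule per requested port.
  dynamic "ingress" {
    for_each = toset(var.ingress_ports)
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = var.allowed_cidrs
    }
  }

  tags = {
    Environment = var.environment
  }
}

# modules/security_group/outputs.tf (hypothetical)
output "id" {
  description = "ID of the created security group"
  value       = aws_security_group.this.id
}
```

Because the module call uses for_each, each instance is addressed by its key, for example module.web_sg["prod"].id.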

7. Document Everything

variable "instance_type" {
  description = "EC2 instance type for web servers"
  type        = string
  default     = "t3.micro"
  
  # Document why this default exists
  # and what each type means for performance/cost
}
 
output "alb_dns_name" {
  description = "DNS name of Application Load Balancer"
  value       = aws_lb.main.dns_name
  
  # Explain what this URL is used for
  # and how to access the application
}

Input Validation & Safety

8. Use Variable Validation

variable "environment" {
  type    = string
  
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}
 
variable "instance_count" {
  type    = number
  
  validation {
    condition     = var.instance_count >= 1 && var.instance_count <= 100
    error_message = "Instance count must be between 1 and 100."
  }
}

Why: Catch errors early before infrastructure changes
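
Validation also works for structured strings. The sketch below uses the can() function to turn a failing expression into false. Both variables and the instance family policy are hypothetical examples.

```hcl
variable "vpc_cidr" {
  type = string

  validation {
    # cidrhost() fails on malformed input; can() converts that failure to false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block, for example 10.0.0.0/16."
  }
}

variable "instance_type" {
  type = string

  validation {
    # Hypothetical policy: allow only the t3 and m5 families.
    condition     = can(regex("^(t3|m5)\\.", var.instance_type))
    error_message = "The instance_type value must belong to the t3 or m5 family."
  }
}
```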

9. Use Sensitive Data Properly

# ❌ WRONG: Hardcoded secrets
variable "db_password" {
  type    = string
  default = "mypassword123"  # NEVER EVER
}
 
# ✅ RIGHT: Use AWS Secrets Manager
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "rds/prod/password"
}
 
resource "aws_db_instance" "main" {
  password = data.aws_secretsmanager_secret_version.db.secret_string
}

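Why: Keeps secrets out of code and logs. Values read through a data source are still written to the state file, so the state itself must be protected too.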
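
Two related patterns, sketched under assumptions: a variable marked sensitive and supplied at runtime, and a secret stored as JSON. The "password" key is an assumed layout for the secret, not something the example above specifies.

```hcl
# Supplied at runtime, for example through the TF_VAR_db_password
# environment variable, instead of a default in code.
variable "db_password" {
  description = "Master password for the database"
  type        = string
  sensitive   = true # redacted in plan and apply output
}

data "aws_secretsmanager_secret_version" "db" {
  secret_id = "rds/prod/password"
}

locals {
  # Assumes the secret stores a JSON object with a "password" key.
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string)
  db_password    = local.db_credentials["password"]
}
```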

10. Use State Locking

# Locking is enabled by default; -lock=true only makes it explicit
terraform apply -lock=true
 
# Wait up to 30 seconds for a lock held by someone else before failing
# (the default, -lock-timeout=0s, fails immediately)
terraform apply -lock-timeout=30s

Deployment Practices

11. Always Run Plan First

# Read-only operation shows what will change
terraform plan -out=tfplan
 
# Review changes carefully
 
# Then apply only approved changes
terraform apply tfplan

Why: Applying a saved plan file performs exactly the changes that were reviewed, which prevents surprises in production

12. Use Workspaces for Environments

DON'T: Duplicate the code in a separate folder per environment

cd terraform-dev   && terraform apply  # Dev environment
cd ../terraform-staging && terraform apply  # Staging
cd ../terraform-prod && terraform apply  # Production

DO: Use workspaces with single codebase

terraform workspace select dev      && terraform apply -var-file=terraform.dev.tfvars
terraform workspace select staging  && terraform apply -var-file=terraform.staging.tfvars
terraform workspace select prod     && terraform apply -var-file=terraform.prod.tfvars
 
# Variables differ per workspace. These files are not loaded automatically,
# so each one is passed explicitly with -var-file.
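
Inside the configuration, the current workspace name is available as terraform.workspace. A small sketch of selecting settings by workspace follows; the sizing map is hypothetical.

```hcl
locals {
  # Hypothetical per-workspace sizing; keys match the workspace names above.
  instance_types = {
    dev     = "t3.micro"
    staging = "t3.small"
    prod    = "t3.large"
  }

  environment = terraform.workspace

  # Fall back to the smallest size for any workspace not listed.
  instance_type = lookup(local.instance_types, terraform.workspace, "t3.micro")
}
```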

13. Automate with CI/CD

# .github/workflows/terraform.yml
name: Terraform
 
on: [pull_request, push]
 
jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
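      - run: terraform fmt -check -recursive
      # init must run before validate and plan; backend credentials are
      # assumed to be available to the job (for example as repository secrets)
      - run: terraform init -input=false
      - run: terraform validate
      - run: terraform plan -input=false -out=tfplan
      - name: Comment on PR
        run: |
          terraform show -no-color tfplan > /tmp/plan.txt
          # Use GitHub API to post comment with plan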

Code Quality

14. Format Code Consistently

# Auto-format all .tf files
terraform fmt -recursive
 
# Check if formatting is correct (for CI/CD)
terraform fmt -recursive -check

Why: Consistency across team, easier reviews

15. Validate Syntax

# Check syntax and internal consistency without touching real infrastructure
# (requires terraform init to have been run)
terraform validate
 
# Good for CI/CD pipelines

16. Use Linting Tools

# Install tflint
brew install tflint
 
# Check for issues
tflint

Common rules:

  • Naming conventions
  • Unused variables
  • Deprecated attributes
  • Best practice violations
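
Rules are switched on in a .tflint.hcl file at the repository root. The sketch below assumes a TFLint release that bundles the Terraform ruleset plugin; the rule names come from that ruleset.

```hcl
# .tflint.hcl
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

# Naming conventions (snake_case by default)
rule "terraform_naming_convention" {
  enabled = true
}

# Unused variables, locals, and data sources
rule "terraform_unused_declarations" {
  enabled = true
}

# Deprecated "${...}"-only interpolation syntax
rule "terraform_deprecated_interpolation" {
  enabled = true
}

# Variables without a description
rule "terraform_documented_variables" {
  enabled = true
}
```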

17. Security Scanning

# Install tfsec
brew install tfsec
 
# Find security issues
tfsec .
 
# Example: Unencrypted S3, exposed RDS

State Management

18. Protect State Files

# Enable encryption (bucket, key, and region omitted for brevity)
terraform {
  backend "s3" {
    encrypt    = true
    kms_key_id = "arn:aws:kms:region:account:key/id"
  }
}
 
# Block public access
aws s3api put-public-access-block \
  --bucket terraform-state \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

19. Regular State Backups

# Manual backup
terraform state pull > terraform.tfstate.backup
 
# Enable S3 versioning so that earlier state versions can be restored
aws s3api put-bucket-versioning \
  --bucket terraform-state \
  --versioning-configuration Status=Enabled

20. Never Edit State Directly

# ❌ WRONG
vi terraform.tfstate
# Then upload somehow
 
# ✅ RIGHT: Use Terraform
terraform state rm aws_instance.old
terraform state mv aws_instance.old_name aws_instance.new_name
terraform import aws_instance.web i-1234567890abcdef0

Why: If the state is corrupted, Terraform loses track of real resources and later plans may try to recreate or destroy them
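
Renames and imports can also be declared in configuration, which makes them reviewable in a pull request like any other change. A sketch, assuming Terraform 1.1 or later for moved blocks and 1.5 or later for import blocks; the resource addresses are illustrative.

```hcl
# Equivalent of "terraform state mv": Terraform updates the state
# on the next apply instead of destroying and recreating the instance.
moved {
  from = aws_instance.old_name
  to   = aws_instance.new_name
}

# Equivalent of "terraform import": the existing instance is adopted
# into aws_instance.web on the next apply.
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"
}
```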


Scaling Practices

21. Use Variables for Configuration

DON'T: Hardcode values

resource "aws_instance" "web" {
  instance_type     = "t2.large"   # Hardcoded
  availability_zone = "us-east-1a" # Hardcoded
}

DO: Use variables

resource "aws_instance" "web" {
  instance_type     = var.instance_type
  availability_zone = var.availability_zone
}
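
The variables referenced above need declarations, and each environment supplies its own values. A minimal sketch follows; the tfvars file name and values are hypothetical.

```hcl
# variables.tf
variable "instance_type" {
  description = "EC2 instance type for the web server"
  type        = string
  default     = "t3.micro"
}

variable "availability_zone" {
  description = "Availability zone for the web server"
  type        = string
}

# terraform.prod.tfvars (hypothetical values, passed with -var-file)
instance_type     = "t2.large"
availability_zone = "us-east-1a"
```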

22. Use Locals for Computed Values

locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
    CostCenter  = var.cost_center
  }
  
  name_prefix = "${var.project_name}-${var.environment}"
}
 
resource "aws_instance" "web" {
  tags = merge(
    local.common_tags,
    { Name = "${local.name_prefix}-web-server" }
  )
}

23. Use count and for_each Wisely

# count: When the instances are identical and only their number varies
resource "aws_instance" "web" {
  count         = var.instance_count
  instance_type = var.instance_type
  
  tags = { Name = "web-${count.index + 1}" }
}
 
# for_each: When using map with different configs
resource "aws_instance" "servers" {
  for_each          = var.server_configs
  instance_type     = each.value.type
  availability_zone = each.value.az
  
  tags = { Name = each.key }
}
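
The for_each example relies on var.server_configs being a map of objects. One way it might be declared is sketched below; the default entries are hypothetical.

```hcl
variable "server_configs" {
  description = "Servers to create, keyed by name"

  type = map(object({
    type = string # EC2 instance type
    az   = string # availability zone
  }))

  # Hypothetical example values.
  default = {
    api    = { type = "t3.small", az = "us-east-1a" }
    worker = { type = "t3.large", az = "us-east-1b" }
  }
}
```

Removing a key from this map destroys only that server. With count, removing an element from the middle of a list shifts every later index, which can force unrelated instances to be replaced.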

Performance & Costs

24. Use data sources to reference existing infrastructure

# DON'T hardcode IDs
# DO fetch them
data "aws_vpc" "main" {
  tags = { Name = "production" }
}
 
resource "aws_subnet" "app" {
  vpc_id = data.aws_vpc.main.id
}
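
Data sources can be combined so that nothing environment-specific is typed by hand. The sketch below uses different labels to avoid clashing with the example above, and assumes the VPC uses a /16 range.

```hcl
# Availability zones currently usable in the configured region.
data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_vpc" "selected" {
  tags = { Name = "production" }
}

resource "aws_subnet" "app_az1" {
  vpc_id            = data.aws_vpc.selected.id
  availability_zone = data.aws_availability_zones.available.names[0]

  # Carve the second /24 out of the VPC range (assumes a /16 VPC).
  cidr_block = cidrsubnet(data.aws_vpc.selected.cidr_block, 8, 1)
}
```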

25. Avoid Common Mistakes

Mistake                    Problem            Solution
Modifying state directly   Inconsistency      Use Terraform operations
Not using remote state     No collaboration   Set up S3 + DynamoDB
Hardcoding values          Non-reusable       Extract to variables
No variable validation     Bad inputs         Add validation blocks
Ignoring plan output       Surprises          Review the plan before applying
Not versioning providers   Breaking changes   Pin versions
Large root modules         Hard to manage     Use modules
Secrets in code            Security breach    Use AWS Secrets Manager

Team Workflow

26. Pull Request Process

1. Developer creates feature branch
2. Makes Terraform changes
3. Runs: terraform fmt, validate, plan
4. Creates PR with plan output
5. Team reviews the plan output (terraform show)
6. Approves if safe
7. Merge to main
8. CI/CD automatically applies to staging
9. Manual approval for production apply

27. Documentation as Code

# README.md in each module explains:
# - What this module does
# - Input variables
# - Outputs
# - Example usage
# - Common modifications
 
# Provided in every module directory

Summary

Priority   Practice             Impact
Critical   Remote state         Team safety
Critical   Version pinning      Stability
Critical   Validation           Error prevention
High       Modular structure    Maintainability
High       Documentation        Team productivity
Medium     Code formatting      Consistency
Medium     CI/CD automation     Reliability
Medium     Naming conventions   Code clarity