GuideDevOps
Lesson 11 of 11

CI/CD Best Practices

Part of the CI/CD Pipelines tutorial series.

The 10 Commandments of CI/CD

These are the foundational rules that separate amateur pipelines from production-grade ones. Follow these, and your CI/CD will be reliable, fast, and secure.

1.  Build once, deploy everywhere
2.  Everything in code (pipeline as code)
3.  Keep pipelines fast (under 10 minutes)
4.  Fail fast, fail loudly
5.  Never skip security
6.  Immutable artifacts only
7.  Automate everything (except approval)
8.  Monitor deployments actively
9.  Practice rollbacks regularly
10. Secure the pipeline itself

1. Build Once, Deploy Everywhere

The single most important rule of CI/CD. Never rebuild your application for different environments. Build once, then deploy the same artifact to staging, pre-production, and production.

❌ Wrong: Build Per Environment

# BAD: Building separately for each environment
deploy-staging:
  steps:
    - run: npm run build  # Build #1 for staging
    - run: docker build -t myapp:staging .
    - run: kubectl apply ...
 
deploy-production:
  steps:
    - run: npm run build  # Build #2 for production ← DIFFERENT BINARY!
    - run: docker build -t myapp:production .
    - run: kubectl apply ...

Problem: Build #1 and Build #2 might produce different results due to timing, dependency updates, or environment differences.

✅ Correct: Build Once, Deploy Same Artifact

# GOOD: Build once, deploy the exact same artifact everywhere
build:
  steps:
    - run: npm run build
    - run: docker build -t myapp:${{ github.sha }} .
    - run: docker push myapp:${{ github.sha }}
 
deploy-staging:
  needs: build
  steps:
    - run: kubectl set image deployment/myapp myapp=myapp:${{ github.sha }} -n staging
 
deploy-production:
  needs: deploy-staging
  steps:
    - run: kubectl set image deployment/myapp myapp=myapp:${{ github.sha }} -n production
    # ← EXACT same image as staging ✅
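
The promotion flow above can be sketched as a small dry-run helper. This is only an illustration: the function name `promote_commands` is hypothetical, and it prints the deploy commands rather than running them, so the point (one image reference, many environments) is visible without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the deploy command for each environment,
# always with the same image reference (no per-environment rebuild).
promote_commands() {
  image="$1"
  shift
  for ns in "$@"; do
    echo "kubectl set image deployment/myapp myapp=${image} -n ${ns}"
  done
}

# Usage (the tag is a placeholder for the SHA produced by the build job):
# promote_commands "myapp:abc123f" staging production
```

Because the image reference is passed in once and reused, there is no place in the script where a second build could sneak in.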

Environment-Specific Configuration

Use environment variables or ConfigMaps/Secrets for per-environment configuration — never bake configuration into the artifact:

# Kubernetes ConfigMap per environment
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: staging        # or production
data:
  DATABASE_URL: "postgres://staging-db:5432/myapp"
  LOG_LEVEL: "debug"
  API_URL: "https://staging-api.myapp.com"
 
---
# Production uses different values, same image
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: production
data:
  DATABASE_URL: "postgres://prod-db:5432/myapp"
  LOG_LEVEL: "warn"
  API_URL: "https://api.myapp.com"
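
On the application side, the same idea might look like the sketch below: a hypothetical entrypoint helper that resolves its settings from environment variables at start-up. The function name `load_config` and the fallback values are assumptions made for illustration; only the three variable names come from the ConfigMaps above.

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint helper: resolve runtime settings from environment
# variables (injected per namespace) instead of baking them into the image.
load_config() {
  # Fail early if the one setting with no safe default is missing.
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "DATABASE_URL is not set" >&2
    return 1
  fi
  echo "DATABASE_URL=${DATABASE_URL}"
  # Optional settings fall back to assumed defaults.
  echo "LOG_LEVEL=${LOG_LEVEL:-info}"
  echo "API_URL=${API_URL:-http://localhost:8080}"
}
```

The image stays identical across staging and production; only the injected environment differs.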

2. Pipeline as Code

Your pipeline definition must live in version control, alongside the application code. Never configure pipelines through a web UI.

Why Pipeline as Code?

| Click-Based (UI) | Pipeline as Code |
|---|---|
| Not reproducible | Fully reproducible |
| No history/audit trail | Git history shows all changes |
| Can't code-review | Pull requests for pipeline changes |
| One person knows the config | Team-visible and documented |
| Disaster = rebuild from memory | Disaster = git clone + done |

File Convention by Tool

GitHub Actions:  .github/workflows/ci.yml
GitLab CI:       .gitlab-ci.yml
Jenkins:         Jenkinsfile
CircleCI:        .circleci/config.yml
Azure Pipelines: azure-pipelines.yml

Organized Workflow Structure

.github/
├── dependabot.yml          # Dependency updates (Dependabot config, not a workflow)
└── workflows/
    ├── ci.yml              # Lint, build, test (on every PR)
    ├── cd-staging.yml      # Deploy to staging (on develop merge)
    ├── cd-production.yml   # Deploy to production (on main merge)
    ├── security.yml        # Nightly security scans
    └── cleanup.yml         # Weekly cleanup of old images

3. Keep Pipelines Fast

Slow pipelines kill productivity. If developers wait 30 minutes for a pipeline, they context-switch, lose focus, and batch changes — leading to bigger, riskier deployments.

Target Timelines

| Pipeline Stage | Target | Action if Exceeded |
|---|---|---|
| Lint | < 30 sec | Fewer rules or parallel linters |
| Build | < 2 min | Caching, multi-stage Docker builds |
| Unit tests | < 2 min | Parallel test suites |
| Integration tests | < 5 min | Run tests in parallel |
| Full pipeline | < 10 min | Optimize or parallelize |
| Deploy | < 2 min | Pre-pull images, rolling updates |
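
One way to keep these targets honest is to check measured durations against them. The helper below is a sketch under assumptions: the function names `stage_budget` and `check_budget` are hypothetical, and the budgets are simply the targets listed above converted to seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a stage's measured duration with its target.
stage_budget() {
  case "$1" in
    lint)        echo 30 ;;
    build)       echo 120 ;;
    unit)        echo 120 ;;
    integration) echo 300 ;;
    deploy)      echo 120 ;;
    pipeline)    echo 600 ;;
    *)           return 1 ;;   # unknown stage
  esac
}

check_budget() {
  local stage="$1" seconds="$2" budget
  budget=$(stage_budget "$stage") || { echo "unknown stage: $stage" >&2; return 2; }
  if [ "$seconds" -lt "$budget" ]; then
    echo "OK: $stage took ${seconds}s (target < ${budget}s)"
  else
    echo "SLOW: $stage took ${seconds}s (target < ${budget}s)"
    return 1
  fi
}

# Usage: check_budget build 95
```

A step like this could run at the end of each job and warn, rather than fail, until the team has agreed on the budgets.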

Speed Optimization Techniques

Cache Dependencies

# GitHub Actions — npm cache
- uses: actions/setup-node@v4
  with:
    node-version: '20'
    cache: 'npm'  # Caches ~/.npm automatically
 
# Docker layer caching
- uses: docker/build-push-action@v5
  with:
    cache-from: type=gha
    cache-to: type=gha,mode=max

Parallelize Independent Jobs

# Run independent jobs simultaneously
jobs:
  lint:       # ┐
    ...       # ├── These 3 run in PARALLEL
  test-unit:  # │
    ...       # │
  security:   # ┘
    ...
 
  deploy:
    needs: [lint, test-unit, security]  # Wait for all 3

Skip Unnecessary Work

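on:
  push:
    # GitHub Actions does not allow paths and paths-ignore on the same event,
    # so exclusions go in the same list with a leading "!"
    paths:
      - 'src/**'           # Only run if source code changed
      - 'package.json'
      - 'Dockerfile'
      - '!**.md'           # Skip on doc changes
      - '!.github/ISSUE_TEMPLATE/**'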

Cancel Redundant Runs

# Cancel previous pipeline if new commit pushed
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

4. Fail Fast, Fail Loudly

The cheapest checks should run first. If linting takes 15 seconds and catches a bug, don't waste 5 minutes building before discovering it.

Optimal Stage Order

Stage 1: Lint          (15 sec)  ← Cheapest check first
Stage 2: Type Check    (20 sec)  ← Static analysis
Stage 3: Unit Tests    (45 sec)  ← Fast tests
Stage 4: Build         (90 sec)  ← Compile/package
Stage 5: Integration   (3 min)   ← Heavier tests
Stage 6: Security Scan (2 min)   ← Can run in parallel with Stage 5
Stage 7: E2E Tests     (5 min)   ← Slowest, most expensive
Stage 8: Deploy        (1 min)   ← Only if everything passes
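
The ordering above can be sketched as a tiny runner that executes stages in sequence and stops at the first failure. The function name `run_stages` is hypothetical, and the npm commands in the usage line are placeholders for whatever your project runs.

```shell
#!/usr/bin/env bash
# Hypothetical stage runner: execute command lines in order and stop at the
# first failure, so later (more expensive) stages never start.
run_stages() {
  local stage
  for stage in "$@"; do
    echo "> $stage"
    # Each argument is one command line; a non-zero exit aborts the rest.
    if ! sh -c "$stage"; then
      echo "failed: $stage" >&2
      return 1
    fi
  done
  echo "all stages passed"
}

# Usage (cheapest first):
# run_stages "npm run lint" "npx tsc --noEmit" "npm test" "npm run build"
```

In a real CI system the same effect comes from job dependencies; the script only makes the short-circuit behavior explicit.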

Loud Notifications on Failure

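# Always notify on failure
- name: Notify failure
  if: failure()
  env:
    # Pass untrusted text (commit messages) via env, never inline in the script
    COMMIT_MESSAGE: ${{ github.event.head_commit.message }}
    SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
    RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
  run: |
    # jq builds valid JSON even when the message contains quotes or newlines
    jq -n \
      --arg repo "$GITHUB_REPOSITORY" --arg branch "$GITHUB_REF_NAME" \
      --arg sha "$GITHUB_SHA" --arg author "$GITHUB_ACTOR" \
      --arg message "$COMMIT_MESSAGE" --arg url "$RUN_URL" \
      '{
        text: "❌ Pipeline FAILED",
        blocks: [{
          type: "section",
          text: {
            type: "mrkdwn",
            text: "❌ *Pipeline Failed*\n*Repo:* \($repo)\n*Branch:* \($branch)\n*Commit:* `\($sha)`\n*Author:* \($author)\n*Message:* \($message)\n<\($url)|View Pipeline>"
          }
        }]
      }' \
      | curl -X POST "$SLACK_WEBHOOK" -H 'Content-Type: application/json' -d @-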

Set Timeouts on Every Job

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15      # Kill if stuck for 15 min
 
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
 
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 5

5. Never Skip Security

Every pipeline must include security scanning. At minimum, scan your dependencies. Ideally, scan your code and containers too.

Minimum Security Pipeline

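security:
  runs-on: ubuntu-latest
  permissions:
    contents: read
    security-events: write   # Required for CodeQL to upload results
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0       # Full history so secret detection can scan past commits

    # 1. Dependency vulnerabilities (MUST HAVE)
    - run: npm audit --audit-level=high

    # 2. Secret detection (MUST HAVE)
    # In real pipelines, pin third-party actions to a release tag or commit SHA, not a branch
    - uses: trufflesecurity/trufflehog@main
      with:
        extra_args: --only-verified

    # 3. Static code analysis (RECOMMENDED)
    - uses: github/codeql-action/init@v3
      with:
        languages: javascript-typescript
    - uses: github/codeql-action/analyze@v3

    # 4. Container scanning (IF using Docker; the image must already be built or pulled)
    - uses: aquasecurity/trivy-action@master
      with:
        image-ref: myapp:${{ github.sha }}
        severity: 'CRITICAL,HIGH'
        exit-code: '1'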

Security Scanning Schedule

| Scan Type | Frequency | Block Deploy? |
|---|---|---|
| npm audit | Every pipeline | Yes (high/critical) |
| Secret detection | Every pipeline | Yes |
| SAST (CodeQL) | Every PR | Yes (high severity) |
| Container scan | Every build | Yes (critical) |
| Full DAST scan | Nightly | No (creates tickets) |
| SBOM generation | Every release | No (compliance) |

6. Immutable Artifacts

Once an artifact is built and tagged, never overwrite it. If you need a fix, build a new artifact with a new tag.

❌ Wrong: Mutable Tags

# DON'T DO THIS
docker build -t myapp:latest .
docker push myapp:latest
# Next day...
docker build -t myapp:latest .  # ← Overwrites yesterday's image!
docker push myapp:latest

✅ Correct: Immutable Tags

# DO THIS
docker build -t myapp:abc123f .    # Tagged with git SHA
docker push myapp:abc123f
# Next day...
docker build -t myapp:def456a .    # New tag, old image preserved
docker push myapp:def456a

Why Immutability Matters

| Mutable | Immutable |
|---|---|
| myapp:latest could be anything | myapp:abc123f is always the same |
| Can't reproduce a deployment | Can always redeploy exact version |
| Rollback might not work | Rollback always works |
| "What version is running?" → "¯\_(ツ)_/¯" | "What version is running?" → "abc123f" |
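
A pipeline can refuse floating tags before anything is pushed. The guard below is a minimal sketch that assumes a SHA-only tagging convention (7 to 40 lowercase hex characters); the function names are hypothetical, and teams that also publish semantic-version tags would need to widen the pattern.

```shell
#!/usr/bin/env bash
# Hypothetical guard: accept only tags that look like a git SHA and reject
# floating tags such as "latest".
is_immutable_tag() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{7,40}$'
}

require_immutable_tag() {
  if is_immutable_tag "$1"; then
    echo "ok: $1"
  else
    echo "refusing to push mutable tag: $1" >&2
    return 1
  fi
}

# Usage: require_immutable_tag "$GIT_SHA" && docker push "myapp:$GIT_SHA"
```

This only checks the naming convention; it does not stop someone from overwriting a SHA-named tag, which is something the registry itself has to enforce.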

7. Automate Everything (Except Approval)

The only human step in your pipeline should be the approval gate before production deployment. Everything else must be automated.

What to Automate

✅ Code linting
✅ Building artifacts
✅ Running all tests
✅ Security scanning
✅ Deploying to staging
✅ Smoke testing staging
✅ Creating release notes
✅ Tagging releases
✅ Notifying team
✅ Monitoring deployment health
✅ Rolling back on failure

What to Keep Manual

👤 Approval to deploy to production
👤 Decision to roll back (unless auto-rollback is configured)
👤 Major version releases (marketing coordination)

Automated Release Notes

release:
  runs-on: ubuntu-latest
  if: github.ref == 'refs/heads/main'
  permissions:
    contents: write          # Needed to create the tag and the release
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
 
    - name: Generate changelog
      id: changelog
      run: |
        PREVIOUS_TAG=$(git describe --tags --abbrev=0 HEAD^ 2>/dev/null || echo "")
        if [ -z "$PREVIOUS_TAG" ]; then
          CHANGES=$(git log --pretty=format:"- %s (%h)" -20)
        else
          CHANGES=$(git log $PREVIOUS_TAG..HEAD --pretty=format:"- %s (%h)")
        fi
        echo "changes<<EOF" >> $GITHUB_OUTPUT
        echo "$CHANGES" >> $GITHUB_OUTPUT
        echo "EOF" >> $GITHUB_OUTPUT
 
    # Note: actions/create-release is archived; `gh release create` is a maintained alternative
    - name: Create GitHub Release
      uses: actions/create-release@v1
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      with:
        tag_name: v${{ github.run_number }}
        release_name: Release v${{ github.run_number }}
        body: |
          ## Changes
          ${{ steps.changelog.outputs.changes }}
 
          ## Deployment Info
          - **Commit:** ${{ github.sha }}
          - **Date:** ${{ github.event.head_commit.timestamp }}
          - **Author:** ${{ github.actor }}

8. Monitor Deployments Actively

Deploying is not the end — it's the beginning of monitoring. Your pipeline should include post-deployment health checks and alerting.

Post-Deployment Health Checks

post-deploy-monitor:
  name: Post-Deployment Monitoring
  runs-on: ubuntu-latest
  needs: deploy-production
 
  steps:
    - name: Wait for app to stabilize
      run: sleep 60
 
    - name: Check application health
      run: |
        for i in $(seq 1 5); do
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://myapp.com/health)
          if [ "$STATUS" != "200" ]; then
            echo "❌ Health check failed (attempt $i/5): HTTP $STATUS"
            if [ "$i" = "5" ]; then exit 1; fi
            sleep 10
          else
            echo "✅ Health check passed: HTTP $STATUS"
            break
          fi
        done
 
    - name: Check error rate
      run: |
        # Query Prometheus/Datadog for error rate
        ERROR_RATE=$(curl -s "https://monitoring.myapp.com/api/metrics" \
          -H "Authorization: Bearer ${{ secrets.MONITORING_TOKEN }}" \
          | jq -r '.error_rate')
 
        echo "Error rate: ${ERROR_RATE}%"
 
        if (( $(echo "$ERROR_RATE > 5" | bc -l) )); then
          echo "❌ Error rate too high! Triggering rollback..."
          exit 1
        fi
 
    - name: Check response latency
      run: |
        LATENCY=$(curl -s -o /dev/null -w "%{time_total}" https://myapp.com/api/status)
        echo "Response time: ${LATENCY}s"
 
        if (( $(echo "$LATENCY > 2.0" | bc -l) )); then
          echo "⚠️ Latency elevated: ${LATENCY}s (threshold: 2.0s)"
        fi
 
    - name: Auto-rollback on failure
      if: failure()
      run: |
        echo "🔄 Auto-rolling back production deployment..."
        kubectl rollout undo deployment/myapp -n production
        kubectl rollout status deployment/myapp -n production --timeout=5m
 
        curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
          -d '{"text":"🚨 PRODUCTION AUTO-ROLLBACK triggered after deployment of ${{ github.sha }}"}'

Deployment Dashboard Metrics

| Metric | What to Watch | Alert Threshold |
|---|---|---|
| HTTP 5xx rate | Server errors | > 1% of requests |
| P99 latency | Slowest 1% of responses | > 2 seconds |
| Error count | Total errors per minute | > 10/min |
| CPU usage | Application load | > 80% for 5 min |
| Memory usage | Memory leaks | > 85% |
| Pod restarts | Crash loops | > 0 after deploy |
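
Three of these thresholds can be turned into a simple post-deploy gate. The sketch below is illustrative only: `exceeds_threshold` and `evaluate_deploy` are hypothetical names, the inputs are assumed to be numbers already fetched from your monitoring system, and awk does the floating-point comparison so bc is not needed.

```shell
#!/usr/bin/env bash
# Hypothetical gate: compare measured metrics with their alert thresholds.
exceeds_threshold() {
  # Exit 0 when value > threshold, 1 otherwise.
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v > t) }'
}

evaluate_deploy() {
  local error_rate_pct="$1" p99_seconds="$2" restarts="$3"
  local verdict="healthy"
  exceeds_threshold "$error_rate_pct" 1 && verdict="rollback"   # 5xx rate > 1%
  exceeds_threshold "$p99_seconds" 2    && verdict="rollback"   # P99 latency > 2 s
  exceeds_threshold "$restarts" 0       && verdict="rollback"   # any pod restart
  echo "$verdict"
}

# Usage: [ "$(evaluate_deploy "$ERROR_RATE" "$P99" "$RESTARTS")" = "healthy" ] || exit 1
```

Keeping the thresholds in one function makes it easier to keep the pipeline gate and the dashboard alerts in agreement.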

9. Practice Rollbacks Regularly

A rollback you've never tested is a rollback that won't work. Practice rolling back regularly so your team is prepared.

Rollback Methods by Strategy

| Strategy | Rollback Method | Time |
|---|---|---|
| Rolling Update | kubectl rollout undo | 30-60 sec |
| Blue-Green | Switch service selector | 5 sec |
| Canary | Scale canary to 0 | 10 sec |
| Feature Flag | Toggle flag OFF | Instant |
| Recreate | Redeploy old version | 2-5 min |

Rollback Pipeline

name: Emergency Rollback

# Manual trigger with version selection
# (`on:` is a top-level workflow key, so this lives in its own workflow file)
on:
  workflow_dispatch:
    inputs:
      target_version:
        description: 'Version to roll back to (git SHA or tag)'
        required: true
      reason:
        description: 'Reason for rollback'
        required: true

jobs:
  rollback:
    runs-on: ubuntu-latest
    env:
      # Pass free-text inputs through env so they are never pasted into shell code
      TARGET_VERSION: ${{ github.event.inputs.target_version }}
      REASON: ${{ github.event.inputs.reason }}

    steps:
      - name: Log rollback
        run: |
          echo "🚨 ROLLBACK INITIATED"
          echo "Target: $TARGET_VERSION"
          echo "Reason: $REASON"
          echo "By: $GITHUB_ACTOR"
          echo "Time: $(date -u)"

      - name: Rollback deployment
        run: |
          kubectl set image deployment/myapp \
            "myapp=myregistry.io/myapp:$TARGET_VERSION" \
            -n production
          kubectl rollout status deployment/myapp -n production --timeout=5m

      - name: Verify rollback
        run: |
          curl -f https://myapp.com/health
          echo "✅ Rollback successful"

      - name: Notify team
        run: |
          # jq escapes quotes and newlines in the reason text
          jq -n --arg v "$TARGET_VERSION" --arg r "$REASON" --arg a "$GITHUB_ACTOR" \
            '{text: "🚨 ROLLBACK to \($v)\nReason: \($r)\nBy: \($a)"}' \
            | curl -X POST "${{ secrets.SLACK_WEBHOOK }}" -H 'Content-Type: application/json' -d @-

      - name: Create incident ticket
        run: |
          # PagerDuty's REST API also expects a From header and a service reference; add yours
          jq -n --arg r "$REASON" \
            '{incident: {title: "Production Rollback: \($r)", urgency: "high"}}' \
            | curl -X POST https://api.pagerduty.com/incidents \
                -H "Authorization: Token token=${{ secrets.PAGERDUTY_TOKEN }}" \
                -H "Content-Type: application/json" \
                -d @-

Rollback Drill Schedule

Run a rollback drill monthly to verify your process works:

Monthly Rollback Drill Checklist:
☐ Deploy a known-good older version to production
☐ Verify monitoring detects the version change
☐ Verify alerts fire correctly
☐ Verify the application works after rollback
☐ Measure rollback time (target: under 5 minutes)
☐ Document any issues found
☐ Update runbook if process changed
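
The "measure rollback time" item can be automated with a small timer. This is a sketch, not a prescribed tool: `time_rollback` and the `TARGET_SECONDS` variable are hypothetical names, and the default of 300 seconds is the 5-minute target from the checklist.

```shell
#!/usr/bin/env bash
# Hypothetical drill timer: run a rollback command and check how long it took.
time_rollback() {
  local target="${TARGET_SECONDS:-300}" start end elapsed
  start=$(date +%s)
  # Run the rollback command passed as arguments; a failure is reported separately.
  "$@" || { echo "rollback command failed" >&2; return 2; }
  end=$(date +%s)
  elapsed=$((end - start))
  echo "rollback took ${elapsed}s (target < ${target}s)"
  # Exit status reflects whether the target was met.
  [ "$elapsed" -lt "$target" ]
}

# Usage (assumes kubectl is already configured for the cluster):
# time_rollback kubectl rollout undo deployment/myapp -n production
```

Recording the printed duration after each drill gives the team a trend to watch, which is more useful than a single pass/fail result.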

10. Secure the Pipeline Itself

Your CI/CD pipeline has privileged access to production, registries, and cloud accounts. It's a prime attack target.

Pipeline Security Checklist

Secrets Management

# ✅ GOOD: Use GitHub Secrets
env:
  API_KEY: ${{ secrets.API_KEY }}
 
# ❌ BAD: Hardcoded secrets
env:
  API_KEY: "sk-1234567890abcdef"
 
# ❌ BAD: Printing secrets
- run: echo "Key is ${{ secrets.API_KEY }}"  # Log masking is best-effort; encoded or transformed values still leak

Dependency Pinning

# ✅ GOOD: Pin actions to SHA (prevents supply chain attacks)
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11  # v4.1.1
 
# ⚠️ OK: Pin to major version (gets security patches)
- uses: actions/checkout@v4
 
# ❌ BAD: No version pinning
- uses: actions/checkout@main  # Could change at any time!
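
The three pinning levels above can be checked mechanically. The classifier below is a minimal sketch: `classify_pin` is a hypothetical name, and it assumes version tags look like `v4` or `v4.1.1`; anything else after the `@` is treated as a branch.

```shell
#!/usr/bin/env bash
# Hypothetical linter: classify how a `uses:` reference is pinned.
classify_pin() {
  local ref="${1##*@}"          # text after the last "@"
  if [ "$ref" = "$1" ]; then
    echo "unpinned"             # no "@ref" at all
  elif printf '%s\n' "$ref" | grep -Eq '^[0-9a-f]{40}$'; then
    echo "sha"                  # full commit SHA: immutable
  elif printf '%s\n' "$ref" | grep -Eq '^v[0-9]+(\.[0-9]+){0,2}$'; then
    echo "tag"                  # version tag: movable, but intentional
  else
    echo "branch"               # e.g. main or master: can change at any time
  fi
}

# Usage: classify_pin "actions/checkout@v4"
```

A CI step could run this over every `uses:` line in `.github/workflows/` and fail when it sees `branch` or `unpinned`.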

Least Privilege Permissions

# Set minimum permissions for the workflow
permissions:
  contents: read          # Only read repo contents
  packages: write         # Write to container registry
  id-token: write         # For OIDC authentication to cloud
 
# Even more restrictive per job
jobs:
  build:
    permissions:
      contents: read
  deploy:
    permissions:
      contents: read
      id-token: write     # Only deploy job needs cloud access

Branch Protection

Repository Settings → Branch Protection Rules:

✅ Require pull request before merging
✅ Require status checks to pass (CI pipeline)
✅ Require code review approval (1-2 reviewers)
✅ Dismiss stale reviews on new pushes
✅ Require signed commits
✅ Do not allow bypassing above settings
✅ Restrict force pushes

Self-Hosted Runner Security

# If using self-hosted runners:
jobs:
  build:
    runs-on: [self-hosted, linux, x64]
 
# Security measures:
# ✅ Run in ephemeral containers (clean state per job)
# ✅ Network isolation (no access to other internal systems)
# ✅ Regularly update runner software
# ✅ Monitor runner activity
# ❌ Never run untrusted code on self-hosted runners

Pipeline Anti-Patterns

Anti-Pattern 1: The Monolith Pipeline

# ❌ BAD: One massive job that does everything
jobs:
  everything:
    steps:
      - run: npm ci
      - run: npm run lint
      - run: npm test
      - run: npm run build
      - run: docker build -t myapp .
      - run: docker push myapp
      - run: kubectl apply -f k8s/
      # If ANY step fails, you restart EVERYTHING

# ✅ GOOD: Separate jobs with dependencies
jobs:
  lint:    { ... }
  build:   { needs: lint, ... }
  test:    { needs: build, ... }
  deploy:  { needs: test, ... }
  # If test fails, only re-run from test

Anti-Pattern 2: Manual Steps Disguised as Automation

# ❌ BAD: "Run this script manually after pipeline completes"
# This defeats the purpose of CI/CD
 
# ✅ GOOD: Every step is in the pipeline
deploy:
  steps:
    - run: kubectl apply -f k8s/
    - run: kubectl rollout status deployment/myapp
    - run: bash scripts/smoke-test.sh
    - run: bash scripts/notify-team.sh

Anti-Pattern 3: Ignoring Failures

# ❌ BAD: Ignoring security scan failures
- run: npm audit
  continue-on-error: true  # ← Why even scan?
 
# ✅ GOOD: Block on critical/high
- run: npm audit --audit-level=high
  # Pipeline fails if high/critical vulnerabilities found

Anti-Pattern 4: No Cleanup

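# ❌ BAD: Artifacts accumulate forever
# → $500/month in registry storage

# ✅ GOOD: Automated cleanup (its own workflow file, e.g. cleanup.yml)
on:
  schedule:
    - cron: '0 3 * * SUN'

jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      packages: write
    steps:
      - uses: actions/delete-package-versions@v5
        with:
          package-name: myapp          # Assumed package name; use your own
          package-type: container
          min-versions-to-keep: 10
          delete-only-untagged-versions: true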

Pipeline Maturity Model

Level 1: Basic (Getting Started)

☐ Code in version control
☐ Pipeline runs on push
☐ Basic lint + unit tests
☐ Manual deployment

Level 2: Standard (Most Teams)

☐ All Level 1 items
☐ Integration tests
☐ Automated staging deployment
☐ Manual production approval
☐ Slack notifications
☐ Test coverage tracking

Level 3: Advanced (Strong DevOps Culture)

☐ All Level 2 items
☐ Security scanning (SAST, SCA, secrets)
☐ Container image scanning
☐ Canary or blue-green deployments
☐ Post-deploy monitoring
☐ Auto-rollback on failure
☐ SBOM generation

Level 4: Elite (Top 10% of Organizations)

☐ All Level 3 items
☐ Feature flags for gradual rollout
☐ Chaos engineering in pipeline
☐ Performance regression testing
☐ Automated rollback drills
☐ Full GitOps workflow
☐ < 15% change failure rate
☐ < 1 hour lead time for changes

Quick Reference: Pipeline Template

# The "Golden Pipeline" — copy and customize
 
name: CI/CD Pipeline
 
on:
  push:
    branches: [main, develop]
    paths-ignore: ['**.md']
  pull_request:
    branches: [main]
 
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
 
permissions:
  contents: read
  packages: write
 
jobs:
  lint:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm run lint
      - run: npx tsc --noEmit
 
  test:
    needs: lint
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm test -- --coverage --ci
      - uses: codecov/codecov-action@v4
 
  security:
    needs: lint
    runs-on: ubuntu-latest
    timeout-minutes: 10
    permissions:
      contents: read
      security-events: write   # Required for CodeQL to upload results
    steps:
      - uses: actions/checkout@v4
      - run: npm audit --audit-level=high
      - uses: github/codeql-action/init@v3
        with: { languages: javascript-typescript }
      - uses: github/codeql-action/analyze@v3
 
  build:
    needs: [test, security]
    runs-on: ubuntu-latest
    timeout-minutes: 15
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop'
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
 
  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    timeout-minutes: 5
    environment: staging
    steps:
      - run: echo "Deploy to staging"
      # kubectl set image ...
 
  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    timeout-minutes: 5
    environment: production
    steps:
      - run: echo "Deploy to production"
      # kubectl set image ...
 
  notify:
    needs: [deploy-staging, deploy-production]
    if: always()
    runs-on: ubuntu-latest
    steps:
      # A skipped job reports "skipped" (a non-empty string), so pick the result by branch
      - run: |
          curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
            -d '{"text":"Pipeline complete: ${{ github.ref == 'refs/heads/main' && needs.deploy-production.result || needs.deploy-staging.result }}"}'