CI/CD Best Practices - CI/CD Pipelines

The 10 Commandments of CI/CD

These are the foundational rules that separate amateur pipelines from production-grade ones. Follow these, and your CI/CD will be reliable, fast, and secure.

1.  Build once, deploy everywhere
2.  Everything in code (pipeline as code)
3.  Keep pipelines fast (under 10 minutes)
4.  Fail fast, fail loudly
5.  Never skip security
6.  Immutable artifacts only
7.  Automate everything (except approval)
8.  Monitor deployments actively
9.  Practice rollbacks regularly
10. Secure the pipeline itself

1. Build Once, Deploy Everywhere

The single most important rule of CI/CD. Never rebuild your application for different environments. Build once, then deploy the same artifact to staging, pre-production, and production.

❌ Wrong: Build Per Environment

# BAD: Building separately for each environment
deploy-staging:
  steps:
    - run: npm run build  # Build #1 for staging
    - run: docker build -t myapp:staging .
    - run: kubectl apply ...
 
deploy-production:
  steps:
    - run: npm run build  # Build #2 for production ← DIFFERENT BINARY!
    - run: docker build -t myapp:production .
    - run: kubectl apply ...

Problem: Build #1 and Build #2 might produce different results due to timing, dependency updates, or environment differences.

✅ Correct: Build Once, Deploy Same Artifact

# GOOD: Build once, deploy the exact same artifact everywhere
build:
  steps:
    - run: npm run build
    - run: docker build -t myapp:${{ github.sha }} .
    - run: docker push myapp:${{ github.sha }}
 
deploy-staging:
  needs: build
  steps:
    - run: kubectl set image deployment/myapp myapp=myapp:${{ github.sha }} -n staging
 
deploy-production:
  needs: deploy-staging
  steps:
    - run: kubectl set image deployment/myapp myapp=myapp:${{ github.sha }} -n production
    # ← EXACT same image as staging ✅

Environment-Specific Configuration

Use environment variables or ConfigMaps/Secrets for per-environment configuration — never bake configuration into the artifact:

# Kubernetes ConfigMap per environment
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: staging        # or production
data:
  DATABASE_URL: "postgres://staging-db:5432/myapp"
  LOG_LEVEL: "debug"
  API_URL: "https://staging-api.myapp.com"
 
---
# Production uses different values, same image
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: production
data:
  DATABASE_URL: "postgres://prod-db:5432/myapp"
  LOG_LEVEL: "warn"
  API_URL: "https://api.myapp.com"
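Because the image is identical in every environment, the application has to read these values at startup. Here is a minimal Python sketch of that pattern. It assumes the Deployment exposes the ConfigMap keys as environment variables (for example via `envFrom`). The `AppConfig` class, the `load_config` function, and the `info` default for `LOG_LEVEL` are illustrative and not part of the original setup:

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    database_url: str
    log_level: str
    api_url: str


def load_config(environ: Mapping[str, str] = os.environ) -> AppConfig:
    """Build the runtime config from environment variables injected by the platform."""
    missing = [key for key in ("DATABASE_URL", "API_URL") if key not in environ]
    if missing:
        # Fail at startup instead of running half-configured
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return AppConfig(
        database_url=environ["DATABASE_URL"],
        log_level=environ.get("LOG_LEVEL", "info"),  # assumed default
        api_url=environ["API_URL"],
    )


# Same code in every environment; only the injected values differ
staging = load_config({
    "DATABASE_URL": "postgres://staging-db:5432/myapp",
    "LOG_LEVEL": "debug",
    "API_URL": "https://staging-api.myapp.com",
})
print(staging.api_url)  # https://staging-api.myapp.com
```

Passing the mapping in explicitly keeps the loader testable. Inside the container you would call `load_config()` with no arguments, so it reads `os.environ`.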

2. Pipeline as Code

Your pipeline definition must live in version control, alongside the application code. Never configure pipelines through a web UI.

Why Pipeline as Code?

Click-Based (UI)	Pipeline as Code
Not reproducible	Fully reproducible
No history/audit trail	Git history shows all changes
Can't code-review	Pull requests for pipeline changes
One person knows the config	Team-visible and documented
Disaster = rebuild from memory	Disaster = `git clone` + done

File Convention by Tool

GitHub Actions:  .github/workflows/ci.yml
GitLab CI:       .gitlab-ci.yml
Jenkins:         Jenkinsfile
CircleCI:        .circleci/config.yml
Azure Pipelines: azure-pipelines.yml

Organized Workflow Structure

.github/
├── dependabot.yml          # Dependency updates (Dependabot config lives here, not in workflows/)
└── workflows/
    ├── ci.yml              # Lint, build, test (on every PR)
    ├── cd-staging.yml      # Deploy to staging (on develop merge)
    ├── cd-production.yml   # Deploy to production (on main merge)
    ├── security.yml        # Nightly security scans
    └── cleanup.yml         # Weekly cleanup of old images

3. Keep Pipelines Fast

Slow pipelines kill productivity. If developers wait 30 minutes for a pipeline, they context-switch, lose focus, and batch changes — leading to bigger, riskier deployments.

Target Timelines

Pipeline Stage	Target	Action if Exceeded
Lint	< 30 sec	Fewer rules or parallel linters
Build	< 2 min	Caching, multi-stage Docker builds
Unit tests	< 2 min	Parallel test suites
Integration tests	< 5 min	Run test suites in parallel
Full pipeline	< 10 min	Optimize or parallelize
Deploy	< 2 min	Pre-pull images, rolling updates
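These targets can be enforced mechanically. The sketch below compares measured stage durations (in seconds) with the table's upper bounds. The stage keys are hypothetical, and it assumes you can export per-stage durations from your CI provider:

```python
# Upper bounds from the table above, in seconds (targets are "< N", so N itself is over budget)
STAGE_BUDGETS = {
    "lint": 30,
    "build": 120,
    "unit_tests": 120,
    "integration_tests": 300,
    "deploy": 120,
    "full_pipeline": 600,
}


def over_budget(durations: dict[str, float]) -> list[tuple[str, float, int]]:
    """Return (stage, actual, budget) for each measured stage that missed its target."""
    return [
        (stage, actual, STAGE_BUDGETS[stage])
        for stage, actual in durations.items()
        if stage in STAGE_BUDGETS and actual >= STAGE_BUDGETS[stage]
    ]


print(over_budget({"lint": 45, "build": 90, "full_pipeline": 540}))
# [('lint', 45, 30)]
```

Unknown stage names are ignored rather than flagged. A new stage therefore needs a budget entry before it is checked.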

Speed Optimization Techniques

Cache Dependencies

# GitHub Actions — npm cache
- uses: actions/setup-node@v4
  with:
    node-version: '20'
    cache: 'npm'  # Caches the npm download cache (~/.npm), keyed on package-lock.json
 
# Docker layer caching (type=gha requires docker/setup-buildx-action earlier in the job)
- uses: docker/build-push-action@v5
  with:
    cache-from: type=gha
    cache-to: type=gha,mode=max
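Both caches rely on the same idea: the cache key is derived from the build inputs, so the cache is reused until a dependency actually changes. Here is a Python sketch of such a key. It assumes the lockfile, OS, and Node version are the only inputs that matter. The key format is invented for illustration and is not the exact key `setup-node` builds:

```python
import hashlib


def cache_key(lockfile: bytes, runner_os: str, node_version: str) -> str:
    """The key changes only when the lockfile contents, OS, or Node version change."""
    digest = hashlib.sha256(lockfile).hexdigest()
    return f"node-{node_version}-{runner_os}-npm-{digest}"


# Typical call: cache_key(pathlib.Path("package-lock.json").read_bytes(), "Linux", "20")
```

A lockfile change produces a new key. The next run misses once and then repopulates the cache under that key.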

Parallelize Independent Jobs

# Run independent jobs simultaneously
jobs:
  lint:       # ┐
    ...       # ├── These 3 run in PARALLEL
  test-unit:  # │
    ...       # │
  security:   # ┘
    ...
 
  deploy:
    needs: [lint, test-unit, security]  # Wait for all 3
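The `needs:` graph is what determines parallelism. This sketch groups jobs into waves the way a dependency-aware scheduler could, using Kahn-style layering. It models the concept only and is not GitHub's actual scheduler:

```python
def parallel_waves(needs: dict[str, list[str]]) -> list[list[str]]:
    """Group jobs into waves; every job in a wave depends only on jobs in earlier waves."""
    remaining = {job: set(deps) for job, deps in needs.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        ready = sorted(job for job, deps in remaining.items() if deps <= done)
        if not ready:
            # Nothing can start: a cycle or a dependency on an undefined job
            raise ValueError(f"unsatisfiable needs among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for job in ready:
            del remaining[job]
    return waves


pipeline = {"lint": [], "test-unit": [], "security": [], "deploy": ["lint", "test-unit", "security"]}
print(parallel_waves(pipeline))
# [['lint', 'security', 'test-unit'], ['deploy']]
```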

Skip Unnecessary Work

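on:
  push:
    # `paths` and `paths-ignore` can't be combined on the same event;
    # a trailing `!` pattern in `paths` excludes files instead
    paths:
      - 'src/**'           # Only run if source code changed
      - 'package.json'
      - 'Dockerfile'
      - '!**.md'           # ...except doc-only changes, even under src/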
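GitHub evaluates `paths` patterns in order, and a path matched by a later `!` pattern is excluded again. Below is a simplified Python model of that rule. The glob handling is an approximation: `**` crosses directories, while `*` and `?` stay within one segment. Character classes and other glob features are ignored:

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Simplified glob: '**' may cross '/', while '*' and '?' stay inside one path segment."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")


def is_included(path: str, patterns: list[str]) -> bool:
    """Evaluate patterns in order; a later '!pattern' match excludes the path again."""
    included = False
    for pattern in patterns:
        if pattern.startswith("!"):
            if glob_to_regex(pattern[1:]).match(path):
                included = False
        elif glob_to_regex(pattern).match(path):
            included = True
    return included


def should_run(changed_files: list[str], patterns: list[str]) -> bool:
    """The workflow runs if at least one changed file is included."""
    return any(is_included(f, patterns) for f in changed_files)


FILTERS = ["src/**", "package.json", "Dockerfile", "!**.md"]
print(should_run(["src/index.ts", "README.md"], FILTERS))  # True
print(should_run(["src/docs/notes.md"], FILTERS))          # False
```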

Cancel Redundant Runs

# Cancel previous pipeline if new commit pushed
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
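With `cancel-in-progress: true`, each group key holds at most one active run, and a newer run in the same group cancels the older one. The following is a toy model of that behaviour; the class and method names are hypothetical:

```python
from typing import Optional


def group_key(workflow: str, ref: str) -> str:
    # Mirrors ${{ github.workflow }}-${{ github.ref }}
    return f"{workflow}-{ref}"


class ConcurrencyGroups:
    """Toy model of cancel-in-progress: true (at most one active run per group)."""

    def __init__(self) -> None:
        self.active: dict[str, str] = {}

    def start(self, group: str, run_id: str) -> Optional[str]:
        """Start run_id; return the run it cancels, if one was active in the same group."""
        cancelled = self.active.get(group)
        self.active[group] = run_id
        return cancelled

    def finish(self, group: str, run_id: str) -> None:
        if self.active.get(group) == run_id:
            del self.active[group]
```

Because the key includes `github.ref`, pushes to different branches never cancel each other. Only rapid pushes to the same branch do.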

4. Fail Fast, Fail Loudly

The cheapest checks should run first. If linting takes 15 seconds and catches a bug, don't waste 5 minutes building before discovering it.

Optimal Stage Order

Stage 1: Lint          (15 sec)  ← Cheapest check first
Stage 2: Type Check    (20 sec)  ← Static analysis
Stage 3: Unit Tests    (45 sec)  ← Fast tests
Stage 4: Build         (90 sec)  ← Compile/package
Stage 5: Integration   (3 min)   ← Heavier tests
Stage 6: Security Scan (2 min)   ← Can run in parallel with Stage 5
Stage 7: E2E Tests     (5 min)   ← Slowest, most expensive
Stage 8: Deploy        (1 min)   ← Only if everything passes
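The same ordering can be expressed as a tiny fail-fast runner: sort checks by estimated cost, then stop at the first failure. This is a sequential sketch built on hypothetical `Check` objects. A real pipeline would also run independent stages, such as Stages 5 and 6, in parallel:

```python
from typing import Callable, NamedTuple


class Check(NamedTuple):
    name: str
    est_seconds: float         # rough cost, e.g. median duration of recent runs
    run: Callable[[], bool]    # True on success


def run_fail_fast(checks: list[Check]) -> tuple[bool, list[str]]:
    """Run checks cheapest-first and stop at the first failure; returns (passed, executed names)."""
    executed: list[str] = []
    for check in sorted(checks, key=lambda c: c.est_seconds):
        executed.append(check.name)
        if not check.run():
            print(f"FAILED: {check.name}")  # fail loudly: name the check that broke
            return False, executed
    return True, executed
```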

Loud Notifications on Failure

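# Always notify on failure
- name: Notify failure
  if: failure()
  env:
    SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
    # Untrusted values go through env, never straight into the script (prevents shell injection)
    COMMIT_MSG: ${{ github.event.head_commit.message }}
    RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
  run: |
    # Build the payload with jq so quotes or newlines in the commit message can't break the JSON
    TEXT=$(printf '❌ *Pipeline Failed*\n*Repo:* %s\n*Branch:* %s\n*Commit:* `%s`\n*Author:* %s\n*Message:* %s\n<%s|View Pipeline>' \
      "$GITHUB_REPOSITORY" "$GITHUB_REF_NAME" "$GITHUB_SHA" "$GITHUB_ACTOR" "$COMMIT_MSG" "$RUN_URL")
    jq -n --arg text "$TEXT" \
      '{text: "❌ Pipeline FAILED", blocks: [{type: "section", text: {type: "mrkdwn", text: $text}}]}' \
      | curl -sS -X POST "$SLACK_WEBHOOK" -H 'Content-Type: application/json' --data @-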

Set Timeouts on Every Job

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15      # Kill if stuck for 15 min
 
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
 
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 5

5. Never Skip Security

Every pipeline must include security scanning. At minimum, scan your dependencies. Ideally, scan your code and containers too.

Minimum Security Pipeline

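security:
  runs-on: ubuntu-latest
  permissions:
    contents: read
    security-events: write   # Lets CodeQL upload its results
  steps:
    # Every scanner below needs the source checked out
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0        # Full history so secret detection can scan past commits
 
    # 1. Dependency vulnerabilities (MUST HAVE)
    - run: npm audit --audit-level=high
 
    # 2. Secret detection (MUST HAVE)
    - uses: trufflesecurity/trufflehog@main   # Pin to a release tag or SHA in real use (see Commandment 10)
      with:
        extra_args: --only-verified
 
    # 3. Static code analysis (RECOMMENDED)
    - uses: github/codeql-action/init@v3
      with:
        languages: javascript-typescript
    - uses: github/codeql-action/analyze@v3
 
    # 4. Container scanning (IF using Docker)
    # Assumes myapp:${{ github.sha }} was built earlier in this job or can be pulled from your registry
    - uses: aquasecurity/trivy-action@master   # Pin to a release tag or SHA in real use (see Commandment 10)
      with:
        image-ref: myapp:${{ github.sha }}
        severity: 'CRITICAL,HIGH'
        exit-code: '1'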

Security Scanning Schedule

Scan Type	Frequency	Block Deploy?
npm audit	Every pipeline	Yes (high/critical)
Secret detection	Every pipeline	Yes
SAST (CodeQL)	Every PR	Yes (high severity)
Container scan	Every build	Yes (critical)
Full DAST scan	Nightly	No (creates tickets)
SBOM generation	Every release	No (compliance)
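The "Block Deploy?" column is effectively a policy table. Below is a sketch of that gate. It assumes each scanner's findings have already been normalized to npm audit's severity scale (`info` through `critical`). The `POLICY` keys and the finding dictionaries are hypothetical:

```python
# npm audit's severity scale, lowest to highest
SEVERITIES = ["info", "low", "moderate", "high", "critical"]

# Minimum severity that blocks a deploy, per the table above; None means report only
POLICY = {
    "npm_audit": "high",
    "secret_detection": "info",   # any verified secret blocks
    "sast": "high",
    "container_scan": "critical",
    "dast": None,                 # nightly scan: create tickets instead
}


def blocks_deploy(scan: str, findings: list[dict]) -> bool:
    """True if any finding meets or exceeds the scan's blocking threshold."""
    threshold = POLICY[scan]
    if threshold is None:
        return False
    limit = SEVERITIES.index(threshold)
    return any(SEVERITIES.index(f["severity"]) >= limit for f in findings)
```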

6. Immutable Artifacts

Once an artifact is built and tagged, never overwrite it. If you need a fix, build a new artifact with a new tag.

❌ Wrong: Mutable Tags

# DON'T DO THIS
docker build -t myapp:latest .
docker push myapp:latest
# Next day...
docker build -t myapp:latest .  # ← Overwrites yesterday's image!
docker push myapp:latest

✅ Correct: Immutable Tags

# DO THIS
docker build -t myapp:abc123f .    # Tagged with git SHA
docker push myapp:abc123f
# Next day...
docker build -t myapp:def456a .    # New tag, old image preserved
docker push myapp:def456a
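Some registries can enforce this for you; Amazon ECR, for example, has a tag-immutability setting. The toy in-memory registry below illustrates the rule. In this sketch, re-pushing the same digest under a tag is harmless, but repointing an existing tag to different content is rejected. The digests are placeholders:

```python
class ImmutableRegistry:
    """Toy registry that refuses to repoint an existing tag at different content."""

    def __init__(self) -> None:
        self._tags: dict[str, str] = {}  # tag -> image digest

    def push(self, tag: str, digest: str) -> None:
        existing = self._tags.get(tag)
        if existing is not None and existing != digest:
            raise ValueError(f"tag {tag!r} already points to {existing}; build a new tag instead")
        self._tags[tag] = digest  # first push, or an identical re-push

    def resolve(self, tag: str) -> str:
        return self._tags[tag]
```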

Why Immutability Matters

Mutable	Immutable
`myapp:latest` could be anything	`myapp:abc123f` is always the same
Can't reproduce a deployment	Can always redeploy exact version
Rollback might not work	Rollback always works
"What version is running?" → "¯\_(ツ)_/¯"	"What version is running?" → "abc123f"

7. Automate Everything (Except Approval)

The only human step in your pipeline should be the approval gate before production deployment. Everything else must be automated.

What to Automate

✅ Code linting
✅ Building artifacts
✅ Running all tests
✅ Security scanning
✅ Deploying to staging
✅ Smoke testing staging
✅ Creating release notes
✅ Tagging releases
✅ Notifying team
✅ Monitoring deployment health
✅ Rolling back on failure

What to Keep Manual

👤 Approval to deploy to production
👤 Decision to rollback (unless auto-rollback configured)
👤 Major version releases (marketing coordination)

Automated Release Notes

release:
  runs-on: ubuntu-latest
  if: github.ref == 'refs/heads/main'
  permissions:
    contents: write   # Needed to create tags and releases
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
 
    - name: Generate changelog
      id: changelog
      run: |
        PREVIOUS_TAG=$(git describe --tags --abbrev=0 HEAD^ 2>/dev/null || echo "")
        if [ -z "$PREVIOUS_TAG" ]; then
          CHANGES=$(git log --pretty=format:"- %s (%h)" -20)
        else
          CHANGES=$(git log "$PREVIOUS_TAG"..HEAD --pretty=format:"- %s (%h)")
        fi
        echo "changes<<EOF" >> "$GITHUB_OUTPUT"
        echo "$CHANGES" >> "$GITHUB_OUTPUT"
        echo "EOF" >> "$GITHUB_OUTPUT"
 
    # actions/create-release is archived; the gh CLI preinstalled on GitHub-hosted runners does the same job
    - name: Create GitHub Release
      env:
        GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        CHANGES: ${{ steps.changelog.outputs.changes }}
        COMMIT_DATE: ${{ github.event.head_commit.timestamp }}
      run: |
        {
          echo "## Changes"
          echo "$CHANGES"
          echo
          echo "## Deployment Info"
          echo "- **Commit:** $GITHUB_SHA"
          echo "- **Date:** $COMMIT_DATE"
          echo "- **Author:** $GITHUB_ACTOR"
        } > release-notes.md
        gh release create "v${{ github.run_number }}" \
          --title "Release v${{ github.run_number }}" \
          --notes-file release-notes.md \
          --target "$GITHUB_SHA"
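Keeping the notes format in a pure function makes it testable without a real repository. The sketch below mirrors the release body above, built from `(subject, short_sha)` pairs in the same `- %s (%h)` shape that `git log` produces. The function name and its inputs are hypothetical:

```python
def render_release_notes(commits: list[tuple[str, str]], sha: str, date: str, author: str) -> str:
    """Render the release body: a change list followed by deployment metadata."""
    lines = ["## Changes"]
    lines += [f"- {subject} ({short_sha})" for subject, short_sha in commits]
    lines += [
        "",
        "## Deployment Info",
        f"- **Commit:** {sha}",
        f"- **Date:** {date}",
        f"- **Author:** {author}",
    ]
    return "\n".join(lines)
```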

8. Monitor Deployments Actively

Deploying is not the end — it's the beginning of monitoring. Your pipeline should include post-deployment health checks and alerting.

Post-Deployment Health Checks

post-deploy-monitor:
  name: Post-Deployment Monitoring
  runs-on: ubuntu-latest
  needs: deploy-production
 
  steps:
    - name: Wait for app to stabilize
      run: sleep 60
 
    - name: Check application health
      run: |
        for i in $(seq 1 5); do
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://myapp.com/health)
          if [ "$STATUS" != "200" ]; then
            echo "❌ Health check failed (attempt $i/5): HTTP $STATUS"
            if [ "$i" = "5" ]; then exit 1; fi
            sleep 10
          else
            echo "✅ Health check passed: HTTP $STATUS"
            break
          fi
        done
 
    - name: Check error rate
      run: |
        # Query Prometheus/Datadog for error rate
        ERROR_RATE=$(curl -s "https://monitoring.myapp.com/api/metrics" \
          -H "Authorization: Bearer ${{ secrets.MONITORING_TOKEN }}" \
          | jq -r '.error_rate')
 
        echo "Error rate: ${ERROR_RATE}%"
 
        if (( $(echo "$ERROR_RATE > 5" | bc -l) )); then
          echo "❌ Error rate too high! Triggering rollback..."
          exit 1
        fi
 
    - name: Check response latency
      run: |
        LATENCY=$(curl -s -o /dev/null -w "%{time_total}" https://myapp.com/api/status)
        echo "Response time: ${LATENCY}s"
 
        if (( $(echo "$LATENCY > 2.0" | bc -l) )); then
          echo "⚠️ Latency elevated: ${LATENCY}s (threshold: 2.0s)"
        fi
 
    - name: Auto-rollback on failure
      if: failure()
      env:
        SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
      run: |
        # Assumes kubectl is already authenticated against the production cluster in this job
        echo "🔄 Auto-rolling back production deployment..."
        kubectl rollout undo deployment/myapp -n production
        kubectl rollout status deployment/myapp -n production --timeout=5m
 
        curl -sS -X POST "$SLACK_WEBHOOK" \
          -H 'Content-Type: application/json' \
          -d '{"text":"🚨 PRODUCTION AUTO-ROLLBACK triggered after deployment of ${{ github.sha }}"}'
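The retry loop in the health-check step is a common pattern, and it is worth keeping testable. This Python sketch injects the probe and the sleep function, so the logic can be exercised without a live endpoint. `http_probe` reuses the `/health` URL from the example. Mapping connection errors to status 0 is an assumption made for this sketch:

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "https://myapp.com/health", timeout_s: float = 5) -> int:
    """Return the HTTP status for url, or 0 if the request could not complete."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code
    except OSError:  # DNS failure, refused connection, timeout
        return 0


def wait_until_healthy(
    probe: Callable[[], int],
    attempts: int = 5,
    delay_s: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry probe() until it returns 200 or attempts run out; no sleep after the last try."""
    for attempt in range(1, attempts + 1):
        status = probe()
        if status == 200:
            print(f"Health check passed: HTTP {status}")
            return True
        print(f"Health check failed (attempt {attempt}/{attempts}): HTTP {status}")
        if attempt < attempts:
            sleep(delay_s)
    return False


# In a pipeline step: sys.exit(0 if wait_until_healthy(http_probe) else 1)
```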

Deployment Dashboard Metrics

Metric	What to Watch	Alert Threshold
HTTP 5xx rate	Server errors	> 1% of requests
P99 latency	Slowest 1% of responses	> 2 seconds
Error count	Total errors per minute	> 10/min
CPU usage	Application load	> 80% for 5 min
Memory usage	Memory leaks	> 85%
Pod restarts	Crash loops	> 0 after deploy
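The sketch below evaluates these thresholds after a deploy, computing P99 with the nearest-rank method. The metric names in the input dictionary are hypothetical. In practice, they would come from whatever Prometheus or Datadog queries you use. The CPU check reads "> 80% for 5 min" as the minimum over the last five minutes staying above 80%:

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def deployment_alerts(m: dict) -> list[str]:
    """Compare post-deploy metrics with the alert thresholds from the table above."""
    checks = [
        (m["http_5xx"] / m["requests"] > 0.01, "HTTP 5xx rate > 1%"),
        (percentile(m["latencies_s"], 99) > 2.0, "P99 latency > 2 s"),
        (m["errors_per_min"] > 10, "Error count > 10/min"),
        (m["cpu_pct_min_last_5m"] > 80, "CPU > 80% for 5 min"),
        (m["memory_pct"] > 85, "Memory > 85%"),
        (m["pod_restarts_since_deploy"] > 0, "Pod restarts after deploy"),
    ]
    return [label for fired, label in checks if fired]
```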

9. Practice Rollbacks Regularly

A rollback you've never tested is a rollback that won't work. Practice rolling back regularly so your team is prepared.

Rollback Methods by Strategy

Strategy	Rollback Method	Time
Rolling Update	`kubectl rollout undo`	30-60 sec
Blue-Green	Switch service selector	5 sec
Canary	Scale canary to 0	10 sec
Feature Flag	Toggle flag OFF	Instant
Recreate	Redeploy old version	2-5 min
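Blue-green rollbacks are fast because nothing is redeployed: both versions keep running, and only the traffic pointer moves. The following is a toy model of that switch. In Kubernetes, the flip would typically be a patch to the Service selector, for example `kubectl patch service myapp -p '{"spec":{"selector":{"color":"blue"}}}'`. The `color` label here is hypothetical:

```python
class BlueGreenService:
    """Two deployments stay up; `live` plays the role of the Service selector."""

    def __init__(self, blue_version: str, green_version: str, live: str = "blue") -> None:
        self.versions = {"blue": blue_version, "green": green_version}
        self.live = live

    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def release(self, new_version: str) -> None:
        """Deploy to the idle color, then point traffic at it."""
        self.versions[self.idle()] = new_version
        self.live = self.idle()

    def rollback(self) -> None:
        """Point traffic back; the previous version is still running, so this is just a flip."""
        self.live = self.idle()

    def serving(self) -> str:
        return self.versions[self.live]
```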

Rollback Pipeline

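# .github/workflows/rollback.yml
name: Emergency Rollback
 
# Manual trigger with version selection (`on:` belongs at workflow level, not inside a job)
on:
  workflow_dispatch:
    inputs:
      target_version:
        description: 'Version to rollback to (git SHA or tag)'
        required: true
      reason:
        description: 'Reason for rollback'
        required: true
 
jobs:
  rollback:
    name: Emergency Rollback
    runs-on: ubuntu-latest
    # Pass user-supplied inputs through env instead of ${{ }} inside scripts (prevents shell injection)
    env:
      TARGET_VERSION: ${{ github.event.inputs.target_version }}
      REASON: ${{ github.event.inputs.reason }}
 
    steps:
      - name: Log rollback
        run: |
          echo "🚨 ROLLBACK INITIATED"
          echo "Target: $TARGET_VERSION"
          echo "Reason: $REASON"
          echo "By: $GITHUB_ACTOR"
          echo "Time: $(date -u)"
 
      # Assumes kubectl is already authenticated against the production cluster
      - name: Rollback deployment
        run: |
          kubectl set image deployment/myapp \
            myapp="myregistry.io/myapp:$TARGET_VERSION" \
            -n production
          kubectl rollout status deployment/myapp -n production --timeout=5m
 
      - name: Verify rollback
        run: |
          curl -fsS https://myapp.com/health
          echo "✅ Rollback successful"
 
      - name: Notify team
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          jq -n --arg text "$(printf '🚨 ROLLBACK to %s\nReason: %s\nBy: %s' "$TARGET_VERSION" "$REASON" "$GITHUB_ACTOR")" '{text: $text}' \
            | curl -sS -X POST "$SLACK_WEBHOOK" -H 'Content-Type: application/json' --data @-
 
      # PagerDuty's REST API also requires a From header and a service reference;
      # PAGERDUTY_FROM_EMAIL and PAGERDUTY_SERVICE_ID are placeholder secret names
      - name: Create incident ticket
        env:
          PAGERDUTY_TOKEN: ${{ secrets.PAGERDUTY_TOKEN }}
          PD_FROM: ${{ secrets.PAGERDUTY_FROM_EMAIL }}
          PD_SERVICE_ID: ${{ secrets.PAGERDUTY_SERVICE_ID }}
        run: |
          jq -n --arg title "Production Rollback: $REASON" --arg service "$PD_SERVICE_ID" \
            '{incident: {type: "incident", title: $title, urgency: "high", service: {id: $service, type: "service_reference"}}}' \
            | curl -sS -X POST https://api.pagerduty.com/incidents \
                -H "Authorization: Token token=$PAGERDUTY_TOKEN" \
                -H "Accept: application/vnd.pagerduty+json;version=2" \
                -H "Content-Type: application/json" \
                -H "From: $PD_FROM" \
                --data @-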

Rollback Drill Schedule

Run a rollback drill monthly to verify your process works:

Monthly Rollback Drill Checklist:
☐ Deploy a known-good older version to production
☐ Verify monitoring detects the version change
☐ Verify alerts fire correctly
☐ Verify the application works after rollback
☐ Measure rollback time (target: under 5 minutes)
☐ Document any issues found
☐ Update runbook if process changed

10. Secure the Pipeline Itself

Your CI/CD pipeline has privileged access to production, registries, and cloud accounts. It's a prime attack target.

Pipeline Security Checklist

Secrets Management

# ✅ GOOD: Use GitHub Secrets
env:
  API_KEY: ${{ secrets.API_KEY }}
 
# ❌ BAD: Hardcoded secrets
env:
  API_KEY: "sk-1234567890abcdef"
 
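# ❌ BAD: Printing secrets (GitHub masks the exact value, but encoded or partial copies are not masked)
- run: echo "Key is ${{ secrets.API_KEY }}"  # Shows ***, but e.g. base64 output of the key would leak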

Dependency Pinning

# ✅ GOOD: Pin actions to a full commit SHA (a tag can be moved; a SHA cannot)
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11  # v4.1.1
 
# ⚠️ OK: Pin to major version (gets fixes automatically, but trusts every future release)
- uses: actions/checkout@v4
 
# ❌ BAD: No version pinning
- uses: actions/checkout@main  # Could change at any time!
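Pinning is easy to check automatically. Below is a rough Python linter that flags `uses:` references whose ref is not a full 40-character commit SHA. Because it scans text with a regular expression rather than parsing YAML, treat it as a sketch:

```python
import re

# Matches "- uses: owner/repo@ref" (step) and "uses: owner/repo/.github/workflows/x.yml@ref" (job)
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return every `uses:` reference whose ref is not a full 40-character commit SHA."""
    flagged = []
    for ref in USES_RE.findall(workflow_text):
        ref = ref.strip("'\"")
        if ref.startswith(("./", "docker://")):
            continue  # local actions and container images are versioned differently
        _, _, version = ref.partition("@")
        if not FULL_SHA_RE.fullmatch(version):
            flagged.append(ref)
    return flagged
```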

Least Privilege Permissions

# Set minimum permissions for the workflow
permissions:
  contents: read          # Only read repo contents
  packages: write         # Write to container registry
  id-token: write         # For OIDC authentication to cloud
 
# Even more restrictive per job
jobs:
  build:
    permissions:
      contents: read
  deploy:
    permissions:
      contents: read
      id-token: write     # Only deploy job needs cloud access
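One subtlety matters here: a job-level `permissions` block replaces the workflow-level one instead of merging with it, and any scope it omits becomes `none`. The sketch below applies that resolution over a subset of the token's scopes; the scope list is abbreviated for illustration:

```python
from typing import Optional

# A subset of GITHUB_TOKEN scopes, abbreviated for illustration
SCOPES = ["contents", "packages", "id-token", "pull-requests", "security-events"]


def effective_permissions(
    workflow_perms: Optional[dict[str, str]],
    job_perms: Optional[dict[str, str]],
    repo_default: dict[str, str],
) -> dict[str, str]:
    """Job-level permissions replace workflow-level ones; unlisted scopes become 'none'."""
    chosen = job_perms if job_perms is not None else workflow_perms
    if chosen is None:
        return dict(repo_default)  # no permissions key anywhere: repository default applies
    return {scope: chosen.get(scope, "none") for scope in SCOPES}
```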

Branch Protection

Repository Settings → Branch Protection Rules:

✅ Require pull request before merging
✅ Require status checks to pass (CI pipeline)
✅ Require code review approval (1-2 reviewers)
✅ Dismiss stale reviews on new pushes
✅ Require signed commits
✅ Do not allow bypassing above settings
✅ Restrict force pushes

Self-Hosted Runner Security

# If using self-hosted runners:
jobs:
  build:
    runs-on: [self-hosted, linux, x64]
 
# Security measures:
# ✅ Run in ephemeral containers (clean state per job)
# ✅ Network isolation (no access to other internal systems)
# ✅ Regularly update runner software
# ✅ Monitor runner activity
# ❌ Never run untrusted code on self-hosted runners

Pipeline Anti-Patterns

Anti-Pattern 1: The Monolith Pipeline

# ❌ BAD: One massive job that does everything
jobs:
  everything:
    steps:
      - run: npm ci
      - run: npm run lint
      - run: npm test
      - run: npm run build
      - run: docker build -t myapp:${{ github.sha }} .
      - run: docker push myapp:${{ github.sha }}
      - run: kubectl apply -f k8s/
      # If ANY step fails, you restart EVERYTHING

# ✅ GOOD: Separate jobs with dependencies
jobs:
  lint:    { ... }
  build:   { needs: lint, ... }
  test:    { needs: build, ... }
  deploy:  { needs: test, ... }
  # If test fails, only re-run from test

Anti-Pattern 2: Manual Steps Disguised as Automation

# ❌ BAD: "Run this script manually after pipeline completes"
# This defeats the purpose of CI/CD
 
# ✅ GOOD: Every step is in the pipeline
deploy:
  steps:
    - run: kubectl apply -f k8s/
    - run: kubectl rollout status deployment/myapp
    - run: bash scripts/smoke-test.sh
    - run: bash scripts/notify-team.sh

Anti-Pattern 3: Ignoring Failures

# ❌ BAD: Ignoring security scan failures
- run: npm audit
  continue-on-error: true  # ← Why even scan?
 
# ✅ GOOD: Block on critical/high
- run: npm audit --audit-level=high
  # Pipeline fails if high/critical vulnerabilities found

Anti-Pattern 4: No Cleanup

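# ❌ BAD: Artifacts accumulate forever
# → Registry storage (and its bill) grows every month
 
# ✅ GOOD: Automated cleanup (.github/workflows/cleanup.yml; `schedule` is a trigger, so it goes under `on:`)
on:
  schedule:
    - cron: '0 3 * * SUN'   # Sundays at 03:00 UTC
jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      packages: write
    steps:
      - uses: actions/delete-package-versions@v5
        with:
          package-name: myapp        # Assumed package name
          package-type: container
          min-versions-to-keep: 10
          delete-only-untagged-versions: true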
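What "keep 10" means depends on the tool, so it helps to spell out the intended policy. Sketched below in Python is one plausible reading: tagged versions are never deleted, and only untagged versions older than the 10 newest untagged ones are removed. This illustrates a retention rule; it is not the exact algorithm `actions/delete-package-versions` implements:

```python
from datetime import datetime
from typing import NamedTuple


class PackageVersion(NamedTuple):
    id: int
    tags: tuple[str, ...]
    created: datetime


def versions_to_delete(versions: list[PackageVersion], keep: int = 10) -> list[int]:
    """Never touch tagged versions; delete untagged ones beyond the `keep` newest untagged."""
    untagged = sorted((v for v in versions if not v.tags), key=lambda v: v.created, reverse=True)
    return [v.id for v in untagged[keep:]]
```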

Pipeline Maturity Model

Level 1: Basic (Getting Started)

☐ Code in version control
☐ Pipeline runs on push
☐ Basic lint + unit tests
☐ Manual deployment

Level 2: Standard (Most Teams)

☐ All Level 1 items
☐ Integration tests
☐ Automated staging deployment
☐ Manual production approval
☐ Slack notifications
☐ Test coverage tracking

Level 3: Advanced (Strong DevOps Culture)

☐ All Level 2 items
☐ Security scanning (SAST, SCA, secrets)
☐ Container image scanning
☐ Canary or blue-green deployments
☐ Post-deploy monitoring
☐ Auto-rollback on failure
☐ SBOM generation

Level 4: Elite (Top 10% of Organizations)

☐ All Level 3 items
☐ Feature flags for gradual rollout
☐ Chaos engineering in pipeline
☐ Performance regression testing
☐ Automated rollback drills
☐ Full GitOps workflow
☐ < 15% change failure rate
☐ < 1 hour lead time for changes
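The last two items are measurable. This sketch computes change failure rate and median lead time from deployment records. What counts as `failed` (a rollback, an incident, a hotfix) is an assumption your team would need to define:

```python
from datetime import datetime, timedelta
from statistics import median
from typing import NamedTuple


class Deployment(NamedTuple):
    commit_time: datetime  # when the change was committed
    deploy_time: datetime  # when it reached production
    failed: bool           # assumption: caused a rollback, hotfix, or incident


def change_failure_rate(deploys: list[Deployment]) -> float:
    return sum(d.failed for d in deploys) / len(deploys)


def median_lead_time(deploys: list[Deployment]) -> timedelta:
    return median(d.deploy_time - d.commit_time for d in deploys)


t0 = datetime(2024, 1, 1, 9, 0)
history = [
    Deployment(t0, t0 + timedelta(minutes=30), False),
    Deployment(t0, t0 + timedelta(minutes=50), True),
    Deployment(t0, t0 + timedelta(hours=2), False),
    Deployment(t0, t0 + timedelta(minutes=40), False),
]
print(change_failure_rate(history))  # 0.25
print(median_lead_time(history))     # 0:45:00
```

With these sample records, the lead-time target (under 1 hour) is met, but the change-failure-rate target (under 15%) is not.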

Quick Reference: Pipeline Template

# The "Golden Pipeline" — copy and customize
 
name: CI/CD Pipeline
 
on:
  push:
    branches: [main, develop]
    paths-ignore: ['**.md']
  pull_request:
    branches: [main]
 
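concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  # Cancel superseded PR runs, but let pushes to main/develop (which deploy) finish
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}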
 
permissions:
  contents: read
  packages: write
 
jobs:
  lint:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm run lint
      - run: npx tsc --noEmit
 
  test:
    needs: lint
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm test -- --coverage --ci
      - uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}   # codecov-action v4 generally requires an upload token
 
  security:
    needs: lint
    runs-on: ubuntu-latest
    timeout-minutes: 10
    permissions:
      contents: read
      security-events: write   # CodeQL needs this to upload results
    steps:
      - uses: actions/checkout@v4
      - run: npm audit --audit-level=high
      - uses: github/codeql-action/init@v3
        with: { languages: javascript-typescript }
      - uses: github/codeql-action/analyze@v3
 
  build:
    needs: [test, security]
    runs-on: ubuntu-latest
    timeout-minutes: 15
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop'
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
 
  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    timeout-minutes: 5
    environment: staging
    steps:
      - run: echo "Deploy to staging"
      # kubectl set image ...
 
  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    timeout-minutes: 5
    environment: production
    steps:
      - run: echo "Deploy to production"
      # kubectl set image ...
 
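  notify:
    needs: [deploy-staging, deploy-production]
    # Run even if a deploy was skipped or failed, but only for pushes (PR runs never deploy)
    if: always() && github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          # Report both results: `result` is always a non-empty string ('skipped' included), so `||` would always pick production
          curl -sS -X POST "$SLACK_WEBHOOK" \
            -H 'Content-Type: application/json' \
            -d '{"text":"Pipeline complete: staging=${{ needs.deploy-staging.result }}, production=${{ needs.deploy-production.result }}"}'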