GuideDevOps
Lesson 12 of 17

Jobs & CronJobs

Part of the Kubernetes tutorial series.

The Problem with Deployments

Deployments are designed for long-running services:

  • Continuously running Pods
  • Self-healing if containers crash
  • Scaling based on demand

But what if you need to:

  • Run a one-time database migration?
  • Process a batch of files once and exit?
  • Clean up old data daily?

For these, use Jobs and CronJobs.


Jobs

A Job creates Pods that run until completion, then stop.

Basic Job

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  template:
    spec:
      containers:
      - name: migration
        image: myapp:1.0
        command: ["/app/migrate.sh"]
      restartPolicy: Never    # Don't restart on failure
  backoffLimit: 3              # Retry up to 3 times before marking the Job failed
  ttlSecondsAfterFinished: 3600 # Delete job 1 hour after completion

Run a Job

kubectl apply -f job.yaml
 
# Check status
kubectl get jobs
kubectl describe job db-migration
kubectl logs job/db-migration
 
# Delete job
kubectl delete job db-migration

Job States

Job created
   ↓
Pod created → Running
   ↓
Container completes with exit code 0 → Job succeeds
   ↓
(After ttlSecondsAfterFinished) → Job deleted

Or:

Container exits with non-zero code
   ↓
If attempts < backoffLimit → Create a replacement Pod
   ↓
If attempts >= backoffLimit → Job fails
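
Besides backoffLimit, a Job's total runtime can be capped with the standard activeDeadlineSeconds field; once the deadline passes, all running Pods are terminated and the Job is marked failed regardless of remaining retries. A minimal sketch reusing the migration image from above (the name bounded-job is illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: bounded-job
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 600   # Fail the Job if it runs longer than 10 minutes
  template:
    spec:
      containers:
      - name: task
        image: myapp:1.0
        command: ["/app/migrate.sh"]
      restartPolicy: Never
```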

Parallel Jobs

Run multiple Pods in parallel:

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-process
spec:
  parallelism: 5           # Run 5 Pods simultaneously
  completions: 100         # 100 successful Pod completions required
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: processor
        image: batch-processor:1.0
      restartPolicy: Never

How it works:

  • Start 5 Pods in parallel
  • When one finishes, start another
  • Keep going until 100 Pods have succeeded
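
If each Pod needs to know which slice of the work is its own, Kubernetes also supports indexed completion mode: each Pod receives a unique index (0 to completions-1) in the JOB_COMPLETION_INDEX environment variable. A minimal sketch (process-chunk is a hypothetical entrypoint):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-batch
spec:
  completionMode: Indexed    # Each Pod gets a distinct completion index
  parallelism: 5
  completions: 100
  template:
    spec:
      containers:
      - name: processor
        image: batch-processor:1.0
        # The Pod reads its index from $JOB_COMPLETION_INDEX
        command: ["sh", "-c", "process-chunk $JOB_COMPLETION_INDEX"]
      restartPolicy: Never
```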

Work Queue Pattern

For distributed work:

apiVersion: batch/v1
kind: Job
metadata:
  name: work-queue-job
spec:
  parallelism: 10
  completions: 10
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: worker
        image: worker:latest
        env:
        - name: QUEUE_URL
          value: "rabbitmq-service:5672"
      restartPolicy: Never

Each Pod:

  1. Connects to a work queue (Redis, RabbitMQ, etc.)
  2. Pulls a task
  3. Processes it
  4. Exits
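
The worker loop itself can be sketched as follows. The in-memory queue and the uppercase "processing" step are stand-ins for a real message broker (RabbitMQ, Redis, etc.) and your real task logic:

```python
import queue

def run_worker(tasks):
    """Pull tasks until the queue is empty, then exit (the Pod completes)."""
    processed = []
    while True:
        try:
            # A real worker would consume from RabbitMQ/Redis here
            task = tasks.get_nowait()
        except queue.Empty:
            break  # No work left: return, exit 0, the Job counts a completion
        processed.append(task.upper())  # Stand-in for real processing
    return processed

# Simulate a queue holding three tasks
q = queue.Queue()
for t in ["a", "b", "c"]:
    q.put(t)
print(run_worker(q))  # ['A', 'B', 'C']
```

Because each worker exits cleanly when the queue is drained, the Job finishes once all queued tasks have been processed.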

CronJobs

A CronJob runs a Job on a schedule (like Linux crontab).

Basic CronJob

apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"     # 2 AM daily (controller's timezone, typically UTC)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool:latest
            command: ["/app/backup.sh"]
            volumeMounts:
            - name: backup-volume
              mountPath: /backups
          volumes:
          - name: backup-volume
            persistentVolumeClaim:
              claimName: backup-storage
          restartPolicy: OnFailure
      backoffLimit: 1

Cron Schedule Format

┌─────────────────── minute (0 - 59)
│ ┌───────────────── hour (0 - 23)
│ │ ┌─────────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌─────────── day of week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
* * * * *

Common Schedules

schedule: "0 0 * * *"        # Every day at midnight
schedule: "0 2 * * *"        # Every day at 2 AM
schedule: "0 * * * *"        # Every hour
schedule: "*/15 * * * *"      # Every 15 minutes
schedule: "0 0 * * 0"        # Every Sunday at midnight
schedule: "0 0 1 * *"        # First day of every month
schedule: "0 0 * * 1-5"      # Weekdays at midnight
schedule: "0 9,17 * * *"     # At 9 AM and 5 PM

Timezone Support

apiVersion: batch/v1
kind: CronJob
metadata:
  name: localized-job
spec:
  schedule: "0 9 * * *"
  timeZone: "America/New_York"   # 9 AM Eastern (DST-aware; stable since Kubernetes 1.27)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: job
            image: job:latest
          restartPolicy: OnFailure

Managing CronJob History

spec:
  successfulJobsHistoryLimit: 3    # Keep last 3 successful jobs
  failedJobsHistoryLimit: 1        # Keep last 1 failed job
  concurrencyPolicy: Forbid        # Skip a run if the previous job is still running
  # Other values: Allow (the default; runs may overlap)
  # and Replace (cancel the running job, start the new one)
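
Two related commands are useful when operating CronJobs (shown here against the daily-backup CronJob from earlier): triggering a one-off run for testing, and pausing the schedule via the CronJob's suspend field:

```
# Trigger a one-off run from the CronJob's jobTemplate
kubectl create job daily-backup-manual --from=cronjob/daily-backup

# Pause the schedule (already-running jobs are not affected)
kubectl patch cronjob daily-backup -p '{"spec":{"suspend":true}}'

# Resume
kubectl patch cronjob daily-backup -p '{"spec":{"suspend":false}}'
```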

Job Patterns

Pattern 1: One-Time Task

apiVersion: batch/v1
kind: Job
metadata:
  name: one-time-task
spec:
  template:
    spec:
      containers:
      - name: task
        image: ubuntu
        command: ["echo", "Hello World"]
      restartPolicy: Never

Pattern 2: Database Migration

apiVersion: batch/v1
kind: Job
metadata:
  name: django-migrate
spec:
  template:
    spec:
      serviceAccountName: django-sa
      containers:
      - name: migrate
        image: myapp:2.0
        command: ["python", "manage.py", "migrate"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: url
      restartPolicy: Never
  backoffLimit: 3

Pattern 3: Daily Report Generation

apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-report
spec:
  schedule: "0 6 * * *"       # 6 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: report-generator
          containers:
          - name: report
            image: reporting:latest
            # Kubernetes does not template the jobTemplate, so compute
            # the report date inside the container at run time
            command: ["sh", "-c", "REPORT_DATE=$(date +%F) /app/generate-report.sh"]
          restartPolicy: OnFailure

Pattern 4: Batch Data Processing

apiVersion: batch/v1
kind: Job
metadata:
  name: image-processor
spec:
  parallelism: 10
  completions: 100
  template:
    spec:
      containers:
      - name: processor
        image: image-processor:latest
        volumeMounts:
        - name: images
          mountPath: /input
        - name: output
          mountPath: /output
      volumes:
      - name: images
        persistentVolumeClaim:
          claimName: images-pvc
      - name: output
        persistentVolumeClaim:
          claimName: output-pvc
      restartPolicy: OnFailure

Monitoring Jobs

# List all jobs
kubectl get jobs
 
# View job details
kubectl describe job db-migration
 
# View job logs
kubectl logs job/db-migration
kubectl logs job/db-migration --all-containers=true
 
# Watch job progress
kubectl get jobs -w
 
# View CronJob history
kubectl get cronjobs
# Jobs created by a CronJob are named <cronjob-name>-<timestamp>;
# a label selector only works if you set that label in the jobTemplate
kubectl get jobs -l cronjob=daily-backup

Best Practices

Set backoffLimit appropriately

  • Prevent infinite retry loops
  • Balance reliability with cost

Use ttlSecondsAfterFinished

  • Clean up completed jobs automatically
  • Keeps cluster tidy

Monitor job completion

kubectl get jobs -w
kubectl wait --for=condition=complete job/my-job

Log job output

kubectl logs job/my-job > job-output.log

Handle failures gracefully

Make sure failures exit non-zero so the Job controller can retry:

# $? holds the exit code of the previous command
if [ $? -ne 0 ]; then
  echo "Job failed"
  exit 1
fi

Don't use "sleep" for scheduling

  • Use CronJobs instead of Jobs that sleep in a loop

Don't create Jobs directly from the command line

  • Use YAML manifests for reproducibility