Kubernetes is complex, and things will eventually break. Here is the standard workflow for diagnosing and fixing cluster issues.
1. The Pod is failing (CrashLoopBackOff / Error)
This is the most common issue.
Step 1: Check Pod Status
Action:
```shell
kubectl get pods
```
Result:
```
NAME        READY   STATUS             RESTARTS   AGE
web-app-1   0/1     CrashLoopBackOff   4          2m
```
Step 2: Check Events (The "Why")
Action:
```shell
kubectl describe pod web-app-1
```
Result: Look for the Events section at the bottom:
```
Events:
  Type     Reason   Age               From     Message
  ----     ------   ----              ----     -------
  Warning  Failed   30s (x3 over 1m)  kubelet  Error: ImagePullBackOff
```
(This tells you Kubernetes can't find your image or doesn't have permission to pull it.)
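For ImagePullBackOff, first check the image name and tag for typos. If the image lives in a private registry, the Pod spec also needs pull credentials. A minimal sketch (the secret name `regcred` and the image path are illustrative, not from your cluster):

```yaml
# Sketch: Pod pulling from a private registry.
# "regcred" is an assumed secret, created beforehand with e.g.:
#   kubectl create secret docker-registry regcred \
#     --docker-server=<registry> --docker-username=<user> --docker-password=<pass>
apiVersion: v1
kind: Pod
metadata:
  name: web-app-1
spec:
  imagePullSecrets:
    - name: regcred                                # must exist in the same namespace
  containers:
    - name: web-app
      image: registry.example.com/team/web-app:1.0 # hypothetical private image
```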
Step 3: Check Logs (Application Errors)
Action:
```shell
kubectl logs web-app-1
```
If the container has already restarted, add `--previous` to see the logs from the crashed instance. Result:
```
[FATAL] Database connection failed: 'db-service' not found.
```
(This tells you the issue is inside your code or configuration.)
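In this example the app resolves its database by the DNS name `db-service`, so a Service with exactly that name must exist in the Pod's namespace. A minimal sketch (the `app: db` selector and port 5432 are assumptions about how the database is deployed):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db-service    # must match the hostname the app uses
spec:
  selector:
    app: db           # assumed label on the database Pods
  ports:
    - port: 5432      # assumed database port
      targetPort: 5432
```

Verify it exists with `kubectl get svc db-service` from the same namespace as the failing Pod.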
2. Service is unreachable
Your Pod is running, but you can't reach it over the network.
Step 1: Verify Endpoints
Action:
```shell
kubectl get endpoints my-web-service
```
Result:
```
NAME             ENDPOINTS   AGE
my-web-service   <none>      5m
```
Interpretation: `<none>` means your Service selector doesn't match your Pod labels. Fix your YAML!
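The Service's `selector` must match the labels on the Pod template exactly, key for key. A minimal matching pair (the names, the `app: web` label, and the nginx image are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-web-service
spec:
  selector:
    app: web              # must match the Pod template labels below
  ports:
    - port: 80
      targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web          # what the Service selector matches
    spec:
      containers:
        - name: web
          image: nginx:1.25   # placeholder image
          ports:
            - containerPort: 80
```

Once the labels match, `kubectl get endpoints my-web-service` should list the Pod IPs instead of `<none>`.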
3. Node or Cluster Issues
Check Node Health
Action:
```shell
kubectl get nodes
```
Result:
```
NAME     STATUS     ROLES           AGE   VERSION
node-1   Ready      control-plane   10d   v1.26.1
node-2   NotReady   <none>          10d   v1.26.1
```
Interpretation: NotReady often means the node is out of disk space or memory, or the kubelet has crashed.
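To script a quick health check, filter out the NotReady rows. The snippet below runs against the sample output above; on a live cluster the same `awk` pipeline works on `kubectl get nodes --no-headers`:

```shell
# Print the name of every node that is not Ready.
# Live cluster equivalent:
#   kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
sample='node-1   Ready      control-plane   10d   v1.26.1
node-2   NotReady   <none>          10d   v1.26.1'

printf '%s\n' "$sample" | awk '$2 != "Ready" { print $1 }'
```

Then dig into the unhealthy node with `kubectl describe node node-2` and check its Conditions (e.g. DiskPressure, MemoryPressure) and recent events.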
Summary: Debugging Checklist
- kubectl get pods: Is it running?
- kubectl describe: What do the Events say?
- kubectl logs: What does the app code say?
- kubectl get events -A: Check global cluster errors.
- kubectl exec: Can you ping other services from inside?
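The first three checklist steps can be bundled into a small shell helper; a sketch under the assumption that `kubectl` is configured for the right cluster (`debug_pod` is a hypothetical function name, not a kubectl subcommand):

```shell
# Sketch: run the first three checklist steps for one pod.
debug_pod() {
  local pod="$1"
  kubectl get pod "$pod"                                # Is it running?
  kubectl describe pod "$pod" | sed -n '/^Events:/,$p'  # What do the Events say?
  kubectl logs "$pod" --tail=20                         # What does the app say?
}

# Usage: debug_pod web-app-1
```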