18

Health Probes: Liveness vs Readiness (vs Startup)

Video: Day 18/40 — Kubernetes Health Probes | Liveness vs Readiness Probes • https://www.youtube.com/watch?v=x2e6pIBLKzw • Duration: ~29 min

Key terms

TermMeaning
Liveness probeRestart the container if it fails
Readiness probeHold traffic until the pod is ready
Startup probeProtect slow-starting apps from early restarts
httpGet/tcpSocket/execThe three probe check methods
initialDelaySecondsWait before the first probe
periodSeconds/thresholdProbe timing and failure tolerance

Problem & solution

A container can be "running" yet broken (deadlocked) or still warming up. Without health checks, Kubernetes will route traffic to bad pods and never restart hung ones, causing silent outages.

Solution: Add liveness (restart if stuck), readiness (gate traffic), and startup probes so Kubernetes only sends traffic to healthy containers.

The analogy

When a ship arrives, the harbor doctor boards it and asks three questions. Is the ship still alive at all, and if not, send it back for repairs. Is it ready to load cargo right now, and if not, hold the trucks back but leave it docked. And has it finished starting up after a long voyage, so a slow boat gets time before the other two checks judge it. In Kubernetes the doctor is the kubelet, the alive check is the liveness probe that restarts the container, the ready check is the readiness probe that gates Service traffic, and the boot check is the startup probe that protects slow starters.

Where this fits in the cluster

Probes are defined per container. The kubelet on the node runs them; results drive container restarts (liveness) and Service endpoint membership (readiness).

Why probes?

A container can be running but broken (deadlocked) or not yet ready (still warming up). Probes let the kubelet check actual health and react.

The three probes

Kubernetes offers three probe kinds, each answering a different question and triggering a different reaction when it fails.

   Liveness   -> "Is it alive?"     fail -> RESTART the container
   Readiness  -> "Can it serve?"    fail -> remove from Service endpoints
   Startup    -> "Has it booted?"   fail -> kill; gates the other two

Liveness vs Readiness (the key distinction)

The two probes act on different axes: readiness decides whether a pod gets traffic, while liveness decides whether the container is restarted.

   Readiness controls TRAFFIC   (in/out of the load balancer)
   Liveness   controls LIFECYCLE (restart the container)

A pod can be Live but not Ready (alive, still warming up -> no traffic yet).

Probe types (how the check is done)

Each probe can run the health check in one of three ways, depending on what the app exposes.

   httpGet   -> GET a path/port; 2xx-3xx = pass
   tcpSocket -> can we open the TCP port? = pass
   exec      -> run a command in container; exit 0 = pass

YAML — all three

A single container can declare all three probes together; the startupProbe gates the others until the app has booted.

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: myapp:1.0
      ports:
        - containerPort: 8080
      startupProbe:                # gate slow starters first
        httpGet: { path: /healthz, port: 8080 }
        failureThreshold: 30
        periodSeconds: 10          # allows up to 300s to boot
      readinessProbe:
        httpGet: { path: /ready, port: 8080 }
        initialDelaySeconds: 5
        periodSeconds: 10
      livenessProbe:
        httpGet: { path: /healthz, port: 8080 }
        initialDelaySeconds: 10
        periodSeconds: 10

tcpSocket and exec variants

Instead of an HTTP check you can probe by opening a TCP port or running a command inside the container.

      readinessProbe:
        tcpSocket: { port: 5432 }
      livenessProbe:
        exec:
          command: ["cat", "/tmp/healthy"]

Tuning fields

These timing fields control how patient or aggressive a probe is before it declares success or failure.

   initialDelaySeconds  wait this long before first check
   periodSeconds        how often to check
   timeoutSeconds       per-check timeout
   successThreshold     consecutive passes to be considered healthy
   failureThreshold     consecutive fails before acting (restart/unready)

Why startupProbe exists (slow apps)

Apps with long boot times need a startupProbe so liveness checks don't kill them mid-startup in a crash loop.

Timeline (ASCII)

This is the order in which the probes activate over a pod's life, from boot to serving traffic to a possible restart.

Inspect

Use these commands to see probe results, restart counts, and which pods are currently in a Service's endpoints.

kubectl describe pod app        # see probe results + restart reasons
kubectl get pod app             # READY column, RESTARTS count
kubectl get endpoints <svc>     # readiness controls who is listed here

End-to-end example: watch readiness gate traffic

Deploy an app behind a Service with all three probes, then break readiness and watch the pod drop out of the Service endpoints without being restarted.

apiVersion: apps/v1
kind: Deployment
metadata: { name: web }
spec:
  replicas: 1
  selector: { matchLabels: { app: web } }
  template:
    metadata: { labels: { app: web } }
    spec:
      containers:
        - name: web
          image: nginx
          ports: [{ containerPort: 80 }]
          readinessProbe:
            httpGet: { path: /ready.html, port: 80 }   # file must exist to be Ready
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /, port: 80 }
            initialDelaySeconds: 10
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata: { name: web }
spec:
  selector: { app: web }
  ports: [{ port: 80 }]
kubectl apply -f web-probes.yaml
POD=$(kubectl get pod -l app=web -o name)
kubectl exec $POD -- sh -c 'echo ok > /usr/share/nginx/html/ready.html'  # make Ready
kubectl get endpoints web                  # pod IP now listed
kubectl exec $POD -- rm /usr/share/nginx/html/ready.html                 # break readiness
kubectl get pod -l app=web                  # READY 0/1, but still Running
kubectl get endpoints web                  # pod IP removed -> no traffic

End-to-end flow

Probes gate a container's life: startup first, then readiness controls traffic and liveness controls restarts.

Key takeaways

  • Liveness restarts; Readiness gates traffic; Startup protects slow boots.
  • A pod can be Live but not Ready (running, no traffic).
  • Probe via httpGet / tcpSocket / exec; tune with delay/period/thresholds.

Checklist

  • [ ] Added readiness + liveness probes to a pod
  • [ ] Forced readiness to fail and saw it drop from Service endpoints
  • [ ] Forced liveness to fail and saw RESTARTS increment
  • [ ] Used a startupProbe for a slow-starting app