12

DaemonSet, Job & CronJob

Video: Day 12/40 — Kubernetes Daemonset Explained — Daemonsets, Job and Cronjob • https://www.youtube.com/watch?v=kvITrySpy_k • Duration: ~28 min

Key terms

TermMeaning
DaemonSetRuns one pod per (matching) node
JobRuns pods until successful completion
CronJobCreates Jobs on a schedule
completions/parallelismHow many / how concurrently a Job runs
concurrencyPolicyWhether CronJob runs may overlap
ScheduleCron expression driving a CronJob

Problem & solution

Deployments assume long-running, horizontally scalable services, but some workloads don't fit: an agent that must run on every node, a one-off batch task that should finish and stop, or a recurring scheduled job.

Solution: Pick the right controller for the lifecycle: DaemonSet (one pod per node), Job (run to completion), CronJob (scheduled).

The analogy

Think of three different port jobs. A safety inspector stationed on every berth watches each dock all day, and the moment a brand-new berth opens the port posts one there too. A one-off cargo haul is dispatched, runs until the load is delivered, then stops for good. A scheduled nightly haul goes out automatically every night on a fixed timetable. These map to a DaemonSet that runs one pod per node, a Job that runs to completion, and a CronJob that creates Jobs on a schedule.

Where this fits in the cluster

These are controllers that live in the control plane and decide which pods land on which nodes. DaemonSet works at the node layer, Job/CronJob at the pod layer.

Three special workload controllers

Beyond Deployments, Kubernetes ships controllers for non-standard workloads: node agents, one-off batch tasks, and scheduled tasks.

   DaemonSet  -> one pod on EVERY node           (agents)
   Job        -> run to completion, then stop     (batch task)
   CronJob    -> Job on a schedule                (cron)

Where they sit in the controller family

Every controller manages pods, but each answers a different question: how many, where, and for how long?

   CONTROLLER     "how many / where"            "how long does a pod live"
   -----------    --------------------------     --------------------------
   Deployment     N replicas, scheduler picks    forever (restarted on exit)
   DaemonSet      exactly 1 per matching node     forever (restarted on exit)
   Job            N completions, anywhere         until task succeeds, then stop
   CronJob        spawns a Job each tick          each Job stops when done

Rule of thumb: long-running = Deployment/DaemonSet, runs-then-exits = Job/CronJob. The restart expectation is the key difference.

DaemonSet — one pod per node

A DaemonSet guarantees a copy of a pod runs on every (matching) node, and automatically places one on any new node that joins the cluster.

Use for node-level agents: log collectors (fluentd), monitoring (node-exporter), CNI plugins, kube-proxy.

DaemonSet vs Deployment scaling

A Deployment's count is something you choose; a DaemonSet's count is dictated by the cluster size. You never set replicas on a DaemonSet.

Targeting a subset of nodes

Pair a nodeSelector (Day 15) or taints/tolerations (Day 14) with a DaemonSet to run the agent only where it belongs (e.g. only GPU nodes).

The manifest below is a minimal DaemonSet; notice it has no replicas field, because the controller decides the count by running one pod on every matching node.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
spec:
  selector:
    matchLabels: { app: agent }
  template:
    metadata:
      labels: { app: agent }
    spec:
      containers:
        - name: agent
          image: busybox
          command: ['sh','-c','while true; do sleep 3600; done']

Job — run once to completion

A Job runs a pod until its task finishes successfully, then stops — unlike a Deployment, it is not meant to stay running.

  • Not restarted on success.
  • Retries on failure up to backoffLimit.

completions x parallelism

These two knobs control batch shape: how many successes you need, and how many pods run at once toward that goal.

completions: 1, parallelism: 1 runs a single one-shot task (the default).

backoffLimit and retries

On failure a Job retries with exponential backoff until backoffLimit is hit, then it's marked Failed and stops trying.

activeDeadlineSeconds: hard cap on total runtime regardless of retries.

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  completions: 1          # how many successful pods needed
  parallelism: 1          # how many run at once
  backoffLimit: 4         # retries before marking failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pi
          image: perl
          command: ['perl','-Mbignum=bpi','-wle','print bpi(200)']

CronJob — Jobs on a schedule

A CronJob creates a Job automatically on a recurring cron schedule — the Kubernetes equivalent of a crontab entry.

This manifest defines a CronJob that backs up at 02:00 every night; the extra fields control overlap and how much run history Kubernetes keeps around.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup
spec:
  schedule: "0 2 * * *"            # daily at 02:00
  concurrencyPolicy: Forbid        # don't start a new run if one is still going
  startingDeadlineSeconds: 120     # skip a missed run if >2 min late
  successfulJobsHistoryLimit: 3    # keep last 3 successful Jobs
  failedJobsHistoryLimit: 1        # keep last 1 failed Job
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: busybox
              command: ['sh','-c','echo backing up...']

concurrencyPolicy — what if a run overlaps?

If a Job is still running when the next tick fires, this policy decides what happens. Critical for long backups or jobs that must not double-run.

   Allow   (default): start the new run anyway -> two Jobs run at once
   Forbid           : skip the new run while the old one is still going
   Replace          : kill the old run, start the new one

Object chain: CronJob -> Job -> Pod

A CronJob doesn't run containers directly — it creates Jobs, which create Pods. History limits control how many old Jobs are kept around.

   schedule: "*/5 * * * *"   (every 5 min)
      |   |   |   |   |
      |   |   |   |   +-- day of week (0-6)
      |   |   |   +------ month (1-12)
      |   |   +---------- day of month (1-31)
      |   +-------------- hour (0-23)
      +------------------ minute (0-59)

   tick -> creates a Job -> creates a Pod -> runs -> done

Commands

Everyday commands for inspecting and triggering these controllers.

kubectl get daemonset -A
kubectl get jobs
kubectl get cronjobs                               # shows SCHEDULE, LAST SCHEDULE
kubectl logs job/pi                                # logs from a Job's pod
kubectl create job manual --from=cronjob/backup    # trigger a CronJob now
kubectl get pods --selector=job-name=pi            # find a Job's pods
kubectl delete job pi                              # also deletes its pods

Reading the STATUS / COMPLETIONS columns

The list output tells you batch progress at a glance.

   $ kubectl get jobs
   NAME   COMPLETIONS   DURATION   AGE
   pi     1/1           5s         1m      <- done: 1 of 1 succeeded
   etl    2/4           30s        30s     <- in progress: 2 of 4 done

   $ kubectl get pods
   pi-xxxxx   0/1   Completed   0   1m      <- batch pods end "Completed", not Running

When to use which

A quick decision guide for picking the right controller for the job.

   Need it on every node?           -> DaemonSet
   One-off batch task?              -> Job
   Recurring scheduled task?        -> CronJob
   Long-running stateless service?  -> Deployment (Day 8)

End-to-end example: nightly DB backup

A realistic pipeline: a DaemonSet log agent on every node, plus a CronJob that backs up a database every night and a way to trigger it on demand.

# 1) node-level agent on EVERY node
apiVersion: apps/v1
kind: DaemonSet
metadata: { name: log-agent }
spec:
  selector: { matchLabels: { app: log-agent } }
  template:
    metadata: { labels: { app: log-agent } }
    spec:
      containers:
        - name: agent
          image: busybox
          command: ['sh','-c','while true; do echo collecting; sleep 30; done']
---
# 2) scheduled backup that must not overlap
apiVersion: batch/v1
kind: CronJob
metadata: { name: db-backup }
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 600
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: postgres:16
              command: ['sh','-c','pg_dump "$DB_URL" > /tmp/dump.sql && echo done']
              env:
                - name: DB_URL
                  valueFrom:
                    secretKeyRef:
                      name: db-credentials
                      key: url
kubectl apply -f backup.yaml
kubectl get daemonset log-agent -o wide        # one pod per node
kubectl get cronjob db-backup                  # SCHEDULE + LAST SCHEDULE
kubectl create job test-now --from=cronjob/db-backup   # trigger immediately
kubectl logs job/test-now                      # watch the backup run to Completed

Common pitfalls

The mistakes that surprise people first.

   * restartPolicy: Always on a Job  -> rejected; Jobs need Never or OnFailure
   * Pod stuck Completed, not Running -> that's correct for Jobs; don't "fix" it
   * CronJob never fires              -> check timezone, schedule syntax, and
                                         startingDeadlineSeconds skipping runs
   * Overlapping CronJob runs         -> set concurrencyPolicy: Forbid/Replace
   * DaemonSet skips a node           -> node has a taint; add a toleration (Day 14)
   * Job retries forever             -> set backoffLimit + activeDeadlineSeconds

End-to-end flow

Each controller drives a different pod lifecycle: scheduled by a CronJob, one per node by a DaemonSet, or run-to-completion by a Job.

Key takeaways

  • DaemonSet = exactly one pod per (matching) node, auto-scales with nodes; no replicas field.
  • Job runs to completion (completions, parallelism, backoffLimit); use restartPolicy: Never or OnFailure.
  • CronJob spawns Jobs on a cron schedule; control overlap with concurrencyPolicy and clutter with history limits.
  • Batch pods end as Completed, not Running — that's expected.

Checklist

  • [ ] Deployed a DaemonSet and saw one pod per node
  • [ ] Scoped a DaemonSet to a subset of nodes with a nodeSelector
  • [ ] Ran a Job to completion and read its logs
  • [ ] Tuned a Job with completions + parallelism
  • [ ] Created a CronJob and manually triggered it with --from
  • [ ] Set a concurrencyPolicy and watched history limits prune old Jobs