Taints and Tolerations

Video: Day 14/40 — Taints and Tolerations in Kubernetes • https://www.youtube.com/watch?v=nwoS2tK2s6Q • Duration: ~26 min

Published 21 Jun 2026

Key terms

Term	Meaning
Taint	A node mark that repels pods
Toleration	A pod's permission to land on a tainted node
NoSchedule	Effect blocking new pods
PreferNoSchedule	Soft "avoid if possible" effect
NoExecute	Effect that also evicts non-tolerating pods
tolerationSeconds	Grace time before NoExecute eviction
key/value/effect	The three parts of a taint

Problem & solution

By default the scheduler can place any pod on any node, but some nodes are special (GPU, dedicated, control-plane) and should repel general workloads unless a pod explicitly opts in.

Solution: Dedicate nodes by tainting them to repel pods, then add matching tolerations only to the pods allowed to run there.

The analogy

A busy port marks certain docks with a "hazmat berth, permit required" sign so ordinary ships steer clear, and only a vessel carrying the matching permit is waved in. A Kubernetes taint is that sign painted on a node, and a toleration is the permit a pod carries to be allowed onto it. A ship with no permit is sent to another berth, just as a pod with no matching toleration is scheduled onto a different node, or stays Pending if no other node can take it.

Where this fits in the cluster

Taints live on nodes; tolerations live on pods. The scheduler compares the two when it decides placement, so together they act as a node-level gate on scheduling.

The idea

Taints repel pods from nodes. Tolerations let specific pods land there anyway. It's the opposite of attraction: a taint pushes pods away unless they tolerate it.

Graph legend — a GPU node repels everything but GPU workloads:

Graph node	Maps to	What it does
GPU node tainted nvidia.com/gpu=true NoSchedule	a node with `kubectl taint nodes ... nvidia.com/gpu=true:NoSchedule`	Repels pods lacking the matching toleration
nginx pod - no toleration	an ordinary workload	Blocked from the GPU node
triton-inference-server pod - has toleration	a pod with a `nvidia.com/gpu` toleration	Allowed onto the dedicated GPU node

Mnemonic: Taint = the bouncer on the node. Toleration = the VIP pass on the pod. Note: a toleration allows placement; it does not force it (that's affinity).

Taint a node

You apply a taint to a node with kubectl taint, and remove it by repeating the command with a trailing minus.

kubectl taint nodes cka-gpu nvidia.com/gpu=true:NoSchedule
kubectl describe node cka-gpu | grep -i taint

# remove a taint (trailing minus)
kubectl taint nodes cka-gpu nvidia.com/gpu=true:NoSchedule-

Taint format:

   key            = value : effect
   nvidia.com/gpu = true  : NoSchedule
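
The format above can be pulled apart mechanically. Below is a minimal Python sketch, not kubectl's own parser: the names `Taint` and `parse_taint` are made up for illustration, and it assumes the value is optional (as in the control-plane taint, which has only a key and an effect). It does not handle the trailing-minus removal form.

```python
from typing import NamedTuple, Optional

VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}

class Taint(NamedTuple):
    key: str
    value: Optional[str]  # None when the taint has no value (key:effect)
    effect: str

def parse_taint(spec: str) -> Taint:
    """Split 'key=value:effect' (or 'key:effect') into its parts."""
    # The effect is everything after the last colon.
    body, sep, effect = spec.rpartition(":")
    if not sep or effect not in VALID_EFFECTS:
        raise ValueError(f"unknown or missing effect in {spec!r}")
    # The value is optional, so '=' may be absent.
    key, eq, value = body.partition("=")
    if not key:
        raise ValueError(f"missing key in {spec!r}")
    return Taint(key, value if eq else None, effect)

print(parse_taint("nvidia.com/gpu=true:NoSchedule"))
print(parse_taint("node-role.kubernetes.io/control-plane:NoSchedule"))
```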

The 3 taint effects

The effect decides how harshly the taint treats pods that don't tolerate it, ranging from soft avoidance to outright eviction.

   NoSchedule        -> new pods without toleration are NOT placed here
   PreferNoSchedule  -> soft; avoid if possible, but allowed if needed
   NoExecute         -> also EVICTS already-running pods that don't tolerate
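
To make the three effects concrete, here is a small Python sketch of what a node's untolerated taints mean for a single pod. It is a simplified model, not scheduler code: the function name `outcome` and the result keys are hypothetical, and the input is assumed to be the effects of only those taints the pod does not tolerate.

```python
from typing import Iterable

def outcome(untolerated_effects: Iterable[str]) -> dict:
    """Summarise what a node's untolerated taints mean for one pod."""
    effects = set(untolerated_effects)
    # NoSchedule and NoExecute both stop new placement on this node.
    hard_block = bool(effects & {"NoSchedule", "NoExecute"})
    return {
        "can_schedule": not hard_block,
        # PreferNoSchedule only makes the node a last resort.
        "discouraged": not hard_block and "PreferNoSchedule" in effects,
        # NoExecute also removes the pod if it is already running here.
        "evict_if_running": "NoExecute" in effects,
    }

for effects in ([], ["PreferNoSchedule"], ["NoSchedule"], ["NoExecute"]):
    print(effects, outcome(effects))
```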

Toleration on a pod

With the default Equal operator, a toleration in the pod spec must match the node's taint key, value, and effect for the pod to be allowed onto that node.

apiVersion: v1
kind: Pod
metadata:
  name: triton
spec:
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:24.05-py3
      resources:
        limits:
          nvidia.com/gpu: 1

operator:

   Equal   -> key, value, and effect must all match
   Exists  -> key + effect match; value ignored (tolerate any value)
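
The two operators can be expressed as a short matching function. This is a Python sketch of the rules, not the Kubernetes source: `tolerates` is a hypothetical name, taints and tolerations are plain dicts, and only Equal and Exists are modelled. It also assumes two documented defaults: a toleration with no effect matches every effect, and Exists with no key tolerates every taint.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if one toleration matches one taint."""
    # An empty or missing effect on the toleration matches every effect.
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    key = toleration.get("key")
    if operator == "Exists":
        # Exists ignores the value; with no key it tolerates every taint.
        return not key or key == taint["key"]
    if operator == "Equal":
        same_value = toleration.get("value", "") == taint.get("value", "")
        return key == taint["key"] and same_value
    return False  # operators outside this sketch are treated as no match

gpu_taint = {"key": "nvidia.com/gpu", "value": "true", "effect": "NoSchedule"}
equal_tol = {"key": "nvidia.com/gpu", "operator": "Equal",
             "value": "true", "effect": "NoSchedule"}
exists_tol = {"key": "nvidia.com/gpu", "operator": "Exists",
              "effect": "NoSchedule"}
print(tolerates(equal_tol, gpu_taint), tolerates(exists_tol, gpu_taint))
```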

NoExecute extra field: tolerationSeconds

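For NoExecute taints, tolerationSeconds limits how long a tolerating pod may stay on the node after the taint is added. Without it, a tolerating pod stays indefinitely; a pod with no matching toleration is evicted right away.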

    - key: "node.kubernetes.io/not-ready"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 300     # stay 5 min after taint, then evict
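
The timing can be worked through as a small calculation. The Python sketch below is illustrative only: `eviction_time` is a hypothetical helper, and it assumes that an unset tolerationSeconds means "tolerate forever" and that zero or negative values mean "evict immediately".

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def eviction_time(taint_added: datetime, tolerated: bool,
                  toleration_seconds: Optional[int]) -> Optional[datetime]:
    """When a running pod is evicted after a NoExecute taint appears.

    Returns None when the pod is never evicted because of this taint.
    """
    if not tolerated:
        return taint_added  # no matching toleration: evicted right away
    if toleration_seconds is None:
        return None         # tolerated with no time limit: pod stays
    # Tolerated for a limited time: evicted once the window runs out.
    return taint_added + timedelta(seconds=max(toleration_seconds, 0))

added = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(eviction_time(added, tolerated=True, toleration_seconds=300))
print(eviction_time(added, tolerated=True, toleration_seconds=None))
```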

Why control-plane nodes run no normal pods

Control-plane nodes carry a built-in taint that keeps ordinary workloads off the control plane unless they explicitly tolerate it.

   kubectl describe node <control-plane> | grep Taints
   -> node-role.kubernetes.io/control-plane:NoSchedule
   This taint keeps your workloads off the control-plane nodes by default.

Taints/Tolerations vs Affinity (don't confuse them)

These solve opposite problems: taints repel pods from a node, while affinity attracts a pod toward nodes.

   Taint/Toleration -> NODE repels pods   (pod needs permission to land)
   Node Affinity    -> POD is drawn to nodes (pod prefers/requires nodes)  [Day 15]
   Best practice: combine both to truly dedicate nodes.
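
Why both are needed shows up in a toy filter. The Python sketch below is a deliberately simplified model, not the scheduler: tolerations are matched as exact (key, value, effect) tuples, affinity is reduced to required labels, and the `gpu=true` label, the `cka-worker` node, and the `feasible_nodes` helper are all hypothetical.

```python
def feasible_nodes(pod: dict, nodes: dict) -> list:
    """Names of nodes that pass both the taint check and the label check."""
    result = []
    for name, node in nodes.items():
        # Taint check: every taint on the node must be tolerated by the pod.
        taints_ok = all(t in pod["tolerations"] for t in node["taints"])
        # Affinity check: the node must carry every label the pod requires.
        labels_ok = all(node["labels"].get(k) == v
                        for k, v in pod["required_labels"].items())
        if taints_ok and labels_ok:
            result.append(name)
    return result

GPU_TAINT = ("nvidia.com/gpu", "true", "NoSchedule")
nodes = {
    "cka-gpu": {"taints": [GPU_TAINT], "labels": {"gpu": "true"}},
    "cka-worker": {"taints": [], "labels": {}},
}
plain = {"tolerations": [], "required_labels": {}}
tolerate_only = {"tolerations": [GPU_TAINT], "required_labels": {}}
dedicated = {"tolerations": [GPU_TAINT], "required_labels": {"gpu": "true"}}

print(feasible_nodes(plain, nodes))          # kept off the GPU node
print(feasible_nodes(tolerate_only, nodes))  # allowed on either node
print(feasible_nodes(dedicated, nodes))      # pinned to the GPU node
```

In this model the toleration alone leaves both nodes open to the pod; only the added label requirement narrows it to the GPU node.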

End-to-end example: dedicate a GPU node

Taint a node so only GPU workloads land there, then deploy a pod that tolerates it. A plain pod is kept off that node; the tolerating pod is allowed onto it.

Graph legend — dedicating a GPU node to a real inference server:

Graph node	Maps to	What it does
node cka-gpu tainted nvidia.com/gpu=true NoSchedule	`kubectl taint nodes cka-gpu nvidia.com/gpu=true:NoSchedule`	Repels non-GPU workloads from the node
nginx plain-pod - no toleration	`kubectl run plain --image=nginx:1.27`	Kept off cka-gpu; runs on another node, or stays Pending if none fits
triton-inference-server pod - has toleration	a pod tolerating `nvidia.com/gpu`	Admitted onto the GPU node to serve models

# 1) taint the node
kubectl taint nodes cka-gpu nvidia.com/gpu=true:NoSchedule

# 2) a plain pod will NOT land on cka-gpu
kubectl run plain --image=nginx:1.27

# 3) a tolerating GPU pod is allowed
kubectl apply -f triton-pod.yaml
kubectl get pods -o wide                 # triton on cka-gpu, plain elsewhere
kubectl describe node cka-gpu | grep -i taint

# triton-pod.yaml
apiVersion: v1
kind: Pod
metadata: { name: triton }
spec:
  tolerations:
    - { key: "nvidia.com/gpu", operator: "Equal", value: "true", effect: "NoSchedule" }
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:24.05-py3
      resources:
        limits: { nvidia.com/gpu: 1 }

The taint keeps other workloads off the node, but the toleration only allows the pod there. To also force the pod onto the GPU node, not just allow it, add node affinity too; see Day 15's "dedicated node" pattern.

End-to-end flow

The scheduler matches a node's taints against a pod's tolerations to decide placement, and a NoExecute taint additionally evicts already-running pods that do not tolerate it.

Graph legend — the scheduler matches taints against tolerations:

Graph node	Maps to	What it does
triton pod submitted	a pod with a `nvidia.com/gpu` toleration	Requests scheduling
nvidia.com/gpu taint tolerated?	the node's taint vs the pod's `tolerations`	Decides if the pod may land
triton scheduled on cka-gpu	a successful placement	Pod runs on the GPU node
NoExecute taint added	`effect: NoExecute`	Also evicts already-running pods that don't tolerate

Key takeaways

Taint a node to repel pods; add a matching toleration to a pod to allow it.
Effects: NoSchedule, PreferNoSchedule (soft), NoExecute (evicts).
A toleration only permits placement; it does not attract the pod. Pair it with node affinity.

Checklist

[ ] Tainted a node and saw a pod fail to schedule
[ ] Added a matching toleration and saw it schedule
[ ] Tested NoExecute evicting a running pod
[ ] Inspected the control-plane node's default taint