Pod Security Standards, Linux Capabilities, and Security Context
Video: Day 54 — Pod Security Standards & securityContext • Theme: lock pods down with PSS levels, capabilities, and a tight securityContext.
Key terms
| Term | Meaning |
|---|---|
| Pod Security Standards (PSS) | Three policy levels: privileged, baseline, restricted |
| Pod Security Admission (PSA) | Built-in controller that enforces PSS per namespace |
| securityContext | Pod/container-level security settings |
| Linux capabilities | Fine-grained slices of root power (e.g. NET_BIND_SERVICE) |
| runAsNonRoot | Refuses to start a container running as UID 0 |
| Privileged container | Near-host access (all caps, devices) |
| seccompProfile | Syscall filter applied to the container |
Problem & solution
By default a container can run as root and keep most Linux capabilities, and a privileged pod can nearly own the node. A single compromised image can then become a host takeover. PodSecurityPolicy (the old gate) was removed in Kubernetes 1.25.
Solution: Apply Pod Security Standards through the built-in Pod Security Admission controller (namespace labels), and harden each workload with a securityContext that drops capabilities and forbids root.
The analogy
Every port posts a safety code, and it comes in tiers: an anything-goes zone for trusted service vessels, a baseline rulebook that bans the obvious hazards, and a strict restricted-berth code for dangerous cargo that demands locked hatches and minimal crew privileges. A safety officer checks each ship against the code posted for its berth section and turns away any that fail. In Kubernetes those tiers are the Pod Security Standards, the officer is Pod Security Admission enforcing the level per namespace, and a ship's own locked-down rig is its securityContext.
Where this fits in the cluster
The same cluster entities appear in every day's notes; the <== marks what this day touches.
The three Pod Security Standards
PSS defines three cumulative levels. Most workloads should target restricted.
privileged -> no restrictions (host access, all caps); trusted infra only
baseline   -> blocks known privilege escalations (no privileged, no hostPID,
              no hostNetwork, limited caps); sensible minimum
restricted -> hardened: runAsNonRoot, drop ALL caps, seccomp RuntimeDefault,
              no privilege escalation, restricted volume types
Enforcing with Pod Security Admission
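PSA is enabled by default (stable since Kubernetes 1.25). You opt a namespace
in with labels, choosing a level for each mode: enforce (reject the pod),
audit (annotate the audit log entry), warn (return a warning that kubectl
prints). A common rollout is to warn and audit at restricted while enforcing
baseline, then tighten enforce once the warnings stop.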
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
kubectl label namespace payments \
pod-security.kubernetes.io/enforce=restricted --overwrite
kubectl get ns payments --show-labels
# a pod that violates the level is rejected at create time
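Before tightening enforce on a namespace that already runs workloads, it can help to see where each namespace stands and what a stricter level would flag. The following is a minimal sketch, assuming kubectl is on the PATH and pointed at the intended cluster; the function names psa_levels and psa_preview are illustrative and not part of kubectl.

```shell
# List every namespace with its current enforce/warn/audit levels as columns.
psa_levels() {
  kubectl get namespaces \
    -L pod-security.kubernetes.io/enforce \
    -L pod-security.kubernetes.io/warn \
    -L pod-security.kubernetes.io/audit
}

# psa_preview NAMESPACE LEVEL
# Server-side dry run: the label is not persisted, but the API server checks
# the namespace's existing pods against LEVEL and returns warnings for any
# that would violate it.
psa_preview() {
  kubectl label --dry-run=server --overwrite namespace "$1" \
    "pod-security.kubernetes.io/enforce=$2"
}
```

For example, psa_preview payments restricted shows which running pods would fail restricted without changing the namespace.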
securityContext: pod vs container
Settings can sit on the pod (apply to all containers, e.g. fsGroup) or on a
container (override per container, e.g. capabilities). Container values win
where they overlap.
apiVersion: v1
kind: Pod
metadata:
  name: hardened
spec:
  securityContext:                 # pod-level
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0
    securityContext:               # container-level
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      privileged: false
      capabilities:
        drop: ["ALL"]
        add: ["NET_BIND_SERVICE"]  # only what the app truly needs
Linux capabilities: drop ALL, add back a few
Root power is split into roughly 40 capabilities. Containers should drop ALL and add back only the minimum. The restricted PSS requires drop: ["ALL"] and allows adding back only NET_BIND_SERVICE.
NET_BIND_SERVICE -> bind ports < 1024 without being root
CHOWN -> change file ownership
SYS_TIME -> set the system clock (rarely needed)
NET_ADMIN -> configure networking (CNI/agents only)
dropping ALL then adding only what's needed = least privilege
# inspect what a running container ended up with
kubectl exec hardened -- grep Cap /proc/1/status
# decode a CapEff bitmask
capsh --decode=00000000a80425fb
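capsh is not always installed on the machine you are working from. As a fallback, the bitmask can be decoded by hand: bit N of CapEff corresponds to capability number N in linux/capability.h. The sketch below is an illustration in POSIX sh, not a replacement for capsh; the function name decode_caps is made up for this example, and the name table covers capabilities 0 through 40, so newer kernels may define bits it does not know.

```shell
# Capability names in bit order (bit 0 = CHOWN), per linux/capability.h.
CAP_NAMES="CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID
SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN NET_RAW
IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE SYS_PACCT
SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME SYS_TTY_CONFIG MKNOD LEASE
AUDIT_WRITE AUDIT_CONTROL SETFCAP MAC_OVERRIDE MAC_ADMIN SYSLOG WAKE_ALARM
BLOCK_SUSPEND AUDIT_READ PERFMON BPF CHECKPOINT_RESTORE"

# decode_caps HEXMASK
# Prints the capabilities set in HEXMASK as a comma-separated list.
# The body runs in a subshell so its variables do not leak to the caller.
decode_caps() (
  mask=$((0x$1))
  bit=0
  out=""
  for name in $CAP_NAMES; do
    # Test bit number $bit of the mask.
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      out="${out:+$out,}$name"
    fi
    bit=$((bit + 1))
  done
  printf '%s\n' "$out"
)
```

decode_caps 0000000000000400 prints NET_BIND_SERVICE, which is what a container that drops ALL and adds back only that capability should show when it runs as root; an all-zero mask prints an empty line.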
runAsNonRoot and privilege escalation
- runAsNonRoot: true makes the kubelet refuse to start a container whose image would run as UID 0; a strong, simple guardrail.
- allowPrivilegeEscalation: false sets no_new_privs, blocking setuid binaries from gaining more privileges than the process already has.
- privileged: true is the opposite of all this: it grants all caps and host device access. Both baseline and restricted forbid it.
# verify the effective user inside the container
kubectl exec hardened -- id
# uid=1000 gid=0(root) groups=0(root),2000   (uid is 1000, NOT 0; gid is typically 0 unless runAsGroup or the image sets one)
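The kernel also reports whether no_new_privs and a seccomp filter are active for a process, in the NoNewPrivs and Seccomp fields of /proc/<pid>/status (Seccomp: 0 is disabled, 1 is strict, 2 is filter). A small sketch that summarises both; proc_hardening is a hypothetical helper name, and it assumes the container image ships cat.

```shell
# Reads /proc/<pid>/status text on stdin and prints the two hardening fields.
# Matching the exact field name avoids picking up "Seccomp_filters:".
proc_hardening() {
  awk '$1 == "NoNewPrivs:" { n = $2 }
       $1 == "Seccomp:"    { s = $2 }
       END { printf "no_new_privs=%s seccomp=%s\n", n, s }'
}
```

Usage: kubectl exec hardened -- cat /proc/1/status | proc_hardening. With allowPrivilegeEscalation: false and a RuntimeDefault seccomp profile, the expected summary is no_new_privs=1 seccomp=2.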
Verifying enforcement
Try to create a non-compliant pod in the restricted namespace and read the rejection — this is exactly what the CKA-style task checks.
kubectl run bad --image=nginx -n payments \
--overrides='{"spec":{"containers":[{"name":"bad","image":"nginx","securityContext":{"privileged":true}}]}}'
# Error: violates PodSecurity "restricted:latest": privileged, allowPrivilegeEscalation,
# capabilities, runAsNonRoot, seccompProfile ...
kubectl label ns payments pod-security.kubernetes.io/enforce=baseline --overwrite # relax if needed
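A manifest can also be tested against the namespace's level without creating anything, because a server-side dry run still passes through admission. This is a sketch under the assumption that kubectl is configured for the cluster; psa_check is an illustrative name, and a failure here can come from any admission or validation error, not only PodSecurity, so read the message kubectl prints.

```shell
# psa_check NAMESPACE MANIFEST
# Sends MANIFEST through the API server's admission chain with
# --dry-run=server, so nothing is persisted either way.
psa_check() {
  if kubectl apply --dry-run=server -n "$1" -f "$2"; then
    echo "result: admitted (nothing was created)"
  else
    echo "result: rejected"
    return 1
  fi
}
```

For example, psa_check payments pod.yaml, where pod.yaml stands for whichever manifest you want to vet.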
End-to-end: a pod create under restricted PSA
The full path a pod takes through Pod Security Admission and the kubelet checks.
End-to-end example: enforce restricted and prove a pod is rejected
A complete walkthrough: label a namespace enforce=restricted, apply a fully
compliant Pod that runs, then apply a non-compliant Pod and read the exact
PodSecurity violation list.
Step 1 — create and label the namespace at the restricted level.
# ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: secure-apps
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
kubectl apply -f ns.yaml
kubectl get ns secure-apps --show-labels
# NAME STATUS AGE LABELS
# secure-apps Active 3s pod-security.kubernetes.io/enforce=restricted,...
Step 2 — apply a compliant Pod (non-root, drop ALL caps, seccomp).
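# compliant.yaml
apiVersion: v1
kind: Pod
metadata:
  name: compliant
  namespace: secure-apps
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: nginxinc/nginx-unprivileged:1.27
    ports:
    - containerPort: 8080
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    volumeMounts:
    - name: tmp            # nginx-unprivileged writes its pid and temp files under /tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}           # writable scratch space; emptyDir is allowed under restricted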
kubectl apply -f compliant.yaml
# pod/compliant created
kubectl get pod compliant -n secure-apps
# NAME READY STATUS RESTARTS AGE
# compliant 1/1 Running 0 8s
kubectl exec -n secure-apps compliant -- id
# uid=1000 gid=0 ... (not root)
kubectl exec -n secure-apps compliant -- grep CapEff /proc/1/status
# CapEff: 0000000000000000 (no capabilities)
Step 3 — apply a non-compliant Pod; PSA rejects it at admission.
# bad.yaml
apiVersion: v1
kind: Pod
metadata:
  name: bad
  namespace: secure-apps
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      privileged: true
kubectl apply -f bad.yaml
# Error from server (Forbidden): error when creating "bad.yaml": pods "bad" is forbidden:
# violates PodSecurity "restricted:latest":
# privileged (container "app" must not set securityContext.privileged=true),
# allowPrivilegeEscalation != false,
# unrestricted capabilities (must drop "ALL"),
# runAsNonRoot != true,
# seccompProfile (pod or containers must set securityContext.seccompProfile.type
# to "RuntimeDefault" or "Localhost")
kubectl get pod bad -n secure-apps
# Error from server (NotFound): pods "bad" not found
Step 4 — confirm warn mode also surfaces issues at apply time.
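# a near-miss (only seccomp missing) is still rejected by enforce=restricted
kubectl run almost --image=nginxinc/nginx-unprivileged:1.27 -n secure-apps \
  --overrides='{"spec":{"securityContext":{"runAsNonRoot":true,"runAsUser":1000},"containers":[{"name":"almost","image":"nginxinc/nginx-unprivileged:1.27","securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]}}}]}}'
# Error from server (Forbidden): ... violates PodSecurity "restricted:latest": seccompProfile ...
# PSA adds no warning to a request it is already rejecting. The warn label matters
# when it is stricter than enforce: with enforce=baseline and warn=restricted (the
# state after Step 5) the same command is admitted and prints
# Warning: would violate PodSecurity "restricted:latest": seccompProfile ...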
Step 5 — relax the level only if a workload genuinely needs it.
kubectl label ns secure-apps \
pod-security.kubernetes.io/enforce=baseline --overwrite
# namespace/secure-apps labeled (baseline still blocks privileged/hostNetwork)
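With enforce relaxed to baseline and warn still at restricted, pods that fall short of restricted are admitted but flagged, and those warnings are easy to miss among other kubectl output. A small sketch that keeps only the PodSecurity warning lines from any command; psa_warnings is a hypothetical helper, and the pattern assumes the "Warning: would violate PodSecurity" prefix that kubectl prints for PSA warnings.

```shell
# psa_warnings COMMAND [ARGS...]
# Runs COMMAND, merges stderr into stdout, and prints only the PodSecurity
# warning lines. Exit status is grep's: 0 if a warning was seen, 1 if not.
psa_warnings() {
  "$@" 2>&1 | grep '^Warning: would violate PodSecurity'
}
```

Usage: psa_warnings kubectl apply -f almost.yaml, where almost.yaml is an assumed manifest for a pod that meets baseline but not restricted. Note that the command itself still runs, so an admitted pod is really created.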
Key takeaways
- PSS has three levels — privileged / baseline / restricted; aim for restricted.
- Pod Security Admission enforces a level per namespace via labels and enforce/audit/warn modes.
- PodSecurityPolicy is gone (1.25); PSA + securityContext replace it.
- Harden pods: runAsNonRoot, allowPrivilegeEscalation: false, readOnlyRootFilesystem, seccompProfile: RuntimeDefault.
- Drop ALL capabilities and add back only the few the app needs (least privilege).
Checklist
- [ ] Named the three PSS levels and what restricted requires
- [ ] Labeled a namespace with pod-security.kubernetes.io/enforce
- [ ] Wrote a pod that drops ALL caps and runs as non-root
- [ ] Saw a privileged pod rejected in a restricted namespace
- [ ] Verified the effective UID/caps with kubectl exec