Multi-Master Cluster Setup with Load Balancer

Video: Day 55 — High-Availability Control Plane with kubeadm • Theme: survive a control-plane node loss with multiple masters behind an LB.

Published 22 Jun 2026

Key terms

Term	Meaning
Control-plane node	Runs api-server, scheduler, controller-manager, etcd
HA control plane	Multiple control-plane nodes; use an odd count (usually 3) so etcd can lose a member and keep quorum
Stacked etcd	etcd runs on each control-plane node
External etcd	etcd on its own dedicated nodes
Load balancer	Single VIP fronting all api-servers (HAProxy + keepalived)
`--control-plane-endpoint`	Stable DNS/VIP all nodes talk to
Quorum	Majority of etcd members needed to commit writes

Problem & solution

A single control-plane node is a single point of failure — lose it and you lose the api-server, scheduler, controller-manager, and (with stacked etcd) the cluster's brain. Workloads keep running, but you cannot schedule, scale, or recover.

Solution: Run an odd number of control-plane nodes (typically 3) behind a load balancer VIP. The api-server is stateless and scales horizontally; etcd forms a quorum-based cluster so it tolerates losing a minority of members.

How this guide differs from the source video. The walkthrough video builds 2 masters + 2 workers behind a single dedicated HAProxy node on AWS, and calls out that one LB node is itself a single point of failure. This guide keeps that base but hardens it: 3 masters (so etcd can lose one member and keep quorum) and keepalived floating the VIP between LB hosts so the load balancer isn't a SPOF. Everything else (kubeadm flags, join flow, CNI) is the same. With only two masters, etcd has no fault tolerance (quorum 2, can lose 0), which is fine for a lab but not for production.

Node prerequisites (every node). Each master and worker needs the same base prep from Day 27 before kubeadm init/join: disable swap, load the overlay/br_netfilter modules and sysctl bridge-nf/ip-forward settings, install a container runtime (containerd + runc), the CNI plugins, and kubeadm/kubelet/kubectl. The kubelet stays inactive (dead) until the node is initialized or joined — that is expected.

The analogy

If a port had only one harbor-master office and it burned down, every incoming ship would be stranded: no berth assignments, no records, no decisions. So the port runs several identical offices and puts a single front-desk phone number in front of them, so callers always reach a working office even when one is gone. The offices keep one shared master ledger in sync and commit a change only when a majority agree. In Kubernetes each office is a control-plane node, the shared phone number is the load-balancer VIP over the api-servers, and the synchronized ledger is the etcd quorum.

Graph legend — each Kubernetes node maps a harbor-office concept to the HA control plane:

Graph node	Maps to	What it does
kubectl or kubelet	API clients	Always talk to the VIP, never a single master
load balancer VIP	HAProxy + keepalived VIP on :6443	One stable endpoint fronting all api-servers
control-plane nodes	the 3 control-plane nodes	Stateless api-servers serving requests in parallel
etcd quorum	the etcd cluster	Commits writes only when a majority agrees

Where this fits in the cluster

The same cluster entities appear in every day's notes; the diagram below shows where this day's topic fits.

Graph legend — this day replicates the control plane across three nodes and fronts them with a VIP:

Graph node	Maps to	What it does
cp1 / cp2 / cp3	the 3 control-plane nodes	Each runs its own api-server/etcd/scheduler/controller-manager
load balancer VIP :6443	HAProxy + keepalived endpoint	Routes node traffic to a healthy api-server
every kubelet talks to the LB VIP	worker `kubelet` config	Points at the VIP so any master can serve it

Topology: stacked vs external etcd

Two supported HA topologies. Stacked is simpler and the kubeadm default; external isolates etcd failures from the api-server at the cost of more nodes.

Graph legend — the two supported kubeadm HA topologies:

Graph node	Maps to	What it does
cp1/cp2/cp3 api and etcd	stacked topology	Each control-plane node also runs an etcd member (kubeadm default)
cp1/cp2/cp3 api	external topology	Control-plane nodes run only the api-server stack
etcd1/etcd2/etcd3	dedicated etcd nodes	etcd runs on its own hosts, isolating its failures

Quorum: why an odd number

etcd commits a write only when a majority of members agree. Quorum = floor(N/2)+1.

Lose quorum (e.g. 2 of 3 down) and etcd stops committing writes; the cluster is read-only at best and cannot accept changes until a majority returns.

   members | quorum | can lose
   --------+--------+---------
      1    |   1    |    0      (no HA)
      3    |   2    |    1
      5    |   3    |    2
   even counts add cost without more fault tolerance -> always use 3 or 5
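
The arithmetic behind the table can be checked with a few lines of shell. This is a minimal sketch: the function names quorum and can_lose are illustrative helpers, not part of etcd or kubeadm.

```shell
# Quorum math for an etcd cluster of N members (illustrative helper names).

quorum() {            # majority needed to commit a write: floor(N/2) + 1
  echo $(( $1 / 2 + 1 ))
}

can_lose() {          # members that may fail while quorum still holds
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  printf '%d members: quorum %d, can lose %d\n' \
    "$n" "$(quorum "$n")" "$(can_lose "$n")"
done
```

The loop also covers the even counts the table leaves out: 2 members tolerate no failures, and 4 tolerate the same single failure as 3.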

The load balancer (HAProxy + keepalived)

All nodes must reach the api-servers through one stable endpoint. HAProxy load-balances TCP 6443 across the masters; keepalived floats a virtual IP so the LB itself is not a single point of failure.

Graph legend — each node is a real piece of the HAProxy + keepalived front end:

Graph node	Maps to	What it does
kubectl, kubelets, join	API clients	All reach the control plane via the VIP
VIP 192.168.1.100 6443, keepalived owns it	keepalived `virtual_ipaddress`	Floating IP that fails over between LB hosts
HAProxy TCP 6443 round-robins	HAProxy `backend k8s-cp`	Load-balances 6443 across the masters with health checks
cp1/cp2/cp3 6443	each api-server	The backend servers HAProxy forwards to

# /etc/haproxy/haproxy.cfg
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-cp
backend k8s-cp
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 192.168.1.101:6443 check
    server cp2 192.168.1.102:6443 check
    server cp3 192.168.1.103:6443 check

sudo systemctl enable --now haproxy keepalived
nc -vz 192.168.1.100 6443        # the VIP answers
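
The server lines in the backend are regular enough to generate from a list of master IPs, which can help when a master is added or replaced. A minimal sketch: the function name backend_servers and the cpN naming are assumptions for illustration, not an HAProxy feature.

```shell
# Print one HAProxy "server" line per control-plane IP given as an argument.
# Names are numbered cp1, cp2, ... in argument order (illustrative convention).
backend_servers() {
  i=1
  for ip in "$@"; do
    printf '    server cp%d %s:6443 check\n' "$i" "$ip"
    i=$((i + 1))
  done
}

# The three masters used in this guide
backend_servers 192.168.1.101 192.168.1.102 192.168.1.103
```

After pasting the result into the backend section, haproxy -c -f /etc/haproxy/haproxy.cfg checks the file's syntax before a reload.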

Initialize the first control-plane node

The crucial flag is --control-plane-endpoint: it must be the VIP/DNS, not a node IP, so certificates and the kubeconfig point at the LB. --upload-certs shares the control-plane certs so peers can join without manual copying. --apiserver-advertise-address is the private IP of this first node (the one the LB backend points to); on a single-node Day 27 install you skip it, but in HA you set it explicitly.

sudo kubeadm init \
  --control-plane-endpoint "192.168.1.100:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.1.101   # private IP of THIS node

# kubeadm prints TWO join commands: one for control planes, one for workers
mkdir -p $HOME/.kube && sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f <your-cni>.yaml      # install a CNI before nodes go Ready

CNI choice. The --pod-network-cidr passed to kubeadm init must match the CIDR the CNI is configured with. Flannel uses 10.244.0.0/16; Calico (used in the source walkthrough) defaults to 192.168.0.0/16 and installs via its operator:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/tigera-operator.yaml
kubectl apply  -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/custom-resources.yaml
If the two CIDRs differ, pod networking can break and nodes may stay NotReady. Calico's default 192.168.0.0/16 also overlaps the 192.168.1.x node addresses used in this guide, so with Calico pick a non-overlapping pod CIDR and set the same value in custom-resources.yaml.
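
Whether a pod CIDR collides with the node network can be checked with plain shell arithmetic before running kubeadm init. A minimal sketch: the helper names are illustrative, and the node subnet 192.168.1.0/24 is an assumption taken from the addresses used in this guide.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1                 # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit 0) when two IPv4 CIDR blocks share at least one address.
cidrs_overlap() {
  a_ip=${1%/*}; a_len=${1#*/}
  b_ip=${2%/*}; b_len=${2#*/}
  # Two blocks overlap exactly when they agree on the shorter prefix.
  if [ "$a_len" -lt "$b_len" ]; then len=$a_len; else len=$b_len; fi
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$a_ip") & mask )) -eq $(( $(ip_to_int "$b_ip") & mask )) ]
}

node_net=192.168.1.0/24     # assumed node subnet for this guide
for pod_cidr in 10.244.0.0/16 192.168.0.0/16; do
  if cidrs_overlap "$pod_cidr" "$node_net"; then
    echo "$pod_cidr overlaps $node_net: choose another pod CIDR"
  else
    echo "$pod_cidr does not overlap $node_net"
  fi
done
```

With these inputs the Flannel range is reported as clear and the Calico default as overlapping.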

Join the other control-plane nodes

Use the control-plane join command (note --control-plane and --certificate-key). Run it on cp2 and cp3.

sudo kubeadm join 192.168.1.100:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <cert-key>

# the upload-certs secret expires after 2h; regenerate if needed
sudo kubeadm init phase upload-certs --upload-certs
# regenerate a join token if it expired
sudo kubeadm token create --print-join-command
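
The two regeneration commands produce the two halves of a control-plane join command: the token command prints the worker form, and upload-certs prints a fresh certificate key. A minimal sketch of stitching them together; cp_join_command is a hypothetical helper, and reading the key from the last line of the upload-certs output is an assumption about kubeadm's output format.

```shell
# Append the control-plane flags to a worker join command.
#   $1: output of `kubeadm token create --print-join-command`
#   $2: certificate key printed by `kubeadm init phase upload-certs --upload-certs`
cp_join_command() {
  printf '%s --control-plane --certificate-key %s\n' "$1" "$2"
}

# On cp1 (needs root and a running cluster) the inputs could be captured as:
#   join=$(sudo kubeadm token create --print-join-command)
#   key=$(sudo kubeadm init phase upload-certs --upload-certs | tail -n 1)
#   cp_join_command "$join" "$key"

# Demonstration with placeholder values only
cp_join_command \
  "kubeadm join 192.168.1.100:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>" \
  "<cert-key>"
```

kubeadm token create also accepts --certificate-key together with --print-join-command, which prints the control-plane form directly if your kubeadm version supports it.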

Workers join with the plain command (no --control-plane):

sudo kubeadm join 192.168.1.100:6443 \
  --token <token> --discovery-token-ca-cert-hash sha256:<hash>

Verify the HA cluster and etcd health

After the nodes join, confirm every control plane is Ready and that etcd has a healthy quorum across the masters.

kubectl get nodes -o wide                       # 3 control-plane + workers Ready
kubectl get pods -n kube-system -o wide          # api-server/etcd pod per master

# check etcd quorum from inside an etcd pod
kubectl -n kube-system exec etcd-cp1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list -w table

# simulate failure: stop cp1; cluster keeps working via cp2/cp3 (still quorum)
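
During a failure drill it helps to have a quick yes/no answer on whether a majority of masters is still up. A minimal sketch: control_plane_majority is a hypothetical helper that reads kubectl get nodes --no-headers output, and node readiness is only a proxy for etcd member health (etcdctl remains the authoritative check).

```shell
# Reads `kubectl get nodes --no-headers` output on stdin and reports whether
# a majority of control-plane nodes are Ready. Exit status 0 means majority.
control_plane_majority() {
  awk '
    $3 ~ /control-plane/ {
      total++
      if ($2 ~ /^Ready/) ready++     # matches Ready, not NotReady
    }
    END {
      need = int(total / 2) + 1
      printf "%d/%d control-plane nodes Ready (majority needs %d)\n", ready, total, need
      exit (ready >= need ? 0 : 1)
    }'
}

# Sample input standing in for: kubectl get nodes --no-headers
printf '%s\n' \
  'cp1 NotReady control-plane 5d v1.30.2' \
  'cp2 Ready control-plane 5d v1.30.2' \
  'cp3 Ready control-plane 5d v1.30.2' \
  'worker1 Ready <none> 5d v1.30.2' | control_plane_majority
```

Against a live cluster the call would be kubectl get nodes --no-headers | control_plane_majority.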

End-to-end: a request survives a master failure

The full path of a kubectl request and how the LB rides over a lost master.

Graph legend — each node is a real step a write takes, including a master loss:

Graph node	Maps to	What it does
kubectl applies a manifest to the VIP 6443	client request	Hits the VIP, not a specific master
keepalived VIP on active HAProxy	keepalived	Owns the VIP and fails it over if the LB dies
HAProxy load-balances across the api-servers	HAProxy	Spreads requests over cp1/cp2/cp3
cp1 / cp2 / cp3 api-server	the 3 masters behind the VIP	Whichever the LB picks proposes the write to etcd
etcd majority acknowledges	etcd quorum	Commits only when a majority agrees
etcd read-only until majority returns	quorum loss	Blocks writes if fewer than majority are up
cp1 dies, HAProxy health check drops it	health check	Removes the failed master from rotation
Requests routed to cp2 and cp3	failover	Cluster keeps serving with quorum 2 of 3

End-to-end example: build a 3-master HA cluster behind a VIP

A complete walkthrough: stand up HAProxy + keepalived for a floating VIP, kubeadm init the first master against that endpoint, join two more control planes and two workers, then verify etcd quorum and ride over a master loss.

Step 1 — configure HAProxy + keepalived on the LB hosts (VIP .100).

# /etc/haproxy/haproxy.cfg
frontend k8s-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend k8s-cp
backend k8s-cp
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 192.168.1.101:6443 check
    server cp2 192.168.1.102:6443 check
    server cp3 192.168.1.103:6443 check

# /etc/keepalived/keepalived.conf (MASTER node; BACKUP uses state BACKUP and a lower priority)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    authentication {
        auth_type PASS
        auth_pass k8shapass    # keepalived uses only the first 8 characters
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}

sudo systemctl enable --now haproxy keepalived
ip addr show eth0 | grep 192.168.1.100
#     inet 192.168.1.100/24 scope global secondary eth0
nc -vz 192.168.1.100 6443
# Connection to 192.168.1.100 6443 port [tcp/*] succeeded!
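
Step 1 shows only the MASTER side of keepalived. The second LB host needs the same instance with state BACKUP and a lower priority. A minimal sketch that renders either variant; render_keepalived is a hypothetical helper, priority 100 is an assumed value, and the interface name eth0 is carried over from the config above.

```shell
# Print a keepalived.conf for one LB host.
#   $1 = MASTER or BACKUP, $2 = VRRP priority (highest priority holds the VIP)
render_keepalived() {
  cat <<EOF
vrrp_instance VI_1 {
    state $1
    interface eth0
    virtual_router_id 51
    priority $2
    authentication {
        auth_type PASS
        auth_pass k8shapass
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}
EOF
}

# Config for the standby LB host (priority 100 is an assumption; any value below 101 works)
render_keepalived BACKUP 100
```

On the standby host the output could be written with render_keepalived BACKUP 100 | sudo tee /etc/keepalived/keepalived.conf. The virtual_router_id and auth_pass must match on both hosts.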

Step 2 — init the first control-plane node against the VIP.

sudo kubeadm init \
  --control-plane-endpoint "192.168.1.100:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.1.101   # private IP of cp1
# ...
# You can now join any number of control-plane nodes by running:
#   kubeadm join 192.168.1.100:6443 --token <t> \
#     --discovery-token-ca-cert-hash sha256:<hash> \
#     --control-plane --certificate-key <key>
# Then you can join any number of worker nodes by running:
#   kubeadm join 192.168.1.100:6443 --token <t> \
#     --discovery-token-ca-cert-hash sha256:<hash>

mkdir -p $HOME/.kube && sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

Step 3 — join cp2 and cp3 as control planes.

# run on cp2 and cp3 (the --control-plane variant)
sudo kubeadm join 192.168.1.100:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <cert-key>
# This node has joined the cluster and a new control plane instance was created.

# if the 2h upload-certs window expired, regenerate it on cp1:
sudo kubeadm init phase upload-certs --upload-certs
sudo kubeadm token create --print-join-command

Step 4 — join the worker nodes (plain command).

# run on worker1 and worker2
sudo kubeadm join 192.168.1.100:6443 \
  --token <token> --discovery-token-ca-cert-hash sha256:<hash>

Step 5 — verify nodes, control-plane pods, and etcd quorum.

The commands and sample output for this step are listed at the end of this walkthrough, after Step 7.

Step 6 — prove HA: stop cp1, cluster keeps serving via cp2/cp3.

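# on cp1: stop the kubelet first so it cannot restart the static pods,
# then stop the api-server and etcd containers (leaving etcd up would keep 3 of 3 members)
sudo systemctl stop kubelet
sudo crictl stop $(sudo crictl ps -q --name kube-apiserver) $(sudo crictl ps -q --name etcd)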

# from a workstation: HAProxy health-checks cp1 out; the VIP still answers
kubectl get nodes
# cp1 -> NotReady, all others Ready; cluster accepts changes (etcd quorum 2 of 3)
kubectl create deploy still-works --image=nginx --replicas=2
# deployment.apps/still-works created

Graph legend — each node is a real component of the 3-master cluster built above:

Graph node	Maps to	What it does
kubectl or kubelet	API clients	Connect through the VIP only
VIP 192.168.1.100 6443	keepalived VIP	The single endpoint from `--control-plane-endpoint`
keepalived owns VIP, fails over to BACKUP	keepalived VRRP	Moves the VIP if the active LB fails
HAProxy TCP 6443 round-robin	HAProxy backend	Distributes to cp1/cp2/cp3 api-servers
cp1/cp2/cp3 api-server	the 3 masters	Serve the api in parallel
etcd cp1/cp2/cp3	stacked etcd members	Replicate state and form quorum
HAProxy health check drops cp1	failover	Removes the stopped master from rotation
traffic to cp2 and cp3, etcd quorum 2 of 3 holds	surviving cluster	Keeps serving and accepting writes

Step 7 — access the cluster from your workstation (no SSH). You don't have to SSH into a master to run kubectl. Copy a master's admin kubeconfig to your laptop and point its server: at the load balancer's reachable address — the public IP/DNS of the LB when you're outside the VPC, the private VIP when you're inside it.

# on a master: print the admin kubeconfig
sudo cat /etc/kubernetes/admin.conf       # copy this to your workstation

# on your workstation: save it, then repoint server: at the LB you can reach
#   server: https://<LB-PUBLIC-IP>:6443     # from outside the VPC
#   server: https://192.168.1.100:6443      # from inside the VPC (the VIP)
KUBECONFIG=~/ha-kubeconfig.yaml kubectl get nodes

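The cert is valid for the VIP because --control-plane-endpoint put that address in the API server certificate's SANs. A public LB IP or DNS name is not covered unless it was also passed at init with --apiserver-cert-extra-sans; without it kubectl fails with an x509 certificate error. If you get network is unreachable, you used an address the LB can't be reached on (e.g. the private IP from outside the VPC); switch to the public one and confirm the LB security group allows 6443 from your IP.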
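
Repointing server: in the copied kubeconfig can be scripted instead of edited by hand. A minimal sketch: repoint_kubeconfig is a hypothetical helper that rewrites every server: line in the file, which is fine for a kubeadm admin.conf with a single cluster entry.

```shell
# Rewrite the `server:` line of a kubeconfig copy to point at a new endpoint.
#   $1 = path to the kubeconfig copy, $2 = host:port of the load balancer
repoint_kubeconfig() {
  tmp=$(mktemp) || return 1
  # Keep the original indentation and key, replace only the URL after it.
  sed "s#^\([[:space:]]*server:\).*#\1 https://$2#" "$1" > "$tmp" && mv "$tmp" "$1"
}

# Example (inside the VPC, using the VIP):
#   repoint_kubeconfig ~/ha-kubeconfig.yaml 192.168.1.100:6443
```

kubectl can make the same change with kubectl config set-cluster kubernetes --server=https://192.168.1.100:6443 --kubeconfig ~/ha-kubeconfig.yaml, assuming the cluster entry carries kubeadm's default name kubernetes.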

Verify the LB is actually balancing. On the LB node, watch HAProxy route requests across the masters:

sudo journalctl -u haproxy -f          # shows connections fanned out to cp1/cp2/cp3 (requires logging enabled in haproxy.cfg)

Step 5 commands and sample output (nodes first, then etcd members):

kubectl get nodes -o wide
# NAME      STATUS   ROLES           VERSION
# cp1       Ready    control-plane   v1.30.2
# cp2       Ready    control-plane   v1.30.2
# cp3       Ready    control-plane   v1.30.2
# worker1   Ready    <none>          v1.30.2
# worker2   Ready    <none>          v1.30.2

kubectl -n kube-system exec etcd-cp1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --cluster -w table
# +------------------+---------+---------+-----------+-----------+-----+
# |     ENDPOINT     |   ID    | VERSION | IS LEADER | RAFT TERM | ... |
# +------------------+---------+---------+-----------+-----------+-----+
# | https://...101.. | a1b2... | 3.5.x   | true      | ...       | ... |
# | https://...102.. | c3d4... | 3.5.x   | false     | ...       | ... |
# | https://...103.. | e5f6... | 3.5.x   | false     | ...       | ... |
# +------------------+---------+---------+-----------+-----------+-----+

Key takeaways

A single master is an SPOF; run an odd number (3) for HA.
Stacked etcd (default) co-locates etcd with the api-server; external etcd isolates it.
etcd needs quorum (majority); 3 members tolerate 1 loss, 5 tolerate 2.
A load balancer VIP (HAProxy + keepalived) fronts all api-servers on 6443.
kubeadm init --control-plane-endpoint <VIP> --upload-certs, then join masters with --control-plane --certificate-key.

Checklist

[ ] Explained SPOF and why control-plane count is odd
[ ] Compared stacked vs external etcd
[ ] Computed quorum for 3 and 5 members
[ ] Configured HAProxy/keepalived for a VIP on 6443
[ ] Ran kubeadm init --control-plane-endpoint and joined extra masters