Multi-Master Cluster Setup with Load Balancer
Video: Day 55 — High-Availability Control Plane with kubeadm • Theme: survive a control-plane node loss with multiple masters behind an LB.
Key terms
| Term | Meaning |
|---|---|
| Control-plane node | Runs api-server, scheduler, controller-manager, etcd |
| HA control plane | Several control-plane nodes; use an odd count (usually 3) so etcd keeps quorum after a node loss |
| Stacked etcd | etcd runs on each control-plane node |
| External etcd | etcd on its own dedicated nodes |
| Load balancer | Single VIP fronting all api-servers (HAProxy + keepalived) |
| --control-plane-endpoint | Stable DNS name or VIP that every node uses to reach the api-servers |
| Quorum | Majority of etcd members needed to commit writes |
Problem & solution
A single control-plane node is a single point of failure — lose it and you lose the api-server, scheduler, controller-manager, and (with stacked etcd) the cluster's brain. Workloads keep running, but you cannot schedule, scale, or recover.
Solution: Run an odd number of control-plane nodes (typically 3) behind a load balancer VIP. The api-server is stateless and scales horizontally; etcd forms a quorum-based cluster so it tolerates losing a minority of members.
The analogy
If a port had only one harbor-master office and it burned down, every incoming ship would be stranded: no berth assignments, no records, no decisions. So the port runs several identical offices behind a single front-desk phone number, and callers always reach a working office even when one is gone. The offices keep one shared master ledger in sync and commit a change only when a majority agree. In Kubernetes each office is a control-plane node, the shared phone number is the load-balancer VIP in front of the api-servers, and the synchronized ledger is the etcd quorum.
Where this fits in the cluster
[Diagram: the cluster entities shared across every day's notes, with this day's components marked.]
Topology: stacked vs external etcd
kubeadm supports two HA topologies. Stacked etcd is simpler and is the kubeadm default; external etcd decouples etcd from the control-plane nodes, so losing a control-plane node does not also remove an etcd member, at the cost of more hosts.
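For the external topology, kubeadm is pointed at the etcd cluster through a config file rather than flags. A minimal sketch, assuming kubeadm's v1beta3 config API (the one used by the v1.30 cluster in this walkthrough) and hypothetical etcd hosts at 192.168.1.111-113. The etcd CA and the apiserver-etcd-client certificate and key must already be copied from an etcd node to the paths shown. Stacked etcd needs no etcd section, since kubeadm runs a local member on each control plane.

```shell
#!/usr/bin/env bash
# External-etcd variant of the cluster config (kubeadm v1beta3 API).
# Hypothetical: etcd hosts at 192.168.1.111-113. The cert paths are where the
# etcd CA and the apiserver-etcd-client pair must be copied beforehand.
cat > kubeadm-external-etcd.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "192.168.1.100:6443"
networking:
  podSubnet: "10.244.0.0/16"
etcd:
  external:
    endpoints:
      - https://192.168.1.111:2379
      - https://192.168.1.112:2379
      - https://192.168.1.113:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
EOF
# On the first control plane:
#   sudo kubeadm init --config kubeadm-external-etcd.yaml --upload-certs
```

The control-plane endpoint is still the LB VIP; only the etcd location changes between the two topologies.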
Quorum: why an odd number
etcd commits a write only when a majority of members agree. Quorum = floor(N/2)+1.
Lose quorum (e.g. 2 of 3 down) and etcd can no longer commit writes; the cluster cannot accept changes until a majority returns.
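The quorum formula is plain integer arithmetic. A minimal shell sketch (the helper names `quorum` and `can_lose` are hypothetical) that also prints the even sizes 2 and 4, to show they tolerate no more failures than the odd size below them:

```shell
#!/usr/bin/env bash
# Quorum and fault tolerance for an etcd cluster of N members.
quorum()   { echo $(( $1 / 2 + 1 )); }           # floor(N/2)+1 (integer division floors)
can_lose() { echo $(( $1 - $(quorum "$1") )); }  # members that may fail while keeping quorum

for n in 1 2 3 4 5; do
  printf '%d members: quorum %d, can lose %d\n' "$n" "$(quorum "$n")" "$(can_lose "$n")"
done
```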
members | quorum | can lose
--------+--------+---------
1 | 1 | 0 (no HA)
3 | 2 | 1
5 | 3 | 2
even counts add cost without more fault tolerance -> always use 3 or 5
The load balancer (HAProxy + keepalived)
All nodes must reach the api-servers through one stable endpoint. HAProxy load-balances TCP 6443 across the masters; keepalived floats a virtual IP between two or more LB hosts, so the LB itself is not a single point of failure.
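# /etc/haproxy/haproxy.cfg
# HAProxy warns at startup if these timeouts are missing; the values are illustrative
defaults
    timeout connect 5s
    timeout client  1h
    timeout server  1h
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-cp
backend k8s-cp
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 192.168.1.101:6443 check
    server cp2 192.168.1.102:6443 check
    server cp3 192.168.1.103:6443 check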
sudo systemctl enable --now haproxy keepalived
nc -vz 192.168.1.100 6443 # the VIP answers
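With more than a handful of masters, the backend list is easier to generate than to hand-edit. A minimal sketch; `gen_backend` is a hypothetical helper that assumes masters are named cp1..cpN in the order given and all serve on 6443:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HAProxy backend stanza for the given
# control-plane IPs, naming them cp1..cpN in order (all on port 6443).
gen_backend() {
  local i=1 ip
  printf 'backend k8s-cp\n    mode tcp\n    balance roundrobin\n    option tcp-check\n'
  for ip in "$@"; do
    printf '    server cp%d %s:6443 check\n' "$i" "$ip"
    i=$(( i + 1 ))
  done
}

# Reproduces the backend section of the config above.
gen_backend 192.168.1.101 192.168.1.102 192.168.1.103
```

After assembling the file, `haproxy -c -f /etc/haproxy/haproxy.cfg` checks it before HAProxy is reloaded.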
Initialize the first control-plane node
The crucial flag is --control-plane-endpoint: it must be the VIP/DNS, not a
node IP, so certificates and the kubeconfig point at the LB. --upload-certs
shares the control-plane certs so peers can join without manual copying.
sudo kubeadm init \
--control-plane-endpoint "192.168.1.100:6443" \
--upload-certs \
--pod-network-cidr=10.244.0.0/16
# kubeadm prints TWO join commands: one for control planes, one for workers
mkdir -p $HOME/.kube && sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f <your-cni>.yaml # install a CNI before nodes go Ready
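The same init can be driven from a kubeadm config file, which is easier to keep in version control. A minimal sketch, assuming the v1beta3 config API used by kubeadm v1.30; the file name is arbitrary. `--upload-certs` stays on the command line next to `--config`.

```shell
#!/usr/bin/env bash
# The init flags above expressed as a kubeadm config file (v1beta3 API).
# Stacked etcd is the default, so there is no etcd section.
cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# the VIP, never a node IP, so certs and kubeconfigs point at the LB
controlPlaneEndpoint: "192.168.1.100:6443"
networking:
  # same value as --pod-network-cidr
  podSubnet: "10.244.0.0/16"
EOF
# On the first control plane:
#   sudo kubeadm init --config kubeadm-config.yaml --upload-certs
```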
Join the other control-plane nodes
Use the control-plane join command (note --control-plane and
--certificate-key). Run it on cp2 and cp3.
sudo kubeadm join 192.168.1.100:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane \
--certificate-key <cert-key>
# the upload-certs secret expires after 2h; regenerate if needed
sudo kubeadm init phase upload-certs --upload-certs
# regenerate a join token if it expired
sudo kubeadm token create --print-join-command
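When the join command has to be rebuilt, the two regeneration outputs above can be stitched together. A minimal sketch; `cp_join_command` is a hypothetical helper and assumes the certificate key is the last line printed by `kubeadm init phase upload-certs --upload-certs`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the control-plane join command from
#   $1 = output of `kubeadm init phase upload-certs --upload-certs`
#   $2 = output of `kubeadm token create --print-join-command`
# Assumes the certificate key is printed alone on the last line of $1.
cp_join_command() {
  local upload_out=$1 worker_join=$2 cert_key
  cert_key=$(printf '%s\n' "$upload_out" | tail -n 1)
  printf 'sudo %s --control-plane --certificate-key %s\n' "$worker_join" "$cert_key"
}

# On cp1 (needs a running cluster):
#   cp_join_command "$(sudo kubeadm init phase upload-certs --upload-certs)" \
#                   "$(sudo kubeadm token create --print-join-command)"
```

kubeadm can also do this in one step: `kubeadm token create --print-join-command --certificate-key <cert-key>` prints the control-plane variant directly.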
Workers join with the plain command (no --control-plane):
sudo kubeadm join 192.168.1.100:6443 \
--token <token> --discovery-token-ca-cert-hash sha256:<hash>
Verify the HA cluster and etcd health
After the nodes join, confirm every control plane is Ready and that etcd has a healthy quorum across the masters.
kubectl get nodes -o wide # 3 control-plane + workers Ready
kubectl get pods -n kube-system -o wide # api-server/etcd pod per master
# check etcd quorum from inside an etcd pod
kubectl -n kube-system exec etcd-cp1 -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list -w table
# simulate failure: stop cp1; cluster keeps working via cp2/cp3 (still quorum)
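For scripted checks, the `kubectl get nodes` output can be reduced to a pass/fail. A minimal sketch; the helpers are hypothetical and assume the default columns NAME STATUS ROLES AGE VERSION. A Ready node is only a proxy for a healthy etcd member, so the etcdctl check above remains the authoritative one.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; stdin is `kubectl get nodes --no-headers`
# (columns NAME STATUS ROLES AGE VERSION).

# Count control-plane nodes whose STATUS is exactly "Ready".
ready_control_planes() {
  awk '$2 == "Ready" && $3 ~ /control-plane/ { n++ } END { print n + 0 }'
}

# Succeed only if a majority, floor(N/2)+1, of N expected control planes is Ready.
check_cp_majority() {
  local expected=$1 ready
  ready=$(ready_control_planes)
  [ "$ready" -ge $(( expected / 2 + 1 )) ]
}

# Usage:
#   kubectl get nodes --no-headers | check_cp_majority 3 && echo "control-plane majority Ready"
```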
End-to-end: a request survives a master failure
[Diagram: the full path of a kubectl request, and how the LB rides over a lost master.]
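How the LB rides over a lost master can be modeled in a few lines. This is a toy bash model, not HAProxy's implementation: a round-robin scan over cp1..cp3 that skips any backend whose health check failed, which is the behavior `balance roundrobin` plus `check` provides. The `healthy` array is a hypothetical stand-in for HAProxy's health checks.

```shell
#!/usr/bin/env bash
# Toy model of `balance roundrobin` + `check`; not HAProxy code.
backends=(cp1 cp2 cp3)
healthy=(1 1 1)   # 1 = passing its TCP health check on :6443
next=0            # index where the next round-robin scan starts

# Print the backend that receives the next request, skipping unhealthy ones.
# Call it directly (not inside $(...)) so the updated `next` survives.
pick_backend() {
  local i idx n=${#backends[@]}
  for (( i = 0; i < n; i++ )); do
    idx=$(( (next + i) % n ))
    if [ "${healthy[idx]}" -eq 1 ]; then
      next=$(( (idx + 1) % n ))
      echo "${backends[idx]}"
      return 0
    fi
  done
  return 1   # every api-server is down: the VIP has nowhere to send traffic
}

for _ in 1 2 3 4; do pick_backend; done   # cp1 cp2 cp3 cp1
healthy[0]=0                              # cp1 fails its health check
for _ in 1 2 3 4; do pick_backend; done   # cp2 cp3 cp2 cp3
```

Connections already open to cp1 when it dies are dropped; clients that reconnect through the VIP are sent to cp2 or cp3.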
End-to-end example: build a 3-master HA cluster behind a VIP
A complete walkthrough: stand up HAProxy + keepalived for a floating VIP,
kubeadm init the first master against that endpoint, join two more control
planes and two workers, then verify etcd quorum and ride over a master loss.
Step 1 — configure HAProxy + keepalived on the LB hosts (VIP .100).
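# /etc/haproxy/haproxy.cfg
# HAProxy warns at startup if these timeouts are missing; the values are illustrative
defaults
    timeout connect 5s
    timeout client  1h
    timeout server  1h
frontend k8s-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend k8s-cp
backend k8s-cp
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 192.168.1.101:6443 check
    server cp2 192.168.1.102:6443 check
    server cp3 192.168.1.103:6443 check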
# /etc/keepalived/keepalived.conf (MASTER node; BACKUP uses lower priority)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    authentication {
        auth_type PASS
        # keepalived only uses the first 8 characters of auth_pass
        auth_pass k8shapass
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}
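The second LB host runs the same keepalived config with `state BACKUP` and a lower priority. A minimal sketch that derives it with `sed`; `make_backup_conf` is hypothetical and assumes the MASTER file uses exactly `state MASTER` and `priority 101` as above. If the backup host's NIC is not `eth0`, adjust `interface` as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a BACKUP keepalived.conf derived from the MASTER one.
# Only the state and priority change; router id, VIP and password must match.
make_backup_conf() {
  sed -e 's/^\([[:space:]]*\)state MASTER/\1state BACKUP/' \
      -e 's/^\([[:space:]]*\)priority 101/\1priority 100/' "$1"
}

# On the second LB host (keepalived-master.conf is a copy of the MASTER file):
#   make_backup_conf keepalived-master.conf | sudo tee /etc/keepalived/keepalived.conf
```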
sudo systemctl enable --now haproxy keepalived
ip addr show eth0 | grep 192.168.1.100
# inet 192.168.1.100/24 scope global secondary eth0
nc -vz 192.168.1.100 6443
# Connection to 192.168.1.100 6443 port [tcp/*] succeeded!
Step 2 — init the first control-plane node against the VIP.
sudo kubeadm init \
--control-plane-endpoint "192.168.1.100:6443" \
--upload-certs \
--pod-network-cidr=10.244.0.0/16
# ...
# You can now join any number of control-plane nodes by running:
# kubeadm join 192.168.1.100:6443 --token <t> \
# --discovery-token-ca-cert-hash sha256:<hash> \
# --control-plane --certificate-key <key>
# Then you can join any number of worker nodes by running:
# kubeadm join 192.168.1.100:6443 --token <t> \
# --discovery-token-ca-cert-hash sha256:<hash>
mkdir -p $HOME/.kube && sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
Step 3 — join cp2 and cp3 as control planes.
# run on cp2 and cp3 (the --control-plane variant)
sudo kubeadm join 192.168.1.100:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane \
--certificate-key <cert-key>
# This node has joined the cluster and a new control plane instance was created.
# if the 2h upload-certs window expired, regenerate it on cp1:
sudo kubeadm init phase upload-certs --upload-certs
sudo kubeadm token create --print-join-command
Step 4 — join the worker nodes (plain command).
# run on worker1 and worker2
sudo kubeadm join 192.168.1.100:6443 \
--token <token> --discovery-token-ca-cert-hash sha256:<hash>
Step 5 — verify nodes, control-plane pods, and etcd quorum.
kubectl get nodes -o wide
# NAME      STATUS   ROLES           VERSION   (other -o wide columns omitted)
# cp1       Ready    control-plane   v1.30.2
# cp2       Ready    control-plane   v1.30.2
# cp3       Ready    control-plane   v1.30.2
# worker1   Ready    <none>          v1.30.2
# worker2   Ready    <none>          v1.30.2
kubectl -n kube-system exec etcd-cp1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --cluster -w table
# +------------------+------------------+---------+-----------+-----------+-------+
# | ENDPOINT         | ID               | VERSION | IS LEADER | RAFT TERM | ...   |
# | https://...101.. | a1b2...          | 3.5.x   | true      | ...       | ...   |
# | https://...102.. | c3d4...          | 3.5.x   | false     | ...       | ...   |
# | https://...103.. | e5f6...          | 3.5.x   | false     | ...       | ...   |
Step 6 — prove HA: stop cp1, cluster keeps serving via cp2/cp3.
# on cp1: stop the kubelet first so it cannot restart the static pods,
# then stop the local api-server and etcd member
sudo systemctl stop kubelet
sudo crictl stop $(sudo crictl ps -q --name kube-apiserver) $(sudo crictl ps -q --name etcd)
# from a workstation: HAProxy health-checks cp1 out; the VIP still answers
kubectl get nodes
# cp1 -> NotReady, all others Ready; cluster accepts changes (etcd quorum 2 of 3)
kubectl create deploy still-works --image=nginx --replicas=2
# deployment.apps/still-works created
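A healthy Raft cluster reports exactly one leader. A minimal sketch that checks this from the `endpoint status --cluster -w table` output; `count_etcd_leaders` is hypothetical and locates the IS LEADER column by its header, so extra columns such as DB SIZE or IS LEARNER do not shift it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stdin is `etcdctl endpoint status --cluster -w table`.
# Finds the "IS LEADER" column by its header, then counts rows reporting true.
count_etcd_leaders() {
  awk -F'|' '
    /IS LEADER/ {
      for (i = 1; i <= NF; i++) {
        h = $i; gsub(/^ +| +$/, "", h)
        if (h == "IS LEADER") col = i
      }
      next
    }
    col && NF > col {
      v = $col; gsub(/^ +| +$/, "", v)
      if (v == "true") n++
    }
    END { print n + 0 }'
}
```

Pipe the etcdctl `endpoint status` command through it; 1 is the expected result, while 0 means no member currently sees itself as leader (for example, quorum is lost or an election is in progress).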
Key takeaways
- A single master is an SPOF; run an odd number (3) for HA.
- Stacked etcd (default) co-locates etcd with the api-server; external etcd isolates it.
- etcd needs quorum (majority); 3 members tolerate 1 loss, 5 tolerate 2.
- A load balancer VIP (HAProxy + keepalived) fronts all api-servers on 6443.
- kubeadm init --control-plane-endpoint <VIP> --upload-certs, then join masters with --control-plane --certificate-key.
Checklist
- [ ] Explained SPOF and why control-plane count is odd
- [ ] Compared stacked vs external etcd
- [ ] Computed quorum for 3 and 5 members
- [ ] Configured HAProxy/keepalived for a VIP on 6443
- [ ] Ran kubeadm init --control-plane-endpoint and joined extra masters