Multi-Master Cluster Setup with Load Balancer

Video: Day 55 — High-Availability Control Plane with kubeadm • Theme: survive a control-plane node loss with multiple masters behind an LB.

Key terms

   Term                     | Meaning
   -------------------------+-----------------------------------------------------------
   Control-plane node       | Runs api-server, scheduler, controller-manager, etcd
   HA control plane         | 2+ control-plane nodes (use an odd count, usually 3)
   Stacked etcd             | etcd runs on each control-plane node
   External etcd            | etcd on its own dedicated nodes
   Load balancer            | Single VIP fronting all api-servers (HAProxy + keepalived)
   --control-plane-endpoint | Stable DNS/VIP all nodes talk to
   Quorum                   | Majority of etcd members needed to commit writes

Problem & solution

A single control-plane node is a single point of failure — lose it and you lose the api-server, scheduler, controller-manager, and (with stacked etcd) the cluster's brain. Workloads keep running, but you cannot schedule, scale, or recover.

Solution: Run an odd number of control-plane nodes (typically 3) behind a load balancer VIP. The api-server is stateless and scales horizontally; etcd forms a quorum-based cluster so it tolerates losing a minority of members.

The analogy

If a port had only one harbor-master office and it burned down, every incoming ship would be stranded: no berth assignments, no records, no decisions. So the port runs several identical offices and puts a single front-desk phone number in front of them, so callers always reach a working office even when one is gone. The offices keep one shared master ledger in sync and commit a change only when a majority agree. In Kubernetes each office is a control-plane node, the shared phone number is the load-balancer VIP over the api-servers, and the synchronized ledger is the etcd quorum.

Where this fits in the cluster

The same cluster entities appear in every day's notes; the <== marks what this day touches.

Topology: stacked vs external etcd

kubeadm supports two HA topologies. Stacked is simpler and is the kubeadm default; external isolates etcd failures from the api-server at the cost of more nodes.

Quorum: why an odd number

etcd commits a write only when a majority of members agree. Quorum = floor(N/2)+1.

Lose quorum (e.g. 2 of 3 down) and etcd can no longer commit writes; the cluster cannot accept changes until a majority returns.

   members | quorum | can lose
   --------+--------+---------
      1    |   1    |    0      (no HA)
      3    |   2    |    1
      5    |   3    |    2
   Even counts add cost without more fault tolerance (4 members still tolerate only 1 loss), so use 3 or 5.
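The quorum arithmetic can be checked with a few lines of shell. This is a minimal sketch: the function names quorum and tolerated are hypothetical helpers for illustration, not part of etcd or kubeadm.

```shell
# quorum N: majority needed to commit a write, floor(N/2)+1
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated N: members that can fail while a majority still survives
tolerated() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# print the table for 1 to 5 members, including the even counts
for n in 1 2 3 4 5; do
    printf 'members=%d quorum=%d can_lose=%d\n' "$n" "$(quorum "$n")" "$(tolerated "$n")"
done
```

The rows for 2 and 4 members show the same tolerance as 1 and 3, which is the reason for the odd-count rule.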

The load balancer (HAProxy + keepalived)

All nodes must reach the api-servers through one stable endpoint. HAProxy load-balances TCP 6443 across the masters; keepalived floats a virtual IP so the LB itself is not a single point of failure.

# /etc/haproxy/haproxy.cfg
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-cp
backend k8s-cp
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 192.168.1.101:6443 check
    server cp2 192.168.1.102:6443 check
    server cp3 192.168.1.103:6443 check

# on each LB host
sudo systemctl enable --now haproxy keepalived
nc -vz 192.168.1.100 6443        # the VIP answers
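When the list of masters changes, the backend stanza has to change with it. The sketch below generates the same stanza from a list of name=IP pairs; render_backend is a hypothetical helper, and port 6443 and the option lines are copied from the config above.

```shell
# render_backend NAME=IP ...: print an HAProxy backend stanza for the given control-plane nodes
render_backend() {
    printf 'backend k8s-cp\n'
    printf '    mode tcp\n'
    printf '    balance roundrobin\n'
    printf '    option tcp-check\n'
    for node in "$@"; do
        name=${node%%=*}    # text before the first '='
        ip=${node#*=}       # text after the first '='
        printf '    server %s %s:6443 check\n' "$name" "$ip"
    done
}

render_backend cp1=192.168.1.101 cp2=192.168.1.102 cp3=192.168.1.103
```

The output is meant to be pasted or redirected into haproxy.cfg by hand; the sketch does not reload HAProxy.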

Initialize the first control-plane node

The crucial flag is --control-plane-endpoint: it must be the VIP/DNS, not a node IP, so certificates and the kubeconfig point at the LB. --upload-certs shares the control-plane certs so peers can join without manual copying.

sudo kubeadm init \
  --control-plane-endpoint "192.168.1.100:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16

# kubeadm prints TWO join commands: one for control planes, one for workers
mkdir -p $HOME/.kube && sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f <your-cni>.yaml      # install a CNI before nodes go Ready
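The same settings can also be kept in a kubeadm configuration file, which is easier to version than a long flag list. The sketch below only writes the file. It assumes the kubeadm.k8s.io/v1beta3 API (accepted by the v1.30 release shown in these notes; newer releases may prefer v1beta4), and the output path is an arbitrary choice.

```shell
# Write a ClusterConfiguration equivalent to --control-plane-endpoint and --pod-network-cidr.
# Assumption: the path below is illustrative; any readable location works.
cfg=${KUBEADM_CFG:-${TMPDIR:-/tmp}/kubeadm-config.yaml}
cat > "$cfg" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "192.168.1.100:6443"
networking:
  podSubnet: "10.244.0.0/16"
EOF
echo "wrote $cfg"

# then, on the first control-plane node:
#   sudo kubeadm init --config "$cfg" --upload-certs
```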

Join the other control-plane nodes

Use the control-plane join command (note --control-plane and --certificate-key). Run it on cp2 and cp3.

sudo kubeadm join 192.168.1.100:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <cert-key>
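
# the upload-certs secret expires after 2h; if needed, regenerate the certificate key on an existing control-plane node (cp1)
sudo kubeadm init phase upload-certs --upload-certs
# join tokens expire too (24h by default); create a fresh one on cp1 and print the worker join command
sudo kubeadm token create --print-join-command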
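The two regenerated pieces combine into a fresh control-plane join command: the printed worker join command plus the new certificate key. A minimal sketch of that assembly follows; control_plane_join is a hypothetical helper, and the values passed to it are the same placeholders used above.

```shell
# control_plane_join WORKER_JOIN_CMD CERT_KEY:
# append the control-plane flags to a worker join command
control_plane_join() {
    printf '%s --control-plane --certificate-key %s\n' "$1" "$2"
}

# placeholders; on cp1 the real values come from
#   sudo kubeadm token create --print-join-command
#   sudo kubeadm init phase upload-certs --upload-certs
worker_join='kubeadm join 192.168.1.100:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>'
control_plane_join "$worker_join" '<cert-key>'
```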

Workers join with the plain command (no --control-plane):

sudo kubeadm join 192.168.1.100:6443 \
  --token <token> --discovery-token-ca-cert-hash sha256:<hash>

Verify the HA cluster and etcd health

After the nodes join, confirm every control plane is Ready and that etcd has a healthy quorum across the masters.

kubectl get nodes -o wide                       # 3 control-plane + workers Ready
kubectl get pods -n kube-system -o wide          # api-server/etcd pod per master

# check etcd quorum from inside an etcd pod
kubectl -n kube-system exec etcd-cp1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list -w table

# simulate failure: stop cp1; cluster keeps working via cp2/cp3 (still quorum)
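For scripted checks, the node listing can be reduced to a single number. The sketch below counts Ready control-plane nodes from `kubectl get nodes --no-headers` output; ready_control_planes is a hypothetical helper, and the sample rows are illustrative stand-ins for a live cluster, not captured output.

```shell
# ready_control_planes: read `kubectl get nodes --no-headers` output on stdin
# and print how many control-plane nodes have STATUS exactly "Ready"
ready_control_planes() {
    awk '$2 == "Ready" && $3 ~ /control-plane/ { n++ } END { print n + 0 }'
}

# illustrative sample input (columns: NAME STATUS ROLES AGE VERSION)
ready_control_planes <<'EOF'
cp1       NotReady   control-plane   12d   v1.30.2
cp2       Ready      control-plane   12d   v1.30.2
cp3       Ready      control-plane   12d   v1.30.2
worker1   Ready      <none>          12d   v1.30.2
EOF

# against a live cluster:
#   kubectl get nodes --no-headers | ready_control_planes
```

A cordoned node reports a status such as Ready,SchedulingDisabled and is not counted by this exact match, which is a deliberate simplification.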

End-to-end: a request survives a master failure

The full path of a kubectl request and how the LB rides over a lost master.
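How the LB rides over a lost master can be modelled in a few lines. The sketch below is a toy model of round-robin selection that skips backends marked down; it is not HAProxy's implementation, and pick_backend and the name:state argument format are assumptions made for illustration.

```shell
# pick_backend START NAME:STATE ...: starting at index START (0-based),
# return the first backend whose STATE is "up", wrapping around the list
pick_backend() {
    start=$1; shift
    total=$#
    [ "$total" -gt 0 ] || return 1
    offset=0
    while [ "$offset" -lt "$total" ]; do
        want=$(( (start + offset) % total ))
        pos=0
        for entry in "$@"; do
            if [ "$pos" -eq "$want" ]; then
                case $entry in
                    *:up) echo "${entry%%:*}"; return 0 ;;
                esac
            fi
            pos=$(( pos + 1 ))
        done
        offset=$(( offset + 1 ))
    done
    return 1    # no healthy backend left
}

pick_backend 0 cp1:down cp2:up cp3:up     # cp1 is skipped, request goes to cp2
pick_backend 2 cp1:down cp2:up cp3:up     # next turn lands on cp3
pick_backend 0 cp1:down cp2:down cp3:down || echo "no healthy api-server"
```

In the real setup the up/down state comes from HAProxy's health checks on port 6443, so a client talking to the VIP never needs to know which master answered.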

End-to-end example: build a 3-master HA cluster behind a VIP

A complete walkthrough: stand up HAProxy + keepalived for a floating VIP, kubeadm init the first master against that endpoint, join two more control planes and two workers, then verify etcd quorum and ride over a master loss.

Step 1 — configure HAProxy + keepalived on the LB hosts (VIP .100).

# /etc/haproxy/haproxy.cfg
frontend k8s-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend k8s-cp
backend k8s-cp
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 192.168.1.101:6443 check
    server cp2 192.168.1.102:6443 check
    server cp3 192.168.1.103:6443 check

# /etc/keepalived/keepalived.conf (MASTER node; BACKUP uses lower priority)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    authentication {
        auth_type PASS
        # keepalived uses only the first 8 characters of auth_pass
        auth_pass k8shapass
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}

# on each LB host: start both services
sudo systemctl enable --now haproxy keepalived
# on the host currently holding the VIP: confirm it is bound and answering
ip addr show eth0 | grep 192.168.1.100
#     inet 192.168.1.100/24 scope global secondary eth0
nc -vz 192.168.1.100 6443
# Connection to 192.168.1.100 6443 port [tcp/*] succeeded!
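The note that BACKUP uses a lower priority can be made concrete by rendering the block for both LB hosts. A minimal sketch: render_vrrp is a hypothetical helper, the interface, router ID, password, and VIP are copied from the config above, and the BACKUP priority of 100 is an assumed value (anything below 101 would do).

```shell
# render_vrrp STATE PRIORITY: print a keepalived vrrp_instance block for one LB host
render_vrrp() {
    cat <<EOF
vrrp_instance VI_1 {
    state $1
    interface eth0
    virtual_router_id 51
    priority $2
    authentication {
        auth_type PASS
        auth_pass k8shapass
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}
EOF
}

render_vrrp MASTER 101    # first LB host, values from the notes
render_vrrp BACKUP 100    # second LB host; 100 is an assumed priority below 101
```

Everything except state and priority must match on both hosts, otherwise the two instances will not negotiate ownership of the VIP.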

Step 2 — init the first control-plane node against the VIP.

sudo kubeadm init \
  --control-plane-endpoint "192.168.1.100:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16
# ...
# You can now join any number of control-plane nodes by running:
#   kubeadm join 192.168.1.100:6443 --token <t> \
#     --discovery-token-ca-cert-hash sha256:<hash> \
#     --control-plane --certificate-key <key>
# Then you can join any number of worker nodes by running:
#   kubeadm join 192.168.1.100:6443 --token <t> \
#     --discovery-token-ca-cert-hash sha256:<hash>

mkdir -p $HOME/.kube && sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

Step 3 — join cp2 and cp3 as control planes.

# run on cp2 and cp3 (the --control-plane variant)
sudo kubeadm join 192.168.1.100:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <cert-key>
# This node has joined the cluster and a new control plane instance was created.

# if the 2h upload-certs window expired, regenerate it on cp1:
sudo kubeadm init phase upload-certs --upload-certs
sudo kubeadm token create --print-join-command

Step 4 — join the worker nodes (plain command).

# run on worker1 and worker2
sudo kubeadm join 192.168.1.100:6443 \
  --token <token> --discovery-token-ca-cert-hash sha256:<hash>

Step 5 — verify nodes, control-plane pods, and etcd quorum.

kubectl get nodes -o wide
# NAME      STATUS   ROLES           VERSION
# cp1       Ready    control-plane   v1.30.2
# cp2       Ready    control-plane   v1.30.2
# cp3       Ready    control-plane   v1.30.2
# worker1   Ready    <none>          v1.30.2
# worker2   Ready    <none>          v1.30.2

kubectl -n kube-system exec etcd-cp1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --cluster -w table
# +------------------+---------+---------+-----------+-----------+-----+
# |     ENDPOINT     |   ID    | VERSION | IS LEADER | RAFT TERM | ... |
# +------------------+---------+---------+-----------+-----------+-----+
# | https://...101.. | a1b2... | 3.5.x   | true      | ...       | ... |
# | https://...102.. | c3d4... | 3.5.x   | false     | ...       | ... |
# | https://...103.. | e5f6... | 3.5.x   | false     | ...       | ... |
# +------------------+---------+---------+-----------+-----------+-----+

Step 6 — prove HA: stop cp1, cluster keeps serving via cp2/cp3.

# on cp1
sudo systemctl stop kubelet && sudo crictl stop $(sudo crictl ps -q --name kube-apiserver)

# from a workstation: HAProxy health-checks cp1 out; the VIP still answers
kubectl get nodes
# cp1 -> NotReady, all others Ready; cluster accepts changes (etcd quorum 2 of 3)
kubectl create deploy still-works --image=nginx --replicas=2
# deployment.apps/still-works created
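While a master is down, it helps to poll the VIP rather than run a single command by hand. The sketch below is a generic retry loop; retry is a hypothetical helper, and the stand-in probe (true) only keeps the example runnable without a cluster.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS are used up
retry() {
    attempts=$1; delay=$2; shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$(( n + 1 ))
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# stand-in probe so the sketch runs anywhere; against a live cluster use e.g.
#   retry 30 2 kubectl get --raw=/readyz
if retry 3 0 true; then
    echo "endpoint answered"
fi
```

Because the kubeconfig points at the VIP, a successful probe during the outage confirms that HAProxy has routed around cp1.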

Key takeaways

  • A single master is an SPOF; run an odd number (3) for HA.
  • Stacked etcd (default) co-locates etcd with the api-server; external etcd isolates it.
  • etcd needs quorum (majority); 3 members tolerate 1 loss, 5 tolerate 2.
  • A load balancer VIP (HAProxy + keepalived) fronts all api-servers on 6443.
  • kubeadm init --control-plane-endpoint <VIP> --upload-certs, then join masters with --control-plane --certificate-key.

Checklist

  • [ ] Explained SPOF and why control-plane count is odd
  • [ ] Compared stacked vs external etcd
  • [ ] Computed quorum for 3 and 5 members
  • [ ] Configured HAProxy/keepalived for a VIP on 6443
  • [ ] Ran kubeadm init --control-plane-endpoint and joined extra masters