Cluster Component Version Skew
Control plane and nodes have version skew limits.
Rules
Cluster version skew rules constrain which Kubernetes component versions can coexist. The discipline is understanding the rules and respecting them during upgrades; violating them produces broken clusters.
What the rules look like:
- kubelet less than or equal to kube-apiserver.: The kubelet on each node must not be newer than the cluster's API server. The control plane must be at least as new as the nodes; the rule prevents node-side features the API server cannot understand.
- kube-proxy equals kubelet.: The kube-proxy version should match the kubelet on the same node. Both are node components; both should upgrade together.
- Up to 2 versions skew typical.: The official skew tolerance is typically 2 minor versions. Newer kubelets running against older API servers produce undefined behavior; the team stays within the supported skew.
- Per-component rules.: Each component has its own skew rules. The team's upgrade plan respects each rule; the cluster stays healthy across upgrades.
- Read the docs.: The exact skew rules evolve. The team consults the official Kubernetes version skew policy for their version; the documentation is authoritative.
The rules are foundational. Understanding them prevents broken clusters during upgrades.
Upgrade order
The upgrade order respects the skew rules. Control plane first; nodes second; sequential progression maintains the rules throughout the upgrade.
- Control plane first.: The API server, controller-manager, scheduler upgrade first. The control plane becomes the newer version; the nodes still run the older version; the skew rule (kubelet less than or equal to API server) is satisfied.
- Nodes second.: After the control plane is stable on the new version, nodes upgrade. The kubelet and kube-proxy match each other; the skew with the control plane is reduced.
- Sequential to maintain skew rules.: The team does not skip steps. Each phase happens completely before the next; the skew rules are respected throughout.
- Test after each phase.: The team verifies after the control plane upgrade. After the nodes upgrade. Each verification catches issues; the upgrade proceeds when verified.
- Document the order.: The team's upgrade runbook documents the order. New engineers follow the procedure; the discipline is consistent.
The order is the discipline. Respecting it prevents the cluster from getting into invalid states.
Rollback
If an upgrade fails, rollback is needed. The skew rules apply to rollback too; the team plans both directions.
- Skew rules apply to rollback too.: Rolling back the API server while leaving newer kubelets violates the rules. The rollback must respect the same constraints; the team plans accordingly.
- Plan in both directions.: The upgrade plan includes the rollback plan. Each forward step has a corresponding rollback; the team can navigate either direction safely.
- Reverse order.: Rollback is in reverse order: nodes first, control plane second. The skew rules are respected during rollback as during upgrade.
- Test the rollback.: Some teams test rollback in non-production. The team's confidence in rollback grows; production rollback is informed.
- Document the rollback procedure.: Like the upgrade procedure, rollback is documented. New engineers can perform it; the discipline transfers; the institutional knowledge is preserved.
Cluster version skew is one of those Kubernetes operational disciplines that prevents specific failure modes during upgrades. Nova AI Ops integrates with cluster version data, surfaces skew patterns, and supports the team's upgrade discipline.