4 min read | by Jordi Prats
If we want only a subset of Pods to be schedulable on a given node, we can achieve it using taints and tolerations.
A taint tells the scheduler not to place Pods on a node, while a toleration on a Pod allows that Pod to be scheduled there despite the taint.
First we are going to create a taint on a node:
$ kubectl taint nodes minikube-m02 application=example:NoSchedule
node/minikube-m02 tainted
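For reference, the taint is stored on the Node object itself, under spec.taints. The fragment below is a minimal sketch of how that part of the Node might look when printed with kubectl get node minikube-m02 -o yaml; all other fields are omitted, and a real object may carry additional taints.

```yaml
# Fragment of the Node object after tainting; other fields omitted
spec:
  taints:
  - key: application      # left-hand side of application=example
    value: example        # right-hand side of application=example
    effect: NoSchedule    # Pods without a matching toleration are not scheduled here
```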
Using kubectl describe node we will be able to see that it has been applied:
$ kubectl describe node minikube-m02
Name:               minikube-m02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=minikube-m02
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 19 Aug 2021 18:14:37 +0200
Taints:             node.kubernetes.io/not-ready:NoExecute
                    application=example:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
(...)
We can use a nodeSelector to try to schedule a Pod on this node:
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    kubernetes.io/hostname: minikube-m02
But the Pod will remain in Pending state:
$ kubectl get pods
NAME      READY   STATUS    RESTARTS   AGE
example   0/1     Pending   0          3s
We can check the reason using kubectl describe. The only node that matches the nodeSelector has a taint that the Pod does not tolerate, so the Pod cannot be scheduled there:
$ kubectl describe pod example
Name:         example
Namespace:    default
Priority:     0
Node:         <none>
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  nginx:
    Image:        nginx
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f7bff (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-f7bff:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              application=example
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  11s (x2 over 13s)  default-scheduler  0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {application: example}, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
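The two entries listed under Tolerations in this output were not declared in our manifest; Kubernetes normally adds them to Pods by default. As a rough sketch, written out as a Pod spec fragment they would look like the following, which also shows why neither of them matches the application=example taint:

```yaml
# Fragment of a Pod spec: the default tolerations seen in the describe output
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists          # matches the key regardless of value
  effect: NoExecute
  tolerationSeconds: 300    # shown as "for 300s" by kubectl describe
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
```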
We can add a toleration on the Pod for the taint that we have created:
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    kubernetes.io/hostname: minikube-m02
  tolerations:
  - key: "application"
    operator: "Equal"
    value: "example"
    effect: "NoSchedule"
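The toleration above uses the Equal operator, so both the key and the value must match the taint. As an alternative sketch (not part of the original example), the Exists operator tolerates the taint whatever its value is, in which case the value field is left out:

```yaml
# Fragment of a Pod spec: tolerate any taint with key "application" and effect NoSchedule
tolerations:
- key: "application"
  operator: "Exists"      # no value field: any value of the key is tolerated
  effect: "NoSchedule"
```

This can be convenient when several nodes are tainted with the same key but different values, at the cost of being less selective.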
If we create this Pod, we will see that it is scheduled to run on this node, ignoring (tolerating) its taint:
$ kubectl describe pod example
Name:         example
Namespace:    default
Priority:     0
Node:         minikube-m02/192.168.49.3
Start Time:   Thu, 19 Aug 2021 19:01:54 +0200
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  nginx:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7z5w8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-7z5w8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/hostname=minikube-m02
Tolerations:                 application=example:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  15s   default-scheduler  Successfully assigned default/example to minikube-m02
  Normal  Pulling    11s   kubelet            Pulling image "nginx"
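The same toleration works for Pods created by a controller: it goes in the Pod template rather than on a standalone Pod. The Deployment below is only an illustrative sketch; its name, the app label and the replica count are assumptions, not part of the original example. Note that the toleration only permits scheduling on the tainted node, while the nodeSelector is what actually directs the Pods there.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example              # hypothetical name
spec:
  replicas: 2                # hypothetical replica count
  selector:
    matchLabels:
      app: example           # hypothetical label
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: nginx
        image: nginx
      nodeSelector:
        kubernetes.io/hostname: minikube-m02
      tolerations:           # same toleration as in the standalone Pod
      - key: "application"
        operator: "Equal"
        value: "example"
        effect: "NoSchedule"
```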
Posted on 20/08/2021