Fine-tuning Pod scheduling using taints, tolerations and nodeSelector

4 min read | by Jordi Prats

If we want only a subset of Pods to be schedulable on a given node, we can achieve it using taints and tolerations.

With a taint we tell the cluster not to schedule Pods on a node; with a toleration on a Pod we allow that Pod to tolerate the taint.
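The rule Kubernetes uses to decide whether a toleration matches a taint can be sketched in a few lines of Python. The `Taint`/`Toleration` classes and the `tolerates` function are hypothetical names used only for illustration; the rules follow the Kubernetes documentation: the effects must match (an empty toleration effect matches any effect), the keys must match (an empty key with `Exists` matches any key), and `Equal` additionally compares the values while `Exists` ignores them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key (only valid with Exists) matches every key
    operator: str = "Equal"  # Kubernetes defaults the operator to Equal
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches (tolerates) the taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # Values are ignored; an empty key tolerates any taint
        return tol.key == "" or tol.key == taint.key
    # operator == "Equal": key and value must both match
    return tol.key == taint.key and tol.value == taint.value


taint = Taint(key="application", value="example", effect="NoSchedule")
print(tolerates(Toleration("application", "Equal", "example", "NoSchedule"), taint))  # True
print(tolerates(Toleration("application", "Equal", "other", "NoSchedule"), taint))    # False
print(tolerates(Toleration("application", "Exists"), taint))                           # True
print(tolerates(Toleration("application", "Equal", "example", "NoExecute"), taint))   # False
```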

First we are going to create a taint on a node:

$ kubectl taint nodes minikube-m02 application=example:NoSchedule
node/minikube-m02 tainted
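The taint argument follows a `key=value:effect` pattern (the value may be omitted, as in `key:effect`). As a rough illustration, this Python sketch splits such a string into its three parts and builds the toleration that would match it exactly, using the same field names as a Pod's `tolerations` list. `parse_taint` and `matching_toleration` are hypothetical helpers, not part of kubectl.

```python
from typing import Dict, Tuple

# Effects accepted by Kubernetes taints
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def parse_taint(spec: str) -> Tuple[str, str, str]:
    """Split 'key=value:effect' (or 'key:effect') into (key, value, effect)."""
    key_value, sep, effect = spec.rpartition(":")
    if not sep or effect not in VALID_EFFECTS:
        raise ValueError(f"invalid taint spec: {spec!r}")
    key, _, value = key_value.partition("=")
    if not key:
        raise ValueError(f"missing key in taint spec: {spec!r}")
    return key, value, effect


def matching_toleration(spec: str) -> Dict[str, str]:
    """Build a toleration that matches the taint exactly (operator Equal)."""
    key, value, effect = parse_taint(spec)
    return {"key": key, "operator": "Equal", "value": value, "effect": effect}


print(parse_taint("application=example:NoSchedule"))
print(matching_toleration("application=example:NoSchedule"))
```

To remove the taint again, kubectl accepts the same spec with a trailing `-`, for example `kubectl taint nodes minikube-m02 application=example:NoSchedule-`.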

Using kubectl describe node we can check that the taint has been applied:

$ kubectl describe node minikube-m02
Name:               minikube-m02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=minikube-m02
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 19 Aug 2021 18:14:37 +0200
Taints:             node.kubernetes.io/not-ready:NoExecute
                    application=example:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
(...)

We can use a nodeSelector to try to schedule a Pod on this node:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    kubernetes.io/hostname: minikube-m02
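A nodeSelector is an exact-match filter: every key/value pair it lists must be present among the node's labels. A minimal Python sketch of that rule, where `matches_node_selector` is a hypothetical helper and the labels are the ones reported for minikube-m02 by kubectl describe node:

```python
from typing import Dict


def matches_node_selector(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True if every key/value pair of the selector appears in the node labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Labels of minikube-m02 as shown by `kubectl describe node minikube-m02`
m02_labels = {
    "beta.kubernetes.io/arch": "amd64",
    "beta.kubernetes.io/os": "linux",
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/hostname": "minikube-m02",
    "kubernetes.io/os": "linux",
}

print(matches_node_selector({"kubernetes.io/hostname": "minikube-m02"}, m02_labels))  # True
print(matches_node_selector({"application": "example"}, m02_labels))                  # False
```

Note that a taint is not a label: a selector such as application=example does not match minikube-m02 just because the node carries a taint with that key and value. An empty selector matches every node.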

But the Pod will remain in Pending state:

$ kubectl get pods
NAME      READY   STATUS    RESTARTS   AGE
example   0/1     Pending   0          3s

We can check the reason using kubectl describe: the only node that matches the nodeSelector has a taint that the Pod does not tolerate, so the Pod cannot be scheduled there:

$ kubectl describe pod example
Name:         example
Namespace:    default
Priority:     0
Node:         <none>
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  nginx:
    Image:        nginx
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f7bff (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-f7bff:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              application=example
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  11s (x2 over 13s)  default-scheduler  0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {application: example}, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
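The FailedScheduling message can be approximated with a simplified model of the scheduler's filtering step. This Python sketch is an illustration under stated assumptions, not kube-scheduler's code: for each node it checks the hard taints (NoSchedule and NoExecute) against the Pod's tolerations, then the nodeSelector, and reports the first failing check. The three-node layout is hypothetical (only minikube-m02 comes from this post), chosen to reproduce the counts in the event; the two NoExecute tolerations are the default ones listed under Tolerations, which is why a not-ready:NoSchedule taint is still not tolerated.

```python
from collections import Counter
from typing import Dict, Optional

Taint = Dict[str, str]       # fields: key, value, effect
Toleration = Dict[str, str]  # fields: key, operator, value, effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes matching: effect and key must match, Equal also compares value."""
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if tol.get("operator", "Equal") == "Exists":
        return not tol.get("key") or tol["key"] == taint["key"]
    return tol.get("key") == taint["key"] and tol.get("value", "") == taint["value"]


def rejection_reason(pod: dict, node: dict) -> Optional[str]:
    """Return the first reason `node` cannot run `pod`, or None if it fits."""
    for taint in node["taints"]:
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft taint: the scheduler only tries to avoid the node
        if not any(tolerates(tol, taint) for tol in pod["tolerations"]):
            return f"had taint {{{taint['key']}: {taint['value']}}}"
    for key, value in pod["nodeSelector"].items():
        if node["labels"].get(key) != value:
            return "didn't match Pod's node selector"
    return None


pod = {
    "nodeSelector": {"kubernetes.io/hostname": "minikube-m02"},
    # Default tolerations (tolerationSeconds omitted: it only affects eviction)
    "tolerations": [
        {"key": "node.kubernetes.io/not-ready", "operator": "Exists", "effect": "NoExecute"},
        {"key": "node.kubernetes.io/unreachable", "operator": "Exists", "effect": "NoExecute"},
    ],
}

# Hypothetical three-node cluster
nodes = {
    "minikube": {"labels": {"kubernetes.io/hostname": "minikube"}, "taints": []},
    "minikube-m02": {
        "labels": {"kubernetes.io/hostname": "minikube-m02"},
        "taints": [{"key": "application", "value": "example", "effect": "NoSchedule"}],
    },
    "minikube-m03": {
        "labels": {"kubernetes.io/hostname": "minikube-m03"},
        "taints": [{"key": "node.kubernetes.io/not-ready", "value": "", "effect": "NoSchedule"}],
    },
}

reasons = {name: rejection_reason(pod, node) for name, node in nodes.items()}
feasible = [name for name, reason in reasons.items() if reason is None]
print(f"{len(feasible)}/{len(nodes)} nodes are available:")
for reason, count in Counter(r for r in reasons.values() if r is not None).items():
    print(f"  {count} node(s) {reason}")
```

The real scheduler runs many more filter plugins (resources, ports, affinity and so on), so this only models the two checks discussed here.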

We can add a toleration on the Pod for the taint that we have created:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    kubernetes.io/hostname: minikube-m02
  tolerations:
  - key: "application"
    operator: "Equal"
    value: "example"
    effect: "NoSchedule"

If we create this Pod we can see that it gets scheduled on this node, ignoring (tolerating) its taint:

$ kubectl describe pod example
Name:         example
Namespace:    default
Priority:     0
Node:         minikube-m02/192.168.49.3
Start Time:   Thu, 19 Aug 2021 19:01:54 +0200
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  nginx:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7z5w8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-7z5w8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/hostname=minikube-m02
Tolerations:                 application=example:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  15s   default-scheduler  Successfully assigned default/example to minikube-m02
  Normal  Pulling    11s   kubelet            Pulling image "nginx"
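A toleration only allows a Pod onto a tainted node; it does not attract the Pod there. Pinning it to minikube-m02 is the job of the nodeSelector. The Python sketch below illustrates this with a hypothetical two-node cluster (an untainted node named minikube plus the tainted minikube-m02) and a simplified, hypothetical `feasible_nodes` filter that only checks hard taints and the nodeSelector.

```python
from typing import Dict, List


def tolerates(tol: Dict[str, str], taint: Dict[str, str]) -> bool:
    """Simplified Kubernetes matching: effect and key must match, Equal also compares value."""
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if tol.get("operator", "Equal") == "Exists":
        return not tol.get("key") or tol["key"] == taint["key"]
    return tol.get("key") == taint["key"] and tol.get("value", "") == taint["value"]


def feasible_nodes(pod: dict, nodes: Dict[str, dict]) -> List[str]:
    """Nodes whose hard taints are all tolerated and whose labels satisfy the nodeSelector."""
    result = []
    for name, node in nodes.items():
        hard_taints = [t for t in node["taints"] if t["effect"] != "PreferNoSchedule"]
        tolerated = all(any(tolerates(tol, t) for tol in pod["tolerations"]) for t in hard_taints)
        selected = all(node["labels"].get(k) == v for k, v in pod.get("nodeSelector", {}).items())
        if tolerated and selected:
            result.append(name)
    return result


# Hypothetical cluster: an untainted node plus the tainted minikube-m02
nodes = {
    "minikube": {"labels": {"kubernetes.io/hostname": "minikube"}, "taints": []},
    "minikube-m02": {
        "labels": {"kubernetes.io/hostname": "minikube-m02"},
        "taints": [{"key": "application", "value": "example", "effect": "NoSchedule"}],
    },
}
toleration = {"key": "application", "operator": "Equal", "value": "example", "effect": "NoSchedule"}

no_toleration = {"tolerations": []}
tolerating_only = {"tolerations": [toleration]}
pinned = {"tolerations": [toleration], "nodeSelector": {"kubernetes.io/hostname": "minikube-m02"}}

print(feasible_nodes(no_toleration, nodes))    # ['minikube']
print(feasible_nodes(tolerating_only, nodes))  # ['minikube', 'minikube-m02']
print(feasible_nodes(pinned, nodes))           # ['minikube-m02']
```

This is why the Pod in this post combines both mechanisms: the taint keeps other Pods off minikube-m02, while the toleration plus the nodeSelector make this Pod land there and nowhere else.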

Posted on 20/08/2021