Kubernetes: volume node affinity conflict

4 min read | by Jordi Prats

While trying to deploy Pods, we might notice in the Events section that a Pod cannot be scheduled due to a volume node affinity conflict:

$ kubectl describe pod website-365-flask-ampa2-ha-member-1 -n website-365
Name:           website-365-flask-ampa2-ha-member-1
Namespace:      website-365
Priority:       0
Node:           <none>
Labels:         (...)
Annotations:    (...)
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/website-365-flask-ampa2-ha-member
Init Containers:
  (...)
Containers:
  (...)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  volume-website-365-flask-ampa2-ha-member-1
    ReadOnly:   false
(...)
Events:
  Type     Reason             Age                      From                Message
  ----     ------             ----                     ----                -------
  Normal   NotTriggerScaleUp  31m (x20835 over 7d19h)  cluster-autoscaler  pod didn't trigger scale-up: 2 node(s) had taint {pti/role: system}, that the pod didn't tolerate, 1 node(s) had volume node affinity conflict
  Normal   NotTriggerScaleUp  95s (x46144 over 7d19h)  cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) had volume node affinity conflict, 2 node(s) had taint {pti/role: system}, that the pod didn't tolerate
  Warning  FailedScheduling   64s (x2401 over 43h)     default-scheduler   0/4 nodes are available: 2 node(s) had taint {pti/role: system}, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict.

This message means that the node is in a different availability zone than the volume the Pod is trying to use. The Pod cannot be scheduled on that node because it would not be able to mount the requested volume.

We can check which volume claim the Pod uses by looking at the Volumes section:

$ kubectl describe pod website-365-flask-ampa2-ha-member-1 -n website-365
Name:       website-365-flask-ampa2-ha-member-1
Namespace:  website-365
Priority:   0
Node:       <none>
(...)
Volumes:
  volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  volume-website-365-flask-ampa2-ha-member-1
    ReadOnly:   false
(...)
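Instead of reading through the describe output, the claim name can also be pulled out directly with a jsonpath query. This is a minimal sketch: the function name pod_claims is made up for illustration, and it assumes kubectl is already pointed at the right cluster.

```shell
# Print the names of the PVCs referenced by a Pod, space-separated.
# pod_claims is a hypothetical helper name, not part of kubectl.
# Usage: pod_claims <pod> <namespace>
pod_claims() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}'
}

# Example with the Pod from this post:
#   pod_claims website-365-flask-ampa2-ha-member-1 website-365
```

Volumes that are not backed by a PVC (ConfigMaps, Secrets and so on) have no claimName and should simply be left out of the result.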

We'll need to check the PVC first to retrieve the actual volume it is using:

$ kubectl get pvc -n website-365
NAME                                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-website-365-postgresql-0                 Bound    pvc-dc818c5c-2677-4bc0-aa32-e141e0ac1516   200Gi      RWO            ebs-gp2        41d
volume-website-365-flask-ampa2-ha-member-0    Bound    pvc-710b454f-c06b-4367-b8da-1ec5a3d78a00   200Gi      RWO            ebs-gp2        41d
volume-website-365-flask-ampa2-ha-member-1    Bound    pvc-a0cb18a4-b471-4169-b408-699aedaed33d   200Gi      RWO            ebs-gp2        41d
volume-website-365-flask-ampa2-ha-primary-0   Bound    pvc-7d4ea83f-da45-44bd-88eb-801950abb8de   200Gi      RWO            ebs-gp2        41d
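When the namespace holds many claims, it can be easier to ask for the bound volume of a single PVC. A small sketch, where pvc_volume is a hypothetical helper name:

```shell
# Print the name of the PersistentVolume bound to a PVC.
# pvc_volume is a hypothetical helper name, not part of kubectl.
# Usage: pvc_volume <pvc> <namespace>
pvc_volume() {
  kubectl get pvc "$1" -n "$2" -o jsonpath='{.spec.volumeName}'
}

# Example with the claim from this post:
#   pvc_volume volume-website-365-flask-ampa2-ha-member-1 website-365
```

The value comes from the same field that the VOLUME column of kubectl get pvc displays.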

If we describe that volume, we'll be able to see which availability zone it is in:

$ kubectl describe pv pvc-a0cb18a4-b471-4169-b408-699aedaed33d
Name:              pvc-a0cb18a4-b471-4169-b408-699aedaed33d
Labels:            <none>
Annotations:       pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
Finalizers:        [kubernetes.io/pv-protection external-attacher/ebs-csi-aws-com]
StorageClass:      ebs-gp2
Status:            Bound
Claim:             website-365/volume-website-365-flask-ampa2-ha-member-1
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          200Gi
Node Affinity:
  Required Terms:
    Term 0:        topology.ebs.csi.aws.com/zone in [eu-west-1b]
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            ebs.csi.aws.com
    FSType:            ext4
    VolumeHandle:      vol-09923383c7c9af32f
    ReadOnly:          false
    VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1633054440112-8081-ebs.csi.aws.com
Events:                <none>
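The Node Affinity section is what the scheduler compares against the node labels. To extract just the required values, a jsonpath query along these lines should work; pv_zones is a hypothetical helper name, and the sketch assumes the volume only carries zone-based affinity terms, as the one above does.

```shell
# Print the values of every required node affinity expression of a PV.
# pv_zones is a hypothetical helper name, not part of kubectl.
# Usage: pv_zones <pv>
pv_zones() {
  kubectl get pv "$1" \
    -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[*].matchExpressions[*].values[*]}'
}

# Example with the volume from this post:
#   pv_zones pvc-a0cb18a4-b471-4169-b408-699aedaed33d
```

For the volume described above we would expect this to print eu-west-1b, the zone listed under Term 0.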

Now it's just a matter of checking the availability zone of each of the nodes:

$ kubectl get nodes
NAME                                           STATUS   ROLES    AGE     VERSION
ip-10-120-194-190.eu-west-1.compute.internal   Ready    <none>   7d22h   v1.21.4-eks-033ce7e
ip-10-120-194-235.eu-west-1.compute.internal   Ready    <none>   37d     v1.21.4-eks-033ce7e
ip-10-120-195-8.eu-west-1.compute.internal     Ready    <none>   8m28s   v1.21.4-eks-033ce7e
ip-10-120-197-126.eu-west-1.compute.internal   Ready    <none>   14h     v1.21.4-eks-033ce7e
$ kubectl describe node ip-10-120-195-8.eu-west-1.compute.internal
Name:               ip-10-120-195-8.eu-west-1.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5a.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=eu-west-1
                    failure-domain.beta.kubernetes.io/zone=eu-west-1a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-120-195-8.eu-west-1.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m5a.xlarge
                    pti/eks-workers-group-name=default
                    pti/lifecycle=spot
                    topology.ebs.csi.aws.com/zone=eu-west-1a
                    topology.kubernetes.io/region=eu-west-1
                    topology.kubernetes.io/zone=eu-west-1a
                    vpc.amazonaws.com/has-trunk-attached=true
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0e34bcb1ab40300fb"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
(...)
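Describing nodes one by one gets tedious on larger clusters. As a rough sketch, the zone labels can be shown as extra columns for every node at once, or the node list can be filtered by the label the volume requires. Both function names are hypothetical; the label keys are the ones visible in the node description above.

```shell
# Show all nodes with their zone labels as extra columns.
# nodes_with_zone is a hypothetical helper name.
nodes_with_zone() {
  kubectl get nodes \
    -L topology.kubernetes.io/zone \
    -L topology.ebs.csi.aws.com/zone
}

# List only the nodes that carry the zone label the volume requires.
# nodes_in_zone is a hypothetical helper name.
# Usage: nodes_in_zone <zone>
nodes_in_zone() {
  kubectl get nodes -l "topology.ebs.csi.aws.com/zone=$1" -o name
}

# Example with the zone of the volume from this post:
#   nodes_in_zone eu-west-1b
```

In this post the volume requires eu-west-1b while the node we described is in eu-west-1a, so that node cannot be used by the Pod.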

Depending on how we have our cluster configured, this can be handled in different ways. Usually the Cluster Autoscaler or Karpenter will launch new nodes in the appropriate availability zone. If they don't after some time, we'll have to check why: having reached the maximum number of nodes is the most likely reason.
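One place to start looking when using the Cluster Autoscaler is its status ConfigMap. The sketch below assumes the default ConfigMap name (cluster-autoscaler-status) and that the autoscaler runs in kube-system; both can differ per installation, and Karpenter does not use this ConfigMap at all. The function name autoscaler_status is hypothetical.

```shell
# Print the status report written by the Cluster Autoscaler.
# Assumptions: default ConfigMap name, namespace kube-system unless
# another one is passed as the first argument.
# autoscaler_status is a hypothetical helper name.
# Usage: autoscaler_status [namespace]
autoscaler_status() {
  kubectl get configmap cluster-autoscaler-status \
    -n "${1:-kube-system}" -o jsonpath='{.data.status}'
}

# Example:
#   autoscaler_status
```

The report lists the node groups the autoscaler knows about, which can help us spot a group that is already at its maximum size.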


Posted on 27/04/2022