Using an HPA object to autoscale a deployment based on its Pods' CPU metrics

3 min read | by Jordi Prats

On Kubernetes, scaling an application is just a matter of defining how many replicas we want:

$ kubectl scale deployment/demo --replicas=5
deployment.apps/demo scaled

Having to manually adjust the number of replicas is not really practical. This is where the HorizontalPodAutoscaler (HPA) comes into play.

An HPA can be configured to use resource metrics (metrics.k8s.io), custom metrics (custom.metrics.k8s.io) and external metrics (external.metrics.k8s.io). The most basic usage relies on the resource metrics provided by the metrics-server, which needs to be installed on the cluster. We can check its availability using kubectl get apiservice:

$ kubectl get apiservice | grep metrics
v1beta1.metrics.k8s.io   default/metrics-server   True   15d

Once we have checked that it is available, we have to make sure the Pods have a CPU resource request configured (or at least that the namespace has a LimitRange in place). We can check it by taking a look at the Pod definition:

$ kubectl get pod ampa-voting-5bd8449967-sstrw -o yaml
apiVersion: v1
kind: Pod
metadata:
  name: ampa-voting-5bd8449967-sstrw
spec:
  affinity: {}
  containers:
  - image: jordiprats/pet2cattle
    name: pet2cattle
    ports:
    - containerPort: 8008
      protocol: TCP
    resources:
      limits:
        cpu: "2"
        memory: 8000Mi
      requests:
        cpu: 200m
        memory: 1000Mi
(...)
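If the Pods do not declare their own requests, a LimitRange can provide a default one for the namespace. The following is only a sketch: the object name default-cpu-request, the file name and the values (copied from the Pod shown above) are assumptions for illustration. Keep in mind that a LimitRange only applies its defaults to Pods created after it exists.

```shell
# Write a LimitRange that gives every container in the namespace a
# default CPU request and limit when it does not declare its own.
# The name and the values are illustrative assumptions.
cat > limitrange.yaml <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: default-cpu-request
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 200m
    default:
      cpu: "2"
EOF

# Apply it to the namespace where the deployment lives:
# kubectl apply -f limitrange.yaml
```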

Once we have resource requests in place, we can create a new HPA imperatively using kubectl autoscale, specifying which deployment we want to control. Its options are:

  • Minimum number of replicas: Using the --min option
  • Maximum number of replicas: Using the --max option
  • Target CPU usage: Using the --cpu-percent option we can tell when we want new Pods created, based on the average CPU used during the last minute across all the Pods, as a percentage of their CPU request. For example, if we set it to 80 percent and each Pod has requested 200m, new Pods are created once the average usage goes above 160m (i.e. 200m × 0.8)
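To get a feeling for how these three options interact, the scaling rule described in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), can be sketched with plain shell arithmetic. The desired_replicas function below is a hypothetical helper, not part of kubectl, and it deliberately ignores the tolerance and stabilization behaviour of the real controller:

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_PCT TARGET_PCT MIN MAX
# Hypothetical helper that approximates the HPA scaling rule:
#   ceil(currentReplicas * currentUtilization / targetUtilization)
# and then clamps the result to the --min / --max bounds.
desired_replicas() {
  current_replicas=$1; current_pct=$2; target_pct=$3; min=$4; max=$5
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current_replicas * current_pct + target_pct - 1) / target_pct ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# 2 Pods averaging 120% of their 200m request (240m each), target 80%:
# ceil(2 * 120 / 80) = 3
desired_replicas 2 120 80 2 10
```

With --min=2 and --max=10 the result never leaves that range, no matter how high or low the measured usage is.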

In the following example we are going to create an HPA that will keep the number of replicas between 2 and 10, scaling the application when the actual CPU usage goes beyond 80% of the requested resources:

$ kubectl autoscale deployment ampa-voting --min=2 --max=10 --cpu-percent=80
horizontalpodautoscaler.autoscaling/ampa-voting autoscaled
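The same HPA can also be kept under version control as a manifest. The following is a minimal sketch of what a declarative equivalent of the command above could look like, assuming a cluster that serves the autoscaling/v2 API (Kubernetes 1.23 or later); the file name is an arbitrary choice:

```shell
# Write a manifest equivalent to:
#   kubectl autoscale deployment ampa-voting --min=2 --max=10 --cpu-percent=80
# Assumes the autoscaling/v2 API is available on the cluster.
cat > ampa-voting-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ampa-voting
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ampa-voting
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
EOF

# Create or update the HPA from the file:
# kubectl apply -f ampa-voting-hpa.yaml
```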

Once we have it in place, it's going to take a while to collect the statistics. Until then, the TARGETS column shows <unknown>:

$ kubectl get hpa
NAME          REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
ampa-voting   Deployment/ampa-voting   <unknown>/80%   2         10        0          7s

After that it will start scaling the deployment based on the CPU usage of the existing Pods:

$ kubectl get hpa
NAME          REFERENCE                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
ampa-voting   Deployment/ampa-voting   29%/80%   2         10        4          10m

Posted on 01/07/2021