Argo Rollouts: Canary deployments

Kubernetes Argo Rollouts Canary

7 min read | by Jordi Prats

A canary deployment is a technique to reduce the risk of introducing a new version of a software application in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure. If any issue is detected on the "canary", the deployment can be stopped, and the rest of the users won't be affected. With Argo Rollouts, we can easily implement this strategy.

Installing Argo Rollouts

First, we need to make sure that Argo Rollouts and its CLI (a kubectl plugin) are installed:

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
brew install argoproj/tap/kubectl-argo-rollouts
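Before moving on, it can help to confirm that both pieces are in place. The following is a minimal sketch, assuming the controller was installed into the argo-rollouts namespace as above; the function name verify_argo_rollouts is hypothetical and only used for illustration.

```shell
# Hypothetical helper: check that the controller and the kubectl plugin are available.
verify_argo_rollouts() {
  # The controller pod should be listed as Running in the namespace created above.
  kubectl get pods -n argo-rollouts
  # Prints the plugin version if kubectl-argo-rollouts is on the PATH.
  kubectl argo rollouts version
}

# Example (needs a cluster, so it is left commented out):
# verify_argo_rollouts
```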

To start using the canary deployment strategy with Argo Rollouts, we need to replace the Deployment manifest with a Rollout resource and set the strategy to canary:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-rollout
spec:
  replicas: 10
  selector:
    matchLabels:
      app: canary-rollout
  template:
    metadata:
      labels:
        app: canary-rollout
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
  strategy:
    canary:
      maxSurge: '25%'
      maxUnavailable: 0

In the previous example we are using the canary strategy but without any additional configuration. This will make it behave like a regular rolling update.

Simple canary deployment

To make it a canary deployment, we'll need to design the steps we want to follow. In a simple canary deployment we can use setWeight and pause to control how we are going to do it:

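  • setWeight: The percentage that the new version should receive at this step. Without a traffic router, this is approximated with the share of replicas running the new version.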
  • pause: The time to wait between steps. We can set a specific duration or wait for a manual resume.

When the Rollout uses traffic routing (an ingress controller or a service mesh), the controller by default keeps the previous version running at its maximum replicas while the new version is scaled up to the replicas each step needs. If the rollout is successful, the previous version is scaled down to zero. This makes sure we can switch back to the previous version in case of any issue without having to wait for the replicas to be scaled up again. If we don't want this behavior, we can set dynamicStableScale to true so that the previous version is scaled down automatically as the new one is scaled up. Without traffic routing, as in the examples in this post, the controller already splits the desired replicas between the two versions according to the current weight.

Let's see an example:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-rollout
spec:
  replicas: 10
  selector:
    matchLabels:
      app: canary-rollout
  template:
    metadata:
      labels:
        app: canary-rollout
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        env:
        - name: APP_VERSION
          value: "v2"
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {}
      - setWeight: 50
      - pause:
          duration: 10m
      - setWeight: 90
      - pause: {}

In this example the rollout proceeds as follows:

  • First, it scales the new version up to 20% of the total replicas and waits for a manual resume.
  • Then it scales up to 50% of the total replicas and waits for 10 minutes.
  • Finally, it scales up to 90% of the total replicas and waits for a manual resume again. Once resumed, the new version is promoted to 100%.
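To make the steps above concrete, here is a small sketch of the arithmetic for a Rollout with 10 replicas and no traffic router. The helper name canary_replicas is hypothetical, and rounding up is an assumption about how the controller treats weights that do not divide evenly; for the weights used here the division is exact, so the rounding rule does not matter.

```shell
# Hypothetical helper: number of canary pods for a given weight.
# $1 = total replicas, $2 = setWeight percentage.
# Rounds up (an assumption; irrelevant for weights that divide evenly).
canary_replicas() {
  replicas="$1"
  weight="$2"
  echo $(( (replicas * weight + 99) / 100 ))
}

# Walk through the weights used in the example Rollout.
for w in 20 50 90; do
  echo "setWeight ${w} -> $(canary_replicas 10 "$w") canary pods"
done
```

With 10 replicas this gives 2, 5 and 9 canary pods, which matches the Updated count reported by the plugin at each paused step.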

We can use the kubectl argo rollouts get rollout command to check the status of the rollout:

$ kubectl argo rollouts get rollout canary-rollout
Name:            canary-rollout
Namespace:       demo-rollout
Status:          Paused
Message:         CanaryPauseStep
Strategy:        Canary
  Step:          1/6
  SetWeight:     20
  ActualWeight:  20
Images:          nginx:latest (canary, stable)
Replicas:
  Desired:       10
  Current:       10
  Updated:       2
  Ready:         10
  Available:     10
NAME                                        KIND        STATUS   AGE  INFO
canary-rollout                              Rollout     Paused   10m
├──# revision:2
   └──⧉ canary-rollout-7f9c9956ff           ReplicaSet  Healthy  20s  canary
      ├──□ canary-rollout-7f9c9956ff-6bvt9  Pod         Running  10s  ready:1/1
      └──□ canary-rollout-7f9c9956ff-dsg2l  Pod         Running  10s  ready:1/1
└──# revision:1
   └──⧉ canary-rollout-8fc79696d            ReplicaSet  Healthy  10m  stable
      ├──□ canary-rollout-8fc79696d-57gz5   Pod         Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-6ghm9   Pod         Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-8fmjd   Pod         Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-9dq6r   Pod         Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-jqzsp   Pod         Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-pz46t   Pod         Running  10m  ready:1/1
      ├──□ canary-rollout-8fc79696d-t8k7j   Pod         Running  10m  ready:1/1
      └──□ canary-rollout-8fc79696d-vsmmx   Pod         Running  10m  ready:1/1
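Instead of running the command repeatedly, the plugin can keep the view refreshed while the rollout moves through its steps. A minimal sketch, assuming the Rollout lives in the demo-rollout namespace shown in the output; the function name watch_rollout is hypothetical.

```shell
# Hypothetical helper: follow a rollout live until interrupted with Ctrl+C.
# $1 = rollout name, $2 = namespace.
watch_rollout() {
  kubectl argo rollouts get rollout "$1" --namespace "$2" --watch
}

# Example (needs a cluster, so it is left commented out):
# watch_rollout canary-rollout demo-rollout
```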

Since this step has no duration set, we need to resume the rollout manually using the promote command:

$ kubectl argo rollouts promote canary-rollout
rollout 'canary-rollout' promoted
$ kubectl argo rollouts get rollout canary-rollout
Name:            canary-rollout
Namespace:       demo-rollout
Status:          Paused
Message:         CanaryPauseStep
Strategy:        Canary
  Step:          3/6
  SetWeight:     50
  ActualWeight:  50
Images:          nginx:latest (canary, stable)
Replicas:
  Desired:       10
  Current:       10
  Updated:       5
  Ready:         10
  Available:     10
NAME                                        KIND        STATUS   AGE  INFO
canary-rollout                              Rollout     Paused   24m
├──# revision:2
   └──⧉ canary-rollout-7f9c9956ff           ReplicaSet  Healthy  14m  canary
      ├──□ canary-rollout-7f9c9956ff-6bvt9  Pod         Running  13m  ready:1/1
      ├──□ canary-rollout-7f9c9956ff-dsg2l  Pod         Running  13m  ready:1/1
      ├──□ canary-rollout-7f9c9956ff-4jqrf  Pod         Running  4s   ready:1/1
      ├──□ canary-rollout-7f9c9956ff-6d5m4  Pod         Running  4s   ready:1/1
      └──□ canary-rollout-7f9c9956ff-hjrnj  Pod         Running  4s   ready:1/1
└──# revision:1
   └──⧉ canary-rollout-8fc79696d            ReplicaSet  Healthy  24m  stable
      ├──□ canary-rollout-8fc79696d-57gz5   Pod         Running  24m  ready:1/1
      ├──□ canary-rollout-8fc79696d-6ghm9   Pod         Running  24m  ready:1/1
      ├──□ canary-rollout-8fc79696d-8fmjd   Pod         Running  24m  ready:1/1
      ├──□ canary-rollout-8fc79696d-t8k7j   Pod         Running  24m  ready:1/1
      └──□ canary-rollout-8fc79696d-vsmmx   Pod         Running  24m  ready:1/1

If a duration is set, the rollout resumes automatically once the time has passed; we can also continue earlier by using the promote command.
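Besides promoting one step at a time, the plugin can also skip the remaining steps or stop the rollout when the canary misbehaves. The sketch below wraps both commands in hypothetical helper functions (promote_fully and abort_rollout are illustrative names, not part of the plugin).

```shell
# Hypothetical helper: skip every remaining step and promote the new version.
promote_fully() {
  kubectl argo rollouts promote "$1" --full
}

# Hypothetical helper: stop the update and go back to the stable version.
abort_rollout() {
  kubectl argo rollouts abort "$1"
}

# Examples (need a cluster, so they are left commented out):
# promote_fully canary-rollout
# abort_rollout canary-rollout
```

After an abort, the Rollout is typically reported as Degraded until the manifest is set back to the stable version or the update is retried, so it is worth checking its status afterwards.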

Canary and stable services

If we define the canaryService and stableService, the controller will update the services to select the right set of Pods.

First we'll need to create these services with a generic selector for the Rollout to use:

apiVersion: v1
kind: Service
metadata:
  name: rollout-canary
spec:
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
  selector:
    app: canary-rollout
---
apiVersion: v1
kind: Service
metadata:
  name: rollout-stable
spec:
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
  selector:
    app: canary-rollout

With the services created, we can now create a new Rollout that uses them:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-rollout
spec:
  replicas: 10
  selector:
    matchLabels:
      app: canary-rollout
  template:
    metadata:
      labels:
        app: canary-rollout
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        # The port is named so that "targetPort: http" in the services resolves to it
        - name: http
          containerPort: 80
        env:
        - name: APP_VERSION
          value: "v3"
  strategy:
    canary:
      canaryService: rollout-canary
      stableService: rollout-stable
      steps:
      - setWeight: 20
      - pause: {}

Once the rollout has progressed, we can see that the controller has added the pod template hash to the service selector so that it points to the right set of Pods:

$ kubectl get svc rollout-canary -o yaml
apiVersion: v1
kind: Service
metadata:
  (...)
  name: rollout-canary
spec:
  clusterIP: 10.96.51.65
  clusterIPs:
  - 10.96.51.65
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app: canary-rollout
    rollouts-pod-template-hash: 7df7c59c9b
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
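To check that the two services really point to different sets of Pods while the rollout is paused, we can compare the hash in their selectors. This is a minimal sketch using the service names from this post; the function names selector_hash and compare_service_hashes are hypothetical.

```shell
# Hypothetical helper: print the pod template hash a service selects.
# $1 = service name.
selector_hash() {
  kubectl get svc "$1" -o jsonpath='{.spec.selector.rollouts-pod-template-hash}'
}

# Hypothetical helper: print both hashes and succeed only if they differ,
# which is what we expect while a canary is in progress.
compare_service_hashes() {
  canary_hash="$(selector_hash rollout-canary)"
  stable_hash="$(selector_hash rollout-stable)"
  echo "canary=${canary_hash} stable=${stable_hash}"
  [ "$canary_hash" != "$stable_hash" ]
}

# Example (needs a cluster, so it is left commented out):
# compare_service_hashes
```

Once the new version has been fully promoted, both services are expected to select the same hash, so the comparison is only meaningful in the middle of a rollout.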

Other features

While this post covered canary deployments, there are several additional features worth exploring:

  • Advanced Traffic Routing: Leveraging ingress controllers and service meshes to dynamically shift traffic based on custom rules and real-time metrics.
  • Progressive Experimentation: Using analysis templates and metrics to automatically validate new versions before promoting them.
  • Experiments: Running A/B tests and other experiments to compare different versions of an application.
  • Automated Analysis: Integrating metrics-based analysis (Prometheus, CloudWatch...) to make rollout decisions based on real-time performance data.

You can also check out blue-green deployments with Argo Rollouts.


Posted on 18/03/2025