Traefik Canary Deployments
This guide shows you how to use Traefik and Flagger to automate canary deployments.
Flagger Traefik Overview

Prerequisites

Flagger requires a Kubernetes cluster v1.16 or newer and Traefik v2.3 or newer.
Install Traefik with Helm v3:
```sh
helm repo add traefik https://helm.traefik.io/traefik
kubectl create ns traefik

cat <<EOF | helm upgrade -i traefik traefik/traefik --namespace traefik -f -
deployment:
  podAnnotations:
    prometheus.io/port: "9100"
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
metrics:
  prometheus:
    entryPoint: metrics
EOF
```
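Before continuing, you may want to confirm that the ingress controller is up. A minimal check, assuming the chart created a deployment named traefik for this release name:
```sh
# wait for the Traefik deployment to finish rolling out
kubectl -n traefik rollout status deployment/traefik
```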
Install Flagger and the Prometheus add-on in the same namespace as Traefik:
```sh
helm repo add flagger https://flagger.app

helm upgrade -i flagger flagger/flagger \
--namespace traefik \
--set prometheus.install=true \
--set meshProvider=traefik
```
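You can verify the installation before bootstrapping a canary. A sketch, assuming the release created deployments named flagger and flagger-prometheus and registered the canaries.flagger.app CRD:
```sh
# wait for Flagger and the bundled Prometheus to roll out
kubectl -n traefik rollout status deployment/flagger
kubectl -n traefik rollout status deployment/flagger-prometheus

# confirm the Canary CRD is registered
kubectl get crd canaries.flagger.app
```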

Bootstrap

Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler (HPA), then creates a series of objects (Kubernetes deployments, ClusterIP services and TraefikService). These objects expose the application outside the cluster and drive the canary analysis and promotion.
Create a test namespace:
```sh
kubectl create ns test
```
Create a deployment and a horizontal pod autoscaler:
```sh
kubectl apply -k https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main
```
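To confirm the workload was created, you can list the deployment and HPA in the test namespace, for example:
```sh
# the kustomization should have created a deployment and an HPA named podinfo
kubectl -n test get deploy,hpa podinfo
```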
Deploy the load testing service to generate traffic during the canary analysis:
```sh
helm upgrade -i flagger-loadtester flagger/loadtester \
--namespace=test
```
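Optionally, wait for the load tester to become ready before starting the analysis (the chart is expected to create a deployment named flagger-loadtester):
```sh
kubectl -n test rollout status deployment/flagger-loadtester
```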
Create a Traefik IngressRoute that references the TraefikService generated by Flagger (replace app.example.com with your own domain):
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: podinfo
  namespace: test
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`app.example.com`)
      kind: Rule
      services:
        - name: podinfo
          kind: TraefikService
          port: 80
```
Save the above resource as podinfo-ingressroute.yaml and then apply it:
```sh
kubectl apply -f ./podinfo-ingressroute.yaml
```
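If app.example.com does not resolve in your environment, you can still exercise this route through a port-forward once the canary below has been initialized. A sketch, assuming the Traefik chart exposes the web entrypoint on port 80 of the traefik service:
```sh
# forward the Traefik web entrypoint to localhost
kubectl -n traefik port-forward svc/traefik 8080:80 &

# send a request with the Host header expected by the IngressRoute
curl -H 'Host: app.example.com' http://localhost:8080/
```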
Create a canary custom resource (replace app.example.com with your own domain):
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: traefik
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  service:
    # ClusterIP port number
    port: 80
    # container port number or name
    targetPort: 9898
  analysis:
    # schedule interval (default 60s)
    interval: 10s
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    # Traefik Prometheus checks
    metrics:
      - name: request-success-rate
        interval: 1m
        # minimum req success rate (non 5xx responses)
        # percentage (0-100)
        thresholdRange:
          min: 99
      - name: request-duration
        interval: 1m
        # maximum req duration P99
        # milliseconds
        thresholdRange:
          max: 500
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 10s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://podinfo-canary.test/token | grep token"
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 10m -q 10 -c 2 -host app.example.com http://traefik.traefik"
          logCmdOutput: "true"
```
Save the above resource as podinfo-canary.yaml and then apply it:
```sh
kubectl apply -f ./podinfo-canary.yaml
```
After a couple of seconds Flagger will create the canary objects:
```text
# applied
deployment.apps/podinfo
horizontalpodautoscaler.autoscaling/podinfo
canary.flagger.app/podinfo

# generated
deployment.apps/podinfo-primary
horizontalpodautoscaler.autoscaling/podinfo-primary
service/podinfo
service/podinfo-canary
service/podinfo-primary
traefikservice.traefik.containo.us/podinfo
```
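You can inspect the generated TraefikService to see how Flagger splits traffic between the primary and canary workloads, and watch the canary until it reports an Initialized phase. For example:
```sh
# the weighted routing definition managed by Flagger
kubectl -n test get traefikservice/podinfo -o yaml

# watch the canary bootstrap until it is initialized
kubectl -n test get canary/podinfo -w
```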

Automated canary promotion

Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance indicators like HTTP request success rate, average request duration and pod health. Based on analysis of the KPIs a canary is promoted or aborted, and the analysis result is published to Slack or MS Teams.
Flagger Canary Stages
Trigger a canary deployment by updating the container image:
```sh
kubectl -n test set image deployment/podinfo \
podinfod=stefanprodan/podinfo:4.0.6
```
Flagger detects that the deployment revision changed and starts a new rollout:
```text
kubectl -n test describe canary/podinfo

Status:
  Canary Weight:  0
  Failed Checks:  0
  Phase:          Succeeded
Events:
  New revision detected! Scaling up podinfo.test
  Waiting for podinfo.test rollout to finish: 0 of 1 updated replicas are available
  Pre-rollout check acceptance-test passed
  Advance podinfo.test canary weight 5
  Advance podinfo.test canary weight 10
  Advance podinfo.test canary weight 15
  Advance podinfo.test canary weight 20
  Advance podinfo.test canary weight 25
  Advance podinfo.test canary weight 30
  Advance podinfo.test canary weight 35
  Advance podinfo.test canary weight 40
  Advance podinfo.test canary weight 45
  Advance podinfo.test canary weight 50
  Copying podinfo.test template spec to podinfo-primary.test
  Waiting for podinfo-primary.test rollout to finish: 1 of 2 updated replicas are available
  Routing all traffic to primary
  Promotion completed! Scaling down podinfo.test
```
Note that if you apply new changes to the deployment during the canary analysis, Flagger will restart the analysis.
You can monitor all canaries with:
1
watch kubectl get canaries --all-namespaces
2
3
NAMESPACE NAME STATUS WEIGHT LASTTRANSITIONTIME
4
test podinfo-2 Progressing 30 2020-08-14T12:32:12Z
5
test podinfo Succeeded 0 2020-08-14T11:23:88Z
Copied!

Automated rollback

During the canary analysis you can generate HTTP 500 errors to test if Flagger pauses and rolls back the faulted version.
Trigger another canary deployment:
```sh
kubectl -n test set image deployment/podinfo \
podinfod=stefanprodan/podinfo:4.0.6
```
Exec into the load tester pod with:
```sh
kubectl -n test exec -it deploy/flagger-loadtester bash
```
Generate HTTP 500 errors:
```sh
hey -z 1m -c 5 -q 5 http://app.example.com/status/500
```
Generate latency:
```sh
watch -n 1 curl http://app.example.com/delay/1
```
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary, the canary is scaled to zero and the rollout is marked as failed.
```text
kubectl -n traefik logs deploy/flagger -f | jq .msg

New revision detected! Scaling up podinfo.test
Canary deployment podinfo.test not ready: waiting for rollout to finish: 0 of 1 updated replicas are available
Starting canary analysis for podinfo.test
Pre-rollout check acceptance-test passed
Advance podinfo.test canary weight 5
Advance podinfo.test canary weight 10
Advance podinfo.test canary weight 15
Advance podinfo.test canary weight 20
Halt podinfo.test advancement success rate 53.42% < 99%
Halt podinfo.test advancement success rate 53.19% < 99%
Halt podinfo.test advancement success rate 48.05% < 99%
Rolling back podinfo.test failed checks threshold reached 3
Canary failed! Scaling down podinfo.test
```
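You can also confirm the rollback from the canary status itself; once the failed checks threshold is reached, the phase should show as Failed:
```sh
# the STATUS column should report Failed after the rollback
kubectl -n test get canary/podinfo

# or inspect the recorded events
kubectl -n test describe canary/podinfo
```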

Custom metrics

The canary analysis can be extended with Prometheus queries.
Create a metric template and apply it on the cluster:
```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: not-found-percentage
  namespace: test
spec:
  provider:
    type: prometheus
    address: http://flagger-prometheus.traefik:9090
  query: |
    sum(
        rate(
            traefik_service_request_duration_seconds_bucket{
                service=~"{{ namespace }}-{{ target }}-canary-[0-9a-zA-Z-]+@kubernetescrd",
                code!="404",
            }[{{ interval }}]
        )
    )
    /
    sum(
        rate(
            traefik_service_request_duration_seconds_bucket{
                service=~"{{ namespace }}-{{ target }}-canary-[0-9a-zA-Z-]+@kubernetescrd",
            }[{{ interval }}]
        )
    ) * 100
```
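Save the above resource as not-found-percentage.yaml and then apply it:
```sh
kubectl apply -f ./not-found-percentage.yaml
```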
Edit the canary analysis and add the not found error rate check:
```yaml
  analysis:
    metrics:
      - name: "404s percentage"
        templateRef:
          name: not-found-percentage
        thresholdRange:
          max: 5
        interval: 1m
```
The above configuration validates the canary by checking if the HTTP 404 req/sec percentage is below 5 percent of the total traffic. If the 404s rate reaches the 5% threshold, then the canary fails.
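If you want to sanity-check the query before running the analysis, you can port-forward the bundled Prometheus (the metric template above points at the flagger-prometheus service in the traefik namespace) and run it by hand:
```sh
# expose the Prometheus UI/API on localhost
kubectl -n traefik port-forward svc/flagger-prometheus 9090:9090
```
Then open http://localhost:9090 and paste the query, substituting the {{ namespace }}, {{ target }} and {{ interval }} placeholders (for example test, podinfo and 1m).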
Trigger a canary deployment by updating the container image:
```sh
kubectl -n test set image deployment/podinfo \
podinfod=stefanprodan/podinfo:4.0.6
```
Generate 404s:
```sh
watch curl http://app.example.com/status/404
```
Watch Flagger logs:
```text
kubectl -n traefik logs deployment/flagger -f | jq .msg

Starting canary deployment for podinfo.test
Advance podinfo.test canary weight 5
Advance podinfo.test canary weight 10
Advance podinfo.test canary weight 15
Halt podinfo.test advancement 404s percentage 6.20 > 5
Halt podinfo.test advancement 404s percentage 6.45 > 5
Halt podinfo.test advancement 404s percentage 7.60 > 5
Halt podinfo.test advancement 404s percentage 8.69 > 5
Halt podinfo.test advancement 404s percentage 9.70 > 5
Rolling back podinfo.test failed checks threshold reached 5
Canary failed! Scaling down podinfo.test
```
If you have alerting configured, Flagger will send a notification with the reason why the canary failed.
For an in-depth look at the analysis process read the usage docs.