This guide shows you how to use Prometheus Operator for canary analysis.
Flagger requires a Kubernetes cluster v1.16 or newer and Prometheus Operator v0.40 or newer.
Install Prometheus Operator with Helm v3:
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

kubectl create ns monitoring

helm upgrade -i prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
--set fullnameOverride=prometheus
```
The `prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false`
option allows Prometheus Operator to watch ServiceMonitors outside of its namespace.
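Under the hood, setting this value to `false` makes the chart render an empty ServiceMonitor selector on the Prometheus custom resource, which matches every ServiceMonitor in the cluster. A minimal sketch of that effect (illustrative, not the chart's literal output):

```yaml
# Sketch only: an empty serviceMonitorSelector selects all ServiceMonitors,
# including those created in other namespaces such as test.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceMonitorSelector: {}
```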
Install Flagger by setting the metrics server to Prometheus:
```bash
helm repo add flagger https://flagger.app

kubectl create ns flagger-system

helm upgrade -i flagger flagger/flagger \
--namespace flagger-system \
--set metricsServer=http://prometheus-prometheus.monitoring:9090 \
--set meshProvider=kubernetes
```
Install Flagger's tester:
```bash
helm upgrade -i loadtester flagger/loadtester \
--namespace flagger-system
```
Install the podinfo demo app:
```bash
helm repo add podinfo https://stefanprodan.github.io/podinfo

kubectl create ns test

helm upgrade -i podinfo podinfo/podinfo \
--namespace test \
--set service.enabled=false
```
The demo app is instrumented with Prometheus, so you can create ServiceMonitor
objects to scrape podinfo's metrics endpoint:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: podinfo-canary
  namespace: test
spec:
  endpoints:
  - path: /metrics
    port: http
    interval: 5s
  selector:
    matchLabels:
      app: podinfo-canary
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: podinfo-primary
  namespace: test
spec:
  endpoints:
  - path: /metrics
    port: http
    interval: 5s
  selector:
    matchLabels:
      app: podinfo
```
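Save the two ServiceMonitors to a file (the file name below is just an example), apply them and check that they were created:

```bash
kubectl apply -f ./podinfo-service-monitors.yaml

# The ServiceMonitor custom resources should now exist in the test namespace
kubectl -n test get servicemonitors
```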
We are setting `interval: 5s` to have more aggressive scraping.
If you do not define it, you should use a longer interval in the Canary object, as illustrated below.
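For example, if you leave the endpoint interval unset and rely on the Prometheus default scrape interval (typically 30s), give each metric check in the Canary analysis a longer window. The snippet below is only an illustration; the full Canary spec follows later in this guide:

```yaml
# Illustrative values only: a longer metric window to match a slower scrape interval
metrics:
- name: error-rate
  templateRef:
    name: error-rate
  thresholdRange:
    max: 1
  interval: 1m
```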
Create a metric template to measure the HTTP requests error rate:
```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: test
spec:
  provider:
    address: http://prometheus-prometheus.monitoring:9090
    type: prometheus
  query: |
    100 - rate(
      http_requests_total{
        namespace="{{ namespace }}",
        job="{{ target }}-canary",
        status!~"5.*"
      }[{{ interval }}])
    /
    rate(
      http_requests_total{
        namespace="{{ namespace }}",
        job="{{ target }}-canary"
      }[{{ interval }}]
    ) * 100
```
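To preview what Flagger will execute, you can expand the template variables by hand (`{{ namespace }}` → test, `{{ target }}` → podinfo, `{{ interval }}` → the metric interval) and run the query against Prometheus yourself; port-forwarding is just one convenient way to reach it:

```bash
# Reach Prometheus locally (any access method works)
kubectl -n monitoring port-forward svc/prometheus-prometheus 9090:9090 &

# The same query with the template variables filled in for this guide's setup
curl -s -G http://localhost:9090/api/v1/query \
  --data-urlencode 'query=100 - rate(http_requests_total{namespace="test",job="podinfo-canary",status!~"5.*"}[30s]) / rate(http_requests_total{namespace="test",job="podinfo-canary"}[30s]) * 100'
```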
Create a metric template to measure the HTTP requests average duration:
```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: latency
  namespace: test
spec:
  provider:
    address: http://prometheus-prometheus.monitoring:9090
    type: prometheus
  query: |
    histogram_quantile(0.99,
      sum(
        rate(
          http_request_duration_seconds_bucket{
            namespace="{{ namespace }}",
            job="{{ target }}-canary"
          }[{{ interval }}]
        )
      ) by (le)
    )
```
Using the metric templates, you can configure the canary analysis with HTTP error rate and latency checks:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: kubernetes
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  progressDeadlineSeconds: 60
  service:
    port: 80
    targetPort: http
    name: podinfo
  analysis:
    interval: 30s
    iterations: 10
    threshold: 2
    metrics:
    - name: error-rate
      templateRef:
        name: error-rate
      thresholdRange:
        max: 1
      interval: 30s
    - name: latency
      templateRef:
        name: latency
      thresholdRange:
        max: 0.5
      interval: 30s
    webhooks:
    - name: load-test
      type: rollout
      url: "http://loadtester.flagger-system/"
      timeout: 5s
      metadata:
        type: cmd
        cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test/"
```
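Save the Canary to a file (the name below is just an example), apply it and wait for Flagger to initialize the primary workload:

```bash
kubectl apply -f ./podinfo-canary.yaml

# Watch until the canary reports Initialized
kubectl -n test get canary podinfo --watch
```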
Based on the above specification, Flagger creates the primary and canary Kubernetes ClusterIP services.
During the canary analysis, Prometheus will scrape the canary service and Flagger will use the HTTP error rate and latency queries to determine if the release should be promoted or rolled back.
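To see this in action, trigger an analysis run by updating the podinfo container image and then follow the canary events; the container name and image tag below are assumptions, so adjust them to match your Deployment:

```bash
# Container name and tag are illustrative; verify them first with
# kubectl -n test get deploy podinfo -o yaml
kubectl -n test set image deployment/podinfo podinfo=ghcr.io/stefanprodan/podinfo:6.0.1

# The events show each metric check and the final promotion or rollback
kubectl -n test describe canary/podinfo
kubectl -n flagger-system logs deploy/flagger -f
```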