Metrics Analysis

As part of the analysis process, Flagger can validate service level objectives (SLOs) like availability, error rate percentage, average response time and any other objective based on app-specific metrics. If a drop in performance is noticed during the SLO analysis, the release is automatically rolled back with minimum impact to end-users.

Builtin metrics

Flagger comes with two builtin metric checks: HTTP request success rate and duration.

  analysis:
    metrics:
    - name: request-success-rate
      interval: 1m
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
    - name: request-duration
      interval: 1m
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500

For each metric you can specify a range of accepted values with thresholdRange and the window size of the time series with interval. The builtin checks are available for every service mesh / ingress controller and are implemented with Prometheus queries.

Custom metrics

The canary analysis can be extended with custom metric checks. Using a MetricTemplate custom resource, you configure Flagger to connect to a metric provider and run a query that returns a float64 value. The query result is used to validate the canary based on the specified threshold range.

The following variables are available in query templates:

  • name (canary.metadata.name)

  • namespace (canary.metadata.namespace)

  • target (canary.spec.targetRef.name)

  • service (canary.spec.service.name)

  • ingress (canary.spec.ingressRef.name)

  • interval (canary.spec.analysis.metrics[].interval)

  • variables (canary.spec.analysis.metrics[].templateVariables)

A canary analysis metric can reference a template with templateRef:
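A minimal sketch of such a reference, assuming a MetricTemplate named my-metric exists in the flagger namespace (names and thresholds are placeholders):

    analysis:
      metrics:
      - name: "my metric"
        templateRef:
          name: my-metric
          # namespace is optional, defaults to the canary's namespace
          namespace: flagger
        thresholdRange:
          max: 10
        interval: 1m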

A canary analysis metric can reference a set of custom variables with templateVariables. These variables are then injected into the query defined in the referenced MetricTemplate object during the canary analysis:
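For example (the direction variable and its value are hypothetical; the referenced template would consume it as {{ variables.direction }} in its query):

    analysis:
      metrics:
      - name: "my metric"
        templateRef:
          name: my-metric
          namespace: flagger
        thresholdRange:
          max: 10
        interval: 1m
        templateVariables:
          direction: inbound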

Prometheus

You can create custom metric checks targeting a Prometheus server by setting the provider type to prometheus and writing the query in PromQL.

Prometheus template example:
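The sketch below assumes an Istio-style istio_requests_total counter and a Prometheus server reachable at the given address; it computes the percentage of HTTP 404 responses out of all requests:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: not-found-percentage
      namespace: istio-system
    spec:
      provider:
        type: prometheus
        address: http://prometheus.istio-system:9090
      query: |
        100 - sum(
            rate(
                istio_requests_total{
                  reporter="destination",
                  destination_workload_namespace="{{ namespace }}",
                  destination_workload="{{ target }}",
                  response_code!="404"
                }[{{ interval }}]
            )
        )
        /
        sum(
            rate(
                istio_requests_total{
                  reporter="destination",
                  destination_workload_namespace="{{ namespace }}",
                  destination_workload="{{ target }}"
                }[{{ interval }}]
            )
        ) * 100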

Reference the template in the canary analysis:
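Assuming a MetricTemplate named not-found-percentage exists in the istio-system namespace:

    analysis:
      metrics:
      - name: "404s percentage"
        templateRef:
          name: not-found-percentage
          namespace: istio-system
        thresholdRange:
          max: 5
        interval: 1m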

The above configuration validates the canary by checking that the percentage of HTTP 404 requests stays below 5 percent of the total traffic. If the 404 rate reaches the 5% threshold, the canary fails.

Prometheus gRPC error rate example:
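A sketch, assuming the service exposes the standard grpc_server_handled_total counter and that the scrape config attaches kubernetes_namespace and kubernetes_pod_name labels; it computes the percentage of non-OK gRPC responses:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: grpc-error-rate-percentage
      namespace: flagger
    spec:
      provider:
        type: prometheus
        address: http://flagger-prometheus.flagger-system:9090
      query: |
        sum(
            rate(
                grpc_server_handled_total{
                  grpc_code!="OK",
                  kubernetes_namespace="{{ namespace }}",
                  kubernetes_pod_name=~"{{ target }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)"
                }[{{ interval }}]
            )
        )
        /
        sum(
            rate(
                grpc_server_handled_total{
                  kubernetes_namespace="{{ namespace }}",
                  kubernetes_pod_name=~"{{ target }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)"
                }[{{ interval }}]
            )
        ) * 100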

The above template is for gRPC services instrumented with go-grpc-prometheus.

Prometheus authentication

If your Prometheus API requires basic authentication, you can create a secret in the same namespace as the MetricTemplate with the basic-auth credentials:
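For example (the secret name and credential values are placeholders):

    apiVersion: v1
    kind: Secret
    metadata:
      name: prom-basic-auth
      namespace: flagger
    stringData:
      username: your-user
      password: your-password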

or, if you require bearer token authentication (e.g. via a service account token):
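For example (again with placeholder values):

    apiVersion: v1
    kind: Secret
    metadata:
      name: prom-bearer-auth
      namespace: flagger
    stringData:
      token: your-bearer-token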

Then reference the secret in the MetricTemplate:
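A sketch, assuming the secret is named prom-basic-auth (the query is a trivial placeholder):

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: my-metric
      namespace: flagger
    spec:
      provider:
        type: prometheus
        address: http://prometheus.monitoring:9090
        secretRef:
          name: prom-basic-auth
      query: |
        sum(rate(http_requests_total[{{ interval }}]))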

Datadog

You can create custom metric checks using the Datadog provider.

Create a secret with your Datadog API credentials:
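For example (the credential values are placeholders):

    apiVersion: v1
    kind: Secret
    metadata:
      name: datadog
      namespace: istio-system
    stringData:
      datadog_api_key: your-api-key
      datadog_application_key: your-application-key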

Datadog template example:
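A sketch, assuming a secret named datadog holds the API credentials and that Istio's Datadog integration reports the istio.mesh.request.count metric:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: not-found-percentage
      namespace: istio-system
    spec:
      provider:
        type: datadog
        address: https://api.datadoghq.com
        secretRef:
          name: datadog
      query: |
        100 - (
          sum:istio.mesh.request.count{
            reporter:destination,
            destination_workload_namespace:{{ namespace }},
            destination_workload:{{ target }},
            !response_code:404
          }.as_count() /
          sum:istio.mesh.request.count{
            reporter:destination,
            destination_workload_namespace:{{ namespace }},
            destination_workload:{{ target }}
          }.as_count()
        ) * 100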

Reference the template in the canary analysis:
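Assuming a MetricTemplate named not-found-percentage in the istio-system namespace:

    analysis:
      metrics:
      - name: "404s percentage"
        templateRef:
          name: not-found-percentage
          namespace: istio-system
        thresholdRange:
          max: 5
        interval: 1m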

Amazon CloudWatch

You can create custom metric checks using the CloudWatch metrics provider.

CloudWatch template example:
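A sketch, assuming custom metrics (namespace MyKubernetesCluster, metric names ErrorCount and RequestCount, dimension appName) have been published to CloudWatch; the query is a JSON array of MetricDataQuery objects:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: cloudwatch-error-rate
      namespace: flagger
    spec:
      provider:
        type: cloudwatch
        region: ap-northeast-1
      query: |
        [
          {
            "Id": "error_rate",
            "Expression": "errors / requests * 100",
            "Label": "ErrorRate"
          },
          {
            "Id": "errors",
            "MetricStat": {
              "Metric": {
                "Namespace": "MyKubernetesCluster",
                "MetricName": "ErrorCount",
                "Dimensions": [
                  { "Name": "appName", "Value": "{{ name }}.{{ namespace }}" }
                ]
              },
              "Period": 60,
              "Stat": "Sum",
              "Unit": "Count"
            },
            "ReturnData": false
          },
          {
            "Id": "requests",
            "MetricStat": {
              "Metric": {
                "Namespace": "MyKubernetesCluster",
                "MetricName": "RequestCount",
                "Dimensions": [
                  { "Name": "appName", "Value": "{{ name }}.{{ namespace }}" }
                ]
              },
              "Period": 60,
              "Stat": "Sum",
              "Unit": "Count"
            },
            "ReturnData": false
          }
        ]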

The query format documentation can be found here.

Reference the template in the canary analysis:
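Assuming a MetricTemplate named cloudwatch-error-rate:

    analysis:
      metrics:
      - name: "app error rate"
        templateRef:
          name: cloudwatch-error-rate
          namespace: flagger
        thresholdRange:
          max: 10
        interval: 1m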

Note that Flagger needs the AWS IAM permission cloudwatch:GetMetricData to use this provider.

New Relic

You can create custom metric checks using the New Relic provider.

Create a secret with your New Relic Insights credentials:
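For example (the credential values are placeholders; the account ID and an Insights query key are required):

    apiVersion: v1
    kind: Secret
    metadata:
      name: newrelic
      namespace: istio-system
    stringData:
      newrelic_account_id: your-account-id
      newrelic_query_key: your-insights-query-key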

New Relic template example:
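A sketch, assuming a secret named newrelic and an NGINX ingress controller whose nginx_ingress_controller_requests metric is reported to New Relic (the NRQL below is an assumption tied to that setup):

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: newrelic-error-rate
      namespace: ingress-nginx
    spec:
      provider:
        type: newrelic
        secretRef:
          name: newrelic
      query: |
        SELECT
          filter(sum(nginx_ingress_controller_requests), WHERE status >= '500') /
          sum(nginx_ingress_controller_requests) * 100
        FROM Metric
        WHERE metricName = 'nginx_ingress_controller_requests'
        AND `kubernetes.namespaceName` = '{{ namespace }}'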

Reference the template in the canary analysis:
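Assuming a MetricTemplate named newrelic-error-rate:

    analysis:
      metrics:
      - name: "error rate"
        templateRef:
          name: newrelic-error-rate
          namespace: ingress-nginx
        thresholdRange:
          max: 5
        interval: 1m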

Graphite

You can create custom metric checks using the Graphite provider.

Graphite template example:
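A sketch, assuming your app reports request counters to Graphite under a path containing the workload name (the metric paths are hypothetical):

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: graphite-request-success-rate
      namespace: flagger
    spec:
      provider:
        type: graphite
        address: http://graphite.monitoring
      query: |
        target=asPercent(
          sumSeries(app.{{ target }}.requests.status.2*.count),
          sumSeries(app.{{ target }}.requests.status.*.count)
        )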

Reference the template in the canary analysis:
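Assuming a MetricTemplate named graphite-request-success-rate:

    analysis:
      metrics:
      - name: "success rate"
        templateRef:
          name: graphite-request-success-rate
          namespace: flagger
        thresholdRange:
          min: 90
        interval: 1m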

Graphite authentication

If your Graphite API requires basic authentication, you can create a secret in the same namespace as the MetricTemplate with the basic-auth credentials:
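For example (the secret name and credential values are placeholders):

    apiVersion: v1
    kind: Secret
    metadata:
      name: graphite-basic-auth
      namespace: flagger
    stringData:
      username: your-user
      password: your-password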

Then, reference the secret in the MetricTemplate:
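A sketch, assuming the secret is named graphite-basic-auth (the query target is hypothetical):

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: my-metric
      namespace: flagger
    spec:
      provider:
        type: graphite
        address: http://graphite.monitoring
        secretRef:
          name: graphite-basic-auth
      query: |
        target=sumSeries(app.{{ target }}.requests.count)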

Google Cloud Monitoring (Stackdriver)

Enable Workload Identity on your cluster, create a GCP service account that has read access to the Cloud Monitoring API and then create an IAM policy binding between the GCP service account and the Flagger service account on Kubernetes. You can take a look at this guide.

Annotate the flagger service account:
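For example (the Flagger namespace, GCP service account name and project ID are placeholders):

    kubectl annotate serviceaccount flagger \
      --namespace flagger-system \
      iam.gke.io/gcp-service-account=<gcp-sa-name>@<project-id>.iam.gserviceaccount.com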

Alternatively, you can download the JSON key and add it to your secret with the key serviceAccountKey (this method is not recommended).

Create a secret that contains your project-id (and, if workload identity is not enabled on your cluster, your service account json).
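For example (the secret name and project ID are placeholders):

    apiVersion: v1
    kind: Secret
    metadata:
      name: gcloud-sa
      namespace: flagger-system
    stringData:
      project-id: your-project-id
      # serviceAccountKey: <json-key>   # only when Workload Identity is not enabled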

Then reference the secret in the metric template. Note: The particular MQL query used here works if Istio is installed on GKE.
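A sketch, assuming a secret named gcloud-sa and Istio's Stackdriver metrics on GKE; it fetches the p99 of server response latencies for the canary service:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: response-latency-p99
      namespace: flagger-system
    spec:
      provider:
        type: stackdriver
        secretRef:
          name: gcloud-sa
      query: |
        fetch k8s_container
        | metric 'istio.io/service/server/response_latencies'
        | filter
            (metric.destination_service_name == '{{ service }}-canary'
             && metric.destination_service_namespace == '{{ namespace }}')
        | align delta(1m)
        | every 1m
        | group_by [],
            [value_response_latencies_percentile:
               percentile(value.response_latencies, 99)]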

The reference for the query language can be found here.

InfluxDB

The InfluxDB provider uses the flux query language.

Create a secret that contains your authentication token that can be found in the InfluxDB UI.
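For example (the token value is a placeholder; the key name token is an assumption):

    apiVersion: v1
    kind: Secret
    metadata:
      name: influx-token
      namespace: flagger-system
    stringData:
      token: your-influxdb-token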

Then reference the secret in the metric template.
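A sketch, assuming a secret named influx-token and Istio request metrics scraped into a bucket named default (the bucket, address and tag names are assumptions):

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: influxdb-error-count
      namespace: flagger-system
    spec:
      provider:
        type: influxdb
        address: http://influxdb.monitoring:8086
        secretRef:
          name: influx-token
      query: |
        from(bucket: "default")
          |> range(start: -2h)
          |> filter(fn: (r) => r["_measurement"] == "istio_requests_total")
          |> filter(fn: (r) => r["destination_workload_namespace"] == "{{ namespace }}")
          |> filter(fn: (r) => r["destination_workload"] == "{{ target }}")
          |> filter(fn: (r) => r["response_code"] == "500")
          |> count()
          |> yield(name: "count")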


Dynatrace

You can create custom metric checks using the Dynatrace provider.

Create a secret with your Dynatrace token:
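For example (the token value is a placeholder):

    apiVersion: v1
    kind: Secret
    metadata:
      name: dynatrace
      namespace: istio-system
    stringData:
      dynatrace_token: your-api-token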

Dynatrace metric template example:
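A sketch, assuming a secret named dynatrace; the environment URL and the service entity ID in the metric selector are placeholders:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: response-time-95pct
      namespace: istio-system
    spec:
      provider:
        type: dynatrace
        address: https://xxxxxxxx.live.dynatrace.com
        secretRef:
          name: dynatrace
      query: |
        builtin:service.response.time:filter(eq(dt.entity.service,SERVICE-ABCDEF0123456789)):percentile(95)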

Reference the template in the canary analysis:
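Assuming a MetricTemplate named response-time-95pct (the threshold is in the unit returned by the Dynatrace metric):

    analysis:
      metrics:
      - name: "response time 95pct"
        templateRef:
          name: response-time-95pct
          namespace: istio-system
        thresholdRange:
          max: 1000
        interval: 1m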

Keptn

You can create custom metric checks using the Keptn provider. This provider can validate either the value of a single KeptnMetric resource, representing the value of a single metric, or the result of a Keptn Analysis, which provides flexible grading logic for analysing and prioritising a number of metric values coming from different data sources.

This provider requires Keptn to be installed in the cluster.

Example for a Keptn metric template:
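A sketch, following the keptnmetric/<namespace>/<resource-name> query format described below:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: response-time
      namespace: istio-system
    spec:
      provider:
        type: keptn
      query: keptnmetric/my-namespace/response-time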

This will reference the KeptnMetric with the name response-time in the namespace my-namespace, which could look like the following:
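For example (the Keptn metrics provider name and the Prometheus query are placeholders):

    apiVersion: metrics.keptn.sh/v1beta1
    kind: KeptnMetric
    metadata:
      name: response-time
      namespace: my-namespace
    spec:
      fetchIntervalSeconds: 10
      provider:
        name: my-prometheus-provider
      query: histogram_quantile(0.8, sum by(le) (rate(http_server_request_latency_seconds_bucket[5m])))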

The query contains the following components, which are divided by / characters:

  • type (required): Must be either keptnmetric or analysis.

  • namespace (required): The namespace of the referenced KeptnMetric/AnalysisDefinition.

  • resource-name (required): The name of the referenced KeptnMetric/AnalysisDefinition.

  • timeframe (optional): The timeframe used for the Analysis. This will usually be set to the same value as the analysis interval of a Canary. Only relevant if the type is set to analysis.

  • arguments (optional): Arguments to be passed to an Analysis. Arguments are passed as a list of key value pairs, separated by ; characters, e.g. foo=bar;bar=foo. Only relevant if the type is set to analysis.

For the type analysis, the value returned by the provider is either 0 (if the analysis failed), or 1 (analysis passed).

Splunk

You can create custom metric checks using the Splunk provider.

Create a secret that contains your authentication token that can be found in the Splunk o11y UI.
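For example (the token value is a placeholder; the key name sf_token_key is an assumption):

    apiVersion: v1
    kind: Secret
    metadata:
      name: splunk
      namespace: istio-system
    stringData:
      sf_token_key: your-access-token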

Splunk template example:
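A sketch, assuming a secret named splunk; the <REALM> placeholder and the SignalFlow program (metric and filter names) are assumptions:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: success-rate
      namespace: istio-system
    spec:
      provider:
        type: splunk
        address: https://api.<REALM>.signalfx.com
        secretRef:
          name: splunk
      query: |
        data('demo.trans.count', filter=filter('demo_host', '{{ target }}')).mean().publish()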

The query format documentation can be found here.

Reference the template in the canary analysis:
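Assuming a MetricTemplate named success-rate:

    analysis:
      metrics:
      - name: "success rate"
        templateRef:
          name: success-rate
          namespace: istio-system
        thresholdRange:
          min: 99
        interval: 1m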
