Fast fail metric #764

rdji123 · 2020-11-17T15:53:37Z

What are you trying to accomplish with this PR?
Adding StatsD data to get a better understanding of how often fast fail occurs.

cc @Shopify/pipeline

timothysmith0609 · 2020-11-17T16:15:56Z

lib/krane/kubernetes_resource/deployment.rb

+      if progress_condition.present?
+        StatsD.client.increment('kubectl.error', 1, tags: { context: context, namespace: namespace,
+                                                            progress_condition: deploy_failing_to_progress? })
+      end


I don't really like the idea of instrumenting the resource models themselves, and would prefer that we move this to the actual task runner, where such instrumentation makes more sense. It's worth noting that we already capture timeout errors and publish them via StatsD (see https://shopify.datadoghq.com/dashboard/5kc-557-amd/krane--kubernetes-deploy-dash?from_ts=1605624831373&live=true&to_ts=1605628431373).

Alternatively, since we are constrained by the KubernetesResource interface, could we add a statsd_tag in Deployment#deploy_timed_out? when deploy_failing_to_progress? is true to indicate an actual progressing failure? 🤔 It would be hard to know if it's a fail-fast or an initial progressing failure, though.
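To make the alternative above concrete, here is a minimal, self-contained sketch of the idea: the resource model only exposes tags, and a runner-side hook is the single place that talks to StatsD. Everything in it is an assumption for illustration, not Krane's actual code: `RecordingStatsD`, `SketchDeployment`, `report_timeouts`, the `resource.timeout` metric name, and the `progress_deadline_exceeded` tag are all hypothetical.

```ruby
# Hypothetical stand-in for a StatsD client: records calls instead of sending them.
class RecordingStatsD
  attr_reader :calls

  def initialize
    @calls = []
  end

  def increment(name, value = 1, tags: {})
    @calls << { name: name, value: value, tags: tags }
  end
end

# Simplified, hypothetical resource model. It exposes state and tags but emits no metrics.
class SketchDeployment
  def initialize(failing_to_progress:)
    @failing_to_progress = failing_to_progress
  end

  def deploy_failing_to_progress?
    @failing_to_progress
  end

  # In this sketch a deployment times out only when it fails to progress.
  def deploy_timed_out?
    deploy_failing_to_progress?
  end

  # The extra tag marks the timeout as a progressing failure.
  def statsd_tags
    { type: "Deployment", progress_deadline_exceeded: deploy_failing_to_progress? }
  end
end

# Hypothetical runner-side hook: increments one counter per timed-out resource.
def report_timeouts(resources, statsd:)
  resources.select(&:deploy_timed_out?).each do |resource|
    statsd.increment("resource.timeout", 1, tags: resource.statsd_tags)
  end
end

statsd = RecordingStatsD.new
resources = [
  SketchDeployment.new(failing_to_progress: true),
  SketchDeployment.new(failing_to_progress: false),
]
report_timeouts(resources, statsd: statsd)
```

As noted above, a tag like this still cannot tell a fail-fast apart from an initial progressing failure; it only marks that the progress deadline was involved.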

timothysmith0609 · 2020-11-19T17:13:18Z

lib/krane/kubernetes_resource/deployment.rb

+      if progress_condition.present?
+        StatsD.client.increment('kubectl.error', 1, tags: statsd_tags)
+      end


timothysmith0609: I'm confused, why are we incrementing kubectl.error?

rdji123: I wanted to use an existing metric in Krane. Do you have any suggestions as to which metric we can use? We can also create a new metric for this.

timothysmith0609: I don't think we have a metric that's quite suitable. Perhaps I'm ignorant, but is there some reason against using a bespoke metric?

rdji123: No reason against it! I've updated the metric to a new one 👍

ayatsynych · 2020-11-19T19:48:20Z

lib/krane/kubernetes_resource/deployment.rb

@@ -91,6 +91,10 @@ def deploy_timed_out?
      return false if deploy_failed?
      return super if timeout_override

+      if progress_condition.present?
+        StatsD.client.increment('fail_fast', 1, tags: statsd_tags)


It might be a good idea to keep the kubectl prefix, so it's clear that this metric is specific to kubectl and it's easier to discover.
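A small sketch of what the prefix suggestion could look like, assuming a dedicated counter that is only incremented when a progress condition is present. The names `METRIC_PREFIX`, `prefixed_metric`, `CounterRecorder`, `record_fail_fast`, and the resulting `kubectl.fail_fast` metric name are hypothetical; the thread does not state the final metric name.

```ruby
# Hypothetical namespace prefix, following the reviewer's suggestion.
METRIC_PREFIX = "kubectl"

# Returns the metric name with the prefix applied exactly once.
def prefixed_metric(name, prefix: METRIC_PREFIX)
  name.start_with?("#{prefix}.") ? name : "#{prefix}.#{name}"
end

# Hypothetical stand-in for a StatsD client: counts increments per metric name.
class CounterRecorder
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def increment(name, value = 1, tags: {})
    @counts[name] += value
  end
end

# Increments the fail-fast counter only when a progress condition exists.
def record_fail_fast(statsd, progress_condition_present:, tags: {})
  return unless progress_condition_present

  statsd.increment(prefixed_metric("fail_fast"), 1, tags: tags)
end

recorder = CounterRecorder.new
record_fail_fast(recorder, progress_condition_present: true, tags: { type: "Deployment" })
record_fail_fast(recorder, progress_condition_present: false)
```

Keeping the prefix in one helper means related metrics sort together in a dashboard's metric picker, which is the discoverability point raised above.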

rdji123 added 2 commits Nov 16, 2020

adding logger info for fast fail

01d0411

added statsd data

82ca718

rdji123 requested a review from Shopify/krane as a code owner Nov 17, 2020

timothysmith0609 reviewed Nov 17, 2020

rdji123 added 4 commits Nov 19, 2020

adding statsd

acd51ce

Update .rubocop-http---shopify-github-io-ruby-style-guide-rubocop-yml

Verified: this commit was created on GitHub.com and signed with a verified signature using GitHub's key (GPG key ID: 4AEE18F83AFDEB23).

737fdd7

add progress deadline to statsd tags

9738d94

add if condition to progress_deadline_present?

221d80a

timothysmith0609 reviewed Nov 19, 2020

rdji123 added 2 commits Nov 19, 2020

change metric name

a938929

change metric name

91c02de

ayatsynych reviewed Nov 19, 2020
