site-reliability-engineering

Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system's capability to withstand turbulent and unexpected conditions. It is a disciplined approach to identifying failures before they become outages.

The controller-runtime library expects a logger to be set (via log.SetLogger()) within the first 30 seconds after the application starts; otherwise it falls back to the default NullLogSink. We should also call it in test code.

When testing with Ginkgo, we can use its GinkgoWriter, which hides output by default and prints it only when a test fails. We should route our test logging through it so that output from passing tests stays out of the way.

Issue Description

Question

Describe what happened (or what feature you want)

I'm evaluating ChaosBlade as an option for resiliency testing, but I'm not sure whether this is a feature request or a question. Actually, I have two questions:

Does ChaosBlade support Azure, or can it be extended to support Azure?
Can ChaosBlade inject failures into a Platform as a Service (Pa

We could add upgrade and downgrade commands to the litmusctl binary. They would read a matrix of versions from a file and upgrade or downgrade according to the user's choice.

Example:

litmusctl upgrade v0.5.0
litmusctl downgrade v0.4.0

It seems to me that UTC is used both for the on-the-wire representation of time and in the database (jaegertracing/jaeger#712), which makes sense, at least with a somewhat naive handling of timezones. However, I think the Jaeger UI should support displaying times in the user's local timezone, i.e. that of the browser, so as to reduce the mental load when viewing

Although it's not a high priority, we could build a fancier, more modern wheel.

This is a rewrite of #129 to make it easier to parse :-)

Background

Prometheus is the de facto standard for monitoring applications in the cloud native space. One of its core concepts is "time-series" data for metrics (see the Prometheus docs for a fuller picture). At a high level, you can think of it as a continuous series of values for som
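To make the idea concrete, here is a minimal Go model of a time series in the spirit of Prometheus: a metric name plus a set of labels identifies the series, and the series holds samples of a millisecond timestamp and a float64 value. The types `Sample` and `Series` are hypothetical illustrations, not Prometheus's own data structures, and the `ID` rendering only approximates the exposition format's escaping.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is one point: a millisecond timestamp and a float64 value.
type Sample struct {
	TimestampMs int64
	Value       float64
}

// Series is identified by its name and labels and holds time-ordered samples.
type Series struct {
	Name    string
	Labels  map[string]string
	Samples []Sample
}

// ID renders name{k="v",...} with labels sorted, so the same label set
// always yields the same identity regardless of map iteration order.
func (s *Series) ID() string {
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = fmt.Sprintf("%s=%q", k, s.Labels[k])
	}
	return s.Name + "{" + strings.Join(parts, ",") + "}"
}

// Append adds a sample, rejecting timestamps that are not strictly increasing.
func (s *Series) Append(ts int64, v float64) error {
	if n := len(s.Samples); n > 0 && ts <= s.Samples[n-1].TimestampMs {
		return fmt.Errorf("out-of-order sample at %d", ts)
	}
	s.Samples = append(s.Samples, Sample{TimestampMs: ts, Value: v})
	return nil
}

func main() {
	s := &Series{Name: "http_requests_total", Labels: map[string]string{"job": "api", "method": "GET"}}
	_ = s.Append(1_000, 1)
	_ = s.Append(16_000, 7)
	fmt.Println(s.ID(), s.Samples)
}
```

Each distinct label combination is a separate series, which is why label choice directly affects how many series a monitoring system has to store.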

Here are 62 public repositories matching this topic...

dastergon / awesome-sre

upgundecha / howtheysre

dastergon / awesome-chaos-engineering

chaos-mesh / chaos-mesh

chaosblade-io / chaosblade

litmuschaos / litmus

alexei-led / pumba

dastergon / postmortem-templates

jaegertracing / jaeger-ui

SquadcastHub / awesome-sre-tools

chris-short / DevOps-README.md

mister0 / How-to-prepare-for-google-interview-SWE-SRE

rishiloyola / SRE-Interviews

chiaen / sre-book-in-audio

dastergon / CardsAgainstReliability

zeroc0d3lab / awesome-sre

dastergon / wheel-of-misfortune

danrl / skinny

dastergon / availability-calculator

gremlin / sre-tools

krootee / awesome-scalability-toolbox

marceloboeira / sre

dastergon / sreworkbook-templates-md

exajobs / devops-collection

QAInsights / Performance-Engineers-DevOps

dastergon / common-disaster-recovery-scenarios

operate-first / operations

ari-hacks / kubernetes-chaos-sandbox

at15 / sre-handbook

zeroc0d3lab / awesome-scalability