-
Updated
Jul 7, 2020 - HTML
#
sre
Here are 238 public repositories matching this topic...
Compilation of public failure/horror stories related to Kubernetes
A curated list of Site Reliability and Production Engineering resources.
devops
availability
list
awesome
monitoring
reliability-engineering
incident-response
site-reliability-engineering
production
post-mortem
capacity-planning
service-level-agreement
scalability
reliability
alerting
on-call
awesome-list
sre
postmortem
site-reliability
-
Updated
Aug 10, 2020
Enable Self-Service Operations: Give specific users access to your existing tools, services, and scripts
ansible
devops
automation
ops
deployment
scheduler
rundeck
audit
orchestration
operations
sre
devops-tools
runbook
devops-team
-
Updated
Aug 21, 2020 - Groovy
Site Reliability Engineer Interview Preparation Guide
-
Updated
Jul 26, 2020
manugarg commented on Oct 24, 2019
Cloudprober supports building additional metrics (other than the default ones) from external probe output. We could possibly do the same for the HTTP probe.
A framework for gradual system automation
ruby
devops
rubygem
automation
ops
orchestration-framework
orchestration
remote-execution
operations
sre
devops-tools
sshkit
automation-framework
runbook
runbooks
opseng
runbook-configuration
runbook-generators
runbook-dsl
runbook-command
-
Updated
Jul 24, 2020 - Ruby
What to Read to Learn More About DevOps
devops
cloud
monitoring
continuous-integration
continuous-delivery
stress
site-reliability-engineering
continuous-deployment
culture
systems
release
leader
lean
cloud-native
sre
failure
blame
systems-engineering
systems-administration
devops-journey
-
Updated
Jun 9, 2019
Knowledge seeks no man
linux
docker
kubernetes
aws
devops
cloud
containers
site-reliability-engineering
gcp
gke
infrastructure-as-code
sre
information-security
devsecops
-
Updated
Aug 16, 2020
Linux Bash Shell Script and Python Script For Ops and Devops
-
Updated
Aug 19, 2020 - Python
Guidance on how to make your environment easier to onboard for Web Ops Engineers, SREs, and DevOps Practitioners
-
Updated
Jun 16, 2020
A reading/viewing list for larval stage sysadmins and SREs
-
Updated
Aug 21, 2020
Curated list of good SRE interview questions.
-
Updated
May 6, 2020
Marmot workflow execution engine
go
kubernetes
golang
devops
google
network
google-cloud
network-monitoring
sre
kubernetes-operator
devops-tools
devops-services
-
Updated
Sep 6, 2017 - Go
Collection of AWS SSM Documents to perform Chaos Engineering experiments
aws
chaos
chaos-monkey
software-engineering
aws-ec2
sre
amazon-web-services
chaos-testing
chaos-engineering
-
Updated
Jul 21, 2020 - Python
Kubernetes utility for exposing image versions in use, compared to latest available upstream, as metrics.
-
Updated
Aug 20, 2020 - Go
Google Site Reliability Engineering book converted to audio
-
Updated
Mar 22, 2017
s3-streaming-upload is a Node.js library that listens to your stream and uploads its data to Amazon S3 using the ManagedUpload API.
-
Updated
Jul 17, 2020 - JavaScript
Notes on Site Reliability Engineering. Leave a 🌟 if you found this useful!
-
Updated
Oct 31, 2019
-
Updated
Nov 10, 2018
A curated list of Site Reliability and Production Engineering Tools
devops
availability
list
awesome
monitoring
reliability-engineering
site-reliability-engineering
production
post-mortem
service-level-agreement
reliability
awesome-list
sre
devops-tools
service-level-objective
incident-management
postmortem
monitoring-tools
service-level-monitoring
incident-responce
-
Updated
Jul 26, 2020
A curated list of awesome Site Reliability and Production Engineering resources.
devops
availability
awesome
monitoring
reliability-engineering
incident-response
site-reliability-engineering
production
post-mortem
capacity-planning
service-level-agreement
scalability
reliability
alerting
on-call
awesome-list
sre
observability
postmortem
site-reliability
-
Updated
May 6, 2018
Collection of Python scripts to run failure injection on AWS infrastructure
-
Updated
May 4, 2020 - Python
Cloud Operations Sandbox is an open source tool that helps practitioners learn Service Reliability Engineering practices from Google and apply them to their cloud services using the Cloud Operations suite of tools.
debugger
devops
cloud
profiler
google-cloud
operations
cloud-native
sre
stackdriver
cloud-operations
stackdriver-monitoring
cloudops
opencensus
stackdriver-logs
stackdriver-trace
opentelemetry
ops-management
stackdriver-sandbox
-
Updated
Aug 22, 2020 - C#
The Skinny Distributed Lock Service
-
Updated
May 23, 2020 - Go
Chaos Injection library for AWS Lambda
-
Updated
May 5, 2020 - Python

This is a reminder for me or a task if anyone wants :P
Basically, the last two questions aren't really regex questions.
To do: