GitHub Status
All Systems Operational
Git Operations Operational
API Requests Operational
Webhooks Operational
Issues Operational
Pull Requests Operational
Actions Operational
Packages Operational
Pages Operational
Codespaces Operational
Copilot Operational
Status legend: Operational, Degraded Performance, Partial Outage, Major Outage, Maintenance
Past Incidents
Mar 12, 2024
Resolved - This incident has been resolved.
Mar 12, 01:00 UTC
Update - We believe we've resolved the root cause and are waiting for services to recover.
Mar 12, 01:00 UTC
Update - API Requests is operating normally.
Mar 12, 00:56 UTC
Update - Git Operations is operating normally.
Mar 12, 00:55 UTC
Update - Webhooks is operating normally.
Mar 12, 00:54 UTC
Update - Copilot is operating normally.
Mar 12, 00:54 UTC
Update - We're continuing to investigate issues with our authentication service, which are impacting multiple services.
Mar 12, 00:14 UTC
Update - Webhooks is experiencing degraded performance. We are continuing to investigate.
Mar 11, 23:55 UTC
Update - Webhooks is operating normally.
Mar 11, 23:31 UTC
Update - Copilot is experiencing degraded performance. We are continuing to investigate.
Mar 11, 23:21 UTC
Update - Git Operations is experiencing degraded performance. We are continuing to investigate.
Mar 11, 23:20 UTC
Update - Webhooks is experiencing degraded performance. We are continuing to investigate.
Mar 11, 23:09 UTC
Investigating - We are investigating reports of degraded availability for API Requests, Git Operations, and Webhooks.
Mar 11, 23:01 UTC
Mar 11, 2024
Resolved - This incident has been resolved.
Mar 11, 19:22 UTC
Update - Actions experienced a period of decreased workflow run throughput, and we are seeing recovery now. We are in the process of investigating the cause.
Mar 11, 19:21 UTC
Investigating - We are investigating reports of degraded performance for Actions.
Mar 11, 19:02 UTC
Resolved - This incident has been resolved.
Mar 11, 10:20 UTC
Update - We are deploying mitigations for the failures we have been observing in some chat requests for Copilot. We will continue to monitor and update.
Mar 11, 10:02 UTC
Update - We are seeing an elevated failure rate for chat requests for Copilot. We are investigating and will continue to keep users updated on progress towards mitigation.
Mar 11, 09:03 UTC
Investigating - We are investigating reports of degraded performance for Copilot.
Mar 11, 08:14 UTC
Mar 10, 2024

No incidents reported.

Mar 9, 2024

No incidents reported.

Mar 8, 2024

No incidents reported.

Mar 7, 2024

No incidents reported.

Mar 6, 2024

No incidents reported.

Mar 5, 2024

No incidents reported.

Mar 4, 2024

No incidents reported.

Mar 3, 2024

No incidents reported.

Mar 2, 2024

No incidents reported.

Mar 1, 2024
Resolved - This incident has been resolved.
Mar 1, 17:42 UTC
Update - Git Operations is operating normally.
Mar 1, 17:42 UTC
Update - Actions and Pages are operating normally.
Mar 1, 17:41 UTC
Update - Copilot is operating normally.
Mar 1, 17:36 UTC
Update - Pages is experiencing degraded performance. We are continuing to investigate.
Mar 1, 17:34 UTC
Update - One of our clusters is experiencing problems, and we are working on restoring the cluster at this time.
Mar 1, 17:34 UTC
Investigating - We are investigating reports of degraded performance for API Requests, Copilot, Git Operations, and Actions.
Mar 1, 17:30 UTC
Resolved - On March 1, 2024, between 14:17 UTC and 15:54 UTC, the service that sends messages from our event stream into our background job processing service was degraded and delayed the transmission of jobs for processing. No data or jobs were lost. From 14:17 to 14:41 UTC, there was a partial degradation, where customers would experience intermittent delays with PRs and Actions. From 14:41 to 15:24 UTC, users saw stale data on 36% of PRs, and 100% of in-progress Actions workflows did not see updates, even though the workflows were succeeding. At 15:24 UTC, we mitigated the incident by redeploying our service, and jobs began to burn down, with full job catch-up by 15:54 UTC. This was due to under-provisioned memory and a lack of memory-based back pressure in the service, which overwhelmed consumers and led to OutOfMemory crashes.

We have adjusted memory configurations to prevent this problem, and are analyzing and adjusting our alert sensitivity to reduce our time to detection of issues like this one in the future.

Mar 1, 16:12 UTC
Update - Issues, Pull Requests and Actions are operating normally.
Mar 1, 16:12 UTC
Update - We're seeing our background job queue sizes trend down, and expect full recovery in the next 15 minutes.
Mar 1, 15:48 UTC
Update - Issues is experiencing degraded performance. We are continuing to investigate.
Mar 1, 15:39 UTC
Update - We're continuing to investigate issues with background jobs that have impacted Actions and Pull Requests. We have a mitigation in place and are monitoring for recovery.
Mar 1, 15:27 UTC
Update - We're investigating issues with background jobs that are causing sporadic delays in pull request synchronization and reduced Actions throughput.
Mar 1, 14:51 UTC
Investigating - We are investigating reports of degraded performance for Pull Requests and Actions.
Mar 1, 14:39 UTC
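
The March 1 postmortem above attributes the consumer crashes to under-provisioned memory and a missing memory-based back pressure mechanism in the background job service. Purely as an illustration of that idea (this is not GitHub's actual service code), the Go sketch below pauses job intake whenever heap usage passes an assumed budget; the names consume, handle, and memoryBudget, and the 512 MiB figure, are all hypothetical.

// back_pressure.go: a minimal, hypothetical sketch of memory-based back
// pressure in a background-job consumer. Not GitHub's code; names and the
// 512 MiB budget are illustrative assumptions.
package main

import (
	"fmt"
	"runtime"
	"time"
)

const memoryBudget = 512 << 20 // assumed per-consumer heap budget, in bytes

// overBudget reports whether current heap usage exceeds the budget.
func overBudget() bool {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc > memoryBudget
}

// consume processes jobs, but pauses intake while memory is over budget so
// that a burst of large jobs backs up in the queue instead of growing the
// heap until the process dies with an out-of-memory error.
func consume(jobs <-chan []byte, handle func([]byte)) {
	for job := range jobs {
		for overBudget() {
			time.Sleep(100 * time.Millisecond) // push the pressure upstream
		}
		handle(job)
	}
}

func main() {
	jobs := make(chan []byte, 1024)
	done := make(chan struct{})
	go func() {
		consume(jobs, func(b []byte) { fmt.Printf("processed %d bytes\n", len(b)) })
		close(done)
	}()
	for i := 0; i < 3; i++ {
		jobs <- make([]byte, 1024) // stand-in for an event-stream message
	}
	close(jobs)
	<-done
}

The design point is that when the consumer stops pulling, pressure propagates upstream into the durable queue, which can absorb a backlog, rather than into the consumer's heap, which cannot.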
Feb 29, 2024
Resolved - On February 29, 2024, between 9:32 and 11:54 UTC, queuing in our background job service caused processing delays for Webhooks, Actions, and Issues. Nearly 95% of the delays occurred between 11:05 and 11:27 UTC, with the remaining 5% spread over the rest of the incident. During this incident, the following customer impacts occurred: 50% of webhooks experienced delays of up to 5 minutes, and 1% experienced delays of 17 minutes at peak; on average 7% of Actions customers experienced delays, with a peak of 44%; and many Issues were slow to appear in search. At 9:32 UTC, our automated failover successfully routed traffic to a secondary cluster, but an improper restoration to the primary at 10:32 UTC caused a significant increase in queued jobs until 11:21 UTC, when a correction was made and healthy services began burning down the backlog until full resolution.

We have made improvements to the automation and reliability of our fallback process to prevent recurrence. We also have larger work already in progress to improve the overall reliability of our job processing platform.

Feb 29, 12:27 UTC
Update - We're seeing recovery and are going to take time to verify that all systems are back in a working state.
Feb 29, 12:21 UTC
Update - Issues is operating normally.
Feb 29, 12:19 UTC
Update - Webhooks is operating normally.
Feb 29, 12:18 UTC
Update - We're continuing to investigate delayed background jobs. We've seen partial recovery for Issues, and there is ongoing impact to Actions, notifications, and webhooks.
Feb 29, 11:05 UTC
Update - Actions is experiencing degraded performance. We are continuing to investigate.
Feb 29, 10:58 UTC
Update - We're seeing issues related to background jobs, which are causing delays for webhook delivery, search indexing, and other updates.
Feb 29, 10:36 UTC
Investigating - We are investigating reports of degraded performance for Issues and Webhooks.
Feb 29, 10:33 UTC
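
The February 29 postmortem above traces the largest delays to traffic being restored to the primary cluster before it was ready. As a rough illustration of a guarded fail-back only (not GitHub's internal tooling), the Go sketch below keeps traffic on the secondary until the primary passes a health check and the job backlog has drained; cluster, healthy, backlog, and the poll interval are all assumed names and values.

// fail_back.go: a minimal, hypothetical sketch of a guarded fail-back to a
// primary cluster. Not GitHub's tooling; all names are illustrative.
package main

import (
	"fmt"
	"time"
)

type cluster struct {
	name    string
	healthy func() bool // health check for the cluster
	backlog func() int  // queued jobs still awaiting processing
}

// failBack keeps traffic on the secondary until the primary passes its
// health check and the backlog has burned down, then switches routing back.
func failBack(primary, secondary cluster, route func(string), poll time.Duration) {
	route(secondary.name)
	for !(primary.healthy() && secondary.backlog() == 0) {
		time.Sleep(poll)
	}
	route(primary.name)
}

func main() {
	primary := cluster{name: "primary", healthy: func() bool { return true }, backlog: func() int { return 0 }}
	secondary := cluster{name: "secondary", healthy: func() bool { return true }, backlog: func() int { return 0 }}
	failBack(primary, secondary, func(n string) { fmt.Println("routing traffic to", n) }, 10*time.Millisecond)
}

The sketch only shows the guard condition; in practice the restoration step would also be automated and verified, which is what the remediation note above describes improving.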
Feb 28, 2024

No incidents reported.

Feb 27, 2024

No incidents reported.