Resolved -
On January 9 between 12:45 and 13:56 UTC, services in one of our three sites experienced elevated connection latency. This led to a sustained period of request timeouts across a number of services, including but not limited to our git backend. During this period, an average of 5% of requests, peaking at 10%, failed with a 5xx response or timed out. The cause was a combination of events that pushed the load balancer proxies in that site to their connection limits. A host upgrade was in flight, so a subset of proxy hosts were draining and going offline as the upgrade rolled through the fleet. A config change event also triggered a connection reset across all services in that site. Each of these events is commonplace on its own, but together they produced a spike in new connection establishment that caused the remaining online proxy hosts to hit their connection limit. Further analysis showed that this limit was lower than it should have been. We have increased the limit to prevent this from recurring. We have also identified improvements to our monitoring of connection limits and connection behavior, as well as changes to reduce the risk that proxy host upgrades lead to reduced capacity.
Jan 9, 14:40 UTC
Update -
Pull Requests is operating normally.
Jan 9, 14:38 UTC
Update -
Packages is operating normally.
Jan 9, 14:37 UTC
Update -
Git Operations is operating normally.
Jan 9, 14:32 UTC
Update -
Codespaces is operating normally.
Jan 9, 14:27 UTC
Update -
Actions is operating normally.
Jan 9, 14:25 UTC
Update -
API Requests is operating normally.
Jan 9, 14:24 UTC
Update -
Issues is operating normally.
Jan 9, 14:24 UTC
Update -
Webhooks is operating normally.
Jan 9, 14:20 UTC
Update -
Pages is operating normally.
Jan 9, 14:16 UTC
Update -
API Requests is experiencing degraded performance. We are continuing to investigate.
Jan 9, 14:15 UTC
Update -
5xx error rates remain elevated but are trending downward, and many services have fully recovered. We will continue monitoring the situation and keep users updated on progress toward full recovery.
Jan 9, 14:15 UTC
Update -
Actions is experiencing degraded performance. We are continuing to investigate.
Jan 9, 14:11 UTC
Update -
We are experiencing an elevated rate of 5xx errors, on the order of 1-5%, being returned from numerous APIs across the site. The issue has been isolated to one datacenter. We will continue to keep users updated on progress toward mitigation.
Jan 9, 13:39 UTC
Update -
Codespaces is experiencing degraded performance. We are continuing to investigate.
Jan 9, 13:35 UTC
Update -
Packages is experiencing degraded performance. We are continuing to investigate.
Jan 9, 13:34 UTC
Update -
Webhooks is experiencing degraded performance. We are continuing to investigate.
Jan 9, 13:31 UTC
Update -
Git Operations is experiencing degraded performance. We are continuing to investigate.
Jan 9, 13:24 UTC
Update -
Pages is experiencing degraded performance. We are continuing to investigate.
Jan 9, 13:23 UTC
Update -
API Requests is experiencing degraded availability. We are continuing to investigate.
Jan 9, 13:07 UTC
Update -
We are seeing an increase in the rate of 5xx errors, on the order of 1-3%, being returned from numerous APIs across the site. We will continue to keep users updated on progress toward mitigation.
Jan 9, 13:05 UTC
Update -
Actions is experiencing degraded availability. We are continuing to investigate.
Jan 9, 13:05 UTC
Update -
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Jan 9, 13:04 UTC
Investigating -
We are investigating reports of degraded performance for Issues and API Requests.
Jan 9, 13:02 UTC