Resolved -
On July 31, 2024, between 07:05 UTC and 09:01 UTC, the Actions service was degraded, which prevented it from processing API requests and executing jobs, in particular Pages builds. On average, 2% of jobs run during the incident window were affected. The cause was connectivity issues on some nodes of one of our partner services in the East US2 region. We mitigated the incident by failing over the impacted service and re-routing its traffic out of that region.
We are working to improve our monitoring and failover processes to reduce the time it takes us to detect and mitigate issues like this one in the future.
Jul 31, 09:20 UTC
Update -
Actions is operating normally.
Jul 31, 09:20 UTC
Update -
We are continuing to see improvements in queuing and running Actions jobs and are monitoring for full recovery.
Jul 31, 09:13 UTC
Update -
We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.
Jul 31, 08:28 UTC
Update -
Actions is experiencing degraded performance. We are continuing to investigate.
Jul 31, 08:07 UTC
Update -
We are investigating reports of degraded performance in some Redis clusters.
Jul 31, 08:02 UTC
Investigating -
We are currently investigating this issue.
Jul 31, 07:59 UTC