Pingdom Service Outage 2018-06-09

2018-06-09. All times are UTC.

16:57 Alerts about an AWS-DC VPN issue sent to Pingdom SRE/Operations

Investigation begins.

17:32 Rebooted the message queue servers used for uptime monitoring

Service 100% impaired from this point. Uptime monitoring and alerts were delayed, but not dropped.

18:19 Message queue service restored

Uptime monitoring service is recovering.

18:57 Back to normal operation. All queues cleared and the delayed alerts sent out.

We are migrating our services to more resilient infrastructure and deprecating old functionality (beepmanager) that was behind the root cause of this incident. In the meantime, additional monitoring and automated services have been put in place to prevent this type of issue from happening again.
