2018-06-09 All times are UTC
16:57 Alerts about my.pingdom.com and AWS-DC VPN sent to Pingdom SRE/Operations
17:32 Rebooted message queue servers used for uptime monitoring
Service fully impaired from this moment. Uptime monitoring and alerts were delayed, but not lost.
18:19 Message queue service restored
Uptime monitoring service recovering.
18:57 Back to normal operation; all queues cleared and all alerts sent out, although delayed
We are working on migrating our various services to more resilient infrastructure and deprecating old functionality (beepmanager) that is behind the root cause of this incident. In the meantime, additional monitoring and automated services have been put in place to prevent this type of issue from happening again.