Pingdom public API incident on Sun 22nd July 2018

Note: all times are in UTC.

On Friday 20th of July our engineering team deployed an update to our Public API. This update depended on an internal third-party integration, which unfortunately developed a memory leak over the weekend.

On Sunday 22nd of July at 11:16 our SRE team received an alert that our API had started to respond with HTTP 503 (Service Unavailable) errors as a result of the memory leak. Because the leak grew gradually, some customers may have been affected earlier, in some cases as much as 11 hours before the 11:16 alert.

After a quick discussion with our engineering team, Friday's change was rolled back, and service was fully restored at 12:00 for all API and mobile app users.

To prevent this from happening again, we have added additional monitoring to our services, and we will make sure that third-party components cannot impair the API service in this way again.
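
As an illustration of the kind of monitoring that can catch a gradually growing leak before it causes errors, the following is a minimal sketch and not a description of Pingdom's actual tooling. It fits a straight line to periodic memory readings and estimates how long remains until a limit is reached. The function names, the sample readings, and the 1000 MiB limit are all hypothetical.

```python
def memory_growth_rate(samples):
    """Least-squares slope (MiB per hour) of (hour, mib) readings."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_h = sum(h for h, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    sxy = sum((h - mean_h) * (m - mean_m) for h, m in samples)
    sxx = sum((h - mean_h) ** 2 for h, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    return sxy / sxx


def hours_until_limit(samples, limit_mib):
    """Estimated hours until limit_mib is reached, or None if not growing."""
    rate = memory_growth_rate(samples)
    if rate <= 0:
        return None  # flat or shrinking usage: no leak trend to project
    latest_mib = samples[-1][1]
    return max(0.0, (limit_mib - latest_mib) / rate)


# Hypothetical hourly readings: (hours since deploy, resident memory in MiB)
readings = [(0, 500), (1, 550), (2, 600), (3, 650)]
remaining = hours_until_limit(readings, limit_mib=1000)
if remaining is not None and remaining < 12:
    print(f"warning: memory limit projected in about {remaining:.1f} hours")
```

Alerting on the projected time to exhaustion, rather than only on error responses, would in principle give a warning while the service is still healthy.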
