28th August 2018

North America (Chicago) - Pressero - Admin Slowness

One of our application modules crashed on our servers, one at a time and at different moments, causing the application to restart. The warm-up process may take a minute and can cause some requests to fail due to timeouts (504 errors). We do have a health check procedure in place, but it only removes a failed instance after it fails to respond three times in a row, with an expected response time of 15s per check. As a result, it may take up to 45s (3 x 15s) to detect that a node is having issues, and only then does the procedure wait for the node to recover, which happens automatically. The crash dumps were collected and will be analysed by our engineering team.
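The detection delay described above can be illustrated with a minimal Python sketch. It is not our actual health check implementation: the names `HealthCheckPolicy` and `removal_index` are hypothetical, and the sketch assumes the three checks run back to back with no extra interval between them, which this report does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HealthCheckPolicy:
    # Values taken from the report: 15s expected response time,
    # three consecutive failures before an instance is removed.
    timeout_seconds: float = 15.0
    failure_threshold: int = 3

    def worst_case_detection_seconds(self) -> float:
        # Assumption: checks run back to back, so the worst case is
        # simply threshold x timeout.
        return self.timeout_seconds * self.failure_threshold


def removal_index(results: Sequence[bool], policy: HealthCheckPolicy) -> Optional[int]:
    """Return the index of the check at which the instance would be removed.

    `results` holds one boolean per check (True = responded in time).
    Returns None if the failure threshold is never reached.
    """
    consecutive_failures = 0
    for index, ok in enumerate(results):
        if ok:
            # Any successful check resets the failure streak.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= policy.failure_threshold:
                return index
    return None


policy = HealthCheckPolicy()
print(policy.worst_case_detection_seconds())
print(removal_index([True, False, False, True, False, False, False], policy))
```

Because a single successful response resets the streak, an instance that is restarting and answers intermittently can stay in rotation longer than 45s, which is consistent with some requests timing out during warm-up.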