North America (Chicago) - Pressero - Admin
Slowness on Admin
One of our application modules crashed on our servers, one server at a time and at different moments, causing the application to restart on each affected server.
The warm-up process after a restart can take about a minute, during which some requests may fail with timeouts (504 errors).
We do have a health check procedure in place, but it only removes a failed instance after it fails to respond three times in a row, with a timeout of 15s per check. As a result, it can take up to 45s to detect that a node is having issues. Only after that is the node removed, and we then wait for it to recover, which happens automatically.
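The detection delay described above can be illustrated with a minimal sketch. It is not our actual health check implementation: the class and method names are hypothetical, and it assumes that checks run back to back with no extra interval between them and that a single successful response returns a node to rotation. Only the threshold of three failures and the 15s timeout come from the description above.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Hypothetical per-node tracker for consecutive health check failures."""

    failure_threshold: int = 3     # consecutive failures before removal (from the report)
    check_timeout_s: float = 15.0  # timeout per check, in seconds (from the report)
    consecutive_failures: int = 0
    in_rotation: bool = True

    def record(self, responded: bool) -> bool:
        """Record one check result and return whether the node stays in rotation."""
        if responded:
            # Assumption: one successful check is enough to restore the node.
            self.consecutive_failures = 0
            self.in_rotation = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.in_rotation = False
        return self.in_rotation

    def worst_case_detection_s(self) -> float:
        """Longest time before removal, assuming every check runs to its full timeout."""
        return self.failure_threshold * self.check_timeout_s


tracker = HealthTracker()
for responded in (False, False, False):
    still_serving = tracker.record(responded)

print(still_serving)                     # node is out of rotation after the third failure
print(tracker.worst_case_detection_s())  # 3 checks x 15s timeout
```

Under these assumptions a crashed node keeps receiving traffic until the third failed check, which is why requests could time out during that window.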
The crash dumps were collected and will be analysed by our engineering team.