Today our Pressero customers hosted in the Americas (CHI) Data Center experienced a severe slowdown between 12:45 PM and 2:45 PM CDT. The database problem occurred while we were attempting to fix a performance problem that was affecting a limited number of customers in two areas of Pressero: Reports and AWI. While we were reviewing plans and applying small updates to the database procedures, an index was inadvertently disabled. This caused the database server to max out its CPU, which in turn caused the severe slowdown in Pressero.
The index was not disabled intentionally, and it took us some time to discover that it had been disabled. We first tried other recovery procedures, such as clearing the execution plan cache, investigating slow queries and blocking processes, and even restarting the database server. Throughout these attempts, the high CPU load on the server sometimes prevented us from even connecting to the database from an external computer, which hampered our identification of the real issue.
Once we identified the disabled index, we had to switch off the Pressero servers once again so that we could get exclusive access to the database server and rebuild and re-enable the index. The entire slowdown and partial outage lasted about two hours, and orders continued to be received during that period. All other services and products remained up and running without any service interruption.
We do apologize for this inconvenience.
Infrastructure Manager / Lead Software Architect