On September 12, 2017 from 7:12AM to 4PM EDT, 43% of requests to UserVoice failed.
Business Impact
During the incident, admins and end users would have observed the following:
Root Cause
One of our log servers crashed reducing our logging capacity, and as a result our application instances attempted to log at a volume that exceeded the allocated capacity.
When the application instance was unable to the send its logs, it followed default behavior and paused until log throughput could continue.
These widespread sporadic pauses, beyond slowing down in-process requests, also caused individual instance health-checks to fail leading to frequent restarts and additional reduced capacity.
What We are Doing to Prevent This
We do apologize for the pain points this caused for you and your team. It is something we take very seriously as we work to provide you with the best service possible.
If you have any additional questions in regards to this issue, please reach out to me directly at claire.talbott@uservoice.com.
Claire Talbott
Support Manager