On October 19th, between 10:00 and 11:30 PDT, UserVoice experienced two infrastructure outages, each lasting approximately 10 minutes. Both outages made the site unavailable system-wide.
Business Impact
During the outages, end users and admins would have been unable to load or interact with UserVoice sites or widgets.
Email delivery would have been delayed, but no emails were lost.
Root Cause
UserVoice uses an in-memory data-store cluster (Redis) for asynchronous job management and transient data storage. A recent change to one of the libraries that use this service caused a very sudden increase in its usage. The spike caused the cluster to fail and also prevented failover to the like-sized standby services.
What We Are Doing to Prevent This