November 19th, 2006
Network Outage
We had a major network outage Sunday morning starting around 0900 CST. More details to follow, but in short, a routing loop in Chicago seems to be the culprit. Sincere apologies for the outage.
March 17th, 2008 at 10:00 PM matt
Just a bit more, since we probably won’t have solid details for another couple of days. We started experiencing intermittent network trouble just before 0900. Our early guess is a routing loop out of Chicago that bogged down both our routers and an upstream router. We’re not ruling out a malicious attack (such as a DoS) just yet, but that doesn’t appear to be the case.
We’re planning an offsite update page to keep people in the loop should a long outage like this strike again. Please offer any other suggestions.
March 17th, 2008 at 10:00 PM david
Just wondering how long the down time was – was it more than an hour?
Thanks
March 17th, 2008 at 10:00 PM matt
Yes David, it was over 2 hours. We’re still not exactly sure how many people were affected, since it started off intermittent and seemed to spread.
March 17th, 2008 at 10:00 PM Chris
Any updates on this?
March 17th, 2008 at 10:00 PM matt
Not at the moment, Chris. We haven’t forgotten and are working w/ the NOC engineers to track down the cause. We’ll post an update when we are certain.
March 17th, 2008 at 10:00 PM nick
Anything new on the outages and what’s being done to prevent ‘em in the future? I’m considering selling hosting at Slice to a fairly large client of mine, but I’d really like to hear the full story first. A two to three hour outage would be potentially very expensive for this particular client.
March 17th, 2008 at 10:00 PM matt
Nick, we’re still working w/ the NOC team and have brought in some external help. As soon as we have something concrete, we’ll let everyone know.
March 17th, 2008 at 10:00 PM Tom
Any news on a) what went wrong and b) the offsite status page you suggested?
March 17th, 2008 at 10:00 PM matt
Tom – we have some theories on what went wrong, but cannot be 100% sure yet (hence no update). In short, we think it may have been a malicious break-in to a user Slice followed by a DoS attack, but some things aren’t adding up, so we’re hesitant to just call it that. We’re continuing to monitor for other evidence with the NOC.
Regarding the offsite notification, we’ve looked into it and are considering moving the blog/forum offsite to serve as a communication point should something go wrong.