November 19th, 2006

Network Outage

We had a major network outage Sunday morning starting around 0900 CST. More details to follow, but in short, a routing loop in Chicago seems to be the culprit. Sincere apologies for the outage.

9 Comments

  1. Just a bit more, since we probably won’t have solid details for another couple of days. We started experiencing intermittent network trouble just before 0900. Our early guess is a routing loop out of Chicago, which seems to have bogged down both our routers and an upstream router. We’re not ruling out a malicious attack (such as a DoS) just yet, but that doesn’t appear to be the case.

    We’re planning an offsite update page to keep people in the loop should a long outage like this strike again. Please offer any other suggestions.

  2. Just wondering how long the down time was – was it more than an hour?

    Thanks

  3. Yes David, it was over 2 hours. We’re still not exactly sure how many people were affected, since it started off intermittent and seemed to spread.

  4. Any updates on this?

  5. Not at the moment, Chris. We haven’t forgotten and are working w/ the NOC engineers to track down the cause. Will post an update when we are certain.

  6. Anything new on the outages and what’s being done to prevent ‘em in the future? I’m considering selling hosting at Slice to a fairly large client of mine, but I’d really like to hear the full story first. A two to three hour outage would be potentially very expensive for this particular client.

  7. Matt, we’re still working w/ the NOC team and have brought in some external help. As soon as we have something concrete, we’ll let everyone know.

  8. Any news on a) what went wrong and b) the offsite status page you suggested?

  9. Tom – we have some theories on what went wrong, but cannot be 100% sure yet (and hence no update). In short, we think it may have been a malicious break-in to a user Slice and a DoS attack, but some things aren’t adding up – so we’re hesitant to just call it that. We’ve continued monitoring for other evidence with the NOC.

    Regarding the offsite notification, we’ve looked into it and are considering moving the blog/forum offsite to serve as a communication point should something go wrong.
