Our current datacenter is in New York City. Yep, where they make all that great salsa. So whenever you make a request to any Stack Exchange site, the internet tubes must connect from your location to our datacenter in NYC. We are not (yet) immune to the laws of physics, so depending on the distance [...]

In 2003, Fog Creek Software (aka Joel’s other baby) moved offices, and decided to ditch its internal T1 and look for a colocation provider. Joel was impressed with PEER 1 Hosting’s customer service and its shiny new data center in NYC, and PEER 1 Hosting even volunteered to host Joel on Software – for free! When we [...]

Database Upgrade

Jeff Atwood

As part of our datacenter migration, the database server received a substantial upgrade:

Oregon: 48 GB, 2 Xeon X5470 CPUs, 8 total cores @ 3.33 GHz
NYC: 64 GB, 2 Xeon X5680 CPUs, 12 total cores @ 3.33 GHz

However, a few things didn’t go quite to plan in the migration. Much to our chagrin, the database server ended up being [...]

This Saturday, October 23rd, starting at about 2 PM Pacific, we will be migrating all of our primary sites from the Corvallis, OR datacenter to the New York, NY datacenter. Please be advised that this is a major move, and while we will do everything we can to prevent major service interruptions (largely with a [...]

Stack Overflow Outage

Kyle Brandt

As you may have noticed, many of Stack Overflow’s websites suffered some downtime today, starting at 6 AM EDT and lasting about an hour, and Stackoverflow.com is still offline.

As you can see from our network configuration, HAProxy is a much beloved part of our infrastructure. Willy Tarreau, the author, has been extremely responsive and helpful to us in the past. So when we reached a surprising dead-end in our quest to find a reverse proxy that could block HTTP clients using too much [...]

Our hosting provider, PEAK, let us know that they had a cooling compressor fail in the facility. The primary database server was apparently taken offline at 2:53 AM Pacific Time by this thermal event. The backup database server is still online and has the most recent (12 AM) backups restored to it; we’re currently just [...]

Remember Joel Spolsky’s fine article “Five Whys”? Sure you do! It contained this paragraph: Michael spent some time doing a post-mortem, and discovered that the problem was a simple configuration problem on the switch. There are several possible speeds that a switch can use to communicate (10, 100, or 1000 megabits/second). You can either set [...]

I think we’ve finally arrived at a semi-stable network layout for running Stack Overflow and the rest of the trilogy. Here’s a diagram of our current network layout: The most recent changes were all in the name of redundancy: move to dual HAProxy routing instances; add a third Stack Overflow server to the HAProxy rotation [...]

This Saturday, December 19th, from 4 PM to 8 PM PST, all trilogy sites and the blog will be down for a spot of maintenance. What are we doing? A few things, several of which were inspired by comments on our Server Rack Glamour Shots: Upgrading one server which does a lot of VM utility [...]

We installed our secondary (backup) database server tonight. Geoff took the opportunity to snap a few glamour shots of the Stack Overflow server rack at our host PEAK Internet. I present them here for your unbridled enjoyment and pleasure: update: Based on feedback from this post, we went back and improved our rack hygiene: These [...]

On Friday, the server which hosts this blog suffered catastrophic data loss. Fortunately, the blog server is at a different datacenter entirely than PEAK, which hosts all the Trilogy sites. It’s a long story, and I’ll document it in more detail elsewhere, but the short version is this: This particular host’s (again, not PEAK) backup [...]

There will be an outage today from 5 pm – 7 pm PST as we upgrade the database server. If you’re curious, we’re going from 2.5 GHz to 3.33 GHz CPUs, and using the old CPUs to build a second, backup database server. Our last upgrade was in July when we went from 24 GB [...]

Stack Overflow Outage

Jeff Atwood

Our apologies for the all-site outage today. According to our Pingdom monitors, we were down from 7:18 PM PST to 9:43 PM PST. There goes our vaunted envy-of-the-industry three nines uptime guarantee! Apparently there was a router meltdown at our ISP, Peak Internet. They promised pictures of the (literally?) melted router via an update on [...]
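For a rough sense of what this outage does to a "three nines" (99.9%) target, here is a back-of-the-envelope sketch. The outage times come from the post above; the measurement windows (30 and 365 days) and the function name are assumptions for illustration, not anything the post specifies.

```python
from datetime import datetime

# Outage window reported by Pingdom in the post: 7:18 PM to 9:43 PM PST.
# The calendar date is a placeholder; only the time of day matters here.
start = datetime(2000, 1, 1, 19, 18)
end = datetime(2000, 1, 1, 21, 43)
outage_minutes = int((end - start).total_seconds() // 60)


def allowed_downtime_minutes(availability: float, period_days: int) -> float:
    """Downtime budget, in minutes, for a given availability over a period."""
    return (1.0 - availability) * period_days * 24 * 60


# Assumed measurement windows: one 30-day month and one 365-day year.
monthly_budget = allowed_downtime_minutes(0.999, 30)
yearly_budget = allowed_downtime_minutes(0.999, 365)

print(f"outage: {outage_minutes} min")
print(f"99.9% budget over 30 days: {monthly_budget:.1f} min")
print(f"99.9% budget over 365 days: {yearly_budget:.1f} min")
```

Measured over a single month, a 145-minute outage is well past the roughly 43-minute budget; measured over a full year, it uses up a bit more than a quarter of the roughly 526 minutes allowed.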

New DNS Provider

Jeff Atwood

Our domain name registrar is GoDaddy. We’ve had a lot of problems with GoDaddy’s handling of DNS, where DNS entries will suddenly appear and disappear at random. Often, changing a completely unrelated DNS record would result in other DNS entries going missing for hours. Extremely frustrating. As a result of many, many bad experiences, over [...]

Starting right now, we will be load balancing the Stack Overflow servers — going from one web tier server, to two. This means you may end up on a different server depending on what HAProxy decides the hash of your IP address is. This shouldn’t cause any problems, but … If you notice anything unusual, [...]
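To illustrate the idea of picking a server from a hash of the client's IP address, here is a minimal sketch. It is not HAProxy's actual algorithm: the modulo-over-integer hash, the `pick_server` function, and the server names are all assumptions made up for this example.

```python
import ipaddress

# Hypothetical names for the two web tier servers.
SERVERS = ["web-1", "web-2"]


def pick_server(client_ip: str, servers: list[str]) -> str:
    """Map a client IP address to one server, deterministically.

    Assumption: the "hash" is simply the integer form of the address
    taken modulo the number of servers. A real load balancer may use
    a different hash function.
    """
    ip_as_int = int(ipaddress.ip_address(client_ip))
    return servers[ip_as_int % len(servers)]


if __name__ == "__main__":
    for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
        print(ip, "->", pick_server(ip, SERVERS))
```

The useful property is that the same address always lands on the same server as long as the server list does not change, which is why a given visitor would normally keep hitting one machine rather than bouncing between the two.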

It was quite an honor to see that the High Scalability Blog posted an entry on Stack Overflow! We referred to the HSB, and its exhaustively detailed information about how other websites handle scaling, many times during the course of Stack Overflow development. And I’ve cited it myself when researching what we think is the [...]

We are currently upgrading the database server to 48 GB of memory, which means we have to upgrade the operating system, too. Should be done in about an hour. OK, this is complete. Our database server not only has 48 GB of memory installed, but has access to all of that memory. Finally. And [...]

We noticed something unusual on our Cacti graphs today. Can you spot it? Yes! The light gray of the graph background does seem a few shades lighter than normal! I see it too! No, no, of course I’m talking about that massive traffic spike from 06:00 to 15:00 PST (server time). In the words of [...]

We had a brief outage early Tuesday morning from 3 AM – 5 AM PST, because the database server was doing this: Oh noes! Not … !!! CRITICAL ERROR: Memory retention failure, unflushed cache lost !!! There are six exclamation points so you know it’s serious. Also, you have to press ENTER. Because it’s a [...]