The Speed of Light Sucks
Jeff Atwood
Our current datacenter is in New York City. Yep, where they make all that great salsa. So whenever you make a request to any Stack Exchange site, the internet tubes must connect from your location to our datacenter in NYC. We are not (yet) immune to the laws of physics, so depending on the distance [...]
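The physics here is easy to put numbers on. This is a minimal sketch under two assumptions: light in optical fiber travels at roughly two-thirds of its vacuum speed, and the cable runs in a straight line. Real routes are longer and add switching delay, so the result is only a lower bound. The 4,000 km distance in the usage line is a hypothetical example, not a measured route to our datacenter.

```python
# Lower bound on network round-trip time imposed by the speed of light.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum, km/s
FIBER_FRACTION = 2 / 3  # assumption: light in fiber moves at roughly 2/3 c


def min_rtt_ms(distance_km: float) -> float:
    """Best-case round trip, in milliseconds, over distance_km of straight fiber."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FRACTION)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Hypothetical 4,000 km path between a user and the datacenter
    print(f"{min_rtt_ms(4000):.1f} ms")  # prints "40.0 ms"
```

A page load that needs several round trips (DNS lookup, TCP and TLS handshakes, then the HTTP request itself) pays this cost on each one, so distance matters even on a fast connection.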
PEER 1 Hosting – Making your data center more awesome!
Alison Sperling
In 2003, Fog Creek Software (aka Joel’s other baby) moved offices, and decided to ditch its internal T1 and look for a colocation provider. Joel was impressed with PEER 1 Hosting’s customer service, the shiny new data center in NYC, and PEER 1 Hosting even volunteered to host Joel on Software – for free! When we [...]
Database Upgrade
Jeff Atwood
As part of our datacenter migration, the database server received a substantial upgrade. Oregon: 48 GB, 2 Xeon X5470 CPUs, 8 total cores @ 3.33 GHz. NYC: 64 GB, 2 Xeon X5680 CPUs, 12 total cores @ 3.33 GHz. However, a few things didn’t go quite to plan in the migration. Much to our chagrin, the database server ended up being [...]
Datacenter Migration Oct. 23
Jeff Atwood
This Saturday, October 23rd, starting at about 2 PM Pacific, we will be migrating all of our primary sites from the Corvallis, OR datacenter to the New York, NY datacenter. Please be advised that this is a major move, and while we will do everything we can to prevent major service interruptions (largely with a [...]
Stack Overflow Outage
Kyle Brandt
As you may have noticed, many of Stack Overflow’s websites suffered some downtime today, starting at 6 AM EDT and lasting for about an hour, and Stackoverflow.com is still offline.
Stack Overflow Sponsors HAProxy
Jeff Atwood
As you can see from our network configuration, HAProxy is a much beloved part of our infrastructure. Willy Tarreau, the author, has been extremely responsive and helpful to us in the past. So when we reached a surprising dead-end in our quest to find a reverse proxy that could block HTTP clients using too much [...]
Thermal Event at Datacenter
Jeff Atwood
Our hosting provider, PEAK, let us know that they had a cooling compressor fail in the facility. The primary database server was apparently taken offline at 2:53 AM Pacific Time by this thermal event. The backup database server is still online and has the most recent (12 AM) backups restored to it; we’re currently just [...]
Six Whys – Or, Never Trust Your Network Switch
Jeff Atwood
Remember Joel Spolsky’s fine article “Five Whys”? Sure you do! It contained this paragraph: Michael spent some time doing a post-mortem, and discovered that the problem was a simple configuration problem on the switch. There are several possible speeds that a switch can use to communicate (10, 100, or 1000 megabits/second). You can either set [...]
Stack Overflow Network Configuration
Jeff Atwood
I think we’ve finally arrived at a semi-stable network layout for running Stack Overflow and the rest of the trilogy. Here’s a diagram of our current network layout: The most recent changes were all in the name of redundancy: move to dual HAProxy routing instances; add a third Stack Overflow server to the HAProxy rotation [...]
Scheduled Site Maintenance – Saturday
Jeff Atwood
This Saturday, December 19th, from 4 PM to 8 PM PST, all trilogy sites and the blog will be down for a spot of maintenance. What are we doing? A few things, several of which were inspired by comments on our Server Rack Glamour Shots: Upgrading one server which does a lot of VM utility [...]
Stack Overflow Rack Glamour Shots
Jeff Atwood
We installed our secondary (backup) database server tonight. Geoff took the opportunity to snap a few glamour shots of the Stack Overflow server rack at our host PEAK Internet. I present them here for your unbridled enjoyment and pleasure: update: Based on feedback from this post, we went back and improved our rack hygiene: These [...]
Blog Outage – Backup Policies
Jeff Atwood
On Friday, the server which hosts this blog suffered catastrophic data loss. Fortunately, the blog server is at a different datacenter entirely than PEAK, which hosts all the Trilogy sites. It’s a long story, and I’ll document it in more detail elsewhere, but the short version is this: This particular host’s (again, not PEAK) backup [...]
Database Server Upgrade: Outage Saturday
Jeff Atwood
There will be an outage today from 5 pm – 7 pm PST as we upgrade the database server. If you’re curious, we’re going from 2.5 GHz to 3.33 GHz CPUs, and using the old CPUs to build a second, backup database server. Our last upgrade was in July when we went from 24 GB [...]
Stack Overflow Outage
Jeff Atwood
Our apologies for the all-site outage today. According to our Pingdom monitors, we were down from 7:18 PM PST to 9:43 PM PST. There goes our vaunted envy-of-the-industry three nines uptime guarantee! Apparently there was a router meltdown at our ISP, Peak Internet. They promised pictures of the (literally?) melted router via an update on [...]
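For a sense of scale, here is the arithmetic behind that joke as a small sketch. It computes the outage length from the Pingdom start and end times, plus the downtime a 99.9% ("three nines") target allows per year and per 30-day month. The helper names are illustrative, and the time parsing assumes the outage starts and ends on the same day.

```python
from datetime import datetime


def outage_minutes(start: str, end: str) -> float:
    """Minutes between two same-day clock times such as '7:18 PM'."""
    fmt = "%I:%M %p"  # 12-hour clock with AM/PM marker
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def downtime_budget_minutes(availability: float, period_days: float = 365) -> float:
    """Minutes of downtime allowed over period_days at the given availability."""
    return (1 - availability) * period_days * 24 * 60


if __name__ == "__main__":
    down = outage_minutes("7:18 PM", "9:43 PM")
    print(f"outage: {down:.0f} min")                                        # outage: 145 min
    print(f"yearly budget: {downtime_budget_minutes(0.999):.1f} min")       # yearly budget: 525.6 min
    print(f"monthly budget: {downtime_budget_minutes(0.999, 30):.1f} min")  # monthly budget: 43.2 min
```

At 145 minutes, this one outage used more than three times a 30-day month's three-nines allowance of about 43 minutes, though it still fits within the yearly budget of about 526 minutes.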
New DNS Provider
Jeff Atwood
Our domain name registrar is GoDaddy. We’ve had a lot of problems with GoDaddy’s handling of DNS, where DNS entries will suddenly appear and disappear at random. Often, changing a completely unrelated DNS record would result in other DNS entries going missing for hours. Extremely frustrating. As a result of many, many bad experiences, over [...]
Load Balancing Stack Overflow
Jeff Atwood
Starting right now, we will be load balancing the Stack Overflow servers — going from one web tier server, to two. This means you may end up on a different server depending on what HAProxy decides the hash of your IP address is. This shouldn’t cause any problems, but … If you notice anything unusual, [...]
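Hashing the client IP, as HAProxy’s `balance source` algorithm does, means a given client keeps landing on the same backend as long as the server pool is unchanged. The sketch below shows the idea in simplified form. It uses CRC32 as a stand-in hash (not HAProxy’s actual hash function), treats all servers as equally weighted, and uses hypothetical server names.

```python
import ipaddress
import zlib
from typing import Sequence


def pick_server(client_ip: str, servers: Sequence[str]) -> str:
    """Map a client IP to one backend using a simple hash-modulo scheme."""
    if not servers:
        raise ValueError("no servers available")
    # Normalize the address to its packed bytes so IPv4 and IPv6 both work
    packed = ipaddress.ip_address(client_ip).packed
    # CRC32 is a stand-in hash for illustration, not HAProxy's own function
    bucket = zlib.crc32(packed) % len(servers)
    return servers[bucket]


if __name__ == "__main__":
    one_server = ["web1"]            # hypothetical names
    two_servers = ["web1", "web2"]
    for ip in ["192.0.2.10", "198.51.100.7", "203.0.113.42"]:
        print(ip, pick_server(ip, one_server), "->", pick_server(ip, two_servers))
```

With a fixed pool, the same IP always maps to the same server. Going from one server to two changes the modulus, so some clients move to the new machine, which is the "you may end up on a different server" effect. HAProxy’s `hash-type consistent` option exists to reduce that kind of reshuffling when servers are added or removed.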
High Scalability Blog on Stack Overflow
Jeff Atwood
It was quite an honor to see that the High Scalability Blog posted an entry on Stack Overflow! We referred to the HSB, and its exhaustively detailed information about how other websites handle scaling, many times during the course of Stack Overflow development. And I’ve cited it myself when researching what we think is the [...]
Database Server Upgrade — 48 GB
Jeff Atwood
We are currently upgrading the database server to 48 GB of memory, which also means we have to upgrade the operating system, too. Should be done in about an hour. OK, this is complete. Our database server not only has 48 GB of memory installed, but has access to all of that memory. Finally. And [...]
The Perfect Web Spider Storm
Jeff Atwood
We noticed something unusual on our Cacti graphs today. Can you spot it? Yes! The light gray of the graph background does seem a few shades lighter than normal! I see it too! No, no, of course I’m talking about that massive traffic spike from 06:00 to 15:00 PST (server time). In the words of [...]
Tuesday Outage: It’s RAID-tastic!
Jeff Atwood
We had a brief outage early Tuesday morning from 3 AM – 5 AM PST, because the database server was doing this: Oh noes! Not … !!! CRITICAL ERROR: Memory retention failure, unflushed cache lost !!! There are six exclamation points so you know it’s serious. Also, you have to press ENTER. Because it’s a [...]