OK, so we’re not the biggest site in the world, but we have a fair amount of data and a fair amount of users, and speed is very important to me, so everything needs to be as fast as possible. A few people have asked what our architecture is and I thought it’d make an interesting post. As is always the way with these things, it’s easier to describe with a diagram:
Because we use a web cluster to serve the main HTML, we need a central server for avatars and other shared data that we don’t push to CloudFront. The only challenge here was getting content onto it: the IIS security settings meant the main web cluster couldn’t access the machine directly, so I had to use a SQL database as a proxy.
The main web serving is done by a cluster of IIS machines. These are cheap commodity machines in the Google style: 2GB RAM, 2GHz dual-core CPU, 80GB drive. Nothing fancy or expensive. By using multiple cheap machines instead of one big expensive one we get vastly better availability (machines can be taken offline for updating), far better performance (if you add up the total computing power) and lower cost. It’s a win-win, other than making the software development slightly more complex at times.
Each machine runs a copy of SQL Express to write access logs to (which are then copied to the main SQL box when things are quiet) and to store a whole bunch of reasonably static information (such as configuration) to reduce the load on the main SQL box. Each machine can do front-end web serving, back-end task processing, or both. As we need more capacity we can simply add more machines. The load balancers send each user’s request to a particular web server using a session cookie. If that server goes down, failover happens within 10 seconds and you’ll be transparently placed onto a different server.
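To make the cookie-based stickiness and failover concrete, here’s a minimal sketch in Python. This is not our actual load balancer (that’s a black box to the application); the server names and hashing scheme are made up purely to illustrate the behaviour: the same session keeps hitting the same server until that server drops out of the healthy pool.

```python
import hashlib

# Illustrative server pool; real deployments would discover these dynamically.
SERVERS = ["web1", "web2", "web3"]

def pick_server(session_id: str, healthy: set[str]) -> str:
    """Route a session to its preferred server; fail over if it's down."""
    ordered = sorted(healthy)
    if not ordered:
        raise RuntimeError("no healthy servers")
    # A stable hash of the session cookie keeps the same user on the same box.
    idx = int(hashlib.md5(session_id.encode()).hexdigest(), 16) % len(SERVERS)
    preferred = SERVERS[idx]
    # Transparent failover: if the preferred server is unhealthy, pick another.
    return preferred if preferred in healthy else ordered[idx % len(ordered)]
```

The key property is that failover is the caller’s problem, not the user’s: the browser keeps the same cookie and simply lands on a different machine.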
The back-end task processing is something I’m particularly pleased with, as it allows the processing load to be distributed across as many machines as we need. At the moment these are the same machines that serve the front-end web stuff, but at a later date they’ll be split off into a dedicated back-end cluster. All the back-end processing is done by requesting webpages from a queue. If you want to read about how we process background tasks, here’s my blog post about it.
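The “tasks as webpages” idea can be sketched roughly like this. It’s a hedged approximation, not our code: `fetch()` stands in for an HTTP GET against the task URL, and any machine in the cluster can run the drain loop, which is what makes the scheme scale horizontally.

```python
from queue import Queue, Empty

def fetch(url: str) -> str:
    # Stand-in for an HTTP request; a real worker would use an HTTP client
    # so the task runs inside the normal web-serving pipeline.
    return f"processed {url}"

def drain(tasks: Queue) -> list[str]:
    """Process queued task URLs until the queue is empty."""
    results = []
    while True:
        try:
            url = tasks.get_nowait()
        except Empty:
            return results
        results.append(fetch(url))
        tasks.task_done()
```

The nice design property is that a background task is just another page request, so it reuses all the existing web-tier code, logging, and deployment story.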
Background / Offline Processing
As mentioned above, this is done using queues of webpages and is processed by the main web cluster.
Main SQL Store
Nothing interesting here really, I’m afraid: just a reasonably beefy Dell machine with data replicated to a hot-spare backup.
I’m now using Solr to generate the data for the new Message Analytics feature. I’ll do a blog post about it at some point in the future, but it’s incredibly fast compared to using XML data with SQL. Doing a ‘Group By’ on an XML value in SQL took around 1200ms for a particular data set (with an unloaded server); the same query using Solr, on a *much* less powerful machine, took 20ms. It’s an incredible piece of software, if slightly tricky to use.
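For readers who haven’t used Solr: the rough equivalent of a SQL ‘Group By’ is a facet query against Solr’s HTTP API. The snippet below just builds such a request URL; the host, core name, and `message_type` field are made-up placeholders, not our actual schema.

```python
from urllib.parse import urlencode

def facet_url(base: str, field: str) -> str:
    """Build a Solr facet query URL: counts per distinct value of `field`."""
    params = urlencode({
        "q": "*:*",          # match all documents
        "rows": 0,           # we only want the facet counts, not the docs
        "facet": "true",
        "facet.field": field,
        "wt": "json",
    })
    return f"{base}/select?{params}"

# Hypothetical host/core/field, purely for illustration.
url = facet_url("http://solr.internal:8983/solr/messages", "message_type")
```

Because the counts come from Solr’s inverted index rather than shredding XML at query time, this kind of aggregation is where the 1200ms-to-20ms difference comes from.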
The staple of every high-performance website. Memcached is a memory-based data store. I don’t use it to store reasonably static data, as that’s done in the ASP.Net cache object (which is 10x quicker due to it being on the machine itself), but I use Memcached to store precompiled data that’s used across machines. For example, if you get sent a U2U, a background task ‘delivers’ it to your inbox. This task puts the message in your inbox, adds it to the search database, then takes your 10 most recent U2Us, recompiles the HTML you see in the ‘Recent U2U Messages’ widget on your homepage, and inserts it into Memcached. The background task then notifies the Totem server about the U2U, Totem notifies your browser, your browser requests the new HTML blob back from the webserver and, guess what? It’s already been generated, so the webserver just grabs it from Memcached. The beauty of using it over the ASP.Net cache is that cached objects can be shared across machines.
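The precompile-on-write pattern described above looks roughly like this. A plain dict stands in for the Memcached client, and the function and key names are illustrative, not our real ones; the point is that the expensive work (building the widget HTML) happens once in the background task, so the web request is just a key lookup.

```python
# Stand-in for a shared Memcached instance; in production this is a
# networked cache visible to every web server, not a local dict.
cache: dict[str, str] = {}

def deliver_u2u(user: str, inbox: list[str], message: str) -> None:
    """Background task: store the message, then precompile the widget HTML."""
    inbox.insert(0, message)
    recent = inbox[:10]
    html = "<ul>" + "".join(f"<li>{m}</li>" for m in recent) + "</ul>"
    cache[f"u2u-widget:{user}"] = html  # write-through to the shared cache

def widget_html(user: str) -> str:
    """Web request: the HTML is already built, so this is just a cache read."""
    return cache.get(f"u2u-widget:{user}", "<ul></ul>")
```

Shifting the rendering cost from the read path to the write path is what makes the widget request effectively free by the time the browser asks for it.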
Memcached is a great bit of software and we’ve had absolutely zero issues with it. The current stats from our memcached instance are below:
STAT uptime 28625365 (nearly a year)
STAT time 1280509032
STAT pointer_size 32
STAT curr_items 30626 (It’s 100,000 or so during busy periods)
STAT total_items 10108777
STAT bytes 9450225
STAT curr_connections 17
STAT total_connections 10040
STAT connection_structures 24
STAT cmd_get 39701711
STAT cmd_set 10108777
STAT get_hits 33158267 (It’s saved a LOT of SQL reads!)
STAT get_misses 6543444
STAT bytes_read 3126086257
STAT bytes_written 821258193 (It’s served 800GB!)
STAT limit_maxbytes 524288000
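A quick bit of arithmetic on the stats above shows why the “saved a LOT of SQL reads” comment holds: the hit rate works out to about 83.5%, and `get_hits` plus `get_misses` accounts for every `cmd_get` exactly.

```python
# Figures taken directly from the memcached stats dump above.
cmd_get = 39_701_711
get_hits = 33_158_267
get_misses = 6_543_444

hit_rate = get_hits / cmd_get
print(f"hit rate: {hit_rate:.1%}")        # roughly 83.5%
assert get_hits + get_misses == cmd_get   # the counters reconcile exactly
```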
And to think, I was almost tempted to use Velocity instead. You can read why I didn’t.
By applying a bit of thought and leveraging the right technology for each part of the puzzle, we’ve got a platform that *way* outperforms a traditional single-big-webserver setup. We also keep the load on the main SQL box minimal through quite aggressive caching (in memory on the local webserver, in Memcached, and in SQL Express on the local webserver).