Distributed Caching Showdown – Memcached vs Velocity

In the red corner is Memcached (http://www.danga.com/memcached/) with the BeITMemcached .NET library (http://code.google.com/p/beitmemcached/), weighing in at £0 and hailing all the way from geeky Unix-land.

In the blue corner is Velocity (http://msdn.microsoft.com/en-us/data/cc655792.aspx), also weighing in at £0, fresh from Redmond.

Distributed caching is a simple idea… you have one or more machines acting as a shared memory store, normally holding key/value pairs. It's not really complicated, so how do these two differ?

Installation

After 2 hours of messing around with Ubuntu trying to get memcached installed, I gave up and plumped for the seriously simple MemCacheDManager (http://allegiance.chi-town.com/MemCacheDManager.aspx). You just tell it which Windows servers you want to use and it remotely installs the service for you. Couldn't be any easier.

Velocity was about as easy: install PowerShell v1.0 on the machine first, then run the Velocity installer and you're pretty much done. You need to run a few scripts (included in the help file) to create a cache, but the whole thing takes under 2 minutes.

Features

No question here, Velocity has it licked. Memcached offers you, errr, 'Put' and 'Get', and pretty much nothing else. Velocity gives you such lovelies as:

  • Cache Invalidation (things in SQL changing can expire the cache)
  • Cache Groups (so you can specify different policies for different types of data)
  • High Availability (use 3 or more servers so your data survives a machine failure)
  • Local Cache (for even more performance for data that can be stale)
  • Ability to use it to store Session data
  • 64bit version so no real limit on memory
  • …and a whole bundle more.

Performance

After installing Memcached and Velocity on the same pair of servers, I wrote a small app to compare the performance. It simply writes and reads 1000 small strings, 1000 largeish XML strings and 1000 Integers to and from the cache.
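
The harness was nothing fancy; a minimal sketch of the kind of loop I mean (shown against the memcached client that appears later, with the Velocity run differing only in the Put/Get calls) would be:

Imports System.Diagnostics

Module CacheBenchmark
    Sub TimeSmallStrings(ByVal cache As BeIT.MemCached.MemcachedClient)
        Dim sw As Stopwatch = Stopwatch.StartNew()

        ' Write then read back 1000 small strings.
        For i As Integer = 1 To 1000
            cache.Set("small" & i, "short test string " & i)
        Next
        For i As Integer = 1 To 1000
            Dim value As String = CStr(cache.Get("small" & i))
        Next

        sw.Stop()
        Console.WriteLine("1000 small strings: " & sw.ElapsedMilliseconds & "ms")
        ' The largeish-XML and Integer runs are the same loops with different payloads.
    End Sub
End Module

The results are as follows: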

(Results charts for Memcached, Velocity, and Velocity with local cache turned on)

Clearly memcached is faster than Velocity (unless you count the local-cache option, which is cheating!). Velocity seems less fussed about the long XML strings than memcached though: reading them back is 5x slower than the small strings on memcached, but only 2x slower on Velocity. That could just be the client library, of course.

Working With Them

Both support a put/get model that's pretty much identical:

Velocity:

Dim CacheFactory1 As DataCacheFactory = New DataCacheFactory()
Dim myCache1 As DataCache = CacheFactory1.GetCache("test")

myCache1.Put("Author", "Brian")
Dim name As String = myCache1.Get("Author")

Memcached:

Dim objcache As BeIT.MemCached.MemcachedClient = _
    BeIT.MemCached.MemcachedClient.GetInstance("production")

objcache.Set("Author", "Brian")
Dim name As String = objcache.Get("Author")

In use there's very little in it, however there was something about Velocity I just couldn't put my finger on. Pulling out the network lead confirmed my hunch:

DOH! Something designed to give us scalability, resilience etc. fails spectacularly if it can't find the hosts. The whole idea behind these things is that you can use spare memory on spare machines, machines that may go down once in a while (or reboot for updates, for example). Memcached fails much more gracefully and just returns 'nothing' after a short delay, which is what you'd expect. I'm sure Velocity could be coded around, but for me, for now, using 'spare' machines, Memcached seems to be the way to go. I'll code it behind a 'layer' so I can switch to Velocity (or something else) if we ever need something bigger than memcached.
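
As a sketch of what that 'layer' might look like (the interface and class names here are mine, nothing official):

' A thin caching interface so the app never talks to a specific client directly.
Public Interface ICacheProvider
    Sub Put(ByVal key As String, ByVal value As Object)
    Function [Get](ByVal key As String) As Object
End Interface

' Memcached-backed implementation; a Velocity-backed one could be swapped in later.
Public Class MemcachedProvider
    Implements ICacheProvider

    Private ReadOnly _client As BeIT.MemCached.MemcachedClient = _
        BeIT.MemCached.MemcachedClient.GetInstance("production")

    Public Sub Put(ByVal key As String, ByVal value As Object) Implements ICacheProvider.Put
        _client.Set(key, value)
    End Sub

    Public Function [Get](ByVal key As String) As Object Implements ICacheProvider.[Get]
        Return _client.Get(key)
    End Function
End Class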

Tweaking the Message Review Page

When our operators save a client message, it takes them to a review screen (which does some neat things I'll cover in more detail when I've got time). It tells them if they've made a spelling mistake or omitted a field, and it also handles things like proper casing.


Previously it looked like this:

However, even when presented with the review, occasionally they either instinctively click on ‘Send Original’ or they miss a spelling suggestion. Not very often, but when it’s busy in the call centre it does happen. So let’s see what could be the problem:

Right, so a few quick changes later we end up with something much cleaner that steers the operator towards the modified version:

The styles of the corrections are now the same (mis-spelt words being a bit stronger), there's more visual weight on the 'Send/Edit Modified' links, and the 'Original' links are hidden for the first 2 seconds as well.

All in all, some very simple tweaks that took 20 minutes to do. I can measure the percentage of messages that get sent without being corrected, so hopefully after a few weeks we'll see a noticeable difference.

It’s not about the tools or the language! (Yeah right)

Pretending it's all about *doing* stuff seems to be the trendy viewpoint for some developers these days, with their war cries of "It doesn't matter what language you write in, just get writing" and "You can use anything to write code, even Notepad".


The people most vocal about this seem to be the Ruby on Rails guys with their "Just start coding" ethos, though in part that may be them picking up on DHH's slightly militant worldview.


Great, I agree 100%. But do you know what? Ruby on Rails is too fiddly. Maybe it's purely because I'm on Windows, but the promise of it being 'low on dependencies' and priding itself on 'shipping with most (sic) everything in the box' is a bit of… well… a lie. You've got to download Ruby. Then Gems. Then use Gems to download Rails… and a DB. Then you're going to need an editor…


If the RoR guys want a bigger take-up they need to lower the barrier to entry a long, long way.


I did eventually manage to get it running using a combination of the recommended downloads, a lot of luck, and the InstantRails download (which, hilariously, didn't work first time and was missing something, though I can't remember what). I wish I'd documented the comedy install process: I must have typed a good 20 commands into the command line, edited a fair few config files and spent about an hour on Google tracing error messages. By the time I'd got the 'hello world' site running, I no longer had the enthusiasm to start learning it.


In contrast, I put Visual Web Developer on my laptop (which is a clean machine other than the huge amount of Ruby junk now on it). The install took 2 minutes, worked first time and needed nothing else.


So no, it's not about the tools. It's not about the language either. But I'll take something that works first time over something that doesn't…


LINQ to SQL (part 2)

Following on from Part 1.


The problem I find with these 'drag your table on here and it all just works' tools is… extensibility. If you just want an editing tool then they're great, but anything custom becomes really, really hard (to the point where you end up binning it and hand-rolling from scratch!). Not so with LINQ to SQL…


Validation


The data layer supports any kind of validation you like, and it surfaces in your code like a normal exception. For example, if we want to prevent someone entering a date from the past, we implement the OnValidate method that the designer declares for each entity (best done in a partial class, so a regeneration doesn't wipe it out) and put in our code:
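
A minimal sketch of the idea (using the BackupJob entity; the StartDate field is hypothetical):

Partial Public Class BackupJob
    ' Called by LINQ to SQL when the entity is validated during SubmitChanges.
    Private Sub OnValidate(ByVal action As System.Data.Linq.ChangeAction)
        If action = System.Data.Linq.ChangeAction.Insert AndAlso _
           Me.StartDate < DateTime.Today Then
            Throw New ApplicationException("StartDate cannot be in the past.")
        End If
    End Sub
End Class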


Run our insert code with a date in the past, and this is what happens:

A very, very neat solution, and it doesn't rely on any complicated overrides.


Extra Custom Code


The other extensibility issue I normally find is when you want to 'do something else'. Say you want to bill for copied MBytes through the backup service. Where would you put that? You could do it in the main code, but you can also handle it in the data layer. For example, you can put some code in this 'attach' method:
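
Roughly like this; attach_JobHistories is the designer-generated association method for my tables, while the Billing call and the CopiedMBytes field are hypothetical:

' Designer-generated method that runs when a JobHistory row is attached to this BackupJob.
Private Sub attach_JobHistories(ByVal entity As JobHistory)
    Me.SendPropertyChanging()
    entity.BackupJob = Me

    ' Bill for the copied MBytes, grabbing info from both entities.
    Billing.RecordCopiedMBytes(Me.Computer, entity.CopiedMBytes)
End Sub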


Now whenever we call the Add method, our code runs and we can bill for those pesky copied MBytes.



Note the really cool part… in our parameters for the call to the billing routine, we’re grabbing info from both entities.


LINQ to SQL (part 1)

I've begun my learning escapade with LINQ to SQL. I sort of had an idea what it was about, but had never really had a chance to play with it. I'm glad I took the time! There are a million and one tutorials on the net so I won't bother replicating one; I'll just do a 'snapshot' of some of the features.


Diagram


First of all, you add a 'LINQ to SQL Classes' item in Solution Explorer, then drag a couple of tables from Server Explorer onto the design surface. You can then tell it what the relationship between the tables is (you don't have to have set this up in SQL!).


Classes


Then, when you hit save, it auto-generates some classes for you that look like voodoo inside:
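
A heavily trimmed sketch of the sort of thing it generates (one column shown; the real file has a property per column, plus change-notification plumbing and partial methods like OnValidate):

Imports System.Data.Linq.Mapping

<Table(Name:="dbo.BackupJob")> _
Partial Public Class BackupJob
    Private _JobID As Integer

    <Column(Storage:="_JobID", IsPrimaryKey:=True, IsDbGenerated:=True)> _
    Public Property JobID() As Integer
        Get
            Return _JobID
        End Get
        Set(ByVal value As Integer)
            _JobID = value
        End Set
    End Property
End Class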


Getting Records


Now the cool part… the code window. Check out how simply you can select a record and update it. Then see how we can run a query that would have required a SQL JOIN before: we select from the JobHistory table where the related BackupJob has a Computer field equal to "Mimir". It's all automatic; I've not added anything else in. Very clever.
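
Roughly like this (BackupDataContext and the LastRun field are made up; the rest follows my tables):

Dim db As New BackupDataContext()

' Select a single record and update it - no SQL in sight.
Dim job = (From j In db.BackupJobs _
           Where j.Computer = "Mimir").First()
job.LastRun = DateTime.Now
db.SubmitChanges()

' The query that used to need a JOIN: history rows whose
' related BackupJob has a Computer field of "Mimir".
Dim history = From h In db.JobHistories _
              Where h.BackupJob.Computer = "Mimir"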


Adding Records


Adding a record in SQL is normally an utter PITA: the lack of intellisense means having to build a huge parameterised query by hand. Not with LINQ to SQL!
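
Something like this, carrying on with the same db and hypothetical entities as above:

Dim job As New BackupJob()
job.Computer = "Mimir"
job.StartDate = DateTime.Now

' Queue the insert and fire it off - intellisense the whole way.
db.BackupJobs.InsertOnSubmit(job)
db.SubmitChanges()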


Adding Relational Records


I've still got more digging to do into the product, but this bit has pretty much blown me away. It would have been a right nightmare in SQL: you'd have had to do an insert and return the SCOPE_IDENTITY, then insert the history rows using that. LINQ to SQL makes it staggeringly easy. You just build the objects in memory, add the 'child' rows to the main row and off you go.
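
Something along these lines (CopiedMBytes again being a made-up field):

Dim job As New BackupJob()
job.Computer = "Mimir"

' Attach the child rows to the parent, all in memory.
job.JobHistories.Add(New JobHistory With {.CopiedMBytes = 120})
job.JobHistories.Add(New JobHistory With {.CopiedMBytes = 340})

' One submit: the parent row goes in first and the children pick up its new ID.
db.BackupJobs.InsertOnSubmit(job)
db.SubmitChanges()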

And to prove it works, here are the rows:

I have to say, I'm hugely impressed! I wish I'd known about it sooner. For raw performance it's undoubtedly better to stick to hand-rolled SQL, but for those horrendously boring CRUD scenarios, like saving user preferences, this will be a godsend!

Building a scalable, fault tolerant background processing system (part 2)

Having decided how I'll roll my batch jobs up (see Part 1), I now needed a 'thingy'™ to fire the jobs off.

This 'thingy' should probably be a .NET desktop app rather than a service, so I can touch it and see it. It simply needs to look in a SQL table to fetch the 'queue', then execute the jobs. But what if it crashes? Or that machine goes down? Simple: we go multi-server. We run the 'thingy' (let's call it QueueProcessor from now on) on each of the web servers in our web farm, and we put the 'job' web pages on each machine as well so it can simply call http://localhost/backgroundservices/dosomething.aspx.
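
As a sketch of that check (the table, column and page names here are illustrative):

Imports System.Data.SqlClient
Imports System.Net

Module QueueProcessor
    Sub CheckQueue(ByVal connectionString As String)
        Using conn As New SqlConnection(connectionString)
            conn.Open()
            ' Fetch the due jobs; GETDATE() comes back too, for the clock sync described below.
            Dim cmd As New SqlCommand( _
                "SELECT JobPage, GETDATE() AS ServerTime FROM JobQueue WHERE RunAt <= GETDATE()", conn)
            Using reader As SqlDataReader = cmd.ExecuteReader()
                While reader.Read()
                    ' Each job is just a page hosted on this web server.
                    Using client As New WebClient()
                        client.DownloadString("http://localhost/backgroundservices/" & CStr(reader("JobPage")))
                    End Using
                End While
            End Using
        End Using
    End Sub
End Module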

OK, but what do we do about polling? Since this is designed for scalability and resilience over outright speed/low latency, you probably only need to check the queue once per second (and let's be honest, if you wanted to be more responsive than that you wouldn't poll at all, you'd find a way of pushing data instead). If it was a single server it'd be a piece of cake, but multi-server we need to be cleverer.

We need to poll the queue every second, but from alternating machines.

A few caveats though:

  1. The local times on the machines will be different.
  2. The network latency to the server can be different from machine to machine (if some are located elsewhere)

If the local times are out of sync you can't possibly hope to get 2 servers to ping on alternating seconds. If one server is 0.9 seconds out you'd get checks at 0.000, 0.100, 1.000, 1.100: bunched pairs rather than an even once-per-second spread. Not ideal.

So I use SQL time as the central time, i.e. I return GETDATE() as part of the queue check. I also work out the network latency from the response time (assuming the query took 0ms on the server). Once I've done that, I have an offset between local time and server time. A timer then runs every 10ms and does a simple calculation using local time + the offset, plus the number of running machines (grabbed by looking at how many machines have 'checked in'), to determine whether it should do anything.
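
A sketch of that calculation (method and variable names are mine):

Module QueueTimer
    Private offset As TimeSpan
    Private lastSecondChecked As Long = -1
    Private myMachineIndex As Integer = 0   ' this machine's position in the farm
    Private machineCount As Integer = 2     ' how many machines have 'checked in'

    ' Called after each queue check; serverTime is the GETDATE() value it returned.
    Sub UpdateOffset(ByVal serverTime As DateTime, ByVal roundTripMs As Long)
        ' Assume the query took 0ms on the server, so half the round trip is one-way latency.
        Dim latency As TimeSpan = TimeSpan.FromMilliseconds(roundTripMs / 2)
        offset = (serverTime + latency) - DateTime.Now
    End Sub

    ' Runs every 10ms; fires the queue check only in this machine's one-second slot.
    Sub OnTimerTick()
        Dim serverNow As DateTime = DateTime.Now + offset
        Dim second As Long = CLng(Math.Floor((serverNow - serverNow.Date).TotalSeconds))
        If second Mod machineCount = myMachineIndex AndAlso second <> lastSecondChecked Then
            lastSecondChecked = second
            ' CheckQueue(...)  ' the check from the sketch above
        End If
    End Sub
End Module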

Simple.

Expanding your programming skillset

I’ve followed a pretty simple education process when learning how to program:



  1. BBC Basic
  2. QuickBasic
  3. Acorn Basic
  4. Visual Basic 5
  5. Visual Basic 6
  6. ASP
  7. ASP.Net (VB)

Spot a theme there? Obviously along the way I've learnt SQL, picked up a bit of JavaScript, and enough C to know that I'm dangerous. Side skills like HTML/CSS are a given.


But now what? For the last 4 years I've been trudging along with ASP.NET. Sure, it's a MASSIVE framework and I come across unknown functions every day, but it's not really a challenge any more. I want to learn something new. Languages are an obvious choice, but you could also argue that some of the framework extensions in .NET are big enough to be classed as a language, or at least a dialect.


I've set myself a target of grokking one language/technology per week, though I'm pretty sure some are going to take less time and others much, much more. So what to learn?



  • PHP
  • Ruby
  • Ruby on Rails
  • LINQ
  • ADO Entity Framework
  • ASP.NET MVC
  • Javascript (like… learn it properly rather than my pidgin version)

PHP, for example, isn't that 'big': there are a lot of functions, but so long as I can remember where to look them up I'll be fine. Ruby on Rails looks hugely appealing for those "I need to knock this up quick sharp" moments.


So, if you had 2 months' worth of spare time (say 1.5-3 hours per day) to learn some new skills, what would you learn?



Voice File Storage (aka “storing a metric crap load of MP3s”)

We record every call we take, for quality control reasons and to protect our operators against nuisance callers. Everything is recorded by a rather ghetto (but very workable) system which records each call via the soundcard of the workstation taking the call. It's then pushed into a queue on our main file server as a WAV file, the filename of which contains the 'tag' of the message for future matching up.


Then there's a processor server that converts all these WAV files into MP3s and dumps them onto a larger file server in this kind of fashion:


\\bigassserver\voicefiles\2009\5\1\thisfilename.mp3


This larger file server is currently running out of space, however. It's also used for our backups, and the 1.5GB/day of voice recordings is starting to take its toll.


So how should I store all these MP3s? The server doesn't need to be fast at all: it writes maybe 1GB a day and reads maybe 100MB a day, so it's hardly stressed. Uptime isn't an issue either; if it goes down, things just get queued up. The data, however, is quite 'precious': you certainly couldn't replace it. It's also incredibly unwieldy, so you can't easily keep an up-to-date copy offsite.


A NAS device is probably the way forward, but they come in so many flavours it's difficult to know how to approach it. With a requirement of around 3TB of storage (at 1.5GB/day, that's roughly five years of headroom), there are a few options. I could buy an expensive single NAS, put my faith in RAID5 and cross my fingers that the device itself won't break. I could buy a pair of cheap 2-bay NAS devices, run them with no RAID at all and mirror the data across the boxes. That removes the risk of a single device breaking, but the drives WILL fail at some point and I'd have to rebuild from the other box. Or I could spend even more cash and do a hybrid of the two: a decent RAID5 3TB box, plus a cheap 3TB non-RAID mirror.


Option 1 – ‘High End’ NAS device


NetGear ReadyNAS 1100 4x1TB Rackmount Network Storage (NAS) RNR4410-100EUS – £1,566.99 inc VAT


Pros: Plug and play, 3TB of RAID5 storage, should work fine.
Cons: Could dump all the data if the box itself or the RAID volume goes pop.


Option 2 – 2 x low end NAS devices


2 x NetGear ReadyNAS Duo + 4 x 1.5TB drives = (£205 + £200) x 2 = £810 inc VAT


Pros: Protected against device failure.
Cons: A drive failure (which WILL happen) will take the whole device down.


Option 3 – 1 x medium and 1 x low end


NetGear ReadyNAS NV+ 4 Bay + 4 x 1TB = £459 + (4 x £80) = £779
NetGear ReadyNAS Duo + 2 x 1.5TB drives = £405
Total: £1,184 inc VAT


Pros: Protected against device failure AND disk failure on the primary device. Could spec the 4-bay with 1.5TB drives to get an extra 1.5TB of unmirrored space.
Cons: A little more pricey than option 2, and more work to admin than option 1.


So which option would you go for, and why?


If money was no object I'd have one of these by now…