Mini mouse review – Microsoft Sidewinder X8

My old stalwart Logitech cordless mouse has been slowly getting worse over the last few months. It was struggling with the wood grain on my desk, and even on a better surface the pointer would occasionally jump.

Looking around at mice, I wanted something with super high DPI for accuracy, a couple of buttons for doing ‘other things’, and it needed to be cordless. Getting the wire wrapped around your keyboard is so 1990s.

The problem with wireless mice is that the vast majority now seem to be aimed at laptop users: there’s no charging cradle, they just run on an AA battery or two and have an off switch. Great until you leave it on overnight and have a flat mouse in the morning.

On the DPI front, most regular mice seem to top out at 2000 DPI or so, but who do we know that always needs vastly over-the-top kit? Yes… gamers!

After a hasty 10 minutes’ research I located this puppy on Amazon:


http://www.amazon.co.uk/dp/B001DCELH2

 

Yes, it looks ridiculous. The odd styling touches, bizarre logos and occasional lights make it the Citroen DS3 Racing of the mouse world, but for someone looking for a responsive wireless mouse it’s perfect.

I’ve used it for a week now and the improved accuracy over the old Logitech is very useful. Also useful are the adjustable DPI buttons on the top, which give me Fast/Normal/Slow But Accurate settings direct from the mouse. They’re designed for gamers doing stupid things in first-person shooters, but it’s incredibly useful to be able to slow the pointer right down for masking fiddly details in photo software.

Oh, and the bit I like the most (though I’ve not needed it yet): there’s no cradle. Instead, there’s a magnetic puck on the end of a thin cable so you can charge it whilst you use it, like a corded mouse. My old Logitech required you to pop it in the cradle and twiddle your thumbs for a few hours if the battery ran out.

All in all, a top product at a reasonable price. I just wish it didn’t look quite so Halfords.

Managing clients’ patched calls with a single click

The status quo of call patching

With our pureJAM service you can have calls patched to you. You simply set your call instructions for each of four statuses, and you can change your status via Twitter/SMS/web page, so you have pretty good flexibility over how we handle your calls. For each status you can have us try to patch to your mobile, your landline, both, or neither. See the example below for what a call instruction looks like when being set up.

[Screenshot: setting up a call instruction]

In the above example, when the operator takes a call for me and I’m ‘Busy’, they’ll politely put the caller on hold, call me on 02072070007 and see if I want to take the call. They’ll tell me who they have on the line; if I want to take the call they’ll put them through, and if I don’t they’ll explain to the caller that I’m unavailable and take a message.

A better solution

Wouldn’t it be great, though, if you could have this ‘discussion’ with our operators much more quickly and less intrusively? A lot of us spend our time behind a computer, but that doesn’t always mean we’re available to take calls. Maybe you’re on another call, maybe you’re on a video chat, or maybe you just want to get some work done.

So, I’ve developed a very early prototype system for doing just that. It uses a desktop client that sits in your taskbar (it lets you change your status, shows unread messages, and so on). You must be a pureJAM client (otherwise there’s no point, as you can’t have calls patched!), you need a PC, and you must have the .NET Framework 3.5 or later. It’s a Windows ClickOnce application, so it’s easily installed from a URL. It also connects to our systems on port 80 using standard HTTP, so it’ll work behind any firewall/NAT. If you want the download link you’ll need to register for the Beta program (details at the end of the post).

Step 1: Instruction Setup

Once you’ve installed the desktop client, you’ll see that as you log in and out of your online portal the desktop client logs in and out as well (using a rather neat way of linking a browser to a Windows application). Then, if you go into your contact instructions, you’ll see a new menu option under ‘Patch To’:

[Screenshot: the new ‘Patch To’ menu option]

You then get a choice of action if we don’t hear back from you, i.e. you’re not at your computer. (Incidentally, you can install copies of the desktop client on multiple computers and it’ll send the request to any that are logged in.)

[Screenshot: choosing the fallback action]

And this is what the status summary now looks like:

[Screenshot: the updated status summary]

Step 2: A caller comes through to us

When our operator gets a call for you, they’ll take the basic details from the caller and then click the ‘See If They Want This Call’ button.

[Screenshot: the operator’s ‘See If They Want This Call’ button]

Step 3: We let you know about the call

Within a few milliseconds, a packet of data is sent to your PC (via our really cool comms system that you don’t need to care about) containing the basic call information that our operator got from the caller. You’ll get a notification bleep and you’ll then have a choice of what to do with the call.

[Screenshot: the desktop notification with the call details]

Step 4: Our operator gets your response

Using the long polling techniques I developed in Project Totem, your response is instantly pushed back to the operator’s screen and they can handle the call how you want them to.

[Screenshot: your response on the operator’s screen]

 

Beta testers required

If you’re a pureJAM client and want to test this system out, you’ll need to be a PC user (no Macs, I’m afraid) and you’ll need the .NET Framework 3.5 or later. Most Vista/Windows 7 PCs should work fine; XP will work fine provided it’s been kept up to date! To register for the Beta program, please log in to your pureJAM account and send a U2U to your Account Manager entitled something like “Desktop Client Beta”.

viewmessages.com Architecture

OK, so we’re not the biggest site in the world, but we have a fair amount of data and a fair number of users, and speed matters to me, so everything needs to be as fast as possible. A few people have asked what our architecture is and I thought it’d make an interesting post. As is always the way with these things, it’s easier to describe with a diagram:

[Diagram: viewmessages.com server architecture]

Web Servers

content.viewmessages.com

First of all, we serve images and bulky JavaScript/CSS from the Amazon CloudFront CDN, which is an incredibly cheap way of offloading those things to the Amazon infrastructure. It also makes the platform much snappier for our American users. Even if you only have a basic website, it’s worth looking into CloudFront if only because it gives you a second domain to pull your data from, which allows the browser to parallelise more downloads.

totem.viewmessages.com

Totem is my own long polling server, developed to allow instant communication to the user’s browser. This allows things like instant new message notification. In short, your browser uses jQuery to request a script from Totem. If there are no new messages, Totem will sit there for 40 seconds and return nothing. Your browser will then re-request the script and wait for another 40 seconds. If you get sent a U2U, say, 5 seconds into the 40 seconds, the web server or background server dealing with the U2U sends a notification to Totem, which creates a bit of JavaScript to display the U2U notification and sends it back as the response to the original request made 5 seconds previously. For more on Totem, read my Project Totem blog post.

static.viewmessages.com

Because we use a web cluster to serve the main HTML, we need a central server for avatars and other central data that we don’t push to CloudFront. The only challenge here was getting content onto it: security in IIS meant the main web cluster couldn’t access the machine directly, so I had to use a SQL database as a proxy.

http://www.viewmessages.com

The main web serving is done by a cluster of IIS machines. These are cheap commodity machines in the Google style: 2GB RAM, 2GHz dual-core CPU, 80GB drive. Nothing fancy or expensive. By using multiple cheap machines instead of one big expensive one, we get vastly better availability (they can be brought offline for updating) and far better performance (if you add up the total computing power) at less cost. It’s a win-win, other than making the software development slightly more complex at times.

Each machine runs a copy of SQL Express to write access logs to (which are then copied to the main SQL box when things are quiet) and to store a whole bunch of reasonably static information (such as configuration) to reduce the load on the main SQL box. Each machine can do front end web serving, back end task processing, or both. As we need more capacity we can simply add more machines. The load balancers send a user’s requests to a particular web server using a session cookie. If that server goes down, failover happens within 10 seconds and you’ll be transparently placed onto a different server.

The back end task processing is something I’m particularly pleased with, as it allows the processing load to be distributed across as many machines as we need. At the moment these are the same machines that serve the front end web stuff, but at a later date they’ll be split off into a dedicated back end cluster. All the back end processing is done by requesting webpages from a queue. If you want to read about how we process background tasks, here’s my blog post about it.

Background Servers

Background / Offline Processing

As mentioned above, this is done using queues of webpages and is processed by the main web cluster.

Main SQL Store

Nothing interesting here really, I’m afraid. Just a reasonably beefy Dell machine with data replicated to a hot-spare backup.

Solr Server

I’m now using Solr to generate the data for the new Message Analytics feature. I’ll do a blog post about it at some point in the future, but it’s incredibly fast compared to using XML data with SQL. Doing a ‘group by’ on an XML value in SQL was taking around 1200ms for a particular data set (on an unloaded server). Using Solr on a *much* less powerful machine took 20ms. It’s an incredible piece of software, if slightly tricky to use.

Memcached

The staple of every high performance website. Memcached is a memory-based data store. I don’t use it to store reasonably static data, as that’s done in the ASP.NET cache object (which is 10x quicker due to being on the machine itself); instead I use Memcached to store precompiled data that’s used across machines. For example, when you get sent a U2U, a background task ‘delivers’ it to your inbox. This task puts the message in your inbox, adds it to the search database, then takes your 10 most recent U2Us, recompiles the HTML you see in the ‘Recent U2U Messages’ widget on your homepage, and inserts it into Memcached. The background task then notifies the Totem server about the U2U, Totem notifies your browser, your browser requests the new HTML blob from the webserver and, guess what? It’s already been generated, so the webserver just grabs it from Memcached. The beauty of using it over the ASP.NET cache is that cached objects can be shared across machines.
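To make the pattern concrete, here’s a minimal cache-aside sketch in VB.NET. It assumes the Enyim memcached client and made-up key and helper names; the post doesn’t say which .NET client or keys are actually used.

Imports Enyim.Caching
Imports Enyim.Caching.Memcached

Public Class U2UWidgetCache
    Private Shared ReadOnly Cache As New MemcachedClient()

    ' Called by the background 'deliver U2U' task after it rebuilds the widget HTML
    Public Shared Sub StoreWidgetHtml(ByVal userId As Integer, ByVal html As String)
        Cache.Store(StoreMode.Set, "u2uwidget:" & userId, html)
    End Sub

    ' Called by the web tier when the browser asks for the widget HTML
    Public Shared Function GetWidgetHtml(ByVal userId As Integer) As String
        Dim html As String = Cache.Get(Of String)("u2uwidget:" & userId)
        If html Is Nothing Then
            ' Cache miss: rebuild from SQL and repopulate the cache
            html = BuildWidgetHtmlFromSql(userId)
            Cache.Store(StoreMode.Set, "u2uwidget:" & userId, html)
        End If
        Return html
    End Function

    ' Hypothetical placeholder for the real database work
    Private Shared Function BuildWidgetHtmlFromSql(ByVal userId As Integer) As String
        Return "<div>...recent U2Us...</div>"
    End Function
End Class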

Memcached is a great bit of software and we’ve had absolutely zero issues with it. The current stats from our memcached instance are below:

STAT uptime 28625365 (nearly a year)
STAT time 1280509032
STAT pointer_size 32
STAT curr_items 30626   (It’s 100,000 or so during busy periods)
STAT total_items 10108777
STAT bytes 9450225
STAT curr_connections 17
STAT total_connections 10040
STAT connection_structures 24
STAT cmd_get 39701711
STAT cmd_set 10108777
STAT get_hits 33158267  (It’s saved a LOT of SQL reads!)
STAT get_misses 6543444
STAT bytes_read 3126086257
STAT bytes_written 821258193  (It’s served 800GB!)
STAT limit_maxbytes 524288000

And to think, I was almost tempted to use Velocity instead. You can read why I didn’t.

Summary

By applying a bit of thought and leveraging the right technology for each part of the puzzle, we’ve got a platform that *way* outperforms a traditional single big webserver setup. We also keep the load on the main SQL box minimal by using quite aggressive caching (in memory on the local webserver, in Memcached, and in SQL Express on the local webserver).

Hot air extraction – more efficient server room cooling

 

In addition to the power for the servers themselves, a huge cost we have is cooling them: 6kW of servers is going to require some chilling. Our data room has air conditioning and it works very hard for a living, particularly in summer, when the heat differential across the aircon exchanger outside is lower. In a big data centre you’d pump chilled air into a ‘cold aisle’ in front of a row of racks, and then have a ‘hot aisle’ behind them where you suck the air back into the A/C. Unfortunately our building wasn’t designed with this in mind, so we simply have a wall-mounted unit that cools the whole room. The problem with cooling the room, though, is that there’s no way of making sure the servers see chilled air; they might get air that has come straight from the back of the rack and been sucked round again.

Whilst doing some tidying up I spotted one of our old extraction fans from years gone by. When we were a much smaller company, air was drawn into the room at one end and extracted at the other. It kept things cool enough until we started to need more equipment, and then A/C was the only option.

Anyway, below you can see the unused fan, with our main server rack to the left.

[Photo: the unused extraction fan and the main server rack]

There are probably some very expensive hot air extraction systems on the market, but I figured there was no point in spending a lot of cash on a trial. B&Q to the rescue for some gaffa tape and guttering pipe. Add in the old box from my Herman Miller ‘Mirra’ chair and an hour of creativity, and we have a working hot air extraction system…

[Photo: the finished baffle and ducting]

I simply made a baffle in front of the fan and added ducting that goes down behind the server rack. It’s not pretty, but it does work:

[Graph: rack temperatures before and after fitting the extraction fan]

There… proof it works! We dropped the temperature measured at the top of the rack by a degree, and air intake temps on the servers dropped even more. Our SQL server was drawing in 24-degree air previously and is now a lot more chilled (21 degrees!). The UPS unit on the floor beside the rack had a similar drop, from 25 to 22 degrees.

We’ve massively reduced the strain on our air con unit for the grand sum of about £50 plus the overhead of a 100W fan (which is more than offset by the potential savings in air con for that room).

The next thing to try is adding curtains from the side of the rack to the wall to force the hot air into the extracted area.

Quick book review: Leaving Microsoft to Change the World

 

 

http://www.amazon.co.uk/Leaving-Microsoft-Change-World-Entrepreneurs/dp/0007237030/

 

My rating: 9/10

A great read overall… not quite Three Cups of Tea, but inspiring nonetheless. Unlike Greg Mortenson, John Wood started out from a very strong position as a senior exec at Microsoft. It’s fascinating to see how he uses lessons from his past life working with highly driven people like Steve Ballmer to create a non-profit that has improved education for more than 4 million children in Bangladesh, Cambodia, India, Laos, Nepal, South Africa, Sri Lanka, Vietnam and Zambia.

As well as making you want to jack it all in for something more meaningful, it’s got some half-decent business lessons in there.

If you had to make the choice, I’d go with Three Cups of Tea every time, but this is still a cracker.

Re-inventing the spell checker

Background

Our system does a ‘review’ of messages after our operators save them. It checks for things like fields not being filled in where they normally are, but most importantly it checks for spelling mistakes and typos.

We used to use the Telerik RadSpell spell checker component in a back end web service. It worked adequately, but it had a limited dictionary, didn’t know the Queen’s English (it uses American spellings), and the suggested corrections were often a bit… random, as you can see from the screenshot below. The Word column contains the supplied misspelling and the subsequent columns are the suggestions (in order).

 

[Screenshot: the Telerik engine’s suggestions for a batch of misspellings]

 

How does an average spell checker work?

It’s pretty simple to make a crude spell checker: all you need is a dictionary of correct words. You take each word and check whether it exists in the dictionary. If it doesn’t, you loop through the dictionary seeing how different each correct word is from the supplied misspelled word. There’s a well-used algorithm for measuring how different two words are, called the Levenshtein distance, or ‘edit distance’. Each insertion/deletion/substitution counts as one ‘edit’. For instance, take the misspelling ‘hosspitle’:

hosspitle -> hospitle (delete the extra ‘s’)
hospitle -> hospitae (substitute ‘a’ for ‘l’)
hospitae -> hospital (substitute ‘l’ for ‘e’)

That’s an edit distance of 3. The lower the Levenshtein distance, the more alike the words are.
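To make that concrete, here’s a standard dynamic-programming implementation in VB.NET (my illustration, not necessarily the code the system uses):

Module EditDistanceHelper
    ' Classic Levenshtein distance: d(i, j) holds the distance between the
    ' first i characters of a and the first j characters of b
    Public Function EditDistance(ByVal a As String, ByVal b As String) As Integer
        Dim d(a.Length, b.Length) As Integer
        For i As Integer = 0 To a.Length : d(i, 0) = i : Next ' delete all of a
        For j As Integer = 0 To b.Length : d(0, j) = j : Next ' insert all of b
        For i As Integer = 1 To a.Length
            For j As Integer = 1 To b.Length
                Dim cost As Integer = If(a(i - 1) = b(j - 1), 0, 1)
                d(i, j) = Math.Min(Math.Min( _
                    d(i - 1, j) + 1, _
                    d(i, j - 1) + 1), _
                    d(i - 1, j - 1) + cost) ' deletion, insertion, substitution
            Next
        Next
        Return d(a.Length, b.Length)
    End Function
End Module

EditDistance("hosspitle", "hospital") returns 3, matching the walk-through above.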

There’s a slight snag with doing it this way, though. Even with a small dictionary of, say, 10,000 words, you’d need to compare all 10,000 words to your misspelling, and there’s no real way of pre-computing the comparisons as you can’t possibly cater for all misspellings. It’d be quite a costly computational exercise. We can get a much smaller subset of words to compare by selecting them based on a phonetic algorithm, the most common of which is Soundex. This way we can pre-compute the Soundex code for all of our known good words.

For example, you can get the Soundex value for ‘hosspitle’ in SQL Server 2008 by doing select soundex(‘hosspitle’). This gives a value of H213. If I check the Soundex of ‘hospital’ I also get H213, which means the correct result would be in the subset. A good start!

Spell Check 2.0

Why?

Because crappy-looking messages with spelling mistakes and typos don’t give the client a sense of professionalism. The previous spelling corrector cried wolf a bit too often, and I found that a lot of operators had got used to ignoring it. Also, if it didn’t list the correct suggestion first time around, it took them a while to go back into the message, correct the word and resend… so there was a temptation to just ignore the mistake and send it anyway.

Improvement #1 – Junking the Telerik engine

The first step was to reproduce what the Telerik spell checker does, so I could start to develop my own system. This turned out to be pretty easy. Just find a dictionary of English words on the tinterweb, upload it to SQL, create a column for a Soundex value, and use the built-in SQL Soundex function to pre-compute the values by doing “update englishwords set soundexvalue = soundex(word)”.

You can then select your word subset back by doing something like “select * from englishwords where soundexvalue = soundex(@mywrongword)”. Using the ‘hosspitle’ example, my dictionary gives me 47 records, including:

 

hagbut
hasped
hispid
hackbut
hagbuts
hawkbit
hexapod
hackbuts
hawkbits
hexapods
hexapody
hiccuped
hospital
hagbuteer
hagbutter
hiccupped
hispidity
hospitals
hospitium
houseboat
housebote
hospitalizing
hospitableness
hospitalization
   

(No, I don’t know what half of those words are either!)

Anyway, once I have the subset I order it by edit distance, so hospital comes up amongst the first few results. This gave me exactly the same results as the Telerik engine, and therefore a decent baseline to work from.
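Pulled together, the baseline lookup goes something like the sketch below (table and column names assumed from the SQL above, and it reuses the EditDistance function from earlier):

Imports System.Collections.Generic
Imports System.Data.SqlClient

Public Function GetSuggestions(ByVal misspelling As String, ByVal conn As SqlConnection) As List(Of String)
    Dim candidates As New List(Of String)
    ' Fetch only the pre-computed Soundex subset rather than scanning the whole dictionary
    Using cmd As New SqlCommand("select word from englishwords where soundexvalue = soundex(@word)", conn)
        cmd.Parameters.AddWithValue("@word", misspelling)
        Using reader As SqlDataReader = cmd.ExecuteReader()
            While reader.Read()
                candidates.Add(reader.GetString(0))
            End While
        End Using
    End Using
    ' Closest matches (smallest edit distance) first
    candidates.Sort(Function(x, y) EditDistance(x, misspelling).CompareTo(EditDistance(y, misspelling)))
    Return candidates
End Function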

 

Improvement #2 – Using a bigger dictionary

Bigger is better most of the time, and I needed more words. Searching the internet for a while, I found some decent-sized CSVs listing words along with the number of times each word had been seen (this will be useful later). I uploaded this in exactly the same way as in improvement #1, into a table called BigDictionary, but with a field for the occurrences. The system now uses the previous English words dictionary just to check whether something is a valid word; if I don’t see it there, I use the BigDictionary table to retrieve a list of possibilities.

Improvement #3 – Learn words itself

If the spelling corrector uses a fixed dictionary, it doesn’t have a hope of keeping pace with the modern world. For example, just looking at the ‘wrong’ words being flagged up by the system as it stood after stage 2, I could see it was probably annoying operators: it had flagged up words such as Skype, Mercedes, Google, Ferrari, Bosch, Microsoft and Nokia. I wrote a small routine to go through two years’ worth of messages, separate out each word and upload it to a table; if the word was already in the table, I incremented an ‘occurrences’ field (again, this will be useful later!). I set the system to gate the results so uncommon words don’t appear in the suggestions, which helps stop misspellings being learnt as valid words.
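The learning pass itself is simple enough. A rough sketch (the table and column names here are my guesses, not necessarily the real schema):

Imports System.Data.SqlClient
Imports System.Text.RegularExpressions

Public Sub LearnWordsFromMessage(ByVal messageText As String, ByVal conn As SqlConnection)
    ' Bump the occurrence count for each word, inserting it on first sight
    Dim sql As String = _
        "update LearntWords set occurrences = occurrences + 1 where word = @word " & _
        "if @@rowcount = 0 insert into LearntWords (word, occurrences) values (@word, 1)"
    For Each m As Match In Regex.Matches(messageText.ToLower(), "[a-z']+")
        Using cmd As New SqlCommand(sql, conn)
            cmd.Parameters.AddWithValue("@word", m.Value)
            cmd.ExecuteNonQuery()
        End Using
    Next
End Sub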

I check the BigDictionary table for suggestions, then the LearntWords table, and aggregate the suggestions before sorting by edit distance.

Improvement #4 – Double Metaphone

The Soundex algorithm is pretty basic and totally reliant on the first letter being correct, which meant the pre-computed subset was often a bit limited and wouldn’t contain the correct result. After doing a bit of research into phonetic algorithms, Double Metaphone seemed like a good bet and a fair bit more advanced than Soundex. I created primary and secondary Metaphone fields for all of my dictionary tables so far (including the learnt words table) and made a script to calculate the primary and secondary Metaphone values for every word. After an hour of grinding away, I had precomputed Metaphone values as well as Soundex ones. I changed my SQL queries to something like select * from dictionary where (pm = @pm or sm = @sm or soundex = @soundex). This instantly made the result set bigger, and it seemed to get a few more hits, particularly if the typo was early in the word.

Improvement #5 – Weight by Frequency as well as Edit Distance

If you look at the screenshot of the initial results, you’ll see the Telerik checker suggested ‘darvon’, ‘driven’ and ‘thriven’ for the typo ‘dirven’. This is because it has no idea how common a word is, and it just so happens that ‘darvon’ has the same edit distance from ‘dirven’ as ‘driven’. I have absolutely no idea what a darvon is, and I suspect neither would our callers. Fortunately, in the BigDictionary and LearntWords tables I have an integer field essentially telling me how common each word is. I decided against simply using the count as a relevancy multiplier, as some words are hugely more common than others and would overwhelm the edit distance: if you put ‘thene’ instead of ‘theme’, you’d find it suggested ‘the’, as that’s vastly more common than ‘theme’, or even ‘them’ and ‘then’. Instead, I used the word’s position in the frequency-ordered results as the multiplier, so my SQL became something like:

select * from dictionary where (pm = @pm or sm = @sm or soundex = @soundex) order by wordcount desc

I then take the results and do something like:

Position = 0
For Each Result In Results ' ordered by word frequency, most common first
    Position += 1
    Result.Score = Position * EditDistance(Result, Word)
Next


The lower the score, the more relevant the result.

 

Improvement #6 – Learn from our mistakes

Looking through the log of mistakes and corrections, it seemed that the same ones were coming up again and again, for example ‘Plesae’ being changed to ‘Please’. It’s pretty obvious, but the system should look at what’s been corrected for the same mistake before and bring up that correction in the results. To recap, the process we’re now doing is:

  1. Check the EnglishWords table to see if it’s a common word
  2. Check LearntMistakes to see if we’ve seen the mistake before; if so, load the corrections into an array of suggestions
  3. Search LearntWords by Soundex and Double Metaphone for any soundalike words we’ve seen before in a previous message
  4. Search BigDictionary by Soundex and Double Metaphone for any soundalike words that are in the dictionary
  5. Score all retrieved suggestions by edit distance and position

 

Improvement #7 – Weight by source

Now we’re pulling in previous corrections, it’s pretty obvious that some sources are more relevant than others. For example, if I’ve seen ‘plesae’ changed to ‘please’ 80 times, it’s a fair bet that the next time I see ‘plesae’ they didn’t mean ‘police’, ‘palace’, etc. So the array of suggestions being filled from LearntMistakes, LearntWords and BigDictionary gains a source column, and our weighting code becomes something like:

Position = 0
For Each Result In Results

    Select Case Result.Source
       Case PreviousCorrections ' most trusted source, so lowest weight
          SourceWeight = 10
       Case LearntWords
          SourceWeight = 15
       Case BigDictionary
          SourceWeight = 20
    End Select

    Position += 1
    Result.Score = Position * EditDistance(Result, Word) * SourceWeight
Next

Again, the lower the weight, the higher the relevance.

 

Improvement #8 – Learn words by client not just globally

Some of our clients have industry-specific words. For example, if someone phones up to book a car with Supercar Experiences and we see the typo Miserati, it’s pretty likely the operator meant Maserati and not Miserable. When I processed the previously seen words from the last few years, I actually created two tables: one global across all clients, and one with a client code on each row, i.e. the learnt words are treated separately per company. I use a much lower threshold on the per-client table, so the system is quicker to allow learnt words into the suggestions than on the global table. This is purely because any wrong words that get learnt will only appear in suggestions for that company and won’t poison the global dictionaries.

 

Improvement #9 – Treat transpositions differently

One snag with the Levenshtein distance algorithm is that it has no way of detecting transpositions, so ‘ditsance’ is an edit distance of 2 from ‘distance’. Changing to the Damerau–Levenshtein distance algorithm fixes that, and it seemed to massively improve results where the mistake was just a transposition.
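For illustration, the optimal string alignment form of Damerau–Levenshtein needs only one extra case over the plain Levenshtein implementation shown earlier (again a sketch, not the production code):

Public Function DamerauEditDistance(ByVal a As String, ByVal b As String) As Integer
    Dim d(a.Length, b.Length) As Integer
    For i As Integer = 0 To a.Length : d(i, 0) = i : Next
    For j As Integer = 0 To b.Length : d(0, j) = j : Next
    For i As Integer = 1 To a.Length
        For j As Integer = 1 To b.Length
            Dim cost As Integer = If(a(i - 1) = b(j - 1), 0, 1)
            d(i, j) = Math.Min(Math.Min( _
                d(i - 1, j) + 1, _
                d(i, j - 1) + 1), _
                d(i - 1, j - 1) + cost)
            ' The extra case: swapping two adjacent characters counts as one edit
            If i > 1 AndAlso j > 1 AndAlso a(i - 1) = b(j - 2) AndAlso a(i - 2) = b(j - 1) Then
                d(i, j) = Math.Min(d(i, j), d(i - 2, j - 2) + 1)
            End If
        Next
    Next
    Return d(a.Length, b.Length)
End Function

With this version, DamerauEditDistance("ditsance", "distance") returns 1 instead of 2.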

 

Improvement #10 – Context

This is my favourite part! By now the system was getting pretty smart and the number of messages going out with mistakes was falling rapidly (I re-analyse every message that’s sent after the review process so I can count word errors), but there was still something missing, and sometimes it seemed a bit woeful compared to the human brain. We’re pretty good at reading typos, and half the time our brain has corrected the word without us even noticing… because we know what word to expect. The computer, however, doesn’t.

Consider the following sentence: “sending info regarding meeting she had witrh you last month”. We can see they clearly meant ‘with’, but the computer has no idea and has to evaluate it without context.

I fed the system a made-up message with words, in context, that it had previously struggled on. The message was “you itnerested in. off hlaf way. some ifno on. refused to leavr number. was looking to spk with accounts. her leter box. meeting he had witrh you last week. llease call regarding”.

You can see in the screenshot below that the primary suggestion in the word2 field was pretty rotten most of the time:

 

[Screenshot: suggestions without context]

 

What if we had a massive database of text? Lucky really: we do.

I wrote a routine to go back through our previous messages and split every sentence into three-word groups, so the sentence “Wanted to follow up on the meeting he had with you last week” would give us:

Word1 Word2 Word3
wanted to follow
to follow up
follow up on
up on the
on the meeting
the meeting he
meeting he had
he had with
had with you
with you last
you last week

 

So there we have it… context. Whizzing through our database of past messages gave me around a million different three-word phrases. Again, I used a count, so for a common phrase such as “please call back” I’d just increment the count if it was already in the database.
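The extraction routine itself just slides a three-word window along each sentence; something like this sketch (my illustration):

Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Function ExtractTriples(ByVal sentence As String) As List(Of String())
    ' Break the sentence into lowercase words
    Dim words As New List(Of String)
    For Each m As Match In Regex.Matches(sentence.ToLower(), "[a-z']+")
        words.Add(m.Value)
    Next
    ' Slide a three-word window along: one (word1, word2, word3) triple per row
    Dim triples As New List(Of String())
    For i As Integer = 0 To words.Count - 3
        triples.Add(New String() {words(i), words(i + 1), words(i + 2)})
    Next
    Return triples
End Function

Each triple is then upserted into the phrase table, incrementing its count if it already exists.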

Then I added another stage to the spell check: finding words in context. If I come across an unknown word, I simply look in my table of word phrases using the surrounding words. For example, if I have ‘please ca regarding’, I search for any row where word1 = please and word3 = regarding. Here are some example results:

Please call regarding

Please email regarding

Please contact regarding

I then load all the returned middle words into my array, giving them a low weighting so they end up near the top of the suggestions.

This context method gives the engine a much better idea of what the word could be. Without context, the suggestion for the ‘please ca’ example would likely be ‘please can’, which obviously makes no sense if the following word is ‘regarding’ but would make a lot of sense if word3 was ‘you’.
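The lookup is a simple query on the phrase table, most common middle word first (a sketch with assumed table and column names):

Imports System.Collections.Generic
Imports System.Data.SqlClient

Public Function GetContextSuggestions(ByVal word1 As String, ByVal word3 As String, _
                                      ByVal conn As SqlConnection) As List(Of String)
    Dim suggestions As New List(Of String)
    Dim sql As String = _
        "select word2 from ThreeWordPhrases " & _
        "where word1 = @w1 and word3 = @w3 order by occurrences desc"
    Using cmd As New SqlCommand(sql, conn)
        cmd.Parameters.AddWithValue("@w1", word1)
        cmd.Parameters.AddWithValue("@w3", word3)
        Using reader As SqlDataReader = cmd.ExecuteReader()
            While reader.Read()
                suggestions.Add(reader.GetString(0)) ' e.g. call, email, contact
            End While
        End Using
    End Using
    Return suggestions
End Function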

This screenshot shows how much better the results are with an idea of context:

 

[Screenshot: suggestions with context]

 

Stage #11 – Always learning

It goes without saying really, but the system continuously learns words and three-word phrases from each new message.

 

Stage #12 – Wrong words

The danger with the system learning is that it could learn wrong words, so I have a blocking process. Once a week I check for any learnt words that are above or near the inclusion thresholds to appear in the results. With a single click I can either delete the word from the tables, or delete it and block it from ever being learnt again by adding it to a BlackListedWords table.

 

Summary

The process we’re now doing is:

  1. Check the EnglishWords table to see if it’s a common word
  2. Check LearntMistakes to see if we’ve seen the mistake before; if so, load the corrections into an array of suggestions
  3. Check ThreeWordPhrases, using context, to see what the word could be
  4. Search LearntWords by Soundex and Double Metaphone for any soundalike words we’ve seen before in a previous message for this client
  5. Search LearntWords by Soundex and Double Metaphone for any soundalike words we’ve seen before in any previous message (higher threshold)
  6. Search BigDictionary by Soundex and Double Metaphone for any soundalike words that are in the dictionary
  7. Score all retrieved suggestions by edit distance, position and a source weighting

 

Conclusion

Has it made any difference? Yes!!

As I mentioned before, I re-analyse every message that’s sent after the review process. To make it fair, I re-analysed the past three months’ worth and did some stats. The number of spelling mistakes and typos was never really very high, as we have a very strict QC policy, but in percentage terms, going on the two weeks the new system has been in place, the number of mistakes sent out to clients has dropped by 85%. It also speeds up the operators: when they spot a mistake, it used to take a while to correct if the suggestions were poor.

All in all, a very worthwhile exercise and a great learning project… I ended up learning linguistics, re-learning probability and reading some ‘challenging’ research papers!

Monitoring Electricity Usage

Simple one, this one…

We wanted to see exactly how much power we were using, and wanted to be able to display this information to staff.

First off, you need a monitoring device. I opted for the CurrentCost Envi with the optional data lead (and two more sensors, as we’re on three phase!)

Next, you download the driver from the CurrentCost site. Then you plug the monitor into your USB port. In theory it’s now pumping data into COM3 at 57600 baud. Ace.

A quick check with HyperTerminal (you have to go hunting for this, it died with XP!) and sure as hell… we have some data coming in. The Envi sends the current readings at 6 second intervals. Cool.

Now, a teeny tiny bit of code in VB.NET gets the data into your app. With .NET 3.5 you get a nice SerialPort control. Drag one of those onto your form, then add this code:

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    ' The Envi presents itself as a plain serial device: 57600 baud, no handshaking
    SerialPort1.PortName = "COM3"
    SerialPort1.BaudRate = 57600
    SerialPort1.Handshake = IO.Ports.Handshake.None
    SerialPort1.Open()
End Sub

Private Sub SerialPort1_DataReceived(ByVal sender As Object, ByVal e As System.IO.Ports.SerialDataReceivedEventArgs) Handles SerialPort1.DataReceived
    ' Fires on a background thread whenever data arrives; each reading is one line of XML
    Dim datain As String = SerialPort1.ReadLine()
    System.Diagnostics.Debug.Print(datain)
End Sub

 

Tada! You now have live electricity readings from within your app, coming in nice XML blobs like this:

<?xml version="1.0" encoding="utf-8" ?>

<msg>
  <src>CC128-v0.12</src>
  <dsb>00001</dsb>
  <time>12:38:19</time>
  <tmpr>18.5</tmpr>
  <sensor>0</sensor>
  <id>00077</id>
  <type>1</type>
  <ch1>
    <watts>02330</watts>
  </ch1>
</msg>

 

A bit of XML jiggery pokery and you have a reasonably accurate data feed of your power readings in your SQL server.
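For what it’s worth, .NET 3.5’s LINQ to XML makes the parsing side trivial in VB; here’s a sketch of one way to do it (not necessarily how mine works):

Imports System.Xml.Linq

Public Function ParseWatts(ByVal datain As String) As Integer
    Dim msg As XElement = XElement.Parse(datain)
    ' Guard against messages without a <ch1> block (the Envi can send other message types)
    If msg.<ch1>.<watts>.Value Is Nothing Then Return -1
    Return CInt(msg.<ch1>.<watts>.Value) ' "02330" -> 2330 for the sample above
End Function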

To see what I did with the data, have a look at the JAM Blog.

Project Totem – A Long Polling server (Part 1)

 

Normal Polling

Let’s start with normal polling. The browser simply runs some JavaScript on a timer that repeatedly checks the server for new data. The problem with this is there’s a trade-off between latency and bandwidth. If you were to use a timer that ran every minute, your server load would be minimal… but there’d also be up to a minute before the user saw the changed data. You could drop it to a very short interval, but then you’d have a LOT of requests hitting your site.

 

 

The server setup is unchanged from a normal web server setup:

[Diagram: the standard web server setup]

Long Polling

If we don’t want the bandwidth/latency trade-off, there is another way: you can use the timeout function of most AJAX libraries (I use jQuery) to perform ‘long polling’. Instead of the conversation between the browser and the server going like this:

“Anything new…………………..? Anything new…………………..? Anything new…………………..? Anything new…………………..?Anything new…………………..? Anything new…………………..?Anything new…………………..? Anything new…………………..? Anything new…………………..?”

It goes more like this:

“Tell me if anything new comes along in the next 20 seconds ………………………………………… ………………………………………………………………….

Nothing? OK, let’s try again…

Tell me if anything new comes along in the next 20 seconds ………………………………………… …………………………………………………………………."

a much more efficient use of bandwidth, but here’s the double bubble bonus: as long as the server returns the new data and closes the connection the moment that data arrives, there’s actually less latency too. It doesn’t matter when the new data arrives, you get it straight away, whereas with standard polling you have to wait until the next poll.

In flowchart form, long polling is super simple:

[Flowchart: the long polling loop]

So we’re all sorted right? Not quite.

The problem with long polling using a regular web server is that it’s not very efficient. You end up with a LOT of open connections, and other than having IIS sit there spinning on each ‘poll’ page waiting for new data to come in, there’s not really a nice notification structure either. Apache is even worse on this front, as it really dislikes connections being held open. Another snag is that you don’t want to query the original hostname for the polling: most browsers only allow you two connections per site, so if you tie one up on the polling there’s only one left to actually fetch data.

So, the answer is a dedicated polling server.

These things exist in the *nix world, most notably CometD, but it’s a lot to learn just to do something simple.

After 10 minutes of pontificating, I decided to do the obvious: make my own! Project Totem was born (because a totem is a ‘long pole’, and also as a nod to my friend Sam who runs Totem Development).

In essence it’s a very simple Windows Sockets application that just pushes Javascript back to the browser. The browser then executes that script and gets the data from the original web server.

The web server generates a GUID that’s sent to each page in the polling JavaScript. The server also tells Totem that it’s served that GUID, and that the page needs to know about changes to data sets A, B and C.

The browser then polls Totem using the GUID, and if there’s nothing new the request will just time out after 20 seconds. It then polls again, and repeats with a 20 second timeout each time. The very millisecond that Totem gets a notification from the web tier that, say, data set A has changed, it returns the appropriate JavaScript back to the browser and shuts the connection. The browser then does whatever you want to go and get the data.
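Not the actual Totem code, but here’s a minimal sketch of the core mechanic using HttpListener: hold each poll request open until either a change notification fires or the 20 second timeout expires. (The real thing tracks GUIDs and per-data-set subscriptions; the event and the refreshDataSetA() script here are stand-ins.)

Imports System.Net
Imports System.Text
Imports System.Threading

Module LongPollSketch
    ' Signalled by the web tier when a data set changes
    Private ReadOnly DataChanged As New AutoResetEvent(False)

    Sub Main()
        Dim listener As New HttpListener()
        listener.Prefixes.Add("http://localhost:8080/poll/")
        listener.Start()
        While True
            ' One worker thread per poll; a production server would use async I/O
            Dim context As HttpListenerContext = listener.GetContext()
            ThreadPool.QueueUserWorkItem(AddressOf HandlePoll, context)
        End While
    End Sub

    Private Sub HandlePoll(ByVal state As Object)
        Dim context As HttpListenerContext = CType(state, HttpListenerContext)
        ' Block until new data arrives, or give up after 20 seconds
        Dim changed As Boolean = DataChanged.WaitOne(TimeSpan.FromSeconds(20), False)
        Dim script As String = If(changed, "refreshDataSetA();", "")
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(script)
        context.Response.ContentType = "text/javascript"
        context.Response.OutputStream.Write(bytes, 0, bytes.Length)
        context.Response.Close() ' the browser executes the script and re-polls
    End Sub
End Module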

I’ll explain more about how I’m tracking keys/scripts and GUIDs etc in part 2 🙂

Making the ‘New Message Alert’ slicker with jQuery AJAX

 

On our web platform, when you get sent a ‘U2U’ message, whether it’s from a colleague or a notification of a phone message, you get a little flashing envelope icon in the toolbar. Clicking on this takes you to your inbox. All of this is pretty much as you’d expect: the count of unread messages is done at page load (well… taken from Memcached anyway) and the button is generated then.

The snag with this is, if you’re expecting a message it can turn you into a bit of a refresh-monkey, reloading the page until you see the unread messages icon.

So we have a page that looks like this:

[Screenshot: the toolbar with the U2U button]

and we need to make that U2U alert a bit more… ‘realtime’. Fortunately it’s pretty easy with a little bit of AJAX magic.

First of all we need to make the button identifiable, so when rendering it in the Page_Load all we do is wrap it in <span class="u2ubutton"></span>.

Next, we simply create an ASPX page that returns the inner HTML for the button, which obviously depends on the number of messages. To do that we simply have an ASPX that goes a little like this:

Response.ClearContent()
Response.CacheControl = "no-cache"
Response.Write(MessageCount & " U2Us")


Obviously it’s a bit more complex than that as you only want to display the envelope icon if there’s an unread message, but I’ll spare you the boring part.

Once that’s done, it’s a matter of polling that page. In time I’ll convert it to “long polling” for higher efficiency and better response times, but for now a simple JavaScript timer will suffice. To poll the button content page and update the content, we can use a single line of jQuery called from a timer:

        <script type="text/javascript">

            $(document).ready(function() {
                 GetNewU2Us()
            });


            function GetNewU2Us() {
                $(".u2ubutton").load("<%=request.applicationpath%>/returnu2ubuttoncontent.aspx")
                window.setTimeout(function() {
                    GetNewU2Us()
                }, 10000);
            }
                                                                   
    </script>


The initial page load calls GetNewU2Us, which updates the element with the class ‘u2ubutton’ with the HTML that’s spat out by our button page. It then starts a timer so the update happens every 10 seconds.

So far so good then, our message alert now works nicely in the background and updates every 10 seconds.

If you’re not currently viewing that page, though, you might not see it. If you’ve used the Twitter web page recently you may have spotted a nice new addition: if more tweets come in, it updates the page title to ‘(3) Twitter / Home’, signifying there are 3 unread tweets. You can see that number in your browser tab, so even if you’re doing something else you can see at a glance that you have new messages.

Replicating this with our solution is a doddle. In our returnu2ubuttoncontent.aspx page we just insert a bit of JavaScript after the button text:

       If MessageCount > 0 Then
            ButtonText += "<script language=""JavaScript"">" & vbCrLf
            ButtonText += "var leftchar = document.title.substring(0, 1)" & vbCrLf
            ButtonText += "if (leftchar == '(')" & vbCrLf
            ButtonText += "{" & vbCrLf
            ButtonText += "var oldtitle = document.title" & vbCrLf
            ButtonText += "var rhb = oldtitle.indexOf("") "")" & vbCrLf
            ButtonText += "oldtitle = oldtitle.substring(rhb + 1)" & vbCrLf
            ButtonText += "document.title = '(" & MessageCount & ") ' + oldtitle" & vbCrLf
            ButtonText += "}" & vbCrLf
            ButtonText += "else" & vbCrLf
            ButtonText += "{" & vbCrLf
            ButtonText += "document.title = '(" & MessageCount & ") ' + document.title" & vbCrLf
            ButtonText += "}" & vbCrLf
            ButtonText += "</script>" & vbCrLf
        Else
            ButtonText += "<script language=""JavaScript"">" & vbCrLf
            ButtonText += "var leftchar = document.title.substring(0, 1)" & vbCrLf
            ButtonText += "if (leftchar == '(')" & vbCrLf
            ButtonText += "{" & vbCrLf
            ButtonText += "var oldtitle = document.title" & vbCrLf
            ButtonText += "var rhb = oldtitle.indexOf("") "")" & vbCrLf
            ButtonText += "oldtitle = oldtitle.substring(rhb + 1)" & vbCrLf
            ButtonText += "document.title = oldtitle" & vbCrLf
            ButtonText += "}" & vbCrLf
            ButtonText += "</script>" & vbCrLf
        End If


Apologies for not just pasting in the resultant JavaScript, but you can see what it does. If there’s an unread message, it inserts the top blob, which strips a ‘(xx)’ from the title if there’s already one and then adds the new count. If the message count is 0, it just strips the ‘(xx)’ if it exists. Not pretty, but it works!

So after that bit of work, we’re left with:

[Screenshot: the toolbar with the live-updating U2U button and title]

Obviously the background polling and the nice magic updating aren’t apparent in a screenshot, but the end result is super slick…!!

Reducing operator stress – Answer Phrase

 

Our call agents have a pretty hard job. They answer around 300 calls per shift, and probably 50% of those calls are for unique clients. It’s not the same as, say, a call centre for Barclays, where you answer the phone the same way every time. Most of the time, our agents have to answer the phone with a different greeting on every call.

95% of our clients have a standard greeting of ‘Good [timeofday] [company], how can I help you?’, so the operators have that down pretty well. The only snag is, since clients can change the greeting themselves, our operators are forced to read the whole answer phrase each time, as they don’t know whether it’s the standard greeting or not.

A few lines of Regex.Replace code and a bit of CSS styling, and we have a nice little mod. Instead of the whole greeting being in bold red text, if it’s the standard “Good morning [companyname], how can I help you?” we lowlight (yes, that’s a word!) the ‘standard’ parts.
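Here’s a rough sketch of the idea; the exact pattern and CSS class names are my guesses:

Imports System.Text.RegularExpressions

Public Function LowlightStandardParts(ByVal answerPhrase As String) As String
    ' Matches e.g. "Good morning Acme Ltd, how can I help you?" and captures the company name
    Dim pattern As String = "^(Good (?:morning|afternoon|evening) )(.+)(, how can I help you\?)$"
    Return Regex.Replace(answerPhrase, pattern, _
        "<span class=""lowlight"">$1</span><span class=""companyname"">$2</span><span class=""lowlight"">$3</span>")
End Function

Anything that doesn’t match the standard pattern is left untouched, so custom greetings still show in full.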

Example:

[Screenshot: the greeting with the standard parts lowlighted]

Look how much easier that is on the eye, and how much quicker your brain can do the mental ‘replace’.

This should hopefully reduce the mental load on the operator in that first half second of the call, allowing them to give a more natural greeting.