Wednesday, June 30, 2010

Computer Voice Recognition: Not so great

In an article titled Whatever Happened to Voice Recognition? (read it here), Jeff Atwood, a developer I greatly admire, declares the death of voice recognition. The problem seems to be that for ten years or more, the accuracy of word recognition has remained at a miserable 80%. An article by Robert Forstner, The Death of Peas, explains the problem in entertaining detail. The code that chooses words has not changed much since the 1950s, and that code base makes guesses based on probability. Yet humans (not quite including me, I’m afraid) hear words with an accuracy of 98%.
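
To get a feel for the gap between 80% and 98%, here is a rough Python sketch. It is only back-of-the-envelope arithmetic, not how any recognizer works. It assumes every word is recognized independently of its neighbors, which is a simplification, and the function names, the 500-word draft, and the 10-word sentence are made up for illustration.

```python
# Back-of-the-envelope comparison of 80% and 98% word accuracy.
# Assumption: each word is right or wrong independently of the others.
# The function names and sample sizes are hypothetical.

def expected_errors(word_count: int, accuracy: float) -> float:
    """Average number of wrong words in a draft of word_count words."""
    return word_count * (1.0 - accuracy)


def clean_sentence_probability(sentence_length: int, accuracy: float) -> float:
    """Chance that every word in one sentence is recognized correctly."""
    return accuracy ** sentence_length


for label, accuracy in (("software", 0.80), ("human", 0.98)):
    errors = expected_errors(500, accuracy)
    clean = clean_sentence_probability(10, accuracy)
    print(f"{label}: about {errors:.0f} wrong words per 500, "
          f"{clean:.0%} of 10-word sentences error-free")
```

Under those assumptions, a 500-word draft at 80% accuracy carries about 100 wrong words and only about 11% of ten-word sentences come through untouched, while at 98% the same draft has about 10 wrong words and roughly 82% of sentences are clean.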

Despite the low accuracy rate, I’m still interested in voice recognition (VR). I tried the 9th version of Dragon NaturallySpeaking, and I intend to try the 11th version. The reason is that, even with 80% accuracy, VR is good for writers: for dictating a first draft, and for cleaning up a draft.

I abandoned Dragon-9 for a number of reasons that, cumulatively, made it not quite useful enough. One of these reasons was that I have been typing drafts for over forty years, and I’m not used to dictating. But when I dictate my thoughts, the results can be terrific, and I’d like to force myself to try again.

The final straw for me, in Dragon-9, was that sometimes it mistranslated an entire sentence. It would get almost every single word wrong. (This is not so surprising, by the way; a few wrong words suggest a wrong context, and that implies more wrong interpretations.) Dragon-9 was a bit slow on my laptop, and I would be on to the next sentence before the mistranslation appeared on my screen. Later, when I went through my dictation, I would look at these entirely wrong sentences and wonder, what in the world was I trying to say?
