By Jane Wakefield
BBC Technology reporter, BBC website
Teaching the speech machine to understand words
Speech recognition seems to be one of those technologies that should be the perfect answer to a whole range of situations but has never made much impact in the mass market.
Voice recognition software has improved over the years, and speech-to-text software on computers and voice-activated music and sat-nav systems in cars are now readily available.
A company based in Buckinghamshire is hoping to take things a step further with its system that converts voice messages to text.
Over the last five years SpinVox has been developing a system which has now been taken up by 12 carriers including Vodafone in Spain, Telstra in Australia and broadband telephony provider Skype.
To date it has converted more than 50 million voice messages to text and is forecasting that the volume will rise to over 300 million a month by the end of 2008.
"It all started out in a moment of frustration," explained co-founder of SpinVox, Daniel Doulton.
"Myself and a colleague had been in a meeting with our phones turned off. We got back in a car and my colleague had nine new voice messages. By the fourth one, trying to scribble down notes, the thought occurred that there must be a better way," he said.
SpinVox converts speech to text
The SpinVox system aims to convert normal speech into a meaningful message - a kind of voice-to-text post-it note.
As well as offering the service for voicemail, it allows bloggers to speak their blog entries and users of social networking sites Facebook, Jaiku or Twitter to update their profile over the phone.
The issue with voice recognition technology in the past has been that there is a kind of duel between the user and the system, said Mr Doulton.
"You have to speak slowly. It works most of the time but it doesn't understand everything you say and you have to train it and people get frustrated with that," he said.
SpinVox has come at the problem from a different angle - rather than wanting the system to pick up the nuance of every word, it is far more interested in general meaning.
Despite that, Mr Doulton said his initial idea was greeted with amusement by those who had worked in the industry for years.
"Most of the experts laughed and said, in effect, it is not possible," he said.
Speech veteran Dr Tony Robinson was one expert who was won over by the system and has recently been recruited as director of SpinVox's Advanced Speech Group.
Dr Robinson had been working at Cambridge University in the field of speech recognition since 1985.
He has worked with the US Defense Advanced Research Projects Agency (DARPA), which funded speech recognition work at the university in the early nineties, and has helped the BBC develop its automatic subtitles.
Cambridge has spawned hundreds of voice specialists, including Phonetic Arts, a company working on converting text into emotional speech for the gaming industry. It has also attracted Toshiba, which has moved its speech research labs to Cambridge.
Cambridge's pre-eminence in speech recognition is partly down to the Hidden Markov Model Toolkit (HTK), which was developed at the university's Machine Intelligence Lab and has become the standard software for building speech recognition systems.
It forms the basis of Blinkx, the video search engine.
Oliveresque has entered popular speech
The SpinVox system - in development for the last five years - uses HTK for its own voice message conversion system.
It also uses other speech recognition systems from providers such as Philips and IBM.
"Some speech engines are good at recognising short bursts of speech while others are better at addresses or number recognition," said Mr Doulton.
SpinVox - available in English, French, Spanish and German - is also constantly learning and asks for human intervention when it comes across a term it doesn't recognise.
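SpinVox has not published how its system does this, so what follows is purely an illustrative sketch of the general idea: run several engines over a segment, keep the most confident result, and hand the segment to a person when no engine is confident enough. Every name here (Hypothesis, convert_segment, the stub engines, the 0.8 threshold and the scores) is a hypothetical stand-in, and plain strings take the place of real audio.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Hypothesis:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by a (hypothetical) engine


# Hypothetical engine type: takes a segment label, returns its best guess.
Engine = Callable[[str], Hypothesis]


def convert_segment(
    segment: str,
    engines: Dict[str, Engine],
    threshold: float = 0.8,  # assumed cut-off, not a SpinVox figure
    ask_human: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the most confident transcription, or escalate to a person."""
    best = max(
        (engine(segment) for engine in engines.values()),
        key=lambda hyp: hyp.confidence,
    )
    if best.confidence >= threshold:
        return best.text
    if ask_human is not None:
        return ask_human(segment)  # the "asks for help" step
    return "[unclear]"


# Stub engines with made-up scores: one stronger on short phrases,
# the other stronger on numbers. Neither recognises the new word.
def short_phrase_engine(segment: str) -> Hypothesis:
    table = {
        "greeting": Hypothesis("hi it's Dan", 0.93),
        "number": Hypothesis("oh seven seven", 0.41),
        "new_word": Hypothesis("sock jen", 0.35),
    }
    return table.get(segment, Hypothesis("", 0.0))


def number_engine(segment: str) -> Hypothesis:
    table = {
        "greeting": Hypothesis("high its den", 0.52),
        "number": Hypothesis("077", 0.95),
        "new_word": Hypothesis("so jen", 0.30),
    }
    return table.get(segment, Hypothesis("", 0.0))


ENGINES: Dict[str, Engine] = {
    "short_phrase": short_phrase_engine,
    "number": number_engine,
}


def human_reviewer(segment: str) -> str:
    """Stand-in for a person supplying the unrecognised term."""
    return {"new_word": "SocGen"}.get(segment, "[unclear]")


if __name__ == "__main__":
    for seg in ("greeting", "number", "new_word"):
        print(seg, "->", convert_segment(seg, ENGINES, ask_human=human_reviewer))
```

In this toy version the human's answer is simply returned. A system that is "constantly learning" would presumably also feed that answer back into its vocabulary, which the sketch does not attempt.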
"It knows what it doesn't know and asks for help," said Mr Doulton.
As a result of developing the system SpinVox has built up an interesting semantic database.
"We know by hour of the day what kind of conversations people are having," said Mr Doulton.
Late morning on Wednesday is its busiest time, with people leaving more messages than on any other day of the week.
By the end of the day on Friday, the information being left in messages is far more social in content as people attempt to make arrangements for meeting up, he said.
SpinVox had a stab at Shakespeare
The SpinVox system suggests that, on average, people are learning between 300 and 400 new words each year as terms such as SocGen, tsunami and Oliveresque (as in Jamie) enter the lexicon via key cultural and environmental events.
"It is a resource we are building and sharing and there is a project there for us to consider for the future," said Mr Doulton.
BBC News Technology has been putting the system through its paces.
It coped pretty well with straightforward messages and even had a stab at converting Hamlet's famous soliloquy.
The message came back as: "To be or not to be. That is the question. Whether it is no blow in the minds to suffer the flings and arrows of outrageous fortune or to take arms against the sea of troubles and my opposing in them to die, to sleep no more and by a fleet to say we end the heartache and the thousand natural shocks that flashes there to".
Which gets the gist, if not the intricacies, of Shakespeare's language.
For Dr Robinson the potential uses of voice recognition are only just beginning to be realised.
"For example, at SpinVox one of the things we are looking at is a system for use in the car that reads out all our texts and also allows any calls you make to be converted into text," he said.