By LJ Rich
Reporter, BBC Click
Dr Tony Robinson explains how voice messages are converted into text
Like many other technologies, speech recognition has improved over the years and those advances have led to it being used in many more places.
Despite this, the voice recognition software found in computers, handsets and satnav systems, among others, is far from perfect.
But the shortcomings have not stopped the technology industry from selling the idea of voice interaction in ever more novel ways.
Voice matching systems, for instance, compare a speaker's words to a database of responses. If the user is asked a yes or no question, only those words will be recognised.
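In code, that kind of matching amounts to little more than a lookup. The sketch below is illustrative only - the response lists are invented, and real systems compare acoustic patterns rather than text - but it shows why anything outside the expected set simply goes unrecognised.

```python
# A minimal sketch of voice matching: the system only compares the
# speaker's (already transcribed) utterance against a small set of
# expected responses. The variant lists are illustrative assumptions.

EXPECTED = {
    "yes": {"yes", "yeah", "yep", "aye"},
    "no": {"no", "nope", "nah"},
}

def match_response(utterance):
    """Return 'yes' or 'no' if the utterance matches; otherwise None."""
    word = utterance.strip().lower()
    for label, variants in EXPECTED.items():
        if word in variants:
            return label
    return None  # anything outside the expected set is not recognised
```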
Full speech recognition is far more complex, because it requires understanding any word said in any sequence.
Voice matching systems compare spoken words to a database
The scale of that challenge perhaps explains why technology still struggles to understand language comprehensively.
What makes the job harder is the expectation that computers should be able to understand people's "most common form of communication", according to Ian Turner from speech recognition firm Nuance.
"I think the biggest challenge is expectation, and as the technology gets better we're getting over that," he said.
"Whether it's on a device, a coffee machine, a desktop or a call centre, a lot of the time people don't even notice they're using it anymore," said Mr Turner, European general manager for the firm.
Software looks sluggish when compared to the brain. He said: "If I slur my words or I miss a word out, your brain fills it in and it does understand context and everything else".
Steps towards closing that gap are being made as computers gain processing power and memory. One day, he believes, machines could make this kind of communication possible.
Ian Turner said high expectations of the technology are a big challenge
Rather than understanding speech as humans do, desktop speech recognition systems employ statistical modelling and context to work out what is being said.
"It will actually figure out the word you say and effectively disambiguate it by the words that came before it," explained Mr Turner.
Computers interpret speech by breaking down words into small chunks of sounds called phonemes - the building blocks which make up English and every other language.
The latest speech recognition programs no longer need users to train them to understand a particular voice or speech pattern. Instead, a corpus of sounds helps the software work out what someone is saying.
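The phoneme idea can be pictured as a pronunciation lexicon: each word is stored as a sequence of sounds, and recognition reduces to looking up the sequence that was heard. The simplified ARPAbet-style transcriptions below are assumptions for illustration; real systems score thousands of candidate sequences probabilistically rather than doing exact lookups.

```python
# A toy pronunciation lexicon mapping phoneme sequences to words.
# Transcriptions are simplified, ARPAbet-style examples.
LEXICON = {
    ("K", "AE", "T"): "cat",
    ("K", "AA", "T"): "cot",
    ("HH", "EH", "L", "OW"): "hello",
}

def decode(phonemes):
    """Map a heard phoneme sequence to a word, if it is in the lexicon."""
    return LEXICON.get(tuple(phonemes))
```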
But that is not the end of the problems that speech presents. Environmental factors, such as background noise, can be detrimental to the tech's accuracy.
Spinvox faces this challenge daily with its service that deciphers voicemail messages and converts them to text.
Desktop speech recognition software now uses "context"
"It could be the lorry that just went past and all these other noises that we have to split out from the speech and recognise just that speech bit," said Dr Tony Robinson from Spinvox.
To date the company has converted millions of voice messages - but it does use people to check unfamiliar words, which are fed back into the database for next time.
And then there is the way that language evolves. New words come in and out of usage or old words can get new meanings.
For example, said Spinvox, one of the latest words added to its database is the recession-inspired term "staycation", meaning a holiday spent at home rather than on foreign shores.
Dr Robinson reiterated the importance of "real world knowledge" in building the resources needed to get machines to understand speech.
But for the moment, a machine that fully comprehends what people are saying remains in the realms of science fiction.