From the moment people are born, they learn to make associations and to understand words depending on the context of a sentence.
This learning continues throughout life, so teaching machines to make common sense assumptions about language is a mammoth task.
Over the years, researchers have been making inroads into improving voice recognition and speech-to-text software.
But being able to recognise words is still a long way from machines actually understanding what people are saying.
Now the US-based Palo Alto Research Center (PARC) is working on an ambitious project that aims to take computers' language skills to the next level.
PARC's research on natural language processing was bought by the search engine Powerset, which combined it with data from the online encyclopaedia Wikipedia.
Users can search Powerset with keywords, phrases, or simple questions - the results are aggregated from multiple Wikipedia articles. The aim is to provide more accurate results and answer questions directly.
PARC researcher Cleo Condoravdi said search engines are still not refined enough: "You put in your keywords, you get some results, a lot of them might be relevant but it's up to you the user to sort through," she said.
"Now the dream would be if you could just ask your question and get the answer directly," she added.
'Sorry, try again'
Microsoft has now bought Powerset technology for use on its Bing search engine.
Fifteen years ago Ask Jeeves was one of the first search engines to make use of natural language processing.
This approach focuses on developing models of the world around us and how we talk to one another naturally.
Danny Bobrow, a research fellow at PARC, said teaching life-long lessons to machines is a never-ending job.
"People take 15 or 20 years to really get to the place where we think we know everything, which is not true, as we find out as we go further on," he said.
"So that's why it takes so long - you have to encode for the computer all this background knowledge."
But Mr Bobrow believes this kind of intelligent search could become a reality in the next five to 10 years.
"We'll see specialised programs that understand specific things - it could be sports, recipes, or finances.
"What you will probably have is voice recognition and then [the machine] feedback would be able to say 'I didn't understand you, try again'."
Voice recognition, which uses natural language processing, is still far from perfect, but it has developed sufficiently to be integrated into mobile operating systems such as Android.
The obstacles for this system are not only technological but also environmental, as Google research scientist Mike Cohen explained.
"There's background noise, and people may be saying all sorts of things with pauses and hesitations like 'um' and 'erm' - that's one thing that makes it hard," he said.
"People can talk about almost anything. It's enormous vocabs, unpredictable queries, and we need to be able to handle that," he added.
Google is taking voice technology one step further and employing it to automatically transcribe audio speech into text captions on YouTube videos.
The machine-generated subtitles are not always accurate, but they make video content more accessible to people who rely on visual rather than audio communication.
Ken Harrenstien, a software engineer at Google, said the captions benefit everyone.
"On the surface it's built for accessibility. But it's for anyone in the world no matter what language they speak - it makes the information available to anyone whether they're hearing it or not," he said.