As GCSE students pick up their results this week, they may like to spare a thought for the examiners who devoted thousands of hours to marking their answer booklets.
But in future, computers could help them reclaim their summer holidays.
Professor Sargur Srihari's research team at the University at Buffalo, New York, is developing software to fully automate the essay-marking process.
"Trying to analyse children's handwriting is a completely unexplored domain," says Professor Srihari.
Exam scripts are scanned into the computer; the software reads the handwriting, translates it into computer type, and then grades the response as an examiner would, Professor Srihari explains.
He has tested the software using a reading comprehension exam at US Grade 8 level (13- to 14-year-olds). The children were asked to read a passage about American First Ladies and use the information within to answer a question:
"How was Martha Washington's role as First Lady different from that of Eleanor Roosevelt?"
Professor Srihari asked human examiners to grade 300 answer booklets. Half of the graded scripts were then fed into the computer to "teach" it the grading process. The software identified key words and phrases that were repeatedly associated with high grades. If few of these features are present in an exam script, it generally receives a low grade.
Next, the computer was switched from "learning" mode to "grading" mode. Professor Srihari fed the remaining 150 scripts into the computer without the human grades attached. The computer predicted which grade a teacher would give each answer.
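The "learning" and "grading" split described above can be sketched in a few lines of Python. This is only an illustration, not Professor Srihari's software: the function names, the grade threshold of five and the simple word-counting rule are all assumptions made for the example.

```python
import random
from collections import Counter


def split_scripts(scripts, seed=0):
    """Shuffle human-graded scripts and split them into two equal halves."""
    shuffled = scripts[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def learn_keywords(training, high_grade=5, top_n=3):
    """Pick words seen in more high-graded answers than in lower-graded ones.

    `training` is a list of (answer_text, human_grade) pairs.
    The threshold `high_grade` is a hypothetical choice for this sketch.
    """
    high, rest = Counter(), Counter()
    for text, grade in training:
        target = high if grade >= high_grade else rest
        # Count each word once per answer.
        target.update(set(text.lower().split()))
    scores = {word: high[word] - rest[word] for word in high}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, score in ranked if score > 0][:top_n]
```

With 300 graded scripts, `split_scripts` returns 150 to learn from and 150 to hold back for grading, mirroring the experiment.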
The computer was within a grade of the human examiners 70% of the time. The results are published in the journal Artificial Intelligence.
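A figure like "within a grade 70% of the time" can be worked out by comparing each computer grade with the human grade for the same script. The sketch below shows one plausible way to do that calculation; the function name and the sample grades are invented for illustration and are not the study's data.

```python
def within_one_grade(human, predicted):
    """Return the fraction of scripts where the two grades differ by at most one."""
    if not human or len(human) != len(predicted):
        raise ValueError("need two non-empty grade lists of equal length")
    hits = sum(1 for h, p in zip(human, predicted) if abs(h - p) <= 1)
    return hits / len(human)


# Made-up grades on the one-to-six scale, purely to show the arithmetic.
human_grades = [1, 2, 3, 4, 5, 6, 3, 2, 4, 5]
computer_grades = [1, 3, 5, 4, 6, 3, 3, 2, 1, 5]
agreement = within_one_grade(human_grades, computer_grades)
```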
To see the software in action, the staff on the BBC Online technology desk tried their hand at the Grade 8 exam.
While the computer graded the responses, Professor Srihari explained some of the problems involved in developing the software.
"Analysing handwriting is difficult," he says. "But once you have computerised texts you can do many things."
"Today, over 90% of handwritten addresses on envelopes are read by computer, which seemed unbelievable 15 years ago."
The key, Professor Srihari explains, is to only expose the computer to legible handwriting. This is one advantage of using handwriting recognition software on children.
"Children write better," he says. "They don't have the bad habits you see in an adult scrawl."
The computer can easily break down children's handwriting into individual letters and interpret the words, Professor Srihari says. To refine the process, the computer also carries a database of handwriting samples that it can use to compare the general form of handwritten words.
Working with children's handwriting carries additional advantages. They have a comparatively small vocabulary, says Professor Srihari, and so the computer needs a smaller "dictionary" of words to translate their handwriting.
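The benefit of a small "dictionary" can be illustrated with typed text standing in for handwriting. The sketch below assumes the recogniser has already produced a rough guess at each word and simply snaps that guess to the closest dictionary entry using Python's standard `difflib` module. The word list and the 0.6 similarity cutoff are assumptions; the real system works on handwriting images, not strings.

```python
from difflib import get_close_matches

# A tiny hypothetical dictionary for the First Ladies passage.
LEXICON = ["washington", "roosevelt", "hostess", "rights", "lady"]


def snap_to_lexicon(guess, lexicon=LEXICON, cutoff=0.6):
    """Return the closest dictionary word, or None if nothing is similar enough."""
    matches = get_close_matches(guess.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A misread such as "rooscvelt" is pulled back to "roosevelt", while a word with no near neighbour in the dictionary comes back as `None`.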
Professor Srihari's software can correctly identify about 60% of the words in any child's handwriting. But that is only half the problem. Next, the computer must grade the child's answer using its artificial neural network.
"The artificial neural network learns from the human scored answers to identify the important features of the text," says Professor Srihari.
"Some of the features are content dependent - key phrases or words in the answer - and some are content independent - the length of the sentences and the total answer length, for example."
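The two kinds of feature Professor Srihari describes can be sketched as follows. This is a simplified illustration rather than the team's code: the keyword list is hypothetical, only single-word keywords are handled, and the output is just the set of numbers that a learning algorithm such as a neural network might take as input.

```python
import re


def extract_features(answer, keywords):
    """Build content-dependent and content-independent features for one answer."""
    sentences = [s for s in re.split(r"[.!?]+", answer) if s.strip()]
    words = re.findall(r"[a-z']+", answer.lower())

    # Content dependent: is each (single-word) keyword present?
    features = {f"has_{k}": int(k in words) for k in keywords}

    # Content independent: overall length and average sentence length.
    features["total_words"] = len(words)
    features["mean_sentence_length"] = (
        len(words) / len(sentences) if sentences else 0.0
    )
    return features
```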
Dr Mary McGee Wood at the University of Manchester is also studying the role of computers in exam marking.
"It's interesting stuff," she says. "But they've been very clever to limit this to a specific domain - reading comprehension."
Because the children have to answer a question using information in a specific passage of text, the software programmers can predict what keywords will appear in the text, which makes the computer's task relatively simple, she explains.
"There is a buzz word for this - 'expectation driven'. If there's a single model answer, it gives you a great deal to go on," Dr Wood says.
Professor Srihari accepts that the software cannot yet tackle an answer to a more ambiguous exam question.
"It's a much bigger problem to read arbitrary handwriting that you know nothing about," he says.
Professor Stephen Pulman at the University of Oxford has identified another potential pitfall in Professor Srihari's approach.
"You can't just look for keywords, because the student might have the right keywords in the wrong configuration, or they might use keyword equivalents," Professor Pulman says.
For instance, if a student is asked about World War 2, the keywords might include Churchill and Hitler. But it is not enough for the computer to identify those words - it would need to recognise the error in an essay that stated Churchill was the German leader and Hitler led the British.
It would also need to learn that "Prime Minister" or "The British Bulldog" are equivalent terms for "Churchill".
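Professor Pulman's two objections can be demonstrated with a toy keyword counter. In the sketch below, which is an illustration and not any research group's actual software, a table of aliases handles equivalent terms, but the score still cannot tell a correct sentence from one with the leaders swapped. The alias table is an assumption built from the article's own example; in real essays "Prime Minister" would not always mean Churchill.

```python
import re

# Hypothetical alias table based on the example in the text.
ALIASES = {
    "the british bulldog": "churchill",
    "prime minister": "churchill",
}


def normalise(text):
    """Lower-case the text and replace known aliases with the canonical keyword."""
    text = text.lower()
    for alias, canonical in ALIASES.items():
        text = text.replace(alias, canonical)
    return text


def keyword_score(text, keywords):
    """Count how many keywords appear, ignoring word order entirely."""
    words = set(re.findall(r"[a-z]+", normalise(text)))
    return sum(1 for k in keywords if k in words)


KEYWORDS = ["churchill", "hitler"]
right = "Churchill led the British and Hitler was the German leader"
wrong = "Churchill was the German leader and Hitler led the British"
```

Both `right` and `wrong` receive the same score, which is exactly the weakness being described: spotting the error needs some model of who did what, not just which words are present.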
But Professor Pulman thinks software of this kind could have an application in the future. In 2003, his team developed computer software to mark typed responses in a GCSE biology exam.
"We tried to persuade the exam boards to take up the technology, but people are resistant to having a computer determine how well they did in an exam," he says.
A spokesman at the Oxford, Cambridge and RSA (OCR) exam board agrees. "We're reluctant to take human markers out of the process. But that's not to say the technology won't have reached a useable stage in the future."
But Professor Srihari thinks that these arguments underestimate the benefits his technology could have for teachers today.
"One of the most mundane things teachers have to do is grade pieces of writing," he says. "In large volumes, marking isn't enjoyable."
Alison Smith, head of the English Department at Millom School, Cumbria, is willing to concede the point, but she stresses that marking is an important aspect of teaching nonetheless.
"Most examiners who are current teachers do it partly because it gives them information that they can use with their own classes," Miss Smith says.
"I've been an examiner for five years because it makes me a better teacher."
Even if the technology improves, it is unlikely that there will be a clamour to use it for nationwide exams, Miss Smith thinks.
"If your chances of going to university rested on grades, you'd want a human examiner."
Professor Srihari is still at an early stage of his research, and he hopes there will be improvements in software accuracy in the future. Until that happens, GCSE examiners must continue to sacrifice their summer holidays to hours of marking.
Three BBC journalists provided written answers to the question "How was Martha Washington's role as First Lady different from that of Eleanor Roosevelt?" after reading the accompanying text.
Our handwritten answers were sent to Professor Srihari and the computer marked the responses.
The grades covered the entire range, from the highest, grade six, to the lowest, grade one.
A grade one answer is "brief, repetitive, and shows minimal understanding of the text" according to the computer's marking scheme.
So have standards fallen at the BBC?
"Some of the words in the answers weren't recognised," explains Professor Srihari. "Perhaps because they were not in the computer's dictionary."