BBC News Magazine

Page last updated at 11:54 GMT, Thursday, 4 June 2009 12:54 UK

The problem with junk stats? It's you

Michael Blastland
Different ways of seeing stats

If you want to know why so many statistics are rubbish, try answering the one question below, says Michael Blastland in his regular column.

Q: When answering survey questions, do you tell the truth?

  • Yes
  • No
  • Sometimes
We don't necessarily believe you.
Why? Read the article below to find out.

The old complaint that there are lies, damned lies and statistics blames the bean counter, the supposed stubby-pencilled bureaucrat who gathers the data, or the slick vote-grabber who mangles the results for a shifty cause.

But who started it? The answer - sorry folks - is you. Want to know who introduces half the junk into junk stats? Look in the mirror.

I'm not talking about the survey of our top 10 fears for our kids in a big bad world that then tries to flog us life assurance, or the junk surveys that prove 11 out of 10 cats trust their livers with our new-brewed-seaweed-detox-virus scan.

I mean everything from immigration stats to unemployment, from retail sales surveys and earnings to what the nation tunes in to on the radio and how old you are, from the illegal drugs that teenagers take to how much sex you have.

To know what people experience, want, think, do, prefer and intend, how they behave, how they vote and so on, there's often little alternative to asking them. The numbers depend on you, or people like you, and the truth is that some of you can't be trusted to tell it.

Most of the data we care about is about people. Trouble is, some of those people don't half talk some rubbish.

Take this example from the US Census Bureau, which among other things tracks the number of centenarians. The chart shows how many say they are aged 100 or over (Declared) - and how many the bureau trusts (Preferred).


What happened in 1970 - with 22 times more saying they were 100-plus than the bureau thought likely - is anyone's guess. A decade's worth of LSD, maybe? "Me, I'm 504, man."

In some of these years there were whole families apparently born in the 1800s. "You counting this life, or all of them?"

Some will genuinely struggle with the form and don't deserve bad jokes; it's believed the huge disparity in 1970 arose from just such a misunderstanding. But you do wonder if some hope to make it onto Oprah.

Here's another. When you read scare stories about the latest new teen drug habit, bear in mind that researchers routinely insert an entirely fictitious drug into the questionnaire because they know some teenagers are full of - technical term coming up, here - tripe.

Rubbish in, rubbish out

A handful in every hundred will say they snort tinfoil, if asked. The fictional drug sometimes has a similar number of ticks to others that, once headlined, become a national panic. At least researchers have an approximate benchmark for the drivel.
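For the curious, the benchmark idea can be sketched in a few lines of Python. This is only an illustration, not how any particular survey team does it: the `Response` record, the `DECOY` label and both function names are invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical label for the made-up drug; real surveys choose their own.
DECOY = "fictitious_drug"


@dataclass
class Response:
    respondent_id: int
    # Drugs this respondent claims to have used.
    ticked: set = field(default_factory=set)


def rate(responses, drug):
    """Share of respondents who ticked the given drug (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(drug in r.ticked for r in responses) / len(responses)


def without_decoy_tickers(responses):
    """Drop every respondent who claimed to use the drug that does not exist."""
    return [r for r in responses if DECOY not in r.ticked]
```

Comparing `rate(responses, "some_new_drug")` with `rate(responses, DECOY)` gives a rough sense of how much of a headline figure might be noise; recomputing the rate on `without_decoy_tickers(responses)` is one simple, if blunt, way to filter some of it out.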

Of course, no respectable professional who appreciates the value of accurate information would put a spanner in the data. Think so? Then think of doctors in the health service asked by an e-mail survey how old they were.

Simple? Too simple, thought the docs, as they typed into the date-of-birth field the don't-waste-my-time digits 00/00/00. Wise to this possibility, the system was set up to reject it. So the docs typed 11/11/11 instead, as in "born on 11 November 1911". Hence the discovery of huge numbers of medical staff in the NHS aged over 90. Rubbish in, rubbish out. (Thanks to Professor David Hand for that story.)
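A rough sketch of why the second trick slipped through, and how a slightly warier check might catch it. The story doesn't say how the real system worked, so everything here is assumed: the DD/MM/YY format, the reading of two-digit years as 19xx, and the function names.

```python
from datetime import date


def parse_dob(text, century=1900):
    """Parse DD/MM/YY; return None for impossible dates such as 00/00/00.

    Assumption: two-digit years are read as 19xx, which is how 11/11/11
    turns into 11 November 1911.
    """
    try:
        day, month, yy = (int(part) for part in text.split("/"))
        return date(century + yy, month, day)
    except ValueError:
        return None


def looks_like_placeholder(text):
    """True when every digit is the same, e.g. 11/11/11."""
    digits = text.replace("/", "")
    return digits.isdigit() and len(set(digits)) == 1


def age_on(dob, today):
    """Age in whole years on the given day."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def check_dob(text):
    """Classify an entry as 'invalid', 'suspect' or 'ok'."""
    if parse_dob(text) is None:
        return "invalid"   # the kind of entry a validity check rejects
    if looks_like_placeholder(text):
        return "suspect"   # a real date, but probably a brush-off
    return "ok"
```

A plain validity check stops at `parse_dob`, so 00/00/00 is thrown out while 11/11/11 sails through as a 97-year-old (on the day this column was published). Marking such entries as "suspect" rather than rejecting them leaves room for the rare respondent who really was born on that date.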

Bluntly then, it's your fault too, when stats go wrong, after all's said, done, aggregated and analysed and the numbers still spout rubbish.

The blame starts here. The serious point is that it's easy to treat data as if it simply falls into our laps, and as if only then can people do underhand things with it. As the writer Joel Best points out, it's people who count, and what they mostly count is other people. If you want to know some of the most curious problems with data, start not with the technicalities but by asking: "What am I like?"

Perhaps the surprise is that none of this makes official data worthless. Those trusted with collecting facts weren't born yesterday. They know the wheezes we get up to. They typically allow margins of error, take special care when people self-report their behaviour and try to keep them honest by cross-checks, confidentiality and the like.

They work hard to understand when and how people's answers can be unreliable and often find ways of filtering out some of the rubbish. But it's a question to ask whenever you see data, official or otherwise, the result of the numerous surveys and samples on which our knowledge of our country depends: "Honestly?"
