Bayes' Theorem. Bayesian Statistics. You may have heard of them. But if you're not a statistician or mathematician, you probably don't have much idea what they are or why anybody cares. So, for you, here is a summary.
Reverend Thomas Bayes
Reverend Thomas Bayes lived from 1702 to 1761. The son of a nonconformist minister, he was privately educated and went on to become a Presbyterian minister in Tunbridge Wells in 1731, where he remained until he died. After his death his papers, including mathematical works, were found by his friend Richard Price and submitted for publication. His paper An Essay Towards Solving a Problem in the Doctrine of Chances was published in 1763¹ and included his form of what was to become known as Bayes' Theorem.
Statement of the Theorem
Here is the simplest statement of the theorem. First, though, a summary of the terminology:
P(A) - The probability that event A will happen. The event can be anything, such as winning the lottery or finding a parking space. The probability is the chance that it will happen, on a scale from 0 to 1: 0 means that it will never happen, 1 means that it will always happen, and most events fall somewhere in between.
P(A,B) - The probability that both events A and B occur. This is called the 'joint probability'.
P(A|B) - The probability that event A occurs given that we already know event B has occurred. This is called the 'conditional probability'.
Our simple Bayes' Theorem states that:
P(A|B) = P(A,B) / P(B)
In words, it says that the probability of A given B equals the probability of A and B divided by the probability of B.
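As a quick check of the formula, here is a tiny worked example. The weather figures are invented for illustration and are not from the original essay:

```python
# A worked example of P(A|B) = P(A,B) / P(B), with made-up numbers:
# suppose 20% of days are rainy AND windy, and 40% of days are windy.
p_a_and_b = 0.20  # P(A,B): the day is rainy and windy
p_b = 0.40        # P(B): the day is windy

# P(A|B): the chance of rain, given that we can see it is windy.
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 0.5
```

So knowing that it is windy has raised the chance of rain from whatever it was overall to one in two.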
Use of the Theorem
That's all fine and dandy, but why is it important? Well, the theorem relates three probabilities, and if two of them are known the third can be found. This may not seem like much, but it underpins just about every form of prediction or forecasting. It provides a way of calculating the probability of an event that cannot be observed directly.
If event A is defined as 'what we want to predict' and event B as 'what we have just observed', then A|B becomes 'what we want to predict, in light of what we have just observed' and Bayes' Theorem provides a way of updating the probability for A as new observations trickle in. This is why Bayes' Theorem is so powerful.
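This updating reading can be made concrete. The sketch below uses the equivalent form P(A,B) = P(B|A)P(A), which follows directly from the definition of conditional probability above; the rain-and-clouds numbers are invented:

```python
# Updating the probability of A ('it will rain today') after
# observing B ('there are dark clouds'). All numbers are invented.
p_a = 0.30              # prior belief: P(A)
p_b_given_a = 0.90      # P(B|A): clouds are likely if rain is coming
p_b_given_not_a = 0.20  # P(B|not A): clouds without rain are rarer

# Total probability of the observation, summed over both cases:
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' Theorem: P(A|B) = P(A,B) / P(B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.659
```

Seeing the clouds has pushed the probability of rain up from 0.30 to about 0.66, and the same calculation can be repeated as each new observation arrives.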
Traditional statistics typically estimates something by taking a big bag full of data, performing some kind of calculation on it and arriving at an answer. This is perfectly adequate as far as it goes, but it often means that if new data arrive the calculation has to be performed again from scratch. It also means that care must be taken to ensure that the data are 'fair': if there is some bias in them, the answer will be wrong. Even if the statistician knows that the data are biased, the estimate of the bias will often not be independent of the model and so can't be used sensibly within it. And if data are sparse, or there are no data at all, traditional methods don't usually give very good answers.
Step forward Bayesian Statistics. This method starts with an answer (or 'prior') and a measure of certainty about that answer. When new data arrive they are incorporated using Bayes' Theorem to give a new answer (the 'posterior'²): the probability 'given' the new data. With more data, the uncertainty surrounding the prediction diminishes and subsequent predictions improve. Not only does this incorporate new data by design, but it can give reasonable predictions from the very start. Even if the initial answer is wayward, performance quickly improves as data flow in. It is also possible to account for bias, or to incorporate information from outside the system, when updating. This makes the method even more powerful.
Apart from an impressive battery of statistical methods³, Bayesian techniques are also found in neural networks, other machine learning applications (such as radar tracking) and decision analysis.
Some statisticians have problems with Bayesian Statistics. The two main concerns are as follows:
How do we arrive at the starting 'answer'? - It is not always obvious what should be chosen as the starting point. If a woefully inadequate one is picked, the early predictions will be poor. It is also difficult to decide how certain to be about it: if the model is too certain, the predictions won't adapt to new data quickly enough, but if it is not certain enough, the early predictions carry too big a margin of error to be useful.
Bayesian probabilities are subjective - meaning that the probabilities are based on a personal belief about the universe. One person's probability that a sports team will win can vary wildly from another's. The fact that both can be considered 'correct' by Bayesian statistics is enough to put some people off⁴. It is more comfortable to think of an objective probability that is 'true' and definite for everyone. From a practical standpoint, however, probabilities are used to help people make decisions, and those decisions are personal and wrapped up in subjective values anyway. Another way to look at it is to regard the results as the model's beliefs.
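A small invented example suggests why many practitioners are relaxed about both concerns: given enough shared data, analysts who start from very different subjective priors end up close together. The Beta pseudo-count priors below are illustrative assumptions, not anything from the original essay:

```python
# Two analysts with opposite priors about a coin's bias see the
# same (invented) data; the evidence pulls their answers together.
heads, tails = 70, 30  # shared observations

# (pseudo-heads, pseudo-tails) Beta priors:
optimist = (9.0, 1.0)   # prior mean 0.9 - thinks heads is very likely
pessimist = (1.0, 9.0)  # prior mean 0.1 - thinks heads is very unlikely

for name, (a, b) in [('optimist', optimist), ('pessimist', pessimist)]:
    posterior_mean = (a + heads) / (a + b + heads + tails)
    print(name, round(posterior_mean, 3))
# optimist 0.718
# pessimist 0.645
```

Starting 0.8 apart, the two analysts are now within 0.08 of each other, and more data would close the gap further: the data wash out the choice of prior.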
As you can probably tell, this was written by a Bayesian statistician and so is not entirely without bias. Feel free to form your own opinions about it.
You can read the original paper in a variety of formats from this list of important essays.
Unless you're a maths professional, the details about the subject will probably be a little dry⁵. You can find out more information in primers in statistics and academic journals. These aren't the most friendly of sources, but academics have to keep their jobs safe, after all.