What is a probability? For those of us who are not mathematicians or philosophers, who learned about it in middle school and then forgot most of it, it simply means ‘the degree to which something is likely to happen’. We studied the axioms and how to calculate probabilities, and we applied several distributions to problems (read A Short History of Probability to learn more). But did you know that the seemingly simple subject of probability hides several deep and unanswered philosophical questions?
Mathematicians and philosophers have been trying to answer them for a long time, and their attempts have led to multiple interpretations of what it means to say that something is likely to occur.
For example, consider our novice cook Anna, who serves us a home-cooked brunch every Sunday. For dessert, Anna bakes either cupcakes or brownies, but never both. The two desserts are therefore mutually exclusive (she never bakes both) and collectively exhaustive (she always bakes one of them).
The first interpretation, classical probability, was advocated by Pierre-Simon Laplace, although it can also be found in the works of other mathematicians. To find the probability of an event, we count the outcomes in which the event occurs and divide that number by the total number of possible outcomes. Note that this only works if every outcome is equally likely.
Every Sunday, Anna bakes either cupcakes or brownies, but never both.
N is the total number of outcomes = 2
P(Me Having Brownies) = (Number of outcomes in which Anna bakes brownies)/(Total number of desserts she can bake)
= 1/2 = 0.5
P(Me Having Cupcakes) = 0.5 (Equally likely)
After a while, Anna becomes determined to improve her baking skills and learns to bake a chocolate cake.
So now I assign a probability of 1/3 (about 0.33) to my having each of the three desserts.
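The classical rule is simple enough to write down directly. Below is a minimal Python sketch. The function name `classical_probability` is hypothetical, and it assumes, as the classical interpretation requires, that every listed outcome is equally likely.

```python
from fractions import Fraction

def classical_probability(favorable, outcomes):
    """Classical probability: favorable outcomes / total possible outcomes.

    Assumes every element of `outcomes` is distinct and equally likely.
    """
    outcomes = set(outcomes)
    favorable = set(favorable) & outcomes  # only count outcomes that can actually occur
    return Fraction(len(favorable), len(outcomes))

two_desserts = ["brownies", "cupcakes"]
three_desserts = ["brownies", "cupcakes", "chocolate cake"]

print(classical_probability({"brownies"}, two_desserts))           # 1/2
print(classical_probability({"brownies"}, three_desserts))         # 1/3
print(float(classical_probability({"brownies"}, three_desserts)))  # 0.3333333333333333
```

The sketch needs a finite list of outcomes. With infinitely many desserts there would be no total to divide by.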
This approach is simple and useful when we do not need a very precise probability and the outcomes are limited to a finite set. But suppose that, over time, Anna learns to bake an infinite number of desserts in many flavors. Do we still assign them equal probabilities? With infinitely many outcomes, there is no finite total to divide by.
And how can we justify equal probabilities once we consider the physics? When we toss a coin, we assign a 50% chance to heads and 50% to tails. But don’t its physical aspects, such as the air movement, the weight of the coin, and even the surface on which it lands, play a part in the outcome?
The next interpretation is logical probability. Consider a hypothesis H.
According to logical probability, the probability of H is the degree to which it is supported by the evidence E, denoted c(H, E). For H to count as probable, E must support it to a high degree.
The emphasis on the strength of the evidence matters because logical probability is meant to be free from subjectivity: c(H, E) is supposed to depend only on the logical relationship between H and E. Also, unlike classical probability, logical probability lets us assign different probabilities to different outcomes of the same event.
From what I know, Anna had a very busy week. Anna also finds it easier to bake brownies than cupcakes after a busy week: whenever she has had a busy week, she bakes brownies 60% of the time. One more piece of information: I like brownies more than cupcakes.
My hypothesis H is: ‘I will have brownies’.
My evidence E is: ‘Anna had a very busy week, and she finds it easier to bake brownies than cupcakes. Whenever Anna has a busy week, she bakes brownies 60% of the time.’
Since E supports H to a high degree, H is probably true.
Therefore, after considering the facts, I feel that the probability of my having brownies is over 70%, according to logical probability.
I go over to Anna’s house and find that she made cupcakes.
How did logical probability fail me?
To find out, I need to examine the strength of my evidence E. Was I truly objective, as I was supposed to be? Did I have enough data to back up the ‘whenever she has had a busy week, she bakes brownies 60% of the time’ part of the evidence? I also linked ‘having a very busy week’ with ‘she finds brownies easier to bake’ and treated the evidence as stronger than it was. This added subjectivity to evidence that was already incomplete. Finally, the fact that I like brownies more than cupcakes added even more subjectivity.
Therefore, logical probability has two weaknesses. First, we might not always have complete evidence. Second, the way we interpret the evidence can add subjectivity to it.
Subjective Probability or Bayesian Probability
In the previous section on logical probability, we saw how subjectivity crept into our supposedly objective evidence. Subjectivists, or Bayesians, embrace this subjectivity. They (see our article on Bayesian Reasoning for a detailed explanation) regard probability as a ‘degree of belief’: the extent to which the person assigning the probability believes that the event will occur.
However, Bayesians don’t rely on their degree of belief alone. They start with a prior probability, based on past evidence, and combine it with the new evidence or data they have to compute a conditional probability. The result is called the posterior.
The process does not stop there. When more evidence arrives, we use the posterior as the new prior and compute a new posterior. This updating is repeated again and again.
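For reference, the update described above is Bayes' theorem. Here it is written for two mutually exclusive hypotheses, H and ¬H (for example, ‘brownies’ and ‘cupcakes’), and evidence E. The denominator is expanded using the law of total probability:

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}
```

P(H) is the prior, P(E | H) and P(E | ¬H) are the likelihoods, and P(H | E) is the posterior. To update again, plug the posterior back in as the new P(H).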
Now let us come back to our brunch. This time we get some information from the baker herself, so that I can work out the probability of getting my favorite dessert.
Anna says that, out of all the desserts she has baked, about 70% were brownies and 30% were cupcakes. She also finds brownies easier to bake after a busy week. About 80% of the Sundays on which she baked brownies followed a busy week, compared with about 20% of the Sundays on which she baked cupcakes.
Suppose Anna has had a busy week again. With this information in hand, how do we find the probability of my having brownies this Sunday?
P(Baking Brownies|Busy Week) = ?
P(Baking Brownies) = 0.7 (prior)
P(Baking Cupcakes) = 0.3
P(Busy Week|Baking Brownies) = 0.8 (likelihood)
P(Busy Week|Baking Cupcakes) = 0.2 (likelihood)
P(Baking Brownies|Busy Week) = (0.8 * 0.7) / ((0.8 * 0.7) + (0.2 * 0.3))
= 0.56 / 0.62 ≈ 0.90 (posterior)
Therefore, learning that Anna had a busy week raises the probability that she baked brownies from the 70% prior to about 90%.
Next week, before we go to Anna’s, we do the same calculation, but we use about 90% as the prior instead of 70%.
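As a check on the arithmetic, here is a minimal Python sketch of a single Bayesian update for a yes/no hypothesis. It uses the numbers from the calculation above: prior 0.7, likelihoods 0.8 and 0.2. The helper name `bayes_update` is hypothetical. The second update assumes, purely for illustration, that Anna has another busy week, so the same likelihoods apply again.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(H | E) for a binary hypothesis H via Bayes' theorem.

    prior               -- P(H)
    likelihood_if_true  -- P(E | H)
    likelihood_if_false -- P(E | not H)
    """
    numerator = likelihood_if_true * prior
    # P(E) by the law of total probability over H and not-H
    evidence = numerator + likelihood_if_false * (1 - prior)
    return numerator / evidence

# H = "Anna bakes brownies", E = "Anna had a busy week"
p_busy_given_brownies = 0.8
p_busy_given_cupcakes = 0.2

posterior = bayes_update(0.7, p_busy_given_brownies, p_busy_given_cupcakes)
print(round(posterior, 3))  # 0.903

# Hypothetical second week: assume Anna is busy again, so this week's
# posterior becomes next week's prior and the same evidence is applied once more.
posterior_2 = bayes_update(posterior, p_busy_given_brownies, p_busy_given_cupcakes)
print(round(posterior_2, 3))  # 0.974
```

With the 70% prior, the posterior is 0.56 / 0.62 ≈ 0.903. Feeding it back in as the prior pushes it to about 0.974, which shows the repeated updating in action.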
Each time we repeat the update, the probability shifts somewhat as the old posterior becomes the new prior.
This brings us to frequentism. The frequency of an event is the number of times the event occurs. To find its relative frequency, we repeat the experiment a large number of times and divide the number of favorable outcomes by the total number of trials. Frequentists hold that the probability of an event is this relative frequency.
Now we will get even nerdier about our Sunday brunch. This time, for 4 months, we count the number of times Anna bakes brownies and the number of times she bakes cupcakes.
N is the total number of trials (Sundays) = 16
Number of times Anna baked brownies = 9
Number of times Anna baked cupcakes = 7
P(Me Having Brownies) = 9/16 ≈ 56.3%
P(Me Having Cupcakes) = 7/16 ≈ 43.8%
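The frequentist calculation is just counting. This is a minimal Python sketch over a hypothetical log of the 16 Sundays. Only the totals (9 brownies, 7 cupcakes) come from the example. The list simply groups them, since the real order is unknown and does not affect the result.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical log of 16 Sundays. Only the totals (9 brownies, 7 cupcakes)
# come from the example; the real order of the Sundays is unknown.
sundays = ["brownies"] * 9 + ["cupcakes"] * 7

counts = Counter(sundays)
n = len(sundays)  # total number of trials

# Relative frequency = number of times an outcome occurred / number of trials
rel_freqs = {dessert: Fraction(count, n) for dessert, count in counts.items()}

for dessert, freq in rel_freqs.items():
    print(f"{dessert}: {freq} = {float(freq):.2%}")
# brownies: 9/16 = 56.25%
# cupcakes: 7/16 = 43.75%
```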
Even though frequentism and classical probability look structurally similar, it is important to note what each one counts. Classical probability counts all the possible outcomes. Frequentism carries out the experiment many times and counts the outcomes that actually occur.
We now find that the probability of my having brownies is higher than the probability of my having cupcakes, which matches the proportion of brownies to cupcakes that Anna actually baked. But this number is only an estimate. If we measure the relative frequency over the next 4 months, will it be the same? Probably not exactly; it will vary somewhat.
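To see why the estimate moves from one 4-month window to the next, here is a small Python simulation. It rests on two hypothetical assumptions: Anna's underlying chance of baking brownies on any Sunday is fixed at 0.56, and Sundays are independent. Each window records 16 Sundays. The helper name `relative_frequency_of_brownies` is also hypothetical.

```python
import random

def relative_frequency_of_brownies(rng, p_brownies, sundays):
    """Simulate `sundays` independent Sundays and return the share that were brownies."""
    brownies = sum(rng.random() < p_brownies for _ in range(sundays))
    return brownies / sundays

rng = random.Random(42)   # fixed seed so the run is reproducible
P_BROWNIES = 0.56         # hypothetical underlying probability, not measured data
SUNDAYS_PER_WINDOW = 16   # roughly 4 months of Sundays

windows = [relative_frequency_of_brownies(rng, P_BROWNIES, SUNDAYS_PER_WINDOW)
           for _ in range(5)]
for i, freq in enumerate(windows, start=1):
    print(f"window {i}: {freq:.4f}")
```

Each window gives a multiple of 1/16 scattered around 0.56, so two 4-month counts will rarely agree exactly.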
In our case, we can repeat the trials to find the relative frequency. But what about events that cannot really be repeated? For example, what is the probability of Malala winning the Nobel Peace Prize again? It happened once and is not likely to happen again. Such events are called single-case events. In a way, frequentism treats each set of trials as a single case and compares it with the others.
In principle, the only way to overcome these issues and get the exact probability is to repeat the trials infinitely many times and take the limiting relative frequency. How could we ever do that in reality?
Also, as you might have noticed, the trials do not capture the surrounding facts that affect the probability. We can find the probability that Anna had a busy week and baked brownies. We can also find the probability that Anna baked brownies. But how do we combine the two?
Finally, the propensity interpretation relates the probability of an event to the natural tendency (propensity) of a situation to produce it. In a way, propensities try to explain why an event occurs the way it does. We repeat the trials to observe those outcomes, and the more trials we run, the better our estimate of the probability. The propensity interpretation thus links the probability of an event occurring once to how often it occurs over a large number of trials.
For example, what makes Anna bake brownies? Anna says she bakes brownies when she has had a very busy week, more than 10 guests are coming to brunch, and her significant other is visiting. By observing the brunches more closely, we try to figure out when these conditions are met.
Therefore, the probability of Anna baking brownies on a single occasion when these conditions hold (the single case) should be the same as the probability we would measure by observing her bake under these conditions 100 times.
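According to the law of large numbers, when we repeat an experiment a large number of times, the average of the results converges to the expected value. In particular, the relative frequency of an outcome gets closer and closer to its probability. The propensity interpretation uses this law to connect single-case probabilities with observed frequencies.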
For example, based on the properties of the die and the environment in which we throw it, we assign a probability of 1/6 to getting a 3 on a single throw (the single case). Then we throw the die 1,000 times. At the end of the experiment, we divide the number of 3s by 1,000 to get the relative frequency. This should be close to the 1/6 we assigned to each throw, which is the law of large numbers at work.
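The die experiment is easy to simulate. This minimal Python sketch assumes a fair die, with each face having probability 1/6, and uses Python's `random` module as a stand-in for physical throws. The helper name `relative_frequency` is hypothetical. As the number of throws grows, the relative frequency of 3 should settle near 1/6, as the law of large numbers predicts.

```python
import random

def relative_frequency(rng, face, throws):
    """Throw a fair six-sided die `throws` times; return the share of throws showing `face`."""
    hits = sum(rng.randint(1, 6) == face for _ in range(throws))
    return hits / throws

rng = random.Random(0)  # fixed seed for reproducibility
for throws in (10, 100, 1_000, 100_000):
    freq = relative_frequency(rng, 3, throws)
    print(f"{throws:>7} throws: freq(3) = {freq:.4f}, |freq - 1/6| = {abs(freq - 1/6):.4f}")
```

Small runs can land well away from 1/6. The typical gap shrinks roughly in proportion to 1/√n as the number of throws n grows.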
The long-run relative frequency is evidence of a constant underlying single-case propensity. However, the problem with single-case propensity is that it is, after all, an abstraction, and not all of its aspects can be tested repeatedly under exactly the same conditions.
Having covered all these interpretations, it is clear that there is plenty of room for further research in this area, since each method has its own caveats. Each interpretation has its own unique uses, but there is not yet a unified interpretation that covers all dimensions of probability.
As we resign ourselves to this fact, I quote Jacob Bernoulli, who said, “It is utterly impossible that a mathematical formula should make the future known to us, and those who think it can once believed in witchcraft”.