## The Bayesian and the Frequentist – I

I think I will write a series of postings about Bayesian statistics and the view that I take on it as a frequentist. Do not expect too much depth here; I am not a specialist in statistics. I just want to study the differences in thinking, maybe with the help of some Monte Carlo simulations in Euler Math Toolbox (EMT). Of course, as I said before in this blog, once you simulate statistics you take the frequentist approach. I also said that anything that cannot be simulated is not well understood. Let us put that to the test of real-life problems.

For a start, a typical application of Bayesian thinking is the following: Assume you have two bowls A and B with red and gray balls in them.

The numbers of the gray balls are GA and GB respectively, and likewise the numbers of red balls are RA and RB. So in A we have GA+RA balls, and in B we have GB+RB balls. In the image above, we have RA=10, GA=11, RB=5, GB=30.

Someone draws a ball from one of the bowls and it is red. What can you infer about the bowl the ball was drawn from? From the image, we would say that the red ball is more likely to come from bowl A. But how do we quantify that? And what does "more likely" mean, if anything?

You can learn a lot from your mistakes, false tries, complete failures or illegal arguments. So go ahead and try to say something substantial about the bowl once you know that the drawn ball is red!

The probabilistic analysis goes like this: Our elementary events are the individual balls, each with its own probability of being drawn. We first select a bowl with probabilities pA and pB, and then a random ball from the selected bowl. The probability of drawing a specific ball in bowl A is then the probability of drawing from A divided by the number of balls in A; likewise for any ball in B.

E.g., the probability to draw a red ball turns out to be

\(p_R = p_A \dfrac{R_A}{R_A+G_A} + p_B \dfrac{R_B}{R_B+G_B}. \tag{1}\)

A similar formula holds for the probability to draw a gray ball, and you can check that both probabilities will add to 1. For a specific example, we assume that we draw from A and B with the same probability. Then

\(p_R = \dfrac{1}{2} \dfrac{10}{21} + \dfrac{1}{2} \dfrac{1}{7} = \dfrac{13}{42} \approx 0.3095\)
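The arithmetic can be checked with exact fractions. The following is only a minimal sketch in Python (not EMT, so that it runs on its own); the function names `prob_red` and `prob_gray` are made up for this illustration.

```python
from fractions import Fraction

def prob_red(p_a, r_a, g_a, r_b, g_b):
    """Probability of drawing a red ball, following formula (1)."""
    p_b = 1 - p_a
    return p_a * Fraction(r_a, r_a + g_a) + p_b * Fraction(r_b, r_b + g_b)

def prob_gray(p_a, r_a, g_a, r_b, g_b):
    """The analogous formula for a gray ball."""
    p_b = 1 - p_a
    return p_a * Fraction(g_a, r_a + g_a) + p_b * Fraction(g_b, r_b + g_b)

# Numbers from the example: RA=10, GA=11, RB=5, GB=30, both bowls equally likely.
p_red = prob_red(Fraction(1, 2), 10, 11, 5, 30)
p_gray = prob_gray(Fraction(1, 2), 10, 11, 5, 30)

print(p_red, float(p_red))   # 13/42, about 0.3095
print(p_red + p_gray)        # the two probabilities add to 1
```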

We now express this in the Bayesian way, which is just a shortcut for the same thing.

\(P(\text{Red}) = P(\text{Red}|\text{A}) P(\text{A}) + P(\text{Red}|\text{B}) P(\text{B}). \tag{2}\)

You read P(Red|A) as the probability of a ball being red under the condition that it has been drawn from bowl A.

For a sharp definition, we need the set of all red and the set of all gray balls (denoted by Red and Gray), and the sets of all balls in bowl A and in bowl B (denoted by A and B). We can easily compute the probability of a ball being drawn from any of these sets, or from the intersection or union of any of these sets, by adding the probabilities of all balls in the set. If you think about it for a while, the proper definition of P(Red|A) must clearly be

\(P(\text{Red}|\text{A}) = \dfrac{P(\text{Red} \cap \text{A})}{P(\text{A})}\)

using the intersection of the set of red balls and the set of balls in A. This is the expected portion of red balls among the balls drawn from bowl A. If we multiply the probability of any single ball in A by the number of red balls in A we get

\(P(\text{Red} \cap \text{A}) = R_A \dfrac{p_A}{R_A+G_A}. \)

With that, the Bayesian expression (2) becomes the same as (1) as you will easily verify. But we are not yet sure which one is easier to understand, or easier to handle.
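To make the "add the probabilities of the balls" view concrete, here is a small Python sketch (again not EMT) that lists every ball with its individual probability and computes the set probabilities by summation. The list `balls` and the helper `prob` are names invented for this illustration.

```python
from fractions import Fraction

RA, GA, RB, GB = 10, 11, 5, 30
pA = pB = Fraction(1, 2)

# One entry per ball: (bowl, colour, probability of drawing exactly this ball).
balls = (
    [("A", "red", pA / (RA + GA))] * RA
    + [("A", "gray", pA / (RA + GA))] * GA
    + [("B", "red", pB / (RB + GB))] * RB
    + [("B", "gray", pB / (RB + GB))] * GB
)

def prob(event):
    """Add the probabilities of all balls for which event(bowl, colour) is true."""
    return sum((p for bowl, colour, p in balls if event(bowl, colour)), Fraction(0))

p_A = prob(lambda b, c: b == "A")
p_B = prob(lambda b, c: b == "B")
p_red_and_A = prob(lambda b, c: b == "A" and c == "red")
p_red_and_B = prob(lambda b, c: b == "B" and c == "red")

# Conditional probabilities by definition.
p_red_given_A = p_red_and_A / p_A
p_red_given_B = p_red_and_B / p_B

p_red_direct = prob(lambda b, c: c == "red")                 # sum over red balls
p_red_bayes = p_red_given_A * p_A + p_red_given_B * p_B      # expression (2)

print(p_red_and_A, p_red_given_A)   # 5/21 and 10/21
print(p_red_direct, p_red_bayes)    # both 13/42
```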

There is another way to understand what is going on besides adding the probabilities of the balls. Since the red ball is either from A or from B we have, using the definition of the probability under a condition,

\(P(\text{Red}) = P(\text{Red} \cap \text{A}) + P(\text{Red} \cap \text{B}) = P(\text{Red}|\text{A})P(\text{A}) + P(\text{Red}|\text{B})P(\text{B}).\)

Our goal was to compute the probability that the ball is from A under the condition that it is red. In Bayesian speech that is

\(P(\text{A}|\text{Red}) = \dfrac{P(\text{A} \cap \text{Red})}{P(\text{Red})}.\)

We already computed everything in this formula.

\(P(\text{A}|\text{Red}) = \dfrac{5/21}{13/42} = \dfrac{10}{13} \approx 0.7692\)

But the special charm of the Bayesian way of saying things lies in the following formula, which follows easily from the definition:

\(P(\text{Red}|\text{A}) P(\text{A}) = P(\text{Red} \cap \text{A}) = P(\text{A}|\text{Red}) P(\text{Red}).\)

So we get

\(P(\text{A}|\text{Red}) = P(\text{Red}|\text{A}) \dfrac{P(\text{A})}{P(\text{Red})}. \tag{3}\)
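As a quick check of the inversion (3) with the numbers of our example, here is a short Python sketch (not EMT). The helper name `bayes_invert` is hypothetical and only serves this illustration.

```python
from fractions import Fraction

def bayes_invert(likelihood, prior, evidence):
    """Formula (3): P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

p_a = p_b = Fraction(1, 2)
p_red_given_a = Fraction(10, 21)
p_red_given_b = Fraction(5, 35)

# Formula (2) gives the probability of a red ball.
p_red = p_red_given_a * p_a + p_red_given_b * p_b

p_a_given_red = bayes_invert(p_red_given_a, p_a, p_red)
p_b_given_red = bayes_invert(p_red_given_b, p_b, p_red)

print(p_a_given_red, float(p_a_given_red))   # 10/13, about 0.7692
print(p_a_given_red + p_b_given_red)         # the red ball is from A or from B
```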

This allows us to invert probabilities and conditions. Note that P(A) appears in (2) and (3). So it is clear that it is important to know the probabilities for the selection from bowl A and B. The result will strongly depend on these probabilities.

The extreme cases are P(A)=0 and P(A)=1, i.e., we never or always draw from bowl A. Clearly, this gives precise information on the probability that the ball is from A under the condition that it is red: it is 0 in the first case and 1 in the second!

The following EMT commands plot P(A|Red) as a function of P(A):

    >RA=10, GA=11, RB=5, GB=30
     10
     11
     5
     30
    >pA=1/2, pB=1/2
     0.5
     0.5
    >function pAR(PA) := (PA*RA/(RA+GA))/(PA*RA/(RA+GA)+(1-PA)*RB/(RB+GB));
    >plot2d("pAR",0,1,xl="P(A)",yl="P(A|Red)"):
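For readers without EMT, the same dependence on P(A) can be tabulated instead of plotted. This is a minimal Python sketch; the function name `p_a_given_red` and the chosen sample values of P(A) are assumptions made for this illustration.

```python
def p_a_given_red(p_a, r_a=10, g_a=11, r_b=5, g_b=30):
    """P(A|Red) as a function of the prior P(A), same expression as pAR in EMT."""
    red_a = p_a * r_a / (r_a + g_a)         # P(Red and A)
    red_b = (1 - p_a) * r_b / (r_b + g_b)   # P(Red and B)
    return red_a / (red_a + red_b)

priors = [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]
posteriors = [p_a_given_red(p) for p in priors]

for p, q in zip(priors, posteriors):
    print(f"P(A) = {p:4.2f}  ->  P(A|Red) = {q:.4f}")
```

The table runs from 0 at P(A)=0 to 1 at P(A)=1 and increases in between, which is the strong dependence on the selection probabilities mentioned above.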

Let us try to simulate this in EMT. We can choose between an easy-to-understand but slow program and the quick matrix language. For this demonstration, we use the latter.

    >n=1000000;
    >bowl=intrandom(n,2); A=(bowl==1); sum(A)/n
     0.500005
    >p=[RA/(RA+GA),RB/(RB+GB)]; fraction p
     [10/21,  1/7]
    >Red=random(n)<p[bowl]; // 1 for each red ball
    >ired=nonzeros(Red); nred=length(ired); nred/n // ired = indices of red balls
     0.309861
    >sum(bowl[ired]==1)/length(ired) // portion of bowl A in red balls
     0.76877051323

The numbers are close enough to justify our computations. And since we can simulate the process we are sure that the results make sense.
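The "easy to understand, but slow" variant of the simulation could look like the following loop. It is only a sketch in Python rather than EMT; the function name `simulate`, the sample size and the fixed seed are assumptions made for this illustration.

```python
import random

def simulate(n, p_a=0.5, r_a=10, g_a=11, r_b=5, g_b=30, seed=1):
    """Draw n balls one by one; return the frequency of red balls and
    the frequency of bowl A among the red balls."""
    rng = random.Random(seed)
    p_red = {"A": r_a / (r_a + g_a), "B": r_b / (r_b + g_b)}
    n_red = 0
    n_red_from_a = 0
    for _ in range(n):
        bowl = "A" if rng.random() < p_a else "B"   # select the bowl first
        if rng.random() < p_red[bowl]:              # then a random ball from it
            n_red += 1
            if bowl == "A":
                n_red_from_a += 1
    return n_red / n, n_red_from_a / n_red

freq_red, freq_a_given_red = simulate(200_000)
print(freq_red, freq_a_given_red)   # compare with 13/42 and 10/13
```

Counting only the red draws and asking how many of them came from bowl A is exactly the frequentist reading of P(A|Red).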

Finally, let me add a well-known application of this trick. We test patients for cancer with a test that has a true positive rate and a false positive rate of detection. Usually, the true positive rate is close to 100%, i.e., if there is cancer it will be detected. But the test also claims cancer where there is none, with a false positive rate that cannot be neglected. Then there is the rate of patients who have cancer. It is a good idea to think of the population as split into four groups.

- cancer and positive test
- cancer and negative test
- no cancer and positive test
- no cancer and negative test

You can quantify the expected numbers in each category if you know the above-mentioned rates of true and false positive tests and the rate of cancer in the population (or in the selected population which undergoes the test).

In Bayesian speech we get

\(P(\text{Cancer}|\text{Positive}) = P(\text{Positive}|\text{Cancer}) \dfrac{P(\text{Cancer})}{P(\text{Positive})}\)

So if your test was positive, chances are that you have no cancer even if the true positive rate P(Positive|Cancer) is close to 100%. It all depends on the frequency of the cancer and the frequency of a positive test (including the falsely positive tests). Both are known to your doctor by experience.

For an example, we take the cancer rate as 1%, the true positive rate as 100%, and the false positive rate as 5%. We then have to compute

\(P(\text{Pos.}) = P(\text{Pos.}|\text{Canc.})P(\text{Canc.}) + P(\text{Pos.}|\text{No Canc.}) P(\text{No Canc.}) \)

If we assume that P(Positive|Cancer) is very close to 1, and P(No Cancer) is also very close to 1, we just have to add the rate of cancer and the rate of positive tests in the no-cancer population, and get about 6% positive results as a good estimate (the exact value for the numbers above is 5.95%). Since the rate of cancer is only 1%, we conclude that our chance of having cancer is approximately

\(P(\text{Cancer}|\text{Positive}) \approx \dfrac{1}{6}.\)

That looks far better than the alarming positive test initially suggests.
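The four groups and the exact value behind the estimate of 1/6 can be worked out for the rates of the example (1% cancer, 100% true positive, 5% false positive). This is a minimal Python sketch; the population size of 100,000 and the function name `four_groups` are assumptions made for this illustration.

```python
from fractions import Fraction

def four_groups(population, cancer_rate, true_pos_rate, false_pos_rate):
    """Expected number of people in each of the four groups."""
    cancer = population * cancer_rate
    healthy = population - cancer
    return {
        ("cancer", "positive"): cancer * true_pos_rate,
        ("cancer", "negative"): cancer * (1 - true_pos_rate),
        ("no cancer", "positive"): healthy * false_pos_rate,
        ("no cancer", "negative"): healthy * (1 - false_pos_rate),
    }

population = 100_000   # hypothetical population size
groups = four_groups(population, Fraction(1, 100), Fraction(1), Fraction(5, 100))
for group, count in groups.items():
    print(group, count)

positives = groups[("cancer", "positive")] + groups[("no cancer", "positive")]
p_positive = positives / population
p_cancer_given_positive = groups[("cancer", "positive")] / positives

print(float(p_positive))                # 0.0595, close to the estimate of 6%
print(p_cancer_given_positive,
      float(p_cancer_given_positive))   # 20/119, about 0.168, close to 1/6
```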