Let me tackle another problem of statistics. I found it on a web page among other problems of the silly kind like „Is 0.999..<1?“. But actually, this one is a real problem that can provide insight into statistical thinking. It is elementary though. I am going to use Euler Math Toolbox (EMT) for some calculations and simulations.
Assume we have a set of Bridge cards (52 cards with 4 colors from 2 to 10, J, Q, K, Ace). This is dealt to four players, each player getting 13 cards. Answer the following two questions.
- Assume one player gets an ace of any color. What is the probability to get a second ace?
- Assume one player gets the ace of spades. What is the probability to get another ace?
My take on this is the following guideline:
Device a repeating experiment that mimics your problem!
In this case, we deal the 52 cards „randomly“ over and over again. We assume that every deal has the same probability. And I assume that you know that there are
\(\displaystyle \binom{52}{13} = \frac{52\cdot\ldots\cdot40}{2\cdot\ldots\cdot13} = 635’013’559’600\)
possible ways to select 13 cards from 52 cards. We have assumed that all of these deals come with the same probability. The frequentistic approach is to assume that we deal each distribution with the same frequency on the long (very long) run. So we can just assume we deal each one exactly once, and get our probability (the expected ratio of „good cases“) by
probability = „number of good cases“ / „number of all cases“
For problem 1, „all cases“ are the cases where there is one ace, and „good cases“ are the cases where there are two or more aces. For problem 2, „all cases“ are the cases with the ace of spaces and „good cases“ the cases with the ace of spades and another ace.
We are then faced with counting, a problem of combinatorics. In this case, we can count the result. In other cases, there is only the option of a Monte-Carlo simulation. Let us do one in EMT.
>function simulate1 (N) ... $ c=1:52; n1=0; n2=0; $ loop 1 to N $ c=shuffle(c); $ k=sum(c[1:13]<=4); $ if k>=1 then $ n1=n1+1; $ if k>=2 then n2=n2+1; endif; $ endif; $ end; $ return n2/n1; $ endfunction >simulate1(1000000) 0.370341025387
EMT may be an easy language to do the programming, but it is not the fastest one. And it is easy only if you know the basic syntax of a matrix language. E.g., the command c[1:13]<=4 returns an array b of 13 zeros and ones (where we simulate the four aces by the numbers 1 to 4). Its element b[i] will be one if and only if c[i]<=4. Summing up we get the number of elements in the first 13 shuffled cards that are smaller or equal to 4. In the code, n1 is then the number of times where one ace showed up and n2 the number times where two showed up.
It is important to note that we discarded all shuffles with no ace in the first 13 cards. It is like a condition in Bayesian reasoning. We are asking for the probability that there are two or more aces under the condition that there is one ace.
The change for the second problem is only in one line. We have to check if a specific ace (say the number 1) is in the array.
>function simulate2 (N) ... $ c=1:52; n1=0; n2=0; $ loop 1 to N $ c=shuffle(c); $ k=sum(c[1:13]<=4); $ if any(c[1:13]==1) then $ n1=n1+1; $ if k>=2 then n2=n2+1; endif; $ endif; $ end; $ return n2/n1; $ endfunction >simulate2(1000000) 0.561442110359
The function any() returns 1 if any element of the argument is non-zero. Since c[1:13]==1 tests for the elements to be equal to 1 (the ace of spaces) we get the right answer.
As you see, the outcome of the experiment is quite different. This is why the problem is interesting.
Now we want to do the combinatorial job of counting and computing the probabilities. We expect our simulations to get „close“ to the „correct“ probabilities. The mathematics behind this is very basic combinatorics. My feeling is that this kind of reasoning should be in the mathematical class of any higher education. If you do not know the result you should learn it.
The number N(k) of deals for player number 1 which contain exactly k aces is as follows.
\(\displaystyle N_k = \binom{4}{k}\binom{48}{13-k}\)
The reasoning behind this is as follows: We have to select k from the 4 aces, and 13-k cards from the 48 cards that are no aces. If we do this we get all possible selections with 13 cards that contain exactly k aces.
Finally, the probability in problem 1 is („good cases / all couting cases“)
\(\displaystyle p_1 = \frac{N_2+N_3+N_4}{N_1+N_2+N_3+N_4} \approx 0.3696\)
as computed by EMT with
>function N(k) := bin(4,k)*bin(48,13-k) >sum(N(2:4))/sum(N(1:4)) 0.369637191337
This is close enough to our Monte-Carlo simulation.
Note that N(2:4) returns the array [N(2), N(3), N(4)] automatically in a matrix language. This works since the function bin() „maps“ to elements of array arguments. The sum of N(0) to N(4) is the number N of all deals. This allows a simplification with a new insight.
\(\displaystyle p_1 = 1 – \frac{N_1}{N-N_0} = 1 – \frac{a_1}{1-a_0}\)
where a(k)=N(k)/N is the probability to get exactly k aces. Of course, 1-a(0) is the probability got the one or more aces. So the right-hand side is what we expect as the probability for our Problem 1. If you are not as frequentistic as I am you may have come up with this formula immediately.
Let us tackle the Problem 2 now. This is even easier. We are now only dealing with 51 cards (all but the ace of spades), and we deal only 12 cards. It is easy to count the number of deals with none of the remaining aces. The answer is
\(\displaystyle p_2 = 1 – \frac{\binom{48}{12}}{\binom{51}{12}} \approx 0.56115\)
This should now be obvious (after some thought), and our simulation was close enough to this.
So the probabilities are indeed different. This seems counter-intuitive at first thought. But let us try to find a good argument.
Let us look at the complementary probabilities 1-p(1) and 1-p(2). In both problems, we then have to divide the number of times we have exactly one ace („bad cases“) over the number of times we have at least one ace („all counting cases“). But both numbers change between Problem 1 and 2. The bad cases in Problem 1 are simply 4 times the bad cases in Problem 2. But the counting cases in Problem 1 are less than 4 times the counting cases of Problem 2. It is not 4 times as likely to have any 3 aces than to have the ace of spades plus any two aces. Since the numerator is smaller, 1-p(1) is larger, and p(1) is consequently smaller.