# Yahtzee Waiting Times

I recently was asked about waiting times in the game of Yahtzee. If you do not know the game it suffices to say that you throw 5 dice, and one of the goals is to get 5 dice with the same number, a Yahtzee. I wrote about waiting times a long time ago. But let me repeat the main points.

The trick is to use a number of states S0, S1, …, Sn, and probabilities P(i,j) to get from state Si to state Sj. S0 is the starting state, and Sn is the state we want to reach, in our case the Yahtzee. For a first attempt, we use the number 6s we have on out 5 dice. Then we have 6 states, S0 to S5. With a bit of combinatorics, we can compute the probabilities P(i,j) as

$$p_{i,j} = p^{j-i} (1-p)^{n-i-(j-i)} \binom{n-i}{j-i}$$

If we compute that with Euler Math Toolbox we get the following matrix P.

>p=1/6;
>i=(0:5)'; j=i';
% This is the matrix of probabilities to get from i sixes to j sixes.
0.4018776 0.4018776 0.1607510 0.0321502 0.0032150 0.0001286
0.0000000 0.4822531 0.3858025 0.1157407 0.0154321 0.0007716
0.0000000 0.0000000 0.5787037 0.3472222 0.0694444 0.0046296
0.0000000 0.0000000 0.0000000 0.6944444 0.2777778 0.0277778
0.0000000 0.0000000 0.0000000 0.0000000 0.8333333 0.1666667
0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000

This matrix allows us to simulate or compute results about probabilities for 6-Yahtzees.  E.g., we can start with no 6 in S0. After one dice throw, the first row of P yields the distribution of the expected states. We can do dice throws by applying P to the right of our distribution vector.

>v=[1,0,0,0,0,0].P
0.4018776 0.4018776 0.1607510 0.0321502 0.0032150 0.0001286
% After two more throws, we get the following.
>v.P.P
0.0649055 0.2362559 0.3439886 0.2504237 0.0911542 0.0132721


This tells us that the chance to be in state S5 after three throws is just 1.3%.

How about the average time that we have to wait if we keep throwing the dice until we get a 6-Yahtzee? This can be solved by denoting the waiting time from state Si to S5 by w(i) and observe

$$w_i = p_{i,n} + \sum_{j \ne n} p_{i,j} (1+w_j)$$

where n=5 is the final state. Obviously, w(5)=0. Moreover, the sum of all probabilities in one row is 1. Thus we get

$$w_i – \sum_{j=1}^{n-1} w_j = 1$$

Let us solve this system in Euler Math Toolbox.

>B=P[1:5,1:5];
>w=(id(5)-B)\ones(5,1)
13.0236615
11.9266963
10.5554446
8.7272727
6.0000000


The average waiting time for a 6-Yahtzee is approximately 13 throws. If you already got 4 sixes, the average waiting time for the fifth is indeed 6.

We can interpret this in another way. Observe

$$w = (I-B)^{-1} \cdot 1 = (I+B+B^2+\ldots) \cdot 1$$

The sum converges because the norm of B is clearly less than 1. So we interpret this as a process

$$w_{n+1} = B \cdot (w_n + 1)$$

Suppose a waiting room. We have w(n,i) persons in state Si at time n. Now, one more person arrives at each state and we process the states according to our probabilities. On the long run, the number of persons in state S0 will become the waiting time to get out of the system into the final state.

Unfortunately, the probabilities are not that easy to compute when we ask for a Yahtzee of any kind. E.g., if we have 66432 we will keep the two sixes as 66. But we may throw 66555 and switch to keep 555. There is a recursion for the probability to get from i same dice to j same dice. I computed the following matrix which replaces the matrix P above.

 >P
0.0925926 0.6944444 0.1929012 0.0192901 0.0007716
0.0000000 0.5555556 0.3703704 0.0694444 0.0046296
0.0000000 0.0000000 0.6944444 0.2777778 0.0277778
0.0000000 0.0000000 0.0000000 0.8333333 0.1666667
0.0000000 0.0000000 0.0000000 0.0000000 1.0000000


Note that I have only the states S1 to S5 here. With the same method, I get the waiting times.

>B=PM[1:4,1:4];
>w=(id(4)-B)\ones(4,1)
11.0901554
10.4602273
8.7272727
6.0000000


Thus we need only about 11 throws to get any Yahtzee. The probability to get one in three throws is now about 4.6%.

>[1,0,0,0,0].PM.PM.PM
0.0007938 0.2560109 0.4524017 0.2447649 0.0460286


# The „One More Ace?“ Problem

Let me tackle another problem of statistics. I found it on a web page among other problems of the silly kind like „Is 0.999..<1?“. But actually, this one is a real problem that can provide insight into statistical thinking. It is elementary though. I am going to use Euler Math Toolbox (EMT) for some calculations and simulations.

Assume we have a set of Bridge cards (52 cards with 4 colors from 2 to 10, J, Q, K, Ace). This is dealt to four players, each player getting 13 cards. Answer the following two questions.

1. Assume one player gets an ace of any color. What is the probability to get a second ace?
2. Assume one player gets the ace of spades. What is the probability to get another ace?

My take on this is the following guideline:

Device a repeating experiment that mimics your problem!

In this case, we deal the 52 cards „randomly“ over and over again. We assume that every deal has the same probability. And I assume that you know that there are

$$\displaystyle \binom{52}{13} = \frac{52\cdot\ldots\cdot40}{2\cdot\ldots\cdot13} = 635’013’559’600$$

possible ways to select 13 cards from 52 cards. We have assumed that all of these deals come with the same probability. The frequentistic approach is to assume that we deal each distribution with the same frequency on the long (very long) run. So we can just assume we deal each one exactly once, and get our probability (the expected ratio of „good cases“) by

probability = „number of good cases“ / „number of all cases“

For problem 1, „all cases“ are the cases where there is one ace, and „good cases“ are the cases where there are two or more aces.  For problem 2, „all cases“ are the cases with the ace of spaces and „good cases“ the cases with the ace of spades and another ace.

We are then faced with counting, a problem of combinatorics. In this case, we can count the result. In other cases, there is only the option of a Monte-Carlo simulation. Let us do one in EMT.

>function simulate1 (N) ...
$c=1:52; n1=0; n2=0;$  loop 1 to N
$c=shuffle(c);$     k=sum(c[1:13]<=4);
$if k>=1 then$         n1=n1+1;
$if k>=2 then n2=n2+1; endif;$     endif;
$end;$   return n2/n1;
$endfunction >simulate1(1000000) 0.370341025387  EMT may be an easy language to do the programming, but it is not the fastest one. And it is easy only if you know the basic syntax of a matrix language. E.g., the command c[1:13]<=4 returns an array b of 13 zeros and ones (where we simulate the four aces by the numbers 1 to 4). Its element b[i] will be one if and only if c[i]<=4. Summing up we get the number of elements in the first 13 shuffled cards that are smaller or equal to 4. In the code, n1 is then the number of times where one ace showed up and n2 the number times where two showed up. It is important to note that we discarded all shuffles with no ace in the first 13 cards. It is like a condition in Bayesian reasoning. We are asking for the probability that there are two or more aces under the condition that there is one ace. The change for the second problem is only in one line. We have to check if a specific ace (say the number 1) is in the array. >function simulate2 (N) ...$  c=1:52; n1=0; n2=0;
$loop 1 to N$     c=shuffle(c);
$k=sum(c[1:13]<=4);$     if any(c[1:13]==1) then
$n1=n1+1;$         if k>=2 then n2=n2+1; endif;
$endif;$   end;
$return n2/n1;$  endfunction
>simulate2(1000000)
0.561442110359


The function any() returns 1 if any element of the argument is non-zero. Since c[1:13]==1 tests for the elements to be equal to 1 (the ace of spaces) we get the right answer.

As you see, the outcome of the experiment is quite different. This is why the problem is interesting.

Now we want to do the combinatorial job of counting and computing the probabilities. We expect our simulations to get „close“ to the „correct“ probabilities. The mathematics behind this is very basic combinatorics. My feeling is that this kind of reasoning should be in the mathematical class of any higher education. If you do not know the result you should learn it.

The number N(k) of deals for player number 1 which contain exactly k aces is as follows.

$$\displaystyle N_k = \binom{4}{k}\binom{48}{13-k}$$

The reasoning behind this is as follows: We have to select k from the 4 aces, and 13-k cards from the 48 cards that are no aces. If we do this we get all possible selections with 13 cards that contain exactly k aces.

Finally, the probability in problem 1 is („good cases / all couting cases“)

$$\displaystyle p_1 = \frac{N_2+N_3+N_4}{N_1+N_2+N_3+N_4} \approx 0.3696$$

as computed by EMT with

>function N(k) := bin(4,k)*bin(48,13-k)
>sum(N(2:4))/sum(N(1:4))
0.369637191337


This is close enough to our Monte-Carlo simulation.

Note that N(2:4) returns the array [N(2), N(3), N(4)] automatically in a matrix language. This works since the function bin() „maps“ to elements of array arguments. The sum of N(0) to N(4) is the number N of all deals. This allows a simplification with a new insight.

$$\displaystyle p_1 = 1 – \frac{N_1}{N-N_0} = 1 – \frac{a_1}{1-a_0}$$

where a(k)=N(k)/N is the probability to get exactly k aces. Of course, 1-a(0) is the probability got the one or more aces. So the right-hand side is what we expect as the probability for our Problem 1. If you are not as frequentistic as I am you may have come up with this formula immediately.

Let us tackle the Problem 2 now. This is even easier. We are now only dealing with 51 cards (all but the ace of spades), and we deal only 12 cards. It is easy to count the number of deals with none of the remaining aces. The answer is

$$\displaystyle p_2 = 1 – \frac{\binom{48}{12}}{\binom{51}{12}} \approx 0.56115$$

This should now be obvious (after some thought), and our simulation was close enough to this.

So the probabilities are indeed different. This seems counter-intuitive at first thought. But let us try to find a good argument.

Let us look at the complementary probabilities 1-p(1) and 1-p(2). In both problems, we then have to divide the number of times we have exactly one ace („bad cases“) over the number of times we have at least one ace („all counting cases“). But both numbers change between Problem 1 and 2. The bad cases in Problem 1 are simply 4 times the bad cases in Problem 2. But the counting cases in Problem 1 are less than 4 times the counting cases of Problem 2. It is not 4 times as likely to have any 3 aces than to have the ace of spades plus any two aces. Since the numerator is smaller, 1-p(1) is larger, and p(1) is consequently smaller.

# A Problem on Probability

I just found an old problem in Spiegel Online (see here). The problem is absolutely mind-boggling. It goes as follows.

Three prisoners A, B and C are told that two of them will come free on the next day. You are prisoner A. You cannot stand the waiting time and ask a guard to tell you the name of one of the prisoners who will not come free, but not your name, of course. The guard says: B will come free. Now, what is the chance that you will come free too?

The impulsive answer is 1/2. After some thought, it is 2/3 again. Or maybe it is still 1/2? There are good arguments for both. And does „probability“ make any sense at all?

The decision about the two lucky prisoners has been made before A asked, hasn’t it? Asking does not change the chance. So it is still 2/3, assuming that the two prisoners are chosen at random with equal probabilities among the three. Right?

If you have decided in favor of 2/3, I change the question a bit to confuse you. Assume, it was 10 prisoners and 9 become free. After you ask the guard, he sets 8 of them free. You are alone with the last one and claim that the chance for both of you to get free is 9/10. Would you think this is a sensible claim?

To confuse a bit more assume that the guard is free to say any name. He says A. Now what is the meaning of you saying that your probability to become free is 2/3?

What makes this problem hard to treat is the notion of „probability“. For me, probabilities make sense only if there is an experiment going on. I am what they call a frequentist. Now, what could be the experiment in this case? Clearly, it is the selection of the two prisoners by an external force. Without further knowledge, A is among the selection with probability 2/3, i.e., in 2/3 of the cases on average.

The important question is if the knowledge that B is among the selected prisoners changes our experiment. We can argue that it does. The options that A/C are selected is no longer possible. We have only two options left, A/B and C/B. So in 1/2 of the possible outcomes of our experiment, A will come free. What this tells us is that the probability for A to come free is 1/2 provided we know that B will come free.

But does this change of our experiment (discarding A/C) really reflect what is happening in the problem? This can only be clarified by studying the exact question in the problem.

In the online journal, the question was formulated carefully: Does it make sense for A to ask? Will he know more after he gets an answer?

And this is not a question we can answer without assumptions. The reason is that it depends on the preference of the guard in the case B/C. If he does not select to say B or C with equal probability A does indeed know more after asking. But if he does A cannot gain any further knowledge. Our chance is still 2/3.

Let me elaborate that. We start our experiment by selecting 2 of the 3 prisoners. A is among the selected with probability 2/3. Now we ask the guard. In the B/C case, we assume the guard says B with probability p. Now a bit of thinking shows that he will say B in 1/3+p/3 of all outcomes of our experiment, and C in 1/3+(1-p)/3.

• Assuming he says B, A will be set free in (1/3)/(1/3+p/3) = 1/(1+p) of these cases. For p=1/2 this is 2/3 as expected. For p=1 it is 1/2.
• Assuming he says C, A will be set free in (1/3)/(1/3+(1-p)/3)=1/(2-p) of these cases. For p=1/2 this is 2/3 again. For p=1 it is 1. Indeed, if the guard always says B in the case B/C, we know for sure that we get free if he says C.

As always, the problem turns out to be more complicated than it appeared at first sight.

For those of you who still think this is rubbish and the probability must be 2/3 to get free because the selection has been made beforehand, I have good news. You are also right!

Let us compute. A comes free in 1/(1+p) of the cases where the guard says A, and he says that in (1+p)/3 of the cases. A does also come free in 1/(2-p) of the cases where the guard says B, and he does so in (2-p)/3 of all cases. If you add all the cases where A gets free you end up with 2/3.

We have just fine tuned our knowledge about the chances if the guard says B or C. In the case p=1/2, the result is quite easy. The guard will say B or C with probability 1/2, and A gets free in 2/3 of the cases, no matter if the guard says B or C.

# A Geometry Problem with Three Circles and a Point

I friend gave me a geometric problem which turns out to boil down to the figure above. We have three arbitrary circles C1, C2, C3. Now we construct the three green lines through the two intersection points of each pair of these circles. We get lines g11, g12, g13. These lines intersect in one point. Why?

As you know, the argument for the middle perpendicular lines on the sides of a triangle goes like this: Each middle perpendicular is the set of points which have the same distance to two of the corners. So if we intersect two of them in P then d(P,A)=d(P,B) and d(P,B)=d(P,C), which implies d(P,A)=d(P,C). As usual, d(P,A) denotes the distance from P to A. Thus P is also on the third middle perpendicular. Note that we need that P is on the middle perpendicular on AB if and only if d(P,A)=d(P,B).

A similar argument is possible for the angle bisectors. These rays are the set of points with equal distance to two sides. For the heights, such an argument is not available. The standard proof goes by constructing a bigger triangle where the heights are middle perpendiculars. By the way, this proof stops working in Non-Euclidean hyperbolic geometry, where the fact still holds.

Can we make up a proof similar to these proofs for our problem? It turns out that this is indeed possible. The correct value is the following:

$$f(P,C) = d(P,M_C)^2-r_C^2$$

where r(C) is the radius of the circle C, and M(C) is its center. To complete the proof, we only need to show that the line through the intersection of two circles C1 and C2 is the set of all points P such that f(P,C1)=f(P,C2). Then the proof is as easy as the proofs above.

There are several ways to see this.

We could use a result that I call chord-secant-tangent theorem which deals with products of distances of a point on a secant or chord to the circle. But it is possible to get a proof using the Pythagoras only. In the image above we have

$$d(P,Q)^2+d(M,Q)^2 = d(P,M)^2, \quad d(S,Q)^2 + d(M,Q)^2 = r^2$$

Thus

$$f(P,C) = d(P,M)^2 – r^2 = d(P,Q)^2-d(S,Q)^2$$

where C is the circle. Now, if we have two intersection circles, the right-hand side of the equation is the same for both circles, and thus also the left-hand side.

We have seen that f(P,C1)=f(P,C2) for all points on the green line.

But we have to prove the converse too. For this, we observe that D(P,C1)=D(P,C2) implies that P is on a circle around M(C1) and on another circle around M(C2). The two circles meet only in points on the green line.

There is also another way to see that f(P,C1)=f(P,C2) defines the green line. If you work out this equation analytically, you see that it is equivalent to an equation for a line. I leave that to you to check.

Note that there is a second situation where the result does hold too.

In this case, we need f(P,C) for P inside C. It will be negative, but f(P,C1)=f(P,C2) still holds for all points on the line through the intersection, and even if P is on the circles.

There is the following special situation.

It can be seen as a limit case of the previous situation. But it can also be proved by observing that all the tangents have the same length between the intersecting point and the tangent point.

Here is another situation.

The green lines are the sets of points such that f(P,C1)=f(P,C2) for two of the circles. It is quite interesting to construct these lines. I leave that to the reader.

# A Geometric Problem

I found the following problem on my Google+ account: In the image below, the total length of the blue line segments is equal to the total length of the green line segments. The segments form an angle of 60° among each other.

After a while of thinking, I found a nice proof of this phenomenon. It is probably the standard proof, and it also yields a lot of generalizations.

We assume the black dot at (0,0) in the plane and denote the direction of the green line segments by vectors by v1, v2, and v3, all of them with length 1. Then

$$v_1+v_2+v_3=0$$

If we denote the lengths of the green segments with a1, a2 and a3, we have that

$$a_1v_1, \quad a_2v_2, \quad a_3a_3$$

are the endpoints of the segments. Now, let c be the center of the circle. We thus have

$$\|c-a_kv_k\| = \|c+b_kv_k\|$$

for k=1,2,3 if we denote the lengths of the blue segments with b1, b2, b3. Squaring this and using a bit of vector algebra, we get

$$– a_k (c \cdot v_k) + a_k^2 = b_k (c \cdot v_k) + b_k^2.$$

Here, the dot denotes the scalar product of two vectors. Thus

$$a_k – b_k = c \cdot v_k$$

Summing up for k=1,2,3 we get our conclusion

$$a_1+a_2+a_3 = b_1 + b_2 + b_3.$$

This generalizes to finitely many vectors which sum up to 0. E.g., we can take the sides of the following pentagram as vectors.

The corresponding result can be seen in the next figure.

Of course, a special case is 10 line segments with equal angle 36° between them.

# Least Common Multiplier of First N Numbers

There is a famous result that is quite surprising if you see it the first time: The log of the least common multiplier of the first n integers is approximately n. The quotient converges to 1. Or to say it another way: It’s n-th root converges to e. You will find this result in the net. It is equivalent to the theorem about the distribution of prime numbers.

Let us check that with software.

>&load("functs");
>&lcm(makelist(n,n,1,500)), &log(float(%))/500

7323962231895284659386387451904229882976133825128925904634919\
003459630742080371339432775981989132698526831260664840887571331401331\
362333709431244066365980335206141556095539831625389222073894558545019\
7206138869521568000

1.003304233299495


Of course, this infinite integer arithmetic of Maxima is not an effective way to check the result. Instead, we can use the following code to get the quotient for much higher numbers.

>N=100000000;
>p=primes(N);
>f=floor(log(N)/log(p));
>longest sum(f*log(p))/N
0.9999824279662022


This is based on the following observation. To compute the least common multiplier of 1  to n, we have to compute

$$p_1^{k_1} \cdot \ldots \cdot p_m^{k_m}$$

for all primes p(i) less then n, where k(i) is the largest number so that

$$p_i^{k_i} \le n$$

The logarithm of this product is exactly what is computed in the code above. If we make this a function and compute it for N from 10 to 5000, we get the following graph.

# Computing a Special Sum

I recently found the following problem in my Google+ account: What is

$$\dfrac{1}{2 \cdot 4} + \dfrac{1 \cdot 3}{2 \cdot 4 \cdot 6} + \dfrac{1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6 \cdot 8} + \ldots$$

This kind of sum of growing products can often be solved with the following trick.

$$1 = a_1 + (1-a_1) = a_1 + (1-a_1) a_2 + (1-a_1)(1-a_2) = \ldots$$

The n-th iteration of this process has yields

$$1 = a_1 + \ldots + (1-a_1)\ldots(1-a_n)a_{n+1} + (1-a_1)\ldots(1-a_{n+1}).$$

This is true for any sequence a_n. If we want to let n go to infinity we need to control the last term in this sum. We want to know its limit. E.g.,

$$1 = \sum\limits_{k=1}^\infty \left( a_{k+1} \prod\limits_{l=1}^k (1-a_l) \right)$$

will be true only if

$$\lim\limits_{k \to \infty} \prod\limits_{l=1}^k (1-a_l) = 0.$$

We can use

$$\log(1-h) \le -h \qquad\text{for } 0 \le h < 1$$

and get the following estimate

$$\log \prod\limits_{l=1}^k (1-a_k) = \sum\limits_{l=1}^k \log (1-a_l) \le – \sum\limits_{l=1}^k a_l.$$

Thus for positive a(l), not larger than 1,

$$\sum\limits_{k=1}^\infty a_l = \infty \quad\Rightarrow\quad \prod\limits_{l=1}^\infty (1-a_k) = 0. \tag1$$

Note that the right conclusion is also true, if the a_l do not converge to 0, but are bounded by 1.

Now we simply apply all this with

$$a_k = \dfrac{1}{2k}$$

and get that the sum above is equal to 1/2.

For the records, we study the conclusion (1) in more detail. First of all for positive a_l between 0 and 1, we get

$$\sum\limits_{k=1}^\infty a_l = \infty \quad\Leftrightarrow\quad \prod\limits_{l=1}^\infty (1-a_l) = 0.$$

We can use the estimate

$$– 2h \le \log(1-h) \le -h \qquad\text{for } 0 \le h < \dfrac{1}{2}<h$$

for this. In case, infinitely many a(l) are larger that 1/2, the result is obvious.

In a similar way we can prove for positive a(l)

$$\sum\limits_{k=1}^\infty a_l = \infty \quad\Leftrightarrow\quad \prod\limits_{l=1}^\infty (1+a_l) < \infty.$$

If the a(l) negative and positive things get more involved.